1X Technologies has announced the integration of its new video-pretrained world model, 1XWM, into its NEO robot platform. This development targets robotics researchers, developers, and early adopters interested in advanced home robots that navigate and act with human-like understanding. The initial release is for a limited group, primarily for research and internal evaluation, with broader commercial deployment expected following further validation.
The 1XWM model represents a technical shift from conventional vision-language-action (VLA) models: it combines internet-scale video pretraining with egocentric human and robot data. The model predicts robot actions by generating text-conditioned video rollouts, which an Inverse Dynamics Model then translates into motion commands. Unlike prior approaches, this method does not require tens of thousands of hours of robot demonstrations, enabling faster adaptation to new tasks. The backbone is a 14B-parameter generative video model, fine-tuned for NEO’s humanoid embodiment, with inference currently taking about 11 seconds per rollout.
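To make that pipeline concrete, here is a minimal sketch of how a generate-then-decode planner of this kind could be wired together. All names here (VideoWorldModel, InverseDynamicsModel, plan_actions) are hypothetical stand-ins, not 1X's actual API; the stubs only illustrate the control flow described above, in which a text-conditioned video rollout is decoded into motion commands frame pair by frame pair.

```python
import numpy as np


class VideoWorldModel:
    """Hypothetical stand-in for a video world model like 1XWM: given the
    current camera frame and a text instruction, it generates a short
    rollout of predicted future frames. The real backbone is a
    14B-parameter generative video model; this stub only shows the interface."""

    def rollout(self, frame: np.ndarray, instruction: str, horizon: int = 16) -> list[np.ndarray]:
        # A real implementation would run text-conditioned video generation
        # (on the order of 11 s per rollout, per 1X's figures).
        # Here we just repeat the input frame as a placeholder.
        return [frame.copy() for _ in range(horizon)]


class InverseDynamicsModel:
    """Hypothetical inverse dynamics model: infers the action that takes the
    robot from one predicted frame to the next."""

    def infer_action(self, frame_t: np.ndarray, frame_t1: np.ndarray) -> np.ndarray:
        # A trained IDM would regress joint or end-effector commands from
        # the frame pair; we return a placeholder 7-DoF command.
        return np.zeros(7)


def plan_actions(wm: VideoWorldModel, idm: InverseDynamicsModel,
                 frame: np.ndarray, instruction: str) -> list[np.ndarray]:
    """Generate a text-conditioned rollout, then translate it into motion
    commands by applying the IDM to each consecutive frame pair."""
    frames = wm.rollout(frame, instruction)
    return [idm.infer_action(a, b) for a, b in zip(frames, frames[1:])]


if __name__ == "__main__":
    camera_frame = np.zeros((224, 224, 3), dtype=np.uint8)  # placeholder observation
    actions = plan_actions(VideoWorldModel(), InverseDynamicsModel(),
                           camera_frame, "pick up the mug")
    print(f"{len(actions)} motion commands planned")
```

The design point this illustrates is the decoupling 1X describes: the video model carries the task knowledge learned from internet-scale pretraining, while the comparatively small inverse dynamics model handles the embodiment-specific mapping to motor commands.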
“Few notes… This is a whole new world. We have now gone from a world where humanoid robots are constrained by teleop data collection to unlocking themselves to collect their own data by using a video backbone grounded in physics to generate pretty much any AI abilities… try…” — dar (@radbackwards), January 12, 2026
Compared with similar models from other robotics labs, 1XWM shows improved generalization to novel objects and motions, especially for tasks absent from the robot’s training data. Early user feedback and internal benchmarks indicate that the model handles complex real-world tasks, such as bimanual coordination and robust object manipulation, with success rates matching or exceeding those of previous models. Experts note that leveraging egocentric human data and detailed captioning during training has led to more physically plausible and reliable robot behavior.
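As a rough illustration of the training mixture described above, the sketch below pools internet video, egocentric human clips, and robot clips, each paired with a detailed caption. The Clip structure, sampling weights, and function names are assumptions for illustration only; 1X has not published its data pipeline.

```python
import random
from dataclasses import dataclass


@dataclass
class Clip:
    frames_path: str  # path to the video clip on disk
    caption: str      # detailed text description of the depicted motion
    source: str       # "internet", "human_egocentric", or "robot"


def make_mixture_sampler(internet, human_ego, robot, weights=(0.6, 0.3, 0.1)):
    """Return a sampler that draws captioned clips from the three pools in
    fixed proportions. The weights are illustrative placeholders, not 1X's
    published ratios."""
    pools = [internet, human_ego, robot]

    def sample() -> Clip:
        pool = random.choices(pools, weights=weights, k=1)[0]
        return random.choice(pool)

    return sample


# Example: one captioned clip per source, then sample from the mixture.
web_clips = [Clip("web/cooking_001.mp4", "hands chop vegetables on a wooden board", "internet")]
human_clips = [Clip("ego/pour_water.mp4", "a person pours water from a kettle into a cup", "human_egocentric")]
robot_clips = [Clip("neo/open_drawer.mp4", "the robot opens a kitchen drawer with its left hand", "robot")]

sample = make_mixture_sampler(web_clips, human_clips, robot_clips)
print(sample().caption)
```

The captions matter here: pairing every clip with a detailed motion description is what lets a text-conditioned video model connect language instructions to physically grounded behavior.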
1X Technologies, known for its focus on embodied AI and robotics, is leveraging breakthroughs in generative video modeling and scalable hardware design. For this launch, the company collaborated with cloud infrastructure specialists at Verda to optimize inference speed, aiming to further reduce latency and expand the model’s capabilities for broader household autonomy.