1X Technologies, a robotics startup, recently unveiled a novel generative model that significantly improves the efficiency of training robotic systems in simulation. The model directly addresses a key challenge in robotics: building a "world model" capable of forecasting how the environment responds to a robot's actions.
Because training robots directly in physical environments is costly and potentially risky, researchers typically train control models in simulated settings before deploying them in the real world. However, discrepancies between simulated and real environments cause numerous problems when those models are transferred, an issue commonly referred to as the "simulation-to-reality gap."
To bridge this gap, 1X Technologies' new model trains directly on raw sensor data collected from robots, enabling it to simulate real-world scenarios. By analyzing thousands of hours of video footage and actuator data gathered by the company's proprietary humanoid robot, EVE, during various household and office mobility tasks, the model can predict the outcomes of specific robot actions based on current environmental observations.
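To make the idea concrete, the sketch below shows what an action-conditioned world model interface might look like: given the current camera frame and an actuator command, it predicts the next frame. The architecture, dimensions, and names here are illustrative assumptions, not details of 1X's actual system.

```python
# Minimal sketch of an action-conditioned world model (illustrative only;
# architecture and dimensions are assumptions, not 1X's implementation).
import torch
import torch.nn as nn


class ActionConditionedWorldModel(nn.Module):
    """Predicts the next camera frame given the current frame and a robot action."""

    def __init__(self, action_dim: int = 20, hidden_dim: int = 256):
        super().__init__()
        # Encode the current observation (a low-resolution RGB frame) into a latent vector.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=4, stride=2, padding=1),   # 64x64 -> 32x32
            nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2, padding=1),  # 32x32 -> 16x16
            nn.ReLU(),
            nn.Flatten(),
            nn.Linear(64 * 16 * 16, hidden_dim),
        )
        # Fuse the latent state with the actuator command.
        self.dynamics = nn.Sequential(
            nn.Linear(hidden_dim + action_dim, hidden_dim),
            nn.ReLU(),
        )
        # Decode the fused latent back into a predicted next frame.
        self.decoder = nn.Sequential(
            nn.Linear(hidden_dim, 64 * 16 * 16),
            nn.Unflatten(1, (64, 16, 16)),
            nn.ReLU(),
            nn.ConvTranspose2d(64, 32, kernel_size=4, stride=2, padding=1),  # 16x16 -> 32x32
            nn.ReLU(),
            nn.ConvTranspose2d(32, 3, kernel_size=4, stride=2, padding=1),   # 32x32 -> 64x64
            nn.Sigmoid(),
        )

    def forward(self, frame: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        latent = self.encoder(frame)
        fused = self.dynamics(torch.cat([latent, action], dim=-1))
        return self.decoder(fused)


# Example: predict one step ahead from a batch of 64x64 RGB frames and 20-dim actions.
model = ActionConditionedWorldModel()
next_frame = model(torch.rand(8, 3, 64, 64), torch.rand(8, 20))
print(next_frame.shape)  # torch.Size([8, 3, 64, 64])
```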
This learned world model is particularly adept at simulating object interactions. As showcased in a video shared by 1X Technologies, the model successfully predicts video sequences of a robot grasping boxes and simulates complex object interactions, including rigid bodies, object drop effects, partial observability, deformable objects (such as curtains and clothing), and hinged objects (such as doors, drawers, and chairs).
Despite the model's demonstrated potential, environmental changes remain a challenge: as the robot's operating environment evolves, the generative model must also be updated. However, because the model is a fully learned simulator, the researchers believe it can be corrected simply by feeding it fresh data collected from the real world, without the manual adjustments a hand-built physics simulator would require.
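As a rough illustration of that workflow, the sketch below continues training a learned world model on newly logged real-world episodes; the dataset format, loss, and function names are assumptions made for illustration rather than 1X's actual pipeline.

```python
# Illustrative sketch of refreshing a learned world model with newly collected
# real-world data; the (frame, action, next_frame) format is an assumption.
import torch
import torch.nn.functional as F


def finetune_world_model(world_model, optimizer, new_episodes, epochs=1):
    """Continue training on (frame, action, next_frame) triples logged from real robots."""
    world_model.train()
    for _ in range(epochs):
        for frame, action, next_frame in new_episodes:
            prediction = world_model(frame, action)
            loss = F.mse_loss(prediction, next_frame)  # simple pixel-reconstruction objective
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return world_model
```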
1X Technologies' new system draws inspiration from projects such as OpenAI's Sora and Runway's video models, which have demonstrated that generative models can learn world models and maintain temporal consistency given appropriate training data and techniques. Unlike those models, which primarily generate video content, 1X's model is an interactive generative system that responds to actions during generation, opening new possibilities for training robotic control models and reinforcement learning systems.
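The sketch below illustrates how such an interactive world model could stand in for a physics simulator during a policy rollout. Here `world_model`, `policy`, and `reward_fn` are hypothetical stand-ins for illustration, not part of 1X's released code.

```python
# Illustrative sketch: rolling a policy forward entirely inside a learned,
# action-conditioned world model instead of a physics simulator.
import torch


def imagined_rollout(world_model, policy, reward_fn, initial_frame, horizon=16):
    """Generate an 'imagined' trajectory by feeding policy actions into the world model."""
    frame = initial_frame
    frames, actions, rewards = [], [], []
    with torch.no_grad():
        for _ in range(horizon):
            action = policy(frame)               # choose an action from the current predicted frame
            frame = world_model(frame, action)   # world model generates the next frame for that action
            frames.append(frame)
            actions.append(action)
            rewards.append(reward_fn(frame, action))  # score the imagined outcome
    return frames, actions, rewards


if __name__ == "__main__":
    # Stand-in components, purely for illustration.
    world_model = lambda frame, action: torch.clamp(frame + 0.01 * action.mean(), 0.0, 1.0)
    policy = lambda frame: torch.rand(frame.shape[0], 20)
    reward_fn = lambda frame, action: frame.mean(dim=(1, 2, 3))
    frames, actions, rewards = imagined_rollout(
        world_model, policy, reward_fn, torch.rand(4, 3, 64, 64)
    )
    print(len(frames), rewards[0].shape)  # 16 torch.Size([4])
```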
However, the generative model still faces challenges: it occasionally produces unrealistic scenarios, such as an object hovering in mid-air instead of falling, or objects suddenly disappearing between consecutive frames. The company sees continuously collecting more data and training better-optimized models as the key directions for addressing these issues.
Additionally, 1X Technologies is encouraging community involvement by releasing the model and its weights and planning competitions that reward contributors who improve it. The company says it is actively exploring various world-modeling and video-generation approaches to further enhance the model's performance and practicality.