Recently, Brett Adcock, the founder and CEO of Figure, revealed a new machine learning model designed for humanoid robots. The announcement came two weeks after the company said it was ending its collaboration with OpenAI. The new model, named Helix, is described as a "universal" Vision-Language-Action (VLA) model.
VLA models are still an emerging concept in the robotics field, leveraging visual and language inputs to process information. One of the most well-known examples in this category is Google DeepMind's RT-2, which trains robots using a combination of video data and large language models (LLMs).
Helix operates similarly by integrating visual data and language prompts to control robots in real-time. According to Figure, Helix demonstrates strong object generalization capabilities, enabling it to grasp thousands of household items with various shapes, sizes, colors, and materials that were not included in its training set, simply through natural language instructions.
Helix aims to bridge the gap between visual and language processing. Upon receiving a natural language voice command, the robot first visually assesses its surroundings before executing the task. For instance, it can be told to "pass the bag to the robot on your right" or "take the bag from the robot on your left and put it in the open drawer." Both examples involve coordination between two robots, since Helix is designed to control two robots simultaneously so they can collaborate on household chores.
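To make the idea concrete, here is a minimal, hypothetical sketch of the kind of language-conditioned control loop that VLA models implement: observe the scene, condition on an instruction, and output low-level robot actions at each step. Nothing below reflects Figure's actual Helix code or APIs; `get_camera_frame`, `vla_policy`, and the 7-DoF action shape are placeholder assumptions for illustration.

```python
import numpy as np

# Hypothetical stand-ins for illustration only; none of these names come
# from Figure's Helix. A real VLA model would replace the stub policy with
# a vision-language network that outputs continuous robot actions.

def get_camera_frame() -> np.ndarray:
    """Return a placeholder RGB observation (224 x 224 x 3)."""
    return np.zeros((224, 224, 3), dtype=np.uint8)

def vla_policy(frame: np.ndarray, instruction: str) -> np.ndarray:
    """Map one image plus one language instruction to joint targets.
    Here it simply returns zeros for an assumed 7-DoF arm."""
    return np.zeros(7)

def control_loop(robot_name: str, instruction: str, steps: int = 5) -> None:
    """Closed-loop control: re-observe the scene and re-act at every step."""
    for _ in range(steps):
        frame = get_camera_frame()                # visual input
        action = vla_policy(frame, instruction)   # language-conditioned action
        print(f"{robot_name}: sending joint targets {action}")

# Two robots running the same weights with complementary instructions,
# loosely mirroring the two-robot hand-off described above.
control_loop("robot_right", "Take the bag from the robot on your left "
                            "and put it in the open drawer.")
control_loop("robot_left", "Pass the bag to the robot on your right.")
```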
Figure highlights the application of VLA models by showcasing its Figure 02 humanoid robot working in a home environment. Household settings are particularly difficult for robots because they lack the structured consistency of warehouses and factories.
Challenges in learning and control have been among the main barriers keeping complex robotic systems out of homes. Combined with price tags that run into the tens of thousands of dollars, this is why most humanoid robotics companies have not prioritized the home market. They typically manufacture robots for industrial clients first, improving reliability and driving down costs, before considering a move into homes. Widespread adoption of domestic robots may still be several years away.
Although Figure previously focused on workplace pilot projects with companies like BMW, the release of Helix indicates that the company sees potential in the home market. Home environments are a demanding but valuable testbed for these models: teaching robots to perform tasks in complex spaces such as kitchens can help them develop broader behavioral capabilities that carry over to other scenarios.
Figure notes that for robots to function effectively in homes, they must be able to generate intelligent new behaviors as needed, especially when encountering unfamiliar objects. Currently, teaching robots even a single new behavior requires significant human effort: either hours of manual programming by doctoral-level experts or tens of thousands of demonstrations.
Manual programming isn't suited to home environments because of the sheer number of unknowns. Kitchens, living rooms, and bathrooms differ widely from home to home, as do the tools used for cooking and cleaning. On top of that, people leave clutter around, rearrange furniture, and prefer different lighting. The approach is both time-consuming and expensive, even for a company as well-funded as Figure.
The alternative is training on large numbers of demonstrations. Robotic arms that grasp and place objects are often trained this way in labs. What observers don't see, however, is the hundreds of hours of repetitive practice required to make those demonstrations robust enough to handle highly variable tasks. For a robot to grasp an object accurately on its first attempt, it must have practiced doing so hundreds of times beforehand.
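As a rough illustration of why so many demonstrations are needed, the toy sketch below fits a policy to (observation, action) pairs pooled from many simulated demonstrations, in the spirit of behavior cloning. The data, the 8-dimensional features, and the least-squares "policy" are invented for illustration and have nothing to do with Figure's actual training pipeline.

```python
import numpy as np

# Toy demonstration-based training (not Figure's pipeline): each demo
# contributes (observation, action) pairs, and the policy is fit to
# imitate them. With too few demos, the fit generalizes poorly.
rng = np.random.default_rng(0)

def collect_demonstration(n_steps: int = 50):
    """Simulate one teleoperated grasp: random observations plus the
    'expert' action (here, a fixed linear mapping with noise)."""
    obs = rng.normal(size=(n_steps, 8))          # e.g. object pose features
    true_weights = np.linspace(0.1, 0.8, 8)
    actions = obs @ true_weights + 0.05 * rng.normal(size=n_steps)
    return obs, actions

# Stack hundreds of demonstrations into one training set.
all_obs, all_actions = zip(*[collect_demonstration() for _ in range(300)])
X = np.vstack(all_obs)
y = np.concatenate(all_actions)

# Behavior cloning reduced to least squares: find w minimizing ||X w - y||^2.
weights, *_ = np.linalg.lstsq(X, y, rcond=None)
print("learned policy weights:", np.round(weights, 2))
```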
Currently, Helix is still in its early stages of development. This release serves as a recruitment tool aimed at attracting more engineers to join the project and contribute to its advancement.