This week, the Massachusetts Institute of Technology (MIT) unveiled a new framework for training robots. Rather than teaching a robot each new task from a narrow, task-specific dataset, the approach trains on vast amounts of information, much as large language models (LLMs) are trained.
The researchers note that imitation learning, in which a robot acquires a skill by observing a person perform the task, can fail when it meets even minor changes, such as different lighting, a different environmental setup, or a new obstacle. In those situations, the robot does not have enough data to draw on in order to adapt.
To address this issue, the research team looked to large language models such as GPT-4, which solve problems by drawing on very large amounts of data. In language, however, the data are almost entirely sentences, whereas robotics data are heterogeneous. A different architecture is therefore needed to achieve comparable pre-training results.
To this end, the team developed a new architecture called the Heterogeneous Pre-training Transformer (HPT), which integrates information from different sensors and different environments. A transformer consolidates these diverse data sources into a single training model, and the larger the transformer, the better the performance.
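The idea of feeding differently shaped sensor data into one shared transformer can be illustrated with a minimal sketch. This is not the researchers' implementation: the names `make_stem`, `encode`, and `TRUNK`, the two example robots, the sensor sizes, and the single attention layer are all assumptions made here for illustration, and the weights are random rather than trained.

```python
import numpy as np

rng = np.random.default_rng(0)
D_MODEL = 32  # shared token width (illustrative value, not from the article)


def make_stem(input_dim, d_model=D_MODEL):
    """Hypothetical per-sensor projection into the shared token space."""
    return rng.normal(scale=input_dim ** -0.5, size=(input_dim, d_model))


def self_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention on (n_tokens, d_model)."""
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return tokens + weights @ v  # residual connection


# Shared weights (query, key, value), reused for every robot and sensor mix.
TRUNK = [
    rng.normal(scale=D_MODEL ** -0.5, size=(D_MODEL, D_MODEL)) for _ in range(3)
]


def encode(observations, stems):
    """Project each sensor reading to tokens, concatenate, run the shared layer."""
    tokens = np.concatenate(
        [observations[name] @ stems[name] for name in observations], axis=0
    )
    return self_attention(tokens, *TRUNK).mean(axis=0)  # pooled, shape (D_MODEL,)


# Two made-up robots with different sensors and different action sizes.
arm_stems = {"camera": make_stem(64), "joints": make_stem(7)}
arm_head = rng.normal(size=(D_MODEL, 7))  # maps features to 7 joint commands
arm_obs = {"camera": rng.normal(size=(16, 64)), "joints": rng.normal(size=(1, 7))}

rover_stems = {"lidar": make_stem(128)}
rover_head = rng.normal(size=(D_MODEL, 2))  # maps features to 2 wheel commands
rover_obs = {"lidar": rng.normal(size=(4, 128))}

arm_action = encode(arm_obs, arm_stems) @ arm_head
rover_action = encode(rover_obs, rover_stems) @ rover_head
print(arm_action.shape, rover_action.shape)  # (7,) (2,)
```

Only the small input projections and output maps differ between the two robots in this sketch. The attention weights in `TRUNK` are shared, which is the part that would be pre-trained on pooled data and scaled up.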
To use the model, users provide the robot's design, its configuration, and the task it is required to perform.
David Held, an associate professor at Carnegie Mellon University, described the goal of this research as a universal robotic "brain" that can be downloaded and used directly, without additional training. Although the work is still in its early stages, the research team hopes that scaling up will lead to breakthroughs in robotic policies similar to those seen with large language models.
The research is funded in part by the Toyota Research Institute. Last year, the institute showcased a method for training robots overnight at the TechCrunch Disrupt event. More recently, it entered into a landmark partnership with Boston Dynamics that aims to combine its robot learning research with Boston Dynamics' hardware.