Google's artificial intelligence research division, DeepMind, has announced three new advancements that it says will help robots make better, faster, and safer decisions in the field.
Together, these advancements improve how robots collect training data, how quickly robot models run, and how well robots generalize to tasks they have not seen before.
DeepMind says its research aims to create robots that can better understand and carry out more complex tasks without having to be trained from scratch for each one.
AutoRT
The first advancement is AutoRT, a new AI training system that combines large language models and visual language models with specialized robot control algorithms. Its goal is to expand what robots can learn and to teach them to perform useful real-world tasks.
According to DeepMind, AutoRT can direct multiple robots simultaneously, each performing different tasks in different environments. It uses a visual language model (VLM) to understand the environment and the objects in view, and a large language model (LLM) to suggest and select suitable tasks for each robot to carry out.
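To make this division of labor concrete, here is a minimal Python sketch of such a loop, built on loud assumptions: describe_scene, propose_tasks, and select_task are hypothetical stand-ins for the VLM and LLM, they work on plain object labels rather than camera images, and nothing here reflects DeepMind's actual code or prompts.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical stand-ins: in AutoRT, a VLM and an LLM fill these roles.


@dataclass
class Robot:
    name: str
    # Object labels standing in for what the robot's camera sees.
    visible_objects: List[str] = field(default_factory=list)


def describe_scene(robot: Robot) -> List[str]:
    """Stand-in for the VLM: report the objects in view (sorted for determinism)."""
    return sorted(robot.visible_objects)


def propose_tasks(objects: List[str]) -> List[str]:
    """Stand-in for the LLM's suggestion step: one candidate task per object."""
    return [f"pick up the {obj}" for obj in objects]


def select_task(candidates: List[str]) -> Optional[str]:
    """Stand-in for the LLM's selection step: take the first candidate, if any."""
    return candidates[0] if candidates else None


def orchestrate(fleet: List[Robot]) -> Dict[str, Optional[str]]:
    """Assign each robot a task based only on its own surroundings."""
    return {robot.name: select_task(propose_tasks(describe_scene(robot))) for robot in fleet}


fleet = [
    Robot("robot-1", ["sponge", "cup"]),
    Robot("robot-2", ["apple"]),
    Robot("robot-3"),  # nothing in view, so no task is assigned
]
print(orchestrate(fleet))
# {'robot-1': 'pick up the cup', 'robot-2': 'pick up the apple', 'robot-3': None}
```

Because each robot's perception and task choice are handled independently, the same loop can in principle cover a fleet spread across different rooms.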
Over seven months of extensive testing, AutoRT safely orchestrated as many as 20 robots at once. In total, DeepMind gathered a dataset of 77,000 robot trials covering 6,650 unique tasks.
DeepMind explains that AutoRT includes a "robot constitution": a set of safety rules that its LLM-based decision maker must follow when choosing tasks. The rules are inspired by Isaac Asimov's Three Laws of Robotics and put human safety first, requiring robots to avoid any task involving humans, animals, electrical devices, or sharp objects.
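In DeepMind's description, these rules guide the LLM itself as it selects tasks. The Python sketch below swaps in a much cruder technique, a plain keyword filter over candidate task descriptions, purely to show the effect such rules have; the FORBIDDEN_TERMS table and both function names are hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical keyword lists for the four categories named in the rules.
FORBIDDEN_TERMS: Dict[str, Set[str]] = {
    "humans": {"person", "people", "human"},
    "animals": {"dog", "cat", "animal"},
    "electrical devices": {"laptop", "phone", "outlet"},
    "sharp objects": {"knife", "scissors", "blade"},
}


def violated_rules(task: str) -> List[str]:
    """Return the categories a task description mentions (empty list means allowed)."""
    words = set(task.lower().replace(",", " ").split())
    return [rule for rule, terms in FORBIDDEN_TERMS.items() if words & terms]


def filter_tasks(candidates: List[str]) -> List[str]:
    """Keep only the candidate tasks that break none of the rules."""
    return [task for task in candidates if not violated_rules(task)]


candidates = [
    "pick up the sponge",
    "give the knife to a person",
    "move the laptop",
    "wipe the table",
]
print(violated_rules("give the knife to a person"))  # ['humans', 'sharp objects']
print(filter_tasks(candidates))  # ['pick up the sponge', 'wipe the table']
```

A keyword match like this is easy to evade with synonyms (a "cleaver" slips through), so treat it only as an illustration of what the rules are meant to block.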
AutoRT also relies on established safety measures used with traditional robots. For example, if the force on a robot exceeds a set limit during a task, such as when it tries to lift an object that is too heavy, the robot automatically stops.
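A force-limit stop of this kind can be sketched as a simple watchdog loop. In this hypothetical Python example, each task step reports one force reading, and FORCE_LIMIT_N is an arbitrary illustrative threshold, not a value published by DeepMind.

```python
from dataclasses import dataclass
from typing import Iterable

FORCE_LIMIT_N = 50.0  # hypothetical force limit in newtons


@dataclass
class TaskResult:
    steps_completed: int = 0
    stopped_for_safety: bool = False


def run_task(force_readings: Iterable[float], limit: float = FORCE_LIMIT_N) -> TaskResult:
    """Execute steps until done, halting at the first reading above the limit."""
    result = TaskResult()
    for force in force_readings:
        if force > limit:
            result.stopped_for_safety = True  # e.g. the object is too heavy
            break
        result.steps_completed += 1  # reading within the limit, so the step proceeds
    return result


print(run_task([10.0, 20.0, 30.0]))  # TaskResult(steps_completed=3, stopped_for_safety=False)
print(run_task([10.0, 65.0, 30.0]))  # TaskResult(steps_completed=1, stopped_for_safety=True)
```

Because a check like this sits outside the language models, it would still apply even if the LLM picked an unsuitable task.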
SARA-RT
The second system, Self-Adaptive Robust Attention for Robotics Transformers (SARA-RT), is designed to make robot transformer models run more efficiently so they can execute tasks faster.
DeepMind explains that SARA-RT uses a fine-tuning technique it calls "up-training," which converts the attention mechanism's complexity from quadratic to linear in the length of the input. This reduces the model's computational load and increases its speed.
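The move from quadratic to linear cost comes from reordering the attention computation. The Python sketch below uses a generic kernelized linear attention (feature map elu(x) + 1, a choice from the broader linear-attention literature), not SARA-RT's actual mechanism or its up-training procedure. It computes the same kernel attention two ways: once by scoring every query against every key, which scales with N squared for N tokens, and once by summarizing all keys and values a single time, which scales with N.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def phi(x: List[float]) -> List[float]:
    """Feature map elu(x) + 1, which keeps every feature strictly positive."""
    return [v + 1.0 if v > 0 else math.exp(v) for v in x]


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def quadratic_attention(Q: Matrix, K: Matrix, V: Matrix) -> Matrix:
    """Scores every (query, key) pair explicitly: cost grows with N * N."""
    feat_k = [phi(k) for k in K]
    out = []
    for q in Q:
        feat_q = phi(q)
        weights = [dot(feat_q, fk) for fk in feat_k]
        total = sum(weights)
        out.append([sum(w * v[c] for w, v in zip(weights, V)) / total
                    for c in range(len(V[0]))])
    return out


def linear_attention(Q: Matrix, K: Matrix, V: Matrix) -> Matrix:
    """Summarizes keys and values once, then reuses the summary: cost grows with N."""
    d_k, d_v = len(K[0]), len(V[0])
    S = [[0.0] * d_v for _ in range(d_k)]  # sum over j of phi(k_j) v_j^T
    z = [0.0] * d_k                         # sum over j of phi(k_j)
    for k, v in zip(K, V):
        fk = phi(k)
        for a in range(d_k):
            z[a] += fk[a]
            for c in range(d_v):
                S[a][c] += fk[a] * v[c]
    out = []
    for q in Q:
        fq = phi(q)
        denom = dot(fq, z)
        out.append([sum(fq[a] * S[a][c] for a in range(d_k)) / denom
                    for c in range(d_v)])
    return out


random.seed(0)
N, d = 6, 4  # sequence length and feature size


def random_matrix() -> Matrix:
    return [[random.gauss(0.0, 1.0) for _ in range(d)] for _ in range(N)]


Q, K, V = random_matrix(), random_matrix(), random_matrix()
quad = quadratic_attention(Q, K, V)
lin = linear_attention(Q, K, V)
max_diff = max(abs(a - b) for row_q, row_l in zip(quad, lin) for a, b in zip(row_q, row_l))
print(f"largest difference: {max_diff:.2e}")  # floating-point rounding only
```

Replacing softmax with a kernel like this changes how a pretrained model behaves, so some further training is needed to adapt it; the article describes up-training as the fine-tuning step that plays that role in SARA-RT.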
DeepMind's researchers state that SARA-RT is the "first scalable attention mechanism that provides computational improvements without sacrificing quality."
They explain that SARA-RT can be applied to various transformer models, including those that handle spatial data from robot depth cameras, expanding their applications in the robotics industry.
RT-Trajectory
Finally, DeepMind introduced RT-Trajectory, a new AI model that automatically adds visual outlines describing robot motions to its training videos. This helps robots generalize more effectively and better understand how to perform specific tasks.
The technique works by overlaying 2D sketches of the robot arm's trajectory onto the training videos, giving the model a practical, low-level visual hint as it learns robot control policies.
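The overlay idea can be illustrated with a toy frame. In this hypothetical Python sketch, a video frame is a small grid of integers, the trajectory is a list of (x, y) pixel waypoints, and the path between waypoints is rasterized with Bresenham's line algorithm. RT-Trajectory's actual sketch format is not reproduced here; this only shows the general act of drawing a 2D path onto an image.

```python
from typing import List, Tuple

Point = Tuple[int, int]


def bresenham(x0: int, y0: int, x1: int, y1: int) -> List[Point]:
    """Integer pixel coordinates on the segment from (x0, y0) to (x1, y1)."""
    points = []
    dx, dy = abs(x1 - x0), -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy
    while True:
        points.append((x0, y0))
        if x0 == x1 and y0 == y1:
            return points
        e2 = 2 * err
        if e2 >= dy:
            err += dy
            x0 += sx
        if e2 <= dx:
            err += dx
            y0 += sy


def overlay_trajectory(frame: List[List[int]], waypoints: List[Point], mark: int = 1) -> List[List[int]]:
    """Return a copy of the frame with the path through the waypoints drawn on it."""
    sketch = [row[:] for row in frame]  # copy, leaving the original frame untouched
    for (x0, y0), (x1, y1) in zip(waypoints, waypoints[1:]):
        for x, y in bresenham(x0, y0, x1, y1):
            sketch[y][x] = mark
    return sketch


frame = [[0] * 6 for _ in range(4)]   # blank 6x4 "image"
waypoints = [(0, 0), (3, 3), (5, 3)]  # hypothetical gripper path
sketch = overlay_trajectory(frame, waypoints)
for row in sketch:
    print("".join("#" if pixel else "." for pixel in row))
# #.....
# .#....
# ..#...
# ...###
```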
Researchers tested a robot arm controlled by RT-Trajectory on 41 tasks it had not seen during training and found that it more than doubled the performance of existing models: it achieved a 63% task success rate, compared with 29% for RT-2, an earlier DeepMind model.
DeepMind states that it hopes these new systems will be adopted by the robotics industry and help build safer, more efficient, and more useful robots capable of performing a wider range of tasks.