Google DeepMind has unveiled two new AI models designed to help robots perform a broader range of real-world tasks. The first, Gemini Robotics, is a vision-language-action model that can understand and respond to novel situations without prior training.
Gemini Robotics is built on Gemini 2.0, Google's flagship AI model. At the press conference, Carolina Parada, Senior Director of Robotics at Google DeepMind, explained that the model transfers Gemini's multimodal understanding of the world into the physical world by adding physical actions as a new modality.
This new model has made significant progress in three key areas that Google DeepMind considers crucial for developing practical robots: generality, interactivity, and dexterity. Not only can Gemini Robotics generalize to new scenarios, but it also excels in interacting with humans and the environment while performing precise physical tasks such as origami folding or unscrewing bottle caps.
Parada highlighted that progress in these three areas has historically come from separate, specialized efforts in general-purpose robotics; Gemini Robotics improves all three within a single model. This shift enables more capable, responsive robots that remain robust to changes in their environment.
In addition, Google DeepMind introduced Gemini Robotics-ER (Embodied Reasoning), which the company describes as an advanced vision-language model capable of "understanding our complex and dynamic world." For instance, when packing a lunchbox, a robot guided by the model must know where the items are, how to open the lunchbox, how to grasp each object, and where to place it. Gemini Robotics-ER is designed to connect with existing low-level controllers (the systems governing robotic movements) to enable new capabilities driven by the model.
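The division of labor described above, a reasoning model that plans high-level steps while an existing low-level controller executes the motions, can be sketched generically. The class names, method signatures, and the fixed lunchbox plan below are hypothetical illustrations of this pattern, not Google's actual API:

```python
from dataclasses import dataclass

@dataclass
class Action:
    """A high-level step proposed by the reasoning model (hypothetical)."""
    verb: str    # e.g. "grasp", "place"
    target: str  # the object the action applies to

class ReasoningModel:
    """Stand-in for an embodied-reasoning VLM: turns a task into steps."""
    def plan(self, task: str) -> list[Action]:
        # A real model would perceive the scene and reason about it;
        # this fixed lunchbox plan is purely illustrative.
        return [
            Action("open", "lunchbox"),
            Action("grasp", "sandwich"),
            Action("place", "sandwich"),
        ]

class LowLevelController:
    """Stand-in for an existing motion controller that executes one step."""
    def execute(self, action: Action) -> str:
        # A real controller would drive joints; here we just report the step.
        return f"{action.verb} {action.target}"

def run(task: str) -> list[str]:
    """Wire the reasoning model to the controller, one step at a time."""
    model, controller = ReasoningModel(), LowLevelController()
    return [controller.execute(a) for a in model.plan(task)]
```

In this arrangement the reasoning model never moves the robot directly; it only proposes steps, which is what lets it sit on top of whatever controller a robot already has.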
Regarding safety, Vikas Sindhwani, a researcher at Google DeepMind, told reporters that the company is developing a "hierarchical approach," adding that the Gemini Robotics-ER model "is trained to evaluate whether executing a potential action in a given scenario is safe." The company has also released new benchmarks and frameworks to advance safety research within the AI industry. Last year, Google DeepMind introduced its "Robot Constitution," a set of rules for robotic behavior inspired by Isaac Asimov's Three Laws of Robotics.
Currently, Google DeepMind is collaborating with Apptronik to "build the next generation of humanoid robots" and is providing trusted testers, including Agile Robots, Agility Robotics, Boston Dynamics, and Enchanted Tools, with access to the Gemini Robotics-ER model.