Jeff Bezos and OpenAI Invest in Physical Intelligence to Develop a Universal Robot "Brain"

2024-11-05

Physical Intelligence, a robotics startup, recently announced that it has raised $400 million in new funding. The company is dedicated to developing practical AI models that serve as "brains" for robots. The round was co-led by Jeff Bezos, the founder and Executive Chairman of Amazon, along with Thrive Capital and Lux Capital; other notable investors include renowned AI firm OpenAI, Redpoint Ventures, and Bond. The investment values Physical Intelligence at approximately $2.4 billion. Earlier, in March of this year, the company secured a $70 million seed round led by Thrive Capital.

Physical Intelligence was co-founded by former Google robotics scientist Karol Hausman, who also serves as the CEO. The team comprises researchers from the University of California, Berkeley, and Stanford University. The company focuses on developing a universal AI model suitable for many types of robots, enabling them to comprehend the physical world and execute complex, multi-step tasks.

Hausman stated in an interview that the company's development extends beyond creating brains for specific robots; they are building a universal brain capable of controlling any robot.

According to Physical Intelligence, most current robots are specialized, with the majority of industrial robots limited to performing single tasks or a series of simplified actions. While these robots can continue operating amidst minor environmental changes, they struggle to adapt to cluttered or complex real-world settings such as homes or other practical locations.

In a blog post last week, Physical Intelligence stated that artificial intelligence has the potential to transform this situation, enabling robots to learn and follow user instructions. Users only need to specify the desired task, and the robot will autonomously adjust its behavior to adapt to the environment.

To achieve this goal, Physical Intelligence has developed an AI model named π0 (pi-zero), which serves as a universal foundational model for robots. Users can simply issue commands to the robot, enabling it to perform tasks in a manner similar to interacting with large language model chat assistants. Unlike large language models, pi-zero processes various types of data, including text, images, videos, and "physical intelligence"—the actual experience of bodily movements, grasping, and manipulating objects.
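Conceptually, a model like pi-zero maps a natural-language command plus the robot's current observations to motor actions. The minimal sketch below illustrates that interface shape only; the class and field names are hypothetical placeholders, not Physical Intelligence's actual API, and the "policy" here just echoes the robot's state rather than running a real model.

```python
# Hypothetical sketch of a vision-language-action (VLA) interface in the
# spirit of pi-zero. All names here are illustrative assumptions.
from dataclasses import dataclass
from typing import List


@dataclass
class Observation:
    camera_images: List[object]   # RGB frames from the robot's cameras
    joint_positions: List[float]  # current proprioceptive state


@dataclass
class Action:
    joint_targets: List[float]    # target joint positions for the next step


class VLAPolicy:
    """A generalist policy: language command in, motor actions out."""

    def __init__(self, instruction: str):
        self.instruction = instruction

    def act(self, obs: Observation) -> Action:
        # A real model would fuse the instruction with the images and
        # proprioception; this placeholder simply holds the current pose.
        return Action(joint_targets=obs.joint_positions)


policy = VLAPolicy("fold the laundry")
action = policy.act(Observation(camera_images=[], joint_positions=[0.0, 0.5]))
```

The point of the interface is that switching tasks means changing only the instruction string, not retraining or reprogramming the robot.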

Using pi-zero, Physical Intelligence has demonstrated the robot's capability for precise adjustments in tasks such as folding clothes, making coffee, clearing dining tables, and assembling boxes. For instance, when clearing a dining table, the robot must distinguish between trash and tableware, dispose of the garbage into the bin, and place the utensils onto a tray. It also needs to learn to shake off any trash before placing the tableware on the tray.
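The table-clearing example above embeds a small decision procedure: classify each object, route trash to the bin, and shake residue off tableware before stacking it. The sketch below spells out that logic as plain rules; the categories and return values are illustrative assumptions, not pi-zero's internals, which learn such behavior rather than hard-coding it.

```python
# Illustrative restatement of the table-clearing decisions described in
# the text. The dict keys and labels are assumptions for the example.
def clear_item(item: dict) -> str:
    """Decide what to do with one object spotted on the table."""
    if item["kind"] == "trash":
        return "bin"                   # garbage goes straight to the bin
    # Tableware may have trash on it; shake that off before stacking.
    if item.get("has_trash_on_it"):
        return "shake, then tray"
    return "tray"


decision = clear_item({"kind": "plate", "has_trash_on_it": True})
```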

The company states that the primary challenge in developing universal models is the lack of large-scale multi-task and multi-robot data. As the dataset expands, it will lay the foundation for creating more powerful and flexible robot brains.

Physical Intelligence notes that although progress has been made, universal robot models are still in their infancy. Similar to how large language models serve as foundational models in the language domain, universal robot models provide the foundational AI for physical intelligence.

Other foundational models for robot control already exist, such as the open-source, 7-billion-parameter OpenVLA model, commonly used by academic researchers for experiments, and the 93-million-parameter Octo model. Physical Intelligence claims that its pi-zero outperforms both OpenVLA and Octo on most complex tasks.

Equipping robots with "brains" has become a long-term trend in the technology industry. Last year, Google researchers showcased a robot powered by the 562-billion-parameter PaLM-E model, capable of understanding simple single commands such as picking up and passing objects. Earlier this year, NVIDIA also announced Project GR00T, aimed at developing a universal foundational model for humanoid robots.

Physical Intelligence states that realizing this vision requires not only more data but also the collective effort of the entire robotics community. The company has established partnerships with multiple companies and robotics laboratories to collaboratively improve hardware designs and utilize partners' data to train pre-trained models.