Robot developer Figure has released a video on social media showing its first humanoid robot holding a real-time conversation powered by OpenAI's generative artificial intelligence, and the clip has drawn widespread attention.
In an excited tweet, Figure wrote, "With the help of OpenAI's technology, Figure 01 can now engage in full conversations with people," highlighting the robot's ability to understand and respond to human interaction in real time.
The company explained that its collaboration with OpenAI has brought advanced visual and language intelligence to its robot, enabling it to perform "quick, low-level, dexterous actions."
In the video, Figure 01 interacts with Corey Lynch, a senior AI engineer at the company. Lynch instructs the robot to complete a series of tasks in a makeshift kitchen, including identifying apples, plates, and cups.
When Lynch asks the robot for something to eat, Figure 01 quickly identifies the apple as the edible item in front of it. Lynch then asks Figure 01 to collect the trash into a basket while simultaneously answering his questions, demonstrating the robot's ability to multitask.
On Twitter, Lynch provides a more detailed explanation of the Figure 01 project. He writes, "Our robot is capable of describing its visual experiences, planning future actions, reflecting on its memory, and verbally explaining its reasoning process."
According to Lynch, images from the robot's cameras and transcribed text from speech captured by its onboard microphones are fed into a large multimodal model trained by OpenAI. Multimodal AI refers to artificial intelligence that can understand and generate more than one type of data, such as text and images.
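For readers who want a concrete picture, the setup Lynch describes might look roughly like the following sketch. It assumes an OpenAI-style chat API with vision input; the model name, the helper functions, and the message format are illustrative assumptions, not Figure's actual code.

```python
# Hypothetical sketch of the perception-to-language pipeline Lynch
# describes. Model name and helper interfaces are assumptions made
# for illustration only.
import base64
from openai import OpenAI

client = OpenAI()

def on_user_utterance(history, frame_jpeg: bytes, transcript: str) -> str:
    """Append the latest camera frame and the transcribed speech to the
    running conversation, then ask a multimodal model for a response."""
    image_b64 = base64.b64encode(frame_jpeg).decode()
    history.append({
        "role": "user",
        "content": [
            {"type": "text", "text": transcript},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
        ],
    })
    reply = client.chat.completions.create(
        model="gpt-4-vision-preview",  # placeholder multimodal model
        messages=history,
    )
    text = reply.choices[0].message.content
    history.append({"role": "assistant", "content": text})
    return text  # handed to text-to-speech downstream
```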
Lynch emphasizes that Figure 01's behavior is learned, runs at normal speed, and is not remotely controlled. "The model processes the entire conversation history, including past images, to generate language responses, which are spoken back to the person via text-to-speech," Lynch says. "The same model is also responsible for deciding which learned closed-loop behavior to run on the robot to carry out a given command, loading the corresponding neural network weights onto the GPU and executing the policy."
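In outline, that decision step could be as simple as the sketch below. The behavior names, weight files, and robot interface are invented for illustration, since Figure has not published its actual APIs.

```python
# Minimal sketch of the behavior-selection step Lynch outlines. The
# behavior library, file paths, and robot interface are assumptions.
import torch

# Map from behavior names the language model can choose to saved
# policy weights (hypothetical paths).
BEHAVIOR_LIBRARY = {
    "hand_over_apple": "weights/hand_over_apple.pt",
    "place_dishes_on_rack": "weights/place_dishes_on_rack.pt",
    "collect_trash": "weights/collect_trash.pt",
}

def execute_command(behavior_name: str, robot) -> None:
    # Load the chosen closed-loop policy's weights onto the GPU...
    policy = torch.load(BEHAVIOR_LIBRARY[behavior_name]).to("cuda").eval()
    # ...then run it in closed loop until it signals completion.
    done = False
    while not done:
        obs = robot.get_observation()
        with torch.no_grad():
            action = policy(obs)
        done = robot.apply(action)  # True once the behavior finishes
```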
Lynch explains that Figure 01 is designed to describe its surroundings concisely and can apply "common sense" in decision-making, such as inferring that plates should be placed on a shelf. It can also parse ambiguous requests, such as "I'm hungry," and translate them into actions, such as offering an apple, while explaining its choices.
Figure 01's debut on Twitter has sparked enthusiastic responses, with many impressed by its capabilities and considering it a significant milestone in the development of artificial intelligence.
Lynch also shared some technical details for AI developers and researchers. He says, "All behaviors are driven by neural network visuomotor transformer policies, which map pixels directly to actions. These networks take onboard images at 10 Hz and generate 24-DOF actions (wrist poses and finger joint angles) at 200 Hz."
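Those numbers imply the policy must bridge two rates: one new frame every 100 ms, but a fresh 24-DOF setpoint every 5 ms. One common way to do that, an assumption here rather than a confirmed detail of Figure's system, is to predict a short chunk of actions per frame and stream the chunk out at the higher rate:

```python
# Sketch of the 10 Hz image / 200 Hz action rate structure Lynch
# cites. Chunked action prediction is an assumed design; camera,
# policy, and robot are hypothetical interfaces.
import time

IMAGE_HZ, ACTION_HZ = 10, 200
CHUNK = ACTION_HZ // IMAGE_HZ   # 20 actions emitted per camera frame
ACTION_DIM = 24                 # wrist poses + finger joint angles

def control_loop(camera, policy, robot) -> None:
    while True:
        frame = camera.read()                    # new image ~every 100 ms
        chunk = policy(frame)                    # shape: (CHUNK, ACTION_DIM)
        assert chunk.shape == (CHUNK, ACTION_DIM)
        for action in chunk:                     # stream setpoints at 200 Hz
            robot.send_setpoint(action)
            time.sleep(1.0 / ACTION_HZ)          # ~5 ms between setpoints
```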
Figure 01's high-profile debut comes at a time when policymakers and global leaders are working to address the spread of AI tools into the mainstream. While most discussions focus on large language models like OpenAI's ChatGPT, Google's Gemini, and Anthropic's Claude AI, developers are also exploring ways to pair artificial intelligence with humanoid robot bodies.
Figure AI and OpenAI did not immediately respond to Decrypt's request for comment.
Ken Goldberg, a professor of industrial engineering at the University of California, Berkeley, previously told Decrypt, "One is a pragmatic goal, which is what Elon Musk and others are pursuing. A lot of the work being done now, and the reason people invest in companies like Figure, is the hope that these robots will work and be compatible," especially in the field of space exploration.
In addition to Figure, other companies are also working to combine AI with robotics technology, such as Hanson Robotics, which introduced its Desdemona AI robot in 2016.
Lynch summed up the moment in another tweet: "Even just a few years ago, I would have thought that having a full conversation with a humanoid robot while it plans and executes fully learned behaviors was something we would have to wait decades to see. Clearly, a lot has changed."