Covariant Launches RFM-1 Robot Foundation Model, Enabling Human-like Reasoning Abilities in Robots
Covariant has officially launched RFM-1 (Robot Foundation Model 1). Peter Chen, co-founder and CEO of Covariant, an artificial intelligence spinout from the University of California, Berkeley, described the platform as "basically a large language model (LLM), but for robot language."
RFM-1 is, among other things, the product of a vast trove of data collected from deployments of Covariant's Brain AI platform. With its customers' consent, the startup has been building the robotics equivalent of an LLM training database.
"The vision of RFM-1 is to power the billions of robots to come," said Chen. "We have already deployed many robots successfully in warehouses, but that is not the limit of where we want to go. We really want to power robots in manufacturing, food processing, recycling, agriculture, the service industry, and even people's homes."
The release comes as more and more robotics companies discuss the future of "general-purpose" systems. Emerging humanoid robot makers such as Agility, Figure, 1X, and Apptronik play a key role in that conversation. The humanoid form factor is particularly well suited to adapting to varied environments, much like the humans it imitates, though the robustness of the onboard AI and software systems is a different question entirely.
Currently, Covariant's software is mainly deployed on industrial robot arms performing a variety of warehouse tasks, including picking. Although the company touts a degree of hardware agnosticism, its software has not yet been deployed on humanoid robots.
"We are indeed excited about the many advances happening in more general-purpose robot hardware," said Chen. "Combining the intelligence inflection point with the hardware inflection point is where we will see even more breakthroughs in robot applications. But much of that has not fully played out yet, especially on the hardware side. It is very hard to get beyond the staged video. How many people have interacted with a humanoid robot in person? That tells you the degree of maturity."
However, when it comes to the role of RFM-1 in robot decision-making, Covariant does not shy away from human comparisons. According to its press release, the platform "provides robots with human-like reasoning capabilities, representing the first time generative AI has successfully given commercial robots a deeper understanding of language and the physical world."
This is one of those areas where claims call for caution, both in how they compare a system to abstract, even philosophical, concepts and in how well it actually performs in the real world over time. "Human-like reasoning capabilities" is a broad concept that means different things to different people. Here, it refers to the system's ability to process real-world data and determine the best course of action for the task at hand.
This differs from traditional robot systems, which are programmed to perform a single task over and over. Since their beginnings on the automotive assembly line, such single-purpose robots have thrived in highly structured environments. As long as the task changes minimally, a robot arm can repeat it unimpeded.
However, even slight deviations can quickly lead to failure: an item placed imprecisely on a conveyor belt, for example, or a change in lighting that affects the onboard cameras. Such variations can significantly degrade a robot's performance. Now imagine asking the robot to work with a new part or a new material, or to perform an entirely different task. That is more difficult still.
Traditionally, that would require a programmer to step in and reprogram the robot, and in most cases that person comes from outside the factory floor. This is a major drain on time and resources. Avoiding it requires one of two things: 1) workers on the floor learn to program, or 2) a new, more natural way of interacting with robots.
While it would be good for floor workers to learn to program, it seems unlikely that many companies would be willing to invest the money and wait the time it requires. The second option is what Covariant is pursuing with RFM-1. The "ChatGPT for robots" analogy is not perfect, but it is a reasonable shorthand (especially given the founders' connection to OpenAI).
From the customer's perspective, the platform appears as a text field, much like the current consumer-facing iterations of generative AI. When a command such as "pick up the apple" is typed or spoken, the system draws on its training data (shape, color, size, etc.) to identify the object in front of it that best matches the description.
RFM-1 then generates video results, essentially simulations, to determine the best course of action based on past training. This last step is similar to how our brains weigh potential outcomes before taking an action.
In a live demonstration, the system responded to inputs such as "pick up the red object" and even more complex semantic inputs like "the thing you put on your foot before you put on your shoes," resulting in the robot correctly picking up an apple and a pair of socks, respectively.
The system's prospects have prompted plenty of bold ideas. At the very least, Covariant's founders have impressive pedigrees. Chen studied artificial intelligence at the University of California, Berkeley, under Pieter Abbeel, who is also Covariant's co-founder and chief scientist. Abbeel was an early employee at OpenAI in 2016, just a month before Chen joined the company, which later created ChatGPT. Covariant was founded the following year.
Chen said the company expects the new RFM-1 platform to work with "most" of the hardware on which Covariant's software is already deployed.