Former Google engineer and prominent AI researcher Francois Chollet has co-founded a nonprofit organization dedicated to developing benchmarks for assessing whether AI has reached human-level intelligence. Known as the ARC Prize Foundation, it will be chaired by Greg Kamradt, a former engineering director at Salesforce and founder of the AI product studio Leverage, who will also serve on its board.
The ARC Prize Foundation plans to begin fundraising later this January. In an article posted on the organization's website, Chollet said the project is evolving into a formal nonprofit foundation intended to provide useful guidance toward the development of artificial general intelligence (AGI). He emphasized that its goal is to drive progress by narrowing the gap between AI and fundamental human capabilities.
The foundation will expand the ARC-AGI test, which Chollet developed to evaluate whether AI systems can acquire new skills beyond the data they were trained on. The test consists of puzzle-like problems in which an AI must produce the correct "answer" grid from a set of grids of differently colored squares. These challenges are designed to force AI to adapt to novel problems it has never seen before.
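To make the task format concrete, here is a minimal, hypothetical sketch in Python. It is not the official ARC-AGI data format or scoring code; it only illustrates the shape of the problem: a solver studies a few input/output grid pairs, infers the hidden transformation, and applies it to an unseen test grid. The hidden rule here (a horizontal mirror) is an invented example.

```python
def mirror_horizontally(grid):
    """One candidate transformation rule: flip each row left to right.
    Grids are lists of rows; integers 0-9 stand in for colors."""
    return [list(reversed(row)) for row in grid]

# Demonstration pairs the solver gets to study (hypothetical task).
train_pairs = [
    ([[1, 0], [2, 3]], [[0, 1], [3, 2]]),
    ([[4, 4, 0], [0, 5, 5]], [[0, 4, 4], [5, 5, 0]]),
]

# A solver would search for a rule consistent with every demonstration...
assert all(mirror_horizontally(inp) == out for inp, out in train_pairs)

# ...then produce the "answer" grid for the unseen test input.
test_input = [[7, 0, 0], [0, 7, 0]]
predicted = mirror_horizontally(test_input)  # [[0, 0, 7], [0, 7, 0]]
```

Because each task uses a different hidden rule, memorizing training data does not help; the solver must generalize from just a few examples, which is exactly what the benchmark is probing.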
Chollet introduced ARC-AGI (Abstraction and Reasoning Corpus for Artificial General Intelligence) in 2019. Although many AI systems now excel at mathematical olympiad problems and can solve doctoral-level questions, even the best-performing models complete fewer than a third of ARC-AGI's tasks.
Chollet stated in his article that, unlike most cutting-edge AI benchmarks, ARC-AGI does not aim to measure AI risk through superhumanly difficult tests. Instead, future versions of the benchmark will focus on closing the gap with human abilities until it approaches zero.
Last June, Chollet, along with Zapier co-founder Mike Knoop, launched a competition to build AI capable of beating ARC-AGI. OpenAI's unreleased o3 model was the first to achieve a qualifying score, albeit at the cost of immense computational power. Chollet acknowledged that ARC-AGI has flaws, noting that many models can reach high scores through brute-force methods, and said he does not consider o3 to have human-level intelligence.
According to a statement released last December, preliminary data suggests that the upcoming version of the ARC-AGI benchmark will pose a significant challenge to o3, potentially reducing its score to below 30%, whereas a smart human could score over 95% without any training. "When creating tasks that are easy for humans but hard for AI becomes impossible, you know AGI has arrived," Chollet remarked.
Knoop announced plans to launch the second-generation ARC-AGI benchmark and new competitions in the first quarter. Additionally, the nonprofit will work on designing the third generation of ARC-AGI.
It remains unclear how the ARC Prize Foundation will address criticism of Chollet's promotion of ARC-AGI as an AGI benchmark. The definition of AGI is itself under intense debate; an OpenAI employee recently claimed that if AGI is defined as "AI better than most people at most tasks," then AGI has already been achieved.
Notably, OpenAI CEO Sam Altman stated last December that the company intends to collaborate with the ARC-AGI team to develop future benchmarks. However, Chollet did not mention any potential partnerships in his recent announcement.
Nevertheless, the ARC Prize Foundation said in a series of posts on X that it will establish an "academic network" to further promote AGI research and evaluation, as well as form an "alliance of frontier AI lab partners" to jointly create industry AGI standards.