Researchers Reach Consensus on What AGI Is


The DeepMind research team is focused on the next frontier of artificial intelligence, Artificial General Intelligence (AGI), but realizes that a key question must be answered first: what exactly is AGI?

AGI is generally seen as a form of artificial intelligence that can understand, learn, and apply knowledge across a wide range of tasks, much as the human brain does. Wikipedia expands the scope, stating that AGI is a "hypothetical intelligent agent [that] can learn to perform any intellectual task that a human or animal can do."

The OpenAI Charter describes AGI as a set of "highly autonomous systems that outperform humans in most economically valuable work."

AI expert and founder of Geometric Intelligence, Gary Marcus, defines it as "any flexible and broadly applicable intelligence that matches or exceeds human intelligence in performance and reliability."

With so many variations in the definition, the DeepMind team has adopted a simple viewpoint proposed by Voltaire centuries ago: "If you wish to converse with me, define your terms."

In a paper published on the preprint server arXiv, the researchers outline their framework for classifying AGI model capabilities and behaviors.

By doing so, they hope to establish a common language for researchers to use when measuring progress, comparing approaches, and assessing risks.

"Achieving 'intelligence' at a human level is a hidden or explicit goal for many in our field," said Shane Legg, who introduced the term AGI 20 years ago.

Legg explains, "I see a lot of discussion where people seem to be using the term to refer to different things, which leads to all sorts of confusion. Now that AGI has become so important, we need to nail down its definition."

In the paper, titled "Levels of AGI: Operationalizing Progress on the Path to AGI," the team summarizes several principles required for AGI models. These include a focus on what a system can do rather than on the process by which it does it.

"Achieving AGI does not mean that the system 'thinks' or 'understands' with attributes such as consciousness or perception," the team emphasizes.

AGI systems must also have the ability to learn new tasks and know when to seek clarification or assistance from humans.

Another principle is a focus on a system's potential rather than on its actual deployment. "Introducing deployment as a criterion for measuring AGI introduces non-technical barriers, such as legal and societal considerations, as well as potential ethical and safety issues," the researchers explain.

The team then compiled a list of intelligence thresholds ranging from "Level 0, non-AGI" to "Level 5, superhuman." Levels 1 through 4 are "emerging," "competent," "expert," and "master."

Three programs met the threshold for the AGI label. However, these three text-generating models (ChatGPT, Bard, and Llama 2) reached only "Level 1, emerging." No other current AI programs meet the criteria for AGI.

Other programs are listed as AI rather than AGI. They include SHRDLU, an early natural language understanding program developed at MIT, listed as "Level 1, emerging AI."

Siri, Alexa, and Google Assistant are listed as "Level 2, competent." Grammarly, a grammar checker, ranks as "Level 3, expert AI."

Higher on the list are "Level 4, master" programs such as Deep Blue and AlphaGo. Topping the list at "Level 5, superhuman" are DeepMind's AlphaFold, which predicts the 3D structure of proteins from their amino acid sequences, and StockFish, a powerful open-source chess program.
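
For readers who find it easier to see the classification laid out as data, the levels and examples above can be sketched in a few lines of Python. This is only an illustration built from the systems named in this article, not code from the paper: the names Level, EXAMPLES, systems_at, and highest_general_level are hypothetical, and the True/False flag is an assumed way of marking which systems the article says carry the AGI label.

```python
from enum import IntEnum


class Level(IntEnum):
    """Performance thresholds as named in this article (Level 0 to Level 5)."""
    NON_AGI = 0
    EMERGING = 1
    COMPETENT = 2
    EXPERT = 3
    MASTER = 4
    SUPERHUMAN = 5


# (system, level, is_general) triples taken from the article's examples.
# is_general is an illustrative flag: True for the three text-generating
# models said to carry the AGI label, False for systems listed simply as AI.
EXAMPLES = [
    ("ChatGPT", Level.EMERGING, True),
    ("Bard", Level.EMERGING, True),
    ("Llama 2", Level.EMERGING, True),
    ("SHRDLU", Level.EMERGING, False),
    ("Siri", Level.COMPETENT, False),
    ("Alexa", Level.COMPETENT, False),
    ("Google Assistant", Level.COMPETENT, False),
    ("Grammarly", Level.EXPERT, False),
    ("Deep Blue", Level.MASTER, False),
    ("AlphaGo", Level.MASTER, False),
    ("AlphaFold", Level.SUPERHUMAN, False),
    ("StockFish", Level.SUPERHUMAN, False),
]


def systems_at(level, general=None):
    """Return system names at `level`, optionally filtered by the general flag."""
    return [
        name
        for name, lvl, is_general in EXAMPLES
        if lvl == level and (general is None or is_general == general)
    ]


def highest_general_level():
    """Highest level reached by any example carrying the AGI label."""
    return max(lvl for _, lvl, is_general in EXAMPLES if is_general)


if __name__ == "__main__":
    for level in Level:
        label = level.name.lower().replace("_", "-")
        print(f"Level {level.value}, {label}: {systems_at(level)}")
```

Laid out this way, the point the researchers make is easy to see: performance level and generality are tracked separately, so a chess engine can sit at Level 5 while the only systems with the AGI label remain at Level 1.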

Even so, the researchers acknowledge that AGI has no single fixed definition and that the concept continues to evolve.

"As our understanding of these fundamental processes deepens, it may be important to reexamine our definition of AGI," says Meredith Ringel Morris, Chief Scientist of Human-AI Interaction at Google DeepMind.

"We cannot enumerate a sufficiently general set of tasks that would constitute a measure of intelligence," the researchers say. "Therefore, an AGI benchmark should be a flexible benchmark. Such a benchmark should include a framework for generating and agreeing on new tasks."