Noam Brown, the head of AI reasoning research at OpenAI, stated that certain forms of "reasoning" AI models could have emerged as early as two decades ago if researchers had known the right methodologies and algorithms at that time.
Speaking at a panel discussion during Nvidia's GTC conference on Wednesday, Brown said there were several reasons this research direction was neglected. In the course of his own work, he noticed that humans spend significant time thinking before acting in difficult situations, a behavior that could prove highly useful in AI systems.
Brown previously worked on game-playing AI at Carnegie Mellon University, including the development of Pluribus, an AI that defeated elite human poker players. What set the AI he helped create apart was that it "reasoned" its way through problems rather than relying on brute-force methods.
He is also one of the creators of OpenAI's o1 model, which employs a technique known as test-time compute: performing additional computation while the model runs to simulate a form of "reasoning" before it responds to queries. Reasoning models are generally more accurate and reliable than traditional models, particularly in domains like mathematics and science.
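How o1 reasons internally is proprietary, but one simple, published form of test-time compute is "self-consistency": sample several independent reasoning paths at inference time and return the majority answer. The sketch below illustrates that general idea in Python; the model name, prompt format, and the `answer_with_test_time_compute` helper are hypothetical placeholders, not OpenAI's actual method.

```python
# Illustrative sketch of test-time compute via self-consistency, not o1's
# actual (proprietary) method. Assumes the openai Python SDK (v1+) and an
# OPENAI_API_KEY in the environment; the model name is a placeholder.
from collections import Counter

from openai import OpenAI

client = OpenAI()

def answer_with_test_time_compute(question: str, n_samples: int = 8) -> str:
    """Spend extra compute at inference time: sample several independent
    reasoning paths, then return the most common final answer."""
    answers = []
    for _ in range(n_samples):
        response = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder model name
            temperature=1.0,      # encourage diverse reasoning paths
            messages=[
                {"role": "system",
                 "content": "Reason step by step, then put only the final "
                            "answer on the last line after 'ANSWER:'."},
                {"role": "user", "content": question},
            ],
        )
        text = response.choices[0].message.content
        # Vote only on the final answer line, not the intermediate reasoning.
        answers.append(text.rsplit("ANSWER:", 1)[-1].strip())
    # More samples mean more inference-time compute and, typically, a more
    # reliable answer on math- and science-style questions.
    return Counter(answers).most_common(1)[0][0]
```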
When asked whether academia could conduct experiments on the scale of large AI labs like OpenAI, given limited computational resources, Brown acknowledged that it has become increasingly challenging in recent years due to growing computational demands. However, he suggested that academia can still make an impact by focusing on areas requiring fewer computational resources, such as model architecture design.
Brown also pointed to opportunities for collaboration between frontier labs and academic institutions. Frontier labs actively monitor academic publications and assess whether a result would be highly effective if scaled up. If a paper makes a convincing case for that, these labs will likely investigate further.
Brown's comments have drawn attention amid cuts to scientific funding under the Trump administration. Prominent AI experts, including Nobel laureate Geoffrey Hinton, have criticized these cuts, arguing that they could jeopardize AI research efforts both domestically and abroad.
Brown specifically highlighted AI benchmarking as an area where academia could make a substantial impact. He noted that the state of AI benchmarking is poor, and that improving it does not require much computational power.
Popular AI benchmarks today tend to test for obscure knowledge, and their scores correlate poorly with proficiency on the tasks people actually care about. The result has been widespread confusion about model capabilities and model improvements.