Google DeepMind Brings AI's Ability in Complex Mathematics Closer to the Human Level

2024-07-29

Researchers paired two new systems, AlphaProof and AlphaGeometry 2, to tackle problems from the International Mathematical Olympiad. This global competition for exceptional high-school students has been held since 1959 and consists of six extremely difficult problems each year, covering areas such as algebra and geometry; its gold medalists rank among the world's most gifted young mathematicians.

Impressive as the AI systems' performance is, they have not yet reached the level of the strongest human competitors. DeepMind's combined "team" scored 28 out of 42 points, missing a gold medal by a single point and settling for silver. Unlike human contestants, however, AlphaProof and AlphaGeometry 2 produced answers that were either flawless or worthless: the systems solved four problems perfectly, earning full marks on each, but made no headway at all on the other two, unable even to begin working toward an answer.

Another important caveat is that the DeepMind experiment had no time limit. Some problems were solved within seconds, while others ran day and night for up to three days. Human participants in the Olympiad, by contrast, have a maximum of nine hours to complete the test.

The two paired systems work quite differently. AlphaProof answered three questions by combining large language models, like those behind chatbots, with specialized "reinforcement learning" techniques. AlphaGeometry 2, on the other hand, couples a large language model with a dedicated, narrowly mathematical approach to geometry. "We are trying to build a bridge between these two domains so that we can leverage the guarantees provided by formal mathematics and the data available in informal mathematics," said Thomas Hubert, the lead researcher on AlphaProof.
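The "formal mathematics" Hubert refers to means statements and proofs written in a machine-checkable language; AlphaProof works in the Lean proof assistant. As a rough illustration only, the toy lemma below (our own example, not one of the Olympiad problems) shows what a formally verified statement looks like in Lean 4:

```lean
-- Toy example of a machine-checkable proof in Lean 4.
-- Real AlphaProof outputs are far longer, but the guarantee is the same:
-- if the proof compiles, the statement is certainly true.
theorem sum_comm (a b : Nat) : a + b = b + a := by
  exact Nat.add_comm a b
```

Informal mathematics, by contrast, is the ordinary prose and notation mathematicians write for one another, which is abundant as training data but carries no such mechanical guarantee of correctness.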