Inflection Unveils Inflection-2.5 Model with Performance Comparable to GPT-4

2024-03-08

Inflection AI, a startup founded by DeepMind co-founder Mustafa Suleyman and LinkedIn co-founder Reid Hoffman, has announced a new foundation model called Inflection-2.5.

Building on the company's earlier work, Inflection-2.5 delivers a significant performance improvement over the original Inflection-1 model and comes close to OpenAI's GPT-4, especially in STEM subjects. The model now powers the company's Pi assistant, which competes with products such as ChatGPT and Gemini. Users can try it on mobile and on the web.

The launch positions Inflection AI as another challenger to OpenAI's lead in the fast-moving AI field, while OpenAI continues to describe its own goal as developing AI for the benefit of humanity. It also follows Anthropic's recent release of Claude 3 Opus, which Anthropic says outperforms GPT-4 on a range of benchmarks.

Although Inflection-2.5 has made significant improvements in performance, it still falls slightly behind GPT-4.

Since its founding, Inflection AI has aimed to create an AI that is "understanding, useful, and safe," with a more personal and conversational style than other models, including the GPT series. The company uses its own empathetic fine-tuning techniques to give the model behind the Pi assistant a distinct personality and a high level of emotional intelligence (EQ).

With the launch of Inflection-2.5, the startup, which raised $1.3 billion in funding in June 2023, is strengthening its AI's raw intelligence in areas such as physics and mathematics. In a blog post, the company said that users of the Inflection-2.5-powered Pi assistant can discuss a wide range of topics, from sharing hobbies to programming, and from checking biology test answers to drafting business plans.

On benchmarks, the upgraded model improves on Inflection-1 across the board and approaches GPT-4, although it still trails it.

For example, on the MMLU benchmark, which covers tasks ranging from high school to professional difficulty, Inflection-2.5 scored 85.5, just below GPT-4's 87.3. On STEM exams, the model comes close to the OpenAI model: it scored 63 on the Hungarian mathematics exam (compared to GPT-4's 68) and ranked in the 85th percentile on the physics GRE, while GPT-4 ranked in the 97th percentile.

On the GSM8K benchmark, a set of 8.5K high-quality grade school math problems, the Inflection model scored 86.3, while GPT-4 scored 92. On the 0-shot HumanEval test, which evaluates code generation, the Inflection model scored 73.8, while GPT-4 scored 79.3.

Although it has not surpassed GPT-4, Inflection AI points out that the model reaches 94% of GPT-4's performance level while being far more efficient to train than OpenAI's large language model (LLM).

According to the company, Inflection-2.5 achieved these results using only 40% of the floating-point operations (a measure of training compute) used to train GPT-4.
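
As a rough illustration of how a "percentage of GPT-4's performance" figure can be derived, the sketch below divides each Inflection-2.5 score quoted in this article by the matching GPT-4 score and takes an unweighted mean. This is only an assumption about the kind of calculation involved: the article does not say which benchmarks or weighting the company used, and the names SCORES, relative_scores, and mean_relative_score are hypothetical.

```python
# Scores as quoted in this article: (Inflection-2.5, GPT-4).
# Note: the physics GRE entry is a percentile rank, not a raw score,
# so its ratio is only loosely comparable to the others.
SCORES = {
    "MMLU": (85.5, 87.3),
    "Hungarian math exam": (63.0, 68.0),
    "Physics GRE (percentile)": (85.0, 97.0),
    "GSM8K": (86.3, 92.0),
    "HumanEval (0-shot)": (73.8, 79.3),
}


def relative_scores(scores):
    """Return each Inflection-2.5 score as a fraction of the GPT-4 score."""
    return {name: ours / theirs for name, (ours, theirs) in scores.items()}


def mean_relative_score(scores):
    """Unweighted mean of the per-benchmark ratios (an assumed aggregation)."""
    ratios = relative_scores(scores)
    return sum(ratios.values()) / len(ratios)


if __name__ == "__main__":
    for name, ratio in relative_scores(SCORES).items():
        print(f"{name}: {ratio:.1%}")
    print(f"Unweighted mean: {mean_relative_score(SCORES):.1%}")
```

On these five figures the unweighted mean comes to roughly 93%, close to but not the same as the 94% the company cites, which presumably rests on its own choice of benchmarks.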

In addition, like GPT-4, the model integrates real-time web search to give users up-to-date information on current events. Because the Pi assistant is positioned as an AI for the general public, this is an important upgrade. However, no benchmark results are available for this feature yet, so the quality of its search results is hard to judge.

How to access Inflection-2.5?

Inflection AI has already rolled out the new model to its Pi chatbot, so anyone using the assistant can start testing its capabilities.

The company has not said exactly how the upgraded model has benefited users, but it stated that the rollout has had a significant impact on user sentiment, engagement, and retention, accelerating the chatbot's organic user growth.

The Pi chatbot, which is available on Android, iOS, the web, and a desktop app, currently has 1 million daily active users and 6 million monthly active users. Users have exchanged more than 4 billion messages with it, and the average conversation lasts 33 minutes.