Microsoft officially releases Phi-2, a small language model with 2.7 billion parameters

2023-12-13

Microsoft has released Phi-2, a language model with 2.7 billion parameters, first announced by CEO Satya Nadella at last month's Ignite conference. Phi-2 demonstrates capabilities typically seen only in models 5 to 25 times its size: among base language models with fewer than 13 billion parameters, it achieves state-of-the-art performance on complex benchmarks measuring reasoning, language understanding, mathematics, coding, and common sense.

The remarkable capabilities of Phi-2 stem from Microsoft's emphasis on high-quality training data and from innovations in effectively scaling up model knowledge. By training on carefully curated, "textbook-quality" data designed to impart knowledge, and by transferring insights from smaller models, Phi-2 outperforms what conventional scaling laws would predict for its size.
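Microsoft has not released its curation pipeline, but the basic idea can be illustrated with a toy filter: score each document for educational quality and keep only the high scorers. Everything in the Python sketch below (the scorer, the keyword signals, the threshold) is hypothetical, standing in for a trained quality classifier.

    # Hypothetical sketch of "textbook-quality" filtering: keep only documents
    # whose educational-quality score clears a threshold. Not Microsoft's pipeline.
    def curate(documents, score_fn, threshold=0.5):
        """Return the documents that score at or above the threshold."""
        return [doc for doc in documents if score_fn(doc) >= threshold]

    def toy_score(doc: str) -> float:
        # Stand-in for a trained classifier: count instructional keywords.
        signals = ("theorem", "example", "explain", "because", "step")
        return min(1.0, sum(word in doc.lower() for word in signals) / 3)

    corpus = [
        "Buy now!!! Limited offer!!!",
        "Example: to solve 2x + 3 = 7, subtract 3 from both sides because ...",
    ]
    print(curate(corpus, toy_score))  # keeps only the worked example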

Traditionally, a language model's capabilities have been closely tied to its size, with larger models exhibiting more impressive abilities. Phi-2 defies this notion: it not only matches but in some cases surpasses models 25 times its size.

In selected benchmark tests, Phi-2 matches or exceeds larger models such as the 7B Mistral and the 13B Llama-2, and on multi-step reasoning tasks even the 70B Llama-2. It also matches or surpasses the recently announced Google Gemini Nano 2 despite being smaller. The tests are extensive, spanning reasoning tasks, language understanding, mathematics, coding challenges, and more.

Microsoft attributes Phi-2's outsized performance at a small scale to two key factors:

  • The quality of training data plays a crucial role in model capabilities. By focusing on high-quality "textbook" data specifically designed to teach reasoning, knowledge, and common sense, Phi-2 learns more from less data.
  • Techniques that embed knowledge from smaller models help scale up model capabilities efficiently. Starting from the 1.3-billion-parameter Phi-1.5 and using knowledge-transfer methods, Microsoft unlocked unexpectedly powerful capabilities in the 2.7-billion-parameter Phi-2; a hypothetical sketch of this idea follows this list.
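Microsoft has not detailed the mechanism behind this transfer. One common way to seed a larger network from a smaller trained one is to copy the smaller model's weights into the matching sub-blocks of the larger model's parameters, leaving the remainder freshly initialized. The PyTorch sketch below illustrates that general idea on a single linear layer; it is an assumption for illustration, not Phi-2's actual method.

    # Hypothetical sketch: growing a small trained layer into a larger one by
    # copying its weights into the top-left block of the larger weight matrix.
    import torch
    import torch.nn as nn

    def grow_linear(small: nn.Linear, large: nn.Linear) -> None:
        """Copy the small layer's parameters into the overlapping region of the large layer."""
        out_s, in_s = small.weight.shape
        with torch.no_grad():
            large.weight[:out_s, :in_s].copy_(small.weight)
            if small.bias is not None and large.bias is not None:
                large.bias[:out_s].copy_(small.bias)

    small_layer = nn.Linear(256, 256)  # stands in for a trained Phi-1.5 block
    large_layer = nn.Linear(512, 512)  # stands in for a fresh Phi-2 block
    grow_linear(small_layer, large_layer)

After an initialization of this kind, the larger model begins training from the smaller model's knowledge rather than from scratch, which, per Microsoft's description, accelerates training convergence.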

It is worth noting that although Phi-2 has undergone neither reinforcement learning from human feedback (RLHF) nor instruction fine-tuning, it still shows less toxicity and bias than available open-source models that do use such alignment techniques. Microsoft states that this improved behavior comes from its tailored data-curation approach. The ability to build powerful yet safer models through careful data selection bodes well for the industry's ongoing efforts to address harmful model outputs and other risks.

The efficiency of Phi-2 makes it an ideal platform for researchers exploring critical areas of model development, such as improving the interpretability, safety, and ethical behavior of language models. Microsoft has made Phi-2 available in the Azure AI Studio model catalog to facilitate such research while encouraging new applications in natural language processing.
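Beyond Azure, the Phi-2 weights were also published on the Hugging Face Hub under the name microsoft/phi-2. A minimal sketch of running the model with the transformers library follows; it assumes that checkpoint name, a recent transformers and torch install, and the "Instruct:/Output:" prompt format suggested in the model card.

    # Minimal sketch: generating text with Phi-2 via Hugging Face transformers.
    # Assumes the "microsoft/phi-2" checkpoint; older transformers releases may
    # additionally require trust_remote_code=True when loading.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2")
    model = AutoModelForCausalLM.from_pretrained(
        "microsoft/phi-2",
        torch_dtype=torch.float32,  # use torch.float16 on a capable GPU
    )

    prompt = "Instruct: Explain why a smaller model can match a larger one.\nOutput:"
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=100)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))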