Microsoft recently introduced a new member of its Phi series of generative AI models: Phi-4. The company says Phi-4 improves on its predecessor in several areas, most notably mathematical problem-solving, gains it attributes in part to higher-quality training data.
Phi-4 officially launched on Thursday evening, but access remains extremely limited: the model is available only on Microsoft's newly launched Azure AI Foundry development platform, and only for research purposes under a Microsoft research license agreement.
Phi-4 is Microsoft's latest small language model, with 14 billion parameters, and competes with models such as GPT-4o mini, Gemini 2.0 Flash, and Claude 3.5 Haiku. Small models like these typically run faster and cost less, and their performance has improved steadily in recent years.
Microsoft attributes Phi-4's performance gains to the use of "high-quality synthetic datasets," alongside high-quality datasets of human-generated content and unspecified post-training improvements.
Many AI labs are watching innovations in synthetic data and post-training closely. Alexandr Wang, CEO of Scale AI, said in a tweet on Thursday that "we have encountered a training data bottleneck," a claim that echoes several reports on the topic in recent weeks.