Microsoft Releases Code for Compact Language Model Phi-4

2025-01-09

Recently, Microsoft unveiled the code for Phi-4, a compact language model designed to generate text and solve mathematical problems.

Last month, Microsoft provided detailed information about the Phi-4 model. Initially, access was restricted to Microsoft's Azure Foundry AI development service. Now, it is available for download on Hugging Face, a popular platform for hosting open-source AI projects.

Phi-4 is the fourth iteration in Microsoft's series of compact language models launched in 2023, featuring 14 billion parameters that dictate how the neural network processes data. Researchers at Microsoft trained Phi-4 over 21 days using a cluster of 1920 Nvidia H100 GPUs.

The model is based on the industry-standard Transformer architecture, which forms the foundation of most large language models. This architecture breaks down user input into individual words and determines their meanings by analyzing surrounding text while focusing on the most relevant contextual elements.

Phi-4 employs a decoder-only variant of the Transformer architecture. While standard Transformers analyze both preceding and following text to determine word meanings, the decoder-only model only considers prior context, reducing the amount of data processed and lowering inference costs.
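The causal masking described above can be sketched in a few lines. This is a minimal, illustrative example of decoder-only attention weighting, not Phi-4's actual implementation; the toy score matrix is invented for demonstration.

```python
import math

def softmax(row):
    # Numerically stable softmax over one row of attention scores.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    s = sum(exps)
    return [e / s for e in exps]

def causal_attention(scores):
    """Apply a causal (decoder-only) mask: position i may only attend
    to positions 0..i, so tokens that come later get zero weight."""
    n = len(scores)
    masked = [
        [scores[i][j] if j <= i else float("-inf") for j in range(n)]
        for i in range(n)
    ]
    return [softmax(row) for row in masked]

# Toy 3-token example: raw attention scores before masking.
scores = [[0.5, 1.2, 0.3],
          [0.1, 0.8, 2.0],
          [1.5, 0.2, 0.9]]
weights = causal_attention(scores)
# Row 0 attends only to token 0; row 1 to tokens 0 and 1; and so on.
```

Because every row past the diagonal is zeroed before the softmax, each token's representation depends only on prior context, which is what cuts the work relative to a bidirectional encoder.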

In a research paper, Microsoft outlined two post-training optimization techniques, direct preference optimization and supervised fine-tuning, that enhance Phi-4's output quality. Both methods refine the model after its initial training by providing it with examples of how it should respond to prompts.
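Of the two techniques, supervised fine-tuning is ordinary training on example responses, while direct preference optimization learns from pairs of a preferred and a rejected answer. The sketch below shows the standard DPO loss for one such pair; it is a generic illustration with invented log-probability values, not Microsoft's implementation.

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Direct preference optimization loss for one preference pair.
    The policy model is rewarded for raising the likelihood of the
    preferred response relative to a frozen reference model."""
    margin = ((logp_chosen - ref_logp_chosen)
              - (logp_rejected - ref_logp_rejected))
    # -log(sigmoid(beta * margin)), written stably via log1p.
    return math.log1p(math.exp(-beta * margin))

# Hypothetical log-probabilities: if the policy favors the preferred
# answer more than the reference does, the loss falls below log(2);
# if it favors the rejected answer, the loss rises above log(2).
better = dpo_loss(-2.0, -6.0, -4.0, -4.0)  # positive margin
worse  = dpo_loss(-6.0, -2.0, -4.0, -4.0)  # negative margin
```

Minimizing this loss over many preference pairs nudges the model toward the kinds of responses human raters prefer, without training a separate reward model.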

Internal evaluations compared Phi-4 with the Llama 3.3 70B model, which has five times as many parameters. Phi-4 outperformed its larger competitor on the GPQA and MATH benchmarks, datasets comprising scientific and mathematical questions respectively.

Phi-4 joins the ranks of other compact language models recently open-sourced by major tech companies.

In February of last year, Google introduced the Gemma series of compact language models, ranging from 2 billion to 7 billion parameters. Google claimed that the 7-billion-parameter version surpassed models twice its size in performance.

More recently, Meta Platforms released two versions of the Llama 3.2 model with fewer than 5 billion parameters and subsequently open-sourced more efficient variants. Those variants use quantization, a machine learning technique that stores a model's weights at lower numerical precision to reduce memory and hardware requirements.
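The idea behind quantization can be shown with a simple symmetric int8 scheme: floating-point weights are mapped to small integers and rescaled on the way back. This is a generic sketch of the technique with made-up weights, not Meta's actual quantization pipeline.

```python
def quantize_int8(weights):
    """Symmetric int8 quantization: map floats into [-127, 127].
    Storing one byte per weight instead of four cuts memory
    roughly fourfold at a small cost in precision."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    # Recover approximate float weights from the integer codes.
    return [v * scale for v in q]

weights = [0.42, -1.37, 0.05, 0.91, -0.63]
q, scale = quantize_int8(weights)
recovered = dequantize(q, scale)
# Each recovered weight is within half a quantization step
# (scale / 2) of the original value.
```

Rounding introduces an error of at most half a step per weight, which is why quantized models run on cheaper hardware with only a modest drop in output quality.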