Microsoft introduces serverless fine-tuning for its Phi-3 small language model

2024-07-26

Microsoft is a major supporter and partner of OpenAI, but that doesn't mean Microsoft is willing to let OpenAI dominate the field of generative artificial intelligence.

As if to prove the point, Microsoft today announced a new way for developers to fine-tune its Phi-3 small language model without managing their own servers, and free of charge (at least initially).

Fine-tuning refers to the process of adapting an AI model by adjusting its underlying weights (parameters) so that it performs better on specific use cases and for particular end users, or even gains new capabilities. This goes beyond simply steering a model with system prompts, which leaves the weights untouched.

So what is Phi-3?

Microsoft released Phi-3 in April as a low-cost, enterprise-grade option for third-party developers building new applications and software. At 3.8 billion parameters, it is far smaller than most mainstream language models (compare Meta's Llama 3.1 at 405 billion parameters), yet according to Sébastien Bubeck, Microsoft's VP of Generative AI, Phi-3 performs on par with OpenAI's GPT-3.5 model.

Specifically, Phi-3 aims to deliver affordable performance in coding, commonsense reasoning, and general knowledge.

Now, Phi-3 has become a family of six different models with varying numbers of parameters and context lengths (the number of tokens, or numerical representations of data, that can be provided in a single input), ranging from 4,000 to 128,000, with costs ranging from $0.0003 to $0.0005 per 1,000 input tokens.

Converted to the more typical per-million-token pricing, however, Phi-3 starts at $0.30 per million input tokens and $0.90 per million output tokens. That is twice the input price of OpenAI's newly launched GPT-4o mini and about 1.5 times its output price.
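The price comparison above is simple arithmetic: converting per-1,000-token rates to per-million rates and dividing by GPT-4o mini's published launch prices ($0.15 per million input tokens, $0.60 per million output tokens). A quick sketch, using only figures stated in this article:

```python
# Entry-tier Phi-3 rates quoted per 1,000 tokens (from the article).
PHI3_INPUT_PER_1K = 0.0003
PHI3_OUTPUT_PER_1K = 0.0009

# GPT-4o mini launch pricing, quoted per million tokens.
GPT4O_MINI_INPUT_PER_M = 0.15
GPT4O_MINI_OUTPUT_PER_M = 0.60

# Convert Phi-3 rates to per-million terms (multiply by 1,000).
phi3_input_per_m = PHI3_INPUT_PER_1K * 1000    # ~$0.30 per million
phi3_output_per_m = PHI3_OUTPUT_PER_1K * 1000  # ~$0.90 per million

# Ratios versus GPT-4o mini: ~2x on input, ~1.5x on output.
input_ratio = phi3_input_per_m / GPT4O_MINI_INPUT_PER_M
output_ratio = phi3_output_per_m / GPT4O_MINI_OUTPUT_PER_M
```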

Phi-3 is designed to be safe for enterprise use, with guardrails to reduce bias and toxicity. Even at its initial announcement, Microsoft's Bubeck promoted its ability to fine-tune for specific enterprise use cases.

"You can bring in your data and fine-tune this general model to get amazing performance in narrow verticals," he told us.

At that time, however, there was no serverless option for fine-tuning. Developers who wanted to do it had to set up their own Microsoft Azure server or download the model and run it on a local machine, which might not have had enough capacity.

Serverless fine-tuning opens up new options

However, today Microsoft announced the public availability of "Models-as-a-Service (serverless endpoints)" in its Azure AI development platform.

Microsoft also announced that "Phi-3-small is now available through serverless endpoints, so developers can quickly and easily start AI development without managing the underlying infrastructure."
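To make "no infrastructure to manage" concrete, here is a minimal sketch of what calling such a serverless endpoint looks like from a developer's perspective. The endpoint URL, key, and request path below are placeholders and assumptions (serverless "Models-as-a-Service" endpoints generally accept an OpenAI-style chat-completions payload), not values from Microsoft's documentation:

```python
import json
import os

# Placeholder endpoint and key; a real deployment would supply its own.
ENDPOINT = os.environ.get("AZURE_AI_ENDPOINT", "https://example.inference.ai.azure.com")
API_KEY = os.environ.get("AZURE_AI_KEY", "<your-key>")

# Serverless endpoints are typically authenticated with a simple key.
headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {API_KEY}",
}

# OpenAI-style chat payload (an assumption about the wire format).
payload = {
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize Phi-3 in one sentence."},
    ],
    "max_tokens": 128,
}
body = json.dumps(payload)

# An actual call (requires a live endpoint and key) might look like:
#   import requests
#   resp = requests.post(f"{ENDPOINT}/v1/chat/completions",
#                        headers=headers, data=body)
#   print(resp.json()["choices"][0]["message"]["content"])
```

The point is that the developer's code reduces to an HTTP request; no GPU provisioning or server management is involved.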

According to Microsoft's blog post, Phi-3-vision will also be able to handle image inputs and "will soon be available through serverless endpoints."

However, these models are simply provided "as-is" through Microsoft's Azure AI development platform. Developers can build applications on top of these models, but they cannot create their own fine-tuned versions of the models for their specific use cases.

For developers who want to do that, Microsoft suggests turning to Phi-3-mini and Phi-3-medium, which can be fine-tuned with third-party data to create more relevant, secure, and cost-effective AI experiences.
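Fine-tuning workflows of this kind typically start from a file of example conversations in a chat-style JSONL format, one conversation per line. The schema below is an assumption modeled on common practice across fine-tuning services, not Microsoft's documented format:

```python
import json

# Hypothetical training examples for adapting a chat model to a narrow
# vertical (here, a tutoring tone, echoing the Khan Academy use case).
examples = [
    {
        "messages": [
            {"role": "system", "content": "You are a patient math tutor."},
            {"role": "user", "content": "What is 7 x 8?"},
            {"role": "assistant", "content": "Let's break it down: 7 x 8 is the same as 7 x 4 doubled, so 28 x 2 = 56."},
        ]
    },
    {
        "messages": [
            {"role": "system", "content": "You are a patient math tutor."},
            {"role": "user", "content": "I don't get fractions."},
            {"role": "assistant", "content": "No problem. A fraction is just a division written vertically; let's start with halves."},
        ]
    },
]

# Serialize: one complete JSON conversation per line (JSONL).
jsonl = "\n".join(json.dumps(ex) for ex in examples)
with open("train.jsonl", "w") as f:
    f.write(jsonl)
```

A few hundred to a few thousand such examples are usually enough to shift a small model's tone or task behavior, which is part of why smaller models like Phi-3-mini are attractive fine-tuning targets.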

"Given their smaller compute footprint, cloud and edge compatibility, Phi-3 models are great for fine-tuning to improve base model performance across a variety of scenarios, including learning new skills or tasks (e.g., tutoring) or improving consistency and quality of responses (e.g., tone or style in chat/question-answering)," the company wrote.

Specifically, Microsoft said educational software company Khan Academy is already using a fine-tuned Phi-3 to evaluate the performance of its Khanmigo for Teachers, which is powered by the Azure OpenAI Service.

A new price and feature battle for enterprise AI developers

The serverless fine-tuning pricing for Phi-3-mini-4k-instruct starts at $0.004 per 1,000 tokens ($4 per million tokens), while the pricing for the medium model has not been announced yet.

While this is undoubtedly good news for developers who want to stay within the Microsoft ecosystem, it also positions Microsoft as a direct competitor to its own ally, OpenAI, in the race to attract enterprise AI developers.

Just a few days ago, OpenAI announced that "tier 4 and tier 5" users of its application programming interface (API), those who have spent at least $250 or $1,000 on API credits, respectively, can fine-tune GPT-4o mini for free, up to 2 million tokens per day, until September 23.

In addition, Meta has just released its open-source Llama 3.1 family and Mistral has released its new Mistral Large 2 model, both of which can be fine-tuned for different purposes. Clearly, the competition to offer attractive AI options to enterprise developers is in full swing, with AI providers wielding models of every size to win them over.