Mistral recently refreshed its coding model Codestral, releasing an updated version named Codestral 25.01. The model family has gained significant popularity among developers, intensifying competition in the market for developer-focused coding models.
Mistral announced on its official blog that the new Codestral uses a more efficient architecture. The company claims that, at roughly double the speed of its predecessor, Codestral 25.01 leads models in its weight class.
Like the original version, Codestral 25.01 is optimized for low-latency, high-frequency use and supports code correction, test generation, and fill-in-the-middle (FIM) completion. According to the company, the model is particularly well suited to enterprise users who handle large volumes of data and have data and model residency requirements.
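For readers unfamiliar with the term, fill-in-the-middle means the model is given the code before and after a gap and generates only the missing span. The sketch below illustrates the idea with Mistral's Python client; the client calls and the `codestral-latest` model identifier are assumptions based on Mistral's public documentation, not details from this article.

```python
import os
from mistralai import Mistral

# Minimal fill-in-the-middle (FIM) sketch. Assumes the `mistralai` Python
# client and the "codestral-latest" alias; check Mistral's docs for the
# identifier that maps to the 25.01 release.
client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

prompt = "def is_prime(n: int) -> bool:\n"   # code before the gap
suffix = "\n\nprint(is_prime(7))\n"          # code after the gap

response = client.fim.complete(
    model="codestral-latest",
    prompt=prompt,
    suffix=suffix,
    max_tokens=128,
    temperature=0.0,
)

# The model returns only the middle section, which the editor or plugin
# splices in between `prompt` and `suffix`.
print(response.choices[0].message.content)
```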
Benchmark results show that Codestral 25.01 performs strongly on Python coding tasks, scoring 86.6% on HumanEval. That puts it ahead of the previous Codestral version, Codellama 70B Instruct, and DeepSeek Coder 33B Instruct.
The model is currently available to developers through Mistral's IDE plugin partners. Users can deploy Codestral 25.01 locally via the code assistant Continue or access its API through Mistral's la Plateforme and Google Vertex AI. The model is also available in preview on Azure AI Foundry and will soon arrive on Amazon Bedrock.
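As a rough illustration of the API route mentioned above, the sketch below sends a chat request to Codestral through la Plateforme using Mistral's Python client. The model identifier and client calls are assumptions drawn from Mistral's public documentation rather than from this article.

```python
import os
from mistralai import Mistral

# Hypothetical chat-completion call against la Plateforme; the
# "codestral-latest" alias is assumed to resolve to Codestral 25.01.
client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

response = client.chat.complete(
    model="codestral-latest",
    messages=[
        {
            "role": "user",
            "content": "Write a Python function that reverses a singly linked list.",
        }
    ],
)

print(response.choices[0].message.content)
```

Developers going through Google Vertex AI, Azure AI Foundry, or Amazon Bedrock would use those platforms' own SDKs instead.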
Since launching its first code-centric model, Codestral, last May, Mistral has steadily introduced related products. That 22-billion-parameter model can write code in 80 different programming languages and, according to Mistral, outperformed other code-centric models at release. Mistral later released Codestral-Mamba, built on the Mamba architecture, which can generate longer stretches of code and handle larger inputs.
Codestral 25.01 drew attention quickly, climbing the Copilot Arena rankings within hours of its release.
Coding was among the earliest capabilities of foundation models, including general-purpose ones such as OpenAI's GPT-3 and Anthropic's Claude. Over the past year, however, specialized programming models have improved and now often outperform larger models on specific tasks.
In the past year alone, several programming-focused models have been launched for developers. Alibaba released Qwen2.5-Coder in November, while China's DeepSeek Coder became, in June, the first model to surpass GPT-4 Turbo. Microsoft also introduced GRIN-MoE, a mixture-of-experts (MoE) model capable of writing code and solving math problems.
The debate over whether to use a general-purpose model that does a bit of everything or a specialized coding model remains unresolved. Some developers prefer the breadth of models like Claude, but the surge in specialized coding models underscores the demand for domain expertise. Because Codestral is trained primarily on code, it naturally performs better at coding tasks than at tasks like drafting emails.