Mistral launches innovative AI models: Mathstral and Codestral Mamba

2024-07-17

Mistral, the French AI startup known for its powerful open-source models, has recently added two new entries to its expanding family of large language models (LLMs): a model focused on mathematics and a code-generation model for programmers and developers built on the emerging Mamba architecture.


The Mamba architecture aims to improve on traditional Transformer models by replacing the attention mechanism with a state-space formulation, significantly improving processing efficiency. Compared with most Transformer-based models on the market, Mamba-based models offer faster inference and can handle longer contexts. The approach has already drawn several companies and developers, including AI21, to release new Mamba-based AI models.
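To make the contrast concrete, here is a minimal toy sketch in Python, purely illustrative and not Mistral's actual implementation, of why a fixed-size state-space recurrence does linear work in the sequence length while plain self-attention does quadratic work:

```python
# Conceptual sketch only (not Mistral's implementation): a toy comparison of
# why a fixed-size state-space recurrence scales linearly with sequence
# length, while vanilla self-attention scales quadratically.
import numpy as np

def ssm_scan(x, A, B, C):
    """Linear state-space recurrence: h_t = A h_{t-1} + B x_t, y_t = C h_t.
    One constant-size state update per token -> O(seq_len) total work."""
    h = np.zeros(A.shape[0])
    ys = []
    for x_t in x:                       # single pass over the sequence
        h = A @ h + B @ x_t             # constant-time state update
        ys.append(C @ h)
    return np.stack(ys)

def self_attention(x):
    """Plain self-attention: every token attends to every other token,
    so compute and memory grow as O(seq_len ** 2)."""
    scores = x @ x.T / np.sqrt(x.shape[1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ x

seq_len, d_model, d_state = 1024, 64, 16
x = np.random.randn(seq_len, d_model)
A = 0.01 * np.random.randn(d_state, d_state)
B = 0.01 * np.random.randn(d_state, d_model)
C = 0.01 * np.random.randn(d_model, d_state)
print(ssm_scan(x, A, B, C).shape)       # (1024, 64)
print(self_attention(x).shape)          # (1024, 64)
```

Mamba's actual contribution is a selective, hardware-aware version of this recurrence, but the scaling intuition is the same.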

Keeping pace with this trend, Mistral has introduced Codestral Mamba 7B, which builds on the Mamba architecture to deliver fast responses even on very long inputs. Codestral Mamba is designed to boost coding productivity, especially for local coding projects. The model is freely available through Mistral's la Plateforme API and supports inputs of up to 256,000 tokens, more than twice the capacity of OpenAI's GPT-4.
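As a rough illustration of how a developer might query the model on la Plateforme, the snippet below posts a request to Mistral's chat completions endpoint; the model identifier "codestral-mamba-latest" is an assumption here and should be checked against Mistral's current model list:

```python
# Hedged sketch: querying Codestral Mamba through la Plateforme's chat
# completions endpoint. The model name below is an assumption; consult
# Mistral's API documentation for the exact identifier.
import os
import requests

response = requests.post(
    "https://api.mistral.ai/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['MISTRAL_API_KEY']}"},
    json={
        "model": "codestral-mamba-latest",   # assumed model id
        "messages": [
            {
                "role": "user",
                "content": "Write a Python function that checks whether a string is a palindrome.",
            }
        ],
    },
    timeout=60,
)
print(response.json()["choices"][0]["message"]["content"])
```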

In benchmark tests, Codestral Mamba outperforms open-source rivals such as CodeLlama 7B, CodeGemma 1.1 7B, and DeepSeek, posting strong results on the HumanEval benchmark. Developers can access, modify, and deploy the model through GitHub and the HuggingFace platform, and it is released under the open-source Apache 2.0 license, further encouraging the sharing and advancement of the technology.
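For local use, a typical path is pulling the open weights from the HuggingFace Hub; the sketch below assumes a transformers release that supports the Mamba-based architecture and that the repository id is mistralai/Mamba-Codestral-7B-v0.1, both of which should be verified against Mistral's release notes:

```python
# Hedged sketch: loading the open weights from the HuggingFace Hub with
# transformers. The repository id and architecture support in transformers
# are assumptions; Mistral's own mistral-inference tooling is another
# documented route. Requires `accelerate` for device_map="auto".
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "mistralai/Mamba-Codestral-7B-v0.1"     # assumed repository id
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, device_map="auto")

prompt = "def fibonacci(n: int) -> int:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```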


In addition, Mistral has launched Mathstral 7B, a model designed for mathematical reasoning and scientific exploration. Built in collaboration with Project Numina, it has a generous 32K context window and delivers outstanding performance on mathematical reasoning, surpassing comparable models. When given more inference time for its reasoning, Mathstral achieves "significantly better" results. Users can use the model directly or fine-tune it to their needs, as sketched below.
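As one hedged illustration of what local fine-tuning could look like, the sketch below attaches LoRA adapters to the model with the peft library; the repository id, target module names, and hyperparameters are placeholders rather than an official recipe from Mistral:

```python
# Hedged sketch: parameter-efficient fine-tuning of Mathstral with LoRA via
# the peft library. Repository id, target modules, and hyperparameters are
# assumptions, not Mistral's recommended settings.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

repo_id = "mistralai/Mathstral-7B-v0.1"           # assumed repository id
model = AutoModelForCausalLM.from_pretrained(repo_id, device_map="auto")

lora_config = LoraConfig(
    r=16,                                         # adapter rank
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed module names
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()                # only the small adapters are trained
# From here, train with a standard loop or the transformers Trainer on a
# math-focused dataset, then merge or serve the adapters.
```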

Mistral emphasizes in its official blog: "Mathstral is another example of the excellent performance/speed tradeoffs achieved when building models for specific purposes, a development philosophy we actively promote in la Plateforme, particularly with its new fine-tuning capabilities."

Mathstral is likewise released under the Apache 2.0 open-source license, and users can access it through Mistral's la Plateforme and the HuggingFace platform.

By sharing its models on open-source platforms, Mistral is competing head-on with leading AI players such as OpenAI and Anthropic. The company recently closed a $640 million Series B financing round at a valuation of nearly $6 billion, and has drawn recognition and investment from tech giants such as Microsoft and IBM.