Open-Source Mega-Model Mixtral 8x7B Surpasses GPT-3.5 in MMLU Benchmark

2024-01-10

Mistral AI has released its latest language model, Mixtral 8x7B, disclosing its architecture in detail in the Mixtral 8x7B paper along with a set of comprehensive benchmark scores comparing it to LLaMA 2 70B and GPT-3.5. In the closely watched MMLU (Massive Multitask Language Understanding) benchmark, Mixtral outperforms both of those models; larger models such as Gemini Ultra or GPT-4 score between 85% and 90% depending on the prompting method. On the LMSys leaderboard, which ranks AI answers based on human ratings, Mixtral 8x7B edges out Claude 2.1, GPT-3.5, and Google's Gemini Pro, while GPT-4 still holds a significant lead. This reflects a trend of the past few months: for many organizations, reaching or slightly surpassing the level of GPT-3.5 appears relatively easy, but GPT-4 remains unbeaten.

Mistral claims that Mixtral 8x7B is currently the strongest open language model on the market. The company initially released the model through a torrent link; it has since provided more details about Mixtral 8x7B and announced an API service and a new round of financing.

According to Mistral, Mixtral is a Sparse Mixture-of-Experts (SMoE) model released under the Apache 2.0 license; OpenAI is rumored to have built GPT-4 on a similar architecture. For each query, a router selects two of eight expert parameter sets, so only a small fraction of the total parameters is used for each inference step, reducing cost and latency (see the routing sketch at the end of this article). Concretely, Mixtral has about 45 billion parameters in total but activates only around 12 billion of them per token. It is the largest model the startup has released so far, following the release of the powerful Mistral 7B model in September of last year.

According to Mistral, Mixtral outperforms Meta's LLaMA 2 70B in most benchmark tests while providing roughly six times faster inference, and it improves on the Meta model in terms of truthfulness and bias. Mistral describes it as the strongest open-weight model with a permissive license and the best in terms of cost/performance trade-offs. In standard benchmark tests, it performs on par with or surpasses OpenAI's GPT-3.5. Mixtral handles contexts of up to 32,000 tokens, supports English, French, Italian, German, and Spanish, and is able to write code.

In addition to the base Mixtral 8x7B model, Mistral has also introduced Mixtral 8x7B Instruct. This version has been tuned to follow instructions precisely through supervised fine-tuning and Direct Preference Optimization (DPO). It scores 8.30 on MT-Bench, making it the best open-source model on that benchmark, with performance comparable to GPT-3.5.

Mixtral is currently available in beta on the Mistral platform. Mistral states that, in addition to the smaller Mistral 7B model, a more powerful prototype model that outperforms GPT-3.5 is also available on the platform.
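
To make the routing step concrete, below is a minimal sketch of a top-2 sparse Mixture-of-Experts layer in PyTorch-style Python. It illustrates the general technique only; the class name, expert structure, and constants are assumptions for illustration and do not reflect Mistral's actual implementation.

```python
# Minimal sketch of top-2 sparse Mixture-of-Experts routing (illustrative only;
# names and expert internals are assumptions, not Mistral's implementation).
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_EXPERTS = 8   # Mixtral uses 8 expert feed-forward blocks per layer
TOP_K = 2         # only 2 experts are evaluated per token

class MoeLayer(nn.Module):
    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        # One gating (router) network plus NUM_EXPERTS independent feed-forward experts.
        self.gate = nn.Linear(d_model, NUM_EXPERTS, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
            for _ in range(NUM_EXPERTS)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model). The router scores every expert for every token,
        # but only the top-2 experts are actually evaluated.
        logits = self.gate(x)                                # (tokens, NUM_EXPERTS)
        weights, indices = torch.topk(logits, TOP_K, dim=-1)  # per-token top-2
        weights = F.softmax(weights, dim=-1)                 # renormalise over the chosen 2
        out = torch.zeros_like(x)
        for k in range(TOP_K):
            for e in range(NUM_EXPERTS):
                mask = indices[:, k] == e        # tokens that routed slot k to expert e
                if mask.any():
                    out[mask] += weights[mask, k:k+1] * self.experts[e](x[mask])
        return out
```

Because only two of the eight expert feed-forward blocks run for each token, the per-token compute corresponds to the roughly 12 billion "active" parameters rather than the full parameter count, which is how the model keeps cost and latency low despite its overall size.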
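The Instruct variant's use of Direct Preference Optimization can be summarized by the standard DPO objective from Rafailov et al. (2023); the following hedged sketch shows that objective only, with illustrative variable names, and says nothing about Mistral's actual training code.

```python
# Sketch of the standard DPO loss (Rafailov et al., 2023); names are illustrative.
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Increase the policy's preference margin for the chosen response over the
    rejected one, measured relative to a frozen reference model; beta scales
    how strongly the preference signal is enforced."""
    chosen_logratio = policy_chosen_logps - ref_chosen_logps
    rejected_logratio = policy_rejected_logps - ref_rejected_logps
    return -F.logsigmoid(beta * (chosen_logratio - rejected_logratio)).mean()
```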