Mistral AI, the Paris-based startup known for its open-source generative AI models, recently released a new large language model aimed at competing with industry giants.
The new model, named Mixtral 8x22B, is expected to outperform the company's previous Mixtral 8x7B model and to pose a serious challenge to well-known competitors such as OpenAI's GPT-3.5 and Meta Platforms Inc.'s Llama 2.
Last December, the startup raised $415 million in funding at a valuation of more than $2 billion. According to the company, the new Mixtral 8x22B is its most powerful model yet, with a context window of 65,000 tokens that lets it process and reference large amounts of text at once. The model also has roughly 176 billion parameters, the internal variables it uses for decision-making and prediction.
Mistral AI, co-founded by AI researchers from Google and Meta, is among the startups dedicated to building open-source models. The company takes an unusual approach to releases, posting torrent links to new models on the social media platform X. Mixtral 8x22B has since also been made available on platforms such as Hugging Face and Together AI, where users can further train and fine-tune it for more specialized tasks.
Shortly after Mistral released Mixtral 8x22B, its competitors launched their own latest models. On Tuesday, OpenAI introduced GPT-4 Turbo with Vision, the latest model in the GPT-4 Turbo series, which can process images, drawings, and other visuals uploaded by users. Later the same day, Google unveiled its state-of-the-art Gemini 1.5 Pro LLM, offering developers a free tier capped at 50 requests per day.
Meta, for its part, announced plans to launch Llama 3 at the end of this month, underscoring how fierce the competition has become.
Mixtral 8x22B is expected to surpass Mistral AI's previous Mixtral 8x7B model in performance; the earlier model has itself been shown to outperform GPT-3.5 and Llama 2 in multiple key benchmark tests.
The new model adopts a sparse "mixture of experts" (MoE) architecture, which enables efficient computation and strong performance across a range of tasks. Rather than running every input through a single monolithic network, a sparse MoE model combines multiple specialized sub-networks, or "experts," and routes each input only to the experts best suited to handle it, optimizing both performance and cost.
Mistral AI explains on its website: "At each layer, for each token, a router network selects two groups ('experts') to process the token and combines their outputs additively. This technique increases the number of model parameters while controlling cost and latency, because the model uses only a fraction of the total parameter set for each token."
Thanks to this architecture, Mixtral 8x22B activates only about 44 billion parameters per forward pass despite its overall size, making it faster and cheaper to run than dense models of comparable scale.
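To make the routing idea concrete, here is a minimal sketch of sparse top-2 mixture-of-experts routing in Python with NumPy. The layer sizes, expert count, random weights, and function names are illustrative assumptions for this example, not Mixtral's actual configuration or implementation; the point is only to show how a router picks two experts per token and combines their outputs.

```python
# Minimal sketch of sparse top-2 MoE routing (illustrative sizes, not Mixtral's).
import numpy as np

rng = np.random.default_rng(0)

d_model, d_ff, n_experts, top_k = 64, 256, 8, 2

# Router and expert weights, randomly initialized purely for the sketch.
router_w = rng.standard_normal((d_model, n_experts)) * 0.02
experts = [
    (rng.standard_normal((d_model, d_ff)) * 0.02,   # expert up-projection
     rng.standard_normal((d_ff, d_model)) * 0.02)   # expert down-projection
    for _ in range(n_experts)
]

def moe_layer(x: np.ndarray) -> np.ndarray:
    """Route each token vector to its top-2 experts and sum their gated outputs."""
    logits = x @ router_w                            # (n_tokens, n_experts) router scores
    top = np.argsort(logits, axis=-1)[:, -top_k:]    # indices of the 2 highest-scoring experts
    out = np.zeros_like(x)
    for i, token in enumerate(x):
        chosen = logits[i, top[i]]
        gates = np.exp(chosen - chosen.max())
        gates /= gates.sum()                         # softmax over the chosen experts only
        for gate, e in zip(gates, top[i]):
            w_up, w_down = experts[e]
            # Each expert is a small feed-forward block; only 2 of the 8 run per token.
            out[i] += gate * (np.maximum(token @ w_up, 0.0) @ w_down)
    return out

tokens = rng.standard_normal((4, d_model))           # a toy batch of 4 token vectors
print(moe_layer(tokens).shape)                       # -> (4, 64)
```

Because only the two selected experts are evaluated for each token, only a fraction of the total parameter set does work on any given forward pass, which is the source of the cost and latency savings described above.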
The release of Mixtral 8x22B thus marks an important milestone for open-source generative AI, giving researchers, developers, and other enthusiasts access to an advanced model without restrictive access terms or high costs. The model is released under the permissive Apache 2.0 license.
The AI community's response on social media has been largely positive, with enthusiasts anticipating that the model will play a significant role in tasks such as customer service, drug discovery, and climate modeling.
Although Mistral AI has earned widespread acclaim for its open-source approach, it has also drawn criticism. Its models are regarded as cutting-edge systems that carry a real risk of misuse, and because anyone can download and build on them, the startup cannot prevent its technology from being used for harmful purposes.