Mistral AI: A Hybrid Expert Model Challenging the Dominance of OpenAI, Google, and Meta

2023-12-13

Mistral AI, a Paris-based AI startup, raised over $113 million in funding back in June, before it had launched a product. While Gemini was in the spotlight last week, Mistral AI has now become the focus with its latest model, Mixtral 8x7B: a Sparse Mixture of Experts (SMoE) model released with open weights via a magnet link on X. Mixtral rivals popular models such as GPT-3.5 and Llama 2 70B; licensed under Apache 2.0, it outperforms Llama 2 70B on most benchmarks with roughly 6x faster inference. Leading with "Mixture of Experts" is clever marketing, considering that OpenAI is reported to have used the same approach to train GPT-4 as far back as last year. Yet somehow it is Mistral AI's model that has suddenly captured everyone's attention.

The Mixture of Experts approach allows models to be pretrained at a much lower computational cost, enabling significant scaling of the model or dataset within the same compute budget as a dense model. Mixtral is a decoder-only model in which the feed-forward block selects from a set of 8 distinct parameter groups. At each layer, for each token, a router network selects two of these groups (the "experts") to process the token and combines their outputs additively (see the routing sketch below). This increases the model's parameter count while keeping compute cost and latency in check: Mixtral has 46.7 billion parameters in total but uses only about 12.9 billion per token, so it processes input and generates output at the speed and cost of a 12.9B-parameter model. OpenAI scientist Andrej Karpathy pointed out that the name "8x7B" is somewhat misleading, since not all 7B parameters are multiplied by 8: only the feed-forward blocks of the Transformer are replicated 8 times, while everything else is shared, which is why the total comes to 46.7B rather than 56B (the arithmetic below makes this concrete).

Meanwhile, Mistral AI the company is thriving, announcing a $415 million funding round that values it at $2 billion. Andreessen Horowitz (a16z) led the round, with Lightspeed Venture Partners also participating.

Open-source LLM companies often struggle to sustain their business. To address this, Mistral AI recently launched "La Plateforme," which provides API endpoints for its models in three tiers: Mistral Tiny, Mistral Small, and Mistral Medium. Mistral 7B Instruct v0.2 and Mixtral 8x7B back Mistral Tiny and Mistral Small, respectively. Interestingly, the Medium model has not been released: Mistral Medium, the top-tier service still in development, excels in English, French, Italian, German, Spanish, and code, scores 8.6 on MT-Bench, and on standard benchmarks even outperforms GPT-3.5. Notably, Mistral has chosen to serve this strongest model only behind paid endpoints rather than open-sourcing it. Hosted API endpoints are the fastest way to gather customer feedback, iterate on real-world use cases, and, crucially, monetize open-source models (a minimal example of calling the endpoints appears below). By contrast, Stability AI is currently struggling to generate enough revenue to survive, and has responded by launching the Stability AI Membership, charging developers for commercial use of its LLMs.
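The routing just described is easy to picture in code. The PyTorch sketch below is an illustration, not Mistral's implementation: the SwiGLU expert shape and top-2 softmax gating follow Mixtral's reported design, but all names and dimensions here are our own assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Expert(nn.Module):
    """One feed-forward "parameter group" (SwiGLU-style, as reported for Mistral 7B)."""
    def __init__(self, dim: int, hidden: int):
        super().__init__()
        self.w1 = nn.Linear(dim, hidden, bias=False)  # gate projection
        self.w2 = nn.Linear(hidden, dim, bias=False)  # down projection
        self.w3 = nn.Linear(dim, hidden, bias=False)  # up projection

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.w2(F.silu(self.w1(x)) * self.w3(x))

class SparseMoEBlock(nn.Module):
    """Replaces the dense FFN: route each token to the top-2 of 8 experts."""
    def __init__(self, dim: int, hidden: int, n_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.experts = nn.ModuleList([Expert(dim, hidden) for _ in range(n_experts)])
        self.router = nn.Linear(dim, n_experts, bias=False)
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        tokens = x.view(-1, x.shape[-1])                    # (n_tokens, dim)
        logits = self.router(tokens)                        # (n_tokens, n_experts)
        weights, idx = torch.topk(logits, self.top_k)       # pick 2 experts per token
        weights = F.softmax(weights, dim=-1)                # normalize the 2 gate values
        out = torch.zeros_like(tokens)
        for e, expert in enumerate(self.experts):
            token_ids, slot = torch.where(idx == e)         # tokens routed to expert e
            if token_ids.numel() == 0:
                continue                                    # expert unused for this batch
            # Combine expert outputs additively, scaled by the gate weights.
            out[token_ids] += weights[token_ids, slot, None] * expert(tokens[token_ids])
        return out.view_as(x)

# Toy usage: 4 tokens of width 16; each token passes through exactly 2 experts.
moe = SparseMoEBlock(dim=16, hidden=64)
y = moe(torch.randn(1, 4, 16))
print(y.shape)  # torch.Size([1, 4, 16])
```

The point of the design is visible in the loop: all 8 experts hold parameters, but any given token only pays the compute cost of 2 of them.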
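To make Karpathy's point concrete, here is back-of-the-envelope arithmetic based on Mixtral's publicly reported configuration (hidden size 4096, 32 layers, FFN width 14336, grouped-query attention with 8 KV heads of size 128, 32k vocabulary). Norms and other small terms are omitted, so treat the figures as approximations:

```python
# Rough parameter count for Mixtral 8x7B from the reported config.
dim, layers, ffn, vocab = 4096, 32, 14336, 32000
kv_dim = 8 * 128            # grouped-query attention: 8 KV heads of size 128

attn   = 2 * dim * dim + 2 * dim * kv_dim   # Wq, Wo plus smaller Wk, Wv
expert = 3 * dim * ffn                      # SwiGLU FFN: w1, w2, w3
router = dim * 8                            # gate over 8 experts

embeddings = 2 * vocab * dim                # input embeddings + output head
total  = layers * (attn + 8 * expert + router) + embeddings
active = layers * (attn + 2 * expert + router) + embeddings  # top-2 routing

print(f"total:  {total / 1e9:.1f}B")   # ~46.7B -- only the FFN is replicated 8x
print(f"active: {active / 1e9:.1f}B")  # ~12.9B actually used per token
```

Because attention, embeddings, and the router are shared across experts, replicating the FFN 8 times yields 46.7B rather than 8 x 7B = 56B, and top-2 routing keeps the per-token cost near 12.9B.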
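For reference, a minimal sketch of calling La Plateforme follows. It assumes the documented v1 chat-completions route and the tier names as model identifiers; check Mistral's API documentation for the current contract.

```python
import os
import requests

# Minimal sketch of querying La Plateforme's hosted endpoints.
# Assumes the v1 chat-completions route and tier names as model ids.
resp = requests.post(
    "https://api.mistral.ai/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['MISTRAL_API_KEY']}"},
    json={
        "model": "mistral-small",  # the Mixtral 8x7B tier
        "messages": [{"role": "user", "content": "Say hello in French."}],
    },
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```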
Meta has long been a leader in the open-source community, constantly publishing research papers and releasing models. But generating revenue from them is not a priority for Meta, which already earns enormous profits from advertising across its family of social media apps. Startups built around open-source models, by contrast, must reach profitability to keep producing them, and with Mistral AI raising this much funding, investors will expect a return. Is Mistral AI the next OpenAI?

Europe recently reached a preliminary agreement on landmark rules for the use of AI in the European Union. Surprisingly, Mistral AI does not support the proposed EU AI Act; the company likely feels it would hinder its progress in the near term and could force it to disclose trade secrets. As drafted, however, the legislation largely exempts open-source companies, Mistral AI among them. Mistral AI may not keep releasing its upcoming models as open source, though that is pure speculation; it is worth remembering that OpenAI also started out as an open company, and that a few months ago OpenAI lobbied to weaken the highly anticipated EU AI Act to ease the regulatory burden on itself. Karpathy made a related point about openness, saying, "I'm glad they called it 'open weights' release and not 'open source,' which, to me, implies training code, datasets, and documentation."

There are currently few AI startups in Europe that can truly challenge OpenAI and Google. Mistral AI has made generative AI exciting with top-notch marketing and excellent products; now it has to prove it is here to stay.