The past few days have been a roller coaster ride for the growing open-source AI community, even by the standards of this rapidly changing field.
Here is a brief timeline: on or around January 28th, a user named "Miqu Dev" posted a set of files to HuggingFace, a leading platform for sharing open-source AI models and code. Together, the files formed what appeared to be a new large language model (LLM) labeled "miqu-1-70b".
The HuggingFace entry noted that the new LLM's "prompt format" (i.e., the way users interact with it) is the same as Mistral's. Mistral is a well-funded open-source AI company based in Paris, known for Mixtral 8x7b, a fine-tuned and retrained version of Meta's Llama 2 that many consider the best-performing open-source LLM currently available.
Posted on 4chan
On the same day, an anonymous 4chan user (possibly "Miqu Dev" themselves) posted a link to the miqu-1-70b files, and the model quickly drew attention there.
Some took to X to share their findings, noting the model's exceptionally strong performance on common LLM tasks as measured by benchmarks: on EQ-Bench, it approached the previous leader, OpenAI's GPT-4.
Is Mistral Quantized?
Machine learning (ML) researchers took notice on LinkedIn as well.
"Does 'miqu' stand for MIstral QUantized? We are not sure, but it quickly became one of the best open-source LLMs," wrote Maxime Labonne, an ML scientist at JPMorgan, one of the world's largest banks and financial companies.
"The investigation is ongoing. Meanwhile, we might soon see a fine-tuned version of miqu surpassing GPT-4's performance."
In ML, quantization is a technique that allows AI models to run on less powerful computers and chips by storing the numbers that make up a model's weights at lower precision, for example replacing 16-bit floating-point values with 8-bit or 4-bit integers, trading a small amount of accuracy for large savings in memory and compute.
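To make that concrete, here is a minimal sketch of the idea in Python, assuming simple symmetric 8-bit quantization of a single weight matrix. (This is an illustration only; real LLM quantization schemes, including whatever was used for the leaked miqu files, are considerably more sophisticated, e.g., grouped 4-bit formats with calibration data.)

import numpy as np

def quantize_int8(weights):
    # Symmetric per-tensor quantization: map the largest absolute
    # weight to 127 and round everything else onto the int8 grid.
    scale = max(float(np.abs(weights).max()), 1e-12) / 127.0
    q = np.round(weights / scale).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Recover approximate float32 weights from the compact form.
    return q.astype(np.float32) * scale

# A random matrix standing in for one layer's weights (hypothetical;
# a real 70B-parameter model contains thousands of much larger tensors).
w = np.random.randn(1024, 1024).astype(np.float32)
q, scale = quantize_int8(w)
w_approx = dequantize(q, scale)

print("max reconstruction error:", float(np.abs(w - w_approx).max()))
print("bytes: float32 =", w.nbytes, " int8 =", q.nbytes)  # 4x smaller

Storing int8 values plus a single scale factor cuts memory by roughly 4x compared with float32 (more with 4-bit formats), which is why a quantized 70-billion-parameter model can fit on far more modest hardware, at the cost of some accuracy.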
Users speculated that "miqu" might be a new model that Mistral itself had leaked on the sly, especially given the company's reputation for quietly dropping new models and updates through unconventional, technically oriented channels, or that it was the work of a rogue employee or customer.
Confirmation from the Top
Now we finally have confirmation of the latter possibility: Arthur Mensch, co-founder and CEO of Mistral, stated on X: "An over-enthusiastic employee of one of our early access customers leaked a quantized (and watermarked) version of an old model we trained and distributed quite openly.
To quickly start working with a few selected customers, we retrained this model from Llama 2 the minute we got access to our entire cluster; the pretraining finished on the day of the Mistral 7B release. We've made good progress since then. Stay tuned!"
Given Mensch's "stay tuned!", Mistral appears to be training a successor to this so-called "miqu" model that could approach, match, or even surpass GPT-4's performance.
A Turning Point for Open-Source AI and Beyond?
This could be a watershed moment for open-source generative AI and for the entire AI and computer science field: since its release in March 2023, GPT-4 has remained the most powerful and best-performing LLM on most benchmarks worldwide. Even the currently available versions of Google's long-rumored Gemini have failed to surpass it.
If a GPT-4-class open-source model is released and can be used for free, it could pose serious competition to OpenAI and its subscription tiers, especially as more and more companies look to open-source models, or a mix of open-source and proprietary models, to power their applications. OpenAI may keep an edge with its faster GPT-4 Turbo and GPT-4V (vision), but the writing is on the wall: the open-source AI community is catching up fast. Does OpenAI have enough of a lead, with its GPT Store and other features, to hold onto its top position in the LLM landscape?