In a recent announcement, Meta introduced two new models in the Llama 4 series: the compact model Scout and the mid-sized model Maverick. According to Meta, Maverick surpasses GPT-4o and Gemini 2.0 Flash in several widely reported benchmarks.
Maverick quickly climbed to second place on the AI benchmarking site LMArena. This platform allows users to compare outputs from different systems and vote for their preferred results. In its press release, Meta highlighted that Maverick achieved an Elo rating of 1417, surpassing OpenAI's GPT-4o and trailing only Gemini 2.5 Pro. A higher Elo rating indicates more victories against competing models in head-to-head comparisons.
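For context, here is a minimal sketch of how an Elo-style rating update works after a single head-to-head vote. The K-factor and the ratings used below are illustrative examples, not LMArena's actual parameters or results.

```python
# Illustrative Elo-style update; the K-factor and ratings are
# made-up examples, not LMArena's actual parameters.

def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that model A beats model B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

def update_elo(rating_a: float, rating_b: float,
               a_won: bool, k: float = 32) -> tuple[float, float]:
    """Return both models' updated ratings after one comparison."""
    exp_a = expected_score(rating_a, rating_b)
    score_a = 1.0 if a_won else 0.0
    new_a = rating_a + k * (score_a - exp_a)
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - exp_a))
    return new_a, new_b

# Example: a 1417-rated model beats a 1400-rated model in one vote,
# so its rating rises slightly and its opponent's falls by the same amount.
print(update_elo(1417, 1400, a_won=True))
```

Because a win against a lower-rated opponent yields only a small gain, a rating as high as 1417 implies a long run of victories across many such comparisons.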
This achievement seemingly positions Meta’s open-source Llama 4 as a strong contender against the most advanced proprietary models from OpenAI, Anthropic, and Google. However, upon closer inspection of Meta’s documentation, AI researchers have uncovered some irregularities.
Meta admitted in the fine print of its documentation that the version of Maverick tested on LMArena was not identical to the one released to the public. According to Meta's own information, it deployed an "experimental chat-optimized version" specifically tailored for conversational ability on LMArena. TechCrunch was the first to report the discrepancy.
"Meta's interpretation of our policies does not align with the behavior we expect from model providers," LMArena posted on social media platform X two days after the model's release. "Meta should have clearly indicated that 'Llama-4-Maverick-03-26-Experimental' is a customized model optimized for human preferences. Therefore, we will update our leaderboard policy to strengthen our commitment to fair and reproducible evaluations and prevent such confusion in the future."
Ashley Gabrielle, a spokesperson for Meta, said in an emailed statement: "We have experimented with various types of customized variants. 'Llama-4-Maverick-03-26-Experimental' is a chat-optimized version we tried, and it performed exceptionally well on LMArena."