Internal Meta communications disclosed in an AI copyright case show that, while developing the Llama 3 model, executives and researchers were intent on surpassing OpenAI's GPT-4.
Ahmad Al-Dahle, Meta's Vice President of Generative AI, wrote in an October 2023 message to researcher Hugo Touvron: "To be honest... our target must be GPT-4. We will soon have 64,000 GPUs! We need to learn how to create cutting-edge technology and win this race."
Although Meta releases open-source AI models, its AI leadership is more focused on outperforming competitors like Anthropic and OpenAI, which typically do not release model weights and instead offer their models through APIs. Meta executives and researchers treat Anthropic's Claude and OpenAI's GPT-4 as the benchmarks to beat.
Mistral, a French AI startup and one of Meta's main competitors in the open-source domain, was mentioned frequently in the internal messages, often dismissively. Ahmad Al-Dahle stated: "Mistral is beneath us. We should be able to do much better."
Tech companies are racing to launch advanced AI models, and Meta's internal communications show how intensely competitive its AI leadership is. In multiple exchanges, they discussed their efforts to "aggressively" acquire data for training Llama; one executive even told colleagues, "Llama 3 is my only concern."
The plaintiffs in the case allege that Meta executives, in their haste to launch AI models, at times used copyrighted material, including books, to train them.
Touvron wrote that the dataset used for Llama 2 was "suboptimal" and described how Meta could improve on it for Llama 3 with a better mix of data sources. He and Ahmad Al-Dahle then talked through the obstacles to using the LibGen dataset, which includes copyrighted works from Cengage Learning, Macmillan Learning, McGraw Hill, and Pearson Education.
Ahmad Al-Dahle asked, "Do we have the right datasets? Is there anything you want to use but can't due to some silly reason?"
Meta CEO Mark Zuckerberg has previously said he is working to close the performance gap between Meta's Llama models and closed models from companies like OpenAI and Google. The internal messages reveal the immense pressure Meta was under to do so.
Zuckerberg wrote in a letter in July 2024: "This year, Llama 3 is competitive with the most advanced models and even leads in certain areas. Starting next year, we expect future Llama models to become the industry's most advanced."
In April 2024, Meta released Llama 3, an open-source AI model that rivals leading closed models from Google, OpenAI, and Anthropic and outperforms Mistral's open-source options. However, the data used to train these models, which Zuckerberg is said to have approved despite copyright concerns, is now under scrutiny in multiple lawsuits.