Thousands of Llama 3 model variants are now available on Hugging Face.

2024-04-23

Last week, Meta released an early version of Llama 3, its latest large language model, and the release has generated considerable excitement. Clem Delangue, co-founder and CEO of Hugging Face, said that 1,000 Llama 3 variants had already been shared publicly on Hugging Face and predicted that more than 10,000 would be available by the following weekend. The release also includes an image generator that updates images in real time as users type prompts.

Meta has released Llama 3 in two sizes, one with 8 billion parameters and one with 70 billion. Meta claims that on certain benchmarks both outperform comparably sized models such as Google's Gemma and Gemini, Mistral 7B, and Anthropic's Claude 3. In a Reddit discussion, one user claimed that the instruction-tuned Llama 3 8B scored better on benchmarks than the instruction-tuned Llama 2 70B.

Compared with Llama 2, Llama 3 enlarges the tokenizer vocabulary from 32,000 to 128,000 tokens. With the larger vocabulary, Llama 3 encodes sequences more efficiently, using 15% fewer tokens, and delivers better downstream performance.

Andrej Karpathy, former Director of AI at Tesla, welcomed the release of both base and fine-tuned models at 8 billion and 70 billion parameters. He also emphasized the need for smaller models, especially for educational purposes, unit testing, and potential embedded applications. Karpathy noted limitations as well: the longer sequence length is a step in the right direction, he said, but it still falls short of industry-leading standards.

Aravind Srinivas, CEO of Perplexity AI, stated, "The most impressive aspect of Llama 3 to me is how they have packed so much knowledge and reasoning into dense 8 billion and 70 billion parameters, while others are scaling up sparse MoEs." This does not mean that having many GPUs is unimportant.
Given how many training runs it takes to find the right data mixture, it may be even more important.

Pratik Desai, founder of Kissan AI, released Dhenu Llama 3, a model fine-tuned from Llama 3 8B. "It is available for anyone to play with and provide feedback. If you have spare GPUs, feel free to host and share. In the near future, we will have an instruction version with a five times larger dataset," Desai wrote on X.

Researchers are being served as well: a Reddit post introduced "Llama 3 Researcher," which uses Groq (GroqInc) to serve Llama 3 8B at 876 tokens per second, described as the fastest of any model benchmarked. According to Rowan Cheung, founder of the AI news website The Rundown AI, this amounts to a GPT-4-level chatbot that is completely free to use and runs at over 800 tokens per second on Groq. Brian Roemmele wrote that Groq's 800 tokens per second on Llama 3 points to new use cases in which local AI agents carry out multiple actions.

Yann LeCun, Chief AI Scientist at Meta, revealed that the company is developing even more powerful language models, noting that the most advanced Llama model, with over 400 billion parameters, is still being trained. The newly released models will be integrated into Meta AI, the company's virtual assistant, which Meta claims is the most advanced of the free assistants.

Jim Fan of NVIDIA stated that the upcoming Llama 3 400B+ will mark a turning point at which the community gains open-weight access to a GPT-4-level model, and that it will change the calculus for many research efforts and grassroots startups. "I compared the data of Claude 3 Opus, GPT-4, and Gemini. Llama 3 400B is still in training, hoping to get better in the coming months," he added, highlighting the immense research potential that such a powerful model can unlock. Expect a surge in energy from builders across the entire ecosystem!
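
To put two of the figures quoted above in perspective, the 15% token reduction and the 876 tokens per second reported for Llama 3 8B on Groq, here is a minimal back-of-the-envelope sketch in Python. The function names and the 1,000-token example are hypothetical and chosen only for illustration; the sketch assumes the 15% saving applies uniformly to any text and that throughput is constant, neither of which the reports above guarantee.

```python
def tokens_after_reduction(baseline_tokens: int, reduction: float = 0.15) -> int:
    """Token count after a fractional reduction (0.15 means 15% fewer tokens).

    Assumption: the reduction applies uniformly, which real text will not
    satisfy exactly.
    """
    return round(baseline_tokens * (1.0 - reduction))


def generation_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Time to emit num_tokens at a constant throughput (an idealization)."""
    return num_tokens / tokens_per_second


if __name__ == "__main__":
    # Hypothetical prompt that takes 1,000 tokens under the older tokenizer.
    old_count = 1000
    new_count = tokens_after_reduction(old_count)
    print(f"{old_count} tokens would shrink to {new_count} tokens")

    # Time to generate that many tokens at the reported 876 tokens per second.
    seconds = generation_seconds(new_count, 876.0)
    print(f"Generating {new_count} tokens takes roughly {seconds:.2f} s")
```

Under these assumptions, a shorter token sequence and a faster serving rate compound: fewer tokens are needed for the same text, and each token arrives sooner.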