Meta's long-awaited Llama3 model has finally made its debut, bringing a series of notable technological advances. Though it again ships as relatively small models, in 8B and 70B parameter variants, Llama3 maintains Meta's focus on high-quality training data and robust safety measures. Compared to the previous Llama2 model, Meta expanded Llama3's training dataset sevenfold, training on over 15 trillion tokens. To squeeze maximum quality out of a comparatively modest parameter count, Meta also built its own data pipelines, filters, and heuristic-based cleaning methods.
For Meta's model series, Llama3 is undoubtedly a significant step forward. As the company refines its processes and releases versions with larger parameter counts and training datasets, performance should continue to improve. A fully multimodal version is planned, along with a 400B parameter version and multilingual support. But what sets Llama3 apart from OpenAI's GPT models or Google's Gemini? Below, we explore several key reasons why Llama3 matters.
1. Llama3 is completely open to developers for free
Meta has taken a radically different approach from OpenAI. What distinguishes Meta in the AI field is how openly available and portable it has made its models. Like companies such as Mistral, Meta lets anyone use its models for free, under licenses that permit both commercial and research use. The company releases its models actively, aiming to accelerate AI development, and has committed to early support on cloud platforms like AWS and Databricks, as well as the tooling developers need to fine-tune the models locally.
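Because the weights are openly available, getting started can be as simple as formatting a prompt and calling the model yourself. As a minimal sketch, the function below assembles a single-turn prompt in the Llama3 Instruct chat format; the special tokens (`<|begin_of_text|>`, `<|start_header_id|>`, `<|eot_id|>`) follow Meta's published format, while the function name is our own:

```python
def build_llama3_prompt(system: str, user: str) -> str:
    """Assemble a single-turn prompt in the Llama3 Instruct chat format."""
    return (
        "<|begin_of_text|>"
        "<|start_header_id|>system<|end_header_id|>\n\n"
        f"{system}<|eot_id|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{user}<|eot_id|>"
        # The prompt ends with an open assistant header, so the model
        # continues from here with its reply.
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )

prompt = build_llama3_prompt("You are a helpful assistant.", "What is Llama3?")
print(prompt)
```

In practice, toolchains such as Hugging Face Transformers apply this template automatically through the tokenizer's chat template, so you rarely need to build it by hand; the point is that the format is documented and nothing about it is proprietary.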
Meta clearly aims to build an ecosystem and toolchain around its AI models and actively embrace the large online community that builds, trains, and fine-tunes freely available models for various applications. This contrasts sharply with the more "product-driven" approach of companies like OpenAI and Google. This strategy may be Meta's attempt to avoid the "curse" of traditional vendors in the technology field, where they invest significant resources in developing products early in the market, only to be quickly surpassed by other innovators. Llama3 is expected to be a catalyst for inspiring more AI innovation and investment, while also sharing some of the workload in understanding and pushing the capabilities of models with Meta.
2. Meta takes AI protection measures seriously
Llama Guard 2 and Cybersec Eval 2 aim to provide comprehensive protection for models and user safety
Llama3 not only brings technological innovations but also introduces a "system-level" approach to AI responsibility, which is apparently a topic that other major players in the AI field are reluctant to discuss. Part of the reason for this is Meta's open availability strategy for its models, which may to some extent weaken certain protection measures relied upon by OpenAI and Google's Gemini models. However, Meta is eager to emphasize the protection measures it has taken during the training and fine-tuning stages, with the most notable being the newly introduced Llama Guard 2.
Llama Guard 2 is a standalone 8B-parameter LLM (itself built on Llama3, somewhat ironically). Its job is to act as an input/output safeguard for Llama3: it classifies incoming prompts and outgoing responses, filtering unsafe content so the system stays secure and reliable across a wide range of requests.
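The guardrail pattern itself is straightforward. The sketch below uses toy stand-in functions in place of real Llama Guard 2 and Llama3 calls (every name here is hypothetical, for illustration only), to show the input and output screening described above:

```python
from typing import Callable

def guarded_generate(
    user_prompt: str,
    is_safe: Callable[[str], bool],   # stand-in for a Llama Guard 2 check
    generate: Callable[[str], str],   # stand-in for a Llama3 call
    refusal: str = "Sorry, I can't help with that.",
) -> str:
    """Wrap a model call with safety checks on both input and output."""
    if not is_safe(user_prompt):      # screen the incoming request
        return refusal
    response = generate(user_prompt)
    if not is_safe(response):         # screen the outgoing response
        return refusal
    return response

# Toy stand-ins: flag any text containing "unsafe" as disallowed.
toy_classifier = lambda text: "unsafe" not in text
toy_model = lambda prompt: f"Answer to: {prompt}"

print(guarded_generate("hello", toy_classifier, toy_model))
print(guarded_generate("unsafe request", toy_classifier, toy_model))
```

The real classifier is, of course, an 8B LLM rather than a substring check, but the control flow around the main model is essentially this.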
This innovative move not only demonstrates Meta's determination and strength in AI protection but also sets a new benchmark for the entire industry. By adopting stricter protection measures, Meta aims to provide users with safer and more reliable AI services while ensuring that the model itself is not subject to malicious attacks or abuse.
3. Llama3 prioritizes quality over scale
Meta's models are highly compatible and easily adaptable to your personal computer
Compared to competitors in the market, Llama3 posts excellent benchmark scores for both the 8B and 70B parameter models. Once again, Meta demonstrates its strategy of training smaller models while prioritizing high-quality data, and the advantages are clear: computational cost drops significantly and training completes faster. Although training Llama3 still required custom-built 24,000-GPU clusters of NVIDIA hardware, Meta has stepped back from the race for ever-larger parameter counts pursued by the biggest LLMs (such as GPT-4, reportedly over a trillion parameters) and instead focuses on building a high-quality training dataset.
This strategy not only improves training efficiency but also brings other benefits. Running Llama3 on local machines becomes far more practical (although even the 8B model still demands considerable computational resources). That accessibility is a boon for developers, startups, and would-be AI disruptors, who can use state-of-the-art models without heavy upfront capital investment. The launch of Llama3 opens new possibilities for the whole industry and gives innovators more room to build.
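To put "considerable computational resources" in concrete terms, here is a back-of-envelope estimate of the memory needed for the model weights alone (it ignores activations, KV cache, and framework overhead; the function is illustrative, not anything from Meta):

```python
def weight_memory_gib(params_billion: float, bits_per_weight: int) -> float:
    """Estimate memory for model weights only: parameter count times
    bytes per weight, converted to GiB."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30

# Rough weight footprint of the 8B model at common precisions.
for bits in (16, 8, 4):
    print(f"8B model at {bits}-bit: ~{weight_memory_gib(8, bits):.1f} GiB")
```

At 16-bit precision the 8B model's weights alone come to roughly 15 GiB, which is why quantized 4-bit variants are the usual route to running it on a consumer GPU or laptop.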
4. This is just the beginning for Llama3
Meta plans to release a more powerful 400B parameter version, and that model is currently still in training. One highlight of the Llama3 launch is how transparently Meta has disclosed its roadmap: multimodal support, multilingual support, and the upcoming 400B model. A larger parameter count means a bigger model, but it should also substantially improve capability and will no doubt be welcomed.
Multilingual support is undoubtedly a challenge, as Llama3 is currently trained primarily on English data. We speculate that Meta may be working behind the scenes to improve its data-processing pipeline and its ability to perform RLHF (Reinforcement Learning from Human Feedback) and fine-tuning across languages. We look forward to a multilingual version of Llama3 and hope it signals multilingual support in Meta's future model releases.
In addition, multimodal support (i.e., generating and ingesting images and video) is also part of Meta's plans. Although Meta released a separate image generator alongside Llama3, it chose not to include true multimodal support for now, citing the limitations of current multimodal models. Still, we are optimistic about what comes next, especially given Meta's firm commitment to model safety.
The Future Outlook of Llama3
Among the many companies in the AI field, Meta stands out. While Meta did ship another AI assistant alongside Llama3, it does not seem eager to chase market trends. Though still catching up to Google and OpenAI in some respects, Meta's models are improving across the areas that matter: convenient support for developers, scalability, platform adoption, and the safety of general-purpose AI. These are often neglected by companies racing to push products to market.
Whether Meta will succeed on these fronts is hard to say. Its approach is unusual, whether it reflects a lack of urgency or deliberate patience. Either way, we are excited about Llama3's future, and we have reason to believe that as the technology advances and Meta keeps at it, Llama3 will deliver more surprises and breakthroughs.