NVIDIA Open-Sources General-Purpose Large Model Nemotron-4 340B

2024-06-17

NVIDIA has officially released the Nemotron-4 340B model family under the permissive NVIDIA Open Model License. The family comprises a base model, an instruction-tuned model, and a reward model, and it pushes the boundary of open access in AI while remaining efficient to deploy: all three models have been optimized to run on a single NVIDIA DGX H100 system with just eight GPUs, putting them within reach of a much wider range of researchers and developers. Across numerous benchmarks they match or surpass existing open models, including Llama-3 70B and Mixtral 8x7B, and NVIDIA expects them to benefit the AI community in both research and commercial applications.

One headline application is generating high-quality synthetic data for training smaller, more specialized models, and synthetic data was in fact central to Nemotron-4's own development: over 98% of the data used to align the Nemotron-4 340B instruction model was synthesized using the base and reward models. NVIDIA will also release the pipeline used to generate this synthetic data, so that others can adopt the same approach (a schematic sketch of the pattern appears at the end of this article).

The Nemotron-4 340B base model was trained on a dataset of trillions of tokens spanning English text, multilingual data, and programming languages. It uses a standard transformer architecture enhanced with grouped-query attention and rotary position embeddings (both sketched below). On top of this foundation, the instruction model was aligned through supervised fine-tuning and preference optimization on a mix of human annotations and synthetic data. NVIDIA developed a novel "iterative weak-to-strong alignment" approach, in which each generation of models creates higher-quality synthetic data for training the next.

The Nemotron-4 340B reward model also performs strongly, ranking near the top of the RewardBench leaderboard. Rather than emitting a single score, it predicts rewards along fine-grained attributes such as helpfulness, coherence, and verbosity, and these attribute-level scores drive the preference optimization of the instruction model.

Benchmark evaluations bear out these capabilities: the base model is on par with leading open models, the instruction model excels at following complex instructions and sustaining coherent conversations, and the reward model surpasses some well-known proprietary systems. Human evaluations further corroborate these results.
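To make the synthetic-data recipe concrete, here is a minimal sketch of the generate-and-filter pattern described above: an instruction model drafts several candidate responses per prompt, the reward model scores them, and only the best candidates are kept for later fine-tuning. The names `generate`, `score`, and `build_synthetic_dataset` are hypothetical stand-ins for illustration, not NVIDIA's actual pipeline API.

```python
# A minimal sketch of the generate-and-filter pattern behind the synthetic
# data described above. `generate` and `score` are hypothetical stand-ins
# for the instruction model and the reward model, not NVIDIA's pipeline API.
def build_synthetic_dataset(prompts, generate, score, n_candidates=4, min_score=None):
    dataset = []
    for prompt in prompts:
        # Draft several candidate responses per prompt.
        candidates = [generate(prompt) for _ in range(n_candidates)]
        # Keep the candidate the reward model rates highest.
        best = max(candidates, key=lambda c: score(prompt, c))
        if min_score is None or score(prompt, best) >= min_score:
            dataset.append({"prompt": prompt, "response": best})
    return dataset

# Toy usage with stand-in callables.
data = build_synthetic_dataset(
    prompts=["Explain grouped-query attention."],
    generate=lambda p: f"A draft answer to: {p}",
    score=lambda p, r: len(r),  # placeholder scoring rule
)
print(data)
```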
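The "iterative weak-to-strong alignment" approach can be sketched the same way: each generation's model produces the filtered synthetic data used to train a stronger successor. Again, `synthesize` and `finetune` are hypothetical placeholders, not the published method's interface.

```python
# A schematic of the iterative weak-to-strong alignment loop: each
# generation's model produces the filtered synthetic data used to train a
# stronger successor. `synthesize` and `finetune` are hypothetical
# placeholders for the data pipeline and training step.
def weak_to_strong_alignment(model, reward_model, prompts, synthesize, finetune, n_generations=3):
    for _ in range(n_generations):
        # The current (weaker) model drafts data; the reward model filters it.
        data = synthesize(model, reward_model, prompts)
        # The next, stronger generation is trained on that filtered data.
        model = finetune(model, data)
    return model
```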
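For readers unfamiliar with the two architectural techniques named above, the following is a minimal PyTorch sketch of grouped-query attention combined with rotary position embeddings. The shapes and head counts are illustrative and are not Nemotron-4's actual configuration.

```python
# A minimal PyTorch sketch of grouped-query attention (GQA) with rotary
# position embeddings (RoPE). Shapes and head counts are illustrative,
# not Nemotron-4's actual configuration.
import torch
import torch.nn.functional as F

def rope(x, base=10000.0):
    # x: (batch, heads, seq, head_dim). Rotate channel pairs by a
    # position-dependent angle so attention depends on relative offsets.
    t, d = x.shape[-2], x.shape[-1]
    pos = torch.arange(t, dtype=x.dtype)
    freqs = base ** (-torch.arange(0, d, 2, dtype=x.dtype) / d)
    angles = pos[:, None] * freqs[None, :]          # (seq, head_dim/2)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., 0::2], x[..., 1::2]
    return torch.stack((x1 * cos - x2 * sin,
                        x1 * sin + x2 * cos), dim=-1).flatten(-2)

def grouped_query_attention(q, k, v):
    # q: (batch, n_heads, seq, dim); k, v: (batch, n_kv_heads, seq, dim).
    # Each group of query heads shares one key/value head, shrinking the
    # KV cache relative to full multi-head attention.
    group = q.shape[1] // k.shape[1]
    k = k.repeat_interleave(group, dim=1)           # broadcast KV heads
    v = v.repeat_interleave(group, dim=1)
    q, k = rope(q), rope(k)                         # RoPE on queries and keys
    return F.scaled_dot_product_attention(q, k, v, is_causal=True)

# Toy usage: 8 query heads sharing 2 key/value heads.
q = torch.randn(1, 8, 16, 64)
k = torch.randn(1, 2, 16, 64)
v = torch.randn(1, 2, 16, 64)
print(grouped_query_attention(q, k, v).shape)       # torch.Size([1, 8, 16, 64])
```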
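Finally, a multi-attribute reward model of the kind described above can be pictured as a small head over a language model's final hidden state, predicting one scalar per attribute. The attribute names follow the article; the head design here is an illustrative assumption, not Nemotron-4's published architecture.

```python
# A minimal sketch of a multi-attribute reward head: a linear layer over a
# language model's final hidden state that predicts one scalar per
# attribute. The attribute names follow the article; the head design is an
# illustrative assumption, not Nemotron-4's published architecture.
import torch
import torch.nn as nn

ATTRIBUTES = ["helpfulness", "coherence", "verbosity"]

class MultiAttributeRewardHead(nn.Module):
    def __init__(self, hidden_size, n_attributes=len(ATTRIBUTES)):
        super().__init__()
        self.proj = nn.Linear(hidden_size, n_attributes)

    def forward(self, last_hidden_state):
        # Score the final token's hidden state, a common convention for
        # sequence-level reward models: (batch, seq, hidden) -> (batch, attrs).
        return self.proj(last_hidden_state[:, -1, :])

head = MultiAttributeRewardHead(hidden_size=4096)
scores = head(torch.randn(2, 10, 4096))             # two toy sequences
print(dict(zip(ATTRIBUTES, scores[0].tolist())))
```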