NVIDIA Unveils Upgraded NeMo Framework, Boosting LLM Training Efficiency

2023-12-06

NVIDIA has updated its NeMo framework and improved large language model (LLM) training performance on the H200 GPU. The releases target developers and researchers working on AI foundation models such as Llama 2 and Nemotron-3. The updated NeMo framework is now cloud-native, supports a broader range of model architectures, and employs advanced parallelism techniques for efficient training. On the hardware side, the H200 GPU delivers a significant jump in Llama 2 training performance over previous-generation GPUs. Announced on December 4th, these tools are now globally available and serve applications ranging from academic research to industry deployments.

The updates address the growing demand for higher training performance across complex and diverse large language models. They focus on accelerating the training process, improving efficiency, and expanding model capabilities, all of which matter for computationally intensive models. Specific enhancements include an improved mixed-precision implementation, optimized activation functions, and more efficient communication. With these optimizations, the H200 GPU reaches 836 TFLOPS per GPU, substantially increasing training throughput. The introduction of Fully Sharded Data Parallelism (FSDP) and a Mixture of Experts (MoE) architecture further optimizes training and expands model capacity, while TensorRT-LLM accelerates reinforcement learning from human feedback (RLHF), supporting larger models and improving performance. Illustrative sketches of the mixed-precision, FSDP, and MoE techniques follow below.

The NeMo framework is available as an open-source library and as containers on NGC, and it is included in NVIDIA AI Enterprise. NVIDIA also offers additional resources, such as the GTC conference, webinars, and SDKs, for working further with its AI tools.
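The announcement does not include code, but as a rough illustration of what a mixed-precision training step involves, here is a minimal sketch using PyTorch's generic torch.cuda.amp API. This is not NeMo's actual interface, and the layer sizes and learning rate are placeholders chosen only for the example.

```python
# Minimal, generic mixed-precision training step (illustrative only; not NeMo's API).
import torch
from torch import nn
from torch.cuda.amp import autocast, GradScaler

model = nn.Linear(4096, 4096).cuda()              # stand-in for a transformer block
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
scaler = GradScaler()                             # rescales gradients to avoid FP16 underflow

def train_step(batch, target):
    optimizer.zero_grad(set_to_none=True)
    with autocast(dtype=torch.float16):           # run matmuls in half precision
        loss = nn.functional.mse_loss(model(batch), target)
    scaler.scale(loss).backward()                 # backprop on the scaled loss
    scaler.step(optimizer)                        # unscale gradients, then optimizer step
    scaler.update()                               # adapt the loss scale for the next step
    return loss.item()
```

The point of the scaler is to keep small FP16 gradients from underflowing while most of the arithmetic runs in reduced precision, which is where the throughput gains come from.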
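Fully Sharded Data Parallelism can likewise be sketched with PyTorch's stock FSDP wrapper. NeMo's own integration is driven by its configuration files rather than a direct call like this, so treat the snippet below as a generic illustration of the technique, with a placeholder model and a launch via torchrun assumed.

```python
# Generic FSDP wrapping sketch (illustrative only; launched with torchrun, one process per GPU).
import torch
import torch.distributed as dist
from torch import nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def build_sharded_model():
    dist.init_process_group("nccl")               # set up the per-GPU process group
    torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())
    model = nn.Sequential(                        # stand-in for a large transformer
        nn.Linear(4096, 16384), nn.GELU(), nn.Linear(16384, 4096)
    ).cuda()
    # Parameters, gradients, and optimizer state are sharded across ranks,
    # so per-GPU memory use shrinks as the number of GPUs grows.
    return FSDP(model)
```

Sharding optimizer state and parameters, instead of replicating them on every GPU as classic data parallelism does, is what lets FSDP fit larger models per device.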
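Finally, the Mixture of Experts idea can be shown with a toy routing layer. The sketch below is a simplified top-2 router in plain PyTorch; it omits the expert parallelism, load balancing, and capacity limits a production MoE implementation needs, and all dimensions and names are hypothetical.

```python
# Toy top-2 Mixture of Experts layer (illustrative only; no expert parallelism or load balancing).
import torch
from torch import nn

class TinyMoE(nn.Module):
    def __init__(self, d_model=512, d_ff=2048, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, num_experts)   # scores each token for each expert
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):                               # x: (tokens, d_model)
        weights = torch.softmax(self.router(x), dim=-1)
        top_w, top_idx = weights.topk(self.top_k, dim=-1)
        top_w = top_w / top_w.sum(dim=-1, keepdim=True) # renormalize over the kept experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = top_idx[:, slot] == e            # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += top_w[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

# Usage: y = TinyMoE()(torch.randn(16, 512))
```

Because each token activates only its top-k experts, total parameter count (capacity) can grow with the number of experts while per-token compute stays roughly constant, which is the trade-off MoE architectures exploit.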