New MLPerf Benchmark Results: NVIDIA Leads, Intel Impresses
MLCommons has released the latest MLPerf Inference v4.0 benchmark results, highlighting the rapid progress in AI hardware and software. The new benchmark introduces larger generative AI models and showcases impressive performance improvements achieved by leading technology companies.
The MLPerf Inference v4.0 benchmark suite, developed by MLCommons, is an industry-standard tool for measuring the performance of machine learning (ML) systems across a range of deployment scenarios. To keep pace with the fast-moving field of generative AI, the MLPerf Inference working group formed a task force to decide which models to add for v4.0. After careful evaluation, the suite gained two new benchmarks: Llama 2 70B and Stable Diffusion XL.
Llama 2 70B has 70 billion parameters, an order of magnitude more than the 6-billion-parameter GPT-J model introduced in MLPerf Inference v3.1. A model at this scale demands a different class of hardware and makes an excellent stress test for high-end systems; its inclusion in v4.0 marks a significant jump in benchmark model size and reflects how quickly generative AI models are growing.
Stable Diffusion XL, with 2.6 billion parameters, is a popular text-to-image generative AI model that creates images from text prompts. The benchmark measures metrics such as latency and throughput by having the system under test generate a large number of images and timing the results.
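As a rough illustration of how such metrics are derived, the sketch below times a hypothetical generate_image() call over a set of prompts and reports mean latency, tail latency, and overall throughput. The function name and prompt handling are placeholders for illustration only, not part of the official MLPerf harness.

```python
import time
import statistics

def generate_image(prompt: str) -> bytes:
    """Placeholder for a text-to-image model call (e.g. a Stable Diffusion XL pipeline)."""
    raise NotImplementedError("plug in an actual diffusion pipeline here")

def benchmark(prompts: list[str]) -> None:
    latencies = []
    start = time.perf_counter()
    for prompt in prompts:
        t0 = time.perf_counter()
        generate_image(prompt)                       # one image per prompt
        latencies.append(time.perf_counter() - t0)   # per-image latency
    total = time.perf_counter() - start

    p99 = sorted(latencies)[max(0, int(0.99 * len(latencies)) - 1)]
    print(f"images generated : {len(prompts)}")
    print(f"mean latency     : {statistics.mean(latencies):.3f} s")
    print(f"p99 latency      : {p99:.3f} s")
    print(f"throughput       : {len(prompts) / total:.2f} samples/s")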
The MLPerf Inference v4.0 results include over 8,500 performance results and 900 power results from 23 submitting organizations. Four companies (Dell, Fujitsu, NVIDIA, and Qualcomm Technologies) submitted data-center power results for this round, demonstrating continued progress in energy-efficient AI acceleration.
NVIDIA, running TensorRT-LLM software on its Hopper architecture, delivered the top generative AI results in this round. The H200 GPU, equipped with 141GB of HBM3e memory and 4.8TB/s of bandwidth, set a record of around 31,000 tokens per second on the Llama 2 70B benchmark, a 45% improvement over the H100 GPU.
TensorRT-LLM, NVIDIA's software for optimizing large language model inference, played a crucial role in these performance improvements. It nearly tripled the performance of the Hopper GPU on the GPT-J LLM compared to results from just six months ago in MLPerf Inference v3.1. This demonstrates the powerful capabilities of NVIDIA's full-stack approach, optimizing both hardware and software for generative AI workloads.
NVIDIA also showed strong performance on the Stable Diffusion XL benchmark, with an 8-GPU NVIDIA HGX H200 system achieving 13.8 queries per second in the server scenario and 13.7 samples per second in the offline scenario. Additionally, custom thermal designs for the H200, such as those built on the MGX platform, can improve performance by up to 14% compared to standard air-cooled variants.
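The server and offline numbers measure different things: offline reports pure batch throughput over a workload that is available up front, while server reports the query rate a system can sustain while still meeting a tail-latency bound. The sketch below illustrates that distinction under stated assumptions; run_inference() is a hypothetical stand-in for the system under test, and this is not MLPerf's actual LoadGen harness.

```python
import time

def run_inference(batch):
    """Placeholder for the system under test (e.g. an SDXL pipeline on one or more GPUs)."""
    raise NotImplementedError

def offline_throughput(samples, batch_size=8):
    """Offline scenario: all samples are available up front; report samples per second."""
    start = time.perf_counter()
    for i in range(0, len(samples), batch_size):
        run_inference(samples[i:i + batch_size])
    return len(samples) / (time.perf_counter() - start)

def server_run_is_valid(latencies, bound_s, percentile=0.99):
    """Server scenario: queries arrive individually; a run only counts if the chosen
    tail-latency percentile stays under the benchmark's latency bound."""
    ordered = sorted(latencies)
    idx = max(0, int(percentile * len(ordered)) - 1)
    return ordered[idx] <= bound_s
```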
In MLPerf Inference v4.0, Intel's Gaudi 2 accelerator remains the only benchmarked alternative to NVIDIA's H100 GPU in generative AI performance. While Gaudi 2's performance lags behind NVIDIA's products, Intel claims it offers strong cost-effectiveness, an important consideration for total cost of ownership.
Intel's 5th generation Xeon Scalable processors with Intel Advanced Matrix Extensions (AMX) also posted significant gains in MLPerf Inference v4.0. Compared with the previous generation, the 5th generation Xeon delivered an average performance improvement of 1.42x across MLPerf categories. Notably, on the GPT-J benchmark it improved 1.8x over its v3.1 submission, thanks to software optimizations such as continuous batching.
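Continuous batching (sometimes called in-flight batching) keeps the hardware busy by admitting new requests into the running batch as soon as earlier ones finish, rather than waiting for an entire static batch to drain. The simplified scheduling loop below sketches the idea; decode_step() is a hypothetical stand-in for one token-generation step of the model, not Intel's or any framework's actual implementation.

```python
from collections import deque

def decode_step(active_requests):
    """Placeholder: run one token-generation step for every active request and
    return the subset of requests that just produced their final token."""
    raise NotImplementedError

def continuous_batching(pending, max_batch_size=8):
    """Admit waiting requests whenever a slot frees up, instead of waiting for
    the whole batch to complete as static batching would."""
    pending = deque(pending)
    active, completed = [], []
    while pending or active:
        # Fill any free slots with waiting requests before every decode step.
        while pending and len(active) < max_batch_size:
            active.append(pending.popleft())
        finished = decode_step(active)
        for req in finished:
            active.remove(req)
            completed.append(req)
    return completed
```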
As the demand for generative AI continues to grow, hardware and software vendors are constantly pushing the limits of performance. Just last week at the GTC conference, NVIDIA's founder and CEO, Jensen Huang, announced the upcoming Blackwell GPU, which will provide new levels of performance for trillion-parameter AI models.
Meanwhile, Intel continues to expand the AI capabilities across its product portfolio, giving customers a range of solutions for their diverse AI needs.
The MLPerf Inference benchmark is a valuable tool for customers to evaluate AI performance and make informed decisions when selecting systems for specific workloads. As the industry-standard suite evolves to incorporate more generative AI models and real-world scenarios, it will keep driving innovation and competition across the AI hardware and software field.