NVIDIA Unveils DGX SuperPOD: The Most Powerful AI Computing Platform
At this year's GTC conference, NVIDIA unveiled the new DGX SuperPOD, its most powerful system to date, as part of a broad slate of hardware and software announcements.
In recent years, DGX has become one of NVIDIA's flagship server and cloud offerings. The new DGX SuperPOD is built on NVIDIA's next-generation AI accelerator GPU, Blackwell, the successor to the Hopper GPU, which was also announced at GTC. NVIDIA positions Blackwell as the engine for AI models with trillions of parameters.
The DGX SuperPOD integrates the GB200 superchip version of Blackwell, which combines CPU and GPU resources; NVIDIA's earlier Grace Hopper superchips were the core of the previous-generation DGX. Existing DGX systems are already widely used in areas such as drug discovery, healthcare, fraud detection, financial services, recommendation systems, and consumer internet.
Ian Buck, NVIDIA's Vice President of Hyperscale and High-Performance Computing, said at a press briefing, "It is a world-class supercomputing platform that is ready to use. It supports NVIDIA's complete AI software stack and provides unparalleled reliability and scalability."
What's inside the DGX SuperPOD?
Although the name "SuperPOD" may sound like marketing hyperbole, the hardware inside NVIDIA's new DGX system is impressive.
The DGX SuperPOD is not a single rack-mount server but a combination of multiple DGX GB200 systems. Each DGX GB200 system contains 36 NVIDIA GB200 superchips (36 NVIDIA Grace CPUs and 72 NVIDIA Blackwell GPUs) connected into a single supercomputer via fifth-generation NVIDIA NVLink.
What makes the SuperPOD "super" is that it can be configured with eight or more DGX GB200 systems and can scale to tens of thousands of GB200 superchips connected through NVIDIA Quantum InfiniBand.
The system can provide 240TB of memory, which is crucial for training large language models (LLMs) and for generative AI inference at scale. Another impressive number: NVIDIA claims the DGX SuperPOD delivers 11.5 exaflops of AI supercomputing performance.
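The figures quoted above are enough to sketch the scale of a base configuration. The following is an illustrative back-of-the-envelope calculation using only the numbers in this article (the eight-system minimum and the per-system chip counts); actual shipping configurations may differ.

```python
# Illustrative arithmetic from the figures quoted in the article; not an official spec.
SYSTEMS_PER_SUPERPOD = 8     # minimum DGX GB200 systems in a SuperPOD
SUPERCHIPS_PER_SYSTEM = 36   # GB200 superchips per DGX GB200 system
GPUS_PER_SYSTEM = 72         # Blackwell GPUs (each GB200 pairs one Grace CPU with two GPUs)

superchips = SYSTEMS_PER_SUPERPOD * SUPERCHIPS_PER_SYSTEM  # 288 superchips
gpus = SYSTEMS_PER_SUPERPOD * GPUS_PER_SYSTEM              # 576 GPUs

memory_tb = 240        # total fast memory quoted for the SuperPOD
ai_exaflops = 11.5     # quoted AI compute

print(f"{superchips} superchips, {gpus} GPUs")
print(f"~{memory_tb / gpus * 1000:.0f} GB of memory per GPU")            # ~417 GB
print(f"~{ai_exaflops * 1000 / gpus:.0f} PFLOPS of AI compute per GPU")  # ~20 PFLOPS
```

The implied ~20 petaflops per GPU is consistent with the low-precision (FP4) throughput NVIDIA advertises for Blackwell, which suggests the 11.5-exaflop headline figure is quoted at that precision.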
Advanced networking and data processing units power the generative AI SuperPOD architecture
At the core of the DGX SuperPOD's "super" capability is its ability to connect so many GB200 systems through a unified computing fabric.
Driving this architecture is NVIDIA's latest Quantum-X800 InfiniBand networking technology, which provides up to 1,800 GB/s of bandwidth to each GPU in the platform.
The DGX also integrates NVIDIA's BlueField-3 DPU (Data Processing Unit) and the fifth-generation NVIDIA NVLink interconnect technology.
In addition, the new SuperPOD includes the fourth generation of NVIDIA's Scalable Hierarchical Aggregation and Reduction Protocol (SHARP). According to NVIDIA, the new version of SHARP in the next-generation DGX SuperPOD architecture delivers 14.4 teraflops of in-network computing, a fourfold increase over the previous generation.
Blackwell comes to NVIDIA DGX Cloud
DGX systems based on the new GB200 will also be available through NVIDIA's DGX Cloud service.
The capabilities of GB200 will first be available on Amazon Web Services (AWS), Google Cloud, and Oracle Cloud.
Buck said, "DGX Cloud is a service we have designed in deep collaboration with our cloud partners to deliver the best NVIDIA technology, both for our own AI research and product development and for our customers."
The new GB200 will also power the Ceiba supercomputer project, a collaboration between NVIDIA and Amazon Web Services (AWS) first announced in November 2023. The Ceiba project aims to build the world's largest public-cloud AI supercomputing platform using DGX Cloud.
Buck said, "I am pleased to announce that the Ceiba project has made breakthrough progress, and we have now upgraded it to support the Grace Blackwell architecture with 20,000 GPUs. It will provide over 400 exaflops of AI computing power."