NVIDIA Leads the High-Performance Computing Field with InfiniBand Technology

2023-11-28

"The vast majority of dedicated, large-scale AI factories have standardized on InfiniBand," Jensen Huang said during NVIDIA's earnings call. NVIDIA has been performing strongly, and its latest third-quarter report reflects the tech giant's seemingly unstoppable growth: revenue of $18.12 billion, up 206% year over year and 34% quarter over quarter. NVIDIA attributes this growth to the continued ramp of its NVIDIA HGX platform together with end-to-end networking built on InfiniBand.

The company announced that its networking business has reached an annualized revenue run rate of more than $10 billion, nearly triple that of a year ago, driven by demand for InfiniBand, which has grown roughly fivefold year over year. InfiniBand is considered crucial for achieving the scale and performance required to train large language models (LLMs); combined with NVIDIA HGX, it forms the infrastructure of AI supercomputers and data centers.

InfiniBand is typically used to interconnect servers in supercomputing environments. Its greatest advantage is low-latency, high-bandwidth communication, which is essential for parallel processing workloads (a short code sketch at the end of this article illustrates the kind of collective operation that makes up this traffic). NVIDIA's Quantum InfiniBand switches are said to meet these requirements for large datasets and high-resolution simulations at lower cost and complexity.

A few months ago, NVIDIA reported breakthrough benchmark results with its flagship H100 GPU. The tests ran on 3,584 H100 GPUs connected over InfiniBand, showing that the GPUs deliver strong performance both individually and at scale when paired with high-performance networking.

On the future of InfiniBand, Huang said that the vast majority of dedicated, large-scale AI factories have standardized on it, not only because of its data rates and latency but also because of the way traffic moves through the network; he described it as a "computing fabric." Comparing it with Ethernet, he pointed to significant differences between the two: with an AI factory representing an investment on the order of $2 billion, even small deviations in network behavior translate into millions of dollars of changes and compound into significant costs over the next four to five years. Huang called InfiniBand's value proposition for AI factories "undeniable."

Ethernet is not ruled out, however. InfiniBand is used where high bandwidth and low latency are required, while Ethernet covers other scenarios. Ethernet is the widely used, general-purpose networking technology for wired local area networks (LANs); it suits a broad range of applications and is primarily used to connect end devices, but its capabilities cannot match InfiniBand's. Notably, NVIDIA also offers gateway devices that connect InfiniBand data centers to Ethernet-based infrastructure and storage, and in the first quarter of next year it will release Spectrum-X, an Ethernet product said to deliver 1.6 times the network performance of other available Ethernet technologies.

For comparison, Intel's Omni-Path Architecture (OPA) was likewise designed for low-latency communication in high-speed data transfer and high-performance computing environments; it was released in 2016 and discontinued in 2019. With both GPU and networking products on offer, enterprises can now choose to build their entire architecture on NVIDIA products.
In addition to discussing partnerships with Reliance, Infosys, and Tata, the company mentioned collaborations with multiple organizations to optimize their use of InfiniBand for AI computing. During the earnings call, NVIDIA discussed its partnership with Scaleway, a French private cloud provider that will build a regional AI cloud based on NVIDIA H100, InfiniBand, and NVIDIA AI Enterprise software to drive AI progress in Europe. The Jülich Supercomputing Centre in Germany also announced plans to build its next-generation AI supercomputer with nearly 24,000 Grace Hopper Superchips and Quantum-2 InfiniBand, which would make it the world's most powerful AI supercomputer, delivering over 90 exaflops of AI performance. Microsoft Azure, for its part, runs more than 29,000 miles of InfiniBand cabling, and its InfiniBand-enabled HB- and N-series virtual machines let Microsoft deliver high-performance computing (HPC) cost-effectively.

By bundling networking with GPUs, NVIDIA is reinforcing both its growth and its position in the supercomputing market. Given the lack of real alternatives to NVIDIA's InfiniBand offering, the company's dominance only looks set to deepen, making it all but indispensable for companies that want to pair GPUs with high-performance networking.
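To make the earlier point about low-latency, high-bandwidth communication concrete, the sketch below shows the kind of collective operation (an all-reduce of gradients) that dominates network traffic when an LLM is trained across many GPUs, and that fabrics such as InfiniBand are built to carry. It is a minimal illustration only, assuming a PyTorch/NCCL environment launched with torchrun on GPU nodes whose NCCL build has InfiniBand (verbs) support; the buffer size, environment settings, file name, and host name used here are illustrative assumptions, not NVIDIA reference code.

    # allreduce_sketch.py -- hypothetical file name
    import os
    import torch
    import torch.distributed as dist

    def main():
        # NCCL_DEBUG=INFO logs which transport NCCL selects; it reports "NET/IB"
        # when it is running over InfiniBand rather than plain TCP sockets.
        os.environ.setdefault("NCCL_DEBUG", "INFO")

        # torchrun supplies RANK, WORLD_SIZE, LOCAL_RANK, MASTER_ADDR, MASTER_PORT.
        dist.init_process_group(backend="nccl")
        torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))

        # Stand-in for one gradient bucket; real LLM training all-reduces many such
        # buffers every step, which is why link bandwidth and latency dominate scaling.
        grads = torch.randn(64 * 1024 * 1024, device="cuda")  # ~256 MB of fp32

        # Sum the buffer across every GPU in the job. With InfiniBand and RDMA the
        # transfer bypasses the host TCP stack, which is where the latency win comes from.
        dist.all_reduce(grads, op=dist.ReduceOp.SUM)
        torch.cuda.synchronize()

        if dist.get_rank() == 0:
            print(f"all-reduce completed across {dist.get_world_size()} GPUs")
        dist.destroy_process_group()

    if __name__ == "__main__":
        main()

A two-node job might be launched with something like torchrun --nnodes=2 --nproc-per-node=8 --rdzv-backend=c10d --rdzv-endpoint=head-node:29500 allreduce_sketch.py (host and file names are placeholders). Setting NCCL_IB_DISABLE=1 forces the same job onto TCP over Ethernet, which is a simple way to compare the two fabrics on identical hardware.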