Battle of GPUs: NVIDIA vs AMD

2023-12-19

When AMD unveiled the MI300X as part of its AI push, CEO Lisa Su and her colleagues showcased the accelerator's performance by comparing it to NVIDIA's H100 on Llama 2 inference. In that demonstration, a server with eight MI300X accelerators outperformed a comparable eight-way H100 server by 1.6 times.


NVIDIA, however, was not satisfied with this comparison and pushed back. In a blog post, the company argued that, contrary to AMD's demonstration, the H100 performs significantly better than the MI300X when benchmarked with optimized software.


NVIDIA countered that AMD's comparison left out the optimizations available through its TensorRT-LLM inference library. NVIDIA's response presented results from a single DGX H100 server, with eight H100 GPUs, running the Llama 2 70B chat model.


Using software that was publicly available before AMD's demonstration, NVIDIA's results showed a twofold performance advantage at a batch size of 1. Furthermore, when a fixed 2.5-second response-latency target was applied, NVIDIA claimed a far larger lead, with performance exceeding the MI300X by as much as 14 times.
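

Serving benchmarks of this kind typically report the highest throughput achievable while response latency stays under a fixed target: larger batches raise throughput but also raise latency, so the reported number is the best feasible point under the budget. A minimal Python sketch of that selection logic, using made-up (batch size, latency, tokens/s) tuples rather than any real H100 or MI300X measurements:

    # Sketch: pick the highest throughput whose latency fits the budget.
    # The measurement tuples below are illustrative placeholders only.
    LATENCY_BUDGET_S = 2.5  # the fixed response-time target cited above

    measurements = [  # (batch_size, latency_s, tokens_per_s)
        (1, 0.9, 450),
        (4, 1.6, 1500),
        (8, 2.4, 2600),
        (16, 3.8, 4100),  # exceeds the budget, so it is excluded
    ]

    feasible = [m for m in measurements if m[1] <= LATENCY_BUDGET_S]
    batch, latency, throughput = max(feasible, key=lambda m: m[2])
    print(f"batch={batch}: {throughput} tok/s at {latency}s latency")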


AMD's Swift Response


In response to NVIDIA's challenge, AMD quickly ran new MI300X benchmarks of its own, showing a 30% performance advantage over the H100 once the software was properly tuned.


AMD took an active approach, replicating NVIDIA's testing conditions with TensorRT-LLM and accounting for factors common to server workloads, such as latency. In its argument, AMD emphasized several key points, in particular the advantages of its setup of vLLM with FP16 versus NVIDIA's TensorRT-LLM with FP8.


AMD noted that NVIDIA benchmarked the H100 with its proprietary TensorRT-LLM rather than the widely used, open-source vLLM. AMD also pointed to a mismatch in data types: NVIDIA set the DGX H100's TensorRT-LLM results in FP8 against AMD's vLLM results in FP16, even though vLLM does not support FP8. AMD defended its decision to use vLLM with FP16 by citing vLLM's widespread adoption.
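

For reference, running Llama 2 70B under vLLM in FP16 takes only a few lines, and because vLLM ships builds for both CUDA and ROCm, the same script can serve as a like-for-like test on either vendor's hardware. A minimal sketch (the prompt and sampling settings are illustrative, not the ones AMD used):

    # Sketch: Llama 2 70B chat inference with vLLM in FP16.
    # Assumes a vLLM build matching the local hardware (CUDA or ROCm)
    # and eight GPUs for tensor parallelism, as in the 8-way servers above.
    from vllm import LLM, SamplingParams

    llm = LLM(
        model="meta-llama/Llama-2-70b-chat-hf",
        dtype="float16",          # FP16, the data type AMD defended
        tensor_parallel_size=8,   # shard the 70B model across 8 accelerators
    )

    params = SamplingParams(temperature=0.8, max_tokens=256)
    outputs = llm.generate(["What is an AI accelerator?"], params)
    print(outputs[0].outputs[0].text)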


Latency in server environments was another point of contention: AMD criticized NVIDIA for focusing solely on throughput without addressing the latency that real-world deployments must meet.


To counter NVIDIA's testing methodology, AMD published three new performance runs, including one using NVIDIA's own TensorRT-LLM. The tests showed both higher performance and lower latency, and with additional optimizations AMD measured a 2.1-times performance advantage over the H100 when vLLM was run on both platforms.
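

An apples-to-apples comparison of this sort comes down to timing an identical vLLM workload on each platform and comparing tokens generated per second. A hedged sketch of such a measurement (the prompt set and token counts are placeholders, not AMD's actual test inputs):

    # Sketch: measure end-to-end generation throughput for a vLLM engine.
    # Running this same script on each platform gives comparable numbers.
    import time
    from vllm import LLM, SamplingParams

    llm = LLM(model="meta-llama/Llama-2-70b-chat-hf", dtype="float16",
              tensor_parallel_size=8)
    params = SamplingParams(temperature=0.0, max_tokens=128)
    prompts = ["Summarize the history of GPUs."] * 32  # fixed request batch

    start = time.perf_counter()
    outputs = llm.generate(prompts, params)
    elapsed = time.perf_counter() - start

    generated = sum(len(o.outputs[0].token_ids) for o in outputs)
    print(f"{generated / elapsed:.1f} generated tokens/s over {elapsed:.1f}s")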


Intense Competition


Competition between NVIDIA and AMD goes back a long way, but this is the first time NVIDIA has directly compared the performance of one of its products against AMD's, a clear sign that competition in this field is heating up.


NVIDIA now needs to devise a strategy in response to AMD, weighing the consequences of steering users away from FP16 toward FP8 within the closed ecosystem of TensorRT-LLM. It must also reckon with other companies, such as Intel and Cerebras, that are improving their own AI chips.


Beyond these two chip giants, companies like Cerebras Systems and Intel are also trying to make their mark on the market. Intel CEO Pat Gelsinger showed off the Gaudi3 AI chip at the company's AI Everywhere event, although little detail about it has been released.


NVIDIA plans to launch the GH200 superchip early next year, and AMD notably compared its new chip with the H100 rather than the GH200, whose performance will certainly surpass the previous generation's. Even so, with performance levels this close, many companies may consider AMD an alternative; Microsoft, Meta, and Oracle have already announced plans to deploy the MI300X in their data centers.


Gelsinger predicts that the market for AI chips will reach $400 billion by 2027, an opportunity that undoubtedly leaves room for many competitors to develop.


Meanwhile, Cerebras Systems CEO Andrew Feldman criticized NVIDIA's monopolistic behavior at the Global AI Conference. "We spent time figuring out how to do better than NVIDIA," he said of the company's ambitious plans. "By next year, we will build 36 exaFLOPs of AI computing power."