AMD's ROCm Ready to Challenge NVIDIA's CUDA

2023-12-08

While the world is still clamoring for more NVIDIA GPUs, AMD has released the MI300X, a product that is said to be much faster than NVIDIA's offerings. AMD's goal is not only to challenge NVIDIA on hardware but also to compete in software through its open-source ROCm platform, a direct rival to NVIDIA's CUDA.

"While hardware is important, it is software that truly drives innovation," said AMD CEO Lisa Su when discussing the upcoming ROCm release.

At the "Advancing AI" event, it was evident that AMD's focus on software has paved the way for its success. AMD President Victor Peng showed how the company is turning ROCm into a successful open-source framework by building a strong ecosystem around it.

Peng introduced the latest iteration of the company's parallel computing framework, ROCm 6, a comprehensive software stack optimized for AMD Instinct accelerators, with a particular focus on large language models and generative AI.

Everyone loves open-source

"We designed ROCm to be modular and open-source, so that it can be easily accessed by a wide range of users and quickly contribute to the open-source AI community," said Peng, a pointed contrast with CUDA, which is proprietary and closed-source.

In addition, ROCm now supports Radeon GPUs and, combined with the Ryzen AI 1.0 software, enables AI at the edge, making the stack more accessible to AI researchers and developers.

During the demonstration, Peng also showcased an endorsement from OpenAI's Philippe Tillet, stating, "OpenAI is collaborating with AMD to support an open ecosystem. We plan to support AMD's GPUs, including MI300, in the upcoming 3.0 release of the standard Triton distribution." Tillet is the creator of Triton.

AMD partnered with three AI companies, Databricks, Essential AI, and Lamini, to showcase how they leverage AMD Instinct MI300X accelerators and the open ROCm 6 software stack to provide differentiated AI solutions for enterprise customers. All three are already using ROCm with MI250X and praised its performance across a range of use cases.

Databricks co-founder Ion Stoica, Essential AI co-founder Ashish Vaswani, and Lamini co-founder Sharon Zhou discussed how they previously utilized AMD's hardware and software and demonstrated how the openness of the technology helped them fully own it.

"ROCm has been plug-and-play since day one," said Stoica, highlighting how easily it was integrated into the Databricks stack after the MosaicML acquisition, with only minor optimization. He added that Databricks uses MI250X in almost all of its software workflows and eagerly anticipates the MI300X.

ROCm vs. CUDA: An Apples-to-Apples Comparison

"We have surpassed CUDA," said Zhou. Lamini previously emphasized in their blog how they found their ground with AMD and how ROCm is ready for production. Lamini's mission is to help enterprises easily access and use small language models, and AMD and ROCm have been instrumental in helping them achieve this goal.

AMD continues to strategically invest in companies like Mipsology and Nod.AI, which have significantly enhanced its capabilities in AI software.

Many open-source tools, including PyTorch, are ready to use with ROCm on MI300X, making the transition effortless for most developers. The features of this CUDA alternative include support for new data types, advanced graph and kernel optimizations, optimized libraries, and state-of-the-art attention algorithms.
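To illustrate the point about PyTorch portability: on a ROCm build of PyTorch, the familiar `torch.cuda` API is backed by HIP, so code written against NVIDIA GPUs typically runs unchanged on AMD hardware. The minimal sketch below (not from the article) uses a CPU fallback so it runs anywhere; on an MI300X with the ROCm wheel installed, the same lines would dispatch to the AMD GPU.

```python
import torch

# On ROCm builds of PyTorch, torch.cuda.is_available() reports True for
# AMD GPUs, since the CUDA-style API maps onto HIP under the hood.
device = "cuda" if torch.cuda.is_available() else "cpu"

# An ordinary matrix multiply; no ROCm-specific code is needed.
x = torch.randn(4, 8, device=device)
w = torch.randn(8, 2, device=device)
y = x @ w
print(y.shape)  # torch.Size([4, 2])
```

This device-agnostic pattern is why the article can describe adoption as "effortless": existing PyTorch codebases rarely need changes beyond installing the ROCm build.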

It is worth noting the significant performance improvement: overall text-generation performance is roughly 8 times better than running ROCm 5 on MI250.

Peng demonstrated that when performing inference on Llama 2 70B, MI300X with ROCm 6 is 8 times faster than MI250X with ROCm 5.

For smaller models like Llama 2 13B, MI300X with ROCm delivers 1.2 times the performance of a single NVIDIA GPU running CUDA.

ROCm 6 adds support for the FP16, BF16, and FP8 data types to improve performance and reduce memory usage. The new version also brings open-source libraries and supports key generative AI features, including FlashAttention, HIPGraph, and vLLM, which provide speedups of 1.3x, 1.4x, and 2.6x, respectively.
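In practice, developers typically reach these lower-precision data types through their framework rather than through ROCm directly. The sketch below is an assumption on my part, not a workflow described in the article: it uses PyTorch's `autocast` context, one common way to request BF16 compute, with a CPU fallback so it runs without a GPU. On a ROCm build, `device_type="cuda"` would dispatch the reduced-precision kernels to the AMD accelerator.

```python
import torch

# Fall back to CPU so the sketch runs anywhere; on ROCm, "cuda" maps to HIP.
device_type = "cuda" if torch.cuda.is_available() else "cpu"

model = torch.nn.Linear(16, 4)
x = torch.randn(2, 16)

# autocast runs eligible ops (like this linear layer) in bfloat16,
# trading a little precision for speed and lower memory use.
with torch.autocast(device_type=device_type, dtype=torch.bfloat16):
    y = model(x)

print(y.shape)
```

FP8 support is newer and more hardware-dependent, so it generally requires dedicated library paths rather than a one-line context manager like this.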

In conclusion, Peng stated, "I firmly believe in this. With the launch of ROCm 6 and MI300X, we will drive a turning point in developer adoption. We are empowering innovators to realize the profound benefits of AI faster."