Intel Gaudi Outperforms NVIDIA in AI Accelerator Cost-Effectiveness

2024-01-08

NVIDIA is not the only company building AI accelerators for training and inference. Intel is competing fiercely in this space as well, and its Gaudi 2 technology has achieved notable results.

New research from Databricks shows that Intel Gaudi 2 is strongly competitive with NVIDIA's industry-leading AI accelerators. In large language model (LLM) inference, Databricks found that Gaudi 2 matched the decoding latency of NVIDIA's H100 systems and outperformed the A100. The research also found that Gaudi 2 achieved higher memory bandwidth utilization in inference than either the H100 or the A100.

NVIDIA's top accelerator still leads in training performance, however. Training with Databricks' MosaicML LLM Foundry, researchers found that Gaudi 2 delivered the second-best single-node LLM training performance, exceeding 260 TFLOPS per chip, behind only NVIDIA's H100. Overall, the Databricks report states that, based on public cloud pricing, Gaudi 2 offers the best price/performance for both training and inference compared with the A100 and H100.
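For context, per-chip training throughput figures like the one above are commonly estimated from token throughput using the standard approximation of roughly 6 FLOPs per parameter per token for a dense transformer (forward plus backward pass). The sketch below illustrates that arithmetic; the model size, token rate, and chip count are illustrative assumptions, not Databricks' raw measurements.

```python
def tflops_per_chip(params_billion: float, tokens_per_second: float, num_chips: int) -> float:
    """Estimate achieved TFLOPS per chip for dense-transformer training,
    using the common ~6 * N FLOPs-per-token approximation."""
    flops_per_token = 6 * params_billion * 1e9      # forward + backward pass
    total_flops_per_s = flops_per_token * tokens_per_second
    return total_flops_per_s / num_chips / 1e12     # convert to TFLOPS

# Illustrative example: a hypothetical 7B-parameter model training at
# 50,000 tokens/s on a single 8-accelerator node.
print(round(tflops_per_chip(7, 50_000, 8), 1))      # → 262.5
```

With these assumed inputs, the estimate lands in the same ballpark as the 260 TFLOPS/chip figure reported above, which is how such throughput numbers are typically derived and compared across accelerators.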

Intel has published its own training and inference results for Gaudi 2 through the MLCommons MLPerf benchmarks. The new Databricks data offers third-party validation of Gaudi's performance.

"We are impressed with the performance of Gaudi 2, especially the high utilization achieved in large language model inference," said Abhinav Venigalla, Chief NLP Architect at Databricks. "We look forward to evaluating performance with Gaudi 2's FP8 support, which is available in their latest software release; due to time constraints, we were only able to test performance using BF16."

Databricks' performance numbers came as no surprise to Intel. Eitan Medina, Chief Operating Officer of Habana Labs, the Intel subsidiary behind Gaudi, said the report is consistent with Intel's own measurements and with customer feedback.

"It's always good to have validation for what we say," Medina said. "Since many people say Gaudi is Intel's best-kept secret, it is important to publish reviews like this so that more and more customers know Gaudi is a viable alternative."

Intel continues to strive for competitive advantage with Gaudi

In 2019, Intel acquired AI chip startup Habana Labs and its Gaudi technology for $2 billion and has been continuously improving this technology since then.

One way vendors prove performance is through industry-standard benchmarks. Both NVIDIA and Intel regularly participate in the MLCommons MLPerf benchmarks, which cover training and inference and are updated multiple times a year. In the MLPerf 3.1 training benchmarks released in November 2023, both NVIDIA and Intel claimed new records for LLM training speed. The MLPerf 3.1 inference benchmarks, released in September 2023, likewise showed strong competition between the two companies.

Although benchmarks such as MLPerf and reports like Databricks' are valuable, Medina noted that many customers rely on their own testing to ensure that the hardware and software stack suits their specific models and use cases.

"The maturity of the software stack is extremely important, because people are skeptical of benchmarking organizations, and vendors may go to great lengths to optimize for specific benchmark tests," he said.

According to Medina, MLPerf still has its place, because vendors know they must submit results and the technology stack must reach a certain level of maturity to do so. Nevertheless, he emphasized that customers do not rely on MLPerf results alone when making purchasing decisions.

"MLPerf results are to some extent the maturity filter that organizations use before investing time in testing," Medina said.

Gaudi 3 to be launched in 2024

As the new Gaudi 2 data arrives, Intel is preparing to launch its Gaudi 3 AI accelerator technology in 2024.

Gaudi 2 is built on a 7nm process, while Gaudi 3 moves to a 5nm process and will deliver four times the processing power and twice the network bandwidth. Medina said Gaudi 3 will launch and enter mass production in 2024.

"Gaudi 3 is the product that takes over from Gaudi 2 and provides leading performance," Medina said. "It is actually a huge leap in performance, which translates into advantages in performance per dollar and performance per watt."

Looking further ahead, Intel is developing future generations of products for 2025 and beyond that will integrate the company's high-performance computing (HPC) and AI accelerator technologies. Intel also continues to see value in its CPU technology for AI inference workloads, and recently announced its fifth-generation Xeon processors with built-in AI acceleration.

"CPUs still handle a considerable share of inference, and even fine-tuning on CPUs can have advantages," Medina said. "CPUs take part in data preparation and, of course, work alongside Gaudi accelerators on workloads that are extremely AI compute-intensive. The overall strategy, therefore, is to offer a range of solutions."