xAI Launches Colossus AI Training System with 100,000 GPUs

2024-09-04

xAI has reportedly finished assembling "Colossus", an artificial intelligence training system equipped with 100,000 graphics processing units (GPUs). Elon Musk, the CEO of Tesla and xAI, announced the news on his X platform on Monday. Colossus went online over the weekend, and Musk called it the "most powerful AI training system in the world". If so, the cluster would be faster than the US Department of Energy's Aurora system, which is currently considered the world's fastest AI supercomputer. In a benchmark test in May, Aurora reached 10.6 exaflops.

Colossus is built on 100,000 Nvidia H100 GPUs, which have been regarded as Nvidia's most powerful AI processors since their launch in 2022. The H100 can run language models up to 30 times faster than the previous generation. Part of that performance comes from its Transformer Engine module, which is designed specifically for running neural networks based on the Transformer architecture.

Musk also said that xAI plans to double Colossus to 200,000 GPUs in the coming months, including 50,000 of the newer and faster H200. The H200, which Nvidia launched in November last year, is an upgraded version of the H100. It uses HBM3e memory to speed up data transfer and raises memory capacity to 141 GB.

xAI's flagship large language model, Grok-2, was trained on 15,000 GPUs. The 100,000 GPUs in Colossus could help xAI develop more powerful language models, and the company is reportedly expected to release a successor to Grok-2 by the end of this year.

There are also reports that some of the GPUs originally allocated to Tesla have been reassigned to xAI. In January of this year, CNBC reported that Musk had asked Nvidia to redirect 12,000 H100 GPUs, worth more than $500 million, from Tesla to xAI and other AI projects. During the same period, Musk estimated that Tesla's spending on Nvidia hardware would reach between $3 billion and $4 billion by the end of this year.