Cerebras Announces Condor Galaxy 3 AI Supercomputer Delivering 8 ExaFLOPs of Performance
Cerebras and G42 have jointly announced that they have begun building Condor Galaxy 3, an artificial intelligence supercomputer that will deliver up to 8 exaFLOPs of AI performance.
Andrew Feldman, CEO of Sunnyvale, California-based Cerebras, said this performance will come from 58 million AI-optimized cores. He also revealed that Condor Galaxy 3 will be one of the world's largest AI supercomputers, owned by G42, a national cloud and AI enabler based in Abu Dhabi, United Arab Emirates.
Condor Galaxy 3 will be equipped with 64 of Cerebras' latest CS-3 systems, each powered by the industry's fastest AI chip, the Wafer Scale Engine 3 (WSE-3). Together, the 64 systems deliver 8 exaFLOPs of AI performance across 58 million AI-optimized cores.
"We have built a large, fast AI supercomputer. As we continue to build and expand clusters, we are starting to train large-scale models on these clusters," said Feldman.
In terms of "chips," Cerebras has a unique approach. While the cores designed by the company are small, they are distributed across the entire semiconductor wafer, which is typically used to manufacture hundreds of chips. By using the same substrate to manufacture chips, it improves communication speed and processing efficiency. This is why it can install 900,000 cores on a single chip (or more precisely, a fairly large wafer).
Condor Galaxy 3, located in Dallas, Texas, is the third installation in the Condor Galaxy network of AI supercomputers. Through their strategic partnership, Cerebras and G42 have already delivered 8 exaFLOPs of AI supercomputing performance with Condor Galaxy 1 and Condor Galaxy 2, both among the world's largest AI supercomputers.
The completion of Condor Galaxy 3 brings the total performance of the Condor Galaxy network to 16 exaFLOPs. By the end of 2024, Condor Galaxy will provide over 55 exaFLOPs of AI computing power. Overall, Cerebras will build nine AI supercomputers for G42.
Kyrill Evtimov, CTO of G42 Group, said in a statement, "Through Condor Galaxy 3, we continue to realize our shared vision of transforming the global inventory of AI compute by building some of the world's largest and fastest AI supercomputers. The existing Condor Galaxy network has already trained leading open-source models with millions of downloads, and we look forward to seeing the Condor Galaxy supercomputers unlock the next wave of innovation with twice the performance."
Condor Galaxy 3 consists of 64 Cerebras CS-3 systems built around the new 5nm WSE-3 chip, which delivers twice the performance of its predecessor at the same power consumption and cost. Designed specifically for training the industry's largest AI models, the WSE-3 packs 4 trillion transistors and 900,000 AI-optimized cores per chip, providing an astonishing 125 petaFLOPs of peak AI performance.
Feldman stated, "We are pleased to announce that our newly launched CS-3 system will play a key role in our groundbreaking strategic partnership with G42. From Condor Galaxy 3 to Condor Galaxy 9, each will use 64 new CS-3 systems, expanding our computing power from 36 exaFLOPs to over 55 exaFLOPs. This marks a major milestone in the field of AI computing, providing unparalleled processing power and efficiency."
Condor Galaxy has already trained generative AI models including Jais-30B, Med42, Crystal-Coder-7B, and BTLM-3B-8K. Among them, Jais-13B and Jais-30B are the world's leading bilingual Arabic models and are now available on Azure Cloud. BTLM-3B-8K is the top-ranked 3B model on Hugging Face, delivering 7B-parameter performance in a lightweight 3B-parameter model well suited to inference, according to the company.
Med42 is a leading clinical LLM developed in collaboration with M42 and Core42. It was trained on Condor Galaxy 1 in just one weekend and has achieved better performance and accuracy than MedPaLM.
Condor Galaxy 3 will be launched in the second quarter of 2024.
In other news, Cerebras shared details of the chip that powers the supercomputer. The company stated that with the launch of the Wafer Scale Engine 3 (WSE-3), it has doubled the world record it already held for the fastest AI chip.
The WSE-3 offers twice the performance of the previous record holder, the Cerebras WSE-2, at the same power consumption and price. Designed specifically for training the industry's largest AI models, this 5nm chip with 4 trillion transistors powers the Cerebras CS-3 AI supercomputer, delivering 125 petaFLOPs of peak AI performance through 900,000 AI-optimized compute cores.
Feldman stated that this powerful computer will be shipped in 150 trays.
He further stated, "We have announced the five-nanometer component of the current generation of wafer-scale engines, which is currently the fastest chip on Earth. It is manufactured by TSMC and has a component size of 46,000 square millimeters. At the five-nanometer node, it integrates an astonishing 40 trillion transistors, 900,000 AI cores, and provides 125 petaflops of AI computing power."
The CS-3 is paired with a massive memory system of up to 1.2 petabytes, designed specifically for training next-generation frontier models 10 times larger than GPT-4 and Gemini. A 24-trillion-parameter model can be stored in a single logical memory space without partitioning or refactoring, greatly simplifying the training workflow and improving developer productivity. Training a trillion-parameter model on the CS-3 is as simple as training a billion-parameter model on a GPU.
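To make the "scale without rewriting code" claim concrete, here is a minimal, hypothetical PyTorch sketch (not Cerebras' actual API): the model definition and the training step are the same at any size, and only the configuration numbers change. All hyperparameter values below are illustrative assumptions, and the model is a simplified, non-causal transformer stack used purely for demonstration.

```python
# Illustrative sketch only: the article's point is that when the full model fits in one
# logical memory space, scaling is a configuration change, not a rewrite of training code.
from dataclasses import dataclass

import torch
import torch.nn as nn


@dataclass
class GPTConfig:
    vocab_size: int = 50_257
    hidden_size: int = 768
    num_layers: int = 12
    num_heads: int = 12


def build_model(cfg: GPTConfig) -> nn.Module:
    # Simplified (non-causal) transformer stack; the definition is identical at any scale.
    layer = nn.TransformerEncoderLayer(
        d_model=cfg.hidden_size,
        nhead=cfg.num_heads,
        dim_feedforward=4 * cfg.hidden_size,
        batch_first=True,
    )
    return nn.Sequential(
        nn.Embedding(cfg.vocab_size, cfg.hidden_size),
        nn.TransformerEncoder(layer, num_layers=cfg.num_layers),
        nn.Linear(cfg.hidden_size, cfg.vocab_size),
    )


def train_step(model: nn.Module, tokens: torch.Tensor,
               optimizer: torch.optim.Optimizer) -> float:
    # One plain data-parallel step: no tensor- or pipeline-parallel partitioning logic.
    logits = model(tokens[:, :-1])
    loss = nn.functional.cross_entropy(
        logits.reshape(-1, logits.size(-1)), tokens[:, 1:].reshape(-1)
    )
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


if __name__ == "__main__":
    # A small config runs here; a GPT-3-scale config would only change these numbers,
    # e.g. GPTConfig(hidden_size=12288, num_layers=96, num_heads=96).
    cfg = GPTConfig()
    model = build_model(cfg)
    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
    tokens = torch.randint(0, cfg.vocab_size, (2, 128))
    print("loss:", train_step(model, tokens, opt))
```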
The CS-3 is designed for both enterprise and hyperscale needs. A compact four-system configuration can fine-tune a 70B-parameter model in a day, while at full scale with 2,048 systems, Llama 70B can be trained from scratch in a single day, an unprecedented feat in generative AI.
The latest Cerebras software framework provides native support for PyTorch 2.0 and for the latest AI models and techniques, such as multimodal models, vision transformers, mixture of experts, and diffusion models. Notably, Cerebras remains the only platform that provides native hardware acceleration for dynamic and unstructured sparsity, speeding up training by as much as eight times.
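As a rough illustration of what "unstructured sparsity" means here: individual weights are zeroed with no block or row pattern, so a dense kernel still performs the full matrix multiply, while hardware that skips zeros can save that work. The sketch below uses standard PyTorch pruning utilities; the 50% sparsity level is an arbitrary example, not a Cerebras figure, and the code only demonstrates the masking, not any hardware speedup.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

layer = nn.Linear(4096, 4096)

# Unstructured pruning: zero out 50% of individual weights, with no block or row pattern.
prune.random_unstructured(layer, name="weight", amount=0.5)

x = torch.randn(8, 4096)
y = layer(x)  # same function either way; only hardware that skips zeros runs it faster
print(f"fraction of zero weights: {(layer.weight == 0).float().mean().item():.2f}")
```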
Feldman excitedly said, "Eight years ago, when we started this journey, everyone thought wafer-scale processors were just an unattainable dream. Now, we are proud to introduce the third generation of our groundbreaking wafer-scale AI chip, the WSE-3. The WSE-3 is the fastest AI chip in the world, designed specifically for handling the latest cutting-edge AI work, from expert blending to 24 trillion parameter models. We are excited to bring the WSE-3 and CS-3 to the market to help tackle today's biggest AI challenges."
Each component of the CS-3 is optimized for AI work, providing higher compute performance than other systems while occupying less space and consuming less power. Whereas GPU power consumption doubles with each generation, the CS-3 doubles its performance while staying within the same power envelope. The CS-3 also offers exceptional ease of use, requiring 97% less code than GPUs for large language models, and it can train models from 1B to 24T parameters in pure data-parallel mode. Notably, a GPT-3-scale model can be implemented on Cerebras in just 565 lines of code, an industry record.
Feldman stated, "Our system supports models with up to 24 trillion parameters."
Cerebras has received a large number of CS-3 orders from enterprises, governments, and international cloud services.
Rick Stevens, Associate Laboratory Director for Computing, Environment and Life Sciences at Argonne National Laboratory, said in a statement, "We have been a customer of Cerebras solutions from the beginning, and the 100x to 300x performance improvements delivered by Cerebras wafer-scale technology have allowed us to rapidly accelerate our scientific and medical AI research. We are very excited to see what new breakthroughs the CS-3 will bring with twice the performance within the same power envelope."
Cerebras also announced a new technical and go-to-market (GTM) collaboration with Qualcomm this week, delivering a 10x improvement in AI inference performance through inference-aware training performed by Cerebras on the CS-3.
Raj Talluri, Vice President of Cloud Computing at Qualcomm, said in a statement, "Our technical collaboration with Cerebras aims to give customers the highest-performance AI training solution combined with an inference solution offering the best performance per total cost of ownership. In addition, customers receive fully optimized, ready-to-deploy models, significantly shortening the time to return on investment."
By using Cerebras' industry-leading CS-3 AI accelerators for training and Qualcomm's Cloud AI 100 Ultra for inference, production deployments can achieve up to a 10x improvement in price-performance.
Feldman pointed out, "We are excited to announce a global partnership with Qualcomm to jointly train models optimized for their inference engines. This collaboration allows us to adopt a range of unique technologies and some more widely available ones to significantly reduce the cost of inference. As a result, we will train models to accelerate inference using a variety of different strategies."
Cerebras has over 400 engineers. Feldman said, "Delivering a large amount of computing power according to plan is not easy. I don't think any other company in this field can do what we do. In the past six months, no other startup has been able to deliver as much computing power as we have. And with the collaboration with Qualcomm, we are driving down the cost of inference."