Ampere Computing, the server CPU design company, announced that its AmpereOne chip series will expand to 256 cores next year. In addition, the company will collaborate with Qualcomm to develop cloud AI accelerators.
Jeff Wittich, Ampere's chief product officer, stated that the new Ampere CPU will offer 40% higher performance than any CPU currently on the market.
Based in Santa Clara, California, Ampere will work with Qualcomm to develop AI inference solutions that pair Qualcomm's high-performance, low-power Cloud AI 100 inference accelerator with Ampere CPUs.
Ampere CEO Renee James said that AI's growing power demands and energy challenges have brought more attention than ever to Ampere's performance- and efficiency-focused approach to silicon design.
"We started on this path six years ago because it was clear that this was the right path," James said. "Low power used to be synonymous with low performance. But Ampere has proven that this is not true. We have pioneered the frontier of computational efficiency and delivered performance beyond traditional CPUs in an efficient computing environment."
Data Center Energy Efficiency
James stated that the industry is facing a growing problem with AI's rapid development: energy.
"The current path is unsustainable. We believe that future data center infrastructure must consider how to upgrade existing air-cooled environments to accommodate upgraded computing capabilities and build environmentally sustainable new data centers that align with the available power from the grid. This is what we are pursuing at Ampere," James said.
Wittich responded to James' comments.
"Why are we developing new CPUs? To address the increasingly serious power problem in data centers: they consume more and more power. This has always been a problem, but it is more serious today than a few years ago because AI now acts as a catalyst for even greater consumption," Wittich said. "Creating more efficient solutions is crucial. We do this in general-purpose computing, and we do it in AI. We urgently need broad, cross-solution efforts involving numerous ecosystem partners to make these solutions widely available and tackle the bigger problem, not just power consumption itself."
Wittich shared Ampere's vision of "AI compute," which integrates AI into traditional cloud-native computing.
"Our Ampere CPU can run a range of workloads - from popular cloud-native applications to AI. This includes AI integrated with traditional cloud-native applications such as data processing, web services, media streaming, and more," Wittich said.
Grand Roadmap
James and Wittich both emphasized the upcoming release of the new AmpereOne platform, announcing that a 12-channel, 256-core CPU is ready for production on TSMC's N3 process node. Ampere designs its chips and works with external foundries for manufacturing. The previously announced 192-core chip, released last year, is now in production and available on the market.
Ampere is collaborating with Qualcomm to launch a joint solution that integrates Ampere CPUs with the Qualcomm Cloud AI 100 Ultra. The solution targets LLM inference for the industry's largest generative AI models.
Wittich stated that Ampere is working with Qualcomm on a joint solution that pairs Ampere's efficient CPUs with Qualcomm's accelerator. "Qualcomm has a truly efficient, high-performance AI accelerator. Their Cloud AI 100 Ultra card excels in all aspects of AI, especially in very large models with hundreds of billions of parameters," he said.
He said that for models of that size, a specialized solution such as an accelerator may be the right tool. Ampere is therefore working with Qualcomm to optimize a joint solution, called Super Microserver, that will be plug-and-play and easy for customers to adopt.
"This is an innovative solution for people in the AI inference field," Wittich said. "We have done some really cool work with Qualcomm."
The upcoming 256-core AmpereOne CPU extends the 12-channel platform. It will use the same air-cooling solution as the existing 192-core AmpereOne CPU and deliver more than 40% higher performance than any CPU currently on the market, without requiring special platform designs. The company's 192-core, 12-channel memory platform is expected later this year, up from the previous eight memory channels.
Ampere also announced that Meta's Llama 3 is now running on Ampere CPUs in Oracle Cloud. Performance data shows that Llama 3 on a 128-core Ampere Altra CPU, with no GPU, matches the performance of an Nvidia A10 GPU paired with an x86 CPU while consuming only one-third of the power.
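The power math behind that claim is straightforward: equal throughput at one-third the power implies roughly three times the performance per watt. A minimal sketch of the arithmetic (the throughput and wattage figures below are hypothetical placeholders, not numbers published by Ampere, Oracle, or Nvidia):

```python
def perf_per_watt(throughput_tokens_per_s: float, power_watts: float) -> float:
    """Performance per watt: inference throughput divided by power draw."""
    return throughput_tokens_per_s / power_watts

# Hypothetical, illustrative numbers only: assume both setups reach the
# same Llama 3 inference throughput, while the CPU-only server draws one
# third of the power of the GPU + x86 host.
throughput = 100.0          # tokens/s, assumed equal for both setups
gpu_system_power = 450.0    # W, placeholder for A10 + x86 host
cpu_system_power = 150.0    # W, placeholder: one third of the above

ratio = perf_per_watt(throughput, cpu_system_power) / perf_per_watt(
    throughput, gpu_system_power
)
# Equal throughput at one-third the power yields a 3x perf/watt advantage.
print(f"performance-per-watt advantage: {ratio:.1f}x")
```

The specific wattages cancel out; only the stated one-third power ratio determines the result.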
Ampere announced the formation of a UCIe working group within the AI Platform Alliance, which it launched in October last year. As part of this effort, the company said it will leverage the flexibility of its CPUs to integrate customers' IP into future CPUs using the open UCIe chiplet interface.
Competition is a Good Thing
Executives provided new details about AmpereOne's performance and its Original Equipment Manufacturer (OEM) and Original Design Manufacturer (ODM) platforms. AmpereOne maintains a performance-per-watt lead of 50% over AMD's Genoa and 15% over Bergamo. For data centers looking to refresh and consolidate aging infrastructure to reclaim space, budget, and power, AmpereOne delivers up to 34% more performance per rack.
The company also revealed that the new AmpereOne OEM and ODM platforms will be shipped in the coming months.
Ampere announced a collaboration with NETINT to launch a joint solution that utilizes the company's Quadra T1U video processing chip and Ampere CPU, capable of transcoding 360 live channels simultaneously and adding multilingual subtitles in real-time to 40 streams using OpenAI's Whisper model.
In addition to existing features such as memory tagging, QoS enforcement, and mesh congestion management, the company also unveiled a new FlexSKU feature that lets customers use the same SKU for both scale-out and scale-up use cases.
Wittich said Ampere has been working with Oracle to run large models in AI clouds, cutting costs by 28% while consuming only one-third of the power of competing Nvidia solutions.
"Oracle has saved a lot of power. This allows them to deploy more AI computing capabilities by running on CPUs," he said. "This is our AI story and how it all comes together."
He said those efficiency gains let the same work run on 15% fewer servers and 33% fewer racks, with 35% less power.