Alibaba Cloud Launches 8th Gen Enterprise Instances G8i with 7x AI Inference Performance Boost

2024-01-12

Alibaba Cloud recently announced its eighth-generation enterprise-level general-purpose computing instance, ECS g8i. Built on fifth-generation Intel Xeon Scalable processors and Alibaba Cloud's self-developed "Feitian + CIPU" architecture, the g8i instance delivers up to 85% higher overall performance and up to 7x faster AI inference than the previous generation. It can support large language models with up to 72 billion parameters and cuts the deployment cost of small and medium-sized models by 50%.

According to Zhang Xiantao, General Manager of Alibaba Cloud's Elastic Computing product line, the performance of the ECS g8i instance shows that a CPU-centric computing architecture also holds great potential for accelerating AI inference. Public clouds can not only serve large-scale AI models but also open new paths for bringing AI applications into production faster.

As an enterprise-level general-purpose computing instance, the ECS g8i has been comprehensively upgraded across computing, storage, networking, and security. In terms of key parameters, its L3 cache has grown to 320MB and its memory speed reaches 5600MT/s, while overall performance improves by 85% and single-core performance by 25%.
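To put the quoted 5600MT/s memory speed in perspective, a short back-of-the-envelope calculation converts it to per-channel bandwidth. This assumes a standard 64-bit (8-byte) DDR5 channel width; the article does not state the channel count, so total bandwidth would scale with however many channels the instance exposes.

```python
# Per-channel bandwidth implied by a 5600 MT/s memory speed,
# assuming a standard 64-bit (8-byte wide) DDR5 channel.
TRANSFERS_PER_SEC = 5600e6   # 5600 mega-transfers per second
CHANNEL_WIDTH_BYTES = 8      # 64-bit channel

per_channel_gbps = TRANSFERS_PER_SEC * CHANNEL_WIDTH_BYTES / 1e9
print(f"{per_channel_gbps:.1f} GB/s per channel")  # -> 44.8 GB/s per channel
```

The channel count and resulting aggregate bandwidth are not disclosed in the announcement, so only the per-channel figure can be derived here.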

Currently, large-model AI inference still faces many compute-side challenges. For example, first-token latency is constrained by parallel processing and floating-point throughput, while overall throughput is constrained by memory bandwidth and network latency. The ECS g8i instance targets these bottlenecks, including upgrading the built-in instruction set from AVX-512 to Intel AMX (Advanced Matrix Extensions) acceleration, which lets generative AI run faster. With AMX-based AI acceleration, g8i delivers faster responses for small and medium-sized models, reducing deployment cost by 50% compared with A10 GPU cloud servers on AI workloads such as knowledge retrieval, question answering, and summarization.
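On Linux, whether a host actually exposes the AMX features the article credits for this speedup can be checked from `/proc/cpuinfo`. The sketch below is an illustrative helper (not part of any Alibaba Cloud tooling); the flag names `amx_tile`, `amx_int8`, and `amx_bf16` are the identifiers the Linux kernel reports on AMX-capable CPUs.

```python
# Hedged sketch: detect Intel AMX support by parsing a /proc/cpuinfo dump.
# AMX_FLAGS are the feature flags Linux reports on AMX-capable Xeons.
AMX_FLAGS = {"amx_tile", "amx_int8", "amx_bf16"}

def has_amx(cpuinfo_text: str) -> bool:
    """Return True if all AMX feature flags appear in the first 'flags' line."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            flags = set(line.split(":", 1)[1].split())
            return AMX_FLAGS <= flags
    return False

# On a real instance:
#   has_amx(open("/proc/cpuinfo").read())
```

Frameworks that exploit AMX (for example, via Intel's PyTorch extensions) typically perform an equivalent capability check before dispatching to the accelerated matrix kernels.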

At the same time, built on the self-developed eRDMA ultra-low-latency elastic network, Alibaba Cloud's g8i instance clusters combine ultra-low network latency with high elasticity, making it easy to support distributed inference of large language models with 72 billion parameters. On the security side, Alibaba Cloud has built end-to-end protection across its product line, safeguarding data in storage, in transit, and during computation.
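A quick memory estimate shows why a 72-billion-parameter model calls for a multi-instance cluster rather than a single node: the weights alone exceed any modest per-node memory budget. The byte-per-parameter and per-node figures below are illustrative assumptions, not published g8i configuration details.

```python
# Back-of-the-envelope sizing for distributed inference of a 72B model.
# Assumptions (not from the announcement): bf16/fp16 weights at 2 bytes
# per parameter, and a hypothetical 64 GB usable memory budget per node.
PARAMS = 72e9
BYTES_PER_PARAM = 2
NODE_MEM_BUDGET_GB = 64

weights_gb = PARAMS * BYTES_PER_PARAM / 1e9
nodes_needed = -(-weights_gb // NODE_MEM_BUDGET_GB)  # ceiling division
print(f"{weights_gb:.0f} GB of weights -> at least {nodes_needed:.0f} nodes")
# -> 144 GB of weights -> at least 3 nodes
```

Sharding weights across nodes like this is exactly where inter-node latency dominates, which is why a low-latency fabric such as eRDMA matters for distributed inference.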

Zhang Xiantao said that, going forward, Alibaba Cloud will continue to deepen its technology and innovate its products, providing enterprises with more stable, powerful, secure, and elastic computing services and helping drive the broad adoption of AI applications across industries.