Meta Unveils Next-Gen AI Chip MTIA with Significant Performance Boost
Meta has released detailed information about the next generation of its Meta Training and Inference Accelerator (MTIA), a custom chip family designed for the company's AI workloads. The new version delivers significant performance improvements over its predecessor, MTIA v1, and plays a key role in powering Meta's ad ranking and recommendation models.
The new MTIA chip is part of Meta's ongoing investment in AI infrastructure, designed to complement existing and future AI systems and improve the user experience across its products and services. Recognizing the growing computational demands and complexity of AI models, Meta emphasizes the importance of efficient, scalable solutions to support generative AI (GenAI) products, recommendation systems, and advanced AI research.
The new MTIA chip is built around an 8x8 grid of processing elements (PEs), delivering 3.5x the dense-compute performance and 7x the sparse-compute performance of MTIA v1. The architecture aims to strike an optimal balance between compute, memory bandwidth, and memory capacity so that ranking and recommendation models can be served efficiently, even at relatively small batch sizes.
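The dense/sparse distinction maps onto the two compute phases a typical ranking or recommendation model exercises: sparse embedding lookups over categorical features, followed by dense matrix multiplies. The sketch below is illustrative only (not Meta's code, and the table sizes are arbitrary); it shows the shape of each phase in plain Python.

```python
import random

random.seed(0)

VOCAB, DIM = 1000, 4  # arbitrary illustrative sizes
embedding_table = [[random.random() for _ in range(DIM)] for _ in range(VOCAB)]

def sparse_lookup(ids):
    """Sparse phase: gather a handful of embedding rows and sum-pool them.
    Memory-bandwidth-bound: touches few, scattered rows of a large table."""
    pooled = [0.0] * DIM
    for i in ids:
        for d in range(DIM):
            pooled[d] += embedding_table[i][d]
    return pooled

def dense_layer(x, weights):
    """Dense phase: a plain matrix-vector multiply, compute-bound."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

weights = [[random.random() for _ in range(DIM)] for _ in range(2)]
scores = dense_layer(sparse_lookup([3, 17, 256]), weights)
```

A chip serving such models well needs both high matrix-multiply throughput (the dense phase) and high effective bandwidth for scattered reads (the sparse phase), which is why the two are reported as separate speedups.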
To house the next-generation silicon, Meta has developed a large-scale rack system that accommodates up to 72 accelerators. The design allows the chips to run at 1.35 GHz (up from 800 MHz in MTIA v1) at a power consumption of 90 watts per chip, delivering higher compute, memory bandwidth, and memory capacity than the first-generation design.
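From the figures stated above, a quick back-of-the-envelope calculation gives the clock uplift and the accelerator power budget of a fully populated rack (accelerators only; host CPUs, networking, and cooling are excluded).

```python
# Figures stated in the announcement.
ACCELERATORS_PER_RACK = 72
CHIP_POWER_WATTS = 90
CLOCK_GHZ = 1.35          # next-gen MTIA
V1_CLOCK_GHZ = 0.80       # MTIA v1

# Accelerator-only power of a fully populated rack.
rack_accelerator_power_w = ACCELERATORS_PER_RACK * CHIP_POWER_WATTS  # 6,480 W

# Clock-frequency uplift over MTIA v1.
clock_uplift = CLOCK_GHZ / V1_CLOCK_GHZ  # ~1.69x
```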
Software has been a key focus area for Meta throughout its investment in MTIA. The MTIA stack integrates seamlessly with PyTorch 2.0 and its features such as TorchDynamo and TorchInductor. Meta has further optimized the stack by building a Triton-MTIA compiler backend, which generates high-performance code for MTIA hardware and improves developer productivity.
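In PyTorch 2.0 terms, this means models targeting MTIA go through the standard `torch.compile` path: TorchDynamo captures the graph, and a compiler backend lowers it, with Triton-MTIA playing the role TorchInductor plays on CPU and GPU. The snippet below is a generic `torch.compile` usage sketch, not Meta's actual configuration; no MTIA-specific backend is wired in here.

```python
import torch

def score(x, w):
    # A toy scoring function standing in for a ranking model's dense layer.
    return torch.relu(x @ w)

# torch.compile routes the function through TorchDynamo (graph capture)
# and then a compiler backend (TorchInductor by default; in Meta's stack,
# the Triton-MTIA backend fills this role for MTIA hardware).
compiled_score = torch.compile(score)
```

The appeal of this design is that model code stays plain PyTorch; retargeting to new hardware is a matter of supplying a compiler backend rather than rewriting models.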
Preliminary results show that the next-generation MTIA chip delivers three times the performance of the first-generation chip across four key models evaluated. At the platform level, by doubling the number of devices and pairing them with powerful dual-socket CPUs, Meta has achieved six times the model-serving throughput and a 1.5x improvement in performance per watt over the first-generation MTIA system.
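The platform-level figure is consistent with the per-chip figure: a 3x faster chip times twice as many devices accounts for the 6x serving throughput. The check below uses only the numbers stated in the article.

```python
# Figures stated in the announcement.
CHIP_SPEEDUP = 3.0          # next-gen chip vs. first-gen, across 4 key models
DEVICE_COUNT_RATIO = 2.0    # the new platform doubles the number of devices
PLATFORM_THROUGHPUT = 6.0   # reported model-serving throughput gain
PERF_PER_WATT_GAIN = 1.5    # reported efficiency gain at the platform level

# Throughput gain factors cleanly into chip speedup x device count.
assert CHIP_SPEEDUP * DEVICE_COUNT_RATIO == PLATFORM_THROUGHPUT
```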
MTIA has been deployed in Meta's data centers and is actively serving models in production. The chip has proven highly complementary to commercially available GPUs, providing the right balance of performance and efficiency for Meta's specific workloads. As part of Meta's long-term roadmap, MTIA will continue to evolve and expand in support of the company's ambitious AI goals, including GenAI workloads and further investments in memory bandwidth, networking, and capacity.