Tencent Launches Hunyuan-Large Model, Sets New Benchmark for MoE Models in the Industry

2024-11-05

Today, Tencent officially announced its newly developed Hunyuan-Large model. According to the company, it is currently the largest open-source Transformer-based Mixture of Experts (MoE) model in the industry, with 389 billion total parameters, of which 52 billion are active for each token.
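
As a quick illustration of what the two parameter counts mean, the short Python sketch below computes the share of the model that is active for a single token. It uses only the two figures quoted above; the function name is a hypothetical helper introduced for this example.

```python
# Figures quoted in the announcement.
TOTAL_PARAMS = 389e9   # total parameters across all experts
ACTIVE_PARAMS = 52e9   # parameters activated for each token


def active_fraction(active: float, total: float) -> float:
    """Return the share of parameters used to process one token."""
    if total <= 0:
        raise ValueError("total must be positive")
    return active / total


fraction = active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS)
print(f"Active share per token: {fraction:.1%}")
```

In other words, roughly one parameter in seven or eight takes part in any single forward step, which is the main reason an MoE model can be far larger than a dense model of similar per-token compute.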

To support further progress in artificial intelligence, Tencent has open-sourced three versions of Hunyuan-Large on the Hugging Face platform: Hunyuan-A52B-Pretrain, Hunyuan-A52B-Instruct, and Hunyuan-A52B-Instruct-FP8. Tencent has also released technical reports and training and inference manuals to help developers understand the model's technical features and how to operate it.

Technically, Hunyuan-Large offers several advantages. First, training is augmented with high-quality synthetic data, which the company says helps the model learn more diverse representations, handle long-context inputs effectively, and generalize better to unseen data, improving its robustness.

Second, to reduce memory usage and computational overhead, Hunyuan-Large compresses the KV cache by combining Grouped Query Attention (GQA) and Cross-Layer Attention (CLA). GQA lets groups of query heads share a smaller set of key and value heads, and CLA lets adjacent layers share the same KV cache. Together, these strategies significantly reduce the memory footprint and cost of KV caching, improving inference throughput and efficiency.
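
The effect of the two strategies on cache size can be estimated with simple arithmetic. The Python sketch below is a rough estimate, not Tencent's implementation: the layer count, head counts, head dimension, sequence length, 2-byte values, and the assumption that CLA shares one cache between every two layers are all hypothetical values chosen for illustration.

```python
def kv_cache_bytes(seq_len: int, n_layers: int, n_kv_heads: int, head_dim: int,
                   bytes_per_value: int = 2, cla_share: int = 1) -> int:
    """Estimate the KV cache size in bytes for one sequence.

    cla_share is the number of adjacent layers that share one KV cache
    (1 means every layer keeps its own cache).
    """
    # Number of layers that actually store K and V tensors (ceiling division).
    kv_layers = -(-n_layers // cla_share)
    # The factor 2 accounts for storing both keys and values.
    return 2 * kv_layers * n_kv_heads * head_dim * seq_len * bytes_per_value


# Hypothetical configuration, for illustration only.
N_LAYERS, N_HEADS, N_KV_HEADS, HEAD_DIM, SEQ_LEN = 64, 80, 8, 128, 32_768

# Standard multi-head attention: one KV head per query head.
mha = kv_cache_bytes(SEQ_LEN, N_LAYERS, N_HEADS, HEAD_DIM)
# GQA: query heads share a smaller set of KV heads.
gqa = kv_cache_bytes(SEQ_LEN, N_LAYERS, N_KV_HEADS, HEAD_DIM)
# GQA plus CLA: pairs of adjacent layers also share one cache.
gqa_cla = kv_cache_bytes(SEQ_LEN, N_LAYERS, N_KV_HEADS, HEAD_DIM, cla_share=2)

for name, size in [("MHA", mha), ("GQA", gqa), ("GQA + CLA", gqa_cla)]:
    print(f"{name:>10}: {size / 2**30:.1f} GiB")
```

Under these assumed numbers, GQA shrinks the cache by the ratio of query heads to KV heads, and sharing across pairs of layers halves it again. The actual savings depend on the model's real configuration.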

Third, to account for the different learning requirements of individual expert sub-models, Hunyuan-Large uses expert-specific learning rate scaling. This technique assigns different learning rates to different experts, so that each sub-model learns effectively from the data it receives and contributes to overall performance.
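
The announcement does not state the scaling rule, so the Python sketch below shows only one plausible heuristic: scaling each expert's learning rate by the square root of the share of tokens it sees. The square-root rule, the split into a "shared" expert and evenly loaded "routed" experts, and all numeric values are assumptions made for this example, not details confirmed by the source.

```python
import math


def scaled_lr(base_lr: float, base_tokens: float, expert_tokens: float) -> float:
    """Scale base_lr by the square root of the effective-batch ratio.

    This is an assumed heuristic, not the rule used by Hunyuan-Large.
    """
    return base_lr * math.sqrt(expert_tokens / base_tokens)


def expert_learning_rates(base_lr: float, tokens_per_batch: int,
                          n_routed_experts: int, top_k: int = 1) -> dict:
    """Return learning rates for a shared expert and for each routed expert."""
    # A shared expert sees every token. With balanced routing, each routed
    # expert sees only top_k / n_routed_experts of the tokens in a batch.
    routed_tokens = tokens_per_batch * top_k / n_routed_experts
    return {
        "shared": scaled_lr(base_lr, tokens_per_batch, tokens_per_batch),
        "routed": scaled_lr(base_lr, tokens_per_batch, routed_tokens),
    }


# Hypothetical values, for illustration only.
rates = expert_learning_rates(base_lr=3e-4, tokens_per_batch=4_000_000,
                              n_routed_experts=16)
print(rates)
```

The point of the sketch is the shape of the idea: an expert that sees fewer tokens per step has a smaller effective batch size, so it may warrant a smaller learning rate than an expert that sees every token.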

Hunyuan-Large is also designed for long contexts. The pre-trained model supports text sequences of up to 256K tokens, while the Instruct model supports sequences of up to 128K tokens, which suits tasks involving lengthy inputs.

To validate the practical usefulness and safety of Hunyuan-Large, Tencent conducted extensive benchmark testing across various languages and tasks. According to the company, the model performed strongly across multiple domains and tasks, demonstrating its potential for real-world applications.

With the release of Hunyuan-Large, Tencent has given developers a more powerful open-source model and supporting tools. As the model continues to be optimized and refined, it is expected to play a significant role in a growing number of fields and scenarios.