Qwen Team Releases Lightweight MoE Model Qwen1.5-MoE-A2.7B

2024-04-01

Recently, the Tongyi Qianwen (Qwen) team announced the open-sourcing of its first MoE model, Qwen1.5-MoE-A2.7B, notable for its lightweight design: only 2.7 billion parameters are activated per forward pass. Despite being much smaller than industry-leading 7-billion-parameter models such as Mistral 7B and Qwen1.5-7B, the model delivers comparable performance while markedly reducing training cost and improving inference speed.

Architecturally, Qwen1.5-MoE adopts a carefully designed MoE structure, with the MoE layer configuration inside each transformer block deeply optimized. The model introduces 64 fine-grained experts and innovates in both the expert initialization process and the routing mechanism. In particular, the fine-grained experts technique partitions the FFN layer into many smaller experts, giving the router far more choices and thereby improving the model's effective capacity without increasing the total parameter count.

Experimental results show that Qwen1.5-MoE-A2.7B achieves strong results in language understanding, mathematical ability, and code comprehension, and remains competitive on multilingual benchmarks. Although there is still room for further optimization, its performance already approaches the best 7B models in the industry. Notably, Qwen1.5-MoE-A2.7B reduces training costs by up to 75% compared to Qwen1.5-7B while improving inference speed by 1.74 times. These results highlight the parameter efficiency of MoE models: high performance at significantly lower computational cost, injecting new vitality into the development of the AI field.
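The fine-grained experts idea described above can be sketched in a few lines of numpy. This is a minimal illustration, not the model's actual implementation: the sizes are toy values (8 experts rather than 64), the ReLU activation and top-k softmax router are generic stand-ins, and Qwen1.5-MoE's real initialization and routing details are not reproduced here. The sketch shows the key accounting: splitting one dense FFN into many narrow experts leaves the total parameter count unchanged while only top-k of them are activated per token.

```python
import numpy as np

rng = np.random.default_rng(0)

d_model, d_ff = 64, 256        # toy sizes; the real model is far larger
n_experts, top_k = 8, 2        # illustrative; Qwen1.5-MoE uses 64 fine-grained experts

# Fine-grained experts: split one FFN of hidden width d_ff into n_experts
# narrow FFNs of width d_ff // n_experts, so the total parameter count
# matches a single dense FFN while the router gains many more choices.
d_expert = d_ff // n_experts
W_in = rng.standard_normal((n_experts, d_model, d_expert)) * 0.02
W_out = rng.standard_normal((n_experts, d_expert, d_model)) * 0.02
W_gate = rng.standard_normal((d_model, n_experts)) * 0.02

def moe_ffn(x):
    """Route one token vector x through its top-k experts (generic sketch)."""
    logits = x @ W_gate
    top = np.argsort(logits)[-top_k:]            # indices of the k highest-scoring experts
    weights = np.exp(logits[top] - logits[top].max())
    weights /= weights.sum()                     # softmax over the selected k only
    out = np.zeros_like(x)
    for w, e in zip(weights, top):
        h = np.maximum(x @ W_in[e], 0.0)         # ReLU stand-in for the real activation
        out += w * (h @ W_out[e])
    return out

y = moe_ffn(rng.standard_normal(d_model))
print(y.shape)                                   # same shape as the input token vector

# Parameter accounting: all experts together equal one dense FFN of width d_ff,
# yet only top_k / n_experts of that compute is activated per token.
dense_params = 2 * d_model * d_ff
moe_params = n_experts * 2 * d_model * d_expert
print(dense_params == moe_params)
```

Because only 2 of the 8 toy experts run per token, activated compute is a quarter of the dense FFN's, which is the same mechanism behind Qwen1.5-MoE-A2.7B activating 2.7B of its parameters.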
The open-source release of Qwen1.5-MoE-A2.7B by the Qwen team marks a genuine technical step forward for artificial intelligence, suggesting that achieving more efficient and resource-friendly model computation through architectural innovation, without sacrificing performance, will be an important trend in the future development of AI.