Beijing Academy of Artificial Intelligence Launches Aquila2 Series Models

2024-08-20

Large language models (LLMs) are driving transformative research with their exceptional multitasking capabilities. However, training these models faces challenges such as high computational resource consumption, heavy reliance on static datasets, and difficulty in adapting quickly to changes in data. To address these issues, the Language Foundation Model and Software Team at the Beijing Academy of Artificial Intelligence (BAAI) has developed the Aquila2 series models, opening up new, more efficient pathways for LLM training.

The Aquila2 series models, ranging from 7 billion to 70 billion parameters, are trained using the innovative HeuriMentor (HM) framework. The framework integrates three core components: the Adaptive Training Engine (ATE), the Training State Monitor (TSM), and the Data Management Unit (DMU). It significantly enhances the controllability and flexibility of the training process and enables dynamic adjustment of the data distribution during training, improving both training efficiency and model performance.
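To make this division of labor concrete, the sketch below shows in Python how such a feedback loop could be wired together: a monitoring component aggregates per-source training losses over a window of steps, and a data-management component re-weights the sampling mixture before the next window begins. All class names, fields, and the loss-based re-weighting rule here are illustrative assumptions for this article, not BAAI's actual implementation.

```python
import random
from dataclasses import dataclass, field


@dataclass
class DataManagementUnit:
    """Keeps per-source sampling weights and re-weights them based on feedback."""
    weights: dict = field(
        default_factory=lambda: {"en_web": 0.4, "zh_web": 0.4, "code": 0.2}
    )

    def sample_source(self) -> str:
        sources, probs = zip(*self.weights.items())
        return random.choices(sources, weights=probs, k=1)[0]

    def reweight(self, per_source_loss: dict) -> None:
        # One possible policy: shift sampling mass toward sources with higher
        # average loss, i.e. the data the model currently finds hardest.
        total = sum(per_source_loss.values())
        self.weights = {s: loss / total for s, loss in per_source_loss.items()}


class TrainingStateMonitor:
    """Records training losses per data source over one window of steps."""

    def __init__(self) -> None:
        self.losses: dict = {}

    def record(self, source: str, loss: float) -> None:
        self.losses.setdefault(source, []).append(loss)

    def summary(self) -> dict:
        return {s: sum(v) / len(v) for s, v in self.losses.items()}


def train_window(dmu: DataManagementUnit, steps: int = 100) -> None:
    """One training window: sample batches, record losses, then adjust the mix."""
    tsm = TrainingStateMonitor()
    for _ in range(steps):
        source = dmu.sample_source()
        loss = random.uniform(1.0, 3.0)  # stand-in for a real forward/backward pass
        tsm.record(source, loss)
    dmu.reweight(tsm.summary())  # the next window samples from the updated mixture
```

The point of the sketch is only the control flow: training proceeds in windows, the monitor's summary feeds the data manager, and the adjusted mixture takes effect for the following window without restarting the run.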

In terms of model design, the Aquila2 series adopts a carefully curated vocabulary of 100,000 tokens built with Byte Pair Encoding (BPE), balancing vocabulary coverage with expressive richness. The training data is a balanced mix of English and Chinese drawn from high-quality corpora such as the Pile and WudaoCorpus, laying a solid foundation for bilingual processing. In addition, Aquila2 adopts the Grouped Query Attention (GQA) mechanism and Rotary Position Embedding (RoPE), further improving inference efficiency and the handling of sequential data.
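For readers unfamiliar with these two components, the following PyTorch sketch illustrates the general ideas: RoPE rotates query and key vectors by position-dependent angles so that attention scores depend on relative position, and GQA lets several query heads share a single key/value head, shrinking the key/value cache at inference time. The head counts, dimensions, and function names are illustrative examples rather than Aquila2's actual configuration or code, and causal masking is omitted for brevity.

```python
import torch


def rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Apply rotary position embedding to x of shape (batch, heads, seq, head_dim)."""
    b, h, s, d = x.shape
    half = d // 2
    freqs = base ** (-torch.arange(0, half, dtype=torch.float32) / half)   # (half,)
    angles = torch.arange(s, dtype=torch.float32)[:, None] * freqs[None, :]  # (seq, half)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., :half], x[..., half:]
    # Rotate each (x1, x2) pair by its position-dependent angle.
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)


def grouped_query_attention(q, k, v, n_q_heads=8, n_kv_heads=2):
    """q: (batch, seq, n_q_heads*head_dim); k, v: (batch, seq, n_kv_heads*head_dim)."""
    b, s, _ = q.shape
    head_dim = q.shape[-1] // n_q_heads
    q = q.view(b, s, n_q_heads, head_dim).transpose(1, 2)
    k = k.view(b, s, n_kv_heads, head_dim).transpose(1, 2)
    v = v.view(b, s, n_kv_heads, head_dim).transpose(1, 2)
    q, k = rope(q), rope(k)
    # Each group of query heads shares one key/value head: repeat K and V.
    group = n_q_heads // n_kv_heads
    k = k.repeat_interleave(group, dim=1)
    v = v.repeat_interleave(group, dim=1)
    scores = (q @ k.transpose(-2, -1)) / head_dim ** 0.5
    out = scores.softmax(dim=-1) @ v
    return out.transpose(1, 2).reshape(b, s, n_q_heads * head_dim)
```

In this toy configuration, with 8 query heads and 2 key/value heads, every group of 4 query heads attends over the same cached keys and values, which is where GQA's memory and bandwidth savings come from during inference.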

In comprehensive evaluations, the Aquila2-34B model demonstrates strong performance across a range of natural language processing tasks, particularly bilingual (Chinese and English) understanding and comprehension tasks. Compared with leading models such as Baichuan2, Qwen, LLaMA2, and InternLM, Aquila2-34B achieves excellent results on multiple benchmark datasets, demonstrating its competitiveness and broad application potential.

Notably, the training of the Aquila2 series benefits from the HM framework, which enables real-time adjustment and optimization of the data distribution, thereby accelerating model convergence and improving final model quality. This innovation provides new ideas and methods for LLM training and lays a solid foundation for the future development of AI technology.