MiniMax Unveils China's First MoE Language Model: A Hundred Billion Parameters for Efficient Complex Task Processing

2024-01-17


On January 16, 2024, MiniMax launched abab6, China's first MoE (Mixture of Experts) large language model. The MoE architecture gives abab6 a very large parameter count, allowing it to handle complex tasks while improving training efficiency and data utilization. In more challenging, fine-grained scenarios, abab6 also shows significant improvements over abab5.5.

Last month, MiniMax Vice President Wei Wei revealed at a sub-forum of the Digital China Forum and Digital Development Forum that the company would soon release China's first large model based on the MoE architecture, comparable to OpenAI's GPT-4. After half a month of testing and feedback from selected customers, abab6 is now officially launched.

What is MoE (Mixture of Experts)?

Mixture of Experts is an ensemble learning approach that decomposes a problem into multiple sub-tasks and trains a group of expert networks on them. The model's parameters are partitioned into multiple "experts", and only a subset of experts participates in computation during each inference. This gives abab6 the advantages of a large parameter count for handling complex tasks while keeping computation efficient, so more data can be trained on in the same amount of time.
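To make the routing idea concrete, here is a minimal, generic sketch of an MoE feed-forward layer with top-k gating in PyTorch. It is not MiniMax's implementation; the class name `MoELayer` and all parameters (`d_model`, `num_experts`, `top_k`, etc.) are illustrative assumptions, and the per-expert loop is written for clarity rather than speed.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MoELayer(nn.Module):
    """Toy Mixture-of-Experts layer with top-k gating (illustrative sketch only)."""

    def __init__(self, d_model: int, d_hidden: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        # One small feed-forward "expert" network per expert slot.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model))
            for _ in range(num_experts)
        )
        # The router scores each token against every expert.
        self.router = nn.Linear(d_model, num_experts)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, d_model)
        scores = self.router(x)                            # (num_tokens, num_experts)
        top_w, top_idx = scores.topk(self.top_k, dim=-1)   # keep the k best experts per token
        top_w = F.softmax(top_w, dim=-1)                   # normalize their mixing weights

        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            idx = top_idx[:, slot]                         # expert chosen for this slot
            w = top_w[:, slot].unsqueeze(-1)               # its mixing weight
            for e, expert in enumerate(self.experts):
                mask = idx == e                            # tokens routed to expert e
                if mask.any():
                    out[mask] += w[mask] * expert(x[mask])
        return out


# Each token activates only top_k of the num_experts networks, so compute per token
# stays roughly constant even as the total parameter count grows with more experts.
layer = MoELayer(d_model=64, d_hidden=256, num_experts=8, top_k=2)
tokens = torch.randn(10, 64)
print(layer(tokens).shape)  # torch.Size([10, 64])
```

This is the core trade-off the article describes: total parameters scale with the number of experts, while the per-token computation is bounded by the few experts the router selects.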

MiniMax released its open platform in April 2023 and began developing its MoE model in June 2023. Most open-source and academic large language models today do not adopt the MoE architecture. To train abab6, MiniMax independently built an efficient MoE training and inference framework and developed several training techniques specific to MoE models. To date, abab6 is the first Chinese MoE large language model with over 100 billion parameters.