Moore Threads launches open-source large-scale audio understanding model MooER

2024-08-26

Recently, Moore Threads officially released its self-developed audio understanding model, MooER, the first open-source project of its kind in China trained on domestically produced full-featured GPUs. MooER not only recognizes Chinese and English speech but also performs Chinese-to-English speech translation, marking an important step in the development of domestic AI speech technology. Its performance on the CoVoST2 Chinese-to-English test set is particularly impressive, reaching a BLEU score of 25.2, close to the level required for industrial applications.

To promote the further development of AI speech technology, the Moore Threads AI team has publicly released MooER's inference code together with the model trained on 5,000 hours of data, and plans to open-source more training code and an 80,000-hour-scale training dataset in the future.

From a technical perspective, MooER adopts a deep learning architecture trained end to end: text output is generated directly from the raw speech signal, eliminating the separate modules of traditional speech recognition pipelines. Its internal structure consists of three parts, an Encoder, an Adapter, and a Decoder built on a large language model (LLM), which are responsible for feature extraction, adapting the acoustic features to the language model, and text generation, respectively. In addition, MooER introduces LoRA (Low-Rank Adaptation), which improves training efficiency and effectiveness by optimizing only a small number of additional low-rank parameters rather than the full model.

It is worth noting that MooER also uses pseudo-labeling during training, reusing the model's own predictions as training data to further strengthen its learning. The model supports speech recognition in Chinese and English as well as Chinese-to-English speech translation, demonstrating strong multilingual processing capability. Moore Threads' move injects new vitality into the development of domestic AI speech technology, and with the opening of more training data and code, MooER is expected to become an important force driving the progress of AI speech technology.
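To make the three-part layout described above concrete, here is a minimal PyTorch sketch of an Encoder-Adapter-LLM-Decoder pipeline. All class names, layer sizes, and the assumption that the decoder accepts precomputed input embeddings (as Hugging Face causal LMs do via `inputs_embeds`) are illustrative assumptions, not MooER's actual implementation.

```python
import torch
import torch.nn as nn

class AudioEncoder(nn.Module):
    """Extracts frame-level acoustic features from the input speech."""
    def __init__(self, n_mels=80, d_model=512):
        super().__init__()
        # Downsample in time while projecting mel features to the model width.
        self.subsample = nn.Conv1d(n_mels, d_model, kernel_size=4, stride=4)
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.layers = nn.TransformerEncoder(layer, num_layers=6)

    def forward(self, mels):                       # mels: (batch, n_mels, time)
        x = self.subsample(mels).transpose(1, 2)   # (batch, time // 4, d_model)
        return self.layers(x)

class Adapter(nn.Module):
    """Maps acoustic features into the LLM's embedding space."""
    def __init__(self, d_audio=512, d_llm=4096):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(d_audio, d_llm), nn.GELU(), nn.Linear(d_llm, d_llm))

    def forward(self, feats):
        return self.proj(feats)

class SpeechLLM(nn.Module):
    """Encoder -> Adapter -> LLM decoder, trained end to end."""
    def __init__(self, encoder, adapter, llm):
        super().__init__()
        self.encoder, self.adapter, self.llm = encoder, adapter, llm

    def forward(self, mels, prompt_embeds):
        audio_embeds = self.adapter(self.encoder(mels))   # audio "tokens"
        # Prepend the adapted audio embeddings to the text prompt embeddings;
        # the LLM decoder then generates the transcription or translation.
        inputs = torch.cat([audio_embeds, prompt_embeds], dim=1)
        return self.llm(inputs_embeds=inputs)
```

The key design point is that the speech signal never passes through a separate acoustic model, pronunciation lexicon, or language model: the encoder output is simply projected into the decoder's embedding space and consumed like ordinary tokens.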
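The LoRA technique mentioned above can be illustrated on a single linear layer. This is a minimal, generic sketch of Low-Rank Adaptation, not MooER's training code; the rank and scaling values are placeholders.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wraps a frozen linear layer and adds a trainable low-rank update B @ A."""
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():   # the pretrained weights stay frozen
            p.requires_grad = False
        self.lora_a = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x):
        # y = W x + scale * (B A) x ; only A and B receive gradients.
        return self.base(x) + (x @ self.lora_a.T @ self.lora_b.T) * self.scale

# Example: only the low-rank matrices are trainable, a tiny fraction of W.
layer = LoRALinear(nn.Linear(4096, 4096), rank=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(f"trainable LoRA parameters: {trainable}")   # far fewer than 4096 * 4096
```

Because only the small A and B matrices are updated, fine-tuning an LLM-based decoder of this size becomes far cheaper in memory and compute, which is the efficiency gain the article refers to.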
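The pseudo-labeling step can likewise be sketched in a few lines: a trained model transcribes unlabeled audio and its (optionally confidence-filtered) predictions are reused as training targets. The `transcribe` method and the confidence threshold below are hypothetical stand-ins, not part of MooER's published interface.

```python
import torch

@torch.no_grad()
def generate_pseudo_labels(model, unlabeled_loader, min_confidence=0.9):
    """Run inference on unlabeled audio and keep confident predictions as labels."""
    model.eval()
    pseudo_dataset = []
    for audio in unlabeled_loader:
        # 'transcribe' is an assumed helper returning (hypothesis, confidence).
        hypothesis, confidence = model.transcribe(audio)
        if confidence >= min_confidence:
            pseudo_dataset.append((audio, hypothesis))
    return pseudo_dataset

# The pseudo-labeled pairs are then mixed with human-labeled data and the
# model is trained again on the combined set.
```

In this way the model effectively expands its own training corpus, which is how pseudo-labeling "uses the model's own prediction results as training data" to improve learning on large amounts of untranscribed speech.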