"Domestic GPU and AI Platform Join Forces for a Breakthrough: Moore Threads and Wuwen Xinqiong Successfully Train a 3B Large Model"

2024-05-27

Moore Threads and Wuwen Xinqiong (Infinigence AI) jointly announced today that they have successfully completed training of the 3B-parameter large model "MT-infini-3B" on a thousand-card cluster of domestically produced, full-featured GPUs. This milestone validates the capability of domestic GPUs in large-model training and opens a new chapter in the deep cooperation between domestic GPUs and large AI models.

According to the announcement, the training ran on the MTT S4000, a full-featured domestic GPU developed by Moore Threads, orchestrated through Wuwen Xinqiong's AIStudio PaaS platform. Over 13.2 days of continuous training, the cluster maintained 100% stability, and the scaling efficiency of thousand-card training relative to single-node training exceeded 90%. This result verifies the reliability of the thousand-card intelligent-computing cluster for large-model training and sets a precedent for the deep integration of domestic large language models with domestic GPU clusters.
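The "exceeded 90%" figure refers to linear scaling efficiency: achieved cluster throughput divided by the ideal throughput of N nodes running at single-node speed. A minimal sketch of that calculation, with purely hypothetical numbers (the announcement does not disclose node counts or throughput):

```python
def scaling_efficiency(cluster_throughput: float,
                       single_node_throughput: float,
                       num_nodes: int) -> float:
    """Linear scaling efficiency: achieved throughput vs. ideal N-node throughput."""
    ideal = single_node_throughput * num_nodes
    return cluster_throughput / ideal

# Hypothetical illustration only: a 125-node cluster delivering
# 115x the measured single-node throughput.
eff = scaling_efficiency(cluster_throughput=115.0,
                         single_node_throughput=1.0,
                         num_nodes=125)
print(f"{eff:.0%}")  # prints "92%"
```

An efficiency near 1.0 means communication and synchronization overhead consume almost none of the added hardware, which is the property the 90%+ claim highlights.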

The MT-infini-3B model produced in this run performs strongly among models of the same scale: compared with models trained on mainstream international hardware, it leads on multiple benchmarks, including C-Eval, MMLU, and CMMLU. This result demonstrates the capability of domestic GPUs in training large AI models.

Xia Lixue, co-founder and CEO of Wuwen Xinqiong, said the company is committed to building an "M x N" middle-layer product between M models and N chips, enabling efficient, unified deployment of diverse large-model algorithms across heterogeneous chips. Wuwen Xinqiong has entered a deep strategic partnership with Moore Threads; "MT-infini-3B" is the first end-to-end large-model training case under this cooperation, and also the industry's first end-to-end large-model training run built on domestic GPU chips from zero to one.
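The "M x N" idea is that M model frontends and N chip backends meet at one shared interface, so supporting a new model or a new chip costs M + N adapters rather than M x N pairwise integrations. A purely illustrative sketch of that decoupling (none of these class or method names are real AIStudio APIs):

```python
# Hypothetical sketch of an "M x N" middle layer: any registered model can
# target any registered chip backend through one uniform call.

class Backend:
    def run(self, op: str, payload: str) -> str:
        raise NotImplementedError

class MTTS4000Backend(Backend):
    """Stand-in for a Moore Threads GPU backend (illustrative only)."""
    def run(self, op: str, payload: str) -> str:
        return f"[MTT-S4000] {op}({payload})"

class CPUBackend(Backend):
    def run(self, op: str, payload: str) -> str:
        return f"[CPU] {op}({payload})"

class MiddleLayer:
    def __init__(self):
        self._backends: dict[str, Backend] = {}

    def register(self, name: str, backend: Backend) -> None:
        self._backends[name] = backend

    def execute(self, model: str, chip: str, op: str) -> str:
        # The model never talks to chip-specific APIs directly;
        # the middle layer dispatches to the chosen backend.
        return self._backends[chip].run(op, payload=model)

layer = MiddleLayer()
layer.register("mtt-s4000", MTTS4000Backend())
layer.register("cpu", CPUBackend())
print(layer.execute("MT-infini-3B", "mtt-s4000", "train_step"))
```

Adding a new chip means registering one new `Backend` subclass; every existing model gains access to it without per-model changes, which is the economic argument for the middle-layer approach.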

This cooperation not only provides strong support for the deep integration of domestic GPUs and large AI models but also injects new vitality into the domestic AI industry. As domestic GPU technology continues to mature and application scenarios expand, domestic GPUs can be expected to play an increasingly important role in AI.