Alibaba Group Holding Ltd.'s stock price surged more than 8% today after the company released a reasoning model that it says performs on par with DeepSeek-R1.
The new model, QwQ-32B, was open-sourced on Wednesday.
R1 is a mixture-of-experts model: it comprises multiple neural networks with a combined 671 billion parameters, but when responding to a query it activates only a fraction of those networks, using about 37 billion of the 671 billion parameters at any given time. Alibaba's new QwQ-32B is significantly smaller, with a total of 32.5 billion parameters according to the company.
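To illustrate the difference, the snippet below is a minimal, generic mixture-of-experts sketch in the spirit of R1's design, not DeepSeek's actual code; the sizes are toy values and all names are made up for the example. A small router scores a set of expert networks and runs only the top-scoring few, which is how a model can hold far more parameters than it uses for any single token.

```python
import numpy as np

rng = np.random.default_rng(0)
D, N_EXPERTS, TOP_K = 64, 8, 2   # toy sizes, not R1's real configuration

# Each "expert" is a stand-in feed-forward block; only routed ones run.
experts = [rng.standard_normal((D, D)) / np.sqrt(D) for _ in range(N_EXPERTS)]
router_w = rng.standard_normal((D, N_EXPERTS)) / np.sqrt(D)

def moe_layer(x):
    """Send token vector x through its TOP_K highest-scoring experts only."""
    scores = x @ router_w                      # one routing score per expert
    top = np.argsort(scores)[-TOP_K:]          # indices of the chosen experts
    w = np.exp(scores[top]) / np.exp(scores[top]).sum()   # softmax weights
    # Most expert matrices stay idle: that is why a 671B-parameter model
    # can touch only ~37B parameters per token.
    return sum(wi * (x @ experts[i]) for wi, i in zip(w, top))

print(moe_layer(rng.standard_normal(D)).shape)  # (64,)
```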
QwQ-32B is built on the Transformer architecture that underpins most large language models. Transformer-based LLMs rely on a machine learning technique called attention, which lets a neural network take many parts of the input into account at once while giving the most relevant parts the greatest weight in its decisions.
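The core of that mechanism is scaled dot-product attention, shown here as a short, self-contained sketch rather than QwQ-32B's actual implementation, which adds multiple heads, masking and other refinements:

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention: score every key against every query,
    softmax the scores, and take a weighted average of the values."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])          # pairwise relevance
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # each row sums to 1
    return weights @ V

rng = np.random.default_rng(0)
seq_len, d = 5, 16                                   # toy sizes
Q, K, V = (rng.standard_normal((seq_len, d)) for _ in range(3))
print(attention(Q, K, V).shape)                      # (5, 16)
```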
For this model, Alibaba made several modifications to the original Transformer architecture. A key addition is rotary position embeddings, or RoPE, which encode each token's position in the input so the LLM can better track relationships between text segments, thereby improving output quality.
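RoPE rotates pairs of dimensions in each token's query and key vectors by an angle proportional to the token's position, so attention scores end up depending on the relative distance between tokens. The sketch below shows the idea in its simplest form; it is not the model's exact implementation, and production versions differ in how they pair dimensions and handle batching.

```python
import numpy as np

def rope(x, pos, base=10000.0):
    """Rotate dimension pairs of vector x by position-dependent angles."""
    half = x.shape[-1] // 2
    freqs = base ** (-np.arange(half) / half)   # one frequency per pair
    ang = pos * freqs
    x1, x2 = x[:half], x[half:]
    return np.concatenate([x1 * np.cos(ang) - x2 * np.sin(ang),
                           x1 * np.sin(ang) + x2 * np.cos(ang)])

rng = np.random.default_rng(0)
q, k = rng.standard_normal(16), rng.standard_normal(16)
# The query-key score depends only on the relative offset (here, 4):
print(np.isclose(rope(q, 3) @ rope(k, 7), rope(q, 103) @ rope(k, 107)))  # True
```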
The model can handle prompts of up to 131,072 tokens, with each token corresponding to a few characters or a word fragment. Alibaba says the model excels particularly at reasoning tasks such as code generation, mathematical problem solving and carrying out tasks in external applications.
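For readers who want to try the model, here is a minimal usage sketch with the Hugging Face transformers library, assuming the open-sourced weights are published under the Qwen/QwQ-32B repository name; note that running a 32.5-billion-parameter model locally requires substantial GPU memory.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/QwQ-32B"   # assumed Hugging Face repository name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "How many primes are below 50?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Prompt plus generated tokens must fit in the 131,072-token context window.
outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```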
The company developed QwQ-32B using a method called reinforcement learning. In a reinforcement learning project, researchers give an AI model training tasks and have a second AI model evaluate its responses. When the LLM being trained completes a task correctly, it earns a reward, and those rewards steer subsequent training toward the behavior that produced them.
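The toy loop below captures that feedback cycle in miniature; every name in it is illustrative, and it bears no resemblance to the scale or algorithms of Alibaba's actual training. A "policy" samples candidate answers, a stand-in "reward model" scores them, and the scores shift the policy toward answers that earn points.

```python
import random

random.seed(0)
candidates = ["2 + 2 = 4", "2 + 2 = 5", "2 + 2 = 22"]
weights = [1.0, 1.0, 1.0]                 # start with a uniform policy

def reward_model(answer: str) -> float:
    """Stand-in grader: in real training this is a learned evaluator model."""
    return 1.0 if answer == "2 + 2 = 4" else 0.0

for _ in range(200):
    i = random.choices(range(3), weights=weights)[0]   # policy samples
    r = reward_model(candidates[i])                    # evaluator scores
    weights[i] *= 1.2 if r > 0.5 else 0.8              # reinforce or discourage

print(max(zip(weights, candidates))[1])   # "2 + 2 = 4" dominates after training
```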
Alibaba developed QwQ-32B in two training stages. The first stage focused on teaching the model mathematical and coding skills. To support it, Alibaba set up a server that ran and checked the code QwQ-32B generated during training.
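A grader of that kind can be as simple as executing a candidate solution against known test cases and returning the pass rate as the reward. The function below is a hedged sketch of the concept; the details of Alibaba's verification server are not public, and the solution entry-point name here is an assumption for the example.

```python
def code_reward(candidate_src: str, tests: list[tuple[tuple, object]]) -> float:
    """Run generated code against test cases; return the fraction passed."""
    namespace: dict = {}
    try:
        exec(candidate_src, namespace)   # define the candidate function
        fn = namespace["solution"]       # assumed entry-point name
    except Exception:
        return 0.0                       # code that fails to run earns zero
    passed = 0
    for args, expected in tests:
        try:
            if fn(*args) == expected:
                passed += 1
        except Exception:
            pass                         # runtime errors count as failures
    return passed / len(tests)

generated = "def solution(n):\n    return n * n\n"      # pretend model output
print(code_reward(generated, [((3,), 9), ((5,), 25)]))  # 1.0
```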
In the second training stage, the company refined QwQ-32B's general problem-solving capabilities. Although this stage followed a relatively simple workflow, it not only enhanced the model's problem-solving skills but also improved its ability to align output with user instructions.
According to Alibaba, QwQ-32B outperformed R1 on three of five benchmarks commonly used to compare LLMs. The widest margin, about 6%, came on a benchmark that measures an LLM's ability to interact with external systems. The other two wins came on tests of the model's question-answering skills and of its ability to align output with user instructions.
The release of QwQ-32B follows Alibaba's recent commitment to invest 380 billion yuan (approximately $53 billion) in AI infrastructure over the next three years – more than the company's combined investments in AI and its public cloud platform over the past decade.
Other Chinese tech giants are also prioritizing LLM development. Last week, Tencent Holdings Ltd. launched Hunyuan Turbo S, a "fast-thinking" model that Tencent says responds to prompts in under a second with output quality comparable to that of DeepSeek-V3, R1's predecessor.