New Open-Source AI Model Matches DeepSeek in Performance with Significantly Less Training Data

2025-02-14

An international research team drawn from leading academic institutions and tech companies has unveiled a new open-source model that could shake up the AI reasoning field. It matches the performance of DeepSeek's R1, one of China's most advanced AI systems, and even surpasses it in some areas.

OpenThinker-32B, developed by the Open Thoughts consortium, achieved an accuracy of 90.6% on the MATH500 benchmark, slightly higher than DeepSeek's 89.4%.

The model also beats DeepSeek on general scientific knowledge, scoring 61.6 on the GPQA-Diamond benchmark of graduate-level science questions against DeepSeek's 57.6. On the two math tests, the picture is split: it wins on MATH500 but falls short on the AIME benchmark. (All comparisons are against the similar-scale DeepSeek R1 version.)

In coding, it lags slightly behind, scoring 68.9 on the LCBv2 benchmark versus DeepSeek's 71.2. However, since the model is open source, all of these scores could improve significantly once the community begins enhancing it.

This achievement stands out for its efficiency: OpenThinker reaches these results with only 114,000 training samples, whereas DeepSeek used 800,000.

The OpenThoughts-114k dataset pairs each question with detailed metadata, including correct solutions, test cases for code problems, starter code where necessary, and domain-specific information.
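For readers who want to inspect the data themselves, here is a minimal sketch using the HuggingFace `datasets` library. The repository ID follows the dataset's published name, but the column names are not guaranteed to match the schema exactly, which is why the snippet discovers them at runtime rather than assuming them:

```python
# Minimal sketch: inspecting the OpenThoughts-114k dataset.
# Requires `pip install datasets`; the repo ID is taken from the
# dataset's name and should be verified on HuggingFace.
from datasets import load_dataset

ds = load_dataset("open-thoughts/OpenThoughts-114k", split="train")

print(ds.column_names)  # list the actual metadata fields
for key, value in ds[0].items():
    print(f"{key}: {str(value)[:120]}")  # preview one record
```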

The team's custom Curator framework verifies code solutions against their test cases, while AI judges handle mathematical validation.
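Curator's internals aren't detailed here, but the verification idea itself is simple: execute a candidate solution against the known test cases and keep it only if every case passes, falling back to an LLM judge for math answers that can't be checked by execution. The sketch below illustrates that approach in plain Python; it is not the actual Curator implementation, and the test-case format and `ask_llm` callable are hypothetical:

```python
import subprocess
import sys

def passes_test_cases(solution_code: str, test_cases: list[dict],
                      timeout: float = 10.0) -> bool:
    """Run a candidate solution in a subprocess and compare its stdout
    with the expected output for every test case. Illustrative only;
    not the actual Curator implementation."""
    for case in test_cases:  # each case: {"input": ..., "expected": ...}
        try:
            result = subprocess.run(
                [sys.executable, "-c", solution_code],
                input=case["input"],
                capture_output=True,
                text=True,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            return False
        if result.stdout.strip() != case["expected"].strip():
            return False
    return True

def math_answer_verified(answer: str, ground_truth: str, ask_llm) -> bool:
    """Math can't be verified by execution, so delegate the comparison
    to an LLM judge (`ask_llm` is any hypothetical completion function)."""
    prompt = ("Do these two final answers agree mathematically? "
              "Reply YES or NO.\n"
              f"Answer A: {answer}\nAnswer B: {ground_truth}")
    return ask_llm(prompt).strip().upper().startswith("YES")
```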

The team reported training on four nodes, each equipped with eight H100 GPUs, completing the run in approximately 90 hours. A separate model variant, trained on a larger dataset of 137,000 unverified samples, ran on Italy's Leonardo supercomputer and consumed 11,520 A100 hours in just 30 hours.
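A quick back-of-the-envelope check, using only the figures reported above, puts the main run at roughly 2,880 H100 GPU-hours and implies the Leonardo run used a few hundred GPUs in parallel:

```python
# Back-of-the-envelope compute from the figures reported above.
h100_gpu_hours = 4 * 8 * 90        # 4 nodes x 8 H100s x ~90 hours = 2,880 GPU-hours
a100_gpus_parallel = 11_520 // 30  # 11,520 A100-hours over 30 hours = 384 GPUs
print(h100_gpu_hours, a100_gpus_parallel)
```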

"Verification maintains quality while expanding the diversity and scale of training prompts," the team noted in their documentation. The study shows that even unverified versions perform well, though they do not reach the optimal results of verified models.

The model is built on Alibaba's Qwen2.5-32B-Instruct LLM and supports a modest 16,000-token context window, enough for complex mathematical proofs and lengthy coding problems but well below current standards.
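That limit matters in practice when feeding the model long problems. A quick way to check whether a prompt fits is sketched below with the HuggingFace `transformers` tokenizer of the Qwen2.5 base model; the 16,000-token figure comes from the article, and the reserved output budget is an arbitrary assumption:

```python
# Sketch: checking a prompt against the reported 16,000-token context window.
# Uses the base model's tokenizer; requires `pip install transformers`.
from transformers import AutoTokenizer

CONTEXT_WINDOW = 16_000  # limit reported for OpenThinker-32B

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-32B-Instruct")

def fits_in_context(prompt: str, reserved_for_output: int = 2_048) -> bool:
    """True if the prompt plus a reserved output budget fits the window."""
    return len(tokenizer.encode(prompt)) + reserved_for_output <= CONTEXT_WINDOW

print(fits_in_context("Prove that the sum of the first n odd numbers is n**2."))
```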

This release comes amid intensifying competition in AI reasoning capabilities, with the field moving at breakneck speed. OpenAI announced on February 12 that every model from GPT-5 onward will feature reasoning capabilities. The following day, Elon Musk hyped the enhanced problem-solving abilities of xAI's Grok-3, promising it would be the best reasoning model yet. Just hours earlier, Nous Research had released another open-source reasoning model, DeepHermes, based on Meta's Llama 3.1.

Momentum in the field grew after DeepSeek demonstrated performance comparable to OpenAI's o1 at a significantly lower cost. DeepSeek R1 is free to download, use, and modify, and its training techniques are openly shared.

However, unlike the Open Thoughts team, which chose to open-source everything, DeepSeek's developers keep their training data confidential.

This key difference means developers may find it easier to understand OpenThinker and reproduce its results from scratch, as they have access to all the pieces of the puzzle.

For the broader AI community, this release reaffirms the feasibility of building competitive models without massive proprietary datasets. Additionally, for Western developers still wary of using Chinese models, open source or not, it could be a more trustworthy alternative.

OpenThinker is available for download on HuggingFace. A smaller, less powerful 7B-parameter model is also available for lower-end devices.
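For anyone who wants to try it locally, a minimal sketch of loading and querying the model with HuggingFace `transformers` follows. The repo ID is assumed from the project's naming and should be verified on HuggingFace; the 32B model needs roughly 64 GB of GPU memory in bfloat16, so the 7B version is the practical choice for most machines:

```python
# Sketch: generating with OpenThinker via transformers.
# The repo ID is assumed from the project's naming; verify it on HuggingFace.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "open-thoughts/OpenThinker-32B"  # or the smaller 7B variant

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "What is the sum of the first 100 odd numbers?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=1024)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```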

The Open Thoughts team includes researchers from several U.S. universities, including Stanford, Berkeley, and UCLA, along with Germany's Jülich Supercomputing Centre. It is supported by the Toyota Research Institute in the U.S. and other players in the EU's AI sector.