DeepSeek releases open-source Coder V2 model, surpassing GPT-4 Turbo

2024-06-18

Chinese AI startup DeepSeek, which previously drew attention for training a ChatGPT competitor on 2 trillion English and Chinese tokens, has announced the release of DeepSeek Coder V2, an open-source mixture-of-experts (MoE) code language model.


DeepSeek Coder V2 is built on DeepSeek-V2, an MoE model that debuted last month and has performed exceptionally well on programming and mathematical tasks. The new model supports more than 300 programming languages and outperforms several state-of-the-art closed-source models, including GPT-4 Turbo, Claude 3 Opus, and Gemini 1.5 Pro. DeepSeek claims this is the first time an open model has achieved such a feat, also placing it ahead of open models such as Llama 3-70B.

So, what new highlights does DeepSeek Coder V2 bring?

Founded last year, DeepSeek's stated mission is to "unravel the mystery of AGI (Artificial General Intelligence) with curiosity," and it has quickly become a standout in China's AI race alongside competitors such as Qwen, 01.AI, and Baidu. Within its first year, the company open-sourced a series of models, including the DeepSeek Coder family.

The original DeepSeek Coder topped out at 33 billion parameters and performed well on benchmarks, with features such as project-level code completion and infilling, but it supported only 86 programming languages and had a 16K context window. The new V2 version expands language support to 338 and the context window to 128K, enabling it to handle more complex and more varied coding tasks.

On MBPP+, HumanEval, and Aider, benchmarks that evaluate large language models' (LLMs') abilities in code generation, editing, and problem-solving, DeepSeek Coder V2 scored 76.2, 90.2, and 73.7, respectively, ahead of numerous closed- and open-source models including GPT-4 Turbo, Claude 3 Opus, Gemini 1.5 Pro, Codestral, and Llama-3 70B. It showed similarly strong results on benchmarks of mathematical ability (MATH and GSM8K).


The only model to beat DeepSeek's offering on multiple benchmarks was GPT-4o, which scored slightly higher on HumanEval, LiveCodeBench, MATH, and GSM8K.

DeepSeek attributes these technical and performance gains to DeepSeek-V2's mixture-of-experts (MoE) framework. Specifically, the company pretrained the base V2 model on an additional 6 trillion tokens, consisting primarily of code- and math-related data sourced from GitHub and CommonCrawl.

As a result, the model, offered in 16B and 236B parameter versions, activates only 2.4B or 21B "expert" parameters for any given task, accommodating a range of compute budgets and deployment needs.
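The idea behind this sparse activation can be illustrated with a toy sketch. The routing below is a generic top-k MoE layer, not DeepSeek's actual implementation, and the expert count and dimensions are made-up numbers: a router scores every expert per token, only the top-k experts run, and the rest of the parameters stay idle for that forward pass.

```python
import numpy as np

# Toy mixture-of-experts (MoE) routing sketch; NOT DeepSeek's real code.
# n_experts, top_k, and d_model are illustrative values, not the model's.
rng = np.random.default_rng(0)

n_experts = 8   # hypothetical number of experts in the layer
top_k = 2       # experts activated per token
d_model = 16    # toy hidden size

# Each "expert" is a small feed-forward weight matrix.
experts = [rng.standard_normal((d_model, d_model)) for _ in range(n_experts)]
router_w = rng.standard_normal((d_model, n_experts))

def moe_layer(x):
    """Route token vector x to its top-k experts and mix their outputs."""
    logits = x @ router_w
    chosen = np.argsort(logits)[-top_k:]      # indices of the top-k experts
    weights = np.exp(logits[chosen])
    weights /= weights.sum()                  # softmax over the chosen experts
    out = sum(w * (x @ experts[i]) for w, i in zip(weights, chosen))
    return out, chosen

x = rng.standard_normal(d_model)
y, used = moe_layer(x)
print(f"experts used: {sorted(used.tolist())}, "
      f"active fraction: {top_k / n_experts:.2f}")
```

With 2 of 8 experts firing, only a quarter of the expert parameters are active per token, which is the same effect, at toy scale, as the 236B model running on 21B active parameters.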

Beyond coding and math, DeepSeek Coder V2 also performs strongly on general reasoning and language-understanding tasks. On the MMLU benchmark, for example, it scored 79.2, surpassing other code-specific models and nearly matching Llama-3 70B. Although GPT-4o and Claude 3 Opus still lead on MMLU, the result shows open-source code models steadily closing the gap with state-of-the-art closed-source models.


DeepSeek Coder V2 is currently released under the MIT license, permitting both research and unrestricted commercial use. The 16B and 236B instruction-tuned and base models can be downloaded from Hugging Face, and DeepSeek also offers pay-as-you-go API access through its platform.
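For a sense of what the API route looks like, here is a minimal sketch of an OpenAI-style chat-completion request body. The model identifier "deepseek-coder" and the payload fields are assumptions for illustration, not taken from the article; consult DeepSeek's platform documentation for the actual endpoint and model names. The snippet only constructs and serializes the payload, without sending any network request.

```python
import json

# Hedged sketch: build an OpenAI-compatible chat request for DeepSeek's API.
# "deepseek-coder" is an assumed model identifier, not confirmed by the article.
payload = {
    "model": "deepseek-coder",
    "messages": [
        {"role": "user",
         "content": "Write a Python function that reverses a string."},
    ],
    "max_tokens": 256,
    "temperature": 0.0,  # deterministic-leaning output for code tasks
}

body = json.dumps(payload)  # this JSON string would be POSTed to the API
print(body)
```

In practice the serialized body would be sent with an API key in the request headers to DeepSeek's chat-completions endpoint.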

For those who want to try the model before committing, DeepSeek also provides a chatbot interface where users can interact with DeepSeek Coder V2 and experience its capabilities firsthand.