DeepSeek V3 Open-Sourced: 685-Billion-Parameter Model Excels in Multidisciplinary Evaluations

2024-12-27

High-Flyer, the renowned quantitative fund, has officially open-sourced the latest version of its AI model, DeepSeek V3, through its artificial intelligence company, DeepSeek. The model makes significant advances in multilingual programming and performs strongly across a wide range of evaluations.

DeepSeek V3 employs a Mixture-of-Experts (MoE) architecture with 685 billion parameters, featuring 256 routed experts per MoE layer. A sigmoid routing mechanism selects the top 8 experts for each token, so only a small fraction of the parameters is active in any forward pass. This design lets the model handle complex tasks more efficiently, improving both response speed and throughput: generation speed has increased from 20 TPS to 60 TPS, a threefold improvement over the V2.5 model, which is especially noticeable on long texts.
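As a rough illustration of how such a router works, here is a minimal sketch of sigmoid top-k gating in Python. The dimensions, normalization, and toy data are illustrative assumptions, not DeepSeek's actual implementation:

```python
import numpy as np

def sigmoid_topk_router(hidden, expert_centroids, k=8):
    """Pick the top-k experts per token using sigmoid affinity scores.

    hidden:           (num_tokens, d_model) token representations
    expert_centroids: (num_experts, d_model) one learnable vector per expert
    Returns the chosen expert indices and their normalized gate weights.
    """
    # Affinity of every token to every expert, squashed with a sigmoid
    # (a sigmoid score per expert, rather than a softmax across experts).
    logits = hidden @ expert_centroids.T              # (tokens, experts)
    scores = 1.0 / (1.0 + np.exp(-logits))            # sigmoid gating

    # Indices of the k highest-scoring experts for each token.
    topk_idx = np.argsort(scores, axis=-1)[:, -k:]    # (tokens, k)
    topk_scores = np.take_along_axis(scores, topk_idx, axis=-1)

    # Normalize the selected scores so each token's gates sum to 1
    # (assumed here for simplicity).
    gates = topk_scores / topk_scores.sum(axis=-1, keepdims=True)
    return topk_idx, gates

# Toy usage: 4 tokens, 16-dim model, 256 experts, top-8 selection.
rng = np.random.default_rng(0)
idx, gates = sigmoid_topk_router(rng.normal(size=(4, 16)),
                                 rng.normal(size=(256, 16)), k=8)
print(idx.shape, gates.shape)  # (4, 8) (4, 8)
```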

In terms of functionality, DeepSeek V3 handles natural language queries and code generation, helping developers quickly produce code snippets and improve development efficiency. The model was trained with FP8 mixed precision and uses the DualPipe algorithm to overlap computation with cross-node all-to-all communication, further improving training efficiency. It was pre-trained on 14.8T tokens, then underwent two stages of context extension that grew the context window from 4K to 128K, followed by supervised fine-tuning and reinforcement learning.
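One idea behind FP8 mixed-precision training is fine-grained scaling: quantizing small tiles of a tensor independently, so an outlier value only degrades its own tile. The sketch below shows just that scaling step; the actual cast to 8-bit E4M3 values is hardware-specific and elided here, and the 128-element tile size is an assumption for illustration:

```python
import numpy as np

FP8_E4M3_MAX = 448.0  # largest finite value representable in E4M3

def quantize_tiles(x, tile=128):
    """Compute per-tile scales for simulated FP8 (E4M3) storage.

    Assumes the last dimension is a multiple of `tile`. Each tile gets
    its own scale factor, so a single outlier only affects its own tile
    instead of flattening the dynamic range of the whole tensor.
    """
    x = x.reshape(-1, tile)
    scales = np.abs(x).max(axis=-1, keepdims=True) / FP8_E4M3_MAX
    scales = np.maximum(scales, 1e-12)  # avoid division by zero
    return x / scales, scales           # real FP8 would cast x/scales to 8 bits

def dequantize_tiles(q, scales):
    """Recover approximate original values from tiles and their scales."""
    return q * scales

# Toy round trip: error is near zero here only because the 8-bit cast is
# elided; in real FP8 training the cast introduces the quantization error
# that per-tile scales keep small.
rng = np.random.default_rng(0)
a = rng.normal(size=(4, 128)).astype(np.float32)
a[0, 0] = 300.0  # outlier confined to one tile
q, s = quantize_tiles(a)
print(np.abs(dequantize_tiles(q, s) - a.reshape(-1, 128)).max())
```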

Performance-wise, DeepSeek V3 excels on multiple standard and open benchmarks, especially in coding and mathematics. The chat version outperforms other open-source models and matches the performance of leading closed-source models. Notably, the full training run took only 2.788M H800 GPU hours, or $5.576M at $2 per GPU hour, making the model remarkably cost-effective.

From a technical perspective, the MoE architecture lets each expert specialize in particular tasks or data types, with the dynamic routing mechanism selecting the appropriate experts for every computation. This improves computational efficiency while cutting unnecessary compute and memory usage: an expert that receives no tokens does no work. In terms of operational workflow, DeepSeek V3 can be applied in stages such as planning, searching, extracting, and enriching, using the language model to identify and extract specific information from content and then refine it.
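To make the efficiency claim concrete, here is a sketch of the dispatch-and-combine step of an MoE layer: each expert runs only on the tokens routed to it, and idle experts cost nothing. The expert FFNs, shapes, and toy routing below are hypothetical:

```python
import numpy as np

def moe_forward(tokens, expert_ffns, topk_idx, gates):
    """Run only the selected experts per token and mix their outputs.

    tokens:      (num_tokens, d_model)
    expert_ffns: list of callables, one per expert (each a small FFN)
    topk_idx:    (num_tokens, k) expert indices chosen by the router
    gates:       (num_tokens, k) normalized gate weights
    """
    out = np.zeros_like(tokens)
    for e, ffn in enumerate(expert_ffns):
        token_ids, slot = np.nonzero(topk_idx == e)  # tokens routed to e
        if token_ids.size == 0:
            continue                                  # idle expert: zero FLOPs
        # The expert processes only its own batch of tokens...
        expert_out = ffn(tokens[token_ids])
        # ...and each token weights that output by its gate for this expert.
        out[token_ids] += gates[token_ids, slot, None] * expert_out
    return out

# Toy usage: 6 tokens, 4 experts (identity FFNs), top-2 routing.
rng = np.random.default_rng(1)
toks = rng.normal(size=(6, 8))
idx = rng.integers(0, 4, size=(6, 2))
g = np.full((6, 2), 0.5)
print(moe_forward(toks, [lambda x: x] * 4, idx, g).shape)  # (6, 8)
```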

Furthermore, DeepSeek V3 handles images through OCR rather than native multimodal understanding: the OCRv12 pipeline is used to better preserve the text, formatting, and formulas found in images. For stream rendering, the web version streams its output, but at the current 60 TPS the display may lag because the accumulated Markdown is re-parsed each time a new chunk arrives.
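A toy demonstration of why that re-parsing lags behind streaming: the parser below is a stand-in whose cost grows linearly with input length, so re-parsing the full buffer on every chunk makes the total work quadratic in response length. The quadratic total is the point, not the specific numbers:

```python
import time

def render_markdown(text):
    """Stand-in for a real Markdown parser; cost grows with input length."""
    time.sleep(len(text) * 1e-6)  # pretend parsing is linear in text size
    return text                    # a real renderer would return HTML

def stream_render(chunks):
    """Naive streaming UI: re-parse the full buffer on every new chunk."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        render_markdown(buffer)    # whole document re-parsed each time
    return buffer

chunks = ["token "] * 500          # a 500-chunk streamed response
start = time.perf_counter()
stream_render(chunks)
print(f"naive re-render took {time.perf_counter() - start:.2f}s")
```

An incremental renderer that only re-parses the trailing, still-open Markdown block would keep per-chunk work roughly constant instead.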

Across evaluations, DeepSeek V3 has achieved outstanding results. On LiveBench the model scored very high, reflecting its ability to answer user queries capably and quickly. On educational benchmarks such as MMLU and MMLU-Pro it achieves high accuracy, surpassing all other open-source models and matching leading closed-source models. On factuality benchmarks it outperforms GPT-4o and Claude 3.5 Sonnet on Chinese factual knowledge. On coding, mathematics, and reasoning benchmarks, it leads on mathematics and excels at programming-related tasks.

In summary, DeepSeek V3, the latest open-source AI model from DeepSeek, a subsidiary of High-Flyer, demonstrates exceptional multilingual programming capability, a distinctive technical design, and strong results across performance evaluations and benchmarks.