DeepSeek AI Open Sources Advanced Large Language Model, Outperforming Llama2

2023-12-04

Chinese artificial intelligence startup DeepSeek AI has launched and open-sourced the DeepSeek LLM series of large language models, comprising DeepSeek LLM 7B/67B Base and DeepSeek LLM 7B/67B Chat.

The 67B Base version of DeepSeek LLM outperforms Llama2 70B Base in reasoning, coding, mathematics, and Chinese comprehension.

This progress is not only a quantitative leap but also a qualitative improvement, reflecting the model's proficiency across a wide range of application areas. In particular, DeepSeek LLM 67B Chat achieved a pass rate of 73.78% on the coding benchmark HumanEval, surpassing models of similar scale. It also achieved an exceptionally high score of 84.1% on the mathematical dataset GSM8K without any task-specific fine-tuning.

DeepSeek AI has open-sourced models with 7 billion and 67 billion parameters, each in both a base version and a specialized chat variant. By providing open access to these models, the company hopes to promote broader artificial intelligence research and commercial applications.

To ensure fairness in performance evaluation, DeepSeek AI has designed new question sets, including the Hungarian National High School Exam and Google's Instruction Following Evaluation Dataset. These evaluations demonstrate the model's outstanding capabilities in previously unseen exams and tasks.

The startup outlines its rigorous data collection and training process, which focuses on respecting copyrights while enhancing diversity and uniqueness. Its multi-step pipeline introduces high-quality text, mathematics, code, books, and other data, applying filtering to remove toxic and duplicate content.
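The filtering step described above can be illustrated with a minimal, hypothetical sketch: exact-duplicate removal via content hashing plus a keyword blocklist. The function name and blocklist are illustrative only; DeepSeek's actual pipeline is not public in this detail and almost certainly uses more sophisticated techniques such as fuzzy deduplication and learned quality classifiers.

```python
import hashlib

def dedup_filter(docs, banned_terms=("<toxic>",)):
    """Keep each document once, dropping exact duplicates
    and documents containing banned terms (hypothetical list)."""
    seen = set()
    kept = []
    for doc in docs:
        text = " ".join(doc.split())  # normalize whitespace before hashing
        if any(term in text for term in banned_terms):
            continue  # toxic-content filter
        digest = hashlib.sha256(text.encode()).hexdigest()
        if digest in seen:
            continue  # exact duplicate
        seen.add(digest)
        kept.append(doc)
    return kept
```

Real pipelines typically layer several such passes (URL-level, document-level, and near-duplicate filtering) before training.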

DeepSeek's language models adopt a similar architecture to LLaMA and undergo intensive pre-training. The 7B model utilizes multi-head attention mechanisms, while the 67B model leverages grouped query attention techniques. The training process involves large batch sizes and a multi-step learning rate schedule, ensuring robust and efficient learning.
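The difference between the two attention variants can be sketched in a few lines. In grouped-query attention (GQA), several query heads share a single key/value head, shrinking the KV cache at inference time; with one K/V head per query head it reduces to standard multi-head attention. This is a minimal NumPy illustration, not DeepSeek's implementation, and the shapes and head counts are chosen for readability.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def grouped_query_attention(q, k, v, n_kv_heads):
    """q: (n_q_heads, seq, d); k, v: (n_kv_heads, seq, d).
    Each group of n_q_heads // n_kv_heads query heads shares one K/V head."""
    n_q_heads, seq, d = q.shape
    group_size = n_q_heads // n_kv_heads
    out = np.empty_like(q)
    for h in range(n_q_heads):
        kv = h // group_size  # index of the shared K/V head for this query head
        scores = q[h] @ k[kv].T / np.sqrt(d)
        out[h] = softmax(scores) @ v[kv]
    return out
```

Setting `n_kv_heads` equal to the number of query heads recovers ordinary multi-head attention, which is why GQA is often described as an interpolation between multi-head and multi-query attention.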

Users can access DeepSeek Chat and DeepSeek Coder through web interfaces similar to ChatGPT or Claude. Note, however, that the web-based chatbots provided by DeepSeek include content restrictions to comply with Chinese regulations. This is distinct from the safety behavior instilled by reinforcement learning from human feedback (RLHF).

If a question touches on sensitive topics, it is automatically blocked and the response deleted. Instead of an answer, the model displays a message stating that the content has been withdrawn for "security reasons."

However, this censorship seems to only occur in the web experience at https://chat.deepseek.com/ and not in the actual model.

DeepSeek's release sets a new standard for the AI community, offering exciting possibilities for researchers and practitioners. This open-source initiative demonstrates DeepSeek AI's commitment to advancing the field and contributes meaningfully to the ongoing pursuit of more capable language models.