DeepSeek Open-Source Week Day 3: Introducing DeepGEMM, an Open-Source Matrix Multiplication Library

2025-02-26

On the third day of Open Source Week, the DeepSeek team unveiled its latest matrix multiplication library, DeepGEMM. Designed specifically for the NVIDIA Hopper architecture, DeepGEMM is optimized for FP8 (8-bit floating-point) General Matrix Multiplication (GEMM). The library provides efficient kernels for both dense GEMMs and the grouped GEMMs used in Mixture-of-Experts (MoE) models.

DeepGEMM's key strength lies in its lightweight design and high performance. Written in CUDA, it requires no compilation at install time: a lightweight Just-In-Time (JIT) module compiles all kernels at runtime instead. This approach simplifies deployment while keeping the code flexible and adaptable.
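The idea behind runtime kernel generation can be sketched in a few lines: kernels are specialized for a concrete problem shape and cached, so each shape compiles once on first use. The sketch below is purely conceptual and does not use DeepGEMM's actual API; a Python closure stands in for the CUDA source that DeepGEMM would emit and compile.

```python
# Conceptual sketch (NOT DeepGEMM's real API): JIT-style kernel
# specialization keyed by problem shape, with a compile cache.
_kernel_cache = {}

def get_kernel(m, n, k):
    """Return a matmul kernel specialized for (m, n, k), compiling on first use."""
    key = (m, n, k)
    if key not in _kernel_cache:
        # DeepGEMM emits and compiles CUDA source here; we stand in with
        # generated Python source that bakes the shape into the kernel.
        src = (
            "def kernel(a, b):\n"
            f"    # specialized for M={m}, N={n}, K={k}\n"
            f"    return [[sum(a[i][p] * b[p][j] for p in range({k}))\n"
            f"             for j in range({n})] for i in range({m})]\n"
        )
        ns = {}
        exec(src, ns)  # stands in for invoking the CUDA compiler at runtime
        _kernel_cache[key] = ns["kernel"]
    return _kernel_cache[key]
```

Calling `get_kernel` twice with the same shape returns the same cached kernel, so compilation cost is paid only once per shape.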

Performance-wise, DeepGEMM demonstrates exceptional throughput on NVIDIA H800 GPUs. For a dense GEMM of shape M=64, N=2112, K=7168, it reaches 206 TFLOPS, a 2.7x speedup over an optimized implementation based on CUTLASS 3.6. For MoE grouped GEMMs, DeepGEMM delivers consistent speedups of 1.1x to 1.2x.
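The headline figure is easy to put in perspective: a GEMM of shape M×N×K performs 2·M·N·K floating-point operations (one multiply and one add per term), so the stated shape and throughput imply a per-GEMM runtime of only a few microseconds.

```python
# Back-of-the-envelope check of the reported number: a GEMM does
# 2*M*N*K floating-point operations (one multiply + one add per term).
M, N, K = 64, 2112, 7168          # shape quoted in the article
flops = 2 * M * N * K             # ~1.94e9 FLOPs for this shape
tflops = 206e12                   # reported throughput on H800
runtime_us = flops / tflops * 1e6
print(f"{flops / 1e9:.2f} GFLOP -> {runtime_us:.1f} microseconds per GEMM")
```

At roughly 9 microseconds per call, kernel launch and data movement overheads matter as much as raw math throughput, which is part of why the TMA-based overlapping described below pays off.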

Technologically, DeepGEMM introduces innovative features such as a two-level accumulation mechanism that promotes partial sums to CUDA cores, effectively addressing precision issues in FP8 computation. It also supports unaligned block sizes such as 112, further improving Streaming Multiprocessor (SM) utilization. Additionally, DeepGEMM deeply integrates the Hopper architecture's Tensor Memory Accelerator (TMA), overlapping asynchronous data transfers with computation to boost overall efficiency.
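Why two-level accumulation helps can be demonstrated without a GPU. When many small terms are summed in a low-precision type, the running sum eventually grows so large that each new addend rounds away to nothing. The sketch below uses float16 as a stand-in, since NumPy has neither an FP8 type nor access to tensor-core accumulators; the principle (keep partial sums short, promote them to a wider type) is the same.

```python
import numpy as np

# Illustration of why promoting partial sums matters: accumulating many
# small terms in a low-precision type stalls once the running sum's ULP
# exceeds the addend. float16 stands in for the low-precision accumulator.
vals = np.full(65536, 0.01, dtype=np.float16)   # true sum ~ 655.36

# Naive: keep the accumulator in low precision the whole time.
naive = np.float16(0.0)
for v in vals:
    naive = np.float16(naive + v)               # stalls far below the true sum

# Two-level: short low-precision partial sums, then promote to float32.
partials = vals.reshape(-1, 128).sum(axis=1)    # each chunk sum stays small
promoted = partials.astype(np.float32).sum()    # wide-precision accumulation

print(f"naive float16 sum: {float(naive):.2f}")
print(f"two-level sum:     {float(promoted):.2f}")
```

The naive sum stalls at a small fraction of the true value, while the two-level result lands within a fraction of a percent of it; DeepGEMM applies the same idea to FP8 tensor-core output.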

DeepGEMM is tailored for large models such as DeepSeek-V3/R1, supporting dense GEMMs and MoE grouped GEMMs for both inference and training. Developers can deploy the library quickly in environments with Python 3.8+ and CUDA 12.8+.
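Getting started is correspondingly simple; since kernels are JIT-compiled at first use, there is no lengthy build step. The commands below are illustrative (the repository path under the deepseek-ai organization is assumed from the announcement); consult the project README for the exact installation steps.

```shell
# Illustrative setup -- see the repository README for exact instructions.
# Requires Python 3.8+ and CUDA 12.8+ as noted above.
git clone --recursive https://github.com/deepseek-ai/DeepGEMM.git
cd DeepGEMM
# Install per the README; kernels themselves are JIT-compiled at first use.
```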

DeepGEMM is open-sourced under the MIT license and hosted on GitHub. This initiative not only provides a template for Hopper architecture optimization for AI researchers but also opens up opportunities for community contributors to further enhance matrix computation techniques.

While DeepGEMM may not outperform expert-tuned libraries on every shape, its simple design, strong performance, and innovative optimizations make it an invaluable resource for learning Hopper FP8 matrix multiplication and optimization techniques. The DeepSeek-AI team looks forward to more developers joining the effort to advance matrix computation technology.

Related Information:

DeepSeek Launches Open Source Week, Unveiling Its First Open Source Project – FlashMLA

Day Two of DeepSeek's Open Source Week: Release of MoE Model Communication Library DeepEP