DeepSeek Launches Open-Source Week with Its First Project: FlashMLA

2025-02-24

DeepSeek has officially launched its "Open Source Week" initiative with the release of its first open-source project, FlashMLA. FlashMLA is an efficient Multi-head Latent Attention (MLA) decoding kernel optimized for NVIDIA Hopper-architecture GPUs and designed specifically for serving variable-length sequences.

Inspired by the FlashAttention 2 and 3 and NVIDIA CUTLASS projects, FlashMLA aims to improve memory and computational efficiency through kernel-level optimization. It requires CUDA 12.3 or later and PyTorch 2.0 or later, giving developers a solid technical foundation to build on.
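As a quick sanity check before installing, the following minimal sketch verifies that the local PyTorch build meets these version requirements and that the GPU is Hopper-class. It uses only standard PyTorch introspection and is not part of FlashMLA itself.

```python
# Minimal environment check for FlashMLA's stated requirements:
# PyTorch >= 2.0 and CUDA >= 12.3 on an NVIDIA Hopper GPU.
import torch

torch_ok = tuple(int(x) for x in torch.__version__.split("+")[0].split(".")[:2]) >= (2, 0)
cuda_ver = torch.version.cuda  # e.g. "12.4"; None if this build has no CUDA support
cuda_ok = cuda_ver is not None and tuple(int(x) for x in cuda_ver.split(".")[:2]) >= (12, 3)

print(f"PyTorch {torch.__version__}: {'OK' if torch_ok else 'needs >= 2.0'}")
print(f"CUDA {cuda_ver}: {'OK' if cuda_ok else 'needs >= 12.3'}")

if torch.cuda.is_available():
    # Hopper GPUs (e.g. H100/H800) report compute capability 9.x.
    major, minor = torch.cuda.get_device_capability()
    print(f"GPU compute capability {major}.{minor}: "
          f"{'Hopper-class' if major == 9 else 'not Hopper (sm90) class'}")
```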

Technically, FlashMLA uses the BF16 data format to balance accuracy and performance. It also adopts a paged key-value (KV) cache with a block size of 64, which lets memory be managed at a fine granularity and markedly improves memory utilization when serving large-scale workloads.
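To make the paged KV cache idea concrete, here is a small illustrative sketch of how 64-token blocks and a per-request block table can map logical token positions to physical cache blocks. The pool size, head dimension, and helper names are assumptions made for this illustration, not FlashMLA's internal data structures.

```python
# Illustrative paged KV cache: tokens are stored in fixed-size blocks of 64,
# and a per-request block table maps logical block indices to physical blocks.
import torch

BLOCK_SIZE = 64    # block size used by FlashMLA's paged KV cache
NUM_BLOCKS = 1024  # total physical blocks in the pool (illustrative)
HEAD_DIM = 576     # illustrative per-token KV width

# Physical pool of KV blocks: [num_blocks, block_size, head_dim], BF16 as in FlashMLA.
kv_pool = torch.zeros(NUM_BLOCKS, BLOCK_SIZE, HEAD_DIM, dtype=torch.bfloat16)

free_blocks = list(range(NUM_BLOCKS))

def allocate_block_table(seq_len: int) -> list[int]:
    """Reserve just enough 64-token blocks for a sequence of seq_len tokens."""
    num_needed = (seq_len + BLOCK_SIZE - 1) // BLOCK_SIZE
    return [free_blocks.pop() for _ in range(num_needed)]

def kv_slot(block_table: list[int], token_pos: int) -> tuple[int, int]:
    """Map a logical token position to (physical block, offset within block)."""
    return block_table[token_pos // BLOCK_SIZE], token_pos % BLOCK_SIZE

# A request with 150 cached tokens needs ceil(150 / 64) = 3 blocks.
table = allocate_block_table(150)
block, offset = kv_slot(table, 149)  # last cached token
print(len(table), block, offset)     # -> 3, <some block id>, 21
```

Because each sequence only reserves whole 64-token blocks rather than one contiguous region sized for the worst case, many requests with very different lengths can share the same physical pool with little waste.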

In terms of hardware performance, FlashMLA performs impressively on the NVIDIA H800 SXM5 GPU: in memory-bound configurations it reaches up to 3000 GB/s of memory bandwidth, and in compute-bound configurations it delivers up to 580 TFLOPS. These figures underline FlashMLA's strength on demanding inference workloads.

FlashMLA's core techniques are block scheduling and parallel computation, combined with optimized memory access patterns. By splitting the attention computation into smaller blocks that are processed in parallel, it makes full use of the GPU's parallel compute; by optimizing how memory is accessed, it reduces memory traffic and further improves performance on large-scale data.
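For intuition, the following reference sketch shows block-wise attention with an online softmax, the kind of decomposition described above, written in plain PyTorch. FlashMLA fuses this pattern into optimized Hopper kernels, so this code is purely illustrative.

```python
# Reference (non-fused) tiled attention: process the KV sequence in blocks and
# combine partial results with an online softmax, so the full attention matrix
# is never materialized.
import math
import torch

def tiled_attention(q, k, v, block_size: int = 64):
    # q: [heads, d], a single decode-step query; k, v: [seq, heads, d]
    heads, d = q.shape
    scale = 1.0 / math.sqrt(d)

    acc = torch.zeros(heads, d, dtype=torch.float32)  # weighted-value accumulator
    row_max = torch.full((heads,), float("-inf"))     # running max of logits
    row_sum = torch.zeros(heads)                      # running softmax denominator

    for start in range(0, k.shape[0], block_size):
        k_blk = k[start:start + block_size].float()   # [blk, heads, d]
        v_blk = v[start:start + block_size].float()
        logits = torch.einsum("hd,bhd->hb", q.float(), k_blk) * scale  # [heads, blk]

        new_max = torch.maximum(row_max, logits.max(dim=1).values)
        correction = torch.exp(row_max - new_max)      # rescale earlier partial sums
        p = torch.exp(logits - new_max[:, None])       # unnormalized block probabilities

        row_sum = row_sum * correction + p.sum(dim=1)
        acc = acc * correction[:, None] + torch.einsum("hb,bhd->hd", p, v_blk)
        row_max = new_max

    return (acc / row_sum[:, None]).to(q.dtype)

# Tiny check against a naive reference implementation.
q, k, v = torch.randn(16, 64), torch.randn(300, 16, 64), torch.randn(300, 16, 64)
scores = torch.einsum("hd,shd->hs", q, k) / math.sqrt(64)
ref = torch.einsum("hs,shd->hd", torch.softmax(scores, dim=-1), v)
print(torch.allclose(tiled_attention(q, k, v), ref, atol=1e-4))  # -> True
```

Because each block only carries a running maximum and running sum forward, the kernel never needs to hold the full attention matrix in memory, which is what keeps memory traffic low.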

For developers, installation and deployment are straightforward: running python setup.py install builds and installs FlashMLA, and the bundled benchmark script, python tests/test_flash_mla.py, can be used to verify correctness and measure performance.
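Once installed, the kernel is typically driven from PyTorch. The sketch below follows the usage pattern documented in the project's README (the functions get_mla_metadata and flash_mla_with_kvcache); the tensor shapes and sizes are illustrative assumptions for a single decoding step and should be checked against the repository before use.

```python
# Hedged sketch of calling FlashMLA from PyTorch after installation.
# Function names come from the project's README; the shapes below are
# illustrative assumptions, not a specification.
import torch
from flash_mla import get_mla_metadata, flash_mla_with_kvcache

batch, s_q, h_q, h_kv = 4, 1, 128, 1   # one decode-step query per request (assumed shapes)
d, dv, block_size = 576, 512, 64       # assumed head dims; 64 matches the paged-KV block size

cache_seqlens = torch.full((batch,), 1024, dtype=torch.int32, device="cuda")
q = torch.randn(batch, s_q, h_q, d, dtype=torch.bfloat16, device="cuda")

max_blocks = (int(cache_seqlens.max()) + block_size - 1) // block_size
block_table = torch.arange(batch * max_blocks, dtype=torch.int32,
                           device="cuda").view(batch, max_blocks)
kv_cache = torch.randn(batch * max_blocks, block_size, h_kv, d,
                       dtype=torch.bfloat16, device="cuda")

# Scheduling metadata is computed once per decoding step and reused across layers.
tile_scheduler_metadata, num_splits = get_mla_metadata(
    cache_seqlens, s_q * h_q // h_kv, h_kv)

out, lse = flash_mla_with_kvcache(
    q, kv_cache, block_table, cache_seqlens, dv,
    tile_scheduler_metadata, num_splits, causal=True,
)
print(out.shape)  # expected: [batch, s_q, h_q, dv]
```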