DeepSeek Open Source Week Day 5: Introduction to 3FS Parallel File System

2025-02-28

On the fifth day of its Open Source Week, DeepSeek announced that it was open-sourcing 3FS (Fire-Flyer File System), a parallel file system built for modern SSDs and RDMA networks and aimed at faster data access.

The performance figures DeepSeek reports for 3FS are striking. In a 180-node cluster, it reaches an aggregate read throughput of up to 6.6 TiB/s. On the GraySort benchmark, run on a 25-node cluster, it sustains 3.66 TiB/min. Peak KVCache lookup throughput exceeds 40 GiB/s per client node.
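To put the aggregate figures in per-node terms, the arithmetic can be sketched as below. This is only a rough illustration: it assumes the load is spread evenly across the stated node counts, which the announcement does not claim, and the helper name `per_node_gib_per_s` is made up for this example.

```python
GIB_PER_TIB = 1024  # binary units: 1 TiB = 1024 GiB


def per_node_gib_per_s(aggregate_tib, nodes, seconds=1.0):
    """Average per-node rate in GiB/s when `aggregate_tib` TiB
    is moved in `seconds` seconds by `nodes` nodes (even split assumed)."""
    return aggregate_tib * GIB_PER_TIB / seconds / nodes


# 6.6 TiB/s aggregate read throughput on 180 nodes
read_per_node = per_node_gib_per_s(6.6, nodes=180)

# 3.66 TiB/min on the GraySort benchmark with 25 nodes
sort_per_node = per_node_gib_per_s(3.66, nodes=25, seconds=60)

print(f"read:     {read_per_node:.1f} GiB/s per node")   # 37.5
print(f"GraySort: {sort_per_node:.2f} GiB/s per node")   # 2.50
```

Under that even-split assumption, the read figure works out to roughly 37.5 GiB/s per node, the same order of magnitude as the 40 GiB/s per-client KVCache peak quoted above.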

The file system uses a disaggregated architecture with strong consistency semantics, which suits it to data-intensive applications. Its main strengths are high performance, strong consistency, and ease of use, and together these make it a good fit for AI training and inference workloads.

3FS is used throughout DeepSeek's V3/R1 work, where it supports training data preprocessing, dataset loading, checkpoint saving and reloading, embedding vector search, and KVCache lookups during inference. That range of uses shows it has already been proven in production.

Alongside 3FS, DeepSeek also released Smallpond, an open-source data processing framework built on top of DuckDB and 3FS. Smallpond is lightweight yet high-performance: it scales to petabyte-scale datasets while remaining easy to operate, with no long-running services required.

In summary, the open-sourcing of 3FS provides a new solution for data-heavy applications, while the introduction of Smallpond further enriches the ecosystem surrounding 3FS.
