Alibaba Cloud has open-sourced its advanced video generation model, Wan2.1, which offers strong visual content generation capabilities. The model supports two primary tasks: text-to-video and image-to-video generation. To cater to different needs, Wan2.1 comes in two versions: a professional edition with 14 billion parameters, designed for complex motion generation and physical modeling, and an ultra-fast edition with 1.3 billion parameters, optimized for consumer-grade GPUs with lower memory requirements and well suited to secondary development and academic research.
Technically, Wan2.1 is built on a Causal 3D VAE and a Video Diffusion Transformer. The Causal 3D VAE is designed specifically for video: it compresses spatiotemporal information while enforcing a causality constraint along the time axis, keeping the generated content coherent and logically consistent. The Video Diffusion Transformer combines the strengths of diffusion models and Transformers, generating video by iteratively removing noise while using self-attention to capture long-range dependencies across frames.
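To make the causality constraint concrete, here is a minimal sketch in generic PyTorch (an illustration of the technique, not Wan2.1's actual implementation): a temporally causal 3D convolution pads only the "past" side of the time axis, so the features for frame t never depend on frames after t.

```python
# Illustrative sketch of a temporally causal 3D convolution, the kind of
# building block a Causal 3D VAE rests on. Generic PyTorch, not Wan2.1 code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalConv3d(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, kernel=(3, 3, 3)):
        super().__init__()
        kt, kh, kw = kernel
        self.time_pad = kt - 1                    # pad past frames only
        self.space_pad = (kw // 2, kw // 2, kh // 2, kh // 2)
        self.conv = nn.Conv3d(in_ch, out_ch, kernel)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, time, height, width). F.pad pads the last
        # dims first: (W_left, W_right, H_top, H_bottom, T_front, T_back).
        x = F.pad(x, self.space_pad + (self.time_pad, 0))
        return self.conv(x)

video = torch.randn(1, 3, 16, 64, 64)             # 16 frames at 64x64
print(CausalConv3d(3, 8)(video).shape)            # torch.Size([1, 8, 16, 64, 64])
```

Because all temporal padding sits in front of the clip, early frames can be encoded before later frames exist, which is what makes chunk-by-chunk encoding and decoding of long videos possible.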
For training and inference, Wan2.1 employs several parallelism strategies to accelerate computation. Training combines data parallelism (DP) with Fully Sharded Data Parallel (FSDP), and the diffusion module additionally uses a hybrid of RingAttention and Ulysses sequence parallelism to further improve training efficiency. Inference is accelerated with context parallelism (CP), and the large model is additionally sharded across devices to optimize inference performance.
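As a minimal sketch of the FSDP ingredient (generic PyTorch, not Wan2.1's training code; the model and sizes below are placeholders), wrapping a module in FullyShardedDataParallel shards its parameters, gradients, and optimizer state across ranks:

```python
# Toy FSDP training step; launch with `torchrun --nproc_per_node=N this_file.py`.
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def main():
    dist.init_process_group("nccl")               # ranks come from torchrun
    torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

    layer = nn.TransformerEncoderLayer(d_model=1024, nhead=16, batch_first=True)
    model = nn.TransformerEncoder(layer, num_layers=8).cuda()
    model = FSDP(model)                           # shard params/grads/optimizer state

    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
    x = torch.randn(2, 128, 1024, device="cuda")  # (batch, tokens, dim)
    loss = model(x).pow(2).mean()                 # stand-in for a diffusion loss
    loss.backward()
    opt.step()

if __name__ == "__main__":
    main()
```

RingAttention and Ulysses address a different axis: rather than sharding parameters, they split the very long video token sequence across devices inside the attention computation, which is why they are applied specifically to the diffusion module.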
In practical applications, Wan2.1 is versatile. Beyond the core text-to-video and image-to-video tasks, it also supports video editing, text-to-image generation, and video-to-audio generation. The model additionally offers visual-effects and text-rendering capabilities, covering a wide range of creative scenarios.
In terms of performance, Wan2.1 posts strong results on the authoritative VBench benchmark. The 14-billion-parameter professional edition scores 86.22%, significantly outperforming other models at home and abroad such as Sora, Luma, and Pika. The ultra-fast edition can generate 480P video with as little as 8.2 GB of GPU memory, making it compatible with almost all consumer-grade GPUs while remaining highly efficient.
Notably, Wan2.1 is open-sourced under the Apache 2.0 license and supports multiple mainstream frameworks. Code and weights are available on GitHub, HuggingFace, and ModelScope, giving developers a convenient path to use and deploy the model. The release aims to promote further advancement and application of video generation technology.
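As an example of how lightweight the entry path is, the snippet below sketches a text-to-video run through HuggingFace diffusers. The class names (WanPipeline, AutoencoderKLWan) and the checkpoint id (Wan-AI/Wan2.1-T2V-1.3B-Diffusers) reflect the published integration and are assumptions that may shift between releases:

```python
# Hedged quickstart: a short 480P clip from the 1.3B text-to-video model via
# HuggingFace diffusers. Class and checkpoint names are assumptions based on
# the published integration and may change between releases.
import torch
from diffusers import AutoencoderKLWan, WanPipeline
from diffusers.utils import export_to_video

model_id = "Wan-AI/Wan2.1-T2V-1.3B-Diffusers"
# The VAE is typically kept in float32 for stability; the DiT runs in bfloat16.
vae = AutoencoderKLWan.from_pretrained(model_id, subfolder="vae", torch_dtype=torch.float32)
pipe = WanPipeline.from_pretrained(model_id, vae=vae, torch_dtype=torch.bfloat16)
pipe.to("cuda")

frames = pipe(
    prompt="A cat walks on the grass, realistic style.",
    height=480,
    width=832,
    num_frames=81,
    guidance_scale=5.0,
).frames[0]
export_to_video(frames, "output.mp4", fps=15)
```

On GPUs with less memory, replacing pipe.to("cuda") with pipe.enable_model_cpu_offload() trades some speed for a much smaller memory footprint.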