Stability AI Unveils SVD 1.1 Upgrade Enhancing AI Video Generation Capabilities

2024-02-07

Stability AI, known for its growing array of open-source AI models for content creation and coding, has announced an upgrade to its image-to-video diffusion model, Stable Video Diffusion (SVD).

The upgraded model, called SVD 1.1, is a refined version of SVD 1.0, optimized to generate short AI videos with better motion and higher consistency.

In a post announcing the upgrade, Stability AI's Chief Technology Officer, Tom Mason, confirmed that the new model is available to the public and can be downloaded via Hugging Face.

He also noted that the model will be offered as part of the Stability membership subscription, which has tiers for individual and enterprise users ranging from free to $20 per month and above. Users who want to deploy SVD 1.1 for commercial purposes will need to join the membership.

What to expect from Stability AI's SVD 1.1?

Back in November 2023, Stability introduced two models for AI video: SVD and SVD-XT. The former is the base model, which generates a four-second video of up to 14 frames using a still image as the reference frame. The latter is a fine-tuned version that works the same way but can generate up to 25 frames.

Now, after further fine-tuning SVD-XT, Stability has released SVD 1.1. The company claims that this model can also generate a four-second video with 25 frames, at a resolution of 1024×576 and given a context frame of the same size.

More importantly, this upgrade is expected to provide more consistent video output compared to the original model.

For example, in many cases SVD and SVD-XT fail to deliver realistic results: they produce videos with no motion or only slow camera panning, and they cannot reliably generate faces and characters as intended. SVD 1.1 is expected to address these issues and promises better motion in the output.

"Fine-tuning (for SVD 1.1) was performed under fixed conditions of 6FPS and motion bucket Id 127 to improve the consistency of the output without adjusting hyperparameters. These conditions can still be adjusted and have not been removed. Performance beyond the fixed tuning settings may vary compared to SVD 1.0," as stated on the new model's Hugging Face page.
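
As a rough illustration of the conditions quoted above, the sketch below collects them as keyword arguments for the `StableVideoDiffusionPipeline` class in Hugging Face's `diffusers` library. It is a minimal sketch, not Stability AI's reference code: the repository id, the `generate_clip` helper, and the choice of `decode_chunk_size=8` are assumptions made for illustration, and running the helper requires a CUDA GPU and access to the model weights.

```python
# Settings taken from the article: fine-tuning used 6 FPS and motion bucket id 127,
# and the model targets 25 frames at 1024x576.
SVD11_TUNED_SETTINGS = {
    "fps": 6,
    "motion_bucket_id": 127,
    "num_frames": 25,
    "width": 1024,
    "height": 576,
}

# Assumed Hugging Face repository id; confirm it on the model page before use.
ASSUMED_MODEL_ID = "stabilityai/stable-video-diffusion-img2vid-xt-1-1"


def clip_seconds(num_frames: int, fps: int) -> float:
    """Playback length when every generated frame is shown exactly once."""
    return num_frames / fps


def generate_clip(image_path: str, output_path: str = "svd11.mp4") -> str:
    """Hypothetical helper: image-to-video with the tuned settings (needs a GPU)."""
    # Imported lazily so the constants above work without these packages installed.
    import torch
    from diffusers import StableVideoDiffusionPipeline
    from diffusers.utils import export_to_video, load_image

    pipe = StableVideoDiffusionPipeline.from_pretrained(
        ASSUMED_MODEL_ID, torch_dtype=torch.float16
    ).to("cuda")

    # The conditioning image should match the output resolution.
    image = load_image(image_path).resize(
        (SVD11_TUNED_SETTINGS["width"], SVD11_TUNED_SETTINGS["height"])
    )

    # decode_chunk_size=8 is an arbitrary memory/speed trade-off, not a quoted value.
    frames = pipe(image, decode_chunk_size=8, **SVD11_TUNED_SETTINGS).frames[0]
    export_to_video(frames, output_path, fps=SVD11_TUNED_SETTINGS["fps"])
    return output_path
```

With these values, `clip_seconds(25, 6)` comes to a little over four seconds, which lines up with the "four-second video with 25 frames" description. Changing `fps` or `motion_bucket_id` is still possible, but as the model page warns, results outside the tuned settings may vary.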


Real-world results remain to be seen

Although Stability claims that SVD 1.1 performs better, its real-world results have yet to be seen. The model's Hugging Face page indicates that this model is for research purposes only and cautions that some existing issues may still occur.

It is worth noting that, in addition to Hugging Face, the Stable Video Diffusion model can also be used through the API on Stability AI's developer platform. This provides developers with a convenient way to seamlessly integrate advanced video generation into their products.

"...We have released the Stable Video Diffusion API, which can generate 4-second videos in MP4 format, including 25 generated frames and the remaining interpolated frames, at a frame rate of 24fps. We support motion intensity control as well as various layouts and resolutions, including 1024×576, 768×768, and 576×1024," Mason stated in his post.
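
The numbers in that statement imply how many frames the API has to interpolate. The short sketch below works through the arithmetic, assuming the clip is exactly four seconds long; the function and constant names are hypothetical and are not part of Stability AI's API.

```python
# Figures quoted in the article for the Stable Video Diffusion API.
API_CLIP_SECONDS = 4        # assumed to be exactly 4.0 seconds
API_PLAYBACK_FPS = 24
API_GENERATED_FRAMES = 25
API_RESOLUTIONS = [(1024, 576), (768, 768), (576, 1024)]


def frame_budget(seconds: int, fps: int, generated: int) -> dict:
    """Split the total frame count into generated and interpolated frames."""
    total = seconds * fps
    return {
        "total": total,
        "generated": generated,
        "interpolated": total - generated,
    }


def orientation(width: int, height: int) -> str:
    """Label a resolution by its layout."""
    if width > height:
        return "landscape"
    if width < height:
        return "portrait"
    return "square"


if __name__ == "__main__":
    print(frame_budget(API_CLIP_SECONDS, API_PLAYBACK_FPS, API_GENERATED_FRAMES))
    for width, height in API_RESOLUTIONS:
        print(f"{width}x{height}: {orientation(width, height)}")
```

Under that assumption, a four-second clip at 24fps contains 96 frames, so 71 of them are interpolated between the 25 frames the model actually generates. The three supported resolutions correspond to landscape, square, and portrait layouts.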

Last year, Stability AI drove the development of generative AI through frequent model releases. It seems that 2024 is following the same path. Founded in 2019, the company has raised a significant amount of funding, including the $101 million announced in 2022. However, it is not the only company operating in this field. Competitors like Runway and Pika have also gained attention for their customer-facing web platforms, which not only generate videos but also offer options for easy customization and upgrades.

Recently, Runway introduced the Multi Motion Brush feature on its platform, allowing users to add motion to specific parts of AI videos. Pika, another AI video generation company, lets users modify specific areas in videos, such as replacing a cow's face with a duck's face. However, neither platform yet offers its models through an API, so developers cannot integrate them into their own applications.