Stability AI Unveils Stable Audio 2.0: A New Frontier in AI-Generated Music Creation

2024-04-04

Stability AI has released Stable Audio 2.0, the latest version of its AI model for generating music and sound effects. The release adds several features aimed at letting artists and musicians produce high-quality, complete tracks with more flexibility than before.

The most significant advance is length and structure: Stable Audio 2.0 can generate songs up to three minutes long, as fully structured compositions with an introduction, development, and conclusion, along with stereo sound effects. According to Stability AI, this sets it apart from other state-of-the-art models, because it can produce coherent musical structures that closely resemble those composed by humans.

In addition to text-to-audio generation, Stable Audio 2.0 now supports audio-to-audio generation. Users can upload their own audio samples and transform them with natural language prompts. The feature also lets users customize the theme of the output so that it matches the style and tone of their projects.

The new model also improves the generation of sound effects, such as typing on a keyboard, crowd noise, or the buzz of a city street, which can be used to enrich audio projects and create more immersive experiences.

To achieve this, the underlying latent diffusion model is designed specifically to generate complete tracks with coherent structure. The architecture uses a new, highly compressed autoencoder that compresses the raw audio waveform into a much shorter representation. The diffusion model itself is a diffusion transformer (DiT), similar to the one used in Stable Diffusion 3, chosen because it handles long sequences well.

Stability AI also emphasizes protecting creators' rights and ensuring fair compensation. The model was trained exclusively on a licensed dataset from the AudioSparx music library, which contains over 800,000 audio files.
All AudioSparx artists were given the option to opt out of Stable Audio model training. In addition, to protect the copyright of uploaded audio, Stability AI works with Audible Magic, using its automatic content recognition (ACR) technology to match content in real time and prevent copyright infringement.

Stable Audio 2.0 is free to use on the Stable Audio website and will soon be available through the Stable Audio API. Stability AI says it will publish a research paper with additional technical details about the model.
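
To give a sense of why the compressed representation matters for three-minute tracks, here is a rough back-of-the-envelope sketch in Python. It is not Stability AI's implementation. The three-minute duration comes from the article; the 44.1 kHz sample rate and the 2048x downsampling factor are illustrative assumptions, and `latent_length` is a hypothetical helper introduced only for this example.

```python
import math


def latent_length(duration_s: float, sample_rate_hz: int, downsample_factor: int) -> int:
    """Return how many latent frames a diffusion model would process
    if an autoencoder shortened the waveform by `downsample_factor`."""
    n_samples = int(round(duration_s * sample_rate_hz))
    return math.ceil(n_samples / downsample_factor)


DURATION_S = 180         # three minutes, as stated in the article
SAMPLE_RATE_HZ = 44_100  # assumption: CD-quality sample rate
DOWNSAMPLE = 2048        # assumption: illustrative compression factor only

raw_samples = DURATION_S * SAMPLE_RATE_HZ
latent_frames = latent_length(DURATION_S, SAMPLE_RATE_HZ, DOWNSAMPLE)

print(f"raw samples per channel: {raw_samples}")
print(f"latent frames:           {latent_frames}")
```

Under these assumed numbers, the transformer would attend over a few thousand latent frames rather than several million raw samples per channel, which is what makes generating a whole track in a single pass plausible.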