Stability AI Releases Open Audio Model: Stable Audio Open

2024-06-06

Stability AI is bringing its generative AI technology for audio to a wider public with the release of Stable Audio Open 1.0.


While Stability AI is best known for Stable Diffusion, its text-to-image generation technology, that is only one part of the company's extensive product line. It has also launched models for code, text, and audio. In September 2023, Stability AI first publicly introduced Stable Audio, a generative AI tool for text-to-audio. Stable Audio 2.0 followed on April 3, 2024, improving the clarity and length of the generated audio.

Although the full Stable Audio tool is commercially available and can generate audio up to 3 minutes long, the newly launched Stable Audio Open is considerably more limited. It is not intended for creating full songs and instead focuses on shorter audio clips, such as sound effects.

As the name suggests, Stable Audio Open is an open model, although it is not open source in the traditional sense. Stable Audio Open does not adopt a license approved by the Open Source Initiative (OSI) but is made available to users under Stability AI's non-commercial research community agreement. This agreement allows users to access the model but restricts its usage.

Zach Evans, the Director of Audio Research at Stability AI, stated in an interview with VentureBeat, "The purpose of releasing Stable Audio Open is to provide audio researchers and producers with practical opportunities to explore, adopt, and creatively utilize our generative audio models, accelerating the research and practical application of these incredible new tools."

So, what exactly is Stable Audio Open?

Stable Audio Open is a model optimized for creating audio samples such as drum beats, instrument riffs, and ambient sounds for use in music production and sound design.

Unlike Stability AI's commercial product Stable Audio, which can generate coherent music tracks up to three minutes long, Stable Audio Open focuses on generating high-quality audio data up to 47 seconds in length based on text prompts.
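To make the 47-second limit and the text-prompt input concrete, here is a minimal sketch of how a request might be validated before it is handed to a model. The helper build_conditioning and the constant MAX_SECONDS are hypothetical names introduced for illustration. The field names seconds_start and seconds_total mirror the conditioning format shown in the model's published usage example, and should be treated as an assumption here rather than a definitive description of the API.

```python
# Hypothetical helper for illustration; not part of any Stability AI library.
MAX_SECONDS = 47  # upper limit on clip length stated in the article


def build_conditioning(prompt: str, seconds_total: float) -> list:
    """Return a one-item conditioning list for a text-to-audio request.

    The dictionary keys are an assumption modeled on the model's
    published usage example.
    """
    cleaned = prompt.strip()
    if not cleaned:
        raise ValueError("prompt must not be empty")
    if not 0 < seconds_total <= MAX_SECONDS:
        raise ValueError(f"seconds_total must be in (0, {MAX_SECONDS}]")
    return [
        {
            "prompt": cleaned,
            "seconds_start": 0,
            "seconds_total": seconds_total,
        }
    ]


if __name__ == "__main__":
    # A short drum loop request, well under the 47-second limit.
    print(build_conditioning("128 BPM tech house drum loop", 30))
```

A request for a clip longer than 47 seconds raises a ValueError in this sketch, reflecting the difference between the open model and the commercial product, which can produce tracks of up to three minutes.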

Stability AI has taken a responsible approach in training this model. The model was trained on audio data from FreeSound and Free Music Archive, ensuring that no unauthorized copyrighted or proprietary materials were used.

One key advantage of the release of Stable Audio Open is that users can fine-tune the model on their own custom audio data. For example, a drummer can use their own drum recording samples to fine-tune the model and generate entirely new and unique rhythms.

Fine-tuning of Stable Audio Open is supported through the Stable Audio Tools library, which is released under a genuinely open-source license. The weights of the Stable Audio Open model are now available on the Hugging Face platform.

The audio research team at Stability AI has been working to improve the quality and controllability of its generative audio models. Evans stated, "We look forward to further releasing commercial and open models in the future to showcase the latest advancements in our research."