Meta AI Launches Audiobox: A New AI Audio Model Driven by Voice and Text Prompts

2023-12-01

Meta AI has launched Audiobox, a new research model for audio generation. It can create custom voices, sound effects, and soundscapes from a combination of voice inputs and natural-language text prompts.

Audiobox builds on Meta's earlier voice generation model, Voicebox, and improves both the controllability and the quality of the generated audio. According to Meta, it outperforms previous systems at generating sounds and sound effects that match the style and acoustic environment described in a text prompt.

What sets Audiobox apart is its ability to accept both voice recordings and natural language text as input. This dual input method provides finer control over audio generation.

For example, a user can provide a voice sample together with a text prompt such as "speak slowly in a large cave" to change the pace of the speech or its acoustic environment. The voice input preserves the speaker's vocal characteristics, while the text prompt controls the other attributes.

Meta developed Audiobox to make audio production more accessible. The model lowers the barrier to creating custom sounds, voices, and soundscapes for podcasts, videos, games, and other media projects, so that even beginners can generate high-quality audio elements without extensive professional expertise.

As with any impactful AI technology, responsible development is crucial. Meta is granting access to Audiobox selectively, to researchers with a track record in speech research and responsible AI. To help prevent misuse, the company has also built audio watermarking and authentication safeguards into the model.

Earlier today, Alibaba Cloud also open-sourced its Qwen-Audio model. Where Audiobox focuses on generating audio, Qwen-Audio is a multimodal foundation model for audio understanding: it accepts diverse types of audio together with text and has achieved strong results on a range of audio understanding benchmarks.

Meta's emphasis on controllability with Audiobox and Alibaba's focus on versatility with Qwen-Audio show that audio AI is advancing on several fronts, with growing attention to responsible release. As more researchers gain access to these technologies, further gains in capability, versatility, and quality seem likely.