Meta AI Launches Audiobox: A New AI Audio Model Supporting Voice Text Generation

2023-12-01

Meta AI has launched Audiobox, a new research model for generating audio. It allows for the creation of custom voices, sound effects, and soundscapes using voice and text prompts.

Audiobox builds on Voicebox, Meta's previous voice generation model, and significantly improves the controllability and quality of generated audio. The model outperforms previous systems at generating sounds and sound effects that accurately match the style and environment described in a text prompt.

What sets Audiobox apart is its ability to accept both voice recordings and natural language text as input. This dual input method provides finer control over audio generation.

For example, users can input a voice sample and add a text prompt like "speak slowly in a large cave" to change the pace or environment of the voice. The voice input preserves the speaker's unique sound characteristics, while the text modifies other parameters.

Meta developed Audiobox to make audio production more accessible. The model reduces the difficulty of creating custom sounds, voices, and soundscapes for podcasts, videos, games, and other media projects. Even beginners can easily generate high-quality audio elements to enhance their projects without extensive professional knowledge.

As with any impactful AI technology, however, responsible development is crucial. Meta has granted access to Audiobox selectively, to researchers with a good track record in speech research and responsible AI. To prevent misuse, the company has also built audio watermarking and voice authentication safeguards into the model.

Earlier today, Alibaba Cloud also fully open-sourced its Qwen-Audio model. Like Audiobox, this multimodal base model can handle many types of audio data alongside text, and it achieves strong results on a range of sound understanding benchmarks.

With Meta emphasizing controllability in Audiobox and Alibaba focusing on versatility in Qwen-Audio, responsible and open innovation in audio AI is clearly progressing. As more researchers gain access to these technologies, we are likely to see further gains in capability, versatility, and quality in this field.
