Sound is undoubtedly a core element of high-quality video. Despite the realistic visuals achieved by tools like Google's Veo, OpenAI's Sora, and Runway's Gen-3 Alpha, their clips are silent, and the results often feel flat and lifeless. To make up for this shortcoming, Google DeepMind's latest AI model attempts to bring videos to life by generating synchronized soundtracks. It is a genuinely impressive piece of technology.
Google's V2A (Video-to-Audio) technology combines video pixels with optional text prompts to create audio that closely matches the visual content. It can generate not only music and sound effects but also dialogue that matches the action on screen.
V2A adopts a diffusion-based approach to generate realistic audio. The system first encodes the video input into a compressed representation, then iteratively refines audio from random noise, guided by the visual encoding and any optional text prompts. The resulting audio is decoded into a waveform and combined with the video.
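To make the pipeline concrete, here is a minimal PyTorch sketch of that generation loop: encode the video, start from random noise, and iteratively denoise conditioned on the visual features. All module names, shapes, and the crude denoising schedule are illustrative assumptions, not DeepMind's actual implementation.

```python
# Toy sketch of a V2A-style diffusion pipeline. Everything here is a
# simplified assumption for illustration, not DeepMind's code.
import torch
import torch.nn as nn

class VideoEncoder(nn.Module):
    """Compresses raw video frames into a conditioning representation."""
    def __init__(self, dim=512):
        super().__init__()
        self.proj = nn.Linear(3 * 64 * 64, dim)   # toy per-frame encoder

    def forward(self, frames):                    # frames: (T, 3, 64, 64)
        return self.proj(frames.flatten(1))       # -> (T, dim)

class AudioDenoiser(nn.Module):
    """Predicts the noise in a noisy audio latent, conditioned on video
    features (the real system can also condition on text prompts)."""
    def __init__(self, audio_dim=256, cond_dim=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(audio_dim + cond_dim, 1024), nn.SiLU(),
            nn.Linear(1024, audio_dim),
        )

    def forward(self, noisy_audio, video_cond):
        # Pool video features over time and attach them to every audio frame.
        cond = video_cond.mean(0, keepdim=True).expand(noisy_audio.size(0), -1)
        return self.net(torch.cat([noisy_audio, cond], dim=-1))

@torch.no_grad()
def generate_audio(frames, steps=50):
    """Iteratively refine random noise into an audio latent,
    guided by the encoded video (simplified DDPM-style loop)."""
    video_cond = VideoEncoder()(frames)
    audio = torch.randn(100, 256)                 # start from pure noise
    denoiser = AudioDenoiser()
    for _ in range(steps):
        noise_pred = denoiser(audio, video_cond)
        audio = audio - noise_pred / steps        # crude denoising step
    return audio                                  # a real system decodes this to a waveform
```

Calling `generate_audio(torch.randn(16, 3, 64, 64))` on a 16-frame clip yields an audio latent; in the actual system, a learned decoder then turns that latent into the final waveform that is muxed with the video.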
To improve audio quality and enable more precise sound generation, DeepMind trained the model on additional data, including AI-generated sound annotations and transcripts of spoken dialogue. This lets V2A associate specific audio events with the corresponding visual scenes while responding to the provided annotations or transcripts.
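The training side can be sketched in the same spirit: the denoiser sees noisy audio together with video features and an embedding of the annotation or transcript, and learns to predict the injected noise. This is a hedged toy sketch under the standard denoising objective; DeepMind has not published its loss or noise schedule, and every name here is hypothetical.

```python
# Hypothetical denoising-training objective with annotation conditioning.
import torch
import torch.nn.functional as F

def diffusion_training_loss(denoiser, clean_audio, video_cond, annotation_emb):
    """Corrupt the target audio with noise and train the model to
    recover that noise from the combined conditioning signals."""
    noise = torch.randn_like(clean_audio)
    t = torch.rand(clean_audio.size(0), 1)            # random noise level per sample
    noisy_audio = (1 - t) * clean_audio + t * noise   # toy linear noising schedule
    cond = torch.cat([video_cond, annotation_emb], dim=-1)
    pred_noise = denoiser(noisy_audio, cond)          # hypothetical denoiser signature
    return F.mse_loss(pred_noise, noise)
```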
However, V2A has its limitations. The generated audio depends heavily on the input video: artifacts or distortions in the footage directly degrade the sound. Lip synchronization for speech also has room for improvement, since the paired video generation model may not align mouth movements with the transcript.
Other generative AI tools are also working on this problem. Earlier this year, Pika Labs launched a similar feature called "Sound Effects," and Eleven Labs recently introduced its own Sound Effects Generator.
According to Google, what sets V2A apart is that it understands raw video pixels directly, which eliminates the tedious step of manually aligning generated sound with the visuals. Paired with video generation models like Veo, V2A can create a coherent audiovisual experience, making it promising for entertainment and virtual reality applications.
Google has been cautious about releasing video AI tools. To the disappointment of AI content creators, it does not plan to release V2A publicly for now; instead, the company is focused on addressing the existing limitations and ensuring a positive impact on the creative community. As with its other generative models, V2A's output will carry a SynthID watermark to guard against misuse.