Google Plans to Merge Gemini and Veo AI Models, Says DeepMind CEO Demis Hassabis

2025-04-11

In a recent appearance on a podcast co-hosted by LinkedIn co-founder Reid Hoffman, Google DeepMind CEO Demis Hassabis said that Google plans to eventually combine its Gemini AI model with its Veo video generation model. The goal is to improve Gemini's understanding of the physical world.

"From the very beginning, we designed Gemini, our foundational model, to be multimodal," Hassabis explained. "This decision was driven by our vision of creating a universal digital assistant—one that can genuinely assist you in real-world scenarios."

The AI industry is steadily moving toward "all-in-one" models that can understand and generate many forms of media. For instance, Google's latest Gemini model can generate not only text but also audio and images. Similarly, the default model in OpenAI's ChatGPT natively supports image creation, including artwork styled after Studio Ghibli. Meanwhile, Amazon has announced plans to launch an "any-to-any" model later this year.

These all-in-one models require massive amounts of training data, including images, videos, audio, text, and more. Hassabis hinted that much of Veo’s video data comes from YouTube, a platform owned by Google.

"Essentially, by analyzing vast amounts of YouTube videos, [Veo 2] can learn the rules of the physical world," Hassabis stated.

Google previously told TechCrunch that its models "might" be trained on "some" YouTube content under agreements with creators. Google also reportedly broadened parts of its terms of service last year to let the company collect more data for training its AI systems.