Challenging Sora: Google Releases New Video Generation Model Veo

2024-05-15

At its annual I/O developer conference today, Google officially unveiled Veo, a cutting-edge generative AI video model. Developed by Google's DeepMind AI division, Veo is regarded as a significant milestone, with realism and visual quality that can match or even surpass competing AI video generators.

Veo has attracted widespread industry attention for its video generation capabilities. The model can generate high-quality 1080p video clips longer than 60 seconds, in a range of cinematic styles from realism to surrealism and animation. Google said that Veo is intended to make video production more accessible, allowing experienced filmmakers, aspiring creators, and educators sharing knowledge to unlock new creative possibilities.

To showcase Veo's capabilities, Google collaborated with artist Donald Glover (also known as Childish Gambino), who tested the model through his creative studio, Gilga. DeepMind also released a series of videos and teasers on its official YouTube channel and social media accounts, including a realistic neon-lit city and jellyfish swimming in the ocean. These videos were generated from simple text prompts and are almost indistinguishable from live-action footage or skillfully made computer-generated animation.

According to Eli Collins, Vice President of Product Management at Google, and Douglas Eck, Senior Research Director, Veo provides an unprecedented level of creative control. It understands cinematic terms such as "time-lapse" or "aerial shots of a landscape." Veo can also quickly perform high-quality edits on AI-generated videos or user-uploaded clips using only text prompts. For example, a user can enter the command "add kayaks to an aerial shot of the coastline," and Veo will apply it to the initial video and generate a new, edited version.

Veo also achieves high consistency between video frames, avoiding the unstable transitions and ghosting common in competing models. Google attributes this to the "cutting-edge latent diffusion transformer" technology that Veo relies on, which reduces inconsistencies and keeps characters, objects, and styles stable from frame to frame, bringing generated videos closer to real-life footage.

To further improve Veo's performance and efficiency, Google added more detail to the caption of each video in its training data and used high-quality compressed video representations (also known as latent representations). These improvements raise overall quality and shorten the time needed to generate a video. In addition, all videos generated by Veo will be embedded with a SynthID watermark so that they can be identified as AI-generated.

Google said that Veo is the culmination of years of research at DeepMind, building on a series of earlier work including Generative Query Network (GQN), DVD-GAN, Imagen-Video, Phenaki, WALT, VideoPoet, and Lumiere.

Veo is not yet available to the public, but Google plans to offer private previews to selected creators through a waitlist. Google also plans to bring some of Veo's features to YouTube Shorts and other products in the future.

With Veo's release, the industry is watching closely to see how generative AI will be applied to video production. The technology has the potential to give creators more convenient and efficient tools and to drive the continued development of the video production industry.