Google is set to launch its next-generation text-to-image foundation model on its Vertex AI platform. Imagen 3 will be available in preview to select customers, offering developers faster image generation, better prompt understanding, more realistic generation of people, and finer control over text rendering within images than the previous generation.
Imagen 3 debuted at the Google I/O conference in May this year, initially available only in private preview to selected creators on the ImageFX platform. Google had promised, however, that the model would eventually come to Vertex AI.
"This is the most powerful image generation model we have developed so far," said Douglas Eck, Senior Research Director at Google DeepMind. "Imagen 3 generates more realistic images with richer details and fewer visual defects or distortions. It excels at understanding prompts written by people—the more creative and detailed your prompts are, the better the generated results. Moreover, Imagen 3 can remember small details in longer prompts... Furthermore, it achieves the best performance in rendering text, which has always been a major challenge for image generation models."
With its launch on Vertex AI, Imagen 3 supports multiple languages, safety features (such as Google DeepMind's SynthID digital watermarking), and multiple aspect ratios.
Shutterstock, a provider of stock images, is one of the first companies to adopt this model. "Since we added Imagen to our AI image generator, our users have generated millions of images using this model," commented Justin Hiza, Vice President of Data Services at the company. "We are excited about the improvements in Imagen 3 because it allows our users to realize their ideas faster without sacrificing image quality. As a significant improvement to Shutterstock's first ethically sourced AI image generator, we also appreciate its built-in security and the indemnification provided by Google Cloud for generative AI."
Although Google continues to develop Imagen, it has not said when it will allow its Gemini AI to resume generating images following a significant backlash over "inaccuracies." When asked about this at a press conference, Thomas Kurian, CEO of Google Cloud, pointed out that Imagen and Gemini are two different types of models: "Gemini is a multimodal model, which means you can input various types of data modes to it, and it can reason on that basis... allowing you to reason across images, videos, and audio... This is different from what we do with Imagen. Imagen is a diffusion model specifically designed for generating text-to-image with ultra-high fidelity... Imagen is not a substitute for the image capabilities in Gemini. These two technologies serve different purposes."