Google Gemini 1.5 Pro Update Introduces "Audio Sensory" Features

2024-04-10

At its Next event, Google announced a significant update to Gemini 1.5 Pro that gives the model "auditory" capabilities. The model can now listen to uploaded audio files and extract key information from sources such as earnings calls or the audio tracks of videos, without relying on a written transcript.

At the same time, Google is making Gemini 1.5 Pro available to the public for the first time through Vertex AI, its platform for building AI applications. The model was first announced in February as the mid-range member of the Gemini family, yet it outperforms Gemini Ultra, the largest and most powerful model in the series. Google claims that Gemini 1.5 Pro can understand complex instructions and eliminates the need for model fine-tuning.

For now, Gemini 1.5 Pro is available only to users with access to Vertex AI or AI Studio. Most people encounter the Gemini language models through the Gemini chatbot. Gemini Ultra, which powers the Gemini Advanced chatbot, is powerful and can understand long instructions, but it is slightly slower than Gemini 1.5 Pro.

In addition to Gemini 1.5 Pro, another of Google's large AI models, the text-to-image model Imagen 2, has also been updated. The update, which strengthens Gemini's image generation capabilities, adds inpainting and outpainting (image restoration and image expansion), which let users add elements to or remove them from images. Google is also applying its SynthID digital watermarking feature to all images created with Imagen models, embedding an invisible watermark that marks their source.

Imagen's new features, image restoration and image expansion in particular, are already offered by other text-to-image models such as Stability AI's Stable Cascade and Getty's iStock Generative AI. They have also become widely available to consumers on the new Samsung Galaxy phones.

Google also said it is publicly previewing a way to ground AI responses in Google Search, so that the model can draw on up-to-date information when answering questions. Responses generated by large language models are not always accurate or current, and in some cases the limits are intentional: Google has deliberately kept Gemini from answering questions related to the 2024 US election.

Gemini was recently criticized for generating historically inaccurate images of people. The incident sparked discussion about how accurately AI models handle history and culture, and it is a reminder to use such technology with caution.