Google Gemini 1.5 Pro Update Introduces "Audio Sensory" Features

2024-04-10

At its Next event, Google announced an important update to Gemini 1.5 Pro that gives the model "auditory" capabilities. The model can now listen to uploaded audio files and extract key information from sources such as earnings calls or the audio track of a video, without relying on a written transcript.

At the same time, Google made Gemini 1.5 Pro available to the public for the first time through Vertex AI, its platform for building AI applications. The model was first released in February as the mid-range member of the Gemini series, yet it outperforms Gemini Ultra, the largest and most powerful model in the series. Google claims that Gemini 1.5 Pro can understand complex instructions and eliminates the need for model fine-tuning.

For now, Gemini 1.5 Pro is not available to users without access to Vertex AI or AI Studio. Most people currently reach the Gemini language models through the Gemini chatbot. Gemini Ultra powers the Gemini Advanced chatbot; it is powerful and can understand long instructions, but it is slightly slower than Gemini 1.5 Pro.

In addition to Gemini 1.5 Pro, another of Google's large AI models, the text-to-image model Imagen 2, has also been updated. Imagen 2 helps power Gemini's image generation capabilities, and the update adds inpainting and outpainting, which let users easily add or remove elements from images. Google also applies its SynthID digital watermarking feature to all images created through the Imagen models, embedding an invisible watermark that marks their source.

Imagen's new features, especially inpainting and outpainting, are already available in other text-to-image models such as Stability AI's Stable Cascade and Getty's iStock Generative AI. These features have also become widely available to consumers on the new Samsung Galaxy phones.

Google also said it is publicly previewing a way to ground AI responses in Google Search, so that the model can draw on the latest information when answering questions. Responses generated by large language models are not always accurate or up to date, and some limits are intentional: Google has deliberately kept Gemini from answering questions related to the 2024 US election.

Gemini has recently faced criticism for generating historically inaccurate images of people. The incident has sparked discussion about how accurately AI models handle history and culture, and it is a reminder to use such technology with caution.
