Why "Multimodal Artificial Intelligence" is the Hottest Technology Today

2024-05-16

This week, OpenAI and Google showed off their latest and most advanced artificial intelligence technology. For the past two years, tech companies have been racing to make AI models smarter; now there is a new focus: making them multimodal. Both companies are concentrating on AI that can switch seamlessly among its machine mouth, eyes, and ears.

"Multimodal" has become the hottest term for tech companies betting on the most appealing form of AI models in everyday life. Since the launch of ChatGPT in 2022, AI chatbots have lost their shine. Therefore, companies hope to have more natural conversations and visual sharing with AI assistants instead of typing. When you see multimodal AI performing well, it feels like science fiction coming true.

On Monday, OpenAI showed off GPT-4 Omni, which evokes "Her," the dystopian film about human disconnection. OpenAI says the model can process video and audio at the same time. In a demonstration, an OpenAI employee pointed a smartphone camera at a math problem and verbally asked ChatGPT to walk through it. OpenAI said the model is now rolling out to paying users.

The next day, Google previewed Project Astra, which promises similar capabilities. Compared with GPT-4 Omni, though, Project Astra seemed a bit slower, and its voice was flatter and more robotic, closer to Siri than to "Her." Google said the project is still at an early stage and even pointed out some current challenges that OpenAI appears to have already overcome.

In a blog post, Google wrote: "While we've made incredible progress developing AI systems that can understand multimodal information, getting response time down to something conversational is a difficult engineering challenge."

You may remember the Gemini demo video Google released in December 2023, which turned out to be heavily edited. Six months later, Google is still not ready to ship what that video showed, while OpenAI is pressing ahead with GPT-4o. Multimodal AI is the next big race in AI development, and OpenAI appears to be winning it.

A key difference with GPT-4o is that a single AI model handles audio, video, and text natively. Previously, OpenAI needed separate models to translate speech and images into text so that the language-based GPT-4 could understand those media. Given Astra's slower response times, Google may still be chaining multiple AI models together to perform these tasks.
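
To make that architectural difference concrete, here is a rough, purely illustrative Python sketch. None of these function names correspond to a real OpenAI or Google API; they are placeholder stubs standing in for the kinds of components each approach would need.

```python
# Hypothetical sketch contrasting the two architectures described above.
# Every function here is a placeholder, not a real OpenAI or Google API.

def transcribe_audio(audio: bytes) -> str:
    """Placeholder speech-to-text component (a separate ASR model)."""
    return "Can you explain this math problem?"

def describe_frames(video: bytes) -> str:
    """Placeholder vision component that turns camera frames into text."""
    return "A handwritten equation on a piece of paper."

def text_llm(prompt: str) -> str:
    """Placeholder text-only language model (the GPT-4-style component)."""
    return "Sure - start by isolating the variable on one side."

def synthesize_speech(text: str) -> bytes:
    """Placeholder text-to-speech component."""
    return text.encode()

def cascaded_pipeline(audio: bytes, video: bytes) -> bytes:
    # Older approach: each modality is first converted to text, so every
    # hop adds latency and loses detail such as tone of voice.
    prompt = f"{describe_frames(video)}\nUser asked: {transcribe_audio(audio)}"
    return synthesize_speech(text_llm(prompt))

def native_multimodal_model(audio: bytes, video: bytes) -> bytes:
    """Placeholder for a single model that consumes audio and video
    directly and emits speech, with no intermediate text hand-offs."""
    return b"spoken answer"

if __name__ == "__main__":
    mic, camera = b"...", b"..."
    print(cascaded_pipeline(mic, camera))        # three chained model calls
    print(native_multimodal_model(mic, camera))  # one end-to-end model call
```

The contrast is in the number of hand-offs: each conversion to text in the cascaded version adds latency and drops information, which lines up with Google's comment above about getting response times down to conversational speed.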

As tech companies embrace multimodal AI, AI wearables are also gaining traction. The Humane AI Pin, the Rabbit R1, and Meta Ray-Bans are all devices built around these different media. They are pitched as a way to reduce our dependence on smartphones, but Siri and Google Assistant may soon adopt multimodal AI as well.