x.AI Unveils Multimodal Model Grok-1.5V: A Step Closer to General AI

2024-04-15

x.AI recently released a preview of Grok-1.5 Vision (Grok-1.5V), its first multimodal model. The achievement is remarkable for a company founded only nine months ago. Building on the Grok-1.5 large language model, the new version demonstrates far stronger capabilities in understanding and interacting with the physical world.

Grok-1.5V can process a wide range of visual information, including documents, charts, graphs, and photographs. It excels at multidisciplinary reasoning and at understanding spatial relationships in the physical world, outperforming peer models on RealWorldQA, a benchmark x.AI launched alongside the model.

In a blog post, the startup showcased a range of Grok-1.5V applications: writing code from a hand-drawn diagram, calculating calories from a photo of a nutrition label, and even turning a child's drawing into a bedtime story. The model can also explain internet memes, convert tables to CSV, and offer advice on home-maintenance problems such as decaying wood on a patio. Together, these examples demonstrate the model's versatility and practicality.
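To make the table-to-CSV use case concrete, here is a minimal sketch of what such a request might look like against an OpenAI-style chat-completions endpoint. The URL, model id, and message schema below are placeholders, since x.AI had not published API details for Grok-1.5V at the time of writing.

```python
import base64
import requests

# Placeholder endpoint and key: x.AI had not published a public API for
# Grok-1.5V at the time of writing, so this sketch mirrors the common
# OpenAI-style chat-completions schema instead.
API_URL = "https://api.example.com/v1/chat/completions"
API_KEY = "YOUR_API_KEY"

def table_image_to_csv(image_path: str) -> str:
    """Send a photo of a table to a multimodal chat model and ask for CSV."""
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("ascii")

    payload = {
        "model": "grok-1.5v-preview",  # hypothetical model id
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Convert the table in this image to CSV. "
                         "Return only the CSV, with no commentary."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
            ],
        }],
    }
    resp = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        json=payload,
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]
```

Embedding the image as a base64 data URL, as above, is a common convention for multimodal chat APIs, but the actual Grok interface may accept images differently.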

x.AI stated in the blog post, "Enhancing our multimodal understanding and generation capabilities is an important step towards building beneficial Artificial General Intelligence (AGI) that can comprehend the universe." The lab is excited to release RealWorldQA to the community and plans to expand the benchmark as its multimodal models improve.

The launch of RealWorldQA underscores x.AI's determination to advance AI's understanding of the physical world, a crucial step toward practical real-world AI assistants. The benchmark consists of over 760 images, each paired with a question and a verifiable answer. Although many of the examples are easy for humans, they remain challenging for state-of-the-art models, which puts Grok-1.5V's results in perspective.
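To give a sense of how an image question-answering benchmark like this is typically consumed, here is a minimal evaluation-loop sketch. The JSONL layout and field names are assumptions made for illustration; the format x.AI actually publishes may differ.

```python
import json
from pathlib import Path
from typing import Callable

def evaluate_realworldqa(
    annotations_path: str,
    ask_model: Callable[[Path, str], str],
) -> float:
    """Score a model on image/question/answer triples and return accuracy.

    Assumes a JSONL file in which each record carries an image path, a
    question, and the expected answer; these field names are illustrative
    and may not match the published RealWorldQA schema.
    """
    correct = 0
    total = 0
    with open(annotations_path) as f:
        for line in f:
            record = json.loads(line)
            prediction = ask_model(Path(record["image"]), record["question"])
            # Exact-match scoring; real harnesses usually normalize answers
            # (casing, punctuation, multiple-choice letters) before comparing.
            if prediction.strip().lower() == record["answer"].strip().lower():
                correct += 1
            total += 1
    return correct / total if total else 0.0
```

The `ask_model` callback stands in for whatever inference client is being evaluated, so the same loop works for any multimodal model.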

Earlier this week, Meta released its own OpenEQA benchmark for evaluating AI models' understanding of physical spaces. It comprises over 1,600 questions about real environments, testing object recognition, spatial reasoning, and common-sense knowledge. Given Grok-1.5V's strong performance at physical-world understanding, expectations are high for how it would fare on OpenEQA.

x.AI emphasizes that advancing multimodal understanding and generation is central to building beneficial AGI, and it plans significant progress across images, audio, and video in the coming months. The company also said Grok-1.5V will soon be available to early testers and existing Grok users.