CogVLM2: Zhipu AI Releases Next-Generation Multimodal Large Model

2024-05-21

AI technology company Zhipu AI has officially announced the launch of its latest multimodal large model, CogVLM2. The new generation marks a qualitative leap in key performance indicators, improving significantly over its predecessor CogVLM in processing capability, depth of understanding, and range of applications. CogVLM2 supports text contexts of up to 8K and handles images at resolutions of up to 1344×1344, setting a new benchmark for combined vision and text processing.
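
For a rough sense of what a 1344×1344 input implies for a vision-transformer encoder, the back-of-the-envelope sketch below counts image patches under two assumptions that are not confirmed details of CogVLM2: a patch size of 14, common in the EVA-CLIP family the CogVLM line builds on, and a 2×2 downsampling step before the language model.

```python
# Back-of-the-envelope patch count for a 1344x1344 input.
# Patch size 14 and 2x2 downsampling are assumptions about the
# pipeline, not confirmed details of CogVLM2.
resolution = 1344
patch_size = 14
patches_per_side = resolution // patch_size   # 96
raw_patches = patches_per_side ** 2           # 9216 vision tokens
downsampled = (patches_per_side // 2) ** 2    # 2304 after 2x2 pooling
print(raw_patches, downsampled)
```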

According to Zhipu AI, CogVLM2 achieves a performance improvement of up to 32% on the OCRbench benchmark and 21.9% on TextVQA, demonstrating its strength in document and image understanding. And although CogVLM2 weighs in at only 19B parameters, its results across a range of tests approach, and in some cases surpass, those of the well-known GPT-4V.

CogVLM2's technical architecture has been carefully optimized, pairing a 5-billion-parameter visual encoder with a visual expert module of up to 7 billion parameters. This design couples the visual and language modalities tightly, achieving deep fusion: through fine-grained parameter allocation and interaction between modules, CogVLM2 accurately models the complex relationships between visual and language sequences, significantly improving its handling of visual information while preserving its strengths in language processing.
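
To make the visual-expert idea concrete, here is a minimal PyTorch sketch of the routing pattern described for the CogVLM line of models: within a layer, image-token positions pass through a dedicated set of weights while text tokens use the original language-model weights. The class name, sizes, and layer structure are illustrative assumptions, not CogVLM2's actual code.

```python
import torch
import torch.nn as nn

class VisualExpertFFN(nn.Module):
    """Schematic feed-forward block with a 'visual expert':
    text tokens use the language FFN, image tokens use a parallel
    FFN with its own weights (illustrative only)."""

    def __init__(self, d_model: int = 64, d_ff: int = 128):
        super().__init__()
        self.text_ffn = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model)
        )
        # Separate parameters dedicated to image tokens.
        self.vision_ffn = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model)
        )

    def forward(self, hidden: torch.Tensor, vision_mask: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, seq_len, d_model)
        # vision_mask: (batch, seq_len) bool, True where the token is an image patch
        out = torch.empty_like(hidden)
        out[~vision_mask] = self.text_ffn(hidden[~vision_mask])
        out[vision_mask] = self.vision_ffn(hidden[vision_mask])
        return out

# Toy usage: 8 text tokens followed by 4 image tokens.
layer = VisualExpertFFN()
hidden = torch.randn(1, 12, 64)
vision_mask = torch.zeros(1, 12, dtype=torch.bool)
vision_mask[:, 8:] = True
print(layer(hidden, vision_mask).shape)  # torch.Size([1, 12, 64])
```

In CogVLM's published design this duplication applies to the attention projections as well as the feed-forward blocks, which is how the expert module alone reaches into the billions of parameters.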

Thanks to its multi-expert module structure, CogVLM2 activates only about 12 billion parameters during inference. This design not only improves inference efficiency markedly, but also keeps CogVLM2 stable and efficient when processing data at scale.
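
The gap between total and activated parameters falls out of that routing: any single token runs through only one of the parallel weight sets, so the per-token active count is smaller than the sum of all weights. The toy accounting below illustrates the mechanism only; it is not CogVLM2's real parameter breakdown.

```python
import torch.nn as nn

def count_params(m: nn.Module) -> int:
    return sum(p.numel() for p in m.parameters())

# Two parallel FFNs as in the visual-expert sketch above (toy sizes).
text_ffn = nn.Sequential(nn.Linear(64, 128), nn.GELU(), nn.Linear(128, 64))
vision_ffn = nn.Sequential(nn.Linear(64, 128), nn.GELU(), nn.Linear(128, 64))

total = count_params(text_ffn) + count_params(vision_ffn)
active = count_params(text_ffn)  # a text token only runs through text_ffn
print(f"total={total}, active per text token={active}")  # active < total
```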

In terms of performance, CogVLM2 does well across multimodal benchmarks, from text- and image-understanding tests such as TextVQA, DocVQA, and ChartQA to complex reasoning and cross-disciplinary tests such as OCRbench, MMMU, MMVet, and MMBench. Both of its released models achieve state-of-the-art results on several benchmarks and remain competitive with closed-source models on the rest.

Zhipu AI's release of CogVLM2 is a clear step forward for AI in multimodal processing. As the technology continues to advance and application scenarios expand, CogVLM2 is expected to open up new possibilities and opportunities for AI.