Shanghai AI Lab Unveils Next-Generation "ShuSheng" Visual Model, Pioneering Open Source for Core Visual Tasks

2024-01-30

Shanghai Artificial Intelligence Laboratory, in collaboration with Tsinghua University, the Chinese University of Hong Kong, and SenseTime, has officially open-sourced the new-generation Scholar vision large model, InternVL.

This vision encoder model, named InternVL-6B, has up to 6 billion parameters, marking a significant breakthrough in the field of visual large models. It is trained with a progressive alignment strategy that combines contrastive and generative learning, achieving fine-grained alignment between vision and language models on internet-scale data. As a result, InternVL-6B can not only capture subtle visual details in complex images but also perform image-to-text tasks.
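The contrastive half of such an alignment is commonly formulated as a symmetric InfoNCE objective over paired image and text embeddings, in the style of CLIP. The sketch below is a generic NumPy illustration under that assumption; the function name, temperature value, and all details are illustrative, not InternVL's actual training code:

```python
import numpy as np

def contrastive_alignment_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired image/text embeddings.

    A generic CLIP-style sketch: matched image/text pairs sit on the
    diagonal of the similarity matrix and are pulled together, while
    mismatched pairs are pushed apart.
    """
    # L2-normalize so the dot product is cosine similarity.
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature  # (B, B); matched pairs on the diagonal
    labels = np.arange(logits.shape[0])

    def cross_entropy(l):
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[labels, labels].mean()

    # Average the image-to-text and text-to-image directions.
    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))
```

Perfectly aligned embeddings yield a near-zero loss, while mispaired batches are penalized; in full training pipelines of this kind, the contrastive term is typically complemented by a generative (captioning) objective.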

Even more remarkably, InternVL-6B can also interpret complex page content and even solve mathematical problems, capabilities that further broaden its practical applications across a wide range of fields.

Shanghai AI Laboratory has long been at the forefront of visual large model research. In 2021, it released Scholar 1.0, the first large model in China to cover a broad range of visual tasks: with a single base model, it handles four core tasks, namely classification, object detection, semantic segmentation, and depth estimation.

In 2022, the laboratory followed up with the visual large model InternImage. Built on dynamic sparse convolution, InternImage establishes a new architecture for visual large models, opening up an approach beyond Transformers, and has demonstrated outstanding performance across 12 visual tasks.

As visual large models continue to develop and see wider application, their potential across many fields will be further unleashed.