2024-01-30
Shanghai Artificial Intelligence Laboratory, together with Tsinghua University, the Chinese University of Hong Kong, and SenseTime, has officially open-sourced InternVL, the new generation of its Scholar vision large model.
The visual encoder, named InternVL-6B, has 6 billion parameters, which the team describes as a significant breakthrough for visual large models. The model uses a progressive alignment technique that combines contrastive learning with generative learning to finely align the visual model with a large language model on internet-scale data. As a result, InternVL-6B can handle subtle visual details in complex images and can also perform image-to-text tasks.
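The article does not describe InternVL's training objective beyond "contrastive learning and generation", so the following is only a minimal sketch of the general idea behind contrastive image-text alignment, written as a generic CLIP-style symmetric loss in plain Python. The function names, the temperature value, and the toy embeddings are all assumptions made for illustration; this is not InternVL's code.

```python
import math


def normalize(v):
    """Scale a vector to unit length (assumes the vector is non-zero)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def log_softmax_row(row):
    """Numerically stable log-softmax over one row of logits."""
    m = max(row)
    log_z = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_z for x in row]


def contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric contrastive loss for a batch where image i is paired with text i.

    The temperature of 0.07 is an illustrative assumption, not a value
    taken from the article.
    """
    imgs = [normalize(v) for v in image_embs]
    txts = [normalize(v) for v in text_embs]
    n = len(imgs)

    # logits[i][j]: cosine similarity of image i and text j, scaled by temperature
    logits = [
        [sum(a * b for a, b in zip(imgs[i], txts[j])) / temperature for j in range(n)]
        for i in range(n)
    ]

    # Image-to-text direction: the correct text for image i sits at column i.
    i2t = -sum(log_softmax_row(logits[i])[i] for i in range(n)) / n

    # Text-to-image direction: same computation on the transposed matrix.
    logits_t = [list(col) for col in zip(*logits)]
    t2i = -sum(log_softmax_row(logits_t[j])[j] for j in range(n)) / n

    return (i2t + t2i) / 2


# Toy 2-D embeddings: matched pairs should score a lower loss than swapped pairs.
aligned = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

In this sketch the loss is small when each image embedding is closest to its own caption and large when the pairs are swapped, which is the behavior a contrastive alignment stage optimizes for. The generative part of the alignment mentioned in the article is not modeled here.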
InternVL-6B can also interpret information on complex pages and even solve mathematical problems, which further broadens its practical applications across fields.
Shanghai AI Laboratory has long been at the forefront of visual large model research. In 2021 it released Scholar 1.0, the first large model in China to cover a wide range of visual tasks. With a single base model, Scholar 1.0 covers four core visual tasks: classification, object detection, semantic segmentation, and depth estimation.
In 2022 the laboratory followed up with the visual large model InternImage. This model builds a new architecture for visual large models on dynamic sparse convolution, opening up an approach to large model architecture beyond Transformers. InternImage has demonstrated outstanding performance on 12 visual tasks.
As visual large models continue to be developed and applied, their potential across fields is expected to be unlocked further.