Nous releases open-source vision-language model Hermes 2 Vision Alpha, faces optimization challenges

2023-12-05

Nous Research, a private applied research group known for its open-source work on large language models (LLMs), has released a lightweight vision-language model called Nous Hermes 2 Vision.

The open-source model is available on Hugging Face and builds on the company's earlier OpenHermes-2.5-Mistral-7B model, adding visual capabilities such as extracting text from images and generating detailed answers from image prompts.

However, shortly after the release, users found that the model hallucinated more than expected, producing erroneous output, and the project was renamed Hermes 2 Vision Alpha. The company plans to release a more stable version offering the same capabilities with fewer errors.

Nous Hermes 2 Vision Alpha

Named after the Greek god Hermes, Nous Hermes 2 Vision aims to be a system that can "navigate complex human discourse with divine skill." It takes user-provided images and combines that visual information with its learned knowledge to provide detailed answers in natural language.

For example, it can analyze a user's image and describe its different aspects in detail. One of Nous' co-founders, known as Teknium on X, shared a test screenshot in which the model analyzed a picture of a hamburger, judged whether eating it would be harmful to health, and explained its reasoning.
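The interaction pattern behind that demo is straightforward to reproduce. Because the Alpha checkpoint shipped before a standard transformers integration existed, the sketch below uses a generic LLaVA checkpoint as a stand-in; the checkpoint id, classes, and prompt template are illustrative assumptions, not Nous' documented API.

```python
# Minimal sketch of the image-plus-question pattern, with a generic LLaVA
# checkpoint standing in for Nous-Hermes-2-Vision-Alpha (the Alpha repo may
# require its own loading code; consult its model card).
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "llava-hf/llava-1.5-7b-hf"  # stand-in vision-language checkpoint
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

image = Image.open("burger.jpg")  # any local photo
prompt = "USER: <image>\nWould eating this be harmful to my health, and why? ASSISTANT:"

inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device, torch.float16)
output = model.generate(**inputs, max_new_tokens=200)
print(processor.decode(output[0], skip_special_tokens=True))
```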

While ChatGPT, backed by GPT-4V, also supports image prompts, Nous' open-source product stands out in two key respects.

Firstly, unlike traditional approaches that rely on heavy 3B-parameter vision encoders, Nous Hermes 2 Vision adopts the 400M-parameter SigLIP-400M. This not only simplifies the model's architecture, making it lighter than comparable products, but also helps improve performance on vision-language tasks.
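The size difference is easy to verify against the public SigLIP release. The sketch below assumes google/siglip-so400m-patch14-384 (the shape-optimized, roughly 400M-parameter SigLIP checkpoint) is the encoder family Nous refers to; Nous did not publish an exact checkpoint id.

```python
# Quick parameter count for the lighter encoder. Assumes the public
# google/siglip-so400m-patch14-384 weights match the "SigLIP-400M" that
# Nous names; requires transformers >= 4.37.
from transformers import SiglipVisionModel

vision_tower = SiglipVisionModel.from_pretrained("google/siglip-so400m-patch14-384")
n_params = sum(p.numel() for p in vision_tower.parameters())
print(f"SigLIP vision tower: {n_params / 1e6:.0f}M parameters")  # ~400M, versus ~3B
```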

Secondly, it was trained on a custom dataset enriched with function calling. This lets users prompt the model with special tags to extract written information from images, such as menus or billboards; a hypothetical example of the pattern is sketched after the quote below.

"This unique addition transforms Nous-Hermes-2-Vision into a vision-language action model. Developers now have a versatile tool ready to create various sophisticated automations," the company wrote on the model's Hugging Face page.

Other datasets used for training the model include LVIS-INSTRUCT4V, ShareGPT4V, and dialogues from OpenHermes-2.5.

Ongoing Issues

Although the Nous vision-language model is available for research and development purposes, early usage has shown that it is far from perfect.

Shortly after the release, the co-founder posted that the model had problems, including frequent hallucinations and misuse of EOS (end-of-sequence) tokens. The model was subsequently renamed as an alpha version.

"I see people talking about 'hallucinations,' and yes, the situation is indeed bad. I am aware of it too, as the underlying LLM is an unreviewed model. I will update this version by the end of this month to address these issues," wrote Quan Nguyen, a researcher at Nous, on X.

However, Nguyen pointed out in another post that the function calling feature still works well if users define a good architecture for it, and that he will release a model dedicated to function calling if there is sufficient user feedback.

To date, Nous Research has released 41 open-source models across its Hermes, YaRN, Capybara, Puffin, and Obsidian series, spanning different architectures and capabilities.