Hugging Face has released a new vision-language model called SmolVLM-256M, which it describes as having the lowest parameter count of any model in its category.
Thanks to its compact size, SmolVLM-256M can run on devices with relatively modest processing power, such as consumer laptops. It also supports WebGPU, a browser technology that gives AI-powered web applications access to the graphics processing unit in a user's computer, which makes it possible to run the model directly in web browsers. The model can handle a range of tasks involving visual data, including answering questions about scanned documents, describing video content, and interpreting charts. Hugging Face has also released an instruction-tuned variant of the model that tailors its output to user prompts.
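For readers who want to try the model locally, a minimal sketch along the following lines should work with Hugging Face's transformers library. The checkpoint id HuggingFaceTB/SmolVLM-256M-Instruct, the placeholder image path, and the sample question are illustrative assumptions, not details from this article.

```python
from transformers import AutoProcessor, AutoModelForVision2Seq
from transformers.image_utils import load_image

# Checkpoint id assumed from Hugging Face's published release.
MODEL_ID = "HuggingFaceTB/SmolVLM-256M-Instruct"

processor = AutoProcessor.from_pretrained(MODEL_ID)
model = AutoModelForVision2Seq.from_pretrained(MODEL_ID)

# Any local path or URL works here; this file name is a placeholder.
image = load_image("chart.png")

# Build a chat-style prompt that pairs the image with a question.
messages = [{"role": "user", "content": [
    {"type": "image"},
    {"type": "text", "text": "What does this chart show?"},
]}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)

inputs = processor(text=prompt, images=[image], return_tensors="pt")
generated = model.generate(**inputs, max_new_tokens=200)
print(processor.batch_decode(generated, skip_special_tokens=True)[0])
```

For in-browser deployment, the same weights can instead be served through a WebGPU-capable runtime such as Hugging Face's Transformers.js, though the setup differs from the Python sketch above.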
Technically speaking, SmolVLM-256M comprises 256 million parameters, far fewer than the billions found in state-of-the-art foundation models. Fewer parameters mean lower hardware requirements, which is why SmolVLM-256M can run on devices like laptops.
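A back-of-envelope calculation, offered here as an illustration rather than a published figure, shows why the parameter count matters for hardware:

```python
# Rough estimate of weight memory; an illustration, not an official figure.
params = 256_000_000      # SmolVLM-256M
bytes_per_param = 2       # 16-bit (fp16/bf16) weights
print(f"~{params * bytes_per_param / 2**20:.0f} MiB of weights")  # ~488 MiB
```

At the same precision, an 8-billion-parameter model would need roughly 15 GiB for its weights alone, which already exceeds the memory of many consumer GPUs.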
SmolVLM-256M is the latest addition to Hugging Face's series of open-source vision-language models. One of the key improvements over the company's earlier models is a new encoder, the component that converts input images into numerical representations that the rest of the neural network can process more easily.
The encoder in SmolVLM-256M is based on an open-source algorithm called SigLIP base patch-16/512, a Google-developed descendant of the CLIP image processing architecture that OpenAI released in 2021. At 93 million parameters, the encoder is less than a quarter the size of the one in Hugging Face's previous generation of models, which accounts for much of SmolVLM-256M's reduced hardware footprint. Notably, according to research from Apple and Google, pairing a smaller encoder with higher-resolution input images often improves visual understanding without increasing the parameter count.
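The encoder's size and output shape can be checked directly. The sketch below assumes the public google/siglip-base-patch16-512 checkpoint, which appears to match the "SigLIP base patch-16/512" name used here; treat the hub id as an assumption.

```python
import torch
from transformers import SiglipVisionModel

# Hub id assumed to correspond to "SigLIP base patch-16/512".
encoder = SiglipVisionModel.from_pretrained("google/siglip-base-patch16-512")

# Count parameters: the base vision tower is roughly 93M.
n_params = sum(p.numel() for p in encoder.parameters())
print(f"{n_params / 1e6:.0f}M parameters")

# A 512x512 image cut into 16x16 patches yields (512/16)^2 = 1024 tokens.
pixels = torch.randn(1, 3, 512, 512)
features = encoder(pixel_values=pixels).last_hidden_state
print(features.shape)  # expected: torch.Size([1, 1024, 768])
```

The 512-pixel input resolution is the relevant design choice: cutting a larger image into the same 16-pixel patches gives the model more visual tokens, and therefore more detail, without adding any encoder parameters.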
To train the model, Hugging Face used an improved version of the dataset it had employed for its earlier vision-language models. Among other changes aimed at strengthening document understanding and image description skills, the company added handwritten mathematical expressions to the dataset to boost the model's reasoning capabilities.
In an internal evaluation, Hugging Face compared SmolVLM-256M with a multimodal model it had released 18 months earlier that features 80 billion parameters. SmolVLM-256M outperformed the older model on more than a half-dozen benchmarks. On MathVista, a benchmark that includes geometry problems, its score was more than 10% higher.
Alongside SmolVLM-256M, Hugging Face introduced a more capable model called SmolVLM-500M, which comprises 500 million parameters. It trades some hardware efficiency for higher output quality and, according to Hugging Face, is also better at following user instructions.
The source code for both models is available on Hugging Face's eponymous AI project hosting platform.