IBM Launches New Granite 3.2 Model Family, Delivering Essential Inference Capabilities

2025-02-27

IBM has unveiled its new Granite 3.2 AI model family, featuring experimental reasoning, vision, and predictive capabilities.

As with previous releases, IBM has made its models available under the permissive open-source Apache 2.0 license. All Granite models are now accessible on Hugging Face, with some also available on IBM watsonx.ai and other platforms.

The new lineup is headlined by IBM's flagship text-only large language model, the Granite 3.2 Instruct variant, offered in 8B and 2B versions. Designed to follow instructions, it can perform tasks like summarization, problem-solving, and code generation, which makes these models well suited for building AI assistants and agents. Both versions are trained using "chain-of-thought" reasoning similar to other industry-standard models, but IBM engineers have designed them to be smaller and more efficient.

The reasoning capabilities in each model can be toggled on or off programmatically. This means IBM has created a single model that can function as either a conversational model or a reasoning model, rather than releasing separate "reasoning models." Since reasoning demands significant computational power during deployment, disabling unnecessary reasoning at runtime can save substantial amounts of energy.
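
Here is a minimal sketch of what that toggle looks like in practice, assuming the Hugging Face Transformers chat-template interface and the "thinking" flag documented on the Granite 3.2 model cards; the prompt is illustrative:

```python
# Minimal sketch: toggling Granite 3.2's reasoning mode via the chat template.
# Assumes the Hugging Face transformers library and the "thinking" template
# flag described on the granite-3.2-8b-instruct model card.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ibm-granite/granite-3.2-8b-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "Which weighs more, a kilogram of feathers or a kilogram of steel?"}]

# thinking=True enables chain-of-thought reasoning; thinking=False (or
# omitting the flag) runs the same checkpoint as a plain conversational model.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, thinking=True
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```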

"The next era of AI focuses on efficiency, integration, and real-world impact—enabling enterprises to achieve powerful results without excessive computational costs," said Sriram Raghavan, IBM's VP of AI Research (pictured).

Reasoning models "think through" problems step by step, an approach the industry commonly calls "chain-of-thought." These models have surged in popularity since the release of DeepSeek's R1. Most of them scan the entire reasoning space to identify the optimal logical "path" before generating a final answer, but it isn't always necessary to follow a path to its end once it becomes clear it leads to a poor outcome.

IBM engineers developed a novel inference-scaling technique that reduces the computational cost of reasoning tasks by adding a reward system in the form of a secondary process-based reward model. This reward model monitors the LLM as it reasons and redirects it toward logic paths with higher-confidence outcomes. Combined with search techniques that can scan the entire logic space, IBM researchers claim the approach yields a smaller, more efficient reasoning model that accomplishes everything within a single model while rivaling R1.
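
The broad idea can be illustrated with a short sketch. This is not IBM's published implementation: generate_step and score_step are hypothetical stand-ins for the LLM and the process reward model, and the beam sizes are arbitrary.

```python
# Illustrative sketch, not IBM's actual method: step-level search in which a
# process reward model (PRM) scores each candidate reasoning step so that
# low-confidence paths are pruned before they consume more compute.
import heapq
import random

BEAM_WIDTH = 4     # partial reasoning paths kept alive each round
BRANCH_FACTOR = 3  # next-step candidates sampled per surviving path
MAX_STEPS = 8

def guided_reasoning(problem, generate_step, score_step):
    """Expand reasoning one step at a time, keeping only the paths the
    PRM scores highest instead of unrolling every path to completion."""
    beams = [("", 0.0)]  # (partial reasoning trace, cumulative PRM score)
    for _ in range(MAX_STEPS):
        candidates = []
        for trace, score in beams:
            for _ in range(BRANCH_FACTOR):
                step = generate_step(problem, trace)  # sample next step from the LLM
                candidates.append(
                    (trace + step, score + score_step(problem, trace, step))
                )
        # Prune: weak paths are abandoned early rather than followed to the end.
        beams = heapq.nlargest(BEAM_WIDTH, candidates, key=lambda c: c[1])
    return max(beams, key=lambda b: b[1])[0]

# Toy stubs so the sketch runs end to end.
demo_gen = lambda problem, trace: f" step{trace.count('step')}"
demo_score = lambda problem, trace, step: random.random()
print(guided_reasoning("2 + 2 = ?", demo_gen, demo_score))
```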

"DeepSeek's R1 release, in many ways, validates IBM's strategy of smaller, efficient models," said Dave Vellante, chief analyst at theCUBE Research, SiliconANGLE's sister market research firm. "IBM's briefing reinforces this perspective, noting that DeepSeek utilized expert mixes and other efficiency methods as early as December 2024, but only recently gained market attention with the R1 spotlight. We believe this reflects IBM's focus on training efficiency and specialized architectures."

IBM claims that Granite 3.2 8B can be fine-tuned to rival larger models such as Anthropic's Claude 3.5 Sonnet and OpenAI's GPT-4o on mathematical reasoning benchmarks such as AIME2024 and MATH500.

New Multimodal Vision Model and Smaller Guardrail Models

IBM has also introduced the new multimodal Granite Vision 3.2 2B, a model with computer vision capabilities aimed at helping businesses with visual document understanding.

Granite Vision can handle a variety of visual understanding tasks but excels particularly at document processing. While most vision language models, or VLMs, are designed for general visual tasks, few excel at optical character recognition. IBM's engineering team spent considerable time training Vision 3.2 to adapt to distinctive visual features such as layout, fonts, charts, and infographics.
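
A minimal sketch of querying the vision model about a document image follows; the processor classes and message format here follow common Transformers VLM conventions and may differ from the model's final documentation:

```python
# Minimal sketch of document question-answering with Granite Vision 3.2.
# Assumes the Hugging Face transformers vision-to-text interface; the model
# ID and exact processor classes are assumptions based on common conventions.
from transformers import AutoProcessor, AutoModelForVision2Seq
from PIL import Image

model_id = "ibm-granite/granite-vision-3.2-2b"
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForVision2Seq.from_pretrained(model_id, device_map="auto")

image = Image.open("invoice.png")  # any document image
messages = [{
    "role": "user",
    "content": [
        {"type": "image"},
        {"type": "text", "text": "What is the total amount due on this invoice?"},
    ],
}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(processor.decode(output[0], skip_special_tokens=True))
```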

Granite Guardian 3.2 represents IBM's latest guardrail AI model, designed to detect and highlight risks in prompts and responses. The company states that its performance matches version 3.1 but is faster and more cost-effective.

One advantage of Guardian 3.2 is that it provides "verbalized confidence levels" while monitoring inputs and outputs. Instead of offering a binary "yes" or "no" response, it expresses its confidence as "high" or "low," giving developers a usable threshold for deciding whether to trust or reject an output.
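
As a toy illustration of why that threshold is useful, a developer might gate outputs like this; the verdict and confidence strings below are assumptions for illustration, not Guardian's documented output format:

```python
# Illustrative sketch: gating on Guardian-style verbalized confidence.
# The "Yes"/"No" verdict plus "high"/"low" confidence format is an assumption;
# consult the Granite Guardian model card for the actual output schema.
def should_block(guardian_verdict: str, guardian_confidence: str) -> bool:
    """Block flagged outputs outright when the guardrail is confident;
    low-confidence flags can be escalated instead of hard-failing."""
    if guardian_verdict.lower() == "yes":  # risk detected
        return guardian_confidence.lower() == "high"
    return False

print(should_block("Yes", "high"))  # True  -> reject the response
print(should_block("Yes", "low"))   # False -> route to human review instead
```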

In addition to the updated 8B version, IBM has released two new Guardian model sizes. The first is a slimmed-down 5-billion-parameter version that retains near-original performance. The second is Granite Guardian 3.2 3B-A800M, created by fine-tuning a mixture-of-experts base model; it achieves strong performance at a lower cost by activating only 800 million of its 3 billion parameters at a time.
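
A toy sketch of the mixture-of-experts idea behind the "A800M" suffix: a router activates only a few experts per token, so only a fraction of the total parameters does work at any given time. The layer sizes here are illustrative, not Granite's:

```python
# Toy mixture-of-experts layer: only the top-k experts run per token, so a
# fraction of total parameters is active at a time. Sizes are illustrative.
import torch
import torch.nn as nn

class ToyMoE(nn.Module):
    def __init__(self, d=64, n_experts=8, k=2):
        super().__init__()
        self.router = nn.Linear(d, n_experts)
        self.experts = nn.ModuleList(nn.Linear(d, d) for _ in range(n_experts))
        self.k = k

    def forward(self, x):                      # x: (tokens, d)
        weights = self.router(x).softmax(-1)   # routing probabilities
        topv, topi = weights.topk(self.k, -1)  # pick k experts per token
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = topi[:, slot] == e      # tokens routed to expert e
                if mask.any():
                    out[mask] += topv[mask, slot, None] * expert(x[mask])
        return out

x = torch.randn(4, 64)
print(ToyMoE()(x).shape)  # (4, 64); only 2 of 8 experts ran per token
```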

Rounding out IBM's Granite family are the compact Granite Timeseries models, also known as Tiny Time Mixers. The latest Granite Timeseries release extends forecasting beyond minute- and hour-level horizons to daily and weekly predictions.
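
A minimal zero-shot forecasting sketch, assuming IBM's granite-tsfm toolkit (the tsfm_public package); the class name, model ID, and context/forecast lengths follow IBM's published examples but should be verified against the current toolkit:

```python
# Minimal sketch: zero-shot forecasting with a Tiny Time Mixer checkpoint.
# Assumes IBM's granite-tsfm toolkit (pip install granite-tsfm); the model ID,
# class name, and sequence lengths are based on the published TTM examples.
import torch
from tsfm_public import TinyTimeMixerForPrediction

model = TinyTimeMixerForPrediction.from_pretrained(
    "ibm-granite/granite-timeseries-ttm-r2"
)

# 512 historical points for one channel; the model forecasts the next 96.
past_values = torch.randn(1, 512, 1)  # (batch, context_length, channels)
with torch.no_grad():
    forecast = model(past_values=past_values).prediction_outputs
print(forecast.shape)  # expected: (1, 96, 1)
```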