IBM has officially launched Granite 3.1, the latest series in its open-source family of language models. The release brings several upgrades for enterprise users, including an extended context length of 128K tokens, new embedding models, integrated hallucination detection, and measurable performance improvements.
According to IBM, its latest Granite 8B Instruct model outperformed similarly sized open-source competitors, such as Meta Llama 3.1, Qwen 2.5, and Google Gemma 2, on the academic benchmarks aggregated by the Hugging Face Open LLM Leaderboard.
The release of Granite 3.1 is part of IBM's effort to quicken the release cadence of its open-source models. Granite 3.0 launched just this October, when IBM also claimed that its generative AI-related business had reached $20 billion. With this update, IBM aims to pack more capability into smaller models to meet the needs of enterprise users. David Cox, Vice President of AI Models at IBM Research, said the team has improved metrics across the board, yielding significant performance gains, and that IBM itself uses Granite in multiple use cases.
For enterprises, performance and model size both matter. IBM evaluates its models through a mix of academic and practical tests, emphasizing that they are trained and tested to optimize for enterprise use cases. Cox noted that efficiency is not just raw speed; IBM tracks a broader set of efficiency metrics, since smaller models are easier for enterprises to run and cheaper to operate. IBM is therefore working both to integrate more features into its smallest models and to raise their performance.
In addition to performance and efficiency improvements, IBM has significantly expanded Granite's context length: Granite 3.0 was limited to 4K tokens, while Granite 3.1 supports 128K, allowing much longer documents to be processed in a single pass. This is a crucial upgrade for enterprise AI users, whether for retrieval-augmented generation (RAG) or agentic AI systems.
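To see why a larger context window matters for RAG, consider the basic pipeline: rank document chunks by similarity to the query, then pack the best ones into the prompt. The sketch below is a minimal, self-contained illustration using a toy bag-of-words "embedding" as a stand-in for a real embedding model such as IBM's; the function names and chunk texts are hypothetical, not from IBM's stack.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' -- a stand-in for a real
    embedding model (e.g. one of IBM's Granite embedding models)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def build_rag_prompt(query: str, chunks: list[str], top_k: int = 3) -> str:
    """Rank chunks against the query and pack the best ones into the
    prompt. A 128K-token window fits far more (or longer) chunks than
    the previous 4K limit did."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    context = "\n\n".join(ranked[:top_k])
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

chunks = [
    "Granite 3.1 extends the context window to 128K tokens.",
    "The quarterly report covers revenue in three regions.",
    "Longer context windows let RAG pipelines pass whole documents into the model.",
]
prompt = build_rag_prompt(
    "How long is the Granite 3.1 context window?", chunks, top_k=2
)
print(prompt)
```

With a 4K window, a pipeline like this must be aggressive about how few chunks survive the cut; at 128K, entire documents can often be passed through without chunking at all.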
Furthermore, IBM has released a series of embedding models to accelerate data vectorization. The Granite-Embedding-30M-English model, for example, can achieve a query latency of 0.16 seconds, which, according to IBM, is faster than competitors such as Snowflake Arctic.
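Figures like "seconds per query" are straightforward to reproduce for any embedder: time a batch of queries and divide. The sketch below measures this with a deterministic stand-in embedder (the hashing function is purely illustrative; a real benchmark would call the actual model through a serving library).

```python
import hashlib
import time

def embed(text: str, dim: int = 8) -> list[float]:
    """Deterministic stand-in embedder: hashes tokens into a fixed-size
    vector. Replace with a real embedding-model call in practice."""
    vec = [0.0] * dim
    for token in text.lower().split():
        h = int(hashlib.md5(token.encode()).hexdigest(), 16)
        vec[h % dim] += 1.0
    return vec

queries = [f"sample query number {i}" for i in range(100)]

start = time.perf_counter()
for q in queries:
    embed(q)
elapsed = time.perf_counter() - start

per_query = elapsed / len(queries)
print(f"{per_query:.6f} seconds per query")
```

Note that per-query latency depends heavily on hardware, batching, and sequence length, so vendor figures like 0.16 s/query are best treated as comparable only under matched conditions.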
To enhance Granite 3.1's performance, IBM adopted an advanced multi-stage training process and focused on improving the quality of its training data. Cox emphasized that what matters is not the quantity of data but raising model performance through high-quality data.
To reduce the risk of hallucinations and incorrect outputs in LLMs, IBM has integrated hallucination protection directly into the model. The Granite Guardian 3.1 8B and 2B models now include a hallucination detection capability, enabling localized protection within the model, thereby improving efficiency and accuracy.
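Granite Guardian's detector is a trained model, not a simple heuristic, but the kind of check it performs, scoring how well a response is grounded in the supplied context, can be illustrated with a crude token-overlap stand-in. Everything below (function name, example strings, the 0.5 threshold) is hypothetical and only sketches the concept.

```python
def groundedness(response: str, context: str) -> float:
    """Fraction of response tokens that also appear in the context.
    A crude stand-in for a learned hallucination detector like
    Granite Guardian 3.1, which scores grounding with a trained model."""
    ctx = set(context.lower().split())
    tokens = response.lower().split()
    if not tokens:
        return 0.0
    return sum(t in ctx for t in tokens) / len(tokens)

context = "granite 3.1 extends the context window to 128k tokens"
grounded = "granite 3.1 extends the context window"
fabricated = "granite 3.1 was trained entirely on proprietary medical data"

# A low score suggests the response asserts things the context
# does not support -- a signal to flag or regenerate the answer.
print(groundedness(grounded, context))
print(groundedness(fabricated, context))
```

In a production guardrail, the score would come from the detector model itself, and responses below a chosen threshold would be blocked, flagged, or regenerated.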
These new Granite models are currently available for free to enterprise users and can be accessed through IBM's watsonx enterprise AI services and commercial product integrations. IBM plans to maintain an active update schedule for the Granite models, with multimodal capabilities slated for Granite 3.2 in early 2025. Cox added that further features will be announced at IBM's Think conference.