AI21 Labs Unveils Jamba: The First Production-Grade Open-Source Model Built on the Mamba Architecture

2024-03-29

AI21 Labs has released Jamba, the world's first production-grade AI model based on the innovative Mamba architecture. Whereas most existing models, such as GPT, Gemini, and Llama, are built on the Transformer architecture, Jamba combines the strengths of the Mamba Structured State Space Model (SSM) with the traditional Transformer architecture, delivering notable gains in both performance and efficiency.

Jamba offers a context window of 256K tokens, equivalent to roughly 210 pages of text, and can fit up to 140K tokens on a single 80GB GPU. This is made possible by its hybrid SSM-Transformer architecture, which uses mixture-of-experts (MoE) layers and activates only 12B of its 52B total parameters during inference. As a result, Jamba handles far longer contexts than comparable open models such as Mixtral 8x7B, whose context window is 32K tokens, while maintaining high throughput and efficiency.

A key advantage of Jamba is a three-fold increase in throughput on long contexts compared to similarly sized Transformer-based models like Mixtral 8x7B. This comes from its unique hybrid architecture of Transformer, Mamba, and MoE layers, which balances memory use, throughput, and quality. Jamba adopts a blocks-and-layers design: each Jamba block contains either an attention layer or a Mamba layer, followed by a multi-layer perceptron (MLP), yielding just one Transformer (attention) layer out of every eight total layers. AI21 Labs states that this ratio allows the model to maximize quality and throughput on a single GPU while leaving enough memory for common inference workloads.

Jamba's strengths are not limited to efficiency and cost-effectiveness. In benchmark tests, the model matches or surpasses state-of-the-art models of similar size across a wide range of tasks. Jamba is released under the Apache 2.0 license.
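The blocks-and-layers design described above can be sketched in a few lines of Python. This is an illustrative simplification, not AI21's implementation: the total layer count (32), the exact positions of the attention layers, and the assumption that MoE replaces the plain MLP on every other layer are hypothetical choices made for the sketch; only the one-attention-layer-in-eight ratio comes from the announcement.

```python
# Illustrative sketch of a Jamba-style layer schedule (assumptions:
# 32 layers, attention once per 8 layers, MoE on every other MLP).
def jamba_layer_schedule(n_layers=32, attn_period=8, moe_period=2):
    """Return a (mixer, mlp) label pair for each layer."""
    schedule = []
    for i in range(n_layers):
        # One attention layer per `attn_period` layers; Mamba otherwise.
        mixer = "attention" if i % attn_period == attn_period - 1 else "mamba"
        # MoE substituted for the plain MLP every `moe_period` layers.
        mlp = "moe" if i % moe_period == moe_period - 1 else "mlp"
        schedule.append((mixer, mlp))
    return schedule

schedule = jamba_layer_schedule()
n_attention = sum(1 for mixer, _ in schedule if mixer == "attention")
print(f"{n_attention} attention layers out of {len(schedule)} total")
# → 4 attention layers out of 32 total
```

Under these assumptions, only 4 of 32 mixer layers pay the quadratic memory cost of attention, which is the trade-off the article credits for Jamba's long-context throughput.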
The model weights can be obtained on Hugging Face, and the model is also accessible as an NVIDIA NIM inference microservice through the NVIDIA API catalog, allowing enterprise application developers to deploy it via the NVIDIA AI Enterprise software platform. For now, Jamba is released as a research model and lacks the safeguards necessary for commercial use, but AI21 Labs plans to release a fine-tuned, safer version in the coming weeks. As the AI community continues to explore and refine new architectures, we can expect further advances in performance, efficiency, and accessibility, paving the way for a new generation of powerful AI models.
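As a rough sketch of how the Hugging Face release might be loaded with the `transformers` library — the model ID `ai21labs/Jamba-v0.1` and the loading parameters below are assumptions to verify against the official model card, and the full 52B-parameter weights require substantial GPU memory:

```python
# Hypothetical loading sketch for the Jamba weights on Hugging Face.
# The repo name and parameters are assumptions; check the model card.
MODEL_ID = "ai21labs/Jamba-v0.1"  # assumed Hugging Face repo name

def load_jamba():
    # Imports live inside the function so this sketch can be read and
    # imported without torch/transformers installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype=torch.bfloat16,  # half precision to reduce memory
        device_map="auto",           # shard layers across available GPUs
    )
    return tokenizer, model
```

With the model loaded, generation would follow the standard `transformers` pattern: tokenize a prompt, call `model.generate(...)`, and decode the result.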