Stability AI Unveils Compact, High-Efficiency Language Model: Stable LM 2 1.6B

2024-01-22

Stability AI has released Stable LM 2 1.6B, the first model in its new series of multilingual language models. Despite its compact size, the model performs strongly across a range of natural language tasks, making it a practical option for developers working under tight resource constraints.

The model was pretrained on a filtered dataset of approximately 2 trillion tokens drawn from open-source large-scale datasets (supplemented with multilingual data from CulturaX), using 512 NVIDIA A100 40GB GPUs (AWS P4d instances). It handles English, Spanish, German, Italian, French, Portuguese, and Dutch. Stability AI states that the architecture incorporates recent algorithmic advances in language modeling, striking a balance between speed and performance while allowing for faster training and iteration.

In the field of small-scale language models, Stable LM 2 1.6B stands out. Benchmark results show that it achieves state-of-the-art performance for a model with fewer than 2 billion parameters, outperforming models such as Microsoft's Phi-1.5 (1.3B), TinyLlama 1.1B, and Falcon 1B on most tasks on the Open LLM Leaderboard.

Thanks to its multilingual training data, Stable LM 2 also shows higher accuracy on translated versions of standard benchmarks such as ARC, HellaSwag, and TruthfulQA. Its performance on MT-Bench is likewise competitive with, and in some cases better than, that of larger models.

One of the most appealing aspects of Stable LM 2 1.6B is its combination of compact size and speed, which translates into reduced hardware requirements for both training and deployment. That said, small models come with known trade-offs, such as higher hallucination rates and weaker performance on complex reasoning tasks compared with larger models.
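To make the "reduced hardware requirements" claim concrete, here is a back-of-envelope sketch of the weight memory a 1.6B-parameter model needs at different precisions. The byte-per-parameter figures are standard for these dtypes; the exact runtime footprint (activations, KV cache, framework overhead) will be higher.

```python
# Rough memory footprint of model weights alone, by precision.
# This is an illustrative estimate, not an official figure from Stability AI.

def weights_gb(n_params: float, bytes_per_param: float) -> float:
    """Gigabytes needed to store n_params weights at the given precision."""
    return n_params * bytes_per_param / 1e9

N = 1.6e9  # Stable LM 2 1.6B parameter count

print(f"fp32: {weights_gb(N, 4):.1f} GB")  # 4 bytes/param -> 6.4 GB
print(f"fp16: {weights_gb(N, 2):.1f} GB")  # 2 bytes/param -> 3.2 GB
print(f"int8: {weights_gb(N, 1):.1f} GB")  # 1 byte/param  -> 1.6 GB
```

At half precision the weights fit comfortably on a consumer GPU, which is the practical advantage of the sub-2B parameter class.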

Stability AI has released both the base model and an instruction-tuned version. The company also provides the final pre-training checkpoint from before the learning-rate cooldown, together with the optimizer states. This transparency is a significant advantage for developers who wish to fine-tune and experiment with the model. The company plans to publish a technical report with more details on the data and training procedures.
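For developers who want to try the base model, a minimal loading-and-generation sketch with Hugging Face `transformers` might look like the following. The model id `stabilityai/stablelm-2-1_6b` and the 4096-token context length are assumptions not stated in this article; check the model card before use. The heavy model download is kept behind the `__main__` guard, with a small dependency-free helper for trimming inputs to the assumed context window.

```python
# Sketch: running Stable LM 2 1.6B via Hugging Face transformers.
# Model id and context length are assumptions; verify against the model card.

def truncate_to_context(token_ids: list, max_len: int = 4096) -> list:
    """Keep only the most recent tokens that fit the (assumed) context window."""
    return token_ids[-max_len:]

if __name__ == "__main__":
    # Deferred imports so the helper above stays dependency-free.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "stabilityai/stablelm-2-1_6b"  # assumed Hugging Face id
    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,   # half precision keeps weights ~3.2 GB
        trust_remote_code=True,
    )

    prompt = "Small language models are useful because"
    inputs = tokenizer(prompt, return_tensors="pt")
    inputs["input_ids"] = torch.tensor(
        [truncate_to_context(inputs["input_ids"][0].tolist())]
    )
    output = model.generate(**inputs, max_new_tokens=64, do_sample=True, top_p=0.95)
    print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Because Stability AI also ships the pre-cooldown checkpoint and optimizer states, the same `from_pretrained` pattern can be pointed at that checkpoint to resume or continue pre-training rather than starting fine-tuning from the final weights.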