AI2 releases open-source model OLMoE, reducing inference costs

2024-09-11

The Allen Institute for AI (AI2), in collaboration with Contextual AI, has released a new open-source model called OLMoE. The model uses a sparse mixture-of-experts (MoE) architecture with 7 billion total parameters, but activates only about 1 billion of them for each input token. OLMoE comes in two versions: the general-purpose OLMoE-1B-7B and the instruction-tuned OLMoE-1B-7B-Instruct.
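
As a quick way to try the release, the sketch below loads the instruction-tuned variant with Hugging Face transformers. This is not taken from AI2's documentation: the repository id is assumed from AI2's usual naming scheme, and a transformers version with OLMoE support is required.

```python
# Minimal sketch (not from AI2's docs): load the instruction-tuned OLMoE variant
# with Hugging Face transformers and generate a short completion. The repo id
# below is assumed, and a recent transformers release with OLMoE support is needed.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "allenai/OLMoE-1B-7B-0924-Instruct"  # assumed Hugging Face repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")

# All 7B parameters are stored, but only ~1B are activated for each token.
prompt = "Explain mixture-of-experts language models in one sentence."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```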

AI2 emphasizes that OLMoE is fully open-source, which sets it apart from most other MoE models, which release only model weights while withholding training data, code, and training recipes. AI2 argues that this lack of open resources hinders the development of cost-effective open MoE models and keeps them out of reach for many academics and other researchers.

AI2 research scientist Nathan Lambert said on X (formerly Twitter) that OLMoE can help inform policy-making and serve as a starting point for academic H100 clusters as they come online.

Building on AI2's earlier open-source model OLMo 1.7-7B, OLMoE supports a context window of 4,096 tokens. Its training dataset combines data from DCLM and Dolma 1.7, including a filtered subset of Common Crawl, Dolma CC, RefinedWeb, StarCoder, C4, Stack Exchange, OpenWebMath, Project Gutenberg, Wikipedia, and more.
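
To make the idea of a multi-source training mix concrete, here is a small illustrative sketch of weighted sampling across the named sources. The weights are invented for the example; AI2's actual sampling proportions are not given in this article.

```python
# Illustrative sketch of weighted sampling over a training data mix like the one
# described above (DCLM + Dolma sources). All weights are hypothetical placeholders.
import random

DATA_MIX = {
    "dclm_filtered_common_crawl": 0.50,  # hypothetical weight
    "dolma_cc": 0.20,                    # hypothetical weight
    "refinedweb": 0.10,
    "starcoder": 0.07,
    "c4": 0.05,
    "stack_exchange": 0.03,
    "openwebmath": 0.02,
    "project_gutenberg": 0.02,
    "wikipedia": 0.01,
}

def sample_source(mix):
    """Pick a data source in proportion to its mixture weight."""
    names, weights = zip(*mix.items())
    return random.choices(names, weights=weights, k=1)[0]

print(sample_source(DATA_MIX))
```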

Experiments show that OLMoE performs comparably to other models while significantly reducing inference cost and memory footprint. It outperforms existing models with a similar number of active parameters, and even larger models such as Llama2-13B-Chat and DeepSeekMoE-16B. In benchmark tests, OLMoE-1B-7B performs close to models with 7 billion or more parameters, such as Mistral-7B, Llama 3.1-8B, and Gemma 2.
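
A rough back-of-the-envelope calculation shows where the inference savings come from, assuming the common approximation of about 2 FLOPs per active parameter per generated token. These figures are estimates for illustration, not measurements reported by AI2.

```python
# Sketch: approximate per-token decoding cost as ~2 FLOPs per *active* parameter.
# A model that activates ~1B of its 7B parameters does far less work per token
# than a dense 7B or 13B model. Rough approximation, not a benchmark.

def flops_per_token(active_params: float) -> float:
    """Approximate forward-pass FLOPs per generated token (~2 FLOPs per active weight)."""
    return 2 * active_params

configs = [
    ("OLMoE (1B active of 7B total)", 1e9),
    ("dense 7B model", 7e9),
    ("dense 13B model", 13e9),
]

for name, active in configs:
    print(f"{name}: ~{flops_per_token(active):.1e} FLOPs/token")
```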

Many AI developers are already building models on MoE architectures, such as Mistral's Mixtral 8x22B and xAI's Grok, but AI2 and Contextual AI note that these models are not fully open-source and do not disclose their training data or source code. That opacity leaves open design questions about how best to build MoE models, such as how to balance total parameters against active parameters and whether to use many small experts or a few large ones.
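
The sketch below illustrates that trade-off with invented layer sizes: for one MoE feed-forward layer with top-k routing, total parameters scale with the number of experts, while active parameters scale only with how many experts each token is routed to. None of these dimensions correspond to OLMoE's actual configuration.

```python
# Sketch of the "few large experts vs. many small experts" trade-off.
# All layer dimensions are invented for illustration.

def moe_ffn_params(d_model: int, d_expert: int, n_experts: int, top_k: int):
    """Return (total, active) parameter counts for one MoE feed-forward layer."""
    per_expert = 2 * d_model * d_expert   # up- and down-projection weights
    total = n_experts * per_expert        # all experts must be stored
    active = top_k * per_expert           # only top-k experts run per token
    return total, active

# Two hypothetical configurations with the same active compute budget:
for n_experts, top_k, d_expert in [(8, 2, 4096), (64, 8, 1024)]:
    total, active = moe_ffn_params(d_model=2048, d_expert=d_expert,
                                   n_experts=n_experts, top_k=top_k)
    print(f"{n_experts} experts, top-{top_k}: "
          f"total={total/1e6:.0f}M params, active={active/1e6:.0f}M params")
```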

Meanwhile, open-source organizations have begun working out how to define and promote open-source AI models.