Eagle 7B: RNN Surpasses Transformer in Performance for the First Time

2024-01-30

The open-source community has recently released a new RNN model called Eagle 7B, built on the RWKV-v5 architecture. The model was trained on 1.1 trillion tokens and supports more than 100 languages. RWKV, short for "Receptance Weighted Key Value," is a modern variant of the recurrent neural network (RNN) architecture applied to natural language processing.

Eagle 7B positions itself as a leading 7B-class model in terms of inference cost, environmental efficiency, and language coverage. With 7.52 billion parameters, it performs exceptionally well on multilingual benchmarks, setting a new standard among models of its size, and it remains competitive with larger models on English evaluations. Architecturally it is an "attention-free transformer," though it may still require additional fine-tuning for specific use cases.

On the multilingual side, Eagle 7B reports strong results on benchmarks spanning 23 languages. Its English performance also improves markedly over its predecessor, RWKV-v4, approaching that of top-tier models. The model is released under the Apache 2.0 license and can be downloaded from the HuggingFace platform for both personal and commercial use.

Eagle 7B aims to make AI technology more inclusive, supporting a wider range of languages through a more scalable architecture and more efficient use of training data. It challenges the dominance of transformer models by showing that an RNN such as RWKV can compete with them when trained on a comparable amount of data. In the RWKV design, a learned time-decay weighting controls how strongly earlier tokens influence the current step, while the weighted key-value state carries a compressed summary of the sequence forward, so each new token can be processed at constant cost instead of attending over the entire history.

Questions remain about how well RWKV scales relative to transformers, but the team is optimistic about its potential. Future plans include further training, an in-depth paper on Eagle 7B, and a version trained on roughly 2 trillion tokens. With the open-source community's continued development and innovation, we can look forward to more outstanding models and techniques driving the field of artificial intelligence forward.
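To make the recurrence described above concrete, here is a minimal, numerically naive Python sketch of the classic RWKV-v4-style scalar weighted key-value update for a single channel. It is illustrative only: production implementations track a running maximum exponent for numerical stability, and RWKV-v5 (Eagle) generalizes this state to a matrix-valued, multi-headed form. The function name and toy inputs are invented for the example.

```python
import numpy as np

def wkv_recurrence(k, v, w, u):
    """Simplified RWKV-style weighted key-value recurrence (one channel).

    k, v : per-token keys and values, shape (T,)
    w    : learned decay (> 0); older tokens fade by a factor of e^{-w}
    u    : learned "bonus" weight applied to the current token

    The recurrent state is just two scalars per channel, so inference
    cost per token is constant regardless of context length.
    """
    T = len(k)
    out = np.empty(T)
    num, den = 0.0, 0.0  # running weighted sums: the entire recurrent state
    for t in range(T):
        # blend the stored summary with the current token (extra weight e^u)
        out[t] = (num + np.exp(u + k[t]) * v[t]) / (den + np.exp(u + k[t]))
        # decay the state, then fold the current token into it
        num = np.exp(-w) * num + np.exp(k[t]) * v[t]
        den = np.exp(-w) * den + np.exp(k[t])
    return out

# Toy example: five tokens in one channel.
out = wkv_recurrence(np.array([0.1, -0.2, 0.3, 0.0, 0.5]),
                     np.array([1.0, 2.0, 3.0, 4.0, 5.0]),
                     w=0.5, u=0.2)
print(out)
```

The key contrast with attention is visible in the loop: the cost per token does not grow with sequence length, which is the source of the inference-cost advantage the article describes.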
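Since the weights are published on HuggingFace under Apache 2.0, loading the model can look roughly like the sketch below. The repository id RWKV/v5-Eagle-7B-HF and the trust_remote_code flag are assumptions based on how RWKV community checkpoints are commonly distributed; check the official model card for the exact id and loading instructions.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repo id; verify against the official HuggingFace model card.
model_id = "RWKV/v5-Eagle-7B-HF"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

inputs = tokenizer("The Eiffel Tower is located in", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```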