InternLM releases its latest large language model, InternLM2.5-7B-Chat

2024-07-08

InternLM has officially launched its latest open large language model, InternLM2.5-7B-Chat. The model is also released in the GGUF format, which is fully compatible with llama.cpp, the popular open-source framework for LLM inference. This compatibility allows InternLM2.5-7B-Chat to run on a wide range of hardware, from local machines to cloud servers, meeting diverse application needs.
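As a quick illustration of that llama.cpp compatibility, the minimal sketch below loads a local GGUF file through llama-cpp-python (the Python bindings for llama.cpp). The file name and generation settings are assumptions; point model_path at whichever GGUF file you actually downloaded.

```python
# Minimal sketch: running a local GGUF build of InternLM2.5-7B-Chat with
# llama-cpp-python. The model path below is an assumption -- substitute
# the GGUF file you downloaded.
from llama_cpp import Llama

llm = Llama(
    model_path="./internlm2_5-7b-chat-q8_0.gguf",  # hypothetical local file name
    n_ctx=4096,        # context window for this session
    n_gpu_layers=-1,   # offload all layers to the GPU if one is available
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Briefly introduce yourself."}],
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])
```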

The GGUF release offers a rich selection of versions, including a half-precision build and several low-bit quantized builds (q5_0, q5_k_m, q6_k, and q8_0), greatly enhancing the model's flexibility and efficiency across different scenarios.
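Fetching a specific quantized build can be done with the huggingface_hub library, as in the sketch below. The repo id and file name follow the official naming pattern but are assumptions here; check them against the model page before use.

```python
# Minimal sketch: downloading one of the quantized GGUF files with
# huggingface_hub. Repo id and file name are assumed and should be
# verified on the model page.
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="internlm/internlm2_5-7b-chat-gguf",  # assumed repo id
    filename="internlm2_5-7b-chat-q5_k_m.gguf",   # assumed file name
)
print(f"Downloaded to: {path}")
```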

As the latest entry in the InternLM series, InternLM2.5-7B-Chat inherits the strengths of its predecessors while delivering comprehensive upgrades. Beyond its solid 7-billion-parameter foundation, it ships in a version carefully tuned for practical chat scenarios. Its reasoning ability, especially in mathematical reasoning, surpasses well-known models such as Llama3 and Gemma2-9B, demonstrating strong competitiveness.

Notably, InternLM2.5-7B-Chat features an impressive 1M-token context window and achieves near-perfect performance on long-context benchmarks such as LongBench. This gives the model a distinct advantage in handling complex conversations and accurately retrieving information from massive documents.

To take full advantage of this capability, InternLM offers a dedicated 1M variant designed for 1-million-token long-context inference. Running this high-performance version, however, requires substantial computing resources, such as a server equipped with 4x A100-80G GPUs, as sketched below.
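A minimal sketch of serving the 1M variant with LMDeploy on such a server is shown here, sharding the model across four GPUs via tensor parallelism. The model id and session length are assumptions based on the description above.

```python
# Minimal sketch: long-context inference with LMDeploy on 4x A100-80G.
# Model id and session_len are assumptions; adjust to your setup.
from lmdeploy import pipeline, TurbomindEngineConfig

backend_config = TurbomindEngineConfig(
    tp=4,                # tensor parallelism across 4 GPUs
    session_len=1048576, # ~1M-token context window
)
pipe = pipeline("internlm/internlm2_5-7b-chat-1m", backend_config=backend_config)

response = pipe("Summarize the key points of this document: ...")
print(response.text)
```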

In evaluations with the OpenCompass tool, InternLM2.5-7B-Chat performs strongly across multiple dimensions, including disciplinary competence, language, knowledge, reasoning, and comprehension. On a series of authoritative benchmarks, including MMLU, CMMLU, BBH, MATH, GSM8K, and GPQA, the model posts leading scores. On MMLU, for example, it reaches 72.8, well ahead of competitors such as Llama-3-8B-Instruct and Gemma2-9B-IT.

In addition, InternLM2.5-7B-Chat has strong tool-use capabilities, supporting information gathering from more than 100 web pages and offering users an exceptionally convenient experience. The upcoming release of Lagent will further enhance this functionality, making the model more intelligent and efficient at instruction following, tool selection, and reflection.

To help users get started quickly, InternLM provides detailed installation guides, model download instructions, and sample code for model inference and service deployment. Users can also run batch offline inference on the quantized model with the LMDeploy framework, benefiting from INT4 weight-only quantization and deployment (W4A16), as sketched below. Compared with FP16, this setup can speed up inference by up to 2.4x on compatible NVIDIA GPUs, delivering top-tier performance.
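The sketch below shows batch offline inference on a W4A16 model with LMDeploy. The 4-bit model id is an assumption; substitute the id of the quantized weights you actually downloaded.

```python
# Minimal sketch: batch offline inference on INT4 (W4A16) weights with
# LMDeploy. The 4-bit model id is an assumption.
from lmdeploy import pipeline, TurbomindEngineConfig

backend_config = TurbomindEngineConfig(model_format="awq")  # INT4 AWQ weights
pipe = pipeline("internlm/internlm2_5-7b-chat-4bit", backend_config=backend_config)

prompts = [
    "Hi, please introduce yourself.",
    "What is the capital of France?",
]
responses = pipe(prompts)  # a list of prompts is processed as one batch
for r in responses:
    print(r.text)
```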