Alibaba's large-scale model once again leads the open-source trend with the release of the new Qwen1.5 version, offering six different sizes for developers to choose from. Among them, the largest Qwen1.5-72B-Chat model has performed remarkably well on MT-Bench and Alpaca-Eval v2, surpassing not only Claude-2.1 and GPT-3.5-Turbo-0613 but also outscoring previous versions of GPT-4 in some tasks.
This news has sparked heated discussions among developers worldwide, with many congratulating Alibaba and showing great interest in the 0.5B mini version. At the same time, there is also anticipation for the open-source release of the multimodal large-scale model Qwen-VL-Max.
It is worth mentioning that Qwen1.5 not only provides a wide range of model choices but also achieves deep integration with mainstream frameworks. Currently, the code for Qwen1.5 has been merged into Hugging Face transformers, allowing it to be used without the need for trust_remote_code. In addition, Qwen1.5 has collaborated with popular third-party frameworks such as vLLM, SGLang, AutoAWQ, AutoGPTQ, Axolotl, LLaMA-Factory, and llama.cpp, providing a one-stop service from fine-tuning, deployment, quantization to local inference.
The powerful performance of the Qwen1.5 series is another highlight. In benchmark tests such as MMLU (5-shot), C-Eval, Humaneval, GS8K, and BBH, the score of Qwen1.5-72B has surpassed that of GPT-4. In terms of long-context support, the entire series of models has demonstrated outstanding capabilities, especially the Chat models. For example, the performance of Qwen1.5-7B-Chat in small models is comparable to that of GPT-3.5. Qwen1.5-72B-Chat's performance is significantly better than GPT3.5-turbo-16k and slightly behind GPT4-32k.
In terms of code execution, Qwen1.5 also performs well. Although the 72B chat model still lags behind GPT-4 in mathematics and visualization, its code execution efficiency surpasses that of GPT-4. The Alibaba team stated that they will continue to optimize this feature in future versions.
In addition to performance improvements, Qwen1.5 has also been upgraded in terms of functionality and uniformity. The entire series of models now supports a maximum length of at least 32k, with enhanced multilingual capabilities and a wider range of multilingual evaluations. Furthermore, the entire series of models now uniformly supports system prompts and strong external system capabilities (agent/RAG/Tool-use/Code-interpreter), providing developers with a more convenient and efficient development experience.