Qwen has officially open-sourced Qwen2.5-Omni-7B, the latest addition to the Qwen series and its first end-to-end multimodal flagship model. The model processes multiple input types, including text, images, audio, and video, simultaneously, while generating real-time text responses and natural synthesized speech.
In comprehensive multimodal-fusion evaluations such as OmniBench, Qwen2.5-Omni sets a new state of the art, surpassing competitors such as Google's Gemini-1.5-Pro. It exhibits human-like multi-sensory perception of the world, identifying emotions from combined audio-visual cues and delivering smarter, more natural feedback and decisions in complex scenarios.
Qwen2.5-Omni is built on the Thinker-Talker dual-core architecture developed by the Qwen team, in which a Thinker module generates text while a Talker module produces speech, together with TMRoPE (Time-aligned Multimodal RoPE), a position-embedding method that aligns audio and video inputs along a shared timeline. These designs enable Qwen2.5-Omni to accept varied input formats and generate text and voice responses in real time.
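To make the position-embedding idea concrete, here is a minimal NumPy sketch of standard rotary position embedding (RoPE), the mechanism TMRoPE builds on. The article only summarizes TMRoPE, so the time-aligned position ids shown at the end are an illustrative assumption of the general idea (audio and video tokens from the same moment share a temporal position), not the model's actual implementation.

```python
import numpy as np

def rope(x, positions, base=10000.0):
    """Apply standard rotary position embedding to x of shape (seq, dim).

    TMRoPE (per the Qwen team's description) extends this scheme by giving
    tokens multimodal, time-aligned position indices; this function shows
    only the plain RoPE rotation it is built on.
    """
    seq, dim = x.shape
    half = dim // 2
    freqs = base ** (-np.arange(half) / half)   # per-pair rotation frequencies
    angles = np.outer(positions, freqs)         # (seq, half) rotation angles
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    # Rotate each (x1_i, x2_i) pair by its position-dependent angle.
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=1)

# Illustrative time-aligned ids (assumption): audio and video frames captured
# at the same timestamp receive the same temporal position, so cross-modal
# attention treats them as simultaneous.
audio_pos = np.array([0, 1, 2, 3])   # four audio frames, one per time step
video_pos = np.array([0, 2])         # two video frames, at t=0 and t=2
tokens = np.random.randn(4, 8)
out = rope(tokens, audio_pos)
```

Because RoPE is a pure rotation, it preserves each token's norm and leaves position 0 unchanged, which is what makes relative offsets between positions the only thing attention scores depend on.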
Notably, in authoritative single-modality benchmarks, Qwen2.5-Omni also outperforms similarly sized specialized models: its scores in speech understanding, image comprehension, video analysis, and speech generation surpass those of dedicated audio and vision-language models, with speech generation reaching human-level quality.
Unlike closed-source models with hundreds of billions of parameters, Qwen2.5-Omni's compact 7-billion-parameter architecture makes broad industrial deployment of multimodal large models feasible. The model is available on ModelScope and Hugging Face, and can be tried directly via Qwen Chat.
Furthermore, since 2023, the Qwen team has developed over 200 large-scale models covering various parameter ranges, including text generation, visual understanding/generation, speech processing, image creation, and video models across all modalities. To date, the number of derivative models from Qwen in global AI open-source communities has exceeded 100,000.