ZhiXiang Future Releases New Versions of Its Multimodal Generation and Understanding Models

2025-01-03

Recently, ZhiXiang Future announced the launch of ZhiXiang Multimodal Generation Model 3.0 and ZhiXiang Multimodal Understanding Model 1.0.

It is reported that ZhiXiang's multimodal large models are backed by the largest copyrighted multimodal corpus in China, comprising tens of thousands of hours of copyrighted video material and thousands of licensed IPs, and covering over 70% of Chinese-language film and television data. These resources have been used to generate hundreds of millions of AIGC derivative works, which have been widely applied in fields such as film and television, tourism, communications, marketing, and education.

On the technology side, ZhiXiang Multimodal Generation Model 3.0 significantly improves image and video generation. The new version raises picture quality and relevance, makes camera and scene movement more controllable, and optimizes multi-scenario driving. It also combines autoregressive and diffusion models into a globally pioneering diffusion-autoregressive architecture, which reduces model size and computational cost while improving both performance and efficiency. In addition, the new version introduces a mixture-of-experts (MoE) architecture that maintains high generation quality while significantly accelerating inference, providing technical support for real-time or near-real-time applications.

At the same time, ZhiXiang Multimodal Understanding Model 1.0 made its official debut. This version achieves fine-grained, accurate understanding of image and video content through object-level and event-level spatiotemporal modeling. During the on-site demonstration at the pilot zone launch ceremony, ZhiXiang Multimodal Understanding Model 1.0 produced detailed descriptions of video scenes, capturing the complex relationships, logical sequences, and spatial arrangements among objects in the frame, as well as camera movements.

Furthermore, ZhiXiang Future showcased an innovative "one-stop video platform." The platform lets users upload personal photos to create new interactive experiences, and the demonstration featured personalized interactive presentations of Anhui cultural relics. This approach not only makes the content more appealing but also offers a fresh perspective for promoting Anhui's cultural tourism.

The release of the new versions of ZhiXiang's multimodal large models marks a significant step forward in the company's technological innovation and application expansion in artificial intelligence. It also injects new vitality into the creative industries and the visual arts.