"Shengshu Technology and Tsinghua University officially release Vidu, China's first long-duration large video model"

2024-04-28

On April 27, at the Future AI Pioneer Forum of the 2024 Zhongguancun Forum Annual Meeting, Tsinghua University and Shengshu Technology jointly released Vidu, China's first large video model to combine long duration, high consistency, and high dynamics. The release marks a significant breakthrough in China's video generation technology and is expected to lead the direction of large video model technology worldwide.

It is reported that Vidu adopts U-ViT, an architecture that fuses Diffusion and Transformer models and was originally created by the team from Tsinghua University and Shengshu Technology. The architecture supports one-click generation of high-definition video up to 16 seconds long at resolutions up to 1080p. This breakthrough greatly improves the efficiency of video generation while keeping the video content high in quality and consistency.

At the forum, Zhu Jun, professor at Tsinghua University and chief scientist of Shengshu Technology, gave a detailed introduction to Vidu's technical features and application prospects. He stated that Vidu can simulate the real physical world and is also richly imaginative, capable of generating multi-camera video that stays consistent in time and space. Vidu can also work distinctively Chinese elements, such as pandas and dragons, into its videos, showcasing the charm of Chinese culture.

Notably, Vidu takes a "one-step" approach to video generation: it produces high-quality video directly from a text description, with no intermediate frame interpolation or other multi-step processing. This end-to-end method simplifies the video production process and improves both the efficiency and the quality of generation.

Professor Zhu Jun also revealed that Vidu's rapid breakthrough stems from the team's long-term accumulation of expertise and its multiple original achievements in Bayesian machine learning and multimodal large models. The U-ViT architecture, the core technology independently developed by the team, provides strong support for Vidu.

As artificial intelligence technology continues to develop, large video models will play an increasingly important role in many fields. This achievement by Tsinghua University and Shengshu Technology injects new momentum into the development of large video model technology in China and around the world. In the future, we look forward to seeing more innovative applications built on Vidu, bringing more convenience and joy to people's lives.

Professor Zhu Jun explained that the name Vidu carries a double meaning. It sounds like "Video" and also evokes "We do," reflecting the team's firm belief and unremitting effort in the field of large video models. Going forward, the team hopes to strengthen cooperation with upstream and downstream companies in the industry chain and with research institutions, in order to jointly advance large video model technology and contribute more to the development of human society.