"Zhipu AI Develops High-Quality Text-to-Video Model to Rival OpenAI's Sora, Expected to Release Within the Year"

2024-05-06

In the field of large AI models, the Chinese unicorn Zhipu AI is preparing a major technical push. According to reports, the company is developing a text-to-video model intended to rival OpenAI's Sora, which could be unveiled as early as this year. "Text-to-video technology is entering a period of rapid development, and this year is expected to be a key year for the explosion of large models," said an insider at Zhipu AI. The company sees broad demand for the technology in the domestic market, spanning film production, short videos, and game development, and aims to meet it with high-quality products built on higher-quality data and more powerful models.
Established in 2019, Zhipu AI grew out of research at the Department of Computer Science at Tsinghua University and has a strong academic background. CEO Zhang Peng, President Wang Shaolan, and Chairman Liu Debing are all graduates of the department and were core members of its Knowledge Engineering Group (KEG) laboratory.
As one of the earliest companies in China to enter the field of large models, Zhipu AI has launched several closely watched products. In March 2023 it released the ChatGLM series of billion-scale open-source dialogue models, and after four major upgrades over the following year it launched the GLM-4 series in January 2024. On top of these models, Zhipu AI has built an AIGC product matrix that includes the AI efficiency assistant Zhipu Qingyan, the code model CodeGeeX, the multimodal understanding model CogVLM, and the text-to-image model CogView. These products have shown strong performance and practical value across different fields.
In March of this year, Zhang Peng stated that Zhipu AI has more than 2,000 ecosystem partners and over 1,000 large-scale applications of its large models. The company has also collaborated closely with more than 200 companies across media, consulting, consumer goods, finance, new energy, the internet, and intelligent office. In addition, Zhipu AI's ChatGLM-6B model has accumulated more than 13 million downloads worldwide and over 50,000 GitHub stars, more than the two versions of Meta's Llama combined. This has made Zhipu AI one of the most popular open-source organizations globally, surpassing giants such as OpenAI, Google, and Microsoft.
Regarding the upcoming text-to-video model, insiders at Zhipu AI said that its performance is already close to that of top overseas models. They added that the latest large model, GLM-4, improves significantly on the previous generation in overall performance and is approaching the level of GPT-4. In certain Chinese alignment tasks, GLM-4 can even slightly surpass GPT-4. According to the latest Berkeley Arena Hard benchmark, GLM ranks behind only GPT-4 Turbo and Claude 3 Opus, placing it among the leading domestic and international models.
In developing the text-to-video model, Zhipu AI faces challenges that include choosing a technical route and obtaining a high-quality corpus of video material. On the technical side, it needs to optimize the architecture that combines Transformer and diffusion components, eliminate jitter between consecutive frames, achieve high-resolution, highly consistent generation of long sequences, and obtain more detailed real-world scene data. It also needs to address data copyright and usage issues to ensure that the model is legally compliant and sustainable.
It is worth noting that Zhipu AI previously invested in Shengshu Technology, which together with Tsinghua University released Vidu, China's first long-duration, high-consistency, high-dynamic video large model. Vidu has been hailed as "China's first Sora-level video model" and "China's first self-developed video large model." If Zhipu AI releases a similar text-to-video model, the two will compete to some degree. That competition, however, should also push both parties to keep innovating and achieving technical breakthroughs, jointly advancing China's AI industry.
So far, Zhipu AI has invested in or acquired more than 13 startups across the AI industry chain. Zhang Peng has stated that Zhipu AI is undergoing a qualitative change, especially with respect to the so-called emergence of large-model capabilities. The company's goal is AGI (Artificial General Intelligence): super-cognitive intelligence beyond the human level, with self-interpretation, self-evaluation, and self-supervision, while ensuring that the models remain safe and controllable.