After the release of OpenAI's text-to-video model Sora, competition in video generation technology in China has entered a new phase. Recently, Kuaishou, China's leading short video platform, announced that its self-developed video generation model "Kelinge" has officially launched, along with an official website.
As a leader in China's short video industry, Kuaishou drew on years of accumulated video technology to build this broadly applicable video generation model. According to 36kr, unlike other video generation models on the market that only showcase demo videos, Kuaishou's "Kelinge" model is claimed to match Sora in performance, and it has opened invitation-based testing in Kuaiying, Kuaishou's video-editing app, so users can try its video generation capabilities firsthand.
The "Kelinge" model was developed independently by Kuaishou's AI team, adopting a technical approach similar to Sora's and combining several self-developed innovations. These give the model strong performance in video generation. Specifically, it has the following notable features:
First, the "Kelinge" model can generate large yet physically plausible motions. Its 3D spatiotemporal attention mechanism models the complex spatial and temporal dynamics in videos, so generated videos remain smooth while obeying real-world laws of motion.
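The core idea behind 3D spatiotemporal attention is that every token attends jointly across space and time, rather than frame by frame. A minimal NumPy sketch of that joint attention is below; it is purely illustrative (identity Q/K/V projections, no heads), not Kuaishou's actual implementation.

```python
import numpy as np

def spatiotemporal_attention(x):
    """Joint 3D self-attention: every token attends across both
    space and time, instead of attending within a single frame.
    x: video features of shape (T, H, W, C)."""
    T, H, W, C = x.shape
    tokens = x.reshape(T * H * W, C)   # flatten the 3D grid into one sequence
    q = k = v = tokens                 # identity projections, for simplicity
    scores = q @ k.T / np.sqrt(C)      # (THW, THW) pairwise attention scores
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over all tokens
    out = weights @ v
    return out.reshape(T, H, W, C)

# Tiny example: 4 frames of an 8x8 feature grid with 16 channels
video = np.random.randn(4, 8, 8, 16)
out = spatiotemporal_attention(video)
print(out.shape)  # (4, 8, 8, 16)
```

Because the sequence length is T·H·W, the cost of this joint attention grows quickly with resolution and duration, which is why efficiency work matters so much for long, high-resolution video generation.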
Second, the "Kelinge" model can simulate characteristics of the real physical world. Its self-developed architecture and strong modeling capability give users an imaginative space that closely approximates reality: whether it is light and shadow, fluid motion under gravity, or interactions between objects, the model can generate videos that follow physical laws.
In addition, the "Kelinge" model shows strong concept-combination ability and imagination. Through a deep understanding of text-video semantics and the concept-combination capability learned by its Diffusion Transformer architecture, it can turn users' imagination into concrete visuals, letting them realize creative ideas with little effort.
Notably, videos generated by the "Kelinge" model reach resolutions up to 1080p and durations up to 2 minutes at a 30 fps frame rate, and the output aspect ratio is flexible. This is enabled by its self-developed 3D VAE, which encodes videos into a compact latent space and decodes them back into video with rich detail. At the same time, efficient training infrastructure, extensive inference optimization, and a scalable underlying architecture ensure that the model can generate high-quality video content.
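To see why a compact latent space matters at these settings, the sketch below computes the latent shape for a 2-minute 1080p clip at 30 fps. The compression factors (4x temporal, 8x spatial, 16 latent channels) are hypothetical values typical of video VAEs; the actual ratios used by the Kelinge 3D VAE have not been published.

```python
import math

# Hypothetical compression factors -- typical for video VAEs, not the
# published specification of the Kelinge 3D VAE.
T_DOWN, S_DOWN, LATENT_CH = 4, 8, 16

def latent_shape(frames, height, width):
    """Shape of the compact latent a 3D VAE encoder might produce."""
    return (frames // T_DOWN, height // S_DOWN, width // S_DOWN, LATENT_CH)

# A 2-minute clip at 30 fps and 1080p, as described above
frames = 2 * 60 * 30                      # 3600 frames
shape = latent_shape(frames, 1080, 1920)
pixels = frames * 1080 * 1920 * 3         # raw RGB values in the clip
latents = math.prod(shape)                # values in the latent tensor
print(shape)                              # (900, 135, 240, 16)
print(f"compression ~ {pixels / latents:.0f}x")  # ~48x fewer values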
During development, the team behind the "Kelinge" model built an efficient, large-scale automated data pipeline covering massive video mining, multidimensional labeling and filtering, video description enhancement, and data-driven quality evaluation. These measures ensure the model makes full use of its data resources during training and improves generation quality.
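The labeling-and-filtering stage of such a pipeline can be sketched as a simple rule: keep only clips whose quality score and caption detail clear a threshold. The field names and thresholds below are entirely hypothetical; the article does not describe Kuaishou's actual data schema.

```python
# Minimal sketch of a labeling/filtering stage in an automated video
# data pipeline. Field names and thresholds are hypothetical.
MIN_QUALITY = 0.7
MIN_CAPTION_WORDS = 5

def filter_clips(clips):
    """Keep clips with an acceptable quality score and a caption
    detailed enough to train a text-to-video model on."""
    kept = []
    for clip in clips:
        if clip["quality"] < MIN_QUALITY:
            continue  # drop low-quality footage
        if len(clip["caption"].split()) < MIN_CAPTION_WORDS:
            continue  # drop clips with uninformative descriptions
        kept.append(clip)
    return kept

clips = [
    {"id": 1, "quality": 0.9, "caption": "a dog runs through tall grass at sunset"},
    {"id": 2, "quality": 0.4, "caption": "blurry footage of a city street at night"},
    {"id": 3, "quality": 0.8, "caption": "city"},
]
print([c["id"] for c in filter_clips(clips)])  # [1]
```

In practice each stage (mining, labeling, caption enhancement, quality evaluation) would itself be model-driven rather than rule-based, but the pipeline structure is the same: successive filters that trade raw volume for training quality.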
The "Kelinge" model is currently available for invitation-based testing in the Kuaiying app, where creators can apply to try the latest text-to-video functionality. Going forward, as part of Kuaishou's broader push into AI models, more applications built on "Kelinge" will roll out. For example, the "AI Dance King" feature, driven by body movement, has already launched in the Kuaishou and Kuaiying apps, letting users upload a single photo and generate a dance with one click. A new "AI Singing and Dancing" experience is also planned, offering users more diverse AI creation and interactive experiences.