Stepwise Star has unveiled its latest open-source image-to-video model, Step-Video-TI2V. The model is trained on top of the company's 30B-parameter Step-Video-T2V video generation model and can produce videos of 102 frames, roughly 5 seconds long, at 540p resolution.
Step-Video-TI2V offers two key capabilities: controllable motion intensity and camera movement control. To make motion intensity adjustable, the model injects a video dynamics score through its AdaLN module during training, so users can specify different motion levels at generation time. This gives precise control over how dynamic the output is, balancing motion, stability, and consistency. The model also understands and controls various types of camera movement, producing cinematic-grade camera effects.
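To picture how AdaLN-based motion conditioning can work, here is a minimal PyTorch sketch (module and tensor names are hypothetical; the actual Step-Video-TI2V implementation may differ): a scalar motion score is embedded and mapped to a per-channel scale and shift that modulate the normalized hidden states inside a transformer block.

```python
import torch
import torch.nn as nn

class MotionAdaLN(nn.Module):
    """Adaptive LayerNorm conditioned on a scalar motion-intensity score.

    Hypothetical sketch: the score is embedded, then mapped to a per-channel
    scale and shift applied after layer normalization, letting a higher score
    push the block toward more dynamic content.
    """

    def __init__(self, hidden_dim: int, cond_dim: int = 256):
        super().__init__()
        self.norm = nn.LayerNorm(hidden_dim, elementwise_affine=False)
        # Embed the scalar motion score into a conditioning vector.
        self.score_embed = nn.Sequential(
            nn.Linear(1, cond_dim),
            nn.SiLU(),
            nn.Linear(cond_dim, cond_dim),
        )
        # Predict scale (gamma) and shift (beta) from the conditioning vector.
        self.to_scale_shift = nn.Linear(cond_dim, 2 * hidden_dim)

    def forward(self, x: torch.Tensor, motion_score: torch.Tensor) -> torch.Tensor:
        # x: (batch, tokens, hidden_dim); motion_score: (batch,), e.g. values in [1, 5]
        cond = self.score_embed(motion_score.unsqueeze(-1).float())
        scale, shift = self.to_scale_shift(cond).chunk(2, dim=-1)
        # Broadcast the per-sample modulation over all tokens.
        return self.norm(x) * (1 + scale.unsqueeze(1)) + shift.unsqueeze(1)


# Usage: modulate a block's hidden states with a user-chosen motion level.
hidden = torch.randn(2, 128, 1024)      # (batch, tokens, channels)
scores = torch.tensor([1.0, 4.0])       # low vs. high motion intensity
modulated = MotionAdaLN(hidden_dim=1024)(hidden, scores)
print(modulated.shape)                  # torch.Size([2, 128, 1024])
```

The design choice this illustrates is that the conditioning signal is applied through normalization parameters rather than extra input tokens, so the same user-facing score can steer every block of the network at generation time.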
On the data side, Step-Video-TI2V uses dedicated, accurate annotations for subject actions and camera motion, which strengthens its handling of subject dynamics and camera effects. The model is particularly strong on anime-style content, making it well suited to animation creation and short-video production.
Beyond these features, Step-Video-TI2V can generate videos from input images of various sizes, catering to diverse creative needs and platform-specific requirements. It also shows some ability to generate special effects, and techniques such as LoRA are expected to unlock further potential in the future.
Currently, the Step-Video-TI2V model has been adapted for Huawei's Ascend computing platform and is available in the Modelers community for users to try out.