Hotshot Launches Text-to-Video AI Generator

2024-08-23

The startup Hotshot has announced a text-to-video AI generator, also called Hotshot, and released an early preview version. The company was founded in 2023 by Aakash Sastry, John Mullan, and Duncan Crawbuck. Sastry wrote on X, "With our model, we are able to build powerful new video applications for users. This is just the beginning, and we will continue to share more progress."


The service is now available to the public for free at Hotshot.co, where users can generate two watermark-free videos per day. Hotshot was previously an AI photo creation and editing application; that application is no longer maintained.

In a direct message on X, Sastry said that the team has been building consumer applications for the past eleven years and is backed by investors including Lachy Groom, Alexis Ohanian, and SV Angel.

According to a paper published by the team, Hotshot is a text-to-video model that can generate videos up to 10 seconds long at 720p resolution. The work was completed by four engineers in four months. Before this, the team trained an open-source model called Hotshot-XL, which generates one-second videos at 8 frames per second and has more than 20,000 monthly active users. The team also developed a follow-up model, Hotshot Act-One, which creates three-second video clips at the same 8 frames per second.
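To put the earlier models' figures in concrete terms, the short sketch below works out the number of frames per clip from the frame rates and durations quoted above. The helper name `frame_count` is purely illustrative and assumes a constant frame rate. The article does not state a frame rate for the new 10-second model, so no frame count is derived for it.

```python
def frame_count(fps: int, seconds: int) -> int:
    """Total frames in a clip, assuming a constant frame rate."""
    return fps * seconds

# Figures quoted in the article: both earlier models run at 8 frames per second.
hotshot_xl_frames = frame_count(fps=8, seconds=1)  # Hotshot-XL: one-second clips
act_one_frames = frame_count(fps=8, seconds=3)     # Hotshot Act-One: three-second clips

print(f"Hotshot-XL: {hotshot_xl_frames} frames per clip")
print(f"Hotshot Act-One: {act_one_frames} frames per clip")
```

By this arithmetic, a Hotshot-XL clip contains 8 frames and a Hotshot Act-One clip contains 24.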

The newly released Hotshot model is the team's most ambitious work to date. The paper states that the team trained it on 600 million video clips using thousands of GPUs. Because hardware failed frequently, especially under extreme conditions, the entire training run required constant monitoring. The paper notes that managing the process was a round-the-clock job for team members.

To compress the videos in both space and time, the team trained a new autoencoder that reduces the size of the videos while retaining their content information for further AI model training.

Hotshot's text-to-video model is highly flexible and may later be extended to longer videos, higher resolutions, and other modalities such as audio. On X, Sastry showcased examples of Hotshot generating videos in different styles, including videos resembling comic book animations and videos created using tracing techniques. He also predicted that AI-generated content will become an important component of digital media. Within the next 12 months, entire videos are expected to be generated by AI, with creators able to control the whole generation process, from text to video to audio.