Podcastle, a platform specializing in podcast recording and editing, has entered the AI text-to-speech race by launching its own in-house AI model, Asyncflow v1.0. The company has also introduced an API that lets developers integrate the text-to-speech model directly into their applications.
With Asyncflow v1.0, Podcastle offers more than 450 AI voices for text narration. The company says it focused on controlling costs and improving efficiency while developing and training the model, which it believes gives it a competitive edge in the market.
This move places Podcastle alongside companies like ElevenLabs, Speechify, and WellSaid, all of which are working on technologies and models that convert any text into AI-generated voice clips. This technology is widely applicable, spanning marketing, advertising, content creation, education, and corporate training.
Podcastle's founder, Artur Yeritsyan, said in an interview that the company had planned to develop a text-to-speech model from the beginning, but high development costs and data requirements had stood in the way.
"We've always wanted to build a powerful text-to-speech model. However, the development costs were very high. Thanks to the rapid advancement of large language models in recent years, we made significant breakthroughs last year, allowing us to create high-quality voice models without needing vast amounts of data," Yeritsyan explained.
Last year, Podcastle closed a $13.5 million Series A funding round, which provided financial backing for the project.
On pricing, Podcastle charges approximately $40 for every 500 minutes of text-to-speech conversion, while ElevenLabs charges $99 for the same number of minutes.
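To put those two figures on a common footing, the short Python sketch below converts each quoted price into a per-minute rate. It uses only the numbers stated above; the function names and the assumption that cost scales linearly with minutes are illustrative, and actual plans from either company may include tiers, quotas, or other terms not reflected here.

```python
# Prices quoted in the article: US dollars per 500 minutes of generated speech.
MINUTES_PER_BLOCK = 500
PRICE_PER_BLOCK_USD = {"Podcastle": 40.0, "ElevenLabs": 99.0}


def price_per_minute(block_price_usd, minutes=MINUTES_PER_BLOCK):
    """Convert a price for a block of minutes into a per-minute rate."""
    return block_price_usd / minutes


def estimated_cost(provider, minutes_needed):
    """Rough estimate only: assumes cost scales linearly with minutes."""
    return price_per_minute(PRICE_PER_BLOCK_USD[provider]) * minutes_needed


for provider, block_price in PRICE_PER_BLOCK_USD.items():
    print(f"{provider}: ${price_per_minute(block_price):.3f} per minute")

# How many times more the higher quoted price is than the lower one.
ratio = PRICE_PER_BLOCK_USD["ElevenLabs"] / PRICE_PER_BLOCK_USD["Podcastle"]
print(f"Quoted ElevenLabs price is about {ratio:.1f}x the Podcastle price")
```

By this arithmetic, the quoted prices work out to roughly 8 cents per minute for Podcastle and about 20 cents per minute for ElevenLabs.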
Podcastle has also upgraded its voice cloning feature to make training faster. Previously, users had to read around 70 different sentences; now, a few seconds of recorded audio is enough to create a voice clone. The new process uses Podcastle's Magic Dust AI technology, released last year, to enhance recording quality.
In testing, voices created through the new process mimicked tone well but still sounded somewhat mechanical. Podcastle says it will continue improving the feature and notes that users can get different results by training with different voice samples.
Podcastle also emphasized that, beyond its cost advantage, integrating audio, video, podcasting, and AI narration tools into a redesigned platform will help it stand out from competitors. Yeritsyan pointed out that while most users currently use Podcastle for audio content, demand for video processing is growing rapidly.