Having mastered machine learning (ML)-based voice cloning and synthesis, ElevenLabs, an AI startup founded two years ago by former Google and Palantir employees, is expanding its product portfolio with a new text-to-sound-effects model.
According to ElevenLabs' teaser, the model will let creators generate sound effects simply by describing in text what they imagine. The company expects it to give creators a new way to enrich content in an era of AI-driven digital experiences.
The model is not yet publicly available, but ElevenLabs has showcased its capabilities in a one-minute teaser trailer. The trailer uses video generated by Sora, OpenAI's new product, with audio added by ElevenLabs' own AI. The company has also set up a registration page and is inviting potential users to join the early-access waiting list for the model.
AI sound effects beyond voice
Founded in 2022, ElevenLabs has focused on AI research aimed at making audio and video content, from movies to podcasts, accessible across languages and regions. Its initial products include text-to-speech and speech-to-speech models that generate AI voices in 29 languages from supplied content (text, audio, or video) while preserving natural speech and emotion (and, in the speech-to-speech case, the speaker's original voice).
Both tools have been widely adopted by content-producing companies and individuals. Meanwhile, fully AI-generated content has started to emerge with the introduction of tools such as Runway, Pika, and, most recently, OpenAI's Sora. These products can generate realistic AI videos from simple text prompts, but the videos come without audio by default. This is where ElevenLabs' new model comes in, allowing users to create sound effects by describing the desired content.
Once available, the product should make it easy for AI creators to enhance their work with background sounds that blend seamlessly with the piece. The sound effects can be anything from chirping birds to moving vehicles and honking horns, or even the sounds of people talking, eating, or walking on busy streets.
Luke Harries, who heads development at ElevenLabs, wrote in a reposted article, "At ElevenLabs, we have only showcased our text-to-speech model to the public in the past. However, we are still developing more products. When OpenAI announced their Sora model - which can generate incredible videos but without sound - we decided to give a sneak peek of our new product line." The article also includes a series of Sora-generated videos enhanced with AI sound effects from ElevenLabs' model.
Beyond AI-generated content, the audio produced by the new model could also be added to text-based videos or any other video (Instagram shorts, commercials, or video game trailers) that needs background audio to accompany ordinary speech. How it will be used in practice, and how good the output is, remains to be seen.