OpenAI previously applied for a trademark on the term "Voice Engine," sparking rumors that the company was preparing a competitor to Siri and Alexa. Now OpenAI has released a preview of Voice Engine, a new model that needs only a 15-second audio sample to generate natural-sounding speech that closely resembles the original speaker.
The technology has been in development since late 2022 and already powers the preset voices in OpenAI's text-to-speech API, as well as ChatGPT Voice and Read Aloud. Despite these applications, the company has not announced a public release date and remains cautious about a wider rollout because of the potential for misuse.
To better understand the potential uses and impact of Voice Engine, OpenAI has privately tested the technology with a small group of trusted partners. These early adopters have built applications that include reading assistance with natural, emotive voices for non-readers and children, translation of content for global audiences, improved delivery of basic services in remote areas, and support for people with speech disorders or disabilities.
OpenAI has shared some examples of Voice Engine in practical applications:
Providing reading assistance to non-readers and children through natural, emotive voices that represent a wider range of speakers than preset voices allow. Age of Learning, an educational technology company dedicated to children's academic success, has been using the technology to generate pre-scripted voice-over content.
Translating content such as videos and podcasts so that creators and businesses can reach a global audience fluently and in their own voices. HeyGen, an early adopter of the technology, is an AI visual storytelling platform that works with its enterprise clients to create customized, human-like avatars for a range of content, from product marketing to sales presentations.
Improving basic service delivery in remote areas and reaching global communities. Dimagi is building tools for community health workers who provide a variety of basic services, such as counseling for breastfeeding mothers. To help these workers develop their skills, Dimagi uses Voice Engine and GPT-4 to give interactive feedback in each worker's primary language, including Swahili, or in more informal languages such as Sheng, a hybrid language popular in Kenya.
While these use cases demonstrate the positive potential of Voice Engine, OpenAI also recognizes the serious risks associated with generating speech that closely resembles human voices. The company actively collaborates with partners from different sectors, incorporating their feedback to ensure responsible development and deployment of the technology.
OpenAI has taken a safety-oriented approach in building Voice Engine. Partners must agree to usage policies that prohibit impersonating another person without authorization, require explicit and informed consent from the original speaker, and require disclosing to audiences that the voices they hear are AI-generated. The company has also implemented safety measures, including watermarking to trace the origin of generated audio and proactive monitoring of how the technology is used.
OpenAI is not the only company driving the development of synthetic speech technology. Other players in the field, such as ElevenLabs, offer state-of-the-art AI voice solutions for various products and services, including professional voice cloning, dubbing, and translation.
Recently, Hume AI launched the Empathic Voice Interface, which uses an empathic large language model to adjust its wording and tone based on context and the user's emotional expressions. These developments highlight the rapid progress of, and growing interest in, AI-driven speech technology across industries.
Looking ahead, OpenAI encourages society to strengthen its resilience against the challenges posed by increasingly realistic generative models. This includes phasing out voice-based authentication for access to sensitive information, exploring policies that protect the use of individuals' voices in AI, raising public awareness of AI's capabilities and limitations, and accelerating the development and adoption of technologies that track the origin of audiovisual content.
As the debate around synthetic speech technology continues, OpenAI's preview of Voice Engine highlights both the technology's potential benefits and the need for responsible deployment. The company's cautious approach and ongoing communication with stakeholders will be crucial for understanding and mitigating the risks of this powerful technology.