OpenAI Updates Realtime API: Adds Voice Options and Lowers Prices

2024-10-31

OpenAI has updated its Realtime API, which is still in beta, adding new voice options and lowering costs through prompt caching.

With this update, beta users of the Realtime API can now build applications using five new voices. In a recent announcement, OpenAI highlighted three of them: Ash, Verse, and Ballad, the last of which has a British accent.

According to OpenAI's API documentation, the native voice-to-voice feature eliminates intermediate text processing, resulting in lower latency and more refined outputs. Additionally, these new voices offer greater control and expressiveness compared to previous options.
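As a rough illustration of how an application might select one of the highlighted voices, the sketch below builds a session configuration message as JSON. The event shape (`session.update` with a `voice` field) and the lowercase voice identifiers are assumptions based on the beta documentation at the time and may differ from the current API; the helper name `build_session_update` is hypothetical.

```python
import json

# Voices highlighted in the announcement; lowercase identifiers are an assumption.
HIGHLIGHTED_VOICES = ("ash", "ballad", "verse")


def build_session_update(voice: str) -> str:
    """Return a JSON string asking the session to use the given voice.

    The event layout is assumed, not confirmed by this article.
    """
    if voice not in HIGHLIGHTED_VOICES:
        raise ValueError(f"unknown voice: {voice!r}")
    event = {"type": "session.update", "session": {"voice": voice}}
    return json.dumps(event)


if __name__ == "__main__":
    # The resulting string would be sent over the API connection by the client.
    print(build_session_update("ballad"))
```

The message is built separately from any network code so that the voice choice can be validated before a connection is opened.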

However, OpenAI cautioned that because the API is still in beta, client-side authentication is not yet available, and real-time audio processing can be unreliable.

OpenAI noted, "Network conditions significantly impact real-time audio, making it challenging to reliably transmit audio from the client to the server for large-scale processing under unstable network conditions."

OpenAI has had a controversial history in the AI voice sector. In March this year, the company launched its Voice Engine voice cloning platform to compete with ElevenLabs, but restricted access to a limited number of researchers. In May, after showcasing GPT-4o and its voice mode, OpenAI paused the Sky voice after actress Scarlett Johansson objected to it.

In September, OpenAI rolled out ChatGPT's Advanced Voice Mode in the United States for paid subscribers, including users of ChatGPT Plus, Enterprise, Team, and Edu.

Ideally, voice-to-voice AI lets businesses respond to customers more quickly through spoken interaction. For instance, when a customer calls a company's support line, a voice-to-voice system can recognize the customer's speech, work out what they need, and reply with AI-generated speech at low latency. Voice-to-voice technology also lets users create voiceovers in which the user delivers the lines but the output voice is not their own. Platforms offering such services include Replica and ElevenLabs.

This month, during Developer Day, OpenAI unveiled the Realtime API, aiming to accelerate the development of voice assistants.

Regarding costs, while utilizing the voice-to-voice feature may incur high expenses, OpenAI plans to reduce the Realtime API pricing through prompt caching.

Specifically, the costs for cached text inputs will decrease by 50%, and cached audio inputs will be reduced by 80%.
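To make the effect of these discounts concrete, here is a minimal sketch of the arithmetic. Only the 50% and 80% discount figures come from the announcement; the base price, the token counts, and the function name `input_cost` are hypothetical values chosen for illustration.

```python
# Discounts on cached input, as stated in the announcement.
TEXT_CACHE_DISCOUNT = 0.50
AUDIO_CACHE_DISCOUNT = 0.80


def input_cost(total_tokens: int, cached_tokens: int,
               price_per_million: float, cache_discount: float) -> float:
    """Cost of input tokens when some of them are billed at a cached rate."""
    if not 0 <= cached_tokens <= total_tokens:
        raise ValueError("cached_tokens must be between 0 and total_tokens")
    uncached = total_tokens - cached_tokens
    full_rate = price_per_million / 1_000_000
    cached_rate = full_rate * (1 - cache_discount)
    return uncached * full_rate + cached_tokens * cached_rate


if __name__ == "__main__":
    # Hypothetical workload: 1M input tokens, 600k of them cache hits,
    # at a made-up base price of 10.00 per million tokens.
    for label, discount in (("text", TEXT_CACHE_DISCOUNT),
                            ("audio", AUDIO_CACHE_DISCOUNT)):
        cost = input_cost(1_000_000, 600_000, 10.0, discount)
        print(f"{label}: {cost:.2f}")
```

The savings scale with the share of input that hits the cache, so applications that resend a long, stable context on every turn stand to benefit most.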

During Developer Day, OpenAI also announced the prompt caching feature, which keeps frequently reused context and prompts in a cache so that repeated input does not have to be processed again at full cost. Lower input costs could attract more developers to build on the API.

Notably, OpenAI is not the only company to introduce prompt caching. In August, Anthropic also launched a prompt caching feature for Claude 3.5 Sonnet.
