OpenAI has added new transcription and speech generation AI models to its API, which the company says improve on its earlier releases.
These models fit into OpenAI's broader "agent" vision: building automated systems that can independently complete tasks on behalf of users. According to Olivier Godement, OpenAI's product lead, the coming months will bring many more such agents, and the company's focus is on helping customers and developers build agents that are useful, accessible, and accurate.
The new text-to-speech model, named “gpt-4o-mini-tts,” not only generates more refined and realistic voices but also offers greater "controllability." Developers can use natural language to instruct the model on how to pronounce words, such as requesting it to speak "in the tone of a mad scientist" or "with the calm voice of a meditation teacher."
Jeff Harris, a member of OpenAI’s product team, said the goal is to let developers tailor both the voice "experience" and its "context." A flat, unchanging voice isn't right for every scenario: in a customer support setting, for example, the voice can convey emotion, sounding apologetic when an apology is called for.
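As a concrete illustration, here is a minimal sketch of such a request using the official `openai` Python SDK; the voice name, input text, and output filename are illustrative placeholders, and the `instructions` field carries the natural-language style direction described above.

```python
from openai import OpenAI

client = OpenAI()  # reads the OPENAI_API_KEY environment variable

# Stream synthesized speech to an MP3 file. The `instructions` field is the
# "controllability" hook: a plain-language description of how the model
# should sound. Voice, text, and filename here are placeholders.
with client.audio.speech.with_streaming_response.create(
    model="gpt-4o-mini-tts",
    voice="coral",
    input="I'm so sorry about the mix-up with your order. Let me fix that right away.",
    instructions="Speak in a calm, apologetic customer-support tone.",
) as response:
    response.stream_to_file("apology.mp3")
```

Swapping the `instructions` string for something like "in the tone of a mad scientist" is all it takes to change the delivery; the input text itself stays the same.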
As for the new speech-to-text models, “gpt-4o-transcribe” and “gpt-4o-mini-transcribe,” they are set to replace OpenAI’s long-standing Whisper transcription model. OpenAI claims these new models were trained on a "diverse, high-quality audio dataset," allowing them to better capture accented and varied speech, even performing well in noisy environments.
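For developers who currently call Whisper through the API, adopting the new models should largely amount to swapping the model name. A minimal sketch with the official `openai` Python SDK (the audio filename is a placeholder):

```python
from openai import OpenAI

client = OpenAI()

# Transcribe a local recording; "meeting.mp3" is a placeholder filename.
with open("meeting.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="gpt-4o-transcribe",  # or "gpt-4o-mini-transcribe" for lower cost
        file=audio_file,
    )

print(transcript.text)
```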
Harris added that the new models also cut down on "hallucinations." Whisper sometimes fabricates words, or even whole passages of conversation, introducing inaccuracies into its transcriptions. The new models, he said, are markedly better at capturing exactly what was spoken without adding details that were never said.
However, transcription accuracy may vary across languages. Based on OpenAI’s internal benchmarks, the more accurate transcription model, “gpt-4o-transcribe,” exhibits a word error rate of nearly 30% for Hindi and Dravidian languages like Tamil, Telugu, Malayalam, and Kannada.
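For reference (this definition is standard background, not part of OpenAI's announcement): word error rate is computed as WER = (S + D + I) / N, where S, D, and I count the words substituted, deleted, and inserted relative to a human reference transcript of N words. A WER near 30% therefore means roughly three out of every ten transcribed words differ from what a careful human transcriber would have written.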
This marks a break with past practice: OpenAI does not plan to make its new transcription models openly available. The company has historically released new versions of Whisper under the MIT license for commercial use. Harris noted that the new models are much "larger" than Whisper and, unlike Whisper, can't run locally on a laptop, which makes them poor candidates for an open release. He added that OpenAI wants to be more deliberate about future open-source releases, reserving them for models honed to a specific need.