AWS Upgrades Transcribe Service: Enhanced AI Capabilities, Supports 100+ Languages

2023-11-29

Amazon Web Services (AWS) recently announced a major upgrade to its speech-to-text service, Amazon Transcribe, which is now powered by next-generation speech models. The update expands Transcribe's speech recognition to more than 100 languages, substantially improves accuracy, and adds a range of AI-driven capabilities.

At the core of the update is a generative AI model with billions of parameters, trained on millions of hours of speech data across multiple languages. The model's training algorithm enables it to learn common speech patterns and to better recognize diverse accents and speech in noisy environments.

As a result, Transcribe now promises a 20-50% accuracy improvement for most languages, and a 30-70% improvement for challenging domains such as telephony speech, which is both difficult to recognize and short on training data. The expanded language support and improved recognition quality open up new use cases across industries.

Emergency call platform Carbyne plans to leverage Transcribe's extensive multilingual capabilities to expand access to 911 and emergency response. Carbyne's CTO, Alex Dizengof, explains that this will enable their translation feature to better serve non-native English speakers, supporting their mission of "everyone matters."

One standout feature is generative call summarization in Amazon Transcribe Call Analytics. It condenses an entire interaction into a concise summary, reducing agents' post-call workload and letting managers quickly grasp the context of a conversation.

The new automatic speech recognition design takes into account usability, customization, user security, and privacy. It includes features such as automatic punctuation, custom vocabulary, automatic language identification, speaker separation, word-level confidence scores, and custom vocabulary filtering.
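Word-level confidence scores are useful for flagging parts of a transcript for human review. The sketch below is a minimal illustration, assuming the transcript JSON layout that Transcribe jobs have typically produced (a `results.items` list whose entries carry `alternatives` with a `content` string and a string-valued `confidence`). The helper name `low_confidence_words`, the 0.8 threshold, and the sample data are illustrative assumptions, not part of the announcement.

```python
import json


def low_confidence_words(transcript, threshold=0.8):
    """Return (word, confidence) pairs whose score is below `threshold`.

    Assumes a Transcribe-style transcript dict; punctuation items are skipped.
    """
    flagged = []
    for item in transcript.get("results", {}).get("items", []):
        # Only spoken words ("pronunciation" items) carry a meaningful score.
        if item.get("type") != "pronunciation":
            continue
        alternatives = item.get("alternatives") or []
        if not alternatives:
            continue
        best = alternatives[0]
        # Confidence is assumed to be serialized as a string, e.g. "0.62".
        confidence = float(best.get("confidence", "0.0"))
        if confidence < threshold:
            flagged.append((best.get("content", ""), confidence))
    return flagged


# Hypothetical sample shaped like a transcript file, for illustration only.
SAMPLE = json.loads("""
{
  "results": {
    "items": [
      {"type": "pronunciation",
       "alternatives": [{"confidence": "0.99", "content": "Hello"}]},
      {"type": "punctuation",
       "alternatives": [{"confidence": "0.0", "content": ","}]},
      {"type": "pronunciation",
       "alternatives": [{"confidence": "0.62", "content": "world"}]}
    ]
  }
}
""")

flagged_words = low_confidence_words(SAMPLE)
```

A reviewer-facing tool could highlight the flagged words instead of re-checking the whole transcript; the right threshold depends on the audio quality and the use case.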

Other applications include automatic captioning for media and content companies and mining insights from customer call recordings in contact centers. More broadly, any organization that handles large volumes of spoken audio can benefit.

Importantly, these upgrades are automatically applied to all Amazon Transcribe customers without the need for migration. API endpoints, input parameters, and backend processes remain unchanged.

Other new features provide customizable capabilities to meet users' security, privacy, and accessibility needs, such as automatic language identification, custom vocabulary filtering, and speaker labeling.
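As a rough sketch of how automatic language identification and speaker labeling are requested, the function below builds the keyword arguments for a `StartTranscriptionJob` call. The parameter names follow the Transcribe API as commonly documented; the helper `build_transcription_request`, the job name, and the S3 URI are hypothetical. Vocabulary filtering is left out because its configuration depends on how the language is selected.

```python
def build_transcription_request(job_name, media_uri, max_speakers=2,
                                language_options=None):
    """Build keyword arguments for a StartTranscriptionJob request.

    Illustrative only: enables language identification and speaker labels.
    """
    # Assumption: speaker labeling requires a speaker count of at least two.
    if max_speakers < 2:
        raise ValueError("speaker labeling needs at least two speakers")

    request = {
        "TranscriptionJobName": job_name,
        "Media": {"MediaFileUri": media_uri},
        # Let the service detect the spoken language instead of fixing one.
        "IdentifyLanguage": True,
        "Settings": {
            "ShowSpeakerLabels": True,
            "MaxSpeakerLabels": max_speakers,
        },
    }
    # Optionally narrow detection to a list of candidate language codes.
    if language_options:
        request["LanguageOptions"] = list(language_options)
    return request


# Hypothetical job name and bucket, for illustration only.
request = build_transcription_request(
    "demo-job",
    "s3://example-bucket/call.wav",
    language_options=["en-US", "es-US"],
)

# With AWS credentials configured, the request could be submitted like this:
# import boto3
# boto3.client("transcribe").start_transcription_job(**request)
```

Keeping the request construction separate from the network call makes it easy to inspect or test the parameters without AWS access.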