Amazon Unveils Largest Text-to-Speech Model to Date


2024-02-18

A group of artificial intelligence researchers at Amazon AGI has announced what they describe as the largest text-to-speech model built to date. By "largest," they mean the model with the most parameters and the largest training dataset. They have published a paper on the arXiv preprint server describing how the model was developed and trained.

Large language models (LLMs) such as ChatGPT have gained attention for their ability to answer questions and generate sophisticated documents in a human-like manner. Other mainstream applications of AI, however, are still catching up. In this new research, the team set out to improve text-to-speech applications by increasing the number of parameters and expanding the training corpus.

The new model, called BASE TTS (short for "Big Adaptive Streamable TTS with Emergent abilities"), has 980 million parameters and was trained on 100,000 hours of recorded speech taken from public websites, most of it in English. The team also gave the model examples of spoken words and phrases from other languages so that it can correctly pronounce phrases such as "au contraire" or "adios, amigo."

The Amazon team also tested the model on smaller datasets to find where it develops what the AI field calls an emergent quality: the point at which an application, whether an LLM or a text-to-speech system, suddenly seems to break through to a higher level of ability. They found that for their application, the leap occurred with a medium-sized dataset. They also noted that the leap involved a range of language attributes, such as the ability to use compound nouns, express emotions, use foreign words, apply paralinguistics and punctuation, and place the emphasis on the right word when asking a question.

The team stated that BASE TTS will not be publicly released because they are concerned about its potential for unethical use. Instead, they plan to use it as a learning application and apply what they have learned so far to improve the overall voice quality of text-to-speech applications.
