"Hume's Emotion-Rich Speech AI and API Interface EVI 2 Arrives"

2024-09-19

Last week, Hume officially launched its new Empathic Voice Interface 2 (EVI 2), introducing a suite of enhancements aimed at improving naturalness, emotional responsiveness, and customizability while significantly reducing costs for developers and businesses. Delivered through Hume's API, EVI 2 cuts latency by 40% and costs by 30%.

"We aim for developers to integrate this technology into any application, creating their desired brand voice and adjusting it according to user needs to ensure the voice sounds trustworthy and personalized," Cowen stated during a video call with VentureBeat last week.

In fact, Cowen told VentureBeat that he is seeing, and hopes to see more of, companies that no longer push users out of their applications, instead letting them resolve technical and customer-service issues through AI voice assistants built on EVI.

Thanks in part to EVI 2's design, he said, end users can now connect directly to EVI 2-powered voice assistants inside applications, and in many cases this makes for a better user experience. When Hume's development tools are used correctly to integrate EVI 2 with a client's underlying applications, the resulting voice assistants can retrieve information or take actions on a user's behalf without routing anyone to an external phone number.

"Developers are beginning to realize that they don't have to place voice over telephone lines; they can embed it anywhere within applications," Cowen told VentureBeat.

For example, if I want to change the address on an online account, I can simply ask the integrated EVI 2 assistant to make the change rather than clicking through every step and screen myself.
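To make that concrete, here is a minimal Python sketch of how an in-app assistant might expose such an action as a tool call. The WebSocket endpoint, message schema, and update_address tool are hypothetical stand-ins for illustration, not Hume's documented API.

```python
# Hypothetical sketch: an in-app voice assistant performs an account action
# (changing an address) on the user's behalf via a tool call. The endpoint,
# message schema, and tool name are illustrative assumptions, not Hume's API.
import asyncio
import json

import websockets  # pip install websockets

EVI_URL = "wss://example.com/evi/chat"  # placeholder endpoint

async def handle_tool_call(call: dict) -> dict:
    """Execute the requested action against the app's own backend."""
    if call.get("name") == "update_address":
        # In a real app this would hit your own account service.
        print(f"Updating address to: {call['arguments']['address']}")
        return {"status": "ok"}
    return {"status": "unknown_tool"}

async def main() -> None:
    async with websockets.connect(EVI_URL) as ws:
        # Register the action the assistant is allowed to take for the user.
        await ws.send(json.dumps({
            "type": "session_settings",
            "tools": [{
                "name": "update_address",
                "description": "Change the address on the user's account",
                "parameters": {"address": "string"},
            }],
        }))
        # Relay tool calls from the assistant to the app and return results.
        async for raw in ws:
            msg = json.loads(raw)
            if msg.get("type") == "tool_call":
                result = await handle_tool_call(msg)
                await ws.send(json.dumps({"type": "tool_result", **result}))

if __name__ == "__main__":
    asyncio.run(main())
```

The key design point is that the action runs inside the application itself, so no call ever leaves the app for a phone line.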

Timely Release

The timing of EVI 2's release is particularly advantageous for Hume. Although the company is not as widely known as OpenAI or rival Anthropic (the latter is reportedly revamping investor Amazon's Alexa voice assistant for release), Hume has outpaced both by launching a powerful, cutting-edge, humanlike voice assistant that businesses can use immediately.

In contrast, OpenAI's advanced voice mode for ChatGPT, powered by the GPT-4o model and showcased in May, is still available only to a limited number of users via a waiting list. Moreover, Cowen believes EVI 2 actually excels at detecting and responding to user emotions, answering with emotional expressiveness of its own.

"EVI 2 is entirely end-to-end. It simply receives audio signals and outputs audio signals, which is more akin to how [OpenAI's] GPT for voice operates," he told VentureBeat. In other words, both EVI 2 and GPT-4o directly convert audio signal waveforms and data into tokens, rather than first transcribing them into text and then inputting them into the language model. The first-generation EVI models used the latter approach—but in VentureBeat's independent demonstrations, it remained very fast and responsive."

For developers and businesses looking to differentiate their voice AI capabilities or keep costs down by using voice AI instead of human call centers, Hume's EVI 2 may be an attractive option.

Conversational AI Advancements in EVI 2

Cowen and Hume assert that EVI 2 enables faster and more fluid conversations, with response times of less than a second, and offers a variety of voice customization options.

They state that EVI 2 is designed to predict and adapt to user preferences in real time, making it an ideal choice for a wide range of applications, from customer service bots to virtual assistants.

Key improvements in EVI 2 include an advanced voice generation system that enhances voice naturalness and clarity, as well as emotional intelligence that helps the model understand the user's tone and adjust its responses accordingly.

EVI 2 also supports features like voice modulation, allowing developers to fine-tune the voice based on parameters such as pitch, nasality, and gender, making it both flexible and customizable while avoiding the risks associated with voice cloning.
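A control surface like that might look something like the following Python sketch, where the parameter names and ranges are assumptions for illustration rather than Hume's published settings.

```python
# Hypothetical voice-modulation config: continuous sliders a developer tunes,
# rather than a recording of any real person. Names and ranges are illustrative.
from dataclasses import dataclass

@dataclass
class VoiceConfig:
    pitch: float = 0.0     # -1.0 (lower) .. +1.0 (higher)
    nasality: float = 0.0  # -1.0 (less nasal) .. +1.0 (more nasal)
    gender: float = 0.0    # -1.0 (more masculine) .. +1.0 (more feminine)

    def __post_init__(self) -> None:
        # Keep every slider in a bounded range.
        for name in ("pitch", "nasality", "gender"):
            value = getattr(self, name)
            if not -1.0 <= value <= 1.0:
                raise ValueError(f"{name} must be in [-1.0, 1.0], got {value}")

# A business defines its house voice once and reuses it across the app.
brand_voice = VoiceConfig(pitch=-0.2, nasality=0.1, gender=0.3)
```

Because the voice is specified by sliders rather than a reference recording, no setting reproduces a specific real speaker, which is the point of the anti-cloning design.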

At VentureBeat, we have also seen and reported on numerous proprietary and open-source voice AI models. Online demos in which two or more voice AI models converse with each other have produced some strange and unsettling results, such as agonized screams.

When I asked Cowen about these incidents, he seemed somewhat amused but not overly concerned about such behavior occurring with Hume's models.

"These are indeed issues inherent to these models. You have to eliminate these problems with the right data, and we're very good at that," he told me. "Perhaps occasionally someone might try to trick it, but that's rare."

Furthermore, Cowen stated that Hume has no plans to offer "voice cloning," the practice of replicating a speaker's voice from just a few seconds of sample audio so it can be made to say any given text.