Tavus Launches AI Model Family for Real-time Facial Interaction

2025-03-07

Tavus is an artificial intelligence research startup that develops real-time AI models designed to simulate the experience of conversing with another person. Today, the company announced the launch of a family of new AI models.

The company says it is building a human-computer interaction operating system, called the "Conversational Video Interface," which will let AI perceive, interpret, and respond as naturally as a person on a Zoom or FaceTime call. Tavus's mission is to make AI understand facial expressions, tone of voice, and body language and interpret their meaning, while responding with expressions and tone of its own.

"Humans are evolutionarily designed for face-to-face communication. So, we want to teach machines how to achieve this," CEO Hasan Raza told SiliconANGLE in an interview. "If we believe in a sci-fi future with AI colleagues, friends, and assistants, we need to build the interfaces to make that happen."

The products released today include three models: Phoenix-3, the first full-face AI rendering model capable of conveying subtle expressions; Raven-0, a breakthrough AI perception model that observes and reasons like humans; and Sparrow-0, a state-of-the-art turn-taking dialogue model that adds "a spark of life" to conversations.

Phoenix-3 is the company's flagship foundation model, aimed at creating "digital twins": highly realistic, AI-driven representations of individuals, as Raza explained. Now in its third iteration, it provides full-face animation, cloning an individual and accurately rendering every facial muscle, which is crucial for mimicking subtle expressions. He noted that most commercial facial animation models cannot animate the full face, producing mismatches between the upper and lower halves that break immersion.

"Phoenix-3 is a full-face expression model with emotion control functionality, the first to achieve this without requiring extensive data," said Raza.

Most importantly, Phoenix-3’s high fidelity and facial muscle control allow it to accurately simulate "micro-expressions." These are fleeting, involuntary facial expressions that result from emotional reactions. By incorporating this feature, the model creates a vivid video experience that is more emotionally expressive and lifelike than simple animated faces.

To enable Phoenix-3 to respond like humans, Raven-0 grants AI the ability to observe and interpret what’s happening within a scene. Instead of capturing single snapshots, it continuously observes and understands the context of video events. This includes recognizing users' emotions and detecting changes in their environment.

For instance, an AI tutor can identify when students appear confused or frustrated by monitoring their expressions and adjust explanations accordingly. Similarly, a support assistant can observe how customers interact with a product and provide guidance on resolving any issues.

Sparrow-0 addresses a class of mistakes AI commonly makes in conversation, Raza said. Natural conversation is fluid: participants take turns, each waiting for the other to finish speaking before responding.

AI, however, sometimes interjects too quickly, at times while the other person is still speaking. This happens because AI models respond faster than humans, and developers work hard to reduce latency, the time an AI model takes to respond. But when the AI replies too quickly, the exchange feels unnatural.

The Sparrow model aims to make conversations feel natural by understanding the rhythm of speech so it knows when to pause, speak, or listen. Rather than reacting to filler words like "uh" or waiting for long silences, it adjusts based on tone, rhythm, and context.

"If it’s very certain you're having a fast, friendly conversation, it’ll respond quickly," Raza explained. "But if you say, ‘Hey, let me think,’ the AI gives you space. This makes the conversation feel much more natural."
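The turn-taking behavior described here can be sketched as a simple decision rule combining pause length, filler-word cues, and speaking pace. The function, thresholds, and phrase lists below are illustrative assumptions, not Tavus's actual model:

```python
# Hypothetical sketch of turn-taking heuristics like those described for
# Sparrow-0. Thresholds and word lists are illustrative assumptions.
from dataclasses import dataclass

FILLERS = {"uh", "um", "hmm", "er"}
HOLD_PHRASES = {"let me think", "give me a second", "hold on"}


@dataclass
class SpeechState:
    last_words: str         # most recent transcribed words from the user
    silence_ms: int         # how long the user has been silent
    speech_rate_wpm: float  # recent speaking pace, in words per minute


def should_respond(state: SpeechState) -> bool:
    """Decide whether the agent should take its turn now."""
    text = state.last_words.lower().strip()
    # An explicit "thinking" cue yields the floor regardless of silence.
    if any(phrase in text for phrase in HOLD_PHRASES):
        return False
    # A trailing filler word signals the speaker isn't finished.
    words = text.split()
    if words and words[-1] in FILLERS:
        return False
    # Fast, fluent speech -> respond after a short pause;
    # slower speech -> wait longer before jumping in.
    threshold_ms = 300 if state.speech_rate_wpm > 160 else 900
    return state.silence_ms >= threshold_ms
```

For example, under these assumed thresholds, a brisk speaker who pauses for 400 ms gets a quick reply, while "hey, let me think" suppresses a response even after a long silence, mirroring the behavior Raza describes.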

Unlike other companies that piece together separate technologies, Raza said, Tavus has built an integrated system combining these models. The result is an immersive experience that feels more like talking to another person than the stilted interactions typical of other human-avatar AI systems.

Raza said there is still a long way to go on model capability, with continuous improvement needed in how AI perceives and understands humans.

"It’s not perfect today, but it’s the best of its kind," Raza added. "In the future, though, our goal is a model that understands humans so deeply that unless you asked, you wouldn’t know it’s a model."