
How Tavus is helping to make AI videos feel like real conversations

April 2, 2025
6 minute read

Tavus, an AI video research company, leverages advanced AI models to create digital interactions that feel as authentic as real human conversations. The platform integrates visual question answering and multi-image reasoning, allowing people to create engaging, real-time interactions with digital replicas. The company uses Llama 3.3 to power its conversational video interface (CVI) platform, which developers can use to build rich, realistic, real-time conversational experiences with digital twins.

“Incorporating Llama models effectively gives digital replicas both ‘eyes’ and a ‘brain’ — eyes to interpret visual content through multi-image reasoning and a brain to provide nuanced, context-aware responses,” says Hassaan Raza, Co-Founder and CEO of Tavus.

This approach enables Tavus to solve key challenges related to conversational quality and visual question answering, bringing lifelike responsiveness and coherence to every interaction.

Choosing Llama

Tavus’s conversation layer runs on Llama, enabling the platform to handle real-time digital interactions that would previously have required extensive engineering time and multiple models. The single-model setup keeps the stack simple and ensures quick, clear responses.
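
To make that concrete, here is a rough sketch of what a single real-time conversation turn can look like against an OpenAI-compatible Llama endpoint. The host URL, API key, and model name below are placeholders, not details of Tavus’s actual stack.

```python
# A minimal sketch of one real-time conversation turn. The endpoint URL, key,
# and model name are placeholders, not details of Tavus's actual stack.
from openai import OpenAI

client = OpenAI(
    base_url="https://your-llama-host/v1",  # any OpenAI-compatible Llama server
    api_key="YOUR_API_KEY",
)

messages = [
    {"role": "system", "content": "You are a digital replica. Keep replies short and conversational."},
    {"role": "user", "content": "Hi! What can you help me with?"},
]

# Streaming lets the replica begin speaking as soon as the first tokens
# arrive, which is what makes the exchange feel real time.
stream = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct",  # example checkpoint
    messages=messages,
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```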

The team chose Llama to replace closed source AI models because it offered better conversational quality, faster response times, and a flexible, open source design. Open source access was crucial for Tavus: it allowed on-premise deployment and testing, which improved speed, data privacy, and interoperability compared to closed source alternatives.

The open source community and its vast tooling have enabled Tavus to experiment with and customize the models, supporting faster iteration and better alignment with the company’s specific use cases. Tavus has reported significant improvements in efficiency and quality, with Llama’s 70B model processing approximately 2,000 tokens per second.

The company integrates Llama models for several key functions. For conversational AI, Llama models deliver responsive, context-aware interactions in real time, allowing digital replicas to handle long-form conversations smoothly. Tool calling lets replicas invoke external functions mid-conversation, adding functionality and keeping interactions dynamic. Multi-image reasoning enables visual question answering, producing accurate responses grounded in the visual context of a video.
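
The tool-calling pattern generally works by handing the model a function schema and dispatching whatever call it emits. The sketch below illustrates that flow with a hypothetical schedule_demo tool; none of the names reflect Tavus’s actual integrations.

```python
# Hypothetical tool-calling sketch: the schedule_demo schema and model name
# are illustrative and do not reflect Tavus's real tools.
import json
from openai import OpenAI

client = OpenAI(base_url="https://your-llama-host/v1", api_key="YOUR_API_KEY")

tools = [{
    "type": "function",
    "function": {
        "name": "schedule_demo",
        "description": "Book a product demo for the caller.",
        "parameters": {
            "type": "object",
            "properties": {
                "email": {"type": "string"},
                "start_time": {"type": "string", "description": "ISO-8601 start time"},
            },
            "required": ["email", "start_time"],
        },
    },
}]

response = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct",
    messages=[{"role": "user", "content": "Book me a demo tomorrow at 10am; I'm pat@example.com."}],
    tools=tools,
)

# If the model chose to call the tool, parse its arguments and dispatch it
# in application code, then feed the result back into the conversation.
for call in response.choices[0].message.tool_calls or []:
    if call.function.name == "schedule_demo":
        args = json.loads(call.function.arguments)
        print("Would schedule:", args)
```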

Additionally, by integrating fine-tuned Llama models and leveraging retrieval-augmented generation (RAG) techniques, Tavus allows clients to use their own data and retrieval sources, tailoring the AI to meet specific business needs.
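
At its core, that retrieval loop embeds a client’s documents, finds the passages closest to the user’s question, and folds them into the prompt. A minimal sketch, assuming an off-the-shelf embedding model and an in-memory store in place of a production vector database:

```python
# Minimal RAG sketch. The embedding model is an off-the-shelf example, the
# store is in-memory rather than a production vector database, and the
# client documents are invented for illustration.
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # example embedding model

docs = [
    "Acme's enterprise plan includes SSO and a dedicated support channel.",
    "Acme support is available 9am-6pm ET on weekdays.",
    "Refunds are available within 30 days of purchase.",
]
doc_vecs = embedder.encode(docs, normalize_embeddings=True)

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k documents most similar to the query (cosine similarity)."""
    q = embedder.encode([query], normalize_embeddings=True)[0]
    return [docs[i] for i in np.argsort(-(doc_vecs @ q))[:k]]

query = "When can I reach support?"
context = "\n".join(retrieve(query))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
# `prompt` would then be sent to the (optionally fine-tuned) Llama model.
print(prompt)
```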

Integration and implementation

Tavus integrated Llama with ease, using the 8B and 70B Instruct versions with customizations that included advanced multi-level prompt engineering to enhance conversational depth.
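
While Tavus hasn’t published its prompt structure, multi-level prompting typically means stacking layers of guidance (base behavior, persona, session context) into a single system message. One plausible arrangement:

```python
# One plausible reading of multi-level prompting: stacked layers of guidance
# merged into the system message. The layer names and contents are
# illustrative, not Tavus's actual scheme.
BASE_LAYER = "You are a real-time video replica. Reply in one to three short sentences."
PERSONA_LAYER = "Persona: Sam, a patient, upbeat product specialist."
SESSION_LAYER = "Session context: the caller is evaluating the product for a small team."

def build_messages(user_turn: str, history: list[dict]) -> list[dict]:
    """Assemble the layered system prompt plus the running conversation."""
    system = "\n\n".join([BASE_LAYER, PERSONA_LAYER, SESSION_LAYER])
    return [{"role": "system", "content": system},
            *history,
            {"role": "user", "content": user_turn}]

messages = build_messages("What does onboarding look like?", history=[])
```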

The infrastructure was initially tested with both on-prem (vLLM) and hosted cloud solutions (Cerebras, Fireworks). Tavus also uses vector databases and embedding models for storage and query optimization, with partners like Cerebras and Fireworks supporting the cloud infrastructure.
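
On the on-prem side, vLLM exposes a simple offline API for serving Llama checkpoints. A minimal sketch, assuming an 8B Instruct model:

```python
# Sketch of the on-prem path using vLLM's offline API. The checkpoint and
# sampling settings are examples; production use would apply the model's
# chat template rather than a raw prompt string.
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")
params = SamplingParams(temperature=0.7, max_tokens=128)

outputs = llm.generate(
    ["A caller asks: how is my data kept private on an on-prem deployment?"],
    params,
)
print(outputs[0].outputs[0].text)
```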

With Cerebras’s Llama implementation, Tavus achieved a 440%–550% latency improvement over high-latency models and a 25%–50% edge over comparable GPT models.

“Llama has been one of the least complex and most reliable components in our AI stack,” says Raza, who notes that it also benefits from strong community support and interoperability with internal workflows.

Llama 3.2 and 3.3, which add multimodal capabilities and smaller models suited to on-device and edge use cases, are helping Tavus explore new possibilities. In the future, the company hopes to expand the CVI platform’s capabilities, including enhanced speech recognition, turn detection, and visual question answering.

Learn more about Tavus.

