SPEECH & AUDIO

NLP

Seamless: Multilingual Expressive and Streaming Speech Translation

November 30, 2023

Abstract

Recent advancements in automatic speech translation have dramatically expanded language coverage, improved multimodal capabilities, and enabled a wide range of tasks and functionalities. That said, large-scale automatic speech translation systems today lack key features that help machine-mediated communication feel seamless when compared to human-to-human dialogue. In this work, we introduce a family of models that enable end-to-end expressive and multilingual translations in a streaming fashion. First, we contribute an improved version of the massively multilingual and multimodal SeamlessM4T model: SeamlessM4T v2. This newer model, incorporating an updated UnitY2 framework, was trained on more low-resource language data. The expanded version of SeamlessAlign adds 114,800 hours of automatically aligned data for a total of 76 languages. SeamlessM4T v2 provides the foundation on which our two newest models, SeamlessExpressive and SeamlessStreaming, are built. SeamlessExpressive enables translation that preserves vocal styles and prosody. Compared to previous efforts in expressive speech research, our work addresses certain underexplored aspects of prosody, such as speech rate and pauses, while also preserving the style of one’s voice. As for SeamlessStreaming, our model leverages the Efficient Monotonic Multihead Attention (EMMA) mechanism to generate low-latency target translations without waiting for complete source utterances. As the first of its kind, SeamlessStreaming enables simultaneous speech-to-speech/text translation for multiple source and target languages. To understand the performance of these models, we combined novel and modified versions of existing automatic metrics to evaluate prosody, latency, and robustness. For human evaluations, we adapted existing protocols tailored for measuring the most relevant attributes in the preservation of meaning, naturalness, and expressivity.
To ensure that our models can be used safely and responsibly, we implemented the first known red-teaming effort for multimodal machine translation, a system for the detection and mitigation of added toxicity, a systematic evaluation of gender bias, and an inaudible localized watermarking mechanism designed to dampen the impact of deepfakes. Consequently, we bring major components from SeamlessExpressive and SeamlessStreaming together to form Seamless, the first publicly available system that unlocks expressive cross-lingual communication in real time. In sum, Seamless gives us a pivotal look at the technical foundation needed to turn the Universal Speech Translator from a science fiction concept into a real-world technology. Finally, contributions in this work, including models, code, and a watermark detector, are publicly released and accessible at the link below.
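
To make the idea of translating "without waiting for complete source utterances" concrete, here is a minimal sketch of a simultaneous read/write loop. It is not EMMA and not the SeamlessStreaming implementation: EMMA learns when to read and when to write, whereas this sketch substitutes a much simpler fixed wait-k policy (start writing once the source is k chunks ahead of the target). The names `wait_k_stream`, `translate_prefix`, and `toy_translate` are hypothetical and exist only for illustration.

```python
from typing import Callable, Iterable, Iterator, List

EOS = "</s>"  # assumed end-of-sequence marker for this sketch


def wait_k_stream(
    source_chunks: Iterable[str],
    translate_prefix: Callable[[List[str], List[str]], str],
    k: int = 2,
    max_len: int = 1000,
) -> Iterator[str]:
    """Yield target tokens while the source is still arriving.

    Fixed wait-k policy (an assumption of this sketch, not EMMA):
    write one target token whenever the source prefix is at least
    k chunks longer than the target prefix.
    """
    source: List[str] = []
    target: List[str] = []
    for chunk in source_chunks:
        source.append(chunk)  # READ: consume one more source chunk
        if len(source) - len(target) >= k:
            # WRITE: predict the next token from partial source only
            token = translate_prefix(source, target)
            if token == EOS:
                return
            target.append(token)
            yield token
    # Source is complete: flush the remaining target tokens.
    while len(target) < max_len:
        token = translate_prefix(source, target)
        if token == EOS:
            return
        target.append(token)
        yield token


def toy_translate(source: List[str], target: List[str]) -> str:
    """Stand-in 'model': upper-cases the source chunk at the next target position."""
    if len(target) < len(source):
        return source[len(target)].upper()
    return EOS


if __name__ == "__main__":
    for token in wait_k_stream(["hello", "streaming", "world"], toy_translate, k=2):
        print(token)
```

With `k=2`, the first target token is produced after only two source chunks have been read, which is the latency benefit the abstract describes. A learned policy such as EMMA would replace the fixed `k` threshold with a data-dependent read/write decision.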

Download the Paper

AUTHORS

Written by

Seamless Communication

Loïc Barrault

Yu-An Chung

Mariano Coria Meglioli

David Dale

Ning Dong

Mark Duppenthaler

Paul-Ambroise Duquenne

Brian Ellis

Hady Elsahar

Justin Haaheim

John Hoffman

Min-Jae Hwang

Hirofumi Inaguma

Christopher Klaiber

Ilia Kulikov

Pengwei Li

Daniel Licht

Jean Maillard

Ruslan Mavlyutov

Alice Rakotoarison

Kaushik Ram Sadagopan

Abinesh Ramakrishnan

Tuan Tran

Guillaume Wenzek

Yilin Yang

Ethan Ye

Ivan Evtimov

Pierre Fernandez

Cynthia Gao

Prangthip Hansanti

Elahe Kalbassi

Amanda Kallet

Artyom Kozhevnikov

Gabriel Mejia Gonzalez

Robin San Roman

Christophe Touret

Corinne Wong

Carleigh Wood

Bokai Yu

Pierre Andrews

Can Balioglu

Peng-Jen Chen

Marta R. Costa-jussà

Maha Elbayad

Hongyu Gong

Francisco Guzmán

Kevin Heffernan

Somya Jain

Justine Kao

Ann Lee

Xutai Ma

Alexandre Mourachko

Benjamin Peloquin

Juan Pino

Sravya Popuri

Christophe Ropers

Safiyyah Saleem

Holger Schwenk

Anna Sun

Paden Tomasello

Changhan Wang

Jeff Wang

Skyler Wang

Mary Williamson

Publisher

arXiv
