November 30, 2023
Recent advancements in automatic speech translation have dramatically expanded language coverage, improved multimodal capabilities, and enabled a wide range of tasks and functionalities. That said, large-scale automatic speech translation systems today lack key features that help machine-mediated communication feel seamless when compared to human-to-human dialogue. In this work, we introduce a family of models that enable end-to-end expressive and multilingual translations in a streaming fashion. First, we contribute an improved version of the massively multilingual and multimodal SeamlessM4T model: SeamlessM4T v2. This newer model, incorporating an updated UnitY2 framework, was trained on more low-resource language data. The expanded version of SeamlessAlign adds 114,800 hours of automatically aligned data for a total of 76 languages. SeamlessM4T v2 provides the foundation on which our two newest models, SeamlessExpressive and SeamlessStreaming, are built. SeamlessExpressive enables translation that preserves vocal styles and prosody. Compared to previous efforts in expressive speech research, our work addresses certain underexplored aspects of prosody, such as speech rate and pauses, while also preserving the style of one’s voice. As for SeamlessStreaming, our model leverages the Efficient Monotonic Multihead Attention (EMMA) mechanism to generate low-latency target translations without waiting for complete source utterances. As the first of its kind, SeamlessStreaming enables simultaneous speech-to-speech/text translation for multiple source and target languages. To understand the performance of these models, we combined novel and modified versions of existing automatic metrics to evaluate prosody, latency, and robustness. For human evaluations, we adapted existing protocols tailored for measuring the most relevant attributes in the preservation of meaning, naturalness, and expressivity.
To ensure that our models can be used safely and responsibly, we implemented the first known red-teaming effort for multimodal machine translation, a system for the detection and mitigation of added toxicity, a systematic evaluation of gender bias, and an inaudible localized watermarking mechanism designed to dampen the impact of deepfakes. We then bring major components from SeamlessExpressive and SeamlessStreaming together to form Seamless, the first publicly available system that unlocks expressive cross-lingual communication in real time. In sum, Seamless gives us a pivotal look at the technical foundation needed to turn the Universal Speech Translator from a science fiction concept into a real-world technology. Finally, contributions in this work, including models, code, and a watermark detector, are publicly released and accessible at the link below.
Written by
Seamless Communication
Elahe Kalbassi
Xutai Ma
Abinesh Ramakrishnan
Alexandre Mourachko
Alice Rakotoarison
Amanda Kallet
Yu-An Chung
Anna Sun
Artyom Kozhevnikov
Benjamin Peloquin
Bokai Yu
Brian Ellis
Can Balioglu
Carleigh Wood
Christophe Ropers
Christophe Touret
Christopher Klaiber
Corinne Wong
Cynthia Gao
Daniel Licht
David Dale
Ethan Ye
Gabriel Mejia Gonzalez
Guillaume Wenzek
Hady Elsahar
Hirofumi Inaguma
Ilia Kulikov
Ivan Evtimov
Jean Maillard
Jeff Wang
John Hoffman
Justin Haaheim
Prangthip Hansanti
Kaushik Ram Sadagopan
Kevin Heffernan
Mariano Coria Meglioli
Mark Duppenthaler
Marta R. Costa-jussà
Mary Williamson
Min-Jae Hwang
Ning Dong
Paden Tomasello
Paul-Ambroise Duquenne
Peng-Jen Chen
Pengwei Li
Pierre Andrews
Pierre Fernandez
Robin San Roman
Ruslan Mavlyutov
Safiyyah Saleem
Skyler Wang
Somya Jain
Sravya Popuri
Tuan Tran
Yilin Yang
Publisher
arXiv
