A family of AI research models that enable more natural and authentic communication across languages
The Seamless Communication models
SeamlessExpressive: A model that aims to preserve the expression and intricacies of speech across languages.
SeamlessStreaming: A model that can deliver speech and text translations with around two seconds of latency.
SeamlessM4T v2: A foundational multilingual and multitask model that allows people to communicate effortlessly through speech and text.
Seamless: A model that merges capabilities from SeamlessExpressive, SeamlessStreaming and SeamlessM4T v2 into one.
Translations should capture the nuances of human expression. While existing translation tools are skilled at capturing the content of a conversation, they typically rely on monotone, robotic text-to-speech systems for their output. SeamlessExpressive aims to preserve the intricacies of speech, such as pauses and speech rate, in addition to vocal style and emotional tone.
English input: whisper
Please keep the volume down. We just put the baby to sleep.
Spanish output: non-expressive
Spanish output: expressive
English input: sad
Please, don't leave. I hate being here alone.
French output: non-expressive
French output: expressive
Near real-time translation
SeamlessStreaming is the first massively multilingual model that delivers translations with around two seconds of latency and nearly the same accuracy as an offline model. Built upon SeamlessM4T v2, SeamlessStreaming supports automatic speech recognition and speech-to-text translation for nearly 100 input and output languages, in addition to speech-to-speech translation for nearly 100 input languages and 36 output languages.
Foundational model for universal translation
In August 2023, we introduced the first version of SeamlessM4T, a foundational multilingual and multitask model that delivered state-of-the-art results for translation and transcription across speech and text. Building on this work, our improved model, SeamlessM4T v2, serves as the foundation for our new SeamlessExpressive and SeamlessStreaming models. It features a new architecture with a non-autoregressive text-to-unit decoder that delivers improved consistency between text and speech output.
Our approach to research
We believe in the power of collaboration and open research to break down communication barriers. To enable our fellow researchers to build upon this work, we’re publicly releasing the full suite of Seamless Communication models, along with metadata, data and tools.
Safety and responsibility
We’re dedicated to promoting a safe and responsible AI ecosystem. We have taken a number of steps to improve the safety of our Seamless Communication models, including significantly reducing the impact of hallucinated toxicity in translations and implementing a custom watermarking approach for audio outputs from our expressive models.