AI research by Meta

Seamless Communication

A significant step towards removing language barriers through expressive, fast and high-quality AI translation

A family of AI research models that enable more natural and authentic communication across languages

The Seamless Communication models

SeamlessExpressive

A model that aims to preserve the expression and intricacies of speech across languages.

SeamlessStreaming

A model that can deliver speech and text translations with around two seconds of latency.

SeamlessM4T v2

A foundational multilingual and multitask model that allows people to communicate effortlessly through speech and text.

Seamless

A model that merges capabilities from SeamlessExpressive, SeamlessStreaming and SeamlessM4T v2 into one.


Preserving prosody

SeamlessExpressive

Translations should capture the nuances of human expression. While existing translation tools are skilled at capturing the content of a conversation, they typically rely on monotone, robotic text-to-speech systems for their output. SeamlessExpressive aims to preserve the intricacies of speech, such as pauses and speech rate, in addition to vocal style and emotional tone.

English input: whisper

Please keep the volume down. We just put the baby to sleep.

Spanish output: non-expressive

Spanish output: expressive

English input: sad

Please, don't leave. I hate being here alone.

French output: non-expressive

French output: expressive

Near real-time translation

SeamlessStreaming

SeamlessStreaming is the first massively multilingual model that delivers translations with around two seconds of latency and nearly the same accuracy as an offline model. Built upon SeamlessM4T v2, SeamlessStreaming supports automatic speech recognition and speech-to-text translation for nearly 100 input and output languages, in addition to speech-to-speech translation for nearly 100 input languages and 36 output languages.

Foundational model for universal translation

SeamlessM4T v2

In August 2023, we introduced the first version of SeamlessM4T, a foundational multilingual and multitask model that delivered state-of-the-art results for translation and transcription across speech and text. Built upon this work, our improved model, SeamlessM4T v2, serves as the foundation for our new SeamlessExpressive and SeamlessStreaming models. It features a new architecture with a non-autoregressive text-to-unit decoder that delivers improved consistency between text and speech output.

More model details

Learn more about the research behind Seamless Communication

Technical overview

Try the SeamlessExpressive demo

Try the SeamlessExpressive demo to hear how you sound in a different language while maintaining elements of your expression and tone.

Our approach to research

Open innovation

We believe in the power of collaboration and open research to break down communication barriers. To enable our fellow researchers to build upon this work, we’re publicly releasing the full suite of Seamless Communication models, along with metadata, data and tools.

Safety and responsibility

We’re dedicated to promoting a safe and responsible AI ecosystem. We have taken a number of steps to improve the safety of our Seamless Communication models, significantly reducing the impact of hallucinated toxicity in translations and implementing a custom watermarking approach for audio outputs from our expressive models.