RESEARCH

SPEECH & AUDIO

Omnilingual SONAR: Cross-Lingual and Cross-Modal Sentence Embeddings Bridging Massively Multilingual Text and Speech

March 17, 2026

Abstract

Cross-lingual sentence encoders have traditionally been limited to a few hundred languages and have sacrificed downstream performance to achieve better alignment across languages, limiting their adoption. In this work, we introduce OmniSONAR, a novel family of omnilingual, cross-lingual and cross-modal sentence embedding models that breaks this barrier. We establish a unified semantic space, natively encompassing text, speech, code and mathematical expressions, while achieving state-of-the-art downstream performance at an unprecedented scale of thousands of languages, from high-resource languages to extremely low-resource varieties. To achieve this scale without representation collapse, and while maintaining top-tier performance on high-resource languages, we employ a progressive training strategy. We first build a state-of-the-art foundational embedding space for 200 languages using an LLM-initialized encoder-decoder, combining token-level decoding with a novel split-softmax contrastive loss and synthetic hard negatives. Leveraging this strong foundational space, we expand to several thousand language varieties via a specialized two-stage teacher-student encoder distillation framework. Further modeling extensions derived from OmniSONAR address long-context inputs and token-centric representations. Finally, we demonstrate the cross-modal extensibility of this space by seamlessly mapping 177 spoken languages into it. OmniSONAR redefines the state of the art for multilingual representation learning. It halves the cross-lingual similarity search error rate of the previous best models on the 200 languages of FLORES, while also achieving a staggering 15-fold error rate reduction across 1,560 languages on the BIBLE benchmark.
Furthermore, our embedding model enables unprecedented translation capabilities, outperforming NLLB-3B on several multilingual benchmarks and surpassing all previous models, including multi-billion-parameter LLMs, by 15 chrF++ points in 1,560→English translation on the BIBLE benchmark. Beyond alignment and translation, OmniSONAR demonstrates strong general-purpose capabilities across downstream embedding tasks on MTEB and programming languages on XLCoST. For the speech modality, our massively multilingual extension exhibits a 43% lower error rate in cross-lingual and cross-modal similarity search, while achieving 97% of SeamlessM4T performance in speech-to-text translation, despite being a zero-shot translation model trained only with ASR data. Finally, by training Spectrum, an encoder-decoder language model that processes OmniSONAR sequences, exclusively on English text, we unlock immediate high-performance transfer to thousands of languages and the speech modality for complex downstream tasks. These outstanding results position OmniSONAR as a robust, language- and modality-agnostic foundation for any downstream usage.
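The abstract's training recipe pairs a contrastive objective with synthetic hard negatives. The paper's split-softmax variant is not specified on this page, so the sketch below shows only the standard InfoNCE-style setup it builds on: each source sentence embedding is scored against its matching translation plus in-batch and hard-negative distractors. All function and variable names here are illustrative assumptions, not the authors' code.

```python
import numpy as np

def contrastive_loss(src, tgt, hard_neg, temperature=0.05):
    """InfoNCE-style loss (illustrative, not OmniSONAR's split-softmax).

    src, tgt:  (B, d) paired sentence embeddings (e.g. source / translation).
    hard_neg:  (H, d) synthetic hard-negative embeddings added to the pool.
    """
    # L2-normalize so dot products are cosine similarities.
    src = src / np.linalg.norm(src, axis=1, keepdims=True)
    tgt = tgt / np.linalg.norm(tgt, axis=1, keepdims=True)
    hard_neg = hard_neg / np.linalg.norm(hard_neg, axis=1, keepdims=True)

    # Candidate pool: all in-batch targets plus the hard negatives.
    candidates = np.concatenate([tgt, hard_neg], axis=0)      # (B + H, d)
    logits = src @ candidates.T / temperature                 # (B, B + H)

    # Cross-entropy with the matching translation (diagonal) as positive.
    logits -= logits.max(axis=1, keepdims=True)               # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    positives = log_probs[np.arange(len(src)), np.arange(len(src))]
    return -positives.mean()
```

Lowering this loss pulls translation pairs together in the shared space while pushing apart the distractors; the hard negatives sharpen that pressure where random in-batch negatives would be too easy.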


AUTHORS

Written by

Omnilingual SONAR Team

João Maria Janeiro

Pere Lluís Huguet Cabot

Ioannis Tsiamas

Yen Meng

Vivek Iyer

Guillem Ramirez

Loic Barrault

Belen Alastruey

Yu-An Chung

Marta R. Costa-jussa

David Dale

Kevin Heffernan

Jaehyeong Jo

Artyom Kozhevnikov

Alexandre Mourachko

Christophe Ropers

Holger Schwenk

Paul-Ambroise Duquenne

Publisher

arXiv

Related Publications

March 17, 2026

RESEARCH

NLP

Omnilingual MT: Machine Translation for 1,600 Languages

Omnilingual MT Team, Belen Alastruey, Niyati Bafna, Andrea Caciolai, Kevin Heffernan, Artyom Kozhevnikov, Christophe Ropers, Eduardo Sánchez, Charles-Eric Saint-James, Ioannis Tsiamas, Chierh CHENG, Joe Chuang, Paul-Ambroise Duquenne, Mark Duppenthaler, Nate Ekberg, Cynthia Gao, Pere Lluís Huguet Cabot, João Maria Janeiro, Jean Maillard, Gabriel Mejia Gonzalez, Holger Schwenk, Edan Toledo, Arina Turkatenko, Albert Ventayol-Boada, Rashel Moritz, Alexandre Mourachko, Surya Parimi, Mary Williamson, Shireen Yates, David Dale, Marta R. Costa-jussa


February 27, 2026

HUMAN & MACHINE INTELLIGENCE

RESEARCH

Unified Vision–Language Modeling via Concept Space Alignment

Yifu Qiu, Paul-Ambroise Duquenne, Holger Schwenk


February 26, 2026

CONVERSATIONAL AI

RESEARCH

Learning Personalized Agents from Human Feedback

Kaiqu Liang, Julia Kruk, Shengyi Qian, Xianjun Yang, Shengjie Bi, Shaoliang Nie, Michael Zhang, Lijuan Liu, Jaime Fernández Fisac, Shuyan Zhou, Saghar Hosseini


February 11, 2026

RESEARCH

COMPUTER VISION

UniT: Unified Multimodal Chain-of-Thought Test-time Scaling

Leon Liangyu Chen, Haoyu Ma, Zhipeng Fan, Ziqi Huang, Animesh Sinha, Xiaoliang Dai, Jialiang Wang, Zecheng He, Jianwei Yang, Chunyuan Li, Junzhe Sun, Chu Wang, Serena Yeung-Levy, Felix Juefei-Xu

