Omnilingual SONAR: Cross-Lingual and Cross-Modal Sentence Embeddings Bridging Massively Multilingual Text and Speech

March 17, 2026

Abstract

Cross-lingual sentence encoders have traditionally been limited to a few hundred languages and have sacrificed downstream performance to achieve better cross-lingual alignment, which has limited their adoption. In this work, we introduce OmniSONAR, a novel family of omnilingual, cross-lingual and cross-modal sentence embedding models that breaks this barrier. We establish a unified semantic space that natively encompasses text, speech, code and mathematical expressions, while achieving state-of-the-art downstream performance at an unprecedented scale of thousands of languages, from high-resource languages to extremely low-resource varieties. To reach this scale without representation collapse, while maintaining top-tier performance in high-resource languages, we employ a progressive training strategy. We first build a state-of-the-art foundational embedding space for 200 languages using an LLM-initialized encoder-decoder, combining token-level decoding with a novel split-softmax contrastive loss and synthetic hard negatives. Building on this foundational space, we expand to several thousand language varieties via a specialized two-stage teacher-student encoder distillation framework. Further modeling extensions derived from OmniSONAR address long-context inputs and token-centric representations. Finally, we demonstrate the cross-modal extensibility of this space by mapping 177 spoken languages into it. OmniSONAR redefines the state of the art in multilingual representation learning: it halves the cross-lingual similarity search error rate of the previous best models on the 200 languages of FLORES, and achieves a 15-fold error rate reduction across 1,560 languages in the BIBLE benchmark.
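The abstract does not detail its two-stage teacher-student distillation framework. As a rough illustration of the general teacher-student idea only, the sketch below assumes a mean-squared-error objective (a common choice for embedding distillation, not confirmed by the source): a frozen teacher embeds one side of a parallel pair, and the student is updated so that its embedding of the other side lands on the teacher's point. A toy linear map stands in for the student encoder, and all names (`mse`, `linear`, `distill_step`) are hypothetical.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def mse(a: Vector, b: Vector) -> float:
    """Mean squared error between two embeddings of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def linear(W: Matrix, x: Vector) -> Vector:
    """Toy stand-in for the student encoder: a bias-free linear map W @ x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def distill_step(W: Matrix, x: Vector, teacher_emb: Vector, lr: float) -> float:
    """One SGD step on MSE(student(x), teacher_emb).

    The teacher embedding is treated as a fixed target (frozen teacher), so only
    the student weights W change. Updates W in place and returns the pre-step loss.
    """
    pred = linear(W, x)
    loss = mse(pred, teacher_emb)
    d = len(pred)
    for i in range(d):
        grad_out = 2.0 * (pred[i] - teacher_emb[i]) / d  # dLoss/dpred_i
        for j in range(len(x)):
            W[i][j] -= lr * grad_out * x[j]  # dLoss/dW_ij = grad_out * x_j
    return loss


if __name__ == "__main__":
    # Hypothetical input features for a low-resource sentence, and the frozen
    # teacher's embedding of its high-resource translation.
    x = [1.0, 0.5, -0.5]
    teacher = [0.2, -0.1]
    W = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
    losses = [distill_step(W, x, teacher, lr=0.5) for _ in range(50)]
    print(f"first loss {losses[0]:.4f}, last loss {losses[-1]:.6f}")
```

Because the teacher is frozen in this setup, only the student moves toward the existing space, which is why this style of distillation can add new languages without disturbing embeddings that are already aligned.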
Beyond alignment, OmniSONAR enables unprecedented translation capabilities: it outperforms NLLB-3B on several multilingual benchmarks and surpasses all previous models, including multi-billion-parameter LLMs, by 15 chrF++ points on 1,560→English translation in the BIBLE benchmark. OmniSONAR also shows strong general-purpose capabilities on downstream embedding tasks in MTEB and on programming-language tasks in XLCoST. For the speech modality, our massively multilingual extension achieves a 43% lower error rate in cross-lingual and cross-modal similarity search and reaches 97% of SeamlessM4T's speech-to-text translation performance, despite being a zero-shot translation model trained only on ASR data. Finally, by training Spectrum, an encoder-decoder language model that processes OmniSONAR sequences, exclusively on English text, we unlock immediate high-performance transfer to thousands of languages and to the speech modality on complex downstream tasks. These results position OmniSONAR as a robust, language- and modality-agnostic foundation for downstream use.
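The similarity search error rates quoted above measure how often a sentence's nearest neighbor in the other language (or modality) is not its actual translation. The sketch below computes an error rate of this kind for one search direction, under stated assumptions: embeddings are plain Python lists, row i of the source side translates row i of the target side, and neighbors are ranked by plain cosine similarity. The paper's exact scoring (for instance a margin-based criterion) may differ, and `cosine` and `xsim_error_rate` are hypothetical helper names.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def xsim_error_rate(src: Sequence[Vector], tgt: Sequence[Vector]) -> float:
    """Fraction of source embeddings whose cosine nearest neighbor in tgt is not
    the aligned row (row i of src is assumed to translate row i of tgt)."""
    if len(src) != len(tgt) or not src:
        raise ValueError("src and tgt must be non-empty and aligned row by row")
    errors = 0
    for i, s in enumerate(src):
        scores: List[float] = [cosine(s, t) for t in tgt]
        # max() returns the first maximal index, so ties go to the lowest index.
        nearest = max(range(len(tgt)), key=scores.__getitem__)
        if nearest != i:
            errors += 1
    return errors / len(src)


if __name__ == "__main__":
    src = [[1.0, 0.0], [0.0, 1.0]]
    aligned = [[0.9, 0.1], [0.1, 0.9]]  # each target is closest to its own source
    swapped = [[0.1, 0.9], [0.9, 0.1]]  # each target is closest to the other source
    print(xsim_error_rate(src, aligned))  # 0.0
    print(xsim_error_rate(src, swapped))  # 1.0
```

Under this definition, "halving the error rate" means, for example, going from 10% to 5% of sentences retrieving the wrong translation.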

AUTHORS

Written by

Omnilingual SONAR Team, Ioannis Tsiamas, Yen Meng, Vivek Iyer, Guillem Ramirez, Jaehyeong Jo, Alexandre Mourachko, Yu-An Chung, Artyom Kozhevnikov, Belen Alastruey, Christophe Ropers, David Dale, Holger Schwenk, João Maria Janeiro, Kevin Heffernan, Loic Barrault, Marta R. Costa-jussa, Paul-Ambroise Duquenne, Pere Lluís Huguet Cabot

Publisher

arXiv

