October 03, 2024
We present BLASER 2.0, an automatic metric of machine translation quality which supports both speech and text modalities. Compared to its predecessor BLASER (Chen et al., 2023), BLASER 2.0 is based on better underlying text and speech representations that cover 202 text languages and 57 speech ones and extends the training data. BLASER 2.0 comes in two varieties: a reference-based and a reference-free (quality estimation) model. We demonstrate that the reference-free version is applicable not only at the dataset level, for evaluating the overall model performance, but also at the sentence level, for scoring individual translations. In particular, we show its applicability for detecting translation hallucinations and filtering training datasets to obtain more reliable translation models. The BLASER 2.0 models are publicly available at https://github.com/facebookresearch/sonar.
Written by
David Dale
Marta R. Costa-jussa
Publisher
EMNLP
Research Topics
February 07, 2025
The Omnilingual MT Team, Pierre Andrews, Mikel Artetxe, Mariano Coria Meglioli, Marta R. Costa-jussa, Joe Chuang, David Dale, Cynthia Gao, Jean Maillard, Alexandre Mourachko, Christophe Ropers, Safiyyah Saleem, Eduardo Sánchez, Yiannis Tsiamas, Arina Turkatenko, Albert Ventayol, Shireen Yates
February 07, 2025
February 06, 2025
Jarod Levy, Mingfang (Lucy) Zhang, Svetlana Pinet, Jérémy Rapin, Hubert Jacob Banville, Stéphane d'Ascoli, Jean Remi King
February 06, 2025
February 06, 2025
Mingfang (Lucy) Zhang, Jarod Levy, Stéphane d'Ascoli, Jérémy Rapin, F.-Xavier Alario, Pierre Bourdillon, Svetlana Pinet, Jean Remi King
February 06, 2025
January 04, 2025
January 04, 2025
Foundational models
Our approach
Latest news
Foundational models