July 28, 2019
In this paper, we study the problem of hybrid language modeling, that is, using models that can predict both characters and larger units such as character n-grams or words. With such models, a given string usually admits multiple segmentations, for example one using words and one using characters only. The probability of a string is therefore the sum of the probabilities of all its possible segmentations. We show how to marginalize over the segmentations efficiently in order to compute the true probability of a sequence. We apply our technique to three datasets covering seven languages, and show improvements over a strong character-level language model.
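To illustrate what summing over segmentations involves, here is a minimal sketch in Python. It is not the model from the paper: it assumes a toy vocabulary in which every unit has a fixed, context-independent probability, so the marginal can be computed with a simple forward dynamic program over string positions. The names `marginal_log_prob`, `brute_force_log_prob`, and `toy_log_probs`, as well as the probabilities themselves, are hypothetical and chosen only for illustration.

```python
import math
from itertools import combinations


def marginal_log_prob(text, unit_log_probs, max_unit_len):
    """Log of the summed probability of every segmentation of `text`.

    Assumption: each unit has a fixed log-probability in `unit_log_probs`,
    independent of context (a simplification made for this sketch).
    """
    n = len(text)
    # alpha[j] = log of the total probability of all segmentations of text[:j]
    alpha = [-math.inf] * (n + 1)
    alpha[0] = 0.0  # the empty prefix has exactly one (empty) segmentation
    for j in range(1, n + 1):
        terms = []
        for i in range(max(0, j - max_unit_len), j):
            unit = text[i:j]
            if unit in unit_log_probs and alpha[i] > -math.inf:
                terms.append(alpha[i] + unit_log_probs[unit])
        if terms:
            # log-sum-exp keeps the sum numerically stable
            m = max(terms)
            alpha[j] = m + math.log(sum(math.exp(t - m) for t in terms))
    return alpha[n]


def brute_force_log_prob(text, unit_log_probs):
    """Same quantity, computed by enumerating every set of cut points."""
    n = len(text)
    total = 0.0
    for k in range(n):
        for cuts in combinations(range(1, n), k):
            bounds = (0, *cuts, n)
            units = [text[a:b] for a, b in zip(bounds, bounds[1:])]
            if all(u in unit_log_probs for u in units):
                total += math.exp(sum(unit_log_probs[u] for u in units))
    return math.log(total) if total > 0 else -math.inf


# Hypothetical toy vocabulary mixing single characters and one larger unit.
toy_log_probs = {u: math.log(p) for u, p in {"a": 0.2, "b": 0.3, "ab": 0.1}.items()}

print(math.exp(marginal_log_prob("ab", toy_log_probs, max_unit_len=2)))
```

For the string "ab", the two segmentations are "a" followed by "b" and the single unit "ab", so the marginal is 0.2 × 0.3 + 0.1. The dynamic program visits each position once and looks back at most `max_unit_len` characters, whereas the brute-force version enumerates a number of segmentations that grows exponentially with the string length.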
March 17, 2026
Omnilingual MT Team, Niyati Bafna, Ioannis Tsiamas, Mark Duppenthaler, Albert Ventayol-Boada, Alexandre Mourachko, Andrea Caciolai, Arina Turkatenko, Artyom Kozhevnikov, Belen Alastruey, Charles-Eric Saint-James, Chierh CHENG, Christophe Ropers, Cynthia Gao, David Dale, Edan Toledo, Eduardo Sánchez, Gabriel Mejia Gonzalez, Holger Schwenk, Jean Maillard, Joe Chuang, João Maria Janeiro, Kevin Heffernan, Marta R. Costa-jussa, Mary Williamson, Nate Ekberg, Paul-Ambroise Duquenne, Pere Lluís Huguet Cabot, Rashel Moritz, Shireen Yates, Surya Parimi
March 17, 2026
Omnilingual SONAR Team, Ioannis Tsiamas, Yen Meng, Vivek Iyer, Guillem Ramirez, Jaehyeong Jo, Alexandre Mourachko, Yu-An Chung, Artyom Kozhevnikov, Belen Alastruey, Christophe Ropers, David Dale, Holger Schwenk, João Maria Janeiro, Kevin Heffernan, Loic Barrault, Marta R. Costa-jussa, Paul-Ambroise Duquenne, Pere Lluís Huguet Cabot
February 27, 2026
Yifu Qiu, Holger Schwenk, Paul-Ambroise Duquenne
November 10, 2025
Omnilingual ASR team, Skyler Wang, Ife Adebara, Michael Auli, Kaushik Ram Sadagopan, Zheng-Xin Yong, Albert Ventayol-Boada, Alexandre Mourachko, Alexander Erben, Yu-An Chung, Arina Turkatenko, Artyom Kozhevnikov, Caley Drooff, Can Balioglu, Chierh Cheng, Christophe Ropers, Cynthia Gao, Gabriel Mejia Gonzalez, Gil Keren, Jean Maillard, Joe Chuang, Kehan Lyu, Kevin Chan, Mark Duppenthaler, Mary Williamson, Matthew Setzler, Paul-Ambroise Duquenne, Rashel Moritz, Safiyyah Saleem, Sagar Miglani, Shireen Yates, Vineel Pratap, Yen Meng
October 31, 2019
Peng-Jen Chen, Jiajun Shen, Matt Le, Vishrav Chaudhary, Ahmed El-Kishky, Guillaume Wenzek, Myle Ott, Marc’Aurelio Ranzato
March 14, 2019
Ryan Lowe, Jakob Foerster, Y-Lan Boureau, Joelle Pineau, Yann Dauphin
January 13, 2020
Vineel Pratap, Qiantong Xu, Jacob Kahn, Gilad Avidov, Tatiana Likhomanenko, Awni Hannun, Vitaliy Liptchinsky, Gabriel Synnaeve, Ronan Collobert
April 30, 2018
Zhilin Yang, Saizheng Zhang, Jack Urbanek, Will Feng, Alexander H. Miller, Arthur Szlam, Douwe Kiela, Jason Weston
