November 04, 2020
In this paper, we investigate out-of-vocabulary (OOV) word recovery in hybrid automatic speech recognition (ASR) systems, with emphasis on dynamic vocabulary expansion for both Weight Finite State Transducer (WFST)-based decoding and word-level RNNLM rescoring. We first describe our OOV candidate generation method based on a hybrid lexical model (HLM) with phoneme-sequence constraints. Next, we introduce a framework for efficient second pass OOV recovery with a dynamically expanded vocabulary, showing that, by calibrating OOV candidates' language model (LM) scores, it significantly improves OOV recovery and overall decoding performance compared to HLM-based first pass decoding. Finally we propose an open-vocabulary word-level recurrent neural network language model (RNNLM) re-scoring framework, making it possible to re-score ASR hypotheses containing recovered OOVs, using a single word-level RNNLM ignorant of OOVs when it was trained. By evaluating OOV recovery and overall decoding performance on Spanish/English ASR `tasks, we show the proposed OOV recovery pipeline has the potential of an efficient open-vocab word-based ASR decoding framework, with minimal extra computation versus a standard WFST based decoding and RNNLM rescoring pipeline.
Publisher
ICASSP
Research Topics
April 16, 2026
Karen Hambardzumyan, Nicolas Baldwin, Edan Toledo, Rishi Hazra, Michael Kuchnik, Bassel Al Omari, Thomas Simon Foster, Anton Protopopov, Jean-Christophe Gagnon-Audet, Ishita Mediratta, Kelvin Niu, Michael Shvartsman, Alisia Lupidi, Alexis Audran-Reiss, Parth Pathak, Tatiana Shavrina, Despoina Magka, Hela Momand, Derek Dunfield, Nicola Cancedda, Pontus Stenetorp, Carole-Jean Wu, Jakob Foerster, Yoram Bachrach, Martin Josifoski
April 16, 2026
March 24, 2026
Jenny Zhang, Bingchen Zhao, Winnie Yang, Jakob Foerster, Sam Devlin, Tatiana Shavrina
March 24, 2026
March 17, 2026
Omnilingual MT Team, Belen Alastruey, Niyati Bafna, Andrea Caciolai, Kevin Heffernan, Artyom Kozhevnikov, Christophe Ropers, Eduardo Sánchez, Charles-Eric Saint-James, Ioannis Tsiamas, Chierh CHENG, Joe Chuang, Paul-Ambroise Duquenne, Mark Duppenthaler, Nate Ekberg, Cynthia Gao, Pere Lluís Huguet Cabot, João Maria Janeiro, Jean Maillard, Gabriel Mejia Gonzalez, Holger Schwenk, Edan Toledo, Arina Turkatenko, Albert Ventayol-Boada, Rashel Moritz, Alexandre Mourachko, Surya Parimi, Mary Williamson, Shireen Yates, David Dale, Marta R. Costa-jussa
March 17, 2026
March 17, 2026
Omnilingual SONAR Team, João Maria Janeiro, Pere Lluís Huguet Cabot, Ioannis Tsiamas, Yen Meng, Vivek Iyer, Guillem Ramirez, Loic Barrault, Belen Alastruey, Yu-An Chung, Marta R. Costa-jussa, David Dale, Kevin Heffernan, Jaehyeong Jo, Artyom Kozhevnikov, Alexandre Mourachko, Christophe Ropers, Holger Schwenk, Paul-Ambroise Duquenne
March 17, 2026

Our approach
Latest news
Foundational models