October 16, 2019
The transcriptions used to train an Automatic Speech Recognition (ASR) system may contain errors. Usually, either a quality control stage discards transcriptions with too many errors, or the noisy transcriptions are used as is. We introduce Lead2Gold, a method for training an ASR system that exploits the full potential of noisy transcriptions. Based on a noise model of transcription errors, Lead2Gold searches for better transcriptions of the training data with a beam search that takes this noise model into account. The beam search is differentiable and does not require a forced alignment step, so the whole system is trained end-to-end. Lead2Gold can be viewed as a new loss function that can be used on top of any sequence-to-sequence deep neural network. We conduct proof-of-concept experiments on noisy transcriptions generated from letter corruptions at different noise levels. We show that Lead2Gold obtains better ASR accuracy than a competitive baseline that does not account for the (artificially introduced) transcription noise.
Publisher
ASRU
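To make the experimental setup concrete, the following is a minimal sketch of how noisy transcriptions could be generated from letter corruptions at a chosen noise level. The abstract does not specify the corruption scheme, so everything here is an assumption for illustration: the function name `corrupt_transcription`, the alphabet, the three operations (substitution, insertion, deletion), and the choice to pick among them uniformly are all hypothetical and are not taken from the paper.

```python
import random
import string

# Assumed alphabet for illustration: lowercase letters plus space.
ALPHABET = string.ascii_lowercase + " "


def corrupt_transcription(text, noise_level, rng=None,
                          ops=("substitute", "insert", "delete")):
    """Corrupt each character independently with probability `noise_level`.

    Hypothetical scheme: a corrupted character is substituted, followed by
    an inserted random letter, or deleted, with the operation drawn
    uniformly from `ops`.
    """
    if not 0.0 <= noise_level <= 1.0:
        raise ValueError("noise_level must be in [0, 1]")
    rng = rng or random.Random()
    out = []
    for ch in text:
        # Keep the character unchanged with probability 1 - noise_level.
        if rng.random() >= noise_level:
            out.append(ch)
            continue
        op = rng.choice(ops)
        if op == "substitute":
            # Replace with a different symbol from the alphabet.
            out.append(rng.choice([c for c in ALPHABET if c != ch]))
        elif op == "insert":
            # Keep the character and add a random symbol after it.
            out.append(ch)
            out.append(rng.choice(ALPHABET))
        elif op == "delete":
            # Drop the character entirely.
            pass
        else:
            raise ValueError(f"unknown operation: {op}")
    return "".join(out)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    for level in (0.0, 0.1, 0.3):
        print(level, repr(corrupt_transcription("the cat sat on the mat", level, rng)))
```

Under these assumptions, sweeping `noise_level` yields training transcriptions of increasing corruption, which is the kind of input a noise-aware loss such as Lead2Gold is meant to handle.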