Research

NLP

Training ASR models by Generation of Contextual Information

May 4, 2020

Abstract

Supervised ASR models have reached unprecedented levels of accuracy, thanks in part to ever-increasing amounts of labelled training data. However, in many applications and locales, only moderate amounts of data are available, which has led to a surge in semi- and weakly-supervised learning research. In this paper, we conduct a large-scale study evaluating the effectiveness of weakly-supervised learning for speech recognition by using loosely related contextual information as a surrogate for ground-truth labels. For weakly supervised training, we use 50k hours of public English social media videos along with their respective titles and post text to train an encoder-decoder transformer model. Our best encoder-decoder models achieve an average of 20.8% WER reduction over a 1000 hours supervised baseline, and an average of 13.4% WER reduction when using only the weakly supervised encoder for CTC fine-tuning. Our results show that our setup for weak supervision improved both the encoder acoustic representations as well as the decoder language generation abilities.

Download the Paper

AUTHORS

Written by

Kritika Singh

Dmytro Okhonko

Jun Liu

Yongqiang Wang

Frank Zhang

Ross Girshick

Sergey Edunov

Fuchun Peng

Yatharth Saraf

Geoffrey Zweig

Abdelrahman Mohamed

Related Publications

June 05, 2026

Conversational AI

Ranking & Recommendations

Superintelligent Retrieval Agent: The Next Frontier of Agentic Retrieval

Anshumali Shrivastava, Jason Chen, Qi Ma, Zeyu Yang

June 05, 2026

May 19, 2026

Human & Machine Intelligence

EgoBabyVLM: Benchmarking Cross-Modal Learning from Naturalistic Egocentric Video Data

Alvin W. M. Tan, Nicolas Hamilakis, Manel Khentout, Sho Tsuji, Balázs Kégl, Michael C. Frank, Angel Villar Corrales, Charles-Eric Saint-James, Dongyan Lin, Emmanuel Dupoux, Jiayi Shen, Juan Pino, Mahi Luthra, Martin Gleize, Phillip Rust, Rashel Moritz, Sheila Krogh-Jespersen, Surya Parimi, Tom Fizycki, Vanessa Stark, Yosuke Higuchi, Youssef Benchekroun

May 19, 2026

May 17, 2026

Conversational AI

GIM: Evaluating models via tasks that integrate multiple cognitive domains

Alexandre Rezende, Rohit Patel, Steven McClain

May 17, 2026

May 12, 2026

Human & Machine Intelligence

NeuralSet: A High-Performing Python Package for Neuro-AI

Corentin Bel, Linnea Evanson, Julien Gadonneix, Andrea Santos Revilla, Mingfang (Lucy) Zhang, Julie Bonnaire, Charlotte Caucheteux, Alexandre Défossez, Théo Desbordes, Pablo Diego-Simón, Shubh Khanna, Juliette Millet, Pierre Orhan, Saarang Panchavati, Antoine Ratouchniak, Alexis Thual, Hubert Jacob Banville, Jarod Levy, Jean Remi King, Josephine Raugel, Jérémy Rapin, Katelyn Begany, Marlene Careil, Simon Dahan, Sophia Houhamdi, Stéphane d'Ascoli, Teon Brooks, Yohann Benchetrit

May 12, 2026

October 31, 2019

NLP

Facebook AI's WAT19 Myanmar-English Translation Task Submission

Peng-Jen Chen, Jiajun Shen, Matt Le, Vishrav Chaudhary, Ahmed El-Kishky, Guillaume Wenzek, Myle Ott, Marc’Aurelio Ranzato

October 31, 2019

March 14, 2019

NLP

On the Pitfalls of Measuring Emergent Communication | Facebook AI Research

Ryan Lowe, Jakob Foerster, Y-Lan Boureau, Joelle Pineau, Yann Dauphin

March 14, 2019

January 13, 2020

NLP

Scaling up online speech recognition using ConvNets | Facebook AI Research

Vineel Pratap, Qiantong Xu, Jacob Kahn, Gilad Avidov, Tatiana Likhomanenko, Awni Hannun, Vitaliy Liptchinsky, Gabriel Synnaeve, Ronan Collobert

January 13, 2020

April 30, 2018

NLP

Computer Vision

Mastering the Dungeon: Grounded Language Learning by Mechanical Turker Descent | Facebook AI Research

Zhilin Yang, Saizheng Zhang, Jack Urbanek, Will Feng, Alexander H. Miller, Arthur Szlam, Douwe Kiela, Jason Weston

April 30, 2018

Help Us Pioneer The Future of AI

We share our open source frameworks, tools, libraries, and models for everything from research exploration to large-scale production deployment.