April 30, 2020
We propose autoencoding speaker conversion for training data augmentation in automatic speech translation. This technique directly transforms an audio sequence, resulting in audio synthesized to resemble another speaker's voice. Our method compares favorably to SpecAugment on English→French and English→Romanian automatic speech translation (AST) tasks as well as on a low-resource English automatic speech recognition (ASR) task. Further, in ablations, we show the benefits of both quantity and diversity in augmented data. Finally, we show that we can combine our approach with augmentation by machine-translated transcripts to obtain a competitive end-to-end AST model that outperforms a very strong cascade model on an English→French AST task. Our method is sufficiently general that it can be applied to other speech generation and analysis tasks.
Publisher
ICASSP
Research Topics
May 06, 2026
Saarang Panchavati, Antoine Ratouchniak, Mingfang (Lucy) Zhang, Elisa Cascardi, Hubert Banville, Jarod Levy, Jean-Rémi King, Jérémy Rapin, Katelyn Begany, Marlene Careil, Simon Dahan, Stéphane d'Ascoli, Teon Brooks, Yohann Benchetrit
May 06, 2026
May 04, 2026
Sachin Mehta, Alisa Liu, Margaret Li, Artidoro Pagnoni, Gargi Ghosh, Luke Zettlemoyer, Mike Lewis, Srini Iyer, Tomasz Limisiewicz
May 04, 2026
April 16, 2026
Nicola Cancedda, Pontus Stenetorp, Alexis Audran-Reiss, Alisia Lupidi, Anton Protopopov, Bassel Al Omari, Carole-Jean Wu, Derek Dunfield, Despoina Magka, Edan Toledo, Hela Momand, Ishita Mediratta, Jakob Foerster, Jean-Christophe Gagnon-Audet, Karen Hambardzumyan, Kelvin Niu, Martin Josifoski, Michael Kuchnik, Michael Shvartsman, Nicolas Baldwin, Parth Pathak, Rishi Hazra, Tatiana Shavrina, Thomas Simon Foster, Yoram Bachrach
April 16, 2026
March 24, 2026
Jenny Zhang, Bingchen Zhao, Jakob Foerster, Sam Devlin, Tatiana Shavrina, Winnie Yang
March 24, 2026

Our approach
Latest news
Foundational models