January 13, 2020
We design an online end-to-end speech recognition system based on Time-Depth Separable (TDS) convolutions and Connectionist Temporal Classification (CTC). The system has almost three times the throughput of a well tuned hybrid ASR baseline while also having lower latency and a better word error rate. We improve the core TDS architecture in order to limit the future context and hence reduce latency while maintaining accuracy. Also important to the efficiency of the recognizer is our highly optimized beam search decoder. To show the impact of our design choices, we analyze throughput, latency and accuracy and also discuss how these metrics can be tuned based on the user requirements.
Written by
Publisher
Research Topics
September 10, 2019
Jinfeng Rao, Linqing Liu, Yi Tay, Wei Yang, Peng Shi, Jimmy Lin
September 10, 2019
May 17, 2019
Jean Alaux, Edouard Grave, Marco Cuturi, Armand Joulin
May 17, 2019
July 27, 2019
Patrick Lewis, Ludovic Denoyer, Sebastian Riedel
July 27, 2019
August 01, 2019
Yi Tay, Shuohang Wang, Luu Anh Tuan, Jie Fu, Minh C. Phan, Xingdi Yuan, Jinfeng Rao, Siu Cheung Hui, Aston Zhang
August 01, 2019
Who We Are
Our Actions
Newsletter