April 29, 2020
Simultaneous machine translation models start generating a target sequence before they have encoded the source sequence. Recent approaches for this task apply either a fixed policy to a state-of-the-art Transformer model or a learnable monotonic attention to a weaker recurrent neural network-based structure. In this paper, we propose a new attention mechanism, Monotonic Multihead Attention (MMA), which extends the monotonic attention mechanism to multihead attention. We also introduce two novel and interpretable approaches for latency control that are specifically designed for multiple attention heads. We apply MMA to the simultaneous machine translation task and demonstrate better latency-quality tradeoffs compared to MILk, the previous state-of-the-art approach. We analyze how the latency controls affect the attention span, and we study the relationship between the speed of a head and the layer it belongs to. Finally, we motivate the introduction of our model by analyzing the effect of the number of decoder layers and heads on quality and latency.
Publisher
International Conference on Learning Representations (ICLR)