April 26, 2020
State of the art sequence-to-sequence models for large scale tasks perform a fixed number of computations for each input sequence regardless of whether it is easy or hard to process. In this paper, we train Transformer models which can make output predictions at different stages of the network and we investigate different ways to predict how much computation is required for a particular sequence. Unlike dynamic computation in Universal Transformers, which applies the same set of layers iteratively, we apply different layers at every step to adjust both the amount of computation as well as the model capacity. On IWSLT German-English translation our approach matches the accuracy of a well tuned baseline Transformer while using less than a quarter of the decoder layers.
Publisher
International Conference on Learning Representations (ICLR)
Research Topics
November 16, 2022
Kushal Tirumala, Aram H. Markosyan, Armen Aghajanyan, Luke Zettlemoyer
November 16, 2022
October 31, 2022
Fabio Petroni, Giuseppe Ottaviano, Michele Bevilacqua, Patrick Lewis, Scott Yih, Sebastian Riedel
October 31, 2022
December 06, 2020
Michael Lewis, Armen Aghajanyan, Gargi Ghosh, Luke Zettlemoyer, Marjan Ghazvininejad, Sida Wang
December 06, 2020
November 30, 2020
Dhruv Batra, Devi Parikh, Meera Hahn, Jacob Krantz, James Rehg, Peter Anderson, Stefan Lee
November 30, 2020
April 30, 2018
Yedid Hoshen, Lior Wolf
April 30, 2018
November 01, 2018
Yedid Hoshen, Lior Wolf
November 01, 2018
December 02, 2018
Sagie Benaim, Lior Wolf
December 02, 2018
June 30, 2019
Geng Ji, Dehua Cheng, Huazhong Ning, Changhe Yuan, Hanning Zhou, Liang Xiong, Erik B. Sudderth
June 30, 2019
Foundational models
Our approach
Latest news
Foundational models