March 26, 2020
The use of deep pre-trained bidirectional transformers has led to remarkable progress in a number of applications (Devlin et al., 2018). For tasks that make pairwise comparisons between sequences, matching a given input with a corresponding label, two approaches are common: Cross-encoders performing full self-attention over the pair and Bi-encoders encoding the pair separately. The former often performs better, but is too slow for practical use. In this work, we develop a new transformer architecture, the Poly-encoder, that learns global rather than token level self-attention features. We perform a detailed comparison of all three approaches, including what pre-training and fine-tuning strategies work best. We show our models achieve state-of-the-art results on three existing tasks; that Poly-encoders are faster than Cross-encoders and more accurate than Bi-encoders; and that the best results are obtained by pre-training on large datasets similar to the downstream tasks.
Publisher
ICLR
Research Topics
September 05, 2024
Chunting Zhou, Lili Yu, Arun Babu, Kushal Tirumala, Michihiro Yasunaga, Leonid Shamis, Jacob Kahn, Luke Zettlemoyer, Omer Levy, Xuezhe Ma
September 05, 2024
August 20, 2024
Ashish Shenoy, Yichao Lu, Srihari Jayakumar, Debojeet Chatterjee, Mohsen Moslehpour, Pierce Chuang, Abhay Harpale, Vikas Bhardwaj, Di Xu (SWE), Shicong Zhao, Ankit Ramchandani, Luna Dong, Anuj Kumar
August 20, 2024
August 11, 2024
Igor Tufanov, Karen Hambardzumyan, Javier Ferrando, Lena Voita
August 11, 2024
August 11, 2024
Marta R. Costa-jussa, Mariano Coria Meglioli, Pierre Andrews, David Dale, Kae Hansanti, Elahe Kalbassi, Christophe Ropers, Carleigh Wood
August 11, 2024
Foundational models
Latest news
Foundational models