June 20, 2023
Multi-vector retrieval methods combine the merits of sparse (e.g. BM25) and dense (e.g. DPR) retrievers and have achieved state-of-the-art performance on various retrieval tasks. These methods, however, are orders of magnitude slower and need much more space to store their indices compared to their single-vector counterparts. In this paper, we unify different multi-vector retrieval models from a token routing viewpoint and propose conditional token interaction via dynamic lexical routing, namely CITADEL, for efficient and effective multi-vector retrieval. CITADEL learns to route different token vectors to the predicted lexical "keys" such that a query token vector only interacts with document token vectors routed to the same key. This design significantly reduces the computation cost while maintaining high accuracy. Notably, CITADEL achieves the same or slightly better performance than the previous state of the art, ColBERT-v2, on both in-domain (MS MARCO) and out-of-domain (BEIR) evaluations, while being nearly 40 times faster. Code and data are available at: https://github.com/facebookresearch/dpr-scale/tree/citadel
Publisher
TACL
Research Topics
July 23, 2024
Llama team
July 23, 2024
June 25, 2024
Elena Voita, Javier Ferrando Monsonis, Christoforos Nalmpantis
June 25, 2024
June 25, 2024
Min-Jae Hwang, Ilia Kulikov, Benjamin Peloquin, Hongyu Gong, Peng-Jen Chen, Ann Lee
June 25, 2024
June 14, 2024
Sheng-Chieh Lin, Akari Asai, Minghan Li, Barlas Oguz, Jimmy Lin, Scott Yih, Xilun Chen
June 14, 2024
Product experiences
Foundational models
Product experiences
Latest news
Foundational models