October 22, 2022
In this paper, we tackle the challenging problem of Few-shot Object Detection (FSOD). Existing FSOD pipelines (i) use average-pooled representations that result in information loss; and/or (ii) discard position information that can help detect object instances. Consequently, such pipelines are sensitive to large intra-class appearance and geometric variations between support and query images. To address these drawbacks, we propose a Time-rEversed diffusioN tEnsor Transformer (TENET), which (i) forms high-order tensor representations that capture highly discriminative multi-way feature occurrences, and (ii) uses a transformer that dynamically extracts correlations between the query image and the entire support set, instead of a single average-pooled support embedding. We also propose a Transformer Relation Head (TRH), equipped with higher-order representations, which encodes correlations between query regions and the entire support set, while being sensitive to the positional variability of object instances. Our model achieves state-of-the-art results on PASCAL VOC, FSOD, and COCO.
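To illustrate why average pooling can lose information that a higher-order representation keeps, the following is a minimal sketch in plain Python. It is not the TENET implementation: it uses simple second-order pooling (the mean of outer products) as a stand-in for the high-order tensor representations described above, and the function names and toy feature values are hypothetical, chosen only for illustration.

```python
def average_pool(features):
    """First-order pooling: the mean of N feature vectors of dimension d."""
    n = len(features)
    d = len(features[0])
    return [sum(f[i] for f in features) / n for i in range(d)]


def second_order_pool(features):
    """Second-order pooling: the mean of outer products f f^T (a d x d matrix).

    Entry (i, j) records how strongly channels i and j occur together,
    which a plain average discards.
    """
    n = len(features)
    d = len(features[0])
    return [
        [sum(f[i] * f[j] for f in features) / n for j in range(d)]
        for i in range(d)
    ]


# Two hypothetical regions, each with two local feature vectors (assumed values).
region_a = [[1.0, 0.0], [0.0, 1.0]]  # the two channels fire at different locations
region_b = [[0.5, 0.5], [0.5, 0.5]]  # the two channels always fire together

mean_a = average_pool(region_a)
mean_b = average_pool(region_b)
so_a = second_order_pool(region_a)
so_b = second_order_pool(region_b)

print("average-pooled equal:", mean_a == mean_b)
print("second-order equal:  ", so_a == so_b)
```

In this toy case the two regions have identical average-pooled vectors but different co-occurrence matrices, so only the second-order representation tells them apart.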
Written by
Naila Murray
Lei Wang
Piotr Koniusz
Shan Zhang
Publisher
ECCV