COMPUTER VISION

Better (pseudo-)labels for semi-supervised instance segmentation

April 03, 2024

Abstract

Despite the availability of large datasets for tasks like image classification and image-text alignment, labeled data for more complex recognition tasks, such as detection and segmentation, is less abundant. In particular, for instance segmentation, annotations are time-consuming to produce, and the distribution of instances is often highly skewed across classes. While semi-supervised teacher-student distillation methods show promise in leveraging vast amounts of unlabeled data, they suffer from miscalibration, resulting in overconfidence in frequently represented classes and underconfidence in rarer ones. Additionally, these methods have difficulty learning efficiently from a limited set of examples. First, we introduce a dual strategy to enhance the teacher model's training process, substantially improving few-shot learning performance. Second, we propose a calibration correction mechanism that enables the student model to correct the teacher's calibration errors. Using our approach, we observe marked improvements over a state-of-the-art supervised baseline on the LVIS dataset: a 2.8% increase in average precision (AP) and a 10.3% gain in AP for rare classes.

AUTHORS

Written by

Francois Porcher

Camille Couprie

Marc Szafraniec

Jakob Verbeek

Publisher

PML4LRS @ ICLR

Research Topics

Computer Vision

Related Publications

November 20, 2024

CONVERSATIONAL AI

COMPUTER VISION

Llama Guard 3 Vision: Safeguarding Human-AI Image Understanding Conversations

Jianfeng Chi, Ujjwal Karn, Hongyuan Zhan, Eric Smith, Javier Rando, Yiming Zhang, Kate Plawiak, Zacharie Delpierre Coudert, Kartikeya Upasani, Mahesh Pasupuleti

November 11, 2024

COMPUTER VISION

HOI-Swap: Swapping Objects in Videos with Hand-Object Interaction Awareness

Sherry Xue, Romy Luo, Changan Chen, Kristen Grauman

October 31, 2024

HUMAN & MACHINE INTELLIGENCE

ROBOTICS

Digitizing Touch with an Artificial Multimodal Fingertip

Mike Lambeta, Tingfan Wu, Ali Sengül, Victoria Rose Most, Nolan Black, Kevin Sawyer, Romeo Mercado, Haozhi Qi, Alexander Sohn, Byron Taylor, Norb Tydingco, Gregg Kammerer, Dave Stroud, Jake Khatha, Kurt Jenkins, Kyle Most, Neal Stein, Ricardo Chavira, Thomas Craven-Bartle, Eric Sanchez, Yitian Ding, Jitendra Malik, Roberto Calandra

October 16, 2024

SPEECH & AUDIO

COMPUTER VISION

Movie Gen: A Cast of Media Foundation Models

Movie Gen Team
