COMPUTER VISION

Egocentric Video Task Translation

June 18, 2023

Abstract

Different video understanding tasks are typically treated in isolation, and even with distinct types of curated data (e.g., classifying sports in one dataset, tracking animals in another). However, in wearable cameras, the immersive egocentric perspective of a person engaging with the world around them presents an interconnected web of video understanding tasks—hand-object manipulations, navigation in the space, or human-human interactions—that unfold continuously, driven by the person’s goals. We argue that this calls for a much more unified approach. We propose EgoTask Translation (EgoT2), which takes a collection of models optimized on separate tasks and learns to translate their outputs for improved performance on any or all of them at once. Unlike traditional transfer or multitask learning, EgoT2’s “flipped design” entails separate task-specific backbones and a task translator shared across all tasks, which captures synergies between even heterogeneous tasks and mitigates task competition. Demonstrating our model on a wide array of video tasks from Ego4D, we show its advantages over existing transfer paradigms and achieve top-ranked results on four of the Ego4D 2022 benchmark challenges.

Download the Paper

AUTHORS

Publisher

CVPR

Research Topics

Computer Vision

Related Publications

September 30, 2023

INTEGRITY

COMPUTER VISION

The Stable Signature: Rooting Watermarks in Latent Diffusion Models

Pierre Fernandez, Guillaume Couairon, Hervé Jegou, Matthijs Douze, Teddy Furon

September 30, 2023

September 29, 2023

COMPUTER VISION

Among Us: Adversarially Robust Collaborative Perception by Consensus

Yiming Li, Qi Fang, Jiamu Bai, Siheng Chen, Felix Xu, Chen Feng

September 29, 2023

September 27, 2023

COMPUTER VISION

Emu: Enhancing Image Generation Models Using Photogenic Needles in a Haystack

Xiaoliang Dai, Ji Hou, Kevin Chih-Yao Ma, Sam Tsai, Jialiang Wang, Rui Wang, Peizhao Zhang, Simon Vandenhende, Xiaofang Wang, Abhimanyu Dubey, Matthew Yu, Abhishek Kadian, Filip Radenovic, Dhruv Mahajan, Kunpeng Li, Yue (R) Zhao, Vladan Petrovic, Mitesh Kumar Singh, Simran Motwani, Yiwen Song, Yi Wen, Roshan Sumbaly, Vignesh Ramanathan, Zijian He, Peter Vajda, Devi Parikh

September 27, 2023

September 22, 2023

COMPUTER VISION

CORE MACHINE LEARNING

Common Corruption Robustness of Point Cloud Detectors: Benchmark and Enhancement

Shuangzhi Li, Zhijie Wang, Felix Xu, Qing Guo, Xingyu Li, Lei Ma

September 22, 2023

Help Us Pioneer The Future of AI

We share our open source frameworks, tools, libraries, and models for everything from research exploration to large-scale production deployment.