COMPUTER VISION

EgoTV: Egocentric Task Verification from Natural Language Task Descriptions

September 19, 2023

Abstract

To enable progress towards egocentric agents capable of understanding everyday tasks specified in natural language, we propose a benchmark and a synthetic dataset called Egocentric Task Verification (EgoTV). The goal in EgoTV is to verify the execution of tasks from egocentric videos based on the natural language descriptions of these tasks. EgoTV contains pairs of videos and their task descriptions for multi-step tasks. These tasks contain multiple sub-task decompositions, state changes, object interactions, and sub-task ordering constraints. In addition, EgoTV provides abstracted task descriptions that contain only partial details about ways to accomplish a task. Consequently, EgoTV requires causal, temporal, and compositional reasoning over the video and language modalities, which is missing in existing datasets. We also find that existing vision-language models struggle with the all-round reasoning needed for task verification in EgoTV. Inspired by the needs of EgoTV, we propose a novel Neuro-Symbolic Grounding (NSG) approach that leverages symbolic representations to capture the compositional and temporal structure of tasks. We demonstrate NSG’s capability for task tracking and verification on our EgoTV dataset and on a real-world dataset derived from CrossTask (CTV). We open-source the EgoTV and CTV datasets and the NSG model for future research on egocentric assistive agents.

AUTHORS

Written by

Ruta Desai

Akshara Rai

Brian Chen

Nitin Kamra

Rishi Hazra

Publisher

ICCV

Research Topics

Computer Vision

Related Publications

December 12, 2024

COMPUTER VISION

EvalGIM: A Library for Evaluating Generative Image Models

Melissa Hall, Oscar Mañas, Reyhane Askari, Mark Ibrahim, Candace Ross, Pietro Astolfi, Tariq Berrada Ifriqi, Marton Havasi, Yohann Benchetrit, Karen Ullrich, Carolina Braga, Abhishek Charnalia, Maeve Ryan, Mike Rabbat, Michal Drozdzal, Jakob Verbeek, Adriana Romero Soriano

December 11, 2024

COMPUTER VISION

Video Seal: Open and Efficient Video Watermarking

Pierre Fernandez, Hady Elsahar, Zeki Yalniz, Alexandre Mourachko

December 11, 2024

NLP

COMPUTER VISION

Meta CLIP 1.2

Hu Xu, Bernie Huang, Ellen Tan, Ching-Feng Yeh, Jacob Kahn, Christine Jou, Gargi Ghosh, Omer Levy, Luke Zettlemoyer, Scott Yih, Philippe Brunet, Kim Hazelwood, Ramya Raghavendra, Daniel Li (FAIR), Saining Xie, Christoph Feichtenhofer

December 11, 2024

COMPUTER VISION

Measuring Deja Vu Memorization Efficiently

Narine Kokhlikyan, Bargav Jayaraman, Florian Bordes, Chuan Guo, Kamalika Chaudhuri

