MAY 2, 2022

EgoObjects: Large-Scale Egocentric Dataset of Objects

The EgoObjects dataset is designed to push the frontier of first-person and open-world object understanding for improving metaverse AR products.


To further push the limits of egocentric perception, we have created the first large-scale dataset focused on object detection in egocentric video, featuring diverse viewpoints as well as varied scale, background, and lighting conditions. While most comparable existing datasets are either not object-centric or not large-scale, our initial release will cover over 12,000 videos (40+ hours) across 200 main object categories in over 25 countries. Besides the main objects, the videos also capture various surrounding objects in the background, which brings the total number of object categories to as many as 600.

Data collection is conducted with a wide range of egocentric recording devices (Ray-Ban Stories, Snap Spectacles, and mobile phones) in realistic household scenarios. EgoObjects also features rich annotations, such as bounding boxes, category labels, and instance IDs, as well as meta information, such as background description, lighting condition, and location.

EgoObjects Challenge Version

The EgoObjects challenge version will be used for the continual learning challenge at the CLVision workshop at CVPR 2022, which has three tracks: continual instance-level object classification, continual category-level object detection, and continual instance-level object detection. These tracks are designed to advance object understanding from the egocentric perspective, a fundamental building block for AR applications.

Key Application

Computer vision, machine learning

Intended Use Cases

Research on continual learning for instance-level and category-level object classification and detection. Open-sourced for the CVPR 2022 CLVision workshop.

Primary Data Type

Image (jpg)

Data Function

Training, testing

Dataset Characteristics

Total number of images: ~100k

Image frame rate: 1 FPS

Number of main object categories: 200

Number of all object categories: up to 600


Main object category (self-provided):


Background description (self-provided):


Location (self-provided):


2D bounding boxes (human labeled):


Category labels (human labeled):


Instance IDs (human labeled):


Nature Of Content

Frames from video recordings of indoor objects taken by egocentric cameras

Privacy PII

Data for indoor objects only. No person data is included.

Access Cost

Open access

Data Collection

Data sources

Vendor data collection efforts

Data selection

Users have opted in all images for use in algorithm training and benchmarking

Sampling Methods

Frames sampled at 1 FPS
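
The 1 FPS sampling can be illustrated with a minimal sketch. It assumes the source video has a known, constant native frame rate and picks the frame nearest to each whole-second timestamp. The function name and parameters are hypothetical and are not part of any EgoObjects tooling.

```python
from typing import List


def sample_frame_indices(total_frames: int,
                         native_fps: float,
                         target_fps: float = 1.0) -> List[int]:
    """Return the indices of the frames to keep when downsampling a video.

    Assumption: the video has a constant native frame rate. The frame
    nearest to each target timestamp (k / target_fps seconds) is kept.
    """
    if native_fps <= 0 or target_fps <= 0:
        raise ValueError("frame rates must be positive")
    if total_frames <= 0:
        return []

    # Number of native frames between two kept frames; never below 1,
    # so a target rate above the native rate keeps every frame once.
    step = max(native_fps / target_fps, 1.0)

    indices: List[int] = []
    k = 0
    while True:
        idx = int(round(k * step))
        if idx >= total_frames:
            break
        indices.append(idx)
        k += 1
    return indices
```

For example, a 3-second clip recorded at 30 FPS (90 frames) yields the indices 0, 30, and 60, which is one frame per second.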

Geographic distribution

25 countries, including US, ZA, NG, IN, VN, FR, and DE

Labeling Methods

Human labels

Labeling procedure - Human

Vendors provided meta information including main object category, background description, and locations. Annotators labeled 2D bounding boxes with category labels and instance IDs.
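
One way to picture the result of this procedure is as a per-frame record that combines the vendor-provided meta information with the human-labeled boxes. The sketch below is an assumption for illustration only: the class names, field names, and the (x, y, width, height) pixel box format are hypothetical and do not describe the released file format.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class BoxAnnotation:
    """One human-labeled 2D bounding box (hypothetical layout)."""
    x: float            # left edge in pixels (assumed convention)
    y: float            # top edge in pixels (assumed convention)
    width: float
    height: float
    category: str       # human-labeled category label
    instance_id: str    # human-labeled instance ID


@dataclass
class FrameRecord:
    """One sampled frame with vendor-provided meta information."""
    image_path: str
    main_category: str  # self-provided main object category
    background: str     # self-provided background description
    location: str       # self-provided location
    boxes: List[BoxAnnotation] = field(default_factory=list)


def boxes_by_instance(records: List[FrameRecord]) -> Dict[str, List[BoxAnnotation]]:
    """Group every box across frames by its instance ID."""
    grouped: Dict[str, List[BoxAnnotation]] = {}
    for record in records:
        for box in record.boxes:
            grouped.setdefault(box.instance_id, []).append(box)
    return grouped
```

Grouping boxes by instance ID in this way is the kind of lookup an instance-level track would need, since the same physical object appears in many frames.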