Featured Dataset
SA-V Dataset
SA-V is a dataset designed for training general-purpose object segmentation models on open-world videos. It was introduced in our paper “Segment Anything 2” (SAM 2).
Datasets
FACET Dataset
FACET is a comprehensive benchmark dataset designed for evaluating the robustness and algorithmic fairness of AI and machine-learning vision models across protected groups.
EgoTV Dataset
A benchmark and dataset for the systematic investigation of vision-language models on compositional, causal (e.g., effects of actions), and temporal (e.g., action ordering) reasoning in egocentric settings.
MMCSG Dataset
The MMCSG (Multi-Modal Conversations in Smart Glasses) dataset comprises two-sided conversations recorded using Aria glasses, featuring multi-modal data such as multi-channel audio, video, accelerometer, and gyroscope measurements.
Speech Fairness Dataset
Designed for evaluating the fairness of speech models across a diverse set of demographic groups.
Casual Conversations V2
For evaluating computer vision, audio, and speech models for accuracy across a diverse set of ages, genders, languages/dialects, geographies, disabilities, and more.
Casual Conversations
For evaluating computer vision and audio models for accuracy across a diverse set of ages, genders, apparent skin tones, and ambient lighting conditions.