Featured Dataset
SA-V Dataset
SA-V is a dataset designed for training general-purpose object segmentation models from open-world videos. The dataset was introduced in our paper “Segment Anything 2”.
Datasets

Multi-Room Apartments Simulation (MRAS) Dataset
The Multi-Room Apartments Simulation (MRAS) dataset is a multi-modal dataset created for the task of estimating spatially distributed acoustic parameters in complex scenes. It includes a large collection of scene geometries and room impulse responses (RIRs), simulated from dozens of unique source positions and a dense grid of receivers.

Meta Synthetic Environments Lidar Dataset
The Meta Synthetic Environments (MSE) Lidar Dataset is a first-of-its-kind, large-scale single-photon lidar dataset. It is built on top of Aria Synthetic Environments (ASE) and is intended to unlock new machine learning capabilities for single-photon lidars.

FACET Dataset
FACET is a comprehensive benchmark dataset designed for evaluating the robustness and algorithmic fairness of AI and machine learning vision models for protected groups.

EgoTV Dataset
A benchmark and dataset for systematic investigation of vision-language models on compositional, causal (e.g., effect of actions), and temporal (e.g., action ordering) reasoning in egocentric settings.

MMCSG Dataset
The MMCSG (Multi-Modal Conversations in Smart Glasses) dataset comprises two-sided conversations recorded using Aria glasses, featuring multi-modal data such as multi-channel audio, video, accelerometer, and gyroscope measurements.

Speech Fairness Dataset
By releasing this dataset, we hope to further motivate the AI community to make strides toward improving the fairness of speech recognition models, which will help all users have a better experience using applications with ASR.

Casual Conversations V2
For evaluating computer vision, audio, and speech models for accuracy across a diverse set of ages, genders, languages/dialects, geographies, disabilities, and more.

Casual Conversations
For evaluating computer vision and audio models for accuracy across a diverse set of ages, genders, apparent skin tones, and ambient lighting conditions.