JULY 29, 2024

SA-V Dataset

SA-V is a dataset designed for training general-purpose object segmentation models on open-world videos. The dataset was introduced in our paper “Segment Anything 2”.

Overview

SA-V consists of 51K diverse videos and 643K spatio-temporal segmentation masks (i.e., masklets). It is intended to be used for computer vision research for the purposes permitted under the CC BY 4.0 license.

The videos were collected via a contracted third-party company. Of the 643K masklets, 191K were produced through SAM 2-assisted manual annotation, and 452K were automatically generated by SAM 2 and verified by annotators.

SA-V

Key Application

Computer Vision, Segmentation

Intended Use Cases
  • Train and evaluate generic object segmentation models

  • Allow access to a permissive, large-scale video dataset

Primary Data Type

Videos, Mask annotations

Data Function

Training, Testing

Dataset Characteristics

  • Total number of videos: 51K

  • Total number of masklets: 643K

  • Average masklets per video: 12.61

  • Average video resolution: 1401×1037 pixels

NOTE: There are no class labels for the videos or mask annotations.
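The reported average can be sanity-checked directly from the headline counts. A minimal check (the counts are rounded to the nearest thousand, so the result is approximate):

```python
# Sanity check: average masklets per video from the headline dataset counts.
videos = 51_000     # "Total number of videos: 51K" (rounded)
masklets = 643_000  # "Total number of masklets: 643K" (rounded)

avg = masklets / videos
print(round(avg, 2))  # consistent with the reported 12.61
```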

Labels

Class agnostic mask annotations

Nature Of Content

The videos vary in subject matter. Common themes include locations, objects, and scenes. Masks range from large-scale objects such as buildings to fine-grained details such as interior decorations.

License

CC BY 4.0

Access Cost

Open access

Data Collection

Data sources

Videos were collected via a contracted third-party company.
Masks were generated by the Meta Segment Anything Model 2 (SAM 2) and human annotators.

Data selection

Videos were selected based on their content.

Sampling Methods

Unsampled

Geographic distribution

[Figure: geographic distribution of the videos]

Labeling Methods

Masks were generated by the Meta Segment Anything Model 2 (SAM 2) and human annotators (more details in the Segment Anything 2 paper).

Label types

Masks in the training set are provided in the COCO run-length encoding (RLE) annotation format. Masks in validation and test sets are provided in PNG format.
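To illustrate how COCO-style RLE annotations work, here is a minimal pure-Python sketch that decodes the *uncompressed* RLE form (a dict with `size` and a list of run lengths, alternating runs of 0s and 1s in column-major order). Note this is an illustration of the encoding scheme, not the dataset's loader: SA-V's training masks typically use the compressed string form, which is usually decoded with `pycocotools.mask.decode`.

```python
import numpy as np

def decode_uncompressed_rle(rle):
    """Decode an uncompressed COCO RLE dict into a binary mask.

    COCO RLE stores alternating run lengths of 0s and 1s over the
    mask flattened in column-major (Fortran) order, starting with 0s.
    """
    h, w = rle["size"]
    flat = np.zeros(h * w, dtype=np.uint8)
    idx, val = 0, 0
    for run in rle["counts"]:
        flat[idx:idx + run] = val  # write the current run
        idx += run
        val = 1 - val              # runs alternate between 0 and 1
    return flat.reshape((h, w), order="F")

# Toy 2x3 mask: 2 zeros, then 3 ones, then 1 zero (column-major).
rle = {"size": [2, 3], "counts": [2, 3, 1]}
mask = decode_uncompressed_rle(rle)
print(mask.tolist())
```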

Labeling procedure - Manual and Automatic

The final mask annotations we are releasing comprise manual annotations and model-generated automatic annotations. We collected 191K manual annotations from expert human annotators using an interactive model-in-the-loop process with the Meta Segment Anything Model 2 (SAM 2). In addition, we collected 452K masklet annotations that were automatically generated by SAM 2 and verified by annotators. Please refer to our paper for more details.

Validation Methods

All 643K masklet annotations were reviewed and validated by human annotators.

Please email segment-anything@meta.com to report any issues.