SA-V is a dataset designed for training general-purpose object segmentation models on open-world videos. The dataset was introduced in our paper "Segment Anything 2".
SA-V consists of 51K diverse videos and 643K spatio-temporal segmentation masks (i.e., masklets). It is intended to be used for computer vision research for the purposes permitted under the CC BY 4.0 license.
The videos were collected via a contracted third-party company. Of the 643K masklets, 191K were produced through SAM 2-assisted manual annotation, and 452K were automatically generated by SAM 2 and verified by annotators.
Computer Vision, Segmentation
Train and evaluate generic object segmentation models
Allow access to a permissive, large-scale video dataset
Videos, Mask annotations
Training, Testing
Total number of videos: 51K
Total number of masklets: 643K
Average masklets per video: 12.61
Average video resolution: 1401×1037 pixels
NOTE: There are no class labels for the videos or mask annotations.
Class agnostic mask annotations
The videos vary in subject matter. Common themes include locations, objects, and scenes. Masks range from large-scale objects such as buildings to fine-grained details such as interior decorations.
CC BY 4.0
Open access
Data sources
Videos were collected via a contracted third-party company.
Masks generated by the Meta Segment Anything Model 2 (SAM 2) and human annotators.
Data selection
Videos were selected based on their content.
Unsampled
Masks generated by the Meta Segment Anything Model 2 (SAM 2) and human annotators (more details in the Segment Anything 2 paper).
Masks in the training set are provided in the COCO run-length encoding (RLE) annotation format. Masks in the validation and test sets are provided in PNG format.
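To illustrate the RLE representation, here is a minimal sketch of decoding an uncompressed run-length-encoded mask. This is not the loader shipped with SA-V; for the compressed `counts` strings found in real COCO-format annotation files, use the `pycocotools` mask API (`pycocotools.mask.decode`) instead.

```python
# Minimal sketch: decoding an *uncompressed* COCO-style RLE mask.
# COCO RLE stores alternating run lengths of 0s and 1s over the mask
# flattened in column-major (Fortran) order, starting with a run of zeros.

def decode_uncompressed_rle(counts, height, width):
    """Expand run-length counts into a height x width binary mask."""
    flat = []
    value = 0
    for run in counts:
        flat.extend([value] * run)
        value = 1 - value  # runs alternate between 0 and 1
    # Un-flatten column-major: pixel (row, col) sits at flat[col * height + row].
    return [[flat[col * height + row] for col in range(width)]
            for row in range(height)]

# Toy example: a 2x2 mask with runs [1, 2, 1] -> flat [0, 1, 1, 0]
mask = decode_uncompressed_rle([1, 2, 1], height=2, width=2)
# -> [[0, 1], [1, 0]]
```

Because the run values strictly alternate starting from zero, a mask that begins with a foreground pixel is encoded with a leading zero-length run (e.g., `[0, 4]` for an all-ones 2x2 mask).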
The final mask annotations we are releasing are manual annotations and model-generated automatic annotations. We collected 191K manual annotations from expert human annotators using an interactive model in the loop process with the Meta Segment Anything Model 2 (SAM 2). In addition, we collected 452K masklet annotations that were automatically generated by SAM 2 and verified by annotators. Please refer to our paper for more details.
All the 643K masklet annotations were reviewed and validated by human annotators.
Please email segment-anything@meta.com to report any issues.