Key capabilities
SAM 2 is the first unified model for segmenting objects across images and videos. You can use a click, box, or mask as input to select an object in any image or video frame.
Using SAM 2, you can select one or multiple objects in a video frame. Use additional prompts to refine the model predictions.
SAM 2 delivers strong zero-shot performance on objects, images, and videos not seen during model training, enabling use in a wide range of real-world applications.
SAM 2 is designed for efficient video processing with streaming inference to enable real-time, interactive applications.
SAM 2 outperforms prior state-of-the-art models for object segmentation in both videos and images.
Track an object across any video interactively with as little as a single click on one frame, and create fun effects.
Our approach
SAM 2 brings state-of-the-art video and image segmentation capabilities into a single model, while preserving a simple design and fast inference speed.
Model architecture
The SAM 2 model extends the promptable capability of SAM to the video domain by adding a per-session memory module that captures information about the target object in the video. This allows SAM 2 to track the selected object throughout all video frames, even if the object temporarily disappears from view, as the model retains context of the object from previous frames. SAM 2 also supports corrections to the mask prediction based on additional prompts on any frame.
SAM 2’s streaming architecture—which processes video frames one at a time—is also a natural generalization of SAM to the video domain. When SAM 2 is applied to images, the memory module is empty and the model behaves like SAM.
The Segment Anything Video Dataset
SAM 2 was trained on a large and diverse set of videos and masklets (object masks over time), created by applying SAM 2 interactively in a model-in-the-loop data engine. The training data includes the SA-V dataset, which we are open-sourcing.
Please email support@segment-anything.com with any issues or questions regarding the SA-V dataset.
Access our research
To enable the research community to build upon this work, we’re publicly releasing a pretrained Segment Anything 2 model, along with the SA-V dataset, a demo, and code.
SAM 2 can be used on its own or, in future work, as part of a larger system with other models to enable novel experiences.
The video object segmentation outputs from SAM 2 could be used as input to other AI systems, such as modern video generation models, to enable precise editing capabilities.
In the future, SAM 2 could be extended to take other types of input prompts, enabling creative ways of interacting with objects in real-time or live video.