Key capabilities

Segment any object, now in any video or image

SAM 2 is the first unified model for segmenting objects across images and videos. You can use a click, box, or mask as the input to select an object on any image or frame of video.

Read the research paper

Select objects and make adjustments across video frames

Using SAM 2, you can select one or multiple objects in a video frame. Use additional prompts to refine the model predictions.
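To make the click-based selection and refinement idea concrete, here is a toy sketch (not the real SAM 2 model or API): a single click selects the whole connected region around it, the way one click on an object can select the entire object.

```python
from collections import deque

def select_from_click(grid, click, selected=None):
    """Toy illustration of click-based selection: flood-fill the connected
    region of equal values around the clicked cell, mimicking how a single
    click can select a whole object. Illustrative only, not SAM 2 itself."""
    h, w = len(grid), len(grid[0])
    if selected is None:
        selected = [[0] * w for _ in range(h)]
    cy, cx = click
    target = grid[cy][cx]
    queue = deque([(cy, cx)])
    while queue:
        y, x = queue.popleft()
        if 0 <= y < h and 0 <= x < w and grid[y][x] == target and not selected[y][x]:
            selected[y][x] = 1  # add this cell to the selection mask
            queue.extend([(y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)])
    return selected
```

In the real model, additional positive or negative clicks refine the predicted mask rather than flood-filling pixel values, but the interaction pattern (click, inspect, refine) is the same.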

Robust segmentation, even in unfamiliar videos

SAM 2 delivers strong zero-shot performance on objects, images, and videos not seen during model training, enabling use in a wide range of real-world applications.

Real-time interactivity and results

SAM 2 is designed for efficient video processing with streaming inference to enable real-time, interactive applications.

State-of-the-art performance for object segmentation

SAM 2 outperforms the best models in the field for object segmentation in videos and images.

Highlights

  • SAM 2 improves on SAM for segmentation in images
  • SAM 2 outperforms existing video object segmentation models, especially for tracking object parts
  • SAM 2 requires less interaction time than existing interactive video segmentation methods

Try it yourself

Track an object across any video interactively with as little as a single click on one frame, and create fun effects.

Try the demo

Our approach

The next generation of Meta Segment Anything

SAM 2 brings state-of-the-art video and image segmentation capabilities into a single model, while preserving a simple design and fast inference speed.

Model architecture

Meta Segment Anything Model 2 design

The SAM 2 model extends SAM's promptable capability to the video domain by adding a per-session memory module that captures information about the target object in the video. This allows SAM 2 to track the selected object across all video frames, even if it temporarily disappears from view, because the model retains context about the object from previous frames. SAM 2 also supports corrections to the mask prediction based on additional prompts on any frame.

SAM 2's streaming architecture, which processes video frames one at a time, is also a natural generalization of SAM to the video domain. When SAM 2 is applied to images, the memory module is empty and the model behaves like SAM.
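The streaming design described above can be sketched in a few lines of toy Python (illustrative only, not the real model): frames arrive one at a time, a per-session memory records the selected object, and an empty memory reduces the model to single-image, prompt-only behavior.

```python
class StreamingTracker:
    """Toy sketch of SAM 2's streaming design: per-session memory lets
    later frames be segmented without new prompts. Not the real model."""

    def __init__(self):
        self.target = None  # per-session memory: which object is being tracked

    def process_frame(self, frame_masks, prompt=None):
        """frame_masks: dict mapping object_id -> mask for this frame.
        prompt: an object_id selected by the user (needed only once)."""
        if prompt is not None:
            self.target = prompt  # memory records the selected object
        if self.target is None:
            return None  # empty memory: behaves like single-image SAM, needs a prompt
        # Tracking: return the target's mask if visible this frame; the
        # memory persists even through frames where the object disappears.
        return frame_masks.get(self.target)

# One click on the first frame selects the object; later frames are
# tracked from memory, even through a frame where the object is hidden.
frames = [{"cat": "mask_f0"}, {}, {"cat": "mask_f2"}]
tracker = StreamingTracker()
results = [tracker.process_frame(f, prompt="cat" if i == 0 else None)
           for i, f in enumerate(frames)]
# results: ["mask_f0", None, "mask_f2"]
```

Note how the middle frame returns no mask (the object is out of view), yet tracking resumes on the next frame because the memory of the target survives the occlusion.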


The Segment Anything Video Dataset

A large and diverse video segmentation dataset

SAM 2 was trained on a large and diverse set of videos and masklets (object masks tracked over time), created by applying SAM 2 interactively in a model-in-the-loop data engine. The training data includes the SA-V dataset, which we are open-sourcing.
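A "masklet" can be pictured as one object's mask carried across frames. The sketch below is a toy representation (not the SA-V file format): a mapping from frame index to that frame's binary mask, with missing frames marking occlusion.

```python
# Toy sketch of a masklet: one object's mask over time, stored as a
# mapping from frame index to a binary mask. Illustrative only; this is
# not the actual SA-V dataset format.
masklet = {
    0: [[1, 1], [0, 0]],  # frame 0: object covers the top row
    1: [[0, 1], [0, 1]],  # frame 1: object has moved to the right column
    # frame 2 absent: the object is occluded or out of view
    3: [[0, 0], [1, 1]],  # frame 3: object reappears on the bottom row
}

def frames_visible(masklet):
    """Frame indices where the object has at least one foreground pixel."""
    return sorted(f for f, mask in masklet.items() if any(any(row) for row in mask))
```

Annotating gaps like frame 2 explicitly is what lets a dataset capture the challenging occlusions called out in the highlights below.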

Please email support@segment-anything.com with any issues or questions regarding the SA-V dataset.

Explore the dataset

Highlights

  • 600K+ masklets collected across ~51K videos
  • Geographically diverse, real-world scenarios collected across 47 countries
  • Annotations include whole objects, parts, and challenging occlusions

Access our research

Open innovation

To enable the research community to build upon this work, we’re publicly releasing a pretrained Segment Anything 2 model, along with the SA-V dataset, a demo, and code.

Download the model

Highlights

  • We are providing transparency into the SAM 2 training data
  • We prioritized geographic diversity in the SA-V dataset for real-world representation
  • We conducted a fairness evaluation of SAM 2

Potential model applications

SAM 2 can be used on its own or, in future work, combined with other models as part of a larger system to enable novel experiences.

Download the model

Extensible outputs

The video object segmentation outputs from SAM 2 could be used as input to other AI systems such as modern video generation models to enable precise editing capabilities.

Extensible inputs

SAM 2 could be extended to take other types of input prompts, in the future enabling creative ways of interacting with objects in real-time or live video.


Explore additional resources

Read the AI at Meta blog
Read the research paper
Download the dataset
Explore the dataset
Download the model
Try the demo