RESEARCH

COMPUTER VISION

SAM 3: Segment Anything with Concepts

November 19, 2025

Abstract

We present Segment Anything Model (SAM) 3, a unified model that detects, segments, and tracks objects in images and videos based on concept prompts, which we define as either short noun phrases (e.g., “yellow school bus”), image exemplars, or a combination of both. Promptable Concept Segmentation (PCS) takes such prompts and returns segmentation masks and unique identities for all matching object instances. To advance PCS, we build a scalable data engine that produces a high-quality dataset with 4M unique concept labels, including hard negatives, across images and videos. Our model consists of an image-level detector and a memory-based video tracker that share a single backbone. Recognition and localization are decoupled with a presence head, which boosts detection accuracy. SAM 3 delivers a 2× gain over existing systems in both image and video PCS, and improves previous SAM capabilities on visual segmentation tasks. We open source SAM 3 along with our new Segment Anything with Concepts (SA-Co) benchmark for promptable concept segmentation.
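
To make the presence-head idea concrete, below is a minimal sketch of how an image-level "presence" score might gate per-query localization scores in a DETR-style decoder. All names, shapes, and the gating formula are illustrative assumptions for exposition only, not the released SAM 3 implementation.

import torch
import torch.nn as nn

class PresenceHead(nn.Module):
    """Sketch of a presence head that decouples recognition from localization.

    A single image-level token predicts whether the prompted concept is present
    at all; per-query heads only score localization. The final instance score is
    the product of the two. Names and shapes are illustrative assumptions, not
    the released SAM 3 code.
    """

    def __init__(self, dim: int):
        super().__init__()
        # Image-level recognition: "is the concept anywhere in this image?"
        self.presence_mlp = nn.Sequential(
            nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, 1)
        )
        # Per-query localization confidence: "is this query a well-localized instance?"
        self.query_score = nn.Linear(dim, 1)

    def forward(self, presence_token: torch.Tensor, query_embeds: torch.Tensor) -> torch.Tensor:
        # presence_token: (B, D)    summary of image features + concept prompt
        # query_embeds:   (B, Q, D) per-instance decoder outputs
        p_present = torch.sigmoid(self.presence_mlp(presence_token))  # (B, 1)
        p_local = torch.sigmoid(self.query_score(query_embeds))       # (B, Q, 1)
        # Gate every query's score by the image-level presence probability,
        # so queries are not also forced to solve image-level recognition.
        return p_local * p_present.unsqueeze(1)                       # (B, Q, 1)

# Example usage with random tensors (batch of 2, 100 queries, 256-dim features).
if __name__ == "__main__":
    head = PresenceHead(dim=256)
    scores = head(torch.randn(2, 256), torch.randn(2, 100, 256))
    print(scores.shape)  # torch.Size([2, 100, 1])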

AUTHORS

Written by

Nicolas Carion

Laura Gustafson

Yuan-Ting Hu

Shoubhik Debnath

Ronghang Hu

Didac Suris Coll-Vinent

Chaitanya Ryali

Kalyan Vasudev Alwala

Haitham Khedr

Andrew Huang

Jie Lei

Tengyu Ma

Baishan Guo

Arpit Kalla

Markus Marks

Joseph Greer

Meng Wang

Peize Sun

Roman Rädle

Triantafyllos Afouras

Effrosyni Mavroudi

Katherine Xu

Tsung-Han Wu

Yu Zhou

Liliane Momeni

Rishi Hazra

Shuangrui Ding

Sagar Vaze

Francois Porcher

Feng Li

Siyuan Li

Aishwarya Kamath

Ho Kei Cheng

Piotr Dollar

Nikhila Ravi

Kate Saenko

Pengchuan Zhang

Christoph Feichtenhofer

Publisher

arXiv

Research Topics

Computer Vision

Related Publications

February 27, 2026

HUMAN & MACHINE INTELLIGENCE

RESEARCH

Unified Vision–Language Modeling via Concept Space Alignment

Yifu Qiu, Paul-Ambroise Duquenne, Holger Schwenk

February 26, 2026

CONVERSATIONAL AI

RESEARCH

Learning Personalized Agents from Human Feedback

Kaiqu Liang, Julia Kruk, Shengyi Qian, Xianjun Yang, Shengjie Bi, Shaoliang Nie, Michael Zhang, Lijuan Liu, Jaime Fernández Fisac, Shuyan Zhou, Saghar Hosseini

February 11, 2026

RESEARCH

COMPUTER VISION

UniT: Unified Multimodal Chain-of-Thought Test-time Scaling

Leon Liangyu Chen, Haoyu Ma, Zhipeng Fan, Ziqi Huang, Animesh Sinha, Xiaoliang Dai, Jialiang Wang, Zecheng He, Jianwei Yang, Chunyuan Li, Junzhe Sun, Chu Wang, Serena Yeung-Levy, Felix Juefei-Xu

January 02, 2026

COMPUTER VISION

PhyGDPO: Physics-Aware Groupwise Direct Preference Optimization for Physically Consistent Text-to-Video Generation

Yuanhao Cai, Kunpeng Li, Menglin Jia, Jialiang Wang, Junzhe Sun, Feng Liang, Weifeng Chen, Felix Xu, Chu Wang, Ali Thabet, Xiaoliang Dai, Xuan Ju, Alan Yuille, Ji Hou
