COMPUTER VISION

SAM 2: Segment Anything in Images and Videos

July 29, 2024

Abstract

We present Segment Anything Model 2 (SAM 2 ), a foundation model towards solving promptable visual segmentation in images and videos. We build a data engine, which improves model and data via user interaction, to collect the largest video segmentation dataset to date. Our model is a simple transformer architecture with streaming memory for real-time video processing. SAM 2 trained on our data provides strong performance across a wide range of tasks. In video segmentation, we observe better accuracy, using 3x fewer interactions than prior approaches. In image segmentation, our model is more accurate and 6x faster than the Segment Anything Model (SAM). We believe that our data, model, and insights will serve as a significant milestone for video segmentation and related perception tasks. We are releasing a version of our model, the dataset and an interactive demo.

Download the Paper

AUTHORS

Written by

Nikhila Ravi

Valentin Gabeur

Yuan-Ting Hu

Ronghang Hu

Chay Ryali

Tengyu Ma

Haitham Khedr

Roman Rädle

Chloe Rolland

Laura Gustafson

Eric Mintun

Junting Pan

Kalyan Vasudev Alwala

Nicolas Carion

Chao-Yuan Wu

Ross Girshick

Piotr Dollar

Christoph Feichtenhofer

Publisher

ArXiv

Research Topics

Computer Vision

Related Publications

October 31, 2024

HUMAN & MACHINE INTELLIGENCE

ROBOTICS

Digitizing Touch with an Artificial Multimodal Fingertip

Mike Lambeta, Tingfan Wu, Ali Sengül, Victoria Rose Most, Nolan Black, Kevin Sawyer, Romeo Mercado, Haozhi Qi, Alexander Sohn, Byron Taylor, Norb Tydingco, Gregg Kammerer, Dave Stroud, Jake Khatha, Kurt Jenkins, Kyle Most, Neal Stein, Ricardo Chavira, Thomas Craven-Bartle, Eric Sanchez, Yitian Ding, Jitendra Malik, Roberto Calandra

October 31, 2024

October 16, 2024

SPEECH & AUDIO

COMPUTER VISION

Movie Gen: A Cast of Media Foundation Models

Movie Gen Team

October 16, 2024

September 10, 2024

COMPUTER VISION

Video Editing via Factorized Diffusion Distillation

Uriel Singer, Amit Zohar, Yuval Kirstain, Shelly Sheynin, Adam Polyak, Devi Parikh, Yaniv Taigman

September 10, 2024

September 05, 2024

CONVERSATIONAL AI

NLP

Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model

Chunting Zhou, Lili Yu, Arun Babu, Kushal Tirumala, Michihiro Yasunaga, Leonid Shamis, Jacob Kahn, Luke Zettlemoyer, Omer Levy, Xuezhe Ma

September 05, 2024

Help Us Pioneer The Future of AI

We share our open source frameworks, tools, libraries, and models for everything from research exploration to large-scale production deployment.