Research

Computer Vision

Mask R-CNN

October 22, 2017

Abstract

We present a conceptually simple, flexible, and general framework for object instance segmentation. Our approach efficiently detects objects in an image while simultaneously generating a high-quality segmentation mask for each instance. The method, called Mask R-CNN, extends Faster R-CNN by adding a branch for predicting an object mask in parallel with the existing branch for bounding box recognition. Mask R-CNN is simple to train and adds only a small overhead to Faster R-CNN, running at 5 fps. Moreover, Mask R-CNN is easy to generalize to other tasks, e.g., allowing us to estimate human poses in the same framework. We show top results in all three tracks of the COCO suite of challenges, including instance segmentation, bounding-box object detection, and person keypoint detection. Without tricks, Mask R-CNN outperforms all existing, single-model entries on every task, including the COCO 2016 challenge winners. We hope our simple and effective approach will serve as a solid baseline and help ease future research in instance-level recognition. Code will be made available.

Download the Paper

Related Publications

February 27, 2026

Human & Machine Intelligence

Unified Vision–Language Modeling via Concept Space Alignment

Yifu Qiu, Holger Schwenk, Paul-Ambroise Duquenne

February 27, 2026

February 11, 2026

Computer Vision

UniT: Unified Multimodal Chain-of-Thought Test-time Scaling

Leon Liangyu Chen, Haoyu Ma, Ziqi Huang, Xiaoliang Dai, Jialiang Wang, Zecheng He, Jianwei Yang, Chunyuan Li, Serena Yeung-Levy, Animesh Sinha, Chu Wang, Felix Juefei-Xu, Junzhe Sun, Zhipeng Fan

February 11, 2026

December 18, 2025

Computer Vision

Pixel Seal: Adversarial-only training for invisible image and video watermarking

Alexandre Mourachko, Hady Elsahar, Pierre Fernandez, Sylvestre Rebuffi, Tom Sander, Tomáš Souček, Tuan Tran, Valeriu Lacatusu

December 18, 2025

November 19, 2025

Computer Vision

SAM 3: Segment Anything with Concepts

Ronghang Hu, Peize Sun, Triantafyllos Afouras, Effrosyni Mavroudi, Katherine Xu, Tsung-Han Wu, Yu Zhou, Liliane Momeni, Shuangrui Ding, Sagar Vaze, Francois Porcher, Feng Li, Siyuan Li, Aishwarya Kamath, Ho Kei Cheng, Andrew Huang, Arpit Kalla, Baishan Guo, Chaitanya Ryali, Christoph Feichtenhofer, Didac Suris Coll-Vinent, Haitham Khedr, Jie Lei, Joseph Greer, Kalyan Vasudev Alwala, Kate Saenko, Laura Gustafson, Markus Marks, Meng Wang, Nicolas Carion, Nikhila Ravi, Pengchuan Zhang, Piotr Dollar, Rishi Hazra, Roman Rädle, Shoubhik Debnath, Tengyu Ma, Yuan-Ting Hu

November 19, 2025

June 11, 2019

Computer Vision

ELF OpenGo: An Analysis and Open Reimplementation of AlphaZero | Facebook AI Research

Yuandong Tian, Jerry Ma, Qucheng Gong, Shubho Sengupta, Zhuoyuan Chen, James Pinkerton, Larry Zitnick

June 11, 2019

April 30, 2018

NLP

Computer Vision

Mastering the Dungeon: Grounded Language Learning by Mechanical Turker Descent | Facebook AI Research

Zhilin Yang, Saizheng Zhang, Jack Urbanek, Will Feng, Alexander H. Miller, Arthur Szlam, Douwe Kiela, Jason Weston

April 30, 2018

October 10, 2016

Speech & Audio

Computer Vision

Polysemous Codes | Facebook AI Research

Matthijs Douze, Hervé Jégou, Florent Perronnin

October 10, 2016

June 18, 2018

Speech & Audio

Computer Vision

Low-shot learning with large-scale diffusion | Facebook AI Research

Matthijs Douze, Arthur Szlam, Bharath Hariharan, Hervé Jégou

June 18, 2018

Help Us Pioneer The Future of AI

We share our open source frameworks, tools, libraries, and models for everything from research exploration to large-scale production deployment.