TextCaps: a Dataset for Image Captioning with Reading Comprehension

August 23, 2020

Abstract

Image descriptions can help visually impaired people quickly understand image content. While we have made significant progress in automatically describing images and in optical character recognition, current approaches are unable to include written text in their descriptions, even though text is omnipresent in human environments and frequently critical to understanding our surroundings. To study how to comprehend text in the context of an image, we collect a novel dataset, TextCaps, with 145k captions for 28k images. Our dataset challenges a model to recognize text, relate it to its visual context, and decide what part of the text to copy or paraphrase, requiring spatial, semantic, and visual reasoning between multiple text tokens and visual entities, such as objects. We study baselines and adapt existing approaches to this new task, which we refer to as image captioning with reading comprehension. Our analysis with automatic and human studies shows that our new TextCaps dataset provides many new technical challenges over previous datasets.

AUTHORS

Written by

Oleksii Sidorov

Amanpreet Singh

Marcus Rohrbach

Ronghang Hu

Publisher

ECCV

Research Topics

Natural Language Processing (NLP)

Computer Vision
