CLEVR-Dialog: A Diagnostic Dataset for Multi-Round Reasoning in Visual Dialog

September 19, 2019

Abstract

Visual Dialog is a multimodal task of answering a sequence of questions grounded in an image, using the conversation history as context. It entails challenges in vision, language, reasoning, and grounding. However, studying these subtasks in isolation on large, real datasets is infeasible, as it requires prohibitively expensive complete annotation of the ‘state’ of all images and dialogs. We develop CLEVR-Dialog, a large diagnostic dataset for studying multi-round reasoning in visual dialog. Specifically, we construct a dialog grammar that is grounded in the scene graphs of the images from the CLEVR dataset. This combination results in a dataset where all aspects of the visual dialog are fully annotated. In total, CLEVR-Dialog contains 5 instances of 10-round dialogs for about 85k CLEVR images, totaling 4.25M question-answer pairs. We use CLEVR-Dialog to benchmark the performance of standard visual dialog models, in particular on visual coreference resolution (as a function of the coreference distance). This is the first analysis of its kind for visual dialog models, and it was not possible without this dataset. We hope the findings from CLEVR-Dialog will help inform the development of future models for visual dialog. Our code and dataset are publicly available.
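The dataset size quoted in the abstract follows from multiplying the three figures it gives. The short Python sketch below reproduces that arithmetic; the constant and function names are illustrative and do not come from the released code, and the image count is the approximate "about 85k" from the abstract rather than an exact number.

```python
# Figures taken from the abstract. The image count is approximate ("about 85k"),
# so the result is an estimate, not an exact count of the released data.
NUM_IMAGES = 85_000
DIALOGS_PER_IMAGE = 5
ROUNDS_PER_DIALOG = 10


def total_qa_pairs(num_images: int, dialogs_per_image: int, rounds_per_dialog: int) -> int:
    """Each round of each dialog contributes one question-answer pair."""
    return num_images * dialogs_per_image * rounds_per_dialog


total = total_qa_pairs(NUM_IMAGES, DIALOGS_PER_IMAGE, ROUNDS_PER_DIALOG)
print(f"{total / 1e6:.2f}M question-answer pairs")
```

With these inputs the product is 4,250,000, which matches the 4.25M question-answer pairs stated above.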

AUTHORS

Written by

Marcus Rohrbach

Devi Parikh

Dhruv Batra

Satwik Kottur

José M. F. Moura

Publisher

NAACL

Related Publications

December 17, 2024

NLP

FLAME: Factuality-Aware Alignment for Large Language Models

Jack Lin, Luyu Gao, Barlas Oguz, Wenhan Xiong, Jimmy Lin, Scott Yih, Xilun Chen

December 12, 2024

NLP

CORE MACHINE LEARNING

Memory Layers at Scale

Vincent-Pierre Berges, Barlas Oguz

December 12, 2024

NLP

Byte Latent Transformer: Patches Scale Better Than Tokens

Artidoro Pagnoni, Ram Pasunuru, Pedro Rodriguez, John Nguyen, Benjamin Muller, Margaret Li, Chunting Zhou, Lili Yu, Jason Weston, Luke Zettlemoyer, Gargi Ghosh, Mike Lewis, Ari Holtzman, Srini Iyer

December 12, 2024

HUMAN & MACHINE INTELLIGENCE

NLP

Explore Theory-of-Mind: Program-Guided Adversarial Data Generation for Theory of Mind Reasoning

Melanie Sclar, Jane Yu, Maryam Fazel-Zarandi, Yulia Tsvetkov, Yonatan Bisk, Yejin Choi, Asli Celikyilmaz
