NLP

COMPUTER VISION

Large-scale Pretraining for Visual Dialog: A Simple State-of-the-Art Baseline

July 15, 2020

Abstract

Prior work in visual dialog has focused on training deep neural models on VisDial [1] in isolation. Instead, we present an approach to leverage pretraining on related vision-language datasets before transferring to visual dialog. We adapt the recently proposed ViLBERT model [2] for multi-turn visually-grounded conversations. Our model is pretrained on the Conceptual Captions [3] and Visual Question Answering [4] datasets, and finetuned on VisDial. Our best single model outperforms prior published work by >1% absolute on NDCG and MRR. Next, we find that additional finetuning using “dense” annotations in VisDial leads to even higher NDCG – more than 10% over our base model – but hurts MRR – more than 17% below our base model! This highlights a trade-off between the two primary metrics – NDCG and MRR – which we find is due to dense annotations not correlating well with the original ground-truth answers to questions.
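
The trade-off between NDCG and MRR described above can be illustrated with a small toy example. This is a minimal sketch, not the authors' evaluation code: the function names, the four-candidate relevance scores, and the choice of cutoff k (the number of candidates with non-zero dense relevance) are all assumptions made for illustration.

```python
import math


def mean_reciprocal_rank(gt_ranks):
    """MRR over 1-indexed ranks of the single ground-truth answer."""
    return sum(1.0 / r for r in gt_ranks) / len(gt_ranks)


def ndcg(ranking, relevance):
    """NDCG for one question.

    ranking:   candidate indices ordered best-first by the model.
    relevance: dense relevance score per candidate index.
    Assumption: the cutoff k is the number of candidates with
    non-zero relevance.
    """
    k = sum(1 for r in relevance if r > 0)

    def dcg(order):
        # Position 0 gets discount log2(2) = 1, position 1 gets log2(3), ...
        return sum(relevance[c] / math.log2(pos + 2)
                   for pos, c in enumerate(order[:k]))

    ideal = sorted(range(len(relevance)),
                   key=lambda c: relevance[c], reverse=True)
    ideal_dcg = dcg(ideal)
    return dcg(ranking) / ideal_dcg if ideal_dcg > 0 else 0.0


# Hypothetical question with 4 candidate answers. Candidate 0 is the
# original ground-truth answer, but annotators gave candidates 1 and 2
# a higher dense relevance score.
relevance = [0.5, 1.0, 1.0, 0.0]
gt_index = 0

ranking_a = [0, 1, 2, 3]  # puts the ground-truth answer first
ranking_b = [1, 2, 0, 3]  # follows the dense annotations instead

for name, ranking in [("A", ranking_a), ("B", ranking_b)]:
    gt_rank = ranking.index(gt_index) + 1
    print(name,
          "MRR:", round(mean_reciprocal_rank([gt_rank]), 3),
          "NDCG:", round(ndcg(ranking, relevance), 3))
```

In this toy setup, ranking A scores higher on MRR while ranking B scores higher on NDCG, because the dense relevance scores do not place the original ground-truth answer at the top.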

AUTHORS

Written by

Devi Parikh

Abhishek Das

Dhruv Batra

Vishvak Murahari

Publisher

ECCV
