June 17, 2023
Neural captioners are typically trained to mimic humangenerated references without optimizing for any specific communication goal, leading to problems such as the generation of vague captions. In this paper, we show that fine-tuning an out-of-the-box neural captioner with a selfsupervised discriminative communication objective helps to recover a plain, visually descriptive language that is more informative about image contents. Given a target image, the system must learn to produce a description that enables an out-of-the-box text-conditioned image retriever to identify such image among a set of candidates. We experiment with the popular ClipCap captioner, also replicating the main results with BLIP. In terms of similarity to groundtruth human descriptions, the captions emerging from discriminative finetuning lag slightly behind those generated by the non-finetuned model, when the latter is trained and tested on the same caption dataset. However, when the model is used without further tuning to generate captions for out-of-domain datasets, our discriminatively-finetuned captioner generates descriptions that resemble human references more than those produced by the same captioner without finetuning. We further show that, on the Conceptual Captions dataset, discriminatively finetuned captions are more helpful than either vanilla ClipCap captions or ground-truth captions for human annotators tasked with an image discrimination task
Written by
Roberto Dessì
Michele Bevilacqua
Eleonora Gualdoni
Nathanaël Carraz Rakotonirina
Francesca Franzon
Marco Baroni
Publisher
CVPR
April 18, 2024
Jonas Kohler, Albert Pumarola, Edgar Schoenfeld, Artsiom Sanakoyeu, Roshan Sumbaly, Peter Vajda, Ali Thabet
April 18, 2024
April 14, 2024
Heng-Jui Chang, Ning Dong (AI), Ruslan Mavlyutov, Sravya Popuri, Andy Chung
April 14, 2024
March 20, 2024
Armen Avetisyan, Chris Xie, Henry Howard-Jenkins, Tsun-Yi Yang, Samir Aroudj, Suvam Patra, Fuyang Zhang, Duncan Frost, Luke Holland, Campbell Orme, Jakob Julian Engel, Edward Miller, Richard Newcombe, Vasileios Balntas
March 20, 2024
February 21, 2024
Tom Sander, Pierre Fernandez, Alain Durmus, Matthijs Douze, Teddy Furon
February 21, 2024
Product experiences
Foundational models
Product experiences
Latest news
Foundational models