CORE MACHINE LEARNING

UniBench: Visual Reasoning Requires Rethinking Vision-Language Beyond Scaling

December 18, 2024

Abstract

Significant research efforts have been made to scale and improve vision-language model (VLM) training approaches. Yet, with an ever-growing number of benchmarks, researchers are tasked with the heavy burden of implementing each protocol, bearing a non-trivial computational cost, and making sense of how all these benchmarks translate into meaningful axes of progress. To facilitate a systematic evaluation of VLM progress, we introduce UniBench: a unified implementation of 50+ VLM benchmarks spanning a comprehensive range of carefully categorized capabilities from object recognition to spatial awareness, counting, and much more. We showcase the utility of UniBench for measuring progress by evaluating nearly 60 publicly available vision-language models, trained on scales of up to 12.8B samples. We find that while scaling training data or model size can boost many vision-language model capabilities, scaling offers little benefit for reasoning or relations. Surprisingly, we also discover today's best VLMs struggle on simple digit recognition and counting tasks, e.g. MNIST, which much simpler networks can solve. Where scale falls short, we find that more precise interventions, such as data quality or tailored-learning objectives offer more promise. For practitioners, we also offer guidance on selecting a suitable VLM for a given application. Finally, we release an easy-to-run UniBench code-base with the full set of 50+ benchmarks and comparisons across 59 models as well as a distilled, representative set of benchmarks that runs in 5 minutes on a single GPU.

Download the Paper

AUTHORS

Written by

Randall Balestriero

Diane Bouchacourt

Mark Ibrahim

Caner Hazirbas

Haider Al-Tahan

Quentin Garrido

Publisher

NeurIPS Benchmarks Track

Research Topics

Core Machine Learning

Related Publications

May 12, 2026

HUMAN & MACHINE INTELLIGENCE

RESEARCH

NeuralSet: A High-Performing Python Package for Neuro-AI

Corentin Bel, Linnea Evanson, Julien Gadonneix, Andrea Santos Revilla, Mingfang (Lucy) Zhang, Julie Bonnaire, Charlotte Caucheteux, Alexandre Défossez, Théo Desbordes, Pablo Diego-Simón, Shubh Khanna, Juliette Millet, Pierre Orhan, Saarang Panchavati, Antoine Ratouchniak, Alexis Thual, Hubert Jacob Banville, Jarod Levy, Jean Remi King, Josephine Raugel, Jérémy Rapin, Katelyn Begany, Marlene Careil, Simon Dahan, Sophia Houhamdi, Stéphane d'Ascoli, Teon Brooks, Yohann Benchetrit

May 12, 2026

November 18, 2025

RESEARCH

CORE MACHINE LEARNING

Souper-Model: How Simple Arithmetic Unlocks State-of-the-Art LLM Performance

Roberta Raileanu, * Equal authorship, Alexis Audran-Reiss, Amar Budhiraja *, Anton Protopopov, Bhavul Gauri, Despoina Magka, Gaurav Chaurasia, Michael Slater, Shalini Maiti *, Tatiana Shavrina, Yoram Bachrach

November 18, 2025

October 13, 2025

REINFORCEMENT LEARNING

RESEARCH

SPG: Sandwiched Policy Gradient for Masked Diffusion Language Models

Paria Rashidinejad, Cai Zhou, Tommi Jaakkola, DiJia Su, Bo Liu, Feiyu Chen, Chenyu Wang, Shannon Zejiang Shen, Sid Wang, Siyan Zhao, Song Jiang, Yuandong Tian

October 13, 2025

September 24, 2025

RESEARCH

NLP

CWM: An Open-Weights LLM for Research on Code Generation with World Models

Chris Cummins, Hugh Leather, Aram Markosyan, Matteo Pagliardini, Tal Remez, Volker Seeker, Marco Selvi, Lingming Zhang, Abhishek Charnalia, Alex Gu, Badr Youbi Idrissi, Christian Keller, Daniel Haziza, David Zhang, Dmitrii Pedchenko, Emily McMilin, Fabian Gloeckle, Felix Kreuk, Francisco Massa, François Fleuret, Gabriel Synnaeve, Gal Cohen, Gallil Maimon, Jacob Kahn, Jade Copet, Jannik Kossen, Jonas Gehring, Jordi Armengol-Estape, Juliette Decugis, Keyur Muzumdar, Kunhao Zheng, Luca Wehrstedt, Maximilian Beck, Michael Hassid, Michel Meyer, Naila Murray, Oren Sultan, Ori Yoran, Pedram Bashiri, Peter O'Hearn, Pierre Chambon, Pierre-Emmanuel Mazaré, Quentin Carbonneaux, Rahul Kindi, Sida Wang, Taco Cohen, Vegard Mella, Yossi Adi, Yuxiang Wei, Zacharias Fisches

September 24, 2025

Help Us Pioneer The Future of AI

We share our open source frameworks, tools, libraries, and models for everything from research exploration to large-scale production deployment.