August 22, 2025
Large Language Models (LLMs) have shown great potential in reasoning tasks through test-time scaling methods like self-consistency with majority voting. However, this approach often leads to diminishing returns in accuracy and high computational overhead. To address these challenges, we introduce Deep Think with Confidence (DeepConf), a simple yet powerful method that enhances both reasoning efficiency and performance at test time. DeepConf leverages model-internal confidence signals to dynamically filter out low-quality reasoning traces during or after generation. It requires no additional model training or hyperparameter tuning and can be seamlessly integrated into existing serving frameworks. We evaluate DeepConf across a variety of reasoning tasks and the latest open-source models, including Qwen 3 and GPT-OSS series. Notably, on challenging benchmarks such as AIME 2025, DeepConf@512 achieves up to 99.9% accuracy and reduces generated tokens by up to 84.7% compared to full parallel thinking.
Publisher
arXiv
Research Topics
Core Machine Learning
May 12, 2026
Corentin Bel, Linnea Evanson, Julien Gadonneix, Andrea Santos Revilla, Mingfang (Lucy) Zhang, Julie Bonnaire, Charlotte Caucheteux, Alexandre Défossez, Théo Desbordes, Pablo Diego-Simón, Shubh Khanna, Juliette Millet, Pierre Orhan, Saarang Panchavati, Antoine Ratouchniak, Alexis Thual, Hubert Jacob Banville, Jarod Levy, Jean Remi King, Josephine Raugel, Jérémy Rapin, Katelyn Begany, Marlene Careil, Simon Dahan, Sophia Houhamdi, Stéphane d'Ascoli, Teon Brooks, Yohann Benchetrit
May 12, 2026
November 18, 2025
Roberta Raileanu, * Equal authorship, Alexis Audran-Reiss, Amar Budhiraja *, Anton Protopopov, Bhavul Gauri, Despoina Magka, Gaurav Chaurasia, Michael Slater, Shalini Maiti *, Tatiana Shavrina, Yoram Bachrach
November 18, 2025
October 13, 2025
Paria Rashidinejad, Cai Zhou, Tommi Jaakkola, DiJia Su, Bo Liu, Feiyu Chen, Chenyu Wang, Shannon Zejiang Shen, Sid Wang, Siyan Zhao, Song Jiang, Yuandong Tian
October 13, 2025
September 24, 2025
Chris Cummins, Hugh Leather, Aram Markosyan, Matteo Pagliardini, Tal Remez, Volker Seeker, Marco Selvi, Lingming Zhang, Abhishek Charnalia, Alex Gu, Badr Youbi Idrissi, Christian Keller, Daniel Haziza, David Zhang, Dmitrii Pedchenko, Emily McMilin, Fabian Gloeckle, Felix Kreuk, Francisco Massa, François Fleuret, Gabriel Synnaeve, Gal Cohen, Gallil Maimon, Jacob Kahn, Jade Copet, Jannik Kossen, Jonas Gehring, Jordi Armengol-Estape, Juliette Decugis, Keyur Muzumdar, Kunhao Zheng, Luca Wehrstedt, Maximilian Beck, Michael Hassid, Michel Meyer, Naila Murray, Oren Sultan, Ori Yoran, Pedram Bashiri, Peter O'Hearn, Pierre Chambon, Pierre-Emmanuel Mazaré, Quentin Carbonneaux, Rahul Kindi, Sida Wang, Taco Cohen, Vegard Mella, Yossi Adi, Yuxiang Wei, Zacharias Fisches
September 24, 2025

Our approach
Latest news
Foundational models