November 25, 2019
State-of-the-art efficient model-based Reinforcement Learning (RL) algorithms typically act by iteratively solving empirical models, i.e., by performing full planning on Markov Decision Processes (MDPs) built from the gathered experience. In this paper, we focus on model-based RL in the finite-state, finite-horizon, undiscounted MDP setting and establish that exploring with greedy policies, which act by 1-step planning, can achieve tight minimax performance in terms of regret, $\tilde{O}(\sqrt{HSAT})$. Thus, full planning in model-based RL can be avoided altogether without any performance degradation, and doing so reduces the computational complexity by a factor of $S$. The results rest on a novel analysis of real-time dynamic programming (RTDP), which we then extend to model-based RL. Specifically, we generalize existing algorithms that perform full planning so that they act by 1-step planning. For these generalizations, we prove regret bounds with the same rate as their full-planning counterparts.
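To make the contrast between full planning and 1-step planning concrete, here is a minimal sketch of real-time dynamic programming on a finite-horizon MDP whose model is assumed to be known. It is not the paper's model-based algorithms, which act on empirical models built from experience. The random MDP, the function names (`q_value`, `backward_induction`, `rtdp_episode`), and the assumption that rewards lie in [0, 1] are all hypothetical choices for illustration. Full planning (backward induction) recomputes every value at every stage, costing O(S^2 A H) per episode. RTDP instead acts greedily on a 1-step lookahead and updates only the visited state, costing O(S A) per step and O(S A H) per episode, which is where the factor-of-S saving comes from.

```python
import random

def q_value(P, R, V_next, s, a):
    """1-step lookahead: r(s, a) + sum_s' p(s' | s, a) * V_next[s']. Costs O(S)."""
    return R[s][a] + sum(p * v for p, v in zip(P[s][a], V_next))

def backward_induction(P, R, H):
    """Full planning: exact optimal values V[h][s] for h = 0..H, with V[H] = 0.
    Touches every (h, s, a) triple, so it costs O(S^2 * A * H)."""
    S, A = len(R), len(R[0])
    V = [[0.0] * S for _ in range(H + 1)]
    for h in range(H - 1, -1, -1):
        for s in range(S):
            V[h][s] = max(q_value(P, R, V[h + 1], s, a) for a in range(A))
    return V

def rtdp_episode(P, R, V, s0, rng):
    """One RTDP episode: act greedily w.r.t. a 1-step lookahead on the current
    value table and update V only at the visited (h, s) pairs (O(S * A) per step)."""
    S, A = len(R), len(R[0])
    H = len(V) - 1
    s, ret = s0, 0.0
    for h in range(H):
        qs = [q_value(P, R, V[h + 1], s, a) for a in range(A)]
        a = max(range(A), key=qs.__getitem__)   # greedy action
        V[h][s] = qs[a]                         # update the visited state only
        ret += R[s][a]
        s = rng.choices(range(S), weights=P[s][a])[0]  # sample next state
    return ret

# Hypothetical random MDP for illustration: S states, A actions, horizon H.
rng = random.Random(0)
S, A, H = 5, 3, 4
P = []
for s in range(S):
    rows = []
    for a in range(A):
        w = [rng.random() for _ in range(S)]
        total = sum(w)
        rows.append([x / total for x in w])
    P.append(rows)
R = [[rng.random() for _ in range(A)] for _ in range(S)]  # rewards in [0, 1)

V_star = backward_induction(P, R, H)

# Optimistic initialization: V[h][s] = H - h upper-bounds V*[h][s] for rewards in [0, 1].
V = [[float(H - h)] * S for h in range(H + 1)]
for _ in range(500):
    rtdp_episode(P, R, V, 0, rng)
```

With optimistic initialization, each 1-step backup uses an upper bound on the next-stage values, so by induction the RTDP table stays an upper bound on the optimal values and its entries never increase; optimism is what lets a greedy policy keep exploring without a full planning step.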
Written by
Mohammad Ghavamzadeh
Nadav Merlis
Shie Mannor
Yonathan Efroni
Publisher
NeurIPS