July 11, 2018
Off-policy learning is key to scaling up reinforcement learning as it allows to learn about a target policy from the experience generated by a different behavior policy. Unfortunately, it has been challenging to combine off-policy learning with function approximation and multi-step bootstrapping in a way that leads to both stable and efficient algorithms. In this work, we show that the Tree Backup and Retrace algorithms are unstable with linear function approximation, both in theory and in practice with specific examples. Based on our analysis, we then derive stable and efficient gradient-based algorithms using a quadratic convex-concave saddle-point formulation. By exploiting the problem structure proper to these algorithms, we are able to provide convergence guarantees and finite-sample bounds. The applicability of our new analysis also goes beyond Tree Backup and Retrace and allows us to provide new convergence rates for the GTD and GTD2 algorithms without having recourse to projections or Polyak averaging.
May 14, 2025
Brandon M. Wood, Misko Dzamba, Xiang Fu, Meng Gao, Muhammed Shuaibi, Luis Barroso-Luque, Kareem Abdelmaqsoud, Vahe Gharakhanyan, John R. Kitchin, Daniel S. Levine, Kyle Michel, Anuroop Sriram, Taco Cohen, Abhishek Das, Ammar Rizvi, Sushree Jagriti Sahoo, Zachary W. Ulissi, C. Lawrence Zitnick
May 14, 2025
May 13, 2025
Marlène Careil, Yohann Benchetrit, Jean-Rémi King
May 13, 2025
April 25, 2025
Rulin Shao, Qiao Rui, Varsha Kishore, Niklas Muennighoff, Victoria Lin, Daniela Rus, Bryan Kian Hsiang Low, Sewon Min, Scott Yih, Pang Wei Koh, Luke Zettlemoyer
April 25, 2025
April 17, 2025
Ansong Ni, Ruta Desai, Yang Li, Xinjie Lei, Dong Wang, Ramya Raghavendra, Gargi Ghosh, Daniel Li (FAIR), Asli Celikyilmaz
April 17, 2025
Our approach
Latest news
Foundational models