July 17, 2021
Environments with procedurally generated content serve as important benchmarks for testing systematic generalization in deep reinforcement learning. In this setting, each level is an algorithmically created environment instance with a unique configuration of its factors of variation. Training on a prespecified subset of levels allows for testing generalization to unseen levels. What can be learned from a level depends on the current policy, yet prior work defaults to uniform sampling of training levels independently of the policy. We introduce Prioritized Level Replay (PLR), a general framework for selectively sampling the next training level by prioritizing those with higher estimated learning potential when revisited in the future. We show that TD-errors effectively estimate a level's future learning potential and, when used to guide the sampling procedure, induce an emergent curriculum of increasingly difficult levels. By adapting the sampling of training levels, PLR significantly improves sample efficiency and generalization on Procgen Benchmark, matching the previous state-of-the-art in test return, and readily combines with other methods. Combined with the previous leading method, PLR raises the state-of-the-art to over 76% improvement in test return relative to standard RL baselines.
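The sampling idea described in the abstract can be illustrated with a short sketch. The code below is a hypothetical, simplified illustration rather than the authors' implementation: it keeps a per-level score (here, the mean absolute TD-error from the latest rollout on that level), chooses between visiting an unseen level and replaying a previously scored one, and replays with rank-based prioritization mixed with a staleness term so that infrequently visited levels eventually get re-scored. All class and parameter names (`LevelSampler`, `rho`, `temperature`, `staleness_coef`) and the constants used are assumptions made for this sketch, not tuned settings.

```python
# Hypothetical sketch of prioritized level sampling in the spirit of PLR.
# Names and hyperparameters are illustrative assumptions, not the reference code.
import random
from dataclasses import dataclass, field


@dataclass
class LevelSampler:
    level_ids: list                                   # pool of training level seeds
    rho: float = 0.5                                   # replay probability once levels are scored
    temperature: float = 0.1                           # rank-prioritization temperature
    staleness_coef: float = 0.1                        # weight of the staleness term
    scores: dict = field(default_factory=dict)         # level -> learning-potential score
    last_sampled: dict = field(default_factory=dict)   # level -> step at which it was last sampled
    step: int = 0

    def sample(self):
        """Pick the next training level: visit an unseen level or replay a scored one."""
        self.step += 1
        unseen = [l for l in self.level_ids if l not in self.scores]
        if unseen and (not self.scores or random.random() > self.rho):
            level = random.choice(unseen)
        else:
            level = self._sample_replay()
        self.last_sampled[level] = self.step
        return level

    def _sample_replay(self):
        # Rank-based prioritization: higher score -> better rank -> larger weight.
        levels = sorted(self.scores, key=self.scores.get, reverse=True)
        rank_w = [(1.0 / (i + 1)) ** (1.0 / self.temperature) for i in range(len(levels))]
        total_rank = sum(rank_w)
        # Staleness: how long since each level was last sampled.
        stale_w = [self.step - self.last_sampled.get(l, 0) for l in levels]
        total_stale = sum(stale_w) or 1.0
        weights = [
            (1 - self.staleness_coef) * r / total_rank + self.staleness_coef * s / total_stale
            for r, s in zip(rank_w, stale_w)
        ]
        return random.choices(levels, weights=weights, k=1)[0]

    def update_score(self, level, td_errors):
        """After a rollout on `level`, score it by the mean absolute TD-error."""
        self.scores[level] = sum(abs(d) for d in td_errors) / max(len(td_errors), 1)


# Minimal usage: sample a level, collect a rollout with the current policy,
# then feed that rollout's TD-errors back into the sampler.
sampler = LevelSampler(level_ids=list(range(200)))
level = sampler.sample()
sampler.update_score(level, td_errors=[0.3, -0.1, 0.7])  # placeholder TD-errors
```

The mix of rank-prioritized scores with a staleness term keeps the curriculum adaptive: levels with high estimated learning potential are replayed often, while levels whose scores have gone stale are periodically revisited so their estimates stay current.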
Publisher
ICML 2021