Accelerating Exploration with Unlabeled Prior Data

December 10, 2023

Abstract

Learning to solve tasks from a sparse reward signal is a major challenge for standard reinforcement learning (RL) algorithms. In the real world, however, agents rarely need to solve sparse-reward tasks entirely from scratch. More often, we have prior experience that provides considerable guidance about which actions and outcomes are possible in the world, and we can use it to explore more effectively on new tasks. In this work, we study how prior data without reward labels can guide and accelerate exploration for an agent solving a new sparse-reward task. We propose a simple approach that learns a reward model from online experience, labels the unlabeled prior data with optimistic rewards, and then uses the relabeled prior data alongside the online data for downstream policy and critic optimization. This general formula leads to rapid exploration in several challenging sparse-reward domains where tabula rasa exploration is insufficient, including the AntMaze domain, the Adroit hand manipulation domain, and a visual simulated robotic manipulation domain. Our results highlight the ease of incorporating unlabeled prior data into existing online RL algorithms, and the (perhaps surprising) effectiveness of doing so.
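The three steps named in the abstract (fit a reward model on online experience, label the prior data optimistically, train on both together) can be illustrated with a small sketch. The abstract does not say what form the reward model takes or how the optimism is computed, so this example substitutes a tabular average-reward model and a count-based bonus purely for illustration. Every name here (fit_reward_model, optimistic_reward, build_training_data, beta) is hypothetical and not taken from the paper, and the prior data is reduced to (state, action) pairs for brevity.

```python
from collections import defaultdict
from math import sqrt


def fit_reward_model(online_transitions):
    """Average observed reward per (state, action) pair, plus visit counts.

    Assumption: a tabular model stands in for whatever learned reward
    model the paper actually uses.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for state, action, reward in online_transitions:
        totals[(state, action)] += reward
        counts[(state, action)] += 1
    means = {key: totals[key] / counts[key] for key in counts}
    return means, counts


def optimistic_reward(state, action, means, counts, beta=1.0):
    """Reward estimate plus a bonus that shrinks with online visit count.

    Assumption: a count-based bonus is used as a simple form of optimism;
    pairs never seen online get estimate 0.0 and the full bonus `beta`.
    """
    key = (state, action)
    estimate = means.get(key, 0.0)
    bonus = beta / sqrt(counts.get(key, 0) + 1)
    return estimate + bonus


def build_training_data(online_transitions, prior_pairs, beta=1.0):
    """Relabel reward-free prior data and merge it with online data."""
    means, counts = fit_reward_model(online_transitions)
    relabeled = [
        (state, action, optimistic_reward(state, action, means, counts, beta))
        for state, action in prior_pairs
    ]
    # The combined list would feed an off-policy policy/critic update.
    return list(online_transitions) + relabeled


# Online data: the agent has only visited "s0" and seen no reward there.
online = [("s0", "right", 0.0)] * 3
# Prior data: reward-free pairs, one familiar and one never visited online.
prior = [("s0", "right"), ("s5", "right")]
data = build_training_data(online, prior)
```

In this toy run the prior pair from the unvisited state "s5" receives a larger optimistic reward than the pair from the well-visited state "s0", so a critic trained on the merged data would be drawn toward regions that the prior data covers but the online agent has not yet reached.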

AUTHORS

Written by

Qiyang Li

Jason Zhang

Dibya Ghosh

Amy Zhang

Sergey Levine

Publisher

NeurIPS

Research Topics

Reinforcement Learning

Robotics

Related Publications

May 06, 2024

REINFORCEMENT LEARNING

COMPUTER VISION

Solving General Noisy Inverse Problem via Posterior Sampling: A Policy Gradient Viewpoint

Haoyue Tang, Tian Xie

May 06, 2024

ROBOTICS

Bootstrapping Linear Models for Fast Online Adaptation in Human-Agent Collaboration

Ben Newman, Christopher Paxton, Kris Kitani, Henny Admoni

April 30, 2024

REINFORCEMENT LEARNING

Multi-Agent Diagnostics for Robustness via Illuminated Diversity

Mikayel Samvelyan, Minqi Jiang, Davide Paglieri, Jack Parker-Holder, Tim Rocktäschel

April 02, 2024

ROBOTICS

REINFORCEMENT LEARNING

MoDem-V2: Visuo-Motor World Models for Real-World Robot Manipulation

Patrick Lancaster, Nicklas Hansen, Aravind Rajeswaran, Vikash Kumar
