THEORY

REINFORCEMENT LEARNING

Dual Approximation Policy Optimization

August 16, 2024

Abstract

We propose Dual Approximation Policy Optimization (DAPO), a framework that incorporates general function approximation into policy mirror descent methods. In contrast to the popular approach of using the L2-norm to measure function approximation errors, DAPO uses the dual Bregman divergence induced by the mirror map for policy projection. This duality framework has both theoretical and practical implications: not only does it achieve fast linear convergence with general function approximation, but it also includes several well-known practical methods as special cases, immediately providing strong convergence guarantees.
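
For readers unfamiliar with policy mirror descent, the following is a minimal tabular sketch of the base method that the abstract builds on. It is not the DAPO algorithm itself. It assumes the negative-entropy mirror map, for which the mirror descent update has the well-known closed form of multiplying the current policy by exp(step_size * Q) and renormalizing. The function name pmd_step, the parameter step_size, and the toy Q-values are illustrative assumptions and do not come from the paper.

```python
import numpy as np

def pmd_step(policy, q_values, step_size):
    """One tabular policy mirror descent step (negative-entropy mirror map).

    policy:   (S, A) array, each row a probability distribution over actions
    q_values: (S, A) array of action-value estimates for the current policy
    Hypothetical helper for illustration only; not taken from the paper.
    """
    # Move to the dual space: the gradient of negative entropy is log(pi) + 1,
    # and the constant is absorbed by the normalization below.
    logits = np.log(policy) + step_size * q_values
    # Subtract the per-state maximum for numerical stability.
    logits = logits - logits.max(axis=1, keepdims=True)
    new_policy = np.exp(logits)
    # Renormalizing each row is the KL (Bregman) projection onto the simplex.
    return new_policy / new_policy.sum(axis=1, keepdims=True)

# Toy problem: 2 states, 3 actions, uniform initial policy, made-up Q-values.
policy = np.full((2, 3), 1.0 / 3.0)
q_values = np.array([[1.0, 0.0, 0.0],
                     [0.0, 0.0, 2.0]])
updated = pmd_step(policy, q_values, step_size=1.0)
```

In this tabular setting the update is exact. With function approximation the new policy can only be fitted approximately, and the abstract's point is that DAPO measures that fitting error with the dual Bregman divergence induced by the mirror map rather than with the L2-norm.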

AUTHORS

Written by

Zhihan Xiong

Maryam Fazel

Lin Xiao

Publisher

ICML

Research Topics

Theory

Reinforcement Learning

Core Machine Learning

Related Publications

December 12, 2024

REINFORCEMENT LEARNING

Zero-Shot Whole-Body Humanoid Control via Behavioral Foundation Models

Andrea Tirinzoni, Ahmed Touati, Jesse Farebrother, Mateusz Guzek, Anssi Kanervisto, Yingchen Xu, Alessandro Lazaric, Matteo Pirotta

November 06, 2024

THEORY

CORE MACHINE LEARNING

The Road Less Scheduled

Aaron Defazio, Alice Yang, Harsh Mehta, Konstantin Mishchenko, Ahmed Khaled, Ashok Cutkosky

July 08, 2024

THEORY

CORE MACHINE LEARNING

An Adaptive Stochastic Gradient Method with Non-negative Gauss-Newton Stepsizes

Antonio Orvieto, Lin Xiao

July 01, 2024

REINFORCEMENT LEARNING

Behaviour Distillation

Andrei Lupu, Chris Lu, Robert Lange, Jakob Foerster
