THEORY

REINFORCEMENT LEARNING

Dual Approximation Policy Optimization

August 16, 2024

Abstract

We propose Dual Approximation Policy Optimization (DAPO), a framework that incorporates general function approximation into policy mirror descent methods. In contrast to the popular approach of using the L2-norm to measure function approximation errors, DAPO uses the dual Bregman divergence induced by the mirror map for policy projection. This duality framework has both theoretical and practical implications: not only does it achieve fast linear convergence with general function approximation, but it also includes several well-known practical methods as special cases, immediately providing strong convergence guarantees.

Download the Paper

AUTHORS

Written by

Zhihan Xiong

Maryam Fazel

Lin Xiao

Publisher

ICML

Research Topics

Theory

Reinforcement Learning

Core Machine Learning

Related Publications

October 13, 2025

REINFORCEMENT LEARNING

RESEARCH

SPG: Sandwiched Policy Gradient for Masked Diffusion Language Models

Chenyu Wang, Paria Rashidinejad, DiJia Su, Song Jiang, Sid Wang, Siyan Zhao, Cai Zhou, Shannon Zejiang Shen, Feiyu Chen, Tommi Jaakkola, Yuandong Tian, Bo Liu

October 13, 2025

September 24, 2025

CONVERSATIONAL AI

REINFORCEMENT LEARNING

Compute as Teacher: Turning Inference Compute Into Reference-Free Supervision

Dulhan Jayalath, Shashwat Goel, Thomas Simon Foster, Parag Jain, Suchin Gururangan, Cheng Zhang, Anirudh Goyal, Alan Schelten

September 24, 2025

September 08, 2025

THEORY

REINFORCEMENT LEARNING

Understanding Reinforcement Learning for Model Training, and future directions with GRAPE

Rohit Patel

September 08, 2025

September 02, 2025

REINFORCEMENT LEARNING

NLP

Jointly Reinforcing Diversity and Quality in Language Model Generations

Tianjian Li, Yiming Zhang, Ping Yu, Swarnadeep Saha, Daniel Khashabi, Jason Weston, Jack Lanchantin, Tianlu Wang

September 02, 2025

Help Us Pioneer The Future of AI

We share our open source frameworks, tools, libraries, and models for everything from research exploration to large-scale production deployment.