REINFORCEMENT LEARNING

CORE MACHINE LEARNING

Decision Transformer: Reinforcement Learning via Sequence Modeling

October 27, 2021

Abstract

We hypothesize that effective policies can be learned from data without dynamic programming bootstrapping. To investigate this, we consider replacing traditional reinforcement learning (RL) algorithms, which typically bootstrap against a learned value function, with a simple sequence modeling objective. We train a transformer model on sequences of returns, states, and actions with the autoregressive prediction loss widely used in language modeling, reducing policy sampling to sequence generation. Training with a supervised loss removes the need for dynamic programming bootstrapping, which is known to be unstable with function approximation, and lets us leverage the simplicity, scalability, and long-range memory capabilities of transformers. Through experiments spanning a diverse set of offline RL benchmarks including Atari, OpenAI Gym, and Key-to-Door, we show that our Decision Transformer model can learn to generate diverse behaviors by conditioning on desired returns. In particular, when conditioned on high desired returns, Decision Transformer produces a policy that is competitive with or better than state-of-the-art model-free offline RL algorithms.
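To make the sequence format concrete, the following is a minimal sketch of how one logged episode might be turned into a sequence of returns, states, and actions, and how a desired return could be updated while acting. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say how returns are computed, so the sketch assumes the return paired with each step is the undiscounted sum of that step's reward and all later rewards. The helper names `returns_to_go`, `build_sequence`, and `update_target_return` are hypothetical, and the transformer itself is omitted.

```python
from typing import Any, List, Sequence, Tuple


def returns_to_go(rewards: Sequence[float]) -> List[float]:
    """Undiscounted sum of rewards from each timestep to the end of the episode.

    Assumption: no discounting; the abstract does not specify this detail.
    """
    rtg: List[float] = []
    running = 0.0
    # Accumulate from the last reward backwards, then restore time order.
    for reward in reversed(rewards):
        running += reward
        rtg.append(running)
    rtg.reverse()
    return rtg


def build_sequence(
    states: Sequence[Any],
    actions: Sequence[Any],
    rewards: Sequence[float],
) -> List[Tuple[str, Any]]:
    """Interleave (return, state, action) entries for one episode.

    Each entry is tagged with its kind so that a model could embed the three
    kinds differently; the tags themselves are a hypothetical convention.
    """
    if not (len(states) == len(actions) == len(rewards)):
        raise ValueError("states, actions, and rewards must have equal length")
    sequence: List[Tuple[str, Any]] = []
    for ret, state, action in zip(returns_to_go(rewards), states, actions):
        sequence.append(("return", ret))
        sequence.append(("state", state))
        sequence.append(("action", action))
    return sequence


def update_target_return(target_return: float, reward: float) -> float:
    """Reduce the desired return by the reward just received while acting."""
    return target_return - reward


if __name__ == "__main__":
    episode = build_sequence(["s0", "s1"], ["a0", "a1"], [1.0, 2.0])
    for kind, value in episode:
        print(kind, value)
```

In this sketch, an autoregressive model would be trained to predict each action entry from the entries that precede it. At evaluation time, the first return entry would be set to the desired return and reduced with `update_target_return` after every environment step, which is one way to read "conditioning on desired returns".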

AUTHORS

Written by

Lili Chen

Kevin Lu

Kimin Lee

Michael Laskin

Pieter Abbeel

Aravind Srinivas

Igor Mordatch

Aravind Rajeswaran

Aditya Grover

Publisher

NeurIPS

Research Topics

Reinforcement Learning

Core Machine Learning
