October 27, 2021
We hypothesize that effective policies can be learned from data without dynamic programming bootstrapping. To investigate this, we replace traditional reinforcement learning (RL) algorithms, which typically bootstrap against a learned value function, with a simple sequence modeling objective. We train a transformer model on sequences of returns, states, and actions with the autoregressive prediction loss widely used in language modeling, reducing policy sampling to sequence generation. Training with a supervised loss removes the need for dynamic programming bootstrapping, which is known to be unstable with function approximation, and lets us leverage the simplicity, scalability, and long-range memory capabilities of transformers. Through experiments spanning a diverse set of offline RL benchmarks including Atari, OpenAI Gym, and Key-to-Door, we show that our Decision Transformer model can learn to generate diverse behaviors by conditioning on desired returns. In particular, when conditioned on high desired returns, Decision Transformer produces a policy that is competitive with or better than state-of-the-art model-free offline RL algorithms.
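As a rough illustration of the data layout described above, the sketch below turns one recorded episode into an interleaved sequence of returns, states, and actions. It assumes that "returns" means the undiscounted sum of rewards from each timestep to the end of the episode. The function names `returns_to_go` and `build_sequence` and the string tags are hypothetical, chosen for illustration only. The transformer and its training loop are not shown.

```python
from typing import Any, List, Sequence, Tuple


def returns_to_go(rewards: Sequence[float]) -> List[float]:
    """Undiscounted sum of rewards from each timestep to the end of the episode.

    Assumption: this is what "returns" refers to in the text above.
    """
    rtg: List[float] = []
    running = 0.0
    # Accumulate from the last step backwards, then restore time order.
    for reward in reversed(rewards):
        running += reward
        rtg.append(running)
    rtg.reverse()
    return rtg


def build_sequence(
    states: Sequence[Any],
    actions: Sequence[Any],
    rewards: Sequence[float],
) -> List[Tuple[str, Any]]:
    """Interleave (return, state, action) triples in time order.

    The string tags are illustrative; a real model would embed each
    modality separately before feeding it to the transformer.
    """
    if not (len(states) == len(actions) == len(rewards)):
        raise ValueError("states, actions, and rewards must have equal length")
    tokens: List[Tuple[str, Any]] = []
    for g, s, a in zip(returns_to_go(rewards), states, actions):
        tokens.append(("return", g))
        tokens.append(("state", s))
        tokens.append(("action", a))
    return tokens


if __name__ == "__main__":
    episode = build_sequence(["s0", "s1"], ["a0", "a1"], [1.0, 2.0])
    for token in episode:
        print(token)
```

Under these assumptions, an autoregressive model trained on such sequences would predict each action from the tokens that precede it. At test time, the first return token would be set to the desired return and reduced by each observed reward as the episode proceeds.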
Written by
Lili Chen
Kevin Lu
Kimin Lee
Michael Laskin
Pieter Abbeel
Aravind Srinivas
Igor Mordatch
Aravind Rajeswaran
Aditya Grover
Publisher
NeurIPS
