October 27, 2021
We propose a hypothesis that effective policies can be learned from data without dynamic programming bootstrapping. To investigate this, we consider replacing traditional reinforcement learning (RL) algorithms -- which typically bootstrap against a learned value function -- with a simple sequence modeling objective. We train a transformer model on sequences of returns, states, and actions with an autoregressive prediction loss widely used in language modeling, reducing policy sampling to sequence generation. By training a transformer model using a supervised loss function, we can remove the need for dynamic programming bootstrapping, which is known to be unstable with function approximation. Furthermore, we can also leverage the simplicity, scalability, and long-range memory capabilities of transformers. Through experiments spanning a diverse set of offline RL benchmarks including Atari, OpenAI Gym, and Key-to-Door, we show that our Decision Transformer model can learn to generate diverse behaviors by conditioning on desired returns. In particular, our Decision Transformer, when conditioned with high desired returns, produces a policy that is competitive or better than state of the art model-free offline RL algorithms.
Written by
Lili Chen
Kevin Lu
Aravind Rajeswaran
Kimin Lee
Aditya Grover
Michael Laskin
Pieter Abbeel
Aravind Srinivas
Igor Mordatch
Publisher
NeurIPS
August 12, 2024
Arman Zharmagambetov, Yuandong Tian, Aaron Ferber, Bistra Dilkina, Taoan Huang
August 12, 2024
August 09, 2024
Emily Wenger, Eshika Saxena, Mohamed Malhou, Ellie Thieu, Kristin Lauter
August 09, 2024
August 02, 2024
August 02, 2024
July 29, 2024
Rohit Girdhar, Mannat Singh, Andrew Brown, Quentin Duval, Samaneh Azadi, Saketh Rambhatla, Mian Akbar Shah, Xi Yin, Devi Parikh, Ishan Misra
July 29, 2024
Foundational models
Latest news
Foundational models