September 06, 2023
Modern recommendation systems ought to benefit by probing for and learning from delayed feedback. Research has tended to focus on learning from a user’s response to a single recommendation. Such work, which leverages methods of supervised and bandit learning, forgoes learning from the user’s subsequent behavior. Where past work has aimed to learn from subsequent behavior, there has been a lack of effective methods for probing to elicit informative delayed feedback. Effective exploration through probing for delayed feedback becomes particularly challenging when rewards are sparse. To address this, we develop deep exploration methods for recommendation systems. In particular, we formulate recommendation as a sequential decision problem and demonstrate benefits of deep exploration over single-step exploration. Our experiments are carried out with high-fidelity industrial-grade simulators and establish large improvements over existing algorithms.
Publisher
RecSys