May 30, 2019
In stochastic multi-armed bandit (MAB), the reward distribution of each arm is assumed to be stationary. This assumption is often violated in practice (e.g., in recommendation systems), where the reward of an arm may change whenever is selected (i.e., rested bandit setting). In this paper, we consider the nonparametric rotting bandit setting, where rewards can only decrease. We introduce the filtering on expanding window average (FEWA) algorithm that constructs moving averages of increasing windows to identify arms that are more likely to return high rewards when pulled once more. We prove that for an unknown horizon T, and without any knowledge on the decreasing behavior of the K arms, FEWA achieves problem-dependent, Õ(log (KT)), and problem-independent, Õ( √ KT), regret bounds. This result substantially improves over the algorithm proposed by Levine et al. (2017), which suffers regret Õ(K1/3T2/3), and it matches standard bounds for the stochastic MAB setting, thus showing that the rotting bandit is not harder. Finally, we report simulations confirming the theoretical improvements of FEWA.
November 27, 2022
Nicolas Ballas, Bernhard Schölkopf, Chris Pal, Francesco Locatello, Li Erran, Martin Weiss, Nasim Rahaman, Yoshua Bengio
November 27, 2022
November 27, 2022
Andrea Tirinzoni, Aymen Al Marjani, Emilie Kaufmann
November 27, 2022
November 16, 2022
Kushal Tirumala, Aram H. Markosyan, Armen Aghajanyan, Luke Zettlemoyer
November 16, 2022
November 10, 2022
Unnat Jain, Abhinav Gupta, Himangi Mittal, Pedro Morgado
November 10, 2022
April 08, 2021
Caner Hazirbas, Joanna Bitton, Brian Dolhansky, Jacqueline Pan, Albert Gordo, Cristian Canton Ferrer
April 08, 2021
April 30, 2018
Tomer Galanti, Lior Wolf, Sagie Benaim
April 30, 2018
April 30, 2018
Yedid Hoshen, Lior Wolf
April 30, 2018
December 11, 2019
Eliya Nachmani, Lior Wolf
December 11, 2019
Foundational models
Latest news
Foundational models