November 12, 2021
Bandits with Knapsacks (BwK) is a general model for multi-armed bandits under supply/budget constraints. While worst-case regret bounds for BwK are well-understood, we present three results that go beyond the worst-case perspective. First, we provide upper and lower bounds that amount to a full characterization of logarithmic, instance-dependent regret rates. Second, we consider “simple regret” in BwK, which tracks the algorithm’s performance in a given round, and prove that it is small in all but a few rounds. Third, we provide a general “reduction” from BwK to bandits which takes advantage of some known helpful structure, and apply this reduction to combinatorial semi-bandits, linear contextual bandits, and multinomial-logit bandits. Our results build on the BwK algorithm from Agrawal and Devanur (2014), providing new analyses thereof.
Publisher
NeurIPS