Revisiting Simple Regret: Fast Rates for Returning a Good Arm

Yao Zhao, Connor James Stephens, Csaba Szepesvári, Kwang Sung Jun

Research output: Contribution to journalConference articlepeer-review

4 Scopus citations

Abstract

Simple regret is a natural and parameter-free performance criterion for pure exploration in multi-armed bandits. However, possibly due to the challenge of characterizing it, in the theory community, it has been largely overlooked in favor of studying the probability of selecting the best, or an ϵ-good arm. In this paper, we make significant progress on minimizing simple regret in both data-rich (T ≥ n) and data-poor regime (T ≤ n) where n is the number of arms, and T is the number of samples. At its heart is our improved instance-dependent analysis of the well-known Sequential Halving (SH) algorithm, where we bound the probability of returning an arm whose mean reward is not within ϵ from the best (i.e., not ϵ-good) for any choice of ϵ > 0, despite ϵ not being input to SH. Our bound not only leads to an optimal worst-case simple regret bound of n/T up to logarithmic factors but also essentially matches the instance-dependent lower bound for returning an ϵ-good arm reported by Katz-Samuels and Jamieson (2020). For the more challenging data-poor regime, we propose Bracketing SH (BSH) that enjoys the same improvement even without sampling each arm at least once. Our empirical study shows that BSH outperforms existing methods on real-world tasks.

Original languageEnglish (US)
Pages (from-to)42110-42158
Number of pages49
JournalProceedings of Machine Learning Research
Volume202
StatePublished - 2023
Event40th International Conference on Machine Learning, ICML 2023 - Honolulu, United States
Duration: Jul 23 2023Jul 29 2023

ASJC Scopus subject areas

  • Artificial Intelligence
  • Software
  • Control and Systems Engineering
  • Statistics and Probability

Fingerprint

Dive into the research topics of 'Revisiting Simple Regret: Fast Rates for Returning a Good Arm'. Together they form a unique fingerprint.

Cite this