Hierarchical Bayesian Bandits

Meta-, multi-task, and federated learning can all be viewed as solving similar tasks, which are drawn i.i.d. from a prior distribution that captures task similarities. In this work, we study a bandit variant of this problem, where the task prior is unknown. We formulate this problem as learning to act in a hierarchical Bayesian bandit. Our setting subsumes many existing metaand multi-task bandit settings, and we analyze a natural hierarchical Thompson sampling algorithm (hierTS) that solves it. Our regret bounds hold for many instances of this problem, including when the tasks are solved sequentially or in parallel. They also reflect the structure of the problem, such that the regret is low when the width of the task prior is low. Our proofs rely on novel total variance decompositions, which can be applied to other graphical model structures. Finally, our theory is complemented by experiments, which confirm that the hierarchical structure is useful for knowledge sharing among the tasks. This confirms that hierarchical Bayesian bandits are a universal and statistically-efficient approach to learning to act under similar bandit tasks.

Authors' notes