Efficient Deviation Types and Learning for Hindsight Rationality in Extensive-Form Games

Hindsight rationality is an approach to playing multi-agent, general-sum games that prescribes no-regret learning dynamics and describes jointly hindsight rational behavior with mediated equilibria. We explore the space of deviation types in extensive-form games (EFGs) for strong types that are efficient to compute in games with moderate lengths. We identify four types of deviations that subsume previously studied types within a broader class of partial sequence deviations. Combining time selection regret minimization and the counterfactual regret minimization (CFR) algorithm, we introduce the extensive-form regret minimization (EFR) algorithm that is hindsight rational for a general and natural class of deviations in extensive-form games. We provide instantiations and regret bounds for EFR that correspond to each partial sequence deviation type. In addition, we present a thorough empirical analysis of EFR’s performance with different deviation types in common benchmark games. As theory suggests, partial sequence deviation EFR instances tend to accumulate more reward than weaker versions of EFR.

Authors' notes