NeuPL: Neural Population Learning

Learning in strategy games (e.g. StarCraft, poker) requires the discovery of diverse policies. This is often achieved by training new policies against existing ones, growing a population of policies that collectively becomes robust to exploitation. Unfortunately, this approach suffers from two fundamental issues in real-world games: a) under a finite compute budget, approximate best-response procedures often need to be truncated, so that merely "good" responses, rather than best responses, populate the population; b) in skill-based games, tabula rasa learning of best responses is wasteful and quickly becomes computationally intractable when facing increasingly skillful opponents. In this work, we propose Neural Population Learning (NeuPL) as a solution to both issues. NeuPL offers convergence guarantees to a population of best responses under mild conditions. By representing an entire population of policies within a single conditional model, NeuPL enables skill transfer across policies by construction. Empirically, we show the generality, improved performance and efficiency of NeuPL across several test domains. Most interestingly, we show that novel strategies become more accessible, not less, as the neural population expands. See for supplementary illustrations.
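To make the idea of "a single conditional model" concrete, the following is a minimal illustrative sketch and not the architecture used in this work. It assumes that each population member is selected by a conditioning vector (here a hypothetical one-hot index) that is concatenated with the observation, and that all members share one set of weights. The names `ConditionalPolicy`, `action_probs` and `one_hot` are invented for this illustration.

```python
import math
import random


class ConditionalPolicy:
    """One set of shared weights; `cond` selects which population member acts."""

    def __init__(self, obs_dim, cond_dim, num_actions, seed=0):
        rng = random.Random(seed)
        in_dim = obs_dim + cond_dim
        # Shared parameters: every population member reuses this one matrix.
        self.weights = [[rng.gauss(0.0, 0.5) for _ in range(in_dim)]
                        for _ in range(num_actions)]

    def action_probs(self, obs, cond):
        # Conditioning by concatenation is an assumption of this sketch.
        x = list(obs) + list(cond)
        logits = [sum(w * v for w, v in zip(row, x)) for row in self.weights]
        peak = max(logits)  # subtract the max for a numerically stable softmax
        exps = [math.exp(logit - peak) for logit in logits]
        total = sum(exps)
        return [e / total for e in exps]


def one_hot(index, size):
    """Hypothetical conditioning vector identifying one population member."""
    return [1.0 if i == index else 0.0 for i in range(size)]


policy = ConditionalPolicy(obs_dim=3, cond_dim=4, num_actions=2)
obs = [0.2, -1.0, 0.5]
probs_member_0 = policy.action_probs(obs, one_hot(0, 4))
probs_member_1 = policy.action_probs(obs, one_hot(1, 4))
```

The same observation yields different action distributions for different conditioning vectors, while any update to the shared weights affects every member at once. This sharing is what is meant by skill transfer "by construction" in the sketch; how the conditioning is actually defined and trained is not specified in this abstract.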

Authors' notes