AlphaStar: Grandmaster level in StarCraft II using multi-agent reinforcement learning

Many real-world applications require artificial agents to compete and coordinate with other agents in complex environments. As a stepping stone to this goal, the game of StarCraft has emerged by consensus as an important challenge domain for AI research, due to its iconic and enduring status among the hardest professional esports, and its relevance to the real world in terms of its raw complexity and multi-agent challenges. StarCraft has a combinatorial action space, a planning horizon that extends over tens of thousands of real-time decisions, and severe conditions of imperfect information. Further, StarCraft raises important game-theoretic challenges: it features a vast space of cyclic, non-transitive strategies and counter-strategies; discovering novel strategies is intractable with naive self-play exploration methods; and those strategies may not be effective when deployed in real-world play with humans. Over the course of a decade and numerous competitions, the best results were made possible by either hand-crafting major elements of the system, simplifying important aspects of the game, or giving systems clearly superhuman capabilities such as executing tens of thousands of actions per minute. Even with these modifications, no previous system has come close to rivalling the skill of top players. We chose to address the challenge of StarCraft using general-purpose learning methods that are in principle applicable to other complex domains: a multi-agent reinforcement learning algorithm that uses data from both human and agent games within a diverse League of continually adapting strategies and counter-strategies, each represented by deep neural networks. We evaluated our agent, AlphaStar, in the full game of StarCraft II, in a series of online games against human players. AlphaStar was rated at Grandmaster level for all three StarCraft races and above 99.8% of human players.
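
The point about cyclic, non-transitive strategies can be illustrated with a toy example. The sketch below is not AlphaStar's algorithm and does not involve StarCraft; it assumes rock-paper-scissors as a stand-in game, and the names PAYOFF, best_response, naive_self_play and exploitability are hypothetical, introduced only for illustration. It shows that an agent trained only to beat its most recent predecessor can cycle without making progress, while a mixture over a diverse population can be much harder to exploit.

```python
import numpy as np

# Toy stand-in game (an assumption for illustration): rock-paper-scissors.
# PAYOFF[i, j] is the row player's payoff when playing i against j:
# +1 for a win, -1 for a loss, 0 for a tie. Order: 0 rock, 1 paper, 2 scissors.
PAYOFF = np.array([[0, -1, 1],
                   [1, 0, -1],
                   [-1, 1, 0]])


def best_response(opponent_mix):
    """Pure strategy with the highest expected payoff against a mixed opponent."""
    return int(np.argmax(PAYOFF @ opponent_mix))


def naive_self_play(start, steps):
    """Each new agent is trained only to beat the most recent agent."""
    history = [start]
    for _ in range(steps):
        latest = np.eye(3)[history[-1]]  # one-hot mix for the latest agent
        history.append(best_response(latest))
    return history


def exploitability(mix):
    """Payoff a best-responding opponent earns against `mix` (game value is 0)."""
    return float(np.max(PAYOFF @ mix))


history = naive_self_play(start=0, steps=6)
print(history)  # each agent beats its predecessor, yet the sequence cycles

pure_rock = np.eye(3)[0]
population_mix = np.bincount(history[:3], minlength=3) / 3  # one full cycle
print(exploitability(pure_rock), exploitability(population_mix))
```

In this toy game every agent in the naive self-play sequence beats the one before it, yet the fourth agent is identical to the first. Any single agent in the cycle is fully exploitable, whereas the uniform mixture over the three agents is not, which is the intuition behind maintaining a diverse population rather than a single latest agent.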