StarCraft II Unplugged: Large Scale Offline Reinforcement Learning

StarCraft II remains one of the most challenging reinforcement learning environments today. Unlike other popular benchmark environments, StarCraft II is partially observable and stochastic, and mastery requires both strategic planning over long time horizons and real-time low-level execution. Blizzard has also made available millions of games played by human players, which makes StarCraft II an interesting offline RL benchmark. Unlike other offline RL benchmarks, the data comes from actual human players, as opposed to pre-trained agents. The use of a natural dataset better models real-world offline RL tasks and presents new challenges to current offline RL methods. We present results on this benchmark using offline RL methods such as behaviour cloning, V-trace actor-critic, and MuZero.
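Behaviour cloning, the simplest of the offline RL baselines named above, fits a policy to imitate the actions in the dataset via supervised learning. The following is a minimal illustrative sketch on a hypothetical toy dataset (a linear softmax policy trained by gradient descent); it is not the paper's actual architecture or data.

```python
# Behaviour cloning sketch: maximise the log-likelihood of the
# demonstrated actions under a parametric policy. Toy data only;
# names and dimensions here are hypothetical.
import numpy as np

rng = np.random.default_rng(0)

# Toy "replay" dataset: observations and the actions a player took.
n_samples, obs_dim, n_actions = 512, 8, 4
true_w = rng.normal(size=(obs_dim, n_actions))
observations = rng.normal(size=(n_samples, obs_dim))
# Demonstrated actions come from a noisy linear "expert".
actions = np.argmax(
    observations @ true_w + rng.normal(scale=0.1, size=(n_samples, n_actions)),
    axis=1,
)

def softmax(logits):
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(w, obs, acts):
    probs = softmax(obs @ w)
    return -np.mean(np.log(probs[np.arange(len(acts)), acts] + 1e-12))

# Train a linear softmax policy by gradient descent on the
# negative log-likelihood of the demonstrated actions.
w = np.zeros((obs_dim, n_actions))
lr = 0.5
for _ in range(200):
    probs = softmax(observations @ w)
    probs[np.arange(n_samples), actions] -= 1.0  # d(loss)/d(logits)
    grad = observations.T @ probs / n_samples
    w -= lr * grad

initial_loss = cross_entropy(np.zeros_like(w), observations, actions)
final_loss = cross_entropy(w, observations, actions)
```

V-trace actor-critic and MuZero go beyond this baseline by also exploiting the rewards and dynamics in the data, but cloning is the standard starting point when the dataset consists of human play.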

Authors' notes