Understanding and Preventing Capacity Loss in Reinforcement Learning

The reinforcement learning (RL) problem is rife with sources of nonstationarity that can destabilize or inhibit learning progress. We identify a key mechanism by which this occurs in agents using neural networks as function approximators: capacity loss, whereby networks trained on nonstationary target values lose their ability to quickly fit new target functions over time. We demonstrate that capacity loss occurs in a broad range of RL agents and environments, and provide concrete instances where this prevents agents from making learning progress in sparse-reward tasks. We then present a simple auxiliary task that mitigates this phenomenon by regularizing a subspace of features towards its value at initialization, improving performance over a state-of-the-art model-free algorithm in the Atari 2600 suite. Finally, we study how this auxiliary task affects different notions of capacity and evaluate other mechanisms by which it may improve performance.
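The auxiliary task described above, regularizing a subspace of features towards its value at initialization, can be sketched as an extra loss term added to the usual RL objective. The sketch below is a loose illustration, not the paper's exact method: the function names (`features`, `capacity_regularizer`), the toy ReLU feature extractor, and the hyperparameters `k` (subspace size) and `beta` (regularization weight) are all assumptions introduced here for clarity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical feature extractor; `w` stands in for trainable network weights.
def features(x, w):
    return np.maximum(x @ w, 0.0)  # ReLU features

x = rng.normal(size=(32, 8))                   # a batch of states
w_init = rng.normal(size=(8, 16))              # frozen copy of the weights at initialization
w = w_init + 0.05 * rng.normal(size=(8, 16))   # weights after some training updates

def capacity_regularizer(x, w, w_init, k=4, beta=0.1):
    """Illustrative penalty on the drift of the first k feature dimensions
    away from their values under the frozen initial network."""
    drift = features(x, w)[:, :k] - features(x, w_init)[:, :k]
    return beta * np.mean(drift ** 2)
```

In an agent, a term like `capacity_regularizer(batch, w, w_init)` would be added to the TD loss each update, so gradient descent trades off fitting new targets against keeping part of the representation close to its initialization.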

Authors' notes