How RL Agents Behave When Their Actions Are Modified

Reinforcement learning in complex environments may require supervision to prevent the agent from attempting dangerous actions. As a result, the actions executed in the environment may not match the actions specified by the algorithm. How does this affect learning? We present the Modified Action Markov Decision Process, an extension of the MDP model that allows actions to differ from the policy. We analyze the asymptotic behaviors of common reinforcement learning algorithms and show that they adapt to interventions in different ways: while some ignore them completely, others go to various lengths in trying to avoid action modifications that decrease reward. Agent designers can use these results to choose algorithms that lack an incentive to evade oversight or that have an incentive to avoid self-damage.
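As a rough illustration of the setting described above, the following minimal sketch shows one interaction step in which the action the policy proposes can differ from the action the environment executes. Every name here (`supervisor`, `environment`, `mamdp_step`, the two-state toy dynamics, and the action labels) is a hypothetical assumption made for illustration and is not taken from the paper.

```python
from typing import Callable, Tuple

State = int
Action = str


def supervisor(state: State, proposed: Action) -> Action:
    """Hypothetical action modification: block 'risky' in state 0."""
    if state == 0 and proposed == "risky":
        return "safe"
    return proposed


def environment(state: State, executed: Action) -> Tuple[State, float]:
    """Toy dynamics (assumed): outcome depends only on the executed action."""
    if executed == "risky":
        return 1, 2.0
    return 0, 1.0


def mamdp_step(
    state: State, policy: Callable[[State], Action]
) -> Tuple[Action, Action, State, float]:
    """One step in which the executed action may differ from the policy's."""
    proposed = policy(state)                # what the algorithm chose
    executed = supervisor(state, proposed)  # what actually runs
    next_state, reward = environment(state, executed)
    return proposed, executed, next_state, reward


def always_risky(state: State) -> Action:
    """A fixed policy that always proposes the 'risky' action."""
    return "risky"


modified = mamdp_step(0, always_risky)    # supervisor intervenes in state 0
unmodified = mamdp_step(1, always_risky)  # no intervention in state 1
```

In this toy setup, the step returns both the proposed and the executed action, so a learning rule could in principle update on either one. That choice is one plausible place where algorithms might respond differently to interventions.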

Authors' notes