Introducing Symmetries to Black Box Meta Reinforcement Learning

Meta reinforcement learning (RL) attempts to discover new RL algorithms automatically from environment interaction. In so-called black-box approaches, the policy and the learning algorithm are jointly represented by a single neural network. These methods are very flexible, but they tend to underperform in terms of generalisation to new, unseen environments. In this paper, we explore the role of symmetries in meta-generalisation. We show that a recent successful meta RL approach that meta-learns an objective for backpropagation-based learning exhibits certain symmetries (specifically the reuse of the learning rule, and invariance to input and output permutations) that are not present in typical black-box meta RL systems. We hypothesise that these symmetries can play an important role in meta-generalisation. Building off recent work in black-box supervised meta learning (Kirsch and Schmidhuber 2020), we develop a black-box meta RL system that exhibits these same symmetries. We show through careful experimentation that incorporating these symmetries can lead to algorithms with a greater ability to generalise to unseen action & observation spaces, tasks, and environments.