CoBERL: Contrastive BERT for Reinforcement Learning

The ability to detect and react to events that occur at different time scales is a central aspect of human intelligence. Unfortunately, current deep reinforcement learning agents have difficulty keeping track of long-term dependencies. To address this shortcoming, we propose Contrastive BERT for Reinforcement Learning (CoBERL), an agent that combines a novel architecture with a new unsupervised objective to obtain better representations and, ultimately, better performance. We introduce a Transformer-based architecture that combines the strengths of Transformer self-attention with those of long short-term memory (LSTM) networks. We also propose a novel unsupervised representation learning objective that explicitly enforces self-attention consistency. We extensively demonstrate improved performance of our proposed agent on a varied set of tasks, ranging from control and memory-probing tasks to Atari games.
