The Challenges of Exploration for Offline Reinforcement Learning

Offline Reinforcement Learning (ORL) enables us to separate the two distinct but connected processes that form the foundation of data-efficient reinforcement learning: collecting the most informative experience and optimally inferring knowledge from the given data. While the second question has recently gained much attention, there is less work on the first. In settings where data collection is costly, however, gathering the right data becomes particularly important. In this paper, we therefore examine the data collection process, focusing on the task-agnostic setting, which allows the cost of data collection to be further amortized through increased re-usability. We combine curiosity-based exploration with model-predictive control to understand the effects of online planning on task-agnostic exploration. With Explore2Offline, we propose a scheme for evaluating the quality of collected data by inferring policies from it with a standard offline RL algorithm and reward relabelling. We evaluate a wide variety of data collection strategies using this scheme and demonstrate their performance on various tasks. With this work, we aim to build intuition about the data needs of offline RL, ultimately leading to improved data efficiency in RL agents.
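The reward-relabelling step mentioned above can be illustrated with a minimal sketch: task-agnostic transitions carry no reward, so a task-specific reward function is applied post hoc before the data is handed to an offline RL algorithm. The function names and the toy reaching reward here are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def relabel_rewards(transitions, reward_fn):
    """Attach task-specific rewards to task-agnostic transitions.

    transitions: list of (obs, action, next_obs) tuples collected
    without any task reward; reward_fn maps (obs, action, next_obs)
    to a scalar reward for the downstream task.
    Returns standard (obs, action, reward, next_obs) tuples usable
    by an off-the-shelf offline RL algorithm.
    """
    return [(s, a, reward_fn(s, a, s2), s2) for (s, a, s2) in transitions]

# Toy example: a reaching task whose reward is the negative
# distance of the next state to a fixed goal.
goal = np.array([1.0, 0.0])
reach_reward = lambda s, a, s2: -float(np.linalg.norm(s2 - goal))

exploratory_data = [
    (np.zeros(2), np.array([0.5, 0.0]), np.array([0.5, 0.0])),
]
labeled_data = relabel_rewards(exploratory_data, reach_reward)
```

The same exploratory dataset can be relabelled with any number of reward functions, which is the re-usability that amortizes collection cost across tasks.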

(Nathan Lambert's internship project)

Authors' notes