Offline methods for reinforcement learning (RL) have the potential to help bridge the gap between reinforcement learning research and real-world applications. They make it possible to learn policies from offline datasets, thus overcoming concerns associated with online data collection in the real world, including cost, safety, and ethics. In this paper, we propose a benchmark called RL Unplugged to evaluate and compare offline RL methods. RL Unplugged includes data from a diverse range of domains including games (e.g., the Atari benchmark) and simulated motor control problems (e.g., the DM Control Suite). The datasets include domains that are partially or fully observable, use continuous or discrete actions, and have stochastic or deterministic dynamics. We propose detailed evaluation protocols for each domain in RL Unplugged and provide an extensive analysis of supervised learning and offline RL methods using these protocols. We will release data for all our tasks and open-source all algorithms presented in this paper. We hope that our suite of benchmarks will increase the reproducibility of experiments and make it possible to study challenging tasks with a limited computational budget, thus making RL research both more systematic and more accessible across the community. Moving forward, we view RL Unplugged as a living benchmark suite that will evolve and grow with datasets contributed by the research community and ourselves. Our project page is available on GitHub.
Many of the successes of RL rely heavily on repeated online interactions of an agent with an environment, a setting we call online RL. Despite its success in simulation, the uptake of RL for real-world applications has been limited. Power plants, robots, healthcare systems, and self-driving cars are expensive to run, and inappropriate controls can have dangerous consequences. Such systems are not easily compatible with the crucial idea of exploration in RL or with the data requirements of online RL algorithms. Nevertheless, most real-world systems produce large amounts of data as part of their normal operation, and the goal of offline RL is to learn a policy directly from that logged data without interacting with the environment.
Offline RL methods (e.g., Agarwal et al., 2020; Fujimoto et al., 2018) have shown promising results on well-known benchmark domains. However, non-standardized evaluation protocols, differing datasets, and a lack of baselines make algorithmic comparisons difficult. Moreover, some important properties of potential real-world application domains, such as partial observability, high-dimensional sensory streams (i.e., images), diverse action spaces, exploration problems, non-stationarity, and stochasticity, are underrepresented in the current offline RL literature.
We introduce a novel collection of task domains and associated datasets together with a clear evaluation protocol. We include widely used domains such as the DM Control Suite (Tassa et al., 2018) and Atari 2600 games (Bellemare et al., 2013), but also domains that are still challenging for strong online RL algorithms, such as real-world RL (RWRL) suite tasks (Dulac-Arnold et al., 2020) and DM Locomotion tasks (Heess et al., 2017; Merel et al., 2019a,b, 2020). By standardizing the environments, datasets, and evaluation protocols, we hope to make research in offline RL more reproducible and accessible. We call our suite of benchmarks “RL Unplugged” because offline RL methods can use it without any actors interacting with the environment. Our paper offers four main contributions: (i) a unified API for datasets, (ii) a varied set of environments, (iii) clear evaluation protocols for offline RL research, and (iv) reference performance baselines.