Offline Distillation for Robot Lifelong Learning with Imbalanced Experience

A robot may experience non-stationary environment dynamics throughout its lifetime: its body might deform due to wear and tear, and the surroundings it interacts with might change. Eventually, we want the robot to perform well in all of the environment variations it has encountered. At the same time, it should still be able to learn fast in a new environment. We investigate two challenges within this problem setting. First, existing off-policy algorithms struggle with the trade-off between being conservative with the old data and exploring effectively in the new environment. We propose the Offline Distillation Pipeline to break this trade-off. Second, training with the combined datasets from multiple environments might cause a significant performance drop. We provide evidence that the drop is due to the additional bootstrapping error caused by the imbalanced quality and size of the datasets, and we propose a simple fix. In the experiments, we demonstrate the challenges and evaluate the proposed solutions on a simulated bipedal robot walking task across various environment changes.
