Beyond Pick-and-Place: Tackling robotic stacking of diverse shapes with simulation-to-reality transfer and offline reinforcement learning

We study the problem of vision-based robotic manipulation in which inter-object interaction matters and involves complex contact dynamics. We do so by attempting to solve challenging stacking tasks that involve a diverse set of objects with complex geometries, which we carefully designed to require solutions beyond simple ``pick-and-place''. Our solution consists of a reinforcement learning (RL) approach combined with interactive imitation learning for simulation-to-reality transfer. Our approach efficiently handles a large number of distinct shape-stacking combinations in the real world and exhibits a diverse set of interesting behaviors. In a large experimental study, we investigate which choices matter for learning such general vision-based agents in simulation and what affects optimal transfer to the real robot. For this study we collect over 50,000 real-world episodes (over 250 hours of actual robot evaluation time). We then leverage data collected by such policies and improve upon them with offline RL.

Authors' notes