V-MPO: On-Policy Maximum a Posteriori Policy Optimization for Discrete and Continuous Control
Francis Song,
Abbas Abdolmaleki,
Jost Tobias Springenberg,
Aidan Clark,
Hubert Soyer,
Jack Rae,
Seb Noury,
Arun Ahuja,
Siqi Liu,
Dhruva Tirumala,
Nicolas Heess,
Dan Belov,
Martin Riedmiller,
Matt Botvinick
arXiv
2019-09-26
Deep reinforcement learning