Continuous Neural Algorithmic Planners

Planning is an important aspect of successful agency, especially as tasks become more combinatorially challenging. Reinforcement learning applies this intuition through algorithms such as value iteration, which plan and obtain optimal policies when given the necessary information about the environment. Implicit planning eliminates the need for this privileged information by combining learned world models with model-free reinforcement learning. A recent implicit planner, XLVIN, reaps the benefits of modern representation learning while maintaining alignment to the value iteration algorithm; however, it only supports discrete action spaces, and is hence not trivially applicable to most tasks of real-world interest. We extend XLVIN to continuous action spaces by discretising the action space and evaluating several selective expansion policies. Our proposal, CNAP, demonstrates how neural algorithmic reasoning can make a measurable impact in higher-dimensional continuous control settings, such as MuJoCo, bringing gains in low-data settings and outperforming model-free baselines.
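
To make the two ingredients named above concrete, the following is a minimal sketch of classical value iteration run on a discretised continuous action space. It is not CNAP itself, which relies on learned models rather than a known tabular environment. The evenly spaced binning scheme, the five-state chain environment, and all function and variable names are assumptions made purely for illustration.

```python
import numpy as np


def discretise_action_space(low, high, bins_per_dim):
    """Return evenly spaced action values per dimension (assumed binning scheme)."""
    low = np.asarray(low, dtype=float)
    high = np.asarray(high, dtype=float)
    return [np.linspace(l, h, bins_per_dim) for l, h in zip(low, high)]


def value_iteration(P, R, gamma=0.9, tol=1e-8, max_iters=10_000):
    """Tabular value iteration.

    P: array of shape (A, S, S), P[a, s, t] = probability of s -> t under action a.
    R: array of shape (S, A), immediate reward for taking action a in state s.
    Returns the converged state values and a greedy policy (action indices).
    """
    num_states = P.shape[1]
    V = np.zeros(num_states)
    for _ in range(max_iters):
        # Bellman optimality backup: Q(s, a) = R(s, a) + gamma * sum_t P(t | s, a) V(t)
        Q = R + gamma * np.einsum("ast,t->sa", P, V)
        V_new = Q.max(axis=1)
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    # Recompute Q from the final values to extract the greedy policy.
    Q = R + gamma * np.einsum("ast,t->sa", P, V)
    return V, Q.argmax(axis=1)


# Hypothetical toy task: a 5-state chain with a 1-D continuous action in [-1, 1].
# The action is discretised into 3 bins (-1, 0, +1) that move left, stay, or move right.
# State 4 is an absorbing goal; entering it from another state yields reward 1.
num_states = 5
action_grid = discretise_action_space(low=[-1.0], high=[1.0], bins_per_dim=3)[0]
num_actions = len(action_grid)

P = np.zeros((num_actions, num_states, num_states))
R = np.zeros((num_states, num_actions))
for s in range(num_states):
    for a_idx, a in enumerate(action_grid):
        if s == num_states - 1:
            P[a_idx, s, s] = 1.0  # goal state is absorbing, no further reward
            continue
        t = int(np.clip(s + int(np.round(a)), 0, num_states - 1))
        P[a_idx, s, t] = 1.0
        if t == num_states - 1:
            R[s, a_idx] = 1.0

V, policy = value_iteration(P, R, gamma=0.9)
print("action grid:", action_grid)
print("state values:", V)
print("greedy action per state:", action_grid[policy])
```

In this sketch the number of discrete actions grows exponentially with the action dimension if every dimension is binned jointly, which illustrates why selectively expanding only some actions matters in higher-dimensional control.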
