In standard neural networks the amount of computation used is directly proportional to the size of the inputs, rather than to the complexity of the problem being learnt. To overcome this limitation we introduce PonderNet, a new algorithm that learns to adapt the amount of computation based on the complexity of the problem at hand. PonderNet requires minimal changes to the network architecture, and learns end-to-end the number of computational steps needed to achieve an effective compromise between training prediction accuracy, computational cost and generalization. On a complex synthetic problem, PonderNet dramatically improves performance over previous state-of-the-art adaptive computation methods and also succeeds at extrapolation tests where traditional neural networks fail. Finally, we tested our method on a real-world question-answering dataset, where it matched the current state-of-the-art results while using less compute.

Authors' notes