Improved predictives for neural networks through linearization

In this paper we argue that in Bayesian deep learning, the frequently used generalized Gauss-Newton (GGN) approximation should be understood as a modification of the underlying probabilistic model and should be considered separately from further approximate inference techniques, such as the Laplace approximation or variational inference (VI). Applying the GGN approximation turns the Bayesian neural network (BNN) into a locally linearized generalized linear model or, equivalently, a generalized Gaussian process. Because we then use this linearized model for inference, we should also predict using this modified likelihood rather than the original BNN likelihood. This formulation extends previous results by Khan et al. (2019) and Foong et al. (2019) to general likelihoods and alleviates the underfitting behaviour observed by Ritter et al. (2018). We demonstrate our approach on several UCI classification datasets as well as CIFAR-10.
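The core idea of predicting with the linearized model can be sketched in a few lines: replace the network f(x; w) by its first-order Taylor expansion around the MAP weights, f_lin(x; w) = f(x; w_MAP) + J(x)(w - w_MAP), and use f_lin whenever weights are drawn from the approximate posterior. The toy network, finite-difference Jacobian, and weight dimensions below are illustrative assumptions, not the paper's code.

```python
import numpy as np

# Minimal sketch of the linearized predictive, assuming a toy
# 1-hidden-layer network f(x; w) with a scalar output.
# f_lin(x; w) = f(x; w_map) + J(x) @ (w - w_map),
# where J(x) is the Jacobian of f w.r.t. the weights at w_map.

rng = np.random.default_rng(0)

def unpack(w):
    W1 = w[:6].reshape(3, 2)   # hidden weights (3 units, 2 inputs)
    W2 = w[6:9]                # output weights (3 units -> 1 output)
    return W1, W2

def f(x, w):
    W1, W2 = unpack(w)
    return np.tanh(W1 @ x) @ W2

def jacobian(x, w, eps=1e-6):
    # Numerical Jacobian of f w.r.t. w via central differences
    # (autodiff would be used in practice).
    J = np.zeros_like(w)
    for i in range(w.size):
        dw = np.zeros_like(w)
        dw[i] = eps
        J[i] = (f(x, w + dw) - f(x, w - dw)) / (2 * eps)
    return J

w_map = rng.normal(size=9)     # stand-in for the MAP estimate
x = np.array([0.5, -1.0])

# Draw a weight perturbation, as when sampling from a Gaussian
# posterior, and predict with the linearized model instead of
# re-evaluating the original network at the sampled weights.
w_sample = w_map + 0.01 * rng.normal(size=9)
f_lin = f(x, w_map) + jacobian(x, w_map) @ (w_sample - w_map)
```

For small posterior perturbations the linearized prediction closely tracks the exact network output; the point of the paper's argument is that once the GGN approximation is in play, f_lin is the model actually being used for inference, so it should also be used at prediction time.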

Authors' notes