Gaussian dropout as an information bottleneck layer

As models become more powerful, they can fit the data well in multiple qualitatively different ways. At the same time, we may want the model to satisfy requirements other than high predictive performance. One way to express such preferences is to control the information flow in the model with carefully placed information bottleneck layers, which limit the amount of information that passes through them by applying noise to their inputs. The most notable example of such a layer is the stochastic representation layer of the Deep Variational Information Bottleneck, whose use requires adding a variational upper bound on the mutual information between its inputs and outputs as a penalty to the loss function. We show that Gaussian dropout, which applies multiplicative Gaussian noise, achieves the same goal more simply, without requiring any additional terms in the objective. We evaluate the two approaches in the generative modelling setting by using them to encourage the use of latent variables in a VAE with an autoregressive decoder for modelling images.
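
To make the mechanism concrete, the following is a minimal sketch of multiplicative Gaussian noise applied to a vector of activations. It assumes the common parameterisation in which each input is multiplied by independent noise drawn from N(1, alpha). The function name `gaussian_dropout`, the parameter `alpha` and the `training` flag are illustrative choices and are not taken from the text above.

```python
import random


def gaussian_dropout(x, alpha, rng=None, training=True):
    """Multiply each element of x by independent noise drawn from N(1, alpha).

    alpha is the noise variance (an assumed parameterisation). With
    training=False or alpha == 0 the input is returned unchanged.
    """
    if not training or alpha == 0.0:
        return list(x)
    rng = rng or random.Random()
    std = alpha ** 0.5  # random.gauss takes a standard deviation, not a variance
    return [xi * rng.gauss(1.0, std) for xi in x]


# Example: the same activations under light and heavy noise.
activations = [0.5, -1.2, 3.0]
lightly_noised = gaussian_dropout(activations, alpha=0.01, rng=random.Random(0))
heavily_noised = gaussian_dropout(activations, alpha=4.0, rng=random.Random(0))
```

Because the noise is multiplicative, its scale grows with the magnitude of each input and zero inputs pass through unchanged. In this sketch, a larger `alpha` means a noisier output, which is the sense in which the layer can limit how much information about its input gets through.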