Due to the phenomenon of “posterior collapse”, current latent variable generative models force a challenging design choice: either optimize the ELBO while handicapping the decoder’s capacity and expressivity, or change the loss to something that no longer directly minimizes the description length of the data. In this paper we propose an alternative that uses the most powerful generative models as decoders while optimizing the proper variational lower bound and ensuring that the latent variables preserve and encode useful information. The δ-VAEs proposed here achieve this by constraining the variational family for the posterior to maintain a minimum distance to the prior. For sequential latent variable models, our approach resembles the classic representation learning approach of slow feature analysis. We demonstrate the efficacy of our approach at modeling text on LM1B and at modeling images: learning representations, improving sample quality, and achieving state-of-the-art log-likelihood on CIFAR-10 and ImageNet 32 × 32.
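To make the core constraint concrete, here is a minimal sketch (not the paper's exact construction) of how a minimum-distance posterior family can be realized in the scalar Gaussian case: with a standard normal prior, fixing the posterior standard deviation `sigma` away from 1 guarantees KL(q||p) ≥ δ for every choice of mean, so the rate cannot collapse to zero. The names `kl_gauss_std_normal` and `delta_floor` are illustrative, not from the paper.

```python
import math
import random

def kl_gauss_std_normal(mu, sigma):
    # KL( N(mu, sigma^2) || N(0, 1) ) for scalar Gaussians
    return 0.5 * (sigma ** 2 + mu ** 2 - 1.0 - math.log(sigma ** 2))

def delta_floor(sigma):
    # Minimum of the KL over mu (attained at mu = 0): the committed rate delta
    return 0.5 * (sigma ** 2 - 1.0 - math.log(sigma ** 2))

# Constraining sigma to a fixed value != 1 enforces a positive KL floor,
# so the posterior can never collapse exactly onto the prior.
sigma = 0.5
delta = delta_floor(sigma)
for _ in range(1000):
    mu = random.uniform(-3.0, 3.0)
    assert kl_gauss_std_normal(mu, sigma) >= delta
```

In practice the encoder would still predict `mu` freely, while `sigma` (or both endpoints of an allowed interval for it) is restricted so the minimum achievable KL equals the chosen δ.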