Adaptive optimization methods that use preconditioned gradients, such as Adam, are crucial for achieving good performance on a variety of learning tasks. In this work, we identify that the benefits of adaptivity can diminish under differentially private training, primarily because the noise added to ensure privacy reduces the effectiveness of the preconditioner. To address this, we propose AdaDPS, an approach that uses \emph{non-sensitive side information} to precondition the gradients, enabling the effective use of adaptive methods in private settings. We formally show that using side information for preconditioning in AdaDPS substantially reduces the amount of noise needed to achieve similar privacy guarantees, thereby yielding better optimization performance. Empirically, AdaDPS achieves a 10% absolute improvement in average accuracy across a variety of tasks and models compared with strong baselines.
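To make the core idea concrete, the following is a minimal, hypothetical sketch of one private update step in which the gradient is preconditioned using side information \emph{before} clipping and noise addition, so the privacy noise is not amplified by a noisy, privately estimated preconditioner. The function name, hyperparameters, and the square-root preconditioner form are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def dp_preconditioned_step(w, grad, precond, lr=0.1, clip=1.0,
                           noise_mult=1.0, rng=None):
    """One illustrative DP update: `precond` is assumed to come from
    non-sensitive side information (e.g. public data statistics), so
    applying it before clipping/noising incurs no extra privacy cost."""
    rng = rng or np.random.default_rng(0)
    g = grad / (np.sqrt(precond) + 1e-8)          # side-info preconditioning
    g = g / max(1.0, np.linalg.norm(g) / clip)    # clip to norm at most `clip`
    g = g + rng.normal(0.0, noise_mult * clip, size=g.shape)  # Gaussian noise
    return w - lr * g
```

Because the preconditioner is computed from non-sensitive data, the noise scale here depends only on the clipping norm, in contrast to private adaptive methods whose preconditioner itself must be estimated from noisy gradient statistics.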

Authors' notes