More Robustness with Random Data
Abstract

Recent work argues that robust training requires substantially larger datasets than those required for standard classification. On CIFAR-10 and CIFAR-100, this translates into a sizable robust-accuracy gap between models trained solely on data from the original training set and those trained with additional data extracted from the ``80 Million Tiny Images'' dataset. In this paper, we explore how state-of-the-art generative models can be leveraged to artificially increase the size of the original training set and improve adversarial robustness to $\ell_p$ norm-bounded perturbations. We demonstrate that this significantly reduces the robust-accuracy gap to models trained with additional real data. Surprisingly, we also show that even the addition of non-realistic random data (generated by Gaussian sampling) can improve robustness. We evaluate our approach on CIFAR-10 and CIFAR-100 against $\ell_\infty$ and $\ell_2$ norm-bounded perturbations of size $\epsilon = 8/255$ and $\epsilon = 128/255$, respectively, and show large absolute improvements in robust accuracy over previous state-of-the-art methods. Against $\ell_\infty$ norm-bounded perturbations of size $\epsilon = 8/255$, our model achieves 63.58\% and 33.49\% robust accuracy on CIFAR-10 and CIFAR-100, respectively (improving upon the state-of-the-art by +6.44\% and +3.29\%). Against $\ell_2$ norm-bounded perturbations of size $\epsilon = 128/255$, our model achieves 78.31\% robust accuracy on CIFAR-10 (+3.81\%). These results also surpass most prior work that uses external data.
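To make the two threat models concrete, the following is a minimal sketch of what it means for a perturbation to be norm-bounded under the $\ell_\infty$ and $\ell_2$ budgets quoted above. It is an illustration only, not the training or evaluation code behind the reported numbers: the function names (`project_linf`, `project_l2`, `is_admissible`) are hypothetical, the perturbation is represented as a flat list of floats, and pixel values are assumed to lie in $[0, 1]$.

```python
import math

# Perturbation budgets quoted in the abstract (pixel values assumed in [0, 1]).
EPS_LINF = 8 / 255
EPS_L2 = 128 / 255


def project_linf(delta, eps):
    """Clip every coordinate of delta into [-eps, eps]."""
    return [max(-eps, min(eps, d)) for d in delta]


def project_l2(delta, eps):
    """Rescale delta so that its Euclidean norm is at most eps."""
    norm = math.sqrt(sum(d * d for d in delta))
    if norm <= eps:
        return list(delta)
    scale = eps / norm
    return [d * scale for d in delta]


def is_admissible(delta, eps, p):
    """Check that a non-empty delta lies in the l_p ball of radius eps.

    p is either the string "inf" or the integer 2.
    """
    if p == "inf":
        size = max(abs(d) for d in delta)
    else:
        size = math.sqrt(sum(d * d for d in delta))
    return size <= eps + 1e-12  # small tolerance for floating-point rounding
```

A model counts as robust on an input only if its prediction stays correct for every admissible perturbation; projections like these are the usual way an iterative attack keeps its candidate perturbation inside the budget.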

