Fighting Gradients with Gradients: Dynamic Defenses against Adversarial Attacks

Adversarial attacks optimize against models to defeat defenses. We argue that models should fight back by optimizing their defenses against attacks at test time. Existing defenses are static: they stay the same once trained, even while attacks change. We propose a dynamic defense, defensive entropy minimization (dent), that adapts the model and input during testing by gradient optimization. Our dynamic defense adapts fully at test time, without altering training, which makes it compatible with existing models and defenses. Dent improves robustness to attack by 20+ points absolute for state-of-the-art adversarial training defenses against AutoAttack on CIFAR-10 at ε∞ = 8/255.
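The core mechanism can be illustrated with a minimal sketch: minimize the entropy of the model's prediction by gradient descent on the test input. Everything below is illustrative, not the paper's implementation — a tiny linear classifier stands in for a real network, and the function name `dent_adapt_input` is hypothetical.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def entropy(p):
    return float(-np.sum(p * np.log(p + 1e-12)))

def dent_adapt_input(W, x, lr=0.5, steps=50):
    """Illustrative test-time adaptation: gradient descent on the
    input to minimize prediction entropy (hypothetical helper)."""
    for _ in range(steps):
        p = softmax(W @ x)
        # dH/dlogits = -p * (log p + H), then chain rule
        # through the linear layer (logits = W @ x)
        g_logits = -p * (np.log(p + 1e-12) + entropy(p))
        x = x - lr * (W.T @ g_logits)
    return x

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 5))   # toy classifier: 3 classes, 5 features
x0 = rng.normal(size=5)       # a (possibly perturbed) test input
x1 = dent_adapt_input(W, x0)

h_before = entropy(softmax(W @ x0))
h_after = entropy(softmax(W @ x1))
```

After adaptation the prediction entropy drops, i.e. the model's output on the adapted input is more confident. The paper's full method also adapts model parameters (not just the input), which this sketch omits for brevity.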

Authors' notes