Certifying Some Distributional Robustness with Principled Adversarial Training - ShortScience.org

A novel method for adversarially robust learning with theoretical guarantees under small perturbations. 1) Given the data distribution P_0, defines a neighborhood of it as the set of distributions that are \rho-close to P_0 in the Wasserstein metric with a predefined cost function c (e.g. L2); 2) Formulates robust learning as minimizing the worst-case expected loss over this neighborhood and proposes a Lagrangian relaxation of it; 3) Using this relaxation, provides a data-dependent upper bound on the worst-case loss and shows that finding the worst-case adversarial perturbation, which is NP-hard in general, reduces to maximizing a concave function when the maximum amount of perturbation \rho is small.
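To make the concavity claim concrete, here is a minimal sketch of the penalized inner problem sup_z { loss(\theta; z) - \gamma c(z, z_0) } that the Lagrangian relaxation produces, where a large penalty \gamma corresponds to a small \rho. It is an illustration under stated assumptions, not the paper's implementation: the linear logistic model, the squared-L2 cost, and all function names are hypothetical choices made for this example. For this particular toy loss, the penalized objective is strongly concave in z whenever \gamma > ||\theta||^2 / 8, so plain gradient ascent finds the worst-case point.

```python
import numpy as np


def logistic_loss(theta, z, y):
    """Logistic loss of a linear classifier on one example, y in {-1, +1}."""
    return np.log1p(np.exp(-y * (theta @ z)))


def loss_grad_z(theta, z, y):
    """Gradient of the logistic loss with respect to the input z."""
    s = 1.0 / (1.0 + np.exp(y * (theta @ z)))  # sigmoid(-y * theta.z)
    return -y * s * theta


def worst_case_perturbation(theta, z0, y, gamma, steps=200):
    """Gradient ascent on z -> loss(theta; z) - gamma * ||z - z0||^2.

    Assumes gamma > ||theta||^2 / 8, which makes this toy objective
    strongly concave in z, so the iteration converges to its maximizer.
    """
    lr = 1.0 / (2.0 * gamma)  # 1 / (smoothness constant of the objective)
    z = z0.copy()
    for _ in range(steps):
        grad = loss_grad_z(theta, z, y) - 2.0 * gamma * (z - z0)
        z = z + lr * grad
    return z


if __name__ == "__main__":
    theta = np.array([1.0, -2.0])   # hypothetical model parameters
    z0 = np.array([0.5, 0.3])       # hypothetical clean input
    z_adv = worst_case_perturbation(theta, z0, y=1.0, gamma=2.0)
    print("clean loss:      ", logistic_loss(theta, z0, 1.0))
    print("adversarial loss:", logistic_loss(theta, z_adv, 1.0))
    print("perturbation:    ", z_adv - z0)
```

In a robust training loop, the model parameters would then be updated using the loss evaluated at the perturbed point rather than at the clean input. When \gamma is too small for the objective to be concave, gradient ascent carries no such convergence guarantee.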