Robust Deep Learning
J. Zico Kolter
Outline
Introduction
Adversarial attacks as optimization
Adversarial robustness
Final thoughts
The AI breakthrough (some recent history)
Adversarial attacks
[Figure from Madry et al.]
[Figure from Lee and Kolter, 2019, https://arxiv.org/abs/1906.11897]
Why should we care?
…you probably don’t have an adversary changing inputs to your classifier at a
pixel level (or if you do, you have bigger problems)
1. Genuine security implications for deep networks (e.g., with physical attacks)
Adversarial attacks as optimization
Key insight: the same process that enabled us to learn the model parameters via
gradient descent also allows us to create an adversarial example via gradient
descent
$$\frac{\partial}{\partial \delta}\,\mathrm{Loss}(f_\theta(x+\delta),\, y)$$
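As a concrete illustration, here is a minimal PyTorch sketch (not the talk's code; `model`, `x`, and `y` are an assumed trained classifier and labeled example): the same autograd machinery that computes parameter gradients also gives the gradient with respect to the perturbation.

```python
import torch
import torch.nn as nn

def loss_grad_wrt_delta(model, x, y):
    """d Loss(f(x + delta), y) / d delta, evaluated at delta = 0."""
    delta = torch.zeros_like(x, requires_grad=True)
    loss = nn.CrossEntropyLoss()(model(x + delta), y)
    loss.backward()  # populates delta.grad via autograd
    return delta.grad.detach()
```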
Solving with projected gradient descent
Since we are trying to maximize the loss when creating an adversarial example,
we repeatedly move in the direction of the positive gradient
Since we also need to ensure that $\delta \in \Delta$, we project back into this set after each step, a process known as projected gradient descent (PGD):

$$\delta := \mathrm{Proj}_\Delta\!\left(\delta + \alpha \frac{\partial}{\partial \delta}\mathrm{Loss}(f_\theta(x+\delta),\, y)\right)$$
Example: for $\Delta = \{\delta : \|\delta\|_\infty \le \epsilon\}$ (called the $\ell_\infty$ ball), the projection operator just clips each coordinate to $[-\epsilon, \epsilon]$
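A hedged sketch of this update for the $\ell_\infty$ ball (PyTorch; the model, data, and hyperparameter values are illustrative assumptions, not the talk's exact code):

```python
import torch
import torch.nn as nn

def pgd_linf(model, x, y, eps=0.1, alpha=0.01, num_iter=40):
    """Maximize the loss over ||delta||_inf <= eps with projected gradient steps."""
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(num_iter):
        loss = nn.CrossEntropyLoss()(model(x + delta), y)
        loss.backward()
        # gradient ascent step on delta, then project (clip) back onto the ball
        delta.data = (delta + alpha * delta.grad.detach()).clamp(-eps, eps)
        delta.grad.zero_()
    return delta.detach()
```

(A common variant takes steps in the sign of the gradient rather than the raw gradient; either way, the projection is just the coordinate-wise clip.)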
The Fast Gradient Sign Method

The Fast Gradient Sign Method (FGSM) takes a single PGD step with step size $\alpha \to \infty$: projecting $\alpha \frac{\partial}{\partial \delta}\mathrm{Loss}(f_\theta(x+\delta), y)$ onto the $\ell_\infty$ ball clips every coordinate to $\pm\epsilon$, so this corresponds exactly to just taking a step in the signs of the gradient terms:

$$\delta = \epsilon \cdot \mathrm{sign}\!\left(\frac{\partial}{\partial \delta}\mathrm{Loss}(f_\theta(x+\delta),\, y)\right)$$

This creates weaker attacks than running full PGD, but is substantially faster.

[Figure: a single FGSM step from $\delta = 0$, projected by $P_\Delta$ onto the set $\Delta$.]
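The corresponding one-step attack, as a sketch under the same assumptions as the PGD code above:

```python
import torch
import torch.nn as nn

def fgsm(model, x, y, eps=0.1):
    """One PGD step with alpha -> infinity: keep only the gradient's signs."""
    delta = torch.zeros_like(x, requires_grad=True)
    loss = nn.CrossEntropyLoss()(model(x + delta), y)
    loss.backward()
    return eps * delta.grad.detach().sign()
```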
Illustration of adversarial examples
We will demonstrate adversarial attacks on the MNIST data set, using two different architectures
[Diagram: a 2-layer fully connected MLP, and a 6-layer ConvNet with convolutional features of size 32@28×28 and 64@14×14.]
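A hedged reconstruction of the two models in PyTorch; only the "2-layer MLP" label and the 32@28×28 / 64@14×14 conv shapes come from the slide, so the hidden sizes and kernel sizes here are illustrative guesses:

```python
import torch.nn as nn

# 2-layer fully connected MLP (hidden width is a guess)
mlp = nn.Sequential(
    nn.Flatten(), nn.Linear(28 * 28, 200), nn.ReLU(), nn.Linear(200, 10)
)

# 6-layer ConvNet; strided convs halve resolution (28x28 -> 14x14 -> 7x7)
convnet = nn.Sequential(
    nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),              # 32 @ 28x28
    nn.Conv2d(32, 32, 3, padding=1, stride=2), nn.ReLU(),   # 32 @ 14x14
    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),             # 64 @ 14x14
    nn.Conv2d(64, 64, 3, padding=1, stride=2), nn.ReLU(),   # 64 @ 7x7
    nn.Flatten(), nn.Linear(7 * 7 * 64, 100), nn.ReLU(), nn.Linear(100, 10),
)
```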
Illustrations of FGSM/PGD
[Bar chart: test error at ε = 0.1 under FGSM/PGD attacks for the two architectures; values shown include 96.4%, 92.6%, 74.3% (ConvNet, FGSM), and 41.7%.]
Adversarial robustness
[Figure: adversarially perturbed "pig" image]

Two broad approaches to building robust models:
1. Adversarial training: (approximately) solve the inner maximization during training
2. Certified defenses: provably upper bound the inner maximization [Wong and Kolter, 2018; Raghunathan et al., 2018; Mirman et al., 2018; Cohen et al., 2019]
Adversarial training
How do we optimize the robust training objective

$$\min_\theta \, \sum_{(x,y)} \max_{\delta \in \Delta} \mathrm{Loss}(f_\theta(x+\delta),\, y)\,?$$

We would like to solve it with gradient descent, but how do we compute the gradient of the objective with the max term inside?
Danskin’s Theorem
A fundamental result in optimization:
$$\frac{\partial}{\partial \theta} \max_{\delta \in \Delta} \mathrm{Loss}(f_\theta(x+\delta),\, y) \;=\; \frac{\partial}{\partial \theta}\mathrm{Loss}(f_\theta(x+\delta^\star),\, y), \quad \text{where } \delta^\star = \operatorname*{argmax}_{\delta \in \Delta} \mathrm{Loss}(f_\theta(x+\delta),\, y)$$

Seems "obvious," but it is a very subtle result; it means we can optimize through the max by just finding its maximizing value
Adversarial training
Repeat:
1. Select minibatch $B$
2. For each $(x, y) \in B$, compute an adversarial example $\delta^\star(x)$
3. Update parameters:
$$\theta := \theta - \frac{\alpha}{|B|} \sum_{(x,y) \in B} \frac{\partial}{\partial \theta}\mathrm{Loss}(f_\theta(x + \delta^\star(x)),\, y)$$

Common to also mix robust/standard updates (not done in our case)

[Bar chart: test error at ε = 0.1 for the ConvNet vs. the robust ConvNet under clean, FGSM, and PGD evaluation; values shown: 74.4%, 41.7%, 2.8%, 2.6%, 1.1%, 0.9%.]
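As code, one epoch of this loop might look like the following sketch (PyTorch, reusing the `pgd_linf` attack from earlier; `loader` and `opt` are an assumed data loader and optimizer). Per Danskin's theorem, we treat $\delta^\star(x)$ as fixed and only backpropagate through the outer loss:

```python
import torch.nn as nn

def epoch_adversarial(model, loader, opt, eps=0.1):
    for x, y in loader:                                 # 1. select minibatch
        delta_star = pgd_linf(model, x, y, eps=eps)     # 2. inner maximization
        loss = nn.CrossEntropyLoss()(model(x + delta_star), y)
        opt.zero_grad()
        loss.backward()                                 # 3. gradient at delta*
        opt.step()
```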
Evaluating robust models
Our model looks good, but we should be careful about declaring success

We need to evaluate against different attacks: PGD run for more iterations, with random restarts, etc.
Provable defenses
$$\max_{\delta \in \Delta} \mathrm{Loss}(f_\theta(x+\delta),\, y) \;\le\; \max_{\delta \in \Delta} \mathrm{Loss}(f^{\mathrm{rel}}_\theta(x+\delta),\, y) \;\le\; \mathrm{Loss}(f^{\mathrm{dual}}_\theta(x, \Delta),\, y)$$

[Figure: relaxations of the ReLU activation $\hat{z} = \max(z, 0)$ over pre-activation bounds $\ell \le z \le u$.]
2D Toy Example
Simple 2D toy problem, 2-100-100-100-2 MLP network, trained with Adam
(learning rate = 0.001, no hyperparameter tuning)
What causes adversarial examples?
Adversarial examples are caused (informally) by small regions of adversarial class
“jutting” into an otherwise “nice” decision region (see also, e.g., [Roth et al., 2019])
[Figure: a 2D decision region, with a small region of the incorrect class jutting into the correct-class region near a data point.]
Randomization as a defense?
We can “smooth” this decision region by adding Gaussian noise to the input and
picking the majority class of the classifier over this noise
This was proposed (in many different ways) as a heuristic defense, but [Lecuyer et al., 2018] and later [Li et al., 2018] demonstrated that it gives certified bounds; we simplify and tighten this analysis in [Cohen et al., 2019]
Visual intuition of randomized smoothing
To classify panda images, we classify many versions perturbed by random noise and take the majority vote
Note that this requires that our "base" classifier 𝑓 be able to classify noisy images well (in practice, this means we also need to train on these noisy images)
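A minimal sketch of the smoothed classifier $g$ (PyTorch; `model` is the base classifier $f$ mapping image batches to logits, and the noise level and sample count are illustrative, not the talk's values):

```python
import torch

def smoothed_predict(model, x, sigma=0.25, n=100):
    """g(x): majority vote of f over Gaussian perturbations of x (x is 1xCxHxW)."""
    with torch.no_grad():
        noise = sigma * torch.randn(n, *x.shape[1:])
        votes = model(x + noise).argmax(dim=1)   # class of each noisy copy
        return votes.bincount().argmax().item()  # majority class
```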
The randomized smoothing guarantee
Theorem (binary case):
• Given some input $x$, let $\hat{y} = g(x)$ be the prediction of the smoothed classifier, and let $p > 1/2$ be the associated probability of this class under the smoothing distribution:
$$p = \mathbf{P}_{\epsilon \sim \mathcal{N}(0, \sigma^2 I)}\left[f(x + \epsilon) = \hat{y}\right]$$
• Then $g$ is robust at $x$: $g(x + \delta) = \hat{y}$ for any perturbation with $\|\delta\|_2 \le R = \sigma \Phi^{-1}(p)$
Proof of certified robustness
Reasonable question: why can performance on random noise tell us anything
about performance under adversarial noise?
Proof of certified robustness (cont)

[Figure: worst-case decision boundaries for the base classifier $f(x)$ and the smoothed classifier $g(x)$, showing $x$, $x + \delta$, and the certified radius $R$.]

(Follows from the Neyman-Pearson lemma in hypothesis testing; see also [Li and Kuelbs, 1998], thanks to Ludwig Schmidt for pointing out the reference)

For a linear classifier, we can compute the ℓ2 distance to the worst-case boundary exactly:
$$R = \sigma \Phi^{-1}(p)$$
where $p$ is the probability of the majority class; this implies that any perturbation with $\|\delta\|_2 \le R$ cannot change the class label ∎
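The certified radius is easy to compute in code (a sketch; in practice $p$ must be a high-confidence lower bound estimated from samples, as done in [Cohen et al., 2019]):

```python
from scipy.stats import norm

def certified_radius(sigma, p):
    """R = sigma * Phi^{-1}(p); meaningful when p > 1/2."""
    return sigma * norm.ppf(p)

# e.g., sigma = 0.25 and p = 0.99 give R = 0.25 * 2.326 ≈ 0.58 in ell_2 norm
certified_radius(0.25, 0.99)
```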
Caveats (a.k.a. the fine print)
The procedure here only guarantees robustness for the smoothed classifier $g$, not for the underlying classifier $f$
Comparison to previous SOTA on CIFAR10
For identical networks, randomized smoothing mostly outperforms the previous SOTA for ℓ2 robustness, but it also scales to much larger networks (where it uniformly outperforms duality-based approaches)
Performance on ImageNet
Example: we can certify that the smoothed classifier has top-1 accuracy of 37% under any perturbation with $\|\delta\|_2 \le 1$ (in normalized pixels, i.e., RGB values in $[0,1]$)
Future and ongoing work
Extension to other perturbation norms besides ℓ2?
• Seems extremely challenging (possibly impossible under certain assumptions); e.g., we can't do better than naive $d^{1/2}$ scaling for the ℓ∞ norm
A strange property:
• Previous work on LP bounds was extremely specific to neural networks
• Smoothing work never uses the fact that the base classifier is a neural network
My best guess for a way forward: we need to use model information to extract properties of the base classifier beyond the single probability $p$, and use these to get better bounds
Robust artificial intelligence
Deep learning is making amazing strides, but we have a long way to go before we can build deep learning systems that achieve even "small" degrees of robustness/adaptability compared to what humans take for granted
Resources:
• http://zicokolter.com – Web page with all papers
• http://github.com/locuslab – Code associated with all papers
• http://adversarial-ml-tutorial.org – Tutorial/code on adversarial robustness
• http://locuslab.github.io – Group blog