Convolutional Networks
Content originally planned for today got split into two lectures
Recap from last time: computational graphs. The forward pass computes outputs; the backward pass computes gradients. At each node f, the downstream gradients are the product of the node's local gradients and the upstream gradient.
Problem: So far our classifiers don't respect the spatial structure of images! The input image (e.g. a (2, 2) patch of pixel values [[56, 231], [24, 2]]) is stretched into a flat vector of shape (4,) before the network x → W1 → h → W2 → s ever sees it. For CIFAR images: input 3072, hidden layer 100, output 10.
Solution: Define new computational nodes that operate directly on images!
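To make the flattening problem concrete, here is a minimal sketch of the fully-connected network x → W1 → h → W2 → s described above. All values are random placeholders, and the ReLU on the hidden layer is our assumption (the slide does not name the nonlinearity):

```python
import numpy as np

# The 3x32x32 input image is flattened to 3072 numbers, destroying its
# spatial structure before the network sees it.
x = np.random.randn(3, 32, 32).reshape(-1)   # (3072,)
W1 = np.random.randn(100, 3072)              # hidden layer: 100 units
W2 = np.random.randn(10, 100)                # output: 10 class scores

h = np.maximum(0, W1 @ x)                    # (100,) -- assuming ReLU
s = W2 @ h                                   # (10,)
print(h.shape, s.shape)  # (100,) (10,)
```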
Recap: Fully-Connected Layer. A 3x32x32 image (32 height, 32 width, 3 depth / channels) is flattened to a 3072-dim input; the weight matrix W is 10 x 3072, producing 10 outputs. Each output is 1 number: the result of taking a dot product between a row of W and the input (a 3072-dimensional dot product).
Justin Johnson, Lecture 7, September 24, 2019
Convolution Layer: convolve a 3x5x5 filter with a 3x32x32 image. Filters always extend the full depth of the input volume (here, 3 depth / channels).
Slide the 3x5x5 filter over all spatial positions of the 3x32x32 image. At each position we get 1 number: the result of taking a dot product between the filter and a small 3x5x5 chunk of the image (i.e. 3*5*5 = 75-dimensional dot product + bias). Sliding over all positions produces a 1x28x28 activation map; a second filter produces a second 1x28x28 map.
Convolution Layer with a bank of 6x3x5x5 filters: stack the activations to get a 6x28x28 output image! There is also a 6-dim bias vector. Two equivalent views of the output: 6 activation maps, each 1x28x28; or a 28x28 grid with a 6-dim vector at each point. With batches, a 2x3x32x32 batch of images gives a 2x6x28x28 batch of outputs.
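The single-image case above can be sketched as a loop of 75-dimensional dot products. This is an illustrative, unoptimized implementation with our own variable names (no padding, stride 1):

```python
import numpy as np

def conv2d_naive(x, w, b):
    """x: (Cin, H, W), w: (Cout, Cin, K, K), b: (Cout,)."""
    Cout, Cin, K, _ = w.shape
    H, W = x.shape[1], x.shape[2]
    Hout, Wout = H - K + 1, W - K + 1
    out = np.zeros((Cout, Hout, Wout))
    for f in range(Cout):
        for i in range(Hout):
            for j in range(Wout):
                # 3*5*5 = 75-dimensional dot product, plus bias
                out[f, i, j] = np.sum(x[:, i:i+K, j:j+K] * w[f]) + b[f]
    return out

x = np.random.randn(3, 32, 32)     # 3x32x32 image
w = np.random.randn(6, 3, 5, 5)    # six 3x5x5 filters
b = np.random.randn(6)             # 6-dim bias vector
out = conv2d_naive(x, w, b)
print(out.shape)  # (6, 28, 28)
```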
In general: the convolution layer takes a Cin x H x W input and a bank of Cout filters, each of shape Cin x Kh x Kw (weights: Cout x Cin x Kh x Kw), producing an output with Cout channels.
Stacking convolutions: Conv (W1: 6x3x5x5, b1: 6) followed by ReLU maps the input N x 3 x 32 x 32 to the first hidden layer N x 6 x 28 x 28, and each further conv shrinks the spatial size again (32 → 28 → 26 → ...).
What do convolutional filters learn? An MLP learns a bank of whole-image templates. First-layer conv filters are local image templates (often learning oriented edges and opposing colors). Example: AlexNet's first layer has 64 filters, each 3x11x11.
A closer look at spatial dimensions: a 7x7 input convolved with a 3x3 filter gives a 5x5 output. In general, with input size W and filter size K, the output is W – K + 1. Problem: feature maps "shrink" with each layer!
Solution: padding. Add zeros around the input. With input W, filter K, and padding P, the output is W – K + 1 + 2P. Very common: set P = (K – 1) / 2 to make the output the same size as the input!
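The output-size formula can be captured in a one-line helper. The stride parameter S is our addition (S = 1 reproduces W – K + 1 + 2P exactly):

```python
# Spatial output size of a convolution: (W - K + 2P) / S + 1.
def conv_output_size(W, K, P=0, S=1):
    return (W - K + 2 * P) // S + 1

print(conv_output_size(7, 3))        # 5   (7x7 input, 3x3 filter)
print(conv_output_size(32, 5, P=2))  # 32  ("same" padding: P = (K-1)/2)
```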
Receptive fields: each output element depends on a K x K region of the previous layer, and stacking conv layers grows the region of the original input that each output can "see". Be careful — "receptive field in the input" vs "receptive field in the previous layer". Hopefully clear from context!
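Quantitatively (a standard fact, stated here as a sketch): each conv layer of kernel size K adds K – 1 to the receptive field, so L stacked layers see 1 + L·(K – 1) input pixels in each dimension.

```python
# Receptive field (in the input) after L stacked KxK conv layers, stride 1.
def receptive_field(K, L):
    return 1 + L * (K - 1)

print(receptive_field(3, 1))  # 3
print(receptive_field(3, 3))  # 7
```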
Example: input volume 3 x 32 x 32, 10 5x5 filters with stride 1, pad 2. Output volume size: 10 x 32 x 32, since (32 + 2·2 – 5) / 1 + 1 = 32. Learnable parameters: each filter has 3·5·5 + 1 = 76 weights (including bias), so 10 · 76 = 760 total. Multiply-add operations: 10·32·32 = 10,240 output elements, each a 3·5·5 = 75-dimensional dot product, so 75 · 10,240 = 768,000.
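The worked numbers above can be checked mechanically (variable names are ours):

```python
# Quiz numbers: 3x32x32 input, ten 5x5 filters, stride 1, pad 2.
Cin, H, W = 3, 32, 32
Cout, K, S, P = 10, 5, 1, 2

Hout = (H - K + 2 * P) // S + 1           # 32 -> output is 10 x 32 x 32
params = Cout * (Cin * K * K + 1)         # +1 for each filter's bias
macs = Cout * Hout * Hout * Cin * K * K   # one 75-dim dot per output element

print(Hout, params, macs)  # 32 760 768000
```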
1x1 convolution: a 64x56x56 input convolved with 32 1x1 filters (each filter has size 1x1x64, and performs a 64-dimensional dot product) gives a 32x56x56 output. Stacking 1x1 conv layers gives an MLP operating on each input position. Lin et al, "Network in Network", ICLR 2014.
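One way to see the "MLP at each position" view: a 1x1 convolution is exactly a matrix multiply applied independently at every spatial location (an illustrative reshape, not an optimized implementation):

```python
import numpy as np

# 64 x 56 x 56 activations, 32 filters of size 1x1x64.
x = np.random.randn(64, 56, 56)
w = np.random.randn(32, 64)

cols = x.reshape(64, -1).T                 # (3136, 64): one row per position
out = (cols @ w.T).T.reshape(32, 56, 56)   # 64-dim dot product at each point
print(out.shape)  # (32, 56, 56)
```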
Other types of convolution. So far: 2D convolution — input Cin x H x W, weights Cout x Cin x K x K. 1D convolution — input Cin x W, weights Cout x Cin x K. 3D convolution — input Cin x H x W x D (a Cin-dim vector at each point in the volume), weights Cout x Cin x K x K x K.
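As a sanity check on the 1D weight shapes, here is a sketch of 1D convolution in the same naive style (no padding, stride 1; names are ours):

```python
import numpy as np

def conv1d_naive(x, w):
    """x: (Cin, W), w: (Cout, Cin, K) -> (Cout, W - K + 1)."""
    Cout, Cin, K = w.shape
    Wout = x.shape[1] - K + 1
    out = np.zeros((Cout, Wout))
    for f in range(Cout):
        for j in range(Wout):
            out[f, j] = np.sum(x[:, j:j+K] * w[f])
    return out

x = np.random.randn(3, 10)     # Cin=3, W=10
w = np.random.randn(4, 3, 5)   # Cout=4, K=5
out1d = conv1d_naive(x, w)
print(out1d.shape)  # (4, 6)
```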
PyTorch Convolution Layer: torch.nn.Conv2d (with Conv1d / Conv3d for the other variants).

Pooling Layers: another way to downsample inside the network. Hyperparameters: kernel size, stride, pooling function.
Example: max pooling with a 2x2 kernel and stride 2 takes the max over each non-overlapping 2x2 window of the input. Introduces invariance to small spatial shifts. No learnable parameters!
Pooling Summary
Input: C x H x W
Hyperparameters:
- Kernel size: K
- Stride: S
- Pooling function (max, avg)
Output: C x H' x W' where
- H' = (H – K) / S + 1
- W' = (W – K) / S + 1
Learnable parameters: none!
Common settings: max, K = 2, S = 2; max, K = 3, S = 2 (AlexNet)
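A worked max-pooling example (the input values are made up for illustration): K = 2, S = 2 on a 4x4 input, so each output element is the max over one non-overlapping 2x2 window.

```python
import numpy as np

x = np.array([[1, 1, 2, 4],
              [5, 6, 7, 8],
              [3, 2, 1, 0],
              [1, 2, 3, 4]])

# Split rows and cols into 2x2 blocks, then take the max within each block.
pooled = x.reshape(2, 2, 2, 2).max(axis=(1, 3))
print(pooled)  # [[6 8]
               #  [3 4]]
```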
Components of a Convolutional Network: convolution layers, pooling layers, fully-connected layers, and activation functions — plus, next, normalization. Classic example architecture: LeNet-5.
Normalization: Batch Normalization. Ioffe and Szegedy, "Batch normalization: Accelerating deep network training by reducing internal covariate shift", ICML 2015.

For an input X of shape N x D, compute the per-channel mean 𝞵 (shape D) and per-channel std 𝝈 (shape D) over the batch, then form the normalized x of shape N x D: x̂ = (x – 𝞵) / sqrt(𝝈² + ε). Learnable scale and shift parameters ɣ, β (each shape D) give the output y = ɣ x̂ + β, shape N x D. Learning ɣ = 𝝈, β = 𝞵 will recover the identity function! During testing, batchnorm uses fixed (running-average) statistics, so it becomes a linear operator — it can be fused with the previous fully-connected or conv layer.
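The training-time forward pass above can be sketched in a few lines (an illustrative implementation without running statistics; names are ours):

```python
import numpy as np

def batchnorm_forward(x, gamma, beta, eps=1e-5):
    """x: (N, D); gamma, beta: (D,). Returns the (N, D) output."""
    mu = x.mean(axis=0)                    # per-channel mean, shape (D,)
    var = x.var(axis=0)                    # per-channel variance, shape (D,)
    x_hat = (x - mu) / np.sqrt(var + eps)  # normalized x, shape (N, D)
    return gamma * x_hat + beta            # scale and shift

x = 10 * np.random.randn(64, 5) + 3        # badly scaled inputs
y = batchnorm_forward(x, gamma=np.ones(5), beta=np.zeros(5))
print(y.mean(axis=0).round(3), y.std(axis=0).round(3))  # ~0 and ~1
```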
Batch normalization for ConvNets: for x of shape N x D, normalize over the batch dimension; for x of shape N×C×H×W, normalize over N, H, W (𝞵, 𝝈 of shape 1 x C x 1 x 1). Batchnorm is usually inserted after fully-connected or convolutional layers and before the nonlinearity (... → FC → BN → tanh → FC → BN → tanh → ...), and substantially improves ImageNet training accuracy.

Batch Normalization vs Layer Normalization, for x: N × D:
- Batch norm: 𝞵,𝝈: 1 × D; ɣ,β: 1 × D; y = ɣ(x-𝞵)/𝝈+β
- Layer norm: 𝞵,𝝈: N × 1; ɣ,β: 1 × D; y = ɣ(x-𝞵)/𝝈+β
Layer norm behaves the same at training and test time. Ba, Kiros, and Hinton, "Layer Normalization", arXiv 2016.

For x: N×C×H×W there are further variants (e.g. instance and group normalization) that normalize over different subsets of the axes.
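For the N x D case, the only difference between the two is which axis the statistics are computed over, which the shapes make explicit:

```python
import numpy as np

x = np.random.randn(8, 16)  # N=8 examples, D=16 features

mu_bn = x.mean(axis=0, keepdims=True)  # batch norm: (1, D), shared over batch
mu_ln = x.mean(axis=1, keepdims=True)  # layer norm: (N, 1), per example
print(mu_bn.shape, mu_ln.shape)  # (1, 16) (8, 1)
```

Because layer-norm statistics are per example, they need no batch at test time — which is why it behaves identically at train and test.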
Summary — Components of a Convolutional Network: convolution layers, pooling layers, fully-connected layers, activation functions, and normalization. The convolution layers are typically the most computationally expensive part!