Lec9 CNN 25jan18
Sudeshna Sarkar
Spring 2018
22 Jan 2018
LeNet-5 (LeCun, 1998)
Ranzato
Locally Connected Layer
Stationarity: statistics are similar at different locations.
Convolutional Layer
Convolution

A 3x3 kernel of weights

w1 w2 w3
w4 w5 w6
w7 w8 w9

slides over the grayscale image; at each location the dot product w^T x produces one entry of the feature map.

Output size: (N - K) / S + 1
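As a sketch (not from the slides), the sliding-window computation and the (N - K)/S + 1 output size can be written in a few lines of NumPy; the image and kernel values below are placeholders:

```python
import numpy as np

def conv2d(image, kernel, stride=1):
    """Slide the kernel over the image; each location's dot product
    w^T x gives one entry of the feature map.
    Output size per dimension: (N - K) / S + 1."""
    n, k, s = image.shape[0], kernel.shape[0], stride
    out = (n - k) // s + 1
    fmap = np.zeros((out, out))
    for i in range(out):
        for j in range(out):
            patch = image[i*s:i*s+k, j*s:j*s+k]   # K x K chunk of the image
            fmap[i, j] = np.sum(patch * kernel)    # dot product w^T x
    return fmap

image = np.arange(49.0).reshape(7, 7)   # toy 7x7 grayscale image
kernel = np.ones((3, 3))                # weights w1..w9 (all ones here)
print(conv2d(image, kernel).shape)      # (5, 5): (7 - 3)/1 + 1 = 5
```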
(Figure: an image and a translated copy of it, each passed through input layer, hidden layer, and output layer.)
Convolution Layer
A 32x32x3 image: 32 (height) x 32 (width) x 3 (depth).
Convolve a 5x5x3 filter with the 32x32x3 image: slide the filter over the image spatially, computing dot products. Filters always extend the full depth of the input volume.
Taking the dot product between the filter w and a small 5x5x3 chunk of the image (i.e. a 5*5*3 = 75-dimensional dot product plus a bias) gives one number: w^T x + b.

Sliding the 5x5x3 filter over all spatial locations of the 32x32x3 image produces a 28x28x1 activation map.
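A minimal NumPy sketch of this single-filter computation, with random placeholder values standing in for the image, the weights w, and the bias b:

```python
import numpy as np

image = np.random.rand(32, 32, 3)   # 32x32x3 input image (placeholder values)
w = np.random.rand(5, 5, 3)         # one 5x5x3 filter
b = 0.1                             # bias (arbitrary placeholder)

# At each spatial position, flatten the 5x5x3 chunk and the filter into
# 75-dimensional vectors and compute w^T x + b -> one number.
activation = np.zeros((28, 28))     # (32 - 5)/1 + 1 = 28
for i in range(28):
    for j in range(28):
        x = image[i:i+5, j:j+5, :]
        activation[i, j] = np.dot(w.ravel(), x.ravel()) + b

print(activation.shape)   # (28, 28)
```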
Six such 5x5x3 filters give six separate 28x28 activation maps; stacking them along depth yields a 28x28x6 output volume.

Convolution Layer

A ConvNet is a sequence of convolution layers, interspersed with activation functions, e.g.:

32x32x3 image -> CONV, ReLU (e.g. 6 filters of 5x5x3) -> 28x28x6 -> CONV, ReLU (e.g. 10 filters of 5x5x6) -> 24x24x10 -> ...
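The shape bookkeeping for this stack can be checked with a few lines of Python (a sketch; stride 1 and no padding assumed, as in the slides):

```python
# Track shapes through the stack: each CONV here uses stride 1 and no
# padding, so the spatial size shrinks by (filter_size - 1) per layer.
shapes = []
n, depth = 32, 3                          # 32x32x3 input image
for num_filters, f in [(6, 5), (10, 5)]:  # six 5x5x3, then ten 5x5x6 filters
    n = (n - f) // 1 + 1                  # (N - F)/stride + 1
    depth = num_filters                   # output depth = number of filters
    shapes.append((n, n, depth))
print(shapes)   # [(28, 28, 6), (24, 24, 10)]
```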
7x7 input (spatially), assume a 3x3 filter applied with stride 2 => 3x3 output!

What about a 3x3 filter applied with stride 3? Doesn't fit! We cannot apply a 3x3 filter to a 7x7 input with stride 3.

(Recall: output size = (N - F) / stride + 1. With N = 7, F = 3: stride 2 gives (7 - 3)/2 + 1 = 3; stride 3 gives (7 - 3)/3 + 1, which is not an integer.)
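A small helper (illustrative, not from the slides) makes the fit check explicit:

```python
def conv_output_size(n, f, stride):
    """(N - F) / stride + 1; returns None when the filter doesn't fit."""
    if (n - f) % stride != 0:
        return None
    return (n - f) // stride + 1

print(conv_output_size(7, 3, 2))   # 3 -> 3x3 output
print(conv_output_size(7, 3, 3))   # None: doesn't fit
```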
Pooling Layer
Pooling layer
- makes the representations smaller and more manageable
- operates over each activation map independently

MAX POOLING

Single depth slice:

1 1 2 4
5 6 7 8
3 2 1 0
1 2 3 4

max pool with 2x2 filters and stride 2:

6 8
3 4
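The slide's example can be reproduced in NumPy; the reshape trick below assumes non-overlapping 2x2 blocks (stride equal to the pool size):

```python
import numpy as np

# The slide's single depth slice
x = np.array([[1, 1, 2, 4],
              [5, 6, 7, 8],
              [3, 2, 1, 0],
              [1, 2, 3, 4]])

# Max pool with 2x2 filters and stride 2: split the 4x4 slice into
# non-overlapping 2x2 blocks and take the max of each block.
out = x.reshape(2, 2, 2, 2).max(axis=(1, 3))
print(out)
# [[6 8]
#  [3 4]]
```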
http://cs.stanford.edu/people/karpathy/convnetjs/demo/cifar10.html
Case Study: AlexNet
[Krizhevsky et al. 2012]

Trained on ImageNet. First CONV layer: 227x227 input, 11x11 filters at stride 4, so the output is (227 - 11)/4 + 1 = 55, i.e. 55x55 spatially.
VGGNet totals:
TOTAL memory: 24M activations * 4 bytes ~= 93 MB per image (forward pass only; roughly 2x for backward)
TOTAL params: 138M parameters
Inception module

Fun features, compared to AlexNet:
- 12x fewer parameters
- 2x more compute
- 6.67% top-5 error (vs. 16.4%)

At runtime it is faster than a VGGNet, even though it has 8x more layers.
(Figure, [He et al., 2015]: spatial dimension only 56x56!)
GoogLeNet vs State of the art