
Convolutional Neural Networks

Muhammad Atif Tahir

Some slides from Waterloo (CS) and Andrew Ng (Coursera)
CNN
⚫ We know it is good to learn a small model.
⚫ From this fully connected model, do we really need all the edges?
⚫ Can some of these be shared?
Consider learning an image:
⚫ Some patterns are much smaller than the whole image.
⚫ A small region can be represented with fewer parameters, e.g. a “beak” detector.
Same pattern appears in different places:
⚫ They can be compressed!
⚫ What about training a lot of such “small” detectors, each of which must “move around”?
  For example, an “upper-left beak” detector and a “middle beak” detector.
⚫ They can be compressed to the same parameters.
A convolutional layer
A CNN is a neural network with some convolutional layers (and some other layers).
A convolutional layer has a number of filters that perform the convolution operation.
Each filter is a small pattern detector, e.g. the beak detector.
Convolution
These are the network parameters to be learned.

6 x 6 image:
1 0 0 0 0 1
0 1 0 0 1 0
0 0 1 1 0 0
1 0 0 0 1 0
0 1 0 0 1 0
0 0 1 0 1 0

Filter 1:          Filter 2:
 1 -1 -1           -1  1 -1
-1  1 -1           -1  1 -1
-1 -1  1           -1  1 -1

Each filter detects a small pattern (3 x 3).
Convolution with Filter 1, stride = 1

Take the dot product of the filter with each 3 x 3 patch of the 6 x 6 image.
Sliding one pixel at a time, the first two outputs are 3 and -1.
Convolution with Filter 1, stride = 2

With stride 2 the filter moves two pixels at a time, so the first row of outputs is 3 and -3.
Convolution with Filter 1, stride = 1

Sliding Filter 1 over the whole 6 x 6 image gives a 4 x 4 output:
 3 -1 -3 -1
-3  1  0 -3
-3 -3  0  1
 3 -2 -2 -1
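The convolution above can be reproduced with a few lines of NumPy. This is a minimal sketch for illustration (not part of the original slides); conv2d is a hypothetical helper, and what CNNs call “convolution” is implemented here as a sliding dot product.

```python
import numpy as np

# The 6 x 6 image and the 3 x 3 "beak detector" Filter 1 from the slides
image = np.array([[1, 0, 0, 0, 0, 1],
                  [0, 1, 0, 0, 1, 0],
                  [0, 0, 1, 1, 0, 0],
                  [1, 0, 0, 0, 1, 0],
                  [0, 1, 0, 0, 1, 0],
                  [0, 0, 1, 0, 1, 0]])
filter1 = np.array([[ 1, -1, -1],
                    [-1,  1, -1],
                    [-1, -1,  1]])

def conv2d(img, kernel, stride=1):
    """Slide the kernel over the image (no padding) and take a dot
    product at every position -- the operation shown on the slides."""
    k = kernel.shape[0]
    out = (img.shape[0] - k) // stride + 1
    result = np.zeros((out, out), dtype=int)
    for i in range(out):
        for j in range(out):
            patch = img[i*stride:i*stride + k, j*stride:j*stride + k]
            result[i, j] = int(np.sum(patch * kernel))
    return result

print(conv2d(image, filter1, stride=1))  # 4 x 4 map; first row [ 3 -1 -3 -1]
print(conv2d(image, filter1, stride=2))  # 2 x 2 map; first row [ 3 -3]
```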
Convolution with Filter 2, stride = 1

Repeat this for each filter. Filter 2 gives another 4 x 4 feature map:
-1 -1 -1 -1
-1 -1 -2  1
-1 -1 -2  1
-1  0 -4  3

Two 4 x 4 images, forming a 2 x 4 x 4 matrix.
Color image: RGB, 3 channels

For a colour image, the 6 x 6 image becomes a stack of 3 channels,
and each filter likewise has 3 channels (3 x 3 x 3), one per input channel.
Convolution v.s. Fully Connected

Convolution: the 3 x 3 filters operate directly on local patches of the 6 x 6 image.

Fully connected: the 6 x 6 image is first flattened into a vector x1, x2, …, x36
and fed into a fully connected layer.
Treating convolution with Filter 1 as a sparsely connected layer:
the 6 x 6 image is flattened into inputs 1–36.
The first output (value 3) is connected only to inputs 1, 2, 3, 8, 9, 13, 14, 15, 16 of the
top-left 3 x 3 patch, not to all 36 inputs.

Fewer parameters! Each output connects to only 9 inputs, not fully connected.
The next output (value -1) is likewise connected to only 9 inputs, one patch to the right,
and it reuses the same 9 weights as the first output.

Fewer parameters.
Shared weights: even fewer parameters.
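As a quick back-of-the-envelope check (an aside, not on the slides): connecting all 36 inputs to all 16 outputs of a fully connected layer needs 36 × 16 weights, while the convolutional view needs only the 9 shared filter weights.

```python
# Fully connected: each of the 16 outputs is connected to all 36 inputs
fc_weights = 36 * 16      # 576 weights (biases ignored)

# Convolution with one 3 x 3 filter: each output sees only 9 inputs,
# and all 16 output positions reuse the same 9 weights
conv_weights = 3 * 3      # 9 shared weights

print(fc_weights, conv_weights)   # 576 vs 9
```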

The whole CNN

Input image -> Convolution -> Max Pooling -> Convolution -> Max Pooling -> …
(Convolution and Max Pooling can repeat many times)
-> Flattened -> Fully Connected Feedforward network -> cat, dog, …
Max Pooling

The two 4 x 4 feature maps produced by Filter 1 and Filter 2:

Filter 1:              Filter 2:
 3 -1 -3 -1            -1 -1 -1 -1
-3  1  0 -3            -1 -1 -2  1
-3 -3  0  1            -1 -1 -2  1
 3 -2 -2 -1            -1  0 -4  3
Why Pooling

⚫ Subsampling pixels will not change the object: a subsampled bird is still a bird.
⚫ We can subsample the pixels to make the image smaller,
  so fewer parameters are needed to characterize the image.
A CNN compresses a fully connected network in two ways:
⚫ reducing the number of connections
⚫ sharing weights on the edges
Max pooling further reduces the complexity.
Max Pooling

Starting from the 6 x 6 image, convolution gives two 4 x 4 feature maps.
2 x 2 max pooling keeps the largest value in each 2 x 2 block, giving a new
but smaller 2 x 2 image per filter:

Filter 1:   3 0        Filter 2:  -1 1
            3 1                    0 3

Each filter is a channel of the new image.
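A minimal sketch of the 2 x 2 max pooling step (illustration only; max_pool2x2 is a hypothetical helper):

```python
import numpy as np

def max_pool2x2(fmap):
    """Keep the largest value in each non-overlapping 2 x 2 block."""
    h, w = fmap.shape
    out = np.zeros((h // 2, w // 2), dtype=int)
    for i in range(0, h, 2):
        for j in range(0, w, 2):
            out[i // 2, j // 2] = fmap[i:i + 2, j:j + 2].max()
    return out

# The 4 x 4 feature map produced by Filter 1
fmap1 = np.array([[ 3, -1, -3, -1],
                  [-3,  1,  0, -3],
                  [-3, -3,  0,  1],
                  [ 3, -2, -2, -1]])
print(max_pool2x2(fmap1))   # [[3 0]
                            #  [3 1]]  -- the 2 x 2 map on the slide
```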
The whole CNN

Convolution -> Max Pooling produces a new image, smaller than the original.
The number of channels is the number of filters.
Convolution and Max Pooling can be repeated many times.
The whole CNN

Convolution -> Max Pooling -> (a new image) -> Convolution -> Max Pooling -> (a new image)
-> Flattened -> Fully Connected Feedforward network -> cat, dog, …
Flattening

The final 2 x 2 x 2 output (two pooled 2 x 2 maps) is flattened into an
8-dimensional vector and fed into a fully connected feedforward network.
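Flattening itself is just a reshape. A sketch under the assumption that the two pooled 2 x 2 maps are stacked channel-first (the exact ordering of the flattened vector depends on the layout convention):

```python
import numpy as np

# Two pooled 2 x 2 maps (one channel per filter), stacked channel-first
pooled = np.array([[[ 3, 0],
                    [ 3, 1]],     # Filter 1
                   [[-1, 1],
                    [ 0, 3]]])    # Filter 2

flat = pooled.flatten()           # length-8 vector for the fully connected network
print(flat)                       # [ 3  0  3  1 -1  1  0  3]
```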
Convolution Layer (Summary)

• A convolution layer takes an input feature map of dimension W × H × N and
  produces an output feature map of dimension Ŵ × Ĥ × M.
• Each layer is defined using the following parameters:
  • # Input channels (N)
  • # Output channels (M)
  • Kernel size
  • Padding
  • Stride
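For concreteness, here is how these parameters map onto a single Keras layer; this is a sketch only, and the values are placeholders chosen for illustration.

```python
from tensorflow.keras.layers import Conv2D

# One convolution layer defined by the parameters listed above
layer = Conv2D(filters=16,         # M, number of output channels
               kernel_size=(3, 3), # kernel size
               strides=(1, 1),     # stride
               padding='same')     # 'same' pads so that W_hat = W, H_hat = H
# N, the number of input channels, is taken from the input tensor itself.
```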
Convolution Layer (Summary)

Figure: a 5x5 input is convolved with a 3x3 kernel with stride=padding=1 to produce an output of size 5x5.
Figure: a 5x5 input is convolved with a 3x3 kernel with stride=2 and padding=1 to produce an output of size 3x3.
Parameters
• For one layer:
  • i, no. of input maps (or channels)
  • f, filter size (just the side length)
  • o, no. of output maps (or channels; this is also defined by how many filters are used)
• One filter is applied to every input map
• num_params = weights + biases = [i × (f×f) × o] + o
Parameters
Greyscale image with a 2×2 filter, output of 3 channels
• i = 1 (greyscale has only 1 channel)
• f = 2
• o = 3
• num_params = [i × (f×f) × o] + o = [1 × (2×2) × 3] + 3 = 15
RGB image with a 2×2 filter, output of 1 channel
• i = 3 (3 channels)
• f = 2
• o = 1
• num_params = [i × (f×f) × o] + o = [3 × (2×2) × 1] + 1 = 13
Image with 2 channels, with a 2×2 filter, and output of 3 channels
• i = 2 (2 channels)
• f = 2
• o = 3
• num_params = [i × (f×f) × o] + o = [2 × (2×2) × 3] + 3 = 27
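The three examples above can be checked directly with the formula; a small helper, named here only for illustration:

```python
def conv_params(i, f, o):
    """num_params = weights + biases = (i * f * f * o) + o"""
    return i * f * f * o + o

print(conv_params(i=1, f=2, o=3))   # 15  greyscale image, 2x2 filter, 3 output channels
print(conv_params(i=3, f=2, o=1))   # 13  RGB image, 2x2 filter, 1 output channel
print(conv_params(i=2, f=2, o=3))   # 27  2-channel image, 2x2 filter, 3 output channels
```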
Formula

• output = (input + 2p − k)/s + 1
• where k is the kernel size, p the padding and s the stride
• E.g. s = 1, p = 0, k = 3, input = 28
• output = (28 − 3)/1 + 1 = 26, i.e. 26x26

Problem: In AlexNet, the input image is of size 227x227x3. The first
convolutional layer has 96 kernels of size 11x11x3. The stride is 4 and the
padding is 0. Therefore, the size of the output image right after the first bank
of convolutional layers is (227 + 2×0 − 11)/4 + 1 = 55.

So, the output image is of size 55x55x96 (one channel for each kernel).
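The 26x26 example, the 5x5 figure examples, and the AlexNet problem all follow from the same formula (the helper name is illustrative):

```python
def conv_output_size(input_size, k, p=0, s=1):
    """output = (input + 2p - k) / s + 1"""
    return (input_size + 2 * p - k) // s + 1

print(conv_output_size(28, k=3, p=0, s=1))     # 26  -> 26 x 26
print(conv_output_size(5, k=3, p=1, s=1))      # 5   (figure example above)
print(conv_output_size(5, k=3, p=1, s=2))      # 3   (figure example above)
print(conv_output_size(227, k=11, p=0, s=4))   # 55  -> 55 x 55 x 96 with 96 kernels
```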
CNN in Keras
Only the network structure and the input format are modified (vector -> 3-D tensor).

Input_shape = (28, 28, 1): 28 x 28 pixels; 1 channel for black/white, 3 for RGB.

Convolution (there are 25 3x3 filters) -> Max Pooling -> Convolution -> Max Pooling
CNN in Keras
Only the network structure and the input format are modified (vector -> 3-D array).

Input: 1 x 28 x 28
Convolution -> 25 x 26 x 26   (How many parameters for each filter? 9)
Max Pooling -> 25 x 13 x 13
Convolution -> 50 x 11 x 11   (How many parameters for each filter? 225 = 25 x 9)
Max Pooling -> 50 x 5 x 5
CNN in Keras
Only the network structure and the input format are modified (vector -> 3-D array).

Input: 1 x 28 x 28
Convolution -> 25 x 26 x 26
Max Pooling -> 25 x 13 x 13
Convolution -> 50 x 11 x 11
Max Pooling -> 50 x 5 x 5
Flattened -> 1250
Fully connected feedforward network -> Output
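A minimal Keras sketch of the architecture on these slides. The hidden-layer width, the activations, and the 10-class softmax output are assumptions (the slides only say “fully connected feedforward network” and “Output”). Keras reports shapes channels-last, e.g. (26, 26, 25) rather than 25 x 26 x 26.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

model = Sequential([
    Conv2D(25, (3, 3), input_shape=(28, 28, 1)),  # -> 26 x 26 x 25
    MaxPooling2D((2, 2)),                         # -> 13 x 13 x 25
    Conv2D(50, (3, 3)),                           # -> 11 x 11 x 50
    MaxPooling2D((2, 2)),                         # -> 5 x 5 x 50
    Flatten(),                                    # -> 1250
    Dense(100, activation='relu'),                # assumed hidden layer size
    Dense(10, activation='softmax'),              # assumed 10-way output (e.g. digits)
])
model.summary()
```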
AlphaGo

The neural network takes the board as input and predicts the next move (19 x 19 positions).
The board is represented as a 19 x 19 matrix: black: 1, white: -1, none: 0.
A fully-connected feedforward network can be used, but a CNN performs much better.

AlphaGo’s policy network: the architecture on the original slide is quoted from their Nature article.
Note: AlphaGo does not use Max Pooling.
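As an illustration only (AlphaGo's actual input uses many more feature planes than this), a board state encoded the way the slide describes:

```python
import numpy as np

# The simple board encoding described on the slide:
# black stone = 1, white stone = -1, empty point = 0
board = np.zeros((19, 19), dtype=int)
board[3, 3] = 1      # a black stone (position chosen only for illustration)
board[15, 15] = -1   # a white stone

# Fed to a CNN as a 19 x 19 x 1 "image"; a fully connected network would
# instead see it flattened to a 361-dimensional vector.
cnn_input = board.reshape(19, 19, 1)
print(cnn_input.shape)   # (19, 19, 1)
```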
CNN in speech recognition

The input is a spectrogram (frequency vs. time), treated as an image by the CNN.
The filters move in the frequency direction.
CNN in text classification

Source of image:
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.703.6858&rep=rep1&type=pdf
Convolutional NN

Convolutional Neural Networks are an extension of the traditional Multi-layer Perceptron, based on 3 ideas:
1. Local receptive fields
2. Shared weights
3. Spatial / temporal sub-sampling

See the LeCun paper (1998) on text recognition:
http://yann.lecun.com/exdb/publis/pdf/lecun-01a.pdf
What is a Convolutional NN?
A CNN is a multi-layer NN architecture:
 Convolutional + Non-Linear Layer
 Sub-sampling Layer
 Convolutional + Non-Linear Layer
 Fully connected layers
⚫ Supervised training

The early layers perform Feature Extraction; the final layers perform Classification.
What is a Convolutional NN?

Figure: Convolution + NL -> 2x2 Sub-sampling -> Convolution + NL -> …
CNN success story: ILSVRC 2012
ImageNet database: 14 million labeled images, 20K categories
ILSVRC: Classification
ImageNet Classifications 2012
ILSVRC 2012: top rankers
http://www.image-net.org/challenges/LSVRC/2012/results.html

N  Top-5 error  Algorithm                                      Team                Authors
1  0.153        Deep Conv. Neural Network                      Univ. of Toronto    Krizhevsky et al.
2  0.262        Features + Fisher Vectors + Linear classifier  ISI                 Gunji et al.
3  0.270        Features + FV + SVM                            OXFORD_VGG          Simonyan et al.
4  0.271        SIFT + FV + PQ + SVM                           XRCE/INRIA          Perronin et al.
5  0.300        Color desc. + SVM                              Univ. of Amsterdam  van de Sande et al.
ImageNet 2013: top rankers
http://www.image-net.org/challenges/LSVRC/2013/results.php

N  Top-5 error  Algorithm                           Team                  Authors
1  0.117        Deep Convolutional Neural Network   Clarifai              Zeiler
2  0.129        Deep Convolutional Neural Networks  Nat. Univ. Singapore  Min Lin
3  0.135        Deep Convolutional Neural Networks  NYU                   Zeiler, Fergus
4  0.135        Deep Convolutional Neural Networks  —                     Andrew Howard
5  0.137        Deep Convolutional Neural Networks  Overfeat, NYU         Pierre Sermanet et al.

ImageNet Classifications 2013
Conv Net Topology
⚫ 5 convolutional layers
⚫ 3 fully connected layers + soft-max
⚫ 650K neurons, 60 million weights
Why ConvNet should be Deep?

Rob Fergus, NIPS 2013
Conv Nets: beyond Visual Classification

CNN applications
CNN is a big hammer: plenty of low-hanging fruits, you just need the right nail!
Conv NN: Detection
Sermanet, CVPR 2014

Conv NN: Scene parsing
Farabet, PAMI 2013

CNN: indoor semantic labeling (RGBD)
Farabet, 2013

Conv NN: Action Detection
Taylor, ECCV 2010