Module 2


Deep Learning – BCSE332L

Convolutional Neural Networks

Dr. R. Bhargavi
Professor
SCOPE
VIT University
Computer Vision - Applications
• Image Classification (figure labels: Malignant/Benign)
• Object Detection
• Style Transfer
Dr. R Bhargavi, VIT 2
Working with Images - Fully connected DNN
• A fully connected DNN/MLP takes only a flat feature vector (tabular data) as input.
• It does not work well with images because it relies heavily on features appearing at fixed pixel positions. Hence any positional variance (for example, a shifted object) can result in misclassification (example shown in the figure on the slide).
• Features are treated as independent of each other, so the spatial relationship between neighboring pixels is ignored.
• A traditional fully connected DNN has a huge number of learnable parameters.
• How many parameters are needed for images of size 1024 x 1024 x 3 with 2 hidden layers of size 1000?
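The parameter question can be worked out with a short script. This is a minimal sketch that assumes every hidden unit has a bias and leaves out the output layer, since the slide does not give its size; `dense_params` is an illustrative helper name.

```python
def dense_params(n_in, n_out):
    """Weights plus one bias per output unit of a fully connected layer."""
    return n_in * n_out + n_out


input_size = 1024 * 1024 * 3   # flattened 1024 x 1024 x 3 image
hidden_sizes = [1000, 1000]    # two hidden layers of size 1000

total = 0
n_in = input_size
for n_out in hidden_sizes:
    total += dense_params(n_in, n_out)
    n_in = n_out

print(input_size)  # 3145728 inputs
print(total)       # 3146730000 parameters, before any output layer
```

Almost all of the roughly 3.1 billion parameters sit in the first layer, which connects every pixel to every hidden unit.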

Working with Images – DNN (cont…)
[Figure: an input image is flattened into a vector before being fed to a fully connected DNN]


Working with Images - CNN
• A CNN performs feature extraction automatically.
• This allows us to feed images in directly, instead of extracting features manually.
• Convolutional layers are responsible for feature extraction.
• Convolutional layers take locality into account: each output depends only on a small neighborhood of pixels.
• Because the conv layers learn the representations themselves, the term representation learning is also used for deep neural networks.
Convolutional Neural Network - Architecture

[Figure: convolutional layers extract abstract features; FC layers use them for classification]


CNN – Architecture (cont…)

[Figure: a stack of Convolution, Pooling, Convolution, Pooling and three Fully Connected layers, with a "Trainable Layers" label]


Convolutional Layer
• The convolutional layer is the core building block of a convolutional network.
• A conv layer's parameters consist of a set of learnable filters.
• Local connectivity: each neuron is connected only to a small region of the input. This region is the receptive field of that neuron in the feature map.
• For a receptive field with inputs x1..x4 and filter weights w1..w4, the neuron computes

$z = b + \sum_{i} w_i x_i$
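The computation of a single neuron over its receptive field, z = b + Σ w_i x_i, can be sketched as follows. The input values, weights and bias are made up for illustration, and `neuron_output` is a hypothetical helper name.

```python
def neuron_output(patch, weights, bias):
    """z = b + sum_i w_i * x_i over one flattened receptive field."""
    assert len(patch) == len(weights)
    return bias + sum(w * x for w, x in zip(weights, patch))


x = [1.0, 2.0, 3.0, 4.0]    # x1..x4: the 2 x 2 receptive field, flattened
w = [0.5, -1.0, 0.0, 2.0]   # w1..w4: the filter weights
z = neuron_output(x, w, bias=1.0)
print(z)  # 7.5
```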
Convolutional Layer (cont…)
• Parameter sharing / weight sharing: within one conv layer, the same filter is used across the entire image.
• Rationale: if detecting a horizontal edge is important at some location in the image, it should intuitively be useful at some other location as well, due to the translationally invariant structure of images. There is therefore no need to relearn how to detect a horizontal edge at every one of the distinct locations in the conv layer output volume.
Convolution Operation

[Figure: convolution of an input tensor with a kernel, producing the feature map]

With no padding and a stride of 1, the output size is (nh – kh + 1) x (nw – kw + 1), where (nh x nw) is the size (height and width) of the input tensor and (kh x kw) is the size of the kernel.
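A minimal pure-Python sketch of the operation, assuming no padding and stride 1. Like most deep learning libraries, it slides the kernel without flipping it (strictly speaking a cross-correlation). The 3 x 3 input and 2 x 2 kernel are illustrative values, not taken from the slide.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image; no padding, stride 1."""
    nh, nw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = nh - kh + 1, nw - kw + 1
    out = [[0] * out_w for _ in range(out_h)]
    for i in range(out_h):
        for j in range(out_w):
            # Elementwise product of the kernel with the window at (i, j)
            out[i][j] = sum(image[i + a][j + b] * kernel[a][b]
                            for a in range(kh) for b in range(kw))
    return out


image = [[0, 1, 2],
         [3, 4, 5],
         [6, 7, 8]]
kernel = [[0, 1],
          [2, 3]]
feature_map = conv2d_valid(image, kernel)
print(feature_map)  # [[19, 25], [37, 43]], i.e. (3 - 2 + 1) x (3 - 2 + 1)
```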


Convolutions with Multiple Channels
1 input channel and multiple output channels
• Use multiple kernels.
• Each kernel results in one output channel.
• The same convolution operation is used for each of the output channels.
• Each kernel learns different parameters, corresponding to different filters.
Convolutions with Multiple Channels (cont…)
Multiple input channels (3 channels) and a single output channel
• The kernel has one slice per input channel (3 here); the per-channel results are summed to give a single output channel.
Convolutions with Multiple Channels (cont…)
Multiple input channels (3 channels) and multiple output channels
• Input: 3 channels
• Kernel1 to Kernel5: 3 channels each
• Output: 5 channels (one per kernel)
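The multi-channel case can be sketched in plain Python. The data layout is an assumption made for this example (`x` indexed as [channel][row][col], `kernels` as [output channel][input channel][row][col]), and the all-ones input and constant kernels are illustrative only.

```python
def conv2d_multi(x, kernels, biases):
    """x: [C_in][H][W]; kernels: [C_out][C_in][kh][kw]; biases: [C_out]."""
    c_in, h, w = len(x), len(x[0]), len(x[0][0])
    kh, kw = len(kernels[0][0]), len(kernels[0][0][0])
    out_h, out_w = h - kh + 1, w - kw + 1
    out = []
    for kernel, bias in zip(kernels, biases):
        assert len(kernel) == c_in  # one kernel slice per input channel
        # Sum over all input channels and kernel positions -> one output channel
        channel = [[bias + sum(x[c][i + a][j + b] * kernel[c][a][b]
                               for c in range(c_in)
                               for a in range(kh)
                               for b in range(kw))
                    for j in range(out_w)]
                   for i in range(out_h)]
        out.append(channel)
    return out


# 3-channel 4 x 4 input of ones; 5 kernels, each 3 channels of 3 x 3
x = [[[1] * 4 for _ in range(4)] for _ in range(3)]
kernels = [[[[k] * 3 for _ in range(3)] for _ in range(3)] for k in range(1, 6)]
y = conv2d_multi(x, kernels, biases=[0] * 5)
print(len(y), len(y[0]), len(y[0][0]))  # 5 2 2 -> 5 output channels of 2 x 2
```

Each output value here is 27 times the kernel's constant, because each kernel sums over 3 channels x 3 x 3 positions.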


Padding
• Without padding (aka valid convolution), each convolution operation reduces the size of the output.
• Border pixels (for example the corner ones) are used least, whereas pixels near the center are used far more often.
• If the input is of size n x n, the filter size is f x f and the padding size is p, then the resultant output size will be (n + 2p – f + 1) x (n + 2p – f + 1).
• If the output size is the same as the input size, it is called same padding.

Example: padding size = 1 (5 x 5 input padded to 7 x 7, 3 x 3 kernel, 5 x 5 output)

Padded input          Kernel          Output
0 0 0 0 0 0 0        -1  0  0         3  0  3  2  0
0 1 0 2 2 1 0        -1  1  0         4 -1  1  0 -2
0 2 1 1 2 1 0    *    0  1  0    =    2 -2  0  1  0
0 2 1 1 1 1 0                         2  1  0  1  0
0 0 1 1 2 2 0                         2  0 -2 -1 -2
0 2 2 1 1 1 0
0 0 0 0 0 0 0
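The padding arithmetic can be checked with a few helper functions. This is a sketch assuming stride 1 and square inputs and filters; the function names are illustrative, and `same_padding` assumes an odd filter size.

```python
def pad2d(image, p):
    """Surround a 2D list with p rows/columns of zeros on every side."""
    width = len(image[0]) + 2 * p
    padded = [[0] * width for _ in range(p)]
    for row in image:
        padded.append([0] * p + list(row) + [0] * p)
    padded += [[0] * width for _ in range(p)]
    return padded


def output_size(n, f, p):
    """Output side length for an n x n input, f x f filter, padding p, stride 1."""
    return n + 2 * p - f + 1


def same_padding(f):
    """Padding that keeps output size equal to input size (odd f, stride 1)."""
    assert f % 2 == 1
    return (f - 1) // 2


image = [[1, 0, 2, 2, 1],
         [2, 1, 1, 2, 1],
         [2, 1, 1, 1, 1],
         [0, 1, 1, 2, 2],
         [2, 2, 1, 1, 1]]
padded = pad2d(image, 1)
print(len(padded), len(padded[0]))  # 7 7
print(output_size(5, 3, 0))         # 3 (valid convolution shrinks the output)
print(output_size(5, 3, 1))         # 5 (same padding for a 3 x 3 filter)
```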


Stride
• If the input is of size n x n, the filter size is f x f, the padding size is p and the stride is s, then the resultant output size will be (⌊(n + 2p – f)/s⌋ + 1) x (⌊(n + 2p – f)/s⌋ + 1).

Example: stride = 2, padding size = 1 (5 x 5 input padded to 7 x 7, 3 x 3 kernel, 3 x 3 output)

Padded input          Kernel          Output
0 0 0 0 0 0 0        -1  0  0         3  3  0
0 1 0 2 2 1 0        -1  1  0         2  0  0
0 2 1 1 2 1 0    *    0  1  0    =    2 -2 -2
0 2 1 1 1 1 0
0 0 1 1 2 2 0
0 2 2 1 1 1 0
0 0 0 0 0 0 0
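The stride example can be reproduced with a small pure-Python sketch that combines zero padding and stride. The function name and argument layout are illustrative, and the kernel is applied without flipping, as on the slide.

```python
def conv2d(image, kernel, padding=0, stride=1):
    """2D convolution (no kernel flip) with zero padding and stride."""
    n_h, n_w = len(image), len(image[0])
    f_h, f_w = len(kernel), len(kernel[0])
    # Build the zero-padded input
    padded = [[0] * (n_w + 2 * padding) for _ in range(n_h + 2 * padding)]
    for i in range(n_h):
        for j in range(n_w):
            padded[i + padding][j + padding] = image[i][j]
    # floor((n + 2p - f) / s) + 1 along each axis
    out_h = (n_h + 2 * padding - f_h) // stride + 1
    out_w = (n_w + 2 * padding - f_w) // stride + 1
    return [[sum(padded[i * stride + a][j * stride + b] * kernel[a][b]
                 for a in range(f_h) for b in range(f_w))
             for j in range(out_w)]
            for i in range(out_h)]


image = [[1, 0, 2, 2, 1],
         [2, 1, 1, 2, 1],
         [2, 1, 1, 1, 1],
         [0, 1, 1, 2, 2],
         [2, 2, 1, 1, 1]]
kernel = [[-1, 0, 0],
          [-1, 1, 0],
          [0, 1, 0]]
out = conv2d(image, kernel, padding=1, stride=2)
print(out)  # [[3, 3, 0], [2, 0, 0], [2, -2, -2]]
```

With stride 1 and the same padding, the function returns a 5 x 5 output whose first row is 3 0 3 2 0.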


Inductive Biases
• Sparse connectivity – based on the assumption that neighboring pixels are related.
• Parameter sharing – based on the assumption that the same filters work in different parts of the image.

The above two assumptions are called inductive biases.

Inductive biases allow CNNs to learn more quickly and generalize better than fully connected NNs.
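One way to see the effect of the two assumptions is to count parameters for a layer that maps an n x n input to an (n – f + 1) x (n – f + 1) output. The sketch below compares a fully connected layer, a hypothetical "locally connected" layer (sparse connectivity but no weight sharing, introduced here only for comparison) and a conv layer with one filter. The sizes n = 28 and f = 5 are illustrative.

```python
def dense_layer_params(n, f):
    """Every output unit sees every input pixel, plus one bias per unit."""
    out_units = (n - f + 1) ** 2
    return n * n * out_units + out_units


def locally_connected_params(n, f):
    """Each output unit sees only an f x f patch, but has its own weights."""
    out_units = (n - f + 1) ** 2
    return f * f * out_units + out_units


def conv_layer_params(f):
    """One shared f x f filter plus one bias."""
    return f * f + 1


n, f = 28, 5
print(dense_layer_params(n, f))        # 452160
print(locally_connected_params(n, f))  # 14976
print(conv_layer_params(f))            # 26
```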
Pooling
• Used between the conv layers.
• Reduces the spatial size of the representation, which reduces the number of parameters and the amount of computation in the network.
• Helps control overfitting.
• Accepts a volume of size W1×H1×D1.
• Requires two hyperparameters:
  • the spatial extent F
  • the stride S
• Produces a volume of size W2×H2×D2 where:
  • W2 = ((W1−F)/S)+1
  • H2 = ((H1−F)/S)+1
  • D2 = D1
• Has no learnable parameters.
• Zero-padding the input is not commonly done for pooling layers.
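A minimal sketch of max pooling on a single channel with spatial extent F and stride S (each of the D1 channels would be pooled independently in the same way). The 4 x 4 input is an illustrative example.

```python
def max_pool2d(channel, f, s):
    """Max pooling with an f x f window and stride s, no padding."""
    h1, w1 = len(channel), len(channel[0])
    h2 = (h1 - f) // s + 1
    w2 = (w1 - f) // s + 1
    return [[max(channel[i * s + a][j * s + b]
                 for a in range(f) for b in range(f))
             for j in range(w2)]
            for i in range(h2)]


channel = [[1, 1, 2, 4],
           [5, 6, 7, 8],
           [3, 2, 1, 0],
           [1, 2, 3, 4]]
pooled = max_pool2d(channel, f=2, s=2)
print(pooled)  # [[6, 8], [3, 4]], i.e. ((4 - 2) / 2) + 1 = 2 per side
```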


CNN Architectures

Source: https://arxiv.org/pdf/1605.07678.pdf
LeNet
• LeNet, proposed by Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner in 1998, laid the groundwork for convolutional neural networks (CNNs) and their applications in handwritten digit recognition.
• LeNet was trained using stochastic gradient descent (SGD) with backpropagation.
• The network was trained on the MNIST dataset, comprising 60,000 training examples and 10,000 test examples.
• Data augmentation techniques such as translation, rotation, and scaling were employed to increase the diversity of training samples and improve generalization.
• LeNet achieved an accuracy of over 99% on the MNIST dataset.
LeNet-5 (cont…)
• Used sigmoid and tanh activations.
• Has approx. 60k learnable parameters.
• LeNet was used to read zip codes, digits, etc.

Layer settings shown in the figure:
• Conv: 6 kernels, 5 x 5, stride = 1
• Avg pool: 2 x 2, stride = 2
• Conv: 16 kernels, 5 x 5, stride = 1
• Avg pool: 2 x 2, stride = 2


Source: http://vision.stanford.edu/cs598_spring07/papers/Lecun98.pdf
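The layer settings can be traced to check the feature-map sizes and the "approx. 60k" figure. This is a sketch under stated assumptions: a 32 x 32 grayscale input and fully connected layers of 120, 84 and 10 units (values from the common description of LeNet-5, not from the slide), every filter connected to all input channels, and pooling layers counted as parameter-free. The original network differs in some of these details, so the total is only approximate.

```python
def conv_out(n, f, s=1, p=0):
    """Output side length of a conv or pooling layer."""
    return (n + 2 * p - f) // s + 1


def conv_params(f, c_in, c_out):
    """c_out filters of size f x f x c_in, each with one bias."""
    return (f * f * c_in + 1) * c_out


def dense_params(n_in, n_out):
    return (n_in + 1) * n_out


size, channels, params = 32, 1, 0   # assumed 32 x 32 x 1 input
# (kind, kernel size, stride, output channels)
layers = [("conv", 5, 1, 6), ("pool", 2, 2, 6),
          ("conv", 5, 1, 16), ("pool", 2, 2, 16)]
shapes = []
for kind, f, s, c_out in layers:
    if kind == "conv":
        params += conv_params(f, channels, c_out)
    size, channels = conv_out(size, f, s), c_out
    shapes.append((size, size, channels))

n_in = size * size * channels       # flattened input to the FC layers
for n_out in (120, 84, 10):         # assumed FC layer sizes
    params += dense_params(n_in, n_out)
    n_in = n_out

print(shapes)  # [(28, 28, 6), (14, 14, 6), (10, 10, 16), (5, 5, 16)]
print(params)  # 61706
```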
AlexNet
• This deep convolutional neural network was trained to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into 1000 different classes.
• Test data performance: achieved top-1 and top-5 error rates of 37.5% and 17.0%.
• In the ILSVRC-2012 competition, a variant of this model achieved a winning top-5 test error rate of 15.3%.
Source: https://proceedings.neurips.cc/paper_files/paper/2012/file/c399862d3b9d6b76c8436e924a68c45b-Paper.pdf
AlexNet
• AlexNet consists of eight learned layers:
  • five convolutional layers, some of which are followed by max-pooling layers, and
  • three fully connected layers.
• Rectified Linear Units (ReLU) were used as activation functions, providing faster convergence and alleviating the vanishing gradient problem.
• The neural network has 60 million parameters and 650,000 neurons.
• Local Response Normalization (LRN) was introduced to normalize activations within local regions of the feature maps.
• LRN operates on local groups of neurons: each activation is normalized by the activity of neurons at the same spatial position in adjacent feature channels.


AlexNet - Training
• LRN is done using the formula

$b^{i}_{x,y} = a^{i}_{x,y} \Big/ \Big( k + \alpha \sum_{j=\max(0,\, i-n/2)}^{\min(N-1,\, i+n/2)} \big(a^{j}_{x,y}\big)^{2} \Big)^{\beta}$

• $a^{i}_{x,y}$ is the activity of a neuron computed by applying kernel i at position (x, y) and then applying ReLU; $b^{i}_{x,y}$ is the response-normalized activity.
• n - the number of “adjacent” kernel maps at the same spatial position over which the sum runs.
• N - the total number of kernels in the layer.
• The constants k, n, α, and β are hyper-parameters with values k = 2, n = 5, α = 10^−4, and β = 0.75.
• AlexNet was trained using stochastic gradient descent (SGD) with momentum.
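A minimal sketch of LRN at a single spatial position, assuming the form from the AlexNet paper: each activation a_i is divided by (k + α Σ_j a_j²)^β, where j runs over up to n adjacent kernel maps centered on i. The function name and the list-based layout (one activation per kernel map) are illustrative.

```python
def local_response_norm(a, k=2.0, n=5, alpha=1e-4, beta=0.75):
    """a: activations at one (x, y) position, one value per kernel map."""
    N = len(a)
    half = n // 2
    b = []
    for i in range(N):
        # Neighboring kernel maps, clipped at the first and last map
        lo, hi = max(0, i - half), min(N - 1, i + half)
        sq_sum = sum(a[j] ** 2 for j in range(lo, hi + 1))
        b.append(a[i] / (k + alpha * sq_sum) ** beta)
    return b


activations = [0.0, 1.0, 3.0, 0.5, 2.0, 0.0, 4.0]
normalized = local_response_norm(activations)
print(len(normalized))  # 7: one normalized value per kernel map
```

With the default constants the denominator is always greater than 1, so every positive activation is scaled down, and more strongly when its neighbors are large.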


AlexNet - Training (cont…)
• Overfitting reduction:
  • Data augmentation
  • Dropout
• Data augmentation techniques such as cropping, flipping, and color jittering were employed to increase the diversity of training samples.
• The network was trained on two NVIDIA GTX 580 GPUs, marking one of the earliest instances of utilizing GPU acceleration for deep learning.
AlexNet - Architecture
Layer                                  Output size
Input                                  227 x 227 x 3
Conv 11 x 11, S = 4, ReLU              55 x 55 x 96
Maxpool 3 x 3, S = 2                   27 x 27 x 96
Conv 5 x 5, S = 1, p = same, ReLU      27 x 27 x 256
Maxpool 3 x 3, S = 2                   13 x 13 x 256
Conv 3 x 3, S = 1, p = same, ReLU      13 x 13 x 384
Conv 3 x 3, S = 1, p = same, ReLU      13 x 13 x 384
Conv 3 x 3, S = 1, p = same, ReLU      13 x 13 x 256
Maxpool 3 x 3, S = 2                   6 x 6 x 256
FC                                     4096
FC                                     4096
FC                                     1000
SoftMax
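The feature-map sizes in the AlexNet diagram can be checked with the output-size formula ⌊(n + 2p – f)/s⌋ + 1. This sketch assumes that "same" padding means p = (f – 1)/2 and that the conv and pooling layers appear in the order shown in the diagram. The tuple layout is illustrative.

```python
def out_size(n, f, s, p=0):
    """Output side length: floor((n + 2p - f) / s) + 1."""
    return (n + 2 * p - f) // s + 1


# (kind, filter size, stride, padding, number of filters)
layers = [
    ("conv", 11, 4, 0, 96),
    ("pool", 3, 2, 0, None),
    ("conv", 5, 1, 2, 256),    # same padding: p = (5 - 1) / 2
    ("pool", 3, 2, 0, None),
    ("conv", 3, 1, 1, 384),    # same padding: p = (3 - 1) / 2
    ("conv", 3, 1, 1, 384),
    ("conv", 3, 1, 1, 256),
    ("pool", 3, 2, 0, None),
]

size, depth = 227, 3
shapes = [(size, size, depth)]
for kind, f, s, p, filters in layers:
    size = out_size(size, f, s, p)
    if kind == "conv":
        depth = filters        # pooling keeps the depth unchanged
    shapes.append((size, size, depth))

flattened = size * size * depth
for shape in shapes:
    print(shape)
print(flattened)  # 9216 inputs to the first FC layer
```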
How to compute the number of parameters in a CNN
What will be the output size of the following network? How many learnable parameters exist? No padding is used and stride = 1.

Input: 10 x 10 x 1 (grayscale image) → Conv 3 x 3 x 1 (3 x 3 kernel, 1 filter)


Number of parameters in CNN (cont…)
Input: 10 x 10 x 1 (grayscale image) → Conv 3 x 3 x 1 (3 x 3 kernel, 1 filter)

Output size = (10 – 3 + 1, 10 – 3 + 1, 1) = 8, 8, 1
Parameters = (3 x 3 x 1) + 1 = 10
Number of parameters in CNN (cont…)
What will be the output size of the following network? How many learnable parameters exist? No padding is used and stride = 1.

Input: 10 x 10 x 1 (grayscale image) → Conv 3 x 3 x 5 (3 x 3 kernel, 5 filters) → Conv 3 x 3 x 2 (3 x 3 kernel, 2 filters)


Number of parameters in CNN (cont…)
Input: 10 x 10 x 1 (grayscale image) → Conv 3 x 3 x 5 → Conv 3 x 3 x 2

After the first conv layer, output size = (10 – 3 + 1, 10 – 3 + 1, 5) = 8, 8, 5
Parameters: each filter has (3 x 3 x 1) + 1 = 10; for 5 filters = 50

After the second conv layer, output size = (8 – 3 + 1, 8 – 3 + 1, 2) = 6, 6, 2
Parameters: each filter has (3 x 3 x 5) + 1 = 46; for 2 filters = 92
Total parameters = 50 + 92 = 142
Number of parameters in CNN (cont…)
What will be the output size of the following network? How many learnable parameters exist? No padding is used and stride = 1.

Input: 100 x 100 x 3 (color image) → Conv 3 x 3 x 8 (3 x 3 kernel, 8 filters) → Conv 3 x 3 x 1 (3 x 3 kernel, 1 filter)


Number of parameters in CNN (cont…)
Input: 100 x 100 x 3 (color image) → Conv 3 x 3 x 8 → Conv 3 x 3 x 1

After the first conv layer, output size = (100 – 3 + 1, 100 – 3 + 1, 8) = 98, 98, 8
Parameters: each filter has (3 x 3 x 3) + 1 = 28; for 8 filters = 224

After the second conv layer, output size = (98 – 3 + 1, 98 – 3 + 1, 1) = 96, 96, 1
Parameters: each filter has (3 x 3 x 8) + 1 = 73; only one filter = 73
Total parameters = 224 + 73 = 297
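The worked examples follow one rule: each filter has (f x f x input channels) weights plus one bias, and the number of filters becomes the channel count of the next layer. A minimal sketch, assuming square kernels, no padding and stride 1; `trace_convnet` is an illustrative name.

```python
def trace_convnet(input_shape, conv_layers):
    """input_shape: (h, w, c); conv_layers: list of (kernel size f, filters).

    Returns the final output shape and the total learnable parameters.
    """
    h, w, c = input_shape
    total = 0
    for f, filters in conv_layers:
        total += (f * f * c + 1) * filters       # weights + one bias per filter
        h, w, c = h - f + 1, w - f + 1, filters  # no padding, stride 1
    return (h, w, c), total


print(trace_convnet((10, 10, 1), [(3, 1)]))            # ((8, 8, 1), 10)
print(trace_convnet((10, 10, 1), [(3, 5), (3, 2)]))    # ((6, 6, 2), 142)
print(trace_convnet((100, 100, 3), [(3, 8), (3, 1)]))  # ((96, 96, 1), 297)
```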
Number of parameters in CNN (cont…)
What will be the output size of the following network? How many learnable parameters exist? No padding is used and stride = 1.

Input: (100), 5 → Conv (3), 8 → Conv (3), 1
(a 1D input of length 100 with 5 channels; a conv layer with kernel size 3 and 8 filters, then a conv layer with kernel size 3 and 1 filter)


Number of parameters in CNN (cont…)
Input: (100), 5 → Conv (3), 8 → Conv (3), 1

After the first conv layer, output size = (100 – 3 + 1, 8) = 98, 8
Parameters: each filter has (3 x 5) + 1 = 16; for 8 filters = 128

After the second conv layer, output size = (98 – 3 + 1, 1) = 96, 1
Parameters: each filter has (3 x 8) + 1 = 25; only one filter = 25
Total parameters = 128 + 25 = 153
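For a 1D input the same counting applies, with a kernel that spans k positions across all input channels. A minimal sketch assuming no padding and stride 1; `trace_conv1d` is an illustrative name.

```python
def trace_conv1d(length, channels, conv_layers):
    """conv_layers: list of (kernel size k, filters).

    Returns the final (length, channels) and the total learnable parameters.
    """
    total = 0
    for k, filters in conv_layers:
        total += (k * channels + 1) * filters  # k x channels weights + bias
        length, channels = length - k + 1, filters
    return (length, channels), total


print(trace_conv1d(100, 5, [(3, 8), (3, 1)]))  # ((96, 1), 153)
```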
INCEPTION Module

GOOGLENET / INCEPTION NET

[Figure label: Auxiliary Loss]