Lecture 4
https://proceedings.neurips.cc/paper_files/paper/2012/file/c399862d3b9d6b76c8436e924a68c45b-Paper.pdf
AlexNet
AlexNet Dropout
• Dropout sets the output of each hidden neuron to zero with some probability (e.g., 0.5). The neurons that are “dropped out” in this way contribute neither to the forward pass nor to backpropagation.
• So, every time an input is presented, the neural network samples a
different architecture, but all these architectures share weights.
• At test time, all the neurons are used but their outputs are multiplied by
0.5, which is a reasonable approximation to taking the geometric mean of
the predictive distributions produced by the exponentially-many dropout
networks.
• “We use dropout in the first two fully-connected layers. Without dropout, our network exhibits substantial overfitting. Dropout roughly doubles the number of iterations required to converge.”
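A minimal NumPy sketch of this scheme (the drop probability, scaling, and array shapes here are illustrative, not taken from the paper's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout_forward(x, p_drop=0.5, train=True):
    """Dropout as described above: drop units at train time,
    scale all outputs by the keep probability at test time."""
    if train:
        # Zero each unit independently with probability p_drop;
        # dropped units contribute nothing to the forward pass
        # (and receive no gradient in backpropagation).
        mask = rng.random(x.shape) >= p_drop
        return x * mask
    # Test time: all units are used, scaled by the keep probability
    # (here 0.5), approximating the geometric mean of the predictions
    # of the exponentially many weight-sharing sub-networks.
    return x * (1.0 - p_drop)

h = np.ones(8)
print(dropout_forward(h, train=True))   # roughly half the entries zeroed
print(dropout_forward(h, train=False))  # all entries scaled to 0.5
```

Each training step samples a fresh mask, so each input effectively sees a different architecture built from the same shared weights.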
ImageNet Dataset
• 1000 classes
• ImageNet consists of variable-resolution images → the images are
down-sampled to a fixed resolution of 256 × 256.
• Given a rectangular image, we first rescaled the image such that the
shorter side was of length 256, and then cropped out the central
256×256 patch from the resulting image.
• Preprocessing: the mean activity over the training set is subtracted from each pixel, so AlexNet was trained on the (centered) raw RGB values of the pixels.
• AlexNet achieved a top-5 error rate of 15.3% on ImageNet (i.e., 84.7% top-5 accuracy).
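The resize-then-crop preprocessing above can be sketched in NumPy. This is a simplified stand-in (nearest-neighbor resize, toy mean image), not the paper's actual pipeline:

```python
import numpy as np

def resize_shorter_side(img, target=256):
    """Nearest-neighbor resize so the shorter side equals `target`
    (a simple stand-in for the paper's down-sampling)."""
    h, w = img.shape[:2]
    scale = target / min(h, w)
    new_h, new_w = round(h * scale), round(w * scale)
    rows = (np.arange(new_h) / scale).astype(int).clip(0, h - 1)
    cols = (np.arange(new_w) / scale).astype(int).clip(0, w - 1)
    return img[rows][:, cols]

def center_crop(img, size=256):
    """Cut out the central size x size patch."""
    h, w = img.shape[:2]
    top, left = (h - size) // 2, (w - size) // 2
    return img[top:top + size, left:left + size]

def preprocess(img, mean_image):
    """Rescale shorter side to 256, center-crop 256x256,
    then subtract the per-pixel training-set mean."""
    out = center_crop(resize_shorter_side(img))
    return out.astype(np.float32) - mean_image  # centered raw RGB

img = np.zeros((480, 640, 3), dtype=np.uint8)     # toy rectangular RGB image
mean = np.zeros((256, 256, 3), dtype=np.float32)  # per-pixel training mean
print(preprocess(img, mean).shape)  # (256, 256, 3)
```

Any rectangular input, landscape or portrait, ends up as a fixed 256×256×3 array ready for the network.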
VGGNets
• Why? VGGNet was designed to reduce the number of parameters in the CONV layers and improve training time.
• What? There are multiple variants of VGGNet (VGG16, VGG19, etc.)
which differ only in the total number of layers in the network.
VGG16 Architecture
https://arxiv.org/abs/1409.1556
• VGG16 has a total of 138 million parameters. The important point to
note here is that all the conv kernels are of size 3x3 and maxpool
kernels are of size 2x2 with a stride of two.
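The 138 million figure can be reproduced by counting weights and biases layer by layer. A small sketch, assuming the standard VGG16 configuration (thirteen 3x3 conv layers, five 2x2 pools, three fully-connected layers) and a 224×224 input:

```python
# VGG16 configuration: conv output channels per layer; 'M' = 2x2 max-pool.
cfg = [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M',
       512, 512, 512, 'M', 512, 512, 512, 'M']

params, in_ch = 0, 3  # input is RGB, so 3 channels
for v in cfg:
    if v == 'M':
        continue  # pooling layers have no parameters
    params += (3 * 3 * in_ch + 1) * v  # 3x3 kernel weights + 1 bias per filter
    in_ch = v

# Five 2x2/stride-2 pools shrink 224x224 down to 7x7 before the classifier.
fc_sizes = [512 * 7 * 7, 4096, 4096, 1000]
for fin, fout in zip(fc_sizes, fc_sizes[1:]):
    params += (fin + 1) * fout  # fully-connected weights + biases

print(params)  # 138357544, i.e. ~138 million
```

Note that the vast majority of these parameters sit in the first fully-connected layer (25088 × 4096 weights), not in the conv layers.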
• How? The idea behind having fixed size kernels is that all the variable
size convolutional kernels used in Alexnet (11x11, 5x5, 3x3) can be
replicated by making use of multiple 3x3 kernels as building blocks.
The replication is in terms of the receptive field covered by the
kernels.
• Given an input of size 5x5x1, a conv layer with a 5x5 kernel and stride 1 (no padding) produces a 1x1 output feature map.
• The same 1x1 output size can be obtained with two stacked 3x3 conv layers with stride 1 (5x5 → 3x3 → 1x1), and the stack needs only 2 × 3 × 3 = 18 weights instead of 5 × 5 = 25.
5x5 as two convolutional layers with filter 3x3
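This equivalence can be checked numerically. A NumPy sketch with toy values (the all-ones kernels are illustrative, not trained weights) shows both paths produce a 1x1 map covering the same 5x5 receptive field; the actual output values of course depend on the kernels:

```python
import numpy as np

def conv2d_valid(x, k):
    """Plain 'valid' 2-D convolution: stride 1, no padding."""
    kh, kw = k.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

x = np.arange(25, dtype=float).reshape(5, 5)  # toy 5x5 input
k3 = np.ones((3, 3))

# One 5x5 conv on a 5x5 input -> 1x1 output.
one_shot = conv2d_valid(x, np.ones((5, 5)))
# Two stacked 3x3 convs: 5x5 -> 3x3 -> 1x1. Same output size and
# receptive field, but only 2*3*3 = 18 weights instead of 5*5 = 25.
stacked = conv2d_valid(conv2d_valid(x, k3), k3)
print(one_shot.shape, stacked.shape)  # (1, 1) (1, 1)
```

The stacking also inserts an extra nonlinearity between the two 3x3 layers in a real network, which a single 5x5 layer cannot provide.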