Lecture 4
https://proceedings.neurips.cc/paper_files/paper/2012/file/c399862d3b9d6b76c8436e924a68c45b-Paper.pdf
AlexNet
AlexNet Dropout
• Dropout sets the output of each hidden neuron to zero with some probability (e.g., 0.5). The neurons that are “dropped out” in this way contribute neither to the forward pass nor to backpropagation.
• So, every time an input is presented, the neural network samples a
different architecture, but all these architectures share weights.
• At test time, all the neurons are used but their outputs are multiplied by
0.5, which is a reasonable approximation to taking the geometric mean of
the predictive distributions produced by the exponentially-many dropout
networks.
• “We use dropout in the first two fully-connected layers. Without dropout, our network exhibits substantial overfitting. Dropout roughly doubles the number of iterations required to converge.”
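A minimal NumPy sketch of this scheme (the drop probability, scaling, and array shapes here are illustrative, not taken from the paper's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout_forward(x, p_drop=0.5, train=True):
    """Dropout as described above: drop units at train time,
    scale all outputs by the keep probability at test time."""
    if train:
        # Zero each unit independently with probability p_drop;
        # dropped units contribute nothing to the forward pass
        # (and receive no gradient in backpropagation).
        mask = rng.random(x.shape) >= p_drop
        return x * mask
    # Test time: all units are used, scaled by the keep probability
    # (here 0.5), approximating the geometric mean of the predictions
    # of the exponentially many weight-sharing sub-networks.
    return x * (1.0 - p_drop)

h = np.ones(8)
print(dropout_forward(h, train=True))   # roughly half the entries zeroed
print(dropout_forward(h, train=False))  # all entries scaled to 0.5
```

Each training step samples a fresh mask, so each input effectively sees a different architecture built from the same shared weights.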
ImageNet Dataset
• 1000 classes
• ImageNet consists of variable-resolution images → the images are
down-sampled to a fixed resolution of 256 × 256.
• Given a rectangular image, we first rescaled the image such that the
shorter side was of length 256, and then cropped out the central
256×256 patch from the resulting image.
• Preprocessing: the mean activity over the training set is subtracted from each pixel, so AlexNet was trained on the (centered) raw RGB values of the pixels.
• AlexNet achieved a top-5 error rate of 15.3% on ImageNet (i.e., 84.7% top-5 accuracy).
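The resize-then-crop preprocessing above can be sketched in NumPy. This is a simplified stand-in (nearest-neighbor resize, toy mean image), not the paper's actual pipeline:

```python
import numpy as np

def resize_shorter_side(img, target=256):
    """Nearest-neighbor resize so the shorter side equals `target`
    (a simple stand-in for the paper's down-sampling)."""
    h, w = img.shape[:2]
    scale = target / min(h, w)
    new_h, new_w = round(h * scale), round(w * scale)
    rows = (np.arange(new_h) / scale).astype(int).clip(0, h - 1)
    cols = (np.arange(new_w) / scale).astype(int).clip(0, w - 1)
    return img[rows][:, cols]

def center_crop(img, size=256):
    """Cut out the central size x size patch."""
    h, w = img.shape[:2]
    top, left = (h - size) // 2, (w - size) // 2
    return img[top:top + size, left:left + size]

def preprocess(img, mean_image):
    """Rescale shorter side to 256, center-crop 256x256,
    then subtract the per-pixel training-set mean."""
    out = center_crop(resize_shorter_side(img))
    return out.astype(np.float32) - mean_image  # centered raw RGB

img = np.zeros((480, 640, 3), dtype=np.uint8)     # toy rectangular RGB image
mean = np.zeros((256, 256, 3), dtype=np.float32)  # per-pixel training mean
print(preprocess(img, mean).shape)  # (256, 256, 3)
```

Any rectangular input, landscape or portrait, ends up as a fixed 256×256×3 array ready for the network.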
VGGNets
• Why? VGGNet was designed to reduce the number of parameters in the CONV layers and improve training time.
• What? There are multiple variants of VGGNet (VGG16, VGG19, etc.)
which differ only in the total number of layers in the network.
VGG16 Architecture
https://arxiv.org/abs/1409.1556
• VGG16 has a total of 138 million parameters. The important point to
note here is that all the conv kernels are of size 3x3 and maxpool
kernels are of size 2x2 with a stride of two.
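The 138 million figure can be reproduced by counting weights and biases layer by layer. A small sketch, assuming the standard VGG16 configuration (thirteen 3x3 conv layers, five 2x2 pools, three fully-connected layers) and a 224×224 input:

```python
# VGG16 configuration: conv output channels per layer; 'M' = 2x2 max-pool.
cfg = [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M',
       512, 512, 512, 'M', 512, 512, 512, 'M']

params, in_ch = 0, 3  # input is RGB, so 3 channels
for v in cfg:
    if v == 'M':
        continue  # pooling layers have no parameters
    params += (3 * 3 * in_ch + 1) * v  # 3x3 kernel weights + 1 bias per filter
    in_ch = v

# Five 2x2/stride-2 pools shrink 224x224 down to 7x7 before the classifier.
fc_sizes = [512 * 7 * 7, 4096, 4096, 1000]
for fin, fout in zip(fc_sizes, fc_sizes[1:]):
    params += (fin + 1) * fout  # fully-connected weights + biases

print(params)  # 138357544, i.e. ~138 million
```

Note that the vast majority of these parameters sit in the first fully-connected layer (25088 × 4096 weights), not in the conv layers.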
• How? The idea behind having fixed size kernels is that all the variable
size convolutional kernels used in Alexnet (11x11, 5x5, 3x3) can be
replicated by making use of multiple 3x3 kernels as building blocks.
The replication is in terms of the receptive field covered by the
kernels.
• Given an input of size 5x5x1, a conv layer with a 5x5 kernel and stride 1 (no padding) produces a 1x1 output feature map.
• The same 1x1 output size can be obtained with two stacked 3x3 conv layers with stride 1 (5x5 → 3x3 → 1x1), and the stack needs only 2 × 3 × 3 = 18 weights instead of 5 × 5 = 25.
5x5 as two convolutional layers with filter 3x3
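This equivalence can be checked numerically. A NumPy sketch with toy values (the all-ones kernels are illustrative, not trained weights) shows both paths produce a 1x1 map covering the same 5x5 receptive field; the actual output values of course depend on the kernels:

```python
import numpy as np

def conv2d_valid(x, k):
    """Plain 'valid' 2-D convolution: stride 1, no padding."""
    kh, kw = k.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

x = np.arange(25, dtype=float).reshape(5, 5)  # toy 5x5 input
k3 = np.ones((3, 3))

# One 5x5 conv on a 5x5 input -> 1x1 output.
one_shot = conv2d_valid(x, np.ones((5, 5)))
# Two stacked 3x3 convs: 5x5 -> 3x3 -> 1x1. Same output size and
# receptive field, but only 2*3*3 = 18 weights instead of 5*5 = 25.
stacked = conv2d_valid(conv2d_valid(x, k3), k3)
print(one_shot.shape, stacked.shape)  # (1, 1) (1, 1)
```

The stacking also inserts an extra nonlinearity between the two 3x3 layers in a real network, which a single 5x5 layer cannot provide.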