An Analysis of Convolutional Neural Network Architectures
• Hand-designed feature extraction cannot take into account the variability of real-world objects.
SOLUTION
• Keep preprocessing to a minimum. Feed raw pixel data.
• However, this alone does not solve the problem of spatial invariance under shifts and affine
transformations of real-world objects.
• Convolution.
• This is achieved by varying the size of the filter, which determines the filter's
“receptive field” (see the sketch below).
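• A minimal sketch of a single convolution (PyTorch; the 1-channel 32×32 input and the 5×5 filter are illustrative assumptions): each output unit sees only a 5×5 patch of the raw image, its receptive field.

    import torch
    import torch.nn as nn

    # One convolutional layer: every output unit depends only on a
    # 5x5 patch of the input -- its "receptive field".
    conv = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5)

    image = torch.randn(1, 1, 32, 32)   # raw pixel data, no preprocessing
    features = conv(image)
    print(features.shape)               # torch.Size([1, 6, 28, 28])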
SO, CONV NETS.
• They combine three architectural ideas to ensure some degree of invariance to
shifting, scaling, and distortion:
• Local receptive fields, i.e., convolving filters that extract localized features.
• Shared weights, i.e., each filter is replicated across the entire image.
• Spatial subsampling.
• These features become the input to the next layer, which extracts more
sophisticated features.
• Once a feature has been extracted, its absolute position in the image is of
little consequence, given the many poses and positions in which an object can
appear.
• What matters is its position relative to the other extracted features, for
these will be recombined into higher-level features.
• Learning the exact locations of features has another pitfall: it leads to a loss of
generality.
• The deeper the network, the richer the features it can extract.
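• As a rough sketch of these ideas (PyTorch; all shapes are arbitrary assumptions): a single shared filter makes the feature map shift along with the input, and subsampling then blurs out the exact position.

    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    conv = nn.Conv2d(1, 1, kernel_size=3, padding=1, bias=False)  # one shared 3x3 filter
    pool = nn.AvgPool2d(2)                                        # 2x2 spatial subsampling

    x = torch.randn(1, 1, 8, 8)
    shifted = torch.roll(x, shifts=1, dims=-1)    # shift the image right by one pixel

    # Shared weights make convolution shift-equivariant: away from the
    # borders, the feature map simply shifts along with the input.
    a = torch.roll(conv(x), shifts=1, dims=-1)[..., 2:-2]
    b = conv(shifted)[..., 2:-2]
    print(torch.allclose(a, b))                   # True

    # Subsampling then makes the exact position matter less.
    print(pool(conv(x)).shape)                    # torch.Size([1, 1, 4, 4])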
CASE STUDY 1: LENET-5 (1998)
• Main ideas: convolution, local receptive fields, shared weights, spatial
subsampling.
• LeNet-5 is a very simple network.
• It has only 7 layers: 3 convolutional layers (C1, C3, and C5), 2
subsampling (pooling) layers (S2 and S4), and 1 fully connected layer (F6),
followed by the output layer.
• Convolutional layers use 5×5 convolutions with stride 1.
• Subsampling layers are 2×2 average pooling layers. Tanh (a sigmoid-shaped
squashing function) activations are used throughout the network.
• It was limited by the computational power available and the small size of labeled
datasets at the time.
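• A sketch of the architecture described above (PyTorch standing in for the original 1998 implementation; details such as the trainable subsampling coefficients, the partial S2–C3 connectivity, and the RBF output layer are simplified away):

    import torch
    import torch.nn as nn

    class LeNet5(nn.Module):
        """C1-S2-C3-S4-C5-F6-output, tanh activations throughout."""
        def __init__(self, num_classes=10):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(1, 6, kernel_size=5), nn.Tanh(),     # C1: 32x32 -> 28x28
                nn.AvgPool2d(2),                               # S2: -> 14x14
                nn.Conv2d(6, 16, kernel_size=5), nn.Tanh(),    # C3: -> 10x10
                nn.AvgPool2d(2),                               # S4: -> 5x5
                nn.Conv2d(16, 120, kernel_size=5), nn.Tanh(),  # C5: -> 1x1
            )
            self.classifier = nn.Sequential(
                nn.Flatten(),
                nn.Linear(120, 84), nn.Tanh(),                 # F6
                nn.Linear(84, num_classes),                    # output layer
            )
        def forward(self, x):
            return self.classifier(self.features(x))

    print(LeNet5()(torch.randn(1, 1, 32, 32)).shape)           # torch.Size([1, 10])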
CASE STUDY 2: ALEXNET (2012)
AlexNet contains eight layers: five convolutional layers followed by three fully connected layers.
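• A sketch of those eight weight layers (PyTorch, following the widely used torchvision variant; exact channel counts differ slightly from the original two-GPU implementation):

    import torch
    import torch.nn as nn

    # Five convolutional layers...
    alexnet_convs = nn.Sequential(
        nn.Conv2d(3, 64, kernel_size=11, stride=4, padding=2), nn.ReLU(),
        nn.MaxPool2d(3, stride=2),
        nn.Conv2d(64, 192, kernel_size=5, padding=2), nn.ReLU(),
        nn.MaxPool2d(3, stride=2),
        nn.Conv2d(192, 384, kernel_size=3, padding=1), nn.ReLU(),
        nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(),
        nn.Conv2d(256, 256, kernel_size=3, padding=1), nn.ReLU(),
        nn.MaxPool2d(3, stride=2),
    )
    # ...followed by three fully connected layers.
    alexnet_fcs = nn.Sequential(
        nn.Flatten(),
        nn.Linear(256 * 6 * 6, 4096), nn.ReLU(), nn.Dropout(),
        nn.Linear(4096, 4096), nn.ReLU(), nn.Dropout(),
        nn.Linear(4096, 1000),                    # 1000 ImageNet classes
    )

    x = torch.randn(1, 3, 224, 224)
    print(alexnet_fcs(alexnet_convs(x)).shape)    # torch.Size([1, 1000])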
NETWORK IN NETWORK (NIN)
• NIN replaces the linear convolution filter with a small multi-layer perceptron,
equivalent to 1×1 convolutions, applied at every position.
• The added non-linearity adds to the richness of the features that a filter,
in this case the NIN micro-network, can extract, further combating spatial variance.
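• A minimal sketch of an NIN-style “mlpconv” block (PyTorch; the channel sizes are illustrative assumptions):

    import torch
    import torch.nn as nn

    # A normal convolution followed by 1x1 convolutions: in effect, a tiny
    # MLP slid over every spatial position, adding non-linearity per filter.
    mlpconv = nn.Sequential(
        nn.Conv2d(3, 64, kernel_size=3, padding=1), nn.ReLU(),
        nn.Conv2d(64, 64, kernel_size=1), nn.ReLU(),
        nn.Conv2d(64, 64, kernel_size=1), nn.ReLU(),
    )
    print(mlpconv(torch.randn(1, 3, 32, 32)).shape)   # torch.Size([1, 64, 32, 32])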
THE INCEPTION MODULE
• Previously, in nets like AlexNet and VGGNet, the convolution filter size is fixed for each
layer.
• Now, 1×1 conv, 3×3 conv, 5×5 conv, and 3×3 max pooling are applied in
parallel to the previous layer's output, and their results are stacked
together again at the output (along the channel dimension). When an image
comes in, let the network choose the right path.
• We can now appreciate the depth-wise dimensionality reduction that NIN
provides: the 1×1 convolutions shrink the channel depth before the expensive
3×3 and 5×5 convolutions (see the sketch below).
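• A minimal sketch of such a module (PyTorch; the branch widths are illustrative assumptions, not GoogLeNet's exact numbers):

    import torch
    import torch.nn as nn

    class Inception(nn.Module):
        """Four parallel branches whose outputs are stacked along the channel axis."""
        def __init__(self, in_ch):
            super().__init__()
            self.b1 = nn.Conv2d(in_ch, 32, kernel_size=1)      # 1x1 conv
            self.b3 = nn.Sequential(                           # 1x1 reduction, then 3x3
                nn.Conv2d(in_ch, 16, kernel_size=1), nn.ReLU(),
                nn.Conv2d(16, 32, kernel_size=3, padding=1))
            self.b5 = nn.Sequential(                           # 1x1 reduction, then 5x5
                nn.Conv2d(in_ch, 8, kernel_size=1), nn.ReLU(),
                nn.Conv2d(8, 16, kernel_size=5, padding=2))
            self.bp = nn.Sequential(                           # 3x3 max pool, then 1x1
                nn.MaxPool2d(3, stride=1, padding=1),
                nn.Conv2d(in_ch, 16, kernel_size=1))
        def forward(self, x):
            return torch.cat([self.b1(x), self.b3(x), self.b5(x), self.bp(x)], dim=1)

    print(Inception(64)(torch.randn(1, 64, 28, 28)).shape)     # torch.Size([1, 96, 28, 28])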
• Number of weights = 0: replacing the fully connected layers with global
average pooling adds no parameters at all.
• As before, this is an idea from NIN, and it makes the network less prone to
overfitting.
• In the last few years, experts have turned to global average pooling (GAP)
layers to minimize overfitting by reducing the total number of parameters
in the model.
• Similar to max pooling layers, GAP layers are used to reduce the spatial
dimensions of a three-dimensional tensor.
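• As a sketch (PyTorch): GAP collapses each feature map to a single average, so a stack of 512 feature maps becomes a 512-vector with literally no weights to learn.

    import torch
    import torch.nn as nn

    gap = nn.AdaptiveAvgPool2d(1)            # global average pooling

    features = torch.randn(1, 512, 7, 7)     # e.g. the last convolutional feature maps
    pooled = gap(features).flatten(1)        # one number per feature map
    print(pooled.shape)                      # torch.Size([1, 512])
    print(sum(p.numel() for p in gap.parameters()))   # 0 -- number of weights = 0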