GoogLeNet


Unveiling GoogLeNet:
A Deep Dive into Google's State-of-the-Art Convolutional Neural Network Architecture

Presented By: Muhammad Mohsin Zafar


GoogLeNet
Introduction

• A deep Convolutional Neural Network (CNN) developed by researchers at Google.
• It was introduced in 2014.
• Named GoogLeNet because it was developed at Google, and LeNet was the first structured CNN.
• Prior to GoogLeNet, models mainly tried to attain better generalization by adding more layers. This results in higher complexity, because as the network goes deeper, the number of parameters grows rapidly.
GoogLeNet
ILSVRC

• Won the 2014 ILSVRC (ImageNet Large-Scale Visual Recognition Challenge) with a top-5 error rate of 6.67%.
• The ILSVRC 2014 classification challenge involves classifying an image into one of 1,000 categories in the ImageNet hierarchy. There are about 1.2 million images for training, 50,000 for validation, and 100,000 for testing.
GoogLeNet
Motivation (Going Deeper)

Researchers noted that as the number of convolutional layers increases, results keep getting better. But, as you can imagine, this often creates complications:
• The bigger the model, the more prone it is to overfitting. This is particularly noticeable when the training data is small.
• Increasing the number of parameters means you need more computational resources.
• Problems like vanishing gradients can occur when training very deep models.
Going Deeper and Wider
(Naïve Inception)

• GoogLeNet addressed the complexity problem by making the network wider instead of only deeper, introducing the concept of the inception module.
• The first inception module, called the naïve inception module, applies three convolutions with different kernel (filter) sizes (1x1, 3x3 and 5x5) at the same level, together with a 3x3 max pooling.
• The outputs of these three convolutions and the pooling are concatenated and fed to the next inception module, as sketched below.
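To make the structure concrete, here is a minimal PyTorch sketch of a naïve inception block of the kind described above; the branch channel counts and padding choices are illustrative assumptions, not values taken from the GoogLeNet paper.

```python
import torch
import torch.nn as nn

class NaiveInception(nn.Module):
    """Naive inception block: parallel 1x1, 3x3, 5x5 convolutions plus 3x3 max pooling."""
    def __init__(self, in_channels, c1, c3, c5):
        super().__init__()
        # Three convolutions with different kernel sizes at the same level.
        # Padding keeps the spatial size identical so the outputs can be concatenated.
        self.branch1 = nn.Conv2d(in_channels, c1, kernel_size=1)
        self.branch3 = nn.Conv2d(in_channels, c3, kernel_size=3, padding=1)
        self.branch5 = nn.Conv2d(in_channels, c5, kernel_size=5, padding=2)
        # 3x3 max pooling branch (stride 1 to preserve the spatial size).
        self.pool = nn.MaxPool2d(kernel_size=3, stride=1, padding=1)

    def forward(self, x):
        # Outputs of the three convolutions and the pooling are concatenated
        # along the channel dimension and passed on to the next module.
        return torch.cat(
            [self.branch1(x), self.branch3(x), self.branch5(x), self.pool(x)], dim=1
        )

# Example: a 28x28 feature map with 192 input channels (illustrative numbers).
x = torch.randn(1, 192, 28, 28)
print(NaiveInception(192, 64, 128, 32)(x).shape)  # torch.Size([1, 416, 28, 28])
```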
Naïve Inception

• The problem is that a large number of operations and parameters is generated when a 5x5 convolution with many filters is applied directly, which results in a complex model.
• Even if this architecture covered the optimal sparse structure, it would do so very inefficiently, leading to a computational blow-up within a few stages.
Inception Module with Dimension Reduction

• Due to the complexity of the naïve inception module, a 1x1 convolution is employed before the 3x3 and 5x5 convolutions.
• With a 1x1 convolution placed before the 5x5, the combined cost of the 1x1 and 5x5 convolutions is lower than that of a 5x5 convolution used alone.
• By introducing the 1x1 convolution, the number of filters can be increased without increasing the complexity of the model (see the sketch below).
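Below is a minimal PyTorch sketch of the same block with 1x1 dimension reduction in front of the 3x3 and 5x5 branches, and a 1x1 projection after the pooling branch; the reduction sizes r3 and r5 are illustrative parameters, not prescribed values.

```python
import torch
import torch.nn as nn

class InceptionReduced(nn.Module):
    """Inception block with 1x1 convolutions used as dimension reductions."""
    def __init__(self, in_channels, c1, r3, c3, r5, c5, c_pool):
        super().__init__()
        self.branch1 = nn.Conv2d(in_channels, c1, kernel_size=1)
        # A 1x1 convolution shrinks the channel count before the expensive 3x3 / 5x5.
        self.branch3 = nn.Sequential(
            nn.Conv2d(in_channels, r3, kernel_size=1),
            nn.Conv2d(r3, c3, kernel_size=3, padding=1),
        )
        self.branch5 = nn.Sequential(
            nn.Conv2d(in_channels, r5, kernel_size=1),
            nn.Conv2d(r5, c5, kernel_size=5, padding=2),
        )
        # The pooling branch gets a 1x1 projection to keep its width under control.
        self.branch_pool = nn.Sequential(
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            nn.Conv2d(in_channels, c_pool, kernel_size=1),
        )

    def forward(self, x):
        return torch.cat(
            [self.branch1(x), self.branch3(x), self.branch5(x), self.branch_pool(x)],
            dim=1,
        )
```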
Complexity Comparison

Suppose we need to perform a 5×5 convolution with 48 filters on a 14×14×480 feature map.

Without a 1×1 convolution:
• Number of operations = (14×14×480) × (5×5×48) = 112.9M

With a 1×1 convolution (reducing to 16 channels first):
• Number of operations for 1×1 = (14×14×480) × (1×1×16) = 1.5M
• Number of operations for 5×5 = (14×14×16) × (5×5×48) = 3.8M
• Total number of operations = 1.5M + 3.8M = 5.3M

5.3M is much smaller than 112.9M (see the check below).
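The figures above can be reproduced with a few lines of Python (multiplications only, following the slide's counting convention):

```python
# Operation counts for a 14x14x480 input feature map and 48 output filters.
direct_5x5 = (14 * 14 * 480) * (5 * 5 * 48)   # 5x5 convolution applied directly
reduce_1x1 = (14 * 14 * 480) * (1 * 1 * 16)   # 1x1 reduction down to 16 channels
then_5x5 = (14 * 14 * 16) * (5 * 5 * 48)      # 5x5 convolution on the reduced input

print(f"Direct 5x5:   {direct_5x5 / 1e6:.1f}M")                  # 112.9M
print(f"1x1 then 5x5: {(reduce_1x1 + then_5x5) / 1e6:.1f}M")     # 5.3M
```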
General Architecture

• It is already a very deep model compared with earlier networks such as AlexNet, ZFNet and VGGNet.
• Numerous inception modules are stacked one after another to go deeper, as sketched below.
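As a rough sketch of how the modules are chained (assuming the InceptionReduced class from the earlier sketch is in scope), the stem below is simplified and the channel numbers are only meant to show how the output width of one module becomes the input width of the next:

```python
import torch.nn as nn

# Simplified stem: ordinary convolutions and pooling before the inception stages.
stem = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3), nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=3, stride=2, padding=1),
    nn.Conv2d(64, 192, kernel_size=3, padding=1), nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=3, stride=2, padding=1),
)

# Inception modules stacked one after another to go deeper.
body = nn.Sequential(
    InceptionReduced(192, 64, 96, 128, 16, 32, 32),    # concatenated output: 256 channels
    InceptionReduced(256, 128, 128, 192, 32, 96, 64),  # concatenated output: 480 channels
    nn.MaxPool2d(kernel_size=3, stride=2, padding=1),
    # ... further inception modules follow in the full network
)
```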
Global Average Pooling

• Previously, fully connected layers were used at the end of the architecture, which adds a lot of parameters and increases computational complexity. In GoogLeNet, global average pooling is introduced instead.
• A fully connected layer over the final 7×7×1024 feature map with 1024 outputs would need 7×7×1024×1024 ≈ 51.3M weights (connections).
• In GoogLeNet, global average pooling is used near the end of the network, averaging each feature map from 7×7 down to 1×1. Number of weights = 0.
• The move from FC layers to average pooling improved top-1 accuracy by about 0.6% (see the sketch below).
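A minimal PyTorch sketch of the two heads, assuming a final 7×7×1024 feature map and 1,000 ImageNet classes, makes the difference in weight count explicit:

```python
import torch
import torch.nn as nn

# Fully connected head: flatten the 7x7x1024 feature map and map it to 1024 units.
fc_head = nn.Sequential(nn.Flatten(), nn.Linear(7 * 7 * 1024, 1024))

# Global-average-pooling head: average each 7x7 map down to 1x1 (no weights),
# leaving only the final 1024 -> 1000 classifier.
gap_head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(1024, 1000))

x = torch.randn(1, 1024, 7, 7)
fc_weights = sum(p.numel() for p in fc_head[1].parameters() if p.dim() > 1)
print(fc_weights)           # 51380224, i.e. the ~51.3M connections quoted above
print(gap_head(x).shape)    # torch.Size([1, 1000])
```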
Detailed Architecture
Inception Module Algorithm
GoogLeNet
Pros:

1. Reduced number of operations and computational cost because of the use of 1x1 convolutions.
2. Capable of capturing features at multiple scales with the help of different kernel sizes (3x3 and 5x5) at the same level.
3. Also performs well when working with low-contrast images.
4. Highly scalable to the available resources because of its modular architecture.
5. Avoids the vanishing gradient problem to some extent by introducing auxiliary classifiers (sketched below).
6. Requires less memory compared to previous models like VGG and AlexNet.
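For point 5, here is a minimal PyTorch sketch of an auxiliary classifier head of the kind GoogLeNet attaches to intermediate inception outputs; the 512 input channels assume it sits on a 14×14×512 feature map, as the first auxiliary head in the paper does.

```python
import torch.nn as nn

# Auxiliary classifier: 5x5 average pooling (stride 3), a 128-filter 1x1 convolution,
# a 1024-unit FC layer with dropout, and a 1000-way classifier.
aux_classifier = nn.Sequential(
    nn.AvgPool2d(kernel_size=5, stride=3),
    nn.Conv2d(512, 128, kernel_size=1), nn.ReLU(inplace=True),
    nn.Flatten(),
    nn.Linear(128 * 4 * 4, 1024), nn.ReLU(inplace=True),
    nn.Dropout(p=0.7),
    nn.Linear(1024, 1000),
)
# During training, its loss is added to the main loss with a small weight (0.3 in
# the paper), injecting extra gradient into the middle of the network; at inference
# time the auxiliary heads are discarded.
```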
GoogLeNet
Cons:

1. Due to the complex architecture, it is harder to interpret.
2. Still requires a substantial amount of memory.
3. Possible feature loss when moving from one module to another.
References:

Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., ... & Rabinovich, A. (2015).
Going deeper with convolutions. In Proceedings of the IEEE conference on computer vision
and pattern recognition (pp. 1-9).
