
Artificial Intelligence Summer School: Machine Learning & Deep Learning

• Day 9: Popular CNN Architectures, Transfer Learning, Fine-tuning

Dr Rinki Gupta, Prof, ASET, ACAI
Amity Centre for Artificial Intelligence, Amity University, Noida, India


Outline: Day 9

• Popular CNN Architectures
  • ILSVRC
  • AlexNet
  • VGGNet
• Transfer Learning
  • Method
  • Types
• Fine-tuning and its Applications



Popular CNN Architectures



ImageNet Dataset
• ImageNet is a dataset of over 15 million labelled high-resolution images from ~22,000 categories
• ImageNet Large Scale Visual Recognition Challenge (ILSVRC):
  • Held between 2010 and 2017
  • Uses ~1000 categories, each with ~1000 images

https://image-net.org/index.php



Popular CNN Architectures
• ImageNet Large Scale Visual Recognition Challenge (ILSVRC) winners:
  • AlexNet (8 layers)
  • VGGNet (16 or 19 layers)
  • GoogLeNet/Inception (22 layers)
  • ResNet (152 layers; ResNet-50 has about 25 million parameters)

[Figure: number of layers of ILSVRC winners over the years]


Popular CNN Architectures
Top-5 error rate, e.g.:
• Input: cat image
• Neural network outputs:
  • Tiger: 0.4, Dog: 0.3, Cat: 0.1, Lynx: 0.09, Lion: 0.08, Bird: 0.02, Bear: 0.01
• Top-1 output contains Tiger (counted as a top-1 error)
• Top-5 output contains Cat (not counted as a top-5 error)
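As a quick illustration, here is a minimal NumPy sketch of the top-5 check; the class names and probabilities are simply the example values above, not real model outputs:

import numpy as np

# Hypothetical class scores for one image whose true label is "Cat"
classes = ["Tiger", "Dog", "Cat", "Lynx", "Lion", "Bird", "Bear"]
probs = np.array([0.4, 0.3, 0.1, 0.09, 0.08, 0.02, 0.01])
true_label = "Cat"

# Indices of the 5 largest scores, highest first
top5 = [classes[i] for i in np.argsort(probs)[::-1][:5]]

print("Top-1 prediction:", top5[0])          # Tiger -> counts as a top-1 error
print("Top-5 predictions:", top5)            # includes Cat
print("Top-5 correct:", true_label in top5)  # True -> not a top-5 error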



AlexNet
• Developed by Alex Krizhevsky, together with Ilya Sutskever and his PhD supervisor Geoffrey E. Hinton, in 2012

• Winner of the ILSVRC 2012 challenge (top-5 error of 15.3%)

• 8-layer CNN (5 Conv+MaxPool and 3 FC)

• More than 60 million parameters

Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "ImageNet classification with deep convolutional neural networks." Advances in Neural Information Processing Systems 25 (2012): 1097-1105.



AlexNet
• Input: 227x227x3
• Conv filter sizes: 11x11, 5x5, three 3x3
• 3 MaxPool layers
• ReLU for hidden units
• Softmax for output
• Last pooling output flattened to 6x6x256 = 9216 features
• Dropout (p = 0.5) after the first two FC layers
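A minimal Keras sketch of an AlexNet-style network following the layer sizes listed above (LRN layers and the original two-GPU split are omitted), so it is an illustrative approximation rather than the exact original model:

import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Conv2D(96, 11, strides=4, activation="relu", input_shape=(227, 227, 3)),
    layers.MaxPooling2D(3, strides=2),
    layers.Conv2D(256, 5, padding="same", activation="relu"),
    layers.MaxPooling2D(3, strides=2),
    layers.Conv2D(384, 3, padding="same", activation="relu"),
    layers.Conv2D(384, 3, padding="same", activation="relu"),
    layers.Conv2D(256, 3, padding="same", activation="relu"),
    layers.MaxPooling2D(3, strides=2),
    layers.Flatten(),                        # 6x6x256 = 9216 features
    layers.Dense(4096, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(4096, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(1000, activation="softmax"),
])
model.summary()                              # roughly 60 million parameters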



AlexNet
• ReLU activation (avoids vanishing gradients)
• Data augmentation (reduces overfitting)
• Dropout regularization (avoids co-adaptation)
• Introduced Local Response Normalization (LRN)
  • LRN is a non-trainable layer that square-normalizes the values in a feature map within a local neighbourhood (inter-channel or intra-channel)
  • It performs lateral inhibition: the capacity of a neuron to reduce the activity of its neighbours (see the sketch below)
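A small TensorFlow sketch of inter-channel LRN using the built-in op; the feature-map shape is an arbitrary assumption, and the constants roughly follow the AlexNet paper:

import tensorflow as tf

feature_map = tf.random.normal([1, 13, 13, 256])   # (batch, height, width, channels)

# Each value is divided by a squared sum of activations in neighbouring
# channels, i.e. strongly active channels suppress their neighbours
lrn_out = tf.nn.local_response_normalization(
    feature_map, depth_radius=5, bias=2.0, alpha=1e-4, beta=0.75)

print(lrn_out.shape)                               # same shape as the input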



VGGNet
• Developed by the Visual Geometry Group (Oxford) in 2014
• VGG16 was 2nd in the ILSVRC 2014 classification challenge (top-5 classification error of 7.32%)
• Characterized by simplicity and depth
• All Conv layers use 3x3 filters with stride 1 and SAME padding
• All max pooling layers use 2x2 filters with stride 2
• VGG16: 16-layer CNN (16 layers with trainable parameters, over 134 million parameters); VGG19: 19-layer CNN (about 144 million parameters)

Karen Simonyan and Andrew Zisserman. "Very deep convolutional networks for large-scale image recognition." arXiv preprint arXiv:1409.1556 (2014).



VGGNet
• Conv = 3x3 filter, s = 1, SAME padding
• Max pool = 2x2, s = 2 (5 max pooling layers)
• ReLU activation in all hidden units
• Softmax activation in the output units
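To inspect this architecture directly, VGG16 can be loaded from Keras Applications (a minimal sketch; the ImageNet weights, roughly 500 MB, are downloaded on first use):

from tensorflow.keras.applications import VGG16

vgg16 = VGG16(weights="imagenet")   # 16 weight layers, 1000-class softmax output
vgg16.summary()                     # prints well over 134 million parameters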



Stacking of Conv layers
• Multiple stacked Conv layers lead to a wide effective receptive field: e.g., two stacked 3x3 Conv layers have a 5x5 effective receptive field
• In VGG, varying filter sizes are implemented by stacking Conv layers with fixed 3x3 filter sizes
• Limitation: Training is very slow

[Figure: effective receptive field of two stacked 3x3 convolutions vs a single 5x5 convolution]
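The sketch below compares two stacked 3x3 convolutions with a single 5x5 convolution that has the same effective receptive field; the channel count C = 64 and the input size are arbitrary assumptions made only for the comparison:

from tensorflow.keras import layers, models

C = 64  # assumed number of input/output channels

# Two stacked 3x3 convolutions: 5x5 effective receptive field
stacked = models.Sequential([
    layers.Conv2D(C, 3, padding="same", activation="relu", input_shape=(32, 32, C)),
    layers.Conv2D(C, 3, padding="same", activation="relu"),
])

# A single 5x5 convolution with the same receptive field
single = models.Sequential([
    layers.Conv2D(C, 5, padding="same", activation="relu", input_shape=(32, 32, C)),
])

print(stacked.count_params())   # 2 * (3*3*C*C + C) = 73,856
print(single.count_params())    #       5*5*C*C + C = 102,464

The stacked version uses fewer parameters and applies two ReLU non-linearities instead of one.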



Transfer Learning
• Analogy: knowing one programming language makes learning another programming language simpler



How does Transfer Learning work for Deep Learning models?
• The machine exploits the knowledge gained from a source task to improve generalization on a target task
• Learning in a new task is improved through the transfer of knowledge from a related task that has already been learned
• When to use TL? When the new dataset is small and similar to the original dataset



Steps in Transfer Learning



1. Obtain a pre-trained model and 2. Create a base model
• Commonly used pre-trained models include (see the sketch after this list):
  • VGG-16
  • VGG-19
  • Inception V3
  • Xception
  • ResNet-50
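A minimal Keras sketch of steps 1-2, assuming VGG16 as the pre-trained model; include_top=False drops the original 1000-class classifier so the network can serve as a base model:

from tensorflow.keras.applications import VGG16

base_model = VGG16(weights="imagenet",        # 1. pre-trained ImageNet weights
                   include_top=False,         # 2. keep only the convolutional base
                   input_shape=(224, 224, 3))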



3. Freeze layers
• Freezing the starting layers of the pre-trained model is essential to avoid making the model re-learn the basic features it has already captured.
• If we do not freeze the initial layers, we lose all the learning that has already taken place; this would be no different from training the model from scratch and would waste time, resources, etc. (see the sketch after this list).
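A sketch of step 3, re-creating the VGG16 base model from the steps 1-2 sketch and freezing it:

from tensorflow.keras.applications import VGG16

base_model = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))

# Freeze the whole convolutional base so its ImageNet weights are not updated
base_model.trainable = False

# Equivalent per-layer form, useful when only some layers should stay frozen
for layer in base_model.layers:
    layer.trainable = False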



4. Add new trainable layers

• The only knowledge we reuse from the base model is its feature-extraction layers. We need to add new layers on top of them to predict the specialized task of the model; these are generally the final output layers (see the sketch below).
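A sketch of step 4, stacking a new trainable head on the frozen base model; the head sizes and the 2-class output are illustrative assumptions:

from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16

base_model = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base_model.trainable = False

model = models.Sequential([
    base_model,                                # frozen feature extractor
    layers.GlobalAveragePooling2D(),
    layers.Dense(256, activation="relu"),      # new trainable layers
    layers.Dropout(0.5),
    layers.Dense(2, activation="softmax"),     # new 2-class output layer
])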



5. Train the new layers

• The pre-trained model's final output will most likely differ from the output we want for our model. For example, pre-trained models trained on the ImageNet dataset will output 1000 classes.
• However, if we need our model to work for two classes, we have to train the model with a new output layer in place (see the sketch after this list).
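A sketch of step 5, training only the new head while the base stays frozen. It continues the model from the step 4 sketch; train_ds and val_ds are placeholder names for your own tf.data datasets of images with integer labels.

import tensorflow as tf

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

history = model.fit(train_ds, validation_data=val_ds, epochs=5)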



Fine-tune the model

• One method of improving the performance further is fine-tuning.
• Fine-tuning involves unfreezing some part of the base model and training the entire model again on the whole dataset at a very low learning rate.
• A low learning rate helps improve the performance of the model on the new dataset while reducing the risk of overfitting (see the sketch after this list).
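A fine-tuning sketch, continuing the same base_model and model from the previous sketches: unfreeze part of the base model (here, roughly the last VGG16 block; how many layers to unfreeze is a judgment call) and retrain with a much lower learning rate.

import tensorflow as tf

base_model.trainable = True
for layer in base_model.layers[:-4]:     # keep everything except the last block frozen
    layer.trainable = False

# Re-compile after changing trainable flags, using a very low learning rate
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

model.fit(train_ds, validation_data=val_ds, epochs=5)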



Applications of Transfer Learning

• One of the most popular and successful applications of transfer learning in neural networks is image recognition: the task of identifying and classifying objects, faces, scenes, or emotions in images.
• Many pre-trained neural networks trained on large and diverse image datasets, such as ImageNet or COCO, can be used as feature extractors or fine-tuned for new image recognition tasks.
• For example, you can use a pre-trained network like ResNet or VGG to extract features from your own images, and then add a new classifier layer on top to train on your specific image recognition problem, such as flower type identification or medical image diagnosis (see the sketch after this list).
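A minimal feature-extraction sketch with VGG16; random arrays stand in for your own images, and for a real task the 512-dimensional features would be fed to a small classifier such as logistic regression or a Dense layer:

import numpy as np
from tensorflow.keras.applications import VGG16
from tensorflow.keras.applications.vgg16 import preprocess_input

# Frozen convolutional base with global average pooling -> one vector per image
extractor = VGG16(weights="imagenet", include_top=False, pooling="avg")

images = np.random.rand(4, 224, 224, 3) * 255.0    # stand-in for real RGB images
features = extractor.predict(preprocess_input(images))

print(features.shape)                              # (4, 512) feature vectors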



Summary

• There are many popular CNN architectures in the literature, such as AlexNet, VGGNet, GoogLeNet, and ResNet
• The above-mentioned models were winners or top performers in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC)
• In Transfer Learning, the machine exploits the knowledge gained from a source task to improve generalization on a target task
• Fine-tuning involves unfreezing some part of the base model and training the entire model again on the whole dataset at a very low learning rate






Thank You!

Any Questions?

