Unit 1
Dr. S. Sangeetha
Associate Professor
Department of Artificial Intelligence and Data Science
Kumaraguru College of Technology
Course Outcomes
CO 2: Design the test procedures to assess the efficacy of the developed model.
CO 3: Identify and apply appropriate deep learning models for analyzing the data for a variety of problems.
Machine Learning and Deep Learning, Representation Learning, Width and Depth of Neural
Networks, Learning Algorithms: Capacity - Overfitting - Underfitting - Bayesian Classification -
Activation Functions: RELU, LRELU, ERELU, Unsupervised Training of Neural Networks, Restricted
and Deep Boltzmann Machines, Autoencoders
ADVANCED NEURAL NETWORKS 11 Hours
Deep Feedforward Networks: Gradient-based learning - Hidden Units - Architectural design - Back
Propagation algorithms - Regularization for deep learning: Dataset Augmentation - Noise Robustness
- Semi-supervised learning - Multitask learning - Deep Belief Networks - Generative Adversarial
Networks with Keras/MXNet
$\hat{y} = \mathrm{activation}(w_1 x_1 + w_2 x_2 + \cdots + w_n x_n + b)$
Model Building
Architecture
Loss function
Optimization
Metrics
Architecture
Number of layers
Number of units per layer
Activation function
Linear vs Nonlinear
Activation functions
$J(b, w) = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2$
Gradient Descent (learning rate: optimal)
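Below is a minimal NumPy sketch of gradient descent on the mean-squared-error loss J(b, w) defined above, for a single linear neuron y_hat = w*x + b. The data values and the learning rate are illustrative assumptions, not from the slides.

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.0, 5.0, 7.0, 9.0])          # underlying relation: y = 2x + 1

w, b, lr = 0.0, 0.0, 0.05
for _ in range(500):
    y_hat = w * x + b
    grad_w = (2 / len(x)) * np.sum((y_hat - y) * x)   # dJ/dw
    grad_b = (2 / len(x)) * np.sum(y_hat - y)         # dJ/db
    w -= lr * grad_w                                   # step against the gradient
    b -= lr * grad_b

print(w, b)   # converges near w = 2, b = 1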
Neural Network
Single Neuron Solution (figure)
Four Neurons Solution (figure)
Hidden Layer Solution (figure)
Convolutional Neural Networks
1. Introduction
2. The Convolution Operation
3. Motivation
4. Pooling
5. Convolution and Pooling as an Infinitely Strong Prior
6. Variants of the Basic Convolution Function
Introduction
Multi-Layer Perceptron (MLP) for image recognition task
28x28 image
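Below is a minimal Keras sketch of such an MLP. The 28x28 input comes from the slide; the hidden-layer size (128) and the 10-class softmax output are illustrative assumptions.

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),     # 28x28 image -> 784 input features
    tf.keras.layers.Dense(128, activation='relu'),     # hidden layer
    tf.keras.layers.Dense(10, activation='softmax'),   # one output unit per class
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.summary()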
Convolutional neural network (CNN)
CNN consists of:
- Convolutional layer
- Pooling layer
- Fully connected layer
CNN learns the convolutional layer's kernels
- To make computation easier, CNNs use cross-correlation
instead of convolution
ILSVRC
ImageNet Large Scale Visual Recognition (ILSVRC)
- One of the biggest computer vision contests
- 1M images, 1000 labels
Convolutional Neural Network
- AlexNet (Krizhevsky et al.) won the ILSVRC 2012
- University of Toronto: error rate of 16.4%
It also showed that CNNs can be accelerated with GPUs
Convolution
• Convolution is a mathematical operation that combines two
functions to produce a third function. In the context of signal
processing and neural networks, convolution is often used to
process data through filters or kernels.
• In convolutional network terminology, the first argument to
the convolution is often referred to as the input and the
second argument as the kernel. The output is sometimes
referred to as the feature map.
Convolution in the discrete case
Discrete 1D convolution on a computer: a kernel slides across the input
x = [3, 7, 5, 6, 4, 2, 1], producing one output value s at each position
(figure: the slides step through the positions one at a time).
Cross-correlation
Discrete 1D cross-correlation: the same sliding operation, but the kernel
is not flipped before sliding (figure).
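A small NumPy sketch of the two operations on the slide's input. The kernel values here are illustrative assumptions (the original slide's kernel is not recoverable from the figure).

import numpy as np

x = np.array([3, 7, 5, 6, 4, 2, 1], dtype=float)
w = np.array([0.1, 0.5, 0.3])

conv = np.convolve(x, w, mode='valid')    # convolution: flips the kernel before sliding
xcorr = np.correlate(x, w, mode='valid')  # cross-correlation: slides the kernel as-is

print(conv)   # convolution output
print(xcorr)  # differs from conv unless the kernel is symmetric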
Convolution vs. Cross-correlation
Convolution (kernel flipped): $s(i) = \sum_{k} x(k)\, w(i - k)$
Cross-correlation (kernel not flipped): $s(i) = \sum_{k} x(k)\, w(i + k)$
Convolution
In terms of deep learning, an (image) convolution is an element-
wise multiplication of two matrices followed by a sum.
1.Take two matrices (which both have the same dimensions).
2.Multiply them, element-by-element (i.e., not the dot product,
just a simple multiplication).
3.Sum the elements together.
Nearly all machine learning and deep learning libraries use the
simplified cross-correlation function.
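The three steps above can be written directly in NumPy. This is a sketch; the function name conv2d_valid is just for illustration.

import numpy as np

def conv2d_valid(image, kernel):
    """'Valid' 2D cross-correlation: at each position, multiply the two
    same-sized matrices element-by-element and sum the results.
    This is the operation deep learning libraries call 'convolution'."""
    H, W = image.shape
    kh, kw = kernel.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out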
2D Convolution
A Convolutional Neural Network (CNN) learns appropriate values for the kernel.
Many neural network libraries implement a related function to convolution called cross-correlation,
which is the same as convolution but without flipping the kernel.
Source: https://cambridgespark.com/content/tutorials/convolutional-neural-networks-with-keras/index.html
Convolution
Computer Vision Problem: detecting vertical edges

Input image (6 x 6):        Filter (3 x 3):     Output (4 x 4):
0  0  0  10 10 10           1  0  -1            0  -30  -30  0
0  0  0  10 10 10           1  0  -1            0  -30  -30  0
0  0  0  10 10 10     *     1  0  -1     =      0  -30  -30  0
0  0  0  10 10 10                               0  -30  -30  0
0  0  0  10 10 10
0  0  0  10 10 10
(Andrew Ng, deeplearning.ai)
Vertical and Horizontal Edge Detection

Vertical filter:    Horizontal filter:
1  0  -1            1   1   1
1  0  -1            0   0   0
1  0  -1           -1  -1  -1

Input image (6 x 6):        Horizontal filter:    Output (4 x 4):
10 10 10  0  0  0            1   1   1            0    0    0    0
10 10 10  0  0  0            0   0   0            30   10  -10  -30
10 10 10  0  0  0     *     -1  -1  -1     =      30   10  -10  -30
 0  0  0 10 10 10                                  0    0    0    0
 0  0  0 10 10 10
 0  0  0 10 10 10
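The vertical-edge example above can be reproduced with SciPy's correlate2d (cross-correlation, as deep learning libraries use). This is a sketch assuming SciPy is available.

import numpy as np
from scipy.signal import correlate2d

image = np.array([[0, 0, 0, 10, 10, 10]] * 6, dtype=float)   # dark left half, bright right half
vertical = np.array([[1, 0, -1],
                     [1, 0, -1],
                     [1, 0, -1]], dtype=float)

print(correlate2d(image, vertical, mode='valid'))
# Every row is [0, -30, -30, 0], matching the 4x4 output shown above.
# Swapping in the horizontal filter reproduces the second example the same way.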
Learning to detect edges
Rather than hand-designing a filter such as the vertical edge detector

1  0  -1
1  0  -1
1  0  -1

the nine numbers of the 3 x 3 kernel can be treated as parameters

w1  w2  w3
w4  w5  w6
w7  w8  w9

and learned from data by backpropagation, e.g. when convolving a 6 x 6 image such as

3  0  1  2  7  4
1  5  8  9  3  1
2  7  2  5  1  3
0  1  3  1  7  8
4  2  1  6  2  8
2  4  5  2  3  9
Motivation
Benefits of including convolutional layers in deep learning: sparse interactions, parameter sharing, and equivariant representations.
Sparse interaction
• Traditional neural network layers use matrix
multiplication by a matrix of parameters with a separate
parameter describing the interaction between each input
unit and each output unit.
• This means every output unit interacts with every input
unit.
• Convolutional networks, however, typically have sparse
interactions (also referred to as sparse connectivity or
sparse weights). This is accomplished by making the kernel
smaller than the input.
• For example, when processing an image, the input
image might have thousands or millions of pixels, but we
can detect small, meaningful features such as edges
with kernels that occupy only tens or hundreds of pixels.
Sparse interaction (traditional neural net)
Neurons of the input layer and hidden layer are "fully connected":
with m inputs and n outputs, the weight matrix has m * n entries,
so the complexity is O(m * n). (figure)
Sparse interaction (conv)
Making the kernel smaller than the input causes sparsity:
with a kernel of size k (k << m), the complexity is O(n * k),
and far fewer parameters need to be stored and computed. (figure)
Sparse interaction (comparison)
Output neurons influenced by a single input neuron: in a convolutional layer only k
outputs are affected by a given input, whereas in a fully connected layer every output is. (figure)
Sparse interaction (comparison)
Input neurons influencing a single output neuron: each output of a convolutional layer
depends on only k inputs (its receptive field), instead of on all m inputs. (figure)
Sparse interaction (indirectly fully influenced)
Stacking convolutional layers enlarges the receptive field: units in the Conv2 output are
indirectly connected to all or most of the input, even though each direct connection is sparse. (figure)
Why convolutions

Input image (6 x 6):        Filter (3 x 3):     Output (4 x 4):
10 10 10  0  0  0           1  0  -1            0  30  30  0
10 10 10  0  0  0           1  0  -1            0  30  30  0
10 10 10  0  0  0     *     1  0  -1     =      0  30  30  0
10 10 10  0  0  0                               0  30  30  0
10 10 10  0  0  0
10 10 10  0  0  0
Parameter Sharing
Parameter sharing refers to using the same parameter for
more than one function in a model.
In a traditional neural net, each element of the weight
matrix is used exactly once when computing the output of a
layer.
It is multiplied by one element of the input and then never
revisited.
A convolutional network is said to have tied weights, because the value of the
weight applied to one input is tied to the value of a weight
applied elsewhere.
The parameter sharing used by the convolution operation
means that rather than learning a separate set of
parameters for every location, we learn only one set.
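A Keras sketch contrasting the two regimes on the same input. The 28x28x1 input and the layer sizes are illustrative assumptions; the point is the parameter counts.

import tensorflow as tf

inp = tf.keras.Input(shape=(28, 28, 1))

dense = tf.keras.layers.Dense(32)(tf.keras.layers.Flatten()(inp))   # every input-output pair has its own weight
conv  = tf.keras.layers.Conv2D(32, (3, 3))(inp)                      # one 3x3 kernel per filter, shared across all locations

print(tf.keras.Model(inp, dense).count_params())  # 784*32 + 32 = 25120 parameters
print(tf.keras.Model(inp, conv).count_params())   # 3*3*1*32 + 32 = 320 parameters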
Equivariance to Translation
• To say a function is equivariant means that if the input
changes, the output changes in the same way. For example, if an object in the
input image is shifted, its representation in the feature map shifts by the same amount.
Padding: with padding p, convolving an n x n image with an f x f filter gives an
output of size $(n + 2p - f + 1) \times (n + 2p - f + 1)$. (Andrew Ng)
Valid and Same convolutions
"Valid": no padding, output size $(n - f + 1) \times (n - f + 1)$
"Same": pad so that the output size equals the input size, $p = \frac{f - 1}{2}$
(Andrew Ng)
Convolutional Neural Networks: Strided convolutions (deeplearning.ai)
Strided convolution
A 7 x 7 input convolved with a 3 x 3 filter using stride 2: the filter moves two
positions at a time, horizontally and vertically, producing a 3 x 3 output.
(figure: worked example, Andrew Ng)
Summary of convolutions
$n \times n$ image, $f \times f$ filter, padding $p$, stride $s$:
output size $\left\lfloor \frac{n + 2p - f}{s} + 1 \right\rfloor \times \left\lfloor \frac{n + 2p - f}{s} + 1 \right\rfloor$
(Andrew Ng)
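A small helper implementing this formula, cross-checked against a Keras Conv2D layer. The 7x7 input and stride 2 echo the strided-convolution example above; the single filter is arbitrary.

import tensorflow as tf

def conv_output_size(n, f, p=0, s=1):
    """Output width of an n x n input with an f x f filter, padding p, stride s."""
    return (n + 2 * p - f) // s + 1

print(conv_output_size(7, 3, p=0, s=2))   # 3

x = tf.zeros((1, 7, 7, 1))
y = tf.keras.layers.Conv2D(1, (3, 3), strides=2, padding='valid')(x)
print(y.shape)                            # (1, 3, 3, 1)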
Convolutional Neural Networks: Convolutions over volumes (deeplearning.ai)
Convolutions on RGB images
A 6 x 6 x 3 RGB image convolved with a 3 x 3 x 3 filter produces a 4 x 4 output:
the filter spans all three channels, so each position yields a single number. (Andrew Ng)
Multiple filters
Applying two different 3 x 3 x 3 filters to the same 6 x 6 x 3 image gives two 4 x 4
feature maps, which are stacked to form a 4 x 4 x 2 output volume. (Andrew Ng)
Convolutional Neural Networks: One layer of a convolutional network (deeplearning.ai)
Example of a layer
A 6 x 6 x 3 input is convolved with two 3 x 3 x 3 filters; adding a bias and applying a
nonlinearity to each resulting 4 x 4 map forms one layer of a convolutional network. (figure, Andrew Ng)
Number of parameters in one layer
If you have 10 filters that are 3 x 3 x 3
in one layer of a neural network, how
many parameters does that layer have?
Solution
3*3*3 + 1 = 28 parameters per filter (27 weights + 1 bias)
28 * 10 = 280 parameters
Find number of parameters
tf.keras.layers.Conv2D(16, (3,3), activation='relu', input_shape=(150, 150, 3))
Solution
3*3*3 + 1 = 28 parameters per filter
28 * 16 = 448 parameters
TensorFlow code with stride and padding
tf.keras.layers.Conv2D(16, (3,3), strides=(2, 2), padding='same', activation='relu')
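The parameter counts above can be checked with Keras. The 32x32 spatial input size for the 10-filter example is an arbitrary assumption; the count depends only on the filter shape and the number of input channels.

import tensorflow as tf

m16 = tf.keras.Sequential([
    tf.keras.layers.Conv2D(16, (3, 3), activation='relu', input_shape=(150, 150, 3))
])
print(m16.count_params())   # 3*3*3*16 + 16 = 448

m10 = tf.keras.Sequential([
    tf.keras.layers.Conv2D(10, (3, 3), input_shape=(32, 32, 3))
])
print(m10.count_params())   # 3*3*3*10 + 10 = 280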
Convolutional Neural Networks: A simple convolution network example (deeplearning.ai)
Example ConvNet (figure, Andrew Ng)
Types of layer in a
convolutional network:
- Convolution
- Pooling
- Fully connected
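A minimal Keras sketch using the three layer types listed above. The 28x28x1 input shape and the layer sizes are illustrative assumptions.

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(16, (3, 3), activation='relu', input_shape=(28, 28, 1)),  # convolution
    tf.keras.layers.MaxPooling2D((2, 2)),                                            # pooling
    tf.keras.layers.Conv2D(32, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation='softmax'),                                 # fully connected
])
model.summary()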
Translation Invariance
Convolutional Neural Networks: Pooling layers (deeplearning.ai)
Pooling
• Pooling helps to make the representation approximately
invariant to small translations of the input.
Example 4 x 4 input:
1 3 2 1
2 9 1 1
1 3 2 3
5 6 1 2
Source: Deep Learning | Pooling and Fully Connected layers (2020) (youtube.com); Andrew Ng
Pooling layer: Max pooling
Example 5 x 5 input:
1 3 2 1 3
2 9 1 1 5
1 3 2 3 2
8 3 5 1 0
5 6 1 2 9
(Andrew Ng)
Pooling layer: Average pooling
Example 4 x 4 input:
1 3 2 1
2 9 1 1
1 4 2 3
5 6 1 2
(Andrew Ng)
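A sketch applying 2x2 max and average pooling (stride 2) to the 4x4 example above with Keras.

import tensorflow as tf
import numpy as np

x = np.array([[1, 3, 2, 1],
              [2, 9, 1, 1],
              [1, 4, 2, 3],
              [5, 6, 1, 2]], dtype=np.float32).reshape(1, 4, 4, 1)

print(tf.keras.layers.MaxPooling2D((2, 2))(x).numpy()[0, :, :, 0])
# [[9. 2.]
#  [6. 3.]]
print(tf.keras.layers.AveragePooling2D((2, 2))(x).numpy()[0, :, :, 0])
# [[3.75 1.25]
#  [4.   2.  ]]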
Regularization
L1 Regularization
L2 Regularization
Dropout
L1 Regularization
tf.keras.layers.Conv2D(128, (3,3), activation='relu', use_bias=True,
    kernel_regularizer=tf.keras.regularizers.l1(0.01))
L2 Regularization
tf.keras.layers.Conv2D(128, (3,3), activation='relu', use_bias=True,
    kernel_regularizer=tf.keras.regularizers.l2(0.01))
Dropout
tf.keras.layers.Conv2D(128, (3,3), activation='relu', use_bias=True),
tf.keras.layers.Dropout(0.2)
The Dropout layer is a mask that nullifies the contribution of some neurons
towards the next layer and leaves all others unmodified. We can apply a
Dropout layer to the input vector, in which case it nullifies some of its features,
but we can also apply it to a hidden layer, in which case it nullifies some hidden
neurons.
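A sketch combining the regularizers above in one small model. The input shape and layer sizes are illustrative assumptions.

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(128, (3, 3), activation='relu',
                           kernel_regularizer=tf.keras.regularizers.l2(0.01),
                           input_shape=(64, 64, 3)),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Dropout(0.2),          # randomly zeroes 20% of activations during training
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation='softmax'),
])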
CNN Architectures
ResNet
AlexNet
AlexNet
AlexNet famously won the ImageNet LSVRC-2012
competition by a large margin (15.3% vs. 26.2% (second place)
top-5 error rates).
Major highlights
1. Used ReLU instead of tanh to add non-linearity.
2. Used dropout instead of other regularization techniques to deal with overfitting.
3. Overlapping pooling was used to reduce the size of the network.
$z^{[l+1]} = W^{[l+1]} a^{[l]} + b^{[l+1]}$, $a^{[l+1]} = g(z^{[l+1]})$, $z^{[l+2]} = W^{[l+2]} a^{[l+1]} + b^{[l+2]}$; with the skip connection, $a^{[l+2]} = g(z^{[l+2]} + a^{[l]})$
[He et al., 2015. Deep Residual Learning for Image Recognition] (Andrew Ng)
Skip Connection
• The advantage of adding this type of skip connection is
that if any layer hurts the performance of the architecture,
it can be skipped by regularization: its weights can shrink toward zero,
so the block falls back to the identity mapping.
• So, this results in training very deep neural networks
without the problems caused by vanishing/exploding
gradients.
• The authors of the paper experimented with networks of 100-1000
layers on the CIFAR-10 dataset.
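A minimal Keras functional-API sketch of a residual block with an identity skip connection. Real ResNet blocks also use batch normalization, omitted here for brevity; the filter count and input shape are illustrative assumptions.

import tensorflow as tf

def residual_block(x, filters=64):
    """Identity residual block: assumes the input already has `filters`
    channels so the skip connection can be added without reshaping."""
    shortcut = x
    y = tf.keras.layers.Conv2D(filters, (3, 3), padding='same')(x)
    y = tf.keras.layers.Activation('relu')(y)
    y = tf.keras.layers.Conv2D(filters, (3, 3), padding='same')(y)
    y = tf.keras.layers.Add()([y, shortcut])   # skip connection: a[l+2] = g(z[l+2] + a[l])
    return tf.keras.layers.Activation('relu')(y)

inputs = tf.keras.Input(shape=(32, 32, 64))
outputs = residual_block(inputs)
tf.keras.Model(inputs, outputs).summary()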
ResNet-34 Architecture
Inspired by VGG-19
https://youtu.be/o_3mboe1jYI?si=ceyId1ui7Lq0w1eF
ResNet Hand Calculation
Let's consider a simple feedforward neural network with
three layers: an input layer, a hidden layer, and an output
layer. For simplicity, we'll assume each layer has only one
neuron.
1. Input Layer: x1 = 3
2. Hidden Layer: h1 = f(w1 . x1 + b1)
3. Output Layer: y1 = g(w2 . h1 + b2)
Now, let's add a skip connection from the input layer to the
output layer. The output of the network becomes:
ResNet Hand Calculation
y1 = g(w2 . h1 + b2 + x1)
w1 = 0.5
w2 = 2
b1 = 1
b2 = 0
ResNet Hand Calculation
With these values, let's calculate the output y1 :
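A short worked calculation, assuming f and g are ReLU activations (the slides do not specify them; with these positive values the identity activation gives the same result):

h1 = f(w1 . x1 + b1) = f(0.5 * 3 + 1) = f(2.5) = 2.5
y1 = g(w2 . h1 + b2 + x1) = g(2 * 2.5 + 0 + 3) = g(8) = 8

Without the skip connection, the output would instead be g(2 * 2.5 + 0) = 5.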