
19EEE362: Deep Learning For Visual Computing

Dr. T. Ananthan

Department of EEE,
School of Engineering,
AMRITA Vishwa Vidyapeetham,
Coimbatore.

April 22, 2023

Deep Feedforward Networks

Deep feedforward networks, often called feedforward networks or multilayer perceptrons (MLPs), are the typical example of deep learning models.
Feedforward neural networks are called networks because they are typically represented by composing together many different functions.
For example, we might have three functions 𝑓⁽¹⁾, 𝑓⁽²⁾, and 𝑓⁽³⁾ connected in a chain, to form 𝑓(𝑥) = 𝑓⁽³⁾(𝑓⁽²⁾(𝑓⁽¹⁾(𝑥))).
In this case, 𝑓⁽¹⁾ is called the first layer of the network, 𝑓⁽²⁾ is called the second layer, and so on.
The overall length of the chain gives the depth of the model. It is from this terminology that the name deep learning arises. The final layer of a feedforward network is called the output layer.
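As a minimal NumPy sketch (not from the slides; the layer sizes, weights, and tanh nonlinearity are arbitrary illustrative choices), the chain 𝑓(𝑥) = 𝑓⁽³⁾(𝑓⁽²⁾(𝑓⁽¹⁾(𝑥))) can be written as three nested layer functions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary layer sizes chosen for illustration: 4 -> 5 -> 3 -> 1
W1, b1 = rng.standard_normal((5, 4)), np.zeros(5)
W2, b2 = rng.standard_normal((3, 5)), np.zeros(3)
W3, b3 = rng.standard_normal((1, 3)), np.zeros(1)

def f1(x):
    # first layer: affine transform followed by a nonlinearity
    return np.tanh(W1 @ x + b1)

def f2(h):
    # second layer
    return np.tanh(W2 @ h + b2)

def f3(h):
    # output layer (left linear here)
    return W3 @ h + b3

x = rng.standard_normal(4)
y = f3(f2(f1(x)))   # f(x) = f3(f2(f1(x))), a chain of depth 3
print(y)
```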

Deep Learning

Activation Functions

Threshold
Sigmoid
Tanh
ReLU (Rectified Linear Unit)
Leaky ReLU
Softmax

Threshold Activation Function

𝜑(𝑣) = 1, if 𝑣 ≥ 0
𝜑(𝑣) = 0, if 𝑣 < 0

1 It assumes only the values 0 or 1
2 It is not differentiable
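A one-line NumPy sketch of this threshold (step) function, for illustration:

```python
import numpy as np

def threshold(v):
    # 1 where v >= 0, 0 where v < 0
    return np.where(v >= 0, 1.0, 0.0)

print(threshold(np.array([-2.0, 0.0, 3.5])))  # [0. 1. 1.]
```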

Sigmoid Activation Function

𝜑(𝑣) = 1 / (1 + exp(−𝑎𝑣))

Sigmoid Activation Function.....

1 It is a nonlinear function
2 It assumes a continuous range of values from 0 to 1
3 It is differentiable
4 At the tails of the curve, the gradient values become almost zero during training, and hence the most important aspect of learning in a neural network (gradient-based weight updates) is inhibited
5 It is undesirable to have values squashed near the tails where the gradient is zero
6 Its output is not zero centered
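A small NumPy sketch of the sigmoid and its derivative (the slope parameter 𝑎 from the formula above is kept, defaulting to 1):

```python
import numpy as np

def sigmoid(v, a=1.0):
    # phi(v) = 1 / (1 + exp(-a*v)); output lies in (0, 1)
    return 1.0 / (1.0 + np.exp(-a * v))

def sigmoid_derivative(v, a=1.0):
    # derivative a * phi(v) * (1 - phi(v)); nearly zero at the tails
    s = sigmoid(v, a)
    return a * s * (1.0 - s)

print(sigmoid(np.array([-5.0, 0.0, 5.0])))            # near 0, 0.5, near 1
print(sigmoid_derivative(np.array([-5.0, 0.0, 5.0]))) # tiny at the tails, 0.25 at v = 0
```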

Tanh Activation Function

𝜑(𝑣) = tanh(𝑣)

This is similar to the sigmoid function except that:

It assumes a continuous range of values from −1 to +1
It is zero centered
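For illustration, NumPy provides tanh directly:

```python
import numpy as np

def tanh_activation(v):
    # phi(v) = tanh(v); output lies in (-1, 1) and is zero centered
    return np.tanh(v)

print(tanh_activation(np.array([-3.0, 0.0, 3.0])))  # approximately [-0.995, 0., 0.995]
```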

Deep Network

Deep networks suffer from the vanishing gradient problem.

Vanishing Gradient Problem

In the backpropagation algorithm, the weights are updated during the backward pass.
These weights are updated using the local gradient in each layer.
The local gradient of each layer is computed using the derivative of the sigmoid or tanh activation function.
During the backward pass, as the number of layers increases, the local gradients are multiplied together and the resulting gradient becomes very small or approaches zero.
Hence little or no training can take place in the earlier layers.
This happens because the output of the activation function lies in the range 0 to 1 or −1 to +1, so its derivative is small, and multiplying many small derivatives drives the gradient towards zero.
This is called the vanishing gradient problem.
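A rough numerical sketch (illustrative values only, not from the slides) of why stacking many sigmoid layers shrinks the gradient: the sigmoid derivative is at most 0.25, so the product of many such local gradients decays rapidly with depth. Weights are ignored here for simplicity.

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def sigmoid_grad(v):
    s = sigmoid(v)
    return s * (1.0 - s)  # maximum value 0.25, reached at v = 0

# Product of local gradients across an increasing number of layers,
# evaluated at pre-activations of 0 (the most favourable case).
for depth in (2, 5, 10, 20):
    product = np.prod([sigmoid_grad(0.0) for _ in range(depth)])
    print(depth, product)   # 0.0625, ~0.00098, ~9.5e-07, ~9.1e-13
```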

To overcome this problem in deep networks, the ReLU (Rectified Linear Unit) activation can be used.

ReLU Activation Function
Rectified Linear Unit

𝜑(𝑣) = 𝑣, if 𝑣 ≥ 0
𝜑(𝑣) = 0, if 𝑣 < 0

It is nonlinear.
It is continuous, and differentiable everywhere except at 𝑣 = 0.
It assumes a continuous range of values from 0 to ∞.
However, the problem is that its output (and hence its gradient) is zero for negative values of 𝑣.
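A NumPy sketch of ReLU, for illustration:

```python
import numpy as np

def relu(v):
    # phi(v) = v for v >= 0, and 0 for v < 0
    return np.maximum(0.0, v)

print(relu(np.array([-2.0, 0.0, 3.0])))  # [0. 0. 3.]
```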
Leaky ReLU Activation Function

𝜑(𝑣) = 𝑣, if 𝑣 ≥ 0
𝜑(𝑣) = 0.01𝑣, if 𝑣 < 0
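A NumPy sketch of Leaky ReLU using the 0.01 slope from the definition above (the slope is a hyperparameter; other small values are also used in practice):

```python
import numpy as np

def leaky_relu(v, slope=0.01):
    # phi(v) = v for v >= 0, slope*v for v < 0; the small slope keeps a
    # non-zero gradient for negative inputs
    return np.where(v >= 0, v, slope * v)

print(leaky_relu(np.array([-2.0, 0.0, 3.0])))  # [-0.02  0.    3.  ]
```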

Binary Classification Problem

Problem statement: Given the data, predict whether the flower is iris setosa or not.

Dataset

Neural Network Architecture
Binary classification
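A minimal NumPy sketch of one possible architecture for this binary task. The input size of 4 (sepal/petal length and width), the hidden layer of 5 ReLU units, and the single sigmoid output are assumptions for illustration; the network is untrained, so its output is not a meaningful prediction:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed sizes: 4 inputs -> 5 hidden units -> 1 sigmoid output
W1, b1 = rng.standard_normal((5, 4)) * 0.1, np.zeros(5)
W2, b2 = rng.standard_normal((1, 5)) * 0.1, np.zeros(1)

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def predict_setosa_probability(x):
    # forward pass: hidden layer with ReLU, output layer with sigmoid
    h = np.maximum(0.0, W1 @ x + b1)
    return sigmoid(W2 @ h + b2)

x = np.array([5.1, 3.5, 1.4, 0.2])   # one illustrative feature vector
print(predict_setosa_probability(x)) # value in (0, 1) read as "iris setosa or not"
```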

Multiple class Problem

Problem statement: Given the data, predict whether the flower is iris setosa, iris versicolor, or iris virginica.

Dataset

Neural Network Architecture
Multiple Classes
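A corresponding sketch for the three-class case, again with assumed sizes (4 inputs, 5 hidden units); the output layer now has 3 units, one per class, passed through softmax (defined on the following slides):

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed sizes: 4 inputs -> 5 hidden -> 3 outputs (setosa, versicolor, virginica)
W1, b1 = rng.standard_normal((5, 4)) * 0.1, np.zeros(5)
W2, b2 = rng.standard_normal((3, 5)) * 0.1, np.zeros(3)

def softmax(z):
    e = np.exp(z - z.max())          # subtract the max for numerical stability
    return e / e.sum()

def predict_class_probabilities(x):
    h = np.maximum(0.0, W1 @ x + b1) # hidden layer with ReLU
    return softmax(W2 @ h + b2)      # three probabilities that sum to 1

x = np.array([5.1, 3.5, 1.4, 0.2])
print(predict_class_probabilities(x))
```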

Target Vector for Multiple Classes

iris setosa = [1 0 0]
iris versicolor = [0 1 0]
iris virginica = [0 0 1]

In general, for 𝐾 classes,

𝐶₁ = [1 0 . . . 0]
𝐶₂ = [0 1 . . . 0]
⋮
𝐶_𝐾 = [0 0 . . . 1]
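A small sketch of building these one-hot target vectors; the helper name one_hot is an illustrative choice, not from the slides:

```python
import numpy as np

def one_hot(class_index, num_classes):
    # target vector with a 1 at the class position and 0 elsewhere
    t = np.zeros(num_classes)
    t[class_index] = 1.0
    return t

classes = ["iris setosa", "iris versicolor", "iris virginica"]
for i, name in enumerate(classes):
    print(name, one_hot(i, len(classes)))
# iris setosa [1. 0. 0.], iris versicolor [0. 1. 0.], iris virginica [0. 0. 1.]
```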

Softmax Activation Function

For multiple-class problems, the softmax activation function is used only in the output layer.
Softmax is a function that converts a vector of numbers into a vector of probabilities.

Softmax-Example

Softmax-Example.....

𝑓(𝑧ᵢ) = 𝑒^𝑧ᵢ / Σⱼ₌₁ᵏ 𝑒^𝑧ⱼ

𝑓(𝑧₁) = 𝑒^𝑧₁ / (𝑒^𝑧₁ + 𝑒^𝑧₂ + 𝑒^𝑧₃ + 𝑒^𝑧₄ + 𝑒^𝑧₅)
      = 𝑒^1.3 / (𝑒^1.3 + 𝑒^5.1 + 𝑒^2.2 + 𝑒^0.7 + 𝑒^1.1)
      = 3.66 / (3.66 + 164.02 + 9.02 + 2.01 + 3.00)
      = 3.66 / 181.71
      = 0.02

Softmax-Example.....

𝑓(𝑧₂) = 𝑒^5.1 / 181.71 = 0.90
𝑓(𝑧₃) = 𝑒^2.2 / 181.71 = 0.05
𝑓(𝑧₄) = 𝑒^0.7 / 181.71 = 0.01
𝑓(𝑧₅) = 𝑒^1.1 / 181.71 = 0.02

𝑓(z) = [0.02 0.90 0.05 0.01 0.02]

The outputs sum to 1: 0.02 + 0.90 + 0.05 + 0.01 + 0.02 = 1
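As a cross-check, a short NumPy sketch that reproduces this worked example; the input vector z = [1.3, 5.1, 2.2, 0.7, 1.1] is taken from the slides:

```python
import numpy as np

def softmax(z):
    # f(z_i) = exp(z_i) / sum_j exp(z_j)
    e = np.exp(z)
    return e / e.sum()

z = np.array([1.3, 5.1, 2.2, 0.7, 1.1])
p = softmax(z)
print(np.round(p, 2))   # [0.02 0.9  0.05 0.01 0.02]
print(p.sum())          # 1.0 (up to floating-point rounding)
```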
