Soft Computing

Unit 1

Dr. Ishan Bhardwaj


REC Bijnor
PERCEPTRON
• In a single-layer network, a set of inputs is mapped directly to an
output using a generalized variation of a linear function. This
simple instantiation of a neural network is also referred to as the
perceptron.
• In other words, the simplest neural network is the perceptron, which
contains a single input layer and one output node.

Basic architecture of the perceptron
Computation at Output Node of Perceptron

The sign function maps a real value to either +1 or -1, which is appropriate for
binary classification.
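
The equation from this slide is not reproduced in the extract; as a minimal sketch in Python (illustrative names, not necessarily the slide's own notation), the output node computes a weighted sum of the inputs and passes it through the sign function:

# Minimal perceptron forward pass (illustrative sketch).
def perceptron_output(x, w, b):
    # Weighted sum of inputs plus bias, then sign -> +1 or -1.
    s = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if s >= 0 else -1

# Example: two inputs with arbitrary weights.
print(perceptron_output([1.0, -2.0], [0.5, 0.25], b=0.1))  # 0.5 - 0.5 + 0.1 = 0.1 -> +1
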
Single layer perceptron
• The decision boundary is a straight line
• This means it is typically useful only for linearly separable binary classification
Example : Boolean OR function
Example : Boolean AND function
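
One concrete (and non-unique) choice of weights for these two examples, reusing the perceptron_output sketch above, is shown below; any weights whose decision line separates the two classes would work equally well:

# Boolean OR: output +1 whenever at least one input is 1.
w_or, b_or = [1.0, 1.0], -0.5
# Boolean AND: output +1 only when both inputs are 1.
w_and, b_and = [1.0, 1.0], -1.5

for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(x, perceptron_output(x, w_or, b_or), perceptron_output(x, w_and, b_and))
# OR:  -1, +1, +1, +1    AND: -1, -1, -1, +1
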
Multilayer perceptron
• To represent non-linear functions we need a multilayer perceptron
(multilayer neural network).
• To represent such non-linear functions, the multilayer network must
use non-linear activation functions such as sigmoid, tanh, etc.
• Any Boolean function can be represented by a two-layer neural network.
• Similarly, any bounded continuous function can be approximated with
arbitrarily small error by a network with one hidden layer.
• Similarly, any function can be approximated to arbitrary accuracy by a
network with two hidden layers.
Example : Boolean XOR function
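
XOR is not linearly separable, so no single perceptron can represent it. A sketch of one hand-wired two-layer solution (illustrative weights, using a 0/1 step activation for readability) is:

def step(s):
    # Threshold activation returning 0 or 1.
    return 1 if s >= 0 else 0

def xor_net(x1, x2):
    h1 = step(x1 + x2 - 0.5)    # hidden unit 1 computes OR(x1, x2)
    h2 = step(-x1 - x2 + 1.5)   # hidden unit 2 computes NAND(x1, x2)
    return step(h1 + h2 - 1.5)  # output computes AND(h1, h2) = XOR(x1, x2)

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, xor_net(a, b))  # prints 0, 1, 1, 0
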
Why we need back-propagation training
How to adjust weights
Gradient descent
• Error is a function of the weights

• The aim is to reduce the error

• Principle: move towards the error minimum

• Compute the gradient → this gives the direction towards the error minimum

• Adjust the weights in the direction of lower error
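
As a minimal sketch (a toy one-weight error function and illustrative names), each weight is repeatedly moved a small step against its gradient:

# One-dimensional gradient descent on a toy error E(w) = (w - 3)**2.
def grad_E(w):
    return 2 * (w - 3)          # dE/dw

w, lr = 0.0, 0.1                # initial weight and learning rate
for _ in range(50):
    w = w - lr * grad_E(w)      # move against the gradient
print(w)                        # close to 3.0, where the error is minimal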


How to adjust weights contd…
Back-propagation

• first adjust last set of weights

• propagate error back to each previous layer

• adjust their weights


What is Gradient Descent
• What does the name suggest?
• Gradient essentially means slope.
• Descent means to descend, or go down.
• Therefore, gradient descent has something to do with descending down a
slope.
• With gradient descent, we descend down the slope of gradients to
find the lowest point, which is where the error is smallest.
• Gradient descent is an optimization method that helps us find the
exact combination of weights for a network that will minimize the
output error.
Gradient descent
• By descending down the
gradients we are actively
trying to minimize the cost
function and arrive at the
global minimum.
• Our steps are determined by
the steepness of the slope
(the gradient itself) and the
learning rate.
Gradient descent contd…
• Please note, the gradients are
the derivatives of the error with
respect to the individual
weights, and the goal of the
network is to minimize this error.
Methods of Gradient Descent

• Batch Gradient Descent

• Stochastic Gradient Descent (SGD)

• Mini-Batch Gradient Descent
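
The three variants differ only in how much of the training data is used to compute the gradient for each weight update. A minimal sketch (toy data and hypothetical names, not a specific library API):

import random

# Toy data: learn y = 2*x with a single weight w and squared error.
data = [(x, 2 * x) for x in range(1, 9)]

def grad(w, batch):
    # Derivative of the mean squared error over the given batch.
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)

def train(batch_size, steps=200, lr=0.01):
    w = 0.0
    for _ in range(steps):
        batch = random.sample(data, batch_size)
        w -= lr * grad(w, batch)
    return w

print(train(len(data)))  # batch gradient descent: whole data set per update
print(train(1))          # stochastic gradient descent: one example per update
print(train(4))          # mini-batch gradient descent: a small subset per update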


Derivative of Sigmoid
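
The derivation from this slide is not reproduced in the extract; the standard result, which the update rules below rely on, is

$$\sigma(x) = \frac{1}{1 + e^{-x}}, \qquad \sigma'(x) = \sigma(x)\bigl(1 - \sigma(x)\bigr)$$

so for a node output y = σ(s), the derivative with respect to s is simply y(1 − y).
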
Final Layer Update : One Output Node
Step 1
Step 2
Step 3
Now Representing all the previous steps

Error derivative of sigmoid: y'
For Multiple Output Nodes
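
With several output nodes the same rule is applied per output k, again assuming sigmoid outputs and squared error:

$$\delta_k = (t_k - y_k)\,y_k\,(1 - y_k), \qquad \Delta w_{jk} = \eta\,\delta_k\,h_j$$
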
Hidden Layer Update?
• In a hidden layer, we do not have a target output value
• But we can compute how much each node contributed to the
downstream error
• Definition of the error term of each node (a standard form is given below)
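
The slide's definition is not shown in this extract; the standard error term for a hidden node j with sigmoid activation h_j sums the downstream error terms δ_k weighted by the outgoing connections w_{jk}:

$$\delta_j = h_j\,(1 - h_j)\sum_k w_{jk}\,\delta_k, \qquad \Delta u_{ij} = \eta\,\delta_j\,x_i$$

where x_i is the i-th input to the hidden layer and u_{ij} the corresponding input-to-hidden weight (again, symbol names are illustrative).
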
Back to Our Example
Challenges with Gradient Descent

• Bad Initialisation

• Slow Learning Rate

• Local Minima
Tips for ANN
• The error can be observed at the output layer.
• Hence we first calculate the error at the output nodes and then
backpropagate it to update the weights, with the aim of reducing
the overall error.
• We use gradient descent for this.
• Backpropagation does not guarantee that we reach the global
minimum.
• One trick is to use momentum to avoid getting trapped in local
minima, i.e. we retain part of the direction of the weight changes
from previous iterations, as sketched below.
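
A minimal sketch of momentum (one common formulation, with illustrative names, not necessarily the exact rule from the original slides):

# Gradient descent with momentum on a toy error E(w) = (w - 3)**2.
def grad_E(w):
    return 2 * (w - 3)

w, velocity = 0.0, 0.0
lr, momentum = 0.1, 0.9
for _ in range(200):
    velocity = momentum * velocity - lr * grad_E(w)  # keep part of the previous direction
    w = w + velocity
print(w)  # approximately 3.0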
