Neural Networks: Supervised Learning Approach

Dr. Satyasai Jagannath Nanda
Dept. of Electronics & Communication Engineering
Malaviya National Institute of Technology Jaipur
Email: sjnanda.ece@mnit.ac.in

Contents
• Introduction and Inspiration
• Single-input neuron
• Multiple-input neuron
• A layer of neurons
• Multilayer perceptron
• ADALINE
• Backpropagation
• FLANN
• RBF

Biological Inspiration

Figure: Schematic drawing of biological neurons

• What is a Neural Network?
➢ A method of computing based on the interaction of multiple connected processing elements

• What can a Neural Net do?
➢ Compute a known function
➢ Approximate an unknown function
➢ Pattern Recognition
➢ Signal Processing

Single-Input Neuron

Resemblance between the biological neuron and the neuron model:
• Strength of synapse → Weight
• Cell body → Transfer function and summation
• Signal on axon → Output

f = transfer function (activation function)

✓ One can choose neurons with or without biases. The bias gives the network an extra variable, so you might expect networks with biases to be more powerful than those without.

✓ The bias b is like a weight, except that it has a constant input of 1.

✓ w and b are both adjustable parameters of the neuron.

✓ The transfer function may be linear or nonlinear.

✓ The transfer function is chosen by the designer; the parameters are then adjusted by some learning rule so that the neuron input/output relationship meets some specific goal. A minimal sketch of such a neuron is given below.
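A minimal sketch (not from the slides) of a single-input neuron with an adjustable weight w and bias b; the transfer function f is the designer's choice:

```python
import numpy as np

def single_input_neuron(p, w, b, f=np.tanh):
    """Single-input neuron: net input n = w*p + b, output a = f(n)."""
    n = w * p + b      # bias acts like a weight with a constant input of 1
    return f(n)        # transfer (activation) function

# Example with illustrative values: w = 3, b = -1.5, input p = 2
print(single_input_neuron(2.0, 3.0, -1.5))   # tanh(4.5) ~ 0.9998
```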

Nonlinear functions
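The slide's plot of transfer functions is not reproduced in this extraction; for reference, the standard choices (assumed standard definitions) can be written as:

```python
import numpy as np

# Standard transfer functions (assumed definitions, not taken from the slide figure)
hardlim = lambda n: np.where(n >= 0, 1.0, 0.0)   # hard limit: outputs 0 or 1
logsig  = lambda n: 1.0 / (1.0 + np.exp(-n))     # log-sigmoid: outputs in (0, 1)
tansig  = lambda n: np.tanh(n)                   # tan-sigmoid: outputs in (-1, 1)
purelin = lambda n: n                            # linear (for comparison)
```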

Multiple-Input Neuron

For each weight, the first index indicates the particular neuron destination for that weight; the second index indicates the source of the signal fed to the neuron.
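A minimal sketch of a multiple-input neuron, a = f(w·p + b), with illustrative values:

```python
import numpy as np

def multiple_input_neuron(p, w, b, f=np.tanh):
    """Multiple-input neuron: n = w . p + b, a = f(n)."""
    n = np.dot(w, p) + b   # w holds the weights w_{1,1}, ..., w_{1,R}
    return f(n)

p = np.array([1.0, -2.0, 0.5])   # R = 3 inputs
w = np.array([0.4, 0.1, -0.3])   # one weight per input (illustrative values)
print(multiple_input_neuron(p, w, b=0.2))
```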

A Layer of Neurons
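A sketch of a layer of S neurons acting on R inputs: the weight matrix W is S×R and every neuron applies the same transfer function (sizes and values below are illustrative):

```python
import numpy as np

def layer(p, W, b, f=np.tanh):
    """A layer of neurons: a = f(Wp + b), one output element per neuron."""
    return f(W @ p + b)

W = np.array([[0.2, -0.4, 0.1],   # S = 2 neurons, R = 3 inputs (illustrative)
              [0.5,  0.3, -0.2]])
b = np.array([0.1, -0.1])
print(layer(np.array([1.0, 0.0, -1.0]), W, b))
```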

Multiple Layers of Neurons

Perceptron Learning Rule

A learning rule is a procedure for modifying the weights and biases of a network.

Decision Boundary

The decision boundary is determined by the input vectors for which the net input n is zero:
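In the usual notation the boundary is the set of inputs satisfying $\mathbf{w}^T\mathbf{p} + b = 0$. As a small worked illustration with assumed values: for a two-input perceptron with $\mathbf{w} = [1 \;\; 2]^T$ and $b = -2$,

$$n = p_1 + 2p_2 - 2 = 0$$

is the boundary line; inputs with $n > 0$ are assigned to one class (output 1) and inputs with $n < 0$ to the other (output 0).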

Example: AND Gate
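A sketch of a single-neuron perceptron realising the AND gate; the weight and bias values below are one valid choice (illustrative, not necessarily those on the slide):

```python
import numpy as np

hardlim = lambda n: 1 if n >= 0 else 0   # hard-limit transfer function

w, b = np.array([1.0, 1.0]), -1.5        # one possible AND solution
for p in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    n = w @ np.array(p) + b              # net input
    print(p, "->", hardlim(n))           # only (1, 1) gives output 1
```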

Testing

All other points can be tested in this way.

A single-neuron perceptron can classify input vectors into two categories, since its output can be either 0 or 1.

If the number of neurons is S, then there will be 2^S possible categories.

Perceptron Learning Rule

• Supervised learning
• Reinforcement learning
• Unsupervised learning

The learning rule is an example of supervised training, in which the learning rule is provided with a set of examples of proper network behavior:

Constructing Learning Rules

Let

Unified Learning Rule
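The slide's equations are images; as a reference, the standard unified perceptron rule updates the weights and bias with the same error term e = t − a (bias treated as a weight with constant input 1). A minimal sketch:

```python
import numpy as np

hardlim = lambda n: np.where(n >= 0, 1.0, 0.0)

def perceptron_step(W, b, p, t):
    """One unified perceptron update: e = t - a, W <- W + e p^T, b <- b + e."""
    a = hardlim(W @ p + b)        # network output
    e = t - a                     # error
    return W + np.outer(e, p), b + e

# Illustrative use on a single training pair (p, t)
W, b = np.zeros((1, 2)), np.zeros(1)
W, b = perceptron_step(W, b, p=np.array([1.0, 1.0]), t=np.array([0.0]))
```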

Neural Network

A neural network is a massively parallel distributed processor made up of simple processing units, which has a natural propensity for storing experiential knowledge and making it available for use. It resembles the brain in two respects:
(1) Knowledge is acquired by the network from its environment through a learning process.
(2) Interconnection strengths, known as synaptic weights, are used to store the acquired knowledge.

ADALINE

Wiener-Hopf Equation
• Considering the weight and bias in one vector:

• Similarly, the bias input "1" is included as a component of the input vector:

Output of the network:

Network mean square error:

The expression can be written in the following convenient form:

where

Comparing with the general quadratic form:

where

Taking the gradient:
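Since the slide's equations are images, the standard development (in the usual LMS/ADALINE notation, given here as an assumed reconstruction) is:

$$F(\mathbf{x}) = E[e^2] = E[(t - \mathbf{x}^T\mathbf{z})^2] = c - 2\,\mathbf{x}^T\mathbf{h} + \mathbf{x}^T\mathbf{R}\,\mathbf{x},$$

where $c = E[t^2]$, $\mathbf{h} = E[t\,\mathbf{z}]$ is the cross-correlation vector and $\mathbf{R} = E[\mathbf{z}\mathbf{z}^T]$ is the input correlation matrix. Taking the gradient and setting it to zero,

$$\nabla F(\mathbf{x}) = -2\mathbf{h} + 2\mathbf{R}\mathbf{x} = \mathbf{0} \;\Rightarrow\; \mathbf{x}^{*} = \mathbf{R}^{-1}\mathbf{h},$$

which is the Wiener-Hopf solution for the optimal weight vector.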

Least Mean Square (LMS) Algorithm

Mean square error:

Then, at each iteration we have a gradient estimate of the form:

Derivative with respect to the weight:

Derivative with respect to the bias:

LMS (contd.)
• After taking the derivatives with respect to the weight and bias, we obtain:

Note that the inputs and the constant 1 are the elements of the input vector z, so the gradient of the squared error at the current iteration can be written:

Using steepest descent:

Splitting x into the weight and bias:
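A minimal sketch of the resulting LMS update in the standard Widrow-Hoff form (variable names and learning rate are illustrative):

```python
import numpy as np

def lms_step(w, b, p, t, alpha=0.01):
    """One LMS iteration on a single sample (p, t) for a linear (ADALINE) neuron."""
    a = w @ p + b                  # network output
    e = t - a                      # error for this sample
    w = w + 2 * alpha * e * p      # W(k+1) = W(k) + 2*alpha*e(k)*p(k)^T
    b = b + 2 * alpha * e          # b(k+1) = b(k) + 2*alpha*e(k)
    return w, b, e
```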

Backpropagation
• The perceptron learning rule of Frank Rosenblatt and the LMS algorithm of
Bernard Widrow and Marcian Hoff were designed to train single-layer
perceptron-like networks.
• Single-layer networks suffer from the disadvantage that they are only able to
solve linearly separable classification problems.
• The difference between the LMS algorithm and backpropagation is only in the
way in which the derivatives are calculated.
• For a single-layer linear network the error is an explicit linear function of the
network weights, and its derivatives with respect to the weights can be easily
computed
• In multilayer networks with nonlinear transfer functions, the relationship
between the network weights and the error is more complex.
• In order to calculate the derivatives, we need to use the chain rule of calculus.
In fact, this chapter is in large part a demonstration of how to use the chain
rule.

Three-Layer Network
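A sketch of the forward pass through a three-layer network, where each layer's output feeds the next; the layer sizes, parameters, and transfer functions below are illustrative:

```python
import numpy as np

logsig  = lambda n: 1.0 / (1.0 + np.exp(-n))
purelin = lambda n: n

def forward(p, layers):
    """layers: list of (W, b, f); computes a^{m+1} = f^{m+1}(W^{m+1} a^m + b^{m+1})."""
    a = p
    for W, b, f in layers:
        a = f(W @ a + b)
    return a

# A 2-3-3-1 network with assumed random parameters
rng = np.random.default_rng(1)
sizes = [(3, 2), (3, 3), (1, 3)]
layers = [(rng.standard_normal(s), rng.standard_normal(s[0]), f)
          for s, f in zip(sizes, [logsig, logsig, purelin])]
print(forward(np.array([0.5, -1.0]), layers))
```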

Back-Propagation Training

Figure: Neural network trained using the BP algorithm: the network output y_l(n) is compared with the desired response d(n) to form the error e_l(n).

The error signal is given by e_l(n) = d(n) − y_l(n)

M is the number of layers in the network.

Mean square error:

If the network has multiple outputs, this generalizes to:

where the expectation of the squared error has been replaced by the squared error at the current iteration.

• The steepest descent algorithm for the approximate mean square error (stochastic gradient descent) is

where α is the learning rate.

• So far, this development is identical to that for the LMS algorithm. Now we come to the difficult part: the computation of the partial derivatives.

Chain Rule
To review the chain rule, suppose that we have a function f that is an explicit function only of the variable n. We want to take the derivative of f with respect to a third variable w. The chain rule is then:

For example, if
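One standard illustration (an assumed example, since the slide's own equation is an image): the chain rule states

$$\frac{d f(n(w))}{d w} = \frac{d f(n)}{d n} \times \frac{d n(w)}{d w}.$$

If, say, $f(n) = \cos(n)$ and $n(w) = e^{2w}$, then

$$\frac{d f(n(w))}{d w} = \left(-\sin(n)\right)\left(2 e^{2w}\right) = -2 e^{2w} \sin\!\left(e^{2w}\right).$$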

• The second term in each of these equations can be easily computed, since the net input to a layer is an explicit function of the weights and bias in that layer:

Therefore

If we now define

• We can now express the approximate steepest descent algorithm as

In matrix form this becomes:

where

To compute the sensitivities s^m

Let us form the Jacobian matrix:

Next we want to find an expression for this matrix. Consider the (i, j) element of the matrix:

where

• Therefore the Jacobian matrix can be written

• Now we can see where the backpropagation algorithm derives its name. The sensitivities are propagated backward through the network, from the last layer to the first layer:

To compute s^m

Now, since

we can write
This can be expressed in matrix form as
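Since the slide's equations are images, the standard result is given here as a reference reconstruction in the usual notation, with $\dot{\mathbf{F}}^{m}(\mathbf{n}^{m})$ the diagonal matrix of transfer-function derivatives:

$$\mathbf{s}^{M} = -2\,\dot{\mathbf{F}}^{M}(\mathbf{n}^{M})\,(\mathbf{t} - \mathbf{a}), \qquad \mathbf{s}^{m} = \dot{\mathbf{F}}^{m}(\mathbf{n}^{m})\,(\mathbf{W}^{m+1})^{T}\,\mathbf{s}^{m+1}, \quad m = M-1, \dots, 1,$$

followed by the steepest-descent updates

$$\mathbf{W}^{m}(k+1) = \mathbf{W}^{m}(k) - \alpha\,\mathbf{s}^{m}\,(\mathbf{a}^{m-1})^{T}, \qquad \mathbf{b}^{m}(k+1) = \mathbf{b}^{m}(k) - \alpha\,\mathbf{s}^{m}.$$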

Example on Backpropagation

We want to approximate the function

Before we begin the backpropagation algorithm, we need to choose some initial values for the network weights and biases. Generally these are chosen to be small random values.

The response of the network for these initial values is illustrated in the figure.

• Next, we need to select a training set.
• In this case, we will sample the function at 21 points in the range [-2, 2], at equally spaced intervals of 0.2. The training points are indicated by the circles in the previous figure.
• For our initial input we will choose p = 1, which is the 16th training point.
• The output of the first layer is then

• The second layer output is

The error would then be

The next stage of the algorithm is to backpropagate the sensitivities. Before we begin the backpropagation, recall that we will need the derivatives of the transfer functions.

We can now perform the backpropagation. The starting point is found at the second
layer,

The first layer sensitivity is then computed by backpropagating the sensitivity from
the second layer,

• The final stage of the algorithm is to update the weights. For simplicity, we will use a learning rate α = 0.1.

This completes the first iteration of the backpropagation algorithm. We next proceed to randomly choose another input from the training set and perform another iteration of the algorithm. We continue to iterate until the difference between the network response and the target function reaches some acceptable level.
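For concreteness, here is a sketch of one such iteration for a 1-2-1 network (log-sigmoid hidden layer, linear output). The target function, initial parameters, and layer sizes are assumptions for illustration, not values taken from the slides:

```python
import numpy as np

logsig = lambda n: 1.0 / (1.0 + np.exp(-n))

def g(p):                               # assumed target function (hypothetical)
    return 1.0 + np.sin(np.pi * p / 4.0)

rng = np.random.default_rng(0)          # small random initial weights and biases
W1, b1 = rng.uniform(-0.5, 0.5, (2, 1)), rng.uniform(-0.5, 0.5, (2, 1))
W2, b2 = rng.uniform(-0.5, 0.5, (1, 2)), rng.uniform(-0.5, 0.5, (1, 1))

p, alpha = np.array([[1.0]]), 0.1       # training input p = 1, learning rate 0.1

# forward pass
a1 = logsig(W1 @ p + b1)                # first-layer output
a2 = W2 @ a1 + b2                       # second-layer (linear) output
e  = g(p) - a2                          # error t - a

# backpropagate the sensitivities
s2 = -2 * e                                    # last layer: derivative of purelin is 1
s1 = np.diagflat((1 - a1) * a1) @ W2.T @ s2    # logsig derivative: (1 - a)*a

# update weights and biases (steepest descent)
W2, b2 = W2 - alpha * s2 @ a1.T, b2 - alpha * s2
W1, b1 = W1 - alpha * s1 @ p.T,  b1 - alpha * s1
```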
FLANN
• The MLP has high computational complexity because it has many layers.
• The FLANN is basically a single-layer structure in which nonlinearity is introduced by enhancing the input pattern with a nonlinear functional expansion.

Trigonometric FLANN
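A sketch of the trigonometric functional expansion typically used in FLANN: each input x is expanded with sine and cosine terms, and a single layer of adaptive weights acts on the expanded pattern. The expansion order and output nonlinearity here are assumptions:

```python
import numpy as np

def trig_expand(x, order=2):
    """Expand a scalar input to [x, sin(pi x), cos(pi x), sin(2 pi x), cos(2 pi x), ...]."""
    feats = [x]
    for k in range(1, order + 1):
        feats += [np.sin(k * np.pi * x), np.cos(k * np.pi * x)]
    return np.array(feats)

# Single-layer FLANN output on the expanded pattern; the weights would be
# trained with a rule such as LMS.
phi = trig_expand(0.3)
w, b = np.zeros_like(phi), 0.0          # illustrative (untrained) parameters
y = np.tanh(w @ phi + b)
```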

Chebyshev FLANN
The higher-order Chebyshev polynomials for −1 < x < 1 may be generated using the recursive formula:

The first few Chebyshev polynomials are:
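The slide's recursion and polynomial list are equation images; the standard Chebyshev recursion and a sketch of the corresponding functional expansion are:

```python
import numpy as np

# Standard Chebyshev recursion: T_{n+1}(x) = 2 x T_n(x) - T_{n-1}(x),
# with T_0(x) = 1 and T_1(x) = x (so T_2 = 2x^2 - 1, T_3 = 4x^3 - 3x, ...).
def chebyshev_expand(x, order=4):
    T = [np.ones_like(x), np.asarray(x, dtype=float)]
    for _ in range(2, order + 1):
        T.append(2 * x * T[-1] - T[-2])
    return np.stack(T[:order + 1])      # [T_0(x), ..., T_order(x)]

print(chebyshev_expand(0.5, order=3))   # [1.0, 0.5, -0.5, -1.0]
```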

Legendre-FLANN

The higher-order Legendre polynomials can be derived from the following recursion formula:
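For reference (the slide's own formula is an equation image), the standard Legendre recursion is

$$(n+1)\,P_{n+1}(x) = (2n+1)\,x\,P_n(x) - n\,P_{n-1}(x), \qquad P_0(x) = 1, \; P_1(x) = x,$$

which gives, for example, $P_2(x) = \tfrac{1}{2}(3x^2 - 1)$ and $P_3(x) = \tfrac{1}{2}(5x^3 - 3x)$. As in the Chebyshev case, the expanded pattern $[P_0(x), P_1(x), \dots]$ feeds a single layer of adaptive weights.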

