Neural Networks: Supervised Learning Approach

Dr. Satyasai Jagannath Nanda
Dept. of Electronics & Communication Engineering
Malaviya National Institute of Technology Jaipur
Email: sjnanda.ece@mnit.ac.in

Contents
• Introduction and Inspiration
• Single-input neuron
• Multiple-input neuron
• A layer of neurons
• Multilayer perceptron
• ADALINE
• Backpropagation
• FLANN
• RBF

Biological Inspiration

Figure: Schematic drawing of biological neurons

• What is a Neural Network?
➢ A method of computing based on the interaction of multiple connected processing elements

• What can a Neural Net do?
➢ Compute a known function
➢ Approximate an unknown function
➢ Pattern Recognition
➢ Signal Processing

Single-Input Neuron

Resemblance between the biological neuron and the neuron model:
• Strength of synapse → Weight
• Cell body → Transfer function and summation
• Signal on axon → Output

f = transfer function (activation function)

✓ One can choose neurons with or without biases. The bias gives the network an extra variable, so you might expect networks with biases to be more powerful than those without.

✓ The bias b is like a weight, except that it has a constant input of 1.

✓ w and b are both adjustable parameters of the neuron.

✓ The transfer function may be linear or nonlinear.

✓ The transfer function is chosen by the designer; the parameters are then adjusted by some learning rule so that the neuron input/output relationship meets some specific goal. A minimal sketch of such a neuron is given below.
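A minimal sketch (not from the slides) of a single-input neuron with an adjustable weight w and bias b; the transfer function f is the designer's choice:

```python
import numpy as np

def single_input_neuron(p, w, b, f=np.tanh):
    """Single-input neuron: net input n = w*p + b, output a = f(n)."""
    n = w * p + b      # bias acts like a weight with a constant input of 1
    return f(n)        # transfer (activation) function

# Example with illustrative values: w = 3, b = -1.5, input p = 2
print(single_input_neuron(2.0, 3.0, -1.5))   # tanh(4.5) ~ 0.9998
```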

Nonlinear functions
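The slide's plot of transfer functions is not reproduced in this extraction; for reference, the standard choices (assumed standard definitions) can be written as:

```python
import numpy as np

# Standard transfer functions (assumed definitions, not taken from the slide figure)
hardlim = lambda n: np.where(n >= 0, 1.0, 0.0)   # hard limit: outputs 0 or 1
logsig  = lambda n: 1.0 / (1.0 + np.exp(-n))     # log-sigmoid: outputs in (0, 1)
tansig  = lambda n: np.tanh(n)                   # tan-sigmoid: outputs in (-1, 1)
purelin = lambda n: n                            # linear (for comparison)
```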

Multiple-Input Neuron

For each weight, the first index indicates the particular neuron destination for that weight; the second index indicates the source of the signal fed to the neuron.
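A minimal sketch of a multiple-input neuron, a = f(w·p + b), with illustrative values:

```python
import numpy as np

def multiple_input_neuron(p, w, b, f=np.tanh):
    """Multiple-input neuron: n = w . p + b, a = f(n)."""
    n = np.dot(w, p) + b   # w holds the weights w_{1,1}, ..., w_{1,R}
    return f(n)

p = np.array([1.0, -2.0, 0.5])   # R = 3 inputs
w = np.array([0.4, 0.1, -0.3])   # one weight per input (illustrative values)
print(multiple_input_neuron(p, w, b=0.2))
```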

A Layer of Neurons
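A sketch of a layer of S neurons acting on R inputs: the weight matrix W is S×R and every neuron applies the same transfer function (sizes and values below are illustrative):

```python
import numpy as np

def layer(p, W, b, f=np.tanh):
    """A layer of neurons: a = f(Wp + b), one output element per neuron."""
    return f(W @ p + b)

W = np.array([[0.2, -0.4, 0.1],   # S = 2 neurons, R = 3 inputs (illustrative)
              [0.5,  0.3, -0.2]])
b = np.array([0.1, -0.1])
print(layer(np.array([1.0, 0.0, -1.0]), W, b))
```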

Multiple Layers of Neurons

Perceptron Learning Rule

A learning rule is a procedure for modifying the weights and biases of a network.

Decision Boundary

The decision boundary is determined by the input vectors for which the net input n is zero:
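In the usual notation the boundary is the set of inputs satisfying $\mathbf{w}^T\mathbf{p} + b = 0$. As a small worked illustration with assumed values: for a two-input perceptron with $\mathbf{w} = [1 \;\; 2]^T$ and $b = -2$,

$$n = p_1 + 2p_2 - 2 = 0$$

is the boundary line; inputs with $n > 0$ are assigned to one class (output 1) and inputs with $n < 0$ to the other (output 0).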

Example: AND Gate
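A sketch of a single-neuron perceptron realising the AND gate; the weight and bias values below are one valid choice (illustrative, not necessarily those on the slide):

```python
import numpy as np

hardlim = lambda n: 1 if n >= 0 else 0   # hard-limit transfer function

w, b = np.array([1.0, 1.0]), -1.5        # one possible AND solution
for p in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    n = w @ np.array(p) + b              # net input
    print(p, "->", hardlim(n))           # only (1, 1) gives output 1
```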

Testing

All other points can be tested in this way.

A single-neuron perceptron can classify input vectors into two categories, since its output can be either 0 or 1.

If the number of neurons is S, then there will be 2^S possible categories.

Perceptron Learning Rule

• Supervised learning
• Reinforcement learning
• Unsupervised learning

The learning rule is an example of supervised training, in which the learning rule is provided with a set of examples of proper network behavior:

Constructing Learning Rules

Let

Unified Learning Rule
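The slide's equations are images; as a reference, the standard unified perceptron rule updates the weights and bias with the same error term e = t − a (bias treated as a weight with constant input 1). A minimal sketch:

```python
import numpy as np

hardlim = lambda n: np.where(n >= 0, 1.0, 0.0)

def perceptron_step(W, b, p, t):
    """One unified perceptron update: e = t - a, W <- W + e p^T, b <- b + e."""
    a = hardlim(W @ p + b)        # network output
    e = t - a                     # error
    return W + np.outer(e, p), b + e

# Illustrative use on a single training pair (p, t)
W, b = np.zeros((1, 2)), np.zeros(1)
W, b = perceptron_step(W, b, p=np.array([1.0, 1.0]), t=np.array([0.0]))
```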

Neural Network

A neural network is a massively parallel distributed processor made up of simple processing units, which has a natural propensity for storing experiential knowledge and making it available for use. It resembles the brain in two respects:
(1) Knowledge is acquired by the network from its environment through a learning process.
(2) Interconnection strengths, known as synaptic weights, are used to store the acquired knowledge.

ADALINE

Wiener-Hopf Equation
• Considering the weight and bias in one vector:

• Similarly, the bias input "1" is included as a component of the input vector:

Output of the network:

Network mean square error:

The expression can be written in the following convenient form:

where

Comparing with the general quadratic form:

where

Taking the gradient:
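Since the slide's equations are images, the standard development (in the usual LMS/ADALINE notation, given here as an assumed reconstruction) is:

$$F(\mathbf{x}) = E[e^2] = E[(t - \mathbf{x}^T\mathbf{z})^2] = c - 2\,\mathbf{x}^T\mathbf{h} + \mathbf{x}^T\mathbf{R}\,\mathbf{x},$$

where $c = E[t^2]$, $\mathbf{h} = E[t\,\mathbf{z}]$ is the cross-correlation vector and $\mathbf{R} = E[\mathbf{z}\mathbf{z}^T]$ is the input correlation matrix. Taking the gradient and setting it to zero,

$$\nabla F(\mathbf{x}) = -2\mathbf{h} + 2\mathbf{R}\mathbf{x} = \mathbf{0} \;\Rightarrow\; \mathbf{x}^{*} = \mathbf{R}^{-1}\mathbf{h},$$

which is the Wiener-Hopf solution for the optimal weight vector.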

Least Mean Square (LMS) Algorithm

Mean square error:

Then, at each iteration we have a gradient estimate of the form:

Derivative with respect to the weight:

Derivative with respect to the bias:

LMS (contd.)
• After taking the derivatives with respect to the weight and bias, we obtain:

Note that the inputs and the constant 1 are the elements of the input vector z, so the gradient of the squared error at the current iteration can be written:

Using steepest descent:

Splitting x into the weight and bias:
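A minimal sketch of the resulting LMS update in the standard Widrow-Hoff form (variable names and learning rate are illustrative):

```python
import numpy as np

def lms_step(w, b, p, t, alpha=0.01):
    """One LMS iteration on a single sample (p, t) for a linear (ADALINE) neuron."""
    a = w @ p + b                  # network output
    e = t - a                      # error for this sample
    w = w + 2 * alpha * e * p      # W(k+1) = W(k) + 2*alpha*e(k)*p(k)^T
    b = b + 2 * alpha * e          # b(k+1) = b(k) + 2*alpha*e(k)
    return w, b, e
```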

Backpropagation
• The perceptron learning rule of Frank Rosenblatt and the LMS algorithm of
Bernard Widrow and Marcian Hoff were designed to train single-layer
perceptron-like networks.
• Single-layer networks suffer from the disadvantage that they are only able to
solve linearly separable classification problems.
• The difference between the LMS algorithm and backpropagation is only in the
way in which the derivatives are calculated.
• For a single-layer linear network the error is an explicit linear function of the
network weights, and its derivatives with respect to the weights can be easily
computed
• In multilayer networks with nonlinear transfer functions, the relationship
between the network weights and the error is more complex.
• In order to calculate the derivatives, we need to use the chain rule of calculus.
In fact, this chapter is in large part a demonstration of how to use the chain
rule.

Three-Layer Network
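A sketch of the forward pass through a three-layer network, where each layer's output feeds the next; the layer sizes, parameters, and transfer functions below are illustrative:

```python
import numpy as np

logsig  = lambda n: 1.0 / (1.0 + np.exp(-n))
purelin = lambda n: n

def forward(p, layers):
    """layers: list of (W, b, f); computes a^{m+1} = f^{m+1}(W^{m+1} a^m + b^{m+1})."""
    a = p
    for W, b, f in layers:
        a = f(W @ a + b)
    return a

# A 2-3-3-1 network with assumed random parameters
rng = np.random.default_rng(1)
sizes = [(3, 2), (3, 3), (1, 3)]
layers = [(rng.standard_normal(s), rng.standard_normal(s[0]), f)
          for s, f in zip(sizes, [logsig, logsig, purelin])]
print(forward(np.array([0.5, -1.0]), layers))
```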

Back-Propagation Training

Figure: Neural network trained using the BP algorithm: the network output y_l(n) is compared with the desired response d(n) to form the error e_l(n).

The error signal is given by e_l(n) = d(n) − y_l(n)

M is the number of layers in the network.

Mean square error:

If the network has multiple outputs, this generalizes to:

where the expectation of the squared error has been replaced by the squared error at the current iteration.

• The steepest descent algorithm for the approximate mean square error (stochastic gradient descent) is

where α is the learning rate.

• So far, this development is identical to that for the LMS algorithm. Now we come to the difficult part: the computation of the partial derivatives.

Chain Rule
To review the chain rule, suppose that we have a function f that is an explicit function only of the variable n. We want to take the derivative of f with respect to a third variable w. The chain rule is then:

For example, if
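One standard illustration (an assumed example, since the slide's own equation is an image): the chain rule states

$$\frac{d f(n(w))}{d w} = \frac{d f(n)}{d n} \times \frac{d n(w)}{d w}.$$

If, say, $f(n) = \cos(n)$ and $n(w) = e^{2w}$, then

$$\frac{d f(n(w))}{d w} = \left(-\sin(n)\right)\left(2 e^{2w}\right) = -2 e^{2w} \sin\!\left(e^{2w}\right).$$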

• The second term in each of these equations can be easily computed, since the net input to a layer is an explicit function of the weights and bias in that layer:

Therefore

If we now define

• We can now express the approximate steepest descent algorithm as

In matrix form this becomes:

where

To compute the sensitivities s^m

Let us form the Jacobian matrix:

Next we want to find an expression for this matrix. Consider the (i, j) element of the matrix:

where

• Therefore the Jacobian matrix can be written

• Now we can see where the backpropagation algorithm derives its name. The sensitivities are propagated backward through the network, from the last layer to the first layer:

To compute s^m

Now, since

we can write
This can be expressed in matrix form as
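Since the slide's equations are images, the standard result is given here as a reference reconstruction in the usual notation, with $\dot{\mathbf{F}}^{m}(\mathbf{n}^{m})$ the diagonal matrix of transfer-function derivatives:

$$\mathbf{s}^{M} = -2\,\dot{\mathbf{F}}^{M}(\mathbf{n}^{M})\,(\mathbf{t} - \mathbf{a}), \qquad \mathbf{s}^{m} = \dot{\mathbf{F}}^{m}(\mathbf{n}^{m})\,(\mathbf{W}^{m+1})^{T}\,\mathbf{s}^{m+1}, \quad m = M-1, \dots, 1,$$

followed by the steepest-descent updates

$$\mathbf{W}^{m}(k+1) = \mathbf{W}^{m}(k) - \alpha\,\mathbf{s}^{m}\,(\mathbf{a}^{m-1})^{T}, \qquad \mathbf{b}^{m}(k+1) = \mathbf{b}^{m}(k) - \alpha\,\mathbf{s}^{m}.$$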

Example on Backpropagation

We want to approximate the function

Before we begin the backpropagation algorithm, we need to choose some initial values for the network weights and biases. Generally these are chosen to be small random values.

The response of the network for these initial values is illustrated in the figure.

• Next, we need to select a training set.
• In this case, we will sample the function at 21 points in the range [-2, 2], at equally spaced intervals of 0.2. The training points are indicated by the circles in the previous figure.
• For our initial input we will choose p = 1, which is the 16th training point.
• The output of the first layer is then

• The second layer output is

The error would then be

The next stage of the algorithm is to backpropagate the sensitivities. Before we begin the backpropagation, recall that we will need the derivatives of the transfer functions.

We can now perform the backpropagation. The starting point is found at the second
layer,

The first layer sensitivity is then computed by backpropagating the sensitivity from
the second layer,

• The final stage of the algorithm is to update the weights. For simplicity, we will use a learning rate α = 0.1.

This completes the first iteration of the backpropagation algorithm. We next proceed to randomly choose another input from the training set and perform another iteration of the algorithm. We continue to iterate until the difference between the network response and the target function reaches some acceptable level.
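For concreteness, here is a sketch of one such iteration for a 1-2-1 network (log-sigmoid hidden layer, linear output). The target function, initial parameters, and layer sizes are assumptions for illustration, not values taken from the slides:

```python
import numpy as np

logsig = lambda n: 1.0 / (1.0 + np.exp(-n))

def g(p):                               # assumed target function (hypothetical)
    return 1.0 + np.sin(np.pi * p / 4.0)

rng = np.random.default_rng(0)          # small random initial weights and biases
W1, b1 = rng.uniform(-0.5, 0.5, (2, 1)), rng.uniform(-0.5, 0.5, (2, 1))
W2, b2 = rng.uniform(-0.5, 0.5, (1, 2)), rng.uniform(-0.5, 0.5, (1, 1))

p, alpha = np.array([[1.0]]), 0.1       # training input p = 1, learning rate 0.1

# forward pass
a1 = logsig(W1 @ p + b1)                # first-layer output
a2 = W2 @ a1 + b2                       # second-layer (linear) output
e  = g(p) - a2                          # error t - a

# backpropagate the sensitivities
s2 = -2 * e                                    # last layer: derivative of purelin is 1
s1 = np.diagflat((1 - a1) * a1) @ W2.T @ s2    # logsig derivative: (1 - a)*a

# update weights and biases (steepest descent)
W2, b2 = W2 - alpha * s2 @ a1.T, b2 - alpha * s2
W1, b1 = W1 - alpha * s1 @ p.T,  b1 - alpha * s1
```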
FLANN
• The MLP has high computational complexity because it has many layers.
• The FLANN is basically a single-layer structure in which nonlinearity is introduced by enhancing the input pattern with a nonlinear functional expansion.

Trigonometric FLANN
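A sketch of the trigonometric functional expansion typically used in FLANN: each input x is expanded with sine and cosine terms, and a single layer of adaptive weights acts on the expanded pattern. The expansion order and output nonlinearity here are assumptions:

```python
import numpy as np

def trig_expand(x, order=2):
    """Expand a scalar input to [x, sin(pi x), cos(pi x), sin(2 pi x), cos(2 pi x), ...]."""
    feats = [x]
    for k in range(1, order + 1):
        feats += [np.sin(k * np.pi * x), np.cos(k * np.pi * x)]
    return np.array(feats)

# Single-layer FLANN output on the expanded pattern; the weights would be
# trained with a rule such as LMS.
phi = trig_expand(0.3)
w, b = np.zeros_like(phi), 0.0          # illustrative (untrained) parameters
y = np.tanh(w @ phi + b)
```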

Chebyshev FLANN
The higher-order Chebyshev polynomials for −1 < x < 1 may be generated using the recursive formula:

The first few Chebyshev polynomials are:
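The slide's recursion and polynomial list are equation images; the standard Chebyshev recursion and a sketch of the corresponding functional expansion are:

```python
import numpy as np

# Standard Chebyshev recursion: T_{n+1}(x) = 2 x T_n(x) - T_{n-1}(x),
# with T_0(x) = 1 and T_1(x) = x (so T_2 = 2x^2 - 1, T_3 = 4x^3 - 3x, ...).
def chebyshev_expand(x, order=4):
    T = [np.ones_like(x), np.asarray(x, dtype=float)]
    for _ in range(2, order + 1):
        T.append(2 * x * T[-1] - T[-2])
    return np.stack(T[:order + 1])      # [T_0(x), ..., T_order(x)]

print(chebyshev_expand(0.5, order=3))   # [1.0, 0.5, -0.5, -1.0]
```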

Legendre-FLANN

The higher-order Legendre polynomials can be derived from the following recursion formula:
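For reference (the slide's own formula is an equation image), the standard Legendre recursion is

$$(n+1)\,P_{n+1}(x) = (2n+1)\,x\,P_n(x) - n\,P_{n-1}(x), \qquad P_0(x) = 1, \; P_1(x) = x,$$

which gives, for example, $P_2(x) = \tfrac{1}{2}(3x^2 - 1)$ and $P_3(x) = \tfrac{1}{2}(5x^3 - 3x)$. As in the Chebyshev case, the expanded pattern $[P_0(x), P_1(x), \dots]$ feeds a single layer of adaptive weights.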

