Back Propagation
[Figure: Multi-layer perceptron. Inputs x1 .. xn enter Layer 0; hidden layers up to Layer m (neurons 1 .. k); output Layer m+1 (neurons 1 .. p) produces y1(m+1) .. yp(m+1).]
Multi-Layer Perceptron
Back Propagation
Supervised learning mechanism for multilayered, generalized feed-forward networks
Discovered independently by several researchers [Werbos (1974), Parker (1982), and Rumelhart (1986)]
Played a major role in the reemergence of neural networks as a tool for solving a wide variety of problems
The most well-known and widely used of the current types of NN systems
Uses a differentiable activation function
Robust and stable (based on the gradient-descent technique, i.e. approximate steepest descent)
Recognizes patterns similar to those it has learned (it cannot recognize genuinely new patterns; this is true of all supervised learning)
Activation functions
The sigmoid function performs a sort of soft threshold: rounded and differentiable, in contrast to the step function
Judith Dayhoff, Neural Network Architectures: An Introduction, Van Nostrand Reinhold
Sigmoid Function
Nonlinear activation function
Smooth, hence differentiable
Relatively flat at both ends
Rapid rise in the middle
Advantage of automatic gain control (no saturation)
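These properties can be checked with a few evaluations. A minimal sketch in Python (the function name `sigmoid` is mine, not from the source):

```python
import math

def sigmoid(x):
    """Logistic sigmoid: a smooth, differentiable soft threshold."""
    return 1.0 / (1.0 + math.exp(-x))

# Relatively flat at both ends, rapid rise in the middle:
low = sigmoid(-6)   # ~0.0025 (flat low end)
mid = sigmoid(0)    # 0.5     (steepest point)
high = sigmoid(6)   # ~0.9975 (flat high end)
```

The symmetry sigmoid(-x) = 1 - sigmoid(x) also follows directly from the definition.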
BP-Model
3 layered BP network
5 layered BP network
BP Learning Procedure
The back-prop algorithm is an iterative gradient algorithm designed to minimize the mean square error between the actual output of a multilayer feed-forward perceptron and the desired output. It requires continuous differentiable nonlinearities. The following assumes a sigmoid logistic non-linearity is used.
Step 1: Initialize Weights (Wkm1 .. Wkmn) and Threshold (Wkm0)
Set all weights and thresholds (optional) to small bipolar random values. Note that k represents neuron k of layer m and n represents the total number of inputs to neuron k.
Step 2:
Present New Input and Desired Output
Present the input vector x1, x2, ..., xn along with the desired output dk(t). Notes:
** x0 is a fixed bias input and is always set to 1.
** dk(t) is the desired output for output neuron k and takes a value between 0 and 1.
** The input could be new on each trial, or samples from the training set could be presented cyclically until the weights stabilize.
Step 3:
Calculate Actual Outputs [ykm(t)]

ykm(t) = Fs( Σi=0..n wkmi(t) · xi(t) )

where
Fs(sum) = 1/(1+e^-sum)
wkmi(t) is the weight associated with the i-th input to neuron k of layer m
ykm(t) is the output from neuron k of layer m
xi(t) is the i-th input to neuron k of layer m
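Step 3 for a single neuron can be sketched as follows. This is my own minimal illustration (the name `neuron_output` and the sample numbers are not from the source):

```python
import math

def neuron_output(weights, inputs):
    """ykm(t) = Fs( sum over i of wkmi(t) * xi(t) ), with x0 = 1 as the bias input."""
    x = [1.0] + list(inputs)           # x0 is the fixed bias input
    s = sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-s))  # Fs(sum) = 1/(1+e^-sum)

# weights[0] is the threshold wkm0; the rest pair with x1 .. xn
y = neuron_output([0.1, 0.4, -0.2], [1.0, 2.0])
```

Here the weighted sum is 0.1 + 0.4·1.0 - 0.2·2.0 = 0.1, so the output is Fs(0.1).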
BP Forward Pass
Step 4: Adapt Weights
Use a recursive algorithm starting at the output nodes and working back to the first hidden layer. Adjust weights by

wkmi(t+1) = wkmi(t) + η · δkm · xi(t),   0 ≤ i ≤ n

where
η is the learning rate and usually is a small number ranging from 0 to 1 (typically ≤ 1/n)
xi(t) is the i-th input to neuron k of layer m
δkm is an error term associated with neuron k of layer m
δkm is defined as
A) If m is the output layer: δkm = ykm · (1 - ykm) · (dk - ykm)
B) If m is a hidden layer: δkm = ykm · (1 - ykm) · Σq=1..p wq(m+1)k(t) · δq(m+1)

where
ykm is the actual output of neuron k in layer m,
wq(m+1)k(t) is a weight associated with input k of neuron q in layer m+1,
p is the total number of neurons in layer m+1, and
δq(m+1) is an error term associated with neuron q of layer m+1
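The two δ cases and the weight update can be written out directly. A minimal sketch (function names are mine; the numeric values in the usage line are arbitrary):

```python
def delta_output(y, d):
    """Output-layer error term: delta = y * (1 - y) * (d - y)."""
    return y * (1.0 - y) * (d - y)

def delta_hidden(y, next_weights, next_deltas):
    """Hidden-layer error term: delta = y * (1 - y) * sum_q w_q(m+1)k * delta_q(m+1)."""
    return y * (1.0 - y) * sum(w * dq for w, dq in zip(next_weights, next_deltas))

def update_weight(w, eta, delta, x):
    """wkmi(t+1) = wkmi(t) + eta * delta_km * xi(t)."""
    return w + eta * delta * x

# Example: output y = 0.5, target d = 1.0 gives delta = 0.125
d_out = delta_output(0.5, 1.0)
w_new = update_weight(0.2, 0.1, d_out, 1.0)
```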
BP Backward Pass
[Figures: backward pass at the output layer and at the hidden layers; the error term δ is computed at the output and propagated back through the weights.]
Step 5: Repeat Steps 2 to 4
Repeat until the difference between the desired outputs and the actual network outputs is within an acceptable error range (such as 1% error) for all the input vectors of the training set.
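Steps 2-5 can be combined into a complete training loop. The sketch below trains a small 2-3-1 network on the XOR problem using the update rules above; all names, the architecture, the learning rate of 0.5, and the epoch count are my own choices, not from the source:

```python
import math, random

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def train_step(w_hid, w_out, x, d, eta):
    """One pass of Steps 2-4: forward pass, then back-propagate and adapt weights."""
    xb = [1.0] + x                                       # x0 = 1 (bias input)
    h = [sigmoid(sum(w * xi for w, xi in zip(wj, xb))) for wj in w_hid]
    hb = [1.0] + h
    y = sigmoid(sum(w * hi for w, hi in zip(w_out, hb)))
    delta_out = y * (1 - y) * (d - y)                    # output-layer error term
    delta_hid = [h[j] * (1 - h[j]) * w_out[j + 1] * delta_out
                 for j in range(len(h))]                 # hidden-layer error terms
    for i in range(len(w_out)):                          # adapt output weights
        w_out[i] += eta * delta_out * hb[i]
    for j, wj in enumerate(w_hid):                       # adapt hidden weights
        for i in range(len(wj)):
            wj[i] += eta * delta_hid[j] * xb[i]
    return (d - y) ** 2

random.seed(1)
patterns = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0),
            ([1.0, 0.0], 1.0), ([1.0, 1.0], 0.0)]       # XOR truth table
w_hid = [[random.uniform(-0.5, 0.5) for _ in range(3)] for _ in range(3)]
w_out = [random.uniform(-0.5, 0.5) for _ in range(4)]

first_err = sum(train_step(w_hid, w_out, x, d, 0.5) for x, d in patterns)
for _ in range(5000):
    last_err = sum(train_step(w_hid, w_out, x, d, 0.5) for x, d in patterns)
```

Presenting the training patterns cyclically, as Step 2 allows, the summed squared error should fall as the weights stabilize.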
Summary-BP Learning
Apply the next exemplar
Calculate the output of the network
Adjust the weights (error back-propagation)
Repeat while all patterns are not yet trained to the desired accuracy
Derivation of BP
BP learning rule
If the activation function is the sigmoid function
F(x) = 1/(1+e^-x)
then
F'(x) = (1-F(x))·F(x)
Since F(x) = Oj:
At the output layer: δj = (1-Oj)·Oj·(Tj-Oj)
At a hidden layer: δj = (1-Oj)·Oj·Σk δk·Wkj
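The identity F'(x) = (1-F(x))·F(x) is easy to confirm numerically against a central-difference approximation; a quick check (my own, with an arbitrary test point):

```python
import math

def F(x):
    """Logistic sigmoid, as in the derivation above."""
    return 1.0 / (1.0 + math.exp(-x))

# Numerically verify F'(x) = F(x) * (1 - F(x)) at x = 0.7
x, h = 0.7, 1e-6
numeric = (F(x + h) - F(x - h)) / (2 * h)   # central-difference derivative
analytic = F(x) * (1 - F(x))
```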
If the activation function is the bipolar sigmoid function
F(x) = 2/(1+e^-x) - 1
then
F'(x) = 1/2·(1-F(x))·(1+F(x))
Since F(x) = Oj:
At the output layer: δj = 1/2·(1-Oj)·(1+Oj)·(Tj-Oj)
At a hidden layer: δj = 1/2·(1-Oj)·(1+Oj)·Σk δk·Wkj
Nguyen-Widrow Initialization
Designed to improve the ability of the hidden units to learn by distributing the initial weights and bias so that, for each input pattern, it is likely that the net input to one of the hidden units will be in the range in which that hidden neuron will learn most readily
Laurene Fausett, Fundamentals of Neural Networks, Prentice Hall
Nguyen-Widrow Initialization
Let
n = number of input units
p = number of hidden units
β = scale factor = 0.7·p^(1/n)
For each hidden unit (j = 1, ..., p):
wij(old) = random number between -0.5 and 0.5
Compute ||wj(old)||
Reinitialize weights: wij = β·wij(old) / ||wj(old)||
Set bias: w0j = random number between -β and β
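The procedure above translates directly into code. A sketch of the initialization as I read it from Fausett's description (the function name and seeding are my own):

```python
import math, random

def nguyen_widrow_init(n, p, seed=0):
    """Nguyen-Widrow initialization of the input-to-hidden weights.

    n: number of input units; p: number of hidden units.
    Returns (weights, biases) with weights[j][i] for hidden unit j, input i.
    """
    rng = random.Random(seed)
    beta = 0.7 * p ** (1.0 / n)                        # scale factor
    weights, biases = [], []
    for _ in range(p):
        wj = [rng.uniform(-0.5, 0.5) for _ in range(n)]
        norm = math.sqrt(sum(w * w for w in wj))
        weights.append([beta * w / norm for w in wj])  # rescale row to length beta
        biases.append(rng.uniform(-beta, beta))        # bias drawn from (-beta, beta)
    return weights, biases
```

After reinitialization every hidden unit's weight vector has length exactly β, which is what places its net input in the neuron's most responsive range.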
Nguyen-Widrow Initialization
The Nguyen-Widrow analysis is based on the activation function tanh(x) = (e^x - e^-x)/(e^x + e^-x)