Lesson 06
6.1 Introduction
In Lesson 5 we started to discuss learning algorithms. As stated, a learning
algorithm provides a mechanism to store the effect of learning, due to processing in a
neuron, in a connection of an ANN. We learned to use the Hebbian and Perceptron
learning rules for training ANN in the unsupervised and supervised modes respectively.
This lesson discusses two more supervised learning algorithms, known as the Delta
learning rule and Backpropagation. These algorithms have been the champions of ANN
training algorithms over many decades.
As stated earlier, supervised training deals with a desired output (d) and the actual
output, f(net). They are not the same in general. As such, there is always an error, or a
difference, between d and f(net). Thus any expression such as d - f(net) can be treated
as an error function. The most commonly used error function for a single neuron is defined as

Error = [d - f(net)]²

For a layer of n neurons, the error is averaged over all the neurons:

Error = (1/n) Σ(i=1 to n) [di - f(neti)]²
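These error expressions can be written as a short helper; a minimal sketch (the function names are my own, not from the lesson):

```python
def neuron_error(d, f_net):
    """Squared error of a single neuron: [d - f(net)]^2."""
    return (d - f_net) ** 2

def layer_error(desired, outputs):
    """Average squared error over the n neurons of a layer."""
    n = len(desired)
    return sum((d - f) ** 2 for d, f in zip(desired, outputs)) / n
```

For example, with desired outputs [1, 0] and actual outputs [0.5, 0.6], `layer_error` gives (0.25 + 0.36)/2 = 0.305.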
In a learning session, we humans also face a reasonable error at the beginning. As
time goes on, the error goes down. Towards the end of a learning session we
typically expect error-free learning.
In fact, during a learning session, we would not bother to learn something if
the error is too big. In other words, we always learn subject to a maximum error;
we set out our own maximum error value, denoted by Emax.
With the introduction of the concept of error, we apply learning rules to change weights
only if the error in the training cycle is less than the predefined Emax value.
6.3 Delta learning rule
The Delta learning rule has been developed by considering the error function written for
each neuron i:

Ei = [di - f(neti)]²
Using calculus, we can find an expression that minimizes the error during weight
change. In other words, we can make the weight change, or execute the learning, subject
to minimization of the error.
The learning process that ensures the minimization of error over the cycles is
called the Delta learning rule. The Delta learning rule is given below.
Δw = c [d - f(net)] f′(net) X

where c is the learning constant.
Here, f′(net) denotes the first derivative of the threshold function. Note that the concept
of f′(net) is really meaningful only for continuous functions. Table 6.1 shows expressions
for calculating f′(net) values of our commonly used unipolar and bipolar continuous
functions.
Note that f′(net) can also be computed in terms of f(net). For instance, for the unipolar
continuous threshold function, f′(net) is found by computing f(net)[1 - f(net)].
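Since Table 6.1 is not reproduced in this text, the expressions below are the standard forms for the two continuous functions; a minimal sketch:

```python
import math

def f_unipolar(net):
    """Unipolar continuous threshold function (lambda = 1)."""
    return 1.0 / (1.0 + math.exp(-net))

def f_prime_unipolar(net):
    """f'(net) computed in terms of f(net): f(net)[1 - f(net)]."""
    f = f_unipolar(net)
    return f * (1.0 - f)

def f_bipolar(net):
    """Bipolar continuous threshold function: 2/(1 + e^-net) - 1."""
    return 2.0 / (1.0 + math.exp(-net)) - 1.0

def f_prime_bipolar(net):
    """f'(net) in terms of f(net): (1/2)[1 - f(net)^2]."""
    f = f_bipolar(net)
    return 0.5 * (1.0 - f * f)
```

At net = 0 these give f(net) = 0.5 and f′(net) = 0.25 for the unipolar case, the values that appear in Example 6.1 below.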
Note also that the Delta learning rule is necessarily a supervised training algorithm. The
weight change calculated by the Delta learning rule is generally very small, as we compute
it by multiplying [d - f(net)] and f′(net), which are individually always less than one.
Use of the Delta learning rule requires extra steps, such as checking the error before
applying the formula for changing weights. It should be stated that if the error in a
particular cycle goes beyond the preset Emax value, we should discard the training
session so far and restart from the beginning of the input data set. This is a painful
task, and there is no alternative either.
6.3.1 Steps in applying delta learning rule
Since the application of delta learning rule involves many steps, we shall depicts its
steps before going into an example. For this purpose, we can edit the general steps in
ANN training that we identified in lesson 5. Figure 6.1 shows the steps in the use of
delta learning rule.
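The cycle just described (compute net, apply f(net), check the error against Emax, then change the weights) can be sketched for a single neuron. The names are illustrative, and the error check here is per neuron, whereas the lesson averages the error over the whole layer before checking:

```python
import math

def delta_rule_step(w, x, d, c=0.01, e_max=0.5):
    """One Delta-rule training cycle for a single neuron using the
    unipolar continuous threshold function f(net) = 1/(1 + e^-net)."""
    net = sum(wi * xi for wi, xi in zip(w, x))
    f = 1.0 / (1.0 + math.exp(-net))            # f(net)
    error = (d - f) ** 2                        # [d - f(net)]^2
    if error >= e_max:                          # train only while error < Emax
        return w, error                         # weights left unchanged
    f_prime = f * (1.0 - f)                     # f'(net) = f(net)[1 - f(net)]
    dw = [c * (d - f) * f_prime * xi for xi in x]   # Delta learning rule
    return [wi + dwi for wi, dwi in zip(w, dw)], error
```

With w = [0.1, -0.1], x = [1, 1] and d = 1 this reproduces the weight change computed for neuron A in Example 6.1 below.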
Example 6.1
Use the Delta learning rule to train the input pairs X1 = [1, 1], D1 = [1, 0] and
X2 = [1, 0, 1], D2 = [0, 1] for a single-layer ANN with two neurons in the layer. You may
use the unipolar continuous threshold function for this calculation. Also do the training
subject to Emax = 0.5.
Step 1
We design a single-layer two-neuron ANN.
Step 2
Let λ = 1, c = 0.01 and Emax = 0.5.
Step 3
Initialize the weights arbitrarily on the network as follows (Figure 6.2):
Neuron A (desired output 1): wA = [0.1, -0.1]
Neuron B (desired output 0): wB = [0.1, 0.2]
Step 4
Now consider the first input pair X1 = [1, 1], D1 = [1, 0].
Step 5
Consider neurons A and B.
Step 6
Apply output = f(net) = 1/(1 + e^-net) for neurons A and B.
Step 7
Calculate the error for each neuron and compute the average error as
E = Σ[d - f(net)]²/n
  = [[dA - f(netA)]² + [dB - f(netB)]²]/2
  = [[1 - 0.5]² + [0 - 0.6]²]/2
  = [[0.5]² + [-0.6]²]/2
  = (0.25 + 0.36)/2
  = 0.305
Step 8
Since E (0.305) is less than Emax (0.5), go to Step 9.
Step 9
Calculate the weight change on neuron A by using Δw = c [d - f(net)] f′(net) X:
ΔwA = c [dA - f(netA)] f′(netA) X
    = 0.01 × (1 - 0.5) × 0.5 × (1 - 0.5) × [1, 1]
    = 0.01 × 0.125 × [1, 1]
    = [0.00125, 0.00125]
WAnew = [0.1 + 0.00125, -0.1 + 0.00125] = [0.10125, -0.09875]
Similarly, for neuron B:
ΔwB = c [dB - f(netB)] f′(netB) X
    = 0.01 × (0 - 0.6) × 0.6 × (1 - 0.6) × [1, 1]
    = 0.01 × (-0.144) × [1, 1]
    = [-0.00144, -0.00144]
WBnew = [0.1 - 0.00144, 0.2 - 0.00144] = [0.09856, 0.19856]
Step 10
The network now carries the updated weights WAnew and WBnew (Figure 6.3).
Now you can use this network, apply input X2 and follow the same procedure (Steps
4-10). Then you will find the new weights after learning X2. This is left as an
exercise.
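The arithmetic in Steps 7-9 can be verified with a few lines of Python. Here I plug in the rounded activations f(netA) = 0.5 and f(netB) = 0.6 that the worked example uses, rather than recomputing them from the net values:

```python
c = 0.01                  # learning constant from Step 2
x = [1, 1]                # input X1
fA, fB = 0.5, 0.6         # rounded activations used in the example
dA, dB = 1, 0             # desired outputs D1

E = ((dA - fA) ** 2 + (dB - fB) ** 2) / 2                 # Step 7, ~0.305
dwA = [c * (dA - fA) * fA * (1 - fA) * xi for xi in x]    # ~[0.00125, 0.00125]
dwB = [c * (dB - fB) * fB * (1 - fB) * xi for xi in x]    # ~[-0.00144, -0.00144]

wA_new = [0.1 + dwA[0], -0.1 + dwA[1]]   # ~[0.10125, -0.09875]
wB_new = [0.1 + dwB[0],  0.2 + dwB[1]]   # ~[0.09856, 0.19856]
```

Note that ΔwB is negative, since dB - f(netB) = -0.6, so neuron B's weights decrease.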
Let us discuss whether the Delta learning rule can be used to train a multi-layer ANN.
What do you think? Well, the Delta learning rule, Δw = c [d - f(net)] f′(net) X, can be
applied only to neurons that have a known desired output d. For a given input
pair, we know the input and the desired output.
Now if we think of a multi-layer ANN, the desired output is known only for the output
layer neurons. In other words, the desired output is unknown for the neurons in hidden
layers and the input layer. As such, the Delta learning rule cannot be applied to calculate
weight changes on the neurons in intermediate layers.
Therefore, the Delta learning rule can be used to train only single-layer Artificial
Neural Networks in the supervised mode. Recall that the Hebbian learning rule was able
to train multi-layer neural networks, since there is no issue of desired output in
unsupervised learning sessions.
6.4 Training of Multi-layer ANN
We have already stated that multi-layer ANN have a higher representational power,
That means multi-layer ANN can model any real world problem. That is fine. Now the
challenge is how we can train multi-layer ANN. In other words, what are the learning
algorithms to train multi layer-ANN? There are several cases to be discussed here.
Once we know δ values for the neurons, we can apply the Delta learning rule for
updating the weights. Therefore, we effectively use the Delta learning rule in the
backpropagation algorithm too.
The whole story about backpropagation is nothing more than proposing two methods
to calculate δ for the output layer and the hidden layer neurons. Comparing with the
Delta learning rule, Δw = c [d - f(net)] f′(net) X, the weight change can always be
written as Δw = c δ X, where for an output layer neuron

δ output = [d - f(net)] f′(net)
As such, suppose that the ith neuron in a hidden layer is connected to n neurons
in the output layer, and their δ values are δ1, δ2, δ3, …, δn. Further assume that the
weights connecting the ith neuron and the output layer neurons are denoted by wi1, wi2,
…, win. Then δ for the ith neuron can be written as

δi hidden = f′(neti) Σ(k=1 to n) δk wik
This computation is depicted in Figure 6.5. Note that, having computed δ values
for the neurons in the hidden layer next to the output layer, we can apply the same
process to compute δ for the neurons in the preceding layer. This process can be
continued up to the input layer.
(Figure 6.5: hidden neuron i connected through weights wi1, wi2, …, win to output
layer neurons with δ values δ1, δ2, …, δn.)
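The two δ computations can be written out in a few lines; a sketch under the lesson's notation, with illustrative names, assuming the unipolar continuous threshold function throughout:

```python
def delta_output(d, f_net):
    """delta for an output-layer neuron: [d - f(net)] * f'(net),
    with the unipolar derivative f'(net) = f(net)[1 - f(net)]."""
    return (d - f_net) * f_net * (1.0 - f_net)

def delta_hidden(f_net_i, deltas, weights):
    """delta for hidden neuron i: f'(net_i) * sum over k of delta_k * w_ik,
    where deltas/weights pair neuron i with the n neurons of the next layer."""
    back_sum = sum(dk * wk for dk, wk in zip(deltas, weights))
    return f_net_i * (1.0 - f_net_i) * back_sum
```

For instance, delta_output(1, 0.5) gives 0.5 × 0.25 = 0.125, and delta_hidden propagates the weighted δ sum backwards exactly as the formula above describes.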
Example 6.2
The following ANN (Figure 6.6) has been trained using the backpropagation algorithm with
the use of the unipolar continuous threshold function. Given δC = 0.2, δD = 0.4, f(netA) =
0.62 and f(netB) = 0.52, find δA and δB.
(Figure 6.6: a two-layer network in which hidden neurons A and B feed output
neurons C and D, with the connection weights marked.)
Neurons A and B are in the hidden layer as per the network architecture shown.
Therefore, we use the following for neurons A and B:

δ hidden = f′(neti) Σ(k=1 to n) δk wik

For neuron A:

δA = f′(netA) [δC wAC + δD wAD]
   = f(netA)[1 - f(netA)][δC wAC + δD wAD]
   = 0.62 × 0.38 × 0.06
   = 0.014136
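This number can be checked directly. The individual weights wAC = wAD = 0.1 are an assumption consistent with the product δC·wAC + δD·wAD = 0.06 used above; they are not directly legible from Figure 6.6:

```python
# Known values from Example 6.2
f_netA = 0.62
delta_C, delta_D = 0.2, 0.4
# Assumed weights, chosen so that delta_C*w_AC + delta_D*w_AD = 0.06
w_AC, w_AD = 0.1, 0.1

delta_A = f_netA * (1 - f_netA) * (delta_C * w_AC + delta_D * w_AD)
# delta_A is approximately 0.014136, matching the worked value
```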
Exercise 6.2
Calculate δB for Example 6.2.
Example 6.3
Having calculated δ values for neurons A, B, C and D, now we can find the weight
change on each neuron. For this purpose we use Δw = c δ X.
For neuron A, whose input is X = [1, -1]:
ΔwA = c δA X
    = 0.01 × 0.014136 × [1, -1]
    = [0.00014, -0.00014]
WAnew = [0.1 + 0.00014, -0.1 - 0.00014] = [0.10014, -0.10014]
Similarly, we can calculate the weight change on C. Note that with regard to neurons C
and D, the input is taken from the outputs of A and B. That means f(netA) = 0.62 and
f(netB) = 0.52 form the input for C and D. Therefore, the calculation of the weight
change on C goes as follows:
ΔwC = c δC X
    = 0.01 × 0.2 × [0.62, 0.52]
    = [0.00124, 0.00104]
WCnew = [0.1 + 0.00124, 0.1 + 0.00104] = [0.10124, 0.10104]
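Both updates can be verified in a few lines. The input [1, -1] for neuron A and the old weights [0.1, -0.1] and [0.1, 0.1] are as inferred from the arithmetic shown in the example:

```python
c = 0.01   # learning constant

# Hidden neuron A: delta_A carried over from Example 6.2
delta_A = 0.014136
x_A = [1, -1]
dwA = [c * delta_A * xi for xi in x_A]        # ~[0.00014, -0.00014]
wA_new = [0.1 + dwA[0], -0.1 + dwA[1]]        # ~[0.10014, -0.10014]

# Output neuron C: its inputs are the outputs of A and B
delta_C = 0.2
x_C = [0.62, 0.52]                            # f(netA), f(netB)
dwC = [c * delta_C * xi for xi in x_C]        # ~[0.00124, 0.00104]
wC_new = [0.1 + dwC[0], 0.1 + dwC[1]]         # ~[0.10124, 0.10104]
```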
Next you will do an exercise involving all the steps in a problem. Note that the use of
the backpropagation training algorithm primarily involves only two major tasks:
calculation of δ and use of Δw = c δ X.
Exercise 6.3
In a certain backpropagation training cycle, the input pair X = [1, 1], D = [0.1, 0.9] has
been applied to a 3-layer ANN. The weights of the connections in the network at this
session are shown in Figure 6.7. If the training has been done using the unipolar
continuous threshold function, calculate the δ value and the weight change for each
neuron.
(Figure 6.7: the 3-layer ANN and its connection weights.)
6.6 Summary
This lesson discussed two major algorithms, the Delta learning rule and
Backpropagation, for training ANN in the supervised mode. The Delta learning rule has
a sound mathematical basis that ensures the minimization of error during a training
session. This algorithm is applicable only to single-layer neural networks. We also
recognized the backpropagation algorithm as a unique strategy to train a multi-layer
neural network entirely in the supervised mode. It is based on two strategies to compute
δ error values for the neurons in the output layer and the other layers.