
Lesson 06 – Delta Learning and Backpropagation Algorithms

6.1 Introduction
In Lesson 5 we began to discuss learning algorithms. As stated, a learning
algorithm provides a mechanism to store the effect of learning, due to processing in a
neuron, in a connection of an ANN. We learned to use the Hebbian and Perceptron
learning rules for training ANN in the unsupervised and supervised modes respectively.
This lesson discusses two more supervised learning algorithms, known as the Delta
learning rule and Backpropagation. These algorithms have been the champions of ANN
training algorithms over many decades.

6.2 Towards Delta Learning rule


Before discussing the Delta learning rule we need to discuss the theory behind this
particular rule. It should be stated that neither the Hebbian nor the Perceptron rule
has a theory behind it; those rules were simply proposed by Hebb and Rosenblatt
respectively. The Delta learning rule, however, has a theoretical basis. Let us discuss
how it has been developed.

As stated earlier, supervised training deals with a desired output (d) and the actual
output, f(net). In general these are not the same. As such, there is always an error, or
a difference, between d and f(net). Thus any expression such as d − f(net) can be
treated as an error function. The most commonly used error function for a single
neuron is defined as

Error = [d − f(net)]²

If n neurons are considered, it is customary to take the average error as follows.

Error = (1/n) Σi=1..n [di − f(neti)]²
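The average error above can be sketched in a few lines of Python; the function name and the example values are illustrative, not from the lesson.

```python
def avg_squared_error(desired, actual):
    """Average squared error over n neurons: (1/n) * sum of [d_i - f(net_i)]^2."""
    n = len(desired)
    return sum((d - f) ** 2 for d, f in zip(desired, actual)) / n

# Two neurons with desired outputs [1, 0] and actual outputs [0.5, 0.6]
print(round(avg_squared_error([1, 0], [0.5, 0.6]), 3))  # 0.305
```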

In a learning session, we humans also face a reasonable error at the beginning. As
time goes on the error goes down, and towards the end of a learning session we
typically expect error-free learning.

In fact, during a learning session, we would not bother to learn something if the
error is too big. In other words, we always learn subject to a maximum error; we set
our own maximum error value, denoted by Emax.

With the introduction of the concept of error, we apply learning rules to change
weights only if the error in the training cycle is less than the predefined Emax value.
6.3 Delta learning rule
The Delta learning rule has been developed by considering the error function written
for each neuron i.

Ei = [di − f(neti)]²

Using calculus, we can find an expression that minimizes the error during the weight
change. In other words, we can perform the weight change, i.e. execute the learning,
subject to minimization of the error.

The learning process that ensures minimization of the error over the cycles is called
the Delta learning rule. The Delta learning rule is given below.

w = [d-f(net)] f (net) X

Here, f (net) denotes the first derivative of the threshold function. Note that concept
of f (net) is really meaningful for continuous functions. Table 6.1 shows expressions
for calculating f (net) values of our commonly used unipolar and bipolar continuous
functions.

Threshold function                      Derivative

f(net) = 1/(1+e^-net)                   f′(net) = f(net)[1 − f(net)]

f(net) = [2/(1+e^-net)] − 1             f′(net) = ½[1 − f(net)²]

Table 6.1 – Threshold functions and derivatives

Note that f (net) can also be computed in terms of f(net). For instance, for unipolar
threshold function, f (net) is found by computing the f(net) [1-f(net)].
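The entries in Table 6.1 can be checked numerically; the following is a minimal sketch, with function names of my own choosing.

```python
import math

def unipolar(net):
    """f(net) = 1/(1 + e^-net)"""
    return 1 / (1 + math.exp(-net))

def unipolar_deriv(net):
    """f'(net) = f(net)[1 - f(net)]"""
    f = unipolar(net)
    return f * (1 - f)

def bipolar(net):
    """f(net) = 2/(1 + e^-net) - 1"""
    return 2 / (1 + math.exp(-net)) - 1

def bipolar_deriv(net):
    """f'(net) = (1/2)[1 - f(net)^2]"""
    f = bipolar(net)
    return 0.5 * (1 - f ** 2)

print(unipolar(0.0), unipolar_deriv(0.0))  # 0.5 0.25
```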

Note also that the Delta learning rule is necessarily a supervised training algorithm.
The weight change calculated by the Delta learning rule is generally very small, as we
compute it by multiplying [d − f(net)] and f′(net), which are each less than one in
magnitude.

Use of the Delta learning rule requires extra steps, such as checking the error before
applying the formula for changing weights. It should be stated that if the error in a
particular cycle goes beyond the preset Emax value, we should discard the training
session so far and restart from the beginning of the input data set. This is a painful
task, but no other option is available.
6.3.1 Steps in applying delta learning rule
Since the application of the Delta learning rule involves many steps, we shall depict
its steps before going into an example. For this purpose, we can adapt the general
steps in ANN training that we identified in Lesson 5. Figure 6.1 shows the steps in the
use of the Delta learning rule.

Step 1: Design a network architecture
Step 2: Set λ, c and Emax
Step 3: Initialize weights
Step 4: For all neurons, apply an input X
Step 5: Calculate net using net = Σ xiwi
Step 6: Calculate output using output = f(net)
Step 7: Calculate the error using E = [d − f(net)]²
Step 8: If E < Emax go to Step 9, otherwise go to Step 3
Step 9: Calculate the weight change using Δw = c [d − f(net)] f′(net) X
Step 10: With the new weights, go to Step 4

Figure 6.1 – Steps in the use of the Delta learning rule
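The per-neuron cycle of Steps 5–10 can be sketched in Python; this is a minimal illustration assuming the unipolar function and row-vector inputs, with names of my own choosing.

```python
import math

def f(net):
    """Unipolar continuous threshold function."""
    return 1 / (1 + math.exp(-net))

def delta_train_cycle(w, x, d, c, e_max):
    """One Delta-rule cycle for a single neuron; returns the updated weights,
    or None when the error reaches Emax (signalling a restart)."""
    net = sum(xi * wi for xi, wi in zip(x, w))        # Step 5
    out = f(net)                                      # Step 6
    error = (d - out) ** 2                            # Step 7
    if error >= e_max:                                # Step 8
        return None
    dw = [c * (d - out) * out * (1 - out) * xi for xi in x]   # Step 9
    return [wi + dwi for wi, dwi in zip(w, dw)]       # Step 10

# Illustrative neuron: weights [0.1, -0.1], input [1, 1], d = 1, c = 0.01
print([round(w, 5) for w in delta_train_cycle([0.1, -0.1], [1, 1], 1, 0.01, 0.5)])
# [0.10125, -0.09875]
```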

Example 6.1
Use the Delta learning rule to train the input pairs X1 = [1, 1], D1 = [1, 0] and
X2 = [1, 0], D2 = [0, 1] for a single-layer ANN with two neurons in the layer. Use the
unipolar threshold function for this calculation, and do the training subject to
Emax = 0.5.

Step 1
We design a single-layer two-neuron ANN.

Step 2
Let λ = 1, c = 0.01 and Emax = 0.5.

Step 3
Initialize the weights arbitrarily on the network as follows (Figure 6.2): neuron A
takes the weights 0.1 and −0.1, and neuron B takes the weights 0.1 and 0.2. The
desired outputs are 1 for A and 0 for B.

Figure 6.2 – ANN with initialized weights

Step 4
Now consider the first input pair X1 = [1, 1], D1 = [1, 0].
Consider neurons A and B:

X1 = [1, 1]   WA = [0.1, −0.1]   dA = 1
              WB = [0.1, 0.2]    dB = 0
Step 5
Apply net = Σ xiwi for neurons A and B.

netA = 1×0.1 + 1×(−0.1) = 0.0

netB = 1×0.1 + 1×0.2 = 0.3

Step 6
Apply output = f(net) = 1/(1+e^-net) for neurons A and B.

outputA = f(netA) = 1/(1+e^-0.0) = 0.5

outputB = f(netB) = 1/(1+e^-0.3) ≈ 0.6

Step 7
Calculate the error for each neuron and compute the average error as
E = Σ [d − f(net)]²/n

= [[dA − f(netA)]² + [dB − f(netB)]²]/2

= [[1 − 0.5]² + [0 − 0.6]²]/2

= [(0.5)² + (−0.6)²]/2

= (0.25 + 0.36)/2

= 0.305
Step 8
Since E (0.305) is less than Emax (0.5), go to Step 9.

Step 9
Calculate the weight change on neuron A using Δw = c [d − f(net)] f′(net) X.

ΔwA = c [dA − f(netA)] f′(netA) X

    = c [dA − f(netA)] f(netA) [1 − f(netA)] X

    = 0.01 × (1 − 0.5) × 0.5 × (1 − 0.5) × [1, 1]

    = 0.00125 × [1, 1] = [0.00125, 0.00125]

WAnew − WAold = [0.00125, 0.00125]

WAnew − [0.1, −0.1] = [0.00125, 0.00125]

WAnew = [0.10125, −0.09875]

Similarly, find the weight change on neuron B using Δw = c [d − f(net)] f′(net) X.

ΔwB = c [dB − f(netB)] f′(netB) X

    = c [dB − f(netB)] f(netB) [1 − f(netB)] X

    = 0.01 × (0 − 0.6) × 0.6 × (1 − 0.6) × [1, 1]

    = −0.00144 × [1, 1] = [−0.00144, −0.00144]

WBnew − WBold = [−0.00144, −0.00144]

WBnew − [0.1, 0.2] = [−0.00144, −0.00144]

WBnew = [0.09856, 0.19856]
Having trained with the first input (X1), we obtain the network with the updated
weights as follows (Figure 6.3): neuron A now carries the weights 0.10125 and
−0.09875, and neuron B carries the weights 0.09856 and 0.19856.

Figure 6.3 – Updated weights after learning X1

Now you can take this network, apply input X2 and follow the same procedure (Steps
4–10). You will then find the new weights after learning X2. This is left as an
exercise.
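The arithmetic of Example 6.1 can be verified with a short script. This is an illustrative check, with f(netB) rounded to 0.6 as in the worked steps; the helper names are my own.

```python
import math

def f(net):
    """Unipolar continuous threshold function."""
    return 1 / (1 + math.exp(-net))

def new_weights(w, x, d, out, c=0.01):
    """Delta rule: w_new = w_old + c*(d - out)*out*(1 - out)*x."""
    return [round(wi + c * (d - out) * out * (1 - out) * xi, 5)
            for wi, xi in zip(w, x)]

x = [1, 1]
out_a = f(0.0)               # netA = 0.0, so outputA = 0.5
out_b = round(f(0.3), 1)     # netB = 0.3, f(0.3) ~ 0.574, rounded to 0.6
print(new_weights([0.1, -0.1], x, 1, out_a))  # [0.10125, -0.09875]
print(new_weights([0.1, 0.2], x, 0, out_b))   # [0.09856, 0.19856]
```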

6.3.2 Features of delta learning rule


The Delta learning rule has several salient features. Firstly, it has a sound
theoretical basis for changing weights. The second important feature is its ability to
generate suitably small weight changes. Use of a continuous activation function
guarantees a wide range of values as the output. Undoubtedly, in the context of
supervised training, the Delta learning rule offers far more flexibility in controlling
the training session than the Perceptron learning rule. This is because the Perceptron
can deal with just two values as the output, so it can model only a certain class of
real-world problems.

Let us discuss whether the Delta learning rule can be used to train a multi-layer ANN.
What do you think? Well, the Delta learning rule, Δw = c [d − f(net)] f′(net) X, can be
applied only to neurons with a known desired output d. For a given input pair, we
know the input and the desired output.

Now if we think of a multi-layer ANN, the desired output is known only for the output
layer neurons. In other words, the desired output is unknown for the neurons in the
hidden layers and the input layer. As such, the Delta learning rule cannot be applied
to calculate weight changes on the neurons in the intermediate layers.

Therefore, the Delta learning rule can be used to train only single-layer Artificial
Neural Networks in the supervised mode. Recall that the Hebbian learning rule was
able to train multi-layer neural networks, since there is no issue of a desired output
in unsupervised learning sessions.
6.4 Training of Multi-layer ANN
We have already stated that multi-layer ANN have a higher representational power;
that means a multi-layer ANN can model any real-world problem. That is fine. Now the
challenge is how we can train a multi-layer ANN. In other words, what are the learning
algorithms to train a multi-layer ANN? There are several cases to be discussed here.

6.4.1 Training in the unsupervised mode


Training of a multi-layer ANN in the unsupervised mode is not a problem. Here, we
can simply use the Hebbian learning rule, regardless of the number of layers and
neurons.

6.4.2 Training in the supervised mode


Training of a multi-layer ANN in the supervised mode is rather challenging. Clearly,
the Perceptron and Delta learning rules cannot be directly used to train a multi-layer
ANN in the supervised mode. This is because, using the Perceptron or Delta learning
rule, we can calculate weight changes only for the output layer.

Use of combination of unsupervised and supervised learning


It is possible to use a combination of unsupervised and supervised training to train a
multi-layer ANN. In this case, the output layer is trained in the supervised mode with
the use of the Perceptron or Delta learning rule. In contrast, all other layers (the
input layer and hidden layers) are trained in the unsupervised mode using an
algorithm like the Hebbian learning rule. Figure 6.4 illustrates this particular
strategy.

Figure 6.4 – Using unsupervised and supervised training: the input and hidden layers
are trained in the unsupervised mode, while the output layer is trained in the
supervised mode

Training of multi-layer ANN entirely in supervised mode


If you want to train a multi-layer ANN entirely in the supervised mode, that is a
challenge. There is an algorithm for this purpose, known as the Backpropagation
training algorithm. In fact, the Backpropagation training algorithm has been a
breakthrough in ANN research.

6.5 Backpropagation training algorithm


In a multi-layer ANN, we know the desired output for the output layer neurons.
Therefore, for an output layer neuron, we can directly calculate d − f(net) without
any problem. Note that d − f(net) is a kind of error term, and we denote the related
quantity by δ. The Backpropagation algorithm proposes a method to calculate δ values
for the neurons in the layer before the output layer, in terms of the δ values of the
output layer. As such, calculating δ for the neurons in the intermediate layers is a
process that propagates the known δ values (errors) in the backward direction.
Therefore, this algorithm is sometimes also called the error backpropagation
algorithm.

Once we know the δ values for the neurons, we can apply the Delta learning rule to
update the weights. Therefore, we effectively use the Delta learning rule in the
Backpropagation algorithm too.

The whole story of Backpropagation is nothing more than proposing two methods to
calculate δ: one for the output layer neurons and one for the hidden layer neurons.

Let us begin with the Delta learning rule

Δw = c [d − f(net)] f′(net) X

Writing δ = [d − f(net)] f′(net), we can express the Delta learning rule as

Δw = c δ X

6.5.1 Calculating δ for the output layer neurons

For the output layer, δ can be written directly as follows, because we know d.

δoutput = [d − f(net)] f′(net)

This calculation is quite straightforward. However, the key point of the
Backpropagation algorithm is the calculation of δ for neurons in the layers other than
the output layer.

6.5.2 Calculating δ for the neurons in the hidden layers

The Backpropagation algorithm suggests computing δ for the immediate hidden layer
neurons as a weighted sum of the δ values in the output layer.

As such, suppose that the ith neuron in a hidden layer is connected to n neurons in
the output layer, whose δ values are δ1, δ2, …, δn. Further assume that the weights
connecting the ith neuron to the output layer neurons are denoted by wi1, wi2, …,
win. Then δ for the ith neuron can be written as

δi(hidden) = f′(neti) Σk=1..n δk wik

This computation is depicted in Figure 6.5. Note that, having computed δ values for
the neurons in the hidden layer next to the output layer, we can apply the same
process to compute δ for the neurons in the preceding layer. This process can be
continued up to the input layer.

hidden Wi1 1
1
i
Wi2
2
2

Win n
n

Figure 6.5 – Computing of  for a hidden layer neuron.
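The weighted-sum computation above can be sketched as follows; the numbers in the illustration are made up for demonstration, not taken from the lesson, and the unipolar derivative is assumed.

```python
def hidden_delta(f_net_i, output_deltas, weights_i):
    """delta_i = f'(net_i) * sum_k(delta_k * w_ik), using the unipolar
    derivative f'(net_i) = f(net_i) * (1 - f(net_i))."""
    f_prime = f_net_i * (1 - f_net_i)
    return f_prime * sum(d * w for d, w in zip(output_deltas, weights_i))

# Hypothetical hidden neuron with f(net_i) = 0.7, connected to two output
# neurons with deltas 0.5 and 0.3 through weights 0.2 and -0.1
print(round(hidden_delta(0.7, [0.5, 0.3], [0.2, -0.1]), 5))  # 0.0147
```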

Example 6.2
The ANN in Figure 6.6 has been trained using the Backpropagation algorithm with the
unipolar continuous threshold function. Given δC = 0.2, δD = 0.4, f(netA) = 0.62 and
f(netB) = 0.52, find δA and δB. The weights from neuron A to the output neurons C
and D are wAC = 0.1 and wAD = 0.1.

Figure 6.6 – An example using the Backpropagation algorithm

Neurons A and B are in the hidden layer as per the network architecture shown.
Therefore, we use the following for neurons A and B.

δi(hidden) = f′(neti) Σk=1..n δk wik

δA = f′(netA) [δC wAC + δD wAD]

   = f(netA)[1 − f(netA)] [δC wAC + δD wAD]

   = 0.62 × (1 − 0.62) × [0.2×0.1 + 0.4×0.1]

   = 0.62 × 0.38 × 0.06

   = 0.014136

Similarly, you can calculate δB. It is kept as an exercise for you.

Exercise 6.2
Calculate δB for Example 6.2.

Example 6.3
Having calculated δ values for neurons A, B, C and D, we can now find the weight
change on each neuron. For this purpose we use

Δw = c δ X

As examples, let us calculate the weight change on neurons A and C. For neuron A,
the input is X = [1, −1] and the old weights are [0.1, −0.1].

ΔwA = c δA X

    = 0.01 × 0.014136 × [1, −1]

    = [0.00014, −0.00014]

WAnew − WAold = [0.00014, −0.00014]

WAnew − [0.1, −0.1] = [0.00014, −0.00014]

WAnew = [0.10014, −0.10014]
Similarly, we can calculate the weight change on C. Note that, with regard to neurons
C and D, the input is taken from the outputs of A and B. That means f(netA) = 0.62
and f(netB) = 0.52 form the input for C and D. Therefore, the calculation of the
weight change on C goes as follows.

ΔwC = c δC X

    = 0.01 × 0.2 × [0.62, 0.52]

    = [0.00124, 0.00104]

WCnew − WCold = [0.00124, 0.00104]

WCnew − [0.1, 0.2] = [0.00124, 0.00104]

WCnew = [0.10124, 0.20104]

Next you will do an exercise involving all the steps of the problem. Note that the use
of the Backpropagation training algorithm primarily involves only two major tasks:
calculation of δ, and use of Δw = c δ X.
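The weight-update task can be captured in a small helper; this is a sketch checked against the numbers of Example 6.3, with c = 0.01 assumed as in the worked steps.

```python
def weight_update(w_old, delta, x, c=0.01):
    """Delta-rule update once delta is known: w_new = w_old + c * delta * x."""
    return [round(wi + c * delta * xi, 5) for wi, xi in zip(w_old, x)]

# Neuron A of Example 6.3: delta_A = 0.014136, input X = [1, -1]
print(weight_update([0.1, -0.1], 0.014136, [1, -1]))   # [0.10014, -0.10014]
# Neuron C: delta_C = 0.2, input [f(netA), f(netB)] = [0.62, 0.52]
print(weight_update([0.1, 0.2], 0.2, [0.62, 0.52]))    # [0.10124, 0.20104]
```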

Exercise 6.3
In a certain Backpropagation training cycle, the input pair X = [1, 1], D = [0.1, 0.9]
has been applied to a 3-layer ANN. The weights of the connections in the network at
this session are shown in Figure 6.7. If the training has been done using the unipolar
continuous threshold function, calculate the following.

f(net) for all neurons
δ for the output layer neurons
δ for all other neurons
updated weights for the connections

Figure 6.7 – A 3-layer ANN, with neurons A and B feeding hidden neurons C, D and E,
which in turn feed output neurons F and G, together with its connection weights
6.6 Summary
This lesson discussed two major algorithms, the Delta learning rule and
Backpropagation, for training ANN in the supervised mode. The Delta learning rule has
a sound mathematical basis that ensures minimization of error during a training
session. This algorithm is applicable only to single-layer neural networks. We also
recognized the Backpropagation algorithm as a unique strategy to train a multi-layer
neural network entirely in the supervised mode. It is based on two strategies to
compute error values: one for the neurons in the output layer and one for those in
the other layers.
