8_lesson_05
5.1 Introduction
Lesson 4 discussed the McCulloch-Pitts model of ANN. That lesson primarily covered how we model
the processing inside a neuron, based on the concepts of inputs, weights, net, the threshold
function and the output. This lesson covers how we model the role of connections to implement
the learning effect in an ANN. As such, this lesson presents the fundamental learning algorithms,
namely Hebbian learning and the perceptron. We will also discuss the limitations of the
perceptron learning rule.
Next we should discuss how we can model the role of connections in an ANN. In other words, how
can we model the learning effect of an input processed at a neuron? Obviously, processing alone
does not make sense if we do not somehow implement the learning effect. Since connections
are supposed to work as memories, it is natural to model learning on the connections.
According to the McCulloch-Pitts model, connections hold weights that are set arbitrarily at the
beginning. Therefore, it is a good idea to define the learning effect of the processing done at a
neuron as a change of weights, denoted by ∆w.
Obviously, the learning effect of a certain input is proportional to the strength of the input
vector X and to a function of the output generated, denoted by g(output). Therefore, ∆w can be
written as
∆w ∝ g(output)X
Thus
∆w = η g(output)X
where η is the learning constant. This is the most general form of the learning rule for training
an ANN.
Note also that ∆w stands for the difference between the new weights and the old weights. Thus
∆w = Wnew − Wold
All the learning rules (e.g. the Hebbian, Perceptron and Delta learning rules) are formed by
replacing g(output) with different terms. For example, the Hebbian learning rule is formed by
setting g(output) = f(net). In contrast, the Delta learning rule sets g(output) = [d − f(net)]f′(net),
where d is the desired output. Next we discuss these learning rules separately.
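This substitution pattern can be sketched in a few lines of Python (our own illustration; the function and variable names are not from the lesson):

```python
import numpy as np

def weight_update(w, x, g_output, eta=0.01):
    """General learning rule: w_new = w + eta * g(output) * X."""
    return w + eta * g_output * x

# Hypothetical numbers: plugging in g(output) = f(net) gives Hebbian learning
w = np.array([-0.1, 0.1])
x = np.array([0.0, 1.0])
net = float(w @ x)                    # net = 0.1
f_net = 1.0 / (1.0 + np.exp(-net))    # unipolar continuous threshold
w_new = weight_update(w, x, f_net)
print(w_new)                          # second weight grows by eta * f(net)
```

Swapping in a different g(output) (for example d − f(net) for the perceptron) reuses the same update skeleton.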
Learning algorithms can be broadly classified into two categories:
• Supervised training
• Unsupervised training
In supervised training we are given both the input and the desired output as the training data.
Therefore, the objective of the learning session is to achieve the desired output. In general
there is always at least a small difference (error) between the desired output and the actual
output generated by f(net). Many real-world problems, such as identification of a signature, are
always solved by knowing what the output should be. Therefore, applications such as recognition
of images, objects and patterns, as well as forecasting, require the training session to be
conducted in the supervised mode.
On the other hand, in unsupervised learning we do not know what the output should be, so there
is no concept of a desired output. This is very much like reading a book without having an idea
of what is to be extracted; in such an unsupervised learning session, the findings can be
anything. Ordinary reading, watching a drama, etc., are always done in an unsupervised manner.
Unsupervised training can lead to innovative findings that were not even thought of before.
Therefore, learning algorithms can be broadly classified as supervised and unsupervised training
algorithms.
The Hebbian learning rule is obtained by setting g(output) = f(net):
∆w = ηf(net)X
As per the Hebbian learning rule, the following key features can be noted. The Hebbian learning
rule is an unsupervised learning algorithm, since it does not contain any term involving the
desired output. Secondly, there is no restriction on the choice of activation function; it can use
both binary and continuous activation functions. More importantly, this rule can be used for any
network architecture comprising multiple layers.
Example 5.1
Use the Hebbian learning rule to train the two inputs X1 = [0, 1] and X2 = [1, 1] on the ANN
shown in Figure 5.1. You may use the unipolar continuous threshold function.
[Figure 5.1 – ANN for Example 5.1: inputs 0 and 1 feeding neurons A and B; initial weights
WA = [−0.1, 0.1], WB = [0.1, 0.2]]
Then apply the threshold function output = f(net) = 1/(1 + e^(−λ·net)) with λ = 1 for neurons A
and B. Now apply the Hebbian learning rule ∆w = ηf(net)X to calculate the weight changes on
neurons A and B. Assume that η = 0.01.
For neuron A, with netA = 0.1 and f(netA) ≈ 0.52:
WAnew − WAold = 0.01 × 0.52 × [0, 1] = [0, 0.0052]
WAnew − [−0.1, 0.1] = [0, 0.0052]
WAnew = [−0.1, 0.1052]
Similarly, for neuron B, with netB = 0.2 and f(netB) ≈ 0.55:
WBnew − WBold = 0.01 × 0.55 × [0, 1] = [0, 0.0055]
WBnew − [0.1, 0.2] = [0, 0.0055]
WBnew = [0.1, 0.2055]
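The arithmetic above can be checked with a short script (a sketch using our own names; note that f(0.2) works out to about 0.55):

```python
import numpy as np

def f(net, lam=1.0):
    """Unipolar continuous threshold function with lambda = 1."""
    return 1.0 / (1.0 + np.exp(-lam * net))

x1 = np.array([0.0, 1.0])
eta = 0.01
weights = {"A": np.array([-0.1, 0.1]), "B": np.array([0.1, 0.2])}
updated = {}

for name, w in weights.items():
    net = float(w @ x1)
    out = f(net)
    updated[name] = w + eta * out * x1   # Hebbian rule: w + eta * f(net) * X
    print(name, round(out, 2), np.round(updated[name], 4))
```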
Thus Figure 5.2 shows the network with the updated weights after learning the input X1.
[Figure 5.2 – Network after learning X1: WA = [−0.1, 0.1052], WB = [0.1, 0.2055]]
With these new weights, we can now apply the second input X2 = [1, 1] and calculate net, f(net)
and the new weights.
Exercise 5.1
Complete the Example 5.1 by finding the final weights after training the input X2.
An important observation
Note that the weight updates produced by learning rules are generally very small. For example,
an initial weight of 0.1 on A has been updated to 0.1052; the increase is only 0.0052. All
learning rules are designed to make small increments to the weights. This is guaranteed by
using threshold functions whose values are always less than one. The value of η (<< 1) also
helps keep ∆w small. In addition, the use of very small weight values, as a combination of
positive and negative numbers, also tends to keep ∆w small.
5.5 Perceptron
Perceptron is the name of the first supervised training algorithm. It was proposed by Frank
Rosenblatt (1958), about ten years after the Hebbian learning algorithm. The perceptron
learning rule is obtained by setting g(output) = [d − f(net)]:
∆w = η[d − f(net)]X
Here d is the desired output, which is essential for supervised training algorithms. In addition,
here f(net) is restricted to binary threshold functions only.
Some books do not mention the learning constant η, yet we can still use it without any effect
on the fundamentals of the perceptron learning rule.
It should also be noted that the perceptron is restricted to a single-layer ANN with an arbitrary
number of neurons.
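As a minimal sketch of the rule (our own illustration with hypothetical numbers; we assume a unipolar binary threshold, which satisfies the restriction to binary activation functions):

```python
import numpy as np

def f(net):
    """Unipolar binary threshold: 1 if net > 0, else 0."""
    return 1.0 if net > 0 else 0.0

def perceptron_update(w, x, d, eta=0.01):
    """Perceptron rule: w_new = w + eta * (d - f(net)) * X."""
    return w + eta * (d - f(float(w @ x))) * x

# Hypothetical single-neuron example
w = np.array([0.5, -0.5])
x = np.array([1.0, 1.0])
w_new = perceptron_update(w, x, d=1.0)
print(w_new)
```

Because d appears in the update, the rule needs the desired output; this is what makes it supervised.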
Let us discuss how perceptron can be used to model some real world problems.
Example 5.2
Use a perceptron with two neurons (Figure 5.3) to train the input-pair X1 = [1, −1], D1 = [0.9, 0.8].
Note that the term input-pair has been used to refer to input together with its desired output.
[Figure: inputs 1 and −1 feed neurons A and B; initial weights WA = [0.1, 0.1], WB = [0.1, −0.2];
desired outputs 0.9 (A) and 0.8 (B)]
Figure 5.3 – Using perceptron learning
Since the desired output is [0.9, 0.8], we expect 0.9 as the output from A and 0.8 as the output
from B.
Therefore
outputA = f(netA)
= f(0)
=0
outputB = f(netB)
= f(0.3)
=1
Now apply the perceptron learning rule ∆w = η[d − f(net)]X to calculate the weight changes on
neurons A and B. Let us also assume that η = 0.01.
For neuron A:
WAnew − WAold = 0.01 × [0.9 − 0] × [1, −1] = [0.009, −0.009]
WAnew − [0.1, 0.1] = [0.009, −0.009]
WAnew = [0.109, 0.091]
Similarly, for neuron B:
WBnew − WBold = 0.01 × [0.8 − 1] × [1, −1] = [−0.002, 0.002]
WBnew − [0.1, −0.2] = [−0.002, 0.002]
WBnew = [0.098, −0.198]
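The weight updates can be verified numerically (a sketch; we assume the unipolar binary threshold, strict at zero, and the initial weights as inferred from the worked calculation):

```python
import numpy as np

def f(net):
    """Unipolar binary threshold: 1 if net > 0, else 0."""
    return 1.0 if net > 0 else 0.0

eta = 0.01
x1 = np.array([1.0, -1.0])
desired = {"A": 0.9, "B": 0.8}
w_old = {"A": np.array([0.1, 0.1]), "B": np.array([0.1, -0.2])}
results = {}

for name in ("A", "B"):
    out = f(float(w_old[name] @ x1))          # A: f(0) = 0, B: f(0.3) = 1
    results[name] = w_old[name] + eta * (desired[name] - out) * x1
    print(name, out, np.round(results[name], 3))
```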
Hence the updated weights under perceptron learning are shown in Figure 5.4.
[Figure 5.4 – Perceptron after learning X1: WA = [0.109, 0.091], WB = [0.098, −0.198]]
Exercise 5.2
Use a perceptron with two neurons to train the input-pair X1 = [1, 1, −1],
D1 = [0.9, 0.8]. You may use the bipolar binary threshold function.
Note that so far we have not discussed the design of ANN. It will be covered elsewhere in the
module, after discussing some more learning algorithms/rules.
Activity 5.1
Draw a flowchart to depict steps in training an ANN.
Since the perceptron was the very first supervised training algorithm, many people criticised it.
Among others, in the early 1960s Marvin Minsky criticised the perceptron for its limited capacity
to model real-world problems. This issue is also known as the linear separability issue of the
perceptron.
For this criticism, Minsky selected very simple real-world models of the logical operations AND,
OR and XOR. He showed that even a function as simple as XOR cannot be modelled by a
perceptron. His proof is not discussed here. Based on this criticism, Minsky argued that ANN is a
useless technology. As a result, ANN research did not move forward for about 26 years.
However, in the mid-1980s researchers showed that XOR can be modelled by a two-layer ANN,
going beyond the perceptron. Thus Minsky could not argue that ANN as a technology is useless;
at most, he could have said that the perceptron has limitations.
In the 1980s, the multi-layer ANN was a breakthrough in the field of AI. It was shown that an
ANN with three layers can model any real-world problem. With regard to ANN, the higher the
number of layers, the higher the representational power.
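The claim that a two-layer network handles XOR can be illustrated with hand-picked weights (our own choice of weights and thresholds, not a trained network): a hidden layer computes OR and AND, and the output neuron fires when OR is true but AND is not.

```python
def step(net):
    """Binary threshold: 1 if net > 0, else 0."""
    return 1 if net > 0 else 0

def xor(x1, x2):
    # Hidden layer: one neuron computes OR, the other computes AND
    h_or = step(x1 + x2 - 0.5)
    h_and = step(x1 + x2 - 1.5)
    # Output neuron: fires for "OR and not AND"
    return step(h_or - h_and - 0.5)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor(a, b))
```

No single-layer weight choice can separate XOR's classes with one line, which is exactly the linear separability issue.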
5.8 Summary
This lesson began the discussion of learning rules for training ANN. As such, we discussed the
Hebbian learning rule and the perceptron learning rule. Hebbian learning is used for
unsupervised training, while the perceptron is used for supervised training. We also pointed out
that the perceptron was criticised by Minsky for its limited capacity to model real-world
problems. It was also pointed out that a multi-layer ANN is capable of modelling any real-world
problem.