
Lesson 05 – Hebbian Algorithm and Perceptrons

5.1 Introduction
Lesson 4 discussed the McCulloch-Pitts model of ANN. That lesson primarily covered how we model
the processing inside a neuron, based on the concepts of inputs, weights, net, threshold
function and output. This lesson covers how we model the role of connections to implement
the learning effect in an ANN. As such, this lesson presents two fundamental learning algorithms:
Hebbian learning and the perceptron. We will also discuss the limitations of the perceptron learning rule.

5.2 Role of Connections in ANN


An ANN is a model of the human brain. It is a network of neurons: neurons work as processors while
connections work as memories. We have already discussed how the processing inside a neuron can
be modelled; this is what is known as the McCulloch-Pitts model.

Next we should discuss how we can model the role of connections in an ANN. In other words, how
we can model the learning effect of an input processed at a neuron. Obviously, processing alone
does not make sense if we do not somehow implement the learning effect. Since connections
are supposed to work as memories, it is very natural to do a kind of modelling on connections.

According to the McCulloch-Pitts model, connections hold weights that are set arbitrarily at the
beginning. It is therefore natural to define the learning effect of the processing
done at a neuron as a change of weights, denoted by ∆w.

5.3 General form of learning algorithms


Expressions written for ∆w are called learning algorithms or learning rules for training an ANN. In
other words, during an ANN training session, we change the weights of the connections according to the
input processed at the neuron. This continues until we finish the training data set. The
cumulative weight change represents the learning effect of the whole training session.

Obviously, the learning effect of a given input is proportional to the strength of the input
vector X and to a function of the output generated, denoted g(output). Therefore, ∆w can be
written as

∆w ∝ g(output)X

Thus

∆w = η g(output)X, where η is called the learning constant.

This is the most general form of the learning rule for training an ANN.
Note also that ∆w stands for the difference between the new weights and the old weights. Thus

∆w = Wnew - Wold

All the learning rules (e.g. the Hebbian, perceptron and delta learning rules) are formed by replacing
g(output) with different terms. For example, the Hebbian learning rule is formed by setting
g(output) = f(net). In contrast, the delta learning rule sets g(output) = [d - f(net)]f′(net), where d is
the desired output. Next we discuss these learning rules separately.
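The general form above can be sketched as a small function where each algorithm supplies its own g(output) value. This is a minimal sketch; the function name `general_update` is illustrative, not from any library.

```python
def general_update(w, x, g_output, eta=0.01):
    """One step of the general rule: delta_w = eta * g(output) * X,
    so W_new = W_old + delta_w (element-wise over the input vector X)."""
    return [wi + eta * g_output * xi for wi, xi in zip(w, x)]
```

Hebbian learning would supply g_output = f(net), while the perceptron rule would supply g_output = d - f(net).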

5.4 Modes of training


Training of an ANN depends on what is called the mode of training. Fundamentally, there are two
modes of training:

• Supervised training
• Unsupervised training

In supervised training we are given both the input and the desired output as the training data. Therefore,
the objective of the learning session is to achieve the desired output. In general there is always at
least a small difference (error) between the desired output and the actual output generated by
f(net). Many real-world problems, such as signature verification, are always solved knowing
what the output should be. Therefore, applications such as recognition of images, objects and
patterns, and forecasting, require the training session to be conducted in the supervised mode.

On the other hand, in unsupervised learning we do not know what the output should be, so there is
no concept of a desired output. This is very much like reading a book
without having an idea about what is to be extracted; in such an unsupervised learning session,
the findings can be anything. Ordinary reading, watching a drama, etc. are always done in an unsupervised
manner. Unsupervised training can lead to innovative findings that were not even thought of
before.

Therefore, learning algorithms can be broadly classified as supervised and unsupervised training
algorithms.

5.4 Hebbian learning rule


In 1949, Donald Hebb introduced the first ever learning algorithm for training an ANN. It should be
noted that although McCulloch and Pitts modelled the processing inside a neuron in 1943, it took six years to
model the learning effect in an ANN. Donald Hebb proposed the following rule for training an ANN.

∆w = ηf(net)X

From the Hebbian learning rule, the key features can be noted. The Hebbian learning rule is an
unsupervised learning algorithm, since it does not include any term for the desired output.
Secondly, for the Hebbian learning rule, there is no restriction on the choice of activation function:
it can use both binary and continuous activation functions. More importantly, this rule can be
used for any network architecture, including those comprising multiple layers.

Example 5.1
Use the Hebbian learning rule to train the two inputs X1 = [0, 1] and X2 = [1, 1] on the ANN shown in
Figure 5.1. You may use the unipolar continuous threshold function.
[Figure: input x1 = 0 feeds neurons A and B with weights -0.1 and 0.1; input x2 = 1 feeds A and B with weights 0.1 and 0.2]

Figure 5.1 – ANN to apply Hebbian learning

Apply net = ∑ xiwi for the neurons A and B:

netA = 0 × (-0.1) + 1 × 0.1 = 0.1

netB = 0 × 0.1 + 1 × 0.2 = 0.2

Then apply output = f(net) = 1/(1 + e^(-λnet)) with λ = 1 for neurons A and B:

outputA = f(netA) = 1/(1 + e^(-0.1)) = 0.52

outputB = f(netB) = 1/(1 + e^(-0.2)) = 0.55

Now apply the Hebbian learning rule ∆w = ηf(net)X to calculate the weight change on neurons A and B.
Assume that η = 0.01.

∆wA = 0.01 × f(netA) × [0, 1]

WnewA - WoldA = 0.01 × 0.52 × [0, 1] = [0, 0.0052]

WnewA = [-0.1, 0.1] + [0, 0.0052] = [-0.1, 0.1052]

Similarly, we can find the weight update on neuron B:

∆wB = 0.01 × f(netB) × [0, 1]

WnewB - WoldB = 0.01 × 0.55 × [0, 1] = [0, 0.0055]

WnewB = [0.1, 0.2] + [0, 0.0055] = [0.1, 0.2055]

Thus Figure 5.2 shows the network with the updated weights after learning the input X1.

[Figure: neuron A with weights -0.1 and 0.1052; neuron B with weights 0.1 and 0.2055]

Figure 5.2 – Updated weights after learning the input X1

With these new weights, we can now apply the second input X2 = [1, 1] and calculate net, f(net)
and the new weights.
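The hand calculation of Example 5.1 can be checked numerically. A minimal sketch follows (note that netB = 0 × 0.1 + 1 × 0.2 = 0.2; the code keeps full precision instead of rounding f(net) to two decimals, so the last digit can differ slightly from the hand calculation):

```python
import math

def f(net, lam=1.0):
    """Unipolar continuous threshold (sigmoid) with lambda = 1."""
    return 1.0 / (1.0 + math.exp(-lam * net))

eta = 0.01
x1 = [0, 1]
w_A = [-0.1, 0.1]   # initial weights on A (Figure 5.1)
w_B = [0.1, 0.2]    # initial weights on B (Figure 5.1)

net_A = sum(x * w for x, w in zip(x1, w_A))   # 0.1
net_B = sum(x * w for x, w in zip(x1, w_B))   # 0.2

# Hebbian rule: delta_w = eta * f(net) * X
w_A = [w + eta * f(net_A) * x for w, x in zip(w_A, x1)]
w_B = [w + eta * f(net_B) * x for w, x in zip(w_B, x1)]
print(w_A)   # approximately [-0.1, 0.1052]
print(w_B)   # approximately [0.1, 0.2055]
```

Training the second input X2 = [1, 1] is just another pass of the same two update lines starting from these weights.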

Exercise 5.1
Complete Example 5.1 by finding the final weights after training the input X2.

An important observation
Note that the weight updates made by learning rules are generally very small. For example, the initial
weight 0.1 on A was updated to 0.1052, an increase of 0.0052. All learning rules are
designed to make small increments to the weights. This is guaranteed by using
threshold functions whose values are always less than one. On the other hand, the value
of η (<< 1) also contributes to keeping ∆w small. In addition, the use of very small weight values, a
combination of positive and negative numbers, also helps keep ∆w small.

5.5 Perceptron

Perceptron is the name of the first supervised training algorithm. It was proposed by Frank
Rosenblatt (1958), about ten years after the Hebbian learning
algorithm.

Perceptron learning takes a very simple form:

∆w = η[d - f(net)]X

Here d is the desired output, which is essential for supervised training algorithms. In addition,
here f(net) is restricted to binary threshold functions.

Some books do not mention the learning constant η, yet we can still use it without any
effect on the fundamentals of the perceptron learning rule.

It should also be noted that the perceptron is restricted to a single-layer ANN with an arbitrary
number of neurons.
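The rule can be written directly in code. This is a minimal sketch assuming the unipolar binary threshold function (1 if net > 0, else 0); the function names are illustrative.

```python
def step(net):
    """Unipolar binary threshold: 1 if net > 0, else 0."""
    return 1 if net > 0 else 0

def perceptron_update(w, x, d, eta=0.01):
    """Perceptron rule: delta_w = eta * (d - f(net)) * X."""
    net = sum(xi * wi for xi, wi in zip(x, w))
    error = d - step(net)          # d is the desired output
    return [wi + eta * error * xi for wi, xi in zip(w, x)]
```

When the neuron already produces the desired output, error = 0 and the weights are unchanged; otherwise the weights move in proportion to the error and the input.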

Let us discuss how perceptron can be used to model some real world problems.

Example 5.2
Use a perceptron with two neurons (Figure 5.3) to train the input pair X1 = [1, -1], D1 = [0.9, 0.8].
Note that the term input pair refers to an input together with its desired output.

[Figure: input x1 = 1 feeds neurons A and B with weights -0.1 and 0.1; input x2 = -1 feeds A and B with weights -0.1 and -0.2; desired outputs 0.9 for A, 0.8 for B]

Figure 5.3 – Using perceptron learning

Since the desired output is [0.9, 0.8], we are expecting 0.9 as the output from A, and 0.8 as the
output from B.

Let us see what we get as the outputs of A and B.

Apply net = ∑ xiwi for the neurons A and B:

netA = 1 × (-0.1) + (-1) × (-0.1) = 0

netB = 1 × 0.1 + (-1) × (-0.2) = 0.3

Since the perceptron uses only binary functions, let us apply the unipolar binary function:

f(net) = 0 if net ≤ 0
         1 if net > 0

Therefore

outputA = f(netA) = f(0) = 0

outputB = f(netB) = f(0.3) = 1

Now apply the perceptron learning rule ∆w = η[d - f(net)]X to calculate the weight change on neurons A
and B. Let us also assume that η = 0.01.

∆wA = 0.01 × [0.9 - f(netA)] × [1, -1]
    = 0.01 × [0.9 - 0] × [1, -1]
    = [0.009, -0.009]

WnewA = [-0.1, -0.1] + [0.009, -0.009] = [-0.091, -0.109]

Similarly

∆wB = 0.01 × [0.8 - f(netB)] × [1, -1]
    = 0.01 × [0.8 - 1] × [1, -1]
    = [-0.002, 0.002]

WnewB = [0.1, -0.2] + [-0.002, 0.002] = [0.098, -0.198]

Hence the updated weights under perceptron learning are shown in Figure 5.4.

[Figure: neuron A with weights -0.091 and -0.109; neuron B with weights 0.098 and -0.198]

Figure 5.4 – Updated weights under perceptron learning
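The hand calculation above can be verified numerically. A minimal sketch follows (note that netA = 1 × (-0.1) + (-1) × (-0.1) = 0, so neuron A's error d - f(net) is large while B's is small):

```python
def step(net):
    """Unipolar binary threshold: 1 if net > 0, else 0."""
    return 1 if net > 0 else 0

eta = 0.01
x1, d1 = [1, -1], [0.9, 0.8]          # input pair from Example 5.2
w_A = [-0.1, -0.1]                     # initial weights on A
w_B = [0.1, -0.2]                      # initial weights on B

for name, w, d in [("A", w_A, d1[0]), ("B", w_B, d1[1])]:
    net = sum(xi * wi for xi, wi in zip(x1, w))
    error = d - step(net)              # perceptron error term d - f(net)
    w[:] = [wi + eta * error * xi for wi, xi in zip(w, x1)]
    print(name, w)
# A -> [-0.091, -0.109], B -> [0.098, -0.198]
```
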

Exercise 5.2
Use a perceptron with two neurons to train the input pair X1 = [1, 1, -1],
D1 = [0.9, 0.8]. You may use the bipolar binary threshold function.

5.6 Steps in Training an ANN


We have already used two learning algorithms, namely Hebbian and perceptron, for training an
ANN. Based on this experience, we can identify the following key steps in training an ANN in
general.

Step 1: Design a network architecture
Step 2: Initialize weights
Step 3: Apply an input
Step 4: Calculate net using net = ∑ xiwi
Step 5: Calculate output using output = f(net)
Step 6: Calculate new weights using a suitable learning rule
Step 7: With the new weights, go to Step 3
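The steps above can be sketched as a small training loop. This is a minimal sketch using the Hebbian rule as the "suitable learning rule" and a sigmoid activation; the function name `train` is illustrative.

```python
import math

def f(net, lam=1.0):
    """Unipolar continuous threshold (sigmoid)."""
    return 1.0 / (1.0 + math.exp(-lam * net))

def train(w, inputs, eta=0.01, epochs=1):
    """Repeat the apply/compute/update cycle over the training data."""
    for _ in range(epochs):
        for x in inputs:                                         # apply an input
            net = sum(xi * wi for xi, wi in zip(x, w))           # net = sum(x_i * w_i)
            out = f(net)                                         # output = f(net)
            w = [wi + eta * out * xi for wi, xi in zip(w, x)]    # Hebbian weight update
        # then loop back with the new weights
    return w
```

For example, `train([-0.1, 0.1], [[0, 1], [1, 1]])` runs one pass of Example 5.1's neuron A over both inputs.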

Note that so far we have not discussed the design of ANN architectures. It will be covered elsewhere in the
module, after discussing some more learning algorithms/rules.

Activity 5.1
Draw a flowchart to depict steps in training an ANN.

5.7 Minsky’s criticism

Since the perceptron was the very first supervised training algorithm, many people criticised it.
Most notably, in 1969, Marvin Minsky (with Seymour Papert) criticised the perceptron for its limited
capacity to model real-world problems. This issue is also known as the linear separability problem of the
perceptron.

For this criticism, Minsky selected very simple real-world models of the logical operations AND, OR
and XOR. He showed that even such a simple function as XOR cannot be modelled by a perceptron. His
proof is not discussed here. Based on this criticism, Minsky argued that ANN was a useless technology.
As a result, ANN research stalled for nearly two decades.

However, in the mid-1980s, researchers showed that XOR can be modelled by a two-layer ANN, by
going beyond the perceptron. Thus Minsky could not argue that ANN as a technology is useless;
perhaps he could only have said that the perceptron has limitations.

In the 1980s, the multi-layer ANN was a breakthrough in the field of AI. It was shown that an ANN with three
layers can model any real-world problem. With regard to ANN, the higher the number of layers, the higher
the representational power.
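The two-layer solution to XOR can be made concrete with hand-picked weights. This is a minimal sketch, not taken from the lesson: the two hidden units act as OR and AND detectors, and the thresholds (0.5, 1.5) are assumptions chosen for illustration.

```python
def step(net):
    """Binary threshold: 1 if net > 0, else 0."""
    return 1 if net > 0 else 0

def xor_net(x1, x2):
    """Two-layer network computing XOR with fixed weights."""
    h1 = step(x1 + x2 - 0.5)    # hidden unit 1: fires for OR
    h2 = step(x1 + x2 - 1.5)    # hidden unit 2: fires for AND
    return step(h1 - h2 - 0.5)  # output: OR and not AND = XOR
```

No single-layer perceptron can produce this truth table, because no single line separates {(0,1), (1,0)} from {(0,0), (1,1)} in the input plane; the hidden layer provides the extra boundary.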

5.8 Summary
This lesson began our discussion of learning rules for training an ANN. We discussed the Hebbian
learning rule and the perceptron learning rule: Hebbian learning is used for unsupervised training
while the perceptron is for supervised training. We also pointed out that the perceptron was
criticised by Minsky for its limited capacity to model real-world problems, and that a
multi-layer ANN is capable of modelling any real-world problem.
