
Lecture 7: Artificial neural networks: Supervised learning

• Introduction, or how the brain works
• The neuron as a simple computing element
• The perceptron
• Multilayer neural networks
• Accelerated learning in multilayer neural networks
• The Hopfield network
• Bidirectional associative memories (BAM)
• Summary

© Negnevitsky, Pearson Education, 2005

Introduction, or how the brain works

Machine learning involves adaptive mechanisms that enable computers to learn from experience, learn by example and learn by analogy. Learning capabilities can improve the performance of an intelligent system over time. The most popular approaches to machine learning are artificial neural networks and genetic algorithms. This lecture is dedicated to neural networks.

• A neural network can be defined as a model of reasoning based on the human brain. The brain consists of a densely interconnected set of nerve cells, or basic information-processing units, called neurons.

• The human brain incorporates nearly 10 billion neurons and 60 trillion connections, synapses, between them. By using multiple neurons simultaneously, the brain can perform its functions much faster than the fastest computers in existence today.

• Each neuron has a very simple structure, but an army of such elements constitutes a tremendous processing power.

• A neuron consists of a cell body, soma; a number of fibers called dendrites; and a single long fiber called the axon.

[Figure: biological neural network, showing the soma, dendrites, axon and synapses of interconnected neurons]

• Our brain can be considered as a highly complex, non-linear and parallel information-processing system.

• Information is stored and processed in a neural network simultaneously throughout the whole network, rather than at specific locations. In other words, in neural networks both data and its processing are global rather than local.

• Learning is a fundamental and essential characteristic of biological neural networks. The ease with which they can learn led to attempts to emulate a biological neural network in a computer.
• An artificial neural network consists of a number of very simple processors, also called neurons, which are analogous to the biological neurons in the brain.

• The neurons are connected by weighted links passing signals from one neuron to another.

• The output signal is transmitted through the neuron's outgoing connection. The outgoing connection splits into a number of branches that transmit the same signal. The outgoing branches terminate at the incoming connections of other neurons in the network.

[Figure: architecture of a typical artificial neural network, with input signals entering an input layer, a middle layer, and an output layer emitting output signals]

Analogy between biological and artificial neural networks

Biological neural network    Artificial neural network
Soma                         Neuron
Dendrite                     Input
Axon                         Output
Synapse                      Weight

The neuron as a simple computing element

[Figure: diagram of a neuron, with input signals x1, x2, ..., xn arriving through links weighted w1, w2, ..., wn and a single output signal Y]

• The neuron computes the weighted sum of the input signals and compares the result with a threshold value, θ. If the net input is less than the threshold, the neuron output is -1. But if the net input is greater than or equal to the threshold, the neuron becomes activated and its output attains a value +1.

• The neuron uses the following transfer or activation function:

  X = \sum_{i=1}^{n} x_i w_i,   Y = +1 if X ≥ θ;  Y = -1 if X < θ

• This type of activation function is called a sign function.

Activation functions of a neuron

[Figure: graphs of the step, sign, sigmoid and linear activation functions]

  Y^{step} = 1 if X ≥ 0;  0 if X < 0
  Y^{sign} = +1 if X ≥ 0;  -1 if X < 0
  Y^{sigmoid} = \frac{1}{1 + e^{-X}}
  Y^{linear} = X
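The four activation functions and the threshold neuron above can be sketched in a few lines of Python (a minimal illustration, not code from the lecture):

```python
import math

def step(x):
    """Step function: 1 if x >= 0, else 0."""
    return 1 if x >= 0 else 0

def sign(x):
    """Sign function: +1 if x >= 0, else -1."""
    return 1 if x >= 0 else -1

def sigmoid(x):
    """Sigmoid function: squashes any net input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def linear(x):
    """Linear function: output equals the net input."""
    return x

def neuron(inputs, weights, theta, activation=sign):
    """Weighted sum of inputs compared with threshold theta."""
    net = sum(x * w for x, w in zip(inputs, weights)) - theta
    return activation(net)

print(neuron([1, 1], [0.5, 0.5], 0.7))   # net = 0.3 >= 0, so output is +1
print(neuron([1, 0], [0.5, 0.5], 0.7))   # net = -0.2 < 0, so output is -1
```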
Can a single neuron learn a task?

• In 1958, Frank Rosenblatt introduced a training algorithm that provided the first procedure for training a simple ANN: a perceptron.

• The perceptron is the simplest form of a neural network. It consists of a single neuron with adjustable synaptic weights and a hard limiter.

[Figure: single-layer two-input perceptron, with inputs x1 and x2 weighted by w1 and w2 feeding a linear combiner, followed by a hard limiter with threshold θ that produces the output Y]

The perceptron

• The operation of Rosenblatt's perceptron is based on the McCulloch and Pitts neuron model. The model consists of a linear combiner followed by a hard limiter.

• The weighted sum of the inputs is applied to the hard limiter, which produces an output equal to +1 if its input is positive and -1 if it is negative.

Linear separability in the perceptrons

• The aim of the perceptron is to classify inputs, x1, x2, ..., xn, into one of two classes, say A1 and A2.

• In the case of an elementary perceptron, the n-dimensional space is divided by a hyperplane into two decision regions. The hyperplane is defined by the linearly separable function:

  \sum_{i=1}^{n} x_i w_i - θ = 0

[Figure: (a) a two-input perceptron separates classes A1 and A2 by the line x1w1 + x2w2 - θ = 0; (b) a three-input perceptron separates them by the plane x1w1 + x2w2 + x3w3 - θ = 0]

How does the perceptron learn its classification tasks?

This is done by making small adjustments in the weights to reduce the difference between the actual and desired outputs of the perceptron. The initial weights are randomly assigned, usually in the range [-0.5, 0.5], and then updated to obtain an output consistent with the training examples.
• If at iteration p the actual output is Y(p) and the desired output is Yd(p), then the error is given by:

  e(p) = Yd(p) - Y(p),   where p = 1, 2, 3, ...

  Iteration p here refers to the pth training example presented to the perceptron.

• If the error, e(p), is positive, we need to increase perceptron output Y(p), but if it is negative, we need to decrease Y(p).

The perceptron learning rule

  w_i(p + 1) = w_i(p) + α · x_i(p) · e(p)

where α is the learning rate, a positive constant less than unity. The perceptron learning rule was first proposed by Rosenblatt in 1960. Using this rule we can derive the perceptron training algorithm for classification tasks.

Perceptron's training algorithm

Step 1: Initialisation
Set initial weights w1, w2, ..., wn and threshold θ to random numbers in the range [-0.5, 0.5].

Perceptron's training algorithm (continued)

Step 2: Activation
Activate the perceptron by applying inputs x1(p), x2(p), ..., xn(p) and desired output Yd(p). Calculate the actual output at iteration p = 1:

  Y(p) = step[ \sum_{i=1}^{n} x_i(p) w_i(p) - θ ]

where n is the number of the perceptron inputs, and step is a step activation function.

Step 3: Weight training
Update the weights of the perceptron:

  w_i(p + 1) = w_i(p) + Δw_i(p)

where Δw_i(p) is the weight correction at iteration p. The weight correction is computed by the delta rule:

  Δw_i(p) = α · x_i(p) · e(p)

Step 4: Iteration
Increase iteration p by one, go back to Step 2 and repeat the process until convergence.

Example of perceptron learning: the logical operation AND
Threshold: θ = 0.2; learning rate: α = 0.1

Epoch  Inputs   Desired    Initial      Actual   Error   Final
       x1  x2   output Yd  weights      output Y  e      weights
                           w1    w2                      w1    w2
  1    0   0    0          0.3  -0.1    0         0      0.3  -0.1
       0   1    0          0.3  -0.1    0         0      0.3  -0.1
       1   0    0          0.3  -0.1    1        -1      0.2  -0.1
       1   1    1          0.2  -0.1    0         1      0.3   0.0
  2    0   0    0          0.3   0.0    0         0      0.3   0.0
       0   1    0          0.3   0.0    0         0      0.3   0.0
       1   0    0          0.3   0.0    1        -1      0.2   0.0
       1   1    1          0.2   0.0    1         0      0.2   0.0
  3    0   0    0          0.2   0.0    0         0      0.2   0.0
       0   1    0          0.2   0.0    0         0      0.2   0.0
       1   0    0          0.2   0.0    1        -1      0.1   0.0
       1   1    1          0.1   0.0    0         1      0.2   0.1
  4    0   0    0          0.2   0.1    0         0      0.2   0.1
       0   1    0          0.2   0.1    0         0      0.2   0.1
       1   0    0          0.2   0.1    1        -1      0.1   0.1
       1   1    1          0.1   0.1    1         0      0.1   0.1
  5    0   0    0          0.1   0.1    0         0      0.1   0.1
       0   1    0          0.1   0.1    0         0      0.1   0.1
       1   0    0          0.1   0.1    0         0      0.1   0.1
       1   1    1          0.1   0.1    1         0      0.1   0.1
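The five training epochs in the AND table above can be reproduced with a short Python sketch of the training algorithm (an illustration, not code from the lecture; the net input is rounded before the threshold test to guard against floating-point noise when it should be exactly zero):

```python
def step(x):
    return 1 if x >= 0 else 0

def train_perceptron(examples, w, theta, alpha, max_epochs=100):
    """Perceptron training (Steps 2-4): delta rule w_i += alpha * x_i * e."""
    for epoch in range(1, max_epochs + 1):
        converged = True
        for inputs, desired in examples:
            # Step 2: activation (round so that a net input of exactly 0 fires)
            net = round(sum(x * wi for x, wi in zip(inputs, w)) - theta, 6)
            y = step(net)
            # Step 3: weight training by the delta rule
            error = desired - y
            if error != 0:
                converged = False
                w = [wi + alpha * x * error for x, wi in zip(inputs, w)]
        if converged:
            return w, epoch
    return w, max_epochs

# Logical AND with the lecture's initial weights, threshold and learning rate
and_set = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, epochs = train_perceptron(and_set, w=[0.3, -0.1], theta=0.2, alpha=0.1)
print(weights, epochs)   # final weights approach [0.1, 0.1] after 5 epochs
```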
Two-dimensional plots of basic logical operations

[Figure: decision regions in the (x1, x2) plane for (a) AND (x1 ∧ x2), (b) OR (x1 ∨ x2) and (c) Exclusive-OR (x1 ⊕ x2)]

A perceptron can learn the operations AND and OR, but not Exclusive-OR.

Multilayer neural networks

• A multilayer perceptron is a feedforward neural network with one or more hidden layers.

• The network consists of an input layer of source neurons, at least one middle or hidden layer of computational neurons, and an output layer of computational neurons.

• The input signals are propagated in a forward direction on a layer-by-layer basis.

[Figure: multilayer perceptron with two hidden layers — input signals pass through the input layer, a first and second hidden layer, and the output layer emits the output signals]

What does the middle layer hide?

• A hidden layer "hides" its desired output. Neurons in the hidden layer cannot be observed through the input/output behaviour of the network. There is no obvious way to know what the desired output of the hidden layer should be.

• Commercial ANNs incorporate three and sometimes four layers, including one or two hidden layers. Each layer can contain from 10 to 1000 neurons. Experimental neural networks may have five or even six layers, including three or four hidden layers, and utilise millions of neurons.

• Learning in a multilayer network proceeds the same way as for a perceptron.

• A training set of input patterns is presented to the network.

• The network computes its output pattern, and if there is an error (in other words, a difference between actual and desired output patterns), the weights are adjusted to reduce this error.

Back-propagation neural network

• In a back-propagation neural network, the learning algorithm has two phases.

• First, a training input pattern is presented to the network input layer. The network propagates the input pattern from layer to layer until the output pattern is generated by the output layer.

• If this pattern is different from the desired output, an error is calculated and then propagated backwards through the network from the output layer to the input layer. The weights are modified as the error is propagated.
Three-layer back-propagation neural network

[Figure: three-layer network — input signals x1, ..., xn enter the input layer; hidden neuron j receives weight wij from input neuron i; output neuron k receives weight wjk from hidden neuron j and emits yk; error signals travel backwards from the output layer]

The back-propagation training algorithm

Step 1: Initialisation
Set all the weights and threshold levels of the network to random numbers uniformly distributed inside a small range:

  ( -2.4 / F_i , +2.4 / F_i )

where F_i is the total number of inputs of neuron i in the network. The weight initialisation is done on a neuron-by-neuron basis.

Step 2: Activation
Activate the back-propagation neural network by applying inputs x1(p), x2(p), ..., xn(p) and desired outputs yd,1(p), yd,2(p), ..., yd,n(p).

(a) Calculate the actual outputs of the neurons in the hidden layer:

  y_j(p) = sigmoid[ \sum_{i=1}^{n} x_i(p) · w_{ij}(p) - θ_j ]

where n is the number of inputs of neuron j in the hidden layer, and sigmoid is the sigmoid activation function.

Step 2: Activation (continued)

(b) Calculate the actual outputs of the neurons in the output layer:

  y_k(p) = sigmoid[ \sum_{j=1}^{m} x_{jk}(p) · w_{jk}(p) - θ_k ]

where m is the number of neurons in the hidden layer and neuron k is in the output layer.

Step 3: Weight training
Update the weights in the back-propagation network propagating backward the errors associated with output neurons.

(a) Calculate the error gradient for the neurons in the output layer:

  δ_k(p) = y_k(p) · [1 - y_k(p)] · e_k(p),   where e_k(p) = y_{d,k}(p) - y_k(p)

Calculate the weight corrections:

  Δw_{jk}(p) = α · y_j(p) · δ_k(p)

Update the weights at the output neurons:

  w_{jk}(p + 1) = w_{jk}(p) + Δw_{jk}(p)

(b) Calculate the error gradient for the neurons in the hidden layer:

  δ_j(p) = y_j(p) · [1 - y_j(p)] · \sum_{k=1}^{l} δ_k(p) · w_{jk}(p)

where l is the number of neurons in the output layer. Calculate the weight corrections:

  Δw_{ij}(p) = α · x_i(p) · δ_j(p)

Update the weights at the hidden neurons:

  w_{ij}(p + 1) = w_{ij}(p) + Δw_{ij}(p)
Step 4: Iteration
Increase iteration p by one, go back to Step 2 and repeat the process until the selected error criterion is satisfied.

As an example, we may consider the three-layer back-propagation network. Suppose that the network is required to perform the logical operation Exclusive-OR. Recall that a single-layer perceptron could not do this operation. Now we will apply the three-layer net.

[Figure: three-layer network for solving the Exclusive-OR operation — inputs x1 and x2 feed hidden neurons 3 and 4 through weights w13, w23, w14, w24; the hidden neurons feed output neuron 5 through weights w35 and w45; thresholds θ3, θ4 and θ5 are tied to a fixed input of -1]

• The effect of the threshold applied to a neuron in the hidden or output layer is represented by its weight, θ, connected to a fixed input equal to -1.

• The initial weights and threshold levels are set randomly as follows:
w13 = 0.5, w14 = 0.9, w23 = 0.4, w24 = 1.0, w35 = -1.2, w45 = 1.1, θ3 = 0.8, θ4 = -0.1 and θ5 = 0.3.

• We consider a training set where inputs x1 and x2 are equal to 1 and desired output yd,5 is 0. The actual outputs of neurons 3 and 4 in the hidden layer are calculated as

  y3 = sigmoid(x1 w13 + x2 w23 - θ3) = 1 / [1 + e^{-(1·0.5 + 1·0.4 - 1·0.8)}] = 0.5250
  y4 = sigmoid(x1 w14 + x2 w24 - θ4) = 1 / [1 + e^{-(1·0.9 + 1·1.0 + 1·0.1)}] = 0.8808

• Now the actual output of neuron 5 in the output layer is determined as:

  y5 = sigmoid(y3 w35 + y4 w45 - θ5) = 1 / [1 + e^{-(-0.5250·1.2 + 0.8808·1.1 - 1·0.3)}] = 0.5097

• Thus, the following error is obtained:

  e = yd,5 - y5 = 0 - 0.5097 = -0.5097

• The next step is weight training. To update the weights and threshold levels in our network, we propagate the error, e, from the output layer backward to the input layer.

• First, we calculate the error gradient for neuron 5 in the output layer:

  δ5 = y5 (1 - y5) e = 0.5097 · (1 - 0.5097) · (-0.5097) = -0.1274

• Then we determine the weight corrections assuming that the learning rate parameter, α, is equal to 0.1:

  Δw35 = α · y3 · δ5 = 0.1 · 0.5250 · (-0.1274) = -0.0067
  Δw45 = α · y4 · δ5 = 0.1 · 0.8808 · (-0.1274) = -0.0112
  Δθ5 = α · (-1) · δ5 = 0.1 · (-1) · (-0.1274) = 0.0127

• Next we calculate the error gradients for neurons 3 and 4 in the hidden layer:

  δ3 = y3 (1 - y3) · δ5 · w35 = 0.5250 · (1 - 0.5250) · (-0.1274) · (-1.2) = 0.0381
  δ4 = y4 (1 - y4) · δ5 · w45 = 0.8808 · (1 - 0.8808) · (-0.1274) · 1.1 = -0.0147

• We then determine the weight corrections:

  Δw13 = α · x1 · δ3 = 0.1 · 1 · 0.0381 = 0.0038
  Δw23 = α · x2 · δ3 = 0.1 · 1 · 0.0381 = 0.0038
  Δθ3 = α · (-1) · δ3 = 0.1 · (-1) · 0.0381 = -0.0038
  Δw14 = α · x1 · δ4 = 0.1 · 1 · (-0.0147) = -0.0015
  Δw24 = α · x2 · δ4 = 0.1 · 1 · (-0.0147) = -0.0015
  Δθ4 = α · (-1) · δ4 = 0.1 · (-1) · (-0.0147) = 0.0015
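The forward pass and the backward pass of this worked example can be checked numerically with a small Python sketch (an illustration of the algorithm, not code from the lecture):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Initial weights and thresholds from the lecture's XOR example
w13, w14, w23, w24 = 0.5, 0.9, 0.4, 1.0
w35, w45 = -1.2, 1.1
t3, t4, t5 = 0.8, -0.1, 0.3
alpha = 0.1

x1, x2, yd5 = 1, 1, 0                    # training example: (1, 1) -> 0

# Forward pass
y3 = sigmoid(x1 * w13 + x2 * w23 - t3)   # ~0.5250
y4 = sigmoid(x1 * w14 + x2 * w24 - t4)   # ~0.8808
y5 = sigmoid(y3 * w35 + y4 * w45 - t5)   # ~0.5097
e = yd5 - y5                             # ~-0.5097

# Backward pass: error gradients
d5 = y5 * (1 - y5) * e                   # ~-0.1274
d3 = y3 * (1 - y3) * d5 * w35            # ~0.0381
d4 = y4 * (1 - y4) * d5 * w45            # ~-0.0147

# Weight corrections by the delta rule (thresholds use a fixed input of -1)
dw35 = alpha * y3 * d5                   # ~-0.0067
dw13 = alpha * x1 * d3                   # ~0.0038
dt5 = alpha * (-1) * d5                  # ~0.0127

print(round(y3, 4), round(y4, 4), round(y5, 4), round(d5, 4))
```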
• At last, we update all weights and thresholds:

  w13 = w13 + Δw13 = 0.5 + 0.0038 = 0.5038
  w14 = w14 + Δw14 = 0.9 - 0.0015 = 0.8985
  w23 = w23 + Δw23 = 0.4 + 0.0038 = 0.4038
  w24 = w24 + Δw24 = 1.0 - 0.0015 = 0.9985
  w35 = w35 + Δw35 = -1.2 - 0.0067 = -1.2067
  w45 = w45 + Δw45 = 1.1 - 0.0112 = 1.0888
  θ3 = θ3 + Δθ3 = 0.8 - 0.0038 = 0.7962
  θ4 = θ4 + Δθ4 = -0.1 + 0.0015 = -0.0985
  θ5 = θ5 + Δθ5 = 0.3 + 0.0127 = 0.3127

• The training process is repeated until the sum of squared errors is less than 0.001.

[Figure: learning curve for operation Exclusive-OR — the sum-squared network error falls below 10^-3 over 224 epochs]

Final results of three-layer network learning

Inputs    Desired    Actual     Error     Sum of
x1  x2    output yd  output y5  e         squared errors
1   1     0          0.0155     -0.0155   0.0010
0   1     1          0.9849      0.0151
1   0     1          0.9849      0.0151
0   0     0          0.0175     -0.0175

Network represented by McCulloch-Pitts model for solving the Exclusive-OR operation

[Figure: the XOR network with fixed weights — hidden neuron 3 receives +1.0 from each input and a threshold weight of +1.5; hidden neuron 4 receives +1.0 from each input and a threshold weight of +0.5; output neuron 5 receives -2.0 from neuron 3, +1.0 from neuron 4, and a threshold weight of +0.5]

Decision boundaries

[Figure: (a) decision boundary constructed by hidden neuron 3: x1 + x2 - 1.5 = 0; (b) decision boundary constructed by hidden neuron 4: x1 + x2 - 0.5 = 0; (c) decision boundaries constructed by the complete three-layer network]

Accelerated learning in multilayer neural networks

• A multilayer network learns much faster when the sigmoidal activation function is represented by a hyperbolic tangent:

  Y^{tanh} = \frac{2a}{1 + e^{-bX}} - a

where a and b are constants. Suitable values for a and b are: a = 1.716 and b = 0.667.
• We also can accelerate training by including a momentum term in the delta rule:

  Δw_{jk}(p) = β · Δw_{jk}(p - 1) + α · y_j(p) · δ_k(p)

where β is a positive number (0 ≤ β < 1) called the momentum constant. Typically, the momentum constant is set to 0.95. This equation is called the generalised delta rule.

[Figure: learning with momentum for operation Exclusive-OR — the sum-squared error falls below 10^-3 in 126 epochs; the lower panel tracks the learning rate over the epochs]

Learning with adaptive learning rate

To accelerate the convergence and yet avoid the danger of instability, we can apply two heuristics:

Heuristic 1
If the change of the sum of squared errors has the same algebraic sign for several consecutive epochs, then the learning rate parameter, α, should be increased.

Heuristic 2
If the algebraic sign of the change of the sum of squared errors alternates for several consecutive epochs, then the learning rate parameter, α, should be decreased.
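The generalised delta rule above can be sketched as a small update function (a minimal illustration, not code from the lecture; the input values reuse numbers from the earlier XOR example):

```python
def momentum_update(prev_delta, y_j, delta_k, alpha=0.1, beta=0.95):
    """Generalised delta rule: momentum term plus the plain delta-rule term."""
    return beta * prev_delta + alpha * y_j * delta_k

# With no previous correction, the rule reduces to the plain delta rule
dw = momentum_update(prev_delta=0.0, y_j=0.525, delta_k=-0.1274)
print(round(dw, 4))   # -0.0067

# On the next iteration the previous correction keeps pushing the same way
dw2 = momentum_update(prev_delta=dw, y_j=0.525, delta_k=-0.1274)
print(round(dw2, 4))
```

Because the momentum term accumulates corrections of the same sign, successive updates grow in magnitude, which is what speeds up descent along a consistent error gradient.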

Learning with adaptive learning rate (continued)

• Adapting the learning rate requires some changes in the back-propagation algorithm.

• If the sum of squared errors at the current epoch exceeds the previous value by more than a predefined ratio (typically 1.04), the learning rate parameter is decreased (typically by multiplying by 0.7) and new weights and thresholds are calculated.

• If the error is less than the previous one, the learning rate is increased (typically by multiplying by 1.05).

[Figure: learning with adaptive learning rate — training for 103 epochs; the upper panel shows the sum-squared error, the lower panel the learning rate adapting over the epochs]

[Figure: learning with momentum and adaptive learning rate — training for 85 epochs; the upper panel shows the sum-squared error, the lower panel the learning rate]
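The adaptive-learning-rate rule above can be sketched as follows (a minimal illustration with assumed epoch-error values, not code from the lecture):

```python
def adapt_learning_rate(alpha, sse, prev_sse, ratio=1.04, down=0.7, up=1.05):
    """Decrease alpha if the error grew by more than `ratio`, raise it if the error fell."""
    if sse > prev_sse * ratio:
        return alpha * down    # error grew too much: slow down
    if sse < prev_sse:
        return alpha * up      # error fell: speed up
    return alpha

alpha = 0.1
alpha = adapt_learning_rate(alpha, sse=0.50, prev_sse=0.40)   # grew > 4%: alpha -> 0.07
alpha = adapt_learning_rate(alpha, sse=0.35, prev_sse=0.50)   # fell: alpha -> 0.0735
print(alpha)
```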
The Hopfield network

• Neural networks were designed on analogy with the brain. The brain's memory, however, works by association. For example, we can recognise a familiar face even in an unfamiliar environment within 100-200 ms. We can also recall a complete sensory experience, including sounds and scenes, when we hear only a few bars of music. The brain routinely associates one thing with another.

• Multilayer neural networks trained with the back-propagation algorithm are used for pattern recognition problems. However, to emulate the human memory's associative characteristics we need a different type of network: a recurrent neural network.

• A recurrent neural network has feedback loops from its outputs to its inputs. The presence of such loops has a profound impact on the learning capability of the network.

• The stability of recurrent networks intrigued several researchers in the 1960s and 1970s. However, none was able to predict which network would be stable, and some researchers were pessimistic about finding a solution at all. The problem was solved only in 1982, when John Hopfield formulated the physical principle of storing information in a dynamically stable network.

[Figure: single-layer n-neuron Hopfield network — each neuron i receives input xi and produces output yi, and every output is fed back to the inputs of the other neurons]

• The Hopfield network uses McCulloch and Pitts neurons with the sign activation function as its computing element:

  Y = +1 if X > 0;  Y = -1 if X < 0;  Y unchanged if X = 0

• The current state of the Hopfield network is determined by the current outputs of all neurons, y1, y2, ..., yn. Thus, for a single-layer n-neuron network, the state can be defined by the state vector as:

  Y = [y1, y2, ..., yn]^T
• In the Hopfield network, synaptic weights between neurons are usually represented in matrix form as follows:

  W = \sum_{m=1}^{M} Y_m Y_m^T - M I

where M is the number of states to be memorised by the network, Y_m is the n-dimensional binary vector, I is the n × n identity matrix, and superscript T denotes matrix transposition.

[Figure: possible states for the three-neuron Hopfield network — the eight vertices of a cube in (y1, y2, y3) space, from (-1, -1, -1) to (1, 1, 1)]

• The stable state-vertex is determined by the weight matrix W, the current input vector X, and the threshold matrix θ. If the input vector is partially incorrect or incomplete, the initial state will converge into the stable state-vertex after a few iterations.

• Suppose, for instance, that our network is required to memorise two opposite states, (1, 1, 1) and (-1, -1, -1). Thus,

  Y1 = [1, 1, 1]^T and Y2 = [-1, -1, -1]^T, or Y1^T = [1 1 1] and Y2^T = [-1 -1 -1]

where Y1 and Y2 are the three-dimensional vectors.

• The 3 × 3 identity matrix I is

  I = [1 0 0; 0 1 0; 0 0 1]

• Thus, we can now determine the weight matrix as follows:

  W = Y1 Y1^T + Y2 Y2^T - 2I = [0 2 2; 2 0 2; 2 2 0]

• Next, the network is tested by the sequence of input vectors, X1 and X2, which are equal to the output (or target) vectors Y1 and Y2, respectively.

• First, we activate the Hopfield network by applying the input vector X. Then, we calculate the actual output vector Y, and finally, we compare the result with the initial input vector X. With all thresholds equal to zero:

  Y1 = sign(W X1 - θ) = [1, 1, 1]^T,  Y2 = sign(W X2 - θ) = [-1, -1, -1]^T

• The remaining six states are all unstable. However, stable states (also called fundamental memories) are capable of attracting states that are close to them.

• The fundamental memory (1, 1, 1) attracts unstable states (-1, 1, 1), (1, -1, 1) and (1, 1, -1). Each of these unstable states represents a single error, compared to the fundamental memory (1, 1, 1).

• The fundamental memory (-1, -1, -1) attracts unstable states (-1, -1, 1), (-1, 1, -1) and (1, -1, -1).

• Thus, the Hopfield network can act as an error correction network.
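The weight-matrix construction and the error-correcting recall above can be verified with a short Python sketch (an illustration using plain lists, not code from the lecture; thresholds are taken as zero):

```python
def outer(u, v):
    return [[ui * vj for vj in v] for ui in u]

def hopfield_weights(memories):
    """W = sum of Ym Ym^T over M memories, minus M on the diagonal (zero self-weights)."""
    n, M = len(memories[0]), len(memories)
    W = [[0] * n for _ in range(n)]
    for y in memories:
        p = outer(y, y)
        for i in range(n):
            for j in range(n):
                W[i][j] += p[i][j]
    for i in range(n):
        W[i][i] -= M                     # subtract M * I
    return W

def recall(W, state, steps=10):
    """Synchronous sign-rule update; a zero net input keeps the previous value."""
    for _ in range(steps):
        net = [sum(W[i][j] * state[j] for j in range(len(state)))
               for i in range(len(state))]
        new = [1 if x > 0 else (-1 if x < 0 else s) for x, s in zip(net, state)]
        if new == state:
            break
        state = new
    return state

W = hopfield_weights([[1, 1, 1], [-1, -1, -1]])
print(W)                          # [[0, 2, 2], [2, 0, 2], [2, 2, 0]]
print(recall(W, [-1, 1, 1]))      # single-error state corrected to [1, 1, 1]
```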
Storage capacity of the Hopfield network

• Storage capacity is the largest number of fundamental memories that can be stored and retrieved correctly.

• The maximum number of fundamental memories Mmax that can be stored in the n-neuron recurrent network is limited by

  M_max = 0.15 n

Bidirectional associative memory (BAM)

• The Hopfield network represents an autoassociative type of memory: it can retrieve a corrupted or incomplete memory but cannot associate this memory with another different memory.

• Human memory is essentially associative. One thing may remind us of another, and that of another, and so on. We use a chain of mental associations to recover a lost memory. If we forget where we left an umbrella, we try to recall where we last had it, what we were doing, and who we were talking to. We attempt to establish a chain of associations, and thereby to restore a lost memory.

• To associate one memory with another, we need a recurrent neural network capable of accepting an input pattern on one set of neurons and producing a related, but different, output pattern on another set of neurons.

• Bidirectional associative memory (BAM), first proposed by Bart Kosko, is a heteroassociative network. It associates patterns from one set, set A, to patterns from another set, set B, and vice versa. Like a Hopfield network, the BAM can generalise and also produce correct outputs despite corrupted or incomplete inputs.

BAM operation

[Figure: BAM operation — (a) forward direction: input-layer neurons x1(p), ..., xn(p) drive output-layer neurons y1(p), ..., ym(p); (b) backward direction: the output pattern is fed back to produce x1(p+1), ..., xn(p+1)]

• The basic idea behind the BAM is to store pattern pairs so that when the n-dimensional vector X from set A is presented as input, the BAM recalls the m-dimensional vector Y from set B, but when Y is presented as input, the BAM recalls X.

• To develop the BAM, we need to create a correlation matrix for each pattern pair we want to store. The correlation matrix is the matrix product of the input vector X and the transpose of the output vector, Y^T. The BAM weight matrix is the sum of all correlation matrices, that is,

  W = \sum_{m=1}^{M} X_m Y_m^T

where M is the number of pattern pairs to be stored in the BAM.
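The correlation-matrix construction and a forward/backward recall can be sketched in Python (a toy illustration with two assumed bipolar pattern pairs, not code from the lecture):

```python
def bam_weights(pairs):
    """W = sum over pairs of X Y^T (an n-by-m correlation matrix per pair)."""
    n, m = len(pairs[0][0]), len(pairs[0][1])
    W = [[0] * m for _ in range(n)]
    for X, Y in pairs:
        for i in range(n):
            for j in range(m):
                W[i][j] += X[i] * Y[j]
    return W

def sign(x):
    return 1 if x >= 0 else -1

def recall_forward(W, X):
    """Present X on layer A, recall Y on layer B: Y = sign(X^T W)."""
    return [sign(sum(X[i] * W[i][j] for i in range(len(X))))
            for j in range(len(W[0]))]

def recall_backward(W, Y):
    """Present Y on layer B, recall X on layer A: X = sign(W Y)."""
    return [sign(sum(W[i][j] * Y[j] for j in range(len(Y))))
            for i in range(len(W))]

pairs = [([1, -1, 1], [1, 1]), ([-1, 1, -1], [-1, 1])]
W = bam_weights(pairs)
print(recall_forward(W, [1, -1, 1]))    # recalls [1, 1]
print(recall_backward(W, [-1, 1]))      # recalls [-1, 1, -1]
```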
Stability and storage capacity of the BAM

• The BAM is unconditionally stable. This means that any set of associations can be learned without risk of instability.

• The maximum number of associations to be stored in the BAM should not exceed the number of neurons in the smaller layer.

• The more serious problem with the BAM is incorrect convergence. The BAM may not always produce the closest association. In fact, a stable association may be only slightly related to the initial input vector.