Topic-04
Recurrent Networks
Jordan Network
Elman Network
Hopfield Network
Basic structure
Updating procedure
Convergence
Applications: Associative Memory and Optimization Problems
Recurrent Neural Networks
The data flow in a feed-forward network contains no cycles (when cycles are introduced, the network becomes recurrent). That is, neurons of the Lth layer receive signals from the environment (when L = 1) or from neurons of the immediately preceding (L-1)th layer (when L > 1), and pass their outputs to neurons of the next (L+1)th layer, or to the environment when L is the last layer.
A recurrent neural network consists of both feed-forward and feedback connections between the layers and neurons, forming a complicated dynamic system. The ability of a recurrent network to approximate a continuous or discrete non-linear dynamic system, with neural dynamics defined by a system of non-linear differential or difference equations, offers opportunities for applications to adaptive control problems.
From a computational point of view, a dynamic neural structure that contains state feedback may provide more computational advantages than a purely feed-forward structure.
Generalized Delta Rule in Recurrent Networks
The back-propagation learning rule can easily be used for training patterns in recurrent networks. The possibilities are:
The hidden unit activation values are fed back to an extra set of inputs (Elman network).
The output values are fed back into the hidden units (Jordan network).
For a typical application, such as generating a control command that depends on an external input which is a time series, using a feed-forward network would require a large network that is slow and difficult to train.
The Jordan Network
One of the earliest recurrent neural networks was the Jordan network (1986).
In the Jordan network, the activation values of the output units are fed back
into the input layer through a set of extra units called state units.
There are as many state units as there are output units in the network.
The connections between the output units and the state units have a fixed
interconnection weight of +1.
Learning takes place only in the connections between the input and hidden
units, as well as between the hidden and output units.
Thus all the learning rules derived for the multi-layer perceptron can be used to
train the Jordan network.
Jordan Network
[Figure: the output activation values are fed back to the input layer to a set of extra neurons called the state units.]
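A minimal sketch in NumPy of the Jordan feedback loop; the layer sizes, the sigmoid activation and all the names here are illustrative assumptions, not taken from the source. Only W_ih, W_sh and W_ho would be trained; the output-to-state copy uses the fixed +1 weights described above.

import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden, n_out = 3, 5, 2              # illustrative sizes
W_ih = rng.normal(size=(n_hidden, n_in))     # input  -> hidden (trainable)
W_sh = rng.normal(size=(n_hidden, n_out))    # state  -> hidden (trainable)
W_ho = rng.normal(size=(n_out, n_hidden))    # hidden -> output (trainable)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def jordan_step(x, state):
    # One forward step: the state units carry the previous outputs,
    # copied through fixed +1 connections (one state unit per output unit).
    h = sigmoid(W_ih @ x + W_sh @ state)
    y = sigmoid(W_ho @ h)
    return y, y.copy()                       # the new state is a copy of y

state = np.zeros(n_out)                      # as many state units as outputs
for x in rng.normal(size=(4, n_in)):         # a short input sequence
    y, state = jordan_step(x, state)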
Elman Network
The Elman network was introduced by Elman in 1990.
In this network a set of context units is introduced: extra input units whose
activation values are fed back from the hidden units.
Thus the network is very similar to the Jordan network, except that:
The hidden units, instead of the output units, are fed back.
The extra input units have no self-connections.
The hidden units are connected to the context units with a fixed weight
of value +1.
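A matching sketch for the Elman network, under the same illustrative assumptions (sizes, sigmoid activation, names): the context units copy the hidden activations through the fixed +1 connections, and only W_ih, W_ch and W_ho would be trained.

import numpy as np

rng = np.random.default_rng(1)
n_in, n_hidden, n_out = 3, 5, 2              # illustrative sizes
W_ih = rng.normal(size=(n_hidden, n_in))       # input   -> hidden (trainable)
W_ch = rng.normal(size=(n_hidden, n_hidden))   # context -> hidden (trainable)
W_ho = rng.normal(size=(n_out, n_hidden))      # hidden  -> output (trainable)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def elman_step(x, context):
    # One forward step: the context units carry the previous hidden
    # activations, copied through fixed +1 connections.
    h = sigmoid(W_ih @ x + W_ch @ context)
    y = sigmoid(W_ho @ h)
    return y, h.copy()                       # the new context is the hidden state

context = np.zeros(n_hidden)                 # one context unit per hidden unit
for x in rng.normal(size=(4, n_in)):         # a short input sequence
    y, context = elman_step(x, context)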
Learning of the Elman Network
As in the Jordan network, learning takes place only in the trainable connections (from the input and context units to the hidden units, and from the hidden units to the output units); the hidden-to-context copy connections stay fixed at +1, so the standard back-propagation rules apply.
Hopfield Network
[Figure: a fully interconnected Hopfield network; each processing unit carries a binary state, e.g. Processing Unit-3 with state 0.]
The processing units in the Hopfield network are fully interconnected.
This interconnection topology makes the network recursive, because the
output of each unit is fed into the inputs of all the other units in the same layer.
This recursive organization allows the network to relax into a
stable state in the absence of external inputs.
Each interconnection has an associated weight.
Let T_ji denote the weight associated with the interconnection to the jth
unit from the ith unit.
In the Hopfield network the weights T_ji and T_ij have the same value, that is, T_ji = T_ij.
Mathematical analysis has shown that when this equality holds, the network
is able to converge, that is, it eventually attains a stable state.
This convergence is necessary in order for it to perform useful computational
tasks such as optimization and associative memory.
Many networks with unequal weights (T_ji ≠ T_ij) are also able to converge
successfully.
Updating Procedure
Initially, each processing unit of the network is assigned a state.
An updating procedure then updates one unit at a time.
Each update affects the state of one unit, sometimes changing it
and sometimes leaving it the same.
Updating a single unit
The net input S_j(t+1) (the weighted sum of inputs) for a neuron j at time t+1
is given by

\[ S_j(t+1) = \sum_{\substack{i=1 \\ i \neq j}}^{n} T_{ji}\, u_i \]

The new activation value at time t+1 for the 0/1 model is

\[ u_j(t+1) = \begin{cases} 1 & \text{if } S_j(t+1) > 0 \\ 0 & \text{if } S_j(t+1) \le 0 \end{cases} \]
The updating process updates all the units one by one in a sequence, and
then repeats the sequence again and again until a stable state is attained.
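A minimal sketch of this procedure for the 0/1 model; the function name and the stopping test (a full sweep that changes nothing) are illustrative:

import numpy as np

def hopfield_run(T, u, max_sweeps=100):
    # Asynchronously update a 0/1 Hopfield network until it is stable.
    # T must be symmetric with a zero diagonal; u is the initial state.
    u = u.copy()
    for _ in range(max_sweeps):
        changed = False
        for j in range(len(u)):            # update the units one by one
            s = T[j] @ u                   # net input S_j; T[j, j] == 0
            new = 1.0 if s > 0 else 0.0    # the 0/1 threshold rule
            if new != u[j]:
                u[j] = new
                changed = True
        if not changed:                    # a sweep with no change:
            break                          # a stable state is reached
    return u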
Convergence
Each state of the Hopfield network has an associated energy value:

\[ E = -\frac{1}{2} \sum_{j} \sum_{i \neq j} T_{ji}\, u_j u_i \]

The part of this energy contributed by unit j is

\[ E_j = -\frac{1}{2} \sum_{i \neq j} T_{ji}\, u_j u_i = -\frac{1}{2}\, u_j \sum_{i \neq j} T_{ji}\, u_i \]
When the jth unit is updated, if there is no change in its state, then the energy E_j
remains the same.
If there is a change in its state, then the change in energy is

\[ \Delta E_j = E_j^{\text{new}} - E_j^{\text{old}} = -\frac{1}{2}\, \Delta u_j \sum_{i \neq j} T_{ji}\, u_i, \qquad \Delta u_j = u_j^{\text{new}} - u_j^{\text{old}} \]

If u_j changes from 0 to 1, then \Delta u_j = 1 and \sum_{i \neq j} T_{ji} u_i > 0, so \Delta E_j < 0.
If u_j changes from 1 to 0, then \Delta u_j = -1 and \sum_{i \neq j} T_{ji} u_i \le 0, so \Delta E_j \le 0.
If u_j stays at 1 or stays at 0, then \Delta u_j = 0 and \Delta E_j = 0.
Thus \Delta E_j is the product of three quantities, and the change is either negative
or zero, no matter what change there is in the state of unit j upon updating.
Since the energy never increases and is bounded below, the network is guaranteed to converge: E takes lower and lower values until the network reaches a stable state.
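This argument can be checked numerically; a small sketch with random symmetric weights (the sizes and values are illustrative) that asserts the energy never increases under the 0/1 update rule:

import numpy as np

def energy(T, u):
    # E = -1/2 * sum_j sum_{i != j} T_ji u_j u_i (the diagonal of T is zero)
    return -0.5 * u @ T @ u

rng = np.random.default_rng(2)
n = 8
T = rng.normal(size=(n, n))
T = (T + T.T) / 2                      # enforce the symmetry T_ji = T_ij
np.fill_diagonal(T, 0.0)               # no self-connections

u = rng.integers(0, 2, size=n).astype(float)
E = energy(T, u)
for _ in range(5):                     # a few full update sweeps
    for j in range(n):
        u[j] = 1.0 if T[j] @ u > 0 else 0.0
        E_new = energy(T, u)
        assert E_new <= E + 1e-12      # the energy is non-increasing
        E = E_new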
Hopfield Network as Associative Memory
A primary application of the binary Hopfield network is as an associative
memory.
Each memory is represented by a binary vector (either in the +1/-1 model
or the 0/1 model).
Each memory is a state of the network that corresponds to a minimum
in the terrain defined by the network energy.
When the network starts at an initial state, the updating procedure moves it
to a minimum-energy state. That minimum is then expected to
correspond to one of the memories of the network.
Thus the network may converge to the stored memory that is most
similar, or most accessible, to the initial state.
Setting of interconnection weights
The number of processing units is equal to the number of entries in the
patterns to be stored.
The weights of the interconnections between the neurons have to be
set such that the states of the system corresponding to the patterns
to be stored in the network are stable (these states can be
seen as dips in the energy space).
When the network is cued with a noisy or incomplete pattern, it will
repair the incorrect or missing data by iterating to a stable state which
is, in some sense, near to the cued pattern.
First, the notation for the patterns to be stored must be defined.
A pattern p is denoted by

\[ X^p = \left( x_1^p, x_2^p, \ldots, x_i^p, \ldots, x_n^p \right) \]

where x_i^p is the ith entry in the pattern vector p, and there are m patterns
in total: X^1, X^2, \ldots, X^p, \ldots, X^m.
Hebb's rule can be used to store the m patterns:

\[ T_{ji} = \begin{cases} \sum_{p=1}^{m} x_i^p x_j^p & \text{if } i \neq j \\ 0 & \text{otherwise} \end{cases} \]

\[ T_{ji} = \begin{cases} \sum_{p=1}^{m} (2 x_i^p - 1)(2 x_j^p - 1) & \text{if } i \neq j \\ 0 & \text{otherwise} \end{cases} \]

The first rule is applicable for the +1/-1 model.
The second rule is applicable for the 0/1 model.
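A sketch of storing and recalling memories with the first (+1/-1 model) rule, using the sign-threshold analogue of the update rule above; the stored patterns and the noisy cue are illustrative:

import numpy as np

def hebb_weights_bipolar(patterns):
    # T_ji = sum_p x_i^p x_j^p for i != j, 0 on the diagonal
    X = np.asarray(patterns, dtype=float)
    T = X.T @ X                          # sum of outer products over m patterns
    np.fill_diagonal(T, 0.0)             # no self-connections
    return T

patterns = [[1, -1, 1, -1, 1, -1],       # illustrative stored memories
            [1, 1, 1, -1, -1, -1]]
T = hebb_weights_bipolar(patterns)

u = np.array([1.0, -1, 1, -1, 1, 1])     # noisy cue: last entry flipped
for _ in range(5):                       # asynchronous update sweeps
    for j in range(len(u)):
        u[j] = 1.0 if T[j] @ u > 0 else -1.0
print(u)                                 # settles on [1, -1, 1, -1, 1, -1]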
Hopfield Network for Optimization: The Travelling Salesman Problem
For a tour of n cities, the network uses an n × n array of units, where the
activation value u_xi = 1 indicates that city x occupies the ith place in the tour.
The constraint part of the energy takes the standard Hopfield-Tank form

\[ E_1 = \frac{A}{2} \sum_{x} \sum_{i} \sum_{j \neq i} u_{xi} u_{xj} \;+\; \frac{B}{2} \sum_{i} \sum_{x} \sum_{y \neq x} u_{xi} u_{yi} \;+\; \frac{C}{2} \left( \sum_{x} \sum_{i} u_{xi} - n \right)^{2} \]

where A, B and C are constants.
The first and second terms of the equation are zero if and only if there is at
most one active neuron in each row and each column, respectively.
The first term allows only one visit to each city; it is small when
there is only a single 1 in a given row.
The second term does not allow the salesperson to be in two cities at
the same time; it is small when there is only a single 1 in a given
column.
The third term is zero if and only if there are exactly n active neurons.
The third term allows only n cities to appear in the itinerary; it
is small when there are exactly n 1s in the matrix.
To minimize the distance of the tour, an extra (fourth) term, E_2, is added to
the energy E_1:

\[ E_2 = \frac{D}{2} \sum_{x=1}^{n} \sum_{\substack{y=1 \\ y \neq x}}^{n} \sum_{i} d_{xy}\, u_{xi} \left( u_{y,i+1} + u_{y,i-1} \right) \]

where d_xy is the distance between cities x and y, and D is a constant.
This expression for the distance sum has to be minimized.
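As a check on this expression, a small sketch (with an illustrative distance matrix and tour) that evaluates E_2 for a tour given as a 0/1 permutation matrix; for a valid tour E_2 equals D times the total tour length, because each leg is counted once from each end:

import numpy as np

def tsp_distance_term(d, u, D=1.0):
    # E2 = (D/2) * sum_{x != y} sum_i d[x, y] * u[x, i] * (u[y, i+1] + u[y, i-1]),
    # with the position indices i+1 and i-1 wrapping around (a closed tour).
    n = u.shape[0]
    u_next = np.roll(u, -1, axis=1)          # u[y, i+1]
    u_prev = np.roll(u, 1, axis=1)           # u[y, i-1]
    mask = 1.0 - np.eye(n)                   # exclude y == x
    return 0.5 * D * np.einsum('xy,xi,yi->', d * mask, u, u_next + u_prev)

pts = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], dtype=float)   # unit square
d = np.linalg.norm(pts[:, None] - pts[None, :], axis=-1)        # city distances
u = np.eye(4)                                # city x in tour position x
print(tsp_distance_term(d, u))               # 4.0 = D * tour length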
The weights are set as follows:

\[ T_{Xi,Yj} = -A\, \delta_{XY} \left( 1 - \delta_{ij} \right) \;-\; B\, \delta_{ij} \left( 1 - \delta_{XY} \right) \;-\; C \;-\; D\, d_{XY} \left( \delta_{j,i+1} + \delta_{j,i-1} \right) \]

where the first term gives the inhibitory connections within each row, the second
the inhibitory connections within each column, the third a global inhibition, and
the fourth the data term, and where

\[ \delta_{ij} = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{otherwise} \end{cases} \]
[Figure: the array of units for a 10-city tour; the rows correspond to the cities A through J and the columns to the tour positions 1 through 10, from unit u_A1 at the top left to unit u_J10 at the bottom right.]
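A sketch of building the four-index weight tensor T_{Xi,Yj} from the rule above; the constants A, B, C, D and the distance matrix are illustrative, and tuning them is problem-specific:

import numpy as np

def tsp_weights(d, A=500.0, B=500.0, C=200.0, D=500.0):
    # T[X, i, Y, j] for the Hopfield TSP network, following the rule above.
    n = d.shape[0]
    delta = np.eye(n)                    # Kronecker delta (cities or positions)
    ring = np.roll(np.eye(n), 1, axis=1) + np.roll(np.eye(n), -1, axis=1)
    # ring[i, j] = delta_{j,i+1} + delta_{j,i-1}, with wrap-around
    return (
        - A * delta[:, None, :, None] * (1 - delta)[None, :, None, :]  # row inhibition
        - B * delta[None, :, None, :] * (1 - delta)[:, None, :, None]  # column inhibition
        - C                                                            # global inhibition
        - D * d[:, None, :, None] * ring[None, :, None, :]             # data term
    )

pts = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], dtype=float)
d = np.linalg.norm(pts[:, None] - pts[None, :], axis=-1)
T = tsp_weights(d)                       # shape (4, 4, 4, 4): (city, pos, city, pos)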