Unifying the Hopfield and Hamming binary associative memories

P Houselander and J T Taylor

Department of Electronic & Electrical Engineering, University College London, UK

Abstract
In this paper we introduce a mathematical technique to enable calculation of the capacity (which is defined) of an associative memory. We use the technique to investigate the performance (in terms of capacity) of the original Hopfield network and some novel derivatives. Through a logical sequence of structural improvements, we show that a natural conclusion to the extension of the Hopfield network is the Hamming network. This network exhibits the theoretical maximum capacity obtainable from an associative memory structure.

The Hopfield Network
The Hopfield network has been the subject of much research in the field of artificial neural networks (ANN) since its proposition in 1982 [Hop82]. It operates as an auto-associative memory and is therefore able to recall stored patterns (exemplars) when a facsimile of the pattern is presented as an input. The network can be represented as a 2 layer network consisting of N nodes in each layer (where N is the number of inputs) and this structure can be seen in Fig. 1 [Was89]. The nodes in layer 2 sum the weighted outputs from the nodes in layer 1 and compare the total input to a threshold. If the input exceeds the threshold then the output of the node will go high (+1), otherwise the output will go low (-1), assuming the use of bipolar signalling.

Fig. 1 Hopfield network (Layer 1, Layer 2)

Unlike the majority of other ANN architectures, the Hopfield network does not go through a training phase in the accepted sense. The network is in essence programmed with the exemplar patterns that we wish to be stored and the training scheme employed is an example of the application of the Hebbian rule [Heb49]. To operate the network, an input (probe) is applied to the network ($X_i$) such that the outputs of layer 1 ($O_j$) at time zero take on the probe value and the outputs of layer 2 assume a value commensurate with the output of layer 1 and the connection weights $w_{jk}$. The probe is then removed and the outputs allowed to change (iterate) until they stabilise at a particular binary level. This is the output of the network and would ideally be the stored pattern which most resembles the input pattern presented (i.e. smallest Hamming distance). The storage prescription used (when using the Hebbian rule) to set the value of the interconnection weights, along with the update equations for the nodes in layers 1 and 2, may be expressed as follows.

$$w_{jk} = \sum_{m} S_{mj}\, S_{mk}, \quad j \neq k \tag{1a}$$
$$w_{jk} = 0, \quad j = k \tag{1b}$$
$$O_j(nT) = X_i, \quad n = 0 \tag{2a}$$
$$O_j(nT) = O_k, \quad n > 0 \tag{2b}$$
$$O_k = f\Bigl(\sum_{j} O_j\, w_{jk}\Bigr) = f(x) \tag{2c}$$

where $w_{jk}$ is the connection between layer 1 and layer 2, $S_{ma}$ corresponds to "bit a" in memory vector m, $X_i$ is the input to the network (probe), $O_j$ and $O_k$ are the outputs of layers 1 and 2 respectively, n is the number of iterations from initialisation, T is the time between successive iterations and f(x) = 1; x > 0; f(x) = -1; x <= 0.

To assess the capabilities of an associative memory such as the Hopfield network we have defined two fundamental characteristics which determine its usefulness [Hou90a]:
(1) Capacity, C: the maximum number of stable binary vectors that can be stored (on average). This assumes that a particular storage prescription has been applied to a sufficiently large arbitrarily chosen set of memories using a specified network architecture. This is an extension of the definition implied in [Hop82].
(2) Associativity, A: the capability (on average) of a particular network architecture to take an
arbitrarily chosen erroneous probe and correct the bits in error within the probe to reproduce exactly the nearest exemplar. This assumes that the network was trained using a specified storage prescription and that the Hamming distance (H) of the probe to one stable stored memory (exemplar) is known and also less than the Hamming distance to any other stored memory.

We can use these definitions to compare the performance of modified Hopfield networks but due to space constraints this paper will only consider the effect on the capacity C. A thorough examination of the capabilities of an associative memory would require the assessment of the network's associativity. We can however gain a useful indication of the performance of a structurally modified network by considering the capacity alone, if the training strategy is unaltered.

To aid the analysis of the Hopfield network we can modify the structure shown in Fig. 1 to include an extra layer with each node in this extra layer corresponding to a particular stored memory. This modified structure can be seen in Fig. 2 and is functionally identical to the one depicted in Fig. 1.

Fig. 2 Modified Hopfield network (Layer 1, Layer 2)

When a probe is applied to the network, the output of layer 2 will be the correlation of each exemplar with the probe, with the highest output (in terms of magnitude) corresponding to the memory which is closest (in the Hamming sense) to the probe. The limited capacity of the Hebbian trained Hopfield network is due to errors caused by the effective correlation noise of the probe with the memories other than the one it is closest to. To determine the capacity (C) we must first find an expression for the average correlation for an arbitrary probe applied to an arbitrary memory vector. This is calculated as follows:

$$A_c = \frac{2 \sum_{x=0}^{(N-1)/2 - 1} \binom{N-1}{x} \bigl((N-1) - 2x\bigr)}{2^{N-1}} \tag{3}$$
$$A_c = \frac{(N-1) \binom{N-1}{(N-1)/2}}{2^{N-1}} \tag{4}$$

Using Stirling's approximation,

$$(N-1)! \approx \sqrt{2\pi(N-1)}\; e^{-(N-1)}\, (N-1)^{(N-1)}$$
$$\Rightarrow A_c \approx \sqrt{2(N-1)/\pi} \tag{5}$$

where $A_c$ is the average correlation and $\binom{N}{x}$ (N Com x) is the number of ways of choosing x bits from a vector of N bits. The memory capacity can now be directly calculated. An error can only occur when the summation of the average correlation of M-1 memories equals or exceeds the correlation of the probe with the required exemplar, thus:

$$(N-1) = (M-1)\, A_c \tag{6}$$
$$\Rightarrow M = \mathrm{int}\bigl((N-1)/A_c + 1\bigr) + 1 \tag{7}$$

where M represents the minimum number of memories that will cause an error and the function int returns the integer part of its parameter. Since C is defined as the maximum number of memories that can be stored before an error occurs, then:

$$C = M - 1 \tag{8}$$

This shows that as the size of the network increases, the increase in available capacity follows a square root characteristic.

Equations 7 and 8 assume that the probability of the M-1 memories aligning themselves is greater than or equal to unity, i.e.:

$$\frac{N M}{2^{(M-1)}} \ge 1 \tag{9}$$

If N is greater than 42, however, then the probability of the M-1 memories aligning themselves is less than unity and the capacity C will be higher than the figure calculated from (8). The revised figure for C is C' and is calculated thus:

$$M' = M + 2r \tag{10}$$
$$C' = M' - 1 \tag{11}$$

where M' and C' are the revised values of M and C respectively. The capacity is incremented in steps of 2 to allow the cancellation of an aligned non-stored memory by one which is non-aligned (i.e. $(M-1)A_c + rA_c - rA_c = (M-1)A_c \ge (N-1)$). The value r is defined as the value of r that satisfies the following condition:

$$N M' \sum_{x=0}^{r} \binom{M'-1}{x} \ge 2^{(M'-1)} \tag{12}$$
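The calculation in equations (3) to (12) can be sketched numerically as follows. This is only an illustrative sketch, not the authors' code: the function names are hypothetical, N is assumed to be odd so that (N-1)/2 is an integer, and r is assumed to be the smallest non-negative integer that satisfies condition (12), which the text does not state explicitly. It needs Python 3.8 or later for `math.comb`.

```python
from math import comb, pi, sqrt


def average_correlation(n):
    """Average correlation Ac from the summation of equation (3); n is N (assumed odd)."""
    k = n - 1
    total = sum(comb(k, x) * (k - 2 * x) for x in range(k // 2))
    return 2 * total / 2 ** k


def average_correlation_closed(n):
    """Closed form of equation (4)."""
    k = n - 1
    return k * comb(k, k // 2) / 2 ** k


def average_correlation_stirling(n):
    """Stirling approximation of equation (5)."""
    return sqrt(2 * (n - 1) / pi)


def hopfield_capacity(n):
    """Return (C', r) for the original network using equations (7), (10), (11), (12)."""
    ac = average_correlation(n)
    m = int((n - 1) / ac + 1) + 1          # equation (7)
    r = 0
    while True:                            # assumption: take the smallest r satisfying (12)
        m_rev = m + 2 * r                  # equation (10)
        aligned = sum(comb(m_rev - 1, x) for x in range(r + 1))
        if n * m_rev * aligned >= 2 ** (m_rev - 1):   # condition (12)
            return m_rev - 1, r            # equation (11)
        r += 1


if __name__ == "__main__":
    for n in (41, 43, 101):
        print(n, round(average_correlation(n), 3),
              round(average_correlation_stirling(n), 3), hopfield_capacity(n))
```

Under these assumptions, N = 41 gives r = 0 and a capacity of 8, while N = 43 already needs r = 1, which agrees with the statement that r = 0 only holds for N < 43.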
In condition (12) the index x ranges from 0 to r. Equations 7, 8 and 9 are the special cases of 10, 11 and 12 respectively when r = 0 (N < 43).

Improving the capacity of the Hopfield network
To improve the capacity we can modify the basic structure in several ways. The first is to remove the stipulation that the principal diagonal in the nodal connection matrix should be equal to zero and the effect of this is to remove the direct connections between the input and layer 2 in Fig. 2. The capacity is increased [Oh88] and this increase is formalised below:

for an error to occur
$$(N-1) + M = (M-1)\, A_c \tag{13}$$
as M must be an integer
$$M = \mathrm{int}\bigl(((N-1) + A_c)/(A_c - 1)\bigr) + 1 \tag{14}$$
as before
$$M' = M + 2r, \quad C' = M' - 1$$
where r is derived from condition 12. This gives an increase in capacity of approximately $A_c/(A_c - 1)$.

One feature of the Hopfield network is the automatic storage of each exemplar's complement. Of course, the effective storage of the complement is generally unwanted and so it would be useful to introduce a modification which prevents its storage, and this will be the second modification to the original network. A simple method for achieving this is to add a non-linear transfer function to the nodes in layer 2 of Fig. 2 which will allow positive inputs through unchanged but prevent negative excursions (i.e. f(x) = x; x > 0; f(x) = 0; x <= 0; i.e. the response of an "ideal diode"). This of course does limit the error correcting capability of the network to probes less than a Hamming distance of N/2 away. In practice however, if a random probe is applied to a memory set then the correlation of the probe to half of the memories will be positive (i.e. less than a Hamming distance of N/2) and equal to an average of $A_c$, and so the modification should not cause a problem in terms of the network's error correcting capability. As on average one half of the outputs of layer 1 will be negative, the capacity is approximately doubled as shown below:

for an error to occur
$$(N-1) + 1 = (A_c - 1)(M-1)/2 \tag{15}$$
as M must be an integer
$$M = \mathrm{int}\bigl(1 + 2N/(A_c - 1)\bigr) + 1 \tag{16}$$
as before
$$M' = M + 2r, \quad C' = M' - 1$$
where the value r is defined as the value of r that satisfies the following condition:
$$N M' \sum_{x=0}^{r} \binom{(M'-1)/2}{x} \ge 2^{(M'-1)/2} \tag{17}$$
where x = 0 to r.

With this modification, the output of layer 1 will be limited to positive outputs only. We can increase the relative difference in the amplitude of the node in layer 2 which represents the required exemplar and the amplitude of the other nodes. This can be done by subtracting from each output of layer 1 the average of the remaining outputs of layer 1 (by using the average of the outputs, the amount subtracted is never greater than any individual output). This new structure (which is the third modification) can be seen in Fig. 3 and the connections between layer 1 and layer 2 can be described thus:

$$O_k = f\Bigl(O_j - b \sum_{i \neq j} O_i\Bigr) = f(x) \tag{18}$$

where the sum is taken over the remaining outputs of layer 1, f(x) = x; x > 0; f(x) = 0; x <= 0 and b = 1/(M-1). We can reduce the connections between layer 1 and layer 2 by applying the following simple modification shown below [Hou90b]:

Fig. 3 Modified Hopfield network using subtracter (Layer 1, Layer 2, Layer 3)
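As an aside on the subtracter of Fig. 3 and equation (18), and not on the reduced-connection circuit of [Hou90b], the effect of one subtracter stage can be sketched as below. The function names and the example correlation values are hypothetical. The sketch assumes that the "ideal diode" function is applied to the raw correlations first (the second modification), and that b = 1/(M-1) with M taken as the number of exemplar nodes being averaged over.

```python
def diode(x):
    """Ideal-diode transfer function: f(x) = x for x > 0, otherwise 0."""
    return x if x > 0 else 0.0


def subtracter_stage(outputs):
    """Subtract from each output the average of the remaining outputs, then apply the diode.

    Assumption: b = 1/(M-1), where M is len(outputs), so (total - o) / (M - 1)
    is the average of the other M-1 outputs.
    """
    m = len(outputs)
    total = sum(outputs)
    return [diode(o - (total - o) / (m - 1)) for o in outputs]


if __name__ == "__main__":
    # Hypothetical correlations of a probe with four exemplars.
    correlations = [9, 5, -3, 4]
    rectified = [diode(c) for c in correlations]   # second modification
    sharpened = subtracter_stage(rectified)        # third modification
    print(rectified)
    print(sharpened)
```

In this made-up example the winning node is 1.8 times the runner-up before the subtracter stage and 9 times the runner-up after it, which is the increase in relative amplitude that the third modification aims for.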
This modification is shown in Fig. 4 and it should be noted that the circuit shown is functionally identical to the structure shown in Fig. 3. The capacity for the subtracter circuit is derived below:

output of the node in layer 1 which represents the exemplar
$$O_j = (N-1) + 1 \tag{20}$$
output of each node in layer 1 which does not represent the exemplar
$$O_j = (A_c - 1) \tag{21}$$
for an error to occur (using equations 19, 20, 21)
$$3N/2 - (A_c - 1) - (A_c - 1)(M-1)/4 = 0 \tag{22}$$
$$\Rightarrow M = \mathrm{int}\bigl(3\,(2N/(A_c - 1) - 1)\bigr) \tag{23}$$
as before
$$M' = M + 2r, \quad C' = M' - 1$$
where r is defined as the value that satisfies condition 20.

Fig. 4 Improved subtracter circuit (Layer 2, Layer 3)

Although the modifications suggested in this paper offer a significant improvement in capacity, the modified structures will not give an associativity of 100% which is independent of the Hamming distance of the probe from its corresponding exemplar. We can of course add further subtracter circuits and each additional circuit will give an improved performance at the expense of extra circuitry.

The Hamming network
As each subtracter circuit is identical, we can simulate each additional circuit by feeding the output of layer 2 to the input of layer 1 (and removing the input to the network after the nodes in layer 1 have been primed). This also removes the necessity for the connections between layer 1 and layer 3 (Fig. 2). Layers 1 and 2 can then be allowed to “iterate” until only one output remains positive. This structure will then exhibit a capacity and associativity that is equal to the theoretical maximum that can be obtained and is equivalent to the Hamming network (modified to reduce the iteration layer connections).

Conclusion
In this paper we have introduced a technique to enable the calculation of the capacity of the Hopfield network and its enhanced derivatives. We have suggested a logical sequence of improvements to the original network and presented appropriate formulae for the capacity of each structure. These improvements (although slightly contrived) lead us to a structure whose characteristics in terms of both capacity and associativity are equal to the theoretical maximum. This structure is the Hamming network (modified to reduce the iteration layer connections). Although the sequence of improvements suggested does not in any way constitute a proof, it does suggest that the Hamming network is the logical conclusion to the optimisation (in performance) of the original Hopfield network.

References
[Heb49] Hebb, D., “Organization of behaviour”, New York science editions, 1949.
[Hop82] Hopfield J.J., “Neural networks and physical systems with emergent collective computational abilities”, Proc. Natl. Acad. Sci. 79, pp 2554-2558, 1982.
[Hou90a] Houselander, P., Taylor, J.T., “Measuring the error correcting capability of the Hopfield network”, IEE Elec. Letts., Vol 26, No 2, 1990.
[Hou90b] Houselander, P., Taylor, J.T., “Improving the efficiency of the Hamming associative memory”, under preparation.
[Lip87] Lippmann R., “An introduction to computing with Neural Nets”, IEEE ASSP Magazine, April 1987.
[Oh88] Oh S., Yoon T., Kim J.C., “Associative-memory model based on neural networks: modification of Hopfield model”, Optics Letters, Vol 23, No 1, pp 74-76, 1988.
[Was89] Wasserman P., “Neural computing theory and practice”, Van Nostrand Reinhold, 1989.

