Hopfield Networks
Hopfield, J. J., 'Neural networks and physical systems with emergent collective computational abilities', PNAS (1982) 79:2554
Key idea: a simple recurrent artificial neural network capable of storing an input pattern in memory, recovering it, and correcting errors in corrupted inputs.
- Network of ‘units’ connected by links. Each unit can have only one of two possible values (+1/-1)
at each time point in the evolution of the network.
- Links between any two units or nodes (i, j) are symmetric (undirected graph) → the matrix of weights W is symmetric (if units represent neurons or populations of neurons, they are connected by mutual interactions of the same sign and strength).
- The weights between two units depend on the activity of each unit (Hebbian learning)
Example: a 4-unit network
The activity of the network at each time is given by a state vector V (representing the state of activity of each unit in the network). Assume you want to store a given state vector in memory, such as

$$V = \begin{pmatrix} +1 \\ -1 \\ -1 \\ +1 \end{pmatrix}$$
Imagine your state vector V represents a pattern (some data) that you have stored in memory. Now initialize the network with a slightly different state vector V' (corrupted data):

$$V = \begin{pmatrix} +1 \\ -1 \\ -1 \\ +1 \end{pmatrix} \qquad V' = \begin{pmatrix} +1 \\ -1 \\ -1 \\ -1 \end{pmatrix}$$
With this new state vector V’, and the weight matrix W ‘learnt’ from the stored pattern,
$$W = \begin{pmatrix} 0 & -1 & -1 & +1 \\ -1 & 0 & +1 & -1 \\ -1 & +1 & 0 & -1 \\ +1 & -1 & -1 & 0 \end{pmatrix} \qquad V' = \begin{pmatrix} +1 \\ -1 \\ -1 \\ -1 \end{pmatrix}$$

apply the update rule

$$V'' = f(W \cdot V')$$

For example, for unit c:

$$V''_c = f(w_{ca}V'_a + w_{cb}V'_b + w_{cd}V'_d) = f(-1 - 1 + 1) = -1$$

For the stored pattern V itself, each unit receives three inputs of the same sign:

$$f(W \cdot V) = \begin{pmatrix} f(+1+1+1) \\ f(-1-1-1) \\ f(-1-1-1) \\ f(+1+1+1) \end{pmatrix} = V, \qquad W \cdot V = (N-1)\,V$$

so flipping one of the signs in the input will not change the net sign of the argument of the activation function.
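The worked example above fits in a few lines of code. The following is a minimal NumPy sketch (variable names are mine, and f is taken to be the sign function, as in the text):

```python
import numpy as np

# The 4-unit example: store V, corrupt one component, recover it in one update.
V = np.array([+1, -1, -1, +1])              # stored pattern
W = np.outer(V, V) - np.eye(4, dtype=int)   # weights 'learnt' from V (zero diagonal)

V_corrupt = np.array([+1, -1, -1, -1])      # V' with the last sign flipped

V_recalled = np.sign(W @ V_corrupt)         # V'' = f(W . V')
print(np.array_equal(V_recalled, V))        # True: the error is corrected
print(W @ V)                                # equals (N-1) V = 3 V, as proved below
```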
Dynamical intuition: the stored pattern is a fixed point of the dynamics.

Proof: $V^t = f(W \cdot V^{t-1})$, but

$$W \cdot V = (V V^T - I) \cdot V = V\,(V^T V) - V = (N-1)\,V$$

where N is the number of units of the network. Note that $f((N-1)\,V) = V$, and thus $V^t = V^{t-1}$.
This clever choice of the weights is analogous to Hebb's rule for learning based on long-term synaptic plasticity: neurons that fire together, wire together (repeated stimulation of neuron B by neuron A increases the synaptic strength between A and B). In Hopfield's model, two units that have the same activity have positive connections, while units with opposite activity have negative connections.
Physical intuition: the ‘energy’ function
Define an ‘energy’ function that depends on the state vector and on the weight matrix of
connections as:
$$E = -\frac{1}{2}\sum_i \sum_j w_{ij} V_i V_j = -\frac{1}{2}\,V^T \cdot W \cdot V = -\frac{1}{2}\,V^T \cdot x$$
For a perturbation of the stored state, this function decreases monotonically (it never increases) as the update rule is iterated in time.
Proof: If the update changes (flips the sign of) the state of one unit i at time t, define $\Delta V_i = V_i^t - V_i^{t-1}$. The change in 'energy' is

$$\Delta E = -\Delta V_i \sum_{j \neq i} w_{ij} V_j = -\Delta V_i \cdot x_i$$

(where $x_i$ is the input of the activation function for unit i). Since the update sets $V_i^t = f(x_i) = \operatorname{sign}(x_i)$, $\Delta V_i$ has the same sign as $x_i$, so $\Delta E \leq 0$. If we are already on the attractor, $V_i^t = V_i^{t-1}$ for every unit i, and therefore $\Delta E = -\Delta V_i x_i = 0$ since $\Delta V_i = 0$.
Moreover, from the definition of the energy function, when V is an attractor stored in the weight matrix W:

$$E = -\frac{1}{2}\,V^T \cdot W \cdot V = -\frac{1}{2}\,V^T \cdot (V V^T - I) \cdot V = -\frac{1}{2}\left(V^T V\,V^T V - V^T V\right) = -\frac{1}{2}\,V^T V\left(V^T V - 1\right) = -\frac{N(N-1)}{2}$$

which shows that the energy is negative and takes its lowest possible value (note that W has N(N−1)/2 independent non-zero elements, one per pair of units, since the lower and upper triangles are equal).
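Both claims can be checked numerically. Here is an illustrative NumPy sketch (the asynchronous, unit-by-unit update order is an assumption; the text does not fix it):

```python
import numpy as np

# Check that E = -1/2 V^T W V never increases under single-unit updates,
# and equals -N(N-1)/2 at the stored pattern.
def energy(W, V):
    return -0.5 * V @ W @ V

N = 4
V_stored = np.array([+1, -1, -1, +1])
W = np.outer(V_stored, V_stored) - np.eye(N, dtype=int)

V = np.array([+1, -1, -1, -1])            # corrupted state
print(energy(W, V))                       # 0.0 for this perturbed state

for i in range(N):                        # asynchronous sweep over the units
    x_i = W[i] @ V                        # input to unit i
    V[i] = 1 if x_i >= 0 else -1          # f = sign, ties broken toward +1
    print(energy(W, V))                   # non-increasing: 0.0, 0.0, 0.0, -6.0

print(-N * (N - 1) / 2)                   # -6.0, the minimum derived above
```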
To store several patterns $V^\nu,\ \nu = 1, \dots, P$, define a set of weight matrices $W^\nu$ with components $W^\nu_{ji} = V^\nu_j V^\nu_i$, and sum them component by component to calculate the final weight matrix:

$$W_{ji} = \sum_\nu V^\nu_j V^\nu_i$$

The network will converge to the attractor that is 'closest' to the initial state.
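A short sketch of this multi-pattern case (the network size, pattern count, and noise level are illustrative choices, not values from the text):

```python
import numpy as np

# Store P random patterns by summing their outer products (Hebb's rule),
# then recall the stored pattern closest to a noisy probe.
def sign(x):
    return np.where(x >= 0, 1, -1)              # keep states in {-1, +1}

rng = np.random.default_rng(0)
N, P = 100, 5
patterns = rng.choice([-1, 1], size=(P, N))     # V^nu, nu = 1..P

W = sum(np.outer(v, v) for v in patterns)       # W_ji = sum_nu V^nu_j V^nu_i
np.fill_diagonal(W, 0)                          # no self-connections

probe = patterns[0].copy()                      # corrupt pattern 1 in 10 places
flip = rng.choice(N, size=10, replace=False)
probe[flip] *= -1

for _ in range(10):                             # iterate the update rule
    probe = sign(W @ probe)

print(np.array_equal(probe, patterns[0]))       # expect True: closest attractor
```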
Are Hopfield networks really useful or just a proof of concept?

Excerpts from the Comment 'A new frontier for Hopfield networks' by Dmitry Krotov (Nature Reviews Physics):

"Over the past few years there has been a resurgence of interest in Hopfield networks of associative memory. Dmitry Krotov discusses recent theoretical advances and their broader impact in the context of energy-based neural […]"

"[…] the energy landscape changes in such a way that it acquires many additional local minima that have nothing to do with the memory patterns that we are trying to store (Fig. 1a). These additional local minima are closely related to spin glass states, commonly studied in statistical physics of disordered systems. Their presence is undesirable for a proper function of the Hopfield network as an associative memory system. This linear scaling relationship between the dimension of the feature space and the memory storage capacity presents a problem from the perspective of […]"

Modern Hopfield networks

"In 2016, we realized that it is possible to overcome the linear scaling problem between the number of features and memory storage capacity by introducing a rapidly growing activation function⁴. The model, dubbed 'dense associative memory', is characterized by an energy function that includes higher than quadratic interactions between the features (Fig. 1b). The key difference between dense associative memory and the traditional Hopfield network is the presence of the rapidly growing activation function F(·). When this function is quadratic, dense associative memory reduces to the traditional Hopfield [network] […]"

[Fig. 1 | Energy (E) landscapes. a, Traditional Hopfield network: $E = -\sum_{\mu=1}^{K_{\mathrm{mem}}} (\xi^\mu \cdot x)^2$. b, Dense associative memory: $E = -\sum_{\mu=1}^{K_{\mathrm{mem}}} F(\xi^\mu \cdot x)$.]

"[…] new transformer-like architectures for language and image processing tasks that are grounded in the idea of associative memory. […] architectures that are fundamentally designed around the idea of associative memory. A study reported in a recent preprint⁸ pursued this second idea and proposed replacing a sequence of conventional transformer layers […] pre-trained transformer (ChatGPT), and others. Despite being one of the most popular deep learning models, transformers are typically designed through trial and error, and the theoretical principles behind their computational strategies remain mysterious. In contrast, Hopfield networks have a well-established record of theoretical methods, but have yet to demonstrate truly impressive empirical results in large-scale machine learning systems. The correspondence between dense associative memories and transformer's attention is interesting […]"

"What insights can we gain from these recent theoretical advances? First, the longstanding problem of the coupling between memory capacity and feature space dimension in Hopfield networks has been […]"
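To make the two energy functions in Fig. 1 concrete, here is a small sketch; the stored patterns ξ, the query x, and the choice F(z) = z⁴ are illustrative assumptions (the excerpt only requires F to grow faster than a quadratic):

```python
import numpy as np

# Energies from Fig. 1:
#   traditional Hopfield:      E = -sum_mu (xi^mu . x)^2
#   dense associative memory:  E = -sum_mu F(xi^mu . x)
def energy(xi, x, F):
    return -np.sum(F(xi @ x))

rng = np.random.default_rng(1)
xi = rng.choice([-1, 1], size=(10, 64))     # K_mem = 10 stored patterns xi^mu
x = xi[0]                                   # query equal to one stored pattern

print(energy(xi, x, lambda z: z**2))        # F(z) = z^2 -> traditional network
print(energy(xi, x, lambda z: z**4))        # faster-growing F deepens the minima
```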