
Restricted Boltzmann Machine Dynamics

Nicolás Cabrera Carpio

November, 2022

1. Hopfield Network (Associative Memory)

An interesting starting point for the theoretical study of neural networks is the Hopfield network,
because it is one of the simplest neural network models and we can use methods from physics, more
specifically methods from statistical mechanics, to study its equilibrium properties [1].

To define the Hopfield network we begin by defining its basic unit: the neuron. We consider a neuron
as a binary variable, i.e. it can take only two values: -1 or 1. If the state of the neuron is σ_i = 1
the neuron transmits a signal (excited); if it is in the state σ_i = −1 the neuron does not
transmit a signal (rest).

The Hopfield network is formed by connecting the neurons to each other through weights w_ij, which
represent the strength of the connections, as shown in Figure 1.

Figure 1: Structure of a Hopfield network; the circles represent neurons and the lines represent the connections characterised by
the weights w_ij.

The input (local field) to neuron σ_i can be written as [2]:

h_i = \sum_j w_{ij} \sigma_j    (1)

Given the local field (1), we can define the deterministic dynamics for each neuron σ_i as:

\sigma_i(t + \Delta t) = \mathrm{sgn}(h_i) = \mathrm{sgn}\!\left( \sum_j w_{ij} \sigma_j(t) \right)    (2)

where

\mathrm{sgn}(x) = \begin{cases} 1 & \text{if } x > 0 \\ 0 & \text{if } x = 0 \\ -1 & \text{if } x < 0 \end{cases}    (3)

For example, if neuron σ_i is excited, σ_i(t) = 1, and h_i < 0, the neuron switches to rest, σ_i(t + Δt) = −1.
In other words, the neuron σ_i at time t + Δt is aligned in the direction of the local field h_i.
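As a quick illustration, here is a minimal sketch of the update rule (2) in Python/NumPy. The function and variable names are our own, and we adopt the common simulation convention of keeping the current state on the tie sgn(0), since a binary neuron cannot take the value 0:

```python
import numpy as np

def update_neuron(i, sigma, w):
    """Deterministic update (2): align neuron i with its local field (1).

    sigma is a vector of +/-1 states; w is assumed symmetric with zero diagonal.
    """
    h_i = w[i] @ sigma          # local field h_i = sum_j w_ij * sigma_j
    if h_i == 0:                # sgn(0) case: keep the current state (convention)
        return sigma[i]
    return int(np.sign(h_i))    # +1 if h_i > 0, -1 if h_i < 0
```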

Now, we define a stable fixed point as a point where for each neuron in the network it is satisfied
that:

\sigma_i(t + \Delta t) = \mathrm{sgn}\!\left( \sum_j w_{ij} \sigma_j(t) \right) = \sigma_i(t)    (4)

The states of all neurons in the network are then invariant under the time-evolution rule (2).

We define an excitation pattern as a fixed state of the network; we denote it as ξ^μ, and we denote
the state of neuron i in the pattern μ as ξ_i^μ.

To memorise p patterns in the neural network, each pattern must be a stable fixed point (4);
to this end the weights of the network w_ij can be chosen as [3]:

w_{ij} = \frac{1}{N} \sum_{\mu=1}^{p} \xi_i^\mu \xi_j^\mu    (5)

for i ≠ j, with w_{ij} = w_{ji}.
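To make the storage prescription concrete, here is a minimal sketch (Python/NumPy, with illustrative names) of the Hebbian rule (5) together with a check of the fixed-point condition (4):

```python
import numpy as np

def hebbian_weights(patterns):
    """Hebbian rule (5): w_ij = (1/N) sum_mu xi_i^mu xi_j^mu, with w_ii = 0."""
    p, N = patterns.shape                  # p patterns of N binary (+/-1) neurons
    w = patterns.T @ patterns / N          # symmetric by construction
    np.fill_diagonal(w, 0.0)               # no self-connections (i != j)
    return w

def is_fixed_point(sigma, w):
    """Stability condition (4), checked for all neurons at once."""
    return np.array_equal(np.sign(w @ sigma), sigma)

rng = np.random.default_rng(0)
xi = rng.choice([-1, 1], size=(3, 100))    # p = 3 random patterns, N = 100 neurons
w = hebbian_weights(xi)
print(all(is_fixed_point(pattern, w) for pattern in xi))
```

For a small loading ratio p/N this should print True: each stored pattern is left invariant by the dynamics (2).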

1.1. Connection with physics

Now, we consider the Monte Carlo dynamics of the SK (Sherrington-Kirkpatrick) model with the
following Hamiltonian:

H(\sigma) = -\frac{1}{2} \sum_{i \neq j} J_{ij} \sigma_i \sigma_j    (6)
where the spins are of Ising type, σ_i = ±1, and the interactions J_ij independently follow a Gaussian
distribution [4]. At equilibrium the system is described by the Boltzmann distribution:

P(\sigma) = \frac{e^{-\beta H(\sigma)}}{Z}    (7)

where Z = \sum_\sigma e^{-\beta H(\sigma)} is known as the partition function.

In Monte Carlo dynamics we can choose the transition probability (which satisfies the detailed balance
condition [5]) from the network state σ_a to the network state σ_b as:

P(\sigma_a \to \sigma_b) = \frac{e^{-\beta \Delta H}}{1 + e^{-\beta \Delta H}}    (8)

where ΔH = H(σ_b) − H(σ_a) and β = 1/(k_B T), with T the temperature and k_B the Boltzmann
constant. This choice of transition probability is usually known as the heat-bath choice. It should
be noted that in the limit T → 0 (β → ∞) the probability of passing to a state where the energy
increases (ΔH > 0) is zero, so only transitions where the energy decreases (ΔH < 0) or
remains the same (ΔH = 0) are accepted [5].

At each step of the dynamics we flip just a single spin, and with the heat-bath transition
probability (8) we find the following stochastic transition rule for one spin:

P(\sigma_i \to -\sigma_i) = \frac{1}{2}\left(1 - \sigma_i \tanh(\beta h_i)\right)    (9)

where h_i = \sum_{j \neq i} J_{ij} \sigma_j. Note that in the limit T → 0 (β → ∞) the dynamics (9) reduces to (2).
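A minimal sketch of this single-spin heat-bath dynamics might look as follows (Python/NumPy; the names are illustrative, and J is assumed symmetric with zero diagonal):

```python
import numpy as np

def heat_bath_sweep(sigma, J, beta, rng):
    """One Monte Carlo sweep: visit every spin once, flipping with probability (9)."""
    for i in rng.permutation(len(sigma)):
        h_i = J[i] @ sigma                                       # local field (J_ii = 0)
        p_flip = 0.5 * (1.0 - sigma[i] * np.tanh(beta * h_i))    # Eq. (9)
        if rng.random() < p_flip:
            sigma[i] = -sigma[i]
    return sigma
```

Indeed, as β → ∞, tanh(βh_i) → sgn(h_i), so p_flip becomes 0 when σ_i already points along h_i and 1 when it does not, which is exactly the deterministic rule (2).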

The initial state of the system may be far from equilibrium, and the dynamics (8) requires some time
for the system to reach the equilibrium distribution (7) at a temperature T. The memory retrieval
problem, after a long time from an initial condition of the neural network, can therefore be studied by the
equilibrium statistical mechanics of the model (6) with the random interactions J_ij following the rule (5) [3].

2. Restricted Boltzmann Machine (RBM)

2.1. Boltzmann Machine (BM)

The transition rule (2) (as we have seen, a Monte Carlo dynamics at zero temperature) is
a deterministic dynamics, since we can completely know the state of neuron σ_i at time t + Δt by
knowing its local field h_i at time t. On the other hand, the model with Hamiltonian (6) is under a
stochastic dynamics (8). This model with stochastic dynamics is known as the generalised Hopfield
model (GHM).

We are going to treat the Boltzmann machine (BM) as a generalisation of the Hopfield network;
accordingly, the Boltzmann machine is a stochastic neural network. For the construction of a Boltzmann
machine we divide the neurons into two groups, visible and hidden, as shown in Figure 2.
Figure 2: Structure of a Boltzmann machine; the red circles represent visible neurons, the blue circles represent hidden neurons
and the lines represent the connections characterised by the weights J_ij.

Visible neurons are those that we can access, e.g. data, while hidden neurons are latent variables
that can represent features of the data. In addition we also include a bias term for each neuron,
which basically acts as a threshold. Thus, we can write the Hamiltonian for the Boltzmann machine
as:

H(\sigma) = -\sum_{i<j} J_{ij} \sigma_i \sigma_j - \sum_i b_i \sigma_i    (10)

At equilibrium the configurations of σ obey the Boltzmann distribution (7).
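For concreteness, the energy (10) entering this distribution can be evaluated as in the following sketch (Python/NumPy; the names are ours):

```python
import numpy as np

def bm_energy(sigma, J, b):
    """Energy (10) of a Boltzmann machine state sigma (a +/-1 vector).

    J is symmetric with zero diagonal; the factor 1/2 converts the full
    double sum sigma @ J @ sigma into the sum over pairs i < j.
    """
    return -0.5 * sigma @ J @ sigma - b @ sigma
```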

2.2. Restricted Boltzmann Machine (RBM)

Now, to define the RBM we consider two layers, a visible layer and a hidden layer; there are no
connections between neurons in the visible layer and no connections between neurons in the hidden
layer, connections exist only between the layers (Figure 3). For the RBM the Hamiltonian is given by:

H(\sigma, s) = -\sum_{i,a} J_{ia} \sigma_i s_a - \sum_i b_i \sigma_i - \sum_a c_a s_a    (11)

where J_ia is the connection between visible neuron i and hidden neuron a, σ_i is the state of visible
neuron i with bias b_i, and s_a is the state of hidden neuron a with bias c_a.

The network state obeys the Boltzmann distribution:

P(\sigma, s) = \frac{e^{-\beta H(\sigma, s)}}{Z}    (12)

where Z = \sum_\sigma \sum_s e^{-\beta H(\sigma, s)}.
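As a sketch, the energy (11) can be evaluated directly (Python/NumPy; J has shape n_visible × n_hidden, and the names are illustrative):

```python
import numpy as np

def rbm_energy(sigma, s, J, b, c):
    """RBM energy (11): couplings exist only between the visible vector
    sigma and the hidden vector s; b and c are the biases."""
    return -sigma @ J @ s - b @ sigma - c @ s
```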
Figure 3: Structure of a restricted Boltzmann machine; the red circles represent visible neurons, the blue circles represent hidden
neurons and the lines represent the connections characterised by the weights J_ia.

Now, we aim to go into the dynamics of the RBM. First, we should note that since there are no
connections between neurons in the same layer, the neurons within a layer are conditionally independent,
and according to Bayes' rule we can write:

P(s|\sigma) = \frac{P(\sigma, s)}{\sum_s P(\sigma, s)}    (13)

By using (11) we can obtain:

P(s|\sigma) = \prod_a P(s_a|\sigma)    (14)

where:

P(s_a|\sigma) = \frac{e^{\beta c_a s_a + \sum_i \beta \sigma_i J_{ia} s_a}}{2\cosh\left(\beta c_a + \sum_i \beta \sigma_i J_{ia}\right)}    (15)

The process (14) is called recognition: given the states of the visible neurons we can easily sample
the states of the hidden neurons. Eq. (15) is the probability that, given the states of the neurons in the
visible layer, the state of neuron a in the hidden layer will be s_a.

Then, the network can reconstruct the states of the visible neurons from the states of the hidden
neurons by carrying out the converse process, given by:

P(\sigma|s) = \prod_i P(\sigma_i|s)    (16)

where:

P(\sigma_i|s) = \frac{e^{\beta b_i \sigma_i + \sum_a \beta s_a J_{ia} \sigma_i}}{2\cosh\left(\beta b_i + \sum_a \beta s_a J_{ia}\right)}    (17)
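Putting (15) and (17) together, recognition and reconstruction amount to block Gibbs sampling: with ±1 units, P(s_a = +1|σ) = 1/(1 + e^{−2βφ_a}) with φ_a = c_a + Σ_i σ_i J_ia, and similarly for the visible layer. A minimal sketch (Python/NumPy; the function names and the alternating loop are our own illustration):

```python
import numpy as np

def sample_hidden(sigma, J, c, beta, rng):
    """Recognition (14)-(15): sample every hidden unit given the visible layer."""
    phi = c + sigma @ J                              # phi_a = c_a + sum_i sigma_i J_ia
    p_plus = 1.0 / (1.0 + np.exp(-2.0 * beta * phi)) # P(s_a = +1 | sigma)
    return np.where(rng.random(phi.shape) < p_plus, 1, -1)

def sample_visible(s, J, b, beta, rng):
    """Reconstruction (16)-(17): sample every visible unit given the hidden layer."""
    phi = b + J @ s                                  # phi_i = b_i + sum_a s_a J_ia
    p_plus = 1.0 / (1.0 + np.exp(-2.0 * beta * phi)) # P(sigma_i = +1 | s)
    return np.where(rng.random(phi.shape) < p_plus, 1, -1)

# Alternating the two samplers performs block Gibbs sampling of (12).
rng = np.random.default_rng(1)
n_v, n_h = 6, 3
J = rng.normal(scale=0.5, size=(n_v, n_h))
b, c = np.zeros(n_v), np.zeros(n_h)
sigma = rng.choice([-1, 1], size=n_v)
for _ in range(100):
    s = sample_hidden(sigma, J, c, beta=1.0, rng=rng)
    sigma = sample_visible(s, J, b, beta=1.0, rng=rng)
```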
References

[1] D.J. Amit, H. Gutfreund, H. Sompolinsky, Phys. Rev. Lett. 55(14), 1530 (1985).

[2] J.J. Hopfield, Proc. Natl. Acad. Sci. USA 79, 2554 (1982).

[3] D.J. Amit, H. Gutfreund, H. Sompolinsky, Phys. Rev. A 32, 1007 (1985).

[4] D. Sherrington, S. Kirkpatrick, Phys. Rev. Lett. 35(26), 1792 (1975).
[5] For a more detailed description of Monte Carlo methods see, for example, the notes by E. Carlon available at: http://itf.fys.kuleuven.be/~enrico/Teaching/monte_carlo_2012.pdf.
