
Neural Networks and Deep Learning

Dr. Srikanth Thota



Restricted Boltzmann Machine

The Boltzmann machine considered so far was fully observed. We will now have hidden units as
well.
A classic architecture called the restricted Boltzmann machine assumes
a bipartite graph over the visible units and hidden units:
A bipartite graph (or bigraph) is a graph whose vertices can be divided
into two disjoint and independent sets U and V such that every edge
connects a vertex in U to one in V.
A complete bipartite graph or biclique is a special kind of bipartite
graph where every vertex of the first set is connected to every vertex of
the second set.
The hidden units learn more abstract features of the data.



Restricted Boltzmann Machine

An RBM has binary-valued hidden and visible units and consists of a matrix of weights W of
size m × n.
Each weight element w_{i,j} of the matrix is associated with the connection between the
visible (input) unit v_i and the hidden unit h_j.
There are bias weights (offsets) a_i for v_i and b_j for h_j.
Given the weights and biases, the energy of a configuration (pair of Boolean vectors) (v, h)
is defined as
E(v, h) = −∑_i a_i v_i − ∑_j b_j h_j − ∑_i ∑_j v_i w_{i,j} h_j

In matrix notation, E(v, h) = −a^T v − b^T h − v^T W h.


This energy function is analogous to that of a Hopfield network.
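For concreteness, here is a minimal NumPy sketch of this energy function; the function and variable names are illustrative, not from the slides.

```python
# A minimal sketch (not from the slides): evaluating the RBM energy
# E(v, h) = -a^T v - b^T h - v^T W h for binary vectors v and h.
import numpy as np

def rbm_energy(v, h, W, a, b):
    """Energy of one visible/hidden configuration.

    v: (m,) binary visible vector, h: (n,) binary hidden vector,
    W: (m, n) weight matrix, a: (m,) visible biases, b: (n,) hidden biases.
    """
    return -(a @ v) - (b @ h) - (v @ W @ h)

# Tiny example with m = 3 visible and n = 2 hidden units.
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(3, 2))
a = np.zeros(3)
b = np.zeros(2)
v = np.array([1, 0, 1])
h = np.array([0, 1])
print(rbm_energy(v, h, W, a, b))
```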



Restricted Boltzmann Machine

The joint probability distribution for the visible and hidden vectors is defined in terms of
the energy function as follows
P(v, h) = (1/Z) e^{−E(v, h)}
where Z is a partition function defined as the sum of e−E(v,h) over all possible
configurations, which can be interpreted as a normalizing constant to ensure that the
probabilities sum to 1.
The marginal probability of a visible vector is the sum of P(v, h) over all possible hidden
layer configurations, P(v) = (1/Z) ∑_{h} e^{−E(v, h)}, and vice versa.
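A minimal sketch of Z and P(v) by brute-force enumeration follows; since Z sums over all 2^(m+n) configurations, this is feasible only for a toy RBM, and the helper names are illustrative.

```python
# A minimal sketch (not from the slides): computing Z and P(v) by enumerating
# every visible/hidden configuration of a tiny RBM.
import itertools
import numpy as np

def energy(v, h, W, a, b):
    # E(v, h) = -a^T v - b^T h - v^T W h
    return -(a @ v) - (b @ h) - (v @ W @ h)

def partition_function(W, a, b):
    # Z = sum over every (v, h) configuration of exp(-E(v, h)).
    m, n = W.shape
    return sum(
        np.exp(-energy(np.array(v), np.array(h), W, a, b))
        for v in itertools.product([0, 1], repeat=m)
        for h in itertools.product([0, 1], repeat=n)
    )

def marginal_visible(v, W, a, b):
    # P(v) = (1/Z) * sum over all hidden configurations of exp(-E(v, h)).
    n = W.shape[1]
    unnormalized = sum(
        np.exp(-energy(v, np.array(h), W, a, b))
        for h in itertools.product([0, 1], repeat=n)
    )
    return unnormalized / partition_function(W, a, b)
```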



Restricted Boltzmann Machine

The hidden unit activations are mutually independent given the visible unit activations and
vice versa.
For m visible units and n hidden units, the conditional probability of a configuration of the
visible units v, given a configuration of the hidden units h, is P(v|h) = ∏_{i=1}^{m} P(v_i|h).
Conversely, the conditional probability of h given v is P(h|v) = ∏_{j=1}^{n} P(h_j|v).

Individual activation probabilities:

P(h_j = 1|v) = σ(b_j + ∑_{i=1}^{m} w_{i,j} v_i)
P(v_i = 1|h) = σ(a_i + ∑_{j=1}^{n} w_{i,j} h_j)

where σ denotes the logistic sigmoid.
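A minimal NumPy sketch of these factorized conditionals, computing all hidden (or visible) activation probabilities at once; the helper names are illustrative.

```python
# A minimal sketch of the RBM conditional activation probabilities.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def hidden_probs(v, W, b):
    """P(h_j = 1 | v) = sigma(b_j + sum_i w_ij v_i), for all j at once."""
    return sigmoid(b + v @ W)   # shape (n,)

def visible_probs(h, W, a):
    """P(v_i = 1 | h) = sigma(a_i + sum_j w_ij h_j), for all i at once."""
    return sigmoid(a + W @ h)   # shape (m,)
```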



Restricted Boltzmann Machine
To estimate the model statistics for the negative update, start from the data and run a
few steps of Gibbs sampling.
By the conditional independence property, all the hiddens can be sampled in parallel, and
then all the visibles can be sampled in parallel.

This procedure is called contrastive divergence.


The resulting samples give a good approximation to the model distribution.
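A minimal sketch of one such block Gibbs step, assuming binary units and the sigmoid conditionals above; all names here are illustrative.

```python
# A minimal sketch (not from the slides) of one block Gibbs step in an RBM:
# every hidden unit is sampled in parallel given v, then every visible unit
# is sampled in parallel given the new h.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gibbs_step(v, W, a, b):
    # Sample all hidden units at once from P(h_j = 1 | v).
    h = (rng.random(W.shape[1]) < sigmoid(b + v @ W)).astype(float)
    # Sample all visible units at once from P(v_i = 1 | h).
    v_new = (rng.random(W.shape[0]) < sigmoid(a + W @ h)).astype(float)
    return v_new, h
```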



Restricted Boltzmann Machine
Contrastive Divergence (CD) algorithm steps for a single sample (a code sketch follows these steps):
For a training sample v, compute the probabilities of the hidden units and sample a hidden
activation vector h from this probability distribution.
Compute the outer product of v and h and call this the positive gradient.
From h, sample a reconstruction v’ of the visible units, then resample the hidden
activations h’ from this. (Gibbs sampling step)
Compute the outer product of v’ and h’ and call this the negative gradient.
The update to the weight matrix W is the positive gradient minus the negative gradient, times
some learning rate:
∆W = η(v h^T − v′ h′^T)

Update the biases a and b analogously


∆a = η(v − v′ )
∆b = η(h − h′ )
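A minimal sketch of a CD-1 update for a single training vector, following the steps above; the learning rate and helper names are illustrative assumptions.

```python
# A minimal sketch (not from the slides) of one CD-1 update for a single
# training vector v, with W of shape (m, n), a of shape (m,), b of shape (n,).
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(v, W, a, b, eta=0.1):
    # Positive phase: sample h ~ P(h | v) and form the positive gradient v h^T.
    h = (rng.random(W.shape[1]) < sigmoid(b + v @ W)).astype(float)
    pos_grad = np.outer(v, h)

    # Negative phase: reconstruct v' from h, resample h' from v' (one Gibbs
    # step), and form the negative gradient v' h'^T.
    v_prime = (rng.random(W.shape[0]) < sigmoid(a + W @ h)).astype(float)
    h_prime = (rng.random(W.shape[1]) < sigmoid(b + v_prime @ W)).astype(float)
    neg_grad = np.outer(v_prime, h_prime)

    # Parameter updates: dW = eta (v h^T - v' h'^T), da = eta (v - v'),
    # db = eta (h - h').
    W = W + eta * (pos_grad - neg_grad)
    a = a + eta * (v - v_prime)
    b = b + eta * (h - h_prime)
    return W, a, b
```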



Thank You

