Slides Lecture6
Differentiating the log likelihood with respect to the weights gives

\[
\frac{\partial}{\partial w_{ij}} \ln P(\{x^{(n)}\}_1^N \mid W)
= \sum_{n=1}^{N} \left( x_i^{(n)} x_j^{(n)} - \langle x_i x_j \rangle_{P(x|W)} \right)
= N \left( \langle x_i x_j \rangle_{\mathrm{Data}} - \langle x_i x_j \rangle_{P(x|W)} \right)
\]
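As a rough numerical illustration (not from the slides), the sketch below estimates this gradient for a small fully visible Boltzmann machine, assuming the usual model \(P(x|W) \propto \exp(\tfrac{1}{2} x^T W x)\) with \(\pm 1\) units: the data term \(\langle x_i x_j \rangle_{\mathrm{Data}}\) is an average over the training patterns, and the model term \(\langle x_i x_j \rangle_{P(x|W)}\) is approximated by Gibbs sampling. The unit encoding, sampling schedule, and function names are assumptions.

```python
import numpy as np

def gibbs_sample(W, n_steps=500, rng=None):
    """Approximate a sample x ~ P(x|W) ∝ exp(0.5 x^T W x), with x_i in {-1, +1}."""
    rng = rng if rng is not None else np.random.default_rng(0)
    n = W.shape[0]
    x = rng.choice([-1.0, 1.0], size=n)
    for _ in range(n_steps):
        for i in range(n):
            a = W[i] @ x - W[i, i] * x[i]            # input from the other units
            p_plus = 1.0 / (1.0 + np.exp(-2.0 * a))  # P(x_i = +1 | rest)
            x[i] = 1.0 if rng.random() < p_plus else -1.0
    return x

def log_likelihood_gradient(X, W, n_samples=200):
    """N * (<x_i x_j>_Data - <x_i x_j>_{P(x|W)}), with the model term estimated by sampling."""
    N = X.shape[0]
    data_corr = X.T @ X / N                          # <x_i x_j>_Data
    rng = np.random.default_rng(1)
    samples = np.array([gibbs_sample(W, rng=rng) for _ in range(n_samples)])
    model_corr = samples.T @ samples / n_samples     # <x_i x_j>_{P(x|W)}
    return N * (data_corr - model_corr)
```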
Interpretation of Boltzmann Machine Learning
Illustrative description (MacKay’s book, p. 523).
Boltzmann Machines with Hidden Units
To model higher-order correlations, hidden units are required.
• x: states of the visible units,
• h: states of the hidden units,
• the generic state of a unit (either visible or hidden) is denoted \(y_i\), with \(y \equiv (x, h)\),
• the state of the network when the visible units are clamped in state \(x^{(n)}\) is \(y^{(n)} \equiv (x^{(n)}, h)\).
The likelihood of W given a single pattern \(x^{(n)}\) is

\[
P(x^{(n)} \mid W) = \sum_h P(x^{(n)}, h \mid W)
= \frac{1}{Z(W)} \sum_h \exp\!\left( \tfrac{1}{2}\, y^{(n)T} W\, y^{(n)} \right),
\]

where

\[
Z(W) = \sum_{x,h} \exp\!\left( \tfrac{1}{2}\, y^T W\, y \right).
\]
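For intuition, on a very small network this likelihood can be evaluated exactly by enumerating every hidden configuration (and, for the partition function Z(W), every joint configuration as well). The sketch below does this; the \(\pm 1\) encoding and the helper names are assumptions, and the approach is only feasible at toy sizes.

```python
import itertools
import numpy as np

def all_states(n):
    """All 2^n configurations of n binary +/-1 units."""
    return [np.array(s) for s in itertools.product([-1.0, 1.0], repeat=n)]

def pattern_likelihood(x_n, W, n_hidden):
    """P(x^(n)|W) = sum_h exp(0.5 y^T W y) / Z(W), with y = (x^(n), h)."""
    n_visible = len(x_n)
    def unnorm(y):
        return np.exp(0.5 * y @ W @ y)
    # numerator: visible units clamped to x^(n), sum over hidden states
    num = sum(unnorm(np.concatenate([x_n, h])) for h in all_states(n_hidden))
    # partition function: sum over all joint states y = (x, h)
    Z = sum(unnorm(np.concatenate([v, h]))
            for v in all_states(n_visible) for h in all_states(n_hidden))
    return num / Z
```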
Boltzmann Machines with Hidden Units (cont.)
Applying the maximum likelihood method as before, one obtains

\[
\frac{\partial}{\partial w_{ij}} \ln P(\{x^{(n)}\}_1^N \mid W)
= \sum_n \Big( \underbrace{\langle y_i y_j \rangle_{P(h|x^{(n)},W)}}_{\text{clamped to } x^{(n)}}
- \underbrace{\langle y_i y_j \rangle_{P(x,h|W)}}_{\text{free}} \Big)
\]
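A minimal sketch of the two expectations at the same toy scale, again by exact enumeration: the clamped term averages \(y_i y_j\) over \(P(h|x^{(n)},W)\) with the visible units fixed at \(x^{(n)}\), and the free term averages over the joint \(P(x,h|W)\). In practice both are estimated by sampling; the function and variable names here are assumptions.

```python
import itertools
import numpy as np

def states(n):
    """All 2^n configurations of n binary +/-1 units."""
    return [np.array(s) for s in itertools.product([-1.0, 1.0], repeat=n)]

def correlations(ys, W):
    """Correlation matrix <y_i y_j> under P(y) ∝ exp(0.5 y^T W y), restricted to the given states."""
    Y = np.array(ys)
    w = np.exp(0.5 * np.einsum('ki,ij,kj->k', Y, W, Y))
    w /= w.sum()
    return (Y * w[:, None]).T @ Y

def clamped_and_free(x_n, W, n_hidden):
    """<y_i y_j> in the clamped phase (visible units fixed at x^(n)) and in the free phase."""
    clamped = correlations([np.concatenate([x_n, h]) for h in states(n_hidden)], W)
    free = correlations([np.concatenate([v, h])
                         for v in states(len(x_n)) for h in states(n_hidden)], W)
    return clamped, free
```

The gradient contribution of pattern \(x^{(n)}\) is then the clamped matrix minus the free matrix, summed over the training set.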
Boltzmann Machine Weight Updates
Combining gradient descent with simulated annealing, the weights are updated as

\[
\Delta w_{ij} = \frac{\eta}{T} \Big( \underbrace{\langle y_i y_j \rangle_{P(h|x^{(n)},W)}}_{\text{clamped to } (x_i^{(n)},\, x_o^{(n)})}
- \underbrace{\langle y_i y_j \rangle_{P(h|x,W)}}_{\text{clamped to } x_i^{(n)}} \Big)
\]
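A sketch of the corresponding learning step, assuming the clamped and free correlations are estimated as above (or by sampling after annealing down to temperature T); the learning rate, the annealing schedule, and the symmetry/zero-diagonal handling are assumptions, not values from the slides.

```python
import numpy as np

def update_weights(W, clamped_corr, free_corr, eta=0.05, T=1.0):
    """One step: dw_ij = (eta / T) * (<y_i y_j>_clamped - <y_i y_j>_free)."""
    dW = (eta / T) * (clamped_corr - free_corr)
    dW = 0.5 * (dW + dW.T)       # keep the weight matrix symmetric
    np.fill_diagonal(dW, 0.0)    # no self-connections
    return W + dW

# Illustrative (assumed) annealing schedule: repeat learning sweeps while the
# sampling temperature is gradually lowered towards T = 1.
# for T in np.linspace(4.0, 1.0, 10):
#     clamped_corr, free_corr = ...  # re-estimate the correlations at temperature T
#     W = update_weights(W, clamped_corr, free_corr, T=T)
```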