Ch16: Structured Probabilistic Models for Deep Learning
Jen-Tzung Chien
Outline
• Introduction
• The Challenge of Unstructured Modeling
• Using Graphs to Describe Model Structure
– Directed Models
– Undirected Models
– The Partition Function
– Energy-Based Models
• Restricted Boltzmann Machine
Introduction
• Because the structure of the model is defined by a graph, these models are often also referred to as graphical models
Detail of Task
• Density estimation
Given an input x, the machine learning system returns an estimate of the true density p(x) under the data-generating distribution
• Denoising
Given a damaged or incorrectly observed input x̃, the machine learning system returns an estimate of the original or correct x
• Sampling
the model generates new samples from the distribution p(x)
Real Problem
• Memory
Representing the distribution as a table would require too many values to store
• Statistical efficiency
Because the table-based model has an astronomical number of parameters, it will require an astronomically large training set to fit accurately
• Runtime
The runtime of these operations is as high as the intractable memory cost of storing the model
9
Structured Probabilistic Modelsfor Deep Learning Deep Learning
Solution
• Structured probabilistic models describe only the direct interactions between random variables. This allows the models to have significantly fewer parameters, and therefore to be estimated reliably from less data
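The memory blow-up of the table-based approach can be seen with a one-line calculation; a minimal sketch (the function name and example sizes are illustrative, not from the slides):

```python
# Entries a lookup table needs to represent a full joint distribution
# over n variables that each take k values.
def table_entries(n, k=2):
    """Number of entries in a full joint table over n k-ary variables."""
    return k ** n

# 10 binary variables are manageable; 100 are astronomically not.
print(table_entries(10))   # 1024
print(table_entries(100))  # 2**100, roughly 1.27e30 entries
```

Even at one byte per entry, the second table could not be stored on any existing hardware, which is exactly the memory problem named above.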
Using Graphs to Describe Model Structure
• Graphical models can be largely divided into two categories: models based on directed acyclic graphs, and models based on undirected graphs
Directed Models
• Directed graphical models are called "directed" because their edges point from one vertex to another; the direction of an edge indicates which variable's distribution is defined in terms of the other's
Directed Models
• In our relay race example, where t0, t1, and t2 are the finishing times of the first, second, and third runners, the directed graph t0 → t1 → t2 gives
p(t0, t1, t2) = p(t0) p(t1 | t0) p(t2 | t1)
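This factorization can be checked numerically; a minimal sketch with the finishing times discretized into two bins (0 = "fast", 1 = "slow") and hypothetical probability values:

```python
# Directed factorization p(t0, t1, t2) = p(t0) p(t1|t0) p(t2|t1).
p_t0 = {0: 0.6, 1: 0.4}                   # p(t0)
p_t1_given_t0 = {0: {0: 0.7, 1: 0.3},     # p(t1 | t0): a slow first runner
                 1: {0: 0.2, 1: 0.8}}     # makes a slow handoff likelier
p_t2_given_t1 = {0: {0: 0.8, 1: 0.2},
                 1: {0: 0.3, 1: 0.7}}

def joint(t0, t1, t2):
    """Joint probability computed from the three small conditional tables."""
    return p_t0[t0] * p_t1_given_t0[t0][t1] * p_t2_given_t1[t1][t2]

# The factorized joint is a valid distribution: it sums to 1.
total = sum(joint(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1))
print(total)  # 1.0 up to float rounding
```

The model never stores the 8-entry joint table; it stores three tables with 2, 4, and 4 entries, and any joint value is recovered by multiplication.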
Directed Models
• To model n discrete variables each having k values, the cost of the single-table approach scales like O(k^n), as we have observed before
• If m is the maximum number of variables appearing in a single conditional distribution, the cost of the tables for the directed model scales like O(k^m). As long as we can design a model such that m << n, we get very dramatic savings
• In other words, so long as each variable has few parents in the graph, the distribution can be represented with very few parameters
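The O(k^n) versus O(k^m) comparison is simple arithmetic; a sketch with hypothetical sizes (n variables, k values each, at most m variables per conditional):

```python
# Table entries needed by the full joint versus a directed model.
def full_table(n, k):
    return k ** n        # O(k^n): one entry per joint configuration

def directed_tables(n, k, m):
    return n * k ** m    # n conditionals, each a table of at most k^m entries

n, k, m = 32, 2, 3
print(full_table(n, k))          # 4294967296
print(directed_tables(n, k, m))  # 256
```

With 32 binary variables and at most 3 variables per conditional, the directed model needs 256 entries where the flat table needs over four billion.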
Undirected Models
• Not all situations we might want to model have such a clear direction to their
interactions
Undirected Models
• A clique of the graph is a subset of nodes that are all connected to each other by an edge of the graph
• For each clique C in the graph, a factor ϕ(C) (also called a clique potential) measures the affinity of the variables in that clique for being in each of their possible joint states
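A minimal sketch of clique potentials over a three-variable chain a – b – c, whose cliques are {a, b} and {b, c} (all potential values here are hypothetical):

```python
# Clique potentials: non-negative affinities for each joint state of a clique.
phi_ab = {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 4.0}
phi_bc = {(0, 0): 3.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 3.0}

def p_tilde(a, b, c):
    """Unnormalized probability: the product of the clique potentials."""
    return phi_ab[(a, b)] * phi_bc[(b, c)]

# States whose cliques have high affinity get large unnormalized scores.
print(p_tilde(1, 1, 1))  # 4.0 * 3.0 = 12.0
print(p_tilde(0, 1, 0))  # 1.0 * 1.0 = 1.0
```

Note that p̃ is not yet a probability distribution; the potentials are only required to be non-negative, not to sum to one.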
The Partition Function
• To obtain a valid probability distribution, the product of clique potentials p̃(x) is divided by a normalizing constant Z, known as the partition function:
p(x) = p̃(x) / Z
Energy-Based Models
• Many undirected models define p̃(x) through an energy function E(x):
p̃(x) = exp(−E(x))
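A minimal sketch tying the two formulas together for a two-variable model (the energy values are hypothetical): p̃(x) = exp(−E(x)), and the partition function Z turns p̃ into a normalized p.

```python
import math

# Energies for each joint state of two binary variables; lower energy
# means higher unnormalized probability.
energy = {(0, 0): 0.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): -1.0}

def p_tilde(x):
    return math.exp(-energy[x])

Z = sum(p_tilde(x) for x in energy)       # partition function
p = {x: p_tilde(x) / Z for x in energy}   # p(x) = p~(x) / Z

print(abs(sum(p.values()) - 1.0) < 1e-9)  # True: p is normalized
```

The lowest-energy state, (1, 1), ends up as the most probable one; summing over all states to compute Z is only feasible here because the state space is tiny, which is why Z is intractable in general.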
Restricted Boltzmann Machine
• The RBM has a single layer of latent variables that may be used to learn a representation for the input
• The canonical RBM is an energy-based model with binary visible and hidden units. Its energy function is
E(v, h) = −b⊤v − c⊤h − v⊤Wh
where b, c, and W are unconstrained, real-valued, learnable parameters
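Assuming the canonical binary RBM energy E(v, h) = −b⊤v − c⊤h − v⊤Wh, a minimal sketch with small hypothetical parameter values:

```python
# Binary RBM energy: E(v, h) = -b^T v - c^T h - v^T W h.
def rbm_energy(v, h, b, c, W):
    bv = sum(bi * vi for bi, vi in zip(b, v))
    ch = sum(cj * hj for cj, hj in zip(c, h))
    vWh = sum(v[i] * W[i][j] * h[j]
              for i in range(len(v)) for j in range(len(h)))
    return -bv - ch - vWh

b = [0.1, -0.2]                  # visible biases
c = [0.0, 0.3]                   # hidden biases
W = [[0.5, -0.1], [0.2, 0.4]]    # W[i][j] couples v_i and h_j

print(round(rbm_energy([1, 0], [0, 1], b, c, W), 6))  # -0.3
```

For v = [1, 0], h = [0, 1] the three terms are b⊤v = 0.1, c⊤h = 0.3, and v⊤Wh = −0.1, giving E = −0.1 − 0.3 + 0.1 = −0.3.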
• Because there are no direct connections within a layer, the conditional distributions p(h | v) and p(v | h) are factorial and simple to compute and sample from
• Together these properties allow for efficient block Gibbs sampling, which alternates between sampling all of h simultaneously and sampling all of v simultaneously
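A sketch of block Gibbs sampling in a binary RBM (parameter values hypothetical); each layer is sampled in one pass because its units are conditionally independent given the other layer:

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sample_h_given_v(v, c, W):
    """p(h_j = 1 | v) = sigmoid(c_j + sum_i v_i W_ij); sample all h at once."""
    return [1 if random.random() < sigmoid(c[j] + sum(v[i] * W[i][j]
                                                      for i in range(len(v))))
            else 0
            for j in range(len(c))]

def sample_v_given_h(h, b, W):
    """p(v_i = 1 | h) = sigmoid(b_i + sum_j W_ij h_j); sample all v at once."""
    return [1 if random.random() < sigmoid(b[i] + sum(W[i][j] * h[j]
                                                      for j in range(len(h))))
            else 0
            for i in range(len(b))]

b, c = [0.0, 0.0], [0.0, 0.0]
W = [[1.0, -1.0], [-1.0, 1.0]]
v = [1, 0]
for _ in range(5):                  # a short Gibbs chain
    h = sample_h_given_v(v, c, W)   # block-sample the whole hidden layer
    v = sample_v_given_h(h, b, W)   # block-sample the whole visible layer
print(v, h)
```

Contrast this with generic Gibbs sampling, which would update one unit at a time; the bipartite structure is what makes the two-block alternation valid.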
Binary RBM
• Since the energy function itself is just a linear function of the parameters, it is easy to take its derivatives. For example,
∂E(v, h) / ∂W_{i,j} = −v_i h_j
• Training the model induces a representation h of the data v. We can often use E_{h∼p(h|v)}[h] as a set of features to describe v
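For the binary RBM this expectation has a closed form, E[h_j | v] = sigmoid(c_j + Σ_i v_i W_ij), so the features can be computed without sampling; a minimal sketch (parameter values hypothetical):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def features(v, c, W):
    """E[h_j | v] = p(h_j = 1 | v) for each hidden unit j."""
    return [sigmoid(c[j] + sum(v[i] * W[i][j] for i in range(len(v))))
            for j in range(len(c))]

c = [0.0, 0.5]
W = [[1.0, -1.0], [0.0, 2.0]]
print(features([1, 1], c, W))  # [sigmoid(1.0), sigmoid(1.5)], about [0.731, 0.818]
```

Each feature is a soft detector: it reports how strongly the corresponding hidden unit is activated by the input v.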