
Probabilistic graphical models

Sergio Barbarossa

Signal Processing for Big Data


Markov network

A Markov network consists of:

•  An undirected graph G = (V, E), where each vertex v ∈ V represents a random variable
   and each edge {u, v} ∈ E represents a statistical dependency between the random
   variables u and v

•  A set of potential (or compatibility) functions ψ_k (also called factors or
   clique (*) potentials), where each ψ_k has as its domain the variables of some clique k
   in G. Each ψ_k is a mapping onto the non-negative real numbers

Defining property: A node p is conditionally independent of another node q in the Markov
network, given some set of nodes S, if every path from p to q passes through a node in S
________
(*) A maximal clique is a set of vertices that induces a complete subgraph and that is not a
subset of the vertices of any larger complete subgraph
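
As a quick illustration of the separation property above, here is a minimal Python sketch; the four-node graph and the use of networkx are purely illustrative assumptions:

    import networkx as nx

    # Toy undirected Markov network:  p - a - q  and  p - b - q
    G = nx.Graph()
    G.add_edges_from([("p", "a"), ("a", "q"), ("p", "b"), ("b", "q")])

    def separated(G, p, q, S):
        """True if every path from p to q passes through a node in S."""
        H = G.copy()
        H.remove_nodes_from(S)
        return not nx.has_path(H, p, q)

    print(separated(G, "p", "q", {"a"}))       # False: the path p-b-q avoids a
    print(separated(G, "p", "q", {"a", "b"}))  # True: every path is blocked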


Joint probability density function (pdf) over a graph

The joint pdf represented by a Markov network can be written as:

p_X(x) = \frac{1}{Z} \prod_{c \in C} \psi_c(x_c)     (1)

where x_c is the state of the random variables in the c-th clique, and the
normalizing constant Z (also called the partition function) is

Z = \sum_{x} \prod_{c \in C} \psi_c(x_c)
The normalized product constitutes a joint distribution that embodies all the
conditional independencies portrayed by the graph

Any positive Markov random field can be expressed in the form (1) (Hammersley–Clifford theorem)
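
To make the factorization (1) concrete, the following minimal Python sketch builds the joint distribution of a small chain-structured Markov network by brute force; the two binary clique potentials are invented purely for illustration:

    import itertools
    import numpy as np

    # Chain x1 - x2 - x3 with binary variables and two pairwise clique potentials
    psi_12 = np.array([[2.0, 1.0],
                       [1.0, 2.0]])   # psi_12[x1, x2]
    psi_23 = np.array([[1.0, 3.0],
                       [3.0, 1.0]])   # psi_23[x2, x3]

    # Unnormalized product of clique potentials, then the partition function Z
    p = {(x1, x2, x3): psi_12[x1, x2] * psi_23[x2, x3]
         for x1, x2, x3 in itertools.product([0, 1], repeat=3)}
    Z = sum(p.values())
    joint = {x: v / Z for x, v in p.items()}   # exactly the form (1)

    print(Z, joint[(0, 0, 0)])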


Example

Any density that factorizes according to the graph in the figure must have the form of a
product of potentials over its cliques, as in (1)



Bayes networks

Bayesian networks are directed acyclic graphs (DAG) whose nodes represent
random variables and whose arcs encode conditional independencies between
the variables

A DAG is a Bayesian network relative to a set of variables if the joint distribution of the
node values can be written as the product of the local distributions of each node given its
parents:

p_X(x_1, x_2, \ldots, x_n) = \prod_{i=1}^{n} p(x_i / \mathrm{parents}(x_i))

Given a DAG D and a probability distribution P, a necessary and sufficient condition for D
to be a Bayesian network of P is that each variable X_i be conditionally independent of all
its non-descendants, given its parents
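
As a minimal Python sketch of this factorization (the DAG is the one used in the example below, and all conditional probability tables are made-up numbers):

    # DAG:  a -> b,  a -> c,  c -> e,  d has no parents
    p_a = {0: 0.6, 1: 0.4}
    p_d = {0: 0.7, 1: 0.3}
    p_b_a = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}   # p_b_a[a][b] = p(b/a)
    p_c_a = {0: {0: 0.5, 1: 0.5}, 1: {0: 0.3, 1: 0.7}}
    p_e_c = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.4, 1: 0.6}}

    def joint(a, b, c, d, e):
        # Product of the local distributions of each node given its parents
        return p_a[a] * p_d[d] * p_b_a[a][b] * p_c_a[a][c] * p_e_c[c][e]

    # The product sums to 1 over all configurations: no normalization constant is needed
    total = sum(joint(a, b, c, d, e)
                for a in (0, 1) for b in (0, 1) for c in (0, 1)
                for d in (0, 1) for e in (0, 1))
    print(total)   # 1.0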


Two sets of nodes X and Y are d-separated by a third set Z if the corresponding variable
sets X and Y are statistically independent given the variables in Z

The minimal set of nodes which d-separates node X from all other nodes is given by
X's Markov blanket

Example (DAG over the nodes a, b, c, d, e):

p(a, b, c, d, e) = p(a) \, p(d) \, p(b/a) \, p(c/a) \, p(e/c)
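
Recall that in a DAG the Markov blanket of a node consists of its parents, its children, and its children's other parents. A minimal networkx sketch for the example DAG above:

    import networkx as nx

    # The example DAG:  a -> b,  a -> c,  c -> e,  d isolated
    D = nx.DiGraph()
    D.add_edges_from([("a", "b"), ("a", "c"), ("c", "e")])
    D.add_node("d")

    def markov_blanket(D, v):
        """Parents, children, and the children's other parents of node v."""
        parents = set(D.predecessors(v))
        children = set(D.successors(v))
        coparents = {p for ch in children for p in D.predecessors(ch)} - {v}
        return parents | children | coparents

    print(markov_blanket(D, "c"))   # {'a', 'e'}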



Markov random field and Gibbs distribution

Joint probability distribution

p_X(x) = \frac{1}{Z} \prod_{c \in C} \psi_c(x_c)

where C is the set of cliques

If the product is positive for any x, then, introducing the functions V_c(x_c) := -\log \psi_c(x_c),
the pdf can be written in exponential form

p_X(x) = \frac{1}{Z} \exp\Big\{ -\sum_{c \in C} V_c(x_c) \Big\}

This is known, in physics, as the Gibbs (or Boltzmann) distribution with interaction
potentials V_c and energy

U_X(x) = \sum_{c \in C} V_c(x_c)
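
Continuing the earlier chain example (same made-up potentials), a short Python check that the energy-based Gibbs form reproduces the product-of-potentials form:

    import itertools
    import numpy as np

    psi_12 = np.array([[2.0, 1.0], [1.0, 2.0]])
    psi_23 = np.array([[1.0, 3.0], [3.0, 1.0]])

    # Clique energies V_c = -log(psi_c); the Gibbs form uses their sum U
    V_12, V_23 = -np.log(psi_12), -np.log(psi_23)

    states = list(itertools.product([0, 1], repeat=3))
    U = {(x1, x2, x3): V_12[x1, x2] + V_23[x2, x3] for x1, x2, x3 in states}
    Z = sum(np.exp(-u) for u in U.values())
    gibbs = {x: np.exp(-U[x]) / Z for x in states}

    # Direct product-of-potentials construction gives the same distribution
    prod = {(x1, x2, x3): psi_12[x1, x2] * psi_23[x2, x3] for x1, x2, x3 in states}
    Zp = sum(prod.values())
    print(max(abs(gibbs[x] - prod[x] / Zp) for x in states))   # ~0.0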


Key information represented by the graph

If i and j are not neighbors, x_i and x_j are independent, conditioned on all the other variables

Given a subset a of vertices, p_X(x) factorizes as

p_X(x) = \frac{1}{Z} \Big[ \prod_{c \in C:\, c \cap a \neq \emptyset} \psi_c(x_c) \Big] \Big[ \prod_{c \in C:\, c \cap a = \emptyset} \psi_c(x_c) \Big]

where the second factor does not depend on x_a

As a consequence

p(x_a / x_{V \setminus a}) = p(x_a / x_{\partial a})

i.e., the conditional pdf of x_a given all the other variables depends only on the neighbors ∂a of a


Example: Gaussian Markov Random Field

p_X(x) = \frac{1}{\sqrt{(2\pi)^n |\Sigma|}} \, e^{-\frac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu)}

The "Markovianity" shows up when each variable only interacts with a few others through
the quadratic energy: the precision matrix

\Theta := \Sigma^{-1}

must be sparse

A random vector X = (X_1, \ldots, X_n) is Markov with respect to a graph G if, for all
vertex cutsets S breaking the graph into disjoint pieces A and B, the conditional
independence statement X_A ⊥ X_B / X_S holds
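
A minimal numerical sketch of a zero-mean GMRF with a sparse precision matrix (the chain graph and the coefficient values are arbitrary choices), including one standard way to draw a sample through a Cholesky factor of Θ:

    import numpy as np

    rng = np.random.default_rng(0)

    # Chain-graph GMRF on n nodes: tridiagonal (hence sparse) precision matrix Theta
    n = 5
    Theta = (np.diag(2.0 * np.ones(n))
             + np.diag(-0.9 * np.ones(n - 1), 1)
             + np.diag(-0.9 * np.ones(n - 1), -1))
    Sigma = np.linalg.inv(Theta)

    # Sample x ~ N(0, Theta^{-1}) using the Cholesky factor of the precision matrix
    L = np.linalg.cholesky(Theta)                     # Theta = L L^T
    x = np.linalg.solve(L.T, rng.standard_normal(n))  # Cov(x) = Theta^{-1}

    # Sigma is dense, but the zero pattern of Theta encodes the missing edges of the chain
    print(np.round(Theta, 2))
    print(np.round(Sigma, 2))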


In a GMRF, nodes i and j are neighbors if and only if the entries of the inverse covariance
matrix satisfy \Theta_{ij} = (\Sigma^{-1})_{ij} \neq 0

A Gauss-Markov field may be simply defined by its quadratic energy function

U_X(x) = \frac{1}{2} x^T \Theta x - b^T x

where \Theta is a sparse symmetric positive definite matrix with \Theta_{ij} = 0 whenever {i, j} ∉ E

The most probable configuration is then the solution of the sparse system

\Theta x = b
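
A minimal Python sketch of this last step, using the same hypothetical chain-graph precision matrix stored in sparse form and an arbitrary linear term b:

    import numpy as np
    from scipy.sparse import diags
    from scipy.sparse.linalg import spsolve

    n = 5
    Theta = diags([-0.9 * np.ones(n - 1), 2.0 * np.ones(n), -0.9 * np.ones(n - 1)],
                  offsets=[-1, 0, 1], format="csc")
    b = np.ones(n)   # linear term of the quadratic energy (arbitrary here)

    # Most probable configuration: solution of the sparse system Theta x = b
    x_map = spsolve(Theta, b)
    print(x_map)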


Prop. 1: Each diagonal entry of the inverse covariance matrix is the reciprocal of a
partial variance

Prop. 2: Each off-diagonal entry of the inverse covariance matrix, scaled to have a unit
diagonal, is the negative of the partial correlation coefficient between the two
corresponding variables, partialled on all the remaining variables
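
A small numerical check of both properties via the Gaussian conditioning (Schur complement) formula; the covariance matrix below is randomly generated just for illustration:

    import numpy as np

    rng = np.random.default_rng(1)

    A = rng.standard_normal((4, 4))
    Sigma = A @ A.T + 4 * np.eye(4)    # random symmetric positive definite covariance
    Theta = np.linalg.inv(Sigma)

    def conditional_cov(Sigma, keep, given):
        """Covariance of X_keep given X_given (Gaussian conditioning formula)."""
        S_kk = Sigma[np.ix_(keep, keep)]
        S_kg = Sigma[np.ix_(keep, given)]
        S_gg = Sigma[np.ix_(given, given)]
        return S_kk - S_kg @ np.linalg.inv(S_gg) @ S_kg.T

    # Prop. 1: partial variance of X_0 given the rest equals 1 / Theta[0, 0]
    print(conditional_cov(Sigma, [0], [1, 2, 3])[0, 0], 1.0 / Theta[0, 0])

    # Prop. 2: partial correlation of (X_0, X_1) given the rest
    #          equals -Theta[0, 1] / sqrt(Theta[0, 0] * Theta[1, 1])
    S2 = conditional_cov(Sigma, [0, 1], [2, 3])
    print(S2[0, 1] / np.sqrt(S2[0, 0] * S2[1, 1]),
          -Theta[0, 1] / np.sqrt(Theta[0, 0] * Theta[1, 1]))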


Example of graphs supporting MRF models

(a)  rectangular lattice with first-order neighborhood system;
(b)  non-regular planar graph associated with an image partition;
(c)  quad-tree

For each graph, the grey nodes are the neighbors of the white one


Samples from an anisotropic Gaussian MRF built on a regular lattice

In the Gaussian case, any conditional or marginal distribution is Gaussian

In particular, for each node i,

p(x_i / x_{V \setminus i}) = \mathcal{N}\Big( \mu_i - \frac{1}{\Theta_{ii}} \sum_{j \neq i} \Theta_{ij} (x_j - \mu_j), \; \frac{1}{\Theta_{ii}} \Big)

Estimation of a zero mean Gaussian graphical model

Main idea: Estimate the covariance matrix and infer the graph topology G = (V, E) from the
estimated precision matrix

\hat{\Theta} := \hat{\Sigma}^{-1}, \qquad \hat{\Theta}_{ij} = 0 \iff (i, j) \notin E


Observation model (K statistically independent snapshots):

p_X(x_1, \ldots, x_K) = \prod_{k=1}^{K} \frac{1}{\sqrt{(2\pi)^n |\Sigma|}} \, e^{-\frac{1}{2} x_k^T \Sigma^{-1} x_k}

\ln p_X(x_1, \ldots, x_K) = -\frac{nK}{2} \ln(2\pi) + \frac{K}{2} \ln|\Theta| - \frac{1}{2} \sum_{k=1}^{K} x_k^T \Theta x_k

Maximum Likelihood estimate (using \sum_{k=1}^{K} x_k^T \Theta x_k = K \, \mathrm{tr}(\Theta C)):

\hat{\Theta} = \arg\min_{\Theta \succ 0} \; \mathrm{tr}(\Theta C) - \ln|\Theta|

where C is the sample covariance matrix

C = \frac{1}{K} \sum_{k=1}^{K} x_k x_k^T

The unrestricted ML estimate is simply

\hat{\Theta}_{MLE} = C^{-1}

if the sample covariance matrix is invertible
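
A minimal Python sketch of the K > n case, reusing the earlier chain-graph precision matrix as an assumed ground truth:

    import numpy as np

    rng = np.random.default_rng(2)

    n, K = 5, 200
    Theta_true = (np.diag(2.0 * np.ones(n))
                  + np.diag(-0.9 * np.ones(n - 1), 1)
                  + np.diag(-0.9 * np.ones(n - 1), -1))
    X = rng.multivariate_normal(np.zeros(n), np.linalg.inv(Theta_true), size=K)  # K x n snapshots

    # Sample covariance C = (1/K) sum_k x_k x_k^T and unrestricted ML estimate of Theta
    C = (X.T @ X) / K
    Theta_mle = np.linalg.inv(C)   # well defined here since K > n

    print(np.round(Theta_mle, 2))  # close to Theta_true, but generally not exactly sparse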

If K < n, the sample covariance matrix is not invertible and the solution can be sought by
adding an \ell_1 penalty (graphical lasso):

\hat{\Theta} = \arg\min_{\Theta \succ 0} \; \mathrm{tr}(\Theta C) - \ln|\Theta| + \lambda \|\Theta\|_1
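
One possible way to solve this ℓ1-regularized problem in practice is scikit-learn's GraphicalLasso estimator; a minimal sketch on placeholder random data (the value of alpha, which plays the role of λ, is arbitrary):

    import numpy as np
    from sklearn.covariance import GraphicalLasso

    rng = np.random.default_rng(3)

    # Fewer snapshots than variables (K < n): the sample covariance is singular,
    # but the l1-penalized estimate is still well defined
    n, K = 20, 10
    X = rng.standard_normal((K, n))   # placeholder data, rows are snapshots

    model = GraphicalLasso(alpha=0.5).fit(X)
    Theta_hat = model.precision_

    # Estimated edges: off-diagonal nonzeros of the estimated precision matrix
    edges = [(i, j) for i in range(n) for j in range(i + 1, n)
             if abs(Theta_hat[i, j]) > 1e-8]
    print(len(edges))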


