
Probabilistic graphical models

Sergio Barbarossa

Signal Processing for Big Data


Markov network

A Markov network consists of:

•  An undirected graph G = (V, E), where each vertex v ∈ V represents a random variable
   and each edge {u, v} ∈ E represents a statistical dependency between the random
   variables u and v

•  A set of potential (or compatibility) functions ψ_k (also called factors or
   clique (*) potentials), where each ψ_k has as its domain the variables of some clique k
   in G. Each ψ_k is a mapping onto the non-negative real numbers

Defining property: A node p is conditionally independent of another node q in the Markov
network, given some set of nodes S, if every path from p to q passes through a node in S
________
(*) A maximal clique is a set of vertices that induces a complete subgraph and that is not a
subset of the vertices of any larger complete subgraph
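
As a quick illustration of the separation property above, here is a minimal Python sketch; the four-node graph and the use of networkx are purely illustrative assumptions:

    import networkx as nx

    # Toy undirected Markov network:  p - a - q  and  p - b - q
    G = nx.Graph()
    G.add_edges_from([("p", "a"), ("a", "q"), ("p", "b"), ("b", "q")])

    def separated(G, p, q, S):
        """True if every path from p to q passes through a node in S."""
        H = G.copy()
        H.remove_nodes_from(S)
        return not nx.has_path(H, p, q)

    print(separated(G, "p", "q", {"a"}))       # False: the path p-b-q avoids a
    print(separated(G, "p", "q", {"a", "b"}))  # True: every path is blocked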


Joint probability density function (pdf) over a graph

The joint pdf represented by a Markov network can be written as:

p_X(x) = \frac{1}{Z} \prod_{c \in C} \psi_c(x_c)     (1)

where x_c is the state of the random variables in the c-th clique, and the
normalizing constant Z (also called the partition function) is

Z = \sum_{x} \prod_{c \in C} \psi_c(x_c)
The normalized product constitutes a joint distribution that embodies all the
conditional independencies portrayed by the graph

Any positive Markov random field can be expressed in the form (1) (Hammersley–Clifford theorem)
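
To make the factorization (1) concrete, the following minimal Python sketch builds the joint distribution of a small chain-structured Markov network by brute force; the two binary clique potentials are invented purely for illustration:

    import itertools
    import numpy as np

    # Chain x1 - x2 - x3 with binary variables and two pairwise clique potentials
    psi_12 = np.array([[2.0, 1.0],
                       [1.0, 2.0]])   # psi_12[x1, x2]
    psi_23 = np.array([[1.0, 3.0],
                       [3.0, 1.0]])   # psi_23[x2, x3]

    # Unnormalized product of clique potentials, then the partition function Z
    p = {(x1, x2, x3): psi_12[x1, x2] * psi_23[x2, x3]
         for x1, x2, x3 in itertools.product([0, 1], repeat=3)}
    Z = sum(p.values())
    joint = {x: v / Z for x, v in p.items()}   # exactly the form (1)

    print(Z, joint[(0, 0, 0)])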


Example

Any density that factorizes according to the graph in the figure must have the form of a
product of potentials over its cliques, as in (1)



Bayes networks

Bayesian networks are directed acyclic graphs (DAG) whose nodes represent
random variables and whose arcs encode conditional independencies between
the variables

A DAG is a Bayesian network relative to a set of variables if the joint distribution of the
node values can be written as the product of the local distributions of each node given its
parents:

p_X(x_1, x_2, \ldots, x_n) = \prod_{i=1}^{n} p(x_i / \mathrm{parents}(x_i))

Given a DAG D and a probability distribution P, a necessary and sufficient condition for D
to be a Bayesian network of P is that each variable X_i be conditionally independent of all
its non-descendants, given its parents
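
As a minimal Python sketch of this factorization (the DAG is the one used in the example below, and all conditional probability tables are made-up numbers):

    # DAG:  a -> b,  a -> c,  c -> e,  d has no parents
    p_a = {0: 0.6, 1: 0.4}
    p_d = {0: 0.7, 1: 0.3}
    p_b_a = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}   # p_b_a[a][b] = p(b/a)
    p_c_a = {0: {0: 0.5, 1: 0.5}, 1: {0: 0.3, 1: 0.7}}
    p_e_c = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.4, 1: 0.6}}

    def joint(a, b, c, d, e):
        # Product of the local distributions of each node given its parents
        return p_a[a] * p_d[d] * p_b_a[a][b] * p_c_a[a][c] * p_e_c[c][e]

    # The product sums to 1 over all configurations: no normalization constant is needed
    total = sum(joint(a, b, c, d, e)
                for a in (0, 1) for b in (0, 1) for c in (0, 1)
                for d in (0, 1) for e in (0, 1))
    print(total)   # 1.0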


Two sets of nodes X and Y are d-separated by a third set Z if the corresponding variable
sets X and Y are statistically independent given the variables in Z

The minimal set of nodes which d-separates node X from all other nodes is given by
X's Markov blanket

Example (DAG over the nodes a, b, c, d, e):

p(a, b, c, d, e) = p(a) \, p(d) \, p(b/a) \, p(c/a) \, p(e/c)
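
Recall that in a DAG the Markov blanket of a node consists of its parents, its children, and its children's other parents. A minimal networkx sketch for the example DAG above:

    import networkx as nx

    # The example DAG:  a -> b,  a -> c,  c -> e,  d isolated
    D = nx.DiGraph()
    D.add_edges_from([("a", "b"), ("a", "c"), ("c", "e")])
    D.add_node("d")

    def markov_blanket(D, v):
        """Parents, children, and the children's other parents of node v."""
        parents = set(D.predecessors(v))
        children = set(D.successors(v))
        coparents = {p for ch in children for p in D.predecessors(ch)} - {v}
        return parents | children | coparents

    print(markov_blanket(D, "c"))   # {'a', 'e'}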



Markov random field and Gibbs distribution

Joint probability distribution

p_X(x) = \frac{1}{Z} \prod_{c \in C} \psi_c(x_c)

where C is the set of cliques

If the product is positive for any x, then, introducing the functions V_c(x_c) := -\log \psi_c(x_c),
the pdf can be written in exponential form

p_X(x) = \frac{1}{Z} \exp\Big\{ -\sum_{c \in C} V_c(x_c) \Big\}

This is known, in physics, as the Gibbs (or Boltzmann) distribution with interaction
potentials V_c and energy

U_X(x) = \sum_{c \in C} V_c(x_c)
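
Continuing the earlier chain example (same made-up potentials), a short Python check that the energy-based Gibbs form reproduces the product-of-potentials form:

    import itertools
    import numpy as np

    psi_12 = np.array([[2.0, 1.0], [1.0, 2.0]])
    psi_23 = np.array([[1.0, 3.0], [3.0, 1.0]])

    # Clique energies V_c = -log(psi_c); the Gibbs form uses their sum U
    V_12, V_23 = -np.log(psi_12), -np.log(psi_23)

    states = list(itertools.product([0, 1], repeat=3))
    U = {(x1, x2, x3): V_12[x1, x2] + V_23[x2, x3] for x1, x2, x3 in states}
    Z = sum(np.exp(-u) for u in U.values())
    gibbs = {x: np.exp(-U[x]) / Z for x in states}

    # Direct product-of-potentials construction gives the same distribution
    prod = {(x1, x2, x3): psi_12[x1, x2] * psi_23[x2, x3] for x1, x2, x3 in states}
    Zp = sum(prod.values())
    print(max(abs(gibbs[x] - prod[x] / Zp) for x in states))   # ~0.0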


Key information represented by the graph

If i and j are not neighbors, x_i and x_j are independent, conditioned on all the other variables

Given a subset a of vertices, p_X(x) factorizes as

p_X(x) = \frac{1}{Z} \Big[ \prod_{c \in C:\, c \cap a \neq \emptyset} \psi_c(x_c) \Big] \Big[ \prod_{c \in C:\, c \cap a = \emptyset} \psi_c(x_c) \Big]

where the second factor does not depend on x_a

As a consequence

p(x_a / x_{V \setminus a}) = p(x_a / x_{\partial a})

i.e., the conditional pdf of x_a given all the other variables depends only on the neighbors ∂a of a


Example: Gaussian Markov Random Field

p_X(x) = \frac{1}{\sqrt{(2\pi)^n |\Sigma|}} \, e^{-\frac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu)}

The "Markovianity" shows up when each variable only interacts with a few others through
the quadratic energy: the precision matrix

\Theta := \Sigma^{-1}

must be sparse

A random vector X = (X_1, \ldots, X_n) is Markov with respect to a graph G if, for all
vertex cutsets S breaking the graph into disjoint pieces A and B, the conditional
independence statement X_A ⊥ X_B / X_S holds
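
A minimal numerical sketch of a zero-mean GMRF with a sparse precision matrix (the chain graph and the coefficient values are arbitrary choices), including one standard way to draw a sample through a Cholesky factor of Θ:

    import numpy as np

    rng = np.random.default_rng(0)

    # Chain-graph GMRF on n nodes: tridiagonal (hence sparse) precision matrix Theta
    n = 5
    Theta = (np.diag(2.0 * np.ones(n))
             + np.diag(-0.9 * np.ones(n - 1), 1)
             + np.diag(-0.9 * np.ones(n - 1), -1))
    Sigma = np.linalg.inv(Theta)

    # Sample x ~ N(0, Theta^{-1}) using the Cholesky factor of the precision matrix
    L = np.linalg.cholesky(Theta)                     # Theta = L L^T
    x = np.linalg.solve(L.T, rng.standard_normal(n))  # Cov(x) = Theta^{-1}

    # Sigma is dense, but the zero pattern of Theta encodes the missing edges of the chain
    print(np.round(Theta, 2))
    print(np.round(Sigma, 2))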


In a GMRF, nodes i and j are neighbors if and only if the entries of the inverse covariance
matrix satisfy \Theta_{ij} = (\Sigma^{-1})_{ij} \neq 0

A Gauss-Markov field may be simply defined by its quadratic energy function

U_X(x) = \frac{1}{2} x^T \Theta x - b^T x

where \Theta is a sparse symmetric positive definite matrix with \Theta_{ij} = 0 whenever {i, j} ∉ E

The most probable configuration is then the solution of the sparse system

\Theta x = b
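
A minimal Python sketch of this last step, using the same hypothetical chain-graph precision matrix stored in sparse form and an arbitrary linear term b:

    import numpy as np
    from scipy.sparse import diags
    from scipy.sparse.linalg import spsolve

    n = 5
    Theta = diags([-0.9 * np.ones(n - 1), 2.0 * np.ones(n), -0.9 * np.ones(n - 1)],
                  offsets=[-1, 0, 1], format="csc")
    b = np.ones(n)   # linear term of the quadratic energy (arbitrary here)

    # Most probable configuration: solution of the sparse system Theta x = b
    x_map = spsolve(Theta, b)
    print(x_map)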


Prop. 1: Each diagonal entry of the inverse covariance matrix is the reciprocal of a
partial variance

Prop. 2: Each off-diagonal entry of the inverse covariance matrix, scaled to have a unit
diagonal, is the negative of the partial correlation coefficient between the two
corresponding variables, partialled on all the remaining variables
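
A small numerical check of both properties via the Gaussian conditioning (Schur complement) formula; the covariance matrix below is randomly generated just for illustration:

    import numpy as np

    rng = np.random.default_rng(1)

    A = rng.standard_normal((4, 4))
    Sigma = A @ A.T + 4 * np.eye(4)    # random symmetric positive definite covariance
    Theta = np.linalg.inv(Sigma)

    def conditional_cov(Sigma, keep, given):
        """Covariance of X_keep given X_given (Gaussian conditioning formula)."""
        S_kk = Sigma[np.ix_(keep, keep)]
        S_kg = Sigma[np.ix_(keep, given)]
        S_gg = Sigma[np.ix_(given, given)]
        return S_kk - S_kg @ np.linalg.inv(S_gg) @ S_kg.T

    # Prop. 1: partial variance of X_0 given the rest equals 1 / Theta[0, 0]
    print(conditional_cov(Sigma, [0], [1, 2, 3])[0, 0], 1.0 / Theta[0, 0])

    # Prop. 2: partial correlation of (X_0, X_1) given the rest
    #          equals -Theta[0, 1] / sqrt(Theta[0, 0] * Theta[1, 1])
    S2 = conditional_cov(Sigma, [0, 1], [2, 3])
    print(S2[0, 1] / np.sqrt(S2[0, 0] * S2[1, 1]),
          -Theta[0, 1] / np.sqrt(Theta[0, 0] * Theta[1, 1]))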


Example of graphs supporting MRF models

(a)  rectangular lattice with first-order neighborhood system;
(b)  non-regular planar graph associated with an image partition;
(c)  quad-tree

For each graph, the grey nodes are the neighbors of the white one


Samples from an anisotropic Gaussian MRF built on a regular lattice

In the Gaussian case, any conditional or marginal distribution is Gaussian

In particular, for each node i,

p(x_i / x_{V \setminus i}) = \mathcal{N}\Big( \mu_i - \frac{1}{\Theta_{ii}} \sum_{j \neq i} \Theta_{ij} (x_j - \mu_j), \; \frac{1}{\Theta_{ii}} \Big)

Estimation of a zero mean Gaussian graphical model

Main idea: Estimate the covariance matrix and infer the graph topology G = (V, E) from the
estimated precision matrix

\hat{\Theta} := \hat{\Sigma}^{-1}, \qquad \hat{\Theta}_{ij} = 0 \iff (i, j) \notin E


Observation model (K statistically independent snapshots):

p_X(x_1, \ldots, x_K) = \prod_{k=1}^{K} \frac{1}{\sqrt{(2\pi)^n |\Sigma|}} \, e^{-\frac{1}{2} x_k^T \Sigma^{-1} x_k}

\ln p_X(x_1, \ldots, x_K) = -\frac{nK}{2} \ln(2\pi) + \frac{K}{2} \ln|\Theta| - \frac{1}{2} \sum_{k=1}^{K} x_k^T \Theta x_k

Maximum Likelihood estimate (using \sum_{k=1}^{K} x_k^T \Theta x_k = K \, \mathrm{tr}(\Theta C)):

\hat{\Theta} = \arg\min_{\Theta \succ 0} \; \mathrm{tr}(\Theta C) - \ln|\Theta|

where C is the sample covariance matrix

C = \frac{1}{K} \sum_{k=1}^{K} x_k x_k^T

The unrestricted ML estimate is simply

\hat{\Theta}_{MLE} = C^{-1}

if the sample covariance matrix is invertible
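
A minimal Python sketch of the K > n case, reusing the earlier chain-graph precision matrix as an assumed ground truth:

    import numpy as np

    rng = np.random.default_rng(2)

    n, K = 5, 200
    Theta_true = (np.diag(2.0 * np.ones(n))
                  + np.diag(-0.9 * np.ones(n - 1), 1)
                  + np.diag(-0.9 * np.ones(n - 1), -1))
    X = rng.multivariate_normal(np.zeros(n), np.linalg.inv(Theta_true), size=K)  # K x n snapshots

    # Sample covariance C = (1/K) sum_k x_k x_k^T and unrestricted ML estimate of Theta
    C = (X.T @ X) / K
    Theta_mle = np.linalg.inv(C)   # well defined here since K > n

    print(np.round(Theta_mle, 2))  # close to Theta_true, but generally not exactly sparse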

If K < n, the sample covariance matrix is not invertible and the solution can be sought by
adding an \ell_1 penalty (graphical lasso):

\hat{\Theta} = \arg\min_{\Theta \succ 0} \; \mathrm{tr}(\Theta C) - \ln|\Theta| + \lambda \|\Theta\|_1
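
One possible way to solve this ℓ1-regularized problem in practice is scikit-learn's GraphicalLasso estimator; a minimal sketch on placeholder random data (the value of alpha, which plays the role of λ, is arbitrary):

    import numpy as np
    from sklearn.covariance import GraphicalLasso

    rng = np.random.default_rng(3)

    # Fewer snapshots than variables (K < n): the sample covariance is singular,
    # but the l1-penalized estimate is still well defined
    n, K = 20, 10
    X = rng.standard_normal((K, n))   # placeholder data, rows are snapshots

    model = GraphicalLasso(alpha=0.5).fit(X)
    Theta_hat = model.precision_

    # Estimated edges: off-diagonal nonzeros of the estimated precision matrix
    edges = [(i, j) for i in range(n) for j in range(i + 1, n)
             if abs(Theta_hat[i, j]) > 1e-8]
    print(len(edges))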


