9.1 Undirected Graphs
Sargur Srihari
srihari@cedar.buffalo.edu
Topics
• Directed versus Undirected graphical models
• Components of a Markov Network
• Independence Properties
• Parameterization
• Gibbs Distributions and Markov Networks
• Reduced Markov Networks
Directed vs Undirected
• Bayesian Networks = Directed Graphical Models
– Useful in many real-world domains
• Undirected Graphical Models
– Used when no natural directionality exists between variables
– Offer a simpler perspective on directed graphs
• Independence structure
• Inference task
– Also called Markov Networks or MRFs
• Can combine directed and undirected graphs
– Unlike in the previous discussion of BNs, attention here is restricted to discrete state spaces
Drawbacks of BN in Example
[Figure: first and second attempts at a Bayesian network over A (Alice), B (Bob), C (Charles), D (Debbie) for the misconception example; panels (a), (b), (c)]
• Independences imposed by a BN are inappropriate for this distribution
• Interactions between the variables are symmetric
[Figure: the four students Alice, Bob, Charles and Debbie and their pairwise interactions]
X ∈ {A, B, C, D}
x1: the student has the misconception
x0: the student does not have the misconception
Example of a Factor
[Figure: the four students; factor Φ1(A,B) relates Alice and Bob]
• A factor is not normalized to sum to one
• Values need not be in [0,1]
• Φ1(A,B) asserts that A and B (Alice and Bob) are more likely to agree than to disagree
– b1 = has misconception, b0 = no misconception
[Figure: Markov network for the example — a loop with edges Alice–Bob, Bob–Charles, Charles–Debbie, Debbie–Alice]
Answering Queries
• We can obtain any desired probability from the joint distribution as usual (see the sketch below)
• P(b0) = 0.268: Bob is only 27% likely to be free of the misconception
• P(b1|c0) = 0.06: if Charles does not have the misconception, Bob is only 6% likely to have it
• Most probable joint assignment (from the table): P(a0,b1,c1,d0) = 0.69
– Alice and Debbie have no misconception; Bob and Charles have it
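A minimal sketch of answering these queries by brute-force enumeration. The pairwise factor values below are an assumption taken from the standard textbook version of this example (the slides do not show them), so the computed marginal may differ slightly in the last digit from the value quoted above.

```python
import itertools

# Assumed pairwise factor values (standard textbook version of this
# example; not shown on the slides).
phi1 = {(0, 0): 30, (0, 1): 5, (1, 0): 1, (1, 1): 10}     # phi1(A,B)
phi2 = {(0, 0): 100, (0, 1): 1, (1, 0): 1, (1, 1): 100}   # phi2(B,C)
phi3 = {(0, 0): 1, (0, 1): 100, (1, 0): 100, (1, 1): 1}   # phi3(C,D)
phi4 = {(0, 0): 100, (0, 1): 1, (1, 0): 1, (1, 1): 100}   # phi4(D,A)

# Unnormalized measure over all 2^4 assignments, then normalize.
weights = {}
for a, b, c, d in itertools.product([0, 1], repeat=4):
    weights[(a, b, c, d)] = phi1[a, b] * phi2[b, c] * phi3[c, d] * phi4[d, a]
Z = sum(weights.values())                       # partition function
joint = {x: w / Z for x, w in weights.items()}

p_b0 = sum(p for (a, b, c, d), p in joint.items() if b == 0)
p_b1_given_c0 = (sum(p for (a, b, c, d), p in joint.items() if b == 1 and c == 0)
                 / sum(p for (a, b, c, d), p in joint.items() if c == 0))
mode = max(joint, key=joint.get)
print(p_b0, p_b1_given_c0, mode, joint[mode])   # ~0.26, ~0.06, (0,1,1,0), ~0.69
```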
Benefit of MN representation
• Flexibility in representing interactions between variables
• E.g., to change the interaction between A and B, simply modify the entries in that factor
• The flip side of this flexibility is that the effects of such changes are not always intuitive
Parameterization
• Need to associate the graph with parameters
• Not as intuitive as CPDs in Bayesian networks
– Factors do not correspond to probabilities or CPDs
• Not intuitively understandable
– Hard to elicit from people
– Significantly harder to estimate from data
• As seen with algorithms to learn MNs
Factors
• A factor describes compatibilities between different values of the variables in its scope
• A graph can be parameterized by associating a set of factors with it
• One idea: associate a factor with each edge
– This is insufficient for a full distribution, as shown in the example next
Pairwise Factors
• Associating parameters with each edge is insufficient to specify an arbitrary joint distribution
• Consider a fully connected graph
– There are no independence assumptions
• If all variables are binary
– Each factor over an edge has 4 parameters
– Total number of parameters is 4·C(n,2) = 2n(n−1), quadratic in n
– An arbitrary distribution needs 2^n − 1 parameters
• Pairwise factors have too few parameters (see the sketch below)
– Parameters cannot be associated just with edges
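A quick check of this parameter gap: counting pairwise-factor parameters on a fully connected graph versus the parameters of an arbitrary joint over n binary variables.

```python
from math import comb

# Pairwise: one 4-entry factor per edge of the complete graph on n nodes.
# Full joint over n binary variables: 2^n outcomes, minus 1 for normalization.
for n in [4, 10, 20]:
    pairwise = 4 * comb(n, 2)   # quadratic in n
    full = 2**n - 1             # exponential in n
    print(n, pairwise, full)
# n=20: 760 pairwise parameters vs 1,048,575 for the full joint --
# pairwise factors cannot represent an arbitrary distribution.
```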
Factor Product
Ψ : Val(X,Y,Z) → R
Ψ(X,Y,Z) = Φ1(X,Y)·Φ2(Y,Z)
Two factors are multiplied in a way that matches up the common part Y (see the sketch below). With X = A, Y = B, Z = C:

Φ1(A,B):
a1 b1  0.5
a1 b2  0.8
a2 b1  0.1
a2 b2  0
a3 b1  0.3
a3 b2  0.9

Φ2(B,C):
b1 c1  0.5
b1 c2  0.7
b2 c1  0.1
b2 c2  0.2

Ψ(A,B,C):
a1 b1 c1  0.5·0.5 = 0.25
a1 b1 c2  0.5·0.7 = 0.35
a1 b2 c1  0.8·0.1 = 0.08
a1 b2 c2  0.8·0.2 = 0.16
a2 b1 c1  0.1·0.5 = 0.05
a2 b1 c2  0.1·0.7 = 0.07
a2 b2 c1  0·0.1 = 0
a2 b2 c2  0·0.2 = 0
a3 b1 c1  0.3·0.5 = 0.15
a3 b1 c2  0.3·0.7 = 0.21
a3 b2 c1  0.9·0.1 = 0.09
a3 b2 c2  0.9·0.2 = 0.18
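A minimal sketch of the factor product operation, with factors stored as dicts keyed by assignments to their scopes (an assumed representation, not from the slides); it reproduces entries of the Ψ table above.

```python
import itertools

def factor_product(phi1, scope1, phi2, scope2):
    """Multiply two factors, matching assignments on shared variables."""
    scope = scope1 + [v for v in scope2 if v not in scope1]
    # Collect the possible values of each variable from the factor keys.
    vals = {v: set() for v in scope}
    for key in phi1:
        for v, x in zip(scope1, key):
            vals[v].add(x)
    for key in phi2:
        for v, x in zip(scope2, key):
            vals[v].add(x)
    psi = {}
    for assign in itertools.product(*(sorted(vals[v]) for v in scope)):
        a = dict(zip(scope, assign))
        k1 = tuple(a[v] for v in scope1)
        k2 = tuple(a[v] for v in scope2)
        psi[assign] = phi1[k1] * phi2[k2]   # match on the shared variable(s)
    return scope, psi

# The tables above:
phi1 = {('a1','b1'): 0.5, ('a1','b2'): 0.8, ('a2','b1'): 0.1,
        ('a2','b2'): 0.0, ('a3','b1'): 0.3, ('a3','b2'): 0.9}
phi2 = {('b1','c1'): 0.5, ('b1','c2'): 0.7, ('b2','c1'): 0.1, ('b2','c2'): 0.2}
scope, psi = factor_product(phi1, ['A', 'B'], phi2, ['B', 'C'])
print(psi[('a1','b2','c2')])   # 0.8*0.2 = 0.16, matching the table
```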
Gibbs Distributions
• Generalize the idea of factor product to define an undirected parameterization of a distribution
• A distribution PΦ is a Gibbs distribution parameterized by a set of factors Φ = {φ1(D1), ..., φK(DK)} if
  PΦ(X1,...,Xn) = (1/Z) P̃(X1,...,Xn)
  where
  P̃(X1,...,Xn) = ∏_{i=1}^{K} φi(Di)
  is an unnormalized measure, and
  Z = Σ_{X1,...,Xn} P̃(X1,...,Xn)
  is a normalizing constant called the partition function
• The factors do not necessarily represent the marginal distributions of the variables in their scopes (see the sketch below)
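A sketch of this definition in code for binary variables, assuming factors stored as tables keyed by value tuples (illustrative names, not from the slides): the unnormalized measure multiplies all factor values at each assignment, and Z sums it out.

```python
import itertools

def gibbs(variables, factors):
    """factors: list of (scope, table), table keyed by value tuples.
    Returns the normalized Gibbs distribution over full assignments."""
    unnorm = {}
    for assign in itertools.product([0, 1], repeat=len(variables)):
        a = dict(zip(variables, assign))
        p = 1.0
        for scope, table in factors:
            p *= table[tuple(a[v] for v in scope)]   # phi_i(D_i)
        unnorm[assign] = p
    Z = sum(unnorm.values())                         # partition function
    return {x: p / Z for x, p in unnorm.items()}

# Tiny example: variables A, B with a pairwise and a single-variable factor.
factors = [(('A', 'B'), {(0, 0): 10, (0, 1): 1, (1, 0): 1, (1, 1): 10}),
           (('B',),     {(0,): 3, (1,): 1})]
P = gibbs(('A', 'B'), factors)
print(sum(P.values()))   # 1.0 -- normalized
```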
Influence of Factors
• Factors do not represent marginal probabilities of the variables within their scope (as we will see)
• A factor is only one contribution to the overall joint distribution
• The distribution as a whole has to take into account contributions from all the factors involved
• Example given next
Example: marginal distribution over A and B in the misconception network
a0 b0  0.13
a0 b1  0.69
a1 b0  0.14
a1 b1  0.04
Although Φ1(A,B) favors agreement between Alice and Bob, most of the mass falls on disagreement: the Bob–Charles factor strongly favors agreement while the Charles–Debbie factor strongly favors disagreement, and these stronger influences dominate.
Factors as Cliques
• Potentials are functions of the cliques of the graph
• The set of variables in clique C is denoted x_C
• The joint distribution is written as a product of potential functions (see the sketch below)
  p(x) = (1/Z) ∏_C ψ_C(x_C)
• where Z, the partition function, is a normalization constant
  Z = Σ_x ∏_C ψ_C(x_C)
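A short sketch of clique parameterization. It uses networkx to find maximal cliques (an assumed helper library, not mentioned on the slides) and attaches one placeholder potential table per clique; Z then sums the product of potentials over all assignments.

```python
import itertools
import networkx as nx   # assumed helper library, not named on the slides

# A small graph: a triangle {A,B,C} plus an edge C-D.
G = nx.Graph([('A', 'B'), ('B', 'C'), ('A', 'C'), ('C', 'D')])
cliques = [tuple(sorted(c)) for c in nx.find_cliques(G)]
print(cliques)   # maximal cliques: ('A','B','C') and ('C','D')

# One potential table per maximal clique (uniform placeholder values);
# p(x) = (1/Z) * prod_C psi_C(x_C) for binary variables.
psi = {c: {v: 1.0 for v in itertools.product([0, 1], repeat=len(c))}
       for c in cliques}
Z = 0.0
for assign in itertools.product([0, 1], repeat=len(G.nodes)):
    a = dict(zip(sorted(G.nodes), assign))
    w = 1.0
    for c in cliques:
        w *= psi[c][tuple(a[v] for v in c)]
    Z += w
print(Z)   # with all-ones potentials, Z = 2^4 = 16
```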