L4: BN Semantics 2


ECE 6504: Advanced Topics in Machine Learning
Probabilistic Graphical Models and Large-Scale Learning

Topics:
–  Bayes Nets: Representation/Semantics
–  d-separation, Local Markov Assumption
–  Markov Blanket
–  I-equivalence, (Minimal) I-Maps, P-Maps

Readings: KF 3.2, 3.4


Dhruv Batra
Virginia Tech
Recap of Last Time



A general Bayes net
•  Set of random variables

•  Directed acyclic graph
–  Encodes independence assumptions

•  CPTs
–  Conditional Probability Tables P(Xi | PaXi)

•  Joint distribution: P(X1, …, Xn) = ∏i P(Xi | PaXi)  (see the sketch below)

[Figure: example BN with Flu, Allergy → Sinus → Headache, Nose]

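A minimal sketch of this factorization in Python, on the Flu/Allergy/Sinus example above. The CPT numbers are invented for illustration, not taken from the lecture; all variables are binary.

```python
# Joint distribution of the Flu/Allergy/Sinus network as a product of CPTs.
# All CPT numbers here are invented for illustration; variables are binary.

parents = {
    "Flu": [], "Allergy": [],
    "Sinus": ["Flu", "Allergy"],
    "Headache": ["Sinus"], "Nose": ["Sinus"],
}

# Each CPT maps a tuple of parent values to P(X = 1 | parents).
cpt = {
    "Flu":      {(): 0.10},
    "Allergy":  {(): 0.20},
    "Sinus":    {(0, 0): 0.05, (0, 1): 0.60, (1, 0): 0.70, (1, 1): 0.90},
    "Headache": {(0,): 0.10, (1,): 0.80},
    "Nose":     {(0,): 0.10, (1,): 0.70},
}

def joint(assignment):
    """P(x1, ..., xn) = product over i of P(xi | pa(xi))."""
    p = 1.0
    for var, pa in parents.items():
        p1 = cpt[var][tuple(assignment[u] for u in pa)]
        p *= p1 if assignment[var] == 1 else 1.0 - p1
    return p

print(joint({"Flu": 1, "Allergy": 0, "Sinus": 1, "Headache": 1, "Nose": 0}))
# 0.10 * 0.80 * 0.70 * 0.80 * 0.30 = 0.01344
```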


Independencies in Problem
•  World, data, reality: the true distribution P contains independence assertions
•  BN: the graph G encodes local independence assumptions

Slide Credit: Carlos Guestrin


Bayes Nets
•  BNs encode (conditional) independence assumptions
–  I(G) = {X indep of Y given Z}

•  Which ones?
•  And how can we easily read them?



Local Structures
•  What’s the smallest Bayes Net?



Local Structures

Indirect causal effect:       X → Z → Y

Indirect evidential effect:   X ← Z ← Y

Common cause:                 X ← Z → Y

Common effect (v-structure):  X → Z ← Y

(A numeric check of the common-effect case follows below.)
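The common-effect case is the one that trips people up, so here is a small numeric check (with made-up CPTs): in X → Z ← Y, X and Y are marginally independent, but observing Z makes them dependent ("explaining away").

```python
# Common effect X -> Z <- Y with made-up CPTs: X and Y are marginally
# independent, but become dependent once Z is observed ("explaining away").
import itertools

px, py = 0.3, 0.6                                            # P(X=1), P(Y=1)
pz = {(0, 0): 0.1, (0, 1): 0.7, (1, 0): 0.8, (1, 1): 0.95}   # P(Z=1 | X, Y)

def joint(x, y, z):
    p = (px if x else 1 - px) * (py if y else 1 - py)
    return p * (pz[(x, y)] if z else 1 - pz[(x, y)])

# Marginally: P(X=1, Y=1) equals P(X=1) * P(Y=1), so X ⊥ Y holds.
pxy = sum(joint(1, 1, z) for z in (0, 1))
print(abs(pxy - px * py) < 1e-12)        # True

# Given Z=1: P(X=1, Y=1 | Z=1) != P(X=1 | Z=1) * P(Y=1 | Z=1).
pz1 = sum(joint(x, y, 1) for x, y in itertools.product((0, 1), repeat=2))
pxy_z = joint(1, 1, 1) / pz1
px_z = sum(joint(1, y, 1) for y in (0, 1)) / pz1
py_z = sum(joint(x, 1, 1) for x in (0, 1)) / pz1
print(abs(pxy_z - px_z * py_z) < 1e-12)  # False: dependent given Z
```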


Bayes Ball Rules
•  Flow of information
–  on board



Plan for today
•  Bayesian Networks: Semantics
–  d-separation
–  General (conditional) independence assumptions in a BN
–  Markov Blanket
–  (Minimal) I-map, P-map



Active trails formalized
•  Let variables O ⊆ {X1,…,Xn} be observed

•  A path X1 – X2 – ⋯ – Xk is an active trail if for each consecutive triplet:
–  Xi-1 → Xi → Xi+1, and Xi is not observed (Xi ∉ O)
–  Xi-1 ← Xi ← Xi+1, and Xi is not observed (Xi ∉ O)
–  Xi-1 ← Xi → Xi+1, and Xi is not observed (Xi ∉ O)
–  Xi-1 → Xi ← Xi+1, and Xi is observed (Xi ∈ O), or one of its descendants is observed

(A direct code sketch of this test follows below.)

Slide Credit: Carlos Guestrin
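A direct translation of the four triplet rules into code, as a sketch (the function and variable names are my own):

```python
# Sketch: test whether a given path is an active trail, by checking the four
# triplet rules above. Function and variable names are my own.

def descendants(node, children):
    """All descendants of `node`, given a child-adjacency dict."""
    out, stack = set(), list(children.get(node, []))
    while stack:
        v = stack.pop()
        if v not in out:
            out.add(v)
            stack.extend(children.get(v, []))
    return out

def is_active_trail(path, edges, observed):
    """path: list of nodes; edges: set of directed (u, v) pairs; observed: set."""
    children = {}
    for u, v in edges:
        children.setdefault(u, []).append(v)
    for a, b, c in zip(path, path[1:], path[2:]):
        if (a, b) in edges and (c, b) in edges:      # v-structure a -> b <- c
            # active only if b, or one of b's descendants, is observed
            if b not in observed and not (descendants(b, children) & observed):
                return False
        else:                                        # chain or common cause
            if b in observed:                        # observing b blocks it
                return False
    return True

edges = {("X", "Z"), ("Z", "Y")}                       # X -> Z -> Y
print(is_active_trail(["X", "Z", "Y"], edges, set()))  # True
print(is_active_trail(["X", "Z", "Y"], edges, {"Z"}))  # False: Z blocks the chain
```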


An active trail – Example

[Figure: example BN over nodes A, B, C, D, E, F, F′, F″, G, H]

When are A and H independent?


d-Separation
•  Definition: Variables X and Y are d-separated given Z if
–  there is no active trail between any Xi ∈ X and Yj ∈ Y when the variables in Z ⊆ {X1,…,Xn} are observed
–  (implementation sketch below)

[Figure: example BN over nodes A–K]

Slide Credit: Carlos Guestrin
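As an implementation sketch (not the lecture's algorithm), d-separation can be tested via a standard equivalent criterion: X and Y are d-separated given Z iff they are disconnected in the moralized ancestral graph of X ∪ Y ∪ Z.

```python
# Sketch of a d-separation test via a standard equivalent criterion: X and Y
# are d-separated given Z iff X and Y are disconnected in the moralized
# ancestral graph of X ∪ Y ∪ Z. Helper names are my own.

def d_separated(X, Y, Z, parents):
    """X, Y, Z: sets of nodes; parents: dict node -> list of parents."""
    # 1. Keep only X ∪ Y ∪ Z and their ancestors.
    keep, stack = set(), list(X | Y | Z)
    while stack:
        v = stack.pop()
        if v not in keep:
            keep.add(v)
            stack.extend(parents.get(v, []))
    # 2. Moralize: marry co-parents, then drop edge directions.
    adj = {v: set() for v in keep}
    for v in keep:
        ps = parents.get(v, [])
        for p in ps:
            adj[v].add(p); adj[p].add(v)
        for i in range(len(ps)):
            for j in range(i + 1, len(ps)):
                adj[ps[i]].add(ps[j]); adj[ps[j]].add(ps[i])
    # 3. Delete observed nodes; check whether Y is reachable from X.
    reached, stack = set(), [v for v in X if v not in Z]
    while stack:
        v = stack.pop()
        if v not in reached:
            reached.add(v)
            stack.extend(u for u in adj[v] if u not in Z)
    return not (reached & Y)

parents = {"Sinus": ["Flu", "Allergy"], "Headache": ["Sinus"], "Nose": ["Sinus"]}
print(d_separated({"Flu"}, {"Allergy"}, set(), parents))      # True: v-structure blocked
print(d_separated({"Flu"}, {"Allergy"}, {"Sinus"}, parents))  # False: explaining away
```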


d-Separation
•  So what if X and Y are d-separated given Z?



Factorization + d-sep ⇒ Independence
•  Theorem:
–  If
•  P factorizes over G
•  d-sepG(X, Y | Z)
–  Then
•  P ⊨ (X ⊥ Y | Z)

–  Corollary:
•  I(G) ⊆ I(P)

•  All independence assertions read from G are correct!



More generally: Completeness of d-separation
•  Theorem: Completeness of d-separation
–  For “almost all” distributions P that factorize over G
–  we have that I(G) = I(P)
•  “almost all” distributions: except for a set of CPTs of measure zero
•  Means that if X and Y are not d-separated given Z, then P ⊭ (X ⊥ Y | Z)

Slide Credit: Carlos Guestrin


Local Markov Assumption

A variable Xi is independent of
its non-descendants given its
parents and only its parents
(Xi ⊥ NonDescendantsXi | PaXi)

[Figure: the Flu, Allergy → Sinus → Headache, Nose example]


Markov Blanket

Markov Blanket of a variable = its parents,
its children, and the parents of its children (co-parents)

[Figure: the Markov Blanket of variable x8 in a larger network]

Slide Credit: Simon J.D. Prince


Example

A variable is conditionally independent of all others,
given its Markov Blanket (see the code sketch below)

Slide Credit: Simon J.D. Prince
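A sketch of reading a Markov Blanket off the graph, using the same parent-dict format as the earlier sketches (helper names are my own):

```python
# Sketch: read a Markov Blanket off the graph — parents, children, and the
# children's other parents (co-parents). Same parent-dict format as above.

def markov_blanket(node, parents):
    blanket = set(parents.get(node, []))                # parents
    for child, ps in parents.items():
        if node in ps:
            blanket.add(child)                          # children
            blanket.update(p for p in ps if p != node)  # co-parents
    return blanket

parents = {"Sinus": ["Flu", "Allergy"], "Headache": ["Sinus"], "Nose": ["Sinus"]}
print(markov_blanket("Flu", parents))  # {'Sinus', 'Allergy'}
```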


I-map
•  Independency map

•  Definition:
–  G is an I-map of P if I(G) ⊆ I(P)



Factorization + d-sep ⇒ Independence
•  Theorem:
–  If
•  P factorizes over G
•  d-sepG(X, Y | Z)
–  Then
•  P ⊨ (X ⊥ Y | Z)

–  Corollary:
•  I(G) ⊆ I(P)
•  G is an I-map of P
•  All independence assertions read from G are correct!



The BN Representation Theorem

If G is an I-map of P, obtain: P factorizes according to G
Important because:
Every P has at least one BN structure G
Homework 1!!!! :)

If P factorizes according to G, obtain: G is an I-map of P
Important because:
We can read independencies of P from the BN structure G
Slide Credit: Carlos Guestrin
I-Equivalence
•  Two graphs G1 and G2 are I-equivalent if
–  I(G1) = I(G2)
–  e.g., X → Y and X ← Y are I-equivalent: both encode the empty set of independence assertions

•  I-equivalence partitions BN structures into equivalence classes
–  a mutually-exclusive and exhaustive partition of graphs



Minimal I-maps & P-maps
•  Many possible I-maps
•  Is there a “simplest” I-map?

•  Yes, two directions:
–  Minimal I-maps
–  P-maps



Minimal I-map
•  G is a minimal I-map for P if
–  deleting any edge from G makes it no longer an I-map of P



P-map
•  Perfect map

•  G is a P-map for P if
–  I(P) = I(G)

•  Question: Does every distribution P have a P-map?



BN: Representation: What you need to know
•  Bayesian networks
–  A compact representation for large probability distributions
–  Not an algorithm

•  Representation
–  BNs represent (conditional) independence assumptions
–  BN structure = family of distributions
–  BN structure + CPTs = 1 single distribution
–  Concepts
•  Active Trails (flow of information); d-separation;
•  Local Markov Assumptions, Markov Blanket
•  I-map, P-map
•  BN Representation Theorem (I-map ⇔ Factorization)



Main Issues in PGMs
•  Representation
–  How do we store P(X1, X2, …, Xn)?
–  What does my model mean/imply/assume? (Semantics)

•  Learning
–  How do we learn the parameters and structure of P(X1, X2, …, Xn) from data?
–  Which model is right for my data?

•  Inference
–  How do we answer questions/queries with our model? For example:
–  Marginal Estimation: P(X5 | X1, X4)  (brute-force sketch below)
–  Most Probable Explanation: argmax over x1,…,xn of P(x1, x2, …, xn)

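To make the two query types concrete, here is a brute-force sketch by enumeration, reusing joint() and the CPTs from the first code sketch. Enumeration is exponential in the number of variables; it is only meant to pin down what the queries ask.

```python
# Brute-force sketch of both query types by enumeration, reusing joint() and
# the CPTs from the first code sketch. Exponential in n; illustration only.
import itertools

VARS = ["Flu", "Allergy", "Sinus", "Headache", "Nose"]

def assignments(fixed):
    """All full assignments consistent with the fixed variables."""
    free = [v for v in VARS if v not in fixed]
    for vals in itertools.product((0, 1), repeat=len(free)):
        yield {**fixed, **dict(zip(free, vals))}

# Marginal estimation: P(Sinus = 1 | Headache = 1)
num = sum(joint(a) for a in assignments({"Sinus": 1, "Headache": 1}))
den = sum(joint(a) for a in assignments({"Headache": 1}))
print(num / den)

# Most Probable Explanation: the single most likely full assignment
print(max(assignments({}), key=joint))
```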


Learning Bayes nets

                        Known structure     Unknown structure
Fully observable data   Very easy           Hard
Missing data            Somewhat easy (EM)  Very very hard

[Figure: data x(1), …, x(m) → learned structure and parameters (CPTs P(Xi | PaXi))]
Slide Credit: Carlos Guestrin
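To ground the "very easy" cell (known structure, fully observed data): the maximum-likelihood estimate of each CPT entry is just a normalized co-occurrence count. A sketch with my own helper names:

```python
# Sketch of the "very easy" cell: known structure + fully observed data.
# The MLE of each CPT entry is a normalized count; helper names are my own.
from collections import Counter, defaultdict

def fit_cpts(data, parents):
    """data: list of dicts var -> value; parents: dict var -> list of parents."""
    counts = defaultdict(Counter)
    for x in data:
        for var, pa in parents.items():
            counts[(var, tuple(x[u] for u in pa))][x[var]] += 1
    cpts = {}
    for key, c in counts.items():
        total = sum(c.values())
        cpts[key] = {val: n / total for val, n in c.items()}
    return cpts

parents = {"Flu": [], "Allergy": [], "Sinus": ["Flu", "Allergy"],
           "Headache": ["Sinus"], "Nose": ["Sinus"]}
data = [
    {"Flu": 1, "Allergy": 0, "Sinus": 1, "Headache": 1, "Nose": 0},
    {"Flu": 0, "Allergy": 1, "Sinus": 1, "Headache": 0, "Nose": 1},
    {"Flu": 0, "Allergy": 0, "Sinus": 0, "Headache": 0, "Nose": 0},
]
print(fit_cpts(data, parents)[("Headache", (1,))])  # {1: 0.5, 0: 0.5}
```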
