
COL333/671: Introduction to AI

Semester I, 2022-23

Probabilistic Reasoning

Rohan Paul

1
Outline
• Last Class
• Adversarial Search
• This Class
• Probabilistic Reasoning
• Reference Material
• AIMA Ch. 13 and 14

2
Acknowledgement
These slides are intended for teaching purposes only. Some material
has been used/adapted from web sources and from slides by Doina
Precup, Dorsa Sadigh, Percy Liang, Mausam, Dan Klein, Anca
Dragan, Nicholas Roy and others.

3
Uncertainty in AI
• Observed variables (evidence): the agent knows certain things about the state of the world (e.g., sensor measurements or symptoms).
• Unobserved variables: the agent needs to reason about other aspects (e.g., what disease is present, is the car operational, location of the burglar).
• Model: the agent knows something about how the known variables relate to the unknown variables.
• Probabilistic reasoning gives us a framework for managing our beliefs and knowledge.

Motivating questions:
• I hear an unusual sound and a burning smell in my car; what fault is there in my engine?
• I have fever, loss of smell, loss of taste; do I have Covid?
• I hear some footsteps in my house; where is the burglar?

4
Examples
Inferring disease from symptoms

5
Examples
Accident – driving domain.

6
Examples
Fault diagnosis

7
Examples
Predictive analytics/expert systems

8
Outline
• Representation for Uncertainty (review)
• Bayes Nets:
• A framework for representing and managing our beliefs and knowledge
• Answering queries using Bayes Nets
• Inference methods
• Approximate methods for answering queries
• Use of learning

9
Random Variables
• A random variable is some aspect of the world about which we (may) have uncertainty.
• R = Do I have Covid?
• T = Is the engine faulty or working?
• D = How long will it take to drive to IIT?
• L = Where is the person?

• Domains
• R in {true, false} (often written as {+r, -r})
• T in {faulty, working}
• D in [0, ∞)
• L in possible locations in a grid {(0,0), (0,1), …}

10
Joint Distributions
• A joint distribution over a set of random variables X1, …, Xn specifies a probability for each assignment (or outcome): P(x1, …, xn).
• Must obey: P(x1, …, xn) ≥ 0, and the entries sum to 1.

T W P
hot sun 0.4
hot rain 0.1
cold sun 0.2
cold rain 0.3

Note: the joint distribution can answer all probabilistic queries.
Problem: the table size is d^n for n variables with d values each.

11
Events
• An event is a set E of outcomes: P(E) = ∑ over (x1, …, xn) in E of P(x1, …, xn).
• From a joint distribution, we can calculate the probability of any event.
• Probability that it's hot AND sunny? 0.4
• Probability that it's hot? 0.4 + 0.1 = 0.5
• Probability that it's hot OR sunny? 0.4 + 0.1 + 0.2 = 0.7

T W P
hot sun 0.4
hot rain 0.1
cold sun 0.2
cold rain 0.3

12
Marginalization
§ From a joint distribution (>1 variable), reduce it to a distribution over a smaller set of variables.
§ Marginal distributions are sub-tables which eliminate variables.
§ Marginalization (summing out): combine collapsed rows by adding their probabilities, e.g., P(t) = ∑_w P(t, w).

T W P
hot sun 0.4
hot rain 0.1
cold sun 0.2
cold rain 0.3

T P
hot 0.5
cold 0.5

W P
sun 0.6
rain 0.4

13
Conditioning Joint Distribution
§ Conditional distributions are probability distributions over some variables given fixed values of others: P(a | b) = P(a, b) / P(b).

T W P
hot sun 0.4
hot rain 0.1
cold sun 0.2
cold rain 0.3

Conditional Distributions

P(W | T = cold):
W P
sun 0.4
rain 0.6

P(W | T = hot):
W P
sun 0.8
rain 0.2

14
Inference by Enumeration
• P(W)?
P(sun) = 0.30 + 0.10 + 0.10 + 0.15 = 0.65
P(rain) = 1 - 0.65 = 0.35

S T W P
summer hot sun 0.30
summer hot rain 0.05
summer cold sun 0.10
summer cold rain 0.05
winter hot sun 0.10
winter hot rain 0.05
winter cold sun 0.15
winter cold rain 0.20
Inference by Enumeration
• P(W | winter, hot)?
P(sun | winter, hot) ∝ 0.10
P(rain | winter, hot) ∝ 0.05
Normalizing: P(sun | winter, hot) = 2/3, P(rain | winter, hot) = 1/3

S T W P
summer hot sun 0.30
summer hot rain 0.05
summer cold sun 0.10
summer cold rain 0.05
winter hot sun 0.10
winter hot rain 0.05
winter cold sun 0.15
winter cold rain 0.20
Product Rule
• A marginal and a conditional provide the joint distribution: P(d, w) = P(d | w) P(w).

• Example:

P(W):
W P
sun 0.8
rain 0.2

P(D | W):
D W P
wet sun 0.1
dry sun 0.9
wet rain 0.7
dry rain 0.3

P(D, W):
D W P
wet sun 0.08
dry sun 0.72
wet rain 0.14
dry rain 0.06

20
Chain Rule

Construct a larger joint distribution from simpler conditional distributions:

P(x1, x2, …, xn) = P(x1) P(x2 | x1) P(x3 | x1, x2) ⋯ = ∏_i P(xi | x1, …, x(i-1))

21
Bayes Rule
• Two ways to factor a joint distribution over two variables:
P(x, y) = P(x | y) P(y) = P(y | x) P(x)

• Dividing, we get:
P(x | y) = P(y | x) P(x) / P(y)

• Example: diagnostic probability from causal probability:
P(cause | effect) = P(effect | cause) P(cause) / P(effect)

• Usefulness
• Lets us build one conditional from its reverse.
• Often one conditional is difficult to obtain but the other one is simple.
22
Independence
• Two variables are independent if: P(x, y) = P(x) P(y) for all x, y.
• This says that their joint distribution factors into a product of two simpler distributions.
• Another form: P(x | y) = P(x) for all x, y.
• We write: X ⊥ Y.

• Example: n independent flips of a fair coin. The joint over the n flips factors into n smaller distributions, each:
H 0.5
T 0.5

23
Bayesian Networks
• Problem with using full joint distribution tables as our probabilistic models:
• Unless there are only a few variables, the joint is hard to represent explicitly.

• Bayesian Networks:
• A technique for describing complex joint distributions (models) using simple, local
distributions (conditional probabilities)
• Also known as probabilistic graphical models
• Encode how variables locally influence each other. Local interactions chain together
to give global, indirect interactions

24
Examples

25
Bayesian Networks: Semantics
• A directed, acyclic graph, one node per random variable

• A conditional probability table (CPT) for each node


• A collection of distributions over X, one for each combination of parents’ values

• Bayesian Networks implicitly encode joint distributions

• As a product of local conditional distributions

• To see what probability a BN gives to a full assignment, multiply all the relevant conditionals:

P(x1, x2, …, xn) = ∏_i P(xi | parents(Xi))
Example: The Alarm Network
Burglary → Alarm ← Earthquake; Alarm → JohnCalls, Alarm → MaryCalls

B P(B)
+b 0.001
-b 0.999

E P(E)
+e 0.002
-e 0.998

B E A P(A|B,E)
+b +e +a 0.95
+b +e -a 0.05
+b -e +a 0.94
+b -e -a 0.06
-b +e +a 0.29
-b +e -a 0.71
-b -e +a 0.001
-b -e -a 0.999

A J P(J|A)
+a +j 0.9
+a -j 0.1
-a +j 0.05
-a -j 0.95

A M P(M|A)
+a +m 0.7
+a -m 0.3
-a +m 0.01
-a -m 0.99
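As a worked check of the product formula, consider the full assignment in which a burglary occurs with no earthquake, the alarm sounds, and both neighbors call (using the CPTs above):

P(+b, -e, +a, +j, +m) = P(+b) P(-e) P(+a | +b, -e) P(+j | +a) P(+m | +a)
 = 0.001 × 0.998 × 0.94 × 0.9 × 0.7 ≈ 5.9 × 10^-4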
Inference by Enumeration in a Bayes Net
• Inference by enumeration is one way to perform inference in a Bayesian Network (Bayes Net).

P(B | +j, +m) ∝ P(B, +j, +m)
 = ∑_{e,a} P(B, e, a, +j, +m)
 = ∑_{e,a} P(B) P(e) P(a | B, e) P(+j | a) P(+m | a)
 = P(B) P(+e) P(+a | B, +e) P(+j | +a) P(+m | +a) + P(B) P(+e) P(-a | B, +e) P(+j | -a) P(+m | -a)
 + P(B) P(-e) P(+a | B, -e) P(+j | +a) P(+m | +a) + P(B) P(-e) P(-a | B, -e) P(+j | -a) P(+m | -a)
Bayesian Networks: Inference
• Bayesian Networks
• Implicitly encode a probability distribution
• As a product of local conditional distributions
• Variables
• Query variables
• Evidence variables
• Hidden variables
(Together these comprise all the variables.)
• Inference: What we want to estimate?
• Estimating some useful quantity from the joint
distribution.
• Posterior probability
• Most likely explanation
Inference by Enumeration
• Setup: A distribution over query variables (Q)
given evidence variables (E)
• Select entries consistent with the evidence.
• E.g., Alarm rang, it is rainy, disease present
• Compute the joint distribution
• Sum out (eliminate) the hidden variables (H)
• Normalize the distribution
• Next
• Introduce a notion called factors
• Understand this computation using joining and
marginalization of factors.
Example: Traffic Domain
• Random Variables
• R: Raining
• T: Traffic
• L: Late for class

P(R):
+r 0.1
-r 0.9

P(T|R):
+r +t 0.8
+r -t 0.2
-r +t 0.1
-r -t 0.9

P(L|T):
+t +l 0.3
+t -l 0.7
-t +l 0.1
-t -l 0.9

P(L) = ∑_{r,t} P(r, t, L) = ∑_{r,t} P(r) P(t|r) P(L|t)
Example: Traffic Domain
• What are factors?
• A factor is a function from some set of variables to a specific value.
• Initial factors: the conditional probability tables (one per node):

P(R): +r 0.1; -r 0.9
P(T|R): +r +t 0.8; +r -t 0.2; -r +t 0.1; -r -t 0.9
P(L|T): +t +l 0.3; +t -l 0.7; -t +l 0.1; -t -l 0.9

• Select the values consistent with the evidence.
• Inference by enumeration, applied to the initial factors, is a procedure that joins all the factors and then sums out all the hidden variables.
• We define "joining" and "summing" next.
Operation I: Joining Factors
• Joining
• Get all the factors over the joining variable.
• Build a new factor over the union of variables involved.
• Computation for each entry: pointwise products.

Example: joining on R combines P(R) and P(T|R) into P(R,T):

P(R): +r 0.1; -r 0.9
P(T|R): +r +t 0.8; +r -t 0.2; -r +t 0.1; -r -t 0.9
P(R,T): +r +t 0.08; +r -t 0.02; -r +t 0.09; -r -t 0.81
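The join operation can be made concrete in code. Below is a minimal Python sketch (not from the slides): a factor is a dict from assignment tuples to numbers, and DOMAINS, join and the table names are my own illustrative choices.

from itertools import product

# Variable domains for the traffic example (illustrative encoding).
DOMAINS = {'R': ['+r', '-r'], 'T': ['+t', '-t'], 'L': ['+l', '-l']}

def join(f1, vars1, f2, vars2):
    """Pointwise product of two factors, over the union of their variables."""
    out_vars = vars1 + [v for v in vars2 if v not in vars1]
    out = {}
    for assign in product(*(DOMAINS[v] for v in out_vars)):
        a = dict(zip(out_vars, assign))
        k1 = tuple(a[v] for v in vars1)
        k2 = tuple(a[v] for v in vars2)
        if k1 in f1 and k2 in f2:      # skip rows pruned away by evidence
            out[assign] = f1[k1] * f2[k2]
    return out, out_vars

P_R = {('+r',): 0.1, ('-r',): 0.9}                    # P(R)
P_T_given_R = {('+r', '+t'): 0.8, ('+r', '-t'): 0.2,  # P(T|R)
               ('-r', '+t'): 0.1, ('-r', '-t'): 0.9}
P_RT, _ = join(P_R, ['R'], P_T_given_R, ['R', 'T'])
# P_RT[('+r', '+t')] == 0.08, matching the joined table above.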
Joining Factors

Source: AIMA Ch 14.


Joining Multiple Factors
Join on R, then join on T: P(R) × P(T|R) × P(L|T) → P(R,T) × P(L|T) → P(R,T,L)

P(R): +r 0.1; -r 0.9
P(T|R): +r +t 0.8; +r -t 0.2; -r +t 0.1; -r -t 0.9
P(L|T): +t +l 0.3; +t -l 0.7; -t +l 0.1; -t -l 0.9

Join R → P(R,T):
+r +t 0.08
+r -t 0.02
-r +t 0.09
-r -t 0.81

Join T → P(R,T,L):
+r +t +l 0.024
+r +t -l 0.056
+r -t +l 0.002
+r -t -l 0.018
-r +t +l 0.027
-r +t -l 0.063
-r -t +l 0.081
-r -t -l 0.729
Operation II: Eliminating Factors
• Marginalization
• Take a factor and sum out a variable.
• This shrinks the factor to a smaller one.

Sum out R from P(R,T):
+r +t 0.08; +r -t 0.02; -r +t 0.09; -r -t 0.81  →  +t 0.17; -t 0.83

Sum out R from P(R,T,L), then sum out T:

P(R,T,L) → P(T,L):
+t +l 0.051
+t -l 0.119
-t +l 0.083
-t -l 0.747

P(T,L) → P(L):
+l 0.134
-l 0.866
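Continuing the sketch above, summing a variable out of a factor takes only a few lines; sum_out is my own helper name.

def sum_out(f, vars_, var):
    """Sum `var` out of factor `f`, defined over the variable list `vars_`."""
    i = vars_.index(var)
    out = {}
    for assign, p in f.items():
        key = assign[:i] + assign[i+1:]      # drop the eliminated variable
        out[key] = out.get(key, 0.0) + p     # combine collapsed rows
    return out, vars_[:i] + vars_[i+1:]

P_T, _ = sum_out(P_RT, ['R', 'T'], 'R')
# P_T[('+t',)] == 0.17 and P_T[('-t',)] == 0.83, as in the tables above.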
Inference by Enumeration
Multiple join operations, then multiple eliminate operations, on the chain R → T → L.
P(L) = ?
Variable Elimination
• Inference by Enumeration
• Problem: the whole distribution is "joined up" before we "sum out" the hidden variables.

• Variable Elimination
• Interleaves joining and eliminating variables.
• Does not create the full joint distribution in one go.
• Key Idea:
• Pick a variable ordering. Pick a variable.
• Join all factors containing that variable.
• Sum the variable out of the new factor.

• Leverages the structure (topology) of the Bayesian Network.

• Marginalize early (avoid growing the full joint distribution).
Inference by Enumeration vs. Variable Elimination
P(L) = ? on the chain R → T → L.

Inference by Enumeration:
P(L) = ∑_t ∑_r P(L|t) P(r) P(t|r)
Order of operations: join on r, join on t, eliminate r, eliminate t.

Variable Elimination:
P(L) = ∑_t P(L|t) ∑_r P(r) P(t|r)
Order of operations: join on r, eliminate r, join on t, eliminate t.
Variable Elimination
Join R, sum out R, join T, sum out T.

Initial factors:
P(R): +r 0.1; -r 0.9
P(T|R): +r +t 0.8; +r -t 0.2; -r +t 0.1; -r -t 0.9
P(L|T): +t +l 0.3; +t -l 0.7; -t +l 0.1; -t -l 0.9

Join R → P(R,T):
+r +t 0.08; +r -t 0.02; -r +t 0.09; -r -t 0.81

Sum out R → P(T):
+t 0.17; -t 0.83

Join T with P(L|T) → P(T,L):
+t +l 0.051; +t -l 0.119; -t +l 0.083; -t -l 0.747

Sum out T → P(L):
+l 0.134; -l 0.866
Incorporating Evidence
• Till now, we computed P(Late). What happens for P(Late | Rain = +r)?
• How do we incorporate evidence in Variable Elimination?
• Solution
• If there is evidence, start with the initial factors and select the rows consistent with the evidence.
• After selecting evidence, eliminate all variables other than the query and the evidence.

Initial factors:
P(R): +r 0.1; -r 0.9
P(T|R): +r +t 0.8; +r -t 0.2; -r +t 0.1; -r -t 0.9
P(L|T): +t +l 0.3; +t -l 0.7; -t +l 0.1; -t -l 0.9

Evidence (+r) incorporated into the initial factors:
+r 0.1
+r +t 0.8; +r -t 0.2
+t +l 0.3; +t -l 0.7; -t +l 0.1; -t -l 0.9
General Variable Elimination
• Query: P(Q | e1, …, ek)

• Start with initial factors:


• Local conditional probability tables.
• Evidence (known) variables are instantiated.

• While there are still hidden variables (not Q or evidence):


• Pick a hidden variable H (from some ordering)
• Join all factors mentioning H
• Eliminate (sum out) H

• Join all the remaining factors and normalize
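
A minimal Python sketch of this loop, composed from the join and sum_out helpers defined earlier; the evidence-selection step and all names are my own illustrative choices, not from the slides.

def variable_elimination(factors, evidence, ordering):
    """factors: list of (table, var_list) pairs; evidence: {var: value};
    ordering: the hidden variables (everything except query and evidence)."""
    # Instantiate evidence: keep only rows consistent with the observations.
    work = []
    for f, vs in factors:
        sel = {a: p for a, p in f.items()
               if all(a[vs.index(v)] == x
                      for v, x in evidence.items() if v in vs)}
        work.append((sel, vs))
    for h in ordering:
        # Join all factors mentioning the hidden variable h...
        joined, jvars, rest = None, None, []
        for f, vs in work:
            if h in vs:
                joined, jvars = ((f, vs) if joined is None
                                 else join(joined, jvars, f, vs))
            else:
                rest.append((f, vs))
        # ...then eliminate (sum out) h.
        work = rest + [sum_out(joined, jvars, h)]
    # Join the remaining factors and normalize.
    f, vs = work[0]
    for g, ws in work[1:]:
        f, vs = join(f, vs, g, ws)
    z = sum(f.values())
    return {a: p / z for a, p in f.items()}

# For example, P(L | +r) on the traffic network (P_L_given_T encoding P(L|T)):
# variable_elimination([(P_R, ['R']), (P_T_given_R, ['R', 'T']),
#                       (P_L_given_T, ['T', 'L'])],
#                      evidence={'R': '+r'}, ordering=['T'])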


Example: Alarm Domain

P(B | j, m) ∝ P(B, j, m)                              (marginal can be obtained from the joint by summing out)
 = ∑_{e,a} P(B, j, m, e, a)                           (use the Bayes net joint distribution expression)
 = ∑_{e,a} P(B) P(e) P(a|B,e) P(j|a) P(m|a)           (use x(y+z) = xy + xz)
 = ∑_e P(B) P(e) ∑_a P(a|B,e) P(j|a) P(m|a)           (joining on a, then summing out, gives f1)
 = ∑_e P(B) P(e) f1(j, m | B, e)                      (use x(y+z) = xy + xz)
 = P(B) ∑_e P(e) f1(j, m | B, e)                      (joining on e, then summing out, gives f2)
 = P(B) f2(j, m | B)
Example: Alarm Domain

Choose A
Example: Alarm Domain

Choose E

Finish with B

Normalize
Variable Elimination: Structuring Computation

Source: AIMA Ch 14.


Example

Start by inserting evidence, which gives the following initial factors:

P(Z), P(X1|Z), P(X2|Z), P(X3|Z), P(y1|X1), P(y2|X2), P(y3|X3)

There are three variables to eliminate: X1, X2 and Z. The Y variables are observed (instantiated).
Example
Start by inserting evidence, which gives the following initial factors:

P(Z), P(X1|Z), P(X2|Z), P(X3|Z), P(y1|X1), P(y2|X2), P(y3|X3)

Eliminate X1; this introduces the factor f1(y1|Z) = ∑_{x1} P(x1|Z) P(y1|x1), and we are left with:

P(Z), P(X2|Z), P(X3|Z), P(y2|X2), P(y3|X3), f1(y1|Z)

Eliminate X2; this introduces the factor f2(y2|Z) = ∑_{x2} P(x2|Z) P(y2|x2), and we are left with:

P(Z), P(X3|Z), P(y3|X3), f1(y1|Z), f2(y2|Z)

Eliminate Z; this introduces the factor f3(y1, y2, X3) = ∑_z P(z) P(X3|z) f1(y1|z) f2(y2|z), and we are left with:

P(y3|X3), f3(y1, y2, X3)

No hidden variables are left. Join the remaining factors to get:

f4(y1, y2, y3, X3) = P(y3|X3) · f3(y1, y2, X3)

Normalizing over X3 gives P(X3 | y1, y2, y3) = f4(y1, y2, y3, X3) / ∑_{x3} f4(y1, y2, y3, x3)
Example: Computational Complexity
• The cost of the derivation above depends on the largest factor generated in VE.
• Factor size = number of entries in the table.
• In this example, with the elimination ordering X1, X2, Z (query X3), each generated factor is of size 2 (only one free variable; the y's are observed).
Effect of Different Orderings
• For the query P(Xn | y1, …, yn), consider two different orderings:
• Eliminate Z first: Z, X1, …, Xn-1
• Eliminate Z last: X1, …, Xn-1, Z
• What is the size of the maximum factor generated for each of the orderings?
Example
Eliminate Z First

Query: P(Xn | y1, y2, …, yn), with initial factors

P(Z), P(X1|Z), P(X2|Z), …, P(Xn|Z), P(y1|X1), P(y2|X2), …, P(yn|Xn)

Joining on Z and summing it out first gives

f1(X1, …, Xn) = ∑_z P(z) P(X1|z) ⋯ P(Xn|z)

This factor has 2^n entries. We are left with:

P(Xn | y1, …, yn) = α f1(X1, …, Xn) P(y1|X1) P(y2|X2) ⋯ P(yn|Xn)

Eliminating X1 next gives f2(X2, …, Xn) = ∑_{x1} f1(x1, X2, …, Xn) P(y1|x1), and so on: each intermediate factor produced under this ordering remains exponentially large.
Example
Eliminate Z Last

Query: P(Xn | y1, y2, …, yn), with initial factors

P(Z), P(X1|Z), P(X2|Z), …, P(Xn|Z), P(y1|X1), P(y2|X2), …, P(yn|Xn)

Eliminating X1 first gives

f1(y1|Z) = ∑_{x1} P(y1|x1) P(x1|Z)

This factor has size 2. We are left with:

P(Z), f1(y1|Z), P(X2|Z), …, P(Xn|Z), P(y2|X2), …, P(yn|Xn)

The other steps are like the previous example: each factor generated consists of a single free variable and has size 2.
Variable ordering can have considerable impact.
Properties
• Variable elimination is dominated by the size of the largest
factor constructed during the operation of the algorithm.
• Depends on the structure of the network and order of
elimination of the variables.
• Finding the optimal ordering is intractable.
• Can pose the problem of finding good ordering as a search.
• Use heuristics.
• Min-fill heuristic
• Eliminate the variable that creates the smallest sized factor (greedy
approach).
• Min-neighbors
• Eliminate the variable that has the smallest number of neighbors in
the current graph. Rank A, B and D with
the Min-Fill heuristic.

57
Irrelevant Variables
Network: Burglary → Alarm ← Earthquake; Alarm → JohnCalls, Alarm → MaryCalls

Every variable that is not an ancestor of a query variable or evidence variable is irrelevant for the query.
59
Slide adapted from Prof. Mausam
Bayesian Networks: Independence
• Bayesian Networks
• Implicitly encode joint distributions
• A collection of distributions over X, one for
each combination of parents’ values
• Product of local conditional distributions
• Inference
• Given a fixed BN, what is P(X | e)
• Variable Elimination
• Modeling
• Understanding the assumptions made when
choosing a Bayes net graph

A Bayesian Network Model for Diagnosis of Liver Disorders. Onisko et al. 99.
Conditional Independence
• X and Y are independent if: P(x, y) = P(x) P(y) for all x, y.

• X and Y are conditionally independent given Z if: P(x, y | z) = P(x | z) P(y | z) for all x, y, z.

• (Conditional) independence: given Z, Y has no more information to convey about X; Y does not probabilistically influence X.

• Example:
Smoke causes the alarm to be triggered. Once there is smoke, it does not matter what caused it (e.g., fire or any other source).
Bayesian Network: Independence
Assumptions
• The conditional distributions defining the Bayesian Network (BN) assume conditional independences:
P(xi | x1, …, x(i-1)) = P(xi | parents(Xi))
• Often there are additional conditional independences that are
implicit in the network.
• How to show if two variables (X and Y) are conditionally independent
given evidence (say Z)?
• Yes. Provide a proof by analyzing the probability expression.
• No. Find a counter example. Instantiate a CPT for the BN such that X and Y are
not independent given Z.
Causal Chains
Chain: X → Y → Z, with X: No Mask, Y: Covid Transmission, Z: Fever.
• Is X guaranteed to be independent of Z?
• No
• Intuitively:
• Wearing no mask causes virus transmission, which causes fever.
• Wearing a mask causes no virus transmission, hence no symptom.
• The path between X and Z is active.
• Instantiate a CPT as a counterexample:
P(+y | +x) = 1, P(-y | -x) = 1, P(+z | +y) = 1, P(-z | -y) = 1
Causal Chains
Chain: X → Y → Z, with X: No Mask, Y: Covid Transmission, Z: Fever.
• Is X guaranteed to be independent of Z given Y?
• Yes

• Evidence along the chain blocks the influence (inactivates the path).
Common Cause
Structure: X ← Y → Z, with Y: Covid infection, X: Fever, Z: Loss of smell.
• Is X guaranteed to be independent of Z?
• No
• Intuitively: Covid infection causes both fever and loss of smell.
• The path between X and Z is active.
• Instantiate a CPT as a counterexample:
P(+x | +y) = 1, P(-x | -y) = 1, P(+z | +y) = 1, P(-z | -y) = 1
Common Cause
Structure: X ← Y → Z, with Y: Covid infection, X: Fever, Z: Loss of smell.
• Is X independent of Z given Y?
• Yes
• If you have Covid, then the belief over loss of smell is not affected by the presence of fever.
• Observing the cause blocks the influence (inactivates the path).
Common Effect
Structure: X → Z ← Y, with X: Covid, Y: Tuberculosis, Z: Fever.
• Are X and Y independent?
• Yes
• Covid and TB both cause fever, but we can't say that if you have Covid then you are more or less likely to have TB (under this model).

P(x, y) = ∑_z P(x, y, z)
        = ∑_z P(x) P(y) P(z | x, y)
        = P(x) P(y) ∑_z P(z | x, y)
        = P(x) P(y)
Common Effect
Structure: X → Z ← Y, with X: Covid, Y: Tuberculosis, Z: Fever.
• Is X independent of Y given Z?
• No
• Seeing the fever puts Covid and TB in competition as possible causal explanations.
• It is likely that one of them is the cause, and rare for both. If Covid is present, then the likelihood of TB being present is lowered (its chances are "explained away").
• Observing the common effect activates influence between the possible causes.
Active and Inactive Paths
• Question: Are X and Y conditionally independent given evidence variables {Z}?
• Yes, if X and Y are "d-separated" by Z.
• Consider all (undirected) paths from X to Y.
• No active paths = independence.

• A path is active if each triple along it is active:
• Causal chain A → B → C where B is unobserved (either direction)
• Common cause A ← B → C where B is unobserved
• Common effect (aka v-structure) A → B ← C where B or one of its descendants is observed

• A path is blocked by even a single inactive triple.
D-Separation
§ Query: Xi ⊥ Xj | {Xk1, …, Xkn}?
§ Check all (undirected) paths between Xi and Xj.
§ If one or more paths are active, then independence is not guaranteed:
Xi ⊥ Xj | {Xk1, …, Xkn} does not hold.
§ Otherwise (i.e., if all paths are inactive), then independence is guaranteed:
Xi ⊥ Xj | {Xk1, …, Xkn} holds.
D-Separation: Examples

(Figures: example networks over variables R, B, L, D, T and T, D, S, with d-separation queries answered on each by checking whether every path between the query variables is inactive.)
Encoding a Generative Process
• Directed Graphical Models
• Encode a generative structure.
• They model our assumptions about how the data was generated.
• Include modeling assumptions that impact inference.
• Can also be understood as "probabilistic programs".
Examples
Naïve Language Modeling
Basic Tracking: given noisy observations, where is the agent?
Adapted from Dorsa Sadigh and Percy Liang
Examples
Document Analysis: given a text document, what is it about?
Social Network Analysis: given a social network (a graph over n people), what types of people are there?
Adapted from Dorsa Sadigh and Percy Liang
Approximate Inference in Probabilistic Models
• Exact Inference
• Inference by enumeration (variable elimination)
• Exact likelihood for probabilistic queries:
• Exact marginal likelihood, e.g., P(Late = Yes)
• Exact conditional likelihood (posterior probability), e.g., P(Late = True | Rain = Yes, Traffic = High)
• Problem: in many practical applications, variable elimination can be intractable; it may need to create a very large table.

• Approximate Inference
• Compute an "approximate" posterior probability.
• Principle: generate samples from the distribution, then use the samples to construct an approximate estimate of the probabilistic query, e.g., P'(Late) or P'(Late | Rain, Traffic).
• Advantage: generating samples and constructing the approximate distribution is often faster. Note that the estimate is approximate, not exact.
• Methods: Prior Sampling, Rejection Sampling, Likelihood Weighting, Gibbs Sampling.
Need the ability to sample from a distribution
• Sampling from a given distribution:
• Step 1: Get a sample u from the uniform distribution over [0, 1).
• Step 2: Convert this sample u into an outcome for the given distribution by associating each outcome with a sub-interval of [0, 1), with sub-interval size equal to the probability of the outcome.

Example:
C P(C)
red 0.6
green 0.1
blue 0.3

§ If random() returns u = 0.83, then our sample is C = blue.
• Utility? We should be able to sample from a CPT defining the probabilistic model. Next, we look at approaches for sampling from a Bayes Net.
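A minimal Python sketch of these two steps, assuming random.random() as the uniform-[0, 1) source; sample_discrete is my own helper name.

import random

def sample_discrete(dist):
    """dist: mapping outcome -> probability. Returns one sampled outcome."""
    u = random.random()          # Step 1: uniform sample in [0, 1)
    cumulative = 0.0
    for outcome, p in dist.items():
        cumulative += p          # Step 2: find the sub-interval containing u
        if u < cumulative:
            return outcome
    return outcome               # guard against floating-point round-off

# With {'red': 0.6, 'green': 0.1, 'blue': 0.3}, u = 0.83 falls in the
# blue sub-interval [0.7, 1.0), so the sample is 'blue'.
print(sample_discrete({'red': 0.6, 'green': 0.1, 'blue': 0.3}))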
Prior Sampling
• For i=1, 2, …, n
• Sample xi from P(Xi | Parents(Xi))

• Return (x1, x2, …, xn)

Source MLSS 2009. http://videolectures.net/mlss09uk_murray_mcmc/


Prior Sampling
• For i = 1, 2, …, n
• Sample xi from P(Xi | Parents(Xi))
• Return (x1, x2, …, xn)

Network: Cloudy → Sprinkler, Cloudy → Rain; Sprinkler, Rain → WetGrass.

P(C): +c 0.5; -c 0.5
P(S|C): +c +s 0.1; +c -s 0.9; -c +s 0.5; -c -s 0.5
P(R|C): +c +r 0.8; +c -r 0.2; -c +r 0.2; -c -r 0.8
P(W|S,R):
+s +r +w 0.99; +s +r -w 0.01
+s -r +w 0.90; +s -r -w 0.10
-s +r +w 0.90; -s +r -w 0.10
-s -r +w 0.01; -s -r -w 0.99

Samples:
+c, -s, +r, +w
-c, +s, -r, +w
…
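A sketch of prior sampling on this network, reusing the sample_discrete helper from earlier; the CPT encoding is an illustrative choice of mine.

def prior_sample():
    """One sample from the Cloudy/Sprinkler/Rain/WetGrass network,
    drawn in topological order (parents before children)."""
    c = sample_discrete({'+c': 0.5, '-c': 0.5})
    s = sample_discrete({'+s': 0.1, '-s': 0.9} if c == '+c'
                        else {'+s': 0.5, '-s': 0.5})
    r = sample_discrete({'+r': 0.8, '-r': 0.2} if c == '+c'
                        else {'+r': 0.2, '-r': 0.8})
    p_w = {('+s', '+r'): 0.99, ('+s', '-r'): 0.90,
           ('-s', '+r'): 0.90, ('-s', '-r'): 0.01}[(s, r)]
    w = sample_discrete({'+w': p_w, '-w': 1 - p_w})
    return (c, s, r, w)

samples = [prior_sample() for _ in range(10000)]
print(sum(1 for x in samples if x[3] == '+w') / len(samples))  # estimate of P(+w)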
Approximate Probabilistic Queries with Samples
• Potential samples from the Bayes Net:
+c, -s, +r, +w
+c, +s, +r, +w
-c, +s, +r, -w
+c, -s, +r, +w
-c, -s, -r, +w
• What can we do with these samples?
• We can empirically estimate the probabilistic queries.
• Estimating P(W):
• We have counts <+w: 4, -w: 1>
• Normalize to get P(W) = <+w: 0.8, -w: 0.2>
• We can estimate other probabilistic queries as well: P(C | +w)? P(C | +r, +w)? P(C | -r, -w)?
• Note: if an evidence combination never appears among the samples, we cannot estimate the corresponding conditional.

Problem: prior sampling is unaware of the probabilistic queries that will be asked later. Can we be more efficient if we knew the queries on the Bayes Net in advance?
Prior Sampling Convergence
• This process generates samples with probability
S_PS(x1, …, xn) = ∏_i P(xi | Parents(Xi)) = P(x1, …, xn)
• Let the number of samples of an event be N_PS(x1, …, xn).
• Then the estimate N_PS(x1, …, xn) / N converges to P(x1, …, xn) as N grows.
• The prior sampling procedure is consistent: in the limit, the estimated distribution converges to the correct distribution.
Rejection Sampling
• IN: evidence instantiation
• For i = 1, 2, …, n
• Sample xi from P(Xi | Parents(Xi))
• If xi is not consistent with the evidence
• Reject: return, and no sample is generated in this cycle
• Return (x1, x2, …, xn)

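A sketch of rejection sampling for P(C | +s) on the network above, reusing prior_sample(); in practice one would reject as soon as an inconsistent value is sampled rather than completing the sample. The function name is my own.

def rejection_sample_query(n):
    """Estimate P(C | +s) by tallying C over samples consistent with S = +s."""
    counts = {'+c': 0, '-c': 0}
    for _ in range(n):
        c, s, r, w = prior_sample()
        if s != '+s':
            continue                 # reject: inconsistent with the evidence
        counts[c] += 1               # tally the query variable C
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}

print(rejection_sample_query(100000))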
Rejection Sampling
• Estimate P(C | +s):
• Tally the C outcomes, but reject samples which do not have S = +s.
• As you sample successively from the conditional probabilities, return as soon as you find a sampled value inconsistent with the instantiated (evidence) variables.
• We can reject early: if there are 1000 variables and at the third variable we detect an inconsistency, we can reject the entire sample.
• Rejection sampling is consistent for conditional probabilities in the limit.

Samples: +c, -s, +r, +w (reject); +c, +s, +r, +w; -c, +s, +r, -w; +c, -s, +r, +w (reject); -c, -s, -r, +w (reject)
Likelihood Weighting
• Rejection Sampling
• Problem: if the evidence is unlikely, the sampler will reject lots of samples.
• You wait for a long time before you get a sample that agrees with the evidence.
• Evidence is taken into account only after you have sampled, not exploited as you sample: sample, then throw away the inconsistent ones.
• Example: estimating P(Shape | blue) by sampling (Shape, Color) pairs yields mostly rejections: pyramid, green; pyramid, red; sphere, blue; cube, red; sphere, green.

• Likelihood Weighting
• Generate only events that are consistent with the evidence.
• Fix the evidence variables and sample the rest.
• Problem: the sample distribution is then not consistent.
• Solution: weight each sample by the probability of the evidence given its parents.
• Example: P(Shape | blue), sampling only consistent values: pyramid, blue; pyramid, blue; sphere, blue; cube, blue; sphere, blue.
Likelihood Weighting
• IN: evidence instantiation
• w = 1.0
• for i = 1, 2, …, n
• if Xi is an evidence variable
• Xi = observation xi for Xi
• Set w = w * P(xi | Parents(Xi))  (when a variable is evidence, adjust the weight instead of sampling)
• else
• Sample xi from P(Xi | Parents(Xi))
• return (x1, x2, …, xn), w

(CPTs as in the Cloudy/Sprinkler/Rain/WetGrass network above.)
Samples: +c, +s, +r, +w with weight w; …
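A sketch of likelihood weighting with evidence S = +s and W = +w on the same network; it returns one sample plus its weight, and all names are my own.

def likelihood_weighted_sample():
    w = 1.0
    c = sample_discrete({'+c': 0.5, '-c': 0.5})
    s = '+s'                              # evidence: fix it, don't sample
    w *= 0.1 if c == '+c' else 0.5        # w *= P(+s | c)
    r = sample_discrete({'+r': 0.8, '-r': 0.2} if c == '+c'
                        else {'+r': 0.2, '-r': 0.8})
    wet = '+w'                            # evidence: fix it, don't sample
    w *= {'+r': 0.99, '-r': 0.90}[r]      # w *= P(+w | +s, r)
    return (c, s, r, wet), w

# Estimate P(C | +s, +w) from weighted counts:
totals = {'+c': 0.0, '-c': 0.0}
for _ in range(10000):
    (c, _, _, _), w = likelihood_weighted_sample()
    totals[c] += w
z = sum(totals.values())
print({k: t / z for k, t in totals.items()})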
Likelihood Weighting
• Sampling distribution when z is sampled and e is fixed evidence:
S_WS(z, e) = ∏_i P(zi | Parents(Zi))
• Now, samples have weights:
w(z, e) = ∏_j P(ej | Parents(Ej))
• Together, the weighted sampling distribution is consistent:
S_WS(z, e) · w(z, e) = P(z, e)

• Weighting corrects the distribution; the weight represents the importance of the sample.
Gibbs Sampling
• Likelihood weighting
• Evidence is taken into account during the sampling process.
• Problem?
• Evidence variables influence the choice of downstream variables but not the upstream ones.
• During the sampling process, a low-probability evidence value may be encountered late; until then the sample looks promising, yet it ends up with a tiny weight.

• Gibbs Sampling
• Considers the evidence when sampling every variable (both downstream and upstream).

88
Gibbs Sampling
• Procedure:
• Keep track of a full instantiation x1, x2, …, xn.
• Start with an arbitrary instantiation consistent with the evidence.
• Sample one variable at a time, conditioned on all the rest, but keep the evidence fixed.
• Keep repeating this for a long time.
• After repeating, you get one sample from the distribution.
• To get more samples: start again.
• Note: this is like local search.

• Property: in the limit of repeating this infinitely many times, the resulting sample comes from the correct distribution.

• Rationale: both upstream and downstream variables condition on evidence.

• Note: it is enough to sample from the Markov blanket (a code sketch follows).

89
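A minimal sketch of this procedure for the query P(S | +r) on the Cloudy/Sprinkler/Rain/WetGrass network. For brevity it resamples from the full joint rather than just the Markov blanket, and all names are my own illustrative choices.

import random

def joint_p(c, s, r, w):
    """Joint probability from the CPTs of the Cloudy/Sprinkler/Rain/WetGrass net."""
    p = 0.5                                                   # P(c) is uniform
    p *= {'+c': {'+s': 0.1, '-s': 0.9}, '-c': {'+s': 0.5, '-s': 0.5}}[c][s]
    p *= {'+c': {'+r': 0.8, '-r': 0.2}, '-c': {'+r': 0.2, '-r': 0.8}}[c][r]
    pw = {('+s', '+r'): 0.99, ('+s', '-r'): 0.90,
          ('-s', '+r'): 0.90, ('-s', '-r'): 0.01}[(s, r)]
    return p * (pw if w == '+w' else 1 - pw)

def gibbs_estimate(n_steps):
    state = {'C': '+c', 'S': '+s', 'R': '+r', 'W': '+w'}      # R fixed: evidence
    counts = {'+s': 0, '-s': 0}
    for _ in range(n_steps):
        var = random.choice(['C', 'S', 'W'])                  # non-evidence only
        vals = ['+' + var.lower(), '-' + var.lower()]
        # Resample var from P(var | all other variables), evidence kept fixed.
        ps = []
        for v in vals:
            x = dict(state, **{var: v})
            ps.append(joint_p(x['C'], x['S'], x['R'], x['W']))
        state[var] = vals[0] if random.random() < ps[0] / sum(ps) else vals[1]
        counts[state['S']] += 1
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}

print(gibbs_estimate(100000))    # approximate P(S | +r)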
Conditional Independences and Markov Boundary

(a) A node is conditionally independent of its non-descendants given its parents.
(b) A node is conditionally independent of all other nodes in the network given its Markov blanket, i.e., its parents, children, and children's parents.
90
Gibbs Sampling: Example
Estimating P(S | +r)
Step 1: Fix the evidence: R = +r.
Step 2: Initialize the other variables (C, S, W) randomly.
Step 3: Repeat
• Randomly select a non-evidence variable X.
• Resample X from P(X | all other variables).

91
Sampling from the conditional
• Sample S from P(S | +c, +r, -w).

Sampling from the conditional distribution is needed as a sub-routine for Gibbs sampling. It is typically easy to sample from: the expression is simpler due to the instantiated variables, and one can even construct the probability table explicitly if needed. 92
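Concretely, for this network the conditional needed above simplifies, since the factors that do not mention S cancel between the numerator and denominator:

P(S | +c, +r, -w) = P(S, +c, +r, -w) / ∑_s P(s, +c, +r, -w)
 = P(+c) P(S|+c) P(+r|+c) P(-w|S, +r) / ∑_s P(+c) P(s|+c) P(+r|+c) P(-w|s, +r)
 ∝ P(S|+c) P(-w|S, +r)

Only the CPTs in which S appears remain, which is why sampling from the Markov blanket suffices.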
Application: Topic Modeling
Topic Modeling

https://cacm.acm.org/magazines/2012/4/147361-probabilistic-topic-models/fulltext
Application: Topic Modeling
Gibbs sampling is an important technique for this application.

https://cacm.acm.org/magazines/2012/4/147361-probabilistic-topic-models/fulltext
