
Probability

CSE 4711: Artificial Intelligence

Md. Bakhtiar Hasan


Assistant Professor
Department of Computer Science and Engineering
Islamic University of Technology

Inference in Ghostbusters

A ghost is in the grid somewhere
Two actions:
• Bust action to catch the ghost
• Sensor readings tell how close a square is to the ghost
  ▶ On the ghost: red
  ▶ 1 or 2 away: orange
  ▶ 3 or 4 away: yellow
  ▶ 5+ away: green
Sensors are noisy, but we know P(Color | Distance), e.g. for Distance = 3:

P(red | 3)   P(orange | 3)   P(yellow | 3)   P(green | 3)
   0.05           0.15            0.50            0.30

Video: ghosts - manual
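
To make the sensor model concrete, here is a minimal Python sketch (ours, not from the slides) of P(Color | Distance) as a lookup table; only the Distance = 3 row comes from the slide, and any other rows would be placeholders.

```python
# Sensor model P(Color | Distance) as a nested dict.
# Only the distance-3 row is given on the slide.
sensor_model = {
    3: {"red": 0.05, "orange": 0.15, "yellow": 0.50, "green": 0.30},
}

# Each conditional distribution P(Color | distance) must sum to 1.
for distance, colors in sensor_model.items():
    assert abs(sum(colors.values()) - 1.0) < 1e-9
```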

Uncertainty

General situation:
• Observed variables (evidence): Agent knows certain things about the state of the world (e.g., sensor readings or symptoms)
• Unobserved variables: Agent needs to reason about other aspects (e.g., where an object is or what disease is present)
• Model: Agent knows something about how the known variables relate to the unknown variables

Probabilistic reasoning gives us a framework for managing our beliefs and knowledge

Random Variables

A random variable is some aspect of the world about which we (may) have uncertainty
• R = Is it raining?
• T = Is it hot or cold?
• D = How long will it take to drive to work?
• L = Where is the ghost?

We denote random variables with capital letters

Domains:
• R ∈ {true, false} (often written as {+r, −r})
• T ∈ {hot, cold}
• D ∈ [0, ∞)
• L ∈ {(0, 0), (0, 1), . . . }

Probability Distributions

Unobserved random variables have distributions

P(T)                P(W)
T     P             W       P
hot   0.5           sun     0.6
cold  0.5           rain    0.1
                    fog     0.3
                    meteor  0.0

Shorthand notation: P(hot) = P(T = hot), P(cold) = P(T = cold), P(rain) = P(W = rain), . . .
(OK if all domain entries are unique)

A distribution is a TABLE of probabilities of values
A probability (lower case value) is a single number, e.g. P(W = rain) = 0.1

Must have: ∀x P(X = x) ≥ 0 and Σ_x P(X = x) = 1
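
As a minimal sketch (assuming plain Python dicts as probability tables), the two constraints above can be checked directly:

```python
# P(W) as a table mapping values of W to probabilities.
P_W = {"sun": 0.6, "rain": 0.1, "fog": 0.3, "meteor": 0.0}

# Constraints: every entry is nonnegative and the entries sum to 1.
assert all(p >= 0 for p in P_W.values())
assert abs(sum(P_W.values()) - 1.0) < 1e-9

# A probability is a single number, e.g. P(W = rain) = 0.1.
print(P_W["rain"])  # 0.1
```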

Joint Distributions

A joint distribution over a set of random variables X1, X2, . . . , Xn specifies a real number for each assignment (or outcome):

P(X1 = x1, X2 = x2, . . . , Xn = xn) = P(x1, x2, . . . , xn)

Must obey:
P(x1, x2, . . . , xn) ≥ 0
Σ_{(x1, x2, ..., xn)} P(x1, x2, . . . , xn) = 1

P(T, W)
T     W     P
hot   sun   0.4
hot   rain  0.1
cold  sun   0.2
cold  rain  0.3

Size of distribution if n variables with domain sizes d? → d^n
• Impractical to write out for large distributions
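
A joint table in the same dict style, keyed by full assignments; this sketch also makes the d^n size explicit:

```python
# Joint distribution P(T, W), keyed by full assignments (outcomes).
P_TW = {
    ("hot", "sun"): 0.4,
    ("hot", "rain"): 0.1,
    ("cold", "sun"): 0.2,
    ("cold", "rain"): 0.3,
}

assert abs(sum(P_TW.values()) - 1.0) < 1e-9

# With n variables of domain size d, the table needs d**n entries.
n, d = 2, 2
assert len(P_TW) == d ** n  # 4 here, but exponential in n in general
```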

Probabilistic Models

A joint distribution over a set of random variables:
• (Random) variables with domains
• Assignments are called outcomes
• Joint distributions: say whether assignments (outcomes) are likely
• Normalized: sum to 1.0
• Ideally: only certain variables directly interact

P(T, W)
T     W     P
hot   sun   0.4
hot   rain  0.1
cold  sun   0.2
cold  rain  0.3

Events

An event is a set E of outcomes:

P(E) = Σ_{(x1 ... xn) ∈ E} P(x1 . . . xn)

From a joint distribution, we can calculate the probability of any event
• Probability that it’s hot AND sunny?
• Probability that it’s hot?
• Probability that it’s hot OR sunny?

P(T, W)
T     W     P
hot   sun   0.4
hot   rain  0.1
cold  sun   0.2
cold  rain  0.3

Typically, the events we care about are partial assignments, like P(T = hot)
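
A short sketch of the three example queries, summing joint entries over the outcomes in each event (the prob_event helper is ours, not from the slides):

```python
P_TW = {("hot", "sun"): 0.4, ("hot", "rain"): 0.1,
        ("cold", "sun"): 0.2, ("cold", "rain"): 0.3}

def prob_event(joint, condition):
    """P(E): sum the joint probabilities of all outcomes in the event."""
    return sum(p for outcome, p in joint.items() if condition(outcome))

print(prob_event(P_TW, lambda o: o == ("hot", "sun")))             # hot AND sunny: 0.4
print(prob_event(P_TW, lambda o: o[0] == "hot"))                   # hot: ≈ 0.5
print(prob_event(P_TW, lambda o: o[0] == "hot" or o[1] == "sun"))  # hot OR sunny: ≈ 0.7
```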

Quiz: Events

P(X, Y)
X   Y   P
+x  +y  0.2
+x  −y  0.3
−x  +y  0.4
−x  −y  0.1

P(+x, +y)?
→ 0.2
P(+x)?
→ P(+x, +y) + P(+x, −y) = 0.5
P(−y OR +x)?
→ P(+x, −y) + P(−x, −y) + P(+x, +y) = 0.6

Marginal Distributions

Marginal distributions are sub-tables which eliminate variables
Marginalization (summing out): Combine collapsed rows by adding

P(T, W)
T     W     P
hot   sun   0.4
hot   rain  0.1
cold  sun   0.2
cold  rain  0.3

P(T) = ?   P(t) = Σ_s P(t, s)
T     P
hot   0.5
cold  0.5

P(W) = ?   P(s) = Σ_t P(t, s)
W     P
sun   0.6
rain  0.4

In general: P(X1 = x1) = Σ_{x2} P(X1 = x1, X2 = x2)
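
A sketch of summing out: collapse the joint onto one variable by adding the matching rows (the marginal helper is ours):

```python
from collections import defaultdict

def marginal(joint, axis):
    """Sum out everything except the variable at position `axis`."""
    table = defaultdict(float)
    for outcome, p in joint.items():
        table[outcome[axis]] += p
    return dict(table)

P_TW = {("hot", "sun"): 0.4, ("hot", "rain"): 0.1,
        ("cold", "sun"): 0.2, ("cold", "rain"): 0.3}

print(marginal(P_TW, 0))  # P(T): hot ≈ 0.5, cold ≈ 0.5
print(marginal(P_TW, 1))  # P(W): sun ≈ 0.6, rain ≈ 0.4
```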

Quiz: Marginal Distributions

P(X, Y)
X   Y   P
+x  +y  0.2
+x  −y  0.3
−x  +y  0.4
−x  −y  0.1

P(x) = Σ_y P(x, y)
P(X)
X   P
+x  0.5
−x  0.5

P(y) = Σ_x P(x, y)
P(Y)
Y   P
+y  0.6
−y  0.4

Conditional Probabilities

A simple relation between joint and conditional probabilities:

P(a|b) = P(a, b) / P(b)

P(T, W)
T     W     P
hot   sun   0.4
hot   rain  0.1
cold  sun   0.2
cold  rain  0.3

P(W = s | T = c) = P(W = s, T = c) / P(T = c) = 0.2 / 0.5 = 0.4
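
The same computation as a sketch, deriving P(T = c) from the joint by marginalizing:

```python
P_TW = {("hot", "sun"): 0.4, ("hot", "rain"): 0.1,
        ("cold", "sun"): 0.2, ("cold", "rain"): 0.3}

def conditional(joint, w, t):
    """P(W = w | T = t) = P(t, w) / P(t), with P(t) summed from the joint."""
    p_t = sum(p for (ti, _), p in joint.items() if ti == t)
    return joint[(t, w)] / p_t

print(conditional(P_TW, "sun", "cold"))  # 0.2 / 0.5 = 0.4
```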

Quiz: Conditional Probabilities

P(X, Y)
X   Y   P
+x  +y  0.2
+x  −y  0.3
−x  +y  0.4
−x  −y  0.1

P(+x | +y)
→ P(+x, +y) / P(+y) = 1/3

P(−x | +y)
→ P(−x, +y) / (P(−x, +y) + P(+x, +y)) = 2/3

P(−y | +x)
→ P(+x, −y) / (P(+x, −y) + P(+x, +y)) = 3/5

Conditional Distributions

Probability distributions over some variables given fixed values of others

Joint Distribution
P(T, W)
T     W     P
hot   sun   0.4
hot   rain  0.1
cold  sun   0.2
cold  rain  0.3

Conditional Distributions
P(W | T = hot)        P(W | T = cold)
W     P               W     P
sun   0.8             sun   0.4
rain  0.2             rain  0.6

Normalization Trick

P(T, W)
T     W     P
hot   sun   0.4
hot   rain  0.1
cold  sun   0.2
cold  rain  0.3

P(W = s | T = c)
= P(W = s, T = c) / P(T = c)
= P(W = s, T = c) / (P(W = s, T = c) + P(W = r, T = c))
= 0.2 / (0.2 + 0.3) = 0.4

P(W = r | T = c)
= P(W = r, T = c) / P(T = c)
= P(W = r, T = c) / (P(W = s, T = c) + P(W = r, T = c))
= 0.3 / (0.2 + 0.3) = 0.6

P(W | T = cold)
W     P
sun   0.4
rain  0.6

The same computation, viewed as a two-step procedure:

SELECT the joint probabilities matching the evidence:
P(c, W)
T     W     P
cold  sun   0.2
cold  rain  0.3

NORMALIZE the selection (make it sum to one):
P(W | T = c)
W     P
sun   0.4
rain  0.6

Why does this work?
• Sum of selection is P(evidence)! (P(T = c), here)

P(x1 | x2) = P(x1, x2) / P(x2) = P(x1, x2) / Σ_{x1} P(x1, x2)
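
A sketch of SELECT-then-NORMALIZE on the table above:

```python
P_TW = {("hot", "sun"): 0.4, ("hot", "rain"): 0.1,
        ("cold", "sun"): 0.2, ("cold", "rain"): 0.3}

# SELECT entries consistent with the evidence T = cold.
selected = {w: p for (t, w), p in P_TW.items() if t == "cold"}

# NORMALIZE: the selection sums to P(T = cold) = 0.5.
z = sum(selected.values())
P_W_given_cold = {w: p / z for w, p in selected.items()}
print(P_W_given_cold)  # {'sun': 0.4, 'rain': 0.6}
```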

Quiz: Normalization Trick

Find P(X | Y = −y)

P(X, Y)
X   Y   P
+x  +y  0.2
+x  −y  0.3
−x  +y  0.4
−x  −y  0.1

SELECT the joint probabilities matching the evidence:
X   Y   P
+x  −y  0.3
−x  −y  0.1

NORMALIZE the selection (make it sum to one):
P(X | Y = −y)
X   P
+x  0.75
−x  0.25

To Normalize

(Dictionary) To bring or restore to a normal condition
• Here: make all entries sum to ONE

Procedure:
• Step 1: Compute Z = sum over all entries
• Step 2: Divide every entry by Z

Example 1 (Z = 0.5):
W     P     Normalize →   W     P
sun   0.2                 sun   0.4
rain  0.3                 rain  0.6

Example 2 (Z = 50):
T     W     P     Normalize →   T     W     P
hot   sun   20                  hot   sun   0.4
hot   rain  5                   hot   rain  0.1
cold  sun   10                  cold  sun   0.2
cold  rain  15                  cold  rain  0.3
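
The two-step procedure as a sketch, applied to Example 2:

```python
def normalize(table):
    """Step 1: Z = sum over all entries. Step 2: divide every entry by Z."""
    z = sum(table.values())
    return {k: v / z for k, v in table.items()}

counts = {("hot", "sun"): 20, ("hot", "rain"): 5,
          ("cold", "sun"): 10, ("cold", "rain"): 15}
print(normalize(counts))  # Z = 50 → probabilities 0.4, 0.1, 0.2, 0.3
```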

Probabilistic Inference

Compute a desired probability from other known probabilities (e.g. conditional from joint)

We generally compute conditional probabilities
• P(on time | no accidents) = 0.90
• These represent the agent’s beliefs given the evidence

Probabilities change with new evidence:
• P(on time | no accidents, 5 a.m.) = 0.95
• P(on time | no accidents, 5 a.m., raining) = 0.80
• Observing new evidence causes beliefs to be updated

Inference by Enumeration

General case (X1, X2, . . . , Xn):
• Evidence variables: E1 . . . Ek = e1 . . . ek
• Query* variable: Q
• Hidden variables: H1 . . . Hr

We want: P(Q | e1 . . . ek)
(*works fine with multiple query variables, too)

Step 1: Select the entries consistent with the evidence

Step 2: Sum out H to get the joint of Query and evidence:
P(Q, e1 . . . ek) = Σ_{h1 ... hr} P(Q, h1 . . . hr, e1 . . . ek)
(Q, h1 . . . hr, and e1 . . . ek together make up all of X1, X2, . . . , Xn)

Step 3: Normalize:
Z = Σ_q P(q, e1 . . . ek)
P(Q | e1 . . . ek) = (1/Z) P(Q, e1 . . . ek)
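
Here is a minimal sketch of the three steps over a joint stored as a dict; the function name and table encoding are ours, not from the slides.

```python
from collections import defaultdict

def enumerate_inference(joint, query_var, evidence):
    """P(Q | e1 ... ek) by enumeration.

    `joint` maps outcomes, given as tuples of (variable, value) pairs,
    to probabilities; `evidence` is a dict of observed variable values.
    """
    table = defaultdict(float)
    for outcome, p in joint.items():
        assignment = dict(outcome)
        # Step 1: select entries consistent with the evidence.
        if all(assignment.get(v) == val for v, val in evidence.items()):
            # Step 2: sum out the hidden variables, keeping only Q's value.
            table[assignment[query_var]] += p
    # Step 3: normalize.
    z = sum(table.values())
    return {q: p / z for q, p in table.items()}
```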

Example: Inference by Enumeration

S       T     W     P
summer  hot   sun   0.30
summer  hot   rain  0.05
summer  cold  sun   0.10
summer  cold  rain  0.05
winter  hot   sun   0.10
winter  hot   rain  0.05
winter  cold  sun   0.15
winter  cold  rain  0.20

P(W):                P(W = sun) = 0.65, P(W = rain) = 0.35
P(W | winter):       P(W = sun | winter) = 0.50, P(W = rain | winter) = 0.50
P(W | winter, hot):  P(W = sun | winter, hot) = 0.666 . . ., P(W = rain | winter, hot) = 0.333 . . .
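
Reusing the enumerate_inference sketch from the previous slide, the three queries can be checked directly:

```python
joint_STW = {
    (("S", "summer"), ("T", "hot"),  ("W", "sun")):  0.30,
    (("S", "summer"), ("T", "hot"),  ("W", "rain")): 0.05,
    (("S", "summer"), ("T", "cold"), ("W", "sun")):  0.10,
    (("S", "summer"), ("T", "cold"), ("W", "rain")): 0.05,
    (("S", "winter"), ("T", "hot"),  ("W", "sun")):  0.10,
    (("S", "winter"), ("T", "hot"),  ("W", "rain")): 0.05,
    (("S", "winter"), ("T", "cold"), ("W", "sun")):  0.15,
    (("S", "winter"), ("T", "cold"), ("W", "rain")): 0.20,
}

print(enumerate_inference(joint_STW, "W", {}))                           # sun 0.65, rain 0.35
print(enumerate_inference(joint_STW, "W", {"S": "winter"}))              # sun 0.50, rain 0.50
print(enumerate_inference(joint_STW, "W", {"S": "winter", "T": "hot"}))  # sun ≈ 2/3, rain ≈ 1/3
```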

Problems with Inference by Enumeration

Obvious problems:
• Worst-case time complexity: O(d^n)
• Space complexity: O(d^n) to store the joint distribution
• Assumes the full joint distribution and the evidence are even available

The Product Rule

Sometimes we have conditional distributions, but want the joint distribution:

P(x|y) = P(x, y) / P(y)   ⟹   P(x, y) = P(y) P(x|y)

Example:

P(W)           P(D | W)             P(D, W)
W     P        D    W     P         D    W     P
sun   0.8   ×  wet  sun   0.1   =   wet  sun   0.08
rain  0.2      dry  sun   0.9       dry  sun   0.72
               wet  rain  0.7       wet  rain  0.14
               dry  rain  0.3       dry  rain  0.06
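
The example as a sketch: multiply each conditional row by the matching marginal entry to recover the joint.

```python
P_W = {"sun": 0.8, "rain": 0.2}
P_D_given_W = {("wet", "sun"): 0.1, ("dry", "sun"): 0.9,
               ("wet", "rain"): 0.7, ("dry", "rain"): 0.3}

# Product rule: P(d, w) = P(w) * P(d | w).
P_DW = {(d, w): P_W[w] * p for (d, w), p in P_D_given_W.items()}
print(P_DW)  # ≈ {('wet','sun'): 0.08, ('dry','sun'): 0.72, ('wet','rain'): 0.14, ('dry','rain'): 0.06}
```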

The Chain Rule

More generally, we can always write any joint distribution as an incremental product of conditional distributions:

P(x1, x2, x3) = P(x1) P(x2|x1) P(x3|x1, x2)

P(x1, x2, . . . , xn) = ∏_i P(xi | x1 . . . xi−1)

Why is this always true? Each factor is just the product rule, i.e. the definition of conditional probability, and the same step extends to any xn:

P(x1) × P(x2|x1) = P(x1) × P(x2, x1) / P(x1) = P(x2, x1)

Bayes’ Rule

Two ways to factor a joint distribution over two variables:
P(x, y) = P(x|y) P(y) = P(y|x) P(x)

Dividing, we get:

P(x|y) = P(y|x) P(x) / P(y)

Why is this at all helpful?
• Lets us build one conditional from its reverse
• Often one conditional is tricky but the other one is simple
• Foundation of many systems (e.g. ASR, MT)

In the running for most important AI equation!

Inference with Bayes’ Rule

Diagnostic probability from causal probability:

P(cause|effect) = P(effect|cause) P(cause) / P(effect)

Example:
• M: meningitis, S: stiff neck

P(+m) = 0.0001
P(+s | +m) = 0.8
P(+s | −m) = 0.01

P(+m | +s) = P(+s|+m) P(+m) / P(+s)
           = P(+s|+m) P(+m) / (P(+s|+m) P(+m) + P(+s|−m) P(−m))
           = 0.00794

• Note: posterior probability of meningitis still very small
• Note: you should still get stiff necks checked out! Why?
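
The arithmetic, as a short sketch:

```python
p_m = 0.0001            # P(+m)
p_s_given_m = 0.8       # P(+s | +m)
p_s_given_not_m = 0.01  # P(+s | -m)

# Total probability for the denominator, then Bayes' rule.
p_s = p_s_given_m * p_m + p_s_given_not_m * (1 - p_m)
p_m_given_s = p_s_given_m * p_m / p_s
print(round(p_m_given_s, 5))  # 0.00794
```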

Quiz: Bayes’ Rule

Given:
P(W)            P(D | W)
W     P         D    W     P
sun   0.8       wet  sun   0.1
rain  0.2       dry  sun   0.9
                wet  rain  0.7
                dry  rain  0.3

What is P(W | dry)?
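
One way to check your answer (a sketch: apply the product rule, then normalize):

```python
P_W = {"sun": 0.8, "rain": 0.2}
P_dry_given_W = {"sun": 0.9, "rain": 0.3}

joint_dry = {w: P_dry_given_W[w] * P_W[w] for w in P_W}  # P(dry, W)
z = sum(joint_dry.values())                              # P(dry) = 0.78
P_W_given_dry = {w: p / z for w, p in joint_dry.items()}
print(P_W_given_dry)  # sun ≈ 0.923, rain ≈ 0.077
```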

Ghostbusters (Revisited)

Let’s say we have two distributions:
• Prior distribution over ghost location: P(G)
  ▶ Let’s say this is uniform
• Sensor reading model: P(R | G)
  ▶ Given: we know what our sensors do
  ▶ R = reading color measured at (1, 1)
  ▶ e.g. P(R = yellow | G = (1, 1)) = 0.1

We can calculate the posterior distribution P(G|r) over ghost locations given a reading using Bayes’ rule:

P(g|r) ∝ P(r|g) P(g)

Video: ghosts - with probability
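
A sketch of the posterior update; the grid and likelihood values below are hypothetical placeholders, except P(R = yellow | G = (1, 1)) = 0.1 from the slide.

```python
# Hypothetical 2x2 grid; uniform prior P(G).
locations = [(0, 0), (0, 1), (1, 0), (1, 1)]
prior = {g: 1 / len(locations) for g in locations}

# P(R = yellow | G = g); only the (1, 1) entry comes from the slide.
likelihood_yellow = {(0, 0): 0.3, (0, 1): 0.5, (1, 0): 0.4, (1, 1): 0.1}

# Bayes' rule: P(g | r) is proportional to P(r | g) P(g); normalize at the end.
unnormalized = {g: likelihood_yellow[g] * prior[g] for g in locations}
z = sum(unnormalized.values())
posterior = {g: p / z for g, p in unnormalized.items()}
print(posterior)
```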

Suggested Reading

Russell & Norvig: Sections 13.1–13.5
