
Probability

CSE 4711: Artificial Intelligence

Md. Bakhtiar Hasan


Assistant Professor
Department of Computer Science and Engineering
Islamic University of Technology

Inference in Ghostbusters

A ghost is in the grid somewhere
Two actions:
• Bust action to catch the ghost
• Sensor readings tell how close a square is to the ghost
  ▶ On the ghost: red
  ▶ 1 or 2 away: orange
  ▶ 3 or 4 away: yellow
  ▶ 5+ away: green
Sensors are noisy, but we know P(Color | Distance), e.g. for Distance = 3:

P(red | 3)   P(orange | 3)   P(yellow | 3)   P(green | 3)
   0.05           0.15            0.50            0.30

Video: ghosts - manual
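
To make the sensor model concrete, here is a minimal Python sketch (ours, not from the slides) of P(Color | Distance) as a lookup table; only the Distance = 3 row comes from the slide, and any other rows would be placeholders.

```python
# Sensor model P(Color | Distance) as a nested dict.
# Only the distance-3 row is given on the slide.
sensor_model = {
    3: {"red": 0.05, "orange": 0.15, "yellow": 0.50, "green": 0.30},
}

# Each conditional distribution P(Color | distance) must sum to 1.
for distance, colors in sensor_model.items():
    assert abs(sum(colors.values()) - 1.0) < 1e-9
```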

Uncertainty

General situation:
• Observed variables (evidence): Agent knows certain things about the state of the world (e.g., sensor readings or symptoms)
• Unobserved variables: Agent needs to reason about other aspects (e.g., where an object is or what disease is present)
• Model: Agent knows something about how the known variables relate to the unknown variables

Probabilistic reasoning gives us a framework for managing our beliefs and knowledge

Random Variables

A random variable is some aspect of the world about which we (may) have uncertainty
• R = Is it raining?
• T = Is it hot or cold?
• D = How long will it take to drive to work?
• L = Where is the ghost?

We denote random variables with capital letters

Domains:
• R ∈ {true, false} (often written as {+r, −r})
• T ∈ {hot, cold}
• D ∈ [0, ∞)
• L ∈ {(0, 0), (0, 1), . . . }

Probability Distributions

Unobserved random variables have distributions

P(T)                P(W)
T     P             W       P
hot   0.5           sun     0.6
cold  0.5           rain    0.1
                    fog     0.3
                    meteor  0.0

Shorthand notation: P(hot) = P(T = hot), P(cold) = P(T = cold), P(rain) = P(W = rain), . . .
(OK if all domain entries are unique)

A distribution is a TABLE of probabilities of values
A probability (lower case value) is a single number, e.g. P(W = rain) = 0.1

Must have: ∀x P(X = x) ≥ 0 and Σ_x P(X = x) = 1
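
As a minimal sketch (assuming plain Python dicts as probability tables), the two constraints above can be checked directly:

```python
# P(W) as a table mapping values of W to probabilities.
P_W = {"sun": 0.6, "rain": 0.1, "fog": 0.3, "meteor": 0.0}

# Constraints: every entry is nonnegative and the entries sum to 1.
assert all(p >= 0 for p in P_W.values())
assert abs(sum(P_W.values()) - 1.0) < 1e-9

# A probability is a single number, e.g. P(W = rain) = 0.1.
print(P_W["rain"])  # 0.1
```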

Joint Distributions

A joint distribution over a set of random variables X1, X2, . . . , Xn specifies a real number for each assignment (or outcome):

P(X1 = x1, X2 = x2, . . . , Xn = xn) = P(x1, x2, . . . , xn)

Must obey:
P(x1, x2, . . . , xn) ≥ 0
Σ_{(x1, x2, ..., xn)} P(x1, x2, . . . , xn) = 1

P(T, W)
T     W     P
hot   sun   0.4
hot   rain  0.1
cold  sun   0.2
cold  rain  0.3

Size of distribution if n variables with domain sizes d? → d^n
• Impractical to write out for large distributions
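
A joint table in the same dict style, keyed by full assignments; this sketch also makes the d^n size explicit:

```python
# Joint distribution P(T, W), keyed by full assignments (outcomes).
P_TW = {
    ("hot", "sun"): 0.4,
    ("hot", "rain"): 0.1,
    ("cold", "sun"): 0.2,
    ("cold", "rain"): 0.3,
}

assert abs(sum(P_TW.values()) - 1.0) < 1e-9

# With n variables of domain size d, the table needs d**n entries.
n, d = 2, 2
assert len(P_TW) == d ** n  # 4 here, but exponential in n in general
```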

Probabilistic Models

A joint distribution over a set of random variables:
• (Random) variables with domains
• Assignments are called outcomes
• Joint distributions: say whether assignments (outcomes) are likely
• Normalized: sum to 1.0
• Ideally: only certain variables directly interact

P(T, W)
T     W     P
hot   sun   0.4
hot   rain  0.1
cold  sun   0.2
cold  rain  0.3

Events

An event is a set E of outcomes:

P(E) = Σ_{(x1 ... xn) ∈ E} P(x1 . . . xn)

From a joint distribution, we can calculate the probability of any event
• Probability that it’s hot AND sunny?
• Probability that it’s hot?
• Probability that it’s hot OR sunny?

P(T, W)
T     W     P
hot   sun   0.4
hot   rain  0.1
cold  sun   0.2
cold  rain  0.3

Typically, the events we care about are partial assignments, like P(T = hot)
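
A short sketch of the three example queries, summing joint entries over the outcomes in each event (the prob_event helper is ours, not from the slides):

```python
P_TW = {("hot", "sun"): 0.4, ("hot", "rain"): 0.1,
        ("cold", "sun"): 0.2, ("cold", "rain"): 0.3}

def prob_event(joint, condition):
    """P(E): sum the joint probabilities of all outcomes in the event."""
    return sum(p for outcome, p in joint.items() if condition(outcome))

print(prob_event(P_TW, lambda o: o == ("hot", "sun")))             # hot AND sunny: 0.4
print(prob_event(P_TW, lambda o: o[0] == "hot"))                   # hot: ≈ 0.5
print(prob_event(P_TW, lambda o: o[0] == "hot" or o[1] == "sun"))  # hot OR sunny: ≈ 0.7
```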

Quiz: Events

P(X, Y)
X   Y   P
+x  +y  0.2
+x  −y  0.3
−x  +y  0.4
−x  −y  0.1

P(+x, +y)?
→ 0.2
P(+x)?
→ P(+x, +y) + P(+x, −y) = 0.5
P(−y OR +x)?
→ P(+x, −y) + P(−x, −y) + P(+x, +y) = 0.6

Marginal Distributions

Marginal distributions are sub-tables which eliminate variables
Marginalization (summing out): Combine collapsed rows by adding

P(T, W)
T     W     P
hot   sun   0.4
hot   rain  0.1
cold  sun   0.2
cold  rain  0.3

P(T) = ?   P(t) = Σ_s P(t, s)
T     P
hot   0.5
cold  0.5

P(W) = ?   P(s) = Σ_t P(t, s)
W     P
sun   0.6
rain  0.4

In general: P(X1 = x1) = Σ_{x2} P(X1 = x1, X2 = x2)
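
A sketch of summing out: collapse the joint onto one variable by adding the matching rows (the marginal helper is ours):

```python
from collections import defaultdict

def marginal(joint, axis):
    """Sum out everything except the variable at position `axis`."""
    table = defaultdict(float)
    for outcome, p in joint.items():
        table[outcome[axis]] += p
    return dict(table)

P_TW = {("hot", "sun"): 0.4, ("hot", "rain"): 0.1,
        ("cold", "sun"): 0.2, ("cold", "rain"): 0.3}

print(marginal(P_TW, 0))  # P(T): hot ≈ 0.5, cold ≈ 0.5
print(marginal(P_TW, 1))  # P(W): sun ≈ 0.6, rain ≈ 0.4
```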

Quiz: Marginal Distributions

P(X, Y)
X   Y   P
+x  +y  0.2
+x  −y  0.3
−x  +y  0.4
−x  −y  0.1

P(x) = Σ_y P(x, y)
P(X)
X   P
+x  0.5
−x  0.5

P(y) = Σ_x P(x, y)
P(Y)
Y   P
+y  0.6
−y  0.4

Conditional Probabilities

A simple relation between joint and conditional probabilities:

P(a|b) = P(a, b) / P(b)

P(T, W)
T     W     P
hot   sun   0.4
hot   rain  0.1
cold  sun   0.2
cold  rain  0.3

P(W = s | T = c) = P(W = s, T = c) / P(T = c) = 0.2 / 0.5 = 0.4
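
The same computation as a sketch, deriving P(T = c) from the joint by marginalizing:

```python
P_TW = {("hot", "sun"): 0.4, ("hot", "rain"): 0.1,
        ("cold", "sun"): 0.2, ("cold", "rain"): 0.3}

def conditional(joint, w, t):
    """P(W = w | T = t) = P(t, w) / P(t), with P(t) summed from the joint."""
    p_t = sum(p for (ti, _), p in joint.items() if ti == t)
    return joint[(t, w)] / p_t

print(conditional(P_TW, "sun", "cold"))  # 0.2 / 0.5 = 0.4
```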

Quiz: Conditional Probabilities

P(X, Y)
X   Y   P
+x  +y  0.2
+x  −y  0.3
−x  +y  0.4
−x  −y  0.1

P(+x | +y)
→ P(+x, +y) / P(+y) = 1/3

P(−x | +y)
→ P(−x, +y) / (P(−x, +y) + P(+x, +y)) = 2/3

P(−y | +x)
→ P(+x, −y) / (P(+x, −y) + P(+x, +y)) = 3/5

Conditional Distributions

Probability distributions over some variables given fixed values of others

Joint Distribution
P(T, W)
T     W     P
hot   sun   0.4
hot   rain  0.1
cold  sun   0.2
cold  rain  0.3

Conditional Distributions
P(W | T = hot)        P(W | T = cold)
W     P               W     P
sun   0.8             sun   0.4
rain  0.2             rain  0.6

Normalization Trick

P(T, W)
T     W     P
hot   sun   0.4
hot   rain  0.1
cold  sun   0.2
cold  rain  0.3

P(W = s | T = c)
= P(W = s, T = c) / P(T = c)
= P(W = s, T = c) / (P(W = s, T = c) + P(W = r, T = c))
= 0.2 / (0.2 + 0.3) = 0.4

P(W = r | T = c)
= P(W = r, T = c) / P(T = c)
= P(W = r, T = c) / (P(W = s, T = c) + P(W = r, T = c))
= 0.3 / (0.2 + 0.3) = 0.6

P(W | T = cold)
W     P
sun   0.4
rain  0.6

The same computation, viewed as a two-step procedure:

SELECT the joint probabilities matching the evidence:
P(c, W)
T     W     P
cold  sun   0.2
cold  rain  0.3

NORMALIZE the selection (make it sum to one):
P(W | T = c)
W     P
sun   0.4
rain  0.6

Why does this work?
• Sum of selection is P(evidence)! (P(T = c), here)

P(x1 | x2) = P(x1, x2) / P(x2) = P(x1, x2) / Σ_{x1} P(x1, x2)
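
A sketch of SELECT-then-NORMALIZE on the table above:

```python
P_TW = {("hot", "sun"): 0.4, ("hot", "rain"): 0.1,
        ("cold", "sun"): 0.2, ("cold", "rain"): 0.3}

# SELECT entries consistent with the evidence T = cold.
selected = {w: p for (t, w), p in P_TW.items() if t == "cold"}

# NORMALIZE: the selection sums to P(T = cold) = 0.5.
z = sum(selected.values())
P_W_given_cold = {w: p / z for w, p in selected.items()}
print(P_W_given_cold)  # {'sun': 0.4, 'rain': 0.6}
```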

Quiz: Normalization Trick

Find P(X | Y = −y)

P(X, Y)
X   Y   P
+x  +y  0.2
+x  −y  0.3
−x  +y  0.4
−x  −y  0.1

SELECT the joint probabilities matching the evidence:
X   Y   P
+x  −y  0.3
−x  −y  0.1

NORMALIZE the selection (make it sum to one):
P(X | Y = −y)
X   P
+x  0.75
−x  0.25

To Normalize

(Dictionary) To bring or restore to a normal condition
• Here: make all entries sum to ONE

Procedure:
• Step 1: Compute Z = sum over all entries
• Step 2: Divide every entry by Z

Example 1 (Z = 0.5):
W     P     Normalize →   W     P
sun   0.2                 sun   0.4
rain  0.3                 rain  0.6

Example 2 (Z = 50):
T     W     P     Normalize →   T     W     P
hot   sun   20                  hot   sun   0.4
hot   rain  5                   hot   rain  0.1
cold  sun   10                  cold  sun   0.2
cold  rain  15                  cold  rain  0.3
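
The two-step procedure as a sketch, applied to Example 2:

```python
def normalize(table):
    """Step 1: Z = sum over all entries. Step 2: divide every entry by Z."""
    z = sum(table.values())
    return {k: v / z for k, v in table.items()}

counts = {("hot", "sun"): 20, ("hot", "rain"): 5,
          ("cold", "sun"): 10, ("cold", "rain"): 15}
print(normalize(counts))  # Z = 50 → probabilities 0.4, 0.1, 0.2, 0.3
```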

Probabilistic Inference

Compute a desired probability from other known probabilities (e.g. conditional from joint)

We generally compute conditional probabilities
• P(on time | no accidents) = 0.90
• These represent the agent’s beliefs given the evidence

Probabilities change with new evidence:
• P(on time | no accidents, 5 a.m.) = 0.95
• P(on time | no accidents, 5 a.m., raining) = 0.80
• Observing new evidence causes beliefs to be updated

Inference by Enumeration

General case (X1, X2, . . . , Xn):
• Evidence variables: E1 . . . Ek = e1 . . . ek
• Query* variable: Q
• Hidden variables: H1 . . . Hr

We want: P(Q | e1 . . . ek)
(*works fine with multiple query variables, too)

Step 1: Select the entries consistent with the evidence

Step 2: Sum out H to get the joint of Query and evidence:
P(Q, e1 . . . ek) = Σ_{h1 ... hr} P(Q, h1 . . . hr, e1 . . . ek)
(Q, h1 . . . hr, and e1 . . . ek together make up all of X1, X2, . . . , Xn)

Step 3: Normalize:
Z = Σ_q P(q, e1 . . . ek)
P(Q | e1 . . . ek) = (1/Z) P(Q, e1 . . . ek)
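
Here is a minimal sketch of the three steps over a joint stored as a dict; the function name and table encoding are ours, not from the slides.

```python
from collections import defaultdict

def enumerate_inference(joint, query_var, evidence):
    """P(Q | e1 ... ek) by enumeration.

    `joint` maps outcomes, given as tuples of (variable, value) pairs,
    to probabilities; `evidence` is a dict of observed variable values.
    """
    table = defaultdict(float)
    for outcome, p in joint.items():
        assignment = dict(outcome)
        # Step 1: select entries consistent with the evidence.
        if all(assignment.get(v) == val for v, val in evidence.items()):
            # Step 2: sum out the hidden variables, keeping only Q's value.
            table[assignment[query_var]] += p
    # Step 3: normalize.
    z = sum(table.values())
    return {q: p / z for q, p in table.items()}
```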

Example: Inference by Enumeration

S       T     W     P
summer  hot   sun   0.30
summer  hot   rain  0.05
summer  cold  sun   0.10
summer  cold  rain  0.05
winter  hot   sun   0.10
winter  hot   rain  0.05
winter  cold  sun   0.15
winter  cold  rain  0.20

P(W):                P(W = sun) = 0.65, P(W = rain) = 0.35
P(W | winter):       P(W = sun | winter) = 0.50, P(W = rain | winter) = 0.50
P(W | winter, hot):  P(W = sun | winter, hot) = 0.666 . . ., P(W = rain | winter, hot) = 0.333 . . .
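
Reusing the enumerate_inference sketch from the previous slide, the three queries can be checked directly:

```python
joint_STW = {
    (("S", "summer"), ("T", "hot"),  ("W", "sun")):  0.30,
    (("S", "summer"), ("T", "hot"),  ("W", "rain")): 0.05,
    (("S", "summer"), ("T", "cold"), ("W", "sun")):  0.10,
    (("S", "summer"), ("T", "cold"), ("W", "rain")): 0.05,
    (("S", "winter"), ("T", "hot"),  ("W", "sun")):  0.10,
    (("S", "winter"), ("T", "hot"),  ("W", "rain")): 0.05,
    (("S", "winter"), ("T", "cold"), ("W", "sun")):  0.15,
    (("S", "winter"), ("T", "cold"), ("W", "rain")): 0.20,
}

print(enumerate_inference(joint_STW, "W", {}))                           # sun 0.65, rain 0.35
print(enumerate_inference(joint_STW, "W", {"S": "winter"}))              # sun 0.50, rain 0.50
print(enumerate_inference(joint_STW, "W", {"S": "winter", "T": "hot"}))  # sun ≈ 2/3, rain ≈ 1/3
```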

Problems with Inference by Enumeration

Obvious problems:
• Worst-case time complexity: O(d^n)
• Space complexity: O(d^n) to store the joint distribution
• Assumes the full joint distribution and the evidence are even available

The Product Rule

Sometimes we have conditional distributions, but want the joint distribution:

P(x|y) = P(x, y) / P(y)   ⟹   P(x, y) = P(y) P(x|y)

Example:

P(W)           P(D | W)             P(D, W)
W     P        D    W     P         D    W     P
sun   0.8   ×  wet  sun   0.1   =   wet  sun   0.08
rain  0.2      dry  sun   0.9       dry  sun   0.72
               wet  rain  0.7       wet  rain  0.14
               dry  rain  0.3       dry  rain  0.06
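
The example as a sketch: multiply each conditional row by the matching marginal entry to recover the joint.

```python
P_W = {"sun": 0.8, "rain": 0.2}
P_D_given_W = {("wet", "sun"): 0.1, ("dry", "sun"): 0.9,
               ("wet", "rain"): 0.7, ("dry", "rain"): 0.3}

# Product rule: P(d, w) = P(w) * P(d | w).
P_DW = {(d, w): P_W[w] * p for (d, w), p in P_D_given_W.items()}
print(P_DW)  # ≈ {('wet','sun'): 0.08, ('dry','sun'): 0.72, ('wet','rain'): 0.14, ('dry','rain'): 0.06}
```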

The Chain Rule

More generally, we can always write any joint distribution as an incremental product of conditional distributions:

P(x1, x2, x3) = P(x1) P(x2|x1) P(x3|x1, x2)

P(x1, x2, . . . , xn) = ∏_i P(xi | x1 . . . xi−1)

Why is this always true? Each factor is just the product rule, i.e. the definition of conditional probability, and the same step extends to any xn:

P(x1) × P(x2|x1) = P(x1) × P(x2, x1) / P(x1) = P(x2, x1)

Bayes’ Rule

Two ways to factor a joint distribution over two variables:
P(x, y) = P(x|y) P(y) = P(y|x) P(x)

Dividing, we get:

P(x|y) = P(y|x) P(x) / P(y)

Why is this at all helpful?
• Lets us build one conditional from its reverse
• Often one conditional is tricky but the other one is simple
• Foundation of many systems (e.g. ASR, MT)

In the running for most important AI equation!

Inference with Bayes’ Rule

Diagnostic probability from causal probability:

P(cause|effect) = P(effect|cause) P(cause) / P(effect)

Example:
• M: meningitis, S: stiff neck

P(+m) = 0.0001
P(+s | +m) = 0.8
P(+s | −m) = 0.01

P(+m | +s) = P(+s|+m) P(+m) / P(+s)
           = P(+s|+m) P(+m) / (P(+s|+m) P(+m) + P(+s|−m) P(−m))
           = 0.00794

• Note: posterior probability of meningitis still very small
• Note: you should still get stiff necks checked out! Why?
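
The arithmetic, as a short sketch:

```python
p_m = 0.0001            # P(+m)
p_s_given_m = 0.8       # P(+s | +m)
p_s_given_not_m = 0.01  # P(+s | -m)

# Total probability for the denominator, then Bayes' rule.
p_s = p_s_given_m * p_m + p_s_given_not_m * (1 - p_m)
p_m_given_s = p_s_given_m * p_m / p_s
print(round(p_m_given_s, 5))  # 0.00794
```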

Quiz: Bayes’ Rule

Given:
P(W)            P(D | W)
W     P         D    W     P
sun   0.8       wet  sun   0.1
rain  0.2       dry  sun   0.9
                wet  rain  0.7
                dry  rain  0.3

What is P(W | dry)?
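
One way to check your answer (a sketch: apply the product rule, then normalize):

```python
P_W = {"sun": 0.8, "rain": 0.2}
P_dry_given_W = {"sun": 0.9, "rain": 0.3}

joint_dry = {w: P_dry_given_W[w] * P_W[w] for w in P_W}  # P(dry, W)
z = sum(joint_dry.values())                              # P(dry) = 0.78
P_W_given_dry = {w: p / z for w, p in joint_dry.items()}
print(P_W_given_dry)  # sun ≈ 0.923, rain ≈ 0.077
```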

Ghostbusters (Revisited)

Let’s say we have two distributions:
• Prior distribution over ghost location: P(G)
  ▶ Let’s say this is uniform
• Sensor reading model: P(R | G)
  ▶ Given: we know what our sensors do
  ▶ R = reading color measured at (1, 1)
  ▶ e.g. P(R = yellow | G = (1, 1)) = 0.1

We can calculate the posterior distribution P(G|r) over ghost locations given a reading using Bayes’ rule:

P(g|r) ∝ P(r|g) P(g)

Video: ghosts - with probability
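
A sketch of the posterior update; the grid and likelihood values below are hypothetical placeholders, except P(R = yellow | G = (1, 1)) = 0.1 from the slide.

```python
# Hypothetical 2x2 grid; uniform prior P(G).
locations = [(0, 0), (0, 1), (1, 0), (1, 1)]
prior = {g: 1 / len(locations) for g in locations}

# P(R = yellow | G = g); only the (1, 1) entry comes from the slide.
likelihood_yellow = {(0, 0): 0.3, (0, 1): 0.5, (1, 0): 0.4, (1, 1): 0.1}

# Bayes' rule: P(g | r) is proportional to P(r | g) P(g); normalize at the end.
unnormalized = {g: likelihood_yellow[g] * prior[g] for g in locations}
z = sum(unnormalized.values())
posterior = {g: p / z for g, p in unnormalized.items()}
print(posterior)
```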

Suggested Reading

Russell & Norvig: Sections 13.1–13.5
