
Probability Review(slide 1)

Basic Probability Review

Milind G. Sohoni

Operations Management
The Indian School of Business, Gachibowli, Hyderabad

© Milind G. Sohoni 1 / 35
Probability Review(slide 2)
Outline

Topics

Introduction

Conditional Probability

Total Probability Theorem and Bayes’ Rule

Multiple Random Variables

Appendix

© Milind G. Sohoni 2 / 35
Probability Review(slide 3)
Introduction

Review of basic concepts

© Milind G. Sohoni 3 / 35
Probability Review(slide 4)
Introduction

Decision making under uncertainty: Building intuition


Consider the following situations:
1. A laboratory blood test is 95% effective in detecting a certain
disease when it is, in fact, present. However, the test also yields a
“false positive” result for 1% of healthy persons tested. If 0.5% of
the population actually has the disease, what is the probability that
a person has the disease given that his test result is positive? Is the
test really very effective?
1.1 High,
1.2 Low,
1.3 Can’t say.

2. In a class of 70 students, what is the probability that some pair will
have the same birthday?
2.1 High,
2.2 Low,
2.3 Can’t say.

© Milind G. Sohoni 4 / 35
Probability Review(slide 5)
Conditional Probability

Conditional probability

© Milind G. Sohoni 5 / 35
Probability Review(slide 6)
Conditional Probability

Conditional probability
Conditional probability provides us with a way to reason about the
outcome of an experiment, based on partial information. Here are some
examples:
- A spot shows up on a radar screen. How likely is it that it
  corresponds to an aircraft?
- In a word-guessing game, the first letter of the word is a “t”. What
  is the likelihood that the second letter is an “h”?

In more precise terms, given an experiment, a corresponding sample
space, and a probability law, suppose that we know that the outcome is
within some given event B. We wish to quantify the likelihood that the
outcome also belongs to some other given event A. We thus seek to
construct a new probability law, which takes into account this knowledge
and which, for any event A, gives us the conditional probability of A
given B, denoted by P(A | B).

© Milind G. Sohoni 6 / 35
Probability Review(slide 7)
Conditional Probability

Conditional probability
In general,
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)},$$
assuming P(B) > 0.

Multiplication rule:
Assuming that all of the conditioning events have positive probability, we have
$$P\left(\cap_{i=1}^{n} A_i\right) = P(A_1)\,P(A_2 \mid A_1)\,P(A_3 \mid A_1 \cap A_2)\cdots P\left(A_n \,\middle|\, \cap_{i=1}^{n-1} A_i\right).$$

This rule can be easily verified by simply rewriting
$$P\left(\cap_{i=1}^{n} A_i\right) = P(A_1)\cdot\frac{P(A_1 \cap A_2)}{P(A_1)}\cdot\frac{P(A_1 \cap A_2 \cap A_3)}{P(A_1 \cap A_2)}\cdots\frac{P\left(\cap_{i=1}^{n} A_i\right)}{P\left(\cap_{i=1}^{n-1} A_i\right)},$$
and by using the definition of conditional probability to rewrite the right-hand side
above as
$$P(A_1)\,P(A_2 \mid A_1)\,P(A_3 \mid A_1 \cap A_2)\cdots P\left(A_n \,\middle|\, \cap_{i=1}^{n-1} A_i\right).$$

© Milind G. Sohoni 7 / 35
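As a quick check of these identities, here is a minimal Python sketch (our own addition, not part of the original slides) that enumerates a toy sample space of two fair dice and verifies both the definition of conditional probability and the multiplication rule by brute force; all event names are ours.

```python
from fractions import Fraction
from itertools import product

# Toy sample space: ordered outcomes of two fair dice, all equally likely.
omega = set(product(range(1, 7), repeat=2))
p = Fraction(1, len(omega))

def prob(event):
    """P(event) under the uniform law: (# outcomes in event) / 36."""
    return p * len(event)

def cond(A, B):
    """P(A | B) = P(A ∩ B) / P(B), assuming P(B) > 0."""
    return prob(A & B) / prob(B)

A1 = {w for w in omega if w[0] == 4}           # first die shows 4
A2 = {w for w in omega if w[0] + w[1] == 7}    # the two dice sum to 7
A3 = {w for w in omega if w[1] % 2 == 1}       # second die is odd

# Definition: P(A1 ∩ A2) = P(A1) * P(A2 | A1).
assert prob(A1 & A2) == prob(A1) * cond(A2, A1)

# Multiplication rule: P(A1 ∩ A2 ∩ A3) = P(A1) P(A2|A1) P(A3|A1 ∩ A2).
assert prob(A1 & A2 & A3) == prob(A1) * cond(A2, A1) * cond(A3, A1 & A2)

print(cond(A2, A1))  # 1/6: given the first die is 4, the sum is 7 only if the second is 3
```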
Probability Review(slide 8)
Conditional Probability

Visualization of the multiplication rule

Figure: Visualization of the multiplication rule. The intersection event
A = A1 ∩ A2 ∩ · · · ∩ An is associated with a path on the tree of a sequential
description of the experiment. We associate the branches of this path with the
events A1, . . . , An, and we record next to the branches the corresponding
conditional probabilities. The final node of the path corresponds to the
intersection event A, and its probability is obtained by multiplying the
conditional probabilities recorded along the branches of the path:
$$P(A_1 \cap A_2 \cap \cdots \cap A_n) = P(A_1)\,P(A_2 \mid A_1)\cdots P(A_n \mid A_1 \cap A_2 \cap \cdots \cap A_{n-1}).$$

Note that any intermediate node along the path also corresponds to some
intersection event, and its probability is obtained by multiplying the
corresponding conditional probabilities up to that node. For example, the event
A1 ∩ A2 ∩ A3 corresponds to an intermediate node of the tree, and its
probability is
$$P(A_1 \cap A_2 \cap A_3) = P(A_1)\,P(A_2 \mid A_1)\,P(A_3 \mid A_1 \cap A_2).$$

© Milind G. Sohoni 8 / 35
Probability Review(slide 9)
Conditional Probability

An example
Three cards are drawn from an ordinary 52-card deck without replacement
(drawn cards are not placed back in the deck). We wish to find the
probability that none of the three cards is a heart. We assume that at
each step, each one of the remaining cards is equally likely to be picked.

By symmetry, this implies that every triplet of cards is equally likely to be
drawn. A cumbersome approach, which we will not use, is to count the
number of all card triplets that do not include a heart and divide it by
the number of all possible card triplets. Instead, we use a sequential
description of the sample space in conjunction with the multiplication
rule. Let us define the events
$$A_i = \{\text{the } i\text{th card is not a heart}\}, \quad i = 1, 2, 3.$$

We will calculate P(A1 ∩ A2 ∩ A3), the probability that none of the cards
is a heart.

© Milind G. Sohoni 9 / 35
Probability Review(slide 10)
Conditional Probability

An example
The conditional probabilities along the tree are
$$P(A_1) = \frac{39}{52}, \qquad P(A_2 \mid A_1) = \frac{38}{51}, \qquad P(A_3 \mid A_1 \cap A_2) = \frac{37}{50}.$$

These probabilities are recorded along the corresponding branches of the tree
describing the sample space, as shown in the figure below. The desired
probability is now obtained by multiplying the probabilities recorded along the
corresponding path of the tree:
$$P(A_1 \cap A_2 \cap A_3) = \frac{39}{52} \cdot \frac{38}{51} \cdot \frac{37}{50} \approx 0.413.$$

[Figure: Sequential description of the sample space of the 3-card selection
problem. At each stage the “Not a Heart” branch carries probability 39/52,
38/51, and 37/50 respectively, and the “Heart” branch carries 13/52, 13/51,
and 13/50.]
© Milind G. Sohoni 10 / 35
Probability Review(slide 11)
Conditional Probability

An example

Note that once the probabilities are recorded along the tree, the
probability of several other events can be similarly calculated.

For example,
$$P(\text{1st is not a heart and 2nd is a heart}) = \frac{39}{52} \cdot \frac{13}{51} \approx 0.191,$$
or
$$P(\text{1st two are not hearts and 3rd is a heart}) = \frac{39}{52} \cdot \frac{38}{51} \cdot \frac{13}{50} \approx 0.145.$$

© Milind G. Sohoni 11 / 35
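These numbers are easy to sanity-check by simulation. The following Monte Carlo sketch (our own addition, under the same sequential-draw model) should reproduce ≈ 0.413, ≈ 0.191, and ≈ 0.145 up to sampling noise.

```python
import random

def card_draw_sim(trials=200_000, seed=0):
    """Draw 3 cards without replacement; 'H' = heart, 'O' = any other suit."""
    rng = random.Random(seed)
    deck = ['H'] * 13 + ['O'] * 39
    counts = {"no_heart": 0, "O_then_H": 0, "OO_then_H": 0}
    for _ in range(trials):
        c = rng.sample(deck, 3)  # ordered draw without replacement
        counts["no_heart"] += (c == ['O', 'O', 'O'])
        counts["O_then_H"] += (c[0] == 'O' and c[1] == 'H')
        counts["OO_then_H"] += (c[:2] == ['O', 'O'] and c[2] == 'H')
    for name, k in counts.items():
        print(name, k / trials)  # ~0.413, ~0.191, ~0.145

card_draw_sim()
```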
Probability Review(slide 12)
Total Probability Theorem and Bayes’ Rule

Total probability theorem and Bayes’ rule

© Milind G. Sohoni 12 / 35
Probability Review(slide 13)
Total Probability Theorem and Bayes’ Rule

Total probability theorem


Theorem 1
[Total Probability] Let A1, . . . , An be disjoint events that form a partition of
the sample space (each possible outcome is included in one and only one of the
events A1, . . . , An) and assume P(Ai) > 0 for all i = 1, . . . , n. Then, for any
event B, we have
$$P(B) = P(A_1 \cap B) + \cdots + P(A_n \cap B) = P(A_1)\,P(B \mid A_1) + \cdots + P(A_n)\,P(B \mid A_n).$$
[Figure: Visualization and verification of the total probability theorem. The
events A1, . . . , An form a partition of the sample space, so the event B can be
decomposed into the disjoint union of the events A1 ∩ B, . . . , An ∩ B.]
© Milind G. Sohoni 13 / 35
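In code, the theorem is a one-line weighted sum. Here is a minimal sketch (our own addition; the priors and likelihoods are hypothetical numbers chosen only for illustration):

```python
from fractions import Fraction as F

# Partition A1, A2, A3 with priors P(Ai) and likelihoods P(B | Ai).
prior = [F(1, 2), F(3, 10), F(1, 5)]   # hypothetical; must sum to 1
lik   = [F(1, 10), F(2, 5), F(1, 2)]   # hypothetical P(B | Ai)

# Total probability: P(B) = sum_i P(Ai) * P(B | Ai).
p_b = sum(p * l for p, l in zip(prior, lik))
print(p_b)  # 27/100
```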
Probability Review(slide 14)
Total Probability Theorem and Bayes’ Rule

Bayes’ rule
Bayes’ rule:
Let A1, A2, . . . , An be disjoint events that form a partition of the sample space,
and assume P(Ai) > 0 for all i = 1, . . . , n. Then, for any event B such that
P(B) > 0, we have
$$\underbrace{P(A_i \mid B)}_{\text{Posterior}} = \underbrace{\frac{P(B \mid A_i)}{P(B)}}_{\text{Likelihood}} \cdot \underbrace{P(A_i)}_{\text{Prior}} = \frac{P(A_i)\,P(B \mid A_i)}{P(A_1)\,P(B \mid A_1) + \cdots + P(A_n)\,P(B \mid A_n)}.$$

Bayes’ rule is often used for inference. There are a number of “causes” that
may result in a certain “effect.” We observe the effect, and we wish to infer the
cause. The events A1, . . . , An are associated with the causes and the event B
represents the effect. The probability P(B | Ai) that the effect will be observed
when the cause Ai is present amounts to a probabilistic model of the
cause-effect relation. Given that the effect B has been observed, we wish to
evaluate the (conditional) probability P(Ai | B) that the cause Ai is present.

© Milind G. Sohoni 14 / 35
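Expressed as code, the rule divides each prior-times-likelihood term by the total-probability denominator. A hedged sketch (our own names, reusing the hypothetical numbers from the previous sketch):

```python
from fractions import Fraction as F

def bayes_posterior(prior, likelihood):
    """P(Ai | B) = P(Ai) P(B|Ai) / sum_j P(Aj) P(B|Aj), for a partition A1..An."""
    p_b = sum(p * l for p, l in zip(prior, likelihood))  # total probability of B
    return [p * l / p_b for p, l in zip(prior, likelihood)]

print(bayes_posterior([F(1, 2), F(3, 10), F(1, 5)],
                      [F(1, 10), F(2, 5), F(1, 2)]))
# [Fraction(5, 27), Fraction(4, 9), Fraction(10, 27)] -- posteriors sum to 1
```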
Probability Review(slide 15)
Total Probability Theorem and Bayes’ Rule

Example: Bayes’ rule


An example of the inference context that is implicit in Bayes’ rule:
Suppose we observe a shade in a person’s X-ray (this is event B, the
“effect”) and we want to estimate the likelihood of three mutually
exclusive and collectively exhaustive potential causes: cause 1 (event A1 )
is that there is a malignant tumor, cause 2 (event A2 ) is that there is a
nonmalignant tumor, and cause 3 (event A3 ) corresponds to reasons
other than a tumor. We assume that we know the probabilities P(Ai )
and P(B| Ai ), i = 1, 2, 3. Given that we see a shade (event B occurs),
Bayes’ rule gives the conditional probabilities of the various causes as

$$\underbrace{P(A_i \mid B)}_{\text{Posterior}} = \underbrace{\frac{P(B \mid A_i)}{P(A_1)\,P(B \mid A_1) + P(A_2)\,P(B \mid A_2) + P(A_3)\,P(B \mid A_3)}}_{\text{Likelihood}} \cdot \underbrace{P(A_i)}_{\text{Prior}}, \quad i = 1, 2, 3.$$

© Milind G. Sohoni 15 / 35
Probability Review(slide 16)
Total Probability Theorem and Bayes’ Rule

Example: Bayes’ rule

[Figure: An example of the inference context that is implicit in Bayes’ rule. We
observe a shade in a person’s X-ray (event B, the “effect”); the candidate causes
are A1 (malignant tumor), A2 (nonmalignant tumor), and A3 (reasons other than
a tumor).]
© Milind G. Sohoni 16 / 35
Probability Review(slide 17)
Total Probability Theorem and Bayes’ Rule

Odds ratio: Bayes’ rule

An alternate explanation of Bayes’ rule is as follows. Suppose we have a
null hypothesis H0 and we collect some evidence E (for example, data).
Further, let HA be the alternate hypothesis, to be accepted if H0 is rejected.
Then, Bayes’ rule suggests
$$\underbrace{\frac{P(H_0 \mid E)}{P(H_A \mid E)}}_{\text{Posterior odds}} = \underbrace{\frac{P(E \mid H_0)}{P(E \mid H_A)}}_{\text{Likelihood ratio}} \cdot \underbrace{\frac{P(H_0)}{P(H_A)}}_{\text{Prior odds}}$$

that is,
Posterior odds = Likelihood ratio × Prior odds.

© Milind G. Sohoni 17 / 35
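Numerically, the odds form avoids computing P(E) altogether. A small sketch with hypothetical numbers (ours, for illustration only); the final odds-to-probability conversion assumes H0 and HA are complementary:

```python
# Hypothetical inputs: P(E | H0) = 0.2, P(E | HA) = 0.6, P(H0) = 0.9.
likelihood_ratio = 0.2 / 0.6        # P(E | H0) / P(E | HA)
prior_odds = 0.9 / 0.1              # P(H0) / P(HA), with P(HA) = 1 - P(H0)

posterior_odds = likelihood_ratio * prior_odds        # = 3.0
p_h0_given_e = posterior_odds / (1 + posterior_odds)  # = 0.75
print(posterior_odds, p_h0_given_e)
```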
Probability Review(slide 18)
Total Probability Theorem and Bayes’ Rule

Independent events
An interesting and important special case arises when the occurrence of
B provides no information and does not alter the probability that A has
occurred, i.e.,
P(A| B) = P(A).
When the above equality holds, we say that A is independent of B. Note
that by the definition P(A| B) = P(A ∩ B)/P(B), this is equivalent to
P(A ∩ B) = P(A)P(B). We adopt this latter relation as the definition of
independence because it can be used even if P(B) = 0, in which case
P(A| B) is undefined. The symmetry of this relation also implies that
independence is a symmetric property; that is, if A is independent of B,
then B is independent of A, and we can unambiguously say that A and B
are independent events.
Example: Consider a regular pack of 52 cards.
$$P(\text{Ace} \mid \text{Drawn a red card}) = \frac{2}{26} = P(\text{Ace}) = \frac{4}{52} = \frac{1}{13}.$$

© Milind G. Sohoni 18 / 35
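The card example can be checked directly against the product definition of independence; a one-off sketch (our own addition):

```python
from fractions import Fraction as F

p_ace = F(4, 52)          # 4 aces in the deck
p_red = F(26, 52)         # 26 red cards
p_ace_and_red = F(2, 52)  # 2 red aces

# Independence holds iff P(A ∩ B) = P(A) P(B).
print(p_ace_and_red == p_ace * p_red)  # True
print(p_ace_and_red / p_red == p_ace)  # True: P(Ace | red) = P(Ace)
```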
Probability Review(slide 19)
Total Probability Theorem and Bayes’ Rule

Revisiting our original example


A laboratory blood test is 95% effective in detecting a certain disease
when it is, in fact, present. However, the test also yields a “false positive”
result for 1% of healthy persons tested. If 0.5% of the population
actually has the disease, what is the probability that a person has the
disease given that his test result is positive? Is the test really very effective?

Let T+ denote the event that the test comes back positive. Let D+
denote the event that the patient has the disease. We are interested in
computing P(D+ | T+).
$$P(D{+} \mid T{+}) = \frac{P(T{+} \mid D{+})\,P(D{+})}{P(T{+})} = \,?$$

© Milind G. Sohoni 19 / 35
Probability Review(slide 20)
Total Probability Theorem and Bayes’ Rule

The Monty Hall Show


“Suppose you’re on Monty Hall’s Let’s Make a Deal! You are given the
choice of three doors: behind one door is a car; behind the others, goats. You
pick a door, say 1, and Monty opens another door, say 3, which has a goat.
Monty says to you, “Do you want to pick door 2?” Is it to your advantage
to switch your choice of doors?”

Let D = 2 be the event that the car is actually behind door 2, that is, you
win by switching to door 2. Let E = 3 be the event that Monty opens
door 3. You wish to compute
$$P(D = 2 \mid E = 3) = \frac{P(E = 3 \mid D = 2)\,P(D = 2)}{P(E = 3)} = \,?$$

© Milind G. Sohoni 20 / 35
Probability Review(slide 21)
Total Probability Theorem and Bayes’ Rule

The Monty Hall Show


[Figure: Tree diagram of the game. The three stages are the placement of the
car, the door chosen by the contestant, and the door opened by Monty, each
placement and each choice having probability 1/3. When the car is behind the
contestant’s door, Monty opens either of the two remaining doors with
probability 1/2 (path probability 1/18); otherwise he must open the one
remaining goat door (path probability 1/9).]
© Milind G. Sohoni 21 / 35
Probability Review(slide 22)
Total Probability Theorem and Bayes’ Rule

Bayesian update

Suppose the contestant chooses door 1 and Monty opens door 3. The relevant
paths of the tree give:

Placement   Door chosen     Door opened   Unconditional   Conditional
of car      by contestant   by Monty      probability     probability
1           1               3             1/18            1/3
2           1               3             1/9             2/3
© Milind G. Sohoni 22 / 35
Probability Review(slide 23)
Multiple Random Variables

Multiple random variables

© Milind G. Sohoni 23 / 35
Probability Review(slide 24)
Multiple Random Variables

Multiple random variables

For two discrete r.v’s X and Y , the joint probability distribution describes
the probability of certain X and Y combinations, i.e., P(X = xi , Y = yj ).
Note: All future discussion extends to more than 2 discrete r.v’s as well.

Consider the following sales data from an upscale café, where X =
number of hot coffees sold and Y = number of cold drinks sold:

© Milind G. Sohoni 24 / 35
Probability Review(slide 25)
Multiple Random Variables

Multiple random variables


Probability (pi ) Hot coffees (X ) Cold drinks (Y )

0.1 360 360


0.1 790 110
0.15 840 30
0.05 260 90
0.15 190 450
0.1 300 230
0.1 490 60
0.1 150 290
0.1 550 140
0.05 510 290
Mean E [X ] = 457 E [Y ] = 210
Std. Dev. σX = 244.3 σY = 145.6

Table: Cafe sales data.

© Milind G. Sohoni 25 / 35
Probability Review(slide 26)
Multiple Random Variables

Marginal distributions
Marginal distribution of X is defined as follows:
$$P(X = x_i) = \sum_{y_j} P(X = x_i, Y = y_j).$$

Conditional distribution of X given Y is defined as:
$$P(X = x_i \mid Y = y_j) = \frac{P(X = x_i, Y = y_j)}{P(Y = y_j)}.$$

Note: Conditional probabilities must sum up to 1.

X and Y are independent if
$$P(X = x_i, Y = y_j) = P(X = x_i) \cdot P(Y = y_j) \quad \text{for all } x_i, y_j,$$
and
$$P(X = x_i \mid Y = y_j) = P(X = x_i) \quad \text{for all } x_i, y_j.$$

© Milind G. Sohoni 26 / 35
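Both definitions are mechanical to compute from the café table. A minimal sketch (our own addition), representing the joint distribution as a dictionary keyed by (x, y):

```python
from collections import defaultdict
from fractions import Fraction as F

# Café sales scenarios from the table: (probability, hot coffees X, cold drinks Y).
scenarios = [(F(1, 10), 360, 360), (F(1, 10), 790, 110), (F(3, 20), 840, 30),
             (F(1, 20), 260, 90), (F(3, 20), 190, 450), (F(1, 10), 300, 230),
             (F(1, 10), 490, 60), (F(1, 10), 150, 290), (F(1, 10), 550, 140),
             (F(1, 20), 510, 290)]

joint = defaultdict(F)  # joint distribution P(X = x, Y = y)
for p, x, y in scenarios:
    joint[(x, y)] += p

def marginal_x(x):
    """P(X = x) = sum over y of P(X = x, Y = y)."""
    return sum(p for (xi, _), p in joint.items() if xi == x)

def cond_x_given_y(x, y):
    """P(X = x | Y = y) = P(X = x, Y = y) / P(Y = y)."""
    p_y = sum(p for (_, yj), p in joint.items() if yj == y)
    return joint[(x, y)] / p_y

print(marginal_x(150))           # 1/10
print(cond_x_given_y(150, 290))  # (1/10) / (1/10 + 1/20) = 2/3
```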
Probability Review(slide 27)
Multiple Random Variables

Answer the following

Answer the following using the café sales data:

1. P (X = 150) =?
2. P (X = 150| Y = 290) =?
3. Are X and Y independent?
4. E [X | Y = 290] =?

© Milind G. Sohoni 27 / 35
Probability Review(slide 28)
Multiple Random Variables

Conditional expectation

Let X and Y be discrete r.v’s; then the conditional expectation of X
given Y = y is a function of y over the domain of Y (all values it can take).
Using our earlier definitions of conditional probability and marginal
distributions, the conditional expectation can be computed as
$$E[X \mid Y = y] = \sum_{i=1}^{M} x_i\, P(X = x_i \mid Y = y) = \sum_{i=1}^{M} x_i \left(\frac{P(X = x_i, Y = y)}{P(Y = y)}\right).$$

Observe that E[X | Y] is a random variable, with randomness inherited
from Y.

© Milind G. Sohoni 28 / 35
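Continuing the dictionary-based sketch from the marginal-distribution slide (again our own addition, not part of the original deck), the conditional expectation is a probability-weighted average within the slice Y = y:

```python
def cond_expectation_x(joint, y):
    """E[X | Y = y] = sum_x x * P(X = x, Y = y) / P(Y = y)."""
    p_y = sum(p for (_, yj), p in joint.items() if yj == y)
    return sum(x * p / p_y for (x, yj), p in joint.items() if yj == y)

# With the café `joint` built in the earlier sketch:
# cond_expectation_x(joint, 290) == 270, i.e., 150*(2/3) + 510*(1/3).
```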
Probability Review(slide 29)
Multiple Random Variables

Measuring relationship between variables

1. What measures the direction of the relationship between X and Y?
2. How is correlation defined? What does a linear relationship between
two random variables imply?
2.1 Does correlation = 0 imply independence?

[Figure: Scatter plot of Y (cold drinks sold) against X (hot coffees sold) for
the café sales data.]

© Milind G. Sohoni 29 / 35
Probability Review(slide 30)
Appendix

Solutions to some problems discussed in class

© Milind G. Sohoni 30 / 35
Probability Review(slide 31)
Appendix

The birthday example


Let S = {set of all possible 70-tuples of birthdays}, i.e.,
$$S = \{(\text{Jan 1}, \text{Feb 20}, \ldots, \text{Jan 1}, \ldots), \ldots\}.$$

Let A denote the event that “in a class of 70 students, some pair has
the same birthday.” Let A^c denote the complement event, i.e., no pair has
the same birthday – in other words, the set of 70-tuples whose entries are
all different. Then
$$\Pr\{A\} = 1 - \Pr\{A^c\} = 1 - \frac{{}_{365}P_{70}}{365^{70}} = 1 - \frac{365 \times 364 \times \cdots \times (365 - 69)}{365^{70}} \approx 0.999.$$

© Milind G. Sohoni 31 / 35
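The product is easy to evaluate exactly; a short sketch (our own addition) using rational arithmetic:

```python
from math import prod
from fractions import Fraction as F

def p_some_shared_birthday(n=70):
    """1 - P(all n birthdays distinct), with 365 equally likely days."""
    p_all_distinct = prod(F(365 - k, 365) for k in range(n))
    return 1 - p_all_distinct

print(float(p_some_shared_birthday(70)))  # ~0.99916
```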
Probability Review(slide 32)
Appendix

Lab blood test example


A laboratory blood test is 95% effective in detecting a certain disease
when it is, in fact, present. However, the test also yields a “false positive”
result for 1% of healthy persons tested. If 0.5% of the population
actually has the disease, what is the probability that a person has the
disease given that his test result is positive? Is the test really very effective?
Let T+ denote the event that the test comes back positive. Let D+
denote the event that the patient has the disease. Further, let D−
denote the event that the patient does not have the disease. Notice, D+
and D− are mutually exclusive and collectively exhaustive events (a
partition). We are interested in computing P(D+ | T+).
$$P(D{+} \mid T{+}) = \frac{P(T{+} \mid D{+})\,P(D{+})}{P(T{+})} = \frac{P(T{+} \mid D{+})\,P(D{+})}{P(T{+} \mid D{+})\,P(D{+}) + P(T{+} \mid D{-})\,P(D{-})} = \frac{0.95 \times 0.005}{(0.95 \times 0.005) + (0.01 \times 0.995)} \approx 0.323.$$

So, even given a positive test, the probability of actually having the disease
is only about 32%: because the disease is rare, the test is far less conclusive
than it first appears.
© Milind G. Sohoni 32 / 35
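The same computation as a reusable function (a sketch of ours; the parameter names are our own):

```python
def p_disease_given_positive(sensitivity=0.95, false_positive_rate=0.01,
                             prevalence=0.005):
    """P(D+ | T+) by Bayes' rule with the partition {D+, D-}."""
    p_positive = (sensitivity * prevalence
                  + false_positive_rate * (1 - prevalence))  # total probability of T+
    return sensitivity * prevalence / p_positive

print(round(p_disease_given_positive(), 3))  # 0.323
```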
Probability Review(slide 33)
Appendix

The Monty Hall Show


Let D = 2 be the event that the car is actually behind door 2, that is, you
win by switching to door 2. Let E = 3 be the event that Monty opens
door 3. Further, let D = 1 be the event that the car is behind door 1, and
D = 3 be the event that the car is behind door 3. Notice D = 1, D = 2,
and D = 3 are mutually exclusive and collectively exhaustive events (a
partition). You wish to compute
$$P(D = 2 \mid E = 3) = \frac{P(E = 3 \mid D = 2)\,P(D = 2)}{P(E = 3)} = \frac{P(E = 3 \mid D = 2)\,P(D = 2)}{\sum_{d=1}^{3} P(E = 3 \mid D = d)\,P(D = d)} = \frac{1 \times \frac{1}{3}}{\left(\frac{1}{2} \times \frac{1}{3}\right) + \left(1 \times \frac{1}{3}\right) + \left(0 \times \frac{1}{3}\right)} = \frac{2}{3}.$$

So switching doubles your chance of winning, from 1/3 to 2/3.

© Milind G. Sohoni 33 / 35
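A simulation confirms the 2/3 answer without any conditioning algebra. A minimal sketch (our own addition) of the game's rules:

```python
import random

def monty(trials=100_000, seed=1):
    """Simulate the game: car placed uniformly; contestant picks door 1;
    Monty opens a goat door (choosing at random when he has two options)."""
    rng = random.Random(seed)
    win_switch = 0
    for _ in range(trials):
        car = rng.randrange(1, 4)
        pick = 1
        options = [d for d in (1, 2, 3) if d != pick and d != car]
        opened = rng.choice(options)
        switched = next(d for d in (1, 2, 3) if d not in (pick, opened))
        win_switch += (switched == car)
    return win_switch / trials

print(monty())  # ~0.667, i.e., switching wins about 2/3 of the time
```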
Probability Review(slide 34)
Appendix

The café example

Answer the following using the café sales data:

1. P(X = 150) = 0.1.
2. P(X = 150 | Y = 290) = 0.1/(0.1 + 0.05) = 2/3.
3. Are X and Y independent?
3.1 No, because P(X = 150) ≠ P(X = 150 | Y = 290).
4. E[X | Y = 290] = 150 × (2/3) + 510 × (1/3) = 810/3 = 270.

© Milind G. Sohoni 34 / 35
Probability Review(slide 35)
Appendix

Measuring relationship between variables


1. Covariance and correlation measure the direction of the relationship between X
and Y:
$$\mathrm{cov}(X, Y) = E[(X - E[X])(Y - E[Y])] = \sum_{i=1}^{M} P(X = x_i, Y = y_i)\,(x_i - E[X])(y_i - E[Y]),$$
$$\mathrm{corr}(X, Y) = \rho_{X,Y} = \frac{\mathrm{cov}(X, Y)}{\sigma_X \sigma_Y},$$
where σ_X and σ_Y are the standard deviations of the distributions of X and Y,
respectively. Correlation always lies between −1 and 1.
2. A linear relationship, i.e., Y = aX + b, implies perfect correlation. If a < 0,
the variables are perfectly negatively correlated; if a > 0, perfectly positively
correlated.
3. Correlation = 0 does not imply independence (however, independence implies 0
correlation). All it means is that the variables are not linearly related.
4. Correlation does not imply causality.

© Milind G. Sohoni 35 / 35
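For the café data, these formulas give a clearly negative relationship: hot-coffee and cold-drink sales move in opposite directions. A sketch of ours, computed directly from the scenario table:

```python
from math import sqrt

# (probability, hot coffees X, cold drinks Y) from the café table.
rows = [(0.10, 360, 360), (0.10, 790, 110), (0.15, 840, 30), (0.05, 260, 90),
        (0.15, 190, 450), (0.10, 300, 230), (0.10, 490, 60), (0.10, 150, 290),
        (0.10, 550, 140), (0.05, 510, 290)]

ex = sum(p * x for p, x, _ in rows)                     # E[X] = 457
ey = sum(p * y for p, _, y in rows)                     # E[Y] = 210
cov = sum(p * (x - ex) * (y - ey) for p, x, y in rows)  # ~ -27260
sx = sqrt(sum(p * (x - ex) ** 2 for p, x, _ in rows))   # sigma_X ~ 244.3
sy = sqrt(sum(p * (y - ey) ** 2 for p, _, y in rows))   # sigma_Y ~ 145.6
print(cov, cov / (sx * sy))                             # correlation ~ -0.77
```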
