Conditional Probability and Revision

Milind G. Sohoni
Operations Management
The Indian School of Business, Gachibowli, Hyderabad

© Milind G. Sohoni 1 / 35
Probability Review

Outline
Topics
Introduction
Conditional Probability
Appendix
Introduction
Conditional Probability

Conditional probability provides us with a way to reason about the outcome of an experiment, based on partial information. Here are some examples:

- A spot shows up on a radar screen. How likely is it that it corresponds to an aircraft?
- In a word-guessing game, the first letter of the word is a “t”. What is the likelihood that the second letter is an “h”?
Conditional probability

In general,

    P(A | B) = P(A ∩ B) / P(B),

assuming P(B) > 0.

Multiplication rule:
Assuming that all of the conditioning events have positive probability, we have

    P(A1 ∩ A2 ∩ · · · ∩ An) = P(A1) · P(A2 | A1) · P(A3 | A1 ∩ A2) · · · P(An | A1 ∩ · · · ∩ An−1).

This can be verified by using the definition of conditional probability to rewrite the right-hand side as

    P(A1) · [P(A1 ∩ A2) / P(A1)] · [P(A1 ∩ A2 ∩ A3) / P(A1 ∩ A2)] · · · [P(A1 ∩ · · · ∩ An) / P(A1 ∩ · · · ∩ An−1)],

which telescopes to P(A1 ∩ · · · ∩ An).
Figure: Tree visualization of the multiplication rule. Each successive branch multiplies in the next conditional probability: P(A1), P(A2 | A1), P(A3 | A1 ∩ A2), . . . , P(An | A1 ∩ A2 ∩ · · · ∩ An−1).
An example

Three cards are drawn from an ordinary 52-card deck without replacement (drawn cards are not placed back in the deck). We wish to find the probability that none of the three cards is a heart. We assume that at each step, each one of the remaining cards is equally likely to be picked.
Let Ai denote the event that the ith card drawn is not a heart. Then

    P(A1) = 39/52,   P(A2 | A1) = 38/51,   P(A3 | A1 ∩ A2) = 37/50.

These probabilities are recorded along the corresponding branches of the tree describing the sample space. The desired probability is obtained by multiplying the probabilities recorded along the corresponding path of the tree:

    P(A1 ∩ A2 ∩ A3) = (39/52) · (38/51) · (37/50) ≈ 0.414.

Figure: Sequential description of the sample space of the 3-card selection problem; at each draw, the branches “heart” and “not a heart” carry their conditional probabilities (e.g., 13/52 and 39/52 at the first draw).
Note that once the probabilities are recorded along the tree, the probability of several other events can be calculated similarly. For example,

    P(1st is not a heart and 2nd is a heart) = (39/52) · (13/51) ≈ 0.191,

or

    P(1st two are not hearts and 3rd is a heart) = (39/52) · (38/51) · (13/50) ≈ 0.145.
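The tree-path computations above can be checked with a short script. This is a sketch; `path_probability` is a helper name introduced here for illustration, not from the slides.

```python
from fractions import Fraction

def path_probability(*branches):
    """Multiply exact conditional probabilities along one tree path
    (the multiplication rule), each branch given as (numerator, denominator)."""
    p = Fraction(1)
    for num, den in branches:
        p *= Fraction(num, den)
    return p

# P(none of the three cards is a heart)
p_no_hearts = path_probability((39, 52), (38, 51), (37, 50))

# P(1st is not a heart and 2nd is a heart)
p_second_heart = path_probability((39, 52), (13, 51))

# P(1st two are not hearts and 3rd is a heart)
p_third_heart = path_probability((39, 52), (38, 51), (13, 50))

print(float(p_no_hearts))     # ~0.414
print(float(p_second_heart))  # ~0.191
print(float(p_third_heart))   # ~0.145
```

Using exact `Fraction` arithmetic avoids rounding until the final conversion to a decimal.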
Total Probability Theorem and Bayes’ Rule
Total probability theorem:
Let A1, . . . , An be disjoint events that form a partition of the sample space, with P(Ai) > 0 for all i. Then, for any event B,

    P(B) = P(A1) P(B | A1) + · · · + P(An) P(B | An).

Figure: Visualization and verification of the total probability theorem. The events A1, . . . , An form a partition of the sample space, so the event B can be decomposed into the disjoint events A1 ∩ B, . . . , An ∩ B.
Bayes’ rule

Bayes’ rule:
Let A1, A2, . . . , An be disjoint events that form a partition of the sample space, and assume P(Ai) > 0 for all i = 1, . . . , n. Then, for any event B such that P(B) > 0, we have

    P(Ai | B) = [P(B | Ai) / P(B)] · P(Ai)
              = P(Ai) P(B | Ai) / [P(A1) P(B | A1) + · · · + P(An) P(B | An)],

where P(Ai | B) is the posterior, P(B | Ai) is the likelihood, and P(Ai) is the prior.

Bayes’ rule is often used for inference. There are a number of “causes” that may result in a certain “effect.” We observe the effect, and we wish to infer the cause. The events A1, . . . , An are associated with the causes and the event B represents the effect. The probability P(B | Ai) that the effect will be observed when the cause Ai is present amounts to a probabilistic model of the cause-effect relation. Given that the effect B has been observed, we wish to evaluate the (conditional) probability P(Ai | B) that the cause Ai is present.
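As a minimal sketch of the rule, the snippet below computes the posterior over n causes; the priors and likelihoods are made-up numbers for illustration, not from the slides.

```python
def posterior(priors, likelihoods):
    """P(Ai | B) for each cause Ai, via Bayes' rule.

    priors[i] = P(Ai), likelihoods[i] = P(B | Ai).
    The denominator is P(B) from the total probability theorem.
    """
    evidence = sum(p * l for p, l in zip(priors, likelihoods))
    return [p * l / evidence for p, l in zip(priors, likelihoods)]

priors = [0.6, 0.3, 0.1]        # P(A1), P(A2), P(A3): illustrative values
likelihoods = [0.2, 0.5, 0.9]   # P(B | Ai): illustrative values
post = posterior(priors, likelihoods)

print([round(q, 3) for q in post])
print(sum(post))  # posteriors over a partition sum to 1
```

Note how an unlikely cause (prior 0.1) can still receive the largest posterior weight per unit of prior when its likelihood is high.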
For the three-cause case:

    P(Ai | B) = P(B | Ai) · P(Ai) / [P(A1) P(B | A1) + P(A2) P(B | A2) + P(A3) P(B | A3)],   i = 1, 2, 3,

with P(Ai | B) the posterior, P(B | Ai) the likelihood, and P(Ai) the prior.
Example: Bayes’ rule

Figure: Cause-effect inference in a medical-imaging setting. The causes A1 (malignant tumor), A2 (nonmalignant tumor), and A3 (other) partition the sample space; the effect B is the shade observed on the image.

Comparing two hypotheses H0 and HA given evidence E yields the odds form of Bayes’ rule:

    P(H0 | E) / P(HA | E) = [P(E | H0) / P(E | HA)] · [P(H0) / P(HA)],

where the left-hand side is the posterior odds, the first factor on the right is the likelihood ratio, and the second is the prior odds; that is,

    Posterior odds = Likelihood ratio × Prior odds.
Independent events

An interesting and important special case arises when the occurrence of B provides no information and does not alter the probability that A has occurred, i.e.,

    P(A | B) = P(A).

When the above equality holds, we say that A is independent of B. Note that by the definition P(A | B) = P(A ∩ B)/P(B), this is equivalent to

    P(A ∩ B) = P(A) P(B).

We adopt this latter relation as the definition of independence because it can be used even if P(B) = 0, in which case P(A | B) is undefined. The symmetry of this relation also implies that independence is a symmetric property; that is, if A is independent of B, then B is independent of A, and we can unambiguously say that A and B are independent events.

Example: Consider a regular pack of 52 cards. Then

    P(Ace | drawn a red card) = 2/26 = 1/13 = 4/52 = P(Ace).
Example: Disease testing

Let T+ denote the event that the test comes back positive, and let D+ denote the event that the patient has the disease. We are interested in computing P(D+ | T+):

    P(D+ | T+) = P(T+ | D+) P(D+) / P(T+) = ?
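The slide leaves the answer open; a sketch with illustrative numbers (a prevalence, sensitivity, and false-positive rate assumed here, not given on the slide) shows how the total probability theorem supplies the denominator P(T+).

```python
# Assumed illustrative parameters (not from the slides):
prevalence = 0.01       # P(D+)
sensitivity = 0.95      # P(T+ | D+)
false_positive = 0.05   # P(T+ | D-)

# Total probability: P(T+) = P(T+|D+)P(D+) + P(T+|D-)P(D-)
p_positive = sensitivity * prevalence + false_positive * (1 - prevalence)

# Bayes' rule: P(D+ | T+) = P(T+ | D+) P(D+) / P(T+)
p_disease_given_positive = sensitivity * prevalence / p_positive
print(round(p_disease_given_positive, 3))  # ~0.161
```

With a rare disease, even an accurate test yields a modest posterior: most positives come from the large healthy population.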
Example: The Monty Hall problem

Let D = 2 be the event that the car is actually behind door 2, that is, you win by switching to door 2. Let E = 3 be the event that Monty opens door 3. You wish to compute

    P(D = 2 | E = 3) = P(E = 3 | D = 2) P(D = 2) / P(E = 3) = ?
Figure: Probability tree for the Monty Hall problem, enumerating the door hiding the car, the door you pick, and the door Monty opens; the leaf probabilities are 1/18 and 1/9.
Bayesian update

Figure: Bayesian update on the Monty Hall tree. Conditioning on the door Monty opens prunes the tree to two paths, with posterior probabilities 1/3 and 2/3.
Multiple Random Variables
For two discrete r.v.’s X and Y, the joint probability distribution describes the probability of each (X, Y) combination, i.e., P(X = xi, Y = yj).

Note: All of the following discussion extends to more than two discrete r.v.’s as well.
Marginal distributions

The marginal distribution of X is defined as follows:

    P(X = xi) = Σ_yj P(X = xi, Y = yj).

The conditional distribution of X given Y = yj is

    P(X = xi | Y = yj) = P(X = xi, Y = yj) / P(Y = yj).

X and Y are independent if

    P(X = xi, Y = yj) = P(X = xi) · P(Y = yj) for all xi, yj,

or, equivalently,

    P(X = xi | Y = yj) = P(X = xi) for all xi, yj.
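These definitions can be worked through on a small joint table. The table below is hypothetical (the slide’s actual table is not reproduced in this text), chosen with support points 100/150 for X and 290/300 for Y purely for illustration.

```python
from fractions import Fraction

# Hypothetical joint distribution P(X = x, Y = y); values are made up.
joint = {
    (100, 290): Fraction(1, 5), (100, 300): Fraction(1, 5),
    (150, 290): Fraction(2, 5), (150, 300): Fraction(1, 5),
}
xs = sorted({x for x, _ in joint})
ys = sorted({y for _, y in joint})

# Marginals: sum the joint over the other variable.
p_x = {x: sum(joint[(x, y)] for y in ys) for x in xs}
p_y = {y: sum(joint[(x, y)] for x in xs) for y in ys}

# Conditional distribution of X given Y = 290.
p_x_given_y = {x: joint[(x, 290)] / p_y[290] for x in xs}

# Conditional expectation E[X | Y = 290].
e_x_given_y = sum(x * p for x, p in p_x_given_y.items())

# Independence holds only if every cell factorizes.
independent = all(joint[(x, y)] == p_x[x] * p_y[y] for x in xs for y in ys)

print(p_x[150])          # marginal P(X = 150)
print(p_x_given_y[150])  # conditional P(X = 150 | Y = 290)
print(e_x_given_y)       # E[X | Y = 290]
print(independent)
```

The same four computations answer the exercise questions on the next slide once the slide’s own table is substituted for the hypothetical one.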
Exercise:
1. P(X = 150) = ?
2. P(X = 150 | Y = 290) = ?
3. Are X and Y independent?
4. E[X | Y = 290] = ?
Conditional expectation
Question: Does correlation = 0 imply independence?
Appendix
The birthday problem

Let A denote the event that “in a class of 70 students, some pair has the same birthday.” Let Ac denote the complement event, i.e., no pair has the same birthday; in other words, the 70 birthdays form a 70-tuple in which all entries are different. Then

    Pr{A} = 1 − Pr{Ac}
          = 1 − 365P70 / 365^70
          = 1 − (365 × 364 × · · · × (365 − 69)) / 365^70
          ≈ 0.9992.
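The product above is easy to evaluate directly; a minimal sketch:

```python
# Probability that, in a class of 70 students, some pair shares a birthday
# (assuming 365 equally likely birthdays and ignoring leap years).
p_all_different = 1.0
for k in range(70):
    p_all_different *= (365 - k) / 365  # kth student avoids k earlier birthdays

p_shared = 1 - p_all_different
print(round(p_shared, 4))  # ~0.9992
```

Changing `70` to other class sizes shows how quickly the probability approaches 1 (it already exceeds 1/2 at 23 students).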
Assuming you initially picked door 1 (so that P(E = 3 | D = 1) = 1/2, P(E = 3 | D = 2) = 1, and P(E = 3 | D = 3) = 0):

    P(D = 2 | E = 3)
      = P(E = 3 | D = 2) P(D = 2) / [P(E = 3 | D = 1) P(D = 1) + P(E = 3 | D = 2) P(D = 2) + P(E = 3 | D = 3) P(D = 3)]
      = (1 × 1/3) / (1/2 × 1/3 + 1 × 1/3 + 0 × 1/3)
      = 2/3.
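The 2/3 answer can also be checked by simulation. This sketch assumes you always pick door 1 and that Monty opens a non-car, non-picked door uniformly at random when he has a choice.

```python
import random

def switch_wins(rng):
    """One Monty Hall round: True if switching away from door 1 wins."""
    car = rng.choice([1, 2, 3])
    pick = 1
    # Monty opens a door that is neither your pick nor the car.
    openable = [d for d in (1, 2, 3) if d != pick and d != car]
    monty = rng.choice(openable)
    # You switch to the one remaining closed door.
    switched = next(d for d in (1, 2, 3) if d not in (pick, monty))
    return switched == car

rng = random.Random(0)  # fixed seed for reproducibility
trials = 100_000
wins = sum(switch_wins(rng) for _ in range(trials))
print(wins / trials)  # ~0.667
```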
Correlation

1. The correlation between X and Y is defined as

       corr(X, Y) = ρX,Y = cov(X, Y) / (σX σY),

   where σX and σY are the standard deviations of the distributions of X and Y, respectively. Correlation always lies between −1 and 1.
2. A perfect linear relationship Y = aX + b (with a ≠ 0) implies perfect correlation: ρ = −1 if a < 0, and ρ = +1 if a > 0.
3. Correlation = 0 does not imply independence (however, independence implies 0 correlation). Zero correlation only means that the variables are not linearly related.
4. Correlation does not imply causality.
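Point 3 admits a tiny counterexample (constructed here, not from the slides): X uniform on {−1, 0, 1} and Y = X² have zero covariance, hence zero correlation, yet Y is a deterministic function of X.

```python
from fractions import Fraction

xs = [-1, 0, 1]
p = Fraction(1, 3)  # X uniform on {-1, 0, 1}; Y = X**2

e_x = sum(p * x for x in xs)            # E[X] = 0
e_y = sum(p * x * x for x in xs)        # E[Y] = E[X^2] = 2/3
e_xy = sum(p * x * x * x for x in xs)   # E[XY] = E[X^3] = 0

cov = e_xy - e_x * e_y
print(cov)  # 0, so corr(X, Y) = 0

# Yet X and Y are dependent: P(Y = 1 | X = 1) = 1, while P(Y = 1) = 2/3.
```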