
Probability Review(slide 1)

Basic Probability Review

Milind G. Sohoni

Operations Management
The Indian School of Business, Gachibowli, Hyderabad

© Milind G. Sohoni 1 / 35
Probability Review(slide 2)
Outline

Topics

Introduction

Conditional Probability

Total Probability Theorem and Bayes’ Rule

Multiple Random Variables

Appendix

© Milind G. Sohoni 2 / 35
Probability Review(slide 3)
Introduction

Review of basic concepts

© Milind G. Sohoni 3 / 35
Probability Review(slide 4)
Introduction

Decision making under uncertainty: Building intuition


Consider the following situations:
1. A laboratory blood test is 95% effective in detecting a certain
disease when it is, in fact, present. However, the test also yields a
“false positive” result for 1% of healthy persons tested. If 0.5% of
the population actually has the disease, what is the probability that
a person has the disease given that his test result is positive? Is the
test really very effective?
1.1 High,
1.2 Low,
1.3 Can’t say.

2. In a class of 70 students, what is the probability that some pair will
have the same birthday?
2.1 High,
2.2 Low,
2.3 Can’t say.

© Milind G. Sohoni 4 / 35
Probability Review(slide 5)
Conditional Probability

Conditional probability

© Milind G. Sohoni 5 / 35
Probability Review(slide 6)
Conditional Probability

Conditional probability
Conditional probability provides us with a way to reason about the
outcome of an experiment, based on partial information. Here are some
examples:
- A spot shows up on a radar screen. How likely is it that it
  corresponds to an aircraft?
- In a word-guessing game, the first letter of the word is a “t”. What
  is the likelihood that the second letter is an “h”?

In more precise terms, given an experiment, a corresponding sample
space, and a probability law, suppose that we know that the outcome is
within some given event B. We wish to quantify the likelihood that the
outcome also belongs to some other given event A. We thus seek to
construct a new probability law, which takes into account this knowledge
and which, for any event A, gives us the conditional probability of A
given B, denoted by P(A | B).

© Milind G. Sohoni 6 / 35
Probability Review(slide 7)
Conditional Probability

Conditional probability
In general,
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)},$$
assuming P(B) > 0.

Multiplication rule:
Assuming that all of the conditioning events have positive probability, we have
$$P\left(\cap_{i=1}^{n} A_i\right) = P(A_1)\,P(A_2 \mid A_1)\,P(A_3 \mid A_1 \cap A_2)\cdots P\left(A_n \,\middle|\, \cap_{i=1}^{n-1} A_i\right).$$

This rule can be easily verified by simply rewriting
$$P\left(\cap_{i=1}^{n} A_i\right) = P(A_1)\cdot\frac{P(A_1 \cap A_2)}{P(A_1)}\cdot\frac{P(A_1 \cap A_2 \cap A_3)}{P(A_1 \cap A_2)}\cdots\frac{P\left(\cap_{i=1}^{n} A_i\right)}{P\left(\cap_{i=1}^{n-1} A_i\right)},$$
and by using the definition of conditional probability to rewrite the right-hand side
above as
$$P(A_1)\,P(A_2 \mid A_1)\,P(A_3 \mid A_1 \cap A_2)\cdots P\left(A_n \,\middle|\, \cap_{i=1}^{n-1} A_i\right).$$

© Milind G. Sohoni 7 / 35
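As a quick check of these identities, here is a minimal Python sketch (our own addition, not part of the original slides) that enumerates a toy sample space of two fair dice and verifies both the definition of conditional probability and the multiplication rule by brute force; all event names are ours.

```python
from fractions import Fraction
from itertools import product

# Toy sample space: ordered outcomes of two fair dice, all equally likely.
omega = set(product(range(1, 7), repeat=2))
p = Fraction(1, len(omega))

def prob(event):
    """P(event) under the uniform law: (# outcomes in event) / 36."""
    return p * len(event)

def cond(A, B):
    """P(A | B) = P(A ∩ B) / P(B), assuming P(B) > 0."""
    return prob(A & B) / prob(B)

A1 = {w for w in omega if w[0] == 4}           # first die shows 4
A2 = {w for w in omega if w[0] + w[1] == 7}    # the two dice sum to 7
A3 = {w for w in omega if w[1] % 2 == 1}       # second die is odd

# Definition: P(A1 ∩ A2) = P(A1) * P(A2 | A1).
assert prob(A1 & A2) == prob(A1) * cond(A2, A1)

# Multiplication rule: P(A1 ∩ A2 ∩ A3) = P(A1) P(A2|A1) P(A3|A1 ∩ A2).
assert prob(A1 & A2 & A3) == prob(A1) * cond(A2, A1) * cond(A3, A1 & A2)

print(cond(A2, A1))  # 1/6: given the first die is 4, the sum is 7 only if the second is 3
```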
Probability Review(slide 8)
Conditional Probability

Visualization of the multiplication rule

Figure: Visualization of the multiplication rule. The intersection event
A = A1 ∩ A2 ∩ · · · ∩ An is associated with a path on the tree of a sequential
description of the experiment. We associate the branches of this path with the
events A1, . . . , An, and we record next to the branches the corresponding
conditional probabilities. The final node of the path corresponds to the
intersection event A, and its probability is obtained by multiplying the
conditional probabilities recorded along the branches of the path:
$$P(A_1 \cap A_2 \cap \cdots \cap A_n) = P(A_1)\,P(A_2 \mid A_1)\cdots P(A_n \mid A_1 \cap A_2 \cap \cdots \cap A_{n-1}).$$

Note that any intermediate node along the path also corresponds to some
intersection event, and its probability is obtained by multiplying the
corresponding conditional probabilities up to that node. For example, the event
A1 ∩ A2 ∩ A3 corresponds to an intermediate node of the tree, and its
probability is
$$P(A_1 \cap A_2 \cap A_3) = P(A_1)\,P(A_2 \mid A_1)\,P(A_3 \mid A_1 \cap A_2).$$

© Milind G. Sohoni 8 / 35
Probability Review(slide 9)
Conditional Probability

An example
Three cards are drawn from an ordinary 52-card deck without replacement
(drawn cards are not placed back in the deck). We wish to find the
probability that none of the three cards is a heart. We assume that at
each step, each one of the remaining cards is equally likely to be picked.

By symmetry, this implies that every triplet of cards is equally likely to be
drawn. A cumbersome approach, which we will not use, is to count the
number of all card triplets that do not include a heart and divide it by
the number of all possible card triplets. Instead, we use a sequential
description of the sample space in conjunction with the multiplication
rule. Let us define the events
$$A_i = \{\text{the } i\text{th card is not a heart}\}, \quad i = 1, 2, 3.$$

We will calculate P(A1 ∩ A2 ∩ A3), the probability that none of the cards
is a heart.

© Milind G. Sohoni 9 / 35
Probability Review(slide 10)
Conditional Probability

An example
The conditional probabilities along the tree are
$$P(A_1) = \frac{39}{52}, \qquad P(A_2 \mid A_1) = \frac{38}{51}, \qquad P(A_3 \mid A_1 \cap A_2) = \frac{37}{50}.$$

These probabilities are recorded along the corresponding branches of the tree
describing the sample space, as shown in the figure below. The desired
probability is now obtained by multiplying the probabilities recorded along the
corresponding path of the tree:
$$P(A_1 \cap A_2 \cap A_3) = \frac{39}{52} \cdot \frac{38}{51} \cdot \frac{37}{50} \approx 0.413.$$

[Figure: Sequential description of the sample space of the 3-card selection
problem. At each stage the “Not a Heart” branch carries probability 39/52,
38/51, and 37/50 respectively, and the “Heart” branch carries 13/52, 13/51,
and 13/50.]
© Milind G. Sohoni 10 / 35
Probability Review(slide 11)
Conditional Probability

An example

Note that once the probabilities are recorded along the tree, the
probability of several other events can be similarly calculated.

For example,
$$P(\text{1st is not a heart and 2nd is a heart}) = \frac{39}{52} \cdot \frac{13}{51} \approx 0.191,$$
or
$$P(\text{1st two are not hearts and 3rd is a heart}) = \frac{39}{52} \cdot \frac{38}{51} \cdot \frac{13}{50} \approx 0.145.$$

© Milind G. Sohoni 11 / 35
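These numbers are easy to sanity-check by simulation. The following Monte Carlo sketch (our own addition, under the same sequential-draw model) should reproduce ≈ 0.413, ≈ 0.191, and ≈ 0.145 up to sampling noise.

```python
import random

def card_draw_sim(trials=200_000, seed=0):
    """Draw 3 cards without replacement; 'H' = heart, 'O' = any other suit."""
    rng = random.Random(seed)
    deck = ['H'] * 13 + ['O'] * 39
    counts = {"no_heart": 0, "O_then_H": 0, "OO_then_H": 0}
    for _ in range(trials):
        c = rng.sample(deck, 3)  # ordered draw without replacement
        counts["no_heart"] += (c == ['O', 'O', 'O'])
        counts["O_then_H"] += (c[0] == 'O' and c[1] == 'H')
        counts["OO_then_H"] += (c[:2] == ['O', 'O'] and c[2] == 'H')
    for name, k in counts.items():
        print(name, k / trials)  # ~0.413, ~0.191, ~0.145

card_draw_sim()
```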
Probability Review(slide 12)
Total Probability Theorem and Bayes’ Rule

Total probability theorem and Bayes’ rule

© Milind G. Sohoni 12 / 35
Probability Review(slide 13)
Total Probability Theorem and Bayes’ Rule

Total probability theorem


Theorem 1
[Total Probability] Let A1, . . . , An be disjoint events that form a partition of
the sample space (each possible outcome is included in one and only one of the
events A1, . . . , An) and assume P(Ai) > 0 for all i = 1, . . . , n. Then, for any
event B, we have
$$P(B) = P(A_1 \cap B) + \cdots + P(A_n \cap B) = P(A_1)\,P(B \mid A_1) + \cdots + P(A_n)\,P(B \mid A_n).$$
[Figure: Visualization and verification of the total probability theorem. The
events A1, . . . , An form a partition of the sample space, so the event B can be
decomposed into the disjoint union of the events A1 ∩ B, . . . , An ∩ B.]
© Milind G. Sohoni 13 / 35
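In code, the theorem is a one-line weighted sum. Here is a minimal sketch (our own addition; the priors and likelihoods are hypothetical numbers chosen only for illustration):

```python
from fractions import Fraction as F

# Partition A1, A2, A3 with priors P(Ai) and likelihoods P(B | Ai).
prior = [F(1, 2), F(3, 10), F(1, 5)]   # hypothetical; must sum to 1
lik   = [F(1, 10), F(2, 5), F(1, 2)]   # hypothetical P(B | Ai)

# Total probability: P(B) = sum_i P(Ai) * P(B | Ai).
p_b = sum(p * l for p, l in zip(prior, lik))
print(p_b)  # 27/100
```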
Probability Review(slide 14)
Total Probability Theorem and Bayes’ Rule

Bayes’ rule
Bayes’ rule:
Let A1, A2, . . . , An be disjoint events that form a partition of the sample space,
and assume P(Ai) > 0 for all i = 1, . . . , n. Then, for any event B such that
P(B) > 0, we have
$$\underbrace{P(A_i \mid B)}_{\text{Posterior}} = \underbrace{\frac{P(B \mid A_i)}{P(B)}}_{\text{Likelihood}} \cdot \underbrace{P(A_i)}_{\text{Prior}} = \frac{P(A_i)\,P(B \mid A_i)}{P(A_1)\,P(B \mid A_1) + \cdots + P(A_n)\,P(B \mid A_n)}.$$

Bayes’ rule is often used for inference. There are a number of “causes” that
may result in a certain “effect.” We observe the effect, and we wish to infer the
cause. The events A1, . . . , An are associated with the causes and the event B
represents the effect. The probability P(B | Ai) that the effect will be observed
when the cause Ai is present amounts to a probabilistic model of the
cause-effect relation. Given that the effect B has been observed, we wish to
evaluate the (conditional) probability P(Ai | B) that the cause Ai is present.

© Milind G. Sohoni 14 / 35
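Expressed as code, the rule divides each prior-times-likelihood term by the total-probability denominator. A hedged sketch (our own names, reusing the hypothetical numbers from the previous sketch):

```python
from fractions import Fraction as F

def bayes_posterior(prior, likelihood):
    """P(Ai | B) = P(Ai) P(B|Ai) / sum_j P(Aj) P(B|Aj), for a partition A1..An."""
    p_b = sum(p * l for p, l in zip(prior, likelihood))  # total probability of B
    return [p * l / p_b for p, l in zip(prior, likelihood)]

print(bayes_posterior([F(1, 2), F(3, 10), F(1, 5)],
                      [F(1, 10), F(2, 5), F(1, 2)]))
# [Fraction(5, 27), Fraction(4, 9), Fraction(10, 27)] -- posteriors sum to 1
```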
Probability Review(slide 15)
Total Probability Theorem and Bayes’ Rule

Example: Bayes’ rule


An example of the inference context that is implicit in Bayes’ rule:
Suppose we observe a shade in a person’s X-ray (this is event B, the
“effect”) and we want to estimate the likelihood of three mutually
exclusive and collectively exhaustive potential causes: cause 1 (event A1 )
is that there is a malignant tumor, cause 2 (event A2 ) is that there is a
nonmalignant tumor, and cause 3 (event A3 ) corresponds to reasons
other than a tumor. We assume that we know the probabilities P(Ai )
and P(B| Ai ), i = 1, 2, 3. Given that we see a shade (event B occurs),
Bayes’ rule gives the conditional probabilities of the various causes as

$$\underbrace{P(A_i \mid B)}_{\text{Posterior}} = \underbrace{\frac{P(B \mid A_i)}{P(A_1)\,P(B \mid A_1) + P(A_2)\,P(B \mid A_2) + P(A_3)\,P(B \mid A_3)}}_{\text{Likelihood}} \cdot \underbrace{P(A_i)}_{\text{Prior}}, \quad i = 1, 2, 3.$$

© Milind G. Sohoni 15 / 35
Probability Review(slide 16)
Total Probability Theorem and Bayes’ Rule

Example: Bayes’ rule

[Figure: An example of the inference context that is implicit in Bayes’ rule. We
observe a shade in a person’s X-ray (event B, the “effect”); the candidate causes
are A1 (malignant tumor), A2 (nonmalignant tumor), and A3 (reasons other than
a tumor).]
© Milind G. Sohoni 16 / 35
Probability Review(slide 17)
Total Probability Theorem and Bayes’ Rule

Odds ratio: Bayes’ rule

An alternate explanation of Bayes’ rule is as follows. Suppose we have a
null hypothesis H0 and we collect some evidence E (for example, data).
Further, let HA be the alternate hypothesis, to be accepted if H0 is rejected.
Then, Bayes’ rule suggests
$$\underbrace{\frac{P(H_0 \mid E)}{P(H_A \mid E)}}_{\text{Posterior odds}} = \underbrace{\frac{P(E \mid H_0)}{P(E \mid H_A)}}_{\text{Likelihood ratio}} \cdot \underbrace{\frac{P(H_0)}{P(H_A)}}_{\text{Prior odds}}$$

that is,
Posterior odds = Likelihood ratio × Prior odds.

© Milind G. Sohoni 17 / 35
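Numerically, the odds form avoids computing P(E) altogether. A small sketch with hypothetical numbers (ours, for illustration only); the final odds-to-probability conversion assumes H0 and HA are complementary:

```python
# Hypothetical inputs: P(E | H0) = 0.2, P(E | HA) = 0.6, P(H0) = 0.9.
likelihood_ratio = 0.2 / 0.6        # P(E | H0) / P(E | HA)
prior_odds = 0.9 / 0.1              # P(H0) / P(HA), with P(HA) = 1 - P(H0)

posterior_odds = likelihood_ratio * prior_odds        # = 3.0
p_h0_given_e = posterior_odds / (1 + posterior_odds)  # = 0.75
print(posterior_odds, p_h0_given_e)
```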
Probability Review(slide 18)
Total Probability Theorem and Bayes’ Rule

Independent events
An interesting and important special case arises when the occurrence of
B provides no information and does not alter the probability that A has
occurred, i.e.,
P(A| B) = P(A).
When the above equality holds, we say that A is independent of B. Note
that by the definition P(A| B) = P(A ∩ B)/P(B), this is equivalent to
P(A ∩ B) = P(A)P(B). We adopt this latter relation as the definition of
independence because it can be used even if P(B) = 0, in which case
P(A| B) is undefined. The symmetry of this relation also implies that
independence is a symmetric property; that is, if A is independent of B,
then B is independent of A, and we can unambiguously say that A and B
are independent events.
Example: Consider a regular pack of 52 cards.
$$P(\text{Ace} \mid \text{Drawn a red card}) = \frac{2}{26} = P(\text{Ace}) = \frac{4}{52} = \frac{1}{13}.$$

© Milind G. Sohoni 18 / 35
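The card example can be checked directly against the product definition of independence; a one-off sketch (our own addition):

```python
from fractions import Fraction as F

p_ace = F(4, 52)          # 4 aces in the deck
p_red = F(26, 52)         # 26 red cards
p_ace_and_red = F(2, 52)  # 2 red aces

# Independence holds iff P(A ∩ B) = P(A) P(B).
print(p_ace_and_red == p_ace * p_red)  # True
print(p_ace_and_red / p_red == p_ace)  # True: P(Ace | red) = P(Ace)
```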
Probability Review(slide 19)
Total Probability Theorem and Bayes’ Rule

Revisiting our original example


A laboratory blood test is 95% effective in detecting a certain disease
when it is, in fact, present. However, the test also yields a “false positive”
result for 1% of healthy persons tested. If 0.5% of the population
actually has the disease, what is the probability that a person has the
disease given that his test result is positive? Is the test really very effective?

Let T+ denote the event that the test comes back positive. Let D+
denote the event that the patient has the disease. We are interested in
computing P(D+ | T+).
$$P(D{+} \mid T{+}) = \frac{P(T{+} \mid D{+})\,P(D{+})}{P(T{+})} = \,?$$

© Milind G. Sohoni 19 / 35
Probability Review(slide 20)
Total Probability Theorem and Bayes’ Rule

The Monty Hall Show


“Suppose you’re on Monty Hall’s Let’s Make a Deal! You are given the
choice of three doors: behind one door is a car; behind the others, goats. You
pick a door, say 1, and Monty opens another door, say 3, which has a goat.
Monty says to you, “Do you want to pick door 2?” Is it to your advantage
to switch your choice of doors?”

Let D = 2 be the event that the car is actually behind door 2, that is, you
win by switching to door 2. Let E = 3 be the event that Monty opens
door 3. You wish to compute
$$P(D = 2 \mid E = 3) = \frac{P(E = 3 \mid D = 2)\,P(D = 2)}{P(E = 3)} = \,?$$

© Milind G. Sohoni 20 / 35
Probability Review(slide 21)
Total Probability Theorem and Bayes’ Rule

The Monty Hall Show


[Figure: Tree diagram of the game. The three stages are the placement of the
car, the door chosen by the contestant, and the door opened by Monty, each
placement and each choice having probability 1/3. When the car is behind the
contestant’s door, Monty opens either of the two remaining doors with
probability 1/2 (path probability 1/18); otherwise he must open the one
remaining goat door (path probability 1/9).]
© Milind G. Sohoni 21 / 35
Probability Review(slide 22)
Total Probability Theorem and Bayes’ Rule

Bayesian update

Suppose the contestant chooses door 1 and Monty opens door 3. The relevant
paths of the tree give:

Placement   Door chosen     Door opened   Unconditional   Conditional
of car      by contestant   by Monty      probability     probability
1           1               3             1/18            1/3
2           1               3             1/9             2/3
© Milind G. Sohoni 22 / 35
Probability Review(slide 23)
Multiple Random Variables

Multiple random variables

© Milind G. Sohoni 23 / 35
Probability Review(slide 24)
Multiple Random Variables

Multiple random variables

For two discrete r.v’s X and Y , the joint probability distribution describes
the probability of certain X and Y combinations, i.e., P(X = xi , Y = yj ).
Note: All future discussion extends to more than 2 discrete r.v’s as well.

Consider the following sales data from an upscale café, where X =
number of hot coffees sold and Y = number of cold drinks sold:

© Milind G. Sohoni 24 / 35
Probability Review(slide 25)
Multiple Random Variables

Multiple random variables


Probability (pi ) Hot coffees (X ) Cold drinks (Y )

0.1 360 360


0.1 790 110
0.15 840 30
0.05 260 90
0.15 190 450
0.1 300 230
0.1 490 60
0.1 150 290
0.1 550 140
0.05 510 290
Mean E [X ] = 457 E [Y ] = 210
Std. Dev. σX = 244.3 σY = 145.6

Table: Cafe sales data.

© Milind G. Sohoni 25 / 35
Probability Review(slide 26)
Multiple Random Variables

Marginal distributions
Marginal distribution of X is defined as follows:
$$P(X = x_i) = \sum_{y_j} P(X = x_i, Y = y_j).$$

Conditional distribution of X given Y is defined as:
$$P(X = x_i \mid Y = y_j) = \frac{P(X = x_i, Y = y_j)}{P(Y = y_j)}.$$

Note: Conditional probabilities must sum up to 1.

X and Y are independent if
$$P(X = x_i, Y = y_j) = P(X = x_i) \cdot P(Y = y_j) \quad \text{for all } x_i, y_j,$$
and
$$P(X = x_i \mid Y = y_j) = P(X = x_i) \quad \text{for all } x_i, y_j.$$

© Milind G. Sohoni 26 / 35
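Both definitions are mechanical to compute from the café table. A minimal sketch (our own addition), representing the joint distribution as a dictionary keyed by (x, y):

```python
from collections import defaultdict
from fractions import Fraction as F

# Café sales scenarios from the table: (probability, hot coffees X, cold drinks Y).
scenarios = [(F(1, 10), 360, 360), (F(1, 10), 790, 110), (F(3, 20), 840, 30),
             (F(1, 20), 260, 90), (F(3, 20), 190, 450), (F(1, 10), 300, 230),
             (F(1, 10), 490, 60), (F(1, 10), 150, 290), (F(1, 10), 550, 140),
             (F(1, 20), 510, 290)]

joint = defaultdict(F)  # joint distribution P(X = x, Y = y)
for p, x, y in scenarios:
    joint[(x, y)] += p

def marginal_x(x):
    """P(X = x) = sum over y of P(X = x, Y = y)."""
    return sum(p for (xi, _), p in joint.items() if xi == x)

def cond_x_given_y(x, y):
    """P(X = x | Y = y) = P(X = x, Y = y) / P(Y = y)."""
    p_y = sum(p for (_, yj), p in joint.items() if yj == y)
    return joint[(x, y)] / p_y

print(marginal_x(150))           # 1/10
print(cond_x_given_y(150, 290))  # (1/10) / (1/10 + 1/20) = 2/3
```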
Probability Review(slide 27)
Multiple Random Variables

Answer the following

Answer the following using the café sales data:

1. P (X = 150) =?
2. P (X = 150| Y = 290) =?
3. Are X and Y independent?
4. E [X | Y = 290] =?

© Milind G. Sohoni 27 / 35
Probability Review(slide 28)
Multiple Random Variables

Conditional expectation

Let X and Y be discrete r.v’s; then the conditional expectation of X
given Y = y is a function of y over the domain of Y (all values it can take).
Using our earlier definitions of conditional probability and marginal
distributions, the conditional expectation can be computed as
$$E[X \mid Y = y] = \sum_{i=1}^{M} x_i\, P(X = x_i \mid Y = y) = \sum_{i=1}^{M} x_i \left(\frac{P(X = x_i, Y = y)}{P(Y = y)}\right).$$

Observe that E[X | Y] is a random variable, with randomness inherited
from Y.

© Milind G. Sohoni 28 / 35
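Continuing the dictionary-based sketch from the marginal-distribution slide (again our own addition, not part of the original deck), the conditional expectation is a probability-weighted average within the slice Y = y:

```python
def cond_expectation_x(joint, y):
    """E[X | Y = y] = sum_x x * P(X = x, Y = y) / P(Y = y)."""
    p_y = sum(p for (_, yj), p in joint.items() if yj == y)
    return sum(x * p / p_y for (x, yj), p in joint.items() if yj == y)

# With the café `joint` built in the earlier sketch:
# cond_expectation_x(joint, 290) == 270, i.e., 150*(2/3) + 510*(1/3).
```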
Probability Review(slide 29)
Multiple Random Variables

Measuring relationship between variables

1. What measures the direction of the relationship between X and Y?
2. How is correlation defined? What does a linear relationship between
two random variables imply?
2.1 Does correlation = 0 imply independence?

[Figure: Scatter plot of Y (cold drinks sold) against X (hot coffees sold) for
the café sales data.]

© Milind G. Sohoni 29 / 35
Probability Review(slide 30)
Appendix

Solutions to some problems discussed in class

© Milind G. Sohoni 30 / 35
Probability Review(slide 31)
Appendix

The birthday example


Let S = {set of all possible 70-tuples of birthdays}, i.e.,
$$S = \{(\text{Jan 1}, \text{Feb 20}, \ldots, \text{Jan 1}, \ldots), \ldots\}.$$

Let A denote the event that “in a class of 70 students, some pair has
the same birthday.” Let A^c denote the complement event, i.e., no pair has
the same birthday – in other words, the set of 70-tuples whose entries are
all different. Then
$$\Pr\{A\} = 1 - \Pr\{A^c\} = 1 - \frac{{}_{365}P_{70}}{365^{70}} = 1 - \frac{365 \times 364 \times \cdots \times (365 - 69)}{365^{70}} \approx 0.999.$$

© Milind G. Sohoni 31 / 35
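The product is easy to evaluate exactly; a short sketch (our own addition) using rational arithmetic:

```python
from math import prod
from fractions import Fraction as F

def p_some_shared_birthday(n=70):
    """1 - P(all n birthdays distinct), with 365 equally likely days."""
    p_all_distinct = prod(F(365 - k, 365) for k in range(n))
    return 1 - p_all_distinct

print(float(p_some_shared_birthday(70)))  # ~0.99916
```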
Probability Review(slide 32)
Appendix

Lab blood test example


A laboratory blood test is 95% effective in detecting a certain disease
when it is, in fact, present. However, the test also yields a “false positive”
result for 1% of healthy persons tested. If 0.5% of the population
actually has the disease, what is the probability that a person has the
disease given that his test result is positive? Is the test really very effective?
Let T+ denote the event that the test comes back positive. Let D+
denote the event that the patient has the disease. Further, let D−
denote the event that the patient does not have the disease. Notice, D+
and D− are mutually exclusive and collectively exhaustive events (a
partition). We are interested in computing P(D+ | T+).
$$P(D{+} \mid T{+}) = \frac{P(T{+} \mid D{+})\,P(D{+})}{P(T{+})} = \frac{P(T{+} \mid D{+})\,P(D{+})}{P(T{+} \mid D{+})\,P(D{+}) + P(T{+} \mid D{-})\,P(D{-})} = \frac{0.95 \times 0.005}{(0.95 \times 0.005) + (0.01 \times 0.995)} \approx 0.323.$$

So, even given a positive test, the probability of actually having the disease
is only about 32%: because the disease is rare, the test is far less conclusive
than it first appears.
© Milind G. Sohoni 32 / 35
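The same computation as a reusable function (a sketch of ours; the parameter names are our own):

```python
def p_disease_given_positive(sensitivity=0.95, false_positive_rate=0.01,
                             prevalence=0.005):
    """P(D+ | T+) by Bayes' rule with the partition {D+, D-}."""
    p_positive = (sensitivity * prevalence
                  + false_positive_rate * (1 - prevalence))  # total probability of T+
    return sensitivity * prevalence / p_positive

print(round(p_disease_given_positive(), 3))  # 0.323
```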
Probability Review(slide 33)
Appendix

The Monty Hall Show


Let D = 2 be the event that the car is actually behind door 2, that is, you
win by switching to door 2. Let E = 3 be the event that Monty opens
door 3. Further, let D = 1 be the event that the car is behind door 1, and
D = 3 be the event that the car is behind door 3. Notice D = 1, D = 2,
and D = 3 are mutually exclusive and collectively exhaustive events (a
partition). You wish to compute
$$P(D = 2 \mid E = 3) = \frac{P(E = 3 \mid D = 2)\,P(D = 2)}{P(E = 3)} = \frac{P(E = 3 \mid D = 2)\,P(D = 2)}{\sum_{d=1}^{3} P(E = 3 \mid D = d)\,P(D = d)} = \frac{1 \times \frac{1}{3}}{\left(\frac{1}{2} \times \frac{1}{3}\right) + \left(1 \times \frac{1}{3}\right) + \left(0 \times \frac{1}{3}\right)} = \frac{2}{3}.$$

So switching doubles your chance of winning, from 1/3 to 2/3.

© Milind G. Sohoni 33 / 35
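A simulation confirms the 2/3 answer without any conditioning algebra. A minimal sketch (our own addition) of the game's rules:

```python
import random

def monty(trials=100_000, seed=1):
    """Simulate the game: car placed uniformly; contestant picks door 1;
    Monty opens a goat door (choosing at random when he has two options)."""
    rng = random.Random(seed)
    win_switch = 0
    for _ in range(trials):
        car = rng.randrange(1, 4)
        pick = 1
        options = [d for d in (1, 2, 3) if d != pick and d != car]
        opened = rng.choice(options)
        switched = next(d for d in (1, 2, 3) if d not in (pick, opened))
        win_switch += (switched == car)
    return win_switch / trials

print(monty())  # ~0.667, i.e., switching wins about 2/3 of the time
```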
Probability Review(slide 34)
Appendix

The café example

Answer the following using the café sales data:

1. P(X = 150) = 0.1.
2. P(X = 150 | Y = 290) = 0.1/(0.1 + 0.05) = 2/3.
3. Are X and Y independent?
3.1 No, because P(X = 150) ≠ P(X = 150 | Y = 290).
4. E[X | Y = 290] = 150 × (2/3) + 510 × (1/3) = 810/3 = 270.

© Milind G. Sohoni 34 / 35
Probability Review(slide 35)
Appendix

Measuring relationship between variables


1. Covariance and correlation measure the direction of the relationship between X
and Y:
$$\mathrm{cov}(X, Y) = E[(X - E[X])(Y - E[Y])] = \sum_{i=1}^{M} P(X = x_i, Y = y_i)\,(x_i - E[X])(y_i - E[Y]),$$
$$\mathrm{corr}(X, Y) = \rho_{X,Y} = \frac{\mathrm{cov}(X, Y)}{\sigma_X \sigma_Y},$$
where σ_X and σ_Y are the standard deviations of the distributions of X and Y,
respectively. Correlation always lies between −1 and 1.
2. A linear relationship, i.e., Y = aX + b, implies perfect correlation. If a < 0,
the variables are perfectly negatively correlated; if a > 0, perfectly positively
correlated.
3. Correlation = 0 does not imply independence (however, independence implies 0
correlation). All it means is that the variables are not linearly related.
4. Correlation does not imply causality.

© Milind G. Sohoni 35 / 35
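For the café data, these formulas give a clearly negative relationship: hot-coffee and cold-drink sales move in opposite directions. A sketch of ours, computed directly from the scenario table:

```python
from math import sqrt

# (probability, hot coffees X, cold drinks Y) from the café table.
rows = [(0.10, 360, 360), (0.10, 790, 110), (0.15, 840, 30), (0.05, 260, 90),
        (0.15, 190, 450), (0.10, 300, 230), (0.10, 490, 60), (0.10, 150, 290),
        (0.10, 550, 140), (0.05, 510, 290)]

ex = sum(p * x for p, x, _ in rows)                     # E[X] = 457
ey = sum(p * y for p, _, y in rows)                     # E[Y] = 210
cov = sum(p * (x - ex) * (y - ey) for p, x, y in rows)  # ~ -27260
sx = sqrt(sum(p * (x - ex) ** 2 for p, x, _ in rows))   # sigma_X ~ 244.3
sy = sqrt(sum(p * (y - ey) ** 2 for p, _, y in rows))   # sigma_Y ~ 145.6
print(cov, cov / (sx * sy))                             # correlation ~ -0.77
```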
