
SUPPLEMENTARY TOPIC: BAYES’ RULE

Lecture Notes for Introductory Statistics [1]

Neal Smith, Augusta University (2016)

[1] These lecture notes are intended to be used with the open source textbook “Introductory Statistics” by Barbara Illowsky and Susan Dean (OpenStax College, 2013).

In Chapter 3, we discussed what is meant by the conditional probability P(A|B); this symbol represents the probability of the event A occurring, given that B is already known to have occurred. As we will see, there are many situations where P(A|B) is known, but P(B|A) is the real quantity of interest. For example, it might be known that 90% of people who have a certain medical condition will test positive for that condition when a certain test is administered; that is, P(Positive Test|Condition is Present) = .90. However, if your doctor informs you that you have just tested positive for that condition, what you really want to know is P(Condition is Present|Positive Test), the probability that you actually have the condition, given that you have just tested positive for it.
This is a very common problem in probability, and fortunately, we can easily
derive a formula for how to handle this general problem.

1. The Bayes’ Rule Formula


Recall that there are two ways to compute the probability P(A and B):

P(A and B) = P(A) × P(B|A) = P(B) × P(A|B)


Equating the last two expressions and solving for P(B|A), we find that

P(B|A) = [P(A|B) × P(B)] / P(A)

provided, of course, that P(A) ≠ 0. The above formula is called Bayes’ Rule or Bayes’ Theorem.
In the Bayes’ Rule formula, there are some terms you need to know. The P(B) term is sometimes referred to as the prior probability of B, and the P(B|A) term is sometimes called the posterior, or updated, probability of B, given that A is known. Intuitively, P(B) is your initial estimate of how likely B is to occur, and P(B|A) provides you an updated estimate of this probability once you know some additional information (i.e. that A has occurred).
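For readers who want to experiment, the formula translates directly into code. Below is a minimal Python sketch (the function and argument names are our own, chosen for readability):

    def bayes(p_a_given_b, p_b, p_a):
        # Bayes' Rule: P(B|A) = P(A|B) * P(B) / P(A), assuming P(A) != 0.
        if p_a == 0:
            raise ValueError("Bayes' Rule requires P(A) != 0")
        return p_a_given_b * p_b / p_a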

2. Examples Using Bayes’ Rule


Let’s start with a simple example.
Example 1. An email filter typically works as follows. The user must tag emails which they classify as junk. As more emails are tagged, the filter can scan the junk emails and learn. Suppose that 60% of the emails tagged as junk contain the word sale. On the other hand, only 5% of the user’s emails overall contain this word, and the user has tagged 8% of their emails as junk. Given that an email contains the word sale, find the probability that it is in fact junk.
In the language of Bayes’ Rule, the prior probability P(J) that a randomly selected piece of this user’s email should be classified as junk is .08. We want to know P(J|S), the updated probability that the email is junk, once we know the additional information that it contains the word sale. Fortunately, we know that P(S|J) = .60, as 60 percent of the user’s known junk emails contain the word sale. Therefore, Bayes’ Rule tells us that
Therefore, Bayes’ Rule tells us that

P(J|S) = [P(S|J) × P(J)] / P(S) = (.60 × .08) / .05 = .96
Initially, there is only an 8 percent probability that any given email is junk, but
simply seeing the keyword sale increases this probability to a whopping 96 percent!
Intuitively, this should seem reasonable, as not many of the user’s emails contain
this keyword, but the majority of the user’s junk emails do contain this keyword,
and so the combination of these two factors makes it quite likely that this email
should be classified as junk.
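As a quick check, here is Example 1 worked in Python (a sketch; the variable names are ours):

    # Example 1: junk-email filter
    p_s_given_j = 0.60   # P(sale | junk)
    p_j = 0.08           # prior P(junk)
    p_s = 0.05           # P(sale) across all of the user's email
    p_j_given_s = p_s_given_j * p_j / p_s
    print(p_j_given_s)   # 0.96 (up to floating-point rounding)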
In the example above, we pretty much had all the terms needed in Bayes’ Rule
staring us in the face, but life is not always so simple. Let’s look at an example
where we have to work a little bit harder.
Example 2. Cystic Fibrosis is an unpleasant disease. In a particular population, it is known that about 1 in 1600 members of the population have CF. Further, there is a medical test for CF. Among people who are known to have CF, the test will (correctly) indicate its presence (a positive test) about 85 percent of the time. Among people who do not have CF, the test will incorrectly report a positive with probability .001. Now, if a patient undergoes this CF test and tests positive, and if nothing else is known about their medical condition, what is the probability that they actually have CF?
Here, we know P(CF) = 1/1600 and we also know P(+|CF) = .85, where + is our shorthand for having a positive test. We want the conditional probability P(CF|+), since we know the subject has just tested positive for CF, and we want to know how this positive test impacts the probability that they do in fact have CF. Bayes’ Rule says

P(CF|+) = [P(+|CF) × P(CF)] / P(+)

but unfortunately, the probability P(+) (the percentage of the population who would test positive for CF when tested) is not immediately obvious. However, there are two scenarios in which a subject could test positive for CF: either they have CF (probability 1/1600) and subsequently test positive (85 percent of such subjects will do so, based on the information we are given), or they do not have CF (probability 1599/1600) and subsequently test positive anyway (probability .001 from the given information). Thus,

P(+) = P(CF) × P(+|CF) + P(no CF) × P(+|no CF)
     = (1/1600) × .85 + (1599/1600) × .001
and putting this all together,

P(CF|+) = [P(+|CF) × P(CF)] / P(+)
        = (.85 × 1/1600) / [(1/1600) × .85 + (1599/1600) × .001]
        ≈ .347

Knowledge of this positive test increases the probability that the subject has CF from our initial estimate of 1/1600 to almost 35 percent. Again, this should intuitively seem reasonable: the vast majority of people do not have CF, and a .001 probability of a false positive in this large group still produces a lot of positive test results. Most subjects with CF will return a positive test, but the fact that there will be so many false positives means that while a positive test drastically increases the probability that the subject has CF, it is still far from certain that they do have CF.
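Here is the same computation in Python, with the denominator built from the two scenarios above (a sketch; the variable names are ours):

    # Example 2: CF screening
    p_cf = 1 / 1600             # prior P(CF)
    p_pos_given_cf = 0.85       # true-positive rate, P(+|CF)
    p_pos_given_no_cf = 0.001   # false-positive rate, P(+|no CF)
    # Total probability: P(+) summed over the two scenarios
    p_pos = p_cf * p_pos_given_cf + (1 - p_cf) * p_pos_given_no_cf
    print(p_pos_given_cf * p_cf / p_pos)   # approximately 0.347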
As in the above example, it is not at all uncommon to see the denominator of
Bayes’ Rule break into a bunch of cases. Let’s look at another example.
Example 3. Suppose that if you arrive on campus before 8 am, there is a 95% probability you will obtain a premium parking spot. If you arrive between 8 and 9, this probability drops to 50%, and if you arrive after 9, it falls to only 10%.
You have a friend, and based on your knowledge of their habits, you believe that they arrive on campus before 8 am only 5% of the time, between 8 and 9 20% of the time, and after 9 75% of the time. However, one day you see that your friend has managed to obtain a premium parking spot. Use Bayes’ Rule to find the probability that they arrived before 8 am (B8), given the evidence E that they obtained a premium parking spot.
By Bayes’ Rule,

P(B8|E) = [P(E|B8) × P(B8)] / P(E)
We know P(E|B8) = .95, since a premium spot will be had 95 percent of the time when your friend arrives before 8. We also know that P(B8) = .05 based on our knowledge of our friend. To find P(E), there are three cases, as the probability of getting a premium spot depends on the time of arrival. Thus, we have

P(B8|E) = (.95 × .05) / (.05 × .95 + .20 × .50 + .75 × .10)
        ≈ .213
Once we see that our friend obtained a premium spot, the scarce availability of
such spots later in the day boosts the likelihood that our friend arrived on campus
before 8 a.m. from 5 percent to more than 20 percent.
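Because the denominator is a sum over cases, it is natural to organize such computations with lists. A Python sketch of Example 3 (the variable names are ours):

    # Example 3: arrival times and premium parking
    priors = [0.05, 0.20, 0.75]        # P(before 8), P(8 to 9), P(after 9)
    likelihoods = [0.95, 0.50, 0.10]   # P(premium spot | each arrival window)
    p_e = sum(p * l for p, l in zip(priors, likelihoods))   # P(E) = 0.2225
    print(priors[0] * likelihoods[0] / p_e)                 # approximately 0.213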

3. More Advanced Examples Using Bayes’ Rule


Let’s conclude this discussion with a rather esoteric application of Bayes’ Rule.
Example 4. I show you a bag and I truthfully tell you that it contains three
cards, each of which is either orange or white. Thus, the bag contains either 0, 1,
2, or 3 white cards, and from your point of view it might be reasonable to assume
that these four cases are equally likely. That is, you have the prior probabilities
P(0W) = P(1W) = P(2W) = P(3W) = 1/4. I then allow you to draw one card at random from the bag, and it is white. The question is: how many white cards are in the bag?
We can immediately rule out one case. Given the evidence E that a white card was drawn, P(0W|E) = 0, as the observation of a white card drawn from the bag renders it impossible that there were zero white cards in the bag. Using Bayes’ Rule,

P(1W|E) = [P(E|1W) × P(1W)] / P(E) = (1/3 × 1/4) / P(E)
Now, the denominator is the probability of drawing a white card from the bag, and
here there are four cases, depending on whether the bag actually contains 0, 1, 2,
or 3 white cards. So,
P(E) = 1/4 × 0 + 1/4 × 1/3 + 1/4 × 2/3 + 1/4 × 3/3
     = 0 + 1/12 + 2/12 + 3/12 = 6/12 = 1/2
Therefore,

P(1W|E) = (1/3 × 1/4) / (1/2) = (1/12) / (1/2) = 1/6
and in exactly the same way,
P(2W|E) = (2/3 × 1/4) / (1/2) = (2/12) / (1/2) = 2/6

and

P(3W|E) = (3/3 × 1/4) / (1/2) = (3/12) / (1/2) = 3/6
So, if it was initially equally likely that there were 0, 1, 2, or 3 white cards in the
bag, and all we know is that we drew a single white card from the bag, the most
likely scenario is that all three cards in the bag are white!
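The same list-based pattern from Example 3 computes the entire posterior distribution at once. A Python sketch (the variable names are ours):

    # Example 4: how many white cards are in the bag?
    priors = [1/4, 1/4, 1/4, 1/4]        # P(0W), P(1W), P(2W), P(3W)
    likelihoods = [0/3, 1/3, 2/3, 3/3]   # P(white draw | k white cards)
    p_e = sum(p * l for p, l in zip(priors, likelihoods))   # P(E) = 1/2
    posterior = [p * l / p_e for p, l in zip(priors, likelihoods)]
    print(posterior)   # [0.0, 1/6, 2/6, 3/6] printed as decimals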
