
BAYES’ THEOREM

AND ITS APPLICATIONS

A REPORT BY

JOHN SYDRIC T. RENDEZA

IN FULFILLMENT OF THE REQUIREMENTS


FOR
STAT 213 – MATH STAT 1
Bayes’ Theorem, often referred to as the theorem on the probability of causes, enables us to find the
probabilities of the various events H1, H2, …, Hn that can cause A to occur (Spiegel et al., 2013). It
is named after Thomas Bayes (1702–1761), an English mathematician and cleric (Suhov and
Kelbert, 2014).

Formula 1-1 (Bayes’ Formula). Let H = {H1, H2, …} be a positive partition of S, and let A be an
event with P(A) > 0. Then for any event Hk of the partition H,

$$P(H_k \mid A) = \frac{P(H_k)\,P(A \mid H_k)}{\sum_{j=1}^{n} P(H_j)\,P(A \mid H_j)}$$

in the case of a finite partition H, and

$$P(H_k \mid A) = \frac{P(H_k)\,P(A \mid H_k)}{\sum_{j=1}^{\infty} P(H_j)\,P(A \mid H_j)}$$

when partition H is countably infinite.


Proof. The formula for the conditional probability of event A given event B is:

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$$

Using the formula above, we can write

$$P(H_k \mid A) = \frac{P(H_k \cap A)}{P(A)} = \frac{P(H_k)\,P(A \mid H_k)}{P(A)}$$

Note. The numerator is derived from the multiplication rule. The denominator follows the
formula for total probability P(A), defined as:

$$P(A) = \sum_{j=1}^{\infty} P(H_j)\,P(A \mid H_j) \quad \text{or} \quad P(A) = \sum_{j=1}^{n} P(H_j)\,P(A \mid H_j)$$

when partition H is countably infinite or finite, respectively.

Formula 1-2 (Special Case). In the case of a positive partition into two events, H = {B, B^c}, and
any event A with P(A) > 0, we have:

$$P(B \mid A) = \frac{P(B)\,P(A \mid B)}{P(B)\,P(A \mid B) + P(B^c)\,P(A \mid B^c)}$$

Formula 1-2 is particularly useful for problems involving false positives and false negatives, which
will be tackled through the examples below.
(Source: Bartoszynski, R., & Niewiadomska-Bugaj, M. (2008). Probability and Statistical
Inference, 2nd Edition. Hoboken, NJ: John Wiley & Sons, Inc.)
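As a sanity check, Formula 1-2 can be sketched as a small function; the name `bayes_two_event` and its argument names are my own choices for illustration, not from the text:

```python
def bayes_two_event(p_b, p_a_given_b, p_a_given_bc):
    """P(B | A) for the two-event partition {B, B^c}, per Formula 1-2."""
    numerator = p_b * p_a_given_b                        # P(B) P(A|B)
    denominator = numerator + (1 - p_b) * p_a_given_bc   # + P(B^c) P(A|B^c)
    return numerator / denominator
```

Plugging in the numbers from Example 4 below (prior 0.005, detection rate 0.95, false-positive rate 0.01) returns 95/294 ≈ 0.323, matching that worked solution.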

Example 1. In a certain factory, machines I, II, and III are all producing springs of the same
length. Of their production, machines I, II, and III respectively produce 2%, 1%, and 3% defective
springs. Of the total production of springs in the factory, machine I produces 35%, machine II
produces 25%, and machine III produces 40%. Find the posterior probability that a defective
spring was produced by machine III.
Solution:
Let D be the event of getting a defective spring. If one spring is selected at random from the total
springs produced in a day, then by the law of total probability:

$$P(D) = P(I)\,P(D \mid I) + P(II)\,P(D \mid II) + P(III)\,P(D \mid III)$$

$$P(D) = (0.35)(0.02) + (0.25)(0.01) + (0.40)(0.03) = 0.0215$$

If the selected spring is defective, the conditional probability that it was produced by machine III
is, by Bayes’ formula:

$$P(III \mid D) = \frac{P(III)\,P(D \mid III)}{P(D)} = \frac{(0.40)(0.03)}{0.0215} \approx 0.5581$$

Note how the posterior probability of III (≈ 0.5581) increased from the prior probability of III (=
0.40) after the defective spring was observed, because III produces a larger percentage of
defectives than I and II.
(Source: Hogg, R., Tanis, E., & Zimmerman, D., (2015). Probability and Statistical Inference, 9th
Edition. Upper Saddle River, NJ. Pearson Education Inc.)
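The arithmetic of Example 1 can be reproduced with a short script; the dictionary layout is just one convenient encoding, not from the text:

```python
# Prior production shares and defect rates per machine (from Example 1).
priors = {"I": 0.35, "II": 0.25, "III": 0.40}
defect_rate = {"I": 0.02, "II": 0.01, "III": 0.03}

# Law of total probability: P(D) = sum over machines of P(M) P(D|M).
p_defective = sum(priors[m] * defect_rate[m] for m in priors)

# Bayes' formula: P(III | D) = P(III) P(D|III) / P(D).
posterior_iii = priors["III"] * defect_rate["III"] / p_defective
```

Running this gives P(D) = 0.0215 and P(III | D) ≈ 0.5581, matching the hand computation.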

Example 2. In the United States, about 8 in 100,000 women develop cervical cancer. A Pap
smear is a screening procedure used to detect this cancer. The procedure yields 16% false
negatives and 10% false positives. Find the probability that a positive Pap smear corresponds to a
true case of cervical cancer.
Solution:
Let C be the event that a woman has cervical cancer, and T the event that the Pap smear gives a
positive result. For women with this cancer, there are about 16% false negatives; for women
without it, about 10% false positives. In summary:

                                  Pap smear detects      Pap smear does not detect
                                  cervical cancer        cervical cancer
Women with cervical cancer            0.84                     0.16
Women without cervical cancer         0.10                     0.90

Also, the probability of a woman having cervical cancer is 0.00008, so the complement is 0.99992.

$$P(C^+ \mid T^+) = \frac{P(C^+)\,P(T^+ \mid C^+)}{P(C^+)\,P(T^+ \mid C^+) + P(C^-)\,P(T^+ \mid C^-)} = \frac{(0.00008)(0.84)}{(0.00008)(0.84) + (0.99992)(0.10)} \approx 0.000672$$
What this means is that for every million positive Pap smears, only about 672 represent true cases of
cervical cancer (i.e., the woman has the disease given that her test result is positive). This low ratio
makes one question the value of the procedure. It is ineffective because the percentage of women
who have this cancer is so small, and the error rates of the procedure, namely 0.16 and 0.10, are
so high.
(Source: Hogg, R., Tanis, E., & Zimmerman, D., (2015). Probability and Statistical Inference, 9th
Edition. Upper Saddle River, NJ. Pearson Education Inc.)
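To see where the 672-in-a-million figure comes from, here is a minimal sketch of the computation (the variable names are my own):

```python
prevalence = 0.00008           # P(C+): about 8 women in 100,000
sensitivity = 0.84             # P(T+ | C+) = 1 - 0.16 false-negative rate
false_positive_rate = 0.10     # P(T+ | C-)

# Denominator: total probability of a positive Pap smear.
p_positive = prevalence * sensitivity + (1 - prevalence) * false_positive_rate

# Positive predictive value P(C+ | T+), by Bayes' formula.
ppv = prevalence * sensitivity / p_positive
```

This yields ppv ≈ 0.000672, i.e. roughly 672 true cases per million positive results; the tiny prevalence swamps the sizeable error rates.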

Example 3. Consider two urns. The first contains two white and seven black balls, and the second
contains five white and six black balls. We flip a fair coin and then draw a ball from the first urn
or the second urn depending on whether the outcome was heads or tails. What is the conditional
probability that the outcome of the toss was heads given that a white ball was selected?
Solution:
Let W be the event that a white ball is drawn, and let H be the event that the coin comes up heads.
The desired probability P(H|W) may be calculated as follows:

$$P(H \mid W) = \frac{P(H)\,P(W \mid H)}{P(H)\,P(W \mid H) + P(H^c)\,P(W \mid H^c)} = \frac{\left(\tfrac{1}{2}\right)\left(\tfrac{2}{9}\right)}{\left(\tfrac{1}{2}\right)\left(\tfrac{2}{9}\right) + \left(\tfrac{1}{2}\right)\left(\tfrac{5}{11}\right)} = \frac{22}{67}$$

(Source: Ross, S., (2010). Introduction to Probability Models, 10th Edition. Los Angeles, CA.
Elsevier Inc.)
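Exact rational arithmetic makes Example 3 easy to verify; this sketch uses Python's standard `fractions` module, and the variable names are my own:

```python
from fractions import Fraction

p_heads = Fraction(1, 2)                 # fair coin
p_white_given_heads = Fraction(2, 9)     # urn 1: 2 white of 9 balls
p_white_given_tails = Fraction(5, 11)    # urn 2: 5 white of 11 balls

# Bayes' formula over the partition {heads, tails}.
posterior_heads = (p_heads * p_white_given_heads) / (
    p_heads * p_white_given_heads + (1 - p_heads) * p_white_given_tails
)
```

`posterior_heads` comes out to exactly 22/67, as in the worked solution.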

Example 4. A laboratory blood test is 95 percent effective in detecting a certain disease when it
is, in fact, present. However, the test also yields a “false positive” result for 1 percent of the healthy
persons tested. (That is, if a healthy person is tested, then, with probability 0.01, the test result will
imply he has the disease.) If 0.5 percent of the population actually has the disease, what is the
probability a person has the disease given that his test result is positive?
Solution:
Let D be the event that the tested person has the disease, and E the event that his test result is
positive. The desired probability P(D|E) is obtained from the values below:

                 Detected    Not Detected
Present          0.95        0.05
Not Present      0.01        0.99

Also, the probability of a person having the disease is 0.005, so the complement is 0.995.
$$P(D \mid E) = \frac{P(D)\,P(E \mid D)}{P(D)\,P(E \mid D) + P(D^c)\,P(E \mid D^c)} = \frac{(0.005)(0.95)}{(0.005)(0.95) + (0.995)(0.01)} = \frac{95}{294} \approx 0.323$$

(Source: Ross, S., (2010). Introduction to Probability Models, 10th Edition. Los Angeles, CA.
Elsevier Inc.)

Formula 1-3 (Updating the Evidence). Let H = {H1, H2, …} be a partition, and let A and B be
two events. If P(A ∩ B) > 0, then for every Hk in partition H, we have:

$$P(H_k \mid A \cap B) = \frac{P(H_k)\,P(A \cap B \mid H_k)}{\sum_j P(H_j)\,P(A \cap B \mid H_j)} = \frac{P(B \mid A \cap H_k)\,P(H_k \mid A)}{\sum_j P(B \mid A \cap H_j)\,P(H_j \mid A)}$$

Proof. The middle term is Bayes’ formula applied to the left-hand side. To show the equality of
the middle term and the right-hand side, we write:

$$P(A \cap B \mid H_i)\,P(H_i) = P(A \cap B \cap H_i) = P(B \mid A \cap H_i)\,P(A \cap H_i) = P(B \mid A \cap H_i)\,P(H_i \mid A)\,P(A)$$

Substituting this into the middle term, the factor P(A) cancels between numerator and
denominator, leaving the right-hand side.
Example 5. An urn contains two coins: One is a regular coin, with heads and tails, while the other
has heads on both sides. One coin is chosen at random from the urn and tossed n times. The results
are all heads. What is the probability that the coin tossed is a two-headed one?
Solution:
Intuitively, for a large n we expect the probability that the coin selected has two heads to be close
to 1, since it is increasingly unlikely to get n heads in a row with a regular coin. Let H1 and H2 be
the events “regular coin was chosen” and “coin with two heads was chosen.” Clearly, H1 and H2
form a partition. Let the prior probabilities be P(H1) = P(H2) = 1/2, and let An be the event “n heads
in a row”; our objective is to find P(H2|An). Since P(An|H2) = 1 for all n (this coin will only give
heads), and P(An|H1) = 1/2^n, by Bayes’ theorem we have:

$$P(H_2 \mid A_n) = \frac{P(H_2)\,P(A_n \mid H_2)}{P(H_1)\,P(A_n \mid H_1) + P(H_2)\,P(A_n \mid H_2)} = \frac{\left(\tfrac{1}{2}\right)(1)}{\left(\tfrac{1}{2}\right)\left(\tfrac{1}{2^n}\right) + \left(\tfrac{1}{2}\right)(1)} = \frac{2^n}{2^n + 1}$$

As expected, this probability does approach 1 as n increases.


Suppose now that after An was observed, an additional m tosses again produced only heads (event
Bm). Because An ∩ Bm is the same as An+m, the posterior probability of the two-headed coin (H2)
given An+m is 2^(n+m)/(2^(n+m) + 1), after replacing n with n + m. Using the second part of
Formula 1-3, and the fact that P(Bm | Hi ∩ An) = P(Bm | Hi), i = 1, 2, we obtain:

$$P(H_2 \mid A_n \cap B_m) = \frac{P(B_m \mid H_2)\,P(H_2 \mid A_n)}{P(B_m \mid H_1)\,P(H_1 \mid A_n) + P(B_m \mid H_2)\,P(H_2 \mid A_n)} = \frac{(1)\left(\dfrac{2^n}{2^n + 1}\right)}{\left(\dfrac{1}{2^m}\right)\left(\dfrac{1}{2^n + 1}\right) + (1)\left(\dfrac{2^n}{2^n + 1}\right)} = \frac{2^{n+m}}{2^{n+m} + 1}$$

which agrees with the result of updating “all at once”.
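The agreement between “all at once” and two-stage updating in Example 5 can also be checked numerically; both helper function names below are my own, not from the text:

```python
def posterior_all_at_once(k):
    """P(H2 | k heads in a row) = 2**k / (2**k + 1), derived in Example 5."""
    return 2**k / (2**k + 1)

def update_with_more_heads(posterior_h2, m):
    """Second part of Formula 1-3: update P(H2 | An) after m further heads.

    P(Bm | H2) = 1 (two-headed coin), P(Bm | H1) = 1 / 2**m (regular coin).
    """
    numerator = 1.0 * posterior_h2
    denominator = (1 / 2**m) * (1 - posterior_h2) + numerator
    return numerator / denominator
```

For example, updating the n = 3 posterior with m = 4 more heads reproduces the n + m = 7 posterior, 128/129.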


Solved Problems.
1. Bowl B1 contains two white chips, bowl B2 contains two red chips, bowl B3 contains two white
and two red chips, and bowl B4 contains three white chips and one red chip. The probabilities of
selecting bowl B1, B2, B3, or B4 are 1/2, 1/4, 1/8, and 1/8, respectively. A bowl is selected using
these probabilities and a chip is then drawn at random. Find the conditional probability that bowl
B1 had been selected, given that a white chip was drawn.
Solution:
Let W be the event that a white chip was drawn. Then:

$$P(W) = P(B_1)\,P(W \mid B_1) + P(B_2)\,P(W \mid B_2) + P(B_3)\,P(W \mid B_3) + P(B_4)\,P(W \mid B_4)$$

$$P(W) = \left(\tfrac{1}{2}\right)(1) + \left(\tfrac{1}{4}\right)(0) + \left(\tfrac{1}{8}\right)\left(\tfrac{1}{2}\right) + \left(\tfrac{1}{8}\right)\left(\tfrac{3}{4}\right) = 0.65625$$

If the drawn chip was white, the conditional probability that bowl B1 had been selected is, by
Bayes’ formula:

$$P(B_1 \mid W) = \frac{P(B_1)\,P(W \mid B_1)}{P(W)} = \frac{(1/2)(1)}{0.65625} \approx 0.7619$$

(Source: Hogg, R., Tanis, E., & Zimmerman, D., (2015). Probability and Statistical Inference, 9th
Edition. Upper Saddle River, NJ. Pearson Education Inc.)
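Solved Problem 1 can be checked in a few lines; the list ordering B1..B4 is my own encoding of the problem data:

```python
# Bowl priors and P(white | bowl) for B1, B2, B3, B4, from the statement.
bowl_priors = [1/2, 1/4, 1/8, 1/8]
p_white_given_bowl = [1.0, 0.0, 1/2, 3/4]

# Total probability of drawing a white chip.
p_white = sum(p * w for p, w in zip(bowl_priors, p_white_given_bowl))

# Posterior that bowl B1 was selected, given a white chip.
posterior_b1 = bowl_priors[0] * p_white_given_bowl[0] / p_white
```

This gives P(W) = 0.65625 and P(B1 | W) = 16/21 ≈ 0.7619, as in the solution above.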

2. Suppose that medical science has developed a test for a certain disease that is 95% accurate, on
both those who do and those who do not have the disease. If the incidence rate of this disease in
the population is 5%, find the probability that a person: (i) Has the disease when the test is positive.
(ii) Does not have the disease when the test is negative.
Solution:
Let’s first make a table of the test’s error rates:

                              Test is positive    Test is negative
Person has the disease        0.95                0.05
Person does not have it       0.05                0.95

Let D be the event that the person has the disease, and T the event describing the test result, with
+ and − denoting positive and negative.
(i)

$$P(D^+ \mid T^+) = \frac{P(D^+)\,P(T^+ \mid D^+)}{P(D^+)\,P(T^+ \mid D^+) + P(D^-)\,P(T^+ \mid D^-)} = \frac{(0.05)(0.95)}{(0.05)(0.95) + (0.95)(0.05)} = \frac{1}{2}$$

(ii)

$$P(D^- \mid T^-) = \frac{P(D^-)\,P(T^- \mid D^-)}{P(D^-)\,P(T^- \mid D^-) + P(D^+)\,P(T^- \mid D^+)} = \frac{(0.95)(0.95)}{(0.95)(0.95) + (0.05)(0.05)} = \frac{0.9025}{0.905} \approx 0.9972$$

(Source: Bartoszynski, R., & Niewiadomska-Bugaj, M. (2008). Probability and Statistical
Inference, 2nd Edition. Hoboken, NJ: John Wiley & Sons, Inc.)

3. Two different suppliers, A and B, provide the manufacturer with the same part. All supplies of
this part are kept in a large bin. In the past, 2% of all parts supplied by A and 4% of parts supplied
by B have been defective. Moreover, A supplies three times as many parts as B. Suppose that you
reach into the bin and select a part. (i) Find the probability that this part is defective. (ii) If the part
is non-defective, find the probability that it was supplied by B.
Solution:
(i) Let D be the event that the part was defective. Moreover, A supplies 75% of the parts while B
supplies 25% of the parts.

$$P(D) = P(A)\,P(D \mid A) + P(B)\,P(D \mid B) = (0.75)(0.02) + (0.25)(0.04) = 0.025$$

(ii)

$$P(B \mid D^c) = \frac{P(B)\,P(D^c \mid B)}{P(D^c)} = \frac{(0.25)(0.96)}{0.975} \approx 0.2462$$

Note how the posterior probability of B (≈ 0.2462) decreased from the prior probability of B (=
0.25) after the non-defective part was observed, because B produces a larger percentage of
defectives than A.

(Source: Bartoszynski, R., & Niewiadomska-Bugaj, M. (2008). Probability and Statistical
Inference, 2nd Edition. Hoboken, NJ: John Wiley & Sons, Inc.)
References
Spiegel, M., Schiller, J., & Srinivasan, R. (2013). Schaum’s Outlines in Probability and Statistics,
4th Edition. USA: McGraw-Hill Companies, Inc.
Suhov, Y., & Kelbert, M. (2014). Probability and Statistics by Example, 2nd Edition. Cambridge,
UK: Cambridge University Press.
Bartoszynski, R., & Niewiadomska-Bugaj, M. (2008). Probability and Statistical Inference, 2nd
Edition. Hoboken, NJ: John Wiley & Sons, Inc.
Hogg, R., Tanis, E., & Zimmerman, D. (2015). Probability and Statistical Inference, 9th Edition.
Upper Saddle River, NJ: Pearson Education Inc.
Ross, S. (2010). Introduction to Probability Models, 10th Edition. Los Angeles, CA: Elsevier Inc.
