Chapter Three: Probability

3.1 Introduction

Probability is the basis for inferential statistics.


In this chapter you will:
- be given a definition of probability.
- study probability as it relates to contingency tables and the normal curve.
- be introduced to risk ratios, odds ratios, sensitivity, specificity, and positive and negative predictive values.



3.2 A Definition of Probability

We define the probability of some occurrence A as¹

$$P(A) = \frac{N_A}{N}$$

$P(\cdot)$ is read “The probability of ...” and A represents any event of interest. $\bar{A}$ is the complement of A, or “not observing A.” $N_A$ is the number of outcomes that meet the specified criterion and N is the total number of outcomes.

¹ This definition assumes equally likely outcomes.

Example

Five marbles are placed in a cup. Three are red, two are white. If a marble
is randomly selected, what is the probability it is red?

$$P(A) = \frac{N_A}{N} = \frac{3}{5} = .60$$
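The classical definition can be checked by brute-force counting. The following is a minimal Python sketch, assuming the cup is modeled as a list with one entry per marble so that every entry is an equally likely outcome; the helper name `probability` is hypothetical.

```python
from fractions import Fraction


def probability(outcomes, event):
    """Classical probability N_A / N; assumes every outcome is equally likely."""
    n_a = sum(1 for outcome in outcomes if event(outcome))  # N_A: outcomes meeting A
    return Fraction(n_a, len(outcomes))                      # divide by N


# Hypothetical encoding of the cup: one list entry per marble.
marbles = ["red", "red", "red", "white", "white"]

p_red = probability(marbles, lambda m: m == "red")
print(p_red, float(p_red))  # 3/5 0.6
```

Using `Fraction` keeps the result exact (3/5) rather than a rounded decimal.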



Some Properties of Probability

Several important properties of probability can be deduced from its definition.
1. $P(A) \ge 0$. This follows because $N_A$ is a count and cannot, therefore, be less than zero.
2. $P(A) \le 1$. This follows because $N_A$ can never exceed N.
3. $P(A) + P(\bar{A}) = 1$, or $P(\bar{A}) = 1 - P(A)$. This follows because $N - N_A$ outcomes fail to meet the stated criterion A. Thus
$$P(\bar{A}) = \frac{N - N_A}{N} = 1 - \frac{N_A}{N} = 1 - P(A).$$
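Because the properties hold for every event, they can be checked exhaustively on a small sample space. This Python sketch assumes a hypothetical five-outcome space (the five marbles, indexed 0 to 4), enumerates all 2⁵ = 32 possible events, and asserts each property for every one of them.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical sample space: five equally likely marbles indexed 0..4.
outcomes = set(range(5))
N = len(outcomes)


def p(event):
    """P(A) = N_A / N for an event given as a set of outcome indices."""
    return Fraction(len(event), N)


# Every possible event A is a subset of the sample space (2**5 = 32 of them).
all_events = [set(c) for r in range(N + 1) for c in combinations(sorted(outcomes), r)]

for A in all_events:
    A_bar = outcomes - A          # complement: outcomes not in A
    assert 0 <= p(A) <= 1         # properties 1 and 2
    assert p(A) + p(A_bar) == 1   # property 3

print(len(all_events), "events satisfy all three properties")
```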



3.3 Contingency Tables

A contingency table is a convenient means of summarizing data.


A frequency contingency table summarizes the numbers of
observations in a data set that manifest some specified set of
characteristics.
A probability contingency table summarizes the proportions of
observations in a data set that manifest some specified set of
characteristics.



Frequency Tables

Frequency tables, such as the one represented here, show the numbers of
observations (persons, things etc.) that manifest some set of
characteristics.

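$$
\begin{array}{c|cc|c}
 & D & \bar{D} & \text{Total} \\
\hline
S & 9 & 3 & 12 \\
\bar{S} & 2 & 6 & 8 \\
\hline
\text{Total} & 11 & 9 & 20
\end{array}
$$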



Frequency Tables (continued)


Thus:
- The number of persons who smoke (S) and have the disease (D) is 9.
- The number who don’t smoke ($\bar{S}$) and have the disease (D) is 2.
- The number who smoke (S) and don’t have the disease ($\bar{D}$) is 3.
- The number who don’t smoke ($\bar{S}$) and don’t have the disease ($\bar{D}$) is 6.



Frequency Tables (continued)


Logically, the values at the table margins give the total count for the indicated characteristics. Thus:
- 12 persons smoked.
- Eight did not smoke.
- 11 had disease.
- Nine were disease free.
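The margins are just row and column sums of the cell counts. A minimal Python sketch, assuming the table is stored as a dictionary keyed by (row, column); the labels `"notS"` and `"notD"` are hypothetical stand-ins for $\bar{S}$ and $\bar{D}$.

```python
# Frequency table from the text. The labels "notS" and "notD" are
# hypothetical stand-ins for S-bar and D-bar.
counts = {
    ("S", "D"): 9, ("S", "notD"): 3,
    ("notS", "D"): 2, ("notS", "notD"): 6,
}
rows, cols = ["S", "notS"], ["D", "notD"]

# Row margins sum across a row; column margins sum down a column.
row_totals = {r: sum(counts[(r, c)] for c in cols) for r in rows}
col_totals = {c: sum(counts[(r, c)] for r in rows) for c in cols}
N = sum(counts.values())  # grand total

print(row_totals)  # {'S': 12, 'notS': 8}
print(col_totals)  # {'D': 11, 'notD': 9}
print(N)           # 20
```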



Some Notation

The following notation is useful in studying probability as it relates to contingency tables.
- The probability of observing an event A: $P(A)$
- The probability of observing an event A and an event B: $P(AB)$
- The probability of observing an event A or event B: $P(A \cup B)$
- The probability of observing event A given that you have observed event B: $P(A \mid B)$



Calculating Probabilities

Given that an observation is randomly drawn from the table above, we calculate:
- The probability of selecting a person who does not smoke: $P(\bar{S}) = \frac{8}{20} = .40$
- The probability of selecting someone who has the disease: $P(D) = \frac{11}{20} = .55$
- The probability of selecting someone who smokes and does not have the disease: $P(S\bar{D}) = \frac{3}{20} = .15$
- The probability of selecting a non-smoker who does not have the disease: $P(\bar{S}\bar{D}) = \frac{6}{20} = .30$

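One way to mirror this notation in code is to represent each event as the set of table cells in which it holds; a marginal event is a whole row or column and a joint event is a single cell. This Python sketch uses that representation as an illustrative assumption, with hypothetical labels `"notS"`/`"notD"` for $\bar{S}$/$\bar{D}$.

```python
from fractions import Fraction

# Cell counts from the frequency table; "notS"/"notD" stand for S-bar/D-bar.
counts = {("S", "D"): 9, ("S", "notD"): 3, ("notS", "D"): 2, ("notS", "notD"): 6}
N = sum(counts.values())  # 20


def P(event):
    """Probability of an event given as the set of table cells where it holds."""
    return Fraction(sum(counts[cell] for cell in event), N)


# Marginal events are whole rows or columns.
not_S = {cell for cell in counts if cell[0] == "notS"}
D = {cell for cell in counts if cell[1] == "D"}

print(float(P(not_S)))               # 0.4
print(float(P(D)))                   # 0.55
# Joint events are single cells.
print(float(P({("S", "notD")})))     # 0.15
print(float(P({("notS", "notD")})))  # 0.3
```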
Calculating Probabilities (continued)


- The probability of selecting someone who smokes or is without disease: $P(S \cup \bar{D}) = \frac{9+3+6}{20} = \frac{18}{20} = .90$
- The probability of selecting someone who has disease or is a non-smoker: $P(D \cup \bar{S}) = \frac{9+2+6}{20} = \frac{17}{20} = .85$
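With events stored as sets of cells, “or” becomes set union, which automatically counts each qualifying cell once. This Python sketch (same illustrative set-of-cells representation and hypothetical labels as above) also cross-checks the result against the general addition rule $P(A \cup B) = P(A) + P(B) - P(AB)$, a standard identity not stated in the slides.

```python
from fractions import Fraction

counts = {("S", "D"): 9, ("S", "notD"): 3, ("notS", "D"): 2, ("notS", "notD"): 6}
N = sum(counts.values())


def P(event):
    """Probability of an event given as the set of table cells where it holds."""
    return Fraction(sum(counts[cell] for cell in event), N)


S = {cell for cell in counts if cell[0] == "S"}
not_S = {cell for cell in counts if cell[0] == "notS"}
D = {cell for cell in counts if cell[1] == "D"}
not_D = {cell for cell in counts if cell[1] == "notD"}

# Set union keeps each qualifying cell once, so no count is added twice.
print(float(P(S | not_D)))  # 0.9
print(float(P(D | not_S)))  # 0.85

# Cross-check with the general addition rule P(A ∪ B) = P(A) + P(B) - P(AB).
assert P(S | not_D) == P(S) + P(not_D) - P(S & not_D)
```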



Calculating Probabilities (continued)


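For a conditional probability, the denominator is the total of the row or column being conditioned on rather than N:
- The probability of selecting someone with disease given the person selected is a smoker: $P(D \mid S) = \frac{9}{12} = .75$
- The probability of selecting someone who doesn’t smoke given they are disease free: $P(\bar{S} \mid \bar{D}) = \frac{6}{9} \approx .67$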
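Conditioning on B restricts attention to the cells of B, so $P(A \mid B) = N_{AB} / N_B$. A minimal Python sketch under the same illustrative set-of-cells representation; the helper names `n` and `P_given` are hypothetical.

```python
from fractions import Fraction

counts = {("S", "D"): 9, ("S", "notD"): 3, ("notS", "D"): 2, ("notS", "notD"): 6}


def n(event):
    """Number of observations in the cells that make up an event."""
    return sum(counts[cell] for cell in event)


def P_given(a, b):
    """P(A | B) = N_AB / N_B: only the row or column for B is considered."""
    return Fraction(n(a & b), n(b))


S = {cell for cell in counts if cell[0] == "S"}
not_S = {cell for cell in counts if cell[0] == "notS"}
D = {cell for cell in counts if cell[1] == "D"}
not_D = {cell for cell in counts if cell[1] == "notD"}

print(float(P_given(D, S)))                    # 0.75
print(round(float(P_given(not_S, not_D)), 2))  # 0.67
```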



Probability Tables

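$$
\begin{array}{c|cc|c}
 & D & \bar{D} & \text{Total} \\
\hline
S & .45 & .15 & .60 \\
\bar{S} & .10 & .30 & .40 \\
\hline
\text{Total} & .55 & .45 & 1.00
\end{array}
$$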

A more common form of contingency table is obtained by dividing each count in a frequency table by N in order to obtain probabilities. The probability table shown here was constructed in this manner from the frequency table shown previously, with N = 20 (for example, .45 = 9/20).
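The conversion from counts to probabilities is a single division per cell. This Python sketch assumes the frequency table is stored as a nested list (rows $S$, $\bar{S}$; columns $D$, $\bar{D}$) and uses exact fractions so that the margins sum to exactly 1.

```python
from fractions import Fraction

# Frequency table from the text: rows S, S-bar; columns D, D-bar.
freq = [[9, 3],
        [2, 6]]
N = sum(sum(row) for row in freq)  # 20

# Dividing every cell by N turns counts into probabilities.
prob = [[Fraction(x, N) for x in row] for row in freq]
row_margins = [sum(row) for row in prob]        # P(S), P(S-bar)
col_margins = [sum(col) for col in zip(*prob)]  # P(D), P(D-bar)

for row, margin in zip(prob, row_margins):
    print([float(x) for x in row], float(margin))
print([float(m) for m in col_margins])
# [0.45, 0.15] 0.6
# [0.1, 0.3] 0.4
# [0.55, 0.45]
```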



Probability Tables (continued)

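$$
\begin{array}{c|cc|c}
 & B & \bar{B} & \text{Total} \\
\hline
A & P(AB) & P(A\bar{B}) & P(A) \\
\bar{A} & P(\bar{A}B) & P(\bar{A}\bar{B}) & P(\bar{A}) \\
\hline
\text{Total} & P(B) & P(\bar{B}) & 1
\end{array}
$$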

Given arbitrary events A and B, the cell and marginal entries depicted here represent the values in a probability contingency table.



Probability Tables (continued)


Probabilities of the form $P(A \cup B)$ or $P(\bar{A} \cup B)$, for example, would be obtained by summing the appropriate cell entries.



Probability Tables (continued)

 
Thus
$$P(A \cup B) = P(AB) + P(A\bar{B}) + P(\bar{A}B)$$
and
$$P(\bar{A} \cup B) = P(AB) + P(\bar{A}B) + P(\bar{A}\bar{B})$$



Probability Tables (continued)

Conditional probabilities are calculated in the same manner as with frequency tables. Thus, for example,
$$P(A \mid B) = \frac{P(AB)}{P(B)}$$
and
$$P(\bar{B} \mid A) = \frac{P(A\bar{B})}{P(A)}$$

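Every quantity above is a sum or ratio of the four cells. The Python sketch below collects these rules in a small class for an arbitrary 2x2 probability table; the class `ProbTable2x2` and its method names are hypothetical, and the usage line plugs in the smoking table with A = S and B = D.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass
class ProbTable2x2:
    """Cell probabilities of a 2x2 table for arbitrary events A and B (hypothetical class)."""
    ab: Fraction         # P(AB)
    a_notb: Fraction     # P(A B-bar)
    nota_b: Fraction     # P(A-bar B)
    nota_notb: Fraction  # P(A-bar B-bar)

    def p_a(self):
        return self.ab + self.a_notb  # row margin P(A)

    def p_b(self):
        return self.ab + self.nota_b  # column margin P(B)

    def p_a_or_b(self):
        return self.ab + self.a_notb + self.nota_b

    def p_nota_or_b(self):
        return self.ab + self.nota_b + self.nota_notb

    def p_a_given_b(self):
        return self.ab / self.p_b()  # P(AB) / P(B)

    def p_notb_given_a(self):
        return self.a_notb / self.p_a()  # P(A B-bar) / P(A)


# Smoking table with A = S and B = D.
t = ProbTable2x2(*(Fraction(x) for x in ("0.45", "0.15", "0.10", "0.30")))
print(float(t.p_a_or_b()), float(t.p_nota_or_b()))                  # 0.7 0.85
print(round(float(t.p_a_given_b()), 3), float(t.p_notb_given_a()))  # 0.818 0.25
```

Note that `p_nota_or_b()` reproduces the .85 obtained earlier for $P(D \cup \bar{S})$ from the frequency table.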
Independence

Two events A and B are said to be independent if
$$P(A \mid B) = P(A)$$
or equivalently (provided $P(B) > 0$) if
$$P(AB) = P(A)\,P(B)$$



Independence (continued)

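$$
\begin{array}{c|cc|c}
 & B & \bar{B} & \text{Total} \\
\hline
A & .18 & .42 & .60 \\
\bar{A} & .12 & .28 & .40 \\
\hline
\text{Total} & .30 & .70 & 1.00
\end{array}
$$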

Q: Are A and B independent?
A: Yes.
Q: How do you know?
A: Because $P(A \mid B) = \frac{.18}{.30} = .60 = P(A)$, or equivalently $P(A)\,P(B) = (.60)(.30) = .18 = P(AB)$.
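Independence can be tested mechanically by comparing each cell with the product of its margins. This Python sketch assumes tables are dictionaries keyed by (row, column) with hypothetical labels such as `"notB"`; it uses exact fractions because floating-point products like .60 × .30 may not compare equal to .18. The smoking table from earlier is included as a contrasting case.

```python
from fractions import Fraction


def independent(table):
    """True if every cell equals (row margin) x (column margin), i.e. P(AB) = P(A)P(B)."""
    rows = {r for r, _ in table}
    cols = {c for _, c in table}
    p_row = {r: sum(table[(r, c)] for c in cols) for r in rows}
    p_col = {c: sum(table[(r, c)] for r in rows) for c in cols}
    return all(table[(r, c)] == p_row[r] * p_col[c] for r in rows for c in cols)


# Exact fractions avoid false "not equal" results from floating-point rounding.
ab_table = {("A", "B"): Fraction("0.18"), ("A", "notB"): Fraction("0.42"),
            ("notA", "B"): Fraction("0.12"), ("notA", "notB"): Fraction("0.28")}
smoking_table = {("S", "D"): Fraction("0.45"), ("S", "notD"): Fraction("0.15"),
                 ("notS", "D"): Fraction("0.10"), ("notS", "notD"): Fraction("0.30")}

print(independent(ab_table))       # True
print(independent(smoking_table))  # False
```

In the smoking table, $P(SD) = .45$ while $P(S)\,P(D) = (.60)(.55) = .33$, so smoking and disease are not independent.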



Sensitivity

Sensitivity is the probability that a person with the disease will test positive for that disease, or

$$\text{Sensitivity} = P(+ \mid D)$$



Sensitivity (continued)

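$$
\begin{array}{c|cc|c}
 & D & \bar{D} & \text{Total} \\
\hline
+ & .008 & .011 & .019 \\
- & .001 & .980 & .981 \\
\hline
\text{Total} & .009 & .991 & 1.000
\end{array}
$$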

$$\text{Sensitivity} = P(+ \mid D) = \frac{.008}{.009} \approx .89$$



Specificity

Specificity is the probability that a person who does not have the disease will test negative for the disease, or

$$\text{Specificity} = P(- \mid \bar{D})$$



Specificity (continued)



$$\text{Specificity} = P(- \mid \bar{D}) = \frac{.980}{.991} \approx .99$$
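Both measures condition on true disease status, so each divides a cell by a column total of the screening table. A minimal Python sketch, assuming the table is a dictionary keyed by (test result, disease status) with the hypothetical label `"notD"` for $\bar{D}$; ordinary floats are adequate here because the results are only reported rounded.

```python
# Screening-test probability table from the text; "notD" stands for D-bar.
p = {("+", "D"): 0.008, ("+", "notD"): 0.011,
     ("-", "D"): 0.001, ("-", "notD"): 0.980}

p_D = p[("+", "D")] + p[("-", "D")]           # column margin P(D) = .009
p_notD = p[("+", "notD")] + p[("-", "notD")]  # column margin P(D-bar) = .991

sensitivity = p[("+", "D")] / p_D             # P(+ | D)
specificity = p[("-", "notD")] / p_notD       # P(- | D-bar)

print(round(sensitivity, 2), round(specificity, 2))  # 0.89 0.99
```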



Positive Predictive Value

Positive predictive value is the probability that a person who tests positive for a disease has that disease, or

$$\text{PPV} = P(D \mid +)$$



PPV (continued)


$$\text{PPV} = P(D \mid +) = \frac{.008}{.019} \approx .42$$



Negative Predictive Value

Negative predictive value is the probability that a person who tests negative for a disease does not have the disease, or

$$\text{NPV} = P(\bar{D} \mid -)$$



NPV (continued)



$$\text{NPV} = P(\bar{D} \mid -) = \frac{.980}{.981} \approx .999$$



Prevalence

Prevalence is the probability of disease, or

$$\text{Prevalence} = P(D)$$



Prevalence (continued)


$$\text{Prevalence} = P(D) = .009$$
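The predictive values condition on the test result instead, so they divide by row totals, while prevalence is a column total. A minimal Python sketch using the same hypothetical dictionary layout as the sensitivity example.

```python
# Screening-test probability table from the text; "notD" stands for D-bar.
p = {("+", "D"): 0.008, ("+", "notD"): 0.011,
     ("-", "D"): 0.001, ("-", "notD"): 0.980}

p_pos = p[("+", "D")] + p[("+", "notD")]     # row margin P(+) = .019
p_neg = p[("-", "D")] + p[("-", "notD")]     # row margin P(-) = .981

ppv = p[("+", "D")] / p_pos                  # P(D | +)
npv = p[("-", "notD")] / p_neg               # P(D-bar | -)
prevalence = p[("+", "D")] + p[("-", "D")]   # column margin P(D)

print(round(ppv, 2), round(npv, 3), round(prevalence, 3))  # 0.42 0.999 0.009
```

With a prevalence of only .009, false positives (.011) outnumber true positives (.008), which is why the PPV is much lower than either the sensitivity or the specificity of this test.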



The Risk Ratio

The risk ratio (RR) is formed by dividing the probability of disease in some group exposed to a potential risk factor by the probability of disease in some group not so exposed, or

$$RR = \frac{P(D \mid E)}{P(D \mid \bar{E})}$$



Risk Ratio (continued)

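$$
\begin{array}{c|cc|c}
 & E & \bar{E} & \text{Total} \\
\hline
D & .15 & .10 & .25 \\
\bar{D} & .05 & .70 & .75 \\
\hline
\text{Total} & .20 & .80 & 1.00
\end{array}
$$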

$$RR = \frac{P(D \mid E)}{P(D \mid \bar{E})} = \frac{.15/.20}{.10/.80} = \frac{.750}{.125} = 6$$
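Each risk is the disease cell of an exposure column divided by that column's total. A minimal Python sketch, assuming the table is a dictionary keyed by (disease status, exposure) with hypothetical labels `"notD"`/`"notE"`; exact fractions make the ratio come out as exactly 6.

```python
from fractions import Fraction

# Risk-ratio table from the text: rows D, D-bar; columns E (exposed), E-bar.
p = {("D", "E"): Fraction("0.15"), ("D", "notE"): Fraction("0.10"),
     ("notD", "E"): Fraction("0.05"), ("notD", "notE"): Fraction("0.70")}


def risk(exposure):
    """P(D | exposure column) = disease cell / column total."""
    column_total = p[("D", exposure)] + p[("notD", exposure)]
    return p[("D", exposure)] / column_total


rr = risk("E") / risk("notE")
print(float(risk("E")), float(risk("notE")), rr)  # 0.75 0.125 6
```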



The Odds Ratio

The odds that an event will occur is the probability that the event will
occur divided by the probability that the event will not occur. Thus, the
odds of disease for some group exposed to a potential risk factor would be

$$\text{odds} = \frac{P(D \mid E)}{P(\bar{D} \mid E)}$$



The Odds Ratio (continued)

Likewise, the odds of disease for some group not exposed to some
potential risk factor would be

$$\text{odds} = \frac{P(D \mid \bar{E})}{P(\bar{D} \mid \bar{E})}$$



The Odds Ratio (continued)

The odds ratio (OR) is defined as the odds of disease for an exposed
group divided by the odds of disease for an unexposed group or

$$OR = \frac{P(D \mid E) \,/\, P(\bar{D} \mid E)}{P(D \mid \bar{E}) \,/\, P(\bar{D} \mid \bar{E})}$$



The Odds Ratio (continued)

which simplifies to


$$OR = \frac{P(D \mid E)\,P(\bar{D} \mid \bar{E})}{P(\bar{D} \mid E)\,P(D \mid \bar{E})}$$



Odds Ratio (continued)



$$OR = \frac{P(D \mid E)\,P(\bar{D} \mid \bar{E})}{P(\bar{D} \mid E)\,P(D \mid \bar{E})} = \frac{(.750)(.875)}{(.250)(.125)} = 21$$

This means the odds of disease in the exposed group is 21 times that of
the unexposed group.
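The odds ratio can be computed directly from its definition as a ratio of two odds. This Python sketch uses the same hypothetical dictionary layout as the risk-ratio example and, as a cross-check, also computes the cross-product of the cells, $\frac{P(DE)\,P(\bar{D}\bar{E})}{P(\bar{D}E)\,P(D\bar{E})}$, which agrees because the column totals cancel.

```python
from fractions import Fraction

# Table from the text: rows D, D-bar; columns E (exposed), E-bar.
p = {("D", "E"): Fraction("0.15"), ("D", "notE"): Fraction("0.10"),
     ("notD", "E"): Fraction("0.05"), ("notD", "notE"): Fraction("0.70")}


def odds(exposure):
    """Odds of disease within one exposure column: P(D | col) / P(D-bar | col)."""
    total = p[("D", exposure)] + p[("notD", exposure)]
    return (p[("D", exposure)] / total) / (p[("notD", exposure)] / total)


odds_ratio = odds("E") / odds("notE")

# The column totals cancel, so the OR also equals the cross-product of the cells.
cross_product = (p[("D", "E")] * p[("notD", "notE")]) / (p[("notD", "E")] * p[("D", "notE")])
print(odds_ratio, cross_product)  # 21 21
```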



Bayes’ Rule

In its simplest form, Bayes’ rule allows you to use $P(A \mid B)$ to find $P(B \mid A)$. The rule is expressed as

$$P(B \mid A) = \frac{P(A \mid B)\,P(B)}{P(A \mid B)\,P(B) + P(A \mid \bar{B})\,P(\bar{B})}$$
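As an illustration, Bayes’ rule recovers the positive predictive value of the screening example from the sensitivity, the prevalence, and the false-positive rate $P(+ \mid \bar{D}) = 1 - \text{Specificity} = .011/.991$. This Python sketch takes A to be a positive test and B to be disease; the function name `bayes` is hypothetical.

```python
from fractions import Fraction


def bayes(p_a_given_b, p_b, p_a_given_not_b):
    """P(B | A) from P(A | B), P(B) and P(A | B-bar), using P(B-bar) = 1 - P(B)."""
    numerator = p_a_given_b * p_b
    return numerator / (numerator + p_a_given_not_b * (1 - p_b))


# Screening example: A is a positive test (+) and B is disease (D).
sensitivity = Fraction(8, 9)        # P(+ | D) = .008 / .009
prevalence = Fraction(9, 1000)      # P(D) = .009
false_positive = Fraction(11, 991)  # P(+ | D-bar) = .011 / .991

ppv = bayes(sensitivity, prevalence, false_positive)
print(ppv, round(float(ppv), 2))    # 8/19 0.42
```

The result, 8/19 ≈ .42, matches the PPV read directly from the probability table as .008/.019.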
