CHAPTER 3
Key Principles Underlying Statistical Inference: Probability and the Normal Distribution
OBJECTIVES
After studying this chapter, you should be able to:
1. Explain the importance of probability theory for statistical inference.
2. Define the characteristics of a probability measure, and explain the difference between a theoretical probability distribution and an empirical probability distribution (a priori vs. a posteriori).
3. Compute marginal, joint, and conditional probabilities from a cross-tabulation table and correctly interpret their meaning.
4. Define and derive sensitivity, specificity, predictive value, and efficiency from a cross-tabulation table.
5. Identify and describe the characteristics of a normal distribution.
6. Use a standard normal distribution to obtain z-scores and percentiles.
7. Explain the importance of the central limit theorem.
measurements on the same group that are taken over time (e.g., to test how a weight-loss program is performing). In all these cases, researchers use statistical inference as their tool for obtaining information from a sample of data about the population from which the sample is drawn. Statistical inference uses probability to help researchers assess the meaning of their findings. A particularly important probability distribution for statistical inference in health-related research is the normal (or Gaussian) distribution. This chapter focuses on the fundamental concepts of probability and the normal distribution, both of which are crucial to understanding the statistical techniques presented in the subsequent chapters of this book.

Estimating Population Probabilities Using Research

One example of using statistical inference to draw conclusions about a population from a sample comes from a recent population-based study of an urban area using the New York City Health and Nutrition Examination Survey (NYCHANES). In this study, the authors found that the prevalence of diabetes (diagnosed and undiagnosed combined) among adults aged 20 and above was 12.5% (Thorpe et al., 2009). Population-based studies have also been used to estimate the baseline prevalence of an exposure or disease. For example, to determine the efficacy of the human papillomavirus (HPV) vaccine, the baseline population prevalence of HPV must be determined. Using biological samples collected as part of the 2003 to 2004 NHANES cycle, researchers estimated that the overall prevalence of HPV was 28.6% among women aged 14 to 59 years (Dunne et al., 2007).

PROBABILITY: THE MATHEMATICS THAT UNDERLIES STATISTICS

An understanding of probability is critical to an understanding of statistical inference. Competency in probability is needed to comprehend statistical significance (e.g., interpreting p-values), read cross-tabulation tables, and understand frequency distributions, all of which are used extensively in health care research (see Chapters 2 and 10). In particular, correctly reading cross-tabulation tables (commonly referred to as “cross-tabs”) is a critical skill for researchers and for clinicians and administrators who need to understand the research literature. Using cross-tabs requires an understanding of joint, conditional, and marginal probabilities. Thus, after a brief discussion of the definitions and concepts needed to understand probability, we illustrate the principles of probability with examples from cross-tabulation tables.

Defining Probability

The general concept of objective probability can be divided into two categories: a priori (theoretical or classical) probability and a posteriori (empirical or relative frequency) probability (Daniel, 2008; Mood, Graybill, & Boes, 1974). In theoretical probability, the distribution of events can be inferred without collecting data. For example, we can compute the probability of getting “heads or tails” on a coin flip without actually flipping the coin. In empirical probability, data must be collected by some process, and the probability that each event will occur must be estimated from the data. In health care research, empirical probability is used when reporting characteristics of a sample (e.g., 35% of the sample was female), and classical probability (e.g., theoretical probability distributions) is used when making statistical inferences about the data.

Probability provides a numerical measure of uncertainty by giving a precise measurement of the likelihood that an event will occur. An event can be as simple as a single outcome (e.g., one card is picked out of a deck of cards), or it can be composed of a set of outcomes (e.g., five cards are picked out of a deck of cards). It can be an event from which results are inferred (e.g., a coin flip) or an event for which data need to be collected (e.g., the percentage of premature births in a hospital). Uncertain events are those that may or may not occur. For example, there is a small chance that a lottery ticket will be a winner (i.e., the “event”), but there is a much larger chance that it will not be a winner. People who purchase lottery tickets are essentially willing to pay for the uncertainty (the probability) of winning.

Several definitions, notations, and formulas, all of which are used throughout this chapter, are useful for understanding probability (Table 3-1). Two of these ideas, sample space and probability distribution, are particularly critical when using statistics. Simply put, the sample space is the set of all possible outcomes of a study. For example, if a coin is flipped, the sample space has two possible outcomes: heads and tails. If a six-sided die is rolled, the sample space has six possible outcomes. Similarly, the sample space for gender has two outcomes: female and male.

A probability distribution is the set of probabilities associated with each possible outcome in the sample space. The probability distribution of a variable can be expressed as a table, graph, or formula. The key is that it specifies all the possible values of a random variable and their respective probabilities (Daniel, 2008). The probability distributions computed in Chapter 2 are examples of empirical probability distributions. The probability distributions used in inferential statistics (e.g., the normal, binomial, chi-square, and Student’s t distributions) are examples of theoretical probability distributions.

Probability theory is based on the three axioms stated by Kolmogorov (1956). These axioms are illustrated by examples in the next section:

1. The probability that each event will occur must be greater than or equal to 0 and less than or equal to 1.
2. The sum of the probabilities of all the mutually exclusive outcomes of the sample space is equal to 1. Mutually exclusive outcomes are those that cannot occur at the same time (e.g., on any given flip, a coin can be either heads or tails but not both).
3. The probability that either of two mutually exclusive events, A or B, will occur is the sum of their individual probabilities.

Table 3-1
PROBABILITY SYMBOLS AND DEFINITIONS
Symbol or Term | Meaning
Sample space | The set of all possible outcomes of a study
Probability distribution | The set of probabilities associated with each event in the sample space
$p(A \cap B)$ | The joint probability that both events A and B will occur; also called the intersection of A and B
$p(A \cup B)$ | The probability that event A, event B, or both will occur; also called the union of A and B
Addition rule | $p(A \cup B) = p(A) + p(B) - p(A \cap B)$
Multiplication rule | $p(A \cap B) = p(B) \times p(A \mid B)$
Independence of events A and B | If $p(A) = p(A \mid B)$, then A and B are independent

Using Probability: Interpreting a Cross-Tabulation Table

In this section, we illustrate the different types of probability using a cross-tabulation table from a study of emergency department (ED) services for sexual assault victims in Virginia (Table 3-2) (Plichta, Vandecar-Burdin, Odor, Reams, & Zhang, 2007). This study examined the following question: Does having a forensic nurse trained to assist victims of sexual violence on staff in an ED affect the probability that a hospital will have a relationship with a rape crisis center? Some experts thought that having a forensic nurse on staff might actually reduce the chance that an ED would have a connection with a rape crisis center. The two variables of interest here are “forensic nurse on staff” (yes/no) and “relationship with rape crisis center” (yes/no). A total of 53 EDs provided appropriate responses to these two questions. Of these, 33 EDs had a forensic nurse on staff, and 41 EDs had a relationship with a rape crisis center.

Table 3-2
CROSS-TABULATION TABLE: FORENSIC NURSE ON EMERGENCY DEPARTMENT STAFF BY EMERGENCY DEPARTMENT LINKAGE TO A RAPE CRISIS CENTER

Marginal Probability

The marginal probability is simply the number of times the event occurred divided by the total number of times that it could have occurred. When using relative frequency probability, the probability of an event is the number of times the event occurred divided by

that the ED will have a relationship with a rape crisis center is

$$p(B) = \frac{41}{53} = .7736$$

In other words, 77.36% of the hospitals have such a relationship, and the probability that the ED will not have a relationship with a rape crisis center is

$$p(\text{not } B) = \frac{12}{53} = .2264$$
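The marginal probabilities above and the rules in Table 3-1 can be checked with a short Python sketch. The ED counts come from the text: 53 EDs in total, 33 with a forensic nurse on staff, and 41 with a relationship with a rape crisis center. The joint counts inside Table 3-2 are not reproduced here. For that reason, the addition rule, the multiplication rule, and the independence check are run on a hypothetical fair-die example instead, which is an a priori distribution. Its events A and B are chosen purely for illustration. The helper names `prob` and `marginal` are our own and do not come from the authors.

```python
from fractions import Fraction

# --- A priori (theoretical) probability: a fair six-sided die (hypothetical example) ---
sample_space = {1, 2, 3, 4, 5, 6}
p_outcome = {x: Fraction(1, 6) for x in sample_space}  # each outcome equally likely

# Axioms 1 and 2: every probability lies in [0, 1], and all outcomes sum to 1
assert all(0 <= p <= 1 for p in p_outcome.values())
assert sum(p_outcome.values()) == 1

def prob(event):
    """Probability of an event (a set of outcomes): sum of its mutually exclusive outcomes (axiom 3)."""
    return sum(p_outcome[x] for x in event)

A = {2, 4, 6}  # hypothetical event A: an even number is rolled
B = {1, 2, 3}  # hypothetical event B: a number of 3 or less is rolled

p_A, p_B = prob(A), prob(B)
p_A_and_B = prob(A & B)         # joint probability (intersection {2})
p_A_or_B = prob(A | B)          # union {1, 2, 3, 4, 6}
p_A_given_B = p_A_and_B / p_B   # conditional probability p(A|B)

# Addition rule and multiplication rule from Table 3-1
assert p_A_or_B == p_A + p_B - p_A_and_B
assert p_A_and_B == p_B * p_A_given_B
print("p(A|B) =", p_A_given_B, "vs p(A) =", p_A)  # p(A|B) = 1/3 vs p(A) = 1/2
print("independent:", p_A == p_A_given_B)          # independent: False

# --- A posteriori (relative frequency) probability: ED counts from the text ---
def marginal(count, total):
    """Marginal probability: times the event occurred / times it could have occurred."""
    return count / total

n_eds = 53
p_nurse = marginal(33, n_eds)   # forensic nurse on staff
p_crisis = marginal(41, n_eds)  # p(B): relationship with a rape crisis center
p_no_crisis = 1 - p_crisis      # complement: p(B) + p(not B) = 1

print(f"p(nurse) = {p_nurse:.4f}")       # p(nurse) = 0.6226
print(f"p(B) = {p_crisis:.4f}")          # p(B) = 0.7736
print(f"p(not B) = {p_no_crisis:.4f}")   # p(not B) = 0.2264
```

`Fraction` keeps the die probabilities exact, so the equality checks for the rules hold without floating-point rounding. The ED calculation uses ordinary floats because those values are reported to four decimals.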
FIGURE 3-1 Normal distribution with standard deviation units. On each side of the mean, approximately 34%, 14%, and 2% of the area lie in the first, second, and third standard deviation bands (from −3σ to +3σ).
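The band percentages in Figure 3-1 follow from the cumulative distribution function of the normal curve. The sketch below is a minimal illustration that assumes only Python's standard library. It builds the standard normal CDF from `math.erf`, and the helper name `normal_cdf` is hypothetical.

```python
import math

def normal_cdf(z):
    """Cumulative probability of the standard normal distribution (mean 0, SD 1) at z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Area in each 1-SD band on one side of the mean (the other side is symmetric)
bands = {}
for k in (1, 2, 3):
    bands[k] = 100 * (normal_cdf(k) - normal_cdf(k - 1))
    print(f"between {k - 1} and {k} SD: {bands[k]:.2f}%")

# Central areas: within ±1, ±2, and ±3 SD of the mean
for k in (1, 2, 3):
    print(f"within ±{k} SD: {100 * (normal_cdf(k) - normal_cdf(-k)):.2f}%")
```

The three bands come out to about 34.13%, 13.59%, and 2.14%, which round to the 34%, 14%, and 2% shown in the figure. The central area within ±2 SD is about 95.45%. This is the roughly 95% range that underlies the ±2 cutoff for abnormal laboratory values.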
STANDARD NORMAL DISTRIBUTION AND UNDERSTANDING Z-SCORES

The standard normal distribution is a particularly useful form of the normal distribution in which the mean is 0 and the SD is 1. Data points in any normally distributed data set can be converted to a standard normal distribution by transforming the data points into z-scores. The z-score shows how many SDs a given score lies from the mean. These z-scores are very useful because they follow a known probability distribution. This allows for the computation of percentile ranks and the assessment of the extent to which a given data point differs from the rest of a data set.

Understanding and Using z-Scores

A z-score measures the number of SDs that an actual value lies from the mean; it can be positive or negative. A data point with a positive z-score
has a value that is above (to the right of) the mean, and a negative z-score indicates that the value is below (to the left of) the mean. Knowing the probability that someone will score below, at, or above the mean on a test can be very useful. In health care, the criteria that typically define laboratory test results (e.g., glucose, thyroid, and electrolytes) as abnormal are based on the standard normal distribution. Values that fall outside the central 95% of the distribution are flagged. In particular, results with a z-score of 2 or greater in absolute value (representing very large and very small values) are defined as abnormal (Seaborg, 2007).

Using Z-Scores to Compute Percentile Ranks

Percentiles allow us to describe a given score in relation to other scores in a distribution. A percentile tells us the relative position of a given score and allows us to compare scores on tests that have different means and SDs. A percentile rank is calculated as

$$\text{Percentile rank} = \frac{\text{Number of scores less than a given score}}{\text{Total number of scores}} \times 100 \qquad (3\text{-}15)$$

Suppose you received a score of 90 on a test given to a class of 50 people. Of your classmates, 40 had scores lower than 90. Your percentile rank would be

$$\frac{40}{50} \times 100 = 80$$

You achieved a higher score than 80% of the people who took the test, which also means that almost 20% of those who took the test did better than you.

To compute the percentile rank of a single value (data point) of a variable from a normal distribution, a z-score is first computed, and then that z-score is located in a z-table to obtain the percentile rank. The equation to calculate z-scores is

$$z = \frac{x - \mu}{\sigma} \qquad (3\text{-}16)$$

where x is the value of the data point, μ is the mean, and σ is the SD.

For example, the ages of the 62 young women who participated in a program run by an adolescent health clinic are listed in Table 3-3. The ages range from 10 to 22 years and are roughly normally distributed. The average age is 16 years, with an SD of 2.94. To find out what percentage of the girls are 14 years of age or younger, the percentile rank of 14 is found by computing the z-score of 14 using Equation 3-16:

$$z = \frac{14 - 16}{2.94} = -0.6803$$

After this value is computed, we take the absolute value (|−0.6803| = 0.6803) and look up this z-score in the z-table, which is also called the “Table of the Area under the Normal Curve” (see Appendix B). First, find the first digit and first decimal place (0.6) in the column labeled “z.” Then, across the top of the table, find the hundredths place; in this case, .08. The number located at the intersection of that row and column (25.17) is the percentage of the area under the curve between the z-score of 0.68 and the mean (recall that the mean in the standard normal distribution is 0). A positive z-score is above the mean, whereas a negative z-score falls below the mean.

The z-score table provides the z-scores for the positive side of the distribution, but

Table 3-3
AGES OF PARTICIPANTS IN A PROGRAM FOR ADOLESCENT GIRLS
Age (Years)
10 20 18
11 11 19
12 12 12
13 13 13
14 14 14
15 15 15
15 15 15
15 15 15
16 16 16
16 16 16
17 17 17
17 17 17
18 18 18
18 18 13
19 19 14
19 10 15
20 20 15
21 21 16
22 22 16
17 18
20 11
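Equations 3-15 and 3-16 and the z-table lookup can be sketched in Python. This is an illustration under stated assumptions, not the authors' procedure. The normal CDF built from `math.erf` stands in for the printed z-table in Appendix B, and |z| is rounded to two decimals to mimic reading that table. The function names are hypothetical.

```python
import math

def percentile_rank(n_below, n_total):
    """Equation 3-15: percentage of scores that fall below a given score."""
    return n_below * 100 / n_total

def z_score(x, mu, sigma):
    """Equation 3-16: number of SDs that x lies from the mean."""
    return (x - mu) / sigma

def normal_cdf(z):
    """Standard normal cumulative probability (area to the left of z)."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Test-score example: 40 of 50 scores were below 90
print(percentile_rank(40, 50))  # 80.0

# Adolescent ages example: mean 16 years, SD 2.94
z = z_score(14, 16, 2.94)
print(f"z = {z:.4f}")  # z = -0.6803

# z-table-style lookup: area between the mean and |z|, with |z| rounded to 2 decimals
z_table = round(abs(z), 2)  # 0.68
area_mean_to_z = 100 * (normal_cdf(z_table) - 0.5)
print(f"area between mean and z = {area_mean_to_z:.2f}%")  # area between mean and z = 25.17%

# z is negative, so the area below it is the left half (50%) minus that area
pct_at_or_below = 50 - area_mean_to_z
print(f"percent at or below 14 = {pct_at_or_below:.2f}%")
```

The last step relies on the symmetry of the normal curve. Half the area lies below the mean, so the share below a negative z-score is 50% minus the tabled area between the mean and |z|.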