Chapter Four: Introduction To Inference
4.1 Introduction
The techniques you will learn may be divided into two broad
categories.
1. Tests of hypotheses.
2. Confidence intervals.
Before you can begin their study you must understand the concept of
sampling distributions.
Using equation 4.1 on page 76, we calculate the standard errors of the
mean for sample sizes 10, 30 and 50 as follows,
\[ \frac{5.293}{\sqrt{10}} = 1.67. \]
\[ \frac{5.293}{\sqrt{30}} = .97. \]
\[ \frac{5.293}{\sqrt{50}} = .75. \]
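As a quick check of the arithmetic, the three standard errors can be recomputed with a short Python sketch. The function name `standard_error` is illustrative only, and 5.293 is assumed here to be the standard deviation that appears in the numerator of equation 4.1.

```python
import math


def standard_error(sd: float, n: int) -> float:
    """Standard error of the mean: sd divided by the square root of n."""
    return sd / math.sqrt(n)


# 5.293 is the value used on the slide (assumed to be the standard deviation).
for n in (10, 30, 50):
    print(n, round(standard_error(5.293, n), 2))
```

The standard error shrinks as the sample size grows, which is the pattern the three hand calculations illustrate.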
\[ Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} \]
The same values for the area between 103 and 100 are
\[ Z = \frac{103.0 - 100.0}{20.0 / \sqrt{100}} = 1.50 \]
and .4332.
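The Z score and the corresponding area can be checked numerically. The sketch below uses Python's standard-library `statistics.NormalDist` in place of a printed normal curve table; the helper name `z_for_mean` is hypothetical.

```python
import math
from statistics import NormalDist


def z_for_mean(xbar: float, mu: float, sigma: float, n: int) -> float:
    """Z score of a sample mean: (xbar - mu) / (sigma / sqrt(n))."""
    return (xbar - mu) / (sigma / math.sqrt(n))


z = z_for_mean(103.0, 100.0, 20.0, 100)

# Area between the mean (Z = 0) and z, as a normal curve table would report it.
area = NormalDist().cdf(z) - 0.5

print(round(z, 2), round(area, 4))
```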
If the population is large and certain other conditions are met, the
binomial distribution can be used to model the sampling distribution of p̂.
\[ P(y) = \frac{n!}{y!\,(n-y)!}\, \pi^{y} (1-\pi)^{n-y} \]
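\[
\begin{aligned}
P(0) &= \frac{5!}{0!\,(5-0)!}\, (.10)^{0} (1-.10)^{5-0} \\
     &= \frac{5!}{0!\,5!}\, (.10)^{0} (.90)^{5} \\
     &= (.90)^{5} \\
     &= .59049.
\end{aligned}
\]
\[
\begin{aligned}
P(1) &= \frac{5!}{1!\,(5-1)!}\, (.10)^{1} (1-.10)^{5-1} \\
     &= \frac{5 \cdot 4!}{1!\,4!}\, (.10)^{1} (.90)^{4} \\
     &= (5)(.10)(.6561) \\
     &= .32805.
\end{aligned}
\]
\[
\begin{aligned}
P(2) &= \frac{5!}{2!\,(5-2)!}\, (.10)^{2} (1-.10)^{5-2} \\
     &= \frac{5 \cdot 4 \cdot 3!}{2!\,3!}\, (.10)^{2} (.90)^{3} \\
     &= (10)(.01)(.729) \\
     &= .0729.
\end{aligned}
\]
\[
\begin{aligned}
P(3) &= \frac{5!}{3!\,(5-3)!}\, (.10)^{3} (1-.10)^{5-3} \\
     &= \frac{5 \cdot 4 \cdot 3 \cdot 2!}{3!\,2!}\, (.10)^{3} (.90)^{2} \\
     &= (10)(.001)(.81) \\
     &= .0081.
\end{aligned}
\]
\[
\begin{aligned}
P(4) &= \frac{5!}{4!\,(5-4)!}\, (.10)^{4} (1-.10)^{5-4} \\
     &= \frac{5 \cdot 4!}{4!\,1!}\, (.10)^{4} (.90)^{1} \\
     &= (5)(.0001)(.90) \\
     &= .00045.
\end{aligned}
\]
\[
\begin{aligned}
P(5) &= \frac{5!}{5!\,(5-5)!}\, (.10)^{5} (1-.10)^{5-5} \\
     &= \frac{5!}{5!\,0!}\, (.10)^{5} (.90)^{0} \\
     &= (.10)^{5} \\
     &= .00001.
\end{aligned}
\]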
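The six hand calculations all apply the same binomial formula, so they can be reproduced in a few lines. This is a minimal sketch using `math.comb` from the Python standard library for the factorial ratio; the function name `binomial_pmf` is an assumption made for illustration.

```python
from math import comb


def binomial_pmf(y: int, n: int, pi: float) -> float:
    """P(y) = n! / (y! (n - y)!) * pi**y * (1 - pi)**(n - y)."""
    return comb(n, y) * pi**y * (1 - pi) ** (n - y)


# n = 5 trials with success probability pi = .10, as in the calculations above.
probs = [binomial_pmf(y, 5, 0.10) for y in range(6)]

for y, p in enumerate(probs):
    print(y, round(p, 5))
```

Because the six outcomes exhaust the possibilities for five trials, the probabilities sum to one, which is a useful sanity check on any hand calculation of this kind.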
Given that 10% of the residents of the United States would test positive
for a certain antibody, what is the probability of randomly selecting five
residents of the United States and finding that
all five test positive for the antibody?
at least four (i.e., four or more) will test positive?
at least one will be positive?
Proportion    Number of Successes    Probability
p̂             y                      P(y)
 .00          0                      .59049
 .20          1                      .32805
 .40          2                      .07290
 .60          3                      .00810
 .80          4                      .00045
1.00          5                      .00001
The probability that all five residents test positive is P(5) = .00001.
The probability that at least four test positive is
\[ P(4) + P(5) = .00045 + .00001 = .00046. \]
The probability that at least one tests positive is
\[ P(1) + P(2) + P(3) + P(4) + P(5) = 1 - P(0) = 1 - .59049 = .40951. \]
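The three answers can be sketched in code as sums over the binomial probabilities. The variable names below are illustrative; the complement rule 1 − P(0) is used for "at least one", as in the text.

```python
from math import comb


def binomial_pmf(y: int, n: int, pi: float) -> float:
    """Binomial probability of exactly y successes in n trials."""
    return comb(n, y) * pi**y * (1 - pi) ** (n - y)


n, pi = 5, 0.10

p_all_five = binomial_pmf(5, n, pi)
p_at_least_four = binomial_pmf(4, n, pi) + binomial_pmf(5, n, pi)
# Complement rule: "at least one" is everything except "none".
p_at_least_one = 1 - binomial_pmf(0, n, pi)

print(round(p_all_five, 5), round(p_at_least_four, 5), round(p_at_least_one, 5))
```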
4.2 Sampling Distributions
Example
Given a population proportion of .38, the probability that the sample will
contain 9 or 10 donors with type O positive blood is P (9) + P (10).
\[
\begin{aligned}
P(9) &= \frac{10!}{9!\,(10-9)!}\, (.38)^{9} (1-.38)^{10-9} \\
     &= \frac{10 \cdot 9!}{9!\,1!}\, (.38)^{9} (.62)^{1} \\
     &= (10)(.00017)(.62) \\
     &= .00105
\end{aligned}
\]
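For comparison, the same example can be evaluated without intermediate rounding. The sketch below computes both P(9) and P(10) for n = 10 and π = .38; the function name is an assumption. Note that the hand calculation rounds (.38)^9 to .00017 before multiplying, so the unrounded P(9) comes out slightly smaller, at roughly .00102.

```python
from math import comb


def binomial_pmf(y: int, n: int, pi: float) -> float:
    """Binomial probability of exactly y successes in n trials."""
    return comb(n, y) * pi**y * (1 - pi) ** (n - y)


n, pi = 10, 0.38

p9 = binomial_pmf(9, n, pi)
p10 = binomial_pmf(10, n, pi)   # reduces to pi**10

# Probability of 9 or 10 donors with type O positive blood.
p_9_or_10 = p9 + p10

print(round(p9, 5), round(p10, 5), round(p_9_or_10, 5))
```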
When sample size is sufficiently large, the normal curve can be used
to approximate the sampling distribution of p̂.
The question as to how large a sample must be in order to obtain an
adequate approximation cannot be answered definitively.
An often-used rule of thumb states that the normal curve
approximation will be satisfactory so long as both nπ and n(1 − π)
are greater than or equal to five, though some authors maintain that
these values should be greater than or equal to 10.
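The rule of thumb translates directly into a small check. This is a sketch only: the function `normal_approx_ok` and its `threshold` parameter are hypothetical names, and the values n = 50 and π = .10 are those of the worked example in this section.

```python
def normal_approx_ok(n: int, pi: float, threshold: float = 5.0) -> bool:
    """Rule-of-thumb check: both n*pi and n*(1 - pi) must reach the threshold."""
    return n * pi >= threshold and n * (1 - pi) >= threshold


# n*pi = 5 and n*(1 - pi) = 45 for n = 50, pi = .10.
print(normal_approx_ok(50, 0.10))         # lenient version of the rule (>= 5)
print(normal_approx_ok(50, 0.10, 10.0))   # stricter version (>= 10)
```

With n = 50 and π = .10 the lenient version of the rule is only just satisfied, and the stricter version is not, which illustrates why the question of adequate sample size has no definitive answer.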
\[ Z = \frac{\hat{p} - \pi}{\sqrt{\dfrac{\pi(1-\pi)}{n}}} \]
The estimated probability will be the area under a normal curve with
mean .10 that lies above .13.
Because the proportion of successes can only take values .00, .02, .04,
. . . , .12, .14, . . . , 1.00, the upper real limit of the .12 interval (i.e.,
.13) is used rather than .12.
The upper limit is employed because the problem is to find the
probability that the proportion of successes is greater than .12. The
lower limit would have been used if the problem required the
probability of obtaining a proportion of .12 or greater.
We now calculate
\[ Z = \frac{\hat{p} - \pi}{\sqrt{\dfrac{\pi(1-\pi)}{n}}} = \frac{.13 - .10}{\sqrt{\dfrac{(.10)(.90)}{50}}} = \frac{.03}{.0424} = .71. \]
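The calculation can be sketched in Python, using the upper real limit .13 as the continuity-corrected cutoff. The helper name `z_for_proportion` is an assumption, and `statistics.NormalDist` stands in for the normal curve table. Because Z is not rounded to two decimals before the lookup, the tail area may differ from a table-based answer in the third decimal place.

```python
import math
from statistics import NormalDist


def z_for_proportion(p_hat: float, pi: float, n: int) -> float:
    """Z score of a sample proportion: (p_hat - pi) / sqrt(pi * (1 - pi) / n)."""
    return (p_hat - pi) / math.sqrt(pi * (1 - pi) / n)


# Upper real limit of the .12 interval, with pi = .10 and n = 50.
z = z_for_proportion(0.13, 0.10, 50)

# Estimated probability that the proportion of successes exceeds .12.
upper_tail = 1 - NormalDist().cdf(z)

print(round(z, 2), round(upper_tail, 4))
```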
The estimate will be the area between the lower real limit of .11 and
the upper real limit of .13.
As calculated previously, the Z score for .13 is .71 while that for .11 is
\[ Z = \frac{.11 - .10}{\sqrt{\dfrac{(.10)(.90)}{50}}} = \frac{.01}{.0424} = .24. \]
Using these values in the normal curve table shows that the areas
between .13 and .10 and .11 and .10 are .2611 and .0948 respectively.
The area between .11 and .13 is then .2611 − .0948 = .1663.
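The same area can be approximated in code by differencing the normal cumulative distribution at the two real limits. This is a sketch under the same assumptions as the hand calculation (π = .10, n = 50); the names are illustrative. The result agrees with the table-based .1663 to about three decimal places, with the small gap coming from rounding Z to two decimals in the table lookup.

```python
import math
from statistics import NormalDist


def z_for_proportion(p_hat: float, pi: float, n: int) -> float:
    """Z score of a sample proportion under the normal approximation."""
    return (p_hat - pi) / math.sqrt(pi * (1 - pi) / n)


pi, n = 0.10, 50
z_lo = z_for_proportion(0.11, pi, n)   # lower real limit
z_hi = z_for_proportion(0.13, pi, n)   # upper real limit

std_normal = NormalDist()
area = std_normal.cdf(z_hi) - std_normal.cdf(z_lo)

print(round(z_lo, 2), round(z_hi, 2), round(area, 4))
```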
The normal curve table does not contain Z values of this magnitude
but it can be safely concluded that the probability is less than .0002.
(This is the tail area associated with Z = 3.50 which is the most
extreme score in the table.)
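When the table runs out, the tail area can be computed directly. The sketch below evaluates the upper tail at Z = 3.50, the table's most extreme entry, and at a larger value. The Z score for the problem itself is not shown here, so `z_example = 4.0` is purely a placeholder.

```python
from statistics import NormalDist


def upper_tail(z: float) -> float:
    """Area under the standard normal curve above z."""
    return 1 - NormalDist().cdf(z)


tail_at_table_max = upper_tail(3.50)   # about .0002, as the text states

z_example = 4.0                        # placeholder, not a value from the slides
tail_example = upper_tail(z_example)

print(round(tail_at_table_max, 6), tail_example < tail_at_table_max)
```

Any Z score beyond 3.50 yields a tail area smaller than the one at 3.50, which is the basis for the "less than .0002" conclusion.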