Chapter Four: Introduction To Inference

4.1 Introduction
In this chapter you will learn the rationale underlying inference.

You will also learn to apply certain inferential techniques.
The methods introduced in this chapter are not commonly employed
in research but are important pedagogically.
They are relatively simple and their mastery will open the way for
understanding the more complex methods dealt with in following
chapters.

4.1 Introduction (continued)
The techniques you will learn may be divided into two broad
categories.
1 Tests of hypotheses.
2 Confidence Intervals.
Before you can begin their study you must understand the concept of
sampling distributions.

4.2 Sampling Distributions: Definition
A sampling distribution is a distribution of sample statistics obtained
from samples repeatedly drawn from one or more populations.

Sampling Distribution of x̄
The sampling distribution of x̄ can be formed by taking repeated samples
from some population, calculating x̄ for each sample, and forming the
resultant sample means into a relative frequency distribution.

Characteristics of the Sampling Dist. of x̄
The following characteristics of the sampling distribution of x̄ should be
noted.
1 The mean of the sampling distribution is equal to the mean of the
population from which the samples were drawn.
2 The mean of the sampling distribution of some statistic is referred to
as the expected value of the statistic and is symbolized by E []
where [] contains an identifier of the statistic.

Characteristics (continued)
3 E[x̄] = µ. This is a restatement of characteristic (1) given above.
4 The standard deviation of the sampling distribution of x̄ is termed the
standard error of the mean and is symbolized by σx̄.
5 $\sigma_{\bar{x}} = \sigma / \sqrt{n}$, where σ is the population standard
deviation and n is the sample size.

Characteristics (continued)
6 When the population from which samples are drawn is normally
distributed, the sampling distribution of x̄ will also be normally
distributed.
7 When the population from which samples are drawn is not normally
distributed, the sampling distribution of x̄ will approach normality as
sample size (n) increases. This is an expression of the central limit
theorem.
8 Roughly speaking, the central limit theorem states that the
sampling distributions of certain classes of statistics will approach
normality as sample size (n) increases regardless of the shape of the
sampled population.
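
Characteristics (1), (5) and (7) can be checked with a small simulation. The sketch below is only an illustration under stated assumptions: the exponential population (mean 1, standard deviation 1), the sample size of 30, the 5000 repetitions and the function name are all choices made for this example, not part of the chapter.

```python
import random
import statistics


def simulate_sample_means(n, num_samples=5000, seed=0):
    """Draw repeated samples of size n from a skewed (exponential)
    population and return the list of sample means."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    means = []
    for _ in range(num_samples):
        sample = [rng.expovariate(1.0) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return means


# Assumed population: exponential with rate 1, so mu = 1 and sigma = 1.
n = 30
means = simulate_sample_means(n)
mean_of_means = statistics.fmean(means)   # should be close to mu
sd_of_means = statistics.pstdev(means)    # should be close to sigma / sqrt(n)
predicted_se = 1.0 / n ** 0.5

print(f"mean of sample means: {mean_of_means:.3f}")
print(f"sd of sample means:   {sd_of_means:.3f}")
print(f"sigma / sqrt(n):      {predicted_se:.3f}")
```

A histogram of `means` should look roughly bell shaped even though the sampled population is strongly skewed, which is what the central limit theorem leads us to expect.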

Example
Given a population with standard deviation 5.293, find the standard
deviations of sampling distributions generated from this population when
samples are of sizes 10, 30 and 50.

Solution
Using equation 4.1 on page 76, we calculate the standard errors of the
mean for sample sizes 10, 30 and 50 as follows.
$$\frac{5.293}{\sqrt{10}} = 1.67, \qquad \frac{5.293}{\sqrt{30}} = .97, \qquad \frac{5.293}{\sqrt{50}} = .75.$$
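
The same arithmetic can be reproduced in a few lines of code. This is a minimal sketch; the function name `standard_error_of_mean` is an assumption made for the example.

```python
import math


def standard_error_of_mean(sigma, n):
    """Standard error of the mean: sigma divided by the square root of n."""
    return sigma / math.sqrt(n)


sigma = 5.293
standard_errors = {
    n: round(standard_error_of_mean(sigma, n), 2) for n in (10, 30, 50)
}
print(standard_errors)  # {10: 1.67, 30: 0.97, 50: 0.75}
```

Note how the standard error shrinks as n grows: larger samples produce sample means that cluster more tightly around µ.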

Using The Normal Curve
Just as you used the normal curve model to estimate probabilities
associated with the selection of a single observation from a population, so
too you can use this model to estimate probabilities associated with the
means of samples selected from a population.

Z Score
Z scores associated with sample means are calculated as follows.
$$Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$

Example
Given a population with mean 110.023 and standard deviation 4.970,
estimate the probability of randomly selecting a sample of 15 observations
and finding that the mean of the sample is greater than 111.

Solution
The Z score for a mean of 111.0 is
$$Z = \frac{111.0 - 110.023}{4.970 / \sqrt{15}} = .76$$
The associated tail area is .2236, which is our estimated probability.
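
As a cross-check, the Z score and tail area can be computed with Python's standard library instead of a printed table. This is a sketch only; the helper name `z_for_sample_mean` is an assumption. Rounding Z to two decimals first mimics a table lookup; the unrounded Z gives a slightly different area (about .2232).

```python
import math
from statistics import NormalDist


def z_for_sample_mean(xbar, mu, sigma, n):
    """Z score of a sample mean: (xbar - mu) / (sigma / sqrt(n))."""
    return (xbar - mu) / (sigma / math.sqrt(n))


z = z_for_sample_mean(111.0, 110.023, 4.970, 15)

# Upper-tail area using the unrounded Z score.
upper_tail = 1.0 - NormalDist().cdf(z)

# Upper-tail area using Z rounded to two decimals, as a table would.
upper_tail_table = 1.0 - NormalDist().cdf(round(z, 2))

print(f"Z = {z:.4f}")
print(f"tail area (unrounded Z): {upper_tail:.4f}")
print(f"tail area (Z = {round(z, 2)}): {upper_tail_table:.4f}")
```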

Example
Suppose 100 observations are randomly selected from a population whose
mean and standard deviation are respectively 100 and 20. What is the
probability that the mean of these observations will be between 99 and
103?

Solution
The area of a normal curve with mean 100 and standard deviation
$20/\sqrt{100}$ that lies between 99 and 103 is the sum of the areas
between 99 and 100 and between 100 and 103.
The Z score and area between 99 and 100 are, respectively,
$$Z = \frac{99.0 - 100.0}{20.0 / \sqrt{100}} = -.50$$
and .1915.
The same values for the area between 100 and 103 are
$$Z = \frac{103.0 - 100.0}{20.0 / \sqrt{100}} = 1.50$$
and .4332.
The probability estimate is then .1915 + .4332 = .6247.
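
The same area can be obtained directly as a difference of two cumulative probabilities. The sketch below uses `statistics.NormalDist` from the Python standard library; the variable names are assumptions for this example.

```python
import math
from statistics import NormalDist

# Sampling distribution of the mean: mean 100, standard error 20 / sqrt(100) = 2.
sampling_dist = NormalDist(mu=100.0, sigma=20.0 / math.sqrt(100))

# Area between 99 and 103 = area below 103 minus area below 99.
prob_between = sampling_dist.cdf(103.0) - sampling_dist.cdf(99.0)
print(f"{prob_between:.4f}")
```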

Distribution Of p̂
A dichotomous population is one whose members are classified by some
dichotomous characteristic such as lived or died, tumor remission or no
tumor remission, pain or no pain, etc.
Traditionally, when speaking in a general sense, one of the two
dichotomous outcomes is termed “success” and the other “failure.”

Distribution Of p̂ (continued)
If the members of the population with the characteristic “success” are
assigned the number one and those with a “failure” characteristic a zero,
then the mean of the population will be the sum of the ones and zeros
divided by the total number of observations in the population, which is also
the proportion of successes in the population.

We designate the proportion of successes in the population as π and the
proportion in a sample drawn from the population as p̂.

It can be shown that the standard deviation of the sampling distribution of
p̂, termed the standard error of p̂, is given by
$$\sigma_{\hat{p}} = \sqrt{\frac{\pi (1 - \pi)}{n}}$$

Example
Given a dichotomous population where the proportion of successes is .10,
find the standard deviation of the sampling distribution of p̂ if sample size
is 5. Recalculate the standard error assuming samples of size 50.

Solution
The standard error of p̂ for samples of size 5 is
$$\sigma_{\hat{p}} = \sqrt{\frac{\pi (1 - \pi)}{n}} = \sqrt{\frac{(.10)(.90)}{5}} = .134$$
The standard error of p̂ for samples of size 50 is
$$\sigma_{\hat{p}} = \sqrt{\frac{(.10)(.90)}{50}} = .042.$$
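
A short sketch of the same calculation follows. The function name `standard_error_of_proportion` and its parameter names are assumptions made for the example.

```python
import math


def standard_error_of_proportion(proportion, n):
    """Standard error of p-hat: sqrt(pi * (1 - pi) / n)."""
    return math.sqrt(proportion * (1.0 - proportion) / n)


se_5 = standard_error_of_proportion(0.10, 5)
se_50 = standard_error_of_proportion(0.10, 50)
print(f"n = 5:  {se_5:.3f}")
print(f"n = 50: {se_50:.3f}")
```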

The Binomial Distribution
If the population is large and certain other conditions are met, the
binomial distribution can be used to model the sampling distribution of p̂.

The Binomial Distribution (continued)
The binomial distribution is generated by the equation
$$P(y) = \frac{n!}{y!\,(n - y)!}\, \pi^{y} (1 - \pi)^{n - y}$$
where P(y) is the probability of y successes in a sample of size n taken
from a population where the proportion of successes is π.

Example
Calculate the sampling distribution of p̂ for samples of size 5 drawn from a
population in which the proportion of successes is .10.

Solution
$$\begin{aligned}
P(0) &= \frac{5!}{0!\,(5 - 0)!}\, .10^{0} (1 - .10)^{5-0} \\
     &= \frac{5!}{0!\,5!}\, .10^{0}\, .90^{5} \\
     &= .90^{5} \\
     &= .59049.
\end{aligned}$$

Solution (continued)
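$$\begin{aligned}
P(1) &= \frac{5!}{1!\,(5 - 1)!}\, .10^{1} (1 - .10)^{5-1} \\
     &= \frac{5 \cdot 4!}{1!\,4!}\, .10^{1}\, .90^{4} \\
     &= (5)(.10)(.6561) \\
     &= .32805.
\end{aligned}$$

$$\begin{aligned}
P(2) &= \frac{5!}{2!\,(5 - 2)!}\, .10^{2} (1 - .10)^{5-2} \\
     &= \frac{5 \cdot 4 \cdot 3!}{2!\,3!}\, .10^{2}\, .90^{3} \\
     &= (10)(.01)(.729) \\
     &= .0729.
\end{aligned}$$

$$\begin{aligned}
P(3) &= \frac{5!}{3!\,(5 - 3)!}\, .10^{3} (1 - .10)^{5-3} \\
     &= \frac{5 \cdot 4 \cdot 3!}{3!\,2!}\, .10^{3}\, .90^{2} \\
     &= (10)(.001)(.81) \\
     &= .0081.
\end{aligned}$$

$$\begin{aligned}
P(4) &= \frac{5!}{4!\,(5 - 4)!}\, .10^{4} (1 - .10)^{5-4} \\
     &= \frac{5 \cdot 4!}{4!\,1!}\, .10^{4}\, .90^{1} \\
     &= (5)(.0001)(.90) \\
     &= .00045.
\end{aligned}$$

$$\begin{aligned}
P(5) &= \frac{5!}{5!\,(5 - 5)!}\, .10^{5} (1 - .10)^{5-5} \\
     &= \frac{5!}{5!\,0!}\, .10^{5}\, .90^{0} \\
     &= .10^{5} \\
     &= .00001.
\end{aligned}$$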

Table: Sampling distribution of p̂ for n = 5 and π = .10.

Proportion   Number of Successes   Probability
p̂            y                     P(y)
.00          0                     .59049
.20          1                     .32805
.40          2                     .07290
.60          3                     .00810
.80          4                     .00045
1.00         5                     .00001
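
The table can be regenerated from the binomial equation with a few lines of code. This is a minimal sketch; `binomial_pmf` is a name chosen for the example, and `math.comb` supplies the n!/(y!(n − y)!) term.

```python
import math


def binomial_pmf(y, n, proportion):
    """Probability of y successes in n trials with success proportion pi."""
    return math.comb(n, y) * proportion ** y * (1.0 - proportion) ** (n - y)


n, proportion = 5, 0.10

# Each row holds (p-hat, y, P(y)), matching the columns of the table.
table = [(y / n, y, binomial_pmf(y, n, proportion)) for y in range(n + 1)]

for p_hat, y, prob in table:
    print(f"{p_hat:4.2f}  {y}  {prob:.5f}")
```

Because the six outcomes exhaust all possibilities, the probabilities in the last column sum to one.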

Example
Given that 10% of the residents of the United States would test positive
for a certain antibody, what is the probability of randomly selecting five
residents of the United States and finding that
1 all five test positive for the antibody?
2 at least four (i.e., four or more) will test positive?
3 at least one will be positive?

Solution
Proportion   Number of Successes   Probability
p̂            y                     P(y)
.00          0                     .59049
.20          1                     .32805
.40          2                     .07290
.60          3                     .00810
.80          4                     .00045
1.00         5                     .00001

The probability that all five residents test positive is P(5) = .00001.
The probability that at least four test positive is
P(4) + P(5) = .00045 + .00001 = .00046.
The probability that at least one tests positive is
P(1) + P(2) + P(3) + P(4) + P(5) = 1 − P(0) = 1 − .59049 = .40951.
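
The three answers can be checked by summing the relevant binomial probabilities. The sketch below assumes a helper named `binomial_pmf`, defined here for the example; the complement rule 1 − P(0) is used for the "at least one" case, as in the solution.

```python
import math


def binomial_pmf(y, n, proportion):
    """Probability of y successes in n trials with success proportion pi."""
    return math.comb(n, y) * proportion ** y * (1.0 - proportion) ** (n - y)


probs = [binomial_pmf(y, 5, 0.10) for y in range(6)]

p_all_five = probs[5]                   # P(5)
p_at_least_four = probs[4] + probs[5]   # P(4) + P(5)
p_at_least_one = 1.0 - probs[0]         # 1 - P(0)

print(f"all five:      {p_all_five:.5f}")
print(f"at least four: {p_at_least_four:.5f}")
print(f"at least one:  {p_at_least_one:.5f}")
```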
Example
A researcher believes that the proportion of blood donors in Iceland
with type O positive blood is greater than .38, which is the proportion
in the US.
If the researcher assesses the blood types of 10 randomly selected
donors in Iceland, what is the probability that 9 or 10 of the selected
donors will have this blood type if the proportion is .38?
If the number of subjects with type O positive blood is in fact 9 or
10, what implications would this have for the researcher’s belief?

Solution
Given a population proportion of .38, the probability that the sample will
contain 9 or 10 donors with type O positive blood is P(9) + P(10).

$$\begin{aligned}
P(9) &= \frac{10!}{9!\,(10 - 9)!}\, .38^{9} (1 - .38)^{10-9} \\
     &= \frac{10 \cdot 9!}{9!\,1!}\, .38^{9}\, .62^{1} \\
     &= (10)(.00017)(.62) \\
     &= .00105
\end{aligned}$$
$$P(10) = .38^{10} = .00006.$$

The probability that 9 or 10 donors in the sample will have type O
positive blood is then .00105 + .00006 = .00111.
If the number of donors in the sample with type O positive blood is 9
or 10, the researcher’s theory is supported because the probability of
achieving such a result from a population where the proportion is .38
is so small.
It is likely, though not proven, that the proportion of type O positives
in the Icelandic blood donor population is greater than .38.
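
The calculation can be repeated at full precision as a check. This is a sketch with an assumed helper name, `binomial_pmf`. Carrying all digits gives a sum of about .00109 rather than .00111; the small difference arises because the hand calculation rounds .38^9 to .00017. The conclusion is unaffected.

```python
import math


def binomial_pmf(y, n, proportion):
    """Probability of y successes in n trials with success proportion pi."""
    return math.comb(n, y) * proportion ** y * (1.0 - proportion) ** (n - y)


p_9 = binomial_pmf(9, 10, 0.38)
p_10 = binomial_pmf(10, 10, 0.38)
p_9_or_10 = p_9 + p_10

print(f"P(9)         = {p_9:.5f}")
print(f"P(10)        = {p_10:.5f}")
print(f"P(9) + P(10) = {p_9_or_10:.5f}")
```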

Normal Curve Approximation
When sample size is sufficiently large, the normal curve can be used
to approximate the sampling distribution of p̂.
The question as to how large a sample must be in order to obtain an
adequate approximation cannot be answered definitively.
An often used rule of thumb states that the normal curve
approximation will be satisfactory so long as both nπ and n (1 − π)
are greater than or equal to five though some authors maintain that
these values should be greater than or equal to 10.

Normal Curve Approximation (continued)
The normal curve model is used to approximate probabilities associated
with the distribution of p̂ by means of the following equation.
$$Z = \frac{\hat{p} - \pi}{\sqrt{\frac{\pi (1 - \pi)}{n}}}$$
where p̂ is the sample proportion of successes, π is the population
proportion and n is the sample size.

Example
Suppose a random sample of 50 observations is taken from a
dichotomous population in which the proportion of successes is .10.
What is the probability that the proportion of successes in the sample
will be greater than .12?

Solution
The estimated probability will be the area under a normal curve with
mean .10 that lies above .13.
Because the proportion of successes can only take values .00, .02, .04,
. . . , .12, .14, . . . , 1.00, the upper real limit of the .12 interval (i.e.,
.13) is used rather than .12.
The upper limit is employed because the problem is to find the
probability that the proportion of successes is greater than .12. The
lower limit would have been used if the problem required the
probability of obtaining a proportion of .12 or greater.

Upper and lower real limits of binomial proportions can be computed
directly by adding and subtracting .5/n.
For the present case the upper real limit is .12 + .5/50 = .13.
Using upper and lower limits in this fashion when using a continuous
distribution to approximate probabilities associated with a discrete
variable is an example of what is referred to as a continuity
correction.

We now calculate
$$Z = \frac{\hat{p} - \pi}{\sqrt{\frac{\pi (1 - \pi)}{n}}} = \frac{.13 - .10}{\sqrt{\frac{(.10)(.90)}{50}}} = \frac{.03}{.0424} = .71.$$
Reference to the normal curve table in Appendix A gives an
associated area of .2389.
The value as calculated by the binomial method is
P(7) + P(8) + · · · + P(50) = .2298.
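
Both the approximation and the exact binomial value can be reproduced in code. This is a sketch under stated assumptions: the names `binomial_pmf`, `exact` and `approx` are chosen for the example, and the unrounded Z score is used, so the approximate area comes out near .240 rather than the table value of .2389 obtained with Z = .71.

```python
import math
from statistics import NormalDist


def binomial_pmf(y, n, proportion):
    """Probability of y successes in n trials with success proportion pi."""
    return math.comb(n, y) * proportion ** y * (1.0 - proportion) ** (n - y)


n, proportion = 50, 0.10

# Exact: p-hat > .12 means 7 or more successes out of 50.
exact = sum(binomial_pmf(y, n, proportion) for y in range(7, n + 1))

# Normal approximation with continuity correction (upper real limit of .12).
upper_real_limit = 0.12 + 0.5 / n
se = math.sqrt(proportion * (1.0 - proportion) / n)
z = (upper_real_limit - proportion) / se
approx = 1.0 - NormalDist().cdf(z)

print(f"exact binomial:       {exact:.4f}")
print(f"normal approximation: {approx:.4f} (Z = {z:.2f})")
```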

Example
Suppose a random sample of 50 observations is taken from a
dichotomous population in which the proportion of successes is .10.
What is the probability that the proportion of successes in the sample
will be .12?

Solution
The estimate will be the area between the lower real limit of .11 and
the upper real limit of .13.
As calculated previously, the Z score for .13 is .71 while that for .11 is
$$Z = \frac{.11 - .10}{\sqrt{\frac{(.10)(.90)}{50}}} = \frac{.01}{.0424} = .24.$$
Using these values in the normal curve table shows that the areas
between .13 and .10 and between .11 and .10 are .2611 and .0948 respectively.
The area between .11 and .13 is then .2611 − .0948 = .1663.
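
For comparison, the sketch below computes the same area from unrounded Z scores and sets it beside the exact binomial probability of exactly six successes in 50 (p̂ = .12). The helper and variable names are assumptions for this example. The exact value is about .154, so the approximation is somewhat high here.

```python
import math
from statistics import NormalDist


def binomial_pmf(y, n, proportion):
    """Probability of y successes in n trials with success proportion pi."""
    return math.comb(n, y) * proportion ** y * (1.0 - proportion) ** (n - y)


n, proportion = 50, 0.10
se = math.sqrt(proportion * (1.0 - proportion) / n)
sampling_dist = NormalDist(mu=proportion, sigma=se)

# Real limits of the .12 interval: .12 minus and plus .5/n.
lower_real_limit = 0.12 - 0.5 / n
upper_real_limit = 0.12 + 0.5 / n

approx = sampling_dist.cdf(upper_real_limit) - sampling_dist.cdf(lower_real_limit)
exact = binomial_pmf(6, n, proportion)  # p-hat = .12 means y = 6

print(f"normal approximation: {approx:.4f}")
print(f"exact binomial:       {exact:.4f}")
```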

Example
Approximately 16 percent of men in the United States aged 60 to 64
who exhibit a particular risk profile will have a heart attack in the
next 10 years.
If a random sample of 300 such men is observed over the next 10
years, what is the probability that less than 5% will experience a heart
attack?

Solution
Because the problem specifies that less than 5% will experience a
heart attack, the lower real limit of the five percent interval will be
used. This limit is .05 − .5/300 = .048.
The Z score is then
$$Z = \frac{.048 - .16}{\sqrt{\frac{(.16)(.84)}{300}}} = \frac{-.112}{.021} = -5.33.$$
The normal curve table does not contain Z values of this magnitude
but it can be safely concluded that the probability is less than .0002.
(This is the tail area associated with Z = 3.50 which is the most
extreme score in the table.)
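
Software is not limited by the range of a printed table, so the tail area can be evaluated directly. This is a sketch with assumed variable names; it uses unrounded intermediate values, which gives Z of about −5.28 rather than −5.33, and confirms that the probability is far below .0002.

```python
import math
from statistics import NormalDist

n, proportion = 300, 0.16

# Lower real limit of the .05 interval.
lower_real_limit = 0.05 - 0.5 / n
se = math.sqrt(proportion * (1.0 - proportion) / n)
z = (lower_real_limit - proportion) / se

# Area below Z under the standard normal curve.
lower_tail = NormalDist().cdf(z)

print(f"Z = {z:.2f}")
print(f"lower tail area = {lower_tail:.2e}")
```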

Example
Suppose it is believed that a large community is evenly divided in its
opinion as to whether a cap should be placed on the amount that can
be recovered in medical malpractice lawsuits.
If this supposition is correct, what is the probability that a random
poll of 200 community members will produce 55 percent or more
favorable responses? Compute the probability with and without
continuity correction.

Solution
The continuity correction is .5/200 = .0025. Because the task is to
find the probability that 55 percent or more will be favorable, the
lower real limit of the 55 percent category or .55 − .0025 = .5475 will
be used.
Because the community is assumed evenly divided, the proportion
favorable in the population is taken to be .50.
The Z score is then
$$Z = \frac{.5475 - .50}{\sqrt{\frac{(.50)(.50)}{200}}} = \frac{.0475}{.035} = 1.36.$$
The area above 1.36 is .0869.
Without continuity correction the Z score is .05/.035 = 1.43 which
has an upper tail area of .0764.
The probability as computed by the binomial equation is .0895.
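
The three figures can be compared side by side in code. The sketch below is illustrative only and its variable names are assumptions. It works with the unrounded standard error (about .0354 rather than .035), so the Z scores are roughly 1.34 and 1.41 and the tail areas differ slightly from the table-based values above. The corrected approximation lands very close to the exact binomial result.

```python
import math
from statistics import NormalDist

n, proportion = 200, 0.50
se = math.sqrt(proportion * (1.0 - proportion) / n)
correction = 0.5 / n

# With continuity correction: use the lower real limit of .55.
z_with = (0.55 - correction - proportion) / se
p_with = 1.0 - NormalDist().cdf(z_with)

# Without continuity correction.
z_without = (0.55 - proportion) / se
p_without = 1.0 - NormalDist().cdf(z_without)

# Exact binomial: 110 or more favorable responses out of 200 when pi = .50.
# With pi = .50 every outcome has probability (1/2)**n, so counting suffices.
exact = sum(math.comb(n, k) for k in range(110, n + 1)) / 2 ** n

print(f"with correction:    Z = {z_with:.2f}, area = {p_with:.4f}")
print(f"without correction: Z = {z_without:.2f}, area = {p_without:.4f}")
print(f"exact binomial:     {exact:.4f}")
```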
