
Chapter Four: Introduction To Inference

4.1 Introduction

In this chapter you will learn the rationale underlying inference.
You will also learn to apply certain inferential techniques.
The methods introduced in this chapter are not commonly employed
in research but are important pedagogically.
They are relatively simple, and their mastery will open the way for
understanding the more complex methods dealt with in the following
chapters.


4.1 Introduction (continued)

The techniques you will learn may be divided into two broad
categories.
1. Tests of hypotheses.
2. Confidence intervals.
Before you can begin their study you must understand the concept of
sampling distributions.


4.2 Sampling Distributions: Definition

A sampling distribution is a distribution of sample statistics obtained
from samples repeatedly drawn from one or more populations.


Sampling Distribution of x̄

The sampling distribution of x̄ can be formed by taking repeated samples
from some population, calculating x̄ for each sample, and forming the
resultant sample means into a relative frequency distribution.


Characteristics of the Sampling Dist. of x̄

The following characteristics of the sampling distribution of x̄ should be
noted.
1. The mean of the sampling distribution is equal to the mean of the
population from which the samples were drawn.
2. The mean of the sampling distribution of some statistic is referred to
as the expected value of the statistic and is symbolized by E[ ],
where the brackets contain an identifier of the statistic.


Characteristics (continued)

3. E[x̄] = µ. This is a restatement of characteristic (1) given above.
4. The standard deviation of the sampling distribution of x̄ is termed the
standard error of the mean and is symbolized by σx̄.
5. $\sigma_{\bar{x}} = \sigma / \sqrt{n}$, where σ is the population standard deviation and n is the
sample size.


Characteristics (continued)

6. When the population from which samples are drawn is normally
distributed, the sampling distribution of x̄ will also be normally
distributed.
7. When the population from which samples are drawn is not normally
distributed, the sampling distribution of x̄ will approach normality as
sample size (n) increases. This is an expression of the central limit
theorem.
8. Roughly speaking, the central limit theorem states that the
sampling distributions of certain classes of statistics will approach
normality as sample size (n) increases regardless of the shape of the
sampled population.
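The characteristics listed above can be checked informally by simulation. The following is a minimal sketch, not part of the original slides: it assumes a skewed exponential population with mean 1 and standard deviation 1, and the function name `simulate_sample_means`, the seed, and the number of repetitions are illustrative choices.

```python
import random
import statistics

def simulate_sample_means(n, reps=5000, seed=0):
    """Draw `reps` samples of size n from a skewed population and
    return the list of sample means (an empirical sampling distribution)."""
    rng = random.Random(seed)
    # Assumed population: exponential with rate 1 (mean 1, SD 1).
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n))
            for _ in range(reps)]

means = simulate_sample_means(n=30)
# Characteristic 1: the mean of the sample means should be near mu = 1.
print("mean of sample means:", round(statistics.fmean(means), 3))
# Characteristic 5: their SD should be near sigma / sqrt(n).
print("SD of sample means:  ", round(statistics.stdev(means), 3))
print("sigma / sqrt(n):     ", round(1 / 30 ** 0.5, 3))
```

Plotting a histogram of `means` for increasing n would show the shape moving toward the normal curve even though the assumed population is strongly skewed.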


Example

Given a population with standard deviation 5.293, find the standard
deviations of sampling distributions generated from this population when
samples are of sizes 10, 30 and 50.


Solution

Using equation 4.1 on page 76, we calculate the standard errors of the
mean for sample sizes 10, 30 and 50 as follows.

$$\sigma_{\bar{x}} = \frac{5.293}{\sqrt{10}} = 1.67$$

$$\sigma_{\bar{x}} = \frac{5.293}{\sqrt{30}} = .97$$

$$\sigma_{\bar{x}} = \frac{5.293}{\sqrt{50}} = .75$$
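The same arithmetic can be reproduced in a few lines. This is a small sketch using only the standard library; the function name `standard_error_of_mean` is an illustrative choice, not something defined in the text.

```python
import math

def standard_error_of_mean(sigma, n):
    """Standard error of the mean: sigma divided by the square root of n."""
    return sigma / math.sqrt(n)

# Population SD from the example, with the three sample sizes.
for n in (10, 30, 50):
    print(n, round(standard_error_of_mean(5.293, n), 2))
```

Note how the standard error shrinks as n grows: larger samples give sample means that cluster more tightly around the population mean.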


Using The Normal Curve

Just as you used the normal curve model to estimate probabilities
associated with the selection of a single observation from a population, so
too you can use this model to estimate probabilities associated with the
means of samples selected from a population.


Z Score

Z scores associated with sample means are calculated as follows.

$$Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$


Example

Given a population with mean 110.023 and standard deviation 4.970,
estimate the probability of randomly selecting a sample of 15 observations
and finding that the mean of the sample is greater than 111.


Solution

The Z score for a mean of 111.0 is

$$Z = \frac{111.0 - 110.023}{4.970 / \sqrt{15}} = .76$$

The associated tail area is .2236, which is our estimated probability.
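As a rough check on this solution, the Z score and upper tail area can be computed directly rather than read from a table. The sketch below assumes Python's `statistics.NormalDist` as a stand-in for the normal curve table; `z_for_mean` is a hypothetical helper name. Because the Z score is not rounded to two decimals before the area is found, the result differs slightly from the table value.

```python
import math
from statistics import NormalDist

def z_for_mean(xbar, mu, sigma, n):
    """Z score of a sample mean: (xbar - mu) / (sigma / sqrt(n))."""
    return (xbar - mu) / (sigma / math.sqrt(n))

z = z_for_mean(111.0, 110.023, 4.970, 15)
# Area under the standard normal curve above z.
upper_tail = 1 - NormalDist().cdf(z)
print(round(z, 2), round(upper_tail, 4))
```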


Example

Suppose 100 observations are randomly selected from a population whose
mean and standard deviation are respectively 100 and 20. What is the
probability that the mean of these observations will be between 99 and
103?


Solution

The area of a normal curve with mean 100 and standard deviation
$20/\sqrt{100}$ that lies between 99 and 103 is the sum of the areas
between 99 and 100 and between 100 and 103.
The Z score and area between 99 and 100 are, respectively,
$Z = \frac{99.0 - 100.0}{20.0 / \sqrt{100}} = -.50$ and .1915.
The same values for the area between 100 and 103 are
$Z = \frac{103.0 - 100.0}{20.0 / \sqrt{100}} = 1.50$ and .4332.
The probability estimate is then .1915 + .4332 = .6247.
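The two-area calculation is equivalent to taking a difference of cumulative areas. A minimal sketch, assuming `statistics.NormalDist` in place of the printed table and using the illustrative function name `prob_mean_between`:

```python
import math
from statistics import NormalDist

def prob_mean_between(lo, hi, mu, sigma, n):
    """Probability that a sample mean falls between lo and hi,
    using a normal model with standard deviation sigma / sqrt(n)."""
    dist = NormalDist(mu, sigma / math.sqrt(n))
    return dist.cdf(hi) - dist.cdf(lo)

p = prob_mean_between(99, 103, mu=100, sigma=20, n=100)
print(round(p, 4))
```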


Distribution Of p̂

A dichotomous population is one whose members are classified on a
two-valued characteristic such as lived or died, tumor remission or no
tumor remission, pain or no pain, etc.
Traditionally, when speaking in a general sense, one of the two
dichotomous outcomes is termed “success” and the other “failure.”


Distribution Of p̂ (continued)

If the members of the population with the characteristic “success” are
assigned the number one and those with a “failure” characteristic a zero,
then the mean of the population will be the sum of the ones and zeros
divided by the total number of observations in the population, which is also
the proportion of successes in the population.


Distribution Of p̂ (continued)

We designate the proportion of successes in the population as π and the
proportion in a sample drawn from the population as p̂.


Distribution Of p̂ (continued)

It can be shown that the standard deviation of the sampling distribution of
p̂, termed the standard error of p̂, is given by

$$\sigma_{\hat{p}} = \sqrt{\frac{\pi (1 - \pi)}{n}}$$


Example

Given a dichotomous population where the proportion of successes is .10,
find the standard deviation of the sampling distribution of p̂ if sample size
is 5. Recalculate the standard error assuming samples of size 50.


Solution

The standard error of p̂ for samples of size 5 is

$$\sigma_{\hat{p}} = \sqrt{\frac{\pi (1 - \pi)}{n}} = \sqrt{\frac{(.10)(.90)}{5}} = .134$$

The standard error of p̂ for samples of size 50 is

$$\sigma_{\hat{p}} = \sqrt{\frac{(.10)(.90)}{50}} = .042$$
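These two standard errors can be verified with a short calculation. The helper name `standard_error_of_proportion` below is an assumption made for illustration.

```python
import math

def standard_error_of_proportion(pi, n):
    """Standard error of p-hat: sqrt(pi * (1 - pi) / n)."""
    return math.sqrt(pi * (1 - pi) / n)

for n in (5, 50):
    print(n, round(standard_error_of_proportion(0.10, n), 3))
```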


The Binomial Distribution

If the population is large and certain other conditions are met, the
binomial distribution can be used to model the sampling distribution of p̂.



The Binomial Distribution (continued)

The binomial distribution is generated by the equation

$$P(y) = \frac{n!}{y!\,(n - y)!} \, \pi^{y} (1 - \pi)^{n - y}$$

where P(y) is the probability of y successes in a sample of size n taken
from a population where the proportion of successes is π.


Example

Calculate the sampling distribution of p̂ for samples of size 5 drawn from a
population in which the proportion of successes is .10.


Solution

$$
\begin{aligned}
P(0) &= \frac{5!}{0!\,(5 - 0)!} (.10)^{0} (1 - .10)^{5 - 0} \\
     &= \frac{5!}{0!\,5!} (.10)^{0} (.90)^{5} \\
     &= (.90)^{5} \\
     &= .59049
\end{aligned}
$$


Solution (continued)

$$
\begin{aligned}
P(1) &= \frac{5!}{1!\,(5 - 1)!} (.10)^{1} (1 - .10)^{5 - 1} \\
     &= \frac{5 \cdot 4!}{1!\,4!} (.10)^{1} (.90)^{4} \\
     &= (5)(.10)(.6561) \\
     &= .32805
\end{aligned}
$$


Solution (continued)

$$
\begin{aligned}
P(2) &= \frac{5!}{2!\,(5 - 2)!} (.10)^{2} (1 - .10)^{5 - 2} \\
     &= \frac{5 \cdot 4 \cdot 3!}{2!\,3!} (.10)^{2} (.90)^{3} \\
     &= (10)(.01)(.729) \\
     &= .0729
\end{aligned}
$$


Solution (continued)

$$
\begin{aligned}
P(3) &= \frac{5!}{3!\,(5 - 3)!} (.10)^{3} (1 - .10)^{5 - 3} \\
     &= \frac{5 \cdot 4 \cdot 3 \cdot 2!}{3!\,2!} (.10)^{3} (.90)^{2} \\
     &= (10)(.001)(.81) \\
     &= .0081
\end{aligned}
$$


Solution (continued)

$$
\begin{aligned}
P(4) &= \frac{5!}{4!\,(5 - 4)!} (.10)^{4} (1 - .10)^{5 - 4} \\
     &= \frac{5 \cdot 4!}{4!\,1!} (.10)^{4} (.90)^{1} \\
     &= (5)(.0001)(.90) \\
     &= .00045
\end{aligned}
$$


Solution (continued)

$$
\begin{aligned}
P(5) &= \frac{5!}{5!\,(5 - 5)!} (.10)^{5} (1 - .10)^{5 - 5} \\
     &= \frac{5!}{5!\,0!} (.10)^{5} (.90)^{0} \\
     &= (.10)^{5} \\
     &= .00001
\end{aligned}
$$


Solution (continued)

Table: Sampling distribution of p̂ for n = 5 and π = .10.

Proportion   Number of Successes   Probability
p̂            y                     P(y)
.00          0                     .59049
.20          1                     .32805
.40          2                     .07290
.60          3                     .00810
.80          4                     .00045
1.00         5                     .00001
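The six hand calculations above follow one pattern, so the whole table can be generated from the binomial equation. This is a sketch using `math.comb` from the standard library; `binomial_pmf` and `sampling_distribution` are illustrative names rather than anything defined in the text.

```python
import math

def binomial_pmf(y, n, pi):
    """P(y) = n! / (y! (n - y)!) * pi**y * (1 - pi)**(n - y)."""
    return math.comb(n, y) * pi ** y * (1 - pi) ** (n - y)

def sampling_distribution(n, pi):
    """Rows of (p_hat, y, P(y)) for every possible number of successes."""
    return [(y / n, y, binomial_pmf(y, n, pi)) for y in range(n + 1)]

for p_hat, y, prob in sampling_distribution(5, 0.10):
    print(f"{p_hat:.2f}  {y}  {prob:.5f}")
```

The probabilities in any such table should sum to one, which is a quick sanity check on hand calculations.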


Example

Given that 10% of the residents of the United States would test positive
for a certain antibody, what is the probability of randomly selecting five
residents of the United States and finding that
all five test positive for the antibody?
at least four (i.e., four or more) will test positive?
at least one will be positive?



Solution

Proportion   Number of Successes   Probability
p̂            y                     P(y)
.00          0                     .59049
.20          1                     .32805
.40          2                     .07290
.60          3                     .00810
.80          4                     .00045
1.00         5                     .00001

The probability that all five residents test positive is P(5) = .00001.
The probability that at least four test positive is
P(4) + P(5) = .00045 + .00001 = .00046.
The probability that at least one tests positive is
P(1) + P(2) + P(3) + P(4) + P(5) = 1 − P(0) = 1 − .59049 = .40951.
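"At least k" questions like these amount to summing the upper end of the binomial distribution. A minimal sketch follows; the function names are assumptions chosen for illustration.

```python
import math

def binomial_pmf(y, n, pi):
    """Probability of exactly y successes in n trials."""
    return math.comb(n, y) * pi ** y * (1 - pi) ** (n - y)

def prob_at_least(k, n, pi):
    """Probability of k or more successes in n trials."""
    return sum(binomial_pmf(y, n, pi) for y in range(k, n + 1))

print(round(prob_at_least(5, 5, 0.10), 5))  # all five positive
print(round(prob_at_least(4, 5, 0.10), 5))  # at least four positive
print(round(prob_at_least(1, 5, 0.10), 5))  # at least one positive
```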
Example

A researcher believes that the proportion of blood donors in Iceland
with type O positive blood is greater than .38, which is the proportion
in the US.
If the researcher assesses the blood types of 10 randomly selected
donors in Iceland, what is the probability that 9 or 10 of the selected
donors will have this blood type if the proportion is .38?
If the number of subjects with type O positive blood is in fact 9 or
10, what implications would this have for the researcher’s belief?


Solution

Given a population proportion of .38, the probability that the sample will
contain 9 or 10 donors with type O positive blood is P (9) + P (10).



Solution (continued)

$$
\begin{aligned}
P(9) &= \frac{10!}{9!\,(10 - 9)!} (.38)^{9} (1 - .38)^{10 - 9} \\
     &= \frac{10 \cdot 9!}{9!\,1!} (.38)^{9} (.62)^{1} \\
     &= (10)(.00017)(.62) \\
     &= .00105
\end{aligned}
$$

$$P(10) = (.38)^{10} = .00006$$


Solution (continued)

The probability that 9 or 10 donors in the sample will have type O
positive blood is then .00105 + .00006 = .00111.
If the number of donors in the sample with type O positive blood is 9
or 10, the researcher’s theory is supported because the probability of
achieving such a result from a population where the proportion is .38
is so small.
It is likely, though not proven, that the proportion of type O positives
in the Icelandic blood donor population is greater than .38.
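The probability in this solution can be recomputed without rounding intermediate values. In the sketch below (function names are illustrative), carrying full precision gives roughly .00109 rather than .00111; the small difference arises because .38 to the ninth power was rounded to .00017 in the hand calculation. The conclusion is unaffected.

```python
import math

def binomial_pmf(y, n, pi):
    """Probability of exactly y successes in n trials."""
    return math.comb(n, y) * pi ** y * (1 - pi) ** (n - y)

def prob_at_least(k, n, pi):
    """Probability of k or more successes in n trials."""
    return sum(binomial_pmf(y, n, pi) for y in range(k, n + 1))

# P(9) + P(10) for 10 donors when the population proportion is .38.
p_9_or_10 = prob_at_least(9, 10, 0.38)
print(round(p_9_or_10, 5))
```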


Normal Curve Approximation

When sample size is sufficiently large, the normal curve can be used
to approximate the sampling distribution of p̂.
The question as to how large a sample must be in order to obtain an
adequate approximation cannot be answered definitively.
An often used rule of thumb states that the normal curve
approximation will be satisfactory so long as both nπ and n (1 − π)
are greater than or equal to five though some authors maintain that
these values should be greater than or equal to 10.



Normal Curve Approximation (continued)

The normal curve model is used to approximate probabilities associated
with the distribution of p̂ by means of the following equation.

$$Z = \frac{\hat{p} - \pi}{\sqrt{\frac{\pi (1 - \pi)}{n}}}$$

where p̂ is the sample proportion of successes, π is the population
proportion and n is the sample size.


Example

Suppose a random sample of 50 observations is taken from a
dichotomous population in which the proportion of successes is .10.
What is the probability that the proportion of successes in the sample
will be greater than .12?


Solution

The estimated probability will be the area under a normal curve with
mean .10 that lies above .13.
Because the proportion of successes can only take values .00, .02, .04,
. . . , .12, .14, . . . , 1.00, the upper real limit of the .12 interval (i.e.,
.13) is used rather than .12.
The upper limit is employed because the problem is to find the
probability that the proportion of successes is greater than .12. The
lower limit would have been used if the problem required the
probability of obtaining a proportion of .12 or greater.



Solution (continued)

Upper and lower real limits of binomial proportions can be computed
directly by adding and subtracting .5/n.
For the present case the upper real limit is .12 + .5/50 = .13.
Using upper and lower limits in this fashion when using a continuous
distribution to approximate probabilities associated with a discrete
variable is an example of what is referred to as a continuity
correction.


Solution (continued)

We now calculate

$$Z = \frac{\hat{p} - \pi}{\sqrt{\frac{\pi (1 - \pi)}{n}}} = \frac{.13 - .10}{\sqrt{\frac{(.10)(.90)}{50}}} = \frac{.03}{.0424} = .71$$

Reference to the normal curve table in Appendix A gives an
associated area of .2389.
The value as calculated by the binomial method is
P(7) + P(8) + ··· + P(50) = .2298.
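The comparison between the continuity-corrected normal approximation and the exact binomial sum can be sketched as follows. The function names are hypothetical, `statistics.NormalDist` stands in for the Appendix A table, and the event is stated in terms of the success count (more than 6 of 50 is the same as a proportion greater than .12). Since Z is not rounded to two decimals here, the approximate area comes out slightly above .2389.

```python
import math
from statistics import NormalDist

def binomial_pmf(y, n, pi):
    """Probability of exactly y successes in n trials."""
    return math.comb(n, y) * pi ** y * (1 - pi) ** (n - y)

def normal_approx_more_than(y, n, pi):
    """Approximate P(successes > y) using the upper real limit (y + .5) / n."""
    se = math.sqrt(pi * (1 - pi) / n)
    z = ((y + 0.5) / n - pi) / se
    return 1 - NormalDist().cdf(z)

def exact_more_than(y, n, pi):
    """Exact binomial P(successes > y)."""
    return sum(binomial_pmf(k, n, pi) for k in range(y + 1, n + 1))

approx = normal_approx_more_than(6, 50, 0.10)
exact = exact_more_than(6, 50, 0.10)
print(round(approx, 4), round(exact, 4))
```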


Example

Suppose a random sample of 50 observations is taken from a
dichotomous population in which the proportion of successes is .10.
What is the probability that the proportion of successes in the sample
will be .12?


Solution

The estimate will be the area between the lower real limit of .11 and
the upper real limit of .13.
As calculated previously, the Z score for .13 is .71 while that for .11 is

$$Z = \frac{.11 - .10}{\sqrt{\frac{(.10)(.90)}{50}}} = \frac{.01}{.0424} = .24$$

Using these values in the normal curve table shows that the areas
between .10 and .13 and between .10 and .11 are .2611 and .0948 respectively.
The area between .11 and .13 is then .2611 − .0948 = .1663.
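The area between the two real limits can also be computed directly and set beside the exact binomial probability of 6 successes in 50. This is a sketch with illustrative function names and `statistics.NormalDist` assumed in place of the printed table; small differences from .1663 reflect unrounded Z scores.

```python
import math
from statistics import NormalDist

def binomial_pmf(y, n, pi):
    """Probability of exactly y successes in n trials."""
    return math.comb(n, y) * pi ** y * (1 - pi) ** (n - y)

def normal_approx_exactly(y, n, pi):
    """Approximate P(successes == y) as the normal area between
    the lower real limit (y - .5) / n and upper real limit (y + .5) / n."""
    dist = NormalDist(pi, math.sqrt(pi * (1 - pi) / n))
    return dist.cdf((y + 0.5) / n) - dist.cdf((y - 0.5) / n)

approx = normal_approx_exactly(6, 50, 0.10)
exact = binomial_pmf(6, 50, 0.10)
print(round(approx, 4), round(exact, 4))
```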


Example

Approximately 16 percent of men in the United States aged 60 to 64
who exhibit a particular risk profile will have a heart attack in the
next 10 years.
If a random sample of 300 such men is observed over the next 10
years, what is the probability that fewer than 5% will experience a heart
attack?


Solution

Because the problem specifies that less than 5% will experience a
heart attack, the lower real limit of the five percent interval will be
used. This limit is .05 − .5/300 = .048.
The Z score is then

$$Z = \frac{.048 - .16}{\sqrt{\frac{(.16)(.84)}{300}}} = \frac{-.112}{.021} = -5.33$$

The normal curve table does not contain Z values of this magnitude,
but it can be safely concluded that the probability is less than .0002.
(This is the tail area associated with Z = 3.50, which is the most
extreme score in the table.)
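Software is not limited to the range of a printed table, so the tail area for a Z score this extreme can be computed directly. The sketch below uses a hypothetical helper and states the event as fewer than 15 of 300 men (5% of 300). With unrounded intermediate values the Z score is about −5.28 rather than −5.33; either way the probability is far below .0002.

```python
import math
from statistics import NormalDist

def normal_approx_fewer_than(y, n, pi):
    """Return (z, area) for P(successes < y), using the
    lower real limit (y - .5) / n as the continuity correction."""
    se = math.sqrt(pi * (1 - pi) / n)
    z = ((y - 0.5) / n - pi) / se
    return z, NormalDist().cdf(z)

z, lower_tail = normal_approx_fewer_than(15, 300, 0.16)
print(round(z, 2), lower_tail)
```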


Example

Suppose it is believed that a large community is evenly divided in its
opinion as to whether a cap should be placed on the amount that can
be recovered in medical malpractice lawsuits.
If this supposition is correct, what is the probability that a random
poll of 200 community members will produce 55 percent or more
favorable responses? Compute the probability with and without
continuity correction.


Solution

The continuity correction is .5/200 = .0025. Because the task is to
find the probability that 55 percent or more will be favorable, the
lower real limit of the 55 percent category, or .55 − .0025 = .5475, will
be used.
Because the community is assumed evenly divided, the proportion
favorable in the population is taken to be .50.
The Z score is then

$$Z = \frac{.5475 - .50}{\sqrt{\frac{(.50)(.50)}{200}}} = \frac{.0475}{.035} = 1.36$$

The area above 1.36 is .0869.
Without continuity correction the Z score is .05/.035 = 1.43, which
has an upper tail area of .0764.
The probability as computed by the binomial equation is .0895.
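The three figures in this solution can be compared side by side in code. This is a sketch with illustrative names; the event is 110 or more favorable responses out of 200. Because the standard error is not rounded to .035 here, the Z scores come out near 1.34 and 1.41, so the tail areas differ slightly from the hand-calculated .0869 and .0764.

```python
import math
from statistics import NormalDist

def binomial_pmf(y, n, pi):
    """Probability of exactly y successes in n trials."""
    return math.comb(n, y) * pi ** y * (1 - pi) ** (n - y)

def normal_approx_at_least(y, n, pi, continuity=True):
    """Approximate P(successes >= y); with the correction the
    lower real limit (y - .5) / n replaces y / n."""
    se = math.sqrt(pi * (1 - pi) / n)
    limit = (y - 0.5) / n if continuity else y / n
    return 1 - NormalDist().cdf((limit - pi) / se)

with_cc = normal_approx_at_least(110, 200, 0.50)
without_cc = normal_approx_at_least(110, 200, 0.50, continuity=False)
exact = sum(binomial_pmf(k, 200, 0.50) for k in range(110, 201))
print(round(with_cc, 4), round(without_cc, 4), round(exact, 4))
```

In this sketch the corrected approximation lands closer to the exact binomial value than the uncorrected one, which is the point of the continuity correction.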
