SDA 3E Chapter 4

2007 Pearson Education
Chapter 4: Sampling and Estimation
Need for Sampling
Very large populations
Destructive testing
Continuous production process
The objective of sampling is to draw a valid inference about
a population.
Sample Design
Sampling Plan: a description of the
approach that will be used to obtain
samples from a population
Objectives
Target population
Population frame
Method of sampling
Operational procedures for data collection
Statistical tools for analysis
Sampling Methods
Subjective
Judgment sampling
Convenience sampling
Probabilistic
Simple random sampling: every subset of
a given size has an equal chance of being
selected
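As a rough illustration of simple random sampling outside PHStat or Excel, Python's standard `random.sample` draws items without replacement, so every subset of the requested size is equally likely. This is a minimal sketch; the numbered frame of 500 items and the seed are hypothetical.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n items without replacement; every subset of size n is equally likely."""
    rng = random.Random(seed)
    return rng.sample(population, n)

# Hypothetical population frame: items numbered 1..500
frame = list(range(1, 501))
sample = simple_random_sample(frame, 20, seed=42)
print(sample)
```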
PHStat Tool
Random Sample Generator
PHStat menu > Sampling > Random
Sample Generator
Enter sample size
Select sampling
method
Excel Data Analysis Tool
Sampling
Excel menu > Tools > Data Analysis >
Sampling
Specify input range
of data
Choose sampling
method

Select output option
Other Sampling Methods
Systematic sampling
Stratified sampling
Cluster sampling
Sampling from a continuous process
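A minimal sketch of two of these methods, assuming the population frame is a Python list. Systematic sampling here takes a random start within the first k = N // n items and then every k-th item. Stratified sampling here uses proportional allocation (an assumption; other allocation rules exist) with a simple random sample inside each stratum. The function names and the two-stratum frame are hypothetical, and because of rounding, proportional allocation may not sum exactly to n.

```python
import random

def systematic_sample(population, n, seed=None):
    """Random start in the first k = N // n items, then every k-th item."""
    rng = random.Random(seed)
    k = len(population) // n          # sampling interval (requires n <= N)
    start = rng.randrange(k)          # random start in 0..k-1
    return population[start::k][:n]

def stratified_sample(strata, n, seed=None):
    """Proportional allocation, then a simple random sample within each stratum."""
    rng = random.Random(seed)
    total = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        size = round(n * len(members) / total)   # stratum's share of n
        sample[name] = rng.sample(members, size)
    return sample

# Hypothetical frames
print(systematic_sample(list(range(1, 101)), 10, seed=1))
strata = {"A": list(range(60)), "B": list(range(60, 100))}
strat = stratified_sample(strata, 10, seed=7)
print(strat)
```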
Errors in Sampling
Nonsampling error
Poor sample design
Sampling (statistical) error
Depends on sample size
Tradeoff between cost of sampling and
accuracy of estimates obtained by
sampling
Estimation
Estimation: assessing the value of a
population parameter using sample data.
Point estimate: a single number used to
estimate a population parameter
Confidence intervals: a range of values
within which a population parameter is
believed to lie, along with the probability that
the interval correctly estimates the true
population parameter
Common Point Estimates
Theoretical Issues
Unbiased estimator: one for which the
expected value equals the population
parameter it is intended to estimate
The sample variance is an unbiased
estimator for the population variance

$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$ (sample variance)
$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$ (population variance)
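The unbiasedness of $s^2$ can be checked exactly on a tiny example. This sketch assumes independent draws (sampling with replacement), the setting in which $E[s^2] = \sigma^2$ holds exactly; it enumerates every equally likely ordered sample of size 2 from a hypothetical population {2, 4, 9} and averages the estimators, using `Fraction` so the arithmetic is exact.

```python
from fractions import Fraction
from itertools import product

def sample_variance(xs):
    """Sample variance with the n - 1 divisor."""
    n = len(xs)
    xbar = Fraction(sum(xs), n)
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)

def divide_by_n_variance(xs):
    """Same sum of squares, but divided by n (a biased estimator)."""
    n = len(xs)
    xbar = Fraction(sum(xs), n)
    return sum((x - xbar) ** 2 for x in xs) / n

def population_variance(pop):
    """Population variance with the N divisor."""
    N = len(pop)
    mu = Fraction(sum(pop), N)
    return sum((x - mu) ** 2 for x in pop) / N

population = [2, 4, 9]                              # hypothetical population
samples = list(product(population, repeat=2))       # all 9 ordered samples, with replacement
expected_s2 = sum(sample_variance(s) for s in samples) / len(samples)
expected_biased = sum(divide_by_n_variance(s) for s in samples) / len(samples)

print(population_variance(population))  # 26/3
print(expected_s2)                      # 26/3
print(expected_biased)                  # 13/3
```

The average of $s^2$ over all samples equals $\sigma^2$, while the divide-by-n version falls short.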
Interval Estimates
Range within which we believe the true
population parameter falls
Example: a Gallup poll reports that the percentage of
voters favoring a candidate is 56% with a
3% margin of error.
Interval estimate is [53%, 59%]
Confidence Intervals
Confidence interval (CI): an interval
estimate that specifies the likelihood that
the interval contains the true population
parameter
Level of confidence $(1 - \alpha)$: the probability
that the CI contains the true population
parameter, usually expressed as a percentage
(90%, 95%, and 99% are most common).

Sampling Distribution of the
Mean

Interval Estimate Containing the
True Population Mean
Interval Estimate Not Containing
the True Population Mean
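Whether a particular interval contains the true mean is a matter of chance; the confidence level describes how often the procedure succeeds over repeated samples. This sketch simulates that, assuming a normal population with a hypothetical $\mu = 50$, $\sigma = 10$ and samples of size 25, and counts how many intervals $\bar{x} \pm z_{\alpha/2}\sigma/\sqrt{n}$ contain $\mu$.

```python
import math
import random
from statistics import NormalDist, fmean

def coverage_rate(mu=50.0, sigma=10.0, n=25, trials=2000, conf=0.95, seed=1):
    """Fraction of intervals xbar +/- z * sigma / sqrt(n) that contain mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    half_width = z * sigma / math.sqrt(n)
    hits = 0
    for _ in range(trials):
        xbar = fmean([rng.gauss(mu, sigma) for _ in range(n)])
        if xbar - half_width <= mu <= xbar + half_width:
            hits += 1
    return hits / trials

print(coverage_rate())            # close to 0.95
print(coverage_rate(conf=0.90))   # close to 0.90
```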
Confidence Interval for the
Mean, $\sigma$ Known
A $100(1-\alpha)\%$ CI is: $\bar{x} \pm z_{\alpha/2}(\sigma/\sqrt{n})$

$z_{\alpha/2}$ may be found from Table A.1 or using the
Excel function NORMSINV($1-\alpha/2$)
Example
Compute a 95 percent confidence interval for
the mean number of TV hours/week for the
18-24 age group in the file TV Viewing.xls.
Assume that the population standard
deviation is known to be 10.0. The sample
mean for the n = 45 observations is
computed to be 60.16. For a 95 percent CI,
$z_{\alpha/2} = 1.96$. Therefore, the CI is
$60.16 \pm 1.96(10/\sqrt{45}) = 60.16 \pm 2.92$, or [57.24, 63.08]
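The TV-viewing calculation can be reproduced in a few lines. In this sketch, Python's `statistics.NormalDist().inv_cdf` plays the role of NORMSINV; the function name `ci_mean_sigma_known` is hypothetical.

```python
import math
from statistics import NormalDist

def ci_mean_sigma_known(xbar, sigma, n, conf=0.95):
    """xbar +/- z_{alpha/2} * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # same role as NORMSINV(1 - alpha/2)
    half_width = z * sigma / math.sqrt(n)
    return xbar - half_width, xbar + half_width

lo, hi = ci_mean_sigma_known(60.16, 10.0, 45)
print(f"[{lo:.2f}, {hi:.2f}]")  # [57.24, 63.08]
```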
Confidence Interval for the
Mean, $\sigma$ Unknown
A $100(1-\alpha)\%$ CI is: $\bar{x} \pm t_{\alpha/2,\,n-1}(s/\sqrt{n})$

$t_{\alpha/2,\,n-1}$ is the value from a t-distribution with
$n-1$ degrees of freedom, from Table A.2 or
the Excel function TINV($\alpha$, $n-1$)
Relationship Between Normal
Distribution and t-distribution
The t-distribution yields larger confidence
intervals for smaller sample sizes.
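To see why the t-distribution gives wider intervals for small samples, compare $t_{\alpha/2,\,n-1}$ with $z_{\alpha/2}$ as n grows. Python's standard library has no t quantile function, so this sketch integrates the t density numerically (Simpson's rule) and inverts the CDF by bisection; in practice a library routine such as SciPy's `scipy.stats.t.ppf` would be the usual choice. It assumes the critical value lies below 50, which holds for $\alpha = 0.05$ at any df.

```python
import math
from statistics import NormalDist

def t_cdf(x, df, steps=1000):
    """CDF of Student's t by Simpson's rule on the density over [0, |x|]."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)

    def pdf(u):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(u * u / df))

    h = abs(x) / steps  # steps is even, as Simpson's rule requires
    total = pdf(0.0) + pdf(abs(x))
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * pdf(k * h)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area  # density is symmetric about 0

def t_critical(alpha, df):
    """t_{alpha/2, df}: value with upper-tail area alpha/2, found by bisection."""
    target = 1 - alpha / 2
    lo, hi = 0.0, 50.0
    for _ in range(45):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

z = NormalDist().inv_cdf(0.975)
for n in (5, 10, 30, 100):
    print(f"n = {n:3d}: t = {t_critical(0.05, n - 1):.3f}  vs  z = {z:.3f}")
```

The t values shrink toward z as n increases, which is why the two intervals nearly coincide for large samples.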
Example
Compute a 95 percent confidence interval for the
mean number of TV hours/week for the 18-24 age
group in the file TV Viewing.xls. Assume that the
population standard deviation is not known but is estimated
from the sample as 10.095. A 95 percent CI
corresponds to $\alpha/2 = 0.025$. With 45 observations,
the t-distribution has 45 - 1 = 44 df. Using Table
A.2, we find that $t_{0.025,\,44} = 2.0154$, yielding a 95
percent CI for the mean of
$60.16 \pm 2.0154(10.095/\sqrt{45}) = 60.16 \pm 3.03$, or [57.13, 63.19]
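The same example as a short sketch, taking the critical value $t_{0.025,\,44} = 2.0154$ from Table A.2 rather than computing it; the function name is hypothetical.

```python
import math

def ci_mean_sigma_unknown(xbar, s, n, t_crit):
    """xbar +/- t_{alpha/2, n-1} * s / sqrt(n); t_crit comes from a t table."""
    half_width = t_crit * s / math.sqrt(n)
    return xbar - half_width, xbar + half_width

lo, hi = ci_mean_sigma_unknown(60.16, 10.095, 45, t_crit=2.0154)
print(f"[{lo:.2f}, {hi:.2f}]")  # [57.13, 63.19]
```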
PHStat Tool: Confidence
Intervals for the Mean
PHStat menu > Confidence Intervals >
Estimate for the mean, sigma known,
or Estimate for the mean, sigma
unknown
Intervals for the Mean - Dialog
Enter the confidence level

Choose specification of
sample statistics

Check Finite Population
Correction box if
appropriate
Sampling From Finite
Populations
When n > 0.05N, use a correction
factor in computing the standard error:
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}}$
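A minimal sketch of the standard error with the finite population correction, applying the correction only when n > 0.05N as the rule above states. The function name and the example values ($\sigma = 10$, n = 100, N = 400) are hypothetical.

```python
import math

def standard_error(sigma, n, N=None):
    """Standard error of the mean; applies the finite population
    correction sqrt((N - n) / (N - 1)) when n > 0.05 * N."""
    se = sigma / math.sqrt(n)
    if N is not None and n > 0.05 * N:
        se *= math.sqrt((N - n) / (N - 1))
    return se

print(standard_error(10.0, 100))          # no population size given: sigma / sqrt(n)
print(standard_error(10.0, 100, N=400))   # n is 25% of N, so the correction applies
```

When the whole population is sampled (n = N) the corrected standard error is zero, since there is no sampling error left.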
Intervals for the Mean - Results
Confidence Intervals for
Proportions
Sample proportion: p = x/n
x = number in sample having desired
characteristic
n = sample size
The sampling distribution of p has mean
$\pi$ and variance $\pi(1-\pi)/n$
When $n\pi$ and $n(1-\pi)$ are at least 5,
the sampling distribution of p approaches
a normal distribution
Proportions

A $100(1-\alpha)\%$ CI is: $p \pm z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}}$
PHStat tool is available under the Confidence
Intervals option
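A minimal sketch of the proportion interval, using p in place of $\pi$ when checking the "at least 5" condition (an assumption, since $\pi$ is unknown). The example counts, x = 8 of n = 14, are borrowed for illustration from the CPA counts in the Accounting Professionals example; the function name is hypothetical.

```python
import math
from statistics import NormalDist

def ci_proportion(x, n, conf=0.95):
    """p +/- z_{alpha/2} * sqrt(p (1 - p) / n), with p = x / n."""
    p = x / n
    if n * p < 5 or n * (1 - p) < 5:
        raise ValueError("normal approximation not justified: need np and n(1-p) >= 5")
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width

lo, hi = ci_proportion(8, 14)
print(f"[{lo:.3f}, {hi:.3f}]")
```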
Confidence Intervals and
Sample Size
CI for the mean, $\sigma$ known
Sample size needed for half-width of at
most E is $n \ge (z_{\alpha/2})^2\sigma^2/E^2$
CI for a proportion
Sample size needed for half-width of at
most E is $n \ge (z_{\alpha/2})^2\pi(1-\pi)/E^2$
Use p as an estimate of $\pi$, or 0.5 for the
most conservative estimate
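Both sample-size rules round up to the next whole observation. This sketch assumes a 95% confidence level and hypothetical inputs ($\sigma = 10$ with E = 2 for a mean, and the conservative $\pi = 0.5$ with E = 0.03 for a proportion).

```python
import math
from statistics import NormalDist

def z_value(conf):
    """z_{alpha/2} for a given confidence level."""
    return NormalDist().inv_cdf(1 - (1 - conf) / 2)

def sample_size_mean(sigma, E, conf=0.95):
    """Smallest n with n >= z^2 sigma^2 / E^2."""
    return math.ceil((z_value(conf) * sigma / E) ** 2)

def sample_size_proportion(E, pi=0.5, conf=0.95):
    """Smallest n with n >= z^2 pi (1 - pi) / E^2; pi = 0.5 is most conservative."""
    return math.ceil(z_value(conf) ** 2 * pi * (1 - pi) / E ** 2)

print(sample_size_mean(sigma=10.0, E=2.0))   # 97
print(sample_size_proportion(E=0.03))        # 1068
```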
PHStat Tool: Sample Size
Determination
PHStat menu > Sample Size >
Determination for the Mean or
Determination for the Proportion
Enter s, E, and
confidence level

Check Finite
Population Correction
box if appropriate
Population Total

A $100(1-\alpha)\%$ CI is: $N\bar{x} \pm t_{\alpha/2,\,n-1}\, N\frac{s}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}}$
PHStat tool is available under the Confidence
Intervals option
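A minimal sketch of the population-total interval. The inputs are hypothetical (N = 200 items, a sample of n = 30 with $\bar{x} = 12.5$ and s = 3.2), and the critical value $t_{0.025,\,29} \approx 2.0452$ is supplied from a standard t table rather than computed.

```python
import math

def ci_population_total(xbar, s, n, N, t_crit):
    """N*xbar +/- t * N * (s / sqrt(n)) * sqrt((N - n) / (N - 1))."""
    half_width = t_crit * N * (s / math.sqrt(n)) * math.sqrt((N - n) / (N - 1))
    total = N * xbar
    return total - half_width, total + half_width

lo, hi = ci_population_total(12.5, 3.2, n=30, N=200, t_crit=2.0452)
print(f"[{lo:.1f}, {hi:.1f}]")
```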
Differences Between Means

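Population 1: mean $\mu_1$, standard deviation $\sigma_1$, point estimate $\bar{x}_1$, sample size $n_1$
Population 2: mean $\mu_2$, standard deviation $\sigma_2$, point estimate $\bar{x}_2$, sample size $n_2$
Point estimate for the difference in means,
$\mu_1 - \mu_2$, is given by $\bar{x}_1 - \bar{x}_2$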
Independent Samples With
Unequal Variances

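A $100(1-\alpha)\%$ CI is: $(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2,\,df^*}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$
$df^* = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}}$
Fractional values of $df^*$ are rounded down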
Example
In the Accounting Professionals.xls worksheet,
find a 95 percent confidence interval for the
difference in years of service between males and
females.
Calculations
$s_1 = 4.39$ and $n_1 = 14$ (females),
$s_2 = 8.39$ and $n_2 = 13$ (males)
$df^* = 17.81$, so use 17 as the degrees
of freedom
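The $df^*$ value in the example can be checked directly. This sketch computes it from the slide's $s_1, n_1, s_2, n_2$ and rounds down as the slide specifies; the sample means are not shown on the slide, so `welch_ci` (a hypothetical helper) takes them and the table value of t as parameters.

```python
import math

def welch_df(s1, n1, s2, n2):
    """df* for independent samples with unequal variances."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    return (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

def welch_ci(x1bar, x2bar, s1, n1, s2, n2, t_crit):
    """(x1bar - x2bar) +/- t_{alpha/2, df*} * sqrt(s1^2/n1 + s2^2/n2)."""
    half_width = t_crit * math.sqrt(s1**2 / n1 + s2**2 / n2)
    diff = x1bar - x2bar
    return diff - half_width, diff + half_width

df_star = welch_df(4.39, 14, 8.39, 13)
print(round(df_star, 2), math.floor(df_star))  # 17.81 17
```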
Independent Samples With
Equal Variances

A $100(1-\alpha)\%$ CI is: $(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2,\,n_1+n_2-2}\, s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$
$s_p = \sqrt{\frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}}$
where $s_p$ is a common pooled standard deviation. This interval
assumes that the variances of the two populations are equal.
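A minimal sketch of the pooled interval. It computes $s_p$ for the Accounting Professionals standard deviations and sample sizes; the means are not given on these slides, so `pooled_ci` (a hypothetical helper) takes them and $t_{\alpha/2,\,n_1+n_2-2}$ as parameters. Note that $s_p$ always lies between $s_1$ and $s_2$.

```python
import math

def pooled_sd(s1, n1, s2, n2):
    """Pooled standard deviation with n1 + n2 - 2 degrees of freedom."""
    return math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))

def pooled_ci(x1bar, x2bar, s1, n1, s2, n2, t_crit):
    """(x1bar - x2bar) +/- t * s_p * sqrt(1/n1 + 1/n2); t_crit is t_{alpha/2, n1+n2-2}."""
    sp = pooled_sd(s1, n1, s2, n2)
    half_width = t_crit * sp * math.sqrt(1 / n1 + 1 / n2)
    diff = x1bar - x2bar
    return diff - half_width, diff + half_width

sp = pooled_sd(4.39, 14, 8.39, 13)
print(round(sp, 3))
```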
Example: Accounting
Professionals
Paired Samples

A $100(1-\alpha)\%$ CI is: $\bar{D} \pm t_{\alpha/2,\,n-1}\, s_D/\sqrt{n}$
$s_D = \sqrt{\frac{\sum_{i=1}^{n}(D_i - \bar{D})^2}{n-1}}$
$D_i$ = difference for each pair of observations
$\bar{D}$ = average of differences
PHStat tool is available in the
Confidence Intervals menu
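A minimal sketch of the paired-sample interval. The five pairs of "actual" and "estimated" values below are hypothetical (not the Pile Foundation.xls data), and $t_{0.025,\,4} \approx 2.7764$ is taken from a standard t table.

```python
import math
from statistics import fmean, stdev

def paired_ci(first, second, t_crit):
    """D-bar +/- t_{alpha/2, n-1} * s_D / sqrt(n), with D_i = first_i - second_i."""
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    d = [a - b for a, b in zip(first, second)]
    n = len(d)
    d_bar = fmean(d)
    s_d = stdev(d)  # n - 1 divisor, matching the s_D formula
    half_width = t_crit * s_d / math.sqrt(n)
    return d_bar - half_width, d_bar + half_width

actual = [10.5, 12.0, 9.8, 11.2, 13.1]      # hypothetical
estimated = [10.0, 11.2, 10.1, 10.6, 12.4]  # hypothetical
lo, hi = paired_ci(actual, estimated, t_crit=2.7764)
print(f"[{lo:.3f}, {hi:.3f}]")
```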
Example
Pile Foundation.xls

A 95% CI for the average difference
between the actual and estimated pile
lengths is

Differences Between
Proportions

A $100(1-\alpha)\%$ CI is: $(p_1 - p_2) \pm z_{\alpha/2}\sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}$
Applies when $n_i p_i$ and $n_i(1-p_i)$ are greater than 5
Example
In the Accounting Professionals.xls
worksheet, the proportion of females having
a CPA is 8/14 = 0.57, while the proportion of
males having a CPA is 6/13 = 0.46. A 95
percent confidence interval for the difference
in proportions between females and males is
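A minimal sketch of that calculation from the counts on the slide (8 of 14 females, 6 of 13 males), using unrounded proportions and the rule that each $n_i p_i$ and $n_i(1-p_i)$ exceed 5. The function name is hypothetical, and the printed bounds may differ slightly from a hand calculation that rounds p to two decimals.

```python
import math
from statistics import NormalDist

def ci_diff_proportions(x1, n1, x2, n2, conf=0.95):
    """(p1 - p2) +/- z * sqrt(p1(1-p1)/n1 + p2(1-p2)/n2)."""
    p1, p2 = x1 / n1, x2 / n2
    for n, p in ((n1, p1), (n2, p2)):
        if n * p <= 5 or n * (1 - p) <= 5:
            raise ValueError("n_i p_i and n_i (1 - p_i) should exceed 5")
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z * se, diff + z * se

lo, hi = ci_diff_proportions(8, 14, 6, 13)
print(f"[{lo:.3f}, {hi:.3f}]")
```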
Sampling Distribution of s
The sample standard deviation, s, is a point
estimate for the population standard
deviation, $\sigma$
For samples from a normal population, the quantity
$(n-1)s^2/\sigma^2$ has a chi-square ($\chi^2$)
distribution with $n-1$ df
See Table A.3
CHIDIST(x, deg_freedom) returns probability to
the right of x
CHIINV(probability, deg_freedom) returns the
value of x for a specified right-tail probability
Confidence Intervals for the
Variance

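A $100(1-\alpha)\%$ CI is:
$\left[\frac{(n-1)s^2}{\chi^2_{n-1,\,\alpha/2}},\ \frac{(n-1)s^2}{\chi^2_{n-1,\,1-\alpha/2}}\right]$
where $\chi^2_{n-1,\,a}$ is the value with right-tail probability $a$, that is, CHIINV($a$, $n-1$)
Note the difference in the
denominators!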
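A minimal sketch of the variance interval. The standard library has no chi-square quantile, so this computes the chi-square CDF from the power series for the regularized incomplete gamma function and inverts it by bisection, mirroring what CHIINV returns (a library routine such as SciPy's `scipy.stats.chi2.isf` would be the usual choice in practice). The sample (n = 10, s = 2.0) is hypothetical.

```python
import math

def chi2_cdf(x, df):
    """P(X <= x) for chi-square(df), via the series for the
    regularized lower incomplete gamma function P(df/2, x/2)."""
    if x <= 0:
        return 0.0
    a, z = df / 2, x / 2
    term = 1.0 / a
    total = term
    k = 0
    while term > 1e-15 * total:
        k += 1
        term *= z / (a + k)
        total += term
    return math.exp(a * math.log(z) - z - math.lgamma(a)) * total

def chi2_right_tail_value(prob, df):
    """Value x with P(X > x) = prob, the quantity CHIINV(prob, df) returns."""
    target = 1 - prob
    lo, hi = 0.0, 1.0
    while chi2_cdf(hi, df) < target:   # expand until the root is bracketed
        hi *= 2
    for _ in range(100):
        mid = (lo + hi) / 2
        if chi2_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def variance_ci(s, n, alpha=0.05):
    """[(n-1)s^2 / chi2_{n-1, alpha/2}, (n-1)s^2 / chi2_{n-1, 1-alpha/2}]."""
    df = n - 1
    upper_crit = chi2_right_tail_value(alpha / 2, df)      # larger critical value
    lower_crit = chi2_right_tail_value(1 - alpha / 2, df)  # smaller critical value
    return df * s**2 / upper_crit, df * s**2 / lower_crit

lo, hi = variance_ci(2.0, 10)
print(round(lo, 3), round(hi, 3))
```

The larger critical value produces the lower bound, which is the point of the note about the denominators. Unlike the intervals for means, this one is not symmetric about $s^2$.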
Intervals for Variance - Dialog
PHStat menu > Confidence Intervals >
Estimate for the Population Variance
Enter sample size,
standard deviation,
and confidence level
Intervals for Variance - Results
Time Series Data
Confidence intervals only make sense
for stationary time series data
Summary and Conclusions
As the confidence level $(1 - \alpha)$
increases, the width of the confidence
interval also increases.
As the sample size increases, the width
of the confidence interval decreases.
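Both conclusions can be seen numerically with the $\sigma$-known interval, whose width is $2z_{\alpha/2}\sigma/\sqrt{n}$. This sketch assumes a hypothetical $\sigma = 10$.

```python
import math
from statistics import NormalDist

def z_interval_width(sigma, n, conf):
    """Full width 2 * z_{alpha/2} * sigma / sqrt(n) of the sigma-known CI."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return 2 * z * sigma / math.sqrt(n)

for conf in (0.90, 0.95, 0.99):        # higher confidence -> wider interval
    print(conf, round(z_interval_width(10.0, 45, conf), 2))
for n in (10, 45, 200):                # larger sample -> narrower interval
    print(n, round(z_interval_width(10.0, n, 0.95), 2))
```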
Probability Intervals
A $100(1-\alpha)\%$ probability interval for a
random variable X is any interval [a, b]
such that $P(a \le X \le b) = 1 - \alpha$
Do not confuse a confidence interval
with a probability interval; confidence
intervals are probability intervals for
sampling distributions, not for the
distribution of the random variable.
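To make the distinction concrete, this sketch computes the central 95% probability interval for a single observation of a hypothetical $X \sim N(60, 10)$ (the central interval is one choice among the many intervals that satisfy the definition). It is far wider than a 95% confidence interval for the mean based on n = 45 observations from the same distribution, because it describes individual values of X rather than the sampling distribution of $\bar{x}$.

```python
from statistics import NormalDist

def normal_probability_interval(mu, sigma, alpha=0.05):
    """Central [a, b] with P(a <= X <= b) = 1 - alpha for X ~ N(mu, sigma)."""
    dist = NormalDist(mu, sigma)
    return dist.inv_cdf(alpha / 2), dist.inv_cdf(1 - alpha / 2)

a, b = normal_probability_interval(60.0, 10.0)
dist = NormalDist(60.0, 10.0)
print(round(a, 2), round(b, 2), round(dist.cdf(b) - dist.cdf(a), 4))
```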
