Sessions 8 & 9 - Sample and Sampling Distributions

BUSINESS STATISTICS
Introduction to Sampling Process
• Sampling unit, sampling frame and target population: the sampling frame is the list of the items of
the (target) population from which a sample is to be obtained
• Characteristics of a Population ~ Parameter
• Characteristics of a Sample ~ Statistic
• Sampling Error: Selection Bias
• Sample size [n] vs cost: a trade-off
• Non-Sampling Error: Measurement Bias
• Defects: observational error
• Clerical errors: Recording, Copying, editing
• Non-response: Missing Values
Sampling Distribution of $\bar{x}$
The sampling distribution of $\bar{x}$ is the probability
distribution of all possible values of the sample
mean $\bar{x}$.
• Expected Value of $\bar{x}$
$E(\bar{x}) = \mu$
where: $\mu$ = the population mean

When the expected value of the point estimator
equals the population parameter, we say the point
estimator is unbiased.
Sample Mean Sampling Distribution:
If the Population is Normal
• If a population is normal with mean μ and standard
deviation σ, the sampling distribution of the statistic $\bar{X}$ is
also normally distributed with
$\mu_{\bar{X}} = \mu$ and $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$
Using E[X] and Var (X)
Properties of Moments
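The mean and standard error quoted above follow from the basic properties of expectation and variance. A short derivation, assuming the sample values $X_1, \dots, X_n$ are independent draws from a population with mean $\mu$ and variance $\sigma^2$ (the independence assumption is stated here for the sketch; it corresponds to sampling with replacement or from an infinite population):

```latex
\begin{aligned}
E[\bar{X}] &= E\!\left[\frac{1}{n}\sum_{i=1}^{n} X_i\right]
            = \frac{1}{n}\sum_{i=1}^{n} E[X_i]
            = \frac{1}{n}\,(n\mu) = \mu \\[4pt]
\operatorname{Var}(\bar{X}) &= \operatorname{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
            = \frac{1}{n^{2}}\sum_{i=1}^{n} \operatorname{Var}(X_i)
            = \frac{n\sigma^{2}}{n^{2}} = \frac{\sigma^{2}}{n} \\[4pt]
\sigma_{\bar{X}} &= \sqrt{\operatorname{Var}(\bar{X})} = \frac{\sigma}{\sqrt{n}}
\end{aligned}
```

The first line uses linearity of expectation; the second uses the fact that the variance of a sum of independent variables is the sum of their variances.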
Sample Mean Sampling Distribution: Standard Error of the Mean
• the sample mean (the statistic) is used to estimate a population mean (a
parameter)
• the sample proportion (the statistic) is used to estimate the population
proportion (a parameter).
• Our main concern when making a statistical inference is drawing conclusions
about a population, not about a sample.
• Different samples of the same size from the same population will yield different sample means
• A measure of the variability in the mean from sample to sample is given by the Standard Error
of the Mean:
$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$
(This assumes that sampling is with replacement, or that
sampling is without replacement from an infinite population)
• Note that the standard error of the mean decreases as the sample size increases
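Comparing the Population Distribution
to the Sample Means Distribution
Population: N = 4, values 18, 20, 22, 24 (items A, B, C, D), $\mu = 21$, $\sigma = 2.236$
Sample means distribution: n = 2, $\mu_{\bar{X}} = 21$, $\sigma_{\bar{X}} = 1.58$
[Figure: P(X) for the population values 18 to 24, next to P($\bar{X}$) for the sample means 18, 19, ..., 24]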
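The figures on this slide can be checked by brute force. The sketch below assumes the four population values are 18, 20, 22 and 24 (read off the axis of the population plot) and that samples of size 2 are drawn with replacement, which is the assumption under which the standard error equals σ/√n. Variable names are illustrative only.

```python
from itertools import product
from math import isclose, sqrt
from statistics import mean, pstdev

population = [18, 20, 22, 24]   # assumed values for items A, B, C, D
n = 2                           # sample size

# Every ordered sample of size n drawn with replacement (4**2 = 16 samples).
samples = list(product(population, repeat=n))
sample_means = [mean(s) for s in samples]

mu = mean(population)               # population mean
sigma = pstdev(population)          # population standard deviation
mu_xbar = mean(sample_means)        # mean of the sampling distribution
sigma_xbar = pstdev(sample_means)   # standard error of the mean

print(f"population:   mu = {mu:.2f}, sigma = {sigma:.3f}")
print(f"sample means: mu = {mu_xbar:.2f}, sigma = {sigma_xbar:.3f}")
print(f"sigma / sqrt(n) = {sigma / sqrt(n):.3f}")
```

The mean of the 16 sample means matches the population mean, and their standard deviation matches σ/√n, as the slide states.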
Intuitive representation of effect of n on standard error
Z-value for Sampling Distribution
of the Mean
• Z-value for the sampling distribution of $\bar{X}$:
$Z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$
where: $\bar{X}$ = sample mean
$\mu$ = population mean
$\sigma$ = population standard deviation
n = sample size
Sampling Distribution
Properties
• $\mu_{\bar{x}} = \mu$ (i.e. $\bar{x}$ is unbiased)
[Figure: normal population distribution centered at $\mu$, and normal sampling distribution of $\bar{x}$ centered at $\mu_{\bar{x}}$ (has the same mean)]
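Sampling Distribution Properties
As n increases, $\sigma_{\bar{x}}$ decreases [inverse relation: $\sigma_{\bar{x}} = \sigma / \sqrt{n}$]
[Figure: sampling distributions of $\bar{x}$ centered at $\mu_{\bar{x}}$ for a larger sample size (narrower curve) and a smaller sample size (wider curve)]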
If the Population is not Normal
• We can apply the Central Limit Theorem:
• Even if the population is not normal [check using box
plots, histograms, P-P plots],
• … sample means from the population will be approximately
normal as long as the sample size is large enough.
Properties of the sampling distribution:
$\mu_{\bar{x}} = \mu$ and $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
Central Limit Theorem
As the sample size gets large enough (n↑), the sampling distribution of
the sample mean $\bar{x}$ becomes almost normal, regardless of the shape of the
population.
If the Population is not Normal
Sampling distribution properties:
• Central Tendency: $\mu_{\bar{x}} = \mu$
• Variation: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
[Figure: a non-normal population distribution, and sampling distributions of $\bar{x}$ for a smaller and a larger sample size (becomes normal as n increases)]
Visualizing CLT
How do we define a large sample?
• For most distributions, n > 30 will give a sampling
distribution that is nearly normal
• For fairly symmetric distributions, n > 15 is usually enough
• For normal population distributions, the sampling
distribution of the mean is always normally
distributed
• What matters in the end is the shape of the population
distribution: visualize it through histograms and other
frequency-distribution-based representations
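A small simulation can make these rules of thumb concrete. The sketch below is illustrative only: it assumes a right-skewed exponential population with mean 1 and standard deviation 1 (a choice made here, not taken from the slides), a fixed random seed, and 5,000 repeated samples per sample size. Skewness near 0 indicates a roughly symmetric, bell-like sampling distribution.

```python
import random
from math import sqrt
from statistics import fmean, pstdev

random.seed(0)  # fixed seed so the run is repeatable

def simulate_sample_means(n, draws=5000):
    """Means of `draws` random samples of size n from an exponential
    population (mean 1, standard deviation 1, strongly right-skewed)."""
    return [sum(random.expovariate(1.0) for _ in range(n)) / n
            for _ in range(draws)]

def skewness(values):
    """Sample skewness: close to 0 for a symmetric distribution."""
    m, s = fmean(values), pstdev(values)
    return fmean([((v - m) / s) ** 3 for v in values])

results = {}
for n in (2, 15, 30):
    means = simulate_sample_means(n)
    results[n] = (fmean(means), pstdev(means), skewness(means))
    avg, sd, skew = results[n]
    print(f"n={n:2d}  mean={avg:.3f}  sd={sd:.3f}  "
          f"sigma/sqrt(n)={1 / sqrt(n):.3f}  skewness={skew:.2f}")
```

The simulated standard deviation of the sample means should track σ/√n, and the skewness should shrink toward 0 as n grows, which is the Central Limit Theorem at work.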
Example
• Suppose GVKPIL has a mean share price μ = 8 and
standard deviation σ = 3. A random sample of n
= 36 days [closing prices from the last 2 years] is selected.
• What is the probability that the sample mean share
price is between 7.8 and 8.2?
Example
Solution:
• Even if the population is not normally distributed,
the central limit theorem can be used (n > 30)
• … so the sampling distribution of $\bar{x}$ is approximately
normal
• … with mean $\mu_{\bar{x}} = 8$
• … and standard error $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{3}{\sqrt{36}} = 0.5$
• Repeat the calculation for n = 9, n = 75 and n = 300
• Implication for n = 36: about 31% of samples of 36 days will have a mean share price
between 7.8 and 8.2
• Other implications: the relation between n and the corresponding probability
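Example
Solution (continued):
$P(7.8 \le \bar{X} \le 8.2) = P\left(\frac{7.8 - 8}{3/\sqrt{36}} \le \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le \frac{8.2 - 8}{3/\sqrt{36}}\right)$
$= P(-0.4 \le Z \le 0.4) = 0.6554 - 0.3446 = 0.3108$
[Figure: population distribution ($\mu = 8$); sampling distribution of $\bar{X}$ ($\mu_{\bar{X}} = 8$, interval 7.8 to 8.2), obtained by sampling; standard normal distribution ($\mu_Z = 0$, interval -0.4 to 0.4), obtained by standardizing]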
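The share-price calculation, and the suggested repeats for other sample sizes, can be sketched with Python's standard library. The function name `prob_mean_between` is hypothetical, and the normal approximation is assumed for every n (for n = 9 this leans on the population itself being roughly normal, since n is below 30).

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 8.0, 3.0    # population mean and standard deviation from the example
low, high = 7.8, 8.2    # interval for the sample mean

def prob_mean_between(n):
    """P(low <= sample mean <= high) under the normal approximation."""
    standard_error = sigma / sqrt(n)
    sampling_dist = NormalDist(mu, standard_error)
    return sampling_dist.cdf(high) - sampling_dist.cdf(low)

probabilities = {n: prob_mean_between(n) for n in (9, 36, 75, 300)}
for n, prob in probabilities.items():
    print(f"n = {n:3d}: standard error = {sigma / sqrt(n):.4f}, "
          f"P({low} <= mean <= {high}) = {prob:.4f}")
```

As n grows the standard error shrinks, so more of the sampling distribution falls inside the fixed interval and the probability rises.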
Population Proportions
π = the proportion of the population having
some characteristic
• Sample proportion (p) provides an estimate
of π:
$p = \frac{X}{n} = \frac{\text{number of items in the sample having the characteristic of interest}}{\text{sample size}}$
• 0 ≤ p ≤ 1
• p is approximately distributed as a normal distribution when
n is large
(assuming sampling with replacement from a finite population or without
replacement from an infinite population)
Sampling Distribution of p
• Approximated by a
normal distribution if:
• $n\pi \ge 5$
and
$n(1 - \pi) \ge 5$
where
$\mu_p = \pi$ and $\sigma_p = \sqrt{\frac{\pi(1 - \pi)}{n}}$
(where π = population proportion)
[Figure: sampling distribution P(p) plotted against p from 0 to 1]
Z-Value for Proportions
Standardize p to a Z value with the formula:
$Z = \frac{p - \pi}{\sigma_p} = \frac{p - \pi}{\sqrt{\frac{\pi(1 - \pi)}{n}}}$
Example
• If the true proportion of bankers who support M & A
of PSU Banks is π = 0.4, what is the probability that
a sample of 200 managers yields a sample
proportion between 0.40 and 0.45?
• i.e.: if π = 0.4 and n = 200, what is
P(0.40 ≤ p ≤ 0.45) ?
Example (continued)
Find $\sigma_p$: $\sigma_p = \sqrt{\frac{\pi(1 - \pi)}{n}} = \sqrt{\frac{0.4(1 - 0.4)}{200}} = 0.03464$
Convert to standardized normal:
$P(0.40 \le p \le 0.45) = P\left(\frac{0.40 - 0.40}{0.03464} \le Z \le \frac{0.45 - 0.40}{0.03464}\right) = P(0 \le Z \le 1.44)$
Example (continued)
Utilize the cumulative normal table:
P(0 ≤ Z ≤ 1.44) = 0.9251 - 0.5000 = 0.4251
[Figure: sampling distribution of p (interval 0.40 to 0.45) standardized to the standard normal distribution of Z (interval 0 to 1.44); shaded area = 0.4251]
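The proportion example can be reproduced in a few lines. This is a minimal sketch using the standard library; the variable names are illustrative, and the z-value is left unrounded, so the result differs slightly from the table-based 0.4251, which rounds z to 1.44.

```python
from math import sqrt
from statistics import NormalDist

pop_prop, n = 0.4, 200   # population proportion (pi) and sample size from the example

# Rule of thumb from the slides: n*pi >= 5 and n*(1 - pi) >= 5.
normal_ok = n * pop_prop >= 5 and n * (1 - pop_prop) >= 5

sigma_p = sqrt(pop_prop * (1 - pop_prop) / n)   # standard error of the proportion
z_low = (0.40 - pop_prop) / sigma_p
z_high = (0.45 - pop_prop) / sigma_p

standard_normal = NormalDist()                  # mean 0, standard deviation 1
prob = standard_normal.cdf(z_high) - standard_normal.cdf(z_low)

print(f"normal approximation reasonable: {normal_ok}")
print(f"sigma_p = {sigma_p:.5f}")
print(f"z runs from {z_low:.2f} to {z_high:.2f}")
print(f"P(0.40 <= p <= 0.45) = {prob:.4f}")
```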
Finite Population Correction
Factors/Multipliers
• Used to calculate the standard error of both the sample
mean and the sample proportion
• Needed when the sample size, n, is more than 5% of the
population size N, i.e., sampling fraction n/N > 0.05
• The Finite Population Correction Factor Is:
$\text{fpc} = \sqrt{\frac{N - n}{N - 1}}$
Using The fpc In Calculating Standard Errors
Standard Error of the Mean for Finite Populations
$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}}$
Standard Error of the Proportion for Finite Populations
$\sigma_p = \sqrt{\frac{\pi(1 - \pi)}{n}} \sqrt{\frac{N - n}{N - 1}}$
Using The fpc Reduces The Standard Error
• The fpc is always less than 1 (whenever n > 1)
• So when it is used it reduces the standard error
• Resulting in more precise estimates of population parameters

Using fpc With The Mean - Example
Suppose a random sample of size 100 is drawn from a
population of size 1,000 with a standard deviation of 40.
Here n=100, N=1,000 and 100/1,000 = 0.10 > 0.05.

So using the fpc for the standard error of the mean we get:
$\sigma_{\bar{X}} = \frac{40}{\sqrt{100}} \sqrt{\frac{1000 - 100}{1000 - 1}} = 3.8$
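The same arithmetic as a small helper, to show how much the correction changes the result. The function names `fpc` and `standard_error_mean` are hypothetical, and the 5% threshold is the rule of thumb stated on the earlier slide.

```python
from math import sqrt

def fpc(N, n):
    """Finite population correction factor sqrt((N - n) / (N - 1))."""
    return sqrt((N - n) / (N - 1))

def standard_error_mean(sigma, n, N=None):
    """Standard error of the mean; the fpc is applied only when a
    population size N is given and the sampling fraction n/N exceeds 0.05."""
    se = sigma / sqrt(n)
    if N is not None and n / N > 0.05:
        se *= fpc(N, n)
    return se

se_uncorrected = standard_error_mean(40, 100)         # infinite-population formula
se_corrected = standard_error_mean(40, 100, N=1000)   # n/N = 0.10 > 0.05

print(f"fpc = {fpc(1000, 100):.4f}")
print(f"standard error without fpc = {se_uncorrected:.2f}")
print(f"standard error with fpc    = {se_corrected:.2f}")
```

Because the factor is below 1, the corrected standard error is smaller than the uncorrected σ/√n.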
Sampling Distribution of p
• Standard Deviation of p
Finite Population: $\sigma_p = \sqrt{\frac{N - n}{N - 1}} \sqrt{\frac{\pi(1 - \pi)}{n}}$
Infinite Population: $\sigma_p = \sqrt{\frac{\pi(1 - \pi)}{n}}$
• $\sigma_p$ is referred to as the standard error of
the proportion.
• $\sqrt{(N - n)/(N - 1)}$ is the finite population
correction factor.
Chapter Summary
Discussed till now
• Sampling distributions
• The sampling distribution of the mean
• For normal populations
• For non-normal populations: Using the Central Limit Theorem
• The sampling distribution of a proportion
• Calculating probabilities using sampling distributions
• When finite population corrections are needed
• How to utilize finite population correction factors in calculating
standard errors
• FINAL EXAMPLE
• Example: St. Andrew’s College
What is the probability that a simple random
sample of 30 applicants will provide an estimate of
the population mean SAT score that is within +/-10
of the actual population mean $\mu$?
In other words, what is the probability that $\bar{x}$ will
be between 1687 and 1707?
Sampling distribution of $\bar{x}$ for SAT scores:
$E(\bar{x}) = 1697$
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{87.4}{\sqrt{30}} = 15.96$
Step 1: Calculate the z-value at the upper endpoint of
the interval.
z = (1707 - 1697)/15.96= .63
Step 2: Find the area under the curve to the left of the
upper endpoint.
P(z < .63) = .7357
Cumulative Probabilities for
the Standard Normal Distribution
z .00 .01 .02 .03 .04 .05 .06 .07 .08 .09
. . . . . . . . . . .
.5 .6915 .6950 .6985 .7019 .7054 .7088 .7123 .7157 .7190 .7224
.6 .7257 .7291 .7324 .7357 .7389 .7422 .7454 .7486 .7517 .7549
.7 .7580 .7611 .7642 .7673 .7704 .7734 .7764 .7794 .7823 .7852
.8 .7881 .7910 .7939 .7967 .7995 .8023 .8051 .8078 .8106 .8133
.9 .8159 .8186 .8212 .8238 .8264 .8289 .8315 .8340 .8365 .8389
. . . . . . . . . . .
[Figure: sampling distribution of $\bar{x}$ for SAT scores, $\sigma_{\bar{x}} = 15.96$; area to the left of 1707 = .7357, with the mean at 1697]
Step 3: Calculate the z-value at the lower endpoint of
the interval.
z = (1687 - 1697)/15.96= - .63
Step 4: Find the area under the curve to the left of the
lower endpoint.
P(z < -.63) = .2643
[Figure: sampling distribution of $\bar{x}$ for SAT scores; area to the left of 1687 = .2643, with the mean at 1697]
Step 5: Calculate the area under the curve between
the lower and upper endpoints of the interval.
P(-.63 < z < .63) = P(z < .63) - P(z < -.63)
= .7357 - .2643
= .4714
The probability that the sample mean SAT score will
be between 1687 and 1707 is:
$P(1687 < \bar{x} < 1707) = .4714$

[Figure: sampling distribution of $\bar{x}$ for SAT scores; area between 1687 and 1707 = .4714, with the mean at 1697]
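The five steps can be traced in code. This sketch rounds each z-value to two decimals, as one does when reading a printed table, so that the areas line up with the table row for z = .6; `NormalDist` from the standard library stands in for the table lookup.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 1697, 87.4, 30        # values from the St. Andrew's example
standard_error = sigma / sqrt(n)     # about 15.96

# Steps 1 and 3: z-values at the upper and lower endpoints,
# rounded to two decimals as with a printed table.
z_upper = round((1707 - mu) / standard_error, 2)
z_lower = round((1687 - mu) / standard_error, 2)

# Steps 2 and 4: cumulative areas to the left of each endpoint.
standard_normal = NormalDist()
area_upper = standard_normal.cdf(z_upper)
area_lower = standard_normal.cdf(z_lower)

# Step 5: area between the two endpoints.
prob = area_upper - area_lower

print(f"standard error = {standard_error:.2f}")
print(f"z_lower = {z_lower}, z_upper = {z_upper}")
print(f"P(z < z_upper) = {area_upper:.4f}, P(z < z_lower) = {area_lower:.4f}")
print(f"P(1687 < sample mean < 1707) = {prob:.4f}")
```

Any difference in the last digit relative to the slide comes from subtracting table values that were already rounded to four decimals.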
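Relationship Between the Sample Size
and the Sampling Distribution of $\bar{x}$
• Suppose we select a simple random sample of 100 applicants
instead of the 30 originally considered, from a total of 900 applicants.
• $E(\bar{x}) = \mu$ regardless of the sample size. In our
example, $E(\bar{x})$ remains at 1697.
• Whenever the sample size is increased, the standard
error of the mean $\sigma_{\bar{x}}$ is decreased. With the increase
in the sample size to n = 100, the sampling fraction is n/N = 100/900 ≈ 0.11 > 0.05,
so the finite population correction applies, and the standard error of
the mean is decreased from 15.96 to:
$\sigma_{\bar{x}} = \sqrt{\frac{N - n}{N - 1}} \left(\frac{\sigma}{\sqrt{n}}\right) = \sqrt{\frac{900 - 100}{900 - 1}} \left(\frac{87.4}{\sqrt{100}}\right) = .94333(8.74) = 8.2$
• Example (continued), applying the concept of sampling from a finite population:
[Figure: sampling distributions of $\bar{x}$ centered at $E(\bar{x}) = 1697$; with n = 100, $\sigma_{\bar{x}} = 8.2$ (narrower curve); with n = 30, $\sigma_{\bar{x}} = 15.96$ (wider curve)]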
• Recall that when n = 30, $P(1687 < \bar{x} < 1707) = .4714$.
• We follow the same steps to solve for $P(1687 < \bar{x} < 1707)$
when n = 100 as we showed earlier when
n = 30.
• Now, with n = 100, $P(1687 < \bar{x} < 1707) = .7776$.
• Because the sampling distribution with n = 100 has a
smaller standard error, the values of $\bar{x}$ have less
variability and tend to be closer to the population
mean than the values of $\bar{x}$ with n = 30.
[Figure: sampling distribution of $\bar{x}$ for SAT scores with $\sigma_{\bar{x}} = 8.2$; area between 1687 and 1707 = .7776, with the mean at 1697]
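The comparison between n = 30 and n = 100 can be sketched as follows. The helper name `prob_within_10` is hypothetical; the fpc is switched on only for n = 100, where the sampling fraction exceeds 5% of the 900 applicants. The values here are computed without intermediate rounding, so they differ slightly from the table-based .4714 and .7776.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, N = 1697, 87.4, 900   # values from the example

def prob_within_10(n, use_fpc):
    """Return (standard error, P(1687 < sample mean < 1707)) for sample size n."""
    se = sigma / sqrt(n)
    if use_fpc:
        se *= sqrt((N - n) / (N - 1))   # finite population correction
    sampling_dist = NormalDist(mu, se)
    return se, sampling_dist.cdf(1707) - sampling_dist.cdf(1687)

se_30, p_30 = prob_within_10(30, use_fpc=False)     # n/N is about 0.033, below 0.05
se_100, p_100 = prob_within_10(100, use_fpc=True)   # n/N is about 0.111, above 0.05

print(f"n = 30:  standard error = {se_30:.2f}, probability = {p_30:.4f}")
print(f"n = 100: standard error = {se_100:.2f}, probability = {p_100:.4f}")
```

The larger sample roughly halves the standard error, which is why the probability of landing within 10 points of the population mean rises so sharply.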
