
Sample and Sampling Distributions
Sessions 8 & 9
BUSINESS STATISTICS
Introduction to Sampling Process
• Sample unit, sample frame and target population: the sample frame is the list of
the items of the (target) population from which a sample is to be obtained
• Characteristics of a Population ~ Parameter
• Characteristics of a Sample ~ Statistic
• Sampling Error: Selection Bias
• Sample size [n] vs cost: a trade-off
• Non-Sampling Error: Measurement Bias
• Defects: observational error
• Clerical errors: Recording, Copying, editing
• Non-response: missing values
Sampling Distribution of $\bar{x}$

The sampling distribution of $\bar{x}$ is the probability distribution of all possible values of the sample mean $\bar{x}$.
• Expected Value of $\bar{x}$
$E(\bar{x}) = \mu$
where: $\mu$ = the population mean

When the expected value of the point estimator equals the population parameter, we say the point estimator is unbiased.
Sample Mean Sampling Distribution:
If the Population is Normal
• If a population is normal with mean μ and standard deviation σ, the sampling distribution of the statistic $\bar{X}$ is also normally distributed with
$\mu_{\bar{X}} = \mu$ and $\sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{n}}$
Using E[X] and Var (X)
Properties of Moments
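The moment properties referred to here can be written out as a short derivation. It is a sketch under one stated assumption: the sample $X_1, \dots, X_n$ consists of independent draws from a population with mean μ and variance σ² (which is what sampling with replacement, or from an infinite population, provides).

```latex
\begin{align*}
E(\bar{X}) &= E\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
            = \frac{1}{n}\sum_{i=1}^{n} E(X_i)
            = \frac{1}{n}\, n\mu = \mu \\
\operatorname{Var}(\bar{X}) &= \frac{1}{n^{2}}\sum_{i=1}^{n} \operatorname{Var}(X_i)
            = \frac{n\sigma^{2}}{n^{2}} = \frac{\sigma^{2}}{n} \\
\sigma_{\bar{X}} &= \sqrt{\operatorname{Var}(\bar{X})} = \frac{\sigma}{\sqrt{n}}
\end{align*}
```

The second line is where independence is used: without it the variance of the sum would also contain covariance terms.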
Sample Mean Sampling Distribution: Standard Error of the Mean

• the sample mean (the statistic) is used to estimate a population mean (a parameter)
• the sample proportion (the statistic) is used to estimate the population proportion (a parameter).
• Our main concern when making a statistical inference is drawing conclusions
about a population, not about a sample.
• Different samples of the same size from the same population will yield different sample means
• A measure of the variability in the mean from sample to sample is given by the Standard Error of the Mean:
$\sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{n}}$
(This assumes that sampling is with replacement, or that sampling is without replacement from an infinite population)
• Note that the standard error of the mean decreases as the sample size increases
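To make the last point concrete, the following minimal sketch tabulates σ/√n for a few sample sizes. The value σ = 10 and the sample sizes are illustrative assumptions, not figures from the slides, and `standard_error` is a hypothetical helper name.

```python
import math

def standard_error(sigma, n):
    """Standard error of the mean, sigma / sqrt(n).

    Valid for sampling with replacement, or without replacement
    from an effectively infinite population.
    """
    if n <= 0:
        raise ValueError("sample size n must be positive")
    return sigma / math.sqrt(n)

# Illustrative values only (assumptions, not taken from the slides).
sigma = 10.0
sample_sizes = [4, 16, 64, 256]
errors = [standard_error(sigma, n) for n in sample_sizes]

for n, se in zip(sample_sizes, errors):
    print(f"n = {n:>3}: standard error = {se:.3f}")
```

Each fourfold increase in n halves the standard error, which is the square-root relationship in the formula.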
Comparing the Population Distribution
to the Sample Means Distribution

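Population (N = 4; values A = 18, B = 20, C = 22, D = 24): μ = 21, σ = 2.236
Sample Means Distribution (n = 2): $\mu_{\bar{X}} = 21$, $\sigma_{\bar{X}} = 1.58$
[Figure: P(X) plotted against X for the population, beside P($\bar{X}$) plotted against $\bar{X}$ (values 18 to 24) for the sample means]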
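The numbers on this slide can be reproduced by listing every possible sample. The sketch below assumes the samples of size 2 are drawn with replacement from the four values 18, 20, 22 and 24; the slide does not say so explicitly, but that assumption matches its value of 1.58.

```python
import itertools
import math
import statistics

population = [18, 20, 22, 24]   # the N = 4 values from the slide
n = 2                           # sample size

# Every ordered sample of size n drawn with replacement: 4**2 = 16 samples.
samples = list(itertools.product(population, repeat=n))
sample_means = [statistics.fmean(s) for s in samples]

pop_mean = statistics.fmean(population)
pop_sd = statistics.pstdev(population)          # population (not sample) std dev
mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.pstdev(sample_means)

print(f"population:      mean = {pop_mean}, sd = {pop_sd:.3f}")
print(f"sample means:    mean = {mean_of_means}, sd = {sd_of_means:.3f}")
print(f"sigma / sqrt(n): {pop_sd / math.sqrt(n):.3f}")
```

The mean of the 16 sample means equals the population mean, and their standard deviation equals σ/√n.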
Intuitive representation of effect of n on standard error
Z-value for Sampling Distribution
of the Mean
• Z-value for the sampling distribution of $\bar{X}$:

$Z = \dfrac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \dfrac{\bar{X} - \mu}{\sigma / \sqrt{n}}$

where: $\bar{X}$ = sample mean
μ = population mean
σ = population standard deviation
n = sample size
Sampling Distribution
Properties
• $\mu_{\bar{x}} = \mu$ (i.e. $\bar{x}$ is unbiased)
[Figure: Normal Population Distribution centered at μ, above a Normal Sampling Distribution centered at $\mu_{\bar{x}}$ (has the same mean)]
Sampling Distribution Properties

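As n increases, $\sigma_{\bar{x}}$ decreases [inverse relation: $\sigma_{\bar{x}} = \sigma / \sqrt{n}$]
[Figure: sampling distributions of $\bar{x}$ centered at μ, narrower for the larger sample size and wider for the smaller sample size]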
Sample Mean Sampling Distribution:
If the Population is not Normal

• We can apply the Central Limit Theorem:
• Even if the population is not normal [check using box plots, histograms, P-P plots],
• …sample means from the population will be approximately normal as long as the sample size is large enough.

Properties of the sampling distribution:
$\mu_{\bar{x}} = \mu$ and $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$
Central Limit Theorem

As the sample size gets large enough (n↑), the sampling distribution of the sample mean becomes almost normal, regardless of the shape of the population.
[Figure: sampling distribution of $\bar{x}$]
Sample Mean Sampling Distribution:
If the Population is not Normal

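Sampling distribution properties:
• Central Tendency: $\mu_{\bar{x}} = \mu$
• Variation: $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$
[Figure: Population Distribution centered at μ, above the Sampling Distribution of $\bar{x}$ centered at $\mu_{\bar{x}}$ (becomes normal as n increases), narrower for the larger sample size and wider for the smaller sample size]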
Visualizing CLT
How do we define a large sample?

• For most distributions, n > 30 will give a sampling distribution that is nearly normal
• For fairly symmetric distributions, n > 15
• For normal population distributions, the sampling distribution of the mean is always normally distributed
• So what ultimately matters is the shape of the population distribution: visualize it through histograms and other frequency-distribution-based representations
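A small simulation can illustrate these rules of thumb. The sketch below assumes a strongly right-skewed population (exponential with mean 1 and standard deviation 1, chosen only for illustration) and compares the sample means for n = 2, 15 and 30. The helper names, the seed and the number of repetitions are all assumptions of this example.

```python
import math
import random
import statistics

def sample_means(n, repetitions, rng):
    """Means of `repetitions` samples of size n from an exponential population
    (mean 1, standard deviation 1, strongly right-skewed)."""
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(repetitions)
    ]

def skewness(values):
    """Moment-based skewness; 0 for a perfectly symmetric distribution."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean([((v - m) / s) ** 3 for v in values])

rng = random.Random(42)   # fixed seed so repeated runs give the same numbers
summary = {}
for n in (2, 15, 30):
    means = sample_means(n, 20_000, rng)
    summary[n] = {
        "mean": statistics.fmean(means),
        "sd": statistics.pstdev(means),
        "theory_sd": 1.0 / math.sqrt(n),   # sigma / sqrt(n) with sigma = 1
        "skew": skewness(means),
    }
    print(f"n = {n:>2}: mean = {summary[n]['mean']:.3f}, "
          f"sd = {summary[n]['sd']:.3f} (theory {summary[n]['theory_sd']:.3f}), "
          f"skewness = {summary[n]['skew']:.2f}")
```

The mean of the sample means stays near 1 for every n, the spread tracks σ/√n, and the skewness shrinks toward 0 as n grows, which is the sense in which the sampling distribution becomes nearly normal.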
Example
• Suppose GVKPIL has a mean share price μ = 8 and standard deviation σ = 3. A random sample of size n = 36 days [closing price from last 2 years] is selected.
• What is the probability that the sample mean share price is between 7.8 and 8.2?
Example
Solution:
• Even if the population is not normally distributed, the central limit theorem can be used (n > 30)
• … so the sampling distribution of $\bar{x}$ is approximately normal
• … with mean $\mu_{\bar{x}} = 8$
• … and standard error $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}} = \dfrac{3}{\sqrt{36}} = 0.5$
• Repeat the calculation for n = 75, n = 9 and n = 300
• Implication: with n = 36, about 31% of such 36-day samples will have a mean share price between 7.8 and 8.2
• Other implications: the relation between n and the corresponding probability
Example
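Solution (continued):
$P(7.8 \le \bar{X} \le 8.2) = P\left(\dfrac{7.8 - 8}{3/\sqrt{36}} \le \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le \dfrac{8.2 - 8}{3/\sqrt{36}}\right)$
$= P(-0.4 \le Z \le 0.4) = 0.6554 - 0.3446 = 0.3108$

[Figure: Population Distribution (μ = 8, shape unknown) → Sample → Sampling Distribution of $\bar{X}$ ($\mu_{\bar{X}} = 8$, area between 7.8 and 8.2) → Standardize → Standard Normal Distribution ($\mu_Z = 0$, area between -0.4 and 0.4)]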
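The same calculation can be repeated for the other sample sizes the solution asks about (n = 9, 75 and 300). This is a minimal sketch that uses the error function from Python's standard library for the normal CDF instead of a printed table, so results may differ from table look-ups in the fourth decimal. The function names are hypothetical.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative probability P(Z <= z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def prob_mean_between(lower, upper, mu, sigma, n):
    """P(lower <= sample mean <= upper) under the normal approximation."""
    se = sigma / math.sqrt(n)                 # standard error of the mean
    return normal_cdf((upper - mu) / se) - normal_cdf((lower - mu) / se)

mu, sigma = 8.0, 3.0                          # share price example from the slides
probabilities = {
    n: prob_mean_between(7.8, 8.2, mu, sigma, n) for n in (9, 36, 75, 300)
}

for n, p in probabilities.items():
    print(f"n = {n:>3}: P(7.8 <= sample mean <= 8.2) = {p:.4f}")
```

The probability of landing within 0.2 of the population mean rises with n because the standard error shrinks. For n = 9 the normal approximation additionally relies on the population itself being close to normal.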
Population Proportions
π = the proportion of the population having some characteristic
• Sample proportion (p) provides an estimate of π:
$p = \dfrac{X}{n} = \dfrac{\text{number of items in the sample having the characteristic of interest}}{\text{sample size}}$
• 0 ≤ p ≤ 1
• p is approximately distributed as a normal distribution when n is large
(assuming sampling with replacement from a finite population or without replacement from an infinite population)
Sampling Distribution of p

• Approximated by a normal distribution if:
$n\pi \ge 5$ and $n(1 - \pi) \ge 5$
where
$\mu_p = \pi$ and $\sigma_p = \sqrt{\dfrac{\pi(1 - \pi)}{n}}$
(where π = population proportion)
[Figure: Sampling Distribution, P(p) plotted against p from 0 to 1]
Z-Value for Proportions
Standardize p to a Z value with the formula:

$Z = \dfrac{p - \pi}{\sigma_p} = \dfrac{p - \pi}{\sqrt{\dfrac{\pi(1 - \pi)}{n}}}$
Example
• If the true proportion of bankers who support M & A of PSU Banks is π = 0.4, what is the probability that a sample of 200 managers yields a sample proportion between 0.40 and 0.45?
• i.e.: if π = 0.4 and n = 200, what is P(0.40 ≤ p ≤ 0.45)?
Example

• if π = 0.4 and n = 200, what is P(0.40 ≤ p ≤ 0.45)?

Find $\sigma_p$: $\sigma_p = \sqrt{\dfrac{\pi(1 - \pi)}{n}} = \sqrt{\dfrac{0.4(1 - 0.4)}{200}} = 0.03464$

Convert to standardized normal:
$P(0.40 \le p \le 0.45) = P\left(\dfrac{0.40 - 0.40}{0.03464} \le Z \le \dfrac{0.45 - 0.40}{0.03464}\right) = P(0 \le Z \le 1.44)$
Example
• if π = 0.4 and n = 200, what is
P(0.40 ≤ p ≤ 0.45) ?

Utilize the cumulative normal table:


P(0 ≤ Z ≤ 1.44) = 0.9251 – 0.5000 = 0.4251

[Figure: Sampling Distribution of p (area 0.4251 between 0.40 and 0.45) → Standardize → Standardized Normal Distribution (area 0.4251 between 0 and 1.44)]
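The proportion example can be checked numerically. The sketch below uses the standard library's error function rather than a z-table, so it works with the unrounded z-value (about 1.4434) and returns a probability slightly different from the table-based 0.4251. Variable names such as `pi_pop` are assumptions of this example.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative probability P(Z <= z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

pi_pop = 0.4                 # population proportion (π on the slides)
n = 200                      # sample size
lower, upper = 0.40, 0.45    # interval for the sample proportion p

# Rule-of-thumb check before using the normal approximation.
normal_ok = n * pi_pop >= 5 and n * (1 - pi_pop) >= 5

sigma_p = math.sqrt(pi_pop * (1 - pi_pop) / n)   # standard error of p
z_lower = (lower - pi_pop) / sigma_p
z_upper = (upper - pi_pop) / sigma_p
probability = normal_cdf(z_upper) - normal_cdf(z_lower)

print(f"normal approximation reasonable: {normal_ok}")
print(f"sigma_p = {sigma_p:.5f}")
print(f"z from {z_lower:.2f} to {z_upper:.2f}")
print(f"P({lower} <= p <= {upper}) = {probability:.4f}")
```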
Finite Population Correction
Factors/Multipliers
• Used to calculate the standard error of both the sample
mean and the sample proportion

• Needed when the sample size, n, is more than 5% of the population size N, i.e., sampling fraction n/N > 0.05

• The Finite Population Correction Factor is:

$\text{fpc} = \sqrt{\dfrac{N - n}{N - 1}}$
Using The fpc In Calculating Standard Errors
Standard Error of the Mean for Finite Populations

$\sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{n}} \sqrt{\dfrac{N - n}{N - 1}}$

Standard Error of the Proportion for Finite Populations

$\sigma_p = \sqrt{\dfrac{\pi(1 - \pi)}{n}} \sqrt{\dfrac{N - n}{N - 1}}$
Using The fpc Reduces The Standard Error
• Resulting in more precise estimates of population parameters

• So when it is used it reduces the standard error

• The fpc is always less than 1 whenever n > 1


Using fpc With The Mean - Example
Suppose a random sample of size 100 is drawn from a
population of size 1,000 with a standard deviation of 40.

Here n=100, N=1,000 and 100/1,000 = 0.10 > 0.05.


So using the fpc for the standard error of the mean we get:

$\sigma_{\bar{X}} = \dfrac{40}{\sqrt{100}} \sqrt{\dfrac{1000 - 100}{1000 - 1}} \approx 3.8$
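The example above can be reproduced with a few lines of code. This sketch applies the correction only when the sampling fraction n/N exceeds 5%, following the rule stated on the earlier slide. The function names and the `threshold` parameter are hypothetical choices for illustration.

```python
import math

def fpc(N, n):
    """Finite population correction factor sqrt((N - n) / (N - 1))."""
    return math.sqrt((N - n) / (N - 1))

def standard_error_mean(sigma, n, N=None, threshold=0.05):
    """Standard error of the mean; multiplies by the fpc when a population
    size N is given and the sampling fraction n / N exceeds `threshold`."""
    se = sigma / math.sqrt(n)
    if N is not None and n / N > threshold:
        se *= fpc(N, n)
    return se

se_uncorrected = standard_error_mean(sigma=40, n=100)          # no N supplied
se_corrected = standard_error_mean(sigma=40, n=100, N=1000)    # n/N = 0.10 > 0.05

print(f"without fpc: {se_uncorrected:.3f}")
print(f"with fpc:    {se_corrected:.3f}")
```

The corrected value is smaller than the uncorrected one because the fpc is below 1.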
Sampling Distribution of p

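• Standard Deviation of p (on this slide p inside the formulas denotes the population proportion, written π earlier)
Finite Population: $\sigma_p = \sqrt{\dfrac{N - n}{N - 1}} \sqrt{\dfrac{p(1 - p)}{n}}$
Infinite Population: $\sigma_p = \sqrt{\dfrac{p(1 - p)}{n}}$

• $\sigma_p$ is referred to as the standard error of the proportion.
• $\sqrt{(N - n)/(N - 1)}$ is the finite population correction factor.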
Chapter Summary
Discussed till now

• Sampling distributions
• The sampling distribution of the mean
• For normal populations
• For non-normal populations: Using the Central Limit Theorem
• The sampling distribution of a proportion
• Calculating probabilities using sampling distributions
• To know when finite population corrections are needed
• To know how to utilize finite population correction factors in calculating standard errors
• FINAL EXAMPLE
Sampling Distribution of $\bar{x}$
• Example: St. Andrew’s College
What is the probability that a simple random sample of 30 applicants will provide an estimate of the population mean SAT score that is within +/-10 of the actual population mean μ?
In other words, what is the probability that $\bar{x}$ will be between 1687 and 1707?
Sampling Distribution of $\bar{x}$
• Example: St. Andrew’s College

$\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}} = \dfrac{87.4}{\sqrt{30}} = 15.96$
[Figure: Sampling Distribution of $\bar{x}$ for SAT Scores, centered at $E(\bar{x}) = 1697$]
Sampling Distribution of $\bar{x}$
• Example: St. Andrew’s College
Step 1: Calculate the z-value at the upper endpoint of the interval.
z = (1707 - 1697)/15.96 = .63
Step 2: Find the area under the curve to the left of the upper endpoint.
P(z < .63) = .7357
Sampling Distribution of $\bar{x}$
• Example: St. Andrew’s College
Cumulative Probabilities for the Standard Normal Distribution
z    .00    .01    .02    .03    .04    .05    .06    .07    .08    .09
.     .      .      .      .      .      .      .      .      .      .
.5   .6915  .6950  .6985  .7019  .7054  .7088  .7123  .7157  .7190  .7224
.6   .7257  .7291  .7324  .7357  .7389  .7422  .7454  .7486  .7517  .7549
.7   .7580  .7611  .7642  .7673  .7704  .7734  .7764  .7794  .7823  .7852
.8   .7881  .7910  .7939  .7967  .7995  .8023  .8051  .8078  .8106  .8133
.9   .8159  .8186  .8212  .8238  .8264  .8289  .8315  .8340  .8365  .8389
.     .      .      .      .      .      .      .      .      .      .
Sampling Distribution of $\bar{x}$
• Example: St. Andrew’s College
[Figure: Sampling Distribution of $\bar{x}$ for SAT Scores, $\sigma_{\bar{x}} = 15.96$, centered at 1697; area to the left of 1707 = .7357]
Sampling Distribution of $\bar{x}$
• Example: St. Andrew’s College
Step 3: Calculate the z-value at the lower endpoint of the interval.
z = (1687 - 1697)/15.96 = -.63
Step 4: Find the area under the curve to the left of the lower endpoint.
P(z < -.63) = .2643
Sampling Distribution of $\bar{x}$
• Example: St. Andrew’s College
[Figure: Sampling Distribution of $\bar{x}$ for SAT Scores, $\sigma_{\bar{x}} = 15.96$, centered at 1697; area to the left of 1687 = .2643]
Sampling Distribution of $\bar{x}$
• Example: St. Andrew’s College
Step 5: Calculate the area under the curve between the lower and upper endpoints of the interval.
P(-.63 < z < .63) = P(z < .63) - P(z < -.63)
= .7357 - .2643
= .4714
The probability that the sample mean SAT score will be between 1687 and 1707 is:

P(1687 < $\bar{x}$ < 1707) = .4714
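The five steps can be checked with a short script. As a sketch, it computes the probability twice: once with z rounded to two decimals, mimicking a table look-up, and once with the unrounded z. The normal CDF comes from the standard library's error function, and the variable names are illustrative assumptions.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative probability P(Z <= z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

mu, sigma, n = 1697.0, 87.4, 30          # St. Andrew's values from the slides
lower, upper = 1687.0, 1707.0

se = sigma / math.sqrt(n)                # standard error of the mean

# Table-style calculation: z rounded to two decimals, as when reading a z-table.
z_table = round((upper - mu) / se, 2)
prob_table_style = normal_cdf(z_table) - normal_cdf(-z_table)

# Same calculation without rounding z.
z_exact = (upper - mu) / se
prob_exact = normal_cdf(z_exact) - normal_cdf((lower - mu) / se)

print(f"standard error = {se:.2f}")
print(f"z (rounded) = {z_table:.2f}, probability = {prob_table_style:.4f}")
print(f"z (exact)   = {z_exact:.4f}, probability = {prob_exact:.4f}")
```

The small gap between the two probabilities comes only from rounding z to .63 before the look-up.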


Sampling Distribution of $\bar{x}$
• Example: St. Andrew’s College
[Figure: Sampling Distribution of $\bar{x}$ for SAT Scores, $\sigma_{\bar{x}} = 15.96$, centered at 1697; area between 1687 and 1707 = .4714]
Relationship Between the Sample Size
and the Sampling Distribution of $\bar{x}$
• Example: St. Andrew’s College
• Suppose we select a simple random sample of 100 applicants instead of the 30 originally considered, from a total of 900 applicants.
• $E(\bar{x}) = \mu$ regardless of the sample size. In our example, $E(\bar{x})$ remains at 1697.
• Whenever the sample size is increased, the standard error of the mean $\sigma_{\bar{x}}$ is decreased. With the increase in the sample size to n = 100, the standard error of the mean is decreased from 15.96 to:
$\sigma_{\bar{x}} = \sqrt{\dfrac{N - n}{N - 1}} \left(\dfrac{\sigma}{\sqrt{n}}\right) = \sqrt{\dfrac{900 - 100}{900 - 1}} \left(\dfrac{87.4}{\sqrt{100}}\right) = .94333(8.74) = 8.2$
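Relationship Between the Sample Size
and the Sampling Distribution of $\bar{x}$
• Example: contd… *applying concept of sampling from finite population
[Figure: two sampling distributions of $\bar{x}$ centered at $E(\bar{x}) = 1697$: a narrower one with n* = 100, $\sigma_{\bar{x}} = 8.2$, and a wider one with n = 30, $\sigma_{\bar{x}} = 15.96$]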
Relationship Between the Sample Size
and the Sampling Distribution of $\bar{x}$
• Example: St. Andrew’s College
• Recall that when n = 30, P(1687 < $\bar{x}$ < 1707) = .4714.
• We follow the same steps to solve for P(1687 < $\bar{x}$ < 1707) when n = 100 as we showed earlier when n = 30.
• Now, with n = 100, P(1687 < $\bar{x}$ < 1707) = .7776.
• Because the sampling distribution with n = 100 has a smaller standard error, the values of $\bar{x}$ have less variability and tend to be closer to the population mean than the values of $\bar{x}$ with n = 30.
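The n = 100 figure can be reproduced as follows. This sketch applies the finite population correction with N = 900, then follows the slides' rounding (standard error to one decimal, z to two decimals) and, for comparison, repeats the calculation without rounding. The normal CDF is computed from the standard library's error function, and all names are illustrative assumptions.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative probability P(Z <= z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

mu, sigma = 1697.0, 87.4
N, n = 900, 100                  # population and sample size; n/N > 0.05
half_width = 10.0                # interval is mu - 10 to mu + 10

fpc = math.sqrt((N - n) / (N - 1))          # finite population correction
se = fpc * sigma / math.sqrt(n)             # corrected standard error

# Slide-style rounding: se to one decimal, z to two decimals.
z_rounded = round(half_width / round(se, 1), 2)
prob_rounded = normal_cdf(z_rounded) - normal_cdf(-z_rounded)

# Unrounded calculation.
z_exact = half_width / se
prob_exact = normal_cdf(z_exact) - normal_cdf(-z_exact)

print(f"fpc = {fpc:.5f}, standard error = {se:.4f}")
print(f"rounded:   z = {z_rounded:.2f}, probability = {prob_rounded:.4f}")
print(f"unrounded: z = {z_exact:.4f}, probability = {prob_exact:.4f}")
```

With the rounded inputs the result agrees with the .7776 on the slide to about four decimals; the unrounded value is slightly lower.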
Relationship Between the Sample Size
and the Sampling Distribution of $\bar{x}$
• Example: St. Andrew’s College
[Figure: Sampling Distribution of $\bar{x}$ for SAT Scores, $\sigma_{\bar{x}} = 8.2$, centered at 1697; area between 1687 and 1707 = .7776]
