
Topic 7

Sampling and Sampling Distributions


Chapter topics
• Concept of sampling
• Probability and nonprobability sampling methods
• Concept of sampling distributions
• Sampling distribution of the mean
  ◦ For normal populations
  ◦ Using the Central Limit Theorem
• Sampling distribution of a proportion
• Probabilities using sampling distributions
Why Sample?

• Selecting a sample is less time-consuming than selecting every item in the population (census).
• Selecting a sample is less costly than selecting every item in the population.
• An analysis of a sample is less cumbersome and more practical than an analysis of the entire population.
Selection of Class Representatives

[Figure: two samples drawn from a population of male and female students. Unbiased sample: an unbiased, representative sample drawn at random from the entire population. Biased sample: a biased, unrepresentative sample consisting of more female students than male students.]
Sampling Process begins with a Sampling Frame

• The sampling frame is a listing of items that make up the population


• Frames are data sources such as population lists, directories, or maps
• Inaccurate or biased results can occur if a frame excludes certain
portions of the population
• Using different frames to generate data can lead to dissimilar
conclusions
Types of Sampling

Sampling
• Non-probability sampling: convenience, judgment, quota, snowball
• Probability sampling: simple random, stratified, systematic, cluster
Types of Sampling: Non-probability Sampling
In non-probability sampling, items are chosen without regard to their probability of selection.
• In convenience sampling, items are selected only because they are easy, inexpensive, or convenient to sample.
• In judgment sampling, one gets the opinions of pre-selected individuals or experts in the subject matter.
• In quota sampling, individuals or items are selected on the basis of specific traits or qualities. A fixed number of units is selected for each trait, so that all traits are represented.
• In snowball sampling, research units are selected with the help of other research units. It is used where potential participants are difficult to identify, for example life insurance customers, network marketing participants, or respondents to a survey on ‘social evils’.
Types of Sampling: Probability Sampling

In probability sampling, items in the sample are chosen on the basis of known probabilities.

Probability sampling methods: simple random, stratified, systematic, cluster

Probability Sampling: Simple Random Sampling

• Every individual or item from the frame has an equal chance of being selected.
• Selection may be with replacement (the selected individual is returned to the frame for possible reselection) or without replacement (the selected individual isn’t returned to the frame).
• Samples are obtained using the lottery method, random number tables, or computer random number generators.
Selecting a Simple Random Sample using ‘Random Number Table’

Sampling frame for a population with 850 items:

Item Name    Item #
Bev R.       001
Ulan X.      002
Roger F.     003
. .          . .
Peter S.     848
Joann P.     849
Paul F.      850

Portion of a random number table:

49280 88924 35779 00283 81163 07275
11100 02340 12860 74697 96644 89439
09893 23997 20048 49420 88872 08401

Reading the first 12 entries of the table: an entry is used only if its first 3 digits fall between 001 and 850.

Entry    First 3 digits    Decision
49280    492               select
88924    889               ignore
35779    357               select
00283    002               select
81163    811               select
07275    072               select
11100    111               select
02340    023               select
12860    128               select
74697    746               select
96644    966               ignore
89439    894               ignore
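The selection rule in the random number table example can be sketched in a few lines of Python. This is only an illustration: the function name `select_items` and the choice to skip repeated item numbers (that is, sampling without replacement) are assumptions, not part of the slide.

```python
# First 12 entries of the random number table, read row by row.
TABLE_ENTRIES = [
    "49280", "88924", "35779", "00283", "81163", "07275",
    "11100", "02340", "12860", "74697", "96644", "89439",
]
FRAME_SIZE = 850  # items in the frame are numbered 001 to 850


def select_items(entries, frame_size, digits=3):
    """Return item numbers picked from the first `digits` digits of each entry."""
    selected = []
    for entry in entries:
        item = int(entry[:digits])
        # Ignore numbers outside the frame. Skipping repeats is an assumption
        # (sampling without replacement).
        if 1 <= item <= frame_size and item not in selected:
            selected.append(item)
    return selected


sample = select_items(TABLE_ENTRIES, FRAME_SIZE)
print(sample)  # [492, 357, 2, 811, 72, 111, 23, 128, 746]
```

Nine of the twelve entries give usable item numbers, so more entries would have to be read to reach a larger sample size.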
Probability Sampling: Stratified Random Sampling

• Divide the population into two or more subgroups (called strata) according to some common characteristic
• Select a simple random sample from each subgroup, with sample sizes proportional to strata sizes
• Combine the samples from the subgroups into one
• This is a common technique when sampling a population of voters, stratifying across racial or socio-economic lines.

[Figure: population divided into 4 strata]
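A minimal sketch of proportional stratified sampling, assuming a made-up population of 1,000 items in four strata. The strata names, sizes, and the helper `stratified_sample` are hypothetical and chosen only so the arithmetic is easy to follow.

```python
import random


def stratified_sample(strata, n, seed=0):
    """Draw a proportionally allocated simple random sample from each stratum."""
    rng = random.Random(seed)  # seeded only so the example is reproducible
    total = sum(len(items) for items in strata.values())
    sample = {}
    for name, items in strata.items():
        # Proportional allocation: each stratum's share of the sample equals
        # its share of the population (rounding may shift the total slightly).
        n_h = round(n * len(items) / total)
        sample[name] = rng.sample(items, n_h)  # drawn without replacement
    return sample


# Hypothetical population of 1,000 items split into 4 strata.
strata = {
    "A": list(range(0, 400)),
    "B": list(range(400, 700)),
    "C": list(range(700, 900)),
    "D": list(range(900, 1000)),
}
sample = stratified_sample(strata, n=50)
sizes = {name: len(items) for name, items in sample.items()}
print(sizes)  # {'A': 20, 'B': 15, 'C': 10, 'D': 5}
```

The per-stratum samples can then be concatenated into one combined sample, as the last step on the slide describes.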

Probability Sampling: Systematic Sampling

• Decide on sample size: n
• Divide the frame of N individuals into groups of k individuals: k = N/n (k is called the sampling interval)
• Randomly select one individual from the 1st group
• Select every kth individual thereafter

Example: N = 40 and n = 4, so k = 10.
[Figure: frame of 40 items with the first group of 10 marked]
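The systematic sampling steps can be sketched as follows, using the slide's values N = 40 and n = 4. The function name and the seed are assumptions for illustration, and the sketch assumes N is an exact multiple of n.

```python
import random


def systematic_sample(frame, n, seed=None):
    """Pick one random start in the first group, then every k-th item."""
    k = len(frame) // n           # sampling interval k = N / n
    rng = random.Random(seed)
    start = rng.randrange(k)      # random position within the first group
    return [frame[start + i * k] for i in range(n)]


frame = list(range(1, 41))        # N = 40 items numbered 1 to 40
sample = systematic_sample(frame, n=4, seed=7)
print(sample)                     # four items spaced k = 10 apart
```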
Probability Sampling: Cluster Sampling

• Population is divided into several “clusters,” each representative of the population
• A simple random sample of clusters is selected
• All items in the selected clusters can be used, or items can be chosen from a cluster using another probability sampling technique
• A common application of cluster sampling involves election exit polls, where certain election districts are selected and sampled.

[Figure: population divided into 16 clusters, with randomly selected clusters forming the sample]
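A small sketch of one-stage cluster sampling, assuming 16 clusters as in the figure. The cluster contents, the number of clusters drawn (4), and the helper name `cluster_sample` are hypothetical.

```python
import random


def cluster_sample(clusters, m, seed=0):
    """Select m clusters at random and keep every item in the selected clusters."""
    rng = random.Random(seed)  # seeded only so the example is reproducible
    chosen = rng.sample(sorted(clusters), m)  # simple random sample of cluster ids
    items = [item for c in chosen for item in clusters[c]]
    return chosen, items


# Hypothetical population: 16 clusters of 5 items each.
clusters = {c: [f"cluster{c}-item{i}" for i in range(1, 6)] for c in range(1, 17)}
chosen, items = cluster_sample(clusters, m=4)
print(chosen)      # ids of the 4 selected clusters
print(len(items))  # 20
```

To sample within the selected clusters instead of using all of their items, the list comprehension could be replaced by another probability sampling step applied to each chosen cluster.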
Probability Sample: Comparing Sampling Methods

• Simple random sample and systematic sample
  ◦ Simple to use
  ◦ May not be a good representation of the population’s underlying characteristics
• Stratified sample
  ◦ Ensures representation of individuals across the entire population
• Cluster sample
  ◦ More cost effective
  ◦ Less efficient (need larger sample to acquire the same level of precision)
Sampling Distribution

• A sampling distribution is the distribution of all possible values of a sample statistic (mean, standard deviation, proportion, etc.) for a given sample size selected from a population.
• For example, suppose you sample 50 students from your college and compute their mean GPA. If you draw different samples of size 50, you will generally compute a different mean for each sample. We are interested in the distribution of all potential mean GPAs (x̄) we might calculate over all samples of 50 students.
Sampling Distribution
• If we consider the process of selecting a simple random sample as an experiment, the sample mean x̄ is the numerical description of the outcome of the experiment. Thus, the sample mean x̄ is a random variable.
• As a result, just like other random variables, x̄ has a mean or expected value, a standard deviation, and a probability distribution.
• Because the various possible values of x̄ are the result of different simple random samples, the probability distribution of x̄ is called the sampling distribution of x̄.
• Knowledge of this sampling distribution and its properties will enable us to make probability statements about how close the sample mean x̄ is to the population mean μ.
Sampling Distribution of x̄:
If the Population is Normal

If a population is normal with mean μ and standard deviation σ, the sampling distribution of x̄ is also normally distributed with

$$\mu_{\bar{x}} = \mu \quad\text{and}\quad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$

• Different samples of the same size from the same population will yield different sample means.
• A measure of the variability in the mean from sample to sample is given by the Standard Error of the Mean (the standard deviation of the sample means).
• Note that the standard error of the mean decreases as the sample size increases.
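The two properties (mean of the sample means equals μ, standard error equals σ/√n) can be checked by brute force on a tiny example. The population `[2, 4, 6, 8]` and sample size 2 are made up for illustration, and the enumeration assumes sampling with replacement so that every ordered sample is equally likely.

```python
from itertools import product
from math import isclose, sqrt
from statistics import fmean, pstdev

population = [2, 4, 6, 8]  # hypothetical population
n = 2                      # sample size

mu = fmean(population)     # population mean
sigma = pstdev(population) # population standard deviation

# Every possible ordered sample of size n, drawn with replacement.
sample_means = [fmean(s) for s in product(population, repeat=n)]

mean_of_means = fmean(sample_means)
standard_error = pstdev(sample_means)

print(mean_of_means, mu)                                     # 5.0 5.0
print(round(standard_error, 4), round(sigma / sqrt(n), 4))   # 1.5811 1.5811
```

This does not illustrate the normal shape of the sampling distribution (the population here is not normal); it only confirms the formulas for its mean and standard deviation.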
Central Limit Theorem
As the sample size gets large enough (n↑), the sampling distribution becomes almost normal regardless of the shape of the population.
Sampling Distribution of x̄:
If the Population is not Normal
• We can apply the Central Limit Theorem:
  ◦ Even if the population is not normal,
  ◦ sample means from the population will be approximately normal as long as the sample size is large enough.

Properties of the sampling distribution:

$$\mu_{\bar{x}} = \mu \quad\text{and}\quad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$
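Sample Mean Sampling Distribution:
If the Population is not Normal
(continued)

Sampling distribution properties:
• Central tendency: $\mu_{\bar{x}} = \mu$
• Variation: $\sigma_{\bar{x}} = \sigma / \sqrt{n}$

[Figure: a non-normal population distribution with mean μ, and the sampling distribution of x̄ centred at μ, which becomes normal as n increases. The curve is narrower for a larger sample size and wider for a smaller sample size.]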
How Large is Large Enough?
• For most distributions, n > 30 will give a sampling
distribution that is nearly normal
• For fairly symmetric distributions, n > 15
• For normal population distributions, the sampling
distribution of the mean is always normally distributed
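A quick simulation can make the "large enough" rule of thumb concrete. The sketch below assumes a strongly skewed population (exponential with mean 1 and standard deviation 1), a sample size of 36, and 5,000 repetitions. All of these choices, and the seed, are illustrative assumptions.

```python
import random
from math import sqrt
from statistics import fmean, pstdev


def simulate_sample_means(n, reps=5000, seed=42):
    """Return `reps` sample means, each from n draws of an exponential(1) population."""
    rng = random.Random(seed)  # seeded only so the example is reproducible
    return [fmean([rng.expovariate(1.0) for _ in range(n)]) for _ in range(reps)]


n = 36
means = simulate_sample_means(n)

se = 1.0 / sqrt(n)       # theoretical standard error sigma / sqrt(n)
center = fmean(means)    # should be close to the population mean, 1
spread = pstdev(means)   # should be close to se
# Share of sample means within 1.96 standard errors of the population mean;
# for a normal sampling distribution this would be about 0.95.
inside = sum(abs(m - 1.0) <= 1.96 * se for m in means) / len(means)

print(round(center, 3), round(spread, 3), round(se, 3), round(inside, 3))
```

Rerunning with a much smaller n (for example 3) should show the share within 1.96 standard errors drifting away from 0.95, because the sampling distribution is still noticeably skewed.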
Z-value for Sampling Distribution of Mean

Z-value for the sampling distribution of x̄:

$$Z = \frac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}} = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$

where: x̄ = sample mean
μ = population mean
σ = population standard deviation
n = sample size
Example
• Suppose a population has mean μ = 8 and standard
deviation σ = 3. Suppose a random sample of size n = 36 is
selected.
• What is the probability that the sample mean is between
7.8 and 8.2?
Example
Solution:
• Even if the population is not normally distributed, the central limit theorem can be used (n > 30)
• The sampling distribution of x̄ is approximately normal with mean $\mu_{\bar{x}} = 8$ and standard deviation

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{3}{\sqrt{36}} = 0.5$$
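Solution (continued):

Standardize the sample mean and use the standard normal distribution:

$$P(7.8 < \bar{x} < 8.2) = P\left(\frac{7.8 - 8}{0.5} < Z < \frac{8.2 - 8}{0.5}\right) = P(-0.4 < Z < 0.4) = 0.6554 - 0.3446 = 0.3108$$

[Figure: population distribution (μ = 8), sampling distribution of x̄ (mean 8, interval 7.8 to 8.2), and standard normal distribution (mean 0, interval −0.4 to 0.4)]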
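The same probability can be checked numerically instead of with a Z table. This sketch uses `statistics.NormalDist` from the Python standard library and the example's values μ = 8, σ = 3, n = 36. The variable names are assumptions.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 8, 3, 36
se = sigma / sqrt(n)                   # standard error of the mean = 0.5

sampling_dist = NormalDist(mu=mu, sigma=se)
prob = sampling_dist.cdf(8.2) - sampling_dist.cdf(7.8)

print(se)              # 0.5
print(round(prob, 4))  # 0.3108
```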
Practice Exercises
1. The mean expenditure of all visitors to a restaurant is Rs. 2000, with a standard deviation of Rs. 250. A random sample of 40 customers is taken. Find the probability that
(a) the mean expenditure of the sampled customers is more than Rs. 1928, (b) the mean expenditure of the sampled customers is between Rs. 1950 and Rs. 2030.

(a)
$$Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \frac{1928 - 2000}{250 / \sqrt{40}} = -1.82$$

$$P(\bar{x} > 1928) = P(Z > -1.82) = 1 - P(Z \le -1.82) = 1 - 0.0344 = 0.9656$$

(b)
$$P(1950 < \bar{x} < 2030) = P(-1.26 < Z < 0.76)$$
2. The numerical population of grade point averages at a college has mean 2.61 and
standard deviation 0.5. If a random sample of size 100 is taken from the population,
what is the probability that the sample mean will be between 2.51 and 2.71?
3. A prototype automotive tire has a mean design life of 38,500 miles with a standard
deviation of 2,500 miles. Five such tires are manufactured and tested. Find the
probability that the sample mean will be less than 36,000 miles. Assume that the
distribution of lifetimes of such tires is normal.
4. An automobile battery manufacturer claims that its midgrade battery has a mean life
of 50 months with a standard deviation of 6 months. Suppose the distribution of battery
lives of this particular brand is approximately normal.
(a) On the assumption that the manufacturer’s claims are true, find the probability that
a randomly selected battery of this type will last less than 48 months. (Normal
distribution problem)
(b) On the same assumption, find the probability that the mean life of a random sample
of 36 such batteries will be less than 48 months. (Sampling distribution problem)
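The worked steps for Exercise 1 can be reproduced in code, which also gives a template for the remaining exercises. This is a sketch using `statistics.NormalDist`. Because it does not round Z to two decimals, the results differ slightly from the table-based values.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 2000, 250, 40
se = sigma / sqrt(n)                      # standard error of the mean
z = NormalDist()                          # standard normal distribution

# (a) P(sample mean > 1928)
z_a = (1928 - mu) / se
p_a = 1 - z.cdf(z_a)

# (b) P(1950 < sample mean < 2030)
z_low = (1950 - mu) / se
z_high = (2030 - mu) / se
p_b = z.cdf(z_high) - z.cdf(z_low)

print(round(z_a, 2))                      # -1.82
print(round(z_low, 2), round(z_high, 2))  # -1.26 0.76
print(round(p_a, 4), round(p_b, 4))
```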
Sampling Distribution for Population Proportion
• Let p = the proportion of the population having some characteristic
• The sample proportion (p̄) provides an estimate of the population proportion (p):

$$\bar{p} = \frac{x}{n}$$

• ‘x’ is the number of elements in the sample that possess the characteristic of interest and ‘n’ is the sample size.
• 0 ≤ p̄ ≤ 1
• p̄ is approximately normally distributed when n is large (assuming sampling with replacement from a finite population or without replacement from an infinite population)
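Sampling Distribution of p̄

• Approximated by a normal distribution if:

$$np \ge 5 \quad\text{and}\quad n(1 - p) \ge 5$$

where

$$\mu_{\bar{p}} = p \quad\text{and}\quad \sigma_{\bar{p}} = \sqrt{\frac{p(1 - p)}{n}}$$

(where p = population proportion)

[Figure: sampling distribution P(p̄) plotted against p̄ from 0 to 1]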


Z-Value for Proportions
Standardize p̄ to a Z-value with the formula:

$$Z = \frac{\bar{p} - \mu_{\bar{p}}}{\sigma_{\bar{p}}} = \frac{\bar{p} - p}{\sqrt{\dfrac{p(1 - p)}{n}}}$$
Example
• If the true proportion of voters who support Proposition A is
0.4, what is the probability that a sample of size 200 yields a
sample proportion between 0.40 and 0.45?
• i.e. if p = 0.4 and n = 200, what is P(0.40 ≤ p̄ ≤ 0.45)?
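Example
(continued)

if p = 0.4 and n = 200, what is P(0.40 ≤ p̄ ≤ 0.45)?

$$\sigma_{\bar{p}} = \sqrt{\frac{p(1 - p)}{n}} = \sqrt{\frac{0.4(1 - 0.4)}{200}} = 0.03464$$

Convert to standardized normal:

$$P(0.40 \le \bar{p} \le 0.45) = P\left(\frac{0.40 - 0.40}{0.03464} \le Z \le \frac{0.45 - 0.40}{0.03464}\right) = P(0 \le Z \le 1.44)$$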
Example
(continued)
if p = 0.4 and n = 200, what is P(0.40 ≤ p̄ ≤ 0.45)?

Use standardized normal table: P(0 ≤ Z ≤ 1.44) = 0.9251 − 0.5 = 0.4251

[Figure: sampling distribution of p̄ with the interval 0.40 to 0.45, standardized to the standard normal distribution with the interval 0 to 1.44]
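The Proposition A calculation can also be done without a table. This sketch uses `statistics.NormalDist` with the example's p = 0.4 and n = 200. Since Z is not rounded to 1.44 here, the probability comes out slightly different from the table-based 0.4251.

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.4, 200

# Rule of thumb for using the normal approximation.
approximation_ok = n * p >= 5 and n * (1 - p) >= 5

sigma_p = sqrt(p * (1 - p) / n)            # standard error of the proportion
dist = NormalDist(mu=p, sigma=sigma_p)     # approximate sampling distribution
prob = dist.cdf(0.45) - dist.cdf(0.40)
z_upper = (0.45 - p) / sigma_p

print(approximation_ok)      # True
print(round(sigma_p, 5))     # 0.03464
print(round(z_upper, 2))     # 1.44
print(round(prob, 4))
```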
Practice Exercise

1. The Grocery Manufacturers of America reported that 76% of consumers read the ingredients listed on a product’s label. Assume the population proportion is p = .76 and a sample of 400 consumers is selected from the population.
(a) Show the sampling distribution of the sample proportion p̄, where p̄ is the proportion of the sampled consumers who read the ingredients listed on a product’s label.
(b) What is the probability that the sample proportion will be within ±.03 of the population proportion?
(c) Answer part (b) for a sample of 750 consumers.

2. The Food Marketing Institute shows that 17% of households spend more than $100 per week on groceries. Assume the population proportion is p = .17 and a sample of 800 households will be selected from the population.
(a) Show the sampling distribution of p̄, the sample proportion of households spending more than $100 per week on groceries.
(b) What is the probability that the sample proportion will be within ±.02 of the population proportion?
(c) Answer part (b) for a sample of 1600 households.
Point Estimation
• Point estimation is the process of using the sample data available to estimate the unknown
value of a parameter. The point estimate obtained from the data will be a single number like
sample mean, sample standard deviation, sample proportion etc.
• Suppose we have an unknown population parameter, such as a population mean μ or a
population proportion p, which we'd like to estimate. For example, suppose we are interested
in estimating:
p = the (unknown) proportion of American college students, 18-24, who have a smart
phone
μ = the (unknown) mean number of days it takes patients to respond to a drug
In either case, we can't possibly survey the entire population. That is, we can neither survey all American college students between the ages of 18 and 24 nor survey all patients with a specific disease. So we take a random sample from the population and use the resulting data to estimate the value of the population parameter. Of course, we want the estimate to be "good" in some way.
The following table shows a sample of 30 managers of a company out of the total
2500 managers.
• The mean annual salary (x̄ = $51,814) is a point estimate of the population mean salary (μ = $51,800).
• Similarly, the sample standard deviation (s = $3348) is a point estimate of the population standard deviation (σ = $4000).
• The proportion of managers who have completed training (p̄ = 0.63) is a point estimate of the population proportion (p = 0.60).
Properties of a Point Estimator
Unbiasedness: If the expected value of the sample statistic is equal to the population
parameter being estimated, the sample statistic is said to be an unbiased estimator of
the population parameter.
In discussing the sampling distributions of the sample mean and the sample proportion, we stated that E(x̄) = μ and E(p̄) = p. Thus, both x̄ and p̄ are unbiased estimators of their corresponding population parameters μ and p. In the case of the sample standard deviation s and the sample variance s², it can be shown that E(s²) = σ², so s² is an unbiased estimator of σ².
Efficiency: The most efficient point estimator is the one with the smallest variance among all unbiased estimators. The variance measures how widely the estimates are dispersed, so the estimator with the smallest variance varies the least from one sample to another.
Consistency: A third property associated with good point estimators is consistency. A
point estimator is consistent if the values of the point estimator tend to become closer
to the population parameter as the sample size becomes larger. In other words, a large
sample size tends to provide a better point estimate than a small sample size.
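The unbiasedness claim E(s²) = σ² can be verified exactly on a small example by enumerating every possible sample. The population `[2, 4, 6, 8]`, the sample size 2, and the helper functions are assumptions for illustration. Samples are drawn with replacement so that all ordered samples are equally likely.

```python
from fractions import Fraction
from itertools import product

population = [2, 4, 6, 8]  # hypothetical population
n = 2                      # sample size


def mean(values):
    """Exact arithmetic mean as a Fraction."""
    return Fraction(sum(values), len(values))


def variance(values, divisor):
    """Sum of squared deviations from the mean, divided by `divisor`."""
    m = mean(values)
    return sum((Fraction(v) - m) ** 2 for v in values) / divisor


sigma2 = variance(population, len(population))   # population variance

samples = list(product(population, repeat=n))    # all 16 ordered samples
expected_s2 = mean([variance(s, n - 1) for s in samples])   # divisor n - 1
expected_biased = mean([variance(s, n) for s in samples])   # divisor n

print(sigma2, expected_s2, expected_biased)  # 5 5 5/2
```

Averaged over all samples, the variance with divisor n − 1 equals σ² exactly, while the version with divisor n underestimates it. That is the sense in which s² is unbiased.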
