Chapter 5

Chapter Five

Sampling and Sampling Distributions


The Concept of Sampling Distribution

• Samples can be taken from a population using probability or
non-probability sampling techniques.
• If we take several samples from a population using one of
these techniques, the statistics computed for each sample need
not be the same and will most likely vary from sample to
sample, because the selected samples are unlikely to contain
the same elements.
• Illustration: Suppose that a population has five elements
(N = 5): 3, 6, 9, 12 and 15.
• If we draw samples of size 3 (n = 3) without replacement,
there are NCn = 5C3 = 10 different possible samples.
• The elements of each possible sample are listed below.
• Possible samples:
(3, 6, 9)    (3, 6, 12)   (3, 6, 15)   (3, 9, 12)   (3, 9, 15)
(3, 12, 15)  (6, 9, 12)   (6, 9, 15)   (6, 12, 15)  (9, 12, 15)
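• For each sample we can compute the mean, proportion,
variance, and standard deviation (i.e., the sample statistics).
• The following table reveals the mean value for each sample.

Sample        Mean
3, 6, 9        6
3, 6, 12       7
3, 6, 15       8
3, 9, 12       8
3, 9, 15       9
3, 12, 15     10
6, 9, 12       9
6, 9, 15      10
6, 12, 15     11
9, 12, 15     12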
• Sampling distribution: a probability distribution of all the
possible values of a sample statistic. To understand how close
the sample values (statistics) are to the population parameter,
we need to understand the properties of the sampling
distribution of the sample statistics.
• Here, we will discuss the sampling distributions of the mean,
proportion and variance.
Sampling Distribution of the Mean

• It is the probability distribution of the means of all possible
samples of size n that could be taken from a population of size N.
• The computed mean value for each sample, together with its
associated probability, is referred to as the sampling
distribution of the mean.
• For the above illustration, the sampling distribution of the mean
is:

Sample mean (X̄)    6     7     8     9     10    11    12
Probability        0.1   0.1   0.2   0.2   0.2   0.1   0.1
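The distribution for the five-element illustration can be reproduced by enumeration. The sketch below is a minimal example, assuming every sample of size 3 drawn without replacement is equally likely; the variable names are illustrative only.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

population = [3, 6, 9, 12, 15]   # N = 5, from the illustration
n = 3                            # sample size

# Every possible sample of size n drawn without replacement.
samples = list(combinations(population, n))

# Mean of each sample, kept exact with Fraction.
means = [Fraction(sum(s), n) for s in samples]

# Assumption: each sample is equally likely, so
# P(mean) = (number of samples with that mean) / (number of samples).
counts = Counter(means)
distribution = {m: Fraction(c, len(samples)) for m, c in sorted(counts.items())}

for m, p in distribution.items():
    print(f"mean = {float(m):>4.1f}   probability = {float(p):.1f}")
```

The ten samples give seven distinct means, from 6 to 12, and the probabilities sum to 1.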
• The concept of a sampling distribution lets us make
probability statements about the error involved when the
sample mean (X̄) is used to estimate the population
mean (µ).
• That is, the practical value of the sampling distribution of the
mean is that it provides probability information about
the “sampling error”.
Properties of the sampling distribution of the mean

1. The expected value of the sample mean, E(X̄) (the mean of the sample means), is equal to
the population mean. Algebraically, E(X̄) = µ.
2. Given the population mean (µ), the population standard deviation (σ), the sample size (n) and
the population size (N), the standard deviation of the sample mean ($\sigma_{\bar{X}}$) is given as follows:

$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} \quad \text{for an infinite population}$$

$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} \cdot \sqrt{\frac{N-n}{N-1}} \quad \text{for a finite population}$$
• A population is said to be infinite when it is not possible to list
or count all of its elements (i.e., when the elements are
unlimited).
• When the elements in the population are limited, the
population may still be treated as infinite if the sample is
small relative to it. As a rule of thumb, statisticians treat
the population as infinite when n ≤ 5% of N.
• A population is treated as finite when n > 0.05 N.
• The value $\sqrt{(N-n)/(N-1)}$ is referred to as the finite
population correction factor.
3. The sampling distribution of the mean is normal when the population
is normal and, by the central limit theorem, approximately normal for
large samples regardless of the population from which they are drawn.
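Properties 1 and 2 can be checked numerically on the five-element population 3, 6, 9, 12, 15 with n = 3. This is a minimal sketch, assuming sampling without replacement. Because n/N = 0.6 exceeds 0.05, the finite-population formula is the one that applies.

```python
import math
from itertools import combinations
from statistics import mean, pstdev

population = [3, 6, 9, 12, 15]
N, n = len(population), 3

mu = mean(population)        # population mean
sigma = pstdev(population)   # population standard deviation (divisor N)

# Means of all possible samples of size n drawn without replacement.
sample_means = [mean(s) for s in combinations(population, n)]

expected_mean = mean(sample_means)   # property 1: should equal mu
sd_of_means = pstdev(sample_means)   # standard deviation of the sample means

# Property 2 with the finite population correction factor.
fpc = math.sqrt((N - n) / (N - 1))
formula_sd = sigma / math.sqrt(n) * fpc

print(expected_mean, mu)
print(round(sd_of_means, 4), round(formula_sd, 4))
```

Both lines print matching pairs: the mean of the sample means equals µ, and the directly computed standard deviation agrees with the corrected formula.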
The shape of the sampling distribution of the mean
i. Sampling Distribution of the mean from normal population
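• Whenever the population has a normal probability distribution, the
sampling distribution of the mean (X̄) is a normal probability
distribution for any sample size.
• Its mean and standard deviation are:

$$E(\bar{X}) = \mu_{\bar{X}} = \mu \quad \text{and} \quad \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}, \quad \text{for } \frac{n}{N} \le 0.05$$

$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} \cdot \sqrt{\frac{N-n}{N-1}}, \quad \text{for } \frac{n}{N} > 0.05$$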

ii. Sampling distribution of the mean from an unknown or non-normal
population
• We are not always confronted with a normal population; we may
also face non-normal or unknown population distributions. How
does the sampling distribution of the mean behave in such situations?
• Illustration: Suppose that the data in the table below give
the FDI of East Asian countries.

Table 5.1: Foreign Direct Investment (FDI) of East Asian
countries, in millions of USD

Country   Philippines   Indonesia   Taiwan   S. Korea   Singapore
FDI       3             3           7        9          14

Population mean FDI in millions of USD (µ) = 36/5 = 7.2


• Judging from the data, the population may not be normal:
it contains only 5 elements, which is too few to be
approximated by a normal distribution.
• Let us draw all samples of size 3, compute and list the sample
means (X̄), and calculate the mean of the sampling
distribution, µ(X̄).
• This is done in the table below.
Table 5.2: Possible sample means of the countries’ FDI in millions of USD

• From the table we see that even when the population is not
normal, the mean of the sampling distribution, µ(X̄), is still
equal to the population mean.
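The sample means for the FDI illustration can be listed by enumeration. The sketch below is a minimal example, assuming the five values of Table 5.1 and samples of three distinct countries; the dictionary and variable names are illustrative only.

```python
from fractions import Fraction
from itertools import combinations

# FDI in millions of USD, from Table 5.1.
fdi = {"Philippines": 3, "Indonesia": 3, "Taiwan": 7, "S. Korea": 9, "Singapore": 14}
n = 3

# One row per possible sample: the countries chosen and their exact mean FDI.
rows = []
for countries in combinations(fdi, n):
    values = [fdi[c] for c in countries]
    rows.append((countries, Fraction(sum(values), n)))

for countries, m in rows:
    print(", ".join(countries), "->", round(float(m), 2))

mean_of_means = sum(m for _, m in rows) / len(rows)
population_mean = Fraction(sum(fdi.values()), len(fdi))
print(float(mean_of_means), float(population_mean))
```

The last line prints the same value twice (7.2), which is the equality between µ(X̄) and µ that the text describes.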
Central Limit Theorem (CLT)

• It is one of the most important theorems in statistics.
• In selecting simple random samples of size n from a population of N
elements, the sampling distribution of the sample mean (X̄) can be
approximated by a normal probability distribution as the sample
size becomes large.
• The significance of the central limit theorem is that it permits us to
use sample statistics to make inferences about population
parameters without knowing anything about the shape of the
frequency distribution of that population other than what we can
get from the sample.
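The theorem can be illustrated by simulation. The sketch below is only an illustration under stated assumptions: the population is taken to be exponential with mean 1 and standard deviation 1 (a strongly skewed choice that does not come from the text), and the sample size, number of replications and seed are arbitrary.

```python
import math
import random
from statistics import mean, pstdev

rng = random.Random(0)   # fixed seed so the run is repeatable
n = 50                   # size of each sample
replications = 5000      # number of samples drawn

# Assumed population: exponential with rate 1 (mean 1, standard deviation 1).
sample_means = [
    sum(rng.expovariate(1.0) for _ in range(n)) / n
    for _ in range(replications)
]

center = mean(sample_means)     # should be close to the population mean, 1
spread = pstdev(sample_means)   # should be close to sigma / sqrt(n)

# For a normal distribution about 68% of values lie within one standard error.
se = 1.0 / math.sqrt(n)
within_one_se = sum(abs(m - 1.0) <= se for m in sample_means) / replications

print(round(center, 3), round(spread, 3), round(se, 3), round(within_one_se, 3))
```

Although individual observations are far from normal, the share of sample means within one standard error of µ comes out near the 68% expected under a normal curve.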
Examples

1. A population of 100 elements has a mean of 19.2 and a
standard deviation of 1. What are the mean and standard
deviation of the sampling distribution of the mean for
samples of size 25?
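A minimal sketch of the calculation for Example 1, assuming the 5% rule of thumb stated earlier. Here n/N = 25/100 = 0.25, so the population is treated as finite and the correction factor is applied.

```python
import math

N, n = 100, 25
mu, sigma = 19.2, 1.0

# The mean of the sampling distribution equals the population mean.
mean_of_sample_means = mu

# Apply the finite population correction only when n is more than 5% of N.
if n / N > 0.05:
    fpc = math.sqrt((N - n) / (N - 1))
else:
    fpc = 1.0

standard_error = sigma / math.sqrt(n) * fpc
print(mean_of_sample_means, round(standard_error, 4))  # 19.2 0.1741
```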
2. A library checks out an average of µ = 320 books per day,
with a standard deviation of σ = 75 books. Consider a sample
of 30 days of operation, with X̄ being the sample mean
number of books checked out per day. What is the probability
that the sample mean for 30 days will be between 300 and
340 books?
• Solution: The standard error is σ/√n = 75/√30 ≈ 13.69, so the
limits 300 and 340 correspond to Z = ±20/13.69 ≈ ±1.46.
• The standard normal probability table gives an area of
0.4279 between 0 and a Z value of -1.46, and the same area
of 0.4279 between 0 and a Z value of 1.46.
• By summing these two together, we get 0.8558 as the total
probability that the sample mean will be between 300 and
340 books per day.
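The same probability can be computed without a table. This is a minimal sketch using `statistics.NormalDist` from the Python standard library, assuming the number of operating days is large enough for the population to be treated as infinite. It works with the unrounded Z value, so the result may differ from a table lookup in the third decimal place.

```python
import math
from statistics import NormalDist

mu, sigma, n = 320, 75, 30

# Sampling distribution of the mean: normal with mean mu and
# standard deviation sigma / sqrt(n) (central limit theorem, n = 30).
sampling_dist = NormalDist(mu, sigma / math.sqrt(n))

prob = sampling_dist.cdf(340) - sampling_dist.cdf(300)
print(round(prob, 3))  # about 0.856
```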
3. The distribution of annual earnings of all economics
graduates with zero years of experience has a mean of 19,000
Birr and a standard deviation of 2,000 Birr. If we draw a
random sample of 30 fresh economics graduates, what is the
probability that their earnings will average more than 19,750
Birr annually?

• Solution: Given µ = 19,000, σ = 2,000, n = 30
• Required: P(X̄ > 19,750)
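One way to work the solution, assuming the population of graduates is large enough to be treated as infinite and that n = 30 is large enough for the normal approximation; the table area 0.4798 is the standard normal area between 0 and Z = 2.05:

```latex
\begin{aligned}
\sigma_{\bar{X}} &= \frac{\sigma}{\sqrt{n}} = \frac{2000}{\sqrt{30}} \approx 365.15 \\
Z &= \frac{19750 - 19000}{365.15} \approx 2.05 \\
P(\bar{X} > 19750) &= P(Z > 2.05) = 0.5 - 0.4798 = 0.0202
\end{aligned}
```

So there is roughly a 2% chance that the sample average exceeds 19,750 Birr.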
4. If the number of miles per gallon achieved by all cars of a
particular model has a mean of 25 and a standard deviation of 2,
what is the probability that, for a random sample of 20 such
cars, the average miles per gallon will be less than 24? Assume
that the population distribution is normal.
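A minimal sketch of the calculation for Example 4. Because the population is stated to be normal, the sampling distribution of the mean is normal even though n = 20 is small. The population of cars is assumed large enough that no finite population correction is needed.

```python
import math
from statistics import NormalDist

mu, sigma, n = 25, 2, 20

standard_error = sigma / math.sqrt(n)   # sigma / sqrt(n)
z = (24 - mu) / standard_error          # standardised value of 24 mpg

# Lower-tail probability P(sample mean < 24) = P(Z < z).
prob = NormalDist().cdf(z)
print(round(z, 2), round(prob, 4))
```

The Z value is about -2.24, and the probability is a little over 1%.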
Sampling Distribution of Proportion

• We saw in our earlier discussion that if n independent trials,
each with probability of success p, are carried out, then the
total number of successes, X, obeys a binomial distribution.
• A common problem arises when the parameter p is unknown.
For instance, we may want to determine the proportion of an
electorate intending to vote for a particular candidate for
office, or the proportion of a magazine’s readership likely to be
in the market for a specific product.
• In cases of this kind it is natural to base inference on the
proportion of successes in the sample.
• Definition: Let X be the number of successes in a binomial
sample of n observations, where the probability of success is p.
(In most applications, the parameter p is the proportion of
members of a large population possessing a characteristic of
interest.) Then the proportion of successes,

$$\bar{p} = \frac{X}{n}$$

is the sample proportion.
Properties of the sampling distribution of the proportion p̄

1. The expected value of the sample proportion, E(p̄), is equal to
the population proportion, p. Symbolically: E(p̄) = p
• Exercise: Consider a population of N = 5 given numbers: 3, 6,
9, 12, and 15. Take "even number" as the characteristic of
interest, so the population proportion of even numbers is
2/5 = 0.4. Considering all the possible samples of size 3
(n = 3), show that the expected value of the sample
proportions is equal to the population proportion.
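The exercise can be checked by enumeration. The sketch below is a minimal example, assuming sampling without replacement with all ten samples equally likely; `proportion_even` is an illustrative helper, not part of the original material.

```python
from fractions import Fraction
from itertools import combinations

population = [3, 6, 9, 12, 15]
n = 3

def proportion_even(values):
    """Exact proportion of even numbers in a collection."""
    return Fraction(sum(v % 2 == 0 for v in values), len(values))

p = proportion_even(population)   # population proportion

# Sample proportion for every possible sample of size n.
sample_props = [proportion_even(s) for s in combinations(population, n)]

# Expected value of the sample proportion (each sample equally likely).
expected = sum(sample_props) / len(sample_props)
print(p, expected)  # 2/5 2/5
```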
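2. Just as with the standard deviation of the sample mean ($\sigma_{\bar{X}}$),
the standard deviation of the sample proportion ($\sigma_{\bar{p}}$) also
depends on whether the population is finite or infinite. It follows
that the standard deviation of the sample proportion is:

$$\sigma_{\bar{p}} = \sqrt{\frac{p(1-p)}{n}} \quad \text{for an infinite population}$$

$$\sigma_{\bar{p}} = \sqrt{\frac{p(1-p)}{n}} \cdot \sqrt{\frac{N-n}{N-1}} \quad \text{for a finite population}$$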
3. Because X is binomial, the sample proportion p̄ = X/n has a discrete
sampling distribution. Applying the central limit theorem, the sampling
distribution of the sample proportion (p̄) can be approximated by a normal
probability distribution provided that these two conditions are
fulfilled:

• np ≥ 5 and n(1 – p) ≥ 5, where n is the sample size and
• p is the population proportion.
Examples

1. A population proportion is 0.4. A simple random sample of
size 200 will be taken and the sample proportion (p̄) will be
used to estimate the population proportion. What is the
probability that the sample proportion will be within ±0.03 of the
population proportion?
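A minimal sketch of the calculation for this example, assuming the population is large enough to be treated as infinite and using the normal approximation, which is reasonable here because np = 80 and n(1 – p) = 120 both exceed 5.

```python
import math
from statistics import NormalDist

p, n, margin = 0.4, 200, 0.03

# Conditions for the normal approximation.
approximation_ok = n * p >= 5 and n * (1 - p) >= 5

# Standard deviation of the sample proportion (infinite population).
se = math.sqrt(p * (1 - p) / n)

# P(p - margin <= sample proportion <= p + margin).
dist = NormalDist(p, se)
prob = dist.cdf(p + margin) - dist.cdf(p - margin)

print(approximation_ok, round(se, 4), round(prob, 4))
```

The standard deviation of the sample proportion is about 0.0346, and the probability of landing within ±0.03 of p is about 0.61.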
2. A corporation receives 100 applications for a position from
recent college graduates in Economics. Assuming that these
applicants can be regarded as a random sample of all such
graduates, what is the probability that between 25% and 35% of
them are women if 30% of all recent college graduates in
Economics are women?
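One way to work the solution, assuming the population of graduates is large enough to be treated as infinite. The normal approximation is reasonable here because np = 30 and n(1 – p) = 70 both exceed 5; the table area 0.3621 is the standard normal area between 0 and Z = 1.09:

```latex
\begin{aligned}
\sigma_{\bar{p}} &= \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{0.30 \times 0.70}{100}} \approx 0.0458 \\
Z &= \frac{0.35 - 0.30}{0.0458} \approx 1.09, \qquad
Z = \frac{0.25 - 0.30}{0.0458} \approx -1.09 \\
P(0.25 \le \bar{p} \le 0.35) &= P(-1.09 \le Z \le 1.09) = 2(0.3621) = 0.7242
\end{aligned}
```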
Sampling distribution of Variance

• The sample variance is different for different random samples
from the same population.
• That means the sample variance is a random variable having a
sampling distribution.
• Exercise: Verify this with the previous example, where N = 5 and n = 3.
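The exercise can be worked by enumeration. The sketch below is a minimal example, assuming the population 3, 6, 9, 12, 15 from the earlier illustration, sampling without replacement, and the sample variance computed with divisor n – 1.

```python
from collections import Counter
from itertools import combinations
from statistics import mean, variance

population = [3, 6, 9, 12, 15]
n = 3

# Sample variance (divisor n - 1) of every possible sample of size n.
sample_variances = [variance(s) for s in combinations(population, n)]

# Sampling distribution: each of the samples is equally likely.
counts = Counter(sample_variances)
for value, count in sorted(counts.items()):
    print(f"s^2 = {value:>2}   probability = {count / len(sample_variances):.1f}")

print("mean of the sample variances:", mean(sample_variances))
```

The ten samples produce only four distinct variances (9, 21, 36 and 39), which shows that the sample variance changes from sample to sample and has its own distribution.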
Properties of the sampling distribution of sample variance
• Statisticians have established that, if the population
distribution is normal, then the statistic (n – 1)S²/σ² has a χ²
(chi-square) distribution with n – 1 degrees of freedom. The
chi-square distribution is a family of continuous distributions,
indexed by the degrees of freedom.
• This statistic is used as a test statistic to make inferences about
the population variance based on the information provided by
the sample variance.
• The chi-square distributions are defined only for positive
values of a random variable, which is appropriate in the
present context since a sample variance cannot be negative.
• The density function is asymmetric (skewed to the right).
• The properties of the χ² distribution can therefore be used to
find the variance of the sampling distribution of the sample
variance. (It should be repeated that this result holds only
when the parent population is normal.)
• We frequently need to find values of the cumulative
distribution function for a χ² random variable.
• Such problems are often phrased in terms of the
determination of cutoff points corresponding to particular
specified probabilities.
• For instance, if a random variable has a χ² distribution with
10 degrees of freedom (χ²₁₀), we may require the number K for which
P(χ²₁₀ > K) = 0.10
• The distribution function of the chi-square random variable is
tabulated in the appendix of many statistics books, and
these cutoff points can be read directly from those tables.
• For the χ²₁₀ random variable, it can be seen from a table that
P(χ²₁₀ > K) = 0.10 when K = 15.99.
• Example: Suppose a commercial freezer must hold a selected
temperature with little variation. Specifications call for a
standard deviation of no more than 4 degrees. If a sample of
14 freezers is to be tested, what is the upper limit (K) for the
sample variance such that the probability of exceeding this
limit, given that the population standard deviation is 4, is less
than 0.05?
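A minimal sketch of the calculation, assuming a normal population so that (n – 1)S²/σ² follows a χ² distribution with n – 1 = 13 degrees of freedom. Since P(S² > K) = P(χ²₁₃ > (n – 1)K/σ²), K is σ² times the upper 5% cutoff divided by n – 1. The standard library has no chi-square table, so the two helper functions below are illustrative stand-ins (a series for the distribution function plus bisection), not a reference implementation.

```python
import math

def chi2_cdf(x, df):
    """P(chi-square with df degrees of freedom <= x), via the power series
    for the regularized lower incomplete gamma function."""
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = math.exp(a * math.log(z) - z - math.lgamma(a + 1.0))
    total = term
    k = 1
    while term > 1e-15 * total:
        term *= z / (a + k)
        total += term
        k += 1
    return total

def chi2_upper_cutoff(alpha, df):
    """Value K such that P(chi-square > K) = alpha, found by bisection."""
    lo, hi = 0.0, 1.0
    while 1.0 - chi2_cdf(hi, df) > alpha:   # widen until the tail is small enough
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if 1.0 - chi2_cdf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

n, sigma, alpha = 14, 4.0, 0.05
cutoff = chi2_upper_cutoff(alpha, n - 1)   # upper 5% point of chi-square(13)
K = sigma ** 2 * cutoff / (n - 1)          # limit for the sample variance

print(round(chi2_upper_cutoff(0.10, 10), 2))   # table check from the text: 15.99
print(round(cutoff, 2), round(K, 2))
```

The first line reproduces the tabulated cutoff for χ²₁₀. The second gives a cutoff of about 22.36 and an upper limit of about 27.52 for the sample variance.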
The F-distribution

• There are cases where statistical analysis involves comparing
two population variances.
• You might need to compare the precision of one measuring
device with that of another, the stability of one manufacturing
process with that of another, or even the variability in the
grading procedure of one college instructor with that of another.
One way to compare two population variances, $\sigma_1^2$ and $\sigma_2^2$, is to use the ratio of the sample
variances, $s_1^2 / s_2^2$. When independent random samples are drawn from two normal populations
with equal variances ($\sigma_1^2 = \sigma_2^2$), then $s_1^2 / s_2^2$
has a probability distribution in repeated sampling that is termed the F-distribution.



The F-distribution is the sampling distribution of the ratio of two independent random variables with
chi-square distributions, each divided by its respective degrees of freedom. If U and V are
independent random variables having chi-square distributions with v1 and v2 degrees of freedom,
then

$$F = \frac{U / v_1}{V / v_2} = \frac{\chi_1^2 / v_1}{\chi_2^2 / v_2}$$

is a random variable having an F-distribution, whose values vary with every set of two samples
of size n1 and n2.
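Substituting the values of $\chi_1^2$ and $\chi_2^2$ in the above equation:

$$F = \frac{\dfrac{(n_1 - 1)\, s_1^2}{\sigma_1^2} \Big/ (n_1 - 1)}{\dfrac{(n_2 - 1)\, s_2^2}{\sigma_2^2} \Big/ (n_2 - 1)}$$

$$F = \frac{\sigma_2^2 \, s_1^2}{\sigma_1^2 \, s_2^2}$$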
• Application: Hypothesis testing


The t Distribution

• The major difficulty in applying the normal distribution is that
in most realistic applications the population standard
deviation σ is unknown. This makes it necessary to replace σ
with an estimate, usually the value of the sample
standard deviation S.
• Thus, the theory that follows leads to the exact distribution of

$$t = \frac{\bar{X} - \mu}{S / \sqrt{n}}$$

for random samples from normal populations.



• Given a random sample of n observations, with mean X̄ and
standard deviation s, from a normally distributed population
with mean µ, the random variable t = (X̄ – µ)/(s/√n) follows the
Student’s t distribution with (n – 1) degrees of freedom.
• The shape of the student’s t distribution is rather similar to
that of the standard normal distribution.
• Both distributions have mean zero, and the probability density
functions of both are symmetric about their mean.
• However, the density function of the student’s t distribution
has a larger dispersion (variability) than the standard normal
distribution.
• The actual amount of variability in the sampling distributions
of t depends on the size of the sample n.
• As the number of degrees of freedom increases (that is, as
the sample size increases), the Student’s t-distribution
becomes increasingly similar to the standard normal
distribution.
• This is intuitively reasonable and follows from the
fact that for a large sample size, the sample
standard deviation is a very precise estimator of
the population standard deviation.
• In particular, the smaller the degrees of freedom
associated with the t-statistic, the more variable
its sampling distribution will be.
• A t statistic is defined as the ratio of a standard normal
variable Z to the square root of an independent chi-square
variable divided by its degrees of freedom.
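Written out for a random sample from a normal population, this definition leads to the familiar form of the t statistic. The derivation below is a sketch, assuming $Z = (\bar{X} - \mu)/(\sigma/\sqrt{n})$ and the chi-square statistic $(n-1)S^2/\sigma^2$ with $n - 1$ degrees of freedom discussed earlier, which are independent under normality:

```latex
\begin{aligned}
t &= \frac{Z}{\sqrt{\chi^2_{n-1} / (n-1)}}
   = \frac{\dfrac{\bar{X} - \mu}{\sigma / \sqrt{n}}}{\sqrt{\dfrac{(n-1) S^2 / \sigma^2}{n-1}}}
   = \frac{\dfrac{\bar{X} - \mu}{\sigma / \sqrt{n}}}{S / \sigma}
   = \frac{\bar{X} - \mu}{S / \sqrt{n}}
\end{aligned}
```

The unknown σ cancels, which is why the statistic can be computed from the sample alone.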
As n becomes large, the sample standard deviation s approaches the population standard
deviation σ, so the t-statistic approaches the standard normal variable. That is, if n > 30,
then s ≈ σ, so that

$$t = \frac{\bar{X} - \mu}{s / \sqrt{n}} \approx Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$$

• Application: Hypothesis testing about a population mean, the
difference between two population means, testing for an
observed regression coefficient, etc.
