Chap1 B Stat

Chapter one- Sampling and sampling distribution, Department of AcFn

CHAPTER 1
Sampling and sampling distribution

1.1. Sampling theory


1.1.1. Basic Definitions
Population: the totality of statistical information on a particular
characteristic of all the members covered by an investigation. A population is
'finite' or 'infinite' according to its size. It is finite when the number of its
members can be expressed as a definite quantity; otherwise it is an infinite
population.
Population size: the total number of members of the population.
Parameter: a characteristic or measure obtained from a population.
Sample: a portion or part of the population of interest.
Sample size: the number of items included in the sample.
Statistic: a characteristic or measure obtained from a sample.
Sampling: the process or method of selecting a sample from the population.
Sampling unit: the ultimate unit to be sampled, i.e. an element of the
population to be sampled.
Examples: If somebody studies the socio-economic status of households,
the household is the sampling unit.
If one studies the performance of freshman students in certain colleges,
the student is the sampling unit.
Sampling frame: the complete list of all units in a finite population.
Examples: A list of households.
A list of students in the registrar's office.

1.1.2. The Need for Samples


There are many reasons for sampling a population.
a) The results of a sample may adequately estimate the value of the
population parameter, thus saving time and money.
b) It may be too time consuming to contact all members of the population.
c) It may be impossible to check or locate all the members of the population.
d) The cost of studying all items in the population may be prohibitive.
e) Testing often destroys the sampled item, so it cannot be returned to the
population.

1.1.3. Designing and conducting a sampling study


This section describes how to select a sample. We first describe how to sample
from a finite population and then describe how to select a sample from an
infinite population.

Business Statistics, AcFn 2132 Page 1


Finite Population
Statisticians recommend selecting a probability sample when sampling from a
finite population because a probability sample allows them to make valid
statistical inferences about the population. The simplest type of probability
sample is one in which each sample of size n has the same probability of being
selected. It is called a simple random sample.

Infinite Population
Sometimes we want to select a sample from a population, but the population is
infinitely large or the elements of the population are being generated by an on-
going process for which there is no limit on the number of elements that can be
generated. Thus, it is not possible to develop a list of all the elements in the
population. We cannot select a simple random sample because we cannot
construct a frame consisting of all the elements. In this case, statisticians
recommend selecting what is called a random sample.
A random sample of size n from an infinite population is a sample selected
such that the following conditions are satisfied:
• Each element selected comes from the same population.
• Each element is selected independently.
Situations involving sampling from an infinite population are usually
associated with a process that operates over time. Examples include parts
being manufactured on a production line, repeated experimental trials in a
laboratory, transactions occurring at a bank, telephone calls arriving at a
technical support center, and customers entering a retail store. In each case,
the situation may be viewed as a process that generates elements from an
infinite population. As long as the sampled elements are selected from the
same population (i.e., must be selected at approximately the same point in
time) and are selected independently (i.e., selecting of one element must not
affect the next selection), the sample is considered a random sample from an
infinite population.

1.1.4. Bias and Errors in Sample Survey


a) Sampling error:
Samples are used to estimate population characteristics. For example, the
mean of a sample is used to estimate the population mean. However, since the
sample is only a part of the population, it is unlikely that the sample mean
would be exactly equal to the population mean. Similarly, it is unlikely that the
sample standard deviation would be exactly equal to the population standard
deviation. We can therefore expect a difference between a sample statistic and
its corresponding population parameter. This difference is called sampling
error.
In other words, sampling error is the discrepancy between the sample value
and the population value.
Example: Consider the population of five employees at ABC Industries. Last
week the output for each employee was 97, 103, 96, 99, and 105 units.
• Suppose a sample of two employees was selected and their outputs were
97 and 105. The mean of this sample is (97 + 105)/2 = 101.
• Another sample of two employees resulted in outputs of 103 and 96, so
the mean of this sample is 99.5.
• The mean of all the outputs (the population mean) is 100, found by
$\mu = \frac{\sum X}{N} = \frac{97 + 103 + 96 + 99 + 105}{5} = 100$
• The sampling error for the first sample is 1.0 (sample mean − μ = 101 −
100), whereas the sampling error for the second sample is −0.5 (sample
mean − μ = 99.5 − 100). Each of these differences, 1.0 and −0.5, is the
sampling error made in estimating the population mean based on the
sample mean.
• Sampling error may also arise from applying an inappropriate sampling
technique.
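The arithmetic of the ABC Industries example can be checked with a short Python sketch. The function name `sampling_error` is only an illustrative choice; the data are the five outputs given in the example.

```python
# Outputs of the five ABC Industries employees (the whole population).
population = [97, 103, 96, 99, 105]
mu = sum(population) / len(population)  # population mean


def sampling_error(sample, population_mean):
    """Return the sample mean minus the population mean."""
    sample_mean = sum(sample) / len(sample)
    return sample_mean - population_mean


first_error = sampling_error([97, 105], mu)    # first sample from the example
second_error = sampling_error([103, 96], mu)   # second sample from the example
print(mu, first_error, second_error)           # 100.0 1.0 -0.5
```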
b) Non-sampling errors:
This kind of error is known as bias. Bias arises solely from human factors
and is difficult to detect. Non-sampling errors stem from procedural bias, such as:
• Incorrect responses and measurement problems.
• Errors at different stages of data processing.
1.1.5. Types of Samples
There are two types of sampling: random (probability) sampling and non-random
(non-probability) sampling.
1. Random Sampling or probability sampling.
It is a method of sampling in which every element in the population has a pre-
assigned non-zero probability of being included in the sample. Samples are
selected using the principles of probability.
It is classified into four types:
• Simple random sampling
• Systematic sampling
• Stratified random sampling
• Cluster sampling
Simple Random Sampling
It is a method of selecting items from a population such that every possible
sample of a specific size has an equal chance of being selected. In particular,
all elements in the population have the same pre-assigned non-zero probability
of being included in the sample.
Simple random sampling can be done using either the lottery method or a table
of random numbers. In practice the members of the sample are drawn one by
one. There are two ways of drawing a simple random sample.

i) Simple Random Sampling With Replacement (SRSWR)


A simple random sample is said to be "with replacement" when the sample
members are drawn from the population one by one and, after each drawing,
the selected population unit is returned to the population before the next one is
drawn. That is, at each stage of the sampling process all the population units
(including those obtained in earlier drawings) are considered for selection with
equal probability.
The population remains the same before each drawing, and any of the
population units may appear more than once in the sample.

ii) Simple Random Sampling Without Replacement


A simple random sample is said to be "without replacement" when the
sample members are either drawn all at once, or drawn one by one in such a
manner that after each drawing the selected unit is not returned to the
population before the next one is drawn.
At each stage of the sampling process the population units already chosen are
not considered for subsequent selections. The size of the population goes on
diminishing as the sampling process continues. Consequently, no population
unit can appear more than once in the sample.
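The two ways of drawing a simple random sample can be sketched with Python's standard `random` module. The population of 20 numbered units, the sample size of 5, and the seed are assumptions made only for illustration.

```python
import random

rng = random.Random(42)            # fixed seed only so the sketch is repeatable
population = list(range(1, 21))    # hypothetical units labelled 1..20
n = 5

# With replacement: every draw is made from the full population,
# so the same unit may appear more than once.
with_replacement = [rng.choice(population) for _ in range(n)]

# Without replacement: a drawn unit is not returned,
# so no unit can appear more than once.
without_replacement = rng.sample(population, n)

print(with_replacement)
print(without_replacement)
```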

Systematic Sampling
A complete list of all elements within the population (sampling frame) is
required. The items or individuals of the population are arranged in some
order. In a systematic sampling, a random starting point is selected using a
simple random sampling method and then every kth member of the population
is selected.

The steps involved in constructing a systematic sampling scheme are as below:


Step 1: Calculate k, the sampling interval, by dividing the population size by
the sample size:
$k = \frac{N}{n}$, where N = population size and n = sample size.
Step 2: Select a number between 1 and k by simple random sampling.
Suppose it is j (1 ≤ j ≤ k).
Step 3: Starting with this number, select every kth unit until all n units are
selected. The jth unit is selected first, then the (j + k)th, (j + 2k)th, and so
on, until the required sample size is reached.
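The three steps can be sketched in Python as follows. The function name `systematic_sample`, the frame of 100 numbered units, and the use of integer division for k (which assumes n is no larger than N) are illustrative assumptions, not part of the text.

```python
import random


def systematic_sample(frame, n, rng=random):
    """Select n units from an ordered frame using the interval k = N // n."""
    N = len(frame)
    k = N // n                     # Step 1: sampling interval (integer part of N/n)
    j = rng.randint(1, k)          # Step 2: random start with 1 <= j <= k
    # Step 3: take the j-th, (j + k)-th, (j + 2k)-th, ... units (1-based positions).
    return [frame[j - 1 + i * k] for i in range(n)]


frame = list(range(1, 101))        # hypothetical frame of N = 100 units
sample = systematic_sample(frame, 10, random.Random(1))
print(sample)
```

When N is not an exact multiple of n, this sketch simply ignores the units beyond position n·k; other conventions exist.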
Stratified Random Sampling
When a population can be clearly divided into groups based on some
characteristic, stratified random sampling can be used to guarantee
that each group is represented in the sample. The groups are called
strata. Once the strata are defined, we can apply simple random sampling
within each stratum to collect the sample. Elements in the same stratum
should be more or less homogeneous, while elements in different strata should
differ. Stratified sampling is applied when the population is heterogeneous.

Some of the criteria for dividing a population into strata are: Sex (male, female);
Age (under 18, 18 to 28, 29 to 39); Occupation (blue-collar, white collar,
others).
Example 1: A population has 25 students, of whom 15 are white and 10 are black.
How many whites and blacks should a stratified sample of size 10 contain? Let
N = population size, N1 = number of blacks, N2 = number of whites, n = sample size.
(N1/N) × n = (10/25) × 10 = 4 blacks and (N2/N) × n = (15/25) × 10 = 6 whites,
which gives a representative sample.
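The proportional allocation used in Example 1 can be written as a small Python sketch. The function name `proportional_allocation` is hypothetical, and the sketch does not handle the rounding that is needed when the shares are not whole numbers.

```python
def proportional_allocation(strata_sizes, n):
    """Allocate a total sample of size n to strata in proportion to N_h / N."""
    N = sum(strata_sizes.values())
    return {name: n * size / N for name, size in strata_sizes.items()}


# Strata sizes from Example 1: N1 = 10 and N2 = 15, with n = 10.
allocation = proportional_allocation({"black": 10, "white": 15}, n=10)
print(allocation)   # {'black': 4.0, 'white': 6.0}
```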

Cluster Sampling
Another common type of sampling is cluster sampling. It is often employed to
reduce the cost of sampling a population scattered over a large geographic
area. In cluster sampling, a population is divided into clusters using naturally
occurring geographic or other boundaries. Then, clusters are randomly selected
and a sample is collected by randomly selecting from each cluster.
Clusters are formed in a way that elements within a cluster are heterogeneous,
i.e. observations in each cluster should be more or less dissimilar. Cluster
sampling is useful when it is difficult or costly to generate a simple random
sample. For example, to estimate the average annual household income in a
large city we use cluster sampling, because to use simple random sampling we
need a complete list of households in the city from which to sample.
To use stratified random sampling, we would again need the list of households.
A less expensive way is to let each block within the city represent a cluster. A
sample of clusters could then be randomly selected, and every household
within these clusters could be interviewed to find the average annual
household income.
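The block-based procedure described above can be sketched in Python. The block names, the income figures, and the choice of two clusters are made-up values used only to show the mechanics.

```python
import random

# Hypothetical city blocks (clusters) with household incomes; made-up numbers.
blocks = {
    "block_A": [52, 61, 48],
    "block_B": [75, 80],
    "block_C": [40, 45, 43, 39],
    "block_D": [66, 70, 68],
}

rng = random.Random(7)                     # fixed seed so the sketch is repeatable
chosen = rng.sample(sorted(blocks), 2)     # randomly select 2 clusters

# Every household within the chosen clusters is included in the sample.
incomes = [income for name in chosen for income in blocks[name]]
estimate = sum(incomes) / len(incomes)     # estimated average household income
print(chosen, estimate)
```

Only a list of blocks is needed here, not a list of every household in the city.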


2. Non Random Sampling or Non-Probability Sampling.


The fundamental difference between non-probability sampling and probability
sampling is that in a non-probability sampling procedure, the selection of the
sample units does not give each unit a known chance of being selected. In
other words, the units are selected without using the principles of probability.

It is a sampling technique in which the choice of individuals for a sample
depends on convenience, personal choice, or interest. Even though
non-probability sampling has advantages such as reduced cost, speed, and
convenience of implementation, it lacks accuracy because of selection bias.
Another drawback of non-probability sampling is that its results cannot be
validly generalized from the sample to the population; valid conclusions in
inferential statistics require probability sampling.
Non-probability sampling is suitable for pilot studies and exploratory research.
It is classified into three types:
• Judgment sampling
• Convenience sampling
• Quota sampling

Judgment (Purposive) Sampling


A sample which is selected on the basis of individual judgment of the sampler,
is called a Purposive Sample. There is no special technique for selecting a
purposive sample, but the sampler picks out a typical or representative sample
according to his own judgment. Consequently, there is much scope for bias
and the degree of accuracy of the estimates is not known.

Purposive sampling may be useful when the sample is small; but as the sample
size increases the estimates become unreliable due to accumulation of bias.
The advantage of purposive sampling is that whereas a random sample may
vary widely from the average, a purposive sample will not.

Convenience Sampling
In this method, the decision maker selects a sample from the population in a
manner that is relatively easy and convenient. The researcher samples
whatever units come most readily to hand. In this case, probability is not used
in the sampling at all.

Quota Sampling
In this method, the decision maker requires the sample to contain a certain
number of items with a given characteristic. Many political polls are, in part,
quota samples. In simple terms, quota sampling is stratified sampling
without the probability principle being applied to the selection of the sample units.

Example: Suppose in an opinion study, you want both men and women to
participate. You know that in the population category of interest, 65% are men
and 35 % are women. If your sample size is fixed at 200, you will have a quota
of 130 men and 70 women.

1.2. SAMPLING DISTRIBUTION


1.2.1. Definitions
The sampling distribution of the mean is the probability distribution of the
sample mean. If we organize the arithmetic means of all possible samples of
size n from a given population of size N into a probability distribution, we get
the sampling distribution of the mean.
In other words, it is a probability distribution consisting of a list of all possible
sample means of a given sample size selected from the population, together
with the probability of occurrence associated with each sample mean.

1.2.2. Sampling Distribution of the Mean


Steps for the construction of the sampling distribution of the mean
1. From a finite population of size N, randomly draw all possible samples of
size n.
2. Calculate the mean for each sample.
3. Summarize the means obtained in step 2 in terms of a frequency distribution
or relative frequency distribution.
Example: Consider a population of size four (N = 4) consisting of the
values 2, 3, 5 and 6. Suppose samples of size two are to be taken without
replacement from the population.
a) Compute the population mean
b) How many possible sample arrangements will be there?
c) Identify the possible samples.
d) Compute the sample means.
e) Develop the sampling distribution of the mean.
f) Determine the mean of the sampling distribution of the sample mean.
Solution
a) $\mu = \frac{2 + 3 + 5 + 6}{4} = 4$
b) $_{N}C_{n} = {}_{4}C_{2} = \frac{4!}{2!\,(4 - 2)!} = 6$ samples
c) n1 = (2, 3), n2 = (2, 5), n3 = (2, 6), n4 = (3, 5), n5 = (3, 6), and n6 = (5, 6)
d) x̄ for n1 = (2 + 3)/2 = 2.5
x̄ for n2 = (2 + 5)/2 = 3.5
x̄ for n3 = (2 + 6)/2 = 4
x̄ for n4 = (3 + 5)/2 = 4
x̄ for n5 = (3 + 6)/2 = 4.5
x̄ for n6 = (5 + 6)/2 = 5.5

e)
x̄        2.5            3.5            4              4.5            5.5
f(x̄)     1              1              2              1              1
P(x̄)     1/6 = 0.1667   1/6 = 0.1667   2/6 = 0.3333   1/6 = 0.1667   1/6 = 0.1667

f) The mean of the sampling distribution of the sample mean is obtained by
summing the various sample means and dividing the sum by the number of
samples. The mean of all the sample means is usually written μx̄.
$\mu_{\bar{x}} = \frac{\text{sum of all sample means}}{\text{number of samples}} = \frac{2.5 + 3.5 + 4 + 4 + 4.5 + 5.5}{6} = \frac{24}{6} = 4$
In summary, this example illustrates important relationships between the
population distribution and the sampling distribution of the sample mean.
1. The mean of the sample means is exactly equal to the population mean.
2. The dispersion of the sampling distribution of sample means is narrower
than the population distribution. This is because the spread in the
distribution of the sample mean ranges from 2.5 to 5.5, while the population
values vary from 2 up to 6.
3. The shape of the sampling distribution of the sample mean and the shape of
the frequency distribution of the population values are different. The
distribution of the sample mean tends to be more bell-shaped and to
approximate the normal probability distribution.
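The worked example can be reproduced by enumerating every possible sample. The sketch below uses only the Python standard library; `Fraction` is used so that the probabilities and means stay exact, which is a presentation choice rather than something the text prescribes.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

population = [2, 3, 5, 6]
n = 2

samples = list(combinations(population, n))       # all 4C2 = 6 possible samples
means = [Fraction(sum(s), n) for s in samples]     # exact sample means

# Sampling distribution: each distinct sample mean with its probability.
counts = Counter(means)
distribution = {m: Fraction(c, len(means)) for m, c in counts.items()}

mean_of_means = sum(means) / len(means)
population_mean = Fraction(sum(population), len(population))

for m in sorted(distribution):
    print(float(m), distribution[m])
print(mean_of_means == population_mean)            # True
```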
Properties of the Sampling Distribution of x̄
1. The mean of the sample means is exactly equal to the population mean:
$\mu_{\bar{x}} = \mu$
2. If the population is infinite, or if it is finite and the sample size is less
than or equal to 5% of the population size (n/N ≤ 0.05), then the standard
deviation of the distribution of the sample mean is equal to the population
standard deviation divided by the square root of the sample size:
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
Otherwise, the finite population correction factor is applied:
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}}$

Note that as we increase the size of the sample, the spread of the distribution
of the sample mean becomes smaller.
Example: The standard deviation of annual salary for the population of 2500
managers is σ = 4000. In this case, the population is finite, with N = 2500. With
a sample size of 30, compute the standard deviation of the sample mean.
Solution: We have n/N = 30/2500 = 0.012. Because the sample size is less than
5% of the population size, we can ignore the finite population correction factor
and use the infinite population formula.
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{4000}{\sqrt{30}} = 730.3$
Now take the same example, but with a sample size of 1000 instead of 30, and
compute the standard deviation of the sample mean.
Solution: We have n/N = 1000/2500 = 0.4. Because the sample size is more
than 5% of the population size, we must apply the finite population correction
factor.
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}} = \frac{4000}{\sqrt{1000}} \sqrt{\frac{2500 - 1000}{2500 - 1}} \approx 98$
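Both salary calculations can be checked with one small function. This is a sketch: the name `standard_error_mean` and the convention that `N=None` stands for an infinite population are assumptions made here; the 5% rule is the one stated in the text.

```python
import math


def standard_error_mean(sigma, n, N=None):
    """Standard deviation of the sample mean.

    N=None is taken to mean an infinite population. For a finite population
    the correction factor is applied only when n/N exceeds 0.05.
    """
    se = sigma / math.sqrt(n)
    if N is not None and n / N > 0.05:
        se *= math.sqrt((N - n) / (N - 1))   # finite population correction
    return se


se_small = standard_error_mean(4000, 30, N=2500)     # n/N = 0.012, no correction
se_large = standard_error_mean(4000, 1000, N=2500)   # n/N = 0.4, correction applied
print(round(se_small, 1), round(se_large, 1))        # 730.3 98.0
```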
Form of the Sampling Distribution of x̄
The preceding results concerning the expected value and standard deviation of
the sampling distribution of x̄ are applicable for any population. The final step
in identifying the characteristics of the sampling distribution of x̄ is to
determine the form or shape of the sampling distribution. We will consider two
cases: (1) the population has a normal distribution; and (2) the population
does not have a normal distribution.
Population has a normal distribution. In many situations it is reasonable to
assume that the population from which we are selecting a random sample has
a normal, or nearly normal, distribution. When the population has a normal
distribution, the sampling distribution of x̄ is normally distributed for any
sample size.


Population does not have a normal distribution. When the population from
which we are selecting a random sample does not have a normal distribution,
the central limit theorem is helpful in identifying the shape of the sampling
distribution of x̄. A statement of the central limit theorem as it applies to the
sampling distribution of x̄ follows.
Central Limit Theorem: In selecting random samples of size n from a
population, the sampling distribution of the sample mean can be approximated
by a normal distribution as the sample size becomes large.
Given a population of any functional form with mean $\mu$ and finite variance
$\sigma^2$, the sampling distribution of x̄, computed from samples of size n from
the population, will be approximately normally distributed with mean $\mu$ and
variance $\frac{\sigma^2}{n}$ when the sample size is large.
From a practitioner's standpoint, we often want to know how large the sample
size needs to be before the central limit theorem applies and we can assume
that the shape of the sampling distribution is approximately normal. Statistical
researchers have investigated this question by studying the sampling
distribution of x̄ for a variety of populations and a variety of sample sizes.
General statistical practice is to assume that, for most applications, the
sampling distribution of x̄ can be approximated by a normal distribution
whenever the sample size is 30 or more.
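The central limit theorem can be illustrated with a small simulation. The sketch below assumes a skewed population (exponential with mean 1 and standard deviation 1), 5000 repeated samples, and a fixed seed; these choices are for illustration only and do not come from the text.

```python
import random
import statistics

rng = random.Random(0)            # fixed seed so the sketch is repeatable
n, replications = 30, 5000

# Assumed skewed population: exponential with mean 1 and standard deviation 1.
sample_means = [
    statistics.fmean(rng.expovariate(1.0) for _ in range(n))
    for _ in range(replications)
]

center = statistics.fmean(sample_means)   # should be close to mu = 1
spread = statistics.stdev(sample_means)   # should be close to sigma / sqrt(n)
print(center, spread, 1 / n ** 0.5)
```

A histogram of `sample_means` would look roughly bell-shaped even though the population itself is strongly skewed.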
Example- The distribution of annual earnings of all bank tellers with five years
of experience is skewed negatively. This distribution has a mean of Birr 15,000
and a standard deviation of Birr 2000. If we draw a random sample of 30
tellers, what is the probability that their earnings will average more than Birr
15,750 annually?
Solution:
Steps:
1. Calculate µ and σx̄
µ = Birr 15,000
σx̄ = σ/√n = 2000/√30 = Birr 365.15
2. Calculate Z for x̄
$Z = \frac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}} = \frac{15{,}750 - 15{,}000}{365.15} = 2.05$


3. Find the area covered by the interval
P(x̄ > 15,750) = P(Z > +2.05)
= 0.5 − P(0 to +2.05)
= 0.5 − 0.47982
= 0.02018
4. Interpret the results
There is a 2.02% chance that the average earnings of a group of 30 tellers
will be more than Birr 15,750 annually.
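The bank teller calculation can be checked without a Z table by using `statistics.NormalDist` from the Python standard library (Python 3.8 or later). Because the sketch does not round Z to two decimals, its result is about 0.020 rather than exactly the table-based 0.02018.

```python
from statistics import NormalDist

mu, sigma, n = 15_000, 2_000, 30

se = sigma / n ** 0.5                  # standard error of the mean
z = (15_750 - mu) / se                 # Z value for a sample mean of 15,750
prob = 1 - NormalDist().cdf(z)         # P(Z > z), the upper-tail area

print(round(se, 2), round(z, 2), round(prob, 3))
```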

1.2.3. Sampling Distribution of the proportion


The sample proportion p̄ is the point estimator of the population proportion p.
The formula for computing the sample proportion is
$\bar{p} = \frac{x}{n}$
where x = the number of elements in the sample that possess the
characteristic of interest, and n = the sample size.

The sampling distribution of p̄ is the probability distribution of all possible
values of the sample proportion p̄. To determine how close the sample
proportion p̄ is to the population proportion p, we need to understand the
properties of the sampling distribution of p̄: the expected value of p̄, the
standard deviation of p̄, and the shape or form of the sampling distribution of p̄.
Expected Value of p̄
The expected value of p̄, the mean of all possible values of p̄, is equal to the
population proportion p.
E(p̄) = p
where E(p̄) = the expected value of p̄
p = the population proportion
Example: A research institute studies managers and finds that the proportion
who participated in a management training program is 60%. The expected
value of p̄ for this sampling problem is 60%.

Standard deviation of p̄
Just like the standard deviation of the sample mean, the standard deviation of
p̄ depends on whether the population is finite or infinite. The two formulas for
computing the standard deviation of p̄ are as follows:
Finite population: $\sigma_{\bar{p}} = \sqrt{\frac{N - n}{N - 1}} \sqrt{\frac{p(1 - p)}{n}}$
Infinite population: $\sigma_{\bar{p}} = \sqrt{\frac{p(1 - p)}{n}}$
Example: Let's take the previous situation again. The population proportion of
managers who participated in the management training program is p = 0.60, with
a sample size of 30 and a population of 2500.
Because n/N = 30/2500 = 0.012, we can ignore the finite population correction
factor when we compute the standard error of the proportion. For the simple
random sample of 30 managers, the standard error is
$\sigma_{\bar{p}} = \sqrt{\frac{0.60(1 - 0.60)}{30}} = 0.0894$
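The standard error of the proportion for the managers example (p = 0.60, n = 30, N = 2500) can be computed with a short sketch. The function name `standard_error_proportion`, the `N=None` convention for an infinite population, and the reuse of the 5% rule from the sample mean case are assumptions made for illustration.

```python
import math


def standard_error_proportion(p, n, N=None):
    """Standard deviation of the sample proportion.

    N=None is taken to mean an infinite population. The finite population
    correction is applied only when n/N exceeds 0.05.
    """
    se = math.sqrt(p * (1 - p) / n)
    if N is not None and n / N > 0.05:
        se *= math.sqrt((N - n) / (N - 1))
    return se


se = standard_error_proportion(0.60, 30, N=2500)   # n/N = 0.012, no correction
print(round(se, 4))                                # 0.0894
```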

Form of the sampling distribution of p


Now that we know the mean and standard deviation of the sampling
distribution of p̄, the final step is to determine the form or shape of the
sampling distribution. The sample proportion is p̄ = x/n. For a simple random
sample from a large population, the value of x is a binomial random variable
indicating the number of elements in the sample with the characteristic of
interest. Because n is a constant, the probability of x/n is the same as the
binomial probability of x, which means that the sampling distribution of p̄ is
also a discrete probability distribution and that the probability for each value of
x/n is the same as the probability of x.

The sample size is considered large enough when it satisfies the following two
conditions: np ≥ 5 and n(1 − p) ≥ 5. Whenever both conditions hold, the
sampling distribution of p̄ can be approximated by a normal distribution.

In practical applications, when an estimate of a population proportion is
desired, we find that sample sizes are almost always large enough to permit the
use of a normal approximation for the sampling distribution of p̄.
Example: Recall the previous example, in which the population proportion
of managers who participated in the training program is p = 0.60. With a
simple random sample of size 30, we have np = 30(0.60) = 18 and n(1 − p) =
30(0.40) = 12. Thus, the sampling distribution of p̄ can be approximated by a
normal distribution.
Illustration:
Suppose that 60% of the electrical contractors in a region use a particular
brand of wire. What is the probability of taking a random sample of size 120
from these electrical contractors and finding that 0.5 or less use that brand of
wire?
Solution:
n = 120, p = 0.6, q = 1 − p = 0.4, P(p̄ < 0.5) = ?
Steps:
1. Check that np and nq are at least 5
120 × 0.6 = 72 and 120 × 0.4 = 48. Both are greater than 5.
2. Calculate σp̄
$\sigma_{\bar{p}} = \sqrt{\frac{pq}{n}} = \sqrt{\frac{0.6 \times 0.4}{120}} = 0.0447$
3. Calculate Z for p̄
$Z = \frac{\bar{p} - p}{\sigma_{\bar{p}}} = \frac{0.5 - 0.6}{0.0447} = -2.24$
4. Find the area covered by the interval
P(p̄ < 0.5) = P(Z < −2.24)
= 0.5 − P(−2.24 to 0)
= 0.5 − 0.48745
= 0.01255

5. Interpret the results
The probability of finding that 50% or less of the contractors use this
particular brand of wire is very low (1.255%) if we take a random sample of
120 contractors.
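The electrical contractors illustration can be verified with `statistics.NormalDist`. As with any software calculation, Z is not rounded to two decimals here, so the probability comes out near 0.0127 instead of the table-based 0.01255.

```python
from statistics import NormalDist

p, n = 0.6, 120

# Conditions for the normal approximation: np >= 5 and n(1 - p) >= 5.
assert n * p >= 5 and n * (1 - p) >= 5

se = (p * (1 - p) / n) ** 0.5      # standard error of the proportion
z = (0.5 - p) / se                 # Z value for a sample proportion of 0.5
prob = NormalDist().cdf(z)         # P(Z < z), the lower-tail area

print(round(se, 4), round(z, 2), round(prob, 4))
```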

Exercises
1. The mean undergraduate cost for tuition, fees, room, and board for four-
year institutions was $26,489 for a recent academic year. Suppose that the
standard deviation is $3,204 and that 36 four-year institutions are
randomly selected. Find the probability that the sample mean cost for these
36 schools is
A. Less than $25,000
B. Greater than $26,000
C. Between $24,000 and $26,000
D. Also find the standard error of the sampling distribution.
2. Time spent using e-mail per session is normally distributed, with mean of 8
minutes and standard deviation of 2 minutes. If you select a random sample
of 25 sessions,

A. What is the probability that the sample mean is between 7.8 and 8.2
minutes?
B. What is the probability that the sample mean is between 7.5 and 8
minutes?
C. If you select a random sample of 100 sessions, what is the probability
that the sample mean is between 7.8 and 8.2 minutes?
3. Suppose that during any hour in a large department store, the average
number of shoppers is 448, with a standard deviation of 21 shoppers. What
is the probability of randomly selecting 49 different shopping hours,
counting the shoppers, and having the sample mean fall between 441 and
446 shoppers, inclusive?

4. The U.S. Census Bureau announced that the median sales price of new
houses sold in 2009 was $215,600, and the mean sales price was $270,100.
Assume that the standard deviation of the prices is $90,000.

A. If you select a random sample of n=100 what is the probability that the
sample mean will be less than $300,000?
B. If you select a random sample of n= 100 what is the probability that the
sample mean will be between $275,000 and $290,000?
5. A population has a mean of 200 and a standard deviation of 50. Suppose a
simple random sample of size 100 is selected and is used to estimate μ.
A. What is the probability that the sample mean will be within ±5 of the
population mean?
B. What is the probability that the sample mean will be within ±10 of the
population mean?
6. A population proportion is .40. A simple random sample of size 200 will be
taken and the sample proportion will be used to estimate the population
proportion.
A. Compute the standard error of the proportion
B. What is the probability that the sample proportion will be within ±0.03 of
the population proportion?
C. What is the probability that the sample proportion will be within ±0.05 of
the population proportion?
7. In a recent survey of full-time female workers ages 22 to 35 years, 46% said
that they would rather give up some of their salary for more personal time.
Suppose you select a sample of 100 full-time female workers 22 to 35 years
old.
A. What is the probability that in the sample, fewer than 50% would rather
give up some of their salary for more personal time?
B. What is the probability that in the sample, between 40% and 50% would
rather give up some of their salary for more personal time?
C. What is the probability that in the sample, more than 40% would rather
give up some of their salary for more personal time?
