
Chapter 1: Sampling Distribution

1.1 Introduction

A population and a sample are two important aspects of a statistical study. A
population is a set of units, for example persons, objects or items of interest,
whose characteristics are under study. It can be a widely defined category such
as “all students in IPTA” or it can be narrowly defined such as “all first year
students in UTM”. A sample is a subset of the units of the population. If
properly taken, it is representative of the population.

One of the purposes of statistical inference is to develop estimates about the
characteristics of a population using information contained in a sample. In
estimating population parameters such as $\mu$, $\sigma^2$ and $\pi$, sample
statistics such as $\bar{x}$, $s^2$ and $p$ are computed respectively. However,
it is important to realize that the sample results provide only estimates of the
values of the population characteristics. That is, we do not expect the values
of the sample statistics to be exactly equal to the values of the population
parameters. Similarly, we do not expect the values of the statistics computed
from one sample to be equal to the values computed from another sample selected
from the same population. This is because a second sample will not contain the
same elements as the first.

Suppose we want to estimate $\mu$. Since many random samples are possible from
the same population, we would expect $\bar{x}$ to vary from sample to sample. If
we consider the process of selecting random samples as an experiment, the sample
mean $\bar{x}$ is a numerical description of the outcome of the experiment.
Thus, the sample mean $\bar{X}$ is a random variable and, just like any other
random variable, $\bar{X}$ has a mean, a variance and a probability distribution.

Definition
Because the possible values of $\bar{x}$ are the result of different samples, the
probability distribution of $\bar{X}$ is called the sampling distribution of
$\bar{X}$.
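
The idea that the sample mean is itself a random variable can be illustrated with a small simulation. The sketch below is only an illustration: the population (normal with mean 50 and standard deviation 10), the sample size of 25 and the number of repetitions are assumed values that do not come from the text.

```python
import random
import statistics

# Hypothetical population parameters, chosen only for illustration.
POP_MEAN = 50.0
POP_SD = 10.0
SAMPLE_SIZE = 25
NUM_SAMPLES = 2000

rng = random.Random(1)  # fixed seed so the run is repeatable


def draw_sample_mean(n):
    """Draw one random sample of size n and return its sample mean."""
    sample = [rng.gauss(POP_MEAN, POP_SD) for _ in range(n)]
    return statistics.fmean(sample)


# Repeat the "experiment" many times; each repetition gives one value of x-bar.
sample_means = [draw_sample_mean(SAMPLE_SIZE) for _ in range(NUM_SAMPLES)]

# The collection of sample means has its own mean and variance.
mean_of_means = statistics.fmean(sample_means)
variance_of_means = statistics.variance(sample_means)

print("first five sample means:", [round(m, 2) for m in sample_means[:5]])
print("mean of the sample means:", round(mean_of_means, 3))
print("variance of the sample means:", round(variance_of_means, 3))
```

The printed sample means differ from one another, and a histogram of `sample_means` would give an empirical picture of the sampling distribution of the mean.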

1.2 Sampling Distributions

Knowledge of the sampling distribution and its properties will enable us to make
probability statements about how close a sample statistic is to the population
parameter. The sampling distribution of a statistic depends on the size of the
population, the size of the samples, and the method of choosing the samples. The
sampling distributions discussed in this chapter are the sampling distributions of
the mean, difference between two means, proportion and difference between
two proportions.

1.2.1 Sampling Distribution of $\bar{X}$ from Normal Distribution

The first important sampling distribution to be considered is that of the mean
$\bar{X}$. The sampling distribution of $\bar{X}$ is the distribution of sample
means $\bar{x}$, which may be different from one sample to the next, resulting
from an experiment conducted over and over on different samples of the same size.

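Suppose that a random sample of $n$ observations is taken from a normal
population with mean $\mu$ and variance $\sigma^2$. Each observation $X_i$,
$i = 1, 2, \dots, n$, of the random sample will then have the same normal
distribution as the population being sampled. Then $\bar{X}$ will have a normal
distribution with mean

$$
\begin{aligned}
\mu_{\bar{X}} = E(\bar{X}) &= E\left(\frac{\sum_{i=1}^{n} X_i}{n}\right)
  = \frac{1}{n} E\left(\sum_{i=1}^{n} X_i\right)
  = \frac{1}{n} \sum_{i=1}^{n} E(X_i) \\
&= \frac{1}{n}\left[E(X_1) + E(X_2) + \dots + E(X_n)\right]
  = \frac{1}{n}\left[\mu + \mu + \dots + \mu\right] \\
&= \frac{1}{n}(n\mu) = \mu
\end{aligned}
$$

and, since the observations in a random sample are independent, variance

$$
\begin{aligned}
\sigma^2_{\bar{X}} = \operatorname{Var}(\bar{X})
  &= \operatorname{Var}\left(\frac{\sum_{i=1}^{n} X_i}{n}\right)
  = \frac{1}{n^2} \operatorname{Var}\left(\sum_{i=1}^{n} X_i\right) \\
&= \frac{1}{n^2}\left[\operatorname{Var}(X_1) + \operatorname{Var}(X_2) + \dots + \operatorname{Var}(X_n)\right] \\
&= \frac{1}{n^2}\left[\sigma^2 + \sigma^2 + \dots + \sigma^2\right]
  = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}
\end{aligned}
$$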

Therefore we can write the distribution for $\bar{X}$ as

$$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$$

Note that this is the distribution for $\bar{X}$ when a sample is taken from an
infinite population. If the population is finite, of size $N$, then

$$\sigma^2_{\bar{X}} = \left(\frac{N - n}{N - 1}\right)\frac{\sigma^2}{n}$$
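
The two variance formulas can be collected in one small helper. This is a minimal sketch, not part of the original text: the function name `variance_of_sample_mean` and the numbers used to call it are assumptions made for illustration.

```python
import math


def variance_of_sample_mean(sigma2, n, population_size=None):
    """Variance of the sample mean.

    sigma2: population variance; n: sample size.
    population_size: N for a finite population, or None for an infinite one.
    """
    variance = sigma2 / n
    if population_size is not None:
        # Finite population correction factor (N - n) / (N - 1).
        variance *= (population_size - n) / (population_size - 1)
    return variance


# Illustrative numbers only (hypothetical): sigma^2 = 144, n = 10, N = 100.
infinite_case = variance_of_sample_mean(144, 10)
finite_case = variance_of_sample_mean(144, 10, population_size=100)

print("infinite population:", infinite_case)
print("finite population:  ", finite_case)
print("standard error (infinite case):", math.sqrt(infinite_case))
```

The correction factor is smaller than 1 whenever $n > 1$, so the variance for a finite population is smaller than $\sigma^2/n$, and the two agree closely when $N$ is much larger than $n$.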

Example 1.1

A beverage company produces orange-flavoured cordial in bottles with a mean
volume of 758 ml and a standard deviation of 12 ml. A random sample of 10
bottles is taken and the mean volume is calculated. Assuming that the volume of
the orange-flavoured cordial is approximately normally distributed, find
a) the probability that the mean volume is less than 750 ml.
b) the probability that the mean volume lies between 745 ml and 775 ml.
c) the probability that the mean volume is greater than 780 ml.

Solution
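
A hand calculation for this example can be checked numerically. The sketch below assumes the result $\bar{X} \sim N(\mu, \sigma^2/n)$ with $\mu = 758$, $\sigma = 12$ and $n = 10$ from the question, and uses Python's standard-library `statistics.NormalDist`; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 758.0, 12.0, 10

# Sampling distribution of the mean: N(mu, sigma^2 / n).
xbar = NormalDist(mu, sigma / sqrt(n))

p_a = xbar.cdf(750)                  # a) P(Xbar < 750)
p_b = xbar.cdf(775) - xbar.cdf(745)  # b) P(745 < Xbar < 775)
p_c = 1 - xbar.cdf(780)              # c) P(Xbar > 780)

print(f"a) {p_a:.4f}")
print(f"b) {p_b:.4f}")
print(f"c) {p_c:.4f}")
```

Values obtained from a printed normal table may differ slightly in the last decimal place because the z-score is rounded to two decimals there.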

1.2.2 Sampling Distribution of $\bar{X}$ from Non-Normal Distribution

When the population distribution is unknown, we rely on one of the most
important theorems in statistics, the central limit theorem. According to this
theorem, when samples are taken from a population that is not normally
distributed, the sampling distribution of the mean takes on the characteristic
normal shape as the sample size increases. That is, when we are sampling from a
population with unknown distribution, the sampling distribution of $\bar{X}$
will still be approximately normal with mean $\mu$ and variance
$\frac{\sigma^2}{n}$, provided that the sample size is large. Then the limiting
form of the distribution of

$$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$$

as $n \to \infty$, is the standard normal distribution $N(0,1)$.
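
The central limit theorem can be illustrated with a simulation. The sketch below is an illustration under assumed settings that do not come from the text: the population is exponential with rate 1 (so $\mu = 1$ and $\sigma = 1$), the sample size is 30, and 5000 samples are drawn.

```python
import random
from math import sqrt

rng = random.Random(7)  # fixed seed so the run is repeatable

n = 30        # sample size
reps = 5000   # number of simulated samples
mu = 1.0      # mean of the exponential population with rate 1
sigma = 1.0   # standard deviation of the same population


def standardized_mean():
    """Draw one sample from the skewed population and standardize its mean."""
    sample = [rng.expovariate(1.0) for _ in range(n)]
    xbar = sum(sample) / n
    return (xbar - mu) / (sigma / sqrt(n))


z_values = [standardized_mean() for _ in range(reps)]

# For a standard normal variable, about 95% of values lie within +/- 1.96.
share_within = sum(abs(z) < 1.96 for z in z_values) / reps
print("share of standardized means within +/-1.96:", share_within)
```

Although the exponential population is strongly skewed, the share of standardized sample means falling within $\pm 1.96$ should come out close to the 0.95 expected under $N(0,1)$.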

Note:
• The normal approximation for $\bar{X}$ will generally be good if the sample
  size $n \ge 30$.
• A continuity correction factor is not needed for the random variable $\bar{X}$.

Example 1.2

In an examination taken by a large number of students the mean mark was 64.5
and the variance was 64. If a random sample of 100 scripts is selected, what is
the probability that the mean mark
a) is at least 75?
b) is between 63.8 and 64.5?
c) is at most 70?

Solution
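
A numerical check for this example can be sketched as follows, assuming the central limit theorem applies ($n = 100$ is large) so that $\bar{X}$ is approximately $N(64.5, 64/100)$. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, variance, n = 64.5, 64.0, 100

# Approximate sampling distribution of the mean by the central limit theorem.
xbar = NormalDist(mu, sqrt(variance / n))

p_a = 1 - xbar.cdf(75)                 # a) P(Xbar >= 75)
p_b = xbar.cdf(64.5) - xbar.cdf(63.8)  # b) P(63.8 < Xbar < 64.5)
p_c = xbar.cdf(70)                     # c) P(Xbar <= 70)

print(f"a) {p_a:.4f}")
print(f"b) {p_b:.4f}")
print(f"c) {p_c:.4f}")
```

No continuity correction is applied because $\bar{X}$ is treated as a continuous random variable.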

Example 1.3

30 random observations are taken from the following distribution:
X: the number of heads obtained when an unbiased coin is tossed nine times.
What is the probability that
a) the sample mean exceeds 5?
b) the sample mean lies between 4 and 8?

Solution
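
One way to check this example numerically is sketched below. It assumes that $X$ is binomial with 9 trials and success probability 0.5 (so $\mu = 4.5$ and $\sigma^2 = 2.25$) and that the central limit theorem is applied to the mean of $n = 30$ observations, without a continuity correction.

```python
from math import sqrt
from statistics import NormalDist

trials, p_head = 9, 0.5
n = 30  # number of observations in the sample

# Mean and variance of the binomial population X.
mu = trials * p_head
sigma2 = trials * p_head * (1 - p_head)

# Approximate sampling distribution of the mean: N(mu, sigma^2 / n).
xbar = NormalDist(mu, sqrt(sigma2 / n))

p_a = 1 - xbar.cdf(5)            # a) P(Xbar > 5)
p_b = xbar.cdf(8) - xbar.cdf(4)  # b) P(4 < Xbar < 8)

print(f"a) {p_a:.4f}")
print(f"b) {p_b:.4f}")
```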

1.2.3 Sampling Distribution of Difference between Two Sample Means $\bar{X}_1 - \bar{X}_2$

Suppose that we have two populations, $X_1$ and $X_2$, which are normally
distributed. $X_1$ has mean $\mu_1$ and variance $\sigma_1^2$ while $X_2$ has
mean $\mu_2$ and variance $\sigma_2^2$. These two distributions can be written as

$$X_1 \sim N(\mu_1, \sigma_1^2) \quad \text{and} \quad X_2 \sim N(\mu_2, \sigma_2^2)$$

Let $\bar{X}_1$ represent the mean of a random sample of size $n_1$ selected
from $X_1$ and $\bar{X}_2$ represent the mean of a random sample of size $n_2$
selected from $X_2$. The distributions for the random variables $\bar{X}_1$ and
$\bar{X}_2$ can be written as:

$$\bar{X}_1 \sim N\left(\mu_1, \frac{\sigma_1^2}{n_1}\right) \quad \text{and} \quad \bar{X}_2 \sim N\left(\mu_2, \frac{\sigma_2^2}{n_2}\right)$$

Now we are interested in the sampling distribution of the difference between
two sample means, that is, the distribution of $\bar{X}_1 - \bar{X}_2$.

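The mean of $\bar{X}_1 - \bar{X}_2$ is

$$\mu_{\bar{X}_1 - \bar{X}_2} = E(\bar{X}_1 - \bar{X}_2) = E(\bar{X}_1) - E(\bar{X}_2) = \mu_1 - \mu_2$$

and, since the two samples are independent, the variance is

$$
\begin{aligned}
\operatorname{Var}(\bar{X}_1 - \bar{X}_2)
  &= \operatorname{Var}(\bar{X}_1) + \operatorname{Var}(-\bar{X}_2)
  = \operatorname{Var}(\bar{X}_1) + (-1)^2 \operatorname{Var}(\bar{X}_2) \\
&= \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}
\end{aligned}
$$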

Therefore the distribution of $\bar{X}_1 - \bar{X}_2$ can be written as:

$$\bar{X}_1 - \bar{X}_2 \sim N\left(\mu_1 - \mu_2, \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}\right)$$

Hence,

$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} \sim N(0,1)$$

Example 1.4
A factory produces two types of tennis balls, type A and type B. It is known that
the mean height of bounce is 140 cm for type A with a standard deviation of 2
cm while the mean height of bounce for type B is 138 cm with a standard
deviation of 3 cm. Assume that the heights of bounce are approximately normal.
A random sample of 49 tennis balls of type A and 36 tennis balls of type B are
taken. Find the probability that the
a) mean height of bounce of type A tennis balls is larger than the mean height
of bounce of type B tennis balls.
b) difference between the mean height of bounce of type A tennis balls and
type B tennis balls is at most 3 cm.
c) mean heights of bounce of the two types of tennis balls differ by at most 3 cm.

Solution
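
A numerical check for this example can be sketched as follows. It assumes the two samples are independent, so that $\bar{X}_A - \bar{X}_B \sim N(\mu_A - \mu_B, \sigma_A^2/n_A + \sigma_B^2/n_B)$, and it reads part b) as a signed difference (A minus B) and part c) as an absolute difference. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu_a, sd_a, n_a = 140.0, 2.0, 49  # type A
mu_b, sd_b, n_b = 138.0, 3.0, 36  # type B

# Distribution of the difference D = Xbar_A - Xbar_B.
diff = NormalDist(mu_a - mu_b, sqrt(sd_a**2 / n_a + sd_b**2 / n_b))

p_a = 1 - diff.cdf(0)             # a) P(Xbar_A > Xbar_B) = P(D > 0)
p_b = diff.cdf(3)                 # b) P(D <= 3)
p_c = diff.cdf(3) - diff.cdf(-3)  # c) P(|D| <= 3)

print(f"a) {p_a:.4f}")
print(f"b) {p_b:.4f}")
print(f"c) {p_c:.4f}")
```

Parts b) and c) give almost the same value here because the probability that the difference falls below $-3$ cm is negligible.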

Example 1.5
An engineering firm sets an aptitude test when applicants first apply for training.
For the first batch of applicants, the mean time taken to complete the test is 40.5
minutes with standard deviation 7.5 minutes. For the second batch of applicants,
the mean time taken to complete the test is 42.5 minutes with standard
deviation 6.5 minutes. Assume that the times taken to complete the aptitude test
are approximately normal. If the times taken by a random sample of 35
applicants in the first batch and the times taken by a random sample of 40
applicants in the second batch are selected, find the probability that
a) the mean time taken by the first batch of applicants is less than the mean
time taken by the second batch of applicants.
b) the mean times taken by the two batches of applicants will differ by more
than 3 minutes.

Solution
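
The same approach gives a numerical check for this example. The sketch assumes independent samples and takes $D = \bar{X}_1 - \bar{X}_2$ (first batch minus second batch); the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu_1, sd_1, n_1 = 40.5, 7.5, 35  # first batch
mu_2, sd_2, n_2 = 42.5, 6.5, 40  # second batch

# Distribution of the difference D = Xbar_1 - Xbar_2.
diff = NormalDist(mu_1 - mu_2, sqrt(sd_1**2 / n_1 + sd_2**2 / n_2))

p_a = diff.cdf(0)                       # a) P(Xbar_1 < Xbar_2) = P(D < 0)
p_b = (1 - diff.cdf(3)) + diff.cdf(-3)  # b) P(|D| > 3)

print(f"a) {p_a:.4f}")
print(f"b) {p_b:.4f}")
```

In part b) both tails are added because "differ by more than 3 minutes" does not say which batch is slower.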

1.2.4 Sampling Distribution of the Proportion P
For a random sample of size $n$ drawn from a binomial population that has a
proportion of successes $\pi$, the probability of $X = x$ successes can be
obtained from the binomial probability function

$$P(X = x) = \binom{n}{x} \pi^x (1 - \pi)^{n - x}, \quad \text{for } x = 0, 1, 2, \dots, n$$

If all possible samples of size $n$ are taken from this distribution, and the
number of successes is determined from each sample, then the proportion of
successes for each sample can be characterized by $p = \frac{x}{n}$. Since the
value of $p$ varies from sample to sample, the proportion is a random variable,
symbolized as $P$. The resulting sampling distribution of $P$ has mean $\pi$ and
variance $\frac{\pi(1 - \pi)}{n}$. By the Central Limit Theorem, the
distribution of $P$ is approximately normal, where

$$\mu_P = E(P) = E\left(\frac{X}{n}\right) = \frac{1}{n} E(X) = \frac{1}{n}(n\pi) = \pi$$

$$\sigma_P^2 = \operatorname{Var}(P) = \operatorname{Var}\left(\frac{X}{n}\right) = \frac{1}{n^2} \operatorname{Var}(X) = \frac{n\pi(1 - \pi)}{n^2} = \frac{\pi(1 - \pi)}{n}$$

Thus the distribution of $P$ can be written as

$$P \sim N\left(\pi, \frac{\pi(1 - \pi)}{n}\right) \quad \text{when } n \text{ is large } (n \ge 30).$$
Note:
When approximating a binomial distribution with a normal distribution, a
continuity correction factor of $\pm\frac{1}{2}$ is used. Since
$P = \frac{X}{n}$, a continuity correction factor of $\pm\frac{1}{2n}$ is used
for $P$.

Example 1.6
It is known that 3% of frozen curry puffs delivered to a canteen are broken.
What is the probability that, on a morning when 500 curry puffs are delivered,
a) 5% or more are broken?
b) between 2% and 4% are broken?
c) at least 3% are broken?

Solution
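
A numerical check for this example is sketched below. It assumes $P \sim N(\pi, \pi(1-\pi)/n)$ with $\pi = 0.03$ and $n = 500$, applies the continuity correction $\pm\frac{1}{2n}$, and reads "between 2% and 4%" in part b) as including both end points; that reading is an assumption. The name `pi_pop` is illustrative.

```python
from math import sqrt
from statistics import NormalDist

pi_pop, n = 0.03, 500

# Approximate sampling distribution of the sample proportion P.
p_dist = NormalDist(pi_pop, sqrt(pi_pop * (1 - pi_pop) / n))

cc = 1 / (2 * n)  # continuity correction for a proportion

p_a = 1 - p_dist.cdf(0.05 - cc)                      # a) P(P >= 0.05)
p_b = p_dist.cdf(0.04 + cc) - p_dist.cdf(0.02 - cc)  # b) P(0.02 <= P <= 0.04), end points assumed included
p_c = 1 - p_dist.cdf(0.03 - cc)                      # c) P(P >= 0.03)

print(f"a) {p_a:.4f}")
print(f"b) {p_b:.4f}")
print(f"c) {p_c:.4f}")
```

If the end points in part b) are meant to be excluded, the correction changes sign and the bounds become `0.04 - cc` and `0.02 + cc`.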

Example 1.7
Three quarters of the households in Taman Uda subscribe to an internet service
provider. In a random sample of 100 households, find the probability that
a) at least 73 households subscribe to an internet service provider.
b) at most 90 households subscribe to an internet service provider.
c) between 85 and 90 households subscribe to an internet service provider.

Solution
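
Because this example is stated in counts, it is convenient to work with $X$, the number of subscribing households, and to compare the normal approximation with the exact binomial probabilities. The sketch below assumes $X$ is binomial with $n = 100$ and $\pi = 0.75$, uses a continuity correction of $\pm\frac{1}{2}$, and reads part c) as including both 85 and 90; the helper `binom_pmf` is illustrative.

```python
from math import comb, sqrt
from statistics import NormalDist

n, pi_pop = 100, 0.75

# Normal approximation to the count X: N(n*pi, n*pi*(1 - pi)).
x_dist = NormalDist(n * pi_pop, sqrt(n * pi_pop * (1 - pi_pop)))


def binom_pmf(k):
    """Exact binomial probability P(X = k)."""
    return comb(n, k) * pi_pop**k * (1 - pi_pop) ** (n - k)


# Normal approximation with continuity correction of 0.5.
approx_a = 1 - x_dist.cdf(72.5)                 # a) P(X >= 73)
approx_b = x_dist.cdf(90.5)                     # b) P(X <= 90)
approx_c = x_dist.cdf(90.5) - x_dist.cdf(84.5)  # c) P(85 <= X <= 90)

# Exact binomial values for comparison.
exact_a = sum(binom_pmf(k) for k in range(73, n + 1))
exact_b = sum(binom_pmf(k) for k in range(0, 91))
exact_c = sum(binom_pmf(k) for k in range(85, 91))

print(f"a) approx {approx_a:.4f}, exact {exact_a:.4f}")
print(f"b) approx {approx_b:.4f}, exact {exact_b:.4f}")
print(f"c) approx {approx_c:.4f}, exact {exact_c:.4f}")
```

Dividing the corrected bounds by $n$ gives the same probabilities in terms of the proportion $P$, which is where the correction $\pm\frac{1}{2n}$ comes from.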

1.2.5 Sampling Distribution of the Difference Between Two Proportions, $P_1 - P_2$

Now say we have two binomial populations with proportions of successes $\pi_1$
and $\pi_2$ respectively. Samples of size $n_1$ are taken from population 1 and
samples of size $n_2$ are drawn from population 2. Then $P_1$ and $P_2$ are the
proportions from those samples.

The sampling distributions of $P_1$ and $P_2$ are as follows

$$P_1 \sim N\left(\pi_1, \frac{\pi_1(1 - \pi_1)}{n_1}\right) \quad \text{and} \quad P_2 \sim N\left(\pi_2, \frac{\pi_2(1 - \pi_2)}{n_2}\right)$$

provided $n_1$ and $n_2$ are large.

We are interested in finding the sampling distribution of the difference between
two proportions, $P_1 - P_2$. The mean of $P_1 - P_2$ is

$$\mu_{P_1 - P_2} = E(P_1 - P_2) = E(P_1) - E(P_2) = \pi_1 - \pi_2$$

and, since the two samples are independent, the variance is

$$\sigma^2_{P_1 - P_2} = \operatorname{Var}(P_1 - P_2) = \operatorname{Var}(P_1) + \operatorname{Var}(P_2) = \frac{\pi_1(1 - \pi_1)}{n_1} + \frac{\pi_2(1 - \pi_2)}{n_2}$$

Using the Central Limit Theorem, the distribution of $P_1 - P_2$ is

$$P_1 - P_2 \sim N\left(\pi_1 - \pi_2, \frac{\pi_1(1 - \pi_1)}{n_1} + \frac{\pi_2(1 - \pi_2)}{n_2}\right)$$

Example 1.8
Even though breakfast is an important meal of the day, it was found that 10% of
male adults and 25% of female adults skip breakfast. If a random sample of 50
male adults and a random sample of 50 female adults were selected, what is the
probability that
a) the proportion of the female adults who skip breakfast exceeds the
proportion of the male adults who skip breakfast?
b) the proportion of the female adults who skip breakfast exceeds the
proportion of the male adults who skip breakfast by at least 0.1?
c) the proportions of adults who skip breakfast in the two groups differ by at
most 0.3?

Solution
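
A numerical check for this example can be sketched as follows. It takes $D = P_F - P_M$ (female minus male), assumes independent samples, and applies no continuity correction to the difference; that last choice is an assumption, since the text gives a correction only for a single proportion.

```python
from math import sqrt
from statistics import NormalDist

pi_m, n_m = 0.10, 50  # male adults
pi_f, n_f = 0.25, 50  # female adults

# Distribution of the difference D = P_F - P_M.
variance = pi_f * (1 - pi_f) / n_f + pi_m * (1 - pi_m) / n_m
diff = NormalDist(pi_f - pi_m, sqrt(variance))

p_a = 1 - diff.cdf(0)                 # a) P(D > 0)
p_b = 1 - diff.cdf(0.1)               # b) P(D >= 0.1)
p_c = diff.cdf(0.3) - diff.cdf(-0.3)  # c) P(|D| <= 0.3)

print(f"a) {p_a:.4f}")
print(f"b) {p_b:.4f}")
print(f"c) {p_c:.4f}")
```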

Example 1.9
The proportions of single and married male policyholders who made an
insurance claim over the preceding 3-year period are 0.3 and 0.2 respectively. If
a large automobile company selected a random sample of 50 single and a random
sample of 40 married male policyholders, what is the probability that
a) the difference between the sample proportion of single male policyholders
and the sample proportion of married male policyholders who made an
insurance claim over the preceding 3-year period does not exceed 0.1?
b) the difference between the sample proportion of single male policyholders and
the sample proportion of married male policyholders who made an insurance
claim over the preceding 3-year period is at least 0.15?

Solution
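
A numerical check for this example is sketched below. It takes $D = P_1 - P_2$ (single minus married) as a signed difference, assumes independent samples, and applies no continuity correction; these readings are assumptions, not statements from the text.

```python
from math import sqrt
from statistics import NormalDist

pi_1, n_1 = 0.3, 50  # single male policyholders
pi_2, n_2 = 0.2, 40  # married male policyholders

# Distribution of the difference D = P_1 - P_2.
variance = pi_1 * (1 - pi_1) / n_1 + pi_2 * (1 - pi_2) / n_2
diff = NormalDist(pi_1 - pi_2, sqrt(variance))

p_a = diff.cdf(0.1)       # a) P(D <= 0.1)
p_b = 1 - diff.cdf(0.15)  # b) P(D >= 0.15)

print(f"a) {p_a:.4f}")
print(f"b) {p_b:.4f}")
```

Part a) comes out at one half because 0.1 is exactly the mean of the difference, $\pi_1 - \pi_2$.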

Exercises : Chapter 1

1) A random sample of size 35 is drawn from a normal distribution with mean
30 and variance 25. What is the probability that the
a) sample mean is at least 28?
b) sample mean is at most 32?
c) sample mean is between 29 and 32?

2) A random sample of size 50 is drawn from a binomial distribution with
parameters n = 15 and p = 0.3. What is the probability that
a) the sample mean is larger than 3.9?
b) the sample mean is between 4.1 and 4.4?
c) the sample mean is smaller than 4.0?

3) A random sample of size 40 is drawn from a Poisson distribution with
mean 8. What is the probability that the
a) sample mean is between 9 and 9.5?
b) sample mean is at least 7?
c) sample mean is at most 6?

4) The content of a packet of chocolate drink powder is normally distributed
with mean 250 grams and a standard deviation of 25 grams. If 50 packets
are chosen randomly, what is the probability that the
a) mean weight is less than 245 grams?
b) mean weight is between 245 grams and 256 grams?

5) The average CPA for Faculty of Science students is 2.9 with standard
deviation 1.0. A random sample of 100 students is drawn from this
population. What is the probability that the sample mean is smaller than
2.6?

6) An experiment was performed to compare the wear and tear of school
shoes produced by Factory A and Factory B. It is known that the wear and
tear of shoes produced by Factory A is normally distributed with mean 6
months and standard deviation 1 month, while the wear and tear of shoes
produced by Factory B is normally distributed with mean 5 months and
standard deviation 0.5 month. If a random sample of 40 shoes is taken
from each factory, what is the probability that the
a) difference in the sample means is larger than 0.5 month?
b) sample mean of wear and tear of shoes produced by Factory A is
larger than the sample mean of wear and tear of shoes produced by
Factory B?

7) A builder claims that water heaters are installed in 70% of all homes in
Taman Perling. If a random sample of 50 homes is drawn from Taman
Perling, what is the probability that
a) at most 60% of the homes have water heaters installed?
b) at least 55% of the homes have water heaters installed?
c) between 65% and 75% of the homes have water heaters installed?

8) At a certain college it is estimated that 65% of the students ride a
motorcycle to campus. If a sample of 50 students is randomly taken,
what is the probability that at least 60% of the students in the sample
ride a motorcycle to campus?

9) It is known that a particular machine produces 10% defective electrical
components. A box of 100 items of these electrical components will be
rejected by a buyer if more than 11% of the items are defective. Buyer A
ordered 200 boxes of these electrical components. How many of these
boxes will be rejected by buyer A?

10) The most popular soccer player is David Beckham. It is known that 60% of
the female soccer fans and 55% of the male soccer fans favor David
Beckham over the other soccer players. A sample of 100 female soccer
fans and a sample of 100 male soccer fans were interviewed at random.
What is the probability that the
a) proportion of female supporters of David Beckham is larger than the
proportion of male supporters?
b) difference between the proportion of female supporters of Beckham and
that of male supporters is at most 0.2?

11) It is known that 30% of the residents in Taman Sutera and 35% of the
residents in Bandar Baru UDA subscribe to the “New Straits Times”
newspaper. If a random sample of 50 newspaper readers from Taman Sutera
and a random sample of 50 readers from Bandar Baru UDA are taken, what is
the probability that the proportion of “New Straits Times” subscribers in
Taman Sutera is larger than that in Bandar Baru UDA?

12) A certain type of thread is manufactured with a mean tensile strength of
78.3 kg and a standard deviation of 5.6 kg. Assuming that the strength of
this type of thread is approximately normally distributed, find
a) the probability that the mean strength of a random sample of 10 such
threads falls between 77 kg and 78 kg.
b) the probability that the mean strength is greater than 79 kg.
c) the probability that the mean strength is less than 76 kg.
d) the value of $\bar{x}$ to the right of which 15% of the means computed
from random samples of size 10 would fall.

13) The mean waiting time for drive-through customers before they get their
orders from a fast food restaurant is 20 minutes with standard deviation 5
minutes. If a random sample of 64 customers is observed, what is the
probability that their mean waiting time
a) is between 18 and 19 minutes?
b) is more than 22 minutes?
c) is at most 19 minutes?

14) A random sample of size 100 is taken from a population which follows a
Poisson distribution with mean 10, P(10). What is the probability that
a) the sample mean is larger than 12?
b) the sample mean is between 11 and 13?

15) The mean final examination score for students taking a statistics test is
30 marks with a standard deviation of 6 marks. Assume that the test scores
are approximately normal. Two random samples consisting of 32 and 50
students, respectively, were taken. What is the probability that
a) the mean test scores will differ by more than 3 marks?
b) the mean test score of group 1 is larger than that of group 2?

16) A study was conducted to investigate repayment of government loans
(PTPTN) taken by students in institutes of higher learning. It was found
that only 20% paid their loans promptly. If a random sample of 100
students who took the loan was selected, what is the probability that
a) the proportion of students who do not pay their loans promptly exceeds
0.2?
b) the proportion of students who do not pay their loans promptly is
between 0.15 and 0.25?
c) the proportion of students who do not pay their loans promptly is at
most 0.3?

Chapter 1: Answers

1. a) 0.9911 b) 0.9911 c) 0.8721
2. a) 0.9916 b) 0.2887 c) 0.0233
3. a) 0.0121 b) 0.9875 c) 0.0000
4. a) 0.0793 b) 0.8761
5. 0.0013
6. a) 0.9976 b) 1.0000
7. a) 0.0838 (0.0823) b) 0.9931 c) 0.4648
8. 0.8159 (0.8133)
9. 200(0.3085) ≈ 62 boxes
10. a) 0.7611 (0.7642) b) 0.9838 (0.9842)
11. 0.2981
12. a) 0.1998 b) 0.3446 c) 0.0968 d) 80.13
13. a) 0.0541 b) 0.0007 c) 0.0548
14. a) 0 b) 0.0008
15. a) 0.0272 b) 0.5
16. a) 1 b) 0 c) 0
