
CONFIDENCE INTERVAL-II

STATISTICAL METHODS IN ECONOMICS-II


LESSON: CONFIDENCE INTERVAL-II
(Population mean, variance and proportion)
LESSON DEVELOPER: ANJANI K. KOCHAK
COLLEGE/DEPARTMENT: DEPARTMENT OF ECONOMICS,
LADY SHRI RAM COLLEGE, UNIVERSITY OF DELHI

Institute of Lifelong Learning, University of Delhi



Table of contents

1. Introduction

2. Single sample confidence intervals

(a) Confidence interval for population mean

(b) Confidence interval for population variance

(c) Confidence interval for population proportion

3. Two sample confidence intervals

(a) Confidence interval for difference of means

(b) Confidence interval for ratio of population variances

(c) Confidence interval for difference of population proportions

4. Practice questions


Learning Objectives
In this lesson you will learn to derive confidence intervals for several
population parameters. For a single sample we will derive the confidence
interval for the population mean, variance and proportion; for two samples,
the confidence interval for the difference of means, the ratio of variances
and the difference of proportions. To do this you will be introduced to three
more probability distributions: the t distribution, the $\chi^2$ distribution
and the F distribution.

Introduction
In the earlier lesson on confidence intervals, we explained the basic logic of constructing a
confidence interval for the population mean from a sample drawn from a normal population
with a known standard deviation σ. We now extend the concept of a confidence interval for
a population parameter to (i) the population mean under different situations, (ii) the
population variance and (iii) the population proportion. We will also extend the analysis to
two samples and derive the confidence interval for (i) the difference of two population
means, (ii) the ratio of two population variances and (iii) the difference of two population
proportions.

Single sample confidence intervals


(a) Confidence interval for population mean

We can distinguish between four different cases depending on the sample size, the nature of
the distribution of the population and whether the population standard deviation is known or
not.

Case 1: Suppose we have a normal population with mean $\mu$ and standard deviation $\sigma$,
and $\sigma$ is known. If we take repeated samples of size n from this population and derive
the sampling distribution of the sample mean $\bar{X}$, we have learnt that $\bar{X}$ is also
normally distributed, with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$. Therefore, if we
subtract $\mu$ from $\bar{X}$ and divide by $\sigma/\sqrt{n}$, we get a standard normal
variable Z, which is distributed as N(0, 1), i.e.

if $X \sim N(\mu, \sigma^2)$

then $\bar{X} \sim N(\mu, \sigma^2/n)$

and $Z = \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$.

Next we need to specify the confidence level or confidence coefficient of the confidence
interval. Confidence levels are expressed as percentages, the common ones being 90%, 95%
and 99%. Confidence coefficients are expressed as probabilities, the corresponding ones
being 0.90, 0.95 and 0.99.

Suppose we wish to construct a 95% confidence interval for the mean of the population
described above. We know from the normal tables that for a standard normal variate Z, 95%
of all observations lie between -1.96 and +1.96, i.e.

$P\left(-1.96 < \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}} < +1.96\right) = 0.95$   (A)

Multiplying the inequalities inside the brackets by $\sigma/\sqrt{n}$, subtracting $\bar{X}$ and
then multiplying by -1 (which reverses the inequalities) yields

$P\left(\bar{X} - 1.96\,\sigma/\sqrt{n} < \mu < \bar{X} + 1.96\,\sigma/\sqrt{n}\right) = 0.95$   (B)

The expression inside the brackets is a random interval for the population mean $\mu$,
because both limits contain the random element $\bar{X}$, which varies from sample to
sample. The lower limit is $\bar{X} - 1.96\,\sigma/\sqrt{n}$ and the upper limit is
$\bar{X} + 1.96\,\sigma/\sqrt{n}$. The interval is centred at the sample mean $\bar{X}$ and
extends $1.96\,\sigma/\sqrt{n}$ to each side of it. The width of the interval is fixed at
$2 \times 1.96\,\sigma/\sqrt{n}$. We can interpret expression (B) as stating that the
probability that the random interval will contain the population mean is 0.95.

Now, for a given sample, if we compute the sample mean $\bar{x}$ and substitute it for
$\bar{X}$ in the random interval (B), the resulting fixed interval is known as the 95%
confidence interval for the population mean.

$\left(\bar{x} - 1.96\,\sigma/\sqrt{n},\ \bar{x} + 1.96\,\sigma/\sqrt{n}\right)$ is a 95% confidence interval for $\mu$.

We can change the confidence level to 99% or 90% or any other level. The 95% confidence
interval was derived from the probability 0.95 in the initial inequality (A). If we want to
construct a 90% confidence interval, the initial probability of 0.95 must be replaced by
0.90. This implies that the z critical value changes from 1.96 to 1.645. A 90% confidence
interval is then obtained by using 1.645 in place of 1.96 in equation (A). The random
interval that we get will therefore look like

$P\left(\bar{X} - 1.645\,\sigma/\sqrt{n} < \mu < \bar{X} + 1.645\,\sigma/\sqrt{n}\right) = 0.90$

We can interpret this expression as stating that the probability that the random interval
will contain the population mean is 0.90, i.e. if we take a large number of samples and
construct a 90% confidence interval from each, approximately 90% of them will contain the
true population mean.

Likewise, a 99% confidence interval can be obtained by using 2.58 instead of 1.96 in
equation (A). Thus we can change the level of confidence by replacing 1.96 with the
appropriate standard normal critical value. How do we find the appropriate standard normal
critical value? The normal tables give critical values $z_{\alpha/2}$ such that the area to the
left of $z_{\alpha/2}$ is $1 - \alpha/2$, as in figure 1. You can check from the normal tables that
the area to the left of 1.96 is 0.975, the area to the left of 2.58 is 0.995, and so on. Since
the curve is symmetrical, the area to the left of $-z_{\alpha/2}$ is $\alpha/2$, and thus the area
between $-z_{\alpha/2}$ and $z_{\alpha/2}$ is $1 - \alpha$ (figure 2). Thus if we have to find a 94%
confidence interval, the value of $\alpha$ is 0.06. We therefore have to find $z_{0.03}$, i.e. the
critical value of z such that the area to its left is 0.97, which is 1.88. The area between
-1.88 and 1.88 is then 0.94.

Thus $\left(\bar{x} - z_{\alpha/2}\,\sigma/\sqrt{n},\ \bar{x} + z_{\alpha/2}\,\sigma/\sqrt{n}\right)$ is a $100(1-\alpha)\%$ confidence interval for the
population mean $\mu$.
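
As a rough illustration of the general formula above, the following Python sketch obtains the critical value z(α/2) from the standard library's `statistics.NormalDist` instead of a printed table and then builds the interval. The function names `z_critical` and `z_interval` and the sample figures (sample mean 50, σ = 10, n = 25) are hypothetical and chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def z_critical(confidence):
    """Return z_{alpha/2}: the point with area 1 - alpha/2 to its left."""
    alpha = 1.0 - confidence
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


def z_interval(xbar, sigma, n, confidence=0.95):
    """100(1 - alpha)% confidence interval for the mean when sigma is known."""
    half_width = z_critical(confidence) * sigma / sqrt(n)
    return xbar - half_width, xbar + half_width


# Critical values for the confidence levels discussed in the text.
for level in (0.90, 0.94, 0.95, 0.99):
    print(level, round(z_critical(level), 3))

# Hypothetical sample: xbar = 50, sigma = 10, n = 25.
low, high = z_interval(50.0, 10.0, 25, confidence=0.95)
print(round(low, 2), round(high, 2))
```

The computed critical values agree with the table values 1.645, 1.88, 1.96 and 2.58 up to rounding.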

Case 2: If, however, the random sample is drawn from a non-normal population with a
known standard deviation $\sigma$, the sampling distribution of $\bar{X}$ is approximately normal
provided the sample size n is sufficiently large (n > 30). This result follows from the
central limit theorem. Thus in this case too

$\left(\bar{x} - 1.96\,\sigma/\sqrt{n},\ \bar{x} + 1.96\,\sigma/\sqrt{n}\right)$ is an approximate 95% confidence interval for the population mean $\mu$, and

$\left(\bar{x} - z_{\alpha/2}\,\sigma/\sqrt{n},\ \bar{x} + z_{\alpha/2}\,\sigma/\sqrt{n}\right)$ is an approximate $100(1-\alpha)\%$ confidence interval for the population
mean $\mu$.

Case 3: A more common situation arises when the population is normal but the population
standard deviation is not known. Since the population standard deviation $\sigma$ is not known,
we replace it by the sample standard deviation S. Now both $\bar{X}$ and S are random
variables which vary from sample to sample. The random variable
$\dfrac{\bar{X} - \mu}{S/\sqrt{n}}$ therefore follows a t distribution with n-1 degrees of freedom.
A t distribution is a bell-shaped, symmetrical distribution centred at 0. It has only one
parameter v, called the degrees of freedom. The t tables contain critical values $t_{\alpha,v}$
for values of $\alpha$ such as 0.10, 0.05, etc. and v = 1, 2, ..., ∞, where $t_{\alpha,v}$ is such
that the area to its right under the curve of the t distribution with v degrees of freedom is
equal to $\alpha$ (Figure 1). Since the distribution is symmetric about 0, the area to the left of
$-t_{\alpha,v}$ is also $\alpha$. Thus the area under the t distribution between $-t_{\alpha,v}$ and
$+t_{\alpha,v}$ is $1 - 2\alpha$ (Figure 2). By similar logic the area between $-t_{\alpha/2,v}$ and
$t_{\alpha/2,v}$ is $1 - \alpha$.

Now since the random variable $\dfrac{\bar{X} - \mu}{S/\sqrt{n}}$ follows a t distribution with n-1
degrees of freedom,

$P\left(-t_{\alpha/2,n-1} < \dfrac{\bar{X} - \mu}{S/\sqrt{n}} < t_{\alpha/2,n-1}\right) = 1 - \alpha$

A little manipulation gives

$P\left(\bar{X} - t_{\alpha/2,n-1}\,S/\sqrt{n} < \mu < \bar{X} + t_{\alpha/2,n-1}\,S/\sqrt{n}\right) = 1 - \alpha$

If we replace $\bar{X}$ and S by the sample mean $\bar{x}$ and standard deviation s computed from the
random sample of size n drawn from the population, we get a $100(1-\alpha)\%$ confidence
interval for the population mean $\mu$:

$\left(\bar{x} - t_{\alpha/2,n-1}\,s/\sqrt{n},\ \bar{x} + t_{\alpha/2,n-1}\,s/\sqrt{n}\right)$

Here the upper bound for the population mean $\mu$ is

$\bar{x} + t_{\alpha/2,n-1}\,s/\sqrt{n}$

and the lower bound is

$\bar{x} - t_{\alpha/2,n-1}\,s/\sqrt{n}$

Case 4: As n becomes large (n > 40), the t distribution obtained in case 3 above can be
approximated by the normal distribution, so that even when the population standard
deviation is unknown the random variable $\dfrac{\bar{X} - \mu}{S/\sqrt{n}}$ can be taken to be
approximately normally distributed. Thus the $100(1-\alpha)\%$ confidence interval for the
population mean is approximately equal to

$\left(\bar{x} - z_{\alpha/2}\,s/\sqrt{n},\ \bar{x} + z_{\alpha/2}\,s/\sqrt{n}\right)$

Confidence interval for population mean

Nature of population | Population standard deviation | Size of sample | Sampling distribution of standardized $\bar{X}$ | Confidence interval
---------------------|-------------------------------|----------------|--------------------------------------------------|--------------------
Normal               | known                         | any            | normal                                           | $[\bar{x} - z_{\alpha/2}\,\sigma/\sqrt{n},\ \bar{x} + z_{\alpha/2}\,\sigma/\sqrt{n}]$
Non-normal           | known                         | large          | normal (CLT)                                     | $[\bar{x} - z_{\alpha/2}\,\sigma/\sqrt{n},\ \bar{x} + z_{\alpha/2}\,\sigma/\sqrt{n}]$
Normal               | unknown                       | small          | t                                                | $[\bar{x} - t_{\alpha/2,n-1}\,s/\sqrt{n},\ \bar{x} + t_{\alpha/2,n-1}\,s/\sqrt{n}]$
Normal               | unknown                       | large          | normal (approx.)                                 | $[\bar{x} - z_{\alpha/2}\,s/\sqrt{n},\ \bar{x} + z_{\alpha/2}\,s/\sqrt{n}]$

Example: A new weight-reducing drug was tried on 16 patients and produced a mean
decrease in weight of 32 kg. If the standard deviation for the 16 patients was 8 kg, find a
90% confidence interval for the true mean decrease in weight of all patients of the type
being treated. Assume the population to be normally distributed. Which case does this fall
in, and why?

Solution: This falls in case 3, since the population is normal, the population standard
deviation is unknown (only the sample standard deviation is given) and the sample size is
small. With $\alpha = 0.10$ and n - 1 = 15 degrees of freedom, the t table gives
$t_{0.05,15} = 1.753$, so the 90% confidence interval for $\mu$ is

(32 - 1.753 × 8/4, 32 + 1.753 × 8/4)

= (28.494, 35.506)
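
The arithmetic of this example can be checked with a short Python sketch. The function name `t_interval` is hypothetical, and the critical value 1.753 is simply taken from the t table as in the solution; it is passed in as an argument rather than computed, since the Python standard library has no t distribution.

```python
from math import sqrt


def t_interval(xbar, s, n, t_crit):
    """Interval xbar +/- t_crit * s / sqrt(n).

    t_crit is t_{alpha/2, n-1}, read from the t table by the caller.
    """
    half_width = t_crit * s / sqrt(n)
    return xbar - half_width, xbar + half_width


# Weight-loss example: n = 16, xbar = 32, s = 8, t_{0.05,15} = 1.753.
low, high = t_interval(32.0, 8.0, 16, t_crit=1.753)
print(round(low, 3), round(high, 3))
```

Replacing `t_crit` by the corresponding z value (1.645) would give the narrower case 4 interval, which is not appropriate here because n is small.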

(b) Confidence interval for population variance

Suppose we have a normal population with mean $\mu$ and variance $\sigma^2$. If a large number of
random samples of size n are drawn from this population and the sample variance $S^2$ is
calculated for each sample, then the random variable $(n-1)S^2/\sigma^2$ follows a chi-square
distribution with n-1 degrees of freedom:

$\dfrac{(n-1)S^2}{\sigma^2} = \dfrac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{\sigma^2} \sim \chi^2_{n-1}$

The chi-square distribution is a continuous probability distribution defined by one parameter
v, the degrees of freedom. This distribution is positively skewed but becomes more
symmetrical as v increases. The chi-square tables give critical values $\chi^2_{\alpha,v}$ such
that the area to the right of the critical value is $\alpha$ (Figure 1). For example,
$\chi^2_{0.025,12}$ in the table is 23.337. This implies that for a $\chi^2$ distribution with 12
degrees of freedom, 2.5% of the total area lies to the right of 23.337.

Therefore the area to the right of $\chi^2_{\alpha/2,n-1}$ for a chi-square distribution with n-1
degrees of freedom is $\alpha/2$. Similarly, the area to the right of $\chi^2_{1-\alpha/2,n-1}$ is
$1 - \alpha/2$. Thus the area between $\chi^2_{1-\alpha/2,n-1}$ and $\chi^2_{\alpha/2,n-1}$ is $1 - \alpha$
(Figure 2), so that

$P\left(\chi^2_{1-\alpha/2,n-1} < \dfrac{(n-1)S^2}{\sigma^2} < \chi^2_{\alpha/2,n-1}\right) = 1 - \alpha$

Taking reciprocals (which reverses the inequalities) and multiplying through by $(n-1)S^2$
yields

$P\left(\dfrac{(n-1)S^2}{\chi^2_{\alpha/2,n-1}} < \sigma^2 < \dfrac{(n-1)S^2}{\chi^2_{1-\alpha/2,n-1}}\right) = 1 - \alpha$   (C)

This is a random interval since $S^2$ is a random variable which varies from sample to sample.
Once we substitute the value of $s^2$ computed from a given sample in (C), we get a
$100(1-\alpha)\%$ confidence interval for $\sigma^2$:

$\left(\dfrac{(n-1)s^2}{\chi^2_{\alpha/2,n-1}},\ \dfrac{(n-1)s^2}{\chi^2_{1-\alpha/2,n-1}}\right)$

Example: The skull lengths of 12 fossil skeletons of an extinct species of bird have
a standard deviation of 0.3 cm. Find a 95% confidence interval for the variance of
the skull length of all birds of this species, if the population is normally
distributed.

Solution: Here n = 12 and $s^2 = 0.09$, and the chi-square table gives
$\chi^2_{0.025,11} = 21.92$ and $\chi^2_{0.975,11} = 3.816$. The 95% confidence interval for the
population variance is

[(11 × 0.09)/21.92, (11 × 0.09)/3.816]

= (0.045, 0.259)
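
The calculation can be reproduced with the following Python sketch. The function name `variance_interval` is hypothetical, and the two chi-square critical values for 11 degrees of freedom (21.92 and 3.816) are the table values used in the solution, supplied as arguments rather than computed.

```python
def variance_interval(s_squared, n, chi2_upper, chi2_lower):
    """Confidence interval for sigma^2 from a normal sample.

    chi2_upper is chi^2_{alpha/2, n-1} and chi2_lower is chi^2_{1-alpha/2, n-1},
    both read from the chi-square table by the caller.
    """
    numerator = (n - 1) * s_squared
    # The larger critical value gives the lower limit, and vice versa.
    return numerator / chi2_upper, numerator / chi2_lower


# Fossil skull example: n = 12, s = 0.3 cm, so s^2 = 0.09.
low, high = variance_interval(0.3 ** 2, 12, chi2_upper=21.92, chi2_lower=3.816)
print(round(low, 3), round(high, 3))
```

Note that the interval is not symmetric about $s^2 = 0.09$, which reflects the positive skew of the chi-square distribution.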

(c) Confidence interval for population proportion

Let p denote the proportion of ‘successes’ in the population, where ‘success’ represents the
presence of an attribute. For example, if we are looking at students admitted to economics
honours in a college, then ‘success’ could be having done economics in class 12. If a large
number of samples of size n are selected and the proportion of successes in each sample,
$\hat{p}$, is calculated, we can derive the sampling distribution of $\hat{p}$. Now $\hat{p} = X/n$,
where X is the number of successes in the sample. If n is small compared to the size of the
population, X follows a binomial distribution with mean np and variance npq, where
q = 1 - p. Moreover, if both np ≥ 10 and n(1-p) ≥ 10, then X has approximately a normal
distribution. Thus $\hat{p}$, being a linear function of X, is also approximately normally
distributed, with mean p and variance p(1-p)/n. Thus

$P\left(-z_{\alpha/2} < \dfrac{\hat{p} - p}{\sqrt{p(1-p)/n}} < z_{\alpha/2}\right) = 1 - \alpha$

To derive the confidence limits for p we replace the inequalities with equalities:

$\dfrac{\hat{p} - p}{\sqrt{p(1-p)/n}} = \pm z_{\alpha/2}$

Squaring both sides and cross-multiplying yields

$n\hat{p}^2 + np^2 - 2np\hat{p} = z_{\alpha/2}^2\,p - z_{\alpha/2}^2\,p^2$

Now collect the terms in $p^2$ and p:

$\left(n + z_{\alpha/2}^2\right)p^2 - \left(2n\hat{p} + z_{\alpha/2}^2\right)p + n\hat{p}^2 = 0$

This is a quadratic equation in p of the form

$ap^2 + bp + c = 0$

with $a = n + z_{\alpha/2}^2$, $b = -\left(2n\hat{p} + z_{\alpha/2}^2\right)$ and $c = n\hat{p}^2$.
To find its roots we use the formula

$p = \dfrac{-b \pm \sqrt{b^2 - 4ac}}{2a}$

which yields

$p = \dfrac{\hat{p} + z_{\alpha/2}^2/2n}{1 + z_{\alpha/2}^2/n} \pm z_{\alpha/2}\,\dfrac{\sqrt{\hat{p}(1-\hat{p})/n + z_{\alpha/2}^2/4n^2}}{1 + z_{\alpha/2}^2/n}$

Let $\tilde{p} = \dfrac{\hat{p} + z_{\alpha/2}^2/2n}{1 + z_{\alpha/2}^2/n}$, so that

$p = \tilde{p} \pm z_{\alpha/2}\,\dfrac{\sqrt{\hat{p}(1-\hat{p})/n + z_{\alpha/2}^2/4n^2}}{1 + z_{\alpha/2}^2/n}$   (D)

Thus (D) represents a $100(1-\alpha)\%$ confidence interval for p, where the two roots are
the two confidence limits. This is often called the score confidence interval for p.

If the sample size n is very large, then $\tilde{p} \approx \hat{p}$ and the $100(1-\alpha)\%$ confidence
interval is approximately equal to

$p = \hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$   (E)

The interval (E), often referred to as the traditional interval, has the advantage of
simplicity but does not work well, even when n = 100, if p is close to 0 or 1. According to
Devore, recent research has shown that the score confidence interval (D) is more accurate
than the traditional confidence interval (E) for all sample sizes and values of p, and he
therefore recommends that the score confidence interval should always be used.

Example: Among 100 fish caught in a lake, 20 were inedible because of pollution. Construct
a traditional 99% confidence interval for the population proportion.

Solution: Here $\hat{p} = 20/100 = 0.2$ and $z_{0.005} = 2.58$, so the 99% traditional
confidence interval is

(0.2 - 2.58 × √(0.16/100), 0.2 + 2.58 × √(0.16/100))

= (0.0968, 0.3032)
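
To compare the traditional interval (E) with the score interval (D) on the same data, the following Python sketch may help. The function names `traditional_interval` and `score_interval` are hypothetical; the critical value 2.58 is the rounded table value used in the solution above.

```python
from math import sqrt


def traditional_interval(p_hat, n, z):
    """Interval (E): p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


def score_interval(p_hat, n, z):
    """Interval (D): the two roots of the quadratic in p."""
    denominator = 1 + z ** 2 / n
    centre = (p_hat + z ** 2 / (2 * n)) / denominator
    half_width = z * sqrt(p_hat * (1 - p_hat) / n + z ** 2 / (4 * n ** 2)) / denominator
    return centre - half_width, centre + half_width


# Fish example: 20 inedible fish out of 100, 99% confidence (z = 2.58).
print([round(v, 4) for v in traditional_interval(0.2, 100, 2.58)])
print([round(v, 4) for v in score_interval(0.2, 100, 2.58)])
```

For these data the score interval is centred slightly above 0.2 and is a little narrower than the traditional one; both limits of the score interval always lie between 0 and 1.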


Two sample confidence intervals


(a) Confidence interval for difference of means

As with the single-sample mean, here too we can distinguish between four different cases
depending on the sample sizes, the nature of the distribution of the populations and whether
the population standard deviations are known or not.

Case 1: In this case two independent random samples of sizes m and n are drawn from
two normal populations with known variances $\sigma_1^2$ and $\sigma_2^2$. The sampling distributions
of the two sample means $\bar{X}$ and $\bar{Y}$ are normal, since the two populations are normal:

$\bar{X} \sim N(\mu_1, \sigma_1^2/m)$

$\bar{Y} \sim N(\mu_2, \sigma_2^2/n)$

Since $(\bar{X} - \bar{Y})$ is a linear combination of two independent normally distributed random
variables, it is also normally distributed. Applying the rules of expectation, its mean is
$\mu_1 - \mu_2$ and its variance is $\sigma_1^2/m + \sigma_2^2/n$:

$E(\bar{X} - \bar{Y}) = E(\bar{X}) - E(\bar{Y}) = \mu_1 - \mu_2$

$V(\bar{X} - \bar{Y}) = V(\bar{X}) + V(\bar{Y}) = \sigma_1^2/m + \sigma_2^2/n$ (since the two samples are independent)

Then

$(\bar{X} - \bar{Y}) \sim N(\mu_1 - \mu_2,\ \sigma_1^2/m + \sigma_2^2/n)$

and

$Z = \dfrac{(\bar{X} - \bar{Y}) - (\mu_1 - \mu_2)}{\sqrt{\sigma_1^2/m + \sigma_2^2/n}} \sim N(0, 1)$.

Thus

$P\left(-z_{\alpha/2} < \dfrac{(\bar{X} - \bar{Y}) - (\mu_1 - \mu_2)}{\sqrt{\sigma_1^2/m + \sigma_2^2/n}} < +z_{\alpha/2}\right) = 1 - \alpha$

Rearranging terms gives

$P\left\{(\bar{X} - \bar{Y}) - z_{\alpha/2}\sqrt{\sigma_1^2/m + \sigma_2^2/n} < (\mu_1 - \mu_2) < (\bar{X} - \bar{Y}) + z_{\alpha/2}\sqrt{\sigma_1^2/m + \sigma_2^2/n}\right\} = 1 - \alpha$

Substituting the mean values $\bar{x}$ and $\bar{y}$ obtained from the samples, we get a
$100(1-\alpha)\%$ confidence interval for the difference of means:

$\left((\bar{x} - \bar{y}) - z_{\alpha/2}\sqrt{\sigma_1^2/m + \sigma_2^2/n},\ (\bar{x} - \bar{y}) + z_{\alpha/2}\sqrt{\sigma_1^2/m + \sigma_2^2/n}\right)$

Case 2: If, however, the random samples are drawn from non-normal populations with
known variances $\sigma_1^2$ and $\sigma_2^2$, the sampling distribution of $(\bar{X} - \bar{Y})$ is approximately
normal provided the sample sizes m and n are sufficiently large (both > 30). This result
follows from the central limit theorem. Thus in this case too

$\left((\bar{x} - \bar{y}) - z_{\alpha/2}\sqrt{\sigma_1^2/m + \sigma_2^2/n},\ (\bar{x} - \bar{y}) + z_{\alpha/2}\sqrt{\sigma_1^2/m + \sigma_2^2/n}\right)$

is an approximate $100(1-\alpha)\%$ confidence interval for the difference between the two
population means $(\mu_1 - \mu_2)$.

Case 3: A more practical situation arises when the populations are normal but the
population variances are not known. Since the population variances are unknown, we replace
them by the sample variances $S_1^2$ and $S_2^2$, assuming that the population standard
deviations are different. Now both the means and the standard deviations vary in value from
one sample to another. In this case the random variable

$T = \dfrac{(\bar{X} - \bar{Y}) - (\mu_1 - \mu_2)}{\sqrt{S_1^2/m + S_2^2/n}}$

follows approximately a t distribution with v degrees of freedom, where v can be calculated
from the data as

$v = \dfrac{\left(s_1^2/m + s_2^2/n\right)^2}{\dfrac{(s_1^2/m)^2}{m-1} + \dfrac{(s_2^2/n)^2}{n-1}}$

Thus

$P\left(-t_{\alpha/2,v} < \dfrac{(\bar{X} - \bar{Y}) - (\mu_1 - \mu_2)}{\sqrt{S_1^2/m + S_2^2/n}} < t_{\alpha/2,v}\right) = 1 - \alpha$

After a little manipulation of terms, and replacing $\bar{X}$ and $\bar{Y}$ by the mean values $\bar{x}$ and
$\bar{y}$ and $S_1^2$ and $S_2^2$ by the values $s_1^2$ and $s_2^2$ obtained from the samples, we get a
$100(1-\alpha)\%$ confidence interval for the difference of means:

$\left\{(\bar{x} - \bar{y}) - t_{\alpha/2,v}\sqrt{s_1^2/m + s_2^2/n},\ (\bar{x} - \bar{y}) + t_{\alpha/2,v}\sqrt{s_1^2/m + s_2^2/n}\right\}$

Case 4: As the sample sizes m and n become large (both > 40), the t distribution approaches
normality, so that even when the population standard deviations are not known the random
variable

$\dfrac{(\bar{X} - \bar{Y}) - (\mu_1 - \mu_2)}{\sqrt{S_1^2/m + S_2^2/n}}$

can be taken to be approximately normally distributed. Thus the $100(1-\alpha)\%$ confidence
interval can be approximated by

$\left((\bar{x} - \bar{y}) - z_{\alpha/2}\sqrt{s_1^2/m + s_2^2/n},\ (\bar{x} - \bar{y}) + z_{\alpha/2}\sqrt{s_1^2/m + s_2^2/n}\right)$

If the population standard deviations are unknown but assumed to be equal, a pooled
variance is calculated from the sample variances as

$s_p^2 = \dfrac{(m-1)s_1^2 + (n-1)s_2^2}{m+n-2}$

Now $s_p^2$ replaces both $s_1^2$ and $s_2^2$ in cases 3 and 4. The degrees of freedom of the t
distribution in case 3 would now be (m + n - 2).
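
The degrees-of-freedom formula of case 3 and the pooled variance can both be sketched in a few lines of Python. The function names `welch_df` and `pooled_variance` and the sample figures (sample variances 4 and 9 with m = 10 and n = 12) are hypothetical and only illustrate the arithmetic.

```python
def welch_df(s1_sq, m, s2_sq, n):
    """Degrees of freedom v for the unequal-variance case (case 3)."""
    a = s1_sq / m
    b = s2_sq / n
    return (a + b) ** 2 / (a ** 2 / (m - 1) + b ** 2 / (n - 1))


def pooled_variance(s1_sq, m, s2_sq, n):
    """Pooled variance s_p^2, used when the population variances are assumed equal."""
    return ((m - 1) * s1_sq + (n - 1) * s2_sq) / (m + n - 2)


# Hypothetical samples: s1^2 = 4 with m = 10, s2^2 = 9 with n = 12.
v = welch_df(4.0, 10, 9.0, 12)
print(round(v, 2))                         # degrees of freedom, unequal variances
print(pooled_variance(4.0, 10, 9.0, 12))   # pooled variance
print(10 + 12 - 2)                         # degrees of freedom, pooled case
```

The computed v is generally not a whole number; a common convention is to round it down to the nearest integer before consulting the t table. It never exceeds the pooled value m + n - 2.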

Confidence interval for difference of population means

Nature of populations | Population standard deviations | Size of samples | Sampling distribution of standardized $(\bar{X} - \bar{Y})$ | Confidence interval
----------------------|--------------------------------|-----------------|--------------------------------------------------------------|--------------------
Normal                | known                          | any             | normal                                                       | $(\bar{x} - \bar{y}) \pm z_{\alpha/2}\sqrt{\sigma_1^2/m + \sigma_2^2/n}$
Non-normal            | known                          | large           | normal (CLT)                                                 | same as case 1 above
Normal                | unknown but different          | small           | t                                                            | $(\bar{x} - \bar{y}) \pm t_{\alpha/2,v}\sqrt{s_1^2/m + s_2^2/n}$
Normal                | unknown but different          | large           | normal (approx.)                                             | $(\bar{x} - \bar{y}) \pm z_{\alpha/2}\sqrt{s_1^2/m + s_2^2/n}$

Example: A random sample of 100 construction workers in Chennai has an average weekly
salary of Rs. 1200 with a standard deviation of Rs. 80. In Delhi a sample of 120 construction
workers has an average weekly salary of Rs. 1000 with a standard deviation of Rs. 100.
Construct a 95% confidence interval for the difference in average weekly salary of all
construction workers in Chennai and Delhi. Assume the population standard deviations are
different.

Solution: Both samples are large, so case 4 applies with $z_{0.025} = 1.96$. The standard
error of the difference is

$\sqrt{6400/100 + 10000/120} = \sqrt{147.333} = 12.138$

so the 95% confidence interval is

(200 - 1.96 × 12.138, 200 + 1.96 × 12.138)

= (200 - 23.791, 200 + 23.791)

= (176.209, 223.791)
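
The construction-worker example can be checked numerically with the sketch below. The function name `diff_means_interval` is hypothetical. Note that the half-width of the interval is the critical value 1.96 multiplied by the standard error 12.138, not the standard error alone.

```python
from math import sqrt


def diff_means_interval(xbar, s1, m, ybar, s2, n, crit):
    """Large-sample interval (xbar - ybar) +/- crit * sqrt(s1^2/m + s2^2/n).

    Returns the lower limit, the upper limit and the standard error.
    """
    std_error = sqrt(s1 ** 2 / m + s2 ** 2 / n)
    diff = xbar - ybar
    return diff - crit * std_error, diff + crit * std_error, std_error


# Chennai: m = 100, mean 1200, s = 80.  Delhi: n = 120, mean 1000, s = 100.
low, high, se = diff_means_interval(1200, 80, 100, 1000, 100, 120, crit=1.96)
print(round(se, 3))
print(round(low, 2), round(high, 2))
```

With these inputs the standard error is about 12.138 and the limits are about 176.21 and 223.79.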

(b) Confidence interval for ratio of population variances

To construct a confidence interval for the ratio of population variances we need to introduce
a new distribution, namely the F distribution. We have learnt that if random samples of size n
are drawn from a normal population with mean $\mu$ and variance $\sigma^2$ and the sample variance
$S^2$ is calculated for each sample, then the random variable $(n-1)S^2/\sigma^2$ follows a
chi-square distribution with n-1 degrees of freedom. It is also known that the random variable
formed as the ratio of two independent chi-squared random variables, each divided by its
degrees of freedom, follows an F distribution. If X is distributed as a chi-squared variable
with $v_1$ degrees of freedom and Y is independently distributed as a chi-squared variable
with $v_2$ degrees of freedom, then

$F = \dfrac{X/v_1}{Y/v_2}$

follows an F distribution with $v_1$ and $v_2$ degrees of freedom.

The F distribution is a continuous probability distribution with two parameters $v_1$ and $v_2$,
where $v_1$ is the degrees of freedom of the numerator and $v_2$ is the degrees of freedom of the
denominator. The F tables give critical values $F_{\alpha,v_1,v_2}$ such that the area to the right
of the critical value is $\alpha$ for given values of $v_1$ and $v_2$ (Figure 1). For practical
purposes F critical values are tabulated for four values of $\alpha$, namely 0.10, 0.05, 0.01
and 0.001.

For example, in the tables $F_{0.05,12,15}$ is 2.48. This implies that for an F distribution with
12 degrees of freedom in the numerator and 15 in the denominator, 5% of the total area
lies to the right of 2.48. The distribution is not symmetrical, but an important property of
the distribution helps us to obtain lower-tail critical values easily. This property is that

$F_{1-\alpha,v_1,v_2} = \dfrac{1}{F_{\alpha,v_2,v_1}}$   (G)

This implies that $F_{0.95,12,15} = 1/F_{0.05,15,12} = 1/2.62 = 0.382$, i.e. for an $F_{12,15}$
distribution, 90% of the area lies between 0.382 and 2.48. Thus in general the area between
$1/F_{\alpha/2,v_2,v_1}$ and $F_{\alpha/2,v_1,v_2}$ is $1 - \alpha$ (Figure 2).

Now if we take a random sample of m observations from a normal population with variance
$\sigma_1^2$ and another independent random sample of size n from another normal population with
variance $\sigma_2^2$, and the sample variances are $S_1^2$ and $S_2^2$ respectively, then the random
variable

$F = \dfrac{\sigma_2^2\,S_1^2}{\sigma_1^2\,S_2^2} = \dfrac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2}$

follows an F distribution with m-1 and n-1 degrees of freedom. This random variable is a
ratio of two chi-squared random variables, each divided by its degrees of freedom, the two
chi-squared random variables being $(m-1)S_1^2/\sigma_1^2$ and $(n-1)S_2^2/\sigma_2^2$. Hence

$P\left(F_{1-\alpha/2,m-1,n-1} < \dfrac{\sigma_2^2\,S_1^2}{\sigma_1^2\,S_2^2} < F_{\alpha/2,m-1,n-1}\right) = 1 - \alpha$

Rearranging terms within the brackets,

$P\left(F_{1-\alpha/2,m-1,n-1}\,\dfrac{S_2^2}{S_1^2} < \dfrac{\sigma_2^2}{\sigma_1^2} < F_{\alpha/2,m-1,n-1}\,\dfrac{S_2^2}{S_1^2}\right) = 1 - \alpha$

$P\left(\dfrac{1}{F_{1-\alpha/2,m-1,n-1}}\,\dfrac{S_1^2}{S_2^2} > \dfrac{\sigma_1^2}{\sigma_2^2} > \dfrac{1}{F_{\alpha/2,m-1,n-1}}\,\dfrac{S_1^2}{S_2^2}\right) = 1 - \alpha$

$P\left(\dfrac{1}{F_{\alpha/2,m-1,n-1}}\,\dfrac{S_1^2}{S_2^2} < \dfrac{\sigma_1^2}{\sigma_2^2} < \dfrac{1}{F_{1-\alpha/2,m-1,n-1}}\,\dfrac{S_1^2}{S_2^2}\right) = 1 - \alpha$

Using property (G),

$P\left(\dfrac{1}{F_{\alpha/2,m-1,n-1}}\,\dfrac{S_1^2}{S_2^2} < \dfrac{\sigma_1^2}{\sigma_2^2} < F_{\alpha/2,n-1,m-1}\,\dfrac{S_1^2}{S_2^2}\right) = 1 - \alpha$

This gives a random interval for the ratio of population variances. Replacing $S_1^2$ and $S_2^2$ by
the sample variances $s_1^2$ and $s_2^2$ calculated from the two samples gives us a $100(1-\alpha)\%$
confidence interval for the ratio of the population variances:

$\left(\dfrac{1}{F_{\alpha/2,m-1,n-1}}\,\dfrac{s_1^2}{s_2^2},\ F_{\alpha/2,n-1,m-1}\,\dfrac{s_1^2}{s_2^2}\right)$

Example: A random sample of 11 articles produced by machine A gave a mean length of
4 cm and a standard deviation of 5 mm, while a random sample of 16 articles produced by
machine B gave a mean length of 4.2 cm and a standard deviation of 4.5 mm. Assuming that
the samples were independent and that the length of the articles was normally distributed for
both machines, find a 90% confidence interval for the ratio of the two population variances.

Solution: Here m = 11, n = 16, $s_1^2 = 25$ and $s_2^2 = 20.25$ (both in mm²).

From the tables $F_{0.05,10,15} = 2.54$ and $F_{0.05,15,10} = 2.85$. Therefore a 90% confidence
interval for $\sigma_1^2/\sigma_2^2$ is

(25/[20.25 × 2.54], [25 × 2.85]/20.25)

= (0.486, 3.519)
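
A short Python sketch of the same calculation follows. The function name `variance_ratio_interval` is hypothetical, and the two F critical values (2.54 and 2.85) are the table values quoted in the solution, passed in as arguments rather than computed.

```python
def variance_ratio_interval(s1_sq, s2_sq, f_m_n, f_n_m):
    """Confidence interval for sigma_1^2 / sigma_2^2.

    f_m_n is F_{alpha/2, m-1, n-1} and f_n_m is F_{alpha/2, n-1, m-1},
    both read from the F table by the caller.
    """
    ratio = s1_sq / s2_sq
    return ratio / f_m_n, ratio * f_n_m


# Machines A and B: s1^2 = 25, s2^2 = 20.25, F_{.05,10,15} = 2.54, F_{.05,15,10} = 2.85.
low, high = variance_ratio_interval(25.0, 20.25, f_m_n=2.54, f_n_m=2.85)
print(round(low, 3), round(high, 3))
```

Since the resulting interval contains 1, these data are consistent with the two machines having equal variances at this confidence level.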

(c) Confidence interval for difference of population proportions

We move now to the difference of two population proportions. Suppose we have two
populations, where $p_1$ is the proportion of ‘successes’ in the first population and $p_2$ is the
proportion of ‘successes’ in the second population. Success refers to the presence of an
attribute, as explained earlier. Suppose a random sample of size m is selected from the first
population and an independent random sample of size n from the second population. Let X
be the number of successes in the first sample, where the sample size m is much smaller than
the size of the first population. Similarly, let Y be the number of successes in the second
sample, where the sample size n is much smaller than the size of the second population. In
this situation both X and Y can be regarded as having binomial distributions. Let
$\hat{p}_1 = X/m$ be the proportion of successes in the first sample and $\hat{p}_2 = Y/n$ be the
proportion of successes in the second sample; then $(\hat{p}_1 - \hat{p}_2)$ is an estimator of
$(p_1 - p_2)$. If m and n are large, then both $\hat{p}_1$ and $\hat{p}_2$ have approximately normal
distributions, and therefore $(\hat{p}_1 - \hat{p}_2)$, being a linear combination of the two, also has an
approximately normal distribution. What about the mean and variance of $(\hat{p}_1 - \hat{p}_2)$? Since

$X \sim B(m, p_1)$ with mean $mp_1$ and variance $mp_1q_1$, and $Y \sim B(n, p_2)$ with mean $np_2$ and variance $np_2q_2$,

$E(\hat{p}_1 - \hat{p}_2) = E\left(\dfrac{X}{m}\right) - E\left(\dfrac{Y}{n}\right) = \dfrac{mp_1}{m} - \dfrac{np_2}{n} = p_1 - p_2$

$V(\hat{p}_1 - \hat{p}_2) = V\left(\dfrac{X}{m}\right) + V\left(\dfrac{Y}{n}\right)$ (since X and Y are independent)

$= \dfrac{V(X)}{m^2} + \dfrac{V(Y)}{n^2} = \dfrac{mp_1q_1}{m^2} + \dfrac{np_2q_2}{n^2} = \dfrac{p_1q_1}{m} + \dfrac{p_2q_2}{n}$, where $q_1 = 1 - p_1$ and $q_2 = 1 - p_2$.

Thus the standardized $(\hat{p}_1 - \hat{p}_2)$ is approximately N(0, 1):

$Z = \dfrac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\dfrac{p_1q_1}{m} + \dfrac{p_2q_2}{n}}} \sim N(0, 1)$ for large m and n,

and

$P\left(-z_{\alpha/2} < \dfrac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\dfrac{p_1q_1}{m} + \dfrac{p_2q_2}{n}}} < z_{\alpha/2}\right) = 1 - \alpha$

Rearranging terms and substituting $\hat{p}_1$, $\hat{p}_2$, $\hat{q}_1$, $\hat{q}_2$ for $p_1$, $p_2$, $q_1$, $q_2$ respectively, we get an
approximate $100(1-\alpha)\%$ confidence interval for $(p_1 - p_2)$:

$\left\{(\hat{p}_1 - \hat{p}_2) - z_{\alpha/2}\sqrt{\dfrac{\hat{p}_1\hat{q}_1}{m} + \dfrac{\hat{p}_2\hat{q}_2}{n}},\ (\hat{p}_1 - \hat{p}_2) + z_{\alpha/2}\sqrt{\dfrac{\hat{p}_1\hat{q}_1}{m} + \dfrac{\hat{p}_2\hat{q}_2}{n}}\right\}$

This interval can be used when $m\hat{p}_1$, $m\hat{q}_1$, $n\hat{p}_2$ and $n\hat{q}_2$ are all ≥ 10.

Small-sample methods for the difference of proportions are beyond the scope of this lesson.

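Example: In a random sample of visitors to a new mall, 180 of 240 men and 150 of 300
women made purchases. Construct a 95% confidence interval for the difference between
the true proportions of men and women who make purchases at this mall.

Solution: Here $\hat{p}_1 = 180/240 = 0.75$ and $\hat{p}_2 = 150/300 = 0.5$. The estimated variance of
$(\hat{p}_1 - \hat{p}_2)$ is

(0.75 × 0.25)/240 + (0.5 × 0.5)/300

= 0.00078 + 0.00083

= 0.00161

so the estimated standard error is √0.00161 ≈ 0.04. The 95% confidence interval is

{(0.75 - 0.5) - 1.96 × 0.04, (0.75 - 0.5) + 1.96 × 0.04}

= (0.172, 0.328)

All conditions are satisfied, i.e. $m\hat{p}_1 = 180$, $m\hat{q}_1 = 60$, $n\hat{p}_2 = 150$ and $n\hat{q}_2 = 150$ are all ≥ 10.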
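
The mall example, including the check of the large-sample conditions, can be sketched in Python as follows. The function names `diff_proportions_interval` and `large_sample_ok` are hypothetical. Because the code does not round the standard error to 0.04, its limits (about 0.171 and 0.329) differ slightly in the third decimal place from the hand calculation.

```python
from math import sqrt


def large_sample_ok(x, m, y, n, threshold=10):
    """Check that m*p1_hat, m*q1_hat, n*p2_hat and n*q2_hat are all >= threshold."""
    return min(x, m - x, y, n - y) >= threshold


def diff_proportions_interval(x, m, y, n, z):
    """Approximate interval for p1 - p2 from x successes in m and y successes in n."""
    p1_hat, p2_hat = x / m, y / n
    std_error = sqrt(p1_hat * (1 - p1_hat) / m + p2_hat * (1 - p2_hat) / n)
    diff = p1_hat - p2_hat
    return diff - z * std_error, diff + z * std_error


# Mall example: 180 of 240 men and 150 of 300 women made purchases.
print(large_sample_ok(180, 240, 150, 300))
low, high = diff_proportions_interval(180, 240, 150, 300, z=1.96)
print(round(low, 3), round(high, 3))
```

The interval lies entirely above zero, which suggests that a higher proportion of men than women make purchases at this mall.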


References

1. J. E. Freund, Mathematical Statistics

2. J. L. Devore, Probability and Statistics for Engineering and the Sciences

Practice questions

Q 1. A catering company wanted to know the percentage of large companies that provide
lunch services to their employees. A sample of 240 such companies showed that 96 of them
provided such facilities. Construct a 97% confidence interval for the true percentage of all
large companies that provide lunch facilities to their employees.

Q 2. Construct a 95% confidence interval for the difference between the mean lifetimes of
two kinds of drying machines, given that a random sample of 40 machines of the first kind
gave an average life of 420 hours and 50 machines of the second kind gave an average life
of 405 hours. The population standard deviations are 20 and 25 hours respectively.

Q 3. A consumer agency took a random sample of 25 washing machines of a certain brand
and tested them. They found that the variance of the lives of these machines was 5200 square
hours. Construct a 99% confidence interval for the variance and for the standard deviation of
the lives of all washing machines of that brand. What assumption about the population is
necessary to construct this confidence interval?

Q 4. The breaking strength of copper wires used in air conditioning is normally distributed
with standard deviation 2 psi. A random sample of nine wires is tested, and the average
breaking strength is found to be 100 psi. Find a 95% confidence interval for the true mean
breaking strength of the wire.

Q 5. A sample of 25 customers was randomly selected from Reliance Store and a purchase
index was calculated for them. It was found that the mean purchase index was 7.6 with a
standard deviation of 0.75. Another sample of 28 customers selected from Sabka Bazar
produced a mean purchase index of 8.1 with a standard deviation of 0.59. Assuming that the
customer purchase index for each store is normally distributed, construct a 95%
confidence interval for the difference in the mean purchase indexes of all customers of the
two stores.

Q 6. A random sample of 25 observations selected from a normally distributed population
produced a sample variance of 35. Construct a confidence interval for $\sigma^2$ for each of the
following confidence levels and comment on what happens to the confidence interval when
the confidence level decreases: (a) 0.99 (b) 0.95 (c) 0.90.

Q 7. A random sample of 500 adult residents of district A found that 380 were in favour of
increasing the highway speed limit to 100 kmph, while another sample of 400 adult residents
of district B found that 260 were in favour of the increased speed limit. Construct a 90%
confidence interval for the true difference in the proportions of adult residents favouring
the increased speed limit.

Q 8. A random sample of 16 glasses was selected and the wall thickness of each was measured.
The sample mean was 4.25 mm and the standard deviation was 0.07 mm. Find a 95%
confidence interval for the true mean wall thickness.

Q 9. A car manufacturer wanted to estimate the mileage of its latest model. A sample of
49 cars of that model was randomly selected and their average mileage was found to be
19.6 km. The sample also gave a standard deviation of 0.7 km. Find a 90% confidence
interval for the true average mileage of this model. What assumptions, if any, are necessary
to find this confidence interval?

Q 10. Thirteen mango trees of one variety were selected and it was found that they had a
mean height of 13.5 feet with a standard deviation of 1.5 feet. Sixteen randomly selected
mango trees of another variety gave a mean height of 13 feet with a standard deviation of
1.2 feet. Assuming that the random samples were selected from normal populations,
construct a 90% confidence interval for the ratio of the variances of the two populations
sampled.
