
MODULE 3

LARGE AND SMALL SAMPLE TESTS

Standard Error

Let u be a statistic satisfying the conditions of the central limit theorem.


Then

$$t = \frac{u - E(u)}{\sqrt{V(u)}} \sim N(0,1)$$

The standard deviation of the distribution of any statistic is called its standard error. If $u$ is the statistic, $\sqrt{V(u)}$ is its standard error.

➢ Testing of a hypothesis concerning the mean of a population

Let $\mu$ be the mean and $\sigma$ be the S.D. of a population. Consider the hypothesis $H_0 : \mu = \mu_0$.
The alternative hypothesis may be any of the following:
$H_1 : \mu > \mu_0$
$H_1 : \mu < \mu_0$
$H_1 : \mu \neq \mu_0$
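Case 1: $\sigma$ is known
We know that the sample mean $\bar{x}$ is the best test statistic for $\mu$, with

$$E(\bar{x}) = \mu, \qquad V(\bar{x}) = \frac{\sigma^2}{n}$$

So

$$S.E.(\bar{x}) = \frac{\sigma}{\sqrt{n}}$$

We have

$$t = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \frac{(\bar{x} - \mu)\sqrt{n}}{\sigma} \sim N(0,1)$$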
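a) Consider the alternative hypothesis $H_1 : \mu < \mu_0$
The test statistic, computed with $\mu = \mu_0$ as specified by $H_0$, is

$$t = \frac{(\bar{x} - \mu_0)\sqrt{n}}{\sigma}$$

We reject the hypothesis ($H_0$) when $t < -t_\alpha$, where $t_\alpha$ is so determined that $P(t < -t_\alpha) = \alpha$.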
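b) Consider the alternative hypothesis $H_1 : \mu \neq \mu_0$

The test statistic, computed with $\mu = \mu_0$ as specified by $H_0$, is

$$t = \frac{(\bar{x} - \mu_0)\sqrt{n}}{\sigma}$$

We reject the hypothesis ($H_0$) when $|t| \geq t_{\alpha/2}$, where $t_{\alpha/2}$ is so determined that $P(|t| \geq t_{\alpha/2}) = \alpha$.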

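c) Consider the alternative hypothesis $H_1 : \mu > \mu_0$

The test statistic, computed with $\mu = \mu_0$ as specified by $H_0$, is

$$t = \frac{(\bar{x} - \mu_0)\sqrt{n}}{\sigma}$$

We reject the hypothesis ($H_0$) when $t > t_\alpha$, where $t_\alpha$ is so determined that $P(t > t_\alpha) = \alpha$.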
Case 2: $\sigma$ is unknown
If $\sigma$ is unknown, take the sample S.D. $s$ as an approximation to $\sigma$ (the sample being large). So the test statistic is

$$t = \frac{(\bar{x} - \mu_0)\sqrt{n}}{s}$$

Carry out the test as described in Case 1.
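
The three rejection rules above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `one_sample_z_test`, its parameters, and the numbers in the example are assumptions made for this illustration, not part of the notes. The critical values come from the standard library's `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z_test(xbar, mu0, sd, n, alpha=0.05, alternative="two-sided"):
    """Large-sample test of H0: mu = mu0.

    sd is sigma when it is known (Case 1), otherwise the sample S.D. s (Case 2).
    Returns (t, critical value, reject H0?).
    """
    t = (xbar - mu0) * sqrt(n) / sd
    z = NormalDist()  # standard normal N(0, 1)
    if alternative == "less":            # H1: mu < mu0
        crit = z.inv_cdf(1 - alpha)      # t_alpha
        return t, crit, t < -crit
    if alternative == "greater":         # H1: mu > mu0
        crit = z.inv_cdf(1 - alpha)      # t_alpha
        return t, crit, t > crit
    if alternative == "two-sided":       # H1: mu != mu0
        crit = z.inv_cdf(1 - alpha / 2)  # t_{alpha/2}
        return t, crit, abs(t) >= crit
    raise ValueError("alternative must be 'less', 'greater' or 'two-sided'")


# Made-up illustrative numbers, not data from the notes.
t, crit, reject = one_sample_z_test(xbar=52.0, mu0=50.0, sd=8.0, n=64)
print(f"t = {t:.3f}, critical value = {crit:.3f}, reject H0: {reject}")
# prints: t = 2.000, critical value = 1.960, reject H0: True
```

With these numbers $t = (52 - 50)\sqrt{64}/8 = 2$, which exceeds $t_{\alpha/2} \approx 1.96$ at $\alpha = 0.05$, so $H_0$ is rejected against the two-sided alternative.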
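➢ Testing equality of the means of two populations
Case 1: $\sigma$ is known

Let $\mu_1$ and $\mu_2$ be the means and $\sigma_1$ and $\sigma_2$ be the S.D.s of two populations. Let samples of sizes $n_1$ and $n_2$ be taken, and let $\bar{x}_1$ and $\bar{x}_2$ be the sample means and $s_1$ and $s_2$ the sample S.D.s.
Suppose we have to test $H_0 : \mu_1 = \mu_2$.
We know that (writing the second parameter as the standard deviation)

$$\bar{x}_1 \sim N\left(\mu_1, \frac{\sigma_1}{\sqrt{n_1}}\right) \quad \text{and} \quad \bar{x}_2 \sim N\left(\mu_2, \frac{\sigma_2}{\sqrt{n_2}}\right)$$

$$E(\bar{x}_1 - \bar{x}_2) = \mu_1 - \mu_2 = 0 \quad \text{under the hypothesis}$$

$$V(\bar{x}_1 - \bar{x}_2) = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$$

Therefore

$$t = \frac{(\bar{x}_1 - \bar{x}_2) - E(\bar{x}_1 - \bar{x}_2)}{\sqrt{V(\bar{x}_1 - \bar{x}_2)}} \sim N(0,1)$$

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} \sim N(0,1)$$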
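a) Consider the alternative hypothesis $H_1 : \mu_1 < \mu_2$
Test statistic is

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$

We reject the hypothesis ($H_0$) when $t < -t_\alpha$, where $t_\alpha$ is so determined that $P(t < -t_\alpha) = \alpha$.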

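b) Consider the alternative hypothesis $H_1 : \mu_1 \neq \mu_2$

Test statistic is

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$

We reject the hypothesis ($H_0$) when $|t| \geq t_{\alpha/2}$, where $t_{\alpha/2}$ is so determined that $P(|t| \geq t_{\alpha/2}) = \alpha$.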

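c) Consider the alternative hypothesis $H_1 : \mu_1 > \mu_2$

Test statistic is

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$

We reject the hypothesis ($H_0$) when $t > t_\alpha$, where $t_\alpha$ is so determined that $P(t > t_\alpha) = \alpha$.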
Case 2: $\sigma$ is unknown
If $\sigma$ is unknown, take

$$\sigma_1^2 = \sigma_2^2 = \sigma^2 = \frac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2}$$
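
A minimal Python sketch of the two-sample statistic, covering both cases, may help fix the formulas. The function name `two_sample_z_test`, the `pooled` flag, and the example figures are assumptions introduced for illustration only; only the two-sided rule is shown.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z_test(x1, x2, n1, n2, sd1, sd2, alpha=0.05, pooled=False):
    """Two-sided large-sample test of H0: mu1 = mu2.

    pooled=False: sd1, sd2 are the known population S.D.s (Case 1).
    pooled=True:  sd1, sd2 are sample S.D.s and the common variance is
                  estimated as (n1*s1^2 + n2*s2^2) / (n1 + n2) (Case 2).
    """
    if pooled:
        var1 = var2 = (n1 * sd1**2 + n2 * sd2**2) / (n1 + n2)
    else:
        var1, var2 = sd1**2, sd2**2
    t = (x1 - x2) / sqrt(var1 / n1 + var2 / n2)
    crit = NormalDist().inv_cdf(1 - alpha / 2)  # t_{alpha/2}
    return t, crit, abs(t) >= crit


# Made-up illustrative numbers, not data from the notes.
t, crit, reject = two_sample_z_test(x1=67.5, x2=68.0, n1=1000, n2=2000,
                                    sd1=2.5, sd2=2.5)
print(f"t = {t:.3f}, critical value = {crit:.3f}, reject H0: {reject}")
```

Here $|t|$ is a little above 5, far beyond $t_{\alpha/2} \approx 1.96$, so the hypothesis of equal means would be rejected at the 5% level for these made-up figures.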

➢ Testing the hypothesis that a proportion has a specified value ($H_0 : p = p_0$)

Let $p$ denote the proportion of units in the population possessing a characteristic, and let $q = 1 - p$ be the proportion not possessing it.
Consider the hypothesis $H_0 : p = p_0$.
Let a sample of size $n$ be taken and let $x$ be the number of units possessing the characteristic; then $x \sim B(n, p)$.
So, under $H_0$,

$$E(x) = np_0 \quad \text{and} \quad V(x) = np_0 q_0, \qquad \text{where } q_0 = 1 - p_0$$
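Test statistic

$$t = \frac{x - E(x)}{\sqrt{V(x)}} \sim N(0,1)$$

$$t = \frac{x - np_0}{\sqrt{np_0 q_0}} \sim N(0,1)$$

$$t = \frac{\dfrac{x}{n} - p_0}{\dfrac{1}{n}\sqrt{np_0 q_0}} \sim N(0,1)$$

$$t = \frac{\dfrac{x}{n} - p_0}{\sqrt{\dfrac{np_0 q_0}{n^2}}} \sim N(0,1)$$

$$t = \frac{\dfrac{x}{n} - p_0}{\sqrt{\dfrac{p_0 q_0}{n}}} \sim N(0,1)$$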
If the alternative hypothesis is

$H_1 : p \neq p_0$, then reject $H_0$ when $|t| \geq t_{\alpha/2}$

$H_1 : p > p_0$, then reject $H_0$ when $t > t_\alpha$

$H_1 : p < p_0$, then reject $H_0$ when $t < -t_\alpha$
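
As a check on the algebra above, the following sketch computes the statistic both in its count form $(x - np_0)/\sqrt{np_0 q_0}$ and in its proportion form, and confirms that they agree. The function name `proportion_z_test` and the coin-tossing numbers are assumptions made for this illustration; only the two-sided rule is applied.

```python
from math import sqrt
from statistics import NormalDist


def proportion_z_test(x, n, p0, alpha=0.05):
    """Two-sided large-sample test of H0: p = p0 from x successes in n trials."""
    q0 = 1 - p0
    t_counts = (x - n * p0) / sqrt(n * p0 * q0)  # (x - E(x)) / sqrt(V(x))
    t_props = (x / n - p0) / sqrt(p0 * q0 / n)   # same statistic, proportion form
    assert abs(t_counts - t_props) < 1e-9        # the two forms agree
    crit = NormalDist().inv_cdf(1 - alpha / 2)   # t_{alpha/2}
    return t_props, crit, abs(t_props) >= crit


# Made-up example: 216 heads in 400 tosses, testing H0: p = 0.5.
t, crit, reject = proportion_z_test(x=216, n=400, p0=0.5)
print(f"t = {t:.3f}, critical value = {crit:.3f}, reject H0: {reject}")
# prints: t = 1.600, critical value = 1.960, reject H0: False
```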

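➢ Testing equality of proportions in two populations ($H_0 : p_1 = p_2$)

Let samples of sizes $n_1$ and $n_2$ be taken from the two populations and let $x_1$ and $x_2$ be the numbers of units possessing the specified characteristic in the two samples.
Sample proportions:

$$p_1' = \frac{x_1}{n_1} \quad \text{and} \quad p_2' = \frac{x_2}{n_2}$$

Let $p_1 = p_2 = p$ and $q = 1 - p$.
Under the hypothesis,

$$E(p_1') = E\left(\frac{x_1}{n_1}\right) = \frac{1}{n_1} E(x_1) = \frac{1}{n_1}\, n_1 p = p$$

$$E(p_2') = E\left(\frac{x_2}{n_2}\right) = \frac{1}{n_2} E(x_2) = \frac{1}{n_2}\, n_2 p = p$$

$$V(p_1') = V\left(\frac{x_1}{n_1}\right) = \frac{1}{n_1^2} V(x_1) = \frac{1}{n_1^2} \cdot n_1 pq = \frac{pq}{n_1}$$

Similarly

$$V(p_2') = \frac{pq}{n_2}$$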
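Test statistic

$$t = \frac{\left(\dfrac{x_1}{n_1} - \dfrac{x_2}{n_2}\right) - E\left(\dfrac{x_1}{n_1} - \dfrac{x_2}{n_2}\right)}{\sqrt{V\left(\dfrac{x_1}{n_1} - \dfrac{x_2}{n_2}\right)}} \sim N(0,1)$$

$$t = \frac{\dfrac{x_1}{n_1} - \dfrac{x_2}{n_2}}{\sqrt{\dfrac{pq}{n_1} + \dfrac{pq}{n_2}}} \sim N(0,1)$$

$$t = \frac{\dfrac{x_1}{n_1} - \dfrac{x_2}{n_2}}{\sqrt{pq\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} \sim N(0,1)$$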

If $p$ is unknown, then estimate it from the two samples combined:

$$p = \frac{n_1 p_1' + n_2 p_2'}{n_1 + n_2}$$

If the alternative hypothesis is

$H_1 : p_1 \neq p_2$, then reject $H_0$ when $|t| \geq t_{\alpha/2}$

$H_1 : p_1 > p_2$, then reject $H_0$ when $t > t_\alpha$

$H_1 : p_1 < p_2$, then reject $H_0$ when $t < -t_\alpha$
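
The two-sample proportion test with the combined estimate of $p$ can be sketched as follows. The function name `two_proportion_z_test` and the counts in the example are assumptions chosen for illustration, and only the two-sided rule is applied.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(x1, n1, x2, n2, alpha=0.05):
    """Two-sided large-sample test of H0: p1 = p2, with p estimated by pooling."""
    p1_hat, p2_hat = x1 / n1, x2 / n2              # sample proportions p1', p2'
    p = (n1 * p1_hat + n2 * p2_hat) / (n1 + n2)    # equals (x1 + x2) / (n1 + n2)
    q = 1 - p
    t = (p1_hat - p2_hat) / sqrt(p * q * (1 / n1 + 1 / n2))
    crit = NormalDist().inv_cdf(1 - alpha / 2)     # t_{alpha/2}
    return t, p, crit, abs(t) >= crit


# Made-up illustrative counts, not data from the notes.
t, p, crit, reject = two_proportion_z_test(x1=45, n1=300, x2=56, n2=400)
print(f"pooled p = {p:.4f}, t = {t:.3f}, reject H0: {reject}")
```

For these counts the sample proportions are 0.15 and 0.14, the statistic is well below 1.96 in absolute value, and $H_0 : p_1 = p_2$ is not rejected at the 5% level.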

Goodness of fit
Let there be $k$ classes and let $O_i$ (observed frequency) be the number of sample values falling in the $i$-th class. Let $E_i$ be the expected frequency of the $i$-th class. Then

$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$

follows the chi-square distribution with $k - r - 1$ degrees of freedom, where $r$ is the number of independent constraints to be satisfied by the frequencies.
If $\chi^2 > \chi^2_\alpha$ we reject $H_0$,
where $P(\chi^2 > \chi^2_\alpha \mid H_0) = \alpha$.
POINTS TO BE REMEMBERED
✓ The sample size $n$ should be large (more than 50).
✓ The theoretical frequency of each class should be at least 5. If any class has a frequency less than 5, that class should be combined with the adjacent class.
✓ Degrees of freedom:
    ➢ If the hypothesis directly specifies the theoretical frequencies, or the rule for determining them, the d.f. is one less than the number of classes (i.e., $k - 1$ if the number of classes is $k$).
    ➢ If $r$ parameters are estimated for the calculation of the theoretical frequencies, the d.f. is $k - r - 1$, where $k$ is the number of classes.
    ➢ If the classification is in the form of a two-way table (contingency table) with $c$ columns and $r$ rows, and no parameters are estimated, the d.f. is $(c - 1)(r - 1)$.
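
The goodness-of-fit statistic and its degrees of freedom can be computed with a short sketch like the one below. The function name `chi_square_gof`, its `params_estimated` argument, and the die-throwing counts are assumptions made for illustration. The critical value 11.070 is the tabulated upper 5% point of chi-square with 5 d.f.; it is passed in by hand because the Python standard library has no chi-square quantile function.

```python
def chi_square_gof(observed, expected, params_estimated=0):
    """Return (chi-square statistic, degrees of freedom) for k classes.

    params_estimated is r, the number of parameters estimated from the data
    to obtain the expected frequencies; d.f. = k - r - 1.
    """
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    if any(e < 5 for e in expected):
        raise ValueError("combine classes so every expected frequency is >= 5")
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - params_estimated - 1
    return stat, df


# Made-up example: a die thrown 120 times, H0: the die is fair.
observed = [16, 25, 18, 22, 24, 15]
expected = [120 / 6] * 6           # 20 per face, specified directly by H0
stat, df = chi_square_gof(observed, expected)
critical = 11.070                  # tabulated chi-square value, 5 d.f., alpha = 0.05
print(f"chi-square = {stat:.2f}, d.f. = {df}, reject H0: {stat > critical}")
# prints: chi-square = 4.50, d.f. = 5, reject H0: False
```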

Testing of independence of qualitative characteristics

Consider two qualitative characteristics A and B divided into $r$ and $s$ classes respectively (i.e., $A_1, A_2, \ldots, A_r$ and $B_1, B_2, \ldots, B_s$). Such a classification, in which attributes are divided into more than two classes, is known as manifold classification. The various cell frequencies can be expressed in the following table, known as an $r \times s$ contingency table:

         B_1    B_2    ...   B_j    ...   B_s    Total
A_1      f_11   f_12   ...   f_1j   ...   f_1s   f_1.
A_2      f_21   f_22   ...   f_2j   ...   f_2s   f_2.
...
A_i      f_i1   f_i2   ...   f_ij   ...   f_is   f_i.
...
A_r      f_r1   f_r2   ...   f_rj   ...   f_rs   f_r.
Total    f_.1   f_.2   ...   f_.j   ...   f_.s   f_..
$$P(A_i) = \text{probability that a person possesses the attribute } A_i = \frac{f_{i.}}{f_{..}}$$

$$P(B_j) = \text{probability that a person possesses the attribute } B_j = \frac{f_{.j}}{f_{..}}$$

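If the characteristics are independent, the probability that any observation will fall in the cell $A_i B_j$ is

$$P(A_i B_j) = \frac{f_{i.}}{f_{..}} \times \frac{f_{.j}}{f_{..}}$$

So the expected frequency in this cell is

$$f_{..} \times \frac{f_{i.}}{f_{..}} \times \frac{f_{.j}}{f_{..}} = \frac{f_{i.} \times f_{.j}}{f_{..}}$$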

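$$\chi^2 = \sum_i \sum_j \frac{(O_{ij} - E_{ij})^2}{E_{ij}} = \sum_i \sum_j \frac{\left(f_{ij} - \dfrac{f_{i.} \times f_{.j}}{f_{..}}\right)^2}{\dfrac{f_{i.} \times f_{.j}}{f_{..}}}$$

follows the chi-square distribution with $(r - 1)(s - 1)$ d.f.
If $\chi^2 > \chi^2_\alpha$, the hypothesis that the characteristics are independent is to be rejected.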
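
The expected frequencies and the statistic for an $r \times s$ table can be sketched in plain Python as follows. The function name `chi_square_independence` and the $2 \times 3$ table are assumptions made for illustration. The critical value 5.991 is the tabulated upper 5% point of chi-square with 2 d.f., entered by hand.

```python
def chi_square_independence(table):
    """Return (chi-square statistic, d.f.) for an r x s contingency table.

    table is a list of r rows, each a list of s observed frequencies f_ij.
    """
    row_totals = [sum(row) for row in table]        # f_i.
    col_totals = [sum(col) for col in zip(*table)]  # f_.j
    grand_total = sum(row_totals)                   # f_..
    stat = 0.0
    for i, row in enumerate(table):
        for j, f_ij in enumerate(row):
            e_ij = row_totals[i] * col_totals[j] / grand_total
            stat += (f_ij - e_ij) ** 2 / e_ij
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df


# Made-up 2 x 3 table, not data from the notes.
table = [[20, 30, 50],
         [30, 30, 40]]
stat, df = chi_square_independence(table)
critical = 5.991  # tabulated chi-square value, 2 d.f., alpha = 0.05
print(f"chi-square = {stat:.3f}, d.f. = {df}, reject independence: {stat > critical}")
# prints: chi-square = 3.111, d.f. = 2, reject independence: False
```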
Show that in a $2 \times 2$ contingency table where the frequencies are $a, b, c, d$,

$$\chi^2 = \frac{(a + b + c + d)(ad - bc)^2}{(a + b)(c + d)(b + d)(a + c)}$$
The given table is

          A        Not A    Total
B         a        b        a+b
Not B     c        d        c+d
Total     a+c      b+d      n = a+b+c+d

The expected value in the cell (1,1) is $\dfrac{(a+b)(a+c)}{n}$

The expected value in the cell (1,2) is $\dfrac{(a+b)(b+d)}{n}$

The expected value in the cell (2,1) is $\dfrac{(c+d)(a+c)}{n}$

The expected value in the cell (2,2) is $\dfrac{(c+d)(b+d)}{n}$
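$$\chi^2 = \frac{\left(a - \dfrac{(a+b)(a+c)}{n}\right)^2}{\dfrac{(a+b)(a+c)}{n}} + \cdots + \frac{\left(d - \dfrac{(c+d)(b+d)}{n}\right)^2}{\dfrac{(c+d)(b+d)}{n}} \qquad (1)$$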
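$$\begin{aligned}
\frac{\left(a - \dfrac{(a+b)(a+c)}{n}\right)^2}{\dfrac{(a+b)(a+c)}{n}}
&= \frac{\left[na - (a+b)(a+c)\right]^2}{n(a+b)(a+c)} \\
&= \frac{\left[(a+b+c+d)a - (a+b)(a+c)\right]^2}{n(a+b)(a+c)} \\
&= \frac{\left[a^2 + ba + ca + da - (a^2 + ac + ba + bc)\right]^2}{n(a+b)(a+c)} \\
&= \frac{(da - bc)^2}{n(a+b)(a+c)}
\end{aligned}$$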
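Similarly

$$\frac{\left(b - \dfrac{(a+b)(b+d)}{n}\right)^2}{\dfrac{(a+b)(b+d)}{n}} = \frac{(da - bc)^2}{n(a+b)(b+d)}$$

$$\frac{\left(c - \dfrac{(c+d)(a+c)}{n}\right)^2}{\dfrac{(c+d)(a+c)}{n}} = \frac{(da - bc)^2}{n(c+d)(a+c)}$$

$$\frac{\left(d - \dfrac{(c+d)(b+d)}{n}\right)^2}{\dfrac{(c+d)(b+d)}{n}} = \frac{(da - bc)^2}{n(c+d)(b+d)}$$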
So

$$\begin{aligned}
\chi^2 &= \frac{(da - bc)^2}{n(a+b)(a+c)} + \frac{(da - bc)^2}{n(a+b)(b+d)} + \frac{(da - bc)^2}{n(c+d)(a+c)} + \frac{(da - bc)^2}{n(c+d)(b+d)} \\
&= \frac{(da - bc)^2}{n}\left[\frac{1}{(a+b)(a+c)} + \frac{1}{(a+b)(b+d)} + \frac{1}{(c+d)(a+c)} + \frac{1}{(c+d)(b+d)}\right] \\
&= \frac{(da - bc)^2}{n}\left[\frac{(c+d)(b+d) + (c+d)(a+c) + (a+b)(b+d) + (a+b)(a+c)}{(a+b)(a+c)(c+d)(b+d)}\right] \\
&= \frac{(da - bc)^2 \left\{(c+d)\left[(b+d) + (a+c)\right] + (a+b)\left[(b+d) + (a+c)\right]\right\}}{n(a+b)(a+c)(c+d)(b+d)} \\
&= \frac{(da - bc)^2 \left[(c+d)n + (a+b)n\right]}{n(a+b)(a+c)(c+d)(b+d)} \\
&= \frac{(da - bc)^2\, n(a+b+c+d)}{n(a+b)(a+c)(c+d)(b+d)} \\
&= \frac{(da - bc)^2\, n}{(a+b)(a+c)(c+d)(b+d)}
\end{aligned}$$
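
The identity just derived can be checked numerically. The sketch below computes $\chi^2$ for a $2 \times 2$ table both from the general definition $\sum (O - E)^2 / E$ and from the shortcut formula, and compares the two; the function names and the frequencies are assumptions made for illustration.

```python
def chi_square_2x2_general(a, b, c, d):
    """Chi-square from the definition, using the four expected frequencies."""
    n = a + b + c + d
    observed = [a, b, c, d]
    expected = [(a + b) * (a + c) / n, (a + b) * (b + d) / n,
                (c + d) * (a + c) / n, (c + d) * (b + d) / n]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi_square_2x2_shortcut(a, b, c, d):
    """Chi-square from n (ad - bc)^2 / ((a+b)(c+d)(b+d)(a+c))."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (b + d) * (a + c))


# Made-up frequencies, not data from the notes.
general = chi_square_2x2_general(10, 20, 30, 40)
shortcut = chi_square_2x2_shortcut(10, 20, 30, 40)
print(f"general = {general:.6f}, shortcut = {shortcut:.6f}")
# prints: general = 0.793651, shortcut = 0.793651
```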
Testing of Homogeneity

Let there be $k$ sets of observations and let $n_i$ be the number of observations in the $i$-th set. Let each set be classified into $r$ classes based on the value of a variable characteristic. Let $f_{ij}$ be the number of observations in the $i$-th class in the $j$-th set. The contingency table is

                  Sets
Class    1      2      ...   k      Total
1        f_11   f_12   ...   f_1k   f_1.
2        f_21   f_22   ...   f_2k   f_2.
...
r        f_r1   f_r2   ...   f_rk   f_r.
Total    f_.1   f_.2   ...   f_.k   f_..

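We have to examine whether the $k$ sets belong to similar populations with the same proportion of elements in each class (homogeneous).

$$\chi^2 = \sum_i \sum_j \frac{(O_{ij} - E_{ij})^2}{E_{ij}} = \sum_i \sum_j \frac{\left(f_{ij} - \dfrac{f_{i.} \times f_{.j}}{f_{..}}\right)^2}{\dfrac{f_{i.} \times f_{.j}}{f_{..}}}$$

follows the chi-square distribution with $(r - 1)(k - 1)$ d.f.
If $\chi^2 > \chi^2_\alpha$ we reject the hypothesis $H_0$.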
