Hypothesis Testing 7,8ppt


Analysis of Variance – F-Test

Meaning of Analysis of Variance or “F-Test”


The analysis of variance, or F-test, is a technique developed by R.A. Fisher to test the significance of the differences among two or more sample means (or between sample variances) and to infer whether such samples are drawn from populations having the same mean or variance.

The F-ratio is obtained by taking the ratio of unbiased estimates of the population variances, as follows:

$$F = \frac{S_1^2}{S_2^2} = \frac{\sum (X_1 - \bar{X}_1)^2 / (n_1 - 1)}{\sum (X_2 - \bar{X}_2)^2 / (n_2 - 1)}$$

The numerator should be larger than the denominator.

Tutorial Note: To keep the ratio larger than 1, the larger variance is placed in the
numerator. If the computed value of F is greater than the table value of F, we reject H0
and conclude that the two populations do not have the same variance. If the
computed value of F is less than the table value of F, we accept H0 and conclude that
the two populations have the same variance.
Assumptions of Analysis of Variance or “F-test”
The analysis of variance or F-test is based on the following assumptions:
1. Each sample is drawn randomly from a normal population, and the sample statistics tend to reflect the characteristics of the population.
2. The populations from which the samples are drawn have the same means and variances, i.e.

$\mu_1 = \mu_2 = \mu_3 = \cdots = \mu_k$

$\sigma_1^2 = \sigma_2^2 = \sigma_3^2 = \cdots = \sigma_k^2$
Uses of F-Test
F test is used –
For test of hypothesis of equality between two variances.
For test of hypothesis of equality amongst several sample
means.

Properties of F-Test
Range – The value of F ranges from 0 to ∞. F can never be negative, since both terms of the F-ratio are squared values.

Shape – The shape of the F distribution curve depends on the number of degrees of freedom of the first term (numerator) and of the second term (denominator). In general, the F curve is skewed to the right.

Critical Value – For the same probability value, the critical value of F for the lower area is the reciprocal of the critical value of F for the upper area with $\nu_1$ and $\nu_2$ interchanged.
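The reciprocal property of the critical value can be written out as a short worked equation. The notation $F_{\alpha}(\nu_1, \nu_2)$ for the upper-tail critical value (area $\alpha$ to its right) is an assumption introduced here for illustration; the numeric value 3.88 is the 5% table value for (2, 12) degrees of freedom used in the case studies.

```latex
% Lower-tail critical value as the reciprocal of the upper-tail value
% with the degrees of freedom interchanged.
\[
F_{1-\alpha}(\nu_1, \nu_2) = \frac{1}{F_{\alpha}(\nu_2, \nu_1)}
\]
% Example with alpha = 0.05:
\[
F_{0.95}(12, 2) = \frac{1}{F_{0.05}(2, 12)} = \frac{1}{3.88} \approx 0.258
\]
```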
Analysis of Variance
Analysis of variance is based on the ratio of two variances:
(i) the variance between samples, and
(ii) the variance within samples.
Its purpose is to find out the influence of the different forces working on the samples.

It is used in agricultural experiments and in the natural and physical sciences.

Classification Model
There may be one way classification model or two
way classification model.
One-way Classification Model
One way classification model is designed to study the effect of one factor in
an experiment. For example, influence of application of one or more types of
fertilizers may be considered on several pieces of land. It is designed to test
the null hypothesis that the arithmetic means of the population from which
the k samples are randomly drawn are equal to one another.
$H_0: \mu_1 = \mu_2 = \mu_3 = \cdots = \mu_k$
Practical Steps involved in one factor analysis of variance
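Step-1: We set $H_0: \mu_1 = \mu_2 = \cdots = \mu_k$ and $H_1$: at least two of the population means are unequal.

Step-2: Calculate the mean of each sample, i.e. $\bar{X}_1, \bar{X}_2, \ldots, \bar{X}_k$, and the grand average as follows:

$$\bar{\bar{X}} = \frac{N_1\bar{X}_1 + N_2\bar{X}_2 + \cdots + N_k\bar{X}_k}{N_1 + N_2 + \cdots + N_k}$$

When all samples are of the same size, this reduces to the simple average of the sample means, $(\bar{X}_1 + \bar{X}_2 + \cdots + \bar{X}_k)/k$.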
Note: To simplify calculations one may add, subtract, multiply or divide the
given data by any figure. It will not affect the ultimate solution.

Step-3: Calculate the difference between the mean of each sample and the grand average.

Step-4: Square these differences and obtain their total for each sample, i.e. $\sum(\bar{X}_1 - \bar{\bar{X}})^2$, where the squared difference is counted once for every observation in the sample.
Step-5: Calculate the sum of squares between the samples (SSB) as follows:
$$SSB = \sum(\bar{X}_1 - \bar{\bar{X}})^2 + \sum(\bar{X}_2 - \bar{\bar{X}})^2 + \sum(\bar{X}_3 - \bar{\bar{X}})^2 + \cdots$$
Step-6: Calculate the difference between each item in a sample and the mean of that sample.
Step-7: Square these differences and obtain their total for each sample, i.e. $\sum(X - \bar{X})^2$.
Step-8: Calculate the sum of squares within the samples (SSW) as follows:
$$SSW = \sum(X_1 - \bar{X}_1)^2 + \sum(X_2 - \bar{X}_2)^2 + \sum(X_3 - \bar{X}_3)^2 + \cdots$$
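Step-9: Prepare the ANOVA table as follows, where c is the number of samples and n is the total number of observations:

| Source of variation | Sum of squares | Degrees of freedom | Mean squares | Computed value of F | Table value of F |
|---|---|---|---|---|---|
| Between samples | SSB | c – 1 | MSB = SSB/(c – 1) | F = MSB/MSW | |
| Within samples | SSW | n – c | MSW = SSW/(n – c) | | |
| Total | | n – 1 | | | |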
Step-10: Compare the computed value of F with the table value of F for the given degrees of freedom at a given critical level (generally the 5% level of significance) and interpret the result as follows:

| Case | Interpretation |
|---|---|
| (a) The computed value of F is greater than the table value of F | The difference in the variances is significant; it could not have arisen due to fluctuations of random sampling, and hence we reject $H_0$. |
| (b) The computed value of F is less than the table value of F | The difference in the variances is not significant; it could have arisen due to fluctuations of random sampling, and hence we accept $H_0$. |
Case Study-16:
The following table gives the yields on 15
sample fields under three varieties of seeds; (viz.
A, B, C)
YIELDS
A B C
5 3 10
6 5 13
8 2 7
1 10 13
5 0 17

Test at 5% level of significance.


Hint of Case Study-16:

We have to analyse the variability among three varieties of seed, A, B and C, which are the categories of the independent variable.
In this table, yield is the dependent variable.
Variety of seed is the factor on which yield depends. Therefore, it is a one-factor ANOVA.
Discussion of Case Study - 16.
Yields
A B C
5 3 10
6 5 13
8 2 7
1 10 13
5 0 17

$H_0: \mu_1 = \mu_2 = \mu_3$
$H_1$: At least two of the population means are unequal.

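Calculation of the mean of each sample ($\bar{x}$) and the grand average ($\bar{\bar{x}}$):

$$\bar{x}_1 = \frac{5 + 6 + 8 + 1 + 5}{5} = \frac{25}{5} = 5$$
$$\bar{x}_2 = \frac{3 + 5 + 2 + 10 + 0}{5} = \frac{20}{5} = 4$$
$$\bar{x}_3 = \frac{10 + 13 + 7 + 13 + 17}{5} = \frac{60}{5} = 12$$
$$\bar{\bar{x}} = \frac{\bar{x}_1 + \bar{x}_2 + \bar{x}_3}{3} = \frac{5 + 4 + 12}{3} = \frac{21}{3} = 7$$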
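Calculation of Sum of Squares between the Samples (SSB)

| A: $(\bar{x}_1 - \bar{\bar{x}})^2$ | B: $(\bar{x}_2 - \bar{\bar{x}})^2$ | C: $(\bar{x}_3 - \bar{\bar{x}})^2$ |
|---|---|---|
| $(5 - 7)^2 = 4$ | $(4 - 7)^2 = 9$ | $(12 - 7)^2 = 25$ |
| $(5 - 7)^2 = 4$ | $(4 - 7)^2 = 9$ | $(12 - 7)^2 = 25$ |
| $(5 - 7)^2 = 4$ | $(4 - 7)^2 = 9$ | $(12 - 7)^2 = 25$ |
| $(5 - 7)^2 = 4$ | $(4 - 7)^2 = 9$ | $(12 - 7)^2 = 25$ |
| $(5 - 7)^2 = 4$ | $(4 - 7)^2 = 9$ | $(12 - 7)^2 = 25$ |
| Total: 20 | Total: 45 | Total: 125 |

Sum of squares between samples (SSB):

SSB = 20 + 45 + 125 = 190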
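Calculation of Sum of Squares within Samples (SSW/SSE)

| A: $(x_1 - \bar{x}_1)^2$ | B: $(x_2 - \bar{x}_2)^2$ | C: $(x_3 - \bar{x}_3)^2$ |
|---|---|---|
| $(5 - 5)^2 = 0$ | $(3 - 4)^2 = 1$ | $(10 - 12)^2 = 4$ |
| $(6 - 5)^2 = 1$ | $(5 - 4)^2 = 1$ | $(13 - 12)^2 = 1$ |
| $(8 - 5)^2 = 9$ | $(2 - 4)^2 = 4$ | $(7 - 12)^2 = 25$ |
| $(1 - 5)^2 = 16$ | $(10 - 4)^2 = 36$ | $(13 - 12)^2 = 1$ |
| $(5 - 5)^2 = 0$ | $(0 - 4)^2 = 16$ | $(17 - 12)^2 = 25$ |
| Total: 26 | Total: 58 | Total: 56 |

Sum of squares within samples (SSW):

SSW = 26 + 58 + 56 = 140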
ANOVA TABLE
One-Way ANOVA

| Source of variance | Sum of squares | Degrees of freedom | Mean squares | Computed value of F | Table value of F |
|---|---|---|---|---|---|
| SSB/SSC | 190 | $\nu_1$ = c – 1 = 3 – 1 = 2 | MSSB = 190/2 = 95 | F = 95/11.67 = 8.14 | 3.88 |
| SSW/SSR | 140 | $\nu_2$ = n – c = 15 – 3 = 12 | MSSW = 140/12 = 11.67 | | |
| Total = SST | 330 | n – 1 = 15 – 1 = 14 | | | |
$F_{calculated} = 8.14$
$F_{tabular} = F_{0.05}(\nu_1 = 2, \nu_2 = 12) = 3.88$
$F_{calculated} > F_{tabular}$
Since the computed value of F is greater than the table value of F, $H_0$ is rejected.
It is concluded that the average yields of land under the different varieties of seed differ significantly.
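The deviation-from-mean calculation of Case Study-16 can be checked with a short script. This is a minimal sketch in plain Python; the function name `one_way_anova` and the variable names are illustrative choices, not part of the original slides.

```python
# Yields from Case Study-16, one list per seed variety.
samples = {
    "A": [5, 6, 8, 1, 5],
    "B": [3, 5, 2, 10, 0],
    "C": [10, 13, 7, 13, 17],
}


def one_way_anova(groups):
    """Return (SSB, SSW, F) using the deviation-from-mean method."""
    all_values = [x for g in groups for x in g]
    n = len(all_values)              # total number of observations
    c = len(groups)                  # number of samples
    grand_mean = sum(all_values) / n
    ssb = 0.0
    ssw = 0.0
    for g in groups:
        mean = sum(g) / len(g)
        # Between samples: squared gap counted once per observation.
        ssb += len(g) * (mean - grand_mean) ** 2
        # Within samples: squared gap of each item from its sample mean.
        ssw += sum((x - mean) ** 2 for x in g)
    msb = ssb / (c - 1)
    msw = ssw / (n - c)
    return ssb, ssw, msb / msw


ssb, ssw, f_value = one_way_anova(list(samples.values()))
print(ssb, ssw, round(f_value, 2))  # 190.0 140.0 8.14
```

If SciPy happens to be installed, `scipy.stats.f_oneway` applied to the same three lists should return the same F statistic.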
Practical steps involved in the preparation of ANOVA Table (i.e. Analysis of Variance
table) for one factor analysis of variance.
Step-1: We set $H_0: \mu_1 = \mu_2 = \cdots = \mu_k$ and $H_1$: at least two of the population means are unequal.
Step-2: Calculate the sum of the observations of each sample, i.e. $\sum X_1, \sum X_2, \ldots$; then square the observations and obtain their total for each sample, i.e. $\sum X_1^2, \sum X_2^2, \ldots$

Note: To simplify the calculations, one may add, subtract, multiply or divide the given data by any figure. It will not affect the ultimate solution.
Step-3: Calculate the Correction Factor ($T^2/N$) as follows:
$$\frac{T^2}{N} = \frac{(\text{Sum of all the observations of all samples})^2}{\text{Total number of observations of all samples}}$$
Step-4: Calculate the total sum of squares (SST) as follows:
SST = Sum of squares of all the observations – Correction Factor
$$SST = \sum X_1^2 + \sum X_2^2 + \cdots - \frac{T^2}{N}$$
Step-5: Calculate the sum of squares between samples (SSB) as follows:

$$SSB = \frac{(\text{Sum of observations of sample 1})^2}{\text{Number of observations in sample 1}} + \cdots - \text{Correction Factor}$$

$$= \frac{(\sum X_1)^2}{N_1} + \frac{(\sum X_2)^2}{N_2} + \cdots - \frac{T^2}{N}$$
Step-6: Calculate the sum of squares within samples (SSW) as follows:
SSW = SST – SSB

Step-7: Prepare the ANOVA table as follows:
ANOVA Table

| Source of variation | Sum of squares | Degrees of freedom | Mean squares | Variance ratio |
|---|---|---|---|---|
| Between samples | SSB | c – 1 | MSB = SSB/(c – 1) | F = MSB/MSW |
| Within samples | SSW | n – c | MSW = SSW/(n – c) | |
| Total | SST | n – 1 | | |


Step-8: Compare the computed value of F with the table value of F for the given degrees of freedom at a given critical level (generally the 5% level of significance) and interpret the result as follows:

| Case | Interpretation |
|---|---|
| (a) The computed value of F is greater than the table value of F | The difference in the variances is significant; it could not have arisen due to fluctuations of random sampling, and hence we reject $H_0$. |
| (b) The computed value of F is less than the table value of F | The difference in the variances is not significant; it could have arisen due to fluctuations of random sampling, and hence we accept $H_0$. |
Case Study-17:
The following table gives the yields on 15
sample fields under three varieties of seeds; (viz.
A, B, C)
YIELDS
A B C
5 3 10
6 5 13
8 2 7
1 10 13
5 0 17

Test at 5% level of significance.


Discussion of Case Study - 17

Hint:
We have to analyse the variability among three varieties of seed, A, B and C, which are the categories of the independent variable.
In this table, yield is the dependent variable.
Variety of seed is the factor on which yield depends. Therefore, it is a one-factor ANOVA.
Discussion of Case Study - 17.
𝐻0 : 𝜇1 = 𝜇2 = 𝜇3
𝐻1 : At least two of the population means are
unequal.
Calculation of sum of observations of each row and
each column and grand total:
Yields
A B C Row Total
5 3 10 18
6 5 13 24
8 2 7 17
1 10 13 24
5 0 17 22

𝑇1 =25 𝑇2 =20 𝑇3 =60 T=105


T=𝑇1 +𝑇2 +𝑇3 =25 + 20 + 60 = 105

$$C_f = \text{Correction factor} = \frac{T^2}{N} = \frac{(105)^2}{15} = 735$$
Sum of squares between columns
$$SSB = SSC = \frac{T_1^2}{N_1} + \frac{T_2^2}{N_2} + \frac{T_3^2}{N_3} - 735$$

$$= \frac{(25)^2}{5} + \frac{(20)^2}{5} + \frac{(60)^2}{5} - 735$$

$$= (125 + 80 + 720) - 735$$

$$= 190$$
Sum of the Squares of Total (SST)

$$SST = \sum X^2 - C_f$$

$$SST = (5^2 + 6^2 + 8^2 + 1^2 + 5^2 + 3^2 + 5^2 + 2^2 + 10^2 + 0^2 + 10^2 + 13^2 + 7^2 + 13^2 + 17^2) - 735$$

= (25 + 36 + 64 + 1 + 25 + 9 + 25 + 4 + 100 + 0 + 100 + 169 + 49 + 169 + 289) – 735

= 1065 – 735 = 330

SSW = SSE = SST – SSC = 330 – 190 = 140


ANOVA TABLE
One-Way ANOVA

| Source of variance | Sum of squares | Degrees of freedom | Mean squares | Computed value of F | Table value of F |
|---|---|---|---|---|---|
| Between samples | 190 | c – 1 = 3 – 1 = 2 | 190/2 = 95 | 95/11.67 = 8.14 | 3.88 |
| Within samples | 140 | n – c = 15 – 3 = 12 | 140/12 = 11.67 | | |
| Total | 330 | n – 1 = 15 – 1 = 14 | | | |
$F_{calculated} = 8.14$
$F_{tabular} = F_{0.05}(\nu_1 = 2, \nu_2 = 12) = 3.88$
$F_{calculated} > F_{tabular}$
Since the computed value of F is greater than the table value of F, $H_0$ is rejected.
It is concluded that the average yields of land under the different varieties of seed differ significantly.
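The shortcut (correction-factor) steps used in Case Study-17 can be sketched as follows. This is an illustrative plain-Python check; the variable names are assumptions chosen for readability, not taken from the slides.

```python
# Shortcut (correction-factor) method for one-way ANOVA, Case Study-17 data.
groups = [
    [5, 6, 8, 1, 5],       # variety A
    [3, 5, 2, 10, 0],      # variety B
    [10, 13, 7, 13, 17],   # variety C
]

n = sum(len(g) for g in groups)                # N = 15
grand_total = sum(sum(g) for g in groups)      # T = 105
correction_factor = grand_total ** 2 / n       # T^2 / N

# SST: sum of squares of all observations minus the correction factor.
sst = sum(x ** 2 for g in groups for x in g) - correction_factor
# SSB: squared sample totals divided by sample sizes, minus the correction factor.
ssb = sum(sum(g) ** 2 / len(g) for g in groups) - correction_factor
# SSW is obtained by subtraction.
ssw = sst - ssb

c = len(groups)
f_value = (ssb / (c - 1)) / (ssw / (n - c))
print(correction_factor, sst, ssb, ssw, round(f_value, 2))
# 735.0 330.0 190.0 140.0 8.14
```

Both routes give the same sums of squares; the shortcut simply avoids computing each deviation from the mean.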
Two Way Classification Model
Two way classification model is designed to study the effects of two factors
simultaneously in the same experiment.
Practical steps involved in the preparation of ANOVA table (i.e. Analysis of variance
table) for two factor analysis of variance
Step-1: We set $H_0: \mu_1 = \mu_2 = \mu_3$ and $H_1$: at least two of the population means are unequal.
Step-2: Calculate the sum of observations of each row and each column, and their grand total.
Step-3: Calculate the correction factor $T^2/N$ as follows:

$$\frac{T^2}{N} = \frac{(\text{Sum of all the observations of all samples})^2}{\text{Total number of observations of all samples}}$$
Note: N = r × c, where r = number of rows and c = number of columns.

Step-4: Calculate the sum of squares between columns (SSC) as follows:

$$SSC = \frac{\text{Sum of squares of the total of each column}}{\text{Number of items in each column}} - \text{Correction Factor}$$

Step-5: Calculate the sum of squares between rows (SSR) as follows:

$$SSR = \frac{\text{Sum of squares of the total of each row}}{\text{Number of items in each row}} - \text{Correction Factor}$$
Step-6: Calculate the total sum of squares (SST) as follows:
SST = Sum of squares of all the observations – Correction Factor
$$SST = \sum X_1^2 + \sum X_2^2 + \cdots - \frac{T^2}{N}$$
Step-7: Calculate the sum of squares for the residual error (SSE) as follows:
SSE = SST – (SSC + SSR)
Step-8: Prepare the ANOVA table as follows:

| Source of variation | Sum of squares | Degrees of freedom | Mean squares | Variance ratio |
|---|---|---|---|---|
| Between columns | SSC | c – 1 | MSC = SSC/(c – 1) | $F_1$ = Greater variance / Smaller variance (of MSC and MSE) |
| Between rows | SSR | r – 1 | MSR = SSR/(r – 1) | $F_2$ = Greater variance / Smaller variance (of MSR and MSE) |
| Residual error (within) | SSE | (c – 1)(r – 1) | MSE = SSE/[(c – 1)(r – 1)] | |
| Total | SST | rc – 1 | | |
Step-9: Compare the computed value of F with the table value of F for the given degrees of freedom at a given critical level (generally the 5% level of significance) and interpret the result as follows:

| Case | Interpretation |
|---|---|
| (a) The computed value of F is greater than the table value of F | The difference in the variances is significant; it could not have arisen due to fluctuations of random sampling, and hence we reject $H_0$. |
| (b) The computed value of F is less than the table value of F | The difference in the variances is not significant; it could have arisen due to fluctuations of random sampling, and hence we accept $H_0$. |
Case Study-18:
The following table gives per hectare yield for
three varieties of wheat each grown on five
plots:
Per hectare yield (in tons)
Plot of Land Variety of wheat
A B C
X 5 3 10
Y 6 5 13
Z 8 2 7
P 1 10 13
Q 5 0 17
Discussion of Case Study - 18

Hint:
We have to analyse the variability among three varieties of wheat (A, B, C) and the variability among five plots of land (X, Y, Z, P, Q).
In this table, yield is the dependent variable.
Variety of wheat is one factor on which yield depends.
Plot of land is the other factor on which yield depends.

Therefore, it is a two-way ANOVA.


Discussion of Case Study - 18.
𝐻0 : 𝜇1 = 𝜇2 = 𝜇3
𝐻1 : At least two of the population means are
unequal.
Calculation of sum of observations of each row and
each column and grand total:
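Yields by plot of land (rows) and variety of wheat (columns)

| Plot of land | A | B | C | Row total |
|---|---|---|---|---|
| X | 5 | 3 | 10 | $R_1$ = 18 |
| Y | 6 | 5 | 13 | $R_2$ = 24 |
| Z | 8 | 2 | 7 | $R_3$ = 17 |
| P | 1 | 10 | 13 | $R_4$ = 24 |
| Q | 5 | 0 | 17 | $R_5$ = 22 |
| Column total | $T_1 = C_1$ = 25 | $T_2 = C_2$ = 20 | $T_3 = C_3$ = 60 | T = 105 |

$T = T_1 + T_2 + T_3 = 25 + 20 + 60 = 105$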

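$$C_f = \text{Correction factor} = \frac{T^2}{N} = \frac{(105)^2}{15} = 735$$
Sum of squares between columns
$$SSB = SSC = \frac{T_1^2}{N_1} + \frac{T_2^2}{N_2} + \frac{T_3^2}{N_3} - 735$$

$$= \frac{(25)^2}{5} + \frac{(20)^2}{5} + \frac{(60)^2}{5} - 735$$

$$= (125 + 80 + 720) - 735$$

$$= 190$$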
Sum of Squares between Rows (SSR)

$$SSR = \frac{(18)^2}{3} + \frac{(24)^2}{3} + \frac{(17)^2}{3} + \frac{(24)^2}{3} + \frac{(22)^2}{3} - 735$$

$$= (108 + 192 + 96.33 + 192 + 161.33) - 735$$

$$= 14.66$$
Sum of the Squares of Total (SST)

$$SST = \sum X^2 - C_f$$

$$SST = (5^2 + 6^2 + 8^2 + 1^2 + 5^2 + 3^2 + 5^2 + 2^2 + 10^2 + 0^2 + 10^2 + 13^2 + 7^2 + 13^2 + 17^2) - 735$$

= (25 + 36 + 64 + 1 + 25 + 9 + 25 + 4 + 100 + 0 + 100 + 169 + 49 + 169 + 289) – 735

= 1065 – 735 = 330

Sum of Squares of Error (SSE)

SSE = SST – (SSC + SSR)
= 330 – (190 + 14.66)
= 125.34
TWO-WAY ANOVA TABLE

| Source of variance | Sum of squares | Degrees of freedom | Mean sum of squares | Computed value of F | Table value of F |
|---|---|---|---|---|---|
| Between columns | 190 | c – 1 = 3 – 1 = 2 | 190/2 = 95 | 95/15.67 = 6.06 | 4.46 |
| Between rows | 14.66 | r – 1 = 4 | 3.665 | 15.67/3.665 = 4.28 | 6.04 |
| Residual error | 125.34 | (c – 1)(r – 1) = 2 × 4 = 8 | 15.67 | | |
| Total | 330 | n – 1 = 15 – 1 = 14 | | | |

N = c × r
Since the computed value of F (6.06) is greater than the tabular value of F (4.46), $H_0$ is rejected. It is concluded that there is a significant difference among the varieties of wheat.

Since the computed value of F (4.28) is less than the tabular value of F (6.04), $H_0$ is accepted. It is concluded that there is no significant difference among the plots of land.
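The two-way calculation of Case Study-18 can be reproduced with the sketch below, written in plain Python with illustrative variable names. It follows the slides in placing the greater variance in the numerator of each F-ratio. Because it carries full precision, the row figures differ from the hand-rounded slide values in the last digit (SSR ≈ 14.67 rather than 14.66, and the row F-ratio ≈ 4.27 rather than 4.28); the conclusions are unchanged.

```python
# Two-way ANOVA (one observation per cell) for the Case Study-18 yields.
# Rows are plots X, Y, Z, P, Q; columns are wheat varieties A, B, C.
table = [
    [5, 3, 10],
    [6, 5, 13],
    [8, 2, 7],
    [1, 10, 13],
    [5, 0, 17],
]

r = len(table)       # number of rows (plots)
c = len(table[0])    # number of columns (varieties)
n = r * c

grand_total = sum(sum(row) for row in table)
cf = grand_total ** 2 / n                       # correction factor T^2 / N

col_totals = [sum(row[j] for row in table) for j in range(c)]
row_totals = [sum(row) for row in table]

ssc = sum(t ** 2 for t in col_totals) / r - cf  # each column holds r items
ssr = sum(t ** 2 for t in row_totals) / c - cf  # each row holds c items
sst = sum(x ** 2 for row in table for x in row) - cf
sse = sst - (ssc + ssr)

msc = ssc / (c - 1)
msr = ssr / (r - 1)
mse = sse / ((c - 1) * (r - 1))

# Greater variance over smaller variance, as in the slides.
f_columns = max(msc, mse) / min(msc, mse)
f_rows = max(msr, mse) / min(msr, mse)

print(round(ssc, 2), round(ssr, 2), round(sse, 2))
print(round(f_columns, 2), round(f_rows, 2))
```

For the row comparison the residual mean square ends up in the numerator, so the table value is read with (8, 4) degrees of freedom, which is where the 6.04 in the slides comes from.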
NON-PARAMETRIC TESTS

Non-parametric, or distribution-free, tests do not rely on the assumption that the data are drawn from a given probability distribution.

They are the opposite of parametric tests.


Advantages of Non-Parametric Test
• It is a distribution free test.
• It is more robust.
• Non-parametric test can be used for very small
sample size.
• Non-parametric test can be used for attributes.
• Non-parametric test can be used for making
judgment about individuals.
• They are very easy to calculate.
• They can be used with limited information.
Disadvantages of Non-Parametric Test

• The results cannot be generalized, since these tests are not as efficient as parametric tests.

• They cannot be used for more complex problems.

• They ignore a certain amount of information.


Types of Non-Parametric Tests

• Sign test for paired data

• Spearman’s rank correlation test

• Mann-Whitney U test

• Kruskal-Wallis H test


Sign Test for One-Sample Data (when an average is to be compared)
• In the one-sample sign test, a mean or median is given against which a random sample is compared.
• Differences are taken between each observation of the sample and the given average.
• Only the signs are considered for analysis; magnitudes are ignored. Signs may be positive or negative. If a difference is zero, it is ignored.
• The test is then conducted with the null hypothesis that the sample is taken from a population with the given average.
• This hypothesis would be true if the number of positive signs is equal to the number of negative signs.
• The alternative hypothesis is that the sample was not taken from such a population.
• If the null hypothesis is accepted, the alternative hypothesis is rejected, and vice versa.
Sign Test for Paired Data
• In paired sign test there are two samples.
• Differences are taken among the observations of two
samples and only signs are considered for analysis and
magnitude is ignored.
• Signs may be positive or negative. If difference is zero
then it is ignored.
• Then test is conducted with the Null hypothesis that
both samples are taken from same population.
• This hypothesis would be true if the number of
positive signs are equal to the number of negative
signs.
• Alternate hypothesis is that the samples were not
taken from the same population.
• If null hypothesis is accepted then alternative
hypothesis is rejected and vice versa.
CASE STUDY-19
A typing school claims that in a six-week intensive course it can train students to type, on average, at least 60 words per minute. A random sample of 15 graduates is given a typing test, and the median number of words per minute typed by each of these students is given below.
Test the hypothesis that the median typing speed of graduates is at least 60 words per minute.

| Student | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Words per minute | 81 | 76 | 53 | 71 | 66 | 59 | 88 | 73 | 80 | 66 | 58 | 70 | 60 | 56 | 55 |
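Hint of CASE STUDY-19
The median/mean of a random sample is given for comparison.
Sign Test

| Student | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Sign | + | + | - | + | + | - | + | + | + | + | - | + | 0 | - | - |

X = number of '+' signs = 9
$H_0: \mu = 60$
$H_1: \mu \neq 60$
$p = \frac{1}{2}, \quad q = \frac{1}{2}, \quad n = 15$
$E = np = 15 \times \frac{1}{2} = 7.5, \quad X = 9$
$$Z = \frac{X - np}{\sqrt{npq}} = \frac{9 - 7.5}{\sqrt{15 \cdot \frac{1}{2} \cdot \frac{1}{2}}} = \frac{1.5}{\sqrt{15/4}} = \frac{3}{\sqrt{15}} = \frac{3}{3.87} \approx 0.775$$
$Z_{cal} = 0.775 < Z_{tab} = Z_{0.05} = 1.96$
Hence, the null hypothesis is accepted, i.e. Median = 60.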
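The one-sample sign test above can be checked with a few lines of Python. The helper `sign_test_z` is a hypothetical name introduced here. The slides use n = 15, which keeps the tied observation (student M) in the count, while the rule stated earlier says zero differences are ignored; the sketch therefore reports both variants so the effect of that choice is visible.

```python
import math

# Words per minute for students A..O (Case Study-19) and the claimed median.
speeds = [81, 76, 53, 71, 66, 59, 88, 73, 80, 66, 58, 70, 60, 56, 55]
claimed_median = 60


def sign_test_z(plus, n, p=0.5):
    """Normal-approximation Z for `plus` positive signs out of n trials."""
    return (plus - n * p) / math.sqrt(n * p * (1 - p))


plus = sum(1 for s in speeds if s > claimed_median)
minus = sum(1 for s in speeds if s < claimed_median)
ties = len(speeds) - plus - minus

z_as_in_slides = sign_test_z(plus, len(speeds))    # n = 15, tie kept in n
z_ties_dropped = sign_test_z(plus, plus + minus)   # n = 14, tie ignored

print(plus, minus, ties)                                      # 9 5 1
print(round(z_as_in_slides, 3), round(z_ties_dropped, 3))     # 0.775 1.069
```

Either way the statistic stays below 1.96, so the decision on $H_0$ is the same.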
CASE STUDY-20
Use the sign test to see if there is a difference between the number of days until collection of an account receivable before and after a new collection policy. Take $\alpha = 0.05$.
Before 30 28 34 35 40 42 33 38
After 32 29 33 32 37 43 40 41

Before 34 45 28 27 25 41 36
After 37 44 27 33 30 38 36
Hint: CASE STUDY-20

Sign Test
No information about the distribution is given.
$Z = \frac{X - np}{\sqrt{npq}}$ (matched pairs, non-parametric test)
Discussion of CASE STUDY-20

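$H_0$: There is no significant difference.

$H_1$: There is a significant difference.
Calculation of Sign of Difference

| Before | 30 | 28 | 34 | 35 | 40 | 42 | 33 | 38 | 34 | 45 | 28 | 27 | 25 | 41 | 36 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| After | 32 | 29 | 33 | 32 | 37 | 43 | 40 | 41 | 37 | 44 | 27 | 33 | 30 | 38 | 36 |
| Sign | - | - | + | + | + | - | - | - | - | + | + | - | - | + | 0 |

X = number of '+' signs = 6
$$Z = \frac{X - np}{\sqrt{npq}} = \frac{6 - 15 \times \frac{1}{2}}{\sqrt{15 \cdot \frac{1}{2} \cdot \frac{1}{2}}} = \frac{-1.5}{1.936} \approx -0.775$$
$|Z| = 0.775 < Z_{0.05} = 1.96$. $H_0$ is accepted. There is no significant difference.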
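The sign row and the Z value for the paired data can be regenerated with the sketch below. The helper `sign` is an illustrative name, and n is kept at 15 to match the slides, so the tied pair stays in the count.

```python
import math

# Days until collection before and after the new policy (Case Study-20).
before = [30, 28, 34, 35, 40, 42, 33, 38, 34, 45, 28, 27, 25, 41, 36]
after = [32, 29, 33, 32, 37, 43, 40, 41, 37, 44, 27, 33, 30, 38, 36]


def sign(value):
    """Return '+', '-' or '0' for a difference."""
    return "+" if value > 0 else "-" if value < 0 else "0"


# Sign of (before - after) for each matched pair.
signs = [sign(b - a) for b, a in zip(before, after)]
plus = signs.count("+")

n = len(signs)   # n = 15 as in the slides (tied pair kept in n)
p = 0.5
z = (plus - n * p) / math.sqrt(n * p * (1 - p))

print("".join(signs), plus, round(z, 3))  # --+++----++--+0 6 -0.775
```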
CASE STUDY-21

| Rank in Training ($R_x$) | 4 | 6 | 1 | 3 | 9 | 7 | 10 | 2 | 8 | 5 |
|---|---|---|---|---|---|---|---|---|---|---|
| Rank in Field ($R_y$) | 5 | 8 | 3 | 1 | 7 | 6 | 9 | 2 | 10 | 4 |
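Hint of CASE STUDY-21
Ranks are given, so Spearman's rank correlation test is used.
$H_0$: r = 0 (There is no correlation)

$H_1$: r ≠ 0 (There is correlation)

| Rank in Training ($R_x$) | 4 | 6 | 1 | 3 | 9 | 7 | 10 | 2 | 8 | 5 |
|---|---|---|---|---|---|---|---|---|---|---|
| Rank in Field ($R_y$) | 5 | 8 | 3 | 1 | 7 | 6 | 9 | 2 | 10 | 4 |
| $d = R_x - R_y$ | -1 | -2 | -2 | 2 | 2 | 1 | 1 | 0 | -2 | 1 |
| $d^2$ | 1 | 4 | 4 | 4 | 4 | 1 | 1 | 0 | 4 | 1 |

$\sum d^2 = 24$
$$r = 1 - \frac{6\sum d^2}{n(n^2 - 1)} = 1 - \frac{6(24)}{10(100 - 1)} = 1 - \frac{144}{10(99)}$$

$\Rightarrow r = 0.8545 \approx 0.85$
$$S.E.(r) = \frac{1}{\sqrt{n - 1}} = \frac{1}{\sqrt{10 - 1}} = \frac{1}{\sqrt{9}} = \frac{1}{3} = 0.33$$
$$Z = \frac{r - 0}{S.E.(r)} = \frac{0.85 - 0}{0.33} \approx 2.58$$

$Z_{0.05}$, Spearman = 0.6364

As $Z_{cal} > Z_{critical}$,

$H_0$ is rejected.
There is a correlation between training and performance.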
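The rank-correlation arithmetic can be verified with the following sketch, which applies the no-ties formula used in the slides. Variable names are illustrative. At full precision Z comes out near 2.56; the slides obtain 2.58 by dividing the rounded values 0.85 and 0.33.

```python
import math

rank_training = [4, 6, 1, 3, 9, 7, 10, 2, 8, 5]   # R_x
rank_field = [5, 8, 3, 1, 7, 6, 9, 2, 10, 4]      # R_y

n = len(rank_training)
# Squared rank differences d^2 for each pair.
d_squared = [(x - y) ** 2 for x, y in zip(rank_training, rank_field)]
sum_d2 = sum(d_squared)

# Spearman's coefficient (valid when there are no tied ranks).
r = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))
# Standard error and Z as defined in the slides.
se_r = 1 / math.sqrt(n - 1)
z = r / se_r

print(sum_d2, round(r, 4), round(z, 2))  # 24 0.8545 2.56
```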
CASE STUDY-22
A large hospital hires most of its doctors from two major universities. Over the last year, the hospital has been conducting a test for newly recruited doctors to determine which school educates better. Based on the following scores, help the human resource department of the hospital decide whether the universities differ in quality ($\alpha = 0.10$).
Test Score

| University A | 99 | 83 | 89 | 64 | 98 | 85 | 61 | 79 | 91 | 87 | 88 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| University B | 96 | 90 | 97 | 94 | 86 | 95 | 68 | 78 | 93 | 56 | 76 | 84 |
Hint of CASE STUDY-22
• Two samples are independent.
• No Information about distribution
• Nonparametric Test
• U Test
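Given

| University A | 99 | 83 | 89 | 64 | 98 | 85 | 61 | 79 | 91 | 87 | 88 | - |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| University B | 96 | 90 | 97 | 94 | 86 | 95 | 68 | 78 | 93 | 56 | 76 | 84 |

Combined ranking of all 23 scores (rank 1 = highest score):

| Score | 99 | 98 | 97 | 96 | 95 | 94 | 93 | 91 | 90 | 89 | 88 | 87 | 86 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Rank | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 |
| University | A | A | B | B | B | B | B | A | B | A | A | A | B |

| Score | 85 | 84 | 83 | 79 | 78 | 76 | 68 | 64 | 61 | 56 |
|---|---|---|---|---|---|---|---|---|---|---|
| Rank | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 |
| University | A | B | A | A | B | B | B | A | A | B |

Calculation of Sum of Ranks

| University A | 99 | 98 | 91 | 89 | 88 | 87 | 85 | 83 | 79 | 64 | 61 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Rank | 1 | 2 | 8 | 10 | 11 | 12 | 14 | 16 | 17 | 21 | 22 |

$R_1 = 134$

| University B | 97 | 96 | 95 | 94 | 93 | 90 | 86 | 84 | 78 | 76 | 68 | 56 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Rank | 3 | 4 | 5 | 6 | 7 | 9 | 13 | 15 | 18 | 19 | 20 | 23 |

$R_2 = 142$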

$H_0: \mu_1 = \mu_2$ (There is no significant difference)

$H_1: \mu_1 \neq \mu_2$ (There is a significant difference)

$\alpha = 0.1, \quad n_1 = 11, \quad n_2 = 12, \quad R_1 = 134, \quad R_2 = 142$

$$U_1 = n_1 n_2 + \frac{n_1(n_1 + 1)}{2} - R_1 = (11)(12) + \frac{11(11 + 1)}{2} - 134 = 64$$

$$\mu_U = \frac{n_1 n_2}{2} = \frac{11(12)}{2} = 66$$

$$S.E.(U) = \sigma_U = \sqrt{\frac{n_1 n_2 (n_1 + n_2 + 1)}{12}} = \sqrt{\frac{132(24)}{12}} = 16.2481$$

$$Z = \frac{U - \mu_U}{S.E.(U)} = \frac{64 - 66}{16.2481} = \frac{-2}{16.2481} = -0.1231$$

$|Z_{cal}| = 0.12 < Z_{0.1} = 1.645$

$H_0$ is accepted. There is no significant difference.
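The ranking and the U statistic can be reproduced with the sketch below. It ranks the pooled scores from highest to lowest, as the slides do, and relies on the fact that this data set contains no tied scores, so no tie handling is included. Variable names are illustrative.

```python
import math

scores_a = [99, 83, 89, 64, 98, 85, 61, 79, 91, 87, 88]
scores_b = [96, 90, 97, 94, 86, 95, 68, 78, 93, 56, 76, 84]

# Rank all scores together, rank 1 = highest score.
# Assumes no tied scores, which holds for this data set.
combined = sorted(scores_a + scores_b, reverse=True)
rank_of = {score: position for position, score in enumerate(combined, start=1)}

n1, n2 = len(scores_a), len(scores_b)
r1 = sum(rank_of[s] for s in scores_a)   # sum of ranks, University A
r2 = sum(rank_of[s] for s in scores_b)   # sum of ranks, University B

u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1
mean_u = n1 * n2 / 2
se_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
z = (u1 - mean_u) / se_u

print(r1, r2, u1, round(se_u, 4), round(z, 4))
# 134 142 64.0 16.2481 -0.1231
```

Ranking from lowest to highest instead would change U to 68 and flip the sign of Z, but leave its magnitude, and therefore the decision, unchanged.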
