Week 10 Analysis of Variance


STATISTICAL ANALYSIS FOR EXPERIMENTAL DESIGN AND DATA PROCESSING

Prof. Mustafa Sait YAZGAN


Assoc. Prof. Alpaslan EKDAL

Analysis of Variance (ANOVA)

Introduction

When more than two groups (treatments or factor levels) are analyzed or compared, it is difficult to conduct the hypothesis tests with the z and t distributions, and their accuracy is also low. In these cases it is more appropriate to conduct the hypothesis test with the F distribution, using the analysis of variance method.


The principle of variance analysis is to test the common (total) variance by dividing it into its sources.

In completely randomized plot trials, the common variance is divided into two parts. The first is the variation due to differences between treatments, and the second is the variation due to uncontrollable random errors.

The effect of the treatments is judged by comparing the variation between treatments with the variation due to random errors. Variance analysis therefore works by comparing variances.


Tests and estimates related to variances are based on the $\chi^2$ distribution.

The F distribution, which is obtained as the ratio of two independent $\chi^2$ variables, each divided by its degrees of freedom, is used for the comparison of two variances.

The F distribution has standard table values for the 0.01, 0.05, 0.10 and 0.25 significance levels. This table should be well understood for variance analysis.
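The ratio behind the F distribution can be written out as below. The symbols $\chi_1^2$, $\chi_2^2$, $\nu_1$ and $\nu_2$ are notation assumed here for illustration; they do not appear in the slides.

```latex
F = \frac{\chi_1^2 / \nu_1}{\chi_2^2 / \nu_2} \sim F_{(\nu_1,\ \nu_2)}
```

In variance analysis, $\nu_1$ and $\nu_2$ play the roles of the treatment and error degrees of freedom, which is why the F table is entered with both values.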


Some research examples in which variance analysis is applied and ‘one-way classified’ data sets are obtained are as follows:

1) A researcher who wants to compare the average yields of 4 different plants could lay out 60 plots on homogeneous agricultural land and grow each plant in 15 randomly assigned plots. The results could then be analyzed with this method.


2) When 5 different medicines are applied with enough repetitions (equal or unequal, minimum 15 repetitions) to mice that have the same disease, the effectiveness of the medicines is compared with the same analysis method.

3) To compare the average durability of batteries manufactured in 4 different factories, data obtained from a sufficient number of measurements are analyzed with a completely randomized design.

Application of Variance Analysis

Variance analysis follows the steps of a general hypothesis test. Before the analysis, the normality of the data, the homogeneity of the treatment variances and the additivity of the model should be checked.

Variance analysis is applied as follows:

1) Hypotheses are established.

$H_0: \mu_1 = \mu_2 = \mu_3 = \dots = \mu_p$ (Treatment averages are the same)
$H_1$: At least two treatment averages are different

2) The critical table value is identified from the F table.

$F_c = F_{\alpha,(\text{treatment df},\ \text{error df})}$


3) The test statistic is calculated.

$$F_h = \frac{\text{Treatment Variance}}{\text{Error Variance}} = \frac{TMS}{EMS}$$

4) The test statistic is compared with the table value to reach a decision.

$F_h > F_c$: $H_0$ is rejected and $H_1$ is accepted. At least two averages are different.
$F_h < F_c$: $H_0$ is not rejected. The averages are not significantly different.


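Common Sum of Squares:

$$CSS = \sum \sum x_{ij}^2 - \frac{x_{..}^2}{n \cdot p}$$

Treatment Sum of Squares:

$$TSS = \sum \frac{x_i^2}{n} - \frac{x_{..}^2}{n \cdot p}$$

Error Sum of Squares:

$$ESS = CSS - TSS$$

Here $x_{ij}$ is observation j of treatment i, $x_i$ is the total of treatment i, and $x_{..}$ is the grand total of all observations.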
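The three sums of squares can be sketched in a few lines of Python. This is a minimal illustration, assuming a balanced layout (the same number of repetitions n in each of the p treatments); the function name `sums_of_squares` and the toy data are hypothetical and not part of the slides.

```python
def sums_of_squares(groups):
    """Return (CSS, TSS, ESS) for p treatments with n repetitions each.

    `groups` is a list of p lists, one list of observations per treatment.
    """
    p = len(groups)
    n = len(groups[0])
    if any(len(g) != n for g in groups):
        raise ValueError("these formulas assume equal repetitions per treatment")

    grand_total = sum(sum(g) for g in groups)      # grand total of all observations
    correction = grand_total ** 2 / (n * p)        # grand total squared / (n.p)

    css = sum(x ** 2 for g in groups for x in g) - correction
    tss = sum(sum(g) ** 2 for g in groups) / n - correction
    ess = css - tss                                # error is what remains
    return css, tss, ess


# Hypothetical toy data: two treatments, three repetitions each.
css, tss, ess = sums_of_squares([[1, 2, 3], [2, 3, 4]])
print(css, tss, ess)
```

The error sum of squares is obtained by subtraction, mirroring the last formula, rather than from the within-treatment deviations directly.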


Degrees of freedom should be calculated for the variances (mean squares).

Common degrees of freedom (cdf) = (n.p) - 1
Treatment degrees of freedom (tdf) = p - 1
Error degrees of freedom (edf) = cdf - tdf

Mean squares, which represent the variance of each source, are calculated as follows:

Treatment Mean Square (TMS) = TSS/tdf
Error Mean Square (EMS) = ESS/edf


After this step, the test statistic is calculated as follows:

$$F_h = \frac{\text{Treatment Variance}}{\text{Error Variance}} = \frac{TMS}{EMS}$$

All the operations are summarized in the Analysis of Variance (ANOVA) table given below.

Variance Sources   Degrees of Freedom   Sum of Squares   Mean of Squares   Fh        Fc        Result
Treatment          tdf                  TSS              TMS               TMS/EMS   F Table
Error              edf                  ESS              EMS
Common             cdf                  CSS
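The rows of the ANOVA table follow mechanically from CSS, TSS, n and p. The sketch below assumes the sums of squares have already been computed; the function name `anova_table` and the input values are hypothetical, chosen only to show the flow.

```python
def anova_table(css, tss, n, p):
    """Return ANOVA rows as (source, df, sum of squares, mean square, Fh)."""
    ess = css - tss          # error sum of squares

    cdf = n * p - 1          # common degrees of freedom
    tdf = p - 1              # treatment degrees of freedom
    edf = cdf - tdf          # error degrees of freedom

    tms = tss / tdf          # treatment mean square
    ems = ess / edf          # error mean square

    return [
        ("Treatment", tdf, tss, tms, tms / ems),
        ("Error", edf, ess, ems, None),
        ("Common", cdf, css, None, None),
    ]


# Hypothetical inputs: two treatments, three repetitions each.
rows = anova_table(css=5.5, tss=1.5, n=3, p=2)
for source, df, ss, ms, fh in rows:
    print(source, df, ss, ms, fh)
```

The Fc and Result columns are left out on purpose: Fc has to be read from the F table for the chosen significance level.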

Example 1

In a battery factory, 6 battery samples from each of 4 different production technologies were tested to determine whether the technologies differ in battery durability. According to the results, is there a significant difference between the durabilities of the batteries at the 99% confidence level?
             Batteries
Sample No    A     B      C     D
1            64    98     75    55
2            72    91     93    66
3            68    97     78    49
4            77    82     71    64
5            56    85     63    70
6            95    77     76    68
Mean         72    88.3   76    62
Sum          432   530    456   372    Total: 1790


The table has 4 means to compare. If the t test were applied to these 4 means, 6 different comparisons would be needed, as given below:

$\bar{x}_1 - \bar{x}_2,\ \bar{x}_2 - \bar{x}_3,\ \bar{x}_1 - \bar{x}_3,\ \bar{x}_2 - \bar{x}_4,\ \bar{x}_1 - \bar{x}_4,\ \bar{x}_3 - \bar{x}_4$

By applying the variance analysis technique, on the other hand, the decision can be reached with a single, simpler test based on the F distribution.
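The count of 6 is the number of unordered pairs that can be formed from 4 means, p(p - 1)/2. A short sketch using the battery labels from the data table (the variable names are illustrative only):

```python
from itertools import combinations

labels = ["A", "B", "C", "D"]            # the four battery technologies

# Every unordered pair of means that a t test would have to compare.
pairs = list(combinations(labels, 2))

print(len(pairs))
for first, second in pairs:
    print(f"{first} vs {second}")
```

The number of pairs grows quickly with the number of treatments, which is one practical reason to prefer a single F test first.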


In the given example, the sum of squares of the common variance is calculated as follows:

$$CSS = \sum \sum x_{ij}^2 - \frac{(\sum x_i)^2}{n \cdot p} = 64^2 + 72^2 + \dots + 68^2 - \frac{1790^2}{24} = 4207.8$$

p: Number of treatments (batteries)
n: Number of repetitions

The common variance has two sources. The first is due to the differences between battery types, and the second is due to the random differences within each type of battery.


The variation due to the differences between battery types is calculated as the sum of squares between treatments (TSS).

$$TSS = \frac{\sum x_i^2}{n} - \frac{(\sum x_i)^2}{n \cdot p} = \frac{432^2 + 530^2 + 456^2 + 372^2}{6} - \frac{1790^2}{24} = 2136.5$$

The variation due to variability within treatments (error) is calculated by subtracting the sum of squares between treatments from the common sum of squares.

$$ESS = CSS - TSS = 4207.8 - 2136.5 = 2071.3$$
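The hand calculation can be cross-checked from the raw data of Example 1. The sketch below simply re-applies the CSS, TSS and ESS formulas to the battery table; the variable names are illustrative, and the printed values are rounded to one decimal as in the slides.

```python
# Durability measurements from the Example 1 data table.
batteries = {
    "A": [64, 72, 68, 77, 56, 95],
    "B": [98, 91, 97, 82, 85, 77],
    "C": [75, 93, 78, 71, 63, 76],
    "D": [55, 66, 49, 64, 70, 68],
}

p = len(batteries)                    # number of treatments
n = len(batteries["A"])               # repetitions per treatment

grand_total = sum(sum(v) for v in batteries.values())
correction = grand_total ** 2 / (n * p)

css = sum(x ** 2 for v in batteries.values() for x in v) - correction
tss = sum(sum(v) ** 2 for v in batteries.values()) / n - correction
ess = css - tss

print(grand_total)
print(round(css, 1), round(tss, 1), round(ess, 1))
```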


Since p = 4 and n = 6, the degrees of freedom are:

cdf: n.p - 1 = 23
tdf: p - 1 = 4 - 1 = 3
edf: cdf - tdf = 23 - 3 = 20

Mean squares:

$$TMS = \frac{TSS}{tdf} = \frac{2136.5}{3} = 712.2$$

$$EMS = \frac{ESS}{edf} = \frac{2071.3}{20} = 103.6$$

The results are summarized in the ANOVA table.

Variance Sources                              Degrees of Freedom   Sum of Squares   Mean of Squares
Between treatments (Between batteries)        3                    2136.5           712.2
Within treatments (Error, Within batteries)   20                   2071.3           103.6
Common                                        23                   4207.8

According to the table, the ratio of the mean square between treatments (712.2) to the mean square within treatments (103.6) suggests that the treatments are probably different. To test this at a specified error level, the F distribution is used.

The hypotheses are established as:

$H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$

$H_1$: The treatment means are different (at least two treatment means are different)

$$F_h = \frac{TMS}{EMS} = \frac{712.17}{103.6} = 6.87$$

F table value for tdf = 3, edf = 20 and $\alpha$ = 0.01: $F_{0.01,(3,\ 20)} = 4.94$


Since the test statistic (6.87) is higher than the critical table value (4.94), $H_0$ is rejected: the mean durabilities of the batteries are significantly different from each other.
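The decision step can be sketched as a direct comparison. The sums of squares, degrees of freedom and the table value 4.94 are taken from the slides; the variable names are illustrative. Because the code divides by the unrounded EMS, the ratio comes out slightly above the 6.87 obtained with the rounded value 103.6, which does not affect the decision.

```python
# Values taken from the Example 1 slides.
tss, ess = 2136.5, 2071.3     # treatment and error sums of squares
tdf, edf = 3, 20              # treatment and error degrees of freedom
f_table = 4.94                # F table value for alpha = 0.01 with (3, 20) df

tms = tss / tdf               # treatment mean square
ems = ess / edf               # error mean square
f_h = tms / ems               # test statistic

reject_h0 = f_h > f_table     # H0 is rejected when Fh exceeds the table value
print(f"Fh = {f_h:.2f}, Fc = {f_table}, reject H0: {reject_h0}")
```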

To identify which treatments (batteries) cause the difference, a multiple comparison method such as LSD (Least Significant Difference), Duncan, Student-Newman-Keuls (SNK), Bonferroni, Tukey, Scheffé or Dunnett could be used. Only the widely used LSD method is explained here.

Least Significant Difference (LSD) Multiple Comparison Method

The LSD method is one of the easiest multiple comparison tests to apply. For this test, the least significant difference is calculated with the following formula:

$$LSD = \sqrt{F_{\alpha,(1,\ edf)} \cdot \frac{2\,EMS}{n}}$$

$LSD > |\bar{x}_1 - \bar{x}_2|$: Means are similar

$LSD < |\bar{x}_1 - \bar{x}_2|$: Means are different
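As background that is not stated in the slides, an F variable with 1 and edf degrees of freedom is the square of a t variable with edf degrees of freedom. Under that standard identity, the LSD can equivalently be written with a two-sided t table value; the symbol $t_{\alpha/2,\ edf}$ is notation assumed here.

```latex
LSD = \sqrt{F_{\alpha,(1,\ edf)} \cdot \frac{2\,EMS}{n}}
    = t_{\alpha/2,\ edf}\,\sqrt{\frac{2\,EMS}{n}},
\qquad \text{since } F_{\alpha,(1,\ edf)} = t_{\alpha/2,\ edf}^{2}
```

For edf = 20 and $\alpha$ = 0.01 this is consistent with the table value 8.10, which is approximately $2.845^2$.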

Example 2

Apply the LSD multiple comparison test to Example 1 ($\alpha$ = 0.01).

First, the LSD value is calculated:

$$LSD = \sqrt{F_{0.01,(1,\ 20)} \cdot \frac{2 \times 103.6}{6}} = \sqrt{8.10 \times 34.53} = 16.72$$

In the second step, the differences between the means to be compared are calculated and compared with the LSD value.


Means Compared   Absolute Difference (AD)   LSD     Result
A-B              |72 - 88.3| = 16.3         16.72   AD < LSD   Insignificant
A-C              |72 - 76| = 4              16.72   AD < LSD   Insignificant
A-D              |72 - 62| = 10             16.72   AD < LSD   Insignificant
B-C              |88.3 - 76| = 12.3         16.72   AD < LSD   Insignificant
B-D              |88.3 - 62| = 26.3         16.72   AD > LSD   Significant*
C-D              |76 - 62| = 14             16.72   AD < LSD   Insignificant

According to the multiple comparison test results, only the B and D batteries differ significantly in durability. No other pair of batteries shows a significant difference.
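The comparison table can be reproduced with a short loop. The sketch below takes EMS = 103.6, n = 6 and the table value F = 8.10 from the slides; the variable names are illustrative, and the LSD is computed as the square root of F times 2·EMS/n, which matches the 16.72 obtained above.

```python
from itertools import combinations
from math import sqrt

# Treatment means from Example 1 (B is 530/6, shown as 88.3 in the slides).
means = {"A": 72.0, "B": 530 / 6, "C": 76.0, "D": 62.0}

ems = 103.6        # error mean square from the ANOVA table
n = 6              # repetitions per treatment
f_table = 8.10     # F table value for alpha = 0.01 with (1, 20) df

lsd = sqrt(f_table * 2 * ems / n)

significant = []
for first, second in combinations(means, 2):
    ad = abs(means[first] - means[second])     # absolute difference of the means
    verdict = "Significant" if ad > lsd else "Insignificant"
    if ad > lsd:
        significant.append((first, second))
    print(f"{first}-{second}: AD = {ad:.1f}, LSD = {lsd:.2f}, {verdict}")
```

Taking the absolute value of each difference is what allows all pairs to be compared against the same positive LSD threshold.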

[Table: F distribution critical values, indexed by tdf and edf]
