
ANOVA

1. One-way ANOVA
The one-way analysis of variance (ANOVA) is used to determine whether there are any
statistically significant differences between the means of three or more independent
(unrelated) groups.
Suppose we want to compare the means of k populations based on independent random samples of sizes n1, n2, …, nk drawn from those k populations. Three assumptions must hold about the population groups for the ANOVA analysis to be valid:
- The populations are normally distributed
- The population variances are equal (homogeneity of variance)
- The sampled observations are independent
If the population means are denoted μ1, μ2, …, μk, then when the above assumptions are met, the one-way ANOVA model is stated as the following hypothesis test:
H0: μ1 = μ2 = … = μk
The hypothesis H0 assumes that the means of the k populations are equal (in terms of the related research, it assumes that the causal factor has no impact on the problem we are studying). The alternative hypothesis is:
H1: At least one pair of population means differs
The first two assumptions are illustrated in the figure below: the three populations are normally distributed with roughly the same dispersion, but their three different locations give three different means. Of course, if you actually had the full values of the three populations and their distributions as shown, you would need nothing more than to conclude that H0 is rejected, i.e. that the three populations have different means. In practice, however, you only have a representative sample of observations, so to test this hypothesis we perform the following steps:
Step 1: Calculate the sample mean of the groups (as representative of the
populations)
We first calculate the sample means from the observations of the k independent random samples (denoted x̄_1, x̄_2, …, x̄_k) and the mean of all k samples combined (denoted x̄). The data have the general form:

Population:    1        2       …       k
             x_11     x_21      …     x_k1
             x_12     x_22      …     x_k2      (x_ij)
              …        …        …      …
             x_1n1    x_2n2     …     x_kn_k

Calculate the sample mean of each group x̄_1, x̄_2, …, x̄_k according to the formula:

x̄_i = ( Σ_{j=1}^{n_i} x_ij ) / n_i ,   i = 1, 2, …, k

And the mean of the k samples (the average of the entire survey sample):

x̄ = ( Σ_{i=1}^{k} n_i x̄_i ) / ( Σ_{i=1}^{k} n_i )
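As a quick sketch, the two formulas above can be checked in Python with NumPy; the group data below are made up purely for illustration:

```python
import numpy as np

# Made-up observations from k = 3 groups with unequal sizes n_i (illustrative data)
samples = [np.array([4.0, 5.0, 6.0]),
           np.array([7.0, 8.0, 9.0, 8.0]),
           np.array([5.0, 6.0])]

# x-bar_i = (sum over j of x_ij) / n_i : the sample mean of each group
group_means = [float(s.mean()) for s in samples]

# x-bar = (sum over i of n_i * x-bar_i) / (sum over i of n_i) : mean of the whole sample
n = sum(len(s) for s in samples)
grand_mean = sum(len(s) * m for s, m in zip(samples, group_means)) / n

# The weighted form is identical to pooling all observations and averaging
assert np.isclose(grand_mean, np.concatenate(samples).mean())
print(group_means, round(grand_mean, 4))   # → [5.0, 8.0, 5.5] 6.4444
```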

Step 2: Calculate the sums of squares.

We need the sum of squared deviations within groups (SSW) and the sum of squared deviations between groups (SSG).
The Sum of Squares Within Groups (SSW) is calculated by adding the squared deviations between the observations and the sample mean within each group, and then summing the results over all groups. SSW reflects the variation in the outcome due to the influence of factors other than the causal factor under study (the factor used to distinguish the compared populations/groups).
The sum of squared deviations within each group is calculated using the formula:

Group 1:  SS_1 = Σ_{j=1}^{n_1} (x_1j − x̄_1)²

Group 2:  SS_2 = Σ_{j=1}^{n_2} (x_2j − x̄_2)²

Similarly for the remaining groups up to the k-th group, giving SS_k. The sum of squared deviations within groups is then:

SSW = SS_1 + SS_2 + … + SS_k

or, in general form:

SSW = Σ_{i=1}^{k} Σ_{j=1}^{n_i} (x_ij − x̄_i)²

The Sum of Squared deviations between Groups (SSG) is calculated by adding the squared deviations between the sample mean of each of the k groups and the overall mean, each deviation weighted by the number of observations in the corresponding group. SSG reflects the variation in the outcome due to the influence of the causal factor.

SSG = Σ_{i=1}^{k} n_i (x̄_i − x̄)²

The Total Sum of Squares (SST) is calculated by adding the squared deviations between each observed value in the whole research sample (x_ij) and the overall mean (x̄). SST reflects the variation in the outcome due to the influence of all causes.

SST = Σ_{i=1}^{k} Σ_{j=1}^{n_i} (x_ij − x̄)²

It can be shown that the total sum of squared deviations equals the Sum of Squares Between Groups plus the Sum of Squares Within Groups:

SST = SSW + SSG

Thus SST, the total variation of the outcome, is decomposed into two parts: the variation caused by the factor being studied (SSG) and the remaining variation caused by other factors not studied here (SSW).
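The decomposition SST = SSW + SSG can be verified numerically. A minimal sketch in Python with NumPy, using made-up data:

```python
import numpy as np

# Made-up data: three groups of observations (illustrative values only)
groups = [np.array([5.0, 6.0, 7.0]),
          np.array([8.0, 9.0, 10.0, 11.0]),
          np.array([4.0, 5.0, 6.0])]

pooled = np.concatenate(groups)
grand_mean = pooled.mean()

# SSW: squared deviations of observations from their own group mean
ssw = sum(float(((g - g.mean()) ** 2).sum()) for g in groups)

# SSG: squared deviations of group means from the grand mean, weighted by n_i
ssg = sum(len(g) * float((g.mean() - grand_mean) ** 2) for g in groups)

# SST: squared deviations of every observation from the grand mean
sst = float(((pooled - grand_mean) ** 2).sum())

assert np.isclose(sst, ssw + ssg)                       # SST = SSW + SSG
print(round(ssw, 6), round(ssg, 6), round(sst, 6))      # → 9.0 39.9 48.9
```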

Step 3: Calculate the variances (the averages of the squared deviations).
The variances are obtained by dividing the sums of squares by their corresponding degrees of freedom.
The Mean Square Within groups (MSW) is the Sum of Squares Within Groups (SSW) divided by its degrees of freedom, n − k (n is the total number of observations, k is the number of compared groups). MSW estimates the part of the variation in the outcome caused by other factors (the unexplained part).

MSW = SSW / (n − k)

The Mean Square Between Groups (MSG) is the Sum of Squares Between Groups (SSG) divided by its degrees of freedom, k − 1. MSG estimates the part of the variation in the outcome caused by the causal factor under study (the explained part).

MSG = SSG / (k − 1)
Step 4: Hypothesis testing
The hypothesis that the k population means are equal is tested on the basis of the ratio of the two variances: the mean square between groups (MSG) and the mean square within groups (MSW). This ratio is called the F ratio because it follows the Fisher–Snedecor (F) distribution with k − 1 degrees of freedom in the numerator and n − k in the denominator:

F = MSG / MSW

We reject the null hypothesis H0, which assumes that the means of the k populations are equal, when:

F > F_{(k−1; n−k), α}

where F_{(k−1; n−k), α} is the critical value looked up from lookup table number 8, with the column given by k − 1 and the row by n − k degrees of freedom; remember to choose the table for the appropriate significance level.
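If SciPy is available, the critical value can be computed directly instead of read from the table. A sketch; the group count k, sample size n, α, and the observed F below are made-up example values:

```python
from scipy.stats import f

k, n, alpha = 3, 15, 0.05                  # assumed example values
f_crit = f.ppf(1 - alpha, k - 1, n - k)    # upper-tail critical value F_(k-1; n-k), alpha
print(round(f_crit, 3))                    # → 3.885

# Decision rule: reject H0 when the observed F ratio exceeds the critical value
F_observed = 5.2                           # hypothetical test statistic
print(F_observed > f_crit)                 # → True
```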

Source of Variation   Sum of Squares (SS)   Degrees of Freedom (df)   Mean Squares (MS)      F ratio
Between groups        SSG                   k − 1                     MSG = SSG / (k − 1)    F = MSG / MSW
Within groups         SSW                   n − k                     MSW = SSW / (n − k)
Total                 SST                   n − 1
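Putting the four steps together, a minimal one-way ANOVA sketch in Python; the data are made up, and scipy.stats.f_oneway is used only as a cross-check:

```python
import numpy as np
from scipy import stats

# Made-up samples from k = 3 groups (illustrative data)
g1 = np.array([18.0, 20.0, 21.0, 19.0])
g2 = np.array([22.0, 24.0, 23.0, 25.0])
g3 = np.array([17.0, 18.0, 16.0, 19.0])
groups = [g1, g2, g3]

n = sum(len(g) for g in groups)      # total number of observations
k = len(groups)                      # number of compared groups
grand = np.concatenate(groups).mean()

# Step 2: sums of squares
ssw = sum(float(((g - g.mean()) ** 2).sum()) for g in groups)
ssg = sum(len(g) * float((g.mean() - grand) ** 2) for g in groups)

# Step 3: mean squares (variances)
msw = ssw / (n - k)
msg = ssg / (k - 1)

# Step 4: F ratio
F = msg / msw
print(round(F, 4))                   # → 22.4

# Cross-check against SciPy's one-way ANOVA
F_scipy, p_value = stats.f_oneway(g1, g2, g3)
assert np.isclose(F, F_scipy)
```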


2. Two-way ANOVA
A two-way ANOVA is used to estimate how the mean of a quantitative variable changes across the levels of two categorical variables. Two-way analysis of variance considers two causal factors at the same time (both in the form of qualitative data) that affect the outcome (in the form of quantitative data). Including the second factor in the analysis makes the research results more valid.
a. One observation per cell
Suppose we study the influence of two qualitative causal factors on some quantitative
outcome factors. According to the first causal factor, we can arrange the research sample
units into K groups. According to the second causal factor, we can arrange the research
sample units into H blocks. If we also arrange the sample units according to these two
causal factors, we will have a combined table consisting of K columns and H rows and
the table will have K x H data cells. If we only have 1 observed sample in a cell, the
total number of observed sample units is n = K x H. The general form of this table is as
follows:

Blocks     Groups
             1       2       3      …      K
  1        x_11    x_21    x_31     …     x_K1
  2        x_12    x_22    x_32     …     x_K2
  …         …       …       …       …      …
  H        x_1H    x_2H    x_3H     …     x_KH

To test the hypothesis that the means of the K populations corresponding to the K sampled groups are equal, and the hypothesis that the means of the H populations corresponding to the H sampled blocks are equal, we perform the following steps:

Step 1: Calculate the mean


Mean of each Group (column):

x̄_i = ( Σ_{j=1}^{H} x_ij ) / H ,   i = 1, 2, …, K

Mean of each Block (row):

x̄_j = ( Σ_{i=1}^{K} x_ij ) / K ,   j = 1, 2, …, H

The mean of all observations:

x̄ = ( Σ_{i=1}^{K} Σ_{j=1}^{H} x_ij ) / n = ( Σ_{i=1}^{K} x̄_i ) / K = ( Σ_{j=1}^{H} x̄_j ) / H
Step 2: Calculate the sums of squared deviations
The Total Sum of Squares (SST): SST = SSG + SSB + SSE

SST = Σ_{i=1}^{K} Σ_{j=1}^{H} (x_ij − x̄)²

SST reflects the variation of the quantitative outcome due to the influence of all causes.
The Sum of Squared deviations between Groups (SSG):

SSG = H Σ_{i=1}^{K} (x̄_i − x̄)²

SSG reflects the part of the variation of the quantitative outcome due to the influence of the first causal factor, which is used for grouping into columns.
The Sum of Squared deviations between Blocks (SSB):

SSB = K Σ_{j=1}^{H} (x̄_j − x̄)²

SSB reflects the part of the variation of the quantitative outcome due to the influence of the second causal factor, which is used for grouping into rows.
The Sum of Squared Errors (SSE):

SSE = Σ_{i=1}^{K} Σ_{j=1}^{H} (x_ij − x̄_i − x̄_j + x̄)² = SST − SSG − SSB

SSE reflects the part of the variation of the quantitative outcome due to the influence of other factors not included in the study.
Step 3: Calculate the variances
Mean square between groups:

MSG = SSG / (K − 1)

Mean square between blocks:

MSB = SSB / (H − 1)

Mean square error:

MSE = SSE / ((K − 1)(H − 1))

Step 4: Test the effect of the first causal factor (columns) and the second causal factor (rows) on the outcome using the F ratios:

F1 = MSG / MSE ;   F2 = MSB / MSE
Step 5: There are two rejection decisions in this two-way ANOVA:
1. For F1, at significance level α, the hypothesis H0 that the means of the K populations defined by the first causal factor (columns) are equal is rejected when:

F1 > F_{K−1, (K−1)(H−1), α}

2. For F2, at significance level α, the hypothesis H0 that the means of the H populations defined by the second causal factor (rows) are equal is rejected when:

F2 > F_{H−1, (K−1)(H−1), α}

where:
F_{K−1, (K−1)(H−1), α} is the value from the F-distribution table with K − 1 degrees of freedom in the numerator and (K − 1)(H − 1) degrees of freedom in the denominator;
F_{H−1, (K−1)(H−1), α} is the value from the F-distribution table with H − 1 degrees of freedom in the numerator and (K − 1)(H − 1) degrees of freedom in the denominator.
Source of Variation   Sum of Squares (SS)   Degrees of Freedom (df)   Mean Squares (MS)              F ratio
Between groups        SSG                   K − 1                     MSG = SSG / (K − 1)            F1 = MSG / MSE
Between blocks        SSB                   H − 1                     MSB = SSB / (H − 1)            F2 = MSB / MSE
Error                 SSE                   (K − 1)(H − 1)            MSE = SSE / ((K − 1)(H − 1))
Total                 SST                   n − 1
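The computation for one observation per cell can be sketched with NumPy; the K = 3 by H = 4 data below are made up for illustration:

```python
import numpy as np

# Made-up data: H = 4 blocks (rows) x K = 3 groups (columns), one observation per cell
x = np.array([[31.0, 27.0, 24.0],
              [31.0, 28.0, 25.0],
              [30.0, 26.0, 24.0],
              [32.0, 27.0, 23.0]])
H, K = x.shape
grand = x.mean()

col_means = x.mean(axis=0)    # group means x-bar_i
row_means = x.mean(axis=1)    # block means x-bar_j

# Step 2: sums of squares
ssg = H * float(((col_means - grand) ** 2).sum())
ssb = K * float(((row_means - grand) ** 2).sum())
sst = float(((x - grand) ** 2).sum())
sse = sst - ssg - ssb

# Step 3: mean squares
msg = ssg / (K - 1)
msb = ssb / (H - 1)
mse = sse / ((K - 1) * (H - 1))

# Step 4: F ratios for the column (group) factor and the row (block) factor
F1 = msg / mse
F2 = msb / mse
print(round(F1, 3), round(F2, 3))   # → 88.8 1.6
```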
b. Several observations per cell
To increase the accuracy of conclusions about the influence of the two causal factors on the outcome, we increase the number of observations. Let L be the number of observations per cell; the general form of the data table is then:
Blocks     Groups
              1                        2                     …            K
  1        x_111 x_112 … x_11L     x_211 x_212 … x_21L       …     x_K11 x_K12 … x_K1L
  2        x_121 x_122 … x_12L     x_221 x_222 … x_22L       …     x_K21 x_K22 … x_K2L
  …           …                        …                     …            …
  H        x_1H1 x_1H2 … x_1HL     x_2H1 x_2H2 … x_2HL       …     x_KH1 x_KH2 … x_KHL
Step 1: Calculate the means
Mean of each Group (column):

x̄_i = ( Σ_{j=1}^{H} Σ_{s=1}^{L} x_ijs ) / (HL) ,   i = 1, 2, …, K

Mean of each Block (row):

x̄_j = ( Σ_{i=1}^{K} Σ_{s=1}^{L} x_ijs ) / (KL) ,   j = 1, 2, …, H

Mean of each cell:

x̄_ij = ( Σ_{s=1}^{L} x_ijs ) / L ,   i = 1, 2, …, K ;  j = 1, 2, …, H

The mean of all observations:

x̄ = ( Σ_{i=1}^{K} Σ_{j=1}^{H} Σ_{s=1}^{L} x_ijs ) / (KHL)

Step 2: Calculate the sums of squared deviations
The Total Sum of Squares (SST): SST = SSG + SSB + SSI + SSE

SST = Σ_{i=1}^{K} Σ_{j=1}^{H} Σ_{s=1}^{L} (x_ijs − x̄)²

SST reflects the variation of the quantitative outcome due to the influence of all causes.
The Sum of Squared deviations between Groups (SSG):

SSG = HL Σ_{i=1}^{K} (x̄_i − x̄)²

SSG reflects the part of the variation of the quantitative outcome due to the influence of the first causal factor, which is used for grouping into columns.
The Sum of Squared deviations between Blocks (SSB):

SSB = KL Σ_{j=1}^{H} (x̄_j − x̄)²

SSB reflects the part of the variation of the quantitative outcome due to the influence of the second causal factor, which is used for grouping into rows.
The Sum of Squares for the Interaction between Groups and Blocks (SSI):

SSI = L Σ_{i=1}^{K} Σ_{j=1}^{H} (x̄_ij − x̄_i − x̄_j + x̄)²

The Sum of Squared Errors (SSE):

SSE = Σ_{i=1}^{K} Σ_{j=1}^{H} Σ_{s=1}^{L} (x_ijs − x̄_ij)² = SST − SSG − SSB − SSI

SSE reflects the part of the variation of the quantitative outcome due to the influence of other factors not included in the study.
Step 3: Calculate the variances
Mean square between groups:

MSG = SSG / (K − 1)

Mean square between blocks:

MSB = SSB / (H − 1)

Mean square for the interaction between groups and blocks:

MSI = SSI / ((K − 1)(H − 1))

Mean square error:

MSE = SSE / (KH(L − 1))

Step 4: Test the effect of the first causal factor (columns), the second causal factor (rows), and their interaction on the outcome using the F ratios:

F1 = MSG / MSE ;   F2 = MSB / MSE ;   F3 = MSI / MSE
Step 5: The decision rules in two-way ANOVA:
For F1, at significance level α, the hypothesis H0 that the means of the K populations defined by the first causal factor (columns) are equal is rejected when:

F1 > F_{K−1, KH(L−1), α}

For F2, at significance level α, the hypothesis H0 that the means of the H populations defined by the second causal factor (rows) are equal is rejected when:

F2 > F_{H−1, KH(L−1), α}

For F3, at significance level α, the hypothesis H0 that there is no interaction between the first factor (columns) and the second factor (rows) is rejected when:

F3 > F_{(K−1)(H−1), KH(L−1), α}

where:
F_{K−1, KH(L−1), α} is the value from the F-distribution table with K − 1 degrees of freedom in the numerator and KH(L − 1) degrees of freedom in the denominator;
F_{H−1, KH(L−1), α} is the value from the F-distribution table with H − 1 degrees of freedom in the numerator and KH(L − 1) degrees of freedom in the denominator;
F_{(K−1)(H−1), KH(L−1), α} is the value from the F-distribution table with (K − 1)(H − 1) degrees of freedom in the numerator and KH(L − 1) degrees of freedom in the denominator.
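All five steps for the replicated case can be sketched with NumPy; the K = 2 by H = 2 by L = 3 data below are made up for illustration:

```python
import numpy as np

# Made-up data x[i, j, s]: group i, block j, replicate s (K = 2, H = 2, L = 3)
x = np.array([[[10.0, 12.0, 11.0], [14.0, 15.0, 16.0]],
              [[20.0, 21.0, 19.0], [18.0, 17.0, 19.0]]])
K, H, L = x.shape
grand = x.mean()

cell_means = x.mean(axis=2)           # x-bar_ij
group_means = x.mean(axis=(1, 2))     # x-bar_i
block_means = x.mean(axis=(0, 2))     # x-bar_j

# Step 2: sums of squares, including the interaction term SSI
ssg = H * L * float(((group_means - grand) ** 2).sum())
ssb = K * L * float(((block_means - grand) ** 2).sum())
ssi = L * float(((cell_means - group_means[:, None]
                  - block_means[None, :] + grand) ** 2).sum())
sse = float(((x - cell_means[:, :, None]) ** 2).sum())
sst = float(((x - grand) ** 2).sum())
assert np.isclose(sst, ssg + ssb + ssi + sse)   # SST = SSG + SSB + SSI + SSE

# Steps 3-4: mean squares and F ratios
msg, msb = ssg / (K - 1), ssb / (H - 1)
msi = ssi / ((K - 1) * (H - 1))
mse = sse / (K * H * (L - 1))
F1, F2, F3 = msg / mse, msb / mse, msi / mse
print(round(F1, 3), round(F2, 3), round(F3, 3))   # → 108.0 3.0 27.0
```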
