
UNIT SEVEN

_____________________________________________________________________
ANALYSIS OF VARIANCE
_____________________________________________________________________
Unit outline
• Characteristics of analysis of variance
• One-way analysis of variance
• Two-way analysis of variance

Unit objectives
After completing this chapter, students will be able to:
• Distinguish the F distribution from other types of distributions
• Know the characteristics of analysis of variance
• Use the F test to test the hypothesis that the means of more than two populations are equal
• Understand and use one-way analysis of variance
• Understand and use two-way analysis of variance

7.1 Chapter Introduction


Procedures for determining whether two populations have equal means were discussed under
hypothesis testing. However, management problems often involve more than two populations, and
decision makers want to know whether the means of these populations are equal.
The responses generated in an experimental situation always exhibit a certain amount of
variability. In an analysis of variance, we divide the total variation in the response measurements
into portions that may be attributed to the various factors of interest to the experimenter.

Introduction
One way to compare two population variances, $\sigma_1^2$ and $\sigma_2^2$, is to use the ratio of the sample
variances, $s_1^2/s_2^2$. If $s_1^2/s_2^2$ is nearly equal to 1, you will find little evidence that $\sigma_1^2$
and $\sigma_2^2$ are unequal. On the other hand, a very large or a very small value of $s_1^2/s_2^2$ provides
evidence of a difference in the population variances. The assumptions required for an analysis of
variance are similar to those required for Student's t-distribution. Analysis of variance is so
called because we decide whether to accept or reject the hypothesis of equal population means
by analyzing the variation (variance) in the sample means. The ANOVA test is performed
on simple random samples, one drawn from each of the several populations. The test
assumes that the populations are normally distributed and have equal variances.
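
The variance ratio described above can be illustrated with a short Python sketch. It uses two of the salary samples from the worked example in Section 7.2; the variable names are illustrative assumptions, not part of the module. The ratio alone does not settle the question: it still has to be compared with a critical value from the F table.

```python
from statistics import variance

# Two of the salary samples used in the example of Section 7.2.
sample_1 = [16, 21, 18, 13]
sample_2 = [24, 21, 22, 25]

# Sample variances (divisor n - 1).
s1_sq = variance(sample_1)
s2_sq = variance(sample_2)

# A ratio far from 1 hints that the population variances may differ.
ratio = s1_sq / s2_sq

print(round(s1_sq, 3), round(s2_sq, 3), round(ratio, 2))
```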

7.2 One-way Analysis of Variance


In the analysis of variance, the F statistic is used to test whether the means of two or more
groups are significantly different. It operates by breaking down the total variance of the
groups into components. These components are then used to construct the test statistic.
An ANOVA based on group data defined by a single classification is called one-way
ANOVA, and an ANOVA based on group data defined by a dual classification is called
two-way ANOVA.
Suppose we want to test whether the number of years of work experience since graduation has an
effect on the beginning salary of management graduates. The following table shows the salaries of
graduates with different years of experience. Test at the 5% level whether the average salaries of
the different work experience categories differ.
The three treatments or groups are:
Treatment 1: Bachelor’s degree with no work experience
Treatment 2: Bachelor’s Degree with one year of experience
Treatment 3: Bachelor’s degree with two years of experience
We also assume that all students in the sample graduated from one university and specialized in
one field of study. To simplify the computation, we use a random sample of only 12
observations: 3 samples of 4 graduates each, one sample per treatment.
Beginning salary by years of work experience
Student    Treatment 1         Treatment 2    Treatment 3
           (no experience)     (1 year)       (2 years)
1          16                  19             24
2          21                  20             21
3          18                  21             22
4          13                  20             25
Total      68                  80             92
Mean       17                  20             23

Overall (global) mean = (17 + 20 + 23)/3 = 20
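
The means in the table can be checked with a few lines of code. The following is a minimal Python sketch; the names `salaries`, `treatment_means`, and `overall_mean` are illustrative assumptions, not part of the module. It also shows that averaging the three treatment means gives the overall mean here only because every treatment has the same number of observations.

```python
# Beginning salaries from the table above, one list per treatment (column).
salaries = {
    "treatment_1": [16, 21, 18, 13],  # no work experience
    "treatment_2": [19, 20, 21, 20],  # one year of experience
    "treatment_3": [24, 21, 22, 25],  # two years of experience
}

# Sample mean of each treatment.
treatment_means = {name: sum(values) / len(values) for name, values in salaries.items()}

# Overall mean of all 12 observations.
all_values = [x for values in salaries.values() for x in values]
overall_mean = sum(all_values) / len(all_values)

# With equal sample sizes, the mean of the treatment means equals the overall mean.
mean_of_means = sum(treatment_means.values()) / len(treatment_means)

print(treatment_means)
print(overall_mean, mean_of_means)
```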

Specifying Hypotheses
Dear learner, you might have noticed that we have calculated the mean for each of the three
treatments: no experience, one year of experience, and two years of experience. We have also
calculated the overall (global) mean of the observations. We now want to test whether these three
sample means were drawn from populations that have identical means. In other words, we want to
test the following null hypothesis:

H0: μ1 = μ2 = μ3
against the alternative hypothesis
H1: At least two population means are not equal.
Thus we are testing whether the differences between the sample means are too large to be
attributed solely to chance. If the test results indicate that the sample means are significantly
different, we can conclude that the years of work experience have an impact on
beginning salaries. Note that we are making an inference about the means of more than two populations.
As the table above shows, there are n observations in total and m populations. Each of the m
populations is a treatment. The top row of the table indicates that we will be testing the equality
of m different means. Within each column are the n_j individual observations sampled from the jth
treatment. In developing the one-way analysis of variance model, our purpose is to specify the
underlying relationships among the various treatments. Hence the first step is to calculate the
sample means from the random observations taken from each of the m treatments.
• To test the null hypothesis that the treatment means are equal, we need to assess two
measures of variability:

1. The variability of the observations within each treatment, referred to as within-group
variability.

2. The variability between the m treatments, referred to as between-group variability.

• The term variation refers to the sum of squared deviations, which is called the sum of
squares.

$$SST = \sum_{j=1}^{m} n_j (\bar{X}_j - \bar{X})^2$$
Where SST = between-treatment sum of squares
$n_j$ = sample size of the jth treatment
$\bar{X}_j$ = sample mean of the jth treatment
$\bar{X}$ = overall mean

$n_1(\bar{X}_1 - \bar{X})^2 = 4(17-20)^2 = 36$

$n_2(\bar{X}_2 - \bar{X})^2 = 4(20-20)^2 = 0$

$n_3(\bar{X}_3 - \bar{X})^2 = 4(23-20)^2 = 36$

SST = 36 + 0 + 36 = 72
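
The between-treatment sum of squares can be reproduced in code. This is a minimal sketch in Python; the helper `mean` and the variable names are illustrative assumptions.

```python
# Salaries per treatment, taken from the table of observations.
groups = [
    [16, 21, 18, 13],
    [19, 20, 21, 20],
    [24, 21, 22, 25],
]

def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)

overall_mean = mean([x for g in groups for x in g])

# One term n_j * (treatment mean - overall mean)^2 per treatment.
terms = [len(g) * (mean(g) - overall_mean) ** 2 for g in groups]
sst = sum(terms)

print(terms)  # [36.0, 0.0, 36.0]
print(sst)    # 72.0
```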

Within-Treatment Sum of Squares


The within-treatment sum of squares measures the variation that is not explained by the treatment
effect. It indicates the unexplained variability that is due to the random sampling process. The
within-treatment sum of squares can be calculated as follows.

$$SSW = \sum_{j=1}^{m} \sum_{i=1}^{n_j} (X_{ij} - \bar{X}_j)^2$$

        Treatment 1          Treatment 2          Treatment 3
        (16-17)^2 = 1        (19-20)^2 = 1        (24-23)^2 = 1
        (21-17)^2 = 16       (20-20)^2 = 0        (21-23)^2 = 4
        (18-17)^2 = 1        (21-20)^2 = 1        (22-23)^2 = 1
        (13-17)^2 = 16       (20-20)^2 = 0        (25-23)^2 = 4
Total   34                   2                    10
SSW = 34 + 2 + 10 = 46
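
The column totals and SSW can be checked with a short Python sketch (variable names are illustrative assumptions): each observation is compared with the mean of its own treatment.

```python
# Salaries per treatment, taken from the table of observations.
groups = [
    [16, 21, 18, 13],
    [19, 20, 21, 20],
    [24, 21, 22, 25],
]

def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)

# Sum of squared deviations from the treatment's own mean, one total per treatment.
within = [sum((x - mean(g)) ** 2 for x in g) for g in groups]
ssw = sum(within)

print(within)  # [34.0, 2.0, 10.0]
print(ssw)     # 46.0
```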
The between-treatment sum of squares and the within-treatment sum of squares together make up the
total variation of the ANOVA model. We calculate the total variation by adding the squared
deviations of the individual observations about the global mean. The total sum of squares can be
calculated as:

$$TSS = \sum_{j=1}^{m} \sum_{i=1}^{n_j} (X_{ij} - \bar{X})^2$$

Where:
TSS = total sum of squares
$X_{ij}$ = value of the observation in the ith row and jth column
$\bar{X}$ = overall mean
To put it more simply, we obtain the total sum of squares by adding the between-treatment variation
and the within-treatment variation.
TSS = SST + SSW = 72 + 46 = 118
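
The decomposition of the total variation can be verified numerically. The Python sketch below (names are illustrative assumptions) computes the total sum of squares directly from the observations and compares it with the sum of the between-treatment and within-treatment parts.

```python
# Salaries per treatment, taken from the table of observations.
groups = [
    [16, 21, 18, 13],
    [19, 20, 21, 20],
    [24, 21, 22, 25],
]

def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)

overall_mean = mean([x for g in groups for x in g])

# Total variation: every observation against the overall mean.
tss = sum((x - overall_mean) ** 2 for g in groups for x in g)

# The two components computed separately.
sst = sum(len(g) * (mean(g) - overall_mean) ** 2 for g in groups)
ssw = sum((x - mean(g)) ** 2 for g in groups for x in g)

print(tss)        # 118.0
print(sst + ssw)  # 118.0
```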

Between-treatment and within-treatment mean squares


The number of degrees of freedom associated with the between-treatment variation is (m-1),
whereas the number of degrees of freedom associated with the within-treatment variation is (n-m),
where m = number of treatments and n = number of observations.
Returning to the example, there are 3 treatments and 12 observations, so m = 3 and n = 12.
Thus, the degrees of freedom are:
Degrees of freedom for between-treatment variation = 3 - 1 = 2
Degrees of freedom for within-treatment variation = 12 - 3 = 9

Mean square of the between treatments


The test of the null hypothesis is based on the assumption that all m treatments have a common
variance. If the null hypothesis is in fact true, then SST and SSW can each be used as the basis
for an estimate of this common variance. To calculate these estimates, we divide each of the
variability measures by its number of degrees of freedom. Hence the between-treatment mean
square, an unbiased estimate of the common variance, is obtained by dividing SST by (m-1) degrees
of freedom.
MST = SST/(m-1)
Where MST = between-treatment mean square
In our example the between-treatment mean square is MST = 72/2 = 36. Similarly, the unbiased
estimate given by the within-treatment mean square is found by dividing SSW by (n-m) degrees of
freedom.
MSW = SSW/(n-m)
Where MSW = within-treatment mean square
MSW = 46/9 = 5.11
We now test the null hypothesis that the population treatment means are equal by comparing the
between-treatment mean square with the within-treatment mean square.
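
As a sketch of the two divisions above, the following Python function returns both mean squares from the sums of squares and the counts m and n. The function name `mean_squares` and its parameters are illustrative assumptions.

```python
def mean_squares(sst, ssw, m, n):
    """Return (MST, MSW) for m treatments and n observations in total."""
    df_between = m - 1   # degrees of freedom between treatments
    df_within = n - m    # degrees of freedom within treatments
    return sst / df_between, ssw / df_within

# Values from the worked example: SST = 72, SSW = 46, m = 3, n = 12.
mst, msw = mean_squares(sst=72, ssw=46, m=3, n=12)

print(mst)            # 36.0
print(round(msw, 2))  # 5.11
```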

The Test statistic


Comparison of the between-treatment mean square and the within-treatment mean square is
performed by computing a ratio:
The test statistic F = MST/MSW
If the null hypothesis that the population treatment means are equal were true, the ratio F would
tend to be close to 1. If the null hypothesis were not true, the ratio would tend to be greater
than 1, because the between-treatment variance would then exceed the within-treatment variance.
Only large values of F therefore count as evidence that the treatment means differ. The
ratio for the above example can be calculated as:
F calculated = MST/MSW
F calculated = 36/5.11 = 7.04
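
The whole one-way calculation, from raw observations to the F ratio, can be collected in one function. This is a minimal Python sketch under the assumptions of the example; `one_way_f` is a hypothetical helper name, not a library function.

```python
def one_way_f(groups):
    """Return the F ratio MST/MSW for a list of samples, one list per treatment."""
    m = len(groups)                      # number of treatments
    n = sum(len(g) for g in groups)      # total number of observations
    overall_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]

    # Between-treatment and within-treatment sums of squares.
    sst = sum(len(g) * (gm - overall_mean) ** 2 for g, gm in zip(groups, means))
    ssw = sum((x - gm) ** 2 for g, gm in zip(groups, means) for x in g)

    # Ratio of the two mean squares.
    return (sst / (m - 1)) / (ssw / (n - m))

f_value = one_way_f([
    [16, 21, 18, 13],
    [19, 20, 21, 20],
    [24, 21, 22, 25],
])
print(round(f_value, 2))  # 7.04
```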
Summary table for one-way ANOVA
Source of variation     Sum of squares    Degrees of freedom    Mean square
Between treatments      SST = 72          m-1 = 2               MST = 36
Within treatments       SSW = 46          n-m = 9               MSW = 5.11
F(2, 9) = 36/5.11 = 7.04
F(2, 9, 0.05) = 4.26
Decision Rule
Accept H0 if F(2, 9) < 4.26; otherwise reject H0. The value 4.26 is the critical value, read from
the statistical table at the end of the module under the heading analysis of variance. To read the
value from the statistical table, find the F distribution table for the 5% significance level,
locate 2 degrees of freedom in the numerator (top row) and 9 degrees of freedom in the denominator
(first column), and read the value at the intersection of the two, which is 4.26 in this case.
Decision
The calculated F value is 7.04, which is greater than the critical F value of 4.26. Therefore, the
null hypothesis is rejected in favor of the alternative hypothesis: at the 5% level, the mean
beginning salaries are not the same for all categories of work experience.
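
When a printed table is not at hand, the critical value can also be computed. For the special case of 2 numerator degrees of freedom, the upper-tail probability of the F distribution has the closed form P(F > x) = (1 + 2x/d)^(-d/2), where d is the denominator degrees of freedom. The Python sketch below relies on that special case only; the function names are illustrative assumptions, and other numerator degrees of freedom still require a table or a statistics library.

```python
def f_upper_tail_df1_2(x, d):
    """P(F > x) for an F distribution with 2 and d degrees of freedom."""
    return (1 + 2 * x / d) ** (-d / 2)

def f_critical_df1_2(alpha, d):
    """Value x with P(F > x) = alpha, found by inverting the formula above."""
    return (d / 2) * (alpha ** (-2 / d) - 1)

critical = f_critical_df1_2(alpha=0.05, d=9)
f_calculated = 36 / (46 / 9)   # MST / MSW from the worked example
p_value = f_upper_tail_df1_2(f_calculated, d=9)

print(round(critical, 2))       # 4.26
print(f_calculated > critical)  # True, so H0 is rejected at the 5% level
print(p_value < 0.05)           # True, the same decision expressed as a p-value
```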

7.3 TWO-WAY ANALYSIS OF VARIANCE


In this section we extend one-way ANOVA to two-way ANOVA, which allows a more in-depth analysis
of the sources of variation. In the previous example our interest focused on a single factor
(years of experience), but it is possible that another factor also affects the outcome. In the
one-way analysis of variance, we concluded that the number of years of experience had a
significant impact on starting salary. However, we may suspect that some of the variability is due
to the geographic location of the job. Hence we now need not only to look at the treatment effect
of the number of years of work experience but also to isolate the impact of geographic location on
the starting salaries of the graduates. By setting up a two-way ANOVA problem, we aim to
design a more accurate test to explain the differences in the population means of the treatments.
Our new model must be constructed so as to test for the influence that a second factor may have
on starting salary. Using the data from the previous table, which is repeated below, we let the 4
rows represent 4 geographic locations in Ethiopia. We will thus be able to acquire information
about the various years of work experience as well as about the geographic location of the job.
This new factor in our analysis (region) is called the blocking factor. Each cell, the combination
of one block and one treatment, contains only a single observation. Let's assume that
• the first row represents the western part of Ethiopia
• the second row represents the eastern part of Ethiopia
• the third row represents the northern part of Ethiopia
• the fourth row represents the southern part of Ethiopia.

The observations are presented as follows.

                  Years of experience (treatment)
Region            1       2       3       Row sum    Row mean
1                 16      19      24      59         19.667
2                 21      20      21      62         20.667
3                 18      21      22      61         20.333
4                 13      20      25      58         19.333
Column sum        68      80      92      240
Column mean       17      20      23

Test at the 5% level whether the population mean salaries are equal among the various years of
experience and among the various geographical locations.
Solution
Specifying the hypotheses
We have two null hypotheses to test:
1. H0: population mean salaries among the various years of work experience are equal.

2. H0: population mean salaries among the various regions are equal.

In each case the alternative hypothesis is
H1: the population mean salaries are not all equal.


Between and residual sums of squares
The necessary calculation for two-way analysis of variance involves computation of the
following values:
SST = between-treatment sum of squares
SSB = between-block sum of squares
TSS = total sum of squares
SSE = error sum of squares
Dear learner, we have already seen how to calculate the between-treatment sum of squares and the
total sum of squares. In fact, we have already computed both (SST = 72 and TSS = 118). Do you
recall how we calculated them?
We now calculate the between-block sum of squares and the error sum of squares. Let $X_{ijk}$
denote the kth salary observation in the ith row and jth column, where I is the number of rows
(blocks), J is the number of columns (treatments), and K is the number of observations per cell.
The between-block sum of squares can be calculated as:

$$SSB = \sum_{i=1}^{I} JK(\bar{X}_i - \bar{X})^2$$

Where,
SSB = between-block sum of squares
$\bar{X}_i$ = sample mean of the ith row
$\bar{X}$ = overall mean
In our example I = 4, J = 3, and K = 1, so the between-block sum of squares can be calculated as
follows.

$J(\bar{X}_1 - \bar{X})^2 = 3(19.667 - 20)^2 = 0.333$

$J(\bar{X}_2 - \bar{X})^2 = 3(20.667 - 20)^2 = 1.335$

$J(\bar{X}_3 - \bar{X})^2 = 3(20.333 - 20)^2 = 0.333$

$J(\bar{X}_4 - \bar{X})^2 = 3(19.333 - 20)^2 = 1.335$

The between-block sum of squares (SSB) is the sum of these values, which is 3.336 (3.333 if the
row means are not rounded).
The error sum of squares can in turn be calculated by subtracting the between-treatment sum of
squares and the between-block sum of squares from the total sum of squares.
SSE = TSS - SST - SSB
= 118 - 72 - 3.336 = 42.664
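
The block and error sums of squares can be reproduced from the table of observations. The Python sketch below uses illustrative variable names (assumptions, not part of the module). Because it works with unrounded row means, it gives 3.333 and 42.667 rather than the hand-rounded 3.336 and 42.664.

```python
# Rows are regions (blocks), columns are years of experience (treatments).
table = [
    [16, 19, 24],
    [21, 20, 21],
    [18, 21, 22],
    [13, 20, 25],
]
n_blocks = len(table)          # I in the text
n_treatments = len(table[0])   # J in the text

overall_mean = sum(sum(row) for row in table) / (n_blocks * n_treatments)
row_means = [sum(row) / n_treatments for row in table]
col_means = [sum(row[j] for row in table) / n_blocks for j in range(n_treatments)]

# Total, between-treatment, and between-block sums of squares.
tss = sum((x - overall_mean) ** 2 for row in table for x in row)
sst = sum(n_blocks * (cm - overall_mean) ** 2 for cm in col_means)
ssb = sum(n_treatments * (rm - overall_mean) ** 2 for rm in row_means)

# Error sum of squares as the remainder.
sse = tss - sst - ssb

print(round(ssb, 3))  # 3.333
print(round(sse, 3))  # 42.667
```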
The next step is to determine the degrees of freedom for the between-block variation and the
residual (error) variation. The number of degrees of freedom associated with the between-block
variation is (I-1) = 3, whereas the number of degrees of freedom associated with the residual
variation is (J-1)(I-1) = 6. The between-treatment variation has (J-1) = 2 degrees of freedom.
Between-block variance and error variance
It is now possible to obtain unbiased estimates of the between-block variance and the residual
variance. The between-block variance is calculated as:

$$MSB = \frac{SSB}{I-1} = \frac{3.336}{4-1} = \frac{3.336}{3} = 1.112$$

Where I-1 = degrees of freedom for the between-block variance.
The residual (error) variance can be calculated in the same way as:

$$MSE = \frac{SSE}{(J-1)(I-1)} = \frac{42.664}{(3-1)(4-1)} = \frac{42.664}{6} = 7.111$$

Where (J-1)(I-1) = degrees of freedom for the residual (error) variance.
To test our null hypothesis about the influence of the various years of work experience, we must
calculate the F ratio:
F(2, 6) = MST/MSE = 36/7.111 = 5.063
The above ratio is the calculated F ratio. To decide whether to accept or reject the null
hypothesis, we need to read the critical value from the statistical table and compare it with the
calculated F value. The critical value for this decision is F(2, 6, 0.05) = 5.14.
How to read the critical value from the statistical table:
• Find the F distribution table for the stated significance level.
• Find 2 degrees of freedom for the numerator in the first row of the table.
• Find 6 degrees of freedom for the denominator in the first column of the table.
• Read the value at the intersection of the two.

INTERPRETATION
As the critical F value (5.14) is greater than the calculated F value, we cannot reject the null
hypothesis. That is, at the 5% level there is not sufficient evidence of a difference between the
population mean salaries associated with the various years of work experience.
In testing the null hypothesis for the influence of geographical location on salaries, we find
that the F ratio is:
F(3, 6) = MSB/MSE = 1.112/7.111 = 0.156
The critical value associated with this test is F(3, 6, 0.05) = 4.76. As the calculated value is
again smaller than the critical value, we cannot reject the null hypothesis that the population
mean salaries associated with the geographical locations are equal.
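
Both F ratios of the two-way analysis can be computed in one pass. The Python sketch below assumes one observation per cell, as in the example; `two_way_f_ratios` is a hypothetical helper name. It works with unrounded intermediate values, so its results can differ from the hand calculation in the last decimal place. The critical values 5.14 and 4.76 are the table values quoted in the text.

```python
def two_way_f_ratios(table):
    """Return (F for treatments, F for blocks) for a table with one value per cell.

    Rows are blocks and columns are treatments.
    """
    n_blocks, n_treatments = len(table), len(table[0])
    overall_mean = sum(sum(row) for row in table) / (n_blocks * n_treatments)
    row_means = [sum(row) / n_treatments for row in table]
    col_means = [sum(row[j] for row in table) / n_blocks for j in range(n_treatments)]

    # Sums of squares: total, treatments (columns), blocks (rows), error.
    tss = sum((x - overall_mean) ** 2 for row in table for x in row)
    sst = sum(n_blocks * (cm - overall_mean) ** 2 for cm in col_means)
    ssb = sum(n_treatments * (rm - overall_mean) ** 2 for rm in row_means)
    sse = tss - sst - ssb

    # Mean squares: each sum of squares divided by its degrees of freedom.
    mst = sst / (n_treatments - 1)
    msb = ssb / (n_blocks - 1)
    mse = sse / ((n_treatments - 1) * (n_blocks - 1))
    return mst / mse, msb / mse

f_treatments, f_blocks = two_way_f_ratios([
    [16, 19, 24],
    [21, 20, 21],
    [18, 21, 22],
    [13, 20, 25],
])

print(round(f_treatments, 2))  # 5.06
print(round(f_blocks, 2))      # 0.16
print(f_treatments < 5.14, f_blocks < 4.76)  # True True: neither H0 is rejected
```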

Two-way ANOVA summary table


Source of variation     Sum of squares    Degrees of freedom    Mean square
Between treatments      SST = 72          J-1 = 2               MST = 36
Between blocks          SSB = 3.336       I-1 = 3               MSB = 1.112
Residual                SSE = 42.664      (J-1)(I-1) = 6        MSE = 7.111

F(2, 6) = 36/7.111 = 5.063
F(2, 6, 0.05) = 5.14
F(3, 6) = 1.112/7.111 = 0.156
F(3, 6, 0.05) = 4.76
At the 5% level, there is not sufficient evidence that either years of experience or region
affects the salary.
UNIT SUMMARY

Analysis of Variance (ANOVA) is a statistical technique used to determine whether samples from
two or more groups come from populations with equal means. Analysis of variance employs one
dependent measure, whereas multivariate analysis of variance compares samples on two or more
dependent measures.

Analysis of variance (ANOVA) is a statistical technique that can be used to evaluate whether
there are differences in the average value, or mean, across several population groups. With
this model, the response variable is continuous in nature, whereas the predictor variables are
categorical. For example, in a clinical trial of hypertensive patients, ANOVA methods could be
used to compare the effectiveness of three different drugs in lowering blood pressure.
Alternatively, ANOVA could be used to determine whether infant birth weight is significantly
different among mothers who smoked during pregnancy relative to those who did not.
One-way ANOVA evaluates the effect of a single factor on a single response variable. For
example, a clinician may be interested in determining whether there are differences in the mean
age of patients enrolled in two different study groups. Using ANOVA to make this
comparison requires that several assumptions be satisfied. Specifically, the patients must be
selected randomly from each of the population groups, a value for the response variable must be
recorded for each sampled patient, the response variable must be normally
distributed in each population, and the variance of the response variable must be the same in each
population. In the above example, age would represent the response variable, while the treatment
group represents the independent variable, or factor, of interest.

As indicated through its designation, ANOVA compares means by using estimates of variance.
Specifically, the sampled observations can be described in terms of the variation of the
individual values around their group means, and of the variation of the group means around the
overall mean. These measures are frequently referred to as sources of "within-groups" and
"between-groups" variability, respectively. If the variability within the k different populations is
small relative to the variability between the group means, this suggests that the population means
are different. This is formally tested using a test of significance based on the F distribution,
which tests the null hypothesis (H0) that the means of the k groups are equal:

H0: μ1 = μ2 = μ3 = … = μk

An F-test is constructed by taking the ratio of the "between-groups" variation to the
"within-groups" variation. If n represents the total number of sampled observations, this ratio
has an F distribution with k-1 and n-k degrees of freedom in the numerator and denominator,
respectively. Under the null hypothesis, the "within-groups" and "between-groups" variances both
estimate the same underlying population variance and the F ratio is close to one. If the
between-groups variance is much larger than the within-groups variance, the F ratio becomes large
and the associated p-value becomes small. This leads to rejection of the null hypothesis, thereby
concluding that the means of the groups are not all equal. When interpreting the results from
ANOVA procedures it is helpful to comment on the strength of the observed association, as
significant differences may result simply from having a very large sample size.
Multi-way analysis of variance is an extension of the one-way model that allows
for the inclusion of additional independent nominal variables. (The abbreviation MANOVA is
usually reserved for multivariate analysis of variance, which, as noted above, compares samples
on two or more dependent measures.) In some analyses, researchers may
wish to adjust for group differences for a variable that is continuous in nature. For example, in
the example cited above, when evaluating the effectiveness of hypertensive agents administered
to three groups, we may wish to control for group differences in the age of the patients. The
addition of a continuous variable to an existing ANOVA model is referred to as analysis of
covariance (ANCOVA).

SELF CHECK EXERCISE 7


1. An investor selected random samples of stock purchases recommended by three stock
brokers a year ago. The investor calculated the percent returns on each stock during the
year, as given below. Perform an ANOVA test at α = 0.05 level to determine if the mean
returns for the three advisory firms are equal.
Percent returns
A B C
7.0 8.7 3.4
2.8 5.2 8.1
5.1 4.9 4.2
4.6 7.0 2.6
2. Instruments for correcting a power plant malfunction are mounted on a control panel.
Three panels were designed, with the instruments arranged differently on each
panel. Then three random samples of four control engineers each were selected. Each
sample was assigned to one panel. The times in seconds taken by the engineers to correct a
simulated malfunction are given below. Perform an ANOVA test at the 0.05 level to
determine whether the mean times to correct the malfunction are the same for the three panels.
Time (seconds)
Panel A    Panel B    Panel C
17         9          13
12         16         8
15         11         14
20         12         9

3. Three methods for assembling a product are to be tested at the 0.05 level to determine
whether mean times per assembly for the methods are equal. Random sample assembly
times in minutes are given below. Perform the ANOVA test.

Method one Method two Method three


11 19 19
13 25 14
19 16 13
18 22 14
14 18 20

4. A stock analyst thinks that four stock mutual funds generate about the same return. She
collected the accompanying rate of return data on the four mutual funds during the
last 5 years.
A B C D
1988 12 11 13 15
1989 12 17 19 11
1990 13 18 15 12
1991 18 20 25 11
1992 12 19 19 10

a) Conduct a one-way ANOVA to decide whether the funds give different performances. Use
the 5% level of significance.
b) Conduct a two-way ANOVA to decide whether the funds give different performances.
Use the 5% level of significance.
5. The following table gives the sales made by four salesmen in four zones in Ethiopia.
At the 5% level of significance, conduct a two-way ANOVA (analysis of variance)
to test whether the mean sales are the same among the salesmen.
              North    East    West    South
Salesman A    8        6       5       4
Salesman B    6        6       7       6
Salesman C    5        6       8       9
Salesman D    4        8       7       9
