UNIT FIVE
_____________________________________________________________________
ANALYSIS OF VARIANCE
_____________________________________________________________________
Unit outline
Characteristics of Analysis of Variance
Unit objective
After completing this unit, students will be able to:
Distinguish the F distribution from other types of distributions
Use the F test to test the hypothesis that the means of more than two populations are equal
Introduction
One way to compare two population variances, $\sigma_1^2$ and $\sigma_2^2$, is to use the ratio of the sample
variances, $S_1^2/S_2^2$. If $S_1^2/S_2^2$ is nearly equal to 1, you will find little evidence that $\sigma_1^2$
and $\sigma_2^2$ are unequal. On the other hand, a very large or a very small value of $S_1^2/S_2^2$ provides
evidence of a difference between the population variances. The assumptions required for an analysis of
variance are similar to those required for Student's t-test. Analysis of variance is so
called because we decide whether to accept or reject the hypothesis of equal population means on
the basis of variation (variance): the variation among the sample means is compared with the variation
within the samples. The ANOVA test is performed on independent simple random samples, one drawn from each
of the several populations. The test assumes that the populations are normally distributed and have equal variances.
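A minimal sketch of the variance-ratio statistic $S_1^2/S_2^2$ described above, assuming Python's standard-library `statistics.variance` (which divides by n - 1). For illustration it reuses two of the salary samples from this unit's within-treatment table (treatment 1 and treatment 3); `variance_ratio` is a hypothetical helper name, not a library function.

```python
from statistics import variance

# Illustrative inputs: salary samples for treatment 1 (no experience) and
# treatment 3 (two years of experience), taken from the within-treatment table.
treatment_1 = [16, 21, 18, 13]
treatment_3 = [24, 21, 22, 25]

def variance_ratio(sample_a, sample_b):
    """Return S_a^2 / S_b^2, using the sample (n - 1) variance for each sample."""
    return variance(sample_a) / variance(sample_b)

ratio = variance_ratio(treatment_1, treatment_3)
print(f"S1^2 = {variance(treatment_1):.3f}, S3^2 = {variance(treatment_3):.3f}, ratio = {ratio:.2f}")
```

A ratio such as 3.4 is not automatically "far" from 1; whether it is would be judged against the F distribution with $(n_1 - 1, n_2 - 1) = (3, 3)$ degrees of freedom.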
Specifying Hypotheses
Dear learner, you might have noticed that we have calculated the mean for each of the three treatments: no
experience, one year of experience and two years of experience. We have also calculated the
overall (global) mean of all the observations. We now want to test whether these three sample
means were drawn from populations that have identical means. In other words, we want to test
the following null hypothesis:
$H_0: \mu_1 = \mu_2 = \mu_3$ against the alternative hypothesis
$H_1$: at least two population means are not equal.
Thus we are testing whether the differences among the sample means are too large to be
attributed solely to chance. If the test results indicate that the sample means are significantly
different, then we can conclude that years of work experience have an impact on
beginning salaries. Note that we are making inferences about the means of more than two populations.
As we can see from the above table, there are n observations and m populations. Each of the m
populations is a treatment. The top row indicates that we will be testing the equality of m
different means. Within each column there are n individual observations taken from one of the m
treatments. In developing the one-way analysis of variance model, our purpose is to specify the
underlying relationships among the various treatments. Hence the first step is to calculate the
sample means from the random observations taken from each of the m treatments.
To test the null hypothesis that the treatment means are equal, we need to assess two
measures of variability:
1. the variability within each of the m treatments (within-group variability);
2. the variability between the m treatments (between-group variability).
The term variation refers to a sum of squared deviations, which is called a sum of
squares.
$$SST = \sum_{j=1}^{m} n_j (\bar{X}_j - \bar{X})^2$$
Where SST = between-treatments sum of squares
$n_j$ = sample size of the jth treatment
$\bar{X}_j$ = sample mean of the jth treatment
$\bar{X}$ = overall mean
$$\begin{aligned}
n_1(\bar{X}_1 - \bar{X})^2 &= 4(17-20)^2 = 36\\
n_2(\bar{X}_2 - \bar{X})^2 &= 4(20-20)^2 = 0\\
n_3(\bar{X}_3 - \bar{X})^2 &= 4(23-20)^2 = 36\\
SST &= 36 + 0 + 36 = 72
\end{aligned}$$
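A minimal sketch of the between-treatments calculation, assuming the twelve salary observations are the ones recoverable from the within-treatment table that follows (four per treatment). It computes each treatment mean, the overall mean, and $SST = \sum_j n_j(\bar{X}_j - \bar{X})^2$.

```python
from statistics import mean

# Salary observations per treatment, as recovered from the within-treatment table.
groups = {
    "no experience": [16, 21, 18, 13],
    "one year": [19, 20, 21, 20],
    "two years": [24, 21, 22, 25],
}

all_obs = [x for obs in groups.values() for x in obs]
grand_mean = mean(all_obs)  # overall (global) mean

# SST = sum over treatments of n_j * (treatment mean - overall mean)^2
sst = sum(len(obs) * (mean(obs) - grand_mean) ** 2 for obs in groups.values())

for name, obs in groups.items():
    print(f"{name}: n = {len(obs)}, mean = {mean(obs)}")
print(f"overall mean = {grand_mean}, SST = {sst}")
```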
$$SSW = \sum_{j=1}^{m}\sum_{i=1}^{n_j} (X_{ij} - \bar{X}_j)^2$$
where SSW is the within-treatments sum of squares.
Treatment 1          Treatment 2          Treatment 3
(16-17)² = 1         (19-20)² = 1         (24-23)² = 1
(21-17)² = 16        (20-20)² = 0         (21-23)² = 4
(18-17)² = 1         (21-20)² = 1         (22-23)² = 1
(13-17)² = 16        (20-20)² = 0         (25-23)² = 4
Total 34             Total 2              Total 10
SSW = 34 + 2 + 10 = 46
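A short sketch of the within-treatment calculation, using the twelve salaries from the table above (one list per treatment column). Each column mean is computed once and the squared deviations about it are summed.

```python
from statistics import mean

groups = [
    [16, 21, 18, 13],  # treatment 1: no experience
    [19, 20, 21, 20],  # treatment 2: one year of experience
    [24, 21, 22, 25],  # treatment 3: two years of experience
]

def within_ss(obs):
    """Sum of squared deviations of each observation about its own treatment mean."""
    m = mean(obs)
    return sum((x - m) ** 2 for x in obs)

column_ss = [within_ss(obs) for obs in groups]  # one total per treatment column
ssw = sum(column_ss)
print(f"column totals = {column_ss}, SSW = {ssw}")
```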
The between-treatments sum of squares and the within-treatments sum of squares together represent the total
variation of the ANOVA model. We calculate the total variation by adding the squared deviations of the individual
observations about the global mean. The total sum of squares can be calculated as:
$$TSS = \sum_{j=1}^{m}\sum_{i=1}^{n_j} (X_{ij} - \bar{X})^2$$
Where:
TSS = total sum of squares
$X_{ij}$ = value of the observation in the ith row and jth column
$\bar{X}$ = overall mean
To put it more simply, we obtain the total sum of squares by adding the between-treatments variation
and the within-treatments variation:
TSS = SST + SSW = 72 + 46 = 118
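A quick numerical check of the partition TSS = SST + SSW, computing TSS directly from the squared deviations about the overall mean. The data are the twelve salaries from the within-treatment table above.

```python
from statistics import mean

groups = [[16, 21, 18, 13], [19, 20, 21, 20], [24, 21, 22, 25]]
all_obs = [x for obs in groups for x in obs]
grand_mean = mean(all_obs)
group_means = [mean(obs) for obs in groups]

# TSS: squared deviations of every observation from the overall mean
tss = sum((x - grand_mean) ** 2 for x in all_obs)
# Between- and within-treatment pieces, as defined earlier in the unit
sst = sum(len(obs) * (m - grand_mean) ** 2 for obs, m in zip(groups, group_means))
ssw = sum((x - m) ** 2 for obs, m in zip(groups, group_means) for x in obs)

print(f"TSS = {tss}, SST + SSW = {sst} + {ssw} = {sst + ssw}")
```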
Test whether the population mean salaries among the various years of experience, and among the various
geographical locations, are equal at the 5% level.
Solution
Specifying the hypothesis
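We will have two hypotheses to be tested:
1. Ho: population mean salaries among the various years of work experience are equal
2. Ho: population mean salaries among the various geographical locations are equal
The between-blocks sum of squares is
$$SSB = \sum_{i=1}^{I} J(\bar{X}_{i\cdot} - \bar{X})^2$$
Where,
SSB = between-blocks sum of squares
$\bar{X}_{i\cdot}$ = mean of the ith block (geographical location)
I = number of blocks, J = number of treatments
For example, for the third block: $J(\bar{X}_{3\cdot} - \bar{X})^2 = 3(20.333-20)^2 = 0.333$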
Summing this term over all four blocks gives the between-blocks sum of squares, SSB = 3.336.
The error sum of squares, in turn, is calculated by subtracting the between-treatments sum of squares
and the between-blocks sum of squares from the total sum of squares:
$$SSE = TSS - SST - SSB = 118 - 72 - 3.336 = 42.664$$
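A minimal sketch of the randomized block (two-way) sums of squares. It assumes, as a labelled assumption, that the ith salary in each treatment column of the within-treatment table belongs to geographical location i; this layout reproduces the block mean 20.333 quoted for the third block. Rows are blocks (locations) and columns are treatments (years of experience).

```python
from statistics import mean

# ASSUMPTION: row i = geographical location (block) i; columns = treatments
# (no experience, one year, two years).
table = [
    [16, 19, 24],
    [21, 20, 21],
    [18, 21, 22],
    [13, 20, 25],
]
n_blocks, n_treatments = len(table), len(table[0])  # I = 4, J = 3
grand_mean = mean(x for row in table for x in row)

# SSB = J * sum over blocks of (block mean - overall mean)^2
block_means = [mean(row) for row in table]
ssb = n_treatments * sum((bm - grand_mean) ** 2 for bm in block_means)

# TSS and SST as before; SST = I * sum over treatments of (column mean - overall mean)^2
tss = sum((x - grand_mean) ** 2 for row in table for x in row)
sst = n_blocks * sum((mean(col) - grand_mean) ** 2 for col in zip(*table))
sse = tss - sst - ssb  # SSE = TSS - SST - SSB

print("block means:", [round(bm, 3) for bm in block_means])
print(f"SSB = {ssb:.3f}, SSE = {sse:.3f}")
```

Carried at full precision this gives SSB = 10/3 (about 3.333) and SSE = 128/3 (about 42.667); the text's 3.336 and 42.664 appear to reflect intermediate rounding of the block means.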
The next step is to determine the degrees of freedom for the between-blocks sum of squares and for the
residual (error) variation. Here there are I = 4 blocks (geographical locations) and J = 3 treatments
(levels of experience). The between-blocks variation has (I - 1) degrees of freedom, whereas the residual
variation has (J - 1)(I - 1) degrees of freedom.
Between-blocks variance and error variance
It is now possible to obtain unbiased estimates of the between-blocks variance and the residual
variance. The between-blocks variance is calculated as:
$$MSB = \frac{SSB}{I-1} = \frac{3.336}{4-1} = 1.112$$
Where I - 1 = degrees of freedom for the between-blocks variance.
The residual (error) variance is calculated in the same way:
$$MSE = \frac{SSE}{(J-1)(I-1)} = \frac{42.664}{(3-1)(4-1)} = \frac{42.664}{6} = 7.111$$
Where (J - 1)(I - 1) = degrees of freedom for the residual (error) variance.
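A minimal sketch of the degrees-of-freedom and mean-square arithmetic above. It assumes the unrounded values SSB = 10/3 (about 3.333) and SSE = 128/3 (about 42.667), which follow from the unit's data if the ith salary in each treatment column belongs to location i; with the rounded inputs used in the text the mean squares agree to within about 0.001.

```python
# Unrounded sums of squares for the randomized block example (see lead-in).
sst, ssb, sse = 72.0, 10 / 3, 128 / 3
n_blocks, n_treatments = 4, 3  # I = 4 locations, J = 3 experience levels

df_treatments = n_treatments - 1                # J - 1
df_blocks = n_blocks - 1                        # I - 1
df_error = (n_treatments - 1) * (n_blocks - 1)  # (J - 1)(I - 1)

mst = sst / df_treatments  # between-treatments mean square
msb = ssb / df_blocks      # between-blocks mean square
mse = sse / df_error       # residual (error) mean square

print(f"df: treatments = {df_treatments}, blocks = {df_blocks}, error = {df_error}")
print(f"MST = {mst:.3f}, MSB = {msb:.3f}, MSE = {mse:.3f}")
```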
To test our null hypothesis about the influence of the various years of work experience, we must
calculate the F ratio. The between-treatments mean square is MST = SST/(J - 1) = 72/2 = 36, so
$$F_{(2,6)} = \frac{MST}{MSE} = \frac{36}{7.111} = 5.06$$
This is the calculated F ratio. To decide whether to accept or reject the null hypothesis, we
read the critical value from a statistical table and compare it with the calculated F value.
The critical value for this decision is F(2, 6, 0.05) = 5.14.
How to read the critical value from a statistical table:
1. Find the F distribution table for the chosen significance level (here 0.05).
2. Find 2 degrees of freedom for the numerator, listed in the first row of the table.
3. Find 6 degrees of freedom for the denominator, listed in the first column of the table.
4. Read the value at the intersection of that column and that row.
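The table lookup described above can be reproduced numerically. This is a minimal sketch using only the Python standard library: the upper-tail probability of the F distribution is computed from the regularized incomplete beta function (evaluated with the standard modified Lentz continued fraction), and the critical value is found by bisection. The helper names `reg_inc_beta`, `f_sf` and `f_critical` are hypothetical; in practice a statistics library routine (for example SciPy's `scipy.stats.f.ppf`) would normally be used instead. The F ratios 5.0625 and 0.15625 are the unrounded versions of the ratios computed in this example.

```python
import math

def _betacf(a, b, x, max_iter=200, eps=1e-14, tiny=1e-300):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even term of the continued fraction
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd term of the continued fraction
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h

def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log(1.0 - x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b

def f_sf(f, d1, d2):
    """Upper-tail probability P(F > f) for an F(d1, d2) random variable."""
    if f <= 0:
        return 1.0
    return 1.0 - reg_inc_beta(d1 / 2, d2 / 2, d1 * f / (d1 * f + d2))

def f_critical(d1, d2, alpha=0.05, hi=1e4, tol=1e-10):
    """Critical value c with P(F > c) = alpha, i.e. the entry an F table lists."""
    lo = 0.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f_sf(mid, d1, d2) > alpha:
            lo = mid  # tail still too heavy: the critical value lies above mid
        else:
            hi = mid
    return (lo + hi) / 2

for label, f_calc, d1, d2 in [("years of experience", 5.0625, 2, 6),
                              ("geographical location", 0.15625, 3, 6)]:
    crit = f_critical(d1, d2)
    decision = "reject H0" if f_calc > crit else "cannot reject H0"
    print(f"{label}: F = {f_calc:.3f}, F({d1}, {d2}, 0.05) = {crit:.2f} -> {decision}")
```

For a numerator with 2 degrees of freedom the tail probability has the closed form $(1 + 2f/d_2)^{-d_2/2}$, which gives a convenient check on the general routine.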
INTERPRETATION
As the critical F value is greater than the calculated F value, we cannot reject the null hypothesis.
At the 5% level, the sample does not provide sufficient evidence that the population mean salaries
differ across the various years of work experience.
In testing the null hypothesis for the influence of geographical location on salaries, we find that the F
ratio is:
$$F_{(3,6)} = \frac{MSB}{MSE} = \frac{1.112}{7.111} = 0.156$$
The critical value associated with this test is F(3, 6, 0.05) = 4.76. Since 0.156 is well below 4.76,
we again cannot reject the null hypothesis that the population mean salaries associated with the
different geographical locations are equal.
Analysis of variance (ANOVA) is a statistical technique used to determine whether samples from
two or more groups come from populations with equal means. Analysis of variance employs one
dependent measure, whereas multivariate analysis of variance (MANOVA) compares samples on two or more
dependent measures.
Analysis of variance (ANOVA) is a statistical technique that can be used to evaluate whether
there are differences in the average value, or mean, across several population groups. In
this model, the response variable is continuous, whereas the predictor variables are
categorical. For example, in a clinical trial of hypertensive patients, ANOVA methods could be
used to compare the effectiveness of three different drugs in lowering blood pressure.
Alternatively, ANOVA could be used to determine whether infant birth weight differs significantly
between mothers who smoked during pregnancy and those who did not.
One-way ANOVA evaluates the effect of a single factor on a single response variable. For
example, a clinician may be interested in determining whether the mean age of patients
differs between two study groups. Using ANOVA to make this comparison requires that several
assumptions be satisfied. Specifically, the patients must be selected randomly from each of the
population groups, a value of the response variable must be recorded for each sampled patient,
the response variable must be normally distributed in each population, and its variance must be
the same in each population. In this example, age is the response variable, while the treatment
group is the independent variable, or factor, of interest.
As its name indicates, ANOVA compares means by using estimates of variance.
Specifically, the sampled observations can be described in terms of the variation of the
individual values around their group means, and of the variation of the group means around the
overall mean. These measures are frequently referred to as sources of "within-groups" and
"between-groups" variability, respectively. If the variability within the k different populations is
small relative to the variability between the group means, this suggests that the population means
are different. This is formally tested using a test of significance based on the F distribution,
which tests the null hypothesis (H0) that the means of the k groups are equal:
$$H_0: \mu_1 = \mu_2 = \mu_3 = \cdots = \mu_k$$
An F test is constructed by taking the ratio of the between-groups variance estimate (mean square) to the
within-groups variance estimate. If n represents the total number of sampled observations, this ratio has an F
distribution with k - 1 and n - k degrees of freedom in the numerator and denominator, respectively. Under the
null hypothesis, the within-groups and between-groups variances both estimate the same
underlying population variance, so the F ratio is close to one. If the between-groups variance is
much larger than the within-groups variance, the F ratio becomes large and the associated p-value
becomes small. This leads to rejection of the null hypothesis and to the conclusion that the means
of the groups are not all equal. When interpreting the results of ANOVA procedures, it is
helpful to comment on the strength of the observed association, as significant differences may
result simply from having a very large number of observations.
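The F test described above can be written as a small function. This is a minimal sketch in plain Python; `one_way_anova` is a hypothetical helper name, not a library API. It forms the between-groups and within-groups mean squares and returns their ratio together with the (k - 1, n - k) degrees of freedom.

```python
from statistics import mean

def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of k samples.

    F = [SS_between / (k - 1)] / [SS_within / (n - k)], where n is the
    total number of observations across all groups.
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = mean(x for g in groups for x in g)
    group_means = [mean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between, df_within = k - 1, n - k
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within

# Usage on the unit's three salary samples, with the location blocks ignored.
salaries = [[16, 21, 18, 13], [19, 20, 21, 20], [24, 21, 22, 25]]
f_ratio, df_b, df_w = one_way_anova(salaries)
print(f"F = {f_ratio:.3f} on ({df_b}, {df_w}) degrees of freedom")
```

Applied to these salary samples it gives F of about 7.04 on (2, 9) degrees of freedom; the p-value or critical value would then be read from the F distribution with those degrees of freedom.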
Multi-way (factorial) analysis of variance is an extension of the one-way model that allows
for the inclusion of additional independent nominal variables. In some analyses, researchers may
wish to adjust for group differences in a variable that is continuous. For example, when evaluating
the effectiveness of the antihypertensive drugs administered to three groups in the example above,
we may wish to control for group differences in the age of the patients. The addition of a continuous
variable to an existing ANOVA model is referred to as analysis of covariance (ANCOVA).
3. Three methods for assembling a product are to be tested at the 0.05 level to determine
whether mean times per assembly for the methods are equal. Random sample assembly
times in minutes are given below. Perform the ANOVA test.
4. A stock analyst thinks four stock mutual funds generate about the same return. She
collected the accompanying rate-of-return data on four different mutual funds during the
last 5 years.
Year    A     B     C     D
1988    12    11    13    15
1989    12    17    19    11
1990    13    18    15    12
1991    18    20    25    11
1992    12    19    19    10
a) Conduct a one-way ANOVA to decide whether the funds give different performance. Use the
5% significance level.
b) Conduct a two-way ANOVA to decide whether the funds give different performance.
Use the 5% significance level.
5. The following table gives data on the sales made by four salesmen in four zones of Ethiopia.
At the 5% level of significance, conduct a two-way ANOVA (analysis of variance) to test whether
the mean sales of the salesmen are the same.
              North   East   West   South
Salesman A    8       6      5      4
Salesman B    6       6      7      6
Salesman C    5       6      8      9
Salesman D    4       8      7      9