
Lesson 6.4 Simple Analysis of Variance

Learning objectives:
After completing this lesson, the student should be able to:

1. Define the sampling distribution of F and specify its characteristics.
2. Specify the H0 and H1 for one-way, independent groups ANOVA.
3. Solve problems using one-way ANOVA and specify the assumptions underlying one-way ANOVA.
4. Explain why H1 in one-way ANOVA is always non-directional and why we evaluate it with a one-tailed evaluation.
5. Specify the difference between planned and post hoc comparisons; specify which is more powerful and explain why.
6. Do multiple comparisons using planned comparisons.
7. Do multiple comparisons using the HSD and the Newman-Keuls (NK) tests.
8. Rank order planned comparisons, the HSD test, and the NK test with regard to power.
9. Understand the illustrative examples, do the practice problems, and understand the solutions.

Introduction
We have been using the mean as the basic statistic for evaluating the null
hypothesis. It’s also possible to use the variance of the data for hypothesis testing.
One of the most important tests that does this is called the F test, after R. A. Fisher,
the statistician who developed it. In using this test, we calculate the statistic Fobt,
which fundamentally is the ratio of two independent variance estimates of the
same population variance.

Discussion
F TEST AND THE ANALYSIS OF VARIANCE (ANOVA)

The F test is appropriate in any experiment in which the scores can be used
to form two independent estimates of the population variance. One quite
frequent situation in the behavioural sciences for which the F test is appropriate
occurs when analyzing the data from experiments that use more than two groups
or conditions.
Given that it is frequently desirable to do experiments with more than two
groups, you may wonder why these experiments aren’t analyzed in the usual way.
For example, if the experiment used four independent groups, why not simply
compare the group means two at a time using the t test for independent groups?
That is, why not just calculate t values comparing group 1 with 2, 3, and 4; 2 with
3 and 4; and 3 with 4?

The analysis of variance is a statistical technique used to analyze multigroup experiments. Using the F test allows us to make one overall comparison that tells
whether there is a significant difference between the means of the groups. Thus,
it avoids the problem of an increased probability of Type I error that occurs when
assessing many t values. The analysis of variance, or ANOVA as it is frequently
called, is used in both independent groups and repeated measures designs. It is
also used when one or more factors (variables) are investigated in the same
experiment.

We shall consider the simplest of these designs: the simple randomized-group design. This design is also often referred to as the one-way analysis of variance, independent groups design. A third designation often used is the single-factor experiment, independent groups design. According to this design,
subjects are randomly sampled from the population and then randomly assigned
to the conditions, preferably such that there are an equal number of subjects in
each condition. There are as many independent groups as there are conditions.
If the study is investigating the effect of an independent variable as a factor, then
the conditions would be the different levels of the independent variable used.
Each group would receive a different level of the independent variable
(e.g., a different concentration of hormone X). Thus, in this design, scores from
several independent groups are analysed.
The alternative hypothesis used in the analysis of variance is non-directional.
It states that one or more of the conditions have different effects from at least one
of the others on the dependent variable. The null hypothesis states that the
different conditions are all equally effective, in which case the scores in each
group are random samples from populations having the same mean value. If
there are k groups, then the null hypothesis specifies that

𝜇1 = 𝜇2 = ⋯ = 𝜇k

Essentially, the analysis of variance partitions the total variability of the data (SSTotal) into two sources: the variability that exists within each group, called the within-groups sum of squares (SSWithin), and the variability that exists between the groups, called the between-groups sum of squares (SSBetween). Each sum of squares is used to form an independent estimate of the H0 population variance. The estimate based on the within-groups variability is called the within-groups variance estimate, and the estimate based on the between-groups variability is called the between-groups variance estimate.

ANOVA analyses sample variances to draw inferences about population means. Sample variances can always be calculated as SS/df, and these sample variances are called mean squares (MS):

MSBetween = SSBetween/dfBetween and MSWithin = SSWithin/dfWithin

The total sum of squares is partitioned into these two components:

SSTotal = SSBetween + SSWithin

Computational Formulas

SSTotal = ΣX² − (ΣX)²/N

SSBetween = (ΣX₁)²/n₁ + (ΣX₂)²/n₂ + ⋯ + (ΣXₐ)²/nₐ − (ΣX)²/N

SSWithin = SSTotal − SSBetween
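To make the computational formulas concrete, here is a minimal Python sketch (the function name and the use of NumPy are my own choices, not part of the lesson) that computes the three sums of squares for any number of independent groups; it is checked against the drug example worked out below.

```python
import numpy as np

def sums_of_squares(groups):
    """Compute SSTotal, SSBetween, and SSWithin for a list of independent groups
    using the raw-score (computational) formulas from the lesson."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    all_scores = np.concatenate(groups)
    N = all_scores.size
    grand_sum = all_scores.sum()

    # SSTotal = sum of X^2 minus (sum of X)^2 / N
    ss_total = (all_scores ** 2).sum() - grand_sum ** 2 / N

    # SSBetween = sum over groups of (group sum)^2 / n, minus (sum of X)^2 / N
    ss_between = sum(g.sum() ** 2 / g.size for g in groups) - grand_sum ** 2 / N

    # SSWithin = SSTotal - SSBetween
    ss_within = ss_total - ss_between
    return ss_total, ss_between, ss_within

# Check against the drug example that follows
placebo = [9, 8, 8, 6, 9]
drug_a = [5, 4, 5, 8, 3]
drug_b = [2, 4, 3, 1, 5]
print(sums_of_squares([placebo, drug_a, drug_b]))
# Expected, from the worked example: roughly (93.333, 63.333, 30.0)
```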

ANOVA Table

Source of Variation    Sum of Squares (SS)    df        MS              Fobt                                  Fcrit
Between Groups         SSBetween              C − 1     SSB/(C − 1)     MSB/MSW = (SSB/(C−1))/(SSW/(N−C))     from the F table
Within Groups          SSWithin               N − C     SSW/(N − C)
Total                  SSTotal                N − 1

(C = number of groups or conditions; N = total number of scores.)

Example. Test whether there are significant differences among the three treatment conditions (Placebo, Drug A, Drug B).

          Placebo    Drug A    Drug B
          9          5         2
          8          4         4
          8          5         3
          6          8         1
          9          3         5
Sum       40         25        15        80
Mean      8          5         3         5.333
Solution: six-step rule
1. H0: 𝜇1 = 𝜇2 = 𝜇3
   Ha: at least one of the means is different
2. 𝛼 = 0.05
3. Test statistic: one-way ANOVA (F test)
4. Computation

SSTotal = ΣX² − (ΣX)²/N

SST = (9² + 8² + 8² + 6² + 9² + 5² + 4² + ⋯ + 1² + 5²) − 80²/15
    = (81 + 64 + 64 + ⋯ + 1 + 25) − 6400/15
    = 520 − 426.667 = 93.333

SSBetween = (ΣX₁)²/n₁ + (ΣX₂)²/n₂ + ⋯ + (ΣXₐ)²/nₐ − (ΣX)²/N

SSB = (40²/5 + 25²/5 + 15²/5) − 80²/15
    = (320 + 125 + 45) − 426.667 = 63.333

SSWithin = SSTotal − SSBetween

SSW = 93.333 − 63.333 = 30.000

Source of Variation    SS        df             MS                   Fobt (computed)           Fcrit (from table)
Between Groups         63.333    3 − 1 = 2      63.333/2 = 31.667    31.667/2.500 = 12.667     3.885 ≈ 3.89
Within Groups          30.000    15 − 3 = 12    30/12 = 2.500
Total                  93.333    15 − 1 = 14

5. Decision: Since Fobt (12.667) is greater than Fcrit (3.89), we reject H0.

6. Interpretation: There is a significant difference among the treatments. (A post hoc test such as the LSD test can be used to determine which treatments differ.)
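As a cross-check on the hand computation, the same one-way ANOVA can be run with SciPy's f_oneway function; a brief sketch (the variable names are mine):

```python
from scipy.stats import f_oneway

placebo = [9, 8, 8, 6, 9]
drug_a = [5, 4, 5, 8, 3]
drug_b = [2, 4, 3, 1, 5]

result = f_oneway(placebo, drug_a, drug_b)  # one-way ANOVA on independent groups
print(result.statistic)  # close to the hand-computed F of about 12.67
print(result.pvalue)     # p < 0.05, consistent with rejecting H0
```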

Example 2 (Different Situations and Stress)


Suppose you are interested in determining whether certain situations
produce differing amounts of stress. You know the amount of the hormone
corticosterone circulating in the blood is a good measure of how stressed a
person is. You randomly assign 15 students into three groups of 5 each. The
students in group 1 have their corticosterone levels measured immediately after
returning from vacations (low stress). The students in group 2 have their
corticosterone levels measured after they have been in class for a week
(moderate stress). The students in group 3 are measured immediately before final
exam week (high stress). All measurements are taken at the same time of day.
You record the data shown in the table below. Scores are in milligrams of corticosterone per 100 millilitres of blood.

1. What is the null hypothesis?
2. What is the alternative hypothesis?
3. What is the conclusion? Use 𝛼 = 0.05.
Table:
Group 1 Group 2 Group 3
Vacation Class Final Exam
3 10 10
2 8 13
7 7 14
2 5 13
6 10 15

Solution
1. Null hypothesis: The null hypothesis states that the different situations affect
stress equally. Therefore, the three sample sets of scores are random samples
from populations where
𝜇1 = 𝜇2 = 𝜇3
2. Alternative hypothesis: The alternative hypothesis states that at least one of
the situations affects stress differently than at least one of the remaining
situations. Therefore, at least one of the means 𝜇1, 𝜇2, and 𝜇3 differs from at least one of the others.

3. Conclusion, using 𝛼 = 0.05: The conclusion is reached in the same general way as with the other inference tests. First, we calculate the appropriate statistic, in this case Fobt, and then we evaluate it based on its sampling distribution.
Computation:

          Group 1            Group 2            Group 3
          Vacation           Class              Final Exam
          X₁      X₁²        X₂      X₂²        X₃      X₃²
          3       9          10      100        10      100
          2       4          8       64         13      169
          7       49         7       49         14      196
          2       4          5       25         13      169
          6       36         10      100        15      225
Sum       20      102        40      338        65      859
Mean      4       20.4       8       67.6       13      171.8
SST = (3² + 2² + 7² + 2² + 6² + 10² + 8² + ⋯ + 13² + 15²) − 125²/15
    = 1299 − 1041.667 = 257.333

SSB = (20²/5 + 40²/5 + 65²/5) − 125²/15
    = (80 + 320 + 845) − 1041.667 = 203.333

SSW = 257.333 − 203.333 = 54.000
ANOVA TABLE

Source of Variation    SS          df    MS          Fobt        Fcrit
Between Groups         203.3333    2     101.6667    22.59259    3.885294
Within Groups          54          12    4.5
Total                  257.3333    14

Since the computed F (22.593) is greater than the critical F (3.885), we reject H0.
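The critical value (3.885) and the p-value can also be obtained from the F distribution directly rather than from a printed table. A minimal sketch, assuming SciPy is available:

```python
from scipy.stats import f

alpha = 0.05
df_between = 2   # C - 1 = 3 - 1
df_within = 12   # N - C = 15 - 3

f_crit = f.ppf(1 - alpha, df_between, df_within)  # critical value, upper tail
p_value = f.sf(22.593, df_between, df_within)     # p-value of the obtained F

print(round(f_crit, 3))  # about 3.885
print(p_value)           # far below 0.05, consistent with rejecting H0
```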

Self Assessment/ Activity


1. A college professor wants to determine the best way to present an important
topic to his class. He has the following three choices: (1) he can lecture, (2) he
can lecture and assign supplementary reading, or (3) he can show a film and
assign supplementary reading. He decides to do an experiment to evaluate the
three options. He solicits 27 volunteers from his class and randomly assigns 9 to
each of three conditions. In condition 1, he lectures to the students. In condition
2, he lectures plus assigns supplementary reading. In condition 3, the students see
a film on the topic plus receive the same supplementary reading as the students
in condition 2. The students are subsequently tested on the material. The following
scores (percentage correct) were obtained in the table
a. What is the overall null hypothesis?
b. What is the conclusion? Use level of significance 0.05.

2. An instructor is teaching three sections of Introductory Psychology, each section covering the same material. She has made up a different final
exam for each section, but she suspects that one of the versions is more
difficult than the other two. She decides to conduct an experiment to
evaluate the difficulty of the exams. During the review period, just before
finals, she randomly selects five volunteers from each class. Class 1
volunteers are given version 1 of the exam; class 2 volunteers get version 2,
and class 3 volunteers receive version 3. Of course, all volunteers are sworn
not to reveal any of the exam questions, and also, of course, all of the
volunteers will receive a different final exam from the one they took in the
experiment. The following are the results.

a. What is the overall null hypothesis?


b. What is the conclusion? Use level of significance 0.05.

RELATIONSHIP BETWEEN ANOVA AND THE t TEST
When a study involves just two independent groups and we are testing the
null hypothesis that μ1 = μ2 we can use either the t test for independent groups or
the analysis of variance. In such situations, it can be shown algebraically that
𝑡² = 𝐹
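This relationship is easy to verify numerically. A minimal sketch (the two groups are simply the Vacation and Class data from the stress example, reused here for illustration):

```python
from scipy.stats import ttest_ind, f_oneway

group1 = [3, 2, 7, 2, 6]    # Vacation
group2 = [10, 8, 7, 5, 10]  # Class

t_res = ttest_ind(group1, group2)  # independent-groups t test (equal variances assumed)
f_res = f_oneway(group1, group2)   # one-way ANOVA on the same two groups

print(t_res.statistic ** 2)  # t squared
print(f_res.statistic)       # F; the two values agree apart from rounding
```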
SIZE OF EFFECT USING 𝜔² (Omega Squared) AND 𝜂² (Eta Squared)
We have already discussed the size of the effect of the X variable on the Y
variable in conjunction with correlational research when we discussed the
coefficient of determination 𝑟². You will recall that 𝑟² is a measure of the
proportion of the total variability of Y accounted for by X and hence is a measure
of the strength of the relationship between X and Y. If the X variable is causal with
regard to the Y variable, the coefficient of determination is also a measure of the
size of the effect of X on Y.
The situation is very similar when we are dealing with the one-way,
independent groups ANOVA. In this situation, the independent variable is the X
variable and the dependent variable is the Y variable. One of the statistics
computed to measure size of effect in the one-way, independent groups ANOVA is omega squared (𝜔²). The other is eta squared (𝜂²), which we discuss in the next section. Conceptually, 𝜔² and 𝜂² are like 𝑟² in that each provides an estimate of the proportion of the total variability of Y that is accounted for by X. 𝜔² is a relatively unbiased estimate of this proportion in the population, whereas the estimate provided by 𝜂² is more biased. The conceptual equation for 𝜔² is given by

𝜔² = 𝜎B² / (𝜎B² + 𝜎W²)
Since we do not know the values of these population variances, we estimate them from the
sample data. The resulting equation is
𝜔² = (SSB − (C − 1)MSW) / (SST + MSW)
Example: Stress Experiment
Let's compute the size of effect 𝜔² for the stress experiment.

ANOVA TABLE

Source of Variation    SS          df    MS          Fobt        Fcrit
Between Groups         203.3333    2     101.6667    22.59259    3.885294
Within Groups          54          12    4.5
Total                  257.3333    14

𝜔² = (SSB − (C − 1)MSW) / (SST + MSW) = (203.333 − (3 − 1)(4.5)) / (257.333 + 4.5) = 194.333/261.833 = 0.742

Thus, the estimate provided by 𝜔² tells us that the stress situations account for 0.742, or 74.2%, of the variance in corticosterone levels. Referring to Table 15.4, since the value of 𝜔² is greater than 0.14, this is considered a large effect.

Eta Squared, 𝜂²
Eta squared is an alternative measure for determining size of effect in one-way, independent groups ANOVA experiments. It also provides an estimate of the proportion of the total variability of Y that is accounted for by X, and is very similar to 𝜔². However, it gives a more biased estimate than 𝜔², and the biased estimate is usually larger than the true size of the effect. Nevertheless, it is quite easy to calculate, has been around longer than 𝜔², and is still commonly used. Hence, we have included a discussion of it here. The equation for computing 𝜂² is given by

𝜂² = SSB / SST
Stress Experiment
This time, let's compute 𝜂² for the data of the stress experiment.

ANOVA TABLE

Source of Variation    SS          df    MS          Fobt        Fcrit
Between Groups         203.3333    2     101.6667    22.59259    3.885294
Within Groups          54          12    4.5
Total                  257.3333    14

Computing the value of 𝜂² for these data, we obtain

𝜂² = SSB / SST = 203.333 / 257.333 = 0.790
Based on 𝜼𝟐 , the stress situations account for 0.790 or 79.0% of the variance
in corticosterone levels. According to Cohen’s criteria (see Table 15.4), this value
of 𝜼𝟐 also indicates a large effect. Note, however, that the value of 𝜼𝟐 is larger
than the value obtained for 𝝎𝟐 , even though both were calculated on the same
data. Because 𝝎𝟐 provides a more accurate estimate of the size of effect, we
recommend its use over 𝜼𝟐 .
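Both effect-size measures can be computed directly from quantities already in the ANOVA table. A minimal sketch using the stress-experiment values (the function names are my own):

```python
def omega_squared(ss_between, ss_total, ms_within, c):
    """Relatively unbiased estimate of the proportion of variance accounted for."""
    return (ss_between - (c - 1) * ms_within) / (ss_total + ms_within)

def eta_squared(ss_between, ss_total):
    """Simpler but more biased estimate; usually larger than omega squared."""
    return ss_between / ss_total

# Values taken from the stress-experiment ANOVA table
ss_b, ss_t, ms_w, c = 203.333, 257.333, 4.5, 3

print(round(omega_squared(ss_b, ss_t, ms_w, c), 3))  # about 0.742
print(round(eta_squared(ss_b, ss_t), 3))             # about 0.790
```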

MULTIPLE COMPARISONS
In one-way ANOVA, a significant F value indicates that all the conditions
do not have the same effect on the dependent variable. For example, in the
illustrative experiment presented earlier in the chapter that investigated the
amount of stress produced by three situations, a significant F value was obtained
and we concluded that the three situations were not the same in the stress levels
they produced. For pedagogical reasons, we stopped the analysis at this
conclusion.
However, in actual practice, the analysis does not ordinarily end at this
point. Usually, we are also interested in determining which of the conditions differ
from each other. A significant F value tells us that at least one condition differs
from at least one of the others. It is also possible that they are all different or any
combination in between may be true. To determine which conditions differ,
multiple comparisons between pairs of group means are usually made. In the
remainder of this chapter, we shall discuss two types of comparisons that may be
made: a priori comparisons and a posteriori comparisons.
A Priori, or Planned, Comparisons
A priori comparisons are planned in advance of the experiment and often
arise from predictions based on theory and prior research. With a priori
comparisons, we do not correct for the higher probability of a Type I error that
arises due to multiple comparisons, as is done with the a posteriori methods. This
correction, which we shall cover in the next section, in effect makes it harder for
the null hypothesis to be rejected. When doing a priori comparisons, statisticians
do not agree on whether the comparisons must be orthogonal (i.e.,
independent). We have followed the position taken by Keppel and Winer that
planned comparisons need not be orthogonal as long as they flow meaningfully
and logically from the experimental design and are few in number.
In doing planned comparisons, the t test for independent groups is used. We could calculate tobt in the usual way. For example, in comparing conditions 1 and 2, we could use the t test equation for independent groups, with the within-groups variance estimate from the ANOVA (MSW) serving as the estimate of the population variance:

tobt = (X̄1 − X̄2) / √(MSW(1/n1 + 1/n2))
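A sketch of such a planned comparison in Python (one common approach; the variable names are mine), comparing the Vacation and Class conditions of the stress experiment:

```python
from math import sqrt
from scipy.stats import t

# Planned comparison: Vacation vs. Class from the stress experiment
mean1, mean2 = 4.0, 8.0
n1, n2 = 5, 5
ms_within = 4.5     # within-groups variance estimate from the ANOVA
df_within = 12      # N - C

# t statistic using MS_within as the estimate of the population variance
t_obt = (mean1 - mean2) / sqrt(ms_within * (1 / n1 + 1 / n2))

# Two-tailed critical value at alpha = 0.05 with the within-groups df
t_crit = t.ppf(1 - 0.05 / 2, df_within)

print(round(t_obt, 3), round(t_crit, 3))
# |t_obt| of about 2.98 exceeds t_crit of about 2.179, so the comparison is significant
```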

A Posteriori, or Post Hoc, Comparisons


When the comparisons are not planned in advance, we must use an a
posteriori test. These comparisons usually arise after the experimenter sees the
data and picks groups with mean scores that are far apart, or else they arise from
doing all the comparisons possible with no theoretical a priori basis. Since these
comparisons were not planned before the experiment, we must correct for the
inflated probability values that occur when doing multiple comparisons, as
mentioned in the previous section.
Many methods are available for achieving this correction. The topic is fairly
complex, and it is beyond the scope of this text to present all of the methods.
However, we shall present two of the most commonly accepted methods: a
method devised by Tukey called the HSD (Honestly Significant Difference) test
and the Newman–Keuls test. Both of these tests are post hoc multiple comparison
tests. They maintain the Type I error rate at 𝛼 while making all possible comparisons
between pairs of sample means.
You will recall that the problem with doing multiple t test comparisons is that
the critical values of t were derived under the assumption that there are only two
samples whose means are to be compared. This would be accomplished by
performing one t test. When there are many samples and hence more than one
comparison, the sampling distribution of t is no longer appropriate. In fact, if it
were to be used, the actual probability of making a Type I error would greatly
exceed alpha, particularly if many comparisons were made. Both the Tukey and
Newman–Keuls methods avoid this difficulty by using sampling distributions based
on comparing the means of many samples rather than just two. These
distributions, called the Q or Studentized range distributions, were developed by
randomly taking k samples of equal n from the same population (rather than just
two, as with the t test) and determining the difference between the highest and
lowest sample means. The differences were then divided by the estimated standard error, √(MSW/n), producing
distributions that were like the t distributions except that these provide the basis
for making multiple comparisons, not just a single comparison as in the t test. The
95th and 99th percentile points for the Q distribution are given in Table G in
Appendix D. These values are the critical values of Q for the 0.05 and 0.01 alpha
levels. As you might guess, the critical values depend on the number of sample means and the degrees of freedom associated with MSW, the within-groups variance estimate.
In discussing the HSD and Newman–Keuls tests, it is useful to distinguish between
two aspects of Type I errors: the experiment-wise error rate and the comparison-
wise error rate.

De f i n i t i o n s
■ The experiment-wise error rate is the probability of making one or more Type I
errors for the full set of possible comparisons in an experiment.
■ The comparison-wise error rate is the probability of making a Type I error for any
of the possible comparisons.
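The inflation of the experiment-wise error rate is easy to demonstrate with a short simulation, sketched below with made-up parameters (five groups drawn from one normal population): run every pairwise t test at 𝛼 = 0.05 and count how often at least one comparison is falsely declared significant.

```python
import numpy as np
from itertools import combinations
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)
k, n, alpha, reps = 5, 10, 0.05, 2000

experiments_with_error = 0
for _ in range(reps):
    # All k groups come from the same population, so every rejection is a Type I error
    groups = [rng.normal(loc=0.0, scale=1.0, size=n) for _ in range(k)]
    pvalues = [ttest_ind(a, b).pvalue for a, b in combinations(groups, 2)]
    if min(pvalues) < alpha:
        experiments_with_error += 1

# Each single comparison has about a 0.05 false-alarm rate (comparison-wise),
# but the chance of at least one false alarm per experiment is far higher.
print(experiments_with_error / reps)  # typically around 0.25 or more, well above 0.05
```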
The Tukey Honestly Significant Difference (HSD) Test
The Tukey Honestly Significant Difference test is designed to compare all possible pairs of means while maintaining the Type I error for making the complete set of comparisons at 𝛼. Thus, the HSD test maintains the experiment-wise Type I error rate at 𝛼. The statistic calculated for this test is Q. It is defined by the following equation:

Qobt = (X̄1 − X̄2) / √(MSW/n)

where X̄1 and X̄2 are the two means being compared, MSW is the within-groups variance estimate, and n is the number of scores in each group.
Note that in calculating Qobt, the smaller mean is always subtracted from
the larger mean. This always makes Qobt positive. Otherwise, the Q statistic is very
much like the t statistic, except it uses the Q distributions rather than the t
distributions. To use the statistic, we calculate Qobt for the desired comparisons
and compare Qobt with Qcrit, determined from Table G. The decision rule states
that if Qobt ≥ Qcrit, we reject H0. If not, then we retain H0.

To illustrate the use of the HSD test, we shall apply it to the data of the stress
experiment. For the sake of illustration, we shall assume that all three comparisons
are desired. There are two steps in using the HSD test. First, we must calculate the
Qobt value for each comparison and then compare each value with Qcrit.

          Group 1            Group 2            Group 3
          Vacation           Class              Final Exam
          X₁      X₁²        X₂      X₂²        X₃      X₃²
          3       9          10      100        10      100
          2       4          8       64         13      169
          7       49         7       49         14      196
          2       4          5       25         13      169
          6       36         10      100        15      225
Sum       20      102        40      338        65      859
Mean      4       20.4       8       67.6       13      171.8

ANOVA TABLE

Source of Variation    SS          df    MS          Fobt        Fcrit
Between Groups         203.3333    2     101.6667    22.59259    3.885294
Within Groups          54          12    4.5
Total                  257.3333    14

The calculations for Qobt are as follows:

Qobt = (X̄i − X̄j) / √(MSW/n)

Comparing Vacation and Class:
Qobt = (8 − 4) / √(4.5/5) = 4.22

Comparing Class and Final Exam:
Qobt = (13 − 8) / √(4.5/5) = 5.27

Comparing Vacation and Final Exam:
Qobt = (13 − 4) / √(4.5/5) = 9.49
For this experiment, alpha was set at 0.05. From Table G, with df = 12 and C = 3 (the number of means being compared), we obtain Qcrit = 3.77.
Since Qobt exceeds Qcrit = 3.77 for each comparison, we reject H0 in each case and conclude that 𝜇1 ≠ 𝜇2 ≠ 𝜇3; all three conditions differ in stress-inducing value.

Summary Table for the Post Hoc Analysis

Variable (I)    Mean    Compared with (J)    Mean    Mean Difference (I − J)    Qobt
Vacation        4.0     Class                8.0     4.0                        4.22*
Vacation        4.0     Final Exam           13.0    9.0                        9.49*
Class           8.0     Final Exam           13.0    5.0                        5.27*

* Significant: Qobt > Qcrit = 3.77
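The same post hoc analysis can be reproduced in Python; the sketch below assumes a recent SciPy release that provides the studentized range distribution, and the variable names are mine.

```python
from math import sqrt
from itertools import combinations
from scipy.stats import studentized_range

means = {"Vacation": 4.0, "Class": 8.0, "Final Exam": 13.0}
ms_within, n, df_within, k = 4.5, 5, 12, 3

# Critical Q from the studentized range (Q) distribution, as in Table G
q_crit = studentized_range.ppf(1 - 0.05, k, df_within)
print(round(q_crit, 2))  # about 3.77

denom = sqrt(ms_within / n)
for (name_i, m_i), (name_j, m_j) in combinations(means.items(), 2):
    q_obt = abs(m_i - m_j) / denom  # larger mean minus smaller mean
    print(f"{name_i} vs {name_j}: Qobt = {q_obt:.2f}, significant = {q_obt > q_crit}")
# Every Qobt exceeds Qcrit, matching the hand calculations above
```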
