Factorial Analysis of Variance (ANOVA) On SPSS



Practice reproducing the analyses
yourself:
2 Factor Between (2 levels x 2 levels).sav
2 Factor Between (2 levels x 3 levels).sav
3 Factor Between (2 levels x 2 levels x 2 levels).sav
2 Factor Within (2 levels x 2 levels).sav
All on Portal

Reading

http://www.socialresearchmethods.net/kb/expfact.htm - a
simple summary of factorial designs
http://davidmlane.com/hyperstat/index.html - see sections
11 & 12 for between subjects designs and section 13 for
within subjects (repeated measures) designs. This is
recommended: it's concise, clear and to the point. It also
contains a very good glossary from which you can quickly
refresh your memory for definitions of such things as the
Standard Error etc.
Chapters 10, 11 and 12 of Gravetter & Forzano cover between,
within, and factorial design issues.
Chapters 13, 14 and 15 of Gravetter & Wallnau cover the stats
(ANOVA etc.). However, don't get bogged down with formulas
for calculating sums of squares. See next slide.

Things you should know:


How to interpret interaction plots
How to interpret ANOVA tables and assumption tests
That the Total degrees of freedom is always N-1 (N = total
number of data points), and that in a between subjects design the
Error degrees of freedom is N minus the number of cells (e.g.
20 - 4 = 16 for a 2 x 2 design with 5 subjects per cell)
That the degrees of freedom for a test of a main effect of a
factor = the number of levels the factor contains - 1.
That the degrees of freedom for a test of an interaction between
two or more factors = (the number of levels in one factor - 1) x
(the number of levels in the other - 1) x etc. Thus the DF for a
3 way interaction between factors having 2, 2 and 4 levels is
1 x 1 x 3 = 3.
That ANOVA uses F tests and that the F statistic for any effect is
the Mean Square for the Effect divided by the Error mean Square:
MScondition/MSerror
That when you have an alpha level of .05 this means that, when
there is no real effect, the probability of not making a Type 1
error is 95% (.95) for each test you do
Thus if you have 20 F tests in your ANOVA table, and no real
effects, the probability of none of them being spurious is
.95 x .95 x .95 x .95 ... or .95^20, i.e. (1 - .05)^20, which is
only about .36 (treating the tests as independent)
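The arithmetic for 20 tests can be checked with a few lines of Python. This is only a sketch: it assumes the tests are independent and that no effect is real, and the function name is made up for illustration.

```python
def prob_no_type1_errors(alpha: float, n_tests: int) -> float:
    """Probability that none of n_tests independent tests gives a
    spurious significant result when every null hypothesis is true."""
    return (1.0 - alpha) ** n_tests


p_none = prob_no_type1_errors(0.05, 20)
print(round(p_none, 3))       # 0.358 -> chance that no test is spurious
print(round(1 - p_none, 3))   # 0.642 -> chance of at least one spurious result
```

So with 20 tests you are more likely than not to see at least one spurious "significant" effect.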

Things you needn't worry about:

The precise way that Sums of Squares are calculated
(But it will help your understanding of ANOVA if you at least
understand the gist of how variability is partitioned).
How Levene's test or Mauchly's test are calculated, only that
they test the assumption of homogeneity of variance for
between subjects designs and its (more or less) equivalent in
within subjects designs.
In the SPSS output you can largely ignore the following when
doing repeated measures analyses (at this stage at least):
The multivariate tests which you get at the beginning
Tests of within subject contrasts (although these can be a useful
tool for examining patterns in the data)
Any tests of between subjects effects that only involve an
intercept (i.e. you can ignore this output when all your factors
are within subjects)

1. Between Subjects Designs

2 Factor designs

Data Format
All scores in a single column
Additional columns for each Factor

Main assumptions of ANOVA:

There are 3 main assumptions underlying ANOVA

1. Homogeneity of variance
The error variance within each condition should be
statistically equal. Thus any differences between
conditions should only be a shift in the mean. Put
another way, the effect of treatment/condition
manipulations is to add a constant to each
individual's score.

[Figure: example distributions for the homogeneity of variance assumption, one labelled OK and one labelled NOT OK]

2. Normality
The distribution of errors within each condition
should be normal. By errors we mean deviations
from the mean for that condition.
Because the errors are the deviations from the
condition means, this is equivalent to saying that
the scores should be distributed normally about
the condition means.

3. Independence of observations
The data points should represent independent
observations. Knowing the value of one should
not tell you anything about the value of any
other.
N.B. This assumption is obviously violated in
repeated measures experiments (because
knowing that one data point comes from subject
x, who might be a particularly fast responder,
say, does tell you something about the likelihood
of another observation from subject x being
relatively fast). This is why Subjects have to be
included as a factor in the analysis of repeated
measures designs: the non-independent
component is partialled out.

Design

Experiment to investigate the effect of stimulus duration and
modality (Word vs Picture) on Recognition performance.

Dependent Variable (Score)
Two Factors: Modality and Duration

Factor Levels

Modality: two levels (Word, Picture)
Duration: two levels (200 msec, 800 msec)
= 2 x 2 design

                 Duration
Modality      200 ms        800 ms
Pictures      5 subjects    5 subjects
Words         5 subjects    5 subjects

Data entry

View Factor Level Labels

This person scored 127.19 and was tested in the word
modality and with the 800 msec duration

Analyse / General Linear Model / Univariate

Dependent Variable: Score
Fixed Factors: duration + modality

Plots: main effect plot for modality, main effect plot for duration, and the interaction plot

Options: Condition means, descriptive stats, test for homogeneity (equality) of variances.

Means: Displays the overall mean, the means for each level of duration, the means for each level of modality and the means for each combination of duration by modality (= the interaction means).

Homogeneity Test: Produces Levene's test for homogeneity of variance (one of the assumptions of ANOVA, i.e. that the variances within each cell of the design are not significantly different).

Descriptive stats: Gives descriptive statistics (mean, max, min, SD etc.) by the experimental groups.

Output: Factors and Factor level labels

Output: Descriptives (cell means & SDs)

Output: Levene's test. This significant result means the assumption of equal group variances has not been met.

In this case the analysis is not valid! A data transformation may be of use here.

Some cell SDs are considerably different (marked ** in the descriptives table).

At this point either:
Abandon the analysis
See if a data transformation removes the problem (e.g. Log(score))
Report results but with extreme caution
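To see why a log transformation can help, here is a small Python sketch (not SPSS). The two cells of scores are invented purely for illustration, chosen so that the spread grows with the mean; the helper name sd_ratio is also made up.

```python
import math
import statistics

# Hypothetical scores for two cells whose spread grows with the mean.
cell_a = [10, 12, 14, 16, 18]
cell_b = [100, 130, 160, 190, 220]


def sd_ratio(x, y):
    """Larger sample SD divided by the smaller one (1.0 = identical spread)."""
    sx, sy = statistics.stdev(x), statistics.stdev(y)
    return max(sx, sy) / min(sx, sy)


raw_ratio = sd_ratio(cell_a, cell_b)
log_ratio = sd_ratio([math.log(v) for v in cell_a],
                     [math.log(v) for v in cell_b])

print(round(raw_ratio, 1))     # 15.0 -> SDs differ by a factor of 15
print(log_ratio < raw_ratio)   # True -> far more similar after taking logs
```

After transforming you would rerun the analysis, including Levene's test, on the transformed scores.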

2 Factor Between (2 levels x 2 levels).sav

Assume we have different data:

Levene's test, and any test that checks assumptions for an analysis, should not be significant. Here the p value of .271 says that there is no evidence for any differences in variances between the groups, which is what we want.

ANOVA Table
(Ignore shaded items)

Test for the Main Effect of Duration (i.e. 200 vs 800 msec, pooling across both Modalities)
Significant effect of Duration, F(1,16) = 5.5, p = .032

There was a significant effect of Stimulus Duration. Participants who viewed the stimulus for 200 msec scored higher (M = 134) than those who viewed it for 800 msec (M = 115), F(1,16) = 5.5, p = .032.

[Duration Profile Plot: Estimated Marginal Means of SCORE against DURATION (200, 800). This difference is significant.]

ANOVA Table
(Ignore shaded items)

Test for the Main Effect of Modality (i.e. Pictures vs Words, pooling across both Durations).
No significant effect of Modality.

[Profile Plot for Modality: Estimated Marginal Means of SCORE against MODALITY (Picture, Word), y axis running only from 123.5 to 126.0. This difference is not significant. Check the scale!]

Any graphs you present should use the same scale. By default SPSS changes the scale so that the data takes up the whole graph area. Here are the two graphs on the same scale:

[Two plots of Estimated Marginal Means of SCORE, both on a 110 to 140 scale: Modality (Picture, Word) and Duration (200, 800)]

ANOVA Table
(Ignore shaded items)

Test for the Interaction between Modality and Duration.
There was a significant two-way interaction between modality and duration, F(1,16) = 7.2, p = .017.

[Profile Plot of Modality by Duration interaction: Estimated Marginal Means of SCORE against DURATION (200, 800), with separate lines for MODALITY (Picture, Word)]

Main effect of Duration is still observable in the graph

[Interaction plot of SCORE against DURATION (200, 800) with lines for MODALITY (Picture, Word), with the 200 msec Average and 800 msec Average marked, shown next to the Duration main effect plot]

Interpretation of the Modality by Duration Interaction

Several ways of describing the interaction:

At the 200 msec duration pictures resulted in scores approximately 20 points higher than words, whereas at the 800 msec duration the opposite pattern was true, with words producing scores approximately 20 points above pictures, F (1,16) = 7.2, p = .017.

For words there was a small increase in performance going from the 200 msec to the 800 msec duration. With pictures, however, there was a large decrease in performance.

Alternative Plot: same data

[Plot: Estimated Marginal Means of SCORE against MODALITY (Picture, Word), with separate lines for DURATION (200, 800)]

At the 200 msec duration performance was better with pictures (M = 144) than words (M = 124), whereas at the 800 msec duration the opposite was true, with words giving better performance (M = 127) than pictures (M = 103), F (1,16) = 7.2, p = .017.
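The link between the four cell means and the main effect means can be made concrete with a short Python sketch. It uses the rounded cell means quoted on these slides and assumes equal cell sizes (5 per cell), so each marginal mean is just the average of the relevant cell means; the function name is made up for illustration.

```python
# Rounded cell means from the slides, keyed by (duration, modality).
cell_means = {
    ("200", "Picture"): 144, ("200", "Word"): 124,
    ("800", "Picture"): 103, ("800", "Word"): 127,
}


def marginal_mean(level, position):
    """Average the cell means whose key has `level` at index `position`
    (0 = duration, 1 = modality). Valid because cell sizes are equal."""
    values = [m for key, m in cell_means.items() if key[position] == level]
    return sum(values) / len(values)


print(marginal_mean("200", 0), marginal_mean("800", 0))        # 134.0 115.0
print(marginal_mean("Picture", 1), marginal_mean("Word", 1))   # 123.5 125.5

# Interaction contrast: the picture-minus-word difference at 200 msec
# minus the same difference at 800 msec. Zero would mean parallel lines.
contrast = ((cell_means[("200", "Picture")] - cell_means[("200", "Word")])
            - (cell_means[("800", "Picture")] - cell_means[("800", "Word")]))
print(contrast)   # 44
```

The duration means (134 vs 115) are the ones reported for the main effect of duration, and the large contrast reflects the crossing lines in the interaction plot.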

Extension to factors with 3 Levels

10 extra participants at
500 msec duration - 5
with Words, 5 with Pictures

2 Factor Between (2 levels x 3 levels)

The analysis is the same; however, the interpretation of the main effect of DURATION is a little more complex:

Note the increased Degrees of Freedom for Duration and the interaction
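The degrees of freedom in the table can be worked out by hand. The following Python sketch does the bookkeeping for a fully between subjects design with equal cell sizes; the function name and the dictionary layout are illustrative choices, not anything SPSS produces.

```python
from itertools import combinations
from math import prod


def between_subjects_df(levels, n_per_cell):
    """Degrees of freedom for every effect in a between subjects factorial
    design. `levels` maps factor name -> number of levels."""
    cells = prod(levels.values())
    n_total = cells * n_per_cell
    df = {}
    for size in range(1, len(levels) + 1):
        for combo in combinations(levels, size):
            # Effect df = product of (levels - 1) over the factors involved.
            df[" x ".join(combo)] = prod(levels[f] - 1 for f in combo)
    df["Error"] = n_total - cells   # N minus the number of cells
    df["Total"] = n_total - 1       # N - 1
    return df


print(between_subjects_df({"duration": 3, "modality": 2}, n_per_cell=5))
# {'duration': 2, 'modality': 1, 'duration x modality': 2, 'Error': 24, 'Total': 29}
```

With three duration levels the duration and interaction tests each have 2 df, and the error df is 30 - 6 = 24, which is where F (2,24) comes from.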

Duration Profile Plot:

A significant F test only says that not all the means are equal

To examine individual pair-wise comparisons:

1. If you make a priori predictions about which means you are interested in comparing:
You can use simple t tests (LSD) for 3 means
Sidak or Bonferroni for a greater number of comparisons.
2. If you want to make post hoc comparisons:
You can use Tukey's Test
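As a rough guide to how strict the Bonferroni and Sidak corrections are, this Python sketch computes the per-comparison alpha each one implies for a given number of comparisons. It uses the textbook formulas; the function names are made up, and it is not a description of how SPSS presents its output.

```python
def bonferroni_alpha(alpha: float, m: int) -> float:
    """Per-comparison alpha that keeps the familywise rate at or below alpha."""
    return alpha / m


def sidak_alpha(alpha: float, m: int) -> float:
    """Per-comparison alpha for m independent comparisons (slightly less
    strict than Bonferroni)."""
    return 1 - (1 - alpha) ** (1 / m)


# Three pair-wise comparisons among the 200, 500 and 800 msec means.
print(round(bonferroni_alpha(0.05, 3), 4))   # 0.0167
print(round(sidak_alpha(0.05, 3), 4))        # 0.017
```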

Note that the more conservative Tukey test only finds one significant difference whereas LSD finds two. Note that the Tukey test requires equal sample sizes.

There was a significant main effect of stimulus duration, F (2,24) = 8.07, p = .002. Post hoc comparisons using Tukey's HSD showed that only the difference between the 200 (M = 134.3) and 500 (M = 101.8) durations was significant, p = .001.

3 Factor Designs

3 Factor Between (2 levels x 2 levels x 2 levels).sav

Adding a third factor, Noise, with two levels (Low, High) requires doubling the number of subjects, assuming you still want 5 in each cell.
In the following analysis, for ease of interpretation, we will go back to having just two levels of the duration factor (200 vs. 800).

Logic of the analysis is the same but we now have:
3 possible main effects:
Duration
Modality
Noise
3 possible 2-way interactions:
Duration x modality
Duration x noise
Modality x noise
1 possible 3-way interaction:
Duration x modality x noise
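The list of possible effects is just every combination of the factors, which a short Python sketch can enumerate. The function name is illustrative only.

```python
from itertools import combinations


def effects_by_order(factors):
    """Map 1 -> main effects, 2 -> two-way interactions, and so on."""
    return {order: [" x ".join(c) for c in combinations(factors, order)]
            for order in range(1, len(factors) + 1)}


for order, effects in effects_by_order(["duration", "modality", "noise"]).items():
    print(order, effects)
# 1 ['duration', 'modality', 'noise']
# 2 ['duration x modality', 'duration x noise', 'modality x noise']
# 3 ['duration x modality x noise']
```

That is 7 F tests from a single 3 factor ANOVA, which is worth bearing in mind given the Type 1 error rate per test.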

Both main effects of duration and noise significant. 3-way interaction also significant.

Interpreting 3-Way interactions.

Much easier if you have some predictions about the expected pattern
For instance, in this example we might predict that, as well as generally decreasing performance, high levels of noise might obscure any differences between the picture and word conditions:

3-way interaction is a difference in the pattern of a 2-way interaction at levels of the third factor

There was a significant 3 way interaction between duration, modality and noise, F (1,32) = 4.5, p = .041. In the low noise condition pictures and words produced opposite effects on performance at the two durations. At stimulus presentations of 200 msec words gave rise to performance some 20 points lower than pictures, whereas the reverse pattern was true for the 800 msec duration. With high noise, however, there was very little ...

If you want to provide a bit more weight to your conclusions concerning the interpretation of the 3-way interaction you could perform a simple interaction effects analysis.
This is actually very easy
You just run two separate ANOVAs, one at each level of (in this example) the noise factor.
Each of these analyses has the factors duration and modality, but one uses the data from the high noise condition and the other from the low noise condition.
You then interpret the 2-way interactions between duration and modality at each level of noise

[Figure: the low noise and high noise interaction plots, with one ANOVA run on each set of data]

Can then say whether it is true that the interaction on the left (low noise) is significant whilst the one on the right (high noise) is not.
There is one catch: the F ratio for the 2-way interactions in each separate analysis needs to be computed using the MSerror from the original analysis.

Original 3 Factor ANOVA

MSerror from the original analysis = 400.8 on 32 DF

You now need to run the two separate 2 way ANOVAs on the data from the high and low noise conditions.
On SPSS the easiest way to do this is to first split the data using the Split File command (Data / Split File).

Any subsequent commands, be they Tables, Plots or, as in this case, ANOVAs, will now be done separately for each level of the grouping variable (noise):

Having split the data file by the noise variable you now simply perform a 2 way ANOVA, with factors duration and modality as before:
Analyse / General Linear Model / Univariate

The noise factor is left out as it is the one used to split the file

SPSS will now compute the two 2 way ANOVAs

This table is simply 2 ANOVA tables put together: one for the low noise data and one for the high noise data

However, the F ratios are wrong as they need to be computed using the MSerror from the original 3 way ANOVA


F ratios are simply the result of dividing the Mean Square for the effect by the error Mean Square (MSerror)
E.g. the duration F ratio is simply MSduration / MSerror
For the simple interaction effects follow up we need to compute our own F ratios for the modality by duration interactions at each noise level by substituting the MSerror from the original analysis.

MSerror from the original analysis = 400.8 on 32 DF

For the low noise interaction the correct F ratio is 2393.978 / 400.8 = 5.97
For the high noise interaction the correct F ratio is 129.97 / 400.8 = .32
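The same division can be written out as a short Python sketch, using the mean squares quoted above. The variable names are made up for illustration.

```python
# Error mean square from the original 3 factor ANOVA (32 df).
MS_ERROR_ORIGINAL = 400.8

# Modality x duration interaction mean squares from the two split-file ANOVAs.
ms_interaction = {"low noise": 2393.978, "high noise": 129.97}

# F = MS for the effect / MS error from the ORIGINAL analysis.
f_ratios = {level: ms / MS_ERROR_ORIGINAL for level, ms in ms_interaction.items()}

for level, f in f_ratios.items():
    print(level, round(f, 2))
# low noise 5.97
# high noise 0.32
```

Both F ratios are tested on 1 and 32 df, because the error term (and its df) comes from the original analysis.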

To work out the p value you need either to look it up in F tables,
or to calculate the exact probability (very easily) using a package such as Excel:

E.g. To calculate the p value associated with the low noise modality x duration interaction:
The value we got was 5.97
This is based on 1 df for the effect and 32 df for error
Click in any cell in Excel and type:
=FDIST(5.97,1,32) and press return
NB. Don't forget the = at the start of the formula
Excel then gives the answer.
The simple interaction effect at the low noise level was significant, F (1,32) = 5.97, p = .02.
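If you do not have Excel to hand, the same upper-tail probability can be computed in plain Python. The sketch below uses the standard relationship between the F distribution and the regularised incomplete beta function, evaluated with a continued fraction; the function names are my own, and this is an illustration rather than a replacement for a statistics package.

```python
import math


def _beta_cf(a, b, x, max_iter=200, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularised incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    bt = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                  + a * math.log(x) + b * math.log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _beta_cf(a, b, x) / a
    return 1.0 - bt * _beta_cf(b, a, 1.0 - x) / b


def f_upper_tail(f, df_effect, df_error):
    """P(F >= f): the same quantity Excel's FDIST returns."""
    x = df_error / (df_error + df_effect * f)
    return reg_inc_beta(df_error / 2.0, df_effect / 2.0, x)


print(round(f_upper_tail(5.97, 1, 32), 3))   # 0.02
```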

Repeated Measures designs

These are where the same subject is tested in the different experimental conditions
Advantages are that the test is more sensitive
Disadvantages: things like order effects, practice effects etc.
Not always possible in principle, e.g. if taking part in one condition exposes subjects to information that will ruin them for any other condition

2 Factor Within (2 levels x 2 levels).sav

Test is more sensitive because:

Individual differences are controlled for, e.g. suppose a reaction time study: some people are just faster average responders than others. What we are usually interested in is the relative effect of a treatment on performance.
Repeated measures (or within subjects) designs examine the relative effect of conditions on individuals
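A small Python sketch shows why controlling for individual differences makes the test more sensitive. The reaction times below are invented for illustration: subjects differ a lot in overall speed, but each is about 20 msec slower in condition B.

```python
import statistics

# Hypothetical mean reaction times (msec) for 5 subjects in two conditions.
cond_a = [400, 450, 500, 550, 600]
cond_b = [422, 468, 521, 569, 620]

# Between subjects view: the spread of the raw scores is dominated by
# individual differences in overall speed.
sd_raw = statistics.stdev(cond_a + cond_b)

# Within subjects view: each subject acts as their own control, so only
# the spread of the B - A difference scores matters.
diffs = [b - a for a, b in zip(cond_a, cond_b)]
sd_diff = statistics.stdev(diffs)

print(statistics.mean(diffs))   # 20
print(sd_diff < sd_raw)         # True
```

The 20 msec effect is tiny next to the spread of the raw scores but large next to the spread of the difference scores, which is the comparison a repeated measures analysis makes.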

Repeated measures ANOVA on SPSS

Interpretation of effects from the ANOVA table is the same
Main difference is in the data entry
Designs can be all repeated measures or a mixture

E.g. A two factor repeated measures design could have
Both factors as repeated measures (or within subjects)
Or
One repeated measure and one between subjects measure

Both factors as repeated measures:

Each subject is tested under every combination
The order of the combinations would normally be randomised for each subject
Or
Pseudo-randomised so that equal numbers of subjects receive each order (this is the most common method)

Modality by Stimulus duration data

Assuming this experiment was carried out with both factors as repeated measures:

This is how the data is entered into SPSS. Each row represents scores from a single subject.

Each subject has 4 data points. These could be single scores or the average of many trials under that condition. The latter is common with measures such as RT, which are inherently noisy (i.e. you need to take the average of many raw data points to get a good estimate for that subject under those conditions).

Give the columns meaningful names: the first column contains data from Duration level 1 (200 msec) and Modality level 1 (picture). You can use shorthand for the actual column names and put the longer, more meaningful description as the variable label:

To avoid confusion later the columns should always be ordered in a hierarchy. Take a 3 Factor example (all with 2 levels, and where F1(1) = Factor 1 Level 1):

F1(1)                           F1(2)
F2(1)           F2(2)           F2(1)           F2(2)
F3(1)   F3(2)   F3(1)   F3(2)   F3(1)   F3(2)   F3(1)   F3(2)

To run the analysis:

First Factor is Duration and this has two levels


NB the first factor is the one at the top of the hierarchy:

[Figure: the same F1 / F2 / F3 hierarchy, with the rows numbered 1 (F1), 2 (F2) and 3 (F3) to show the order in which you define the factors in SPSS]

Second factor is modality with two levels

This sets up all the factors. Now click Define to tell SPSS where the columns are that correspond to each factor level combination

The first question mark is asking "where is the column containing the data from level 1 of factor 1 and level 1 of factor 2?" This is our column 1 (d1m1)

Note at the top, where it says Within-Subjects Variables, you get a reminder of which are the first and second factors. The order we defined the factors in was duration then modality, hence at the top we have (duration, modality). The numbers in the brackets refer to the levels of the corresponding factors.

The process continues until all the within subject variables have been set up. NB: only when you set up the factors in the data sheet according to the hierarchy, and define the factors starting from the top of the hierarchy, will they be in the correct order already.
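The hierarchy rule amounts to "the last factor changes fastest". This Python sketch generates the column order for any set of within subject factors. The name d1m1 comes from the slide above; the other names simply extend that pattern, and the function name is made up.

```python
from itertools import product


def column_names(factors):
    """`factors` is an ordered list of (prefix, n_levels) pairs, top of the
    hierarchy first. The last factor in the list changes fastest."""
    level_ranges = [range(1, n + 1) for _, n in factors]
    return ["".join(f"{prefix}{level}"
                    for (prefix, _), level in zip(factors, combo))
            for combo in product(*level_ranges)]


# Duration (d) defined first, then modality (m), each with 2 levels.
print(column_names([("d", 2), ("m", 2)]))
# ['d1m1', 'd1m2', 'd2m1', 'd2m2']
```

If the columns in the data sheet are in this order, and the factors are defined in the same order, the variables line up with the question marks in the dialog.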

Once set up you can use the plots and options (display means) in exactly the same way as with between subjects designs.

SPSS Output
This is not quite the same as for between subjects designs.
The first box just summarises the within-subjects factors and allows you to check that they have been entered in the right order:

You can ignore the multivariate tests output unless you have special reason to question certain assumptions.

Mauchly's Sphericity test is an important assumption test: it is the repeated measures equivalent of Levene's test for homogeneity of variance. IT SHOULD NOT BE SIGNIFICANT.
NB when, as in this case, a factor only has two levels, sphericity cannot be violated and there is never a problem. The dots in the SIG column simply mean that the test is not appropriate.

The Tests of Within Subjects Effects are where you find the significance tests for all your within subjects factors and any interactions involving any within subjects factor. Highlighted here is the test for a main effect of Duration.
If for any test there is no violation of sphericity use ...

Suppose the test of sphericity for the interaction had given a significant result (p < .05). Then when you came to interpret the interaction effect in the main ANOVA table you would use the Greenhouse-Geisser adjustment:

Notice also that in a repeated measures design any within-subjects variable has its own error term, and this should be checked when giving the DFs for a test:
E.g. "... interaction was significant, F (1,19) = ..."
Here the 1, as before, comes from the DF associated with the test of the interaction and the 19 comes from the DF associated with the specific error term
The Tests of Within-Subjects contrasts only really apply
When you have a factor with more than 2 levels
and
You want to test for a particular trend (e.g. that performance increases in a straight line (linear) fashion as drug dosage increases).

Plots and tables of means can be interpreted in exactly the same way as between subjects designs.
