
The Statistical Imagination

Chapter 12:
Analysis of Variance:
Differences among Means of
Three or More Groups

2008 McGraw-Hill

Analysis of Variance (ANOVA)


ANOVA is used to compare three or
more group means
Instead of comparing each group
mean to the others (as with a t-test),
ANOVA compares each group mean
to the grand mean, which is the mean
for all cases in the sample

Main Effects
In ANOVA, the difference between
each group mean and the grand
mean is a test effect; these
differences are called main effects
When the main effects are zero, this
indicates that there are no
differences among the means

The ANOVA Hypothesis Test


For the ANOVA test, the H0 states that
the population means of the groups
are equal
The H0 can also be stated as: the
main effects are equal to zero, or
there is no difference among the
means

The Idea Behind ANOVA


ANOVA hypothesizes about differences among
means, but its calculation is based on explaining
variance around the grand mean
For example, suppose that the overall or grand mean of
socioeconomic status (SES) of all household
heads is 45. Urban residents, however, average
50. The 5-point difference is called the main effect
of the category urban
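
The arithmetic behind a main effect can be sketched in a few lines of Python. The scores below are hypothetical, as are the suburban and rural groups; they were chosen only so that the grand mean is 45 and the urban mean is 50, matching the example above.

```python
# Hypothetical SES scores (not from the text), chosen so that the
# grand mean is 45 and the urban group mean is 50.
ses_by_residence = {
    "urban": [60, 50, 40],
    "suburban": [50, 45, 40],
    "rural": [45, 40, 35],
}

# Grand mean: the mean of all cases in the sample, ignoring groups.
all_scores = [score for scores in ses_by_residence.values() for score in scores]
grand_mean = sum(all_scores) / len(all_scores)

# Main effect of a group: its group mean minus the grand mean.
group_means = {g: sum(s) / len(s) for g, s in ses_by_residence.items()}
main_effects = {g: m - grand_mean for g, m in group_means.items()}

print(grand_mean)             # 45.0
print(main_effects["urban"])  # 5.0
```

If all three main effects were zero, every group mean would equal the grand mean, which is what the null hypothesis asserts.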


The Idea Behind ANOVA (cont.)
Shaneka, an urban dweller, scores 60. This
is 15 SES points more than the grand
mean of 45. This 15 SES points is her
deviation score, the difference between her
raw score and the overall mean
ANOVA determines whether it is feasible to
say that 5 SES points of her 15-point
deviation score are due to the fact that she
is an urban resident

The Idea Behind ANOVA (cont.)
The focus with ANOVA is on explaining
deviation scores
Deviation scores, when squared,
summed, and averaged for a group of
scores, make up the variance. Hence
the name analysis of variance

The Idea Behind ANOVA (cont.)
With ANOVA we are asserting that the
spread of scores is due to the main effects
of the groups, as illustrated in Figure 12-2
in the text
Can scores be explained by differences
between group classifications? If so, then
scores will cluster around group means
rather than the grand mean, and this
suggests a difference among means

The General Linear Model


The general linear model is a useful
framework for understanding ANOVA
The general linear model states that the best
prediction of an individual's score on a
dependent variable is the overall mean plus
an adjustment for the effects of group
membership on an independent variable


Applying the General Linear Model

For Shaneka, the urban resident with
an SES of 60, we decompose her
score into 45 points for the grand
mean and 5 points explained by
urban residence (the main effect of
urban). The remaining 10 points are
unexplained error
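
The decomposition of a single score can be written out as a small Python sketch. The function name `decompose` is hypothetical and used only for illustration; the numbers are the ones from Shaneka's example.

```python
def decompose(raw_score, group_mean, grand_mean):
    """Split a raw score into grand mean + main effect + unexplained error."""
    main_effect = group_mean - grand_mean  # explained by group membership
    error = raw_score - group_mean         # left over after the group adjustment
    return grand_mean, main_effect, error

# Shaneka: raw score 60, urban group mean 50, grand mean 45.
grand, effect, error = decompose(raw_score=60, group_mean=50, grand_mean=45)
print(grand, effect, error)  # 45 5 10

# The three parts always add back up to the raw score.
print(grand + effect + error)  # 60
```

The main effect and the error together equal the 15-point deviation score, so the model explains 5 of those 15 points.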

Calculating ANOVA Statistics


ANOVA calculations are summarized
in a source table
To obtain variances, we calculate
three parts of the variation (or sums
of squares) of the interval/ratio
dependent variable and divide them
by degrees of freedom

Sums of Squares
The three types of sums of squares
for ANOVA are:
1. the total sum of squares (SST)
2. the between-group or explained
sum of squares (SSB), and
3. the within-group or unexplained sum
of squares (SSW)

Calculating the SST


The total sum of squares (SST) is
calculated by summing the squared
deviation scores for all cases
The SST is the same sum of squares
calculated for the standard deviation
(Chapter 5)

Calculating the SSB


The between-group or explained sum
of squares (SSB) is calculated by
squaring the main effect assigned to each
case (that of its group) and summing these squares
The SSB is explained in the sense that
it is accounted for by differences
among the group means, as measured
by main effects

Calculating the SSW


The within-group or unexplained sum of
squares (SSW) is that part of the squared
deviation scores that is not accounted for
by main effects. It is unexplained error in
the prediction of scores
The SSW is most easily calculated by
subtracting the between-group sum of
squares from the total sum of squares
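
The three sums of squares can be sketched in Python as follows. The scores are hypothetical (three groups of three cases, grand mean 45) and serve only to make the arithmetic concrete.

```python
# Hypothetical SES scores for three groups (illustrative data only).
groups = {
    "urban": [60, 50, 40],
    "suburban": [50, 45, 40],
    "rural": [45, 40, 35],
}

all_scores = [x for scores in groups.values() for x in scores]
grand_mean = sum(all_scores) / len(all_scores)

# SST: squared deviation of every case from the grand mean, summed.
sst = sum((x - grand_mean) ** 2 for x in all_scores)

# SSB: every case carries its group's main effect, so each squared
# main effect is counted once per case in that group.
ssb = sum(
    len(scores) * (sum(scores) / len(scores) - grand_mean) ** 2
    for scores in groups.values()
)

# SSW by subtraction, as described above.
ssw = sst - ssb

# Cross-check: squared deviations of each case from its own group mean.
ssw_direct = sum(
    (x - sum(scores) / len(scores)) ** 2
    for scores in groups.values()
    for x in scores
)

print(sst, ssb, ssw)  # 450.0 150.0 300.0
print(ssw_direct)     # 300.0
```

The direct calculation is included only as a check that the total splits cleanly into explained and unexplained parts.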

Calculating the Mean Square Variance (MSV)
After sums of squares are computed, to
account for sample size and the number of
groups, these sums are divided by their
degrees of freedom. The resulting variances
are called mean square variances (MSV)
MSVB = the mean square variance between
groups
MSVW = the mean square variance within
groups
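
A minimal sketch of this division step is shown below. The slide does not spell out the degrees of freedom; the sketch assumes the usual one-way ANOVA values (number of groups minus 1 between groups, number of cases minus number of groups within groups). The function name and the input sums of squares are hypothetical.

```python
def mean_square_variances(ssb, ssw, n_cases, n_groups):
    """Divide each sum of squares by its degrees of freedom.

    Assumes the usual one-way ANOVA degrees of freedom:
    between groups = n_groups - 1, within groups = n_cases - n_groups.
    """
    df_between = n_groups - 1
    df_within = n_cases - n_groups
    return ssb / df_between, ssw / df_within

# Hypothetical sums of squares for 9 cases in 3 groups.
msv_between, msv_within = mean_square_variances(
    ssb=150, ssw=300, n_cases=9, n_groups=3
)
print(msv_between, msv_within)  # 75.0 50.0
```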

Calculating the F-Ratio Test Statistic
The test statistic for ANOVA is the F-ratio
statistic
This is the ratio of the mean square variance
between groups to the mean square
variance within groups: F = MSVB / MSVW
The p-value is determined using F-distribution curves
(Appendix B, Tables D and E)
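
The whole calculation, from raw scores to the F-ratio, can be sketched as one Python function. The function name `f_ratio` and the data are hypothetical, and the degrees of freedom are the usual one-way ANOVA values rather than something stated on the slide.

```python
def f_ratio(groups):
    """Return the one-way ANOVA F-ratio for a list of groups of scores."""
    all_scores = [x for g in groups for x in g]
    n, k = len(all_scores), len(groups)
    grand_mean = sum(all_scores) / n

    # Between-group (explained) and within-group (unexplained) sums of squares.
    ssb = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ssw = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)

    # Mean square variances: sums of squares over degrees of freedom.
    msv_between = ssb / (k - 1)
    msv_within = ssw / (n - k)
    return msv_between / msv_within

# Hypothetical SES scores: urban, suburban, rural.
f = f_ratio([[60, 50, 40], [50, 45, 40], [45, 40, 35]])
print(f)  # 1.5
```

With 2 and 6 degrees of freedom, an F of 1.5 falls well short of the .05 critical value of roughly 5.14 found in standard F tables, so these illustrative data would not lead to rejecting the null hypothesis.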

When to Use the F-ratio Test
In general, we use ANOVA and the F-ratio
when testing a hypothesis about the relationship
between a nominal/ordinal independent
variable with three or more categories
and an interval/ratio dependent
variable
ANOVA is a difference of means test
and a cousin of the t-test

When to Use the F-ratio Test (cont.)
1) Number of variables, samples, and populations:
a) One population with a single interval/ratio
dependent variable, comparing means for three or
more groups of a single nominal/ordinal
independent variable. Each group's sample must
be representative of its subpopulation, or
b) a single interval/ratio dependent variable whose
mean is compared for three or more populations
using representative samples

When to Use the F-ratio Test (cont.)
2) Sample size: generally no
requirements. However, the dependent
interval/ratio variable should not be highly
skewed within any group sample.
Moreover, range tests are unreliable unless
sample sizes of groups are about equal.
These restrictions are less important when
group sample sizes are large

When to Use the F-ratio Test (cont.)
3) Variances (and standard
deviations) of the groups must be equal.
This is the same restriction that applies to the t-test
(see equality of variances, Chapter
11)


Existence and Direction of the Relationship for ANOVA
Existence: Determined by using the
F-ratio to test the null hypothesis of
equal group means
Direction: Not applicable (because
the independent variable is nominal)


Strength of the Relationship for ANOVA
Strength: A strong relationship is one in
which a high proportion of the total
variance in the dependent interval/ratio
variable is accounted for by the group
variable
The correlation ratio, ε² (epsilon squared),
is a conservative measure that is unlikely
to overinflate the strength of the
relationship
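
The slide does not give a formula for epsilon squared. The sketch below assumes one common definition, 1 minus the ratio of the within-group mean square to the total mean square (the total sum of squares divided by the number of cases minus 1). Whether the text uses exactly this form is an assumption, and the function name and input values are hypothetical.

```python
def epsilon_squared(sst, ssw, n_cases, n_groups):
    """Epsilon squared under an assumed definition: 1 - MSV_within / MSV_total."""
    msv_within = ssw / (n_cases - n_groups)
    msv_total = sst / (n_cases - 1)
    return 1 - msv_within / msv_total

# Hypothetical sums of squares for 9 cases in 3 groups.
eps2 = epsilon_squared(sst=450, ssw=300, n_cases=9, n_groups=3)
print(round(eps2, 3))  # 0.111
```

For the same hypothetical figures, the unadjusted proportion of explained variation (between-group sum of squares over total) would be 150 / 450, or about 0.333. The adjusted value is noticeably smaller, which is the sense in which the measure is conservative.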

Nature of the Relationship for ANOVA
To assess the nature of the relationship for ANOVA:
1) Make best estimates at the group level by
reporting the grand mean, group means,
and main effects
2) Provide examples of best estimates for
individuals using the general linear model
3) Use range tests to specify which group
means are significantly different from
others

Range Tests
With ANOVA, rejection of the null hypothesis
merely indicates that at least two group means are
significantly different
Range tests determine which means differ, by
establishing the range of differences between
means that is statistically significant
Tukey's Honestly Significant Difference (HSD) is a
conservative range test, unlikely to mistakenly tell
us that a difference exists when in fact it does not
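
A range test of this kind can be sketched as follows, assuming equal group sizes and the commonly used form of Tukey's HSD: a critical value q from a studentized range table multiplied by the square root of the within-group mean square over the group size. The function name and data are hypothetical. The value q = 4.34 is taken as the approximate tabled value for 3 groups and 6 within-group degrees of freedom at the .05 level and should be checked against a table before use.

```python
from itertools import combinations
from math import sqrt

def tukey_hsd_pairs(group_means, msv_within, n_per_group, q_crit):
    """Flag pairs of group means whose difference exceeds the HSD range.

    Assumes equal group sizes; q_crit must be looked up in a
    studentized range table (it is not computed here).
    """
    hsd = q_crit * sqrt(msv_within / n_per_group)
    results = {}
    for (a, mean_a), (b, mean_b) in combinations(group_means.items(), 2):
        results[(a, b)] = abs(mean_a - mean_b) > hsd
    return hsd, results

# Hypothetical group means and within-group mean square; q_crit is an
# assumed table value for 3 groups and 6 degrees of freedom at alpha = .05.
hsd, significant = tukey_hsd_pairs(
    {"urban": 50, "suburban": 45, "rural": 40},
    msv_within=50,
    n_per_group=3,
    q_crit=4.34,
)
print(round(hsd, 2))  # 17.72
print(significant)
```

In this illustration the largest difference between means is 10 points, below the range of about 17.72, so no pair is flagged as significantly different.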

Statistical Follies
Care must be taken not to apply a group
finding to individuals
The ecological fallacy, drawing
conclusions about individuals on the basis
of analysis of group units, such as
communities, is an extreme case of
misapplying statistical findings
