
Principles of the t-test and ANOVA

Prof. Simon Marwood


Fundamental Principle #1
• We can express essentially everything in statistics as:
  outcome = model + error
Fundamental Principle #2
• A good model will explain most of the variance in the data.
• We can compare how much variance is explained by the model with how much is not explained
• If the model is a good one, we would expect this ratio (explained / unexplained) to be greater than 1


Comparing 2 means
The t-test
Comparing 2 means: the paired t-test
• The “model” we fit to this data to explain it is the difference between the two means
• Our “null” expectation is that the difference between means is 0
• The error, or that which cannot be explained by the model, is the sampling variation. We use the standard error of the differences in the mean
  • SE = SD of differences / √n
Comparing 2 means: the paired t-test

t = (difference between means – 0) / SE of differences

The value for t is compared to a table of the t-distribution with the same degrees of freedom, to determine how likely a value this large would be if the true difference were zero
Comparing 2 means: the paired t-test

t = (difference between means – 0) / SE of differences

If the SE of differences was small, then a small(ish) difference between means would be important

If the SE of differences was large, then only large differences between means would be important
Sum d² = 44.8333
Variance of differences = 44.83 / 5 = 8.97
Standard deviation (SD; of differences) = √8.97 = 2.99
Standard Error of Mean (SEM; of differences) = 2.99 / √6 = 1.22
Comparing 2 means: the paired t-test

t = (difference between means – 0) / SEM of differences

t-statistic = 2.833 (mean diff) / 1.22 (SEM of diffs) ≈ 2.32
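As a check on the arithmetic, the paired t calculation can be reproduced in Python. The raw difference scores below are hypothetical (the slides show only summary values); they are chosen so that they match the summary numbers above (mean difference 2.833, sum of squared deviations 44.833), and give t = 2.833 / 1.22 ≈ 2.32:

```python
import math
import statistics

# Hypothetical difference scores (condition B - condition A) for 6
# participants, chosen only to reproduce the slide's summary numbers;
# the original raw data are not shown in the slides.
d = [4, 7, -2, 4, 2, 2]

n = len(d)
mean_diff = statistics.mean(d)        # 2.833
var_diff = statistics.variance(d)     # 44.833 / (n - 1) = 8.97
sd_diff = math.sqrt(var_diff)         # 2.99
sem_diff = sd_diff / math.sqrt(n)     # 2.99 / sqrt(6) = 1.22

# t = (mean difference - null difference of zero) / SEM of differences
t = (mean_diff - 0) / sem_diff        # 2.833 / 1.22 = 2.32

print(round(mean_diff, 3), round(sem_diff, 2), round(t, 2))
```

|t| would then be compared against the critical t value with n – 1 = 5 degrees of freedom.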
(pause)
A very simple statistical model
• The mean
• A hypothetical model of your dataset (a number which may not be observed
at all)
• It may also be an impossible value
• e.g. the mean number of friends that lecturers have is 2.6
How well does the mean fit?
• Is it a good approximation of the real world or not?

• The next slide shows the individual data that brought about a mean of
2.6 friends
Lecturer 1 had just 1 friend
His DEVIANCE from the mean is 1 – 2.6 = –1.6 (observed – expected)
The model / mean OVER-ESTIMATED how many friends he has
How well does the mean fit?
• We could add up all of these deviances to get a sum of deviances, which would be a measure of the TOTAL ERROR
• = -1.6 + -0.6 + 0.4 + 0.4 + 1.4 = 0

• No total error? Is the mean a perfect representation of the data?


How well does the mean fit?
• We obviously need to resolve the issue of the different directions of
deviance canceling each other out

• One way to do this is to square each deviance (or error) and then add
them all up
• The Sum of Squared errors (SS)
How well does the mean fit?

However, surely the SS will be bigger just by having more observations, even if the “fit” (how close each observation is to the mean) is very good?

To overcome this problem, we could divide by the number of observations: SS / n

Typically, we actually divide by (n – 1), as we are normally working with a sample from a population, rather than the whole population
(diversion)
Degrees of Freedom
• How much choice do you have when assigning 11 players to a position
on the football field?
• After you have made 10 choices, there is no “freedom” to choose as there is
only one place left
• There are n – 1 degrees of freedom
Degrees of Freedom
• If you have a sample of 4 observations from the population, they are
free to vary.
• If the sample mean was 10, you would then assume that the population mean was 10
• This value is then held constant
• Given a mean of 10, are all 4 values free to vary?
• No, because once 3 have been assigned, there is no freedom to choose for
the 4th, it is forced to ensure the mean = 10
• There are 3 degrees of freedom (n – 1)
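The “forced” final value can be demonstrated in a couple of lines (a toy illustration; the three free values are arbitrary):

```python
# Given a sample mean of 10 and n = 4, only 3 values are free to vary:
# the last one is forced so that the mean comes out right.
mean, n = 10, 4
free = [8, 11, 12]                 # any three values we like
forced = n * mean - sum(free)      # the 4th has no freedom: 40 - 31 = 9

values = free + [forced]
print(forced, sum(values) / n)     # 9 10.0
```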
Back to the fit of the mean
• SS / (n – 1) is otherwise known as the VARIANCE

• It measures how well the model fits the data

• Often the square root of this is taken, otherwise known as the STANDARD DEVIATION

[Diagram: deviance/error → squared error → Sum of Squared error (SS) → VARIANCE & SD]
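These steps can be checked in Python. The five scores (1, 2, 3, 3, 4) are reconstructed from the deviances listed earlier (−1.6, −0.6, 0.4, 0.4, 1.4 about a mean of 2.6):

```python
import math

friends = [1, 2, 3, 3, 4]                # lecturers' friend counts (mean 2.6)
mean = sum(friends) / len(friends)

deviances = [x - mean for x in friends]  # observed - expected
total_error = sum(deviances)             # 0: the deviances cancel out

ss = sum(dev ** 2 for dev in deviances)  # Sum of Squared errors = 5.2
variance = ss / (len(friends) - 1)       # SS / (n - 1) = 1.3
sd = math.sqrt(variance)                 # standard deviation = 1.14

print(ss, variance, round(sd, 2))
```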


Fundamental Principle #3
• We assess the “fit” of a model by examining the sum of squared
differences between the observed data and the model data

• Often we then need to divide by the degrees of freedom to correct for the number of observations
Comparing 3 or more means
ANOVA
There are 3 variance components:
• total variance (SST)
• model variance (SSM)
• unexplained variance (SSR)
• Note that SSM + SSR = SST.

How do we calculate these values?


CON  CHO  CAF
 87   70   55
 70   65   65
 75   60   70
 78   70   55
 82   65   55

[Plot: individual scores in CON, CHO and CAF, with the GRAND MEAN shown as a horizontal line]
Total sum of squares
• SST = Σ(x – grand mean)²
• But this is a pain to calculate

• Remember though that Variance = SS / (n – 1)

• Therefore: SST = grand variance × (n – 1)


Total SS = grand variance × (15 – 1) = 94.3 × 14 = 1319.7
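This shortcut is easy to verify with the data from the table (using Python's statistics module, whose variance() already divides by n – 1):

```python
import statistics

con = [87, 70, 75, 78, 82]
cho = [70, 65, 60, 70, 65]
caf = [55, 65, 70, 55, 55]

scores = con + cho + caf                      # all 15 observations
grand_variance = statistics.variance(scores)  # SS / (n - 1), about 94.3

sst = grand_variance * (len(scores) - 1)      # 94.3 * 14 = 1319.7
print(round(grand_variance, 1), round(sst, 1))
```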


If there was no difference between the means of the 3 groups, the green, purple and pink horizontal lines would all be at the same value, along the value of the overall mean
This is a model of “no effect”

We could fit another model where we say that the mean of each group predicts the value for each participant

If this model's sum of squares is relatively large (telling us that a large amount of the total variation can be explained by the model), then perhaps this new model is better than the old model; in other words the group means differ significantly from the overall mean

[Plot: group means for CON, CHO and CAF shown against the GRAND MEAN]
Model Sum of Squares
1. Calculate the difference between the mean of each group and the grand mean (d)
2. Square these error terms (d²)
3. Multiply each result by the number in the respective group (5d²)
4. Add them up (sum of squares!)
Mean CON = 78.4: SSCON = {(78.4 – 68.13)²} × 5 = 527
Mean CHO = 66: SSCHO = {(66 – 68.13)²} × 5 = 22.8
Mean CAF = 60: SSCAF = {(60 – 68.13)²} × 5 = 330.8

GRAND MEAN: 68.13

SSM = 527 + 22.8 + 330.8 = 880.5


(“Between Model”)
SSR = SST – SSM = 1319.7 – 880.5 = 439.2
Mean Squares
• We need to eliminate the influence of the different number of
observations that make up each SS term

• To do this we divide by degrees of freedom


• Model degrees of freedom = number of groups – 1
• (k – 1) = 2
• Residual degrees of freedom = total degrees of freedom - degrees of freedom
of model
• (n – 1) – (k – 1) = 14 – 2 = 12
• n – k (15 – 3) = 12
The test statistic for ANOVA (F-ratio)
• Mean SSM = SSM / df
• 880.5 / 2 = 440.3
• Mean SSR = SSR / df
• 439.2 / 12 = 36.6

• F = Mean SSM / Mean SSR


• 440.3 / 36.6 = 12.03
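The full decomposition for the CON/CHO/CAF data can be reproduced as follows:

```python
import statistics

groups = {
    "CON": [87, 70, 75, 78, 82],   # mean 78.4
    "CHO": [70, 65, 60, 70, 65],   # mean 66
    "CAF": [55, 65, 70, 55, 55],   # mean 60
}

scores = [x for g in groups.values() for x in g]
n, k = len(scores), len(groups)          # 15 observations, 3 groups
grand_mean = statistics.mean(scores)     # 68.13

# Total SS: squared deviations of every score from the grand mean
sst = sum((x - grand_mean) ** 2 for x in scores)    # 1319.7

# Model SS: squared deviation of each group mean from the grand mean,
# weighted by the number of observations in the group
ssm = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2
          for g in groups.values())                 # 880.5

ssr = sst - ssm                          # residual SS: 439.2

msm = ssm / (k - 1)                      # 880.5 / 2  = 440.3
msr = ssr / (n - k)                      # 439.2 / 12 = 36.6
f = msm / msr                            # 12.03

print(round(sst, 1), round(ssm, 1), round(ssr, 1), round(f, 2))
```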
F-Ratio
• F-ratio of 12.03 is then compared against critical F-values in the F-
distribution for the associated degrees of freedom to see if it is an
unlikely or likely finding
With 2 & 12 degrees of freedom, the critical values are 3.89 (p=0.05) and 6.93 (p=0.01)
Therefore there is somewhere between a 1% and 5% probability of this ratio occurring by chance

In other words: there is a significant difference


1-Way ANOVA from SPSS
Assumptions of Independent ANOVA
• Variances of the data in each condition need to be similar
• “Homogeneity of variance”
• If violated, can be accounted for by selection of appropriate corrections
• Welch
• Brown-Forsythe
• Observations should be independent
• Scale data used
• Distribution within groups needs to be normal
• If n>30, this is not typically an issue
• Is ANOVA robust to violations here?
• When group sizes are equal, F-stat is robust to violations of normality
• If groups are normally distributed, F-stat is robust to different sample sizes
Assumptions of ANOVA
• When groups with larger sample sizes have larger variance
• F-stat is conservative (likely to be non-sig)
• Type II error (failing to reject the null hypothesis when it is false)
• When groups with larger sample sizes have smaller variance
• F-stat is liberal (likely to be sig)
• Type I error (reject null hypothesis when true)
Comparing 3 or more means:
ANOVA for paired or repeated measures data
ANOVA with Repeated Measures
• One problem with repeated measures designs is with the assumption
of independence
• Here, the scores from an individual across conditions are absolutely
dependent on each other
• Although one individual should not impact on the scores from another
• It is possible to “model” this dependence, but the most popular
choice is to use the ANOVA, which is simpler, but has additional
assumptions
• In particular: sphericity
Sphericity (sometimes circularity): Ɛ
• We assume that variance of differences between conditions is similar

When are difference variances different enough?
Sphericity?
• It’s OK, there’s a test!

Epsilon (Ɛ) is an estimate of sphericity; its lower bound is 1 / (no. of conditions – 1)

With a “failed” test of sphericity, the Greenhouse-Geisser (GG) correction is the safest/most conservative (but: risk of Type II error)

But the Huynh-Feldt (HF) correction is a little liberal


Rule of thumb:
If GG Epsilon (Ɛ) is < 0.75, use the GG correction
If GG Epsilon (Ɛ) is > 0.75, use the HF correction
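The rule of thumb can be expressed as a tiny helper function (a sketch only; the GG epsilon itself comes from the SPSS output):

```python
def sphericity_correction(gg_epsilon: float) -> str:
    """Pick a sphericity correction from the Greenhouse-Geisser epsilon.

    Rule of thumb: GG epsilon < 0.75 -> Greenhouse-Geisser correction;
    otherwise use the (slightly more liberal) Huynh-Feldt correction.
    """
    return "Greenhouse-Geisser" if gg_epsilon < 0.75 else "Huynh-Feldt"

print(sphericity_correction(0.6))   # Greenhouse-Geisser
print(sphericity_correction(0.9))   # Huynh-Feldt
```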
Sphericity

Hurray! I can use “Sphericity assumed” on the SPSS Output!

Problem: violations of sphericity are (too) easy to detect with a large sample size, and (too) hard to detect with a small sample size

Field (2018) recommends ALWAYS using the corrected version of your ANOVA in accordance with the rules of thumb
Factorial & Factorial Repeated
Measures ANOVA
Oh no
Factorial (repeated) ANOVA
• Very commonly we measure parameters across two different factors
• We might measure plasma [FFA] over time with and without the consumption
of a high fat diet
“Two-Way ANOVA with repeated measures on both factors (time*diet)”

Diet would have 2 levels
Time would have 7 levels

This analysis generates three things:

Main effect diet
Main effect time
Interaction diet*time
Main effect of time? ✓

Main effect of condition? ✗

Interaction? ✗
Main effect of time? ✗

Main effect of condition? ✓

Interaction? ✓✗
Main effect of time? ✓

Main effect of condition? ✓

Interaction? ✗
Main effect of time? ✓

Main effect of condition? maybe

Interaction? ✓✓✓
Main effect of time? ✓

Main effect of condition? ✓

Interaction? ✓
Factorial ANOVA
• We may also have a completely independent design, but with two
factors
• We might measure anxiety in male and female students (sex) in Further
Education and Higher Education (Education)
• Two-Way ANOVA (Sex*Education), with 2 levels for each factor

[Plot: anxiety for Male and Female students (Sex) in FE and HE (Education)]
Mixed ANOVA

[Plot: Male and Female groups in a mixed design]
ANCOVA: analysis of covariance

[Plot: Male and Female groups]

Correct for differences in the baseline condition
Summary
• “Test statistics” are generated as a ratio of effect / error; the resultant value is compared with “critical” values within a probability density function
• The ANOVA procedure can be extended to a wide range of study designs and is the most
“popular” statistical approach
• Repeated Measures
• Multiple factors
• Mixed Designs
• The ANCOVA is a special design to control for baseline differences in a mixed design

• ANOVA compares (a ratio of) the average model sum-of-squares with the average error sum-of-squares
• Producing an F-ratio that is compared with the F-distribution
• Repeated measures designs need to examine the assumption of sphericity carefully

• When multiple factors are considered, there are both “main effects” and “interactions”
• Consider your experimental design carefully
Tasks
• Familiarise yourself with SPSS and follow step-by-step guides to
setting up an ANOVA
• ANOVA1

• Before next week, look at the SPSS worksheet on “Examining Assumptions” and read the appropriate chapter in the textbook
Essential Resources
• Field, Andy P. Discovering Statistics Using IBM SPSS Statistics. 5th edition. Los Angeles: SAGE, 2017.

• https://my.hope.ac.uk/spss/SPSS.php
