
Review of one-way ANOVA

Kristin Sainani Ph.D.


http://www.stanford.edu/~kcobb
Stanford University
Department of Health Research and Policy
ANOVA for comparing means between more than 2 groups
The F-distribution
A ratio of variances follows an F-distribution:
$$\frac{\sigma^2_{between}}{\sigma^2_{within}} \sim F_{n,m}$$
The F-test tests the hypothesis that two variances are equal. F will be close to 1 if the two variances are equal (that is, under H0).
$$H_0: \sigma^2_{between} = \sigma^2_{within}$$
$$H_a: \sigma^2_{between} \neq \sigma^2_{within}$$
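To see why F sits near 1 when the two variances are equal, the sketch below repeatedly draws two samples of size 10 from normal populations with the same standard deviation and takes the ratio of their sample variances. The sample sizes, the standard deviation of 5, and the seed are arbitrary choices for illustration. With 9 and 9 degrees of freedom the ratio should follow F(9, 9), whose median is exactly 1 and whose mean is 9/7.

```python
import random
import statistics


def sample_variance(values: list[float]) -> float:
    """Sample variance with the n - 1 denominator."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)


def variance_ratio(n: int, m: int, sigma: float, rng: random.Random) -> float:
    """Ratio of two sample variances from normal populations with the SAME sigma."""
    a = [rng.gauss(0.0, sigma) for _ in range(n)]
    b = [rng.gauss(0.0, sigma) for _ in range(m)]
    return sample_variance(a) / sample_variance(b)


rng = random.Random(2024)  # fixed seed so the run is repeatable
ratios = [variance_ratio(10, 10, 5.0, rng) for _ in range(20_000)]

# With equal variances the ratio follows F(9, 9): median exactly 1, mean 9 / (9 - 2).
print(f"median ratio: {statistics.median(ratios):.3f}")
print(f"mean ratio:   {statistics.mean(ratios):.3f} (theory: {9 / 7:.3f})")
```

The median is the cleaner check here: because F(d, d) and its reciprocal have the same distribution, exactly half of the ratios fall below 1.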
How to calculate ANOVAs by hand…
Treatment 1   Treatment 2   Treatment 3   Treatment 4
y11           y21           y31           y41
y12           y22           y32           y42
y13           y23           y33           y43
y14           y24           y34           y44
y15           y25           y35           y45
y16           y26           y36           y46
y17           y27           y37           y47
y18           y28           y38           y48
y19           y29           y39           y49
y1,10         y2,10         y3,10         y4,10
n = 10 obs./group, k = 4 groups

The group means:
$$\bar{y}_{1\cdot} = \frac{\sum_{j=1}^{10} y_{1j}}{10}, \quad \bar{y}_{2\cdot} = \frac{\sum_{j=1}^{10} y_{2j}}{10}, \quad \bar{y}_{3\cdot} = \frac{\sum_{j=1}^{10} y_{3j}}{10}, \quad \bar{y}_{4\cdot} = \frac{\sum_{j=1}^{10} y_{4j}}{10}$$
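The (within) group variances:
$$\frac{\sum_{j=1}^{10}(y_{1j}-\bar{y}_{1\cdot})^2}{10-1}, \quad \frac{\sum_{j=1}^{10}(y_{2j}-\bar{y}_{2\cdot})^2}{10-1}, \quad \frac{\sum_{j=1}^{10}(y_{3j}-\bar{y}_{3\cdot})^2}{10-1}, \quad \frac{\sum_{j=1}^{10}(y_{4j}-\bar{y}_{4\cdot})^2}{10-1}$$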
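As a sketch of these two formulas, the code below stores the k = 4 treatment groups as a list of lists (one inner list of n = 10 values per treatment, using the heights from the worked example later in this review) and computes each group mean and each within-group variance with the n − 1 denominator.

```python
# One inner list per treatment group (k = 4 groups, n = 10 observations each);
# heights in inches from the worked example.
groups = [
    [60, 67, 42, 67, 56, 62, 64, 59, 72, 71],  # Treatment 1
    [50, 52, 43, 67, 67, 59, 67, 64, 63, 65],  # Treatment 2
    [48, 49, 50, 55, 56, 61, 61, 60, 59, 64],  # Treatment 3
    [47, 67, 54, 67, 68, 65, 65, 56, 60, 65],  # Treatment 4
]

# Group mean: sum of the group's observations divided by its size.
means = [sum(ys) / len(ys) for ys in groups]

# Within-group variance: squared deviations from the group mean over n - 1.
variances = [
    sum((y - m) ** 2 for y in ys) / (len(ys) - 1)
    for ys, m in zip(groups, means)
]

for i, (m, v) in enumerate(zip(means, variances), start=1):
    print(f"Treatment {i}: mean = {m:.1f}, within-group variance = {v:.2f}")
```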
Sum of Squares Within (SSW), or Sum of Squares Error (SSE)
Add together the numerators of the four (within) group variances:
$$\sum_{j=1}^{10}(y_{1j}-\bar{y}_{1\cdot})^2 + \sum_{j=1}^{10}(y_{2j}-\bar{y}_{2\cdot})^2 + \sum_{j=1}^{10}(y_{3j}-\bar{y}_{3\cdot})^2 + \sum_{j=1}^{10}(y_{4j}-\bar{y}_{4\cdot})^2 = \sum_{i=1}^{4}\sum_{j=1}^{10}(y_{ij}-\bar{y}_{i\cdot})^2$$
This is the Sum of Squares Within (SSW) (or SSE, for chance error).
Sum of Squares Between (SSB), or Sum of Squares Regression (SSR)
Overall mean of all 40 observations ("grand mean"):
$$\bar{y}_{\cdot\cdot} = \frac{\sum_{i=1}^{4}\sum_{j=1}^{10} y_{ij}}{40}$$
Sum of Squares Between (SSB): variability of the group means compared to the grand mean (the variability due to the treatment).
$$SSB = 10 \times \sum_{i=1}^{4}(\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})^2$$
Total Sum of Squares (SST), also written TSS
$$TSS = \sum_{i=1}^{4}\sum_{j=1}^{10}(y_{ij} - \bar{y}_{\cdot\cdot})^2$$
Squared difference of every observation from the overall mean (the numerator of the variance of Y!).
Partitioning of Variance
$$\sum_{i=1}^{4}\sum_{j=1}^{10}(y_{ij}-\bar{y}_{i\cdot})^2 + 10 \times \sum_{i=1}^{4}(\bar{y}_{i\cdot}-\bar{y}_{\cdot\cdot})^2 = \sum_{i=1}^{4}\sum_{j=1}^{10}(y_{ij}-\bar{y}_{\cdot\cdot})^2$$
SSW + SSB = TSS
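The identity SSW + SSB = TSS holds for any data set, not just one particular example. A minimal check, assuming simulated data (the seed, group means, and spread below are arbitrary), computes the three sums of squares as defined above and compares them. The function also accepts unequal group sizes, in which case the between-groups term weights each group by its own size n_i instead of a common 10.

```python
import random


def sums_of_squares(groups: list[list[float]]) -> tuple[float, float, float]:
    """Return (SSW, SSB, TSS) for a one-way layout given as one list per group."""
    all_obs = [y for ys in groups for y in ys]
    grand_mean = sum(all_obs) / len(all_obs)
    group_means = [sum(ys) / len(ys) for ys in groups]

    # Within: squared deviations of observations from their own group mean.
    ssw = sum((y - m) ** 2 for ys, m in zip(groups, group_means) for y in ys)
    # Between: squared deviations of group means from the grand mean, weighted by n_i.
    ssb = sum(len(ys) * (m - grand_mean) ** 2 for ys, m in zip(groups, group_means))
    # Total: squared deviations of every observation from the grand mean.
    tss = sum((y - grand_mean) ** 2 for y in all_obs)
    return ssw, ssb, tss


rng = random.Random(7)  # arbitrary seed
simulated = [[rng.gauss(mu, 8.0) for _ in range(10)] for mu in (60, 58, 57, 61)]
ssw, ssb, tss = sums_of_squares(simulated)
print(f"SSW + SSB = {ssw + ssb:.6f}")
print(f"TSS       = {tss:.6f}")
```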


ANOVA Table

Source of variation     d.f.    Sum of squares    Mean sum of squares    F-statistic                   p-value
Between (k groups)      k-1     SSB               SSB/(k-1)              [SSB/(k-1)] / [SSW/(nk-k)]    Go to F(k-1, nk-k) chart
Within (n per group)    nk-k    SSW               s² = SSW/(nk-k)
Total variation         nk-1    TSS

SSB: sum of squared deviations of group means from the grand mean.
SSW: sum of squared deviations of observations from their group mean (n individuals per group).
TSS: sum of squared deviations of observations from the grand mean; TSS = SSB + SSW.
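The table can be filled in mechanically from the group data. The sketch below is an illustration, not a library routine: the function name `anova_table` is hypothetical, it assumes k groups of equal size n (matching the table's nk − k notation), and it stops at the F-statistic, leaving the p-value to an F chart as the table does.

```python
def anova_table(groups: list[list[float]]) -> dict[str, float]:
    """Hypothetical helper: one-way ANOVA table entries for k groups of equal size n."""
    k = len(groups)
    n = len(groups[0])
    if any(len(ys) != n for ys in groups):
        raise ValueError("this sketch assumes equal group sizes")

    all_obs = [y for ys in groups for y in ys]
    grand_mean = sum(all_obs) / (n * k)
    group_means = [sum(ys) / n for ys in groups]

    ssb = n * sum((m - grand_mean) ** 2 for m in group_means)
    ssw = sum((y - m) ** 2 for ys, m in zip(groups, group_means) for y in ys)

    df_between, df_within = k - 1, n * k - k
    msb = ssb / df_between  # mean sum of squares between
    msw = ssw / df_within   # s^2, the mean sum of squares within
    return {
        "df_between": df_between, "df_within": df_within, "df_total": n * k - 1,
        "SSB": ssb, "SSW": ssw, "TSS": ssb + ssw,
        "MSB": msb, "MSW": msw, "F": msb / msw,
    }


# Three small made-up groups, chosen only to exercise the code.
table = anova_table([[4.0, 5.0, 6.0], [6.0, 7.0, 8.0], [9.0, 10.0, 11.0]])
for key, value in table.items():
    print(f"{key:>10}: {value:g}")
```

If SciPy is installed, `scipy.stats.f.sf(table["F"], table["df_between"], table["df_within"])` returns the upper-tail p-value that the F chart provides.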
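ANOVA = t-test
With two groups of n observations each (X and Y), the grand mean is $(\bar{X}_n + \bar{Y}_n)/2$, so:
$$SSB = \sum_{i=1}^{n}\left(\bar{X}_n - \frac{\bar{X}_n + \bar{Y}_n}{2}\right)^2 + \sum_{i=1}^{n}\left(\bar{Y}_n - \frac{\bar{X}_n + \bar{Y}_n}{2}\right)^2$$
$$= n\left(\frac{\bar{X}_n}{2} - \frac{\bar{Y}_n}{2}\right)^2 + n\left(\frac{\bar{Y}_n}{2} - \frac{\bar{X}_n}{2}\right)^2$$
$$= n\left(\left(\frac{\bar{X}_n}{2}\right)^2 + \left(\frac{\bar{Y}_n}{2}\right)^2 - 2\frac{\bar{X}_n \bar{Y}_n}{2 \cdot 2} + \left(\frac{\bar{Y}_n}{2}\right)^2 + \left(\frac{\bar{X}_n}{2}\right)^2 - 2\frac{\bar{X}_n \bar{Y}_n}{2 \cdot 2}\right)$$
$$= \frac{n}{2}\left(\bar{X}_n^2 - 2\bar{X}_n\bar{Y}_n + \bar{Y}_n^2\right) = \frac{n}{2}(\bar{X}_n - \bar{Y}_n)^2$$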
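Source of variation    d.f.    Sum of squares    Mean sum of squares    F-statistic    p-value
Between (2 groups)     1       SSB               SSB/1 = SSB            SSB/s_p²       Go to F(1, 2n-2) chart
Within                 2n-2    SSW               s_p² = SSW/(2n-2)
Total variation        2n-1    TSS

Between: SSB is the squared difference in means multiplied by n/2.
Within: SSW is equivalent to the numerator of the pooled variance, so the within mean square is the pooled variance s_p².
The F-statistic is therefore:
$$F = \frac{\frac{n}{2}(\bar{X}_n - \bar{Y}_n)^2}{s_p^2} = \frac{(\bar{X}_n - \bar{Y}_n)^2}{\frac{s_p^2}{n} + \frac{s_p^2}{n}} = (t_{2n-2})^2$$
Notice that the values in the F(1, 2n-2) chart are just (t_{2n-2})².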
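A quick numerical check of the two-group result: the sketch computes the one-way ANOVA F-statistic from the definitions of SSB and SSW, computes the pooled-variance two-sample t-statistic separately, and compares F with t². The two samples reuse Treatments 1 and 2 from the example that follows; any two equal-sized samples would do.

```python
import math


def two_group_f_and_t(x: list[float], y: list[float]) -> tuple[float, float]:
    """Return (F, t) for two groups of equal size n."""
    n = len(x)
    if len(y) != n:
        raise ValueError("this sketch assumes equal group sizes")
    mean_x, mean_y = sum(x) / n, sum(y) / n

    # One-way ANOVA from the definitions.
    grand = (mean_x + mean_y) / 2  # grand mean when both groups have n observations
    ssb = n * ((mean_x - grand) ** 2 + (mean_y - grand) ** 2)  # 1 d.f.
    ssw = sum((v - mean_x) ** 2 for v in x) + sum((v - mean_y) ** 2 for v in y)
    f = (ssb / 1) / (ssw / (2 * n - 2))

    # Pooled-variance t-statistic with 2n - 2 d.f.
    sp2 = ssw / (2 * n - 2)
    t = (mean_x - mean_y) / math.sqrt(sp2 / n + sp2 / n)
    return f, t


treatment_1 = [60, 67, 42, 67, 56, 62, 64, 59, 72, 71]
treatment_2 = [50, 52, 43, 67, 67, 59, 67, 64, 63, 65]
f, t = two_group_f_and_t(treatment_1, treatment_2)
print(f"F = {f:.4f}, t = {t:.4f}, t^2 = {t * t:.4f}")
```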
Example
Heights (inches) by treatment group, n = 10 per group:
Treatment 1   Treatment 2   Treatment 3   Treatment 4
60            50            48            47
67            52            49            67
42            43            50            54
67            67            55            67
56            67            56            68
62            59            61            65
64            67            61            65
59            64            60            56
72            63            59            60
71            65            64            65
Example
Step 1) Calculate the sum of squares between groups:
Mean for group 1 = 62.0
Mean for group 2 = 59.7
Mean for group 3 = 56.3
Mean for group 4 = 61.4
Grand mean = 59.85
$$SSB = [(62-59.85)^2 + (59.7-59.85)^2 + (56.3-59.85)^2 + (61.4-59.85)^2] \times n \text{ per group} = 19.65 \times 10 = 196.5$$
Example
Step 2) Calculate the sum of squares within groups:
$$(60-62)^2 + (67-62)^2 + (42-62)^2 + (67-62)^2 + (56-62)^2 + (62-62)^2 + (64-62)^2 + (59-62)^2 + (72-62)^2 + (71-62)^2 + (50-59.7)^2 + (52-59.7)^2 + (43-59.7)^2 + (67-59.7)^2 + (67-59.7)^2 + (59-59.7)^2 + \cdots$$
(sum of 40 squared deviations) = 2060.6
Step 3) Fill in the ANOVA table

Source of variation    d.f.    Sum of squares    Mean sum of squares    F-statistic    p-value
Between                3       196.5             65.5                   1.14           .344
Within                 36      2060.6            57.2
Total                  39      2257.1
INTERPRETATION of ANOVA:
How much of the variance in height is explained by treatment group?
R² = "Coefficient of Determination" = SSB/TSS = 196.5/2257.1 = 9%
Coefficient of Determination
$$R^2 = \frac{SSB}{SSB + SSE} = \frac{SSB}{SST}$$
The amount of variation in the outcome variable (dependent variable) that is explained by the predictor (independent variable).
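The three steps and the R² calculation for the height example can be reproduced from the raw data in a few lines. This sketch recomputes the table entries only; it does not compute the p-value (.344), which comes from the F(3, 36) distribution.

```python
# Heights (inches) from the example table, one list per treatment group.
groups = [
    [60, 67, 42, 67, 56, 62, 64, 59, 72, 71],
    [50, 52, 43, 67, 67, 59, 67, 64, 63, 65],
    [48, 49, 50, 55, 56, 61, 61, 60, 59, 64],
    [47, 67, 54, 67, 68, 65, 65, 56, 60, 65],
]
k, n = len(groups), len(groups[0])
all_obs = [y for ys in groups for y in ys]
grand_mean = sum(all_obs) / len(all_obs)
group_means = [sum(ys) / n for ys in groups]

ssb = n * sum((m - grand_mean) ** 2 for m in group_means)                 # Step 1
ssw = sum((y - m) ** 2 for ys, m in zip(groups, group_means) for y in ys)  # Step 2
tss = sum((y - grand_mean) ** 2 for y in all_obs)

msb, msw = ssb / (k - 1), ssw / (n * k - k)                                # Step 3
f_stat = msb / msw
r_squared = ssb / tss                                                      # R^2 = SSB/TSS

print(f"SSB = {ssb:.1f}, SSW = {ssw:.1f}, TSS = {tss:.1f}")
print(f"MSB = {msb:.1f}, MSW = {msw:.1f}, F = {f_stat:.2f}, R^2 = {r_squared:.0%}")
```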
ANOVA example
Table 6. Mean micronutrient intake from the school lunch by school

                         S1(a), n=25    S2(b), n=25    S3(c), n=25    P-value(d)
Calcium (mg)    Mean     117.8          158.7          206.5          0.000
                SD(e)    62.4           70.5           86.2
Iron (mg)       Mean     2.0            2.0            2.0            0.854
                SD       0.6            0.6            0.6
Folate (μg)     Mean     26.6           38.7           42.6           0.000
                SD       13.1           14.5           15.1
Zinc (mg)       Mean     1.9            1.5            1.3            0.055
                SD       1.0            1.2            0.4

(a) School 1 (most deprived; 40% subsidized lunches).
(b) School 2 (medium deprived; <10% subsidized).
(c) School 3 (least deprived; no subsidization, private school).
(d) ANOVA; significant differences are highlighted in bold (P<0.05).
Answer
Step 1) Calculate the sum of squares between groups:
Mean for School 1 = 117.8
Mean for School 2 = 158.7
Mean for School 3 = 206.5
Grand mean = 161
$$SSB = [(117.8-161)^2 + (158.7-161)^2 + (206.5-161)^2] \times 25 \text{ per group} = 3941.78 \times 25 = 98{,}544.5$$

Answer
Step 2) Calculate the sum of squares within groups:
S.D. for S1 = 62.4
S.D. for S2 = 70.5
S.D. for S3 = 86.2
Each group's sum of squared deviations equals (n - 1) × SD², with n - 1 = 24, so the sum of squares within is:
$$SSW = 24 \times [62.4^2 + 70.5^2 + 86.2^2] = 391{,}066.8$$

Answer
Step 3) Fill in your ANOVA table

Source of variation    d.f.    Sum of squares    Mean sum of squares    F-statistic    p-value
Between                2       98,544.5          49,272                 9.07           <.05
Within                 72      391,066.8         5,431
Total                  74      489,611.3

R² = 98,544.5/489,611.3 = 20%
School explains 20% of the variance in lunchtime calcium intake in these kids.
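When only group sizes, means, and standard deviations are published, as in Table 6, the ANOVA table can be built from those summary statistics: SSB uses the group means, and SSW is the sum of (n_i − 1) × SD_i². A minimal sketch for the calcium row follows. The function name `anova_from_summary` is hypothetical, and because the inputs are rounded summary values, results computed from the raw data would differ slightly.

```python
def anova_from_summary(
    ns: list[int], means: list[float], sds: list[float]
) -> dict[str, float]:
    """Hypothetical helper: one-way ANOVA table entries from group sizes, means, and SDs."""
    k, total_n = len(ns), sum(ns)
    grand_mean = sum(n * m for n, m in zip(ns, means)) / total_n
    ssb = sum(n * (m - grand_mean) ** 2 for n, m in zip(ns, means))
    # Each group's sum of squared deviations is (n_i - 1) * SD_i^2.
    ssw = sum((n - 1) * s ** 2 for n, s in zip(ns, sds))
    msb, msw = ssb / (k - 1), ssw / (total_n - k)
    return {
        "SSB": ssb, "SSW": ssw, "TSS": ssb + ssw,
        "MSB": msb, "MSW": msw, "F": msb / msw, "R2": ssb / (ssb + ssw),
    }


# Calcium (mg) row of Table 6: three schools with n = 25 each.
calcium = anova_from_summary([25, 25, 25], [117.8, 158.7, 206.5], [62.4, 70.5, 86.2])
for name, value in calcium.items():
    print(f"{name:>4}: {value:,.2f}")
```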
Beyond one-way ANOVA
Often, you may want to test more than 1
treatment. ANOVA can accommodate
more than 1 treatment or factor, so long as
they are independent. Again, the variation
partitions beautifully!
TSS = SSB1 + SSB2 + SSW
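For a balanced design (every combination of the two factors has the same number of observations), this split can be checked directly. The sketch below rests on stated assumptions: it uses simulated data, fits only the two main effects (no interaction term), and treats SSW as the residual left after removing both factor effects. The factor sizes, effects, and seed are arbitrary.

```python
import random

rng = random.Random(11)  # arbitrary seed
a_levels, b_levels, reps = 3, 2, 4  # balanced: `reps` observations in every (A, B) cell
effect_a = [0.0, 2.0, 5.0]          # hypothetical factor-A effects
effect_b = [0.0, 3.0]               # hypothetical factor-B effects

# cells[(i, j)] holds the observations at level i of factor A and level j of factor B.
cells = {
    (i, j): [50 + effect_a[i] + effect_b[j] + rng.gauss(0, 4) for _ in range(reps)]
    for i in range(a_levels)
    for j in range(b_levels)
}

all_obs = [y for ys in cells.values() for y in ys]
grand = sum(all_obs) / len(all_obs)
mean_a = [
    sum(y for j in range(b_levels) for y in cells[(i, j)]) / (b_levels * reps)
    for i in range(a_levels)
]
mean_b = [
    sum(y for i in range(a_levels) for y in cells[(i, j)]) / (a_levels * reps)
    for j in range(b_levels)
]

ssb1 = b_levels * reps * sum((m - grand) ** 2 for m in mean_a)  # factor A
ssb2 = a_levels * reps * sum((m - grand) ** 2 for m in mean_b)  # factor B
# Residual after removing both main effects; plays the role of SSW in this sketch.
ssw = sum(
    (y - mean_a[i] - mean_b[j] + grand) ** 2
    for (i, j), ys in cells.items()
    for y in ys
)
tss = sum((y - grand) ** 2 for y in all_obs)

print(f"SSB1 + SSB2 + SSW = {ssb1 + ssb2 + ssw:.6f}")
print(f"TSS               = {tss:.6f}")
```

Balance is what makes the cross-product terms vanish; with unequal cell sizes the factor sums of squares are no longer orthogonal and the simple three-way split does not hold in general.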
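The Regression Picture
$$\hat{y}_i = \beta x_i + \alpha$$
[Figure: observations y_i plotted against x with the fitted regression line and the naïve mean ȳ; for one observation, A is its distance from ȳ, B is the distance from the regression line to ȳ, and C is its distance from the regression line.]
*Least squares estimation gave us the line (β) that minimized C².
$$\sum_{i=1}^{n}(y_i - \bar{y})^2 = \sum_{i=1}^{n}(\hat{y}_i - \bar{y})^2 + \sum_{i=1}^{n}(\hat{y}_i - y_i)^2$$
A² = B² + C²
SS_total = SS_reg + SS_residual
A² = SS_y
R² = SS_reg/SS_total
A² (SS_total): total squared distance of observations from the naïve mean of y; the total variation.
B² (SS_reg): distance from the regression line to the naïve mean of y; the variability due to x (regression).
C² (SS_residual): variance around the regression line; additional variability not explained by x, which is what the least squares method aims to minimize.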
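The partition in the regression picture can be checked numerically with an ordinary least squares fit. The sketch fits ŷ = βx + α using the closed-form least squares formulas on a small made-up data set (the x and y values are illustrative only), confirms that SS_total = SS_reg + SS_residual, and reports R².

```python
def least_squares(x: list[float], y: list[float]) -> tuple[float, float]:
    """Return (alpha, beta) minimizing the sum of squared residuals (C^2)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    beta = sxy / sxx
    alpha = mean_y - beta * mean_x
    return alpha, beta


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]  # illustrative values only
y = [2.1, 3.9, 6.2, 7.8, 9.9, 12.3]
alpha, beta = least_squares(x, y)
y_hat = [alpha + beta * xi for xi in x]
mean_y = sum(y) / len(y)

ss_total = sum((yi - mean_y) ** 2 for yi in y)                 # A^2
ss_reg = sum((yh - mean_y) ** 2 for yh in y_hat)               # B^2
ss_residual = sum((yh - yi) ** 2 for yh, yi in zip(y_hat, y))  # C^2

print(f"SS_total = {ss_total:.4f}, SS_reg + SS_residual = {ss_reg + ss_residual:.4f}")
print(f"R^2 = {ss_reg / ss_total:.4f}")
```

The identity relies on the fitted line including an intercept; a line forced through the origin would not, in general, split SS_total this way.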
Standard error of y/x
$$s_y^2 = \frac{\sum_{i=1}^{n}(y_i - \bar{y})^2}{n-1} = \frac{SS_y}{n-1}$$
$$s_{y/x}^2 = \frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{n-2}$$
s_y/x² = average residual squared (what we've tried to minimize)
$$s_{y/x} = \sqrt{\frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{n-2}}$$
(equivalent to MSE (= SSW/df) in ANOVA)
The standard error of Y given X is the average variability around
the regression line at any given value of X. It is assumed to be equal
at all values of X.
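To make the two spreads concrete, the sketch below computes s_y (spread around the naïve mean, n − 1 denominator) and s_y/x (spread around the fitted line, n − 2 denominator because both α and β were estimated). The data are illustrative only, and the fit uses the closed-form least squares intercept and slope.

```python
import math


def fit_line(x: list[float], y: list[float]) -> tuple[float, float]:
    """Closed-form least squares intercept (alpha) and slope (beta)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    beta = sxy / sxx
    return mean_y - beta * mean_x, beta


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]  # illustrative values only
y = [2.1, 3.9, 6.2, 7.8, 9.9, 12.3]
n = len(x)
alpha, beta = fit_line(x, y)
mean_y = sum(y) / n

ss_y = sum((yi - mean_y) ** 2 for yi in y)
ss_residual = sum((yi - (alpha + beta * xi)) ** 2 for xi, yi in zip(x, y))

s_y = math.sqrt(ss_y / (n - 1))                 # spread around the naive mean of y
s_y_given_x = math.sqrt(ss_residual / (n - 2))  # spread around the regression line
print(f"s_y = {s_y:.3f}, s_y/x = {s_y_given_x:.3f}")
```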

[Figure: the same spread S_y/x around the regression line at every value of X.]
