Lec Mar 1A ANOVA I
(ANOVA)-I
INDE-8210 / MECH-8290
Multiple Comparison
• When an experiment (comparison) has
– A single factor with 3 or more levels, or
• (e.g. testing the percentage of a dosage, the quench
schedule of metal parts, the cutting speed of a CNC
machine, etc.)
– Two or more factors
• (e.g. a mix of two or more chemicals for a new glue,
cutting speed and machine feed for a CNC
machine, etc.)
• Statistical test used to compare means from
several populations: ANOVA, because a single
t-test or z-test can compare only two means
at a time.
ANOVA
• Stands for ANalysis Of VAriance
• Used for hypothesis testing in
– Simple Regression
– Multiple Regression
– Comparison of Means
• The heart of the ANOVA is a comparison
of variance estimates between your
conditions (factors, groups, treatments,
etc.)
• The effects are often called treatment effects
Why do ANOVA
• For two levels of an independent variable,
ANOVA produces the same result as the t-test.
• However, with more than 2 levels of treatment
(or experiment), ANOVA MUST be used.
– Most experiments need more than two
levels of treatment
• Non-linear relationships: two levels will
always produce a straight line (a linear
relationship), but three or more levels can
describe a non-linear function.
Problems with Multiple t-tests
• Each time you perform a statistical test, you reject the
null hypothesis if the probability of obtaining your sample
data, given that the null hypothesis is true, is less
than 5%.
• When the null hypothesis is true, you will still reject it
5% of the time (this is another reason why we never
use the word PROVE)!
• Perform three t-tests, each with a 5% chance of
erroneously rejecting the null hypothesis, and the
probability of being wrong on at least one of them rises
to about 14% if the tests are independent
(1 - 0.95^3 ≈ 0.143); the simple upper bound is 3 × 5% = 15%.
• Family-wise error rate: chance of making at least 1 type-I
error in a family (or sequence) of statistical tests.
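The family-wise error rate can be sketched numerically. The snippet below is a minimal illustration that assumes the tests are independent, which is an idealization (pairwise t-tests on shared groups are correlated); the function name `familywise_error_rate` is made up for this example.

```python
def familywise_error_rate(alpha: float, m: int) -> float:
    """Chance of at least one Type I error in m independent tests,
    each run at significance level alpha (independence is assumed)."""
    return 1.0 - (1.0 - alpha) ** m

# Compare one test, three tests, and ten tests at alpha = 0.05.
for m in (1, 3, 10):
    print(m, round(familywise_error_rate(0.05, m), 4))
```

The rate grows quickly with the number of tests, which is the motivation for replacing many t-tests with a single ANOVA.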
Reasons and Assumptions of ANOVA
Two-Sample T-test:
H0: μ1 = μ2
(Two populations have the same mean)
Average of 5 Levels
= (15.4+15.5+15.4+15.4+15.3)/5 = 15.4
Total Variation
SSTotal = (15.5-15.4)^2+(15.3-15.4)^2+(15.4-15.4)^2+(15.6-15.4)^2+(15.3-15.4)^2+(15.5-15.4)^2+
(15.5-15.4)^2+(15.3-15.4)^2+(15.2-15.4)^2+(15.4-15.4)^2
= 0.140
SSBetween-Level = 2(15.4-15.4)^2+2(15.5-15.4)^2+2(15.4-15.4)^2+2(15.4-15.4)^2+2(15.3-15.4)^2
= 0.04 Between-Treatment-Variation (due to different levels of Amps)
SSWithin-Level (SSe) = (15.5-15.4)^2+(15.3-15.4)^2+(15.4-15.5)^2+(15.6-15.5)^2+(15.3-15.4)^2+
(15.5-15.4)^2+(15.5-15.4)^2+(15.3-15.4)^2+(15.2-15.3)^2+(15.4-15.3)^2
= 0.100 Within-Treatment-Variation (due to errors or natural causes)
SSTotal = SSBetween-Level + SSWithin-Level: 0.140 = 0.04 + 0.100
One-way ANOVA: thickness versus current
Source    DF   SS       MS       F      P
Current    4   0.0400   0.0100   0.50   0.739
Error      5   0.1000   0.0200
Total      9   0.1400
Notice that
SStotal = SScurrent + SSerror (0.1400 = 0.0400 + 0.1000)
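The thickness example can be reproduced with a short script. This is a sketch in plain Python, not the Minitab procedure itself; the grouping of the ten readings into five current levels is read off the within-level sums in the worked example, and the function name `sums_of_squares` is illustrative.

```python
# Thickness readings, two parts per current level (five levels).
groups = [
    [15.5, 15.3],
    [15.4, 15.6],
    [15.3, 15.5],
    [15.5, 15.3],
    [15.2, 15.4],
]

def sums_of_squares(groups):
    """Return (SS_total, SS_between, SS_within) for a one-way layout."""
    all_values = [y for g in groups for y in g]
    grand_mean = sum(all_values) / len(all_values)
    ss_total = sum((y - grand_mean) ** 2 for y in all_values)
    ss_between = 0.0
    ss_within = 0.0
    for g in groups:
        level_mean = sum(g) / len(g)
        ss_between += len(g) * (level_mean - grand_mean) ** 2
        ss_within += sum((y - level_mean) ** 2 for y in g)
    return ss_total, ss_between, ss_within

ss_total, ss_between, ss_within = sums_of_squares(groups)

k = len(groups)                          # number of levels
n_total = sum(len(g) for g in groups)    # total number of readings
ms_between = ss_between / (k - 1)        # treatment mean square
ms_within = ss_within / (n_total - k)    # error mean square
f_ratio = ms_between / ms_within

print(round(ss_total, 4), round(ss_between, 4), round(ss_within, 4))
print(round(ms_between, 4), round(ms_within, 4), round(f_ratio, 2))
```

The printed values should agree with the SS, MS, and F columns of the Minitab table, up to floating-point rounding.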
Example 2B
Does cutting fluid type affect the surface finish of
machined parts?
- Factor = "cutting fluid type"
- Factor levels (treatments) = types "1", "2", "3", and "4"
H0: μ1 = μ2 = μ3 = μ4
H1: At least one mean is different
Y(i,j) = surface finish of the j-th part treated with cutting fluid i
                Cutting fluid 1   Cutting fluid 2   Cutting fluid 3   Cutting fluid 4
                Y(1,1)=18.95      Y(2,1)=14.95      Y(3,1)=18.05      Y(4,1)=19.95
                Y(1,2)=19.05      Y(2,2)=15.05      Y(3,2)=17.95      Y(4,2)=20.05
Average         19.00             15.00             18.00             20.00
(within type)
Grand Average = 18.00
\[ SS_{total} = \sum_{i=1}^{4} \sum_{j=1}^{2} (Y_{i,j} - \bar{Y})^2 \]
= (18.95-18)^2+(19.05-18)^2+(14.95-18)^2+(15.05-18)^2
+(18.05-18)^2+(17.95-18)^2+(19.95-18)^2+(20.05-18)^2
= 28.02
SScutting-fluid-type = (2*(19-18)^2)+(2*(15-18)^2)+
(2*(18-18)^2)+(2*(20-18)^2) = 28.00
SSnatural-variation = (18.95-19)^2+(19.05-19)^2+(14.95-15)^2
+(15.05-15)^2+(18.05-18)^2+(17.95-18)^2
+(19.95-20)^2+(20.05-20)^2 = 0.02
Notice that the following is still true!
SStotal = SScutting-fluid + SSnatural-variation
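Example 2B can be carried one step further to the F ratio, which the slides do not show for this data set. The sketch below uses only the eight readings given above; the variable names are illustrative, and the mean squares use the usual k-1 and N-k degrees of freedom.

```python
from statistics import mean

# Surface finish readings by cutting fluid type (Example 2B).
finish = {
    1: [18.95, 19.05],
    2: [14.95, 15.05],
    3: [18.05, 17.95],
    4: [19.95, 20.05],
}

all_readings = [y for ys in finish.values() for y in ys]
grand_mean = mean(all_readings)

# Between-fluid and within-fluid sums of squares.
ss_fluid = sum(len(ys) * (mean(ys) - grand_mean) ** 2 for ys in finish.values())
ss_natural = sum((y - mean(ys)) ** 2 for ys in finish.values() for y in ys)
ss_total = sum((y - grand_mean) ** 2 for y in all_readings)

k = len(finish)               # 4 fluid types
n_total = len(all_readings)   # 8 parts
ms_fluid = ss_fluid / (k - 1)
ms_natural = ss_natural / (n_total - k)
f_ratio = ms_fluid / ms_natural

print(round(ss_total, 2), round(ss_fluid, 2), round(ss_natural, 2))
print(round(f_ratio, 2))
```

Here almost all of the variation sits between fluid types, so the F ratio is very large, in contrast with the thickness example where F was 0.50.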
Basic Model of a data set:
• Assume the factor is at “k” levels, and that n
observations are taken within each level (balanced!)
• Let y(i,j) be the response for the j-th observation within
the i-th level of the factor.
                              FACTOR
               1        2        3        .......   k
               ------------------------------------------------
               y(1,1)   y(2,1)   y(3,1)   ......    y(k,1)
               y(1,2)   y(2,2)   y(3,2)   ......    y(k,2)
Observation    .        .        .                  .
               .        .        .                  .
               y(1,n)   y(2,n)   y(3,n)   ......    y(k,n)
• Each y(i,j) is a random variable.
• Once the experiment has been conducted, we will have numbers
that are outcomes of the random variables.
• If we repeated the experiment, we would get different outcomes
• Sampling is assumed to be RANDOM
• Statistical Models:
Two Basic Types depending on the situation
FIXED vs. RANDOM Effects models
Example:
y(i,j) = spot welding quality of the j-th randomly selected part
welded at electric current level i
• Factor = welding current at levels "10000A", "15000A",
"9000A"
• Randomly select parts from each level, and record their
welding quality in the dataset.
FIXED Effects Model
Interested only in the 3 levels, 10000A, 15000A, and 9000A
H0: μ10000 = μ15000 = μ9000
Random Effects:
Interested in making inference about a large
population of treatments from which those studied
are a representative random sample.
Basic Relevance:
For the One-way ANOVA
i) The statistical tests are the same for Fixed
and Random
ii) The power of the tests is different
\[ \bar{y}_{..} = \frac{y_{..}}{N} \quad \text{(the overall average)} \]
\[ SS_{treatment} = n \sum_{i=1}^{k} (\bar{y}_{i.} - \bar{y}_{..})^2 \]
\[ SS_{error} = SS_{Total} - SS_{Treatment} \]
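Proof
\[
\begin{aligned}
SS_T &= \sum_{i=1}^{k} \sum_{j=1}^{n} (y_{ij} - \bar{y}_{..})^2
      = \sum_{i=1}^{k} \sum_{j=1}^{n} (y_{ij} - \bar{y}_{i.} + \bar{y}_{i.} - \bar{y}_{..})^2 \\
     &= \sum_{i=1}^{k} \sum_{j=1}^{n} \left[ (y_{ij} - \bar{y}_{i.}) + (\bar{y}_{i.} - \bar{y}_{..}) \right]^2 \\
     &= \sum_{i=1}^{k} \sum_{j=1}^{n} (y_{ij} - \bar{y}_{i.})^2
      + \sum_{i=1}^{k} \sum_{j=1}^{n} (\bar{y}_{i.} - \bar{y}_{..})^2
      + 2 \sum_{i=1}^{k} \sum_{j=1}^{n} (y_{ij} - \bar{y}_{i.})(\bar{y}_{i.} - \bar{y}_{..}) \\
     &= SS_E + n \sum_{i=1}^{k} (\bar{y}_{i.} - \bar{y}_{..})^2
      + 2 \sum_{i=1}^{k} \sum_{j=1}^{n} (y_{ij} - \bar{y}_{i.})(\bar{y}_{i.} - \bar{y}_{..}) \\
     &= SS_E + SS_{TRT} + 2 \sum_{i=1}^{k} \sum_{j=1}^{n} (y_{ij} - \bar{y}_{i.})(\bar{y}_{i.} - \bar{y}_{..}) \\
     &= SS_E + SS_{TRT} + 0
\end{aligned}
\]
The cross term vanishes because
\[
\begin{aligned}
2 \sum_{i=1}^{k} \sum_{j=1}^{n} (y_{ij} - \bar{y}_{i.})(\bar{y}_{i.} - \bar{y}_{..})
  &= 2 \sum_{i=1}^{k} (\bar{y}_{i.} - \bar{y}_{..}) \left( \sum_{j=1}^{n} y_{ij} - n \bar{y}_{i.} \right) \\
  &= 2 \sum_{i=1}^{k} (\bar{y}_{i.} - \bar{y}_{..}) (n \bar{y}_{i.} - n \bar{y}_{i.})
   = 2 \sum_{i=1}^{k} (\bar{y}_{i.} - \bar{y}_{..}) \cdot 0 = 0
\end{aligned}
\]
where
\[ SS_{TRT} = n \sum_{i=1}^{k} (\bar{y}_{i.} - \bar{y}_{..})^2 \]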
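The key step of the proof, that the cross term sums to zero, can be checked numerically. The sketch below reuses the cutting-fluid readings from Example 2B as test data; any balanced data set would do, and the variable names are illustrative.

```python
# Cutting-fluid readings from Example 2B: k = 4 levels, n = 2 parts each.
y = [
    [18.95, 19.05],
    [14.95, 15.05],
    [18.05, 17.95],
    [19.95, 20.05],
]

n_total = sum(len(row) for row in y)
grand_mean = sum(sum(row) for row in y) / n_total
level_means = [sum(row) / len(row) for row in y]

# Within each level, deviations from the level mean sum to zero ...
within_sums = [sum(v - m for v in row) for row, m in zip(y, level_means)]

# ... so the cross term 2 * sum_i sum_j (y_ij - ybar_i.)(ybar_i. - ybar_..)
# is zero as well, up to floating-point rounding.
cross_term = 2 * sum(
    (v - m) * (m - grand_mean)
    for row, m in zip(y, level_means)
    for v in row
)

print(within_sums)
print(cross_term)
```

Both outputs should be zero apart from rounding noise on the order of 1e-14, which is why SS_T splits cleanly into SS_E and SS_TRT.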
[Figure: σ²(total) versus σ²(error), shown for treatments 1 to 4]
Variance ratio: (σ²treatment + σ²error) / σ²error ≈ 1.0 when there is no treatment effect
\[ y_{ij} = \mu + \tau_i + \varepsilon_{ij} \]
computational formula
• goal: estimate the aforementioned
variances in an “easy” way
• compute a bunch of SSs (summed
squared differences from the mean)
• compute MS (mean-squares, i.e.
variances) and make our ratio, F
degrees of freedom
• total d.o.f = N-1
• for k treatments, treatment d.o.f = k-1
• for n data points per treatment, error d.o.f
= k(n-1) (i.e. the sum of the d.o.f’s for each
group) (Note: error can be considered as
the natural variation, or the uncontrollable
variation that can not be avoided, as we
discussed back in SPC about variation
classification.)
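The degrees-of-freedom bookkeeping for a balanced design can be written as a small helper. This is an illustrative sketch; the function name `degrees_of_freedom` is made up, and it assumes the same number of observations n in each of the k treatments.

```python
def degrees_of_freedom(k: int, n: int) -> dict:
    """Degrees of freedom for a balanced one-way ANOVA with
    k treatments and n observations per treatment."""
    N = k * n
    return {
        "treatment": k - 1,
        "error": k * (n - 1),
        "total": N - 1,
    }

# Thickness example: 5 current levels, 2 parts per level.
thickness_df = degrees_of_freedom(k=5, n=2)
print(thickness_df)

# Treatment and error degrees of freedom always add up to the total.
assert thickness_df["treatment"] + thickness_df["error"] == thickness_df["total"]
```

For the thickness example this gives 4, 5, and 9, matching the DF column of the Minitab table.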
F
• F is the F test statistic
• There will be an F test statistic for each
source except for the error and total
• F is the ratio of two sample variances
• The MS column contains variances
• The F test statistic for each source is the
MS for that row divided by the MS of the
error row
F
• F requires a pair of degrees of freedom,
one for the numerator and one for the
denominator
• The numerator df is the df for the source
• The denominator df is the df for the error
row
• F is always a right tail test
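The right-tail nature of the F test can be illustrated by simulation instead of an F table. The sketch below is an assumption-laden illustration, not how Minitab computes P: it draws normal data with equal means (so the null hypothesis is true), computes F for each simulated experiment, and estimates the right-tail probability for the thickness example (F = 0.50 with 4 and 5 degrees of freedom). The function names, seed, and trial count are arbitrary choices.

```python
import random

def f_statistic(groups):
    """F = MS_treatment / MS_error for a one-way layout."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_trt = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_err = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    return (ss_trt / (k - 1)) / (ss_err / (n_total - k))

def simulated_right_tail_p(f_obs, k, n, trials=20000, seed=0):
    """Estimate P(F >= f_obs) under H0 by simulating k groups of n
    standard-normal observations that share the same mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        groups = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(k)]
        if f_statistic(groups) >= f_obs:
            hits += 1
    return hits / trials

p_value = simulated_right_tail_p(0.50, k=5, n=2)
print(round(p_value, 2))
```

The estimate should land close to the P = 0.739 reported in the thickness ANOVA table; a large right-tail probability like this means the observed F is unremarkable under the null hypothesis.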
The ANOVA Table
• The ANOVA table is composed of rows,
each row represents one source of
variation
• For each source of variation …
– The variation is in the SS column
– The degrees of freedom is in the df column
– The variance is in the MS column
– The MS value is found by dividing the SS by
the df
ANOVA Table
• The complete ANOVA table can be
generated by most statistical packages
and spreadsheets
• We’ll concentrate on understanding how
the table works rather than the formulas
for the variations
The ANOVA Table
Source                   SS            df   MS           F
                         (variation)        (variance)
Explained* (treatment)
Error (natural)
Total
* The explained variation has different names depending on the particular type
of ANOVA problem
Example 1
Source      SS      df   MS     F
Explained   18.9     3
Error       72.0    16
Total       90.9    19

A second, partially completed table:
Source      SS      df   MS     F
Explained
Error               26   8.20
Total       319.8   31
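The blank cells in these tables follow from the rules already stated: MS = SS / df, F = MS of the source divided by MS of the error row, and the SS and df columns each add up to the Total row. The sketch below fills them in; it assumes the second table reads as Error df = 26, Error MS = 8.20, Total SS = 319.8, and Total df = 31, and the variable names are illustrative.

```python
# Example 1: SS and df are given for every row.
ms_explained_1 = 18.9 / 3      # MS = SS / df
ms_error_1 = 72.0 / 16
f_1 = ms_explained_1 / ms_error_1

# Second table: only Error df, Error MS, Total SS and Total df are given.
error_df, error_ms = 26, 8.20
total_ss, total_df = 319.8, 31

error_ss = error_ms * error_df           # SS = MS * df
explained_ss = total_ss - error_ss       # SS column adds up to Total
explained_df = total_df - error_df       # df column adds up to Total
explained_ms = explained_ss / explained_df
f_2 = explained_ms / error_ms

print(round(ms_explained_1, 2), round(ms_error_1, 2), round(f_1, 2))
print(round(error_ss, 1), round(explained_ss, 1), explained_df,
      round(explained_ms, 2), round(f_2, 2))
```

Working backward from the Total row this way is a useful check when a statistical package reports only part of the table.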
Cotton Weight
Percentage      Observations
15               7    7   15   11    9
20              12   17   12   18   18
25              14   19   19   18   18
30              19   25   22   19   23
35               7   10   11   15   11
Is there evidence to support the claim that cotton content affects the mean
tensile strength? Use α = 0.05.
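Example 4: Solution
Minitab Output
One-way ANOVA: Tensile Strength versus Cotton Percentage
Source     DF   SS       MS       F       P
Cotton P    4   475.76   118.94   14.76   0.000
Error      20   161.20     8.06
Total      24   636.96
Since P < 0.05, reject H0: there is evidence that cotton content affects the
mean tensile strength.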
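The Minitab table can be cross-checked from the raw observations. The sketch below is plain Python rather than Minitab, with illustrative variable names; it computes the SS, MS, and F columns but not the P value, which would need an F distribution routine.

```python
# Tensile strength observations by cotton weight percentage.
strength = {
    15: [7, 7, 15, 11, 9],
    20: [12, 17, 12, 18, 18],
    25: [14, 19, 19, 18, 18],
    30: [19, 25, 22, 19, 23],
    35: [7, 10, 11, 15, 11],
}

all_obs = [y for ys in strength.values() for y in ys]
grand_mean = sum(all_obs) / len(all_obs)

ss_cotton = 0.0   # between-treatment (explained) variation
ss_error = 0.0    # within-treatment (natural) variation
for ys in strength.values():
    level_mean = sum(ys) / len(ys)
    ss_cotton += len(ys) * (level_mean - grand_mean) ** 2
    ss_error += sum((y - level_mean) ** 2 for y in ys)
ss_total = sum((y - grand_mean) ** 2 for y in all_obs)

df_cotton = len(strength) - 1             # k - 1
df_error = len(all_obs) - len(strength)   # N - k
ms_cotton = ss_cotton / df_cotton
ms_error = ss_error / df_error
f_ratio = ms_cotton / ms_error

print(df_cotton, round(ss_cotton, 2), round(ms_cotton, 2), round(f_ratio, 2))
print(df_error, round(ss_error, 2), round(ms_error, 2))
print(len(all_obs) - 1, round(ss_total, 2))
```

The three printed rows should match the Cotton P, Error, and Total rows of the Minitab output.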