5.3 Hypothesis Testing
ANOVA
Dr. Jyotika Doshi
ANOVA
• ANalysis Of VAriance
• Developed by R. A. Fisher
• Statistical technique used to check whether the means of two or more groups differ significantly from each other
– Checks the impact of one or more factors by comparing the means of different samples
– With two samples, the equal-variance (pooled) two-sided t-test and ANOVA give the same result (F = t²)
• Based on two estimates of the population variance: between groups and within groups
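The claim that the t-test and ANOVA agree for two samples can be checked numerically. The sketch below uses two small hypothetical samples (not from the slides) and assumes the pooled, equal-variance form of the t-test; it shows that the one-way F statistic equals the square of the t statistic.

```python
from statistics import mean, variance

# Hypothetical samples, chosen only to illustrate the identity F = t^2.
group_a = [12.0, 15.0, 11.0, 14.0, 13.0]
group_b = [16.0, 18.0, 15.0, 19.0, 17.0]

n_a, n_b = len(group_a), len(group_b)
mean_a, mean_b = mean(group_a), mean(group_b)

# Pooled-variance two-sample t statistic (equal variances assumed).
pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
t_stat = (mean_a - mean_b) / (pooled_var * (1 / n_a + 1 / n_b)) ** 0.5

# One-way ANOVA F statistic for the same two groups (k = 2).
grand_mean = mean(group_a + group_b)
ssb = n_a * (mean_a - grand_mean) ** 2 + n_b * (mean_b - grand_mean) ** 2
ssw = (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
f_stat = (ssb / (2 - 1)) / (ssw / (n_a + n_b - 2))

print(f"t = {t_stat:.4f}, t^2 = {t_stat ** 2:.4f}, F = {f_stat:.4f}")
```

For these samples t = -4 and F = 16. The identity does not carry over to the unequal-variance (Welch) t-test.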

7/10/2021 Dr. Jyotika Doshi 3


ANOVA: F-test
compare two variances

https://medium.com/@learnbay/anova-for-statistics-in-data-science-c941ad752723

Experimental Design
and Analysis of Variance
• Three types of experimental designs
– Completely randomized design
– Randomized block design
– Factorial experiment
• Response/dependent variable: quantitative
• Independent variables: called factors
– May be categorical
– With 2 or more levels/treatments/groups



Scenario
• Response variable: reaction time
• Independent variable: drink, with 3 treatments/levels/groups
– water
– some sugary juice
– coffee or tea
• Experimental units: persons
• Test whether the mean reaction time differs among the 3 groups
• H0: μ1 = μ2 = μ3 = ... = μk (k = number of groups = 3)
• H1: Not all μi (i = 1,..., k) are equal (at least one μi ≠ μj)



H0: μ1 = μ2 = μ3 (mean reaction time)
[Figure: group distributions for Scenario 1, Scenario 2 and Scenario 3]

Scenario   Variation within group   Variation between groups   Decision
1          Not much                 Not much                   Do not reject H0
2          Very high                Not much                   Do not reject H0
3          Not much                 Very high                  Reject H0

Variance between groups
• Closer or overlapping distributions ⇒ grand mean is similar to the individual group means ⇒ smaller variance between groups



Completely randomized design (CRD)
• Experimental design where the treatments are
randomly assigned to the experimental units
Ex. one-factor CRD
• Response variable: score in first semester
• Indep. variable (factor): bachelor degree
– Levels or treatments or groups: BSc, BCom, BCA, BE
• Random selection of students
– Group sizes need not be equal

One-way ANOVA: Variances
• One factor (independent variable), k treatments (groups)
• Xij: ith item in group j; nj: size of sample j
• X̅j: mean of sample j; X̿: grand (combined) mean
• Total SS: SS of the combined groups
– SST = ∑i ∑j (Xij − X̿)²
• SS within groups: SSW = ∑j ∑i (Xij − X̅j)²
– Sum of the within-group sums of squares over all groups
– SSW is also called SSE (SS due to error)
• SS between groups: SSB = ∑j nj (X̅j − X̿)² (weighted by group size)
– Also called SSTr (treatment) or SSC (columns; treatments are usually placed in columns)
• Xij − X̿ = (Xij − X̅j) + (X̅j − X̿) ⇒ SST = SSW + SSB
• Variance (mean square) = SS / d.f.
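The partition SST = SSW + SSB can be verified directly from the definitions. The following is a minimal sketch; the function name `sums_of_squares` and the three small groups are hypothetical and serve only as an illustration.

```python
from statistics import mean


def sums_of_squares(groups):
    """Return (SST, SSW, SSB) for a list of groups of numbers."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    # Total SS: squared deviations of every value from the grand mean.
    sst = sum((x - grand_mean) ** 2 for x in all_values)
    # Within SS: squared deviations of each value from its own group mean.
    ssw = sum(sum((x - mean(g)) ** 2 for x in g) for g in groups)
    # Between SS: squared deviations of group means from the grand mean,
    # weighted by group size.
    ssb = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    return sst, ssw, ssb


# Hypothetical data: three groups of unequal size.
groups = [[4, 5, 6], [7, 9, 8, 8], [1, 2, 3]]
sst, ssw, ssb = sums_of_squares(groups)
print(f"SST = {sst:.2f}, SSW = {ssw:.2f}, SSB = {ssb:.2f}")
```

Here SSW = 6 and SSB = 62.1, which add up to SST = 68.1.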



F-test
• F: ratio of two estimates of variance
• F statistic = between-group variability / within-group variability
= MSB/MSE = (SSBetween/(k−1)) / (SSWithin/(n−k))
where n = total number of observations = ∑nj
• Numerator d.f. = k−1
• Denominator d.f. = ∑(nj − 1) = n−k



One-way ANOVA Table (CRD)
• Independent variable with k treatments (levels/groups/columns)
Source                      SS            df     MS = SS/df          Fcalc      p-value
Between/Treatment/Columns   SSB or SSTr   k−1    MSTr = SSTr/(k−1)   MSTr/MSE   P(F > Fcalc)
Within/Error                SSE           n−k    MSE = SSE/(n−k)
Total                       SST           n−1



Sampling distribution of F
[Figure: sampling distribution of MSTR/MSE. Do not reject H0 when F falls below the critical value; reject H0 when F exceeds it.]



Example 1: One-Way ANOVA
• A random sample of students from each row of the class was taken to compare their scores in the second exam. The scores are as follows:
– Front: 82, 83, 97, 93, 55, 67, 53
– Middle: 83, 78, 68, 61, 77, 54, 69, 51, 63
– Back: 38, 59, 55, 66, 45, 52, 52, 61
• Response variable: scores
• Factor (indep.var): row
– Treatments/levels/groups: Front, Middle, Back



Example 1: One-Way ANOVA…
Summary statistics:
Row Front Middle Back
Sample size 7 9 8
Mean 75.71 67.11 53.50
St. Dev 17.63 10.95 8.96
Variance 310.90 119.86 80.29



Example 1: One-Way ANOVA…
• Front: 82, 83, 97, 93, 55, 67, 53 (mean 75.71)
• Middle: 83, 78, 68, 61, 77, 54, 69, 51, 63 (mean 67.11)
• Back: 38, 59, 55, 66, 45, 52, 52, 61 (mean 53.50)
• Are all the sample means identical?
– No, there is some variation between the groups
– between group variation (variation due to the
treatment)
• Are the values within each group identical?
– No, there is some variation within the groups
– within group variation (error variation)



Example 1 (1-way ANOVA)…
• 1. H0: μ1 = μ2 = μ3 vs. H1: Not all μi are equal
• 2. α = 0.05
• 3. Test statistic: F = MSB/MSE
• 4. Compute F
– Grand mean X̿ = ∑∑Xij / ∑nj = (n1X̅1 + n2X̅2 + … + nkX̅k) / (n1 + n2 + … + nk)
  = ((7 × 75.71) + (9 × 67.11) + (8 × 53.50)) / (7 + 9 + 8) = 65.08
– SSB = ∑j nj (X̅j − X̿)² = 7(75.71 − 65.08)² + 9(67.11 − 65.08)² + 8(53.50 − 65.08)² ≈ 1900.84
– SSW = ∑j ∑i (Xij − X̅j)² = ∑j (nj − 1) Sj² = 6(310.90) + 8(119.86) + 7(80.29) = 3386.31



Example 1 (1-way ANOVA)…
• 4. (continued…)
Between-groups d.f. = k − 1 = 3 − 1 = 2
Within-groups d.f. = n − k = (7+9+8) − 3 = 21
MSB = SSB / df = 1900.84 / 2 = 950.42
MSW = MSE = SSE / df = 3386.31 / 21 = 161.25
F = MSB/MSE = 950.42 / 161.25 ≈ 5.9
– SST = Total SS = ∑i ∑j (Xij − X̿)²
• Not used in computing F
• Can be used to get SSE from SST and SSB (SST = SSB + SSE)
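The hand calculation for Example 1 can be cross-checked from the raw scores. The sketch below uses only the standard library; because it works with unrounded group means, SSB comes out near 1901.5 rather than the 1900.84 obtained from means rounded to two decimals.

```python
from statistics import mean

# Exam scores by classroom row (Example 1).
rows = {
    "Front": [82, 83, 97, 93, 55, 67, 53],
    "Middle": [83, 78, 68, 61, 77, 54, 69, 51, 63],
    "Back": [38, 59, 55, 66, 45, 52, 52, 61],
}

k = len(rows)                                  # number of groups
n = sum(len(s) for s in rows.values())         # total observations
grand_mean = mean([x for s in rows.values() for x in s])

# Between-group and within-group sums of squares.
ssb = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in rows.values())
ssw = sum((x - mean(s)) ** 2 for s in rows.values() for x in s)

msb = ssb / (k - 1)        # numerator mean square, d.f. = k - 1
mse = ssw / (n - k)        # denominator mean square, d.f. = n - k
f_calc = msb / mse

print(f"grand mean = {grand_mean:.2f}")
print(f"SSB = {ssb:.2f}, SSW = {ssw:.2f}")
print(f"MSB = {msb:.2f}, MSE = {mse:.2f}, F = {f_calc:.2f}")
```

The F ratio is about 5.9 either way, so the rounding in the hand calculation does not change the conclusion.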



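One-Way ANOVA
• MS(B) = 1902 / 2 = 951.0
• MS(W) = 3386 / 21 = 161.2
• MS(T) = 5288 / 23 = 229.9
– Here SSB = 1902 is computed from the unrounded group means; the hand calculation with rounded means gives 1900.84
– Notice: MS(Total) ≠ MSB + MSW
– But SST = SSB + SSW (5288 = 1902 + 3386), and the degrees of freedom add in the same way (23 = 2 + 21)
– Usually, MS(Total) isn’t shown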



Example 1 (1-way ANOVA)…
• Step 4: Complete the 1-way ANOVA table with the p-value
Source    SS     df   MS      Fcalc   p
Between   1902   2    951.0   5.9     0.009
Within    3386   21   161.2
Total     5288   23   229.9
• Step 5: p-value = 0.009 < α = 0.05 ⇒ reject H0
• Step 6: The row means are not all equal (at least one row has a different mean)

Example 1 (1-Way ANOVA)…
• In the example, the conclusion is
– At the 5% level of significance, there is enough evidence to support the claim that there is a difference in the mean scores of the front, middle, and back rows in class
• ANOVA doesn’t tell which row is different
– Either run multiple pairwise t-tests (with a correction for multiple comparisons, e.g. Bonferroni)
– OR run post hoc (a posteriori) tests (ex. Tukey’s honestly significant difference (HSD) test)



Example 2: 1-way ANOVA (CRD)
• Test whether the departments differ in mean monthly expenses, using the following data.
Department       Monthly expenses in Rs. (thousands)
Human Resource   15  9  8  12  15  12  10
Sales            12  8  10  10  14
Marketing        10  10  12  15

Response variable: expenses (Rs. thousands)
Independent variable (factor): department, with 3 treatments or groups
1. H0: μ1 = μ2 = μ3 vs. H1: Not all μi are equal
2. α = 0.05
3. Test statistic: F = MSB/MSE



Example 2 (1-way ANOVA)…
• Step 4:
Department       Monthly expenses in Rs. (thousands)   Sum   Count nj   Mean
Human Resource   15  9  8  12  15  12  10              81    7          11.57
Sales            12  8  10  10  14                     54    5          10.8
Marketing        10  10  12  15                        47    4          11.75
Grand mean X̿ = 11.37

Squared deviations (Xij − X̿)²:
Human Resource   13.1769  5.6169  11.3569  0.3969  13.1769  0.3969  1.8769
Sales            0.3969  11.3569  1.8769  1.8769  6.9169
Marketing        1.8769  1.8769  0.3969  13.1769
Sum = 85.75
• SST = ∑∑(Xij − X̿)² = 85.75
• SSTr = ∑j nj (X̅j − X̿)² = 7(11.57 − 11.37)² + 5(10.8 − 11.37)² + 4(11.75 − 11.37)² ≈ (7 × 0.04) + (5 × 0.33) + (4 × 0.14) ≈ 2.486 (using unrounded values)
• SSE = SST − SSTr = 83.264

Example 2 (1-way ANOVA) …
• Step 4, 5: ANOVA table
Source of Variation   SS      df   MS      F       P-value   F crit
Between Groups        2.486   2    1.243   0.194   0.826     3.806
Within Groups         83.26   13   6.405
Total                 85.75   15
• Step 6: Conclusion
• P-value = 0.826 > 0.05 ⇒ do not reject H0
• At the 5% significance level, there is not enough evidence to conclude that the mean expenses of the departments differ
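As a check on Example 2, the sums of squares and the F ratio can be recomputed from the raw expense data. This is a minimal standard-library sketch that, like the slides, obtains SSE by subtraction; it does not compute the p-value or the critical value.

```python
from statistics import mean

# Monthly expenses in Rs. (thousands) by department (Example 2).
departments = {
    "Human Resource": [15, 9, 8, 12, 15, 12, 10],
    "Sales": [12, 8, 10, 10, 14],
    "Marketing": [10, 10, 12, 15],
}

values = [x for d in departments.values() for x in d]
k, n = len(departments), len(values)
grand_mean = mean(values)

sst = sum((x - grand_mean) ** 2 for x in values)
sstr = sum(len(d) * (mean(d) - grand_mean) ** 2 for d in departments.values())
sse = sst - sstr                     # error SS by subtraction

f_calc = (sstr / (k - 1)) / (sse / (n - k))
print(f"SST = {sst:.2f}, SSTr = {sstr:.3f}, SSE = {sse:.3f}, F = {f_calc:.3f}")
```

The small F ratio (about 0.194) reflects how little the department means differ relative to the spread within departments.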

Assumptions of One-Way ANOVA
• Normality
– The response variable is normally distributed in each sampled (group) population
• Independence
– Observations must be independent
– Each sample is drawn independently of the other samples
• Equality of variance
– The variance of the response variable is the same in every sampled population



Python
• Python: scipy.stats.f_oneway()
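In SciPy, the one-way ANOVA function is `scipy.stats.f_oneway`, which takes one sequence per group and returns the F statistic and the p-value. The sketch below applies it to the row scores of Example 1 and assumes SciPy is installed.

```python
from scipy.stats import f_oneway

# Exam scores by classroom row (Example 1).
front = [82, 83, 97, 93, 55, 67, 53]
middle = [83, 78, 68, 61, 77, 54, 69, 51, 63]
back = [38, 59, 55, 66, 45, 52, 52, 61]

# One positional argument per group; group sizes may differ.
result = f_oneway(front, middle, back)
print(f"F = {result.statistic:.2f}, p-value = {result.pvalue:.4f}")

alpha = 0.05
if result.pvalue < alpha:
    print("Reject H0: not all row means are equal")
else:
    print("Do not reject H0")
```

This should agree with the ANOVA table for Example 1 (F about 5.9, p about 0.009).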



Exercise 1
• A dietician wants to test 3 different types of diet plans
to see if the average weight loss (in pounds) is similar
in all the plans. A homogeneous group of total 23
persons is selected as experimental units and the
weight loss is recorded after a period of 30 days. Test
the hypothesis at 1% level of significance.
Diet Plan 1   4    3.8  3.7  6.2  5.6  4.2
Diet Plan 2   3.6  5.2  2.8  3    3.8  5    3.9  5.5
Diet Plan 3   6.5  7.2  5.9  5.5  6.8  7.7  8    8.2  7



Exercise 2, 3
• Exercise 2: Fuel costs are important to profitability in the airline business. A small regional carrier has been operating three types of planes and has collected the following cost data from its 14 planes. Can we conclude that there is no significant difference between plane types in fuel costs? Test at the 1% level of significance.
Type A   7.0  8.5  7.5  6.5  8.0
Type B   5.5  7.5  7.0
Type C   8.0  9.5  8.5  8.0  9.5  9.0
• Exercise 3: A study compared the effects of four alternative ways of promoting sales. The unit sales for 5 stores using all four options are given in the following table. Do the different options produce a significant difference in sales? Test at the 0.05 level of significance.
Free sample      78  87  81  89  85
One-pack gift    94  91  87  90  88
Cents off        73  78  69  83  76
Refund by mail   79  83  78  69  81

Two-way ANOVA



2-way ANOVA
• statistical test used to determine the effect of
two nominal predictor variables on a
continuous outcome variable
• an extension of the one-way ANOVA
– Two factors
• Factor A (subdivided into k treatments)
• Factor B (subdivided into b blocks)



2-way ANOVA, 2 factors
• Used to learn how two explanatory variables, in combination, affect a dependent variable
• Ex. Which combination of fertilizer type and planting density produces the highest crop yield in a field experiment?
• Experimental units: different land plots
• Factor 1: fertilizer type (levels: 1, 2, or 3)
• Factor 2: planting density (levels: 1 = low, 2 = high)
• Response variable: crop yield per acre at harvest time
• Assign the plots in a field randomly to combinations of fertilizer type and planting density

2-way ANOVA: 3 sets of Hypotheses
• Null hypotheses for each of the sets
1. The population means of the first factor are equal (like a one-way ANOVA for the column factor).
2. The population means of the second factor are equal (like a one-way ANOVA for the row factor).
• Sometimes the second factor is not of interest in itself
3. There is no interaction between the two factors (similar to performing a test for independence with contingency tables).
⇒ No interaction means that the effect of one factor does not depend on the level of the other factor



Factorial experiments
• Valuable designs when simultaneous conclusions about two or more factors are required
• The term ‘factorial’ is used because the experimental conditions include all possible combinations of the factor levels
• The interaction is also considered as a source of variation
• Uses three hypotheses (two main effects and the interaction)



2-factor factorial experiment
• In an attempt to improve students’ performance on the GMAT, a
major Texas university is considering offering three GMAT
preparation programs. The GMAT is usually taken by students from
three colleges: College of Business, College of Engineering, College of
Arts and Sciences. Their GMAT scores are recorded to study the
effect of proposed program and effect of undergraduate college.
• Factor 1 of interest: GMAT preparation program
– three-hour review session covering the types of questions generally
asked on the GMAT
– one-day program covering relevant exam material, along with the
taking and grading of a sample exam
– intensive 10-week course involving the identification of each student’s
weaknesses and setting up of individualized programs for improvement
– 3 Levels: three-hour review, one-day program, 10-week course
• Factor 2 of interest: undergraduate college
– 3 Levels: business, engineering, arts and sciences



2-factor factorial experiment, replications = 2
Preparation   Undergraduate college
program       Business   Engineering   Arts & Sciences
Three-hour    500, 580   540, 460      480, 400
One-day       460, 540   560, 620      420, 480
10-week       560, 600   600, 580      480, 410
• Response variable: GMAT score
• Replications: sample size in each cell, 2 students here (may differ between cells)
• Main effect (factor 1): Do the preparation programs differ in terms
of effect on GMAT scores?
• Main effect (factor 2): Do the undergraduate colleges differ in
terms of effect on GMAT scores?
• Interaction effect (factors 1 and 2): Do students in some colleges
do better on one type of preparation program whereas others do
better on a different type of preparation program?



Interaction effect
• Joint effect of the factors: the effect of one factor changes with the level of the other
• If the interaction effect has a significant impact on the GMAT scores, we can conclude that the effect of the type of preparation program depends on the undergraduate college.



Two-factor factorial design with no replication
• On the basis of the following data on the yields of a crop under five treatments (seed types) and four different types of plots (fertility), state whether there is a significant difference in yield due to (i) different treatments and (ii) different plot fertility.

Plot        Treatments: seed types
fertility   A     B     C     D     E
P1          210   253   266   199   267
P2          184   193   235   164   214
P3          207   206   239   211   277
P4          167   208   212   166   242
• Each plot type has 5 homogeneous experimental units, i.e. for each Pi, five similar plots in which to sow the seeds



Randomized block design



Randomized Block Design (RBD)
• Experimental units: elements selected in study
• Completely Randomized Design (CRD)
– Experimental design in which the treatments are randomly
assigned to the experimental units
• Randomized Block Design (RBD)
– Experimental units are heterogeneous
– blocking used to form homogeneous groups of
experimental units
– Within block, treatments are assigned randomly to
experimental units
– Two factors (treatment, block) may or may not interact



RBD…
• Response variable: quantitative (ex. crop yield)
• Two categorical explanatory variables, called factors
– Usually one primary factor (treatment, say seed type)
– Block factor (say plot fertility type)
• Block factor
– Usually a nuisance variable: not of interest by itself, but it influences the response variable
– Included in order to reduce the unexplained variability in the response variable
• With k treatments and b blocks, total sample size n = b × k (without replication)



Blocks in RBD
• Group of similar or homogeneous units
• Used to reduce the variability in response
variable
• Other examples
– 3 methods (treatments) of reducing blood pressure;
Blocks defined using initial blood pressure
– 4 methods (treatments) for enhancing memory;
Blocks defined by age
– Impairment while driving (treatments: alcohol,
marijuana, no sleep, control); Blocks by gender



Simple Block Design, all nij = 1
No replication
• A simple block design has two factors with:
– Exactly one data value in each cell (combination of treatment and block)
– A single sample of experimental units (replica = 1)
• Factor 1 is the factor of interest, called the treatment
• Factor 2, called blocks, is used to control a known source of variability
• Main interest: comparing the means of the treatment levels
• Factor 1 with k treatments, factor 2 with b blocks ⇒ N = k × b data values

Complete vs. Incomplete block design
• Complete block design:
– Experimental design in which each block is subjected to all k treatments
– i.e. all blocks are tested with all treatments
• Incomplete block design:
– Experimental design in which some, but not all, treatments are applied to each block
– Beyond the scope of this topic

2-way ANOVA table (simple RBD)
without interaction effect
Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square                  F          p-value
Treatments            SSTR             k − 1                MSTR = SSTR/(k − 1)          MSTR/MSE
Blocks                SSBL             b − 1                MSBL = SSBL/(b − 1)          MSBL/MSE
Error                 SSE              (k − 1)(b − 1)       MSE = SSE/((k − 1)(b − 1))
Total                 SST              nT − 1

• SSTr is also called SS(Columns); SSBlocks is also called SS(Rows)
• Factor 2 (blocks) may or may not be of interest

Example 3 (2-way ANOVA)
assuming no interaction effect
• The cutting speeds of four types of tools are being compared in an experiment. Five cutting materials of varying degrees of hardness are used as experimental blocks.
– Response variable: cutting speed (time in seconds)
– Treatment levels: 4 types of tools
– Block levels: five cutting materials of varying degrees of hardness
             Tool 1   Tool 2   Tool 3   Tool 4
Material 1   12       20       13       11
Material 2   2        14       7        5
Material 3   8        17       13       10
Material 4   1        12       8        3
Material 5   7        17       14       6

Example 3 (2-way ANOVA),
no interaction effect
• Step 1. Two hypotheses
– For treatments (columns: 4 cutting tools)
• H0: μt1 = μt2 = μt3 = μt4
• H1: not all μtj are the same
– For blocks (rows: 5 cutting materials)
• H0: μb1 = μb2 = μb3 = μb4 = μb5
• H1: not all μbi are the same
• Step 2. α = 0.05
• Step 3. Test statistics (with k columns, b rows, nij = 1)
– FTr = MSTr/MSE with (k−1, (b−1)(k−1)) df
– Fblock = MSBlock/MSE with (b−1, (b−1)(k−1)) df



Two-way ANOVA: Variances
• Two factors (independent variables)
– k treatments (groups), b blocks
• Xij: value of the response variable in block i of treatment j
• X̿: grand (combined) mean
• X̅tj: mean of treatment j; X̅bi: mean of block i
• Total SS: SS of the combined groups: SST = ∑i ∑j (Xij − X̿)²
• SS between treatment groups: SSTr = ∑j nj (X̅tj − X̿)² [nj = b]
• SS between block groups: SSBlock = ∑i ni (X̅bi − X̿)² [ni = k]
• SST = SSTr + SSBlock + SSE ⇒ SSE = SST − SSTr − SSBlock
• Variance (mean square) = SS / d.f.



Example 3 (2-way ANOVA)…
Step 4:
• Treatment: levels = 4, means: 6, 16, 11, 7
• Block: levels = 5, means: 14, 7, 12, 6, 11
• Grand mean X̿ = 10
• SST = ∑i ∑j (Xij − X̿)² = 518
• SSTr = ∑j nj (X̅tj − X̿)² = 5{(6−10)² + (16−10)² + (11−10)² + (7−10)²} = 5 × (16+36+1+9) = 5 × 62 = 310
• SSBlock = ∑i ni (X̅bi − X̿)² = 4{(14−10)² + (7−10)² + (12−10)² + (6−10)² + (11−10)²} = 4 × (16+9+4+16+1) = 4 × 46 = 184
• SSE = SST − SSTr − SSBlock = 518 − 310 − 184 = 24
• MSTr = SSTr/df = 310/3 = 103.3
• MSBlock = SSBlock/df = 184/4 = 46
• MSE = SSE/df = 24/(3 × 4) = 2

Example 3 (2-way ANOVA)…
• Step 4, 5: 2-way ANOVA table
Source     df   SS    MS      F      P-value
tool       3    310   103.3   51.7   3.897E-07
material   4    184   46      23     1.489E-05
Error      12   24    2
Total      19   518
• Step 6: Conclusion
– Reject H0 for tools: significant difference in mean cutting speed between the cutting tools
– Reject H0 for material: significant difference in mean cutting speed between cutting materials of varying hardness
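The sums of squares and F ratios of Example 3 can be reproduced from the data table with a few lines of standard-library Python. This is a sketch for the no-replication randomized block layout only (one value per cell); it does not compute p-values.

```python
from statistics import mean

# Rows are blocks (materials 1-5); columns are treatments (tools 1-4).
speeds = [
    [12, 20, 13, 11],
    [2, 14, 7, 5],
    [8, 17, 13, 10],
    [1, 12, 8, 3],
    [7, 17, 14, 6],
]
b, k = len(speeds), len(speeds[0])           # b blocks, k treatments

values = [x for row in speeds for x in row]
grand_mean = mean(values)
block_means = [mean(row) for row in speeds]
treatment_means = [mean(col) for col in zip(*speeds)]

sst = sum((x - grand_mean) ** 2 for x in values)
sstr = b * sum((m - grand_mean) ** 2 for m in treatment_means)
ssblock = k * sum((m - grand_mean) ** 2 for m in block_means)
sse = sst - sstr - ssblock                   # error SS by subtraction

mse = sse / ((k - 1) * (b - 1))
f_tr = (sstr / (k - 1)) / mse
f_block = (ssblock / (b - 1)) / mse

print(f"SST = {sst}, SSTr = {sstr}, SSBlock = {ssblock}, SSE = {sse}")
print(f"F(tool) = {f_tr:.1f}, F(material) = {f_block:.1f}")
```

The output matches the table: SSTr = 310, SSBlock = 184, SSE = 24, with F ratios of about 51.7 and 23.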



Example 4 (2-way ANOVA)
without interaction effect
• The following data represent clotting times (minutes) of
plasma from eight subjects treated in four different ways.
The eight subjects (blocks) were allocated at random to
each of the four treatment groups.
Subject   Treatment 1   Treatment 2   Treatment 3   Treatment 4
1         8.4           9.4           9.8           12.2
2         12.8          15.2          12.9          14.4
3         9.6           9.1           11.2          9.8
4         9.8           8.8           9.9           12.0
5         8.4           8.2           8.5           8.5
6         8.6           9.9           9.8           10.9
7         8.9           9.0           9.2           10.4
8         7.9           8.1           8.2           10.0



Example 4: 2-way ANOVA table
without interaction effect
Source of Variation   Sum of Squares   DF   Mean Square   F         P-value
Between treatments    13.01625         3    4.33875       6.6150    0.0025
Between blocks        78.98875         7    11.284107     17.2041   2.19E-07
Residual (error)      13.77375         21   0.655893
Total                 105.77875        31
• F (between treatments) = 6.6150, P very small (0.0025 < .003)
• Reject H0: highly significant difference between mean clotting times across treatments
• The difference in mean clotting time between subjects is of no particular interest here



Example 5 (2-way ANOVA)
• A fast food franchise wants to find out whether 3 new menu items are equally popular. 6 franchisee restaurants are randomly chosen to participate in the study. In accordance with the randomized block design, the 3 new menu items are randomly assigned within each restaurant.
• Each row in the table that follows represents the sales figures of the 3 new menu items in one of the 6 franchise restaurants after a week of marketing. At the .05 level of significance, test whether the mean sales volumes of the 3 new menu items are all equal. Also test whether the mean sales volumes of the 6 franchise restaurants are all equal.

Example 5: Using s/w tools: 2-way ANOVA
Franchise   Item1   Item2   Item3
1           31      27      24
2           31      28      31
3           45      29      46
4           21      18      48
5           42      36      46
6           32      17      40
• H0 for menu items (treatments): the mean sales of the 3 new items are the same
• H0 for franchises (blocks): the mean sales of the 6 franchises are the same
Source              Sum Sq   df   Mean Sq   F value   Pr(>F)
treatment (items)   539      2    269       4.96      0.032 *
blk (franchise)     560      5    112       2.06      0.155
At the 5% level of significance,
• Reject H0 for treatments: there is a significant difference in the mean sales of the 3 menu items
• Do not reject H0 for franchises: no significant difference in the mean sales of the 6 franchises

Assumptions of a Two-Way ANOVA
• The dependent variable should be continuous
• The two explanatory variables (factors) should be categorical, each subdivided into independent groups
• Sample independence: each sample is drawn independently of the other samples
• Equality of variance: the variance of the data should be the same in the populations of all groups
• Normality: each sample is taken from a normally distributed population



2-way ANOVA
with interaction, with replication
• ANOVA procedure with three hypotheses
• Two factors: factor A with a levels, factor B with b levels
• Interaction: joint effect of factors A and B
• Number of replications: r
• Partition of the total sum of squares (SST) into sources:
SST = SSA + SSB + SSAB + SSE
• Partition of the total degrees of freedom, nT − 1 (with nT = abr):
– Factor A d.f. = a − 1
– Factor B d.f. = b − 1
– Interaction d.f. = (a − 1)(b − 1)
– Error d.f. = ab(r − 1)



ANOVA table: 2-way factorial design
3 hypotheses, with r replications
Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square                    F          p-value
Factor A              SSA              a − 1                MSA = SSA/(a − 1)              MSA/MSE
Factor B              SSB              b − 1                MSB = SSB/(b − 1)              MSB/MSE
Interaction           SSAB             (a − 1)(b − 1)       MSAB = SSAB/((a − 1)(b − 1))   MSAB/MSE
Error                 SSE              ab(r − 1)            MSE = SSE/(ab(r − 1))
Total                 SST              nT − 1

Computation of SS in 2-way ANOVA
with three hypotheses
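• Total sum of squares: $SST = \sum_{i=1}^{a} \sum_{j=1}^{b} \sum_{k=1}^{r} (x_{ijk} - \bar{\bar{x}})^2$
• Sum of squares (factor A): $SSA = br \sum_{i=1}^{a} (\bar{x}_{i\cdot} - \bar{\bar{x}})^2$
• Sum of squares (factor B): $SSB = ar \sum_{j=1}^{b} (\bar{x}_{\cdot j} - \bar{\bar{x}})^2$
• Interaction SS: $SSAB = r \sum_{i=1}^{a} \sum_{j=1}^{b} (\bar{x}_{ij} - \bar{x}_{i\cdot} - \bar{x}_{\cdot j} + \bar{\bar{x}})^2$
• Error SS: SSE = SST − SSA − SSB − SSAB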



Example 6: (with replications r = 3)
Two-Factor Factorial Experiment
A survey was conducted of hourly wages for a sample of workers in two industries at three locations in Ohio. Part of the purpose of the survey was to determine if differences exist in both industry type and location. Test the hypotheses.
Hourly wages ($):
Industry   Cincinnati   Cleveland   Columbus
I          12.10        11.80       12.90
I          11.80        11.20       12.70
I          12.10        12.00       12.20
II         12.40        12.60       13.00
II         12.50        12.00       12.10
II         12.00        12.50       12.70



Example 6 …
• Factor A: industry type (2 levels: I and II)
• Factor B: location (3 levels)
• Replications: 3 (each combination repeated 3 times)
Three hypotheses:
• Factor A (industry)
– H0: no significant difference in mean wages due to industry type
• Factor B (location)
– H0: no significant difference in mean wages due to location
• Interaction
– H0: no significant interaction between industry type and location



Example 6…
• ANOVA Table
Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square   F      p-value
Factor A              .50              1                    .50           4.19   .06
Factor B              1.12             2                    .56           4.69   .03
Interaction           .37              2                    .19           1.55   .25
Error                 1.43             12                   .12
Total                 3.42             17
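The Example 6 table can be reproduced from the wage data using the sum-of-squares formulas for a two-factor design with replication. The sketch below is standard-library Python; the nested-dictionary layout and variable names are illustrative choices, and p-values are not computed.

```python
from statistics import mean

# wages[industry][location] holds the r = 3 replicate hourly wages of that cell.
wages = {
    "I": {"Cincinnati": [12.10, 11.80, 12.10],
          "Cleveland": [11.80, 11.20, 12.00],
          "Columbus": [12.90, 12.70, 12.20]},
    "II": {"Cincinnati": [12.40, 12.50, 12.00],
           "Cleveland": [12.60, 12.00, 12.50],
           "Columbus": [13.00, 12.10, 12.70]},
}
industries = list(wages)            # factor A levels
locations = list(wages["I"])        # factor B levels
a, b, r = len(industries), len(locations), 3

values = [x for i in industries for j in locations for x in wages[i][j]]
grand = mean(values)
mean_a = {i: mean([x for j in locations for x in wages[i][j]]) for i in industries}
mean_b = {j: mean([x for i in industries for x in wages[i][j]]) for j in locations}
cell = {(i, j): mean(wages[i][j]) for i in industries for j in locations}

sst = sum((x - grand) ** 2 for x in values)
ssa = b * r * sum((mean_a[i] - grand) ** 2 for i in industries)
ssb = a * r * sum((mean_b[j] - grand) ** 2 for j in locations)
ssab = r * sum((cell[i, j] - mean_a[i] - mean_b[j] + grand) ** 2
               for i in industries for j in locations)
sse = sst - ssa - ssb - ssab

mse = sse / (a * b * (r - 1))
f_a = (ssa / (a - 1)) / mse
f_b = (ssb / (b - 1)) / mse
f_ab = (ssab / ((a - 1) * (b - 1))) / mse

print(f"SSA = {ssa:.2f}, SSB = {ssb:.2f}, SSAB = {ssab:.2f}, SSE = {sse:.2f}, SST = {sst:.2f}")
print(f"F(A) = {f_a:.2f}, F(B) = {f_b:.2f}, F(AB) = {f_ab:.2f}")
```

The printed values agree with the table to two decimals (for example SSA = 0.50 and F for location of about 4.69).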



Example 6…
• Conclusions using the critical value (α = 0.05) and the p-value
• Industries:
– F = 4.19 < F(α,1,12) = 4.75, p-value = .06 > α = .05
– Do not reject H0 ⇒ mean wages do not differ significantly by industry type
• Locations:
– F = 4.69 > F(α,2,12) = 3.89, p-value = .03 < α = .05
– Reject H0 ⇒ mean wages differ significantly by location
• Interaction:
– F = 1.55 < F(α,2,12) = 3.89, p-value = .25 > α = .05
– Do not reject H0 ⇒ no significant interaction between industry and location



End of ANOVA topic !!!
