
Statistics for Business

Analysis of Variance

Deepak Mathivathanan
Learning Objectives

In this chapter, you learn:


 The basic concepts of experimental design
 How to use the one-way analysis of variance to test for differences among the means of several groups
 How to use the two-way analysis of variance and interpret the interaction
Chapter Overview

Analysis of Variance (ANOVA)

 One-Way ANOVA: F-test; Tukey-Kramer test
 Two-Way ANOVA: Interaction effects
General ANOVA Setting

 Investigator controls one or more factors of interest


 Each factor contains two or more levels
 Levels can be numerical or categorical
 Different levels produce different groups
 Think of the groups as populations
 Observe effects on the dependent variable
 Are the groups the same?
 Experimental design: the plan used to collect the data
Completely Randomized Design

 Experimental units (subjects) are assigned randomly to the different levels (groups)
 Subjects are assumed homogeneous
 Only one factor or independent variable
 With two or more levels (groups)
 Analyzed by one-factor analysis of variance (one-way ANOVA)
One-Way Analysis of Variance
 Evaluate the difference among the means of three or more groups
 Examples:
 Accident rates for 1st, 2nd, and 3rd shifts
 Expected mileage for five brands of tires

 Assumptions
 Populations are normally distributed
 Populations have equal variances
 Samples are randomly and independently drawn
Hypotheses: One-Way ANOVA

 H0: μ1 = μ2 = … = μc
 All population means are equal
 i.e., no treatment effect (no variation in means among groups)
 H1: Not all population means are equal
 At least one population mean is different
 i.e., there is a treatment (group) effect
 Does not mean that all population means are different (at least one of the means is different from the others)
Hypotheses: One-Way ANOVA

All Means are the same:


The Null Hypothesis is True
(No Group Effect)
Hypotheses: One-Way ANOVA

At least one mean is different:


The Null Hypothesis is NOT true
(Treatment Effect is present)

Partitioning the Variation

 Total variation can be split into two parts:

SST = SSA + SSW

SST = Total Sum of Squares (total variation)
SSA = Sum of Squares Among Groups (among-group variation)
SSW = Sum of Squares Within Groups (within-group variation)
Partitioning the Variation

SST = SSA + SSW

Total Variation = the aggregate dispersion of the individual data values around the overall (grand) mean of all factor levels (SST)

Among-Group Variation = dispersion between the factor sample means (SSA)

Within-Group Variation = dispersion that exists among the data values within the particular factor levels (SSW)
Partitioning the Variation
Total Variation (SST) = Among-Group Variation (SSA) + Within-Group Variation (SSW)
The Total Sum of Squares
SST = SSA + SSW

SST = Σj Σi (Xij − X̄)²   (summed over j = 1, …, c and i = 1, …, nj)

Where:
SST = Total sum of squares
c = number of groups
nj = number of values in group j
Xij = ith value from group j
X̄ = grand mean (mean of all data values)
Among-Group Variation
SST = SSA + SSW

SSA = Σj nj (X̄j − X̄)²   (summed over j = 1, …, c)

Where:
SSA = Sum of squares among groups
c = number of groups
nj = sample size from group j
X̄j = sample mean from group j
X̄ = grand mean (mean of all data values)
Within-Group Variation
SST = SSA + SSW

SSW = Σj Σi (Xij − X̄j)²   (summed over j = 1, …, c and i = 1, …, nj)

Where:
SSW = Sum of squares within groups
c = number of groups
nj = sample size from group j
X̄j = sample mean from group j
Xij = ith value in group j
Obtaining the Mean Squares

MSA = SSA / (c − 1)   (Mean Square Among)
MSW = SSW / (n − c)   (Mean Square Within)
One-Way ANOVA Table

Source of Variation | df    | SS              | MS (Variance)     | F-Ratio
Among Groups        | c − 1 | SSA             | MSA = SSA/(c − 1) | F = MSA/MSW
Within Groups       | n − c | SSW             | MSW = SSW/(n − c) |
Total               | n − 1 | SST = SSA + SSW |                   |

c = number of groups
n = sum of the sample sizes from all groups
df = degrees of freedom
One-Way ANOVA
Test Statistic
H0: μ1= μ2 = … = μc
H1: At least two population means are different
 Test statistic: F = MSA / MSW
 MSA is the mean square among groups
 MSW is the mean square within groups
 Degrees of freedom
 df1 = c − 1 (c = number of groups)
 df2 = n − c (n = sum of all sample sizes)
One-Way ANOVA
Test Statistic
 The F statistic is the ratio of the among variance to the within variance
 The ratio must always be positive
 df1 = c − 1 will typically be small
 df2 = n − c will typically be large

Decision Rule:
Reject H0 if F > FU (the upper-tail critical value from the F distribution with df1 and df2 degrees of freedom at significance level α); otherwise do not reject H0
One-Way ANOVA
Example

You want to see if three


different golf clubs yield Club 1 Club 2 Club 3
different distances. You 254 234 200
randomly select five 263 218 222
measurements from trials on an 241 235 197
automated driving machine for 237 227 206
each club. At the .05 251 216 204
significance level, is there a
difference in mean distance?
One-Way ANOVA
Example
[Scatter plot: distance (190 to 270) versus club (1, 2, 3) for the five measurements per club]
One-Way ANOVA
Example
X̄1 = 249.2   n1 = 5
X̄2 = 226.0   n2 = 5
X̄3 = 205.8   n3 = 5
n = 15
X̄ = 227.0
c = 3

SSA = 5(249.2 − 227)² + 5(226 − 227)² + 5(205.8 − 227)² = 4716.4
SSW = (254 − 249.2)² + (263 − 249.2)² + … + (204 − 205.8)² = 1119.6
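These sums of squares can be checked with a short script (a sketch in plain Python; the data and formulas are taken from the example above):

```python
# One-way ANOVA sums of squares for the golf-club example.
clubs = [
    [254, 263, 241, 237, 251],  # Club 1
    [234, 218, 235, 227, 216],  # Club 2
    [200, 222, 197, 206, 204],  # Club 3
]

n = sum(len(g) for g in clubs)                # total sample size (15)
c = len(clubs)                                # number of groups (3)
grand_mean = sum(sum(g) for g in clubs) / n   # X-bar = 227.0

# SSA: among-group variation; SSW: within-group variation
ssa = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in clubs)
ssw = sum((x - sum(g) / len(g)) ** 2 for g in clubs for x in g)

msa = ssa / (c - 1)   # mean square among
msw = ssw / (n - c)   # mean square within
f_stat = msa / msw    # F-ratio

print(round(ssa, 1), round(ssw, 1), round(f_stat, 3))  # 4716.4 1119.6 25.275
```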


One-Way ANOVA
Example
MSA = 4716.4 / (3 − 1) = 2358.2
MSW = 1119.6 / (15 − 3) = 93.3

F = MSA / MSW = 2358.2 / 93.3 = 25.275

Critical value: FU = 3.89 (α = .05, df1 = 2, df2 = 12)
Since F = 25.275 > FU = 3.89, reject H0
One-Way ANOVA
Example
 H0: μ1 = μ2 = μ3
 H1: μj not all equal
 α = .05
 df1 = 2, df2 = 12

Decision: Reject H0 at α = 0.05
Conclusion: There is evidence that at least one μj differs from the rest
One-Way ANOVA in Excel
(Excel: Select Tools → Data Analysis → ANOVA: Single Factor)

SUMMARY
Groups | Count | Sum  | Average | Variance
Club 1 | 5     | 1246 | 249.2   | 108.2
Club 2 | 5     | 1130 | 226     | 77.5
Club 3 | 5     | 1029 | 205.8   | 94.2

ANOVA
Source of Variation | SS     | df | MS     | F      | P-value  | F crit
Between Groups      | 4716.4 | 2  | 2358.2 | 25.275 | 4.99E-05 | 3.89
Within Groups       | 1119.6 | 12 | 93.3   |        |          |
Total               | 5836.0 | 14 |        |        |          |
The Tukey-Kramer Procedure
 Tells which population means are significantly different
 e.g.: μ1 = μ2 ≠ μ3
 Done after rejection of equal means in ANOVA
 Allows pair-wise comparisons
 Compare absolute mean differences with the critical range
Tukey-Kramer Critical Range

Critical Range = QU × sqrt( (MSW / 2) × (1/nj + 1/nj′) )

where:
QU = value from the Studentized Range Distribution with c and n − c degrees of freedom for the desired level of α (see appendix E.9 table)
MSW = Mean Square Within
nj and nj′ = sample sizes from groups j and j′
The Tukey-Kramer Procedure
1. Compute absolute mean differences:

Club 1 | Club 2 | Club 3
254    | 234    | 200
263    | 218    | 222
241    | 235    | 197
237    | 227    | 206
251    | 216    | 204

|X̄1 − X̄2| = |249.2 − 226.0| = 23.2
|X̄1 − X̄3| = |249.2 − 205.8| = 43.4
|X̄2 − X̄3| = |226.0 − 205.8| = 20.2

2. Find the QU value from the table in appendix E.9 with c = 3 and (n − c) = (15 − 3) = 12 degrees of freedom for the desired level of α (α = .05 used here): QU = 3.77
The Tukey-Kramer Procedure
3. Compute the critical range:
Critical Range = QU × sqrt( (MSW / 2) × (1/nj + 1/nj′) ) = 3.77 × sqrt( (93.3 / 2) × (1/5 + 1/5) ) = 16.285

4. Compare:
23.2 > 16.285,  43.4 > 16.285,  20.2 > 16.285

5. All of the absolute mean differences are greater than the critical range. Therefore there is a significant difference between each pair of means at the 5% level of significance.
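The pairwise comparisons above can be scripted as follows (plain Python; 3.77 is the studentized-range table value for c = 3, 12 degrees of freedom, α = .05, as used in the example):

```python
import math
from itertools import combinations

# Group means and sizes from the golf-club example
means = {"Club 1": 249.2, "Club 2": 226.0, "Club 3": 205.8}
sizes = {"Club 1": 5, "Club 2": 5, "Club 3": 5}
msw = 93.3   # mean square within, from the one-way ANOVA
q_u = 3.77   # studentized range value for c = 3, n - c = 12 df, alpha = .05

for a, b in combinations(means, 2):
    diff = abs(means[a] - means[b])
    crit = q_u * math.sqrt((msw / 2) * (1 / sizes[a] + 1 / sizes[b]))
    verdict = "significant" if diff > crit else "not significant"
    print(f"|{a} - {b}| = {diff:.1f} vs critical range {crit:.3f}: {verdict}")
```

Because all group sizes are equal, the critical range is the same (16.285) for every pair.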
ANOVA Assumptions

 Randomness and Independence


 Select random samples from the c groups (or randomly assign the levels)
 Normality
 The sample values from each group are from a normal population
 Homogeneity of Variance
 Can be tested with Levene’s Test
ANOVA Assumptions
Levene’s Test
 Tests the assumption that the variances of each group are equal.
 First, define the null and alternative hypotheses:
 H0: σ²1 = σ²2 = … = σ²c
 H1: Not all σ²j are equal
 Second, compute the absolute value of the difference between each value and the median of each group.
 Third, perform a one-way ANOVA on these absolute differences.
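The three steps can be sketched in plain Python (reusing the golf-club data from earlier as a stand-in; `statistics.median` supplies each group's median):

```python
from statistics import median

groups = [
    [254, 263, 241, 237, 251],
    [234, 218, 235, 227, 216],
    [200, 222, 197, 206, 204],
]

# Step 2: absolute deviations of each value from its group median
abs_dev = [[abs(x - median(g)) for x in g] for g in groups]

# Step 3: one-way ANOVA on the absolute deviations
n = sum(len(g) for g in abs_dev)
c = len(abs_dev)
grand = sum(sum(g) for g in abs_dev) / n
ssa = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in abs_dev)
ssw = sum((x - sum(g) / len(g)) ** 2 for g in abs_dev for x in g)
f_levene = (ssa / (c - 1)) / (ssw / (n - c))

# A small F (well below the .05 critical value of 3.89 for df = 2, 12)
# means we do not reject the hypothesis of equal variances.
print(round(f_levene, 4))
```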
Two-Way ANOVA

 Examines the effect of:
 Two factors of interest on the dependent variable
 e.g., percent carbonation and line speed on a soft drink bottling process
 Interaction between the different levels of these two factors
 e.g., does the effect of one particular carbonation level depend on which level the line speed is set to?
Two-Way ANOVA

 Assumptions

 Populations are normally distributed


 Populations have equal variances
 Independent random samples are selected
Two-Way ANOVA
Sources of Variation
Two factors of interest: A and B
r = number of levels of factor A
c = number of levels of factor B
n′ = number of replications for each cell
n = total number of observations in all cells (n = r·c·n′)
Xijk = value of the kth observation of level i of factor A and level j of factor B
Two-Way ANOVA
Sources of Variation
SST = SSA + SSB + SSAB + SSE

Source                                           Degrees of Freedom
SST  (Total Variation)                           n − 1
SSA  (Factor A Variation)                        r − 1
SSB  (Factor B Variation)                        c − 1
SSAB (Variation due to interaction of A and B)   (r − 1)(c − 1)
SSE  (Random variation / Error)                  r·c·(n′ − 1)
Two-Way ANOVA
Equations

Total Variation:

Factor A Variation:

Factor B Variation:
Two-Way ANOVA
Equations

Interaction Variation:

Sum of Squares Error:


Two-Way ANOVA
Equations

where:
X̄ = grand mean of all observations
X̄i.. = mean of the ith level of factor A
X̄.j. = mean of the jth level of factor B
X̄ij. = mean of the observations in cell ij
r = number of levels of factor A
c = number of levels of factor B
n′ = number of replications in each cell
Two-Way ANOVA:
The F Test Statistic

F Test for Factor A Effect
H0: μ1.. = μ2.. = … = μr..
H1: Not all μi.. are equal
F = MSA / MSE; reject H0 if F > FU

F Test for Factor B Effect
H0: μ.1. = μ.2. = … = μ.c.
H1: Not all μ.j. are equal
F = MSB / MSE; reject H0 if F > FU

F Test for Interaction Effect
H0: the interaction of A and B is equal to zero
H1: the interaction of A and B is not equal to zero
F = MSAB / MSE; reject H0 if F > FU
Two-Way ANOVA:
Summary Table
Source of Variation | Degrees of Freedom | Sum of Squares | Mean Squares                   | F Statistic
Factor A            | r − 1              | SSA            | MSA = SSA / (r − 1)            | MSA / MSE
Factor B            | c − 1              | SSB            | MSB = SSB / (c − 1)            | MSB / MSE
AB (Interaction)    | (r − 1)(c − 1)     | SSAB           | MSAB = SSAB / [(r − 1)(c − 1)] | MSAB / MSE
Error               | r·c·(n′ − 1)       | SSE            | MSE = SSE / [r·c·(n′ − 1)]     |
Total               | n − 1              | SST            |                                |
Two-Way ANOVA:
Features
 Degrees of freedom always add up
 n − 1 = r·c·(n′ − 1) + (r − 1) + (c − 1) + (r − 1)(c − 1)
 Total = error + factor A + factor B + interaction
 The denominator of the F test is always the same (MSE) but the numerator is different
 The sums of squares always add up
 SST = SSE + SSA + SSB + SSAB
 Total = error + factor A + factor B + interaction
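These identities can be verified numerically for any balanced layout. Below is a sketch in plain Python for a 2×3 design with n′ = 2 replications; the data values are made up purely for illustration:

```python
# Balanced two-way layout: data[i][j] is the list of n' replications
# for level i of factor A and level j of factor B (made-up numbers).
data = [
    [[23, 25], [28, 30], [20, 21]],   # factor A level 1
    [[27, 29], [31, 35], [24, 26]],   # factor A level 2
]
r, c = len(data), len(data[0])
n_rep = len(data[0][0])               # n' replications per cell
n = r * c * n_rep
grand = sum(x for row in data for cell in row for x in cell) / n

a_mean = [sum(x for cell in row for x in cell) / (c * n_rep) for row in data]
b_mean = [sum(x for row in data for x in row[j]) / (r * n_rep) for j in range(c)]
cell_mean = [[sum(cell) / n_rep for cell in row] for row in data]

sst = sum((x - grand) ** 2 for row in data for cell in row for x in cell)
ssa = c * n_rep * sum((m - grand) ** 2 for m in a_mean)
ssb = r * n_rep * sum((m - grand) ** 2 for m in b_mean)
ssab = n_rep * sum(
    (cell_mean[i][j] - a_mean[i] - b_mean[j] + grand) ** 2
    for i in range(r) for j in range(c)
)
sse = sum(
    (x - cell_mean[i][j]) ** 2
    for i in range(r) for j in range(c) for x in data[i][j]
)

# The partition SST = SSA + SSB + SSAB + SSE holds (up to float rounding),
# and the degrees of freedom add up the same way.
print(abs(sst - (ssa + ssb + ssab + sse)) < 1e-9)  # True
print(n - 1 == (r - 1) + (c - 1) + (r - 1) * (c - 1) + r * c * (n_rep - 1))  # True
```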
Two-Way ANOVA:
Interaction
 No significant interaction: the lines of mean response versus Factor A levels (one line per Factor B level 1, 2, 3) are roughly parallel
 Interaction is present: the lines of mean response versus Factor A levels (one line per Factor B level 1, 2, 3) are not parallel (they cross or diverge)


What is a Chi Square Test?

• There are two types of chi-square tests. Both use the chi-square statistic and distribution for different purposes:
 A chi-square goodness of fit test determines if sample data matches a hypothesized population distribution.
 A chi-square test for independence compares two variables in a contingency table to see if they are related. In a more general sense, it tests whether distributions of categorical variables differ from one another.

• A very small chi-square test statistic means that your observed data fits your expected data extremely well. In other words, there is no evidence of a relationship (or of a departure from the hypothesized distribution).
• A very large chi-square test statistic means that the data does not fit the expected data well. In other words, there is evidence of a relationship (or of a departure from the hypothesized distribution).
Chi Square P-Values

A chi-square test will give you a p-value. The p-value will tell you if your test results are significant or not. In order to perform a chi-square test and get the p-value, you need two pieces of information:

• Degrees of freedom. For a goodness of fit test, that's the number of categories minus 1; for a test of independence, it is (rows − 1) × (columns − 1).
• The alpha level (α). This is chosen by you, the researcher. The usual alpha level is 0.05 (5%), but you could also use other levels such as 0.01 or 0.10.
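As a sketch, a goodness-of-fit statistic can be computed directly from its definition, Σ (observed − expected)² / expected. The die-roll counts below are made up for illustration, and 11.070 is the chi-square critical value for df = 5 at α = 0.05:

```python
# Chi-square goodness of fit: is a die fair?  60 rolls (made-up counts).
observed = [5, 8, 9, 8, 10, 20]
expected = [sum(observed) / len(observed)] * len(observed)   # 10 per face

chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1        # number of categories minus 1 -> 5

# Critical value for df = 5 at alpha = .05 is 11.070 (chi-square table)
print(round(chi_sq, 1), df, chi_sq > 11.070)   # 13.4 5 True
```

Since 13.4 exceeds 11.070, we would reject the hypothesis that the die is fair at the .05 level.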
Jamovi

• One-way ANOVA: https://www.youtube.com/watch?v=QAcaiRU_fbY&t=312s
• Chi-square goodness of fit: https://www.youtube.com/watch?v=FfabgyEKwtk
• Two-way ANOVA (Factorial ANOVA): https://www.youtube.com/watch?v=epTeSAmCCM8
• Chi-squared test of association: https://www.youtube.com/watch?v=vsrj647Tb3g
Thank You
