
ClassWork 03 Two Way ANOVA

Exercise 1

Let us prove that: $SS_{TOT} = SS_A + SS_B + SS_{AB} + SS_E$

Solution

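\[ y_{ijk} = \hat{\mu} + \hat{\tau}_i + \hat{\beta}_j + \widehat{\tau\beta}_{ij} + e_{ijk} \]

\[ y_{ijk} - \hat{\mu} = \hat{\tau}_i + \hat{\beta}_j + \widehat{\tau\beta}_{ij} + e_{ijk} \]

\[ \sum_{ijk} \left( y_{ijk} - \hat{\mu} \right)^2 = \sum_{ijk} \left( e_{ijk} + \hat{\tau}_i + \hat{\beta}_j + \widehat{\tau\beta}_{ij} \right)^2 \]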

Since the cross-product terms are zero, we get

\[ \sum_{ijk} \left( y_{ijk} - \hat{\mu} \right)^2 = \sum_{ijk} \hat{\tau}_i^2 + \sum_{ijk} \hat{\beta}_j^2 + \sum_{ijk} \widehat{\tau\beta}_{ij}^2 + \sum_{ijk} e_{ijk}^2 \]

∑( y − y )
2
SSTOT
= ijk SS A
= ∑=
τˆ i
2
bn∑τˆi2 SS B
= βˆ
∑= 2
j an∑ βˆ j2
ijk ijk i ijk j
2 2
 
SS AB
= ∑=
τβ
ijk
n∑τβ ij
ij
ij SS E
= ∑e
ijk
2
ijk

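Let us prove that the cross-product terms are zero; we do this for just one of them:

\[ 2 \sum_{ijk} \hat{\tau}_i \, \widehat{\tau\beta}_{ij} = 2n \sum_i \hat{\tau}_i \left( \sum_j \widehat{\tau\beta}_{ij} \right) = 0 \]

Remember that one of the constraints used to solve the normal equations is $\sum_j \widehat{\tau\beta}_{ij} = 0$.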
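The identity can also be checked numerically. The following is a minimal sketch on a small made-up balanced data set (the numbers in `y` are hypothetical, chosen only for illustration); it computes the least-squares estimates, the five sums of squares and one of the cross-product terms.

```python
# Hypothetical balanced layout: a = 2 levels of A, b = 3 levels of B, n = 2 replicates.
# y[i][j] is the list of replicates observed in cell (i, j).
y = [
    [[4.0, 6.0], [7.0, 9.0], [3.0, 5.0]],
    [[8.0, 8.0], [12.0, 10.0], [9.0, 7.0]],
]
a, b, n = len(y), len(y[0]), len(y[0][0])


def mean(values):
    values = list(values)
    return sum(values) / len(values)


# Least-squares estimates under the usual sum-to-zero constraints.
grand = mean(v for row in y for cell in row for v in cell)
tau = [mean(v for cell in y[i] for v in cell) - grand for i in range(a)]
beta = [mean(v for i in range(a) for v in y[i][j]) - grand for j in range(b)]
tb = [[mean(y[i][j]) - grand - tau[i] - beta[j] for j in range(b)] for i in range(a)]

ss_a = b * n * sum(t ** 2 for t in tau)
ss_b = a * n * sum(t ** 2 for t in beta)
ss_ab = n * sum(tb[i][j] ** 2 for i in range(a) for j in range(b))
ss_e = sum((v - mean(y[i][j])) ** 2
           for i in range(a) for j in range(b) for v in y[i][j])
ss_tot = sum((v - grand) ** 2 for row in y for cell in row for v in cell)

# One cross-product term: 2 * sum_ijk tau_i * (tau beta)_ij
cross = 2 * n * sum(tau[i] * sum(tb[i]) for i in range(a))

print(ss_tot, ss_a + ss_b + ss_ab + ss_e, cross)
```

The two totals should agree up to rounding error, and the cross-product term should be numerically zero because each row of the interaction estimates sums to zero.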

Exercise 2 [M]

The yield of a chemical process is being studied. The two most important variables are thought to be
the pressure and the temperature. Three levels of each factor are selected, and a factorial experiment
with two replicates is performed. The yield data follows:

                 Pressure (psi)
T (°C)     200           215           230
150        90.4, 90.2    90.7, 90.6    90.2, 90.4
160        90.1, 90.3    90.5, 90.6    89.9, 90.1
170        90.5, 90.7    90.8, 90.9    90.4, 90.1

a) Analyze the data. Use α=0.05 and comment on the model’s adequacy;
b) Calculate the residual for the data (1,2,1) = 90.7;
c) Under what conditions would you operate this process? Evaluate by hand the three constants
and the critical value, then use Minitab.

Solution

a) Analyze the data. Use α=0.05 and comment on the model’s adequacy
Since the experiment is replicated, the individual value plot is used. The individual value plot
should be used only when replicates are present.

[Figure: Individual Value Plot of Yield (y-axis: Yield; x-axis: Pressure 200, 215, 230 within Temperature 150, 160, 170)]

The graph indicates that the variability appears uniform.

Two more graphs are plotted to better understand the influence of each factor and of their interaction
on the response variable. The Minitab commands are:
• Stat > ANOVA > Main Effects Plot
• Stat > ANOVA > Interaction Plot

[Figure: Main Effects Plot for Yield (data means; Temperature 150, 160, 170 and Pressure 200, 215, 230)]
[Figure: Interaction Plot for Yield (data means; Pressure on the x-axis, one line per Temperature level)]

The main effects plot displays the response means for each factor level in sorted order. A horizontal
line is drawn at the grand mean. The effects are the differences between the means and the reference
line. From the graph, both factors seem to be relevant, and the pressure seems to have a
greater influence on the response than the temperature. By contrast, the interaction between pressure
and temperature seems fairly small, as shown by the similar shape of the three curves.

Let us do the analysis by hand


1 y2
SS
= A ∑ i  abn
bn i
y 2
− = 0.3011

1 y2
SS
= B ∑  j  abn 0.76778
an j
y 2
− =

1 y2
SS=
AB ∑ ij  abn − SS A − SS=B 0.06889
n ij
y 2

y2
SSTOT = ∑ y − 2
= 1.29778
ijk SS E = SSTOT − SS A − SS B − SS AB = 0.16
ijk abn
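The hand calculation can be reproduced with a few lines of code. This is only an illustrative sketch in plain Python (the variable names are ours, not part of the exercise); it applies the totals-based formulas to the yield data, with factor A = temperature and factor B = pressure.

```python
# Yield data: data[i][j] holds the two replicates at temperature level i
# (150, 160, 170) and pressure level j (200, 215, 230).
data = [
    [[90.4, 90.2], [90.7, 90.6], [90.2, 90.4]],
    [[90.1, 90.3], [90.5, 90.6], [89.9, 90.1]],
    [[90.5, 90.7], [90.8, 90.9], [90.4, 90.1]],
]
a, b, n = 3, 3, 2

total = sum(v for row in data for cell in row for v in cell)   # y...
correction = total ** 2 / (a * b * n)                          # y...^2 / (abn)

row_totals = [sum(v for cell in row for v in cell) for row in data]       # y_i..
col_totals = [sum(v for row in data for v in row[j]) for j in range(b)]   # y_.j.
cell_totals = [sum(cell) for row in data for cell in row]                 # y_ij.

ss_a = sum(t ** 2 for t in row_totals) / (b * n) - correction
ss_b = sum(t ** 2 for t in col_totals) / (a * n) - correction
ss_ab = sum(t ** 2 for t in cell_totals) / n - correction - ss_a - ss_b
ss_tot = sum(v ** 2 for row in data for cell in row for v in cell) - correction
ss_e = ss_tot - ss_a - ss_b - ss_ab

# Mean squares and F ratios of the full model.
ms_e = ss_e / (a * b * (n - 1))
f_a = (ss_a / (a - 1)) / ms_e
f_b = (ss_b / (b - 1)) / ms_e
f_ab = (ss_ab / ((a - 1) * (b - 1))) / ms_e

print(round(ss_a, 5), round(ss_b, 5), round(ss_ab, 5), round(ss_e, 5), round(ss_tot, 5))
print(round(f_a, 2), round(f_b, 2), round(f_ab, 3))
```

The printed sums of squares and F ratios can be compared with the ANOVA table of the full model.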

The ANOVA table is:


Source SS df MS F0
A 0.30111 2 0.151 8.47*
B 0.76778 2 0.384 21.59*
AB 0.06889 4 0.017 0.969
Error 0.16 9 0.018
Total 1.29778 17
* significant at 5%

The analysis in MINITAB is done using the command Stat > ANOVA > General Linear Model

General Linear Model: Yield versus Temperature; Pressure


Factor Information
Factor Type Levels Values
Temperature Fixed 3 150; 160; 170
Pressure Fixed 3 200; 215; 230

Analysis of Variance
Source DF Adj SS Adj MS F-Value P-Value
Temperature 2 0,30111 0,15056 8,47 0,009
Pressure 2 0,76778 0,38389 21,59 0,000
Temperature*Pressure 4 0,06889 0,01722 0,97 0,470
Error 9 0,16000 0,01778
Total 17 1,29778

Model Summary
S R-sq R-sq(adj) R-sq(pred)
0,133333 87,67% 76,71% 50,68%

Check the residual assumptions:


[Figure: Scatterplot of SRES1 vs FITS1; Temperature; Pressure]
[Figure: Probability Plot of SRES1 (Normal): Mean -2.51219E-13, StDev 1.029, N 18, AD 1.039, P-Value 0.007]

The hypothesis of normality is rejected (the p-value is lower than 0.05). The probability plot
clearly suggests an overfitting problem. So, we try to address the rejection of the normality
assumption by reducing the model (i.e. eliminating the non-significant terms) rather than by using a
Box-Cox transformation. In general, if the normality assumption is not verified and there are
non-significant terms in the model, it is recommended to reduce the model before transforming the data.
Consequently, let us focus only on the significant terms (purely additive model).

General Linear Model: Yield versus Temperature; Pressure

Factor Information
Factor Type Levels Values
Temperature Fixed 3 150; 160; 170
Pressure Fixed 3 200; 215; 230

Analysis of Variance
Source DF Adj SS Adj MS F-Value P-Value
Temperature 2 0,30111 0,15056 8,55 0,004
Pressure 2 0,76778 0,38389 21,80 0,000
Error 13 0,22889 0,01761
Lack-of-Fit 4 0,06889 0,01722 0,97 0,470
Pure Error 9 0,16000 0,01778
Total 17 1,29778

Model Summary
S R-sq R-sq(adj) R-sq(pred)
0,132691 82,36% 76,94% 66,19%

The Lack of Fit (LOF) test and the pure error will be covered in ClassWork 08 (regression). For
now, you should only remember that the p-value of the LOF test should be > 0.05.
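The error line of the reduced model is simply the interaction pooled with the pure error of the full model. The short sketch below (plain Python, values copied from the two ANOVA tables) shows where the 13 error degrees of freedom and the new F ratios come from.

```python
# Values from the full-model ANOVA table.
ss_ab, df_ab = 0.06889, 4      # Temperature*Pressure (becomes Lack-of-Fit)
ss_pure, df_pure = 0.16000, 9  # Error of the full model (becomes Pure Error)
ms_temperature = 0.15056
ms_pressure = 0.38389

# Pooling: the dropped interaction is absorbed into the error term.
ss_e_reduced = ss_ab + ss_pure
df_e_reduced = df_ab + df_pure
ms_e_reduced = ss_e_reduced / df_e_reduced

f_temperature = ms_temperature / ms_e_reduced
f_pressure = ms_pressure / ms_e_reduced
f_lack_of_fit = (ss_ab / df_ab) / (ss_pure / df_pure)

print(df_e_reduced, round(ss_e_reduced, 5), round(ms_e_reduced, 5))
print(round(f_temperature, 2), round(f_pressure, 2), round(f_lack_of_fit, 2))
```

Note that the lack-of-fit F ratio coincides with the F ratio of the interaction in the full model, which is why both carry the same p-value (0.470).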

Before drawing the conclusion, we check the residual assumptions.
[Figure: Scatterplot of SRES2 vs FITS2; Temperature; Pressure]
[Figure: Probability Plot of SRES2 (Normal): Mean -2.10079E-13, StDev 1.029, N 18, AD 0.189, P-Value 0.888]

[Figure: Test for Equal Variances: SRES2 vs Temperature; Pressure (95% Bonferroni Confidence Intervals for StDevs). Bartlett’s Test: P-Value 0.990]

All the standardized residuals lie in the interval (-3, +3), so no outliers are present; the residuals
also appear independent of the predicted response and of the factors. The hypotheses of normality and
homogeneous variance cannot be rejected.
In conclusion, the temperature and the pressure influence the response. The additive model is
significant.

b) Calculate the residual for the data (1,2,1)=90.7;

Let us calculate the residual $e_{121}$ by hand.

Using the full model, the residual would be:
\[ e_{121} = y_{121} - \hat{y}_{12} = 90.7 - \hat{\mu}_{12} = 90.70 - 90.65 = 0.05 \]

Using the reduced model instead (this is the correct way to estimate the residual):
\[ \hat{y}_{12} = \hat{\mu} + \hat{\tau}_1 + \hat{\beta}_2 = \bar{y}_{...} + (\bar{y}_{1..} - \bar{y}_{...}) + (\bar{y}_{.2.} - \bar{y}_{...}) = \bar{y}_{1..} + \bar{y}_{.2.} - \bar{y}_{...} = \frac{542.5}{6} + \frac{544.1}{6} - \frac{1627.4}{18} = 90.689 \]
\[ e_{121} = y_{121} - \hat{y}_{12} = 90.7 - 90.689 = 0.011 \]
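As a quick check of the arithmetic, the two fitted values can be computed directly from the totals. This is a minimal sketch in plain Python; the totals 542.5, 544.1 and 1627.4 are those used in the formula above.

```python
y_121 = 90.7  # first replicate at temperature 150, pressure 215

# Full model: the fitted value is the cell mean of (150, 215).
fit_full = (90.7 + 90.6) / 2
e_full = y_121 - fit_full

# Additive (reduced) model: row mean + column mean - grand mean.
row_mean = 542.5 / 6       # temperature 150
col_mean = 544.1 / 6       # pressure 215
grand_mean = 1627.4 / 18
fit_reduced = row_mean + col_mean - grand_mean
e_reduced = y_121 - fit_reduced

print(round(e_full, 3), round(fit_reduced, 3), round(e_reduced, 3))
```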

c) Under what conditions would you operate this process?

The model is additive: we can pick the best level of temperature independently of the pressure,
and vice versa. Thus, $\alpha_{FAM} = 0.05 \Rightarrow \alpha = \frac{\alpha_{FAM}}{2} = 0.025$. Since both factors have the same
number of levels, we can calculate the three constants for one factor only.

The analysis for the factor temperature:

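\[ r_A = \frac{a(a-1)}{2} = \frac{3(3-1)}{2} = 3, \qquad df_E = 13 \]

Bonferroni:
\[ B_\alpha = t_{\alpha/(2 r_A)}(df_E) = t_{0.025/6}(13) = 3.107 \]

Tukey:
\[ T_\alpha = \frac{1}{\sqrt{2}}\, q_\alpha(a, df_E) = \frac{1}{\sqrt{2}}\, q_{0.025}(3, 13) = \frac{4.296}{\sqrt{2}} = 3.04 \]

Scheffé:
\[ S_\alpha = \sqrt{(a-1)\, F_\alpha(a-1, df_E)} = \sqrt{2\, F_{0.025}(2, 13)} = \sqrt{2 \cdot 4.9653} = 3.151 \]

\[ S_{\alpha_{FAM}} = \sqrt{(a+b-2)\, F_{\alpha_{FAM}}(a+b-2, df_E)} = \sqrt{4\, F_{0.05}(4, 13)} = \sqrt{4 \cdot 3.179} = 3.57 \]

The critical value is
\[ T_\alpha \sqrt{\frac{2\, MS_E}{bn}} = 3.04 \sqrt{\frac{2 \cdot 0.01761}{6}} = 0.2329 \]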
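The comparison of the three constants and the resulting critical value can be sketched as follows. The quantiles are not computed here: they are the tabulated values quoted above, entered by hand, so the sketch needs only the standard library.

```python
import math

a, b, n = 3, 3, 2
df_e, ms_e = 13, 0.01761

# Tabulated quantiles quoted in the text (assumed correct, not recomputed).
t_quantile = 4.296 * 0 + 3.107   # t_{0.025/6}(13)
q_quantile = 4.296               # q_{0.025}(3, 13)
f_quantile = 4.9653              # F_{0.025}(2, 13)

r_a = a * (a - 1) // 2           # number of pairwise comparisons within one factor

constants = {
    "Bonferroni": t_quantile,
    "Tukey": q_quantile / math.sqrt(2),
    "Scheffe": math.sqrt((a - 1) * f_quantile),
}

# The method with the smallest constant gives the narrowest intervals.
best_method = min(constants, key=constants.get)
critical = constants[best_method] * math.sqrt(2 * ms_e / (b * n))

print(r_a, best_method, round(constants[best_method], 3), round(critical, 4))
```

Working with the unrounded Tukey constant gives a critical value that differs from the hand result only in the fourth decimal place.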
(For demonstration purposes, we built the Table of Differences by hand.)
Let us build the matrix of the differences:
Temperature   150     170
160           0.167   0.317
150                   0.15

The conclusion is:

160 150 170 (levels ordered by increasing mean; 160 and 150 are joined by a common underline, as are 150 and 170)

The temperatures 160° and 150° are not statistically different, and the same conclusion is drawn for
the temperatures 150° and 170°.

The same analysis is carried out for the factor pressure. The constants are the same because a=b, and
the critical value is still 0.233. The matrix of the differences is:
Pressure   200     215
230        0.183   0.5
200                0.317

230 200 215 (levels ordered by increasing mean; 230 and 200 are joined by a common underline)

In conclusion, the pressures 230 psi and 200 psi are not statistically different.
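Both tables of differences can be rebuilt from the level totals. The sketch below is only illustrative (the helper name `not_different` is ours); it uses a critical value of 0.233, i.e. the value obtained above from the Tukey constant.

```python
from itertools import combinations

CRITICAL = 0.233  # Tukey critical value for the additive model

# Level means, computed from the level totals over 6 observations each.
temperature_means = {150: 542.5 / 6, 160: 541.5 / 6, 170: 543.4 / 6}
pressure_means = {200: 542.2 / 6, 215: 544.1 / 6, 230: 541.1 / 6}


def not_different(means, critical):
    """Return the pairs of levels whose mean difference is below the critical value."""
    return [
        (u, v)
        for u, v in combinations(sorted(means), 2)
        if abs(means[u] - means[v]) < critical
    ]


print(not_different(temperature_means, CRITICAL))
print(not_different(pressure_means, CRITICAL))
```

Every pair not returned by the function differs significantly; for the pressure, level 215 differs from both other levels and has the highest mean.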

If we wish to maximize the yield of the chemical process, we select a pressure of 215 psi;
for the temperature, the levels 150° and 170° are not statistically different. If we consider that
a higher temperature corresponds to a higher energy cost, we would choose the temperature 150°.
Alternatively, if the budget allows more experiments, we could focus on these two
temperature levels to better understand which one allows a higher yield.

With Minitab:
Stat > ANOVA > Comparison; Options: confidence level 97.5%

Exercise 3 [M]

An article describes an experiment to investigate the effect of the type of glass and the type of
phosphor on the brightness of a television tube. The response variable is the current necessary (in
microamps) to obtain a specified brightness level. The data are as follows:

              Phosphor
Glass    1      2      3
1        280    300    290
         290    310    285
         285    295    290
2        230    260    220
         235    240    225
         240    235    230

a) Analyze the data and draw conclusions. Use α=0.05;


b) Use Tukey’s test with Minitab to determine the best condition (α=0.05).

Solution

a) Analyze the data and draw conclusions. Use α=0.05;


We start by using the individual value plot since we have three replicates for each condition.

[Figure: Individual Value Plot of Brightness (y-axis: Brightness; x-axis: Phosphor 1, 2, 3 within Glass 1, 2)]

The individual value plot indicates that there are no evident outliers and that the variability is uniform.

[Figure: Main Effects Plot for Brightness (data means; Glass 1, 2 and Phosphor 1, 2, 3)]
[Figure: Interaction Plot for Brightness (data means; Phosphor on the x-axis, one line per Glass type)]

From the Main Effect plot, the type of glass seems to influence the response more than the type of
phosphor. In the Interaction plot, the lines are approximately parallel, indicating a probable lack of
interaction between factors glass and phosphor.
General Linear Model: Brightness versus Glass; Phosphor
Factor Information
Factor Type Levels Values
Glass Fixed 2 1; 2
Phosphor Fixed 3 1; 2; 3

Analysis of Variance
Source DF Adj SS Adj MS F-Value P-Value
Glass 1 14450,0 14450,0 273,79 0,000
Phosphor 2 933,3 466,7 8,84 0,004
Glass*Phosphor 2 133,3 66,7 1,26 0,318
Error 12 633,3 52,8
Total 17 16150,0

Model Summary
S R-sq R-sq(adj) R-sq(pred)
7,26483 96,08% 94,44% 91,18%

Check for residual assumptions


[Figure: Scatterplot of SRES1 vs FITS1; Glass; Phosphor]
[Figure: Probability Plot of SRES1 (Normal): Mean -3.98447E-15, StDev 1.029, N 18, AD 0.359, P-Value 0.411]
[Figure: Test for Equal Variances: SRES1 vs Glass; Phosphor (95% Bonferroni Confidence Intervals for StDevs). Bartlett’s Test: P-Value 0.458]

There are no outliers. The hypothesis of normality cannot be rejected, and the same is true for the
hypothesis of equal variances. The residual assumptions are verified. In conclusion, the type of glass
and the type of phosphor are significant, whereas their interaction is not.

b) Use Tukey’s test to determine the best condition (α=0.05).


In this case we have two families, Glass and Phosphor: both have p-values < 0.05 and their
interaction is not significant.
The Minitab command is: Stat > ANOVA > General Linear Model > Comparison
Options: confidence level 97.5%

Grouping Information Using the Tukey Method and 97,5% Confidence
Glass N Mean Grouping
1 9 291,667 A
2 9 235,000 B
Means that do not share a letter are significantly different.

Grouping Information Using the Tukey Method and 97,5% Confidence


Phosphor N Mean Grouping
2 6 273,333 A
1 6 260,000 B
3 6 256,667 B
Means that do not share a letter are significantly different.

If we wish to decrease the current required, we recommend using glass type 2 and
either phosphor type 1 or type 3.

Exercise 4 [M]

Johnson and Leone describe an experiment to investigate warping of copper plates. The two factors
studied were the temperature and the copper content of the plates. The response variable was a
measure of the amount of warping. The data were as follows:

Copper Content (%)


T (°C) 40 60 80 100
50 17; 20 16; 21 24; 22 28; 27
75 12; 9 18; 13 17; 12 27; 31
100 16; 12 18; 21 25; 23 30; 23
125 21; 17 23; 21 23; 22 29; 31

a) Analyze the data and draw conclusions. Use α=0.05;


b) If low warping is desirable, what level of copper content would you specify? Use Tukey with
Minitab (no manual calculations are required).
c) Suppose that temperature cannot be easily controlled in the environment in which the copper
plates are to be used. Does this change your previous answer?

Solution

a) Analyze the data and draw conclusions. Use α=0.05;


We have replicated conditions, so we start by using the individual value plot.
[Figure: Individual Value Plot of Warping (y-axis: Warping; x-axis: Copper 40, 60, 80, 100 within Temperature 50, 75, 100, 125)]

The variability appears uniform and no evident outliers are shown in the graph.

[Figure: Main Effects Plot for Warping (data means; Temperature 50, 75, 100, 125 and Copper 40, 60, 80, 100)]
[Figure: Interaction Plot for Warping (data means; Copper on the x-axis, one line per Temperature level)]

The copper content seems to affect the response variable more than the temperature does. The interaction
between the temperature and the copper content does not seem particularly relevant.

General Linear Model: Warping versus Temperature; Copper
Factor Information
Factor Type Levels Values
Temperature Fixed 4 50; 75; 100; 125
Copper Fixed 4 40; 60; 80; 100

Analysis of Variance
Source DF Adj SS Adj MS F-Value P-Value
Temperature 3 156,1 52,031 7,67 0,002
Copper 3 698,3 232,781 34,33 0,000
Temperature*Copper 9 113,8 12,642 1,86 0,133
Error 16 108,5 6,781
Total 31 1076,7

Model Summary
S R-sq R-sq(adj) R-sq(pred)
2,60408 89,92% 80,48% 59,69%

Before drawing the conclusion, let us check the residual assumptions.

[Figure: Scatterplot of SRES1 vs FITS1; Temperature; Copper]
[Figure: Probability Plot of SRES1 (Normal): Mean -4.78784E-16, StDev 1.016, N 32, AD 0.666, P-Value 0.074]
[Figure: Test for Equal Variances: SRES1 vs Temperature; Copper (95% Bonferroni Confidence Intervals for StDevs). Bartlett’s Test: P-Value 0.984]

No outliers appear in the graph: all the standardized residuals lie in the interval (-3, +3).
Moreover, the residuals appear independent of the predicted response and of both factors. The
hypotheses of normality and homogeneous variance cannot be rejected.

b) If low warping is desirable, what level of copper content would you specify?
The Minitab command is: Stat > ANOVA > General Linear Model > Comparison
Options: confidence level 95%
Grouping Information Using the Tukey Method and 95% Confidence
Copper N Mean Grouping
100 8 28,250 A

80 8 21,000 B
60 8 18,875 B C
40 8 15,500 C
Means that do not share a letter are significantly different.

If we wish low warping, there is no significant difference between the copper levels 60 and 40. A
further experimental campaign focused on these two levels is desirable.

c) Suppose that temperature cannot be easily controlled in the environment in which the copper
plates are to be used. Does this change your previous answer?

No, it does not. The model is purely additive; this means that we can optimize separately the
temperature and the copper.

Exercise 5 [M]
The quality control department of a fabric finishing plant is studying the effect of several factors on
the dyeing of cotton-synthetic cloth. Three operators, three cycle times, and two temperatures were
selected. Three small specimens of cloth were dyed under each set of conditions. The finished cloth
was compared to a standard, and a numerical score was assigned. The results follow:

                      Temperature
               300                  350
             Operator             Operator
Time      1     2     3        1     2     3
40        23    27    31       24    38    34
          24    28    32       23    36    36
          25    26    29       28    35    39
50        36    34    33       37    34    34
          35    38    34       39    38    36
          36    39    35       35    36    31
60        28    35    26       26    36    28
          24    35    27       29    37    26
          27    34    25       25    34    24

a) How should the experiment be conducted?


b) Analyze the data. Use α=0.05.

Solution

a) How should the experiment be conducted?

The experiment should be conducted randomizing the order of the experiments.

b) Analyze the data. Use α=0.05

[Figure: Individual Value Plot of Score (y-axis: Score; x-axis: Operator 1, 2, 3 within Time 40, 50, 60 within Temperature 300, 350)]

The individual value plot indicates that the variability appears uniform. However, some observations
could be outliers: for instance, the observation (300, 50, 2, 1) is somewhat far from the other two
replicates, and the same can be said for the observations (300, 60, 2, 2) and (350, 40, 1, 3).

[Figure: Main Effects Plot for Score (data means; Temperature 300, 350; Time 40, 50, 60; Operator 1, 2, 3)]
[Figure: Interaction Plot for Score (data means; matrix of two-way plots for Temperature, Time and Operator)]

The factors time and operator seem to influence the response more than the factor temperature. With
three or more factors, the interaction plot shows a separate two-way interaction plot for each
two-factor combination. The factors time and operator seem to interact, given the lack of parallelism of
the lines.

General Linear Model: Score versus Temperature; Time; Operator


Factor Information
Factor Type Levels Values
Temperature Fixed 2 300; 350
Time Fixed 3 40; 50; 60
Operator Fixed 3 1; 2; 3

Analysis of Variance
Source DF Adj SS Adj MS F-Value P-Value
Temperature 1 50,07 50,074 15,28 0,000
Time 2 436,00 218,000 66,51 0,000
Operator 2 261,33 130,667 39,86 0,000
Temperature*Time 2 78,81 39,407 12,02 0,000
Temperature*Operator 2 11,26 5,630 1,72 0,194
Time*Operator 4 355,67 88,917 27,13 0,000
Temperature*Time*Operator 4 46,19 11,546 3,52 0,016
Error 36 118,00 3,278
Total 53 1357,33

Model Summary
S R-sq R-sq(adj) R-sq(pred)
1,81046 91,31% 87,20% 80,44%

We have to check the residual assumptions before drawing the conclusions.

[Figure: Scatterplot of SRES1 vs FITS1; Temperature; Time; Operator]
[Figure: Probability Plot of SRES1 (Normal): Mean -2.80434E-15, StDev 1.009, N 54, AD 0.376, P-Value 0.400]
[Figure: Test for Equal Variances: SRES1 vs Temperature; Time; Operator (95% Bonferroni Confidence Intervals for StDevs). Bartlett’s Test: P-Value 0.867]

From the scatterplot, no outliers appear. The hypotheses of normality and equal variance cannot be
rejected. Thus, the model assumptions are verified. In conclusion, the three factors are significant, as
are the interactions time*operator and temperature*time and the three-factor interaction.

If we wish to estimate the model parameters, one of the possible outputs of the GLM command is:
Coefficients
Term Coef SE Coef T-Value P-Value VIF
Constant 31,556 0,246 128,08 0,000
Temperature
300 -0,963 0,246 -3,91 0,000 1,00
Time
40 -1,667 0,348 -4,78 0,000 1,33
50 4,000 0,348 11,48 0,000 1,33
Operator
1 -2,444 0,348 -7,02 0,000 1,33
2 2,889 0,348 8,29 0,000 1,33
Temperature*Time
300 40 -1,704 0,348 -4,89 0,000 1,33
300 50 0,963 0,348 2,76 0,009 1,33
Temperature*Operator
300 1 0,519 0,348 1,49 0,145 1,33
300 2 -0,593 0,348 -1,70 0,098 1,33
Time*Operator
40 1 -2,944 0,493 -5,98 0,000 1,78
40 2 -1,111 0,493 -2,25 0,030 1,78
50 1 3,222 0,493 6,54 0,000 1,78
50 2 -1,944 0,493 -3,95 0,000 1,78
Temperature*Time*Operator
300 40 1 1,648 0,493 3,34 0,002 1,78
300 40 2 -1,407 0,493 -2,86 0,007 1,78
300 50 1 -1,185 0,493 -2,41 0,021 1,78
300 50 2 1,093 0,493 2,22 0,033 1,78

Exercise 6 [M]

Consider the three-factor model:

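\[ y_{ijk} = \mu + \tau_i + \beta_j + \gamma_k + (\tau\beta)_{ij} + (\beta\gamma)_{jk} + \varepsilon_{ijk} \qquad \text{with} \quad \begin{cases} i = 1, 2, \ldots, a \\ j = 1, 2, \ldots, b \\ k = 1, 2, \ldots, c \end{cases} \]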

Notice that there is only one replicate. Assuming all the factors are fixed, write down the analysis of
variance table, including the expected mean squares (use Montgomery’s tables for the
complete model).

Solution

The model is $y_{ijk} = \mu + \tau_i + \beta_j + \gamma_k + (\tau\beta)_{ij} + (\beta\gamma)_{jk} + \varepsilon_{ijk}$


Because there are no replicates, the error is estimated using the terms that do not appear
in the model: $SS_{AC}$ and $SS_{ABC}$.
We are required to write down the ANOVA table and the expressions of E(MS) using Montgomery’s
table. Because the design is orthogonal, we can simply delete the rows of the terms that are not
included in the model: the estimates of the remaining terms do not change when some rows are deleted. The table is:

Source   SS      df           MS                   E(MS)
A        SSA     (a-1)        SSA/(a-1)            $\sigma^2 + \frac{bc}{a-1} \sum_i \tau_i^2$
B        SSB     (b-1)        SSB/(b-1)            $\sigma^2 + \frac{ac}{b-1} \sum_j \beta_j^2$
C        SSC     (c-1)        SSC/(c-1)            $\sigma^2 + \frac{ab}{c-1} \sum_k \gamma_k^2$
AB       SSAB    (a-1)(b-1)   SSAB/((a-1)(b-1))    $\sigma^2 + \frac{c}{(a-1)(b-1)} \sum_{ij} (\tau\beta)_{ij}^2$
BC       SSBC    (b-1)(c-1)   SSBC/((b-1)(c-1))    $\sigma^2 + \frac{a}{(b-1)(c-1)} \sum_{jk} (\beta\gamma)_{jk}^2$
Error    SSE     dfE          SSE/dfE              $E(MS_E)$
Total    SSTOT   abc-1
Let us calculate the degrees of freedom of the error:
\[ SS_E = SS_{AC} + SS_{ABC} \]
\[ df_E = df_{AC} + df_{ABC} = (a-1)(c-1) + (a-1)(b-1)(c-1) = (a-1)(c-1)\left[ 1 + (b-1) \right] = b(a-1)(c-1) \]

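Let us calculate the expected mean square of the error:

\[ MS_E = \frac{SS_{AC} + SS_{ABC}}{df_{AC} + df_{ABC}} = \frac{df_{AC}}{df_{AC} + df_{ABC}} \frac{SS_{AC}}{df_{AC}} + \frac{df_{ABC}}{df_{AC} + df_{ABC}} \frac{SS_{ABC}}{df_{ABC}} = \frac{df_{AC}}{df_{AC} + df_{ABC}} MS_{AC} + \frac{df_{ABC}}{df_{AC} + df_{ABC}} MS_{ABC} \]

\[ E(MS_E) = \frac{df_{AC}}{df_{AC} + df_{ABC}} E(MS_{AC}) + \frac{df_{ABC}}{df_{AC} + df_{ABC}} E(MS_{ABC}) \]
\[ = \frac{df_{AC}}{df_{AC} + df_{ABC}} \left[ \sigma^2 + \frac{b}{(a-1)(c-1)} \sum_{ik} (\tau\gamma)_{ik}^2 \right] + \frac{df_{ABC}}{df_{AC} + df_{ABC}} \left[ \sigma^2 + \frac{1}{(a-1)(b-1)(c-1)} \sum_{ijk} (\tau\beta\gamma)_{ijk}^2 \right] \]
\[ = \sigma^2 + \frac{df_{AC}}{df_{AC} + df_{ABC}} \frac{b}{(a-1)(c-1)} \sum_{ik} (\tau\gamma)_{ik}^2 + \frac{df_{ABC}}{df_{AC} + df_{ABC}} \frac{1}{(a-1)(b-1)(c-1)} \sum_{ijk} (\tau\beta\gamma)_{ijk}^2 \]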

Exercise 7

An experiment was conducted to measure the amount of product y. Two factors were
studied (A and B). The chosen levels were coded to the interval [-1, 1]: the levels of A were -1 and +1,
and the levels of B were -1, 0 and +1. The data were as follows:

                                 B
A      -1                            0                             1
-1     4.70, 3.55, 4.01, 2.38        8.08, 9.47, 9.12, 9.51        17.29, 15.92, 15.75, 16.86
 1     20.30, 21.51, 19.89, 21.04    21.47, 18.79, 21.29, 18.39    21.61, 16.28, 18.08, 18.03

a) Analyze the data. Use α=0.05;


b) Under what conditions would you operate this process? Evaluate by hand the three constants
and the critical value, then use Minitab.

Solution

a) Analyze the data. Use α=0.05;

[Figure: Individual Value Plot of Productivity (y-axis: Productivity; x-axis: B -1, 0, 1 within A -1, 1)]

The observations in cells (1,0) and (1,1) deserve attention: they appear to be potential outliers.

[Figure: Main Effects Plot for Productivity (data means; A -1, 1 and B -1, 0, 1)]
[Figure: Interaction Plot for Productivity (data means; B on the x-axis, one line per level of A)]

The factor A seems to have an influence on the response greater than the factor B. The interaction
seems to be fairly relevant.

General Linear Model: Productivity versus A; B


Factor Information
Factor Type Levels Values

A Fixed 2 -1; 1
B Fixed 3 -1; 0; 1

Analysis of Variance
Source DF Adj SS Adj MS F-Value P-Value
A 1 600,40 600,400 356,70 0,000
B 2 113,08 56,542 33,59 0,000
A*B 2 227,03 113,516 67,44 0,000
Error 18 30,30 1,683
Total 23 970,81

Model Summary
S R-sq R-sq(adj) R-sq(pred)
1,29739 96,88% 96,01% 94,45%

Before drawing the conclusion, let us check the residual assumptions


[Figure: Scatterplot of SRES1 vs FITS1; A; B]
[Figure: Probability Plot of SRES1 (Normal): Mean 2.312965E-16, StDev 1.022, N 24, AD 0.216, P-Value 0.827]
[Figure: Test for Equal Variances: SRES1 vs A; B (95% Bonferroni Confidence Intervals for StDevs). Bartlett’s Test: P-Value 0.218]

From the graphs, we can observe that a residual in cell (1,1) behaves differently from
the others, but its standardized value is lower than 3, so it cannot be classified as an outlier. The
hypothesis of normality and the hypothesis of equal variance cannot be rejected.
In conclusion, both factors are significant, as is their interaction.

b) Under what conditions would you operate this process?


The complete model is significant; we have to choose the combination of factor levels that maximizes
the response. A multiple comparison is required. First of all, we have to calculate the Bonferroni,
Scheffé and Tukey constants.

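\[ r = \frac{ab(ab-1)}{2} = \frac{6(6-1)}{2} = 15, \qquad df_E = 18, \qquad a = 2, \quad b = 3 \]
Bonferroni:
\[ B_{0.05} = t_{0.05/30}(18) = 3.38 \]
Tukey:
\[ T_{0.05} = \frac{1}{\sqrt{2}}\, q_{0.05}(ab, df_E) = \frac{1}{\sqrt{2}}\, q_{0.05}(6, 18) = \frac{4.49}{\sqrt{2}} = 3.18 \]
Scheffé:
\[ S_{0.05} = \sqrt{(ab-1)\, F_\alpha(ab-1, df_E)} = \sqrt{5\, F_{0.05}(5, 18)} = \sqrt{13.864} = 3.72 \]
The Tukey constant has the lowest value.
The critical value is:
\[ T_{0.05} \sqrt{\frac{2\, MS_E}{n}} = 3.18 \sqrt{\frac{2 \cdot 1.68}{4}} = 2.915 \]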
The mean of each cell is:

A B $\bar{y}_{ij.}$
-1 -1 3.66
-1 0 9.045
-1 1 16.455
1 1 18.500
1 0 19.985
1 -1 20.685

Comparing each cell mean with the others gives the following absolute differences:

                    (-1,0)    (-1,1)    (1,1)     (1,0)     (1,-1)
                    9.045     16.455    18.5      19.985    20.685
(-1,-1)   3.66      5.385     12.795    14.84     16.325    17.025
(-1,0)    9.045               7.41      9.455     10.94     11.64
(-1,1)    16.455                        2.045     3.53      4.23
(1,1)     18.5                                    1.485     2.185
(1,0)     19.985                                            0.7

The significant differences are those larger than the critical value (all except 2.045, 1.485, 2.185
and 0.7). The table of differences is summarized in the next diagram. In conclusion, the combinations
(1,-1), (1,1) and (1,0) are not statistically different.

(-1,-1) (-1,0) (-1,1) (1,1) (1,0) (1,-1) (cells ordered by increasing mean)
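The whole pairwise comparison can be scripted. The sketch below is illustrative only: it takes the tabulated quantile q_0.05(6, 18) = 4.49 and MS_E = 1.683 from the text and the ANOVA table, builds the Tukey critical value T·sqrt(2·MS_E/n), and lists the pairs of cells that are not significantly different.

```python
import math
from itertools import combinations

# Cell means, keyed by (A, B), in increasing order of the mean.
means = {
    (-1, -1): 3.660, (-1, 0): 9.045, (-1, 1): 16.455,
    (1, 1): 18.500, (1, 0): 19.985, (1, -1): 20.685,
}
n = 4            # replicates per cell
ms_e = 1.683     # error mean square from the ANOVA table
q_table = 4.49   # q_0.05(6, 18), tabulated value quoted in the text

tukey_constant = q_table / math.sqrt(2)
critical = tukey_constant * math.sqrt(2 * ms_e / n)

# Pairs of cells whose means differ by less than the critical value.
not_different = [
    (u, v) for u, v in combinations(means, 2)
    if abs(means[u] - means[v]) < critical
]

print(round(critical, 3))
for pair in not_different:
    print(pair)
```

With these inputs the critical value comes out at about 2.91, and the non-significant pairs can be compared with the letters in the Tukey grouping produced by Minitab.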

Grouping Information Using the Tukey Method and 95% Confidence


A*B N Mean Grouping
1 -1 4 20,685 A
1 0 4 19,985 A
1 1 4 18,500 A B
-1 1 4 16,455 B
-1 0 4 9,045 C
-1 -1 4 3,660 D
Means that do not share a letter are significantly different.

To maximize the response, we can choose (A,B) = (1,-1), (1,0) or (1,1). These
conditions are all equivalent.

Exercise 8 [February 28th, 2012]

An experiment, presented in the paper “A Systematic Approach to the Analysis of Means” (E. G.
Schilling, Journal of Quality Technology, 1973), investigated the washing power of a solution as
measured by the reflectance of pieces of cotton cloth after washing. Pieces of cloth were soiled with
colloidal graphite and liquid paraffin and then washed for 20 minutes at 60° followed by two rinses
at 40° and 30°, respectively. The three factors in the washing solution of interest were:
• “sodium carbonate” (Factor A, levels 0%, 0.05%, and 0.1%);
• “detergent” (Factor B, levels 0.05%, 0.1%, and 0.2%);
• “sodium carboxymethyl cellulose” (Factor C, levels 0%, 0.025%, 0.05%).
One observation was taken per treatment combination, and the responses are shown in the next table:

                            Cellulose
Carbonate   Detergent    1       2       3
1           1            10.6    14.9    18.2
1           2            19.8    24.3    23.2
1           3            27.0    31.5    34.0
2           1            19.7    25.5    25.9
2           2            32.9    36.4    38.9
2           3            36.1    39.0    40.6
3           1            22.3    29.4    29.7
3           2            32.0    41.0    41.6
3           3            32.1    41.5    38.7

a) Analyze the data. Use α=0.05;


b) If you wish to increase the quality of the washing power of a solution, under what conditions
would you operate this process? Use Tukey with α=5%; calculate manually the Tukey’s
constant and the critical value.

Solution

a) Analyze the data. Use α=0.05;

[Figure: Main Effects Plot for Washing (data means; Carbonate, Detergent and Cellulose, levels 1, 2, 3)]
[Figure: Interaction Plot for Washing (data means; matrix of two-way plots for Carbonate, Detergent and Cellulose)]
The detergent and the carbonate seem relevant and their effects are large compared to the effect of
the cellulose. The carbonate and detergent show nonparallel lines, indicating a probable interaction.
By contrast, the interaction lines of detergent and cellulose are parallel; we do not suspect any interaction.
The same holds for the interaction between carbonate and cellulose.

General Linear Model: Washing versus Carbonate; Detergent; Cellulose


Factor Information
Factor Type Levels Values
Carbonate Fixed 3 1; 2; 3
Detergent Fixed 3 1; 2; 3
Cellulose Fixed 3 1; 2; 3

Analysis of Variance
Source DF Adj SS Adj MS F-Value P-Value
Carbonate 2 723,41 361,707 207,34 0,000
Detergent 2 933,03 466,516 267,42 0,000
Cellulose 2 224,19 112,096 64,26 0,000
Carbonate*Detergent 4 73,37 18,342 10,51 0,003
Carbonate*Cellulose 4 18,23 4,557 2,61 0,115
Detergent*Cellulose 4 1,01 0,253 0,14 0,960
Error 8 13,96 1,745
Total 26 1987,20

Model Summary
S R-sq R-sq(adj) R-sq(pred)
1,32081 99,30% 97,72% 92,00%

Let us check the residual assumptions before drawing the conclusions.


[Figure: Scatterplot of SRES1 vs FITS1; Carbonate; Detergent; Cellulose]
[Figure: Probability Plot of SRES1 (Normal): Mean -1.38490E-14, StDev 1.019, N 27, AD 0.960, P-Value 0.013]

The hypothesis of normality is rejected. Looking at the ANOVA table, we can observe that the
interactions Carbonate*Cellulose and Detergent*Cellulose are not significant. Thus, we delete them
from the model.

General Linear Model: Washing versus Carbonate; Detergent; Cellulose


Factor Information
Factor Type Levels Values
Carbonate Fixed 3 1; 2; 3
Detergent Fixed 3 1; 2; 3
Cellulose Fixed 3 1; 2; 3

Analysis of Variance
Source DF Adj SS Adj MS F-Value P-Value
Carbonate 2 723,41 361,707 174,34 0,000
Detergent 2 933,03 466,516 224,86 0,000

Cellulose 2 224,19 112,096 54,03 0,000
Carbonate*Detergent 4 73,37 18,342 8,84 0,001
Error 16 33,19 2,075
Total 26 1987,20

Model Summary
S R-sq R-sq(adj) R-sq(pred)
1,44037 98,33% 97,29% 95,24%

Let us check the residual assumptions


[Figure: Scatterplot of SRES2 vs FITS2; Carbonate; Detergent; Cellulose]
[Figure: Probability Plot of SRES2 (Normal): Mean -8.56928E-15, StDev 1.019, N 27, AD 0.302, P-Value 0.552]

The normality hypothesis cannot be rejected. Looking at the scatterplot, no outliers appear and the
variance appears homogeneous among the factor levels. The assumptions are verified. In conclusion,
the three factors influence the response as well as the interaction Carbonate*Detergent.

b) If you wish to increase the quality of the washing power of a solution, under what conditions
would you operate this process? Use Tukey with α=5% and calculate manually the Tukey’s
constant.

To choose the level of each factor that increases the quality of the washing power, a multiple
comparison has to be done. The factor cellulose does not interact with the other factors, so its best
level can be chosen independently. By contrast, the effect of the carbonate is not independent of the
effect of the detergent: their interaction is significant. We have two families
(Carbonate*Detergent and Cellulose), thus: $\alpha_{FAM} = 0.05 \Rightarrow \alpha = \frac{\alpha_{FAM}}{2} = 0.025$
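The Tukey constants are:
\[ \text{Tukey (AB):} \quad T_{0.025} = \frac{1}{\sqrt{2}}\, q_{0.025}(ab, df_E) = \frac{1}{\sqrt{2}}\, q_{0.025}(9, 16) = \frac{5.550}{\sqrt{2}} = 3.92 \]
\[ \text{Tukey (C):} \quad T_{0.025} = \frac{1}{\sqrt{2}}\, q_{0.025}(c, df_E) = \frac{1}{\sqrt{2}}\, q_{0.025}(3, 16) = \frac{4.148}{\sqrt{2}} = 2.93 \]
The critical values are:
\[ AB: \quad T_\alpha \sqrt{\frac{2\, MS_E}{cn}} = 3.92 \sqrt{\frac{2 \cdot 2.07}{3}} = 4.605 \]
\[ C: \quad T_\alpha \sqrt{\frac{2\, MS_E}{abn}} = 2.93 \sqrt{\frac{2 \cdot 2.07}{9}} = 1.987 \]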
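The two critical values can be recomputed as a check. The sketch below is a rough illustration: it uses the tabulated quantiles quoted above and MS_E = 2.075 from the reduced-model ANOVA table (the hand calculation rounds it to 2.07, so the last digits differ slightly), then applies the cellulose critical value to the three cellulose means.

```python
import math

a = b = c = 3
n = 1             # one observation per treatment combination
ms_e = 2.075      # error mean square of the reduced model

# Tabulated studentized-range quantiles quoted in the text.
q_ab = 5.550      # q_0.025(9, 16)
q_c = 4.148       # q_0.025(3, 16)

# Each Carbonate*Detergent mean averages c*n observations,
# each Cellulose mean averages a*b*n observations.
crit_ab = q_ab / math.sqrt(2) * math.sqrt(2 * ms_e / (c * n))
crit_c = q_c / math.sqrt(2) * math.sqrt(2 * ms_e / (a * b * n))

cellulose = {3: 32.3111, 2: 31.5000, 1: 25.8333}
diff_3_vs_2 = cellulose[3] - cellulose[2]
diff_2_vs_1 = cellulose[2] - cellulose[1]

print(round(crit_ab, 3), round(crit_c, 3))
print(diff_3_vs_2 < crit_c, diff_2_vs_1 > crit_c)
```

The cellulose levels 3 and 2 differ by less than the critical value, while level 1 is clearly lower.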
Grouping Information Using the Tukey Method and 97,5% Confidence
Cellulose N Mean Grouping
3 9 32,3111 A
2 9 31,5000 A
1 9 25,8333 B
Means that do not share a letter are significantly different.

Grouping Information Using the Tukey Method and 97,5% Confidence
Carbonate*Detergent N Mean Grouping
2 3 3 38,5667 A
3 2 3 38,2000 A
3 3 3 37,4333 A
2 2 3 36,0667 A
1 3 3 30,8333 B
3 1 3 27,1333 B C
2 1 3 23,7000 C D
1 2 3 22,4333 D
1 1 3 14,5667 E
Means that do not share a letter are significantly different.

If we wish to maximize the quality of the washing, concerning Carbonate and Detergent there is
no significant difference among the conditions (2,3), (3,2), (3,3) and (2,2). Concerning Cellulose, the
levels 2 and 3 are not statistically different, so either of them can be chosen.
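
The grouping can be checked by comparing each pairwise difference of means with the corresponding critical value. A minimal sketch, assuming the means reported in the Minitab output and the critical values 1.987 and 4.605 obtained by hand; `different_pairs` is a hypothetical helper, not a Minitab function.

```python
from itertools import combinations

def different_pairs(means, critical_value):
    """Return the pairs of labels whose means differ by more than the critical value."""
    return [
        (a, b)
        for (a, mean_a), (b, mean_b) in combinations(means.items(), 2)
        if abs(mean_a - mean_b) > critical_value
    ]

# Means copied from the Tukey grouping tables.
cellulose = {1: 25.8333, 2: 31.5000, 3: 32.3111}
carb_det = {
    (2, 3): 38.5667, (3, 2): 38.2000, (3, 3): 37.4333, (2, 2): 36.0667,
    (1, 3): 30.8333, (3, 1): 27.1333, (2, 1): 23.7000, (1, 2): 22.4333,
    (1, 1): 14.5667,
}

# Cellulose: only level 1 differs from the other two.
print(different_pairs(cellulose, 1.987))

# Carbonate*Detergent: no significant pair among the four best conditions.
top_four = {k: carb_det[k] for k in [(2, 3), (3, 2), (3, 3), (2, 2)]}
print(different_pairs(top_four, 4.605))
```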

Exercise 9

We want to determine whether different hardening methods (A) and processing times (B) affect
external hardness. Five different hardening methods and three different processing times are used.
Suppose that we use specimens with a rectangular section and replicate the experiment four
times.
We measure the specimen hardness at the center of the largest face.
Let us calculate the power using the direct method, if we are interested in:
a) a ratio d/σ=3.5 concerning the factor A. Verify the results with Minitab;
b) a difference greater than or equal to 4 among the levels of B (σ²=20);
c) a difference greater than 2.5σ concerning the interaction AB.

Solution

From the text: a=5, b=3, n=4.


$$df_A = a - 1 = 4 \qquad df_B = b - 1 = 2 \qquad df_{AB} = (a - 1)(b - 1) = 8 \qquad df_E = ab(n - 1) = 45$$
a) Let us calculate the power using the direct method, if we are interested in a ratio d/σ=3.5
concerning the factor A. Verify the results with Minitab
$$\text{Power} = 1 - \beta = \text{Prob}\{F(df_A, df_E, \delta) > F_{\alpha}(df_A, df_E)\}$$
• Let us calculate $F_{\alpha}(df_A, df_E)$
Inverse Cumulative Distribution Function
F distribution with 4 DF in numerator and 45 DF in denominator
P(X<=x) x
0,95 2,57874

• Let us calculate the noncentrality parameter
$$\delta_A = \frac{b \cdot n}{2}\left(\frac{d}{\sigma}\right)^2 = \frac{4 \cdot 3}{2}\, 3.5^2 = 73.5$$
• Let us calculate β
Cumulative Distribution Function
F distribution with 4 DF in numerator and 45 DF in denominator and
noncentrality parameter 73,5
x P(X<=x)
2,57874 0,0000000

• The power is: Power = 1 − β = 1 − 0,0000000 ≈ 1

Minitab command: Stat > Power and Sample Size > General Full Factorial

General Full Factorial Design


α = 0,05 Assumed standard deviation = 4,47214
Factors: 2 Number of levels: 5; 3
Include terms in the model up through order: 2
Not including blocks in model.

Maximum Total
Difference Reps Runs Power
15,6525 4 60 1,00000

[Figure: Power Curve for General Full Factorial. Power (0 to 1) versus Maximum Difference (0 to 14) for Reps = 4. Assumptions: α 0,05; StDev 4,47214; # Factors 2; # Levels 5; 3. Terms included in model: Blocks No; Term Order 2]

b) Let us calculate the power using the direct method, if we are interested in a difference greater
or equal to 4 among the levels of B;

$$\text{Power} = 1 - \beta = \text{Prob}\{F(df_B, df_E, \delta) > F_{\alpha}(df_B, df_E)\}$$

• Let us calculate $F_{\alpha}(df_B, df_E)$
Inverse Cumulative Distribution Function
F distribution with 2 DF in numerator and 45 DF in denominator
P(X<=x) x
0,95 3,20432
• Let us calculate the noncentrality parameter
$$\delta_B = \frac{a \cdot n}{2}\left(\frac{d}{\sigma}\right)^2 = \frac{4 \cdot 5}{2}\left(\frac{4}{2\sqrt{5}}\right)^2 = 8$$
• Let us calculate β
Cumulative Distribution Function
F distribution with 2 DF in numerator and 45 DF in denominator and
noncentrality parameter 8
x P(X<=x)
3,20432 0,313553

• The power is: Power = 1- β = 1-0,313553 = 0.686447

We cannot verify this result directly with Minitab because: “Minitab performs the calculation based on the
main effect with the largest number of levels to provide conservative results”.
Let us enter the new maximum difference d = 4 in Minitab anyway. The output is:

General Full Factorial Design


α = 0,05 Assumed standard deviation = 4,47214
Factors: 2 Number of levels: 5; 3
Include terms in the model up through order: 2
Not including blocks in model.
Maximum Total
Difference Reps Runs Power
4 4 60 0,345190

The power reported by Minitab is lower because it refers to the main effect with the largest number of
levels (here factor A, with five levels) rather than to factor B.
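
A hedged reading of this difference, assuming that Minitab applies the same noncentrality formula to factor A (five levels) with the maximum difference d = 4 and σ² = 20, is that the parameter and the degrees of freedom become:

```latex
\delta_A = \frac{b \cdot n}{2}\left(\frac{d}{\sigma}\right)^2
         = \frac{3 \cdot 4}{2} \cdot \frac{4^2}{20} = 4.8,
\qquad df_A = 4, \quad df_E = 45
```

Under this assumption the noncentrality parameter is smaller than the value 8 used for factor B and is spread over more numerator degrees of freedom, so a power below 0.686 is to be expected.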
c) Let us calculate the power using the direct method, if we are interested in a difference greater
than 2.5σ concerning the interaction AB.

$$\text{Power} = 1 - \beta = \text{Prob}\{F(df_{AB}, df_E, \delta) > F_{\alpha}(df_{AB}, df_E)\}$$

• Let us calculate $F_{\alpha}(df_{AB}, df_E)$
Inverse Cumulative Distribution Function
F distribution with 8 DF in numerator and 45 DF in denominator
P(X<=x) x
0,95 2,15213
• Let us calculate the noncentrality parameter
$$\delta_{AB} = \frac{n}{2}\left(\frac{d}{\sigma}\right)^2 = \frac{4}{2}\, 2.5^2 = 12.5$$
• Let us calculate β
Cumulative Distribution Function
F distribution with 8 DF in numerator and 45 DF in denominator and
noncentrality parameter 12.5
x P(X<=x)
2,15213 0,381248

• The power is: Power = 1- β = 1-0,381248 = 0.618752

Minitab is not able to do this calculation.
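
Outside Minitab, the direct method of parts a) to c) can be reproduced with a short script. The sketch below is one possible pure-Python implementation, not the routine Minitab uses: it assumes the standard representation of the noncentral F distribution as a Poisson mixture of regularized incomplete beta functions, and all function names are illustrative.

```python
import math

def _betacf(a, b, x, max_iter=500, eps=1e-14):
    """Continued fraction of the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        # Even and odd coefficients of the continued fraction for step m.
        for num in (
            m * (b - m) * x / ((a + 2 * m - 1) * (a + 2 * m)),
            -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1)),
        ):
            d = 1.0 + num * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + num / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h

def betainc(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b

def ncf_cdf(x, dfn, dfd, nc=0.0, terms=400):
    """CDF of the (noncentral) F distribution as a Poisson mixture of beta CDFs."""
    if x <= 0.0:
        return 0.0
    y = dfn * x / (dfn * x + dfd)
    if nc == 0.0:
        return betainc(dfn / 2, dfd / 2, y)
    lam = nc / 2.0
    return sum(
        math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
        * betainc(dfn / 2 + k, dfd / 2, y)
        for k in range(terms)
    )

def f_critical(alpha, dfn, dfd):
    """Upper-alpha quantile of the central F distribution, found by bisection."""
    lo, hi = 0.0, 1.0
    while ncf_cdf(hi, dfn, dfd) < 1.0 - alpha:
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if ncf_cdf(mid, dfn, dfd) < 1.0 - alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def power(dfn, dfd, nc, alpha=0.05):
    """Power = 1 - beta = Prob{F(dfn, dfd, nc) > F_alpha(dfn, dfd)}."""
    return 1.0 - ncf_cdf(f_critical(alpha, dfn, dfd), dfn, dfd, nc)

a, b, n = 5, 3, 4
df_e = a * b * (n - 1)
# (numerator df, noncentrality parameter) for each part of the exercise.
cases = {
    "A": (a - 1, b * n / 2 * 3.5 ** 2),
    "B": (b - 1, a * n / 2 * 4 ** 2 / 20),
    "AB": ((a - 1) * (b - 1), n / 2 * 2.5 ** 2),
}
results = {name: power(dfn, df_e, nc) for name, (dfn, nc) in cases.items()}
for name, value in results.items():
    print(f"Power for {name}: {value:.4f}")
```

The critical values are obtained by bisection on the central CDF, which is slow but easy to check against the inverse cumulative distribution values reported above.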

Exercise 10 [July 10th 2013]

Diet affects weight gain. We wish to compare nine diets; these diets are the factor-level combinations
of protein source (beef, pork, and grain) and number of calories (low, medium, and high). There are
eighteen test animals that were randomly assigned to the nine diets, two animals per diet. The mean
responses (weight gain) are:

Weight Calories
Protein Low Medium High
Beef 76 86,8 101,8
Pork 78,3 89,5 98,2
Grain 78,8 83,5 86,2

a) Analyze the data with α=0.05;


b) If we wish to reduce the weight gain, what would you recommend? Evaluate the Tukey
constant and the critical value, then use Minitab.

Solution

a) Analyze the data with α=0.05

[Figure: Main Effects Plot for Weight (data means), panels Protein and Calories, levels 1 to 3]
[Figure: Interaction Plot for Weight (data means), Calories on the horizontal axis, one line per Protein level]

The factor Calories seems to have a larger influence on the weight than the factor Protein. The
interaction between the factors seems relevant, but we cannot directly test its significance because
of the lack of replicates.

General Linear Model: Weight versus Protein; Calories


Factor Information
Factor Type Levels Values
Protein Fixed 3 1; 2; 3
Calories Fixed 3 1; 2; 3

Analysis of Variance
Source DF Adj SS Adj MS F-Value P-Value
Protein 2 63,05 31,52 1,36 0,355
Calories 2 469,94 234,97 10,12 0,027
Error 4 92,91 23,23
Total 8 625,90

Model Summary
S R-sq R-sq(adj) R-sq(pred)
4,81958 85,16% 70,31% 24,85%
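
The sums of squares in this table can be recomputed from the nine cell means given in the exercise text. The sketch below is a hand-style check under the additive model with one value per cell; the closed-form p-value is a property of the F distribution with exactly 2 numerator degrees of freedom, and the names used are illustrative.

```python
# Mean weight gains from the exercise text: rows = Protein, columns = Calories.
cells = [
    [76.0, 86.8, 101.8],   # Beef
    [78.3, 89.5, 98.2],    # Pork
    [78.8, 83.5, 86.2],    # Grain
]
a, b = len(cells), len(cells[0])
grand = sum(map(sum, cells)) / (a * b)
row_means = [sum(row) / b for row in cells]                       # Protein means
col_means = [sum(row[j] for row in cells) / a for j in range(b)]  # Calories means

ss_protein = b * sum((m - grand) ** 2 for m in row_means)
ss_calories = a * sum((m - grand) ** 2 for m in col_means)
ss_total = sum((y - grand) ** 2 for row in cells for y in row)
ss_error = ss_total - ss_protein - ss_calories

df_error = (a - 1) * (b - 1)
ms_error = ss_error / df_error
f_protein = ss_protein / (a - 1) / ms_error
f_calories = ss_calories / (b - 1) / ms_error

def p_value_2df(f_value, dfd):
    """Upper tail of F(2, dfd); this closed form holds only for 2 numerator df."""
    return (1 + 2 * f_value / dfd) ** (-dfd / 2)

print(f"SS Protein = {ss_protein:.2f}, F = {f_protein:.2f}, p = {p_value_2df(f_protein, df_error):.3f}")
print(f"SS Calories = {ss_calories:.2f}, F = {f_calories:.2f}, p = {p_value_2df(f_calories, df_error):.3f}")
print(f"SS Error = {ss_error:.2f}, SS Total = {ss_total:.2f}")
```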

Before drawing the conclusions, let us check the residual assumptions.
[Figure: Scatterplot of SRES1 vs FITS1; Protein; Calories]
[Figure: Normal probability plot of SRES1. Mean 4,687608E-16; StDev 1,061; N 9; AD 0,291; P-Value 0,525]

We can observe that all the standardized residuals lie in the interval (-3;+3), so no outliers can be
pointed out. The hypothesis of normality cannot be rejected. The hypothesis of homogeneous
variance is accepted by looking at the scatterplots. In conclusion, only the factor Calories affects the
weight.

b) If we wish to reduce the weight gain, what would you recommend?

In order to reduce the weight gain, a multiple comparison is necessary. The only significant factor
is Calories. Its Tukey's constant is
$$T_{0.05} = \frac{1}{\sqrt{2}}\, q_{0.05}(a, df_E) = \frac{1}{\sqrt{2}}\, q_{0.05}(3, 4) = \frac{5}{\sqrt{2}} = 3.536$$
and the critical value is:
$$T_{\alpha} \sqrt{\frac{2\, MS_E}{b}} = 3.536 \sqrt{\frac{2 \cdot 23.23}{3}} = 13.92$$
Grouping Information Using the Tukey Method and 95% Confidence
Calories N Mean Grouping
3 3 95,4 A
2 3 86,6 A B
1 3 77,7 B
Means that do not share a letter are significantly different.
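
The grouping above follows from comparing the pairwise differences of the Calories means with the critical value. A minimal sketch, assuming the quantile q = 5 used in the text and that the Minitab levels 1, 2, 3 correspond to Low, Medium and High:

```python
import math
from itertools import combinations

q = 5.0            # q_0.05(3, 4) as used in the text (read from a table)
ms_error = 23.23   # error mean square from the ANOVA table
n_per_mean = 3     # each Calories mean averages three cell means

critical_value = q / math.sqrt(2) * math.sqrt(2 * ms_error / n_per_mean)

calorie_means = {"Low": 77.7, "Medium": 86.6, "High": 95.4}
different = [
    (first, second)
    for (first, m_first), (second, m_second) in combinations(calorie_means.items(), 2)
    if abs(m_first - m_second) > critical_value
]

print(f"critical value = {critical_value:.2f}")
print("significantly different pairs:", different)
```

Only the Low and High levels differ by more than the critical value, which matches the letters A, A B and B in the grouping table.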

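In order to reduce the weight gain, as expected, it is better to eat foods with a medium or low
amount of calories: these two levels are not significantly different from each other, although only the
low level differs significantly from the high one.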

Exercise 11 [February 5th 2014]

Derive, showing all the steps, the expression of the expected mean square of the factor B, $E(MS_B)$,
for a two-factor analysis with one observation per cell.

Solution

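$$E(MS_B) = \frac{E(SS_B)}{b - 1}$$
The model with 2 factors and one observation per cell is: $y_{ij} = \mu + \tau_i + \beta_j + \varepsilon_{ij}$

$$E(SS_B) = E\left[\frac{1}{an}\sum_{j=1}^{b} y_{\bullet j \bullet}^2 - \frac{1}{abn}\, y_{\bullet\bullet\bullet}^2\right] = E\left[\frac{1}{a}\sum_{j=1}^{b} y_{\bullet j}^2\right] - E\left[\frac{1}{ab}\, y_{\bullet\bullet}^2\right] \quad \text{for } n = 1$$

$$E(SS_B) = E\left[\frac{1}{a}\sum_{j=1}^{b}\left(\sum_{i=1}^{a}\left(\mu + \tau_i + \beta_j + \varepsilon_{ij}\right)\right)^2\right] - E\left[\frac{1}{ab}\left(\sum_{i=1}^{a}\sum_{j=1}^{b}\left(\mu + \tau_i + \beta_j + \varepsilon_{ij}\right)\right)^2\right] =$$

$$= E\left[\frac{1}{a}\sum_{j=1}^{b}\left(a\mu + \tau_{\bullet} + a\beta_j + \varepsilon_{\bullet j}\right)^2\right] - E\left[\frac{1}{ab}\left(ab\mu + b\tau_{\bullet} + a\beta_{\bullet} + \varepsilon_{\bullet\bullet}\right)^2\right] =$$

Using the constraints $\tau_{\bullet} = 0$ and $\beta_{\bullet} = 0$:

$$= E\left[\frac{1}{a}\sum_{j=1}^{b}\left(a^2\mu^2 + a^2\beta_j^2 + \varepsilon_{\bullet j}^2 + 2a^2\mu\beta_j + 2a\mu\varepsilon_{\bullet j} + 2a\beta_j\varepsilon_{\bullet j}\right)\right] - E\left[\frac{1}{ab}\left(a^2b^2\mu^2 + \varepsilon_{\bullet\bullet}^2 + 2ab\mu\varepsilon_{\bullet\bullet}\right)\right] =$$

The term in $\mu\beta_j$ sums to zero because $\beta_{\bullet} = 0$, and the term in $\beta_j\varepsilon_{\bullet j}$ has zero expectation, so:

$$= ab\mu^2 + a\sum_{j=1}^{b}\beta_j^2 + \frac{1}{a}\sum_{j=1}^{b}E\left(\varepsilon_{\bullet j}^2\right) + 2\mu E\left(\varepsilon_{\bullet\bullet}\right) - ab\mu^2 - \frac{1}{ab}E\left(\varepsilon_{\bullet\bullet}^2\right) - 2\mu E\left(\varepsilon_{\bullet\bullet}\right) =$$

Since the errors are independent with variance $\sigma^2$, $E(\varepsilon_{\bullet j}^2) = a\sigma^2$ and $E(\varepsilon_{\bullet\bullet}^2) = ab\sigma^2$:

$$= a\sum_{j=1}^{b}\beta_j^2 + \frac{1}{a}\sum_{j=1}^{b}E\left(\varepsilon_{\bullet j}^2\right) - \frac{1}{ab}E\left(\varepsilon_{\bullet\bullet}^2\right) = a\sum_{j=1}^{b}\beta_j^2 + \frac{ab}{a}\sigma^2 - \frac{ab}{ab}\sigma^2 = a\sum_{j=1}^{b}\beta_j^2 + (b - 1)\sigma^2$$

$$E(MS_B) = \frac{E(SS_B)}{b - 1} = \sigma^2 + \frac{a}{b - 1}\sum_{j=1}^{b}\beta_j^2$$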
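
The result can be checked numerically with a small Monte Carlo simulation. The sketch below is only an illustration: the values of a, b, σ and the effects β_j are hypothetical, the errors are assumed normal, and τ_i is set to zero because it cancels out of SS_B.

```python
import random

def ms_b(a, b, beta, sigma, rng, mu=10.0):
    """MS_B for one simulated a x b table with one observation per cell."""
    # tau_i is omitted: a row effect shifts every column total equally.
    y = [[mu + beta[j] + rng.gauss(0.0, sigma) for j in range(b)] for _ in range(a)]
    col_tot = [sum(y[i][j] for i in range(a)) for j in range(b)]
    grand_tot = sum(col_tot)
    ss_b = sum(t * t for t in col_tot) / a - grand_tot ** 2 / (a * b)
    return ss_b / (b - 1)

a, b, sigma = 4, 3, 1.0
beta = [-1.0, 0.0, 1.0]      # hypothetical effects of B, summing to zero
rng = random.Random(12345)   # fixed seed for reproducibility
reps = 20000

simulated = sum(ms_b(a, b, beta, sigma, rng) for _ in range(reps)) / reps
expected = sigma ** 2 + a / (b - 1) * sum(bj ** 2 for bj in beta)

print(f"simulated E(MS_B) = {simulated:.3f}, formula = {expected:.3f}")
```

With these values the formula gives 1 + (4/2)·2 = 5, and the simulated average should fall close to it, up to Monte Carlo error.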
