Professional Documents
Culture Documents
Handout 4: The Statistical Model
Handout 4: The Statistical Model
Example 4.1 Consider the following etch rate data. The goal is to make comparisons across the
power settings.
1
The Statistical Model
In the last handout, we carried out the ANOVA and intuitively obtained measures of error by
computing the sums of squares “by hand.” These calculations are made much simpler using the
framework for a general linear model. The general linear model approach does require the use
of matrices and some linear algebra operations.
In the previous handout, the goal was to compare different levels of a single factor. One way to
write the statistical model for our example is as follows:
y ij=μ+ τ i + ε ij ,
where
y ij :
μ :
τi :
ε ij :
2
The Model in Matrix Notation
The easiest way to work with such a model is through the use of matrices. The model for our
simple example can be expressed in matrix form as follows.
y 11 τ 1 ε 11
y 12 μ τ 1 ε 12
[ [] [ ]
y 13 μ τ 1 ε 13
y 14 μ τ 1 ε 14
y 15 μ τ ε 15
1
μ
y 21 τ 2 ε 21
μ
y 22 μ τ 2 ε 22
y 23 μ τ 2 ε 23
y 24 μ τ 2 ε 24
y 25 τ ε
= μ + 2 + 25
y 31 μ τ 3 ε 31
y 32 μ τ 3 ε 32
y 33 μ τ ε 33
3
μ
y 34 τ 3 ε 34
μ
y 35 μ τ 3 ε 35
y 41 μ τ 4 ε 41
y 42 μ τ 4 ε 42
y 43 μ τ 4 ε 43
y 44 μ τ 4 ε 44
y 45 τ 4 ε 45
which is equivalent to
3
Here, y represents the response vector and X represents the design matrix. The estimated
model parameters can be obtained as follows -- this approach to estimating the model
parameter is very general and in fact works for any linear model.
−1
Model Estimates= ( X ' X ) X ' y
−1
Compute ( X ' X )
Problem: The columns of X ' X are not linearly independent; thus, the inverse of X ' X cannot
be computed.
Solution: We need to re-parameterize the model so that the model parameters can be
estimated. Consider the following re-parameterization. This re-parameterization utilizes the
“Sum to Zero” restriction τ 1 + τ 2+ τ 3 +τ 4=0 .
4
τ 4=−(τ 1 + τ 2+ τ 3)
5
6
Getting the Estimates via Linear Algebra
7
Getting the Predicted Outcomes using our General Linear Model
The predicted response vector from our simply contains the mean for each treatment level.
Compute the predicted response vector for our data.
μ
^
^y ¿ X ( X X ) X Y X μ^ ¿=¿ X τ^ 1 ¿
¿ ¿ ¿
' −1 '
τ^ 2
τ^ 3 []
For our design matrix and estimated vector of model coefficients we have the following
predicted outcomes.
^y 11
^y 12 1 1 0 0 551.2
[ ][ ] [ ]
^y 13 1 1 0 0 551.2
^y 14 1 1 0 0 551.2
^y 15 1 1 0 0 551.2
^y 21 1 1 0 0 551.2
1 0 1 0 587.4
^y 22 1 0 1 0 587.4
^y 23 1 0 1 0 587.4
^y 24 1 0 1 0 617.75 587.4
^y
^y = 25 = 1 0
^y 31 1 0
^y 32 1 0
^y 33 1 0
1
0
0
0
1
1
7.65
[ ]
0 −66.55 = 587.4
1 −30.35 625.4
625.4
625.4
1 0 0 1 625.4
^y 34
1 0 0 1 625.4
^y 35
1 −1 −1 −1 707.0
^y 41 1 −1 −1 −1 707.0
^y 42 1 −1 −1 −1 707.0
^y 43 1 −1 −1 −1 707.0
^y 44 1 −1 −1 −1 707.0
^y 45
8
Getting the Residuals, i.e. Errors, using our General Linear Model
The amount of error present in our model is simply the difference between the original
response vector and the predicted response vector. The residual vector is computed as follows.
r^ =( y−^y )
[ ][ ][ ]
542 551.2 −9.2
539 551.2 −12.2
570 551.2 18.8
610 587.4 22.6
593 587.4 5.6
565 587.4 −22.4
590 587.4 2.6
r^ = 579 − 587.4 = −8.4
610 625.4 −15.4
651 625.4 25.6
600 625.4 −25.4
629 625.4 3.6
637 625.4 11.6
715 707.0 8.0
725 707.0 18.0
710 707.0 3.0
700 707.0 −7.0
685 707.0 −22.0
SSE=r^ ' r^
2 SSE r^' r^
σ^ =MSE= =
df error df error
9
2 r^ ' r^ 5339
σ^ = = =334
df error 16
10
The Standard Error of Model Estimates
The estimate of σ^ 2 is used in computing the standard errors of the estimated model parameters.
The standard error estimates of our model parameters are necessary for hypothesis testing and
for computing confidence intervals (which we used in the previous handout).
−1
Var ( μ^ )=σ^ 2 ( X ' X )
Computing the variance of the estimated model parameters for our example.
μ
^
¿
([ ])
τ^
Var ( μ^ )=Var 1
τ^ 2
τ^ 3
¿
2 '
¿ σ^ ( X X )
¿
−1
334
0.05
[ 0
0
0
0
0.15
0
−0.05
0
−0.05
−0.05 0.15 −0.05
−0.05 −0.05 0.15
¿
]
The standard errors for the estimated model parameters obtained from Minitab.
The diagonal elements of this matrix are the variances for each of the estimated model
parameters. The off-diagonal elements are the covariances. A nonzero covariance implies that
variation in of the estimated model parameters influences the variation in another. The design
matrix results in, i.e. causes, a covariance structure amongst the τ^ ' s.
This covariance structure implies that the design choice will impact our ability to estimate one
treatment level effect independent of another. Designs that permit one treatment level effect
to be estimated independent of another are often considered optimal designs.
Next, we will consider the standard error necessary to compare across treatment levels.
11
The following output from Minitab contains the standard error Difference quantities for the
treatment level comparisons.
Next, we will confirm the calculations for one of these standard errors, say the comparison for
Power Setting 180 against Power Setting 160.
The above Difference in Means is simply a linear combination, say c , of the estimated
parameter vector, ^μ. In particular,
^μ
τ^ 1
[]
c∗^μ =[ 0 −1 1 0 ]∗ =( τ^ 2−^τ 1) =36.20
τ^ 2
^τ 3
12
The associated SE of Difference can be computed using the fact that
with c= [ 0 −1 1 0 ] yields
0
Var ( c∗^μ ) =Var ( [ 0 −1 1 0 ]∗^μ ) ¿ [ 0 −1 1 0 ]∗Var ( ^μ )∗
¿ ¿ ¿
−1
0
[]
1 133.60 ¿
13
Example 4.2 A manufacturer of television sets is interested in the effect on tube connectivity of
four different types of coating for color picture tubes. A completely randomized experiment is
conducted and the following conductivity data are obtained.
1. Write out the general linear model (using matrix notation) for this data. In particular,
clearly specify the response vector y, the design matrix X, the vector of model
parameters , and the error vector . You should use the set-to-zero parameterization
for the model as in done in Minitab.
14
Consider the following Minitab output from an appropriate analysis.
2. Verify ❶. That is, use your design matrix X and the response vector y to compute the
estimated model parameters.
−1
^μ= ( X' X ) X' y
3. Use your estimated model parameters to verify at least one of the Means given in ❷.
15
4. Compute the estimated residual vector, r^ , or your model. (See page 7).
5. Use the estimated residual vector to verify ❸. In particular, compute the Adj MS value.
SSE
σ^ =√ MSE=
df error√
7. Use your design matrix X and
−1
Var ( μ^ )=σ^ 2 ( X ' X )
8. Consider the predicted or fitted value for Group 2 of 145.25, i.e. ❻, and it’s associated
standard error value of 2.22, i.e. ❼.
What is the form of the c vector for Group 2? Use the c vector to verify the calculations
for the fitted value and standard error provided above.
Fit=c∗^μ
√
SE Fit= √Var ( c∗μ^ )= c [ σ^ 2 ( X ' X )
−1
]c '
16
Consider the following output from the pairwise comparison portion of the Minitab output.
9. Verify the calculations for ❽. What is the appropriate c vector for this calculation?
Difference of Means=c∗^μ
10. Verify the calculations for ❾ using the same c vector from above.
√ −1
SE of Difference=√ Var ( c∗^μ )= c [ σ^ 2 ( X ' X ) ]c
'
17