Chapter 9

9. Analysis of Variance Models
In analysis of variance (ANOVA), several treatments or treatment combinations are applied to randomly selected experimental units, and the treatment means for some response $y$ are compared. We use linear models to facilitate the comparison of these means. The model is often expressed with more parameters than can be estimated, which results in an $\mathbf{X}$ matrix that is not of full rank.

9.1 Non‐Full‐Rank Models.

(a) One Way Model
Suppose a researcher has developed two chemical additives for increasing the mileage of gasoline. To formulate the model, we might start with the notion that without additives, a gallon yields an average of $\mu$ miles. Then if chemical 1 is added, the mileage is expected to increase by $\tau_1$ miles per gallon, and if chemical 2 is added, the mileage would increase by $\tau_2$ miles per gallon. The model could be expressed as
$$y_1 = \mu + \tau_1 + \varepsilon_1, \qquad y_2 = \mu + \tau_2 + \varepsilon_2,$$
where $y_1$ is the miles per gallon from a tank of gasoline containing chemical 1 and $\varepsilon_1$ is a random error term. The variables $y_2$ and $\varepsilon_2$ are defined similarly. The researcher would like to estimate the parameters $\mu$, $\tau_1$, and $\tau_2$ and test hypotheses such as $H_0: \tau_1 = \tau_2$.
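Suppose the experiment consists of filling the tanks of six identical cars with gas, then adding chemical 1 to three tanks and chemical 2 to the other three tanks. A model for each of the six observations is
$$\begin{aligned}
y_{11} &= \mu + \tau_1 + \varepsilon_{11}, & y_{12} &= \mu + \tau_1 + \varepsilon_{12}, & y_{13} &= \mu + \tau_1 + \varepsilon_{13},\\
y_{21} &= \mu + \tau_2 + \varepsilon_{21}, & y_{22} &= \mu + \tau_2 + \varepsilon_{22}, & y_{23} &= \mu + \tau_2 + \varepsilon_{23},
\end{aligned} \tag{9.1}$$
or
$$y_{ij} = \mu + \tau_i + \varepsilon_{ij}, \quad i = 1, 2, \; j = 1, 2, 3, \tag{9.2}$$
where $y_{ij}$ is the observed miles per gallon of the $j$th car that contains the $i$th chemical in its tank and $\varepsilon_{ij}$ is the associated random error. The six equations in (9.1) can be written in matrix form as
$$\begin{pmatrix} y_{11}\\ y_{12}\\ y_{13}\\ y_{21}\\ y_{22}\\ y_{23} \end{pmatrix}
= \begin{pmatrix} 1 & 1 & 0\\ 1 & 1 & 0\\ 1 & 1 & 0\\ 1 & 0 & 1\\ 1 & 0 & 1\\ 1 & 0 & 1 \end{pmatrix}
\begin{pmatrix} \mu\\ \tau_1\\ \tau_2 \end{pmatrix}
+ \begin{pmatrix} \varepsilon_{11}\\ \varepsilon_{12}\\ \varepsilon_{13}\\ \varepsilon_{21}\\ \varepsilon_{22}\\ \varepsilon_{23} \end{pmatrix} \tag{9.3}$$
or
$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}.$$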

In (9.3), $\mathbf{X}$ is a $6 \times 3$ matrix whose rank is 2, since the first column is the sum of the second and third columns, which are linearly independent. Because $\mathbf{X}$ is not of full rank, the parameters $\mu$, $\tau_1$, and $\tau_2$ cannot be estimated by $\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}$, because $(\mathbf{X}'\mathbf{X})^{-1}$ does not exist.

With three parameters and $\operatorname{rank}(\mathbf{X}) = 2$, the model is said to be overparameterized. Increasing the number of observations (replications) for each of the two additives will not change the rank of $\mathbf{X}$.
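The rank deficiency can be checked numerically. The following is a minimal sketch using NumPy; the variable names are illustrative and the replication step simply stacks a second copy of the six rows.

```python
import numpy as np

# Design matrix X from (9.3): columns are (intercept, chemical 1, chemical 2).
X = np.array([
    [1, 1, 0],
    [1, 1, 0],
    [1, 1, 0],
    [1, 0, 1],
    [1, 0, 1],
    [1, 0, 1],
], dtype=float)

rank_X = np.linalg.matrix_rank(X)     # rank of X
XtX = X.T @ X                         # 3 x 3 matrix X'X
det_XtX = np.linalg.det(XtX)          # numerically zero, so X'X has no inverse

# Replication: doubling the observations per additive leaves the rank unchanged.
X_rep = np.vstack([X, X])
rank_rep = np.linalg.matrix_rank(X_rep)

print(rank_X, rank_rep, abs(det_XtX) < 1e-8)
```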

Three approaches can remedy this problem: (1) redefine the model using two new parameters that are unique, (2) use the overparameterized model but place constraints on the parameters so that they become unique, and (3) in the overparameterized model, work with linear combinations of the parameters that are unique and can be estimated. These three techniques are illustrated below.

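1. To reduce the number of parameters, consider for example $\mu = 15$, $\tau_1 = 1$, $\tau_2 = 3$, so that the model becomes
$$\begin{aligned}
y_{1j} &= 15 + 1 + \varepsilon_{1j} = 16 + \varepsilon_{1j}, \quad j = 1, 2, 3,\\
y_{2j} &= 15 + 3 + \varepsilon_{2j} = 18 + \varepsilon_{2j}, \quad j = 1, 2, 3.
\end{aligned} \tag{9.4}$$
The values 16 and 18 are the means after the two treatments have been applied. In general, the means could be labeled $\mu_1$ and $\mu_2$ and the model could be written as
$$y_{1j} = \mu_1 + \varepsilon_{1j} \quad \text{and} \quad y_{2j} = \mu_2 + \varepsilon_{2j}.$$
The means $\mu_1$ and $\mu_2$ are unique and can be estimated. The redefined model for all six observations in (9.1) or (9.2) takes the form
$$\begin{pmatrix} y_{11}\\ y_{12}\\ y_{13}\\ y_{21}\\ y_{22}\\ y_{23} \end{pmatrix}
= \begin{pmatrix} 1 & 0\\ 1 & 0\\ 1 & 0\\ 0 & 1\\ 0 & 1\\ 0 & 1 \end{pmatrix}
\begin{pmatrix} \mu_1\\ \mu_2 \end{pmatrix}
+ \begin{pmatrix} \varepsilon_{11}\\ \varepsilon_{12}\\ \varepsilon_{13}\\ \varepsilon_{21}\\ \varepsilon_{22}\\ \varepsilon_{23} \end{pmatrix},$$
which we write as $\mathbf{y} = \mathbf{W}\boldsymbol{\mu} + \boldsymbol{\varepsilon}$.
The matrix $\mathbf{W}$ is of full rank, and we can estimate $\boldsymbol{\mu}$ as
$$\hat{\boldsymbol{\mu}} = \begin{pmatrix} \hat{\mu}_1\\ \hat{\mu}_2 \end{pmatrix} = (\mathbf{W}'\mathbf{W})^{-1}\mathbf{W}'\mathbf{y}.$$

This solution is called reparameterization.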
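A minimal numerical sketch of this reparameterized fit follows. The six mileage values are hypothetical, chosen only so that the group means come out near 16 and 18; they are not data from the text.

```python
import numpy as np

# W from the reparameterized model: one indicator column per additive.
W = np.kron(np.eye(2), np.ones((3, 1)))   # 6 x 2, full column rank

# Hypothetical mileage observations (y11, y12, y13, y21, y22, y23).
y = np.array([16.2, 15.8, 16.0, 18.1, 17.9, 18.0])

# W'W is nonsingular, so the normal equations have a unique solution.
mu_hat = np.linalg.solve(W.T @ W, W.T @ y)

print(mu_hat)   # the two group means
```

Because $\mathbf{W}'\mathbf{W}$ is diagonal here, each estimate is simply the sample mean of the corresponding group.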

2. Alternatively, we can make the parameters unique by introducing constraints on $\mu$, $\tau_1$, and $\tau_2$; the constrained parameters are denoted $\mu^*$, $\tau_1^*$, and $\tau_2^*$. In (9.1) and (9.2), the constraint $\tau_1^* + \tau_2^* = 0$ has the effect of defining $\mu^*$ to be the new mean after the treatments are applied and $\tau_1^*$ and $\tau_2^*$ to be deviations from this mean. With this constraint, (9.4) can be written as
$$\begin{aligned}
y_{1j} &= 17 - 1 + \varepsilon_{1j} = 16 + \varepsilon_{1j}, \quad j = 1, 2, 3,\\
y_{2j} &= 17 + 1 + \varepsilon_{2j} = 18 + \varepsilon_{2j}, \quad j = 1, 2, 3.
\end{aligned}$$
This model is now unique because there is no other way to express it so that $\tau_1^* + \tau_2^* = 0$. Such constraints are often called side conditions.
Thus the model $y_{ij} = \mu^* + \tau_i^* + \varepsilon_{ij}$ subject to $\tau_1^* + \tau_2^* = 0$ can be expressed in a full-rank format by substituting $\tau_2^* = -\tau_1^*$ to obtain $y_{1j} = \mu^* + \tau_1^* + \varepsilon_{1j}$ and $y_{2j} = \mu^* - \tau_1^* + \varepsilon_{2j}$. The matrix form for the six observations is then
$$\begin{pmatrix} y_{11}\\ y_{12}\\ y_{13}\\ y_{21}\\ y_{22}\\ y_{23} \end{pmatrix}
= \begin{pmatrix} 1 & 1\\ 1 & 1\\ 1 & 1\\ 1 & -1\\ 1 & -1\\ 1 & -1 \end{pmatrix}
\begin{pmatrix} \mu^*\\ \tau_1^* \end{pmatrix}
+ \begin{pmatrix} \varepsilon_{11}\\ \varepsilon_{12}\\ \varepsilon_{13}\\ \varepsilon_{21}\\ \varepsilon_{22}\\ \varepsilon_{23} \end{pmatrix}$$
or
$$\mathbf{y} = \mathbf{X}^*\boldsymbol{\beta}^* + \boldsymbol{\varepsilon}.$$
The matrix $\mathbf{X}^*$ is of full rank, and the parameters $\mu^*$ and $\tau_1^*$ can be estimated.

3. In (9.4), there exist some linear combinations of the parameters that are unique. For example, $\tau_1 - \tau_2 = -2$, $\mu + \tau_1 = 16$, and $\mu + \tau_2 = 18$ remain the same for all values of $\mu$, $\tau_1$, and $\tau_2$ that give the means 16 and 18. Such unique linear combinations can be estimated.
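The uniqueness of these combinations can be illustrated with a short sketch. The three parameter vectors below are illustrative choices that all reproduce the means 16 and 18; the first two are the values used in items 1 and 2, and the third is an additional assumed example.

```python
import numpy as np

# Rows give the coefficients of (mu, tau1, tau2) in
# mu + tau1, mu + tau2, and tau1 - tau2.
L = np.array([
    [1, 1, 0],
    [1, 0, 1],
    [0, 1, -1],
], dtype=float)

# Different parameter vectors that describe the same two treatment means.
candidates = [
    np.array([15.0, 1.0, 3.0]),    # values used in (9.4)
    np.array([17.0, -1.0, 1.0]),   # values under the side condition
    np.array([0.0, 16.0, 18.0]),   # another assumed choice
]

values = [L @ beta for beta in candidates]
for v in values:
    print(v)   # each line is the same vector
```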

(b) Two‐Way Model
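Suppose we want to measure the effect of two different vitamins and two different methods of administering the vitamins on the weight gain of chicks. This leads to a two-way model. Let $\alpha_1$ and $\alpha_2$ be the effects of the two vitamins, and let $\beta_1$ and $\beta_2$ be the effects of the two methods of administration. If we assume that these effects are additive (no interaction), the model becomes
$$\begin{aligned}
y_{11} &= \mu + \alpha_1 + \beta_1 + \varepsilon_{11}, & y_{12} &= \mu + \alpha_1 + \beta_2 + \varepsilon_{12},\\
y_{21} &= \mu + \alpha_2 + \beta_1 + \varepsilon_{21}, & y_{22} &= \mu + \alpha_2 + \beta_2 + \varepsilon_{22},
\end{aligned}$$
or
$$y_{ij} = \mu + \alpha_i + \beta_j + \varepsilon_{ij}, \quad i = 1, 2, \; j = 1, 2, \tag{9.5}$$
where $y_{ij}$ is the weight gain of the $(ij)$th chick and $\varepsilon_{ij}$ is the associated random error. In matrix form, (9.5) becomes
$$\begin{pmatrix} y_{11}\\ y_{12}\\ y_{21}\\ y_{22} \end{pmatrix}
= \begin{pmatrix} 1 & 1 & 0 & 1 & 0\\ 1 & 1 & 0 & 0 & 1\\ 1 & 0 & 1 & 1 & 0\\ 1 & 0 & 1 & 0 & 1 \end{pmatrix}
\begin{pmatrix} \mu\\ \alpha_1\\ \alpha_2\\ \beta_1\\ \beta_2 \end{pmatrix}
+ \begin{pmatrix} \varepsilon_{11}\\ \varepsilon_{12}\\ \varepsilon_{21}\\ \varepsilon_{22} \end{pmatrix} \tag{9.6}$$
or $\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$.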
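Since $\operatorname{rank}(\mathbf{X}) = 3$, only three unique parameters are possible unless side conditions are imposed on the five parameters. There are many ways to reparameterize in order to reduce the model to three parameters. For example, consider the parameters $\gamma_1$, $\gamma_2$, and $\gamma_3$ defined as
$$\gamma_1 = \mu + \alpha_1 + \beta_1, \qquad \gamma_2 = \alpha_2 - \alpha_1, \qquad \gamma_3 = \beta_2 - \beta_1.$$
The model can be written in terms of the $\gamma$'s as
$$\begin{aligned}
y_{11} &= (\mu + \alpha_1 + \beta_1) + \varepsilon_{11} = \gamma_1 + \varepsilon_{11},\\
y_{12} &= (\mu + \alpha_1 + \beta_1) + (\beta_2 - \beta_1) + \varepsilon_{12} = \gamma_1 + \gamma_3 + \varepsilon_{12},\\
y_{21} &= (\mu + \alpha_1 + \beta_1) + (\alpha_2 - \alpha_1) + \varepsilon_{21} = \gamma_1 + \gamma_2 + \varepsilon_{21},\\
y_{22} &= (\mu + \alpha_1 + \beta_1) + (\alpha_2 - \alpha_1) + (\beta_2 - \beta_1) + \varepsilon_{22} = \gamma_1 + \gamma_2 + \gamma_3 + \varepsilon_{22}.
\end{aligned}$$
In matrix form, this becomes
$$\begin{pmatrix} y_{11}\\ y_{12}\\ y_{21}\\ y_{22} \end{pmatrix}
= \begin{pmatrix} 1 & 0 & 0\\ 1 & 0 & 1\\ 1 & 1 & 0\\ 1 & 1 & 1 \end{pmatrix}
\begin{pmatrix} \gamma_1\\ \gamma_2\\ \gamma_3 \end{pmatrix}
+ \begin{pmatrix} \varepsilon_{11}\\ \varepsilon_{12}\\ \varepsilon_{21}\\ \varepsilon_{22} \end{pmatrix}$$
or
$$\mathbf{y} = \mathbf{Z}\boldsymbol{\gamma} + \boldsymbol{\varepsilon}. \tag{9.7}$$

The rank of $\mathbf{Z}$ is clearly 3, and we have a full-rank model for which $\boldsymbol{\gamma}$ can be estimated by $\hat{\boldsymbol{\gamma}} = (\mathbf{Z}'\mathbf{Z})^{-1}\mathbf{Z}'\mathbf{y}$. This provides estimates of $\gamma_2 = \alpha_2 - \alpha_1$ and $\gamma_3 = \beta_2 - \beta_1$, which are typically of interest to the researcher.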
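The relationship between the two forms of the two-way model can be checked numerically. In the sketch below, the matrix `U` (an illustrative name) holds the coefficients that define $\gamma_1$, $\gamma_2$, $\gamma_3$ in terms of $(\mu, \alpha_1, \alpha_2, \beta_1, \beta_2)$, and the weight gains in `y` are hypothetical.

```python
import numpy as np

# X from (9.6); columns are (mu, alpha1, alpha2, beta1, beta2).
X = np.array([
    [1, 1, 0, 1, 0],
    [1, 1, 0, 0, 1],
    [1, 0, 1, 1, 0],
    [1, 0, 1, 0, 1],
], dtype=float)

# gamma = U beta: gamma1 = mu + alpha1 + beta1,
# gamma2 = alpha2 - alpha1, gamma3 = beta2 - beta1.
U = np.array([
    [1, 1, 0, 1, 0],
    [0, -1, 1, 0, 0],
    [0, 0, 0, -1, 1],
], dtype=float)

# Z from (9.7).
Z = np.array([
    [1, 0, 0],
    [1, 0, 1],
    [1, 1, 0],
    [1, 1, 1],
], dtype=float)

same_mean = np.array_equal(Z @ U, X)   # Z gamma = Z U beta = X beta
rank_Z = np.linalg.matrix_rank(Z)

# Hypothetical weight gains (y11, y12, y21, y22).
y = np.array([10.0, 12.5, 13.0, 15.0])
gamma_hat = np.linalg.solve(Z.T @ Z, Z.T @ y)

print(same_mean, rank_Z, gamma_hat)
```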

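Now consider side conditions on the parameters. Since $\operatorname{rank}(\mathbf{X}) = 3$ and there are five parameters, we need two (linearly independent) side conditions. If these two constraints are appropriately chosen, the five parameters become unique and thereby estimable. Denote the constrained parameters by $\mu^*$, $\alpha_i^*$, and $\beta_j^*$, and consider the side conditions $\alpha_1^* + \alpha_2^* = 0$ and $\beta_1^* + \beta_2^* = 0$. These lead to unique definitions of $\alpha_i^*$ and $\beta_j^*$ as deviations from means. To show this, start by writing the model as
$$\begin{aligned}
y_{11} &= \mu_{11} + \varepsilon_{11}, & y_{12} &= \mu_{12} + \varepsilon_{12},\\
y_{21} &= \mu_{21} + \varepsilon_{21}, & y_{22} &= \mu_{22} + \varepsilon_{22},
\end{aligned} \tag{9.8}$$
where $\mu_{ij} = E(y_{ij})$ is the mean weight gain with vitamin $i$ and method $j$. The means are displayed in Table 9.1, and the parameters $\alpha_1^*$, $\alpha_2^*$, $\beta_1^*$, and $\beta_2^*$ are defined as row ($\alpha$) and column ($\beta$) effects.

The first row effect, $\alpha_1^* = \bar{\mu}_{1.} - \bar{\mu}_{..}$, is the deviation of the mean for vitamin 1 from the overall mean (after treatments) and is unique. The parameters $\alpha_2^*$, $\beta_1^*$, and $\beta_2^*$ are likewise uniquely defined. From the definitions in Table 9.1, we obtain
$$\alpha_1^* + \alpha_2^* = \bar{\mu}_{1.} - \bar{\mu}_{..} + \bar{\mu}_{2.} - \bar{\mu}_{..} = \bar{\mu}_{1.} + \bar{\mu}_{2.} - 2\bar{\mu}_{..} = 2\bar{\mu}_{..} - 2\bar{\mu}_{..} = 0, \tag{9.9}$$
and similarly, $\beta_1^* + \beta_2^* = 0$. Thus with the side conditions $\alpha_1^* + \alpha_2^* = 0$ and $\beta_1^* + \beta_2^* = 0$, the redefined parameters are both unique and meaningful. The mean in (9.5) can be written in terms of $\mu^* = \bar{\mu}_{..}$, $\alpha_i^* = \bar{\mu}_{i.} - \bar{\mu}_{..}$, and $\beta_j^* = \bar{\mu}_{.j} - \bar{\mu}_{..}$ as
$$\begin{aligned}
\mu_{ij} &= \bar{\mu}_{..} + (\bar{\mu}_{i.} - \bar{\mu}_{..}) + (\bar{\mu}_{.j} - \bar{\mu}_{..}) + (\mu_{ij} - \bar{\mu}_{i.} - \bar{\mu}_{.j} + \bar{\mu}_{..})\\
&= \mu^* + \alpha_i^* + \beta_j^* + (\mu_{ij} - \bar{\mu}_{i.} - \bar{\mu}_{.j} + \bar{\mu}_{..}).
\end{aligned}$$
The term $\mu_{ij} - \bar{\mu}_{i.} - \bar{\mu}_{.j} + \bar{\mu}_{..}$, which is required to balance the equation, is associated with the interaction of vitamins and methods. In order for $\alpha_i^*$ and $\beta_j^*$ to be additive effects, the interaction $\mu_{ij} - \bar{\mu}_{i.} - \bar{\mu}_{.j} + \bar{\mu}_{..}$ must be zero.

Table 9.1 Means and Effects for the Model in (9.8)

                  Column 1                                        Column 2                                        Row means           Row effects
Row 1             $\mu_{11}$                                      $\mu_{12}$                                      $\bar{\mu}_{1.}$    $\alpha_1^* = \bar{\mu}_{1.} - \bar{\mu}_{..}$
Row 2             $\mu_{21}$                                      $\mu_{22}$                                      $\bar{\mu}_{2.}$    $\alpha_2^* = \bar{\mu}_{2.} - \bar{\mu}_{..}$
Column means      $\bar{\mu}_{.1}$                                $\bar{\mu}_{.2}$                                $\bar{\mu}_{..}$
Column effects    $\beta_1^* = \bar{\mu}_{.1} - \bar{\mu}_{..}$   $\beta_2^* = \bar{\mu}_{.2} - \bar{\mu}_{..}$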
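The definitions of row and column effects as deviations from means can be sketched numerically. The $2 \times 2$ table of cell means below is hypothetical and was chosen to be additive, so the interaction term comes out as zero; rows are taken to index vitamins and columns to index methods.

```python
import numpy as np

# Hypothetical cell means mu_ij (rows: vitamin i, columns: method j).
M = np.array([
    [10.0, 12.0],
    [14.0, 16.0],
])

grand = M.mean()                 # overall mean
row_means = M.mean(axis=1)       # mean for each vitamin
col_means = M.mean(axis=0)       # mean for each method

alpha_star = row_means - grand   # row effects
beta_star = col_means - grand    # column effects

# Interaction term: mu_ij - row mean - column mean + overall mean.
interaction = M - row_means[:, None] - col_means[None, :] + grand

print(alpha_star, beta_star)
print(interaction)
```

Each set of effects sums to zero by construction, which is exactly the pair of side conditions used above.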

9.2 Estimation
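We now consider estimation of $\boldsymbol{\beta}$ and of linear functions of $\boldsymbol{\beta}$ in the non-full-rank model $\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$, without reparameterizing or imposing side conditions and without assuming normality of $\mathbf{y}$.

Estimability of $\boldsymbol{\beta}$.
Consider the model $\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$ with $E(\mathbf{y}) = \mathbf{X}\boldsymbol{\beta}$ and $\operatorname{cov}(\mathbf{y}) = \sigma^2\mathbf{I}$, where $\mathbf{X}$ is $n \times p$ of rank $k < p \le n$, so that $\mathbf{X}$ is not of full rank.
Using the least-squares approach, we obtain the normal equations $\mathbf{X}'\mathbf{X}\hat{\boldsymbol{\beta}} = \mathbf{X}'\mathbf{y}$. Here $\mathbf{X}'\mathbf{X}$ has no inverse, and therefore the normal equations do not have a unique solution. However, they have an infinite number of solutions.

Theorem 9.2A. If $\mathbf{X}$ is $n \times p$ of rank $k < p \le n$, the system of equations $\mathbf{X}'\mathbf{X}\hat{\boldsymbol{\beta}} = \mathbf{X}'\mathbf{y}$ is consistent.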
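As a rough numerical illustration of consistency without uniqueness, the sketch below uses the one-way design of (9.3) with hypothetical responses. One solution is obtained with the Moore-Penrose inverse (one particular generalized inverse); a second is built by adding a multiple of a vector in the null space of $\mathbf{X}$.

```python
import numpy as np

# One-way design from (9.3): intercept plus two treatment indicators.
X = np.hstack([np.ones((6, 1)), np.kron(np.eye(2), np.ones((3, 1)))])
y = np.array([16.2, 15.8, 16.0, 18.1, 17.9, 18.0])   # hypothetical data

XtX = X.T @ X
Xty = X.T @ y

# One solution of the normal equations via the Moore-Penrose inverse.
b0 = np.linalg.pinv(XtX) @ Xty

# (1, -1, -1)' lies in the null space of X, so adding any multiple of it
# yields another solution of X'X b = X'y.
null_vec = np.array([1.0, -1.0, -1.0])
b1 = b0 + 2.5 * null_vec

print(np.allclose(XtX @ b0, Xty), np.allclose(XtX @ b1, Xty))
print(np.allclose(X @ b0, X @ b1))   # fitted values agree
```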

Example 9.2. Given in a separate handout.

Estimable Functions of β .
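Since $\boldsymbol{\beta}$ cannot be estimated, we ask whether we can estimate any linear combination of the $\beta$'s, say $\boldsymbol{\lambda}'\boldsymbol{\beta}$. A linear function of parameters $\boldsymbol{\lambda}'\boldsymbol{\beta}$ is said to be estimable if there exists a linear combination of the observations with an expected value equal to $\boldsymbol{\lambda}'\boldsymbol{\beta}$. That is, $\boldsymbol{\lambda}'\boldsymbol{\beta}$ is estimable if there exists a vector $\mathbf{a}$ such that $E(\mathbf{a}'\mathbf{y}) = \boldsymbol{\lambda}'\boldsymbol{\beta}$.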

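Theorem 9.2B. In a model $\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$, where $E(\mathbf{y}) = \mathbf{X}\boldsymbol{\beta}$ and $\mathbf{X}$ is $n \times p$ of rank $k < p \le n$, the linear function $\boldsymbol{\lambda}'\boldsymbol{\beta}$ is estimable if and only if any one of the following conditions holds:
(i) $\boldsymbol{\lambda}'$ is a linear combination of the rows of $\mathbf{X}$, that is, there exists a vector $\mathbf{a}$ such that
$$\mathbf{a}'\mathbf{X} = \boldsymbol{\lambda}'. \tag{9.10}$$
Proof: If there exists a vector $\mathbf{a}$ such that $\mathbf{a}'\mathbf{X} = \boldsymbol{\lambda}'$, then using this vector $\mathbf{a}$, we have
$$E(\mathbf{a}'\mathbf{y}) = \mathbf{a}'E(\mathbf{y}) = \mathbf{a}'\mathbf{X}\boldsymbol{\beta} = \boldsymbol{\lambda}'\boldsymbol{\beta}.$$

(ii) $\boldsymbol{\lambda}'$ is a linear combination of the rows of $\mathbf{X}'\mathbf{X}$, or $\boldsymbol{\lambda}$ is a linear combination of the columns of $\mathbf{X}'\mathbf{X}$, that is, there exists a vector $\mathbf{r}$ such that
$$\mathbf{r}'\mathbf{X}'\mathbf{X} = \boldsymbol{\lambda}' \quad \text{or} \quad \mathbf{X}'\mathbf{X}\mathbf{r} = \boldsymbol{\lambda}. \tag{9.11}$$
Proof: If there exists a solution $\mathbf{r}$ for $\mathbf{X}'\mathbf{X}\mathbf{r} = \boldsymbol{\lambda}$, then by defining $\mathbf{a} = \mathbf{X}\mathbf{r}$, we obtain
$$E(\mathbf{a}'\mathbf{y}) = E(\mathbf{r}'\mathbf{X}'\mathbf{y}) = \mathbf{r}'\mathbf{X}'E(\mathbf{y}) = \mathbf{r}'\mathbf{X}'\mathbf{X}\boldsymbol{\beta} = \boldsymbol{\lambda}'\boldsymbol{\beta}.$$

(iii) $\boldsymbol{\lambda}$ or $\boldsymbol{\lambda}'$ is such that
$$\mathbf{X}'\mathbf{X}(\mathbf{X}'\mathbf{X})^{-}\boldsymbol{\lambda} = \boldsymbol{\lambda} \quad \text{or} \quad \boldsymbol{\lambda}'(\mathbf{X}'\mathbf{X})^{-}\mathbf{X}'\mathbf{X} = \boldsymbol{\lambda}', \tag{9.12}$$
where $(\mathbf{X}'\mathbf{X})^{-}$ is any (symmetric) generalized inverse of $\mathbf{X}'\mathbf{X}$.

Proof: If $\mathbf{X}'\mathbf{X}(\mathbf{X}'\mathbf{X})^{-}\boldsymbol{\lambda} = \boldsymbol{\lambda}$, then $(\mathbf{X}'\mathbf{X})^{-}\boldsymbol{\lambda}$ is a solution to $\mathbf{X}'\mathbf{X}\mathbf{r} = \boldsymbol{\lambda}$ in part (ii). Conversely, if $\boldsymbol{\lambda}'\boldsymbol{\beta}$ is estimable, then $\mathbf{X}'\mathbf{X}\mathbf{r} = \boldsymbol{\lambda}$ has a solution vector.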

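Example 9.2(a): Using the model in Example 9.2, we note that $\tau_1 - \tau_2$ is unique. To show that $\tau_1 - \tau_2 = (0, 1, -1)\boldsymbol{\beta} = \boldsymbol{\lambda}'\boldsymbol{\beta}$ is estimable, we use Theorem 9.2B as follows.

(i) To find a vector $\mathbf{a}$ such that $\mathbf{a}'\mathbf{X} = \boldsymbol{\lambda}' = (0, 1, -1)$, consider $\mathbf{a}' = (0, 0, 1, -1, 0, 0)$, which gives
$$\mathbf{a}'\mathbf{X} = (0, 0, 1, -1, 0, 0)\mathbf{X} = (1, 1, 0) - (1, 0, 1) = (0, 1, -1) = \boldsymbol{\lambda}'.$$

Likewise, we can obtain $\boldsymbol{\lambda}'\boldsymbol{\beta}$ from $E(\mathbf{y})$:
$$\begin{aligned}
\boldsymbol{\lambda}'\boldsymbol{\beta} &= \mathbf{a}'\mathbf{X}\boldsymbol{\beta} = \mathbf{a}'E(\mathbf{y})
= (0, 0, 1, -1, 0, 0)\begin{pmatrix} E(y_{11})\\ E(y_{12})\\ E(y_{13})\\ E(y_{21})\\ E(y_{22})\\ E(y_{23}) \end{pmatrix}\\
&= E(y_{13}) - E(y_{21}) = (\mu + \tau_1) - (\mu + \tau_2) = \tau_1 - \tau_2.
\end{aligned}$$

(ii) The matrix $\mathbf{X}'\mathbf{X}$ is given as
$$\mathbf{X}'\mathbf{X} = \begin{pmatrix} 6 & 3 & 3\\ 3 & 3 & 0\\ 3 & 0 & 3 \end{pmatrix}.$$
To find a vector $\mathbf{r}$ such that $\mathbf{X}'\mathbf{X}\mathbf{r} = \boldsymbol{\lambda} = (0, 1, -1)'$, consider $\mathbf{r} = (0, \tfrac{1}{3}, -\tfrac{1}{3})'$, which gives
$$\mathbf{X}'\mathbf{X}\mathbf{r} = \begin{pmatrix} 6 & 3 & 3\\ 3 & 3 & 0\\ 3 & 0 & 3 \end{pmatrix}\begin{pmatrix} 0\\ \tfrac{1}{3}\\ -\tfrac{1}{3} \end{pmatrix} = \begin{pmatrix} 0\\ 1\\ -1 \end{pmatrix} = \boldsymbol{\lambda}.$$

(iii) Using the generalized inverse $(\mathbf{X}'\mathbf{X})^{-} = \operatorname{diag}(0, \tfrac{1}{3}, \tfrac{1}{3})$ given in Example 9.2, the product $\mathbf{X}'\mathbf{X}(\mathbf{X}'\mathbf{X})^{-}$ becomes
$$\mathbf{X}'\mathbf{X}(\mathbf{X}'\mathbf{X})^{-} = \begin{pmatrix} 0 & 1 & 1\\ 0 & 1 & 0\\ 0 & 0 & 1 \end{pmatrix}.$$
Then for $\boldsymbol{\lambda} = (0, 1, -1)'$, the condition $\mathbf{X}'\mathbf{X}(\mathbf{X}'\mathbf{X})^{-}\boldsymbol{\lambda} = \boldsymbol{\lambda}$ in (9.12) holds:
$$\begin{pmatrix} 0 & 1 & 1\\ 0 & 1 & 0\\ 0 & 0 & 1 \end{pmatrix}\begin{pmatrix} 0\\ 1\\ -1 \end{pmatrix} = \begin{pmatrix} 0\\ 1\\ -1 \end{pmatrix}.$$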

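Note: A set of functions $\boldsymbol{\lambda}_1'\boldsymbol{\beta}, \boldsymbol{\lambda}_2'\boldsymbol{\beta}, \ldots, \boldsymbol{\lambda}_m'\boldsymbol{\beta}$ is said to be linearly independent if the coefficient vectors $\boldsymbol{\lambda}_1, \boldsymbol{\lambda}_2, \ldots, \boldsymbol{\lambda}_m$ are linearly independent.

Theorem 9.2C. In the non-full-rank model $\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$, the number of linearly independent estimable functions of $\boldsymbol{\beta}$ is the rank of $\mathbf{X}$.

Note: All estimable functions can be obtained from $\mathbf{X}\boldsymbol{\beta}$ or $\mathbf{X}'\mathbf{X}\boldsymbol{\beta}$.

Theorem 9.2D. In the model $\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$, where $E(\mathbf{y}) = \mathbf{X}\boldsymbol{\beta}$ and $\mathbf{X}$ is $n \times p$ of rank $k < p \le n$, any estimable function $\boldsymbol{\lambda}'\boldsymbol{\beta}$ can be obtained by taking a linear combination of the rows (elements) of $\mathbf{X}\boldsymbol{\beta}$ or of the rows of $\mathbf{X}'\mathbf{X}\boldsymbol{\beta}$.

Note: We can examine linear combinations of the rows of $\mathbf{X}$ or $\mathbf{X}'\mathbf{X}$ to obtain a set of estimable functions of the parameters.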
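The three estimability conditions of Theorem 9.2B can be checked numerically for the one-way design of (9.3). This is a sketch only: the helper `is_estimable` is a hypothetical name, and it uses the Moore-Penrose inverse as one convenient generalized inverse in condition (iii).

```python
import numpy as np

# One-way design of (9.3): intercept plus two treatment indicators.
X = np.hstack([np.ones((6, 1)), np.kron(np.eye(2), np.ones((3, 1)))])
XtX = X.T @ X

def is_estimable(X, lam, tol=1e-10):
    """Check condition (iii): X'X (X'X)^- lam == lam, with (X'X)^- = pinv(X'X)."""
    A = X.T @ X
    return bool(np.allclose(A @ np.linalg.pinv(A) @ lam, lam, atol=tol))

lam_contrast = np.array([0.0, 1.0, -1.0])   # tau1 - tau2
lam_tau1 = np.array([0.0, 1.0, 0.0])        # tau1 alone

# Conditions (i) and (ii) with the vectors a and r used in Example 9.2(a).
a = np.array([0.0, 0.0, 1.0, -1.0, 0.0, 0.0])
r = np.array([0.0, 1 / 3, -1 / 3])
cond_i = bool(np.allclose(a @ X, lam_contrast))
cond_ii = bool(np.allclose(XtX @ r, lam_contrast))
cond_iii = is_estimable(X, lam_contrast)

tau1_estimable = is_estimable(X, lam_tau1)   # tau1 by itself is not estimable

# Theorem 9.2C: number of linearly independent estimable functions.
n_independent = np.linalg.matrix_rank(X)

print(cond_i, cond_ii, cond_iii, tau1_estimable, n_independent)
```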

Example 9.2(b). Consider the model (9.6) with
$$\mathbf{X} = \begin{pmatrix} 1 & 1 & 0 & 1 & 0\\ 1 & 1 & 0 & 0 & 1\\ 1 & 0 & 1 & 1 & 0\\ 1 & 0 & 1 & 0 & 1 \end{pmatrix}, \qquad
\boldsymbol{\beta} = \begin{pmatrix} \mu\\ \alpha_1\\ \alpha_2\\ \beta_1\\ \beta_2 \end{pmatrix}.$$

To examine what is estimable, take linear combinations $\mathbf{a}'\mathbf{X}$ of the rows of $\mathbf{X}$ to obtain three linearly independent rows. For example, subtract the first row of $\mathbf{X}$ from the third row and multiply by $\boldsymbol{\beta}$ to obtain $(0, -1, 1, 0, 0)\boldsymbol{\beta} = -\alpha_1 + \alpha_2$, which involves only the $\alpha$'s. Subtracting the first row of $\mathbf{X}$ from the third row can be expressed as $\mathbf{a}'\mathbf{X} = (-1, 0, 1, 0)\mathbf{X} = -\mathbf{x}_1' + \mathbf{x}_3'$, where $\mathbf{x}_1'$ and $\mathbf{x}_3'$ are the first and third rows of $\mathbf{X}$.

Subtracting the first row from each succeeding row in $\mathbf{X}$ gives
$$\begin{pmatrix} 1 & 1 & 0 & 1 & 0\\ 0 & 0 & 0 & -1 & 1\\ 0 & -1 & 1 & 0 & 0\\ 0 & -1 & 1 & -1 & 1 \end{pmatrix}.$$
Subtracting the second and third rows from the fourth row of this matrix yields
$$\begin{pmatrix} 1 & 1 & 0 & 1 & 0\\ 0 & 0 & 0 & -1 & 1\\ 0 & -1 & 1 & 0 & 0\\ 0 & 0 & 0 & 0 & 0 \end{pmatrix}.$$
Multiplying the first three rows by $\boldsymbol{\beta}$, we obtain the three linearly independent estimable functions
$$\boldsymbol{\lambda}_1'\boldsymbol{\beta} = \mu + \alpha_1 + \beta_1, \qquad \boldsymbol{\lambda}_2'\boldsymbol{\beta} = \beta_2 - \beta_1, \qquad \boldsymbol{\lambda}_3'\boldsymbol{\beta} = \alpha_2 - \alpha_1.$$
These are the functions $\gamma_1$, $\gamma_3$, and $\gamma_2$, respectively, used earlier for (9.6) to reparameterize to a full-rank model.

In Example 9.2(b), the two estimable functions $\beta_2 - \beta_1$ and $\alpha_2 - \alpha_1$ are such that the coefficients of the $\beta$'s or of the $\alpha$'s sum to zero. A linear combination of this type is called a contrast.
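The row operations of Example 9.2(b) can be reproduced in a few lines. The sketch below applies the same two steps to the design matrix of (9.6); the array names are illustrative.

```python
import numpy as np

# X from (9.6); columns are (mu, alpha1, alpha2, beta1, beta2).
X = np.array([
    [1, 1, 0, 1, 0],
    [1, 1, 0, 0, 1],
    [1, 0, 1, 1, 0],
    [1, 0, 1, 0, 1],
], dtype=float)

R = X.copy()
R[1:] -= R[0]          # subtract the first row from each succeeding row
R[3] -= R[1] + R[2]    # subtract the second and third rows from the fourth

# The first three rows are the coefficient vectors of the estimable functions
# mu + alpha1 + beta1, beta2 - beta1, and alpha2 - alpha1.
lambdas = R[:3]

print(R)
print(np.linalg.matrix_rank(lambdas))
```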

9.3 Estimators

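9.3.1 Estimators of $\boldsymbol{\lambda}'\boldsymbol{\beta}$.
From Theorem 9.2B(i) and (ii) we have the estimators $\mathbf{a}'\mathbf{y}$ and $\mathbf{r}'\mathbf{X}'\mathbf{y}$ for $\boldsymbol{\lambda}'\boldsymbol{\beta}$, where $\mathbf{a}$ and $\mathbf{r}$ satisfy $\boldsymbol{\lambda}' = \mathbf{a}'\mathbf{X}$ and $\boldsymbol{\lambda}' = \mathbf{r}'\mathbf{X}'\mathbf{X}$, respectively. A third estimator of $\boldsymbol{\lambda}'\boldsymbol{\beta}$ is $\boldsymbol{\lambda}'\hat{\boldsymbol{\beta}}$, where $\hat{\boldsymbol{\beta}}$ is a solution of $\mathbf{X}'\mathbf{X}\hat{\boldsymbol{\beta}} = \mathbf{X}'\mathbf{y}$.

The properties of $\mathbf{r}'\mathbf{X}'\mathbf{y}$ and $\boldsymbol{\lambda}'\hat{\boldsymbol{\beta}}$ are given in Theorem 9.3A.

Theorem 9.3A. Let $\boldsymbol{\lambda}'\boldsymbol{\beta}$ be an estimable function of $\boldsymbol{\beta}$ in the model $\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$, where $E(\mathbf{y}) = \mathbf{X}\boldsymbol{\beta}$ and $\mathbf{X}$ is $n \times p$ of rank $k < p \le n$. Let $\hat{\boldsymbol{\beta}}$ be any solution to the normal equations $\mathbf{X}'\mathbf{X}\hat{\boldsymbol{\beta}} = \mathbf{X}'\mathbf{y}$, and let $\mathbf{r}$ be any solution to $\mathbf{X}'\mathbf{X}\mathbf{r} = \boldsymbol{\lambda}$. Then the two estimators $\boldsymbol{\lambda}'\hat{\boldsymbol{\beta}}$ and $\mathbf{r}'\mathbf{X}'\mathbf{y}$ have the following properties:

(i) $E(\boldsymbol{\lambda}'\hat{\boldsymbol{\beta}}) = E(\mathbf{r}'\mathbf{X}'\mathbf{y}) = \boldsymbol{\lambda}'\boldsymbol{\beta}$.

(ii) $\boldsymbol{\lambda}'\hat{\boldsymbol{\beta}}$ is equal to $\mathbf{r}'\mathbf{X}'\mathbf{y}$ for any $\hat{\boldsymbol{\beta}}$ or any $\mathbf{r}$.

(iii) $\boldsymbol{\lambda}'\hat{\boldsymbol{\beta}}$ and $\mathbf{r}'\mathbf{X}'\mathbf{y}$ are invariant to the choice of $\hat{\boldsymbol{\beta}}$ or $\mathbf{r}$.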
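Properties (ii) and (iii) can be illustrated numerically for the one-way design of (9.3) and the contrast $\tau_1 - \tau_2$. The sketch assumes hypothetical data; the family of solutions used below, with an arbitrary value for the first component, is one way of generating different solutions to the normal equations.

```python
import numpy as np

X = np.hstack([np.ones((6, 1)), np.kron(np.eye(2), np.ones((3, 1)))])
y = np.array([16.2, 15.8, 16.0, 18.1, 17.9, 18.0])   # hypothetical data

lam = np.array([0.0, 1.0, -1.0])      # tau1 - tau2
r = np.array([0.0, 1 / 3, -1 / 3])    # satisfies X'X r = lam

ybar1, ybar2 = y[:3].mean(), y[3:].mean()

estimates = []
solves_normal_eqs = []
for mu_hat in (0.0, 5.0, -3.2):       # arbitrary values for the first component
    beta_hat = np.array([mu_hat, ybar1 - mu_hat, ybar2 - mu_hat])
    solves_normal_eqs.append(bool(np.allclose(X.T @ X @ beta_hat, X.T @ y)))
    estimates.append(lam @ beta_hat)

r_estimate = r @ X.T @ y              # the estimator r'X'y

print(solves_normal_eqs)
print(estimates, r_estimate)          # all equal the difference of group means
```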

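Example 9.3(i). The linear function $\tau_1 - \tau_2 = (0, 1, -1)\boldsymbol{\beta} = \boldsymbol{\lambda}'\boldsymbol{\beta}$ was shown to be estimable in Example 9.2(a). To estimate $\tau_1 - \tau_2$ with $\mathbf{r}'\mathbf{X}'\mathbf{y}$, we use $\mathbf{r}' = (0, \tfrac{1}{3}, -\tfrac{1}{3})$ from Example 9.2(a) to obtain
$$\begin{aligned}
\mathbf{r}'\mathbf{X}'\mathbf{y} &= \left(0, \tfrac{1}{3}, -\tfrac{1}{3}\right)
\begin{pmatrix} 1 & 1 & 1 & 1 & 1 & 1\\ 1 & 1 & 1 & 0 & 0 & 0\\ 0 & 0 & 0 & 1 & 1 & 1 \end{pmatrix}
\begin{pmatrix} y_{11}\\ y_{12}\\ y_{13}\\ y_{21}\\ y_{22}\\ y_{23} \end{pmatrix}\\
&= \left(0, \tfrac{1}{3}, -\tfrac{1}{3}\right)\begin{pmatrix} y_{..}\\ y_{1.}\\ y_{2.} \end{pmatrix}
= \frac{y_{1.}}{3} - \frac{y_{2.}}{3} = \bar{y}_{1.} - \bar{y}_{2.},
\end{aligned}$$
where $y_{..} = \sum_{i=1}^{2}\sum_{j=1}^{3} y_{ij}$, $y_{i.} = \sum_{j=1}^{3} y_{ij}$, and $\bar{y}_{i.} = y_{i.}/3 = \tfrac{1}{3}\sum_{j=1}^{3} y_{ij}$.
To obtain the same result using $\boldsymbol{\lambda}'\hat{\boldsymbol{\beta}}$, first find a solution to the normal equations $\mathbf{X}'\mathbf{X}\hat{\boldsymbol{\beta}} = \mathbf{X}'\mathbf{y}$:
$$\begin{pmatrix} 6 & 3 & 3\\ 3 & 3 & 0\\ 3 & 0 & 3 \end{pmatrix}\begin{pmatrix} \hat{\mu}\\ \hat{\tau}_1\\ \hat{\tau}_2 \end{pmatrix} = \begin{pmatrix} y_{..}\\ y_{1.}\\ y_{2.} \end{pmatrix}$$
or
$$\begin{aligned}
6\hat{\mu} + 3\hat{\tau}_1 + 3\hat{\tau}_2 &= y_{..},\\
3\hat{\mu} + 3\hat{\tau}_1 &= y_{1.},\\
3\hat{\mu} + 3\hat{\tau}_2 &= y_{2.}.
\end{aligned}$$

Treating $\hat{\mu}$ as an arbitrary constant, we obtain
$$\hat{\tau}_1 = \tfrac{1}{3}y_{1.} - \hat{\mu} = \bar{y}_{1.} - \hat{\mu}, \qquad \hat{\tau}_2 = \tfrac{1}{3}y_{2.} - \hat{\mu} = \bar{y}_{2.} - \hat{\mu}.$$
Thus
$$\hat{\boldsymbol{\beta}} = \begin{pmatrix} \hat{\mu}\\ \hat{\tau}_1\\ \hat{\tau}_2 \end{pmatrix} = \begin{pmatrix} 0\\ \bar{y}_{1.}\\ \bar{y}_{2.} \end{pmatrix} + \hat{\mu}\begin{pmatrix} 1\\ -1\\ -1 \end{pmatrix}.$$

To estimate $\tau_1 - \tau_2 = (0, 1, -1)\boldsymbol{\beta} = \boldsymbol{\lambda}'\boldsymbol{\beta}$, we can set $\hat{\mu} = 0$ to obtain $\hat{\boldsymbol{\beta}} = (0, \bar{y}_{1.}, \bar{y}_{2.})'$ and $\boldsymbol{\lambda}'\hat{\boldsymbol{\beta}} = \bar{y}_{1.} - \bar{y}_{2.}$. If we leave $\hat{\mu}$ arbitrary, we likewise obtain
$$\boldsymbol{\lambda}'\hat{\boldsymbol{\beta}} = (0, 1, -1)\begin{pmatrix} \hat{\mu}\\ \bar{y}_{1.} - \hat{\mu}\\ \bar{y}_{2.} - \hat{\mu} \end{pmatrix} = (\bar{y}_{1.} - \hat{\mu}) - (\bar{y}_{2.} - \hat{\mu}) = \bar{y}_{1.} - \bar{y}_{2.}.$$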

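Since $\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-}\mathbf{X}'\mathbf{y}$ is not unique for the non-full-rank model $\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$ with $\operatorname{cov}(\mathbf{y}) = \sigma^2\mathbf{I}$, it does not have a unique covariance matrix. However, for a particular (symmetric) generalized inverse $(\mathbf{X}'\mathbf{X})^{-}$, we can obtain the following covariance matrix:
$$\begin{aligned}
\operatorname{cov}(\hat{\boldsymbol{\beta}}) &= \operatorname{cov}[(\mathbf{X}'\mathbf{X})^{-}\mathbf{X}'\mathbf{y}]\\
&= (\mathbf{X}'\mathbf{X})^{-}\mathbf{X}'(\sigma^2\mathbf{I})\mathbf{X}[(\mathbf{X}'\mathbf{X})^{-}]'\\
&= \sigma^2(\mathbf{X}'\mathbf{X})^{-}\mathbf{X}'\mathbf{X}(\mathbf{X}'\mathbf{X})^{-}.
\end{aligned} \tag{9.13}$$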

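The following theorem gives the variance of $\mathbf{r}'\mathbf{X}'\mathbf{y}$ and $\boldsymbol{\lambda}'\hat{\boldsymbol{\beta}}$.
Theorem 9.3B. Let $\boldsymbol{\lambda}'\boldsymbol{\beta}$ be an estimable function in the model $\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$, where $\mathbf{X}$ is $n \times p$ of rank $k < p \le n$ and $\operatorname{cov}(\mathbf{y}) = \sigma^2\mathbf{I}$. Let $\mathbf{r}$ be any solution to $\mathbf{X}'\mathbf{X}\mathbf{r} = \boldsymbol{\lambda}$, and let $\hat{\boldsymbol{\beta}}$ be any solution to $\mathbf{X}'\mathbf{X}\hat{\boldsymbol{\beta}} = \mathbf{X}'\mathbf{y}$. Then the variance of $\boldsymbol{\lambda}'\hat{\boldsymbol{\beta}}$ or of $\mathbf{r}'\mathbf{X}'\mathbf{y}$ has the following properties:
(i) $\operatorname{var}(\mathbf{r}'\mathbf{X}'\mathbf{y}) = \sigma^2\mathbf{r}'\mathbf{X}'\mathbf{X}\mathbf{r} = \sigma^2\mathbf{r}'\boldsymbol{\lambda}$.
(ii) $\operatorname{var}(\boldsymbol{\lambda}'\hat{\boldsymbol{\beta}}) = \sigma^2\boldsymbol{\lambda}'(\mathbf{X}'\mathbf{X})^{-}\boldsymbol{\lambda}$.
(iii) $\operatorname{var}(\boldsymbol{\lambda}'\hat{\boldsymbol{\beta}})$ is unique, that is, invariant to the choice of $\mathbf{r}$ or $(\mathbf{X}'\mathbf{X})^{-}$.

Theorem 9.3C. If $\boldsymbol{\lambda}_1'\boldsymbol{\beta}$ and $\boldsymbol{\lambda}_2'\boldsymbol{\beta}$ are two estimable functions in the model $\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$, where $\mathbf{X}$ is $n \times p$ of rank $k < p \le n$ and $\operatorname{cov}(\mathbf{y}) = \sigma^2\mathbf{I}$, the covariance of their estimators is given by
$$\operatorname{cov}(\boldsymbol{\lambda}_1'\hat{\boldsymbol{\beta}}, \boldsymbol{\lambda}_2'\hat{\boldsymbol{\beta}}) = \sigma^2\mathbf{r}_1'\boldsymbol{\lambda}_2 = \sigma^2\boldsymbol{\lambda}_1'\mathbf{r}_2 = \sigma^2\boldsymbol{\lambda}_1'(\mathbf{X}'\mathbf{X})^{-}\boldsymbol{\lambda}_2,$$
where $\mathbf{X}'\mathbf{X}\mathbf{r}_1 = \boldsymbol{\lambda}_1$ and $\mathbf{X}'\mathbf{X}\mathbf{r}_2 = \boldsymbol{\lambda}_2$.

Theorem 9.3D. If $\boldsymbol{\lambda}'\boldsymbol{\beta}$ is an estimable function in the model $\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$, where $\mathbf{X}$ is $n \times p$ of rank $k < p \le n$, then the estimators $\boldsymbol{\lambda}'\hat{\boldsymbol{\beta}}$ and $\mathbf{r}'\mathbf{X}'\mathbf{y}$ are BLUE (best linear unbiased estimators).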
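The invariance in Theorem 9.3B(iii) can be checked for the one-way design of (9.3). The sketch below compares two generalized inverses of $\mathbf{X}'\mathbf{X}$: the diagonal one, $\operatorname{diag}(0, \tfrac{1}{3}, \tfrac{1}{3})$, and the Moore-Penrose inverse. Variances are reported as multiples of $\sigma^2$.

```python
import numpy as np

X = np.hstack([np.ones((6, 1)), np.kron(np.eye(2), np.ones((3, 1)))])
XtX = X.T @ X

lam = np.array([0.0, 1.0, -1.0])      # tau1 - tau2
r = np.array([0.0, 1 / 3, -1 / 3])    # satisfies X'X r = lam

G1 = np.diag([0.0, 1 / 3, 1 / 3])     # a diagonal generalized inverse
G2 = np.linalg.pinv(XtX)              # Moore-Penrose inverse

# Both satisfy the defining property A G A = A of a generalized inverse.
is_ginv = [bool(np.allclose(XtX @ G @ XtX, XtX)) for G in (G1, G2)]

# Variance of the estimator divided by sigma^2, computed three ways.
v_r = r @ lam
v_G1 = lam @ G1 @ lam
v_G2 = lam @ G2 @ lam

print(is_ginv, v_r, v_G1, v_G2)       # each variance factor equals 2/3
```

The common value $2/3$ matches the familiar variance of a difference of two means of three observations each, $\sigma^2(\tfrac{1}{3} + \tfrac{1}{3})$.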

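9.3.2 Estimator of $\sigma^2$.
We define
$$\mathrm{SSE} = (\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}})'(\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}}), \tag{9.14}$$
where $\hat{\boldsymbol{\beta}}$ is any solution to the normal equations $\mathbf{X}'\mathbf{X}\hat{\boldsymbol{\beta}} = \mathbf{X}'\mathbf{y}$. Two alternative expressions for SSE are
$$\begin{aligned}
\mathrm{SSE} &= \mathbf{y}'\mathbf{y} - \hat{\boldsymbol{\beta}}'\mathbf{X}'\mathbf{y},\\
\mathrm{SSE} &= \mathbf{y}'[\mathbf{I} - \mathbf{X}(\mathbf{X}'\mathbf{X})^{-}\mathbf{X}']\mathbf{y}.
\end{aligned} \tag{9.15}$$

For an estimator of $\sigma^2$, we define
$$s^2 = \frac{\mathrm{SSE}}{n - k}, \tag{9.16}$$
where $n$ is the number of rows of $\mathbf{X}$ and $k = \operatorname{rank}(\mathbf{X})$.

Theorem 9.3E. For $s^2$ defined in (9.16) for the non-full-rank model $\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$ with $E(\mathbf{y}) = \mathbf{X}\boldsymbol{\beta}$ and $\operatorname{cov}(\mathbf{y}) = \sigma^2\mathbf{I}$, we have the following properties:
(i) $E(s^2) = \sigma^2$.
(ii) $s^2$ is invariant to the choice of $\hat{\boldsymbol{\beta}}$ or to the choice of generalized inverse $(\mathbf{X}'\mathbf{X})^{-}$.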
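The three expressions for SSE in (9.14) and (9.15) can be compared numerically. The sketch uses the one-way design of (9.3) with hypothetical data and the Moore-Penrose inverse as the generalized inverse.

```python
import numpy as np

X = np.hstack([np.ones((6, 1)), np.kron(np.eye(2), np.ones((3, 1)))])
y = np.array([16.2, 15.8, 16.0, 18.1, 17.9, 18.0])   # hypothetical data

n = X.shape[0]
k = np.linalg.matrix_rank(X)

G = np.linalg.pinv(X.T @ X)           # one generalized inverse of X'X
beta_hat = G @ X.T @ y                # one solution of the normal equations

resid = y - X @ beta_hat
sse_a = resid @ resid                             # (9.14)
sse_b = y @ y - beta_hat @ X.T @ y                # first form in (9.15)
sse_c = y @ (np.eye(n) - X @ G @ X.T) @ y         # second form in (9.15)

s2 = sse_a / (n - k)                              # (9.16)

print(sse_a, sse_b, sse_c, s2)
```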

9.3.3 Normal Model
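For the non-full-rank model $\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$, now assume that
$$\mathbf{y} \text{ is } N_n(\mathbf{X}\boldsymbol{\beta}, \sigma^2\mathbf{I}) \quad \text{or} \quad \boldsymbol{\varepsilon} \text{ is } N_n(\mathbf{0}, \sigma^2\mathbf{I}).$$
With the normality assumption we can obtain maximum likelihood estimators.

Theorem 9.3F. If $\mathbf{y}$ is $N_n(\mathbf{X}\boldsymbol{\beta}, \sigma^2\mathbf{I})$ in the model $\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$, where $\mathbf{X}$ is $n \times p$ of rank $k < p \le n$, then the maximum likelihood estimators for $\boldsymbol{\beta}$ and $\sigma^2$ are given by
$$\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-}\mathbf{X}'\mathbf{y}, \tag{9.17}$$
$$\hat{\sigma}^2 = \frac{1}{n}(\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}})'(\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}}). \tag{9.18}$$

Note: The form of the maximum likelihood estimator $\hat{\boldsymbol{\beta}}$ in (9.17) is the same as that of the least-squares estimator. The estimator $\hat{\sigma}^2$ is biased. We often use the unbiased estimator $s^2$ given in (9.16).

The mean vector and covariance matrix for $\hat{\boldsymbol{\beta}}$ are given as $E(\hat{\boldsymbol{\beta}}) = (\mathbf{X}'\mathbf{X})^{-}\mathbf{X}'\mathbf{X}\boldsymbol{\beta}$ and $\operatorname{cov}(\hat{\boldsymbol{\beta}}) = \sigma^2(\mathbf{X}'\mathbf{X})^{-}\mathbf{X}'\mathbf{X}(\mathbf{X}'\mathbf{X})^{-}$. The next theorem gives some additional properties of $\hat{\boldsymbol{\beta}}$ and $s^2$. Note that some of these follow because $\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-}\mathbf{X}'\mathbf{y}$ is a linear function of the observations.

Theorem 9.3G. If $\mathbf{y}$ is $N_n(\mathbf{X}\boldsymbol{\beta}, \sigma^2\mathbf{I})$, where $\mathbf{X}$ is $n \times p$ of rank $k < p \le n$, then the maximum likelihood estimators $\hat{\boldsymbol{\beta}}$ and $s^2$ (corrected for bias) have the following properties:
(i) $\hat{\boldsymbol{\beta}}$ is $N_p[(\mathbf{X}'\mathbf{X})^{-}\mathbf{X}'\mathbf{X}\boldsymbol{\beta}, \; \sigma^2(\mathbf{X}'\mathbf{X})^{-}\mathbf{X}'\mathbf{X}(\mathbf{X}'\mathbf{X})^{-}]$.
(ii) $(n - k)s^2/\sigma^2$ is $\chi^2(n - k)$.
(iii) $\hat{\boldsymbol{\beta}}$ and $s^2$ are independent.

Theorem 9.3H. If $\mathbf{y}$ is $N_n(\mathbf{X}\boldsymbol{\beta}, \sigma^2\mathbf{I})$, where $\mathbf{X}$ is $n \times p$ of rank $k < p \le n$, and if $\boldsymbol{\lambda}'\boldsymbol{\beta}$ is an estimable function, then $\boldsymbol{\lambda}'\hat{\boldsymbol{\beta}}$ has minimum variance among all unbiased estimators.

Note: The estimator $\boldsymbol{\lambda}'\hat{\boldsymbol{\beta}}$ was shown in Theorem 9.3D to have minimum variance among all linear unbiased estimators. With the normality assumption added, as in Theorem 9.3H, $\boldsymbol{\lambda}'\hat{\boldsymbol{\beta}}$ has minimum variance among all unbiased estimators.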

9.4 Reparameterization
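We now formalize and extend the reparameterization approach of Section 9.1 to obtain a model based on estimable parameters.
In reparameterization, we transform the non-full-rank model $\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$, where $\mathbf{X}$ is $n \times p$ of rank $k < p \le n$, to the full-rank model $\mathbf{y} = \mathbf{Z}\boldsymbol{\gamma} + \boldsymbol{\varepsilon}$, where $\mathbf{Z}$ is $n \times k$ of rank $k$ and $\boldsymbol{\gamma} = \mathbf{U}\boldsymbol{\beta}$ is a set of $k$ linearly independent estimable functions of $\boldsymbol{\beta}$. Thus $\mathbf{Z}\boldsymbol{\gamma} = \mathbf{X}\boldsymbol{\beta}$, and we can write
$$\mathbf{Z}\boldsymbol{\gamma} = \mathbf{Z}\mathbf{U}\boldsymbol{\beta} = \mathbf{X}\boldsymbol{\beta}, \tag{9.19}$$
where $\mathbf{X} = \mathbf{Z}\mathbf{U}$. Since $\mathbf{U}$ is $k \times p$ of rank $k < p$, the matrix $\mathbf{U}\mathbf{U}'$ is nonsingular (by an earlier theorem), and we can multiply $\mathbf{X} = \mathbf{Z}\mathbf{U}$ by $\mathbf{U}'$ to solve for $\mathbf{Z}$ in terms of $\mathbf{X}$ and $\mathbf{U}$:
$$\begin{aligned}
\mathbf{Z}\mathbf{U}\mathbf{U}' &= \mathbf{X}\mathbf{U}',\\
\mathbf{Z} &= \mathbf{X}\mathbf{U}'(\mathbf{U}\mathbf{U}')^{-1}.
\end{aligned} \tag{9.20}$$
Thus the model $\mathbf{y} = \mathbf{Z}\boldsymbol{\gamma} + \boldsymbol{\varepsilon}$ is a full-rank model, and its normal equations $\mathbf{Z}'\mathbf{Z}\hat{\boldsymbol{\gamma}} = \mathbf{Z}'\mathbf{y}$ have the unique solution $\hat{\boldsymbol{\gamma}} = (\mathbf{Z}'\mathbf{Z})^{-1}\mathbf{Z}'\mathbf{y}$.

In the reparameterized full-rank model $\mathbf{y} = \mathbf{Z}\boldsymbol{\gamma} + \boldsymbol{\varepsilon}$, the unbiased estimator of $\sigma^2$ is given by
$$s^2 = \frac{1}{n - k}(\mathbf{y} - \mathbf{Z}\hat{\boldsymbol{\gamma}})'(\mathbf{y} - \mathbf{Z}\hat{\boldsymbol{\gamma}}) = \frac{\mathrm{SSE}}{n - k}. \tag{9.21}$$
Since $\mathbf{Z}\boldsymbol{\gamma} = \mathbf{X}\boldsymbol{\beta}$, the estimators $\mathbf{Z}\hat{\boldsymbol{\gamma}}$ and $\mathbf{X}\hat{\boldsymbol{\beta}}$ are also equal,
$$\mathbf{Z}\hat{\boldsymbol{\gamma}} = \mathbf{X}\hat{\boldsymbol{\beta}},$$
and therefore SSE in (9.14) and SSE in (9.21) are the same:
$$(\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}})'(\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}}) = (\mathbf{y} - \mathbf{Z}\hat{\boldsymbol{\gamma}})'(\mathbf{y} - \mathbf{Z}\hat{\boldsymbol{\gamma}}). \tag{9.22}$$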
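Formula (9.20) and the equality of fitted values can be sketched for a small one-way design with two treatments and two replicates. The choice of $\mathbf{U}$ below (rows defining $\mu + \tau_1$ and $\mu + \tau_2$) and the responses in `y` are illustrative assumptions.

```python
import numpy as np

# One-way design, two treatments with two observations each;
# columns are (mu, tau1, tau2).
X = np.array([
    [1, 1, 0],
    [1, 1, 0],
    [1, 0, 1],
    [1, 0, 1],
], dtype=float)

# gamma = U beta with gamma1 = mu + tau1 and gamma2 = mu + tau2.
U = np.array([
    [1, 1, 0],
    [1, 0, 1],
], dtype=float)

# Z = X U' (U U')^{-1}, as in (9.20).
Z = X @ U.T @ np.linalg.inv(U @ U.T)

y = np.array([3.0, 5.0, 8.0, 10.0])                  # hypothetical data
gamma_hat = np.linalg.solve(Z.T @ Z, Z.T @ y)        # unique full-rank solution
beta_hat = np.linalg.pinv(X.T @ X) @ X.T @ y         # one non-full-rank solution

sse_z = (y - Z @ gamma_hat) @ (y - Z @ gamma_hat)
sse_x = (y - X @ beta_hat) @ (y - X @ beta_hat)

print(Z)
print(gamma_hat, sse_z, sse_x)
```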

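The set $\mathbf{U}\boldsymbol{\beta} = \boldsymbol{\gamma}$ is only one possible set of linearly independent estimable functions. Let $\mathbf{V}\boldsymbol{\beta} = \boldsymbol{\delta}$ be another set of linearly independent estimable functions. Then there exists a matrix $\mathbf{W}$ such that $\mathbf{y} = \mathbf{W}\boldsymbol{\delta} + \boldsymbol{\varepsilon}$. Now an estimable function $\boldsymbol{\lambda}'\boldsymbol{\beta}$ can be expressed as a function of $\boldsymbol{\gamma}$ or of $\boldsymbol{\delta}$:
$$\boldsymbol{\lambda}'\boldsymbol{\beta} = \mathbf{b}'\boldsymbol{\gamma} = \mathbf{c}'\boldsymbol{\delta}. \tag{9.23}$$
Hence
$$\widehat{\boldsymbol{\lambda}'\boldsymbol{\beta}} = \mathbf{b}'\hat{\boldsymbol{\gamma}} = \mathbf{c}'\hat{\boldsymbol{\delta}},$$
and either reparameterization gives the same estimator of $\boldsymbol{\lambda}'\boldsymbol{\beta}$.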

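Example 9.4. We find a reparameterization for the model $y_{ij} = \mu + \tau_i + \varepsilon_{ij}$, $i = 1, 2$, $j = 1, 2$. The model can be written in matrix form as
$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon} = \begin{pmatrix} 1 & 1 & 0\\ 1 & 1 & 0\\ 1 & 0 & 1\\ 1 & 0 & 1 \end{pmatrix}\begin{pmatrix} \mu\\ \tau_1\\ \tau_2 \end{pmatrix} + \begin{pmatrix} \varepsilon_{11}\\ \varepsilon_{12}\\ \varepsilon_{21}\\ \varepsilon_{22} \end{pmatrix}.$$

Since $\mathbf{X}$ has rank 2, there exist two linearly independent estimable functions. We can choose these in many ways, one of which is $\mu + \tau_1$ and $\mu + \tau_2$. Thus
$$\boldsymbol{\gamma} = \begin{pmatrix} \gamma_1\\ \gamma_2 \end{pmatrix} = \begin{pmatrix} \mu + \tau_1\\ \mu + \tau_2 \end{pmatrix} = \begin{pmatrix} 1 & 1 & 0\\ 1 & 0 & 1 \end{pmatrix}\begin{pmatrix} \mu\\ \tau_1\\ \tau_2 \end{pmatrix} = \mathbf{U}\boldsymbol{\beta}.$$
To reparameterize in terms of $\boldsymbol{\gamma}$, we can use
$$\mathbf{Z} = \begin{pmatrix} 1 & 0\\ 1 & 0\\ 0 & 1\\ 0 & 1 \end{pmatrix}$$
so that $\mathbf{Z}\boldsymbol{\gamma} = \mathbf{X}\boldsymbol{\beta}$:
$$\mathbf{Z}\boldsymbol{\gamma} = \begin{pmatrix} 1 & 0\\ 1 & 0\\ 0 & 1\\ 0 & 1 \end{pmatrix}\begin{pmatrix} \gamma_1\\ \gamma_2 \end{pmatrix} = \begin{pmatrix} \gamma_1\\ \gamma_1\\ \gamma_2\\ \gamma_2 \end{pmatrix} = \begin{pmatrix} \mu + \tau_1\\ \mu + \tau_1\\ \mu + \tau_2\\ \mu + \tau_2 \end{pmatrix}.$$
Alternatively, the matrix $\mathbf{Z}$ can be obtained directly using (9.20). It is easy to verify that $\mathbf{Z}\mathbf{U} = \mathbf{X}$:
$$\mathbf{Z}\mathbf{U} = \begin{pmatrix} 1 & 0\\ 1 & 0\\ 0 & 1\\ 0 & 1 \end{pmatrix}\begin{pmatrix} 1 & 1 & 0\\ 1 & 0 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 1 & 0\\ 1 & 1 & 0\\ 1 & 0 & 1\\ 1 & 0 & 1 \end{pmatrix} = \mathbf{X}.$$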

9.5 Testing Hypotheses
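Now consider hypotheses about the $\beta$'s in the model $\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$, where $\mathbf{X}$ is $n \times p$ of rank $k < p \le n$. Assume that $\mathbf{y}$ is $N_n(\mathbf{X}\boldsymbol{\beta}, \sigma^2\mathbf{I})$.

Testable Hypotheses
A hypothesis such as $H_0: \alpha_1 = \alpha_2 = \cdots = \alpha_q$ is said to be testable if there exists a set of linearly independent estimable functions $\boldsymbol{\lambda}_1'\boldsymbol{\beta}, \boldsymbol{\lambda}_2'\boldsymbol{\beta}, \ldots, \boldsymbol{\lambda}_t'\boldsymbol{\beta}$ such that $H_0$ is true if and only if $\boldsymbol{\lambda}_1'\boldsymbol{\beta} = \boldsymbol{\lambda}_2'\boldsymbol{\beta} = \cdots = \boldsymbol{\lambda}_t'\boldsymbol{\beta} = 0$.
Sometimes the subset of $\beta$'s whose equality we wish to test is such that every contrast $\sum_i c_i\alpha_i$ is estimable ($\sum_i c_i\alpha_i$ is a contrast if $\sum_i c_i = 0$). In this case, it is easy to find a set of $q - 1$ linearly independent estimable functions that can be set equal to zero to express $\alpha_1 = \alpha_2 = \cdots = \alpha_q$. One such set is the following:
$$\begin{aligned}
\boldsymbol{\lambda}_1'\boldsymbol{\beta} &= (q - 1)\alpha_1 - (\alpha_2 + \alpha_3 + \cdots + \alpha_q),\\
\boldsymbol{\lambda}_2'\boldsymbol{\beta} &= (q - 2)\alpha_2 - (\alpha_3 + \alpha_4 + \cdots + \alpha_q),\\
&\;\;\vdots\\
\boldsymbol{\lambda}_{q-1}'\boldsymbol{\beta} &= \alpha_{q-1} - \alpha_q.
\end{aligned}$$
These $q - 1$ contrasts $\boldsymbol{\lambda}_1'\boldsymbol{\beta}, \boldsymbol{\lambda}_2'\boldsymbol{\beta}, \ldots, \boldsymbol{\lambda}_{q-1}'\boldsymbol{\beta}$ constitute a set of linearly independent estimable functions such that
$$\begin{pmatrix} \boldsymbol{\lambda}_1'\boldsymbol{\beta}\\ \vdots\\ \boldsymbol{\lambda}_{q-1}'\boldsymbol{\beta} \end{pmatrix} = \begin{pmatrix} 0\\ \vdots\\ 0 \end{pmatrix}$$
if and only if $\alpha_1 = \alpha_2 = \cdots = \alpha_q$.

Consider the model $y_{ij} = \mu + \alpha_i + \varepsilon_{ij}$, $i = 1, 2, 3$, $j = 1, 2, 3$, with hypothesis of interest $H_0: \alpha_1 = \alpha_2 = \alpha_3$. By taking linear combinations of the rows of $\mathbf{X}\boldsymbol{\beta}$, we can obtain the two linearly independent estimable functions $\alpha_1 - \alpha_2$ and $\alpha_1 + \alpha_2 - 2\alpha_3$. The hypothesis $H_0: \alpha_1 = \alpha_2 = \alpha_3$ is true if and only if $\alpha_1 - \alpha_2$ and $\alpha_1 + \alpha_2 - 2\alpha_3$ are simultaneously equal to zero. Therefore, $H_0$ is a testable hypothesis and is equivalent to
$$H_0: \begin{pmatrix} \alpha_1 - \alpha_2\\ \alpha_1 + \alpha_2 - 2\alpha_3 \end{pmatrix} = \begin{pmatrix} 0\\ 0 \end{pmatrix}. \tag{9.24}$$
Note: To test a testable hypothesis, we use either a full-and-reduced-model approach or a general linear hypothesis test.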

Full and Reduced Model
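Consider a non-full-rank model $\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$, where $\boldsymbol{\beta}$ is $p \times 1$ and $\mathbf{X}$ is $n \times p$ of rank $k < p \le n$. Suppose we want to test $H_0: \alpha_1 = \alpha_2 = \cdots = \alpha_q$. If $H_0$ is testable, we can find a set of linearly independent estimable functions $\boldsymbol{\lambda}_1'\boldsymbol{\beta}, \boldsymbol{\lambda}_2'\boldsymbol{\beta}, \ldots, \boldsymbol{\lambda}_t'\boldsymbol{\beta}$ such that $H_0: \alpha_1 = \alpha_2 = \cdots = \alpha_q$ is equivalent to
$$H_0: \boldsymbol{\gamma}_1 = \begin{pmatrix} \boldsymbol{\lambda}_1'\boldsymbol{\beta}\\ \boldsymbol{\lambda}_2'\boldsymbol{\beta}\\ \vdots\\ \boldsymbol{\lambda}_t'\boldsymbol{\beta} \end{pmatrix} = \begin{pmatrix} 0\\ 0\\ \vdots\\ 0 \end{pmatrix}.$$
It is also possible to find
$$\boldsymbol{\gamma}_2 = \begin{pmatrix} \boldsymbol{\lambda}_{t+1}'\boldsymbol{\beta}\\ \vdots\\ \boldsymbol{\lambda}_k'\boldsymbol{\beta} \end{pmatrix}$$
such that the $k$ functions $\boldsymbol{\lambda}_1'\boldsymbol{\beta}, \ldots, \boldsymbol{\lambda}_t'\boldsymbol{\beta}, \boldsymbol{\lambda}_{t+1}'\boldsymbol{\beta}, \ldots, \boldsymbol{\lambda}_k'\boldsymbol{\beta}$ are linearly independent and estimable, where $k = \operatorname{rank}(\mathbf{X})$. Let
$$\boldsymbol{\gamma} = \begin{pmatrix} \boldsymbol{\gamma}_1\\ \boldsymbol{\gamma}_2 \end{pmatrix}.$$
We can now reparameterize (Section 9.4) from the non-full-rank model $\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$ to the full-rank model
$$\mathbf{y} = \mathbf{Z}\boldsymbol{\gamma} + \boldsymbol{\varepsilon} = \mathbf{Z}_1\boldsymbol{\gamma}_1 + \mathbf{Z}_2\boldsymbol{\gamma}_2 + \boldsymbol{\varepsilon},$$
where $\mathbf{Z} = (\mathbf{Z}_1, \mathbf{Z}_2)$ is partitioned to conform with the number of elements in $\boldsymbol{\gamma}_1$ and $\boldsymbol{\gamma}_2$.

For the hypothesis $H_0: \boldsymbol{\gamma}_1 = \mathbf{0}$, the reduced model is $\mathbf{y} = \mathbf{Z}_2\boldsymbol{\gamma}_2^* + \boldsymbol{\varepsilon}^*$. The estimate of $\boldsymbol{\gamma}_2^*$ in the reduced model is the same as the estimate of $\boldsymbol{\gamma}_2$ in the full model if the columns of $\mathbf{Z}_2$ are orthogonal to those of $\mathbf{Z}_1$, that is, if $\mathbf{Z}_2'\mathbf{Z}_1 = \mathbf{O}$. For balanced models the orthogonality will typically hold. Accordingly, we refer to $\boldsymbol{\gamma}_2$ and $\hat{\boldsymbol{\gamma}}_2$ rather than to $\boldsymbol{\gamma}_2^*$ and $\hat{\boldsymbol{\gamma}}_2^*$.

Since $\mathbf{y} = \mathbf{Z}\boldsymbol{\gamma} + \boldsymbol{\varepsilon}$ is a full-rank model, the hypothesis $H_0: \boldsymbol{\gamma}_1 = \mathbf{0}$ can be tested (as in Section 8.2). The test is outlined in Table 9.2, which is analogous to Table 8.3. Note that the degrees of freedom, $t$, for $\mathrm{SS}(\boldsymbol{\gamma}_1 \mid \boldsymbol{\gamma}_2)$ is the number of linearly independent estimable functions required to express $H_0$.

Table 9.2 Analysis of Variance for Testing $H_0: \boldsymbol{\gamma}_1 = \mathbf{0}$ in Reparameterized Balanced Models

Source of variation                                           d.f.      Sum of squares                                                                                                                                     F statistic
Due to $\boldsymbol{\gamma}_1$ adjusted for $\boldsymbol{\gamma}_2$   $t$       $\mathrm{SS}(\boldsymbol{\gamma}_1 \mid \boldsymbol{\gamma}_2) = \hat{\boldsymbol{\gamma}}'\mathbf{Z}'\mathbf{y} - \hat{\boldsymbol{\gamma}}_2'\mathbf{Z}_2'\mathbf{y}$   $\dfrac{\mathrm{SS}(\boldsymbol{\gamma}_1 \mid \boldsymbol{\gamma}_2)/t}{\mathrm{SSE}/(n - k)}$
Error                                                         $n - k$   $\mathrm{SSE} = \mathbf{y}'\mathbf{y} - \hat{\boldsymbol{\gamma}}'\mathbf{Z}'\mathbf{y}$
Total                                                         $n - 1$   $\mathrm{SST} = \mathbf{y}'\mathbf{y} - n\bar{y}^2$

In Table 9.2, the sum of squares $\hat{\boldsymbol{\gamma}}'\mathbf{Z}'\mathbf{y}$ is obtained from the full model $\mathbf{y} = \mathbf{Z}\boldsymbol{\gamma} + \boldsymbol{\varepsilon}$. The sum of squares $\hat{\boldsymbol{\gamma}}_2'\mathbf{Z}_2'\mathbf{y}$ is obtained from the reduced model $\mathbf{y} = \mathbf{Z}_2\boldsymbol{\gamma}_2 + \boldsymbol{\varepsilon}$, which assumes the hypothesis is true. The reparameterization procedure presented above seems straightforward. However, finding the matrix $\mathbf{Z}$ in practice can be time-consuming. Fortunately, this step is not actually necessary.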

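From (9.15) and (9.22), we have
$$\mathbf{y}'\mathbf{y} - \hat{\boldsymbol{\beta}}'\mathbf{X}'\mathbf{y} = \mathbf{y}'\mathbf{y} - \hat{\boldsymbol{\gamma}}'\mathbf{Z}'\mathbf{y}, \qquad \text{so that} \qquad \hat{\boldsymbol{\beta}}'\mathbf{X}'\mathbf{y} = \hat{\boldsymbol{\gamma}}'\mathbf{Z}'\mathbf{y}, \tag{9.25}$$
where $\hat{\boldsymbol{\beta}}$ represents any solution to the normal equations $\mathbf{X}'\mathbf{X}\hat{\boldsymbol{\beta}} = \mathbf{X}'\mathbf{y}$. Similarly, corresponding to $\mathbf{y} = \mathbf{Z}_2\boldsymbol{\gamma}_2^* + \boldsymbol{\varepsilon}^*$, we have a reduced model $\mathbf{y} = \mathbf{X}_2\boldsymbol{\beta}_2^* + \boldsymbol{\varepsilon}^*$ obtained by setting $\alpha_1 = \alpha_2 = \cdots = \alpha_q$. Then
$$\hat{\boldsymbol{\beta}}_2^{*\prime}\mathbf{X}_2'\mathbf{y} = \hat{\boldsymbol{\gamma}}_2^{*\prime}\mathbf{Z}_2'\mathbf{y}, \tag{9.26}$$
where $\hat{\boldsymbol{\beta}}_2^*$ is any solution to the reduced normal equations $\mathbf{X}_2'\mathbf{X}_2\hat{\boldsymbol{\beta}}_2^* = \mathbf{X}_2'\mathbf{y}$.

Theorem 9.5A. Consider the partitioned model $\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon} = \mathbf{X}_1\boldsymbol{\beta}_1 + \mathbf{X}_2\boldsymbol{\beta}_2 + \boldsymbol{\varepsilon}$, where $\mathbf{X}$ is $n \times p$ of rank $k < p \le n$. If $\mathbf{X}_2'\mathbf{X}_1 = \mathbf{O}$, the estimate of $\boldsymbol{\beta}_2^*$ in the reduced model $\mathbf{y} = \mathbf{X}_2\boldsymbol{\beta}_2^* + \boldsymbol{\varepsilon}^*$ is the same as the estimate of $\boldsymbol{\beta}_2$ in the full model.

In the balanced non-full-rank models we are considering in this chapter, the orthogonality of $\mathbf{X}_1$ and $\mathbf{X}_2$ will typically hold. Accordingly, we refer to $\boldsymbol{\beta}_2$ and $\hat{\boldsymbol{\beta}}_2$ rather than to $\boldsymbol{\beta}_2^*$ and $\hat{\boldsymbol{\beta}}_2^*$. The test can be expressed as in Table 9.3, in which $\hat{\boldsymbol{\beta}}'\mathbf{X}'\mathbf{y}$ is obtained from the full model $\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$ and $\hat{\boldsymbol{\beta}}_2'\mathbf{X}_2'\mathbf{y}$ is obtained from the model $\mathbf{y} = \mathbf{X}_2\boldsymbol{\beta}_2 + \boldsymbol{\varepsilon}$, which has been reduced by the hypothesis $H_0: \alpha_1 = \alpha_2 = \cdots = \alpha_q$. Note that the degrees of freedom $t$ for $\mathrm{SS}(\boldsymbol{\beta}_1 \mid \boldsymbol{\beta}_2)$ is the same as for $\mathrm{SS}(\boldsymbol{\gamma}_1 \mid \boldsymbol{\gamma}_2)$ in Table 9.2, namely, the number of linearly independent estimable functions required to express $H_0$. Typically, this is given by $t = q - 1$. A set of $q - 1$ linearly independent estimable functions was illustrated at the beginning of Section 9.5.

Table 9.3 Analysis of Variance for Testing $H_0: \alpha_1 = \alpha_2 = \cdots = \alpha_q$ in Balanced Non-Full-Rank Models

Source of variation                                         d.f.      Sum of squares                                                                                                                                 F statistic
Due to $\boldsymbol{\beta}_1$ adjusted for $\boldsymbol{\beta}_2$   $t$       $\mathrm{SS}(\boldsymbol{\beta}_1 \mid \boldsymbol{\beta}_2) = \hat{\boldsymbol{\beta}}'\mathbf{X}'\mathbf{y} - \hat{\boldsymbol{\beta}}_2'\mathbf{X}_2'\mathbf{y}$   $\dfrac{\mathrm{SS}(\boldsymbol{\beta}_1 \mid \boldsymbol{\beta}_2)/t}{\mathrm{SSE}/(n - k)}$
Error                                                       $n - k$   $\mathrm{SSE} = \mathbf{y}'\mathbf{y} - \hat{\boldsymbol{\beta}}'\mathbf{X}'\mathbf{y}$
Total                                                       $n - 1$   $\mathrm{SST} = \mathbf{y}'\mathbf{y} - n\bar{y}^2$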
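The computations in Table 9.3 can be sketched for the one-way model $y_{ij} = \mu + \alpha_i + \varepsilon_{ij}$ with three treatments and three replicates, testing $H_0: \alpha_1 = \alpha_2 = \alpha_3$. The data are hypothetical, and the reduced model is taken to be the intercept-only model obtained when all $\alpha_i$ are equal.

```python
import numpy as np

# Hypothetical balanced one-way data: 3 treatments, 3 observations each.
groups = np.repeat([0, 1, 2], 3)
y = np.array([1.0, 2.0, 3.0, 2.0, 3.0, 4.0, 6.0, 7.0, 8.0])

# Full model: columns are (mu, alpha1, alpha2, alpha3); X is 9 x 4 of rank 3.
X = np.hstack([np.ones((9, 1)), np.eye(3)[groups]])
n = X.shape[0]
k = np.linalg.matrix_rank(X)

beta_hat = np.linalg.pinv(X.T @ X) @ X.T @ y    # any solution will do
ss_full = beta_hat @ X.T @ y                    # beta_hat' X' y

# Reduced model under H0: a common mean only.
X2 = np.ones((n, 1))
beta2_hat = np.linalg.solve(X2.T @ X2, X2.T @ y)
ss_reduced = beta2_hat @ X2.T @ y               # equals n * ybar^2

t = 2                                           # q - 1 with q = 3
ss_h = ss_full - ss_reduced                     # SS(beta1 | beta2)
sse = y @ y - ss_full
F = (ss_h / t) / (sse / (n - k))

print(ss_h, sse, F)
```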

General Linear Hypothesis
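As illustrated in (9.24), a hypothesis such as $H_0: \alpha_1 = \alpha_2 = \alpha_3$ can be expressed in the form $H_0: \mathbf{C}\boldsymbol{\beta} = \mathbf{0}$. We can test this hypothesis in a manner analogous to that used for the general linear hypothesis test for the full-rank model in Section 8.4. The following theorem is an extension of Theorem 8.4a to the non-full-rank case.

Theorem 9.5B. If $\mathbf{y}$ is distributed as $N_n(\mathbf{X}\boldsymbol{\beta}, \sigma^2\mathbf{I})$ and $\mathbf{X}$ is $n \times p$ of rank $k < p \le n$, if $\mathbf{C}$ is $m \times p$ of rank $m \le k$ such that $\mathbf{C}\boldsymbol{\beta}$ is a set of $m$ linearly independent estimable functions, and if $\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-}\mathbf{X}'\mathbf{y}$, then
(i) $\mathbf{C}(\mathbf{X}'\mathbf{X})^{-}\mathbf{C}'$ is nonsingular and invariant to the choice of $(\mathbf{X}'\mathbf{X})^{-}$;
(ii) $\mathbf{C}\hat{\boldsymbol{\beta}}$ is $N_m[\mathbf{C}\boldsymbol{\beta}, \sigma^2\mathbf{C}(\mathbf{X}'\mathbf{X})^{-}\mathbf{C}']$;
(iii) $\mathrm{SSH}/\sigma^2 = (\mathbf{C}\hat{\boldsymbol{\beta}})'[\mathbf{C}(\mathbf{X}'\mathbf{X})^{-}\mathbf{C}']^{-1}\mathbf{C}\hat{\boldsymbol{\beta}}/\sigma^2$ is $\chi^2(m, \lambda)$, where $\lambda = (\mathbf{C}\boldsymbol{\beta})'[\mathbf{C}(\mathbf{X}'\mathbf{X})^{-}\mathbf{C}']^{-1}\mathbf{C}\boldsymbol{\beta}/2\sigma^2$;
(iv) $\mathrm{SSE}/\sigma^2 = \mathbf{y}'[\mathbf{I} - \mathbf{X}(\mathbf{X}'\mathbf{X})^{-}\mathbf{X}']\mathbf{y}/\sigma^2$ is $\chi^2(n - k)$;
(v) SSH and SSE are independent.

Theorem 9.5C. Let $\mathbf{y}$ be $N_n(\mathbf{X}\boldsymbol{\beta}, \sigma^2\mathbf{I})$, where $\mathbf{X}$ is $n \times p$ of rank $k < p \le n$, and let $\mathbf{C}$, $\mathbf{C}\boldsymbol{\beta}$, and $\hat{\boldsymbol{\beta}}$ be defined as in Theorem 9.5B. Then, if $H_0: \mathbf{C}\boldsymbol{\beta} = \mathbf{0}$ is true, the statistic
$$F = \frac{\mathrm{SSH}/m}{\mathrm{SSE}/(n - k)} = \frac{(\mathbf{C}\hat{\boldsymbol{\beta}})'[\mathbf{C}(\mathbf{X}'\mathbf{X})^{-}\mathbf{C}']^{-1}\mathbf{C}\hat{\boldsymbol{\beta}}/m}{\mathrm{SSE}/(n - k)} \tag{9.27}$$
is distributed as $F(m, n - k)$.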
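A minimal sketch of the statistic in (9.27) follows, for the one-way model with three treatments and the hypothesis matrix $\mathbf{C}$ corresponding to (9.24). The data are hypothetical, and the Moore-Penrose inverse is used as the generalized inverse.

```python
import numpy as np

# Hypothetical balanced one-way data: 3 treatments, 3 observations each.
groups = np.repeat([0, 1, 2], 3)
y = np.array([1.0, 2.0, 3.0, 2.0, 3.0, 4.0, 6.0, 7.0, 8.0])

# Columns are (mu, alpha1, alpha2, alpha3).
X = np.hstack([np.ones((9, 1)), np.eye(3)[groups]])
n = X.shape[0]
k = np.linalg.matrix_rank(X)

G = np.linalg.pinv(X.T @ X)        # a generalized inverse of X'X
beta_hat = G @ X.T @ y

# Rows of C: alpha1 - alpha2 and alpha1 + alpha2 - 2*alpha3, as in (9.24).
C = np.array([
    [0.0, 1.0, -1.0, 0.0],
    [0.0, 1.0, 1.0, -2.0],
])
m = C.shape[0]

Cb = C @ beta_hat
ssh = Cb @ np.linalg.solve(C @ G @ C.T, Cb)   # (C b)' [C G C']^{-1} (C b)
sse = y @ y - beta_hat @ X.T @ y
F = (ssh / m) / (sse / (n - k))

print(ssh, sse, F)
```

For these data the hypothesis sum of squares agrees with the between-treatment sum of squares $3\sum_i(\bar{y}_{i.} - \bar{y}_{..})^2$, which is what a full-and-reduced-model comparison would give.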

9.6 An Illustration of Estimation and Testing
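Model: $y_{ij} = \mu + \alpha_i + \beta_j + \varepsilon_{ij}$, $i = 1, 2, 3$; $j = 1, 2$.
We wish to test $H_0: \alpha_1 = \alpha_2 = \alpha_3$ and $H_0: \beta_1 = \beta_2$.
The observations in the form $\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$ are
$$\begin{pmatrix} y_{11}\\ y_{12}\\ y_{21}\\ y_{22}\\ y_{31}\\ y_{32} \end{pmatrix}
= \begin{pmatrix} 1 & 1 & 0 & 0 & 1 & 0\\ 1 & 1 & 0 & 0 & 0 & 1\\ 1 & 0 & 1 & 0 & 1 & 0\\ 1 & 0 & 1 & 0 & 0 & 1\\ 1 & 0 & 0 & 1 & 1 & 0\\ 1 & 0 & 0 & 1 & 0 & 1 \end{pmatrix}
\begin{pmatrix} \mu\\ \alpha_1\\ \alpha_2\\ \alpha_3\\ \beta_1\\ \beta_2 \end{pmatrix}
+ \begin{pmatrix} \varepsilon_{11}\\ \varepsilon_{12}\\ \varepsilon_{21}\\ \varepsilon_{22}\\ \varepsilon_{31}\\ \varepsilon_{32} \end{pmatrix}. \tag{9.28}$$
The matrix $\mathbf{X}'\mathbf{X}$ is
$$\mathbf{X}'\mathbf{X} = \begin{pmatrix} 6 & 2 & 2 & 2 & 3 & 3\\ 2 & 2 & 0 & 0 & 1 & 1\\ 2 & 0 & 2 & 0 & 1 & 1\\ 2 & 0 & 0 & 2 & 1 & 1\\ 3 & 1 & 1 & 1 & 3 & 0\\ 3 & 1 & 1 & 1 & 0 & 3 \end{pmatrix}.$$

The rank of both $\mathbf{X}$ and $\mathbf{X}'\mathbf{X}$ is 4.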
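The rank statement can be verified numerically, and the same arrays can be used to check whether a given coefficient vector lies in the row space of $\mathbf{X}$. The sketch below does this for $\alpha_1 - \alpha_2$, using one possible choice of the vectors `a` and `r` (illustrative names).

```python
import numpy as np

# X from (9.28); columns are (mu, alpha1, alpha2, alpha3, beta1, beta2),
# rows are ordered y11, y12, y21, y22, y31, y32.
X = np.array([
    [1, 1, 0, 0, 1, 0],
    [1, 1, 0, 0, 0, 1],
    [1, 0, 1, 0, 1, 0],
    [1, 0, 1, 0, 0, 1],
    [1, 0, 0, 1, 1, 0],
    [1, 0, 0, 1, 0, 1],
], dtype=float)

XtX = X.T @ X
rank_X = np.linalg.matrix_rank(X)
rank_XtX = np.linalg.matrix_rank(XtX)

# alpha1 - alpha2 as lambda' beta.
lam1 = np.array([0.0, 1.0, -1.0, 0.0, 0.0, 0.0])

a = np.array([1.0, 0.0, -1.0, 0.0, 0.0, 0.0])    # first row minus third row
r = np.array([0.0, 0.5, -0.5, 0.0, 0.0, 0.0])

from_rows_of_X = bool(np.allclose(a @ X, lam1))
from_rows_of_XtX = bool(np.allclose(r @ XtX, lam1))

print(rank_X, rank_XtX, from_rows_of_X, from_rows_of_XtX)
```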

Estimable Functions
The hypothesis $H_0: \alpha_1 = \alpha_2 = \alpha_3$ can be expressed as $H_0: \alpha_1 - \alpha_2 = 0$ and $\alpha_1 - \alpha_3 = 0$. Thus $H_0$ is testable if $\alpha_1 - \alpha_2$ and $\alpha_1 - \alpha_3$ are estimable. To check $\alpha_1 - \alpha_2$ for estimability, we write it as
$$\alpha_1 - \alpha_2 = (0, 1, -1, 0, 0, 0)\boldsymbol{\beta} = \boldsymbol{\lambda}_1'\boldsymbol{\beta}$$
and then note that $\boldsymbol{\lambda}_1'$ can be obtained from $\mathbf{X}$ as
$$(1, 0, -1, 0, 0, 0)\mathbf{X} = (0, 1, -1, 0, 0, 0)$$
and from $\mathbf{X}'\mathbf{X}$ as
$$\left(0, \tfrac{1}{2}, -\tfrac{1}{2}, 0, 0, 0\right)\mathbf{X}'\mathbf{X} = (0, 1, -1, 0, 0, 0)$$
(see Theorems 9.2B and 9.2D). Alternatively, we can obtain $\alpha_1 - \alpha_2$ as a linear combination of the rows (elements) of $E(\mathbf{y}) = \mathbf{X}\boldsymbol{\beta}$:
$$E(y_{11} - y_{21}) = E(y_{11}) - E(y_{21}) = (\mu + \alpha_1 + \beta_1) - (\mu + \alpha_2 + \beta_1) = \alpha_1 - \alpha_2.$$
Similarly, $\alpha_1 - \alpha_3$ can be expressed as
$$\alpha_1 - \alpha_3 = (0, 1, 0, -1, 0, 0)\boldsymbol{\beta} = \boldsymbol{\lambda}_2'\boldsymbol{\beta},$$
and $\boldsymbol{\lambda}_2'$ can be obtained from $\mathbf{X}$ or $\mathbf{X}'\mathbf{X}$:
$$(1, 0, 0, 0, -1, 0)\mathbf{X} = (0, 1, 0, -1, 0, 0), \qquad \left(0, \tfrac{1}{2}, 0, -\tfrac{1}{2}, 0, 0\right)\mathbf{X}'\mathbf{X} = (0, 1, 0, -1, 0, 0).$$

It is also of interest to examine a complete set of linearly independent estimable functions obtained as linear combinations of the rows of $\mathbf{X}$ (see Theorem 9.2D). If we subtract the first row from each succeeding row of $\mathbf{X}$, we obtain
$$\begin{pmatrix} 1 & 1 & 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 0 & -1 & 1\\ 0 & -1 & 1 & 0 & 0 & 0\\ 0 & -1 & 1 & 0 & -1 & 1\\ 0 & -1 & 0 & 1 & 0 & 0\\ 0 & -1 & 0 & 1 & -1 & 1 \end{pmatrix}.$$
We multiply the second and third rows by $-1$ and then add them to the fourth row, with similar operations involving the second, fifth, and sixth rows. The result is
$$\begin{pmatrix} 1 & 1 & 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 0 & -1 & 1\\ 0 & -1 & 1 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 & 0 & 0\\ 0 & -1 & 0 & 1 & 0 & 0\\ 0 & 0 & 0 & 0 & 0 & 0 \end{pmatrix}.$$
Multiplying this matrix by $\boldsymbol{\beta}$, we obtain a complete set of linearly independent estimable functions: $\mu + \alpha_1 + \beta_1$, $\beta_2 - \beta_1$, $\alpha_2 - \alpha_1$, and $\alpha_3 - \alpha_1$. Note that the estimable functions not involving $\mu$ are contrasts in the $\alpha$'s or $\beta$'s.

Testing a Hypothesis
Since two linearly independent estimable functions of the $\alpha$'s are needed to express $H_0: \alpha_1 = \alpha_2 = \alpha_3$, the sum of squares for testing $H_0: \alpha_1 = \alpha_2 = \alpha_3$ has 2 degrees of freedom. Similarly, $H_0: \beta_1 = \beta_2$ is testable with 1 degree of freedom.
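The two estimable functions that this test relies on can be verified by direct multiplication. The sketch below assumes the same row and column ordering of $\mathbf{X}$ as in (9.28); `vec_mat` is an illustrative helper name.

```python
from fractions import Fraction

def vec_mat(v, M):
    """Row vector v times matrix M."""
    return [sum(v[i] * M[i][c] for i in range(len(v))) for c in range(len(M[0]))]

# Rows y11, y12, y21, y22, y31, y32; columns mu, a1, a2, a3, b1, b2.
X = [[1, int(i == 0), int(i == 1), int(i == 2), int(j == 0), int(j == 1)]
     for i in range(3) for j in range(2)]
XtX = [[sum(row[a] * row[b] for row in X) for b in range(6)] for a in range(6)]

half = Fraction(1, 2)

# lambda_1' = a'X and lambda_1' = r'X'X for alpha_1 - alpha_2
lam1_from_X = vec_mat([1, 0, -1, 0, 0, 0], X)
lam1_from_XtX = vec_mat([0, half, -half, 0, 0, 0], XtX)

# lambda_2' = a'X and lambda_2' = r'X'X for alpha_1 - alpha_3
lam2_from_X = vec_mat([1, 0, 0, 0, -1, 0], X)
lam2_from_XtX = vec_mat([0, half, 0, -half, 0, 0], XtX)

print(lam1_from_X, lam2_from_X)
```

Both routes give the same coefficient vectors, which is what Theorems 9.2B and 9.2D lead one to expect for an estimable function.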

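The normal equations $\mathbf{X}'\mathbf{X}\hat{\boldsymbol{\beta}} = \mathbf{X}'\mathbf{y}$ are given by
\[
\begin{pmatrix}
6 & 2 & 2 & 2 & 3 & 3\\
2 & 2 & 0 & 0 & 1 & 1\\
2 & 0 & 2 & 0 & 1 & 1\\
2 & 0 & 0 & 2 & 1 & 1\\
3 & 1 & 1 & 1 & 3 & 0\\
3 & 1 & 1 & 1 & 0 & 3
\end{pmatrix}
\begin{pmatrix} \hat\mu \\ \hat\alpha_1 \\ \hat\alpha_2 \\ \hat\alpha_3 \\ \hat\beta_1 \\ \hat\beta_2 \end{pmatrix}
=
\begin{pmatrix} y_{..} \\ y_{1.} \\ y_{2.} \\ y_{3.} \\ y_{.1} \\ y_{.2} \end{pmatrix}
\tag{9.29}
\]

If we impose the side conditions $\hat\alpha_1 + \hat\alpha_2 + \hat\alpha_3 = 0$ and $\hat\beta_1 + \hat\beta_2 = 0$, we obtain the following solution to the normal equations:
\begin{align*}
\hat\mu &= \bar{y}_{..}, \\
\hat\alpha_1 &= \bar{y}_{1.} - \bar{y}_{..}, \quad \hat\alpha_2 = \bar{y}_{2.} - \bar{y}_{..}, \quad \hat\alpha_3 = \bar{y}_{3.} - \bar{y}_{..}, \tag{9.30}\\
\hat\beta_1 &= \bar{y}_{.1} - \bar{y}_{..}, \quad \hat\beta_2 = \bar{y}_{.2} - \bar{y}_{..},
\end{align*}
where $\bar{y}_{..} = \sum_{ij} y_{ij}/6$, $\bar{y}_{1.} = \sum_j y_{1j}/2$, $\bar{y}_{.1} = \sum_i y_{i1}/3$, and so on.
If we impose the side conditions on both the parameters and the estimates, equations (9.30) are unique estimates of unique meaningful parameters. Thus, for example, $\alpha_1$ becomes $\alpha_1^* = \bar\mu_{1.} - \bar\mu_{..}$, the expected deviation from the mean due to treatment 1, and $\bar{y}_{1.} - \bar{y}_{..}$ is a reasonable estimate. On the other hand, if the side conditions are used only to obtain estimates and are not imposed on the parameters, then $\alpha_1$ is not unique, and $\bar{y}_{1.} - \bar{y}_{..}$ does not estimate a parameter. In this case, $\hat\alpha_1 = \bar{y}_{1.} - \bar{y}_{..}$ can be used only together with other elements in $\hat{\boldsymbol{\beta}}$ [as given by (9.30)] to obtain estimates $\boldsymbol{\lambda}'\hat{\boldsymbol{\beta}}$ of estimable functions $\boldsymbol{\lambda}'\boldsymbol{\beta}$.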

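Now, to test $H_0: \alpha_1 = \alpha_2 = \alpha_3$ following the outline in Table 9.3:
For the full model, we need $\hat{\boldsymbol{\beta}}'\mathbf{X}'\mathbf{y} = SS(\mu, \alpha_1, \alpha_2, \alpha_3, \beta_1, \beta_2)$, which we denote by $SS(\mu, \alpha, \beta)$. By (9.29) and (9.30), we obtain
\begin{align*}
SS(\mu, \alpha, \beta) = \hat{\boldsymbol{\beta}}'\mathbf{X}'\mathbf{y}
&= (\hat\mu, \hat\alpha_1, \hat\alpha_2, \hat\alpha_3, \hat\beta_1, \hat\beta_2)
\begin{pmatrix} y_{..} \\ y_{1.} \\ \vdots \\ y_{.2} \end{pmatrix} \\
&= \hat\mu y_{..} + \hat\alpha_1 y_{1.} + \hat\alpha_2 y_{2.} + \hat\alpha_3 y_{3.} + \hat\beta_1 y_{.1} + \hat\beta_2 y_{.2} \\
&= \bar{y}_{..} y_{..} + \sum_{i=1}^{3} (\bar{y}_{i.} - \bar{y}_{..}) y_{i.} + \sum_{j=1}^{2} (\bar{y}_{.j} - \bar{y}_{..}) y_{.j} \tag{9.31}\\
&= \frac{y_{..}^2}{6} + \sum_{i=1}^{3} \left(\frac{y_{i.}}{2} - \frac{y_{..}}{6}\right) y_{i.} + \sum_{j=1}^{2} \left(\frac{y_{.j}}{3} - \frac{y_{..}}{6}\right) y_{.j} \\
&= \frac{y_{..}^2}{6} + \left(\sum_{i=1}^{3} \frac{y_{i.}^2}{2} - \frac{y_{..}^2}{6}\right) + \left(\sum_{j=1}^{2} \frac{y_{.j}^2}{3} - \frac{y_{..}^2}{6}\right)
\end{align*}
since $\sum_i y_{i.} = y_{..}$ and $\sum_j y_{.j} = y_{..}$. The error sum of squares SSE is given by
\[
\mathbf{y}'\mathbf{y} - \hat{\boldsymbol{\beta}}'\mathbf{X}'\mathbf{y}
= \sum_{ij} y_{ij}^2 - \frac{y_{..}^2}{6} - \left(\sum_{i=1}^{3} \frac{y_{i.}^2}{2} - \frac{y_{..}^2}{6}\right) - \left(\sum_{j=1}^{2} \frac{y_{.j}^2}{3} - \frac{y_{..}^2}{6}\right).
\]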
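To obtain $\hat{\boldsymbol{\beta}}_2'\mathbf{X}_2'\mathbf{y}$ in Table 9.3, use the reduced model $y_{ij} = \mu + \alpha + \beta_j + \varepsilon_{ij} = \mu + \beta_j + \varepsilon_{ij}$, where $\alpha_1 = \alpha_2 = \alpha_3 = \alpha$ and $\mu + \alpha$ is replaced by $\mu$. The normal equations $\mathbf{X}_2'\mathbf{X}_2\hat{\boldsymbol{\beta}}_2 = \mathbf{X}_2'\mathbf{y}$ for the reduced model are
\begin{align*}
6\hat\mu + 3\hat\beta_1 + 3\hat\beta_2 &= y_{..} \\
3\hat\mu + 3\hat\beta_1 &= y_{.1} \tag{9.32}\\
3\hat\mu + 3\hat\beta_2 &= y_{.2}
\end{align*}

Using the side condition $\hat\beta_1 + \hat\beta_2 = 0$, the solution to the reduced normal equations in (9.32) is easily obtained as
\[ \hat\mu = \bar{y}_{..}, \quad \hat\beta_1 = \bar{y}_{.1} - \bar{y}_{..}, \quad \hat\beta_2 = \bar{y}_{.2} - \bar{y}_{..} \tag{9.33} \]

By (9.32) & (9.33), we have
\[
SS(\mu, \beta) = \hat{\boldsymbol{\beta}}_2'\mathbf{X}_2'\mathbf{y} = \hat\mu y_{..} + \hat\beta_1 y_{.1} + \hat\beta_2 y_{.2}
= \frac{y_{..}^2}{6} + \left(\sum_{j=1}^{2} \frac{y_{.j}^2}{3} - \frac{y_{..}^2}{6}\right) \tag{9.34}
\]

Denoting $SS(\alpha_1, \alpha_2, \alpha_3 \mid \mu, \beta_1, \beta_2)$ by $SS(\alpha \mid \mu, \beta)$, we have
\[
SS(\alpha \mid \mu, \beta) = \hat{\boldsymbol{\beta}}'\mathbf{X}'\mathbf{y} - \hat{\boldsymbol{\beta}}_2'\mathbf{X}_2'\mathbf{y} = \sum_i \frac{y_{i.}^2}{2} - \frac{y_{..}^2}{6}. \tag{9.35}
\]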

The test is summarized in Table 9.4.

Table 9.4 Analysis of Variance for Testing $H_0: \alpha_1 = \alpha_2 = \alpha_3$

Source of Variation                          d.f.   Sum of Squares                                                             F-Statistic
Due to $\alpha$ adjusted for $\mu, \beta$    2      $SS(\alpha \mid \mu, \beta) = \sum_i y_{i.}^2/2 - y_{..}^2/6$              $\dfrac{SS(\alpha \mid \mu, \beta)/2}{SSE/2}$
Error                                        2      $SSE = \sum_{ij} y_{ij}^2 - \hat{\boldsymbol{\beta}}'\mathbf{X}'\mathbf{y}$
Total                                        5      $SST = \sum_{ij} y_{ij}^2 - y_{..}^2/6$
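As an illustration of Table 9.4, the sums of squares in (9.31) to (9.35) can be evaluated for a small data set. The $3 \times 2$ table of responses below is hypothetical (it does not come from the notes), and all variable names are illustrative.

```python
from fractions import Fraction

# Hypothetical 3 x 2 layout: y[i][j] is the response at level i of alpha, level j of beta.
y = [[10, 12], [14, 15], [9, 13]]

total = sum(sum(row) for row in y)             # y..
row_totals = [sum(row) for row in y]           # y_i.
col_totals = [sum(col) for col in zip(*y)]     # y_.j
correction = Fraction(total ** 2, 6)           # y..^2 / 6
sum_sq = sum(v ** 2 for row in y for v in row) # sum of y_ij^2

ss_alpha = sum(Fraction(t ** 2, 2) for t in row_totals) - correction  # (9.35)
ss_beta = sum(Fraction(t ** 2, 3) for t in col_totals) - correction
ss_full = correction + ss_alpha + ss_beta                              # (9.31)
sse = sum_sq - ss_full                                                 # error SS
sst = sum_sq - correction                                              # total SS

# F statistic from Table 9.4, with 2 and 2 degrees of freedom
f_stat = (ss_alpha / 2) / (sse / 2)
print(ss_alpha, sse, sst, f_stat)
```

With only six observations the error term has 2 degrees of freedom, so this is a check of the arithmetic rather than a realistic analysis.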

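Example 9.2. Consider the model $y_{ij} = \mu + \tau_i + \varepsilon_{ij}$; $i = 1, 2$, $j = 1, 2, 3$, with the matrix $\mathbf{X}$ and the vector $\boldsymbol{\beta}$ given by
\[
\mathbf{X} = \begin{pmatrix} 1 & 1 & 0\\ 1 & 1 & 0\\ 1 & 1 & 0\\ 1 & 0 & 1\\ 1 & 0 & 1\\ 1 & 0 & 1 \end{pmatrix}, \qquad
\boldsymbol{\beta} = \begin{pmatrix} \mu \\ \tau_1 \\ \tau_2 \end{pmatrix}.
\]
We obtain
\[
\mathbf{X}'\mathbf{X} = \begin{pmatrix} 6 & 3 & 3\\ 3 & 3 & 0\\ 3 & 0 & 3 \end{pmatrix}.
\]

A generalized inverse of $\mathbf{X}'\mathbf{X}$ is given by
\[
(\mathbf{X}'\mathbf{X})^{-} = \begin{pmatrix} 0 & 0 & 0\\ 0 & \tfrac{1}{3} & 0\\ 0 & 0 & \tfrac{1}{3} \end{pmatrix}.
\]

The vector $\mathbf{X}'\mathbf{y}$ is given by
\[
\mathbf{X}'\mathbf{y} = \begin{pmatrix} 1 & 1 & 1 & 1 & 1 & 1\\ 1 & 1 & 1 & 0 & 0 & 0\\ 0 & 0 & 0 & 1 & 1 & 1 \end{pmatrix}
\begin{pmatrix} y_{11} \\ y_{12} \\ y_{13} \\ y_{21} \\ y_{22} \\ y_{23} \end{pmatrix}
= \begin{pmatrix} y_{..} \\ y_{1.} \\ y_{2.} \end{pmatrix},
\]
where $y_{..} = \sum_{i=1}^{2}\sum_{j=1}^{3} y_{ij}$ and $y_{i.} = \sum_{j=1}^{3} y_{ij}$. Then
\[
\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-}\mathbf{X}'\mathbf{y}
= \begin{pmatrix} 0 & 0 & 0\\ 0 & \tfrac{1}{3} & 0\\ 0 & 0 & \tfrac{1}{3} \end{pmatrix}
\begin{pmatrix} y_{..} \\ y_{1.} \\ y_{2.} \end{pmatrix}
= \begin{pmatrix} 0 \\ \bar{y}_{1.} \\ \bar{y}_{2.} \end{pmatrix},
\]
where $\bar{y}_{i.} = \tfrac{1}{3}\sum_{j=1}^{3} y_{ij} = \tfrac{1}{3} y_{i.}$.

To find $E(\hat{\boldsymbol{\beta}})$, we need $E(\bar{y}_{i.})$. Since $E(\boldsymbol{\varepsilon}) = \mathbf{0}$, we have $E(\varepsilon_{ij}) = 0$. Then,
\begin{align*}
E(\bar{y}_{i.}) &= E\left(\sum_{j=1}^{3} \frac{y_{ij}}{3}\right) = \frac{1}{3}\sum_{j=1}^{3} E(y_{ij}) \\
&= \frac{1}{3}\sum_{j=1}^{3} E(\mu + \tau_i + \varepsilon_{ij}) = \frac{1}{3}(3\mu + 3\tau_i + 0) \\
&= \mu + \tau_i .
\end{align*}

Thus,
\[
E(\hat{\boldsymbol{\beta}}) = \begin{pmatrix} 0 \\ \mu + \tau_1 \\ \mu + \tau_2 \end{pmatrix}.
\]
The same result is obtained using
\[
E(\hat{\boldsymbol{\beta}}) = (\mathbf{X}'\mathbf{X})^{-}\mathbf{X}'\mathbf{X}\boldsymbol{\beta}
= \begin{pmatrix} 0 & 0 & 0\\ 0 & \tfrac{1}{3} & 0\\ 0 & 0 & \tfrac{1}{3} \end{pmatrix}
\begin{pmatrix} 6 & 3 & 3\\ 3 & 3 & 0\\ 3 & 0 & 3 \end{pmatrix}
\begin{pmatrix} \mu \\ \tau_1 \\ \tau_2 \end{pmatrix}
= \begin{pmatrix} 0 \\ \mu + \tau_1 \\ \mu + \tau_2 \end{pmatrix}.
\]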

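Note:
Theorem: Suppose $\mathbf{A}$ is $n \times p$ of rank $r$ and that $\mathbf{A}$ is partitioned as
\[ \mathbf{A} = \begin{pmatrix} \mathbf{A}_{11} & \mathbf{A}_{12} \\ \mathbf{A}_{21} & \mathbf{A}_{22} \end{pmatrix} \]
where $\mathbf{A}_{11}$ is $r \times r$ of rank $r$. Then a generalized inverse of $\mathbf{A}$ is given by
\[ \mathbf{A}^{-} = \begin{pmatrix} \mathbf{A}_{11}^{-1} & \mathbf{O} \\ \mathbf{O} & \mathbf{O} \end{pmatrix}, \]
where the three $\mathbf{O}$ matrices are of appropriate sizes so that $\mathbf{A}^{-}$ is $p \times n$.

Corollary: Suppose $\mathbf{A}$ is $n \times p$ of rank $r$ and that $\mathbf{A}$ is partitioned as above, where $\mathbf{A}_{22}$ is $r \times r$ of rank $r$. Then, a generalized inverse of $\mathbf{A}$ is given by
\[ \mathbf{A}^{-} = \begin{pmatrix} \mathbf{O} & \mathbf{O} \\ \mathbf{O} & \mathbf{A}_{22}^{-1} \end{pmatrix} \]
where the three $\mathbf{O}$ matrices are of appropriate sizes so that $\mathbf{A}^{-}$ is $p \times n$.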
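The generalized inverse used in Example 9.2 has the form given by the corollary, with $\mathbf{A}_{22} = \mathrm{diag}(3, 3)$. The sketch below checks the defining property $\mathbf{A}\mathbf{A}^{-}\mathbf{A} = \mathbf{A}$ for $\mathbf{A} = \mathbf{X}'\mathbf{X}$ and reproduces the coefficient matrix $(\mathbf{X}'\mathbf{X})^{-}\mathbf{X}'\mathbf{X}$ behind $E(\hat{\boldsymbol{\beta}})$; the helper `matmul` is an illustrative name.

```python
from fractions import Fraction

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

XtX = [[6, 3, 3], [3, 3, 0], [3, 0, 3]]
third = Fraction(1, 3)

# Generalized inverse from the corollary: zero block, then the inverse of A22 = diag(3, 3).
G = [[0, 0, 0], [0, third, 0], [0, 0, third]]

# Defining property of a generalized inverse: A G A = A.
is_g_inverse = matmul(matmul(XtX, G), XtX) == XtX

# E(beta_hat) = G X'X beta, so the rows of H give the expected value of each element.
H = matmul(G, XtX)
print(is_g_inverse, H)
```

The rows of `H` read as $0$, $\mu + \tau_1$ and $\mu + \tau_2$, in agreement with Example 9.2.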

Example 9.3(a): ESTIMATION IN THE LESS THAN FULL RANK MODEL
The SAS procedure PROC GLM can be used to find conditional inverses and to estimate $\sigma^2$ and $\boldsymbol{\beta}$ in the less than full rank model. The information gained can then be used to generate point or interval estimates for the estimable functions $\mathbf{t}'\boldsymbol{\beta}$.
It is known that a toxic material was dumped in a river that flows into a large salt water commercial fishing area. Civil engineers are interested in the amount of toxic material, in parts per million, found in oysters harvested at three different locations ranging from the estuary to the bay itself. The data are as follows:

Site 1 (river)   Site 2 (bay)   Site 3 (estuary)
15               19             22
26               15             26
20               20             24
20               10             26
29               26             15
28               11             17
21               13             24
26               15
                 18
SAS CODE:
data toxic;
input y x1 x2 x3;
cards;
15 1 0 0
26 1 0 0
20 1 0 0
20 1 0 0
29 1 0 0
28 1 0 0
21 1 0 0
26 1 0 0
19 0 1 0
15 0 1 0
10 0 1 0
26 0 1 0
11 0 1 0
20 0 1 0
13 0 1 0
15 0 1 0
18 0 1 0
22 0 0 1
26 0 0 1
24 0 0 1
26 0 0 1
15 0 0 1
17 0 0 1
24 0 0 1
;
proc glm data=toxic;
model y=x1 x2 x3/xpx i;

title1 Finding a Conditional;
title2 Inverse and Estimating;
title3 The Variance in the Less;
title4 Than Full Rank Model;

/* 't1-t2' is a label for the contrast tau1 - tau2 */
/* x1 1 x2 -1 forms the vector t' = (0 1 -1 0), so that t'beta = tau1 - tau2 */
estimate 't1-t2' x1 1 x2 -1;

/* 't1' is a label that says we are trying to estimate tau1 */
/* x1 1 forms the vector t' = (0 1 0 0) used to express tau1 in the form t'beta; */
/* tau1 alone is not estimable, so the output below shows no estimate for it */
estimate 't1' x1 1;
run;
OUTPUT:
Finding a Conditional
Inverse and Estimating
The Variance in the Less
Than Full Rank Model

The GLM Procedure

Number of Observations Read 24
Number of Observations Used 24

The GLM Procedure

The X'X Matrix

Intercept x1 x2 x3 y

Intercept 24 8 9 7 486
x1 8 8 0 0 185
x2 9 0 9 0 147
x3 7 0 0 7 154
y 486 185 147 154 10546
The GLM Procedure

X'X Generalized Inverse (g2)

Intercept x1 x2 x3 y

Intercept 0.1428571429 ‐0.142857143 ‐0.142857143 0 22
x1 ‐0.142857143 0.2678571429 0.1428571429 0 1.125
x2 ‐0.142857143 0.1428571429 0.253968254 0 ‐5.666666667
x3 0 0 0 0 0
y 22 1.125 ‐5.666666667 0 478.875

The GLM Procedure

Dependent Variable: y

Sum of
Source DF Squares Mean Square F Value Pr > F

Model 2 225.6250000 112.8125000 4.95 0.0174

Error 21 478.8750000 22.8035714 (1)

Corrected Total 23 704.5000000

R‐Square Coeff Var Root MSE y Mean

0.320263 23.58177 4.775309 20.25000

Source DF Type I SS Mean Square F Value Pr > F

x1 1 99.1875000 99.1875000 4.35 0.0494
x2 1 126.4375000 126.4375000 5.54 0.0283
x3 0 0.0000000 . . .

Source DF Type III SS Mean Square F Value Pr > F

x1 0 0 . . .
x2 0 0 . . .
x3 0 0 . . .

Standard
Parameter Estimate Error t Value Pr > |t|

t1‐t2 6.79166667(2) 2.32038285(3) 2.93 0.0081

Standard
Parameter Estimate Error t Value Pr > |t|

Intercept 22.00000000 B 1.80489697 12.19 <.0001
x1 1.12500000 B 2.47145696 0.46 0.6536
x2 ‐5.66666667 B 2.40652929 ‐2.35 0.0283
x3 0.00000000 B . . .
(4)

The GLM Procedure


NOTE: The X'X matrix has been found to be singular, and a generalized inverse was used to solve
the normal equations. Terms whose estimates are followed by the letter 'B' are not
uniquely estimable.
(1) The estimated variance $s^2$. (2) The estimated difference between $\tau_1$ and $\tau_2$; its standard error $s\sqrt{\mathbf{t}'(\mathbf{X}'\mathbf{X})^{c}\mathbf{t}}$ is given at (3). Estimates for $\mu$, $\tau_1$, $\tau_2$ and $\tau_3$ are given in (4). These estimates are not unique. They are based on the conditional inverse found.
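The quantities marked (1), (2) and (3) can be reproduced from the group means alone, because $\tau_1 - \tau_2$ is estimable and its estimate does not depend on the conditional inverse chosen. The following is a sketch of that arithmetic using the data of this example; the variable names are illustrative.

```python
import math

# Toxic material (ppm) by site, as entered in the SAS data step above.
sites = {
    1: [15, 26, 20, 20, 29, 28, 21, 26],
    2: [19, 15, 10, 26, 11, 20, 13, 15, 18],
    3: [22, 26, 24, 26, 15, 17, 24],
}

means = {k: sum(v) / len(v) for k, v in sites.items()}
n = sum(len(v) for v in sites.values())

# Residual sum of squares and s^2 = SSE / (n - rank(X)), with rank(X) = 3 here.
sse = sum((y - means[k]) ** 2 for k, v in sites.items() for y in v)
mse = sse / (n - len(sites))

# Estimate of tau1 - tau2 and its standard error s * sqrt(1/n1 + 1/n2).
diff = means[1] - means[2]
se = math.sqrt(mse * (1 / len(sites[1]) + 1 / len(sites[2])))

print(round(mse, 7), round(diff, 8), round(se, 8), round(diff / se, 2))
```

The printed values can be compared with the Error mean square, the `t1-t2` estimate, its standard error and its t value in the PROC GLM output.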
HYPOTHESIS TESTING IN THE LESS THAN FULL RANK MODEL
Example 9.3(b): Three different treatment methods for removing organic carbon from tar sand wastewater are to be compared. The methods are air flotation (AF), foam separation (FS), and ferric-chloride coagulation (FCC). The following data are obtained:
AF (I) FS (II) FCC (III)

34.6 38.8 26.7
35.1 39.0 26.7
35.3 40.1 27.0
35.8 40.9 27.1
36.1 41.0 27.5
36.5 43.2 28.1
36.8 44.9 28.1
37.2 46.9 28.7
37.4 51.6 30.7
37.7 53.6 31.2

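Assume the one-way classification model with $n_1 = n_2 = n_3 = 10$ and $N = 30$. We want to test
\[ H_0: \tau_1 = \tau_2 = \tau_3 \]
In matrix form, we are testing
\[ H_0: \mathbf{C}\boldsymbol{\beta} = \mathbf{0} \]
where
\[
\mathbf{C} = \begin{pmatrix} 0 & 1 & -1 & 0 \\ 0 & 1 & 0 & -1 \end{pmatrix}
\quad\text{and}\quad
\boldsymbol{\beta} = \begin{pmatrix} \mu \\ \tau_1 \\ \tau_2 \\ \tau_3 \end{pmatrix}.
\]
The F-statistic used to test $H_0$ from (8.21) is:
\[
F = \frac{(\mathbf{C}\hat{\boldsymbol{\beta}})'\left[\mathbf{C}(\mathbf{X}'\mathbf{X})^{c}\mathbf{C}'\right]^{-1}\mathbf{C}\hat{\boldsymbol{\beta}}/q}{SSE/(n-k-1)}
= \frac{SSH/q}{SSE/(n-k-1)}
\]
For this model, $\mathbf{X}$ is $30 \times 4$, with ten identical rows for each treatment:
\[
\mathbf{X} = \begin{pmatrix}
1 & 1 & 0 & 0\\
\vdots & \vdots & \vdots & \vdots\\
1 & 1 & 0 & 0\\
1 & 0 & 1 & 0\\
\vdots & \vdots & \vdots & \vdots\\
1 & 0 & 1 & 0\\
1 & 0 & 0 & 1\\
\vdots & \vdots & \vdots & \vdots\\
1 & 0 & 0 & 1
\end{pmatrix}, \qquad
\mathbf{X}'\mathbf{X} = \begin{pmatrix}
30 & 10 & 10 & 10\\
10 & 10 & 0 & 0\\
10 & 0 & 10 & 0\\
10 & 0 & 0 & 10
\end{pmatrix}.
\]
A conditional inverse for $\mathbf{X}'\mathbf{X}$ is
\[
(\mathbf{X}'\mathbf{X})^{c} = \begin{pmatrix}
0 & 0 & 0 & 0\\
0 & \tfrac{1}{10} & 0 & 0\\
0 & 0 & \tfrac{1}{10} & 0\\
0 & 0 & 0 & \tfrac{1}{10}
\end{pmatrix}.
\]
Using this conditional inverse,
\[
\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{c}\mathbf{X}'\mathbf{y} = \begin{pmatrix} 0 \\ 36.25 \\ 44.0 \\ 28.18 \end{pmatrix}, \qquad
\mathbf{C}(\mathbf{X}'\mathbf{X})^{c}\mathbf{C}' = \begin{pmatrix} 0.2 & 0.1 \\ 0.1 & 0.2 \end{pmatrix},
\]
\[
\left[\mathbf{C}(\mathbf{X}'\mathbf{X})^{c}\mathbf{C}'\right]^{-1} = \frac{1}{0.03}\begin{pmatrix} 0.2 & -0.1 \\ -0.1 & 0.2 \end{pmatrix}, \qquad
\mathbf{C}\hat{\boldsymbol{\beta}} = \begin{pmatrix} -7.75 \\ 8.07 \end{pmatrix}.
\]
The numerator of the F ratio used to test $H_0$ is
\[
\frac{(\mathbf{C}\hat{\boldsymbol{\beta}})'\left[\mathbf{C}(\mathbf{X}'\mathbf{X})^{c}\mathbf{C}'\right]^{-1}\mathbf{C}\hat{\boldsymbol{\beta}}}{2} = \frac{1251.533}{2} = 625.766 .
\]
The residual sum of squares for these data can be shown to be 278.661. Thus, the F ratio for testing $H_0: \tau_1 = \tau_2 = \tau_3$ is
\[ F_{2,27} = \frac{625.766}{278.661/27} = 60.63 \]
This $F_{2,27}$ value exceeds $F_{0.05,2,27} = 3.354$, so we have enough evidence to reject the null hypothesis. We conclude that the three treatment methods give different results in removing organic carbon from tar sand wastewater.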
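The hand computation above can be repeated numerically. The sketch below uses the data of this example and the conditional inverse $\mathrm{diag}(0, 1/10, 1/10, 1/10)$, so that $\mathbf{C}\hat{\boldsymbol{\beta}}$ reduces to differences of treatment means; the variable names are illustrative.

```python
# Organic carbon data for the three treatment methods.
groups = {
    "AF": [34.6, 35.1, 35.3, 35.8, 36.1, 36.5, 36.8, 37.2, 37.4, 37.7],
    "FS": [38.8, 39.0, 40.1, 40.9, 41.0, 43.2, 44.9, 46.9, 51.6, 53.6],
    "FCC": [26.7, 26.7, 27.0, 27.1, 27.5, 28.1, 28.1, 28.7, 30.7, 31.2],
}

means = {k: sum(v) / len(v) for k, v in groups.items()}
sse = sum((y - means[k]) ** 2 for k, v in groups.items() for y in v)

# C beta_hat = (tau1_hat - tau2_hat, tau1_hat - tau3_hat)
d = [means["AF"] - means["FS"], means["AF"] - means["FCC"]]

# C (X'X)^c C' and its inverse (2 x 2, inverted by the adjugate formula).
m = [[0.2, 0.1], [0.1, 0.2]]
det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
inv = [[m[1][1] / det, -m[0][1] / det], [-m[1][0] / det, m[0][0] / det]]

# SSH = (C beta_hat)' [C (X'X)^c C']^{-1} (C beta_hat)
ssh = sum(d[i] * inv[i][j] * d[j] for i in range(2) for j in range(2))

q, error_df = 2, 30 - 3
f_stat = (ssh / q) / (sse / error_df)
print(round(ssh, 3), round(sse, 3), round(f_stat, 2))
```

The three printed numbers correspond to the hypothesis sum of squares, the residual sum of squares and the F ratio quoted in the text.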
PROC GLM is used to test hypotheses in the less than full rank model.
SAS CODE:
data tar;
input y x1 x2 x3;
cards;
34.6 1 0 0
35.1 1 0 0
35.3 1 0 0
35.8 1 0 0
36.1 1 0 0
36.5 1 0 0
36.8 1 0 0
37.2 1 0 0
37.4 1 0 0
37.7 1 0 0
38.8 0 1 0
39.0 0 1 0
40.1 0 1 0
40.9 0 1 0
41.0 0 1 0
43.2 0 1 0
44.9 0 1 0
46.9 0 1 0
51.6 0 1 0
53.6 0 1 0
26.7 0 0 1
26.7 0 0 1
27.0 0 0 1
27.1 0 0 1
27.5 0 0 1
28.1 0 0 1
28.1 0 0 1
28.7 0 0 1
30.7 0 0 1
31.2 0 0 1
;
proc glm data=tar; /* asks for the general linear models procedure */
model y=x1 x2 x3; /* identifies x1, x2, x3 as the independent variables and y as the */
/* response variable */

contrast 'equal means' x1 1 x2 -1 x3 0, /* asks GLM to test H0: C*beta = 0; the values */
x1 1 x2 0 x3 -1; /* listed after the variable names are the coefficients of */
/* x1, x2, x3, i.e. the last three columns of the matrix C */
run;
OUTPUT:
The GLM Procedure



Sum of
Source DF Squares Mean Square F Value Pr > F

Model 2 1251.532667 625.766333 60.63 <.0001

Error 27 278.661000(B) 10.320778(C)


R‐Square Coeff Var Root MSE y Mean

0.817892 8.888490 3.212597 36.14333


Source DF Type I SS Mean Square F Value Pr > F

x1 1 0.170667 0.170667 0.02 0.8986
x2 1 1251.362000 1251.362000 121.25 <.0001
x3 0 0.000000 . . .


Source DF Type III SS Mean Square F Value Pr > F

x1 0 0 . . .
x2 0 0 . . .
x3 0 0 . . .

Contrast DF Contrast SS Mean Square F Value Pr > F

equal means 2 1251.532667(A) 625.766333 60.63(D) <.0001

Standard
Parameter Estimate Error t Value Pr > |t|

Intercept 28.18000000 B 1.01591229 27.74 <.0001
x1 8.07000000 B 1.43671694 5.62 <.0001
x2 15.82000000 B 1.43671694 11.01 <.0001
x3 0.00000000 B . . .
NOTE: The X'X matrix has been found to be singular, and a generalized inverse was used to solve
the normal equations. Terms whose estimates are followed by the letter 'B' are not
uniquely estimable
From the output, the SAS NOTE indicates that the parameter estimates are not unique in the less than full rank model. The estimates found are based on a conditional inverse. The sum of squares associated with the hypothesis $H_0: \mathbf{C}\boldsymbol{\beta} = \mathbf{0}$ is
\[ (\mathbf{C}\hat{\boldsymbol{\beta}})'\left[\mathbf{C}(\mathbf{X}'\mathbf{X})^{c}\mathbf{C}'\right]^{-1}\mathbf{C}\hat{\boldsymbol{\beta}} = 1251.533 . \]
This sum of squares is shown by (A). The residual sum of squares and $s^2$ are given at (B) and (C), respectively. The F ratio used to test $H_0: \mathbf{C}\boldsymbol{\beta} = \mathbf{0}$ is given by (D). Note that any testable hypothesis can be tested via an appropriately chosen CONTRAST statement. The parameter estimates (Intercept, x1, x2 and x3) shown in the output differ from those calculated earlier because these estimates are not unique.

Example 9.3(c): A one‐way ANOVA with fixed effects based on the reparameterized model can be run easily
on SAS using PROC GLM or PROC ANOVA.
SAS CODE:
data tar;
input method $ remove;
cards;
AF 34.6
AF 35.1
AF 35.3
AF 35.8
AF 36.1
AF 36.5
AF 36.8
AF 37.2
AF 37.4
AF 37.7
FS 38.8
FS 39.0
FS 40.1
FS 40.9
FS 41.0
FS 43.2
FS 44.9
FS 46.9
FS 51.6
FS 53.6
FCC 26.7
FCC 26.7
FCC 27.0
FCC 27.1
FCC 27.5
FCC 28.1
FCC 28.1
FCC 28.7
FCC 30.7
FCC 31.2
;
proc glm;
class method; /* indicates that data are grouped according to the values
of the variable METHOD*/

model remove=method; /*identifies the variable REMOVE as the response variable*/

title coal‐tar data; /*titles the output*/
run;

OUTPUT:
coal‐tar data

The GLM Procedure

Class Level Information

Class Levels Values

method 3 AF FCC FS


The GLM Procedure

Dependent Variable: remove

Sum of
Source DF Squares Mean Square F Value Pr > F

Model 2 1251.532667(B) 625.766333 60.63(A) <.0001

Error 27 278.661000(C) 10.320778


R‐Square Coeff Var Root MSE remove Mean

0.817892 8.888490 3.212597 36.14333


Source DF Type I SS Mean Square F Value Pr > F

method 2 1251.532667 625.766333 60.63 <.0001


Source DF Type III SS Mean Square F Value Pr > F

method 2 1251.532667 625.766333 60.63 <.0001

From the output, the F ratio used to test $H_0: \mu_1 = \mu_2 = \mu_3$ is shown by (A). The ANOVA table is based on the corrected total sum of squares. The model sum of squares (SS of regression) is shown by (B) and the error sum of squares is shown by (C).
Example 9.3(b)
Using the CONTRAST and ESTIMATE Statements with Unbalanced Data

Consider the toxic data in Example 9.3(a). Now we want to test the significance of the difference between
toxic means with the CONTRAST statement.

data toxic;
input site toxic;
cards;
1 15
1 26
1 20
1 20
1 29
1 28
1 21
1 26
2 19
2 15
2 20
2 10
2 26
2 11
2 13
2 15
2 18
3 22
3 26
3 24
3 26
3 15
3 17
3 24
;
proc glm data=toxic;
class site;
model toxic=site;
contrast 'site_1‐site_2' site 1 -1 0;
estimate 'site_1‐site_2' site 1 -1 0;
contrast 'site_1‐site_3' site 1 0 -1;
estimate 'site_1‐site_3' site 1 0 -1;
contrast 'site_2‐site_3' site 0 1 -1;
estimate 'site_2 ‐site_3' site 0 1 -1;

run;

OUTPUT:

The GLM Procedure

Class Level Information

Class Levels Values

site 3 1 2 3


The GLM Procedure

Dependent Variable: toxic

Sum of
Source DF Squares Mean Square F Value Pr > F

Model 2 225.6250000 112.8125000 4.95 0.0174

Error 21 478.8750000 22.8035714


R‐Square Coeff Var Root MSE toxic Mean

0.320263 23.58177 4.775309 20.25000


Source DF Type I SS Mean Square F Value Pr > F

site 2 225.6250000 112.8125000 4.95 0.0174


Source DF Type III SS Mean Square F Value Pr > F

site 2 225.6250000 112.8125000 4.95 0.0174

Contrast DF Contrast SS Mean Square F Value Pr > F

site_1‐site_2 1 195.3602941 195.3602941 8.57 0.0081
site_1‐site_3 1 4.7250000 4.7250000 0.21 0.6536
site_2‐site_3 1 126.4375000 126.4375000 5.54 0.0283

Standard
Parameter Estimate Error t Value Pr > |t|

site_1‐site_2 6.79166667 2.32038285 2.93 0.0081
site_1‐site_3 1.12500000 2.47145696 0.46 0.6536
site_2 ‐site_3 ‐5.66666667 2.40652929 ‐2.35 0.0283

NOTE: The CONTRAST statement produces the same sum of squares (SS), mean square, F-test and p-value for the difference between means as the Type III ANOVA F-test (test of $H_0: \mu_A = \mu_B$).
From the output:
1. The estimated difference between $\mu_1$ and $\mu_2$ is 6.7917, and the standard error of the estimate is 2.32.

2. The estimated difference between $\mu_1$ and $\mu_3$ is 1.125, and the standard error of the estimate is 2.47.

3. The estimated difference between $\mu_2$ and $\mu_3$ is -5.67, and the standard error of the estimate is 2.41.

4. A t-statistic for testing $H_0: \mu_1 = \mu_2$ is $t = 6.7917/2.32 = 2.93$. The p-value for the t-statistic is 0.0081.

5. A t-statistic for testing $H_0: \mu_1 = \mu_3$ is $t = 1.125/2.471 = 0.46$. The p-value for the t-statistic is 0.6536.

6. A t-statistic for testing $H_0: \mu_2 = \mu_3$ is $t = -5.666/2.4065 = -2.35$. The p-value for the t-statistic is 0.0283.

7. We decide whether or not to reject each null hypothesis by referring to the p-value from the output.

8. From this example, we have enough evidence to reject the null hypothesis of equal means (since the p-value is less than 0.05) for site 1 versus site 2 and for site 2 versus site 3, whereas there is not enough evidence to reject the null hypothesis of equal means for site 1 versus site 3. We conclude that the mean amount of toxic material differs between site 1 and site 2 and between site 2 and site 3, while the data show no significant difference between site 1 and site 3.

9. To obtain a 95% confidence interval for the difference in means between two groups, use
\[ (\bar{y}_A - \bar{y}_B) \pm t_{0.025,21}\sqrt{MSE\left(\frac{1}{n_A} + \frac{1}{n_B}\right)} \]
From the output, MSE = 22.8035714, and from the statistical table, $t_{0.025,21} = 2.080$. Therefore:
\begin{align*}
\mu_1 - \mu_2 &: \; 6.79 \pm 2.080\sqrt{22.803\left(\tfrac{1}{8} + \tfrac{1}{9}\right)}; \\
\mu_1 - \mu_3 &: \; 1.125 \pm 2.080\sqrt{22.803\left(\tfrac{1}{8} + \tfrac{1}{7}\right)}; \\
\mu_2 - \mu_3 &: \; -5.666 \pm 2.080\sqrt{22.803\left(\tfrac{1}{9} + \tfrac{1}{7}\right)}.
\end{align*}
ANOVA
The aim is to compare the treatment means for some response $y$ after applying several treatments to randomly selected experimental units.
The model often has more parameters than can be estimated, which results in an $\mathbf{X}$ matrix that is not of full rank (a non-full-rank model).
We may also end up with balanced or unbalanced models.
Non-Full-Rank Models
(a) One-Way Model
Model: $y_1 = \mu + \tau_1 + \varepsilon_1$; $y_2 = \mu + \tau_2 + \varepsilon_2$
The model represents the effect of two additives ($\tau_1$ and $\tau_2$) on the mileages $y_1$ and $y_2$. If additive 1 is added, the mileage is expected to increase by $\tau_1$ miles per gallon, and the mileage would increase by $\tau_2$ if additive 2 were added. $\varepsilon_i$ is a random error term.
Suppose the experiment consists of filling the tanks of six identical cars with gas, then adding chemical 1 to three tanks and chemical 2 to the other three tanks. Thus, a model for each of the six observations is:
\begin{align*}
y_{11} &= \mu + \tau_1 + \varepsilon_{11}, & y_{12} &= \mu + \tau_1 + \varepsilon_{12}, & y_{13} &= \mu + \tau_1 + \varepsilon_{13}, \\
y_{21} &= \mu + \tau_2 + \varepsilon_{21}, & y_{22} &= \mu + \tau_2 + \varepsilon_{22}, & y_{23} &= \mu + \tau_2 + \varepsilon_{23},
\end{align*}
or
\[ y_{ij} = \mu + \tau_i + \varepsilon_{ij}, \quad i = 1, 2, \; j = 1, 2, 3 \]
where $y_{ij}$ is the observed miles per gallon of the $j$th car that contains the $i$th chemical in its tank and $\varepsilon_{ij}$ is the associated random error.
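In matrix form this is written as
\[
\begin{pmatrix} y_{11} \\ y_{12} \\ y_{13} \\ y_{21} \\ y_{22} \\ y_{23} \end{pmatrix}
=
\begin{pmatrix} 1 & 1 & 0\\ 1 & 1 & 0\\ 1 & 1 & 0\\ 1 & 0 & 1\\ 1 & 0 & 1\\ 1 & 0 & 1 \end{pmatrix}
\begin{pmatrix} \mu \\ \tau_1 \\ \tau_2 \end{pmatrix}
+
\begin{pmatrix} \varepsilon_{11} \\ \varepsilon_{12} \\ \varepsilon_{13} \\ \varepsilon_{21} \\ \varepsilon_{22} \\ \varepsilon_{23} \end{pmatrix}
\]
that is, $\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$, where the parameters to be estimated are $\mu, \tau_1, \tau_2$.
But we cannot estimate $\boldsymbol{\beta}$ by using $\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}$, because $(\mathbf{X}'\mathbf{X})^{-1}$ does not exist: $\mathbf{X}$ is not of full rank (rank $\mathbf{X}$ = 2).
We are dealing with an overparameterized model (we have 3 parameters and $\mathrm{rank}(\mathbf{X}) = 2$).
To solve this problem we can (i) reparameterize, (ii) use the overparameterized model with constraints on the parameters, or (iii) use linear combinations of the parameters (which must be unique too).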
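The lack of an inverse can be seen directly from the determinant of $\mathbf{X}'\mathbf{X}$. The sketch below compares the overparameterized design matrix $\mathbf{X}$ with the cell-means matrix $\mathbf{W}$ of option (i); the helpers `gram` and `det` are illustrative names.

```python
def gram(M):
    """Return M'M for a matrix M stored as a list of rows."""
    cols = list(zip(*M))
    return [[sum(a * b for a, b in zip(c1, c2)) for c2 in cols] for c1 in cols]

def det(M):
    """Determinant by cofactor expansion along the first row (fine for tiny matrices)."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))

# Overparameterized one-way model: columns mu, tau1, tau2.
X = [[1, 1, 0]] * 3 + [[1, 0, 1]] * 3
# Reparameterized (cell-means) model: columns mu1, mu2.
W = [[1, 0]] * 3 + [[0, 1]] * 3

det_XtX = det(gram(X))   # zero, so (X'X)^{-1} does not exist
det_WtW = det(gram(W))   # nonzero, so (W'W)^{-1} exists
print(gram(X), det_XtX, det_WtW)
```

The first column of $\mathbf{X}$ is the sum of the other two, which is why the determinant vanishes.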

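(i) Reparameterization (to reduce the number of parameters)
Example: If $\mu = 15$, $\tau_1 = 1$, $\tau_2 = 3$, then
\begin{align*}
y_{1j} &= 15 + 1 + \varepsilon_{1j} = 16 + \varepsilon_{1j}, \quad j = 1, 2, 3, \\
y_{2j} &= 15 + 3 + \varepsilon_{2j} = 18 + \varepsilon_{2j}, \quad j = 1, 2, 3,
\end{align*}
which can be written as $y_{1j} = \mu_1 + \varepsilon_{1j}$ and $y_{2j} = \mu_2 + \varepsilon_{2j}$, and in matrix form as
\[
\begin{pmatrix} y_{11} \\ y_{12} \\ y_{13} \\ y_{21} \\ y_{22} \\ y_{23} \end{pmatrix}
=
\begin{pmatrix} 1 & 0\\ 1 & 0\\ 1 & 0\\ 0 & 1\\ 0 & 1\\ 0 & 1 \end{pmatrix}
\begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}
+
\begin{pmatrix} \varepsilon_{11} \\ \varepsilon_{12} \\ \varepsilon_{13} \\ \varepsilon_{21} \\ \varepsilon_{22} \\ \varepsilon_{23} \end{pmatrix},
\qquad \mathbf{y} = \mathbf{W}\boldsymbol{\mu} + \boldsymbol{\varepsilon}.
\]
The matrix $\mathbf{W}$ is of full rank, and we can estimate $\boldsymbol{\mu}$ as
\[ \hat{\boldsymbol{\mu}} = \begin{pmatrix} \hat\mu_1 \\ \hat\mu_2 \end{pmatrix} = (\mathbf{W}'\mathbf{W})^{-1}\mathbf{W}'\mathbf{y}. \]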
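(ii) Impose the constraint $\tau_1^* + \tau_2^* = 0$ (side condition)
With this constraint, the example
\begin{align*}
y_{1j} &= 15 + 1 + \varepsilon_{1j} = 16 + \varepsilon_{1j}, \quad j = 1, 2, 3, \\
y_{2j} &= 15 + 3 + \varepsilon_{2j} = 18 + \varepsilon_{2j}, \quad j = 1, 2, 3,
\end{align*}
becomes
\begin{align*}
y_{1j} &= 17 - 1 + \varepsilon_{1j} = 16 + \varepsilon_{1j}, \quad j = 1, 2, 3, \\
y_{2j} &= 17 + 1 + \varepsilon_{2j} = 18 + \varepsilon_{2j}, \quad j = 1, 2, 3.
\end{align*}
Thus, the model $y_{ij} = \mu^* + \tau_i^* + \varepsilon_{ij}$ subject to $\tau_1^* + \tau_2^* = 0$ can be expressed in a full-rank format by substituting $\tau_2^* = -\tau_1^*$ to obtain $y_{1j} = \mu^* + \tau_1^* + \varepsilon_{1j}$ and $y_{2j} = \mu^* - \tau_1^* + \varepsilon_{2j}$. The matrix form for the six observations can then be written as:
\[
\begin{pmatrix} y_{11} \\ y_{12} \\ y_{13} \\ y_{21} \\ y_{22} \\ y_{23} \end{pmatrix}
=
\begin{pmatrix} 1 & 1\\ 1 & 1\\ 1 & 1\\ 1 & -1\\ 1 & -1\\ 1 & -1 \end{pmatrix}
\begin{pmatrix} \mu^* \\ \tau_1^* \end{pmatrix}
+
\begin{pmatrix} \varepsilon_{11} \\ \varepsilon_{12} \\ \varepsilon_{13} \\ \varepsilon_{21} \\ \varepsilon_{22} \\ \varepsilon_{23} \end{pmatrix},
\qquad \mathbf{y} = \mathbf{X}^*\boldsymbol{\beta}^* + \boldsymbol{\varepsilon}.
\]
Thus, the matrix $\mathbf{X}^*$ is of full rank, and the parameters $\mu^*$ and $\tau_1^*$ can be estimated.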
(iii) In the example
\begin{align*}
y_{1j} &= 15 + 1 + \varepsilon_{1j} = 16 + \varepsilon_{1j}, \quad j = 1, 2, 3, \\
y_{2j} &= 15 + 3 + \varepsilon_{2j} = 18 + \varepsilon_{2j}, \quad j = 1, 2, 3,
\end{align*}
there exist some linear combinations that are unique. For example, $\tau_1 - \tau_2 = -2$, $\mu + \tau_1 = 16$, and $\mu + \tau_2 = 18$ remain the same for all possible values of $\mu, \tau_1$ and $\tau_2$. Such unique linear combinations can be estimated.
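A quick way to see this is to list several parameter triples that produce the same cell means 16 and 18. In the sketch below, the first triple is the one used in the text and the third is the one obtained under the side condition; the second is an extra hypothetical choice added for illustration.

```python
# (mu, tau1, tau2) triples that all give E(y_1j) = 16 and E(y_2j) = 18.
candidates = [(15, 1, 3), (10, 6, 8), (17, -1, 1)]

# The individual parameters differ from triple to triple ...
distinct_mu = {mu for mu, _, _ in candidates}

# ... but the combinations tau1 - tau2, mu + tau1, mu + tau2 do not.
summary = {(t1 - t2, mu + t1, mu + t2) for mu, t1, t2 in candidates}
print(distinct_mu, summary)
```

The set `summary` collapses to a single element, while `distinct_mu` keeps three different values.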

(b) Two-Way ANOVA
Example: measuring the effect of two different vitamins and two different methods of administering the vitamins on the weight gain of chicks. This leads to a two-way model.
Let $\alpha_1$ and $\alpha_2$ be the effects of the two vitamins, and let $\beta_1$ and $\beta_2$ be the effects of the two methods of administration.
If we assume that these effects are additive (no interaction), the model becomes:
\begin{align*}
y_{11} &= \mu + \alpha_1 + \beta_1 + \varepsilon_{11}, & y_{12} &= \mu + \alpha_1 + \beta_2 + \varepsilon_{12}, \\
y_{21} &= \mu + \alpha_2 + \beta_1 + \varepsilon_{21}, & y_{22} &= \mu + \alpha_2 + \beta_2 + \varepsilon_{22},
\end{align*}
and can be written as
\[ y_{ij} = \mu + \alpha_i + \beta_j + \varepsilon_{ij}; \quad i = 1, 2, \; j = 1, 2, \]
where $y_{ij}$ is the weight gain of the $(ij)$th chick and $\varepsilon_{ij}$ is the associated random error.
In matrix form,
\[
\begin{pmatrix} y_{11} \\ y_{12} \\ y_{21} \\ y_{22} \end{pmatrix}
=
\begin{pmatrix} 1 & 1 & 0 & 1 & 0\\ 1 & 1 & 0 & 0 & 1\\ 1 & 0 & 1 & 1 & 0\\ 1 & 0 & 1 & 0 & 1 \end{pmatrix}
\begin{pmatrix} \mu \\ \alpha_1 \\ \alpha_2 \\ \beta_1 \\ \beta_2 \end{pmatrix}
+
\begin{pmatrix} \varepsilon_{11} \\ \varepsilon_{12} \\ \varepsilon_{21} \\ \varepsilon_{22} \end{pmatrix},
\qquad \mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}.
\]
Since $\mathrm{rank}(\mathbf{X}) = 3$, only three unique parameters are possible, unless side conditions are imposed on the five parameters.
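(i) Reparameterization (there are many ways to do this)
For example, consider the parameters $\gamma_1$, $\gamma_2$, and $\gamma_3$ defined as
\[ \gamma_1 = \mu + \alpha_1 + \beta_1, \quad \gamma_2 = \alpha_2 - \alpha_1, \quad \gamma_3 = \beta_2 - \beta_1 . \]
The model can then be written in terms of the $\gamma$'s as
\begin{align*}
y_{11} &= (\mu + \alpha_1 + \beta_1) + \varepsilon_{11} = \gamma_1 + \varepsilon_{11}, \\
y_{12} &= (\mu + \alpha_1 + \beta_1) + (\beta_2 - \beta_1) + \varepsilon_{12} = \gamma_1 + \gamma_3 + \varepsilon_{12}, \\
y_{21} &= (\mu + \alpha_1 + \beta_1) + (\alpha_2 - \alpha_1) + \varepsilon_{21} = \gamma_1 + \gamma_2 + \varepsilon_{21}, \\
y_{22} &= (\mu + \alpha_1 + \beta_1) + (\alpha_2 - \alpha_1) + (\beta_2 - \beta_1) + \varepsilon_{22} = \gamma_1 + \gamma_2 + \gamma_3 + \varepsilon_{22}.
\end{align*}
In matrix form,
\[
\begin{pmatrix} y_{11} \\ y_{12} \\ y_{21} \\ y_{22} \end{pmatrix}
=
\begin{pmatrix} 1 & 0 & 0\\ 1 & 0 & 1\\ 1 & 1 & 0\\ 1 & 1 & 1 \end{pmatrix}
\begin{pmatrix} \gamma_1 \\ \gamma_2 \\ \gamma_3 \end{pmatrix}
+
\begin{pmatrix} \varepsilon_{11} \\ \varepsilon_{12} \\ \varepsilon_{21} \\ \varepsilon_{22} \end{pmatrix},
\qquad \mathbf{y} = \mathbf{Z}\boldsymbol{\gamma} + \boldsymbol{\varepsilon}.
\]
Since $\mathrm{rank}(\mathbf{Z}) = 3$, we now have a full-rank model for which $\boldsymbol{\gamma}$ can be estimated by $\hat{\boldsymbol{\gamma}} = (\mathbf{Z}'\mathbf{Z})^{-1}\mathbf{Z}'\mathbf{y}$. This provides estimates of $\gamma_2 = \alpha_2 - \alpha_1$ and $\gamma_3 = \beta_2 - \beta_1$, which are typically of interest to the researcher.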
(ii) Side conditions
We have $\mathrm{rank}(\mathbf{X}) = 3$, with 5 parameters, so we need 2 (linearly independent) side conditions.
Denote the constrained parameters by $\mu^*$, $\alpha_i^*$ and $\beta_j^*$, and consider the side conditions $\alpha_1^* + \alpha_2^* = 0$ and $\beta_1^* + \beta_2^* = 0$. These lead to unique definitions of $\alpha_i^*$ and $\beta_j^*$ as deviations from means. To show this, start by writing the model as
\begin{align*}
y_{11} &= \mu_{11} + \varepsilon_{11}, & y_{12} &= \mu_{12} + \varepsilon_{12}, \\
y_{21} &= \mu_{21} + \varepsilon_{21}, & y_{22} &= \mu_{22} + \varepsilon_{22},
\end{align*}
where $\mu_{ij} = E(y_{ij})$ is the mean weight gain with vitamin $i$ and method $j$. The means are displayed in Table 9.1, and the parameters $\alpha_1^*, \alpha_2^*, \beta_1^*$, and $\beta_2^*$ are defined as row ($\alpha$) and column ($\beta$) effects.
With the side conditions $\alpha_1^* + \alpha_2^* = 0$ and $\beta_1^* + \beta_2^* = 0$, the redefined parameters are both unique and meaningful.
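ESTIMATION
We consider estimation of $\boldsymbol{\beta}$ and of linear functions of $\boldsymbol{\beta}$ in the non-full-rank model $\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$, without reparameterizing or imposing side conditions and without a normality assumption on $\mathbf{y}$.
The problem with the non-full-rank model is that $\mathbf{X}'\mathbf{X}$ has no inverse, and therefore the normal equations do not have a unique solution. They do, however, always have a solution:
Theorem 9.2A. If $\mathbf{X}$ is $n \times p$ of rank $k < p \le n$, the system of equations $\mathbf{X}'\mathbf{X}\hat{\boldsymbol{\beta}} = \mathbf{X}'\mathbf{y}$ is consistent.
An illustration is given in Example 9.2.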

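Estimable Functions of $\boldsymbol{\beta}$.
Since $\boldsymbol{\beta}$ cannot be estimated, can we estimate any linear combination of the $\beta$'s, say $\boldsymbol{\lambda}'\boldsymbol{\beta}$?
A linear function of parameters $\boldsymbol{\lambda}'\boldsymbol{\beta}$ is said to be estimable if there exists a linear combination of the observations with an expected value equal to $\boldsymbol{\lambda}'\boldsymbol{\beta}$. That is, $\boldsymbol{\lambda}'\boldsymbol{\beta}$ is estimable if there exists a vector $\mathbf{a}$ such that $E(\mathbf{a}'\mathbf{y}) = \boldsymbol{\lambda}'\boldsymbol{\beta}$.
Theorem 9.2B gives conditions under which $\boldsymbol{\lambda}'\boldsymbol{\beta}$ is estimable, followed by an illustration in Example 9.2(a).
Theorem 9.2C tells us how to determine the number of linearly independent estimable functions of $\boldsymbol{\beta}$ (which is equal to the rank of $\mathbf{X}$).
Theorem 9.2D tells us that any estimable function can be obtained by taking a linear combination of the rows (elements) of $\mathbf{X}\boldsymbol{\beta}$ or of the rows of $\mathbf{X}'\mathbf{X}\boldsymbol{\beta}$.
Look at Example 9.2(b)!
So, what are the estimators of $\boldsymbol{\lambda}'\boldsymbol{\beta}$?
The estimators are (i) $\mathbf{a}'\mathbf{y}$, (ii) $\mathbf{r}'\mathbf{X}'\mathbf{y}$, and (iii) $\boldsymbol{\lambda}'\hat{\boldsymbol{\beta}}$, where $\mathbf{a}$ and $\mathbf{r}$ satisfy $\boldsymbol{\lambda}' = \mathbf{a}'\mathbf{X}$ and $\boldsymbol{\lambda}' = \mathbf{r}'\mathbf{X}'\mathbf{X}$, respectively, and $\hat{\boldsymbol{\beta}}$ is a solution of $\mathbf{X}'\mathbf{X}\hat{\boldsymbol{\beta}} = \mathbf{X}'\mathbf{y}$.
Some properties of $\mathbf{r}'\mathbf{X}'\mathbf{y}$ and $\boldsymbol{\lambda}'\hat{\boldsymbol{\beta}}$ are stated in Theorem 9.3A, followed by an illustration of the estimators $\mathbf{r}'\mathbf{X}'\mathbf{y}$ and $\boldsymbol{\lambda}'\hat{\boldsymbol{\beta}}$ in Example 9.3(i).
Theorem 9.3B gives properties of the variance of these estimators ($\mathbf{r}'\mathbf{X}'\mathbf{y}$ and $\boldsymbol{\lambda}'\hat{\boldsymbol{\beta}}$).
The covariance of the estimators of two estimable functions is given in Theorem 9.3C.
Theorem 9.3D states that the estimators $\mathbf{r}'\mathbf{X}'\mathbf{y}$ and $\boldsymbol{\lambda}'\hat{\boldsymbol{\beta}}$ are BLUE.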
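Estimator of $\sigma^2$
So far we have defined SSE as
(i) $SSE = (\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}})'(\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}})$,
(ii) $SSE = \mathbf{y}'\mathbf{y} - \hat{\boldsymbol{\beta}}'\mathbf{X}'\mathbf{y}$, and
(iii) $SSE = \mathbf{y}'\left[\mathbf{I} - \mathbf{X}(\mathbf{X}'\mathbf{X})^{-}\mathbf{X}'\right]\mathbf{y}$.

An estimator of $\sigma^2$ is
\[ s^2 = \frac{SSE}{n - k} \]
where $n$ is the number of rows of $\mathbf{X}$ and $k = \mathrm{rank}(\mathbf{X})$.
The properties of $s^2$ are given in Theorem 9.3E.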
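We now assume a normal model (for the non-full-rank model).
Theorem 9.3F. If $\mathbf{y}$ is $N_n(\mathbf{X}\boldsymbol{\beta}, \sigma^2\mathbf{I})$, that is, $\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$, where $\mathbf{X}$ is $n \times p$ of rank $k < p \le n$, then the maximum likelihood estimators for $\boldsymbol{\beta}$ and $\sigma^2$ are given by
\[ \hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-}\mathbf{X}'\mathbf{y}, \qquad
\hat\sigma^2 = \frac{1}{n}(\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}})'(\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}}). \]
Theorem 9.3G. If $\mathbf{y}$ is $N_n(\mathbf{X}\boldsymbol{\beta}, \sigma^2\mathbf{I})$, where $\mathbf{X}$ is $n \times p$ of rank $k < p \le n$, then the maximum likelihood estimators $\hat{\boldsymbol{\beta}}$ and $s^2$ (corrected for bias) have the following properties:
(i) $\hat{\boldsymbol{\beta}}$ is $N_p\left[(\mathbf{X}'\mathbf{X})^{-}\mathbf{X}'\mathbf{X}\boldsymbol{\beta},\; \sigma^2(\mathbf{X}'\mathbf{X})^{-}\mathbf{X}'\mathbf{X}(\mathbf{X}'\mathbf{X})^{-}\right]$.
(ii) $(n - k)s^2/\sigma^2$ is $\chi^2(n - k)$.
(iii) $\hat{\boldsymbol{\beta}}$ and $s^2$ are independent.

Theorem 9.3H. If $\mathbf{y}$ is $N_n(\mathbf{X}\boldsymbol{\beta}, \sigma^2\mathbf{I})$, where $\mathbf{X}$ is $n \times p$ of rank $k < p \le n$, and if $\boldsymbol{\lambda}'\boldsymbol{\beta}$ is an estimable function, then $\boldsymbol{\lambda}'\hat{\boldsymbol{\beta}}$ has minimum variance among all unbiased estimators.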

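Testable Hypotheses
A hypothesis such as $H_0: \beta_1 = \beta_2 = \cdots = \beta_q$ is said to be testable if there exists a set of linearly independent estimable functions $\boldsymbol{\lambda}_1'\boldsymbol{\beta}, \boldsymbol{\lambda}_2'\boldsymbol{\beta}, \ldots, \boldsymbol{\lambda}_t'\boldsymbol{\beta}$ such that $H_0$ is true if and only if $\boldsymbol{\lambda}_1'\boldsymbol{\beta} = \boldsymbol{\lambda}_2'\boldsymbol{\beta} = \cdots = \boldsymbol{\lambda}_t'\boldsymbol{\beta} = 0$.
Note: To test a testable hypothesis, we use a full-and-reduced-model approach or, alternatively, a general linear hypothesis test.
This is followed by a discussion of hypothesis testing for the non-full-rank model, covering both the full-and-reduced-model approach and the general linear hypothesis.
Theorem 9.5B. If $\mathbf{y}$ is distributed as $N_n(\mathbf{X}\boldsymbol{\beta}, \sigma^2\mathbf{I})$ and $\mathbf{X}$ is $n \times p$ of rank $k < p \le n$, if $\mathbf{C}$ is $m \times p$ of rank $m \le k$ such that $\mathbf{C}\boldsymbol{\beta}$ is a set of $m$ linearly independent estimable functions, and if $\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-}\mathbf{X}'\mathbf{y}$, then
(i) $\mathbf{C}(\mathbf{X}'\mathbf{X})^{-}\mathbf{C}'$ is nonsingular and invariant to the choice of $(\mathbf{X}'\mathbf{X})^{-}$;
(ii) $\mathbf{C}\hat{\boldsymbol{\beta}}$ is $N_m\left[\mathbf{C}\boldsymbol{\beta},\; \sigma^2\mathbf{C}(\mathbf{X}'\mathbf{X})^{-}\mathbf{C}'\right]$;
(iii) $SSH/\sigma^2 = (\mathbf{C}\hat{\boldsymbol{\beta}})'\left[\mathbf{C}(\mathbf{X}'\mathbf{X})^{-}\mathbf{C}'\right]^{-1}\mathbf{C}\hat{\boldsymbol{\beta}}/\sigma^2$ is $\chi^2(m, \lambda)$, where $\lambda = (\mathbf{C}\boldsymbol{\beta})'\left[\mathbf{C}(\mathbf{X}'\mathbf{X})^{-}\mathbf{C}'\right]^{-1}\mathbf{C}\boldsymbol{\beta}/2\sigma^2$;
(iv) $SSE/\sigma^2 = \mathbf{y}'\left[\mathbf{I} - \mathbf{X}(\mathbf{X}'\mathbf{X})^{-}\mathbf{X}'\right]\mathbf{y}/\sigma^2$ is $\chi^2(n - k)$;
(v) SSH and SSE are independent.

Theorem 9.5C. Let $\mathbf{y}$ be $N_n(\mathbf{X}\boldsymbol{\beta}, \sigma^2\mathbf{I})$, where $\mathbf{X}$ is $n \times p$ of rank $k < p \le n$, and let $\mathbf{C}$, $\mathbf{C}\boldsymbol{\beta}$, and $\hat{\boldsymbol{\beta}}$ be defined as in Theorem 9.5B. Then, if $H_0: \mathbf{C}\boldsymbol{\beta} = \mathbf{0}$ is true, the statistic
\[
F = \frac{SSH/m}{SSE/(n - k)} = \frac{(\mathbf{C}\hat{\boldsymbol{\beta}})'\left[\mathbf{C}(\mathbf{X}'\mathbf{X})^{-}\mathbf{C}'\right]^{-1}\mathbf{C}\hat{\boldsymbol{\beta}}/m}{SSE/(n - k)}
\]
is distributed as $F(m, n - k)$.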
