
Paper 75006

What's happening if there is "B" in the general linear modeling output?


Shilong Kuang, Kelley Blue Book Inc., Irvine, CA

ABSTRACT
The SAS® GLM procedure is an efficient tool for statistical data analysis, especially when the predictors include
categorical variables (also called class variables). In SAS® practice, you may sometimes see the letter "B" next to
values in the parameter estimate output and wonder what is happening in the model. What causes it? Can we still
trust the model? Can we verify the modeling output with an alternative procedure? What should we pay more
attention to in similar situations in the future?
In this paper, we answer those questions with intuitive examples, accompanied by the corresponding statistical
theory. With those in mind, you can easily turn general linear modeling into a more powerful tool! SAS® ROCKS!

Keywords: General linear model, estimable function, singular matrix, generalized inverse.

INTRODUCTION
The SAS® GLM procedure analyzes data within the framework of general linear models. When using the GLM
procedure in SAS, we may see a “weird” message in the output like the following:

Note: The X'X matrix has been found to be singular, and a generalized inverse was used to solve the
normal equations. Terms whose estimates are followed by the letter 'B' are not uniquely estimable.

You may wonder what is happening in the model. Is there anything wrong with it? To what extent can we trust
the modeling output? We begin our demonstration with the following simple example.

EXAMPLE TO START
We consider the following one-way classified data, in which there is one categorical variable “group” with 3 levels
(A, B, C).

SAS Code 1

data example_1;
input group $ y @@;
cards;
A 6.4 A 6.2
B 8.4 B 8.7
C 3.4 C 3.1
;
run;

proc glm data=example_1;
class group;
model y=group /solution e;
run;

SAS Output 1

Parameter    Estimate    Standard Error    t Value    Pr > |t|
Intercept    3.25 B      0.1354006         24         0.0002
group A      3.05 B      0.1914854         15.93      0.0005
group B      5.3  B      0.1914854         27.68      0.0001
group C      0    B      .                 .          .

How do we interpret the letter “B” in the output?

If we write the model in the form $Y = X\beta + \varepsilon$, the design matrix and the parameter vector are

$$
X = \begin{pmatrix}
1 & 1 & 0 & 0 \\
1 & 1 & 0 & 0 \\
1 & 0 & 1 & 0 \\
1 & 0 & 1 & 0 \\
1 & 0 & 0 & 1 \\
1 & 0 & 0 & 1
\end{pmatrix},
\qquad
\beta = \begin{pmatrix} u \\ \tau_1 \\ \tau_2 \\ \tau_3 \end{pmatrix}.
$$

We can see there are only 3 independent rows in X, but 4 parameters in $(u, \tau_1, \tau_2, \tau_3)$.

In other words, the system of linear equations

$$
\begin{cases}
y_1 = u + \tau_1 \\
y_2 = u + \tau_2 \\
y_3 = u + \tau_3
\end{cases}
$$

has only 3 independent equations in 4 unknowns, so it has no unique solution. In fact, there are infinitely many
solutions.

The design matrix X is not of full column rank: it has 4 columns, but its rank is only 3 (the row space and the
column space both have dimension 3, because the first column is the sum of the other three). This explains the
note: “the X'X matrix has been found to be singular”. This singularity indicates that the model as defined has too
many parameters; it is over-parameterized!
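The rank deficiency can be checked numerically. The following is a minimal sketch, assuming SAS/IML is available; the design matrix is typed in by hand from the data in SAS Code 1, and the names X, XtX and r are illustrative only. Because the IML RANK function ranks the elements of a matrix rather than returning its matrix rank, the rank is computed here as the trace of ginv(X)*X.

```sas
proc iml;
  /* design matrix for example_1: intercept, then indicators for groups A, B, C */
  X = {1 1 0 0,
       1 1 0 0,
       1 0 1 0,
       1 0 1 0,
       1 0 0 1,
       1 0 0 1};
  XtX = X` * X;                    /* 4 x 4 cross-product matrix X'X            */
  r = round(trace(ginv(X) * X));   /* rank of X via the Moore-Penrose inverse   */
  print XtX, r;
quit;
```

For these data X'X has rows (6 2 2 2), (2 2 0 0), (2 0 2 0) and (2 0 0 2), and the computed rank should be 3, one less than the number of columns.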

To overcome this “embarrassing” situation, the default GLM parameterization treats the last level of the categorical
variable “group” (“C”, in alphabetical order) as the reference level and sets its effect to zero. This restriction
explains why we see the estimate 0 for the parameter “group C” in the previous modeling output.
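As a worked illustration of the non-uniqueness, the normal equations for the data in SAS Code 1 only require that $u + \tau_i$ equal the sample mean of group $i$. The bar notation for the group sample means and the free constant $c$ are introduced here for illustration and are not part of the SAS output.

```latex
\begin{aligned}
\bar{y}_A &= \tfrac{6.4 + 6.2}{2} = 6.30, \qquad
\bar{y}_B = \tfrac{8.4 + 8.7}{2} = 8.55, \qquad
\bar{y}_C = \tfrac{3.4 + 3.1}{2} = 3.25, \\
(\hat{u}, \hat{\tau}_1, \hat{\tau}_2, \hat{\tau}_3)
  &= (c,\; 6.30 - c,\; 8.55 - c,\; 3.25 - c) \quad \text{for any constant } c, \\
c = \bar{y}_C = 3.25 \;&\Rightarrow\; (3.25,\; 3.05,\; 5.30,\; 0)
  \quad \text{(the solution printed by PROC GLM)}.
\end{aligned}
```

Every choice of $c$ gives the same fitted group means $\hat{u} + \hat{\tau}_i$, which is why the individual entries carry the letter “B” while the fitted values do not depend on the choice.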

ALTERNATIVE WAY TO DOUBLE CHECK THE OUTPUT


We can use PROC REG as an alternative to verify the output, assigning a dummy variable to each level of
the categorical variable (except the last level), as follows:

SAS Code 2

data example_1;
set example_1;
if group='C' then do; d1=0; d2=0; end;
else if group='A' then do; d1=1; d2=0; end;
else if group='B' then do; d1=0; d2=1; end;
run;

proc reg data=example_1;
model y=d1 d2;
run;

The parameter estimate output is the same as the previous GLM output:

SAS Output 2

Variable     DF    Parameter Estimate    Standard Error    t Value    Pr > |t|
Intercept    1     3.25                  0.1354            24         0.0002
d1           1     3.05                  0.19149           15.93      0.0005
d2           1     5.3                   0.19149           27.68      0.0001

The fitted value for the reference level can be written as: Y(group='C') = intercept = 3.25, which agrees with the
previous GLM output: Y(group='C') = intercept + “group C” parameter estimate = 3.25 + 0 = 3.25.
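The same arithmetic can be carried out for the other two levels. The sketch below only evaluates the fitted PROC REG equation at the dummy codings from SAS Code 2; the symbol $\hat{y}$ for the fitted value is introduced here for illustration.

```latex
\begin{aligned}
\hat{y} &= 3.25 + 3.05\, d_1 + 5.30\, d_2, \\
\text{group A } (d_1 = 1,\ d_2 = 0) &: \quad \hat{y} = 3.25 + 3.05 = 6.30 = \tfrac{6.4 + 6.2}{2}, \\
\text{group B } (d_1 = 0,\ d_2 = 1) &: \quad \hat{y} = 3.25 + 5.30 = 8.55 = \tfrac{8.4 + 8.7}{2}, \\
\text{group C } (d_1 = 0,\ d_2 = 0) &: \quad \hat{y} = 3.25 = \tfrac{3.4 + 3.1}{2}.
\end{aligned}
```

Each fitted value equals the corresponding group sample mean, so both procedures describe the same fitted model.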

MORE STATISTICS EXPLANATION
Since the design matrix X in the previous example is not of full rank, X'X has no ordinary inverse, and the normal
equations have no unique solution. This means that at least some parameters in the model cannot be estimated
uniquely; they are said to be non-estimable!

DEFINITION OF ESTIMABLE FUNCTION


In a general linear model, a linear combination $\lambda'\beta$ of the parameters is said to be estimable if there
exists a linear function $a'Y$ that is unbiased for $\lambda'\beta$, that is, $E(a'Y) = \lambda'\beta$ for all $\beta$.
If no such function exists, the linear combination $\lambda'\beta$ is said to be non-estimable.

LEMMA 1
$\lambda'\beta$ is estimable if and only if $\lambda' = a'X$ for some $a'$; that is, $\lambda'$ is in the row space of $X$.

Proof:

Part 1 (sufficiency): given $\lambda' = a'X$ for some $a'$, we have
$E(a'Y) = a'E(Y) = a'X\beta = \lambda'\beta$,
so $a'Y$ is a linear unbiased estimator of $\lambda'\beta$.
Part 2 (necessity): given that $\lambda'\beta$ is estimable, by the definition of estimability there exists some linear
combination $a'Y$ that is unbiased for $\lambda'\beta$, that is, $E(a'Y) = \lambda'\beta$ for any $\beta$. It follows that
$E(a'Y) = a'E(Y) = a'X\beta = \lambda'\beta$ for any $\beta$.
That is, $a'X\beta = \lambda'\beta$ for any $\beta$. Therefore, $a'X = \lambda'$.
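As a small worked instance of Lemma 1 with the 6 x 4 design matrix of the first example, the vector $a$ below is one illustrative choice (many others work equally well). The second line shows why no such vector exists for $\tau_1$ alone.

```latex
\begin{aligned}
a' = (1, 0, -1, 0, 0, 0) \;&\Rightarrow\; a'X = (1,1,0,0) - (1,0,1,0) = (0, 1, -1, 0),
  \quad \text{so } \tau_1 - \tau_2 \text{ is estimable;} \\
a'X &= \bigl(a_1 + \dots + a_6,\; a_1 + a_2,\; a_3 + a_4,\; a_5 + a_6\bigr)
  \quad \text{for any } a' = (a_1, \dots, a_6), \\
&\text{so the first entry of } a'X \text{ always equals the sum of the other three, and }
  (0, 1, 0, 0) \text{ is not of this form.}
\end{aligned}
```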

LEMMA 2
$\lambda'\beta$ is estimable if and only if there is a solution $\xi$ such that $X'X\xi = \lambda$ (this last equation
is usually called the conjugate normal equation).

Proof: Part 1 (sufficiency): if there is a solution $\xi$ such that $X'X\xi = \lambda$, then
$\lambda' = \xi'X'X := a'X$, where $a = X\xi$.
Therefore, by Lemma 1, $\lambda'\beta$ is estimable.

Part 2 (necessity): assume $\lambda'\beta$ is estimable. By Lemma 1, there exists some $a'$ such that
$\lambda' = a'X$, so $\lambda = X'a$. By the defining property of a generalized inverse, we have
$(X'X)(X'X)^{-}X'X = X'X \;\Rightarrow\; [(X'X)(X'X)^{-}X' - X']_{p \times n}\, X_{n \times p} = O_{p \times p}$.
Let $L := (X'X)(X'X)^{-}X' - X'$, so that $LX = O$. Since $L' = X\{[(X'X)^{-}]'X'X - I\}$, it follows that
$LL' = LX\{[(X'X)^{-}]'X'X - I\} = O_{p \times p}$, and hence $L = O_{p \times n}$.
That is, $(X'X)(X'X)^{-}X' = X'$.
We then have $(X'X)(X'X)^{-}X'a = X'a = \lambda$. (*)
Therefore, there is a solution $\xi$ such that $X'X\xi = \lambda$, namely $\xi = (X'X)^{-}X'a$.

COROLLARY
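The necessary and sufficient condition for a linear combination $\lambda'\beta$ of parameters to be estimable is
$(X'X)(X'X)^{-}\lambda = \lambda$.

Proof: If $\lambda'\beta$ is estimable, then by Lemma 1 $\lambda = X'a$ for some $a$, and equation (*) gives
$(X'X)(X'X)^{-}\lambda = (X'X)(X'X)^{-}X'a = X'a = \lambda$.
Conversely, if $(X'X)(X'X)^{-}\lambda = \lambda$, then $\xi = (X'X)^{-}\lambda$ solves $X'X\xi = \lambda$, so
$\lambda'\beta$ is estimable by Lemma 2.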
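The condition in the corollary can also be checked numerically. The following is a minimal sketch, assuming SAS/IML is available and using the 6 x 4 design matrix of the first example; the names H, lam1, lam2, est1 and est2 are illustrative. The IML GINV function returns the Moore-Penrose inverse, which is not the generalized inverse PROC GLM uses internally, but the corollary holds for any generalized inverse.

```sas
proc iml;
  /* design matrix for example_1: intercept, then indicators for groups A, B, C */
  X = {1 1 0 0,
       1 1 0 0,
       1 0 1 0,
       1 0 1 0,
       1 0 0 1,
       1 0 0 1};
  XtX = X` * X;
  H = XtX * ginv(XtX);                        /* (X'X)(X'X)^-                  */
  lam1 = {0, 1, -1, 0};                       /* coefficients of tau1 - tau2   */
  lam2 = {0, 1, 0, 0};                        /* coefficients of tau1 alone    */
  est1 = (max(abs(H*lam1 - lam1)) < 1e-8);    /* 1 if H*lambda equals lambda   */
  est2 = (max(abs(H*lam2 - lam2)) < 1e-8);
  print est1 est2;
quit;
```

The flag for tau1 - tau2 should come out as 1 (estimable) and the flag for tau1 alone as 0 (not estimable); the tolerance 1e-8 is an arbitrary choice to absorb floating-point error.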


EXAMPLE APPLICATION
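Let us use our first example to demonstrate how to apply the corollary. Consider

$Y_{ij} = u + \tau_i + \varepsilon_{ij}$, where $i = 1, 2, \ldots, a$ and $j = 1, 2, \ldots, n_i$.

The corresponding design matrix, its transpose and the parameter vector are given below, where $1_{n_i}$ and
$0_{n_i}$ denote column vectors of $n_i$ ones and $n_i$ zeros:

$$
X = \begin{pmatrix}
1_{n_1} & 1_{n_1} & 0_{n_1} & \cdots & 0_{n_1} \\
1_{n_2} & 0_{n_2} & 1_{n_2} & \cdots & 0_{n_2} \\
\vdots  & \vdots  & \vdots  &        & \vdots  \\
1_{n_a} & 0_{n_a} & 0_{n_a} & \cdots & 1_{n_a}
\end{pmatrix},
\quad
X' = \begin{pmatrix}
1'_{n_1} & 1'_{n_2} & 1'_{n_3} & \cdots & 1'_{n_a} \\
1'_{n_1} & 0'_{n_2} & 0'_{n_3} & \cdots & 0'_{n_a} \\
\vdots   & \vdots   & \vdots   &        & \vdots   \\
0'_{n_1} & 0'_{n_2} & 0'_{n_3} & \cdots & 1'_{n_a}
\end{pmatrix},
\quad
\beta = \begin{pmatrix} u \\ \tau_1 \\ \vdots \\ \tau_a \end{pmatrix}.
$$

Then,

$$
X'X = \begin{pmatrix}
n      & n_1    & n_2    & \cdots & n_a    \\
n_1    & n_1    & 0      & \cdots & 0      \\
\vdots & \vdots & \vdots &        & \vdots \\
n_a    & 0      & 0      & \cdots & n_a
\end{pmatrix},
\quad \text{where } n = n_1 + n_2 + \cdots + n_a,
$$

and we choose one of its generalized inverses:

$$
(X'X)^{-} = \begin{pmatrix}
0      & 0             & \cdots & 0             \\
0      & \frac{1}{n_1} & \cdots & 0             \\
\vdots & \vdots        & \ddots & \vdots        \\
0      & 0             & \cdots & \frac{1}{n_a}
\end{pmatrix}_{(a+1) \times (a+1)}
\quad \text{and} \quad
(X'X)(X'X)^{-} = \begin{pmatrix}
0      & 1      & 1      & \cdots & 1      \\
0      & 1      & 0      & \cdots & 0      \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0      & 0      & 0      & \cdots & 1
\end{pmatrix}_{(a+1) \times (a+1)}.
$$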

By the corollary (which seems to be the more useful result in most situations!), a linear combination
$\lambda'\beta$ of parameters is estimable if and only if $(X'X)(X'X)^{-}\lambda = \lambda$, where
$\lambda' = (\lambda_0, \lambda_1, \ldots, \lambda_a)$. That is,

$$
(\lambda_0, \lambda_1, \ldots, \lambda_a) = \Bigl(\sum_{i=1}^{a} \lambda_i,\; \lambda_1, \ldots, \lambda_a\Bigr).
$$

Therefore, the necessary and sufficient condition for a linear combination $\lambda'\beta$ of parameters to be
estimable is

$$
\lambda_0 = \sum_{i=1}^{a} \lambda_i. \qquad (**)
$$
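For concreteness, the matrices can be written out for the data of the first example, where $a = 3$ and $n_1 = n_2 = n_3 = 2$. This is a worked instance under the generalized inverse chosen in this section, which is an illustrative choice and gives a different solution from the one PROC GLM prints.

```latex
\begin{aligned}
X'X &= \begin{pmatrix} 6 & 2 & 2 & 2 \\ 2 & 2 & 0 & 0 \\ 2 & 0 & 2 & 0 \\ 2 & 0 & 0 & 2 \end{pmatrix},
\qquad
(X'X)^{-} = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & \tfrac12 & 0 & 0 \\ 0 & 0 & \tfrac12 & 0 \\ 0 & 0 & 0 & \tfrac12 \end{pmatrix},
\qquad
(X'X)(X'X)^{-} = \begin{pmatrix} 0 & 1 & 1 & 1 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \\
X'Y &= (36.2,\; 12.6,\; 17.1,\; 6.5)',
\qquad
(X'X)^{-}X'Y = (0,\; 6.30,\; 8.55,\; 3.25)'.
\end{aligned}
```

This solution sets $u$ to zero instead of $\tau_3$, yet it gives the same group means $u + \tau_i$ as the GLM output (3.25 + 3.05, 3.25 + 5.3 and 3.25 + 0).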

For instance, we can check whether $\tau_1 - \tau_2$ is estimable. Since
$\tau_1 - \tau_2 = (0, 1, -1, 0, \ldots, 0)\,(u, \tau_1, \ldots, \tau_a)'$, we have
$\lambda_0 = 0,\ \lambda_1 = 1,\ \lambda_2 = -1,\ \lambda_3 = 0,\ \ldots,\ \lambda_a = 0$, which satisfies
equation (**): $\lambda_0 = \sum_{i=1}^{a} \lambda_i = 0$.

Therefore, $\tau_1 - \tau_2$ is estimable.

We can similarly check that none of $\tau_1, \tau_2, \tau_3$ is estimable on its own, since for each of them
$\lambda_0 = 0 \neq \sum_{i=1}^{a} \lambda_i = 1$. The same check shows that $u$ alone is not estimable
($\lambda_0 = 1 \neq 0$), which matches the “B” next to the intercept. This explains again the note we mentioned
in the beginning: “Terms whose estimates are followed by the letter 'B' are not uniquely estimable”, and why the
estimates of the group A, group B and group C parameters in the GLM output all have the letter “B” attached.
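Condition (**) can be tried out directly in PROC GLM with ESTIMATE statements. The following is a minimal sketch, assuming the example_1 data set from SAS Code 1 is available; the labels are illustrative, and the coefficients follow the sorted order of the levels (A, B, C).

```sas
proc glm data=example_1;
  class group;
  model y = group / solution;
  /* estimable: lambda0 = 0 and the group coefficients sum to 0 */
  estimate 'tau1 - tau2' group 1 -1 0;
  /* estimable: lambda0 = 1 and the group coefficients sum to 1 (mean of group C) */
  estimate 'u + tau3' intercept 1 group 0 0 1;
  /* not estimable: lambda0 = 0 but the group coefficients sum to 1 */
  estimate 'tau1 alone' group 1 0 0;
run;
quit;
```

From the estimates in SAS Output 1, the first two statements should return 3.05 - 5.3 = -2.25 and 3.25 + 0 = 3.25. The third statement requests a function that violates (**), so PROC GLM is expected to report it as not estimable rather than print an estimate.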

CONCLUSION
In this paper, we investigate the “weird” letter “B” in the output of the SAS® GLM procedure. Through intuitive
examples, we are able to answer a series of questions: Is there anything wrong in the model? To what extent can
we trust the output? What alternative strategies can we take?
Furthermore, we provide the theoretical statistical background to rigorously support our statements.

ACKNOWLEDGMENTS
I would like to thank my former Statistics professor, Dr. Daniel Jeske, for his help at the beginning of my
statistics career. I would also like to thank our Vice-President, Mr. Shawn Hushman, for his trust and his support
of SAS training.

REFERENCES
1. John O. Rawlings, Sastry G. Pantula, David A. Dickey. Applied Regression Analysis: A Research Tool, 2nd
edition. Springer-Verlag New York, Inc., 1998. ISBN 0-387-98454-2.
2. Leslie A. Christensen. Introduction to Building a Linear Regression Model. SAS Users Group International
(SUGI) Proceedings 22, Statistics and Analytics.

RECOMMENDED READING
• SAS Usage Note 38384: How to interpret the results of the SOLUTION option in the MODEL statement of PROC
GLM? http://support.sas.com/kb/38/384.html.
• SAS Usage Note 22585: Why is the X'X matrix found to be singular in the PROC GLM Output?
http://support.sas.com/kb/38/384.html.
• SAS Study Notes: http://www.dumblittledoctor.com/sas_tutorial_home.php.

CONTACT INFORMATION
Your comments and questions are valued and encouraged. Please contact the author at:
Name: Dr. Shilong Kuang
Enterprise: Kelley Blue Book, Inc.
Address: 195 Technology Drive
City, State ZIP: Irvine, CA 92618
E-mail: shilong.kuang@gmail.com
Web:

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS
Institute Inc. in the USA and other countries. ® indicates USA registration.
Other brand and product names are trademarks of their respective companies.
