GLM Lec3


Stat 7410: Estimation for the General Linear Model (Prof. Goel)

Broad Outline

General Linear Model (GLM):

 Training Sample Model: Given $n$ observations $[(Y_i, x_i'),\; x_i' = (x_{i1}, \ldots, x_{ir})]$, $i = 1, 2, \ldots, n$, the sample model can be expressed as
$$Y_i = \mu(x_{i1}, x_{i2}, \ldots, x_{ir}) + \varepsilon_i, \quad i = 1, 2, \ldots, n, \qquad (1)$$
where $\varepsilon_i$, $i = 1, 2, \ldots, n$, denote the noise (random errors), each with mean zero and variance $\sigma^2$.

 From now on, we denote the features $f_j$, $j = 1, 2, \ldots, p$, themselves as coded predictor variables $x_1, x_2, \ldots, x_p$. In the simplest setting, the random errors are assumed to be uncorrelated with equal variance.

 Thus the sample GLM can be expressed as
$$Y_i = \sum_{j=1}^{p} \beta_j x_{ij} + \varepsilon_i, \quad E[\varepsilon_i] = 0, \quad \operatorname{Var}[\varepsilon_i] = \sigma^2, \quad \operatorname{Cov}(\varepsilon_i, \varepsilon_k) = 0, \; i \neq k. \qquad (2)$$
$$E[Y_i \mid x_i] = \mu_i = \sum_{j=1}^{p} \beta_j x_{ij}. \qquad (3)$$

 Vector/matrix notation for the response, predictor variables, error terms, and the unknown coefficients:
$$Y_{n\times 1} = \begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix}, \quad \beta_{p\times 1} = \begin{pmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_p \end{pmatrix}, \quad \varepsilon = \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix}, \quad X_{n\times p} = \begin{pmatrix} x_{1.}' \\ x_{2.}' \\ \vdots \\ x_{n.}' \end{pmatrix}.$$
Also, let $x_{.j} = (x_{1j}, x_{2j}, \ldots, x_{nj})'$ denote the $j$th column of $X$, i.e., $X_{n\times p} = [x_{.1}, x_{.2}, \ldots, x_{.p}]$.
 Given the response vector $Y$ and the design matrix $X$, the sample GLM can be written as
$$Y = X\beta + \varepsilon, \quad E[\varepsilon] = 0, \quad \operatorname{Cov}[\varepsilon] = ((\operatorname{Cov}(\varepsilon_i, \varepsilon_j))) = ((E(\varepsilon_i \varepsilon_j))) = E[\varepsilon\varepsilon'] = \sigma^2 I. \qquad (4)$$

 Thus, $E[Y] = \mu = X\beta$, and $\operatorname{Cov}(Y) = E[(Y - X\beta)(Y - X\beta)'] = E[\varepsilon\varepsilon'] = \sigma^2 I. \qquad (5)$
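As a concrete illustration of (4)-(5), here is a minimal NumPy sketch that simulates data from the sample GLM. The sizes n and p, the coefficient values, and sigma are arbitrary choices made only for this example; they are not taken from the notes.

```python
# Minimal simulation of the sample GLM (4)-(5): Y = X beta + eps with
# E[eps] = 0 and Cov[eps] = sigma^2 I.  All numbers are illustrative.
import numpy as np

rng = np.random.default_rng(0)
n, p, sigma = 100, 3, 0.5

X = rng.normal(size=(n, p))            # design matrix (full column rank here)
beta = np.array([1.0, -2.0, 0.5])      # assumed "true" coefficient vector
eps = rng.normal(scale=sigma, size=n)  # uncorrelated errors with variance sigma^2

Y = X @ beta + eps                     # response vector, E[Y] = X beta
```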
 Ordinary Least Squares (OLS): For a candidate value $\beta$ of the coefficient vector, the corresponding Residual Sum of Squares is
$$l_2(\beta) = \sum_{i=1}^{n} \Big( Y_i - \sum_{j=1}^{p} \beta_j x_{ij} \Big)^2 = \sum_{i=1}^{n} e_i^2(\beta) = e'(\beta)e(\beta).$$

• Problem: Find an estimated coefficient vector $\hat{\beta} = \arg\min_{\beta \in \mathbb{R}^p} l_2(\beta)$.

 In matrix notation,
$$\min_{\beta \in \mathbb{R}^p} e'(\beta)e(\beta) = \min_{\beta \in \mathbb{R}^p} (Y - X\beta)'(Y - X\beta) = e'(\hat{\beta})e(\hat{\beta}). \qquad (6)$$

 Expand $S(\beta) = (Y - X\beta)'(Y - X\beta) = Y'Y - Y'X\beta - \beta'X'Y + \beta'X'X\beta$.

 On setting the partial derivatives of $S(\beta)$ with respect to $\beta$ equal to zero, we get the Normal Equations
$$X'X\beta = X'Y. \qquad (7)$$

 Any solution of (7) is an optimal solution to the OLS problem.

 Full rank case:

 Example – Simple Linear Regression; Multiple Regression with linearly independent features.
 If the matrix $X'X$ is non-singular (the design matrix $X$ is of full column rank $p$), the inverse of $X'X$ exists, and the unique optimal least squares solution is
$$\hat{\beta} = (X'X)^{-1}X'Y. \qquad (8)$$

 Note that $E[\hat{\beta}] = (X'X)^{-1}X'E[Y] = (X'X)^{-1}X'X\beta = \beta$.
 Since $\operatorname{Cov}(TY) = T\operatorname{Cov}(Y)T'$, taking $T = (X'X)^{-1}X'$ gives
$$\operatorname{Cov}(\hat{\beta}) = \operatorname{Cov}(TY) = \sigma^2[TIT'] = \sigma^2 (X'X)^{-1}.$$
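A brief numerical sketch of the full-rank solution (8) and its covariance, continuing the simulated X, Y, and sigma from the snippet above; np.linalg.lstsq is used only as a cross-check.

```python
# Full-rank OLS via the normal equations (7)-(8), using the simulated data above.
XtX = X.T @ X
beta_hat = np.linalg.solve(XtX, X.T @ Y)        # beta_hat = (X'X)^{-1} X'Y

# Cross-check against a library least-squares solver.
beta_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)
assert np.allclose(beta_hat, beta_lstsq)

cov_beta_hat = sigma**2 * np.linalg.inv(XtX)    # Cov(beta_hat) = sigma^2 (X'X)^{-1}
```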
 Not full-rank case:

 Example – ANOVA for Designed Experiments

 The normal equations (7) are consistent, but the system has infinitely many solutions. Each solution can be expressed as $\beta^0 = (X'X)^{-}X'Y$, where $(X'X)^{-}$ is a generalized inverse of $X'X$. In fact,

• $E[\beta^0] = (X'X)^{-}X'E[Y] = (X'X)^{-}(X'X)\beta = H\beta \neq \beta$, so some components of $\beta$ do not possess unbiased estimators.

 The biases in $\beta^0$ may depend on the generalized inverse used in obtaining the particular OLS solution.

• These estimators cannot all be regarded as optimal estimators of the vector $\beta$.

 Why?
 Optimal with respect to what criteria? One may need to add an additional criterion to OLS to get a unique solution, e.g.,

• Minimum norm OLS estimator: $\beta^{+} = (X'X)^{+}X'Y$, where $(X'X)^{+}$ is the Moore-Penrose generalized inverse (pseudo-inverse) of $X'X$.
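A sketch of the minimum-norm estimator in the non-full-rank case. X_def below is an artificial rank-deficient design (its third column is the sum of the first two) built only for illustration, reusing Y from the earlier simulation.

```python
# Minimum-norm OLS estimator beta_plus = (X'X)^+ X'Y via the Moore-Penrose
# pseudo-inverse, on an artificial rank-deficient design X_def (rank 2, p = 3).
X_def = np.column_stack([X[:, 0], X[:, 1], X[:, 0] + X[:, 1]])

beta_plus = np.linalg.pinv(X_def.T @ X_def) @ X_def.T @ Y
# np.linalg.pinv(X_def) @ Y returns the same minimum-norm solution.
```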

 Coordinate Free (Vector Space) Approach: Interpret the model $\mu = X\beta$ as $\mu \in C[X]$. Note that $\hat{\mu} = X\beta^0 = PY = X(X'X)^{-}X'Y$, the projection of $Y$ onto $C[X]$. The symmetric matrix $P = X(X'X)^{-}X'$ is the orthogonal projection matrix onto $C[X]$, the space spanned by the columns of $X$.

 Even though in the non-full-rank case there are infinitely many solutions (in $\beta$) to the normal equations, the projection $\hat{\mu} = PY$ (also called $\hat{Y}$) is unique, i.e., the matrix $P$ does not change with the choice of generalized inverse of $X'X$.
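This invariance can be checked numerically. The sketch below, continuing the previous snippets, builds a second generalized inverse of $X'X$ from the pseudo-inverse (any matrix of the form $A^{+} + (I - A^{+}A)W$ is a generalized inverse of $A$) and compares the resulting projection matrices.

```python
# P = X (X'X)^- X' does not depend on the choice of generalized inverse,
# and it is symmetric and idempotent.
A = X_def.T @ X_def
A_plus = np.linalg.pinv(A)

W = rng.normal(size=A.shape)                    # arbitrary matrix
A_ginv = A_plus + (np.eye(A.shape[0]) - A_plus @ A) @ W
assert np.allclose(A @ A_ginv @ A, A)           # A_ginv is a generalized inverse of A

P1 = X_def @ A_plus @ X_def.T
P2 = X_def @ A_ginv @ X_def.T
assert np.allclose(P1, P2)                      # same projection matrix
assert np.allclose(P1 @ P1, P1)                 # idempotent
assert np.allclose(P1, P1.T)                    # symmetric
```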

 For a vector $u \in C[X]$, i.e., $u = Xb$ for some vector $b$,
$$Pu = X(X'X)^{-}X'u = X(X'X)^{-}X'Xb = Xb = u.$$

• Thus the projection of a vector $u \in C[X]$ onto $C[X]$ is $u$ itself.


 Furthermore, for an arbitrary vector $Y \in V_n$, $PY \in C[X]$; therefore, $P(PY) = PY$ holds for all $Y \in \mathbb{R}^n$. Hence $(P^2 - P) = 0 \Leftrightarrow P(I - P) = 0$. Thus $P^2 = P$, i.e., $P$ is an idempotent matrix. Is $P$ a symmetric matrix?

• Fact: Every symmetric idempotent matrix is an orthogonal projection matrix onto the space spanned by its columns.

 Since $P(I - P) = 0$, the rows (columns) of $P$ are orthogonal to the columns (rows) of $(I - P)$, i.e., $Py$ and $(I - P)y$ are orthogonal.

 Note that $X\beta^0 = \hat{\mu} = PY = \hat{Y}$, and $Y - \hat{Y} = (I - P)Y = e$, the vector of residuals.

 Therefore, the vectors $\hat{Y}$ and $e$ are orthogonal, i.e., $\hat{Y}'e = \sum_{i=1}^{n} \hat{y}_i e_i = 0$.
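Continuing the sketch above, the fitted values and residuals from the projection are orthogonal up to floating-point error:

```python
Y_hat = P1 @ Y        # mu_hat = P Y, the fitted values
e = Y - Y_hat         # (I - P) Y, the residuals
print(Y_hat @ e)      # approximately 0: Y_hat and e are orthogonal
```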

 Example: Details of full-rank linear regression model.

 The Key Question:

• How do we characterize the class of linear functions $c'\beta = \sum_{j=1}^{p} c_j \beta_j$ that can be estimated uniquely through the least squares solutions?

 Estimable functions: A linear parametric function $c'\beta$ is said to be estimable if there exists at least one unbiased estimator of it.

• If there does not exist any unbiased estimator of the linear function $c'\beta$, it is said to be non-estimable.

• Why consider estimable functions? We will discuss their connection with the concept of identifiability.

 Note that $Y_i$ is an unbiased estimator of $x_{i.}'\beta$. (Why?) Thus $x_{i.}'\beta$ is estimable for each row of the matrix $X$. Hence $c'\beta$, where the vector $c'$ is some linear combination of the rows of $X$, is also estimable.

 Fact: A linear parametric function $c'\beta$ of the $\beta_j$'s is estimable if and only if $c \in C(X') \equiv$ the row space of $X$. (Prove it.)

• $c \in C(X') \Leftrightarrow c = X'l$ for some $l \in \mathbb{R}^n$.
 Therefore, for any OLS solution $\beta^0$, $c'\beta^0 = l'X(X'X)^{-}X'Y = l'PY = l'\hat{\mu}$.

 Thus $c'\beta^0$ is invariant to the choice of generalized inverse [a unique OLS-based unbiased estimator of $c'\beta$].
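A numerical illustration of this invariance, continuing the earlier sketches: for $c = X'l$ the value $c'\beta^0$ agrees across different generalized-inverse solutions of the normal equations, whereas a non-estimable direction generally does not.

```python
# c'beta0 is invariant to the generalized inverse when c lies in the row space of X_def.
l = rng.normal(size=n)
c = X_def.T @ l                      # c = X'l, hence estimable

beta0_a = A_plus @ X_def.T @ Y       # minimum-norm solution of the normal equations
beta0_b = A_ginv @ X_def.T @ Y       # another solution of the normal equations
print(c @ beta0_a, c @ beta0_b)      # equal up to rounding

# A non-estimable direction such as c = (0, 0, 1)' generally gives
# different values for the two solutions.
```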

 Gauss-Markov Theorem: $c'\beta^0$ is the Best (Minimum Variance) Linear Unbiased Estimator (B.L.U.E.) of an estimable linear function $c'\beta$.

 Generalized Least Squares: $\operatorname{Var}(\varepsilon) = \sigma^2 V$, where $V$ is a known positive definite matrix.

• Reduce this problem to an OLS problem by a non-singular transformation.
• Since $V$ is a positive definite matrix, there exists a non-singular matrix $T$ such that $V^{-1} = T'T$.
• Now consider the linear transformation $Z = TY$.
• Note $E(Z) = TE(Y) = TX\beta$ and $\operatorname{Cov}(Z) = \sigma^2 TVT' = \sigma^2 T(T'T)^{-1}T' = \sigma^2 I$.
• We can therefore solve the OLS problem for $Z$ with design matrix $TX$.
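A sketch of this reduction to OLS. Here $V$ is an arbitrary positive-definite (AR(1)-type) matrix chosen only to illustrate the mechanics, and $T$ is obtained from a Cholesky factorization so that $V^{-1} = T'T$; the simulated data from the earlier snippets are reused purely for demonstration.

```python
# GLS as OLS on the transformed data Z = T Y, with design T X and V^{-1} = T'T.
rho = 0.6
idx = np.arange(n)
V = rho ** np.abs(np.subtract.outer(idx, idx))   # illustrative p.d. covariance pattern

Lc = np.linalg.cholesky(np.linalg.inv(V))        # V^{-1} = Lc Lc'
T = Lc.T                                         # so V^{-1} = T'T
Z = T @ Y                                        # Cov(Z) = sigma^2 I under Var(eps) = sigma^2 V
XT = T @ X                                       # transformed design matrix

beta_gls, *_ = np.linalg.lstsq(XT, Z, rcond=None)   # OLS on (XT, Z) is the GLS estimate
```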
 Background – Vector differentiation:
The vectors of partial derivatives of a linear form $l'u = \sum_{i=1}^{p} l_i u_i$ and of a quadratic form $u'Au = \sum_{i=1}^{p}\sum_{j=1}^{p} a_{ij} u_i u_j$, for a symmetric matrix $A$, are
$$\frac{\partial}{\partial u}(l'u) = \begin{pmatrix} \frac{\partial}{\partial u_1}(l'u) \\ \vdots \\ \frac{\partial}{\partial u_p}(l'u) \end{pmatrix} = \begin{pmatrix} l_1 \\ \vdots \\ l_p \end{pmatrix} = l; \qquad \frac{\partial}{\partial u}(u'Au) = \begin{pmatrix} \frac{\partial}{\partial u_1}(u'Au) \\ \vdots \\ \frac{\partial}{\partial u_p}(u'Au) \end{pmatrix} = \begin{pmatrix} 2a_{11}u_1 + 2\sum_{j\neq 1} a_{1j}u_j \\ \vdots \\ 2a_{pp}u_p + 2\sum_{j\neq p} a_{pj}u_j \end{pmatrix} = 2\begin{pmatrix} a_1'u \\ \vdots \\ a_p'u \end{pmatrix} = 2Au.$$
When $A$ is not symmetric, $u'Au = u'\{(A + A')/2\}u$, with $\{(A + A')/2\}$ symmetric. Therefore,
$$\frac{\partial}{\partial u}(u'Au) = (A + A')u.$$
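These gradient formulas can be sanity-checked numerically. The sketch below, continuing the earlier snippets, compares the analytic gradient of $u'Au$ (for a general, not necessarily symmetric $A$) with a central finite-difference approximation.

```python
# Finite-difference check of d/du (u'Au) = (A + A')u.
k = 4
A_q = rng.normal(size=(k, k))               # arbitrary, not necessarily symmetric
u = rng.normal(size=k)
f = lambda v: v @ A_q @ v                   # quadratic form u'Au

grad_analytic = (A_q + A_q.T) @ u
h = 1e-6
grad_fd = np.array([(f(u + h * ei) - f(u - h * ei)) / (2 * h) for ei in np.eye(k)])
print(np.max(np.abs(grad_analytic - grad_fd)))   # small discrepancy, ~1e-8 or less
```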
