
Solutions for Econometrics I Homework No. 1
due 2006-02-20
Feldkircher, Forstner, Ghoddusi, Grafenhofer, Pichler, Reiss, Yan, Zeugner

Exercise 1.1

Structural form of the problem:

1. q_t^d = α_0 + α_1 p_t + α_2 y_t + u_{t1}

2. q_t^s = β_0 + β_1 p_{t−1} + u_{t2}

To get the reduced form, solve the system of equations for the endogenous
variables:

3. q_t^s = q_t^d = β_0 + β_1 p_{t−1} + u_{t2}

4. p_t = (1/α_1)[(β_0 − α_0) − α_2 y_t + β_1 p_{t−1} + (u_{t2} − u_{t1})]

To arrive at the final form, each equation may only contain its own lags or exogenous
variables on the right-hand side. So (4) is already in final form. For (3), rewrite (1) to get

p_t = (1/α_1)[q_t^d − α_0 − α_2 y_t − u_{t1}]

Lagging this by one period gives

p_{t−1} = (1/α_1)[q_{t−1}^d − α_0 − α_2 y_{t−1} − u_{t−1,1}]

Plugging this into (3) yields

q_t^s = q_t^d = β_0 + u_{t2} + (β_1/α_1)[q_{t−1}^d − α_0 − α_2 y_{t−1} − u_{t−1,1}]
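As a quick numerical sanity check of the reduced form in (4) (this sketch is not part of the original solution; the parameter values, shocks and initial price below are made up for illustration), one can simulate the structural equations and compare:

    import numpy as np

    rng = np.random.default_rng(0)
    a0, a1, a2 = 2.0, -0.5, 0.3      # demand parameters alpha_0, alpha_1, alpha_2
    b0, b1 = 1.0, 0.4                # supply parameters beta_0, beta_1

    T = 5
    y = rng.normal(size=T)           # exogenous income
    u1 = rng.normal(size=T)          # demand shocks u_{t1}
    u2 = rng.normal(size=T)          # supply shocks u_{t2}
    p_prev = 0.0                     # arbitrary initial lagged price

    for t in range(T):
        q = b0 + b1 * p_prev + u2[t]                 # supplied = traded quantity
        p = (q - a0 - a2 * y[t] - u1[t]) / a1        # price that clears demand
        # reduced-form expression (4) for p_t
        p_reduced = ((b0 - a0) - a2 * y[t] + b1 * p_prev + (u2[t] - u1[t])) / a1
        assert np.isclose(p, p_reduced)
        p_prev = p

    print("reduced form for p_t verified")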

Exercise 1.2

The variance-covariance matrix (VCV) of X ∈ R^k is defined as Var(X) = E[(X −
EX)(X − EX)′]. The covariance matrix is given by Cov(X, Y) = E[(X − EX)(Y −
EY)′]. Show the following transformation rules, where A, B, a, b are non-random
matrices or vectors of suitable dimensions (A ∈ R^{s×k}, B ∈ R^{t×m}, a ∈ R^s, b ∈ R^t):

1. E(AX + a) = AE(X) + a

2. Cov(X, Y) = E(XY′) − (EX)(EY)′

3. Cov(AX + a, BY + b) = A[Cov(X, Y)]B′

4. Var(AX + a) = A[Var(X)]A′

Proof:

1.

E(AX + a) = AE(X) + a                                        (1)

Martin Wagner's comment on this: "follows from the properties of the integral."

Dominik's comment: "multiply out the equation and look at the i-th row."

2.

Cov(X, Y) = E[(X − EX)(Y − EY)′]
= E[XY′ − X(EY)′ − (EX)Y′ + (EX)(EY)′]
= E(XY′) − (EX)(EY)′ − (EX)(EY)′ + (EX)(EY)′
= E(XY′) − (EX)(EY)′

where the middle terms simplify because EX and EY are non-random and can be pulled
out of the expectation.

3. Rule 4 (direct proof):

Var(AX + a) = Var(AX) + Var(a)     (a is non-random, so Var(a) = 0)
= Var(AX) + 0
= E[(AX − AEX)(AX − AEX)′]
= E[AXX′A′ − AX(EX)′A′ − A(EX)X′A′ + A(EX)(EX)′A′]
= A E[XX′ − X(EX)′ − (EX)X′ + (EX)(EX)′] A′
= A[Var(X)]A′

4. Rule 3:

Cov(AX + a, BY + b) = E[(AX + a − E(AX + a))(BY + b − E(BY + b))′]
= E[(AX + a − AEX − a)(BY + b − BEY − b)′]     (using rule 1)
= E[(AX − AE(X))(BY − BE(Y))′]
= A E[(X − E(X))(Y − E(Y))′] B′
= A[Cov(X, Y)]B′

5. Alternatively, rule 4 follows from rule 3 by setting Y = X, B = A and b = a.
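As an illustration of rules 1 and 4 (not part of the original solution), the following Monte Carlo sketch uses a made-up distribution for X with known mean and VCV; both printed discrepancies should be small, up to simulation error:

    import numpy as np

    rng = np.random.default_rng(1)
    k, s, N = 3, 2, 200_000
    A = rng.normal(size=(s, k))
    a = rng.normal(size=s)

    mu = np.array([1.0, -2.0, 0.5])
    C = rng.normal(size=(k, k))
    Sigma = C @ C.T                           # population Var(X)
    X = mu + rng.normal(size=(N, k)) @ C.T    # N draws of X with E(X) = mu, Var(X) = Sigma

    Z = X @ A.T + a                           # corresponding draws of AX + a
    print(np.max(np.abs(Z.mean(axis=0) - (A @ mu + a))))      # rule 1: E(AX + a) = A E(X) + a
    emp_var = np.cov(Z, rowvar=False, bias=True)
    print(np.max(np.abs(emp_var - A @ Sigma @ A.T)))          # rule 4: Var(AX + a) = A Var(X) A'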

Exercise 1.3

Let X ∈ R^{T×k}, Y ∈ R^{T×m}, 1 = (1, . . . , 1)′ ∈ R^T. Define

1. X̄ = (1/T) 1′X and Ȳ = (1/T) 1′Y

2. V̂ar(X) = (1/T)[(X − 1X̄)′(X − 1X̄)]

3. Ĉov(X, Y) = (1/T)[(X − 1X̄)′(Y − 1Ȳ)]

For A ∈ R^{k×s}, B ∈ R^{m×t}, a ∈ R^{1×s}, b ∈ R^{1×t} derive the following transformation
rules:

1. \overline{XA + 1a} = X̄A + a

Proof:
\overline{XA + 1a} = (1/T)[1′(XA + 1a)]
= (1/T)[1′XA + (1′1)a]     (1′1 = T)
= X̄A + (1/T)T a
= X̄A + a
0
2. Ĉov(X, Y) = (1/T)X′Y − X̄′Ȳ

Proof:
Ĉov(X, Y) = (1/T)[(X − 1X̄)′(Y − 1Ȳ)]
= (1/T)[X′Y − X′1Ȳ − X̄′1′Y + X̄′1′1Ȳ]
= (1/T)X′Y − ((1/T)X′1)Ȳ − X̄′((1/T)1′Y) + (1/T)T X̄′Ȳ
= (1/T)X′Y − X̄′Ȳ − X̄′Ȳ + X̄′Ȳ
= (1/T)X′Y − X̄′Ȳ
T
3. Ĉov(XA + 1a, YB + 1b) = A′ Ĉov(X, Y) B

Proof:
Ĉov(XA + 1a, YB + 1b) = (1/T)[(XA + 1a − (1X̄A + 1a))′(YB + 1b − (1ȲB + 1b))]
= (1/T)[(XA − 1X̄A)′(YB − 1ȲB)]
= A′ (1/T)[(X − 1X̄)′(Y − 1Ȳ)] B
= A′ Ĉov(X, Y) B

4. V̂ar(XA + 1a) = A′ V̂ar(X) A

Proof:
V̂ar(XA + 1a) = (1/T)[XA + 1a − 1(\overline{XA + 1a})]′[XA + 1a − 1(\overline{XA + 1a})]
= (1/T)[(XA − 1X̄A)′(XA − 1X̄A)]
= A′ (1/T)[(X − 1X̄)′(X − 1X̄)] A
= A′ V̂ar(X) A
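A small numerical check of rules 2 to 4 (not part of the original solution; dimensions and data are made up). The sample-moment rules hold exactly, up to floating-point error:

    import numpy as np

    rng = np.random.default_rng(2)
    T, k, m, s, t = 50, 3, 2, 4, 2
    X = rng.normal(size=(T, k)); Y = rng.normal(size=(T, m))
    A = rng.normal(size=(k, s)); a = rng.normal(size=(1, s))
    B = rng.normal(size=(m, t)); b = rng.normal(size=(1, t))
    one = np.ones((T, 1))

    def cov_hat(U, V):
        """(1/T) (U - 1 Ubar)'(V - 1 Vbar), with Ubar, Vbar the row vectors of column means."""
        return (U - U.mean(axis=0)).T @ (V - V.mean(axis=0)) / T

    Xbar = one.T @ X / T                      # 1 x k row vector of column means
    Ybar = one.T @ Y / T
    # rule 2: Cov(X, Y) = (1/T) X'Y - Xbar' Ybar
    assert np.allclose(cov_hat(X, Y), X.T @ Y / T - Xbar.T @ Ybar)
    # rule 3: Cov(XA + 1a, YB + 1b) = A' Cov(X, Y) B
    assert np.allclose(cov_hat(X @ A + one @ a, Y @ B + one @ b),
                       A.T @ cov_hat(X, Y) @ B)
    # rule 4: Var(XA + 1a) = A' Var(X) A
    assert np.allclose(cov_hat(X @ A + one @ a, X @ A + one @ a),
                       A.T @ cov_hat(X, X) @ A)
    print("sample-moment transformation rules verified")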
T
Exercise 1.4

We start with a singular value decomposition (SVD) of X ∈ R^{T×k}, i.e. we have
U ∈ R^{T×T} and V ∈ R^{k×k}, both orthogonal matrices, and Σ = diag(σ_1, . . . , σ_r, 0, . . . , 0) ∈ R^{T×k},
where σ_i = √λ_i and the λ_i are the eigenvalues of X′X, such that

X = UΣV′.

We have to show X′Xβ = X′y. We plug in the SVD of X and get

X′Xβ = X′y

VΣ′ΣV′β = VΣ′U′y       | multiply by V′ from the left

Σ′Σ(V′β) = Σ′U′y

Written out component-wise this reads

(λ_1(V′β)_1, . . . , λ_r(V′β)_r, 0, . . . , 0)′ = (σ_1(U′y)_1, . . . , σ_r(U′y)_r, 0, . . . , 0)′

Define Σ⁺ := diag(1/σ_1, . . . , 1/σ_r, 0, . . . , 0) ∈ R^{k×T}. Since Σ′ΣΣ⁺ = Σ′, the vector
β := VΣ⁺U′y satisfies Σ′Σ(V′β) = Σ′ΣΣ⁺U′y = Σ′U′y, so it solves the normal equations.
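A numerical illustration (not part of the original solution; X, y and the rank-deficient column are made up) that β = VΣ⁺U′y solves the normal equations even when X does not have full column rank:

    import numpy as np

    rng = np.random.default_rng(3)
    T, k = 20, 4
    X = rng.normal(size=(T, k - 1))
    X = np.column_stack([X, X[:, 0] + X[:, 1]])   # last column is a linear combination
    y = rng.normal(size=T)

    U, s, Vt = np.linalg.svd(X)                   # X = U Sigma V'
    tol = max(T, k) * np.finfo(float).eps * s.max()
    r = int(np.sum(s > tol))                      # numerical rank
    Sigma_plus = np.zeros((k, T))
    Sigma_plus[:r, :r] = np.diag(1.0 / s[:r])     # diag(1/sigma_1, ..., 1/sigma_r, 0, ..., 0)
    beta = Vt.T @ Sigma_plus @ U.T @ y            # beta = V Sigma+ U' y

    assert np.allclose(X.T @ X @ beta, X.T @ y)   # the normal equations hold
    assert np.allclose(beta, np.linalg.pinv(X) @ y)
    print("V Sigma+ U' y solves the normal equations")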

Exercise 1.5

Show that X(X′X)^{-1}X′ is the orthogonal projector onto the column space spanned by
X, and show that I − X(X′X)^{-1}X′ is the projector onto the ortho-complement of the
column space spanned by X.
Assumption: X′X is invertible. Define P := X(X′X)^{-1}X′. Note that
P² = X(X′X)^{-1}(X′X)(X′X)^{-1}X′ = X(X′X)^{-1}X′ = P,
so P is idempotent.

Proof:

⟨a, Pb⟩ = a′X(X′X)^{-1}X′b = (X[(X′X)^{-1}]′X′a)′b = ⟨Pa, b⟩              (2)

Since [(X′X)^{-1}]′ = (X′X)^{-1}, this shows that P is symmetric.
Secondly, we show that P projects onto the space of vectors Xb by showing that the
remainder term a − Pa is orthogonal to every such vector:

⟨a − Pa, Xb⟩ = (a − Pa)′Xb                                                (3)
= a′Xb − a′X(X′X)^{-1}X′X b                                               (4)
= a′Xb − a′Xb = 0                                                         (5)

The projection onto the ortho-complement is given by (I − P)a = a − Pa. We have
(I − P)² = I − 2P + P² = I − P, and the symmetry of P implies that I − P is symmetric.
Showing that (I − P)a, for any given a, lies in the ortho-complement of the column space
of X is equivalent to showing that (I − P)a is orthogonal to every vector Xb, which is
algebraically the same computation as above.
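A short numerical check of these projector properties (not part of the original solution; X is a made-up full-rank matrix):

    import numpy as np

    rng = np.random.default_rng(4)
    T, k = 15, 3
    X = rng.normal(size=(T, k))                 # full column rank with probability one
    P = X @ np.linalg.inv(X.T @ X) @ X.T
    M = np.eye(T) - P

    assert np.allclose(P, P.T)                  # P is symmetric
    assert np.allclose(P @ P, P)                # P is idempotent
    assert np.allclose(P @ X, X)                # P leaves the column space of X fixed
    assert np.allclose(X.T @ M, 0)              # I - P maps into the ortho-complement
    print("P and I - P behave as orthogonal projectors")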

Exercise 1.6

Part (i)

Suppose β⁺ = (X′X)⁺X′y is not a solution to the normal equations. Then:

X′Xβ⁺ ≠ X′y

This implies:
(X′X)(X′X)⁺X′y ≠ X′y

Using the singular value decomposition from Exercise 1.4 we know X = UΣV′, so that
X′X = OΛO′ with O := V (the matrix of eigenvectors of X′X) and Λ := Σ′Σ, and
hence (X′X)⁺ = OΛ⁺O′ and X′y = OΣ′U′y. Plugging in:

OΛO′ OΛ⁺O′ OΣ′U′y ≠ OΣ′U′y

O I_r Σ′U′y ≠ OΣ′U′y

X′y ≠ X′y

which is a contradiction (ΛΛ⁺ = I_r, and I_rΣ′ = Σ′ since only the first r rows of Σ′ are
non-zero). Hence β⁺ solves the normal equations.

Part (ii)

Show that a given β with X′Xβ = 0 implies β′β⁺ = 0.

For this we show that X′Xβ = 0 implies (X′X)⁺β = 0:

X′Xβ = OΛO′β = 0

O′OΛO′β = ΛO′β = O′0 = 0

since O is orthonormal. Furthermore we know that Λ⁺ = Λ⁺Λ⁺Λ, so:

Λ⁺O′β = Λ⁺Λ⁺ΛO′β = Λ⁺Λ⁺0 = 0

OΛ⁺O′β = (X′X)⁺β = 0

The transpose of the latter term is equally zero: β′(X′X)⁺ = 0′. So we have

β′β⁺ = β′(X′X)⁺X′y = 0′X′y = 0

Part (iii)

Show that ||β⁺|| ≤ ||β|| for any β that is a solution to X′Xβ = X′y:

||β⁺|| = ||(X′X)⁺X′y|| = ||(X′X)⁺(X′X)β|| = ||OΛ⁺O′OΛO′β|| = ||OΛ⁺ΛO′β||

since (X′X)⁺ = OΛ⁺O′ and X′X = OΛO′. Moreover O′O = I since O is orthonormal.

Denote by I_r the "pseudo-identity matrix", i.e. the diagonal matrix whose first r
diagonal entries equal one and whose remaining entries equal zero:

I_r = diag(1, . . . , 1, 0, . . . , 0)

As can easily be seen, Λ⁺Λ = I_r. Since multiplication by the orthogonal matrices O and
O′ leaves the Euclidean norm unchanged,

||OΛ⁺ΛO′β|| = ||O I_r O′β|| = ||I_r O′β|| ≤ ||O′β|| = ||β||

So ||β⁺|| ≤ ||β||. The inequality in the middle follows from the fact that I_r(O′β) is a
vector whose first r entries equal those of O′β while the entries from r + 1 to k are zero.
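The three parts can also be checked numerically; the sketch below (not part of the original solution) uses a made-up rank-deficient X, so the normal equations have many solutions:

    import numpy as np

    rng = np.random.default_rng(5)
    T, k = 20, 4
    X = rng.normal(size=(T, k - 1))
    X = np.column_stack([X, X[:, 0] - X[:, 2]])   # rank deficient
    y = rng.normal(size=T)

    XtX, Xty = X.T @ X, X.T @ y
    beta_plus = np.linalg.pinv(XtX) @ Xty         # beta+ = (X'X)+ X'y

    _, _, Vt = np.linalg.svd(XtX)
    null_dir = Vt[-1]                             # direction with X'X null_dir = 0
    beta_other = beta_plus + 2.0 * null_dir       # another solution of the normal equations

    assert np.allclose(XtX @ beta_plus, Xty)      # part (i): solves the normal equations
    assert np.allclose(XtX @ beta_other, Xty)
    assert np.isclose(beta_plus @ null_dir, 0.0)  # part (ii): beta' beta+ = 0
    assert np.linalg.norm(beta_plus) <= np.linalg.norm(beta_other)   # part (iii)
    print("beta+ is the minimum-norm solution of the normal equations")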

Exercise 1.7

(1) Show that R² as defined in class for the inhomogeneous regression (including the
constant term) is equal to

R² = r²_{yŷ} = s²_{yŷ} / (s_{yy} s_{ŷŷ})

Definitions:

1. s_{yŷ} = (1/T) Σ_{i=1}^T (y_i − ȳ)(ŷ_i − \overline{ŷ})

2. s_{yy} = (1/T) Σ_{i=1}^T (y_i − ȳ)²

3. R² = s_{ŷŷ} / s_{yy}

Hint: Show that s_{yŷ} = s_{ŷŷ}.


Starting from the definitions: because the regression contains a constant, the residuals
sum to zero, so 1′Y = 1′Ŷ and hence \overline{Ŷ} = Ȳ. From

s_{YŶ} = (1/T)(Y′Ŷ − T Ȳ \overline{Ŷ}) = (1/T)(Y′Ŷ − T Ȳ²)

s_{ŶŶ} = (1/T)(Ŷ′Ŷ − T \overline{Ŷ}²) = (1/T)(Ŷ′Ŷ − T Ȳ²)

it remains to show that ⟨Y, Ŷ⟩ = ⟨Ŷ, Ŷ⟩.

To show ⟨Y, Ŷ⟩ = ⟨Ŷ, Ŷ⟩:
Proof:

⟨Ŷ, Ŷ⟩ = ⟨Y − û, Ŷ⟩
= ⟨Y, Ŷ⟩ − ⟨û, Ŷ⟩
= ⟨Y, Ŷ⟩

since ⟨û, Ŷ⟩ = 0 (the residuals are orthogonal to the fitted values).

(2) Show that R² = 0 if the constant is the only regressor.

Proof:
If the constant is the only regressor then X ∈ R^{T×1}, so the linear regression model
reads

Y_{T×1} = X_{T×1} β_{1×1} + u_{T×1}

So X is a column vector of dimension T×1 with x_i = 1 ∀ i = 1, . . . , T, which we will
denote by 1. The least squares estimator β_LS = (X′X)^{-1}X′Y in this case becomes:

β_LS = [1′1]^{-1} 1′Y
= T^{-1} 1′Y
= [1/T, 1/T, . . . , 1/T][Y_1, Y_2, Y_3, . . . , Y_T]′
= (1/T) Σ_{i=1}^T Y_i
= Ȳ

Hence Ŷ = Xβ_LS = 1Ȳ = [Ȳ, Ȳ, . . . , Ȳ]′. If we reconsider the expression for
R², it will be zero if s_{YŶ} = 0.

Calculating s_{YŶ} for our specific Ŷ gives us:

s_{YŶ} = (1/T) Σ_{i=1}^T (Y_i − Ȳ)(Ŷ_i − \overline{Ŷ})
= (1/T) Σ_{i=1}^T (Y_i − Ȳ)(Ȳ − Ȳ)
= (1/T) Σ_{i=1}^T (Y_i − Ȳ) × 0
= 0

So R² will always be zero if we regress Y on the constant alone.
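A quick numerical confirmation (not part of the original solution; the data are made up): regressing y on a constant alone gives fitted values equal to ȳ, so both s_{yŷ} and s_{ŷŷ} are zero.

    import numpy as np

    rng = np.random.default_rng(6)
    T = 30
    y = rng.normal(size=T) + 5.0
    X = np.ones((T, 1))                           # the constant is the only regressor

    beta = np.linalg.lstsq(X, y, rcond=None)[0]   # OLS coefficient
    y_hat = X @ beta

    s_y_yhat = np.mean((y - y.mean()) * (y_hat - y_hat.mean()))
    s_hh = np.mean((y_hat - y_hat.mean()) ** 2)

    assert np.isclose(beta[0], y.mean())          # beta_LS equals the sample mean
    assert np.isclose(s_y_yhat, 0.0) and np.isclose(s_hh, 0.0)
    print("R^2 = 0 when the constant is the only regressor")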

Exercise 1.8

We have to show

(y − Xβ)′(y − Xβ) = (y − Xβ̂)′(y − Xβ̂) + (β − β̂)′X′X(β − β̂)

Multiplying out (y − Xβ)′(y − Xβ) and expanding with Xβ̂ yields

(y − Xβ)′(y − Xβ) = [(y − Xβ̂) − (Xβ − Xβ̂)]′[(y − Xβ̂) − (Xβ − Xβ̂)]
= [(y − Xβ̂)′ − (X(β − β̂))′][(y − Xβ̂) − (X(β − β̂))]
= (y − Xβ̂)′(y − Xβ̂) − (y − Xβ̂)′(X(β − β̂)) − (X(β − β̂))′(y − Xβ̂) + (X(β − β̂))′(X(β − β̂))
= (y − Xβ̂)′(y − Xβ̂) + (β − β̂)′X′X(β − β̂) − (y − ŷ)′(Xβ − ŷ) − (Xβ − ŷ)′(y − ŷ)
= (y − Xβ̂)′(y − Xβ̂) + (β − β̂)′X′X(β − β̂) − û′Xβ + û′ŷ − β′X′û + ŷ′û
= (y − Xβ̂)′(y − Xβ̂) + (β − β̂)′X′X(β − β̂)

where y − Xβ̂ = y − ŷ = û, and the last four terms vanish because the normal equations
imply X′û = 0 (and hence û′X = 0′, û′ŷ = û′Xβ̂ = 0 and ŷ′û = 0).
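The identity is easy to verify numerically; the following sketch (not part of the original solution) uses made-up data and an arbitrary candidate β:

    import numpy as np

    rng = np.random.default_rng(7)
    T, k = 25, 3
    X = rng.normal(size=(T, k))
    y = rng.normal(size=T)
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)   # OLS estimator
    beta = rng.normal(size=k)                      # an arbitrary candidate beta

    lhs = (y - X @ beta) @ (y - X @ beta)
    rhs = ((y - X @ beta_hat) @ (y - X @ beta_hat)
           + (beta - beta_hat) @ X.T @ X @ (beta - beta_hat))
    assert np.isclose(lhs, rhs)
    print("sum-of-squares decomposition verified")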

Exercise 1.9

Show the second claim of item (iii) of the Frisch-Waugh theorem as discussed.
Frisch-Waugh theorem: We partition the regressor matrix X into X = [X_1, X_2]
with X_1 ∈ R^{T×k_1}, X_2 ∈ R^{T×k_2} and assume that rk(X) = k_1 + k_2. Then the
residuals of

1. y = X_1β_1 + X_2β_2 + u

are the same as when regressing

2. β̂_2 = (X̃_2′X̃_2)^{-1}(X̃_2′ỹ)

with P_1 = X_1(X_1′X_1)^{-1}X_1′ and M_1 = I − P_1. Here we first regress y on X_1 and
denote the residuals of this regression by ỹ = M_1y. In a second step we then
regress X_2 on X_1 and again compute the residuals, denoted X̃_2 = M_1X_2. In
a third step we use the formerly computed residuals and run the regression as
stated in (2). In the lecture it was shown that the residuals of (2) are the same
as the ones of (1). We are now asked to show that when running

3. β̂_2 = (X̃_2′X̃_2)^{-1}(X̃_2′y)

the residuals of (3) are not equal to those of (1)=(2). In (3) we use the original
y-variable instead of ỹ.
Proof:
Write the normal equations in partitioned form:

(1*) X_1′X_1β_1 + X_1′X_2β_2 = X_1′y

(2*) X_2′X_1β_1 + X_2′X_2β_2 = X_2′y

Now consider the first equation (1*):

X_1′X_1β_1 + X_1′X_2β_2 = X_1′y                          (6)
X_1β_1 + P_1X_2β_2 = P_1y                                (7)
X_1β_1 = −P_1X_2β_2 + P_1y                               (8)

To get from (6) to (7) we premultiply (6) by X_1(X_1′X_1)^{-1}. Now look at equation
(2*) and plug in the expression for X_1β_1:

X_2′X_1β_1 + X_2′X_2β_2 = X_2′y                          (9)
X_2′[−P_1X_2β_2 + P_1y] + X_2′X_2β_2 = X_2′y             (10)
−X_2′P_1X_2β_2 + X_2′P_1y + X_2′X_2β_2 = X_2′y           (11)
−X_2′P_1X_2β_2 + X_2′X_2β_2 = X_2′y − X_2′P_1y           (12)
X_2′[I − P_1]X_2β_2 = X_2′[I − P_1]y                     (13)
X_2′[I − P_1]′[I − P_1]X_2β_2 = X_2′[I − P_1]y           (14)
(the projector I − P_1 is idempotent and symmetric)
X̃_2′X̃_2β_2 = X̃_2′y                                      (15)
β̂_2 = (X̃_2′X̃_2)^{-1}(X̃_2′y)                             (16)

So (2) and (3) yield the same coefficient β̂_2 (since X̃_2′y = X̃_2′ỹ). The residuals of (2),
û = ỹ − X̃_2β̂_2, however, do not equal the residuals of (3), u* = y − X̃_2β̂_2; they coincide
only in the case when y equals ỹ.
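This can be illustrated numerically (a sketch, not part of the original solution; the data-generating values are made up): regressions (2) and (3) give the same β̂_2, but only the residuals of (2) coincide with those of the full regression (1).

    import numpy as np

    rng = np.random.default_rng(8)
    T, k1, k2 = 40, 2, 2
    X1 = rng.normal(size=(T, k1))
    X2 = rng.normal(size=(T, k2)) + X1 @ rng.normal(size=(k1, k2))  # correlated with X1
    y = X1 @ np.array([1.0, -1.0]) + X2 @ np.array([0.5, 2.0]) + rng.normal(size=T)

    M1 = np.eye(T) - X1 @ np.linalg.inv(X1.T @ X1) @ X1.T
    y_tilde, X2_tilde = M1 @ y, M1 @ X2

    # (1) full regression
    X = np.column_stack([X1, X2])
    b_full = np.linalg.lstsq(X, y, rcond=None)[0]
    u_full = y - X @ b_full
    # (2) y_tilde on X2_tilde
    b2 = np.linalg.solve(X2_tilde.T @ X2_tilde, X2_tilde.T @ y_tilde)
    u2 = y_tilde - X2_tilde @ b2
    # (3) original y on X2_tilde
    b3 = np.linalg.solve(X2_tilde.T @ X2_tilde, X2_tilde.T @ y)
    u3 = y - X2_tilde @ b3

    assert np.allclose(b2, b3)             # same coefficient: X2_tilde' y = X2_tilde' y_tilde
    assert np.allclose(u_full, u2)         # residuals of (2) equal those of (1)
    assert not np.allclose(u_full, u3)     # residuals of (3) differ
    print("Frisch-Waugh residual claim illustrated")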

Exercise 1.10

We show that, for X of full column rank and any matrix C with CX = L,

CC′ = (LX⁺)(LX⁺)′ + (C − LX⁺)(C − LX⁺)′

Expanding the right-hand side:

(LX⁺)(LX⁺)′ + (C − LX⁺)(C − LX⁺)′
= (LX⁺)(LX⁺)′ + CC′ + (LX⁺)(LX⁺)′ − C(LX⁺)′ − (LX⁺)C′
= CC′ + 2L(X′X)^{-1}X′[(X′X)^{-1}X′]′L′ − CX(X′X)^{-1}L′ − L(X′X)^{-1}X′C′
= CC′ + 2L(X′X)^{-1}X′X(X′X)^{-1}L′ − CX(X′X)^{-1}L′ − L(X′X)^{-1}X′C′
= CC′ + 2L(X′X)^{-1}L′ − L(X′X)^{-1}L′ − L(X′X)^{-1}L′
= CC′

Using the following facts:

• as X has full rank, we know that X⁺ = (X′X)^{-1}X′,

• [(X′X)^{-1}]′ = [(X′X)′]^{-1} = (X′X)^{-1},

• CX = L and therefore X′C′ = L′.
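A numerical sanity check of the decomposition (not part of the original solution); the particular C below is constructed only for the example, as LX⁺ plus a component orthogonal to the column space of X:

    import numpy as np

    rng = np.random.default_rng(9)
    T, k, s = 20, 3, 2
    X = rng.normal(size=(T, k))                   # full column rank
    L = rng.normal(size=(s, k))
    X_plus = np.linalg.inv(X.T @ X) @ X.T         # X+ = (X'X)^{-1} X'

    M = np.eye(T) - X @ X_plus                    # projector on the ortho-complement
    C = L @ X_plus + rng.normal(size=(s, T)) @ M  # satisfies CX = L
    assert np.allclose(C @ X, L)

    lhs = C @ C.T
    rhs = (L @ X_plus) @ (L @ X_plus).T + (C - L @ X_plus) @ (C - L @ X_plus).T
    assert np.allclose(lhs, rhs)
    print("decomposition CC' = (LX+)(LX+)' + (C - LX+)(C - LX+)' verified")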

Exercise 1.11

The mean squared error (MSE) of an estimator β̃ of β is defined as MSE(β̃) =
E[(β̃ − β)′(β̃ − β)]. Show the following claims:

(i) If β̃ is unbiased with VCV Σ_{β̃β̃}, then it holds that MSE(β̃) = tr(Σ_{β̃β̃}), where
tr denotes the trace of a matrix.

For any estimator of β ∈ R^k we can rewrite

MSE(β̃) = E[(β̃ − β)′(β̃ − β)] = E[Σ_{i=1}^k (β̃_i − β_i)²]

Now since β̃ is unbiased, E(β̃) = β, and thus Σ_{β̃β̃} = E[(β̃ − β)(β̃ − β)′].

Further, tr(Σ_{β̃β̃}) = Σ_{i=1}^k E[(β̃_i − β_i)²] = E[Σ_{i=1}^k (β̃_i − β_i)²] = MSE(β̃). QED
(ii) Let β̃_1 and β̃_2 be two unbiased estimators with covariance matrices Σ_{β̃_1β̃_1} and Σ_{β̃_2β̃_2}.
Show that it holds that

Σ_{β̃_1β̃_1} ≤ Σ_{β̃_2β̃_2} ⇒ MSE(β̃_1) ≤ MSE(β̃_2)

What does this imply for the OLS estimator?

Define ∆ := Σ_{β̃_2β̃_2} − Σ_{β̃_1β̃_1}. Since Σ_{β̃_1β̃_1} ≤ Σ_{β̃_2β̃_2}, ∆ must be non-negative definite.
Now, using the result from (i), write

MSE(β̃_2) − MSE(β̃_1) = tr(Σ_{β̃_2β̃_2}) − tr(Σ_{β̃_1β̃_1}) = tr(Σ_{β̃_2β̃_2} − Σ_{β̃_1β̃_1}) = tr(∆) = Σ_{i=1}^k e_i′∆e_i ≥ 0

where e_i ∈ R^k is the vector with a 1 in the i-th entry and zeros elsewhere. Because ∆ is
non-negative definite, each term e_i′∆e_i in this sum is non-negative, and therefore so is
the total sum. Thus MSE(β̃_1) ≤ MSE(β̃_2). QED

Since the OLS estimator β̂ has the "smallest" VCV matrix among all linear unbiased
estimators of β, this implies that it also has smaller or equal MSE than any other
estimator in this class.
(iii) Minimize MSE(β̃) over all linear unbiased estimators β̃. From the lecture we
know that Σ_{β̃β̃} = σ²DD′ for every linear unbiased estimator β̃ = Dy of β (with DX = I).
From (i):

MSE(β̃) = tr(Σ_{β̃β̃}) = tr(σ²DD′)

Using the decomposition lemma (Exercise 1.10 with C = D and L = I), we can write

DD′ = (X⁺)(X⁺)′ + (D − X⁺)(D − X⁺)′

hence

tr(σ²DD′) = tr[σ²(X⁺)(X⁺)′ + σ²(D − X⁺)(D − X⁺)′]
= σ² tr[(X⁺)(X⁺)′] + σ² tr[(D − X⁺)(D − X⁺)′]

The first term is independent of D (and non-negative). We minimize the second term
over D. Write R := D − X⁺ with columns r_1, . . . , r_T. Then

tr(RR′) = tr(R′R) = Σ_{i=1}^T ||r_i||² ≥ 0     since ||r_i||² ≥ 0 ∀i

and tr(RR′) = 0 if and only if r_i = 0 for all i, which is equivalent to D = X⁺.
This implies that tr(σ²DD′) is minimized for D = X⁺.
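As a final numerical illustration (not part of the original solution; the alternative D below is made up), any unbiased choice D other than X⁺ yields a weakly larger trace:

    import numpy as np

    rng = np.random.default_rng(10)
    T, k = 20, 3
    sigma2 = 1.5
    X = rng.normal(size=(T, k))
    X_plus = np.linalg.inv(X.T @ X) @ X.T          # OLS corresponds to D = X+

    M = np.eye(T) - X @ X_plus
    D_alt = X_plus + rng.normal(size=(k, T)) @ M   # another D with DX = I
    assert np.allclose(D_alt @ X, np.eye(k))       # still unbiased

    mse_ols = sigma2 * np.trace(X_plus @ X_plus.T)
    mse_alt = sigma2 * np.trace(D_alt @ D_alt.T)
    assert mse_ols <= mse_alt
    print("tr(sigma^2 DD') is minimized at D = X+:", mse_ols, "<=", mse_alt)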
