
Solutions for Econometrics I Homework No. 1
due 2006-02-20
Feldkircher, Forstner, Ghoddusi, Grafenhofer, Pichler, Reiss, Yan, Zeugner

Exercise 1.1

Structural form of the problem:

1. q_t^d = α_0 + α_1 p_t + α_2 y_t + u_{t1}

2. q_t^s = β_0 + β_1 p_{t−1} + u_{t2}

To get the reduced form, solve the system of equations for the endogenous
variables:

3. q_t^s = q_t^d = β_0 + β_1 p_{t−1} + u_{t2}

4. p_t = (1/α_1)[(β_0 − α_0) − α_2 y_t + β_1 p_{t−1} + (u_{t2} − u_{t1})]

To arrive at the final form, each equation may only contain its own lags or exogenous
variables on the right-hand side. So (4) is already in final form. For (3), rewrite (1) to get

p_t = (1/α_1)[q_t^d − α_0 − α_2 y_t − u_{t1}]

Lagging this by one period gives

p_{t−1} = (1/α_1)[q_{t−1}^d − α_0 − α_2 y_{t−1} − u_{t−1,1}]

Plugging this into (3) yields

q_t^s = q_t^d = β_0 + u_{t2} + (β_1/α_1)[q_{t−1}^d − α_0 − α_2 y_{t−1} − u_{t−1,1}]
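As a quick numerical sanity check of the reduced form in (4) (this sketch is not part of the original solution; the parameter values, shocks and initial price below are made up for illustration), one can simulate the structural equations and compare:

    import numpy as np

    rng = np.random.default_rng(0)
    a0, a1, a2 = 2.0, -0.5, 0.3      # demand parameters alpha_0, alpha_1, alpha_2
    b0, b1 = 1.0, 0.4                # supply parameters beta_0, beta_1

    T = 5
    y = rng.normal(size=T)           # exogenous income
    u1 = rng.normal(size=T)          # demand shocks u_{t1}
    u2 = rng.normal(size=T)          # supply shocks u_{t2}
    p_prev = 0.0                     # arbitrary initial lagged price

    for t in range(T):
        q = b0 + b1 * p_prev + u2[t]                 # supplied = traded quantity
        p = (q - a0 - a2 * y[t] - u1[t]) / a1        # price that clears demand
        # reduced-form expression (4) for p_t
        p_reduced = ((b0 - a0) - a2 * y[t] + b1 * p_prev + (u2[t] - u1[t])) / a1
        assert np.isclose(p, p_reduced)
        p_prev = p

    print("reduced form for p_t verified")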

Exercise 1.2

The variance-covariance matrix (VCV) of X ∈ R^k is defined as Var(X) = E[(X −
EX)(X − EX)′]. The covariance matrix is given by Cov(X, Y) = E[(X − EX)(Y −
EY)′]. Show the following transformation rules, where A, B, a, b are non-random
matrices or vectors of suitable dimensions (A ∈ R^{s×k}, B ∈ R^{t×m}, a ∈ R^s, b ∈ R^t):

1. E(AX + a) = AE(X) + a

2. Cov(X, Y) = E(XY′) − (EX)(EY)′

3. Cov(AX + a, BY + b) = A[Cov(X, Y)]B′

4. Var(AX + a) = A[Var(X)]A′

Proof:

1.

E(AX + a) = AE(X) + a                                        (1)

Martin Wagner's comment on this: "follows from the properties of the integral."

Dominik's comment: "multiply out the equation and look at the i-th row."

2.

Cov(X, Y) = E[(X − EX)(Y − EY)′]
= E[XY′ − X(EY)′ − (EX)Y′ + (EX)(EY)′]
= E(XY′) − (EX)(EY)′ − (EX)(EY)′ + (EX)(EY)′
= E(XY′) − (EX)(EY)′

where the middle terms simplify because EX and EY are non-random and can be pulled
out of the expectation.

3. Rule 4 (direct proof):

Var(AX + a) = Var(AX) + Var(a)     (a is non-random, so Var(a) = 0)
= Var(AX) + 0
= E[(AX − AEX)(AX − AEX)′]
= E[AXX′A′ − AX(EX)′A′ − A(EX)X′A′ + A(EX)(EX)′A′]
= A E[XX′ − X(EX)′ − (EX)X′ + (EX)(EX)′] A′
= A[Var(X)]A′

4. Rule 3:

Cov(AX + a, BY + b) = E[(AX + a − E(AX + a))(BY + b − E(BY + b))′]
= E[(AX + a − AEX − a)(BY + b − BEY − b)′]     (using rule 1)
= E[(AX − AE(X))(BY − BE(Y))′]
= A E[(X − E(X))(Y − E(Y))′] B′
= A[Cov(X, Y)]B′

5. Alternatively, rule 4 follows from rule 3 by setting Y = X, B = A and b = a.
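As an illustration of rules 1 and 4 (not part of the original solution), the following Monte Carlo sketch uses a made-up distribution for X with known mean and VCV; both printed discrepancies should be small, up to simulation error:

    import numpy as np

    rng = np.random.default_rng(1)
    k, s, N = 3, 2, 200_000
    A = rng.normal(size=(s, k))
    a = rng.normal(size=s)

    mu = np.array([1.0, -2.0, 0.5])
    C = rng.normal(size=(k, k))
    Sigma = C @ C.T                           # population Var(X)
    X = mu + rng.normal(size=(N, k)) @ C.T    # N draws of X with E(X) = mu, Var(X) = Sigma

    Z = X @ A.T + a                           # corresponding draws of AX + a
    print(np.max(np.abs(Z.mean(axis=0) - (A @ mu + a))))      # rule 1: E(AX + a) = A E(X) + a
    emp_var = np.cov(Z, rowvar=False, bias=True)
    print(np.max(np.abs(emp_var - A @ Sigma @ A.T)))          # rule 4: Var(AX + a) = A Var(X) A'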

Exercise 1.3

Let X ∈ R^{T×k}, Y ∈ R^{T×m}, 1 = (1, . . . , 1)′ ∈ R^T. Define

1. X̄ = (1/T) 1′X and Ȳ = (1/T) 1′Y

2. V̂ar(X) = (1/T)[(X − 1X̄)′(X − 1X̄)]

3. Ĉov(X, Y) = (1/T)[(X − 1X̄)′(Y − 1Ȳ)]

For A ∈ R^{k×s}, B ∈ R^{m×t}, a ∈ R^{1×s}, b ∈ R^{1×t} derive the following transformation
rules:

1. \overline{XA + 1a} = X̄A + a

Proof:
\overline{XA + 1a} = (1/T)[1′(XA + 1a)]
= (1/T)[1′XA + (1′1)a]     (1′1 = T)
= X̄A + (1/T)T a
= X̄A + a
0
2. Ĉov(X, Y) = (1/T)X′Y − X̄′Ȳ

Proof:
Ĉov(X, Y) = (1/T)[(X − 1X̄)′(Y − 1Ȳ)]
= (1/T)[X′Y − X′1Ȳ − X̄′1′Y + X̄′1′1Ȳ]
= (1/T)X′Y − ((1/T)X′1)Ȳ − X̄′((1/T)1′Y) + (1/T)T X̄′Ȳ
= (1/T)X′Y − X̄′Ȳ − X̄′Ȳ + X̄′Ȳ
= (1/T)X′Y − X̄′Ȳ
T
3. Ĉov(XA + 1a, YB + 1b) = A′ Ĉov(X, Y) B

Proof:
Ĉov(XA + 1a, YB + 1b) = (1/T)[(XA + 1a − (1X̄A + 1a))′(YB + 1b − (1ȲB + 1b))]
= (1/T)[(XA − 1X̄A)′(YB − 1ȲB)]
= A′ (1/T)[(X − 1X̄)′(Y − 1Ȳ)] B
= A′ Ĉov(X, Y) B

4. V̂ar(XA + 1a) = A′ V̂ar(X) A

Proof:
V̂ar(XA + 1a) = (1/T)[XA + 1a − 1(\overline{XA + 1a})]′[XA + 1a − 1(\overline{XA + 1a})]
= (1/T)[(XA − 1X̄A)′(XA − 1X̄A)]
= A′ (1/T)[(X − 1X̄)′(X − 1X̄)] A
= A′ V̂ar(X) A
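A small numerical check of rules 2 to 4 (not part of the original solution; dimensions and data are made up). The sample-moment rules hold exactly, up to floating-point error:

    import numpy as np

    rng = np.random.default_rng(2)
    T, k, m, s, t = 50, 3, 2, 4, 2
    X = rng.normal(size=(T, k)); Y = rng.normal(size=(T, m))
    A = rng.normal(size=(k, s)); a = rng.normal(size=(1, s))
    B = rng.normal(size=(m, t)); b = rng.normal(size=(1, t))
    one = np.ones((T, 1))

    def cov_hat(U, V):
        """(1/T) (U - 1 Ubar)'(V - 1 Vbar), with Ubar, Vbar the row vectors of column means."""
        return (U - U.mean(axis=0)).T @ (V - V.mean(axis=0)) / T

    Xbar = one.T @ X / T                      # 1 x k row vector of column means
    Ybar = one.T @ Y / T
    # rule 2: Cov(X, Y) = (1/T) X'Y - Xbar' Ybar
    assert np.allclose(cov_hat(X, Y), X.T @ Y / T - Xbar.T @ Ybar)
    # rule 3: Cov(XA + 1a, YB + 1b) = A' Cov(X, Y) B
    assert np.allclose(cov_hat(X @ A + one @ a, Y @ B + one @ b),
                       A.T @ cov_hat(X, Y) @ B)
    # rule 4: Var(XA + 1a) = A' Var(X) A
    assert np.allclose(cov_hat(X @ A + one @ a, X @ A + one @ a),
                       A.T @ cov_hat(X, X) @ A)
    print("sample-moment transformation rules verified")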
T
Exercise 1.4

We start with a singular value decomposition (SVD) of X ∈ R^{T×k}, i.e. we have
U ∈ R^{T×T} and V ∈ R^{k×k}, both orthogonal matrices, and Σ = diag(σ_1, . . . , σ_r, 0, . . . , 0) ∈ R^{T×k},
where σ_i = √λ_i and the λ_i are the eigenvalues of X′X, such that

X = UΣV′.

We have to show X′Xβ = X′y. We plug in the SVD of X and get

X′Xβ = X′y

VΣ′ΣV′β = VΣ′U′y       | multiply by V′ from the left

Σ′Σ(V′β) = Σ′U′y

Written out component-wise this reads

(λ_1(V′β)_1, . . . , λ_r(V′β)_r, 0, . . . , 0)′ = (σ_1(U′y)_1, . . . , σ_r(U′y)_r, 0, . . . , 0)′

Define Σ⁺ := diag(1/σ_1, . . . , 1/σ_r, 0, . . . , 0) ∈ R^{k×T}. Since Σ′ΣΣ⁺ = Σ′, the vector
β := VΣ⁺U′y satisfies Σ′Σ(V′β) = Σ′ΣΣ⁺U′y = Σ′U′y, so it solves the normal equations.
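A numerical illustration (not part of the original solution; X, y and the rank-deficient column are made up) that β = VΣ⁺U′y solves the normal equations even when X does not have full column rank:

    import numpy as np

    rng = np.random.default_rng(3)
    T, k = 20, 4
    X = rng.normal(size=(T, k - 1))
    X = np.column_stack([X, X[:, 0] + X[:, 1]])   # last column is a linear combination
    y = rng.normal(size=T)

    U, s, Vt = np.linalg.svd(X)                   # X = U Sigma V'
    tol = max(T, k) * np.finfo(float).eps * s.max()
    r = int(np.sum(s > tol))                      # numerical rank
    Sigma_plus = np.zeros((k, T))
    Sigma_plus[:r, :r] = np.diag(1.0 / s[:r])     # diag(1/sigma_1, ..., 1/sigma_r, 0, ..., 0)
    beta = Vt.T @ Sigma_plus @ U.T @ y            # beta = V Sigma+ U' y

    assert np.allclose(X.T @ X @ beta, X.T @ y)   # the normal equations hold
    assert np.allclose(beta, np.linalg.pinv(X) @ y)
    print("V Sigma+ U' y solves the normal equations")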

Exercise 1.5

Show that X(X′X)^{-1}X′ is the orthogonal projector onto the column space spanned by
X, and show that I − X(X′X)^{-1}X′ is the projector onto the ortho-complement of the
column space spanned by X.
Assumption: X′X is invertible. Define P := X(X′X)^{-1}X′. Note that
P² = X(X′X)^{-1}(X′X)(X′X)^{-1}X′ = X(X′X)^{-1}X′ = P,
so P is idempotent.

Proof:

⟨a, Pb⟩ = a′X(X′X)^{-1}X′b = (X[(X′X)^{-1}]′X′a)′b = ⟨Pa, b⟩              (2)

Since [(X′X)^{-1}]′ = (X′X)^{-1}, this shows that P is symmetric.
Secondly, we show that P projects onto the space of vectors Xb by showing that the
remainder term a − Pa is orthogonal to every such vector:

⟨a − Pa, Xb⟩ = (a − Pa)′Xb                                                (3)
= a′Xb − a′X(X′X)^{-1}X′X b                                               (4)
= a′Xb − a′Xb = 0                                                         (5)

The projection onto the ortho-complement is given by (I − P)a = a − Pa. We have
(I − P)² = I − 2P + P² = I − P, and the symmetry of P implies that I − P is symmetric.
Showing that (I − P)a, for any given a, lies in the ortho-complement of the column space
of X is equivalent to showing that (I − P)a is orthogonal to every vector Xb, which is
algebraically the same computation as above.
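A short numerical check of these projector properties (not part of the original solution; X is a made-up full-rank matrix):

    import numpy as np

    rng = np.random.default_rng(4)
    T, k = 15, 3
    X = rng.normal(size=(T, k))                 # full column rank with probability one
    P = X @ np.linalg.inv(X.T @ X) @ X.T
    M = np.eye(T) - P

    assert np.allclose(P, P.T)                  # P is symmetric
    assert np.allclose(P @ P, P)                # P is idempotent
    assert np.allclose(P @ X, X)                # P leaves the column space of X fixed
    assert np.allclose(X.T @ M, 0)              # I - P maps into the ortho-complement
    print("P and I - P behave as orthogonal projectors")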

Exercise 1.6

Part (i)

Suppose β⁺ = (X′X)⁺X′y is not a solution to the normal equations. Then:

X′Xβ⁺ ≠ X′y

This implies:
(X′X)(X′X)⁺X′y ≠ X′y

Using the singular value decomposition from Exercise 1.4 we know X = UΣV′, so that
X′X = OΛO′ with O := V (the matrix of eigenvectors of X′X) and Λ := Σ′Σ, and
hence (X′X)⁺ = OΛ⁺O′ and X′y = OΣ′U′y. Plugging in:

OΛO′ OΛ⁺O′ OΣ′U′y ≠ OΣ′U′y

O I_r Σ′U′y ≠ OΣ′U′y

X′y ≠ X′y

which is a contradiction (ΛΛ⁺ = I_r, and I_rΣ′ = Σ′ since only the first r rows of Σ′ are
non-zero). Hence β⁺ solves the normal equations.

Part (ii)

Show that a given β with X′Xβ = 0 implies β′β⁺ = 0.

For this we show that X′Xβ = 0 implies (X′X)⁺β = 0:

X′Xβ = OΛO′β = 0

O′OΛO′β = ΛO′β = O′0 = 0

since O is orthonormal. Furthermore we know that Λ⁺ = Λ⁺Λ⁺Λ, so:

Λ⁺O′β = Λ⁺Λ⁺ΛO′β = Λ⁺Λ⁺0 = 0

OΛ⁺O′β = (X′X)⁺β = 0

The transpose of the latter term is equally zero: β′(X′X)⁺ = 0′. So we have

β′β⁺ = β′(X′X)⁺X′y = 0′X′y = 0

Part (iii)

Show that ||β⁺|| ≤ ||β|| for any β that is a solution to X′Xβ = X′y:

||β⁺|| = ||(X′X)⁺X′y|| = ||(X′X)⁺(X′X)β|| = ||OΛ⁺O′OΛO′β|| = ||OΛ⁺ΛO′β||

since (X′X)⁺ = OΛ⁺O′ and X′X = OΛO′. Moreover O′O = I since O is orthonormal.

Denote by I_r the "pseudo-identity matrix", i.e. the diagonal matrix whose first r
diagonal entries equal one and whose remaining entries equal zero:

I_r = diag(1, . . . , 1, 0, . . . , 0)

As can easily be seen, Λ⁺Λ = I_r. Since multiplication by the orthogonal matrices O and
O′ leaves the Euclidean norm unchanged,

||OΛ⁺ΛO′β|| = ||O I_r O′β|| = ||I_r O′β|| ≤ ||O′β|| = ||β||

So ||β⁺|| ≤ ||β||. The inequality in the middle follows from the fact that I_r(O′β) is a
vector whose first r entries equal those of O′β while the entries from r + 1 to k are zero.
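The three parts can also be checked numerically; the sketch below (not part of the original solution) uses a made-up rank-deficient X, so the normal equations have many solutions:

    import numpy as np

    rng = np.random.default_rng(5)
    T, k = 20, 4
    X = rng.normal(size=(T, k - 1))
    X = np.column_stack([X, X[:, 0] - X[:, 2]])   # rank deficient
    y = rng.normal(size=T)

    XtX, Xty = X.T @ X, X.T @ y
    beta_plus = np.linalg.pinv(XtX) @ Xty         # beta+ = (X'X)+ X'y

    _, _, Vt = np.linalg.svd(XtX)
    null_dir = Vt[-1]                             # direction with X'X null_dir = 0
    beta_other = beta_plus + 2.0 * null_dir       # another solution of the normal equations

    assert np.allclose(XtX @ beta_plus, Xty)      # part (i): solves the normal equations
    assert np.allclose(XtX @ beta_other, Xty)
    assert np.isclose(beta_plus @ null_dir, 0.0)  # part (ii): beta' beta+ = 0
    assert np.linalg.norm(beta_plus) <= np.linalg.norm(beta_other)   # part (iii)
    print("beta+ is the minimum-norm solution of the normal equations")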

Exercise 1.7

(1) Show that R² as defined in class for the inhomogeneous regression (including the
constant term) is equal to

R² = r²_{yŷ} = s²_{yŷ} / (s_{yy} s_{ŷŷ})

Definitions:

1. s_{yŷ} = (1/T) Σ_{i=1}^T (y_i − ȳ)(ŷ_i − \overline{ŷ})

2. s_{yy} = (1/T) Σ_{i=1}^T (y_i − ȳ)²

3. R² = s_{ŷŷ} / s_{yy}

Hint: Show that s_{yŷ} = s_{ŷŷ}.


Starting from the definitions: because the regression contains a constant, the residuals
sum to zero, so 1′Y = 1′Ŷ and hence \overline{Ŷ} = Ȳ. From

s_{YŶ} = (1/T)(Y′Ŷ − T Ȳ \overline{Ŷ}) = (1/T)(Y′Ŷ − T Ȳ²)

s_{ŶŶ} = (1/T)(Ŷ′Ŷ − T \overline{Ŷ}²) = (1/T)(Ŷ′Ŷ − T Ȳ²)

it remains to show that ⟨Y, Ŷ⟩ = ⟨Ŷ, Ŷ⟩.

To show ⟨Y, Ŷ⟩ = ⟨Ŷ, Ŷ⟩:
Proof:

⟨Ŷ, Ŷ⟩ = ⟨Y − û, Ŷ⟩
= ⟨Y, Ŷ⟩ − ⟨û, Ŷ⟩
= ⟨Y, Ŷ⟩

since ⟨û, Ŷ⟩ = 0 (the residuals are orthogonal to the fitted values).

(2) Show that R² = 0 if the constant is the only regressor.

Proof:
If the constant is the only regressor then X ∈ R^{T×1}, so the linear regression model
reads

Y_{T×1} = X_{T×1} β_{1×1} + u_{T×1}

So X is a column vector of dimension T×1 with x_i = 1 ∀ i = 1, . . . , T, which we will
denote by 1. The least squares estimator β_LS = (X′X)^{-1}X′Y in this case becomes:

β_LS = [1′1]^{-1} 1′Y
= T^{-1} 1′Y
= [1/T, 1/T, . . . , 1/T][Y_1, Y_2, Y_3, . . . , Y_T]′
= (1/T) Σ_{i=1}^T Y_i
= Ȳ

Hence Ŷ = Xβ_LS = 1Ȳ = [Ȳ, Ȳ, . . . , Ȳ]′. If we reconsider the expression for
R², it will be zero if s_{YŶ} = 0.

Calculating s_{YŶ} for our specific Ŷ gives us:

s_{YŶ} = (1/T) Σ_{i=1}^T (Y_i − Ȳ)(Ŷ_i − \overline{Ŷ})
= (1/T) Σ_{i=1}^T (Y_i − Ȳ)(Ȳ − Ȳ)
= (1/T) Σ_{i=1}^T (Y_i − Ȳ) × 0
= 0

So R² will always be zero if we regress Y on the constant alone.
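A quick numerical confirmation (not part of the original solution; the data are made up): regressing y on a constant alone gives fitted values equal to ȳ, so both s_{yŷ} and s_{ŷŷ} are zero.

    import numpy as np

    rng = np.random.default_rng(6)
    T = 30
    y = rng.normal(size=T) + 5.0
    X = np.ones((T, 1))                           # the constant is the only regressor

    beta = np.linalg.lstsq(X, y, rcond=None)[0]   # OLS coefficient
    y_hat = X @ beta

    s_y_yhat = np.mean((y - y.mean()) * (y_hat - y_hat.mean()))
    s_hh = np.mean((y_hat - y_hat.mean()) ** 2)

    assert np.isclose(beta[0], y.mean())          # beta_LS equals the sample mean
    assert np.isclose(s_y_yhat, 0.0) and np.isclose(s_hh, 0.0)
    print("R^2 = 0 when the constant is the only regressor")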

Exercise 1.8

We have to show

(y − Xβ)′(y − Xβ) = (y − Xβ̂)′(y − Xβ̂) + (β − β̂)′X′X(β − β̂)

Multiplying out (y − Xβ)′(y − Xβ) and expanding with Xβ̂ yields

(y − Xβ)′(y − Xβ) = [(y − Xβ̂) − (Xβ − Xβ̂)]′[(y − Xβ̂) − (Xβ − Xβ̂)]
= [(y − Xβ̂)′ − (X(β − β̂))′][(y − Xβ̂) − (X(β − β̂))]
= (y − Xβ̂)′(y − Xβ̂) − (y − Xβ̂)′(X(β − β̂)) − (X(β − β̂))′(y − Xβ̂) + (X(β − β̂))′(X(β − β̂))
= (y − Xβ̂)′(y − Xβ̂) + (β − β̂)′X′X(β − β̂) − (y − ŷ)′(Xβ − ŷ) − (Xβ − ŷ)′(y − ŷ)
= (y − Xβ̂)′(y − Xβ̂) + (β − β̂)′X′X(β − β̂) − û′Xβ + û′ŷ − β′X′û + ŷ′û
= (y − Xβ̂)′(y − Xβ̂) + (β − β̂)′X′X(β − β̂)

where y − Xβ̂ = y − ŷ = û, and the last four terms vanish because the normal equations
imply X′û = 0 (and hence û′X = 0′, û′ŷ = û′Xβ̂ = 0 and ŷ′û = 0).
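The identity is easy to verify numerically; the following sketch (not part of the original solution) uses made-up data and an arbitrary candidate β:

    import numpy as np

    rng = np.random.default_rng(7)
    T, k = 25, 3
    X = rng.normal(size=(T, k))
    y = rng.normal(size=T)
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)   # OLS estimator
    beta = rng.normal(size=k)                      # an arbitrary candidate beta

    lhs = (y - X @ beta) @ (y - X @ beta)
    rhs = ((y - X @ beta_hat) @ (y - X @ beta_hat)
           + (beta - beta_hat) @ X.T @ X @ (beta - beta_hat))
    assert np.isclose(lhs, rhs)
    print("sum-of-squares decomposition verified")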

Exercise 1.9

Show the second claim of item (iii) of the Frisch-Waugh theorem as discussed.
Frisch-Waugh theorem: We partition the regressor matrix X into X = [X_1, X_2]
with X_1 ∈ R^{T×k_1}, X_2 ∈ R^{T×k_2} and assume that rk(X) = k_1 + k_2. Then the
residuals of

1. y = X_1β_1 + X_2β_2 + u

are the same as when regressing

2. β̂_2 = (X̃_2′X̃_2)^{-1}(X̃_2′ỹ)

with P_1 = X_1(X_1′X_1)^{-1}X_1′ and M_1 = I − P_1. Here we first regress y on X_1 and
denote the residuals of this regression by ỹ = M_1y. In a second step we then
regress X_2 on X_1 and again compute the residuals, denoted X̃_2 = M_1X_2. In
a third step we use the formerly computed residuals and run the regression as
stated in (2). In the lecture it was shown that the residuals of (2) are the same
as the ones of (1). We are now asked to show that when running

3. β̂_2 = (X̃_2′X̃_2)^{-1}(X̃_2′y)

the residuals of (3) are not equal to those of (1)=(2). In (3) we use the original
y-variable instead of ỹ.
Proof:
Write the normal equations in partitioned form:

(1*) X_1′X_1β_1 + X_1′X_2β_2 = X_1′y

(2*) X_2′X_1β_1 + X_2′X_2β_2 = X_2′y

Now consider the first equation (1*):

X_1′X_1β_1 + X_1′X_2β_2 = X_1′y                          (6)
X_1β_1 + P_1X_2β_2 = P_1y                                (7)
X_1β_1 = −P_1X_2β_2 + P_1y                               (8)

To get from (6) to (7) we premultiply (6) by X_1(X_1′X_1)^{-1}. Now look at equation
(2*) and plug in the expression for X_1β_1:

X_2′X_1β_1 + X_2′X_2β_2 = X_2′y                          (9)
X_2′[−P_1X_2β_2 + P_1y] + X_2′X_2β_2 = X_2′y             (10)
−X_2′P_1X_2β_2 + X_2′P_1y + X_2′X_2β_2 = X_2′y           (11)
−X_2′P_1X_2β_2 + X_2′X_2β_2 = X_2′y − X_2′P_1y           (12)
X_2′[I − P_1]X_2β_2 = X_2′[I − P_1]y                     (13)
X_2′[I − P_1]′[I − P_1]X_2β_2 = X_2′[I − P_1]y           (14)
(the projector I − P_1 is idempotent and symmetric)
X̃_2′X̃_2β_2 = X̃_2′y                                      (15)
β̂_2 = (X̃_2′X̃_2)^{-1}(X̃_2′y)                             (16)

So (2) and (3) yield the same coefficient β̂_2 (since X̃_2′y = X̃_2′ỹ). The residuals of (2),
û = ỹ − X̃_2β̂_2, however, do not equal the residuals of (3), u* = y − X̃_2β̂_2; they coincide
only in the case when y equals ỹ.
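This can be illustrated numerically (a sketch, not part of the original solution; the data-generating values are made up): regressions (2) and (3) give the same β̂_2, but only the residuals of (2) coincide with those of the full regression (1).

    import numpy as np

    rng = np.random.default_rng(8)
    T, k1, k2 = 40, 2, 2
    X1 = rng.normal(size=(T, k1))
    X2 = rng.normal(size=(T, k2)) + X1 @ rng.normal(size=(k1, k2))  # correlated with X1
    y = X1 @ np.array([1.0, -1.0]) + X2 @ np.array([0.5, 2.0]) + rng.normal(size=T)

    M1 = np.eye(T) - X1 @ np.linalg.inv(X1.T @ X1) @ X1.T
    y_tilde, X2_tilde = M1 @ y, M1 @ X2

    # (1) full regression
    X = np.column_stack([X1, X2])
    b_full = np.linalg.lstsq(X, y, rcond=None)[0]
    u_full = y - X @ b_full
    # (2) y_tilde on X2_tilde
    b2 = np.linalg.solve(X2_tilde.T @ X2_tilde, X2_tilde.T @ y_tilde)
    u2 = y_tilde - X2_tilde @ b2
    # (3) original y on X2_tilde
    b3 = np.linalg.solve(X2_tilde.T @ X2_tilde, X2_tilde.T @ y)
    u3 = y - X2_tilde @ b3

    assert np.allclose(b2, b3)             # same coefficient: X2_tilde' y = X2_tilde' y_tilde
    assert np.allclose(u_full, u2)         # residuals of (2) equal those of (1)
    assert not np.allclose(u_full, u3)     # residuals of (3) differ
    print("Frisch-Waugh residual claim illustrated")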

Exercise 1.10

We show that, for X of full column rank and any matrix C with CX = L,

CC′ = (LX⁺)(LX⁺)′ + (C − LX⁺)(C − LX⁺)′

Expanding the right-hand side:

(LX⁺)(LX⁺)′ + (C − LX⁺)(C − LX⁺)′
= (LX⁺)(LX⁺)′ + CC′ + (LX⁺)(LX⁺)′ − C(LX⁺)′ − (LX⁺)C′
= CC′ + 2L(X′X)^{-1}X′[(X′X)^{-1}X′]′L′ − CX(X′X)^{-1}L′ − L(X′X)^{-1}X′C′
= CC′ + 2L(X′X)^{-1}X′X(X′X)^{-1}L′ − CX(X′X)^{-1}L′ − L(X′X)^{-1}X′C′
= CC′ + 2L(X′X)^{-1}L′ − L(X′X)^{-1}L′ − L(X′X)^{-1}L′
= CC′

Using the following facts:

• as X has full rank, we know that X⁺ = (X′X)^{-1}X′,

• [(X′X)^{-1}]′ = [(X′X)′]^{-1} = (X′X)^{-1},

• CX = L and therefore X′C′ = L′.
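A numerical sanity check of the decomposition (not part of the original solution); the particular C below is constructed only for the example, as LX⁺ plus a component orthogonal to the column space of X:

    import numpy as np

    rng = np.random.default_rng(9)
    T, k, s = 20, 3, 2
    X = rng.normal(size=(T, k))                   # full column rank
    L = rng.normal(size=(s, k))
    X_plus = np.linalg.inv(X.T @ X) @ X.T         # X+ = (X'X)^{-1} X'

    M = np.eye(T) - X @ X_plus                    # projector on the ortho-complement
    C = L @ X_plus + rng.normal(size=(s, T)) @ M  # satisfies CX = L
    assert np.allclose(C @ X, L)

    lhs = C @ C.T
    rhs = (L @ X_plus) @ (L @ X_plus).T + (C - L @ X_plus) @ (C - L @ X_plus).T
    assert np.allclose(lhs, rhs)
    print("decomposition CC' = (LX+)(LX+)' + (C - LX+)(C - LX+)' verified")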

Exercise 1.11

The mean squared error (MSE) of an estimator β̃ of β is defined as MSE(β̃) =
E[(β̃ − β)′(β̃ − β)]. Show the following claims:

(i) If β̃ is unbiased with VCV Σ_{β̃β̃}, then it holds that MSE(β̃) = tr(Σ_{β̃β̃}), where
tr denotes the trace of a matrix.

For any estimator of β ∈ R^k we can rewrite

MSE(β̃) = E[(β̃ − β)′(β̃ − β)] = E[Σ_{i=1}^k (β̃_i − β_i)²]

Now since β̃ is unbiased, E(β̃) = β, and thus Σ_{β̃β̃} = E[(β̃ − β)(β̃ − β)′].

Further, tr(Σ_{β̃β̃}) = Σ_{i=1}^k E[(β̃_i − β_i)²] = E[Σ_{i=1}^k (β̃_i − β_i)²] = MSE(β̃). QED
(ii) Let β̃_1 and β̃_2 be two unbiased estimators with covariance matrices Σ_{β̃_1β̃_1} and Σ_{β̃_2β̃_2}.
Show that it holds that

Σ_{β̃_1β̃_1} ≤ Σ_{β̃_2β̃_2} ⇒ MSE(β̃_1) ≤ MSE(β̃_2)

What does this imply for the OLS estimator?

Define ∆ := Σ_{β̃_2β̃_2} − Σ_{β̃_1β̃_1}. Since Σ_{β̃_1β̃_1} ≤ Σ_{β̃_2β̃_2}, ∆ must be non-negative definite.
Now, using the result from (i), write

MSE(β̃_2) − MSE(β̃_1) = tr(Σ_{β̃_2β̃_2}) − tr(Σ_{β̃_1β̃_1}) = tr(Σ_{β̃_2β̃_2} − Σ_{β̃_1β̃_1}) = tr(∆) = Σ_{i=1}^k e_i′∆e_i ≥ 0

where e_i ∈ R^k is the vector with a 1 in the i-th entry and zeros elsewhere. Because ∆ is
non-negative definite, each term e_i′∆e_i in this sum is non-negative, and therefore so is
the total sum. Thus MSE(β̃_1) ≤ MSE(β̃_2). QED

Since the OLS estimator β̂ has the "smallest" VCV matrix among all linear unbiased
estimators of β, this implies that it also has smaller or equal MSE than any other
estimator in this class.
(iii) Minimize MSE(β̃) over all linear unbiased estimators β̃. From the lecture we
know that Σ_{β̃β̃} = σ²DD′ for every linear unbiased estimator β̃ = Dy of β (with DX = I).
From (i):

MSE(β̃) = tr(Σ_{β̃β̃}) = tr(σ²DD′)

Using the decomposition lemma (Exercise 1.10 with C = D and L = I), we can write

DD′ = (X⁺)(X⁺)′ + (D − X⁺)(D − X⁺)′

hence

tr(σ²DD′) = tr[σ²(X⁺)(X⁺)′ + σ²(D − X⁺)(D − X⁺)′]
= σ² tr[(X⁺)(X⁺)′] + σ² tr[(D − X⁺)(D − X⁺)′]

The first term is independent of D (and non-negative). We minimize the second term
over D. Write R := D − X⁺ with columns r_1, . . . , r_T. Then

tr(RR′) = tr(R′R) = Σ_{i=1}^T ||r_i||² ≥ 0     since ||r_i||² ≥ 0 ∀i

and tr(RR′) = 0 if and only if r_i = 0 for all i, which is equivalent to D = X⁺.
This implies that tr(σ²DD′) is minimized for D = X⁺.
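As a final numerical illustration (not part of the original solution; the alternative D below is made up), any unbiased choice D other than X⁺ yields a weakly larger trace:

    import numpy as np

    rng = np.random.default_rng(10)
    T, k = 20, 3
    sigma2 = 1.5
    X = rng.normal(size=(T, k))
    X_plus = np.linalg.inv(X.T @ X) @ X.T          # OLS corresponds to D = X+

    M = np.eye(T) - X @ X_plus
    D_alt = X_plus + rng.normal(size=(k, T)) @ M   # another D with DX = I
    assert np.allclose(D_alt @ X, np.eye(k))       # still unbiased

    mse_ols = sigma2 * np.trace(X_plus @ X_plus.T)
    mse_alt = sigma2 * np.trace(D_alt @ D_alt.T)
    assert mse_ols <= mse_alt
    print("tr(sigma^2 DD') is minimized at D = X+:", mse_ols, "<=", mse_alt)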
