Tutorial 2: Linear Regression Model and OLS

Spring 2023

Question 1
Consider two invertible matrices A and B of the same dimensions. Show that the inverse of the product
AB exists and is given by

(AB)−1 = B −1 A−1 . (1)

Solution:
Existence. If both A and B are non-singular matrices, the determinant of AB equals

det(AB) = det(A) · det(B).   (2)

For now let us take the result in Eq. 2 as granted; its proof is beyond the scope of this course.
Following the result in Eq. 2, one concludes that det(AB) ≠ 0 because det(A) ≠ 0 and det(B) ≠ 0.
If det(AB) ≠ 0, then the inverse of AB exists.

The inverse is given by B^{-1} A^{-1}.

If B^{-1} A^{-1} is the inverse of AB, then (AB)(B^{-1} A^{-1}) = I must hold. Let us verify that this
condition indeed holds:

(AB)(B^{-1} A^{-1}) = A(BB^{-1})A^{-1} = A I A^{-1} = A A^{-1} = I.   (3)
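
As a quick numerical sanity check, the identity can also be verified in R; the two invertible 2 × 2 matrices below are arbitrary illustrative choices:

```r
# Verify (AB)^{-1} = B^{-1} A^{-1} for two invertible 2 x 2 matrices
A <- matrix(c(2, 1, 1, 3), nrow = 2)
B <- matrix(c(4, 0, 1, 2), nrow = 2)

lhs <- solve(A %*% B)         # (AB)^{-1}
rhs <- solve(B) %*% solve(A)  # B^{-1} A^{-1}
all.equal(lhs, rhs)           # TRUE up to floating-point error
```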

Question 2
Show that the transpose of the product of an arbitrary number of factors is the product of the
transposes of the individual factors in completely reversed order:

(ABC...)T = ...C T B T AT . (4)

Solution:
Let us consider 2 × 2 matrices A and B:

A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}, \qquad
B = \begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{bmatrix}   (5)

The transpose of AB equals

(AB)^T = \begin{bmatrix} a_{11}b_{11} + a_{12}b_{21} & a_{11}b_{12} + a_{12}b_{22} \\ a_{21}b_{11} + a_{22}b_{21} & a_{21}b_{12} + a_{22}b_{22} \end{bmatrix}^T
       = \begin{bmatrix} a_{11}b_{11} + a_{12}b_{21} & a_{21}b_{11} + a_{22}b_{21} \\ a_{11}b_{12} + a_{12}b_{22} & a_{21}b_{12} + a_{22}b_{22} \end{bmatrix}   (6)

Now let us check what B^T A^T is:

B^T A^T = \begin{bmatrix} b_{11} & b_{21} \\ b_{12} & b_{22} \end{bmatrix} \cdot
          \begin{bmatrix} a_{11} & a_{21} \\ a_{12} & a_{22} \end{bmatrix}
        = \begin{bmatrix} a_{11}b_{11} + a_{12}b_{21} & a_{21}b_{11} + a_{22}b_{21} \\ a_{11}b_{12} + a_{12}b_{22} & a_{21}b_{12} + a_{22}b_{22} \end{bmatrix}   (7)

Indeed, the results in Eqs. 6 and 7 are the same. Moreover, note that any matrix can be partitioned
as a 2 × 2 block matrix. For example, if matrix C has dimension (n + 1) × (m + 1), we can write it as

C = \begin{bmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{bmatrix}   (8)

where C_{11} is an n × m block, C_{12} is an n × 1 column, C_{21} is a 1 × m row, and C_{22} is a scalar.
By carrying out the same calculation block-wise (transposing each block), you can still show that
(AC)^T = C^T A^T. Furthermore, following the associativity rule (i.e., (AB)C = A(BC)), the result can be
generalized to an arbitrary number of matrices.
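
The reversed-order rule is also easy to check numerically; a minimal R sketch with randomly drawn conformable matrices (dimensions chosen only for illustration):

```r
# Check (ABC)^T = C^T B^T A^T with random conformable matrices
set.seed(1)
A <- matrix(rnorm(2 * 3), nrow = 2)  # 2 x 3
B <- matrix(rnorm(3 * 4), nrow = 3)  # 3 x 4
C <- matrix(rnorm(4 * 2), nrow = 4)  # 4 x 2

all.equal(t(A %*% B %*% C), t(C) %*% t(B) %*% t(A))  # TRUE
```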

Question 3
Let A be an n × m matrix. Show that

rank(A) = rank(AT ) = rank(AT A) = rank(AAT ). (9)

Solution:
The first part, rank(A) = rank(A^T), is trivial (the column and row ranks of any matrix are equal).
Next, let us show

rank(AT ) = rank(AAT ). (10)

To do that, we can show that matrices AT and AAT have the same null spaces.1 If these matrices
have the same null spaces, then they have the same ranks.2 To prove that the matrices AT and
AAT have the same null spaces, we need to show that AT x = 0 if and only if AAT x = 0, where x
is a column vector.
Step 1: Show that if AT x = 0, then AAT x = 0. Following the associativity rule,

AAT x = A(AT x) = A0 = 0. (11)

Step 2: Show that if AAT x = 0, then AT x = 0. If AAT x = 0, then pre-multiplying both sides
by xT , we find

xT AAT x = 0
(AT x)T (AT x) = 0
||AT x|| = 0 (12)

Let b = A^T x, where b is a column vector. Then the condition above implies that Σ_i b_i^2 = 0, where
b_i is the i-th element of b. The condition Σ_i b_i^2 = 0 is satisfied only if b_i = 0 ∀i. Therefore,
A^T x = 0 holds whenever AA^T x = 0.
The results in Step 1 and Step 2 imply that A^T and AA^T have the same null spaces. Consequently,
these matrices have the same rank. Following the same logic, you can show that A and A^T A have the
same null spaces and hence the same rank, which completes the chain of equalities in Eq. 9.
1 The null space of a matrix A is the set of all solutions to Ax = 0. The nullity of A is the dimension of its
null space.
2 Rank–nullity theorem: the rank of A plus the nullity of A equals the number of columns of A. Therefore, if A and
B have the same number of columns and the same null space, they have the same rank.
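
A numerical illustration of the rank equalities, using a randomly drawn rectangular matrix (the 4 × 3 dimension is an arbitrary choice):

```r
# rank(A) = rank(A^T) = rank(A^T A) = rank(A A^T) for a random 4 x 3 matrix
set.seed(2)
A <- matrix(rnorm(4 * 3), nrow = 4)
c(qr(A)$rank,
  qr(t(A))$rank,
  qr(crossprod(A))$rank,    # A^T A
  qr(tcrossprod(A))$rank)   # A A^T
# all four equal 3
```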

Question 4
Suppose that X = [ι X1 X2 ], where ι is an n-vector of ones, X1 is an n × k1 matrix, and X2 is an
n × k2 matrix. Write the matrix XX T in terms of the components of X. What are the dimensions
of its component matrices?

Solution:

XX^T = \begin{bmatrix} \iota & X_1 & X_2 \end{bmatrix}
       \begin{bmatrix} \iota^T \\ X_1^T \\ X_2^T \end{bmatrix}
     = \iota\iota^T + X_1 X_1^T + X_2 X_2^T   (13)

This is an n × n matrix, and so is each of its three components ιι^T, X_1 X_1^T, and X_2 X_2^T.
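
The decomposition and the dimensions can be confirmed numerically; a small R sketch with arbitrary illustrative values of n, k1, and k2:

```r
# X X^T = iota iota^T + X1 X1^T + X2 X2^T, each term n x n
set.seed(3)
n <- 5; k1 <- 2; k2 <- 3
iota <- matrix(1, n, 1)
X1   <- matrix(rnorm(n * k1), n, k1)
X2   <- matrix(rnorm(n * k2), n, k2)
X    <- cbind(iota, X1, X2)

all.equal(X %*% t(X),
          iota %*% t(iota) + X1 %*% t(X1) + X2 %*% t(X2))  # TRUE
dim(X %*% t(X))                                            # n x n
```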

Question 5
Consider the following model:

Y = β0 + βX + |X|ε (14)

where E(X) = 0, V ar(X) > 0, E(ε) = 0, and ε and X are independent.

1. Find E(Y |X) and V ar(Y |X).

2. Show that β = 0 if and only if Cov(Y, X) = 0.

Solution 5.1:
Note that Y and X are random variables, β0 and β are scalars, and E(ε|X) = E(ε) since ε and X
are independent.

E(Y | X) = E(β0 + βX + |X|ε | X) = β0 + βX + |X| E(ε | X)
         = β0 + βX   (15)

Var(Y | X) = E((Y − E(Y | X))^2 | X)
           = E((|X|ε)^2 | X)
           = E(X^2 ε^2 | X)
           = X^2 E(ε^2 | X) = X^2 Var(ε)   (16)

where the last equality uses the independence of ε and X together with E(ε) = 0, so that
E(ε^2 | X) = E(ε^2) = Var(ε).

Solution 5.2:
First, let us show that if β = 0, then Cov(X, Y ) = 0 holds. Second, we will show that if
Cov(X, Y ) = 0 and V ar(X) > 0, then β = 0 holds.

Step 1: If β = 0, then Y = β0 + |X|ε. Now let us find Cov(Y, X):

Cov(Y, X) = E(Y X) − E(Y ) E(X)


= E((β0 + |X|ε)X) − E(β0 + |X|ε) E(X)
= β0 E(X) + E(|X|εX) − β0 E(X) − E(|X|ε) E(X). (17)

Since E(|X|εX) = 0 and E(|X|ε) = 0 (see footnote 3), we can write

Cov(Y, X) = β0 E(X) + 0 − β0 E(X) − 0 = 0. (18)

Step 2: Now let us show that if Cov(X, Y ) = 0, then β = 0. Cov(X, Y ) = 0 implies that
E(Y X) = E(Y ) E(X). In other words, the following holds when Cov(X, Y ) = 0

E((β0 + βX + |X|ε)X) = E(β0 + βX + |X|ε) E(X)


β0 E(X) + β E(X 2 ) + E(|X|εX) = β0 E(X) + β E(X) E(X) + E(|X|ε) E(X). (19)

Exploiting the independence between ε and X we can rewrite the expression above as

β0 E(X) + β E(X 2 ) = β0 E(X) + β E(X) E(X)


β E(X 2 ) = β E(X) E(X) (20)

Note that E(X^2) ≠ E(X) E(X) as long as V ar(X) = E(X^2) − E(X) E(X) > 0. Thus, the condition
above holds only if β = 0.
Following the results in Step 1 and Step 2, we proved that β = 0 iff Cov(Y, X) = 0.
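
The result can also be illustrated by simulation. The model only restricts the moments of X and ε, so the sketch below assumes, purely for illustration, standard normal X and ε and β0 = 1:

```r
# With beta = 0, Cov(Y, X) should be (approximately) zero in a large sample
set.seed(4)
n     <- 1e6
X     <- rnorm(n)         # illustrative choice satisfying E(X) = 0, Var(X) > 0
eps   <- rnorm(n)         # illustrative choice satisfying E(eps) = 0, independent of X
beta0 <- 1

Y0 <- beta0 + 0   * X + abs(X) * eps   # beta = 0
Y1 <- beta0 + 0.5 * X + abs(X) * eps   # beta != 0
c(cov(Y0, X), cov(Y1, X))              # first is near 0, second is near 0.5 * Var(X)
```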

Question 6
Suppose that Y is a square-integrable random variable, i.e., E(Y^2) < ∞, and X is a random vector
in R^k such that X1 ≡ 1 and E(XX^T) is invertible. The best linear predictor of Y given X, which
minimizes the mean squared error criterion, is X^T β*, where

β* = arg min_{β ∈ R^k} E(Y − X^T β)^2   (21)

3 By the law of iterated expectations (LIE), E(|X|εX) = E[E(|X|εX | X)] = E[|X| X E(ε | X)] = 0 and E(|X|ε) = E[E(|X|ε | X)] = E[|X| E(ε | X)] = 0.

is called the optimal linear LS approximation coefficient.

1. Show that

β ∗ = [E(XX T )]−1 E[XY ] (22)

2. Define ε* = Y − X^T β*. Show that E(Xε*) = 0.

3. Suppose that the conditional mean E[Y |X] = X T β for some β ∈ Rk . Show that β ∗ = β and
E(ε∗ |X) = 0.

Solution 6.1.:
Let Q(β) = E(Y − X^T β)^2; Q(β) is continuous and twice differentiable in β. Expanding the square,

Q(β) = E(Y − X T β)2


= E(Y 2 − 2Y X T β + β T XX T β) (23)

Setting the first derivative to zero (the first-order condition), we find the optimizer β*:

∂Q(β)/∂β |_{β=β*} = − E((2Y X^T)^T) + 2 E(XX^T) β* = 0   (24)
                  = −2 E(XY) + 2 E(XX^T) β* = 0   (25)

β* = [E(XX^T)]^{-1} E(XY)   (26)

Solution 6.2.:

Define ε∗ = Y − X T β ∗ . Show that E(Xε∗ ) = 0.

E(Xε*) = E[X(Y − X^T β*)]
       = E(XY) − E(XX^T) β*
       = E(XY) − E(XX^T) [E(XX^T)]^{-1} E(XY)
       = E(XY) − E(XY) = 0   (27)

Solution 6.3.:

E(Y |X) = X T β implies we can write Y as Y = X T β + ε where E(ε|X) = 0. Thus

Y = XT β + ε
XY = XX T β + Xε
E(XY |X) = E(XX T β|X) + E(Xε|X)
E(XY |X) = E(XX T |X)β
LIE : E(XY ) = E(XX T )β
β = [E(XX T )]−1 E(XY ) = β ∗ (28)

Next,

ε∗ = Y − X T β ∗
E(ε∗ |X) = E(Y |X) − E(X T β ∗ |X)
E(ε∗ |X) = X T β − X T β ∗ = 0 (29)

The first line follows from the definition of ε*. The second line is obtained by taking conditional
expectations on both sides of the equation. The third line uses the assumption E(Y | X) = X^T β
together with the result just derived that β* = β.
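
In a sample, the analogue of Eq. 22 coincides with the OLS coefficient, which can be seen numerically; the data-generating process below is only an illustrative choice:

```r
# Sample analogue of beta* = [E(X X^T)]^{-1} E(X Y) coincides with OLS
set.seed(5)
n <- 1e5
x <- rnorm(n)
y <- 1 + 0.5 * x + rnorm(n)            # illustrative DGP with E(y | x) linear in x
X <- cbind(1, x)                       # regressor vector with X1 = 1

beta_star <- solve(crossprod(X) / n, crossprod(X, y) / n)  # [mean(XX^T)]^{-1} mean(XY)
as.numeric(beta_star)                                      # sample analogue of beta*
coef(lm(y ~ x))                                            # OLS: the same numbers
```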

Question 7
Suppose that the simple regression model y = βx + ε satisfies the LRM assumptions.

1. Find the minimum mean squared error linear estimator of β, i.e., β̃ = c*^T y, where

   c* = arg min_{c ∈ R^n} { Var(c^T y | x) + (E[c^T y − β | x])^2 }

2. Compare the conditional expectation and variance of β̃ given x with those of the OLS esti-
mator.

Solution:

1. We start by writing out the two terms (variance and squared bias) of the function we want to
   minimize:

   E[c^T y − β | x] = c^T E[xβ + ε | x] − β = c^T x β − β

   Var[c^T y | x] = c^T Var[y | x] c = c^T Var[xβ + ε | x] c = σ^2 c^T c

Hence

   MSE = σ^2 c^T c + (c^T x β − β)^2 = σ^2 c^T c + β^2 (c^T x − 1)^2

To minimize, we take the first-order condition:

   ∂MSE/∂c = 2σ^2 c + 2β^2 (c^T x − 1) x = 0
   ⟹ σ^2 c = β^2 (1 − c^T x) x   (30)

Premultiplying (30) by x^T then yields (note that scalars can be moved around and that x^T c = c^T x)

   σ^2 x^T c = β^2 (1 − c^T x) x^T x
   x^T c (σ^2 + β^2 x^T x) = x^T x β^2
   x^T c = (σ^2 + β^2 x^T x)^{-1} x^T x β^2   (31)

Similarly, premultiplying (30) by y^T (and recalling that β̃ = c^T y = y^T c) yields

   σ^2 y^T c = β^2 (1 − c^T x) y^T x
   β̃ = σ^{-2} β^2 (1 − c^T x) y^T x   (32)

Substituting (31) into (32) we get:

   β̃ = σ^{-2} β^2 (1 − (σ^2 + β^2 x^T x)^{-1} x^T x β^2) y^T x
   β̃ = σ^{-2} β^2 [ (σ^2 + β^2 x^T x − x^T x β^2) / (σ^2 + β^2 x^T x) ] y^T x
   β̃ = β^2 (σ^2 + β^2 x^T x)^{-1} x^T y

Note that the estimator is a function of the unknown parameters β and σ and therefore cannot be
computed: it is infeasible. Formally, it is not even an estimator, since it is not a mapping from
the data to the parameter space. We can often obtain a feasible estimator that is similar to the
infeasible one by first estimating these unknown parameters.

2. We can then calculate the (conditional) expected value and variance:

   E[β̃ | x] = β^2 (σ^2 + β^2 x^T x)^{-1} x^T E[y | x] = β^2 (σ^2 + β^2 x^T x)^{-1} x^T x β
            = (σ^2/β^2 + x^T x)^{-1} x^T x β

   Var[β̃ | x] = β^4 (σ^2 + β^2 x^T x)^{-2} x^T Var[y | x] x = β^4 (σ^2 + β^2 x^T x)^{-2} σ^2 x^T x
              = (σ^2/β^2 + x^T x)^{-2} σ^2 x^T x

For the ordinary least squares estimator, we have

   E[β̂ | x] = E[(x^T x)^{-1} x^T y | x] = β

   Var[β̂ | x] = Var[(x^T x)^{-1} x^T y | x] = σ^2 / (x^T x)

We see that β̃ is biased: its expected value is not equal to that of the ordinary least squares
estimator (which is unbiased). Furthermore, taking the ratio

   Var[β̃ | x] / Var[β̂ | x] = (x^T x)^2 / (σ^2/β^2 + x^T x)^2

shows that the variance of the minimum mean squared error estimator is always smaller than the
variance of the OLS estimator. But as x^T x grows larger, the two variances become more similar
(the ratio goes to one).
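
These bias and variance results can be checked with a small simulation that holds x fixed across replications; the parameter values and the normal errors below are illustrative assumptions, not part of the question:

```r
# Compare the infeasible minimum-MSE estimator with OLS (x held fixed)
set.seed(6)
n <- 50; beta <- 0.5; sigma <- 1
x  <- rnorm(n)                 # fixed design across replications
xx <- sum(x^2)                 # x^T x

est <- replicate(5000, {
  y <- beta * x + rnorm(n, sd = sigma)
  c(mmse = beta^2 / (sigma^2 + beta^2 * xx) * sum(x * y),  # tilde beta
    ols  = sum(x * y) / xx)                                # hat beta
})

rowMeans(est)                          # mmse is biased towards zero, ols is centred at beta
var(est["mmse", ]) / var(est["ols", ]) # below 1; close to (xx / (sigma^2/beta^2 + xx))^2
```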

Question 8
Verify Properties (2) and (3) of the OLS estimator β̂ using a Monte Carlo simulation. Specifically,
suppose that the data {(Yi , Xi ) : i = 1, ..., n} are i.i.d. and generated from the following model:

Xi = exp(Vi ηi ), and (33)


Yi = [1 Xi ]β + ηi ui (34)

where Vi, ηi, and ui are independent, ui ∼ N(0, 1), ηi is a Rademacher random variable (i.e.,
P(ηi = 1) = P(ηi = −1) = 1/2), and Vi ∼ U[0, 1].

1. Check whether the model satisfies the LRM assumptions.

2. Generate data from the true model with n = 100 and β = [1 0.5]T

3. Estimate β using OLS. Plot the fitted line and the data.

4. Repeat (2) 1,000 times and check if the average and the variance of the estimates are close
to the corresponding theoretical values. When approximating the conditional quantities, use
a fixed sample of (Vi , ηi ).

Solution 8.1:

• Y is linear in parameters - YES

• E(Xi XiT ) is invertible. True, because V ar(Xi ) > 0.

• E(ηi ui | Xi) = 0? YES. Since Xi is a function of (Vi, ηi), the law of iterated expectations gives

  E(ηi ui | Xi) = E[ E(ηi ui | Vi, ηi) | Xi ] = E[ ηi E(ui | Vi, ηi) | Xi ] = E[ ηi E(ui) | Xi ] = 0

• {Xi , Yi } is i.i.d. - YES

Therefore the OLS estimator of β must be unbiased.

Solution 8.2, 8.3 and 8.4:


Check the R code that is provided on Canvas.
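
For orientation, a minimal sketch of such a Monte Carlo exercise in R could look as follows (the Canvas code remains the reference; the seed and the coding choices here are illustrative):

```r
# Monte Carlo check of unbiasedness of OLS in the model of Eqs. 33-34
set.seed(7)
n    <- 100
beta <- c(1, 0.5)

# Fixed draws of (V_i, eta_i): conditional quantities are approximated for this sample
V   <- runif(n)
eta <- sample(c(-1, 1), n, replace = TRUE)   # Rademacher
X   <- exp(V * eta)

one_rep <- function() {
  u <- rnorm(n)
  Y <- drop(cbind(1, X) %*% beta) + eta * u
  coef(lm(Y ~ X))                            # OLS estimates of (beta_0, beta_1)
}

# Part 3: one sample with its fitted line
Y <- drop(cbind(1, X) %*% beta) + eta * rnorm(n)
plot(X, Y); abline(lm(Y ~ X), col = "red")

# Part 4: repeat 1,000 times and inspect mean and variance of the estimates
est <- t(replicate(1000, one_rep()))
colMeans(est)        # should be close to c(1, 0.5)
apply(est, 2, var)   # compare with the theoretical conditional variances
```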
