Tutorial 2 2023
Spring 2023
Question 1
Consider two invertible matrices A and B of the same dimensions. Show that the inverse of the product AB exists and is given by

(AB)^{-1} = B^{-1} A^{-1} (1)
Solution:
Existence. If both A and B are non-singular matrices, the determinant of AB equals

det(AB) = det(A) det(B) (2)

For now let us take the result in Eq. 2 as granted; its proof is out of the scope of this course. Following the result in Eq. 2, since det(A) ≠ 0 and det(B) ≠ 0, one concludes that det(AB) ≠ 0. If det(AB) ≠ 0, then the inverse of AB exists.
Formula. To verify that the inverse is B^{-1} A^{-1}, note that (AB)(B^{-1} A^{-1}) = A(BB^{-1})A^{-1} = AA^{-1} = I and, similarly, (B^{-1} A^{-1})(AB) = I. Since the inverse of a matrix is unique, (AB)^{-1} = B^{-1} A^{-1}.
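The result is easy to confirm numerically; the sketch below uses small random matrices (an illustrative example, not from the tutorial):

```python
import numpy as np

rng = np.random.default_rng(0)

# Random 3 x 3 matrices are invertible with probability one;
# we confirm by checking that their determinants are nonzero.
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))
assert abs(np.linalg.det(A)) > 1e-10 and abs(np.linalg.det(B)) > 1e-10

# det(AB) = det(A) det(B) != 0, so AB is invertible,
# and its inverse matches B^{-1} A^{-1}.
ok = np.allclose(np.linalg.inv(A @ B), np.linalg.inv(B) @ np.linalg.inv(A))
```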
Question 2
Show that the transpose of the product of an arbitrary number of factors is the product of the
transposes of the individual factors in completely reversed order:

(A1 A2 · · · Ak)^T = Ak^T · · · A2^T A1^T (4)
Solution:
Let us consider matrices A and B with 2 × 2 dimensions:

A = [a11, a12; a21, a22],  B = [b11, b12; b21, b22] (5)

Computing both sides directly,

(AB)^T = [a11 b11 + a12 b21, a21 b11 + a22 b21; a11 b12 + a12 b22, a21 b12 + a22 b22] (6)

B^T A^T = [b11 a11 + b21 a12, b11 a21 + b21 a22; b12 a11 + b22 a12, b12 a21 + b22 a22] (7)
Indeed, the results in Eqs. 6 and 7 are the same. Moreover, note that any matrix can be reduced to a 2-by-2 block form. For example, if matrix C has dimension (n + 1) × (m + 1), we can partition the matrix in the following way:

C = [c11, c21; c12, c22] (8)
where c11 is an n × m matrix, c21 is an n × 1 vector, c12 is a 1 × m vector, and c22 is a scalar. Then, by repeating the calculation above block-wise, you can still show that (AC)^T = C^T A^T.
Furthermore, following the associativity rule (i.e., (AB)C = A(BC)), the result can be generalized to an arbitrary number of matrices.
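A quick numerical check of the reversed-order rule, using three conformable random matrices (shapes chosen so that any ordering mistake would raise a dimension error):

```python
import numpy as np

rng = np.random.default_rng(1)

# Three conformable random matrices with different shapes.
A = rng.standard_normal((2, 3))
B = rng.standard_normal((3, 4))
C = rng.standard_normal((4, 2))

# (ABC)^T equals the product of transposes in completely reversed order.
ok = np.allclose((A @ B @ C).T, C.T @ B.T @ A.T)
```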
Question 3
Let A be an n × m matrix. Show that

rank(A) = rank(A^T) = rank(AA^T) = rank(A^T A)
Solution:
The first part, rank(A) = rank(A^T), is trivial (the column and row ranks of any matrix are equal).
Next, let us show that

rank(A^T) = rank(AA^T)
To do that, we can show that matrices AT and AAT have the same null spaces.1 If these matrices
have the same null spaces, then they have the same ranks.2 To prove that the matrices AT and
AAT have the same null spaces, we need to show that AT x = 0 if and only if AAT x = 0, where x
is a column vector.
Step 1: Show that if A^T x = 0, then AA^T x = 0. Following the associativity rule, AA^T x = A(A^T x) = A0 = 0.
Step 2: Show that if AA^T x = 0, then A^T x = 0. If AA^T x = 0, then pre-multiplying both sides by x^T, we find

x^T AA^T x = 0
(A^T x)^T (A^T x) = 0
||A^T x||^2 = 0 (12)

Let b = A^T x, where b is a column vector. Then the condition above implies that Σ_i b_i^2 = 0, where b_i is the i-th element of b. The condition Σ_i b_i^2 = 0 is satisfied only if b_i = 0 ∀i. Therefore, A^T x = 0 holds if AA^T x = 0.
The results in Step 1 and Step 2 imply that A^T and AA^T have the same null spaces. Consequently, these matrices have the same ranks.
Following the same logic, you can show that matrices A and AT A have the same null spaces.
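The chain of rank equalities is easy to confirm numerically; the sketch below uses a deliberately rank-deficient random matrix (my own example, not from the tutorial):

```python
import numpy as np

rng = np.random.default_rng(2)

# A 5 x 4 matrix of rank 2 by construction (product of 5x2 and 2x4 factors).
A = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 4))

r = np.linalg.matrix_rank
# All four ranks coincide, as the solution argues via equal null spaces.
ranks = {r(A), r(A.T), r(A @ A.T), r(A.T @ A)}
```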
1 The null space of a matrix A is the set of all solutions to Ax = 0. The nullity of a matrix is the dimension of its null space.
2 Rank–nullity theorem: the rank of A plus the nullity of A equals the number of columns of A. Therefore, if A and B have the same number of columns and the same null space, they have the same rank.
Question 4
Suppose that X = [ι X1 X2], where ι is an n-vector of ones, X1 is an n × k1 matrix, and X2 is an n × k2 matrix. Write the matrix XX^T in terms of the components of X. What are the dimensions of its component matrices?
Solution:
XX^T = [ι X1 X2] [ι^T; X1^T; X2^T] = ιι^T + X1 X1^T + X2 X2^T (13)

Each of the component matrices ιι^T, X1 X1^T, and X2 X2^T is n × n.
Question 5
Consider the following model:
Y = β0 + βX + |X|ε (14)
Solution 5.1:
Note that Y and X are random variables, β0 and β are scalars, and E(ε|X) = E(ε) since ε and X
are independent.
Solution 5.2:
First let us show that if β = 0, then Cov(X, Y) = 0 holds. Secondly, we will show that if Cov(X, Y) = 0 and Var(X) > 0, then β = 0 holds.
Step 1: If β = 0, then Y = β0 + |X|ε, so Cov(X, Y) = E(|X|εX) − E(X) E(|X|ε) = 0, since both terms are zero by the LIE.3
Step 2: Now let us show that if Cov(X, Y) = 0, then β = 0. Cov(X, Y) = 0 implies that E(YX) = E(Y) E(X). In other words, the following holds when Cov(X, Y) = 0:

E[(β0 + βX + |X|ε)X] = E(β0 + βX + |X|ε) E(X)

Exploiting the independence between ε and X, we can rewrite the expression above as

β E(X^2) = β E(X) E(X)

Note that E(X^2) ≠ E(X) E(X) as long as Var(X) = E(X^2) − E(X) E(X) > 0. Thus, the condition above holds only if β = 0.
Following the results in Step 1 and Step 2, we proved that β = 0 iff Cov(Y, X) = 0.
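The equivalence can be illustrated by simulation; the sketch below draws X and ε as independent standard normals (an illustrative choice, since the question does not fix their distributions):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000

# X and eps independent (here both standard normal, an illustrative choice).
X = rng.standard_normal(n)
eps = rng.standard_normal(n)

def cov_xy(beta, beta0=1.0):
    # Simulate Y = beta0 + beta*X + |X|*eps and return the sample Cov(X, Y).
    Y = beta0 + beta * X + np.abs(X) * eps
    return np.cov(X, Y)[0, 1]

cov_zero = cov_xy(0.0)  # beta = 0   -> covariance close to 0
cov_half = cov_xy(0.5)  # beta = 0.5 -> covariance close to 0.5 * Var(X) = 0.5
```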
Question 6
Suppose that Y is a square integrable random variable, i.e. EY 2 < ∞, and X is a random vector
in Rk such that X1 ≡ 1 and E(XX T ) is invertible. The best linear predictor of Y given X that
minimizes the mean squared criterion is X T β ∗ , where
3 By the LIE, E(|X|εX) = E[E(|X|εX | X)] = E[X|X| E(ε|X)] = 0 and E(|X|ε) = E[E(|X|ε | X)] = E[|X| E(ε|X)] = 0, since E(ε|X) = E(ε) = 0.
β∗ = arg min_{b ∈ R^k} E(Y − X^T b)^2
is called the optimal linear LS approximation coefficient.
1. Show that β∗ = [E(XX^T)]^{-1} E(XY).
2. Show that E(Xε∗) = 0, where ε∗ ≡ Y − X^T β∗.
3. Suppose that the conditional mean E[Y |X] = X T β for some β ∈ Rk . Show that β ∗ = β and
E(ε∗ |X) = 0.
Solution 6.1.:
Let Q(β) = E(Y − X^T β)^2. Q(β) is continuous and twice differentiable in β. Thus,

∂Q(β)/∂β = −2 E(XY) + 2 E(XX^T) β (24)
∂Q(β)/∂β |_{β=β∗} = −2 E(XY) + 2 E(XX^T) β∗ = 0 (25)
β∗ = [E(XX^T)]^{-1} E(XY) (26)
Solution 6.2.:
E(Xε∗) = E[X(Y − X^T β∗)]
= E(XY) − E(XX^T) β∗
= E(XY) − E(XX^T) [E(XX^T)]^{-1} E(XY)
= E(XY) − E(XY) = 0 (27)
Solution 6.3.:
E(Y |X) = X T β implies we can write Y as Y = X T β + ε where E(ε|X) = 0. Thus
Y = XT β + ε
XY = XX T β + Xε
E(XY |X) = E(XX T β|X) + E(Xε|X)
E(XY |X) = E(XX T |X)β
LIE : E(XY ) = E(XX T )β
β = [E(XX T )]−1 E(XY ) = β ∗ (28)
Next,
ε∗ = Y − X T β ∗
E(ε∗ |X) = E(Y |X) − E(X T β ∗ |X)
E(ε∗ |X) = X T β − X T β ∗ = 0 (29)
The first line follows from the definition of ε∗. The second line is obtained by taking conditional expectations on both sides of the equation. The third line follows from E(Y|X) = X^T β together with the result above that β∗ = β.
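The population identities above can be mirrored with sample moments. The DGP below (X = (1, Z), Y = 1 + 2Z + e) is hypothetical, chosen only so that E(Y|X) = X^T β holds:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100_000

# Hypothetical DGP (for illustration) with E(Y|X) = X^T beta:
# X = (1, Z), Y = 1 + 2 Z + e, with Z and e independent standard normals.
Z = rng.standard_normal(n)
e = rng.standard_normal(n)
X = np.column_stack([np.ones(n), Z])  # first regressor X1 = 1, as in the question
Y = 1.0 + 2.0 * Z + e

# Sample analogue of beta* = [E(XX^T)]^{-1} E(XY).
beta_star = np.linalg.solve(X.T @ X / n, X.T @ Y / n)

# Orthogonality: the sample analogue of E(X eps*) vanishes by construction.
eps_star = Y - X @ beta_star
orth = X.T @ eps_star / n
```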
Question 7
Suppose that the simple regression model y = βx + ε satisfies the LRM assumptions.
1. Find the minimum mean squared error linear estimator of β, i.e., β̃ = c∗′y, where c∗ = arg min_c E[(c′y − β)^2 | x].
2. Compare the conditional expectation and variance of β̃ given x with those of the OLS estimator.
Solution:
1. We start by writing what the two terms (expectation and variance) in the function we want to minimize look like:

E(c′y | x) = c′xβ,   Var(c′y | x) = σ^2 c′c
Hence

MSE = σ^2 c′c + (c′xβ − β)^2 = σ^2 c′c + β^2 (c′x − 1)^2
∂MSE/∂c = 2σ^2 c + 2β^2 (c′x − 1)x = 0
=⇒ σ^2 c = β^2 (1 − c′x)x (30)
Premultiplying (30) with x′ then yields (note the scalars that can be moved around and that x′c = c′x)

σ^2 x′c = β^2 (1 − c′x) x′x
x′c (σ^2 + β^2 x′x) = β^2 x′x
x′c = (σ^2 + β^2 x′x)^{-1} β^2 x′x (31)
Similarly, premultiplying (30) with y′ gives

σ^2 y′c = β^2 (1 − c′x) y′x
β̃ = c′y = σ^{-2} β^2 (1 − c′x) y′x (32)

Substituting (31) for c′x,

β̃ = σ^{-2} β^2 [1 − (σ^2 + β^2 x′x)^{-1} β^2 x′x] y′x
= σ^{-2} β^2 (σ^2 + β^2 x′x − β^2 x′x)/(σ^2 + β^2 x′x) y′x
= β^2 (σ^2 + β^2 x′x)^{-1} x′y
Note that the estimator is a function of the unknown parameters β and σ and cannot be computed (it is infeasible); formally it is not even an estimator, since it is not a mapping from the data to the parameter space. We can often obtain a feasible estimator that is similar to the infeasible one by first estimating these unknown parameters.
2. We can then calculate the (conditional) expected value and variance:
E[β̃ | x] = β^2 (σ^2 + β^2 x′x)^{-1} x′ E[y | x] = β^2 (σ^2 + β^2 x′x)^{-1} x′x β
= (σ^2/β^2 + x′x)^{-1} x′x β

Var[β̃ | x] = β^4 (σ^2 + β^2 x′x)^{-2} x′ Var[y | x] x = β^4 (σ^2 + β^2 x′x)^{-2} σ^2 x′x
= (σ^2/β^2 + x′x)^{-2} σ^2 x′x

Var[β̂ | x] = Var[(x′x)^{-1} x′y | x] = σ^2 / (x′x)
We see that β̃ is biased and its expected value is not equal to the expected value of the ordinary least squares estimator (which is unbiased). Furthermore, taking the ratio

Var[β̃ | x] / Var[β̂ | x] = (x′x)^2 / (σ^2/β^2 + x′x)^2

shows that the variance of the minimum mean squared error estimator is always smaller than the variance of the OLS estimator. But as x′x grows larger, the variances become more similar (the ratio goes to one).
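The variance comparison can be checked by plugging numbers into the ratio; σ^2 = 1 and β^2 = 0.25 below are arbitrary illustrative values:

```python
# Ratio Var(beta_tilde|x) / Var(beta_hat|x) = (x'x)^2 / (sigma^2/beta^2 + x'x)^2.
def var_ratio(xx, sigma2=1.0, beta2=0.25):
    return xx**2 / (sigma2 / beta2 + xx) ** 2

# The ratio stays below one and approaches one as x'x grows.
ratios = [var_ratio(xx) for xx in (10.0, 100.0, 10_000.0)]
below_one = all(r < 1 for r in ratios)
increasing = ratios == sorted(ratios)
```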
Question 8
Verify Properties (2) and (3) of the OLS estimator β̂ using a Monte Carlo simulation. Specifically,
suppose that the data {(Yi , Xi ) : i = 1, ..., n} are i.i.d. and generated from the following model:
where Vi, ηi, and ui are independent, ui ∼ N(0, 1), ηi is a Rademacher random variable (i.e., P(ηi = 1) = P(ηi = −1) = 1/2), and Vi ∼ U[0, 1].
2. Generate data from the true model with n = 100 and β = [1 0.5]T
3. Estimate β using OLS. Plot the fitted line and the data.
4. Repeat step (2) 1,000 times and check whether the average and the variance of the estimates are close to the corresponding theoretical values. When approximating the conditional quantities, use a fixed sample of (Vi, ηi).
Solution 8.1:
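The model equation itself did not survive extraction, so the sketch below assumes a hypothetical DGP built from the stated ingredients (Xi = ηiVi, Yi = β1 + β2Xi + ui); substitute the actual model from the problem set. With a design held fixed across replications, the OLS estimates should average to β with conditional variance σ^2 (X′X)^{-1}:

```python
import numpy as np

rng = np.random.default_rng(5)
n, reps = 100, 1000
beta = np.array([1.0, 0.5])

# Hypothetical DGP (assumed; the model equation is missing from the source):
# X_i = eta_i * V_i with eta_i Rademacher and V_i ~ U[0, 1]; u_i ~ N(0, 1).
eta = rng.choice([-1.0, 1.0], size=n)
V = rng.uniform(0.0, 1.0, size=n)
x = eta * V
X = np.column_stack([np.ones(n), x])  # fixed design across replications

# Theoretical conditional moments of OLS: E(beta_hat|X) = beta and
# Var(beta_hat|X) = sigma^2 (X'X)^{-1}, with sigma^2 = 1 here.
XtX_inv = np.linalg.inv(X.T @ X)

estimates = np.empty((reps, 2))
for r in range(reps):
    u = rng.standard_normal(n)
    Y = X @ beta + u
    estimates[r] = np.linalg.lstsq(X, Y, rcond=None)[0]

avg = estimates.mean(axis=0)  # should be close to beta
var = np.cov(estimates.T)     # should be close to XtX_inv
```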