Appendix E
This appendix derives various results for ordinary least squares estimation of the multiple linear regression model using matrix notation and matrix algebra. For each observation t = 1, 2, …, n, the model is

y_t = β_1 + β_2 x_{t2} + … + β_k x_{tk} + u_t,

where y_t is the dependent variable for observation t, and x_{tj}, j = 2, 3, …, k, are the independent variables.
Define the 1×k row vector x_t = (1, x_{t2}, …, x_{tk}), whose first entry is unity for every t. [Some authors prefer to define x_t as a column vector, in which case x_t is replaced with x_t' everywhere. Mathematically, it makes more sense to define it as a row vector.]
We can write the model for all n observations in full matrix notation by appropriately defining the data vectors and matrices: let y be the n×1 vector of observations on the dependent variable, X the n×k matrix whose t-th row is x_t, β the k×1 parameter vector, and u the n×1 vector of errors. Then

y = Xβ + u.
Estimation of β proceeds by minimizing the sum of squared residuals. Define the sum of squared residuals function for any possible k×1 parameter vector b as

SSR(b) ≡ Σ_{t=1}^n (y_t − x_t b)² = (y − Xb)'(y − Xb).

The k×1 vector of OLS estimates, β̂, minimizes SSR(b) over all possible k×1 vectors b, so it satisfies the first order condition ∂SSR(β̂)/∂b = 0. This is equivalent to

Σ_{t=1}^n x_t'(y_t − x_t β̂) = 0.

(We have divided by −2 and taken the transpose.)
We want to write this first order condition in matrix form to make it more useful. Using the formula for partitioned multiplication in Appendix D, we see that

Σ_{t=1}^n x_t'(y_t − x_t β̂) = 0

is equivalent to

X'(y − Xβ̂) = 0,

or

(X'X)β̂ = X'y.
It can be shown that this system of equations always has at least one solution. Multiple solutions do not help us, as we are looking for a unique set of OLS estimates given our data set. Assuming that the k×k symmetric matrix X'X is nonsingular, we can premultiply both sides by (X'X)^{-1} to solve for the OLS estimator:

β̂ = (X'X)^{-1} X'y.

The assumption that X'X is invertible is equivalent to the assumption that rank(X) = k, which means that the columns of X must be linearly independent. This is the matrix version of multiple linear regression assumption 4 in Chapter 3.
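As a numerical check (not part of the original text), the following NumPy sketch simulates a small data set and computes β̂ from the normal equations; all variable names and simulated values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative simulated data: n = 100 observations, k = 3 regressors
# (an intercept column of ones plus two other variables).
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta = np.array([1.0, 2.0, -0.5])
y = X @ beta + rng.normal(size=n)

# OLS from the normal equations (X'X) b = X'y; solving the linear system
# directly is numerically preferable to forming (X'X)^{-1} explicitly.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Cross-check against NumPy's built-in least-squares solver.
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(beta_hat, beta_lstsq))  # True
```

Solving (X'X)b = X'y with a linear solver and inverting X'X give the same β̂ in exact arithmetic; the solver form is simply the more stable computation.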
It is tempting to "solve" y = Xβ̂ + û for β̂ directly by inverting X. The flaw in this reasoning is that X is usually not a square matrix, and so it cannot be inverted. In other words, we cannot write β̂ = X^{-1}y unless n = k, a case that virtually never arises in practice.
From the first order condition X'(y − Xβ̂) = 0 and the definition of the OLS residual vector, û ≡ y − Xβ̂, we can see that the first order condition for β̂ is the same as

X'û = 0.

Because the first column of X consists entirely of ones, X'û = 0 implies that the OLS residuals always sum to zero when an intercept is included in the equation, and that the sample covariance between each independent variable and the OLS residuals is zero.
Using matrix notation, we can show that the total sum of squares is equal to the explained sum of squares plus the sum of squared residuals:

Σ_{t=1}^n (y_t − ȳ)² = Σ_{t=1}^n (ŷ_t − ȳ)² + Σ_{t=1}^n û_t², that is, SST = SSE + SSR,

where ŷ = Xβ̂ is the vector of fitted values.
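Both facts, the first order condition X'û = 0 and the decomposition SST = SSE + SSR, can be verified numerically. The sketch below (simulated data; names are illustrative) is an assumption-free check of the algebra, not part of the original text.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 60
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # first column: all ones
y = X @ np.array([0.5, 1.0, -1.0]) + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
u_hat = y - X @ beta_hat        # OLS residuals
y_fit = X @ beta_hat            # fitted values

# First order condition X'u_hat = 0 (up to floating-point error); its first
# component says the residuals sum to zero because the model has an intercept.
print(np.max(np.abs(X.T @ u_hat)) < 1e-8)   # True

# Sum-of-squares decomposition SST = SSE + SSR.
sst = np.sum((y - y.mean()) ** 2)
sse = np.sum((y_fit - y.mean()) ** 2)
ssr = np.sum(u_hat ** 2)
print(np.isclose(sst, sse + ssr))           # True
```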
1. Linear in parameters: the model can be written as y = Xβ + u.
2. Zero conditional mean: conditional on the entire matrix X, the errors have zero mean, E(u|X) = 0.
3. No perfect collinearity: the matrix X has rank k.
This is a careful statement of the assumption that rules out linear dependencies among the explanatory variables. Under it, X'X is nonsingular, and so β̂ is unique and can be written as β̂ = (X'X)^{-1}X'y. Substituting y = Xβ + u gives

β̂ = (X'X)^{-1}X'(Xβ + u) = β + (X'X)^{-1}X'u,

so that

E(β̂|X) = β + (X'X)^{-1}X'E(u|X) = β,

and we have seen that β̂ is unbiased.
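Unbiasedness is a statement about the average of β̂ over repeated samples with X held fixed. A small Monte Carlo sketch (illustrative simulated design, not from the original text) makes this concrete: averaging β̂ over many fresh error draws should come close to β.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 30
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # X held fixed across draws
beta = np.array([1.0, -2.0, 0.5])

# Conditional on X, E(beta_hat | X) = beta, so the Monte Carlo average
# of beta_hat over many error draws should be close to beta.
draws = np.empty((5000, 3))
for i in range(5000):
    y = X @ beta + rng.normal(size=n)
    draws[i] = np.linalg.solve(X.T @ X, X.T @ y)

print(draws.mean(axis=0))  # close to [1.0, -2.0, 0.5]
```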
To obtain the simplest form of the variance-covariance matrix of β̂, we impose the assumptions of homoskedasticity and no serial correlation:

Var(u|X) = σ² I_n,

where I_n is the n×n identity matrix. Homoskedasticity means that the variance of u_t cannot depend on any element of X and must be constant across observations t. The no serial correlation assumption means that the errors cannot be correlated across observations. We often say that u has a scalar variance-covariance matrix when this assumption holds. We can now derive the variance-covariance matrix of the OLS estimator:

Var(β̂|X) = Var[(X'X)^{-1}X'u | X] = (X'X)^{-1}X'[Var(u|X)]X(X'X)^{-1} = σ²(X'X)^{-1}.
This means that the variance of β̂_j (conditional on X) is obtained by multiplying σ² by the jth diagonal element of (X'X)^{-1}. The equation also tells us how to obtain the covariance between any two OLS estimates: multiply σ² by the appropriate off-diagonal element of (X'X)^{-1}.
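In practice σ² is replaced by an estimate, giving the estimated variance-covariance matrix and the standard errors used for inference. The sketch below (simulated data; the estimator σ̂² = û'û/(n−k) is introduced formally further on) is illustrative, not part of the original text.

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 0.5, -0.5]) + rng.normal(scale=2.0, size=n)  # true sigma^2 = 4

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
u_hat = y - X @ beta_hat

# Unbiased error-variance estimate and estimated variance-covariance matrix
# sigma2_hat * (X'X)^{-1}.
sigma2_hat = (u_hat @ u_hat) / (n - k)
V_hat = sigma2_hat * np.linalg.inv(X.T @ X)

# Standard errors: square roots of the diagonal; covariances: off-diagonal entries.
se = np.sqrt(np.diag(V_hat))
```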
Unbiasedness of σ̂²

The natural estimator of the error variance is σ̂² = û'û/(n − k), where û = y − Xβ̂ is the vector of OLS residuals. Based on the above assumptions, E(σ̂²|X) = σ².

Proof: Write û = y − Xβ̂ = Mu, where M ≡ I_n − X(X'X)^{-1}X' is symmetric and idempotent with MX = 0 and tr(M) = n − k. Then E(û'û|X) = E(u'Mu|X) = σ² tr(M) = σ²(n − k), and so E(σ̂²|X) = σ²(n − k)/(n − k) = σ².
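The division by n − k rather than n is exactly what makes σ̂² unbiased. A Monte Carlo sketch (illustrative simulated design, not from the original text) shows the average of σ̂² over repeated error draws landing near the true σ².

```python
import numpy as np

rng = np.random.default_rng(4)
n, k = 30, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta = np.array([1.0, -2.0, 0.5])
sigma2 = 4.0

# Average sigma2_hat = u_hat'u_hat / (n - k) over many error draws;
# unbiasedness says the Monte Carlo average should be close to sigma2.
draws = np.empty(5000)
for i in range(5000):
    y = X @ beta + rng.normal(scale=np.sqrt(sigma2), size=n)
    b = np.linalg.solve(X.T @ X, X.T @ y)
    u_hat = y - X @ b
    draws[i] = (u_hat @ u_hat) / (n - k)

print(draws.mean())  # close to sigma2 = 4.0
```

Dividing by n instead of n − k would bias the average downward by the factor (n − k)/n, which is noticeable here with n = 30 and k = 3.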
E.3 Statistical Inference
When we add the final classical linear model assumption, that conditional on X the error vector is multivariate normal, u|X ~ Normal(0, σ²I_n), the OLS estimator has a multivariate normal distribution,

β̂|X ~ Normal(β, σ²(X'X)^{-1}),

which leads to the t and F distributions.
Under Assumption MLR.5, each ut is independent of the explanatory variables for all t. In a time
series setting, this is essentially the strict exogeneity assumption.
This theorem (normality of the distribution of β̂) is the basis for statistical inference involving β̂. We can also find the chi-square, t, and F distributions of statistics based on β̂.
Normal distribution: (β̂_j − β_j)/sd(β̂_j) ~ Normal(0, 1), where sd(β̂_j) is the square root of the jth diagonal element of σ²(X'X)^{-1}.

Chi-square distribution: (n − k)σ̂²/σ² ~ χ²_{n−k}.
t distribution: the ratio of a standard normal random variable to the square root of an independent chi-square random variable divided by its degrees of freedom has a t distribution with that same number of degrees of freedom. Applying this to the two results above (and replacing σ with σ̂ in the standard deviation) gives

(β̂_j − β_j)/se(β̂_j) ~ t_{n−k}.
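The resulting t statistic is computed directly from the pieces already derived: β̂, σ̂², and the diagonal of (X'X)^{-1}. The sketch below (simulated data, illustrative names, not from the original text) builds the statistic for the null hypothesis β_j = 0.

```python
import numpy as np

rng = np.random.default_rng(5)
n, k = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta = np.array([0.0, 1.5, 0.0])          # third coefficient is truly zero
y = X @ beta + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
u_hat = y - X @ beta_hat
sigma2_hat = (u_hat @ u_hat) / (n - k)    # unbiased error-variance estimate
se = np.sqrt(sigma2_hat * np.diag(np.linalg.inv(X.T @ X)))

# Under H0: beta_j = 0, the statistic beta_hat_j / se_j follows t_{n-k}.
t_stats = beta_hat / se
```

With n − k = 197 degrees of freedom, |t| values far beyond conventional critical values (about 1.97 at the 5% level) indicate rejection of the null; here the second coefficient produces a large |t| while the third, whose true value is zero, typically does not.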