University of Amsterdam
Week 2. Lecture 1
February, 2024
Overview
The plan for this week
Recap: Linear model
Last week, we considered the simple linear model with a single regressor x_i:

y_i = \alpha + \beta x_i + \varepsilon_i, \qquad i = 1, \ldots, n. \qquad (1)
Recap: OLS estimator
We used the sample data \{(y_i, x_i)\}_{i=1}^{n} to construct statistics that can be used as estimates of (\alpha, \beta). For this purpose, we considered the Ordinary Least Squares (OLS) objective function.
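As a reminder, that objective (written here in notation consistent with the LS_n(\beta) used later in these slides) was

LS_n(\alpha, \beta) = \sum_{i=1}^{n} (y_i - \alpha - \beta x_i)^2,

to be minimized jointly over (\alpha, \beta).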
1. Multiple regression setting
1.1. Motivation
Scatter plot
[Figure: scatter plot of hotel price (vertical axis, roughly 50 to 250) against distance_km (horizontal axis, 0 to 20).]
Two models we considered last week
Problem: Both of these models were able to explain some of the features in the above scatter plot, e.g. that the most expensive hotels are the ones closest to the city center. However, neither of the models was able to explain/predict/fit what happens for hotels that are far away from the center.
Combination of the two
If we believe that distance is more important for hotels closer to the city center than for those outside of it, then it is natural to consider models that combine features from the separate individual models.
While the first model simply adds the two regressors linearly, the second one allows for an interaction effect between the distance variable and the indicator of whether the hotel is within 2 km of the center, as written out below.
Models of the second type are very common in empirical work. More about that in Week 5.
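Concretely, the combined specification (a sketch; it matches the design matrix shown in (19) below, and the variable names are mine) is

price_i = \beta_1 + \beta_2 \, dist_i + \beta_3 \, \mathbf{1}\{dist_i < 2\} + \beta_4 \, dist_i \cdot \mathbf{1}\{dist_i < 2\} + \varepsilon_i.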
1.2. The model
Multiple regression model
Here we slightly deviate from the convention used before (for a reason that will become obvious soon) and include the intercept as the first regressor, i.e. \alpha = \beta_1 and x_{1,i} = 1, i = 1, \ldots, n.
For the results that follow, the distinction between regressors that vary across units (e.g. distance) and those that do not (the intercept) is immaterial.
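In this notation the model reads

y_i = \beta_1 x_{1,i} + \beta_2 x_{2,i} + \cdots + \beta_K x_{K,i} + \varepsilon_i, \qquad i = 1, \ldots, n,

with x_{1,i} = 1, so that \beta_1 plays the role of the intercept \alpha.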
Interpretation
If all regressors are continuous, then

\frac{\partial \, \mathrm{E}[y_i \mid (x_{1,i}, \ldots, x_{K,i})]}{\partial x_{k,i}} = \beta_k, \qquad k = 2, \ldots, K. \qquad (9)

Hence, \beta_k measures the effect of a marginal change in x_{k,i} on the conditional expectation of y_i (given all regressors).
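For example (illustrative numbers only, not estimates from the course data): if \beta_k = -10 for the distance regressor, with price in euros and distance in kilometres, then one extra kilometre from the center lowers the expected price by 10 euros, holding all other regressors fixed.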
Interpretation. Hotels example.
Not all regressors are equal!
Note that while I use the same notation (x_{1,i}, \ldots, x_{K,i}) for all regressors (so they are all mathematically equal), in reality some regressors are of greater interest to economists than others.
1.3. Empirical results
Empirical results. Model 1.
Empirical results. Model 2.
How should we interpret Model 2? Fitted curves
Conclusions?
On Model 1.
• Just adding the two regressors additively does not improve the fit of the model substantially, i.e. R^2 goes up only marginally. Later this week we explain why R^2 should always go up.
• From Model 1 it is clear that distance as such does not matter that much for the variation in prices. What matters more is whether the distance is < 2 km or not. So the relationship is not linear after all.
On Model 2.
• Adding the interaction term dramatically improves R^2. Hence, distance matters!
• But it is mostly important (and can be explained by the model) only if you are within the 2 km radius of the city center.
• For observations outside the 2 km radius from the city center, the effect of distance is even positive!
• This illustrates why models with interacted explanatory variables are so popular among applied econometricians and economists.
2. Linear model and OLS using matrix notation
2.1. Multiple regression model
How is OLS calculated with multiple regressors?

(\hat\beta_1, \ldots, \hat\beta_K) = \arg\min_{\beta_1, \ldots, \beta_K} \sum_{i=1}^{n} \Big( y_i - \sum_{k=1}^{K} \beta_k x_{k,i} \Big)^2. \qquad (13)
Derivatives
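Differentiating (13) with respect to each \beta_j gives

\frac{\partial LS_n(\beta_1, \ldots, \beta_K)}{\partial \beta_j} = -2 \sum_{i=1}^{n} x_{j,i} \Big( y_i - \sum_{k=1}^{K} \beta_k x_{k,i} \Big), \qquad j = 1, \ldots, K,

and the OLS estimator sets all K of these derivatives to zero.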
Not convenient
Here I use the convention that ′ denotes the transpose of a vector or matrix, and all bold quantities are column vectors.
Matrix preliminaries for OLS
y = (y_1, \ldots, y_n)', \quad [n \times 1]
\varepsilon = (\varepsilon_1, \ldots, \varepsilon_n)', \quad [n \times 1]
x_i = (x_{1,i}, \ldots, x_{K,i})', \quad [K \times 1]
x^{(k)} = (x_{k,1}, \ldots, x_{k,n})', \quad [n \times 1]
X = (x_1, \ldots, x_n)', \quad [n \times K]
\beta = (\beta_1, \ldots, \beta_K)', \quad [K \times 1].

In this notation, the model is

y_i = x_i'\beta + \varepsilon_i, \qquad (17)

for all i = 1, \ldots, n.
Matrix preliminaries for OLS
y_1 = x_1'\beta + \varepsilon_1,
y_2 = x_2'\beta + \varepsilon_2,
\vdots
y_{n-1} = x_{n-1}'\beta + \varepsilon_{n-1},
y_n = x_n'\beta + \varepsilon_n.

Or simply:

y = X\beta + \varepsilon. \qquad (18)
Example. Vienna Hotels. Model with interaction.
The first five rows of y and X are given by (without any specific sorting):
y = \begin{pmatrix} 81 \\ 85 \\ 83 \\ 82 \\ 103 \\ \vdots \end{pmatrix}, \qquad
X = \begin{pmatrix}
1 & 2.737 & 0 & 0 \\
1 & 2.254 & 0 & 0 \\
1 & 2.737 & 0 & 0 \\
1 & 1.932 & 1 & 1.932 \\
1 & 1.449 & 1 & 1.449 \\
\vdots & \vdots & \vdots & \vdots
\end{pmatrix}. \qquad (19)
Here the first column of X is the vector of ones (the intercept), the second column is the distance in kilometres, the third column is a binary variable indicating whether the hotel is < 2 km from the city center, and the final column is the product of the latter two (see the numpy sketch below).
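A minimal numpy sketch of assembling this design matrix (using only the five observations shown in (19); the variable names are mine, not from the course materials):

    import numpy as np

    # First five observations from (19): price and distance to the center (km).
    y = np.array([81.0, 85.0, 83.0, 82.0, 103.0])
    dist = np.array([2.737, 2.254, 2.737, 1.932, 1.449])

    close = (dist < 2.0).astype(float)   # indicator: hotel is < 2 km from the center
    X = np.column_stack([
        np.ones_like(dist),              # column 1: intercept
        dist,                            # column 2: distance in km
        close,                           # column 3: indicator (< 2 km)
        dist * close,                    # column 4: interaction term
    ])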
2.2. OLS estimator
OLS using matrix notation
In matrix notation the OLS objective function becomes

LS_n(\beta) = (y - X\beta)'(y - X\beta) = \sum_{i=1}^{n} (y_i - x_i'\beta)^2.

Note that for any value of \beta, LS_n(\beta) is a scalar!
Derivatives
\frac{\partial LS_n(\beta)}{\partial \beta} = -2 X'(y - X\beta).
The OLS estimator
From the above we conclude that the OLS estimator \hat\beta is the solution to the following set of K equations in K unknowns:

X'(y - X\hat\beta) = 0_K. \qquad (21)

Solving for \hat\beta gives

\hat\beta = (X'X)^{-1}(X'y) = \Big( \sum_{i=1}^{n} x_i x_i' \Big)^{-1} \Big( \sum_{i=1}^{n} x_i y_i \Big). \qquad (22)

Here (\cdot)^{-1} is the usual matrix inverse.
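In code, (22) translates almost literally. A minimal sketch continuing the numpy example above (solving the normal equations with np.linalg.solve rather than forming an explicit inverse, which is numerically preferable):

    def ols(X, y):
        # Solve the normal equations X'X b = X'y from (21)-(22).
        return np.linalg.solve(X.T @ X, X.T @ y)

    beta_hat = ols(X, y)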
The OLS estimator. Special case
For the special case we analyzed in the previous week, x_i = (1, x_i)' and \hat\beta = (\hat\alpha, \hat\beta)', so that:

\begin{pmatrix} \hat\alpha \\ \hat\beta \end{pmatrix} =
\begin{pmatrix} \sum_{i=1}^{n} 1 & \sum_{i=1}^{n} x_i \\ \sum_{i=1}^{n} x_i & \sum_{i=1}^{n} x_i^2 \end{pmatrix}^{-1}
\begin{pmatrix} \sum_{i=1}^{n} y_i \\ \sum_{i=1}^{n} x_i y_i \end{pmatrix}. \qquad (23)

We arrive at the expressions derived previously in the course upon using the exact formula for the inversion of a [2 × 2] matrix.
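Carrying out that inversion (a standard identity, stated here for completeness) yields the familiar expressions

\hat\beta = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad \hat\alpha = \bar{y} - \hat\beta \bar{x}.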
3. Geometry of OLS
3.1. Fitted values and residuals
LS objective function decomposition
Observe that:
LS_n(\beta) = (y - X\hat\beta - X(\beta - \hat\beta))'(y - X\hat\beta - X(\beta - \hat\beta))
            = (y - X\hat\beta)'(y - X\hat\beta) + (\beta - \hat\beta)'X'X(\beta - \hat\beta)
              - (y - X\hat\beta)'X(\beta - \hat\beta) - (\beta - \hat\beta)'X'(y - X\hat\beta).
LS objective function decomposition
Observe that:

(y - X\hat\beta)'X = (y - X(X'X)^{-1}X'y)'X = y'X - y'X = 0_K'. \qquad (24)

Hence the cross terms above vanish, leaving

LS_n(\beta) = (y - X\hat\beta)'(y - X\hat\beta) + (\beta - \hat\beta)'X'X(\beta - \hat\beta) \geq LS_n(\hat\beta),

so \hat\beta indeed minimizes LS_n(\beta).
Decomposition
\hat{y} = X\hat\beta = X(X'X)^{-1}X'y, \qquad (27)
\hat{e} = y - \hat{y} = (I_n - X(X'X)^{-1}X')y. \qquad (28)

Hence both the fitted values and the residuals are certain (linear) transformations of the original data y.
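Continuing the numpy sketch, the fitted values (27) and residuals (28) are one line each, and the residuals come out numerically orthogonal to every column of X:

    y_hat = X @ beta_hat     # fitted values, eq. (27)
    e_hat = y - y_hat        # residuals, eq. (28)
    print(X.T @ e_hat)       # ~ 0_K: residuals orthogonal to the columns of X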
3.2. Projection matrices
Decomposition
Let:

P_X = X(X'X)^{-1}X',
M_X = I_n - X(X'X)^{-1}X'.

Then:

M_X + P_X = I_n, \qquad (29)

and also:

M_X P_X = O_{n \times n}. \qquad (30)

These two matrices (M_X and P_X) are very special and are known as projection matrices. M_X is also known as the residual-maker matrix, for an obvious reason.
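A quick numerical check of (29)-(30), and of the projection properties stated on the next slide, again only as a sketch with the small running example:

    n = len(y)
    P = X @ np.linalg.inv(X.T @ X) @ X.T   # P_X
    M = np.eye(n) - P                      # M_X

    assert np.allclose(P + M, np.eye(n))   # eq. (29): M_X + P_X = I_n
    assert np.allclose(M @ P, 0.0)         # eq. (30): M_X P_X = O
    assert np.allclose(P @ P, P) and np.allclose(P, P.T)  # idempotent and symmetric
    assert np.allclose(M @ y, e_hat)       # M_X is the residual maker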
M_X and P_X are projection matrices

A matrix V is an (orthogonal) projection matrix if it is idempotent and symmetric, i.e. if

V = V^2 = V'. \qquad (31)
Projection matrix P_X
Projection matrix M_X
For any vector z = X\gamma in the column space of X:

M_X z = X\gamma - X(X'X)^{-1}X'X\gamma = X\gamma - X\gamma = 0_n. \qquad (34)
OLS geometrically
Hence, geometrically, OLS simply projects y onto two spaces that are orthogonal to each other:
• \hat{y}: the fitted values, which lie in the K-dimensional space spanned by the columns of X;
• \hat{e}: the residuals, which lie in the (n - K)-dimensional space orthogonal to the columns of X.
Implication. Projection matrices.
Since the model contains an intercept, the first column of X is \imath_n, the vector of ones:

\imath_n = X e_1, \qquad (37)

where e_1 = (1, 0, \ldots, 0)' is the first standard basis vector. Hence \imath_n lies in the column space of X, so that P_X \imath_n = \imath_n and M_X \imath_n = 0_n.
3.3. The R^2
Some preliminaries
Demeaning projection matrix
\tilde{y} \equiv y - \imath_n \bar{y} = y - \imath_n \imath_n' y / n = y - \imath_n (\imath_n' \imath_n)^{-1} \imath_n' y. \qquad (41)

Hence:

\tilde{y} = M_1 y, \qquad (42)

where M_1 \equiv I_n - \imath_n (\imath_n' \imath_n)^{-1} \imath_n' is the projection matrix that demeans any n-vector it multiplies.
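As a numerical sanity check (a sketch, same conventions as before), M_1 y coincides with y minus its sample mean:

    ones = np.ones((n, 1))
    M1 = np.eye(n) - ones @ ones.T / n        # M_1 = I_n - i_n (i_n' i_n)^{-1} i_n'
    assert np.allclose(M1 @ y, y - y.mean())  # eqs. (41)-(42): demeaning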
SST = SSE + SSR decomposition
\tilde{y} = M_1 y = M_1 (P_X + M_X) y = M_1 P_X y + M_1 M_X y. \qquad (43)

Moreover, since M_X \imath_n = 0_n when the model contains an intercept:

M_1 M_X = M_X. \qquad (45)
SST = SSE + SSR decomposition
\tilde{y} = M_1 P_X y + M_X y, \qquad (46)

so that:

\tilde{y}'\tilde{y} = y' P_X' M_1' M_1 P_X y + y' M_X' M_X y, \qquad (47)

where we used the fact that, because M_X = M_X' = M_X^2 (and the same for M_1):

M_X' M_1' P_X = (M_1 M_X)' P_X = M_X P_X = O. \qquad (48)
The R^2
\underbrace{\tilde{y}'\tilde{y}}_{SST} = \underbrace{\hat{y}' M_1 \hat{y}}_{SSE} + \underbrace{\hat{e}'\hat{e}}_{SSR}. \qquad (49)

Hence:

R^2 \equiv \frac{SSE}{SST} = 1 - \frac{y' M_X y}{y' M_1 y}. \qquad (50)
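Numerically, (50) can be checked against the equivalent SSE/SST form; a sketch closing the running example:

    SST = y @ M1 @ y
    SSR = y @ M @ y                    # equals e_hat'e_hat, since M_X y = e_hat
    R2 = 1.0 - SSR / SST               # eq. (50)
    assert np.isclose(R2, (y_hat @ M1 @ y_hat) / SST)  # SSE / SST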
4. Summary
Summary today
In this lecture
• We introduced the multiple regression framework.
• We introduced the vector/matrix notation for this framework.
• We showed how the OLS estimator can be derived using this new notation.
• We provided a geometric interpretation of the OLS estimator.
• We interpreted the residuals and fitted values in terms of the corresponding orthogonal projections.
On Friday