ECO 6175
ABEL BRODEUR
Weeks 2 & 3
Statistical Tools
Outline:
- OLS
- Limitations of OLS
  - Omitted variable bias
  - Measurement error
  - Reverse causality
- Heteroscedasticity
- Panel Data
- Discrete Choice Models
Ordinary Least Squares

\[ Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i \]
Regression
Let’s Solve!
\[ \min_{\hat\beta_0, \hat\beta_1} \sum_i \hat\varepsilon_i^2 = \sum_i (Y_i - \hat Y_i)^2 \]
Graph!
First-Order Conditions
\[ \frac{\partial RSS}{\partial \hat\beta_0} = -2 \sum_i (Y_i - \hat Y_i) = 0 \quad (1) \]
\[ \frac{\partial RSS}{\partial \hat\beta_1} = -2 \sum_i x_i (Y_i - \hat Y_i) = 0 \quad (2) \]
First-Order Conditions
From eq. (2):
\[ \sum_i x_i (Y_i - \hat Y_i) = 0 \]
\[ \sum_i x_i (Y_i - \hat\beta_0 - \hat\beta_1 x_i) = 0 \]
\[ \sum_i x_i (Y_i - \bar Y + \hat\beta_1 \bar x - \hat\beta_1 x_i) = 0 \]
Least Squares Formulas
\[ \hat\beta_1 = \frac{\sum_i x_i (Y_i - \bar Y)}{\sum_i x_i (x_i - \bar x)} \]
Be careful: we need $(x_i - \bar x) \neq 0$ for some $i$ ($x$ must vary, so the denominator is nonzero):
\[ \hat\beta_1 = \frac{\sum_i (Y_i - \bar Y)(x_i - \bar x)}{\sum_i (x_i - \bar x)^2} \]
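As a quick check, here is a minimal Python sketch (simulated data; all names and parameter values are ours, not from the slides) that applies the closed-form slope and intercept formulas above:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = 1.0 + 2.0 * x + rng.normal(size=500)           # true beta0 = 1, beta1 = 2

# closed-form OLS slope and intercept from the formulas above
beta1 = np.sum((y - y.mean()) * (x - x.mean())) / np.sum((x - x.mean()) ** 2)
beta0 = y.mean() - beta1 * x.mean()
print(beta0, beta1)                                # should be close to (1, 2)
```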
Condition
Hessian Matrix
\[ H = \begin{pmatrix} \dfrac{\partial^2 RSS}{\partial \hat\beta_0^2} & \dfrac{\partial^2 RSS}{\partial \hat\beta_0 \partial \hat\beta_1} \\ \dfrac{\partial^2 RSS}{\partial \hat\beta_1 \partial \hat\beta_0} & \dfrac{\partial^2 RSS}{\partial \hat\beta_1^2} \end{pmatrix} \]
Hessian Matrix
If $\widehat{var}(X) > 0$, then this is a positive definite matrix since all
the principal submatrices have positive determinants:
\[ H = 2N \begin{pmatrix} 1 & \frac{1}{N}\sum_i x_i \\ \frac{1}{N}\sum_i x_i & \frac{1}{N}\sum_i x_i^2 \end{pmatrix} \]
Hessian Matrix
If $\widehat{var}(X) > 0$, then this is a positive definite matrix since all
the principal submatrices have positive determinants:
\[ \begin{vmatrix} 1 & \frac{1}{N}\sum_i x_i \\ \frac{1}{N}\sum_i x_i & \frac{1}{N}\sum_i x_i^2 \end{vmatrix} = \frac{1}{N}\sum_i x_i^2 - \bar x^2 = \frac{1}{N}\sum_i (x_i - \bar x)^2 \]
Multiple Independent Variables
\[ y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \varepsilon_i \]
Obtaining OLS estimates:
\[ \hat\beta_1 = \frac{\sum_i (x_{i2} - \bar x_2)^2 \sum_i (x_{i1} - \bar x_1)(y_i - \bar y) - \sum_i (x_{i1} - \bar x_1)(x_{i2} - \bar x_2) \sum_i (x_{i2} - \bar x_2)(y_i - \bar y)}{\sum_i (x_{i1} - \bar x_1)^2 \sum_i (x_{i2} - \bar x_2)^2 - \left( \sum_i (x_{i1} - \bar x_1)(x_{i2} - \bar x_2) \right)^2} \]
Multiple Independent Variables
\[ \begin{pmatrix} \hat y_1 \\ \hat y_2 \\ \vdots \\ \hat y_i \\ \vdots \\ \hat y_N \end{pmatrix} = \begin{pmatrix} x_{11}\hat\beta_1 + \cdots + x_{1j}\hat\beta_j + \cdots + x_{1k}\hat\beta_k \\ x_{21}\hat\beta_1 + \cdots + x_{2j}\hat\beta_j + \cdots + x_{2k}\hat\beta_k \\ \vdots \\ x_{i1}\hat\beta_1 + \cdots + x_{ij}\hat\beta_j + \cdots + x_{ik}\hat\beta_k \\ \vdots \\ x_{N1}\hat\beta_1 + \cdots + x_{Nj}\hat\beta_j + \cdots + x_{Nk}\hat\beta_k \end{pmatrix} \]
Multiple Independent Variables
\[ \beta = (\beta_0, \beta_1, \cdots, \beta_p)' \]
\[ \hat Y = X\beta, \]
which is an $n$-dimensional vector.
Multiple Independent Variables
\[ \min_{\hat\beta_1, \cdots, \hat\beta_k} \sum_i (y_i - \hat y_i)^2 = \sum_i (y_i - x_{i1}\hat\beta_1 - \cdots - x_{ik}\hat\beta_k)^2 \]
First-Order Conditions
\[ \frac{\partial}{\partial \hat\beta_1} = -2 \sum_i x_{i1} \hat\varepsilon_i = 0 \]
\[ \frac{\partial}{\partial \hat\beta_2} = -2 \sum_i x_{i2} \hat\varepsilon_i = 0 \]
\[ \vdots \]
\[ \frac{\partial}{\partial \hat\beta_k} = -2 \sum_i x_{ik} \hat\varepsilon_i = 0 \]
Multiple Independent Variables
$X'$: $k \times N$; $\hat\varepsilon$: $N \times 1$; $0$: $k \times 1$
\[ X'\hat\varepsilon = 0 \]
\[ X'(Y - X\hat\beta) = 0 \]
\[ \hat\beta = (X'X)^{-1} X'Y \]
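A minimal sketch of the matrix formula on simulated data (the design and names are ours):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # constant + 2 regressors
beta = np.array([1.0, 0.5, -0.3])
y = X @ beta + rng.normal(size=n)

# solve the normal equations X'(Y - X beta_hat) = 0 directly
# (np.linalg.lstsq is the numerically safer route in practice)
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)                                    # close to (1.0, 0.5, -0.3)
```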
Assumptions
Assumptions about the true regression model and the data generating process:
- Ideal conditions have to be met in order for OLS to be a good estimator
- BLUE: Best Linear Unbiased Estimator (unbiased and efficient)
Assumption 1
Assumption 2
\[ rank(X) = k \]
Assumption 3
\[ E(\mu^2 X'X) = \sigma^2 E(X'X) \]
where $\sigma^2 \equiv E(\mu^2)$
Assumption 3
\[ E(\mu^2 | X) = \sigma^2 \]
Significance and F-Test
F-statistic
- $SSR_r$: sum of squared residuals of the restricted model
- $SSR_{ur}$: sum of squared residuals of the unrestricted model (all regressors)
- The F-statistic is always non-negative ($SSR_r$ is at least as large as $SSR_{ur}$); a computational sketch follows below
- The F-test is thus a one-sided test
Notation
- k: number of regressors; n: number of observations; q: number of exclusion restrictions
- q is equal to the difference in degrees of freedom between the restricted and unrestricted models
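A sketch of the computation, assuming the standard formula $F = \frac{(SSR_r - SSR_{ur})/q}{SSR_{ur}/(n - k - 1)}$ with an intercept included, and assuming SciPy is available; the helper name `f_test` is ours:

```python
from scipy import stats

def f_test(ssr_r, ssr_ur, q, n, k):
    """F-test of q exclusion restrictions; k regressors in the
    unrestricted model plus an intercept, so df_ur = n - k - 1."""
    df_ur = n - k - 1
    F = ((ssr_r - ssr_ur) / q) / (ssr_ur / df_ur)
    p_value = stats.f.sf(F, q, df_ur)              # one-sided upper tail
    return F, p_value
```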
Interpreting and Comparing Coefficients
The size of the slope parameters depends on the scaling of the variables!
- Not easy to compare effect sizes even when the scales are similar
- It is sometimes also hard to interpret coefficients without logs (elasticities)
Standardized coefficients:
- Take the standard deviations of the dependent and independent variables into account
- Measure how much y changes if x changes by one standard deviation instead of one unit (see the sketch below)
- Also make it easier to compare coefficients across the many independent variables
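A minimal sketch on simulated data (values are ours), using the standard convention that the standardized coefficient is $\hat\beta \cdot sd(x)/sd(y)$:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(scale=10.0, size=300)               # regressor on a large scale
y = 0.2 * x + rng.normal(size=300)

b = np.sum((y - y.mean()) * (x - x.mean())) / np.sum((x - x.mean()) ** 2)
b_std = b * x.std() / y.std()                      # effect of a one-sd change in x,
print(b, b_std)                                    # measured in sd's of y
```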
Limitations of OLS
Ability Bias and the Returns to Schooling
\[ y_i = \alpha + \rho S_i + \gamma A_i + \varepsilon_i \]
\[ y_i = \alpha^s + \rho^s S_i + \varepsilon_i^s. \]
What do we get?
(I) Omitted Variables
The relationship between the long and short regression coefficients is given by the omitted variables bias (OVB) formula:
\[ \rho^s = \frac{Cov(y_i, S_i)}{Var(S_i)} = \rho + \gamma \delta_{AS} \]
where $\delta_{AS} = \frac{Cov(A_i, S_i)}{Var(S_i)}$
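A simulation sketch of the OVB formula (all parameter values are made up for illustration): the short-regression slope on $S_i$ should match $\rho + \gamma\delta_{AS}$:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000
A = rng.normal(size=n)                              # "ability" (omitted)
S = 0.5 * A + rng.normal(size=n)                    # schooling, Cov(A, S) > 0
y = 1.0 + 0.10 * S + 0.05 * A + rng.normal(size=n)  # rho = 0.10, gamma = 0.05

rho_s = np.cov(y, S)[0, 1] / S.var(ddof=1)          # short-regression slope
delta_AS = np.cov(A, S)[0, 1] / S.var(ddof=1)
print(rho_s, 0.10 + 0.05 * delta_AS)                # the two should agree
```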
Griliches (1977)
The conventional wisdom is Cov(Ai , Si ) > 0, so returns to
schooling estimates will be biased up
\[ y_i = const + 0.059\,(0.003)\, S_i + 0.0028\,(0.0005)\, IQ_i + \text{experience} \]
(standard errors in parentheses)
Bad Controls
\[ y_i = \alpha + \rho S_i + \gamma O_i + \varepsilon_i \]
\[ O_i = \lambda_0 + \lambda_1 S_i + u_i \]
You can think of these as a simultaneous equations system. Occupation is an endogenous variable. As a result, you cannot necessarily estimate the first equation by OLS. Bad control!
Classical Measurement Error
Example
\[ y_i = \alpha + \beta x_i + \varepsilon_i \]
We do not observe $x_i$; instead we have $\tilde x_i$:
\[ \tilde x_i = x_i + w_i \]
where $cov(x_i, w_i) = 0$ and $cov(\varepsilon_i, w_i) = 0$. This is called the classical measurement error model.
Classical Measurement Error
The bivariate regression coefficient we estimate is
\[ \hat\beta = \frac{cov(y_i, \tilde x_i)}{var(\tilde x_i)} = \frac{cov(\alpha + \beta x_i + \varepsilon_i, x_i + w_i)}{var(x_i + w_i)} = \beta \frac{var(x_i)}{var(x_i + w_i)} = \beta\lambda \]
We see that $\hat\beta$ is biased towards zero by an attenuation factor
\[ \lambda = \frac{var(x_i)}{var(x_i + w_i)} \]
which is the variance of the "signal" divided by the variance of the "signal plus noise".
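A simulation sketch of attenuation (parameter values are ours): the bivariate slope on the noisy regressor shrinks toward zero by $\lambda$:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000
x = rng.normal(size=n)
w = rng.normal(scale=0.5, size=n)                  # classical measurement error
x_tilde = x + w
y = 2.0 * x + rng.normal(size=n)                   # true beta = 2

b = np.cov(y, x_tilde)[0, 1] / x_tilde.var(ddof=1) # attenuated slope
lam = x.var(ddof=1) / x_tilde.var(ddof=1)          # signal / (signal + noise)
print(b, 2.0 * lam)                                # both well below 2.0
```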
Measurement Error in the Returns to Schooling
Measurement Error With Two Regressors
Consider again the generic case first
\[ y_i = \alpha + \beta_1 x_{1i} + \beta_2 x_{2i} + \varepsilon_i \]
and only $x_{1i}$ is subject to classical measurement error, i.e. $cov(x_{1i}, w_i) = cov(x_{2i}, w_i) = 0$:
\[ \hat\beta_1 = \beta_1 \lambda' \]
\[ \lambda' = \frac{\lambda - R_{12}^2}{1 - R_{12}^2} \]
where $\lambda$ is the bivariate attenuation factor, and $R_{12}^2$ is the $R^2$ from the population regression of $\tilde x_{1i}$ on $x_{2i}$
Measurement Error With Two Regressors
The short regression (on just $\tilde x_{1i}$) coefficient is
\[ \hat\beta_{1,short} = \lambda\beta_1 + \beta_2 \delta_{x_2 \tilde x_1} = \lambda(\beta_1 + \beta_2 \delta_{x_2 x_1}) \]
where the estimate of $\beta_1$ is biased both because of attenuation due to measurement error, and because of omitted variables bias
Measurement Error in the Control
What about the coefficient $\beta_2$? Even when there is no measurement error in $x_{2i}$, the estimate of $\beta_2$ will be biased:
\[ \hat\beta_2 = \beta_1 \delta_{x_1 x_2} \frac{1 - \lambda}{1 - R_{12}^2} + \beta_2 \]
Note that the bias will be larger, the larger:
- The measurement error
Intuition:
- $\beta_1$ is attenuated, and hence does not reflect the full effect of $x_{1i}$
- $\beta_2$ will capture part of the effect of $x_{1i}$, through the correlation with $x_{2i}$
Measurement Error in the Returns to Schooling
\[ y_i = \alpha + \rho S_i + \gamma A_i + \varepsilon_i \]
where $S_i$ is schooling and $A_i$ is ability. Suppose we only have a mismeasured version of schooling, $\tilde S_i$. Then the short regression will give
\[ \hat\rho_{short} = \lambda\rho + \gamma\delta_{A\tilde S} \]
and the long regression
\[ \hat\rho_{long} = \lambda'\rho \]
If ability bias is upwards ($\delta_{A\tilde S} > 0$) it is not possible to say a priori which estimate will be closer to $\rho$
Measurement Error in Ability
\[ \hat\rho = \gamma\delta_{A\tilde S} \frac{1 - \lambda}{1 - R_{A\tilde S}^2} + \rho. \]
If ability bias is upwards ($\delta_{A\tilde S} > 0$) then the returns to schooling will be biased up, but by less than in the short regression. Controlling for a mismeasured ability is better than controlling for nothing!
Numbers on the Griliches Example
Pick some numbers for the regression
\[ y_i = 0.15 S_i + 0.01 A_i + \varepsilon_i \]
and set
\[ \lambda = 0.9 \]

Reverse Causality
\[ y_1 = \beta_0 + \beta_1 y_2 + \mu \quad (3) \]
\[ y_2 = \alpha_0 + \alpha_1 y_1 + \alpha_2 x + \nu \quad (4) \]
Limitations of OLS
Estimate eq. (3) by OLS while forgetting about eq. (4):
\[ \hat\beta_1 = \frac{cov(y_2, \beta_0 + \beta_1 y_2 + \mu)}{var(y_2)} \]
\[ \hat\beta_1 = \beta_1 + \frac{cov(y_2, \mu)}{var(y_2)} \]
...
\[ E(\hat\beta_1) = \beta_1 + \frac{\alpha_1 \sigma_\mu^2 / (1 - \alpha_1\beta_1)}{var(y_2)} \]
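A simulation sketch of this simultaneity bias (coefficients chosen arbitrarily): solving eqs. (3)-(4) for $y_2$ makes $cov(y_2, \mu) \neq 0$, so OLS on eq. (3) overshoots $\beta_1$ here:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100_000
b0, b1, a0, a1, a2 = 0.0, 0.5, 0.0, 0.4, 1.0
x = rng.normal(size=n)
mu = rng.normal(size=n)
nu = rng.normal(size=n)

# reduced form: substitute eq. (3) into eq. (4) and solve for y2
y2 = (a0 + a1 * b0 + a2 * x + a1 * mu + nu) / (1 - a1 * b1)
y1 = b0 + b1 * y2 + mu

b1_ols = np.cov(y1, y2)[0, 1] / y2.var(ddof=1)
print(b1_ols)                                      # biased above the true 0.5
```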
Other Issues
- Outliers? Weighting?
- Nonlinear models
- We will cover many other problems!
Statistical Tools
- OLS
- Limitations of OLS
- Heteroscedasticity
- Panel Data
- Discrete Choice Models
Heteroscedasticity
Heteroscedasticity (from Ancient Greek: "different dispersion")
- The variance of the error terms differs across observations
Example II

Example III

Check for Heteroscedasticity
Heteroscedasticity: Serial Correlation
\[ \varepsilon_t = \rho\varepsilon_{t-1} + \mu_t \]
Heteroscedasticity: Serial Correlation
\[ \varepsilon_t = \rho\varepsilon_{t-1} + \mu_t \]
\[ \varepsilon_t = \rho(\rho\varepsilon_{t-2} + \mu_{t-1}) + \mu_t \]
\[ \varepsilon_t = \rho^2\varepsilon_{t-2} + \rho\mu_{t-1} + \mu_t \]
\[ \varepsilon_t = \rho^2(\rho\varepsilon_{t-3} + \mu_{t-2}) + \rho\mu_{t-1} + \mu_t \]
\[ \cdots \]
\[ \varepsilon_t = \rho^r \varepsilon_{t-r} + \sum_{i=0}^{r-1} \rho^i \mu_{t-i} \]
If $|\rho| < 1$, then as $r \to \infty$, $\rho^r \to 0$, so
\[ \varepsilon_t = \sum_{i=0}^{\infty} \rho^i \mu_{t-i} \]
\[ var(\varepsilon_t) = E(\varepsilon_t^2) = var\left( \sum_{i=0}^{\infty} \rho^i \mu_{t-i} \right) = \frac{\sigma_\mu^2}{1 - \rho^2} \]

Transforming the model so that the errors are spherical gives
\[ Y^* = X^*\beta + \varepsilon^* \rightarrow \tilde\beta \]
Heteroscedasticity: Serial Correlation
\[ E(\varepsilon^* \varepsilon^{*\prime}) = E(\Omega^{-1/2} \varepsilon\varepsilon' \Omega^{-1/2}) \]
\[ E(\varepsilon^* \varepsilon^{*\prime}) = \Omega^{-1/2} E(\varepsilon\varepsilon') \Omega^{-1/2} \]
\[ E(\varepsilon^* \varepsilon^{*\prime}) = \Omega^{-1/2} \Omega \Omega^{-1/2} \]
\[ E(\varepsilon^* \varepsilon^{*\prime}) = Id \]
Heteroscedasticity: Serial Correlation
In practice:
\[ \tilde\beta = (X^{*\prime} X^*)^{-1} X^{*\prime} Y^* \]
\[ \tilde\beta = (X'\Omega^{-1/2}\Omega^{-1/2}X)^{-1}(X'\Omega^{-1/2}\Omega^{-1/2}Y) \]
\[ \tilde\beta = (X'\Omega^{-1}X)^{-1}(X'\Omega^{-1}Y) \]
and:
\[ var(\tilde\beta) = \sigma^2 (X^{*\prime}X^*)^{-1} \]
\[ var(\tilde\beta) = \sigma^2 (X'\Omega^{-1}X)^{-1} \]
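A minimal GLS sketch, assuming $\Omega$ is known and diagonal (pure heteroscedasticity); the helper name `gls` and the simulated design are ours:

```python
import numpy as np

def gls(X, y, omega):
    """GLS: beta_tilde = (X' Omega^{-1} X)^{-1} X' Omega^{-1} Y,
    with omega the (n, n) error covariance matrix, assumed known."""
    omega_inv = np.linalg.inv(omega)
    return np.linalg.solve(X.T @ omega_inv @ X, X.T @ omega_inv @ y)

rng = np.random.default_rng(6)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])
sig2 = np.exp(X[:, 1])                             # error variance depends on x
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n) * np.sqrt(sig2)
print(gls(X, y, np.diag(sig2)))                    # close to (1.0, 2.0)
```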
Heteroscedasticity: Serial Correlation
White (1980):
\[ \Omega = \begin{pmatrix} \sigma_1^2 & 0 & \cdots & 0 \\ 0 & \sigma_2^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_T^2 \end{pmatrix} \]
and
\[ \frac{X'\Omega X}{T} = \frac{1}{T} \sum_t \varepsilon_t^2 x_t x_t' \]
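A sketch of the resulting heteroscedasticity-robust (HC0) standard errors, with no small-sample correction; the helper name `white_se` is ours:

```python
import numpy as np

def white_se(X, y):
    """White (1980) HC0 standard errors:
    (X'X)^{-1} (sum_t e_t^2 x_t x_t') (X'X)^{-1}."""
    XtX_inv = np.linalg.inv(X.T @ X)
    e = y - X @ (XtX_inv @ (X.T @ y))              # OLS residuals
    meat = X.T @ (X * e[:, None] ** 2)             # sum of e_t^2 x_t x_t'
    V = XtX_inv @ meat @ XtX_inv
    return np.sqrt(np.diag(V))
```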
Heteroscedasticity: Serial Correlation
White (1980):
Heteroscedasticity: Serial Correlation

Spatial Autocorrelation
Spatial Autocorrelation
$Z$: an $n \times p$ matrix of indicators (0 or 1), where entry $(i, j)$ is 1 if individual $i$ belongs to group $j$:
\[ Z = \begin{pmatrix} 1 & 0 & 0 & \cdots & 0 \\ 1 & 0 & 0 & \cdots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & 0 & 0 & \cdots & 0 \\ 0 & 1 & 0 & \cdots & 0 \\ 0 & 1 & 0 & \cdots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & & \vdots \end{pmatrix} \]
Spatial Autocorrelation
\[ ZZ' = \begin{pmatrix} 1 & \cdots & 1 & 0 & \cdots & 0 & 0 & \cdots & 0 \\ \vdots & & \vdots & \vdots & & \vdots & \vdots & & \vdots \\ 1 & \cdots & 1 & 0 & \cdots & 0 & 0 & \cdots & 0 \\ 0 & \cdots & 0 & 1 & \cdots & 1 & 0 & \cdots & 0 \\ \vdots & & \vdots & \vdots & & \vdots & \vdots & & \vdots \\ 0 & \cdots & 0 & 1 & \cdots & 1 & 0 & \cdots & 0 \\ 0 & \cdots & 0 & 0 & \cdots & 0 & 1 & \cdots & 1 \\ \vdots & & \vdots & \vdots & & \vdots & \vdots & & \vdots \\ 0 & \cdots & 0 & 0 & \cdots & 0 & 1 & \cdots & 1 \end{pmatrix} \]
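A sketch of the corresponding cluster-robust covariance, summing $x_i \hat\varepsilon_i$ within each group as the block structure of $ZZ'$ suggests; no small-sample correction is applied, and the helper name `cluster_se` is ours:

```python
import numpy as np

def cluster_se(X, y, groups):
    """Cluster-robust SEs: errors may correlate freely within groups."""
    XtX_inv = np.linalg.inv(X.T @ X)
    e = y - X @ (XtX_inv @ (X.T @ y))              # OLS residuals
    meat = np.zeros((X.shape[1], X.shape[1]))
    for g in np.unique(groups):
        Xg, eg = X[groups == g], e[groups == g]
        s = Xg.T @ eg                              # sum of x_i e_i within group g
        meat += np.outer(s, s)
    V = XtX_inv @ meat @ XtX_inv
    return np.sqrt(np.diag(V))
```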
Heteroscedasticity: Summary
Statistical Tools
- OLS
- Limitations of OLS
- Heteroscedasticity
- Panel Data
- Discrete Choice Models
Panel Data
(1) Balance
- A balanced panel contains every unit observed in every time period
- Is it unbalanced for exogenous reasons? Selection (e.g. attrition/migration)?
Panel Data
(3) Estimation
- Fixed effects: $\alpha_i$ parameters to estimate
Panel Data
Fixed effects:
\[ Y_{it} = X_{it}\beta + \alpha_i + \nu_{it} \]
\[ Y_{it} = X_{it}\beta + \sum_{k=1}^{N} \alpha_k d_{ki} + \nu_{it} \]
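A sketch of the within (demeaning) estimator, which is numerically equivalent to including the $N$ dummies $d_{ki}$; the helper name `within_fe` and the pandas data layout are our assumptions:

```python
import numpy as np
import pandas as pd

def within_fe(df, y_col, x_cols, id_col):
    """Fixed effects via the within transformation: demeaning Y and X
    by individual wipes out alpha_i."""
    g = df.groupby(id_col)
    y = df[y_col] - g[y_col].transform("mean")
    X = df[x_cols] - g[x_cols].transform("mean")
    beta, *_ = np.linalg.lstsq(X.to_numpy(), y.to_numpy(), rcond=None)
    return beta
```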
Panel Data
Random effects:
- In practice, need FGLS!
Statistical Tools
- OLS
- Limitations of OLS
- Heteroscedasticity
- Panel Data
- Discrete Choice Models
Discrete Choice Models
Linear Probability Model
Issues:
- Nonconforming predicted probabilities: the LPM can predict probabilities outside the range [0, 1] (see the sketch below)
- Heteroscedastic by construction
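A sketch contrasting the two models on simulated binary data (the design is ours), using statsmodels' OLS and Logit: the LPM's fitted values can leave [0, 1], while the logit's cannot:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n = 2000
x = rng.normal(size=n)
p = 1 / (1 + np.exp(-(0.5 + 2.0 * x)))             # true logit probabilities
d = rng.binomial(1, p)                             # binary outcome

X = sm.add_constant(x)
lpm_fit = sm.OLS(d, X).fit()
logit_fit = sm.Logit(d, X).fit(disp=0)
print(lpm_fit.predict(X).min(), lpm_fit.predict(X).max())    # can leave [0, 1]
print(logit_fit.predict(X).min(), logit_fit.predict(X).max())  # always in (0, 1)
```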
Probit and Logit
Multinomial Logit

Ordered Response Model