Chapter Four: Endogeneity, IV Regression and Simultaneous Equation Models (SEMs)
❑ Endogeneity
❑ Simultaneous Equations Models (SEMs)
3. Simultaneity
❑ A system of simultaneous equations occurs when two or more left hand
side variables are functions of each other:
y1 = α + β1x1 + γ2y2 + e
y2 = α + γ1x1 + γ2y1 + e (4.6)
- With some algebra you can rewrite these two equations in “reduced form” as
a single equation with an endogenous regressor
Equivalently, simultaneity arises when one or more of the independent variables, the Xj's,
is jointly determined with the dependent variable, Y, typically through an
equilibrium mechanism.
❖ If there is no endogeneity, it is more efficient to use OLS. If there is
endogeneity, OLS is inconsistent and so IV using 2SLS is best
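The claim above can be illustrated with a small simulation. Everything here (the data-generating process, coefficients, and variable names) is invented for illustration: x shares the shock e with y, so OLS is inconsistent while the simple IV estimator using z recovers the true slope.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
z = rng.normal(size=n)                        # instrument: drives x, unrelated to e
e = rng.normal(size=n)                        # structural error
x = 1.0 * z + 0.8 * e + rng.normal(size=n)    # Cov(x, e) != 0 -> x is endogenous
y = 1.0 + 2.0 * x + e                         # true slope on x is 2

# OLS slope: Cov(x, y) / Var(x) -- inconsistent here
b_ols = np.cov(x, y)[0, 1] / np.var(x, ddof=1)

# Simple IV slope: Cov(z, y) / Cov(z, x) -- consistent
b_iv = np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]
print(b_ols, b_iv)
```

With these numbers the OLS slope converges to about 2.3 (biased upward), while the IV slope converges to 2.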
Dealing with Endogeneity
Now let’s see each in a bit of detail
The unobserved heterogeneity problem can be handled using the following options:
1. Find additional data so that every relevant variable is included.
2. Ignore it
- Acceptable only if omitted variable is uncorrelated with all
included variables; otherwise the coefficient estimates will be biased
up or down.
3. Try to get an appropriate proxy variable
4. Use instrumental variable(s) and two-stage least squares estimation
5. Use the fixed effects method or the first differenced model (panel data)
Cont’d---
These methods can eliminate bias when E(u|X) ≠ 0
Suppose we want to estimate a model y = f(x; q), where x is the variable of
interest and q is a control.
If we cannot obtain data for or observe q, the variable q will be omitted. In a
sense, q becomes part of the error term u.
If the omitted variable q is correlated with x, then it will be correlated with u.
That is, Cov(x, u) ≠ 0 (equivalently E(u|x) ≠ 0), and hence x is endogenous.
Consider the following regression
log(wage) = β0 + β1educ + (β2abil + e) (4.7)
Since ability is not observed, we can only run the following regression
log(wage) = β0 + β1educ + u (4.8)
Since ability, which is correlated with education, is omitted, education is
endogenous (i.e., correlated with u). Thus, β̂1 will be biased.
Ability can also affect wage through education.
Cont’d---
❖ To eliminate the bias we can employ the following three options:
I. Plug in/find the proxy variable; for example IQ for ability
Suppose y is the outcome, q is the omitted variable and z is the proxy for
q. What properties should the proxy z have?
a. Proxy z should be strongly correlated with q.
b. Proxy z must be redundant (ignorable)
E (y | x, q, z) = E (y | x, q) (4.9)
c. Omitted q must be uncorrelated with the other regressors conditional on z:
Corr(q, xj | z) = 0 for each xj
❑ The last two mean roughly that q and z provide similar information
about the outcome
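A hypothetical simulation can show why a good proxy helps: omitting q biases OLS, while including a proxy p = q + noise absorbs most of the bias. All names and numbers below are invented for the sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
q = rng.normal(size=n)                               # unobserved variable
x = 0.7 * q + rng.normal(size=n)                     # regressor correlated with q
y = 1.0 + 2.0 * x + 1.5 * q + rng.normal(size=n)     # true coefficient on x is 2
p = q + 0.3 * rng.normal(size=n)                     # proxy: strongly correlated with q

def ols(X, y):
    # OLS coefficients via least squares
    return np.linalg.lstsq(X, y, rcond=None)[0]

ones = np.ones(n)
b_naive = ols(np.column_stack([ones, x]), y)[1]      # q omitted -> badly biased
b_proxy = ols(np.column_stack([ones, x, p]), y)[1]   # proxy absorbs most of q
print(b_naive, b_proxy)
```

The naive slope converges to roughly 2.7 here, while the proxy regression brings it close to the true value of 2; the residual bias shrinks as the proxy's measurement noise shrinks.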
Cont’d---
❑ What if proxy variable z is correlated with a regressor x?
➢ OLS is inconsistent, but one can hope and argue that the
inconsistency is less than if z is omitted.
❑ Consider using a lagged dependent variable as a proxy variable.
Example: If you believe that omitted variable qt strongly affects
outcome yt, then a lagged value of y (such as yt-2) is probably
correlated with qt as well.
Problem: yt-2 may be correlated with other x’s as well, leading to
inconsistency
❑ Consider using multiple proxy variables for a single omitted
variable
- Simply put all proxy variables that meet the requirements for proxies into
the equation.
Cont’d---
❑ What if omitted variable q interacts with a regressor x?
y = a + b1x + b2q + b3qx + e (4.10)
∂y/∂x = b1+ b3q
➢ marginal effect of x on y involves q, which is unobserved
If we can find a good proxy, we can estimate the model using OLS
Whenever there is no appropriate proxy variable, consistent estimation of the
parameters β0 and β1 requires an instrumental variable.
Difference between IV and Proxy?
- With IV we will leave the unobserved variable in the error term but use an estimation
method that recognizes the presence of the omitted variable
- With a proxy we try to remove the unobserved variable from the error term, e.g.
IQ. IQ would make a poor instrument because it is correlated with the error in our
model (ability in u)
- Need something correlated with education but uncorrelated with ability (parents
education?)
Cont’d---
II. Instrumental variable approach
❖ For an instrumental variable (an instrument) Z to be valid, it must satisfy two
conditions:
Instrument exogeneity or Validity Condition: Cov(Zi , ui )= 0
◼ All the instruments are uncorrelated with the error term:
◼ Instrument z should have no partial effect on y and must be uncorrelated with the
error term (z is exogenous).
◼ Must be uncorrelated with potential unobserved determinants of the y variable
Instrument relevance or Informativeness Condition: Cov(Zi , Xi )≠0
✓ Instrument (z) must be correlated with the troublesome/endogenous variable (x)
✓ This condition can be tested given the random sample from the population.
❑ The second requirement above is also called the identification requirement.
When it does not hold, the parameters are not identified, which means that we
cannot calculate unique values for them.
Cont’d---
❖ Thus, if both of the above conditions are satisfied, we call z an instrumental
variable
There are two ways to intuitively understand these conditions.
1. Instrumental variable is a variable that is not correlated with the
omitted variable, but is correlated with the endogenous explanatory
variable.
✓ Instrument would have to be correlated with education level of a
person, but not their ability level
2. Instrumental variable is a variable that affects y only through x.
The condition Cov(z,u)=0 involves unobserved u. Therefore, we cannot
test this condition. But, when you have extra instrumental variables, you
can test this. This will be discussed later.
The condition Cov(z,x) ≠ 0 is easy to test (a test for instrument relevance).
Testing Overidentifying Restrictions: The LM or
J-statistic test (Test for exogeneity) - for condition 1
If there is just one instrument for our endogenous variable (just
identified), we can’t test whether the instrument is uncorrelated with
the error
If we have multiple instruments, it is possible to test the
overidentifying restrictions (a partial test of instrument exogeneity) - to see
if some of the instruments are correlated with the error
Estimate the structural model using IV (2SLS) and
obtain the residuals û
Regress the residuals on all the exogenous variables and obtain the R2
to form nR2
Under the null that all instruments are uncorrelated with the error,
LM ~ χ2q where q is the number of extra instruments (m − k df)
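The steps above can be sketched by hand; the data-generating process and all names are invented, with two valid instruments (z1, z2) for one endogenous regressor, so q = 1:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000
z1, z2 = rng.normal(size=(2, n))
e = rng.normal(size=n)
x = z1 + 0.5 * z2 + 0.8 * e + rng.normal(size=n)   # endogenous regressor
y = 1.0 + 2.0 * x + e

# 2SLS by hand: first-stage fitted values, then second-stage OLS
Z = np.column_stack([np.ones(n), z1, z2])
x_hat = Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
b = np.linalg.lstsq(np.column_stack([np.ones(n), x_hat]), y, rcond=None)[0]

# IV residuals are formed with the ORIGINAL x, not x_hat
u = y - b[0] - b[1] * x

# Regress residuals on all exogenous variables and form LM = n * R^2
g = np.linalg.lstsq(Z, u, rcond=None)[0]
r2 = 1.0 - np.sum((u - Z @ g) ** 2) / np.sum((u - u.mean()) ** 2)
LM = n * r2   # under H0, ~ chi-squared with q = 2 - 1 = 1 df
print(LM)
```

Since both instruments really are exogenous here, LM should be a small draw from a χ²₁ distribution; a large value would have suggested that some instrument is correlated with the error.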
Cont’d---
Just run the OLS:
x=π0+π1z+v (4.11)
❑ Then test H0: π1 = 0.
❑ Thus, we should be able to reject H0 in favor of HA: π1 ≠ 0
We need an instrument z for education which is:
1. not correlated with ability (and other omitted variables)
2. correlated with education.
IQ cannot be a good instrument for education because it is correlated
with ability.
Mother's education could satisfy the second condition but could be
correlated with child's ability.
Number of siblings while growing up might be an instrument.
Measuring the strength of instruments in practice:
The first-stage F-statistic (Test for relevance) - for
condition 2
The first stage regression:
Regress X on Z1 , … , Zm , W1 , … , Wr .
The instruments are totally irrelevant if and only if all the coefficients on Z1 , … , Zm are
zero.
The first-stage F-statistic tests the hypothesis that Z1 , … , Zm do not
enter the first stage regression.
If instruments are weak, then the TSLS estimator is biased,
statistical inferences (standard errors, hypothesis tests, confidence
intervals) can be misleading, and the t-statistic has a non-normal
distribution
Weak instruments imply a small first stage F-statistic.
Compute the first-stage F-statistic.
Cont’d---
Rule-of-thumb:
- If F > 10, instruments are strong - use TSLS.
- But if the first stage F-statistic is less than 10, then the set of
instruments is weak (take some action).
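The rule of thumb above can be checked with a hand-rolled first-stage F-statistic; the data and all names here are simulated for illustration, with one fairly strong excluded instrument:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1_000
z = rng.normal(size=n)
x = 0.5 * z + rng.normal(size=n)     # first stage: z explains a real share of x

# Unrestricted first stage: x on [1, z]; restricted: x on a constant only
Zu = np.column_stack([np.ones(n), z])
ssr_u = np.sum((x - Zu @ np.linalg.lstsq(Zu, x, rcond=None)[0]) ** 2)
ssr_r = np.sum((x - x.mean()) ** 2)

m, k = 1, 2                          # 1 excluded instrument, 2 first-stage parameters
F = ((ssr_r - ssr_u) / m) / (ssr_u / (n - k))
print(F)                             # rule of thumb: F > 10 -> instruments are strong
```

With this design the first-stage R² is around 0.2, so F lands far above the threshold of 10; shrinking the 0.5 coefficient toward zero would push F below 10 and make z a weak instrument.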
What to do if you have weak instruments?
- Get better instruments
- If you have many instruments, some are probably weaker than
others, then it’s a good idea to drop the weaker ones (dropping an
irrelevant instrument will increase the first-stage F).
The stronger the correlation between the instruments and the explanators
(strong instrument), the more efficient IV is.
If the correlation between Z and X is too low, then Z is a weak instrument,
and 2SLS is not a helpful procedure
The IV Estimation: One explanatory variable (X)-
one instrument (Z) case
Two Stage Least Squares (TSLS)
As it sounds, TSLS has two stages - two regressions:
(1) First isolate the part of X that is uncorrelated with u:
Regress X on Z using OLS:
X = π0 + π1Z + v (4.12)
Because Zi is uncorrelated with ui , π0 + π1Zi is
uncorrelated with ui . We don’t know π0 or π1 but we have
estimated them.
Compute the predicted values of Xi : X̂i = π̂0 + π̂1Zi , i = 1, ... , n.
Cont’d---
(2) Replace Xi by X̂i in the regression of interest:
regress Y on X̂ using OLS:
Yi = β0 + β1X̂i + ui (4.13)
Because X̂ is uncorrelated with ui in large samples, the first least squares
assumption holds.
If the instruments are correlated with the error term, the first stage
of TSLS doesn’t successfully isolate a component of X that is uncorrelated
with the error term, so X̂ is correlated with u and TSLS is inconsistent.
Thus β1 can be estimated by OLS using regression (2).
This argument relies on large samples (so π0 and π1 are well estimated
using regression (1)).
The resulting estimator is called the Two Stage Least Squares (TSLS)
estimator, β̂1TSLS.
Cont’d---
Suppose you have a valid instrument, Zi .
Stage 1: Regress Xi on Zi , obtain the predicted values X̂i .
Stage 2: Regress Yi on X̂i ; the coefficient on X̂i is the TSLS
estimator, β̂1TSLS.
Then β̂1TSLS is a consistent estimator of β1.
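The two stages can be written out directly; the simulated data and all names below are invented. With a single instrument, the resulting slope is algebraically identical to the simple IV ratio Cov(z,y)/Cov(z,x):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
z = rng.normal(size=n)
e = rng.normal(size=n)
x = 0.8 * z + 0.6 * e + rng.normal(size=n)   # endogenous: Cov(x, e) != 0
y = 1.0 + 2.0 * x + e                        # true slope is 2

# Stage 1: regress X on Z, form the fitted values X-hat
Z1 = np.column_stack([np.ones(n), z])
pi_hat = np.linalg.lstsq(Z1, x, rcond=None)[0]
x_hat = Z1 @ pi_hat

# Stage 2: regress Y on X-hat; the slope is the TSLS estimator
X2 = np.column_stack([np.ones(n), x_hat])
b_tsls = np.linalg.lstsq(X2, y, rcond=None)[0][1]

# With one instrument, TSLS equals the simple IV ratio exactly
b_iv = np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]
print(b_tsls, b_iv)
```

Both numbers agree to machine precision and sit close to the true slope of 2, while a plain OLS regression of y on x would not.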
Now, consider: y = β0 + β1x + u
Then we have: Cov(z,y) = Cov(z, β0 + β1x + u)
So we have: Cov(z,y) = β1Cov(z,x) + Cov(z,u)
Since Cov(z,u) = 0 and Cov(z,x) ≠ 0, we have
β1 = Cov(z,y) / Cov(z,x) (4.14)
β1 is thus identified in the sense that it is expressed in terms of population
moments (covariances) that can be estimated using sample data.
Cont’d---
By replacing the population covariances Cov(z,y) and Cov(z,x) with their
sample covariances, we have the instrumental variable estimator of β1, which
is given by
β̂1,IV = [Σi (zi − z̄)(yi − ȳ)] / [Σi (zi − z̄)(xi − x̄)] (4.15)
The IV estimator of β0 is obtained as usual using the means of Y and X.
You can easily show that β̂1,IV is a consistent estimator of β1.
Consistency of the TSLS estimator
In matrix notation, with x̂ the first-stage fitted values:
β̂2SLS = (x̂′x̂)⁻¹ x̂′y
and, in the single-instrument case with variables in deviations from means,
β̂IV = (z′x)⁻¹ z′y = Cov(z,y) / Cov(z,x)
Cont’d---
❖ Approach 3: Difference it out
❑ Suppose that the endogeneity is fixed over time, such as
measurement error or an omitted variable. Further, suppose that we
observe data in two time periods.
❑ A difference-in-difference (DD) model can be used: subtract values
at time 1 (“before”) from values at time 2 (“after”) and the
endogenous variable will drop out.
Limitations:
- DD models will not eliminate selection bias.
- DD models only eliminate fixed variables; sometimes endogenous
variables change values over time
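The differencing idea can be sketched with simulated two-period data (all names and coefficients invented): a time-fixed unobservable a is correlated with x, so pooled OLS in levels is biased, while differencing removes a entirely.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5_000
a = rng.normal(size=n)                        # time-fixed unobservable, correlated with x
x1 = 0.8 * a + rng.normal(size=n)             # period-1 regressor
x2 = 0.8 * a + 1.0 + rng.normal(size=n)       # period-2 regressor
y1 = 2.0 * x1 + a + rng.normal(size=n)        # true slope is 2
y2 = 2.0 * x2 + a + rng.normal(size=n)

# Differencing removes the time-fixed a
dy, dx = y2 - y1, x2 - x1
b_fd = np.cov(dx, dy)[0, 1] / np.var(dx, ddof=1)

# Pooled OLS in levels is biased because Cov(x, a) != 0
xs, ys = np.concatenate([x1, x2]), np.concatenate([y1, y2])
b_pool = np.cov(xs, ys)[0, 1] / np.var(xs, ddof=1)
print(b_fd, b_pool)
```

The differenced slope lands near the true value of 2; the pooled slope is pushed well above it by the omitted a. If the omitted variable changed between periods, differencing would no longer remove it, which is the limitation noted above.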
Statistical inference with IV:
Homoskedasticity case
Homoskedasticity assumption in the case of IV regression is stated in
terms of z: E(u2|z)=σ2
It can be shown that the asymptotic variance of β̂1 is given by:
Var(β̂1) = σ² / (n σx² ρ²x,z) (4.17)
where σx² is the variance of x, and ρx,z is the correlation between x and z.
Now, the estimator of Var(β̂1) is obtained by replacing σ², σx², and ρx,z with
their sample estimates.
The sample estimator of σ² is obtained in the following way. First, obtain the IV
estimates of β0 and β1, then compute
ûi = yi − β̂0 − β̂1xi (4.18)
The estimator for σ² is then computed as
σ̂² = (1/(n − 2)) Σi ûi² (4.19)
Cont’d---
The sample estimator of σx² is given by:
σ̂x² = (1/n) Σi (xi − x̄)² = SSTx / n (4.20)
Finally, the sample estimator of ρ²x,z can be obtained most easily in the following way.
First, regress x on z. The R-squared from this regression equals the square of
the sample correlation, R²x,z.
Then, the estimator for the variance of β̂1 is given by:
var̂(β̂1) = σ̂² / (SSTx R²x,z) (4.21)
You can show that this is a consistent estimator of the asymptotic variance in (4.17).
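The estimators in (4.18)-(4.21) can be assembled by hand; this sketch simulates data (all names and coefficients invented) and computes the IV slope together with its homoskedastic standard error:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 10_000
z = rng.normal(size=n)
e = rng.normal(size=n)
x = 0.6 * z + 0.5 * e + rng.normal(size=n)   # endogenous regressor
y = 1.0 + 2.0 * x + e                        # true slope is 2

# IV estimates of the slope and intercept
b1 = np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]
b0 = y.mean() - b1 * x.mean()

u = y - b0 - b1 * x                          # IV residuals, eq. (4.18)
sigma2 = np.sum(u ** 2) / (n - 2)            # sigma-hat squared, eq. (4.19)
sst_x = np.sum((x - x.mean()) ** 2)          # n * sample variance of x, eq. (4.20)
r2_xz = np.corrcoef(x, z)[0, 1] ** 2         # R^2 from regressing x on z
var_b1 = sigma2 / (sst_x * r2_xz)            # eq. (4.21)
se_b1 = np.sqrt(var_b1)
print(b1, se_b1)
```

Note that var̂(β̂1) blows up as R²x,z → 0, which is the algebraic face of the weak-instrument problem discussed earlier.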
The R-squared for IV regression is computed as
R2=1-SSR/SST (4.22)
Where SSR is the sum of the squared IV residuals. (The IV residual is given by (4.18)).
Unlike in the case of OLS, SSR can be greater than SST. Thus, R2 can be negative.
In IV regression, R2 does not have a natural interpretation.
IV Estimation of the multiple regression model
IV estimation can be extended to the multiple regression case. Call the
model we are interested in estimating the structural model
Our problem is that one or more of the variables are endogenous.
To estimate a multiple regression consistently, we need at least one
instrumental variable for each troublesome explanatory/endogenous
variable
Identification
In general, a parameter is said to be identified if different values of the
parameter would produce different distributions of the data.
In IV regression, whether the coefficients are identified depends on the
relation between the number of instruments (m) and the number of
endogenous regressors (k).
Intuitively, if there are fewer instruments than endogenous regressors, we
can’t estimate the coefficients. For example, if k = 1 but m = 0 (no instruments),
there is no instrument with which to identify the slope.
Cont’d---
Three possible cases for the coefficients to be identified are:
✓ When we do not have enough instruments, the equation is underidentified
(m < k). In this case we do not have a consistent estimator
✓ When we have just enough instruments for consistent estimation, we say the
regression equation is exactly identified (m = k). Then we simply use
Instrumental Variables Least Squares
✓ When we have more than enough instruments, the regression equation is
overidentified (m > k). Two-Stage Least Squares (2SLS or TSLS) will be
employed for the reduced form equations
The necessary condition for identification is called the order condition.
- There should be at least as many excluded exogenous explanatory variables
as there are included endogenous explanatory variables in the structural
equation.
❑ The sufficient condition for identification is the rank condition
Cont’d---
Write the structural model as:
y1 = β0 + β1y2 + β2z1 + u1 (4.23)
where y2 is endogenous and z1 is exogenous.
Let z2 be the instrument, so Cov(z2,u1) = 0 and
y2 = π0 + π1z1 + π2z2 + v2 (4.24)
where π2 ≠ 0.
This reduced form equation regresses the endogenous variable on all
exogenous ones
Two Stage Least Squares (2SLS)
It’s possible to have multiple instruments
Consider our original structural model, and let
y2 = π0 + π1z1 + π2z2 + π3z3 + v2 (4.25)
Cont’d---
STAGE TWO:
◼ Substitute the fitted values Ŷ from the reduced forms for the Ys that appear on the right
side (only) of the structural equations, and then estimate these revised
structural equations with OLS
Cont’d---
❑ That is, estimate (using OLS) the structural equations of (4.6) with the reduced-form
fitted values substituted for the right-hand-side endogenous variables:
y1 = α + β1x1 + γ2ŷ2 + e (4.39)
y2 = α + γ1x1 + γ2ŷ1 + e (4.40)
❑ Hence, the simultaneity causality bias can be eliminated using IV-2SLS estimation
approach
Note that:
✓ 2SLS estimates are still biased in small samples, but consistent in large samples
(get closer to true βs as N increases)
✓ If the fit of the reduced-form equation is poor, then 2SLS will not rid the equation of
bias even in a large sample
✓ 2SLS estimates have increased variances and standard errors relative to OLS
✓ Two-Stage Least Squares cannot be applied to an equation unless that equation is
identified. Thus, identification is a precondition for the application of 2SLS to
equations in simultaneous systems
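As a closing illustration, the whole chapter can be checked numerically on a small simultaneous system; the two equations, their coefficients, and all names below are invented for the sketch. OLS on the first equation suffers simultaneity bias, while 2SLS (with the excluded exogenous x2 instrumenting y2) recovers the structural coefficient:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 50_000
x1, x2 = rng.normal(size=(2, n))
e1, e2 = rng.normal(size=(2, n))

# Structural system (true coefficients):
#   y1 = 1 + 0.5*x1 + 0.8*y2 + e1
#   y2 = 2 + 0.7*x2 + 0.3*y1 + e2
# Solved jointly for its reduced form:
y1 = (1 + 0.8 * 2 + 0.5 * x1 + 0.8 * 0.7 * x2 + e1 + 0.8 * e2) / (1 - 0.8 * 0.3)
y2 = 2 + 0.7 * x2 + 0.3 * y1 + e2

# 2SLS for the first equation; x2 is the excluded instrument for y2
Z = np.column_stack([np.ones(n), x1, x2])
y2_hat = Z @ np.linalg.lstsq(Z, y2, rcond=None)[0]
X2s = np.column_stack([np.ones(n), x1, y2_hat])
b_2sls = np.linalg.lstsq(X2s, y1, rcond=None)[0]

# Naive OLS uses y2 directly and is simultaneity-biased
Xols = np.column_stack([np.ones(n), x1, y2])
b_ols = np.linalg.lstsq(Xols, y1, rcond=None)[0]
print(b_2sls[2], b_ols[2])   # true coefficient on y2 is 0.8
```

The equation is exactly identified (one excluded instrument for one endogenous regressor), so 2SLS applies; OLS overstates the coefficient on y2 because y2 contains y1's error e1 through the feedback loop.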