EQ01 (OLS)

lwage76 c ed76 exp76 exp762 black smsa76


Interpretation of coefficients:
- For continuous variables: the model is log-lin,
so the coefficients are semi-elasticities. E.g.
d(lwage) / d(exp) = 0.084 - 2*0.0022*exp
is the semi-elasticity of wages w.r.t. experience.
- For dummy variables, the interpretation is in
percentage terms. E.g. black workers have lower
wages than white workers: the coefficient -0.189
means that the relative difference between the
average wage of black workers and the average
wage of white workers is -0.189*100 = -18.9%,
holding all the other variables (education,
experience, etc.) constant.
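
The two interpretations above can be checked with a few lines of arithmetic. This is a minimal sketch that only uses the three coefficients quoted in the notes; the function names are illustrative. It also prints exp(b) - 1, the exact relative difference implied by a dummy coefficient b in a log-lin model, which for -0.189 is about -17.2% (so b*100 is an approximation).

```python
import math

# Coefficients quoted in the notes for EQ01
B_EXP = 0.084      # coefficient on exp76
B_EXP2 = -0.0022   # coefficient on exp762 (experience squared)
B_BLACK = -0.189   # coefficient on the dummy black


def exp_semi_elasticity(years):
    """d(lwage)/d(exp) evaluated at a given level of experience."""
    return B_EXP + 2 * B_EXP2 * years


def exact_dummy_effect(coef):
    """Exact relative difference implied by a dummy coefficient in a log-lin model."""
    return math.exp(coef) - 1


for years in (0, 5, 10, 20):
    effect = 100 * exp_semi_elasticity(years)
    print(f"exp = {years:2d}: one more year changes the wage by about {effect:.2f}%")

# Experience level at which the marginal return to experience is zero
turning_point = -B_EXP / (2 * B_EXP2)
print(f"return to experience reaches zero at about {turning_point:.1f} years")

print(f"approximate effect of black: {100 * B_BLACK:.1f}%")
print(f"exact effect of black:       {100 * exact_dummy_effect(B_BLACK):.1f}%")
```

The semi-elasticity falls with experience because the coefficient on the squared term is negative.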
3 main causes of endogeneity:
1) omitted variables (unobserved heterogeneity):
correlation between an unobserved component of
the error term (e.g. ability of the worker) and
one or more regressors (e.g. schooling: I expect
a positive correlation, as more able workers get
more education);
2) measurement error (e.g. education is badly
measured by the number of years of school, since
the quality of schools may also matter);
3) dynamic models.
In all three cases OLS estimates are biased and
inconsistent: the estimated coefficients are
either too low (downward bias) or too high
(upward bias).

Solution: IV.
Find variables that are correlated with the
endogenous variables (education, experience)
but uncorrelated with the error term (e.g. ability).
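
To see why IV helps, here is a minimal simulation sketch. The data are entirely hypothetical (they are not the wage data used in these notes): ability is unobserved and raises both education and wages, and z is an invented instrument that shifts education but is unrelated to ability. Under these assumptions OLS should overstate the return to education, while the simple IV estimator (Z'X)^-1 Z'y should be close to the true value.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50_000

# Hypothetical data-generating process (assumption, for illustration only)
ability = rng.normal(size=n)      # unobserved, ends up in the error term
z = rng.normal(size=n)            # instrument: moves education, unrelated to ability
educ = 12 + 1.0 * z + 1.0 * ability + rng.normal(size=n)
true_return = 0.08
lwage = 1.0 + true_return * educ + 0.10 * ability + rng.normal(scale=0.3, size=n)

X = np.column_stack([np.ones(n), educ])   # regressors: constant, education
Z = np.column_stack([np.ones(n), z])      # instruments: constant, z

# OLS: (X'X)^-1 X'y
b_ols = np.linalg.solve(X.T @ X, X.T @ lwage)
# IV, one instrument per regressor: (Z'X)^-1 Z'y
b_iv = np.linalg.solve(Z.T @ X, Z.T @ lwage)

print(f"true return:  {true_return:.3f}")
print(f"OLS estimate: {b_ols[1]:.3f}")
print(f"IV estimate:  {b_iv[1]:.3f}")
```

In this setup the OLS bias is upward because ability is positively correlated with education.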

- We first use one IV for each endogenous
variable (exactly identified model): near
college, age, age2.
- Then we use more IVs than endogenous
variables (over-identified model): near
college, age, age2, mom education, dad
education.

2sls (EQ07)
lwage76 c ed76 exp76 exp762 black smsa76
IV list
c nearc4 age76 age762 momed daded black
smsa76 south76

gmm (EQ 06)

lwage76 c ed76 exp76 exp762 black smsa76
IV list
c nearc4 age76 age762 momed daded black
smsa76 south76

Interpretation of coefficients:
- As in OLS: the coefficients of continuous
variables are interpreted as semi-elasticities
because the model is log-lin.

- As in OLS, since the dependent variable is
log-transformed, the coefficients of dummy
variables are the relative difference in the
average dependent variable between the
included category and the excluded category,
all else equal. E.g. the coefficient on black
means that the relative difference between
black workers’ and white workers’ average
wage is -0.176*100 = -17.6%, holding all the
other variables (education, experience, etc.)
constant.

- Difference between GMM and 2SLS (GIVE and
IV methods)
o GMM: the most general method. It uses
the optimal weighting matrix W = S^-1, with
S = (1/n) Σ_i ε_i^2 z_i z_i’. This S accounts
for heteroscedasticity; a HAC version of S
also accounts for autocorrelation.
o GIVE (2SLS): the weighting matrix used
does not account for
heteroscedasticity/autocorrelation:
S_ZZ = s^2 (1/n) Z’Z. In the exactly identified
case (R = K), W is irrelevant, so we get
identical coefficients for GMM and IV.
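
The role of the weighting matrix can be illustrated with a small sketch on simulated data (hypothetical, not the EQ06/EQ07 sample). It assumes the standard linear GMM formula b(W) = (X'Z W Z'X)^-1 X'Z W Z'y, with W = (Z'Z)^-1 for 2SLS and W = S^-1 for two-step GMM, where S is built from 2SLS residuals. The function names are illustrative, not EViews commands.

```python
import numpy as np


def linear_gmm(y, X, Z, W):
    """Linear GMM estimator b = (X'Z W Z'X)^-1 X'Z W Z'y for a weighting matrix W."""
    XZ = X.T @ Z
    return np.linalg.solve(XZ @ W @ Z.T @ X, XZ @ W @ Z.T @ y)


def two_sls(y, X, Z):
    """2SLS (GIVE): GMM with W proportional to (Z'Z)^-1."""
    return linear_gmm(y, X, Z, np.linalg.inv(Z.T @ Z))


def two_step_gmm(y, X, Z):
    """Two-step GMM with S = (1/n) sum_i e_i^2 z_i z_i' from 2SLS residuals."""
    n = len(y)
    e = y - X @ two_sls(y, X, Z)
    S = (Z * (e ** 2)[:, None]).T @ Z / n
    return linear_gmm(y, X, Z, np.linalg.inv(S))


# Hypothetical simulated data with a heteroscedastic error
rng = np.random.default_rng(1)
n = 5_000
z1, z2, u = rng.normal(size=(3, n))
x = 0.8 * z1 + 0.5 * z2 + 0.6 * u + rng.normal(size=n)   # endogenous regressor
y = 1.0 + 0.5 * x + u * (1 + np.abs(z1))

X = np.column_stack([np.ones(n), x])
Z_exact = np.column_stack([np.ones(n), z1])       # R = K = 2
Z_over = np.column_stack([np.ones(n), z1, z2])    # R = 3 > K = 2

print("exactly identified:", two_sls(y, X, Z_exact), two_step_gmm(y, X, Z_exact))
print("over-identified:   ", two_sls(y, X, Z_over), two_step_gmm(y, X, Z_over))
```

With R = K the two estimators coincide up to rounding, because Z'X is square and W cancels out; with R > K the coefficients generally differ.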

Weak instruments:
If the instruments exhibit only weak correlation
with the endogenous regressor(s), the properties
of the IV estimator can be very poor (the IV
estimator is biased; its standard errors are
misleading; hypothesis tests are unreliable).

To check whether there is a weak-instrument
problem, it is useful to estimate the first stage
and to evaluate the explanatory power of the
additional instruments that are not included in
the equation of interest. Report the first stage
and think about whether it makes sense.
Report the F-statistic on the excluded
instruments. The bigger this is, the better.
F-statistics above 10 to 20 are considered
relatively safe; lower Fs put you in the danger
zone.

1st stage – OLS (EQ05)
ed76 c nearc4 age76 age762 daded momed
black smsa76 south76

F-test
H0: nearc4=0, age76=0, age762=0, momed=0,
daded=0

C(2)=0, C(3)=0, C(4)=0, C(5)=0, C(6)=0
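
The F-test on the excluded instruments can be sketched by comparing the first stage with and without those instruments. This is a minimal sketch on simulated data (hypothetical, not the sample behind EQ05), assuming the classical homoscedastic F statistic F = ((SSR_r - SSR_u)/q) / (SSR_u/(n - k)); the function names are illustrative.

```python
import numpy as np


def ssr(y, X):
    """Sum of squared OLS residuals from regressing y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)


def excluded_instruments_f(y, X_included, Z_excluded):
    """F statistic for H0: all coefficients on the excluded instruments are zero."""
    n = len(y)
    X_full = np.column_stack([X_included, Z_excluded])
    q = Z_excluded.shape[1]          # number of restrictions
    k = X_full.shape[1]              # parameters in the unrestricted first stage
    ssr_r = ssr(y, X_included)       # restricted: excluded instruments dropped
    ssr_u = ssr(y, X_full)           # unrestricted: full first stage
    return ((ssr_r - ssr_u) / q) / (ssr_u / (n - k))


# Hypothetical simulated first stage (assumption, for illustration only)
rng = np.random.default_rng(2)
n = 3_000
controls = np.column_stack([np.ones(n), rng.integers(0, 2, size=n)])  # constant + dummy
relevant_z = rng.normal(size=(n, 2))     # these instruments enter the first stage
irrelevant_z = rng.normal(size=(n, 2))   # these do not, so their F should be small
educ = 12 + 0.5 * controls[:, 1] + relevant_z @ np.array([0.4, 0.3]) + rng.normal(size=n)

f_relevant = excluded_instruments_f(educ, controls, relevant_z)
f_irrelevant = excluded_instruments_f(educ, controls, irrelevant_z)
print(f"F, relevant instruments:   {f_relevant:.1f}")
print(f"F, irrelevant instruments: {f_irrelevant:.1f}")
```

Only the instruments left out of the equation of interest are tested; the included exogenous regressors stay in both the restricted and the unrestricted first stage.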

