401 Endterm 2020
(a) Show that the 2-Stage Least Squares (2SLS) estimator of $\delta_1$ in this first equation is given
by $\hat{\delta}_{2SLS} = (Z_1' P Z_1)^{-1} (Z_1' P Y_1)$, where $P = X(X'X)^{-1}X'$.
(b) Suppose the first equation is just identified. What does this imply about the dimension
of $X_1^*$? Use this to show that in this case, the instrumental variable (IV) estimator
(with instruments W = X) is the same as the 2SLS estimator derived above. State any
assumptions you invoke.
4 + 4 marks
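A minimal numerical sketch of the two estimators named in this question, offered as an illustration rather than a proof. The simulated data, the variable names (`x1`, `x2`, `y2`, `delta_true`) and the data-generating process are all assumptions introduced here. The design is just identified by construction: $X$ and $Z_1$ both have three columns.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Hypothetical data (not from the exam): x1 is an included exogenous variable,
# x2 is an excluded exogenous variable, y2 is an endogenous regressor.
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
u = rng.normal(size=n)                                   # structural error
y2 = 1.0 + 0.5 * x1 - 1.0 * x2 + 0.8 * u + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2])                # all exogenous variables
Z1 = np.column_stack([np.ones(n), x1, y2])               # regressors of the first equation
delta_true = np.array([1.0, 2.0, -0.5])                  # assumed for the simulation
Y1 = Z1 @ delta_true + u

# Projection matrix P = X (X'X)^{-1} X'
P = X @ np.linalg.solve(X.T @ X, X.T)

# Closed form from part (a): (Z1' P Z1)^{-1} (Z1' P Y1)
delta_2sls = np.linalg.solve(Z1.T @ P @ Z1, Z1.T @ P @ Y1)

# Literal two stages: project Z1 on X, then run OLS of Y1 on the fitted values
Z1_hat = P @ Z1
delta_two_stage = np.linalg.lstsq(Z1_hat, Y1, rcond=None)[0]

# IV estimator with instruments W = X, defined because X'Z1 is square here
delta_iv = np.linalg.solve(X.T @ Z1, X.T @ Y1)

print(delta_2sls, delta_two_stage, delta_iv)
```

In this just-identified design the three vectors agree up to floating-point error, which is the equality part (b) asks to be shown algebraically.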
3. In the context of instrumental variables, with the help of an example, explain the difference,
if any, between an exclusion restriction and the exogeneity assumption.
Max words: 125 5 marks
4. State in words one key identifying assumption needed to identify the impact of an intervention using a
(a) DiD approach with group A (without intervention) and group B (with intervention).
(b) Triple difference approach with the additional comparison involving group C.
Max words: 50 + 100 2 + 4 marks
5. Consider the regression model $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_1^2 + \beta_3 X_2 + e$. But $X_1$ is endogenous.
There are three instruments $Z_1$, $Z_2$ and $Z_3$, each of which is exogenous and relevant. The
researcher therefore decides to first regress
$X_1$ on $X_2$, $Z_1$, $Z_2$ and $Z_3$ using OLS and obtains $\hat{X}_1$.
She next regresses using OLS in the second stage $Y$ on $\hat{X}_1$, $\hat{X}_1^2$ and $X_2$.
Will this address the problem of endogeneity? Explain why or why not.
Max words: 200 8 marks
6. In estimating standard errors, the decision to cluster at the level of the village instead of
the tehsil (with, say, 30 villages per tehsil) essentially involves a tradeoff between bias and
variance. Explain.
Max words: 100 4 marks
7. Consider a simple linear regression of the natural log of wages on a gender dummy variable.
Under what circumstances would it be inappropriate to interpret the estimated slope coefficient
as the average difference (in percentage terms) between male and female wages? You
may ignore considerations of omitted variable bias and endogeneity.
Max words: 100 3 marks
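A small arithmetic sketch of one consideration relevant here: a coefficient $b$ on a dummy in a log-linear model corresponds to an exact proportional gap of $e^{b} - 1$, which the reading "100·b percent" only approximates. The helper name `exact_percent_gap` and the three example coefficient values are hypothetical, chosen for illustration only.

```python
import math

def exact_percent_gap(b):
    """Exact percentage difference implied by a log-point coefficient b."""
    return 100.0 * (math.exp(b) - 1.0)

# Hypothetical coefficient values, chosen only to show how the gap between
# the approximate reading (100 * b) and the exact one grows with |b|.
for b in (0.05, 0.30, 0.80):
    approx = 100.0 * b
    exact = exact_percent_gap(b)
    print(f"b = {b:.2f}: approximate {approx:.1f}%, exact {exact:.1f}%")
```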
8. Consider the GLS model $Y = X\beta + e$ with $E(e|X) = 0$ and $E(ee'|X) = \sigma^2 \Omega$ (so that when
$\Omega = I$, we get the homoskedastic model).
(a) Show that
$$s^2 = \frac{e'e}{n-K} = \frac{\sigma^2 \operatorname{tr}[\Omega M]}{n-K}$$
(b) Further, if $P$ and $\Omega$ are symmetric positive definite, show that
$$0 \le E[s^2] \le \frac{\operatorname{tr}[\sigma^2 \Omega]}{n-K}$$
5 + 5 marks
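A numerical sketch of the trace expression and the bound, under stated assumptions. It reads part (a) as a statement about the expectation of $s^2$ with $e'e$ taken as the OLS residual sum of squares (an interpretation suggested by part (b), not stated in the question), takes $M = I - X(X'X)^{-1}X'$, and uses a diagonal $\Omega$ and normal errors purely for convenience. All names and parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n, K, sigma2 = 50, 3, 2.0                      # assumed sizes and variance scale

X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
omega = rng.uniform(0.5, 2.0, size=n)          # diagonal of Omega (assumed diagonal)
Omega = np.diag(omega)
M = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)   # residual maker matrix

# Trace expression and the upper bound from part (b)
expected_s2 = sigma2 * np.trace(Omega @ M) / (n - K)
upper_bound = sigma2 * np.trace(Omega) / (n - K)

# Monte Carlo check: draw e with Var(e|X) = sigma2 * Omega, form residuals M e
reps = 20_000
E = rng.normal(size=(reps, n)) * np.sqrt(sigma2 * omega)
resid = E @ M                                   # M is symmetric, so each row is (M e)'
s2_draws = (resid ** 2).sum(axis=1) / (n - K)

print(expected_s2, s2_draws.mean(), upper_bound)
```

The simulated mean of $s^2$ should sit close to the trace expression and below the bound; the simulation does not replace the algebraic argument the question asks for.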
9. Consider a simple linear regression model with a single explanatory variable, $Y_i^* = \alpha + \beta X_i^* + e_i$.
Both $Y^*$ and $X^*$ are measured with error, so that $Y = Y^* + w$ and $X = X^* + v$, and a
regression of $Y$ on $X$ is run instead. What can you say about the existence and degree of bias in the
estimated slope coefficient? Carefully specify all assumptions you invoke in your derivation.
8 marks
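A simulation sketch of this setup under classical measurement-error assumptions, which are assumptions made here and not given in the question: $v$ and $w$ have mean zero and are independent of each other, of $X^*$ and of $e$. Under those assumptions the OLS slope is expected to converge to $\beta \, \sigma^2_{X^*} / (\sigma^2_{X^*} + \sigma^2_v)$. All parameter values below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000
alpha, beta = 1.0, 2.0                  # assumed true parameters
var_xstar, var_v = 4.0, 1.0             # assumed variances of X* and of the error v

x_star = rng.normal(scale=np.sqrt(var_xstar), size=n)
e = rng.normal(size=n)
y_star = alpha + beta * x_star + e

v = rng.normal(scale=np.sqrt(var_v), size=n)   # measurement error in X
w = rng.normal(size=n)                         # measurement error in Y
x = x_star + v
y = y_star + w

# OLS slope from regressing observed Y on observed X (with an intercept)
slope_hat = np.cov(x, y)[0, 1] / np.var(x, ddof=1)

# Probability limit under the classical assumptions stated above
attenuated = beta * var_xstar / (var_xstar + var_v)

print(slope_hat, attenuated, beta)
```

In this design the error in $Y$ adds noise but does not move the slope, while the error in $X$ shrinks it toward zero; different assumptions about $v$ and $w$ would give different results.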
10. Consider the simple linear regression model $Y_i = \beta X_i + e_i$ with a single explanatory variable
and no intercept. Let $p_i$ denote the $i$th diagonal element of the projection matrix $P$.
(a) Write down an expression for $p_i$.
(b) What is $\sum_i p_i$?
2 + 2 marks
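A short numerical check for this setup, assuming $P = X(X'X)^{-1}X'$ with $X$ the single regressor column. The data vector is arbitrary and hypothetical. The sketch compares the diagonal of $P$ with the candidate expression $x_i^2 / \sum_j x_j^2$ and sums the diagonal.

```python
import numpy as np

# Arbitrary illustrative regressor values (no intercept column)
x = np.array([1.0, 2.0, -3.0, 0.5])
X = x.reshape(-1, 1)

# Projection matrix P = X (X'X)^{-1} X'
P = X @ np.linalg.inv(X.T @ X) @ X.T

p_diag = np.diag(P)                     # diagonal elements p_i taken from P
p_formula = x**2 / np.sum(x**2)         # candidate closed form for p_i

print(p_diag, p_formula, p_diag.sum())
```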
11. In a multinomial logit regression of whether an individual is (a) unemployed (base category),
(b) employed or (c) not in the labour force, on three variables, (i) gender (Male = 1), (ii)
sector (Urban = 1) and (iii) an interaction between gender and sector, an analyst obtains the
following coefficients. All are statistically significant.
Variable                Parameter estimate
Employed
  Intercept                   12.1
  Male                         0.3
  Urban                        0.8
  Male × Urban                 0.2
Not in labour force
  Intercept                    8.3
  Male                        -0.7
  Urban                        0.9
  Male × Urban                -0.4
(a) Interpret the coefficient associated with Male in the Employed and Not in Labour Force
equations.
(b) How would you compute the marginal effects to determine whether being male and an urban
resident makes it more or less likely to be not in the labour force?
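A sketch of one common way to work with these estimates: convert the coefficients into predicted probabilities for each gender and sector cell and compare cells (a discrete change). The coefficients are copied from the table. The function name `predicted_probs`, the dictionary keys and the choice of rural women as the comparison group are assumptions made for illustration.

```python
import numpy as np

# Coefficient order: intercept, Male, Urban, Male x Urban.
# Unemployed is the base category, so its coefficients are all zero.
COEF = {
    "unemployed": np.zeros(4),
    "employed": np.array([12.1, 0.3, 0.8, 0.2]),
    "not_in_lf": np.array([8.3, -0.7, 0.9, -0.4]),
}

def predicted_probs(male, urban):
    """Multinomial logit probabilities for one gender and sector cell."""
    x = np.array([1.0, male, urban, male * urban])
    index = {k: float(b @ x) for k, b in COEF.items()}
    top = max(index.values())                        # subtract max for numerical stability
    expv = {k: np.exp(v - top) for k, v in index.items()}
    total = sum(expv.values())
    return {k: v / total for k, v in expv.items()}

# exp(Male coefficient): ratio of relative risks (vs unemployed) for men vs women,
# holding Urban = 0, because the interaction term switches off for rural residents.
rrr_male_rural = {k: float(np.exp(COEF[k][1])) for k in ("employed", "not_in_lf")}

# Discrete change in P(not in labour force): urban men vs rural women
p_base = predicted_probs(male=0, urban=0)
p_mu = predicted_probs(male=1, urban=1)
discrete_change = p_mu["not_in_lf"] - p_base["not_in_lf"]

print(rrr_male_rural, discrete_change)
```

Because the model includes an interaction, the sign of a single coefficient does not by itself give the sign of the effect on a probability; comparing predicted probabilities across cells, as sketched here, accounts for all three terms at once.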