
Undergraduate Econometrics:

First Midterm

Instructions.

• You have 75 minutes to complete this test.

• The exam is worth 75 points.

• Most questions can be answered independently of one another. For questions that build on earlier parts, partial credit is always available.

• Tables of critical values can be found at the end of the exam.

• To obtain full credit, make sure to explain your answers. Please write out numerical answers
in decimal form. In the calculations please keep at least three significant figures.

• The test is closed-book. You may use your “cheat-sheet” and a calculator during the exam.
You may not use any printed material except this exam. Using any other electronic devices
(such as cell-phones, smartphones, tablets, laptops, etc.) is not allowed.

1. (15 points) A policy maker wishes to study whether giving nurses prescribing power is a good policy for lowering healthcare costs. She brings you the following data. Let $T_i$ be a 0/1 variable for whether state i allows nurses to prescribe medicine, and $H_i$ be a variable for the average health of a state's residents. Denote by $H_i^1$ the health of a state when $T = 1$ and by $H_i^0$ the baseline health of a state, when $T = 0$.

a) (5) Write down a linear regression model that relates $H_i$ and $T_i$. Explain what the error term would be in this model.

Answer.

$$\begin{aligned} H_i &= H_i^0 + (H_i^1 - H_i^0)T_i \\ &= \mu_{H^0} + \beta T_i + U_i \end{aligned}$$

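Here $U_i = H_i^0 - \mu_{H^0}$ is the deviation of a state's baseline health from the mean (treating the effect $\beta = H_i^1 - H_i^0$ as the same for every state). It reflects all factors that determine a state's baseline level of health, such as wealth and the composition of its residents.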

b) (5) An economist runs a regression of $H_i$ on $T_i$. Why is this regression likely to be biased, and in which direction?

Answer. Many examples fit here. The key to answering this question was to name an omitted variable Z of your choice and to recognize the two ingredients that make it a source of omitted variable bias (OVB):

1. Z must be a determinant of health, so that it is part of (and correlated with) the error U

2. Z must be correlated with the regressor T

You also had to recognize that the sign of the bias is the sign of the effect of Z on H times the sign of the correlation between Z and T.

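A good example would be wealth: perhaps only poor states have to resort to the policy of allowing nurses to prescribe medicine, but poor states will (generally) have less healthy residents. Another good example would be a shortage of doctors. Both would lead to negative bias: wealth raises health but is negatively correlated with T, while a doctor shortage lowers health but is positively correlated with T, so in each case the product of the two signs is negative.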

Another example: states might allow nurses to prescribe medicine only when their residents are relatively healthy, so that there isn't much for the nurses to do and demand is low enough that the risk is small. Baseline health would then be positively correlated with T, leading to positive bias.
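A small simulation can make the sign rule concrete. The sketch below assumes a hypothetical data-generating process (the true effect 0.5, the wealth coefficient 2.0, the normal draws, and the adoption rule are all invented for illustration) in which wealth raises health and poorer states are more likely to adopt the policy. The OLS slope of H on T then lands well below the true effect, as the sign rule predicts.

```python
import random


def ols_slope(x, y):
    """Slope from a simple OLS regression of y on x (with an intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def confounded_estimate(n=10_000, beta=0.5, seed=0):
    """Hypothetical DGP: wealth (the omitted Z) raises health, and poorer states adopt T."""
    rng = random.Random(seed)
    t, h = [], []
    for _ in range(n):
        wealth = rng.gauss(0, 1)
        treated = 1 if wealth + rng.gauss(0, 1) < 0 else 0  # poorer states adopt
        health = 1.0 + beta * treated + 2.0 * wealth + rng.gauss(0, 1)
        t.append(treated)
        h.append(health)
    return ols_slope(t, h)


print(confounded_estimate())  # well below the true beta = 0.5
```

Flipping the adoption rule so that richer states adopt would flip the sign of the bias, matching the product-of-signs rule.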

c) (5) The state of New Jersey picks half of its counties out of a hat and allows nurses in those counties to prescribe medicine, while nurses in the remaining counties cannot. Explain, appealing to the OLS assumptions, why a regression of health on T in this setting gives an unbiased estimator of the causal effect of giving nurses more power.

Answer. Random assignment ensures that $E(U \mid T) = 0$. The only determinant of T is whether a county was pulled out of the hat, and being pulled out of a hat is not correlated with a county's initial health status, so the OLS estimator of β is unbiased. Any understanding of randomization was the key to this problem.
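A hypothetical simulation can illustrate why the lottery works. In this sketch, health still depends strongly on an unobserved factor (called wealth here), but exactly half the units are treated at random, so treatment is independent of that factor. The effect size, the 500 units per draw (far more than a real state has counties, chosen only to keep the noise small), and the number of replications are all assumptions; averaged over many lotteries, the OLS estimates center on the true effect.

```python
import random
import statistics


def ols_slope(x, y):
    """Slope from a simple OLS regression of y on x (with an intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def lottery_estimate(rng, n_units=500, beta=0.5):
    """One 'out of a hat' draw: exactly half the units are treated at random."""
    treated = set(rng.sample(range(n_units), n_units // 2))
    t, h = [], []
    for i in range(n_units):
        wealth = rng.gauss(0, 1)  # still affects health, but not assignment
        ti = 1 if i in treated else 0
        t.append(ti)
        h.append(1.0 + beta * ti + 2.0 * wealth + rng.gauss(0, 1))
    return ols_slope(t, h)


def average_estimate(reps=200, seed=1):
    """Average OLS estimate across many independent lotteries."""
    rng = random.Random(seed)
    return statistics.mean(lottery_estimate(rng) for _ in range(reps))


print(average_estimate())  # close to the true beta = 0.5
```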

2. (10 points) True or False – 2 points each.

1. Unbiased estimators are always consistent

2. Estimators are random variables

3. Parameters are random variables

4. Under homoscedasticity, a smaller residual variance implies a larger standard error on the
OLS estimators

5. Seeing a higher $R^2$ after adding a variable is evidence of a variable's statistical importance

Answer.
Explanations are only included for student benefit. A simple T/F was sufficient for full credit.

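1. False. In a sample $X_1, X_2, \dots, X_N$, the estimator that uses only the first observation, $X_1$, is unbiased for $E(X)$ but inconsistent: its variance does not shrink as N grows.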

2. True. Estimators are functions of the data, and the data are a random draw from the underlying population.

3. False. Parameters are numbers. They are not random, even if they are unknown.

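4. False. Under homoscedasticity,

$$\sigma^2_{\hat\beta_1} \approx \frac{1}{n}\,\frac{\sigma_u^2}{\sigma_X^2},$$

so a smaller residual variance $\sigma_u^2$ implies a smaller, not larger, standard error.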

5. False. $R^2$ never decreases when a variable is added, whether or not that variable matters. Statistical significance must be judged with a hypothesis test (a t-test for a single variable, or an F-test).
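Statement 1 can be checked by Monte Carlo. The sketch below assumes normal data with mean 3 and standard deviation 1 (arbitrary choices) and compares the estimator $X_1$ with the sample mean. Both center on the true mean, so both are unbiased, but only the spread of the sample mean shrinks as n grows; the spread of $X_1$ stays near 1 however large the sample is.

```python
import random
import statistics


def sampling_spread(n, reps=1000, mu=3.0, seed=2):
    """Monte Carlo mean and spread of X_1 and of the sample mean for samples of size n."""
    rng = random.Random(seed)
    first, means = [], []
    for _ in range(reps):
        sample = [rng.gauss(mu, 1.0) for _ in range(n)]
        first.append(sample[0])           # estimator that ignores all but one draw
        means.append(sum(sample) / n)     # the usual sample mean
    return {
        "mean_X1": statistics.mean(first),
        "sd_X1": statistics.stdev(first),
        "mean_Xbar": statistics.mean(means),
        "sd_Xbar": statistics.stdev(means),
    }


for n in (10, 400):
    print(n, sampling_spread(n))
```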

3. (25 points) Alicia, a young video game developer, wishes to estimate the demand for video
games. She is most interested in how demand grows with the length of the game, as this will help
in design. She has run a few regressions, summarized below. She included a few controls such as
price and a dummy for whether the game was action or something else. She has asked for your
help in thinking through the results. The numbers next to the variable names are the coefficient estimates, and the standard errors are in parentheses. The intercept estimates are not reported.

Dep Var: log(Sales)

                         (1)       (2)       (3)       (4)
# of Levels             .003      .063      .062      .180
                      (.017)    (.013)    (.013)    (.072)
(# of Levels)^2                                      -.003
                                                       (—)
Action Dummy                                .358      .341
                                          (.021)    (.042)
log(Price)                       -1.23     -1.22     -1.21
                                (.057)    (.057)    (.056)
R^2                     .001      .439      .442      .445
N                        600       600       600       600

a) (4) Interpret the coefficients on “# of Levels” and the “Action Dummy” in regression (3).

Answer.
Adding an additional level to the game results in approximately a 6.2% increase in sales, holding all other variables constant.

Action games generate approximately 35.8% more sales than non-action games, holding all other variables constant.
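Reading a coefficient b in a log(Sales) regression as "100 × b percent" is an approximation that works best for small b. The exact percentage change is $100(e^{b} - 1)$. The short sketch below (the helper name is illustrative) computes both for regression (3); the exact figures are about 6.4% for an extra level and about 43.0% for action games, so the approximation is noticeably off for the larger dummy coefficient.

```python
import math


def pct_effect(coef):
    """Exact % change in Sales implied by a coefficient in a log(Sales) regression."""
    return 100 * (math.exp(coef) - 1)


for name, b in [("# of Levels", 0.062), ("Action Dummy", 0.358)]:
    print(f"{name}: approx {100 * b:.1f}%, exact {pct_effect(b):.1f}%")
```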

b) (4) At the 5% level, test if the coefficient on number of levels in regression (2) is different
from .003.

Answer. This requires a t-test of $H_0: \beta_{Levels} = .003$ against $H_1: \beta_{Levels} \neq .003$. The corresponding statistic is:

$$t = \frac{.063 - .003}{.013} = 4.62$$

Since $|t| = 4.62 > 1.96$, we reject the null at the 5% level.
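The same test as a small helper, assuming (as the exam does) that the large-sample N(0, 1) approximation applies. The function name is illustrative; the two-sided p-value uses the identity $2(1 - \Phi(|t|)) = \text{erfc}(|t|/\sqrt{2})$ from Python's standard `math` module.

```python
import math


def two_sided_z_test(estimate, null_value, se, crit=1.96):
    """Large-sample t-test using the N(0,1) approximation."""
    t = (estimate - null_value) / se
    p_value = math.erfc(abs(t) / math.sqrt(2))  # equals 2 * (1 - Phi(|t|))
    return t, p_value, abs(t) > crit


t, p, reject = two_sided_z_test(0.063, 0.003, 0.013)
print(f"t = {t:.2f}, p = {p:.2g}, reject = {reject}")
```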

c) (6) From regressions (1) and (2) what can you conclude about the correlation between the
price of a video game and the length of a video game?

Answer. It is clear that log(Price) was an omitted variable in the first regression. The sign of the bias is negative: the true value of $\beta_{Levels}$ is closer to .06, but it was estimated as .003 when price was not controlled for. Since the effect of price is negative, the correlation between levels and price must be positive; that is, longer games cost more.

To see this with the formula:

$$\hat\beta_{Levels} - \beta_{Levels} \approx \beta_{\log(Price)} \times \frac{Cov(Levels, \log(Price))}{\sigma^2_{Levels}}$$

Regressions (1) and (2) tell us that the left-hand side is negative. From regression (2) we know that $\beta_{\log(Price)}$ is negative. Hence, the covariance term has to be positive.
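The OVB formula has an exact in-sample counterpart: the short-regression slope equals the long-regression slope plus the coefficient on the omitted variable times the slope from regressing the omitted variable on the included one. Applied to the rounded table numbers, the implied slope of log(Price) on # of Levels is $(0.003 - 0.063)/(-1.23) \approx 0.049$, which is positive. The sketch below computes that figure and then checks the identity on hypothetical synthetic data (all data-generating numbers are invented); the helper names are illustrative.

```python
import random


def center(v):
    m = sum(v) / len(v)
    return [a - m for a in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def short_slope(x, y):
    """OLS slope of y on x alone (with intercept)."""
    xc, yc = center(x), center(y)
    return dot(xc, yc) / dot(xc, xc)


def long_slopes(x1, x2, y):
    """OLS slopes of y on x1 and x2 (with intercept), solving the 2x2 normal equations."""
    a, b, c = center(x1), center(x2), center(y)
    s11, s22, s12 = dot(a, a), dot(b, b), dot(a, b)
    s1y, s2y = dot(a, c), dot(b, c)
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


# Implied slope of log(Price) on # of Levels from regressions (1) and (2)
delta_implied = (0.003 - 0.063) / -1.23  # positive: longer games cost more

# Check the identity on hypothetical synthetic data
rng = random.Random(3)
levels = [rng.randint(1, 20) for _ in range(600)]
log_price = [3.0 + 0.05 * L + rng.gauss(0, 0.3) for L in levels]
log_sales = [0.06 * L - 1.2 * p + rng.gauss(0, 0.5) for L, p in zip(levels, log_price)]

b_short = short_slope(levels, log_sales)
b_levels, b_price = long_slopes(levels, log_price, log_sales)
delta = short_slope(levels, log_price)
print(delta_implied, b_short, b_levels + b_price * delta)  # last two agree
```

The identity is exact only when both regressions use the same sample, which holds for regressions (1) and (2) (N = 600 in each); rounding in the table makes the implied 0.049 approximate.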

d) (4) Assuming homoscedasticity, do a 5% F-test to determine if the number of levels has a nonlinear effect on log(Sales).

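Answer. To test if the effect is nonlinear, we test whether the coefficient on the quadratic (i.e., nonlinear) term is zero. Since the only regression that contains this variable is (4), that is the unrestricted regression. The restricted regression sets $\beta_{Levels^2} = 0$, which is regression (3). From there we can calculate the F-statistic, with $q = 1$ restriction and $k = 4$ regressors in the unrestricted regression:

$$\begin{aligned} F &= \frac{R^2_{UR} - R^2_{R}}{1 - R^2_{UR}} \times \frac{N - k - 1}{q} \\ &= \frac{.445 - .442}{1 - .445} \times \frac{600 - 5}{1} \\ &= 3.22 \end{aligned}$$

The 5% critical value of $F_{1,\infty}$ is 3.84. Since 3.22 < 3.84, we do not reject the null that the effect of levels is linear.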
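The homoscedasticity-only F computation is simple arithmetic; as a sketch, it can be wrapped in a small helper (the name `f_stat_r2` is illustrative) and compared with the 5% critical value of $F_{1,\infty}$ from Table 2.

```python
def f_stat_r2(r2_ur, r2_r, n, k, q):
    """Homoscedasticity-only F statistic from restricted/unrestricted R^2.

    k = number of regressors in the unrestricted model, q = number of restrictions.
    """
    return (r2_ur - r2_r) / (1 - r2_ur) * (n - k - 1) / q


F = f_stat_r2(0.445, 0.442, n=600, k=4, q=1)
CRIT_F_1_INF = 3.84  # 5% critical value of F(1, inf), from Table 2
print(f"F = {F:.2f}; reject linearity: {F > CRIT_F_1_INF}")
```

With a single restriction, this F-statistic equals the square of the homoscedasticity-only t-statistic on the quadratic term, so the same conclusion follows from $|t| \approx \sqrt{3.22} \approx 1.79 < 1.96$.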

e) (4) Finally, Alicia’s current game is 8 levels. Using regression (4), estimate the predicted
marginal effect of increasing the number of levels on log(Sales), holding other variables constant.

Answer. Most of the credit comes from recognizing how to get the marginal effect of a polynomial:

$$\frac{dE(\log(Sales) \mid Levels)}{dLevels} = .180 - 2 \times .003 \times Levels$$

Plugging in 8 for the number of levels yields .132. So adding an additional level will raise sales by about 13.2%.

Note: If instead of calculating the derivative you computed the difference between expected log(Sales) for 8 levels and 9 levels, you received full credit.
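The two approaches accepted for full credit can be compared directly. In this sketch (helper names are illustrative), only the levels terms of regression (4) matter, because the other regressors are held fixed and cancel. The derivative at 8 levels gives .132, and the discrete change from 8 to 9 levels gives .129.

```python
B_LEVELS, B_LEVELS_SQ = 0.180, -0.003  # regression (4) coefficients


def quad_part(levels):
    """Levels terms of predicted log(Sales); other regressors cancel when held fixed."""
    return B_LEVELS * levels + B_LEVELS_SQ * levels ** 2


def marginal_effect(levels):
    """Derivative of predicted log(Sales) with respect to # of Levels."""
    return B_LEVELS + 2 * B_LEVELS_SQ * levels


print(round(marginal_effect(8), 3), round(quad_part(9) - quad_part(8), 3))
```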

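f) (3) Do a 5% test on whether any of the variables in regression (3) matter, statistically speaking.

Answer. This requires using the regression F-test, with $k = 3$ regressors in regression (3):

$$\begin{aligned} F &= \frac{R^2}{1 - R^2} \times \frac{n - k - 1}{k} \\ &= \frac{.442}{1 - .442} \times \frac{596}{3} \\ &= 157.4 \end{aligned}$$

This is well above the 5% critical value for $F_{3,\infty}$ (2.60). We reject the null that all slope coefficients are zero.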
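The regression F-statistic depends only on $R^2$, n, and the number of slope coefficients k, so it can be computed for any column of the table. The sketch below (function name is illustrative) evaluates it for regressions (3) and (4) and compares each with the matching 5% critical value from Table 2.

```python
def overall_f(r2, n, k):
    """Regression F statistic for H0: all k slope coefficients are zero."""
    return r2 / (1 - r2) * (n - k - 1) / k


CRIT_5PCT = {3: 2.60, 4: 2.37}  # F(k, inf) 5% critical values, Table 2
for label, r2, k in [("(3)", 0.442, 3), ("(4)", 0.445, 4)]:
    F = overall_f(r2, n=600, k=k)
    print(f"regression {label}: F = {F:.1f}, reject = {F > CRIT_5PCT[k]}")
```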

4. (25 points) Two economists are interested in estimating the causal effect of X on Y. They have used randomization to assign X so that X and U are independent (and hence there is homoscedasticity in addition to the OLS assumptions). They begin with the OLS regression:

$$Y_i = \beta_0 + \beta_1 X_i + U_i$$

Then one economist suggests working with demeaned data. Define $\tilde X_i = X_i - \bar X$ and $\tilde Y_i = Y_i - \bar Y$. He suggests running the following regression without an intercept:

$$\tilde Y_i = \alpha \tilde X_i + \tilde U_i$$

a) (6) Write down the OLS objective for α and solve for the OLS estimator of α. Call this α̂.

Answer. This was a homework problem, so those who did the homework should recognize it. The OLS objective is given by

$$\min_a \sum_{i=1}^N (\tilde Y_i - a\tilde X_i)^2$$

Setting the derivative with respect to a to 0 yields

$$-2\sum_{i=1}^N (\tilde Y_i - a\tilde X_i)\tilde X_i = 0$$

which can be rearranged to give

$$\hat\alpha = \frac{\sum_{i=1}^N \tilde X_i \tilde Y_i}{\sum_{i=1}^N \tilde X_i^2}$$
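A numerical check of the derivation, using hypothetical simulated data (β0 = 1, β1 = 2, and normal draws are assumptions for illustration). The sketch computes $\hat\alpha$ from the closed form, confirms that the first-order condition is satisfied at $\hat\alpha$ up to rounding, and confirms that nudging the slope in either direction increases the OLS objective.

```python
import random


def alpha_hat(x, y):
    """Return (alpha-hat, x-tilde, y-tilde), where alpha-hat = sum(x~ y~) / sum(x~^2)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    xt = [a - xbar for a in x]
    yt = [b - ybar for b in y]
    return sum(a * b for a, b in zip(xt, yt)) / sum(a * a for a in xt), xt, yt


def ssr(a, xt, yt):
    """The OLS objective sum((y~ - a x~)^2) evaluated at slope a."""
    return sum((yi - a * xi) ** 2 for xi, yi in zip(xt, yt))


rng = random.Random(3)
x = [rng.gauss(0, 1) for _ in range(200)]
y = [1.0 + 2.0 * xi + rng.gauss(0, 1) for xi in x]  # hypothetical beta0 = 1, beta1 = 2

a_hat, xt, yt = alpha_hat(x, y)
foc = -2 * sum((yi - a_hat * xi) * xi for xi, yi in zip(xt, yt))  # derivative at a_hat
print(a_hat, foc)
```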

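b) (6) Prove that $\hat\alpha$ is an unbiased estimator of $\beta_1$.

Answer. Plugging in the definitions of $\tilde X$ and $\tilde Y$ yields

$$\hat\alpha = \frac{\sum_{i=1}^N \tilde X_i \tilde Y_i}{\sum_{i=1}^N \tilde X_i^2} = \frac{\sum_{i=1}^N (X_i - \bar X)(Y_i - \bar Y)}{\sum_{i=1}^N (X_i - \bar X)^2}$$

This is just the OLS estimator of $\beta_1$, $\hat\beta_1$, which is an unbiased estimator. Hence,

$$E(\hat\alpha) = E(\hat\beta_1) = \beta_1$$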
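The equivalence and the unbiasedness claim can both be checked numerically. This sketch uses a hypothetical design (X drawn independently of normal errors, with assumed β0 = 1 and β1 = 2). It compares $\hat\alpha$ with the intercept-included slope written in raw-sum form, then averages $\hat\alpha$ over 500 simulated samples, which should land close to β1.

```python
import random
import statistics


def alpha_hat(x, y):
    """No-intercept OLS slope on demeaned data."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    num = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    den = sum((a - xbar) ** 2 for a in x)
    return num / den


def beta1_hat_raw_sums(x, y):
    """Slope of the regression with an intercept, written with raw (uncentered) sums."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    return (n * sxy - sx * sy) / (n * sxx - sx ** 2)


def draw(rng, n=100, beta0=1.0, beta1=2.0):
    """Hypothetical design: X randomized, independent of U."""
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [beta0 + beta1 * xi + rng.gauss(0, 1) for xi in x]
    return x, y


rng = random.Random(4)
x, y = draw(rng)
gap = abs(alpha_hat(x, y) - beta1_hat_raw_sums(x, y))  # zero up to rounding

mc_mean = statistics.mean(alpha_hat(*draw(rng)) for _ in range(500))
print(gap, mc_mean)  # mc_mean should be close to beta1 = 2.0
```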

c) (5) Prove that

$$\sum_{i=1}^N \hat U_i^2 = \sum_{i=1}^N \hat{\tilde U}_i^2$$

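Answer. It suffices to show that $\hat U_i = \hat{\tilde U}_i$ for every i. To prove this, just plug in $\hat\alpha = \hat\beta_1$ again:

$$\begin{aligned}
\hat{\tilde U}_i &= \tilde Y_i - \hat\alpha \tilde X_i \\
&= (Y_i - \bar Y) - \hat\beta_1 (X_i - \bar X) \\
&= Y_i - \underbrace{(\bar Y - \hat\beta_1 \bar X)}_{\hat\beta_0} - \hat\beta_1 X_i \\
&= Y_i - \hat\beta_0 - \hat\beta_1 X_i \\
&= \hat U_i
\end{aligned}$$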
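The residual identity can be confirmed on simulated data. In this sketch the data-generating values (β0 = 1, β1 = 2, uniform X, normal errors) are hypothetical. It computes residuals from the original regression with an intercept and from the no-intercept regression on demeaned data, then compares them observation by observation and via their sums of squares.

```python
import random

rng = random.Random(5)
n = 150
x = [rng.uniform(0, 10) for _ in range(n)]
y = [1.0 + 2.0 * xi + rng.gauss(0, 1) for xi in x]  # hypothetical beta0 = 1, beta1 = 2

xbar, ybar = sum(x) / n, sum(y) / n
xt = [a - xbar for a in x]  # X-tilde
yt = [b - ybar for b in y]  # Y-tilde

# Original regression with an intercept
b1 = sum(a * b for a, b in zip(xt, yt)) / sum(a * a for a in xt)
b0 = ybar - b1 * xbar
u_hat = [b - b0 - b1 * a for a, b in zip(x, y)]

# Demeaned regression without an intercept
alpha = sum(a * b for a, b in zip(xt, yt)) / sum(a * a for a in xt)
ut_hat = [b - alpha * a for a, b in zip(xt, yt)]

max_gap = max(abs(p - q) for p, q in zip(u_hat, ut_hat))
ssr_orig = sum(u * u for u in u_hat)
ssr_demeaned = sum(u * u for u in ut_hat)
print(max_gap, ssr_orig, ssr_demeaned)
```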

Now a third economist suggests estimating the following regression using OLS:

$$\tilde Y_i = \gamma_0 + \gamma_1 \tilde X_i + U_i$$

d) (5) Prove that in this regression $\hat\gamma_0 = 0$ exactly. Hint: This does not depend on (a)-(c).

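Answer. Notice that

$$\bar{\tilde X} = \frac{1}{N}\sum_{i=1}^N (X_i - \bar X) = \bar X - \bar X = 0$$

and similarly $\bar{\tilde Y} = 0$. The rest follows from the definition of the intercept estimator:

$$\hat\gamma_0 = \bar{\tilde Y} - \hat\gamma_1 \bar{\tilde X} = 0 - \hat\gamma_1 \times 0 = 0$$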
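A quick numerical illustration, assuming hypothetical simulated data (X centered at 5 so that the raw-data intercept is visibly nonzero). Regressing the demeaned Y on the demeaned X with an intercept returns an intercept of zero up to floating-point rounding, while the slope matches the one from the raw data.

```python
import random


def ols_with_intercept(x, y):
    """Return (intercept, slope) from a simple OLS regression of y on x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    slope = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / sum((a - xbar) ** 2 for a in x)
    return ybar - slope * xbar, slope


rng = random.Random(6)
x = [rng.gauss(5, 2) for _ in range(300)]
y = [1.0 + 2.0 * xi + rng.gauss(0, 1) for xi in x]  # hypothetical DGP

xbar, ybar = sum(x) / len(x), sum(y) / len(y)
x_tilde = [a - xbar for a in x]
y_tilde = [b - ybar for b in y]

b0, b1 = ols_with_intercept(x, y)
g0, g1 = ols_with_intercept(x_tilde, y_tilde)
print(b0, g0)   # g0 is zero up to floating-point rounding
print(b1 - g1)  # the slopes coincide
```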

e) (3) What is the value of the t-statistic for testing $\gamma_0 = 0$, and what is the p-value in this case? Hint: You do not need to know more than the fact that N(0, 1) is symmetric about its mean.

Answer. The t-statistic will always be exactly 0: regardless of the standard error in the denominator, the numerator $\hat\gamma_0$ will always be 0, as proved above. The p-value will be $2 \times \Phi(-|t|) = 2 \times \Phi(0)$. Since the normal distribution is symmetric about 0, $\Phi(0) = 0.5$, so the p-value is exactly 1. Full credit will be given if only $2 \times \Phi(0)$ was written.

Table 1: N(0, 1) 2-Sided Critical Values

Level:  1%     5%     10%
        2.58   1.96   1.64

Table 2: $F_{df,\infty}$ 5% Critical Values

df:     1      2      3      4      5
        3.84   3.00   2.60   2.37   2.21
