
EC220 Introduction to Econometrics

Exam Solutions
IRDAP 2020

Section A
(Answer all questions. This section carries 1/3 of the overall mark.)

Question 1
[33.34 marks]
Charitable giving is an increasingly important component of the economy. Yet, relatively little is known
about what motivates people to give to charities. Fundraisers are interested in assessing the effec-
tiveness of door-to-door fundraising campaigns, which are normally expensive and time-consuming.
A consultancy firm in London has access to a dataset on various characteristics (home address, age,
gender, occupation, whether the household has donated before, ...) of 170,000 households.
In summer 2020, the firm sends ask-for-donation flyers to 100,000 households. The flyer only contains
the exact time of a door-to-door charity solicitation in the neighbourhood. The firm codes Flyer_i = 1
if household i is sent a flyer, and 0 otherwise. The team then visits the 100,000 households, but is
only able to speak with 60,000 of them. The firm codes Visited_i = 1 if the door-to-door solicitor
could speak to household i, and 0 otherwise. After the flyer campaign, the firm records the variable
Donate_i = 1 if household i donates to a charity, and 0 otherwise.

(a) The consultancy firm claims that the 60,000 successful visits should constitute a random sam-
ple from the intended population of 100,000 households who receive flyers. Critically discuss
the claim. Could the team statistically test whether or not the claim holds? If yes, carefully de-
scribe one suitable test and necessary assumptions. If not, carefully explain why. [5 marks]

Solution : Students can discuss both sides of the story. The key aspect is the definition of a
random sample: the reason the team is unable to reach a household door-to-door
should not be related to the intention to donate (i.e., households should not actively try
to avoid the fundraisers). One story for why the claim might hold is that the team could not reach a
household because the announced time did not suit the household's schedule, for example because of a
visit to a relative or another random outing. This story may not be plausible, however, because
40,000 out of 100,000 households missed the announced time, which points to some systematic absence.
For example, households may systematically stay away to avoid being asked for a donation,
or the visits may take place during the day and therefore not
reach the working population, which may have a different willingness to donate from the
non-working population. [3 marks for a good discussion of the randomness or selection story (or
both)]
We can statistically test this claim by running a balance test of observable characteristics be-
tween the successfully visited 60,000 households and the remaining 40,000. We do so by com-
paring the averages of household characteristics available in the data between the 60,000 and
the 40,000. [1 point] If we observe statistically significant differences in some averages
between the two groups, the claim does not hold statistically. Even if we observe statistically
insignificant differences for all of the averages, to conclude that the claim holds we must
assume that the two groups also do not differ in any unobservable characteristics that could
explain the intention to donate. [1 point] Clearly this is a very strong assumption to make.
Note: No marks should be awarded for a discussion of how the 100,000 households receiv-
ing flyers may not be a random sample – the question is about the 60,000 households among
flyer-receiving households who are visited.

© LSE ST 2020/EC220R IRDAP Page 1 of 18


Give [1 out of 2 points] in the second part if students suggest that randomness cannot be tested.
No marks for suggesting a regression of donation on visiting – clearly we do not have informa-
tion on donation for unvisited households.
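The balance test described in this solution can be sketched as follows for a single observable characteristic. This is a minimal illustration, not part of the marking scheme: the function name `balance_test`, the use of household age, and the simulated numbers are all assumptions made for the example, and the test relies on a large-sample normal approximation.

```python
import math
import random
from statistics import NormalDist, fmean, variance

def balance_test(reached, not_reached):
    """Two-sample z-test (unequal variances) for a difference in means."""
    diff = fmean(reached) - fmean(not_reached)
    se = math.sqrt(variance(reached) / len(reached)
                   + variance(not_reached) / len(not_reached))
    z = diff / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided p-value
    return diff, z, p_value

# Hypothetical data: age of the household head in the two groups.
rng = random.Random(0)
age_reached = [rng.gauss(52, 15) for _ in range(6000)]
age_missed = [rng.gauss(45, 15) for _ in range(4000)]

diff, z, p = balance_test(age_reached, age_missed)
print(f"difference in mean age = {diff:.2f}, z = {z:.2f}, p-value = {p:.4f}")
```

In practice the same comparison would be repeated for every characteristic in the dataset. A significant difference speaks against the random-sample claim, while insignificant differences say nothing about unobservables.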

(b) The firm claims that the 100,000 households were randomly selected from the original 170,000
addresses. Assume that the claim is true. The firm investigates the causal effect of the ask-
for-donation flyer campaign on giving behaviour in London.
(i) Could the firm use the available information to answer the causal question? If yes, carefully
describe and interpret the estimation regression that you would run. If not, explain why and
describe the additional information and assumptions you would require. [3.34 marks]

Solution : Yes we can. Using the full sample on 170,000 households [0.34 points], we can
run the following regression:

$$\text{Donate}_i = \alpha + \beta\,\text{Flyer}_i + \gamma X_i + e_i \qquad (1)$$

[2 points; 1.5 if students include Visited_i or the interaction of Visited_i and Flyer_i in the
regression] where β is the coefficient of interest. We can also include a set of good neutral
controls X_i, i.e. variables that could help explain the intention to donate, to improve the
precision of the causal estimate. [0.5 point]
The equation would help because the treatment variable Flyer_i is effectively randomised,
so that there will be no selection bias between those who receive the flyer and those who do
not. [0.5 point]
Note: A good discussion of random sampling in this question deserves [1 point].
A good discussion of a regression on Visited_i instead of Flyer_i deserves [1 point].
Do not penalise for using the probability to donate as an outcome variable.

(ii) A critic suggests pensioners are more likely to donate when they receive an ask-for-donation
flyer. Let Pensioner_i = 1 if household i's head is a pensioner, 0 otherwise. Could you test
this claim? If yes, carefully describe how. If no, explain why. [4 marks]

Solution : There are three steps. First, we could create a new interaction Pensioner_i ×
Flyer_i and include the interaction in the equation above (together with Pensioner_i).

$$\text{Donate}_i = \alpha_0 + \beta_0\,\text{Flyer}_i + \beta_1\,\text{Pensioner}_i + \beta_2\,\text{Pensioner}_i \times \text{Flyer}_i + \gamma_0 X_i + u_i \qquad (2)$$

[1.5 points]

Second, β2 is the differential effect of receiving a flyer for pensioners and we would like to
test whether this is statistically significant and positive. [1 point]
Third, we can use a t-test to test this hypothesis [0.5 points; z-test also acceptable]. The
hypotheses are H0 : β2 = 0 vs. H1 : β2 > 0. [0.5 points; two-sided test acceptable] A
rejection of the null hypothesis will provide support for the claim. [0.5 points]
Note: Do not penalise if students include Visited_i as an additional regressor here.
If students do not include the interaction but only Flyer_i and Pensioner_i, a very good
discussion can be awarded [2 out of 4 points] for the whole question.
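The three steps above can be sketched in code. The sketch below assumes there are no additional controls X_i, in which case the regression with Flyer_i, Pensioner_i and their interaction is saturated and the OLS estimate of β2 equals a difference-in-differences of the four cell means. The function name `interaction_test`, the simulated donation probabilities and the large-sample normal critical value are assumptions made for illustration.

```python
import math
import random
from statistics import NormalDist, fmean, variance

def interaction_test(donate, flyer, pensioner, alpha=0.05):
    """Estimate beta2 and test H0: beta2 = 0 against H1: beta2 > 0."""
    cells = {(f, p): [] for f in (0, 1) for p in (0, 1)}
    for d, f, p in zip(donate, flyer, pensioner):
        cells[(f, p)].append(d)
    mean = {key: fmean(vals) for key, vals in cells.items()}
    # Flyer effect among pensioners minus flyer effect among non-pensioners.
    beta2 = (mean[(1, 1)] - mean[(0, 1)]) - (mean[(1, 0)] - mean[(0, 0)])
    # Standard error that allows a different variance in every cell.
    se = math.sqrt(sum(variance(vals) / len(vals) for vals in cells.values()))
    z = beta2 / se
    reject = z > NormalDist().inv_cdf(1 - alpha)  # one-sided test
    return beta2, se, z, reject

# Hypothetical data in which pensioners respond more strongly to the flyer.
rng = random.Random(7)
donate, flyer, pensioner = [], [], []
for _ in range(50_000):
    f = 1 if rng.random() < 100 / 170 else 0
    p = 1 if rng.random() < 0.3 else 0
    prob = 0.05 + 0.03 * f + 0.02 * p + 0.05 * f * p  # true beta2 = 0.05
    donate.append(1 if rng.random() < prob else 0)
    flyer.append(f)
    pensioner.append(p)

beta2, se, z, reject = interaction_test(donate, flyer, pensioner)
print(f"beta2 = {beta2:.4f}, se = {se:.4f}, z = {z:.2f}, reject H0: {reject}")
```

With controls X_i in the regression the cell-mean shortcut no longer applies and β2 would be estimated by multiple regression instead.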

(iii) The firm now would like to estimate the causal effect of a door-to-door solicitation visit
on giving behaviour. Could it be done using the available information? If yes, describe the
regression and critically discuss your approach. If not, explain why. [7 marks]

Solution : We should use an IV strategy/setup here. First we could run the following re-
gression:

$$\text{Donate}_i = \pi_0 + \theta\,\text{Visited}_i + \lambda X_i + v_i \qquad (3)$$

using the full sample of 170,000 and instrument the variable Visitedi with Flyeri . [1.5 points]
Now students are expected to describe a 2SLS procedure (we run the first stage of Visited_i
on Flyer_i and controls, then include the predicted values of Visited_i in the second stage)
or a Wald estimator. A good discussion here gets [2 points].
This approach is likely to result in the causal effect of door-to-door solicitation because the
three assumptions are likely to hold. First, randomisation of the instrument holds by design.
[1 point]. Second, relevance assumption holds because a large part of the households re-
ceived flyers allowing for door-to-door visits. [1 point]. Finally, the exclusion restriction as-
sumes that the only way the flyers could affect giving behaviour is through the direct visit.
[0.5 point]. This might hold because there is no other information about the charities
included in the flyer, so any additional donations should come from the visits. Students can
also argue against this assumption by pointing to the fact that the flyer raises awareness
of a fundraising campaign and encourages households to donate. But this should mean
that these households would welcome the door-to-door initiative. A clever discussion here
further deserves [1 point].
Note: If students only run Equation (3) without the IV approach, give maximum [3 points]
for an excellent discussion. Otherwise, [1 point] is awarded if only the equation and the full
sample are given.
If students argue that we cannot estimate the causal effect because visits are not random,
award [up to 2 points] for an excellent discussion.
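The Wald estimator mentioned in this solution can be sketched as the ratio of the reduced form (effect of the flyer on donations) to the first stage (effect of the flyer on being visited). Everything in the data-generating part is a hypothetical illustration: households that are more generous are assumed to be at home more often, so a naive comparison of visited and unvisited households is biased, while the flyer is randomised and affects donations only through the visit.

```python
import random
from statistics import fmean

def wald_estimator(donate, visited, flyer):
    """IV estimate of the effect of Visited on Donate with Flyer as instrument."""
    def mean_by(values, z):
        return fmean([v for v, f in zip(values, flyer) if f == z])
    reduced_form = mean_by(donate, 1) - mean_by(donate, 0)
    first_stage = mean_by(visited, 1) - mean_by(visited, 0)
    return reduced_form / first_stage

rng = random.Random(2020)
TRUE_EFFECT = 0.10  # assumed causal effect of a visit on P(donate)
donate, visited, flyer = [], [], []
for _ in range(170_000):
    generous = rng.random() < 0.5                        # unobserved
    at_home = rng.random() < (0.8 if generous else 0.4)  # selection into visits
    z = 1 if rng.random() < 100 / 170 else 0             # randomised flyer
    v = 1 if (z == 1 and at_home) else 0                 # no flyer, no visit
    p_donate = 0.05 + TRUE_EFFECT * v + 0.15 * generous
    donate.append(1 if rng.random() < p_donate else 0)
    visited.append(v)
    flyer.append(z)

naive = (fmean([d for d, v in zip(donate, visited) if v == 1])
         - fmean([d for d, v in zip(donate, visited) if v == 0]))
iv = wald_estimator(donate, visited, flyer)
print(f"naive difference = {naive:.3f}, Wald/IV estimate = {iv:.3f}")
```

In this simulated setting the naive difference overstates the effect because visited households are disproportionately generous, whereas the Wald estimate stays close to the assumed true effect.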

(c) Another research team in Birmingham hypothesises that including photographs that elicit
emotion in ask-for-donation flyers would increase the amount of donations. The team has a dataset
of 100 potential donors with information on various household characteristics. In Spring 2020,
they send an ask-for-donation flyer with information on different charities in the city to 40
households randomly selected from the sample. The team adds photographs that evoke
emotion to the flyers sent to the remaining 60 households (coded as Photographs Included_i =
1, 0 otherwise). Denote by Amount of Donations_i the amount in GBP raised from household i
after the flyer campaign, and let Female-headed_i = 1 if an adult female is the main decision maker
of household i. The team reports the results from OLS regressions in the table below.

Dependent Variable: Amount of Donations_i (£)

Regressor                    (1)         (2)         (3)

Photographs Included_i     15.751      15.367      15.424
                           (7.222)     (4.102)     (5.193)

Female-headed_i                         1.124       1.139
                                       (0.151)     (0.061)

Pensioner_i                                         1.188
                                                   (0.152)

Constant                   80.523      74.623      77.611
                          (12.244)     (6.112)     (9.189)

Observations                 100         100         100

(i) What are the average donations from all the households in the sample after the campaign?
Carefully explain your answer. If you cannot derive the answer, clearly indicate any further
information or assumptions necessary for your calculation. [2 marks]

Solution : The average donation from households without the photographs is the constant:
$\bar{x}_{\text{without}} = £80.523$. [0.5 point]
The average donation from households with the photographs is the constant plus the coefficient
for Photographs Included_i: $\bar{x}_{\text{with}} = £80.523 + £15.751 = £96.274$. [0.5 point]
The average donation from all households is:

$$
\begin{aligned}
\bar{x} &= 0.4 \times \bar{x}_{\text{without}} + 0.6 \times \bar{x}_{\text{with}} && (4) \\
        &= 0.4 \times 80.523 + 0.6 \times 96.274 && (5) \\
        &= 32.2092 + 57.7644 = 89.9736 \text{ (in £)} && (6)
\end{aligned}
$$

[1 point]

Note: The precise decimals are not important. A result around 89.97 or 90 pounds suffices.
Wrong calculations that include the constant are awarded [0.5 points].
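The calculation in (4) to (6) can be checked with a few lines of code. The variable names are chosen for this illustration only; the numbers are taken from Column (1) of the table and from the 40/60 split of the sample.

```python
# Column (1): constant and coefficient on Photographs Included.
constant = 80.523
photo_coef = 15.751

# 40 of 100 households received no photographs, 60 of 100 received them.
share_without = 40 / 100
share_with = 60 / 100

mean_without = constant
mean_with = constant + photo_coef

# The overall mean is the group-size-weighted average of the two group means.
overall_mean = share_without * mean_without + share_with * mean_with
print(f"without: {mean_without:.3f}, with: {mean_with:.3f}, "
      f"overall: {overall_mean:.4f}")
```

This works because a regression on a constant and a single dummy reproduces the two group means exactly.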

(ii) A critic cites prior research that pensioners are just more likely to donate than working
households, all else equal. The critic suggests that the team must include the variable
Pensioner_i, which indicates whether household i's head is a pensioner, in the estimation to
avoid omitted variable bias. Critically evaluate the critic's suggestion. [5 marks]

Solution : If the photographs were properly randomised across the flyers sent to
households and the sample size is large enough (relying on the Central Limit Theorem),
the treatment should be randomised and any characteristics of the households would NOT
constitute an omitted variable. While being a pensioner might be correlated with the
outcome, it should not be correlated with the treatment or the expectation of receiving the
treatment, due to the randomisation. [2 points] Therefore, we do not have to include the
variable: we would not expect the point estimate for the effect of Photographs to change when
we add the control for Pensioner, as there would be no OVB. However, since Pensioner could explain
some of the residual variation, it will help reduce the standard error of our estimates. The team
could include the variable to improve the precision. [2 points] This is especially important
in our case because our sample size is quite small (only 100) and the variance formula
indicates that we should include good neutral controls to improve precision. [1 point]
Note: Students should discuss the sample size to get the full mark.
If students refer to a regression involving the interaction between treatment and being a
pensioner and argue that this interaction suffers from OVB, award [up to 4 out of 5 points]
for an excellent discussion.
Even though besides the point, an excellent discussion of OVB can be awarded [up to 2 out
of 5 points] here.
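The argument that a good neutral control leaves the treatment coefficient unbiased but reduces its sampling variability can be illustrated with a small Monte Carlo experiment. The sketch below is hypothetical throughout: the treatment effect of 15, the pensioner effect of 30, the error standard deviation of 10, and the helper `ols` (normal equations solved by Gauss-Jordan elimination) are assumptions chosen for illustration and are not taken from the exam data.

```python
import random
from statistics import fmean, stdev

def ols(X, y):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        A[col] = [a / A[col][col] for a in A[col]]
        for r in range(k):
            if r != col:
                factor = A[r][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]

def one_replication(rng, n=100, n_treated=60, effect=15.0):
    treated = [1] * n_treated + [0] * (n - n_treated)
    rng.shuffle(treated)  # randomised assignment of the photographs
    pensioner = [1 if rng.random() < 0.3 else 0 for _ in range(n)]
    y = [80 + effect * d + 30 * p + rng.gauss(0, 10)
         for d, p in zip(treated, pensioner)]
    short = ols([[1, d] for d in treated], y)[1]
    long = ols([[1, d, p] for d, p in zip(treated, pensioner)], y)[1]
    return short, long

rng = random.Random(1)
draws = [one_replication(rng) for _ in range(500)]
short_est = [s for s, _ in draws]
long_est = [l for _, l in draws]
print(f"without control: mean {fmean(short_est):.2f}, sd {stdev(short_est):.2f}")
print(f"with control:    mean {fmean(long_est):.2f}, sd {stdev(long_est):.2f}")
```

Both specifications centre on the assumed effect, but the spread of the estimates across replications is smaller once the control is included.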

(iii) Another critic interprets Column (2) as causal evidence for women being more altruistic
than men. His rationale is that, since gender is assigned randomly at birth, Column (2) captures
the causal effect of having a female household head on the amount of donations. Carefully
explain whether the critic is right or wrong. [3 marks]

Solution : It is true that gender is randomised at birth, but it is not random that some
households have a female head. [1.5 points] As such, the interpretation of the coefficient
for Female-headed cannot be causal, and the estimate could be biased. Households with a female
head might be systematically different from male-headed households in a way that is correlated
with their intention to donate. [1.5 points]
Note: Students arguing that the critic is wrong for a different reason should be awarded
partial marks.

(iv) Carefully explain why the coefficients for Photographs Includedi are different in Column (1),
Column (2), and Column (3)? Clearly state any assumptions you make. [4 marks]

Solution : Because the treatment is randomised, there should be no OVB in the estima-
tions. [1 point] The difference in the coefficients is not due to OVB as such, but instead due
to sample variability of the estimates. [1 point] Indeed, the difference is small in magnitude
and the difference between the coefficients is clearly not statistically significant (SEs far
greater than the difference). [1 point] The fact that the sample size is relatively small means
we should expect some sample variability. [1 point]
Note: Even though besides the point, an excellent discussion of OVB or measurement error
can be awarded [up to 1.5 out of 4 points] here.

Section B
(Answer all questions. This section carries 2/3 of the overall mark.)

Question 2
Consider the bivariate regression model without intercept
$$y_i = \beta x_i + u_i,$$
for $i = 1, \dots, n$. We impose the following assumptions.
SLR.1 The population model is $y = \beta x + u$.
SLR.2 We have a random sample of size $n$, $\{(y_i, x_i) : i = 1, \dots, n\}$, following the population model
in SLR.1.
SLR.3 The sample outcomes on $\{x_i : i = 1, \dots, n\}$ are not all the same value.
SLR.4 The error term $u$ satisfies $E(u|x) = 0$ for any value of $x$.
SLR.5 The error term $u$ satisfies $Var(u|x) = \sigma^2$ for any value of $x$ (homoskedasticity).
Let $\hat{\beta}$ be the OLS estimator for the regression of $y$ on $x$ without intercept, that is
$$\hat{\beta} = \frac{\sum_{i=1}^{n} x_i y_i}{\sum_{i=1}^{n} x_i^2}.$$

[22.33 marks]
(a) Explain whether the following statement is true or false: $\sum_{i=1}^{n} \hat{u}_i = 0$ for the OLS residuals
$\hat{u}_i = y_i - \hat{\beta} x_i$ for $i = 1, \dots, n$. [If it is true, prove this statement. Otherwise, explain the reason.]
[3 marks]

Solution : This statement is FALSE. The reason is explained as follows. The OLS estimator is
defined as $\hat{\beta} = \arg\min_{\beta} \sum_{i=1}^{n} (y_i - \beta x_i)^2$. Thus the FOC for $\hat{\beta}$ is
$$0 = \sum_{i=1}^{n} x_i (y_i - \hat{\beta} x_i).$$
[1 point] Therefore, by the definition of $\hat{u}_i = y_i - \hat{\beta} x_i$, $\hat{\beta}$ satisfies $0 = \sum_{i=1}^{n} x_i \hat{u}_i$. [1 point]
However, this is the only restriction that $\hat{\beta}$ has to satisfy as the FOC, and there is NO guarantee
that $\hat{\beta}$ satisfies $\sum_{i=1}^{n} \hat{u}_i = 0$. [1 point] [Note that this is the regression model without intercept.]
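A small numerical example makes the point concrete. The three data points below are hypothetical and chosen only so that the arithmetic is exact; fractions are used to avoid rounding error.

```python
from fractions import Fraction

# Hypothetical data for a regression through the origin.
x = [Fraction(1), Fraction(2), Fraction(3)]
y = [Fraction(2), Fraction(3), Fraction(5)]

# OLS without intercept: beta_hat = sum(x*y) / sum(x^2).
beta_hat = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi ** 2 for xi in x)
residuals = [yi - beta_hat * xi for xi, yi in zip(x, y)]

sum_x_u = sum(xi * ui for xi, ui in zip(x, residuals))  # FOC: always zero
sum_u = sum(residuals)                                  # not restricted

print("beta_hat       =", beta_hat)
print("sum of x_i*u_i =", sum_x_u)
print("sum of u_i     =", sum_u)
```

The first-order condition holds exactly, while the residuals themselves do not sum to zero because no intercept is estimated.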

(b) Show that β̂ is a consistent estimator for β under SLR.1-4. [4 marks]

Solution : Note that

$$(*) \qquad \hat{\beta} = \frac{\frac{1}{n}\sum_{i=1}^{n} x_i y_i}{\frac{1}{n}\sum_{i=1}^{n} x_i^2} = \beta + \frac{\frac{1}{n}\sum_{i=1}^{n} x_i u_i}{\frac{1}{n}\sum_{i=1}^{n} x_i^2},$$

where the first equality follows from the definition of $\hat{\beta}$, and the second equality follows from
$y_i = \beta x_i + u_i$ by SLR.1-2. By the law of large numbers,

$$\operatorname{plim}\left(\frac{1}{n}\sum_{i=1}^{n} x_i^2\right) = E[x_i^2], \qquad \operatorname{plim}\left(\frac{1}{n}\sum_{i=1}^{n} x_i u_i\right) = E[x_i u_i].$$

[1 point; LLN needs to be mentioned] Therefore, by the property of plim,

$$\operatorname{plim}(\hat{\beta}) = \beta + \frac{\operatorname{plim}\left(\frac{1}{n}\sum_{i=1}^{n} x_i u_i\right)}{\operatorname{plim}\left(\frac{1}{n}\sum_{i=1}^{n} x_i^2\right)} = \beta + \frac{E[x_i u_i]}{E[x_i^2]} = \beta,$$

[2 points; deduct 1 point if plim is applied to sums, not averages] where the last equality follows
from

$$E[x_i u_i] = E[E[x_i u_i | x_i]] = E[x_i E[u_i | x_i]] = 0.$$

[1 point for showing this explicitly] [The first equality follows from the law of iterated
expectations, the second equality follows from the property of the conditional expectation, and the last
equality follows from SLR.4.] Therefore, $\hat{\beta}$ is consistent for $\beta$.

(c) Under SLR.1-5, derive $E[\hat{\beta}^2 | X]$, where $X = (x_1, \dots, x_n)$. [4 marks]

Solution : Note that the conditional variance $Var(\hat{\beta}|X)$ is written as

$$Var(\hat{\beta}|X) = E[(\hat{\beta} - E[\hat{\beta}|X])^2 | X] = E[\hat{\beta}^2|X] - (E[\hat{\beta}|X])^2,$$

where the first equality follows from the definition of the conditional variance, and the second
equality follows from a direct calculation and property of the conditional expectation. Thus,
$E[\hat{\beta}^2|X]$ can be expressed as

$$(**) \qquad E[\hat{\beta}^2|X] = Var(\hat{\beta}|X) + (E[\hat{\beta}|X])^2.$$

Thus, we first compute $E[\hat{\beta}|X]$ and $Var(\hat{\beta}|X)$, and then obtain $E[\hat{\beta}^2|X]$ by (**).
By (*) and taking the conditional expectation,

$$
\begin{aligned}
E[\hat{\beta}|X] &= E\left[\beta + \frac{\sum_{i=1}^{n} x_i u_i}{\sum_{i=1}^{n} x_i^2} \,\Big|\, X\right]
 = \beta + \frac{E\left[\sum_{i=1}^{n} x_i u_i \,\big|\, X\right]}{\sum_{i=1}^{n} x_i^2} \\
&= \beta + \frac{\sum_{i=1}^{n} x_i E[u_i|X]}{\sum_{i=1}^{n} x_i^2}
 = \beta + \frac{\sum_{i=1}^{n} x_i E[u_i|x_i]}{\sum_{i=1}^{n} x_i^2} \\
&= \beta,
\end{aligned}
$$

where the first equality follows from (*), the second and third equalities follow from the property
of conditional expectation, the fourth equality follows from SLR.2, and the fifth equality follows
from SLR.4. [1.5 points] Therefore,

$$E[\hat{\beta}|X] = \beta.$$

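Also, the conditional variance $Var(\hat{\beta}|X)$ is obtained as follows:

$$
\begin{aligned}
Var(\hat{\beta}|X) &= Var\left(\beta + \frac{\sum_{i=1}^{n} x_i u_i}{\sum_{i=1}^{n} x_i^2} \,\Big|\, X\right)
 = \frac{Var\left(\sum_{i=1}^{n} x_i u_i \,\big|\, X\right)}{\left(\sum_{i=1}^{n} x_i^2\right)^2}
 = \frac{\sum_{i=1}^{n} x_i^2 \, Var(u_i|X)}{\left(\sum_{i=1}^{n} x_i^2\right)^2} \\
&= \frac{\sum_{i=1}^{n} x_i^2 \, Var(u_i|x_i)}{\left(\sum_{i=1}^{n} x_i^2\right)^2}
 = \frac{\sigma^2}{\sum_{i=1}^{n} x_i^2},
\end{aligned}
$$

where the first equality follows from (*), the second and third equalities follow from the property
of conditional variance, the fourth equality follows from SLR.2, and the fifth equality follows from
SLR.5. [1.5 points]
Combining these results with (**), we obtain

$$E[\hat{\beta}^2|X] = \frac{\sigma^2}{\sum_{i=1}^{n} x_i^2} + \beta^2.$$

[1 point; students need to have derived (**) here]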
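The result for E[β̂²|X] can be checked by simulation. The sketch below holds X fixed, redraws the errors many times and compares the average of β̂² with σ²/Σx² + β². The values of β, σ and X, and the use of normally distributed errors, are assumptions chosen for illustration.

```python
import random
from statistics import fmean

beta, sigma = 2.0, 1.0
x = [1.0, 2.0, 3.0, 4.0, 5.0]   # X is held fixed across replications
sxx = sum(xi ** 2 for xi in x)  # sum of squares = 55

def beta_hat(rng):
    """One draw of the no-intercept OLS estimator, conditional on X."""
    y = [beta * xi + rng.gauss(0, sigma) for xi in x]
    return sum(xi * yi for xi, yi in zip(x, y)) / sxx

rng = random.Random(42)
draws = [beta_hat(rng) for _ in range(20_000)]

simulated = fmean([b ** 2 for b in draws])
theoretical = sigma ** 2 / sxx + beta ** 2
print(f"simulated E[beta_hat^2|X] = {simulated:.4f}")
print(f"theoretical value         = {theoretical:.4f}")
```

The two numbers should agree up to simulation noise, which shrinks as the number of replications grows.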

(d) Now suppose another random sample of size $n$, $\{(y_i^{(a)}, x_i^{(a)}) : i = 1, \dots, n\}$, is available.
[Here “(a)” is a superscript to signify another sample.] Suppose $\{(y_i^{(a)}, x_i^{(a)}) : i = 1, \dots, n\}$
is independent of the original sample $\{(y_i, x_i) : i = 1, \dots, n\}$, and we impose the following
assumptions.
SLR.1a The population model is $y^{(a)} = \beta^{(a)} x^{(a)} + u^{(a)}$. [This model may be different from the
one in SLR.1.]
SLR.2a We have a random sample of size $n$, $\{(y_i^{(a)}, x_i^{(a)}) : i = 1, \dots, n\}$, following the
population model in SLR.1a.
SLR.3a The sample outcomes on $\{x_i^{(a)} : i = 1, \dots, n\}$ are not all the same value.
SLR.4a The error term $u^{(a)}$ satisfies $E(u^{(a)}|x^{(a)}) = 0$ for any value of $x^{(a)}$.
SLR.5a The error term $u^{(a)}$ satisfies $Var(u^{(a)}|x^{(a)}) = (\sigma^{(a)})^2$ for any value of $x^{(a)}$
(homoskedasticity).
Let $\hat{\beta}^{(a)}$ be the OLS estimator for the regression of $y^{(a)}$ on $x^{(a)}$, that is

$$\hat{\beta}^{(a)} = \frac{\sum_{i=1}^{n} x_i^{(a)} y_i^{(a)}}{\sum_{i=1}^{n} (x_i^{(a)})^2}.$$

Show that $E[\hat{\beta} - \hat{\beta}^{(a)}] = \beta - \beta^{(a)}$ under SLR.1-4 and SLR.1a-4a. [3.33 marks]

Solution : Note that

$$E[\hat{\beta} - \hat{\beta}^{(a)}] = E[\hat{\beta}] - E[\hat{\beta}^{(a)}].$$

From the derivation in (c), we obtain $E[\hat{\beta}|X] = \beta$ under SLR.1-4. Therefore,

$$E[\hat{\beta}] = E[E[\hat{\beta}|X]] = E[\beta] = \beta,$$

where the first equality follows from the law of iterated expectations, the second equality follows
from $E[\hat{\beta}|X] = \beta$, and the last equality follows from the property of expectation. [1 point] By
exactly the same argument as in the proof of $E[\hat{\beta}|X] = \beta$, we obtain

$$E[\hat{\beta}^{(a)}|X^{(a)}] = \beta^{(a)},$$

under SLR.1a-4a. Therefore,

$$E[\hat{\beta}^{(a)}] = E[E[\hat{\beta}^{(a)}|X^{(a)}]] = E[\beta^{(a)}] = \beta^{(a)},$$

where the first equality follows from the law of iterated expectations, the second equality follows
from $E[\hat{\beta}^{(a)}|X^{(a)}] = \beta^{(a)}$, and the last equality follows from the property of expectation.
[1.33 points] Combining these results, we obtain $E[\hat{\beta} - \hat{\beta}^{(a)}] = \beta - \beta^{(a)}$. [1 point]

(e) Under SLR.1-5 and SLR.1a-5a, derive the (conditional) variance $Var(\hat{\beta} - \hat{\beta}^{(a)}|X, X^{(a)})$, where
$X^{(a)} = (x_1^{(a)}, \dots, x_n^{(a)})$. [4 marks]

Solution : Note that

$$
\begin{aligned}
Var(\hat{\beta} - \hat{\beta}^{(a)}|X, X^{(a)})
&= Var(\hat{\beta}|X, X^{(a)}) + Var(\hat{\beta}^{(a)}|X, X^{(a)}) - 2\,Cov(\hat{\beta}, \hat{\beta}^{(a)}|X, X^{(a)}) \\
&= Var(\hat{\beta}|X, X^{(a)}) + Var(\hat{\beta}^{(a)}|X, X^{(a)}) \\
&= Var(\hat{\beta}|X) + Var(\hat{\beta}^{(a)}|X^{(a)}),
\end{aligned}
$$

where the first equality follows from the property of the conditional variance, the second equality
follows from independence of $\hat{\beta}$ and $\hat{\beta}^{(a)}$ (because $\{(y_i, x_i) : i = 1, \dots, n\}$ and
$\{(y_i^{(a)}, x_i^{(a)}) : i = 1, \dots, n\}$ are independent), and the last equality follows from independence of
$\hat{\beta}$ and $X^{(a)}$, and independence of $\hat{\beta}^{(a)}$ and $X$ (again, because the two samples
are independent). [2 points]
Now, from (c), we obtain $Var(\hat{\beta}|X) = \sigma^2 / \sum_{i=1}^{n} x_i^2$ under SLR.1-5. By the same argument
used to derive $Var(\hat{\beta}|X) = \sigma^2 / \sum_{i=1}^{n} x_i^2$, we obtain

$$Var(\hat{\beta}^{(a)}|X^{(a)}) = \frac{(\sigma^{(a)})^2}{\sum_{i=1}^{n} (x_i^{(a)})^2}.$$

[1 point] Therefore,

$$Var(\hat{\beta} - \hat{\beta}^{(a)}|X, X^{(a)}) = \frac{\sigma^2}{\sum_{i=1}^{n} x_i^2} + \frac{(\sigma^{(a)})^2}{\sum_{i=1}^{n} (x_i^{(a)})^2}.$$

[1 point]

(f) In addition to SLR.1-5 and SLR.1a-5a, assume $\sigma = \sigma^{(a)}$ and

SLR.6 The error term $u$ is independent of $x$ and is normally distributed with mean zero and
variance $\sigma^2$.
SLR.6a The error term $u^{(a)}$ is independent of $x^{(a)}$ and is normally distributed with mean zero
and variance $\sigma^2$.
Explain how to test the null hypothesis $H_0 : \beta = \beta^{(a)}$ against the two-sided alternative
$H_1 : \beta \neq \beta^{(a)}$. [Hint: In this setup, an unbiased estimator of $\sigma^2$ is obtained as
$$s^2 = \frac{1}{2n-2}\left(\sum_{i=1}^{n} \hat{u}_i^2 + \sum_{i=1}^{n} (\hat{u}_i^{(a)})^2\right),$$
where $\hat{u}_i^{(a)} = y_i^{(a)} - \hat{\beta}^{(a)} x_i^{(a)}$.] [4 marks]

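Solution : Based on the results derived in (d) and (e) combined with SLR.1-6, SLR.1a-6a, and
$\sigma = \sigma^{(a)}$, we obtain

$$\frac{\hat{\beta} - \hat{\beta}^{(a)} - E[\hat{\beta} - \hat{\beta}^{(a)}]}{\sqrt{Var(\hat{\beta} - \hat{\beta}^{(a)}|X, X^{(a)})}}
= \frac{\hat{\beta} - \hat{\beta}^{(a)} - (\beta - \beta^{(a)})}{\sqrt{\sigma^2\left(\frac{1}{\sum_{i=1}^{n} x_i^2} + \frac{1}{\sum_{i=1}^{n} (x_i^{(a)})^2}\right)}} \sim N(0, 1).$$

Under the null $H_0 : \beta = \beta^{(a)}$ (and SLR.1-6, SLR.1a-6a, and $\sigma = \sigma^{(a)}$), we obtain

$$t = \frac{\hat{\beta} - \hat{\beta}^{(a)}}{\sqrt{s^2\left(\frac{1}{\sum_{i=1}^{n} x_i^2} + \frac{1}{\sum_{i=1}^{n} (x_i^{(a)})^2}\right)}} \sim t_{2n-2}.$$

[3 points; 1 for test statistic, 1 for correct distribution, 1 for required assumptions] Let
$t_{2n-2,1-\alpha/2}$ be the $(1 - \alpha/2)$-th quantile of the $t_{2n-2}$ distribution. The testing procedure for
$H_0$ against $H_1$ with significance level $\alpha$ is

Reject $H_0$ if $|t| > t_{2n-2,1-\alpha/2}$,
Do not reject $H_0$ if $|t| \le t_{2n-2,1-\alpha/2}$.

[1 point]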
[Alternative answer: We merge the two samples, and define the multiple regression model

$$(M) \qquad \tilde{y}_i = \beta \tilde{x}_i + \beta^{(a)} \tilde{x}_i^{(a)} + \tilde{u}_i,$$

for $i = 1, \dots, 2n$, where

$$
\tilde{y}_i = \begin{cases} y_i & \text{for } i = 1, \dots, n, \\ y_{i-n}^{(a)} & \text{for } i = n+1, \dots, 2n, \end{cases}
\qquad
\tilde{x}_i = \begin{cases} x_i & \text{for } i = 1, \dots, n, \\ 0 & \text{for } i = n+1, \dots, 2n, \end{cases}
\qquad
\tilde{x}_i^{(a)} = \begin{cases} 0 & \text{for } i = 1, \dots, n, \\ x_{i-n}^{(a)} & \text{for } i = n+1, \dots, 2n. \end{cases}
$$

Note that SLR.1-6, SLR.1a-6a, and $\sigma = \sigma^{(a)}$ guarantee the assumptions MLR.1-6 for this
regression model (M). Therefore, we can test $H_0 : \beta = \beta^{(a)}$ by the t-test based on (M). The
t-statistic will be identical.] [Points to be awarded in accordance with the first approach]
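The test statistic from the first approach can be computed as in the sketch below. The function names and the two tiny samples are hypothetical; the standard library has no t-distribution quantiles, so the critical value for 2n − 2 degrees of freedom is assumed to be looked up in a table.

```python
import math

def slope_no_intercept(x, y):
    """OLS slope for a regression of y on x without intercept."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi ** 2 for xi in x)

def equal_slopes_t(x, y, xa, ya):
    """t statistic for H0: beta = beta_a, with a pooled variance estimate."""
    n = len(x)
    if not (len(y) == len(xa) == len(ya) == n):
        raise ValueError("both samples must have the same size n")
    b = slope_no_intercept(x, y)
    ba = slope_no_intercept(xa, ya)
    # Pooled sum of squared residuals from the two separate regressions.
    ssr = (sum((yi - b * xi) ** 2 for xi, yi in zip(x, y))
           + sum((yi - ba * xi) ** 2 for xi, yi in zip(xa, ya)))
    s2 = ssr / (2 * n - 2)
    se = math.sqrt(s2 * (1 / sum(xi ** 2 for xi in x)
                         + 1 / sum(xi ** 2 for xi in xa)))
    return (b - ba) / se, 2 * n - 2

# Hypothetical samples of size n = 3 each.
t, df = equal_slopes_t([1, 2, 3], [1, 2, 4], [1, 2, 3], [2, 4, 7])
print(f"t = {t:.3f} with {df} degrees of freedom")
```

The null is rejected at level α if |t| exceeds the (1 − α/2) quantile of the t distribution with the returned degrees of freedom.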

Question 3
(a) Answer the following questions. [11.33 marks]
(i) Consider two scalar random variables x and u, where E(u) = 0. Compare three concepts:
(1) E(u|x) = 0, (2) Cov(x, u) = 0, and (3) x and u are independent. [5.33 marks]

Solution : First, (1) implies (2). To see that, if (1) is true, then

Cov(x, u) = E(xu) − E(x)E(u)


= E(xu)
= E(E[xu|x])
= E(xE[u|x])
= 0,

where the first equality follows from the property of covariance, the second equality follows
from the assumption E(u) = 0, the third equality follows from the law of iterated expecta-
tions, the fourth equality follows from the property of the conditional expectation, and the
last equality follows from (1).
Second, on the other hand, (2) does not necessarily imply (1). To see this, note that (1)
implies

E[a(x)u] = E[E[a(x)u|x]] = E[a(x)E[u|x]] = 0,

for any function a(·), by the same argument above using the law of iterated expectation.
However, (2) only guarantees E[a(x)u] = 0 for the case of a(x) = x. [2 points]
Now we argue that (3) is even stronger than (1). To see that (3) implies (1), note that (3)
guarantees

E(u|x) = E(u) = 0,

where the first equality uses (3) and the second equality follows from the assumption
E(u) = 0. On the other hand, (1) does not necessarily imply (3). For example, even if
E(u|x) does not depend on x (and takes zero for all x), the conditional variance V ar(u|x)
may depend on x. In this case, u and x are not independent. [2 points]
In sum, (1) is stronger than (2), and (3) is stronger than (1) in the sense that

(3) ⇒ (1) ⇒ (2)

but (1) ⇏ (3) and (2) ⇏ (1). [1.33 points]


Note: Be generous in marking here. Few students will prove this as formally as the solution
states. Award marks if the conceptual idea of the different concepts and their hierarchy
comes across.
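Two small discrete examples illustrate why the implications do not reverse. Both distributions are hypothetical and chosen so that every moment can be computed exactly with fractions. Case A has Cov(x, u) = 0 although E(u|x) depends on x; Case B has E(u|x) = 0 although Var(u|x) depends on x, so x and u are not independent.

```python
from fractions import Fraction

# Case A: x is uniform on {-1, 0, 1} and u = x^2 - 2/3.
support = [-1, 0, 1]
prob = Fraction(1, 3)
u_of = {x: Fraction(x * x) - Fraction(2, 3) for x in support}

mean_u = sum(prob * u_of[x] for x in support)
mean_x = sum(prob * x for x in support)
cov_xu = sum(prob * x * u_of[x] for x in support) - mean_x * mean_u
cond_mean_a = dict(u_of)  # u is a function of x, so E(u|x) = u(x)

# Case B: x is 1 or 2 and u = x * e, where e = +1 or -1 with equal
# probability, independently of x.
half = Fraction(1, 2)
cond_mean_b = {x: half * x + half * (-x) for x in (1, 2)}
cond_var_b = {x: half * x ** 2 + half * (-x) ** 2 for x in (1, 2)}

print("Case A: E(u) =", mean_u, " Cov(x,u) =", cov_xu,
      " E(u|x=0) =", cond_mean_a[0])
print("Case B: E(u|x) =", cond_mean_b, " Var(u|x) =", cond_var_b)
```

Case A shows that (2) can hold without (1), and Case B shows that (1) can hold without (3).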

(ii) Consider the regression model

y = β0 + β1 x1 + β2 x2 + u, E(u|x1 , x2 ) = 0.
Suppose that the error term is heteroskedastic (i.e., $Var(u|x_1, x_2)$ varies with $x_1$ and $x_2$).

(ii-1) Explain how to test the null hypothesis H0a : β1 = β2 against the one-sided alternative
hypothesis H1a : β1 > β2 . [3 marks]

(ii-2) Suppose we want to test the null hypothesis H0b : β1 = β2 = 0. Write down a test for
this hypothesis under homoskedasticity. Then explain the problem of this testing procedure
under the current setup. [3 marks]

Solution :
Solution for (ii-1): The t-statistic is defined as

$$t = \frac{\hat{\beta}_1 - \hat{\beta}_2}{se(\hat{\beta}_1 - \hat{\beta}_2)},$$

where $\hat{\beta}_1$ and $\hat{\beta}_2$ are the OLS estimators for the regression model and

$$se(\hat{\beta}_1 - \hat{\beta}_2) = \sqrt{[\text{robust.se}(\hat{\beta}_1)]^2 + [\text{robust.se}(\hat{\beta}_2)]^2 - 2 s_{12}}.$$

[1 point] robust.se($\hat{\beta}_1$) and robust.se($\hat{\beta}_2$) are the heteroskedasticity robust standard errors
for $\hat{\beta}_1$ and $\hat{\beta}_2$, respectively. $s_{12}$ is a (consistent) estimator of $Cov(\hat{\beta}_1, \hat{\beta}_2|X)$ with
$X = \{(x_{1i}, x_{2i}) : i = 1, \dots, n\}$ under heteroskedasticity. [1 point] Under MLR.1-4 and $H_{0a}$, we have

$$t \overset{a}{\sim} N(0, 1).$$

Let $z_{1-\alpha}$ be the $(1 - \alpha)$-th quantile of the $N(0, 1)$ distribution. The asymptotic testing
procedure for $H_{0a}$ against $H_{1a}$ with asymptotic significance level $\alpha$ is

Reject $H_{0a}$ if $t > z_{1-\alpha}$,
Do not reject $H_{0a}$ if $t \le z_{1-\alpha}$.

[1 point]
Solution for (ii-2): If the error term is homoskedastic, a testing procedure for $H_{0b}$ with
significance level $\alpha$ is given by

Reject $H_{0b}$ if $F > F_{2,n-3,1-\alpha}$,
Do not reject $H_{0b}$ if $F \le F_{2,n-3,1-\alpha}$,

[1 point] where

$$F = \frac{R^2/2}{(1 - R^2)/(n - 3)},$$

$R^2$ is the R-squared for the regression $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + u$, and $F_{2,n-3,1-\alpha}$ is the
$(1 - \alpha)$-th quantile of the $F_{2,n-3}$ distribution. [1 point]
A major problem of this procedure under heteroskedasticity is that the F statistic above does
NOT follow the $F_{2,n-3}$ distribution under heteroskedasticity. Therefore, the testing
procedure above does not control the significance level (or type I error probability) at the declared level
$\alpha$. [1 point]
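For reference, the homoskedasticity-only F statistic in this solution is a simple function of R² and the sample size. The helper below is a hypothetical illustration with made-up inputs; the critical value would still have to be taken from an F table, and the statistic is only valid under homoskedasticity, as the solution stresses.

```python
def f_statistic(r_squared, n, k=2):
    """Overall F statistic for H0: all k slope coefficients are zero.

    Valid under homoskedasticity only; the model has k slopes and an intercept.
    """
    return (r_squared / k) / ((1 - r_squared) / (n - k - 1))

# Hypothetical inputs: R-squared of 0.5 from a sample of n = 23 observations.
F = f_statistic(0.5, 23)
print(f"F = {F:.2f} with (2, 20) degrees of freedom")
```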

(b) [11 marks]
It is postulated that a reasonable demand-supply model for the wine industry in Australia, under
the market clearing assumption, would be given by

$$
\begin{aligned}
Q_t &= \alpha_0 + \alpha_1 P_t^w + \alpha_2 P_t^b + \alpha_3 Y_t + \alpha_4 A_t + u_{1t} && \text{(demand)} \\
Q_t &= \beta_0 + \beta_1 P_t^w + \beta_2 S_t + u_{2t} && \text{(supply)}
\end{aligned}
$$

where $Q_t$ = real per capita consumption of wine, $P_t^w$ = price of wine relative to CPI, $P_t^b$ = price
of beer relative to CPI, $Y_t$ = real per capita disposable income, $A_t$ = real per capita advertising
expenditure, and $S_t$ = storage cost at time $t$. CPI is the Consumer Price Index at time $t$. The
endogenous variables in this model are $Q$ and $P^w$, and the exogenous variables are $P^b$, $Y$, $A$
and $S$. The variances of $u_{1t}$ and $u_{2t}$ are, respectively, $\sigma_1^2$ and $\sigma_2^2$, and $Cov(u_{1t}, u_{2t}) = \sigma_{12} \neq 0$.
The errors do not exhibit any correlation over time.

(i) Derive the reduced form for $P_t^w$. [2 marks]

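Solution : 1.5 points Upon equating demand and supply, we get

$$
\begin{aligned}
\beta_0 + \beta_1 P_t^w + \beta_2 S_t + u_{2t} &= \alpha_0 + \alpha_1 P_t^w + \alpha_2 P_t^b + \alpha_3 Y_t + \alpha_4 A_t + u_{1t} \\
(\beta_1 - \alpha_1) P_t^w &= (\alpha_0 - \beta_0) + \alpha_2 P_t^b + \alpha_3 Y_t + \alpha_4 A_t - \beta_2 S_t + u_{1t} - u_{2t} \\
P_t^w &= \frac{\alpha_0 - \beta_0}{\beta_1 - \alpha_1} + \frac{\alpha_2}{\beta_1 - \alpha_1} P_t^b + \frac{\alpha_3}{\beta_1 - \alpha_1} Y_t + \frac{\alpha_4}{\beta_1 - \alpha_1} A_t - \frac{\beta_2}{\beta_1 - \alpha_1} S_t + \frac{u_{1t} - u_{2t}}{\beta_1 - \alpha_1}
\end{aligned}
$$

so

$$P_t^w = \pi_0 + \pi_1 P_t^b + \pi_2 Y_t + \pi_3 A_t + \pi_4 S_t + v_t, \quad \text{where } v_t = \frac{u_{1t} - u_{2t}}{\beta_1 - \alpha_1}$$

and, in particular, $\pi_4 = -\beta_2 / (\beta_1 - \alpha_1)$.
0.5 point For the 3rd equation we point out that $\beta_1 \neq \alpha_1$ (opposite signs, because they
represent the slopes of the supply and demand functions).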

(ii) The OLS estimation of the demand function, based on annual data from 1955-1975 (T =
20), gave the following results (all variables are in logs and figures in parentheses are t-
ratios).

$$\hat{Q}_t = \underset{(-6.04)}{-23.651} + \underset{(4.0)}{1.158}\,P_t^w - \underset{(-0.45)}{0.275}\,P_t^b + \underset{(4.5)}{3.212}\,Y_t - \underset{(-1.3)}{0.603}\,A_t$$

All the coefficients except that of Y have the wrong signs. The coefficient of P w (price
elasticity of demand, α1 ) not only has the wrong sign but also appears significant.

Explain why the OLS parameter estimator may give rise to these counter-intuitive results.
You are expected to use your results in answer (a) to support your answer.
[3 marks]

Solution : 2 points The resulting parameter estimates are biased and inconsistent, so
the parameter estimates, even in a large sample (which this one clearly is not), are not likely
to be close to the true parameters. The inconsistency arises from the simultaneity: price
and quantity are jointly determined.

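1 point Using the result in b)i, we can show this explicitly:

$$
\begin{aligned}
Cov(P_t^w, u_{1t}) &= Cov(\pi_0 + \pi_1 P_t^b + \pi_2 Y_t + \pi_3 A_t + \pi_4 S_t + v_t, \; u_{1t}) \\
&= Cov(v_t, u_{1t}) \quad \text{as we assume } P_t^b, Y_t, A_t \text{ and } S_t \text{ are exogenous} \\
&= Cov\left(\frac{u_{1t} - u_{2t}}{\beta_1 - \alpha_1}, \; u_{1t}\right) = \frac{\sigma_1^2 - \sigma_{12}}{\beta_1 - \alpha_1} \neq 0
\end{aligned}
$$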

(iii) The supply equation is overidentified. Clearly explain this terminology. What distinguishes
overidentification from exact identification and underidentification? Provide one set of as-
sumptions that would render the supply equation exactly identified.
[3 marks]

Solution : 2 points The supply equation is overidentified because there are more instruments
than we need to deal with the endogeneity of $P_t^w$ in the supply equation. Using the
results in b)i, we can consider $P_t^b$, $Y_t$, $A_t$, and $S_t$ (relevance). All of these variables have
been assumed to be exogenous (valid). We cannot use $S_t$ as it affects supply directly
(exclusion). The remaining instruments ($P_t^b$, $Y_t$ and $A_t$) leave us with two more instruments than we need.

1 point If we only have 1 instrument to deal with the endogeneity of Ptw (e.g., say α2 = α4 =
0) we would have exact identification. (0.5 points if they argue we have exact identification
if two instruments show up in the supply equation)

(iv) Discuss how you should estimate the supply equation in light of the overidentification. Dis-
cuss the benefit of using overidentification conditions. [3 marks]

Solution : 2 points The student would need to describe the 2SLS procedure here.
Step 1: Estimate the reduced form (given in b)i) by OLS and obtain the fitted values

$$\hat{P}_t^w = \hat{\pi}_0 + \hat{\pi}_1 P_t^b + \hat{\pi}_2 Y_t + \hat{\pi}_3 A_t + \hat{\pi}_4 S_t$$

Step 2: Estimate the following regression by OLS using the above fitted values:

$$Q_t = \beta_0 + \beta_1 \hat{P}_t^w + \beta_2 S_t + e_{2t}$$

1 point The efficiency (precision) of our estimates will be better when utilizing the overiden-
tification condition rather than restricting ourselves to using only a single instrument (exact
identification). 2SLS really is an optimal IV estimator, which uses the best combination of
all instruments to ensure better precision of our parameter estimates.
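The two steps of the 2SLS procedure can be sketched on simulated data. Everything about the data-generating process below is a hypothetical illustration: the structural parameters (α1 = −1, α2 = 0.5, α3 = 1, α4 = 0.5, β1 = 1, β2 = −1, zero intercepts), the standard normal exogenous variables and the sample size are assumptions, and `ols` is a small helper that solves the normal equations by Gauss-Jordan elimination.

```python
import random

def ols(X, y):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        A[col] = [a / A[col][col] for a in A[col]]
        for r in range(k):
            if r != col:
                factor = A[r][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]

rng = random.Random(3)
rows = []
for _ in range(5000):
    pb, inc, adv, s = (rng.gauss(0, 1) for _ in range(4))
    u1 = rng.gauss(0, 1)
    u2 = 0.5 * u1 + rng.gauss(0, 1)  # Cov(u1, u2) is not zero
    # Market-clearing price implied by the assumed demand and supply.
    p = (0.5 * pb + inc + 0.5 * adv + s + u1 - u2) / 2
    q = p - s + u2                   # supply: beta1 = 1, beta2 = -1
    rows.append((q, p, pb, inc, adv, s))

Q = [r[0] for r in rows]
P = [r[1] for r in rows]

# Step 1: reduced form of P on all exogenous variables, then fitted values.
Z = [[1.0, r[2], r[3], r[4], r[5]] for r in rows]
pi_hat = ols(Z, P)
P_fit = [sum(c * z for c, z in zip(pi_hat, z_row)) for z_row in Z]

# Step 2: supply equation with the fitted price in place of the price.
beta_2sls = ols([[1.0, pf, r[5]] for pf, r in zip(P_fit, rows)], Q)
beta_ols = ols([[1.0, r[1], r[5]] for r in rows], Q)

print(f"OLS  estimate of beta1: {beta_ols[1]:.3f}")
print(f"2SLS estimate of beta1: {beta_2sls[1]:.3f} (assumed true value 1)")
```

The sketch only reproduces the point estimates. Standard errors taken from the second-stage OLS regression are not the correct 2SLS standard errors, so inference would need a dedicated 2SLS routine.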

Question 4
(a) [14 marks]
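Consider the following time series model

$$\text{crime}_t = \alpha_0 + \rho\,\text{crime}_{t-1} + \alpha_1 \text{clearup}_t + \alpha_2 \text{clearup}_{t-1} + \alpha_3 \text{clearup}_{t-2} + e_t, \qquad (4.1)$$

$$|\rho| < 1, \quad t = 3, \dots, T,$$

where $E(e_t | \text{crime}_{t-1}, \text{clearup}_t, \text{clearup}_{t-1}, \dots) = 0$
and $Var(e_t | \text{crime}_{t-1}, \text{clearup}_t, \text{clearup}_{t-1}, \dots) = \sigma^2$. The errors exhibit no autocorrelation.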

(i) Discuss the following statement: “Even though we can estimate the parameters consistently
by OLS, for inference it is important to use HAC standard errors”. Support your answer with
clear arguments. In your answer briefly explain the difference between unbiasedness and
consistency of parameter estimates.
[5 marks]

Solution : 2 points As the errors do not exhibit autocorrelation, there is no need to use
HAC standard errors. We can use the usual standard errors as we have homoskedasticity
and no autocorrelation. HAC standard errors should be used when using OLS in the
presence of autocorrelation.

2 points Apart from weak dependence and stationarity (satisfied), consistency requires
that the errors are uncorrelated with the regressors, which is the case here. E.g., by the
law of iterated expectations,
$$E(\text{crime}_{t-1} e_t) = E\left[E(\text{crime}_{t-1} e_t \mid \text{crime}_{t-1}, \text{clearup}_t, \text{clearup}_{t-1}, \ldots)\right] = E\left[\text{crime}_{t-1} E(e_t \mid \text{crime}_{t-1}, \text{clearup}_t, \text{clearup}_{t-1}, \ldots)\right] = 0,$$
and since $E(e_t) = 0$ as well, we have $\operatorname{Cov}(\text{crime}_{t-1}, e_t) = 0$. [Mean independence
implies uncorrelatedness, also asked in Q3a]

1 point Unbiasedness is a finite sample property (rarely satisfied in time series models)
that ensures that there is no systematic over- or underestimation of the parameters;
consistency is a large sample property that ensures that, as the sample size grows, our
parameter estimates become increasingly likely to be close to the truth (they converge in
probability to the true values).

(ii) Show that when you omit the relevant variable $\text{crime}_{t-1}$ in the above model, you will get
evidence of autocorrelation in the errors. Explain the result.

Hint: You are expected to reformulate your model as

$$\text{crime}_t = \beta_0 + \beta_1 \text{clearup}_t + \beta_2 \text{clearup}_{t-1} + \beta_3 \text{clearup}_{t-2} + v_t. \tag{4.2}$$

[4 marks]

Solution : 1 point If we omit the relevant variable $\text{crime}_{t-1}$, then our new error term $v_t$ will
incorporate it. Specifically

$$v_t = \rho\,\text{crime}_{t-1} + e_t$$

and $\beta_j = \alpha_j$, $j = 0, 1, 2, 3$.

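2 points We should now show that, e.g., $E(v_t v_{t-1}) \neq 0$. Using the definition of $v_t$ we have

$$\begin{aligned}
E(v_t v_{t-1}) &= E\left((\rho\,\text{crime}_{t-1} + e_t)(\rho\,\text{crime}_{t-2} + e_{t-1})\right) \\
&= \rho^2 E(\text{crime}_{t-1}\,\text{crime}_{t-2}) + E(e_t e_{t-1}) + \rho E(\text{crime}_{t-2}\, e_t) + \rho E(\text{crime}_{t-1}\, e_{t-1}) \neq 0
\end{aligned}$$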

1 point Clear explanations: The second term is zero because $e_t$ does not exhibit
autocorrelation, the third term is zero as $E(e_t \mid \text{crime}_{t-1}, \text{clearup}_t, \text{clearup}_{t-1}, \ldots) = 0$, the final
term is non-zero because our equation (4.1) shows that $\text{crime}_t$ is a function of $e_t$ (so
$\text{crime}_{t-1}$ is a function of $e_{t-1}$), similarly the first term is non-zero because $\rho \neq 0$,
revealing (using (4.1) again) that $\text{crime}_{t-1}$ is a function of $\text{crime}_{t-2}$ (lagging (4.1) one period).
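
A small simulation can illustrate the result. The sketch below generates data from a model of the form (4.1), estimates both (4.1) and the misspecified (4.2) by OLS, and compares the first-order autocorrelation of the two sets of residuals. All parameter values, the i.i.d. normal clear-up series and the helper names are assumptions chosen purely for illustration.

```python
import numpy as np

def lag1_autocorr(x):
    """Sample first-order autocorrelation of a series."""
    x = x - x.mean()
    return float((x[1:] @ x[:-1]) / (x @ x))

def ols_residuals(y, X):
    """Residuals from an OLS regression of y on a constant and X."""
    X = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ coef

# Assumed (hypothetical) parameter values for the data-generating process.
rng = np.random.default_rng(0)
T, burn, rho = 5000, 100, 0.6
a0, a1, a2, a3 = 2.0, -0.5, -0.3, -0.2
clearup = rng.normal(size=T)
e = rng.normal(size=T)                 # errors without autocorrelation
crime = np.zeros(T)
for t in range(2, T):
    crime[t] = (a0 + rho * crime[t - 1] + a1 * clearup[t]
                + a2 * clearup[t - 1] + a3 * clearup[t - 2] + e[t])

# Drop the burn-in period and line up the lags.
y = crime[burn:]
crime_lag1 = crime[burn - 1:-1]
c0, c1, c2 = clearup[burn:], clearup[burn - 1:-1], clearup[burn - 2:-2]

res_full = ols_residuals(y, np.column_stack([crime_lag1, c0, c1, c2]))  # (4.1)
res_omit = ols_residuals(y, np.column_stack([c0, c1, c2]))              # (4.2)
r_full, r_omit = lag1_autocorr(res_full), lag1_autocorr(res_omit)
print("lag-1 autocorrelation, (4.1):", r_full, " (4.2):", r_omit)
```

Under these assumptions the residuals of the correctly specified model should show essentially no first-order autocorrelation, whereas those of the model that omits the lagged crime rate should be clearly positively autocorrelated.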

(iii) Let us assume that the true model is displayed in (4.2), where $v_t$ exhibits autocorrelation of
unknown form that displays weak dependence, and $E(v_t \mid \text{clearup}_t, \text{clearup}_{t-1}, \text{clearup}_{t-2}) = 0$.
You are asked to test whether the long run effect of clear-up rates on the crime rate is
significant. Discuss how you can obtain the standard error of the long run effect required
to conduct the test. [5 marks]

Solution : 1 point The long run effect in this model is given by $\delta = \beta_1 + \beta_2 + \beta_3$.

1.5 points We can use the asymptotic t-test to test the hypothesis $H_0 : \delta = 0$ against
$H_A : \delta \neq 0$. The test statistic is

$$\frac{\hat{\delta}}{SE(\hat{\delta})} \overset{a}{\sim} N(0, 1) \text{ under } H_0 \quad \text{(t distribution acceptable)}$$

and at the 5% level of significance we should reject if $\left|\hat{\delta} / SE(\hat{\delta})\right| > 1.96$.

0.5 points Because of the presence of autocorrelation in the errors, we do need to
use HAC robust standard errors.

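2 points To obtain the standard errors with a standard regression package it would be
convenient to reparametrize the model beforehand:

$$\text{crime}_t = \beta_0 + \delta\,\text{clearup}_t + \delta_1 (\text{clearup}_{t-1} - \text{clearup}_t) + \delta_2 (\text{clearup}_{t-2} - \text{clearup}_t) + v_t,$$

where $\delta_1 = \beta_2$ and $\delta_2 = \beta_3$, so that the (HAC) standard error reported for the
coefficient on $\text{clearup}_t$ is the standard error of the long run effect.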

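Alternatively, we could use the fact that

$$SE(\hat{\beta}_1 + \hat{\beta}_2 + \hat{\beta}_3) = \sqrt{SE(\hat{\beta}_1)^2 + SE(\hat{\beta}_2)^2 + SE(\hat{\beta}_3)^2 + 2\widehat{\operatorname{Cov}}(\hat{\beta}_1, \hat{\beta}_2) + 2\widehat{\operatorname{Cov}}(\hat{\beta}_1, \hat{\beta}_3) + 2\widehat{\operatorname{Cov}}(\hat{\beta}_2, \hat{\beta}_3)}$$

the latter would therefore require us to obtain the estimated covariances.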
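
The covariance-based formula is a quadratic form in the estimated covariance matrix of the coefficients, which the following sketch makes explicit. The function name `se_of_sum` and the numbers in `cov_hat` are hypothetical; in the setting of this question the matrix would have to be the HAC estimate of the covariance matrix.

```python
import numpy as np

def se_of_sum(cov, idx):
    """Standard error of the sum of the coefficients at positions idx.

    Computes sqrt(w' V w), where w has ones at idx and zeros elsewhere,
    which equals the sum of the variances plus twice the covariances.
    """
    w = np.zeros(cov.shape[0])
    w[list(idx)] = 1.0
    return float(np.sqrt(w @ cov @ w))

# HYPOTHETICAL covariance matrix, ordered (intercept, beta1, beta2, beta3).
cov_hat = np.array([
    [0.25,  0.00,  0.00, 0.00],
    [0.00,  0.04, -0.01, 0.00],
    [0.00, -0.01,  0.09, 0.02],
    [0.00,  0.00,  0.02, 0.16],
])
se_delta = se_of_sum(cov_hat, [1, 2, 3])
print("SE of the long run effect:", se_delta)
```

With these made-up numbers the variance of the sum is 0.04 + 0.09 + 0.16 + 2(-0.01 + 0 + 0.02) = 0.31, which shows why ignoring the covariance terms would give the wrong standard error.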


(b) [8 marks]
Stevenson and Wolfers (2008) amongst others have analysed happiness using data collected
in the General Social Survey. Here we are interested in explaining the binary variable vhappy, a
dummy variable that denotes whether an individual considers him/herself "very happy" or not
(1 = yes, 0 = no). The following socio-demographic variables are considered: occattend and
regattend (which are dummy variables indicating whether the individual occasionally or regularly
attends church, where the excluded dummy indicates that the individual never attends church),
income (family income in ’000US$), unemp10 (dummy indicating whether the individual has
been unemployed in the last 10 years), and educ (years of education completed). A random
sample of observations from the US is available.

Advised that there are benefits to using the Probit model over the Linear Probability Model (you
are not asked to discuss this), you obtain the following results:

(i) Discuss how you can obtain the predicted probability of an individual who regularly attends
church, whose lincome equals 5, has not been unemployed in the last 10 years and has 13
years of education.
[3 marks]

Solution : The predicted probability of this individual is given by

$$\Phi(-1.218 + 0.256 + 0.198 \times 5 + 0.0218 \times 13) = \Phi(0.311) \ \text{[2 points]} = 0.6217 \ \text{[1 point]}$$

The predicted probability is 62.2% (or simply 0.62).
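
As a check on the arithmetic, the calculation can be reproduced with the standard normal CDF written in terms of the error function. The coefficients are the ones used in the solution above; the dictionary layout and the helper name `std_normal_cdf` are choices made for this sketch only. The dummies occattend and unemp10 equal zero for this individual and therefore drop out of the index.

```python
from math import erf, sqrt

def std_normal_cdf(x):
    """Standard normal CDF, Phi(x), via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

# Probit coefficients as used in the solution (zero-valued dummies omitted).
coef = {"const": -1.218, "regattend": 0.256, "lincome": 0.198, "educ": 0.0218}
# Characteristics of the individual described in the question.
person = {"const": 1, "regattend": 1, "lincome": 5, "educ": 13}

index = sum(coef[k] * person[k] for k in coef)  # linear index x'beta
prob = std_normal_cdf(index)                    # predicted probability
print(round(index, 3), round(prob, 3))
```

Evaluating the CDF at the unrounded index gives a probability of about 0.622; the value .6217 in the solution corresponds to a statistical table evaluated at 0.31.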

(ii) You want to test the joint significance of the church attendance variables occattend and
regattend. How would you conduct this test, and what additional information would you
require to implement it? Given the results presented in the table, what do you expect the
outcome of this test to be? Briefly explain your answer. [5 marks]

Solution : We are asked to test $H_0 : \beta_{occattend} = \beta_{regattend} = 0$ against $H_A$ : at least
one of the coefficients is non-zero.

1 point We would want to run another probit regression where these two variables
are left out.

2 points Our LR test will then compare the log-likelihood of the original model with the
log-likelihood of the restricted model.

$$LR = 2\left(\log L_U - \log L_R\right) \overset{a}{\sim} \chi^2_2 \text{ under } H_0$$

The regression results show that $\log L_U = -5891.481$. We will reject the null at the 5%
level of significance if $LR > 5.99$, in which case we would find evidence of their joint
significance.

2 points In the above table, we notice that $\beta_{regattend}$ is highly significant. Its test
statistic, $z = \hat{\beta}_{regattend} / SE(\hat{\beta}_{regattend})$, has a very small p-value ($p < 0.0001$).
Despite the fact that $\beta_{occattend}$ is not statistically significant, I expect we will find evidence
of their joint significance (note: the alternative states that at least one of the coefficients
is non-zero).
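
The mechanics of the LR test can be sketched as below. The unrestricted log-likelihood is the value quoted in the solution; the restricted log-likelihood is HYPOTHETICAL, since it is exactly the additional information the question says is still required. The sketch relies on the fact that a chi-squared distribution with 2 degrees of freedom has CDF $1 - e^{-x/2}$, so it only covers the case of two restrictions; the function name is an illustrative choice.

```python
from math import exp, log

def lr_test_two_restrictions(loglik_unrestricted, loglik_restricted, alpha=0.05):
    """LR test for exactly two restrictions (chi-squared with 2 df)."""
    lr = 2.0 * (loglik_unrestricted - loglik_restricted)
    critical = -2.0 * log(alpha)  # chi2(2) quantile, from CDF 1 - exp(-x/2)
    p_value = exp(-lr / 2.0)      # chi2(2) upper tail probability
    return lr, critical, p_value, lr > critical

loglik_u = -5891.481  # unrestricted probit, as reported in the solution
loglik_r = -5900.0    # HYPOTHETICAL restricted value, for illustration only
lr, critical, p_value, reject = lr_test_two_restrictions(loglik_u, loglik_r)
print("LR:", lr, "critical value:", critical, "reject H0:", reject)
```

The computed critical value reproduces the 5.99 used in the solution; whether the null is actually rejected depends on the restricted log-likelihood obtained from the second probit regression.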

END OF PAPER
