PS2 Solution

Econometrics 1 - Problem set 2
Samuele Borsini Luigi Calro Murialdo Xinyi Zhang

0001083685 0001037355 0001075309
Question 1
Preliminary Analysis
Solution (1) Judging by the standard deviations of log GDP per capita
(Table 1), Chile registered the greatest variation, whereas Paraguay registered
the lowest one.
Table 1: GDP per capita’s log statistics

Country Mean SD Min Max
Argentina 8.361606 .1594668 8.090176 8.745898
Bolivia 6.849169 .1090917 6.684549 7.069288
Brazil 8.381038 .1051876 8.203846 8.633789
Chile 8.558166 .3645343 7.971852 9.060107
Colombia 8.009254 .1439194 7.789635 8.278395
Costa Rica 8.210605 .2005409 7.935359 8.586365
Honduras 7.128446 .1031013 7.009527 7.355527
Mexico 8.835705 .0897421 8.714128 9.001988
Nicaragua 7.011085 .1461931 6.77743 7.245125
Panama 8.269622 .2010249 7.97375 8.747618
Paraguay 7.290305 .0853354 7.131776 7.453443
Peru 7.861983 .1717078 7.584051 8.251704
Uruguay 8.434622 .1870602 8.117807 8.827324
Venezuela, RB 8.623245 .0858734 8.371621 8.781027
Total 7.979501 .6529535 6.684549 9.060107
Solution (2) No, they did not. Honduras registered the highest variation in
log CO2 emissions per capita, while Mexico registered the lowest one (Table 2),
whereas for log GDP per capita the extremes are Chile and Paraguay (Table 1).
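The comparison made in Solutions (1) and (2) can be checked directly from the tabulated standard deviations. The sketch below hard-codes the SD columns of Tables 1 and 2; the dictionary and function names are illustrative and are not part of the original STATA work.

```python
# Standard deviations copied from Table 1 (log GDP per capita)
# and Table 2 (log CO2 emissions per capita).
SD_LOG_GDP = {
    "Argentina": 0.1594668, "Bolivia": 0.1090917, "Brazil": 0.1051876,
    "Chile": 0.3645343, "Colombia": 0.1439194, "Costa Rica": 0.2005409,
    "Honduras": 0.1031013, "Mexico": 0.0897421, "Nicaragua": 0.1461931,
    "Panama": 0.2010249, "Paraguay": 0.0853354, "Peru": 0.1717078,
    "Uruguay": 0.1870602, "Venezuela, RB": 0.0858734,
}
SD_LOG_CO2 = {
    "Argentina": 0.1020411, "Bolivia": 0.3002862, "Brazil": 0.1555767,
    "Chile": 0.3199351, "Colombia": 0.0949469, "Costa Rica": 0.2648509,
    "Honduras": 0.3342241, "Mexico": 0.0487545, "Nicaragua": 0.1965379,
    "Panama": 0.2397069, "Paraguay": 0.2338397, "Peru": 0.184563,
    "Uruguay": 0.2196969, "Venezuela, RB": 0.0986794,
}


def extremes(sd_by_country):
    """Return (country with the largest SD, country with the smallest SD)."""
    highest = max(sd_by_country, key=sd_by_country.get)
    lowest = min(sd_by_country, key=sd_by_country.get)
    return highest, lowest


print("log GDP per capita:", extremes(SD_LOG_GDP))
print("log CO2 per capita:", extremes(SD_LOG_CO2))
```

The two pairs differ, which is the basis for the negative answer in Solution (2).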
Solution (3) From Table 2, Venezuela experienced the highest average level of
log CO2 emissions per capita (1.83), while Paraguay experienced the lowest
average level (-0.49), slightly below Nicaragua (-0.43). Venezuela and
Nicaragua also registered, respectively, the peak (2.03) and the bottom (-1.01)
value of log CO2 emissions per capita. Generally, the summary statistics do not
confirm the Kuznets curve, as the countries with the lowest level of pollution
can hardly be considered the most developed ones (Table 1).

Table 2: CO2 level of emission per capita’s log statistics
Country Mean SD Min Max
Argentina 1.328376 .1020411 1.18661 1.566575
Bolivia .0234759 .3002862 -.4931038 .4491379
Brazil .4917929 .1555767 .2356698 .7655925
Chile 1.046245 .3199351 .5727751 1.451309
Colombia .455186 .0949469 .2584696 .6320238
Costa Rica .209913 .2648509 -.271057 .6527979
Honduras -.3625969 .3342241 -.8597653 .201258
Mexico 1.328472 .0487545 1.23513 1.456606
Nicaragua -.4283491 .1965379 -1.00708 -.186882
Panama .5053506 .2397069 .0354755 .9628125
Paraguay -.4852938 .2338397 -.8812061 -.1304859
Peru .1497628 .184563 -.1051707 .6768439
Uruguay .4541554 .2196969 .0477475 .9114031
Venezuela, RB 1.830554 .0986794 1.632516 2.031914
Total .4582117 .7102654 -1.00708 2.031914
Solution (4) The plot (Figure 1) suggests that a quadratic term could help the
model. However, according to the Kuznets hypothesis the parabola is expected to
open downward (an inverted U), whereas the figure suggests the opposite.
Moreover, there seems to be a positive relation between the two variables:
the higher a country's GDP per capita, the more it pollutes per capita.
Regression Analysis
Solution (1) The model shows an R2 equal to 0.682, which is not particularly
high. The estimated coefficients are around -6.71 for the constant and 0.89 for
log GDP per capita. Both are significant even at the 1% level, with p-values
approximately equal to 0. Hence, there is a positive association between log
GDP per capita and log CO2 emissions. Since both variables are in logs, β1 can
be interpreted as the elasticity of CO2 emissions with respect to GDP per
capita: when GDP per capita increases by 1%, we can expect an increase of
about β1 % in the level of emissions.
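The "β1 %" reading is a first-order approximation that works well for small changes. As a minimal sketch, assuming the log-log specification described above and using the reported estimate of 0.89, the exact implied percentage change can be compared with the approximation (the function name is illustrative):

```python
def pct_change_in_y(elasticity, pct_change_in_x):
    """Exact % change in y implied by log(y) = a + elasticity * log(x)."""
    ratio_x = 1.0 + pct_change_in_x / 100.0
    return (ratio_x ** elasticity - 1.0) * 100.0


BETA_1 = 0.89  # estimated elasticity reported in the text

for pct in (1, 10, 50):
    exact = pct_change_in_y(BETA_1, pct)
    approx = BETA_1 * pct  # the usual "beta_1 %" reading
    print(f"GDP +{pct}%: exact {exact:.2f}%, approximation {approx:.2f}%")
```

For a 1% change the two numbers are practically identical; the gap only becomes visible for large changes in GDP.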
Solution (2) Adding the controls given by the problem, the elasticity of CO2
emissions with respect to GDP decreases from 0.89 to 0.13, though it remains
significant even at the 1% level, with a p-value approximately equal to 0.
This suggests that the estimate from the previous model was biased upward by
omitted variables.

Figure 1: Scatterplot of GDP per capita’s log against CO2 per capita emissions’
log

Based on the results of the new model in STATA, the log of employment
(p-value equal to 0.481) and the log of population density (p-value equal to
0.641) have no significant relation with the log of CO2 emissions. Among the
significant regressors, the logs of human capital, energy use and foreign
direct investment have a positive effect on CO2 emissions. Because both sides
of the regression are in logs, each estimated coefficient is the elasticity of
CO2 emissions with respect to the original regressor (not its log
transformation). Therefore, a 1% increase in these variables is associated
with an increase in CO2 emissions of 0.33% (human capital), 1.01% (energy use)
and 0.07% (foreign direct investment).
On the contrary, the log of gross fixed capital is negatively correlated with
the log of CO2 emissions. Hence, a 1% increase in gross fixed capital is
associated with a 0.029% decrease in CO2 emissions.
Solution (3) If we add the square of log GDP per capita to the previous model,
STATA gives us the following results. The coefficient on the linear term of
log GDP is positive (1.15), whereas the coefficient on the quadratic term is
negative (-0.067). Thus, the marginal effect of GDP (the elasticity of CO2
with respect to GDP per capita) becomes:
\[
\frac{\partial CO2_i}{\partial GDP_i} = 1.146218 - 2 \cdot 0.066824 \cdot GDP_i \tag{1}
\]
Notice that $CO2_i$ and $GDP_i$ are logs.
That seems to confirm the Kuznets curve hypothesis, because those coefficients
generate a downward-opening parabola.
However, neither term is significant at the 5% level (with p-values equal to
0.053 for the linear one and 0.089 for the quadratic one). This implies that
we cannot say with confidence whether the quadratic relationship between CO2
emissions and GDP is a downward-opening or an upward-opening parabola, nor
whether there is a quadratic relationship between the two variables at all.
Solution (4) In order to decide whether to include the quadratic term, we
build the following test:
\[
\begin{cases}
H_0: \beta_2 = 0 \\
H_1: \beta_2 \neq 0
\end{cases}
\]
The test of insignificance of the quadratic term gives an F statistic with
p-value equal to 0.0887 (this is the same test that STATA reports
automatically with the regression output, since for a single restriction the
F statistic is the square of the t statistic). Hence, at the 5% level we
cannot reject H0, and we should not include the quadratic term in the
regression.
Equation 1 represents the marginal effect of GDP. Therefore, in order to test
whether the marginal effect is 0, we build a test as follows:
\[
\begin{cases}
H_0: \beta_1 + 2\beta_2 GDP_i = 0 \\
H_1: \beta_1 + 2\beta_2 GDP_i \neq 0
\end{cases}
\]
In order to compute it, we evaluate $GDP_i$ at the mean of log GDP, which is
equal to 7.98. The test gives a t statistic with p-value equal to 0.092.
Hence, at the 5% level, we cannot reject the hypothesis that the marginal
effect of GDP is 0 in this model.
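The t statistic for a hypothesis of this form divides the estimated linear combination by its standard error, where Var(b1 + 2c·b2) = Var(b1) + 4c²·Var(b2) + 4c·Cov(b1, b2). The sketch below illustrates the computation; the point estimates come from Equation 1, but the variances and the covariance are hypothetical placeholders (the solution reports only p-values), and the p-value uses a normal approximation rather than the t distribution STATA would use, so the output is not expected to reproduce 0.092.

```python
from math import sqrt
from statistics import NormalDist


def marginal_effect_test(b1, b2, var_b1, var_b2, cov_b12, c):
    """Estimate, t statistic and two-sided p-value for H0: b1 + 2*b2*c = 0.

    The p-value relies on a standard normal approximation.
    """
    estimate = b1 + 2.0 * c * b2
    variance = var_b1 + 4.0 * c ** 2 * var_b2 + 4.0 * c * cov_b12
    t_stat = estimate / sqrt(variance)
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(t_stat)))
    return estimate, t_stat, p_value


# Point estimates from Equation 1. The (co)variances are HYPOTHETICAL
# placeholders: the covariance matrix is not reported in the text.
est, t_stat, p = marginal_effect_test(
    b1=1.146218, b2=-0.066824,
    var_b1=0.35, var_b2=0.0015, cov_b12=-0.0228,
    c=7.98,  # mean of log GDP per capita
)
print(f"marginal effect {est:.4f}, t = {t_stat:.2f}, p = {p:.3f}")
```

The covariance term matters: the linear and quadratic coefficients are typically strongly negatively correlated, so ignoring it would badly overstate the standard error.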
The turning point of a general parabola ($y = ax^2 + bx + c$) can be computed
as follows:
\[
x = -\frac{b}{2a}
\]
In our case, $b$ is $\hat{\beta}_1$ and $a$ is $\hat{\beta}_2$, thus the
(estimated) turning point $\widehat{GDP}_0$ will be:
\[
\widehat{GDP}_0 = -\frac{\hat{\beta}_1}{2\hat{\beta}_2} \approx 8.58
\]
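As a quick check of the arithmetic, the sketch below recomputes the turning point from the coefficients in Equation 1 and compares it with the observed range of log GDP per capita (the "Total" row of Table 1). Variable names are illustrative.

```python
from math import exp

BETA_1_HAT = 1.146218    # linear term, from Equation 1
BETA_2_HAT = -0.066824   # quadratic term, from Equation 1


def turning_point(b1, b2):
    """Vertex of y = b2*x^2 + b1*x + c, i.e. where the marginal effect is 0."""
    return -b1 / (2.0 * b2)


log_gdp_star = turning_point(BETA_1_HAT, BETA_2_HAT)

# Observed range of log GDP per capita, from the "Total" row of Table 1.
LOG_GDP_MIN, LOG_GDP_MAX = 6.684549, 9.060107

print(f"turning point (log GDP): {log_gdp_star:.2f}")
print(f"turning point (GDP level, units of the data): {exp(log_gdp_star):.0f}")
print("inside observed range:", LOG_GDP_MIN <= log_gdp_star <= LOG_GDP_MAX)
```

The turning point lies inside the sample range but close to its upper end, so only the richest observations sit on the downward-sloping side of the estimated parabola.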
Solution (5) All those tests can be represented as follows:
\[
\begin{cases}
H_0: \beta_1 + 2\beta_2 c = 0 \\
H_1: \beta_1 + 2\beta_2 c \neq 0
\end{cases}
\]
where $c \in \{5, 7, 9\}$. In the case c = 5, we get a t statistic with a
p-value of 0.018, thus we can say that, when log GDP is equal to 5, its
marginal effect is not 0 at the 5% level (yet we would not say so at the 1%
level). In the case c = 7, we get a t statistic with a p-value of
approximately 0, thus we can say that, when log GDP is equal to 7, its
marginal effect is not 0 (even at the 1% level). In the case c = 9, we get a
t statistic with a p-value of 0.634, thus we cannot reject the hypothesis
that, when log GDP is equal to 9, its marginal effect is 0.
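To see what is being tested at each value of c, the sketch below evaluates Equation 1 at c = 5, 7 and 9 and pairs each point estimate with the p-value reported above. Only the point estimates are computed here; the p-values are copied from the text, not recomputed.

```python
BETA_1_HAT = 1.146218    # linear term, from Equation 1
BETA_2_HAT = -0.066824   # quadratic term, from Equation 1


def marginal_effect(log_gdp):
    """Estimated elasticity of CO2 to GDP at a given value of log GDP."""
    return BETA_1_HAT + 2.0 * BETA_2_HAT * log_gdp


# p-values as reported in the text for c = 5, 7, 9
REPORTED_P_VALUES = {5: 0.018, 7: 0.0, 9: 0.634}

for c, p in REPORTED_P_VALUES.items():
    verdict = "reject H0" if p < 0.05 else "do not reject H0"
    print(f"c = {c}: marginal effect {marginal_effect(c):+.3f}, "
          f"p = {p:.3f} -> {verdict} at 5%")
```

The point estimate is positive at c = 5 and c = 7 and turns slightly negative at c = 9, which lies beyond the estimated turning point; that negative estimate is the one that is not statistically distinguishable from 0.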
Solution (6) We estimate the following model:
\[
CO2_i = \beta_0 + \beta_1 GDP_i + \beta_2 GDP_i^2 + \beta_3 X_i + \beta_4 High_i
        + \beta_5 GDP_i High_i + \beta_6 GDP_i^2 High_i + \epsilon_i
\]
where $High_i$ is a dummy variable equal to 1 if the observation's log GDP is
greater than or equal to the median, and 0 if it is lower than the median.
The marginal effect of GDP will be:
\[
\frac{\partial CO2_i}{\partial GDP_i} =
\begin{cases}
\beta_1 + 2\beta_2 GDP_i & \text{if } High_i = 0 \\
\beta_1 + \beta_5 + 2(\beta_2 + \beta_6) GDP_i & \text{if } High_i = 1
\end{cases}
\]
In order to test whether the marginal effect of GDP differs between high- and
low-income countries, we compute the difference between the two marginal
effects:
\[
\beta_1 + \beta_5 + 2(\beta_2 + \beta_6) GDP_i - \beta_1 - 2\beta_2 GDP_i = \beta_5 + 2\beta_6 GDP_i
\]
If there is no difference between high- and low-income countries, this value
should be 0. Hence, we test this hypothesis:
\[
\begin{cases}
H_0: \beta_5 + 2\beta_6 GDP_i = 0 \\
H_1: \beta_5 + 2\beta_6 GDP_i \neq 0
\end{cases}
\]
In order to compute it, we evaluate $GDP_i$ at the mean log GDP of the
high-income countries, which is equal to 8.51. The test gives a t statistic
with p-value approximately equal to 0. Hence, even at the 1% level, we
conclude that the marginal effect of GDP differs between high- and low-income
countries.
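The dummy and interaction regressors used in this model can be built mechanically from log GDP. The sketch below shows one way to do it; the field names are hypothetical, and the sample values are country means from Table 1 used purely for illustration (the actual regression uses the individual observations, not country means).

```python
from statistics import median


def add_high_income_terms(log_gdp):
    """Build High_i and its interactions with log GDP and log GDP squared."""
    cutoff = median(log_gdp)
    rows = []
    for g in log_gdp:
        high = 1 if g >= cutoff else 0  # 1 when at or above the median
        rows.append({
            "gdp": g,
            "gdp_sq": g ** 2,
            "high": high,
            "gdp_x_high": g * high,
            "gdp_sq_x_high": g ** 2 * high,
        })
    return cutoff, rows


# Illustrative values only: six country means of log GDP from Table 1.
sample = [8.361606, 6.849169, 8.381038, 8.558166, 7.128446, 7.011085]
cutoff, rows = add_high_income_terms(sample)
print(f"median cutoff: {cutoff:.6f}")
for row in rows:
    print(row)
```

For low-income observations the three High-related regressors are all 0, so β4, β5 and β6 only shift the intercept, slope and curvature of the high-income group.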
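Solution (7) In order to perform a Chow test, we calculate the F statistic as
follows:
\[
F = \frac{SSR - (SSR_0 + SSR_1)}{k} \cdot \frac{n_0 + n_1 - 2k}{SSR_0 + SSR_1} \approx 45.71
\]
where $SSR$ is the sum of squared residuals of the pooled regression, $SSR_0$
and $SSR_1$ are those of the regressions on the two subsamples, $n_0$ and
$n_1$ are the subsample sizes and $k$ is the number of estimated parameters.
The related p-value is approximately 0. Therefore we can conclude that there
is a structural break between high- and low-income countries.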
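The Chow statistic is simple enough to compute by hand once the three sums of squared residuals are available. The sketch below implements the formula; the input numbers are hypothetical, since the SSRs and sample sizes behind the reported value of 45.71 are not given in the solution.

```python
def chow_f(ssr_pooled, ssr_0, ssr_1, n_0, n_1, k):
    """Chow F statistic for equality of all k coefficients across two groups."""
    ssr_unrestricted = ssr_0 + ssr_1
    numerator = (ssr_pooled - ssr_unrestricted) / k
    denominator = ssr_unrestricted / (n_0 + n_1 - 2 * k)
    return numerator / denominator


# HYPOTHETICAL inputs, chosen only to illustrate the arithmetic.
k = 4
n_0, n_1 = 100, 100
f_stat = chow_f(ssr_pooled=30.0, ssr_0=8.0, ssr_1=12.0, n_0=n_0, n_1=n_1, k=k)
print(f"Chow F({k}, {n_0 + n_1 - 2 * k}) = {f_stat:.2f}")
```

Under the null of no structural break the statistic follows an F distribution with k and n0 + n1 − 2k degrees of freedom, so large values indicate that splitting the sample reduces the residual sum of squares by more than chance would explain.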
Question 2
Regression Analysis
Solution (1) Since the regressions are run on standardised measures, the
interpretation is as follows: when a regressor (the original, non-standardised
measure) changes by an amount equal to its standard deviation
($\partial X_j = \sigma_{X_j}$), the provision of public goods (the original,
non-standardised measure) changes by its standard deviation times the
estimated coefficient ($\partial Y = \beta_j \cdot \sigma_Y$).
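This reading can be turned into a small conversion rule. The sketch below assumes that both the outcome and the regressor were z-scored before estimation, as the interpretation above implies; the standard deviations used in the example are hypothetical, and the function names are illustrative.

```python
def implied_change_in_y(beta_std, sd_x, sd_y, delta_x):
    """Change in the original y implied by a change delta_x in the original x,
    when both variables were standardised before running the regression."""
    return beta_std * sd_y * (delta_x / sd_x)


def raw_slope(beta_std, sd_x, sd_y):
    """Slope on the original scales implied by a standardised coefficient."""
    return beta_std * sd_y / sd_x


# HYPOTHETICAL numbers: standardised coefficient 0.5, sd(x) = 2, sd(y) = 4.
print(implied_change_in_y(beta_std=0.5, sd_x=2.0, sd_y=4.0, delta_x=2.0))
print(raw_slope(beta_std=0.5, sd_x=2.0, sd_y=4.0))
```

Setting delta_x equal to sd_x reproduces the statement in the text: a one-standard-deviation change in the regressor moves the outcome by beta times sd_y.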
Solution (2) The interpretation of the estimated coefficients on the
regressors already used in the previous regressions does not change.
For the legal origin dummies, we can say that, for instance, a country of
French legal origin has a provision of public goods that is lower than that of
a country of British legal origin (the base group) by 1.673875 times the
standard deviation of public goods' provision, other things equal. The
regional dummies can be interpreted in a similar way.
For the absolute latitude, we can say that a one-unit change in absolute
latitude is associated with a change in public goods' provision equal to
0.0419 times its standard deviation.
Lastly, using robust standard errors or not gives the same point estimates.
What changes are the standard errors, most of which become larger (Table 3).
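Solution (3) In order to test the joint significance of the three
standardised measures, we test the following hypothesis:
\[
\begin{cases}
H_0: R\gamma = d \\
H_1: R\gamma \neq d
\end{cases}
\]
where:
\[
R = \begin{pmatrix}
0 & 1 & 0 & 0 & \mathbf{0}' \\
0 & 0 & 1 & 0 & \mathbf{0}' \\
0 & 0 & 0 & 1 & \mathbf{0}'
\end{pmatrix}
\qquad
\gamma = \begin{pmatrix} \gamma_0 \\ \gamma_1 \\ \gamma_2 \\ \gamma_3 \\ \gamma_4 \end{pmatrix}
\qquad
d = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}
\]
and $\mathbf{0}$ is a column vector of 0's. The computed F statistic has a
p-value of 0.096. With that p-value, the hypothesis of joint insignificance
can be rejected at the 10% level, but not at the 5% or 1% level.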
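For reference, the F statistic for a set of linear restrictions of this form is usually computed in its Wald form. The expression below is a sketch under stated assumptions: $\hat{V}(\hat{\gamma})$ denotes the estimated covariance matrix of $\hat{\gamma}$ (classical or robust) and $q = 3$ is the number of rows of $R$; neither symbol appears in the original solution.

```latex
\[
F = \frac{(R\hat{\gamma} - d)' \left[ R \, \hat{V}(\hat{\gamma}) \, R' \right]^{-1} (R\hat{\gamma} - d)}{q}
\]
```

With $d = 0$ and $R$ selecting $\gamma_1$, $\gamma_2$ and $\gamma_3$, this reduces to a quadratic form in $(\hat{\gamma}_1, \hat{\gamma}_2, \hat{\gamma}_3)$ weighted by the inverse of their 3×3 covariance block, divided by 3.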
Solution (4) The Breusch-Pagan test gives a χ2 statistic with p-value equal
to 0.033. Thus, at the 5% level, we should reject the hypothesis of
homoskedasticity. Yet White's test gives a χ2 statistic with p-value equal
to 0.317, which should lead us not to reject the homoskedasticity hypothesis.
This contradiction can be explained by two factors: the number of regressors
and the number of observations. The χ2 statistic of White's test has 44
degrees of freedom, because its auxiliary regression includes that many
regressors, and the test usually does not perform well in such cases. This is
all the more true given the number of observations (48), which leaves the
auxiliary regression with very few residual degrees of freedom. Moreover,
having only 48 observations is a problem for all these tests, because the
sample is too small to rely comfortably on asymptotic (CLT-based)
approximations.
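The mechanics of a Breusch-Pagan type test are easy to reproduce: regress the squared OLS residuals on the regressors and look at n times the R2 of that auxiliary regression. The sketch below does this for a single regressor using only the standard library. It implements Koenker's n·R2 variant, which may differ from the statistic STATA's default command reports, and the data are a made-up toy sample, not the problem-set data.

```python
def ols_simple(x, y):
    """OLS of y on a constant and x; returns (intercept, slope, residuals, R^2)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    sst = sum((yi - y_bar) ** 2 for yi in y)
    r2 = 1.0 - sum(e ** 2 for e in resid) / sst
    return intercept, slope, resid, r2


def breusch_pagan_lm(x, y):
    """n * R^2 from regressing squared residuals on x (Koenker's variant).

    With one regressor in the auxiliary regression, the statistic is
    asymptotically chi-squared with 1 degree of freedom under homoskedasticity.
    """
    _, _, resid, _ = ols_simple(x, y)
    squared_resid = [e ** 2 for e in resid]
    _, _, _, r2_aux = ols_simple(x, squared_resid)
    return len(x) * r2_aux


# Toy data (HYPOTHETICAL): the spread of y tends to grow with x.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1.1, 1.9, 3.3, 3.6, 5.8, 5.1, 8.4, 6.3]
lm = breusch_pagan_lm(x, y)
print(f"LM = {lm:.2f}; the 5% critical value of chi2(1) is 3.84")
```

White's test follows the same recipe but adds squares and cross products of all regressors to the auxiliary regression, which is why its degrees of freedom grow so quickly relative to a sample of 48 observations.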
Figure 2: Scatterplot of residuals against fitted values
Figure 2 shows the plot of residuals against fitted values. Since the number
of observations is low, we cannot say much. Still, the residuals do not look
homoskedastic, judging by the concentration of points where the fitted value
is between 0 and 4.
Solution (5) Huber-White robust standard errors should always be used to
preserve the validity of inference in case there is heteroskedasticity. The
classical OLS standard errors give the right answers under homoskedasticity.
Yet, when this requirement is not met, using them leads to incorrect tests,
invalidating all the inference. The problem comes from the inconsistency of
the homoskedastic variance estimator in the presence of heteroskedasticity.
We know that under homoskedasticity (assuming that the CLT holds), as
n → ∞, we would have:
\[
\frac{\hat{\beta}_j - \beta_j^*}{\sqrt{\hat{\sigma}^2 \left[ (X'X)^{-1} \right]_{jj}}} \xrightarrow{d} N(0, 1)
\]
However, this is true because $\hat{\sigma}^2 I_n \xrightarrow{p} \sigma^2 I_n$,
namely the homoskedastic variance estimator is a consistent estimator of the
covariance matrix of ϵ. If there is heteroskedasticity, this is no longer
true. Therefore, that statistic does not tend to a standard Gaussian
distribution, and we cannot use it for testing.
With the robust standard errors (assuming that the CLT holds), as n → ∞,
we would have:
\[
\frac{\hat{\beta}_j - \beta_j^*}{\sqrt{\left[ \hat{V}^{HC}(\hat{\beta}) \right]_{jj}}} \xrightarrow{d} N(0, 1)
\]
Since $\hat{V}^{HC}(\hat{\beta})$ is a consistent estimator when there is
heteroskedasticity, using it rules out this risk. Moreover, as it is
consistent even in the homoskedastic case, we would not run the risk of making
errors in the simpler case.
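The difference between the two variance estimators is easiest to see in a regression with a single regressor. The sketch below computes the slope with both the classical and the HC0 (White) standard error using only the standard library. It is an illustration on made-up data; HC0 is the simplest sandwich form, whereas STATA's robust option additionally applies a degrees-of-freedom correction, so the numbers would not match STATA output exactly.

```python
from math import sqrt


def slope_standard_errors(x, y):
    """Slope of y on (1, x) with classical and HC0 (White) standard errors."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    dx = [xi - x_bar for xi in x]
    sxx = sum(d ** 2 for d in dx)
    slope = sum(d * (yi - y_bar) for d, yi in zip(dx, y)) / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]

    # Classical: sigma^2_hat * [(X'X)^{-1}]_jj, with sigma^2_hat = SSR / (n - 2).
    sigma2_hat = sum(e ** 2 for e in resid) / (n - 2)
    se_classical = sqrt(sigma2_hat / sxx)

    # HC0 sandwich: each squared residual is weighted by (x_i - x_bar)^2
    # instead of being pooled into a single sigma^2_hat.
    se_hc0 = sqrt(sum((d * e) ** 2 for d, e in zip(dx, resid))) / sxx
    return slope, se_classical, se_hc0


# Toy data (HYPOTHETICAL): the spread of y tends to grow with x.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1.1, 1.9, 3.3, 3.6, 5.8, 5.1, 8.4, 6.3]
slope, se_c, se_r = slope_standard_errors(x, y)
print(f"slope {slope:.3f}, classical SE {se_c:.3f}, HC0 SE {se_r:.3f}")
```

The point estimate is identical under both approaches; only the standard error, and therefore the t statistic and p-value, changes.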
Solution (6) Table 3 contains all the regressions’ results from the previous
points.
Table 3: Regressions’ results

                                                                           (1)        (2)        (3)        (4)        (5)
                                                                          p1 8       p1 8       p1 8       p1 8       p1 8
Standardised measure of the overlap between culture and ethnicity     -29.93**                        -43.19***  -43.19***
                                                                       (12.22)                          (11.58)    (12.52)
Standardised measure of ethnolinguistic fractionalisation                       -3.485***                 1.380      1.380
                                                                                  (1.135)               (1.018)    (0.880)
Standardised measure of cultural fractionalisation                                            15.28*   19.90***    19.90**
                                                                                             (8.430)    (5.678)    (9.179)
French legal origin                                                                                   -1.674***  -1.674***
                                                                                                        (0.479)    (0.582)
German legal origin                                                                                      -0.460     -0.460
                                                                                                        (0.606)    (0.590)
Scandinavian legal origin                                                                                -0.500     -0.500
                                                                                                        (0.899)    (0.825)
Latin America and Caribbean                                                                              -0.257     -0.257
                                                                                                        (0.746)    (0.658)
Sub-Saharan Africa                                                                                    -4.031***  -4.031***
                                                                                                        (0.725)    (0.835)
East and Southeast Asia                                                                                   0.949      0.949
                                                                                                        (0.696)    (0.789)
Absolute value of latitude of a country’s geodesic centroid                                            0.0419**   0.0419**
                                                                                                       (0.0192)   (0.0194)
Constant                                                              1.411***   1.899***    -7.552*  -9.541***   -9.541**
                                                                       (0.460)    (0.525)    (4.483)    (2.825)    (4.556)
F-stat                                                                   5.997      9.424      3.283      14.92      16.42
R2                                                                       0.113      0.167     0.0653      0.801      0.801
Standard errors in parentheses
* p < 0.10, ** p < 0.05, *** p < 0.01
