Mock Exam2

Final on 20 December 2021
WS 2021/2022, 16:00 – 18:15
This exam should take you at most 120 minutes. Please allocate enough time to upload
your solutions on Learn before 18:15! Please upload one single pdf file only.
Good luck!
1 Case Study 1 – Multiple linear regression with dummy variables
In this case study, we consider savings of US workers in relation to various socio-
economic factors. One of these factors is the eligibility to participate in a special
pension plan, the so-called 401(k) pension plan.
The data set at hand contains savings of 9275 individuals with 6 explanatory variables.
The dependent variable is:
• savings: The total savings measured in 1000 USD. They will be denoted by Y.
The explanatory variables are:
• inc: The annual income measured in 1000 USD. It will be denoted by Xinc .
• e401k: Eligibility for the 401(k) pension plan coded as a dummy variable.
De401k = 1 if eligible, De401k = 0 if not eligible.
• marr: Marital status coded as a dummy variable. Dmarr = 1 if married,
Dmarr = 0 if not married.
• male: Gender coded as a dummy variable. Dmale = 1 if male, Dmale = 0 if not
male.
• age: Age in years. It will be denoted by Xage .
• hsize: A categorical variable representing the size of a household with three
levels: hsize = 1, hsize = 2, and hsize ≥ 3. Choosing hsize = 1 as the
baseline, the dummy variables DhsizeTwo = 1 and DhsizeMThree = 1 code the levels
hsize = 2 and hsize ≥ 3, respectively.
Econometrics I 1
The following model is estimated:
Y = β0 + β1 Xinc + β2 De401k + β3 Dmarr + β4 Dmale + β5 Xage
    + β6 DhsizeTwo + β7 DhsizeMThree + β8 Dmarr Xinc + u,    (1)
where the conditional expectation of u given all the explanatory variables is 0.

You may find a fitted model in Listing 1.
Figure 1: Residuals of Model 1.1 plotted against income (scatterplot; vertical axis: residuals, ticks from −500 to 1500; horizontal axis: income, ticks from 50 to 200)
1.1
In figure 1 you can find a scatterplot of the residuals from the model against the
income. Do you see a violation of any of the usual assumptions in linear regression
which you can check with figure 1? Give an explanation.
Listing 1: Regression output of Model 1.1
```
##
## Call:
## lm(formula = savings ~ inc + e401k + marr + male + age + hsize +
##     inc:marr, data = k401ksubs)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -506.15  -18.80   -4.30    9.64 1457.88
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) -52.25714    3.34491 -15.623  < 2e-16 ***
## inc           0.81655    0.06423  12.714  < 2e-16 ***
## e401k         5.97070    1.28262   4.655 3.28e-06 ***
## marr        -13.83317    2.73453  -5.059 4.30e-07 ***
## male          0.42505    1.69556   0.251  0.80206
## age           1.00715    0.06006  16.769  < 2e-16 ***
## hsizeTwo      1.99273    2.20166   0.905  0.36543
## hsizeMThree  -4.04746    2.17812  -1.858  0.06317 .
## inc:marr      0.20437    0.07008   2.916  0.00355 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 58.02 on 9266 degrees of freedom
## Multiple R-squared:  0.1779, Adjusted R-squared:  0.1772
## F-statistic: 250.6 on 8 and 9266 DF,  p-value: < 2.2e-16
```
1.2
Interpret the effect of the eligibility for 401(k) on savings.
1.3
Interpret the effect of the household size on savings.
1.4
Test if the eligibility for 401(k) has a significant effect on the savings on average,
ceteris paribus. (Provide details: Null hypothesis, significance level, test statistic,
p-value, interpretation.)
1.5
What are the expected savings of a 35 year old married male, living in a household of
size 4, who has an annual income of 33 500 USD and who is eligible for 401(k)?
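A minimal sketch of how the fitted value in 1.5 could be computed, assuming the coefficient estimates are read off Listing 1 and income is entered in units of 1000 USD. The dictionary `coef` and the function `predict_savings` are illustrative names, not part of the exam material.

```python
# Coefficient estimates copied from Listing 1 (savings and income in 1000 USD).
coef = {
    "intercept": -52.25714,
    "inc": 0.81655,
    "e401k": 5.97070,
    "marr": -13.83317,
    "male": 0.42505,
    "age": 1.00715,
    "hsizeTwo": 1.99273,
    "hsizeMThree": -4.04746,
    "inc:marr": 0.20437,
}


def predict_savings(inc, e401k, marr, male, age, hsize):
    """Fitted value of equation (1); inc is in 1000 USD, hsize is the household size."""
    return (
        coef["intercept"]
        + coef["inc"] * inc
        + coef["e401k"] * e401k
        + coef["marr"] * marr
        + coef["male"] * male
        + coef["age"] * age
        + coef["hsizeTwo"] * (hsize == 2)    # dummy for hsize = 2
        + coef["hsizeMThree"] * (hsize >= 3)  # dummy for hsize >= 3
        + coef["inc:marr"] * inc * marr       # interaction of income and marital status
    )


# 35-year-old married male, household of size 4, income 33 500 USD, eligible for 401(k).
savings = predict_savings(inc=33.5, e401k=1, marr=1, male=1, age=35, hsize=4)
print(f"Expected savings: {savings:.3f} thousand USD")
```

The result is expressed in 1000 USD, the unit of the dependent variable.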
1.6
What is the expected change in savings when the annual income increases by 500 USD
a) for someone who is married, and b) for someone who is not married, respectively,
ceteris paribus?
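The income effect in 1.6 depends on marital status through the interaction term. A short sketch, assuming the estimates from Listing 1 and an income change of 0.5 (that is, 500 USD in units of 1000 USD); the variable names are illustrative.

```python
b_inc = 0.81655       # coefficient of inc in Listing 1
b_inc_marr = 0.20437  # coefficient of inc:marr in Listing 1
delta_inc = 0.5       # 500 USD expressed in units of 1000 USD

# Married (Dmarr = 1): the slope of income is b_inc + b_inc_marr.
change_married = (b_inc + b_inc_marr) * delta_inc
# Not married (Dmarr = 0): the slope of income is b_inc alone.
change_not_married = b_inc * delta_inc

print(f"married:     {change_married:.5f} thousand USD")
print(f"not married: {change_not_married:.5f} thousand USD")
```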
1.7
Listing 1 includes an F-statistic equal to F = 250.6. What is being tested here?
(Provide details: Null hypothesis, and interpretation.)
1.8
Let Dnotmale be a dummy variable equal to 1 if a person is not male and zero otherwise.
If Dnotmale is used in place of Dmale in our model, what will be the OLS estimate of
the coefficient for Dnotmale ?
(Hint: It holds that Dnotmale = 1 − Dmale . Plug this into the model equation (1).)
1.9
What will happen to the R-squared (coefficient of determination) if Dnotmale is used
in place of Dmale ?
1.10
Should Dmale and Dnotmale both be included as independent variables in the model?
Explain.
1.11
What test would you perform to test whether the size of a household has a significant
effect on expected savings, ceteris paribus? Provide null hypothesis.
1.12
What distribution does the test statistic used in the previous question 1.11 follow
under the null hypothesis? Provide the name and the correct parameters (with
numbers).
2 Case Study 2 – Modelling quadratic effects

In the following case study, we examine the relationship between the prestige of
various occupations and their corresponding socioeconomic characteristics. The data
set consists of 98 occupations with the following dependent and explanatory variables:
• prestige: the Pineo–Porter prestige score of the occupation (ranging from 10 to
90 points; the higher, the more prestigious); hereafter referred to as the
variable Y .
• education: years of education (1 unit = 1 year) for the typical employee in the
given occupation; hereafter referred to as the variable Xeducation .
• income: annual income measured in Canadian Dollars (1 unit = 1 CAD) for the
typical employee in the given occupation, hereafter referred to as the variable
Xincome .
• women: the percentage of women (1 unit = 1 percentage point) in the occupation,
hereafter referred to as the variable Xwomen .
• type: the type of occupation divided into three categories: Blue Collar (bc),
Professional (prof) and White Collar (wc). Specifically, choosing type = bc
to be the baseline, type = prof is represented by a dummy variable Dprof and
type = wc is represented by a dummy variable Dwc .
The data set is taken from the 1971 Census of Canada. Its summary statistics are
shown in Table 1 below.
Table 1: Summary statistics of the data set
   prestige        education          income          women          type
Min.   :17.30   Min.   : 6.380   Min.   : 1656   Min.   : 0.000   bc  :44
1st Qu.:35.38   1st Qu.: 8.445   1st Qu.: 4250   1st Qu.: 3.268   prof:31
Median :43.60   Median :10.605   Median : 6036   Median :14.475   wc  :23
Mean   :47.33   Mean   :10.795   Mean   : 6939   Mean   :28.986
3rd Qu.:59.90   3rd Qu.:12.755   3rd Qu.: 8226   3rd Qu.:52.203
Max.   :87.20   Max.   :15.970   Max.   :25879   Max.   :97.510
2.1
We start with the following model:
Y = β0 + β1 log(Xincome) + β2 Xeducation + β3 Dprof + β4 Dwc + u,    E(u|X) = 0,
where X = (Xincome, Xeducation, Dprof, Dwc). We call it Model 2.1.
Discuss the adequacy of the model specification using Figure 2, explicitly referring
to the relevant assumption. Furthermore, discuss what the fulfillment or violation of
this assumption entails for the OLS estimators.
Figure 2: Residuals of Model 2.1 plotted against fitted values (panel "Residuals vs Fitted" for lm(prestige ~ log(income) + education + type); vertical axis: residuals, ticks from −10 to 20; horizontal axis: fitted values, ticks from 20 to 80; labelled observations: medical.technicians, electronic.workers, collectors)
2.2
Consider a new model, where Xincome is a level variable (we take the raw variable and
not its logarithm). In addition, we include the variable Xwomen and the interaction
term Xincome Xeducation . The new model is now written as:
Y = β0 + β1 Xincome + β2 Xeducation + β3 Dprof
    + β4 Dwc + β5 Xwomen + β6 Xincome Xeducation + u,    E(u|X) = 0,
where X = (Xincome , Xeducation , Dprof , Dwc , Xwomen ) and we call it Model 2.2.
Perform residual diagnostics on Model 2.2 using figure 3 and listing 2. For each
figure/listing, explicitly refer to the assumption(s) assessed.
Figure 3: Residuals of Model 2.2 plotted against fitted values (left), QQ-plot of
residuals of Model 2.2 (right)
(left panel "Residuals vs Fitted": residuals against fitted values; right panel "Normal Q−Q": standardized residuals against theoretical quantiles; labelled observations in both panels: medical.technicians, electronic.workers, general.managers)
Listing 2: Jarque–Bera test on residuals of Model 2.2
```
##
##  Jarque Bera Test
##
## data:  reg2_2$residuals
## X-squared = 1.2966, df = 2, p-value = 0.5229
```
2.3
In Model 2.2, test whether the percentage of women has a significant effect on the
expected prestige score, ceteris paribus. When answering this question, explicitly
provide the test name, null hypothesis, significance level, test statistic, p-value and
test decision.
Provide an interpretation of the test decision in the current context.
The summary output of Model 2.2 is provided in listing 3.
2.4
Compute the estimated instantaneous change in the expected prestige score in response
to a small change in years of education for sewing machine operators. Compute the same
quantity for university teachers. The relevant covariate information for these two
groups is given in Table 2.
Table 2: Covariates of sewing machine operators and university teachers
                        education   income   women   type
university.teachers         15.97    12480   19.59   prof
sewing.mach.operators        6.38     2847   90.67   bc
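In Model 2.2 the derivative of the expected prestige score with respect to education is β2 + β6 Xincome, so it varies with income. A hedged sketch of the calculation for 2.4, assuming the estimates from Listing 3 and the incomes from Table 2; `education_effect` is an illustrative helper name.

```python
b_education = 5.251         # coefficient of education in Listing 3
b_interaction = -2.415e-04  # coefficient of income:education in Listing 3


def education_effect(income):
    """Estimated marginal effect of one more year of education at a given income (CAD)."""
    return b_education + b_interaction * income


effects = {
    "sewing.mach.operators": education_effect(2847),
    "university.teachers": education_effect(12480),
}
for occupation, effect in effects.items():
    print(f"{occupation}: {effect:.4f} prestige points per year of education")
```

Because the interaction coefficient is negative, the estimated effect of education shrinks as income rises.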
Listing 3: Summary of Model 2.2
```
##
## Call:
## lm(formula = prestige ~ income + education + type + income:education +
##     women, data = Prestige_new)
##
## -15.2186  -5.0131   0.6606   4.8713  16.9888
##
## (Intercept)      -2.153e+01  8.099e+00  -2.658  0.00928 **
## income            4.380e-03  1.048e-03   4.181 6.68e-05 ***
## education         5.251e+00  7.822e-01   6.713 1.59e-09 ***
## typeprof          4.576e+00  3.766e+00   1.215  0.22751
## typewc           -4.812e+00  2.599e+00  -1.851  0.06741 .
## women             3.947e-02  3.059e-02   1.290  0.20019
## income:education -2.415e-04  7.363e-05  -3.280  0.00147 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6.781 on 91 degrees of freedom
## F-statistic: 87.58 on 6 and 91 DF,  p-value: < 2.2e-16
```
2.5
Next, we return to Model 2.1 where Xincome is used as a logged variable. Additionally,
we include Xwomen both as a linear term and as a quadratic term Xwomen². The rationale
is that female participation in the workforce may have a nonlinear effect on the
prestige score.
Based on these considerations, consider the following model:
Y = β0 + β1 log(Xincome) + β2 Xeducation + β3 Dprof
    + β4 Dwc + β5 Xwomen + β6 Xwomen² + u,    E(u|X) = 0,
where X = (Xincome , Xeducation , Dprof , Dwc , Xwomen ) and we call it Model 2.3.
(a) Assuming that all relevant assumptions for testing hold, what test can you use
to assess whether the percentage of women has any significant effect on the
expected prestige score, ceteris paribus? In addition to the test name, provide
(in formulas) the corresponding null hypothesis.
(b) Compute the corresponding test statistic. You may find the regression output of
Model 2.3 shown in listing 4 and the table of the sum of squared residuals shown
in table 3 useful. In addition, provide the distribution of the test statistic.
(c) Describe in words how you can use the test statistic computed in (b) to test
the hypothesis defined in (a).
Table 3: Sum of squared residuals for Models 2.1-2.3
SSR
Model 2.1 4096.286
Model 2.2 4184.432
Model 2.3 3641.730
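One possible way to set up the computation in 2.5 (b) is an F-test based on sums of squared residuals. The sketch below assumes that Model 2.1 serves as the restricted model (it equals Model 2.3 without the two women terms), that both models are fitted on the same 98 observations, and that the residual degrees of freedom of Model 2.3 are the 91 reported in Listing 4. Variable names are illustrative.

```python
ssr_restricted = 4096.286    # Model 2.1 (no women terms), Table 3
ssr_unrestricted = 3641.730  # Model 2.3, Table 3
q = 2                        # number of restrictions: beta5 = 0 and beta6 = 0
df_resid = 91                # residual degrees of freedom of Model 2.3 (Listing 4)

# F = [(SSR_r - SSR_ur) / q] / [SSR_ur / df_resid]
f_stat = ((ssr_restricted - ssr_unrestricted) / q) / (ssr_unrestricted / df_resid)
print(f"F = {f_stat:.3f} with ({q}, {df_resid}) degrees of freedom")
```

Under the null hypothesis and the stated assumptions, this statistic would be compared with a critical value of the F distribution with (2, 91) degrees of freedom.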
Listing 4: Summary of Model 2.3
```
##
## Call:
## lm(formula = prestige ~ log(income) + education + type + women +
##     I(women^2), data = Prestige_new)
##
## -11.9107  -4.3428   0.3451   3.9136  16.9437
##
## (Intercept) -1.097e+02  1.869e+01  -5.867 7.08e-08 ***
## log(income)  1.410e+01  2.286e+00   6.169 1.87e-08 ***
## education    2.966e+00  5.914e-01   5.016 2.60e-06 ***
## typeprof     6.031e+00  3.511e+00   1.718   0.0892 .
## typewc      -2.839e+00  2.371e+00  -1.197   0.2342
## women       -8.208e-02  8.562e-02  -0.959   0.3403
## I(women^2)   1.860e-03  8.921e-04   2.085   0.0399 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6.326 on 91 degrees of freedom
## F-statistic: 102.9 on 6 and 91 DF,  p-value: < 2.2e-16
```
2.6
In Model 2.3, we would like to interpret the estimated instantaneous change in the
expected prestige score in response to a small change in the percentage of women in
any given occupation.
(a) Start by computing the vertex of the function that corresponds to the effect
outlined above.
(b) Does the sign of the effect remain the same over the observed values of Xwomen
or does it change? In the former case, provide the sign. In the latter case,
describe the direction in which it changes (i.e., from positive to
negative or the other way around).
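The effect in 2.6 is the derivative β5 + 2 β6 Xwomen, and the quadratic part β5 Xwomen + β6 Xwomen² has its vertex at Xwomen = −β5 / (2 β6). A small sketch, assuming the estimates from Listing 4 and the observed range of women from Table 1; the names are illustrative.

```python
b_women = -8.208e-02     # coefficient of women in Listing 4
b_women_sq = 1.860e-03   # coefficient of I(women^2) in Listing 4
women_min, women_max = 0.000, 97.510  # observed range of women, Table 1

# Vertex of b_women * x + b_women_sq * x**2
vertex = -b_women / (2 * b_women_sq)


def women_effect(x):
    """Estimated marginal effect of the percentage of women, evaluated at x."""
    return b_women + 2 * b_women_sq * x


print(f"vertex at {vertex:.2f} percent women")
print(f"effect at minimum: {women_effect(women_min):.4f}")
print(f"effect at maximum: {women_effect(women_max):.4f}")
```

Comparing the signs of the effect at the two ends of the observed range with the location of the vertex indicates whether the sign changes within the data.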
2.7
Astronauts are generally classified as professionals. Suppose that in 1971 they
typically have 18 years of education and a typical income of 20 000 CAD, and that
25% of them are women.
(a) Compute the expected prestige score using Model 2.2.
(b) Same as above, but with Model 2.3.
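A sketch of how the two predictions in 2.7 could be computed, assuming the estimates from Listing 3 (Model 2.2) and Listing 4 (Model 2.3) and the natural logarithm for log(income). Variable names are illustrative. Note that 18 years of education lies above the sample maximum of 15.97 in Table 1, so both values are extrapolations.

```python
import math

# Astronaut covariates from the question; type = prof, so Dprof = 1 and Dwc = 0.
education, income, women, prof, wc = 18, 20000, 25, 1, 0

# Model 2.2 (Listing 3): income in levels plus income:education interaction.
pred_22 = (
    -21.53
    + 4.380e-03 * income
    + 5.251 * education
    + 4.576 * prof
    - 4.812 * wc
    + 3.947e-02 * women
    - 2.415e-04 * income * education
)

# Model 2.3 (Listing 4): log(income) plus linear and quadratic women terms.
pred_23 = (
    -109.7
    + 14.10 * math.log(income)
    + 2.966 * education
    + 6.031 * prof
    - 2.839 * wc
    - 8.208e-02 * women
    + 1.860e-03 * women**2
)

print(f"Model 2.2: {pred_22:.2f}")
print(f"Model 2.3: {pred_23:.2f}")
```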
2.8
Which model do you deem most appropriate and why so? You may argue using the
information in table 4.
Table 4: AIC/BIC of Models 2.1-2.3
AIC BIC
Model 2.1 655.9331 671.4429
Model 2.2 662.0195 682.6993
Model 2.3 648.4061 669.0859
2.9 Bonus Question
If we were to conduct this analysis today, what additional explanatory variables would
you consider including in the model? Are there other ones you would try to omit?
3 True or false?
State if you deem the statement true or false. For true statements, provide a brief
explanation as to why they are true. For false statements, provide a correct statement
or an explanation as to why they are false.
3.1
Suppose you would like to fit a simple linear model
Y = β0 + β1 X + u,    E(u|X) = 0,    V(u|X) = σ²,    u ∼ N(0, σ²),

using a random sample (Yi , Xi ), i = 1, . . . , N , where the sample size is N = 200.
To that end, you first compute the OLS estimator β̂1 on the whole sample.
Second, you compute the OLS estimator only on the first half of the sample using
i = 1, . . . , 100, and denote it by β̂1,1 . And third you compute the OLS estimator on
the second half of the sample, using i = 101, . . . , 200, and denote it by β̂1,2 .
Finally, you take the average of β̂1,1 and β̂1,2 . Let’s call this estimator
β̃1 = (1/2) β̂1,1 + (1/2) β̂1,2 .
You would like to compare the performance of the two estimators β̂1 and β̃1 .
a) It holds that
E(β̃1 ) = E(β̂1 ).
b) It holds that
sd(β̃1 |X) = sd(β̂1 |X),

where X = (X1 , . . . , X200 ).
3.2
Under the standard assumptions in linear regression (random sample, correctly
specified model, homoskedasticity, normality of the error term) the OLS estimators
converge to 0 in probability.
3.3
When you model the effect of citizenship on life expectancy, and you have people with
12 different citizenships in your sample, some of them having multiple citizenships,
then it is sufficient to use 11 dummy variables to encode the information on citizenship.
3.4
If you compare two models with the same number of estimable coefficients, BIC
assigns a smaller number to the model with the larger coefficient of determination,
R².
3.5
Consider the following linear regression model
Y = β0 + β1 X + β2 D + β3 DX + u, E(u|D, X) = 0,
where X is a continuous explanatory variable and D is an explanatory dummy
variable.
If X increases by 1 unit, we expect Y to increase by β1 + β3 units.
3.6 Bonus Question
Suppose you would like to examine the effect of the time spent to prepare for the
midterm Econometrics I exam on the number of points achieved in this exam. To
this end, you ask all your classmates to tell you how much time they have spent and
how many points they have achieved. The maximal number of points which could be
achieved is 30. In your survey, the time spent on the revision varies between 2 hours
and 40 hours, where 10% of your classmates have spent more than 30 hours on the
revision.
With your data, you fit a simple linear regression model of the form
Y = β0 + β1 X + u, E(u|X) = 0,
where Y is the number of points achieved and X is the time spent on revision,
measured in hours. The OLS estimates are β̂0 = 3 and β̂1 = 1.
a) The linear model is not correctly specified, that is, E(u|X) ≠ 0.
b) Now you fit a quadratic model
Y = β0 + β1 X + β2 X² + u,    E(u|X) = 0.
It is plausible that the OLS estimates β̂1 and β̂2 will both be negative.
Hint: It might be helpful to draw the regression line (a) and regression parabola (b)
to obtain a good visual understanding.