Mock Exam 2


Final on 20 December 2021

WS 2021/2022, 16:00 – 18:15

This exam should take you at most 120 minutes. Please allocate enough time to upload
your solutions on Learn before 18:15! Please upload one single pdf file only.
Good luck!

1 Case Study 1 – Multiple linear regression with dummy variables

In this case study, we consider the savings of US workers in relation to various
socioeconomic factors. One of these factors is eligibility to participate in a special
pension plan, the so-called 401(k) pension plan.
The data set at hand contains savings of 9275 individuals with 6 explanatory variables.
The dependent variable is:
• savings: The total savings measured in 1000 USD. They will be denoted by $Y$.
The explanatory variables are:
• inc: The annual income measured in 1000 USD. It will be denoted by $X_{\text{inc}}$.
• e401k: Eligibility for the 401(k) pension plan coded as a dummy variable.
$D_{\text{e401k}} = 1$ if eligible, $D_{\text{e401k}} = 0$ if not eligible.
• marr: Marital status coded as a dummy variable. $D_{\text{marr}} = 1$ if married,
$D_{\text{marr}} = 0$ if not married.
• male: Gender coded as a dummy variable. $D_{\text{male}} = 1$ if male,
$D_{\text{male}} = 0$ if not male.
• age: Age in years. It will be denoted by $X_{\text{age}}$.
• hsize: A categorical variable representing the size of a household with three
levels: hsize = 1, hsize = 2, and hsize ≥ 3. Choosing hsize = 1 as the
baseline, the dummy variables $D_{\text{hsizeTwo}}$ and $D_{\text{hsizeMThree}}$ code the levels
hsize = 2 and hsize ≥ 3, respectively.

Econometrics I 1

The following model is estimated:

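$$
Y = \beta_0 + \beta_1 X_{\text{inc}} + \beta_2 D_{\text{e401k}} + \beta_3 D_{\text{marr}} + \beta_4 D_{\text{male}} + \beta_5 X_{\text{age}}
+ \beta_6 D_{\text{hsizeTwo}} + \beta_7 D_{\text{hsizeMThree}} + \beta_8 D_{\text{marr}} X_{\text{inc}} + u, \tag{1}
$$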

where the conditional expectation of u given all the explanatory variables is 0.


You may find a fitted model in Listing 1.
Figure 1: Residuals of Model 1.1 plotted against income (vertical axis: residuals, horizontal axis: income)

1.1

In figure 1 you can find a scatterplot of the residuals from the model against the
income. Do you see a violation of any of the usual assumptions in linear regression
which you can check with figure 1? Give an explanation.


Listing 1: Regression output of Model 1.1

```
##
## Call:
## lm(formula = savings ~ inc + e401k + marr + male + age + hsize +
##     inc:marr, data = k401ksubs)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -506.15  -18.80   -4.30    9.64 1457.88
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) -52.25714    3.34491 -15.623  < 2e-16 ***
## inc           0.81655    0.06423  12.714  < 2e-16 ***
## e401k         5.97070    1.28262   4.655 3.28e-06 ***
## marr        -13.83317    2.73453  -5.059 4.30e-07 ***
## male          0.42505    1.69556   0.251  0.80206
## age           1.00715    0.06006  16.769  < 2e-16 ***
## hsizeTwo      1.99273    2.20166   0.905  0.36543
## hsizeMThree  -4.04746    2.17812  -1.858  0.06317 .
## inc:marr      0.20437    0.07008   2.916  0.00355 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 58.02 on 9266 degrees of freedom
## Multiple R-squared:  0.1779, Adjusted R-squared:  0.1772
## F-statistic: 250.6 on 8 and 9266 DF,  p-value: < 2.2e-16
```


1.2

Interpret the effect of the eligibility for 401(k) on savings.

1.3

Interpret the effect of the household size on savings.

1.4

Test if the eligibility for 401(k) has a significant effect on the savings on average,
ceteris paribus. (Provide details: Null hypothesis, significance level, test statistic,
p-value, interpretation.)

1.5

What are the expected savings of a 35-year-old married male living in a household of
size 4, who has an annual income of 33 500 USD and is eligible for 401(k)?
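
One possible way to organise this calculation is sketched below in Python. The coefficient values are copied from Listing 1; the function name `predicted_savings` and its argument names are illustrative assumptions, not part of the exam material. Note that income enters in units of 1000 USD and that a household of size 4 falls into the level hsize ≥ 3.

```python
# Coefficient estimates copied from Listing 1 (savings measured in 1000 USD).
coef = {
    "intercept": -52.25714,
    "inc": 0.81655,
    "e401k": 5.97070,
    "marr": -13.83317,
    "male": 0.42505,
    "age": 1.00715,
    "hsizeTwo": 1.99273,
    "hsizeMThree": -4.04746,
    "inc:marr": 0.20437,
}


def predicted_savings(inc, e401k, marr, male, age, hsize):
    """Fitted value of equation (1); inc in 1000 USD, hsize is the household size."""
    return (
        coef["intercept"]
        + coef["inc"] * inc
        + coef["e401k"] * e401k
        + coef["marr"] * marr
        + coef["male"] * male
        + coef["age"] * age
        + coef["hsizeTwo"] * (hsize == 2)      # dummy for hsize = 2
        + coef["hsizeMThree"] * (hsize >= 3)   # dummy for hsize >= 3
        + coef["inc:marr"] * inc * marr        # interaction term
    )


# An income of 33 500 USD enters as 33.5 because inc is measured in 1000 USD.
y_hat = predicted_savings(inc=33.5, e401k=1, marr=1, male=1, age=35, hsize=4)
print(round(y_hat, 3))
```

Under these inputs the fitted value comes out at roughly 5.7, that is, about 5 700 USD of expected savings.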

1.6

What is the expected change in savings when the annual income increases by 500 USD
a) for someone who is married, and b) for someone who is not married, respectively,
ceteris paribus?
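
Because income appears both on its own and in the interaction with marital status, the change in expected savings depends on whether the person is married. A minimal Python sketch of this arithmetic follows, using the estimates from Listing 1; the helper name `change_in_savings` is an assumption made for illustration.

```python
# Estimates copied from Listing 1.
b_inc = 0.81655        # coefficient on inc
b_inc_marr = 0.20437   # coefficient on the interaction inc:marr
delta_inc = 0.5        # 500 USD expressed in units of 1000 USD


def change_in_savings(marr):
    """Change in expected savings (1000 USD) for an income change of delta_inc."""
    return (b_inc + b_inc_marr * marr) * delta_inc


change_married = change_in_savings(1)
change_unmarried = change_in_savings(0)
print(round(change_married, 5), round(change_unmarried, 5))
```

The results are in units of 1000 USD, so they need to be multiplied by 1000 to be read in USD.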

1.7

Listing 1 includes an F-statistic equal to F = 250.6. What is being tested here?
(Provide details: Null hypothesis, and interpretation.)


1.8

Let $D_{\text{notmale}}$ be a dummy variable equal to 1 if a person is not male and zero otherwise.
If $D_{\text{notmale}}$ is used in place of $D_{\text{male}}$ in our model, what will be the OLS estimate of
the coefficient for $D_{\text{notmale}}$?
(Hint: It holds that $D_{\text{notmale}} = 1 - D_{\text{male}}$. Plug this into the model equation (1).)
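
The substitution suggested in the hint can be written out as follows. This is only a sketch of the algebra, using the symbols of equation (1).

```latex
\begin{align*}
\beta_4 D_{\text{male}} &= \beta_4 (1 - D_{\text{notmale}}) = \beta_4 - \beta_4 D_{\text{notmale}}, \\
Y &= (\beta_0 + \beta_4) + \beta_1 X_{\text{inc}} + \beta_2 D_{\text{e401k}} + \beta_3 D_{\text{marr}}
     + (-\beta_4) D_{\text{notmale}} + \beta_5 X_{\text{age}} \\
  &\quad + \beta_6 D_{\text{hsizeTwo}} + \beta_7 D_{\text{hsizeMThree}} + \beta_8 D_{\text{marr}} X_{\text{inc}} + u.
\end{align*}
```

Both specifications use the same information, so the fitted values should coincide. With the estimates from Listing 1, the coefficient on the new dummy would then be −0.42505 and the intercept would shift to −52.25714 + 0.42505 = −51.83209.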

1.9

What will happen to the R-squared (coefficient of determination) if $D_{\text{notmale}}$ is used
in place of $D_{\text{male}}$?

1.10

Should $D_{\text{male}}$ and $D_{\text{notmale}}$ both be included as independent variables in the model?
Explain.

1.11

What test would you perform to test whether the size of a household has a significant
effect on expected savings, ceteris paribus? Provide null hypothesis.

1.12

What distribution does the test statistic used in the previous question 1.11 follow
under the null hypothesis? Provide the name and the correct parameters (with
numbers).


2 Case Study 2 – Modelling quadratic effects


In the following case study, we examine the relationship between the prestige of
various occupations and their socioeconomic characteristics. The data
set consists of 98 occupations with the following dependent and explanatory variables:
• prestige: the Pineo–Porter prestige score of the occupation (ranging from 10 to
90 points; the higher, the more prestigious); hereafter referred to as the
variable $Y$.
• education: years of education (1 unit = 1 year) for the typical employee in the
given occupation; hereafter referred to as the variable $X_{\text{education}}$.
• income: annual income measured in Canadian Dollars (1 unit = 1 CAD) for the
typical employee in the given occupation; hereafter referred to as the variable
$X_{\text{income}}$.
• women: the percentage of women (1 unit = 1 percentage point) in the occupation;
hereafter referred to as the variable $X_{\text{women}}$.
• type: the type of occupation, divided into three categories: Blue Collar (bc),
Professional (prof) and White Collar (wc). Specifically, choosing type = bc
as the baseline, type = prof is represented by a dummy variable $D_{\text{prof}}$ and
type = wc is represented by a dummy variable $D_{\text{wc}}$.
The data set is obtained from the 1971 Census of Canada. Its summary statistics are
given in table 1 below.

Table 1: Summary statistics of the data set

prestige        education        income          women            type
Min.   :17.30   Min.   : 6.380   Min.   : 1656   Min.   : 0.000   bc  :44
1st Qu.:35.38   1st Qu.: 8.445   1st Qu.: 4250   1st Qu.: 3.268   prof:31
Median :43.60   Median :10.605   Median : 6036   Median :14.475   wc  :23
Mean   :47.33   Mean   :10.795   Mean   : 6939   Mean   :28.986   NA
3rd Qu.:59.90   3rd Qu.:12.755   3rd Qu.: 8226   3rd Qu.:52.203   NA
Max.   :87.20   Max.   :15.970   Max.   :25879   Max.   :97.510   NA


2.1

We start with the following model

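$$
Y = \beta_0 + \beta_1 \log(X_{\text{income}}) + \beta_2 X_{\text{education}} + \beta_3 D_{\text{prof}} + \beta_4 D_{\text{wc}} + u, \quad E(u \mid X) = 0,
$$

where $X = (X_{\text{income}}, X_{\text{education}}, D_{\text{prof}}, D_{\text{wc}})$, and call it Model 2.1.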


Discuss the adequacy of the model specification using figure 2 by explicitly referring
to the relevant assumption. Furthermore, discuss what the fulfilment or violation of
this assumption entails for the OLS estimators.

Figure 2: Residuals of Model 2.1 plotted against fitted values (panel title: Residuals vs Fitted; model: lm(prestige ~ log(income) + education + type); labelled observations: medical.technicians, electronic.workers, collectors)


2.2

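Consider a new model, where $X_{\text{income}}$ is a level variable (we take the raw variable and
not its logarithm). In addition, we include the variable $X_{\text{women}}$ and the interaction
term $X_{\text{income}} X_{\text{education}}$. The new model is now written as:

$$
Y = \beta_0 + \beta_1 X_{\text{income}} + \beta_2 X_{\text{education}} + \beta_3 D_{\text{prof}}
+ \beta_4 D_{\text{wc}} + \beta_5 X_{\text{women}} + \beta_6 X_{\text{income}} X_{\text{education}} + u, \quad E(u \mid X) = 0,
$$

where $X = (X_{\text{income}}, X_{\text{education}}, D_{\text{prof}}, D_{\text{wc}}, X_{\text{women}})$, and we call it Model 2.2.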
Perform residual diagnostics on Model 2.2 using figure 3 and listing 2. For each
figure/listing, explicitly refer to the assumption(s) assessed.

Figure 3: Residuals of Model 2.2 plotted against fitted values (left; panel title: Residuals vs Fitted), QQ-plot of residuals of Model 2.2 (right; panel title: Normal Q-Q, standardized residuals against theoretical quantiles). Labelled observations in both panels: medical.technicians, electronic.workers, general.managers.


Listing 2: Jarque–Bera test on residuals of Model 2.2

```
##
##  Jarque Bera Test
##
## data:  reg2_2$residuals
## X-squared = 1.2966, df = 2, p-value = 0.5229
```

2.3

In Model 2.2, test whether the percentage of women has any significant effect on the
expected prestige score, ceteris paribus. When answering this question, explicitly
provide the test name, null hypothesis, significance level, test statistic, p-value and
test decision.
Provide an interpretation of the test decision in the current context.
The summary output of Model 2.2 is provided in listing 3.

2.4

Compute the estimated instantaneous change in the expected prestige score in response to a
small change in years of education for sewing machine operators. Compute the same
quantity for university teachers. Relevant covariate information for these two
groups is given in table 2.

Table 2: Covariates of sewing machine operators and university teachers

                        education   income   women   type
university.teachers         15.97    12480   19.59   prof
sewing.mach.operators        6.38     2847   90.67   bc
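
In Model 2.2, education enters both linearly and through the interaction with income, so the derivative of the expected prestige score with respect to education is $\beta_2 + \beta_6 X_{\text{income}}$. The Python sketch below evaluates this expression with the estimates from Listing 3 and the incomes from table 2; the function name `education_effect` is an illustrative assumption.

```python
# Estimates copied from Listing 3 (Model 2.2).
b_education = 5.251e+00           # coefficient on education
b_income_education = -2.415e-04   # coefficient on income:education


def education_effect(income):
    """Estimated derivative of expected prestige with respect to education."""
    return b_education + b_income_education * income


effect_sewing = education_effect(2847)     # sewing machine operators
effect_teachers = education_effect(12480)  # university teachers
print(round(effect_sewing, 4), round(effect_teachers, 4))
```

Since the interaction coefficient is negative, the estimated effect of an additional year of education is smaller for the higher-income occupation.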


Listing 3: Summary of Model 2.2

```
##
## Call:
## lm(formula = prestige ~ income + education + type + income:education +
##     women, data = Prestige_new)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -15.2186  -5.0131   0.6606   4.8713  16.9888
##
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)
## (Intercept)      -2.153e+01  8.099e+00  -2.658  0.00928 **
## income            4.380e-03  1.048e-03   4.181 6.68e-05 ***
## education         5.251e+00  7.822e-01   6.713 1.59e-09 ***
## typeprof          4.576e+00  3.766e+00   1.215  0.22751
## typewc           -4.812e+00  2.599e+00  -1.851  0.06741 .
## women             3.947e-02  3.059e-02   1.290  0.20019
## income:education -2.415e-04  7.363e-05  -3.280  0.00147 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6.781 on 91 degrees of freedom
## Multiple R-squared:  0.8524, Adjusted R-squared:  0.8427
## F-statistic: 87.58 on 6 and 91 DF,  p-value: < 2.2e-16
```


2.5

Next, we return to Model 2.1, where $X_{\text{income}}$ is used as a logged variable. Additionally,
we include $X_{\text{women}}$ as a linear term and as a quadratic term $X_{\text{women}}^2$. The rationale is that
female participation in the workforce may have a nonlinear effect on the
prestige score.
Based on these considerations, consider the following model:

$$
Y = \beta_0 + \beta_1 \log(X_{\text{income}}) + \beta_2 X_{\text{education}} + \beta_3 D_{\text{prof}}
+ \beta_4 D_{\text{wc}} + \beta_5 X_{\text{women}} + \beta_6 X_{\text{women}}^2 + u, \quad E(u \mid X) = 0,
$$

where $X = (X_{\text{income}}, X_{\text{education}}, D_{\text{prof}}, D_{\text{wc}}, X_{\text{women}})$, and we call it Model 2.3.
(a) Assuming that all relevant assumptions for testing hold, what test can you use
to assess whether the percentage of women has any significant effect on the
expected prestige score, ceteris paribus? In addition to the test name, provide
(in formulas) the corresponding null hypothesis.
(b) Compute the corresponding test statistic. You may find the regression output of
Model 2.3 shown in listing 4 and the sums of squared residuals shown
in table 3 useful. In addition, provide the distribution of the test statistic.
(c) Describe in words how you can use the test statistic computed in (b) to test
the hypothesis defined in (a).

Table 3: Sum of squared residuals for Models 2.1-2.3

SSR
Model 2.1 4096.286
Model 2.2 4184.432
Model 2.3 3641.730
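
If the hypothesis in (a) is taken to be the joint restriction $\beta_5 = \beta_6 = 0$, then Model 2.1 is the restricted version of Model 2.3, and an F statistic can be built from the two sums of squared residuals in table 3. The Python sketch below shows this calculation under that assumption; the variable names are illustrative, and the residual degrees of freedom (91) are taken from listing 4.

```python
# Sums of squared residuals copied from table 3.
ssr_restricted = 4096.286    # Model 2.1 (no women terms)
ssr_unrestricted = 3641.730  # Model 2.3 (women and women^2 included)

q = 2          # number of restrictions: beta_5 = 0 and beta_6 = 0
df_resid = 91  # residual degrees of freedom of Model 2.3 (listing 4)

# F = ((SSR_r - SSR_u) / q) / (SSR_u / df_resid)
f_stat = ((ssr_restricted - ssr_unrestricted) / q) / (ssr_unrestricted / df_resid)
print(round(f_stat, 3))
```

Under the null hypothesis and the usual assumptions, this statistic would be compared with an F distribution with 2 and 91 degrees of freedom.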


Listing 4: Summary of Model 2.3

```
##
## Call:
## lm(formula = prestige ~ log(income) + education + type + women +
##     I(women^2), data = Prestige_new)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -11.9107  -4.3428   0.3451   3.9136  16.9437
##
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.097e+02  1.869e+01  -5.867 7.08e-08 ***
## log(income)  1.410e+01  2.286e+00   6.169 1.87e-08 ***
## education    2.966e+00  5.914e-01   5.016 2.60e-06 ***
## typeprof     6.031e+00  3.511e+00   1.718   0.0892 .
## typewc      -2.839e+00  2.371e+00  -1.197   0.2342
## women       -8.208e-02  8.562e-02  -0.959   0.3403
## I(women^2)   1.860e-03  8.921e-04   2.085   0.0399 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6.326 on 91 degrees of freedom
## Multiple R-squared:  0.8715, Adjusted R-squared:  0.8631
## F-statistic: 102.9 on 6 and 91 DF,  p-value: < 2.2e-16
```


2.6

In Model 2.3, we would like to interpret the estimated instantaneous change in the
expected prestige score to a small change in the percentage of women in any given
occupation.
(a) Start by computing the vertex of the function that corresponds to the effect
outlined above.
(b) Does the sign of the effect remain the same over the observed values of $X_{\text{women}}$
or does it change its sign? In case of the former, provide the sign. In the case
of the latter, describe the direction in which it changes (i.e., from positive to
negative or the other way around).
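
The women terms in Model 2.3 form the parabola $\beta_5 X_{\text{women}} + \beta_6 X_{\text{women}}^2$, whose derivative $\beta_5 + 2 \beta_6 X_{\text{women}}$ is zero at the vertex $-\beta_5 / (2 \beta_6)$. The Python sketch below evaluates this with the estimates from listing 4 and checks the sign of the derivative at the observed minimum and maximum of women from table 1; the function name `marginal_effect` is an illustrative assumption.

```python
# Estimates copied from listing 4 (Model 2.3).
b_women = -8.208e-02    # coefficient on women
b_women_sq = 1.860e-03  # coefficient on women^2

# Vertex of b_women * x + b_women_sq * x^2, where the derivative is zero.
vertex = -b_women / (2 * b_women_sq)


def marginal_effect(women):
    """Estimated derivative of expected prestige with respect to women."""
    return b_women + 2 * b_women_sq * women


# Observed range of women according to table 1: 0.000 to 97.510.
effect_at_min = marginal_effect(0.0)
effect_at_max = marginal_effect(97.51)
print(round(vertex, 2), effect_at_min < 0, effect_at_max > 0)
```

The vertex lies at roughly 22 percentage points, which is inside the observed range, so the estimated effect is negative below that value and positive above it.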

2.7

Astronauts are generally classified as professionals. In 1971, they typically have
18 years of education and a typical income of 20 000 CAD, and 25% of them are
women.
(a) Compute the expected prestige score using Model 2.2.
(b) Same as above, but with Model 2.3.
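
Both predictions amount to plugging the astronaut covariates into the fitted equations. The Python sketch below does this with the rounded estimates printed in listings 3 and 4, so the results are only as precise as those printed values; the function names are illustrative assumptions. Note that 18 years of education lies above the observed maximum of 15.97 in table 1, so both predictions are extrapolations.

```python
import math


def predict_model_2_2(income, education, women, prof, wc):
    """Fitted value of Model 2.2 using the estimates printed in listing 3."""
    return (
        -2.153e+01
        + 4.380e-03 * income
        + 5.251e+00 * education
        + 4.576e+00 * prof
        + -4.812e+00 * wc
        + 3.947e-02 * women
        + -2.415e-04 * income * education
    )


def predict_model_2_3(income, education, women, prof, wc):
    """Fitted value of Model 2.3 using the estimates printed in listing 4."""
    return (
        -1.097e+02
        + 1.410e+01 * math.log(income)
        + 2.966e+00 * education
        + 6.031e+00 * prof
        + -2.839e+00 * wc
        + -8.208e-02 * women
        + 1.860e-03 * women ** 2
    )


# Astronauts: professionals, 18 years of education, 20 000 CAD, 25% women.
astronaut = dict(income=20000, education=18, women=25, prof=1, wc=0)
pred_2_2 = predict_model_2_2(**astronaut)
pred_2_3 = predict_model_2_3(**astronaut)
print(round(pred_2_2, 2), round(pred_2_3, 2))
```

With these rounded coefficients, the two predicted scores come out at roughly 79.2 and 88.5, respectively.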

2.8

Which model do you deem most appropriate and why so? You may argue using the
information in table 4.
Table 4: AIC/BIC of Models 2.1-2.3

AIC BIC
Model 2.1 655.9331 671.4429
Model 2.2 662.0195 682.6993
Model 2.3 648.4061 669.0859


2.9 Bonus Question

If we were to conduct this analysis today, what additional explanatory variables would
you consider including in the model? Are there other ones you would try to omit?


3 True or false?
State if you deem the statement true or false. For true statements, provide a brief
explanation as to why they are true. For false statements, provide a correct statement
or an explanation as to why they are false.

3.1

Suppose you would like to fit a simple linear model

$$
Y = \beta_0 + \beta_1 X + u, \quad E(u \mid X) = 0, \quad V(u \mid X) = \sigma^2, \quad u \sim N(0, \sigma^2),
$$

using a random sample $(Y_i, X_i)$, $i = 1, \dots, N$, where the sample size is $N = 200$.
To that end, you first compute the OLS estimator $\hat{\beta}_1$ on the whole sample.
Second, you compute the OLS estimator only on the first half of the sample, using
$i = 1, \dots, 100$, and denote it by $\hat{\beta}_{1,1}$. Third, you compute the OLS estimator on
the second half of the sample, using $i = 101, \dots, 200$, and denote it by $\hat{\beta}_{1,2}$.
Finally, you take the average of $\hat{\beta}_{1,1}$ and $\hat{\beta}_{1,2}$. Let's call this estimator

$$
\tilde{\beta}_1 = \frac{1}{2} \hat{\beta}_{1,1} + \frac{1}{2} \hat{\beta}_{1,2}.
$$

You would like to compare the performance of the two estimators $\hat{\beta}_1$ and $\tilde{\beta}_1$.
a) It holds that

$$
E(\tilde{\beta}_1) = E(\hat{\beta}_1).
$$

b) It holds that

$$
\operatorname{sd}(\tilde{\beta}_1 \mid X) = \operatorname{sd}(\hat{\beta}_1 \mid X),
$$

where $X = (X_1, \dots, X_{200})$.


3.2

Under the standard assumptions in linear regression (random sample, correctly
specified model, homoskedasticity, normality of the error term) the OLS estimators
converge to 0 in probability.

3.3

When you model the effect of citizenship on life expectancy, and you have people with
12 different citizenships in your sample, some of them having multiple citizenships,
then it is sufficient to use 11 dummy variables to encode the information on citizenship.

3.4

If you compare two models with the same number of estimable coefficients, BIC
assigns a smaller number to the model with the larger coefficient of determination,
$R^2$.

3.5

Consider the following linear regression model

$$
Y = \beta_0 + \beta_1 X + \beta_2 D + \beta_3 D X + u, \quad E(u \mid D, X) = 0,
$$

where $X$ is a continuous explanatory variable and $D$ is an explanatory dummy
variable.
If $X$ increases by 1 unit, we expect $Y$ to increase by $\beta_1 + \beta_3$ units.


3.6 Bonus Question

Suppose you would like to examine the effect of the time spent to prepare for the
midterm Econometrics I exam on the number of points achieved in this exam. To
this end, you ask all your classmates to tell you how much time they have spent and
how many points they have achieved. The maximal number of points which could be
achieved is 30. In your survey, the time spent on the revision varies between 2 hours
and 40 hours, where 10% of your classmates have spent more than 30 hours on the
revision.
With your data, you fit a simple linear regression model of the form

$$
Y = \beta_0 + \beta_1 X + u, \quad E(u \mid X) = 0,
$$

where $Y$ is the number of points achieved and $X$ is the time spent on revision,
measured in hours. The OLS estimates are $\hat{\beta}_0 = 3$ and $\hat{\beta}_1 = 1$.
a) The linear model is not correctly specified, that is, $E(u \mid X) \neq 0$.
b) Now you fit a quadratic model

$$
Y = \beta_0 + \beta_1 X + \beta_2 X^2 + u, \quad E(u \mid X) = 0.
$$

It is plausible that the OLS estimates $\hat{\beta}_1$ and $\hat{\beta}_2$ will both be negative.
Hint: It might be helpful to draw the regression line (a) and regression parabola (b)
to obtain a good visual understanding.
