BEC 341 2022 Assign 3 - 231120 - 152534


KWAME NKRUMAH UNIVERSITY

SCHOOL OF BUSINESS STUDIES


DEPARTMENT OF ECONOMICS & FINANCE

Econometrics Assign 3 Due Date: 29th November 2022 by 8:00AM, Lecturer: S. Sikota
1 Theory
1. Given the following multiple linear regression:

y = β0 + β1x1 + β2x2 + ... + βkxk + u

(a) What restrictions do we impose on the βj’s in multiple linear regression analysis? [2 marks]
(b) If x1 and x2 have a positive correlation coefficient, how does that influence our estimate of β1? [2 marks]

(c) Now, given that x1 and x2 are correlated, explain how we would interpret their coefficient estimates β̂1 and β̂2. [2 marks]
(d) Explain the intuition of the "partialling out" effect of multiple linear regression in your own words. [3 marks]

Now we are concerned with the goodness of fit of our model.


(e) Give the formulae and intuitive meanings of SSR, SST, and SSE. [3 marks]
(f) We know that SST = SSE + SSR. Explain in your own words the intuitive meaning of this expression. [2
marks]
(g) Using the definition R² = 1 − SSR/SST, explain why the R² can never decrease as variables are added to the regression. [3 marks]
(h) Write down an alternative expression for the R² (not including SSR/SST/SSE), and explain why it is useful as a measure of goodness of fit. [3 marks]
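
For orientation, the sums of squares in this section can be read off Stata's stored results after any regression. The sketch below is only an illustration on Stata's built-in auto dataset (an assumption, not the project data), and it assumes the naming used in this assignment, where SSE is the explained sum of squares and SSR is the residual sum of squares.

```stata
* Illustration only: built-in auto dataset, not dataset2022.dta
sysuse auto, clear
regress price mpg weight

* Stata's names: e(mss) = model (explained) SS, e(rss) = residual SS
scalar SSE = e(mss)
scalar SSR = e(rss)
scalar SST = SSE + SSR

* The two numbers displayed below should agree
display "R2 from 1 - SSR/SST:  " 1 - SSR/SST
display "R2 stored by regress: " e(r2)
```

The same sums of squares also appear in the top-left panel of the regress output, in the column labelled SS.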

2. We use a technique called ordinary least squares when solving for the estimators in multiple linear
regression.
(a) Why is it so named? [2 marks]
(b) Why do we minimize the sum of squared residuals and not just the sum of the ordinary residuals? [2 marks]
(c) How do we interpret a negative residual? [2 marks]
(d) Given the following multiple linear regression:

ŷ = β̂0 + β̂1x1 + β̂2x2

We would like to derive the OLS estimators β̂0, β̂1, and β̂2. Using the ordinary least squares technique, derive the set of equations which can be used to solve for β̂0, β̂1, and β̂2. Explain each step of your working. [5 marks]

3. Given the following multiple linear regression:

y = β0 + β1x1 + β2x2 + ... + βkxk + u

You decide to try including each of the following variables in the regression: (1) first ln(xk), then (2) sqrt(xk), and finally (3) xk + x1 (see below). Would you expect any of these to cause problems in your regression? Why or why not? What could you do if problems arise? [5 marks]

4. Write down each of the four assumptions underlying multiple linear regression analysis and explain each of them in your own words. [8 marks]

5. Write down the formal definition of bias in a regression of y on x1 where x2 is omitted, and explain each
component of the bias in your own terms. [6 marks]
6. Consider the regression output below, which you are familiar with from the slides. It presents the effects of age and education on wage.

(a) Interpret the R² and explain whether or not it is useful in the case of this regression. [2 marks]
(b) Is the estimate of βeduc reliable or not? Justify your answer. [3 marks]
(c) If we had a dataset of 50 variables containing demographic information on all the individuals in the
sample, including wage and age, should we include all of them in the model? Why or why not?
Justify your answer using the tools you have learnt in class. [5 marks]

TOTAL = 60

2 Applied Questions
Use the project dataset dataset2022.dta to answer the following questions (available under the Project folder). Please attach your do-file to your tut hand-in. This section may only be completed using Stata. Note: your do-file must run from start to finish with no errors.

You are strongly advised to work through last week’s Stata tutorial in full before beginning this tutorial.

1. Open dataset2022.dta, then clean and summarise the following 5 new variables you may be using in your
project:

(a) Highest grade completed
(b) Age
(c) Total household income (in March)
(d) Rural
(e) Female

Hint: Use the lookfor command to find the variables. Clean carefully, check, label, etc., i.e. use the recommended cleaning techniques used in the first Stata tutorial. Messy or uncommented code will be penalised. For the last two variables, set them to one if rural or female and zero otherwise. Do not set missing values to zero! Make sure you also analyse the number of observations for each variable. [10 marks]
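
As a rough illustration of the hint, the sketch below builds a 0/1 indicator while leaving missing values missing. It runs on a small made-up dataset: the variable name gender_raw and its coding (1 = male, 2 = female, -9 = not answered) are assumptions for illustration and will not match the names or codes in dataset2022.dta.

```stata
* Toy data: gender_raw and its codes are hypothetical
clear
input gender_raw
1
2
-9
2
.
end

* Inspect the raw variable first, including missing values
tab gender_raw, missing

* 1 if female, 0 if male; any other code stays missing rather than zero
gen female = (gender_raw == 2) if inlist(gender_raw, 1, 2)
label variable female "Female (1 = female, 0 = male)"
label define female_lbl 0 "Male" 1 "Female"
label values female female_lbl

* Check the recode and count the usable observations
tab gender_raw female, missing
count if !missing(female)
```

The if inlist() qualifier is what keeps the non-response code and the system missing value from being turned into zeros.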

2. Run the following 2 lines of code:

reg hhincome age educ
reg hhinc age

We would like to compare simple and multiple linear regression, focussing on the coefficient on age. The multiple regression we write out as:

hhinchat = β̂0 + β̂1 ageyrs + β̂2 educ

(a) Write out an equation which connects β̃1 (the coefficient on age in the simple regression) and β̂1. Explain each of the components. [3 marks]

(b) Run a regression in order to be able to obtain the values of all the components in your equation in
Question 2a. Explain the meaning of your chosen regression. [2 marks]

(c) Under what conditions would β̃1 and β̂1 equal each other? Does your output, coupled with your equation, make you conclude that the SLR and MLR coefficients on age are equal? [2 marks]

3. We would like to check that after a regression, the properties of OLS hold. We do this using the code below, which uses a new command, predict. Run the following commands, examine the output carefully, and then explain the purpose of each of lines (3) to (8), and whether the results agree or disagree with the properties of OLS. [6 marks]

* (1)
cap drop hhinchat uhat
* (2)
reg hhinc age educ female
* (3)
predict hhinchat
* (4)
predict uhat, resid
* (5)
su uhat
* (6)
br hhinc hhinchat uhat
* (7)
su hhinc hhinchat
* (8)
corr uhat age

4. Given the regression coefficients and constant in Question 3, calculate using the display command your predicted household income level (hhinchat), and your residual (uhat) (insert 1 or 0 for female as
appropriate). Explain the sign of your residual and whether it makes sense to you. Would you be studying
more or less if you were in this sample? You may use any values of the other explanatory variables you wish.
[4 marks]
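
As a sketch of how display can be used for this kind of calculation, the example below computes a fitted value and a residual by hand on Stata's built-in auto dataset. The dataset, the regressors, and the plugged-in values (mpg = 20, weight = 3000, actual price = 5000) are all made up for illustration and are not taken from the project data.

```stata
* Illustration only: built-in auto dataset, not dataset2022.dta
sysuse auto, clear
regress price mpg weight

* Fitted value for a hypothetical car with mpg = 20 and weight = 3000
scalar yhat = _b[_cons] + _b[mpg]*20 + _b[weight]*3000
display "Predicted price: " yhat

* Residual = actual minus predicted, using a made-up actual price of 5000
display "Residual: " 5000 - yhat
```

The _b[varname] syntax refers to the coefficients of the most recent regression, so the coefficients do not need to be retyped from the output.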

5. Run the following code and use the results to answer the following questions:

cap drop uhat hhinchat
reg hhinc age female educ rural
count
count if e(sample) == 1
predict hhinchat
predict uhat, resid
sort hhid
br hhid pid hhinc hhinchat uhat age educ female rural

Was the full sample included in the regression? Why or why not? Does everyone have non-missing values for
uhat or hhinchat? Why or why not? [6 marks]
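
The behaviour of e(sample) and predict can be explored on a small example before turning to the project data. The sketch below uses Stata's built-in auto dataset, in which the variable rep78 has some missing values. The dataset and the new variable names phat and ehat are illustrative assumptions only.

```stata
* Illustration only: built-in auto dataset, not dataset2022.dta
sysuse auto, clear
regress price mpg rep78

* Observations in memory versus observations used in the regression
count
count if e(sample)

* Fitted values and residuals, then check which are missing
predict phat
predict ehat, resid
count if missing(phat)
count if missing(ehat)

* List the observations that were left out of the estimation sample
list make price mpg rep78 phat ehat if !e(sample)
```

Comparing the two count results with the N reported in the regress header shows how many observations were dropped from the estimation.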

TOTAL = 40
