Chapter 3 Answers

ECON 482 / WH Hong Answer Key

Answer Key: Problem Set 3

1. In a study relating college grade point average to time spent in various activities, you
distribute a survey to several students. The students are asked how many hours they
spend each week in four activities: studying, sleeping, working, and leisure. Any activity
is put into one of the four categories, so that for each student, the sum of hours in the four
activities must be 168.
i. In the model
$$\text{GPA} = \beta_0 + \beta_1\,\text{study} + \beta_2\,\text{sleep} + \beta_3\,\text{work} + \beta_4\,\text{leisure} + u,$$
does it make sense to hold sleep, work, and leisure fixed while changing study?
(Ans)
No. By definition, study + sleep + work + leisure = 168. Therefore, if we change
study, we must change at least one of the other categories so that the sum is still 168.

ii. Explain why this model violates Assumption MLR.3.
(Ans)
From part (i), we can write, say, study as a perfect linear function of the other
independent variables: study = 168 − sleep − work − leisure. This holds for every
observation, so MLR.3 (no perfect collinearity) is violated.

iii. How could you reformulate the model so that its parameters have a useful
interpretation and it satisfies Assumption MLR.3?
(Ans)
Simply drop one of the independent variables, say leisure:
$$\text{GPA} = \beta_0 + \beta_1\,\text{study} + \beta_2\,\text{sleep} + \beta_3\,\text{work} + u.$$
Now, for example, $\beta_1$ is interpreted as the change in GPA when study increases by
one hour, where sleep, work, and u are all held fixed. If we are holding sleep and
work fixed but increasing study by one hour, then we must be reducing leisure by one
hour. The other slope parameters have a similar interpretation.
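
The collinearity argument in parts (ii) and (iii) can be checked numerically. The sketch below is only an illustration: the hours are made-up random integers (not survey data), and `rank` is a small helper written for this example. It counts the linearly independent columns of the design matrix with and without leisure.

```python
import random
from fractions import Fraction

def rank(rows):
    """Rank of a matrix via exact Gaussian elimination on Fractions."""
    m = [[Fraction(v) for v in row] for row in rows]
    r = 0
    for c in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue  # no pivot: this column depends on earlier ones
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][c] / m[r][c]
            m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

random.seed(0)
X = []  # columns: intercept, study, sleep, work, leisure (hypothetical data)
for _ in range(10):
    study = random.randint(10, 40)
    sleep = random.randint(40, 60)
    work = random.randint(0, 30)
    leisure = 168 - study - sleep - work  # the four activities exhaust the week
    X.append([1, study, sleep, work, leisure])

rank_full = rank(X)                          # five columns, leisure included
rank_dropped = rank([row[:4] for row in X])  # four columns, leisure dropped
print("rank with leisure:   ", rank_full, "of 5 columns")
print("rank without leisure:", rank_dropped, "of 4 columns")
```

With leisure included, the five columns cannot all be linearly independent, so the OLS normal equations have no unique solution; after dropping leisure, the remaining four columns have full rank in this example.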


2. Suppose that average worker productivity at manufacturing firms (avgprod) depends on
two factors, average hours of training (avgtrain) and average worker ability (avgabil):
$$\text{avgprod} = \beta_0 + \beta_1\,\text{avgtrain} + \beta_2\,\text{avgabil} + u.$$
Assume that this equation satisfies the Gauss-Markov assumptions. If grants have been
given to firms whose workers have less than average ability, so that avgtrain and
avgabil are negatively correlated, what is the likely bias in $\tilde\beta_1$ obtained from the
simple regression of avgprod on avgtrain?


(Ans)
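We know that $\tilde\beta_1 = \hat\beta_1 + \hat\beta_2\tilde\delta_1$, where $\hat\beta_1$ and $\hat\beta_2$ are the OLS slopes from the
multiple regression and $\tilde\delta_1$ is the slope from the simple regression of $x_2$ (avgabil) on
$x_1$ (avgtrain). Conditional on the sample values of the regressors, this gives
$E(\tilde\beta_1) = \beta_1 + \beta_2\tilde\delta_1$. By definition, $\beta_2 > 0$, and by assumption,
$\mathrm{Corr}(x_1, x_2) < 0$, so $\tilde\delta_1 < 0$. Therefore, there is a negative bias in $\tilde\beta_1$:
$E(\tilde\beta_1) < \beta_1$. This means that, on average across different random samples,
the simple regression estimator underestimates the effect of the training program. It is
even possible that $E(\tilde\beta_1)$ is negative even though $\beta_1 > 0$.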
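
A small simulation can illustrate the direction of this bias. Everything in the sketch below is assumed for illustration: the parameter values, the normal errors, and the way avgtrain is made negatively correlated with avgabil. It is not based on any real firm data.

```python
import random

def slope(x, y):
    """OLS slope from a simple regression of y on x (with an intercept)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    return sxy / sxx

random.seed(0)
n = 5000
beta1, beta2 = 0.3, 1.0  # assumed true effects; beta2 > 0 as in the answer

avgabil = [random.gauss(0, 1) for _ in range(n)]
# Grants go to low-ability firms: training falls as ability rises.
avgtrain = [-0.5 * a + random.gauss(0, 1) for a in avgabil]
avgprod = [1.0 + beta1 * t + beta2 * a + random.gauss(0, 1)
           for t, a in zip(avgtrain, avgabil)]

beta1_tilde = slope(avgtrain, avgprod)   # simple regression, ability omitted
delta1_tilde = slope(avgtrain, avgabil)  # regression of avgabil on avgtrain

print(f"true beta1:                 {beta1}")
print(f"simple-regression slope:    {beta1_tilde:.3f}")
print(f"beta1 + beta2*delta1_tilde: {beta1 + beta2 * delta1_tilde:.3f}")
```

With these assumed values the simple-regression slope falls below the true $\beta_1$ and is in fact negative, matching the last remark in the answer.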

3.
i. Consider the simple regression model $y = \beta_0 + \beta_1 x + u$ under the first four
Gauss-Markov assumptions. For some function $g(x)$, for example $g(x) = x^2$ or
$g(x) = \log(1 + x^2)$, define $z_i = g(x_i)$. Define a slope estimator as
$$\tilde\beta_1 = \left(\sum_{i=1}^{n} (z_i - \bar z)\, y_i\right) \Big/ \left(\sum_{i=1}^{n} (z_i - \bar z)\, x_i\right).$$
Show that $\tilde\beta_1$ is linear (in $y_i$) and unbiased. Remember, because $E(u \mid x) = 0$, you
can treat both $x_i$ and $z_i$ as nonrandom in your derivation.


(Ans)
For notational simplicity, define $s_{zx} = \sum_{i=1}^{n} (z_i - \bar z)\, x_i$; this is not quite the sample
covariance between z and x because we do not divide by n − 1, but we are only using
it to simplify notation. Then we can write $\tilde\beta_1$ as


$$\tilde\beta_1 = \frac{\sum_{i=1}^{n} (z_i - \bar z)\, y_i}{s_{zx}}.$$
This is clearly a linear function of the $y_i$: take the weights to be $w_i = (z_i - \bar z)/s_{zx}$.
To show unbiasedness, as usual we plug $y_i = \beta_0 + \beta_1 x_i + u_i$ into this equation, and
simplify:
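$$\begin{aligned}
\tilde\beta_1 &= \frac{\sum_{i=1}^{n} (z_i - \bar z)(\beta_0 + \beta_1 x_i + u_i)}{s_{zx}} \\
&= \frac{\beta_0 \sum_{i=1}^{n} (z_i - \bar z) + \beta_1 s_{zx} + \sum_{i=1}^{n} (z_i - \bar z)\, u_i}{s_{zx}} \\
&= \beta_1 + \frac{\sum_{i=1}^{n} (z_i - \bar z)\, u_i}{s_{zx}},
\end{aligned}$$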
where we use the fact that $\sum_{i=1}^{n} (z_i - \bar z) = 0$ always. Now $s_{zx}$ is a function of the $z_i$
and $x_i$, and the expected value of each $u_i$ is zero conditional on all $z_i$ and $x_i$ in the
sample. Therefore, conditional on these values,
$$E(\tilde\beta_1) = \beta_1 + \frac{\sum_{i=1}^{n} (z_i - \bar z)\, E(u_i)}{s_{zx}} = \beta_1,$$
because $E(u_i) = 0$ for all i.
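
As a sanity check on the unbiasedness result, the estimator can be simulated with the regressor values held fixed. The sketch below is illustrative only: it assumes $g(x) = x^2$, $x_i = 1, \dots, 20$, standard normal errors, and made-up parameter values $\beta_0 = 1$, $\beta_1 = 2$.

```python
import random

def beta_tilde(x, y, g):
    """Slope estimator sum((z_i - zbar) * y_i) / sum((z_i - zbar) * x_i), z_i = g(x_i)."""
    z = [g(v) for v in x]
    zbar = sum(z) / len(z)
    num = sum((zi - zbar) * yi for zi, yi in zip(z, y))
    s_zx = sum((zi - zbar) * xi for zi, xi in zip(z, x))
    return num / s_zx

random.seed(0)
beta0, beta1 = 1.0, 2.0               # assumed true parameters
x = [float(i) for i in range(1, 21)]  # regressor values, fixed across samples
reps = 2000

draws = []
for _ in range(reps):
    # Fresh errors each replication; x (and hence z) stays fixed.
    y = [beta0 + beta1 * xi + random.gauss(0, 1) for xi in x]
    draws.append(beta_tilde(x, y, lambda v: v ** 2))

mean_estimate = sum(draws) / reps
print(f"average of beta1_tilde over {reps} samples: {mean_estimate:.4f}")
```

The average across replications should sit very close to the assumed $\beta_1 = 2$, which is what unbiasedness conditional on the $x_i$ predicts.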

ii. Add the homoskedasticity assumption, MLR.5. Show that
$$\mathrm{Var}(\tilde\beta_1) = \sigma^2 \left(\sum_{i=1}^{n} (z_i - \bar z)^2\right) \Big/ \left(\sum_{i=1}^{n} (z_i - \bar z)\, x_i\right)^2.$$
(Ans)
From the fourth equation in part (i) we have (again conditional on the zi and xi in the
sample),


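$$\mathrm{Var}(\tilde\beta_1) = \frac{\mathrm{Var}\left[\sum_{i=1}^{n} (z_i - \bar z)\, u_i\right]}{s_{zx}^2} = \frac{\sum_{i=1}^{n} (z_i - \bar z)^2\, \mathrm{Var}(u_i)}{s_{zx}^2} = \sigma^2\, \frac{\sum_{i=1}^{n} (z_i - \bar z)^2}{s_{zx}^2},$$
where the second equality uses the fact that the $u_i$ are uncorrelated across i under
random sampling, and the last follows from the homoskedasticity assumption
[$\mathrm{Var}(u_i) = \sigma^2$ for all i]. Given the definition of $s_{zx}$, this is what we wanted to show.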

iii. Show directly that, under the Gauss-Markov assumptions, $\mathrm{Var}(\hat\beta_1) \le \mathrm{Var}(\tilde\beta_1)$, where
$\hat\beta_1$ is the OLS estimator. [Hint: The Cauchy-Schwarz inequality implies that
$$\left(n^{-1} \sum_{i=1}^{n} (z_i - \bar z)(x_i - \bar x)\right)^2 \le \left(n^{-1} \sum_{i=1}^{n} (z_i - \bar z)^2\right)\left(n^{-1} \sum_{i=1}^{n} (x_i - \bar x)^2\right);$$
notice that we can drop $\bar x$ from the sample covariance.]
(Ans)
We know that $\mathrm{Var}(\hat\beta_1) = \sigma^2 / \left[\sum_{i=1}^{n} (x_i - \bar x)^2\right]$. Now we can rearrange the inequality in
the hint, drop $\bar x$ from the sample covariance, and cancel $n^{-1}$ everywhere, to get
$$\frac{\sum_{i=1}^{n} (z_i - \bar z)^2}{s_{zx}^2} \ge \frac{1}{\sum_{i=1}^{n} (x_i - \bar x)^2}.$$
When we multiply through by $\sigma^2$ we get
$\mathrm{Var}(\tilde\beta_1) \ge \mathrm{Var}(\hat\beta_1)$, which is what we wanted to show.
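
The two variance formulas can be compared directly for a given set of regressor values. The sketch below assumes $x_i = 1, \dots, 20$ and $\sigma^2 = 1$ purely for illustration, and evaluates both expressions for a few choices of $g$; the function names are made up for this example.

```python
import math

def var_tilde(x, g, sigma2=1.0):
    """sigma^2 * sum((z_i - zbar)^2) / s_zx^2 with z_i = g(x_i)."""
    z = [g(v) for v in x]
    zbar = sum(z) / len(z)
    szz = sum((zi - zbar) ** 2 for zi in z)
    s_zx = sum((zi - zbar) * xi for zi, xi in zip(z, x))
    return sigma2 * szz / s_zx ** 2

def var_ols(x, sigma2=1.0):
    """sigma^2 / sum((x_i - xbar)^2), the variance of the OLS slope."""
    xbar = sum(x) / len(x)
    return sigma2 / sum((xi - xbar) ** 2 for xi in x)

x = [float(i) for i in range(1, 21)]  # assumed regressor values
choices = {
    "g(x) = x^2": lambda v: v ** 2,
    "g(x) = log(1 + x^2)": lambda v: math.log(1 + v ** 2),
    "g(x) = x": lambda v: v,  # reproduces the OLS estimator
}

for name, g in choices.items():
    print(f"{name:22s} Var(tilde) = {var_tilde(x, g):.6f}   Var(OLS) = {var_ols(x):.6f}")
```

For the nonlinear choices of $g$ the first variance is at least as large as the second, and with $g(x) = x$ the two coincide, since the estimator is then OLS itself.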

Computer Exercises
4. Confirm the partialling out interpretation of the OLS estimates by explicitly doing the
partialling out for Example 3.2 in the textbook, using the data set in WAGE1.dta. This
first requires regressing educ on exper and tenure and saving the residuals, $\hat r_1$.
Then, regress log(wage) on $\hat r_1$. Compare the coefficient on $\hat r_1$ with the coefficient on
educ in the regression of log(wage) on educ, exper, and tenure.

(Ans)
The regression of educ on exper and tenure yields


$$\text{educ} = 13.57 - .074\,\text{exper} + .048\,\text{tenure} + \hat r_1,$$
n = 526, R² = .101.
Now, when we regress log(wage) on $\hat r_1$ we obtain
$$\widehat{\log(\text{wage})} = 1.62 + .092\,\hat r_1,$$
n = 526, R² = .207.
As expected, the coefficient on r̂1 in the second regression is identical to the coefficient
on educ in equation (3.19). Notice that the R-squared from the above regression is
below that in (3.19). In effect, the regression of log(wage) on r̂1 explains log(wage)
using only the part of educ that is uncorrelated with exper and tenure; separate effects of
exper and tenure are not included.
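
The same two-step procedure can be sketched in code. The example below does not read WAGE1.dta; it generates synthetic data with made-up coefficients, so the numbers will not match those above. The helper `ols` is a hypothetical minimal solver written for this illustration, and only the equality of the two coefficients is the point.

```python
import random

def ols(X, y):
    """OLS coefficients from the normal equations (X'X) b = X'y, by Gauss-Jordan."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))  # partial pivoting
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

random.seed(1)
n = 200  # synthetic sample, not the 526 observations in WAGE1
exper = [random.uniform(0, 30) for _ in range(n)]
tenure = [0.3 * e + random.uniform(0, 5) for e in exper]
educ = [14 - 0.07 * e + 0.05 * t + random.gauss(0, 2) for e, t in zip(exper, tenure)]
lwage = [0.3 + 0.09 * ed + 0.004 * e + 0.02 * t + random.gauss(0, 0.4)
         for ed, e, t in zip(educ, exper, tenure)]

# Step 1: regress educ on exper and tenure and save the residuals r1.
d = ols([[1.0, e, t] for e, t in zip(exper, tenure)], educ)
r1 = [ed - (d[0] + d[1] * e + d[2] * t) for ed, e, t in zip(educ, exper, tenure)]

# Step 2: simple regression of log(wage) on r1.
b_partial = ols([[1.0, r] for r in r1], lwage)[1]

# Benchmark: multiple regression of log(wage) on educ, exper, and tenure.
b_full = ols([[1.0, ed, e, t] for ed, e, t in zip(educ, exper, tenure)], lwage)[1]

print(f"coefficient on r1:   {b_partial:.6f}")
print(f"coefficient on educ: {b_full:.6f}")
```

The two printed coefficients agree up to floating-point error, which is the partialling out result in miniature.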

5. Use the data set in WAGE2.dta for this problem. As usual, be sure all of the following
regressions contain an intercept.
i. Run a simple regression of IQ on educ to obtain the slope coefficient, say, $\tilde\delta_1$.
(Ans)
The slope coefficient from the regression of IQ on educ is (rounded to five decimal
places) $\tilde\delta_1 = 3.53383$.

ii. Run the simple regression of log(wage) on educ, and obtain the slope coefficient,
$\tilde\beta_1$.
(Ans)
The slope coefficient from log(wage) on educ is $\tilde\beta_1 = .05984$.

iii. Run the multiple regression of log(wage) on educ and IQ, and obtain the slope
coefficients, $\hat\beta_1$ and $\hat\beta_2$, respectively.


(Ans)
The slope coefficients from log(wage) on educ and IQ are


$\hat\beta_1 = .03912$ and $\hat\beta_2 = .00586$, respectively.

iv. Verify that $\tilde\beta_1 = \hat\beta_1 + \hat\beta_2\tilde\delta_1$.
(Ans)
We have $\hat\beta_1 + \tilde\delta_1\hat\beta_2 = .03912 + 3.53383(.00586) \approx .05983$, which is very close
to .05984; the small difference is due to rounding error.
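
The identity holds exactly in any sample, not only approximately. The sketch below checks it on synthetic data (the coefficients used to generate educ, IQ, and log(wage) are invented and WAGE2.dta is not read), then repeats the arithmetic with the rounded estimates reported above. The `ols` helper is a hypothetical minimal solver written for this illustration.

```python
import random

def ols(X, y):
    """OLS coefficients from the normal equations (X'X) b = X'y, by Gauss-Jordan."""
    k = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))  # partial pivoting
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

random.seed(2)
n = 300  # synthetic sample; all generating values below are assumptions
educ = [random.gauss(13.5, 2.2) for _ in range(n)]
IQ = [55 + 3.5 * e + random.gauss(0, 12) for e in educ]
lwage = [5.6 + 0.04 * e + 0.006 * q + random.gauss(0, 0.4) for e, q in zip(educ, IQ)]

X_simple = [[1.0, e] for e in educ]
delta1_tilde = ols(X_simple, IQ)[1]    # part (i): IQ on educ
beta1_tilde = ols(X_simple, lwage)[1]  # part (ii): log(wage) on educ
_, beta1_hat, beta2_hat = ols([[1.0, e, q] for e, q in zip(educ, IQ)], lwage)  # part (iii)

print(f"beta1_tilde:                       {beta1_tilde:.8f}")
print(f"beta1_hat + beta2_hat*delta1_tilde: {beta1_hat + beta2_hat * delta1_tilde:.8f}")

# Arithmetic with the rounded estimates reported in parts (i) to (iii).
reported = 0.03912 + 3.53383 * 0.00586
print(f"from reported estimates: {reported:.5f}")
```

On the synthetic data the two sides match up to floating-point error; the reported figures differ in the fifth decimal only because each input was rounded first.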
