Problem Set 6 Answers

Economics 3125
Fall 2013

Claire S.H. Lim

1. The median starting salary for new law school graduates is determined by

log(salary) = β0 + β1 LSAT + β2 GPA + β3 log(libvol) + β4 log(cost) + β5 rank + u

where LSAT is the median LSAT score for the graduating class, GPA is the median college GPA for
the class, libvol is the number of volumes in the law school library, cost is the annual cost of attending
law school, and rank is a law school ranking (with rank = 1 being the best).

(a) Explain why we expect β5 ≤ 0.


Answer. A larger value of rank is associated with lower perceived law school quality. We would
expect graduates of lower-ranked schools to have lower earnings.

(b) What signs do you expect for the other slope parameters? Justify your answers.
Answer. We expect that each of the other slope coefficients, β1 through β4 , will be positive.
Higher LSAT scores and college GPAs indicate higher student quality. The other two variables,
the number of volumes in the law library and the cost of attendance, are indicators of the quality
of the law school. For example, schools that hire better law faculty will have a higher cost of
attendance.

(c) Using the data in LAWSCH85.DTA, the estimated equation is

\widehat{log(salary)} = 8.34 + 0.0047 LSAT + 0.248 GPA + 0.095 log(libvol) + 0.038 log(cost) − 0.0033 rank

n = 136, R² = 0.842

What is the predicted ceteris paribus difference in salary for schools with a median GPA different by one point? (Report your answer as a percentage.)

Answer. The estimate β̂2 says that a one-point increase in median GPA, holding other explanatory variables constant, is associated with a 0.248 proportional increase in predicted salary, which is a 24.8% increase in predicted salary.
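A standard refinement, not required by the question: because the model is log-level, the 24.8% figure uses the approximation %Δsalary ≈ 100 · β̂2. The exact implied change for a one-point increase in GPA is 100 · [exp(0.248) − 1] ≈ 28.1%.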

(d) Interpret the coefficient on the variable log(libvol).


Answer. This is an elasticity estimate: for a one percent increase in the number of volumes
in the law library, holding other explanatory variables constant, predicted salary increases by
0.095%. [Note that we had to multiply the estimate by 100 to express a percentage change in
part (c), but did not have to adjust the estimate to report a percentage change in part (d).]

(e) Would you say it is better to attend a higher ranked law school? How much is a difference in
ranking of 20 worth in terms of predicted starting salary?
Answer. It does appear that students from better-ranked schools earn higher starting salaries,
even after controlling for some important objective measures of student and school quality. The
ceteris paribus effect of moving up 20 places in the ranking is (−20)(100)(−0.0033) = 6.6,
interpreted as a 6.6% increase in predicted starting salary.
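For reference, a minimal Stata sketch of how these estimates can be reproduced. The variable names salary, LSAT, GPA, libvol, cost, and rank are assumed to match those in LAWSCH85.DTA (check with describe first); the generated log variables are given new names in case logged versions already exist in the dataset.

use LAWSCH85.DTA, clear
gen log_salary = ln(salary)
gen log_libvol = ln(libvol)
gen log_cost   = ln(cost)
reg log_salary LSAT GPA log_libvol log_cost rank
* ceteris paribus effect of moving up 20 places in the ranking, in percent
display -20 * 100 * _b[rank]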

2. Suppose that you are interested in estimating the ceteris paribus relationship between y and x1 . For
this purpose, you can collect data on a control variable, x2 . (For concreteness, you might think of y as
final exam score, x1 as class attendance and x2 as GPA up through the previous semester.) Let β̃1 be
the simple regression estimate from y on x1 and let β̂1 be the multiple regression estimate from y on
x1 and x2 .

(a) If x1 is highly correlated with x2 in the sample, and x2 has a large partial effect on y, would you
expect β̃1 and β̂1 to be similar or very different? Explain.
Answer. Recall that β̃1 = β̂1 + β̂2 δ̃1 , where β̂2 is the estimated coefficient on x2 in the multiple
regression and δ̃1 is the estimated coefficient from a simple regression of x2 on x1. Because x1 and x2
are highly correlated, δ̃1 is substantial in magnitude, and β̂2 is large, so we would expect β̃1 and β̂1 to
be very different.

(b) If x1 is almost uncorrelated with x2 , but x2 has a large partial effect on y, will β̃1 and β̂1 tend to
be similar or very different? Explain.
Answer. In this case we are told that δ̃1 is nearly zero, so even though β̂2 is large, we would
expect β̃1 and β̂1 to be similar.
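An illustrative Stata simulation, not part of the assignment, that verifies the decomposition β̃1 = β̂1 + β̂2 δ̃1 in a sample where x1 and x2 are strongly correlated and x2 has a large partial effect on y:

clear
set obs 1000
set seed 12345
gen x1 = rnormal()
gen x2 = 0.8*x1 + rnormal()     // x1 and x2 highly correlated
gen y = 1 + 2*x1 + 3*x2 + rnormal()
reg y x1 x2                     // multiple regression: beta1-hat, beta2-hat
scalar b1hat = _b[x1]
scalar b2hat = _b[x2]
reg x2 x1                       // auxiliary regression: delta1-tilde
scalar d1 = _b[x1]
reg y x1                        // simple regression: beta1-tilde
display _b[x1]
display b1hat + b2hat*d1        // identical to the line above, by the decomposition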

3. Regression analysis can be used to test whether the market efficiently uses information in valuing
stocks. For concreteness, let return be the total return from holding a firm’s stock over the four-year
period from the end of 1990 to the end of 1994. The efficient markets hypothesis says that these
returns should not be systematically related to information known in 1990. If firm characteristics
known at the beginning of the period help to predict stock returns, then we could use this information
in choosing stocks.
For 1990, let dkr be a firm’s debt to capital ratio, let eps denote the earnings per share, let netinc
denote net income, and let salary denote total compensation for the CEO.

(a) Using the data in RETURN.DTA, the following equation was estimated:

\widehat{return} = −14.37 + 0.321 dkr + 0.043 eps − 0.0051 netinc + 0.0035 salary
                   (6.89)   (0.201)     (0.078)     (0.0047)        (0.0022)

n = 142, R² = 0.0395

Test whether the explanatory variables are jointly significant at the 5% level. Is any explanatory
variable individually significant?
Answer. For the test of joint significance, the null hypothesis is that all the slope coefficients
are zero, so H0 : βdkr = βeps = βnetinc = βsalary = 0. The alternative hypothesis is that at least
one slope coefficient is not zero. In the restricted model, there are no explanatory variables, so
R²_r = 0. The test statistic is

F = [(R²_ur − R²_r)/q] / [(1 − R²_ur)/(n − k − 1)] = [(0.0395 − 0)/4] / [(1 − 0.0395)/(142 − 4 − 1)] = 1.41

The 5% critical value for an F distribution with 4 numerator degrees of freedom and 137 denominator
degrees of freedom is c = 2.45. (It would also be fine to take c = 2.37, the critical value with infinite
denominator degrees of freedom.) Since F ≤ c, we cannot reject the null hypothesis.
Each individual significance test is of the null hypothesis H0: β_j = 0 against the two-sided
alternative hypothesis HA: β_j ≠ 0. The test statistic is t = (β̂_j − 0)/se(β̂_j). Because the sample is
large, this test statistic is distributed approximately standard normal under the null hypothesis, and
the 5% critical value for the test is 1.96. The test statistics are:

dkr: t = 1.60 netinc: t = −1.09

eps: t = 0.55 salary: t = 1.59

None of these magnitudes is greater than 1.96, so in each of the four tests we fail to reject the
null hypothesis.
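A sketch of the corresponding Stata commands; the variable names return, dkr, eps, netinc, and salary follow the problem text and are assumed to be the names used in RETURN.DTA:

use RETURN.DTA, clear
reg return dkr eps netinc salary
* joint test that all four slope coefficients are zero
test dkr eps netinc salary
* 5% critical value for F with 4 and e(df_r) = 137 degrees of freedom
display invFtail(4, e(df_r), 0.05)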

(b) Now, reestimate the model using the log form for netinc and salary:

\widehat{return} = −36.30 + 0.327 dkr + 0.069 eps − 4.74 log(netinc) + 7.24 log(salary)
                   (39.37)  (0.203)     (0.080)     (3.39)             (6.31)

n = 142, R² = 0.0330

Do any of your conclusions from part (a) change?


Answer. All of the reasoning given in part (a) also applies here. For the joint test, the test statistic
is F = 1.17, so the null hypothesis is not rejected. For the individual tests, the test statistics are:

dkr: t = 1.61 netinc: t = −1.40

eps: t = 0.86 salary: t = 1.15

As in part (a), none of these test statistics has a magnitude larger than 1.96, so in each test we
fail to reject the null hypothesis.

(c) Interpret the coefficient on log(salary).


Answer. Holding other explanatory variables constant, a 1% increase in CEO compensation is
associated with an increase of 0.0724 in the predicted total return from holding the stock.

(d) In this sample, some firms have zero debt and others have negative earnings. Should we try to
use log(dkr) or log(eps) in the model to see if these improve the fit? Explain.
Answer. No. Using logs for these variables will drop some observations from the regression,
because log(x) is undefined for x ≤ 0. This will be represented as a missing value in Stata.

(e) Overall, is the evidence for predictability of stock returns strong or weak?
Answer. The evidence is very weak. No explanatory variables have significant effects at the 5%
level, and the null hypothesis that all the effects are zero cannot be rejected in either model. In
addition, the firm characteristics explain less than 4% of the variation in return.

4. A problem of interest to health officials (and others) is to determine the effects of smoking during
pregnancy on infant health. One measure of infant health is birth weight; a birth weight that is too low
can put an infant at risk for contracting various illnesses. Since factors other than cigarette smoking
that affect birth weight are likely to be correlated with smoking, we should take those factors into

account. For example, higher income generally results in access to better prenatal care, as well as
better nutrition for the mother. An equation that recognizes this is

bwght = β0 + β1 cigs + β2 faminc + u

(a) What is the most likely sign for β2 ?


Answer. As the problem states, higher family income is generally associated with better prenatal
care and better maternal nutrition, so we expect β2 > 0.

(b) Do you think that cigs and faminc are likely to be correlated? Explain why the correlation might
be positive or negative.
Answer. If cigarettes are a normal good, smoking would tend to increase with income at the
individual level. But income is also correlated with other factors that reduce smoking: for example,
individuals with more education tend to have higher incomes and to smoke less, so across families the
correlation could well be negative. In our sample, the correlation between cigs and faminc is −0.173,
a mild negative relationship.
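The −0.173 figure can be checked with a short sketch, assuming bwght.dta is in the working directory:

use bwght.dta, clear
correlate cigs faminc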

(c) Now, estimate the equation with and without faminc, using the data in bwght.dta. Report the
results in equation form, including the sample size and R-squared. Discuss your results, focusing
on whether adding faminc substantially changes the estimated effect of cigs on bwght.
Answer. The simple and multiple regression results are:

\widehat{bwght} = 119.77 − 0.514 cigs

n = 1388, R² = 0.023

\widehat{bwght} = 116.97 − 0.463 cigs + 0.093 faminc

n = 1388, R² = 0.030

Unsurprisingly, smoking during pregnancy is associated with lower birth weight, while higher
family income is associated with higher birth weight. The estimated effect of smoking becomes
slightly smaller in magnitude (less negative) in the multiple regression than in the simple regression,
so adding faminc does not substantially change the estimated effect of cigs. Recall the relationship
between simple and multiple regression coefficients: since the multiple regression coefficient on
faminc is positive and cigs and faminc are negatively correlated, the simple regression coefficient on
cigs must lie below (be more negative than) the multiple regression coefficient on cigs.
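The relationship invoked above can be verified numerically with the following sketch (again assuming bwght.dta is in the working directory):

use bwght.dta, clear
reg bwght cigs faminc            // multiple regression
scalar b_cigs = _b[cigs]
scalar b_faminc = _b[faminc]
reg faminc cigs                  // auxiliary regression of faminc on cigs
scalar delta = _b[cigs]
reg bwght cigs                   // simple regression
display _b[cigs]
display b_cigs + b_faminc*delta  // identical to the simple regression slope above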

5. Consider the following model of test score “production”, which can be used to study the effects of
attending (or skipping) class on a student’s final exam score:

final = β0 + β1 ACT + β2 attend + u

where ACT is the student's ACT score, and attend is the number of classes attended over the semester.
Estimate the equation using the dataset attend.dta, and report the estimated coefficients, standard
errors, sample size and R-squared.
Answer. We have:

\widehat{final}_i = 9.41 + 0.530 ACT_i + 0.174 attend_i
                    (1.45)  (0.048)      (0.031)

n = 680, R² = 0.17

(a) Using the estimates and standard errors, show how to construct the 95% confidence interval for
β1 .
Answer. We construct the 95% confidence interval for β1 as β̂1 ± se(β̂1) · c_{α/2} = 0.530 ± 0.048 · 1.96 = [0.436, 0.624].
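The interval can also be rebuilt from Stata's stored results; this sketch uses the t critical value with 677 degrees of freedom, which is essentially 1.96 (assuming attend.dta is loaded):

reg final ACT attend
* lower and upper limits of the 95% confidence interval for the coefficient on ACT
display _b[ACT] - invttail(e(df_r), 0.025)*_se[ACT]
display _b[ACT] + invttail(e(df_r), 0.025)*_se[ACT]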

(b) Can you reject the hypothesis H0 : β2 = 0 against the two-sided alternative at the 5% level? Find
the p-value for this test. First construct your t-statistic and conduct the test using the estimates
and standard errors, then confirm your answer using the test post-estimation command in Stata.
(Note that the test command uses an F-test and not a t-test, but this should give you a similar
answer.)
Answer. The p-value for this test is constructed as 2Φ(−|t|). The t-statistic is

t = (β̂2 − 0)/se(β̂2) = (0.174 − 0)/0.031 = 5.61

Therefore the p-value is: 2Φ(−5.61) ≈ 0. Similarly, the test post-estimation command indicates
a p-value of 0.0000. Since the p-value is less than 0.05, we reject the null hypothesis that β2 = 0
at the 5% level.
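A sketch of the same calculation using stored, unrounded results (with full precision the t-statistic is about 5.68 rather than 5.61, but the conclusion is unchanged):

reg final ACT attend
display _b[attend]/_se[attend]                         // t-statistic
display 2*ttail(e(df_r), abs(_b[attend]/_se[attend]))  // two-sided p-value, essentially zero
test attend                                            // reports F = t^2, same conclusion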

(c) Can you reject the hypothesis H0 : β2 = 0.2 against the two-sided alternative at the 5% level?
Find the p-value for this test. First construct your t-statistic and conduct the test using the
estimates and standard errors, then confirm your answer using the test post-estimation command
in Stata.

Answer. The t-statistic is

t = (β̂2 − 0.2)/se(β̂2) = (0.174 − 0.2)/0.031 = −0.839

Therefore the p-value is: 2Φ(−0.839) ≈ 2(0.2005) = 0.4010. The test command indicates a
p-value of 0.3947. Since the p-value is greater than 0.05, we fail to reject the null hypothesis
that β2 = 0.2 at the 5% level.
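The small gap between 0.4010 and Stata's 0.3947 comes from rounding β̂2 and its standard error to three decimals; with the stored full-precision values the two agree, as this sketch shows (again with attend.dta loaded):

reg final ACT attend
display (_b[attend] - 0.2)/_se[attend]                          // about -0.85
display 2*ttail(e(df_r), abs((_b[attend] - 0.2)/_se[attend]))   // about 0.395
test attend = 0.2                                               // same p-value via F = t^2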
(d) Compute β2 by running the bivariate regression of final on attend_tilde, where attend_tilde is the
residual from the regression of attend on ACT. (Hint: to put residuals into a new variable, type predict
newvar, resid after your regression command.)
Answer. We find β̃2 = 0.174, which is identical to β̂2 above.

(e) Using the formulas we covered in class, show that the estimate of β2 for the bivariate regression
of final on attend is a function of the multivariate estimates and the coefficient of the auxiliary
regression of ACT on attend.
Answer. We report the coefficients to more decimal places to reduce roundoff error. In the
bivariate regression of final on attend, we have β̃2 = 0.1209031. In the multivariate regression
of final on ACT and attend, we have β̂1 = 0.5299129 and β̂2 = 0.1739337. Finally, in the
auxiliary regression of ACT on attend, we have δ̃2 = −0.1000742. Then by the formula covered
in class, β̃2 = β̂2 + β̂1 · δ̃2 = 0.1739337 + (0.5299129)(−0.1000742) = 0.1209031.

(f) Now, back to the multivariate model. Give one example of an omitted variable that can cause
bias in the estimated coefficient on attend in the multivariate regression. Propose a formula for
the bias in terms of the true parameter, the variance of attend, and the covariance of the omitted
variable and attend. You can assume that ACT is uncorrelated with attend.
Answer. An omitted variable that may cause bias in the estimated coefficient on attend in the
multivariate regression is the number of other classes a student is enrolled in that semester (let's
call it courseload). Let β̃2 be the estimated coefficient on attend in the short regression (that
omits courseload), let β̂2 be the estimated coefficient on attend in the long regression, and let
β̂3 be the estimated coefficient on courseload in the long regression. Then

β̃2 = β̂2 + β̂3 · Ĉov(attend, courseload) / V̂ar(attend)

so the bias in the short-regression coefficient is β̂3 · Ĉov(attend, courseload) / V̂ar(attend), where Ĉov
and V̂ar denote the sample covariance and sample variance.
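Since the question asks for the bias in terms of the true parameter, the population analogue is worth stating explicitly (this restatement is not part of the original answer and uses the stated assumption that ACT is uncorrelated with attend):

plim β̃2 = β2 + β3 · Cov(attend, courseload) / Var(attend)

so the (asymptotic) bias is β3 · Cov(attend, courseload) / Var(attend), whose sign is the sign of β3 times the sign of the covariance.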

Do and Log Files for Problems 4 and 5

DO FILE

*Econ 3125, Applied Econometrics


*Program Name: problem_set6.do

set more off


capture log close
local path "C:/Econ 3125/Fall 2013/Problem Sets/Problem Set 6"
log using "`path'/problem_set6.log", replace

/*Question 4*/

use "`path'\bwght.dta"

/*Part (c)*/

reg bwght cigs

reg bwght cigs faminc

clear all

/*Question 5*/

use "`path'\attend.dta"

reg final ACT attend

/*Part (b)*/

test attend

/*Part (c)*/

test attend=0.2

/*Part (d)*/

reg attend ACT

predict attend_tilde, resid

reg final attend_tilde

/*Part (e)*/

reg final ACT attend

reg final attend

reg ACT attend

di 0.1739337 + 0.5299129*-0.1000742

log close
exit

LOG FILE

log: C:/Econ 3125/Fall 2013/Problem Sets/Problem Set 6/problem_set6.log
.
. /*Question 4*/
.
. use "`path'\bwght.dta"
.
. /*Part (c)*/
. reg bwght cigs
Source | SS df MS Number of obs = 1388
-------------+------------------------------ F( 1, 1386) = 32.24
Model | 13060.4194 1 13060.4194 Prob > F = 0.0000
Residual | 561551.3 1386 405.159668 R-squared = 0.0227
-------------+------------------------------ Adj R-squared = 0.0220
Total | 574611.72 1387 414.283864 Root MSE = 20.129
------------------------------------------------------------------------------
bwght | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
cigs | -.5137721 .0904909 -5.68 0.000 -.6912861 -.3362581
_cons | 119.7719 .5723407 209.27 0.000 118.6492 120.8946
------------------------------------------------------------------------------
.
. reg bwght faminc
Source | SS df MS Number of obs = 1388
-------------+------------------------------ F( 1, 1386) = 16.65
Model | 6819.0527 1 6819.0527 Prob > F = 0.0000
Residual | 567792.667 1386 409.662819 R-squared = 0.0119
-------------+------------------------------ Adj R-squared = 0.0112
Total | 574611.72 1387 414.283864 Root MSE = 20.24
------------------------------------------------------------------------------

bwght | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
faminc | .1183234 .0290016 4.08 0.000 .0614317 .1752152
_cons | 115.265 1.001901 115.05 0.000 113.2996 117.2304
------------------------------------------------------------------------------
.
.
. clear all
.
.
. /*Question 5*/
.
. use "`path'\attend.dta"
.
. reg final ACT attend
Source | SS df MS Number of obs = 680
-------------+------------------------------ F( 2, 677) = 69.38
Model | 2561.91192 2 1280.95596 Prob > F = 0.0000
Residual | 12500.0351 677 18.4638628 R-squared = 0.1701
-------------+------------------------------ Adj R-squared = 0.1676
Total | 15061.9471 679 22.1825435 Root MSE = 4.297
------------------------------------------------------------------------------
final | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
ACT | .5299129 .047828 11.08 0.000 .4360038 .6238219
attend | .1739337 .0306059 5.68 0.000 .1138398 .2340276
_cons | 9.414827 1.447809 6.50 0.000 6.572091 12.25756
------------------------------------------------------------------------------
.
.

. /*Part (b)*/
.
. test attend
( 1) attend = 0
F( 1, 677) = 32.30 Prob > F = 0.0000
.
.
. /*Part (c)*/
.
. test attend=0.2
( 1) attend = .2
F( 1, 677) = 0.73 Prob > F = 0.3947
.
.
. /*Part (d)*/
.
. reg attend ACT
Source | SS df MS Number of obs = 680
-------------+------------------------------ F( 1, 678) = 17.00
Model | 494.15501 1 494.15501 Prob > F = 0.0000
Residual | 19711.1391 678 29.0724766 R-squared = 0.0245
-------------+------------------------------ Adj R-squared = 0.0230
Total | 20205.2941 679 29.7574287 Root MSE = 5.3919
------------------------------------------------------------------------------
attend | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
ACT | -.2443857 .0592769 -4.12 0.000 -.3607739 -.1279974
_cons | 31.64825 1.350265 23.44 0.000 28.99705 34.29946
------------------------------------------------------------------------------
.

. predict attend_tilde, resid
.
. reg final attend_tilde
Source | SS df MS Number of obs = 680
-------------+------------------------------ F( 1, 678) = 27.95
Model | 596.319818 1 596.319818 Prob > F = 0.0000
Residual | 14465.6272 678 21.3357334 R-squared = 0.0396
-------------+------------------------------ Adj R-squared = 0.0382
Total | 15061.9471 679 22.1825435 Root MSE = 4.6191
------------------------------------------------------------------------------
final | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
attend_tilde | .1739337 .0329002 5.29 0.000 .1093353 .2385321
_cons | 25.89118 .1771329 146.17 0.000 25.54338 26.23897
------------------------------------------------------------------------------
.
.
. /*Part (e)*/
.
. reg final ACT attend
Source | SS df MS Number of obs = 680
-------------+------------------------------ F( 2, 677) = 69.38
Model | 2561.91192 2 1280.95596 Prob > F = 0.0000
Residual | 12500.0351 677 18.4638628 R-squared = 0.1701
-------------+------------------------------ Adj R-squared = 0.1676
Total | 15061.9471 679 22.1825435 Root MSE = 4.297
------------------------------------------------------------------------------
final | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
ACT | .5299129 .047828 11.08 0.000 .4360038 .6238219

attend | .1739337 .0306059 5.68 0.000 .1138398 .2340276
_cons | 9.414827 1.447809 6.50 0.000 6.572091 12.25756
------------------------------------------------------------------------------
.
. reg final attend
Source | SS df MS Number of obs = 680
-------------+------------------------------ F( 1, 678) = 13.56
Model | 295.352008 1 295.352008 Prob > F = 0.0002
Residual | 14766.5951 678 21.7796387 R-squared = 0.0196
-------------+------------------------------ Adj R-squared = 0.0182
Total | 15061.9471 679 22.1825435 Root MSE = 4.6669
------------------------------------------------------------------------------
final | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
attend | .1209031 .0328317 3.68 0.000 .0564391 .185367
_cons | 22.72992 .8769078 25.92 0.000 21.00814 24.4517
------------------------------------------------------------------------------
.
. reg ACT attend
Source | SS df MS Number of obs = 680
-------------+------------------------------ F( 1, 678) = 17.00
Model | 202.353053 1 202.353053 Prob > F = 0.0000
Residual | 8071.57489 678 11.9049777 R-squared = 0.0245
-------------+------------------------------ Adj R-squared = 0.0230
Total | 8273.92794 679 12.1854609 Root MSE = 3.4504
------------------------------------------------------------------------------
ACT | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
attend | -.1000742 .0242735 -4.12 0.000 -.1477344 -.052414
_cons | 25.12694 .6483252 38.76 0.000 23.85397 26.39991

------------------------------------------------------------------------------
.
. di 0.1739337 + 0.5299129*-0.1000742
.12090309
.
.
. log close
