
Econ 251

Problem Set #6

SOLUTIONS

This problem set introduces you to dummy variable regression and to testing hypotheses
involving more than one parameter.

Instructions:
Following each question, please handwrite or type your answers and copy/paste the
STATA output (please, use the ‘copy as picture’ option).
The first two problems use the STATA file BEAUTY.dta. (Refer to Problem Set 5 for
a variable description, or type des in the STATA command window.)

Problem I: Dummy variable regression (12 points in total)


This problem illustrates an important property of the dummy variable regression.

1. Estimate the model for the relationship between education and log-earnings:

lwage = 𝜷1 + 𝜷2educ + u.

Draw the regression line carefully labeling the axes, and indicating the intercept and slope.
(1 point)

. gen lwage=log(wage)

. reg lwage educ

Source SS df MS Number of obs = 1260


F( 1, 1258) = 95.89
Model 31.5149966 1 31.5149966 Prob > F = 0.0000
Residual 413.464976 1258 .328668502 R-squared = 0.0708
Adj R-squared = 0.0701
Total 444.979972 1259 .353439215 Root MSE = .5733

lwage Coef. Std. Err. t P>|t| [95% Conf. Interval]

educ .0602839 .0061563 9.79 0.000 .0482061 .0723616


_cons .9014239 .0790132 11.41 0.000 .7464117 1.056436

2. Create 3 dummy variables representing 3 different mutually exclusive educational


categories:
1. high school dropout: this variable equals 1 if years of education is less than 12 years
and zero otherwise. Call this variable HSdropout.
2. high school graduate: this variable equals 1 if years of education is exactly 12 years
and zero otherwise. Call this variable HSgraduate.
3. high school and above: this variable equals 1 if years of education is greater than 12
years and zero otherwise. Call this variable College.

Hint: type in STATA:


gen HSdropout = (educ<12)
gen HSgraduate = (educ==12)
gen College = (educ>12)

Now estimate the following model for log-earnings (lwage):


lwage = 𝜷1 + 𝜷2 HSgraduate + 𝜷3 College + u

(Note that we cannot include all the dummies in the regression – we always need to exclude
one category to avoid violating the Gauss-Markov assumption of no perfect collinearity; in
this case we excluded HSdropout).

(i) Interpret the OLS estimates of all the parameters, 𝜷1, 𝜷2 and 𝜷3.

(3 points)

Solution:

. reg lwage HSgraduate College

Source SS df MS Number of obs = 1260


F( 2, 1257) = 32.76
Model 22.0464586 2 11.0232293 Prob > F = 0.0000
Residual 422.933514 1257 .33646262 R-squared = 0.0495
Adj R-squared = 0.0480
Total 444.979972 1259 .353439215 Root MSE = .58005

lwage Coef. Std. Err. t P>|t| [95% Conf. Interval]

HSgraduate .1211555 .0459269 2.64 0.008 .0310538 .2112573


College .3320125 .0447447 7.42 0.000 .2442299 .4197951
_cons 1.468873 .0372873 39.39 0.000 1.395721 1.542025

Write down the conditional mean lwage for each educational category:

1) High school dropouts (i.e. HSdropout=1)


𝔼(lwage∣HSdropout=1) = 𝜷1 + 𝜷2·0 + 𝜷3·0 = 𝜷1
(Note we used the fact that E(u∣HSdropout =1)=0 by exogeneity).

Hence, the intercept 𝜷1 is the expected log-wage for high school dropouts (i.e. when
HSgraduate=0 and College=0).
Our OLS estimate of 𝜷1 is 1.47, meaning that high school dropouts earn a log-wage of
1.47, on average.
2) High school graduates (i.e. HSgraduate=1)
𝔼(lwage∣HSgraduate=1) = 𝜷1 + 𝜷2·1 + 𝜷3·0 = 𝜷1 + 𝜷2
(Note we used the fact that E(u∣ HSgraduate=1)=0 by exogeneity).

Hence, 𝜷2 is the difference between the expected log-wage for high school graduates and
for high school dropouts:
E(lwage∣ HSgraduate=1) - E(lwage∣HSdropout=1) = (𝜷1 + 𝜷2) - 𝜷1 = 𝜷2.

Our OLS estimate of 𝜷2 is 0.12, meaning that on average, high school graduates earn
approximately 12 percent higher wages than high school dropouts (or, equivalently,
high school graduates earn 0.12 higher log-wages than high school dropouts).

3) College graduates (i.e. College=1)


𝔼(lwage∣College=1) = 𝜷1 + 𝜷2·0 + 𝜷3·1 = 𝜷1 + 𝜷3
(Note we used the fact that E(u∣College=1)=0 by exogeneity).

Hence, 𝜷3 is the difference between the expected log-wage for college graduates and for
high school dropouts:
E(lwage∣ College=1) - E(lwage∣HSdropout=1) = (𝜷1 + 𝜷3) - 𝜷1 = 𝜷3.

Our OLS estimate of 𝜷3 is 0.33, meaning that on average, college graduates earn
approximately 33 percent higher wages than high school dropouts (or, equivalently,
college graduates earn 0.33 higher log-wages than high school dropouts).
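Reading a log-wage gap directly as a percent difference is only an approximation; the exact percent difference implied by a log gap b is exp(b) − 1. A quick sketch in Python, using the two coefficient estimates above:

```python
import math

# Exact percent wage difference implied by a log-wage gap b is 100*(exp(b) - 1);
# for small b this is close to 100*b, the usual approximation.
for b in (0.12, 0.33):
    exact = 100 * (math.exp(b) - 1)
    print(f"log gap {b:.2f} -> approx {100 * b:.0f}%, exact {exact:.1f}%")
```

For 𝜷3 = 0.33 the exact gap is about 39.1 percent, so the "33 percent" reading understates the true difference somewhat; for 𝜷2 = 0.12 the approximation (12.7 percent exactly) is much closer.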

(ii) Add the estimated regression lines to the same picture from part 1.
(2 points)
Solution:
From part (i), high school dropouts earn a log-wage of 1.47, on average; high school
graduates earn a log-wage of (1.47 + 0.12) = 1.60, on average; college graduates earn a
log-wage of (1.47 + 0.33) = 1.80, on average. So, we are going to have 3 regression lines:
1 for each education category, and these lines will be horizontal (as shown on the next
page).
If you want to be somewhat precise when drawing line (i) note that the fitted value for 12
years of education is 1.62 and this is just above 1.60.
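The fitted value quoted above can be reproduced from the part 1 estimates (a quick sketch; coefficients copied from the regression output):

```python
# OLS estimates from the simple regression of lwage on educ (part 1)
b1, b2 = 0.9014239, 0.0602839

fitted_at_12 = b1 + b2 * 12  # predicted log-wage at 12 years of education
print(round(fitted_at_12, 2))  # -> 1.62
```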

(iii) What is the sample mean of variable log-wage in each of the 3 educational categories?
How would these sample means look if you graphed them on the same graph that you used in
part (ii)?
(2 points)
Hint: recall the STATA command “sum”. E.g. for dropouts type:
sum lwage if HSdropout==1.

. sum lwage if HSdropout==1

Variable Obs Mean Std. Dev. Min Max

lwage 242 1.468873 .6095631 .0487901 3.244153

. sum lwage if HSgraduate ==1

Variable Obs Mean Std. Dev. Min Max

lwage 468 1.590028 .5459776 .1570037 3.267285

. sum lwage if College ==1

Variable Obs Mean Std. Dev. Min Max

lwage 550 1.800885 .5947204 .0198026 4.353113

The graphs for parts (ii) and (iii) would look exactly the same (the small differences we
get are due to rounding). See next page.

This is the key point of this exercise: a regression on dummy variables simply reproduces
the sample mean of the outcome variable Y in each dummy-variable category.
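This equivalence is easy to verify on a toy sample. The sketch below (pure Python, made-up numbers, solving the normal equations directly) shows that OLS on mutually exclusive dummies returns exactly the group means:

```python
from statistics import mean

# Toy sample (made-up numbers): (lwage, education category)
data = [(1.2, 'drop'), (1.6, 'drop'), (1.5, 'grad'), (1.7, 'grad'),
        (1.9, 'coll'), (1.8, 'coll'), (2.1, 'coll')]

# Design matrix: intercept, HSgraduate dummy, College dummy (dropouts omitted)
X = [[1.0, 1.0 if c == 'grad' else 0.0, 1.0 if c == 'coll' else 0.0]
     for _, c in data]
y = [w for w, _ in data]
n, k = len(X), 3

# Solve the normal equations (X'X) b = X'y by Gaussian elimination
XtX = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(k)]
       for r in range(k)]
Xty = [sum(X[i][r] * y[i] for i in range(n)) for r in range(k)]
A = [row[:] + [rhs] for row, rhs in zip(XtX, Xty)]
for col in range(k):
    piv = max(range(col, k), key=lambda r: abs(A[r][col]))
    A[col], A[piv] = A[piv], A[col]
    for r in range(col + 1, k):
        f = A[r][col] / A[col][col]
        for c in range(col, k + 1):
            A[r][c] -= f * A[col][c]
b = [0.0] * k
for r in reversed(range(k)):
    b[r] = (A[r][k] - sum(A[r][c] * b[c] for c in range(r + 1, k))) / A[r][r]

# The OLS coefficients reproduce the group means exactly:
mean_drop = mean(w for w, c in data if c == 'drop')
mean_grad = mean(w for w, c in data if c == 'grad')
mean_coll = mean(w for w, c in data if c == 'coll')
print(abs(b[0] - mean_drop) < 1e-9)         # intercept = dropout mean
print(abs(b[0] + b[1] - mean_grad) < 1e-9)  # b1 + b2 = graduate mean
print(abs(b[0] + b[2] - mean_coll) < 1e-9)  # b1 + b3 = college mean
```

All three comparisons print True: the intercept is the omitted category's mean, and each slope is that category's mean difference from the omitted one.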

[Figure: lwage plotted against educ. The upward-sloping line from part 1 has intercept 0.90 and slope 0.06. The three dummy-regression lines are horizontal: at 1.47 for high school dropouts (HSdropout=1, educ of 11 or less), at 1.60 for high school graduates (HSgraduate=1, educ = 12), and at 1.80 for college graduates (College=1, educ of 13 or more).]
Some of you asked me how to draw this in STATA. The STATA graph doesn’t look
very “nice” (the fitted value for high school graduates is just a dot, which is hard to
see), but you can generate it by typing:
reg lwage HSgraduate College
predict yhat_HSdrop if HSdropout==1
predict yhat_HSgrad if HSgraduate==1
predict yhat_Coll if College==1
twoway (scatter lwage educ,xsc(r(0)) ysc(r(0))) (lfit yhat_HSdrop educ) (lfit
yhat_Coll educ) (lfit yhat_HSgrad educ)
(See the graph on next page).

(iv) Use the regression output to test the hypothesis that high school graduates and high
school dropouts earn the same mean log-wages. Use significance level of 1%.
(1 point)
Solution:
From part (i) we saw that 𝜷2 is the difference between the expected log-wage for high
school graduates and for high school dropouts:
E(lwage∣ HSgraduate=1) - E(lwage∣HSdropout=1) = (𝜷1 + 𝜷2) - 𝜷1 = 𝜷2.
Hence, the null hypothesis “high school graduates and high school dropouts earn the
same mean log-wages”
H0: E(lwage∣ HSgraduate=1) = E(lwage∣HSdropout=1) is equivalent to the hypothesis
H0: 𝜷2=0.
Since the p-value is 0.008<0.01 – the significance level, we reject the null hypothesis
and conclude that high school graduates and high school dropouts earn significantly
different mean log-wages.

(v) Use the regression output to test the hypothesis that high school dropouts and college
graduates earn the same mean log-wages. Use significance level of 1%.
(1 point)

Solution:
From part (i) we saw that 𝜷3 is the difference between the expected log-wage for college
graduates and for high school dropouts:
E(lwage∣ College=1) - E(lwage∣HSdropout=1) = (𝜷1 + 𝜷3) - 𝜷1 = 𝜷3.

Hence, the null hypothesis “high school dropouts and college graduates earn the same
mean log-wages”
H0: E(lwage∣College=1) = E(lwage∣HSdropout=1) is equivalent to the hypothesis
H0: 𝜷3=0.
Since the p-value is 0.000<0.01 – the significance level, we reject the null hypothesis
and conclude that high school dropouts and college graduates earn significantly
different mean log-wages.

Problem II: Testing hypotheses for more than one parameter (15 points in total)

This problem introduces you to regression on dummy variables and hypotheses tests in
STATA involving more than one parameter.

The variable looks from dataset BEAUTY.dta contains each person’s score on physical
attractiveness (people in the sample were rated by an interviewer). The attractiveness was
coded in five categories: 1=homely, 2=quite plain, 3=average, 4=good looking, and
5=strikingly beautiful/handsome.

(i) Create 3 dummy variables that represent a person’s looks the following way. The first
variable equals 1 if looks is less than 3, and 0 otherwise (call it belowaverage), the second
equals 1 if looks is exactly 3, and 0 otherwise (call it average) and the last one equals 1 if looks
is greater than 3, and 0 otherwise (call it aboveaverage).
(Note: no submission for this part of the problem set is required).

Hint: All you need to do for part (i) is type in STATA:


gen belowaverage = (looks<3)
gen average = (looks==3)
gen aboveaverage =(looks>3)
Solution:
All you needed to do here was type the commands from the hint into STATA.

FYI: only 1.03 percent of the people in the sample fall into category 1=homely, and
only 1.51 percent of the people in the sample fall into category 5=strikingly
beautiful/handsome. For this reason we do the binning above.

. tab looks

from 1 to 5 Freq. Percent Cum.

1 13 1.03 1.03
2 142 11.27 12.30
3 722 57.30 69.60
4 364 28.89 98.49
5 19 1.51 100.00

Total 1,260 100.00
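As a trivial cross-check, the FYI percentages follow directly from the frequency column of the tab output:

```python
n = 1260  # sample size, from the tab output above

print(round(100 * 13 / n, 2))  # category 1 (homely) -> 1.03
print(round(100 * 19 / n, 2))  # category 5 (strikingly beautiful/handsome) -> 1.51
```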

(ii) Now estimate the following model for log-earnings (lwage) for women:

lwage = 𝜷1 + 𝜷2 belowaverage + 𝜷3 aboveaverage + u.

(Note that we always need to exclude one category to avoid violating the Gauss-Markov
assumption of no perfect collinearity; in this case we excluded average).

Interpret the OLS estimates of all the parameters, 𝜷1, 𝜷2 and 𝜷3.

Hint: In order to estimate the model for women, type in STATA:


reg lwage belowaverage aboveaverage if female==1
(3 points)

Solution:

. reg lwage belowaverage aboveaverage if female==1

Source SS df MS Number of obs = 436


F( 2, 433) = 2.30
Model 1.25587193 2 .627935966 Prob > F = 0.1018
Residual 118.356592 433 .273340858 R-squared = 0.0105
Adj R-squared = 0.0059
Total 119.612464 435 .274971181 Root MSE = .52282

lwage Coef. Std. Err. t P>|t| [95% Conf. Interval]

belowaverage -.1376261 .0761973 -1.81 0.072 -.2873887 .0121365


aboveaverage .0336409 .0554196 0.61 0.544 -.0752841 .1425658
_cons 1.308817 .0342511 38.21 0.000 1.241498 1.376136

Again, write down the conditional mean log-wage for each group.

1) For a woman of average looks (i.e. average=1)


For a woman of average looks average=1, belowaverage=0 and aboveaverage=0. Then her
expected lwage is given by:
E(lwage∣average=1) = 𝜷1 + 𝜷2·0 + 𝜷3·0 = 𝜷1
(Note we just substituted belowaverage=0 and aboveaverage=0 and used the fact that
E(u∣ average=1)=0 by exogeneity).

Therefore, the intercept 𝜷1 is interpreted as the mean log-wage of women with average
looks (i.e. the mean log-wage for the omitted category).
Our OLS estimate of 𝜷1 is 1.31, meaning that the predicted log-wage of women with
average looks is 1.31.

2) For a woman of below average looks (i.e. belowaverage=1)


For a woman of below average looks belowaverage=1, average=0 and aboveaverage=0.
Then her expected lwage is given by:
E(lwage∣belowaverage=1) = 𝜷1 + 𝜷2·1 + 𝜷3·0 = 𝜷1 + 𝜷2
(Note we just substituted belowaverage=1 and aboveaverage=0 and used the fact that
E(u∣ belowaverage=1)=0 by exogeneity).
From here it is easy to see that 𝜷2 equals the difference between the mean log-wages of
belowaverage and the omitted category, i.e.:
E(lwage∣belowaverage=1) - E(lwage∣average=1) = 𝜷1 + 𝜷2 – 𝜷1 = 𝜷2.

Our OLS estimate of 𝜷2 is -0.14, meaning that compared to women with average looks
(i.e. compared to the omitted category), women of below average looks earn
approximately 14 percent lower wages, on average.

3) For a woman of above average looks (i.e. aboveaverage=1)


For a woman of above average looks aboveaverage=1, average=0 and belowaverage=0.
Then her expected lwage is given by:
E(lwage∣aboveaverage=1) = 𝜷1 + 𝜷2·0 + 𝜷3·1 = 𝜷1 + 𝜷3
(Note we just substituted belowaverage=0 and aboveaverage=1 and used the fact that
E(u∣ aboveaverage=1)=0 by exogeneity).

From here it is easy to see that 𝜷3 equals the difference between the mean log-wages of
aboveaverage and the omitted category, i.e.:
E(lwage∣aboveaverage=1) - E(lwage∣average=1) = 𝜷1 + 𝜷3 - 𝜷1 = 𝜷3.

Our OLS estimate of 𝜷3 is 0.03, meaning that compared to women with average looks
(i.e. compared to the omitted category), women with above average looks earn
approximately 3 percent higher wages, on average.

Now use the regression results to test the following hypotheses. (In order to get full points
write down the null hypothesis, whether you reject it or not, and why. Also, please, do not
forget to attach your STATA output).

(iii) Test the hypothesis that women with below average looks earn the same average log-
wage as women with average looks. Use a significance level of 10%. (2 points)

Solution:

Since 𝜷2 equals the difference between the mean log-wage of women with below average
and average looks (recall E(lwage∣belowaverage=1) - E(lwage∣average=1) = 𝜷2
from part (ii)), this hypothesis is the same as stating 𝜷2=0, i.e. H0: 𝜷2=0.
We reject the null at the 10% significance level as p-value=0.072<0.10.

(iv) Test the hypothesis that women with above average looks earn the same average log-
wage as women with average looks. Use a significance level of 10%. (2 points)
Solution:
Since 𝜷3 equals the difference between the mean log-wage of women with above average
and average looks (recall E(lwage∣aboveaverage=1) - E(lwage∣average=1) = 𝜷3
from part (ii)), this hypothesis is the same as stating 𝜷3=0, i.e. H0: 𝜷3=0.
We fail to reject the null at the 10% significance level as p-value= 0.544>0.10.

(v) Test the hypothesis that women with above average looks earn the same average log-
wage as women with below average looks. Use a significance level of 5%.
(2 points)
Solution:
This hypothesis requires that:
E(lwage∣belowaverage=1) = E(lwage∣aboveaverage=1).
From part (ii) this is the same as: H0: 𝜷1 + 𝜷2 = 𝜷1 + 𝜷3, or H0: 𝜷2 = 𝜷3 (as 𝜷1 cancels).

. test belowaverage = aboveaverage

( 1) belowaverage - aboveaverage = 0

F( 1, 433) = 4.49
Prob > F = 0.0346

We reject the null at the 5% significance level as p-value=0.035<0.05. This is an


example of an F-test.

(vi) Test the hypothesis that women from all three beauty categories have the same average
log-earnings. Use a significance level of 5%. (2 points)

Solution:
This hypothesis requires that the mean log-wages in all three groups are the same, i.e.
E(lwage∣belowaverage=1)= E (lwage∣aboveaverage=1)= E (lwage∣average=1)
From part (ii) this is the same as: H0: 𝜷1 + 𝜷2 = 𝜷1 + 𝜷3 = 𝜷1, or H0: 𝜷2 = 𝜷3 = 0 (as 𝜷1
cancels).
Note from the summary table that H0 in part (vi) is not the same as in part (v): in (v) we
only required 𝜷2 = 𝜷3; here we require not only 𝜷2 = 𝜷3 but also that both
are zero.

. test belowaverage = aboveaverage=0

( 1) belowaverage - aboveaverage = 0
( 2) belowaverage = 0

F( 2, 433) = 2.30
Prob > F = 0.1018

We fail to reject the null at the 5% significance level as p-value=0.102>0.05. This is
another example of an F-test.

Just FYI: this is also the F-statistic reported in the standard STATA output. For
every model, the reported F-statistic tests the hypothesis that all slope parameters are
jointly zero (obviously, a model in which you cannot reject the null that all the slopes
are zero is not a very meaningful model at all).

The null hypotheses in parts (iii) to (vi) are summarized below:

Question Null hypothesis in terms of the parameters of the regression


(iii) H0: 𝜷2=0
(iv) H0: 𝜷3=0
(v) H0: 𝜷2 = 𝜷3
(vi) H0: 𝜷2 = 𝜷3=0

(vii) Now estimate the multiple linear regression model for all the women in the sample:

lwage = 𝜷1 + 𝜷2 belowaverage + 𝜷3 aboveaverage + 𝜷4 educ + 𝜷5 exper +


+ 𝜷6 black + 𝜷7 bigcity + u.

 Holding all other factors fixed, does living in a big city (bigcity) have a statistically
significant effect on log-earnings? What about labour market experience (exper)?
Use a significance level of 5% in both cases.
 Show how you would calculate the F-statistic for testing H0: 𝜷2 = 𝜷3=0 in the model
above. (3 points in total)
Hint: Use the formula from class:

F-stat = [(N − K)/m] × [(R²ur − R²r)/(1 − R²ur)], where

R²ur - R-squared of the model with no restrictions
R²r - R-squared of the model with the restriction that the slopes on belowaverage
and aboveaverage are both 0s
N - sample size
K - number of parameters (including the intercept)
m - number of restrictions

The restricted model is:

lwage = 𝜷1 + 𝜷4 educ + 𝜷5 exper + 𝜷6 black + 𝜷7 bigcity + u.

What does its R-squared equal?


Solution:

 Holding other factors fixed, living in a big city (bigcity) has a statistically
significant effect on log-earnings – the p-value associated with the null
hypothesis that the slope on bigcity equals 0 is 0.002<0.05, so we reject the null.
Labour market experience also has a statistically significant effect on log-
wages, ceteris paribus – the p-value associated with the null hypothesis that the
slope on exper equals 0 is 0.000<0.05, so we reject the null.

 F-statistic calculation (see next page).

Some of you forgot to estimate the model on the female sample only; we did not
deduct points as long as your calculation was correct.

Unrestricted model STATA output:

. reg lwage belowaverage aboveaverage educ exper black bigcity if female==1

Source SS df MS Number of obs = 436


F( 6, 429) = 20.66
Model 26.8143417 6 4.46905695 Prob > F = 0.0000
Residual 92.7981219 429 .216312638 R-squared = 0.2242
Adj R-squared = 0.2133
Total 119.612464 435 .274971181 Root MSE = .46509

lwage Coef. Std. Err. t P>|t| [95% Conf. Interval]

belowaverage -.1122478 .0679545 -1.65 0.099 -.2458129 .0213174


aboveaverage .0361311 .0502615 0.72 0.473 -.0626584 .1349206
educ .0794316 .0091334 8.70 0.000 .0614798 .0973833
exper .0105163 .0021899 4.80 0.000 .0062121 .0148206
black .0935771 .0725275 1.29 0.198 -.0489763 .2361305
bigcity .1788153 .0576648 3.10 0.002 .0654747 .292156
_cons .1080574 .1260579 0.86 0.392 -.1397106 .3558254

Restricted model STATA output:

. reg lwage educ exper black bigcity if female==1

Source SS df MS Number of obs = 436


F( 4, 431) = 29.78
Model 25.8997155 4 6.47492888 Prob > F = 0.0000
Residual 93.7127481 431 .21743097 R-squared = 0.2165
Adj R-squared = 0.2093
Total 119.612464 435 .274971181 Root MSE = .46629

lwage Coef. Std. Err. t P>|t| [95% Conf. Interval]

educ .0808748 .0091275 8.86 0.000 .0629349 .0988147


exper .0101811 .0021648 4.70 0.000 .0059262 .014436
black .0950454 .0725674 1.31 0.191 -.0475846 .2376754
bigcity .1772302 .0577251 3.07 0.002 .0637726 .2906879
_cons .0915181 .1240145 0.74 0.461 -.1522305 .3352666

So,
R²ur = 0.2242
R²r = 0.2165
N = 436
K = 7
m = 2

Hence,
F-stat = [(N − K)/m] × [(R²ur − R²r)/(1 − R²ur)] = [(436 − 7)/2] × [(0.2242 − 0.2165)/(1 − 0.2242)] ≈ 2.13
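The same calculation sketched in Python (values copied from the two regression outputs; the small gap from STATA's F = 2.11 below comes from rounding the R-squareds to four decimals):

```python
r2_ur = 0.2242  # R-squared, unrestricted model
r2_r = 0.2165   # R-squared, restricted model
n, k, m = 436, 7, 2  # sample size, parameters (incl. intercept), restrictions

f_stat = ((n - k) / m) * (r2_ur - r2_r) / (1 - r2_ur)
print(round(f_stat, 2))  # -> 2.13
```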

You can double-check this by estimating the unrestricted model and then typing:

. test belowaverage =aboveaverage =0

( 1) belowaverage - aboveaverage = 0
( 2) belowaverage = 0

F( 2, 429) = 2.11
Prob > F = 0.1220

(viii) In order for the OLS estimates to be an unbiased estimate of the true effect of looks,
we need to assume that women with below average, average, and above average looks are
exactly the same in all other aspects, except physical appearance. Put formally, we need:
E(u∣belowaverage=1)=𝔼(u∣aboveaverage=1)=𝔼(u∣average=1)=𝔼(u)=0.

Do you think this is likely to hold?


(1 point)
Solution:
Basically, the answer largely depends on whether you believe that how a person looks
is truly exogenous. If you think that looks are given by “Nature” and there’s nothing

one can do about the way they look, then, yes, the OLS estimates of the slope
parameters are unbiased.
I personally would argue that how a person looks is not completely exogenous. E.g. a
person with above average looks might look better than average since they take care
of themselves – they may dress well, exercise, watch their weight, etc. And it might be
that this type of person is also more diligent in their work, and not just about their
looks. If this is the case the OLS estimate of 𝜷2 is likely biased down and the OLS
estimate of 𝜷3 is likely biased up.
Another more subtle point is that there might be reverse causality between looks and
wages – it might be that women who earn more look better, on average, precisely
because they earn more (e.g. as they can afford a better diet, better cosmetics, etc., or
maybe even plastic surgery).

Problem III: Dummy variable regression and a continuous variable (8 points in total)

Background: Is there a marriage premium on the labour market? Some authors (e.g.
Korenman and Neumark, 1991) found a significant wage premium for married men, but
their analysis is limited because they cannot directly observe productivity. Professional
athletes provide a good opportunity to study the marriage premium because we can easily
collect data on various productivity measures, in addition to salary.

Open dataset NBASAL.dta. This dataset contains various information on players in the
National Basketball Association (NBA). For each player, we have information on points
scored, rebounds, assists, playing time, and demographics. We are going to use the
following variables from this dataset:
wage annual salary, thousands $
lwage log(wage)
guard =1 if the player is a guard, 0 otherwise
forward =1 if the player is a forward, 0 otherwise
center =1 if the player is a center, 0 otherwise
points number of points scored per game
rebounds number of rebounds per game
assists number of assists per game
exper player’s experience in years
marr =1 if the player is married, 0 otherwise

(a) Using a dummy variable regression, calculate the average annual salary (in
thousands of USD) for guards (guard==1), forwards (forward==1), and centers
(center==1).
Hint: This is analogous to question 1: simply regress wage on two of the dummy variables.
Make sure you omit one dummy, e.g. omit guard. (Why?)

Solution:

. reg wage forward center

Source | SS df MS Number of obs = 269
-------------+---------------------------------- F(2, 266) = 2.48
Model | 4903788.34 2 2451894.17 Prob > F = 0.0857
Residual | 262975129 266 988628.303 R-squared = 0.0183
-------------+---------------------------------- Adj R-squared = 0.0109
Total | 267878917 268 999548.197 Root MSE = 994.3

------------------------------------------------------------------------------
wage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
forward | 207.4889 133.1783 1.56 0.120 -54.72881 469.7066
center | 358.6025 173.8989 2.06 0.040 16.20895 700.996
_cons | 1277.658 93.53568 13.66 0.000 1093.494 1461.823
------------------------------------------------------------------------------

. di 1277.658 + 207.4889
1485.1469

. di 1277.658 + 358.6025
1636.2605

The average annual salary for guards is given by the intercept estimate – 1,277
thousand USD. Forwards earn $207 thousand more than guards per year, on average,
or 1,485 thousand USD. Centers earn $358 thousand more than guards per year, on
average, or 1,636 thousand USD.
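The `di` arithmetic above can be replicated as a quick cross-check (coefficients copied from the regression output):

```python
cons = 1277.658       # average guard salary (omitted category), thousand USD
b_forward = 207.4889  # forward premium relative to guards
b_center = 358.6025   # center premium relative to guards

print(round(cons + b_forward, 1))  # average forward salary -> 1485.1
print(round(cons + b_center, 1))   # average center salary -> 1636.3
```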
(b) Run a regression of lwage (log of yearly salary) on points (points scored per game).
Interpret the slope estimate.

Solution:
. reg lwage points

Source | SS df MS Number of obs = 269


-------------+---------------------------------- F(1, 267) = 166.40
Model | 79.9315859 1 79.9315859 Prob > F = 0.0000
Residual | 128.257177 267 .48036396 R-squared = 0.3839
-------------+---------------------------------- Adj R-squared = 0.3816
Total | 208.188763 268 .776823743 Root MSE = .69308
------------------------------------------------------------------------------
lwage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
points | .092553 .0071749 12.90 0.000 .0784264 .1066796
_cons | 6.007291 .084573 71.03 0.000 5.840777 6.173806
------------------------------------------------------------------------------

One additional point scored per game is associated with approximately 9.3% higher salary, on average.
(c) Now run a regression of lwage on points and two dummy variables forward and
center (Make sure that your omit category guard). Produce a graph of the regression
lines for each group by executing the following STATA code (just copy/paste in the
command window):
reg lwage points forward center
predict yhat_g if guard==1
predict yhat_f if forward==1
predict yhat_c if center==1
twoway (scatter lwage points) (lfit yhat_g points) (lfit yhat_f points) (lfit yhat_c points)

IMPORTANT NOTE: Notice that now each position (guard, forward, center) gets a
separate regression line with a different intercept but the slope on points is the same.
No interpretation is required, just submit a snapshot of the graph generated by
STATA.
Solution:

(d) Finally, let’s use this data to test whether married men are paid more after we
account for productivity differences. (For example, NBA owners may think that
married men bring stability to the team, or are better for the team image.)
Regress lwage on points, rebounds, assists (think of these as productivity measures), exper;
forward and center (to allow for differences between positions) and marr (a dummy for
being married). Conditional on productivity, do you find evidence for a significant marriage
premium in NBA?
(2 points each, 8 points in total)

Solution:
The slope on marr is not statistically different from zero even at the 10% significance
level. Therefore, conditional on productivity, we find no evidence for a significant
marriage premium.

. reg lwage points rebounds assists exper marr forward center

Source | SS df MS Number of obs = 269


-------------+---------------------------------- F(7, 261) = 37.79
Model | 104.788177 7 14.9697395 Prob > F = 0.0000
Residual | 103.400587 261 .39617083 R-squared = 0.5033
-------------+---------------------------------- Adj R-squared = 0.4900
Total | 208.188763 268 .776823743 Root MSE = .62942
------------------------------------------------------------------------------
lwage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
points | .0643611 .0103089 6.24 0.000 .0440618 .0846604
rebounds | .0407392 .0213493 1.91 0.057 -.0012995 .0827779
assists | .0572367 .0265075 2.16 0.032 .0050408 .1094325
exper | .0726404 .012285 5.91 0.000 .0484502 .0968307
marr | -.0202345 .0835095 -0.24 0.809 -.1846727 .1442036
forward | .2231415 .1185838 1.88 0.061 -.0103613 .4566443
center | .2015074 .1439425 1.40 0.163 -.0819291 .4849439
_cons | 5.489368 .1115066 49.23 0.000 5.269801 5.708935
------------------------------------------------------------------------------
