
ECO 515 (Summer 2020)

Worksheet 2
Name: Sarah Nuzhat Khan
ID: 20175008

Creating Dummy Variables

Using CPS data (in Ch.8), we tabulated the years of education variable.

    yrseduc |      Freq.     Percent        Cum.
------------+-----------------------------------
          6 |        803        1.27        1.27
          8 |        589        0.93        2.20
          9 |        742        1.17        3.38
         10 |        729        1.15        4.53
         11 |      1,009        1.60        6.13
         12 |     18,616       29.46       35.59
         13 |     11,628       18.40       53.99
         14 |      7,045       11.15       65.13
         16 |     14,407       22.80       87.93
         18 |      5,594        8.85       96.78
         19 |      1,084        1.72       98.50
         20 |        949        1.50      100.00
------------+-----------------------------------
      Total |     63,195      100.00

Then we created five categorical variables (some_highsch, highsch, some_college,
college and post_college) by using the gen and replace commands.

. gen some_highsch=0

. replace some_highsch=1 if yrseduc<12
(3872 real changes made)

. gen highsch=0

. replace highsch=1 if yrseduc==12
(18616 real changes made)

. tab yrseduc if yrseduc==12

    yrseduc |      Freq.     Percent        Cum.
------------+-----------------------------------
         12 |     18,616      100.00      100.00
------------+-----------------------------------
      Total |     18,616      100.00

. gen some_college=0

. replace some_college=1 if yrseduc>12 & yrseduc<16
(18673 real changes made)

. gen college=0

. replace college=1 if yrseduc==16
(14407 real changes made)

. gen post_college=0

. replace post_college=1 if yrseduc>16
(7627 real changes made)
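
The five gen/replace pairs can be cross-checked with one categorical variable and a consistency test. This is only a sketch: it assumes the same CPS data are in memory, and educat and ndummies are illustrative names that do not appear in the worksheet.

```stata
* Assumes the worksheet's CPS data are loaded and the five dummies exist.
* educat and ndummies are hypothetical names introduced for this check.

* One categorical variable with the same five education groups
recode yrseduc (min/11 = 1) (12 = 2) (13/15 = 3) (16 = 4) (17/max = 5), gen(educat)
tab educat

* Every worker should belong to exactly one group
egen ndummies = rowtotal(some_highsch highsch some_college college post_college)
assert ndummies == 1
```

The change counts reported above add up to the sample size (3,872 + 18,616 + 18,673 + 14,407 + 7,627 = 63,195), which is what the assert is meant to confirm observation by observation.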
a) What are these categorical variables popularly known as in Econometrics?

Ans: They are popularly known as dummy variables.

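. reg ahe highsch college some_college post_college

      Source |       SS       df       MS              Number of obs =   63195
-------------+------------------------------           F(  4, 63190) = 3960.65
       Model |   2343618.8     4    585904.7           Prob > F      =  0.0000
    Residual |  9347783.65 63190  147.931376           R-squared     =  0.2005
-------------+------------------------------           Adj R-squared =  0.2004
       Total |  11691402.4 63194  185.008109           Root MSE      =  12.163

------------------------------------------------------------------------------
         ahe |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     highsch |   4.490099   .2148299    20.90   0.000     4.069032    4.911166
     college |   14.71009   .2201668    66.81   0.000     14.27856    15.14162
some_college |   7.328132   .2147734    34.12   0.000     6.907176    7.749089
post_college |   21.22407   .2400024    88.43   0.000     20.75366    21.69447
       _cons |   12.64156   .1954621    64.68   0.000     12.25846    13.02467
------------------------------------------------------------------------------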

. sum ahe if yrseduc<12

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         ahe |      3872    12.64156    7.125179          2         85

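b) Why did we omit the some_highsch variable?

Ans: Including all five dummies together with the intercept would create perfect
multicollinearity (the dummy variable trap), because the five dummies add up to one for
every observation and so duplicate the constant term. Therefore, some_highsch is omitted.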
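
The dummy variable trap can be seen directly in Stata. The sketch below assumes the five dummies created earlier are still in memory; it is an illustration, not part of the original worksheet.

```stata
* Assumes ahe and the five education dummies are in memory.

* All five dummies plus the constant: the dummies sum to 1, which duplicates
* the constant, so Stata drops one regressor and reports it as omitted.
reg ahe some_highsch highsch some_college college post_college

* Dropping the constant instead keeps all five dummies; each coefficient
* is then the mean of ahe within that group.
reg ahe some_highsch highsch some_college college post_college, noconstant
```

In the noconstant version the R-squared is computed differently (it is uncentered), so it should not be compared with the 0.2005 reported above.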

c) What is the ‘reference group’ here?

Ans: Here some_highsch is the reference group.

d) Compare the intercept with the mean of ahe for the some_highsch group. What do you
observe?

Ans: The mean of ahe for the some_highsch group (12.64156) is equal to the intercept
(_cons = 12.64156). The intercept of the regression and the mean of the omitted category
are the same, so the intercept represents the mean of the omitted group.

e) What can you say about the intercept of a regression with only the dummy variables of
a category such as educational levels (i.e. no other independent variables)?

Ans: The intercept equals the mean of the dependent variable for the omitted
(reference) group.
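
The reason can be written out from the regression equation. The beta symbols below are notation introduced here for illustration; the worksheet itself only reports the estimates.

```latex
\begin{align*}
ahe &= \beta_0 + \beta_1\,\mathit{highsch} + \beta_2\,\mathit{some\_college}
       + \beta_3\,\mathit{college} + \beta_4\,\mathit{post\_college} + u \\
E(ahe \mid \mathit{some\_highsch} = 1) &= \beta_0
       \quad \text{(all four included dummies equal 0)} \\
E(ahe \mid \mathit{highsch} = 1) &= \beta_0 + \beta_1
\end{align*}
```

With no other regressors, the OLS estimate of the intercept is the sample mean of ahe in the reference group, which is why _cons and the mean from sum ahe if yrseduc<12 are both 12.64156.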

f) What do the coefficients of highsch, some_college, college and post_college mean?
Note: ahe is average hourly earnings in 2004 dollars.

Estimated coefficient of highsch = $4.49 = E(ahe|highsch) - E(ahe|some_highsch).
This implies that a high school graduate earns on average $4.49 more per hour than a
high school dropout.

Estimated coefficient of some_college = $7.33 = E(ahe|some_college) -
E(ahe|some_highsch). This implies that a worker with some college earns on average $7.33
more per hour than a high school dropout.

Estimated coefficient of college = $14.71 = E(ahe|college) - E(ahe|some_highsch).
This implies that a college graduate earns on average $14.71 more per hour than a
high school dropout.

Estimated coefficient of post_college = $21.22 = E(ahe|post_college) -
E(ahe|some_highsch). This implies that a worker with post-college education earns on
average $21.22 more per hour than a high school dropout.
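
Because every coefficient is a difference from the reference group, changing the reference group changes what the coefficients measure. A minimal sketch using Stata's factor-variable notation follows; it assumes the CPS data are in memory, and educ5 is a hypothetical variable name introduced only for this example.

```stata
* educ5 is a hypothetical name: 1 = some high school, 2 = high school,
* 3 = some college, 4 = college, 5 = post-college.
recode yrseduc (min/11 = 1) (12 = 2) (13/15 = 3) (16 = 4) (17/max = 5), gen(educ5)

* Group 1 (some high school) as the base: same comparison as the worksheet
reg ahe ib1.educ5

* Group 2 (high school graduates) as the base: each coefficient is now a
* difference from the high school mean instead
reg ahe ib2.educ5
```

In the second regression the coefficient on group 1 should be the worksheet's highsch coefficient with the sign reversed (about -4.49), since it now measures dropouts relative to high school graduates.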

g) What is the estimated average wage rate of high school graduates? College graduates?
Post-college graduates?

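. sum ahe if highsch==1

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         ahe |     18616    17.13166    9.455924   2.003205   109.8901

. sum ahe if college==1

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         ahe |     14407    27.35165    14.95727   2.003205   116.6436

. sum ahe if post_college==1

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         ahe |      7627    33.86563    16.86093   2.403846    131.224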

Estimated average wage rate of a high school grad is $17.13 per hour.
Estimated average wage rate of a college grad is $27.35 per hour.
Estimated average wage rate of a post-college grad is $33.87 per hour.
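
The same group means can be rebuilt from the regression as intercept plus coefficient. The sketch below uses the estimates reported in the regression table and assumes the dummies are in memory for the lincom lines.

```stata
* Group means from the reported estimates: _cons + coefficient
* high school
display 12.64156 + 4.490099
* college
display 12.64156 + 14.71009
* post-college
display 12.64156 + 21.22407

* The same sums with standard errors, run right after the regression
reg ahe highsch college some_college post_college
lincom _cons + highsch
lincom _cons + college
lincom _cons + post_college
```

The three sums are 17.131659, 27.35165 and 33.86563, which agree with the means from the sum commands up to rounding.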

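h) Which of these categorical variables are statistically significant? How can you tell?

Ans: All four dummy variables in the regression are statistically significant. Each has
a p-value of 0.000, which is below the 1% significance level; the t-statistics range from
20.90 to 88.43, and none of the 95% confidence intervals contains zero.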
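
The t-statistic and p-value behind this answer can be reproduced from the reported numbers. This is a sketch: the first two lines only need the figures in the regression table, while the joint test assumes the dummies are in memory.

```stata
* t statistic for highsch: coefficient divided by its standard error
display 4.490099 / .2148299

* Two-sided p-value with 63190 residual degrees of freedom
display 2 * ttail(63190, 4.490099 / .2148299)

* Joint test that all four education dummies are zero
reg ahe highsch college some_college post_college
test highsch college some_college post_college
```

Since the four dummies are the only regressors, the joint test should reproduce the overall F statistic reported at the top of the regression output.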

i) Can we infer from these estimated results that average earnings of more-educated
workers exceed those of less educated workers? Explain.

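Ans: Yes. The estimated coefficients rise with each level of education ($4.49 for
highsch, $7.33 for some_college, $14.71 for college and $21.22 for post_college, all
relative to high school dropouts), and their 95% confidence intervals do not overlap.
The group means rise accordingly, from $12.64 for dropouts to $33.87 for post-college
graduates. This implies that, in this sample, average earnings of more-educated workers
exceed those of less-educated workers.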
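
Each coefficient is a comparison with dropouts only, so a more direct check of this answer compares adjacent education levels with each other. A minimal sketch, assuming the dummies are in memory:

```stata
* Differences between adjacent education levels, with standard errors
reg ahe highsch college some_college post_college
lincom some_college - highsch
lincom college - some_college
lincom post_college - college
```

From the reported coefficients the three point estimates are 2.838033, 7.381958 and 6.51398 dollars per hour; lincom adds a standard error and confidence interval for each difference.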
