
Econ 3044: Introduction to Econometrics

Chapter-4: MLR: Further Issues and Dummy Variables

Lemi Taye

Addis Ababa University


lemi.taye@aau.edu.et

December 28, 2019

Lemi Taye (AAUSC) Ch 4: MLR: Further Issues and Dummy VariablesDecember 28, 2019 1 / 43
Overview

1 More on Functional Form

2 MLR with Qualitative Information

More on Using Logarithmic Functional Forms

We begin by reviewing how to interpret the parameters in the following model, which relates the median housing price (price) in the community to various community characteristics: nox is the amount of nitrogen oxide in the air; rooms is the average number of rooms in houses in the community.

$$\log(price) = \beta_0 + \beta_1 \log(nox) + \beta_2\, rooms + u. \tag{1}$$

The coefficient β1 is the elasticity of price with respect to nox (pollution).
The coefficient β2 is the change in log(price) when ∆rooms = 1; as we have seen many times, when multiplied by 100, this is the approximate percentage change in price.


When estimated using the data in HPRICE2, we obtain

$$\widehat{\log(price)} = \underset{(.19)}{9.23} - \underset{(.066)}{.718}\,\log(nox) + \underset{(.019)}{.306}\, rooms \tag{2}$$

n = 506, R² = .514.

Thus, when nox increases by 1%, price falls by .718%, holding only
rooms fixed.
When rooms increases by one, price increases by approximately
100(.306) = 30.6%.
The estimate that one more room increases price by about 30.6%
turns out to be somewhat inaccurate for this application.
The approximation error occurs because, as the change in log(y)
becomes larger and larger, the approximation %∆y ≈ 100 · ∆ log(y)
becomes more and more inaccurate.

Fortunately, a simple calculation is available to compute the exact percentage change.
To describe the procedure, we consider the general estimated model

$$\widehat{\log(y)} = \hat\beta_0 + \hat\beta_1 \log(x_1) + \hat\beta_2 x_2.$$

Now, fixing x1, we have

$$\Delta\widehat{\log(y)} = \hat\beta_2\, \Delta x_2.$$
Using simple algebraic properties of the exponential and logarithmic functions gives the exact percentage change in the predicted y as

$$\%\Delta\hat{y} = 100\cdot[\exp(\hat\beta_2 \Delta x_2) - 1], \tag{3}$$

where the multiplication by 100 turns the proportionate change into a percentage change.


When ∆x2 = 1,

$$\%\Delta\hat{y} = 100\cdot[\exp(\hat\beta_2) - 1]. \tag{4}$$

Applied to the housing price example with x2 = rooms and β̂2 = .306, $\%\Delta\widehat{price} = 100[\exp(.306) - 1] = 35.8\%$, which is notably larger than the approximate percentage change, 30.6%, obtained directly from (2).
The adjustment in equation (3) is not as crucial for small percentage
changes.
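To see how quickly the approximation breaks down, here is a minimal Python sketch of equations (3) and (4). The function names are illustrative (they are not from the original), and the only input taken from the slides is β̂2 = .306; the value .01 is a hypothetical small coefficient used to show that the two formulas nearly agree for small changes.

```python
import math


def approx_pct_change(beta: float, dx: float = 1.0) -> float:
    """Approximate % change in predicted y: 100 * beta * dx."""
    return 100.0 * beta * dx


def exact_pct_change(beta: float, dx: float = 1.0) -> float:
    """Exact % change in predicted y, equation (3): 100 * [exp(beta * dx) - 1]."""
    return 100.0 * (math.exp(beta * dx) - 1.0)


if __name__ == "__main__":
    # .306 is the coefficient on rooms in (2); .01 is a hypothetical small coefficient.
    for beta in (0.306, 0.01):
        print(f"beta={beta}: approx={approx_pct_change(beta):.1f}%, "
              f"exact={exact_pct_change(beta):.1f}%")
```

Running it prints 30.6% versus 35.8% for rooms, and 1.0% for both formulas with the small coefficient.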

Models with Quadratics

Quadratic functions are also used quite often in applied economics to capture decreasing or increasing marginal effects.
In the simplest case, y depends on a single observed factor x, but it does so in a quadratic fashion:

$$y = \beta_0 + \beta_1 x + \beta_2 x^2 + u.$$

For example, take y = wage and x = exper. As we discussed in Chapter 3, this model falls outside of simple regression analysis but is easily handled with multiple regression.
It is important to remember that β1 does not measure the change in y with respect to x; it makes no sense to hold x² fixed while changing x.


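If we write the estimated equation as

$$\hat{y} = \hat\beta_0 + \hat\beta_1 x + \hat\beta_2 x^2, \tag{5}$$

then we have the approximation

$$\Delta\hat{y} \approx (\hat\beta_1 + 2\hat\beta_2 x)\,\Delta x, \quad\text{so}\quad \Delta\hat{y}/\Delta x \approx \hat\beta_1 + 2\hat\beta_2 x. \tag{6}$$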

This says that the slope of the relationship between x and y depends
on the value of x; the estimated slope is β̂1 + 2β̂2 x.
If we plug in x = 0, we see that β̂1 can be interpreted as the
approximate slope in going from x = 0 to x = 1. After that, the
second term, 2β̂2 x, must be accounted for.


If we are only interested in computing the predicted change in y given a starting value for x and a change in x, we could use (5) directly: there is no reason to use the calculus approximation at all.
However, we are usually more interested in quickly summarizing the
effect of x on y, and the interpretation of β̂1 and β̂2 in equation (6)
provides that summary.
Typically, we might plug in the average value of x in the sample, or
some other interesting values, such as the median or the lower and
upper quartile values.
In many applications, β̂1 is positive and β̂2 is negative.


For example, using the wage data in WAGE1, we obtain

$$\widehat{wage} = \underset{(.35)}{3.73} + \underset{(.041)}{.298}\, exper - \underset{(.0009)}{.0061}\, exper^2 \tag{7}$$

n = 526, R² = .093.

This estimated equation implies that exper has a diminishing effect on wage.
When the coefficient on x is positive and the coefficient on x2 is
negative, the quadratic has a parabolic shape.
There is always a positive value of x where the effect of x on y is
zero; before this point, x has a positive effect on y; after this point, x
has a negative effect on y.
In practice, it can be important to know where this turning point is.


In the estimated equation (5) with β̂1 > 0 and β̂2 < 0, the turning point (or maximum of the function) is always achieved at the coefficient on x over twice the absolute value of the coefficient on x²:

$$x^* = \left|\hat\beta_1/(2\hat\beta_2)\right|. \tag{8}$$

In the wage example, x* = exper* is .298/[2(.0061)] ≈ 24.4. (Note how we just drop the minus sign on .0061 in doing this calculation.)
This quadratic relationship is illustrated in Figure 1.
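A short Python sketch of (6) and (8) applied to the estimates in (7). The exper values at which the slope is evaluated are arbitrary illustration points, not sample statistics (the sample mean of exper is not reported here), and the function names are hypothetical.

```python
def quadratic_slope(b1: float, b2: float, x: float) -> float:
    """Approximate slope of y-hat = b0 + b1*x + b2*x**2 at x, equation (6)."""
    return b1 + 2.0 * b2 * x


def turning_point(b1: float, b2: float) -> float:
    """Turning point x* = |b1 / (2*b2)|, equation (8); requires b2 != 0."""
    if b2 == 0:
        raise ValueError("no turning point when the quadratic coefficient is zero")
    return abs(b1 / (2.0 * b2))


# Estimates from equation (7): wage-hat = 3.73 + .298 exper - .0061 exper^2
B1, B2 = 0.298, -0.0061

if __name__ == "__main__":
    for exper in (0, 10, 20, 30):  # illustrative evaluation points
        print(f"exper={exper}: slope ~ {quadratic_slope(B1, B2, exper):.4f}")
    print(f"turning point: {turning_point(B1, B2):.1f}")
```

The slope is .298 at exper = 0, shrinks as exper grows, and turns negative after about 24.4 years.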


Figure 1: Quadratic relationship between $\widehat{wage}$ and exper.


When a model has a dependent variable in logarithmic form and an explanatory variable entering as a quadratic, some care is needed in reporting the partial effects.
The following example also shows that the quadratic can have a U-shape, rather than the inverted-U (parabolic) shape seen in the wage example.
A U-shape arises in equation (5) when β̂1 is negative and β̂2 is positive; this captures an effect of x on y that increases as x increases.


Example (Effects of pollution on Housing prices)


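We modify the housing price model to include a quadratic term in rooms:

$$\begin{aligned}\log(price) = {} & \beta_0 + \beta_1 \log(nox) + \beta_2 \log(dist) + \beta_3\, rooms \\ & + \beta_4\, rooms^2 + \beta_5\, stratio + u.\end{aligned} \tag{9}$$

The model estimated using the data in HPRICE2 is

$$\begin{aligned}\widehat{\log(price)} = {} & \underset{(.57)}{13.39} - \underset{(.115)}{.902}\,\log(nox) - \underset{(.043)}{.087}\,\log(dist) \\ & - \underset{(.165)}{.545}\, rooms + \underset{(.013)}{.062}\, rooms^2 - \underset{(.006)}{.048}\, stratio\end{aligned}$$

n = 506, R² = .603.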


Example (Effects of pollution on Housing prices (continued))

The quadratic term rooms² has a t statistic of about 4.77, and so it is very statistically significant. But what about interpreting the effect of rooms on log(price)? Initially, the effect appears to be strange. Because the coefficient on rooms is negative and the coefficient on rooms² is positive, this equation literally implies that, at low values of rooms, an additional room has a negative effect on log(price). At some point, the effect becomes positive, and the quadratic shape means that the semi-elasticity of price with respect to rooms is increasing as rooms increases. This situation is shown in Figure 2.

We obtain the turnaround value of rooms using equation (8) (even though β̂1 is negative and β̂2 is positive). The absolute value of the coefficient on rooms, .545, divided by twice the coefficient on rooms², .062, gives rooms* = .545/[2(.062)] ≈ 4.4; this point is labeled in Figure 2.
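In a log(y) model the quadratic gives a semi-elasticity that varies with rooms: 100(β̂3 + 2β̂4 rooms) is the approximate percentage change in price from one more room. The Python sketch below evaluates it with the estimates −.545 and .062 from (9); the evaluation points are illustrative and the function name is an assumption.

```python
def pct_effect_of_room(b_rooms: float, b_rooms2: float, rooms: float) -> float:
    """Approximate % change in price from one more room, evaluated at `rooms`:
    100 * (b_rooms + 2 * b_rooms2 * rooms), the semi-elasticity implied by (9)."""
    return 100.0 * (b_rooms + 2.0 * b_rooms2 * rooms)


# Coefficients on rooms and rooms^2 from the HPRICE2 estimate of (9)
B_ROOMS, B_ROOMS2 = -0.545, 0.062

if __name__ == "__main__":
    turn = abs(B_ROOMS / (2.0 * B_ROOMS2))
    print(f"turning point: {turn:.1f} rooms")
    for rooms in (4, 5, 6, 7):
        effect = pct_effect_of_room(B_ROOMS, B_ROOMS2, rooms)
        print(f"rooms={rooms}: ~{effect:.1f}% per extra room")
```

Under these estimates an extra room slightly lowers predicted price at rooms = 4, but raises it by roughly 7.5% at rooms = 5 and 19.9% at rooms = 6. These are calculus approximations; the exact formula (3) would give somewhat larger figures for the bigger positive effects.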


Example (Effects of pollution on Housing prices (continued))

Figure 2: $\widehat{\log(price)}$ as a quadratic function of rooms.

Models with Interaction Terms

Sometimes, it is natural for the partial effect, elasticity, or semi-elasticity of the dependent variable with respect to an explanatory variable to depend on the magnitude of yet another explanatory variable.
For example, in the model

$$price = \beta_0 + \beta_1\, sqrft + \beta_2\, bdrms + \beta_3\, sqrft\cdot bdrms + \beta_4\, bthrms + u,$$

the partial effect of bdrms on price (holding all other variables fixed) is

$$\frac{\Delta price}{\Delta bdrms} = \beta_2 + \beta_3\, sqrft. \tag{10}$$

If β3 > 0, then (10) implies that an additional bedroom yields a higher increase in housing price for larger houses.


In other words, there is an interaction effect between square footage and number of bedrooms.
The parameters on the original variables can be tricky to interpret when we include an interaction term.
In summarizing the effect of bdrms on price, we must evaluate (10) at interesting values of sqrft, such as the mean value, or the lower and upper quartiles in the sample.
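A Python sketch of summarizing (10) at the sample mean and quartiles of sqrft. The slides give no estimates of β2 and β3, so the coefficients (−20 and .01) and the five square-footage values in the usage example are purely hypothetical; only the procedure is the point. It uses `statistics.quantiles` from the standard library (Python 3.8+) with the inclusive method, which is one of several quartile conventions.

```python
from __future__ import annotations

from statistics import mean, quantiles


def bdrms_effect(beta2: float, beta3: float, sqrft: float) -> float:
    """Partial effect of one more bedroom on price at a given sqrft, equation (10)."""
    return beta2 + beta3 * sqrft


def summarize_bdrms_effect(beta2: float, beta3: float,
                           sqrft_sample: list[float]) -> dict[str, float]:
    """Evaluate (10) at the lower quartile, mean, and upper quartile of sqrft."""
    q1, _median, q3 = quantiles(sqrft_sample, n=4, method="inclusive")
    return {
        "lower quartile": bdrms_effect(beta2, beta3, q1),
        "mean": bdrms_effect(beta2, beta3, mean(sqrft_sample)),
        "upper quartile": bdrms_effect(beta2, beta3, q3),
    }


if __name__ == "__main__":
    # Hypothetical coefficients and square footages, for illustration only.
    effects = summarize_bdrms_effect(-20.0, 0.01, [1000, 1500, 2000, 2500, 3000])
    for label, value in effects.items():
        print(f"{label}: {value:.1f}")
```

With these made-up numbers the bedroom effect is −5 at the lower quartile, 0 at the mean, and +5 at the upper quartile, so a single number would misrepresent it.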


Example (Effects of attendance on Final Exam performance)


A model to explain the standardized outcome on a final exam (stndfnl) in terms of percentage of classes attended, prior college grade point average, and ACT score is

$$\begin{aligned}stndfnl = {} & \beta_0 + \beta_1\, atndrte + \beta_2\, priGPA + \beta_3\, ACT + \beta_4\, priGPA^2 \\ & + \beta_5\, ACT^2 + \beta_6\, priGPA\cdot atndrte + u.\end{aligned} \tag{11}$$

(We use the standardized exam score to interpret a student's performance relative to the rest of the class.) In addition to quadratics in priGPA and ACT, this model includes an interaction between priGPA and the attendance rate. The idea is that class attendance might have a different effect for students who have performed differently in the past, as measured by priGPA. We are interested in the effects of attendance on final exam score: ∆stndfnl/∆atndrte = β1 + β6 priGPA.


Example (Effects of attendance on Final Exam performance (continued))
Using the 680 observations in ATTEND, for students in a course on microeconomic principles, the estimated equation is

$$\begin{aligned}\widehat{stndfnl} = {} & \underset{(1.36)}{2.05} - \underset{(.0102)}{.0067}\, atndrte - \underset{(.48)}{1.63}\, priGPA - \underset{(.098)}{.128}\, ACT \\ & + \underset{(.101)}{.296}\, priGPA^2 + \underset{(.0022)}{.0045}\, ACT^2 + \underset{(.0043)}{.0056}\, priGPA\cdot atndrte\end{aligned}$$

n = 680, R² = .229, R̄² = .222.

We must interpret this equation with extreme care. If we simply look at the
coefficient on atndrte, we will incorrectly conclude that attendance has a
negative effect on final exam score.

Example (Effects of attendance on Final Exam performance (continued))
But this coefficient supposedly measures the effect when priGPA = 0, which is not interesting (in this sample, the smallest prior GPA is about .86). We must also take care not to look separately at the estimates of β1 and β6 and conclude that, because each t statistic is insignificant, we cannot reject H0: β1 = 0, β6 = 0. In fact, the p-value for the F test of this joint hypothesis is .014, so we certainly reject H0 at the 5% level.

How should we estimate the partial effect of atndrte on stndfnl? We must plug in interesting values of priGPA to obtain the partial effect. The mean value of priGPA in the sample is 2.59, so at the mean priGPA, the effect of atndrte on stndfnl is −.0067 + .0056(2.59) ≈ .0078. What does this mean? Because atndrte is measured as a percentage, it means that a 10 percentage point increase in atndrte increases $\widehat{stndfnl}$ by .078 standard deviations from the mean final exam score.
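The calculation above can be sketched in Python, evaluating β̂1 + β̂6 priGPA at the sample mean (2.59) and near the sample minimum (about .86) using the reported estimates. The helper name is hypothetical. A standard error for the partial effect at a given priGPA would also require the estimated covariance between β̂1 and β̂6, which is not reported here, so the sketch gives point estimates only.

```python
def attendance_effect(b_atndrte: float, b_interact: float, pri_gpa: float) -> float:
    """Partial effect of atndrte on stndfnl at a given priGPA: b1 + b6 * priGPA."""
    return b_atndrte + b_interact * pri_gpa


# Estimates from the ATTEND equation above
B1, B6 = -0.0067, 0.0056

if __name__ == "__main__":
    for gpa in (0.86, 2.59):  # roughly the sample minimum, and the sample mean
        per_point = attendance_effect(B1, B6, gpa)
        print(f"priGPA={gpa}: {per_point:.4f} sd per point, "
              f"{10 * per_point:.3f} sd per 10 points of atndrte")
```

At the mean it reproduces .0078 per percentage point (.078 per 10 points); near the minimum priGPA the point estimate is slightly negative, which shows how strongly the effect depends on priGPA.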
Describing Qualitative Information

Qualitative factors often come in the form of binary information (e.g., gender, PC ownership, marital status).
The relevant information can be captured by defining a binary variable or a zero-one variable.
In econometrics, binary variables are most commonly called dummy variables.
In defining a dummy variable, we must decide which event is assigned the value one and which is assigned the value zero.
For example, in a study of individual wage determination, we might define female to be a binary variable taking on the value one for females and the value zero for males.
The name in this case indicates the event with the value one.

A Single Dummy Independent Variable

To incorporate binary information with only a single dummy explanatory variable, we just add it as an independent variable in the equation.
For example, consider the following simple model of hourly wage determination:

$$wage = \beta_0 + \delta_0\, female + \beta_1\, educ + u. \tag{12}$$

In model (12), only two observed factors affect wage: gender and education.
Because female = 1 when the person is female, and female = 0 when the person is male, the parameter δ0 has the following interpretation: δ0 is the difference in hourly wage between females and males, given the same amount of education.


Thus, the coefficient δ0 determines whether there is discrimination against women: if δ0 < 0, then for the same level of other factors, women earn less than men on average.
In terms of expectations, if we make the zero conditional mean assumption E(u|female, educ) = 0, then

$$\delta_0 = E(wage \mid female = 1, educ) - E(wage \mid female = 0, educ).$$

Because female = 1 corresponds to females and female = 0 corresponds to males, we can write this more simply as

$$\delta_0 = E(wage \mid female, educ) - E(wage \mid male, educ). \tag{13}$$

The key here is that the level of education is the same in both expectations; the difference, δ0, is due to gender only.


The situation can be depicted graphically as an intercept shift between males and females.
In Figure 3, the case δ0 < 0 is shown, so that men earn a fixed
amount more per hour than women.
The difference does not depend on the amount of education, and this
explains why the wage-education profiles for women and men are
parallel.


Figure 3: Graph of wage = β0 + δ0 female + β1 educ for δ0 < 0.


At this point, you may wonder why we do not also include in (12) a
dummy variable, say male, which is one for males and zero for females.
This would be redundant. In (12), the intercept for males is β0 , and
the intercept for females is β0 + δ0 . Because there are just two
groups, we only need two different intercepts.
This means that, in addition to β0 , we need to use only one dummy
variable; we have chosen to include the dummy variable for females.
Using two dummy variables would introduce perfect collinearity
because female + male = 1, which means that male is a perfect
linear function of female.
Including dummy variables for both genders is the simplest example of
the so-called dummy variable trap, which arises when too many
dummy variables describe a given number of groups.
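To make the dummy variable trap concrete, here is a small Python sketch with a hypothetical four-person sample. With an intercept, female, and male all included, the intercept column equals female + male in every row, so X'X is singular (determinant zero) and OLS has no unique solution; dropping male restores a nonsingular X'X. The helper functions are written out by hand only to keep the example dependency-free.

```python
from __future__ import annotations


def xtx(rows: list[list[int]]) -> list[list[int]]:
    """Compute X'X for a small design matrix given as a list of rows."""
    k = len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]


def det2(m: list[list[int]]) -> int:
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def det3(m: list[list[int]]) -> int:
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


# Hypothetical sample of four people: 1 = female, 0 = male.
female = [1, 0, 1, 0]
male = [1 - f for f in female]

# Trap: columns are intercept, female, male (intercept = female + male in every row).
X = [[1, f, m] for f, m in zip(female, male)]
# Fix: keep the intercept and a single dummy; males become the base group.
X_OK = [[1, f] for f in female]

if __name__ == "__main__":
    print("with both dummies, det(X'X) =", det3(xtx(X)))
    print("with female only,  det(X'X) =", det2(xtx(X_OK)))
```

The first determinant is 0 and the second is 4, matching the argument that one dummy plus the intercept is enough for two groups.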


In (12), we have chosen males to be the base group or benchmark group, that is, the group against which comparisons are made.
This is why β0 is the intercept for males, and δ0 is the difference in
intercepts between females and males.
We could choose females as the base group by writing the model as

$$wage = \alpha_0 + \gamma_0\, male + \beta_1\, educ + u,$$

where the intercept for females is α0 and the intercept for males is α0 + γ0; this implies that α0 = β0 + δ0 and α0 + γ0 = β0.
In any application, it does not matter how we choose the base group,
but it is important to keep track of which group is the base group.


Nothing much changes when more explanatory variables are involved.
Taking males as the base group, a model that controls for experience and tenure in addition to education is

$$wage = \beta_0 + \delta_0\, female + \beta_1\, educ + \beta_2\, exper + \beta_3\, tenure + u. \tag{14}$$

If educ, exper, and tenure are all relevant productivity characteristics, the null hypothesis of no difference between men and women is H0: δ0 = 0.
The alternative that there is discrimination against women is H1: δ0 < 0.
To test for wage discrimination, we just estimate the model by OLS, exactly as before, and use the usual t statistic.


Example (Hourly Wage Equation)


Using the data in WAGE1, we estimate model (14). For now, we use wage, rather than log(wage), as the dependent variable:

$$\widehat{wage} = \underset{(.72)}{-1.57} - \underset{(.26)}{1.81}\, female + \underset{(.049)}{.572}\, educ + \underset{(.012)}{.025}\, exper + \underset{(.021)}{.141}\, tenure$$

n = 526, R² = .364.

The negative intercept (the intercept for men, in this case) is not very meaningful because no one has zero values for all of educ, exper, and tenure in the sample. The coefficient on female is interesting because it measures the average difference in hourly wage between a man and a woman who have the same levels of educ, exper, and tenure. If we take a woman and a man with the same levels of education, experience, and tenure, the woman earns, on average, $1.81 less per hour than the man.
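The Python sketch below uses the estimated equation to compare a woman and a man with identical educ, exper, and tenure; because the other terms cancel, the gap is −1.81 at every education level, which is the parallel-lines picture of Figure 3. It also forms the t statistic for H0: δ0 = 0 against H1: δ0 < 0 from the reported standard error. The covariate values are illustrative, the helper names are hypothetical, and the critical value uses the standard normal as an approximation to the t distribution with 521 degrees of freedom (526 observations minus 5 estimated parameters). The exper coefficient is entered as .025; its value has no effect on the gap.

```python
from statistics import NormalDist

# Coefficients and the female standard error from the WAGE1 estimate of model (14)
COEF = {"const": -1.57, "female": -1.81, "educ": 0.572, "exper": 0.025, "tenure": 0.141}
SE_FEMALE = 0.26


def predicted_wage(female: int, educ: float, exper: float, tenure: float) -> float:
    """Fitted hourly wage from the estimated equation for one person."""
    return (COEF["const"] + COEF["female"] * female + COEF["educ"] * educ
            + COEF["exper"] * exper + COEF["tenure"] * tenure)


if __name__ == "__main__":
    # Same educ, exper, tenure for both people (values chosen for illustration)
    for educ in (8, 12, 16):
        gap = predicted_wage(1, educ, 10, 5) - predicted_wage(0, educ, 10, 5)
        print(f"educ={educ}: woman minus man = {gap:.2f} dollars per hour")

    # One-sided test of H0: delta0 = 0 against H1: delta0 < 0
    t_stat = COEF["female"] / SE_FEMALE
    crit = NormalDist().inv_cdf(0.05)  # normal approximation to the t critical value
    print(f"t = {t_stat:.2f}, 5% critical value ~ {crit:.3f}, reject H0: {t_stat < crit}")
```

The t statistic is about −6.96, far below the one-sided 5% critical value of about −1.645, so H0 is rejected.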

A common specification in applied work has the dependent variable appearing in logarithmic form, with one or more dummy variables appearing as independent variables.
In this case, the coefficients have a percentage interpretation.

Example (Housing price Regression)


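Using the data in HPRICE1, we obtain the equation

$$\begin{aligned}\widehat{\log(price)} = {} & \underset{(.65)}{-1.35} + \underset{(.038)}{.168}\,\log(lotsize) + \underset{(.093)}{.707}\,\log(sqrft) \\ & + \underset{(.029)}{.027}\, bdrms + \underset{(.045)}{.054}\, colonial\end{aligned}$$

n = 88, R² = .649.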


Example (Housing price Regression (continued))

All the variables are self-explanatory except colonial, which is a binary variable equal to one if the house is of the colonial style. What does the coefficient on colonial mean? For given levels of lotsize, sqrft, and bdrms, the difference in log(price) between a house of colonial style and that of another style is .054. This means that a colonial-style house is predicted to sell for about 5.4% more, holding other factors fixed.

This example shows that, when log(y) is the dependent variable in a model, the coefficient on a dummy variable, when multiplied by 100, is interpreted as the (approximate) percentage difference in y, holding all other factors fixed.


When the coefficient on a dummy variable suggests a large proportionate change in y, the exact percentage difference can be obtained in the same way as the semi-elasticity calculation in (4).

Example (Log Hourly Wage Equation)

Let us reestimate the wage equation, using log(wage) as the dependent variable and adding quadratics in exper and tenure:

$$\begin{aligned}\widehat{\log(wage)} = {} & \underset{(.100)}{.417} - \underset{(.055)}{.297}\, female + \underset{(.058)}{.080}\, educ + \underset{(.005)}{.029}\, exper \\ & - \underset{(.00010)}{.00058}\, exper^2 + \underset{(.007)}{.032}\, tenure - \underset{(.00023)}{.00059}\, tenure^2\end{aligned}$$

n = 526, R² = .441.

Example (Log Hourly Wage Equation (continued))
Using the approximation 100 · ∆log(y) ≈ %∆y, the coefficient on female implies that, for the same levels of educ, exper, and tenure, women earn about 100(.297) = 29.7% less than men. We can do better than this by computing the exact percentage difference in predicted wages. What we want is the proportionate difference in wages between females and males, holding other factors fixed: $(\widehat{wage}_F - \widehat{wage}_M)/\widehat{wage}_M$. What we have from the estimated model is

$$\log(\widehat{wage}_F) - \log(\widehat{wage}_M) = -.297.$$

Exponentiating and subtracting one gives

$$(\widehat{wage}_F - \widehat{wage}_M)/\widehat{wage}_M = \exp(-.297) - 1 \approx -.257.$$

This more accurate estimate implies that a woman's wage is, on average, 25.7% below a comparable man's wage.
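Applying the exponential adjustment from (4) to both dummy coefficients reported so far shows when it matters. This Python sketch is a minimal illustration; the function name is hypothetical and the two coefficients are the colonial and female estimates from the log(price) and log(wage) equations above.

```python
import math


def exact_pct_diff(coef: float) -> float:
    """Exact % difference in predicted y for a dummy in a log(y) model: 100*[exp(coef) - 1]."""
    return 100.0 * (math.exp(coef) - 1.0)


# Dummy coefficients reported above (log-price and log-wage equations)
DUMMIES = {"colonial": 0.054, "female": -0.297}

if __name__ == "__main__":
    for name, coef in DUMMIES.items():
        print(f"{name}: approx {100 * coef:.1f}%, exact {exact_pct_diff(coef):.1f}%")
```

For colonial the approximate and exact figures are about 5.4% and 5.5%, while for female they are −29.7% and −25.7%: the adjustment is negligible for the small coefficient and substantial for the large one.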
Using Dummy Variables for Multiple Categories

We can use several dummy independent variables in the same equation. For example, we could add the dummy variable married to the log(wage) equation estimated above.
The coefficient on married gives the (approximate) proportional differential in wages between those who are and are not married, holding gender, educ, exper, and tenure fixed.

Example (Log Hourly Wage Equation)
Let us estimate a model that allows for wage differences among four groups: married men, married women, single men, and single women. To do this, we must select a base group; we choose single men. Then, we must define dummy variables for each of the remaining groups. Call these marrmale, marrfem, and singfem. Putting these three variables into the log(wage) equation with quadratics in exper and tenure (and, of course, dropping female, since it is now redundant) gives

$$\begin{aligned}\widehat{\log(wage)} = {} & \underset{(.100)}{.321} + \underset{(.055)}{.213}\, marrmale - \underset{(.058)}{.198}\, marrfem \\ & - \underset{(.056)}{.110}\, singfem + \underset{(.007)}{.079}\, educ + \underset{(.005)}{.027}\, exper - \underset{(.00011)}{.00057}\, exper^2 \\ & + \underset{(.007)}{.029}\, tenure - \underset{(.00023)}{.00053}\, tenure^2\end{aligned}$$

n = 526, R² = .461.


Example (Log Hourly Wage Equation (continued))

All of the coefficients, with the exception of singfem, have t statistics well above two in absolute value. The t statistic for singfem is about −1.96, which is just significant at the 5% level against a two-sided alternative.

To interpret the coefficients on the dummy variables, we must remember that the base group is single males. Thus, the estimates on the three dummy variables measure the proportionate difference in wage relative to single males. For example, married men are estimated to earn about 21.3% more than single men, holding levels of education, experience, and tenure fixed. A married woman, on the other hand, earns a predicted 19.8% less than a single man with the same levels of the other variables.
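Because every dummy coefficient is measured relative to single men, comparisons between two non-base groups come from differences of coefficients. The Python sketch below computes approximate and exact percentage gaps from the reported estimates; the dictionary and function names are illustrative. Testing whether a non-base difference is significant would require the covariance of the two estimates (or re-estimating with a different base group), which the sketch does not attempt.

```python
import math

# Dummy coefficients relative to the base group (single men), from the estimate above
GROUP_COEF = {"single men": 0.0, "married men": 0.213,
              "married women": -0.198, "single women": -0.110}


def exact_pct(log_gap: float) -> float:
    """Exact % difference in predicted wage implied by a log-point gap."""
    return 100.0 * (math.exp(log_gap) - 1.0)


def gap(group: str, reference: str) -> float:
    """Log-point wage gap between two groups, holding educ, exper, tenure fixed."""
    return GROUP_COEF[group] - GROUP_COEF[reference]


if __name__ == "__main__":
    for g in ("married men", "married women", "single women"):
        d = gap(g, "single men")
        print(f"{g} vs single men: approx {100 * d:.1f}%, exact {exact_pct(d):.1f}%")
    # Comparison between two non-base groups: difference of their coefficients
    d = gap("married women", "single women")
    print(f"married women vs single women: approx {100 * d:.1f}%")
```

Married women are estimated to earn about 8.8% less than single women (−.198 − (−.110) = −.088), and the exact gap for married men relative to single men is about 23.7% rather than 21.3%.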

Interactions Involving Dummy Variables

We have effectively seen interactions among dummy variables in the previous example, where we defined four categories based on marital status and gender.
In fact, we can recast that model by adding an interaction term between female and married to the model where female and married appear separately.
The estimated model with the female-married interaction term is

$$\widehat{\log(wage)} = \underset{(.100)}{.321} - \underset{(.056)}{.110}\, female + \underset{(.055)}{.213}\, married - \underset{(.072)}{.301}\, female\cdot married + \cdots, \tag{15}$$

where the rest of the regression is necessarily identical to the previous result.

Equation (15) shows explicitly that there is a statistically significant interaction between gender and marital status.
This model also allows us to obtain the estimated wage differential among all four groups, but here we must be careful to plug in the correct combination of zeros and ones.
Setting female = 0 and married = 0 corresponds to the group single men, which is the base group, since this eliminates female, married, and female × married.
We can find the intercept for married men by setting f emale = 0 and
married = 1 in (15); this gives an intercept of .321 + .213 = .534,
and so on.
Equation (15) is just a different way of finding wage differentials
across all gender-marital status combinations.
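Plugging every zero-one combination into (15) can be automated. This Python sketch, with hypothetical helper names, computes the four group intercepts; they should match the earlier four-category regression (single women: .321 − .110 = .211; married women: .321 − .198 = .123), confirming that the two specifications describe the same model.

```python
# Coefficients from equation (15)
B0, B_FEMALE, B_MARRIED, B_INTERACT = 0.321, -0.110, 0.213, -0.301


def intercept(female: int, married: int) -> float:
    """Group intercept implied by (15) for a given 0/1 combination."""
    return B0 + B_FEMALE * female + B_MARRIED * married + B_INTERACT * female * married


if __name__ == "__main__":
    for female in (0, 1):
        for married in (0, 1):
            label = f"{'married' if married else 'single'} {'women' if female else 'men'}"
            print(f"{label}: {intercept(female, married):.3f}")
```

It prints .321 for single men, .534 for married men, .211 for single women, and .123 for married women.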


Example (Effects of Computer Usage on Wages)


Krueger (1993) estimates the effects of computer usage on wages. He defines a dummy variable, which we call compwork, equal to one if an individual uses a computer at work. Another dummy variable, comphome, equals one if the person uses a computer at home. Using 13,379 people from the 1989 Current Population Survey, Krueger (1993, Table 4) obtains

$$\widehat{\log(wage)} = \hat\beta_0 + \underset{(.009)}{.177}\, compwork + \underset{(.019)}{.070}\, comphome + \underset{(.023)}{.017}\, compwork\cdot comphome + \text{other factors}.$$

(The other factors are the standard ones for wage regressions, including education, experience, gender, and marital status; see Krueger's paper for the exact list.)

Example (Effects of Computer Usage on Wages (continued))


Krueger does not report the intercept because it is not of any importance;
all we need to know is that the base group consists of people who do not
use a computer at home or at work.

It is worth noticing that the estimated return to using a computer at work


(but not at home) is about 17.7%. (The more precise estimate is 19.4%.)
Similarly, people who use computers at home but not at work have about a
7% wage premium over those who do not use a computer at all. The
differential between those who use a computer at both places, relative to
those who use a computer in neither place, is about 26.4% (obtained by
adding all three coefficients and multiplying by 100), or the more precise
estimate 30.2% obtained from equation (4).
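The same logic of adding coefficients applies to Krueger's estimates. The Python sketch below computes each group's log-point premium relative to people who use a computer in neither place, then the approximate (×100) and exact (exponentiated) percentages; it uses only the three reported coefficients and hypothetical helper names.

```python
import math

# Coefficients from Krueger (1993, Table 4) as reported above
B_WORK, B_HOME, B_BOTH = 0.177, 0.070, 0.017


def premium(compwork: int, comphome: int) -> float:
    """Log-point wage premium relative to the base group (no computer use)."""
    return B_WORK * compwork + B_HOME * comphome + B_BOTH * compwork * comphome


if __name__ == "__main__":
    for work, home in ((1, 0), (0, 1), (1, 1)):
        p = premium(work, home)
        print(f"compwork={work}, comphome={home}: "
              f"approx {100 * p:.1f}%, exact {100 * (math.exp(p) - 1):.1f}%")
```

It reproduces the figures in the text: about 17.7% (exact 19.4%) for work only, 7.0% (exact about 7.3%) for home only, and 26.4% (exact 30.2%) for both.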

Reading Assignment

Read the following from Wooldridge (2015):

- Allowing for Different Slopes
- Testing for Differences in Regression Functions across Groups

************* End of Chapter Four *************
