ECN 813 Dummy Variable
identify female students and make non-female students the reference group.
The Regression Equation
For now, the key outputs of interest are the least-squares estimates for regression
coefficients. They allow us to fully specify our regression equation:
ŷ = 38.6 + 0.4 * IQ + 7 * X1
This is the only linear equation that satisfies the least-squares criterion: among all
linear equations in IQ and X1, it has the smallest sum of squared residuals on the data
from which it was estimated.
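The least-squares property can be checked numerically. The sketch below is only an illustration: the ten observations are made up (the lecture's data set is not reproduced here), and the names `iq`, `male`, `score` and `sse` are hypothetical. It fits the coefficients with NumPy and compares the sum of squared residuals with that of a slightly different coefficient vector.

```python
import numpy as np

# Hypothetical data for 10 students (invented for illustration only).
iq = np.array([95, 100, 105, 110, 115, 98, 102, 108, 112, 120], dtype=float)
male = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1], dtype=float)  # X1: 1 = male, 0 = female
rng = np.random.default_rng(0)
score = 38.6 + 0.4 * iq + 7.0 * male + rng.normal(0.0, 2.0, size=iq.size)

# Design matrix: a column of ones for the intercept b0, then IQ and X1.
X = np.column_stack([np.ones_like(iq), iq, male])

# Least-squares estimates of (b0, b1, b2).
b_hat, *_ = np.linalg.lstsq(X, score, rcond=None)

def sse(b):
    """Sum of squared residuals for coefficient vector b."""
    resid = score - X @ b
    return float(resid @ resid)

# Any other coefficient vector fits these data worse.
perturbed = b_hat + np.array([0.5, -0.01, 0.3])
print("estimates:", b_hat)
print("SSE at estimates:", sse(b_hat))
print("SSE at perturbed coefficients:", sse(perturbed))
```

The second sum of squared residuals is larger than the first, whichever perturbation is chosen, because the least-squares estimates are the unique minimizer when the columns of the design matrix are not collinear.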
Significance of Regression Coefficients
• With multiple regression, there is more than one independent
variable; so it is natural to ask whether a particular independent
variable contributes significantly to the regression after effects of
other variables are taken into account. The answer to this question
can be found in the regression coefficients table:
• The regression coefficients table shows the following information for
each coefficient: its value, its standard error, a t-statistic, and the
significance of the t-statistic. In this example, the t-statistics for IQ
and gender are both statistically significant at the 0.05 level. This
means that IQ predicts test score beyond chance levels, even after
the effect of gender is taken into account. And gender predicts test
score beyond chance levels, even after the effect of IQ is taken into
account.
• The regression coefficient for gender measures the difference between the group identified by the dummy variable (males) and the group that serves as the reference (females).
• Here, the regression coefficient for gender is 7. This suggests that, after the effect of IQ is taken into account, males are predicted to score 7 points higher on the test, on average, than the reference group (females).
• And, because the regression coefficient for gender is statistically significant, we interpret this difference as a real effect rather than a chance effect.
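The standard errors and t-statistics reported in a regression coefficients table can be reproduced from the usual OLS formulas. The sketch below is an assumption-laden illustration: the data are simulated (not the lecture's), the helper name `ols_t_stats` is hypothetical, and p-values are left out because they need a t-distribution routine that plain NumPy does not provide.

```python
import numpy as np

def ols_t_stats(X, y):
    """Return OLS coefficients, their standard errors, and t-statistics."""
    n, k = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    b = xtx_inv @ X.T @ y                    # least-squares estimates
    resid = y - X @ b
    sigma2 = resid @ resid / (n - k)         # residual variance with n - k degrees of freedom
    se = np.sqrt(sigma2 * np.diag(xtx_inv))  # standard error of each coefficient
    return b, se, b / se

# Simulated (hypothetical) data: 40 students, half of them male.
rng = np.random.default_rng(1)
n = 40
iq = rng.uniform(85, 125, size=n)
male = np.tile([0.0, 1.0], n // 2)           # X1: 1 = male, 0 = female
score = 38.6 + 0.4 * iq + 7.0 * male + rng.normal(0.0, 3.0, size=n)

X = np.column_stack([np.ones(n), iq, male])
b, se, t = ols_t_stats(X, score)
for name, bi, si, ti in zip(["intercept", "IQ", "gender"], b, se, t):
    print(f"{name:9s} coef={bi:8.3f} se={si:6.3f} t={ti:7.2f}")
```

With this many observations, a t-statistic larger than roughly 2 in absolute value corresponds to significance at about the 0.05 level; the exact critical value depends on the degrees of freedom.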
INTERCEPT DUMMY VARIABLE
• In general, start from the model:
• Y = b0 + b1IQ + µ (1)
• Introduce the gender dummy variable X1:
• Y = b0 + b1IQ + b2X1 + µ (2)
• Where b0, b1, and b2 are regression coefficients, IQ and X1 are the variables of the model, and µ is the error term.
X1 is the categorical variable Gender, represented as a single dummy variable:
• X1 = 1 for male students.
• X1 = 0 for non-male students.
• Y is the test score.
• To interpret X1, consider its two possible values and how each affects the specification of equation (2). For X1 = 0 we have:
• Y = b0 + b1IQ + b2(0) + µ (3)
• Y = b0 + b1IQ + µ (4)
• Equation (4) is the same as the original model without the dummy variable X1.
• If X1 = 1, we have:
• Y = b0 + b1IQ + b2(1) + µ (5)
• Y = b0 + b1IQ + b2 + µ (6)
• Y = (b0 + b2) + b1IQ + µ (7)
• The constant, or intercept, of equation (7) is now (b0 + b2) rather than b0. So including the dummy variable changes the intercept, shifting the function (and therefore the regression line) up or down by b2 while leaving the slope b1 unchanged.
• Relating this to our estimate in the above example:
• ŷ = 38.6 + 0.4 * IQ + 7 * X1
• The intercept for females (X1 = 0) is 38.6 (b0), and the intercept for males (X1 = 1) is 38.6 + 7 = 45.6 (b0 + b2). So, after the effect of IQ is taken into account, males are predicted to score 7 points higher, on average, than females at any given IQ.
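The intercept shift can be made concrete by evaluating the fitted equation for both groups at a few IQ values. This is a minimal sketch using the coefficients quoted above; the function name `predict_score` and the chosen IQ values are illustrative assumptions.

```python
def predict_score(iq, male):
    """Predicted test score from the fitted equation y-hat = 38.6 + 0.4*IQ + 7*X1."""
    return 38.6 + 0.4 * iq + 7.0 * male

# At IQ = 0 the predictions are the two intercepts, b0 and b0 + b2.
# At every IQ the male-female gap equals b2 = 7.
for iq in (0, 90, 110):
    female = predict_score(iq, 0)
    male = predict_score(iq, 1)
    print(f"IQ={iq:3d}  female={female:6.1f}  male={male:6.1f}  gap={male - female:4.1f}")
```

The gap column stays constant because the dummy variable enters additively: the two regression lines are parallel and differ only in their intercepts.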
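The effect of a dummy variable on the constant of the regression line
[Figure: Y plotted against X. The regression line with intercept b0 shifts up in parallel when the dummy coefficient is positive and down when it is negative.]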
SLOPE DUMMY VARIABLE
• Suppose we think that the last five of the 10 students (serial numbers 6 to 10) took IQ-enhancing drugs. To test this, we construct a dummy variable (D) that takes the following values:
• D = 0 for students who did not take the drug (serial numbers 1 to 5)
• D = 1 for students who took the drug (serial numbers 6 to 10)
• Because we assume that this dummy variable affects the slope parameter, it must enter the model multiplicatively:
• Y = b0 + b1IQ + b2D(IQ) + µ (8)
• The effect of the dummy variable can again be separated according to the two possible outcomes. If D = 0, we have:
• Y = b0 + b1IQ + b2(0)IQ + µ (9)
• Y = b0 + b1IQ + µ (10)
• Equation (10) is the same as the original model without the dummy variable.
• If D = 1:
• Y = b0 + b1IQ + b2(1)IQ + µ (11)
• Y = b0 + b1IQ + b2IQ + µ
• Y = b0 + (b1 + b2)IQ + µ (12)
• So, the marginal effect of IQ on test performance is b1 for students who did not take the drug and (b1 + b2) for students who took the drug.
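A slope dummy is estimated by adding the product D × IQ as an extra regressor. The sketch below assumes made-up, noise-free data for 10 students and arbitrary illustrative values for b0, b1 and b2 (none of these come from the lecture), so that the fit recovers the coefficients exactly and the two slopes can be read off.

```python
import numpy as np

# Assumed "true" coefficients, chosen only for illustration.
b0, b1, b2 = 30.0, 0.5, 0.1

# Hypothetical data: students 1 to 5 have D = 0, students 6 to 10 have D = 1.
iq = np.array([90, 95, 100, 105, 110, 92, 98, 104, 108, 115], dtype=float)
d = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1], dtype=float)
y = b0 + b1 * iq + b2 * d * iq  # equation (8) without the error term

# The slope dummy enters as the interaction column D * IQ.
X = np.column_stack([np.ones_like(iq), iq, d * iq])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

slope_no_drug = coef[1]          # b1
slope_drug = coef[1] + coef[2]   # b1 + b2
print("slope when D = 0:", slope_no_drug)
print("slope when D = 1:", slope_drug)
```

With real, noisy data the estimates would differ from the assumed values, and the t-statistic on the interaction coefficient would be used to judge whether the two slopes differ.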
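The effect of a dummy variable on the slope of the regression line
[Figure: Y plotted against X. Starting from the common intercept b0, the line with slope b1 becomes steeper (slope b1 + b2) when b2 > 0 and flatter when b2 < 0.]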
The combined effect of intercept and slope dummies
• Suppose that we have a dummy variable defined as:
• D = 0 for students who did not take the drug
• D = 1 for students who took the drug
• Given our model:
• Y = b0 + b1IQ + µ
• Using the dummy variable to examine its effect on both the constant and the slope, we have:
• Y = b0 + b1IQ + b2D + b3D(IQ) + µ
• If D = 0:
• Y = b0 + b1IQ + µ (as before)
• If D = 1:
• Y = b0 + b1IQ + b2 + b3IQ + µ = (b0 + b2) + (b1 + b3)IQ + µ
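One way to see what the combined model does is to compare it with fitting a separate line to each group. The sketch below uses simulated data with assumed coefficients (hypothetical, not from the lecture) and a hypothetical helper `fit_line`. The model with both dummies gives the D = 0 group the line (b0, b1) and the D = 1 group the line (b0 + b2, b1 + b3), which match the two separate fits.

```python
import numpy as np

# Simulated (hypothetical) data for 10 students; the coefficients are assumptions.
rng = np.random.default_rng(2)
iq = np.array([90, 95, 100, 105, 110, 92, 98, 104, 108, 115], dtype=float)
d = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1], dtype=float)
y = 30.0 + 0.5 * iq + 4.0 * d + 0.1 * d * iq + rng.normal(0.0, 1.0, size=iq.size)

# Combined model: intercept dummy D and slope dummy D * IQ.
X = np.column_stack([np.ones_like(iq), iq, d, d * iq])
b, *_ = np.linalg.lstsq(X, y, rcond=None)

def fit_line(x, yv):
    """Fit yv = intercept + slope * x by least squares; return (intercept, slope)."""
    A = np.column_stack([np.ones_like(x), x])
    c, *_ = np.linalg.lstsq(A, yv, rcond=None)
    return c

line_d0 = fit_line(iq[d == 0], y[d == 0])   # should equal (b0, b1)
line_d1 = fit_line(iq[d == 1], y[d == 1])   # should equal (b0 + b2, b1 + b3)

print("D = 0: combined", b[:2], "separate", line_d0)
print("D = 1: combined", [b[0] + b[2], b[1] + b[3]], "separate", line_d1)
```

The advantage of the single combined regression over two separate ones is that the coefficients on D and D × IQ directly estimate the differences in intercept and slope, so each difference comes with its own t-statistic.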