Regression Analysis Estimation and Interpretation of Regression Equation Dummy Independent Variable
Instructor
Taimoor Naseer Waraich
Parallels with Simple Regression
• b0 is still the intercept
• b1 through bk are all called slope parameters
• u is still the error term (or disturbance)
• Still need to make a zero conditional mean assumption, so now assume that E(u|x1, x2, …, xk) = 0
• Still minimizing the sum of squared residuals, so have k+1 first
order conditions
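As a numerical sketch of the point above, minimizing the sum of squared residuals yields k+1 first-order conditions, which are equivalent to the residuals being orthogonal to the constant and to each regressor. The data below are simulated purely for illustration (the coefficient values are invented).

```python
import numpy as np

# Simulated data (assumed values, for illustration only)
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
u = rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 0.5 * x2 + u  # true b0 = 1, b1 = 2, b2 = -0.5

# Minimizing the sum of squared residuals = ordinary least squares
X = np.column_stack([np.ones(n), x1, x2])  # constant plus the k regressors
b_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ b_hat

# The k+1 first-order conditions say X'resid = 0 (up to rounding)
print(np.round(X.T @ resid, 8))
```

The orthogonality conditions hold for any OLS fit; the estimates land near the true values only because the simulated error is well behaved.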
Interpreting Multiple Regression
Compare the simple regression y~ = b0~ + b1~x1
with the multiple regression y^ = b0^ + b1^x1 + b2^x2
• Generally, b1~ ≠ b1^ unless:
• b2^ = 0 (i.e. no partial effect of x2), OR
• x1 and x2 are uncorrelated in the sample
• Can compute the fraction of the total sum of squares (SST) that
is explained by the model, call this the R-squared of regression
• R2 = SSE/SST = 1 – SSR/SST
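A quick numerical check of the identity above, using a simple OLS line fit with np.polyfit on made-up data (the identity SST = SSE + SSR requires a regression estimated with an intercept):

```python
import numpy as np

# Made-up data, for illustration only
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

slope, intercept = np.polyfit(x, y, 1)   # OLS line with an intercept
y_hat = intercept + slope * x

sst = np.sum((y - y.mean()) ** 2)        # total sum of squares
sse = np.sum((y_hat - y.mean()) ** 2)    # explained sum of squares
ssr = np.sum((y - y_hat) ** 2)           # residual sum of squares

# Both expressions for R-squared agree: SSE/SST = 1 - SSR/SST
print(sse / sst, 1 - ssr / sst)
```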
More about R-squared
• R2 can never decrease when another independent variable is
added to a regression, and usually will increase
[Figure: regression line for d = 0, y = b0 + b1x, with intercept b0 on the vertical axis and x on the horizontal axis.]
Dummies for Multiple Categories
• We can use dummy variables to control for something with
multiple categories
• Suppose everyone in your data is either a HS dropout, HS grad
only, or college grad
• To compare HS and college grads to HS dropouts, include 2
dummy variables
• hsgrad = 1 if HS grad only, 0 otherwise; and colgrad = 1 if
college grad, 0 otherwise
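The two dummies can be built directly from the categorical variable; a minimal sketch with hypothetical data:

```python
# Hypothetical education categories; HS dropouts are the base group
educ = ["dropout", "hsgrad", "colgrad", "hsgrad", "dropout", "colgrad"]

hsgrad = [1 if e == "hsgrad" else 0 for e in educ]    # HS grad only
colgrad = [1 if e == "colgrad" else 0 for e in educ]  # college grad
# HS dropouts are the observations where both dummies equal 0
print(hsgrad)
print(colgrad)
```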
Multiple Categories (cont)
• Any categorical variable can be turned into a set of dummy
variables
• Because the base group is represented by the intercept, if there
are n categories there should be n – 1 dummy variables
• If there are a lot of categories, it may make sense to group
some together
• Example: top 10 ranking, 11 – 25, etc.
Interactions Among Dummies
• Interacting dummy variables is like subdividing the group
• Example: have dummies for male, as well as hsgrad and
colgrad
• Add male*hsgrad and male*colgrad, for a total of 5 dummy variables -> 6 categories
• Base group is female HS dropouts
• hsgrad is for female HS grads, colgrad is for female college
grads
• The interactions reflect male HS grads and male college grads
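The five dummy columns can be built as products of the basic dummies; a sketch with toy rows (the data are invented):

```python
# Toy rows of (male, hsgrad, colgrad); base group is female HS dropout
rows = [
    (0, 0, 0),  # female HS dropout (base group)
    (0, 1, 0),  # female HS grad
    (1, 0, 1),  # male college grad
]

expanded = []
for male, hsgrad, colgrad in rows:
    # the interaction columns subdivide each education group by gender
    expanded.append((male, hsgrad, colgrad, male * hsgrad, male * colgrad))

print(expanded)
```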
More on Dummy Interactions
• Formally, the model is y = b0 + d1male + d2hsgrad + d3colgrad
+ d4male*hsgrad + d5male*colgrad + b1x + u, then, for example:
• If male = 0 and hsgrad = 0 and colgrad = 0
• y = b0 + b1x + u
• If male = 0 and hsgrad = 1 and colgrad = 0
• y = b0 + d2hsgrad + b1x + u
• If male = 1 and hsgrad = 0 and colgrad = 1
• y = b0 + d1male + d3colgrad + d5male*colgrad + b1x + u
Other Interactions with Dummies
• Can also consider interacting a dummy variable, d, with a
continuous variable, x
• y = b0 + d1d + b1x + d2d*x + u
• If d = 0, then y = b0 + b1x + u
• If d = 1, then y = (b0 + d1) + (b1+ d2) x + u
• The interaction lets the dummy change the slope as well as the intercept: d1 shifts the intercept and d2 shifts the slope
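Plugging in assumed parameter values (invented for illustration) shows how the dummy moves both pieces:

```python
# Assumed illustrative parameters
b0, b1 = 1.0, 2.0     # intercept and slope when d = 0
d1, d2 = 0.5, -0.75   # intercept shift and slope shift when d = 1

def fitted_y(x, d):
    # y = b0 + d1*d + b1*x + d2*d*x = (b0 + d1*d) + (b1 + d2*d)*x
    return (b0 + d1 * d) + (b1 + d2 * d) * x

print(fitted_y(2.0, 0))  # slope b1 applies
print(fitted_y(2.0, 1))  # intercept b0 + d1 and slope b1 + d2 apply
```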
Example of d1 > 0 and d2 < 0
[Figure: two regression lines plotted against x; for d = 0, y = b0 + b1x; for d = 1, y = (b0 + d1) + (b1 + d2)x, with a higher intercept and a flatter slope.]
Testing for Differences Across Groups
• Testing whether a regression function is different for one group
versus another can be thought of as simply testing for the joint
significance of the dummy and its interactions with all other x
variables
• So, you can estimate the model with all the interactions and
without and form an F statistic, but this could be unwieldy
The F-test
• The F-test is an analysis of the variance of a regression
• It can be used to test for the significance of a group of variables
or for a restriction
• It has a different distribution from the t-statistic, but like the t-test it can be used to test at different levels of significance
• When determining the F-statistic we need to collect either the
residual sum of squares (RSS) or the R-squared statistic
• The formula for the F-test of a group of variables can be
expressed in terms of either the residual sum of squares (RSS)
or explained sum of squares (ESS)
F-test of explanatory power
• This is the F-test for the goodness of fit of a regression and in
effect tests for the joint significance of the explanatory variables.
5% critical values of the F distribution (columns: numerator degrees of freedom; rows: denominator degrees of freedom):

        1       2       3       4       5
1   161.4   199.5   215.7   224.6   230.2
2    18.5    19.0    19.2    19.3    19.3
3    10.1     9.6     9.3     9.1     9.0
4     7.7     7.0     6.6     6.4     6.3
5     6.6     5.8     5.4     5.2     5.1
F-statistic
• When testing for the significance of the goodness of fit, the null hypothesis is that the coefficients on the explanatory variables are jointly equal to 0.
• Informally, the statistic is:

F = (Improvement in fit / Extra degrees of freedom used up) / (Residual sum of squares remaining / Degrees of freedom remaining)
F-tests
• The test for joint significance has its own formula, which takes the following form:

F = ((RSS_R - RSS_U) / m) / (RSS_U / (n - k))

where:
m = number of restrictions
k = number of parameters in the unrestricted model
RSS_U = unrestricted RSS
RSS_R = restricted RSS
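The formula maps directly onto a small helper function; the numbers below are invented purely to exercise it:

```python
def f_stat(rss_r, rss_u, m, n, k):
    """F-statistic for m restrictions: ((RSS_R - RSS_U)/m) / (RSS_U/(n - k))."""
    return ((rss_r - rss_u) / m) / (rss_u / (n - k))

# Invented values: 2 restrictions, 50 observations, 5 unrestricted parameters
print(f_stat(rss_r=120.0, rss_u=100.0, m=2, n=50, k=5))
```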
Joint Significance of a group of variables
• To carry out this test you need to run two separate OLS regressions: one with all the explanatory variables included (the unrestricted equation), and one with the variables whose joint significance is being tested removed (the restricted equation).
• Then collect the RSS from both equations.
• Put the values in the formula
• Find the critical value and compare it with the test statistic. The null hypothesis is that the coefficients on the removed variables are jointly equal to 0.
Joint Significance
• If we have a 3 explanatory variable model and wish
to test for the joint significance of 2 of the variables
(x and z), we need to run the following restricted and
unrestricted models:
yt = b0 + b1wt + ut                      (restricted)
yt = b0 + b1wt + b2xt + b3zt + ut        (unrestricted)
Example of the F-test for joint significance
• Given the following model, we wish to test the joint
significance of w and z. Having estimated them, we collect
their respective RSSs (n=60).
yt = b0 + b1xt + b2wt + b3zt + ut        (unrestricted), RSS_U = 0.75
yt = b0 + b1xt + vt                      (restricted), RSS_R = 1.5
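Working through the numbers from the slide: w and z are removed, so m = 2 restrictions; the unrestricted model has k = 4 parameters (b0 through b3) and n = 60 observations:

```python
# Values taken from the slide above
rss_u, rss_r = 0.75, 1.5
n, k, m = 60, 4, 2   # k = 4 parameters (b0..b3); m = 2 restrictions (w, z)

f = ((rss_r - rss_u) / m) / (rss_u / (n - k))
print(f)  # compare with the F(2, 56) critical value
```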
Joint significance