Regression Analysis

Estimation and Interpretation of the Regression Equation
Dummy Independent Variables

Instructor: Taimoor Naseer Waraich
Parallels with Simple Regression
• b0 is still the intercept
• b1 through bk are all called slope parameters
• u is still the error term (or disturbance)
• Still need to make a zero conditional mean assumption, so now
assume that
• E(u|x1,x2, …,xk) = 0
• Still minimizing the sum of squared residuals, so there are k+1 first-order conditions
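
As a concrete sketch of this estimation step (simulated data and illustrative names, not from the slides): the k+1 first-order conditions are the normal equations X′X b = X′y, which can be solved directly.

```python
# A minimal sketch of multiple regression estimated by minimizing the sum of
# squared residuals; the k+1 first-order conditions are the normal equations
# X'X b = X'y (simulated, illustrative data).
import numpy as np

rng = np.random.default_rng(42)
n, k = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])  # intercept + k regressors
beta_true = np.array([1.0, 0.5, -2.0, 3.0])
y = X @ beta_true + rng.normal(size=n)

# Solve the k+1 normal equations (X'X) b = X'y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)   # close to beta_true
```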
Interpreting Multiple Regression

$$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \hat{\beta}_2 x_2 + \dots + \hat{\beta}_k x_k,$$

so

$$\Delta\hat{y} = \hat{\beta}_1 \Delta x_1 + \hat{\beta}_2 \Delta x_2 + \dots + \hat{\beta}_k \Delta x_k,$$

so holding $x_2, \dots, x_k$ fixed implies that $\Delta\hat{y} = \hat{\beta}_1 \Delta x_1$; that is, each $\hat{\beta}_j$ has a ceteris paribus interpretation.

A “Partialling Out” Interpretation

Consider the case where $k = 2$, i.e. $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \hat{\beta}_2 x_2$. Then

$$\hat{\beta}_1 = \frac{\sum_i \hat{r}_{i1}\, y_i}{\sum_i \hat{r}_{i1}^2},$$

where the $\hat{r}_{i1}$ are the residuals from the estimated regression of $x_1$ on $x_2$: $\hat{x}_1 = \hat{\gamma}_0 + \hat{\gamma}_2 x_2$.
“Partialling Out” continued
• Previous equation implies that regressing y on x1 and x2 gives
same effect of x1 as regressing y on residuals from a regression
of x1 on x2
• This means only the part of xi1 that is uncorrelated with xi2 is being related to yi, so we’re estimating the effect of x1 on y after x2 has been “partialled out”
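
A minimal simulation of this result (the Frisch–Waugh–Lovell logic) on illustrative simulated data: the coefficient on x1 from the full regression equals the coefficient from a regression of y on the residuals of x1 on x2.

```python
# "Partialling out": beta_1 from the multiple regression equals the slope from
# regressing y on the residuals of x1 on x2 (simulated, illustrative data).
import numpy as np

rng = np.random.default_rng(0)
n = 500
x2 = rng.normal(size=n)
x1 = 0.5 * x2 + rng.normal(size=n)            # x1 correlated with x2
y = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(size=n)

# Full multiple regression of y on (1, x1, x2)
X = np.column_stack([np.ones(n), x1, x2])
beta_full = np.linalg.lstsq(X, y, rcond=None)[0]

# Step 1: regress x1 on (1, x2) and keep the residuals r1
Z = np.column_stack([np.ones(n), x2])
gamma = np.linalg.lstsq(Z, x1, rcond=None)[0]
r1 = x1 - Z @ gamma

# Step 2: slope from regressing y on r1
beta1_partial = (r1 @ y) / (r1 @ r1)

print(beta_full[1], beta1_partial)            # the two estimates coincide
```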
Simple vs Multiple Reg Estimate

Compare the simple regression $\tilde{y} = \tilde{\beta}_0 + \tilde{\beta}_1 x_1$ with the multiple regression $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \hat{\beta}_2 x_2$.

Generally, $\tilde{\beta}_1 \neq \hat{\beta}_1$ unless:
• $\hat{\beta}_2 = 0$ (i.e. no partial effect of $x_2$), OR
• $x_1$ and $x_2$ are uncorrelated in the sample
Goodness-of-Fit
We can think of each observation as being made up of an explained part and an unexplained part, $y_i = \hat{y}_i + \hat{u}_i$. We then define the following:

• $\sum (y_i - \bar{y})^2$ is the total sum of squares (SST)
• $\sum (\hat{y}_i - \bar{y})^2$ is the explained sum of squares (SSE)
• $\sum \hat{u}_i^2$ is the residual sum of squares (SSR)

Then SST = SSE + SSR.


Goodness-of-Fit (continued)
• How do we think about how well our sample regression line
fits our sample data?

• Can compute the fraction of the total sum of squares (SST) that
is explained by the model, call this the R-squared of regression

• R2 = SSE/SST = 1 – SSR/SST
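
A small sketch of the decomposition and of R-squared, again on illustrative simulated data rather than anything from the slides:

```python
# SST = SSE + SSR and the two equivalent ways of computing R-squared
# (simulated, illustrative data).
import numpy as np

rng = np.random.default_rng(1)
n = 200
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2])
beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
y_hat = X @ beta_hat
u_hat = y - y_hat

sst = np.sum((y - y.mean()) ** 2)      # total sum of squares
sse = np.sum((y_hat - y.mean()) ** 2)  # explained sum of squares
ssr = np.sum(u_hat ** 2)               # residual sum of squares

print(np.isclose(sst, sse + ssr))      # SST = SSE + SSR
print(sse / sst, 1.0 - ssr / sst)      # both equal R-squared
```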
More about R-squared
• R2 can never decrease when another independent variable is
added to a regression, and usually will increase

• Because R2 will usually increase with the number of independent variables, it is not a good way to compare models
Assumptions for Unbiasedness
• Population model is linear in parameters: y = b0 + b1x1 + b2x2 +…+
bkxk + u
• We can use a random sample of size n, {(xi1, xi2,…, xik, yi): i=1, 2, …,
n}, from the population model, so that the sample model is yi = b0 +
b1xi1 + b2xi2 +…+ bkxik + ui
• E(u|x1, x2,… xk) = 0, implying that all of the explanatory variables are
exogenous
• None of the x’s is constant, and there are no exact linear relationships
among them
Omitted Variable Bias

Suppose the true model is given as

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + u,$$

but we estimate $\tilde{y} = \tilde{\beta}_0 + \tilde{\beta}_1 x_1 + \tilde{u}$, omitting $x_2$.
Summary of Direction of Bias

Corr(x1, x2) > 0 Corr(x1, x2) < 0

b2 > 0 Positive bias Negative bias

b2 < 0 Negative bias Positive bias


Omitted Variable Bias Summary
• Two cases where bias is equal to zero
– b2 = 0, that is x2 doesn’t really belong in model
– x1 and x2 are uncorrelated in the sample

• If the correlation between x2 and x1 and the correlation between x2 and y have the same sign, the bias will be positive
• If the correlation between x2 and x1 and the correlation between x2 and y have opposite signs, the bias will be negative
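
A minimal simulation of the table's top-left cell, with assumed values Corr(x1, x2) > 0 and b2 > 0, so the short regression that omits x2 should be biased upward:

```python
# Omitted variable bias: Corr(x1, x2) > 0 and beta_2 > 0 gives positive bias
# in the short regression (simulated, illustrative data).
import numpy as np

rng = np.random.default_rng(2)
n = 10_000
x2 = rng.normal(size=n)
x1 = 0.8 * x2 + rng.normal(size=n)                      # x1 positively correlated with x2
y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(size=n)      # true beta_1 = 2, beta_2 = 3

long_X = np.column_stack([np.ones(n), x1, x2])          # includes x2
short_X = np.column_stack([np.ones(n), x1])             # omits x2

beta_long = np.linalg.lstsq(long_X, y, rcond=None)[0]
beta_short = np.linalg.lstsq(short_X, y, rcond=None)[0]

print(beta_long[1])    # close to the true value 2
print(beta_short[1])   # noticeably larger than 2: positive bias, as the table predicts
```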
Dummy Variables
• A dummy variable is a variable that takes on the value 1 or 0
• Examples: male (= 1 if the person is male, 0 otherwise), south (= 1 if in the south, 0 otherwise), etc.
• Dummy variables are also called binary variables, for obvious
reasons
A Dummy Independent Variable
• Consider a simple model with one continuous variable (x) and
one dummy (d)
• y = b0 + d0d + b1x + u
• This can be interpreted as an intercept shift
• If d = 0, then y = b0 + b1x + u
• If d = 1, then y = (b0 + d0) + b1x + u
• The case of d = 0 is the base group
Example of d0 > 0

[Figure: intercept shift with d0 > 0 — two parallel lines with slope b1; the d = 1 line, y = (b0 + d0) + b1x, lies d0 above the d = 0 line, y = b0 + b1x.]
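
A minimal sketch of fitting the intercept-shift model y = b0 + d0·d + b1·x + u by OLS on simulated, illustrative data:

```python
# Dummy independent variable as an intercept shift (simulated, illustrative data).
import numpy as np

rng = np.random.default_rng(5)
n = 400
x = rng.normal(size=n)
d = rng.integers(0, 2, size=n)               # dummy variable (base group: d = 0)
y = 1.0 + 0.7 * d + 2.0 * x + rng.normal(size=n)

X = np.column_stack([np.ones(n), d, x])
b0_hat, d0_hat, b1_hat = np.linalg.lstsq(X, y, rcond=None)[0]

print("base-group intercept:", b0_hat)       # estimate of b0
print("d = 1 intercept:", b0_hat + d0_hat)   # shifted up by d0
print("common slope:", b1_hat)               # estimate of b1
```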
Dummies for Multiple Categories
• We can use dummy variables to control for something with
multiple categories
• Suppose everyone in your data is either a HS dropout, HS grad
only, or college grad
• To compare HS and college grads to HS dropouts, include 2
dummy variables
• hsgrad = 1 if HS grad only, 0 otherwise; and colgrad = 1 if
college grad, 0 otherwise
Multiple Categories (cont)
• Any categorical variable can be turned into a set of dummy
variables
• Because the base group is represented by the intercept, if there
are n categories there should be n – 1 dummy variables
• If there are a lot of categories, it may make sense to group
some together
• Example: top 10 ranking, 11 – 25, etc.
Interactions Among Dummies
• Interacting dummy variables is like subdividing the group
• Example: have dummies for male, as well as hsgrad and
colgrad
• Add male*hsgrad and male*colgrad, for a total of 5 dummy
variables –> 6 categories
• Base group is female HS dropouts
• hsgrad is for female HS grads, colgrad is for female college
grads
• The interactions reflect male HS grads and male college grads
More on Dummy Interactions
• Formally, the model is y = b0 + d1male + d2hsgrad + d3colgrad
+ d4male*hsgrad + d5male*colgrad + b1x + u, then, for example:
• If male = 0 and hsgrad = 0 and colgrad = 0
• y = b0 + b1x + u
• If male = 0 and hsgrad = 1 and colgrad = 0
• y = b0 + d2hsgrad + b1x + u
• If male = 1 and hsgrad = 0 and colgrad = 1
• y = b0 + d1male + d3colgrad + d5male*colgrad + b1x + u
Other Interactions with Dummies
• Can also consider interacting a dummy variable, d, with a
continuous variable, x
• y = b0 + d1d + b1x + d2d*x + u
• If d = 0, then y = b0 + b1x + u
• If d = 1, then y = (b0 + d1) + (b1+ d2) x + u
• This is interpreted as a change in the slope
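
A minimal sketch of this dummy–continuous interaction model on simulated, illustrative data, recovering the two intercepts and two slopes:

```python
# Interaction of a dummy with a continuous variable: y = b0 + d1*d + b1*x + d2*d*x + u
# (simulated, illustrative data).
import numpy as np

rng = np.random.default_rng(4)
n = 400
x = rng.normal(size=n)
d = rng.integers(0, 2, size=n)               # dummy: 0 or 1
y = 1.0 + 0.5 * d + 2.0 * x - 0.8 * d * x + rng.normal(size=n)

X = np.column_stack([np.ones(n), d, x, d * x])
b0, delta1, b1, delta2 = np.linalg.lstsq(X, y, rcond=None)[0]

print("d = 0 line: intercept", b0, "slope", b1)
print("d = 1 line: intercept", b0 + delta1, "slope", b1 + delta2)
```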
Example of d1 > 0 and d2 < 0

[Figure: intercept and slope shift — the d = 0 line is y = b0 + b1x; the d = 1 line is y = (b0 + d1) + (b1 + d2)x, with a higher intercept but a smaller slope.]
Testing for Differences Across Groups
• Testing whether a regression function is different for one group
versus another can be thought of as simply testing for the joint
significance of the dummy and its interactions with all other x
variables
• So, you can estimate the model with and without all the interactions and form an F statistic, but this could be unwieldy
The F-test
• The F-test is an analysis of the variance of a regression
• It can be used to test for the significance of a group of variables
or for a restriction
• It follows a different distribution from the t statistic, but can likewise be used to test at different levels of significance
• When determining the F-statistic we need to collect either the
residual sum of squares (RSS) or the R-squared statistic
• The formula for the F-test of a group of variables can be
expressed in terms of either the residual sum of squares (RSS)
or explained sum of squares (ESS)
F-test of explanatory power
• This is the F-test for the goodness of fit of a regression and in
effect tests for the joint significance of the explanatory variables.

• It is based on the R-squared statistic.

• It is routinely produced by most computer software packages.

• It follows the F-distribution, which is quite different from the t-distribution


F-test formula
• The formula for the F-test of the goodness of fit is:

$$F = \frac{R^2/(k-1)}{(1-R^2)/(n-k)} \sim F_{k-1,\,n-k}$$

where $k$ is the number of estimated parameters and $n$ is the sample size.
F-distribution
• To find the critical value of the F-distribution, in general you
need to know the number of parameters and the degrees of
freedom.

• The number of parameters is then read across the top of the table, the degrees of freedom down the side. Where these two values intersect, we find the critical value.
F-test critical value

5% critical values (number of parameters across the top, degrees of freedom down the side):

d.f.      1        2        3        4        5
 1      161.4    199.5    215.7    224.6    230.2
 2       18.5     19.0     19.2     19.3     19.3
 3       10.1      9.6      9.3      9.1      9.0
 4        7.7      7.0      6.6      6.4      6.3
 5        6.6      5.8      5.4      5.2      5.1
F-statistic
• When testing for the significance of the goodness of fit, our null hypothesis is that the coefficients on the explanatory variables are jointly equal to 0.

• If our F-statistic is below the critical value we fail to reject the null, and therefore we say the goodness of fit is not significant.
Joint Significance
• The F-test is useful for testing a number of hypotheses and is
often used to test for the joint significance of a group of
variables.

• In this type of test, we often refer to ‘testing a restriction’.

• This restriction is that the coefficients on a group of explanatory variables are jointly equal to 0
F-test for joint significance
• The formula for this test can be viewed as:

F = (Improvement in fit / Extra degrees of freedom used up) ÷ (Residual sum of squares remaining / Degrees of freedom remaining)
F-tests
• The test for joint significance has its own
formula, which takes the following form:
$$F = \frac{(RSS_R - RSS_u)/m}{RSS_u/(n-k)}$$

where:
m = number of restrictions
k = number of parameters in the unrestricted model
$RSS_u$ = unrestricted RSS
$RSS_R$ = restricted RSS
Joint Significance of a group of variables
• To carry out this test you need to run two separate OLS regressions: one with all the explanatory variables included (the unrestricted equation), the other with the variables whose joint significance is being tested removed (the restricted equation).
• Then collect the RSS from both equations.
• Put the values in the formula
• Find the critical value and compare with the test statistic. The
null hypothesis is that the variables jointly equal 0.
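
A sketch of that two-regression procedure on simulated, illustrative data (testing x and z while keeping w), assuming SciPy is available for the critical value:

```python
# Joint F-test via two OLS fits: collect RSS from the unrestricted and
# restricted models and plug them into the formula (simulated data).
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
n = 60
w, x, z = rng.normal(size=(3, n))
y = 1.0 + 0.5 * w + 1.5 * x - 1.0 * z + rng.normal(size=n)

def rss(X, y):
    """Residual sum of squares from an OLS fit of y on X."""
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ b
    return resid @ resid

X_u = np.column_stack([np.ones(n), w, x, z])   # unrestricted model
X_r = np.column_stack([np.ones(n), w])         # restricted model (x, z removed)

rss_u, rss_r = rss(X_u, y), rss(X_r, y)
m, k = 2, X_u.shape[1]                          # restrictions, unrestricted parameters
F = ((rss_r - rss_u) / m) / (rss_u / (n - k))
F_crit = stats.f.ppf(0.95, m, n - k)
print(F, F_crit)                                # reject H0 if F > F_crit
```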
Joint Significance
• If we have a 3 explanatory variable model and wish
to test for the joint significance of 2 of the variables
(x and z), we need to run the following restricted and
unrestricted models:
$$y_t = \alpha_0 + \alpha_1 w_t + u_t \quad \text{(restricted)}$$
$$y_t = \beta_0 + \beta_1 w_t + \beta_2 x_t + \beta_3 z_t + u_t \quad \text{(unrestricted)}$$
Example of the F-test for joint significance
• Given the following model, we wish to test the joint
significance of w and z. Having estimated them, we collect
their respective RSSs (n=60).
$$y_t = \beta_0 + \beta_1 x_t + \beta_2 w_t + \beta_3 z_t + u_t \quad \text{(unrestricted)}, \qquad RSS_u = 0.75$$
$$y_t = \alpha_0 + \alpha_1 x_t + v_t \quad \text{(restricted)}, \qquad RSS_R = 1.5$$
Joint significance

where: $\beta_0, \alpha_0$ are constants; $\beta_1, \dots, \beta_3, \alpha_1$ are slope parameters; $u_t, v_t$ are error terms; and $x_t, w_t, z_t$ are explanatory variables.
Joint significance
$$H_0: \beta_2 = \beta_3 = 0$$
$$H_1: \text{at least one of } \beta_2, \beta_3 \neq 0$$

• Plugging into the formula: F = ((1.5 − 0.75)/2) / (0.75/(60 − 4)) = 0.375/0.0134 ≈ 28.
• As the F statistic is greater than the critical value (28 > 3.15), we reject the null hypothesis and conclude that the variables w and z are jointly significant and should remain in the model.
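
The same arithmetic can be checked directly from the slide's stated values:

```python
# Reproducing the slide's arithmetic (RSS_R = 1.5, RSS_u = 0.75, m = 2
# restrictions, n = 60, k = 4 parameters in the unrestricted model).
from scipy import stats

rss_r, rss_u, m, n, k = 1.5, 0.75, 2, 60, 4
F = ((rss_r - rss_u) / m) / (rss_u / (n - k))
print(F)                                # 28.0
print(stats.f.ppf(0.95, m, n - k))      # about 3.16, the 5% critical value
```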
Linear Probability Model
• P(y = 1|x) = E(y|x), when y is a binary variable, so we can
write our model as
• P(y = 1|x) = b0 + b1x1 + … + bkxk
• So, the interpretation of bj is the change in the probability of success when xj increases by one unit, holding the other explanatory variables fixed
• The predicted y is the predicted probability of success
• A potential problem is that the predicted probability can be outside [0, 1]
Linear Probability Model (cont)
• Even without predictions outside of [0, 1], we may estimate effects that imply a change in x changes the probability by more than +1 or –1, so it is best to evaluate changes in x near the mean
• This model violates the homoskedasticity assumption, which will affect inference
• Despite drawbacks, it’s usually a good place to start when y is
binary
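
A minimal linear probability model sketch on simulated, illustrative data, showing both the coefficient interpretation and the possibility of fitted probabilities outside [0, 1]:

```python
# Linear probability model: a binary outcome regressed on x by OLS
# (simulated, illustrative data).
import numpy as np

rng = np.random.default_rng(3)
n = 1_000
x = rng.normal(size=n)
p = np.clip(0.5 + 0.3 * x, 0.0, 1.0)         # true success probability
y = rng.binomial(1, p)                        # binary outcome

X = np.column_stack([np.ones(n), x])
b0, b1 = np.linalg.lstsq(X, y, rcond=None)[0]

y_hat = b0 + b1 * x
print(b1)                                     # change in P(y = 1) per one-unit change in x
print((y_hat < 0).any() or (y_hat > 1).any()) # fitted probabilities can leave [0, 1]
```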
