Download as ppt, pdf, or txt
Download as ppt, pdf, or txt
You are on page 1of 12

Regression Models for Quantitative

(Numeric) and Qualitative


(Categorical) Predictors

KNNL – Chapter 8
Polynomial Regression Models
• Useful in 2 Settings:
– True relation between response and predictor is
polynomial
– True relation is complex nonlinear function that can be
approximated by polynomial in specific range of X-levels
• Models with 1 Predictor: Including p polynomial
terms in model, creates p-1 “bends”
– 2nd order Model: E{Y} = b0 + b1x + b2x2 (x = centered X)
– 3rd order Model: E{Y} = b0 + b1x + b2x2 + b3x3
• Response Surfaces with 2 (or more) predictors
– 2nd order model with 2 Predictors:
E Y   b0  b1 x1  b2 x2  b11 x12  b22 x22  b12 x1 x2 x1  X1  X 1 x2  X 2  X 2
Modeling Strategies
• Use Extra Sums of Squares and General Linear Tests to
compare models of increasing complexity (higher order)
• Use coding in fitting models (centered/scaled)
predictors to reduce multicollinearity when conducting
testing.
• Fit models in original units, or back-transform for
plotting on original scale* (see below for quadratic)
• For Response Surfaces, include multiple replicates at
center point for goodness-of-fit tests

   
^ 2
Centered variables: Y  b0  b1 x  b11 x 2  b0  b1 X  X  b11 X  X
2

 b0  b1 X  b1 X  b11 X 2  2b11 X X  b11 X  b0  b1 X  b11 X
2
  
2
 b1  2b11 X X  b11 X  bo'  b1' X  b2' X 2
Regression Models with Interaction Term(s)
• Interaction  Effect (Slope) of one predictor variable
depends on the level other predictor variable(s)
• Formulated by including cross-product term(s) among
predictor variables
• 2 Variable Models: E{Y} = b0 + b1X1 + b2X2 + b3X1X2
X 2  0  E Y   b 0  b1 X 1  b 2  0   b3  X 1 (0)   b 0  b1 X 1

X 2  10  E Y   b 0  b1 X 1  b 2 10   b3  X 1 (10)   b 0  b1 X 1  10b 2  10b3 X 1 


  b 0  10b 2    b1  10b3  X 1

Testing Hypothesis of no interaction: H 0 : b3  0 H A : b3  0


Qualitative Predictors
• Often, we wish to include categorical variables as
predictors (e.g. gender, region of country, …)
• Trick: Create dummy (indicator) variable(s) to
represent effects of levels of the categorical variables
on response
• Problem: If variable has c categories, and we create c
dummy variables, the model is not full rank when we
include intercept
• Solution: Create c – 1 dummy variables, leaving one
level as the control/baseline/reference category
Example – Salary vs Experience by Region
State has 3 regions, sample 2 people per region, Y =salary, X 1  experience
1 if Region 1 1 if Region 2 1 if Region 3
X2   X3   X4  
 0 otherwise  0 otherwise  0 otherwise
 6

1 X 11 1 0 0  6  X i1 2 2 2 
1 X 21 1 0 0   i 1

  6 6

1 X 31 0 1 0   X i1  X i21 X 11  X 21 X 31  X 41 X 51  X 61 
X  X'X   i 1 i 1 
1 X 41 0 1 0
 2 X 11  X 21 2 0 0 
1 X 51 0 0 1  
   2 X 31  X 41 0 2 0 
1 X 61 0 0 1 
 2 
X 51  X 61 0 0 2

Solution, just use the Region 1 dummy (X2) and the region 2 dummy (X3), making
Region 3 the “reference” region (Note: it is arbitrary which region is the reference)
Example – Salary vs Experience by Region
1 if Region 1 1 if Region 2
3 regions, Y =salary, X 1  experience X2   X3  
 0 otherwise  0 otherwise

E Y   b 0  b1 X 1  b 2 X 2  b3 X 3

Region 1: X 2  1, X 3  0  E Y   b 0  b1 X 1  b 2 (1)  b3 (0)   b 0  b 2   b1 X 1


Region 2: X 2  0, X 3  1  E Y   b 0  b1 X 1  b 2 (0)  b3 (1)   b 0  b3   b1 X 1
Region 3: X 2  0, X 3  0  E Y   b 0  b1 X 1  b 2 (0)  b3 (0)  b 0  b1 X 1

b 2  Difference between Regions 1 and 3, controlling for experience


b3  Difference between Regions 2 and 3, controlling for experience
b 2  b3  Difference between Regions 1 and 2, controlling for experience

b 2  b3  0  No differences among Regions 1,2,3 wrt Salary, Controlling for Experience


Interactions Between Qualitative and Quantitative Predictors
• We can allow the slope wrt to a Quantitative Predictor to
differ across levels of the Categorical Predictor
• Trick: Create cross-product terms between Quantitative
Predictor and each of the c-1 dummy variables
• Can conduct General Linear Test to determine whether
slopes differ (or t-test when qualitative predictor has c=2
levels)
• These models generalize to any number of quantitative
and qualitative predictors
Salary (Y ), Expediture (X 1 ), and regions (X 2 , X 3 ):
Additive Model: E Y   b0  b1 X 1  b 2 X 2  b3 X 3
Interaction Model: E Y   b0  b1 X 1  b 2 X 2  b3 X 3  b 4 X 1 X 2  b5 X 1 X 3
Region 1: X 2  1, X 3  0  : E Y   b0  b1 X 1  b 2 (1)  b3 (0)  b 4 X 1 (1)  b5 X 1 (0)   b0  b 2    b1  b 4  X 1
Region 2: X 2  0, X 3  1 : E Y    b0  b3    b1  b5  X 1 Region 3: X 2  0, X 3  0  : E Y   b0  b1 X 1

You might also like