Professional Documents
Culture Documents
Linear Regression
Linear Regression
Linear Regression
❖Variance
❖Covariance
❖Correlation
Covariance
▪Ranges between -∞ to +∞
Y = 0 + 1 X 1 + 2 X 2 + + P X P +
Dependent Independent
(response) (explanatory)
variable variables
Linear Model
Linear regression is a method that summarizes how the average
values of a numerical outcome variable vary over subpopulations
defined by linear functions of predictors.
kid.score = 78 + 12 · mom.hs
Linear Model
Here, kid.score denotes either predicted or expected test score
given the mom.hs predictor. This model summarizes the
difference in average test scores between the children of
mothers who completed high school and those with mothers
who did not.
The intercept, 78, is the average (or predicted) score for children
whose mothers did not complete high school. To see this
algebraically, consider that to obtain predicted scores for these
children we would just plug 0 into this equation. To obtain
average test scores for children (or the predicted score for a
single child) whose mothers were high school graduates, we
would just plug 1 into this equation to obtain 78 + 12 · 1 = 91.
Linear Model
The difference between these two subpopulation means is equal
to the coefficient on mom.hs. This coefficient tells us that
children of mothers who have completed high school score 12
points higher on average than children of mothers who have
not completed high school.
Linear Model
Regression with a continuous predictor
Data were collected on the depth of a dive of penguins and the duration of the
dive. The following linear model is a fairly good summary of the data, where t
is the duration of the dive in minutes and d is the depth of the dive in yards.
Comments: The interpretation of the intercept doesn’t make sense in the real
world. It isn’t reasonable for the duration of a dive to be near t = 0 , because
that’s too short for a dive. If data with x-values near zero wouldn’t make sense,
then usually the interpretation of the intercept won’t seem realistic in the real
world. It is, however, acceptable (even required) to interpret this as a coefficient
in the model.
Interaction Regression Model
Hypothesizes interaction between pairs of X
variables
– Response to one X variable varies at different
levels of another X variable
Contains two-way cross product terms
Y = 0 + 1x1 + 2x2 + 3x1x2 +
Can be combined with other models
e.g. dummy variable models
Interaction Regression Model
An interaction between mom.hs and mom.iq—that is,
a new predictor which is defined as the product of these two
variables. This allows the slope to vary across subgroups. The
fitted model is
8
Y = 1 + 2X1 + 3(0) + 4X1(0) = 1 + 2X1
4
0 X1
0 0.5 1 1.5
Effect (slope) of X1 on Y does depend on X2 value
When should we look for interactions?
The most commonly used procedure used for regression analysis is called ordinary least
squares (OLS).
The OLS procedure minimizes the sum of squared residuals. From the theoretical
regression model
Yi = þ0 + þ1Xi + si,
þ^0 = Y¯ — þ^1X¯
R2y.12 = .9656
R2(adj) y.123- - -P
0 for women
Thus, for women the model becomes
Yi = + xi + (0) + i = + xi + i
and for men
Yi = + xi + (1)+ i = + + xi + i
Y D =1
D =0
1
+
1
49
Dummy Variable Regression
D= 0
Yi = + xi + Di + i 1
+
1
X
Dummy Variable Regression: Example
Predicting Scholaptic Aptitude Test Scores from PISA scores and gender (males: 0,
females: 1)
Coefficientsa
Unstandardized Standardized
Coefficients Coefficients
Model B Std. Error Beta t Sig.
1 (Constant) 37.326 3.743 9.972 .000
PISA Score .216 .021 .397 10.413 .000
Gender 9.971 .440 .863 22.647 .000
a. Dependent Variable: SAT Score
A = ˆ=37.326
ˆ= 9.971
ˆ+ ˆ = 37.326 + 9.971 = 47.297
B = ˆ= .216
Coefficientsa
Unstandardized Standardized
Coefficients Coefficients
Model B Std. Error Beta t Sig.
1 (Constant) 37.326 3.743 9.972 .000
PISA Score .216 .021 .397 10.413 .000
Gender 9.971 .440 .863 22.647 .000
a. Dependent Variable: SAT Score
Modeling Interactions
•Two explanatory variables interact in determining a response variable when
the partial effect of one depends on the value of the other
•Interaction between two variables means that the effect of one of the
variables depends on the value of the other variable
- Example: the effect of income (average income of incumbents in
dollars) on prestige (percentage score) is bigger for men than for women
11
Dummy Variable Regression: Interactions
Y D = 1 (males)
+ D = 0 (females)
Prestige
+ 1
Income X
• The within-gender regressions of income on prestige are not parallel — the slope for men is larger
than the slope for women.
• Because the effect of income varies by gender, income and gender interact inaffecting
prestige
• Interaction refers to the manner in which explanatory variables combine to affect a response
variable, not to the relationship between the explanatory variables themselves.
Problems with Pick-a-Point Approach
Y D = 1 (males)
+ D = 0 (females)
Prestige
+ 1
Income X
and are the intercept and slope for the regression of income on prestige among women
gives the difference in intercepts betweenthe male and female groups
gives the difference in slopes between the two groups
Dummy Variable Regression: Interactions
Y D = 1 (males)
+ D = 0 (females)
Prestige
+ 1
Income
X
Constructing regressors
• Y = prestige, X = income, D = dummy for gender (0 or 1)
• The model then becomes Yi = + xi + D i + (xi Di ) +
• Note xiDi is a new regressor (interaction/moderation) term
- Note: The variable xiDi must be calculated before the analysis is run
• Regression equation for females (Di = 0):
Yi = + xi + 0 + (xi 0) + i = + xi + i
58
Dummy Variable Regression: Interactions
Testing
•Testing for interaction is testing for a
difference in slope between males and
females → is the slope difference
•Statistical hypotheses
H0 : = 0
H1 : 0
Dummy Variable Regression: Interactions
Example
B SE t p-value
Intercept 12.113 4.263 2.841 .005
Gender 7.546 6.453 1.169 .244
income 0.001 0.00 8.927 .000
gender*income 0.0004 0.00 3.536 .001
ˆ= 12.113
ˆ = 7.546
ˆ+ ˆ = 12.113 + 7.546 = 19.66
ˆ= 0.001
ˆ= 0.00044
ˆ = 7.546
ˆ+ ˆ= 0.001 +0.00044 = 0.00144
→ Difference in prestige between males and females
increases one unit in income by
.0014 percentage points, on average 171
7
Dummy Variable Regression
These are following steps to build dummy variables:
Interaction Regression Model
Hypothesizes interaction between pairs of X
variables
– Response to one X variable varies at different
levels of another X variable
Contains two-way cross product terms
Y = 0 + 1x1 + 2x2 + 3x1x2 +
Can be combined with other models
e.g. dummy variable models
Interaction Regression Model
An interaction between mom.hs and mom.iq—that is,
a new predictor which is defined as the product of these two
variables. This allows the slope to vary across subgroups. The
fitted model is
12
4
0 X1
0 0.5 1 1.5
Interaction Example
Y Y = 1 + 2X1 + 3X2 + 4X1X2
12
8
Y = 1 + 2X1 + 3(0) + 4X1(0) = 1 + 2X1
4
0 X1
0 0.5 1 1.5
Interaction Example
Y Y = 1 + 2X1 + 3X2 + 4X1X2
Y = 1 + 2X1 + 3(1) + 4X1(1) = 4 + 6X1
12
8
Y = 1 + 2X1 + 3(0) + 4X1(0) = 1 + 2X1
4
0 X1
0 0.5 1 1.5
Interaction Example
Y Y = 1 + 2X1 + 3X2 + 4X1X2
Y = 1 + 2X1 + 3(1) + 4X1(1) = 4 + 6X1
12
8
Y = 1 + 2X1 + 3(0) + 4X1(0) = 1 + 2X1
4
0 X1
0 0.5 1 1.5
Effect (slope) of X1 on Y does depend on X2 value
When should we look for interactions?