Regression I: Simple Regression: Class 21
Simple Regression
Class 21
Schedule for Remainder of Term
Nov 21: Simple Regression
Will cover:
Simple Regression: Does income affect purchasing?
Multiple Regression: Do income and caffeine affect purchasing?
Moderated Multiple Regression: Does caffeine moderate the effect
of income on purchasing?
ANOVA: Do the means of Group A, Group B and Group C differ?
Categorical data only
Regression: Does Variable X influence Outcome Y?
Continuous Data and Categorical Data
[Figures: Aggression (vertical axis) by Frustration level (low, medium, high, very high, extreme)]
Regression vs. ANOVA as
Vehicles for Analyzing Data
Possible Explanations?
A. Public is stealing the silver
B. Reginald is misplacing the silver
C. Guido is pawning the silver
The Palace Heist: A True-Regression Mystery
Possible explanations:
A. Public is stealing silver
B. Reginald’s ADD leads to misplaced silver
C. Guido is pawning silver
Is it just one of these explanations, or a combination of them?
E.g., Public theft, alone, OR public theft plus Guido’s gambling?
If it is multiple causes, are they equally important or is one
more important than another?
E.g., Crowd size has a significant effect on lost silver, but is
less important than Guido’s debts.
Moderation: Do circumstances interact?
E.g., Does more silver get lost when Reginald’s ADD is severe,
but only when crowds are large?
Regression Can Test Each of These Possibilities,
And Can Do So Simultaneously
Y = DV = Aggression
b1 = slope
b1 = Effect of IV on DV (effect of reprimands on aggression)
Coefficients = parameters; things that account for Y.
b0 and b1 are coefficients.
[Figure: Aggression (Y) plotted against Reprimands (X)]
Bullies will aggress 2 times a day plus (1 * number of reprimands).
How many times will a bully aggress if he/she is reprimanded 3 times?
Y = 2 + 1.0 (3) = 5
[Figure: Aggression plotted against Reprimands]
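The arithmetic in this example can be checked with a short sketch. The function name `predict` is a hypothetical helper, not part of the lecture; the intercept (2) and slope (1.0) are the values from the slide.

```python
def predict(b0, b1, x):
    """Predicted outcome on the line Y = b0 + b1 * x."""
    return b0 + b1 * x

# Slide example: baseline of 2 aggressive acts per day,
# plus 1 more for each reprimand; the bully gets 3 reprimands.
aggression = predict(b0=2, b1=1.0, x=3)
print(aggression)  # 5.0
```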
[Figure: plot labeled Y = 3 + 0]
Regression Tests “Models”
Model: A predicted TYPE of relationship between one
or more IVs (predictors) and a DV (outcome).
Relationships can take various shapes:
[Figure: example relationship shapes, with Reprimands on the horizontal axis]
[Figure: scatterplot of Aggression against Reprimands; each point is an individual response, with a fitted line through the points]
Line represents the "best fitting slope".
Disparate points represent residuals = deviations from slope.
"Model fit" is based on method of least squares.
Method of Least Squares
Regression attempts to find the “best
fitting” line to describe data.
Deviation, i.e., error = predicted – actual.
[Figure: scatterplot of Aggression (Y) against Reprimands (X). For case 88, the actual response (X88, Y88) lies off the line, the predicted response (X88, Ŷ88) lies on the line, and the gap between them is the error ε88.]
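One way to see what "best fitting" means is to compute the line by hand. The sketch below uses the standard closed-form least-squares solution for a single predictor. The data are made up for illustration and do not come from the lecture, and `least_squares_fit` is a hypothetical helper name.

```python
def least_squares_fit(xs, ys):
    """Return (b0, b1) for the line that minimizes the sum of squared errors."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = co-variation of X and Y divided by variation in X
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx
    # The least-squares line always passes through (mean_x, mean_y)
    b0 = mean_y - b1 * mean_x
    return b0, b1

# Illustrative (made-up) data: reprimands per day and aggressive acts per day
reprimands = [1, 2, 3, 4, 5, 6]
aggression = [3, 4, 4, 6, 7, 9]

b0, b1 = least_squares_fit(reprimands, aggression)
predicted = [b0 + b1 * x for x in reprimands]
# Error as defined on the slide: predicted minus actual
errors = [p - y for p, y in zip(predicted, aggression)]
print(b0, b1)
print(sum(e ** 2 for e in errors))  # the quantity least squares minimizes
```

Any other choice of intercept and slope would give a larger sum of squared errors for these points.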
Regression Compares Slope to Mean
[Figure: scatterplot of Aggression against Reprimands]
Null Hyp: Mean score of aggression is best predictor, reprimands unimportant (b1 = 0)
[Figure: Aggression against Reprimands, showing the observed slope and the null slope]
Total Sum of Squares (SST)
[Figure: scatterplot of Aggression against Reprimands, showing the model slope and the null slope]
Residual Sum of Squares (SSR)
[Figure: scatterplot of Aggression against Reprimands]
The Regression Question
Does the model (e.g., the regression line) do a better job
describing obtained data than the mean (i.e., b1 = 0)?
In other words, look at the F for "Regression" in the output ("Regression" = model).
F = MSM / MSR
MSM = SSM / model df
MSR = SSR / residual df
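The comparison of model against mean can be sketched numerically. The code below assumes simple regression (one predictor, so model df = 1 and residual df = n - 2) and uses made-up data. The function name `regression_f` is hypothetical.

```python
def regression_f(xs, ys):
    """Return (SST, SSR, SSM, F) for a one-predictor regression."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    predicted = [b0 + b1 * x for x in xs]

    sst = sum((y - mean_y) ** 2 for y in ys)                 # error around the mean
    ssr = sum((y - p) ** 2 for y, p in zip(ys, predicted))   # error around the line
    ssm = sst - ssr                                          # improvement due to the model

    msm = ssm / 1          # model df = number of predictors (1 here)
    msr = ssr / (n - 2)    # residual df = n - predictors - 1
    return sst, ssr, ssm, msm / msr

# Illustrative (made-up) data, not from the lecture
reprimands = [1, 2, 3, 4, 5, 6]
aggression = [3, 4, 4, 6, 7, 9]
sst, ssr, ssm, f = regression_f(reprimands, aggression)
print(sst, ssr, ssm, f)
```

A large F means the line leaves much less unexplained variation than the mean does, relative to the residual error.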
Assessing Individual Predictors
Is the predictor slope significant, i.e., does the IV predict the outcome?
[Output table columns: predictor, t, sig. of t]
B = slope
Std. Error = Std. Error of slope
t = B / Std. Error of B
Beta = Standardized B. Shows how many SDs the outcome changes per SD change in the predictor.
Beta allows comparison of predictor strength between predictors.
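A brief sketch of the two calculations. The t example uses B = 0.74 and SE B = 0.11, the Step 1 values for Fam. Stress in the results table at the end of these notes. The standard deviations in the Beta example are made up, since the notes do not report them, and the conversion Beta = B * (SD of predictor / SD of outcome) is the usual one for a single predictor. Both function names are hypothetical.

```python
def t_statistic(b, se_b):
    """t = B / Std. Error of B."""
    return b / se_b

def standardized_beta(b, sd_predictor, sd_outcome):
    """Convert an unstandardized slope B to Beta (SD units)."""
    return b * sd_predictor / sd_outcome

# B and SE B for Fam. Stress, Step 1, from the results table
print(t_statistic(0.74, 0.11))

# Made-up SDs, only to show the conversion
print(standardized_beta(b=0.5, sd_predictor=2.0, sd_outcome=1.0))  # 1.0
```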
Interpreting Simple Regression
Y = b0 + b1X1 + b2X2 + ε
Y = __ Aggression
b1 = __ reprimands
b2 = __ family stress
ε = __ error
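As a sketch of how such an equation produces a prediction, assume the usual form in which each slope multiplies its own predictor: Y = b0 + b1 * reprimands + b2 * family stress. The coefficients below are the Step 2 values from the results table at the end of these notes; the predictor scores are made up, and `predict_aggression` is a hypothetical helper.

```python
def predict_aggression(reprimands, family_stress,
                       b0=0.71, b1=0.33, b2=0.57):
    """Predicted aggression; defaults are the Step 2 table coefficients."""
    return b0 + b1 * reprimands + b2 * family_stress

# Made-up scores for one child: 2 reprimands, family stress of 3
print(predict_aggression(reprimands=2, family_stress=3))
```

The error term ε is not part of the prediction. It is whatever remains between this predicted value and the child's actual score.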
Elements of Multiple Regression
Total Sum of Squares (SST) = Take the deviation of each score from the DV mean, square these deviations, then sum them.
Residual Sum of Squares (SSR) = Take each residual from the total model (not a simple line), square it, then sum all these squared residuals.
Model Sum of Squares (SSM) = SST – SSR = The amount that the total model explains above and beyond the simple mean.
NOTE: The main difference between these values in multiple regression and simple regression is the use of the total model rather than a single slope. The math is much more complicated, but conceptually the same.
Methods of Regression
Hierarchical:
1. Predictors selected based on theory or past work.
2. Predictors entered into analysis in order of predicted importance, or by known influence.
3. New predictors are entered last, so that their unique contribution can be determined.
Forced Entry: All predictors forced into model simultaneously. No starting hypothesis re. relative importance of predictors.
Stepwise: Program automatically searches for strongest predictor, then second strongest, etc. Predictor 1 is best at explaining the entire model and accounts for, say, 40%. Predictor 2 is best at explaining the remaining 60%, etc. Controversial method.
In general, Hierarchical is most common and most accepted.
Avoid the “kitchen sink”: limit the number of predictors to as few as possible, and to those that make theoretical sense.
Sample Size in Regression
R = Power of regression
R2 = Amount of variance explained
Adj. R2 = Corrects for multiple predictors
R sq. change = Impact of each added model
[Figure: predictors X1, X2, X3]
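These quantities can be sketched in a few lines. The adjusted R2 formula below is the common 1 - (1 - R2)(n - 1)/(n - k - 1) correction, which is an assumption about what the software reports. The R2 values come from the results table note at the end of these notes (.72 at Step 1, change of .11 at Step 2); the sample size is made up, since the notes do not give it. All function names are hypothetical.

```python
def r_squared(ssm, sst):
    """Proportion of total variation explained by the model."""
    return ssm / sst

def adjusted_r_squared(r2, n, k):
    """Shrink R2 to correct for k predictors in a sample of n cases."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

def r_squared_change(r2_new, r2_old):
    """Extra variance explained by the predictors added at a step."""
    return r2_new - r2_old

r2_step1 = 0.72
r2_step2 = r2_step1 + 0.11   # Step 2 adds Reprimands to the model

n = 30  # hypothetical sample size
print(adjusted_r_squared(r2_step1, n, k=1))
print(adjusted_r_squared(r2_step2, n, k=2))
print(r_squared_change(r2_step2, r2_step1))
```

Adjusted R2 is always at or below R2, and the gap widens as predictors are added without adding cases.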
Homoscedasticity and
Heteroscedasticity
Assessing Homoscedasticity
Select: Plots
Enter: ZRESID for Y and ZPRED for X
Ideal Outcome: Equal distribution across chart
Extreme Cases
Cases that deviate greatly from expected outcome (> ± 2.5) can warp regression.
First, identify outliers using Casewise Diagnostics option.
[Figure: scatterplot with one point marked as a possible problem case]
Casewise Diagnostics for Problem Cases Only
In "Statistics" Option, select Casewise Diagnostics
Select "outliers outside:" and type in how many Std. Dev. you
regard as critical. Default = 3
More than 3 DV
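The idea behind casewise diagnostics can be sketched outside the statistics package. The code below assumes a single predictor and standardizes each residual by the residual standard error, sqrt(SSR / (n - 2)), which may differ in detail from what the software computes. The data are made up, with one case deliberately shifted, and both function names are hypothetical.

```python
import math

def standardized_residuals(xs, ys):
    """Residuals from a one-predictor fit, divided by the residual standard error."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
    std_error = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))
    return [e / std_error for e in residuals]

def flag_cases(xs, ys, cutoff=2.5):
    """Return 1-based case numbers whose standardized residual exceeds the cutoff."""
    z = standardized_residuals(xs, ys)
    return [i + 1 for i, value in enumerate(z) if abs(value) > cutoff]

# Made-up data: 12 cases on a straight line, except case 6
xs = list(range(1, 13))
ys = [float(x) for x in xs]
ys[5] = 16.0  # case 6 is far above what the line predicts

print(flag_cases(xs, ys))  # [6]
```

Raising the cutoff flags fewer cases and lowering it flags more, which is the same trade-off as the "outliers outside" setting.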
What If Assumption(s) are Violated?
What is problem with violating assumptions?
Some options:
                   B      SE B    β
Step 1
  Constant        -0.54   0.42
  Fam. Stress      0.74   0.11    .85 *
Step 2
  Constant         0.71   0.34
  Fam. Stress      0.57   0.10    .67 *
  Reprimands       0.33   0.10    .38 *
Note: R2 = .72 for Step 1, Δ R2 = .11 for Step 2 (p = .004); * p < .01