Simple Linear Ordinary Least Squares Regression: JTMS-03 Applied Statistics With R
19.04.2021
Rationale of simple linear regression
Each observed score yi can be decomposed into a predicted score ŷi and a
residual (error) εi:
yi = ŷi + εi
• The predicted scores ŷ are located along the line of best fit (also
known as the regression line)
– Recall the straight line from the correlational scatter plots that
depicts the correlation between the variables involved
– Two quantities are needed to determine a straight line: its
intercept and its slope
ŷi = a + bxi
where:
ŷi is the predicted score for observation i
a is the intercept of the line (point of intersection with Y-axis)
b is the slope (gradient) of the regression line
– Hence, the regression model is:
yi = a + bxi + εi
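The decomposition above can be illustrated in R; the simulated data and the true values a = 2, b = 0.5 below are illustrative assumptions, not the course data:

```r
# Simulate data from yi = a + b*xi + ei and recover a and b with lm().
# The true values a = 2, b = 0.5 and n = 200 are illustrative assumptions.
set.seed(123)
x <- runif(200, min = 0, max = 10)
y <- 2 + 0.5 * x + rnorm(200, mean = 0, sd = 1)
fit <- lm(y ~ x)
coef(fit)   # estimates close to the true intercept 2 and slope 0.5
# The decomposition yi = y-hat_i + e_i holds exactly for every observation:
all.equal(unname(fitted(fit) + resid(fit)), y)
```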
[Figure: scatter plot with the regression line ŷ = a + bx. For each
observation, the residual εi = yi – ŷi is the vertical distance between
the observed score yi and the predicted score ŷi on the line.]
[Figure: regression line ŷ = a + bx, with the intercept a marked where
the line crosses the Y-axis and the slope b shown as the change in ŷ
per unit increase in x.]
• The least-squares estimates of the slope and the intercept are:
b = Σ(xi – x̄)(yi – ȳ) / Σ(xi – x̄)²
a = ȳ – b*x̄
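The least-squares estimates b = Σ(xi – x̄)(yi – ȳ) / Σ(xi – x̄)² and a = ȳ – b*x̄ can be computed by hand in R and checked against lm(); the x and y values below are made up for illustration, not the course data:

```r
# Least-squares slope and intercept computed directly from the formulas,
# then checked against lm(). The data are made up for illustration.
x <- c(10, 15, 20, 25, 30, 35, 40)
y <- c(40, 45, 50, 55, 58, 62, 66)
b <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
a <- mean(y) - b * mean(x)
c(intercept = a, slope = b)
coef(lm(y ~ x))   # lm() returns the same values
```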
Simple linear regression
• Example: predicting exam points from hours of studying, with ȳ = 56,
x̄ = 29, and b = 0.745:
a = ȳ – b*x̄ = 56 – 0.745*29 = 34.40
• Intercept: a = 34.40
Students who invested 0 hours of studying would on average get
34.40 points on the exam.
• R output of the fitted model ex1.reg:
summary(ex1.reg)
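The slides do not show how ex1.reg was created; a hedged sketch, assuming a data frame ex1 with one row per student and variables hours and points (the data frame layout, the variable names, and the ten made-up observations below are assumptions and do not reproduce the course results):

```r
# Hypothetical reconstruction: 'ex1', 'hours' and 'points' are assumed
# names; the ten made-up observations do NOT reproduce the course data.
ex1 <- data.frame(hours  = c(5, 10, 15, 20, 25, 30, 35, 40, 45, 50),
                  points = c(30, 42, 38, 50, 55, 52, 60, 65, 58, 70))
ex1.reg <- lm(points ~ hours, data = ex1)
summary(ex1.reg)   # coefficient table, R-squared, overall F test
```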
The overall ANOVA (F) test of model fit shows whether the predictors in
the regression model explain a significant amount of variation in the
dependent variable (note the complementarity between ANOVA and
regression).
As F(1, 8) = 8.647, p < .05, we can conclude that the explained amount
of variation (R² = .5194) is statistically significant.
The result of the overall ANOVA test of model fit can also be obtained
as follows:
anova(ex1.reg)
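In simple regression, the overall F statistic reported by anova() is simply the squared t statistic of the slope; a sketch with simulated data (not the course data):

```r
# For a single predictor, the overall F equals the slope's t squared.
set.seed(1)
x <- rnorm(10)
y <- 1 + 0.8 * x + rnorm(10)
fit <- lm(y ~ x)
f_overall <- anova(fit)[["F value"]][1]
t_slope   <- summary(fit)$coefficients["x", "t value"]
all.equal(f_overall, t_slope^2)   # TRUE
```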
The evidence on the intercept (a = 34.406, p < .05) shows that when
students invest 0 hours of studying, the expected exam performance is
34.4 points, and that this value is significantly different from 0.
As you may realize, this is not very useful information, because there
is no observed score of 0 on exam performance. In fact, the intercept
is quite often not interpreted at all.
• Standardized regression coefficient (beta), obtained with the
QuantPsyc package:
library(QuantPsyc)
lm.beta(ex1.reg)
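lm.beta() rescales the slope into standard-deviation units; with a single predictor the standardized slope coincides with the Pearson correlation. A sketch with simulated data (not the course data), obtaining the same quantity by refitting on z-scores:

```r
# The standardized slope can be obtained by regressing z-scores on
# z-scores; with one predictor it equals the correlation coefficient.
set.seed(1)
x <- rnorm(10)
y <- 1 + 0.8 * x + rnorm(10)
beta_std <- coef(lm(scale(y) ~ scale(x)))[2]
all.equal(unname(beta_std), cor(x, y))   # TRUE
```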
• How many hours should a student invest in order to get 100 out of 100
points on the exam?
100 = 34.406 + 0.745*x
x = (100 – 34.406) / 0.745 = 88.05 hours
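The rearrangement can be checked directly in R with the estimated coefficients:

```r
# Solve 100 = a + b*x for x, using the estimated coefficients.
a <- 34.406
b <- 0.745
x_needed <- (100 - a) / b
round(x_needed, 2)   # 88.05
```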
Core assumptions of simple linear regression
• Dependent variable
– Continuous (interval or ratio scaled)
• Independent variable (predictor)
– Continuous (not necessarily normally distributed) or dichotomous
– Non-zero variance: scores on the predictor should not be
identical for all observations
– Homoskedasticity: residuals have the same variance at each
level of the predictor
– Linearly related to the dependent variable
• Errors
– Normally distributed
– Lack of autocorrelation: For any two observations, the residuals
should be uncorrelated
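Several of these assumptions can be checked in R with standard tools; a sketch on simulated data (the specific checks below are common choices, not prescribed by the slides):

```r
# Simulated data that satisfy the model assumptions by construction.
set.seed(42)
x <- runif(50, 0, 10)
y <- 2 + 3 * x + rnorm(50)
fit <- lm(y ~ x)

var(x) > 0                 # non-zero variance of the predictor
shapiro.test(resid(fit))   # normality of the errors
plot(fit, which = 1)       # residuals vs fitted: linearity, homoskedasticity
plot(fit, which = 3)       # scale-location: another view of homoskedasticity
```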
Plots for diagnostics
https://www.qualtrics.com/support/stats-iq/analyses/regression-guides/interpreting-residual-plots-improve-regression/
• Plot of predicted values vs. actual (observed) values
[Figure: two example plots, one showing high model accuracy, the other
rather low model accuracy]
[Figure: predicted vs. actual plots revealing an omitted grouping
variable and an omitted interaction effect]
• Plot of predicted values vs. standardized residuals
– Examples of well-fitting models: residuals are randomly scattered,
not forming a clear pattern
– Example of a problematic model: heteroskedasticity
– Example of a problematic model: non-linearity
– Example of a problematic model: presence of outliers
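A plot of predicted values against standardized residuals can be produced in base R as follows (simulated data; rstandard() computes the standardized residuals of a fitted lm object):

```r
# Predicted values vs. standardized residuals for a fitted model.
set.seed(7)
x <- runif(100, 0, 10)
y <- 1 + 2 * x + rnorm(100)
fit <- lm(y ~ x)
plot(fitted(fit), rstandard(fit),
     xlab = "Predicted values", ylab = "Standardized residuals")
abline(h = 0, lty = 2)   # well-fitting: points scatter randomly around this line
```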
[Further examples of diagnostic plots:
https://i.stack.imgur.com/wzOMY.png]