Chapter 7 - Regression Analysis
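Y = β0 + β1X + ε
This population model cannot be observed directly, but it can be estimated from sample data as follows:
Ypred = b0 + b1X
where b0 and b1 are the sample estimates of β0 and β1, and the residual e = Y - Ypred is the difference between an observed and a predicted value.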
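As a minimal sketch of how b0 and b1 might be obtained, the following assumes ordinary least squares (the text does not name the estimation method) and uses made-up data; the function name `fit_simple_ols` and the values in `x` and `y` are hypothetical.

```python
def fit_simple_ols(x, y):
    """Return (b0, b1) for the least-squares line y = b0 + b1 * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sum of squares around the means.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx               # slope
    b0 = mean_y - b1 * mean_x    # intercept
    return b0, b1


# Hypothetical data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = fit_simple_ols(x, y)

# Residuals: observed value minus predicted value for each case.
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
```

With an intercept in the model, least-squares residuals sum to zero (up to floating-point error), which is a quick sanity check on any fit.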
Understanding Residuals in Regression
Null and Alternative Hypothesis
Average daily time spent watching TV accounted for 12.9% of the variation in
cholesterol concentration with adjusted R2 = 12.0%, a medium size effect according
to Cohen (1988).
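The adjusted R2 can be checked against the reported R2 with the usual adjustment formula. This is a sketch that assumes n = 99 cases, inferred from the residual degrees of freedom in the F(1, 97) reported for this model; the function name `adjusted_r2` is illustrative.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R2 for a model with k predictors fitted to n cases."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


r2 = 0.129       # proportion of variance explained, as reported
k = 1            # one predictor: average daily time spent watching TV
n = 97 + k + 1   # assumption: n inferred from residual df = n - k - 1 = 97

adj = adjusted_r2(r2, n, k)
print(f"adjusted R2 = {adj:.3f}")
```

The result agrees with the reported 12.0% to three decimal places.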
Statistical Significance of the Model
Average daily time spent watching TV statistically significantly
predicted cholesterol concentration, F(1, 97) = 14.40, p < .001.
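The F statistic can be roughly reconstructed from R2 and the degrees of freedom. The sketch below uses the rounded R2 of 12.9% reported for this model, so it only approximates the published 14.40; the function name `f_from_r2` is illustrative, and the p-value is not computed here.

```python
def f_from_r2(r2, df_model, df_resid):
    """Overall F statistic from R2 and the model/residual degrees of freedom."""
    return (r2 / df_model) / ((1 - r2) / df_resid)


# R2 is the rounded value reported in the text, so F is approximate.
f_stat = f_from_r2(r2=0.129, df_model=1, df_resid=97)
print(f"F(1, 97) is approximately {f_stat:.2f}")
```

The small gap between this value and 14.40 comes from rounding R2 to three decimals before the calculation.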
Interpreting the Coefficients
Substituting the values of the coefficients into the regression equation, you
have:
cholesterol concentration = -0.944 + (0.037)(time_tv)
Predicting Cholesterol Concentration
Predictions were made to determine mean cholesterol concentration
for those people who watched a daily average of 160, 170 and 180
minutes of TV. For 160 minutes, mean cholesterol concentration was
predicted as 4.98 mmol/L, 95% CI [4.73, 5.23]; for 170 minutes it was
predicted as 5.35 mmol/L, 95% CI [5.24, 5.45]; and for 180 minutes it
was predicted as 5.72 mmol/L, 95% CI [5.53, 5.90].
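The point predictions above follow directly from the fitted equation. A minimal sketch, using the reported coefficients (the function name `predict_cholesterol` is illustrative):

```python
# Coefficients as reported in the fitted regression equation.
INTERCEPT = -0.944
SLOPE_TIME_TV = 0.037  # change in mmol/L per extra minute of daily TV


def predict_cholesterol(time_tv_minutes):
    """Predicted mean cholesterol concentration (mmol/L)."""
    return INTERCEPT + SLOPE_TIME_TV * time_tv_minutes


predictions = {m: round(predict_cholesterol(m), 2) for m in (160, 170, 180)}
print(predictions)
```

The confidence intervals cannot be reproduced this way: they depend on the standard errors of the estimates, which require the original data or the coefficient covariance matrix.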
Reporting
A linear regression was run to understand the effect of average daily time spent watching TV
on cholesterol concentration. To assess linearity, a scatterplot of cholesterol concentration
against average daily time spent watching TV with a superimposed regression line was plotted.
Visual inspection of this plot indicated a linear relationship between the variables.
There was homoscedasticity and normality of the residuals. There was one outlier, a participant
with a cholesterol concentration of 7.98 mmol/L. This participant was removed from the analysis
for not representing the target population.
The regression equation for the current example can be expressed in the following form:
predicted VO2max = b0 + (b1 × age) + (b2 × weight) + (b3 × heart_rate) + (b4 × gender)
predicted VO2max = 87.83 – (0.165)(age) – (0.385)(weight) – (0.118)(heart_rate) + (13.208)(gender)
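A minimal sketch of using this equation for prediction. The coefficients are those reported above; the coding of gender (1 = male, 0 = female) is an assumption, since the text does not state it, and the example person is hypothetical.

```python
# Coefficients as reported in the fitted equation.
COEFFICIENTS = {
    "intercept": 87.83,
    "age": -0.165,
    "weight": -0.385,
    "heart_rate": -0.118,
    "gender": 13.208,
}


def predict_vo2max(age, weight, heart_rate, gender):
    """Predicted VO2max; gender assumed coded 1 = male, 0 = female."""
    return (
        COEFFICIENTS["intercept"]
        + COEFFICIENTS["age"] * age
        + COEFFICIENTS["weight"] * weight
        + COEFFICIENTS["heart_rate"] * heart_rate
        + COEFFICIENTS["gender"] * gender
    )


# Hypothetical person, for illustration only.
example = predict_vo2max(age=30, weight=80, heart_rate=133, gender=1)
print(f"predicted VO2max = {example:.2f}")
```

Holding the other predictors fixed, changing gender from 0 to 1 shifts the prediction by exactly the gender coefficient, 13.208.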
Predicting the Dependent Variable
A multiple regression was run to predict VO2max from gender, age, weight and heart
rate. There was linearity as assessed by partial regression plots and a plot of studentized
residuals against the predicted values. There was independence of residuals, as
assessed by a Durbin-Watson statistic of 1.910. There was homoscedasticity, as assessed
by visual inspection of a plot of studentized residuals versus unstandardized predicted
values. There was no evidence of multicollinearity, as assessed by tolerance values
greater than 0.1. There were no studentized deleted residuals greater than ±3 standard
deviations, no leverage values greater than 0.2, and no values for Cook's distance above 1.
The assumption of normality was met, as assessed by a Q-Q Plot. The multiple regression
model statistically significantly predicted VO2max, F(4, 95) = 32.393, p < .001,
adj. R2 = .56. All four variables added statistically significantly to the prediction, p < .05.
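The Durbin-Watson statistic used above to assess independence of residuals can be computed from the residuals taken in case order. This is a sketch with made-up residuals; the original data are not available here, so it does not reproduce the reported value of 1.910.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: ranges from 0 to 4, near 2 when
    successive residuals are uncorrelated."""
    # Sum of squared differences between successive residuals.
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2
        for t in range(1, len(residuals))
    )
    # Sum of squared residuals.
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Hypothetical residuals, for illustration only.
hypothetical_residuals = [0.5, -0.3, 0.2, -0.4, 0.1, 0.3, -0.2]
dw = durbin_watson(hypothetical_residuals)
print(f"Durbin-Watson = {dw:.3f}")
```

Values well below 2 suggest positive autocorrelation and values well above 2 suggest negative autocorrelation, so a statistic close to 2 is consistent with independent residuals.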
Summarizing the Multiple Regression Analysis