Exam 3 (F20)
* zn --- proportion of residential land zoned for lots over 25,000 sq.ft.
* chas --- Charles River dummy variable (= 1 if tract bounds river; 0 otherwise).
* black --- $1000 (Bk - 0.63)^2$ where $Bk$ is the proportion of blacks by town.
library(MASS)
names(Boston)
STAT 3113 Regression Analysis Exam 3 (Fall 2020) Name: Jacob Sheridan
Question 1: Conduct a stepwise regression analysis of the Boston data using R to find the “best”
predictors of medv. Please use 0.15 for both α-to-remove and α-to-enter. Include the R output and
comment on the output.
It appears that lstat is the best variable, followed by rm, then ptratio, then dis.
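The R output is not reproduced here. As a minimal sketch of how a stepwise search with α-to-enter = α-to-remove = 0.15 could be run, the loop below alternates `add1()` and `drop1()` partial F-tests from base R. The names `step_fit`, `entered`, `alpha_enter`, and `alpha_remove` are illustrative, and this is not necessarily the code used for the answer above.

```r
library(MASS)  # provides the Boston data frame

alpha_enter  <- 0.15  # alpha-to-enter from the question
alpha_remove <- 0.15  # alpha-to-remove from the question
candidates <- setdiff(names(Boston), "medv")

step_fit <- lm(medv ~ 1, data = Boston)  # start from the intercept-only model
entered <- character(0)                  # order in which predictors enter

for (iter in seq_len(100)) {  # hard cap as a safeguard against cycling
  changed <- FALSE

  # Forward step: add the candidate with the smallest partial F-test p-value
  in_model <- attr(terms(step_fit), "term.labels")
  if (length(setdiff(candidates, in_model)) > 0) {
    add_tab <- add1(step_fit, scope = reformulate(candidates), test = "F")[-1, ]
    best <- which.min(add_tab[["Pr(>F)"]])
    if (add_tab[["Pr(>F)"]][best] < alpha_enter) {
      new_var <- rownames(add_tab)[best]
      step_fit <- update(step_fit, as.formula(paste(". ~ . +", new_var)))
      entered <- c(entered, new_var)
      changed <- TRUE
    }
  }

  # Backward step: drop the weakest term if its p-value exceeds alpha-to-remove
  if (length(attr(terms(step_fit), "term.labels")) > 0) {
    drop_tab <- drop1(step_fit, test = "F")[-1, ]
    worst <- which.max(drop_tab[["Pr(>F)"]])
    if (drop_tab[["Pr(>F)"]][worst] > alpha_remove) {
      old_var <- rownames(drop_tab)[worst]
      step_fit <- update(step_fit, as.formula(paste(". ~ . -", old_var)))
      changed <- TRUE
    }
  }

  if (!changed) break  # stop when nothing enters or leaves
}

print(entered)            # entry order of the predictors
print(summary(step_fit))  # coefficients of the final stepwise model
```

The entry order printed by `entered` can be compared directly with the ranking given in the answer.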
Question 2: What are the dangers associated with drawing inferences from the stepwise model?
Question 3: Use all-possible-regressions-selection to find the “best” predictors of medv. Include the
adjusted r-square plot and Cp plot, and illustrate how the choice is made.
The higher the adjusted R-square and the lower the Cp, the better the model. Therefore, the point where
both plots start to level out indicates roughly how many variables to include in the model.
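The plots themselves are not reproduced here. One way to build them, sketched with base R only, is to fit every subset of the 13 candidate predictors and keep the best subset of each size. The names `best`, `mse_full`, and `sst` are illustrative, and Cp is computed as SSE_p / MSE_full - (n - 2p), with p counting the intercept. A package such as leaps could be used instead.

```r
library(MASS)  # provides the Boston data frame

X <- as.matrix(Boston[, setdiff(names(Boston), "medv")])
y <- Boston$medv
n <- nrow(X)
k <- ncol(X)

# Residual mean square of the full model, used as the sigma^2 estimate in Cp
full <- lm.fit(cbind(1, X), y)
mse_full <- sum(full$residuals^2) / (n - k - 1)
sst <- sum((y - mean(y))^2)

best <- data.frame(size = 1:k, vars = NA_character_, adjr2 = NA_real_,
                   cp = NA_real_, stringsAsFactors = FALSE)

for (m in 1:k) {
  subsets <- combn(k, m)  # each column is one subset of m predictors
  sse <- apply(subsets, 2, function(idx) {
    sum(lm.fit(cbind(1, X[, idx, drop = FALSE]), y)$residuals^2)
  })
  j <- which.min(sse)  # best subset of size m has the smallest SSE
  best$vars[m]  <- paste(colnames(X)[subsets[, j]], collapse = " + ")
  best$adjr2[m] <- 1 - (sse[j] / (n - m - 1)) / (sst / (n - 1))
  best$cp[m]    <- sse[j] / mse_full - (n - 2 * (m + 1))
}
print(best)

# Adjusted R-square plot and Cp plot against the number of predictors
par(mfrow = c(1, 2))
plot(best$size, best$adjr2, type = "b",
     xlab = "Number of predictors", ylab = "Adjusted R-square")
plot(best$size, best$cp, type = "b",
     xlab = "Number of predictors", ylab = "Mallows' Cp")
abline(1, 1, lty = 2)  # reference line Cp = number of parameters
par(mfrow = c(1, 1))
```

A common reading is to pick the smallest model beyond which adjusted R-square barely improves and whose Cp is close to its number of parameters.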
Question 4: Compare the results in Questions 1 and 3. Which independent variables are consistently
selected as the “best” predictors?
The variables that are consistently selected as the “best” are lstat, rm, ptratio, and dis.
Question 5: Fit the first-order linear model with 10 independent variables: crim, chas, nox, rm, dis,
rad, tax, ptratio, black, lstat. Plot the residual plots. Comment on the four residual plots one by one on
the issues you found. What model adjustments would you recommend?
There is a slight curvilinear trend in the Residuals vs Fitted plot. In the Normal Q-Q plot, the points
deviate from the line toward the right end. The points aren’t very spread out in the Scale-Location plot.
The Residuals vs Leverage plot looks acceptable since none of the points fall beyond the Cook’s distance
contours. I would suggest transforming one of the variables.
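For reference, a minimal sketch of how the first-order model and its four diagnostic plots could be produced. The object name `fit` is chosen to match the `crPlots(fit)` call in Question 6; the plots come from the standard `plot()` method for `lm` objects.

```r
library(MASS)  # provides the Boston data frame

# First-order model with the 10 predictors listed in Question 5
fit <- lm(medv ~ crim + chas + nox + rm + dis + rad + tax + ptratio + black + lstat,
          data = Boston)
print(summary(fit))

# Residuals vs Fitted, Normal Q-Q, Scale-Location, Residuals vs Leverage
par(mfrow = c(2, 2))
plot(fit)
par(mfrow = c(1, 1))
```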
Question 6: Use the following code to plot the partial residual plots. Comment on the partial residual
plots. What do the plots reveal about the relationship between medv and the independent variables?
library(car)
crPlots(fit)
They reveal whether each variable has a positive or negative effect on medv. They also reveal that
most of the variables have a mostly linear relationship with medv, except rm, which has a curvilinear
relationship with medv.
Question 7: Fit the model with the 10 variables in Question 5 and add the second-order term of lstat,
the second-order term of rm, and the interactions rad & lstat, rm & rad, crim & chas, and chas & nox.
Include the R output of the model fitting. Comment on the model fitted.
Adding these terms makes rm and black much less significant. The adjusted R-squared
increases by 0.1333.
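The R output is not reproduced here. A sketch of one way to fit the model described in Question 7 and compare its adjusted R-squared with the first-order model follows. The names `fit1` and `fit2` are illustrative, and the squared terms are written with `I()` so that `^` is treated as arithmetic inside the formula.

```r
library(MASS)  # provides the Boston data frame

# First-order model from Question 5
fit1 <- lm(medv ~ crim + chas + nox + rm + dis + rad + tax + ptratio + black + lstat,
           data = Boston)

# Question 7 model: adds two squared terms and four interactions
fit2 <- lm(medv ~ crim + chas + nox + rm + dis + rad + tax + ptratio + black + lstat +
             I(lstat^2) + I(rm^2) + rad:lstat + rm:rad + crim:chas + chas:nox,
           data = Boston)
print(summary(fit2))

# Change in adjusted R-squared relative to the first-order model
adj_gain <- summary(fit2)$adj.r.squared - summary(fit1)$adj.r.squared
print(adj_gain)
```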
Question 8: For the model fitted in Question 7, is the normality assumption reasonably satisfied?
Yes
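A check of this kind could be sketched as follows, assuming the Question 7 model is stored under the illustrative name `fit2`. The Normal Q-Q plot is the usual visual check. The Shapiro-Wilk test is added here only as a supplementary assumption, not something the exam asks for.

```r
library(MASS)  # provides the Boston data frame

fit2 <- lm(medv ~ crim + chas + nox + rm + dis + rad + tax + ptratio + black + lstat +
             I(lstat^2) + I(rm^2) + rad:lstat + rm:rad + crim:chas + chas:nox,
           data = Boston)

res <- residuals(fit2)

# Normal Q-Q plot of the residuals with a reference line
qqnorm(res)
qqline(res)

# Shapiro-Wilk test: a small p-value suggests departure from normality
sw <- shapiro.test(res)
print(sw)
```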
Question 9: Use Studentized Deleted Residual method to identify whether there are any outliers.
Hint: In Course Content -> R -> Chapter 8 Residual Analysis R code, find the code to get studentized
deleted residuals of the model you fit in Question 7.
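The course code referred to in the hint is not available here. As a sketch under that caveat, base R's `rstudent()` returns studentized deleted residuals, which can be compared with a Bonferroni critical value at an overall level of 0.05. The level 0.05 and the names `fit2`, `t_del`, `crit`, and `outliers` are assumptions.

```r
library(MASS)  # provides the Boston data frame

fit2 <- lm(medv ~ crim + chas + nox + rm + dis + rad + tax + ptratio + black + lstat +
             I(lstat^2) + I(rm^2) + rad:lstat + rm:rad + crim:chas + chas:nox,
           data = Boston)

n <- nobs(fit2)          # number of observations
p <- length(coef(fit2))  # number of parameters, including the intercept

t_del <- rstudent(fit2)  # studentized deleted residuals

# Bonferroni critical value: t(1 - alpha / (2n); n - p - 1) with alpha = 0.05
crit <- qt(1 - 0.05 / (2 * n), df = n - p - 1)

outliers <- which(abs(t_del) > crit)
print(crit)
print(t_del[outliers])   # observations flagged as outlying in medv
```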
Question 10: Use Cook’s Distance method to identify whether there are any influential observations.
Hint: In Course Content -> R -> Chapter 8 Residual Analysis R code, find the code to get Cook’s Distance
of the model you fit in Question 7.
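The course code referred to in the hint is not available here. A sketch using base R's `cooks.distance()` is shown instead. The cutoff used is the median of the F(p, n - p) distribution, a common textbook guideline. That cutoff and the names `fit2`, `d`, `cutoff`, and `influential` are assumptions.

```r
library(MASS)  # provides the Boston data frame

fit2 <- lm(medv ~ crim + chas + nox + rm + dis + rad + tax + ptratio + black + lstat +
             I(lstat^2) + I(rm^2) + rad:lstat + rm:rad + crim:chas + chas:nox,
           data = Boston)

n <- nobs(fit2)          # number of observations
p <- length(coef(fit2))  # number of parameters, including the intercept

d <- cooks.distance(fit2)

# Guideline: flag observations whose Cook's distance exceeds the F(p, n - p) median
cutoff <- qf(0.5, df1 = p, df2 = n - p)
influential <- which(d > cutoff)

print(cutoff)
print(head(sort(d, decreasing = TRUE)))  # largest Cook's distances
print(influential)                       # empty if nothing exceeds the cutoff
```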