
Adekunle Onaopepo 2561632

Ghareh Gozlou Samira 2561142


Problem 6
a)
The matrix of correlations is given in the table below.

              mpg      cylinders  displacement  horsepower  weight   acceleration  year     origin
mpg            1.0000  -0.7776    -0.8051       -0.7784     -0.8322   0.4233        0.5805   0.5652
cylinders     -0.7776   1.0000     0.9508        0.8429      0.8975  -0.5046       -0.3456  -0.5689
displacement  -0.8051   0.9508     1.0000        0.8972      0.9329  -0.5438       -0.3698  -0.6145
horsepower    -0.7784   0.8429     0.8972        1.0000      0.8645  -0.6891       -0.4163  -0.4551
weight        -0.8322   0.8975     0.9329        0.8645      1.0000  -0.4168       -0.3091  -0.5850
acceleration   0.4233  -0.5046    -0.5438       -0.6891     -0.4168   1.0000        0.2903   0.2127
year           0.5805  -0.3456    -0.3698       -0.4163     -0.3091   0.2903        1.0000   0.1815
origin         0.5652  -0.5689    -0.6145       -0.4551     -0.5850   0.2127        0.1815   1.0000

Without loss of generality, taking mpg as the reference variable:

The correlation between a variable and itself is 1.0.
The correlation (mpg, X) for X in {cylinders, displacement, horsepower, weight} is negative, showing an inverse relation between the variables.
The correlation (mpg, X) for X in {acceleration, year, origin} is positive, showing a direct relation between the variables.
The correlation between year and origin, 0.18, is close to zero, which implies that the two variables have only a weak linear relation.
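As a rough illustration of how each entry of a correlation matrix is obtained, here is a minimal Python sketch of the sample Pearson coefficient. The function names and the toy values are assumptions made for this example; they are not the Auto data and not necessarily the software used for the table.

```python
import math

def pearson(x, y):
    """Sample Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Centred cross-product and sums of squares
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_matrix(columns):
    """Pairwise correlations for a dict mapping column name -> values."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}

# Hypothetical toy values, NOT rows of the Auto data set
toy = {
    "mpg": [18.0, 25.0, 31.0, 36.0],
    "weight": [3500.0, 2900.0, 2300.0, 2000.0],
}
corr = correlation_matrix(toy)

# A zero correlation does not by itself mean independence:
# here y is an exact function of x, yet the linear correlation vanishes.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [v * v for v in xs]
zero_corr = pearson(xs, ys)
```

The last three lines show why a correlation near zero is read as a weak linear relation and not as independence.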
b)

Model                 RSE     R squared
mpg ~ cylinders       4.914   0.6047
mpg ~ displacement    4.635   0.6482
mpg ~ horsepower      4.906   0.6059
mpg ~ year            6.363   0.337

The table shows that mpg is strongly related to cylinders, displacement and horsepower taken one at a time: each explains roughly 60 to 65% of the variance in mpg. The relation with year is clearly weaker (R squared 0.337, RSE 6.363), although with 390 residual degrees of freedom an R squared of that size still corresponds to a statistically significant slope.
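The RSE and R squared figures in part b) come from one-predictor least-squares fits. The sketch below is a minimal, assumed implementation of such a fit in plain Python, together with the standard identity that converts R squared into the F statistic of a simple regression. The function names and the toy data are hypothetical; the sample size of 392 is inferred from the reported 390 degrees of freedom.

```python
import math

def simple_ols(x, y):
    """Least-squares fit of y = intercept + slope * x with basic fit statistics."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    tss = sum((b - mean_y) ** 2 for b in y)
    return {
        "intercept": intercept,
        "slope": slope,
        "rse": math.sqrt(rss / (n - 2)),   # residual standard error on n - 2 df
        "r_squared": 1.0 - rss / tss,
    }

def f_from_r_squared(r_squared, n):
    """F statistic of a one-predictor regression, computed from R squared."""
    return r_squared / (1.0 - r_squared) * (n - 2)

# Hypothetical toy data, NOT the Auto data set
fit = simple_ols([1.0, 2.0, 3.0, 4.0, 5.0], [2.1, 3.9, 6.2, 7.8, 10.1])

# Reported R squared for mpg ~ year, with n = 392 inferred from 390 df
f_year = f_from_r_squared(0.337, 392)
```

For the year model the F statistic comes out at roughly 198, so a low R squared indicates a weaker fit but not necessarily an insignificant predictor.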
c)

The 95% confidence intervals for all parameter estimates are:


Parameter      Lower bound      Upper bound
(Intercept)    -26.349864469    -8.087004775
cylinders       -1.129001385     0.142248747
displacement     0.005119788     0.034671499
horsepower      -0.044058392     0.010156103
weight          -0.007756074    -0.005192013
acceleration    -0.113769257     0.274920933
year             0.650551315     0.850994041
origin           0.879280169     1.973000822

The intervals suggest a high standard error for the intercept estimate (β0), while the weight estimate (β4) has the lowest standard error in the group. Consequently the 95% confidence interval for β0 spans a much wider range than that for β4.
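Because a confidence interval for a regression coefficient has the form estimate ± t × standard error, its midpoint recovers the point estimate and its half-width is proportional to the standard error. The following Python sketch applies this to the bounds reported above; the variable and function names are assumptions made for the example.

```python
# 95% confidence bounds as reported in part c)
intervals = {
    "(Intercept)": (-26.349864469, -8.087004775),
    "cylinders": (-1.129001385, 0.142248747),
    "displacement": (0.005119788, 0.034671499),
    "horsepower": (-0.044058392, 0.010156103),
    "weight": (-0.007756074, -0.005192013),
    "acceleration": (-0.113769257, 0.274920933),
    "year": (0.650551315, 0.850994041),
    "origin": (0.879280169, 1.973000822),
}

def centre_and_half_width(lower, upper):
    """Midpoint (the point estimate) and half-width (t times the standard error)."""
    return (lower + upper) / 2.0, (upper - lower) / 2.0

summary = {name: centre_and_half_width(lo, hi) for name, (lo, hi) in intervals.items()}

# Parameters with the widest and narrowest intervals
widest = max(summary, key=lambda name: summary[name][1])
narrowest = min(summary, key=lambda name: summary[name][1])

# Parameters whose interval contains zero
contains_zero = [name for name, (lo, hi) in intervals.items() if lo <= 0.0 <= hi]
```

Here `widest` is the intercept and `narrowest` is weight, while the intervals for cylinders, horsepower and acceleration contain zero. The half-widths depend on the units of each predictor, so they are not directly comparable as measures of importance.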
Multiple Linear Regression
Residual standard error: 4.914 on 390 degrees of freedom
Multiple R-squared: 0.6047

Simple Linear Regression (one predictor at a time)

mpg ~ cylinders
Residual standard error: 4.914 on 390 degrees of freedom
Multiple R-squared: 0.6047

mpg ~ displacement
Residual standard error: 4.635 on 390 degrees of freedom
Multiple R-squared: 0.6482

mpg ~ horsepower
Residual standard error: 4.906 on 390 degrees of freedom
Multiple R-squared: 0.6059

mpg ~ year
Residual standard error: 6.363 on 390 degrees of freedom
Multiple R-squared: 0.337

The model fit is worse in the simple linear regressions: for cylinders, horsepower and displacement the residual standard error is generally higher and the R-squared lower than in the multiple regression model. This implies that the multiple linear regression gives a better description of the data.

d)

The residual plot in the upper left illustration suggests slight non-linearity, but the fit is still generally acceptable.
The residual plot also flags observations 323, 326 and 327 as potential outliers.
The leverage plot in the lower left illustration shows that observation 14 has unusually high leverage.
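To make the notion of leverage concrete, the sketch below computes the leverage statistic for a regression with a single predictor, where it has the closed form h_i = 1/n + (x_i - mean(x))^2 / sum_j (x_j - mean(x))^2. This is only an illustration under that one-predictor assumption; the plot discussed above comes from a model with several predictors, for which leverage is read off the diagonal of the hat matrix. The data and the "twice the average" cut-off are hypothetical choices, the latter being a common rule of thumb and not a formal test.

```python
def leverages(x):
    """Leverage of each observation in a one-predictor linear regression."""
    n = len(x)
    mean_x = sum(x) / n
    sxx = sum((v - mean_x) ** 2 for v in x)
    return [1.0 / n + (v - mean_x) ** 2 / sxx for v in x]

# Hypothetical predictor values; the last one lies far from the others
x = [1.0, 2.0, 3.0, 4.0, 20.0]
h = leverages(x)

# Leverages always sum to p + 1 (here 2), so their average is (p + 1) / n
average_leverage = 2.0 / len(x)

# Rule-of-thumb flag: leverage more than twice the average
high_leverage = [i for i, value in enumerate(h) if value > 2.0 * average_leverage]
```

In this toy example only the last observation (index 4) is flagged, because its predictor value is far from the mean of the others.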

e)
In this case we chose pairwise combinations of predictors in which displacement enters non-linearly, in the forms

cylinders ~ weight + exp(displacement)
year ~ cylinders + sqrt(displacement)
weight ~ year + displacement^3

The residuals versus fitted values plot was selected because it is the best diagnostic for non-linearity. Illustrations 1 and 3 show that the model fit degrades and is not a good fit. Illustration 2 is still somewhat acceptable, although the residuals at the high end of the fitted values deviate substantially from the mean, so that fit is debatable.
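The effect of transforming a predictor, and the way a residuals versus fitted plot reveals non-linearity, can be sketched on synthetic data. In the Python example below the response is constructed to be exactly linear in sqrt(x), so a fit on raw x leaves a curved residual pattern while a fit on sqrt(x) does not. The data, the function name and the choice of the square-root transform are assumptions for illustration only; they are not the models fitted above.

```python
import math

def fit_line(x, y):
    """Least-squares line; returns intercept, slope and the residuals."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [b - (intercept + slope * a) for a, b in zip(x, y)]
    return intercept, slope, residuals

# Synthetic data: y is exactly linear in sqrt(x), not in x
x = [1.0, 4.0, 9.0, 16.0, 25.0, 36.0]
y = [3.0 + 2.0 * math.sqrt(v) for v in x]

# Fit on the raw predictor: the residuals are negative, then positive, then negative
_, _, res_raw = fit_line(x, y)

# Fit on the transformed predictor: the residuals are essentially zero
_, _, res_sqrt = fit_line([math.sqrt(v) for v in x], y)

rss_raw = sum(r * r for r in res_raw)
rss_sqrt = sum(r * r for r in res_sqrt)
```

A systematic sign pattern in `res_raw` is what shows up as curvature in a residuals versus fitted plot; a suitable transformation removes it, while an unsuitable one can make it worse.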
