Cross Validation


VALIDATION SET APPROACH

PROBLEM:
Write an R program to train a model for the Swiss data set using the validation set approach to
cross validation, and predict the variable Fertility.
AIM:
To perform the validation set approach on a given data set.

CODE IN R LANGUAGE:
library(caret)
library(tidyverse)
head(swiss) #load the data
str(swiss) #to get the structure of the data set
set.seed(123)
#to divide the data set into test-train sets
train_set=swiss$Fertility %>% createDataPartition(p=0.8,list = FALSE)
train_data=swiss[train_set,]
test_data=swiss[-train_set,]
model=lm(Fertility~.,data=train_data) #build the model
#make predictions
predictions=model%>%predict(test_data)
predictions
data.frame(R2=R2(predictions,test_data$Fertility),RMSE=RMSE(predictions,test_data$Fertility),MAE=MAE(predictions,test_data$Fertility))

OUTPUT:
> library(caret)
> library(tidyverse)
> head(swiss) #load the data
Fertility Agriculture Examination Education Catholic
Courtelary 80.2 17.0 15 12 9.96
Delemont 83.1 45.1 6 9 84.84
Franches-Mnt 92.5 39.7 5 5 93.40
Moutier 85.8 36.5 12 7 33.77
Neuveville 76.9 43.5 17 15 5.16
Porrentruy 76.1 35.3 9 7 90.57
Infant.Mortality
Courtelary 22.2
Delemont 22.2
Franches-Mnt 20.2
Moutier 20.3
Neuveville 20.6
Porrentruy 26.6
> str(swiss) #to get the structure of the data set
'data.frame': 47 obs. of 6 variables:
$ Fertility : num 80.2 83.1 92.5 85.8 76.9 76.1 83.8 92.4 82.4 82.9 ...
$ Agriculture : num 17 45.1 39.7 36.5 43.5 35.3 70.2 67.8 53.3 45.2 ...
$ Examination : int 15 6 5 12 17 9 16 14 12 16 ...
$ Education : int 12 9 5 7 15 7 7 8 7 13 ...
$ Catholic : num 9.96 84.84 93.4 33.77 5.16 ...
$ Infant.Mortality: num 22.2 22.2 20.2 20.3 20.6 26.6 23.6 24.9 21 24.4 ...
> set.seed(123)
> #to divide the data set into test-train sets
> train_set=swiss$Fertility %>% createDataPartition(p=0.8,list = FALSE)
> train_data=swiss[train_set,]
> test_data=swiss[-train_set,]
> model=lm(Fertility~.,data=train_data) #build the model
> #make predictions
> predictions=model%>%predict(test_data)
> predictions
Glane Sarine Aigle Avenches Payerne Rolle Entremont
79.87755 78.07547 59.22761 64.51491 71.03743 62.55682 76.96442
Martigwy
76.22415
> data.frame(R2=R2(predictions,test_data$Fertility),RMSE=RMSE(predictions,test_data$Fertility),MAE=MAE(predictions,test_data$Fertility))
R2 RMSE MAE
0.5946201 6.410914 5.651552

RESULT:
The validation set approach is used to train a linear regression model on the Swiss data set. The
model has an R-squared value of 0.594 and an RMSE of 6.41.
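
The caret helpers R2, RMSE and MAE used above reduce to one-line formulas. A minimal base-R sketch on toy vectors (not the swiss predictions) shows what each metric computes; note that caret's R2 is the squared correlation between predictions and observations:

```r
pred <- c(2, 4, 6)   # toy predictions, for illustration only
obs  <- c(1, 5, 7)   # toy observed values

rmse <- sqrt(mean((pred - obs)^2))   # root mean squared error
mae  <- mean(abs(pred - obs))        # mean absolute error
r2   <- cor(pred, obs)^2             # squared correlation, as caret computes R2

c(RMSE = rmse, MAE = mae, R2 = r2)   # RMSE and MAE are both 1 here
```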
LEAVE-ONE-OUT CROSS VALIDATION

PROBLEM:
Write an R program to train a model for the Swiss data set by leave-one-out cross validation and
predict the variable Fertility for unseen data.

AIM:
To study the leave-one-out cross validation method to train a model for a given data set.
CODE IN R LANGUAGE:
library(caret)
library(tidyverse)
head(swiss) #load the data
str(swiss) #to get the structure of the data set
set.seed(123)
method=trainControl(method = "LOOCV") #leave-one-out method
model1=train(Fertility~.,data=swiss,method="lm",trControl=method) #train the model
print(model1) #summarise the model
OUTPUT:
> library(caret)
> library(tidyverse)
> head(swiss) #load the data
Fertility Agriculture Examination Education Catholic
Courtelary 80.2 17.0 15 12 9.96
Delemont 83.1 45.1 6 9 84.84
Franches-Mnt 92.5 39.7 5 5 93.40
Moutier 85.8 36.5 12 7 33.77
Neuveville 76.9 43.5 17 15 5.16
Porrentruy 76.1 35.3 9 7 90.57
Infant.Mortality
Courtelary 22.2
Delemont 22.2
Franches-Mnt 20.2
Moutier 20.3
Neuveville 20.6
Porrentruy 26.6
> str(swiss) #to get the structure of the data set
'data.frame': 47 obs. of 6 variables:
$ Fertility : num 80.2 83.1 92.5 85.8 76.9 76.1 83.8 92.4 82.4 82.9 ...
$ Agriculture : num 17 45.1 39.7 36.5 43.5 35.3 70.2 67.8 53.3 45.2 ...
$ Examination : int 15 6 5 12 17 9 16 14 12 16 ...
$ Education : int 12 9 5 7 15 7 7 8 7 13 ...
$ Catholic : num 9.96 84.84 93.4 33.77 5.16 ...
$ Infant.Mortality: num 22.2 22.2 20.2 20.3 20.6 26.6 23.6 24.9 21 24.4 ...
> set.seed(123)
> method=trainControl(method = "LOOCV")
> #TRAIN MODEL
> model1=train(Fertility~.,data=swiss,method="lm",trControl=method)
> #summarize
> print(model1)
Linear Regression

47 samples
5 predictor
No pre-processing
Resampling: Leave-One-Out Cross-Validation
Summary of sample sizes: 46, 46, 46, 46, 46, 46, ...
Resampling results:

RMSE     Rsquared  MAE
7.738618 0.6128307 6.116021

Tuning parameter 'intercept' was held constant at a value of TRUE

RESULT:
Leave-one-out cross validation is used to train a linear regression model on the Swiss data set. The
model has an R-squared value of 0.612 and an RMSE of 7.738.
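
The LOOCV that caret runs can also be written as an explicit loop, which makes the method transparent: each of the 47 rows is held out once, the model is refit on the remaining 46, and the pooled prediction errors give the RMSE. A base-R sketch (the swiss data frame ships with R, so no packages are needed):

```r
n <- nrow(swiss)
errs <- numeric(n)
for (i in 1:n) {
  fit <- lm(Fertility ~ ., data = swiss[-i, ])                  # fit on n - 1 rows
  errs[i] <- swiss$Fertility[i] - predict(fit, swiss[i, , drop = FALSE])
}
sqrt(mean(errs^2))   # pooled LOOCV RMSE, matching caret's value of about 7.74
```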
K-FOLD CROSS VALIDATION

PROBLEM:
Write an R program to train a model for the Swiss data set by repeated k-fold cross validation and
predict the dependent variable.

AIM:
To study the k-fold cross validation method to train a model for a given data set.
CODE IN R LANGUAGE:
library(caret)
library(tidyverse)
head(swiss) #load the data
str(swiss) #to get the structure of the data set
set.seed(123)
method2=trainControl(method = "repeatedcv",number = 10,repeats = 3) #k-fold method
model2=train(Fertility~.,data=swiss,method="lm",trControl=method2) #train the model
print(model2) #Summary of the model

OUTPUT:
> library(caret)
> library(tidyverse)
> head(swiss) #load the data
Fertility Agriculture Examination Education Catholic
Courtelary 80.2 17.0 15 12 9.96
Delemont 83.1 45.1 6 9 84.84
Franches-Mnt 92.5 39.7 5 5 93.40
Moutier 85.8 36.5 12 7 33.77
Neuveville 76.9 43.5 17 15 5.16
Porrentruy 76.1 35.3 9 7 90.57
Infant.Mortality
Courtelary 22.2
Delemont 22.2
Franches-Mnt 20.2
Moutier 20.3
Neuveville 20.6
Porrentruy 26.6
> str(swiss) #to get the structure of the data set
'data.frame': 47 obs. of 6 variables:
$ Fertility : num 80.2 83.1 92.5 85.8 76.9 76.1 83.8 92.4 82.4 82.9 ...
$ Agriculture : num 17 45.1 39.7 36.5 43.5 35.3 70.2 67.8 53.3 45.2 ...
$ Examination : int 15 6 5 12 17 9 16 14 12 16 ...
$ Education : int 12 9 5 7 15 7 7 8 7 13 ...
$ Catholic : num 9.96 84.84 93.4 33.77 5.16 ...
$ Infant.Mortality: num 22.2 22.2 20.2 20.3 20.6 26.6 23.6 24.9 21 24.4 ...
> set.seed(123)
> method2=trainControl(method = "repeatedcv",number = 10,repeats = 3) #k-fold method
> model2=train(Fertility~.,data=swiss,method="lm",trControl=method2) #train the model
> print(model2) #Summary of the model
Linear Regression

47 samples
5 predictor
No pre-processing
Resampling: Cross-Validated (10 fold, repeated 3 times)
Summary of sample sizes: 42, 42, 42, 42, 42, 44, ...
Resampling results:

RMSE     Rsquared  MAE
7.357186 0.6992415 6.15871

Tuning parameter 'intercept' was held constant at a value of TRUE

RESULT:

Repeated 10-fold cross validation is performed to train a linear regression model on the Swiss data set. The
model has an R-squared value of 0.699 and an RMSE of 7.357.
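
A single pass of the 10-fold procedure can be hand-rolled in base R to show what trainControl does on each repeat: rows are randomly assigned to 10 folds, each fold is held out in turn, and the held-out RMSE values are averaged. (caret additionally stratifies the folds and repeats the whole split three times, so its numbers differ slightly from this sketch.)

```r
set.seed(123)
k <- 10
folds <- sample(rep(1:k, length.out = nrow(swiss)))   # random fold labels 1..10
rmse_per_fold <- sapply(1:k, function(f) {
  fit <- lm(Fertility ~ ., data = swiss[folds != f, ])            # train on k-1 folds
  err <- swiss$Fertility[folds == f] - predict(fit, swiss[folds == f, ])
  sqrt(mean(err^2))                                               # held-out RMSE
})
mean(rmse_per_fold)   # cross-validated RMSE for this single 10-fold split
```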
BOOTSTRAPPING
PROBLEM:
Write an R program to perform bootstrapping on the Auto data set and evaluate the model.
AIM:
To perform bootstrapping on the given data set.
CODE IN R LANGUAGE:
library(ISLR) #the Auto data set comes from ISLR
library(boot) #for the boot() function
attach(Auto)
statistic <- function(data, index) {
lm.fit <- lm(mpg ~ horsepower, data = data, subset = index)
coef(lm.fit)
}

statistic(Auto, 1:392)
summary(lm(mpg ~ horsepower, data = Auto))
set.seed(123)
boot(Auto, statistic, 1000)
quad.statistic <- function(data, index) {
lm.fit <- lm(mpg ~ poly(horsepower, 2), data = data, subset = index)
coef(lm.fit)
}

set.seed(1)
boot(Auto, quad.statistic, 1000)
summary(lm(mpg ~ poly(horsepower, 2), data = Auto))
OUTPUT:
> library(ISLR)
> library(boot)
> attach(Auto)
> statistic <- function(data, index) {
+ lm.fit <- lm(mpg ~ horsepower, data = data, subset = index)
+ coef(lm.fit)
+}
> statistic(Auto, 1:392)
(Intercept) horsepower
39.9358610 -0.1578447
> summary(lm(mpg ~ horsepower, data = Auto))

Call:
lm(formula = mpg ~ horsepower, data = Auto)

Residuals:
Min 1Q Median 3Q Max
-13.5710 -3.2592 -0.3435 2.7630 16.9240

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 39.935861 0.717499 55.66 <2e-16 ***
horsepower -0.157845 0.006446 -24.49 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 4.906 on 390 degrees of freedom


Multiple R-squared: 0.6059, Adjusted R-squared: 0.6049
F-statistic: 599.7 on 1 and 390 DF, p-value: < 2.2e-16

> set.seed(123)
> boot(Auto, statistic, 1000)

ORDINARY NONPARAMETRIC BOOTSTRAP

Call:
boot(data = Auto, statistic = statistic, R = 1000)

Bootstrap Statistics :
original bias std. error
t1* 39.9358610 0.0156469811 0.845583773
t2* -0.1578447 -0.0001803022 0.007393556
> quad.statistic <- function(data, index) {
+ lm.fit <- lm(mpg ~ poly(horsepower, 2), data = data, subset = index)
+ coef(lm.fit)
+}
> set.seed(1)
> boot(Auto, quad.statistic, 1000)

ORDINARY NONPARAMETRIC BOOTSTRAP

Call:
boot(data = Auto, statistic = quad.statistic, R = 1000)

Bootstrap Statistics :
original bias std. error
t1* 23.44592 -0.003660358 0.2195369
t2* -120.13774 0.002769239 3.6138046
t3* 44.08953 0.101767465 4.1998076
> summary(lm(mpg ~ poly(horsepower, 2), data = Auto))

Call:
lm(formula = mpg ~ poly(horsepower, 2), data = Auto)

Residuals:
Min 1Q Median 3Q Max
-14.7135 -2.5943 -0.0859 2.2868 15.8961

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 23.4459 0.2209 106.13 <2e-16 ***
poly(horsepower, 2)1 -120.1377 4.3739 -27.47 <2e-16 ***
poly(horsepower, 2)2 44.0895 4.3739 10.08 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 4.374 on 389 degrees of freedom


Multiple R-squared: 0.6876, Adjusted R-squared: 0.686
F-statistic: 428 on 2 and 389 DF, p-value: < 2.2e-16

RESULT:
A regression model predicting the variable ‘mpg’ is fitted on the built-in data set Auto, and the
bootstrap is used to estimate the standard errors of its coefficients.
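
The boot() call above hides a simple loop: draw row indices with replacement, refit, and take the standard deviation of the refitted coefficients. A package-free sketch of the same idea, run here on the built-in mtcars data so it needs no extra packages (the Auto data frame comes from ISLR):

```r
set.seed(123)
B <- 1000
slopes <- replicate(B, {
  idx <- sample(nrow(mtcars), replace = TRUE)        # bootstrap sample of row indices
  coef(lm(mpg ~ hp, data = mtcars[idx, ]))["hp"]     # slope refit on the resample
})
sd(slopes)   # bootstrap estimate of the slope's standard error
```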
BEST SUBSET SELECTION

PROBLEM:
Write an R program to fit a regression model for the Hitters data set using the best subset selection
method to predict the variable Salary.
AIM:
To use the best subset selection method to fit the best regression model for the given dataset.
CODE IN R LANGUAGE:
library(ISLR)
names(Hitters) # to see variables names
dim(Hitters )
sum(is.na(Hitters$Salary)) #to calculate the no. of missing observations
Hitters =na.omit(Hitters) # to remove all rows that have missing value
dim(Hitters )
library(leaps)
#to perform best subset selection on the data set
regfit.full=regsubsets(Salary~.,Hitters)
summary(regfit.full) #summary of the model
regfit.full=regsubsets (Salary~.,data=Hitters ,nvmax=19)
reg.summary =summary (regfit.full)
names(reg.summary)
reg.summary$rsq
reg.summary$rss
reg.summary$adjr2
OUTPUT:
> library(ISLR)
> names(Hitters) # to see variables names
[1] "AtBat" "Hits" "HmRun" "Runs" "RBI" "Walks"
[7] "Years" "CAtBat" "CHits" "CHmRun" "CRuns" "CRBI"
[13] "CWalks" "League" "Division" "PutOuts" "Assists" "Errors"
[19] "Salary" "NewLeague"
> dim(Hitters )
[1] 263 20
> sum(is.na(Hitters$Salary)) #to calculate the no. of missing observations
[1] 0
> Hitters =na.omit(Hitters) # to remove all rows that have missing value
> dim(Hitters )
[1] 263 20
> library(leaps)
> #to perform best subset selection on the data set
> regfit.full=regsubsets(Salary~.,Hitters)
> summary(regfit.full) #summary of the model
Subset selection object
Call: regsubsets.formula(Salary ~ ., Hitters)
19 Variables (and intercept)
Forced in Forced out
AtBat FALSE FALSE
Hits FALSE FALSE
HmRun FALSE FALSE
Runs FALSE FALSE
RBI FALSE FALSE
Walks FALSE FALSE
Years FALSE FALSE
CAtBat FALSE FALSE
CHits FALSE FALSE
CHmRun FALSE FALSE
CRuns FALSE FALSE
CRBI FALSE FALSE
CWalks FALSE FALSE
LeagueN FALSE FALSE
DivisionW FALSE FALSE
PutOuts FALSE FALSE
Assists FALSE FALSE
Errors FALSE FALSE
NewLeagueN FALSE FALSE
1 subsets of each size up to 8
Selection Algorithm: exhaustive
AtBat Hits HmRun Runs RBI Walks Years CAtBat CHits CHmRun CRuns CRBI
1 ( 1 ) " " " " " " " " " " " " " " " " " " " " " " "*"
2 ( 1 ) " " "*" " " " " " " " " " " " " " " " " " " "*"
3 ( 1 ) " " "*" " " " " " " " " " " " " " " " " " " "*"
4 ( 1 ) " " "*" " " " " " " " " " " " " " " " " " " "*"
5 ( 1 ) "*" "*" " " " " " " " " " " " " " " " " " " "*"
6 ( 1 ) "*" "*" " " " " " " "*" " " " " " " " " " " "*"
7 ( 1 ) " " "*" " " " " " " "*" " " "*" "*" "*" " " " "
8 ( 1 ) "*" "*" " " " " " " "*" " " " " " " "*" "*" " "
CWalks LeagueN DivisionW PutOuts Assists Errors NewLeagueN
1 ( 1 ) " " " " " " " " " " " " " "
2 ( 1 ) " " " " " " " " " " " " " "
3 ( 1 ) " " " " " " "*" " " " " " "
4 ( 1 ) " " " " "*" "*" " " " " " "
5 ( 1 ) " " " " "*" "*" " " " " " "
6 ( 1 ) " " " " "*" "*" " " " " " "
7 ( 1 ) " " " " "*" "*" " " " " " "
8 ( 1 ) "*" " " "*" "*" " " " " " "
> regfit.full=regsubsets (Salary~.,data=Hitters ,nvmax=19)
> reg.summary =summary (regfit.full)
> names(reg.summary)
[1] "which" "rsq" "rss" "adjr2" "cp" "bic" "outmat" "obj"
> reg.summary$rsq
[1] 0.3214501 0.4252237 0.4514294 0.4754067 0.4908036 0.5087146 0.5141227
[8] 0.5285569 0.5346124 0.5404950 0.5426153 0.5436302 0.5444570 0.5452164
[15] 0.5454692 0.5457656 0.5459518 0.5460945 0.5461159
> reg.summary$rss
[1] 36179679 30646560 29249297 27970852 27149899 26194904 25906548 25136930
[9] 24814051 24500402 24387345 24333232 24289148 24248660 24235177 24219377
[17] 24209447 24201837 24200700
> reg.summary$adjr2
[1] 0.3188503 0.4208024 0.4450753 0.4672734 0.4808971 0.4972001 0.5007849
[8] 0.5137083 0.5180572 0.5222606 0.5225706 0.5217245 0.5206736 0.5195431
[15] 0.5178661 0.5162219 0.5144464 0.5126097 0.5106270

RESULT:
The best-fit linear regression model is selected for the given data set, with a maximum R² value of
0.546.
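
Because RSS always falls and R² always rises as predictors are added, model size is chosen with a penalized criterion. Adjusted R² is computed as 1 - (1 - R²)(n - 1)/(n - p - 1); recomputing it for the 11-variable model (n = 263 rows, R² = 0.5426153) reproduces the peak value in reg.summary$adjr2:

```r
# Adjusted R-squared penalizes model size, unlike plain R-squared.
adj_r2 <- function(r2, n, p) 1 - (1 - r2) * (n - 1) / (n - p - 1)
adj_r2(0.5426153, n = 263, p = 11)   # ~0.5225706, the maximum of reg.summary$adjr2
```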
FORWARD STEPWISE SELECTION

PROBLEM:
Write an R program to fit a regression model for the Hitters data set using the forward stepwise
selection method.
AIM:
To use the forward stepwise selection method to fit the best regression model for the given dataset.
CODE IN R LANGUAGE:
library(ISLR)
names(Hitters) # to see variables names
dim(Hitters )
sum(is.na(Hitters$Salary)) #to calculate the no. of missing observations
Hitters =na.omit(Hitters) # to remove all rows that have missing value
dim(Hitters )
library(leaps)
regfit.fwd = regsubsets(Salary~., data = Hitters, nvmax = 19, method = "forward")
summary(regfit.fwd)
coef(regfit.fwd,which.min(summary(regfit.fwd)$cp))
OUTPUT:
> library(ISLR)
> names(Hitters) # to see variables names
[1] "AtBat" "Hits" "HmRun" "Runs" "RBI" "Walks"
[7] "Years" "CAtBat" "CHits" "CHmRun" "CRuns" "CRBI"
[13] "CWalks" "League" "Division" "PutOuts" "Assists" "Errors"
[19] "Salary" "NewLeague"
> dim(Hitters )
[1] 263 20
> sum(is.na(Hitters$Salary)) #to calculate the no. of missing observations
[1] 0
> Hitters =na.omit(Hitters) # to remove all rows that have missing value
> dim(Hitters )
[1] 263 20
> library(leaps)
> regfit.fwd = regsubsets(Salary~., data = Hitters, nvmax = 19, method = "forward")
> summary(regfit.fwd)
Subset selection object
Call: regsubsets.formula(Salary ~ ., data = Hitters, nvmax = 19, method = "forward")
19 Variables (and intercept)
Forced in Forced out
AtBat FALSE FALSE
Hits FALSE FALSE
HmRun FALSE FALSE
Runs FALSE FALSE
RBI FALSE FALSE
Walks FALSE FALSE
Years FALSE FALSE
CAtBat FALSE FALSE
CHits FALSE FALSE
CHmRun FALSE FALSE
CRuns FALSE FALSE
CRBI FALSE FALSE
CWalks FALSE FALSE
LeagueN FALSE FALSE
DivisionW FALSE FALSE
PutOuts FALSE FALSE
Assists FALSE FALSE
Errors FALSE FALSE
NewLeagueN FALSE FALSE
1 subsets of each size up to 19
Selection Algorithm: forward
AtBat Hits HmRun Runs RBI Walks Years CAtBat CHits CHmRun CRuns
1 ( 1 ) " " " " " " " " " " " " " " " " " " " " " "
2 ( 1 ) " " "*" " " " " " " " " " " " " " " " " " "
3 ( 1 ) " " "*" " " " " " " " " " " " " " " " " " "
4 ( 1 ) " " "*" " " " " " " " " " " " " " " " " " "
5 ( 1 ) "*" "*" " " " " " " " " " " " " " " " " " "
6 ( 1 ) "*" "*" " " " " " " "*" " " " " " " " " " "
7 ( 1 ) "*" "*" " " " " " " "*" " " " " " " " " " "
8 ( 1 ) "*" "*" " " " " " " "*" " " " " " " " " "*"
9 ( 1 ) "*" "*" " " " " " " "*" " " "*" " " " " "*"
10 ( 1 ) "*" "*" " " " " " " "*" " " "*" " " " " "*"
11 ( 1 ) "*" "*" " " " " " " "*" " " "*" " " " " "*"
12 ( 1 ) "*" "*" " " "*" " " "*" " " "*" " " " " "*"
13 ( 1 ) "*" "*" " " "*" " " "*" " " "*" " " " " "*"
14 ( 1 ) "*" "*" "*" "*" " " "*" " " "*" " " " " "*"
15 ( 1 ) "*" "*" "*" "*" " " "*" " " "*" "*" " " "*"
16 ( 1 ) "*" "*" "*" "*" "*" "*" " " "*" "*" " " "*"
17 ( 1 ) "*" "*" "*" "*" "*" "*" " " "*" "*" " " "*"
18 ( 1 ) "*" "*" "*" "*" "*" "*" "*" "*" "*" " " "*"
19 ( 1 ) "*" "*" "*" "*" "*" "*" "*" "*" "*" "*" "*"
CRBI CWalks LeagueN DivisionW PutOuts Assists Errors NewLeagueN
1 ( 1 ) "*" " " " " " " " " " " " " " "
2 ( 1 ) "*" " " " " " " " " " " " " " "
3 ( 1 ) "*" " " " " " " "*" " " " " " "
4 ( 1 ) "*" " " " " "*" "*" " " " " " "
5 ( 1 ) "*" " " " " "*" "*" " " " " " "
6 ( 1 ) "*" " " " " "*" "*" " " " " " "
7 ( 1 ) "*" "*" " " "*" "*" " " " " " "
8 ( 1 ) "*" "*" " " "*" "*" " " " " " "
9 ( 1 ) "*" "*" " " "*" "*" " " " " " "
10 ( 1 ) "*" "*" " " "*" "*" "*" " " " "
11 ( 1 ) "*" "*" "*" "*" "*" "*" " " " "
12 ( 1 ) "*" "*" "*" "*" "*" "*" " " " "
13 ( 1 ) "*" "*" "*" "*" "*" "*" "*" " "
14 ( 1 ) "*" "*" "*" "*" "*" "*" "*" " "
15 ( 1 ) "*" "*" "*" "*" "*" "*" "*" " "
16 ( 1 ) "*" "*" "*" "*" "*" "*" "*" " "
17 ( 1 ) "*" "*" "*" "*" "*" "*" "*" "*"
18 ( 1 ) "*" "*" "*" "*" "*" "*" "*" "*"
19 ( 1 ) "*" "*" "*" "*" "*" "*" "*" "*"
> coef(regfit.fwd,which.min(summary(regfit.fwd)$cp))
(Intercept) AtBat Hits Walks CAtBat CRuns
162.5354420 -2.1686501 6.9180175 5.7732246 -0.1300798 1.4082490
CRBI CWalks DivisionW PutOuts Assists
0.7743122 -0.8308264 -112.3800575 0.2973726 0.2831680

RESULT:
The best-fit regression model is built on the data set using forward stepwise selection to predict the variable Salary.
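
Base R's step() offers an AIC-based variant of the same greedy idea without the leaps package; as a small self-contained illustration, forward selection on the built-in swiss data grows the model one term at a time from the intercept-only fit:

```r
null_fit <- lm(Fertility ~ 1, data = swiss)   # start from intercept only
full_fit <- lm(Fertility ~ ., data = swiss)   # largest model considered
fwd <- step(null_fit, scope = formula(full_fit), direction = "forward", trace = 0)
names(coef(fwd))   # predictors added one at a time, in order of selection
```

Note that step() ranks candidate terms by AIC at each stage, whereas regsubsets() with method = "forward" ranks them by RSS, so the two can disagree.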
BACKWARD STEPWISE SELECTION

PROBLEM:
Write an R program to fit a regression model for the Hitters data set using the backward stepwise
selection method.
AIM:
To use the backward stepwise selection method to fit the best regression model for the given dataset.
CODE IN R LANGUAGE:
library(ISLR)
names(Hitters) # to see variables names
dim(Hitters )
sum(is.na(Hitters$Salary)) #to calculate the no. of missing observations
Hitters =na.omit(Hitters) # to remove all rows that have missing value
dim(Hitters )
library(leaps)
#to fit a regression model
regfit.bwd = regsubsets(Salary~., data = Hitters, nvmax = 19, method = "backward")
summary(regfit.bwd)
coef(regfit.bwd,which.min(summary(regfit.bwd)$cp))
OUTPUT:
> library(ISLR)
> names(Hitters) # to see variables names
[1] "AtBat" "Hits" "HmRun" "Runs" "RBI" "Walks"
[7] "Years" "CAtBat" "CHits" "CHmRun" "CRuns" "CRBI"
[13] "CWalks" "League" "Division" "PutOuts" "Assists" "Errors"
[19] "Salary" "NewLeague"
> dim(Hitters )
[1] 263 20
> sum(is.na(Hitters$Salary)) #to calculate the no. of missing observations
[1] 0
> Hitters =na.omit(Hitters) # to remove all rows that have missing value
> dim(Hitters )
[1] 263 20
> library(leaps)
> regfit.bwd = regsubsets(Salary~., data = Hitters, nvmax = 19, method = "backward")
> summary(regfit.bwd)
Subset selection object
Call: regsubsets.formula(Salary ~ ., data = Hitters, nvmax = 19, method = "backward")
19 Variables (and intercept)
Forced in Forced out
AtBat FALSE FALSE
Hits FALSE FALSE
HmRun FALSE FALSE
Runs FALSE FALSE
RBI FALSE FALSE
Walks FALSE FALSE
Years FALSE FALSE
CAtBat FALSE FALSE
CHits FALSE FALSE
CHmRun FALSE FALSE
CRuns FALSE FALSE
CRBI FALSE FALSE
CWalks FALSE FALSE
LeagueN FALSE FALSE
DivisionW FALSE FALSE
PutOuts FALSE FALSE
Assists FALSE FALSE
Errors FALSE FALSE
NewLeagueN FALSE FALSE
1 subsets of each size up to 19
Selection Algorithm: backward
AtBat Hits HmRun Runs RBI Walks Years CAtBat CHits CHmRun CRuns
1 ( 1 ) " " " " " " " " " " " " " " " " " " " " "*"
2 ( 1 ) " " "*" " " " " " " " " " " " " " " " " "*"
3 ( 1 ) " " "*" " " " " " " " " " " " " " " " " "*"
4 ( 1 ) "*" "*" " " " " " " " " " " " " " " " " "*"
5 ( 1 ) "*" "*" " " " " " " "*" " " " " " " " " "*"
6 ( 1 ) "*" "*" " " " " " " "*" " " " " " " " " "*"
7 ( 1 ) "*" "*" " " " " " " "*" " " " " " " " " "*"
8 ( 1 ) "*" "*" " " " " " " "*" " " " " " " " " "*"
9 ( 1 ) "*" "*" " " " " " " "*" " " "*" " " " " "*"
10 ( 1 ) "*" "*" " " " " " " "*" " " "*" " " " " "*"
11 ( 1 ) "*" "*" " " " " " " "*" " " "*" " " " " "*"
12 ( 1 ) "*" "*" " " "*" " " "*" " " "*" " " " " "*"
13 ( 1 ) "*" "*" " " "*" " " "*" " " "*" " " " " "*"
14 ( 1 ) "*" "*" "*" "*" " " "*" " " "*" " " " " "*"
15 ( 1 ) "*" "*" "*" "*" " " "*" " " "*" "*" " " "*"
16 ( 1 ) "*" "*" "*" "*" "*" "*" " " "*" "*" " " "*"
17 ( 1 ) "*" "*" "*" "*" "*" "*" " " "*" "*" " " "*"
18 ( 1 ) "*" "*" "*" "*" "*" "*" "*" "*" "*" " " "*"
19 ( 1 ) "*" "*" "*" "*" "*" "*" "*" "*" "*" "*" "*"
CRBI CWalks LeagueN DivisionW PutOuts Assists Errors NewLeagueN
1 ( 1 ) " " " " " " " " " " " " " " " "
2 ( 1 ) " " " " " " " " " " " " " " " "
3 ( 1 ) " " " " " " " " "*" " " " " " "
4 ( 1 ) " " " " " " " " "*" " " " " " "
5 ( 1 ) " " " " " " " " "*" " " " " " "
6 ( 1 ) " " " " " " "*" "*" " " " " " "
7 ( 1 ) " " "*" " " "*" "*" " " " " " "
8 ( 1 ) "*" "*" " " "*" "*" " " " " " "
9 ( 1 ) "*" "*" " " "*" "*" " " " " " "
10 ( 1 ) "*" "*" " " "*" "*" "*" " " " "
11 ( 1 ) "*" "*" "*" "*" "*" "*" " " " "
12 ( 1 ) "*" "*" "*" "*" "*" "*" " " " "
13 ( 1 ) "*" "*" "*" "*" "*" "*" "*" " "
14 ( 1 ) "*" "*" "*" "*" "*" "*" "*" " "
15 ( 1 ) "*" "*" "*" "*" "*" "*" "*" " "
16 ( 1 ) "*" "*" "*" "*" "*" "*" "*" " "
17 ( 1 ) "*" "*" "*" "*" "*" "*" "*" "*"
18 ( 1 ) "*" "*" "*" "*" "*" "*" "*" "*"
19 ( 1 ) "*" "*" "*" "*" "*" "*" "*" "*"
> coef(regfit.bwd,which.min(summary(regfit.bwd)$cp))
(Intercept) AtBat Hits Walks CAtBat CRuns
162.5354420 -2.1686501 6.9180175 5.7732246 -0.1300798 1.4082490
CRBI CWalks DivisionW PutOuts Assists
0.7743122 -0.8308264 -112.3800575 0.2973726 0.2831680

RESULT:
The best-fit regression model is built on the data set using backward stepwise selection to predict the
variable Salary.
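
For comparison, base R's step() runs an AIC-based backward elimination without the leaps package: start from the full model and drop the term whose removal lowers AIC most, stopping when no removal helps. On the built-in swiss data this gives a compact illustration of the same idea:

```r
full_fit <- lm(Fertility ~ ., data = swiss)
bwd <- step(full_fit, direction = "backward", trace = 0)   # drop terms while AIC improves
formula(bwd)   # Examination is the term eliminated; the other four predictors remain
```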
