Model Building with KNN, Naive Bayes and GLM Models in RStudio


MGN801

CA-2

Haobijam Rojesh Singh

Registration no: 11615614
> library(caret)   # needed for createDataPartition(), train(), confusionMatrix()
> heart = read.csv(file.choose(), header = T)
> View(heart)
> heart$sex = as.factor(heart$sex)
> set.seed(1234)
> intrain = createDataPartition(y = heart$sex, p = 0.75, list = F)
> training = heart[intrain, ]
> testing = heart[-intrain, ]
> dim(training)
[1] 228 14
> dim(testing)
[1] 75 14
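The split above is stratified: `createDataPartition` samples within each level of `sex`, so both subsets keep roughly the same class balance. A minimal, self-contained illustration of that behaviour (on simulated labels, since the heart file itself is loaded interactively with `file.choose()` and is not reproducible here):

```r
library(caret)

set.seed(1234)
# Simulated two-class factor standing in for heart$sex
sex <- factor(sample(c("0", "1"), 303, replace = TRUE, prob = c(0.32, 0.68)))

idx <- createDataPartition(y = sex, p = 0.75, list = FALSE)
prop.table(table(sex))        # overall class proportions
prop.table(table(sex[idx]))   # training proportions -- nearly identical
```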

> # Step 3: Model building


> # KNN method
> # NOTE: "method" is misspelled as "metdod" below, so caret ignores the
> # unknown argument and falls back to its default model, random forest --
> # which is why the output that follows reports "Random Forest".
> modelfit = train(sex ~ ., data = training, metdod = "knn")
> modelfit
Random Forest

228 samples
13 predictor
2 classes: '0', '1'

No pre-processing
Resampling: Bootstrapped (25 reps)
Summary of sample sizes: 228, 228, 228, 228, 228, 228, ...
Resampling results across tuning parameters:

  mtry  Accuracy   Kappa
   2    0.7084241  0.2298801
   7    0.7049115  0.2665396
  13    0.6966670  0.2640432

Accuracy was used to select the optimal model using the largest value. The
final value used for the model was mtry = 2.
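Because `method` is misspelled in the call above, caret fitted its default random forest, which is why the output tunes `mtry` rather than `k`. A corrected call that actually fits KNN might look like the sketch below; the `preProcess` and `tuneLength` settings are additions, not from the original session (KNN is distance-based, so centering and scaling usually helps):

```r
# Hypothetical corrected call -- assumes caret is loaded and the
# `training` data frame from the session above exists.
set.seed(1234)
modelfit_knn <- train(
  sex ~ ., data = training,
  method     = "knn",                 # correctly spelled this time
  preProcess = c("center", "scale"),  # scale features for distance-based KNN
  tuneLength = 5                      # try 5 candidate values of k
)
modelfit_knn   # now reports "k-Nearest Neighbors" with a grid over k
```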

> # Step 4: Model validation


> predictions=predict(modelfit, newdata=testing)
> predictions
[1] 1 1 1 1 1 0 1 1 1 1 1 0 1 1 1 0 1 1 1 0 1 1 0 1 1 1 1 1 1 1 0 0 1 0 0 0 1 1 0 1 1 1 1 1 0 1 1
[48] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
Levels: 0 1
> confusionMatrix(predictions,testing$sex)
Confusion Matrix and Statistics

Reference
Prediction 0 1
0 10 2
1 14 49

Accuracy : 0.7867
95% CI : (0.6768, 0.8729)
No Information Rate : 0.68
P-Value [Acc > NIR] : 0.02855

Kappa : 0.435

Mcnemar's Test P-Value : 0.00596

Sensitivity : 0.4167
Specificity : 0.9608
Pos Pred Value : 0.8333
Neg Pred Value : 0.7778
Prevalence : 0.3200
Detection Rate : 0.1333
Detection Prevalence : 0.1600
Balanced Accuracy : 0.6887

'Positive' Class : 0
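The statistics above follow directly from the 2x2 table. With class `0` as the positive class, sensitivity is TP/(TP+FN) down the first reference column, specificity is TN/(TN+FP) down the second, and accuracy is the diagonal over the total. A quick check:

```r
# Confusion matrix values from the output above
# (rows = Prediction, columns = Reference; positive class is "0")
tp <- 10  # predicted 0, reference 0
fp <- 2   # predicted 0, reference 1
fn <- 14  # predicted 1, reference 0
tn <- 49  # predicted 1, reference 1

sensitivity <- tp / (tp + fn)                    # 10/24
specificity <- tn / (tn + fp)                    # 49/51
accuracy    <- (tp + tn) / (tp + fp + fn + tn)   # 59/75
balanced    <- (sensitivity + specificity) / 2

round(c(sensitivity = sensitivity, specificity = specificity,
        accuracy = accuracy, balanced = balanced), 4)
# sensitivity specificity    accuracy    balanced
#      0.4167      0.9608      0.7867      0.6887
```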

> # method "glm"


> modelfit1=train(sex~.,data=training,method="glm")
> modelfit1
Generalized Linear Model

228 samples
13 predictor
2 classes: '0', '1'

No pre-processing
Resampling: Bootstrapped (25 reps)
Summary of sample sizes: 228, 228, 228, 228, 228, 228, ...
Resampling results:

Accuracy Kappa
0.6684758 0.1855419

> predictions1=predict(modelfit1,newdata=testing)
> predictions1
[1] 0 1 1 1 1 0 1 1 0 1 1 0 1 1 1 1 1 1 0 0 1 1 1 1 1 1 0 1 1 1 0 0 0 1 0 0 1 0 0 1 1 1 1 1 1 1 1
[48] 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1
Levels: 0 1
> confusionMatrix(predictions1,testing$sex)
Confusion Matrix and Statistics

Reference
Prediction 0 1
0 9 6
1 15 45

Accuracy : 0.72
95% CI : (0.6044, 0.8176)
No Information Rate : 0.68
P-Value [Acc > NIR] : 0.27122

Kappa : 0.2857

Mcnemar's Test P-Value : 0.08086

Sensitivity : 0.3750
Specificity : 0.8824
Pos Pred Value : 0.6000
Neg Pred Value : 0.7500
Prevalence : 0.3200
Detection Rate : 0.1200
Detection Prevalence : 0.2000
Balanced Accuracy : 0.6287

'Positive' Class : 0
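For a two-class factor outcome, caret's `method = "glm"` fits a binomial GLM, i.e. ordinary logistic regression. An equivalent base-R fit, assuming the `training` and `testing` frames from the session above, would be roughly:

```r
# Base-R logistic regression, the model caret wraps for method = "glm".
logit_fit <- glm(sex ~ ., data = training, family = binomial)

# Predicted probability of class "1", thresholded at 0.5
probs <- predict(logit_fit, newdata = testing, type = "response")
preds <- factor(ifelse(probs > 0.5, "1", "0"), levels = levels(testing$sex))
table(Prediction = preds, Reference = testing$sex)
```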

> # method "nb"


> modelfit2=train(sex~., data=training,method="nb")
There were 50 or more warnings (use warnings() to see the first 50)
> modelfit2
Naive Bayes

228 samples
13 predictor
2 classes: '0', '1'

No pre-processing
Resampling: Bootstrapped (25 reps)
Summary of sample sizes: 228, 228, 228, 228, 228, 228, ...
Resampling results across tuning parameters:

  usekernel  Accuracy   Kappa
      FALSE  0.6802025  0.2814926
       TRUE  0.6789058  0.2374677

Tuning parameter 'fL' was held constant at a value of 0
Tuning parameter 'adjust' was held constant at a value of 1
Accuracy was used to select the optimal model using the largest value.
The final values used for the model were fL = 0, usekernel = FALSE and adjust = 1.
> predictions2=predict(modelfit2,newdata=testing)
There were 50 or more warnings (use warnings() to see the first 50)
> predictions2
[1] 1 0 1 0 1 0 1 1 0 1 1 0 1 1 1 0 1 0 0 0 1 1 0 0 1 0 0 1 1 0 0 0 1 0 0 0 0 1 0 1 1 1 1 1 0 1 1
[48] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
Levels: 0 1
> confusionMatrix(predictions2,testing$sex)
Confusion Matrix and Statistics

Reference
Prediction 0 1
0 16 6
1 8 45

Accuracy : 0.8133
95% CI : (0.7067, 0.894)
No Information Rate : 0.68
P-Value [Acc > NIR] : 0.0073

Kappa : 0.5614

Mcnemar's Test P-Value : 0.7893

Sensitivity : 0.6667
Specificity : 0.8824
Pos Pred Value : 0.7273
Neg Pred Value : 0.8491
Prevalence : 0.3200
Detection Rate : 0.2133
Detection Prevalence : 0.2933
Balanced Accuracy : 0.7745

'Positive' Class : 0

INTERPRETATION
1. From the first model's output (intended as KNN, but fitted as a random
forest because of the misspelled "method" argument), the accuracy is 0.7867,
specificity is 0.9608 and sensitivity is 0.4167.
2. From the NB (naive Bayes) model's output, the accuracy is 0.8133,
specificity is 0.8824 and sensitivity is 0.6667.
3. From the GLM model's output, the accuracy is 0.72, specificity is 0.8824
and sensitivity is 0.3750.

❖ Comparing the three models, the NB model performs best on the test set: it
has both the highest accuracy (0.8133) and the highest sensitivity (0.6667).
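Beyond comparing single test-set numbers, caret can also compare the three fits across their 25 bootstrap resamples, which gives a distribution of accuracies rather than one point estimate. A sketch, assuming the `modelfit`, `modelfit1` and `modelfit2` objects from the session above:

```r
# Collect the bootstrap resamples from each fitted caret model and summarise.
results <- resamples(list(RF = modelfit, GLM = modelfit1, NB = modelfit2))
summary(results)   # per-model Accuracy and Kappa distributions
bwplot(results)    # side-by-side box plots of the same
```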

Reference
https://www.kaggle.com/ronitf/heart-disease-uci
