
Jahangirnagar University

Subject: Data Mining


Subject Code: PMIT-6307

Submitted by
Md. Naeem Islam
ID: 211139
Batch: PMIT 22

Submitted to
Prof. Md. Fazlul Karim Patwary
Professor
Institute of Information Technology
Jahangirnagar University

ASSIGNMENT-3
Lab Work

1. Use IRIS Data set

2. Construct Classification Tree and Accuracy

3. Repeat (1 & 2) for different random training data and report the accuracy

4. Repeat (1, 2, 3) for KNN, SVM, and Bayesian classification

5. Submit with codes and results. It will be good if you explain/interpret the results.

1. Decision Tree Classification using the IRIS dataset in R.

We use RStudio to complete this lab work. The following commands are needed to achieve our goals.

> iris - To show the IRIS dataset (it ships with R in the built-in datasets package, so no download is needed).

Creating Training Data & Test Data.


> Xi = sample(2, nrow(iris), replace=TRUE, prob=c(0.8,0.2)) - To create an index vector Xi that assigns each row to group 1 (probability 0.8) or group 2 (probability 0.2).
> table(Xi) - To show how many rows fall into each group.
> train = iris[Xi==1,] - To store the training data in the train variable.
> train - To show the training data.
> test = iris[Xi==2,] - To store the test data in the test variable.
> test - To show the test data.
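Because sample(2, ...) assigns every row independently, the size of the training set changes from run to run and is only about 80% on average. If a fixed-size, reproducible split is preferred, one possible sketch is shown below. The seed value 42 and the names train_idx, train_fixed and test_fixed are illustrative assumptions, not part of the lab work.

```r
# Reproducible split with an exact 80/20 size (names and seed are illustrative)
set.seed(42)                                   # assumed seed; any fixed value works
n <- nrow(iris)                                # iris has 150 rows
train_idx <- sample(n, size = round(0.8 * n))  # 120 row numbers, drawn without replacement
train_fixed <- iris[train_idx, ]               # training rows
test_fixed  <- iris[-train_idx, ]              # all remaining rows form the test set
c(train = nrow(train_fixed), test = nrow(test_fixed))
```

With this variant the two sets never overlap and always contain 120 and 30 rows.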

Classification Model & Accuracy.


> library(rpart) - To load the rpart library, which provides decision trees.
> dtreeC = rpart(Species~Sepal.Length+Sepal.Width+Petal.Length+Petal.Width, data=train, method="class") - To create a classification model called dtreeC using the training data.
> pred = predict(dtreeC, test, type="class") - To store the predicted classes for the test data in the pred variable.
> table(pred, test[,5]) - To show predicted versus actual species as a table.
> tab = table(pred, test[,5]) - To store this table in tab (a confusion matrix).
> diag(tab) - To show the diagonal entries, which are the correct predictions per species.
> sum(diag(tab)) - To see the total number of correct predictions.
> sum(diag(tab))/nrow(test) - To see the accuracy.
> 1-sum(diag(tab))/nrow(test) - To see the error rate.
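Task 3 asks for the same steps to be repeated on different random training data. A minimal sketch of how that could be automated is given below. The helper name tree_accuracy and the seeds 1 to 10 are assumptions made for illustration; the split and the model follow the commands above.

```r
library(rpart)

# Hypothetical helper: one random split, one tree, one accuracy value
tree_accuracy <- function(seed) {
  set.seed(seed)
  idx   <- sample(2, nrow(iris), replace = TRUE, prob = c(0.8, 0.2))
  train <- iris[idx == 1, ]
  test  <- iris[idx == 2, ]
  fit   <- rpart(Species ~ ., data = train, method = "class")
  pred  <- predict(fit, test, type = "class")
  mean(pred == test$Species)   # same value as sum(diag(tab)) / nrow(test)
}

acc <- sapply(1:10, tree_accuracy)   # ten different random splits
round(acc, 3)                        # accuracy of each run
c(mean = mean(acc), sd = sd(acc))    # average accuracy and its spread
```

Reporting the mean together with the spread shows how strongly a single accuracy figure depends on the particular random split.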
Plot or Visualization.
> plot(dtreeC) - To plot the tree structure of the model.
> text(dtreeC) - To add the split labels to the plot.
For a more informative plot, load the rpart.plot package:
> library(rpart.plot) - To load the rpart.plot package.
> rpart.plot(dtreeC) - To show a graphical view of the model.

Screenshot of lab work


KNN
1. KNN Classification using the IRIS dataset in R.

KNN works well here because the dimensionality is low: the dataset has only four predictor features.

> iris - To show the dataset.

Creating Training Data & Test Data.


> Species = factor(iris$Species) - To make sure the target variable is a factor (iris$Species already is one, so this is a safeguard).
> library(caTools) - The caTools library provides sample.split for splitting data.
> set.seed(100) - To fix the random seed so that the split is reproducible.
> Split = sample.split(Species, SplitRatio = 0.8) - To split the dataset in the ratio 8:2 while preserving the class proportions.
> Training = subset(iris, Split==TRUE) - To create the training set.
> Testing = subset(iris, Split==FALSE) - To create the test set.

Classification Model & Accuracy.


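Feature scaling standardizes each predictor to mean 0 and standard deviation 1, so that no single feature dominates the distance calculation. The label column (column 5) is removed before scaling.

> Train = scale(Training[-5]) - To standardize the training predictors.
> Test = scale(Testing[-5]) - To standardize the test predictors. Note that this uses the test set's own means and standard deviations; reusing the training-set values is the more common practice.
> library(class) - To load the 'knn' classifier.
> Prediction = knn(train = Train, test = Test, cl = Training[,5], k = 5, prob = TRUE) - To predict with 'knn'. The cl argument supplies the training labels as a factor. k is set to 5, a common starting value; a very small k tends to overfit, while a very large k oversmooths.
> table(Testing[,5], Prediction) - To show the confusion matrix.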
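The commands above stop at the confusion matrix and use a single value of k. The sketch below shows one way to turn predictions into an accuracy figure, compare several values of k, and standardize the test set with the training set's statistics. It uses a plain base R split instead of caTools so that it runs without extra packages; the names train_s, test_s, ks and knn_acc are illustrative assumptions.

```r
library(class)

set.seed(100)
idx <- sample(nrow(iris), size = 120)        # simple 80/20 split in base R
train_x <- iris[idx, -5];  train_y <- iris$Species[idx]
test_x  <- iris[-idx, -5]; test_y  <- iris$Species[-idx]

# Standardize with training statistics only, then apply them to the test set
train_s <- scale(train_x)
test_s  <- scale(test_x,
                 center = attr(train_s, "scaled:center"),
                 scale  = attr(train_s, "scaled:scale"))

# Accuracy on the test set for a few candidate values of k
ks <- c(1, 3, 5, 7, 9)
knn_acc <- sapply(ks, function(k) {
  pred <- knn(train = train_s, test = test_s, cl = train_y, k = k)
  mean(pred == test_y)                       # share of correct predictions
})
names(knn_acc) <- ks
knn_acc
```

Choosing k on the test set itself would bias the reported accuracy, so in a fuller analysis the comparison would be done by cross-validation on the training data.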

Plot or Visualization.
> library(gmodels) - To load gmodels, which provides CrossTable.
> CrossTable(x = Testing$Species, y = Prediction, prop.chisq=FALSE) - To produce a cross-tabulation of actual versus predicted classes (excluding the chi-square contribution of each cell).
Screenshot of Lab Work
SVM
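1. SVM (Support Vector Machine) Classification using the IRIS dataset in R.

The following packages must be installed and loaded to complete this part.

> library(e1071) - To load the svm function.
> library(GGally) - To load ggpairs for the pairwise plot.
> library(ggplot2) - To load the plotting system used by GGally.

> iris - To show the IRIS dataset.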

Creating SVM Model


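> svm_model <- svm(Species ~ ., data=iris, kernel="radial") - To create the SVM model with a radial kernel, trained on the full dataset.

Classification Model & Accuracy.

> pred = predict(svm_model, iris) - To predict the species for every row.
> tab = table(Predicted = pred, Actual = iris$Species) - To build the confusion matrix.
> tab - To show the confusion matrix.
> sum(diag(tab))/sum(tab) - To see the accuracy. Because the model is evaluated on the same data it was trained on, this is the training accuracy and is likely to be optimistic.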
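To make the SVM result comparable with the decision tree and KNN results, the model can be fitted on a training subset and scored on held-out rows. The following is a sketch under assumptions: the 120/30 split, the seed and the names svm_train, svm_test, svm_fit and svm_acc are illustrative and do not come from the lab work.

```r
library(e1071)

set.seed(123)
idx <- sample(nrow(iris), size = 120)   # 80% of the rows for training
svm_train <- iris[idx, ]
svm_test  <- iris[-idx, ]

# Fit on the training rows only, then predict the unseen rows
svm_fit  <- svm(Species ~ ., data = svm_train, kernel = "radial")
svm_pred <- predict(svm_fit, svm_test)

svm_tab <- table(Predicted = svm_pred, Actual = svm_test$Species)
svm_tab
svm_acc <- sum(diag(svm_tab)) / sum(svm_tab)   # held-out accuracy
svm_acc
```

Changing the seed and rerunning gives the "different random training data" comparison that the assignment asks for.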

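Plot or Visualization.
> ggpairs(iris, ggplot2::aes(colour = Species, alpha = 0.4)) - To show pairwise scatter plots of all variables, coloured by species.
> plot(svm_model, data=iris, Petal.Width~Petal.Length, slice = list(Sepal.Width=3, Sepal.Length=4)) - To plot the decision regions over the two petal variables, with the sepal variables held at fixed values.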

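Parameter Tuning
Tuning searches a grid of parameter values by cross-validation and helps to select the best model.
> set.seed(123) - To make the cross-validation folds reproducible.
> tmodel = tune(svm, Species~., data=iris, ranges=list(epsilon = seq(0,1,0.1), cost = 2^(2:7))) - To evaluate every combination of epsilon and cost. Note that epsilon belongs to the regression loss and has no effect on a classification SVM, so only cost changes the results here.
> plot(tmodel) - To plot the error over the parameter grid.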
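For a radial-kernel classifier, the parameters that usually matter are gamma and cost. A possible alternative grid is sketched below; the grid values and the names tuned and best_svm are assumptions chosen for illustration, not settings taken from the lab work.

```r
library(e1071)

set.seed(123)
# Grid search over gamma and cost using tune()'s default 10-fold cross-validation
tuned <- tune(svm, Species ~ ., data = iris,
              ranges = list(gamma = 10^(-3:0), cost = 2^(0:4)))

tuned$best.parameters    # gamma and cost with the lowest cross-validated error
tuned$best.performance   # the corresponding error rate
best_svm <- tuned$best.model   # model refitted with the best parameters
```

The object best_svm can be passed to predict() in the same way as svm_model.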
Screenshot of Lab Work
NAIVE BAYES
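1. Naive Bayes Classification using the IRIS dataset in R.

> iris - To show the IRIS dataset.

The caret package must be installed and loaded:
> install.packages("caret") - To install caret (needed only once).
> library(caret) - To load caret.
> library(lattice) - To load lattice, which caret depends on.
> library(ggplot2) - To load ggplot2, which caret depends on.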

Creating Training Data & Test Data.


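> indxTrain <- createDataPartition(y = iris$Species, p = 0.75, list = FALSE) - To draw a stratified sample of 75% of the row indices.
> training <- iris[indxTrain,] - To create the training set.
> testing <- iris[-indxTrain,] - To create the test set.
> prop.table(table(iris$Species)) * 100 - To show the percentage of each species in the full dataset.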

Classification Model & Accuracy.


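Create the object x, which holds the predictor variables, and y, which holds the response variable.
> x = training[,-5] - To drop column 5 (Species) so that x holds only the four predictors.
> y = training$Species - To store the species labels of the training set.
> library(e1071) - To load e1071.
> model = train(x, y, 'nb', trControl=trainControl(method='cv', number=10)) - To train a Naive Bayes model with 10-fold cross-validation.
> model - To show the cross-validation results.
> Predict <- predict(model, newdata = testing) - To predict the species of the test data.
> Predict - To show the predictions.
> confusionMatrix(Predict, testing$Species) - To get the confusion matrix with the accuracy and other performance measures.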
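As a cross-check that does not depend on caret, a Naive Bayes model can also be fitted directly with naiveBayes from e1071. This is a swapped-in technique, not the method used above: it uses a simple random 75% split rather than a stratified one, and the seed and the names nb_train, nb_test, nb_fit and nb_acc are illustrative assumptions.

```r
library(e1071)

set.seed(1)
idx <- sample(nrow(iris), size = round(0.75 * nrow(iris)))  # about 75% for training
nb_train <- iris[idx, ]
nb_test  <- iris[-idx, ]

# Gaussian Naive Bayes: one normal distribution per feature and class
nb_fit <- naiveBayes(Species ~ ., data = nb_train)
nb_fit$tables$Petal.Length        # per-class mean and sd of petal length

nb_pred <- predict(nb_fit, nb_test)
nb_tab  <- table(Predicted = nb_pred, Actual = nb_test$Species)
nb_tab
nb_acc <- mean(nb_pred == nb_test$Species)   # held-out accuracy
nb_acc
```

The tables stored in nb_fit show the per-class means and standard deviations that define the model's normal densities.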

Plot or Visualization.
> plot(model) - To plot the cross-validated accuracy for the tuning parameter values.
> plot(function(x) dnorm(x, 1.462, 0.1736640), 0, 8, col="red", main="Petal length distribution for the 3 different species") - To draw the normal density of petal length for setosa.
> curve(dnorm(x, 4.260, 0.4699110), add=TRUE, col="blue") - To add the density for versicolor.
> curve(dnorm(x, 5.552, 0.5518947), add=TRUE, col="green") - To add the density for virginica.
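The numbers passed to dnorm above match the per-species mean and standard deviation of Petal.Length in the full iris data. The sketch below computes them rather than typing them in by hand; the names pl_mean, pl_sd, sp and cols are illustrative assumptions.

```r
# Per-species mean and standard deviation of petal length
pl_mean <- tapply(iris$Petal.Length, iris$Species, mean)
pl_sd   <- tapply(iris$Petal.Length, iris$Species, sd)
round(cbind(mean = pl_mean, sd = pl_sd), 7)

# Draw the three normal densities from the computed values
sp   <- levels(iris$Species)            # setosa, versicolor, virginica
cols <- c("red", "blue", "green")
curve(dnorm(x, pl_mean[sp[1]], pl_sd[sp[1]]), from = 0, to = 8,
      col = cols[1], xlab = "Petal length", ylab = "Density",
      main = "Petal length distribution for the 3 different species")
for (i in 2:3) {
  curve(dnorm(x, pl_mean[sp[i]], pl_sd[sp[i]]), add = TRUE, col = cols[i])
}
legend("topright", legend = sp, col = cols, lty = 1)
```

The setosa curve is narrow and well separated from the other two, which is why petal length alone already distinguishes setosa almost perfectly, while the versicolor and virginica curves overlap.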

Screenshot of Lab Work
