Assignment 3 (Data Mining)

Jahangirnagar University
Subject: Data Mining

Subject Code: PMIT-6307
Submitted by
Md. Naeem Islam
ID: 211139
Batch: PMIT 22
Submitted to
Prof. Md. Fazlul Karim Patwary
Professor
Institute of Information Technology
Jahangirnagar University
ASSIGNMENT-3
Lab Work
1. Use the IRIS data set.
2. Construct a classification tree and compute its accuracy.
3. Repeat (1 & 2) for different random training data and compare the accuracy.
4. Repeat (1, 2, 3) for KNN, SVM and Bayesian classification.
5. Submit the code and results. Explaining/interpreting the results is encouraged.
1. Decision Tree Classification using IRIS dataset in R.
We use RStudio to complete this lab work. The following commands achieve our goals.
>iris - To show the IRIS dataset (iris is built into R's datasets package, so nothing has to be loaded)
Creating Training Data & Test Data.

>Xi=sample(2,nrow(iris),replace=TRUE,prob=c(0.8,0.2)) - To randomly assign each row to group 1 (training, about 80%) or group 2 (test, about 20%); the group labels are stored in Xi.
>table(Xi) - To count how many rows fall into each group.
> train=iris[Xi==1,] - To store the training data in the train variable.
>train - To show the training data.
> test=iris[Xi==2,] - To store the test data in the test variable.
>test - To show the test data.
Classification Model & Accuracy.

> library(rpart) - To load the rpart package, which builds classification and regression trees.
>dtreeC=rpart(Species~Sepal.Length+Sepal.Width+Petal.Length+Petal.Width,data=train,method="class") - To create a classification tree called dtreeC from the training data.
> pred=predict(dtreeC,test,type="class") - To predict the species of each test row and store it in pred.
>table(pred,test[,5]) - To cross-tabulate predicted against actual species (the confusion matrix).
>tab=table(pred,test[,5]) - To store the confusion matrix in tab.
>diag(tab) - To show the diagonal, i.e. the number of correct predictions for each species.
>sum(diag(tab)) - To see the total number of correct predictions.
>sum(diag(tab))/nrow(test) - To compute the accuracy (correct predictions divided by the number of test rows).
>1-sum(diag(tab))/nrow(test) - To compute the misclassification (error) rate.
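Task 3 asks for the tree to be rebuilt on different random training data. A minimal sketch of that loop is below, assuming five arbitrary seeds (1 to 5) and the same 80/20 random assignment as above; each run fits a new tree and records its test accuracy, so the spread shows how much the result depends on which rows land in the training set.

```r
library(rpart)  # recursive partitioning trees

seeds <- 1:5                     # arbitrary seeds chosen for illustration
acc <- numeric(length(seeds))

for (i in seq_along(seeds)) {
  set.seed(seeds[i])
  # Same random 80/20 assignment as in the steps above
  Xi <- sample(2, nrow(iris), replace = TRUE, prob = c(0.8, 0.2))
  train <- iris[Xi == 1, ]
  test  <- iris[Xi == 2, ]

  dtreeC <- rpart(Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width,
                  data = train, method = "class")
  pred <- predict(dtreeC, test, type = "class")
  tab  <- table(pred, test$Species)
  acc[i] <- sum(diag(tab)) / nrow(test)   # test accuracy of this run
}

print(round(acc, 3))                               # accuracy per run
cat("mean accuracy:", round(mean(acc), 3), "\n")
cat("sd of accuracy:", round(sd(acc), 3), "\n")
```

With only about 30 test rows per run, a single misclassified flower changes the accuracy by roughly 3 percentage points, so some variation between runs is expected.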
Plot or Visualization.
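> plot(dtreeC) - To plot the tree structure
> text(dtreeC) - To add the split rules and class labels to the plot
>library(rpart.plot) - To load the rpart.plot package for a more detailed plot (install it first with install.packages("rpart.plot") if needed)
>rpart.plot(dtreeC) - To see a detailed graphical view of the model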
Screenshot of lab work

KNN
1. KNN Classification using IRIS dataset in R.
KNN is expected to work well here because the dataset has only four numeric features, so distances between flowers stay meaningful in this low-dimensional space.
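To show what the classifier does, here is a small base-R sketch of KNN: for each test flower it computes the Euclidean distance to every training flower over the four measurements and takes a majority vote among the k closest. knn_predict() is a hypothetical helper, not the class::knn() function used below; for brevity it skips feature scaling (all four iris features are in centimetres), and the seed and 80/20 split are arbitrary.

```r
# Hypothetical helper for illustration; not part of the class package.
knn_predict <- function(train_x, train_y, test_x, k = 5) {
  train_x <- as.matrix(train_x)
  test_x  <- as.matrix(test_x)
  preds <- character(nrow(test_x))
  for (i in seq_len(nrow(test_x))) {
    # Euclidean distance from test row i to every training row
    d <- sqrt(rowSums(sweep(train_x, 2, test_x[i, ])^2))
    nearest <- order(d)[seq_len(k)]
    # Majority vote among the k nearest labels; which.max keeps the first level on a tie
    votes <- table(train_y[nearest])
    preds[i] <- names(votes)[which.max(votes)]
  }
  factor(preds, levels = levels(train_y))
}

# Usage on a random 80/20 split of the four numeric iris features
set.seed(100)
idx  <- sample(nrow(iris), 120)
pred <- knn_predict(iris[idx, 1:4], iris$Species[idx], iris[-idx, 1:4], k = 5)
table(Predicted = pred, Actual = iris$Species[-idx])
mean(pred == iris$Species[-idx])   # test accuracy
```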
>iris - To show dataset

> Species=factor(iris$Species) - To encode the target as a factor so the classes are treated as categories (iris$Species is already a factor, so this only guarantees it).
>library(caTools) - To load the caTools package, which provides sample.split() for splitting data.
>set.seed(100) - To make the random split reproducible.
>Split = sample.split(Species, SplitRatio = 0.8) - To split the dataset 80:20; sample.split() keeps the species proportions similar in both parts.
>Training = subset(iris, Split==TRUE) - To create the training set
>Testing = subset(iris, Split==FALSE) - To create the test set

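Feature scaling: the label column (column 5) is removed and each feature is standardized to mean 0 and standard deviation 1, so that no single feature dominates the distance calculation. The test set is scaled with the training set's mean and standard deviation so both sets are on the same scale.
>Train = scale(Training[-5])
>Test = scale(Testing[-5], center = attr(Train, "scaled:center"), scale = attr(Train, "scaled:scale"))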
>library(class) - To load the class package, which provides the knn() classifier
>Prediction = knn(train = Train, test = Test, cl = Training[,5], k = 5, prob = TRUE) - To predict each test row's species by a majority vote of its 5 nearest training rows; cl supplies the training labels. A larger k smooths the decision boundary and is less prone to overfitting than k = 1, and 5 is a common choice.
>table(Testing[,5], Prediction) - To show the confusion matrix
>library(gmodels) - To load the gmodels package, which provides CrossTable()
>CrossTable(x = Testing$Species, y = Prediction, prop.chisq=FALSE) - To cross-tabulate actual against predicted species (prop.chisq=FALSE excludes each cell's chi-square contribution)
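Step 3 of the task (different random training data) can be sketched for KNN as a loop over five arbitrary seeds and three values of k. This assumes a plain 80/20 split with sample() instead of caTools::sample.split(), and scales the test set with the training mean and standard deviation; the seeds and k values are illustrative only.

```r
library(class)   # provides knn()

seeds <- 1:5              # arbitrary seeds
ks    <- c(1, 5, 15)      # illustrative values of k
acc   <- matrix(NA_real_, nrow = length(seeds), ncol = length(ks),
                dimnames = list(paste0("seed", seeds), paste0("k=", ks)))

for (s in seq_along(seeds)) {
  set.seed(seeds[s])
  idx <- sample(nrow(iris), round(0.8 * nrow(iris)))   # 80/20 split
  Training <- iris[idx, ]
  Testing  <- iris[-idx, ]
  # Standardize both sets with the training mean and sd
  Train <- scale(Training[-5])
  Test  <- scale(Testing[-5], center = attr(Train, "scaled:center"),
                 scale = attr(Train, "scaled:scale"))
  for (j in seq_along(ks)) {
    pred <- knn(train = Train, test = Test, cl = Training$Species, k = ks[j])
    acc[s, j] <- mean(pred == Testing$Species)
  }
}

round(acc, 3)            # accuracy per split (rows) and k (columns)
round(colMeans(acc), 3)  # average accuracy for each k
```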
Screenshot of Lab Work
SVM
1. SVM (Support Vector Machine) Classification using IRIS dataset in R.
This section needs the following packages (install them first with install.packages() if they are missing).

>library("e1071")
> library(GGally)
>library(ggplot2)
>iris - To show IRIS dataset
Creating SVM Model

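>svm_model <- svm(Species ~ ., data=iris, kernel="radial") - To create an SVM model with a radial basis function (RBF) kernel, using all four measurements as predictors
>pred = predict(svm_model,iris) - To predict the species of every row (the same data used to train the model)
>tab = table(Predicted=pred, Actual = iris$Species) - To build the confusion matrix
>tab - To show the confusion matrix
>sum(diag(tab))/sum(tab) - To compute the accuracy. Because the model is evaluated on its own training data, this resubstitution accuracy is likely to overestimate accuracy on unseen data.
>ggpairs(iris, ggplot2::aes(colour = Species, alpha = 0.4)) - To draw a scatterplot matrix of all feature pairs, coloured by species
>plot(svm_model, data=iris, Petal.Width~Petal.Length, slice = list(Sepal.Width=3, Sepal.Length=4)) - To plot the SVM decision regions in the Petal.Length/Petal.Width plane, holding Sepal.Width at 3 and Sepal.Length at 4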
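The accuracy above is measured on the same rows the model was trained on. A sketch of a held-out evaluation, repeated over five arbitrary seeds with an 80/20 split (the split scheme and seeds are assumptions, not part of the original lab), looks like this:

```r
library(e1071)   # provides svm()

seeds <- 1:5     # arbitrary seeds
acc <- numeric(length(seeds))

for (i in seq_along(seeds)) {
  set.seed(seeds[i])
  idx   <- sample(nrow(iris), round(0.8 * nrow(iris)))  # 80/20 split
  train <- iris[idx, ]
  test  <- iris[-idx, ]
  svm_model <- svm(Species ~ ., data = train, kernel = "radial")
  pred <- predict(svm_model, test)
  tab  <- table(Predicted = pred, Actual = test$Species)
  acc[i] <- sum(diag(tab)) / sum(tab)   # accuracy on unseen rows
}

round(acc, 3)        # accuracy per split
round(mean(acc), 3)  # average held-out accuracy
```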
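Parameter Tuning
tune() runs a grid search over the given parameter ranges, scores each combination with 10-fold cross-validation (its default) and keeps the best-performing model. Note that epsilon only affects SVM regression, so for this classification task only cost changes the model.
>set.seed(123) - To make the cross-validation folds reproducible
>tmodel=tune(svm,Species~., data=iris, ranges=list(epsilon= seq(0,1,0.1), cost = 2^(2:7))) - To tune epsilon and cost
>plot(tmodel) - To plot the cross-validated error over the parameter grid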
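Once tune() has finished, the winning parameters and the refitted model can be read from the returned object. The sketch below assumes a grid over cost and gamma for the radial kernel instead of epsilon; the grid values are illustrative only.

```r
library(e1071)

set.seed(123)
# Grid over cost and gamma; extra arguments such as kernel are passed on to svm()
tmodel <- tune(svm, Species ~ ., data = iris, kernel = "radial",
               ranges = list(cost = 2^(0:5), gamma = c(0.01, 0.1, 1)))

tmodel$best.parameters     # cost/gamma pair with the lowest cross-validated error
tmodel$best.performance    # that error (misclassification rate)
best <- tmodel$best.model  # SVM refitted with the best parameters
summary(best)
```

By default tune() refits best.model on the full data, so it can be passed straight to predict().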
NAIVE BAYES
1. Naive Bayes Classification using IRIS dataset in R.
>iris - To show IRIS dataset

Install the caret package if needed with install.packages("caret"), then load the libraries:
> library(caret)
> library(lattice)
> library(ggplot2)

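>indxTrain <- createDataPartition(y = iris$Species,p = 0.75,list = FALSE) - To select a stratified random 75% of the rows for training
> training <- iris[indxTrain,] - To create the training set
> testing <- iris[-indxTrain,] - To create the test set (the remaining 25%)
>prop.table(table(training$Species)) * 100 - To check the class proportions (%) in the training split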

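Creating objects x, which holds the predictor variables, and y, which holds the response variable
> x = training[,-5] - All columns except Species (column 5)
> y = training$Species
> library(e1071)
> model = train(x,y,'nb',trControl=trainControl(method='cv',number=10)) - To train a naive Bayes model evaluated with 10-fold cross-validation (caret's 'nb' method relies on the klaR package, which must be installed)
>model - To show the cross-validation results
> Predict <- predict(model,newdata = testing ) - To predict the species of the test set
> Predict
> confusionMatrix(Predict, testing$Species ) - To get the confusion matrix, accuracy and other statistics
> plot(model) - To plot cross-validated accuracy against the tuning parameters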
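caret hides the model internals, so here is a base-R sketch of Gaussian naive Bayes, which is essentially what the 'nb' method fits when it uses normal densities (usekernel = FALSE): each class gets a prior and a per-feature mean and standard deviation, and a flower is assigned to the class with the largest log prior plus summed log densities. gnb_fit() and gnb_predict() are hypothetical helpers, not caret or klaR functions, and the seed and 75/25 split are arbitrary.

```r
# Hypothetical helpers for illustration only.
gnb_fit <- function(x, y) {
  x <- as.matrix(x)
  classes <- levels(y)
  list(
    classes = classes,
    prior = as.numeric(table(y)) / length(y),   # P(class)
    mu = t(sapply(classes, function(cl) colMeans(x[y == cl, , drop = FALSE]))),
    sd = t(sapply(classes, function(cl) apply(x[y == cl, , drop = FALSE], 2, sd)))
  )
}

gnb_predict <- function(model, x) {
  x <- as.matrix(x)
  n <- nrow(x)
  # One column per class: log prior + sum of per-feature normal log densities
  scores <- matrix(sapply(seq_along(model$classes), function(k) {
    ll <- dnorm(x, mean = rep(model$mu[k, ], each = n),
                sd = rep(model$sd[k, ], each = n), log = TRUE)
    log(model$prior[k]) + rowSums(matrix(ll, nrow = n))
  }), nrow = n)
  # Class with the largest log posterior for each row
  factor(model$classes[max.col(scores, ties.method = "first")],
         levels = model$classes)
}

set.seed(42)
idx   <- sample(nrow(iris), round(0.75 * nrow(iris)))   # 75/25 random split
model <- gnb_fit(iris[idx, 1:4], iris$Species[idx])
pred  <- gnb_predict(model, iris[-idx, 1:4])
table(Predicted = pred, Actual = iris$Species[-idx])
mean(pred == iris$Species[-idx])                       # test accuracy
```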
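> plot(function(x) dnorm(x, 1.462, 0.1736640), 0, 8, col="red", main="Petal length distribution for the 3 different species") - To plot the normal density of petal length for setosa (mean 1.462, sd 0.1736640)
> curve(dnorm(x, 4.260, 0.4699110), add=TRUE, col="blue") - To add the density for versicolor
> curve(dnorm(x, 5.552, 0.5518947), add=TRUE, col="green") - To add the density for virginica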
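The means and standard deviations typed into dnorm() above are the per-species mean and standard deviation of Petal.Length. A sketch that computes them with tapply() and draws the same three curves from the computed values instead of hard-coded numbers:

```r
# Per-species mean and standard deviation of petal length
mu <- tapply(iris$Petal.Length, iris$Species, mean)
s  <- tapply(iris$Petal.Length, iris$Species, sd)
round(rbind(mean = mu, sd = s), 4)

# Same plot as above, driven by the computed statistics
cols <- c(setosa = "red", versicolor = "blue", virginica = "green")
curve(dnorm(x, mu["setosa"], s["setosa"]), from = 0, to = 8,
      col = cols["setosa"], xlab = "Petal length", ylab = "density",
      main = "Petal length distribution for the 3 different species")
for (sp in c("versicolor", "virginica")) {
  curve(dnorm(x, mu[sp], s[sp]), add = TRUE, col = cols[sp])
}
```

Setosa's curve sits well apart from the other two, while versicolor and virginica overlap, which suggests most naive Bayes errors will be between those two species.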
