
AI/ML LAB-4
Name: Pratik Jadhav

PRN: 20190802050

AIM: To implement two algorithms on a data set and compute the accuracy score of the predictions.

Q1. Write a program to implement k-Nearest Neighbour algorithm to classify the iris data
set. Print both correct and wrong predictions. Java/Python ML library classes can be used
for this problem.
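Before turning to the library implementation, the idea behind k-Nearest Neighbours can be sketched in a few lines of plain NumPy: a new sample is assigned the majority label among the k training samples closest to it in Euclidean distance. The helper below (knn_predict is an illustrative name, not used in the cells that follow) is a minimal sketch of that rule, assuming X_train and y_train are NumPy arrays.

import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    # Euclidean distance from the new sample to every training sample
    distances = np.sqrt(((X_train - x_new) ** 2).sum(axis=1))
    # indices of the k closest training samples
    nearest = np.argsort(distances)[:k]
    # majority vote among their labels
    return Counter(y_train[nearest]).most_common(1)[0][0]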

In [1]:
%matplotlib inline

import numpy as np

import pandas as pd

import matplotlib.pyplot as plt

In [2]:
iris_data = pd.read_csv("Iris.csv")

iris_data.head()

Out[2]:
   Id  SepalLengthCm  SepalWidthCm  PetalLengthCm  PetalWidthCm      Species
0   1            5.1           3.5            1.4           0.2  Iris-setosa
1   2            4.9           3.0            1.4           0.2  Iris-setosa
2   3            4.7           3.2            1.3           0.2  Iris-setosa
3   4            4.6           3.1            1.5           0.2  Iris-setosa
4   5            5.0           3.6            1.4           0.2  Iris-setosa

In [3]:
len(iris_data)

Out[3]: 150

In [4]:
iris_data.isna().sum()

Out[4]:
Id               0
SepalLengthCm    0
SepalWidthCm     0
PetalLengthCm    0
PetalWidthCm     0
Species          0
dtype: int64


In [5]: X = iris_data.drop("Species", axis=1)

y = iris_data["Species"]

len(X), len(y)

Out[5]: (150, 150)
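Note that dropping only "Species" keeps the Id column inside X, so the row number is used as a numeric feature even though it carries no measurement information. A variant (shown only as a sketch, not run in this notebook) that keeps just the four measurement columns would be:

X = iris_data.drop(["Id", "Species"], axis=1)   # keep only the four measurement columns
y = iris_data["Species"]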

In [6]:
from sklearn.neighbors import KNeighborsClassifier

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                     test_size=0.2,
                                                     random_state=1)

clf = KNeighborsClassifier(n_neighbors=3)

clf.fit(X_train, y_train)

clf.score(X_test, y_test)

Out[6]: 0.9666666666666667

In [7]:
y_preds = clf.predict(X_test)

y_preds[:10]

Out[7]:
array(['Iris-setosa', 'Iris-versicolor', 'Iris-versicolor', 'Iris-setosa',
       'Iris-virginica', 'Iris-versicolor', 'Iris-virginica',
       'Iris-setosa', 'Iris-setosa', 'Iris-virginica'], dtype=object)

In [8]:
y_preds_proba = clf.predict_proba(X_test)

y_preds_proba[:10]

Out[8]:
array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 1., 0.],
       [1., 0., 0.],
       [0., 0., 1.],
       [0., 1., 0.],
       [0., 0., 1.],
       [1., 0., 0.],
       [1., 0., 0.],
       [0., 0., 1.]])

In [9]:
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

accuracy = accuracy_score(y_preds, y_test)

print(f"The accuracy of the ML model for iris data: {accuracy * 100:.2f}%\n")

print(f"Classification Report: {classification_report(y_preds, y_test)}\n")


print(f"Confusion Matrix: \n{confusion_matrix(y_preds, y_test)}")

The accuracy of the ML model for iris data: 96.67%

Classification Report: precision recall f1-score support

Iris-setosa 1.00 1.00 1.00 11

Iris-versicolor 0.92 1.00 0.96 12

Iris-virginica 1.00 0.86 0.92 7


accuracy 0.97 30

macro avg 0.97 0.95 0.96 30

weighted avg 0.97 0.97 0.97 30

Confusion Matrix:

[[11 0 0]

[ 0 12 0]

[ 0 1 6]]
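A note on argument order: scikit-learn's metric functions take (y_true, y_pred). The cell above passes the predictions first, which leaves the accuracy unchanged but transposes the confusion matrix and swaps precision with recall in the classification report. The conventional call, shown here only as a sketch, would be:

from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

print(accuracy_score(y_test, y_preds))
print(classification_report(y_test, y_preds))
print(confusion_matrix(y_test, y_preds))   # rows = true classes, columns = predicted classes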

In [10]:
from sklearn.model_selection import cross_val_score

cvs = cross_val_score(clf, X, y)

print(cvs)

print(f"Mean cross-validation accuracy: {np.mean(cvs) * 100:.2f}%")

[0.66666667 1. 1. 1. 0.7 ]

Mean cross-validation accuracy: 87.33%
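With its default settings, cross_val_score uses stratified 5-fold cross-validation for a classifier, which is why five scores are printed above. The same behaviour can be requested explicitly; the sketch below assumes the clf, X and y defined earlier.

from sklearn.model_selection import StratifiedKFold, cross_val_score

# equivalent to the default for classifiers: five stratified folds
cv = StratifiedKFold(n_splits=5)
scores = cross_val_score(clf, X, y, cv=cv)
print(scores, scores.mean())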

In [11]:
y_testing = pd.Series(y_test).reset_index().drop("index",axis=1)

y_predictions = pd.Series(y_preds)

In [12]:
predictions_df = pd.DataFrame(data={
    "Species": y_testing["Species"],
    "Predicted Species": y_predictions
})

In [13]:
predicts = []

for index, i in enumerate(y_testing["Species"]):
    if i == y_preds[index]:
        predicts.append("Correct")
    else:
        predicts.append("Wrong")

In [14]:
predictions_df["Correct or Wrong"] = pd.Series(predicts)

predictions_df.head()

Out[14]:
           Species  Predicted Species  Correct or Wrong
0      Iris-setosa        Iris-setosa           Correct
1  Iris-versicolor    Iris-versicolor           Correct
2  Iris-versicolor    Iris-versicolor           Correct
3      Iris-setosa        Iris-setosa           Correct
4   Iris-virginica     Iris-virginica           Correct

In [15]:
print(f"Total Correct or Wrong Predictions:\n{predictions_df['Correct or Wrong'].value_counts()}")

Total Correct or Wrong Predictions:

Correct 29


Wrong 1

Name: Correct or Wrong, dtype: int64
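The same Correct/Wrong labelling can also be done without an explicit loop; a vectorized sketch using numpy.where over the two columns of predictions_df:

import numpy as np

predictions_df["Correct or Wrong"] = np.where(
    predictions_df["Species"] == predictions_df["Predicted Species"],
    "Correct", "Wrong")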

Q2. Write a program to implement the naïve Bayesian classifier for a sample training data
set stored as a .CSV file. Compute the accuracy of the classifier, considering a few test
data sets.
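GaussianNB, used below, assumes that within each class every feature follows an independent normal distribution: it stores a per-class prior plus a mean and variance for each feature, and prediction picks the class with the highest posterior. The function below (gaussian_nb_predict is an illustrative name, not scikit-learn's internals) sketches that computation for a single sample, assuming NumPy arrays as inputs.

import numpy as np

def gaussian_nb_predict(X_train, y_train, x_new):
    # choose the class c maximising log P(c) + sum_j log N(x_new_j; mean_cj, var_cj)
    best_class, best_score = None, -np.inf
    for c in np.unique(y_train):
        Xc = X_train[y_train == c]
        prior = len(Xc) / len(X_train)
        mean, var = Xc.mean(axis=0), Xc.var(axis=0) + 1e-9  # small term avoids division by zero
        log_likelihood = -0.5 * np.sum(np.log(2 * np.pi * var) + (x_new - mean) ** 2 / var)
        score = np.log(prior) + log_likelihood
        if score > best_score:
            best_class, best_score = c, score
    return best_class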

In [16]:
iris_data = pd.read_csv("Iris.csv")

iris_data.head()

Out[16]:
   Id  SepalLengthCm  SepalWidthCm  PetalLengthCm  PetalWidthCm      Species
0   1            5.1           3.5            1.4           0.2  Iris-setosa
1   2            4.9           3.0            1.4           0.2  Iris-setosa
2   3            4.7           3.2            1.3           0.2  Iris-setosa
3   4            4.6           3.1            1.5           0.2  Iris-setosa
4   5            5.0           3.6            1.4           0.2  Iris-setosa

In [17]:
X = iris_data.drop("Species", axis=1)

y = iris_data["Species"]

len(X), len(y)

Out[17]: (150, 150)

In [18]:
from sklearn.naive_bayes import GaussianNB

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                     test_size=0.3,
                                                     random_state=1)

gnb = GaussianNB()

gnb.fit(X_train, y_train)

gnb.score(X_test, y_test)

Out[18]: 1.0

In [19]:
y_preds = gnb.predict(X_test)

y_preds[:10]

Out[19]:
array(['Iris-setosa', 'Iris-versicolor', 'Iris-versicolor', 'Iris-setosa',
       'Iris-virginica', 'Iris-versicolor', 'Iris-virginica',
       'Iris-setosa', 'Iris-setosa', 'Iris-virginica'], dtype='<U15')

In [20]:
from sklearn.metrics import accuracy_score

accuracy = accuracy_score(y_preds, y_test)


print(f"The accuracy of the ML model for iris data: {accuracy * 100:.2f}%")

The accuracy of the ML model for iris data: 100.00%

In [21]:
from sklearn.model_selection import cross_val_score

cvs = cross_val_score(gnb, X, y)

print(cvs)

print(f"Mean cross-validation accuracy: {np.mean(cvs) * 100:.2f}%")

[0.96666667 1. 1. 1. 1. ]

Mean cross-validation accuracy: 99.33%

In [22]:
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

accuracy = accuracy_score(y_preds, y_test)

print(f"The accuracy of the ML model for iris data: {accuracy * 100:.2f}%\n")

print(f"Classification Report: {classification_report(y_preds, y_test)}\n")


print(f"Confusion Matrix: \n{confusion_matrix(y_preds, y_test)}")

The accuracy of the ML model for iris data: 100.00%

Classification Report: precision recall f1-score support

Iris-setosa 1.00 1.00 1.00 14

Iris-versicolor 1.00 1.00 1.00 18

Iris-virginica 1.00 1.00 1.00 13

accuracy 1.00 45

macro avg 1.00 1.00 1.00 45

weighted avg 1.00 1.00 1.00 45

Confusion Matrix:

[[14 0 0]

[ 0 18 0]

[ 0 0 13]]

Conclusion: Hence, we have successfully implemented the k-Nearest Neighbours and naive Bayesian
algorithms on the iris data set and computed the accuracy along with other evaluation metrics for
their predictions. The k-Nearest Neighbours classifier achieved an accuracy of 96.67% on the test
set and a mean cross-validation accuracy of 87.33%, while the naive Bayesian classifier achieved
100% on the test set and a mean cross-validation accuracy of 99.33%.
