K Nearest Neighbours

21bce5695-knn-lab7
March 13, 2024
21BCE5695 M. Ashwin
1 K Nearest Neighbours
1.1 Importing required libraries
[ ]: from sklearn import datasets
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.neighbors import KNeighborsClassifier,KNeighborsRegressor
from sklearn.dummy import DummyClassifier, DummyRegressor
from sklearn.metrics import classification_report, mean_squared_error
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.decomposition import PCA
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from tqdm import tqdm
1.2 Importing Dataset

[ ]: df = pd.read_csv('apple_quality.csv')
[ ]: print(df.head(2))
   A_id      Size    Weight  Sweetness  Crunchiness  Juiciness  Ripeness  \
0     0 -3.970049 -2.512336   5.346330    -1.012009   1.844900  0.329840
1     1 -1.195217 -2.839257   3.664059     1.588232   0.853286  0.867530

    Acidity Quality
0 -0.491590    good
1 -0.722809    good
Dropping the A_id column, since a row identifier carries no predictive information for the model
[ ]: df.drop(['A_id'], axis=1, inplace=True)
Splitting into input features (x) and the target label (y)
[ ]: x = df.drop(['Quality'], axis=1)
y = df['Quality']
1.3 Data Analysis

[ ]: plt.figure(figsize=(25,10))
for (i, v) in enumerate(x.columns):
    # One histogram per input feature; 7 features fit on a 2 x 4 grid
    plt.subplot(2, 4, i + 1)
    plt.hist(x[v], bins="sqrt")
    plt.title(v, fontsize=9)
plt.tight_layout()
plt.show()
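The histograms use bins="sqrt", which NumPy turns into a bin width of (max - min) / sqrt(n), i.e. about ceil(sqrt(n)) equal-width bins. A minimal check, assuming a synthetic normal column with the same 4000 rows as the apple data (the real values are not reproduced here):

```python
import numpy as np

# Synthetic stand-in for one 4000-row feature column (assumption).
rng = np.random.default_rng(0)
feature = rng.normal(size=4000)

# NumPy's "sqrt" rule: width = range / sqrt(n), so the number of
# equal-width bins is ceil(sqrt(n)).
edges = np.histogram_bin_edges(feature, bins="sqrt")
n_bins = len(edges) - 1
print(n_bins, int(np.ceil(np.sqrt(feature.size))))
```

For 4000 rows this gives 64 bins per histogram, independent of the column's spread.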
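Encoding the categorical target into binary labels (bad = 0, good = 1). Note that y was extracted before this step, so the model is still trained on the string labels "bad"/"good"; the encoding only changes df, which is used for the summary and correlation below.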

[ ]: label = []
for i in tqdm(df['Quality']):
    if i == 'bad':
        label.append(0)
    else:
        label.append(1)
df['Quality'] = label
100%|██████████| 4000/4000 [00:00<00:00, 945994.70it/s]
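The same encoding can be done without a Python loop. A minimal sketch on a small hypothetical series (not the real column): np.where reproduces the loop exactly, while the imported LabelEncoder gives the same result here because it sorts class names, so "bad" becomes 0 and "good" becomes 1. Unlike the loop, LabelEncoder would give any unexpected third label its own code instead of mapping it to 1.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Small illustrative target column (hypothetical values).
quality = pd.Series(["good", "bad", "good", "bad", "good"])

# Vectorised equivalent of the loop: anything that is not "bad" becomes 1.
encoded_where = np.where(quality == "bad", 0, 1)

# LabelEncoder assigns codes in sorted order of the class names.
encoder = LabelEncoder()
encoded_le = encoder.fit_transform(quality)

print(encoded_where, encoded_le, list(encoder.classes_))
```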
[ ]: # Per-column dtype, non-null count and missing-value count
dfinfo = pd.DataFrame(df.dtypes, columns=["dtypes"])
for (m, n) in zip([df.count(), df.isna().sum()], ["count", "isna"]):
    dfinfo = dfinfo.merge(pd.DataFrame(m, columns=[n]), right_index=True, left_index=True, how="inner")
# DataFrame.append is deprecated (removed in pandas 2.0); pd.concat stacks the same rows
pd.concat([dfinfo.T, df.describe()])
[ ]:        Size    Weight Sweetness Crunchiness Juiciness  Ripeness  \
dtypes   float64   float64   float64     float64   float64   float64
count       4000      4000      4000        4000      4000      4000
isna           0         0         0           0         0         0
count     4000.0    4000.0    4000.0      4000.0    4000.0    4000.0
mean   -0.503015 -0.989547 -0.470479    0.985478  0.512118  0.498277
std     1.928059  1.602507  1.943441    1.402757  1.930286  1.874427
min    -7.151703 -7.149848 -6.894485   -6.055058 -5.961897 -5.864599
25%    -1.816765  -2.01177 -1.738425    0.062764 -0.801286 -0.771677
50%    -0.513703 -0.984736 -0.504758    0.998249  0.534219  0.503445
75%     0.805526  0.030976  0.801922    1.894234  1.835976  1.766212
max     6.406367  5.790714  6.374916    7.619852  7.364403  7.237837

         Acidity  Quality
dtypes   float64    int64
count       4000     4000
isna           0        0
count     4000.0   4000.0
mean    0.076877    0.501
std      2.11027 0.500062
min    -7.010538      0.0
25%    -1.377424      0.0
50%     0.022609      1.0
75%     1.510493      1.0
max     7.404736      1.0
Correlation matrix
[ ]: df.corr().round(2).style.background_gradient(cmap="viridis")
[ ]: (styled correlation table; the rendered output is not preserved in this export)
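Since Quality is now 0/1, its column in the correlation matrix is the point-biserial correlation of each feature with the label, which is a quick way to rank features. A minimal sketch on a synthetic frame (an assumption: one feature built to track the label, one pure noise), not the apple data:

```python
import numpy as np
import pandas as pd

# Synthetic frame (assumption): one informative feature, one noise feature.
rng = np.random.default_rng(42)
quality = rng.integers(0, 2, size=500)
frame = pd.DataFrame({
    "informative": quality + rng.normal(scale=0.5, size=500),
    "noise": rng.normal(size=500),
    "Quality": quality,
})

# Correlation of every feature with the binary target, strongest first.
target_corr = (
    frame.corr()["Quality"]
    .drop("Quality")
    .sort_values(key=np.abs, ascending=False)
)
print(target_corr.round(2))
```

Sorting by absolute value keeps strongly negative correlations near the top as well.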
[ ]: print(df.head(3))
       Size    Weight  Sweetness  Crunchiness  Juiciness  Ripeness   Acidity  \
0 -3.970049 -2.512336   5.346330    -1.012009   1.844900  0.329840 -0.491590
1 -1.195217 -2.839257   3.664059     1.588232   0.853286  0.867530 -0.722809
2 -0.292024 -1.351282  -1.738429    -0.342616   2.838636 -0.038033  2.621636

   Quality
0        1
1        1
2        0
1.4 Model building and testing

Splitting the data into training and test sets (70/30, stratified by class)
[ ]: x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, stratify=y, random_state=30)
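stratify=y makes both splits keep (almost exactly) the class balance of the full data. A minimal sketch, assuming hypothetical labels with the same balance as the apple data (mean Quality 0.501 over 4000 rows, i.e. 2004 "good" and 1996 "bad") and a placeholder feature column:

```python
from collections import Counter
from sklearn.model_selection import train_test_split

# Hypothetical labels matching the class balance implied by describe().
labels = ["bad"] * 1996 + ["good"] * 2004
features = [[i] for i in range(len(labels))]  # placeholder feature column

_, _, y_tr, y_te = train_test_split(
    features, labels, test_size=0.3, stratify=labels, random_state=30
)

# Each split's class counts stay proportional to the full set.
print(Counter(y_tr), Counter(y_te))
```

The resulting counts line up with the support columns of the classification reports further down (1397/1403 train, 599/601 test).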
[ ]: model = KNeighborsClassifier(algorithm="auto")
parameters = {"n_neighbors": [1, 3, 5],
              "weights": ["uniform", "distance"]}
model_optim = GridSearchCV(model, parameters, cv=5, scoring="accuracy")
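kNN is distance-based, so features with larger spread dominate the neighbour search. Here the feature standard deviations are similar (about 1.4 to 2.1), so the effect may be small, but StandardScaler is already imported. A minimal sketch of one way to include it, not the author's method: putting the scaler in a Pipeline means it is refit on each CV training fold, so the validation fold does not leak into the scaling. The data below is synthetic (an assumption) and the step names "scale"/"knn" are arbitrary.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the seven apple features (assumption).
X_demo, y_demo = make_classification(n_samples=600, n_features=7, random_state=0)

# The scaler is fitted inside each CV training fold only.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("knn", KNeighborsClassifier()),
])
# Pipeline parameters are addressed as <step name>__<parameter>.
param_grid = {
    "knn__n_neighbors": [1, 3, 5],
    "knn__weights": ["uniform", "distance"],
}
search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
search.fit(X_demo, y_demo)
print(search.best_params_, round(search.best_score_, 3))
```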
Training the model

[ ]: model_optim.fit(x_train,y_train)
[ ]: GridSearchCV(cv=5, estimator=KNeighborsClassifier(),
param_grid={'n_neighbors': [1, 3, 5],
'weights': ['uniform', 'distance']},
scoring='accuracy')
[ ]: model_optim.best_estimator_
[ ]: KNeighborsClassifier(weights='distance')
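The printed best estimator lists only weights='distance' because scikit-learn's repr shows just the arguments that differ from their defaults; n_neighbors is therefore the default of 5, which is also the largest value in the grid. model_optim.best_params_ reports the same thing directly on the fitted search. A quick check of the default:

```python
from sklearn.neighbors import KNeighborsClassifier

# The repr omits defaults, so this estimator still uses n_neighbors=5.
best = KNeighborsClassifier(weights="distance")
params = best.get_params()
print(params["n_neighbors"], params["weights"])
```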
Model metrics
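[ ]: # Distinct loop names avoid overwriting the x and y defined earlier
for (name, x_split, y_split) in zip(["Train", "Test"], [x_train, x_test], [y_train, y_test]):
    print("Classification kNN", name, "report:\n", classification_report(y_split, model_optim.predict(x_split)))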
Classification kNN Train report:
              precision    recall  f1-score   support

         bad       1.00      1.00      1.00      1397
        good       1.00      1.00      1.00      1403

    accuracy                           1.00      2800
   macro avg       1.00      1.00      1.00      2800
weighted avg       1.00      1.00      1.00      2800

Classification kNN Test report:
              precision    recall  f1-score   support

         bad       0.91      0.90      0.91       599
        good       0.90      0.91      0.91       601

    accuracy                           0.91      1200
   macro avg       0.91      0.91      0.91      1200
weighted avg       0.91      0.91      0.91      1200
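The perfect training scores are expected rather than a sign of a perfect model. With weights="distance", each training point is its own nearest neighbour at distance zero, and scikit-learn gives a zero-distance neighbour the whole vote, so predicting on the training set simply returns the stored labels. The test report (about 0.91 accuracy) or the cross-validated model_optim.best_score_ is the meaningful estimate. A small sketch on synthetic, deliberately noisy data (an assumption, not the apple dataset) contrasting the two weightings:

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic data with some randomly reassigned labels (assumption).
X_demo, y_demo = make_classification(
    n_samples=500, n_features=7, flip_y=0.1, random_state=1
)

scores = {}
for weights in ("uniform", "distance"):
    knn = KNeighborsClassifier(n_neighbors=5, weights=weights).fit(X_demo, y_demo)
    # Scoring on the training data itself: every point is its own neighbour.
    scores[weights] = accuracy_score(y_demo, knn.predict(X_demo))

print(scores)
```

With uniform weights the noisy points are outvoted by their neighbours, so training accuracy drops below 1.0; with distance weights it stays at 1.0 regardless of label noise.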
