KNN_Rainfall


K-Nearest Neighbors (KNN) classification algorithm to predict rainfall level

Importing Necessary Libraries & Functions


In [1]: import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score,classification_report,confusion_matrix

Loading and preparing data


In [2]: D = pd.read_csv('C:/Users/Admin/Desktop/STAT 405/Mymensingh.csv')
D

Out[2]:
      ID     Station  Year  Month   TEM   DPT  WIS    HUM     SLP  T_RAN  A_RAIN  RAN
0      1  Mymensingh  1960      1  16.9  11.3  2.0  73.39  1016.0     15    0.48  NRT
1      2  Mymensingh  1960      2  21.4  12.6  1.7  66.34  1013.0      0    0.00  NRT
2      3  Mymensingh  1960      3  24.1  14.9  2.3  64.13  1011.4     69    2.23  LTR
3      4  Mymensingh  1960      4  29.9  17.6  2.2  59.03  1007.1     27    0.90  NRT
4      5  Mymensingh  1960      5  29.6  23.2  2.4  73.45  1003.4    187    6.03  LTR
..   ...         ...   ...    ...   ...   ...  ...    ...     ...    ...     ...  ...
667  668  Mymensingh  2015      8  28.7  26.2  2.5  87.10  1003.3    349   11.26  MHR
668  669  Mymensingh  2015      9  28.8  25.2  2.0  85.63  1006.0    263    8.77  LTR
669  670  Mymensingh  2015     10  27.0  23.5  2.0  82.48  1011.3    180    5.81  LTR
670  671  Mymensingh  2015     11  23.1  18.7  1.7  81.73  1013.7     13    0.43  NRT
671  672  Mymensingh  2015     12  18.3  14.9  1.8  82.68  1015.9      5    0.16  NRT

672 rows × 12 columns

In [3]: D.dropna(how='any',axis=0,inplace=True)
D


Out[3]:
      ID     Station  Year  Month   TEM   DPT  WIS    HUM     SLP  T_RAN  A_RAIN  RAN
0      1  Mymensingh  1960      1  16.9  11.3  2.0  73.39  1016.0     15    0.48  NRT
1      2  Mymensingh  1960      2  21.4  12.6  1.7  66.34  1013.0      0    0.00  NRT
2      3  Mymensingh  1960      3  24.1  14.9  2.3  64.13  1011.4     69    2.23  LTR
3      4  Mymensingh  1960      4  29.9  17.6  2.2  59.03  1007.1     27    0.90  NRT
4      5  Mymensingh  1960      5  29.6  23.2  2.4  73.45  1003.4    187    6.03  LTR
..   ...         ...   ...    ...   ...   ...  ...    ...     ...    ...     ...  ...
667  668  Mymensingh  2015      8  28.7  26.2  2.5  87.10  1003.3    349   11.26  MHR
668  669  Mymensingh  2015      9  28.8  25.2  2.0  85.63  1006.0    263    8.77  LTR
669  670  Mymensingh  2015     10  27.0  23.5  2.0  82.48  1011.3    180    5.81  LTR
670  671  Mymensingh  2015     11  23.1  18.7  1.7  81.73  1013.7     13    0.43  NRT
671  672  Mymensingh  2015     12  18.3  14.9  1.8  82.68  1015.9      5    0.16  NRT

654 rows × 12 columns
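
The row count drops from 672 to 654, so 18 rows contain at least one missing value. The quick check below is a minimal sketch (not part of the original notebook) for seeing which columns carry those gaps before dropping them; it re-reads the file because the dropna above was applied in place.

# Sketch: inspect missing values before dropping rows (re-read the raw file,
# since D.dropna(inplace=True) has already modified D).
raw = pd.read_csv('C:/Users/Admin/Desktop/STAT 405/Mymensingh.csv')
print(raw.isna().sum())                 # missing count per column
print(len(raw) - len(raw.dropna()))     # rows removed by dropna (18)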

In [4]: DD = D.drop(['ID','Station','Year','Month','T_RAN','A_RAIN'],axis=1)
DD

Out[4]:
      TEM   DPT  WIS    HUM     SLP  RAN
0    16.9  11.3  2.0  73.39  1016.0  NRT
1    21.4  12.6  1.7  66.34  1013.0  NRT
2    24.1  14.9  2.3  64.13  1011.4  LTR
3    29.9  17.6  2.2  59.03  1007.1  NRT
4    29.6  23.2  2.4  73.45  1003.4  LTR
..    ...   ...  ...    ...     ...  ...
667  28.7  26.2  2.5  87.10  1003.3  MHR
668  28.8  25.2  2.0  85.63  1006.0  LTR
669  27.0  23.5  2.0  82.48  1011.3  LTR
670  23.1  18.7  1.7  81.73  1013.7  NRT
671  18.3  14.9  1.8  82.68  1015.9  NRT

654 rows × 6 columns

In [5]: X = DD.drop(['RAN'],axis=1)
X


Out[5]:
      TEM   DPT  WIS    HUM     SLP
0    16.9  11.3  2.0  73.39  1016.0
1    21.4  12.6  1.7  66.34  1013.0
2    24.1  14.9  2.3  64.13  1011.4
3    29.9  17.6  2.2  59.03  1007.1
4    29.6  23.2  2.4  73.45  1003.4
..    ...   ...  ...    ...     ...
667  28.7  26.2  2.5  87.10  1003.3
668  28.8  25.2  2.0  85.63  1006.0
669  27.0  23.5  2.0  82.48  1011.3
670  23.1  18.7  1.7  81.73  1013.7
671  18.3  14.9  1.8  82.68  1015.9

654 rows × 5 columns

In [6]: Y = DD['RAN']
Y

Out[6]:
0 NRT
1 NRT
2 LTR
3 NRT
4 LTR
...
667 MHR
668 LTR
669 LTR
670 NRT
671 NRT
Name: RAN, Length: 654, dtype: object
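
RAN takes three values (NRT, LTR, MHR), so it is worth checking how balanced the classes are before splitting. The sketch below is not part of the original notebook; it also shows how stratify=Y would keep the class proportions similar in the training and test sets.

# Sketch: class balance of the target, plus an optional stratified split.
print(Y.value_counts())          # number of months in each rainfall category

# Stratified alternative to the plain 75/25 split used below.
Xtr, Xte, Ytr, Yte = train_test_split(
    X, Y, test_size=0.25, random_state=124, stratify=Y
)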

Splitting data into training and test sets (75% / 25%)

In [7]: X_train,X_test,Y_train,Y_test = train_test_split(X,Y,test_size=0.25,random_state=124)
X_train


Out[7]:
      TEM   DPT  WIS    HUM     SLP
434  25.8  19.4  3.3  72.55  1008.0
260  29.2  24.3  3.7  80.63  1006.5
299  20.1  14.0  2.7  72.74  1013.4
159  29.5  21.0  2.0  70.10   997.5
544  26.5  23.2  4.1  82.29  1005.2
..    ...   ...  ...    ...     ...
550  22.7  18.9  1.8  83.83  1012.7
118  23.9  17.2  1.0  73.03  1014.1
144  19.8  12.7  1.2  67.87  1015.6
17   27.6  24.2  2.0  87.27  1001.7
480  17.4  13.3  2.3  80.35  1014.1

490 rows × 5 columns

In [8]: Y_train

Out[8]:
434 NRT
260 LTR
299 NRT
159 LTR
544 MHR
...
550 NRT
118 NRT
144 NRT
17 MHR
480 NRT
Name: RAN, Length: 490, dtype: object

In [9]: X_test


Out[9]:
      TEM   DPT  WIS    HUM     SLP
446  24.9  17.9  3.8  70.45  1010.8
401  27.5  23.9  5.2  85.67  1001.7
244  27.8  23.1  6.5  76.77  1005.3
502  23.7  20.0  1.7  83.90  1013.5
524  28.3  25.3  2.5  87.77  1005.5
..    ...   ...  ...    ...     ...
438  28.5  25.9  3.7  86.68  1000.8
129  26.9  23.4  0.8  83.19  1009.0
193  20.5  15.4  0.1  80.82   995.2
68   28.3  24.6  1.4  85.40  1006.4
219  26.3  19.5  4.1  72.77  1008.2

164 rows × 5 columns

In [10]: Y_test

Out[10]:
446 LTR
401 MHR
244 MHR
502 NRT
524 LTR
...
438 LTR
129 LTR
193 NRT
68 LTR
219 LTR
Name: RAN, Length: 164, dtype: object
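
One point worth noting before fitting: KNN's Euclidean distance treats all predictors equally, yet SLP is on the order of 1000 while WIS is around 2, so the raw distance is dominated by the large-scale features. The sketch below (not used in the original notebook) shows how standardization could be added with a scikit-learn Pipeline; its accuracy can be compared against the unscaled fit in the next section.

# Sketch: standardize the features so each one contributes comparably to the
# Euclidean distance, then fit the same KNN model. Not in the original notebook.
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

scaled_knn = Pipeline([
    ('scale', StandardScaler()),        # zero mean, unit variance per feature
    ('knn', KNeighborsClassifier(n_neighbors=8, metric='euclidean')),
])
scaled_knn.fit(X_train, Y_train)
print(accuracy_score(Y_test, scaled_knn.predict(X_test)))   # compare with the unscaled model below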

Fitting the KNN classifier to the training data


In [11]: KNN = KNeighborsClassifier(n_neighbors=8,metric='euclidean')
KNN.fit(X_train,Y_train)
P = KNN.predict(X_test)
P

Out[11]: array(['LTR', 'MHR', 'LTR', 'NRT', 'MHR', 'MHR', 'LTR', 'MHR', 'NRT',
       'NRT', 'LTR', 'MHR', 'NRT', 'MHR', 'NRT', 'NRT', 'MHR', 'NRT',
'MHR', 'LTR', 'NRT', 'MHR', 'NRT', 'MHR', 'NRT', 'MHR', 'LTR',
'NRT', 'LTR', 'NRT', 'LTR', 'MHR', 'LTR', 'NRT', 'LTR', 'LTR',
'MHR', 'NRT', 'LTR', 'NRT', 'LTR', 'NRT', 'LTR', 'MHR', 'NRT',
'LTR', 'LTR', 'LTR', 'NRT', 'LTR', 'MHR', 'MHR', 'MHR', 'NRT',
'LTR', 'NRT', 'LTR', 'NRT', 'NRT', 'MHR', 'LTR', 'LTR', 'NRT',
'MHR', 'NRT', 'LTR', 'NRT', 'MHR', 'LTR', 'LTR', 'NRT', 'MHR',
'LTR', 'LTR', 'NRT', 'MHR', 'MHR', 'LTR', 'LTR', 'LTR', 'NRT',
'LTR', 'MHR', 'LTR', 'MHR', 'NRT', 'LTR', 'NRT', 'LTR', 'LTR',
'LTR', 'LTR', 'LTR', 'NRT', 'LTR', 'LTR', 'NRT', 'NRT', 'LTR',
'NRT', 'LTR', 'NRT', 'LTR', 'MHR', 'LTR', 'NRT', 'LTR', 'NRT',
'NRT', 'LTR', 'LTR', 'LTR', 'NRT', 'LTR', 'NRT', 'LTR', 'LTR',
'LTR', 'MHR', 'NRT', 'MHR', 'MHR', 'LTR', 'MHR', 'NRT', 'LTR',
'MHR', 'NRT', 'LTR', 'NRT', 'NRT', 'LTR', 'NRT', 'NRT', 'NRT',
'LTR', 'NRT', 'MHR', 'NRT', 'MHR', 'LTR', 'NRT', 'LTR', 'LTR',
'NRT', 'NRT', 'MHR', 'LTR', 'LTR', 'NRT', 'LTR', 'LTR', 'NRT',
'MHR', 'NRT', 'LTR', 'LTR', 'LTR', 'LTR', 'MHR', 'LTR', 'MHR',
'LTR', 'LTR'], dtype=object)
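
The fitted model can also classify a single new month of readings. The values below are purely illustrative (they are not taken from the data file); the column order must match the training features.

# Sketch: predict the rainfall category for one hypothetical month.
new_obs = pd.DataFrame(
    [[27.0, 23.0, 2.5, 84.0, 1006.0]],      # illustrative TEM, DPT, WIS, HUM, SLP values
    columns=['TEM', 'DPT', 'WIS', 'HUM', 'SLP']
)
print(KNN.predict(new_obs))                  # returns one of 'NRT', 'LTR', 'MHR'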

Performance Metrics
In [14]: accuracy_score(Y_test,P)

Out[14]: 0.7073170731707317

In [15]: print(confusion_matrix(Y_test,P))

[[47 11 11]
[15 24 0]
[ 9 2 45]]

In [16]: print(classification_report(Y_test,P))

              precision    recall  f1-score   support

         LTR       0.66      0.68      0.67        69
         MHR       0.65      0.62      0.63        39
         NRT       0.80      0.80      0.80        56

    accuracy                           0.71       164
   macro avg       0.70      0.70      0.70       164
weighted avg       0.71      0.71      0.71       164
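
The rows and columns of the confusion matrix follow the alphabetical label order LTR, MHR, NRT, so of the 69 LTR months, 47 are classified correctly, 11 are predicted as MHR and 11 as NRT. For a plot with the class labels attached, the sketch below could be used; it assumes scikit-learn ≥ 1.0, which provides ConfusionMatrixDisplay.

# Sketch (assumes scikit-learn >= 1.0): confusion matrix with labelled axes
# instead of a bare array.
from sklearn.metrics import ConfusionMatrixDisplay

ConfusionMatrixDisplay.from_predictions(Y_test, P)
plt.show()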

Finding the optimal value of K


In [17]: As = []                     # test-set accuracy for each K
kk = []                              # the corresponding values of K
for i in range(2,40):
    knn = KNeighborsClassifier(n_neighbors=i,metric='euclidean')
    knn.fit(X_train,Y_train)
    pred = knn.predict(X_test)
    As.append(accuracy_score(Y_test,pred))
    kk.append(i)


In [18]: As

Out[18]: [0.676829268292683,
0.725609756097561,
0.7134146341463414,
0.7195121951219512,
0.7012195121951219,
0.7195121951219512,
0.7073170731707317,
0.7012195121951219,
0.7012195121951219,
0.7073170731707317,
0.6951219512195121,
0.7073170731707317,
0.7012195121951219,
0.7073170731707317,
0.7012195121951219,
0.7012195121951219,
0.6890243902439024,
0.6951219512195121,
0.6890243902439024,
0.6951219512195121,
0.6951219512195121,
0.7073170731707317,
0.6890243902439024,
0.6890243902439024,
0.676829268292683,
0.6951219512195121,
0.6890243902439024,
0.6951219512195121,
0.7073170731707317,
0.7073170731707317,
0.6890243902439024,
0.6951219512195121,
0.6890243902439024,
0.6829268292682927,
0.6829268292682927,
0.6890243902439024,
0.6951219512195121,
0.7073170731707317]

In [19]: kk

Out[19]: [2,
3,
4,
5,
6,
7,
8,
9,
10,
11,
12,
13,
14,
15,
16,
17,
18,
19,
20,
21,
22,
23,
24,
25,
26,
27,
28,
29,
30,
31,
32,
33,
34,
35,
36,
37,
38,
39]

In [22]: plt.figure(figsize=(7,4))
plt.plot(kk,As,color='red',marker='o')
plt.axvline(x=3)

Out[22]: <matplotlib.lines.Line2D at 0x14c171e0610>

[Figure: test-set accuracy plotted against K for K = 2 to 39, with a vertical reference line at K = 3]
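
The accuracy peaks at K = 3 (about 0.726 with this random_state), which is why the vertical line is drawn at x = 3. The best K can also be picked programmatically, and a cross-validated version of the search, sketched below (not part of the original notebook), tunes K on the training set only so the test set stays untouched until the final evaluation.

# Sketch: pick the best K from the recorded test accuracies, then re-tune K
# with 5-fold cross-validation on the training data only.
from sklearn.model_selection import cross_val_score

best_k = kk[int(np.argmax(As))]
print(best_k, max(As))                       # 3 and ~0.726 here

cv_acc = {
    k: cross_val_score(
        KNeighborsClassifier(n_neighbors=k, metric='euclidean'),
        X_train, Y_train, cv=5
    ).mean()
    for k in range(2, 40)
}
print(max(cv_acc, key=cv_acc.get))           # K with the best cross-validated accuracy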

