Randomforest Mllab
1 21BCE5695
2 M. Ashwin
3 Random Forest
[ ]: import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib
import seaborn as sns
from sklearn.model_selection import train_test_split
import sklearn
[ ]: print(dataf.shape)
(4269, 8)
[ ]: dataf.head()
[ ]: print(dataf['loan_status'].value_counts())
loan_status
Approved 2656
Rejected 1613
Name: count, dtype: int64
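The class counts above give a useful reference point: a model that always predicts the majority class would already be right for a bit over 60% of the rows. A minimal sketch of that arithmetic, using only the counts printed above (the variable names are illustrative, not from the notebook):

```python
# Class counts copied from the value_counts() output above.
counts = {"Approved": 2656, "Rejected": 1613}

total = sum(counts.values())                  # should match the row count in dataf.shape
majority_class = max(counts, key=counts.get)  # label with the most rows
baseline_accuracy = counts[majority_class] / total

print(total, majority_class, round(baseline_accuracy, 4))
```

Any accuracy reported later is best read against this baseline rather than against 50%.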
[ ]: dataf['loan_status'].value_counts().plot.bar()
[ ]: <Axes: xlabel='loan_status'>
[ ]: dataf['education'].value_counts(normalize=True).plot.bar(title='Education')
plt.show()
dataf['no_of_dependents'].value_counts(normalize=True).plot.bar(title='Number of Dependents')
plt.show()
dataf['self_employed'].value_counts(normalize=True).plot.bar(title='Self Employed')
plt.show()
6 Independent variables (Numerical)
Visualizing the distribution of annual income.
[ ]: sns.displot(dataf['income_annum'])
[ ]: <seaborn.axisgrid.FacetGrid at 0x7e9285181150>
The histogram shows that annual income is spread roughly evenly across its range.
CIBIL score distribution plot.
[ ]: sns.displot(dataf['cibil_score'])
[ ]: <seaborn.axisgrid.FacetGrid at 0x7e92864193f0>
The CIBIL score is also roughly evenly distributed; hence, no normalization is required.
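A histogram judged by eye can be backed up with a quick numeric check. The sketch below uses synthetic uniform data, not the notebook's `cibil_score` column; the range 300 to 900, the sample size, and the variable names are arbitrary choices for illustration. It reports the skewness and the share of rows in each of six equal-width bins.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Synthetic stand-in for a score column; NOT the notebook's data.
scores = pd.Series(rng.uniform(300, 900, size=5000), name="score")

# Skewness is close to 0 for a symmetric distribution.
skewness = scores.skew()

# Share of rows in each of 6 equal-width bins; similar shares suggest a flat distribution.
bin_shares = pd.cut(scores, bins=6).value_counts(normalize=True).sort_index()

print(round(skewness, 3))
print(bin_shares)
```

Random forests split on feature thresholds, so their predictions do not depend on feature scaling in any case; the check is mainly useful for spotting skew or outliers.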
Loan Amount distribution plot.
[ ]: sns.displot(dataf['loan_amount'])
[ ]: <seaborn.axisgrid.FacetGrid at 0x7e92851817b0>
Encoding data
[ ]: from sklearn import preprocessing
label_encoder = preprocessing.LabelEncoder()
# Boolean mask marking the object (string) columns
obj = (dataf.dtypes == 'object')
print(type(obj))
# Replace each string column with integer codes
for col in list(obj[obj].index):
    dataf[col] = label_encoder.fit_transform(dataf[col])
<class 'pandas.core.series.Series'>
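To make the encoding step concrete, here is a small sketch of how `LabelEncoder` assigns integer codes. It uses a toy column containing the two `loan_status` labels seen earlier, not the real table, and the variable names are illustrative.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy column with the two labels from loan_status; not the real data.
status = pd.Series(["Rejected", "Approved", "Approved", "Rejected"])

encoder = LabelEncoder()
codes = encoder.fit_transform(status)

# classes_ holds the labels in sorted order; a label's position is its code.
print(list(encoder.classes_))
print(list(codes))

# inverse_transform maps the codes back to the original labels.
print(list(encoder.inverse_transform(codes)))
```

Because the notebook reuses a single encoder for every column, only the mapping of the last fitted column stays available afterwards. Keeping one encoder per column would be one way to preserve the ability to decode each of them.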
[ ]: # Flip the 0/1 codes produced by LabelEncoder: code 0 becomes 1, everything else becomes 0
edu = []
for i in dataf['education']:
    if i==0:
        edu.append(1)
    else:
        edu.append(0)
dataf['education'] = edu
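[ ]: # LabelEncoder codes labels in sorted order (Approved = 0, Rejected = 1); flip so that Approved = 1
l = []
for i in dataf['loan_status']:
    if i==0:
        l.append(1)
    else:
        l.append(0)
dataf['loan_status'] = l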
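The two flipping loops above can also be written as a single vectorized expression. A minimal sketch on a toy series (the names `encoded`, `flipped_loop`, and `flipped_vec` are illustrative):

```python
import pandas as pd

# Toy 0/1 codes standing in for an encoded column.
encoded = pd.Series([0, 1, 1, 0, 1])

# Loop version, equivalent to the cells above.
flipped_loop = []
for value in encoded:
    flipped_loop.append(1 if value == 0 else 0)

# Vectorized equivalent: True where the code is 0, then cast to int.
flipped_vec = (encoded == 0).astype(int)

print(flipped_loop)
print(flipped_vec.tolist())
```

Both versions give the same result; the vectorized form avoids building an intermediate Python list.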
Correlation matrix
[ ]: matrix = dataf.corr()
f, ax = plt.subplots(figsize=(10,10))
sns.heatmap(matrix,vmax=.8,square=True,cmap="BuPu", annot = True)
[ ]: <Axes: >
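A full heatmap can be hard to scan for the one question that matters here: which features move with the target. One option is to pull out the target column of the correlation matrix and sort it by absolute value. The sketch below uses a synthetic frame with hypothetical column names (`score`, `income`, `target`), not the loan data.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 1000

# Synthetic data: the target depends on score only, income is unrelated noise.
score = rng.uniform(300, 900, n)
income = rng.uniform(2e5, 1e7, n)
target = (score > 550).astype(int)
frame = pd.DataFrame({"score": score, "income": income, "target": target})

# Correlation of every feature with the target, strongest first.
target_corr = (
    frame.corr()["target"]
    .drop("target")
    .sort_values(key=abs, ascending=False)
)
print(target_corr)
```

Pearson correlation only captures linear association, so a low value does not rule out a feature being useful to a tree-based model.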
7 Model Building
Splitting data into training and testing
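[ ]: Xval = dataf.drop(['loan_status'], axis=1)
Yval = dataf['loan_status']
# The split cell is missing from this export; test_size=0.2 is inferred from the shapes printed below
X_train, X_test, Y_train, Y_test = train_test_split(Xval, Yval, test_size=0.2)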
[ ]: print(np.array(X_train).shape)
print(np.array(Y_train).shape)
print(np.array(X_test).shape)
print(np.array(Y_test).shape)
(3415, 7)
(3415,)
(854, 7)
(854,)
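The shapes above are consistent with an 80/20 split of the 4269 rows. The sketch below reproduces them on placeholder arrays of the same size. Here `test_size=0.2` is an inference from the printed shapes, while `stratify` and `random_state` are additions for illustration and are not known to be the notebook's settings.

```python
import math

import numpy as np
from sklearn.model_selection import train_test_split

n_rows, n_features = 4269, 7            # sizes taken from the shapes above
X = np.zeros((n_rows, n_features))      # placeholder features, not real data
y = np.array([1] * 2656 + [0] * 1613)   # class counts from value_counts()

# stratify=y keeps the class ratio the same in both parts (an added choice).
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

print(X_tr.shape, X_te.shape)
print(math.ceil(0.2 * n_rows))          # the test size is rounded up
print(round(y_tr.mean(), 3), round(y_te.mean(), 3))
```

Fixing `random_state` makes the split, and therefore the reported accuracy, reproducible from run to run.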
Random Forest Model
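[ ]: from sklearn.ensemble import RandomForestClassifier
# The line creating the model is missing from this export; default hyperparameters are assumed
model = RandomForestClassifier()
model.fit(X_train, Y_train)
yPred = model.predict(X_test)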
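[ ]: from sklearn.metrics import accuracy_score  # the cell defining val is missing; accuracy_score is assumed
print("Accuracy:", accuracy_score(Y_test, yPred))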
Accuracy: 0.9765807962529274
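With classes split roughly 62/38, a single accuracy figure can hide how the errors fall across the two classes. The sketch below shows one way to look further, using a confusion matrix, a per-class report, and the forest's feature importances. It runs on synthetic data from `make_classification`, not the loan table, and the hyperparameters and variable names are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic, mildly imbalanced data standing in for the loan table.
X, y = make_classification(
    n_samples=2000, n_features=7, n_informative=4,
    weights=[0.38, 0.62], random_state=0,
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_tr, y_tr)
pred = clf.predict(X_te)

acc = accuracy_score(y_te, pred)
cm = confusion_matrix(y_te, pred)     # rows: true class, columns: predicted class
importances = clf.feature_importances_

print("Accuracy:", acc)
print(cm)
print(classification_report(y_te, pred))
print(importances)
```

The diagonal of the confusion matrix counts correct predictions, so accuracy equals its trace divided by the total. The off-diagonal cells show which class is being misclassified more often.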