Loan Status Analysis: Bhagwan Mahaveer College of Engineering and Management

LOAN STATUS ANALYSIS
A PROJECT SYNOPSIS
Submitted to
BHAGWAN MAHAVEER COLLEGE OF ENGINEERING AND
MANAGEMENT
By
HEMANT KUMAR
in partial fulfillment for the award of the degree
of
BACHELOR OF TECHNOLOGY
IN
COMPUTER SCIENCE AND ENGINEERING
Under Supervision of:
Ms. GURPREET
Assistant Professor
C.S.E DEPT.

Submitted By:
HEMANT KUMAR
00155202716
B.Tech, C.S.E, 7th Sem
ACKNOWLEDGEMENT
I have put considerable effort into this project. However, it would not have been possible
without the kind support and help of many individuals and organizations. I would like to extend
my sincere thanks to all of them.
I am highly indebted to Mr. Arjun Joshi for his guidance and constant supervision, for
providing the necessary information regarding the project, and for his support in completing it.
I would like to express my gratitude to my parents and my project partner, who kept
motivating me and helped me complete this project.
Above all, I wish to express my heartfelt gratitude to Ms. GURPREET, whose support has greatly
boosted my self-confidence and will go a long way in helping me reach further milestones and
greater heights.
DECLARATION
I hereby declare that the project work entitled “LOAN STATUS ANALYSIS”, submitted to
the BHAGWAN MAHAVEER COLLEGE OF ENGINEERING AND MANAGEMENT, is
a record of original work done by me under the guidance of Mr. Arjun Joshi, and that this project
work is submitted in partial fulfillment of the requirements for the award of the degree of
Bachelor of Technology in Computer Science & Engineering. The results embodied in this
project have not been submitted to any other University or Institute for the award of any degree
or diploma.
(Signature of Student)
HEMANT KUMAR
(Student Name)
00155202716.
(Registration Number)
Table of Contents
ACKNOWLEDGEMENT
DECLARATION
OBJECTIVE
INTRODUCTION
FEATURES
HARDWARE & SOFTWARE REQUIREMENT
PROJECT LAYOUT
FUTURE SCOPE
BIBLIOGRAPHY
OBJECTIVE
The main objective of this project is to estimate the chance that a loan applicant will default,
based on analysis of previously collected loan data. This large dataset is used to analyze the
loan status attribute, to suggest ways of reducing the risk of financial loss, and to train a
machine learning model that decides whether lending to a given person is likely to be
beneficial.
INTRODUCTION
A consumer finance company that specializes in lending various types of loans to urban
customers faces a recurring problem. When the company receives a loan application, it has to
decide whether to approve the loan based on the applicant’s profile. Two types of risk are
associated with the company’s decision:
• If the applicant is likely to repay the loan, then not approving the loan results in a loss of
business to the company.
• If the applicant is not likely to repay the loan, i.e. he/she is likely to default, then approving
the loan may lead to a financial loss for the company.
The aim is to identify patterns that indicate whether a person is likely to default, which may be
used for taking actions such as denying the loan, reducing the amount of the loan, or lending (to
risky applicants) at a higher interest rate. EDA is used to determine how consumer attributes and
loan attributes influence the tendency to default. In other words, the project helps us understand
the driving factors (or driver variables) behind loan default, i.e. the variables that are strong
indicators of default. Identifying these risky loan applicants can reduce the amount of credit
loss.
Data cleaning is done with pandas and numpy, and data correlation is used to find the driving
attributes. Logistic Regression is used to train a model that predicts, on the basis of various
attributes, whether a loan will be charged off.
This project also covers insights from the data exploration and analysis, which give any
user a much better understanding of the data. The most important results are summarized at the
end, with suggestions for decreasing the number of defaulters.
FEATURES
• Univariate, bivariate and multivariate analysis of the data.
• Suggests ways to decrease the company's financial loss.
• Predicts whether a person is likely to default.
HARDWARE & SOFTWARE REQUIREMENT

Section                             Requirements and Recommendations
Software Requirements               Python (any recent version)
Additional Software Requirements    scikit-learn, seaborn, numpy, pandas, matplotlib

Section                             Requirements and Recommendations
Supported Operating Systems         Microsoft Windows 7 32/64 bit
RAM                                 4 GB or more; 8 GB or more recommended, especially for Microsoft Windows Vista, 7 and 8
CPU                                 3.0 GHz processor speed or higher

PROJECT LAYOUT
Calculating loan and consumer attributes using EDA

In this project, we use EDA to find the driving factors behind loan default, and the Logistic
Regression algorithm to predict loan status.
In this project, we will cover:
• Results of univariate and bivariate analysis.
• Data visualization using seaborn.
• Applying the Logistic Regression algorithm.
Let's get started by importing a few of the libraries we will use in this project.
from sklearn.model_selection import cross_val_score

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn import metrics
from sklearn.tree import DecisionTreeClassifier
Load and clean the dataset to improve model accuracy. The dataset is stored locally, so it is
read directly from the local file system.
data = pd.read_excel("loan.xlsx")
data.head()
# cleaning data: drop columns that have fewer than 30% non-null values
limitPer = int(len(data) * 0.3)
data = data.dropna(thresh=limitPer, axis=1)
data.columns
data.info()  # check dtypes and columns with few or no non-null values
data.describe()
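To make the effect of the `thresh` argument concrete, here is a minimal sketch on a toy frame. The column names and values are hypothetical and are not taken from loan.xlsx; only the 30% rule mirrors the cleaning step above.

```python
import numpy as np
import pandas as pd

# Toy frame (hypothetical values): 10 rows, columns with different amounts of missing data.
toy = pd.DataFrame({
    "loan_amnt": [1000, 2000, 1500, 3000, 2500, 1200, 1800, 2200, 2700, 900],  # no gaps
    "emp_title": ["a", None, "b", None, "c", None, "d", None, None, None],     # 4 non-null
    "mostly_empty": [np.nan] * 9 + [1.0],                                        # 1 non-null
})

# A column is kept only if it has at least `limit` non-null values (30% of the row count).
limit = int(len(toy) * 0.3)
kept = toy.dropna(thresh=limit, axis=1)

print(toy.notna().sum())     # non-null count per column
print(list(kept.columns))    # columns that survive the cleaning step
```

Note that `thresh` counts non-null values, so a column that is 60% empty still survives here; only columns that are almost entirely empty are dropped.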
Univariate Data Analysis
# loan_status
plt.figure(figsize=(6,6))
a = sns.countplot(data=data, x='loan_status')
for p in a.patches:
    a.annotate('{:.0f}'.format(p.get_height()), (p.get_x()+0.1, p.get_height()+50))
plt.show()
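The same univariate view can also be read as a table of counts and shares, which is sometimes easier to quote than a bar chart. The following is a small sketch on made-up status values, not the project's actual figures.

```python
import pandas as pd

# Hypothetical loan_status values; the real column comes from loan.xlsx.
status = pd.Series(
    ["Fully Paid"] * 6 + ["Charged Off"] * 3 + ["Current"] * 1,
    name="loan_status",
)

counts = status.value_counts()                 # absolute frequency of each status
shares = status.value_counts(normalize=True)   # relative frequency of each status

summary = pd.DataFrame({"count": counts, "share": shares.round(2)})
print(summary)
```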
Bivariate Data Analysis
# Loans based on home ownership
d = sns.countplot(data=data, x='home_ownership',
                  order=data['home_ownership'].value_counts().index)
for p in d.patches:
    d.annotate('{:.0f}'.format(p.get_height()), (p.get_x()+0.1, p.get_height()+50))
plt.show()
# Delinquency for past 2 years.
f = sns.barplot(data=data, x='loan_status', y='delinq_2yrs')
plt.show()
# Effect of verified or not verified income on amount of loan.
g = sns.barplot(data=data, x='verification_status', y='loan_amnt')
plt.show()
# Purpose of loan vs loan status
b = sns.countplot(data=data, x='purpose', hue='loan_status')
b.set_xticklabels(b.get_xticklabels(), rotation=40, ha="right")
for p in b.patches:
    b.annotate('{:.0f}'.format(p.get_height()), (p.get_x()+0.1, p.get_height()+50))
plt.show()
# Relationship of grade to defaulters.
c = sns.countplot(data=data, x='grade', hue='loan_status')
for p in c.patches:
    c.annotate('{:.0f}'.format(p.get_height()), (p.get_x()+0.1, p.get_height()+50))
plt.show()
# Short term vs long term loans
e = sns.countplot(data=data, x='term', hue='loan_status')
for p in e.patches:
    e.annotate('{:.0f}'.format(p.get_height()), (p.get_x()+0.1, p.get_height()+50))
plt.show()
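Count plots split by loan_status show raw numbers, but categories differ in size, so a default rate per category can be easier to compare. The sketch below is one possible way to compute it, using hypothetical grade and status values; excluding 'Current' loans is an assumption made here because their final outcome is not yet known.

```python
import pandas as pd

# Hypothetical rows; the real data would come from the cleaned loan dataset.
toy = pd.DataFrame({
    "grade": ["A", "A", "A", "A", "A", "B", "B", "B", "B", "C", "C"],
    "loan_status": [
        "Fully Paid", "Fully Paid", "Fully Paid", "Charged Off", "Current",
        "Fully Paid", "Fully Paid", "Charged Off", "Charged Off",
        "Charged Off", "Charged Off",
    ],
})

# Keep only loans with a known outcome (assumption: 'Current' loans are undecided).
closed = toy[toy["loan_status"] != "Current"]

# Share of charged-off loans within each grade.
is_default = closed["loan_status"].eq("Charged Off")
default_rate = is_default.groupby(closed["grade"]).mean()

print(default_rate)
```

The same pattern applies to any categorical column used above, such as purpose, term or home_ownership.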
# data correlation (numeric columns only)
corr = data.select_dtypes(include='number').corr()
plt.figure(figsize=(14,14))
sns.heatmap(corr, cbar=True, square=True, cmap='coolwarm')
plt.show()
THIS HEATMAP SHOWS THE VARIABLES WITH MAXIMUM CORRELATION, WHICH ARE USED TO TRAIN THE LOGISTIC REGRESSION MODEL
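Reading a shortlist off a large heatmap by eye can be error-prone. As an illustration only, the sketch below ranks columns by their absolute correlation with a binary default indicator and keeps those above a cutoff. The data is synthetic, the column names `defaulted`, `feature_related` and `feature_unrelated` are hypothetical, and the 0.3 cutoff is an arbitrary assumption rather than the rule used in this project.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 200

# Synthetic data: one column that tracks the target and one that is pure noise.
defaulted = rng.integers(0, 2, size=n)  # 1 = charged off (synthetic)
toy = pd.DataFrame({
    "defaulted": defaulted,
    "feature_related": defaulted * 2.0 + rng.normal(0, 0.5, size=n),
    "feature_unrelated": rng.normal(0, 1, size=n),
})

# Correlation of every column with the target, strongest first.
corr_with_target = toy.corr()["defaulted"].drop("defaulted")
ranked = corr_with_target.abs().sort_values(ascending=False)

# Keep columns above the (assumed) cutoff.
shortlist = ranked[ranked > 0.3].index.tolist()
print(ranked)
print(shortlist)
```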
# For training the model, columns with high correlation are selected
prediction_var = ['loan_amnt','funded_amnt','funded_amnt_inv','installment','total_pymnt',
                  'total_pymnt_inv','total_rec_prncp','total_rec_int']
# setting our target value (loan_status) to fit the model
data['loan_status'] = data['loan_status'].map({'Fully Paid':1,'Current':2,'Charged Off':0})
train, test = train_test_split(data, test_size=0.3)
print(train.shape)
print(test.shape)
train_X = train[prediction_var]
train_Y = train.loan_status
test_X = test[prediction_var]
test_Y = test.loan_status
# LogisticRegression
logistic = LogisticRegression(solver='lbfgs')
logistic.fit(train_X, train_Y)
temp = logistic.predict(test_X)
a1 = metrics.accuracy_score(test_Y, temp)  # accuracy on the test split
# DecisionTreeClassifier
clf = DecisionTreeClassifier(random_state=0)
cross_val_score(clf, train_X, train_Y, cv=10)  # 10-fold cross-validation scores
clf.fit(train_X, train_Y)
clf.get_params(deep=True)
clf.predict_log_proba(test_X)
clf.predict(test_X)
a2 = clf.score(test_X, test_Y)  # mean accuracy on the test split
# RandomForestRegressor
randf = RandomForestRegressor(n_estimators=100, random_state=0)
randf.fit(train_X, train_Y)
a3 = randf.score(test_X, test_Y)  # for a regressor, score() returns R^2, not classification accuracy
accur = pd.DataFrame(columns=("Algorithm","Accuracy"))
accur.loc[1]=["LogisticRegression",a1]
accur.loc[2]=["DecisionTreeClassifier",a2]
accur.loc[3]=["RandomForestRegressor",a3]
accur #Shows the algorithm with respective accuracy(Logistic Reg.)
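A single accuracy number does not show how the charged-off class is handled, which matters when defaulters are a minority. The sketch below, on synthetic data with a hypothetical binary target (1 = charged off), reports a confusion matrix next to the accuracy. It is an illustration of the evaluation idea, not a re-run of the models above.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 400

# Synthetic features and an imbalanced binary target (1 = charged off).
X = rng.normal(size=(n, 2))
y = (X[:, 0] + rng.normal(scale=0.5, size=n) > 1.0).astype(int)

# Stratify so both splits keep roughly the same share of defaulters.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
pred = model.predict(X_test)

acc = accuracy_score(y_test, pred)
# Rows are true classes, columns are predicted classes, in the order [0, 1].
cm = confusion_matrix(y_test, pred, labels=[0, 1])

print(f"accuracy: {acc:.3f}")
print(cm)
```

The bottom row of the matrix shows how many actual defaulters were missed versus caught, which is the figure a lender is most likely to care about.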

FUTURE SCOPE
With the knowledge I have gained by developing this project, I am confident that in the future I
can make this system more effective by adding some services:
• Application in banking.
• Reporting the percentage chance of a person being a defaulter.
• Deployment on the web for better usability.
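As a rough illustration of the second point, a fitted scikit-learn classifier can already report a probability instead of a hard label through `predict_proba`. The sketch below uses synthetic data and a hypothetical applicant; the features and the binary target (1 = default) are assumptions made for the example.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

# Synthetic training data: two features and a binary target (1 = default).
X = rng.normal(size=(300, 2))
y = (X[:, 0] - X[:, 1] + rng.normal(scale=0.5, size=300) > 0).astype(int)

model = LogisticRegression(max_iter=1000).fit(X, y)

# Hypothetical applicant described by the same two features.
applicant = np.array([[1.5, -1.0]])

# predict_proba returns one column per class; look up the column for class 1.
default_col = list(model.classes_).index(1)
p_default = model.predict_proba(applicant)[0, default_col]

print(f"Estimated default probability: {p_default:.1%}")
```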
BIBLIOGRAPHY
To make this project, I have taken information from:
https://www.coursera.org/learn/machine-learning
https://stackabuse.com/seaborn-library-for-data-visualization-in-python-part-1/
https://stackoverflow.com
