
LOAN STATUS ANALYSIS

A PROJECT SYNOPSIS

Submitted to
BHAGWAN MAHAVEER COLLEGE OF ENGINEERING AND
MANAGEMENT
By

HEMANT KUMAR
in partial fulfillment for the award of the degree
of
BACHELOR OF TECHNOLOGY
IN
COMPUTER SCIENCE AND ENGINEERING

Under Supervision of:

Ms. GURPREET
Assistant Professor
C.S.E DEPT.

Submitted By:

HEMANT KUMAR
00155202716
B.Tech, C.S.E, 7th Sem
ACKNOWLEDGEMENT

I have put considerable effort into this project. However, it would not have been possible without
the kind support and help of many individuals and organizations. I would like to extend my sincere
thanks to all of them.

I am highly indebted to Mr. Arjun Joshi for his guidance and constant supervision, for providing
the necessary information regarding the project, and for his support in completing it.
I would like to express my gratitude to my parents and my project partner, who kept motivating me
and helped me to complete this project.

Above all, I wish to express my heartfelt gratitude to Ms. GURPREET, whose support has greatly
boosted my self-confidence and will go a long way in helping me to reach further milestones and
greater heights.
DECLARATION

I hereby declare that the project work entitled “LOAN STATUS ANALYSIS” submitted to
the BHAGWAN MAHAVEER COLLEGE OF ENGINEERING AND MANAGEMENT is
a record of original work done by me under the guidance of Mr. Arjun Joshi, and that this project
work is submitted in partial fulfillment of the requirements for the award of the degree of
Bachelor of Technology in Computer Science & Engineering. The results embodied in this
project have not been submitted to any other University or Institute for the award of any degree
or diploma.

(Signature of Student)

HEMANT KUMAR

(Student Name)

00155202716.

(Registration Number)
Table of Contents

ACKNOWLEDGEMENT
DECLARATION
OBJECTIVE
INTRODUCTION
FEATURES
HARDWARE & SOFTWARE REQUIREMENT
PROJECT LAYOUT
FUTURE SCOPE
BIBLIOGRAPHY
OBJECTIVE

The main objective of this loan status project is to estimate the chance that a person will
default, based on analysis of previously collected loan data. That data is used to analyse the
loan status attribute, to suggest ways of decreasing the chances of financial loss, and to train a
machine learning model which then decides whether giving a loan to a certain person is likely to
be beneficial or not.
INTRODUCTION

Any consumer finance company which specializes in lending various types of loans to urban
customers faces certain problems. When the company receives a loan application, it has to
decide whether to approve the loan based on the applicant’s profile. Two types of risk are
associated with the company’s decision:

• If the applicant is likely to repay the loan, then not approving the loan results in a loss of
business to the company.

• If the applicant is not likely to repay the loan, i.e. he/she is likely to default, then approving
the loan may lead to a financial loss for the company.

The aim is to identify patterns which indicate whether a person is likely to default. These
patterns may be used for taking actions such as denying the loan, reducing the amount of the
loan, or lending (to risky applicants) at a higher interest rate. Exploratory data analysis (EDA)
is used to determine how consumer attributes and loan attributes influence the tendency to
default. In other words, the project helps us understand the driving factors (or driver variables)
behind loan default, i.e. the variables which are strong indicators of default. Identifying risky
loan applicants in this way can reduce the amount of credit loss.
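
To make the idea of a driver variable concrete, the sketch below computes a default rate per
category. It is only an illustration: the rows are hypothetical toy records (not taken from
loan.xlsx), and the helper column is_default is a name introduced here.

```python
import pandas as pd

# Hypothetical toy records; the project itself reads loan.xlsx instead.
loans = pd.DataFrame({
    "grade": ["A", "A", "B", "B", "B", "C", "C", "C"],
    "loan_status": ["Fully Paid", "Fully Paid", "Fully Paid", "Charged Off",
                    "Fully Paid", "Charged Off", "Charged Off", "Fully Paid"],
})

# Flag charged-off loans, then average the flag per grade to get a default rate.
loans["is_default"] = (loans["loan_status"] == "Charged Off").astype(int)
default_rate = (
    loans.groupby("grade")["is_default"].mean().sort_values(ascending=False)
)
print(default_rate)
```

A category whose default rate is clearly higher than the others is a candidate driver variable
worth examining further.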

Data cleaning is done with pandas and numpy, and data correlation is used to find the driving
attributes. Logistic Regression is then used to train a model which predicts whether a loan will
be charged off or not on the basis of various attributes.

This project also covers insights from the data exploration and analysis, which give any user a
much better understanding of the data. The most important results are summarised at the end,
with suggestions for decreasing the number of defaulters.
FEATURES

• Univariate, bivariate and multivariate analysis of data.
• Can suggest ways to decrease the financial loss of a company.
• Can predict whether a person is going to be a defaulter or not.
HARDWARE & SOFTWARE REQUIREMENT

Section                             Requirements and Recommendations

Software Requirements               Python (any recent version)
Additional Software Requirements    scikit-learn, seaborn, numpy, pandas, matplotlib
Supported Operating Systems         Microsoft Windows 7 32/64 bit
RAM                                 4 GB or more; 8 GB or more recommended, especially
                                    for Microsoft Windows Vista, 7 and 8
CPU                                 3.0 GHz processor speed or higher


PROJECT LAYOUT

Calculating loan and consumer attributes using EDA


In this project, we will use EDA to find the driving factors behind default, and the Logistic
Regression algorithm to train the prediction model.

In this project, we will learn:

• Results of univariate and bivariate analysis.
• Data visualization using seaborn.
• Deploying the Logistic Regression algorithm.

Let's get started by importing a few of the libraries we will use in this project.

from sklearn.model_selection import cross_val_score


import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn import metrics
from sklearn.tree import DecisionTreeClassifier

Next, the dataset is loaded and cleaned for better accuracy. As the dataset is stored locally, it
is read from a file on the system.

data = pd.read_excel("loan.xlsx")

#cleaning data (drop columns that are mostly null)

data.head()
#keep only the columns that have at least 30% non-null values
limitPer = int(len(data)*0.3)
data = data.dropna(thresh=limitPer,axis=1)
data.columns
data.info() #checking type and columns with low or null values.

data.describe()
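
The effect of the thresh argument is easy to check on a small frame. The sketch below uses a
hypothetical ten-row table (the column names mostly_empty and half_empty are made up for
illustration) and applies the same 30% rule as above.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with ten rows and different amounts of missing data.
toy = pd.DataFrame({
    "loan_amnt": range(1000, 11000, 1000),       # 10 non-null values
    "mostly_empty": [1.0, 2.0] + [np.nan] * 8,   # 2 non-null values
    "half_empty": [1.0, np.nan] * 5,             # 5 non-null values
})

# thresh is the minimum number of non-null values a column needs to be kept.
min_non_null = int(len(toy) * 0.3)
cleaned = toy.dropna(thresh=min_non_null, axis=1)
print(list(cleaned.columns))
```

Only mostly_empty falls below the threshold of three non-null values, so it is the only column
removed.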
Univariate Data Analysis

#loan_status

plt.figure(figsize=(6,6))
a = sns.countplot(x=data['loan_status'])
for p in a.patches:
    a.annotate('{:.0f}'.format(p.get_height()), (p.get_x()+0.1, p.get_height()+50))
plt.show()

Bivariate Data Analysis

#Loans based on home ownership

plt.figure(figsize=(16,8))
d = sns.countplot(data=data,x='home_ownership',order=data['home_ownership'].value_counts().index)
for p in d.patches:
    d.annotate('{:.0f}'.format(p.get_height()), (p.get_x()+0.1, p.get_height()+50))
plt.show()
#Delinquency for past 2 years.

plt.figure(figsize=(10,6))
f = sns.barplot(data=data,x='loan_status',y='delinq_2yrs')
plt.show()

#Effect of verified or not verified income on amount of loan.

plt.figure(figsize=(10,6))
g = sns.barplot(data=data,x='verification_status',y='loan_amnt')
plt.show()

#purpose of loan vs loan status

plt.figure(figsize=(16,8))
b = sns.countplot(data=data,x='purpose',hue='loan_status')
b.set_xticklabels(b.get_xticklabels(), rotation=40, ha="right")
for p in b.patches:
    b.annotate('{:.0f}'.format(p.get_height()), (p.get_x()+0.1, p.get_height()+50))
plt.show()

#Relationship of Grade to defaulters.

plt.figure(figsize=(16,10))
c=sns.countplot(data=data,x='grade',hue='loan_status')
for p in c.patches:
    c.annotate('{:.0f}'.format(p.get_height()), (p.get_x()+0.1, p.get_height()+50))
plt.show()

#Short term vs long term loans

plt.figure(figsize=(16,8))
e=sns.countplot(data=data,x='term',hue='loan_status')
for p in e.patches:
    e.annotate('{:.0f}'.format(p.get_height()), (p.get_x()+0.1, p.get_height()+50))
plt.show()
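
Raw counts can be hard to compare when one category has far more loans than another. As a
possible complement to the count plots, the sketch below converts counts into row-wise shares
with pd.crosstab. The eight rows are hypothetical and only stand in for the real term and
loan_status columns.

```python
import pandas as pd

# Hypothetical toy rows standing in for the 'term' and 'loan_status' columns.
loans = pd.DataFrame({
    "term": ["36 months"] * 5 + ["60 months"] * 3,
    "loan_status": ["Fully Paid", "Fully Paid", "Fully Paid", "Fully Paid",
                    "Charged Off", "Fully Paid", "Charged Off", "Charged Off"],
})

# normalize="index" makes each row sum to 1, giving the share of each status per term.
share = pd.crosstab(loans["term"], loans["loan_status"], normalize="index")
print(share)
```

In this toy data the share of charged-off loans is larger for the longer term, which is the kind
of pattern the bivariate plots are meant to reveal.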
#data correlation
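#restrict to numeric columns; newer pandas versions raise an error on non-numeric data
corr = data.select_dtypes(include='number').corr()
plt.figure(figsize=(14,14))
sns.heatmap(corr,cbar=True,square=True,cmap='coolwarm')
plt.show()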

THIS HEATMAP SHOWS THE VARIABLES WITH MAXIMUM CORRELATION, WHICH ARE USED TO TRAIN
THE LOGISTIC REGRESSION MODEL
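
Reading strongly correlated columns off a heatmap can also be done programmatically. The sketch
below is one possible way, under stated assumptions: the data is synthetic, the column unrelated
and the 0.8 cut-off are arbitrary choices made for illustration, and loan_amnt is used as the
reference column.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data (an assumption, not the loan dataset).
rng = np.random.default_rng(0)
n = 200
loan_amnt = rng.uniform(1000, 35000, n)
toy = pd.DataFrame({
    "loan_amnt": loan_amnt,
    "funded_amnt": loan_amnt * 0.98 + rng.normal(0, 100, n),
    "installment": loan_amnt / 36 + rng.normal(0, 20, n),
    "unrelated": rng.normal(0, 1, n),
})

# Absolute correlation of every other column with the reference column.
corr = toy.corr().abs()
ranked = corr["loan_amnt"].drop("loan_amnt").sort_values(ascending=False)

# Keep the columns above an illustrative cut-off of 0.8.
selected = ranked[ranked > 0.8].index.tolist()
print(selected)
```

The cut-off is a judgment call; a different threshold would select a different set of columns.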

#For training model columns with high correlation are selected

prediction_var = ['loan_amnt','funded_amnt','funded_amnt_inv','installment','total_pymnt',
                  'total_pymnt_inv','total_rec_prncp','total_rec_int']
#setting our target value(loan_status) to fit model

data['loan_status']=data['loan_status'].map({'Fully Paid':1,'Current':2,'Charged Off':0})

train, test = train_test_split(data,test_size = 0.3)


print(train.shape)
print(test.shape)
train_X = train[prediction_var]
train_Y = train.loan_status
test_X = test[prediction_var]
test_Y = test.loan_status

#LogisticRegression

logistic = LogisticRegression(solver='lbfgs')
logistic.fit(train_X,train_Y)
temp = logistic.predict(test_X)
a1 = metrics.accuracy_score(test_Y,temp)#to check accuracy (true labels first, then predictions)
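
Accuracy on its own can look good even when the rarer class is predicted poorly. A confusion
matrix shows the errors per class. The sketch below is a minimal illustration on synthetic data
(an assumption, not the loan dataset), where label 0 plays the role of 'Charged Off' and label 1
of 'Fully Paid'.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in data: most rows end up with label 1.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))
y = (X[:, 0] + 0.5 * rng.normal(size=300) > -1.0).astype(int)

# stratify=y keeps the class proportions the same in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

model = LogisticRegression().fit(X_train, y_train)
pred = model.predict(X_test)

# Rows are true classes and columns are predicted classes, in the order [0, 1].
cm = confusion_matrix(y_test, pred, labels=[0, 1])
print(cm)
```

The off-diagonal entries count the two kinds of mistake: risky applicants predicted as safe, and
safe applicants predicted as risky.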

#DecisionTreeClassifier

clf = DecisionTreeClassifier(random_state=0)
cross_val_score(clf,train_X,train_Y,cv=10)
clf.fit(train_X,train_Y)
clf.get_params(deep=True)
clf.predict_log_proba(test_X)
clf.predict(test_X)
a2 = clf.score(test_X,test_Y)

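#RandomForestRegressor
#Note: score() on a regressor returns R^2, not classification accuracy,
#so a3 is not directly comparable with a1 and a2.

randf = RandomForestRegressor(n_estimators=100,random_state=0)
randf.fit(train_X,train_Y)
a3 = randf.score(test_X,test_Y)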

accur = pd.DataFrame(columns=("Algorithm","Accuracy"))
accur.loc[1]=["LogisticRegression",a1]
accur.loc[2]=["DecisionTreeClassifier",a2]
accur.loc[3]=["RandomForestRegressor",a3]

accur #Shows the algorithm with respective accuracy(Logistic Reg.)
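
For a like-for-like comparison in the table, a classifier could be used for the random forest as
well, since score() on a classifier returns mean accuracy. The sketch below swaps in
RandomForestClassifier, which is not what the project code above uses, and runs on synthetic data
(an assumption, not the loan dataset).

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data with a binary target.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 3))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)

# For a classifier, accuracy_score and score() report the same quantity.
forest_accuracy = accuracy_score(y_test, forest.predict(X_test))
print(forest_accuracy)
```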


FUTURE SCOPE

With the knowledge I have gained by developing this project, I am confident that in the future I
can make this system more effective by adding some services.

• Banking.
• Can check the percentage chance of a person being a defaulter.
• Deployment on the web for better usability.
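
As a rough idea of how the percentage in the second point might be produced, scikit-learn
classifiers such as LogisticRegression expose predict_proba. The sketch below is only an
illustration on synthetic data; the labels (0 = default, 1 = repaid) are an assumption made for
the example.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data; hypothetical labels: 0 = default, 1 = repaid.
rng = np.random.default_rng(2)
X = rng.normal(size=(200, 2))
y = (X[:, 0] - X[:, 1] > 0).astype(int)

model = LogisticRegression().fit(X, y)

# predict_proba returns one column per class, ordered as in model.classes_.
proba = model.predict_proba(X[:5])
default_col = list(model.classes_).index(0)
default_pct = 100 * proba[:, default_col]
print(np.round(default_pct, 1))
```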
BIBLIOGRAPHY

Information for this project was taken from:

https://www.coursera.org/learn/machine-learning

https://stackabuse.com/seaborn-library-for-data-visualization-in-python-part-1/

https://stackoverflow.com
