Loan Status Analysis: Bhagwan Mahaveer College of Engineering and Management
A PROJECT SYNOPSIS
Submitted to
BHAGWAN MAHAVEER COLLEGE OF ENGINEERING AND
MANAGEMENT
By
HEMANT KUMAR
in partial fulfillment for the award of the degree
of
BACHELOR OF TECHNOLOGY
IN
COMPUTER SCIENCE AND ENGINEERING
ACKNOWLEDGEMENT
I have put considerable effort into this project. However, it would not have been possible
without the kind support and help of many individuals and organizations. I would like to extend
my sincere thanks to all of them.
I am highly indebted to Mr. Arjun Joshi for his guidance and constant supervision, for providing
the necessary information regarding the project, and for his support in completing it.
I would like to express my gratitude to my parents and my project partner, who kept motivating
me and helped me complete this project.
Above all, I wish to express my heartfelt gratitude to Ms. GUPREET, whose support has greatly
boosted my self-confidence and will go a long way in helping me reach further milestones and
greater heights.
DECLARATION
I hereby declare that the project work entitled “LOAN STATUS ANALYSIS”, submitted to
the BHAGWAN MAHAVEER COLLEGE OF ENGINEERING AND MANAGEMENT, is
a record of original work done by me under the guidance of Mr. Arjun Joshi, and that this
project work is submitted in partial fulfillment of the requirements for the award of the degree
of Bachelor of Technology in Computer Science & Engineering. The results embodied in this
project have not been submitted to any other University or Institute for the award of any degree
or diploma.
(Signature of Student)
HEMANT KUMAR
(Student Name)
00155202716.
(Registration Number)
Table of Contents
ACKNOWLEDGEMENT
DECLARATION
OBJECTIVE
INTRODUCTION
FEATURES
HARDWARE & SOFTWARE REQUIREMENT
PROJECT LAYOUT
FUTURE SCOPE
BIBLIOGRAPHY
OBJECTIVE
The main objective of this loan status project is to estimate the chance that a person will
default, based on analysis of previously collected data. This large body of data is used to
analyse the loan status attribute, to suggest ways of reducing the chance of financial loss, and
to train a machine learning model that decides whether giving a loan to a particular person is
beneficial or not.
INTRODUCTION
A consumer finance company that specializes in lending various types of loans to urban
customers faces a recurring problem. When the company receives a loan application, it has to
decide whether to approve the loan based on the applicant’s profile. Two types of risk are
associated with the company’s decision:
If the applicant is likely to repay the loan, then not approving the loan results in a loss of
business to the company.
If the applicant is not likely to repay the loan, i.e. he/she is likely to default, then approving
the loan may lead to a financial loss for the company.
The aim is to identify patterns that indicate whether a person is likely to default. These
patterns can be used to take actions such as denying the loan, reducing the loan amount, or
lending to risky applicants at a higher interest rate. Exploratory data analysis (EDA) is used to
determine how consumer attributes and loan attributes influence the tendency to default. In
other words, the project helps us understand the driving factors (or driver variables) behind
loan default, i.e. the variables that are strong indicators of default. Identifying risky loan
applicants in this way can reduce the amount of credit loss.
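As a small illustration of this kind of analysis, the sketch below computes the share of charged-off loans within each loan grade. The rows are made-up toy data, and the `is_default` column is a hypothetical helper introduced only for this example; the `grade` and `loan_status` column names follow the plots used later in this report.

```python
import pandas as pd

# Hypothetical toy data; only the column names come from the report.
loans = pd.DataFrame({
    "grade": ["A", "A", "B", "B", "B", "C", "C", "C"],
    "loan_status": ["Fully Paid", "Fully Paid", "Fully Paid", "Charged Off",
                    "Fully Paid", "Charged Off", "Charged Off", "Fully Paid"],
})

# Flag charged-off loans with 1, all other loans with 0.
loans["is_default"] = (loans["loan_status"] == "Charged Off").astype(int)

# The mean of a 0/1 flag within a group is the default rate of that group.
default_rate = loans.groupby("grade")["is_default"].mean()
print(default_rate)
```

A grade whose default rate is clearly higher than the others would be a candidate driver variable worth a closer look.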
Data cleaning is done with pandas and NumPy, and data correlation is used to find the driving
attributes. Logistic Regression is used to train the model, which predicts whether a loan will
be charged off or not on the basis of various attributes.
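The cleaning and correlation steps could look roughly like the sketch below. It is only an illustration on made-up rows: the `empty_col` column and the `is_default` flag are hypothetical, and the exact cleaning rules used on the real dataset are not stated in this report.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the loan dataset.
raw = pd.DataFrame({
    "loan_amnt": [5000, 12000, 8000, 20000, 15000, 7000],
    "delinq_2yrs": [0, 1, 0, 2, 1, 0],
    "empty_col": [np.nan] * 6,  # a column with no usable values
    "loan_status": ["Fully Paid", "Charged Off", "Fully Paid",
                    "Charged Off", "Charged Off", "Fully Paid"],
})

# Cleaning: drop columns that are entirely missing, then rows with any missing value.
clean = raw.dropna(axis=1, how="all").dropna(axis=0, how="any")

# Encode the target as 0/1 so that it can take part in the correlation.
clean = clean.assign(is_default=(clean["loan_status"] == "Charged Off").astype(int))

# Correlation of every numeric attribute with the default flag.
corr_with_target = (
    clean.select_dtypes(include="number").corr()["is_default"].drop("is_default")
)

# Order the attributes by the strength (absolute value) of the correlation.
order = corr_with_target.abs().sort_values(ascending=False).index
corr_with_target = corr_with_target.reindex(order)
print(corr_with_target)
```

Attributes at the top of this list are the ones most strongly associated with default in the sample, which is a starting point for choosing driving attributes rather than proof of cause.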
This project also covers insights from the data exploration and analysis, which give any user a
much better understanding of the data. The most important results are summarised at the end,
with suggestions for decreasing the number of defaulters.
FEATURES
REQUIREMENT
RAM: 4 GB or more; 8 GB or more recommended, especially for Microsoft Windows Vista, 7 and 8
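Let's get started by importing the libraries we will use in this project.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn import metrics
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score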
Next, the dataset is loaded and cleaned for better accuracy. The dataset is stored locally, so it
is read directly from the local system.
data = pd.read_excel("loan.xlsx")
data.describe()
Univariate Data Analysis
#loan_status
plt.figure(figsize=(6,6))
a = sns.countplot(data=data,x='loan_status')
for p in a.patches:
    #label each bar with its count, slightly above the bar
    a.annotate('{:.0f}'.format(p.get_height()), (p.get_x()+0.1, p.get_height()+50))
plt.show()
#home_ownership, most frequent category first
plt.figure(figsize=(16,8))
d = sns.countplot(data=data,x='home_ownership',order=data['home_ownership'].value_counts().index)
for p in d.patches:
    d.annotate('{:.0f}'.format(p.get_height()), (p.get_x()+0.1, p.get_height()+50))
plt.show()
#Delinquency for past 2 years.
plt.figure(figsize=(10,6))
f = sns.barplot(data=data,x='loan_status',y='delinq_2yrs')
plt.show()
plt.figure(figsize=(10,6))
g = sns.barplot(data=data,x='verification_status',y='loan_amnt')
plt.show()
#purpose, split by loan_status
plt.figure(figsize=(16,8))
b = sns.countplot(data=data,x='purpose',hue='loan_status')
b.set_xticklabels(b.get_xticklabels(), rotation=40, ha="right")
for p in b.patches:
    b.annotate('{:.0f}'.format(p.get_height()), (p.get_x()+0.1, p.get_height()+50))
plt.show()
#grade, split by loan_status
plt.figure(figsize=(16,10))
c = sns.countplot(data=data,x='grade',hue='loan_status')
for p in c.patches:
    c.annotate('{:.0f}'.format(p.get_height()), (p.get_x()+0.1, p.get_height()+50))
plt.show()
#term, split by loan_status
plt.figure(figsize=(16,8))
e = sns.countplot(data=data,x='term',hue='loan_status')
for p in e.patches:
    e.annotate('{:.0f}'.format(p.get_height()), (p.get_x()+0.1, p.get_height()+50))
plt.show()
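#data correlation
#correlation is defined only for numeric columns, so the others are left out
corr = data.select_dtypes(include='number').corr()
plt.figure(figsize=(14,14))
sns.heatmap(corr,cbar=True,square=True,cmap='coolwarm')
plt.show()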
prediction_var = ['loan_amnt','funded_amnt','funded_amnt_inv','installment','total_pymnt',
                  'total_pymnt_inv','total_rec_prncp','total_rec_int']
#setting our target value(loan_status) to fit model
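The step that builds `train_X`, `train_Y`, `test_X` and `test_Y` is not shown in this report. A minimal sketch of one common way to do it with scikit-learn's `train_test_split` follows. The 70/30 ratio, the stratification, the random toy data and the names `demo_data` and `demo_vars` are all assumptions made for illustration, not the split that was actually used.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the cleaned loan data (random values, illustration only).
rng = np.random.default_rng(0)
n = 200
demo_data = pd.DataFrame({
    "loan_amnt": rng.integers(1000, 35000, size=n),
    "installment": rng.uniform(30, 1200, size=n),
    "total_pymnt": rng.uniform(0, 40000, size=n),
    "loan_status": rng.choice(["Fully Paid", "Charged Off"], size=n, p=[0.8, 0.2]),
})
demo_vars = ["loan_amnt", "installment", "total_pymnt"]  # shortened feature list

# Hold out 30% of the rows for testing (assumed ratio).
# stratify keeps the share of each loan_status value the same in both parts.
train, test = train_test_split(
    demo_data, test_size=0.3, random_state=0, stratify=demo_data["loan_status"]
)

# Features (X) and target (Y) for training and testing.
train_X, train_Y = train[demo_vars], train["loan_status"]
test_X, test_Y = test[demo_vars], test["loan_status"]
print(train_X.shape, test_X.shape)
```

With the real dataset, `demo_data` would be replaced by `data` and `demo_vars` by `prediction_var`.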
#LogisticRegression
logistic = LogisticRegression(solver='lbfgs')
logistic.fit(train_X,train_Y)
temp = logistic.predict(test_X)
a1 = metrics.accuracy_score(test_Y,temp) #accuracy on the test set
#DecisionTreeClassifier
clf = DecisionTreeClassifier(random_state=0)
cv_scores = cross_val_score(clf,train_X,train_Y,cv=10) #10-fold cross-validation scores
clf.fit(train_X,train_Y)
clf.get_params(deep=True)
clf.predict_log_proba(test_X)
clf.predict(test_X)
a2 = clf.score(test_X,test_Y) #accuracy on the test set
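#RandomForestClassifier
#loan_status is a class label, so a classifier is used; its score() returns accuracy
#(a regressor's score() returns R^2, which is not comparable with a1 and a2)
from sklearn.ensemble import RandomForestClassifier
randf = RandomForestClassifier(n_estimators=100,random_state=0)
randf.fit(train_X,train_Y)
a3 = randf.score(test_X,test_Y)
#collect the test accuracy of each model in one table
accur = pd.DataFrame(columns=("Algorithm","Accuracy"))
accur.loc[1]=["LogisticRegression",a1]
accur.loc[2]=["DecisionTreeClassifier",a2]
accur.loc[3]=["RandomForestClassifier",a3]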
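The table above compares the models by accuracy only. When most loans are repaid, accuracy can look high even if a model misses every defaulter, so a confusion matrix and the recall on the charged-off class are a useful extra check. The sketch below uses made-up labels and a deliberately naive prediction to show the effect; it is not a result from this project.

```python
from sklearn.metrics import confusion_matrix, recall_score

# Hypothetical labels: 8 repaid loans and 2 charged-off loans.
y_true = ["Fully Paid"] * 8 + ["Charged Off"] * 2
# A naive model that predicts "Fully Paid" for every applicant.
y_pred = ["Fully Paid"] * 10

# Rows are true classes, columns are predicted classes, in the order of `labels`.
labels = ["Fully Paid", "Charged Off"]
cm = confusion_matrix(y_true, y_pred, labels=labels)

# Accuracy is the share of loans on the diagonal of the confusion matrix.
accuracy = cm.trace() / cm.sum()

# Recall on "Charged Off": share of actual defaulters the model caught.
recall_default = recall_score(y_true, y_pred, pos_label="Charged Off")

print(cm)
print("accuracy:", accuracy, "recall on Charged Off:", recall_default)
```

The same two calls can be applied to `test_Y` and the predictions of each trained model.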
FUTURE SCOPE
With the knowledge I have gained by developing this project, I am confident that in the future I
can make this system more effective by adding some services:
Banking.
Reporting the percentage chance of a person being a defaulter.
Deployment on the web for better usability.
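The percentage chance of default mentioned above could, for example, be read from the `predict_proba` method of a fitted logistic regression model. The sketch below is only an illustration on made-up numbers: the single feature (loan amount in thousands), the 0/1 default flag and the applicant value are all hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data: loan amount in thousands and a 0/1 default flag.
X = np.array([[2], [4], [5], [7], [9], [12], [15], [18], [22], [25]], dtype=float)
y = np.array([0, 0, 0, 0, 1, 0, 1, 1, 1, 1])

model = LogisticRegression(solver="lbfgs").fit(X, y)

# predict_proba returns one column per class, ordered as in model.classes_.
default_col = list(model.classes_).index(1)

# Estimated probability of default for one hypothetical applicant.
applicant = np.array([[20.0]])
p_default = model.predict_proba(applicant)[0, default_col]
print(f"Estimated chance of default: {p_default:.1%}")
```

Showing a probability instead of a yes/no answer would let a lender choose its own cut-off, for instance a stricter one for larger loans.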
BIBLIOGRAPHY
https://www.coursera.org/learn/machine-learning
https://stackabuse.com/seaborn-library-for-data-visualization-in-python-part-1/
https://stackoverflow.com