Download as doc, pdf, or txt
Download as doc, pdf, or txt
You are on page 1of 12

LOAN PREDICTOR USING MACHINE

LEARNING.

INTERNSHIP ASSESSMENT REPORT

BACHELOR OF TECHNOLOGY
Data Science & III Semester

SUBMITTED BY
RICHA SRIVASTAVA
(2101331540086)

NOIDA INSTITUTE OF ENGINEERING & TECHNOLOGY,


GREATER NOIDA, UTTARPRADESH
STUDENT’S DECLARATION

I hereby certify that the work which is being presented in the internship assessment report entitled
“LOAN PREDICTOR USING MACHINE LEARNING” in fulfillment of the requirement for the
award of the Degree of Bachelor of Technology in the Department of Data Science of Noida
Institute Of Engineering & Technology, Gautam Buddha Nagar, Greater Noida, Uttar Pradesh, is
an authentic record of my own work carried out during III Semester.

Date: Name and Signature of student

The internship assessment viva-voce examination of Mr./Ms. __________, Roll No.


…………………. Of B.TECH ( Data Science) has been held on __________.

Signature of:

Project Guide: _______ Head of Dept:


(Stamp of organization)

External Examiner: _______ Internal Examiner _________


ACKNOWLEDGEMENT

I am highly grateful to Dr. Vinod M Kapse, Director, Noida Institute of Engineering &
Technology, Greater Noida for providing this opportunity.

The constant guidance and encouragement received from Dr. Priyanka Chandani, HOD (Data
Science, dept.), NIET, Greater Noida has been of great help in carrying out the project work and is
acknowledged with reverential thanks.
 
I would like to express a deep sense of gratitude and thanks profusely to Mrs. Himani Sharma’s
project guide, without her wise counsel and able guidance, it would have been impossible to
complete the report in this manner.
 
I express gratitude to other faculty members of the Data Science department of NIET for their
intellectual support throughout the course of this work.
 
 
 
 
RICHA SRIVASTAVA
 

 
 
 
Chapter 1.

Introduction

Loan Prediction is very helpful for employees of banks as well as for the applicant also. The aim

of this paper is to provide a quick, immediate, and easy way to choose deserving applicants.

Housing Finance Company deals in all loans. They have a presence across all urban, semi-urban

and rural areas. Customer-first applies for a loan after that company or bank validates the

customer’s eligibility for a loan. A company or bank wants to automate the loan eligibility process

(real-time) based on customer details provided while filling out the application form. These details

are Gender, Marital Status, Education, Number of Dependents, Income, Loan Amount, Credit

History, and others. This project has taken the data of previous customers of various banks to

whom on a set of parameters loans were approved. So the machine learning model is trained on

that record to get accurate results. Our main objective of this project is to predict the safety of the

loan. To predict loan safety, the SVM and Naïve Bayes algorithm are used. First, the data is

cleaned so as to avoid missing values in the data set.

In this report, we are going to make a Loan Predictor Using Machine Learning. This is a

classification problem in which we need to classify whether the loan will be approved or not.

Classification refers to a predictive modeling problem where a class label is predicted for a given

example of input data. A few examples of classification problems are Spam Email detection,

Cancer detection, Sentiment Analysis, etc.


Chapter 2.

Literature Survey

1." Loan Approval Prediction based on Machine Learning Approach" .

The main objective of this paper is to predict whether assigning the loan to particular person will

be safe or not. This paper is divided into four sections (i)Data Collection (ii) Comparison of

machine learning models on collected data (iii) Training of system on most promising model (iv)

Testing 2.“Exploring the Machine Learning Algorithm for Prediction the Loan Sanctioning

Process” Author- E. Chandra Blessie, R. Rekha - Year- 2019 Extending credits to corporates and

individuals for the smooth functioning of growing economies like India is inevitable. As

increasing number of customers apply for loans in the banks and non- banking financial

companies (NBFC), it is really challenging for banks and NBFCs with limited capital to device a

standard resolution and safe procedure to lend money to its borrowers for their financial needs. In

addition, in recent times NBFC inventories have suffered a significant downfall in terms of the

stock price. It has contributed to a contagion that has also spread to other financial stocks,

adversely affecting the benchmark in recent times. In this paper, an attempt is made to condense

the risk involved in selecting the suitable person who could repay the loan on time thereby

keeping the bank’s nonperforming assets (NPA) on the hold. This is achieved by feeding the past

records of the customer who acquired loans from the bank into a trained machine learning model

which could yield an accurate result. The prime focus of the paper is to determine whether or not

it will be safe to allocate the loan to a particular person. This paper has the following sections (i)

Collection of Data, (ii) Data Cleaning and (iii) Performance Evaluation. Experimental tests found

that the Naïve Bayes model has better performance Evaluation. Experimental tests found that the

Naïve Bayes model has better performance than other models in terms of loan forecasting
Chapter 3.

Understanding the Problem Statement 

Housing Finance company deals in all kinds of home loans. They have a presence across all

urban, semi-urban and rural areas. The customer first applies for a home loan and after that, the

company validates the customer eligibility for the loan. The company wants to automate the loan

eligibility process (real-time) based on customer detail provided while filling out online

application forms. These details are Gender, Marital Status, Education, number of Dependents,

Income, Loan Amount, Credit History, and others. To automate this process, they have provided a

dataset to identify the customer segments that are eligible for loan amounts so that they can

specifically target these customers. As mentioned above this is a Binary Classification problem in

which we need to predict our Target label which is “Loan Status”. Loan status can have two

values: Yes or NO.Yes: if the loan is approved NO: if the loan is not approved. So using the

training dataset we will train our model and try to predict our target column that is “Loan Status”

on the test dataset.


Chapter 4.

Implementation Details (Modules):

Loan Dataset : Loan Dataset is very useful in our system for prediction of more accurate result.
Using the loan Dataset the system will automatically predict which costumer’s loan it should
approve and which to reject. System will accept loan application form as an input. Justified format
of application form should be given as an input to get processed.
Typically , Here the system separate a dataset into a training set and testing set ,most of the data
use for training ,and a smaller portions of data is use for testing. after a system has been processed
by using the training set, it makes the prediction against the test set.
Data cleaning and processing: In Data cleaning the system detect and correct corrupt or
inaccurate records from database and refers to identifying incomplete, incorrect, inaccurate or
irrelevant parts of the data and then replacing , modifying or detecting the dirty or coarse data. In
Data processing the system convert data from a given form to a much more usable and desired
form i.e. make it more meaningful and informative.
Dataset:-
Load Essential Python Libraries

Load Training/ Test Dataset

Size of Train/Test Data

So we have 614 rows and 13 columns in our training dataset.

In test data, we have 367 rows and 12 columns because the target column is not included in the

test data.

First look at the Dataset


Imputing the missing values:

 Fill null values with mode

Univariate Analysis Observations

1. More Loans are approved Vs Rejected

2. Count of Male applicants is more than Female

3. Count of Married applicant is more than Non-married

4. Count of graduate is more than non-Graduate

5. Count of self-employed is less than that of Non-Self-employed

6. Maximum properties are located in Semiurban areas


7. Credit History is present for many applicants

8. The count of applicants with several dependents=0 is maximum.

Output:

•  Technology:- Machine learning, Variate Analysis, Correlation, Supervised learning.


• Tools:- Python, Jupiter notebook, CSV Files, Libraries etc.
Chapter 5.

Conclusion

After the Final Submission of test data, my accuracy score was 78%.

Feature engineering helped me increase my accuracy.

• People who had repaid their debt had higher chances to get the loan.

• Applicant income does not affect the chances of having a loan.

• Credit history more, semiurban area:- high chances

• Married People have a higher chance of getting loans.

• Co-applicant income less, higher the chances of getting a loan.

Reference:-

Internshala.com
1) Kumar Arun, Garg Ishan, Kaur Sanmeet, ―Loan Approval Prediction based on Machine

Learning Approach‖, IOSR Journal of Computer Engineering (IOSR-JCE), Vol. 18, Issue 3, pp.

79-81, Ver. I (May-Jun. 2016).

You might also like