Assessment Report Richa

LOAN PREDICTOR USING MACHINE LEARNING
INTERNSHIP ASSESSMENT REPORT
BACHELOR OF TECHNOLOGY
Data Science, III Semester
SUBMITTED BY
RICHA SRIVASTAVA
(2101331540086)
NOIDA INSTITUTE OF ENGINEERING & TECHNOLOGY,
GREATER NOIDA, UTTAR PRADESH
STUDENT’S DECLARATION
I hereby certify that the work which is being presented in the internship assessment report entitled
“LOAN PREDICTOR USING MACHINE LEARNING” in fulfillment of the requirement for the
award of the Degree of Bachelor of Technology in the Department of Data Science of Noida
Institute Of Engineering & Technology, Gautam Buddha Nagar, Greater Noida, Uttar Pradesh, is
an authentic record of my own work carried out during III Semester.
Date: Name and Signature of student
The internship assessment viva-voce examination of Mr./Ms. __________, Roll No.
…………………., of B.Tech (Data Science) has been held on __________.
Signature of:
Project Guide: _______ Head of Dept:

(Stamp of organization)
External Examiner: _______ Internal Examiner _________

ACKNOWLEDGEMENT
I am highly grateful to Dr. Vinod M Kapse, Director, Noida Institute of Engineering &
Technology, Greater Noida for providing this opportunity.
The constant guidance and encouragement received from Dr. Priyanka Chandani, HOD (Department of
Data Science), NIET, Greater Noida, has been of great help in carrying out the project work and is
acknowledged with reverential thanks.

I would like to express my deep gratitude and profuse thanks to Mrs. Himani Sharma, project
guide; without her wise counsel and able guidance, it would have been impossible to complete the
report in this manner.

I express gratitude to other faculty members of the Data Science department of NIET for their
intellectual support throughout the course of this work.

RICHA SRIVASTAVA

Chapter 1.
Introduction
Loan prediction is helpful both for bank employees and for applicants. The aim of this report is
to provide a quick and easy way to identify deserving applicants. A housing finance company deals
in all kinds of loans and has a presence across urban, semi-urban, and rural areas. A customer
first applies for a loan, after which the company or bank validates the customer’s eligibility.
The company or bank wants to automate this eligibility process in real time, based on the details
the customer provides while filling out the application form. These details include Gender,
Marital Status, Education, Number of Dependents, Income, Loan Amount, Credit History, and others.
This project uses data on previous customers of various banks whose loans were approved on the
basis of a set of parameters. The machine learning model is trained on these records so that it
can produce accurate results. The main objective of the project is to predict whether a loan is
safe to grant. To predict loan safety, the SVM and Naïve Bayes algorithms are used. First, the
data is cleaned so that no missing values remain in the data set.
In this report, we are going to make a Loan Predictor Using Machine Learning. This is a
classification problem in which we need to classify whether the loan will be approved or not.
Classification refers to a predictive modeling problem where a class label is predicted for a given
example of input data. A few examples of classification problems are Spam Email detection,
Cancer detection, Sentiment Analysis, etc.

Chapter 2.
Literature Survey
1. “Loan Approval Prediction based on Machine Learning Approach”
The main objective of this paper is to predict whether granting a loan to a particular person will
be safe or not. The paper is divided into four sections: (i) Data Collection, (ii) Comparison of
machine learning models on the collected data, (iii) Training of the system on the most promising
model, and (iv) Testing.
2. “Exploring the Machine Learning Algorithm for Prediction the Loan Sanctioning Process”
Authors: E. Chandra Blessie, R. Rekha. Year: 2019.
Extending credit to corporates and individuals is inevitable for the smooth functioning of growing
economies like India. As an increasing number of customers apply for loans from banks and
non-banking financial companies (NBFCs), it is challenging for banks and NBFCs with limited
capital to devise a standard and safe procedure for lending money to borrowers for their financial
needs. In addition, NBFCs have recently suffered a significant fall in stock price. This has
contributed to a contagion that has spread to other financial stocks, adversely affecting the
benchmark. In this paper, an attempt is made to reduce the risk involved in selecting a suitable
person who can repay the loan on time, thereby keeping the bank’s non-performing assets (NPA) in
check. This is achieved by feeding the past records of customers who acquired loans from the bank
into a trained machine learning model that can yield an accurate result. The prime focus of the
paper is to determine whether or not it will be safe to allocate a loan to a particular person.
The paper has the following sections: (i) Collection of Data, (ii) Data Cleaning, and
(iii) Performance Evaluation. Experimental tests found that the Naïve Bayes model performs better
than other models in terms of loan forecasting.

Chapter 3.
Understanding the Problem Statement
A housing finance company deals in all kinds of home loans and has a presence across urban,
semi-urban, and rural areas. A customer first applies for a home loan, after which the company
validates the customer’s eligibility for the loan. The company wants to automate the loan
eligibility process in real time, based on the customer details provided in the online application
form. These details include Gender, Marital Status, Education, Number of Dependents, Income, Loan
Amount, Credit History, and others. To automate this process, the company has provided a dataset
for identifying the customer segments that are eligible for a loan, so that it can specifically
target these customers. As mentioned above, this is a binary classification problem in which we
need to predict the target label, “Loan Status”. Loan Status can take two values: Yes, if the loan
is approved, and No, if the loan is not approved. Using the training dataset, we train our model
and then try to predict the target column, “Loan Status”, on the test dataset.
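
The report names Naïve Bayes as one of the classifiers used but does not reproduce the code. The
following is a minimal sketch of a categorical Naïve Bayes classifier with add-one (Laplace)
smoothing, written with the Python standard library only. The feature names (credit_history,
married), the toy records, and the helper names train_nb and predict_nb are illustrative
assumptions, not the dataset or code used in the internship.

```python
import math
from collections import Counter, defaultdict


def train_nb(rows, labels):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(labels)
    # value_counts[label][feature][value] -> number of training rows
    value_counts = defaultdict(lambda: defaultdict(Counter))
    feature_values = defaultdict(set)
    for row, label in zip(rows, labels):
        for feature, value in row.items():
            value_counts[label][feature][value] += 1
            feature_values[feature].add(value)
    return class_counts, value_counts, feature_values


def predict_nb(model, row):
    """Return the label with the highest log-posterior for one applicant."""
    class_counts, value_counts, feature_values = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_label in class_counts.items():
        score = math.log(n_label / total)  # log prior of the class
        for feature, value in row.items():
            count = value_counts[label][feature][value]
            n_values = len(feature_values[feature])
            # Add-one smoothing avoids log(0) for values unseen in a class
            score += math.log((count + 1) / (n_label + n_values))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy, made-up applicants (hypothetical; not the internship dataset)
rows = [
    {"credit_history": "1", "married": "Yes"},
    {"credit_history": "1", "married": "No"},
    {"credit_history": "1", "married": "Yes"},
    {"credit_history": "0", "married": "No"},
    {"credit_history": "0", "married": "Yes"},
    {"credit_history": "0", "married": "No"},
]
labels = ["Yes", "Yes", "Yes", "No", "No", "No"]  # Loan Status

model = train_nb(rows, labels)
prediction = predict_nb(model, {"credit_history": "1", "married": "Yes"})
print("Predicted Loan Status:", prediction)
```

Working in log space keeps the product of many small probabilities from underflowing. Numeric
columns such as income would need either binning or a Gaussian likelihood, which this sketch does
not cover.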

Chapter 4.
Implementation Details (Modules):
Loan Dataset: The loan dataset is what allows the system to make accurate predictions. Using the
loan dataset, the system automatically predicts which customers’ loans should be approved and
which should be rejected. The system accepts a loan application form as input; the form must be
supplied in the correct format to be processed.
Typically, the system separates a dataset into a training set and a testing set: most of the data
is used for training, and a smaller portion is used for testing. After the system has been trained
on the training set, it makes predictions against the test set.
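
The split described above can be sketched as follows, using only the standard library. The report
does not state the split ratio, so the 80/20 proportion, the fixed seed, and the function name
split_train_test are assumptions made for illustration.

```python
import random


def split_train_test(records, test_fraction=0.2, seed=42):
    """Shuffle a copy of the records and split them into train and test lists."""
    shuffled = list(records)
    # A seeded generator makes the split reproducible between runs
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


records = list(range(100))  # stand-in for 100 application records
train, test = split_train_test(records)
print(len(train), "training records,", len(test), "test records")
```

Shuffling before splitting matters when the records are ordered (for example by application
date), since otherwise the test set would not be representative of the training data.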
Data cleaning and processing: In data cleaning, the system detects and corrects corrupt or
inaccurate records in the database. This means identifying incomplete, incorrect, inaccurate, or
irrelevant parts of the data and then replacing, modifying, or deleting the dirty or coarse data.
In data processing, the system converts data from a given form into a more usable and desired
form, i.e. makes it more meaningful and informative.
Dataset:-
Load Essential Python Libraries
Load Training/ Test Dataset
Size of Train/Test Data
So we have 614 rows and 13 columns in our training dataset.
In test data, we have 367 rows and 12 columns because the target column is not included in the
test data.
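
The notebook cells for loading the data are not reproduced in the report text. As a rough sketch
of how the row and column counts quoted above could be checked, the example below parses two tiny
in-memory CSV tables with the standard csv module. The column names (Loan_ID, Gender, Married,
Loan_Status) and the sample rows are hypothetical stand-ins, not the actual files.

```python
import csv
import io

# Tiny made-up stand-ins for the training and test CSV files
TRAIN_CSV = """Loan_ID,Gender,Married,Loan_Status
A001,Male,Yes,Y
A002,Female,No,N
A003,Male,Yes,Y
"""
TEST_CSV = """Loan_ID,Gender,Married
B001,Male,No
B002,Female,Yes
"""


def load_table(text):
    """Parse CSV text into a list of dicts and return it with its (rows, columns) shape."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    return rows, (len(rows), len(reader.fieldnames))


train_rows, train_shape = load_table(TRAIN_CSV)
test_rows, test_shape = load_table(TEST_CSV)

print("train shape:", train_shape)
print("test shape:", test_shape)

# Columns present in the training data but absent from the test data
missing_in_test = set(train_rows[0]) - set(test_rows[0])
print("only in train:", missing_in_test)
```

With real files, io.StringIO(text) would be replaced by a file object opened with
open(path, newline="").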
First look at the Dataset

Imputing the missing values:
• Fill null values with the mode
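
Mode imputation replaces each missing entry in a column with that column's most frequent observed
value. A minimal standard-library sketch is given below; the column name Gender, the sample
values, and the helper name fill_with_mode are assumptions for illustration, and the report's own
imputation code is not shown in the text.

```python
from collections import Counter


def fill_with_mode(rows, column, missing=("", None)):
    """Return new rows where missing values in `column` are replaced by the mode.

    Assumes the column has at least one observed (non-missing) value.
    """
    observed = [row[column] for row in rows if row[column] not in missing]
    mode_value = Counter(observed).most_common(1)[0][0]
    return [
        {**row, column: mode_value} if row[column] in missing else dict(row)
        for row in rows
    ]


# Made-up column with two missing entries
rows = [
    {"Gender": "Male"},
    {"Gender": ""},
    {"Gender": "Female"},
    {"Gender": "Male"},
    {"Gender": None},
]
filled = fill_with_mode(rows, "Gender")
print([row["Gender"] for row in filled])
```

The function returns new dictionaries rather than editing the input, so the original records stay
available for comparison.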
Univariate Analysis Observations
1. More loans are approved than rejected.
2. The count of male applicants is higher than that of female applicants.
3. The count of married applicants is higher than that of unmarried applicants.
4. The count of graduates is higher than that of non-graduates.
5. The count of self-employed applicants is lower than that of non-self-employed applicants.
6. Most properties are located in semi-urban areas.
7. Credit history is present for many applicants.
8. Applicants with zero dependents (Dependents = 0) form the largest group.
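
Observations of this kind come from counting how often each value occurs in a single column. The
sketch below shows one way to do that with collections.Counter; the sample values and the helper
name value_shares are made up for illustration and do not reproduce the dataset's actual counts.

```python
from collections import Counter


def value_shares(values):
    """Return {value: (count, share of total)} ordered by descending count."""
    counts = Counter(values)
    total = sum(counts.values())
    return {value: (n, n / total) for value, n in counts.most_common()}


# Made-up Loan Status column (not the real data)
loan_status = ["Y", "Y", "N", "Y", "N", "Y", "Y", "N"]
shares = value_shares(loan_status)
for value, (n, share) in shares.items():
    print(f"{value}: {n} ({share:.1%})")
```

Running the same function over each categorical column (gender, marital status, education, and so
on) gives the counts that the observations above summarize.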
• Technology:- Machine learning, Univariate Analysis, Correlation, Supervised learning.
• Tools:- Python, Jupyter Notebook, CSV files, libraries, etc.

Chapter 5.
Conclusion
After the final submission on the test data, my accuracy score was 78%.
Feature engineering helped me increase my accuracy.
• Applicants who had repaid their previous debts had a higher chance of getting a loan.
• Applicant income does not affect the chances of getting a loan.
• Applicants with a credit history and a property in a semi-urban area had a high chance of approval.
• Married people have a higher chance of getting loans.
• The lower the co-applicant income, the higher the chances of getting a loan.
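
For reference, the accuracy score quoted above is the fraction of test records whose predicted
label matches the true label. A minimal sketch of that calculation follows; the label lists are
made up and are unrelated to the 78% result, and the function name accuracy is an assumption.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on empty input")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Made-up true and predicted Loan Status labels
y_true = ["Y", "N", "Y", "Y", "N"]
y_pred = ["Y", "Y", "Y", "N", "N"]
score = accuracy(y_true, y_pred)
print(f"accuracy = {score:.0%}")
```

Because more loans are approved than rejected in this dataset, accuracy alone can look favourable
for a model that mostly predicts approval, so it is worth reading alongside the per-class counts.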
Reference:-
Internshala.com
1) Kumar Arun, Garg Ishan, Kaur Sanmeet, “Loan Approval Prediction based on Machine
Learning Approach”, IOSR Journal of Computer Engineering (IOSR-JCE), Vol. 18, Issue 3, pp.
79-81, Ver. I (May-Jun. 2016).
