
Housing Loan Approval Prediction
Group 1

Ansh Tulsyan – 200103033

Bhushan Agrawal - 200103050

Mansi Ramrakhyani - 200103089

Sabari Santosh S - 200101130

Vikrant Beniwal – 200103089


The Problem Statement

• Predict loan approval for home loans from historical data by identifying the customer segments that are eligible for a loan

• The Independent Variables: Gender, Marital Status, Education, Number of Dependents, Income, Loan Amount, Credit History

• The Dependent Variable: Loan Status


The Dataset

• Dataset Source: Analytics Vidhya

• There are outliers in some of the features: Applicant and Co-applicant income

• Extra characters such as ‘+’ appear in some rows of the Dependents feature

• Blank fields in Gender, Married, Dependents and Self_Employed, and NAs in LoanAmount, Loan_Amount_term and Credit_History

• The data is also skewed
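The cleaning issues listed above could be handled roughly as in the following sketch. The rows are made up for illustration, and the imputation choices (most frequent value for blank categorical fields, median for missing numeric values) are assumptions; the slides do not say which strategy the group used.

```python
from statistics import median, mode

# Hypothetical rows that mimic the issues listed above; not the real dataset.
rows = [
    {"Gender": "Male", "Dependents": "3+", "LoanAmount": 120.0},
    {"Gender": "", "Dependents": "0", "LoanAmount": None},
    {"Gender": "Male", "Dependents": "", "LoanAmount": 100.0},
    {"Gender": "Female", "Dependents": "0", "LoanAmount": 700.0},
]


def clean(rows):
    """Return cleaned copies of the rows, leaving the input untouched."""
    cleaned = [dict(r) for r in rows]

    # Strip the trailing '+' so that "3+" becomes "3".
    for r in cleaned:
        r["Dependents"] = r["Dependents"].rstrip("+")

    # Assumption: fill blank categorical fields with the most frequent value.
    for col in ("Gender", "Dependents"):
        fill = mode([r[col] for r in cleaned if r[col] != ""])
        for r in cleaned:
            if r[col] == "":
                r[col] = fill

    # Assumption: fill missing numeric values with the median,
    # which is less sensitive to outliers than the mean.
    fill = median([r["LoanAmount"] for r in cleaned if r["LoanAmount"] is not None])
    for r in cleaned:
        if r["LoanAmount"] is None:
            r["LoanAmount"] = fill
        r["Dependents"] = int(r["Dependents"])

    return cleaned


cleaned = clean(rows)
```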


Exploratory Data Analysis
Key Observations – Exploratory Data Analysis
• 7 Variables had missing data

• The distributions of loan amount and applicant income contain extreme values that may be outliers

• Graduates have more outliers, and their loan distribution is wider compared to non-graduates

• Some features had data points that were not scaled; for example, the Credit_History variable had a mean of 0.8422 even though the data was 0s and 1s

• The features were not normally distributed, showing that scaling and normalization need to be done
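One common way to flag the extreme values mentioned above and to reduce skew is sketched below. The 1.5 × IQR rule and the log transform are assumptions chosen for illustration, not necessarily what was done for these slides, and the income values are invented.

```python
import math
from statistics import quantiles

# Hypothetical applicant incomes; the last value is an extreme one.
incomes = [2500, 3000, 3200, 3600, 4000, 4200, 4800, 5000, 81000]


def iqr_outliers(values):
    """Return the values lying more than 1.5 * IQR outside the quartiles."""
    q1, _, q3 = quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - 1.5 * spread, q3 + 1.5 * spread
    return [v for v in values if v < low or v > high]


outliers = iqr_outliers(incomes)

# A log transform compresses the long right tail of income-like features.
log_incomes = [math.log(v) for v in incomes]
```

After the transform, the extreme income sits only a few units from the rest on the log scale, so it dominates the fit far less than on the raw scale.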
Distribution After Cleaning The Data
The Solution
• Logistic Regression: It is used to predict the probability of certain classes or events

• GLM: The model used is a GLM (Generalized Linear Model)

• MLE: Maximum Likelihood Estimation is used to fit the model to the data

• Feature Selection: Manual feature selection was done based on the exploratory data analysis and feature correlation matrix

• Independent Variables: Credit History, Education, Self Employed, Property Area, Loan Amount, Income

• Dependent Variable: Loan Status

Workflow: Loading The Data → Exploratory Data Analysis → Cleaning The Data → Train/Test Split → Build Model and Fit Train Data → Measure Results and Prediction on Test Data
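To make the maximum likelihood step concrete, here is a minimal sketch of logistic regression fitted by gradient ascent on the log-likelihood. It is not the group's code: GLM routines typically use iteratively reweighted least squares instead, and the synthetic data, the two features, the learning rate and the 80/20 split are all assumptions made for illustration.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(w, xi):
    """Predicted probability of approval; w[0] is the intercept."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))


def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Maximize the Bernoulli log-likelihood by batch gradient ascent."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = yi - predict_proba(w, xi)  # gradient of the log-likelihood
            grad[0] += err
            for j, xj in enumerate(xi, start=1):
                grad[j] += err * xj
        w = [wj + lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w


# Synthetic stand-in data: [credit_history, standardized log income].
random.seed(0)
X, y = [], []
for _ in range(200):
    credit = 1.0 if random.random() < 0.8 else 0.0
    log_income = random.gauss(0.0, 1.0)
    p = sigmoid(-2.0 + 3.5 * credit + 0.5 * log_income)
    X.append([credit, log_income])
    y.append(1 if random.random() < p else 0)

# Train/test split (80/20), then fit on train and score on test.
X_train, y_train = X[:160], y[:160]
X_test, y_test = X[160:], y[160:]

w = fit_logistic(X_train, y_train)
correct = sum((predict_proba(w, xi) >= 0.5) == (yi == 1)
              for xi, yi in zip(X_test, y_test))
test_accuracy = correct / len(y_test)
```

A positive fitted weight on the credit history feature means that a good credit history raises the predicted approval probability.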
The Results
• Train Data: Accuracy 82%

• Test Data: Accuracy 84%

• Misclassification Error: 0.16

• AUC ROC: 0.4912406

• Confusion Matrix:
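The metrics above relate to each other as in the following sketch. The labels and scores are made up and do not reproduce the reported numbers; the 0.5 threshold is also an assumption.

```python
def confusion_matrix(y_true, y_pred):
    """Return (tn, fp, fn, tp) for binary 0/1 labels."""
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    return tn, fp, fn, tp


def auc(y_true, scores):
    """Probability that a random positive outscores a random negative.

    Ties count as half a win (the Mann-Whitney formulation of ROC AUC).
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels and predicted probabilities.
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.7, 0.2, 0.55]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)
misclassification_error = 1 - accuracy
roc_auc = auc(y_true, scores)
```

Misclassification error is simply 1 minus accuracy, which matches the reported 0.16 against 84% test accuracy. An AUC near 0.5 means the scores rank positives and negatives no better than chance, so a value like 0.49 next to 84% accuracy may be worth rechecking, for example by confirming that the AUC was computed from predicted probabilities and not from hard class labels.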

Model Summary
Thank you!
