
Report: 

00

00
by grammer jtp

General metrics
Characters: 16,704
Words: 2,053
Sentences: 229
Reading time: 8 min 12 sec
Speaking time: 15 min 47 sec

Writing Issues
No issues found

Plagiarism
This text hasn’t been checked for plagiarism

Report was generated on Monday, Aug 28, 2023, 02:16 AM Page 1 of 26



Unique Words: 38%
Measures vocabulary diversity by calculating the percentage
of words used only once in your document.

Rare Words: 40%
Measures depth of vocabulary by identifying words that are
not among the 5,000 most common English words.

Word Length: 5.7 characters per word
Measures average word length.

Sentence Length: 9 words per sentence
Measures average sentence length.


Loan Eligibility Prediction Using Machine Learning


In today's rapidly changing financial landscape, easy access
to credit plays a crucial role in shaping personal ambitions
and goals. Whether for buying a home, pursuing higher
education, or growing a business, loans have become part of
daily life. As technology reshapes conventional methods,
Machine Learning (ML) stands out for its ability to predict
loan approval, making the loan sanctioning process more
efficient and better informed.

Conventional methods for granting loans can be laborious,
time-consuming, and susceptible to human bias. Financial
institutions scrutinize piles of paperwork and historical
records to gauge an applicant's creditworthiness. Yet this
approach does not always capture the full picture, which
often leads to missed opportunities and risky lending
decisions.

This is where Machine Learning comes in. By using ML
algorithms, financial institutions can streamline the
assessment of loan eligibility, making it swifter, more
equitable, and more precise. The capacity to analyze many
data points and detect patterns supports a thorough
evaluation of an applicant's financial background and leads
to sounder lending decisions.

Machine Learning algorithms excel at recognizing intricate
patterns within vast datasets. When applied to loan
eligibility prediction, these algorithms consider a
multitude of variables that impact an applicant's
creditworthiness.

Benefits of Loan Eligibility Prediction Using Machine Learning
Predicting loan eligibility through Machine Learning (ML)
offers a range of benefits that positively impact both
borrowers and lending institutions. Here are some key
advantages:
Precise Evaluations: ML algorithms weigh many factors at
once, which leads to more precise judgments about an
applicant's financial reliability. This precision supports
well-informed lending decisions and lowers the likelihood of
defaults.
Objective Evaluation: The automation introduced by ML
reduces the human biases that could sway lending decisions.
This supports an unbiased and equitable assessment of all
applicants and promotes equal opportunity.
Risk Mitigation: Accurately predicting the likelihood of
loan repayment helps guard against defaults and
non-performing loans. Lenders can manage their loan
portfolios more effectively and preserve their financial
stability.
Scalability: ML-powered loan eligibility assessment can
handle a large volume of applications without
compromising accuracy. This scalability is crucial in
managing peak application periods.
Enhanced Customer Experience: Faster and fairer loan
approval procedures improve the customer journey. Borrowers
value swift service, which in turn encourages customer
loyalty.

Challenges of Loan Eligibility Prediction Using Machine Learning
Forecasting loan eligibility using machine learning
techniques presents several challenges that need to be
navigated effectively:
Data Heterogeneity: Combining data from various
sources, each with different formats and structures, can
be complex. Merging this data into a cohesive dataset
that the ML model can understand and learn from is a
challenge.
Feature Engineering: Selecting and preparing the right
variables (features) for the model is critical. Feature
engineering involves transforming and selecting features
that have the most predictive power. It requires domain
expertise and an understanding of how each feature
impacts the prediction.

Model Robustness: Ensuring that the ML model performs
consistently across different scenarios and doesn't break
down when exposed to unusual data points (outliers) is
essential. The model should be able to handle a range of
inputs without producing unreliable predictions.
Bias Mitigation: Detecting and addressing biases in both
the data and the model is crucial for fair lending
decisions. Biases can lead to discriminatory outcomes,
and ensuring that the model's predictions are equitable
across different demographic groups is a significant
challenge.
Interpretable Models: Complex ML models, while
powerful, can lack transparency. Explaining how the
model arrives at a prediction to both borrowers and
stakeholders is important, especially in lending decisions
that have a significant impact on individuals' lives.
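
The bias concern in the list above can be made concrete with a simple check. The sketch below uses a small made-up table, not the real loan data, and compares the share of positive predictions across two groups. The column names `group` and `predicted` and the helper `approval_rate_by_group` are hypothetical names chosen for illustration.

```python
import pandas as pd

def approval_rate_by_group(df, group_col, pred_col):
    """Return the mean predicted approval rate for each group."""
    return df.groupby(group_col)[pred_col].mean()

# Toy data for illustration only (not taken from the loan dataset).
toy = pd.DataFrame({
    "group": ["A", "A", "A", "B", "B", "B"],
    "predicted": [1, 1, 0, 1, 0, 0],
})

rates = approval_rate_by_group(toy, "group", "predicted")
# Difference between the highest and lowest group approval rates.
gap = rates.max() - rates.min()
print(rates)
print("Approval-rate gap:", round(gap, 3))
```

A large gap does not prove discrimination on its own, but it indicates where a closer look at the data and the model may be needed.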

About the Dataset


Dream Housing Finance company specializes in home loans and
operates across urban, semi-urban, and rural areas. A
customer first applies for a home loan, and the company then
assesses the customer's eligibility.

The company wants to automate the eligibility assessment in
real time, based on the information customers submit through
the online application form. This information includes
Gender, Marital Status, Education, Number of Dependents,
Income, Loan Amount, and Credit History, among others.
To streamline the process, the company has posed a
challenge: identify the customer segments that are eligible
for a loan, so that these prospective borrowers can be
targeted precisely. It has provided a partial dataset for
this purpose.

Data Key Information


Loan_ID: Unique identification for each loan.
Gender: Gender of the applicant (Male/Female).
Married: Marital status of the applicant (Yes/No).
Dependents: Number of dependents the applicant has.
Education: Educational background of the applicant
(Graduate/Undergraduate).

Self_Employed: Whether the applicant is self-employed
or not (Yes/No).
ApplicantIncome: Income of the applicant.
CoapplicantIncome: Income of the co-applicant (if any).
LoanAmount: Loan amount in thousands.
Loan_Amount_Term: Term of the loan in months.
Credit_History: Whether the applicant's credit history
meets the guidelines (1: Yes, 0: No).
Property_Area: Area where the property is located
(Urban/Semi-Urban/Rural).
Loan_Status: Approval status of the loan (Y: Approved, N:
Not Approved).

Code:

Importing Libraries
import numpy as np
import pandas as pd

Reading the Dataset


train_loan = pd.read_csv('/kaggle/input/loan-eligible-dataset/loan-train.csv')
test_loan = pd.read_csv('/kaggle/input/loan-eligible-dataset/loan-test.csv')

Let us display a subset of information extracted from our
dataset.
# Here, we present the initial five rows from the dataset.
train_loan.head()
Output:

As the output above shows, the dataset has a considerable
number of columns, which are also referred to as features.

Evaluating the train_loan dataframe on its own displays a
truncated view of the data: the first five and the last five
rows of the dataset.
train_loan
Output:

The dataset has many rows and columns. To find the total
number of records and columns, we can use either the shape
attribute or the len() function.
print("Rows: ", len(train_loan))
Output:

print("Columns: ", len(train_loan.columns))


Output:

print("Shape : ", train_loan.shape)


Output:

loan_train_columns = train_loan.columns  # assigning to a variable
loan_train_columns  # printing the list of columns
Output:

We use the train_loan.describe() method to summarize the
numeric columns of the dataset. Its output includes the
count, mean, standard deviation (std), minimum, quartiles,
and maximum of each column.
train_loan.describe()
Output:

train_loan.info()
Output:

The info() output shows that:

The dataset contains 614 entries.
There are 13 features, indexed from 0 to 12.
These features have three distinct data types:
float64 (4 columns), int64 (1 column), and object (8
columns).
Memory usage amounts to 62.5+ KB.

The Non-Null Count column also reveals how many values are
missing: any column with fewer than 614 non-null entries
contains missing values.
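
As a minimal sketch of that idea, the number of missing values per column is the row count minus the non-null count. The small frame below is made-up data for illustration only; in the notebook, train_loan would take its place.

```python
import numpy as np
import pandas as pd

# Toy frame for illustration only; the real notebook would use train_loan.
toy = pd.DataFrame({
    "Gender": ["Male", None, "Female", "Male"],
    "LoanAmount": [120.0, np.nan, np.nan, 66.0],
    "Loan_Status": ["Y", "N", "Y", "Y"],
})

non_null = toy.count()         # what info() reports as "Non-Null Count"
missing = len(toy) - non_null  # rows minus non-null entries
print(missing)

# isna().sum() gives the same numbers directly.
same = bool((missing == toy.isna().sum()).all())
print("Matches isna().sum():", same)
```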

EDA (Exploratory Data Analysis)

We start by examining the columns of the 'object' type.
Let's write a function that lists the distinct values in a
column together with how often each one occurs.
def exploring_type_object(df, name_of_feature):
    """
    Print each distinct value of a categorical ('object') feature
    together with its frequency.
    """
    if df[name_of_feature].dtype == 'object':
        print(df[name_of_feature].value_counts())


# Test the function on the 'Gender' column only.
exploring_type_object(train_loan, 'Gender')
Output:

One small problem remains: if the dataset has many features,
calling the function by hand for each one, as in the code
above, quickly becomes tedious.

# The solution is to reuse the `loan_train_columns` variable defined earlier.
# Object-type columns: 'Loan_Status', 'Property_Area', 'Self_Employed',
# 'Education', 'Dependents', 'Married', 'Gender', 'Loan_ID'

for featureName in loan_train_columns:
    if train_loan[featureName].dtype == 'object':
        print('\n"' + str(featureName) + '\'s" values with counts are:')
        exploring_type_object(train_loan, str(featureName))

Output:
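
As an alternative sketch, pandas can select the object-typed columns in one call with select_dtypes, which avoids checking the dtype inside the loop. The frame below is made-up data for illustration only, and the columns are given an explicit object dtype so the example behaves the same across pandas versions.

```python
import pandas as pd

# Toy frame for illustration only; the real notebook would use train_loan.
toy = pd.DataFrame({
    "Gender": pd.Series(["Male", "Female", "Male"], dtype=object),
    "Married": pd.Series(["Yes", "Yes", "No"], dtype=object),
    "ApplicantIncome": [5849, 4583, 3000],
})

# Pick the object-typed columns in one call.
object_cols = toy.select_dtypes(include="object").columns.tolist()

for col in object_cols:
    print(f"\n{col} value counts:")
    print(toy[col].value_counts())
```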

Next, we use the missingno package to visualize the missing
values, which we will then replace with the mean or the
mode.
import missingno as msno

# Count of missing values per column
train_loan.isna().sum()
# Percentage of missing values per column:
# round((train_loan.isna().sum() / len(train_loan)) * 100, 2)

Output:

msno.bar(train_loan)
Output:

msno.matrix(train_loan)
Output:

As the plots show, several columns contain a small number of
null values. We therefore fill these NaN values with the
mean for numeric columns and the mode for categorical
columns.

# mode() returns a Series, so take its first element to get a scalar.
train_loan['Credit_History'] = train_loan['Credit_History'].fillna(train_loan['Credit_History'].mode()[0])  # Mode
test_loan['Credit_History'] = test_loan['Credit_History'].fillna(test_loan['Credit_History'].mode()[0])  # Mode

train_loan['LoanAmount'] = train_loan['LoanAmount'].fillna(train_loan['LoanAmount'].mean())  # Mean
test_loan['LoanAmount'] = test_loan['LoanAmount'].fillna(test_loan['LoanAmount'].mean())  # Mean
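
One detail of mode-based filling is easy to miss: Series.mode() returns a Series rather than a single value, and fillna aligns a Series argument on index labels. The sketch below, on a made-up column, shows the difference between passing the Series and passing its first element.

```python
import numpy as np
import pandas as pd

# Toy column for illustration only.
s = pd.Series([1.0, np.nan, 1.0, 0.0, np.nan])

mode_series = s.mode()       # a Series, here holding the single value 1.0
mode_value = mode_series[0]  # the scalar most frequent value

# Filling with the Series aligns on index labels, so only a NaN sitting at
# a label that also exists in mode_series (label 0) could be filled.
filled_with_series = s.fillna(mode_series)
# Filling with the scalar replaces every NaN.
filled_with_scalar = s.fillna(mode_value)

print("NaNs left after Series fill:", filled_with_series.isna().sum())
print("NaNs left after scalar fill:", filled_with_scalar.isna().sum())
```

This is why the scalar form, `.mode()[0]`, is the one that fills every missing entry.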

Converting Categorical to Numerical Value

Loan_Status holds two values, so we replace Y with 1 and N
with 0, and do the same for the other two-valued columns.

train_loan.Loan_Status = train_loan.Loan_Status.replace({"Y": 1, "N": 0})
# test_loan.Loan_Status = test_loan.Loan_Status.replace({"Y": 1, "N": 0})

train_loan.Gender = train_loan.Gender.replace({"Male": 1, "Female": 0})
test_loan.Gender = test_loan.Gender.replace({"Male": 1, "Female": 0})

train_loan.Married = train_loan.Married.replace({"Yes": 1, "No": 0})
test_loan.Married = test_loan.Married.replace({"Yes": 1, "No": 0})

train_loan.Self_Employed = train_loan.Self_Employed.replace({"Yes": 1, "No": 0})
test_loan.Self_Employed = test_loan.Self_Employed.replace({"Yes": 1, "No": 0})

# Fill the remaining categorical gaps with the most frequent value.
train_loan['Gender'] = train_loan['Gender'].fillna(train_loan['Gender'].mode()[0])
test_loan['Gender'] = test_loan['Gender'].fillna(test_loan['Gender'].mode()[0])

train_loan['Dependents'] = train_loan['Dependents'].fillna(train_loan['Dependents'].mode()[0])
test_loan['Dependents'] = test_loan['Dependents'].fillna(test_loan['Dependents'].mode()[0])

train_loan['Married'] = train_loan['Married'].fillna(train_loan['Married'].mode()[0])
test_loan['Married'] = test_loan['Married'].fillna(test_loan['Married'].mode()[0])

# Fill any Credit_History values that are still missing with the mean.
train_loan['Credit_History'] = train_loan['Credit_History'].fillna(train_loan['Credit_History'].mean())
test_loan['Credit_History'] = test_loan['Credit_History'].fillna(test_loan['Credit_History'].mean())
Columns like Property_Area, Dependents, and Education hold
other categorical values. For these, we can use the
LabelEncoder from the sklearn package.

from sklearn.preprocessing import LabelEncoder

feature_col = ['Property_Area', 'Education', 'Dependents']
le = LabelEncoder()
for col in feature_col:
    # Fit the encoder on the training column, then reuse the same
    # mapping for the test column so the codes stay consistent.
    train_loan[col] = le.fit_transform(train_loan[col])
    test_loan[col] = le.transform(test_loan[col])
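
LabelEncoder assigns integer codes in the sorted order of the labels it sees while fitting. The sketch below, on made-up values, shows the learned mapping and why an encoder is usually fitted once and then reused: refitting on a set that lacks one category shifts the codes.

```python
from sklearn.preprocessing import LabelEncoder

# Toy values for illustration only.
train_values = ["Urban", "Rural", "Semiurban", "Urban"]
test_values = ["Semiurban", "Urban"]  # note: no "Rural" here

le = LabelEncoder()
train_codes = le.fit_transform(train_values)  # learn the mapping on train
test_codes = le.transform(test_values)        # reuse the same mapping

mapping = {str(label): int(code) for code, label in enumerate(le.classes_)}
print(mapping)
print(list(test_codes))

# Refitting on the test values alone assigns different codes.
refit_codes = LabelEncoder().fit_transform(test_values)
print(list(refit_codes))
```

Note that transform raises an error for a label the encoder never saw during fitting, so this approach assumes the test categories are a subset of the training categories.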

Data Visualisation
Here, we are presenting visual insights derived from the
dataset. To achieve this, we require certain packages,
namely matplotlib and seaborn.
import matplotlib.pyplot as plt
%matplotlib inline

import seaborn as sns


sns.set_style('dark')

train_loan
Output:


train_loan.plot(figsize=(18, 8))

plt.show()
Output:

plt.figure(figsize=(18, 6))
plt.subplot(1, 2, 1)
train_loan['ApplicantIncome'].hist(bins=10)
plt.title("Applicant Income")

plt.subplot(1, 2, 2)
plt.grid()
plt.hist(np.log(train_loan['LoanAmount']))
plt.title("Log Loan Amount")

plt.show()
Output:

plt.figure(figsize=(18, 6))
plt.title("Relation Between Applicant Income vs Loan Amount")

plt.grid()
plt.scatter(train_loan['ApplicantIncome'], train_loan['LoanAmount'], c='k', marker='x')
plt.xlabel("Applicant Income")
plt.ylabel("Loan Amount")
plt.show()
Output:

plt.figure(figsize=(12, 6))
plt.plot(train_loan['Loan_Status'], train_loan['LoanAmount'])
plt.title("Loan Status vs Loan Amount")
plt.show()
Output:

plt.figure(figsize=(12, 8))
# numeric_only=True skips text columns such as Loan_ID, which would
# otherwise raise an error in recent pandas versions.
sns.heatmap(train_loan.corr(numeric_only=True), cmap='coolwarm', annot=True, fmt='.1f', linewidths=.1)
plt.show()
Output:

The heatmap shows the pairwise correlation between every
pair of numeric variables.
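
When the target column is the main interest, the same correlation matrix can be reduced to one ranked column. The sketch below uses made-up numbers for illustration only; in the notebook, train_loan would replace the toy frame.

```python
import pandas as pd

# Toy numeric frame for illustration only.
toy = pd.DataFrame({
    "Credit_History": [1, 1, 0, 1, 0, 1],
    "ApplicantIncome": [5000, 3000, 4000, 6000, 2500, 3500],
    "Loan_Status": [1, 1, 0, 1, 0, 1],
})

# Correlation of every numeric column with the target, strongest first.
target_corr = (
    toy.corr(numeric_only=True)["Loan_Status"]
    .drop("Loan_Status")
    .sort_values(key=abs, ascending=False)
)
print(target_corr)
```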

Modelling
# Import the machine learning model and metric from the sklearn package
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

model_logistic = LogisticRegression()

To begin, we use the LogisticRegression class from the
sklearn.linear_model module. Here's a brief overview of
logistic regression.

Logistic Regression is a classification algorithm primarily
used to predict binary outcomes (e.g., 1 / 0, Yes / No,
True / False) based on a set of independent variables.
Binary or categorical outcomes are represented with dummy
(0/1) variables. Conceptually, you can view Logistic
Regression as a form of linear regression applied to
categorical outcomes, where the logarithm of the odds serves
as the dependent variable.
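
The log-odds idea can be checked numerically. In the sketch below, which uses made-up data for illustration only, the log-odds z are computed as a linear function of the features and passed through the sigmoid 1 / (1 + exp(-z)); the result is compared with the probabilities scikit-learn reports for the positive class.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy data for illustration only: one binary feature, binary target.
X = np.array([[0], [0], [0], [1], [1], [1], [1], [0]])
y = np.array([0, 0, 1, 1, 1, 1, 0, 0])

model = LogisticRegression().fit(X, y)

# Log-odds are a linear function of the features.
log_odds = X @ model.coef_.ravel() + model.intercept_[0]
# The sigmoid turns log-odds into a probability between 0 and 1.
manual_proba = 1.0 / (1.0 + np.exp(-log_odds))

sklearn_proba = model.predict_proba(X)[:, 1]
print(np.allclose(manual_proba, sklearn_proba))
```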

train_features = ['Credit_History', 'Education', 'Gender']

train_x = train_loan[train_features].values
train_y = train_loan['Loan_Status'].values

test_x = test_loan[train_features].values

model_logistic.fit(train_x, train_y)
Output:

Evaluating the Model

# Predict on the testing data
predicted = model_logistic.predict(test_x)

# Check the coefficients of the trained model
print('Coefficient of model :', model_logistic.coef_)
Output:

# Check the intercept of the model
print('Intercept of model', model_logistic.intercept_)
Output:

# Accuracy score on the training dataset
# Equivalent: accuracy_score(train_y, model_logistic.predict(train_x))
score = model_logistic.score(train_x, train_y)
print('accuracy_score overall :', score)
print('accuracy_score percent :', round(score*100, 2))
Output:

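The accuracy on the training data looks reasonably good, although a score measured on the training rows tends to overstate performance on unseen data.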
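
Because the score above is computed on the same rows the model was fitted on, a held-out split gives a more cautious estimate. The sketch below is one way to do this, not part of the original notebook: it uses synthetic data for illustration only, and in the notebook train_x and train_y would take the place of X and y. The split size and random seeds are arbitrary choices.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic data for illustration only.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(200, 3))
# The target mostly follows the first feature, with some label noise.
noise = rng.random(200) < 0.2
y = np.where(noise, 1 - X[:, 0], X[:, 0])

# Hold back 20% of the rows; stratify keeps the class balance similar.
X_fit, X_val, y_fit, y_val = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

model = LogisticRegression().fit(X_fit, y_fit)
print("Accuracy on fitted rows  :", round(model.score(X_fit, y_fit), 3))
print("Accuracy on held-out rows:", round(model.score(X_val, y_val), 3))
```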


Now, we can use this model to predict the target on the
testing dataset.

# Predicting the target on the test dataset
predict_test = model_logistic.predict(test_x)
print('Target on test data', predict_test)
Output:

Now, we can submit the predicted target for the testing
dataset.
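
A minimal sketch of preparing such a submission follows. The expected layout (a Loan_ID column plus a Loan_Status column holding Y/N) is an assumption, as are the stand-in IDs and predictions; in the notebook, test_loan['Loan_ID'] and predict_test would be used instead.

```python
import numpy as np
import pandas as pd

# Stand-ins for illustration only; in the notebook these would be
# test_loan['Loan_ID'] and predict_test.
loan_ids = pd.Series(["ID_1", "ID_2", "ID_3"], name="Loan_ID")
predict_test = np.array([1, 0, 1])

submission = pd.DataFrame({
    "Loan_ID": loan_ids,
    # Map the numeric predictions back to the original Y/N labels.
    "Loan_Status": pd.Series(predict_test).map({1: "Y", 0: "N"}),
})

# With no path, to_csv returns the CSV text; pass a file name to write a file.
csv_text = submission.to_csv(index=False)
print(csv_text)
```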

Future Aspects of Loan Eligibility Prediction Using Machine Learning
Loan eligibility prediction with Machine Learning is likely
to keep improving. More capable models should assess
applicants more accurately, and larger datasets should
improve them further. Such systems may return eligibility
decisions almost immediately. Work is also under way to make
them fairer and more transparent, so that every applicant is
treated equitably. Systems may also exchange information
with one another, and closer collaboration between finance
and technology specialists should produce new ideas.


Conclusion
Loan eligibility prediction using Machine Learning is a
remarkable advancement in the financial sector. It
streamlines the lending process, making it efficient,
objective, and responsive to individual needs. As
technology evolves and ethical considerations are
addressed, ML-powered lending has the potential to
revolutionize the financial landscape, making credit
accessible to those who need it while maintaining
transparency and fairness.
