Report 2

Report: 00
by grammer jtp
General metrics
Characters: 16,704
Words: 2,053
Sentences: 229
Reading time: 8 min 12 sec
Speaking time: 15 min 47 sec
Writing Issues
No issues found
Plagiarism
This text hasn’t been checked for plagiarism
Report was generated on Monday, Aug 28, 2023, 02:16 AM Page 1 of 26

Unique Words: 38%
Measures vocabulary diversity by calculating the percentage of words used only once in your document.
Rare Words: 40%
Measures depth of vocabulary by identifying words that are not among the 5,000 most common English words.
Word Length: 5.7 characters per word
Measures average word length.
Sentence Length: 9 words per sentence
Measures average sentence length.

Loan Eligibility Prediction Using Machine Learning

In today's rapidly changing financial landscape, access to credit plays a crucial role in shaping personal ambitions and goals. Whether it's buying a home, pursuing higher education, or growing a business, loans have become part of everyday life. As technology reshapes conventional methods, Machine Learning (ML) stands out for its ability to predict loan approval, making the loan sanctioning process more efficient and better informed.
Conventional methods for granting loans can be arduous, time-consuming, and susceptible to human bias. Financial institutions meticulously scrutinize piles of paperwork and historical records to gauge an applicant's creditworthiness. Yet this approach does not always capture the full picture, often leading to missed opportunities and risky lending decisions.
This is where Machine Learning comes in. By harnessing ML algorithms, financial institutions can streamline the assessment of loan eligibility, making it faster, more equitable, and more precise. The ability to analyze many data points and discern patterns supports a thorough evaluation of an applicant's financial background, leading to sounder lending decisions.
Machine Learning algorithms excel at recognizing intricate patterns within vast datasets. When applied to loan eligibility prediction, these algorithms consider a multitude of variables that impact an applicant's creditworthiness.
Benefits of Loan Eligibility Prediction Using Machine Learning
Predicting loan eligibility through Machine Learning (ML) offers a range of benefits that positively impact both borrowers and lending institutions. Here are some key advantages:
Precise Evaluations: ML algorithms analyze a wide range of factors, leading to more precise judgments about a candidate's financial reliability. This greater precision supports well-informed lending decisions and can reduce the likelihood of defaults.
Objective Evaluation: Automating the assessment with ML reduces the influence of latent human biases that could sway lending decisions. This makes the assessment of all candidates more consistent and impartial, fostering equal opportunity.
Risk Mitigation: Accurately anticipating whether a loan will be repaid acts as a safeguard against defaults and non-performing loans. Lenders can manage their loan portfolios more effectively, preserving their financial stability.
Scalability: ML-powered loan eligibility assessment can
handle a large volume of applications without
compromising accuracy. This scalability is crucial in
managing peak application periods.

Enhanced Customer Experience: Faster and fairer loan approval procedures improve the customer journey. Borrowers value quick service, which in turn builds customer loyalty.
Challenges of Loan Eligibility Prediction Using Machine Learning
Forecasting loan eligibility using machine learning techniques presents several challenges that need to be navigated effectively:
Data Heterogeneity: Combining data from various
sources, each with different formats and structures, can
be complex. Merging this data into a cohesive dataset
that the ML model can understand and learn from is a
challenge.
Feature Engineering: Selecting and preparing the right
variables (features) for the model is critical. Feature
engineering involves transforming and selecting features
that have the most predictive power. It requires domain
expertise and an understanding of how each feature
impacts the prediction.

Model Robustness: Ensuring that the ML model performs consistently across different scenarios and doesn't break down when exposed to unusual data points (outliers) is essential. The model should be able to handle a range of inputs without producing unreliable predictions.
Bias Mitigation: Detecting and addressing biases in both
the data and the model is crucial for fair lending
decisions. Biases can lead to discriminatory outcomes,
and ensuring that the model's predictions are equitable
across different demographic groups is a significant
challenge.
Interpretable Models: Complex ML models, while
powerful, can lack transparency. Explaining how the
model arrives at a prediction to both borrowers and
stakeholders is important, especially in lending decisions
that have a significant impact on individuals' lives.
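To make the feature engineering challenge more concrete, here is a minimal sketch of one idea that is often tried on this kind of loan data: combining applicant and co-applicant income and log-scaling the result. The helper add_income_features and the columns TotalIncome and TotalIncome_log are hypothetical; they are not part of the original dataset or of the model trained later, and whether they help would need to be checked on held-out data.

```python
import numpy as np
import pandas as pd


def add_income_features(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with two illustrative engineered features.

    Assumes the ApplicantIncome and CoapplicantIncome columns described in
    the dataset section are present; TotalIncome and TotalIncome_log are
    hypothetical column names, not part of the original dataset.
    """
    out = df.copy()
    # Household income: applicant plus co-applicant (a missing co-applicant income counts as 0)
    out["TotalIncome"] = out["ApplicantIncome"] + out["CoapplicantIncome"].fillna(0)
    # log1p compresses the long right tail of incomes and is defined at 0
    out["TotalIncome_log"] = np.log1p(out["TotalIncome"])
    return out


# Two made-up applicants, for illustration only
sample = pd.DataFrame({"ApplicantIncome": [5000, 3000], "CoapplicantIncome": [0.0, 1500.0]})
print(add_income_features(sample))
```

Returning a copy keeps the original frame untouched, so the engineered columns can be compared against the raw ones before deciding whether to keep them.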
About the Dataset

Dream Housing Finance company specializes in providing home loans across urban, semi-urban, and rural areas. Customers first apply for a home loan, and the company then assesses whether they are eligible for it.
The company wants to automate the loan eligibility assessment in real time, based on the customer details submitted through the online application form. These details include Gender, Marital Status, Education, Number of Dependents, Income, Loan Amount, and Credit History, among others. To streamline this process, the company has posed a challenge: identify the customer segments that are eligible for loan amounts, so that these prospective borrowers can be targeted precisely. It has provided a partial dataset for this purpose.
Data Key Information

Loan_ID: Unique identification for each loan.
Gender: Gender of the applicant (Male/Female).
Married: Marital status of the applicant (Yes/No).
Dependents: Number of dependents the applicant has.
Education: Educational background of the applicant (Graduate/Not Graduate).
Self_Employed: Whether the applicant is self-employed (Yes/No).
ApplicantIncome: Income of the applicant.
CoapplicantIncome: Income of the co-applicant (if any).
LoanAmount: Loan amount in thousands.
Loan_Amount_Term: Term of the loan in months.
Credit_History: Whether the applicant's credit history meets the guidelines (1: Yes, 0: No).
Property_Area: Area where the property is located (Urban/Semiurban/Rural).
Loan_Status: Approval status of the loan (Y: Approved, N: Not Approved).
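Since the rest of the walkthrough relies on these column names, a quick presence check after loading a file can catch mismatches early. The sketch below is an assumption-based helper (missing_columns is a hypothetical name) that only reports which of the columns listed above are absent; it does not validate dtypes or values. The example assumes a test file lacks Loan_Status, which is what the later code implies, since it predicts that column for test_loan.

```python
import pandas as pd

# Column names from the data dictionary above (only presence is checked, not dtypes)
EXPECTED_COLUMNS = [
    "Loan_ID", "Gender", "Married", "Dependents", "Education", "Self_Employed",
    "ApplicantIncome", "CoapplicantIncome", "LoanAmount", "Loan_Amount_Term",
    "Credit_History", "Property_Area", "Loan_Status",
]


def missing_columns(df: pd.DataFrame, expected=EXPECTED_COLUMNS) -> list:
    """Return the expected column names that df does not contain (hypothetical helper)."""
    return [col for col in expected if col not in df.columns]


# An empty frame with every column except the target, as a test file would typically have
test_like = pd.DataFrame(columns=EXPECTED_COLUMNS[:-1])
print(missing_columns(test_like))  # ['Loan_Status']
```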
Code:
Importing Libraries
import numpy as np
import pandas as pd
Reading the Dataset

train_loan = pd.read_csv('/kaggle/input/loan-eligible-dataset/loan-train.csv')
test_loan = pd.read_csv('/kaggle/input/loan-eligible-dataset/loan-test.csv')
Let us display a subset of information extracted from our dataset.
# Here, we present the initial five rows from the dataset.
train_loan.head()
Output:
As the output above shows, the dataset has a considerable number of columns, which are also referred to as features.
Displaying the train_loan DataFrame itself shows a truncated view containing both the first five and the last five rows of the dataset.
train_loan
Output:
We observe a substantial number of rows and columns. To find the total number of records (rows) and columns in our dataset, we can use either the shape attribute or the len() function; both tell us how many records and features the dataset contains.
print("Rows: ", len(train_loan))
Output:
print("Columns: ", len(train_loan.columns))

Output:
print("Shape : ", train_loan.shape)

Output:
loan_train_columns = train_loan.columns  # assigning the column names to a variable
loan_train_columns  # displaying the list of columns
Output:

We will use the train_loan.describe() method to summarize the numeric columns of the dataset. Its output includes the count, mean, standard deviation (std), minimum, quartiles (25%, 50%, 75%), and maximum of each column.
train_loan.describe()
Output:
train_loan.info()
Output:
As the output shows:
The dataset contains 614 entries (rows).
There are 13 columns in total, indexed from 0 to 12.
The columns have three data types: float64 (4 columns), int64 (1 column), and object (8 columns).
Memory usage is 62.5+ KB.
The Non-Null Count column also shows how many values are missing in each column.
EDA (Exploratory Data Analysis)
We begin by investigating the columns of 'object' type. The following function prints the distinct values in such a column together with how often each occurs.
def exploring_type_object(df, name_of_feature):
    """
    For a column of 'object' (categorical) dtype, print each distinct
    value together with its frequency.
    """
    if df[name_of_feature].dtype == 'object':
        print(df[name_of_feature].value_counts())
# Test the function on the 'Gender' column only.
exploring_type_object(train_loan, 'Gender')
Output:
This raises a minor problem: if a dataset has many such features, calling the function once per column, as above, quickly becomes repetitive.
# The remedy is to reuse the variable `loan_train_columns` defined earlier.
# Object-type columns: 'Loan_Status', 'Property_Area', 'Self_Employed',
# 'Education', 'Dependents', 'Married', 'Gender', 'Loan_ID'
for featureName in loan_train_columns:
    if train_loan[featureName].dtype == 'object':
        print(f'\n"{featureName}" values with counts:')
        exploring_type_object(train_loan, featureName)
Output:
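An alternative way to write the same loop, sketched below, is to let pandas pick the object-typed columns with DataFrame.select_dtypes instead of checking each column's dtype by hand. The helper name print_object_value_counts and the demo rows are hypothetical. Newer pandas versions may store text in a dedicated string dtype rather than object; the demo sets dtype=object explicitly for that reason.

```python
import pandas as pd


def print_object_value_counts(df: pd.DataFrame) -> None:
    """Print value counts for every column of 'object' dtype (same idea as the loop above)."""
    # select_dtypes picks the object columns directly, so no per-column dtype check is needed
    for name in df.select_dtypes(include="object").columns:
        print(f'\n"{name}" values with counts:')
        print(df[name].value_counts())


# Made-up rows; dtype=object is set explicitly so the example behaves the same across pandas versions
demo = pd.DataFrame({
    "Gender": pd.Series(["Male", "Female", "Male"], dtype=object),
    "ApplicantIncome": [5000, 3000, 4000],
})
print_object_value_counts(demo)  # prints counts for Gender only
```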

The missingno package lets us visualize where values are missing; we then replace the missing values with the mean or the mode of the affected columns.
import missingno as msno
# Number of missing values in each column
train_loan.isna().sum()
# Percentage of missing values in each column (uncomment to use):
# round((train_loan.isna().sum() / len(train_loan)) * 100, 2)
Output:
msno.bar(train_loan)
Output:
msno.matrix(train_loan)
Output:
The plots show that several columns contain a small number of null values. We therefore replace these NaN values with the mode (for the categorical Credit_History column) or the mean (for the numeric LoanAmount column).
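# Credit_History is categorical (0/1): fill with the most frequent value.
# mode() returns a Series, so [0] selects the value itself.
train_loan['Credit_History'] = train_loan['Credit_History'].fillna(train_loan['Credit_History'].mode()[0])  # Mode
test_loan['Credit_History'] = test_loan['Credit_History'].fillna(test_loan['Credit_History'].mode()[0])  # Mode
# LoanAmount is numeric: fill with the column mean.
train_loan['LoanAmount'] = train_loan['LoanAmount'].fillna(train_loan['LoanAmount'].mean())  # Mean
test_loan['LoanAmount'] = test_loan['LoanAmount'].fillna(test_loan['LoanAmount'].mean())  # Mean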
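The code above computes each fill value from the frame being filled, so the test set is filled with its own mode and mean. A common alternative, sketched here as an assumption rather than the method used in this walkthrough, is to compute the fill values once from the training data and apply them to both frames, so that the test data are treated like unseen applications. The helper fill_from_train and the demo rows are hypothetical.

```python
import numpy as np
import pandas as pd


def fill_from_train(train: pd.DataFrame, test: pd.DataFrame):
    """Fill Credit_History with the training mode and LoanAmount with the training mean.

    Hypothetical helper: the fill values come from the training frame only and are
    applied to both frames. New DataFrames are returned; the inputs are unchanged.
    """
    fill_values = {
        "Credit_History": train["Credit_History"].mode()[0],  # most frequent value
        "LoanAmount": train["LoanAmount"].mean(),              # mean() skips NaN
    }
    return train.fillna(value=fill_values), test.fillna(value=fill_values)


# Made-up rows for illustration
train_demo = pd.DataFrame({"Credit_History": [1.0, 1.0, 0.0, np.nan],
                           "LoanAmount": [100.0, 200.0, np.nan, 150.0]})
test_demo = pd.DataFrame({"Credit_History": [np.nan], "LoanAmount": [np.nan]})
train_filled, test_filled = fill_from_train(train_demo, test_demo)
print(test_filled)  # both values filled: Credit_History 1.0, LoanAmount 150.0
```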
Converting Categorical to Numerical Values
Loan_Status has two values, Y and N, so we replace Y with 1 and N with 0. We do the same for the other two-valued columns: Gender, Married, and Self_Employed.
train_loan.Loan_Status = train_loan.Loan_Status.replace({"Y": 1, "N": 0})
# Not applied to test_loan: Loan_Status is the target we predict for the test set.
# test_loan.Loan_Status = test_loan.Loan_Status.replace({"Y": 1, "N": 0})
train_loan.Gender = train_loan.Gender.replace({"Male": 1, "Female": 0})
test_loan.Gender = test_loan.Gender.replace({"Male": 1, "Female": 0})
train_loan.Married = train_loan.Married.replace({"Yes": 1, "No": 0})
test_loan.Married = test_loan.Married.replace({"Yes": 1, "No": 0})
train_loan.Self_Employed = train_loan.Self_Employed.replace({"Yes": 1, "No": 0})
test_loan.Self_Employed = test_loan.Self_Employed.replace({"Yes": 1, "No": 0})
# Fill the remaining missing categorical values with each column's mode.
train_loan['Gender'] = train_loan['Gender'].fillna(train_loan['Gender'].mode()[0])
test_loan['Gender'] = test_loan['Gender'].fillna(test_loan['Gender'].mode()[0])
train_loan['Dependents'] = train_loan['Dependents'].fillna(train_loan['Dependents'].mode()[0])
test_loan['Dependents'] = test_loan['Dependents'].fillna(test_loan['Dependents'].mode()[0])
train_loan['Married'] = train_loan['Married'].fillna(train_loan['Married'].mode()[0])
test_loan['Married'] = test_loan['Married'].fillna(test_loan['Married'].mode()[0])
# Fill any Credit_History values that are still missing with the column mean.
train_loan['Credit_History'] = train_loan['Credit_History'].fillna(train_loan['Credit_History'].mean())
test_loan['Credit_History'] = test_loan['Credit_History'].fillna(test_loan['Credit_History'].mean())
Columns such as Property_Area, Dependents, and Education still contain text categories, so we encode them as integers with LabelEncoder from the sklearn package.
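from sklearn.preprocessing import LabelEncoder

feature_col = ['Property_Area', 'Education', 'Dependents']
le = LabelEncoder()
for col in feature_col:
    # Learn the category-to-integer mapping on the training data only ...
    train_loan[col] = le.fit_transform(train_loan[col])
    # ... and reuse it for the test data so both sets share the same codes
    test_loan[col] = le.transform(test_loan[col])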
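Why fit on the training data and only transform the test data? A LabelEncoder assigns integer codes in sorted order of the values it is fitted on, so fitting it separately on each frame can give the same category different codes. The small sketch below, on made-up values, shows both cases. Note that transform raises a ValueError for a category it did not see during fitting, and that scikit-learn's documentation positions LabelEncoder for target labels, with OrdinalEncoder as the counterpart for input features.

```python
from sklearn.preprocessing import LabelEncoder

# Made-up category values, for illustration only
train_areas = ["Urban", "Rural", "Semiurban", "Urban"]
test_areas = ["Semiurban", "Urban"]

le = LabelEncoder()
train_codes = le.fit_transform(train_areas)  # learns the mapping from the training values
test_codes = le.transform(test_areas)        # reuses that mapping for the test values
print(le.classes_.tolist())   # ['Rural', 'Semiurban', 'Urban'] -> codes 0, 1, 2
print(test_codes.tolist())    # [1, 2]

# Refitting on the test values alone builds a different mapping
refit_codes = LabelEncoder().fit_transform(test_areas)
print(refit_codes.tolist())   # [0, 1]: same categories, different codes
```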
Data Visualisation
Here we present visual insights derived from the dataset, using the matplotlib and seaborn packages.
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns

sns.set_style('dark')
train_loan
Output:

train_loan.plot(figsize=(18, 8))
plt.show()
Output:
plt.figure(figsize=(18, 6))
plt.subplot(1, 2, 1)
train_loan['ApplicantIncome'].hist(bins=10)
plt.title("Applicant Income")
plt.subplot(1, 2, 2)
plt.grid()
plt.hist(np.log(train_loan['LoanAmount']))
plt.title("Log of Loan Amount")
plt.show()
Output:
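The second histogram plots the logarithm of LoanAmount. A log transform is often applied to amounts like these because they tend to be right-skewed; the sketch below compares pandas' sample skewness (Series.skew) before and after the transform on made-up, right-skewed values, not on the loan data.

```python
import numpy as np
import pandas as pd

# Made-up, right-skewed amounts (a few large loans); not taken from the dataset
amounts = pd.Series([50, 60, 80, 100, 120, 150, 200, 300, 500, 900], dtype=float)

raw_skew = amounts.skew()          # pandas' sample skewness of the raw values
log_skew = np.log(amounts).skew()  # skewness after the log transform

print(f"skew before log: {raw_skew:.2f}")
print(f"skew after log:  {log_skew:.2f}")
```

A skewness closer to zero indicates a more symmetric distribution, which is easier to read in a histogram and often easier for linear models to work with.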

plt.title("Relation Between Applicant Income and Loan Amount")
plt.grid()
plt.scatter(train_loan['ApplicantIncome'], train_loan['LoanAmount'], c='k', marker='x')
plt.xlabel("Applicant Income")
plt.ylabel("Loan Amount")
plt.show()
Output:
plt.plot(train_loan['Loan_Status'],
train_loan['LoanAmount'])
plt.title("Loan Application Amount ")
plt.show()
Output:

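plt.figure(figsize=(12, 8))
# numeric_only=True skips text columns such as Loan_ID (required in pandas >= 2.0)
sns.heatmap(train_loan.corr(numeric_only=True), cmap='coolwarm', annot=True, fmt='.1f', linewidths=.1)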
plt.show()
Output:
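The heatmap shows the correlation between each pair of numeric variables; each cell holds the Pearson correlation coefficient (the default of DataFrame.corr), rounded to one decimal place.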
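Reading the Loan_Status row off the heatmap can also be done numerically. The sketch below, using a hypothetical helper correlations_with_target and made-up rows, ranks each numeric column by the magnitude of its correlation with the target. Correlation only captures linear association, so it is a rough guide for choosing features rather than a verdict.

```python
import pandas as pd


def correlations_with_target(df: pd.DataFrame, target: str = "Loan_Status") -> pd.Series:
    """Correlation of each numeric column with the target, strongest (by magnitude) first.

    Hypothetical helper; assumes the target column is already numeric (1/0).
    """
    corr = df.corr(numeric_only=True)[target].drop(target)
    # Order by absolute value so strong negative correlations also rank high
    return corr.reindex(corr.abs().sort_values(ascending=False).index)


# Made-up rows for illustration; Loan_ID is text and is skipped by numeric_only=True
demo = pd.DataFrame({
    "Loan_ID": ["a", "b", "c", "d", "e", "f"],
    "Credit_History": [1, 1, 0, 1, 0, 1],
    "ApplicantIncome": [5000, 3000, 4000, 6000, 2500, 3500],
    "Loan_Status": [1, 1, 0, 1, 0, 1],
})
print(correlations_with_target(demo))
```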
Modelling
# import machine learning model from sklearn package
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

model_logistic = LogisticRegression()
To begin, we use the LogisticRegression class from the sklearn.linear_model package. Here is a brief overview of Logistic Regression.
Logistic Regression is a classification algorithm used mainly to predict binary outcomes (e.g., 1/0, Yes/No, True/False) from a set of independent variables. The binary outcome is represented as a dummy (0/1) variable. Conceptually, Logistic Regression can be viewed as a specialized form of linear regression for categorical outcomes, in which the logarithm of the odds (log-odds) serves as the dependent variable.
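In symbols, the model predicts p = 1 / (1 + e^(-z)), where the log-odds z = log(p / (1 - p)) = w · x + b is linear in the inputs. For a binary problem, scikit-learn's predict_proba returns this sigmoid of coef_ · x + intercept_ for the positive class, which the sketch below checks on a small synthetic dataset (not the loan data).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Small synthetic binary problem (not the loan data)
X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([0, 0, 1, 0, 1, 1])

model = LogisticRegression().fit(X, y)

# Log-odds (logit) is a linear function of the inputs
log_odds = X @ model.coef_.ravel() + model.intercept_[0]
# The sigmoid turns log-odds back into the probability of class 1
manual_proba = 1.0 / (1.0 + np.exp(-log_odds))

print(np.allclose(manual_proba, model.predict_proba(X)[:, 1]))  # True
```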
train_features = ['Credit_History', 'Education', 'Gender']
train_x = train_loan[train_features].values
train_y = train_loan['Loan_Status'].values
test_x = test_loan[train_features].values
model_logistic.fit(train_x, train_y)
Output:
Evaluating the Model

# Predict the labels for the test data
predicted = model_logistic.predict(test_x)

# checking the coefficients of the trained model

print('Coefficient of model :', model_logistic.coef_)
Output:
# checking the intercept of the model

print('Intercept of model',model_logistic.intercept_)
Output:
# Accuracy score on the training dataset
score = model_logistic.score(train_x, train_y)
print('accuracy_score overall :', score)
print('accuracy_score percent :', round(score * 100, 2))
Output:
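The model's accuracy on the training data looks quite good. Because it is measured on the same rows used to fit the model, however, it is likely an optimistic estimate of how the model will perform on new applications.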
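A fairer check holds back part of the labelled training data, fits on the rest, and scores the held-out part. The sketch below does this with train_test_split and accuracy_score (whose arguments are the true labels first, then the predictions). The 20% split, random_state=42, stratification, and the synthetic stand-in data are assumptions for illustration; in the notebook, X and y would be train_x and train_y.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def holdout_accuracy(X, y, test_size=0.2, random_state=42):
    """Fit logistic regression on one part of the labelled data and score it on the rest."""
    X_tr, X_val, y_tr, y_val = train_test_split(
        X, y, test_size=test_size, random_state=random_state, stratify=y
    )
    model = LogisticRegression().fit(X_tr, y_tr)
    train_acc = accuracy_score(y_tr, model.predict(X_tr))
    val_acc = accuracy_score(y_val, model.predict(X_val))
    return train_acc, val_acc


# Synthetic stand-in for train_x / train_y: three 0/1 features, label driven by the first one
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(200, 3)).astype(float)
y = (X[:, 0] == 1).astype(int)
train_acc, val_acc = holdout_accuracy(X, y)
print(f"train accuracy: {train_acc:.2f}, validation accuracy: {val_acc:.2f}")
```

A large gap between the two numbers is a sign of overfitting; with only three features, as in this walkthrough, the gap is usually small, but the validation score is still the one to report.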

Now, we can use this model to predict the target on the
testing dataset.

# Predicting the target on the test dataset

predict_test = model_logistic.predict(test_x)
print('Target on test data',predict_test)
Output:
Now, we can submit the predicted target for the testing dataset.
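The original does not show the submission step. If the expected format is one row per Loan_ID with the label mapped back to Y/N (an assumption; check the competition's sample submission), a minimal sketch could look like this. The helper build_submission, the placeholder IDs, and the file name submission.csv are hypothetical.

```python
import numpy as np
import pandas as pd


def build_submission(loan_ids, predictions) -> pd.DataFrame:
    """Pair each Loan_ID with its predicted status, mapping 1/0 back to 'Y'/'N' (hypothetical helper)."""
    labels = np.where(np.asarray(predictions) == 1, "Y", "N")
    return pd.DataFrame({"Loan_ID": list(loan_ids), "Loan_Status": labels})


# In the notebook this would be: build_submission(test_loan["Loan_ID"], predict_test)
submission = build_submission(["ID_1", "ID_2"], [1, 0])  # placeholder IDs
print(submission)
# submission.to_csv("submission.csv", index=False)  # hypothetical output file name
```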
Future Aspects of Loan Eligibility Prediction Using Machine Learning
The future of loan eligibility prediction with Machine Learning looks promising. More capable models may assess who qualifies for a loan more accurately, and big data is likely to improve them further. Applicants may also get faster, near-instant answers about their eligibility. At the same time, work is under way to make these models fairer and more transparent, so that every applicant gets a fair chance. Lending systems may increasingly share information with one another, and collaboration between finance and technology experts is likely to produce new ideas. Overall, the outlook for ML-driven lending is bright.

Conclusion
Loan eligibility prediction using Machine Learning is a
remarkable advancement in the financial sector. It
streamlines the lending process, making it efficient,
objective, and responsive to individual needs. As
technology evolves and ethical considerations are
addressed, ML-powered lending has the potential to
revolutionize the financial landscape, making credit
accessible to those who need it while maintaining
transparency and fairness.
