EDA Credit Case Study

EXPLORATORY DATA ANALYSIS
CASE STUDY
PROBLEM STATEMENT:
Lending companies find it difficult to give loans to people with insufficient or non-existent credit history.
Some consumers take advantage of this by defaulting.
This case study aims to identify patterns that indicate whether a client will have difficulty paying their instalments. These patterns can inform
actions such as denying the loan, reducing the loan amount, or lending to risky applicants at a higher interest rate.
This helps ensure that consumers capable of repaying the loan are not rejected.
The analysis will help the company identify the variables that are strong indicators of loan default.
The company can use this knowledge for its portfolio and risk assessment.
All of the analysis is done using Exploratory Data Analysis (EDA).
The observations from the EDA are explained in the following slides.

Approach:
1. The application data was read and cleaned. Columns with 50% missing data were deleted.
The remaining columns with missing values were analyzed using distribution plots and count plots, and their mean,
median and mode were computed.
2. After this analysis, columns whose missing values could not be replaced with the mean, median or mode were deleted.
Other missing values were substituted with the mean, median or mode.
3. Numerical columns were analyzed for outliers.
4. The ratio of imbalance was calculated.
5. Some numerical columns were converted into categorical columns for the analysis.
6. Univariate analysis was done for all categorical columns with the object data type.
7. The top 10 correlations were computed for defaulters and non-defaulters.
8. The previous application data was read and cleaned. Columns with 50% missing data were deleted.
9. The remaining columns with missing values were analyzed using distribution plots and count plots, and their mean,
median and mode were computed. Columns whose missing values could not be replaced with the mean, median or mode
were deleted. Other missing values were substituted with the mean, median or mode.
10. The previous and current application data were merged.
11. Bivariate analysis was carried out.
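
The cleaning in steps 1 and 2 can be sketched roughly as follows. This is a minimal illustration, not the code used for the case study: it assumes pandas, treats "50% missing" as "more than half the values missing", and imputes numerical columns with the median and object columns with the mode. The helper name clean_missing, the demo frame and the MOSTLY_EMPTY column are made up for the example.

```python
import numpy as np
import pandas as pd


def clean_missing(df: pd.DataFrame, max_missing: float = 0.5) -> pd.DataFrame:
    """Drop sparse columns, then impute the remaining gaps (hypothetical helper)."""
    # Fraction of missing values in each column.
    missing_share = df.isna().mean()
    # Assumption: a column is dropped when MORE than max_missing of it is missing.
    kept = df.loc[:, missing_share <= max_missing].copy()
    for col in kept.columns:
        if not kept[col].isna().any():
            continue
        if pd.api.types.is_numeric_dtype(kept[col]):
            # Median chosen here because it is less sensitive to outliers than the mean.
            kept[col] = kept[col].fillna(kept[col].median())
        else:
            # Most frequent value for categorical (object) columns.
            kept[col] = kept[col].fillna(kept[col].mode().iloc[0])
    return kept


# Tiny made-up frame standing in for the application data.
demo = pd.DataFrame({
    "AMT_ANNUITY": [100.0, np.nan, 300.0, 200.0],
    "OCCUPATION_TYPE": ["Laborers", None, "Laborers", "Drivers"],
    "MOSTLY_EMPTY": [np.nan, np.nan, np.nan, 1.0],
})
cleaned = clean_missing(demo)
print(cleaned)
```

Whether the mean, median or mode is the right substitute depends on each column's distribution, which is why the approach inspects distribution and count plots before imputing.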

Pie plot of the imbalance ratio in the target data, showing the percentage of non-defaulters and defaulters.
The ratio of non-defaulters to defaulters is 11.38 : 1.
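
As a rough illustration of how such a ratio is obtained, the sketch below divides the count of non-defaulters by the count of defaulters. It assumes pandas and a label column named TARGET with 0 for non-defaulters and 1 for defaulters; the column name, the encoding and the ten demo labels are assumptions, not taken from the slides.

```python
import pandas as pd

# Made-up labels: 0 = non-defaulter, 1 = defaulter (assumed encoding).
target = pd.Series([0] * 9 + [1] * 1, name="TARGET")

counts = target.value_counts()
# Ratio of the majority class (non-defaulters) to the minority class (defaulters).
imbalance_ratio = counts.loc[0] / counts.loc[1]
# Share of defaulters implied by the same counts.
defaulter_share = counts.loc[1] / counts.sum()

print(f"Non-defaulters : defaulters = {imbalance_ratio:.2f} : 1")
print(f"Defaulters make up {defaulter_share:.1%} of the rows")
```

By the same arithmetic, a ratio of 11.38 : 1 means defaulters make up about 1 / 12.38, or roughly 8.1%, of all applications.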

Box plot for columns with outliers.
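
The slides do not state which rule was used to call a value an outlier. One common choice, and the convention behind the whiskers of a default box plot, is the 1.5 x IQR rule. A minimal sketch under that assumption follows; the helper name iqr_outlier_mask and the demo values are made up.

```python
import pandas as pd


def iqr_outlier_mask(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (hypothetical helper)."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return (values < lower) | (values > upper)


# Made-up numbers with one obviously extreme value.
income = pd.Series([10, 12, 11, 13, 12, 11, 100], name="AMT_INCOME_TOTAL")
mask = iqr_outlier_mask(income)
print(income[mask])
```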
PLOT: NAME_CONTRACT_TYPE
Inferences:
1. The number of revolving loans is extremely low compared to cash loans.
2. There are very few defaulters among revolving loan contracts.
3. For cash loans, the number of defaulters is quite low compared to non-defaulters;
however, the percentage of defaulters relative to non-defaulters is higher for cash loans.
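
The inferences on these slides contrast raw counts with the percentage of defaulters inside each category. A minimal sketch of both views, assuming pandas and a 0/1 label column named TARGET (the column name and the demo rows are assumptions):

```python
import pandas as pd

# Made-up rows: 8 cash loans (1 default) and 2 revolving loans (no defaults).
demo = pd.DataFrame({
    "NAME_CONTRACT_TYPE": ["Cash loans"] * 8 + ["Revolving loans"] * 2,
    "TARGET": [1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
})

# Count view: how many defaulters / non-defaulters fall in each category.
counts = pd.crosstab(demo["NAME_CONTRACT_TYPE"], demo["TARGET"])

# Percentage view: share of defaulters within each category.
# The mean of a 0/1 column is the fraction of ones.
default_pct = demo.groupby("NAME_CONTRACT_TYPE")["TARGET"].mean() * 100

print(counts)
print(default_pct)
```

A category can hold most of the defaulters by count simply because it is large, so the within-category percentage is the fairer basis for comparing risk between categories.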
PLOT: CODE_GENDER
Inferences:
1. Female applicants outnumber male applicants.
2. Although the number of defaulters is almost the same in the female and male categories,
the percentage of defaulters is higher within the male category.
PLOT: FLAG_OWN_CAR
Inferences:
1. Applicants who do not own a car outnumber applicants who own one.
2. The graph shows that applicants who do not own a car are more likely to be
defaulters than applicants who own one.
PLOT: FLAG_OWN_REALTY
Inferences:
1. Applicants who own a house/flat outnumber applicants who do not.
2. The graph shows that applicants who do not own a house/flat have more
defaulters than applicants who own one.
PLOT: NAME_TYPE_SUITE
Inferences:
1. Far more single applicants applied for loans than applicants in any other category.
2. Single applicants also account for more defaulters than the other categories.
3. Within the single category, the number of defaulters is far lower than the number of non-defaulters.
4. However, the percentage of non-defaulters is higher for single applicants.
PLOT: NAME_INCOME_TYPE
Inferences:
1. Applicants whose income type is working are more likely to be defaulters
than applicants in the other categories.
PLOT: NAME_EDUCATION_TYPE
Inferences:
1. Applicants with secondary/secondary special education are more likely to be defaulters.
2. A large number of loan applicants have secondary/secondary special education.
3. The percentage of non-defaulters is higher among applicants who have not completed
higher education.
PLOT: NAME_FAMILY_STATUS
Inferences:
1. Married applicants tend to have a higher possibility of being defaulters than
applicants in other categories.
2. As per the percentage graph, applicants who are single or in a civil marriage
are more likely to be defaulters than applicants in other categories.
PLOT: NAME_HOUSING_TYPE
Inferences:
1. A larger number of loan applicants own a house/apartment.
2. As per the percentage graph, applicants who live with parents or in rented
apartments are more likely to be defaulters than applicants in other categories.
PLOT: OCCUPATION_TYPE
Inferences:
1. Labourers apply for a high number of loans; however, the percentage of
defaulters is higher among sales staff and drivers.
PLOT: ORGANIZATION_TYPE
Inferences:
1. Applicants from Business Entity Type 3 organizations and self-employed applicants have a high percentage of defaulters.
The top 10 correlations are the same for non-defaulters and defaulters.
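
One way to produce such a top 10 list is sketched below. It is an illustration under assumptions, not the method used for the slides: it ranks pairs of numerical columns by absolute Pearson correlation, computed separately for each target group. The helper name top_correlations, the TARGET column name and the demo columns x, y, z are made up.

```python
import numpy as np
import pandas as pd


def top_correlations(df: pd.DataFrame, n: int = 10) -> pd.Series:
    """Return the n column pairs with the largest absolute correlation."""
    corr = df.select_dtypes(include="number").corr().abs()
    # Keep only the upper triangle (k=1 excludes the diagonal) so that
    # each pair appears once and self-correlations are dropped.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    return upper.stack().dropna().sort_values(ascending=False).head(n)


# Made-up data: y is exactly 2 * x, z is unrelated noise.
demo = pd.DataFrame({
    "x": [1, 2, 3, 4, 5, 6],
    "y": [2, 4, 6, 8, 10, 12],
    "z": [5, 3, 6, 1, 4, 2],
    "TARGET": [0, 0, 0, 1, 1, 1],
})

defaulters = demo[demo["TARGET"] == 1].drop(columns="TARGET")
non_defaulters = demo[demo["TARGET"] == 0].drop(columns="TARGET")

top_defaulters = top_correlations(defaulters)
top_non_defaulters = top_correlations(non_defaulters)
print(top_defaulters)
print(top_non_defaulters)
```

Comparing the two ranked lists side by side shows whether the same variable pairs dominate in both groups.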
Inferences:
1. Applicants with secondary education had loans refused or cancelled more often in the past.
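
A comparison like this relies on the merged current and previous application data. The sketch below shows one possible way to build it, assuming pandas, a shared key column named SK_ID_CURR, a previous-application status column named NAME_CONTRACT_STATUS and an inner join. The key, the status column, the join type and the demo rows are all assumptions; the slides do not specify them.

```python
import pandas as pd

# Made-up current applications.
application = pd.DataFrame({
    "SK_ID_CURR": [1, 2, 3],
    "NAME_EDUCATION_TYPE": [
        "Secondary / secondary special",
        "Higher education",
        "Secondary / secondary special",
    ],
})

# Made-up previous applications; one client can have several.
previous = pd.DataFrame({
    "SK_ID_CURR": [1, 1, 2, 3],
    "NAME_CONTRACT_STATUS": ["Refused", "Approved", "Approved", "Canceled"],
})

# Assumed inner join: keep clients present in both tables.
merged = application.merge(previous, on="SK_ID_CURR", how="inner")

# Bivariate view: previous application status by education level.
status_by_education = pd.crosstab(
    merged["NAME_EDUCATION_TYPE"], merged["NAME_CONTRACT_STATUS"]
)
print(status_by_education)
```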
PLOT: CNT_CHILDREN
Inferences:
1. Applicants who have children are more likely to be defaulters.

Thank You
