
EXPLORATORY DATA ANALYSIS

CASE STUDY

PROBLEM STATEMENT:

Loan providers find it difficult to lend to people with insufficient or non-existent credit history.
Some consumers take advantage of this by becoming defaulters.

This case study aims to identify patterns that indicate whether a client has difficulty paying their instalments. These patterns may be used for
actions such as denying the loan, reducing the loan amount, lending (to risky applicants) at a higher interest rate, etc.
This will ensure that consumers capable of repaying the loan are not rejected.

This analysis will help the company identify the variables behind loan default, i.e. the variables that are strong indicators of default.
The company can utilize this knowledge for its portfolio and risk assessment.

All of the above analysis is done using Exploratory Data Analysis.

The observations from the EDA are explained in the following slides.


Approach:

1. The application data was read and cleaned. Columns with 50% missing data were deleted.
The remaining columns with missing values were analyzed using distribution plots and count plots. The data was analyzed to find the mean,
mode and median.

2. After this analysis, columns whose missing values could not be replaced with the mean / median / mode were deleted.
Other missing values were substituted with the mean, median or mode.

3. Numerical columns were analyzed for outliers.

4. Ratio of Imbalance was calculated.

5. Some numerical columns were converted into categorical columns for the analysis.

6. Univariate analysis was done for all categorical columns with object data type.

7. The top 10 correlations were computed for defaulters and non-defaulters.

8. The previous application data was read and cleaned. Columns with 50% missing data were deleted.

9. The remaining columns with missing values were analyzed using distribution plots and count plots. The data was analyzed to find the mean,
mode and median. After this analysis, columns whose missing values could not be replaced with the mean / median / mode were deleted. Other
missing values were substituted with the mean, median or mode.

10. The previous and current application data were merged.

11. Bivariate analysis was carried out.
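
The cleaning in steps 1, 2, 8 and 9 can be sketched roughly as follows. This is a minimal illustration, not the code used for the case study: the function names are hypothetical, the cut-off is assumed to mean "50% or more missing", the choice of median for numeric columns and mode for other columns is an assumption, and the toy values (including the AMT_ANNUITY and MOSTLY_EMPTY columns) are invented.

```python
import numpy as np
import pandas as pd

def drop_sparse_columns(df: pd.DataFrame, threshold: float = 0.5) -> pd.DataFrame:
    """Drop columns whose fraction of missing values is at or above `threshold`."""
    missing_fraction = df.isna().mean()
    keep = missing_fraction[missing_fraction < threshold].index
    return df[keep].copy()

def impute_missing(df: pd.DataFrame) -> pd.DataFrame:
    """Fill numeric columns with the median and all other columns with the mode."""
    out = df.copy()
    for col in out.columns:
        if not out[col].isna().any():
            continue  # nothing to fill in this column
        if pd.api.types.is_numeric_dtype(out[col]):
            out[col] = out[col].fillna(out[col].median())
        else:
            # mode() can return several values; take the first one
            out[col] = out[col].fillna(out[col].mode().iloc[0])
    return out

# Toy data standing in for the application data (values are invented).
toy = pd.DataFrame({
    "AMT_ANNUITY": [100.0, np.nan, 300.0, 200.0],
    "OCCUPATION_TYPE": ["Laborers", None, "Laborers", "Drivers"],
    "MOSTLY_EMPTY": [np.nan, np.nan, np.nan, 1.0],
})
cleaned = impute_missing(drop_sparse_columns(toy))
print(cleaned)
```

Dropping the sparse columns first means the imputation step never meets a column that is entirely empty, for which no median or mode would exist.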


Pie plot of the imbalance ratio for the Target data, showing the percentage of non-defaulters and defaulters

The data imbalance ratio of non-defaulters to defaulters is 11.38 : 1
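
An imbalance ratio of this kind is the count of non-defaulters divided by the count of defaulters. A minimal sketch, assuming the target is coded 0 for non-defaulters and 1 for defaulters (the coding and the function name are assumptions, and the toy values are invented):

```python
import pandas as pd

def imbalance_ratio(target: pd.Series) -> float:
    """Return (number of non-defaulters) / (number of defaulters)."""
    counts = target.value_counts()
    # Assumed coding: 0 = non-defaulter, 1 = defaulter
    return counts.loc[0] / counts.loc[1]

# Toy target column: 10 non-defaulters and 2 defaulters (invented values).
toy_target = pd.Series([0] * 10 + [1] * 2)
ratio = imbalance_ratio(toy_target)
print(f"{ratio:.2f} : 1")
```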


Box plot for columns with outliers.
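
Points drawn beyond the whiskers of a box plot are conventionally those lying more than 1.5 times the interquartile range outside the quartiles. A minimal sketch of that rule, assuming the plots here follow the usual convention (the function name and the toy values are hypothetical):

```python
import pandas as pd

def iqr_outlier_mask(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Flag values lying more than k * IQR below Q1 or above Q3."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    return (values < q1 - k * iqr) | (values > q3 + k * iqr)

# Toy numerical column with one extreme value (invented values).
toy_values = pd.Series([10, 12, 11, 13, 12, 11, 100])
mask = iqr_outlier_mask(toy_values)
print(toy_values[mask].tolist())
```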
PLOT: NAME_CONTRACT_TYPE

Inferences:

1. The number of revolving loans is extremely low compared to cash loans.
2. There are very few defaulters for revolving loan contracts.
3. The number of defaulters is quite low compared to non-defaulters for cash loan contracts; however, relative to non-defaulters, the
percentage of defaulters is higher in the case of cash loans.
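
The inferences in this and the following slides compare raw counts with the percentage of defaulters within each category. A minimal sketch of how both can be tabulated for one categorical column, assuming a 0/1 target column named TARGET (the column name, the function name and the toy values are assumptions):

```python
import pandas as pd

def default_summary(df: pd.DataFrame, column: str, target: str = "TARGET") -> pd.DataFrame:
    """Per category: number of applicants, number of defaulters, and defaulter percentage."""
    grouped = df.groupby(column)[target]
    summary = pd.DataFrame({
        "applicants": grouped.size(),
        "defaulters": grouped.sum(),  # target is 1 for a defaulter, so the sum is a count
    })
    summary["default_pct"] = 100 * summary["defaulters"] / summary["applicants"]
    return summary

# Toy data (invented values): 8 cash loans with 2 defaults, 2 revolving loans with none.
toy = pd.DataFrame({
    "NAME_CONTRACT_TYPE": ["Cash loans"] * 8 + ["Revolving loans"] * 2,
    "TARGET": [1, 1, 0, 0, 0, 0, 0, 0, 0, 0],
})
summary = default_summary(toy, "NAME_CONTRACT_TYPE")
print(summary)
```

Looking at the percentage alongside the count matters because a category with many applicants can have the most defaulters in absolute terms while still having a lower default rate.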
PLOT: CODE_GENDER

Inferences:

1. There are more female applicants than male applicants.
2. Although the numbers of defaulters in the female and male categories are almost the same, the percentage of
defaulters is higher within the male category.
PLOT: FLAG_OWN_CAR

Inferences:

1. There are more applicants who do not own a car than
applicants who own one.
2. It has been observed from the graph that applicants who do
not own a car are more likely to be defaulters than
applicants who own a car.
PLOT: FLAG_OWN_REALTY

Inferences:

1. There are more applicants who own a house/flat than
applicants who do not.

2. It has been observed from the graph that applicants who do not
own a house/flat include more defaulters than applicants who
own a house/flat.
PLOT: NAME_TYPE_SUITE

Inferences:

1. Single applicants far outnumber applicants in the other categories.
2. Single applicants also account for more defaulters than
the other categories.
3. Within the single category, the number of defaulters
is far lower than the number of non-defaulters.
4. However, the percentage of non-defaulters is higher in the case of
single applicants.
PLOT: NAME_INCOME_TYPE

Inferences:

1. Applicants whose income type is working tend to have a
higher possibility of being defaulters than the other
categories.
PLOT: NAME_EDUCATION_TYPE

Inferences:

1. Applicants who have secondary / secondary special
education tend to have a higher possibility of being defaulters.

2. A large number of the applicants who applied for loans have
secondary / secondary special education.

3. The percentage of non-defaulters is higher among applicants who have not completed
higher education.
PLOT: NAME_FAMILY_STATUS

Inferences:

1. Applicants who are married tend to have a higher possibility
of being defaulters than other categories.

2. As per the percentage graph, applicants who are
single or in a civil marriage tend to have a higher possibility of
being defaulters than other categories.
PLOT: NAME_HOUSING_TYPE

Inferences:

1. It is observed that a larger number of the applicants who applied for a loan own a
house/apartment.
2. As per the percentage graph, applicants who live
with parents or in rented apartments tend to have a higher
possibility of being defaulters than other categories.
PLOT: OCCUPATION_TYPE

Inferences:

1. Labourers take out a high number of loans;
however, the percentage of defaulters relative to non-defaulters is
higher for sales staff and drivers.
PLOT: ORGANIZATION_TYPE

Inferences:

1. Applicants from Business Entity Type 3 organizations and self-employed applicants have a high percentage of defaulters.
The top 10 correlations for non-defaulters and defaulters are the same.
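
One way to obtain such a ranking is to compute the correlation matrix separately for each group and list the strongest distinct column pairs. A minimal sketch, assuming Pearson correlation ranked by absolute value and a 0/1 column named TARGET (these choices, the function name, and the toy columns a, b and c are assumptions; the values are invented):

```python
import numpy as np
import pandas as pd

def top_correlations(df: pd.DataFrame, n: int = 10) -> pd.Series:
    """Return the n column pairs with the largest absolute Pearson correlation."""
    corr = df.select_dtypes("number").corr().abs()
    # Keep only the upper triangle so each pair appears once and the diagonal is excluded.
    upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(upper).stack().dropna()
    return pairs.sort_values(ascending=False).head(n)

# Toy data (invented values); b is an exact multiple of a in both groups.
toy = pd.DataFrame({
    "TARGET": [0, 0, 0, 0, 1, 1, 1, 1],
    "a": [1, 2, 3, 4, 1, 2, 3, 4],
    "b": [2, 4, 6, 8, 2, 4, 6, 8],
    "c": [4, 1, 3, 2, 1, 4, 2, 3],
})
top_non_defaulters = top_correlations(toy[toy["TARGET"] == 0].drop(columns="TARGET"))
top_defaulters = top_correlations(toy[toy["TARGET"] == 1].drop(columns="TARGET"))
print(top_non_defaulters)
print(top_defaulters)
```

Comparing the two resulting rankings side by side shows whether the same column pairs dominate in both groups.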
Inferences:

1. Loans for people with secondary education were refused or cancelled more often in the past.
PLOT: CNT_CHILDREN

Inferences:

1. Applicants who have children are more likely to be defaulters.


Thank You
