Credit Score Prediction.


CREDIT SCORING MODEL

Model Overview
Developing credit scoring models is important for financial institutions, which need to
distinguish defaulters from non-defaulters when making credit granting decisions. In recent
years, artificial intelligence (AI) techniques have performed well in credit scoring. The
credit scoring model described here predicts the credit score of employees as well as the
probability that a given employee will fall into financial distress in the next two years.

Banks, companies and organizations play a crucial role in market economies. They decide
who can get finance and on what terms, and can make or break investment decisions. For
markets and society to function, individuals and companies need access to credit.

Credit scoring algorithms, which estimate the probability of default, are the method
banks use to determine whether or not a loan should be granted. The Kaggle competition that
provides the dataset asks participants to improve on the state of the art in credit scoring
by predicting the probability that somebody will experience financial distress in the next
two years.

Goal:

The goal of this credit scoring model is to build a model that borrowers can use to help make
the best financial decisions. Historical data are provided on 250,000 borrowers.

Open Dataset

Dataset source: Kaggle

The variables are the following:

SeriousDlqin2yrs: Person experienced 90 days past due delinquency or worse (target
variable / label)

RevolvingUtilizationOfUnsecuredLines: Total balance on credit cards and personal lines of
credit (excluding real estate and installment debt such as car loans), divided by the sum of
credit limits

age: Age of borrower in years

NumberOfTime30-59DaysPastDueNotWorse: Number of times borrower has been 30-59 days
past due but no worse in the last 2 years

DebtRatio: Monthly debt payments, alimony and living costs, divided by monthly gross income

MonthlyIncome: Monthly income

NumberOfOpenCreditLinesAndLoans: Number of open loans (installment loans such as a car
loan or mortgage) and lines of credit (e.g. credit cards)

NumberOfTimes90DaysLate: Number of times borrower has been 90 days or more past due

NumberRealEstateLoansOrLines: Number of mortgage and real estate loans, including home
equity lines of credit

NumberOfTime60-89DaysPastDueNotWorse: Number of times borrower has been 60-89 days
past due but no worse in the last 2 years

NumberOfDependents: Number of dependents in family, excluding themselves (spouse,
children etc.)
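
The two ratio variables can be illustrated with a short Python sketch. The function names and the sample figures are hypothetical and only restate the definitions given in the variable list; the dataset itself already contains the computed ratios.

```python
def revolving_utilization(balances, credit_limits):
    """Total revolving balance divided by the sum of credit limits."""
    total_limit = sum(credit_limits)
    if total_limit == 0:
        raise ValueError("sum of credit limits must be non-zero")
    return sum(balances) / total_limit


def debt_ratio(monthly_debt_payments, alimony, living_costs, monthly_gross_income):
    """Monthly obligations divided by monthly gross income."""
    if monthly_gross_income == 0:
        raise ValueError("monthly gross income must be non-zero")
    return (monthly_debt_payments + alimony + living_costs) / monthly_gross_income


# Hypothetical borrower: two credit cards and a 4,000 monthly gross income.
utilization = revolving_utilization(balances=[500, 1500], credit_limits=[2000, 2000])
ratio = debt_ratio(monthly_debt_payments=800, alimony=0, living_costs=1200,
                   monthly_gross_income=4000)
print(utilization, ratio)
```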

Predictive Model

Random Forest:

We will be using a random forest classifier for two reasons. Firstly, it lets us treat the
task quickly and easily as a simple binary classification problem. Secondly, its
predict_proba functionality outputs a probability score (the probability of class 1). This
score is what we will use to predict the probability of 90 days past due delinquency or
worse in 2 years' time.
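
A minimal sketch of this idea with scikit-learn is shown below. It trains on synthetic data from make_classification rather than the Kaggle file, and the sample size, class weights and hyperparameters are assumptions chosen only to make the example run; they are not the settings used for the reported model.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the borrower data: 10 features, imbalanced binary label.
X, y = make_classification(n_samples=2000, n_features=10,
                           weights=[0.93, 0.07], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# predict_proba returns one column per class; pick the column for class 1.
proba = model.predict_proba(X_test)
positive_column = list(model.classes_).index(1)
p_distress = proba[:, positive_column]

print(p_distress[:5])
```

Each entry of p_distress is the estimated probability that the corresponding test row belongs to class 1, which plays the role of SeriousDlqin2yrs in this sketch.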

Furthermore, we will predominantly be adopting a quantile-based approach in order to
streamline the process as much as possible, so that hypothetical credit checks can be returned
as easily and as quickly as possible.
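
The text does not spell out how the quantiles are used. One possible reading, sketched here purely as an assumption, is to compute quantile cut points once from historical predicted probabilities and then place each new applicant in a risk band with a fast lookup. The function names, the number of bands and the sample probabilities are all hypothetical.

```python
from bisect import bisect_right
from statistics import quantiles


def quantile_cutpoints(probabilities, n_bands=4):
    """Cut points that split historical probabilities into n_bands groups."""
    return quantiles(probabilities, n=n_bands, method="inclusive")


def assign_band(probability, cutpoints):
    """Band index: 0 is the lowest-risk band, len(cutpoints) the highest."""
    return bisect_right(cutpoints, probability)


# Hypothetical historical predicted probabilities of financial distress.
historical = [0.01, 0.02, 0.03, 0.05, 0.08, 0.10, 0.20, 0.40]
cutpoints = quantile_cutpoints(historical)

# A new credit check only needs a binary search over the stored cut points.
print(assign_band(0.04, cutpoints))
```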

Other options which could have been used include:


A Naïve Bayes model could have been used for classification but is a little slower.
It is a classification technique based on Bayes' Theorem with an assumption of independence
among predictors. In simple terms, a Naive Bayes classifier assumes that the presence of a
particular feature in a class is unrelated to the presence of any other feature.

A Gaussian Naive Bayes classifier can be built in five steps:

1. Separate the data by class.

2. Summarize the dataset.

3. Summarize the data by class.

4. Compute the Gaussian probability density function.

5. Compute the class probabilities.
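
The five steps above can be sketched in plain Python as follows. This is an illustrative from-scratch version, not the code used for the model in this document; the function names and the toy rows (age, debt ratio, label) are hypothetical.

```python
from collections import defaultdict
from math import exp, pi, sqrt
from statistics import mean, stdev


def separate_by_class(rows):
    """Step 1: group feature vectors by label (the last element of each row)."""
    separated = defaultdict(list)
    for row in rows:
        separated[row[-1]].append(row[:-1])
    return separated


def summarize_dataset(features):
    """Step 2: (mean, standard deviation, count) for every feature column."""
    return [(mean(col), stdev(col), len(col)) for col in zip(*features)]


def summarize_by_class(rows):
    """Step 3: per-class column summaries."""
    return {label: summarize_dataset(feats)
            for label, feats in separate_by_class(rows).items()}


def gaussian_pdf(x, mu, sigma):
    """Step 4: Gaussian probability density of x given mean and std deviation."""
    return exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sqrt(2 * pi) * sigma)


def class_probabilities(summaries, features):
    """Step 5: class prior times per-feature densities, normalized to sum to 1."""
    total = sum(stats[0][2] for stats in summaries.values())
    scores = {}
    for label, stats in summaries.items():
        score = stats[0][2] / total  # class prior from row counts
        for x, (mu, sigma, _) in zip(features, stats):
            score *= gaussian_pdf(x, mu, sigma)
        scores[label] = score
    norm = sum(scores.values())
    return {label: score / norm for label, score in scores.items()}


# Toy rows: [age, debt ratio, label]; label 1 marks financial distress.
rows = [[25, 0.9, 1], [30, 0.8, 1], [28, 1.1, 1],
        [50, 0.2, 0], [55, 0.3, 0], [60, 0.25, 0]]
probs = class_probabilities(summarize_by_class(rows), [27, 1.0])
print(probs)
```

The sketch multiplies raw densities for readability; with many features, summing log densities instead would avoid numerical underflow.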

An SVM (support vector machine) model can be used for classification, regression and outlier
detection, and is effective in high-dimensional spaces (that is, data with many features).
The SVM is a predictive data-classification algorithm that assigns new data elements to one
of the labelled categories. Support vector machine approaches have consistently received
attention from researchers in establishing new credit models.

Model Accuracy:

This model led to an accuracy rate of 0.800498.
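
For reference, accuracy is the share of predictions that match the true labels. The sketch below shows the calculation on made-up values; the 0.5 threshold used to turn probabilities into labels is an assumption, since the document does not state which threshold produced the reported figure.

```python
def to_labels(probabilities, threshold=0.5):
    """Turn predicted probabilities into 0/1 labels (threshold is assumed)."""
    return [1 if p >= threshold else 0 for p in probabilities]


def accuracy(y_true, y_pred):
    """Fraction of predictions that equal the true label."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Made-up labels and predicted probabilities for five borrowers.
y_true = [0, 0, 1, 0, 1]
y_pred = to_labels([0.1, 0.6, 0.8, 0.2, 0.3])
print(accuracy(y_true, y_pred))
```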