Modeling Course Sec 02


Model building (credit scoring) – logistic regression models using SAS

Data design for modeling

4/18/2014 7:33 AM Created by – Gopal Prasad Malakar 1


Section-2, Model Design Agenda

 Understand what a model is


 What is a credit score?
 What is modeling?
 Where is it used?
 What benefits does it bring?
 Discuss various kinds of scores / models
 What makes a typical scorecard

 Model design terms


– Observation point, historical window and performance window
– Performance indicator – good / bad / indeterminate
 Precaution for performance window selection
 Precaution with initial set of variables



Model Design - Example



Model Objective & Project Design - example

Background : To build a collection score for a retail portfolio.


Objective : Build a behavioral model to improve collections efficiency
Medical trilogy: an accident in a remote village leaves three kinds of patients –
– Patients who will not survive, no matter what you do → leave them
– Patients who will survive with medical help → help them (even general villagers can help them with first aid)
– Patients who don't need a doctor's help → leave them

The collections analogy: collections agents will focus their effort on those who will pay with a call.



Model Objective & Project Design - example

Background : To build a collection score for a retail portfolio.


Objective : Build a behavioral model to improve collections efficiency
Project Design :
 Behavioral Model – account based score
 Data – Bureau Data and On-us data
 Observation window : Jan 2006
 History window : Jan 05-Jan 06
 Performance Window : Feb06-July 06




Model Objective & Project Design - example

 Exclusions :
– Accounts bankrupt or charged off
– Months on book < 12
– Current delinquency > 2 due
– Current delinquency <= 1 due
 Performance Definition :
– Bad (Bad / Response = 1) : 90+ DPD or worse in the next 6 months
– Good (Bad / Response = 0) : never 90+ DPD or worse in the next 6 months
 Validation : in-time validation / out-of-time validation

All these details are finalized while fixing the project design and objective.
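The exclusion and labeling rules above can be made concrete in code. The course builds the model in SAS, but a minimal Python sketch is used here for illustration; the record fields and function names are assumptions, not from the course material:

```python
from dataclasses import dataclass

# Hypothetical account record; field names are illustrative.
@dataclass
class Account:
    months_on_book: int
    bankrupt_or_charged_off: bool
    current_cycles_due: int
    worst_dpd_next_6m: int  # worst days-past-due in the performance window

def in_scope(a: Account) -> bool:
    """Apply the exclusion rules from the project design."""
    if a.bankrupt_or_charged_off:
        return False
    if a.months_on_book < 12:
        return False
    # Exclude > 2 due and <= 1 due, per the slide's exclusion list.
    if a.current_cycles_due > 2 or a.current_cycles_due <= 1:
        return False
    return True

def bad_flag(a: Account) -> int:
    """Bad = 1 if ever 90+ DPD in the next 6 months, else good = 0."""
    return 1 if a.worst_dpd_next_6m >= 90 else 0
```

Applying `in_scope` first and `bad_flag` second mirrors the design: exclusions define the modeling population, and the performance definition labels it.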

Come back to this section after going through the next video presentation on definitions.



Model Design – Definitions and Pointers



Model Objective & Project Design
 The objective of the model conveys two essential concepts -
– The purpose of the model is to build a scorecard to rank-order applicants by:
• credit risk such as write-off, bankruptcy, delinquency
• revenue/profitability
• attrition/fraud
• collection/recovery
– Estimate the impact of the new score in terms of profitability and loss
reduction



Model Objective & Project Design
 Project Design involves exploring and finalizing the following aspects of the
project :
– Customer score vs. account score
– Observation point and/or window
– Historical information window
– Performance window
– Exclusions
– Bad definition
– Validation plan and samples
– Segmentation (if applicable)



Observation and Performance window

 Observation and performance windows are decided based on the objective and
the underlying product
 Observation Point or Scoring Point :
– The point in time at which accounts are selected and their characteristics
captured for scoring.
 Performance Window :
– The time window used to gauge the performance of the account and
assign a good / bad value.
 History Window :
– The history window contains the past performance of the account
– Used for behavioral scores, where on-us history is significant
– On-us history : performance of the account on our books
– Off-us history : generic performance of the customer, provided by the bureau,
across all products
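For the example design (observation point Jan 2006, 12-month history, 6-month performance), the three windows are simple month arithmetic. A minimal Python sketch, with the helper name as an assumption:

```python
def month_add(year, month, delta):
    """Shift a (year, month) pair by delta months."""
    idx = year * 12 + (month - 1) + delta
    return idx // 12, idx % 12 + 1

obs = (2006, 1)                       # observation point: Jan 2006
history_start = month_add(*obs, -12)  # history window opens Jan 2005
perf_start = month_add(*obs, +1)      # performance window: Feb 2006 ...
perf_end = month_add(*obs, +6)        # ... through Jul 2006
```

This reproduces the Jan 05-Jan 06 history window and Feb 06-Jul 06 performance window from the project design slide.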
Performance Definition

 The performance definition can have three categories :


Good - Bad - Indeterminate
 Indeterminates are the gray accounts which have the characteristics of
neither the goods nor the bads.
For example, take the performance definition -
– Bad : 60+ DPD or worse in the next 6 months
– Good : never delinquent
– Indeterminate : accounts that go delinquent between 1-59 DPD but no
worse
 Indeterminates are excluded when building models, i.e., from the development
data, as they may bias the results. But they can be included while validating the
models, to gauge performance on the entire sample.
 Indeterminates should not be more than 20% of the entire base.
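The example definition above can be sketched as a small labeling function (Python for illustration; the thresholds follow the example, the function name is an assumption):

```python
def classify(worst_dpd_next_6m: int) -> str:
    """Label an account per the example definition:
    bad = 60+ DPD or worse, good = never delinquent,
    indeterminate = delinquent but never 60+ DPD."""
    if worst_dpd_next_6m >= 60:
        return "bad"
    if worst_dpd_next_6m == 0:
        return "good"
    return "indeterminate"

labels = [classify(d) for d in [0, 30, 45, 60, 120]]
# The indeterminate share should stay under ~20% of the base.
share = labels.count("indeterminate") / len(labels)
```

Running the share check on the real development sample is how the 20% rule of thumb would be verified in practice.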
Deep dive into performance window for the model design



Observation and Performance window (contd.)

 Key factors in choosing the observation point :


– Captures enough accounts to model (rule of thumb – minimum 1000
records)
– Takes into account any seasonal influences
– Should be as representative of the current portfolio profile as
possible
 Key factors in choosing the performance period :
– The window should be long enough to ensure accounts have sufficient time for
their performance to mature
– There are sufficient numbers of goods and bads (rule of thumb – minimum
200 good and 200 bad records)
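The two rules of thumb above reduce to a simple sufficiency check. A Python sketch, with the function name and default thresholds taken directly from the slide's rules:

```python
def sample_sufficient(n_total, n_good, n_bad,
                      min_total=1000, min_good=200, min_bad=200):
    """Rule-of-thumb check: at least 1000 records overall,
    and at least 200 goods and 200 bads."""
    return n_total >= min_total and n_good >= min_good and n_bad >= min_bad
```

If the check fails, the usual remedies are widening the observation window or pooling several observation points.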



Observation and Performance window (contd.)

 The optimal performance window can be chosen using Vintage Analysis.

[Chart: write-off in each period and cumulative write-off rate, plotted against months on book (03-36), y-axis 0.00%-3.50%. Callout: what should be the ideal performance window?]



Vintage Chart – as of Sep 2013 (cumulative write-off rate by months on book)

Vintage    03     06     09     12     15     18     21     24     27     30     33     36
Q4-2010  0.03%  0.08%  0.21%  0.50%  1.00%  1.75%  2.50%  2.75%  2.90%  3.03%  3.14%  3.28%
Q2-2011  0.05%  0.09%  0.28%  0.60%  1.15%  1.80%  2.75%  2.90%  3.00%  3.15%
Q4-2011  0.04%  0.08%  0.32%  0.62%  1.20%  1.85%  2.60%  2.75%
Q4-2012  0.10%  0.20%  0.45%  0.90%
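Reading the performance window off a vintage curve means finding where the cumulative write-off curve flattens. A Python sketch of that idea, using the Q4-2010 vintage in hundredths of a percent; the flatness threshold of 0.25% per step is an illustrative assumption, not from the course:

```python
def maturity_point(mob, cum_rate, threshold=25):
    """Return the earliest months-on-book after which every further
    step adds less than `threshold` (hundredths of a %) to the
    cumulative write-off rate, i.e. where the curve has flattened."""
    for i in range(1, len(cum_rate)):
        if all(cum_rate[j] - cum_rate[j - 1] < threshold
               for j in range(i, len(cum_rate))):
            return mob[i - 1]
    return mob[-1]

mob = [3, 6, 9, 12, 15, 18, 21, 24, 27, 30, 33, 36]
# Q4-2010 cumulative write-off, in hundredths of a percent (0.03% -> 3).
q4_2010 = [3, 8, 21, 50, 100, 175, 250, 275, 290, 303, 314, 328]
window = maturity_point(mob, q4_2010)  # the curve plateaus around MOB 24
```

Integer units avoid floating-point noise when comparing increments against the threshold.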



Observation and Performance window (contd.)

 The performance window can be of two types –


– Rolling window – gauges performance for a fixed period from each
observation point
– Variable window – measures performance of accounts as of the end of the
performance period, so accounts from different observation points have
varying months of performance
[Diagram: rolling window – cohorts observed Jan-11, Apr-11 and Jul-11, each followed for the same fixed number of months (0-12 shown); variable window – the Q1-11 cohort measured up to a common end point, so its accounts accrue different months of performance. Callout: why rolling window / variable window?]
Observation and Performance window (contd.)

Rule of thumb –
If the variation in the performance window (say 3 months, when the performance
window ranges from 24 to 27 months) is quite small in comparison with the minimum
performance window, a variable performance window is acceptable.
Whereas if the variation in the performance window (say 3 months, when the
performance window ranges from 6 to 9 months) is quite big in comparison with the
minimum performance window, one should go for a rolling performance window.
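The rule of thumb above can be sketched as a single ratio test. Python for illustration; the 25% cut-off is an assumption chosen so that both slide examples come out as stated, not a figure from the course:

```python
def choose_window(min_perf_months: int, max_perf_months: int,
                  small_ratio: float = 0.25) -> str:
    """If the spread between the longest and shortest performance
    windows is small relative to the minimum window, a variable
    window is acceptable; otherwise use a rolling window."""
    spread = max_perf_months - min_perf_months
    return "variable" if spread / min_perf_months <= small_ratio else "rolling"
```

With the slide's numbers: a 24-27 month window gives a spread ratio of 3/24 = 12.5% (variable is fine), while 6-9 months gives 3/6 = 50% (go rolling).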



Model design / development precautions



Model Objective & Project Design (contd.)

Checkpoints to be kept in mind while deciding the objective and project design :

 The objective should be precise and measurable, i.e., the objective cannot be


'minimize losses' but should state how, and what the benchmark is.

 The objective should be in conjunction with the business strategy : it is more


important to build a model which best fits the business requirement than to
build a statistically good model which doesn't align well with the
business strategy.



Data Sufficiency

 Data Sufficiency involves ensuring that the data has the attributes required to
make the prediction stated by the objective

For example : 1) To build a model to predict fraud, if the given data doesn't have any
key for identifying fraud accounts, or those identifiers are erased, then there are no
accounts which we can identify as 'bad' and build a model to predict the same.
2) Alternatively, if we are building a response model specifically for the internet
channel of an "airline card", then the data should have an identifier for 'channel of
acquisition', to identify the right database on which to build the model
 Be careful not to select an effect variable as a cause variable
 Be careful to select variables which are available at the time of scoring
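A data-sufficiency review like the one above usually starts with checking that the required identifiers actually exist in the data. A minimal Python sketch; the field names (e.g. `fraud_flag`, `channel_of_acquisition`) are hypothetical examples matching the two scenarios:

```python
def check_required_fields(records, required):
    """Return the required fields missing from the data, so the
    objective can be confirmed as answerable before modeling."""
    present = set().union(*(r.keys() for r in records)) if records else set()
    return sorted(set(required) - present)

sample = [{"account_id": 1, "channel_of_acquisition": "internet"}]
missing = check_required_fields(sample, ["account_id",
                                         "channel_of_acquisition",
                                         "fraud_flag"])  # fraud_flag absent
```

A non-empty result means the objective cannot be met with this data as-is, exactly the situation described in example 1.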

