Development and Validation of Credit-Scoring Models¹

by
Dennis Glennon
Nicholas M. Kiefer
C. Erik Larson
and
Hwan-sik Choi

July 2007
¹ Disclaimer: The statements made and views expressed herein are solely those of the authors and do not necessarily represent official policies, statements, or views of the Office of the Comptroller of the Currency or its staff or of Fannie Mae or its staff. Acknowledgement: We are grateful to our colleagues for many helpful comments and discussions, and especially to Regina Villasmil, curator of the OCC/RAD consumer credit database.
² U.S. Department of the Treasury, Office of the Comptroller of the Currency, Risk Analysis Division.
³ Cornell University, Departments of Economics and Statistical Sciences, 490 Uris Hall,
The initial sample consists of 1,000,000 randomly selected individual credit reports as of June 30, 1999. Nine hundred fifty thousand of these individuals were randomly sampled from the sub-population of individuals for whom the value of a generic, bureau-based score (GBS) could be computed (the scoreable population), while 50,000 individuals were sampled from the unscoreable population. The allocation of the sample between scoreable and unscoreable populations was chosen in order to track some initially unscoreable observations longitudinally through subsequent time periods. Because the unscoreable segment represents roughly 25 percent of the credit bureau population, a purely random sampling from the main credit bureau database would have yielded too many unscoreable individuals.²
Given the required cross-sectional size and the need to observe future performance when developing a model, it was also determined that the sample should include performance information through June 30, 2004, the terminal date of our data set. The 1,000,000 observations from the June 30, 1999 sample make up the initial “core” set of observations under our panel data design. The panel is constructed by updating the credit profile of each observation in the core on June 30th of each subsequent year. In Figure 1 we illustrate the general sampling and matching strategy using the 1999 and 2000 data; counts of sampled and matched individuals are presented in Tables 1 and 2.
² Unscoreable individuals include those who are deceased or who have only public records or very thin credit tradeline experience.
In general, the match rate from one year’s sample to the following year’s bureau master file is high. Some of the scoreable individuals sampled in 1999 became unscoreable in 2000, again due to death or inactivity, and some of the previously unscoreable became scoreable in 2000 (for instance, if they had acquired enough credit history). Of the 1,000,000 individuals sampled in 1999, 949,790 individuals were found to be scoreable as of June 30, 2000. As indicated in Table 2, this change resulted from 17,339 individuals moving from scoreable to unscoreable or missing, while 17,129 individuals moved from unscoreable to scoreable.
Over time, the credit quality of a fixed sample of observations (i.e., the core) is likely to diverge from that of a growing population. For that reason, we update the core each year by sampling additional individuals from the general population and then developing “rebalanced” sampling weights which allow for comparison between the updated core and the current population. For example, we update the core in 2000 by comparing the GBS distribution of the 949,790 individuals from the 1999-2000 matched sample (tabulated using 10-point score buckets from 300 to 900, the range of the GBS) to a similarly constructed GBS distribution for an additional 950,000 individuals randomly sampled from the credit bureau’s master file as of June 30, 2000. The relative difference in frequency by bucket between the two distributions was then used to identify the size of an “update sample” of individuals to add to the 1999-2000 matched sample. The minimum of these bucket-level frequency changes (i.e., the maximum decrease in relative frequency) was then used as a sampling proportion to determine the number of additional individuals that would be randomly sampled from the June 30, 2000, scoreable population and added to the core data set (i.e., the 1999-2000 matched file). For 2000, the “updating proportion” was determined to be 7 percent, resulting in the addition of 66,500 individuals from the 2000 scoreable population to the 1999-2000 matched scoreable sample on the CCDB. Use of this updating strategy ensures that the precision with which one might estimate characteristics at the GBS bucket level in a given year does not diminish due to drift in the credit quality of those individuals sampled in earlier years.
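To make the bucket-comparison step concrete, the following is a minimal sketch in Python, assuming NumPy arrays of GBS values; the function name and interface are ours, not part of the CCDB tooling.

```python
import numpy as np

def updating_proportion(core_scores, cross_section_scores):
    """Compare the GBS distributions of the matched core and a fresh cross
    section; return the sampling proportion used to size the update sample."""
    edges = np.arange(300, 901, 10)                # 10-point buckets, 300 to 900
    core_freq = np.histogram(core_scores, bins=edges)[0].astype(float)
    cs_freq = np.histogram(cross_section_scores, bins=edges)[0].astype(float)
    core_rel = core_freq / core_freq.sum()         # relative frequencies
    cs_rel = cs_freq / cs_freq.sum()
    # Relative change in frequency by bucket; the largest decrease sets the
    # proportion of the cross section to draw and add to the core.
    rel_change = np.divide(core_rel - cs_rel, cs_rel,
                           out=np.zeros_like(cs_rel), where=cs_rel > 0)
    return max(0.0, -rel_change.min())             # e.g., 0.07 for the 2000 update
```

Applying the 2000 proportion of 7 percent to the 950,000-person cross section reproduces the 66,500 individuals added to the core that year.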
Sampling for years 2001-2004 proceeded along similar lines, with the results reported again in Tables 1 and 2. The individuals who were members of the CCDB panel in a previous year (i.e., the core) were matched to the current year’s master file. Individuals who were unmatched, or who remained or became unscoreable in the current year, were dropped from the CCDB panel and then replaced with another draw of 50,000 unscoreable individuals from the current year’s master file. The GBS distribution from the panel was compared with that for a random cross section of individuals drawn from the current master file, an “updating proportion” was determined, and this proportion was applied to define the additional fraction of the random cross section added to complete the current-year CCDB panel.
3 Scorecard Development
3.1 Defining Performance and Identifying Risk Drivers
We follow industry-accepted practices to generate a comprehensive risk profile for each individual. We use as a starting point the five broadly defined categories outlined in Fair-Isaac (2006). We summarize our own examples of possible credit bureau variables that fall within each category and which are obtainable from our data set; these are presented in Table 3.
Scorecard development attempts to build a segmentation or index that can be used to classify agents into two or more distinct groups. Econometric methods for the modeling of limited dependent variables and statistical classification methods are therefore commonly applied. In order to implement these types of models using the type of credit information available from bureaus, it is necessary to define a performance outcome; this is usually, but not necessarily, dichotomous, with classes generally distinguishing between “good” and “bad” credit histories based upon some measure of performance.
In this paper, we choose to classify and develop a predictive model for performance of good and bad credits based upon their “default” experience. Bad outcomes correspond to individuals who experience a “default” and “good” outcomes to individuals who do not. It is our convention to assign a default if an individual becomes 90 days past due (DPD), or worse, on at least one bankcard over a 24-month performance period (for example, July 1999 through June 2001). Although regulatory rules require banks to charge off credit card loans at 180 DPD, it is not uncommon among practitioners to use our more conservative definition of default (90+ DPD). We also experimented with definitions of default based on 12- and 18-month performance periods. The results of our analysis are fundamentally the same under the alternative definitions of default.
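As a minimal sketch of how such a default flag can be constructed, assume a long-format pandas DataFrame of monthly performance records; the column names (id, date, product, dpd) are illustrative, not the CCDB's actual layout.

```python
import pandas as pd

def flag_defaults(perf: pd.DataFrame, start: str = "1999-07-01",
                  months: int = 24) -> pd.Series:
    """Return 1 for individuals who hit 90+ DPD on at least one bankcard
    within the performance window, 0 otherwise."""
    end = pd.Timestamp(start) + pd.DateOffset(months=months)
    window = perf[(perf["date"] >= pd.Timestamp(start)) & (perf["date"] < end)]
    worst = window[window["product"] == "bankcard"].groupby("id")["dpd"].max()
    return (worst >= 90).astype(int)   # 1 = "bad" (default), 0 = "good"
```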
$$p_i = E(y_i \mid x_i) = \frac{1}{1 + \exp(-x_i'\beta)} \quad \text{for each individual } i, \qquad (1)$$
$$Z = \ln\bigl(\hat{p}/(1 - \hat{p})\bigr). \qquad (3)$$
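A minimal sketch of fitting (1) and computing the index (3), assuming a design matrix X of bureau attributes and a 0/1 default flag y; scikit-learn is our choice here, as the paper does not name the software used for this step.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_logit_index(X, y):
    """Fit p_i = 1/(1 + exp(-x_i' beta)) and return the predicted PDs and
    the estimated log-odds index Z of equation (3)."""
    model = LogisticRegression(max_iter=1000).fit(X, y)
    p_hat = model.predict_proba(X)[:, 1]     # equation (1) at the estimates
    Z = np.log(p_hat / (1.0 - p_hat))        # equation (3)
    return model, p_hat, Z
```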
The semiparametric models use the estimated (parametric) index function to partition the sample into relative risk segments. We rank the sample by the estimated index from the logistic regression and then estimate the link function nonparametrically. Specifically, for this model the estimates of the default rate are equal to the empirically observed default rate within each segment.
We follow current industry practice and partition the sample into discrete segments, chosen so that each band contains the same number of observations, m. Given the sample size, we create 30 distinct segments. For each segment, the predicted probability of default is given by
$$\hat{p}_i = \bar{y}_{J_i}, \qquad (4)$$

where

$$\bar{y}_{J_i} = \frac{\sum_{k=1}^{n} y_k \,\mathbf{1}\{J_k = J_i\}}{\sum_{k=1}^{n} \mathbf{1}\{J_k = J_i\}}, \qquad (5)$$

and $J_i \in \{1, \ldots, 30\}$ denotes the segment to which individual $i$ belongs.
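Equations (4) and (5) amount to the following sketch, assuming Z is the fitted index and y the 0/1 outcome, both NumPy arrays.

```python
import numpy as np

def semiparametric_pd(Z, y, n_segments=30):
    """Rank by the index, cut into equal-count segments, and predict each
    individual's PD by the segment's empirical default rate (eqs. 4-5)."""
    order = np.argsort(Z)
    J = np.empty(len(Z), dtype=int)
    J[order] = np.arange(len(Z)) * n_segments // len(Z)   # segments 0..29
    seg_rate = np.array([y[J == j].mean() for j in range(n_segments)])
    return seg_rate[J]        # p_hat_i equals the mean outcome in segment J_i
```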
The fully nonparametric model form does not assume a functional form for the covariates. To implement our nonparametric specification, we use a tree method called CHAID (Chi-squared Automatic Interaction Detector) to cluster the data into multiple “nodes” by individual characteristics (attributes). The variable selection process searches by sequential subdivision for a grouping of the data giving maximal discrimination subject to limitations on the sizes of the groups (avoiding the best-fit solution of one group per data point). The approach is due to Kass (1980).⁶ The CHAID approach splits the data sequentially by performing consecutive Chi-square tests on all possible splits. It accepts the best split. If all possible splits are rejected, or if a minimum group size limit is reached, it stops. Each of the final nodes is assigned predictions that are equal to the empirical default probability, $\hat{p}_n$ for node $n$. By design of the algorithm, individuals within a node are chosen to be as homogeneous as possible, while individuals in different nodes are as heterogeneous as possible (in terms of $\hat{p}_n$), resulting in maximum discrimination. Note that the splitting of the development sample data into four segments, which preceded the construction of the parametric and semiparametric models, was not undertaken prior to implementing the CHAID algorithm.
⁶ Various refinements have been made to Kass’s original specification; we implement CHAID using the SAS macro %TREEDISC (SAS (1995)).
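The production implementation is the SAS %TREEDISC macro; purely as an illustration of the splitting idea (not of %TREEDISC itself), one chi-square split search might look like the sketch below, assuming a discretized attribute x and a 0/1 outcome y.

```python
import numpy as np
from scipy.stats import chi2_contingency

def best_split(x, y, min_node=1000, alpha=0.05):
    """Return the most significant binary split of x, or None to stop."""
    best = None
    for v in np.unique(x)[:-1]:                      # candidate cut points
        left = x <= v
        if left.sum() < min_node or (~left).sum() < min_node:
            continue                                 # enforce minimum node size
        table = np.array([[y[left].sum(),  len(y[left])  - y[left].sum()],
                          [y[~left].sum(), len(y[~left]) - y[~left].sum()]])
        chi2, p, _, _ = chi2_contingency(table)      # 2x2 bad/good contingency
        if p < alpha and (best is None or chi2 > best[1]):
            best = (v, chi2, p)
    return best
```

This sketch omits features of full CHAID such as category merging and multiplicity-adjusted p-values; it only mirrors the accept-best-split, stop-on-rejection logic described above.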
For the CHAID method we have to specify (1) the candidate variable list, (2) the transformation of continuous variables into discrete variables, and (3) the minimum size of the final nodes. We considered two different sets of candidate variables. Initially, we considered all available attributes and kept only those that generated at least one split. As an alternative, we used only those attributes that were identified using the Intersection method for variable selection outlined above. In the latter case, for each model segment (i.e., clean/thick, clean/thin, dirty/current, and dirty/delinquent), we take the intersection of the variables from the stepwise selection process with the variables appearing 10 or more times in the 20 percent, 50 percent, and 100 percent Resampling methods, then combine the selected variables across the model segments by taking the union of those sets of variables.
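In pseudocode terms the Intersection rule reduces to set operations; a sketch follows, with container shapes of our own choosing rather than the authors' actual data layout.

```python
def intersection_selection(stepwise, resample_counts, min_hits=10):
    """stepwise: {segment: set of variable names};
    resample_counts: {segment: {rate: {variable: selection count}}}."""
    selected = set()
    for seg, sw_vars in stepwise.items():
        keep = set(sw_vars)
        for rate in (0.2, 0.5, 1.0):          # 20, 50, and 100 percent resampling
            stable = {v for v, c in resample_counts[seg][rate].items()
                      if c >= min_hits}
            keep &= stable                    # intersect with each rate's stable set
        selected |= keep                      # union across the four segments
    return selected
```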
As the CHAID approach considers all possible splits, it requires the splitting of continuous variables into discrete ranges. We chose the common and practical approach of constructing dummy variables to represent each quartile of each continuous variable. As a validity check on this procedure we also split the continuous variables into 200 bins. (Note that this process includes all intermediate splits from 4 to 199 bins as special cases.)
To prevent nodes from having too few observations or having only one kind of account (good or bad), we set the minimum number of observations in a node to be 1,000. The CHAID rejects a split if it produces a node smaller than 1,000. Therefore the size of the final nodes works as a stopping rule for the CHAID. Since this specification is rather arbitrary, we experiment with different node sizes ranging from 100 to 8,000 observations.
1. A score of 700 corresponds to good-to-bad odds of 20 to 1:

$$\frac{1 - \hat{p}_{700}}{\hat{p}_{700}} = 20; \qquad (6)$$

2. Every 20-unit increase in S doubles the odds ratio. The score values, S, are calibrated using the affine transformation

$$S = 700 + \frac{20}{\ln 2}\left[\ln\!\left(\frac{1-\hat{p}}{\hat{p}}\right) - \ln 20\right]. \qquad (7)$$
We also recalibrate the GBS so as to allow for comparison with the RAD scores. Since we cannot observe the predicted $\hat{p}$ associated with the GBS, we estimate it through a linear regression of the empirical log-odds in our sample against the score values. Data for the regression consist of empirical log-odds estimated for 20 different buckets of individuals sorted by the GBS and the associated bucket-mean bureau score values.
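Under the two calibration conditions, the map from a predicted PD to a score, and the inverse map used to recalibrate the GBS, can be sketched as follows; the function names are ours, and the regression step mirrors the 20-bucket procedure just described.

```python
import numpy as np

def to_score(p_hat, anchor=700.0, anchor_odds=20.0, pdo=20.0):
    """Score with odds `anchor_odds`:1 at `anchor` points and odds doubling
    every `pdo` points, per equation (6) and condition 2."""
    log_odds = np.log((1.0 - p_hat) / p_hat)          # good-to-bad log-odds
    return anchor + (pdo / np.log(2.0)) * (log_odds - np.log(anchor_odds))

def recalibrate_gbs(bucket_mean_score, bucket_log_odds, gbs):
    """Regress empirical log-odds on bucket-mean GBS (20 buckets) and return
    the implied p_hat for each GBS value."""
    slope, intercept = np.polyfit(bucket_mean_score, bucket_log_odds, 1)
    return 1.0 / (1.0 + np.exp(slope * gbs + intercept))  # invert the log-odds
```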
$$HL = \sum_{j=1}^{10} \frac{(p_j - \hat{p}_j)^2}{\hat{p}_j(1 - \hat{p}_j)/n_j}, \qquad (8)$$

where $p_j$ and $\hat{p}_j$ are the observed and predicted default rates in decile $j$, and $n_j$ is the number of observations in that decile. The H-L statistic is distributed as Chi-square with 8 degrees of freedom under the null that $p_j = \hat{p}_j$ for all $j$. Just to be clear, a good model should have a high value of the separation measure, K-S, but a low value of the accuracy measure, H-L. It would perhaps be better to label the H-L an “inaccuracy” measure, as it is a Chi-squared measure of fit, but the contrary convention is long established.
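For concreteness, sketches of both statistics follow, assuming NumPy arrays of scores, predicted PDs, and 0/1 outcomes; decile construction details vary in practice.

```python
import numpy as np

def ks_statistic(score, y):
    """Maximum gap between the score CDFs of bads and goods (in percent)."""
    grid = np.unique(score)
    bads, goods = np.sort(score[y == 1]), np.sort(score[y == 0])
    cdf_bad = np.searchsorted(bads, grid, side="right") / len(bads)
    cdf_good = np.searchsorted(goods, grid, side="right") / len(goods)
    return 100.0 * np.abs(cdf_bad - cdf_good).max()

def hosmer_lemeshow(p_hat, y, n_groups=10):
    """Equation (8): compare observed and predicted default rates by decile."""
    groups = np.array_split(np.argsort(p_hat), n_groups)
    hl = 0.0
    for g in groups:
        n_j = len(g)
        q_j = p_hat[g].mean()      # predicted rate in decile j
        p_j = y[g].mean()          # observed rate in decile j
        hl += (p_j - q_j) ** 2 / (q_j * (1.0 - q_j) / n_j)
    return hl                      # ~ Chi-square with 8 df under the null
```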
5 Conclusion
We developed a credit-scoring model for bankcard performance using the OCC Risk Analysis Division consumer credit database and methods that are often encountered in the industry. We validated and compared a parametric model, a semiparametric model, and a popular nonparametric approach (CHAID).
It is worth pointing out that data preparation is crucial. The sample design issues are important, as discussed, but simple matters such as variable definition and treatment of missing or ambiguous data become critical. This is especially true in cases where similar credit attributes could be calculated in slightly different ways. Evaluating these data issues was one of the most time-consuming components of the project.
With the data in hand, we find that careful statistical analysis will deliver a useful model, and that, while there are differences across methods, the differences are small. The parametric and semiparametric models appear to work slightly better than the CHAID. There is little difference between the parametric and semiparametric models.
We find that within-period validation is useful, but out-of-time validation shows a substantial loss of accuracy. We attribute this to changing macroeconomic conditions. These conditions led to a small change in the overall default rate, but that change reflects much larger changes in the default rates of the high-default (low-score) components of the population. Thus accurate out-of-time prediction of within-score-group default rates should be based on models which are frequently updated, or which have variables reflecting aggregate credit conditions. On the positive side, the separation properties of the models seem quite robust in the out-of-time validation samples. This suggests that it is easier to rank individuals by creditworthiness than to predict actual default rates.
There are many additional models in each of the categories (parametric, semiparametric, and nonparametric) which could be considered. We have taken a representative approach from each category. Our models are similar to those used in practice. Our results suggest that the performance of models developed using simple cross-sectional techniques may be unreliable in terms of accuracy as macroeconomic conditions change. The results suggest that increased attention be placed on the use of longitudinal modeling methods as a means by which to estimate performance conditional on temporally varying economic factors.
References

Bierman, H., and W. H. Hausman (1970): “The Credit Granting Decision,” Management Science, 16, 519–532.

FRB (2006): “Federal Reserve Statistical Release G.19, Consumer Credit, December,” Discussion paper, Federal Reserve Board.

Thomas, L., J. Crook, and D. B. Edelman (1992): Credit Scoring and Credit Control. Oxford University Press, Oxford.

Thomas, L. C., D. B. Edelman, and J. Crook (2002): Credit Scoring and Its Applications. SIAM.
TABLES AND FIGURES
Table 1:
CCDB Sampling Design Counts
Columns:
(A) Random Scoreable Cross Section From Current Year Masterfile
(B) Matched from Previous Year's Panel and Scoreable
(C) Unmatched from Previous Year's Panel and Dropped
(D) Unscoreable from Previous Year's Panel and Dropped
(E) Updating Cross Section Sample
(F) Updating Proportion
(G) Unscoreable Random Sample From Current Year Masterfile
(H) CCDB Panel = B + E + G
(I) Total Current Year Masterfile Extracts = B + C + D + E
Amounts Owed
Aggregate credit amount of bankcard tradelines of which the records were updated within 12 months BK27
Aggregate credit amount of installment tradelines of which the records were updated within 12 months IN27
Aggregate credit amount of mortgage tradelines of which the records were updated within 12 months MG27
Aggregate credit amount of auto loan tradelines of which the records were updated within 12 months AL27
Aggregate credit amount of revolving tradelines of which the records were updated within 12 months RV27
Dummy variable for the positive aggregate credit amount of bankcard tradelines U11
Dummy variable for the positive aggregate credit amount of installment tradelines U12
Dummy variable for the positive aggregate credit amount of mortgage tradelines U13
Dummy variable for the positive aggregate credit amount of auto loan tradelines U17
Dummy variable for the positive aggregate credit amount of revolving tradelines U18
Aggregate balance amount of open bankcard tradelines of which the records were updated within 12 months ABK16
Aggregate balance amount of installment tradelines of which the records were updated within 12 months IN16
Aggregate balance amount of mortgage tradelines of which the records were updated within 12 months MG16
Aggregate balance amount of auto loan tradelines of which the records were updated within 12 months AL16
Aggregate balance amount of finance tradelines of which the records were updated within 12 months ALN08
Aggregate balance amount of retail tradelines of which the records were updated within 12 months ART08
Aggregate balance amount of revolving tradelines of which the records were updated within 12 months RV16
Aggregate balance amount of open home equity tradelines of which the records were updated within 12 months AEQ08
Bankcard utilization rate (Aggregate balance / Aggregate credit amount) BK28
Dummy variable for zero bankcard utilization rate BK28_0
Dummy variable for bankcard utilization rate=100% BK28_100
Dummy variable for bankcard utilization rate>100% BK28_101
Installment accounts utilization rate (Aggregate balance / Aggregate credit amount) IN28
Mortgage accounts utilization rate (Aggregate balance / Aggregate credit amount) MG28
Auto loan accounts utilization rate (Aggregate balance / Aggregate credit amount) AL28
Open bankcard utilization rate (Aggregate balance / Aggregate credit amount) ABK18
Revolving accounts utilization rate (Aggregate balance / Aggregate credit amount) RV28
Average credit amount of bankcard tradelines with positive balance and of which the records were updated within 12 months BK17
Average credit amount of installment tradelines with positive balance and of which the records were updated within 12 months IN17
Average credit amount of mortgage tradelines with positive balance and of which the records were updated within 12 months MG17
Average credit amount of retail tradelines with positive balance and of which the records were updated within 12 months RT17
Average credit amount of auto loan tradelines with positive balance and of which the records were updated within 12 months AL17
Average credit amount of revolving tradelines with positive balance and of which the records were updated within 12 months RV17
New credit
Total number of inquiries within 6 months AIQ01
Total number of inquiries within 12 months IQ12
Total number of bankcard accounts opened within 2 years BK61
Total number of installment accounts opened within 2 years IN61
Total number of mortgage accounts opened within 2 years MG61
Dummy variable for the existence of new accounts within 2 years NUM71
Variables selected using the Stepwise method (sorted by variable type) | Variable Names | Significance Ranking in Selection Methods (Intersection / Resample / Stepwise)
I. Payment History
Total number of tradelines with good standing, positive balance, and of which the records were updated within 12 months | GO01 | 1 / 1 / 1
Worst status of open bankcards within 6 months | CURR | 3 / 3 / 3
Total number of closed tradelines within 12 months | NUM_Closed | - / 10 / 6
Dummy variable for the existence of tradelines with 90 days past due or worse | BAD11 | - / 9 / 13
Worst status of bankcard tradelines with 60 days past due or worse and of which the records were updated within 12 months | BK43 | - / - / 15
Maximum of the balance amount, past due amount, and charged off amount of delinquent bankcard tradelines with 60 days past due or worse and of which the records were updated within 12 months | BK53 | - / - / 29
Dummy variable for the existence of installment tradelines with 90 days past due or worse within 12 months | IN13 | - / - / 31
Variables selected using the Stepwise method (sorted by variable type) | Variable Names | Significance Ranking in Selection Methods (Intersection / Resample / Stepwise)
I. Payment History
Total number of tradelines with good standing, positive balance, and of which the records were updated within 12 months | GO01 | 3 / 4 / 3
Total number of tradelines with 30 days past due or worse | BAD41 | 4 / 5 / 4
Total number of tradelines with 90 days past due or worse | BAD01 | 7 / 9 / 7
Total number of closed tradelines within 12 months | NUM_Closed | 12 / 10 / 9
Total number of bankcard tradelines with 90 days past due or worse within 12 months | BK03 | - / 25 / 13
Dummy variable for the existence of mortgage tradelines with 90 days past due or worse within 12 months | MG13 | 18 / 21 / 16
Dummy variable for the existence of installment tradelines with 90 days past due or worse within 12 months | IN13 | 13 / 13 / 18
Dummy variable for the existence of bankcard tradelines with 90 days past due or worse within 12 months | BK13 | - / 24 / 20
Dummy variable for the existence of retail tradelines with 90 days past due or worse within 12 months | RT13 | - / - / 28
Dummy variable for the existence of revolving tradelines with 90 days past due or worse within 12 months | RV13 | 20 / 11 / 34
Dummy variable for the existence of revolving retail tradelines with 90 days past due or worse within 12 months | RTR13 | - / - / 36
Dummy variable for the existence of auto loan tradelines with 90 days past due or worse within 12 months | AL13 | - / - / 38
Dummy variable for the existence of tradelines with 90 days past due or worse | BAD11 | - / - / 43
Maximum of the balance amount, past due amount, and charged off amount of delinquent bankcard tradelines with 60 days past due or worse and of which the records were updated within 12 months | BK53 | - / - / 44
Variables selected using the Stepwise method (sorted by variable type) | Variable Names | Significance Ranking in Selection Methods (Intersection / Resample / Stepwise)
I. Payment History
Worst status of open bankcards within 6 months | CURR | 2 / 2 / 4
Total number of tradelines with 30 days past due or worse | BAD41 | - / - / 10
Total number of tradelines with good standing, positive balance, and of which the records were updated within 12 months | GO01 | - / - / 11
Variables selected using the Stepwise method (sorted by variable type) | Variable Names | Significance Ranking in Selection Methods (Intersection / Resample / Stepwise)
I. Payment History
Total number of tradelines with 30 days past due or worse | BAD41 | - / - / 33
Dummy variable for the existence of tradelines with 30 days past due or worse | BAD51 | 3 / 5 / 5
Worst status of open bankcards within 6 months | CURR | 2 / 3 / 3
Total number of tradelines with good standing, positive balance, and of which the records were updated within 12 months | GO01 | - / 6 / 4
Total number of closed tradelines within 12 months | NUM_Closed | - / - / 19
I. Payment History
Total number of tradelines with 60 days past due or worse BAD21 X X
Dummy variable for the existence of tradelines with 60 days past due or worse BAD31 X
Total number of tradelines with 30 days past due or worse BAD41 X X
Dummy variable for the existence of tradelines with 30 days past due or worse BAD51 X X
Worst status of open bankcards within 6 months CURR X X
Total number of tradelines with good standing, positive balance, and of which the records were updated within 12 months GO01 X X
Calibrated Generic Bureau Score 62.4 62.6 194.1 (<.0001) 2506.3 (<.0001)
1. The H-L test does not apply. By design, the predicted outcomes under the semiparametric approach are equal to the actual outcomes.
2. The p-values are derived under the null hypothesis H0: $p_j = q_j$ for all $j$ (see equation (8)) under the assumption that H-L is Chi-square with 8 degrees of freedom.
Table 11:
In-Time Validation: Separation (K-S) and Accuracy (H-L) Measures for Parametric and Semiparametric Models at Development, by Segment

Statistics by row: Kolmogorov-Smirnov (1999 Dev, 1999 Hold-out); Hosmer-Lemeshow (1999 Dev value and p-value, 1999 Hold-out value and p-value).

Dirty History and Presently Mildly Delinquent (n=13,302)
  Parametric       Stepwise       40.3  38.9   16.5 (.0358)    22.1 (.0047)
  Parametric       Resampling     39.3  37.2   12.5 (.1303)    23.9 (.0024)
  Parametric       Intersection   37.5  37.0    6.1 (.6360)    10.8 (.2133)
  Semiparametric¹  Stepwise       40.3  38.5  489.8 (<.0001)  548.1 (<.0001)
  Semiparametric   Resampling     39.1  37.3  526.8 (<.0001)  565.6 (<.0001)
  Semiparametric   Intersection   39.1  37.3  592.7 (<.0001)  625.6 (<.0001)

Dirty History and Presently Current (n=67,814)
  Parametric       Stepwise       42.9  43.2   30.8 (.0002)    26.9 (.0007)
  Parametric       Resampling     42.4  43.0   26.0 (.0011)    27.3 (.0006)
  Parametric       Intersection   41.4  42.0   31.1 (.0001)    32.8 (.0001)
  Semiparametric   Stepwise       43.0  43.2   34.7 (<.0001)   25.8 (.0011)
  Semiparametric   Resampling     42.7  43.2   46.1 (<.0001)   42.5 (<.0001)
  Semiparametric   Intersection   42.2  42.6   52.7 (<.0001)   35.8 (<.0001)

Clean History and Thin File (n=15,132)
  Parametric       Stepwise       58.2  57.1    8.3 (.4047)    27.3 (.0006)
  Parametric       Resampling     57.4  56.1   11.5 (.1750)    16.7 (.0334)
  Parametric       Intersection   54.3  54.4   48.7 (<.0001)   51.7 (<.0001)
  Semiparametric   Stepwise       57.6  57.1    7.5 (.4837)    16.4 (.0370)
  Semiparametric   Resampling     57.2  56.8    6.1 (.6360)    16.9 (.0312)
  Semiparametric   Intersection   55.5  55.2   21.4 (.0062)    51.8 (<.0001)

Clean History and Thick File (n=242,330)
  Parametric       Stepwise       60.2  60.1   84.9 (<.0001)   94.8 (<.0001)
  Parametric       Resampling     60.0  59.9   74.6 (<.0001)   74.5 (<.0001)
  Parametric       Intersection   58.7  58.9   67.6 (<.0001)   82.2 (<.0001)
  Semiparametric   Stepwise       60.1  60.2    9.7 (.2867)    15.4 (.0518)
  Semiparametric   Resampling     60.0  59.9    7.7 (.4633)    14.4 (.0719)
  Semiparametric   Intersection   59.1  59.2    6.8 (.5584)    15.9 (.0438)

1. The predicted values were derived from the actual default rates in the decile range based on the pooled segment data in Table 10.
Table 12:
Empirical Bad Rates for Alternate Bad Definitions on the Development Samples
1. Missing observations were generated if the lender failed to report performance as of the observation date 6, 12, or 18 months forward.
Table 13:
K-S separation measures for models built to alternate bad definitions on the development sample

Model Form      Selection      Horizon (months)   90+ DPD Dev   90+ DPD Hold-Out   60+ DPD Dev   60+ DPD Hold-Out
Parametric      Stepwise       24                 64.0          64.0               61.5          61.6
Parametric      Stepwise       18                 65.8          65.8               63.1          63.4
Parametric      Stepwise       12                 67.8          67.6               65.4          65.4
Parametric      Stepwise        6                 71.8          72.2               68.6          68.7
Parametric      Resampling     24                 63.7          63.8               61.4          61.6
Parametric      Resampling     18                 65.7          65.5               63.0          63.3
Parametric      Resampling     12                 67.7          67.5               65.3          65.3
Parametric      Resampling      6                 71.5          72.0               68.4          68.8
Parametric      Intersection   24                 62.7          62.9               60.5          60.9
Parametric      Intersection   18                 64.6          64.8               62.2          62.6
Parametric      Intersection   12                 66.7          66.8               64.5          64.8
Parametric      Intersection    6                 71.0          71.7               67.8          68.2
Semiparametric  Stepwise       24                 64.0          63.9               61.6          61.7
Semiparametric  Stepwise       18                 65.8          65.8               63.0          63.3
Semiparametric  Stepwise       12                 67.8          67.6               65.3          65.3
Semiparametric  Stepwise        6                 71.7          72.2               68.6          68.7
Semiparametric  Resampling     24                 63.8          63.7               61.4          61.6
Semiparametric  Resampling     18                 65.7          65.5               63.0          63.2
Semiparametric  Resampling     12                 67.6          67.5               65.2          65.3
Semiparametric  Resampling      6                 71.6          72.1               68.4          68.8
Semiparametric  Intersection   24                 62.6          62.8               60.5          60.9
Semiparametric  Intersection   18                 64.6          64.8               62.1          62.5
Semiparametric  Intersection   12                 66.6          66.7               64.4          64.8
Semiparametric  Intersection    6                 71.0          71.6               67.8          68.2
Nonparametric   All variables  24                 58.3          57.9               56.6          56.4
Nonparametric   All variables  18                 59.9          59.4               57.9          57.7
Nonparametric   All variables  12                 61.6          61.0               59.6          59.4
Nonparametric   All variables   6                 65.0          64.9               62.3          62.2
Nonparametric   Intersection   24                 59.2          58.9               57.2          56.8
Nonparametric   Intersection   18                 60.9          60.4               58.4          58.1
Nonparametric   Intersection   12                 62.7          62.0               60.1          59.6
Nonparametric   Intersection    6                 65.8          65.7               62.8          62.0
Calibrated GBS  -              24                 62.4          62.6               60.3          60.4
Calibrated GBS  -              18                 64.6          64.4               61.9          61.9
Calibrated GBS  -              12                 66.5          66.4               63.9          63.8
Calibrated GBS  -               6                 70.2          70.7               67.2          66.7
Table 14:
Sample Sizes and Bad Rates For the Development and Out-of-Time Validation Samples
(Bad = 90 Days Past Due, or Worse, over the Following 24 Months)

[Fragment of an out-of-time validation table: K-S (Dev, 2000, 2001, 2002) and H-L value (p-value) for Dev, 2000, 2001, 2002]
Nonparametric (CHAID), All Variables: K-S 58.3, 59.3, 60.0, 59.9; H-L 2.1 (.9778), 3962.4 (<.0001), 3231.5 (<.0001), 2792.4 (<.0001)
Nonparametric (CHAID), Intersection: K-S 59.2, 60.6, 60.9, 60.9; H-L 44.8 (<.0001), 4163.6 (<.0001), 3758.9 (<.0001), 3053.0 (<.0001)
Calibrated Generic Bureau Score: K-S 62.4, 64.9, 65.6, 65.1; H-L 194.1 (<.0001), 2506.3 (<.0001), 4267.6 (<.0001), 2970.0 (<.0001)
Table 17:
Out-of-Time Validation: Separation (K-S) and Accuracy (H-L) Measures for Parametric and Semiparametric Models, by Segment
Figure 1:
Sampling and Matching Strategy, 1999 to 2000
[Diagram labels: Updating Sample: 66,500 scoreable; Cross-sectional random sample from 2000 master file (1,000,000); 883,500 scoreable.]
Figure 2:
1999 Development and 1999 Hold-Out Sample Construction, and Bad Rates
(Bad = 90+ Days Past Due, or Worse, over the Following 24 Months)
[Flow diagram: of 1 million individuals with CCDB attribute records in 1999, 4,749 lacked matching valid tradeline data; of the remaining 995,251, 261,431 had no open bankcard with a balance update date between 1/99 and 6/99, and 19,122 had at least one bankcard account presently severely delinquent or worse, leaving 714,698 individuals with valid bankcard tradelines. Of these, 677,262 had future bankcard performance observable at month 24 and were split into a development sample of 338,578 individuals (p=7.19%) and an in-time hold-out sample of 338,684 individuals (p=7.22%). Development segments: Clean Credit History 257,462 (p=3.06%), split into Thin File 15,132 (p=4.76%) and Thick File 242,330 (p=2.96%). In-time hold-out segments: Clean History 257,742 (p=3.09%), split into Thin File 15,150 (p=4.86%) and Thick File 242,592 (p=2.97%); Dirty History 80,942 (p=20.39%), split into Presently Current 67,735 (p=14.79%) and Mildly Delinquent 13,207 (p=49.04%).]
Figure 3:
[Plot: CDF of Bads versus CDF of All Samples (sorted from low to high score) for the Parametric (Stepwise, Resampling, Intersection), SemiParametric (Stepwise, Resampling, Intersection), and CHAID (All variables, Intersection) models, the Calibrated GBS, and the No Separation benchmark.]
Figure 4:
[Plot: CDF of Bads versus CDF of All Samples (sorted from low to high score) for the Parametric (Stepwise, Resampling, Intersection), SemiParametric (Stepwise, Resampling, Intersection), and CHAID (All variables, Intersection) models, the Calibrated GBS, and the No Separation benchmark.]
Figure 5:
[Calibration plot (two panels): predicted default rates for SemiPara (Stepwise, Resampling, Intersection), CHAID (All variables, Intersection), and the Calibrated GBS, shown against the Calibration Target.]
Figure 7: RAD Scores, All Samples, Development and Validation
[Box plots of score Values (500 to 900) for the Parametric (Resampling), Semiparametric (Stepwise), Nonparametric (CHAID), and Calibrated GBS models at Dev, Hold, 2000, 2001, and 2002.]
Figure 8: RAD Scores, Dirty/Delinquent Sample, Development and Validation
[Box plots of score Values (500 to 900) for the Parametric (Resampling), Semiparametric (Stepwise), Nonparametric (CHAID), and Calibrated GBS models at Dev, Hold, 2000, 2001, and 2002.]
Figure 9: RAD Scores, Dirty/Current Sample, Development and Validation
[Box plots of score Values (500 to 900) for the Parametric (Resampling), Semiparametric (Stepwise), Nonparametric (CHAID), and Calibrated GBS models at Dev, Hold, 2000, 2001, and 2002.]
Figure 10: RAD Scores, Clean/Thin Sample, Development and Validation
[Box plots of score Values (500 to 900) for the Parametric (Resampling), Semiparametric (Stepwise), Nonparametric (CHAID), and Calibrated GBS models at Dev, Hold, 2000, 2001, and 2002.]
Figure 11: RAD Scores, Clean/Thick Sample, Development and Validation
[Box plots of score Values (500 to 900) for the Parametric (Resampling), Semiparametric (Stepwise), Nonparametric (CHAID), and Calibrated GBS models at Dev, Hold, 2000, 2001, and 2002.]
Figure 12:
[Calibration plot: predicted default rates for SemiPara (Stepwise, Resampling, Intersection), CHAID (All variables, Intersection), and the Calibrated GBS, shown against the Calibration Target.]
Figure 13:
[Plot: CDF of Bads versus CDF of All Samples (sorted from low to high score) for the Parametric (Stepwise, Resampling, Intersection), SemiParametric (Stepwise, Resampling, Intersection), and CHAID (All variables, Intersection) models, the Calibrated GBS, and the No Separation benchmark.]
Figure 14:
[Plot: CDF of Bads versus CDF of All Samples (sorted from low to high score) for the Development, Hold-out, 2000, 2001, and 2002 samples, with the No Separation benchmark.]
Figure 15:
[Plot: CDF of Bads versus CDF of All Samples (sorted from low to high score) for the Development, Hold-out, 2000, 2001, and 2002 samples, with the No Separation benchmark.]
Figure 16:
[Plot: CDF of Bads versus CDF of All Samples (sorted from low to high score) for the Development, Hold-out, 2000, 2001, and 2002 samples, with the No Separation benchmark.]
Figure 17:
[Plot: CDF of Bads versus CDF of All Samples (sorted from low to high score) for the Development, Hold-out, 2000, 2001, and 2002 samples, with the No Separation benchmark.]