
IMA Journal of Mathematics Applied in Business & Industry (1992) 4, 43-51

Machine-learning algorithms for credit-card applications


R. H. DAVIS
Heriot-Watt University, Edinburgh
D. B. EDELMAN
Bank of Scotland, Dunfermline*

A. J. GAMMERMAN
Heriot-Watt University, Edinburgh

Downloaded from http://imaman.oxfordjournals.org/ at University of Reading on January 1, 2015


[Received July 1991]

Credit assessment involves predicting applicant reliability and profitability. The
objective of this paper is to apply a number of algorithms to credit-card assessment.
Although many numerical and connectionist learning algorithms address the same
problem of learning from classified examples, very little is known about their
comparative strengths and weaknesses. Experiments comparing the top-down
induction-learning algorithms (G&T and ID3) with the multilayer perceptron,
pocket, and back-propagation neural learning algorithms have been performed using
a set of approved applications for credit cards from the Bank of Scotland, where the
decision process was principally a credit-scoring system. Overall, all the algorithms
perform at the same level of classification accuracy, but the neural algorithms take
much longer to train. The paper describes our motivation for using machine-learning
algorithms for credit-card assessment, describes the algorithms in detail, and
compares their performance in terms of accuracy.

1. Introduction
Inductive systems are used to generalize concepts from a set of examples. The
examples constitute previous experience, and induction is one of the most basic
ways in which humans learn. Learning of this kind usually creates knowledge, and
it is the syntactic structure and semantic organization of knowledge that distinguish
it from mere information. Many knowledge-representation formalisms have been
used; the one we deal with here is the decision tree. Induction used to extract
concepts from data should be capable of distinguishing among those concepts.
For example, in credit-card assessment, two possible concepts are 'creditworthy'
(or 'good' applicants, who paid back their debts) and 'noncreditworthy' (or 'bad'
applicants). A large enough set of card-application examples can be used to find a
'typical' set of attributes with the corresponding concept or class (good or bad).
Our research aim is to apply different machine-learning and inference techniques
to the credit-card assessment problem. In particular, there are two major objectives:

* Now at Royal Bank of Scotland, Edinburgh.


© Oxford University Press 1992

(1) to describe a few different algorithms and systems for the classification of
applicants; and (2) to compare the effectiveness of the different models by comparing
their accuracy against the classifications made by bank managers. We estimate
'accuracy' here by classifying a test set of examples and then calculating the
percentage of 'correct' classifications (those that correspond to the bank managers'
decisions).
We consider here two approaches. The first uses Bayesian inference without
assuming independence—the G&T model (Gammerman & Thatcher 1990)—and the
second is based on neural-network algorithms, or more precisely, on a connectionist
expert system technique (Gallant 1988). We shall describe briefly both models and
their design and implementation, and then compare their performance using a
banking example.

2. G&T model
In the first model, one works through the calculations completely from first principles,
applying Bayes' theorem without assuming independence, and estimating joint
probability distributions directly from data. There is a widely-held belief that,
if independence is not assumed, the calculation of probabilities soon becomes
unmanageable because of a 'combinatorial explosion'. However, it is possible to
apply Bayes' theorem without assuming independence and without an unmanage-
able increase in complexity (Gammerman & Thatcher 1990). Consider, for example,
the problem of estimating the probabilities that applicants belong to different
classes, given their attributes. If the assumption of independence is dropped, then
Bayes' theorem leads to a simple (indeed obvious) result: if we can identify the past
applicants who had the same combination of attributes as a new applicant, and look
up their final classification, then we can estimate the probabilities for the new
applicant. We shall call it the 'proper Bayes' method in order to distinguish it from
the 'simple Bayes' method, where the global assumption of independence is made.
The reason why the complexity of the proper Bayes approach does not increase
exponentially—why there is no combinatorial explosion—is that the calculations
involve only those propositions and their combinations which actually occur in the
database.
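The proper-Bayes estimate described above can be sketched in a few lines of Python. The records and attribute names here are invented for illustration and are not the paper's actual coding (which is given in the Appendix):

```python
from collections import Counter

# Toy past applicants: (attributes, final class). The attribute names and
# values are illustrative, not the Bank of Scotland coding.
records = [
    ({"years_at_bank": "10+", "mortgage": "bank"}, "good"),
    ({"years_at_bank": "10+", "mortgage": "bank"}, "good"),
    ({"years_at_bank": "10+", "mortgage": "bank"}, "bad"),
    ({"years_at_bank": "0-3", "mortgage": "other"}, "bad"),
]

def proper_bayes(new_applicant):
    """Estimate P(class | attributes) with no independence assumption:
    look up past applicants with exactly this attribute combination."""
    matched = [cls for attrs, cls in records if attrs == new_applicant]
    if not matched:
        return None  # combination unseen; the G&T system falls back on
                     # selected sub-combinations of relevant attributes
    counts = Counter(matched)
    return {cls: c / len(matched) for cls, c in counts.items()}

probs = proper_bayes({"years_at_bank": "10+", "mortgage": "bank"})
# three identical past applicants, two of them 'good'
```

There is no combinatorial explosion because only combinations that actually occur in the database are ever stored or looked up.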
With a very large database the method would, in principle, be very simple indeed.
However, with a limited database, it is necessary to use selected combinations of
those attributes which are most relevant to each class. A method is given for selecting
such combinations. The method also gives upper and lower confidence limits for each
probability, to provide a measure of the precision of the estimates.
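The paper does not specify which confidence limits are used; as one plausible sketch, exact (Clopper-Pearson) binomial limits for p = D/N can be computed by inverting the binomial CDF with only the standard library:

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def clopper_pearson(d, n, alpha=0.05):
    """Exact two-sided (1 - alpha) confidence limits for a binomial
    proportion d/n, found by bisection on the binomial CDF."""
    def root(f):  # f is increasing on [0, 1] with a sign change
        lo, hi = 0.0, 1.0
        for _ in range(60):
            mid = (lo + hi) / 2
            lo, hi = (mid, hi) if f(mid) < 0 else (lo, mid)
        return (lo + hi) / 2
    lower = 0.0 if d == 0 else root(
        lambda p: (1 - binom_cdf(d - 1, n, p)) - alpha / 2)
    upper = 1.0 if d == n else root(
        lambda p: alpha / 2 - binom_cdf(d, n, p))
    return lower, upper

# e.g. 55 'good' applicants out of 75 with a given combination
lo, hi = clopper_pearson(55, 75)
```

This is only an assumption about the interval construction; the published limits may have been computed by a different method.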
A general computational model called G&T has been developed, and it has two
major parts: the first is the selection procedure, and the second is the matching
procedure. The architecture of the system is presented in Fig. 1. In the selection
part, the system extracts combinations of relevant attributes and calculates an
estimated probability with corresponding confidence limits for each class. This is
summarized in the following algorithm.

[Figure: block diagram. A training procedure (coding, selection, calculations) produces the combinations used by a matching procedure (loading, fitting, decision making); the resulting computer diagnoses are evaluated against the final decisions (FD).]

FIG. 1. The architecture of the G&T model.

FOR each class i of concern DO:

    Calculate a 'chi-squared' value for each possible attribute, where
    'chi-squared' is the sum of the squares of all the standardized
    differences from the expected frequencies in a group of categorized
    observations.
    IF all objects belong to class i
       OR all objects have the same attributes
       OR the highest 'chi-squared' value is below a threshold
    THEN
       terminate this combination, with an estimated probability of the
       combination belonging to class i.
    ELSE
       1. Select the most important remaining attribute for class i
          according to the 'chi-squared' values.
       2. Partition the current set of training objects into two subsets:
          those including and those excluding the selected attribute.
       3. Repeat the whole algorithm for each subset until the termination
          condition is met.
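Under the simplifying assumptions of binary (present/absent) attributes and two classes, the selection procedure can be sketched as follows. The names and data layout are ours, not the paper's, and the 'all objects have the same attributes' test is subsumed here by the threshold check:

```python
def chi_squared(objects, attr, cls):
    """2x2 chi-squared statistic: attribute presence vs. membership of cls."""
    n = len(objects)
    a = sum(1 for o in objects if attr in o["attrs"] and o["cls"] == cls)
    b = sum(1 for o in objects if attr in o["attrs"]) - a
    c = sum(1 for o in objects if o["cls"] == cls) - a
    d = n - a - b - c
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if 0 in (row1, row2, col1, col2):
        return 0.0
    stat = 0.0
    for obs, r, co in ((a, row1, col1), (b, row1, col2),
                       (c, row2, col1), (d, row2, col2)):
        exp = r * co / n  # expected frequency for this cell
        stat += (obs - exp) ** 2 / exp
    return stat

def select(objects, attrs, cls, combo=(), threshold=10.5, out=None):
    """Recursively partition on the best-scoring attribute; each leaf is a
    combination with its estimated probability D/N of belonging to cls."""
    if out is None:
        out = []
    d = sum(1 for o in objects if o["cls"] == cls)
    n = len(objects)
    scores = {a: chi_squared(objects, a, cls) for a in attrs}
    best = max(scores, key=scores.get) if scores else None
    if d in (0, n) or best is None or scores[best] < threshold:
        out.append((combo, d, n, d / n if n else 0.0))  # terminate combination
        return out
    rest = [a for a in attrs if a != best]
    present = [o for o in objects if best in o["attrs"]]
    absent = [o for o in objects if best not in o["attrs"]]
    select(present, rest, cls, combo + (best,), threshold, out)
    select(absent, rest, cls, combo + ("not " + best,), threshold, out)
    return out

# Toy data: attribute "A" perfectly predicts class 'good'.
objs = ([{"attrs": {"A"}, "cls": "good"} for _ in range(4)]
        + [{"attrs": set(), "cls": "bad"} for _ in range(4)])
leaves = select(objs, ["A"], "good", threshold=0.5)
```

The toy run uses a low threshold so the split fires; the paper's experiments use a threshold of 10.5.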

[Figure: block diagram. The training set of data is input to the pocket algorithm, which generates the knowledge base; a user interface and an inference mechanism operate on the knowledge base.]

FIG. 2. Connectionist expert system.

The job of the matching part is to fit the observed attributes of newly arrived
applicants to the combinations of relevant attributes, and to find the corresponding
probabilities.
The G&T system allows a decision to be made for each new applicant by selecting
the class with the highest probability (we shall call this the computer classification).
The system also allows the accuracy of classification to be evaluated, by comparing
the computer classification with the applicant's final classification recorded by bank
managers.
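The matching and decision steps can be sketched as follows, assuming combinations are stored as (terms, estimated P(good)) pairs; the 'not ' prefix stands in for the paper's bar notation, and the probability values are taken from Table 1:

```python
# Combinations of attribute/value terms with estimated P(class 1 = 'good').
combinations = [
    (("15/3", "not 14/3"), 0.733),          # Table 1, combination (2)
    (("not 15/3", "17/0", "6/20"), 1.000),  # Table 1, combination (3)
]

def matches(applicant_attrs, terms):
    """True if every term fits: plain terms must be present,
    'not '-prefixed terms must be absent."""
    for t in terms:
        if t.startswith("not "):
            if t[4:] in applicant_attrs:
                return False
        elif t not in applicant_attrs:
            return False
    return True

def classify(applicant_attrs):
    """Fit the applicant to a stored combination; decide by the higher of
    P(good) and P(bad) = 1 - P(good)."""
    for terms, p_good in combinations:
        if matches(applicant_attrs, terms):
            return ("good" if p_good >= 0.5 else "bad", p_good)
    return (None, None)  # no combination fits

result = classify({"15/3", "16/0"})  # fits combination (2)
```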

3. The connectionist model


The connectionist model (or neural network) is a network in which cells are joined
by arcs that have weights associated with them. A single cell (or perceptron)
represents either an attribute or a class; it receives a set of inputs and weights
through its connections with other cells, and computes a weighted sum of its inputs.
A function of this total input determines the value that the cell outputs.
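The computation performed by a single cell can be sketched as follows (the weights here are arbitrary illustrative values, and a hard threshold is assumed for the output function):

```python
def cell(inputs, weights, bias=0.0):
    """One perceptron cell: weighted sum of inputs, then a hard threshold."""
    total = bias + sum(w * x for w, x in zip(weights, inputs))
    return 1 if total > 0 else -1

out = cell([1, -1, 1], [0.5, 0.2, 0.3])  # 0.5 - 0.2 + 0.3 = 0.6 > 0
```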
The developed system has two major programs: a learning program, which uses a
set of training examples to generate a network, and an expert system, which makes
inferences and provides some explanations. The basic structure of the connectionist
model is shown in Fig. 2. The major algorithm in the learning part is the so-called
pocket algorithm developed by Gallant (1988).

Pocket algorithm
(1) Set P to the zero vector, where P is the current perceptron weight vector.
(2) Randomly pick a training example Ek (with corresponding classification Ck).
(3) IF P correctly classifies Ek
    THEN
        IF the current run of correct classifications with P is longer
        than the run of correct classifications for the weight vector W
        in your pocket
        THEN
            replace the pocket weights W by P, and remember the length
            of its correct run
    ELSE
        form a new set of weights, i.e. replace P by P + Ck Ek.
(4) Go to (2).
The main idea is that, for every cell, we compute weights from a given set of examples,
and we keep in a 'pocket' the best weights acquired so far.
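The algorithm above can be sketched as a runnable routine for one cell, assuming inputs and target classifications in {-1, +1}; the data here are a toy linearly separable set, not the banking data:

```python
import random

def pocket(examples, iterations=1000, seed=0):
    """Gallant's pocket algorithm for a single perceptron cell."""
    rng = random.Random(seed)
    P = [0.0] * len(examples[0][0])  # current perceptron weights
    pocket_w = P[:]                  # best weights so far ("in the pocket")
    run = best_run = 0
    for _ in range(iterations):
        x, c = rng.choice(examples)  # randomly pick a training example
        out = 1 if sum(w * xi for w, xi in zip(P, x)) > 0 else -1
        if out == c:
            run += 1
            if run > best_run:       # longer correct run: update the pocket
                pocket_w, best_run = P[:], run
        else:
            P = [w + c * xi for w, xi in zip(P, x)]  # P <- P + Ck * Ek
            run = 0
    return pocket_w

# Toy set: class = sign of the first coordinate (last coordinate acts as bias).
data = [([1, 1, 1], 1), ([2, -1, 1], 1), ([-1, 1, 1], -1), ([-2, -1, 1], -1)]
w = pocket(data)  # pocket weights separate the toy set
```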

4. Results
We now present the results of the performance of the models. In each case, 609
applicant records formed the training set and 255 the testing set.

(a) The G&T model


The results of applying the G&T system are shown below. Table 1 shows a specimen
set of combinations of relevant attributes, with estimated probabilities and upper
and lower confidence limits for each class (threshold 10.5), extracted from the
training set. There are 17 combinations of attributes with corresponding values
(see the Appendix for the meaning of these attributes); absence of an attribute/value
is denoted by a bar. In the table, the column headed N gives the number of past
applicants who had this combination of attributes; the column headed D shows how
many of them had class 1 ('good'). The column headed P shows the ratio D/N,
which can be used as an estimate of the probability p that a new applicant with this
combination of attributes will have the corresponding class. Lower and upper 95%
confidence limits for p are given in columns L and U respectively. For example,
combination (2) consists of two attributes: 15/3 (more than 10 years with the Bank)
and not 14/3 (mortgage code is not 'other'); there were 75 such applicants, and 55
of them were 'good', so the estimated probability is 0.73, with lower limit 0.71 and
upper limit 0.83. The second class ('bad') consists of all those applicants who have
not been classified in the first class. The probability that a particular applicant will
fall in class 2 can therefore be found by estimating his probability for the first class
and subtracting it from unity.
Table 2 shows the overall accuracy of the approach applied to the testing set of

TABLE 1
Selected combinations of attributes for class 1 ('good')
(Absence of an attribute/value, shown by a bar in the original, is rendered here as ¬.)

Combinations                                             D    N     P      L      U
(1)  15/3 14/3                                           0    2   0.000  0.000  0.850
(2)  15/3 ¬14/3                                         55   75   0.733  0.710  0.830
(3)  ¬15/3 17/0 6/20                                     2    2   1.000  0.150  1.000
(4)  ¬15/3 17/0 ¬6/20 9/12                               3   26   0.115  0.020  0.310
(5)  ¬15/3 17/0 ¬6/20 ¬9/12 7/1 2/2                      0    2   0.000  0.000  0.850
(6)  ¬15/3 17/0 ¬6/20 ¬9/12 7/1 ¬2/2                     6    9   0.667  0.290  0.930
(7)  ¬15/3 17/0 ¬6/20 ¬9/12 ¬7/1                        88  303   0.290  0.289  0.293
(8)  ¬15/3 17/1 6/10                                     1    1   1.000  0.020  1.000
(9)  ¬15/3 17/1 ¬6/10 6/0                                1    1   1.000  0.020  1.000
(10) ¬15/3 17/1 ¬6/10 ¬6/0 6/6                           1    1   1.000  0.020  1.000
(11) ¬15/3 17/1 ¬6/10 ¬6/0 ¬6/6 10/2 11/3                6   10   0.600  0.260  0.880
(12) ¬15/3 17/1 ¬6/10 ¬6/0 ¬6/6 10/2 ¬11/3 2/2           3    8   0.375  0.080  0.760
(13) ¬15/3 17/1 ¬6/10 ¬6/0 ¬6/6 10/2 ¬11/3 ¬2/2          2   19   0.105  0.010  0.340
(14) ¬15/3 17/1 ¬6/10 ¬6/0 ¬6/6 ¬10/2 6/8                1    1   1.000  0.020  1.000
(15) ¬15/3 17/1 ¬6/10 ¬6/0 ¬6/6 ¬10/2 ¬6/8 2/2          10   39   0.256  0.130  0.430
(16) ¬15/3 17/1 ¬6/10 ¬6/0 ¬6/6 ¬10/2 ¬6/8 ¬2/2 15/0     5   17   0.294  0.100  0.580
(17) ¬15/3 17/1 ¬6/10 ¬6/0 ¬6/6 ¬10/2 ¬6/8 ¬2/2 ¬15/0    9   93   0.097  0.040  0.180

255 applicants. That is, for each of the 255 clients, we calculate the probabilities for
both classes, and take as a 'computer classification' the classification with the highest
probability. As a result, 71.4% of the applicants were correctly classified.

TABLE 2
Accuracy of classification: G&T

Computer classification vs. final classification (threshold 10.5)

                        Computer classification
Final classification    GOOD    BAD    Total

GOOD                      18     62      80
BAD                       11    164     175
Total                     29    226     255

Accuracy = 182/255 = 71.4%
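As a check, the accuracy figure can be recomputed from the confusion matrix in Table 2 (a short Python sketch; the (final, computer) dictionary keys are our own labelling):

```python
# Confusion matrix from Table 2, keyed by (final, computer) classification.
table2 = {("good", "good"): 18, ("good", "bad"): 62,
          ("bad", "good"): 11, ("bad", "bad"): 164}
correct = table2[("good", "good")] + table2[("bad", "bad")]  # diagonal
total = sum(table2.values())
print(f"{correct}/{total} = {100 * correct / total:.1f}%")  # 182/255 = 71.4%
```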

Table 3 shows the accuracy achieved by the 'simple Bayes' method on the same set
of data.

TABLE 3
Accuracy of classification: simple Bayes

Computer classification vs final classification

Computer classification

Final classification GOOD BAD Total

GOOD 38 42 80
BAD 37 138 175
Total 75 180 255

Accuracy = 176/255 = 69%

(b) The connectionist model


The model has been implemented and trained using the training set. All 19 input
nodes were connected through a set of hidden nodes to the output node, which has
two values: 'good' and 'bad'. The accuracy of the approach is given in Table 4.

TABLE 4
Accuracy of classification: pocket algorithm

Computer classification vs final classification

Computer classification

Final classification GOOD BAD Total

GOOD 39 41 80
BAD 50 125 175
Total 89 166 255

Accuracy = 164/255 = 64.3%

5. Conclusions
Two computational models have been described and applied to a set of banking data.
The present results are limited by the relatively small sizes of the training and test
sets. Given this, both techniques performed well, but there were differences worth
noting. First, G&T was generally more 'selective' than the connectionist model: it
selected only relevant attributes and constructed a set of combinations of attributes
(a decision tree) for each of the classes. This would be even more apparent had we
processed data with more than two classes. The accuracy achieved by G&T was
about 71%; this 'proper Bayes' model thus achieved a slightly higher accuracy than
the 'simple Bayes' model (about 69%). The connectionist model was much slower to
train than the G&T model, and achieved about 64% accuracy.

Appendix—list of attributes with their values


Attributes and values (value number in parentheses)

1. Salutation code: Mr (0), Mrs (1), Miss (2), Ms (3), Other (4)
2. Age group: 0 (0), 1-19 (1), 20-25 (2), 26-29 (3), 30-39 (4), 40-49 (5), 50-59 (6), 60+ (7)
3. Marital status: single (0), married-M (1), married-A (2), divorced/separated (3), widowed (4), not given (5)
4. Number of children: 0 (0), 1-2 (1), 3+ (2), not given (3)
5. Home telephone indicator: y (0), n (1)
6. Spouse's employment category: miscellaneous (0), office staff (1), social worker (2), driver (3), executive (4), production worker (5), guard (6), houseperson (7), labourer/outside worker (8), business owner (9), semi-professional (10), military/non-office (11), service (12), military office-1 (13), professional (14), military office-2 (15), retired (16), student (17), trades (18), unemployed (state support) (19), sales (20), unemployed (21), manager (22), not given (23)
7. Spouse's annual income: 0 (0), 0-2,999 (1), 3,000-5,999 (2), 6,000-8,999 (3), 9,000-11,999 (4), 12,000-14,999 (5), 15,000-18,999 (6), 19,000-23,999 (7), 24,000-29,999 (8), 30,000+ (9)
8. Applicant's employment status: other employers (0), public sector (1), U.K. government (2), houseperson (3), self-employed (4), military (5), private sector (6), retired (7), student (8), part time (9), unemployed (10), not given (11)
9. Applicant's employment category: values as for attribute 6
10. Years in current employment: 0 (0), 1-3 (1), 4-10 (2), 11+ (3), don't know (4)
11. Applicant's annual income: values as for attribute 7
12. Home status: owner (0), tenant, furnished (1), tenant, unfurnished (2), living with parents (3), other (4), not given (5)
13. Years at current address: 0 (0), 1-3 (1), 4-10 (2), 11+ (3), don't know (4)
14. Mortgage code: Bank of Scotland (0), other bank (1), building society (2), other (3), don't know (4)
15. Years at bank: 0 (0), 1-3 (1), 4-10 (2), 10+ (3), not given (4)
16. Current account indicator: y (0), not given (1)
17. Savings/deposit account indicator: y (0), not given (1)
18. Number of credit cards held: 0 (0), 1 (1), 2+ (2)
19. Number of credit cards applied for: 1 (0), 2 (1)

married-M: married applicant; married-A: associate (spouse) applicant.

Acknowledgements
We wish to thank Mr. Hakon Styri and Mr. Zhiyuan Luo for useful discussions and
some results with both systems. Data were used with permission of Bank of Scotland.

REFERENCES
GALLANT, S. 1988 Connectionist expert systems. Commun. ACM 31, 152-69.
GAMMERMAN, A., & THATCHER, A. R. 1990 Bayesian inference in an expert system
without assuming independence. In Advances in Artificial Intelligence (Ed. M.
Golumbic), pp. 182-95. Springer-Verlag.
