An Integrated Data Mining and Behavioral Hseil
www.elsevier.com/locate/eswa
Abstract
Analyzing bank databases for customer behavior management is difficult since bank databases are multi-dimensional, comprised of
monthly account records and daily transaction records. This study proposes an integrated data mining and behavioral scoring model to
manage existing credit card customers in a bank. A self-organizing map neural network was used to identify groups of customers based on
repayment behavior and recency, frequency, monetary (RFM) behavioral scoring predictors. It also classified bank customers into three major
profitable groups of customers. The resulting groups of customers were then profiled by customers' feature attributes determined using an
Apriori association rule inducer. This study demonstrates that identifying customers by a behavioral scoring model helps reveal customer
characteristics and facilitates marketing strategy development.
© 2004 Elsevier Ltd. All rights reserved.
Keywords: Data mining; Behavioral scoring model; Customer segmentation; Neural network; Association rule
0957-4174/$ - see front matter © 2004 Elsevier Ltd. All rights reserved.
doi:10.1016/j.eswa.2004.06.007
624 N.-C. Hsieh / Expert Systems with Applications 27 (2004) 623–633
techniques. However, analyzing bank databases for customer behavior management is difficult since bank databases are multi-dimensional, comprising monthly account records and daily transaction records (Donato et al., 1999). Therefore, even with highly accurate scoring models, some misclassification patterns appear frequently.

This study draws heavily on data mining perspectives. A general integrated data mining and behavioral scoring model for customer behavior analysis, which includes the necessary preprocessing of real-world data sets, derivation of scoring predictors and customer profiling in order to support a standard model building process, will be of great utility. The framework of the two-stage behavioral scoring model serves as a tool to validate the effect of data mining techniques in practical scoring analysis applications.

2.2. Neural networks to the segmentation analysis

For credit scoring or behavioral scoring analysis, many studies have shown that neural networks perform significantly better than statistical techniques such as linear discriminant analysis (LDA), multiple discriminant analysis (MDA), logistic regression analysis (LRA) and so on (Desai et al., 1996; Lancher et al., 1995; Malhotra & Malhotra, 2003; Sharda & Wilson, 1996; Zhang et al., 1999). The application of neural networks to segmentation analysis is a promising research area and is a challenge for a variety of marketers (Vellido, Lisboa, & Vaughan, 1999).

Baesens, Viaene, Poel, Vanthienen, & Dedene (2002) applied Bayesian neural networks to repeat purchase behavior modelling in direct marketing. Davies, Moutinho, & Curry (1996) and Moutinho, Davies, & Curry (1996) analyzed how different bank customer groups hold different expectations of automatic teller machine (ATM) services. Rather than profiling segments based on demographic or geographic characteristics, Dasgupta et al. (1994) characterized potential customer segments in terms of lifestyle variables. Balakrishnan, Cooper, Jacob, & Lewis (1996) accomplished a six-segment classification study using coffee brand switching probabilities derived from scanner data at a sub-household level. Mazanec (1992) grouped tourists using a benefit approach. Setiono et al. (1998) utilized a rule-extraction neural network to target companies for the promotion of new information technology. Fish, Barnes, & Aiken (1995) proposed a new methodology for industrial market segmentation by neural networks. Lee, Chiu, Lu, & Chen (2002) explored the performance of credit scoring by integrating back propagation neural networks with traditional discriminant analysis. Kim & Sohn (2004) used neural networks to manage customer loans.

Among these studies, only Balakrishnan et al. used the frequency sensitive competitive learning (FSCL) algorithm in segmentation analysis. The rest of the studies used supervised feed-forward multilayer perceptrons (MLP) trained by back propagation and gradient descent, or similar alternatives.

2.3. Properties of the built behavioral scoring model

In the business world, the most successful application of behavioral scoring models is embedded in databases: it is an approach of analyzing customer histories, looking for similar behavioral patterns among existing customer preferences and using those patterns for a targeted selection of existing or future customers. The decisions to be made include which target groups of customers will be encouraged to spend more, what credit line to assign, whether to promote new products to particular groups of customers, and, if the repayment ability turns bad, how to manage debt recovery. Therefore, a behavioral scoring model is an information-driven marketing process that enables marketers to develop, test, implement, measure and appropriately modify customized marketing programs and strategies.

In addition to the customer values that credit scoring models use as major scoring information, repayment behavior patterns and customer purchasing histories are also required in a behavioral scoring model. Behavioral scoring models are intended to establish associations between the input predictors and the output scores in order to model the behavior of different customers. More precisely, behavioral scoring models try to group customers that share behavior patterns. This is carried out by assigning behavior scores to each customer and grouping customers into classes of similar score value using an SOM neural network. The behavior score is given by a mathematical function of the form:

$$\text{behavior score} = f_{\mathrm{SOM}}(\text{predictor}_1, \text{predictor}_2, \ldots).$$

In this study, four predictors, namely repayment behavior and the RFM values, are used to classify three profitable groups of customers. Individual customer scores are updated on a yearly basis in this study.

3. Assessing the neural network as customer segmentation

3.1. Preparing the data sets

For this study, bank databases were provided by a major Taiwanese credit card issuer. Data preprocessing was required to ensure data field consistency in behavioral scoring model building. Not all the data are related to the chosen purposes, so knowledge extraction from the bank databases included the following three sub-actions. The first sub-action was intended to organize the raw data. Two data sets were obtained: a set containing effective credit card account information of 158,126 customers until June 2003, and another set storing over
20 million individual transaction records for these accounts from January 2000 to June 2003. Then, the two data sets were joined using a customer identifier to create a single behavior-oriented data set. The second sub-action was the extraction of only those data considered useful for the analysis. Unnecessary data fields and records containing incomplete or missing data were removed from the data sets (Fish et al., 1995). The third sub-action was the application of simple statistics to calculate an aggregate of new behavioral scoring predictors.

The aim of calculating the aggregate was to emphasize the customer repayment behavior and RFM (Bult & Wansbeek, 1995) information hidden in the 12-month observation period. In this case, values derived from the database, such as the maximum, minimum and average of a set of variables (e.g. repayment states, payment cycle days, number of credit card purchases, consumption amount, interest on credit balance, and so on) for the monthly activity over the past 12 months, were considered for the purpose of building a behavioral scoring model. As mentioned, the desired outcome is to be able to predict which customer belongs to which profitable group. The ranges of values of each numerical predictor are split into intervals so that each interval contains as many customers as possible that have significantly homogeneous behavior. Multiple predictors can be grouped together to obtain the same effect. To derive the most profitable customers, it was chosen to identify similar repayment behavior with respect to RFM values found in the real world.

3.2. Analyzing the behavior of customers

To establish a better relationship with customers, banks constantly seek ways of differentiating their offerings and developing more appropriate services for distinct market segments. An important observation on current state-of-the-art segmentation analysis is its use of past transaction data. The results produced are based on the assumption that customer behavior follows patterns similar to past patterns and will repeat in the future. Therefore, there could not be a better time than now to recognize the importance of an effective new marketing strategy using data mining techniques. Increasing the amount of purchases while improving customer satisfaction is a major goal.

Segmentation analysis is a method of achieving more targeted communication with customers and is a pioneering step towards classifying individual customers according to previously defined groups of customers. The process of segmentation analysis describes the characteristics of groups of customers within the data, and puts customers into segments according to their affinities or similar characteristics.

This study tries to construct a behavioral scoring model for direct marketing and encouraging consumption (Lancher et al., 1995). These two goals are similar for analyzing potential credit card customer behavior. However, attempts to achieve good customer behavior management may be limited by poor data relevance and quality, the volume of data needing to be processed, or difficulty in viewing the data. Therefore, the original data set could not be used directly to predict customer behavior, so extra behavioral scoring predictors were needed for prediction.

As mentioned, banks have three types of profitable customers: revolver users, transactor users and convenience users. Revolver users always carry a credit card balance, rolling over part of the bill to the next month instead of paying off the balance in full each month. Revolver users are highly profitable customers because every month they pay considerable interest on their outstanding balance. Meanwhile, transactor users pay in full on or before the due date of the interest-free credit period and do not incur any interest payments or finance charges. Transactor users do not contribute significant revenue through interest on their credit balances, but the discount on each transaction they make still provides an important source of bank revenue. Finally, convenience users are customers who periodically charge large bills, such as for vacations or large purchases, to their credit card, and then pay these bills off over several months. Convenience users thus contribute significant amounts of interest on their credit balance.

Fig. 2 presents the conceptual framework used to answer the questions posed in this study. This figure shows the two components, customer segmentation and customer profiling, which serve as the major issues to be discussed here. Generally, credit card issuers make money from annual fees, interest on credit balances, and the discount collected from merchants on each transaction. In this framework, account and transaction data sets are assumed to be input sets to customer segmentation. The values of RFM and repayment behavior are assumed to be behavioral scoring predictors affecting customer segmentation.

The recency (R) value measures the average time distance between the day a customer makes a charge and the day the customer pays the bill; the frequency (F) value measures the average number of credit card purchases made; and the monetary (M) value measures the amount of consumption spent during a yearly time period. Next, variables such as customer attributes and credit card usage are assumed to influence customer profiling. Finally, clusters and the associated customer profiles are assumed to be outputs, as well as influences on credit card marketing strategies. In Fig. 2, repayment behavior is highly related to customer segmentation, but is an implicit variable which cannot be retrieved directly from the data set. We needed to develop a method for modeling the customer repayment behavior.

As shown in the following equation, this study employs 'Repayment Ability' (RA) to model repayment behavior:

$$\text{Repayment Ability} = \frac{\text{no. of months without delayed pay off}}{\text{no. of months of holding the card}}.$$
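The Repayment Ability ratio and the R, F and M definitions given above can be illustrated with a minimal sketch. The `MonthlyRecord` schema, its field names and the toy history are assumptions made purely for illustration; the source does not describe the bank's actual data layout.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class MonthlyRecord:
    """One month of card activity (hypothetical schema, not from the paper)."""
    paid_without_delay: bool        # bill paid off without delay this month
    n_purchases: int                # number of credit card purchases
    amount_spent: float             # consumption amount
    days_charge_to_payment: float   # average days between charge and payment


def repayment_ability(months: List[MonthlyRecord]) -> float:
    """RA = months without delayed pay off / months of holding the card."""
    if not months:
        raise ValueError("at least one month of history is required")
    on_time = sum(1 for m in months if m.paid_without_delay)
    return on_time / len(months)


def rfm(months: List[MonthlyRecord]) -> Tuple[float, float, float]:
    """Return (R, F, M) over the observation period."""
    n = len(months)
    recency = sum(m.days_charge_to_payment for m in months) / n  # average days
    frequency = sum(m.n_purchases for m in months) / n           # average purchases
    monetary = sum(m.amount_spent for m in months)               # total spend
    return recency, frequency, monetary


# Toy 12-month history: 8 months paid without delay, 4 months delayed.
history = (
    [MonthlyRecord(True, 5, 300.0, 20.0) for _ in range(8)]
    + [MonthlyRecord(False, 2, 100.0, 45.0) for _ in range(4)]
)
ra = repayment_ability(history)
r, f, m = rfm(history)
print(f"RA={ra:.3f} R={r:.1f} F={f:.1f} M={m:.0f}")
```

In this sketch the four values (RA, R, F, M) would form one customer's input vector for the segmentation step.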
The default observation range is assumed to be 12 months, and RA is computed as the 'no. of months without delayed pay off' divided by the 'no. of months of holding the card'. For example, if a customer carries no credit card balance for 8 months, the RA is computed as 8/12. For each customer, if RA approaches one, the customer is considered a transactor user. Meanwhile, if RA lies between zero and one, the customer is considered a convenience user. Finally, if RA approaches zero, the customer is considered a revolver user.

3.3. Assessing the SOM for customer behavioral scoring

In recent years, the SOM (Kohonen, 1995) has gained in popularity as a classification analysis tool in business-related areas (Vellido et al., 1999). In this study, the SOM is built with data from existing customers, which include variables from the account and transaction data sets. All of the existing customers' data are used to build the behavioral scoring model in order to predict potential customer behavior.

The behavioral scoring model utilized in this study is arranged to form a two-dimensional SOM with a 4×4 rectangular array of neurons. Each of these neurons is connected to the input vectors through synaptic weights which are adjusted during learning. The first phase of the SOM is a rough estimation phase, used to capture the gross data patterns. The second phase is a tuning phase, used to adjust the map to model the fine features of the data. During the learning process, when a pattern is presented as an input to the neural network, the Euclidean distance between the pattern and each neuron is calculated using RA first and then RFM as input variables. For inputs to the SOM, each feature is scaled by subtracting the mean and dividing by the standard deviation, resulting in each scaled feature having a mean of zero and a standard deviation of one. Once the most similar neuron is determined, the neighborhood of that neuron is identified. The neighborhood of a neuron is defined as all the neurons within a given link distance of the matched neuron. All neurons in the neighborhood are adjusted to have feature values closer to the current case. The adjustment amount of the neuron weights is controlled by the learning rate.

The SOM map is shown in Fig. 3; the repayment behavior, number of customers, ratio of the number of customers to the overall number of customers, RA and RFM are shown for each neuron. Fig. 4 illustrates the overall distribution of customers with respect to the three major profitable types of customers. The bulk of the cases are distributed over neurons 9–16, which contain 104,979 customers whose repayment behavior is revolver user. Neurons 3, 4, 7 and 8 indicate that a total of 21,202 customers are convenience users. Neurons 1, 2, 5 and 6 indicate that a total of 31,945 customers are transactor users.

On the basis that no meaningful conclusions can be drawn from small numbers of customers, no further analysis needs to be performed on the clusters with fewer than 1000 cases (i.e. neurons 6, 12 and 15). The next major step is to choose the target groups of customers, so as to choose the target customers for direct marketing and encourage
consumption. The repayment behavior can be used to indicate the risk of customers; the risk degrees among the three profitable groups of customers are 'transactor user' ≤ 'convenience user' ≤ 'revolver user'. Moreover, the clusters whose RFM values tend to R↓F↑M↑ in each profitable group are selected as target ones; all customers who belong to these clusters become candidates for conducting suitable marketing strategies for a bank, and attract the most attention.

3.4. Determining the relative importance variables

After the segmentation of the existing customers, it is possible to infer the characteristics of each group of customers and from that propose appropriate management strategies. Customer profiling (Setiono et al., 1998) provides a basis for enterprises to offer customers better services and retain good customers. Customer profiling is done by assembling collected information on customers and their potential behavior. We first used a neural network sensitivity analysis (Zurada, Malinowski, & Cloete, 1994) test on all customers to determine whether there are significant differences between customers and to minimize the input variables, then inferred customer profiles with an Apriori association rule inducer.

The data set obtained after data preprocessing contained 32 attributes: 10 character attributes and 22 continuous attributes. Neural network sensitivity analysis was used to retain the relatively important attributes, with repayment behavior and RFM values chosen as the predicted variables for all customers. As recommended by Hornik, Stinchcombe, & White (1989), a network with one hidden layer is sufficient to model a complex system with any desired accuracy, and the employed neural network model has just one hidden layer.

Table 1 lists the distribution of the relative importance for each input variable using the neural network. The sensitivity analysis of the neural network and the order of most significant input variables indicate those variables that

Fig. 4. Customer distribution to repayment behavior.
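The SOM procedure described in Section 3.3 (z-score scaling, selection of the most similar neuron by Euclidean distance, and adjustment of all neurons within a given link distance by an amount controlled by the learning rate) can be sketched as follows. This is a minimal illustration and not the author's implementation: the Manhattan interpretation of link distance, the random weight initialization, the phase settings (epochs, learning rate, radius) and the toy data are all assumptions.

```python
import math
import random


def zscore_columns(rows):
    """Scale each feature to mean 0 and (population) standard deviation 1."""
    n, dims = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(dims)]
    stds = []
    for j in range(dims):
        var = sum((r[j] - means[j]) ** 2 for r in rows) / n
        stds.append(math.sqrt(var) or 1.0)  # guard against constant features
    return [[(r[j] - means[j]) / stds[j] for j in range(dims)] for r in rows]


def best_matching_neuron(weights, x):
    """Index of the neuron whose weight vector is closest to x (Euclidean)."""
    return min(range(len(weights)), key=lambda k: math.dist(weights[k], x))


def link_distance(a, b, cols):
    """Grid steps between neurons a and b (assumed Manhattan distance)."""
    ra, ca = divmod(a, cols)
    rb, cb = divmod(b, cols)
    return abs(ra - rb) + abs(ca - cb)


def train_som(data, rows=4, cols=4,
              phases=((20, 0.5, 2), (50, 0.05, 1)), seed=0):
    """Train a rows x cols SOM; phases are (epochs, learning rate, radius).

    The first phase is a rough estimation phase, the second a tuning phase.
    All phase values here are hypothetical.
    """
    rng = random.Random(seed)
    dims = len(data[0])
    weights = [[rng.uniform(-1, 1) for _ in range(dims)]
               for _ in range(rows * cols)]
    for epochs, lr, radius in phases:
        for _ in range(epochs):
            for x in rng.sample(data, len(data)):  # shuffled pass over the data
                bmu = best_matching_neuron(weights, x)
                for k, w in enumerate(weights):
                    if link_distance(bmu, k, cols) <= radius:
                        for j in range(dims):
                            w[j] += lr * (x[j] - w[j])  # move toward the case
    return weights


# Toy customers as [RA, R, F, M]; the values are made up for illustration.
raw = [
    [1.00, 18.0, 6.0, 3200.0],
    [0.92, 22.0, 5.0, 2800.0],
    [0.50, 35.0, 3.0, 5000.0],
    [0.42, 40.0, 2.0, 4500.0],
    [0.08, 60.0, 4.0, 1500.0],
    [0.00, 65.0, 3.0, 1200.0],
]
scaled = zscore_columns(raw)
som = train_som(scaled)
clusters = [best_matching_neuron(som, x) for x in scaled]
print(clusters)
```

Each customer is assigned to the neuron index returned by `best_matching_neuron`, which plays the role of the cluster label.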
Table 2
Cluster-1 profile
rule inducer. An association rule is considered relevant for decision making if it has support and confidence at least equal to some minimal support and confidence thresholds defined by the user. As shown in Table 2, the set of extracted association rules is usually very large, owing to the presence of a huge proportion of redundant rules conveying the same information. Many of the rules may contain redundant, irrelevant information or describe trivial knowledge. We present interactive strategies for pruning redundant association rules on the basis of an equivalence relation to enhance readability.

Several methods have been proposed in the literature to reduce the number of extracted association rules. Silverstein, Brin, & Motwani (1998) used Pearson's correlation statistic measure in replacement of the confidence measure. Srikant & Agrawal (1995) defined generalized association rules using a taxonomy of the item set. Heckerman (1996) and Silberschatz & Tuzhilin (1996) measured the distance between association rules by evaluating the deviation in the rules' support and confidence. Bayardo, Agrawal, & Gunopulos (1999) used item constraints, which are Boolean expressions defined by the user, to specify the form of association rules. Pasquier, Bastide, Taouil, & Lakhal (1999) adapted the Duquenne-Guigues basis for global implications, and the proper basis for partial implications, to the framework of association rules. Klemettinen, Mannila, Ronkainen, Toivonen, & Verkamo (1994) simplified a relatively significant number of
Table 3
The redundant-free cluster profile of cluster-1 (merged)
Table 4
The redundant-free cluster profile of cluster-2 (merged)
association rules via the visualization technique. Bastide, Pasquier, Taouil, Stumme, & Lakhal (2000) used the Galois connection as a basis to discover minimal non-redundant association rules. Bayardo & Agrawal (1999) proposed A-maximal rules, that is, association rules with maximal antecedents, for which the population of objects concerned is reduced whenever an item is added to the antecedent.

We intended to provide strategies to preserve useful, relevant and non-redundant association rules. Thus, redundant rules, which in certain databases represent the majority of extracted rules, particularly in the case of dense or correlated data for which the total number of valid rules is very large, will be pruned. Using the concept of equivalence class, the redundant rules will be collected in the same equivalence class. Only the most informative non-redundant association rules will be presented to the user, where the union of the antecedents (or consequents) is equal to the union of the antecedents (or consequents) of all the association rules valid in the context. The resulting rules will have minimal antecedents and maximal consequents in the same equivalence class. The extraction of a set of rules without any loss of information will convey all the information in a set of association rules that are all valid according to the context. With this method it is possible to deduce efficiently, without access to the original data set, all valid association rules with their supports and confidences from these bases.

An association rule $X_1 \Rightarrow Y_1$ is redundant-free if and only if there does not exist another association rule $X_2 \Rightarrow Y_2$ such that $X_2 \subseteq X_1$ and $Y_1 \subseteq Y_2$. For example, in Table 2, rule 9 is redundant to rules 3–8, because rule 9 does not convey additional information to the user. Therefore, rule 9 can be removed from the cluster profile. Two types of rule merging principle are illustrated here.

(1) Let $X_1 \Rightarrow Y_1$ $(s_1\%, c_1\%, l_1)$ and $X_2 \Rightarrow Y_2$ $(s_2\%, c_2\%, l_2)$ be two association rules in the same cluster profile, where $X_2 \subseteq X_1$ or $Y_1 \subseteq Y_2$. Then, $X_1 \Rightarrow Y_1$ $(s_1\%, c_1\%, l_1)$ is a redundant association rule and can be directly removed from the cluster profile. For example, Table 3 represents the redundant-free cluster profile of cluster-1 (Table 2). The last field in Table 2, 'R. to Rules', indicates the corresponding redundant association rules.

(2) Let $X_1 \Rightarrow Y_1$ $(s_1\%, c_1\%, l_1)$ and $X_2 \Rightarrow Y_2$ $(s_2\%, c_2\%, l_2)$ be two association rules in different cluster profiles, where $X_2 \subseteq X_1$ or $Y_1 \subseteq Y_2$, and $t_1$, $t_2$ are the numbers of cases representing $X_1 \Rightarrow Y_1$ $(s_1\%, c_1\%, l_1)$ and $X_2 \Rightarrow Y_2$ $(s_2\%, c_2\%, l_2)$, respectively. Then, $X_1 \Rightarrow Y_1$ $(s_1\%, c_1\%, l_1)$ is a redundant association rule and should be removed from
Table 5
The redundant-free cluster profile of cluster-1 and cluster-2 (merged)
the cluster profile. Three judgment standards, support, confidence and lift, of $X_2 \Rightarrow Y_2$ $(s'\%, c'\%, l')$ were updated as:

$$
X_2 \Rightarrow Y_2
\begin{pmatrix}
s' = \dfrac{t_1 \cdot \mathrm{Support}(X_1 \cup Y_1, DB_1) + t_2 \cdot \mathrm{Support}(X_2 \cup Y_2, DB_2)}{t_1 + t_2}, \\[2ex]
c' = \dfrac{t_1 \cdot \mathrm{Support}(X_1 \cup Y_1, DB_1) + t_2 \cdot \mathrm{Support}(X_2 \cup Y_2, DB_2)}{t_1 \, \mathrm{Support}(X_1, DB_1) + t_2 \, \mathrm{Support}(X_2, DB_2)}, \\[2ex]
l' = \dfrac{c'}{\dfrac{t_1 \, \mathrm{Support}(Y_1, DB_1) + t_2 \, \mathrm{Support}(Y_2, DB_2)}{t_1 + t_2}}
\end{pmatrix}.
$$

and marketing strategies can be implemented according to more detailed customer sub-groups.
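The redundancy condition and the update of support, confidence and lift for merged rules can be illustrated with a short sketch. It uses the stricter "and" form of the redundancy definition ($X_2 \subseteq X_1$ and $Y_1 \subseteq Y_2$); the `Rule` class, the function names, the item labels and the numeric supports are hypothetical and serve only as an example.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class Rule:
    """Association rule X => Y over item labels (illustrative type)."""
    antecedent: FrozenSet[str]
    consequent: FrozenSet[str]


def is_redundant(rule: Rule, other: Rule) -> bool:
    """True if rule X1 => Y1 is redundant given other X2 => Y2,
    i.e. X2 is a subset of X1 and Y1 is a subset of Y2."""
    return (
        rule != other
        and other.antecedent <= rule.antecedent
        and rule.consequent <= other.consequent
    )


def prune(rules: List[Rule]) -> List[Rule]:
    """Keep only rules that are not redundant with respect to any other rule."""
    return [r for r in rules if not any(is_redundant(r, o) for o in rules)]


def merge_measures(
    t1: float, sup_xy1: float, sup_x1: float, sup_y1: float,
    t2: float, sup_xy2: float, sup_x2: float, sup_y2: float,
) -> Tuple[float, float, float]:
    """Updated (s', c', l') for a rule merged across two cluster profiles.

    t1, t2 are the case counts; sup_* are supports as fractions in [0, 1].
    """
    total = t1 + t2
    joint = t1 * sup_xy1 + t2 * sup_xy2
    s = joint / total
    c = joint / (t1 * sup_x1 + t2 * sup_x2)
    l = c / ((t1 * sup_y1 + t2 * sup_y2) / total)
    return s, c, l


# Hypothetical item labels, for illustration only.
specific = Rule(frozenset({"revolver", "age_30_40"}), frozenset({"high_spend"}))
general = Rule(frozenset({"revolver"}), frozenset({"high_spend", "gold_card"}))
kept = prune([specific, general])

s, c, l = merge_measures(100, 0.2, 0.4, 0.5, 300, 0.4, 0.5, 0.6)
print(kept == [general], round(s, 3), round(c, 3), round(l, 3))
```

In this example `specific` is pruned because `general` has a smaller antecedent and a larger consequent.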
Johnson, R. A., & Wichern, D. W. (1998). Applied multivariate statistical analysis (4th Ed.). Upper Saddle River, NJ: Prentice-Hall.
Kim, Y. S., & Sohn, S. Y. (2004). Managing loan customers using misclassification patterns of credit scoring model. Expert Systems with Applications, 26, 567–573.
Klemettinen, M., Mannila, H., Ronkainen, P., Toivonen, H., & Verkamo, A. I. (1994). Finding interesting rules from large sets of discovered association rules. Proceedings of CIKM Conference, 401–407.
Kohonen, T. (1995). Self-organizing maps. Berlin: Springer.
Lancher, R. C., Coats, P. K., Shanker, C. S., & Fant, L. F. (1995). A neural network for classifying the financial health of a firm. European Journal of Operational Research, 85(1), 53–65.
Lee, T. S., Chiu, C. C., Lu, C. J., & Chen, I. F. (2002). Credit scoring using the hybrid neural discriminate technique. Expert Systems with Applications, 23, 245–254.
Malhotra, R., & Malhotra, D. K. (2003). Evaluating consumer loans using neural networks. Omega, 31(2), 83–96.
Mazanec, J. A. (1992). Classifying tourists into market segments: a neural network approach. Journal of Travel and Tourism Marketing, 1(1), 39–59.
Morrison, D. F. (1990). Multivariate statistical methods. New York, NY: McGraw-Hill.
Moutinho, L., Davies, F., & Curry, B. (1996). The impact of gender on car buyer satisfaction and loyalty. Journal of Retailing and Consumer Sciences, 3(3), 135–144.
Pasquier, N., Bastide, Y., Taouil, R., & Lakhal, L. (1999). Closed set based discovery of small covers for association rules. Proceedings of BDA Conference, 361–381.
Setiono, R., Thong, J. Y. L., & Yap, C. S. (1998). Symbolic rule extraction from neural networks: an application to identifying organizations adopting IT. Information and Management, 34(2), 91–101.
Sharda, R., & Wilson, R. (1996). Neural network experiments in business failures predication: a review of predictive performance issues. International Journal of Computational Intelligence and Organizations, 1(2), 107–117.
Silberschatz, A., & Tuzhilin, A. (1996). What makes patterns interesting in knowledge discovery systems. IEEE Transactions on Knowledge and Data Engineering, 8(6), 970–974.
Silverstein, C., Brin, S., & Motwani, R. (1998). Beyond market baskets: generalizing association rules to dependence rules. Data Mining and Knowledge Discovery, 2(1), 39–68.
Srikant, R., & Agrawal, R. (1995). Mining generalized association rules. Proceedings of VLDB Conference, 407–419.
Thomas, L. C. (2000). A survey of credit and behavioural scoring: forecasting financial risk of lending to consumers. International Journal of Forecasting, 16, 149–172.
Vellido, A., Lisboa, P. J. G., & Vaughan, J. (1999). Neural networks in business: a survey of applications (1992–1998). Expert Systems with Applications, 17, 51–70.
Zhang, G., Hu, M. Y., Patuwo, B. E., & Indro, D. C. (1999). Artificial neural networks in bankruptcy prediction: general framework and cross-validation analysis. European Journal of Operational Research, 116, 16–32.
Zurada, J. M., Malinowski, A., & Cloete, I. (1994). Sensitivity analysis for minimization of input data dimension for feedforward neural network. IEEE International Symposium on Circuits and Systems, London, May 20–June 3.