
EFFECTIVE CAMPAIGN MANAGEMENT IN RETAIL BANKING
IES Research Cluster: Economics & Law in Banking and Finance
Tom Jaro, April 4, 2006
Case Study on Consumer Lending

TABLE OF CONTENTS
INTRODUCTION INTO MARKETING
EFFICIENT DIRECT MARKETING
ANALYTICAL TECHNIQUES
CASE STUDY ON CONSUMER LENDING
CONCLUSIONS

ATL MARKETING
Marketing activities, specifically their promotion part, are generally split
into two categories. The first category is called above-the-line promotion,
commonly known by the acronym ATL. ATL covers all promotion
techniques that do not address a specific individual customer. Typical
examples are TV advertising spots, outdoor advertising, and other
media advertising in newspapers, magazines, on the radio, and on the internet.
Most ATL techniques soon became very popular and widely used in
retail banking worldwide. Following the fast-moving consumer goods
(FMCG) industry, retail banks ran ATL marketing campaigns to
promote their brands and image or specific financial products and services.
The retail banking sector currently remains among the industries with the
highest spending on advertising, not far behind
telecommunication and FMCG companies.

BTL MARKETING
The second category is called below-the-line marketing (BTL) or direct
marketing (DM). Unlike ATL approaches, direct marketing always
addresses a specific customer using a contact available in
the company's database, such as a home address, phone number, or email.
Therefore, it is increasingly known as database marketing as well.
Both ATL and direct marketing were adopted to push sales of consumer
banking products. Although all these techniques have gradually become
more sophisticated, the real change happened in the early 2000s when
banks discovered new opportunities in addressing their existing
customers. Specifically, so-called propensity-to-buy modelling and
data-mining techniques became a new phenomenon that significantly
improved the effectiveness and efficiency of direct marketing campaigns.

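LINEAR FUNCTION OF THE EXPECTED CAMPAIGN PROFITABILITY
[Figure: expected campaign profit $\Pi^e$ (vertical axis, with positive and negative regions) against the number of addressed customers $N$ (horizontal axis, starting at 0).]
$$\Pi^e = RR^e \cdot N \cdot \pi_j - N \cdot c_k + \alpha$$
Without the side-effect term, this can be written as
$$\Pi^e = N \left( RR^e \cdot \pi_j - c_k \right)$$
$RR^e$ is the expected response rate to the campaign, $N$ is the total number
of customers addressed in the campaign, $\pi_j$ is the profit that a bank earns
from the sale of a product $j$, $c_k$ is the additional unit cost associated with
the campaign, and $\alpha$ is a side effect of the campaign.
A decision whether a DM campaign is going to be executed should
depend on the expected profitability of the campaign. If no response
model is available, this decision mainly depends on the value of $RR^e$.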
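As a rough illustration of the linear profit function, the sketch below evaluates the expected profit and the break-even response rate $c_k / \pi_j$ (the point at which the profit per addressed customer is zero when the side effect is ignored). All numbers and function names are hypothetical and are not taken from the case study.

```python
def expected_profit(n_customers, response_rate, unit_profit, unit_cost, side_effect=0.0):
    """Expected campaign profit: RR^e * N * pi_j - N * c_k + alpha."""
    return response_rate * n_customers * unit_profit - n_customers * unit_cost + side_effect


def break_even_response_rate(unit_profit, unit_cost):
    """Response rate at which profit per addressed customer is zero (alpha ignored)."""
    return unit_cost / unit_profit


# Hypothetical inputs chosen only for illustration.
profit = expected_profit(n_customers=10_000, response_rate=0.02,
                         unit_profit=100.0, unit_cost=1.5)
threshold = break_even_response_rate(unit_profit=100.0, unit_cost=1.5)
print(profit, threshold)
```

With these assumed inputs the campaign is worth running only if the expected response rate exceeds the printed threshold.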

RANKED INDIVIDUAL AND CUMULATIVE RESPONSE RATES
Retail banks build models that predict the probability that a client will buy
a particular financial product or service in the near future.
Specifically, they estimate the response rate of every client to a particular direct
marketing campaign. Therefore, predictive modelling in marketing is
known as response or propensity-to-buy modelling.
[Figure: ranked individual response rate p(N) and cumulative response rate P(N); x-axis: Percentiles of Clients (0% to 100%), y-axis: Expected Response Rate (0% to 100%).]

LIFT/INDEX CHART
[Figure: lift curve L(N); x-axis: Percentiles of Clients (0% to 100%), y-axis: Lift Value (0.0 to 4.0).]
As the probabilities are ranked, the expected response P(N) at point N
cannot be lower than the total expected response rate $RR^e$, for which all bank
customers ($N_T$) would be addressed in a campaign.
This multiplicative effect of response modelling is called the lift value L(N).
L(N) shows how many times P(N) is higher than $RR^e$ for the N customers with
the highest expected responses.
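The relation between P(N), $RR^e$ and L(N) can be sketched in a few lines, assuming a list of predicted response probabilities is available. The scores below are made up for illustration, and the function name is hypothetical.

```python
def cumulative_response_and_lift(probabilities):
    """Return (P, L): cumulative mean response and lift for the top-N ranked clients."""
    ranked = sorted(probabilities, reverse=True)
    overall_rate = sum(ranked) / len(ranked)  # RR^e: every client is addressed
    cumulative, lifts, running = [], [], 0.0
    for n, p in enumerate(ranked, start=1):
        running += p
        p_n = running / n                      # P(N): mean response of the top N
        cumulative.append(p_n)
        lifts.append(p_n / overall_rate)       # L(N) = P(N) / RR^e
    return cumulative, lifts


# Hypothetical model outputs for eight clients.
scores = [0.9, 0.1, 0.4, 0.05, 0.6, 0.2, 0.05, 0.1]
P, L = cumulative_response_and_lift(scores)
print(L)
```

Because the clients are ranked, the lift never falls below one and equals one when all clients are addressed.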

GAIN CHART
[Figure: gain chart with the random-selection line R(N) and the model gain curve G(N); x-axis: Percentiles of Prospects (0% to 100%), y-axis: Percentiles of Responders (0% to 100%); the labels A, B, and X mark the quantities used in the lift calculation.]
The lift value can alternatively be derived from the gain chart.
R(N) is the expected percentage of responders captured when customers are
chosen at random from the total set of bank customers.
G(N) is the expected percentage of responders captured when customers are
chosen from the total set of customers ranked by their estimated propensity.
L(20%) = (A+B)/B
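A minimal sketch of how the gain curve and the lift derived from it could be computed, assuming predicted propensities are available. Under random selection the share of responders captured equals the share of customers addressed, so the lift at a given percentile is G(N) divided by that percentile. The scores are hypothetical.

```python
def gain_curve(probabilities, fractions):
    """Expected share of all responders captured in the top fraction of ranked clients."""
    ranked = sorted(probabilities, reverse=True)
    total = sum(ranked)
    gains = []
    for f in fractions:
        top_n = round(f * len(ranked))
        gains.append(sum(ranked[:top_n]) / total)
    return gains


# Hypothetical propensities for ten clients.
scores = [0.8, 0.6, 0.5, 0.3, 0.2, 0.2, 0.1, 0.1, 0.1, 0.1]
fractions = [0.2, 0.5, 1.0]
G = gain_curve(scores, fractions)
# Random selection gives R(N) equal to the fraction itself, so lift = G(N) / R(N).
lift_from_gain = [g / f for g, f in zip(G, fractions)]
print(G, lift_from_gain)
```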

NON-LINEAR FUNCTION OF THE EXPECTED CAMPAIGN PROFITABILITY
[Figure: expected profit curves $\Pi^e_1(N)$, $\Pi^e_2(N)$, and $\Pi^e_3(N)$ plotted against the number of addressed customers $N$ (vertical axis $\Pi^e$ with positive and negative regions), with optima marked at $N_1^*$ and $N_2^*$.]
Unlike the original profit function, response modelling makes the
expected profit non-linear. This results purely from the non-linearity of the lift
function L(N), as the other variables are linear or even constant.
The maximum net profit from a particular campaign can be found for a number
of prospects ($N^*$) that is higher than zero but lower than $N_T$.
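To make the interior optimum concrete, the sketch below searches for $N^*$ numerically. The power-law lift curve, the parameter values, and the function names are assumptions made purely for illustration; the source does not specify a functional form for L(N).

```python
def lift(n, n_total, gamma=0.5):
    """Hypothetical lift curve: decreasing in n and equal to 1 when n == n_total."""
    return (n_total / n) ** gamma


def expected_profit(n, n_total, rr_total, unit_profit, unit_cost):
    """Pi^e(N) = N * L(N) * RR^e * pi_j - N * c_k."""
    return n * lift(n, n_total) * rr_total * unit_profit - n * unit_cost


# Hypothetical campaign parameters.
N_TOTAL = 100_000
params = dict(n_total=N_TOTAL, rr_total=0.02, unit_profit=100.0, unit_cost=3.0)

# Brute-force search over every possible campaign size.
n_star = max(range(1, N_TOTAL + 1), key=lambda n: expected_profit(n, **params))
print(n_star, expected_profit(n_star, **params))
```

With these assumed inputs, addressing all customers would lose money, while addressing only the best-ranked prospects is profitable.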
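
PROFIT MAXIMIZATION
$$\Pi^e = N \cdot P(N) \cdot \pi_j - N \cdot c_k$$
$$\Pi^e = N \cdot L(N) \cdot RR_T^e \cdot \pi_j - N \cdot c_k$$
$$\frac{\partial \Pi^e}{\partial N} = L(N) \cdot RR_T^e \cdot \pi_j + N \cdot \frac{\partial L(N)}{\partial N} \cdot RR_T^e \cdot \pi_j - c_k = 0$$
$$L(N) + N \cdot \frac{\partial L(N)}{\partial N} = \frac{c_k}{RR_T^e \cdot \pi_j}; \qquad \frac{\partial^2 \Pi^e}{\partial N^2} \le 0$$
Here $RR_T^e$ is the total expected response rate when all $N_T$ customers are addressed, so that $P(N) = L(N) \cdot RR_T^e$.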
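As a worked example of the first-order condition, assume (purely for illustration, since the source gives no functional form) a power-law lift curve with a hypothetical parameter $0 < \gamma < 1$:

```latex
\begin{align*}
L(N) &= \left(\frac{N_T}{N}\right)^{\gamma}, \qquad L(N_T) = 1, \\
L(N) + N \cdot \frac{\partial L(N)}{\partial N} &= (1-\gamma)\left(\frac{N_T}{N}\right)^{\gamma}
  = \frac{c_k}{RR_T^e \cdot \pi_j}, \\
N^* &= N_T \left(\frac{(1-\gamma) \cdot RR_T^e \cdot \pi_j}{c_k}\right)^{1/\gamma}, \\
\frac{\partial^2 \Pi^e}{\partial N^2} &= -\gamma (1-\gamma)\, N_T^{\gamma}\, N^{-\gamma-1} \cdot RR_T^e \cdot \pi_j < 0 .
\end{align*}
```

Under this assumption the optimum is interior ($N^* < N_T$) whenever $(1-\gamma) \cdot RR_T^e \cdot \pi_j < c_k$. For instance, $\gamma = 0.5$, $RR_T^e = 0.02$, $\pi_j = 100$ and $c_k = 3$ give $N^* = N_T / 9$.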
RESPONSE MODELLING TECHNIQUES
There is no single widely accepted methodological approach to building
predictive models for direct marketing campaigns. However, in retail
banking three main techniques are frequently used in practice.
Unfortunately, none of them can be generally considered the best
technique.

Response modelling techniques are as follows:
- Logistic regressions (LOGIT)
- Decision trees [1] (CRT, CHAID, Exhaustive CHAID, C5.0, etc.)
- Neural networks (not used in this analysis)

[1] They are also known as answer, association, classification, diagnostic, or predictive trees.

LOGISTIC REGRESSION
$$\frac{P_i}{1 - P_i} = \frac{1 + e^{Z_i}}{1 + e^{-Z_i}} = e^{Z_i} \qquad \text{(Odds Ratio)}$$
$$\mathrm{Logit}_i = \ln\left(\frac{P_i}{1 - P_i}\right) = Z_i = \alpha_0 + \sum_{i=1}^{n} \beta_i X_i$$
Logit is, in fact, the result of the linear transformation of the originally
nonlinear relation between the probabilities $P_i$ and the explanatory variables
$X_i$, and between the probabilities $P_i$ and the estimated parameters $\beta_i$.
The interpretation is as follows: $\beta_i$, the slope, measures the change in
$\mathrm{Logit}_i$ for a unit change in $X_i$. The estimates of the conditional probabilities
can easily be obtained from the Logit function:
$$P_i = \frac{e^{\mathrm{Logit}_i(p(x))}}{1 + e^{\mathrm{Logit}_i(p(x))}}$$
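The transformation between the linear predictor and the probability can be sketched as follows. The intercept, coefficients, and inputs are hypothetical values chosen only to show the round trip between Z and P.

```python
import math


def linear_predictor(alpha, betas, xs):
    """Z = alpha_0 + sum of beta_i * X_i."""
    return alpha + sum(b * x for b, x in zip(betas, xs))


def probability_from_z(z):
    """P = e^Z / (1 + e^Z), written in the equivalent form 1 / (1 + e^-Z)."""
    return 1.0 / (1.0 + math.exp(-z))


def logit(p):
    """Logit = ln(P / (1 - P)), the inverse of probability_from_z."""
    return math.log(p / (1.0 - p))


# Hypothetical intercept, slopes, and explanatory values.
z = linear_predictor(alpha=-1.0, betas=[0.5, -0.25], xs=[2.0, 4.0])
p = probability_from_z(z)
print(z, p, logit(p))
```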

DECISION TREES: CHAID
Chi-squared Automatic Interaction Detector (CHAID), developed by Kass
(1980), is probably the most popular decision tree method of all. It works with
both continuous and categorical variables. However, continuous
predictor (independent) variables are automatically binned into ordinal
classes for the purpose of the analysis.
Using the significance of a statistical test as a criterion, CHAID evaluates
all of the values or classes of a predictor variable ($X_i$) with respect to the
target variable (Y). The choice of the statistical test depends fully upon the
type of the target variable. If the target variable is continuous, an F-test
is used. If the target variable is ordinal or nominal [1], the
likelihood-ratio test is used. The level of statistical significance ($\alpha$) is
defined in advance.

[1] If the target variable is nominal, the Pearson chi-squared test can be used as well.
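For a nominal target, the kind of test CHAID relies on can be illustrated with a plain Pearson chi-squared statistic on a contingency table of predictor classes against the target. This is only a sketch of the test statistic, not of the full CHAID merging and splitting procedure, and the counts are invented.

```python
def pearson_chi_squared(table):
    """Pearson chi-squared statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Hypothetical counts: rows are two classes of a predictor,
# columns are non-buyers and buyers of the product.
table = [[30, 70], [60, 40]]
statistic, dof = pearson_chi_squared(table)

CRITICAL_VALUE_5PCT_DF1 = 3.841  # chi-squared critical value for alpha = 0.05, 1 degree of freedom
print(statistic, dof, statistic > CRITICAL_VALUE_5PCT_DF1)
```

A statistic above the critical value suggests that the two predictor classes differ with respect to the target and should not be merged.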

DECISION TREES: Exhaustive CHAID
Exhaustive CHAID was developed by Biggs, de Ville, and Suen (1991)
to address some of the weaknesses of the original CHAID method.
In particular, CHAID may sometimes fail to find an optimal solution
as it stops merging the values/classes of the predictor variable ($X_i$)
into groups as soon as it finds that all remaining groups are statistically
heterogeneous with respect to the target variable (Y). Exhaustive CHAID
remedies this by continuing to merge the values/classes of the predictor
variable ($X_i$) until only two supergroups are left.
However, in many cases the results of Exhaustive CHAID do not differ significantly from
the predictive results obtained from the standard CHAID.


DECISION TREES: CRT
The C&RT (CRT) acronym stands for Classification and Regression Trees,
developed by Breiman, Friedman, Olshen, and Stone (1984).
Unlike both CHAID methods, it is a binary tree growing method that
always splits the data into two subsets only. This means that each
(parent) node of the tree, regardless of which branch level it is on, can be
further split into two (child) nodes. This split is executed
only if the child nodes are more homogeneous (purer) than the parent node
based on so-called impurity measures. In a completely pure node, all of the
cases have exactly the same value for the target variable (Y).
There are four impurity measures used by C&RT for splitting, depending
on the type of the target variable (Y). If the target variable is categorical,
Gini, so-called twoing, or (for ordinal variables) ordered twoing is
used. If the target variable is continuous, the least-squared deviation
(LSD) is commonly used.
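The Gini measure for a categorical target, and the improvement a binary split brings, can be sketched as below. The labels are hypothetical, and the code illustrates only the impurity calculation, not the full tree-growing algorithm.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class shares (0 for a pure node)."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_improvement(parent, left, right):
    """Decrease in impurity when the parent node is split into two child nodes."""
    n = len(parent)
    weighted_children = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(parent) - weighted_children


# Hypothetical node: 1 = bought a loan, 0 = did not.
parent = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
left = [1, 1, 1, 1, 0]
right = [1, 0, 0, 0, 0]
print(gini(parent), gini_improvement(parent, left, right))
```

A split is worth making only when the improvement is positive, that is, when the child nodes are purer than the parent.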

RESEARCH OBJECTIVE
We will apply logistic regressions and selected decision tree methods
to a sample of retail banking customers of an anonymous bank that
addressed its customers in a particular consumer lending campaign
in the 2nd half of 2005.
We will try to identify key differentiators between the group of customers
that responded to this campaign, i.e. bought a consumer loan afterwards,
and the group of customers that did not.
We will also predict their hypothetical likelihood of buying a consumer
loan as if the campaign had not been executed. Then we will compare the
predictions of the individual modelling methods.

DATA & VARIABLES
The whole data set (roughly 19,500 cases) is randomly split with the aid of a Bernoulli
procedure into training and testing groups that are fixed for all analyses.
Both data sets are well balanced, as clients that bought a consumer loan
represent nearly 50% of cases in both groups.
The dependent variable (CLOAN) is defined as ownership of a consumer
loan that was bought in the second half of 2005 in one of the direct mailing
campaigns of this bank. This is a binary variable that has a value of one
for retail customers with a consumer loan and a value of zero for others.
15 independent variables are selected for the predictive analysis to
estimate the likelihood of buying a consumer loan. They are chosen
from the original set of 90 variables based on business presumptions,
descriptive statistics, or bivariate correlations. Nevertheless, these 15
variables cover most of the areas represented in the original set of variables.
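A Bernoulli split of this kind can be sketched as follows: each case is assigned to the training group independently with a fixed probability, and a fixed seed keeps the two groups identical across analyses. The probability of 0.5, the seed, and the use of 19,541 case identifiers (the N reported in the correlation tables) are assumptions made for illustration.

```python
import random


def bernoulli_split(case_ids, p_train=0.5, seed=2006):
    """Assign each case to training with probability p_train, otherwise to testing."""
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    training, testing = [], []
    for case_id in case_ids:
        if rng.random() < p_train:
            training.append(case_id)
        else:
            testing.append(case_id)
    return training, testing


# Hypothetical case identifiers 0..19540.
train_ids, test_ids = bernoulli_split(range(19_541))
print(len(train_ids), len(test_ids))
```

Unlike a split into two halves of exactly equal size, the group sizes here are themselves random, which is why the training and testing groups need not be the same size.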

PRELIMINARY HYPOTHESES (EXAMPLES)
We expect that younger people are more likely to buy a consumer loan.
Retail customers who have been with the bank longer (TENURE) are mostly older
and thus less willing to buy a consumer loan because of their age.
Customers with a lower number of adjusted products (PRODUCTS_N3)
are assumed to be more likely to respond to the consumer lending campaign.
Customers with SAVINGS_MT products are less likely to buy a consumer
loan because they may use their medium-term savings to finance their
extra consumption.
We suppose that clients with a higher number of outgoing payment
transactions (TRX_OUT_AVG) are more likely to buy a consumer loan
than others because they are more active.
We expect a negative impact on the likelihood of buying a consumer loan
for both credit card (CCRD) and overdraft (OVD) holders, etc.

SUMMARY OF PRELIMINARY HYPOTHESES (SIGNS)
Selected Independent Variable   Expected impact on the propensity to buy a consumer loan
AGE                             -
TENURE                          -
PRODUCTS_N3                     -
SAVINGS_MT                      -
VOL_SAVINGS_AVG                 -
VOL_LOANS_AVG                   +
PB                              +
TRX_OUT_AVG                     +
CATO_CRED_AVG                   +
TOTORATIO2                      +
TO-RATIO9                       +
CA_LCY_N                        +
CA_FCY_N                        +/-
CCRD                            -
OVD                             -
We expect that all of these variables could be found
statistically significant in the logistic regression.
Concerning decision trees, we presume that
VOL_SAVINGS_AVG, CATO_CRED_AVG, AGE,
and OVD should play a role.
Alternatively, the role of CATO_CRED_AVG
could be taken over by one of the credit turnover ratios
such as TOTORATIO2 or TO-RATIO9.

CORRELATION MATRIX OF INDEPENDENT VARIABLES
Pearson correlations (lower triangle shown; the matrix is symmetric)
                     (1)      (2)      (3)      (4)      (5)      (6)      (7)      (8)      (9)      (10)     (11)     (12)     (13)     (14)     (15)     (16)
(1) CLOAN            1
(2) Age              -,200**  1
(3) Tenure           -,199**  ,193**   1
(4) Products_N3      -,171**  ,117**   ,221**   1
(5) SAVINGS_MT       -,412**  ,395**   ,185**   ,427**   1
(6) PB               ,169**   -,361**  -,014*   ,096**   -,246**  1
(7) Trx_OUT_AVG      ,328**   -,214**  ,091**   ,318**   -,192**  ,392**   1
(8) CATO_Cred_AVG    -,012    -,012    ,054**   ,155**   ,033**   ,101**   ,185**   1
(9) VOL_Savings_AVG  -,139**  ,094**   ,069**   ,207**   ,232**   -,034**  ,042**   ,492**   1
(10) VOL_Loans_AVG   ,245**   -,049**  -,042**  ,040**   -,093**  ,063**   ,146**   ,394**   ,124**   1
(11) CA_LCY_N        ,311**   -,336**  -,004    ,105**   -,473**  ,360**   ,315**   ,098**   -,030**  ,104**   1
(12) CA_FCY_N        -,168**  ,078**   ,048**   ,397**   ,060**   -,046**  -,035**  ,158**   ,180**   ,033**   -,142**  1
(13) CCRD            -,130**  -,037**  ,074**   ,292**   ,019**   ,145**   ,186**   ,110**   ,051**   ,022**   ,058**   ,017*    1
(14) OVD             ,208**   -,185**  ,057**   ,474**   -,182**  ,303**   ,598**   ,045**   -,061**  ,052**   ,235**   -,080**  ,102**   1
(15) TO-RATIO9       ,020**   -,007    -,012    ,004     -,010    -,009    ,001     -,002    -,003    ,005     ,022**   -,005    -,003    ,005     1
(16) TOTORATIO2      ,163**   -,010    -,077**  -,090**  -,019**  -,029**  -,059**  ,001     -,007    ,051**   ,039**   -,127**  -,016*   -,046**  ,009     1
**. Correlation is significant at the 0.01 level (2-tailed).
*. Correlation is significant at the 0.05 level (2-tailed).
N = 19541 for every pair, except for all entries involving TO-RATIO9, where N = 17962.
Sig. (2-tailed) = ,000 for every pair except the following:
Tenure with PB: ,046; Tenure with CA_LCY_N: ,582
CATO_Cred_AVG with CLOAN: ,100; CATO_Cred_AVG with Age: ,101
CCRD with SAVINGS_MT: ,006; CCRD with VOL_Loans_AVG: ,002; CCRD with CA_FCY_N: ,017
TO-RATIO9 with CLOAN: ,008; Age: ,318; Tenure: ,109; Products_N3: ,570; SAVINGS_MT: ,163; PB: ,207; Trx_OUT_AVG: ,912; CATO_Cred_AVG: ,837; VOL_Savings_AVG: ,641; VOL_Loans_AVG: ,528; CA_LCY_N: ,004; CA_FCY_N: ,540; CCRD: ,681; OVD: ,466
TOTORATIO2 with Age: ,157; SAVINGS_MT: ,007; CATO_Cred_AVG: ,931; VOL_Savings_AVG: ,349; CCRD: ,028; TO-RATIO9: ,234
LOGISTIC REGRESSION - RESULTS (1/3)
ENTER METHOD
B S.E. Wald df Sig. Exp(B)
Age 0,012663 0,002379 28,34 1 0,000 1,012744
Tenure -0,010148 0,000967 110,10 1 0,000 0,989903
Products_N3 -0,999430 0,117189 72,73 1 0,000 0,368089
SAVINGS_MT -0,103329 0,177423 0,34 1 0,560 0,901830
PB -0,092897 0,063670 2,13 1 0,145 0,911287
Trx_OUT_AVG 0,087341 0,004744 339,00 1 0,000 1,091269
CATO_Cred_AVG 0,000003 0,000001 9,68 1 0,002 1,000003
VOL_Savings_AVG -0,000011 0,000001 289,99 1 0,000 0,999989
VOL_Loans_AVG 0,000035 0,000001 665,39 1 0,000 1,000035
CA_LCY_N 1,562598 0,129196 146,28 1 0,000 4,771201
CA_FCY_N 0,389995 0,181779 4,60 1 0,032 1,476973
CCRD -2,797469 0,315198 78,77 1 0,000 0,060964
OVD 0,650219 0,144527 20,24 1 0,000 1,915960
TORATIO9 0,000013 0,000076 0,03 1 0,867 1,000013
TOTORATIO2 1,707649 0,149284 130,85 1 0,000 5,515977
Constant -3,028574 0,198100 233,73 1 0,000 0,048385
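The Exp(B) and Wald columns follow directly from B and S.E.: Exp(B) is the exponential of the coefficient, and the Wald statistic for one degree of freedom is the squared ratio of the coefficient to its standard error. The sketch below recomputes two rows of the table above (decimal commas converted to points); small differences arise because the published inputs are rounded.

```python
import math


def odds_ratio(b):
    """Exp(B): multiplicative change in the odds for a unit change in the variable."""
    return math.exp(b)


def wald_statistic(b, standard_error):
    """Wald chi-squared statistic with one degree of freedom: (B / S.E.)^2."""
    return (b / standard_error) ** 2


# (B, S.E.) pairs copied from the ENTER method table.
rows = {"Age": (0.012663, 0.002379), "CCRD": (-2.797469, 0.315198)}
results = {name: (odds_ratio(b), wald_statistic(b, se)) for name, (b, se) in rows.items()}
for name, (exp_b, wald) in results.items():
    print(name, round(exp_b, 6), round(wald, 2))
```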

LOGISTIC REGRESSION - RESULTS (2/3)
FORWARD STEPWISE
Classification table (Step 1)
Sample          Observed CLOAN      Predicted no  Predicted yes  Percentage Correct
Training Group  no                  4650          589            88,8
                yes                 826           3733           81,9
                Overall Percentage                               85,6
Testing Group   no                  4671          619            88,3
                yes                 812           3641           81,8
                Overall Percentage                               85,3
B S.E. Wald df Sig. Exp(B)
Tenure -0,011593 0,000901 165,58 1 0,000 0,988473
Products_N3 -1,547388 0,063964 585,24 1 0,000 0,212803
Trx_OUT_AVG 0,075051 0,004004 351,34 1 0,000 1,077939
VOL_Loans_AVG 0,000037 0,000001 697,97 1 0,000 1,000037
CA_LCY_N 1,949593 0,094444 426,13 1 0,000 7,025829
OVD 1,384275 0,094706 213,64 1 0,000 3,991929
TOTORATIO2 1,712511 0,142199 145,04 1 0,000 5,542865
Constant -3,102950 0,140556 487,36 1 0,000 0,044916
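The percentages in the classification table on this slide can be re-derived from the four cell counts of each group. A minimal sketch, with function and key names chosen only for illustration:

```python
def classification_rates(true_no, false_yes, false_no, true_yes):
    """Share of correctly classified cases per observed class and overall."""
    total = true_no + false_yes + false_no + true_yes
    return {
        "no_correct": true_no / (true_no + false_yes),
        "yes_correct": true_yes / (false_no + true_yes),
        "overall": (true_no + true_yes) / total,
    }


# Cell counts taken from the classification table (observed rows, predicted columns).
training = classification_rates(true_no=4650, false_yes=589, false_no=826, true_yes=3733)
testing = classification_rates(true_no=4671, false_yes=619, false_no=812, true_yes=3641)
print({k: round(100 * v, 1) for k, v in training.items()})
print({k: round(100 * v, 1) for k, v in testing.items()})
```

The similar rates in both groups indicate that the model does not perform materially worse on cases it was not estimated on.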

LOGISTIC REGRESSION (3/3)
STATISTICAL TESTS

Coefficient of Determination (Nagelkerke is similar to the standard $R^2$)
Step  -2 Log likelihood  Cox & Snell R Square  Nagelkerke R Square
1     8245,778 (a)       ,417                  ,557
a. Estimation terminated at iteration number 6 because parameter estimates changed by less than ,001.

Omnibus Tests of Model Coefficients (F-test equivalent)
Step 1  Chi-square  df  Sig.
Step    5289,903    7   ,000
Block   5289,903    7   ,000
Model   5289,903    7   ,000

Hosmer-Lemeshow Test (it should be insignificant)
Step  Chi-square  df  Sig.
1     390,912     8   ,000
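The two pseudo-R² values can be reproduced from the model chi-square and the -2 log likelihood using the standard Cox & Snell and Nagelkerke formulas. The sample size of 9,798 is an assumption: it is the size of the training group implied by the classification tables, and the source does not state which sample these statistics were computed on.

```python
import math


def cox_snell_r2(model_chi_square, n):
    """Cox & Snell R^2 = 1 - exp(-chi_square / n)."""
    return 1.0 - math.exp(-model_chi_square / n)


def nagelkerke_r2(model_chi_square, neg2ll_model, n):
    """Cox & Snell R^2 rescaled by its maximum attainable value."""
    neg2ll_null = neg2ll_model + model_chi_square  # -2LL of the intercept-only model
    max_r2 = 1.0 - math.exp(-neg2ll_null / n)
    return cox_snell_r2(model_chi_square, n) / max_r2


N_TRAIN = 9798  # assumption: training group size from the classification tables
cs = cox_snell_r2(5289.903, N_TRAIN)
nk = nagelkerke_r2(5289.903, 8245.778, N_TRAIN)
print(round(cs, 3), round(nk, 3))
```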

DECISION TREES: CRT - RESULTS (1/3)
Classification (Growing Method: CRT; Dependent Variable: CLOAN)
Sample    Observed            Predicted no  Predicted yes  Percent Correct
Training  no                  4862          376            92,8%
          yes                 294           4266           93,6%
          Overall Percentage  52,6%         47,4%          93,2%
Test      no                  4898          393            92,6%
          yes                 307           4145           93,1%
          Overall Percentage  53,4%         46,6%          92,8%

DECISION TREES: CRT - RESULTS (2/3)
Independent Variable Importance (Growing Method: CRT; Dependent Variable: CLOAN)
Independent Variable  Importance  Normalized Importance
VOL_Loans_AVG         ,297        100,0%
TO-RATIO9             ,163        54,8%
Trx_OUT_AVG           ,112        37,7%
VOL_Savings_AVG       ,088        29,5%
TOTORATIO2            ,081        27,4%
Tenure                ,051        17,1%
CATO_Cred_AVG         ,037        12,5%
OVD                   ,033        11,1%
CA_LCY_N              ,016        5,5%
SAVINGS_MT            ,012        4,1%
CCRD                  ,009        3,1%
PB                    ,009        3,0%
Age                   ,003        1,1%
Products_N3           ,001        ,2%

DECISION TREES: CRT GAIN EXAMPLE (3/3)
Gains for Nodes (Growing Method: CRT; Dependent Variable: CLOAN)
                | Node-by-Node                                          | Cumulative
Sample    Node  | Node N  Node %  Gain N  Gain %  Response  Index       | Node N  Node %  Gain N  Gain %  Response  Index
Training  5     | 3584    36,6%   3467    76,0%   96,7%     207,9%      | 3584    36,6%   3467    76,0%   96,7%     207,9%
          10    | 779     8,0%    617     13,5%   79,2%     170,2%      | 4363    44,5%   4084    89,6%   93,6%     201,1%
          14    | 73      ,7%     50      1,1%    68,5%     147,2%      | 4436    45,3%   4134    90,7%   93,2%     200,2%
          15    | 206     2,1%    132     2,9%    64,1%     137,7%      | 4642    47,4%   4266    93,6%   91,9%     197,5%
          6     | 96      1,0%    30      ,7%     31,3%     67,1%       | 4738    48,4%   4296    94,2%   90,7%     194,8%
          9     | 255     2,6%    41      ,9%     16,1%     34,5%       | 4993    51,0%   4337    95,1%   86,9%     186,6%
          13    | 1169    11,9%   107     2,3%    9,2%      19,7%       | 6162    62,9%   4444    97,5%   72,1%     155,0%
          16    | 159     1,6%    9       ,2%     5,7%      12,2%       | 6321    64,5%   4453    97,7%   70,4%     151,4%
          3     | 3477    35,5%   107     2,3%    3,1%      6,6%        | 9798    100,0%  4560    100,0%  46,5%     100,0%
Test      5     | 3547    36,4%   3417    76,8%   96,3%     210,8%      | 3547    36,4%   3417    76,8%   96,3%     210,8%
          10    | 748     7,7%    583     13,1%   77,9%     170,6%      | 4295    44,1%   4000    89,8%   93,1%     203,8%
          14    | 73      ,7%     38      ,9%     52,1%     113,9%      | 4368    44,8%   4038    90,7%   92,4%     202,3%
          15    | 170     1,7%    107     2,4%    62,9%     137,7%      | 4538    46,6%   4145    93,1%   91,3%     199,9%
          6     | 103     1,1%    26      ,6%     25,2%     55,2%       | 4641    47,6%   4171    93,7%   89,9%     196,7%
          9     | 272     2,8%    46      1,0%    16,9%     37,0%       | 4913    50,4%   4217    94,7%   85,8%     187,8%
          13    | 1172    12,0%   115     2,6%    9,8%      21,5%       | 6085    62,5%   4332    97,3%   71,2%     155,8%
          16    | 154     1,6%    1       ,0%     ,6%       1,4%        | 6239    64,0%   4333    97,3%   69,5%     152,0%
          3     | 3504    36,0%   119     2,7%    3,4%      7,4%        | 9743    100,0%  4452    100,0%  45,7%     100,0%
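The Gain, Response, and Index columns of the gains table can be recomputed from the node counts: the response is the share of buyers within the node, the gain is the node's share of all buyers, and the index is the node response relative to the overall response rate (that is, the lift expressed in percent). The sketch below reproduces the first training node; the function name is illustrative.

```python
def node_gain(node_n, node_responders, total_n, total_responders):
    """Gain, response, and index (all in percent) for one terminal node."""
    response = node_responders / node_n
    overall_response = total_responders / total_n
    return {
        "gain_pct": 100 * node_responders / total_responders,
        "response_pct": 100 * response,
        "index_pct": 100 * response / overall_response,
    }


# Training sample, node 5: 3584 cases with 3467 buyers,
# out of 9798 cases and 4560 buyers in total.
node5 = node_gain(node_n=3584, node_responders=3467, total_n=9798, total_responders=4560)
print({key: round(value, 1) for key, value in node5.items()})
```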

DECISION TREES: CHAID & EXHAUSTIVE CHAID (1/2)
Classification (Growing Method: CHAID; Dependent Variable: CLOAN)
Sample    Observed            Predicted no  Predicted yes  Percent Correct
Training  no                  4846          393            92,5%
          yes                 400           4159           91,2%
          Overall Percentage  53,5%         46,5%          91,9%
Test      no                  4906          384            92,7%
          yes                 418           4035           90,6%
          Overall Percentage  54,6%         45,4%          91,8%

Classification (Growing Method: EXHAUSTIVE CHAID; Dependent Variable: CLOAN)
Sample    Observed            Predicted no  Predicted yes  Percent Correct
Training  no                  4858          381            92,7%
          yes                 414           4145           90,9%
          Overall Percentage  53,8%         46,2%          91,9%
Test      no                  4913          377            92,9%
          yes                 444           4009           90,0%
          Overall Percentage  55,0%         45,0%          91,6%

Both CHAID and Exhaustive CHAID give very similar results.

DECISION TREES: CHAID & EXHAUSTIVE CHAID (2/2)
Even a three-level growing rule gives a very wide decision tree structure
with 58 terminal nodes. Key growing variables are:
VOL_LOANS_AVG (1st level),
TORATIO-9, OVD, PRODUCTS_N3 (all at the 2nd level).
CONCLUSIONS (1/4)
LOGISTIC REGRESSIONS
The logistic regression shows that 12 of the 15 selected independent
variables are statistically significant. Compared with our preliminary
expectations, OVD proves to be a very important factor that may influence
the propensity of customers to buy a consumer loan. Based on these
results, overdraft ownership increases the odds of selling a consumer
loan three times.
In contrast, AGE and VOL_SAVINGS_AVG are not found important
in the forward stepwise procedure at all. As was also partly presumed,
credit turnover (CATO_CRED_AVG) is replaced by a credit turnover ratio
(TOTORATIO2), which is to some extent a substitute for it.
However, the unsatisfactory results of the Hosmer-Lemeshow test presented
in the previous part of this chapter confirm that using logistic
regressions for response modelling purposes can be very difficult.

CONCLUSIONS (2/4)
DECISION TREES
Quite surprisingly, loan volumes (VOL_LOANS_AVG) play the most
important role when trees are grown, in all methods.
The explanation is likely the same as for OVD, which is significant in the
logistic regression. Most clients with higher loan volumes already use
overdraft or credit card facilities. Thus, a switch to consumer loans may
help them to optimize their debts, i.e. reduce their interest expenses.
In all trees, credit turnover ratios also play a significant role, as was
originally expected. The high influence of PRODUCTS_N3 and OVD was
presumed as well.
Concerning the comparison between individual decision tree methods,
CRT gives, quite surprisingly, slightly better predictive results than both CHAID methods
even though it is limited by a binary growing procedure.

CONCLUSIONS (3/4)
DECISION TREES
Decision trees show a higher percentage of correct predictions than logistic
regressions. However, decision trees are commonly less
stable in their predictions than logistic regressions, as their results may
easily change when new cases are added.
Nevertheless, the predictive results of all logistic regression and decision tree
methods are quite highly correlated. The correlation coefficient
between logistic regression results and decision tree results lies in the
range of 0.77-0.82.
Bivariate correlations between individual decision tree methods are
even higher, i.e. in the range of 0.88-0.99 (see the following page).

CONCLUSIONS (4/4)
COMPARISON OF PREDICTED RESULTS
Pearson correlations between predicted probabilities (lower triangle shown; the matrix is symmetric)
                            (1)      (2)      (3)      (4)      (5)      (6)      (7)
(1) CRT1_PROB               1
(2) CRT2_PROB               ,926**   1
(3) CHAID_PROB              ,920**   ,886**   1
(4) EXCHAID_PROB            ,918**   ,881**   ,992**   1
(5) LREG1_PROB_ENTER_ALL    ,821**   ,811**   ,818**   ,822**   1
(6) LRER2_PROB_ENTER_ELIM   ,821**   ,811**   ,818**   ,822**   1,000**  1
(7) LREG3_PROB_ENTER_FINAL  ,796**   ,772**   ,797**   ,803**   ,955**   ,955**   1
**. Correlation is significant at the 0.01 level (2-tailed).
Sig. (2-tailed) = ,000 and N = 19541 for every pair.
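A comparison of this kind amounts to computing the Pearson correlation between the probabilities predicted by two methods for the same customers. The sketch below shows the calculation on two short, invented vectors of predictions; it does not use the case-study data.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


# Hypothetical predicted probabilities for the same five customers.
tree_prob = [0.95, 0.80, 0.10, 0.05, 0.60]
logit_prob = [0.90, 0.70, 0.20, 0.10, 0.40]
r = pearson(tree_prob, logit_prob)
print(round(r, 3))
```

A high coefficient means the two methods rank customers similarly, even if their individual probability estimates differ.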
