BookSlides 12 Case Study Customer Churn
1 Business Understanding
2 Data Understanding
3 Data Preparation
4 Modelling
5 Evaluation
6 Deployment
Business Understanding Data Understanding Data Preparation Modelling Evaluation Deployment
− Sam Walton
[Figure: process cycle diagram; surviving stage labels: Data Preparation, Deployment, Data, Modeling, Evaluation.]
Business Understanding
Data Understanding
Data Preparation
Ross then developed a full data quality report for the analytics
base table (ABT), including a range of data visualizations.
The data quality report is summarized in the following
table.
He first assessed the level of missing values within the
data:
For the continuous features, only AGE stood out, with
11.47% of its values missing. A rate this low could be handled
reasonably easily with imputation, but Ross decided
not to impute at this stage.
For the categorical features, REGIONTYPE and
OCCUPATION both had a significant proportion of missing values:
74% and 47.8% respectively.
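The missing-value check described above can be sketched in a few lines. This is a minimal illustration, not the procedure Ross actually used: the function names, the 40% flagging threshold and the four-row sample are all assumptions made for the example, and the sample values are invented rather than taken from the case study.

```python
from typing import Any, Dict, List, Optional


def missing_rates(rows: List[Dict[str, Optional[Any]]]) -> Dict[str, float]:
    """Return the percentage of missing (None) values for each feature."""
    if not rows:
        return {}
    n = len(rows)
    return {
        feature: 100.0 * sum(1 for r in rows if r.get(feature) is None) / n
        for feature in rows[0].keys()
    }


def flag_features(rates: Dict[str, float], threshold_pct: float = 40.0) -> List[str]:
    """List features whose missing rate exceeds a (hypothetical) threshold."""
    return sorted(f for f, pct in rates.items() if pct > threshold_pct)


# Tiny made-up ABT sample; the values are illustrative only.
sample = [
    {"age": 34, "regionType": None, "occupation": "clerical"},
    {"age": None, "regionType": None, "occupation": None},
    {"age": 51, "regionType": "town", "occupation": None},
    {"age": 27, "regionType": None, "occupation": "student"},
]

rates = missing_rates(sample)
flagged = flag_features(rates)
print(rates)
print(flagged)
```

A feature with a low rate, such as AGE in the case study, is a candidate for imputation, while features flagged by the threshold need a separate decision.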
Modelling
[Figure: a large decision tree (labelled genTree372 in the original) whose leaves predict CHURN or NON-CHURN; only the leaf labels survived extraction.]
[Figure: decision tree with nodes avgOverBundleMins, billAmountChangePct and handsetAge, a split at <1,240.5 / ≥1,240.5, and churn / non-churn leaves.]
Table: The confusion matrix from the test of the AT churn prediction
stratified hold-out test set using the pruned decision tree in Figure 4 [39].

                          Prediction
                      'churn'   'non-churn'   Recall (%)
Target  'churn'         1,058           442        70.53
        'non-churn'       152         1,348        89.86
The confusion matrix shows that this model was more accurate
when classifying instances with the non-churn target level
(recall 89.86%) than with the churn level (recall 70.53%).
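The recall column can be reproduced directly from the four counts in the confusion matrix. The sketch below assumes a nested-dictionary layout (target level, then predicted level) and a helper name chosen for this example; only the counts come from the table.

```python
def recall_by_class(matrix):
    """matrix[target][prediction] holds a count; return recall (%) per target level."""
    return {
        target: 100.0 * row[target] / sum(row.values())
        for target, row in matrix.items()
    }


# Counts from the stratified hold-out test set.
stratified = {
    "churn": {"churn": 1058, "non-churn": 442},
    "non-churn": {"churn": 152, "non-churn": 1348},
}

recalls = recall_by_class(stratified)
for level, pct in recalls.items():
    print(f"{level}: {pct:.2f}%")
```

Rounded to two decimals, the non-churn recall comes out at 89.87 rather than the 89.86 shown in the table, which suggests the table value was truncated rather than rounded.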
Evaluation
Table: The confusion matrix from the test of the AT churn prediction
non-stratified hold-out test set.

                          Prediction
                      'churn'   'non-churn'   Recall (%)
Target  'churn'         1,115           458        70.88
        'non-churn'     1,439        12,878        89.95
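One way to compare the two test sets is to look at the share of churners in each, alongside recall and precision for the churn level. This is an added illustration, not part of the original slides: precision is not reported there, and the helper name and return keys are assumptions. The counts are taken from the two confusion matrices.

```python
def churn_summary(tp, fn, fp, tn):
    """Summarise a 2x2 confusion matrix, treating 'churn' as the positive level.

    tp: churn predicted as churn        fn: churn predicted as non-churn
    fp: non-churn predicted as churn    tn: non-churn predicted as non-churn
    """
    total = tp + fn + fp + tn
    return {
        "churn_share_pct": 100.0 * (tp + fn) / total,
        "recall_pct": 100.0 * tp / (tp + fn),
        "precision_pct": 100.0 * tp / (tp + fp),
    }


stratified = churn_summary(tp=1058, fn=442, fp=152, tn=1348)
non_stratified = churn_summary(tp=1115, fn=458, fp=1439, tn=12878)

for name, summary in [("stratified", stratified), ("non-stratified", non_stratified)]:
    print(name, {k: round(v, 2) for k, v in summary.items()})
```

Recall for the churn level is similar on both sets, while the churn share and the precision differ considerably, because the non-stratified set contains far more non-churn instances.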
Figure: (a) cumulative gain, (b) lift and (c) cumulative lift charts for
the predictions made on the large test data sample. Each chart is
plotted against decile.
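The three quantities plotted in these charts can be computed from ranked predictions as follows. This is a minimal sketch using the usual decile-based definitions: the function name is hypothetical, the 20 scores and labels are synthetic, and the code assumes the number of instances divides evenly into the bins.

```python
def decile_charts(scores, labels, n_bins=10):
    """Return per-decile cumulative gain, lift and cumulative lift.

    scores: predicted churn scores (higher means more likely to churn)
    labels: 1 for churn, 0 for non-churn
    Assumes len(scores) is divisible by n_bins, to keep the sketch short.
    """
    # Rank instances from highest to lowest score and keep only the labels.
    ranked = [y for _, y in sorted(zip(scores, labels), key=lambda p: -p[0])]
    n = len(ranked)
    size = n // n_bins
    total_pos = sum(ranked)
    base_rate = total_pos / n  # share of churners in the whole sample

    cum_gain, lift, cum_lift = [], [], []
    seen_pos = 0
    for b in range(n_bins):
        bin_pos = sum(ranked[b * size:(b + 1) * size])
        seen_pos += bin_pos
        # Share of all churners found in the deciles up to and including this one.
        cum_gain.append(seen_pos / total_pos)
        # Churn rate within this decile relative to the overall churn rate.
        lift.append((bin_pos / size) / base_rate)
        # Churn rate across all deciles so far relative to the overall rate.
        cum_lift.append((seen_pos / ((b + 1) * size)) / base_rate)
    return cum_gain, lift, cum_lift


# Synthetic example: 20 instances, 6 churners, mostly ranked near the top.
scores = [(20 - i) / 20 for i in range(20)]
labels = [1, 1, 1, 1, 1, 0, 1, 0] + [0] * 12

cum_gain, lift, cum_lift = decile_charts(scores, labels)
print([round(g, 2) for g in cum_gain])
print([round(v, 2) for v in lift])
print([round(v, 2) for v in cum_lift])
```

A model that ranks churners well shows cumulative gain rising quickly in the early deciles and lift starting well above 1.0, and cumulative lift always ends at 1.0 in the final decile.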
Figure: A pruned and stunted decision tree built for the Acme
Telephonica churn prediction problem. Surviving node labels:
avgOverBundleMins, billAmountChangePct, handsetAge; split shown:
<1240.5 / ≥1240.5; leaf label: churn.
Deployment