Conf Chunk Predict Ieee
Abstract—Customer churn is one of the most challenging and demanding issues in the telecom sector. The primary motivation of businesses at present is not only to acquire new customers but also to retain existing ones. In fact, customer retention is more important because of the associated high costs. The present work has been carried out in a churn prediction modeling context and benchmarks four machine learning techniques against a publicly available telecommunication dataset. The results provide two important conclusions: i) the Random Forest technique outperforms other basic classification models and ii) feature engineering plays a critical role in the performance of the model.

Supervised Learning trains a model using known input and output data. The data is labeled, and these labels provide the basis for predicting outputs on new data. Unsupervised Learning is employed if the data is unlabeled. It finds hidden patterns or structures in the input data using statistical means to predict churns.

The present study applies four classical machine learning algorithms (SVM, Logistic Regression, k-NN and Random Forests) to predict churn on a publicly available dataset [3] in the telecommunication domain.
IEEE
International Telecommunication Networks and Applications Conference (ITNAC)
TABLE I. SUMMARY OF CHURN PREDICTION INITIATIVES, DATASETS AND TECHNIQUES USED AND THEIR OUTCOMES
| Authors | Techniques | Year | Dataset | Outcome |
|---|---|---|---|---|
| W. Au, K.C.C. Chan, X. Yao [5] | Data Mining by Evolutionary Learning, Decision tree (C4.5), Neural Network | 2003 | Malaysian subscriber database of wireless telecom industry | They were able to discover rules very effectively and predicted churn in the telecom data accurately. |
| S.-Y. Hung, D.C. Yen, H.-Y. Wang [6] | Decision Tree and Neural Network | 2006 | Taiwan telecom company's dataset | Both DT and NN techniques can deliver accurately, while BPN performance is better than DT without segmentation. |
| R.J. Jadhav, U.T. Pawar [7] | Back Propagation Neural Network algorithm | 2011 | Data from an in-house customer database, proprietary call records from the company and a research survey | Customers who are at risk of churning are predicted. |
| A. Sharma, P. Prabin Kumar [8] | Artificial Neural Network | 2011 | Telecom Dataset, UCI Repository, University of California, Irvine | Accuracy obtained by the Artificial Neural Network based model is 92%. |
| H. Abbasimehr [9] | Adaptive Neuro-fuzzy Inference System (ANFIS) | 2011 | Telecom Dataset, UCI Repository, University of California, Irvine | Neuro-Fuzzy performed better than C4.5 and RIPPER in terms of accuracy, sensitivity and specificity. |
| E. Shaaban, Y. Helmy, A. Khedr, M. Nasr [10] | Decision tree, neural network, and SVM | 2012 | Dataset obtained from an anonymous mobile service provider | Accuracy of Neural Network is 83.7%, SVM is 83.7% and Decision Tree is 77.9%. |
| I. Brandusoiu, G. Toderean [11] | Support Vector Machine (SVM) algorithm | 2013 | Telecom Dataset, UCI Repository, University of California, Irvine | Accuracy of the SVM based model is 88.56%. |
| K. Kim, C.-H. Jun, J. Lee [12] | Logistic regression and Multilayer perceptron neural networks | 2014 | Dataset containing customers' personal information and CDR data | An efficient approach is developed using SPA (as propagation process). |
| G. Olle [13] | Logistic Regression, Voted perceptron | 2014 | Asian mobile telecom operator dataset | A hybrid learning model is developed to predict churn. |
| T. Vafeiadis, K.I. Diamantaras, G. Sarigiannidis, K.C. Chatzisavvas [14] | SVM, Decision tree, Artificial Neural Network, Naive Bayes, Regression Analysis, boosting | 2015 | Telecom Dataset, UCI Repository, University of California, Irvine | SVM-POLY classifier is the best, using AdaBoost. |
b) Data Transformation
Feature scaling may or may not have a significant effect on the results, depending heavily on the algorithm being used. Features with larger magnitudes weigh more in calculations than features with smaller magnitudes. To suppress this effect, we brought all the features to roughly the same level of magnitude. Feature scaling is performed on features which are continuous in nature. The Voice Mail Plan, International Plan and Customer Service Calls features are exempted from the standard scaling procedure because Voice Mail Plan and International Plan are categorical, with only 0 and 1 as categories, and the Customer Service Calls feature has values spread over a range of 0–7, with the values 6 and 7 occurring very rarely.

Features which have the strongest relationship with the output variable are selected using statistical tests. The scikit-learn library provides the SelectKBest class, which is used to select the 'k' features with the highest ANOVA F-value scores.
b) Dimensionality Reduction
PCA has been used for noise filtering and feature extraction. scikit-learn's built-in implementation of PCA has been used to reduce the dimensionality of the data.
c) Oversampling (SMOTE)
SMOTE is used to oversample the churner class (here, the positive class). The SMOTE module from the 'imblearn' library is used because the positive class is under-represented.
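As a rough illustration of two of the preprocessing steps described here, the sketch below standardizes only the continuous columns and then generates synthetic minority samples by interpolating between a sample and one of its nearest minority neighbours, which is the core idea of SMOTE. It is a dependency-free approximation, not the scikit-learn or imblearn code used in the study; the function names `standardize` and `smote_like`, the toy rows, and the choices `k=2` and `seed=0` are assumptions made for illustration.

```python
import math
import random


def standardize(rows, continuous_cols):
    """Z-score only the listed columns; all other columns are left untouched."""
    scaled = [list(r) for r in rows]
    for c in continuous_cols:
        values = [r[c] for r in rows]
        mean = sum(values) / len(values)
        std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
        for r in scaled:
            r[c] = (r[c] - mean) / std if std > 0 else 0.0
    return scaled


def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points, each on the segment between a random
    minority sample and one of its k nearest minority neighbours.
    Assumes the minority set holds at least two samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Hypothetical toy rows: [total_minutes, total_charge, international_plan (0/1)]
rows = [
    [120.0, 20.4, 0],
    [250.0, 42.5, 1],
    [180.0, 30.6, 0],
    [300.0, 51.0, 1],
]
scaled = standardize(rows, continuous_cols=[0, 1])  # the 0/1 flag is exempt

# Hypothetical minority-class (churner) samples with two continuous features.
minority = [[1.0, 1.0], [2.0, 2.0], [3.0, 1.0], [2.0, 1.5]]
synthetic = smote_like(minority, n_new=5)
```

Only continuous features are passed to the interpolation step in this toy example, since interpolating a 0/1 plan flag would produce values that are no longer binary.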
IV. RESULTS
The dataset used has 3,333 customers, of whom 2,850 (85.5%) are non-churners and 483 (14.5%) are churners. The objective is to select the model that yields good classification accuracy with the maximum possible recall (sensitivity towards correctly predicting the churner class).
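To make the two quantities in this objective concrete, the sketch below counts the confusion-matrix cells at a given decision threshold and derives accuracy and recall from them. The helper names and the toy labels and scores are hypothetical; the last lines apply the same formulas to the counts reported in Table V.

```python
def confusion_counts(y_true, scores, threshold=0.5):
    """Return (tn, fp, fn, tp), labelling a sample as churn (1) when its
    score is greater than or equal to the threshold."""
    tn = fp = fn = tp = 0
    for actual, score in zip(y_true, scores):
        predicted = 1 if score >= threshold else 0
        if actual == 1 and predicted == 1:
            tp += 1
        elif actual == 1:
            fn += 1
        elif predicted == 1:
            fp += 1
        else:
            tn += 1
    return tn, fp, fn, tp


def accuracy(tn, fp, fn, tp):
    """Share of all samples that are classified correctly."""
    return (tp + tn) / (tn + fp + fn + tp)


def recall(tn, fp, fn, tp):
    """Share of actual churners that are predicted as churners (sensitivity)."""
    return tp / (tp + fn)


# Hypothetical toy data: four churners (1) followed by four non-churners (0).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.55, 0.3, 0.6, 0.45, 0.35, 0.3]

default_cells = confusion_counts(y_true, scores, threshold=0.5)
lowered_cells = confusion_counts(y_true, scores, threshold=0.25)

# Counts taken from Table V (KNN, threshold 0.5): TN=659, FP=196, FN=97, TP=758.
table_v_accuracy = accuracy(659, 196, 97, 758)
table_v_recall = recall(659, 196, 97, 758)
```

On the toy data, lowering the threshold from 0.5 to 0.25 raises recall from 0.75 to 1.0 while accuracy falls from 0.75 to 0.5, which is the kind of tradeoff the threshold plots in this section examine.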
We trained the models with the following optimal parameter values:
Fig. 3. ROC curve for K Nearest Neighbors
ROC-AUC Score: 0.92
Fig. 4. Thresholds vs. Metrics Scores (KNN)
The threshold of 0.25 increases the sensitivity from 88% to 99%, but this is accompanied by a dip in overall accuracy from 82% to 65%.
TABLE V. CONFUSION MATRIX, THRESHOLD=0.5
| | Predicted Negative | Predicted Positive |
|---|---|---|
| Actual Negative | 659 | 196 |
| Actual Positive | 97 | 758 |
TABLE VI. CONFUSION MATRIX, THRESHOLD=0.2
| | Predicted Negative | Predicted Positive |
|---|---|---|
| Actual Negative | 264 | 591 |
| Actual Positive | 6 | 849 |
Although KNN predicts True Positives better than Logistic Regression, the increase in False Positives by 395 is a stronger vote against it. This simplifies to a tradeoff between classifying 91 churners correctly and misclassifying 395 non-churners as churners. Given the company's objective of cost-cutting, this is a poor model.
3) Support Vector Machine
The ROC score indicates that SVM is a classifier with excellent predictive capability: the area under the ROC curve is close to unity. To achieve almost 100% True Positive Rate (our objective), we must accept a False Positive Rate of just 20%.
Fig. 5. ROC curve for SVM
ROC-AUC Score: 0.98
Fig. 6. Thresholds vs. Metrics Scores
At a threshold of 0.25, sensitivity increases from 97% to 98% without changing the other evaluation metrics appreciably. Moreover, the overall classification accuracy is almost the same, i.e. 94%.
TABLE VII. CONFUSION MATRIX, THRESHOLD=0.5
| | Predicted Negative | Predicted Positive |
|---|---|---|
| Actual Negative | 783 | 72 |
| Actual Positive | 24 | 831 |
TABLE VIII. CONFUSION MATRIX, THRESHOLD=0.25
| | Predicted Negative | Predicted Positive |
|---|---|---|
| Actual Negative | 769 | 86 |
| Actual Positive | 17 | 838 |
The True Positives predicted are greater, and the difference between the total number of churners and the number predicted is very small (just 17). Few False Positives are generated, and the False Negatives are fewer than in the previous models (both in favor of our objective). Threshold adjustment has not shown any significant improvement in predicting more True Positives.
4) Random Forest
The ROC score indicates that Random Forest is a classifier with outstanding predictive capability: the area under the ROC curve is close to unity. To achieve almost 100% True
Positive Rate (our objective), we must accept a False Positive Rate of just 20% (similar to the SVM model).
Fig. 7. ROC curve for Random Forest
ROC-AUC Score: 0.99
At a threshold of 0.3, the sensitivity (recall) improves from 94% to 98% without much affecting the accuracy and specificity (the capability of not predicting non-churners as churners).
Fig. 8. Thresholds vs. Metrics Scores
TABLE IX. CONFUSION MATRIX, THRESHOLD=0.5
| | Predicted Negative | Predicted Positive |
|---|---|---|
| Actual Negative | 826 | 29 |
| Actual Positive | 47 | 808 |
TABLE X. CONFUSION MATRIX, THRESHOLD=0.3
| | Predicted Negative | Predicted Positive |
|---|---|---|
| Actual Negative | 768 | 87 |
| Actual Positive | 17 | 838 |
TABLE XI. PERFORMANCE INDICATORS
| Metric | Threshold = 0.5 | Threshold = 0.3 |
|---|---|---|
| Accuracy | 0.95 | 0.94 |
| Recall | 0.94 | 0.98 |
| Precision | 0.96 | 0.90 |
| F1-Score | 0.95 | 0.94 |
| Specificity | 0.96 | 0.90 |
| ROC-AUC | 0.99 | - |
Although the accuracy of Random Forest before threshold adjustment is greater than that of SVM, Random Forest predicts fewer True Positives. However, its False Positives number only 29 (rather than 72 for SVM), which is a good argument in favor of Random Forest. After threshold adjustment, the SVM and Random Forest models perform almost identically.
V. SUMMARY AND CONCLUSION
Churn prediction can be modelled as a binary classification problem. This work aims to solve this problem using four classical methods of machine learning. The prediction capabilities of different classification models have been examined. The dataset taken is imbalanced and not normalized. A subset of irrelevant features (with low qualitative predictive power) is removed. The features having the strongest relationship with the output variable are selected. We tuned the models for maximum predictive performance using Grid Search. Models are trained on the best set of parameters obtained from the Grid Search procedure. The classification efficiency is gauged using standard evaluation metrics (confusion matrix and ROC curve). The classification threshold is adjusted so as to optimize the sensitivity of the model.
It is concluded from the results presented in Section IV that Random Forest and SVM are comparably the best models for the given dataset. The False Positives predicted by Random Forest and SVM are far fewer than those of the other two models. Also, after threshold adjustment, both models reach an accuracy of 94% and a sensitivity of 98%.
REFERENCES
[1] V. Lazarov, M. Capota, Churn prediction, Bus. Anal. Course TUM Comput. Sci. (2007).
[2] R.H. Wolniewicz, R. Dodier, Predicting customer behavior in telecommunications, IEEE Intell. Syst. 19 (2) (2004) 50–58.
[3] https://www.kaggle.com/becksddf/churn-in-telecoms-dataset
[4] L.E. Peterson, K-nearest neighbor, Scholarpedia 4 (2) (2009) 1883.
[5] W. Au, K.C.C. Chan, X. Yao, A novel evolutionary data mining algorithm with applications to churn prediction, IEEE Trans. Evol. Comput. 7 (6) (2003) 532–545.
[6] S.-Y. Hung, D.C. Yen, H.-Y. Wang, Applying data mining to telecom churn management, Expert Syst. Appl. 31 (3) (2006) 515–524.
[7] R.J. Jadhav, U.T. Pawar, Churn prediction in telecommunication using data mining technology, Int. J. Adv. Comput. Sci. Appl. 2 (2) (2011) 17–19.
[8] A. Sharma, P. Prabin Kumar, A neural network based approach for predicting customer churn in cellular network services, Int. J. Comput. Appl. 27 (11) (2011) 26–31.
[9] H. Abbasimehr, A neuro-fuzzy classifier for customer churn prediction, Int. J. Comput. Appl. 19 (8) (2011) 35–41.
[10] E. Shaaban, Y. Helmy, A. Khedr, M. Nasr, A proposed churn prediction model, Int. J. Eng. Res. Appl. 2 (4) (2012) 693–697.
[11] I. Brandusoiu, G. Toderean, Churn prediction in the telecommunications sector using support vector machines, Ann. ORADEA Univ. Fascicle Manag. Technol. Eng. (1) (2013).
[12] K. Kim, C.-H. Jun, J. Lee, Improved churn prediction in telecommunication industry by analyzing a large network, Expert Syst. Appl. 41 (15) (2014) 6575–6584.
[13] G. Olle, A hybrid churn prediction model in mobile telecommunication industry, Int. J. e-Educ. e-Bus. e-Manag. e-Learn. 4 (1) (2014) 55–62.
[14] T. Vafeiadis, K.I. Diamantaras, G. Sarigiannidis, K.C. Chatzisavvas, A comparison of machine learning techniques for customer churn prediction, Simul. Model. Pract. Theory 55 (2015) 1–9.