Springer ICTIS2020 Mukhopadhyay Malusare Nandanwar Shakshi

See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/345675502
An Approach to Mitigate the Risk of Customer Churn Using Machine Learning Algorithms
Chapter · October 2020

DOI: 10.1007/978-981-15-7106-0_13
All content following this page was uploaded by Debajyoti Mukhopadhyay on 15 March 2021.

An Approach to Mitigate the Risk of Customer Churn using
Machine Learning Algorithms
Debajyoti Mukhopadhyay, Aarati Malusare, Anagha Nandanwar, Shriya Sakshi
Abstract Churn prediction plays an important role in service-based
industries such as telecom, life insurance, hospitality, banking and gaming.
Churn is the process in which a customer leaves one company and joins
another. In the telecom field, companies are seeking ways to predict which
customers are likely to churn, and identifying the factors that increase
churn is necessary before action can be taken to reduce it. The goal of our
work is therefore to develop a churn prediction model. This paper discusses
how to predict the customers that might churn; machine learning techniques
are used for the prediction, and the results for a large dataset are
represented in the form of graphs.
Keywords: Customer Churn ⋅ SVM ⋅ Random Forest ⋅ Decision Tree
Debajyoti Mukhopadhyay (✉) ⋅ Aarati Malusare ⋅ Anagha Nandanwar ⋅ Shriya Sakshi
Mumbai University, WIDiCoReL Research Lab
Maharashtra, India
e-mail: debajyoti.mukhopadhyay@gmail.com
Aarati Malusare
e-mail: aaratipmalusare1998@gmail.com
Anagha Nandanwar
e-mail: anaghanandanwar16@gmail.com
Shriya Sakshi
e-mail: shriya.sakshi03@gmail.com
1.Introduction
Many telecommunication providers are available, and customers have the luxury of choosing
the one that best fits their requirements. Churn prediction is therefore a fundamental
concern in service-based industries. Churn refers to a customer switching from one service
provider to another. For a company, retaining an existing customer is more important than
looking for a new one.
Predictive models identify the customers likely to churn in the near future, which helps the
company provide a retention solution. This paper presents a new prediction model based on
Machine Learning (ML) techniques. The proposed model is composed of seven steps:
identifying the problem domain, data selection, investigating the data set, classification,
clustering, knowledge usage and data visualization. The algorithms used in this system are
Decision Tree, Support Vector Machine and Random Forest.
In a competitive environment it is not easy to retain customers. Companies therefore adopt
new technologies to offer better services and so retain their customers. Before doing so, it
is necessary to identify in advance those customers who are likely to leave the company in
the near future, because losing them would result in a significant loss of profit. This
process is called churn prediction.
Research carried out during the past few years has found Machine Learning techniques to be
effective in predicting customer churn. Building an effective churn prediction model is an
important task that involves considerable research, right from the identification of
features in a large dataset to the selection of an effective machine learning algorithm.
2.Literature Review
Chuanqi Wang et al. [1] treated churn prediction as a cost-sensitive classification
problem. They used the term cost-sensitive because the way the model classifies customers
into churn and non-churn groups determines the maximum profit gained by the company. The
authors reported the classification performance and the misclassification rate of their
model.
Ning Lu et al. [2] proposed a churn prediction model using a Boosting algorithm. In that
paper, customers were divided into two clusters based on the weight assigned by the Boosting
algorithm, and Logistic Regression was used as the base classifier on each cluster to
predict churn customers. The results showed that Boosting was a good classifier for churn
prediction analysis.
Peng Li et al. [3] predicted customer churn using various R packages. They built a
classification model, trained it on a dataset, and after training used it to classify
records as churn or non-churn. The model they used is logistic regression.
Kiran Dahiya and Surbhi Bhatia [4] noted that customer churn plays an important role in
customer relationship management (CRM). They applied various machine learning algorithms
to predict customer churn and found ensemble learning to be the best approach, although
many problems still remain.
Praveen et al. [5] showed, in an experiment based on real data, that their method not only
obtains good classification performance but also effectively reduces the total
misclassification cost.
3.Proposed Work
In the proposed system, the churn prediction model is built using the Python programming
language.
Fig.1 Proposed System
Data collection
A telecom dataset has been used, and churn prediction has been performed on it.
Data preparation
Before analysis, the data must be cleaned and prepared so that the desired results can be
derived from it. The data has been cleaned to remove redundancy and errors, because such
defects would lead to incorrect results. In this paper, churn analysis is applied to telecom
data; the aim is to identify the customers that might churn from the service provider. The
end result gives the probability of churn for each customer in the form of a graph. The
dataset has 20 variables, relating to gender, monthly charge, phone service and so on, and
contains information on over 7000 customers.
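The cleaning step described above can be sketched with pandas as follows. This is a
minimal illustration, not the authors' code: the column names (customerID, gender,
MonthlyCharges, PhoneService, Churn) and the five inline records are hypothetical
stand-ins for the telecom dataset.

```python
import pandas as pd

# Hypothetical sample standing in for the telecom dataset (assumed column names).
raw = pd.DataFrame({
    "customerID": ["C1", "C2", "C2", "C3", "C4"],
    "gender": ["Female", "Male", "Male", "Female", None],
    "MonthlyCharges": ["29.85", "56.95", "56.95", " ", "70.70"],
    "PhoneService": ["No", "Yes", "Yes", "Yes", "Yes"],
    "Churn": ["No", "Yes", "Yes", "Yes", "Yes"],
})

def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Remove redundant and erroneous rows, then encode the target as 0/1."""
    out = df.drop_duplicates().copy()  # redundancy: repeated records
    # Errors: charges that are not numbers (e.g. blank strings) become NaN.
    out["MonthlyCharges"] = pd.to_numeric(out["MonthlyCharges"], errors="coerce")
    out = out.dropna()  # drop rows with any missing value
    out["Churn"] = out["Churn"].map({"No": 0, "Yes": 1})
    return out.reset_index(drop=True)

cleaned = clean(raw)
print(cleaned)
```

Dropping incomplete rows is only one possible policy; imputing missing values is an
equally reasonable choice when few records are available.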
Prediction
The business is interested in the final product, so it is very important to present the
result graphically, in a way that is understandable and that helps the business make the
predictions it needs, which in turn brings profit.
Fig.2 Measurement of Accuracy
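As an illustrative sketch of how accuracy can be measured for a churn classifier, the
snippet below counts the four outcomes of a binary prediction and derives accuracy from
them. The label lists are hypothetical, and precision and recall are included as an
assumption about what is useful alongside accuracy; the paper itself names only accuracy.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn), with 1 = churn and 0 = non-churn."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn

# Hypothetical true labels and model predictions for eight customers.
y_true = [1, 0, 0, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 1, 0, 0, 0, 0]

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)   # share of customers classified correctly
precision = tp / (tp + fp)           # share of predicted churners who really churn
recall = tp / (tp + fn)              # share of real churners the model catches
print(accuracy, precision, recall)
```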
Data Visualization Tools

The Matplotlib Python library, a graph plotting library originally developed by John D.
Hunter, is used in this system. It is a widely used graphing and data visualization module
for Python and helps represent information in an efficient and understandable way. The
Power BI tool is also used for clear visualization of the dataset.
Finally, the churn value is represented in a graph, from which we conclude which customers
are likely to churn from the telecom service provider.
Fig.3 Visualization of dataset
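A chart of this kind could be produced with Matplotlib roughly as shown below. This is a
sketch under stated assumptions: the customer identifiers, the probabilities, the 0.5
cut-off and the output file name are all hypothetical and are not taken from the paper.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is required
import matplotlib.pyplot as plt

# Hypothetical churn probabilities for five customers (illustrative values only).
customers = ["C1", "C2", "C3", "C4", "C5"]
churn_prob = [0.12, 0.81, 0.45, 0.67, 0.05]
threshold = 0.5  # assumed cut-off for flagging a likely churner

fig, ax = plt.subplots(figsize=(6, 3))
colors = ["tab:red" if p >= threshold else "tab:blue" for p in churn_prob]
bars = ax.bar(customers, churn_prob, color=colors)
ax.axhline(threshold, linestyle="--", color="gray")
ax.set_xlabel("Customer")
ax.set_ylabel("Predicted churn probability")
ax.set_ylim(0, 1)
ax.set_title("Likely churners (red) vs. other customers (blue)")
fig.tight_layout()
fig.savefig("churn_probabilities.png")

likely_churners = [c for c, p in zip(customers, churn_prob) if p >= threshold]
print(likely_churners)
```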
4.Architecture
The churn prediction system consists of five phases:
● Collecting datasets and identifying the problem domain
● Extracting the features required for developing churn models
● Constructing models using different classifiers and cross-validating them
● Clustering the churning customers
● Presenting predictions through a GUI and visualizing the result
Fig.4 Process Flow
In this model different algorithms are used such as SVM, Decision Tree, Random Forest.
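The model construction and cross-validation phase can be sketched with scikit-learn as
follows. The data here is synthetic (generated with make_classification), and the
hyperparameters, the number of folds and the feature scaling step are assumptions made
for the example, not settings reported by the authors.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the churn data: 10 predictors and a binary target.
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=6,
    weights=[0.7, 0.3], random_state=0,
)

models = {
    "SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    "Decision Tree": DecisionTreeClassifier(max_depth=5, random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

scores = {}
for name, model in models.items():
    # 5-fold cross-validation: train on four folds, test on the fifth, rotate.
    fold_accuracy = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    scores[name] = fold_accuracy.mean()
    print(f"{name}: mean accuracy {scores[name]:.3f}")
```

Because churners are usually the minority class, accuracy alone can flatter a model;
swapping the scoring argument for "recall" or "f1" is a straightforward variation.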
Support Vector Machine
The SVM algorithm was proposed by Boser, Guyon and Vapnik. It is used for both
classification and regression problems. SVM maps the data to a higher-dimensional space to
make it linearly separable; the surface that divides the data is known as a hyperplane.
Here the SVM model tries to separate churn from non-churn customers. To do so, it treats
all the data points as points in an n-dimensional space, where n is the number of predictor
variables in the dataset, and divides them into churner and non-churner groups using the
maximum-margin hyperplane.
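To make the role of the hyperplane concrete, the sketch below classifies points by the
side of a linear hyperplane w.x + b = 0 on which they fall, and computes the width of the
margin, 2 / ||w||. The weights and bias are hypothetical values chosen for illustration;
in practice they are learned from the data by an SVM library, and the mapping to a
higher-dimensional space (the kernel) is not shown here.

```python
import math

def decision_value(w, b, x):
    """Signed score w.x + b; its sign tells on which side of the hyperplane x lies."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def classify(w, b, x):
    """Assign the churner group to the non-negative side of the hyperplane."""
    return "churner" if decision_value(w, b, x) >= 0 else "non-churner"

def margin_width(w):
    """Distance between the margin boundaries w.x + b = +1 and w.x + b = -1."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))

# Hypothetical hyperplane over n = 2 predictor variables.
w, b = [3.0, 4.0], -5.0
print(classify(w, b, [2.0, 1.0]))   # score 3*2 + 4*1 - 5 = 5
print(classify(w, b, [0.0, 0.5]))   # score 3*0 + 4*0.5 - 5 = -3
print(margin_width(w))              # 2 / 5
```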
Decision Tree
The C4.5 decision tree algorithm was developed to overcome the disadvantages of the ID3
algorithm. C4.5 follows a greedy approach and uses a series of rules for classification.
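The greedy step in C4.5 chooses, at each node, the attribute with the highest gain ratio
(information gain divided by split information). The sketch below computes that criterion
for two attributes; the attribute names and the eight records are hypothetical and serve
only to show the calculation.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def gain_ratio(values, labels):
    """Information gain of splitting on `values`, normalised by split information."""
    total = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    split_info = entropy(values)
    return info_gain / split_info if split_info > 0 else 0.0

# Hypothetical records: two candidate attributes and the churn label.
contract = ["monthly"] * 4 + ["yearly"] * 4
gender = ["F", "M", "F", "M", "F", "M", "F", "M"]
churn = ["yes", "yes", "yes", "no", "no", "no", "no", "no"]

ratios = {"contract": gain_ratio(contract, churn), "gender": gain_ratio(gender, churn)}
best = max(ratios, key=ratios.get)  # greedy choice for the root split
print(ratios, best)
```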
Random Forest
Random forest is used for both classification and regression, although it is mainly used
for classification problems. Just as a forest is made up of trees, and more trees mean a
more robust forest, the random forest algorithm builds decision trees on samples of the
data, gets a prediction from each of them and finally selects the best solution by voting.
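The two ingredients named above, trees trained on data samples and a final vote, can be
illustrated with a deliberately simplified sketch. Each "tree" here is a one-level stump
on a single randomly chosen feature, which is an assumption made to keep the example
short; real random forests grow full decision trees. The records are hypothetical.

```python
import random
from collections import Counter

def fit_stump(rows, feature):
    """One-level tree on one feature: predict churn (1) above a threshold.

    The threshold is the observed value that gives the fewest training errors.
    """
    def errors(t):
        return sum((1 if x[feature] > t else 0) != y for x, y in rows)
    threshold = min((x[feature] for x, _ in rows), key=errors)
    return lambda x: 1 if x[feature] > threshold else 0

def fit_forest(rows, n_trees=25, seed=0):
    """Train each stump on a bootstrap sample and a randomly chosen feature."""
    rng = random.Random(seed)
    n_features = len(rows[0][0])
    forest = []
    for _ in range(n_trees):
        sample = [rng.choice(rows) for _ in rows]  # sampling with replacement
        forest.append(fit_stump(sample, rng.randrange(n_features)))
    return forest

def predict(forest, x):
    """Majority vote over the individual tree predictions."""
    votes = Counter(tree(x) for tree in forest)
    return votes.most_common(1)[0][0]

# Hypothetical records: (monthly charge, support calls) -> 1 = churn, 0 = stay.
rows = [
    ((20, 0), 0), ((25, 1), 0), ((30, 0), 0), ((35, 1), 0),
    ((70, 4), 1), ((80, 5), 1), ((90, 3), 1), ((95, 6), 1),
]
forest = fit_forest(rows)
print(predict(forest, (100, 7)))  # high charge, many support calls
print(predict(forest, (10, 0)))   # low charge, no support calls
```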
KNN
KNN is used to solve both classification and regression problems. It is a non-parametric,
lazy learning algorithm. KNN uses a database in which the data points are separated into
classes to predict the class of a new sample point.
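A minimal sketch of the idea: the new point receives the majority class of its k nearest
stored points. The features (tenure in months, monthly charge), the six stored records and
the choice k = 3 are hypothetical. The features are left unscaled for brevity, although
scaling usually matters for distance-based methods.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical stored records: (tenure in months, monthly charge) -> class.
train = [
    ((2, 80), "churn"), ((3, 90), "churn"), ((5, 85), "churn"),
    ((40, 30), "stay"), ((50, 25), "stay"), ((60, 35), "stay"),
]

print(knn_predict(train, (4, 88)))    # short tenure, high charge
print(knn_predict(train, (55, 28)))   # long tenure, low charge
```

The algorithm is called lazy because nothing is fitted in advance: all the work happens
when a query arrives.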
Logistic regression
Logistic regression is the regression analysis used when the dependent variable is
dichotomous (binary). Like other regression analyses, it is a predictive analysis. It is
used to describe data and to explain the relationship between one dependent binary variable
and one or more nominal, ordinal, interval or ratio-level independent variables.
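In the binary churn setting, a fitted logistic regression turns a weighted sum of the
independent variables into a probability through the logistic function. The sketch below
shows only this prediction step; the two predictors (monthly charge, tenure in months) and
the coefficient values are hypothetical and not estimated from any data.

```python
import math

def churn_probability(x, weights, bias):
    """Logistic model: P(churn = 1 | x) = 1 / (1 + exp(-(w.x + b)))."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients for (monthly charge, tenure in months).
weights, bias = [0.04, -0.05], -1.0

p_high = churn_probability([90, 2], weights, bias)    # high charge, new customer
p_even = churn_probability([50, 20], weights, bias)   # weighted sum is about zero
p_low = churn_probability([30, 60], weights, bias)    # low charge, long tenure
print(round(p_high, 3), round(p_even, 3), round(p_low, 3))
```

A positive coefficient raises the predicted churn probability as its variable grows, and a
negative one lowers it, which is what makes the fitted model easy to interpret.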
5. Future Scope
● The proposed system can be used in banking and financial institutions, where customers
tend to move from one company to another because of interest rates, fixed deposit rates
and other services provided by the bank.
● The proposed system can be used in the telecom industry, where it helps to find the
reasons why customers leave a provider and join its competitors.
● The proposed system can be used in the online gaming industry.
● The proposed system can be used in hospitality.

6.Conclusion
To date, many churn prediction models have been introduced. Companies, however, require a
simple and robust model that differentiates between churners and non-churners and then
clusters the resulting churners so that retention solutions can be provided. In this
prediction model, ML algorithms are introduced to help the CRM (Customer Relationship
Management) department of various service-based companies keep track of its customers and
their behaviour with respect to churn.
We first addressed churn prediction in the telecom industry by applying machine learning
algorithms. Previous works were studied thoroughly, and the present customer churn
prediction models were summarized. Unlike existing systems, which focus primarily on the
prediction models and the accuracy of churn prediction, in this system we presented the
characteristics of the existing publicly available churn prediction datasets and visualized
them. Further, we focused on the different customer-related variables that are used for
churn prediction and categorized them. Finally, we surveyed the commonly used metrics
proposed in previous systems for evaluating the performance of churn prediction models. We
also visualized the end result for better understanding using Python.
References
1.Chuanqi Wang, Ruiqi Li, Peng Wang, Zonghai Chen, “Partition cost sensitive CART based
on customer value for Telecom customer Churn Prediction,” Control Conference (CCC),
September 2017.
2.Ning Lu, Hua Lin, Jie Lu, Guanghua Zhang, “A Customer Churn Prediction in Telecom
Industry using Boosting. Customer Behaviour in Telecommunications,” IEEE Transactions
on Industrial Informatics, Vol. 10, (2), May 2014.
3.Peng Li, Siben Li, Tingting Bi, Yang Liu, “Telecom Customer Churn Prediction
Method Based on Cluster Stratified Sampling Logistic Regression,” in IEEE.
4.Kiran Dahiya, Surbhi Bhatia, “Customer Churn Analysis in Telecom Industry,” in IEEE
2015, 978-1-4673-7231-2/15.
5.Praveen et al., “Churn Prediction in Telecom Industry Using R,” in (IJETR) ISSN:
2321-0869, Volume-3, Issue-5, May 2015.
6.Adnan Idris, Asifullah Khan, “Ensemble based Efficient Churn Prediction Model for
Telecom,” Frontiers of Information Technology (FIT), International Conference,
pp.5680-5684, June 2015.
7.Guo-en Xia, Hui Wang, Yilin Jiang, “Application of Customer Churn Prediction Based on
Weighted Selective Ensembles,” International Conference on Systems and Informatics
(ICSAI 2016), pp. 513-519, November 2016.
