
Predicting Telecom Customer Retention using Machine Learning


Mohd Arman*, Rohan Gupta**, Rahul Bhandari**, Ms. Deepti Sahu**
Department of Computer Science Engineering
Sharda University, Uttar Pradesh

armansheakh987@gmail.com

Abstract
In the competitive telecom industry, predicting and managing customer churn is vital for effective retention strategies. This research explores five machine learning algorithms (Logistic Regression, Support Vector Machines, Random Forest, K-Nearest Neighbor, and Naive Bayes) for telecom customer churn prediction. The project starts with problem understanding and exploratory data analysis, visualizing data for insights. Multiple classifiers are trained and evaluated using AUC scores and ROC curves. Among them, the Random Forest Classifier performs exceptionally well with an AUC of ~96%. Telecom providers prioritize customer attrition analysis as a key business metric. Machine learning algorithms, analyzing factors like subscribed services, tenure, gender, and payment method, assist in predicting churn. The Random Forest model, with an accuracy of ~96%, precision of ~96% for retained customers, and ~94% for churned customers, proves effective. This research not only contributes valuable insights into the domain of telecom customer churn prediction but also underscores the practical application of machine learning algorithms in addressing real-world business challenges.

Keywords
Churn prediction system, Machine learning, Telecommunication industry, Retention, Logistic Regression, Random Forest.

Introduction
Customer churn prediction stands as a critical concern within the telecommunications industry due to its profound impact on both customer retention and overall revenue. Developing an effective churn prediction model is a time-consuming yet essential process, addressing the multifaceted domain of customer churn, including its effects, causative factors, business imperatives, methodologies, and prediction techniques.

Gopal, Priya & MohdNawi, Nazri (2023) introduce an innovative hybrid model that integrates Convolutional Neural Networks (CNN) and a modified Variational Autoencoder (VAE) to enhance the classification of high-dimensional churn data. The model's effectiveness is evaluated on six benchmark datasets, demonstrating notable efficacy in handling high-dimensional and imbalanced time series data.

Usman, Muhammad (2018) proposes a novel churn prediction and retention model using fuzzy classifiers, achieving 98% accuracy in churner classification. The model automatically generates intelligent retention campaigns based on customer usage and complaint patterns, showcasing a holistic approach to customer relationship management. (Usman, 2018)

Pandithurai, O., Ahmed, H. H., S, H. N., Sriman, B., R, S. (2023) address customer churn in large-scale industries, proposing a machine learning model that predicts potential churn. The paper compares different classification models, including Logistic Regression and Random Forest, emphasizing key performance metrics to guide effective decision-making. (Pandithurai et al., 2023)

Sharma, A., Gupta, D., Nayak, N., Singh, D., Verma, A. (2022) conduct a study comparing the accuracy of various machine learning techniques in predicting customer churn. The research proposes an algorithm based on these techniques, aiming to identify major causes for customer churn and suggesting ways for enterprises to improve customer retention. (Sharma et al., 2022)

Ahmad, Abdelrahim Kasem; Jafar, Assef; Aljoumaa, Kadan (2019) advance the conversation with a churn prediction model utilizing machine learning techniques on a big data platform. The model achieves a commendable 93.3% AUC, emphasizing the practical relevance of incorporating social network dynamics into churn prediction models. (Ahmad et al., 2019)

Suhanda, Yogasetya; Nurlaela, Lela; Kurniati, Ike; Dharmalau, Andy; Rosita, Ita (2023) round off the discussion with a focus on predictive customer retention using the random forest algorithm. The study's results, indicating approximately 81.12% customer retention and 18.87% customer churn, highlight the algorithm's effectiveness. The identification of customer_activity as the most influential feature on customer retention provides actionable insights for telecom companies striving to enhance their customer retention strategies.


Related Work

| Title | Authors | Publication Date | Remark |
|---|---|---|---|
| Improved CNN for Churn Analysis | Gopal, Priya & MohdNawi, Nazri | 2023 | Hybrid CNN-VAE model enhances classification of high-dimensional churn data, demonstrating efficacy on benchmark datasets. |
| ML in Telecom for Churn Prediction | Joolfoo, Muhammad | 2020 | Logistic regression and KNN with big data achieve 80% accuracy, 71% AUC in predicting telecom customer churn. |
| Fuzzy-Based Churn Prediction Model | Usman, Muhammad | 2018 | Fuzzy classifiers yield 98% accuracy, automating intelligent retention campaigns based on customer behavior. |
| ML for Telecom Churn Prediction | Pandithurai et al | 2023 | ML model predicts churn, comparing Logistic Regression and Random Forest with key performance metrics. |
| ML Techniques for Customer Retention | Sharma et al | 2022 | Comparative study on ML techniques for customer churn, proposing an algorithm for improving retention. |
| ML in Big Data for Churn Prediction | Ahmad et al | 2019 | Churn prediction model on big data with social network analysis achieves 93.3% AUC. |
| Predictive Analysis with Random Forest | Suhanda et al | 2023 | Random Forest predicts 81.12% customer retention, identifying customer_activity as a key feature. |

Methodology

The methodology employed in this research encompassed several key steps to comprehensively address the task of predicting telecom customer churn using machine learning classifiers. The initial phase involved gaining a thorough understanding of various machine learning classifiers, including Logistic Regression, Support Vector Machines, and Random Forest. This foundational knowledge was essential for making informed decisions during subsequent stages of the research.

To identify the most suitable classifier, a thorough model comparison was conducted. This evaluation involved calculating the Area Under the Curve (AUC) score and plotting Receiver Operating Characteristic (ROC) curves for each trained model. The Scikit-Learn library, a robust and widely used Python machine learning library, was instrumental in implementing these classifiers due to its efficiency and user-friendly features.

A well-defined problem statement and business case were established to provide context and underscore the significance of churn prediction in the telecommunication industry. Understanding the business implications guided the subsequent stages of the research.

The data preparation phase involved importing necessary libraries, acquiring datasets, and conducting Exploratory Data Analysis (EDA) to unveil insights into the dataset's characteristics. Subsequent data visualization techniques were employed to present meaningful patterns and correlations within the dataset, aiding in the identification of potential predictors of churn.

Data preprocessing ensured that the dataset was in an optimal format for model training. This step included handling missing values, encoding categorical variables, and scaling features as necessary. Multiple machine learning models, such as Logistic Regression, Support Vector Machine, Random Forest Classifier, K-Nearest Neighbor, and Naive Bayes Classifier, were then trained and evaluated using the prepared dataset.
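
The preprocessing and training steps described above can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the data is synthetic, the column names (tenure, monthly_charges, payment_method, gender) and all model settings are assumptions chosen for the example, and the AUC values it prints say nothing about the results reported in this paper.

```python
import numpy as np
import pandas as pd
from sklearn.base import clone
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for a churn table (hypothetical columns, not the paper's dataset).
rng = np.random.default_rng(0)
n = 400
tenure = rng.integers(1, 72, n).astype(float)
monthly_charges = rng.uniform(20, 120, n)
payment_method = rng.choice(["electronic_check", "credit_card", "bank_transfer"], n)
gender = rng.choice(["female", "male"], n)

# Assumed relationship used only to generate labels for the demo.
logit = -0.05 * tenure + 0.02 * monthly_charges + 0.8 * (payment_method == "electronic_check") - 1.0
churn = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

tenure[rng.choice(n, 20, replace=False)] = np.nan  # inject some missing values

X = pd.DataFrame({
    "tenure": tenure,
    "monthly_charges": monthly_charges,
    "payment_method": payment_method,
    "gender": gender,
})

# Handle missing values and scale numeric columns; one-hot encode categorical columns.
numeric_steps = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])
preprocess = ColumnTransformer(
    [
        ("num", numeric_steps, ["tenure", "monthly_charges"]),
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["payment_method", "gender"]),
    ],
    sparse_threshold=0.0,  # always return a dense array (GaussianNB needs one)
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Support Vector Machine": SVC(probability=True, random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "K-Nearest Neighbor": KNeighborsClassifier(),
    "Naive Bayes": GaussianNB(),
}

X_train, X_test, y_train, y_test = train_test_split(
    X, churn, test_size=0.25, stratify=churn, random_state=0
)

# Fit each classifier behind the same preprocessing and score it by test-set AUC.
auc_scores = {}
for name, model in models.items():
    pipe = Pipeline([("prep", clone(preprocess)), ("clf", model)])
    pipe.fit(X_train, y_train)
    auc_scores[name] = roc_auc_score(y_test, pipe.predict_proba(X_test)[:, 1])

for name, auc in sorted(auc_scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: AUC = {auc:.3f}")
```

Wrapping the preprocessing and the classifier in one pipeline means the imputer, scaler, and encoder are fitted on the training split only, so no information from the test split leaks into the comparison.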

Figure 1: Methodology

Figure 2: LOGISTIC REGRESSION CLASSIFIER

Figure 3: SUPPORT VECTOR MACHINE (SVM)

Figure 6: Confusion Matrix (Naive Bayes)

Performance evaluation was a critical step, involving the assessment of each model using metrics like accuracy, precision, recall, F-measure, and ROC curves. This thorough analysis facilitated the selection of the most effective model for predicting telecom customer churn.
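
To make the metrics named above concrete, the sketch below derives accuracy and per-class precision, recall, and F-measure from a two-class confusion matrix. The counts are hypothetical values picked for easy arithmetic; they are not the confusion matrices shown in this paper's figures, and the function name is illustrative.

```python
def per_class_metrics(tp, fp, fn):
    """Precision, recall, and F-measure for one class treated as the positive class."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f_measure": f_measure}


# Hypothetical confusion-matrix counts (illustrative only):
#                     predicted retained   predicted churned
# actual retained            790                  10
# actual churned              50                 150
retained_as_retained, retained_as_churned = 790, 10
churned_as_retained, churned_as_churned = 50, 150

total = retained_as_retained + retained_as_churned + churned_as_retained + churned_as_churned
accuracy = (retained_as_retained + churned_as_churned) / total

# Each class takes a turn as the "positive" class.
retained = per_class_metrics(tp=retained_as_retained, fp=churned_as_retained, fn=retained_as_churned)
churned = per_class_metrics(tp=churned_as_churned, fp=retained_as_churned, fn=churned_as_retained)

print(f"accuracy: {accuracy:.3f}")
print("retained:", {k: round(v, 3) for k, v in retained.items()})
print("churned: ", {k: round(v, 3) for k, v in churned.items()})
```

In this made-up example the retained class dominates the data, so overall accuracy stays high even though a quarter of the actual churners are missed. That is why per-class recall is worth reporting alongside accuracy.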

The research concluded with a comprehensive comparison of the trained models, identifying the model that demonstrated superior performance in the context of predicting churn in the telecommunication industry. Limitations were acknowledged, and potential avenues for future research were highlighted, ensuring a well-rounded and insightful exploration of the chosen problem domain.

Figure 4: K-NEAREST NEIGHBOUR (KNN)

Figure 5: Confusion Matrix (Random Forest)

Figure 5: The graph shows that the Random Forest algorithm produced the best AUC. Therefore, the Random Forest model did a better job of classifying the churned/retained telecom customers.

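
Because the Random Forest came out ahead, it is the natural candidate for hyperparameter tuning with an exhaustive grid search. The following is a minimal sketch of how that could be set up with scikit-learn's GridSearchCV. It uses synthetic data from make_classification, and the parameter grid, fold count, and scoring choice are assumptions made for illustration; the paper does not report running such a search.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic, imbalanced two-class data standing in for a prepared churn dataset.
X, y = make_classification(
    n_samples=300,
    n_features=10,
    n_informative=5,
    weights=[0.8, 0.2],
    random_state=0,
)

# Hypothetical grid: 2 x 2 x 2 = 8 candidate configurations.
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [None, 5],
    "min_samples_leaf": [1, 5],
}

# Every combination is evaluated with 3-fold cross-validation, scored by ROC AUC.
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid=param_grid,
    scoring="roc_auc",
    cv=3,
)
search.fit(X, y)

print("best parameters:", search.best_params_)
print("best cross-validated AUC:", round(search.best_score_, 3))
```

The cost grows multiplicatively with each added parameter value, so a small grid like this one is usually a starting point that is refined around the best configuration found.
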

Conclusion and Result:
In conclusion, the Random Forest Classifier emerged as the most effective model among the trained classifiers for predicting telecom customer churn. The evaluation metrics demonstrated its robust performance in accurately classifying customers into retained and churned categories.

The Random Forest Classifier yielded the following scores:

Accuracy: Approximately 96% label accuracy, indicating the overall correctness of predictions.

Precision: Approximately 96% precision in correctly identifying retained customers and around 94% precision in identifying churned customers. This reflects the model's ability to minimize false positives.

Recall: Approximately 99% recall for retained customers, signifying the model's capacity to capture the majority of actual retained customers. The recall for churned customers was approximately 76%, indicating the model's ability to correctly identify a significant portion of customers who actually churned.

It is noteworthy that while the Random Forest Classifier exhibited high performance, there is room for further enhancement through the application of advanced optimization techniques such as "Grid Search." Grid Search involves an exhaustive search over a specified hyperparameter grid, allowing for the identification of the most optimal model configuration.

Resource for understanding Grid Search: Hyperparameter Optimization with Random Search and Grid Search

This project's success in predicting customer churn highlights the potential of machine learning algorithms in addressing critical business challenges within the telecom industry. The findings provide valuable insights for telecom service providers seeking to implement effective customer retention strategies and optimize operational efficiency.

| Metric | Retained customers | Churned customers |
|---|---|---|
| Accuracy | ~96% label accuracy (overall) | |
| Precision | ~96% | ~94% |
| Recall | ~99% | ~76% |

References
1. Gopal, Priya & MohdNawi, Nazri (2023). "An Improved Convolutional Neural Network for Churn Analysis." International Journal of Advanced Computer Science and Applications, 14. https://doi.org/10.14569/IJACSA.2023.0140921.
2. Joolfoo, Muhammad (2020). "Customer Churn Prediction in Telecom Using Machine Learning in Big Data Platform."
3. Usman, Muhammad (2018). "A Fuzzy-Based Churn Prediction and Retention Model for Prepaid Customers in Telecom Industry." International Journal of Computational Intelligence Systems, 11, 66-78. https://doi.org/10.2991/ijcis.11.1.6.
4. Pandithurai, O., Ahmed, H. H., S, H. N., Sriman, B., R, S. (2023). "Telecom Customer Churn Prediction Using Supervised Machine Learning Techniques." 2023 International Conference on Advances in Computing, Communication and Applied Informatics (ACCAI). https://doi.org/10.1109/ACCAI58221.2023.10200429.
5. Sharma, A., Gupta, D., Nayak, N., Singh, D., Verma, A. (2022). "Prediction of Customer Retention Rate Employing Machine Learning Techniques." 2022 1st International Conference on Informatics (ICI). https://doi.org/10.1109/ICI53355.2022.9786903.
6. Ahmad, Abdelrahim Kasem; Jafar, Assef; Aljoumaa, Kadan (2019). "Customer Churn Prediction in Telecom Using Machine Learning in Big Data Platform." Journal of Big Data, 6(1), 28. https://doi.org/10.1186/s40537-019-0191-6.
7. Suhanda, Yogasetya; Nurlaela, Lela; Kurniati, Ike; Dharmalau, Andy; Rosita, Ita (2023). "Predictive Analysis of Customer Retention Using the Random Forest Algorithm."
