Telco Customer Churn Prediction Project Report


SRI RAMAKRISHNA ENGINEERING COLLEGE

BONAFIDE CERTIFICATE

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

MINI PROJECT II – MAY 2024


This is to certify that the project entitled

ANTICIPATING CUSTOMER ATTRITION

is the bonafide record of Mini Project II done by

ABINAYASRI S – 2101003
ARAVIND C – 2101013
FIZA S – 2101054

of B.E. Computer Science and Engineering during the year 2023-2024.


who carried out the Mini Project II under my supervision, certified further that to the
best of my knowledge the work reported herein does not form part of any other thesis
or dissertation on the basis of which a degree or award was conferred on an earlier
occasion on this or any other candidate.

Mrs. T. Nithya Shree, AP/CSE
PROJECT GUIDE
Assistant Professor,
Computer Science and Engineering,
Sri Ramakrishna Engineering College,
Coimbatore-641022.

Dr. A. Grace Selvarani, Ph.D.,
HEAD OF THE DEPARTMENT
Professor,
Computer Science and Engineering,
Sri Ramakrishna Engineering College,
Coimbatore-641022.

Submitted for the Project Viva-Voce Examination held on

Internal Examiner External Examiner


DECLARATION

We affirm that the Mini Project II titled “ANTICIPATING CUSTOMER ATTRITION” being
submitted in partial fulfillment for the award of Bachelor of Engineering is the original
work carried out by us. It has not formed part of any other project work submitted for the
award of any degree or diploma, either in this or any other University.

(Signature of the Candidates)

ABINAYASRI S – (2101003)
ARAVIND C – (2101013)
FIZA S – (2101054)

I certify that the declaration made above by the candidates is true.

(Signature of the guide)

Mrs. T. NITHYA SHREE,


Assistant Professor,
Department of CSE

ACKNOWLEDGEMENT

We express our gratitude to Sri. D. LAKSHMINARAYANASWAMY, Managing Trustee, and
Sri. R. SUNDAR, Joint Managing Trustee, SNR Sons Charitable Trust, Coimbatore, for
providing excellent facilities to carry out our project.

We express our deepest gratitude to Dr. N. R. ALAMELU, Principal, for her valuable
guidance and blessings.

We thank Dr. A. GRACE SELVARANI, Professor and Head, Department of Computer
Science and Engineering, who moulded us both technically and morally for achieving great
success in life.

We sincerely thank our Project Coordinator, Dr. R. MADHUMATHI, Associate Professor,
Department of Computer Science and Engineering, for her great inspiration.

Words are inadequate to offer thanks to our respected guide. We wish to express our sincere
thanks to Mrs. T. NITHYA SHREE, Assistant Professor, Department of Computer
Science and Engineering, who gave constant encouragement and support throughout this
project work and who made this project a successful one.

We also thank all the staff members and technicians of our Department for their help in
making this project a successful one.

TABLE OF CONTENTS

CHAPTER NO. TITLE

ABSTRACT
LIST OF FIGURES
LIST OF ABBREVIATIONS

1 INTRODUCTION
2 LITERATURE REVIEW
2.1 Customer churn prediction using synthetic minority oversampling technique
2.2 New insights into churn prediction in the telecommunication sector: a profit driven
data mining approach
2.3 Comparative study of machine learning algorithms for churn prediction
2.4 An intelligent customer churn prediction and response framework
2.5 Customer churn prediction using improved balanced random forests
2.6 Enhancing customer churn prediction in digital banking using ensemble modeling
3 SYSTEM ANALYSIS
3.1 Existing System
3.2 Proposed System
4 SYSTEM SPECIFICATION
4.1 Software Requirements
4.2 Hardware Requirements
5 SOFTWARE DESCRIPTION
5.1 Python
5.2 Jupyter Notebook
5.3 VSCode
5.4 NumPy
6 PROJECT DESCRIPTION
6.1 Problem Description
6.2 Introduction to Proposed System
6.2.1 System Architecture
6.2.2 Machine Learning
1. KNN
2. Random Forest
3. XGBoost
6.3 Module Description
6.3.1 Data Collection and Exploratory Data Analysis
6.3.2 Model Building
6.3.3 Model Evaluation
6.3.4 Flask Integration and Deployment
6.3.5 Comparison of Various Algorithms
7 SYSTEM IMPLEMENTATION
7.1 System Architecture
7.2 Flow Chart
8 RESULTS AND DISCUSSION
9 CONCLUSION AND FUTURE ENHANCEMENTS
9.1 Conclusion
9.2 Future Enhancements
9.3 Appendix
9.3.1 Source Code
9.3.2 Snapshots
10 REFERENCES

ABSTRACT

Customer churn poses a significant challenge for businesses across various industries,
impacting revenue streams and hindering growth potential. In this project, we aim to develop an
effective churn prediction model to identify customers at risk of churn, enabling proactive retention
strategies. Leveraging machine learning techniques and a comprehensive dataset encompassing
customer demographics, customer history, and engagement metrics, we conducted exploratory
data analysis to uncover patterns and insights into churn behavior. Feature selection and
engineering were employed to extract relevant features, followed by model selection and training
using algorithms such as logistic regression, decision trees, and gradient boosting machines.
Hyperparameter tuning and cross-validation techniques were applied to optimize model
performance. The final model was evaluated on a holdout dataset, achieving promising results with
high accuracy and predictive power. Our findings provide valuable insights for businesses to
anticipate and mitigate customer churn, ultimately fostering customer loyalty and sustainable
growth.

LIST OF FIGURES

FIGURE NO    NAME OF THE FIGURE

6.1.1        System architecture
6.3.5        Comparison of various algorithms
7.1.1        System implementation
7.2.1        Flow chart
9.3.2.1      Landing Page
9.3.2.2      Login Page
9.3.2.3      Customer Churn Predictor
9.3.2.4      Result for Churned Customer
9.3.2.5      Result for Non-Churned Customer

LIST OF ABBREVIATIONS

ABBREVIATION EXPANSION
KNN K-Nearest Neighbors
XGB Extreme Gradient Boosting
API Application Programming Interface
MLP Multi-Layer Perceptron

1. INTRODUCTION

Customer churn, the phenomenon of customers discontinuing their relationship with a
business, presents a significant challenge across industries, impacting revenue streams and hindering
growth potential. In response, businesses are increasingly turning to predictive analytics and
machine learning techniques to identify customers at risk of churn and implement proactive retention
strategies. This project focuses on developing a robust churn prediction model to empower
businesses with the foresight to intervene and engage with at-risk customers before they defect to
competitors.
The project leverages a comprehensive dataset encompassing diverse customer attributes,
including demographics, historical transactions, usage patterns, and interaction metrics. Through
exploratory data analysis (EDA), we aim to uncover actionable insights into the underlying factors
driving churn behavior. By employing feature selection and engineering techniques, we extract
relevant features that capture the essence of customer churn dynamics. The core of the project involves
the application of various machine learning algorithms, including logistic regression, decision trees,
random forests, and gradient boosting machines, to construct predictive models.
Model training involves rigorous evaluation and optimization procedures, including data
splitting, cross-validation, and hyperparameter tuning, to ensure robust performance and
generalizability. The significance of this project lies in its potential to enable businesses to implement
targeted retention initiatives and personalized marketing campaigns aimed at preserving customer
loyalty. By accurately identifying customers most likely to churn in the near future, companies can
allocate resources more efficiently and effectively, thereby maximizing customer lifetime value and
enhancing overall profitability.
In this introduction, we provide an overview of the project's objectives, methodology, and
expected outcomes. Through this endeavor, we aim to offer valuable insights and actionable
recommendations to businesses seeking to mitigate customer churn and cultivate long-term customer
relationships in an increasingly competitive marketplace.

2. LITERATURE REVIEW

2.1 TITLE: CUSTOMER CHURN PREDICTION USING SYNTHETIC
MINORITY OVERSAMPLING TECHNIQUE

Customer churn prediction is the process of using data and analytics to forecast which
customers are likely to stop using a product or service. Business analysts and CRM (Customer
Relationship Management) analysts should understand the causes of customer churn and
analyze the behaviour trends of existing churned customers from the data.

2.2 TITLE: NEW INSIGHTS INTO CHURN PREDICTION IN THE
TELECOMMUNICATION SECTOR: A PROFIT DRIVEN DATA MINING
APPROACH

This study presents a profit-driven data mining approach for churn prediction in the
telecommunications sector. By integrating customer lifetime value (CLV) into the modeling process,
the authors demonstrate improved accuracy and actionable insights for reducing churn and
maximizing profitability.

2.3 TITLE: COMPARATIVE STUDY OF MACHINE LEARNING
ALGORITHMS FOR CHURN PREDICTION

This comparative study evaluates the performance of various machine learning algorithms for
churn prediction in the e-commerce domain. By analyzing factors such as accuracy and precision, the
authors provide insights into the strengths and weaknesses of different modeling techniques.

2.4 TITLE: AN INTELLIGENT CUSTOMER CHURN PREDICTION AND
RESPONSE FRAMEWORK

Customer retention is one of the most important issues for companies. Companies always
seek to reduce customer churn in order to increase customer lifetime value and reduce the
cost of acquiring new customers. By focusing on customer churn prediction and
identification, companies can predict in advance which customers are going to churn and
therefore decrease the customer churn rate through related personalized actions.

2.5 TITLE: CUSTOMER CHURN PREDICTION USING IMPROVED
BALANCED RANDOM FORESTS

This study proposes an Improved Balanced Random Forests (IBRF) algorithm for customer
churn prediction. By addressing class imbalance and incorporating feature selection techniques, the
authors demonstrate enhanced performance in accurately identifying churn-prone customers.

2.6 TITLE: ENHANCING CUSTOMER CHURN PREDICTION IN
DIGITAL BANKING USING ENSEMBLE MODELING

Many purchases occur daily on digital banking platforms, and electronic banking involves
various transactions that reflect purchase behavior. Customer attrition from one digital
store to another has become a challenge for business owners, so businesses should measure
their customer churn rate at regular intervals. Customer churn prediction is an important
part of customer relationship management (CRM) strategies, used to forecast the probability of
attrition.

3. SYSTEM ANALYSIS

3.1 EXISTING SYSTEM:

System analysis is a critical phase in the development lifecycle of any software or
information system, encompassing a comprehensive examination and understanding of system
requirements, objectives, and constraints. It initiates with the gathering of requirements from
various stakeholders, including end-users, managers, and technical experts. Through interviews,
surveys, and workshops, system analysts delve into the needs of the organization and its users,
aiming to identify existing problems or challenges that the system aims to address.
This phase involves conducting a feasibility study to assess the viability of the proposed
system from technical, economic, and operational standpoints. Technical feasibility evaluates
whether the system can be developed using available technology, while economic feasibility
assesses the project's financial viability. Operational feasibility focuses on determining if the
system will meet user needs and align with organizational goals. Once requirements are gathered,
system analysts perform a detailed analysis and specification process.
This involves breaking down requirements into functional and non-functional categories
and documenting them using various techniques such as use case diagrams, entity-relationship
diagrams, and requirement traceability matrices. System modeling plays a crucial role in this
phase, where analysts create visual representations of the system's structure, behavior, and
interactions. Common modeling techniques include data flow diagrams, entity-relationship
diagrams, and process models. These models aid stakeholders in visualizing system components,
workflows, and relationships, facilitating communication and validation of requirements. In some
cases, system analysts may develop prototypes or mockups to demonstrate key features and
gather feedback from stakeholders early in the development process.
Prototyping helps validate requirements, refine design decisions, and identify potential
issues before full-scale development commences. Requirement validation is an iterative process
where analysts review requirements with stakeholders to ensure accuracy, completeness, and
alignment with organizational goals. This involves conducting review meetings, demonstrations,
and sign-off procedures to confirm agreement on the system scope and requirements.

3.2 PROPOSED SYSTEM:

The proposed system aims to develop a comprehensive churn prediction model that
utilizes advanced analytics and machine learning techniques to identify customers at risk of churn and
implement proactive retention strategies. The system will start by collecting and preprocessing
relevant customer data from various sources such as CRM systems, databases, and third-party APIs.
Exploratory data analysis (EDA) will be conducted to gain insights into customer behavior, identify
trends, correlations, and anomalies, thereby informing feature selection and modeling decisions.
Feature engineering techniques will be employed to extract relevant features from the dataset, while
feature selection methods will help identify the most important predictors of churn, reducing
dimensionality and improving model efficiency.
Multiple machine learning algorithms, including logistic regression, decision trees, random
forests, gradient boosting machines, and neural networks, will be trained on the preprocessed data,
with hyperparameters optimized and performance evaluated using cross-validation techniques. The
final model will be validated on unseen data to ensure generalizability and reliability in real-world
scenarios.
Once validated, the churn prediction model will be deployed into production environments,
seamlessly integrated with existing business systems and workflows for real-time predictions. This
deployment will include the development of APIs or interfaces for automated decision-making and
personalized customer interactions. The proposed system offers several benefits, including early
identification of customers at risk of churn, optimization of marketing strategies and resource
allocation, and enhancement of customer satisfaction and loyalty through improved engagement and
service quality.
Continuous monitoring and refinement of the churn prediction model will be conducted based
on ongoing feedback and new data, with future enhancements including the integration of additional
data sources and advanced techniques for further improving prediction performance. Overall, the
proposed system provides a comprehensive solution for businesses seeking to mitigate customer
churn and drive sustainable growth and profitability through data-driven insights and predictive
analysis.

4. SYSTEM SPECIFICATION

4.1 SOFTWARE REQUIREMENTS:

▪ HTML
▪ Bootstrap CSS
▪ JavaScript
▪ Python
▪ Flask

4.2 HARDWARE REQUIREMENTS:

▪ The minimum memory size required.
▪ Smartphone: offers more advanced computing ability and connectivity.
▪ Personal laptop.
▪ Processor: Intel i3
▪ RAM: 8 GB
▪ Edition: Windows 11

5. SOFTWARE DESCRIPTION

5.1 PYTHON:
Python serves as the primary programming language for developing the customer churn
prediction project. Its simplicity, versatility, and extensive library support make it well-suited for
data analysis, machine learning, and predictive modeling tasks. Python provides a rich ecosystem of
packages and frameworks, facilitating seamless integration with other tools such as NumPy,
pandas, and scikit-learn for data manipulation, analysis, and model development.

5.2 JUPYTER NOTEBOOK:

Jupyter Notebook is an interactive computing environment that allows users to create and share
documents containing live code, equations, visualizations, and narrative text. It provides an ideal
platform for data exploration, analysis, and experimentation in a collaborative and reproducible
manner. In the churn prediction project, Jupyter Notebook is used for exploratory data analysis
(EDA), model development, and result visualization. Its interactive interface enables users to
iteratively explore data, visualize insights, and document findings in a structured and intuitive
manner.

KEY COMPONENTS OF JUPYTER NOTEBOOK:

1. Notebook Dashboard: The notebook dashboard is the interface where users can manage and
organize their Jupyter notebooks. It provides access to existing notebooks, allows for the creation
of new notebooks, and facilitates navigation through directories and files.

2. Notebook Editor: The notebook editor is the main workspace where users can create, edit, and
execute code cells, as well as add formatted text, equations, images, and interactive widgets. It
supports multiple programming languages, including Python, R, Julia, and Scala.

3. Cells: Jupyter notebooks are organized into cells, which can contain either code, markdown
text, or raw text. Code cells allow users to write and execute code interactively, while markdown
cells enable the inclusion of formatted text, equations using LaTeX syntax, and multimedia
content.

4. Kernel: The kernel is the computational engine that executes code within the notebook
environment. Each notebook is associated with a specific kernel, which determines the
programming language and environment in which code cells are executed. Jupyter supports a wide
range of kernels, including Python, R, Julia, and more.

5. Execution Environment: Jupyter Notebook provides an interactive computing environment that
allows users to execute code and view results in real-time. Users can run individual code cells or
execute the entire notebook, observing outputs such as text, plots, tables, and interactive
visualizations directly within the notebook interface.

5.3 VISUAL STUDIO CODE (VSCode):

Visual Studio Code (VSCode) is a lightweight yet powerful source code editor developed by
Microsoft. It offers a wide range of features, including syntax highlighting, code completion,
debugging support, and version control integration, making it an ideal environment for software
development and data science projects. In the context of the customer churn prediction project,
VSCode provides a convenient and customizable workspace for writing, debugging, and managing
Python code, as well as integrating with Jupyter Notebooks for interactive data analysis and model
development.

KEY FEATURES OF VSCode:

1. Intelligent Code Editing: VSCode provides intelligent code editing features such as syntax
highlighting, code completion, and automatic code formatting, which help developers write
code faster and with fewer errors.

2. Built-in Terminal: VSCode includes a built-in terminal that allows developers to execute
command-line tasks directly within the editor, eliminating the need to switch between
different applications.

3. Debugger: VSCode comes with a built-in debugger that supports multiple languages, enabling
developers to debug their code directly within the editor. It provides features such as breakpoints,
watch variables, and call stacks to facilitate the debugging process.

4. Task Automation: VSCode supports task automation through its built-in task runner, which
allows developers to define and execute custom tasks such as building, testing, and deployment
directly within the editor. This streamlines common development tasks and improves workflow
efficiency.

5. Integrated Development Environment (IDE) Features: Despite being a lightweight code
editor, VSCode offers many features typically found in full-fledged integrated development
environments (IDEs), such as code navigation, refactoring tools, and project management
capabilities.

5.4 NUMPY

NumPy is a fundamental library for scientific computing in Python, providing support for
multidimensional arrays, mathematical functions, and linear algebra operations. In the context
of the customer churn prediction project, NumPy is used for efficient data manipulation,
numerical computations, and array-based operations. It enables seamless handling of large
datasets and facilitates the implementation of machine learning algorithms for churn
prediction.

6. PROJECT DESCRIPTION:

6.1 PROBLEM DESCRIPTION:

Retaining customers has proven to be the most successful strategy in the highly
competitive telecommunications sector, where gaining new clients, upselling to current
ones, and improving customer retention are important revenue-generating techniques.
Because retention is more convenient and cheaper than acquiring new clients or upselling
to existing ones, it is especially attractive. Telecom companies are putting more emphasis
on anticipating and resolving customer churn (defined as customers switching providers)
since they realise how important it is to mitigate this issue. Telecom businesses use
machine learning technology to identify customers who are at risk of leaving by analysing
historical data. This technology has proven to be highly effective in churn prediction.

Nine months' worth of user data were used in a recent study on the Syrian telecom
sector. Predictive models were constructed from a combined total of almost 70 terabytes
of data in multiple forms. With the use of a big data platform to gather and process the data,
features such as social network analysis (SNA) metrics could be extracted, improving
the churn models' predictive accuracy. In order to create a reliable predictive model for
churn, the study assessed a variety of tree-based machine learning techniques, such as
XGBoost, Decision Trees, Random Forests, and Gradient Boosted Machines.

In addition, the study experimented with oversampling, undersampling, and no
rebalancing scenarios in order to address issues like class imbalance within the dataset. The
traditional use of data warehouse systems to reduce churn rates had drawbacks,
especially when it came to handling big and heterogeneous datasets. Furthermore, big data
platforms made it easier to compute SNA metrics, which are essential for comprehending
the interactions and actions of customers in expansive networks. The implementation of
the churn prediction model, which was verified through the use of fresh datasets, showed
how useful it is for providing guidance to decision-making procedures and lowering
customer churn.
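
The rebalancing scenarios mentioned above can be illustrated with a minimal sketch of random oversampling and random undersampling. This is an illustration only, not the cited study's procedure: the function names, the toy data, and the choice of plain random resampling (rather than a synthetic method such as SMOTE) are assumptions.

```python
import random
from collections import Counter

def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of smaller classes until every class matches the largest."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels

def random_undersample(rows, labels, seed=0):
    """Keep a random subset of each class so every class matches the smallest."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = min(counts.values())
    out_rows, out_labels = [], []
    for cls in counts:
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for r in rng.sample(pool, target):
            out_rows.append(r)
            out_labels.append(cls)
    return out_rows, out_labels

# Toy data (made up): 8 non-churners (label 0) and 2 churners (label 1).
rows = [[i] for i in range(10)]
labels = [0] * 8 + [1] * 2

over_rows, over_labels = random_oversample(rows, labels)
under_rows, under_labels = random_undersample(rows, labels)
print("oversampled:", Counter(over_labels))
print("undersampled:", Counter(under_labels))
```

Resampling of this kind is normally applied to the training split only, so that the evaluation data keeps the original class proportions.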

6.2 INTRODUCTION TO PROPOSED SYSTEM

6.2.1 SYSTEM ARCHITECTURE:

The proposed approach uses several machine learning models to predict customer
churn. Extracting features from the dataset is the first step. Various machine
learning models are then trained using these features.

FIGURE 6.1.1 SYSTEM ARCHITECTURE

6.2.2 MACHINE LEARNING:

The term "machine learning" describes the application of statistical models and
algorithms to allow computer systems to identify trends in past data and forecast future
customer attrition. In order to identify probable churners, machine learning approaches
can analyse a variety of data, including customer demographics, usage habits, billing
information, and customer care interactions. In order to identify the underlying patterns
and correlations between predictor factors and churn, machine learning (ML) algorithms
are trained on historical data with known churn outcomes. This allows churn to be
predicted for new customers.

1. KNN:

K-Nearest Neighbours is a straightforward supervised machine learning technique
that works well for applications involving regression and classification. KNN
predicts telco customer churn by comparing a new data point (a customer) to
previous data points (customers with known churn outcomes) using a selected
distance measure, usually Euclidean distance. The technique assigns the new data
point the majority class label (churn or non-churn) among its K nearest
neighbours, which makes it a useful tool for detecting customers who are likely
to churn based on their closeness to known churners.
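
The majority-vote rule described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the project's code: the function name `knn_predict`, the two features, and the six toy customers are all hypothetical.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Return the majority label among the k training points closest to `query`."""
    # Euclidean distance from the query to every training point.
    distances = [(math.dist(x, query), y) for x, y in zip(train_X, train_y)]
    distances.sort(key=lambda pair: pair[0])
    nearest_labels = [y for _, y in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical customers: [tenure in months, customer-service calls]
train_X = [[24, 1], [36, 0], [30, 1], [3, 5], [2, 4], [5, 6]]
train_y = ["stay", "stay", "stay", "churn", "churn", "churn"]

print(knn_predict(train_X, train_y, [4, 5]))   # three nearest neighbours are all churners
print(knn_predict(train_X, train_y, [28, 1]))  # three nearest neighbours all stayed
```

Because the vote depends on raw distances, features measured on very different scales are usually standardised before KNN is applied.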

2. RANDOM FOREST:

By choosing subsets of features and data samples at random from the training set,
Random Forest builds a large number of decision trees. Based on its distinct subset of
features, every decision tree separately estimates the churn probability for a particular
customer. The various forecasts from each decision tree are then combined to get the final
prediction, which is typically done using an averaging or simple majority method.
Because Random Forests can handle big datasets with high dimensionality and noisy
features, they are widely used for churn prediction tasks that need interpretability and
feature importance.
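
The three ingredients named above (bootstrap samples, random feature subsets, and a majority vote) can be shown with a deliberately tiny forest whose trees have a single split. This is a simplified sketch under stated assumptions, not a full Random Forest: real implementations grow deeper trees and usually draw a fresh feature subset at every split. All names and the toy data are hypothetical.

```python
import random
from collections import Counter

def majority(labels):
    return Counter(labels).most_common(1)[0][0]

def fit_stump(X, y, feature_ids):
    """Pick the (feature, threshold) split with the fewest training errors."""
    best = None
    for f in feature_ids:
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            l_lab, r_lab = majority(left), majority(right)
            errors = sum(lab != l_lab for lab in left) + sum(lab != r_lab for lab in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    if best is None:  # no valid split: always predict the sample's majority class
        m = majority(y)
        return (0, float("inf"), m, m)
    return best[1:]

def fit_forest(X, y, n_trees=25, n_features=1, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in X]          # bootstrap sample of rows
        feats = rng.sample(range(len(X[0])), n_features)  # random feature subset
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest

def predict(forest, row):
    votes = [l_lab if row[f] <= t else r_lab for f, t, l_lab, r_lab in forest]
    return majority(votes)  # simple majority vote across trees

# Hypothetical customers: [tenure in months, customer-service calls]; 1 = churn
X = [[24, 1], [36, 0], [30, 1], [3, 5], [2, 4], [5, 6]]
y = [0, 0, 0, 1, 1, 1]
forest = fit_forest(X, y)
print(predict(forest, [1, 7]), predict(forest, [40, 0]))
```

Each tree sees a different resampled view of the data, so individual trees disagree on borderline customers and the vote averages out their errors.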

3. XGBOOST:

In order to prioritise the most difficult cases in each iteration, XGBoost iteratively
fits new decision trees to the residuals (the discrepancies between the actual and
predicted churn probability) of the previous trees. XGBoost is especially useful for
complicated datasets with non-linear relationships between predictor variables and the
target variable since it uses regularisation techniques to limit overfitting and
optimise model performance. XGBoost is frequently used in data science
competitions and real-world applications, such as telco customer churn prediction, as it
frequently surpasses other algorithms in terms of predictive accuracy.
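
The idea of fitting each new tree to the residuals of the previous ones can be sketched with one-split regression trees and a squared-error loss. This is plain gradient boosting written for illustration, not XGBoost itself: XGBoost additionally uses a regularised objective and second-order gradient information. The function names, the learning rate, and the toy data are assumptions.

```python
def fit_stump(xs, targets):
    """One-split regression tree on a single feature, minimising squared error."""
    best = None
    for t in sorted(set(xs))[:-1]:  # the largest value would leave the right side empty
        left = [r for x, r in zip(xs, targets) if x <= t]
        right = [r for x, r in zip(xs, targets) if x > t]
        l_mean, r_mean = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - l_mean) ** 2 for r in left) + sum((r - r_mean) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, l_mean, r_mean)
    return best[1:]

def fit_boosted_stumps(xs, ys, n_rounds=20, learning_rate=0.3):
    base = sum(ys) / len(ys)  # start from the mean churn rate
    preds = [base] * len(xs)
    stumps = []
    for _ in range(n_rounds):
        # Residuals: what the current ensemble still gets wrong.
        residuals = [y - p for y, p in zip(ys, preds)]
        t, left_val, right_val = fit_stump(xs, residuals)
        stumps.append((t, left_val, right_val))
        preds = [p + learning_rate * (left_val if x <= t else right_val)
                 for x, p in zip(xs, preds)]
    return base, learning_rate, stumps

def predict(model, x):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(lv if x <= t else rv for t, lv, rv in stumps)

# Hypothetical data: number of customer-service calls vs. churn (1) or no churn (0).
calls = [0, 1, 1, 2, 4, 5, 6, 7]
churned = [0, 0, 0, 0, 1, 1, 1, 1]
model = fit_boosted_stumps(calls, churned)
print(round(predict(model, 0), 3), round(predict(model, 7), 3))
```

Each round shrinks the remaining residual by a factor set by the learning rate, which is why many small steps are preferred over a few large ones. With a squared-error loss the raw outputs are not guaranteed to stay within 0 and 1; classification libraries use a logistic loss instead.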
6.3 MODULE DESCRIPTION:

6.3.1 DATA COLLECTION AND EXPLORATORY DATA ANALYSIS:

In order to find trends and insights that could affect customer attrition, this module
performs exploratory data analysis (EDA) on the Telco customer dataset. It consists of
statistical summaries and data visualisation methods (box plots, scatter plots, histograms,
correlation matrices, and distributions of categorical variables). The goals of the analysis
are to understand the distribution of the data and to look for outliers, missing values, and
correlations between variables. The EDA module provides a thorough understanding of the
Telco customer dataset, allowing for well-informed decisions during the model construction
process. It assists in identifying critical features that could influence customer attrition
by examining trends and characteristics in the data, which informs feature selection and
engineering approaches.

6.3.2 MODEL BUILDING:

Using the XGBoost and Random Forest algorithms, this module focuses on developing
and assessing predictive models for Telco customer attrition. It includes choosing
features, training the model, adjusting hyperparameters, and evaluating the results using
performance measures including recall, accuracy, precision, F1-score, and ROC-AUC.
Cross-validation techniques are used to obtain reliable estimates of model performance. The
objective of the Model Building and Evaluation module is to create accurate prediction
models that help identify possible churners in the Telco client base. It investigates several
modelling techniques to capture intricate relationships within the data and increase prediction
accuracy by utilising the XGBoost and Random Forest algorithms. Evaluation metrics
offer insight into model performance, directing further iterations and optimisation
efforts.
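
As a rough illustration of the cross-validation step, the sketch below splits sample indices into k folds so that each sample is used for validation exactly once. The function name `k_fold_indices` and the fold count are assumptions, and the split is not stratified; with an imbalanced churn label, a stratified split that preserves the churn ratio in each fold is usually preferable.

```python
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle indices once, cut them into k folds, and yield (train, validation) pairs."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, val_idx in enumerate(folds):
        # Training indices are all folds except the one held out for validation.
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train_idx, val_idx

for train_idx, val_idx in k_fold_indices(10, k=5):
    print(len(train_idx), "training rows,", len(val_idx), "validation rows")
```

A model would be trained on `train_idx` and scored on `val_idx` in each iteration, giving one score per fold.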

6.3.3 MODEL EVALUATION:
We have selected the F1 score as our main performance indicator in light of the data's
imbalance, where churn events are less common than non-churn events. By balancing
recall and precision, the F1 score enables us to assess the model's
capacity to detect churners while reducing the number of incorrect predictions. By
concentrating on the F1 score, we want to create a reliable churn prediction model that
correctly detects churners and gives No-Churn Telecom the information they need to put
customised retention tactics into place. This strategy will maximise the efficiency of
resource allocation for retention efforts while enhancing customer satisfaction and loyalty.
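
For reference, precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 = 2 x precision x recall / (precision + recall), where TP, FP, and FN count true positives, false positives, and false negatives for the churn class. The sketch below computes these on made-up labels to show why accuracy alone can mislead on imbalanced data; the function name and the toy labels are assumptions.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for the positive (churn) class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Made-up labels: 3 churners (1) among 10 customers.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_model = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]   # finds 2 of 3 churners, 1 false alarm
y_never = [0] * 10                         # always predicts "no churn"

print(accuracy(y_true, y_model), precision_recall_f1(y_true, y_model))
print(accuracy(y_true, y_never), precision_recall_f1(y_true, y_never))
```

The always-negative predictor still reaches 70% accuracy on these labels while its F1 score is zero, which is the behaviour that motivates using F1 on imbalanced churn data.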

6.3.4 FLASK INTEGRATION AND DEPLOYMENT:

For deployment, this module combines the trained predictive models with a Flask web
application. It entails configuring Flask routes, handling data entered by users, using
the deployed models to make predictions, and displaying the outcomes on the user
interface. This module includes deployment to a production environment such as Heroku or
AWS. The Flask Integration and Deployment module makes it easier for end users to
access and use the Telco customer churn prediction system by enabling its deployment into
a live environment. The trained models are integrated with Flask to allow for real-time
predictions based on user input. This module offers a user-friendly interface for
interacting with the churn prediction system and ensures smooth interaction between the
predictive models and the web application.
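
A minimal sketch of such a prediction route is given below. It is not the project's application: the `/predict` path, the three field names, and the hand-written `predict_churn` rule (a stand-in for a trained model) are all hypothetical, and the route returns JSON rather than rendering a page.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# Hypothetical input fields; the project's real form fields may differ.
FEATURES = ["account_length", "international_plan", "customer_service_calls"]

def predict_churn(features):
    """Placeholder rule standing in for a trained model (NOT the report's XGBoost model)."""
    return features["customer_service_calls"] >= 4 or features["international_plan"] == 1

@app.route("/predict", methods=["POST"])
def predict():
    payload = request.get_json(silent=True)
    if not isinstance(payload, dict):
        return jsonify(error="expected a JSON object"), 400
    missing = [name for name in FEATURES if name not in payload]
    if missing:
        return jsonify(error=f"missing fields: {missing}"), 400
    try:
        features = {name: float(payload[name]) for name in FEATURES}
    except (TypeError, ValueError):
        return jsonify(error="all fields must be numeric"), 400
    return jsonify(churn=bool(predict_churn(features)))
```

During development, a file containing this code could be served with the `flask run` command; in a real deployment the placeholder rule would be replaced by a call to the trained model loaded at start-up.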

6.3.5 COMPARISON OF VARIOUS ALGORITHMS:

• Compared to other models, the F1 scores of Logistic Regression are comparatively
lower, ranging from 0.7716 to 0.7951.
• K-Nearest Neighbours (KNN) has consistent performance across folds and
performs well, with F1 scores ranging from 0.8983 to 0.9392.
• The Support Vector Classifier (SVC), which has scores ranging from 0.8818 to
0.9163, likewise routinely delivers good results.
• Decision Tree performs steadily, as evidenced by its F1 scores, which range from
0.9101 to 0.9351.
• Random Forest has high F1 scores, ranging from 0.9499 to 0.9661, which indicate
that it has good predictive power.
• Gradient Boosting typically yields good results, with F1 scores between 0.9251 and
0.9393.
• With scores ranging from 0.9603 to 0.9744, which indicates good predictive power,
XGBoost performs quite well.
• The MLP Classifier exhibits competitive performance with F1 scores ranging from
0.9242 to 0.9439.

The two best models to consider are:

XGBoost: It consistently outperforms all other models in terms of F1 scores and exhibits
strong performance across folds.

Random Forest: It is another excellent candidate for selection since it consistently
performs well and displays high F1 scores.

XGBoost and Random Forest are the two best performing models; further assessment
and comparison are advised based on additional factors such as model complexity,
interpretability, computing needs, and task-specific goals.
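
The F1 ranges listed above can be ranked with a few lines of code. The numbers are copied from the list; ranking by the worst-fold score is an assumption made for this sketch, not a criterion stated in the report, and the per-fold pairing of scores cannot be recovered from the ranges alone.

```python
# (lowest, highest) F1 score across folds, as reported in the list above.
f1_ranges = {
    "Logistic Regression": (0.7716, 0.7951),
    "KNN": (0.8983, 0.9392),
    "SVC": (0.8818, 0.9163),
    "Decision Tree": (0.9101, 0.9351),
    "Random Forest": (0.9499, 0.9661),
    "Gradient Boosting": (0.9251, 0.9393),
    "XGBoost": (0.9603, 0.9744),
    "MLP Classifier": (0.9242, 0.9439),
}

# Conservative ranking: sort by the lowest score each model achieved on any fold.
ranking = sorted(f1_ranges, key=lambda name: f1_ranges[name][0], reverse=True)

def ranges_overlap(a, b):
    """True if two (low, high) intervals share at least one value."""
    return a[0] <= b[1] and b[0] <= a[1]

for name in ranking:
    low, high = f1_ranges[name]
    print(f"{name:<20} {low:.4f} to {high:.4f}")

print("XGBoost and Random Forest ranges overlap:",
      ranges_overlap(f1_ranges["XGBoost"], f1_ranges["Random Forest"]))
```

On these figures XGBoost and Random Forest take the top two places, and their ranges overlap, which supports the advice to weigh further factors before choosing between them.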

FIGURE 6.3.5 Comparison of various algorithms

7. SYSTEM IMPLEMENTATION:
7.1 SYSTEM ARCHITECTURE:
Here is the implementation for predicting whether the customer will churn or not.

FIGURE 7.1.1 SYSTEM ARCHITECTURE

7.2 FLOW CHART:

FIGURE 7.2.1 FLOW CHART

8. RESULTS AND DISCUSSION

Churn risk scores are an essential component of customer retention strategies for
companies in fiercely competitive industries. These scores help businesses identify
customers who are likely to leave or switch to a competitor. By estimating the
likelihood of churn, businesses can proactively target these customers with tailored
retention campaigns and offers.

For this project, we first examined a dataset that included account length,
international plan, voicemail plan, call duration, charges, and the number of customer
service calls. To prepare the data for modelling, we carried out exploratory data
analysis, data preprocessing, and feature engineering. We chose the XGBoost algorithm,
a gradient boosting method well known for its effectiveness on classification problems,
for churn prediction. XGBoost suits this project because it offers strong predictive
performance and can handle imbalanced data. We identified the most significant churn
predictors by combining several feature selection techniques, including correlation
analysis and XGBoost feature importance. These predictors included account length,
international plan, voicemail plan, call duration, charges, and customer service calls.

We further refined the model by identifying the optimal decision threshold, presumably
using a self-made library called BrainBay. The trained XGBoost model identified
churning customers with high accuracy. The model's performance was assessed with
accuracy, precision, recall, and F1 score; because the dataset was imbalanced, we paid
particular attention to the F1 score. With an F1 score of 97%, the model showed
encouraging results, suggesting that it can correctly identify prospective churners.
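
The report does not show how BrainBay selects the threshold, so the following is only a stand-in sketch of the general idea: sweep candidate thresholds over predicted churn probabilities and keep the one that maximises the F1 score. The function names `f1_at_threshold` and `best_threshold`, the candidate grid, and the toy labels and probabilities are all assumptions made for illustration; they are not project data or the BrainBay API.

```python
def f1_at_threshold(y_true, y_prob, threshold):
    """F1 score of the positive class when predicting 1 if prob >= threshold."""
    tp = fp = fn = 0
    for truth, prob in zip(y_true, y_prob):
        pred = 1 if prob >= threshold else 0
        if pred == 1 and truth == 1:
            tp += 1
        elif pred == 1 and truth == 0:
            fp += 1
        elif pred == 0 and truth == 1:
            fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_threshold(y_true, y_prob, candidates=None):
    """Return (threshold, f1) with the highest F1; ties go to the lowest threshold."""
    if candidates is None:
        candidates = [i / 100 for i in range(1, 100)]
    scored = [(f1_at_threshold(y_true, y_prob, t), t) for t in candidates]
    best_f1 = max(score for score, _ in scored)
    best_t = min(t for score, t in scored if score == best_f1)
    return best_t, best_f1


# Toy, made-up labels and probabilities (imbalanced: 3 churners out of 10).
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 0, 1]
y_prob = [0.05, 0.10, 0.20, 0.30, 0.35, 0.45, 0.40, 0.60, 0.55, 0.90]

threshold, score = best_threshold(y_true, y_prob)
print("default 0.5 ->", round(f1_at_threshold(y_true, y_prob, 0.5), 3))
print("best", threshold, "->", round(score, 3))
```

On imbalanced data the default cut-off of 0.5 is not necessarily the one that balances precision and recall best, which is why a tuned threshold can raise the F1 score without retraining the model.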

9. CONCLUSION AND FUTURE ENHANCEMENT:

9.1 CONCLUSION:

The churn prediction model developed in this project can help No-Churn Telecom
retain customers proactively and improve customer satisfaction. By identifying customers
who are at risk of leaving, the company can design customised retention programmes and
personalised offers to reduce churn. This can lead to greater customer loyalty, a lower
churn rate, and higher profitability. We followed data science best practices in every
phase of the project, including data preprocessing, feature engineering, model selection,
and performance assessment. The project showed how important it is to understand
business objectives, feature engineering, and model selection when solving real-world
problems. It also demonstrated the effectiveness of XGBoost for churn prediction tasks.

9.2 FUTURE ENHANCEMENT:

Future work on telecom churn prediction could focus on data augmentation to
build a larger dataset, more sophisticated feature engineering to gain deeper insight,
and ensemble learning to combine models for higher accuracy. Deep learning techniques
such as RNNs and CNNs may improve predictive performance further. Deploying the
models in real time, integrated with telecom systems, would enable timely intervention,
and ongoing monitoring would keep the models relevant. By collecting varied data,
applying sophisticated techniques, and putting models into production, telecom firms
can improve churn prediction, strengthen retention strategies, and build long-term
customer loyalty.
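
As a rough illustration of the ensemble idea mentioned above, the sketch below combines churn probabilities from several models by (optionally weighted) averaging, often called soft voting. The function `soft_vote` and the probability values are hypothetical and chosen only for the example; the project itself does not implement an ensemble.

```python
def soft_vote(prob_lists, weights=None):
    """Average per-model churn probabilities for each customer."""
    if weights is None:
        weights = [1.0] * len(prob_lists)
    total = sum(weights)
    n_samples = len(prob_lists[0])
    return [
        sum(w * probs[i] for w, probs in zip(weights, prob_lists)) / total
        for i in range(n_samples)
    ]


# Hypothetical churn probabilities from two models for three customers.
xgb_probs = [0.90, 0.20, 0.55]
rf_probs = [0.80, 0.30, 0.35]

combined = soft_vote([xgb_probs, rf_probs])
labels = [1 if p >= 0.5 else 0 for p in combined]
print(combined, labels)
```

Averaging probabilities rather than hard labels lets a confident model outweigh an uncertain one, as with the third customer here, where the two models disagree.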

9.3 APPENDIX:

9.3.1 SOURCE CODE:

#HTML
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Dashboard Selection</title>
    <style>
        body {
            font-family: Arial, sans-serif;
            margin: 0;
            padding: 0;
            background-image: url('img3.jpg');
            background-size: cover;
            background-position: center;
            display: flex;
            justify-content: center;
            align-items: center;
            min-height: 100vh;
        }

        .container {
            display: flex;
            flex-direction: column;
            align-items: center;
            max-width: 800px;
            padding: 20px;
            background-color: rgba(255, 255, 255, 0.8);
            border-radius: 10px;
        }

        .section {
            margin: 20px 0;
            text-align: center;
        }

        .section img {
            display: block;
            width: 300px;
            height: 200px;
            border-radius: 10px;
            margin: auto;
        }

        .btn {
            display: block;
            margin-top: 10px;
            padding: 10px 20px;
            background-color: #007bff;
            color: #fff;
            text-decoration: none;
            border-radius: 4px;
            transition: background-color 0.3s ease;
        }

        .btn:hover {
            background-color: #0056b3;
        }
    </style>
</head>
<body>
    <div class="container">
        <div class="section" id="dashboard1">
            <img src="data.png" alt="Dashboard 1">
            <h3>Power BI Dashboard</h3>
            <a href="dashboard1.html" class="btn">Analyze</a>
        </div>
        <div class="section" id="dashboard2">
            <img src="predict.jpg" alt="Dashboard 2">
            <h3>Churn Prediction</h3>
            <a href="C:\Users\sakth\OneDrive\Desktop\project\templates\index.html"
               class="btn">Predict</a>
        </div>
    </div>
</body>
</html>

#FLASK CODE

import numpy as np
import pandas as pd
import xgboost
import pickle
import sklearn
from flask import Flask, render_template, request

app = Flask(__name__)

# Load the trained model and the fitted preprocessing pipeline
model = pickle.load(open('model.pkl', 'rb'))
preprocessor = pickle.load(open('preprocessor.pkl', 'rb'))


@app.route('/')
def Home():
    return render_template('index.html')


@app.route('/predict', methods=['POST'])
def predict():
    # Gathering inputs
    account_length = int(request.form.get('account_length'))
    international_plan = request.form.get('international_plan')
    vmail_message = int(request.form.get('vmail_message'))
    day_calls = int(request.form.get('day_calls'))
    day_charge = float(request.form.get('day_charge'))
    eve_charge = float(request.form.get('eve_charge'))
    night_charge = float(request.form.get('night_charge'))
    international_calls = int(request.form.get('international_calls'))
    international_charge = float(request.form.get('international_charge'))
    custServ_calls = int(request.form.get('custServ_calls'))

    # Single-row DataFrame with the column names the preprocessor expects
    inputs = pd.DataFrame(
        np.array([account_length, international_plan, vmail_message, day_calls,
                  day_charge, eve_charge, night_charge, international_calls,
                  international_charge, custServ_calls]).reshape(1, -1),
        columns=['account_length', 'international_plan', 'vmail_message',
                 'day_calls', 'day_charge', 'eve_charge', 'night_charge',
                 'international_calls', 'international_charge',
                 'custServ_calls'])

    input_processed = preprocessor.transform(inputs)
    prediction = model.predict(input_processed)

    # Generate churn risk scores (probability of the churn class, in percent)
    churn_risk_scores = np.round(model.predict_proba(input_processed)[:, 1] * 100, 2)

    # Churn flag
    if prediction[0] == 1:
        prediction = 'YES'
    else:
        prediction = 'NO'

    return render_template('predict.html', prediction=prediction,
                           churn_risk_scores=churn_risk_scores, inputs=request.form)


if __name__ == '__main__':
    app.run(debug=True)
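
In the route above, a call such as `int(request.form.get('account_length'))` raises a `TypeError` when the field is absent, because `get` returns `None`, and a `ValueError` when the text is not numeric. The sketch below shows one possible way to validate the form before prediction. The helper `parse_form` and the `FIELD_TYPES` mapping are hypothetical additions, not part of the project code; a plain `dict` stands in for `request.form` so the example runs without Flask.

```python
# Field name -> converter, mirroring the fields read in predict().
FIELD_TYPES = {
    "account_length": int,
    "international_plan": str,
    "vmail_message": int,
    "day_calls": int,
    "day_charge": float,
    "eve_charge": float,
    "night_charge": float,
    "international_calls": int,
    "international_charge": float,
    "custServ_calls": int,
}


def parse_form(form):
    """Convert raw form strings to typed values and collect per-field errors."""
    values, errors = {}, {}
    for field, convert in FIELD_TYPES.items():
        raw = form.get(field)
        if raw is None or str(raw).strip() == "":
            errors[field] = "missing"
            continue
        try:
            values[field] = convert(str(raw).strip())
        except ValueError:
            errors[field] = f"expected {convert.__name__}"
    return values, errors


# A dict stands in for request.form in this illustration.
values, errors = parse_form({"account_length": "120", "day_charge": "abc"})
print(values)
print(errors)
```

A route could return the `errors` mapping to the template instead of failing with a server error when a field is left blank.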

import numpy as np
import pandas as pd
import geopandas as gpd
import mysql.connector
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.feature_selection import RFE
from sklearn.manifold import TSNE
from MGD_Outliers import OutlierNinja
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler, OrdinalEncoder
from imblearn.over_sampling import SMOTE
from collections import Counter

#model selection
from sklearn.model_selection import cross_val_score, KFold, cross_val_predict
from sklearn.model_selection import train_test_split

#model development
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import GradientBoostingClassifier
from xgboost import XGBClassifier
from sklearn.neural_network import MLPClassifier

from sklearn.metrics import classification_report, accuracy_score, f1_score, precision_score
from sklearn.metrics import recall_score, make_scorer, confusion_matrix, roc_auc_score
from BrainBay import check_tradeoffs, opt_threshold

#hyperparameter tuning
from sklearn.model_selection import GridSearchCV

#deployment
import pickle

#warnings
import warnings
warnings.filterwarnings('ignore')

#set style
sns.set_style('darkgrid')
pd.set_option("display.max_columns", None)

def modify_churn(x):
    """
    Modifies the churn value to a binary representation.

    Args:
        x (str): The churn value.

    Returns:
        int: The modified churn value, represented as 1 if ' True.' and 0 otherwise.
    """
    if x == ' True.':
        return 1
    else:
        return 0

# number of customers who churned and who did not churn
churn_count = df['churn'].value_counts()

# pie chart to visualize the distribution of customer churn
labels = ['not churn', 'Churned']
sizes = churn_count.values.tolist()

#colors
colors = ['#66b3ff', '#99ff99']

fig1, ax1 = plt.subplots(figsize=(6,6))
ax1.pie(sizes, colors=colors, labels=labels, autopct='%1.1f%%', startangle=90,
        pctdistance=0.85, explode=(0.05,0.05))

#draw circle
centre_circle = plt.Circle((0,0),0.70,fc='white')
fig = plt.gcf()
fig.gca().add_artist(centre_circle)

# Equal aspect ratio ensures that pie is drawn as a circle
ax1.axis('equal')
plt.tight_layout();

# number of churned customers per state, largest first
churned = df[df['churn']==1].groupby('state')['area_code'].count().sort_values(ascending=False)

plt.figure(figsize=(20,10))
# keyword arguments: recent seaborn versions no longer accept x and y positionally
sns.barplot(x=list(churned.keys()), y=list(churned.values), palette='gist_rainbow_r') #gnuplot2
plt.xticks(rotation=90)
plt.title('State-wise Customer churn', fontsize=15, fontweight='bold');

# oversample the minority class with SMOTE
smote = SMOTE(random_state=42)
X_sm, y_sm = smote.fit_resample(X_processed, y)

print("Actual Classes",Counter(y))
print("SMOTE Classes",Counter(y_sm))
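
The listing above relies on `SMOTE` from imbalanced-learn. To show the core idea behind it, the following is a deliberately simplified, pure-Python sketch: each synthetic point is placed at a random position on the segment between a minority sample and one of its nearest minority neighbours. The function `smote_like_oversample`, its parameters, and the sample points are assumptions made for illustration; this is not imbalanced-learn's implementation and not project data.

```python
import random


def smote_like_oversample(minority, n_new, k=2, seed=42):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Sort the other minority points by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic


# Made-up 2-D minority samples.
minority = [(1.0, 1.0), (1.5, 1.2), (2.0, 1.8), (1.2, 2.0)]
new_points = smote_like_oversample(minority, n_new=6)
print(len(minority) + len(new_points), "minority samples after oversampling")
```

Because the new points are interpolated rather than duplicated, the minority class grows without repeating identical rows. Oversampling is normally applied only to training data, so that evaluation still reflects the original class balance.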

def check_result(test, pred):
    """
    Prints out the classification report, accuracy score, precision score,
    recall score, F1 score, ROC AUC score, and confusion matrix for a given
    set of test and predicted labels.

    Args:
        test (array-like of shape (n_samples,)): Ground truth (correct) target values.
        pred (array-like of shape (n_samples,)): Estimated targets as returned by a classifier.

    Returns:
        None
    """
    print(classification_report(test, pred))
    print('=========================================')
    print('Accuracy', accuracy_score(test, pred))
    print('=========================================')
    print('precision', precision_score(test, pred))
    print('=========================================')
    print('recall', recall_score(test, pred))
    print('=========================================')
    print('F1 score', f1_score(test, pred))
    print('=========================================')
    print('ROC AUC score', roc_auc_score(test, pred))
    print('=========================================')
    print('Confusion matrix')
    print(confusion_matrix(test, pred))

models= {'Logistic_Regression': LogisticRegression(),
'SVC': SVC(),
'KNN': KNeighborsClassifier(),
'Decision_Tree': DecisionTreeClassifier(),

9.3.2 SNAPSHOTS:

FIGURE 9.3.2.1 Landing Page

FIGURE 9.3.2.2 Login Page

FIGURE 9.2.3 Customer Churn Predictor

FIGURE 9.2.4 Result for churned customer

FIGURE 9.2.5 Result for non churned customer

