
Capstone Presentation

Telecom Churn Study

Richa Bhandari
PGPDSBA June 2020 Batch
Business Problem Understanding
Overview:
At company X, there is concern about the increased churn being experienced by all companies in
the telecom industry. Customer retention and average revenue per unit (ARPU) have become a strong area
of focus, especially for company X, as its churn rate has been relatively high.

Problem Statement:
Currently the effort to retain customers has been very reactive as the attempt is made only when the
subscriber calls in to close the account. The management team is keen to take more initiatives on this front
and have a targeted proactive strategy.
• Extensive experimentation with more than 10 different models was done to identify the best model that could
predict customer behavior so that the company can take proactive steps to retain the customer wherever
possible.

• The data was slightly imbalanced (majority class : minority class = 76:24), so an appropriate model
evaluation metric had to be chosen. A combination of the F1 score (the harmonic mean of precision and
recall) and the Area Under the Curve (AUC) was used to select the best model.

• Models tried to arrive at the best are:
   • Simple models like Logistic Regression & Discriminant Analysis with different thresholds for classification
   • Random Forest after balancing the dataset using Synthetic Minority Oversampling Technique (SMOTE)
   • Ensemble of five individual models, predicting the output by averaging the individual output probabilities
   • Xgboost algorithm

Note: Ensemble models with majority voting and stacked ensembles were also tried, but they did not give good results and hence are not included in the
project report.
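To make the two evaluation metrics concrete, here is a minimal Python sketch (the project itself used R; the function names and the toy labels and scores below are illustrative assumptions, not project data). F1 is computed as the harmonic mean of precision and recall, and AUC as the share of (churner, non-churner) pairs in which the churner receives the higher score.

```python
def f1_score(actual, predicted):
    """F1 = harmonic mean of precision and recall for the positive class (1 = churn)."""
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def auc(actual, scores):
    """AUC as the share of (churner, non-churner) pairs ranked correctly; ties count half."""
    pos = [s for a, s in zip(actual, scores) if a == 1]
    neg = [s for a, s in zip(actual, scores) if a == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 1 = churn, 0 = no churn
actual = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.2, 0.6, 0.7, 0.1, 0.3]
predicted = [1 if s >= 0.5 else 0 for s in scores]

f1_value = f1_score(actual, predicted)
auc_value = auc(actual, scores)
```

On imbalanced data, accuracy alone can look good for a model that rarely predicts churn, which is why a pair of metrics such as these is a reasonable choice.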
The key insights as provided by the logistic regression model, which incidentally provided the best metrics,
indicate that the variables that impact customer behaviour can be broadly classified as below:
• minutes of usage (both in terms of duration and number of calls),
• overage charges,
• network quality (for both voice and data),
• number of subscribers in the family,
• handset price and
• credit class
A Gains chart was prepared, which gave a cumulative lift of 110% in the first 3 deciles. The customers with the
highest probability of churn were identified. The company could use the customer details to proactively work with
them and retain them.
Process Used
1. Remove Variable
• Variables with greater than 30% missing values
2. Feature Selection
• Identify significant variables
3. Missing Value Imputation
• kNN used on dataset with significant variables
4. Outlier Treatment
• Winsor & variable transformation
5. Model Building
• Logistic Regression
• Linear Discriminant Analysis
• Random Forest
• Ensemble Models – Averaging the output from each individual model
• Ensemble Models – Majority Voting
• Stacked Ensemble Models
• Boosting – Xgboost
Significant Variables, Missing Value Imputation & Outlier
Treatment

The above variables were removed from the original dataset, and a Boruta model was then built for feature
selection. The model identified the significant variables, which were used to further subset the original dataset.
The significant variables identified are given below.
Correlation Plot highlighting the correlated numeric variables

Highly correlated variables that were considered for removal to reduce multicollinearity
Significant Variables, Missing Value Imputation & Outlier
Treatment

• After further subsetting the original dataset to the significant variables, 50 variables
remained, 18 of which still had missing values.
• The missing values were treated by using the kNN algorithm to predict them.
The VIM library was used. After that, some of the highly correlated variables were removed.
• Most of the numeric variables had outliers. The boxplots produced when the code is
run help visualize the variables with outliers.
• The outliers were treated by winsorizing the variables. The DescTools library was used. Some
variables were further log transformed. See the R code for more details.
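The outlier treatment described above can be sketched as follows. This is a minimal Python illustration, not the project's R code: the 5th and 95th percentile cut-offs, the use of log(1 + x) and the sample values are all assumptions, since the slides do not state the settings that were used.

```python
import math
import statistics


def winsorize(values, lower_pct=5, upper_pct=95):
    """Clip values to the given lower/upper percentiles (assumed cut-offs)."""
    # With n=100, statistics.quantiles returns the 1st..99th percentile cut points
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    lo, hi = cuts[lower_pct - 1], cuts[upper_pct - 1]
    return [min(max(v, lo), hi) for v in values]


def log_transform(values):
    """Apply log(1 + x), which stays defined at zero; assumes non-negative inputs."""
    return [math.log1p(v) for v in values]


# Hypothetical usage values with one extreme outlier
usage = [0, 12, 15, 18, 20, 22, 25, 30, 35, 900]
clipped = winsorize(usage)
transformed = log_transform(clipped)
```

Winsorizing keeps every record but pulls extreme values in to the chosen percentiles, so the outlier no longer dominates the scale of the variable.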
Data Visualization

Similar trend for customers who churn and who do not. Higher no. of customers with low usage and high mean usage.
For customers who churn, the trend is generally decreasing after a value of 30. For customers who don’t churn, the
trend is decreasing after close to 50.

Similar pattern in rev_Range for both customer behaviors.

Similar pattern but differences in densities noticeable for comparable values.
Data Visualization

Similar trend for both customer behaviors.

Similar pattern in rev_Range for both customer behaviors.

Similar pattern but differences in densities noticeable for comparable values (two panels).
Data Visualization

Distinct differentiation noticed in customer behavior across models and Months of usage.

Distinct differentiation achieved for both the variables after transformation to two classes based on churn percentage.
Similarly for Call Drops also, as seen below.
Model Comparison
Model Comparison – The previously built models and an Ensemble Model
Ensemble of five individual models; the final prediction was made by averaging the predictions from the models
Model Comparison with Ensemble Modelling
Five individual models were built and the final prediction was made by combining the predictions
from the models
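The averaging step of such an ensemble can be illustrated with a short Python sketch. The five probability lists are hypothetical stand-ins for the outputs of five fitted models; they are not results from the project.

```python
def average_ensemble(model_probabilities):
    """Average the churn probabilities of several models, customer by customer."""
    n_models = len(model_probabilities)
    return [sum(column) / n_models for column in zip(*model_probabilities)]


# Hypothetical churn probabilities from five models for the same three customers
model_probabilities = [
    [0.10, 0.80, 0.40],
    [0.20, 0.70, 0.50],
    [0.10, 0.90, 0.30],
    [0.30, 0.60, 0.40],
    [0.30, 0.50, 0.40],
]
ensemble = average_ensemble(model_probabilities)
```

Averaging probabilities keeps the strength of each model's opinion, whereas majority voting only counts the class each model predicts.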
Model Comparison

• Ten models were tried to find the best among them. The evaluation metric comparison is shown
above.
• LG_26 seems to give the best results, with the highest AUC, the second highest accuracy and
relatively good sensitivity and specificity.
• LG_26 is a logistic regression model with the classification cut-off (threshold) set at 26%.
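To show what a 26% cut-off means in practice, the Python sketch below classifies predicted probabilities at two thresholds and compares sensitivity and specificity. The labels and probabilities are hypothetical, and treating a probability equal to the cut-off as churn is an assumption.

```python
def classify(probabilities, cutoff=0.26):
    """Label a customer as churn (1) when the predicted probability reaches the cut-off."""
    return [1 if p >= cutoff else 0 for p in probabilities]


def sensitivity_specificity(actual, predicted):
    """Return (sensitivity, specificity) with churn (1) as the positive class."""
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical labels and predicted churn probabilities
actual = [1, 1, 0, 0, 0, 1, 0, 0]
probabilities = [0.60, 0.30, 0.28, 0.10, 0.20, 0.22, 0.55, 0.05]

at_26 = sensitivity_specificity(actual, classify(probabilities, 0.26))
at_50 = sensitivity_specificity(actual, classify(probabilities, 0.50))
```

In this toy example, lowering the cut-off from 50% to 26% catches more churners (higher sensitivity) at the cost of more false alarms (lower specificity), which is the usual trade-off when the churn class is the minority.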
Variables with more than 50% probability of changing the decision of the customer
for every 1 unit change in the respective independent variable

Top ten factors for customer churn are


1. crcscod (Credit Class Code)
2. hnd_price (Current handset price)
3. avgqty (Average monthly number of calls over the life of the customer)
4. adjmou (Billing adjusted total minutes of use over the life of the customer)
5. Uniqsubs (Number of unique subscribers in the household)
6. callwait_range (Range of number of call waiting calls)
7. Datovr_Range (Range of revenue of data overage)
8. Drop_blk_Mean (Mean number of dropped or blocked calls)
9. Drop_vce_Range (Range of number of dropped (failed) voice calls)
10. rev_range (Range of revenue (charge amount))
Recommend rate plan migration as a proactive retention
strategy? The analysis below suggests the answer is “Yes”.

• Mou_Mean (minutes of usage) is one of the highly significant variables, as was seen in the previous slide, and hence it makes
sense to work proactively with customers to increase their MOU so that they are retained for a longer
period.
• The boxplot below also suggests that customers who are retained seem to have higher MOU. Additionally, mouR_Factor,
a derived variable of mou_Range, is found to be highly significant. Changes in MOU are also highly significant:
Change_mf is a derived variable of change_mou.
• To complement the above, ovrmou_Mean is also a highly significant variable, with an odds ratio of more than 1.
The variable has a positive coefficient estimate, indicating that an increase in overage increases churn.

It would help if the company is able to work with the customers and, based on their usage, migrate them to optimal rate plans to
avoid overage charges.
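The link between a positive coefficient and an odds ratio above 1, as noted for ovrmou_Mean, follows from the fact that in logistic regression the odds ratio for a one-unit change is the exponential of the coefficient. The Python sketch below shows this; the coefficient values and the second variable name are hypothetical and are not taken from the fitted model.

```python
import math


def odds_ratio(coefficient):
    """Odds ratio for a one-unit increase in a logistic regression predictor."""
    return math.exp(coefficient)


# Hypothetical coefficients (not the project's estimates)
coefficients = {
    "ovrmou_Mean": 0.05,         # positive: the odds of churn rise with overage
    "example_protective": -0.05,  # made-up variable with a negative coefficient
}
ratios = {name: odds_ratio(beta) for name, beta in coefficients.items()}
```

A coefficient of zero gives an odds ratio of exactly 1 (no effect), a positive coefficient gives a ratio above 1, and a negative coefficient gives a ratio below 1.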
Use of model for prioritisation of customers for proactive retention
campaigns: Gains – Lift Chart

The lift achieved will help the company reach churn candidates while targeting far fewer customers from its total
customer pool. The highest gain can be achieved through the first 3 deciles, which capture about 33% of the customers who are
likely to terminate the services. This means the company selects 30% of the entire customer database and that covers 33% of
people who are likely to leave. This is better than calling customers at random in the absence of a model: a random 30% of the
customer base would, on average, contain about 30% of all potential churn candidates.
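The gains and lift figures can be reproduced with a calculation like the one sketched below in Python. The function names and the ten-customer toy data are illustrative assumptions; only the 33% and 30% used in the last line come from the slide.

```python
def cumulative_gains(actual, scores, n_bins=10):
    """Share of all churners captured after each bin, ranking customers by descending score."""
    ranked = [a for _, a in sorted(zip(scores, actual), key=lambda t: t[0], reverse=True)]
    total_churners = sum(ranked)
    n = len(ranked)
    gains = []
    for b in range(1, n_bins + 1):
        cutoff = round(n * b / n_bins)
        gains.append(sum(ranked[:cutoff]) / total_churners)
    return gains


def cumulative_lift(gains):
    """Lift per bin = share of churners captured / share of customers contacted."""
    n_bins = len(gains)
    return [g / ((i + 1) / n_bins) for i, g in enumerate(gains)]


# Toy data (hypothetical): 10 customers, 3 of whom churn
actual = [1, 0, 1, 0, 0, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
gains = cumulative_gains(actual, scores)
lift = cumulative_lift(gains)

# Figures quoted on the slide: 33% of churners found in the top 30% of customers
slide_lift = 0.33 / 0.30  # about 1.10, i.e. a cumulative lift of 110%
```

A lift of 1.0 corresponds to random selection, so the further the cumulative lift sits above 1.0 in the early deciles, the more the model helps with prioritisation.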
Identification of the customers with highest probability
of terminating services
The 20% of customers who need to be proactively worked with to prevent churn were identified.
They are the customers whose probability of churn is greater than 32.24% and less than 84.7%. The code
to get the customer details is given below.
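As a minimal illustration of this selection step (a Python sketch, not the project's R code), the snippet below keeps the customers whose predicted churn probability falls inside the band quoted above. The field names `customer_id` and `churn_probability` and the sample records are hypothetical.

```python
# Band quoted on the slide, expressed as probabilities
LOWER, UPPER = 0.3224, 0.847


def select_target_customers(scored_customers, lower=LOWER, upper=UPPER):
    """Keep customers inside the probability band, highest churn risk first."""
    selected = [c for c in scored_customers if lower < c["churn_probability"] < upper]
    return sorted(selected, key=lambda c: c["churn_probability"], reverse=True)


# Hypothetical scored customers
customers = [
    {"customer_id": "A1", "churn_probability": 0.12},
    {"customer_id": "B2", "churn_probability": 0.45},
    {"customer_id": "C3", "churn_probability": 0.80},
    {"customer_id": "D4", "churn_probability": 0.90},
]
targets = select_target_customers(customers)
target_ids = [c["customer_id"] for c in targets]
```

Sorting by descending probability gives the retention team a ready-made call order, starting with the customers most at risk.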
