PREDICTING CUSTOMER CHURN

Yilun Gu, Anna Klutho, Yinglu Liu,
Yuhuai Wang, Hao Yan

EXECUTIVE SUMMARY

PROBLEM
How can QWE predict customer churn and increase customer retention in the coming months?

INSIGHTS
In our analysis of the QWE customer data for November and December 2011, the three drivers that influence QWE customer churn the most are:
• Change in Customer Happiness Index (between November and December)
• Customer Age (expressed in months as a QWE customer)
• Recency of Logins (expressed through days since last login)

RECOMMENDATIONS
Given this information, QWE should implement data-driven strategies to address each of these factors and reduce churn in the coming months. Recommended strategies include:
• Customer satisfaction programs
• Incentives to increase login recency and frequency



ANALYSIS

DEFINING CHURN
Before investigating which drivers influence customer churn, it is important to first examine the overall churn rate within the
QWE customer base. This gives us a basic understanding of the retention problem QWE is facing.

Of the 6437 customers in the database, 323 left QWE (“churned”) between November and December 2011, which gives a
baseline churn rate of roughly 5%. In other words, about one in twenty QWE customers left the company in a single month.

IS CHURN CAUSED BY ONE DRIVER ALONE?


In this section, we evaluate customer churn through the lens of a single customer characteristic: can churn be explained by
one driver alone?


CUSTOMER AGE


It is natural to assume that the length of a customer relationship (“Customer Age”) would have a large impact on customer
retention. But can Customer Age predict churn on its own? Graph 1 shows the relationship between Customer Age and
Customer Churn (where 1 = Customer Churn and 0 = No Customer Churn).

The graph shows no apparent relationship between Customer Age and Customer Churn, which suggests that Customer Age
alone does not strongly affect whether a customer leaves QWE. However, while age is not strongly correlated with churn on
its own, it does help predict churn when considered together with other variables, as shown later in this paper.

Graph 1 – Customer Age vs. Customer Churn

LOOKING AT OTHER DRIVERS – CUSTOMER HAPPINESS INDEX (CHI)


Although Customer Age may not predict churn well on its own, other drivers might still be able to explain churn singly. The
question here is: which driver best explains customer churn on its own?

Using correlation and univariate logistic regression to screen all 11 provided customer characteristics, our team identified the
current Customer Happiness Index (CHI) score as the strongest single predictor of customer churn. CHI had the strongest
association with churn (a correlation of −0.084, which is still weak in absolute terms) and the highest significance as an
individual predictor of churn (p = 2.04e-11).
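The correlation part of this screening can be sketched as follows. With a 0/1 churn flag, the Pearson correlation between a characteristic and churn is the point-biserial correlation, and sorting characteristics by its absolute value gives the kind of ordering used to single out CHI. This is a minimal sketch assuming each characteristic is stored as a plain list aligned with the churn flags; the function names, feature names, and toy values are hypothetical, not QWE data, and the univariate logistic regression half of the screening is not shown.

```python
from math import sqrt


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation; with a 0/1 outcome y this is the point-biserial correlation."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / sqrt(s_xx * s_yy)


def rank_by_correlation(features: dict[str, list[float]], churn: list[int]) -> list[tuple[str, float]]:
    """Return (feature, r) pairs ordered by strength of association with churn, strongest first."""
    scores = [(name, pearson_r(values, churn)) for name, values in features.items()]
    return sorted(scores, key=lambda item: abs(item[1]), reverse=True)


# Hypothetical toy data: six customers, two characteristics, churn flag (1 = churned).
features = {
    "chi": [150, 120, 40, 30, 160, 20],
    "logins": [5, 3, 4, 2, 6, 5],
}
churn = [0, 0, 1, 1, 0, 1]
for name, r in rank_by_correlation(features, churn):
    print(f"{name}: r = {r:.3f}")
```

Ranking by absolute value matters because a strong negative association (as with CHI) is as informative as a strong positive one.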

Using this information, our team built a model to predict the probability of customer churn for 3 randomly selected customers
(Customers 672, 354, & 5203):

Table 1 – Probability of Churn for Customers 672, 354, & 5203

Customer   CHI Score   Probability of Churn*   Actually Churned?
672        148         3.3%                    No
354        139         3.5%                    No
5203       37          6.4%                    No

* $P(\text{Churn}) = \dfrac{1}{1 + e^{-(-2.46064 - 0.006153\,\text{CHI})}}$

These results are consistent with Wall’s theory that happiness is a major driver of customer churn: as the CHI score goes up,
the predicted probability of a customer leaving goes down.
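The footnote formula under Table 1 can be checked directly. The sketch below plugs the reported intercept (−2.46064) and CHI coefficient (−0.006153) into the logistic function and reproduces the three probabilities in Table 1; the function and constant names are illustrative only.

```python
from math import exp

# Coefficients of the univariate logistic model reported under Table 1.
INTERCEPT = -2.46064
CHI_COEF = -0.006153


def churn_probability(chi: float) -> float:
    """P(Churn) = 1 / (1 + e^-(INTERCEPT + CHI_COEF * CHI))."""
    z = INTERCEPT + CHI_COEF * chi  # linear predictor (log-odds of churn)
    return 1.0 / (1.0 + exp(-z))


for customer, chi in [(672, 148), (354, 139), (5203, 37)]:
    print(customer, f"{churn_probability(chi):.1%}")
# 672 3.3%
# 354 3.5%
# 5203 6.4%
```

Because the CHI coefficient is negative, each additional CHI point lowers the log-odds of churn by 0.006153, which is why the least happy of the three customers (CHI = 37) gets the highest probability.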

WHAT VARIABLES CAN BE USED TO PREDICT CHURN?


In this section, we look at methods that incorporate multiple drivers to predict churn. Although the current Customer
Happiness Index succeeded in predicting customer churn individually, it is unlikely that an outcome like churn is determined
by a single variable alone. Other methods can therefore show which combinations of drivers best predict churn and which of
these variables matter most in this relationship. The following sections provide two possible approaches to answer this
question.


MULTIPLE LOGISTIC REGRESSION – PREDICTING INDIVIDUAL POSSIBILITIES OF CHURN

Multiple Logistic Regression (MLR) is a statistical technique that incorporates multiple customer characteristics to estimate
the probability of customer churn. The results allow a churn probability to be calculated for each individual customer, which
can then be used to rank customers from the “riskiest” (most likely to churn) downward.
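To make the mechanics concrete, here is a minimal sketch of fitting a multiple logistic regression by maximum likelihood (Newton-Raphson) and ranking customers by predicted probability. The paper does not say which software the team used, so this from-scratch version is only an illustration; the function names and the toy data are hypothetical. With the team's model, each row would hold a customer's 5 selected characteristics.

```python
from math import exp


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + exp(-z))
    e = exp(z)
    return e / (1.0 + e)


def solve(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_logistic(rows: list[list[float]], y: list[int], iters: int = 25) -> list[float]:
    """Maximum-likelihood coefficients [intercept, b1, ..., bk] via Newton-Raphson."""
    X = [[1.0] + list(row) for row in rows]  # prepend an intercept column
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(iters):
        p = [sigmoid(sum(b * v for b, v in zip(beta, xi))) for xi in X]
        # Gradient of the log-likelihood: X^T (y - p)
        grad = [sum((yi - pi) * xi[j] for xi, yi, pi in zip(X, y, p)) for j in range(k)]
        # Negative Hessian (information matrix): X^T W X with W = diag(p * (1 - p))
        info = [[sum(pi * (1 - pi) * xi[i] * xi[j] for xi, pi in zip(X, p))
                 for j in range(k)] for i in range(k)]
        step = solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
    return beta


def predict(beta: list[float], row: list[float]) -> float:
    """Predicted probability of churn for one customer."""
    return sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], row)))


def rank_customers(beta: list[float], customers: dict[int, list[float]]) -> list[tuple[int, float]]:
    """Customers sorted from riskiest (highest P(churn)) to least risky."""
    scored = [(cid, predict(beta, row)) for cid, row in customers.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical toy data: one characteristic per customer, churn flag (1 = churned).
toy_rows = [[float(v)] for v in range(8)]
toy_churned = [0, 0, 1, 0, 1, 0, 1, 1]
toy_beta = fit_logistic(toy_rows, toy_churned)
print(rank_customers(toy_beta, {101: [0.0], 202: [7.0]}))
```

At convergence the fitted probabilities satisfy the score equations X^T (y − p) = 0, which is a convenient way to check any logistic fit.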

For this approach, our team chose the following customer characteristics to include in the model:
• Change in Customer Happiness Index Score (between November and December)
• Customer Age (expressed in months as a QWE customer)
• Recency of Logins (expressed through days since last login)
• Current Customer Happiness Index Score
• Change in Number of Blog Views
These variables were selected because they had the most statistically significant coefficients (the lowest p-values) in a model
that included all 11 possible customer characteristics (see Exhibit A in the Appendix for more details). In other words, they
showed the strongest evidence of an association with churn once the other characteristics were taken into account. By
re-running the model with these 5 characteristics, we can predict the probability of customer churn for the same randomly
selected customers (Customers 672, 354, & 5203):

Table 2 – Probability of Churn for Customers 672, 354, & 5203 (MLR)

Customer                 672     354     5203
Probability of Churn*    3.4%    3.3%    5.3%
*Please see Exhibit B in the Appendix for the MLR model.

As mentioned earlier, the advantage of this approach is that we are able to get a list of individual customers and their
individual probabilities, allowing QWE management to specifically target the needs of these customers.

DECISION TREES - SEGMENTING CUSTOMERS BY CUSTOMER CHURN




These results change, however, when a different method is used. The Decision Tree method is a predictive model that
segments customers based on a set of decision rules.

Because the method’s output is simple and graphical, it is easy to interpret and to use as a guide for decisions.

In the case of the QWE customers, we entered all 11 variables into the decision tree model; the statistical software package
then chose the factors that contribute most to churn and calculated the probability of churn for each resulting segment.
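The paper does not describe how the software picks its splits. A common approach, used by CART-style trees, is to try candidate thresholds on each variable and keep the one that most reduces Gini impurity. The sketch below shows that search for a single variable; it places thresholds at midpoints between adjacent observed values, which is one common convention and yields cut-offs such as 17.5 between 17 and 18. The function names and the toy days-since-login values are hypothetical.

```python
def gini(labels: list[int]) -> float:
    """Gini impurity of a group of 0/1 churn labels: 1 - p^2 - (1 - p)^2 = 2p(1 - p)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_split(values: list[float], labels: list[int]) -> tuple[float, float]:
    """Best threshold for the rule 'value < threshold goes left' and its weighted impurity."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_threshold, best_impurity = float("nan"), float("inf")
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no threshold can separate identical values
        threshold = (lo + hi) / 2  # midpoint between adjacent observed values
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
        if impurity < best_impurity:
            best_threshold, best_impurity = threshold, impurity
    return best_threshold, best_impurity


# Hypothetical toy data: days since last login and churn flag (1 = churned).
days = [1, 3, 5, 10, 17, 18, 25, 40]
churned = [0, 0, 0, 0, 0, 1, 1, 1]
print(best_split(days, churned))  # (17.5, 0.0)
```

A full tree repeats this search over every variable, keeps the variable with the lowest impurity, and then recurses into each side until a stopping rule is met.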

Reading the tree output (Graph 2), we see that four variables have an impact on churn rates:
• Recency of Logins (expressed through days since last login)
• Frequency of Logins (expressed through number of logins between November and December)
• Customer Age (expressed in months as a QWE customer)
• Number of Blog Views (between November and December)

Graph 2 – Decision Tree Output for QWE


From these factors, the tree establishes a set of rules that frame the likelihood of customer churn. At each split, a customer
who meets the stated criterion follows the left branch; otherwise, the customer follows the right branch. The final (leaf) node
gives that customer’s probability of churn. Following this model, we can determine the probability of churn for each of the
selected customers (Customers 672, 354, and 5203), as summarized in the table below.

Table 3 – Probability of Churn for Customers 672, 354, & 5203 (Decision Tree)
Customer 672 354 5203
Probability of Churn* 3.9% 3.9% 3.9%

Because all three customers have Days since Last Login < 17.5, following the decision tree places them in the same segment,
so all three have a 3.9% chance of customer churn.
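Following the pathways of the tree is a simple loop: at each split, go left if the customer meets the criterion, otherwise go right, until a leaf is reached. The sketch below encodes only the rule stated in the text, assuming (as the text implies) that Days since Last Login < 17.5 leads to the 3.9% segment. The rest of Graph 2 is not reproduced here, so the right-hand branch is a placeholder leaf with no probability. Class, field, and feature names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class Leaf:
    p_churn: Optional[float]  # None = probability not reproduced in this text


@dataclass
class Split:
    feature: str
    threshold: float
    left: "Node"   # branch taken when customer[feature] < threshold
    right: "Node"  # branch taken otherwise


Node = Union[Leaf, Split]


def predict(node: Node, customer: dict[str, float]) -> Optional[float]:
    """Walk from the root to a leaf and return that leaf's churn probability."""
    while isinstance(node, Split):
        node = node.left if customer[node.feature] < node.threshold else node.right
    return node.p_churn


# Only the rule stated in the text; the right-hand subtree of Graph 2 is a placeholder.
tree = Split("days_since_last_login", 17.5, left=Leaf(0.039), right=Leaf(None))
print(predict(tree, {"days_since_last_login": 5.0}))  # 0.039
```

Every customer who reaches the same leaf receives the same probability, which is the source of the identical 3.9% values in Table 3.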

COMPARING METHODS

Comparing the results of the Decision Tree method with those of the Multiple Logistic Regression, we see that the final churn
probabilities predicted for each customer differ (see table below).

Table 4 – Comparing Results for Customers 672, 354, & 5203

Customer   Decision Tree   Multiple Logistic Regression   Actually Churned?
672        3.9%            3.4%                           No
354        3.9%            3.3%                           No
5203       3.9%            5.3%                           No

This difference occurs for two reasons:
• Different variables determine the chance of churn
o The decision tree method uses four variables selected by the software to determine probability, while MLR uses
the five variables selected by our team.
• Different ways of calculating probability
o The decision tree model predicts according to the rules it has established, starting with the single factor at the
top of the tree, and assigns a single probability to every customer in a group.
o MLR calculates probability individually from each customer’s own characteristics; since the customers differ across
the selected variables, their probabilities differ as well.

WHICH METHOD TO USE?


While both methods have advantages and disadvantages, our team recommends using the Multiple Logistic Regression
method to determine which customers are most likely to churn. Why?
• The decision tree is less precise for individual customers. It assigns the same probability to everyone in a segment, so it
can identify the segment most likely to leave but cannot rank customers within that segment.
• With MLR, we are able to get a list of individual customers and their individual probabilities, allowing QWE
management to specifically target the needs of these customers.

ACCURACY

It is important to evaluate the accuracy of our recommended method as well. Accuracy is the percentage of customers whose
predicted outcome (churn or no churn) matches what actually happened, so a higher accuracy means the model captures the
current situation more faithfully. In addition, accuracy is easily understood and communicated across a business.

Our team chose a threshold of 12% (i.e., we predict a customer will churn if P(Churn) ≥ 12%). At this level, the MLR model
has a 93.4% accuracy rate. This figure should be read with care: because only about 5% of customers churned, a rule that
predicts no churn for anyone would already be correct for about 95% of customers. A threshold as low as 12% gives up some
overall accuracy in exchange for actually flagging customers who may churn.
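Accuracy at a given threshold is straightforward to compute, and comparing it with the trivial rule that predicts no churn for anyone (any threshold above every predicted probability) is a useful sanity check when churners are rare, as they are here at about 5%. This is a minimal sketch; the function names and toy probabilities are hypothetical, not the team's model output.

```python
def accuracy(probs: list[float], churned: list[int], threshold: float) -> float:
    """Share of customers whose prediction (churn if p >= threshold) matches the outcome."""
    hits = sum((p >= threshold) == bool(y) for p, y in zip(probs, churned))
    return hits / len(churned)


def best_threshold(probs: list[float], churned: list[int], candidates: list[float]) -> float:
    """Candidate threshold with the highest accuracy (the first one wins ties)."""
    return max(candidates, key=lambda t: accuracy(probs, churned, t))


# Hypothetical toy model output and outcomes (1 = churned).
probs = [0.05, 0.10, 0.15, 0.40, 0.02, 0.13]
churned = [0, 0, 1, 1, 0, 0]

print(accuracy(probs, churned, 0.12))
# A threshold above every probability flags nobody: accuracy = 1 - churn rate.
print(accuracy(probs, churned, 1.01))
print(best_threshold(probs, churned, [0.12, 0.14, 1.01]))
```

On real data with a 5% churn rate, the "flag nobody" line would print roughly 0.95, which is the benchmark any chosen threshold should be compared against.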

WHO IS MOST LIKELY TO CHURN?


Given the results from our analysis, the following customers have the highest probability of churn:

Table 5 – Top 10 Customers Most Likely to Churn in the Coming Months


Customer   Probability of Churn   Actually Churned?
2287       0.3999                 No
357        0.3769                 Yes
929        0.2418                 No
1          0.2218                 No
2025       0.1981                 No
14         0.1972                 No
18         0.1925                 No
3          0.1925                 No
21         0.1902                 No
1672       0.1825                 Yes

Although these 10 customers have the highest probabilities of churn in the customer data set, only two of them actually
churned in the last month (November to December). This points to limited precision at the top of the ranking, but the
weakness is offset by what the model provides: the ability to rank each customer individually by probability of churn. Our
team argues that the model instead captures the potential for churn in the coming months, and that the remaining 8
customers should be watched and managed carefully over the next month to ensure they do not leave QWE.
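The ranking in Table 5, and the observation that only two of the ten highest-risk customers churned, amount to sorting customers by predicted probability and measuring the share of actual churners among the top k (often called precision at k). The sketch below uses the ten rows of Table 5; the helper names are hypothetical.

```python
# Rows of Table 5: (customer number, probability of churn, actually churned?).
TABLE_5 = [
    (2287, 0.3999, False), (357, 0.3769, True), (929, 0.2418, False),
    (1, 0.2218, False), (2025, 0.1981, False), (14, 0.1972, False),
    (18, 0.1925, False), (3, 0.1925, False), (21, 0.1902, False),
    (1672, 0.1825, True),
]


def top_k(scored: list[tuple[int, float, bool]], k: int) -> list[tuple[int, float, bool]]:
    """The k customers with the highest predicted probability of churn."""
    return sorted(scored, key=lambda row: row[1], reverse=True)[:k]


def precision_at_k(scored: list[tuple[int, float, bool]], k: int) -> float:
    """Share of the top-k customers who actually churned."""
    top = top_k(scored, k)
    return sum(churned for _, _, churned in top) / len(top)


print([cid for cid, _, _ in top_k(TABLE_5, 3)])  # [2287, 357, 929]
print(precision_at_k(TABLE_5, 10))  # 0.2
```

Python's sort is stable, so tied customers (18 and 3, both 0.1925) keep their original order.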

RECOMMENDATION

In the end, we see that the following three drivers* have the highest impact on predicting customer churn:
• Change in Customer Happiness Index (between November and December)
• Customer Age (expressed in months as a QWE customer)
• Recency of Logins (expressed through days since last login)

*While the MLR model included 5 drivers in its calculations, these three characteristics had the most significant coefficients in
the model.

Intuitively, these relationships make sense: a change in happiness level, the length of the customer relationship, and the
activity of the customer (as expressed through login recency) could all plausibly have a significant impact on customer churn,
and our model is consistent with this.

Therefore, it is recommended that QWE management take action to build strategies to address these three drivers in their
operations. Such strategies include:
• Customer Satisfaction Programs – When a customer’s Customer Happiness Index score drops sharply, personalized
outreach that addresses the customer’s problems would be beneficial.
• Incentives to Increase Login Recency and Frequency – One possible incentive is the reduction of QWE subscription
price based on the number and frequency of logins in a month.

By implementing these strategies, QWE may be able to reduce churn in the future.

APPENDIX

EXHIBIT A – RESULTS FROM MULTIPLE LOGISTIC REGRESSION WITH ALL 11 VARIABLES

Variable                 Estimate     Std. Error   z value    p-value               Significance
(Intercept)              -2.76e+00    1.069e-01    -25.841    <0.0000000000000002   ***
CHI                      -4.657e-03   1.223e-03    -3.808     0.00014               ***
Age                       1.271e-02   5.370e-03     2.366     0.01799               *
Change in CHI            -1.027e-02   2.474e-03    -4.153     0.0000329             ***
Cases                    -1.524e-01   1.049e-01    -1.452     0.14643
Change in Cases           1.703e-01   9.050e-02     1.881     0.05992               .
SP                        1.593e-02   1.022e-01     0.156     0.87611
Change in SP             -5.194e-02   7.852e-02    -0.661     0.50830
Logins                    2.893e-04   2.092e-03     0.138     0.89002
Blogs                     2.905e-04   1.960e-02     0.015     0.98817
Views                    -1.098e-04   4.071e-05    -2.697     0.00700               **
Days since Last Login     1.724e-02   4.289e-03     4.020     0.0000581             ***

Significance codes: *** p < 0.001, ** p < 0.01, * p < 0.05, . p < 0.1
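The z values and p-values in Exhibit A follow from the estimates and standard errors: z is the estimate divided by its standard error, and the two-sided p-value comes from the standard normal distribution. The sketch below recomputes a few rows as a consistency check; the function name is illustrative, and small differences in the last digit come from rounding in the reported estimates.

```python
from math import erfc, sqrt


def wald_test(estimate: float, std_error: float) -> tuple[float, float]:
    """Wald z statistic and two-sided p-value, 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))."""
    z = estimate / std_error
    p_value = erfc(abs(z) / sqrt(2))
    return z, p_value


# (variable, estimate, standard error) copied from Exhibit A.
exhibit_rows = [
    ("CHI", -4.657e-03, 1.223e-03),
    ("Age", 1.271e-02, 5.370e-03),
    ("Days since Last Login", 1.724e-02, 4.289e-03),
]
for name, est, se in exhibit_rows:
    z, p = wald_test(est, se)
    print(f"{name}: z = {z:.3f}, p = {p:.5f}")
```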


EXHIBIT B – MLR PREDICTION MODEL
