
Customer Churn at QWE – Case Study Questions and Analysis Steps



1. Is Wall’s belief about the dependence of churn rates on customer age supported by the data?
To get some intuition, try visualizing this dependence (Hint: no need to run any statistical tests).

To answer the first question, you can build a histogram in SPSS. The cases of interest are the
customers who churned, so restrict the histogram to those cases. You can do this with the
“Select Cases” function in SPSS.

Data → Select Cases → choose “If condition is satisfied” → click the If button → move the variable
Churn1Yes0No into the expression box at the upper right → enter “= 1” after the variable name →
click Continue → OK.

You will notice that the cases that do not satisfy the condition are slashed out.

Next, build the histogram of the number of customers who churned, by age:

Graphs → Legacy Dialogs → Histogram → put the variable Customerageinmonths in the Variable
box at the top, and put Churn1Yes0No in the Rows box in the “Panel by” section → OK.
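
The selection and histogram steps above can also be sketched outside SPSS. This is a minimal Python sketch, not part of the original analysis: it assumes the data has been exported as rows keyed by the SPSS variable names, and the five rows shown are made-up placeholders rather than QWE data.

```python
from collections import Counter

# Hypothetical placeholder rows (NOT the QWE data); keys mirror the SPSS variable names.
rows = [
    {"Customerageinmonths": 4, "Churn1Yes0No": 0},
    {"Customerageinmonths": 12, "Churn1Yes0No": 1},
    {"Customerageinmonths": 12, "Churn1Yes0No": 1},
    {"Customerageinmonths": 13, "Churn1Yes0No": 1},
    {"Customerageinmonths": 30, "Churn1Yes0No": 0},
]

def churn_counts_by_age(rows):
    """Count churned customers (Churn1Yes0No == 1) at each customer age in months."""
    churned = (r for r in rows if r["Churn1Yes0No"] == 1)  # same filter as Select Cases
    return Counter(r["Customerageinmonths"] for r in churned)

counts = churn_counts_by_age(rows)
for age in sorted(counts):
    print(f"{age:>3} mo | {'#' * counts[age]}")  # crude text histogram
```

Raw counts like these reflect both the churn rate and how many customers exist at each age; dividing by the total number of customers at each age would give a rate instead.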

Based on the histogram, is churn related to customer age?
Churn appears to be related to customer age. There seems to be a peak at 12 months, with
less churn among younger and older customers and more in the middle range.











2. Run a regression model that best predicts the probability of a customer leaving using all
independent variables.

Before you do this, don’t forget to cancel the selected cases because we are going to use the
whole data set.

Data → Select Cases → select “All cases” → OK


Now you will notice that the slashes are gone, and you can run the logit regression.
Analyze → Regression → Binary Logistic… → put “Churn1Yes0No” in the Dependent box →
put “CustomerAgeinMonths”, “CHIScoreMonth0”, “CHIScore01”, “SupportCasesMonth0”,
“SupportCases01”, and “Views01” into the Covariates box in the Block section → OK.
(As a reminder, none of the independent variables in this case are categorical. If some were,
you would need to define them by clicking the Categorical button in the upper right.)
While setting up the binary logistic regression, click Save and check “Probabilities”.

a. What is the probability of churn based on customer age according to the resulting output?

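The results suggest that the odds of churn increase slightly with customer age (Exp(B) = 1.018):
each additional month of age multiplies the odds of churn by about 1.018, i.e. roughly 1.8%
higher odds per month, holding the other predictors constant.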
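
To make the reading of Exp(B) concrete, here is a short Python sketch. It only restates the reported value of 1.018 as an odds ratio; the 12-month horizon is an arbitrary choice for illustration, not something taken from the case.

```python
import math

exp_b_age = 1.018  # Exp(B) for customer age, as reported in the SPSS output

# Exp(B) is an odds ratio: each extra month multiplies the odds of churn by this factor.
pct_change_per_month = (exp_b_age - 1) * 100

# Over several months the multipliers compound (12 months chosen only as an example).
odds_multiplier_12_months = exp_b_age ** 12

# The underlying coefficient B is the natural log of Exp(B).
b_age = math.log(exp_b_age)

print(f"{pct_change_per_month:.1f}% higher odds per month")
print(f"x{odds_multiplier_12_months:.2f} odds over 12 months")
```

Note that this describes a change in odds, not in probability; the change in probability depends on the customer's other predictor values.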





























b. What are the ‘customer age’ and the ‘predicted probability’ that customers 354, 672, and
5,203 will leave?

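Based on the regression equation and the data (here “Churn” denotes the log-odds of churn,
i.e. the value of the regression equation before it is converted to a probability):

• For customer 672 (see below):

Churn = -2.672+(0.017*16)-(0.006*148)-(0.009*1)+(0.13*0)+(0.0001*85)-(0.139*0)
Churn = -3.2885
Probability = EXP[Churn] / (1 + EXP[Churn])
Probability = 0.036, or 3.6%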

If you checked the option to save probabilities when running the binary logistic regression, a
variable named PRE_1 is added to the data set. The probability in SPSS is 3.672%. (The hand
calculation uses rounded coefficients, so it differs slightly from the SPSS value.)

• For Customer 354 (see below)



Churn = -2.672+(0.017*13)-(0.006*139)-(0.009*-29)+(0.13*0)+(0*244)-(0.139*0)
Churn = -3.024
Probability = EXP[Churn] / (1 + EXP[Churn])
Probability = 0.046, or 4.6%
The probability in SPSS is 4.624%


• For Customer 5203:

Churn = -2.672+(0.017*4)-(0.006*37)-(0.009*32)+(0.13*1)+(0*1)-(0.139*1)
Churn = -3.123
Probability = EXP[Churn] / (1 + EXP[Churn])
Probability = 0.042, or 4.2%
The probability in SPSS is 4.307%
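
The three hand calculations can be checked with a few lines of Python. This is a sketch that copies the terms exactly as written in the calculations for customers 672, 354, and 5203 (rounded coefficients included); the function name `churn_probability` is made up for illustration.

```python
import math

def churn_probability(terms):
    """Sum the intercept and coefficient*value terms (the log-odds), then apply the logistic function."""
    log_odds = sum(terms)
    return log_odds, math.exp(log_odds) / (1 + math.exp(log_odds))

# Terms copied from the hand calculations (intercept first).
customers = {
    672:  [-2.672, 0.017 * 16, -0.006 * 148, -0.009 * 1,   0.13 * 0, 0.0001 * 85, -0.139 * 0],
    354:  [-2.672, 0.017 * 13, -0.006 * 139, -0.009 * -29, 0.13 * 0, 0 * 244,     -0.139 * 0],
    5203: [-2.672, 0.017 * 4,  -0.006 * 37,  -0.009 * 32,  0.13 * 1, 0 * 1,       -0.139 * 1],
}

results = {cid: churn_probability(terms) for cid, terms in customers.items()}
for cid, (log_odds, p) in results.items():
    print(f"customer {cid}: log-odds = {log_odds:.4f}, probability = {p:.3f}")
```

Because the coefficients are rounded, these values match the hand calculations rather than the PRE_1 values saved by SPSS.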


3. Now run logistic regression by splitting the data into three slices as follows: Customer age 0-6
months, 7 to 13 months, and 14+ months. Run separate logistic regressions for each slice.

a. What is the probability of churn based on customer age according to the resulting output
of the three groups?

To run the logit regression on the customers of age 0-6 months:


Data → Select Cases → choose “If condition is satisfied” → click the If button →
move the variable “CustomerAgeinmonths” into the expression box at the upper right → enter
“<= 6” after the variable name → click Continue → OK.

Next,

Analyze → Regression → Binary Logistic… → put “Churn1Yes0No” in the Dependent box →
put “CustomerAgeinMonths”, “CHIScoreMonth0”, “CHIScore01”, “SupportCasesMonth0”,
“SupportCases01”, and “Views01” into the Covariates box in the Block section → OK.

To run the logit regression on the customers of the age 7 to 13 months:


First, cancel the selected cases:


Data → Select Cases → select “All cases” → OK.


Then

Data → Select Cases → choose “If condition is satisfied” → click the If button →
enter “RANGE(CustomerAgeinmonths,7,13)” in the expression box at the upper right →
click Continue → OK.

Next

Analyze → Regression → Binary Logistic… → put “Churn1Yes0No” in the Dependent box →
put “CustomerAgeinMonths”, “CHIScoreMonth0”, “CHIScore01”, “SupportCasesMonth0”,
“SupportCases01”, and “Views01” into the Covariates box in the Block section → OK.

For customers over the age of 13 months:


First, cancel the selected cases:


Data → Select Cases → select “All cases” → OK.


Then,

Data → Select Cases → choose “If condition is satisfied” → click the If button →
move the variable “CustomerAgeinmonths” into the expression box at the upper right → enter
“>= 14” after the variable name → click Continue → OK.

Next

Analyze → Regression → Binary Logistic… → put “Churn1Yes0No” in the Dependent box →
put “CustomerAgeinMonths”, “CHIScoreMonth0”, “CHIScore01”, “SupportCasesMonth0”,
“SupportCases01”, and “Views01” into the Covariates box in the Block section → OK.
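
The three selection conditions used for the slices (“<= 6”, a range of 7 to 13, and “>= 14”) can be summarized as a single mapping. The sketch below is illustrative only: the function name `age_slice` is hypothetical, and it assumes customer age is recorded in whole months.

```python
def age_slice(age_in_months):
    """Map a customer age in whole months to one of the three slices used in this question."""
    if age_in_months <= 6:
        return "0-6"
    if age_in_months <= 13:  # matches the 7-to-13 range once ages <= 6 are excluded
        return "7-13"
    return "14+"

# Ages of the three customers discussed in question 2b.
for customer_id, age in [(5203, 4), (354, 13), (672, 16)]:
    print(customer_id, age_slice(age))
```

Each customer falls into exactly one slice, so the three regressions together use every case once.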



0-6 Month Group:







































7-13 Month Group:








































14+ Month Group:










































b. Do you think separating the data into three slices improves predictive power?

I do. Fitting separate models by age group reveals a difference between the oldest customers
(14+ months) and the other two groups. Exp(B) for customer age is < 1 for the oldest group
(that is, B is negative), which means that the odds of churn decrease with age for this
group, while the reverse is true for the other two groups. This is a notable difference from
the insight gained from the overall model.

The probabilities:

• for customer 672 (16 mo):

Probability before segmentation = 0.036, 3.6%

Probability after segmentation = 0.038, 3.8%

• For Customer 354 (13 mo)



Probability before segmentation = 0.046, 4.6%
Probability after segmentation = 0.094, 9.4%

• For Customer 5203 (4 mo)



Probability before segmentation = 0.042, 4.2%
Probability after segmentation = 0.015, 1.5%
