MGMT 59000 - Customer Analytics

Group Assignment 1:
Pilgrim Bank A and B
Report
February 2022
Team 07
Aaron Chen | chen4065@purdue.edu

Chetan Solanki | csolanki@purdue.edu
Erika Marietta | emariett@purdue.edu
Lucero Izquierdo | lizquier@purdue.edu
Shubham Goyanka | sgoyanka@purdue.edu
Umar Khan | khan353@purdue.edu
1. Based on the sample of customer data for 1999, what can Green conclude about average
customer profitability for Pilgrim Bank's entire customer population?
• The average customer profitability in the 1999 sample is $111.50. We calculated the 95%
confidence interval for this sample mean, which runs from $108.50 to $114.50. Therefore,
Green can conclude, with 95% confidence, that the average customer profitability for
Pilgrim Bank’s entire customer population lies between $108.50 and $114.50.
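The interval calculation can be sketched as follows. This is a minimal illustration, not the team's actual script: the function name and the profit values are hypothetical, and it assumes a normal critical value, which is reasonable for a sample as large as the bank's but only demonstrates the mechanics on the toy data.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(values, level=0.95):
    """Return (mean, lower, upper) for a large-sample confidence interval."""
    n = len(values)
    m = mean(values)
    se = stdev(values) / sqrt(n)                 # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + level / 2)    # normal critical value
    return m, m - z * se, m + z * se


# Hypothetical profits, for illustration only (not the Pilgrim Bank data).
sample_profits = [-40, -12, 5, 30, 75, 110, 160, 240, 410, 620]
m, lo, hi = mean_confidence_interval(sample_profits)
print(f"mean={m:.2f}, 95% CI=({lo:.2f}, {hi:.2f})")
```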
Results:
2. What is the difference in average profitability between online and offline customers in the
sample? (Hint: What is the magnitude of the difference? Is the difference statistically
significantly different from zero?)
• The average profitability for online and offline customers in the sample is $116.67 and $110.79,
respectively. Hence, the magnitude of the difference between online and offline customers
is $5.88. However, the magnitude alone does not tell us whether this difference is statistically
significant, since the two group averages are close and a gap of this size could arise from
sampling variation.

For that reason, we used a two-sample Welch t-test to test the significance of the difference
between the mean profitability of online and offline customers:
• H0: online customers generate the same profit on average as offline customers (μ1 = μ2)
• H1: online customers generate a different profit on average than offline customers (μ1 ≠ μ2)
The p-value for the test came out to be 0.2254. Therefore, at the 5% level of significance,
the difference is not significantly different from 0. We cannot reject the null hypothesis (H0),
and we cannot conclude that online customers generate a significantly different profit on
average than offline customers. The 95% confidence interval for the difference confirms this,
since it includes 0.
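A minimal sketch of the Welch test is shown here, using only the Python standard library. The function name and the two profit lists are hypothetical. One assumption to flag: the two-sided p-value uses a normal approximation to the t distribution, which is only reasonable when both groups are large; a statistics package would use the exact t distribution with the Welch-Satterthwaite degrees of freedom.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def welch_test(a, b):
    """Welch's t statistic, degrees of freedom, and an approximate two-sided p-value."""
    va = variance(a) / len(a)   # squared standard error of group a's mean
    vb = variance(b) / len(b)   # squared standard error of group b's mean
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    # Welch-Satterthwaite degrees of freedom
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    # Assumption: normal approximation to the t distribution (large samples only).
    p = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, df, p


# Hypothetical profits, for illustration only (not the Pilgrim Bank data).
online = [120, -30, 45, 300, 10, 85]
offline = [90, -50, 20, 260, 0, 75, 140, -10]
t, df, p = welch_test(online, offline)
print(f"t={t:.3f}, df={df:.1f}, p={p:.3f}")
```

If the p-value is above 0.05, the null hypothesis of equal means is not rejected at the 5% level, which is the decision rule applied in the answer above.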
Results:
3. What role do customer demographics play in analyzing customer profitability for online
and offline customers? (Hint: What type of model can help you answer this question? How
to use the continuous and categorical variables? How to deal with the missing values?)
• We performed a heterogeneity analysis of customer profitability for online and offline
customers based on customer demographics, namely age, income bracket, tenure, and district
of residence.
• This type of question can be answered with two separate linear regression models, one fitted
to the profitability of online customers and the other to that of offline customers.
• The demographic predictor variables included in these models are X9Age (categorical), X9Inc
(categorical), X9Tenure (continuous) and X9District (categorical). The continuous variable
enters the regression equation directly, while each categorical variable is converted into
dummies, such that a variable with n classes yields n-1 dummies. The intercept absorbs the
effect of the default class of each categorical variable (Age, Income and District), whereas
the beta coefficient for a non-default class denotes the effect of moving from the default
class to that class.
• We used the mode to impute missing values, for two reasons: first, the data is unevenly split
between online (3854) and offline (27780) customers, so we expect that imputing the modal
class of a categorical variable will not bias the data; second, the mode can be used for both
continuous and categorical variables.
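The two preparation steps described above, mode imputation and n-1 dummy coding, can be sketched as follows. The helper names, the `level_` column prefix, and the income codes are hypothetical, and the choice of the lowest class as the default (baseline) is an assumption made for illustration.

```python
from statistics import mode


def impute_mode(values):
    """Replace None entries with the most frequent observed value."""
    fill = mode(v for v in values if v is not None)
    return [fill if v is None else v for v in values]


def dummy_encode(values, baseline=None):
    """Encode n classes as n-1 indicator columns; the baseline class is all zeros."""
    classes = sorted(set(values))
    if baseline is None:
        baseline = classes[0]   # assumption: lowest class is the default
    return {
        f"level_{c}": [1 if v == c else 0 for v in values]
        for c in classes
        if c != baseline
    }


# Hypothetical income-bracket codes with gaps (not the actual data).
income = [3, None, 6, 6, 1, None, 6, 3]
filled = impute_mode(income)      # gaps take the modal bracket, 6
dummies = dummy_encode(filled)    # three classes give two dummy columns
print(filled)
print(dummies)
```

Because the baseline class has no column of its own, its effect is carried by the regression intercept, which matches the interpretation of the coefficients given above.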
• Results of online customers:

• The above results can be interpreted as follows:
➢ The model has an Adjusted R-squared value of just over 8%, meaning that the demographics
explain little of the variation in customer profitability.
➢ However, a few predictor variables have a small but statistically significant impact on
profitability. These are X9Age (all), X9Inc7, X9Inc8, X9Inc9 and X9Tenure.
• Results of offline customers:

• The above results can be interpreted as follows:
➢ The model has an Adjusted R-squared value of approximately 6.42%, meaning that the
demographics explain little of the variation in customer profitability.
➢ However, a few predictor variables have a small but statistically significant impact on
profitability. These are X9Age (all), X9Inc5, X9Inc6, X9Inc7, X9Inc8, X9Inc9, X9Tenure and
X9District1200.
➢ X9Inc5, X9Inc6 and X9District1200 are significant for the profitability of offline customers
but not for that of online customers. Hence, it would be advisable to study these classes of
the categorical variables specifically.
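For reference, the Adjusted R-squared values quoted in this answer follow the standard adjustment of R-squared for the number of predictors. The sketch below uses hypothetical inputs, since the report does not state the raw R-squared or the number of dummy columns.

```python
def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """Penalize R-squared for the number of predictors (intercept not counted)."""
    return 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)


# Hypothetical values, for illustration only.
example = adjusted_r_squared(r_squared=0.10, n_obs=1000, n_predictors=20)
print(f"{example:.4f}")
```

With thousands of customers and a few dozen dummy columns the penalty is small, so a low Adjusted R-squared here reflects weak explanatory power rather than over-fitting.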
4. How do retail banks make money from their customers? How much variation is there in
profit across customers? Based on this, what do you recommend the bank do in terms of
matching service levels to customer profit levels?
• Retail banks make money from their customers through three broad sources:
➢ Investment income from deposit balances represented by the net interest margin, the difference
between the rate a bank paid on a deposit account and the rate at which it was able to invest
the deposit through commercial or mortgage lending
➢ Fee income from checking accounts, past due payments, and overdrafts
➢ Loan interest and base lending rates
• The following histogram charts profitability for all customers. We can infer from the graph that
approximately 50% of the customers are unprofitable for Pilgrim Bank.
• The following histograms show profitability for online and offline customers separately.
In both groups, the modal bin, which holds the largest number of customers, is slightly
unprofitable (between -$50 and $0). Hence, it would be advisable to first convert these
customers into profitable ones.

• Since the demographic predictor variables explain very little of the variation in customer
profitability, we recommend that Pilgrim Bank conduct more background research into the
behavioural characteristics of its customers. Also, the average profitability of online and
offline customers is not significantly different, so monetizing online bank operations
through fees or rebates would not make sense at this point.
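The two readings taken from the histograms in this answer, the share of unprofitable customers and the modal profit bin, can be checked numerically along these lines. The profit values are hypothetical, the $50 bin width mirrors the range quoted above, and treating a profit below zero as unprofitable is an assumption.

```python
from collections import Counter
from math import floor


def unprofitable_share(profits):
    """Fraction of customers with profit below zero (assumed definition)."""
    return sum(p < 0 for p in profits) / len(profits)


def bin_counts(profits, width=50):
    """Count customers per profit bin; each key is the bin's lower edge."""
    return Counter(floor(p / width) * width for p in profits)


# Hypothetical profits, for illustration only (not the Pilgrim Bank data).
profits = [-120, -45, -30, -10, -5, 20, 60, 180, 400, 950]
share = unprofitable_share(profits)
modal_bin, count = bin_counts(profits).most_common(1)[0]
print(f"unprofitable share={share:.0%}, modal bin starts at {modal_bin} ({count} customers)")
```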
5. Does knowing the demographics of a customer (e.g., the customer’s age and income) and
profitability in 1999 help to predict customer profitability and/or retention in 2000?

• To answer this question, we first considered it key to keep only records with no missing
values in the remaining 2000 columns, such as 0Profit and 0Online, since in the previous step
we had created dummies and imputed missing values only for the 1999 variables. We then ran
the models on all customers together, without splitting them into online and offline, to
estimate the coefficients of the independent variables and to predict customer profitability
and retention for the year 2000.
• Profitability:
➢ The model has an Adjusted R-squared value of approximately 36.15%, meaning that the 1999
variables still leave most of the variation in customer profitability in 2000 unexplained.
Adding other variables could increase the explanatory power. As it stands, this is still a
weak model for prediction, and further analysis is needed to strengthen its predictive
ability.
➢ As in the previous questions, a few predictor variables have a small but statistically
significant impact on profitability. These are X9Inc7, X9Inc8, X9Inc9, and X9Profit.
➢ Customer profitability in 1999 is significant in this model, hence it does help to predict
customer profitability in 2000. A one-unit increase in 1999 profit is associated with a
0.83-unit increase in 2000 profit.
• Retention:
➢ The sample does not include a retention variable, so we created one: customers who stayed
with the bank across 1999 and 2000 have a retention value of 1, and all others have a value
of 0. We applied logistic regression instead of linear regression because the retention
value is binary (0 or 1).

➢ A few predictor variables have a small but statistically significant impact on retention.
These are X9Age (all), X9Inc6, X9Inc9, X9Profit and X9District1200.
➢ The Age variable is significant in this model. We can interpret the beta coefficients as follows:
moving from X9Age1 to X9Age2, the odds of retaining a customer improve by 5.64%. In contrast,
moving from X9Age1 to X9Age3, the odds of retaining a customer decrease by 5.77%.
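A logistic regression coefficient is usually translated into a percent change in the odds by exponentiating it. The sketch below shows that conversion. The report does not state how its percentages were derived, so this is an assumption about the method, and the coefficient used is hypothetical, back-solved so that it yields a 5.64% increase.

```python
from math import exp, log


def odds_change_pct(beta):
    """Percent change in the odds implied by a logistic regression coefficient."""
    return (exp(beta) - 1) * 100


# Hypothetical coefficient chosen to give a +5.64% change (not a model output).
beta_example = log(1.0564)
print(round(odds_change_pct(beta_example), 2))
```

A negative coefficient gives a negative percentage, which corresponds to lower odds of retention relative to the default class.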
6. How would you refine your recommendation from the (A) case to the senior management
team in terms of Pilgrim Bank’s online channel pricing strategy?
• The recommendations that we gave senior management in Part (A) were supported by the
results of our test for a difference in mean profitability between online and offline
customers and by the linear model regressing customer profitability on demographics. Since
none of the results were conclusive, we recommended further research before setting pricing
for the online channel.
• In order to update our recommendations for year 2000, we performed the following statistical
tests:
➢ A Welch two-sample t-test for the significance of the difference in mean profitability for
year 2000 between online and offline customers
➢ A linear regression model for the effect of demographics and online usage on profitability
➢ A logistic regression model for the effect of demographics and online usage on retention rate
• The results were as follows:
➢ Welch two-sample t-test: Since the p-value is less than 5%, we can say that there is a
significant difference in mean customer profitability between offline and online users.
Online users had an average profitability of $161.51, while offline customers averaged
$140.69.
➢ Linear regression model (Y: Customer Profitability): The model shows that whether the
customer is online or offline has a significant effect on customer profitability.
➢ Logistic regression model (Y: Retention): Whether the customer is online or offline also
has a significant effect on the retention rate for Pilgrim Bank.

• Based on the above results, we can conclude that profitability in 1999 and whether the customer
was online or offline in 1999 are significant predictors of whether the customer is profitable
and retained. Moreover, online customers proved to be more profitable than offline customers
in 2000, by about $21 on average.
• Hence, our recommendation would be to consider offering rebates or lower
service charges to encourage customers of Pilgrim Bank to move to online banking
channels, since online customers turn out to be more profitable in the year 2000.
• In addition, to support further research, we believe it would be particularly important:
➢ To request access to five more years of online channel data from Erica Dorstamp, head of
IT services, to enlarge our customer sample and improve the model's predictions.
➢ To request the log of drivers (variables) of the tables provided by the IT teams, so that
we can explore and pull not only demographic variables but also geographic, psychographic,
behavioural and other variables. The selection of these new variables should be discussed
with Jane Raines and other pricing analysts of the company.
➢ To try other data treatments (cleaning and imputation), such as separating the outliers
and checking whether any demographics dominate within that group.
