
MGMT 59000 – Customer Analytics

Group Assignment 1:
Pilgrim Bank A and B

Report
February 2022

Team 07

Aaron Chen | chen4065@purdue.edu


Chetan Solanki | csolanki@purdue.edu
Erika Marietta | emariett@purdue.edu
Lucero Izquierdo | lizquier@purdue.edu
Shubham Goyanka | sgoyanka@purdue.edu
Umar Khan | khan353@purdue.edu
1. Based on the sample of customer data for 1999, what can Green conclude about average

customer profitability for Pilgrim Bank's entire customer population?

• The average customer profitability in the 1999 sample is $111.50. The 95% confidence interval for this sample mean runs from $108.50 to $114.50. Therefore, Green can conclude, with 95% confidence, that the average customer profitability for Pilgrim Bank’s entire customer population lies between $108.50 and $114.50.
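
As a minimal sketch of how such an interval can be computed, the Python snippet below uses the large-sample normal approximation. This is an assumption, since the report does not state which method produced its interval. The function name `mean_confidence_interval` and the toy profit values are illustrative only and are not taken from the Pilgrim Bank sample.

```python
import math
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(values, level=0.95):
    """Return (mean, lower, upper) using a normal approximation."""
    n = len(values)
    m = mean(values)
    se = stdev(values) / math.sqrt(n)          # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for level=0.95
    return m, m - z * se, m + z * se


# Toy data, not taken from the Pilgrim Bank sample.
profits = [-50.0, 10.0, 35.0, 120.0, 400.0, -20.0, 75.0, 5.0]
m, lo, hi = mean_confidence_interval(profits)
print(f"mean={m:.2f}, 95% CI=({lo:.2f}, {hi:.2f})")

# Standard error implied by the reported interval ($108.50 to $114.50),
# assuming the interval is mean +/- z * SE.
implied_se = (114.50 - 108.50) / 2 / NormalDist().inv_cdf(0.975)
print(f"implied standard error: {implied_se:.2f}")
```

Under this approximation, the reported half-width of $3.00 corresponds to a standard error of roughly $1.53.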

Results:

2. What is the difference in average profitability between online and offline customers in the

sample? (Hint: What is the magnitude of the difference? Is the difference statistically

significantly different from zero?)

• The average profitability of online and offline customers in the sample is $116.67 and $110.79, respectively, so the magnitude of the difference is $5.88. The two group means are close, so the raw difference alone does not tell us whether it is statistically significant.


For that reason, we used a two-sample Welch t-test to test the significance of the difference between the mean profitability of online and offline customers:

• H0: online customers generate the same profit on average as offline customers (u1 = u2)

• H1: online customers generate a different profit on average than offline customers (u1 != u2)

The p-value for the test is 0.2254. At the 5% level of significance, the difference is therefore not significantly different from 0. We cannot reject the null hypothesis (H0), and we cannot conclude that online customers generate a significantly different profit on average than offline customers. The 95% confidence interval for the difference confirms this, since it includes 0.
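
The test described above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the code used for the report: the function name `welch_t_test` and the toy data are hypothetical, and the p-value uses a normal approximation to the t distribution, which is only reasonable when the degrees of freedom are large (as they would be with 3854 online and 27780 offline customers).

```python
import math
from statistics import NormalDist, mean, variance


def welch_t_test(x, y):
    """Two-sided Welch t-test; the p-value uses a normal approximation."""
    vx, vy = variance(x) / len(x), variance(y) / len(y)
    t = (mean(x) - mean(y)) / math.sqrt(vx + vy)
    # Welch-Satterthwaite degrees of freedom
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    # Assumption: df is large, so the t distribution is close to normal.
    p = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, df, p


# Toy profits, not the Pilgrim Bank data.
online = [120.0, 80.0, -30.0, 310.0, 45.0, 150.0]
offline = [100.0, 60.0, -40.0, 280.0, 20.0, 90.0, 10.0]
t, df, p = welch_t_test(online, offline)
print(f"t={t:.3f}, df={df:.1f}, approx p={p:.3f}")
```

For small samples the exact t distribution should be used instead, for example via `scipy.stats.ttest_ind(x, y, equal_var=False)`.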

Results:
3. What role do customer demographics play in analyzing customer profitability for online

and offline customers? (Hint: What type of model can help you answer this question? How

to use the continuous and categorical variables? How to deal with the missing values?)

• We performed a heterogeneity analysis of customer profitability for online and offline customers based on customer demographics: age, income bracket, tenure, and district of residence.

• This type of question can be answered with two separate linear regression models, one fitted to the profitability of online customers and the other to that of offline customers.

• The demographic predictor variables included in these models are X9Age (categorical), X9Inc (categorical), X9Tenure (continuous) and X9District (categorical). The continuous variable enters the regression equation directly, while each categorical variable with n classes is converted into n-1 dummy variables. The intercept absorbs the effect of the default (reference) class of each categorical variable (Age, Income and District), whereas the beta coefficient for a non-default class denotes the effect of moving from the default class to that class.

• We impute missing values with the mode, for two reasons: first, the sample is unevenly split between online (3854) and offline (27780) customers, and we expect that imputing the modal class of a categorical variable will not bias the data; second, the mode can be used for both continuous and categorical variables.

• Results of online customers:


• The above results can be interpreted as follows:

➢ The model above has an adjusted R-squared of just over 8%, meaning that demographics explain little of the variation in customer profitability.

➢ However, a few predictor variables have a small but significant effect on profitability: X9Age (all), X9Inc7, X9Inc8, X9Inc9 and X9Tenure.

• Results of offline customers:


• The above results can be interpreted as follows:

➢ The model above has an adjusted R-squared of about 6.42%, meaning that demographics explain little of the variation in customer profitability.

➢ However, a few predictor variables have a small but significant effect on profitability: X9Age (all), X9Inc5, X9Inc6, X9Inc7, X9Inc8, X9Inc9, X9Tenure and X9District1200.

➢ X9Inc5, X9Inc6 and X9District1200 are significant for offline customers but not for online customers. Hence, it would be advisable to study these classes of the categorical variables specifically.

4. How do retail banks make money from their customers? How much variation is there in

profit across customers? Based on this, what do you recommend the bank do in terms of

matching service levels to customer profit levels?

• Retail banks make money from their customers using three broad sources:
➢ Investment income from deposit balances, represented by the net interest margin: the difference between the rate a bank pays on a deposit account and the rate at which it can invest the deposit through commercial or mortgage lending

➢ Fee income from checking accounts, past due payments, and overdrafts

➢ Loan interest and base lending rates

• The following histogram charts profitability for all customers. We can infer from the graph that

approximately 50% of the customers are unprofitable for Pilgrim Bank.

• The following histograms show profitability for online and offline customers separately. For both groups, the modal bin lies between -$50 and $0, that is, the largest number of customers are slightly unprofitable. Hence, it would be advisable to first convert these customers into profitable ones.


• Since demographics explain little of the variation in customer profitability, we recommend that Pilgrim Bank conduct more research into the behavioural characteristics of its customers. Also, the average profitability of online and offline customers is not significantly different, so monetizing online banking through fees or rebates would not make sense at this point.
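
The two histogram readings used in this answer, the share of unprofitable customers and the most populated profit bin, can be computed along the following lines. This is a sketch with hypothetical helper names and toy data; the $50 bin width is an assumption chosen to match the -$50 to $0 range mentioned above.

```python
import math
from collections import Counter


def share_unprofitable(profits):
    """Fraction of customers with negative profit."""
    return sum(p < 0 for p in profits) / len(profits)


def modal_bin(profits, width=50):
    """Return (lower, upper) bounds of the most populated histogram bin."""
    counts = Counter(math.floor(p / width) * width for p in profits)
    lower = counts.most_common(1)[0][0]
    return lower, lower + width


# Toy profits, not the Pilgrim Bank data.
profits = [-45.0, -10.0, -30.0, 20.0, 140.0, -5.0, 600.0, 75.0]
print(share_unprofitable(profits))
print(modal_bin(profits))
```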

5. Does knowing the demographics of a customer (e.g., the customer’s age and income) and

profitability in 1999 help to predict customer profitability and/or retention in 2000?


• To answer this question, we first kept only the customers with no missing values in the 2000 columns, such as 0Profit and 0Online, since in the previous step we had created dummies and imputed missing values only for the 1999 variables. We then ran the models on all customers together, without splitting them by type (online and offline), to estimate the coefficients of the independent variables and to predict customer profitability and retention in 2000.

• Profitability:
➢ The model above has an adjusted R-squared of about 36.15%, meaning that the 1999 variables explain only part of the variation in customer profitability in 2000. Adding other variables may improve the fit. This is still a weak model for prediction, and further analysis is needed to strengthen its predictive power.

➢ As in the previous questions, a few predictor variables have a small but significant effect on profitability: X9Inc7, X9Inc8, X9Inc9, and X9Profit.

➢ Customer profitability in 1999 is significant in this model, hence it does help to predict customer profitability in 2000. A one-unit increase in 1999 profit is associated with a 0.83-unit increase in 2000 profit.

• Retention:

➢ The sample has no retention variable, so we created one: customers who stayed with the bank across 1999 and 2000 get a retention value of 1, and all others a value of 0. We applied logistic regression instead of linear regression because retention is binary (0 or 1).


➢ A few predictor variables have a small but significant effect on retention: X9Age (all), X9Inc6, X9Inc9, X9Profit and X9District1200.

➢ The Age variable is significant in this model. We can interpret the beta coefficients as follows: moving from X9Age1 to X9Age2, the odds of retaining a customer increase by 5.64%. Similarly, moving from X9Age1 to X9Age3, the odds of retaining a customer decrease by 5.77%.
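
The retention flag and the odds interpretation above can be illustrated with a short sketch. Two assumptions are made here: a customer counts as retained when a 2000 profit value is present (the report does not spell out its exact rule), and the quoted percentages come from the usual transformation exp(beta) - 1. The coefficients below are back-computed from the reported percentages purely for illustration; the fitted values are not shown in the report.

```python
import math


def retention_flag(profit_2000):
    """1 if the customer has a 2000 profit record, else 0 (assumed rule)."""
    return 0 if profit_2000 is None else 1


def odds_change_pct(beta):
    """Percent change in the odds implied by a logistic coefficient."""
    return (math.exp(beta) - 1) * 100


# Coefficients back-computed from the reported percentages, for illustration.
beta_age2 = math.log(1 + 0.0564)
beta_age3 = math.log(1 - 0.0577)
print(f"{beta_age2:.4f} -> {odds_change_pct(beta_age2):+.2f}%")
print(f"{beta_age3:.4f} -> {odds_change_pct(beta_age3):+.2f}%")

# Toy 2000 profit values; None marks a customer with no 2000 record.
flags = [retention_flag(p) for p in [120.0, None, -15.0]]
print(flags)
```

Note that these percentages describe a change in the odds of retention, not in the retention probability itself.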
6. How would you refine your recommendation from the (A) case to the senior management

team in terms of Pilgrim Bank’s online channel pricing strategy?

• The recommendations that we gave senior management in Part (A) were based on our test of the difference in means between online and offline customers and on the linear model regressing customer profitability on demographics. Since none of the results were conclusive, we recommended further research before setting pricing for the online channel.

• In order to update our recommendations for year 2000, we performed the following statistical

tests:

➢ Welch two sample t-test for significance of mean profitability for year 2000 between online and

offline customers

➢ linear regression model for effect of demographics and online usage on profitability

➢ a logistic regression model for effect of demographics and online usage on retention rate

• The results were as follows:

➢ Welch two sample t-test: Since the p-value is below 0.05, there is a significant difference in mean customer profitability between offline and online users. Online users had an average customer profitability of $161.51, while offline customers had $140.69.

➢ Linear regression model (Y: Customer Profitability): As expected, whether the customer is online or offline has a significant effect on customer profitability.

➢ Logistic regression model (Y: Retention): Whether the customer is online or offline also appears to have a significant effect on the retention rate for Pilgrim Bank.


• Based on the above results, we can conclude that profitability in 1999 and whether the customer was online or offline in 1999 play a significant role in determining whether the customer is profitable and retained. Moreover, online customers proved to be more profitable than offline customers, by about $21 on average.

• Hence, our recommendation would be to consider offering rebates or lower service charges to encourage customers of Pilgrim Bank to move to online banking channels, since online customers turn out to be more profitable in the year 2000.

• As a corollary, to support further research, we also believe it would be particularly important:

➢ To request access to 5 more years of online channel data from Erica Dorstamp, head of IT services, to enlarge our customer sample and improve the predictive power of the model.

➢ To request from the IT teams the log of drivers (variables) in the tables provided, so that we can explore and pull not only demographic variables but also geographic, psychographic, behavioural and other variables. The selection of these new variables should be discussed with Jane Raines and other pricing analysts of the company.

➢ To try another data treatment (cleaning and imputation), such as separating the outliers and checking whether any demographics dominate within that group.
