MGMT 59000 - Customer Analytics
Group Assignment 1:
Pilgrim Bank A and B
Report
February 2022
Team 07
• The average customer profitability in the 1999 sample is $111.50. The 95% confidence interval
for this sample mean runs from $108.50 to $114.50. We can therefore state, with 95% confidence,
that the average customer profitability for Pilgrim Bank’s entire customer population lies
between $108.50 and $114.50.
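The interval above can be reproduced with a short calculation. The following is a minimal sketch, assuming a normal approximation (reasonable for a sample of this size); the function name `mean_confidence_interval` and the toy profit values are hypothetical and are not the Pilgrim Bank data.

```python
import math
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(values, confidence=0.95):
    """Return (mean, lower, upper) using a normal approximation."""
    n = len(values)
    m = mean(values)
    se = stdev(values) / math.sqrt(n)               # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return m, m - z * se, m + z * se

# Hypothetical toy profits, used only to show the mechanics
profits = [-50.0, 20.0, 111.0, 305.0, -12.0, 84.0, 150.0, 9.0]
m, lo, hi = mean_confidence_interval(profits)
print(f"mean={m:.2f}, 95% CI=({lo:.2f}, {hi:.2f})")
```

With only a handful of observations, a t-based critical value would be more appropriate than the normal one; the approximation is used here because the report's sample contains tens of thousands of customers.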
Results:
2. What is the difference in average profitability between online and offline customers in the
sample? (Hint: What is the magnitude of the difference? Is the difference statistically significant?)
• The average profitability of online and offline customers in the sample is $116.67 and $110.79,
respectively. Hence, the magnitude of the difference between the online and offline customers
is $5.88. However, we cannot determine yet if this difference is statistically significant, since the
• H0: online customers generate the same profit on average as offline customers (μ1 = μ2)
• H1: online customers generate a different profit on average than offline customers (μ1 ≠ μ2)
The p-value for the test is 0.2254. At the 5% significance level, the difference is therefore not
significantly different from 0. We cannot reject the null hypothesis (H0), and we cannot conclude
that online customers generate a significantly different profit on average than offline customers.
The 95% confidence interval for the difference, which includes 0, confirms this.
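The test logic can be sketched as follows, assuming a Welch two-sample t-test (the test this report names for the 2000 data; the surviving text does not say which test was used here). The function name `welch_test` and the toy data are hypothetical, and the p-value uses a normal approximation to the t distribution, which is only reasonable for large samples.

```python
import math
from statistics import NormalDist, mean, variance

def welch_test(a, b, confidence=0.95):
    """Welch t statistic, degrees of freedom, approximate two-sided p-value,
    and confidence interval for mean(a) - mean(b)."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    diff = mean(a) - mean(b)
    se = math.sqrt(va + vb)                  # unpooled standard error
    t = diff / se
    # Welch-Satterthwaite degrees of freedom
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    # Normal approximation to the t distribution (large-sample assumption)
    norm = NormalDist()
    p = 2 * (1 - norm.cdf(abs(t)))
    z = norm.inv_cdf(0.5 + confidence / 2)
    return t, df, p, (diff - z * se, diff + z * se)

# Hypothetical toy profits, far smaller than the real sample
online = [120.0, -30.0, 200.0, 15.0, 310.0, 60.0]
offline = [100.0, -45.0, 180.0, 5.0, 250.0, 40.0, 90.0]
t, df, p, ci = welch_test(online, offline)
print(f"t={t:.3f}, df={df:.1f}, p={p:.4f}, CI=({ci[0]:.2f}, {ci[1]:.2f})")
```

Because the p-value and the interval share the same standard error, the interval contains 0 exactly when the p-value exceeds 0.05. For an exact t-based p-value, `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the same Welch test.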
Results:
3. What role do customer demographics play in analyzing customer profitability for online
and offline customers? (Hint: What type of model can help you answer this question? How
to use the continuous and categorical variables? How to deal with the missing values?)
• We performed a heterogeneity analysis for customer profitability between online and offline
customers based on customer demographics, that is age, income bracket, tenure, and district
• This type of question can be answered with two separate linear regression models, one fitted
to the profitability of online customers and the other to that of offline customers.
• The demographic predictors included in the models are X9Age (categorical), X9Inc
(categorical), X9Tenure (continuous) and X9District (categorical). The continuous variable
enters the regression equation directly, while each categorical variable with n classes is
converted into n-1 dummy variables. The intercept absorbs the effect of the default class of
each categorical variable (Age, Income and District), whereas the beta coefficient for a
non-default class denotes the effect of moving from the default class to that class.
• We impute missing values with the mode, for two reasons: first, the sample is unevenly split
between online (3854) and offline (27780) customers, so imputing the modal class of a
categorical variable should not bias the data; and second, since mode
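The two preparation steps described above, mode imputation and n-1 dummy encoding, can be sketched as follows. This is a minimal illustration under stated assumptions: the helper names `impute_mode` and `dummy_encode`, the use of `None` for missing values, the choice of the smallest class as the default class, and the toy income brackets are all hypothetical.

```python
from collections import Counter

def impute_mode(values):
    """Replace None entries with the most frequent observed class."""
    observed = [v for v in values if v is not None]
    mode = Counter(observed).most_common(1)[0][0]
    return [mode if v is None else v for v in values]

def dummy_encode(values, prefix):
    """Encode n classes as n-1 indicator columns; the smallest class is the baseline."""
    classes = sorted(set(values))
    baseline, others = classes[0], classes[1:]
    columns = {f"{prefix}{c}": [1 if v == c else 0 for v in values] for c in others}
    return baseline, columns

# Hypothetical income brackets with missing entries (None)
income = [3, None, 6, 3, 9, None, 3, 6]
filled = impute_mode(income)
baseline, dummies = dummy_encode(filled, "X9Inc")
print(filled)
print(baseline, dummies)
```

A row whose dummies are all 0 belongs to the baseline class, which is why its effect is carried by the intercept. In pandas, `pandas.get_dummies(..., drop_first=True)` produces the same n-1 encoding.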
➢ The abovementioned model has an Adjusted R-squared value of just over 8% meaning that the
➢ However, there are a few predictor variables which have a small but significant impact on
profitability. These are X9Age (all), X9Inc7, X9Inc8, X9Inc9 and X9Tenure.
➢ The abovementioned model has an Adjusted R-squared value of about 6.42%, meaning that
➢ However, there are a few predictor variables which have a small but significant impact on
profitability. These are X9Age (all), X9Inc5, X9Inc6, X9Inc7, X9Inc8, X9Inc9, X9Tenure and
X9District1200.
➢ X9Inc5, X9Inc6, X9Inc7, and X9District1200 seem to affect the profitability of offline
customers only and not that of online customers. Hence, it would be advisable to
4. How do retail banks make money from their customers? How much variation is there in
profit across customers? Based on this, what do you recommend the bank do in terms of
• Retail banks make money from their customers using three broad sources:
➢ Investment income from deposit balances represented by the net interest margin, the difference
between the rate a bank paid on a deposit account and the rate at which it was able to invest
➢ Fee income from checking accounts, past due payments, and overdrafts
• The following histogram charts profitability for all customers. We can infer from the graph that
• The following histograms show the profitability figures for online and offline customers
separately. From the figure below, we can note that the largest group of customers (the mode)
is slightly unprofitable (between -$50 and $0) for both online and offline customers. Hence, it would be
research into the behaviour characteristics of Pilgrim Bank. Also, the average customer
profitability of online and offline customers is not significantly different, hence monetizing online
bank operations using fees or rebates would also not make sense at this point.
5. Does knowing the demographics of a customer (e.g., the customer’s age and income) and
rest of columns of 2000, such as 0Profit and 0Online, since in the previous step we had only
created dummies and imputed missing values for the 1999 variables. After that, we ran the
models without splitting customers by type (online and offline), with the aim of estimating the
coefficients of the independent variables and predicting the profitability and retention of the
• Profitability:
➢ The abovementioned model has an Adjusted R-squared value of about 36.15%, meaning that
the demographic variables of 1999 are still too weak to explain most of the variation in
customer profitability in 2000. Including other variables could improve the fit. This is still
a weak model for prediction, and further analysis will need to be
➢ Like the previous questions, there are a few predictor variables which have a small but
significant impact on profitability. These are X9Inc7, X9Inc8, X9Inc9, and X9Profit.
➢ Customer profitability in 1999 is significant in this model; hence, it does help to predict customer
profitability in the year 2000. A one-unit increase in 1999 profit is associated with 0.83 units
• Retention:
➢ Since this sample does not include a retention variable, we created one: customers who
stayed with the bank over the period from 1999 to 2000 receive a retention value of 1, and all
others a value of 0. In addition, we applied logistic regression instead
➢ The Age variable is significant in this model. We can interpret the beta coefficients as follows:
as we move from X9Age1 to X9Age2, the odds of retaining a customer increase by 5.64%.
Similarly, as we move from X9Age1 to X9Age3, the odds of retaining a customer decrease
by 5.77%.
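The percentage readings above follow from exponentiating a logistic regression coefficient: for a dummy variable with coefficient beta, the odds are multiplied by exp(beta), so the percent change in the odds is (exp(beta) - 1) × 100. A minimal sketch of that conversion follows; the function name `pct_change_in_odds` is hypothetical, and the two coefficients are back-computed from the percentages quoted above for illustration, not taken from the model output.

```python
import math

def pct_change_in_odds(beta):
    """Percent change in the odds when a dummy variable switches from 0 to 1."""
    return (math.exp(beta) - 1) * 100

# Illustrative coefficients implied by the quoted percentages (assumption)
beta_age2 = math.log(1 + 0.0564)   # odds multiplied by 1.0564
beta_age3 = math.log(1 - 0.0577)   # odds multiplied by 0.9423

print(round(pct_change_in_odds(beta_age2), 2))
print(round(pct_change_in_odds(beta_age3), 2))
```

The change applies to the odds of retention, not to the retention probability itself; the effect on the probability depends on the customer's baseline odds.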
6. How would you refine your recommendation from the (A) case to the senior management
• The recommendations that we gave to senior management in Part (A) were supported by the
results of our test for a difference in means between online and offline customers and by
the linear model regressing customer profitability on demographics. Since none of the results
were conclusive, we recommended that there was scope for further research regarding pricing
• In order to update our recommendations for year 2000, we performed the following statistical
tests:
➢ a Welch two-sample t-test for a difference in mean profitability in 2000 between online and
offline customers
➢ a linear regression model for the effect of demographics and online usage on profitability
➢ a logistic regression model for the effect of demographics and online usage on retention rate
➢ Welch two-sample t-test: Since the p-value is less than 0.05, we can say that there is a
significant difference in mean customer profitability between offline and online users. Online
users had an average profitability of $161.51, while offline customers had an average of $140.69.
➢ Linear regression model (Y: Customer Profitability): As expected from the linear regression
profitability.
➢ Logistic regression model (Y: Retention): Whether the customer is online or offline seems to
is online or offline in 1999 has played a significant role in determining whether the
customer is profitable and retained. Moreover, online customers have proven to be more
channels, since online customers turn out to be more profitable in the year 2000.
• As a corollary, in order to conduct further research, we also believe that it would be particularly
important:
➢ To request access from Erica Dorstamp, head of IT services, to five more years of data on the
online channel, in order to enlarge our sample of customers and improve the model's
predictions.
➢ To request from the IT teams the log of drivers (variables) in the tables provided, so that we
can begin exploring and pulling not only demographic variables but also geographic,
psychographic, behavioral, and other variables. The selection of these new variables should be discussed with Jane Raines and other
➢ To consider other data treatments (cleaning and imputation), such as separating the outliers and seeing