Case Analysis Pilgrim Bank
Preliminaries
Initial scanning of the data set reveals that it contains three types of data, as listed below:
a) Nominal: Online and District; b) Ordinal: Income and Age; and c) Ratio: Tenure and Profit.
Approach
Since some of the data points are missing, we have deleted every record that contains a
missing value. This has reduced the overall sample size to 22813, comprising 2954 online and
19859 non-online customers (around 88%).
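The deletion step described above can be sketched as follows. This is a minimal illustration only: the records, field names and values are hypothetical and are not taken from the case data, and the report does not say which tool was actually used.

```python
# Hypothetical records; field names and values are illustrative, not the case data.
records = [
    {"id": 1, "profit": 21, "online": 0, "age": 1, "income": 3, "tenure": 6.33},
    {"id": 2, "profit": -6, "online": 0, "age": 2, "income": None, "tenure": 29.5},
    {"id": 3, "profit": 104, "online": 1, "age": None, "income": 5, "tenure": 0.41},
    {"id": 4, "profit": 62, "online": 1, "age": 3, "income": 7, "tenure": 12.0},
]


def drop_incomplete(rows):
    """Keep only rows in which every field has a value (listwise deletion)."""
    return [row for row in rows if all(value is not None for value in row.values())]


complete = drop_incomplete(records)

# Count the remaining customers in each group.
online_n = sum(1 for row in complete if row["online"] == 1)
offline_n = len(complete) - online_n
print(len(complete), online_n, offline_n)
```

Dropping whole records keeps every regression on the same sample, at the cost of discarding customers for whom only one field is missing.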
Descriptive Statistics and Histogram of Count of Age wrt Online Customers:
          Profit (Online)   Profit (Offline)   Income    Tenure
Mean      131.524           126.522            5.488     10.996
Median    20.500            27.000             6.000     8.250
Mode      -2.000            -31.000            6.000     7.410
SD        290.365           281.724            2.336     8.525
Range     2292.000          2199.000           8.000     41.000
[Histogram: count of customers by age group (1 to 7), with Offline and Online series; vertical axis 0 to 5000]
Analysis:
First, simple regression models of profit against each of the independent variables, i.e. Age,
Income, Tenure, Online/Not Online and District (using two dummy variables), were attempted.
It was found that Income, Age, Tenure and District 1200 have a positive relationship with
profitability. The regression of profit on online/offline shows that online customers are
about $5 (the slope in this model) more profitable than offline customers. However, the
two-tailed t-test on this slope gives a p-value of 0.3697, well above the 0.05 threshold for
significance. So this model is not significant.
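The reading of the slope as a profit gap follows from how a regression on a 0/1 dummy works: the intercept is the mean of the group coded 0 and the slope is the difference between the two group means. In the report this corresponds to an intercept of 126.52 (the offline mean) and a slope of about 5.00 (131.524 minus 126.522). The sketch below shows the identity on a small set of hypothetical values; the numbers and the helper name `ols_simple` are illustrative assumptions.

```python
def ols_simple(x, y):
    """Ordinary least squares for one predictor; returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data: 1 = online customer, 0 = offline customer.
online = [0, 0, 0, 0, 1, 1, 1]
profit = [10.0, -20.0, 150.0, 60.0, 30.0, 200.0, -5.0]

intercept, slope = ols_simple(online, profit)

# Group means computed directly, for comparison with the fitted coefficients.
offline_values = [p for o, p in zip(online, profit) if o == 0]
online_values = [p for o, p in zip(online, profit) if o == 1]
offline_mean = sum(offline_values) / len(offline_values)
online_mean = sum(online_values) / len(online_values)

print(intercept, offline_mean)
print(slope, online_mean - offline_mean)
```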
                  Online      Age         Income      Tenure      District (1200 & 1300)
R Square          3.53E-05    0.0203      0.0214      0.0288      0.0025
Adjusted R Sq     -8.6E-06    0.0202      0.0214      0.0288      0.0024
F value           0.8044      473.56      501.00      678.20      29.051
Coeff Intercept   126.52      26.787      29.754      65.171      95.504
Coeff Variable    5.0028      24.6963     17.750      5.6380      39.243 (1200), 9.6252 (1300)
t Stat Variable   0.8968      21.761      22.383      26.042      6.1308 (1200), 1.2066 (1300)
P val Variable    0.3697      6.1E-104    8.8E-110    2.3E-147    8.88E-10 (1200), 0.2275 (1300)
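The reported figures can be cross-checked against each other. In a regression with a single predictor, the F value equals the square of the slope's t statistic, and with a sample of this size the two-tailed p-value is close to its normal approximation. The sketch below applies both checks to the reported values; the use of a normal approximation in place of the exact t distribution is an assumption made to keep the example free of external libraries.

```python
import math


def two_sided_p_normal(t_stat):
    """Two-sided p-value under a standard normal approximation to the t distribution."""
    return math.erfc(abs(t_stat) / math.sqrt(2))


# (t statistic, F value) as reported for the single-variable models.
simple_models = {
    "Online": (0.8968, 0.8044),
    "Age": (21.761, 473.56),
    "Income": (22.383, 501.00),
    "Tenure": (26.042, 678.20),
}

for name, (t_stat, f_value) in simple_models.items():
    # With one predictor, the model F statistic equals the squared slope t statistic.
    print(name, round(t_stat ** 2, 4), f_value)

p_online = two_sided_p_normal(0.8968)         # reported p-value: 0.3697
p_district_1300 = two_sided_p_normal(1.2066)  # reported p-value: 0.2275
print(round(p_online, 4), round(p_district_1300, 4))
```

The District model is left out of the F check because it has two dummy variables, so its F value is a joint test rather than the square of a single t statistic.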
After this, we carried out a stepwise regression over the independent variables. We
conclude that profitability is related to tenure, income and age (the other variables turn
out to be insignificant in the t-test). The R^2 value for this model is only 5.7%.
Therefore the variation in profitability cannot be strongly attributed to these independent
variables. The equation obtained is as follows:
Profit = -87.86 + 4.014*tenure + 18.03*income + 17.69*age
The p-values of all the above independent variables are below 0.05.
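To make the fitted equation concrete, it can be evaluated for a single customer. The sketch below does this for hypothetical inputs; income and age are assumed to be the ordinal bucket codes used in the data set, and the example values are illustrative only. Given the R^2 of 5.7%, any single prediction should be read as a rough average rather than an accurate forecast.

```python
def predicted_profit(tenure, income, age):
    """Evaluate the stepwise regression equation reported in the analysis."""
    return -87.86 + 4.014 * tenure + 18.03 * income + 17.69 * age


# Hypothetical customer: 10 years of tenure, income bucket 5, age bucket 4.
example = predicted_profit(tenure=10, income=5, age=4)
print(round(example, 2))

# Each additional year of tenure adds the tenure coefficient to the prediction.
one_more_year = predicted_profit(tenure=11, income=5, age=4) - example
print(round(one_more_year, 3))
```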
[Scatter plots for Age, Income and Tenure, each with a polynomial trend line]
On observing the scatter plots of these variables, we were unable to categorically identify
the quadrant of Tukey's model. We then applied a Quick and Dirty method, adding the square of
each independent variable. After removing the variables that did not pass their individual
t-tests, our model has an R^2 value of 6.3%.
The Quick and Dirty model is as follows:
Profit = -37.1 + 5.71*tenure - 0.05*tenure^2 [sign illegible] 17.09*income + 3.32*income^2 + 18.16*age + 16.86*online + 14.7*district1200
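The squared tenure term means that the effect of one more year of tenure is no longer constant. Using only the two tenure coefficients from the model above (5.71 and -0.05), the marginal effect is 5.71 - 2*0.05*tenure, which reaches zero at a tenure of about 57 years. The sketch below works this out; the constant and function names are illustrative, and the comparison value of 41 years assumes that tenure starts near zero, so that the reported range of 41 is close to the maximum.

```python
B_TENURE = 5.71      # coefficient on tenure in the Quick and Dirty model
B_TENURE_SQ = -0.05  # coefficient on tenure squared


def tenure_marginal_effect(tenure):
    """Change in predicted profit per extra year of tenure, at a given tenure."""
    return B_TENURE + 2 * B_TENURE_SQ * tenure


# Tenure at which the quadratic tenure effect stops increasing.
turning_point = -B_TENURE / (2 * B_TENURE_SQ)
print(round(turning_point, 1))

# Marginal effect for a newer customer and near the assumed top of the tenure range.
print(round(tenure_marginal_effect(10), 2), round(tenure_marginal_effect(41), 2))
```

Under that assumption the turning point lies beyond the observed tenures, so within the sample the model implies that profit rises with tenure throughout, only at a slowing rate.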
Besides this, we also ran a regression between age and online/not online to find out the
relationship between these two independent variables. This exercise showed that there is a
negative correlation between age and being online. The p-value of this model is also
significant. The other multicollinearity data are as follows:
Multicollinearity
           Online      Age        Income
Age        -0.1685
Income     0.08069     -0.0699
Tenure     -0.08078    0.42031    0.040002
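Pairwise correlations of this kind can be computed with the Pearson coefficient. The sketch below is a minimal illustration on hypothetical values for the online flag and the age bucket; the data and the helper name `pearson` are assumptions, not the case data or the tool used in the report.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical customers: online flag (1 = online) and age bucket (1 = youngest).
online = [1, 1, 0, 0, 0, 1, 0, 0]
age = [1, 2, 5, 6, 4, 3, 7, 5]

r_online_age = pearson(online, age)
print(round(r_online_age, 3))
```

A negative value here means that the online flag tends to be 1 in the lower age buckets, which is the pattern the report finds between Online and Age.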
The negative correlation between age and online means that younger customers are the more
likely users of the online service. Hence, in our opinion, as the online customers are more
profitable, the online service can be promoted to young customers.