
Assignment 3

Analytics for Strategic Market Planning (MB 829)

Group 8
Abhishek Mishra (21512003)
Akif Ahammed (215120012)
Alice M S M Rynjah (215120013)
Rohith Rajan (215120084)
Case 1: ABB Electric (Customer Choice Modelling)

Alternatives   Price   Energy Loss   Maintenance   Warranty   Spare Parts   Ease of Install   Problem Solver   Quality
ABB            4.636   4.886         5.705         5.250      5.239         5.364             5.898            4.841
GE             4.511   4.966         5.818         5.398      5.455         5.250             6.080            4.739
Westinghouse   4.705   5.045         5.750         5.159      5.114         5.216             5.920            4.932
Edison         4.193   4.466         5.364         5.114      4.784         5.000             5.534            4.432
Table 1: Variable Averages

Table 1 provides the averages of the independent variables for each alternative. Alternative-specific constants, if
added, are set to zero by definition. The Problem Solver attribute has the highest average rating for each of the
four alternatives. This indicates that customers value a knowledgeable salesforce and that this attribute
contributes strongly to the purchasing decision. Although customers place a lot of emphasis on the Problem Solver
attribute, all four companies are competitive on it and have received similar ratings (within roughly 0.5 of one
another). This implies that ABB has to differentiate itself by improving in the other areas where it lags behind
the competition.

Alternatives   Price   Energy Loss   Maintenance   Warranty   Spare Parts   Ease of Install   Problem Solver   Quality
ABB            5.167   5.333         6.389         6.222      5.722         5.333             6.500            5.111
GE             5.000   5.348         5.739         5.913      5.652         5.696             6.261            4.913
Westinghouse   5.423   5.654         6.000         5.769      5.423         5.615             6.538            5.500
Edison         5.238   5.429         6.190         6.238      5.762         5.810             6.429            5.619
Table 2: Variable Averages for Chosen Alternatives

The customers who chose ABB as their preferred supplier rated it above 6 on the Maintenance, Warranty and Problem
Solver attributes. This suggests that the customers who prefer ABB are influenced mainly by these three
attributes. Edison is the only other alternative rated above 6 on all three by the customers who chose it. The
customers who chose GE over ABB were influenced by the Ease of Install and Energy Loss attributes, on which ABB
trails by a small margin. Meanwhile, Westinghouse was rated better than ABB on the Price, Energy Loss, Ease of
Install and Quality attributes by a small margin.

These insights suggest that ABB should try to differentiate itself from its direct rival Edison by focusing its
efforts on improving its ratings on the Ease of Install, Spare Parts, Energy Loss, Price and Quality attributes.
In doing so, ABB will also close the gap to GE and Westinghouse on the average ratings of several attributes. If
ABB wants to achieve gains in average ratings on a tight budget, it should focus on improving its ratings on Price
and Ease of Install, since improving the ratings of hardware-related attributes such as Energy Loss and Spare
Parts may prove less economical.

Observed / Predicted Choice   ABB   GE   Westinghouse   Edison
ABB                           13    3    0              1
GE                            1     19   1              1
Westinghouse                  4     0    23             0
Edison                        0     1    2              19
Table 3: Confusion Matrix

Table 3 compares the observed choices with the predicted choices. High values on the diagonal of the confusion
matrix, relative to the off-diagonal values, indicate close agreement between observations and predictions. The
analysis was performed on the estimation dataset, so it measures the in-sample goodness-of-fit of the model.
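
The agreement described above can be quantified as a hit rate. The short sketch below is only an illustration: it copies the counts from Table 3 and computes the share of customers whose predicted supplier matches the observed one, overall and per observed supplier. The function names are our own and are not part of the software used for the case.

```python
# Observed (rows) versus predicted (columns) choices, copied from Table 3.
firms = ["ABB", "GE", "Westinghouse", "Edison"]
confusion = [
    [13, 3, 0, 1],   # observed ABB
    [1, 19, 1, 1],   # observed GE
    [4, 0, 23, 0],   # observed Westinghouse
    [0, 1, 2, 19],   # observed Edison
]


def hit_rate(matrix):
    """Share of all customers on the diagonal (prediction equals observation)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def per_firm_hit_rate(matrix, labels):
    """Share of each firm's observed customers that the model predicts correctly."""
    return {label: matrix[i][i] / sum(matrix[i]) for i, label in enumerate(labels)}


overall = hit_rate(confusion)
by_firm = per_firm_hit_rate(confusion, firms)

print(f"Overall hit rate: {overall:.1%}")
for firm, share in by_firm.items():
    print(f"{firm}: {share:.1%} of observed choices predicted correctly")
```

Because the matrix is built on the estimation data, the hit rate should be read as an in-sample figure rather than as expected accuracy on new customers.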

Customer-Loyalty-Based Segmentation

Using the estimated choice probabilities shown in Table 4, the customers were segmented based on their loyalty
to the suppliers they chose. The choice probabilities for each of the four alternatives were examined and the
segments were then identified according to the following criteria:

• If the probability of a customer purchasing from ABB was significantly larger than the other
probabilities, and if the ‘Predicted ABB’ and ‘Observed ABB’ were both equal to 1, then that particular
customer is considered to be Loyal.
• If the probability of a customer purchasing from ABB was slightly higher than the next most preferred
supplier, and if the ‘Predicted ABB’ was equal to 1 with observed purchase being equal to 1 under any
other supplier (since the difference in probability is insignificant), then that particular customer is
considered to be Competitive.
• If the probability of a customer purchasing equipment from a competitor was the highest, but not
significantly higher than the probability of purchasing from ABB, then the customer is considered to be
Switchable.
• If the probability of a customer purchasing from ABB was the least, and if the ‘Predicted ABB’ and
‘Observed ABB’ were both equal to 0, then that particular customer is considered to be Lost.
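
The criteria above can be written down as a simple rule. The sketch below is an assumption-laden illustration, not the procedure actually used for Table 4: the `margin` of 0.2 that separates a "significantly" larger probability from a "slightly" larger one is hypothetical, and customers who match none of the four rules are returned as "Unclassified".

```python
def assign_segment(probs, observed, margin=0.2):
    """Assign a loyalty segment from choice probabilities and the observed supplier.

    probs    -- dict of supplier name to estimated choice probability
    observed -- name of the supplier the customer actually chose
    margin   -- HYPOTHETICAL cut-off between 'significantly' and 'slightly' higher
    """
    abb = probs["ABB"]
    rivals = {firm: p for firm, p in probs.items() if firm != "ABB"}
    best_rival = max(rivals.values())

    if abb > best_rival:  # the model predicts ABB
        if abb - best_rival >= margin and observed == "ABB":
            return "Loyal"
        if abb - best_rival < margin and observed != "ABB":
            return "Competitive"
        return "Unclassified"

    # A competitor has the highest probability.
    if best_rival - abb < margin:
        return "Switchable"
    if abb <= min(rivals.values()) and observed != "ABB":
        return "Lost"
    return "Unclassified"


# Made-up probabilities for illustration only; these are not rows of Table 4.
examples = [
    ({"ABB": 0.80, "GE": 0.10, "Westinghouse": 0.05, "Edison": 0.05}, "ABB"),
    ({"ABB": 0.40, "GE": 0.35, "Westinghouse": 0.15, "Edison": 0.10}, "GE"),
    ({"ABB": 0.30, "GE": 0.40, "Westinghouse": 0.20, "Edison": 0.10}, "GE"),
    ({"ABB": 0.02, "GE": 0.90, "Westinghouse": 0.05, "Edison": 0.03}, "GE"),
]
segments = [assign_segment(probs, observed) for probs, observed in examples]
print(segments)
```

Changing `margin` moves customers between the competitive, switchable and unclassified groups, which shows how much the segmentation depends on the analyst's choice of threshold.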

Table 4: Loyalty Based Segmentation (Applicable up to Customer 88)

Key Drivers of Choice

Variables         Coefficient estimates   Standard deviation   t-statistic
Price             2.180581                0.586578             3.717459
Energy Loss       2.655609                0.673706             3.941794
Maintenance       0.593692                0.437028             1.358475
Warranty          1.140702                0.330995             3.446286
Spare Parts       -0.13262                0.21757              -0.60955
Ease of Install   0.520023                0.172875             3.00808
Prob Solver       2.03218                 0.549676             3.697052
Quality           2.639412                0.687749             3.837753
Const-1           -0.12379                0.678549             -0.18244
Const-2           -0.67122                0.71941              -0.93301
Const-3           -0.68723                0.715046             -0.96111
Baseline          n/a                     n/a
Table 5: Coefficient Estimates

From the coefficient estimates of the choice model shown in Table 5, it can be inferred that the Price, Energy
Loss, Warranty, Ease of Install, Problem Solver and Quality attributes are significant (t-statistics of about 3 or
more) and positively associated with the likelihood of choice. In short, the better an alternative is rated on
these attributes, the more likely it is to be chosen by customers. ABB was already perceived to be good at the
Problem Solver and Warranty attributes, as observed before. ABB should take the following steps to increase the
likelihood of purchase:

• Reduction in price
• Reduction in energy loss
• Increasing the ease of installation
• Increasing the overall quality of its products
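
The significance judgement behind these recommendations can be reproduced from Table 5. The sketch below recomputes each t-statistic as the coefficient divided by its standard deviation and flags attributes whose absolute t-statistic exceeds 1.96. The 1.96 cut-off (a two-sided 5% test under a normal approximation) is our assumption; the case output does not state which threshold was used.

```python
# (coefficient estimate, standard deviation) pairs copied from Table 5.
estimates = {
    "Price": (2.180581, 0.586578),
    "Energy Loss": (2.655609, 0.673706),
    "Maintenance": (0.593692, 0.437028),
    "Warranty": (1.140702, 0.330995),
    "Spare Parts": (-0.13262, 0.21757),
    "Ease of Install": (0.520023, 0.172875),
    "Prob Solver": (2.03218, 0.549676),
    "Quality": (2.639412, 0.687749),
}

CRITICAL_VALUE = 1.96  # ASSUMPTION: two-sided 5% level, normal approximation

t_statistics = {name: coef / sd for name, (coef, sd) in estimates.items()}
significant = [name for name, t in t_statistics.items() if abs(t) > CRITICAL_VALUE]

for name, t in t_statistics.items():
    flag = "significant" if name in significant else "not significant"
    print(f"{name:<16} t = {t:6.3f}  ({flag})")
```

Under this assumption Maintenance and Spare Parts are the only attributes that fall below the threshold, which matches the list of significant attributes given in the text.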

The loyalty segmentation above yielded four segments: loyal, competitive, switchable and lost. Customers in the
loyal segment are likely to choose ABB as their supplier repeatedly, so marketing efforts directed at this segment
will not result in any significant gain. ABB has to focus its marketing efforts on the lost segment in addition to
the switchable and competitive segments. To achieve significant gains from these efforts, ABB has to design its
advertising campaign so that the important attributes influencing the likelihood of purchase (mentioned above)
are highlighted. If its working capital allows, ABB can also focus on improving the performance and efficiency of
its products in order to see significant gains in the average ratings of the Quality and Energy Loss attributes.
ABB’s products should be positioned so that they appeal to customers who are looking for ease of installation and
quality at an affordable price.

Customer  Ann. Purchase Volume ($k)  District  Firm Chosen
35 $14,798 2 Edison
43 $12,514 2 Westinghouse
44 $10,997 2 ABB
66 $9,793 3 GE
32 $6,270 1 Westinghouse
11 $1,722 3 ABB
84 $1,404 3 GE
17 $1,364 3 GE
74 $1,219 1 GE
20 $1,009 2 GE
Table 6: Customer Descriptor Data in Descending Order of Annual Purchase Volume

Based on Table 6, ABB should target the top 10 customers by annual purchase volume. By focusing its marketing
efforts on these high-value prospects, ABB stands to gain a considerable amount in annual purchase volume. By
understanding what influenced these customers' purchasing decisions, ABB can work out which attributes to focus on
in order to either retain these customers or make them switch over from a competitor. This data is, however, not
very informative, as it says nothing about the needs of the customers or what they look for in a product.

As observed from Table 7, the majority of ABB’s loyal customers come from districts 1 and 2, which creates an
opportunity for ABB to focus its marketing efforts on district 3. First, ABB needs to understand the key choice
drivers for the customers in district 3. Second, ABB needs to find out where it lags behind the competitors that
are being chosen by the customers in district 3.

Customer  Ann. Purchase Volume ($k)  District  Firm Chosen
24 $322 1 ABB
41 $444 1 ABB
42 $752 1 ABB
45 $415 1 ABB
54 $660 1 ABB
57 $736 1 ABB
69 $528 1 ABB
3 $643 2 ABB
13 $466 2 ABB
37 $767 2 ABB
38 $182 2 ABB
44 $10,997 2 ABB
87 $395 2 ABB
11 $1,722 3 ABB
21 $749 3 ABB
50 $584 3 ABB
58 $700 3 ABB
61 $462 3 ABB
Table 7: Loyal Customers

Customer  Ann. Purchase Volume ($k)  District  Firm Chosen
84 $1,404 3 GE
16 $894 3 GE
86 $480 3 GE
4 $562 3 Edison
Table 8: Competitive Customers

Customer  Ann. Purchase Volume ($k)  District  Firm Chosen
11 $1,722 3 ABB
37 $767 2 ABB
50 $584 3 ABB
61 $462 3 ABB
41 $444 1 ABB
36 $511 3 Westinghouse
26 $899 2 Edison
7 $664 3 Edison
Table 9: Switchable Customers

Customer  Ann. Purchase Volume ($k)  District  Firm Chosen
1 $761 1 GE
6 $233 1 GE
10 $844 1 GE
15 $696 2 GE
17 $1,364 3 GE
19 $733 1 GE
20 $1,009 2 GE
22 $518 2 GE
30 $672 1 GE
47 $956 2 GE
60 $464 1 GE
64 $438 1 GE
66 $9,793 3 GE
68 $37 1 GE
70 $686 1 GE
73 $396 3 GE
74 $1,219 1 GE
81 $618 2 GE
82 $307 1 GE
88 $207 2 GE
5 $469 3 Westinghouse
14 $211 1 Westinghouse
18 $408 3 Westinghouse
25 $800 2 Westinghouse
31 $37 2 Westinghouse
32 $6,270 1 Westinghouse
39 $796 1 Westinghouse
40 $808 1 Westinghouse
43 $12,514 2 Westinghouse
46 $251 2 Westinghouse
48 $335 2 Westinghouse
49 $37 3 Westinghouse
51 $777 1 Westinghouse
52 $787 2 Westinghouse
53 $989 2 Westinghouse
56 $216 2 Westinghouse
59 $65 2 Westinghouse
62 $289 2 Westinghouse
63 $198 2 Westinghouse
71 $301 3 Westinghouse
76 $760 2 Westinghouse
77 $777 2 Westinghouse
79 $199 3 Westinghouse
80 $547 1 Westinghouse
83 $629 3 Westinghouse
2 $627 1 Edison
8 $767 3 Edison
9 $467 1 Edison
12 $928 1 Edison
23 $871 2 Edison
27 $871 1 Edison
28 $855 3 Edison
29 $290 2 Edison
33 $890 3 Edison
34 $355 2 Edison
35 $14,798 2 Edison
55 $728 2 Edison
65 $504 1 Edison
67 $676 3 Edison
72 $245 3 Edison
75 $267 1 Edison
78 $226 2 Edison
85 $35 1 Edison
Table 10: Lost Customers

Annual Purchase Volume ($ K)
Current sales productivity for ABB from all segments $21,524
Current sales from loyal segment $17,545
Current sales from competitive segment $0
Current sales from switchable segment $3,979
Sales lost to competitors $75,821
Table 11: ABB’s Sales Productivity

Assuming sales volumes remain the same over the years, if the switchable and competitive segments were captured
entirely, the potential sales productivity would be $9,393,000 above the sales achieved from the loyal segment
($3,340,000 from competitive and $6,053,000 from switchable, as shown in Tables 12 and 13). This would be an
increase of 53.5% over the sales from the loyal segment, for a total of $26,938,000, which is about 25.2% above the
current year’s sales productivity.
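
The arithmetic behind these figures can be checked with a few lines. The sketch below takes the segment totals from Table 11 and the customer volumes from Tables 12 and 13 (all in $k) and recomputes the uplift and the two percentage gains; the variable names are ours.

```python
# Annual purchase volumes in $k, copied from Tables 11, 12 and 13.
loyal_sales = 17_545      # current sales from the loyal segment
current_sales = 21_524    # current ABB sales from all segments

competitive_customers = {84: 1_404, 16: 894, 86: 480, 4: 562}
switchable_customers = {11: 1_722, 37: 767, 50: 584, 61: 462,
                        41: 444, 36: 511, 26: 899, 7: 664}

competitive_total = sum(competitive_customers.values())
switchable_total = sum(switchable_customers.values())

# Volume added on top of the loyal segment if both segments are fully captured.
uplift = competitive_total + switchable_total
potential_total = loyal_sales + uplift

gain_over_loyal = uplift / loyal_sales
gain_over_current = potential_total / current_sales - 1

print(f"Competitive segment: ${competitive_total:,}k")
print(f"Switchable segment:  ${switchable_total:,}k")
print(f"Potential total:     ${potential_total:,}k")
print(f"Gain over loyal-segment sales: {gain_over_loyal:.1%}")
print(f"Gain over current sales:       {gain_over_current:.1%}")
```

Note that the switchable total includes customers who already buy from ABB, which is why the gain over current sales is much smaller than the gain over loyal-segment sales.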

Customer  Ann. Purchase Volume ($k)  District  Firm Chosen
84 $1,404 3 GE
16 $894 3 GE
86 $480 3 GE
4 $562 3 Edison
Total $3,340
Table 12: Total Sale in Competitive Segment

Customer  Ann. Purchase Volume ($k)  District  Firm Chosen
11 $1,722 3 ABB
37 $767 2 ABB
50 $584 3 ABB
61 $462 3 ABB
41 $444 1 ABB
36 $511 3 Westinghouse
26 $899 2 Edison
7 $664 3 Edison
Total $6,053
Table 13: Total Sale in Switchable Segment

The uses of the choice modelling approach applied to analyze this case are listed below:

• The choice modelling approach may be used to identify the perceived importance of attributes across
different alternatives.
• It can be used to improve certain aspects of a product by looking into the average ratings of different
attributes for other alternatives.
• It outputs the probability of each data point (customer) belonging to a particular segment, which is more
informative than an upfront binary classification.

Despite being very useful, there are certain limitations of this approach which are described as follows,

• This modelling approach cannot be used to segment customers effectively. Determining threshold
probabilities for segmentation is largely left to the discretion of the individual performing the analysis.
For example, the loyal and lost segments can be identified easily, because the relevant choice
probabilities are close to or well above 75% and clearly exceed the probabilities of the other
alternatives. However, without information on the statistical significance of the probabilities, the
distinction between the competitive and switchable segments is hazy.

Case 2: Infiniti G20 (Positioning)

[Perceptual map. Axes: dimension I (54.6%) and dimension II (18.8%). Brands: Audi, Ford, Saab, Mercury, BMW,
Eagle, G20, Pontiac, Honda, Toyota. Segments: S1, S2, S3. Attributes: Unreliable, Roomy, Poor Value, Poorly Built,
Quiet, Economical, Overall, Prestige, Common, Interesting, Successful, Avant Garde, Attractive, Uncomfortable,
Easy Service, Sporty.]

Figure 1: 2D Positioning Map

As is visible from Figure 1, this market has the perception that the Infiniti G20 is attractive, prestigious and
portrays success. According to the perceptual data, these attributes differentiate the Infiniti G20 from its
competitors.

Figure 2: 3D Positioning Map

[Bar chart of variance explained by dimension. Dimensions 1 to 10: 0.546, 0.188, 0.112, 0.073, 0.025, 0.023, 0.019,
0.010, 0.004, 0.000.]

Figure 3: Variance Explained

As is evident from Figure 3, the three dimensions chosen for the perceptual mapping explain 0.546, 0.188 and 0.112
of the variance respectively. Dimension 1 therefore explains roughly three times as much variance as dimension 2
and is by far the most important in explaining the perceptions of the respondents. Dimensions 2 and 3 contribute
much less, given their low variance-explained values.
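
The comparison between dimensions can be made explicit with a cumulative sum. The sketch below uses the ten values read from Figure 3 and reports how much variance the first three dimensions retain together and how dimension 1 compares with dimension 2. It is only a check on the figures quoted above.

```python
from itertools import accumulate

# Variance explained by dimensions 1 to 10, as read from Figure 3.
variance_explained = [0.546, 0.188, 0.112, 0.073, 0.025,
                      0.023, 0.019, 0.010, 0.004, 0.000]

cumulative = list(accumulate(variance_explained))
retained_by_three = cumulative[2]                       # dimensions 1 to 3 together
ratio_dim1_to_dim2 = variance_explained[0] / variance_explained[1]

for dimension, (share, total) in enumerate(zip(variance_explained, cumulative), start=1):
    print(f"Dimension {dimension:>2}: {share:.3f} (cumulative {total:.3f})")
print(f"Dimension 1 explains {ratio_dim1_to_dim2:.1f} times as much as dimension 2")
```

Seen this way, the three-dimensional map keeps most of the information in the perceptual data, even though the first dimension dominates.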

The claim that the Infiniti G20 is an affordable BMW is not very credible in this market, judging by the perception
of the respondents. The respondents perceive the BMW as quieter and more prestigious than the G20.

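[Perceptual map restricted to the major competitors. Axes: dimension I (54.6%) and dimension II (18.8%). Brands:
Saab, BMW, G20, Honda. Segments: S1, S2, S3. Attributes: Roomy, Quiet, Economical, Prestige, Successful,
Attractive.]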

Figure 4: Major Competitors

From Figure 4, it is observed that BMW, Honda and Saab are the major competitors of the Infiniti G20. The
attributes that are common across these brands are Quiet, Economical, Successful, Attractive and Prestige. Saab
and BMW are favored by the S1 segment and Honda is favored by the S3 segment. The Infiniti G20 is positioned
between the two segments, S1 and S3, but the perpendicular distance between S3 and G20 is smaller than the
perpendicular distance between S1 and G20. This indicates that the Infiniti G20 is favored more by the S3 segment.

The S3 segment is American Dreamers and is made up of people in the age group 25-35 who are from various
ethnic backgrounds (majorly Asian and White), are white-collar officials and their average household income is
$59,000. This segment makes up 30% of the total number of respondents taken into consideration.

As observed in Figure 4, the most important attributes for the three segments are as follows:

• Segment S1: Prestige is the most important attribute for this segment as the vector of the attribute is the
closest to the vector of S1.
• Segment S2: Roomy is the most important attribute for this segment as the vector of the attribute is the
closest to the vector of S2.
• Segment S3: Attractive is the most important attribute for this segment as the vector of the attribute is
the closest to the vector of S3.

The Infiniti G20 can be marketed to both segments S2 and S3, as the attributes associated with these segments
explain most of the variance. As the Infiniti has not had much success being positioned as an affordable BMW, it
can be positioned closer to segment S3, with more emphasis on attractiveness and the portrayal of success. Since
segment S3 is made up of American Dreamers, the depiction of a successful life and attractiveness fits this
segment well. Also, segment S3 is the only segment that includes people of Asian ethnicity, and Infiniti, being a
Japanese brand, may appeal more to these people.

The advantages of the perceptual mapping Excel add-in are as follows,

• Gives a better understanding of the market segments and how the preferences of the respondents are
mapped.
• Helps in identifying how brands are perceived by the market in a real-time scenario.
• Indicates how marketing efforts like advertisement campaigns and promotional offers affect the brand
perception.
• Provides credible evidence to confirm if the brand positioning aligns with the consumer perception.
• Identifies gaps in the market so that new products can be developed accordingly.
• Spots consumer trends with changing times.
• Can be used as a tool to keep track of competitive response in the market.

The disadvantages of the perceptual mapping Excel add-in are as follows,

• Works effectively for low-contribution product purchases.
• Provides a better picture for individual brands rather than brand portfolios.
• The data obtained is hard to quantify and analyze further.

Case 3: Bookbinders Book Club
From the RFM model, the most important parameters that influence customers to order “The Art History of
Florence” are the total money spent on BBBC books, the total number of purchases in the chosen period and the
months since last purchase.

Both the linear regression model and the binary logit model indicate that the total money spent on BBBC books,
the months since last purchase and the number of art books purchased are the factors that most influenced
customers to buy “The Art History of Florence”.

Descriptive Statistics

Variable           Mean       Std. Deviation   N
Choice (0/1)       .2500      .43315           1600
Gender             .6587      .47428           1600
Amount purchased   200.9156   95.30090         1600
Frequency          12.3138    7.84140          1600
Last purchase      3.1988     3.02675          1600
First purchase     22.5763    16.23315         1600
P_Child            .7394      1.05873          1600
P_Youth            .3375      .62757           1600
P_Cook             .7600      1.04011          1600
P_DIY              .3913      .67986           1600
P_Art              .4250      .73465           1600

R       R Square   Adjusted R Square   Std. Error of the Estimate   R Square Change   F Change   df1   df2    Sig. F Change
.490a   .240       .235                .37878                       .240              50.196     10    1589   <.001
(R Square Change, F Change, df1, df2 and Sig. F Change are the change statistics.)

Coefficients (a)
(B and Std. Error are unstandardized coefficients, Beta is the standardized coefficient, Tolerance and VIF are
collinearity statistics.)

Variable           B       Std. Error   Beta    t        Sig.    Tolerance   VIF
(Constant)         .364    .031                 11.848   <.001
Gender             -.131   .020         -.143   -6.536   <.001   .994        1.006
Amount purchased   .000    .000         .060    2.464    .014    .801        1.248
Frequency          -.009   .002         -.165   -4.170   <.001   .307        3.254
Last purchase      .097    .014         .678    7.156    <.001   .053        18.770
First purchase     -.002   .002         -.075   -1.103   .270    .103        9.685
P_Child            -.126   .016         -.309   -7.698   <.001   .298        3.360
P_Youth            -.096   .020         -.140   -4.792   <.001   .563        1.775
P_Cook             -.141   .017         -.340   -8.520   <.001   .301        3.325
P_DIY              -.135   .020         -.212   -6.834   <.001   .496        2.017
P_Art              .118    .019         .200    6.061    <.001   .440        2.274
a. Dependent Variable: Choice (0/1)

From the linear regression model, based on the beta values, it is inferred that the money spent on BBBC books,
the months since last purchase and the number of art books purchased are the most important factors that
influenced the customer to order “The Art History of Florence”.
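
When reading the beta values, it is worth checking the collinearity statistics in the same table, since strongly correlated predictors make individual coefficients less stable. The sketch below recomputes the VIF as 1 / Tolerance and flags predictors above 10. The cut-off of 10 is a common rule of thumb that we assume here; it is not stated in the output, and small differences from the reported VIF values come from the rounding of the tolerances.

```python
# Tolerance values copied from the Coefficients table.
tolerance = {
    "Gender": 0.994,
    "Amount purchased": 0.801,
    "Frequency": 0.307,
    "Last purchase": 0.053,
    "First purchase": 0.103,
    "P_Child": 0.298,
    "P_Youth": 0.563,
    "P_Cook": 0.301,
    "P_DIY": 0.496,
    "P_Art": 0.440,
}

VIF_THRESHOLD = 10.0  # ASSUMPTION: common rule-of-thumb cut-off

# The variance inflation factor is the reciprocal of the tolerance.
vif = {name: 1.0 / tol for name, tol in tolerance.items()}
flagged = [name for name, value in vif.items() if value > VIF_THRESHOLD]

for name, value in sorted(vif.items(), key=lambda item: item[1], reverse=True):
    note = "  <- above threshold" if name in flagged else ""
    print(f"{name:<17} VIF = {value:6.2f}{note}")
```

Under this rule of thumb, the coefficient for months since last purchase should be interpreted with some caution, and First purchase sits just below the threshold.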

Binary Logit Model


Hosmer and Lemeshow Test

Step Chi-square df Sig.
1 3.061 8 .930

The p-value of .930 is well above .05, so the test finds no evidence of poor fit; this indicates that the model
fits the data well.

Classification Table (a), Step 1, Choice (0/1)

                     Predicted No   Predicted Yes   Percentage Correct
Observed No          1120           80              93.3
Observed Yes         240            160             40.0
Overall Percentage                                  80.0
a. The cut value is .500

A Classification Table or Confusion Matrix describes the predicted number of successes compared with the
number of successes actually observed. Similarly, it compares the predicted number of failures with the number
actually observed.

From the classification table, it can be inferred that the model gives an accurate prediction 80% of the time.
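
The overall accuracy hides a large difference between the two classes. The sketch below recomputes the percentages from the four counts in the classification table and adds the share of buyers correctly identified (sensitivity) and the accuracy of a naive rule that predicts "No" for everyone. The naive baseline is our addition for comparison and does not appear in the SPSS output.

```python
# Counts copied from the classification table (cut value .500).
true_negative = 1120   # observed No,  predicted No
false_positive = 80    # observed No,  predicted Yes
false_negative = 240   # observed Yes, predicted No
true_positive = 160    # observed Yes, predicted Yes

total = true_negative + false_positive + false_negative + true_positive

accuracy = (true_negative + true_positive) / total
specificity = true_negative / (true_negative + false_positive)   # non-buyers identified
sensitivity = true_positive / (true_positive + false_negative)   # buyers identified
precision = true_positive / (true_positive + false_positive)     # predicted buyers who bought

# Baseline: always predict the majority class ("No").
baseline_accuracy = (true_negative + false_positive) / total

print(f"Accuracy:    {accuracy:.1%}")
print(f"Specificity: {specificity:.1%}")
print(f"Sensitivity: {sensitivity:.1%}")
print(f"Precision:   {precision:.1%}")
print(f"Baseline (always 'No'): {baseline_accuracy:.1%}")
```

For a mailing campaign, the sensitivity and precision are arguably more relevant than the overall accuracy, because the aim is to find the customers who will buy.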

Variables in the Equation, Step 1 (a)
(Lower and Upper give the 95% C.I. for EXP(B).)

Variable           B        S.E.   Wald     df   Sig.    Exp(B)   Lower   Upper
Gender(1)          .863     .137   39.443   1    <.001   2.371    1.811   3.104
Amount purchased   .002     .001   5.542    1    .019    1.002    1.000   1.003
Frequency          -.076    .017   20.709   1    <.001   .927     .898    .958
Last purchase      .612     .094   42.526   1    <.001   1.844    1.534   2.216
First purchase     -.015    .013   1.333    1    .248    .985     .961    1.010
P_Child            -.811    .117   48.319   1    <.001   .444     .353    .558
P_Youth            -.637    .143   19.741   1    <.001   .529     .399    .700
P_Cook             -.923    .119   59.677   1    <.001   .397     .314    .502
P_DIY              -.906    .144   39.738   1    <.001   .404     .305    .536
P_Art              .686     .127   29.178   1    <.001   1.986    1.548   2.547
Constant           -1.215   .208   34.175   1    <.001   .297
a. Variable(s) entered on step 1: Gender, Amount purchased, Frequency, Last purchase, First purchase, P_Child,
P_Youth, P_Cook, P_DIY, P_Art.

Based on the logit regression, the number of art books purchased, the months since last purchase and gender
influenced the customer to order “The Art History of Florence”.
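
The Exp(B) column can be read as an odds ratio: the factor by which the odds of ordering change for a one-unit increase in the predictor. The sketch below recomputes exp(B) from the B column and shows how the coefficients combine into a purchase probability through the logistic function. The example customer is hypothetical, and because the coefficients are rounded to three decimals the resulting probability is only indicative.

```python
import math

# B coefficients and reported Exp(B) values, copied from "Variables in the Equation".
coefficients = {
    "Gender(1)": 0.863, "Amount purchased": 0.002, "Frequency": -0.076,
    "Last purchase": 0.612, "First purchase": -0.015, "P_Child": -0.811,
    "P_Youth": -0.637, "P_Cook": -0.923, "P_DIY": -0.906, "P_Art": 0.686,
}
reported_exp_b = {
    "Gender(1)": 2.371, "Amount purchased": 1.002, "Frequency": 0.927,
    "Last purchase": 1.844, "First purchase": 0.985, "P_Child": 0.444,
    "P_Youth": 0.529, "P_Cook": 0.397, "P_DIY": 0.404, "P_Art": 1.986,
}
intercept = -1.215

# Odds ratio for each predictor: exp(B).
odds_ratios = {name: math.exp(b) for name, b in coefficients.items()}


def purchase_probability(customer):
    """Logistic function applied to the linear predictor for one customer."""
    z = intercept + sum(coefficients[name] * value for name, value in customer.items())
    return 1.0 / (1.0 + math.exp(-z))


# HYPOTHETICAL customer, roughly at the sample means, with one art book bought.
example_customer = {
    "Gender(1)": 1, "Amount purchased": 200, "Frequency": 12,
    "Last purchase": 3, "First purchase": 22,
    "P_Child": 0, "P_Youth": 0, "P_Cook": 0, "P_DIY": 0, "P_Art": 1,
}
probability = purchase_probability(example_customer)

for name, ratio in odds_ratios.items():
    print(f"{name:<17} exp(B) = {ratio:.3f} (reported {reported_exp_b[name]:.3f})")
print(f"Estimated purchase probability for the example customer: {probability:.2f}")
```

For instance, each additional art book purchased roughly doubles the odds of ordering, while each additional cookbook cuts the odds by more than half.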

RFM Model

The heat map of mean monetary distribution shows the average monetary value for categories defined by recency
and frequency scores. Darker areas indicate a higher average monetary value. In other words, customers with
recency and frequency scores in the darker areas tend to spend more on average than those with recency and
frequency scores in the lighter areas.

From the RFM model, customers who have purchased from BBBC within the last 3 to 6 months, bought 12 books and
spent $200 on average are more likely to purchase “The Art History of Florence”.

Those who have purchased an art book within the last 3 months should be targeted for the mail campaign in the
Midwest.

Case 1: Mailing all 50,000 customers

No of Mails                      50000
Cost per mail                    0.65
Total Mailing Cost               32,500
Cost per Book
  Purchase & Mail                15
  Overhead @ 45%                 6.75
  Cost/Book                      21.75
Selling Price                    31.95
Profit per Book                  10.2
Historical Response Rate         9%
Expected Orders                  4500
Total Shipment & Overhead cost   97,875
Total Campaign Cost              130,375
Revenue                          143,775
Net Profit                       13,400

Case 2: Mailing only the 4,500 targeted customers

No of Mails                      4500
Cost per mail                    0.65
Total Mailing Cost               2,925
Purchase & Mail                  15
Overhead @ 45%                   6.75
Cost/Book                        21.75
Total Shipment & Overhead cost   97,875
Total Campaign Cost              100,800
Selling Price                    31.95
Revenue                          143,775
Net Profit                       42,975
Increase in Profit               29,575

Thus, by targeting the campaign at the 9% of customers who are expected to order, the company can generate $29,575
more in profit. This comparison assumes that every targeted customer places an order.
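
The two mailing scenarios can be reproduced with one small function. The sketch below uses the unit costs and price from the tables above; the function and parameter names are ours. As in the tables, the targeted scenario assumes that all 4,500 mailed customers order, which is an optimistic upper bound rather than a forecast.

```python
def campaign_profit(n_mailed, n_orders, mail_cost=0.65, book_cost=15.0,
                    overhead_rate=0.45, price=31.95):
    """Net profit of a mailing: revenue minus mailing cost and per-book cost."""
    mailing_cost = n_mailed * mail_cost
    cost_per_book = book_cost * (1 + overhead_rate)   # purchase, mailing and overhead
    fulfilment_cost = n_orders * cost_per_book
    revenue = n_orders * price
    return revenue - mailing_cost - fulfilment_cost


customers = 50_000
response_rate = 0.09
expected_orders = round(customers * response_rate)

# Scenario 1: mail everyone and receive orders at the historical response rate.
mass_profit = campaign_profit(customers, expected_orders)

# Scenario 2: mail only the expected buyers (ASSUMES every one of them orders).
targeted_profit = campaign_profit(expected_orders, expected_orders)

print(f"Mass mailing profit:     ${mass_profit:,.0f}")
print(f"Targeted mailing profit: ${targeted_profit:,.0f}")
print(f"Increase in profit:      ${targeted_profit - mass_profit:,.0f}")
```

Passing a lower `n_orders` for the targeted scenario shows how quickly the advantage shrinks when the model identifies buyers less than perfectly.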

RFM Model: The RFM model looks at how recently customers purchased (recency), how often they purchase (frequency)
and how much they spend (monetary value). It analyzes customer behavior and helps define market segments and
build customer relations and loyalty.

Advantages

• RFM can be applied to different types of businesses
• It can lead to reduced marketing costs by helping you target the right customers
• The data can be a good source for loyalty programs
• It can be combined with other analytics tools for better insights
• It can help you to identify your best customers

Disadvantages

• Calculating RFM scores can be challenging
• RFM analysis depends on historic data, and not future prospects
• This analysis might not be suitable if you only sell one product

Regression Analysis: Regression analysis is used to find the relationship between independent variables and a
dependent variable. It helps identify the factors that most influence the dependent variable. A major limitation
of regression is that it does not by itself indicate which variables should be used as predictors.

Advantages

• Regression models are easy to understand as they are built upon basic statistical principles, such as
correlation and least-square error.
• The output of regression models is an algebraic equation that is easy to understand and use to predict.
• The strength (or the goodness of fit) of the regression model is measured in terms of the correlation
coefficients, and other related statistical parameters that are well understood.
• The predictive power of regression models matches with other predictive models and sometimes
performs better than the competitive models.
• Regression models can include all the variables that one wants to include in the model.

Disadvantages

• Regression models cannot work properly if the input data has errors (that is, poor-quality data). If the
data pre-processing does not deal adequately with missing values, redundant data, outliers or an
imbalanced data distribution, the validity of the regression model suffers.
• Regression models are susceptible to collinearity problems (that is, a strong linear correlation
between the independent variables). If the independent variables are strongly correlated, they will
eat into each other’s predictive power and the regression coefficients will lose their robustness.
• As the number of variables increases, the reliability of the regression models decreases. Regression
models work better with a small number of variables.
• Regression models do not automatically take care of nonlinearity. The user needs to work out which
additional terms might need to be added to the regression model to improve its fit.
• Regression models require numeric inputs, so categorical variables must first be encoded (for example, as
dummy variables).

Binary Logit Model: The binary logit model uses actual customer choices to infer the customer’s value. It
provides explicit measures that help companies understand the factors that contribute to customer value.
However, it is difficult to make statistical estimates for each individual customer with this model.

Advantages

• Logistic regression is easier to implement, interpret, and very efficient to train.
• It makes no assumptions about distributions of classes in feature space.
• It is very fast at classifying unknown records.
• It not only provides a measure of how appropriate a predictor (coefficient size) is, but also its direction
of association (positive or negative).
• It can easily extend to multiple classes (multinomial regression) and a natural probabilistic view of
class predictions.

Disadvantages

• If the number of observations is smaller than the number of features, logistic regression should not be
used, as it may lead to overfitting.
• A major limitation of logistic regression is the assumption of linearity between the log-odds of the
dependent variable and the independent variables.
• Non-linear problems cannot be solved with logistic regression because it has a linear decision surface.
Linearly separable data is rarely found in real-world scenarios.
• Logistic regression requires little or no multicollinearity between the independent variables.

As the company is only beginning to focus on predictive technologies, the RFM model will suit it best, as it is
easy to understand and provides some of the analysis required to increase campaign response rates.

Case 4: ConneCtor PDA
The data were standardized before running the segmentation, since Monthly and Price are on a different scale
from the other variables. Hierarchical clustering produced the dendrogram and the segmentation variables given
below.

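[Dendrogram. Vertical axis: Distance, with merges at 3.13, 1.45, 1.11, .34, .28, .27, .26 and .25. Horizontal axis:
Cluster ID, with leaves in the order 1, 8, 6, 4, 9, 2, 5, 7, 3.]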
Figure 1: Dendrogram

The cluster size as obtained is given below

Size / Cluster           Overall   Cluster 1   Cluster 2   Cluster 3   Cluster 4   Cluster 5   Cluster 6   Cluster 7   Cluster 8   Cluster 9
Number of observations   160       14          11          16          15          21          16          19          26          22
Proportion               1         0.087       0.069       0.1         0.094       0.131       0.1         0.119       0.162       0.138
Table 1: Cluster Sizes

Segmentation variable Overall Cluster 1 Cluster 2 Cluster 3 Cluster 4 Cluster 5 Cluster 6 Cluster 7 Cluster 8 Cluster 9
Innovator 3.47 3.43 2.45 2.19 5.13 2.1 4.06 2.79 3.65 5.09
Use Message 4.21 3.57 6.27 3.19 4.07 5.33 5.12 5.58 2.85 3.09
Use Cell 5.56 5.86 5.27 4.31 6.07 6.43 6.37 4.42 5.5 5.68
Use PIM 4.01 6.5 2.64 3.06 3.27 2.52 5.37 1.95 5.88 4.27
Inf Passive 4.45 4.57 5.45 6.12 3.4 3.67 5.87 3.21 4.73 3.82
Inf Active 4.5 6 4.45 6.25 3.8 3.24 4.5 4.32 5.19 3.32
Remote Access 3.99 4.07 4.27 5.31 2.27 5.24 3.81 5.26 3.77 2.09
Share Inf 3.71 1.93 3.64 6.12 2.53 3.52 3.94 4 3.85 3.55
Monitor 4.79 5.57 4.82 5 3.2 5.76 3.94 5.74 3.81 5.27
Email 4.72 6.71 4.82 2.87 5.67 3 6.25 2.79 5.38 5.55
Web 4.47 5.14 3.27 1.44 5.53 3.14 6.06 2.79 5.69 6.27
Media 4.01 5.5 3 1.94 5.8 2.24 5 2.37 5.15 4.91
Ergonomic 4.63 3.79 6 5.5 5.93 3.95 4.56 3.32 3.65 5.95
Monthly 28.8 25 27.7 45.3 38.3 22.9 23.7 26.6 25 28.6
Price 332 254 283 488 421 250 256 292 330 403
Table 2: Segmentation Variables

Observation  With 2 clusters  With 3 clusters  With 4 clusters  With 5 clusters  With 6 clusters  With 7 clusters  With 8 clusters  With 9 clusters
1 1 1 1 1 1 1 1 1
2 1 1 1 1 1 1 8 8
3 1 1 1 1 1 1 8 8
4 1 1 1 1 1 1 8 8
5 1 1 1 1 1 1 8 8
6 1 1 1 1 1 1 8 8
7 1 1 1 1 6 6 6 6
8 1 1 1 1 1 1 8 8
9 1 1 1 1 1 1 8 8
Table 3: Cluster Members

The dendrogram can be used to determine the number of clusters to extract by looking for the largest jump along
its distance axis. The largest jump, 2.02, spans the distances from 1.11 to 3.13, which points to either 2 or 3
clusters. However, in the lower part of the dendrogram, clusters 4 and 9 are quite different from clusters 1, 8
and 6, so we decided to choose 4 clusters to avoid any loss of information. Re-running the analysis using K-Means
with four segments gives the results below.
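
The jump heuristic can be made explicit by listing the gap between consecutive merge distances. In the sketch below, the distances are the ones read from the dendrogram, and cutting the tree in the gap below the i-th largest merge leaves i + 1 clusters. This is a simplified illustration of the heuristic under that reading of the figure, not the output of the clustering software.

```python
# Merge distances read from the dendrogram (Figure 1), one per merge.
merge_distances = [3.13, 1.45, 1.11, 0.34, 0.28, 0.27, 0.26, 0.25]


def gaps_by_cluster_count(distances):
    """Map each candidate number of clusters to the size of the gap it is cut in."""
    ordered = sorted(distances, reverse=True)
    # Cutting between ordered[i] and ordered[i + 1] leaves i + 2 clusters.
    return {i + 2: round(ordered[i] - ordered[i + 1], 2)
            for i in range(len(ordered) - 1)}


gaps = gaps_by_cluster_count(merge_distances)
ranked = sorted(gaps.items(), key=lambda item: item[1], reverse=True)

for clusters, gap in ranked:
    print(f"{clusters} clusters: gap of {gap:.2f}")
```

On this reading, the largest single gap corresponds to two clusters and the next largest to four, which is consistent with the four-cluster solution chosen above.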

Size / Cluster Overall Cluster 1 Cluster 2 Cluster 3 Cluster 4


Number of observations 160 58 48 16 38
Proportion 1 0.362 0.3 0.1 0.237
Table 4: Cluster Sizes

Segmentation variable Overall Cluster 1 Cluster 2 Cluster 3 Cluster 4
Innovator 3.47 3.64 2.4 2.19 5.13
Use Message 4.21 3.78 5.58 3.19 3.55
Use Cell 5.56 5.81 5.42 4.31 5.87
Use PIM 4.01 5.83 2.21 3.06 3.89
Inf Passive 4.45 5.02 3.85 6.12 3.63
Inf Active 4.5 5.22 3.81 6.25 3.53
Remote Access 3.99 3.74 5.21 5.31 2.26
Share Inf 3.71 3.4 3.71 6.12 3.18
Monitor 4.79 4.29 5.62 5 4.42
Email 4.72 5.98 3.17 2.87 5.55
Web 4.47 5.5 3.04 1.44 6
Media 4.01 5.19 2.31 1.94 5.24
Ergonomic 4.63 4.03 4.06 5.5 5.89
Monthly 28.8 24.7 25.1 45.3 32.6
Price 332 290 269 488 411
Table 5: Segmentation Variables

From the K-Means results we obtained 4 clusters, which can be named as follows. Cluster 1 is identified as
Average users because they often use a cell phone and PIM, send and receive time-sensitive information, want
constant access to e-mail and permanent Web access, and use multimedia frequently. They seldom use remote access
to information or share information, and they are on a budget for both the monthly fee and the PDA device.

Cluster 2 is identified as Aged users because they are heavy users of messaging services, often use a cell phone
and prefer good ergonomics. They do not care about innovativeness, rarely use PIM, the Web, e-mail or media, and
are on a budget for both the monthly fee and the PDA device.

Cluster 3 is identified as Corporate users because they do not care about innovativeness, rarely use messaging
services, PIM, the Web, e-mail or media, and use a cell phone only occasionally. They often send and receive
information, access information remotely and share information. A communication device with good ergonomics is
important to them, and they can afford a high-priced PDA and monthly fee.

Cluster 4 is identified as Technology lovers because they often use a cell phone, consider constant access to
e-mail and the Web important, use multimedia frequently and want a communication device with good ergonomics.
They rarely use instant messaging or PIM, rarely send and receive information, access information remotely or
share information, and they can afford a high-priced PDA and monthly fee.

We would like to target all clusters by offering different versions of ConneCtor. All versions will use the same
operating system and will share common features such as monochrome models, PDA modems, a touch screen and
handwriting recognition software for writing text, as well as bundled software and basic audio capabilities. The
Basic ConneCtor is reasonably priced and will suit Clusters 1 and 2. For the other clusters, we will offer
varying memory sizes, high-quality colour screens and better audio capabilities. These versions will be more
expensive.

Discriminant analysis makes it possible to identify and understand the differences between clusters, and it also
provides the correlations between the descriptor variables and the discriminant functions. This helps us
understand customer needs and preferences for each cluster.

Discriminant variable Overall Cluster 1 Cluster 2 Cluster 3 Cluster 4
Away 4.206 4.31 4.979 5.375 2.579
Education 2.506 2.5 2.146 1.938 3.211
PDA 0.437 0.448 0.146 0.188 0.895
Income 66.894 62.81 59.646 52.438 88.368
Business Week 0.275 0.293 0.167 0 0.5
Field & Stream 0.125 0.034 0.25 0.312 0.026
Mgoumet 0.019 0 0 0 0.079
PC 0.981 1 1 0.812 1
Construction 0.081 0.052 0.062 0.312 0.053
Emergency 0.037 0.017 0.021 0.188 0.026
Cell 0.875 0.897 0.917 0.625 0.895
Computers 0.231 0.19 0.125 0.312 0.395
Sales 0.3 0.552 0.208 0.062 0.132
Age 40.006 43.724 36.083 42.188 38.368
Professional 0.162 0.069 0.167 0 0.368
Service 0.181 0.103 0.417 0.125 0.026
PC Magazine 0.244 0.155 0.292 0.25 0.316
Table 6: Discriminant Variables
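
As a quick consistency check on Table 6, the Overall column should equal the size-weighted average of the cluster means. The sketch below assumes the cluster sizes are the row totals of the confusion matrix in Table 8 (58, 48, 16 and 38 respondents); the names `cluster_sizes`, `cluster_means` and `weighted_mean` are illustrative and not part of the original analysis.

```python
# Cluster sizes: assumed to be the row totals of the confusion matrix (Table 8).
cluster_sizes = [58, 48, 16, 38]

# Cluster means for a few variables, copied from Table 6 (Clusters 1 to 4).
cluster_means = {
    "Away": [4.31, 4.979, 5.375, 2.579],
    "Education": [2.5, 2.146, 1.938, 3.211],
    "Income": [62.81, 59.646, 52.438, 88.368],
}


def weighted_mean(values, weights):
    """Average of `values`, weighting each entry by the matching cluster size."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Recompute the Overall column to three decimals, as reported in Table 6.
overall = {
    name: round(weighted_mean(means, cluster_sizes), 3)
    for name, means in cluster_means.items()
}

for name, value in overall.items():
    print(f"{name}: {value}")
```

For these three variables the recomputed values agree with the Overall column (4.206, 2.506 and 66.894), which suggests the cluster sizes and the cluster means in the two tables are consistent with each other.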

Discriminant variable Function 1 Function 2 Function 3
Away -0.714 -0.094 -0.114
Education 0.701 0.054 0.044
PDA 0.681 0.188 -0.06
Income 0.618 0.087 0.285
Business Week 0.41 -0.074 0
Field & Stream -0.355 0.102 0.355
Mgoumet 0.262 0.117 0.186
PC 0.245 -0.57 0.045
Construction -0.179 0.38 -0.006
Emergency -0.143 0.372 -0.002
Cell 0.125 -0.362 0.061
Computers 0.213 0.274 0.054
Sales 0.001 -0.348 -0.706
Age 0.011 0.081 -0.456
Professional 0.319 -0.012 0.424
Service -0.344 -0.316 0.421
PC Magazine 0.034 0.049 0.302
Variance explained 50.55 29.88 19.57
Cumulative variance explained 50.55 80.43 100
Significance level 0 0 0
Table 7: Discriminant Function
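
One way to read Table 7 is to rank the variables by the absolute size of their correlation with each discriminant function. The following is a minimal sketch of that ranking using the values from the table; the helper `top_variables` and the choice of the top three variables are assumptions made for illustration.

```python
# Correlations with (Function 1, Function 2, Function 3), copied from Table 7.
loadings = {
    "Away": (-0.714, -0.094, -0.114),
    "Education": (0.701, 0.054, 0.044),
    "PDA": (0.681, 0.188, -0.06),
    "Income": (0.618, 0.087, 0.285),
    "Business Week": (0.41, -0.074, 0.0),
    "Field & Stream": (-0.355, 0.102, 0.355),
    "Mgoumet": (0.262, 0.117, 0.186),
    "PC": (0.245, -0.57, 0.045),
    "Construction": (-0.179, 0.38, -0.006),
    "Emergency": (-0.143, 0.372, -0.002),
    "Cell": (0.125, -0.362, 0.061),
    "Computers": (0.213, 0.274, 0.054),
    "Sales": (0.001, -0.348, -0.706),
    "Age": (0.011, 0.081, -0.456),
    "Professional": (0.319, -0.012, 0.424),
    "Service": (-0.344, -0.316, 0.421),
    "PC Magazine": (0.034, 0.049, 0.302),
}


def top_variables(table, function_index, n=3):
    """Return the n variables with the largest absolute loading on one function."""
    ranked = sorted(table, key=lambda name: abs(table[name][function_index]), reverse=True)
    return ranked[:n]


for i in range(3):
    print(f"Function {i + 1}:", top_variables(loadings, i))
```

Under this reading, Function 1 is driven mainly by Away, Education and PDA, Function 2 by PC, Construction and Emergency, and Function 3 by Sales, Age and Professional.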

Actual / Predicted cluster Cluster 1 Cluster 2 Cluster 3 Cluster 4
Cluster 1 37 10 4 7
Cluster 2 10 33 5 0
Cluster 3 2 2 12 0
Cluster 4 3 0 0 35
Actual / Predicted cluster Cluster 1 Cluster 2 Cluster 3 Cluster 4
Cluster 1 63.80% 17.20% 06.90% 12.10%
Cluster 2 20.80% 68.80% 10.40% 00.00%
Cluster 3 12.50% 12.50% 75.00% 00.00%
Cluster 4 07.90% 00.00% 00.00% 92.10%
Hit Rate (percent of total cases correctly classified) 73.12%
Table 8: Confusion Matrix
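
The hit rate and the row percentages in Table 8 can be recomputed from the raw counts. The sketch below is only an illustration of that arithmetic; the variable names are assumptions, and rows are taken to be actual clusters and columns predicted clusters, as the table header indicates.

```python
# Counts from Table 8: rows are actual clusters, columns are predicted clusters.
confusion = [
    [37, 10, 4, 7],
    [10, 33, 5, 0],
    [2, 2, 12, 0],
    [3, 0, 0, 35],
]

# Total number of cases and number on the diagonal (correctly classified).
total = sum(sum(row) for row in confusion)
correct = sum(confusion[i][i] for i in range(len(confusion)))

# Hit rate: share of all cases that were assigned to their actual cluster.
hit_rate = 100 * correct / total

# Row percentages: how the members of each actual cluster were classified.
row_pct = [[round(100 * count / sum(row), 1) for count in row] for row in confusion]

print(f"Correct: {correct} of {total} ({hit_rate:.3f}%)")
for label, row in zip(range(1, 5), row_pct):
    print(f"Cluster {label}: {row}")
```

With these counts, 117 of 160 cases lie on the diagonal, giving 73.125%, which the table reports as 73.12%. Cluster 4 is classified most reliably (35 of 38 cases), while Cluster 1 is most often confused with Cluster 2.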

For Cluster 1 we would go with sales professionals. This cluster consists mainly of people in sales occupations
with some college education. They already own a PC and a cell phone, and some already have a PDA. They often
spend time away from the office and read Business Week. Their average age is 44, and they are quite price sensitive.
For Cluster 2 we would go with service professionals. This cluster has an average age of 36 years and prefers a
high-resolution display, but is quite price sensitive. Most of them do not have a PDA yet, but they own a PC and a
cell phone. They spend much of their time away from the office in remote locations. They read PC Magazine and
Field & Stream.
For Cluster 3 we would go with construction and emergency workers, who usually work away from the office. They
typically have a high-school education and, per Table 6, the lowest average education level and income of the four
clusters. Some of them own a PC and a cell phone, but they usually do not own a PDA. By the nature of their work,
they have high information needs and exchange information with colleagues in the field. Many read Field & Stream
and also PC Magazine. They are the least price sensitive.
For Cluster 4 we would go with professionals who are early adopters and have the highest income. Most of them own
a PC, a cell phone and a PDA. They read many magazines, especially Business Week and PC Magazine. Most are highly
paid and highly educated.

The analysis helped us understand the differences and similarities between customers, and thereby their needs. It
also helped us decide which clusters to focus on: Cluster 1 has a lower income, while Cluster 4 has a higher income
and is technology oriented, but it is a niche. After the analysis we will focus on all four clusters: Cluster 1
(Average users), Cluster 2 (Aged users), Cluster 3 (Corporate users) and Cluster 4 (Technology lovers).

There are a few concerns regarding the data collection approach, which focuses mainly on workers and corporate
users, although freelancers, students and housewives may be better potential customers. There is also an
inconsistency in Cluster 3, which is willing to pay the most for the monthly fee and the PDA but has the lowest
income. One possible explanation is that they do not pay for these devices and services themselves. If that is the
case, we should focus on company needs rather than personal needs.

Since this is a niche market with potential customers who have high purchasing power, the company could earn high
profits from them. It should therefore keep developing its PDA technology in order to become the market leader,
while not neglecting the other segments and the mass market, for which it should provide a standard PDA at an
affordable price. These are the main next steps it should take for its development.
