Clustering for marketing


• When and why is cluster analysis needed

• Taxonomy of clustering methods

• Cluster analysis operationalization

Marketing Analytics
Why Segment the Market?

• Deep customer understanding
• Customer value
• Branding and communication
• Product development

Segmentation Objectives

Segmentation can serve many different operational objectives, for example:

• To develop a new product or service
• To raise brand or new product awareness
• To facilitate new product acceptance
• To improve a current process
• To identify Customer Relationship Management programs
• …

Such objectives have to be properly defined:

• They must be consistent with the business needs at hand
• They guide the process to be followed

Segmentation Process
To start with a deep understanding of customers…

Qualitative research
• Primary sources: focus group
• Other sources: report; media; journals; …
• Purpose: understand different perspectives; collect relevant ideas; develop an initial understanding; support the development of a subsequent quantitative study.

Quantitative research
• Primary sources: data warehouse; big data
• Other sources: statistics; data warehouse; big data; …
• Purpose: identify final solutions; generalize results to a large population; support business decisions.

Segmentation Variables

Customers are different, for example in their psychographics.

Data sources:

• Survey
• Experiments (large scale)
• Data warehouse
• Big data
• …

Which of these variables are relevant for my product or market?

When and Why is Cluster Analysis Needed

Subdivide the records of a dataset into homogeneous groups of observations, called clusters, so that observations belonging to one group are similar to one another and dissimilar from observations included in other groups.

Examples: diagnostic clusters; biological taxonomy.

When and Why is Cluster Analysis Needed

Customer personas are embodiments of a company’s target segment.

A persona has its own needs, wants, beliefs, preferences, goals, motivations, and behavioral habits.

Select clustering solution

Which solution to choose?

• Statistically, the best solution can be suggested by an indicator of distinctiveness among the clusters

• For marketing applications, it is more important to consider:

• Whether the clusters provide relevant and actionable insights
• Whether these insights are consistent with your business
• Whether the solution satisfies the criteria of ‘good segmentation’
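
As an illustration of a statistical indicator of distinctiveness, the sketch below computes the share of total variance that lies between clusters (between-cluster sum of squares divided by total sum of squares). The slides do not name a specific indicator, so this choice, the function names, and the toy data are assumptions for illustration only.

```python
def mean_point(points):
    """Column-wise mean of a list of equally long numeric rows."""
    return [sum(col) / len(points) for col in zip(*points)]

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def between_share(points, labels):
    """Between-cluster sum of squares divided by total sum of squares (0 to 1)."""
    grand = mean_point(points)
    total_ss = sum(sq_dist(p, grand) for p in points)
    within_ss = 0.0
    for label in set(labels):
        members = [p for p, lab in zip(points, labels) if lab == label]
        centre = mean_point(members)
        within_ss += sum(sq_dist(p, centre) for p in members)
    return (total_ss - within_ss) / total_ss

# Hypothetical toy data: two tight groups of two observations each.
points = [[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.2, 4.8]]
good = between_share(points, [0, 0, 1, 1])  # labels follow the real groups
poor = between_share(points, [0, 1, 0, 1])  # labels cut across the groups
print(round(good, 4), round(poor, 4))
```

A value close to 1 suggests well-separated clusters. The marketing criteria listed above still decide which solution is actually used.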

Select clustering solution

From experience, some general considerations apply when selecting a clustering solution:

• A solution with too few clusters may not adequately capture the diversity (for example, the 2-cluster solution for music preference found a substantial difference only for classical music, while the preferences for all other types remained average)

• A solution with too many clusters becomes increasingly difficult to interpret and to manage when implementing eventual applications

Select clustering solution

This type of statistical analysis does not have a unique, absolutely correct solution.

Taxonomy of clustering methods

Clustering methods can be classified into a few main types based on the logic used for deriving the clusters:

• Partition methods
• Hierarchical methods
• Density-based methods
• Grid methods

Segmentation Methods

K-Means Cluster Analysis

• Clusters objects into homogeneous groups based on variables with interval measurements.
• For example: actual age/height/weight/income, Likert scale survey data, ticket size, annual spending, minutes spent online, etc.
• It classifies objects by computing the Euclidean distance among the objects. Variables with very different scales should be standardized so that the clustering is not dominated by the variable with the largest measurement range.
• The number of clusters has to be set before computation: several K-Means cluster analyses can be run with different numbers of clusters to choose the best solution.
• It is fast and easy to perform on a large dataset.
• It cannot classify subjects with missing values.
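
A minimal sketch of the K-Means procedure described above, in plain Python. The function name, the toy data, and the random choice of starting centres are assumptions for illustration; statistical packages use more refined initialisation and stopping rules.

```python
import math
import random

def k_means(points, k, iterations=100, seed=0):
    """Assign each point to the nearest centre, then move centres to cluster means."""
    rng = random.Random(seed)
    centres = [list(p) for p in rng.sample(points, k)]  # k observations as starting centres
    labels = None
    for _ in range(iterations):
        # Assignment step: nearest centre by Euclidean distance.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centres[c])) for p in points]
        if new_labels == labels:  # no observation changed cluster: converged
            break
        labels = new_labels
        # Update step: each centre becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centre
                centres[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centres

# Hypothetical toy data: the number of clusters (k=2) is imposed beforehand.
points = [[1.0, 1.0], [1.5, 2.0], [1.0, 1.5], [8.0, 8.0], [9.0, 8.5], [8.5, 9.0]]
labels, centres = k_means(points, k=2)
print(labels)
```

Every row must be complete: a missing value would make the distance undefined, which is why such subjects cannot be classified.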

Variables of different scale



ID   Age   Income
1    25    20K
2    30    28K
3    35    52K
4    38    43K
5    41    35K
6    45    60K
7    47    44K
8    51    39K
9    55    62K
10   58    40K

Without standardization, the numerical difference in age is negligible compared to the difference in income. A K-means analysis with the original measurement scale would practically be based on income level only.
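
The effect can be checked on the ten customers in the table. The sketch below uses z-score standardization (subtract the mean, divide by the standard deviation), which is one common choice and an assumption here, since the slide does not name a method. The helper names are hypothetical.

```python
from statistics import mean, pstdev

ages = [25, 30, 35, 38, 41, 45, 47, 51, 55, 58]
incomes = [20000, 28000, 52000, 43000, 35000, 60000, 44000, 39000, 62000, 40000]

def z_scores(values):
    """Rescale a variable to mean 0 and standard deviation 1."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]

def income_share(age_a, inc_a, age_b, inc_b):
    """Fraction of the squared Euclidean distance that comes from income."""
    d_age = (age_a - age_b) ** 2
    d_inc = (inc_a - inc_b) ** 2
    return d_inc / (d_age + d_inc)

z_age, z_inc = z_scores(ages), z_scores(incomes)

# Compare customers 1 and 2 before and after standardization.
raw_share = income_share(ages[0], incomes[0], ages[1], incomes[1])
std_share = income_share(z_age[0], z_inc[0], z_age[1], z_inc[1])
print(f"raw: {raw_share:.6f}  standardized: {std_share:.3f}")
```

On the raw scale, income accounts for practically all of the distance between the two customers; after standardization both variables contribute in comparable amounts.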

Example: Height & Weight of 10,000 People

Figure panels: results of cluster analysis; actual male vs. female

• 4515 males are correctly classified; 485 males are misclassified
• 4550 females are correctly classified; 450 females are misclassified
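
The counts above translate into agreement rates between the two clusters and the actual groups. This is a small arithmetic sketch; the variable names are hypothetical, and "correct" here means that the cluster matches the actual gender, since the clustering itself never saw that label.

```python
correct_males, wrong_males = 4515, 485
correct_females, wrong_females = 4550, 450

total = correct_males + wrong_males + correct_females + wrong_females
agreement = (correct_males + correct_females) / total
male_rate = correct_males / (correct_males + wrong_males)
female_rate = correct_females / (correct_females + wrong_females)

print(f"overall: {agreement:.2%}, males: {male_rate:.2%}, females: {female_rate:.2%}")
# overall: 90.65%, males: 90.30%, females: 91.00%
```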

Segmentation Based on Data

Variable 1 Variable 2 Variable 3 Variable 4 Variable 5 Variable 6 Variable 7 ……

Customer 1 Value1_1 Value2_1 Value3_1 Value4_1 Value5_1 Value6_1 Value7_1 …
Customer 2 Value1_2 Value2_2 Value3_2 Value4_2 Value5_2 Value6_2 Value7_2 …
Customer 3 Value1_3 Value2_3 Value3_3 Value4_3 Value5_3 Value6_3 Value7_3 …
Customer 4 Value1_4 Value2_4 Value3_4 Value4_4 Value5_4 Value6_4 Value7_4 …
Customer 5 Value1_5 Value2_5 Value3_5 Value4_5 Value5_5 Value6_5 Value7_5 …
Customer 6 Value1_6 Value2_6 Value3_6 Value4_6 Value5_6 Value6_6 Value7_6 …
Customer 7 Value1_7 Value2_7 Value3_7 Value4_7 Value5_7 Value6_7 Value7_7 …
Customer 8 Value1_8 Value2_8 Value3_8 Value4_8 Value5_8 Value6_8 Value7_8 …
Customer 9 Value1_9 Value2_9 Value3_9 Value4_9 Value5_9 Value6_9 Value7_9 …
Customer 10 Value1_10 Value2_10 Value3_10 Value4_10 Value5_10 Value6_10 Value7_10 …
…… … … … … … … … …

Survey data could include: attitude, opinion, perception, self-reported behavior, demographics, …

Customer database could include: transactional data (items, value, frequency, channel, etc.), interaction data (contacts made, reasons, outcomes, channel, etc.), demographics, …

Segmentation Based on Data
Data dimension reduction

Often there is a large number of variables, which makes the segmentation result difficult to understand.

When appropriate, it is advisable to first reduce the data dimension through Factor Analysis, which groups correlated variables.
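
The sketch below is not Factor Analysis itself. It only shows the preliminary idea of spotting which variables move together, using pairwise Pearson correlations. The variable names, the survey answers, and the 0.7 threshold are all hypothetical.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equally long lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

# Hypothetical answers of six customers on a 1-5 scale, one list per variable.
data = {
    "price_matters": [5, 4, 1, 2, 5, 1],
    "seeks_discounts": [5, 5, 1, 1, 4, 2],
    "brand_loyal": [3, 1, 2, 5, 4, 3],
}

# Pairs of variables whose correlation exceeds the (assumed) threshold.
correlated_pairs = [
    (a, b) for a, b in combinations(data, 2)
    if abs(pearson(data[a], data[b])) > 0.7
]
print(correlated_pairs)
```

Strongly correlated variables such as the first two are candidates for being summarised by a single factor before clustering.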

Segmentation Based on Data

Then, different clustering techniques can be applied to group similar customers, in order to create segments.

Taxonomy of clustering methods: Hierarchical

Figure: hierarchical clustering solutions with 2 clusters and 3 clusters.

Segmentation Methods
Hierarchical cluster analysis

Figure: hierarchical cluster analysis, from each individual observation up to 4, 3, and 2 clusters.

Hierarchical clustering: Clustering method identification

Figure panels: Ward’s method; Nearest Neighbour.
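
The two methods differ in how they measure the closeness of two clusters before merging them. The sketch below uses the standard definitions: Nearest Neighbour (single linkage) takes the distance between the two closest members, while Ward's method takes the increase in within-cluster sum of squares caused by the merge. The function names and the toy clusters are hypothetical.

```python
import math

def centroid(cluster):
    """Mean point of a cluster given as a list of points."""
    return [sum(col) / len(cluster) for col in zip(*cluster)]

def nearest_neighbour_distance(a, b):
    """Single linkage: distance between the two closest members of a and b."""
    return min(math.dist(p, q) for p in a for q in b)

def ward_cost(a, b):
    """Ward's method: increase in within-cluster sum of squares if a and b merge."""
    gap = sum((x - y) ** 2 for x, y in zip(centroid(a), centroid(b)))
    return len(a) * len(b) / (len(a) + len(b)) * gap

cluster_a = [[0.0, 0.0], [0.0, 2.0]]
cluster_b = [[4.0, 0.0], [4.0, 2.0]]
print(nearest_neighbour_distance(cluster_a, cluster_b))  # 4.0
print(ward_cost(cluster_a, cluster_b))                   # 16.0
```

At each step the hierarchical procedure merges the pair of clusters with the smallest value of the chosen criterion, so the two methods can produce different trees on the same data.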

Segmentation Methods
Hierarchical cluster analysis

• Hierarchical cluster analysis can use diverse types of variables: interval, nominal, and ordinal. However, it is not advisable to mix different types of variables in the same analysis.
• For example (besides interval variables): gender, education level, model of product purchased, touch-points used, etc.
• It simultaneously computes results from 1 to N clusters for sample size N. The number of clusters is decided later.
• It is more computationally demanding for large datasets.
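
A minimal sketch of the agglomerative idea behind the points above, in plain Python with nearest-neighbour (single) linkage. It is an illustration under assumed names and toy data, not the `agnes` routine mentioned elsewhere in these slides.

```python
import math

def single_linkage(c1, c2, points):
    """Distance between the two closest members of two clusters (index lists)."""
    return min(math.dist(points[i], points[j]) for i in c1 for j in c2)

def agglomerate(points):
    """Return every partition from N singleton clusters down to 1 cluster."""
    clusters = [[i] for i in range(len(points))]
    history = [[list(c) for c in clusters]]
    while len(clusters) > 1:
        pairs = [(a, b) for a in range(len(clusters)) for b in range(a + 1, len(clusters))]
        # Merge the closest pair of clusters.
        a, b = min(pairs, key=lambda ab: single_linkage(clusters[ab[0]], clusters[ab[1]], points))
        merged = clusters[a] + clusters[b]
        clusters = [c for i, c in enumerate(clusters) if i not in (a, b)] + [merged]
        history.append([list(c) for c in clusters])
    return history

# Hypothetical toy data: five observations.
points = [[1.0, 1.0], [1.2, 1.1], [5.0, 5.0], [5.1, 5.3], [9.0, 1.0]]
history = agglomerate(points)

# The number of clusters is chosen afterwards: history[N - k] holds k clusters.
three = history[len(points) - 3]
two = history[len(points) - 2]
print(three, two)
```

All pairwise distances between clusters are recomputed at every merge, which illustrates why the calculation becomes heavy on large datasets.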

R – Hierarchical segmentation

Package: cluster; Main function: agnes

See source code in ‘Music HCL.R’

Segmentation Methods
Latent class segmentation

• Latent class segmentation is probability-based; its most significant advantage is therefore that it can deal with different types of variables (interval, nominal, ordinal) simultaneously; it can also deal with missing values.
• It has dependent variables, which helps ensure that the clustering is meaningful.
• It can include (active or inactive) covariates. Active covariates influence the clustering; inactive covariates are mainly for profiling.
• Fit indicators are available to suggest the number of clusters that best explains the data.
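
A heavily simplified sketch of the probability-based logic, assuming binary (0/1) indicators only and no covariates; real latent class software handles mixed variable types. It fits the classes with the EM algorithm, skips missing answers (`None`) when computing likelihoods, and uses BIC as one possible fit indicator. All names and data are hypothetical.

```python
import math
import random

def row_likelihoods(row, prior, theta):
    """Joint likelihood of one row with each class; missing items (None) are skipped."""
    out = []
    for c in range(len(prior)):
        like = prior[c]
        for j, x in enumerate(row):
            if x is not None:
                like *= theta[c][j] if x == 1 else 1.0 - theta[c][j]
        out.append(like)
    return out

def fit_latent_classes(rows, k, iterations=200, seed=0):
    """EM estimation of k latent classes over binary indicators."""
    rng = random.Random(seed)
    n_items = len(rows[0])
    prior = [1.0 / k] * k
    theta = [[rng.uniform(0.25, 0.75) for _ in range(n_items)] for _ in range(k)]
    for _ in range(iterations):
        # E-step: class membership probabilities for every row.
        post = []
        for row in rows:
            likes = row_likelihoods(row, prior, theta)
            post.append([v / sum(likes) for v in likes])
        # M-step: re-estimate class sizes and item probabilities.
        for c in range(k):
            prior[c] = sum(p[c] for p in post) / len(rows)
            for j in range(n_items):
                seen = [(p[c], row[j]) for p, row in zip(post, rows) if row[j] is not None]
                share = sum(w * x for w, x in seen) / sum(w for w, _ in seen)
                theta[c][j] = min(max(share, 1e-6), 1.0 - 1e-6)  # avoid exact 0 or 1
    return prior, theta, post

def bic(rows, prior, theta):
    """Bayesian information criterion: lower values indicate a better trade-off."""
    log_like = sum(math.log(sum(row_likelihoods(r, prior, theta))) for r in rows)
    n_params = (len(prior) - 1) + len(prior) * len(rows[0])
    return -2.0 * log_like + n_params * math.log(len(rows))

# Hypothetical yes/no survey answers; two respondents skipped the second question.
rows = [[1, 1, 1], [1, 1, 1], [1, None, 1], [1, 1, 1],
        [0, 0, 0], [0, 0, 0], [0, None, 0], [0, 0, 0]]
prior, theta, post = fit_latent_classes(rows, k=2)
segments = [max(range(2), key=lambda c: p[c]) for p in post]
bic_by_k = {k: bic(rows, *fit_latent_classes(rows, k)[:2]) for k in (1, 2)}
print(segments, bic_by_k)
```

Each respondent receives a probability of belonging to every class, including those with a missing answer, and the lower BIC for two classes points to the two-segment solution on this toy data.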

Segmentation Methods
Latent class segmentation

Indicators: the endogenous variables that serve as the primary drivers for determining the segmentation.

Covariates: (active) covariates are the secondary drivers for determining the segmentation; a covariate could be, for example, a critical outcome variable.

Latent Class Analysis

Diagram: Response variable 1, Response variable 2, Response variable 3, ……, Response variable n; Covariate(s).

Response variables and covariates are from the dataset. They can be of different variable types simultaneously.

Some heuristics about clustering and cluster analysis


• Durable
• Homogeneity
• Important
• Parsimonious
