Guide To Build Better Predictive Models Using Segmentation

You might also like

Download as pdf or txt
Download as pdf or txt
You are on page 1of 17

Guide to Build Better

Predictive Models using


Segmentation
www.analyticsvidhya.com

Introduction
We use linear or logistic regression technique for developing accurate
models for predicting an outcome of interest. Often, we create
separate models for separate segments. To judge their effectiveness,
we even make use of segmentation methods such as CHAID or CRT.

But, is that necessary ? Can’t we create a single model and enable it


with some segmentation variable as an input to the model ?

May be, we could. Particularly because, creatingseparate model


for separate segments may be time consuming and not
worth the effort. But, creating separate model for separate
segments may provide higher predictive power.
In this article, not only I’ve given an answer to the question above,
but have also shared a perfect guideline for doing optimal
segmentation for model development. Furthermore, this article also
explores the possibilities of leveraging a segment model approach
using complex techniques like stochastic gradient boosting or random
forest into a simple logistic or linear regression framework (albeit to
a very limited extent).

Doing this, brings the essence of interaction effects in modeling


process and replicated the advantages of complex techniques stated
above.

What is Segmentation ?
I could take you down in the depth of marketing to explain this
concept. But, why to complicate things if they could be made simpler?

So, I define Market Segmentation as,

“Dividing the target market or customers on the basis of some


significant features which could help a company sell more products in
less marketing expenses.”
Companies have limited marketing budgets. Yet, the(marketing team)
is expected to makes large number of sales to ensure rising revenue &
profits. In limited marketing budgets, how is it made possible?
Answer is, using segmentation.

Let’s move a step back and understand how do companies create a


product which people buy.

Actually, a product is created in two ways –

1. Create a product after analyzing (research) the needs and


wants of target market – For example: Computer.
Companies like Dell, IBM, Microsoft entered this market
after analyzing the enormous market which this product
upholds.
2. Create a product which evokes the needs & wants in target
market – For example: iPhone. Nobody knew they will
need a touch screen phone, until Steve Jobs made them
realize what they are deprived of!

Once the product is created, the ball shifts to the marketing team’s
court. As mentioned above, they make use of market segmentation
techniques. This ensures the product is positioned to the right
segment of customers with high propensity to buy.

Techniques of Segmentation
There are two broad set of methodologies for segmentation:
Objective (supervised) and Non-Objective (unsupervised)
segmentation methodologies. As the name indicates, a supervised
methodology requires the objective to be stated as the basis for
segmentation.

Given below are some examples of objective and non-objective


approaches.
Objective Segmentation

• Segmentation to identify the type of customers who would


respond to a particular offer
• Segmentation to identify high spenders among customers
who will use the e-commerce channel for festive shopping
• Segmentation to identify customers who will default on
their credit obligation for a loan or credit card

Non-Objective Segmentation

• Segmentation of the customer base to understand the


specific profiles which exist within the customer base so
that multiple marketing actions can be personalized for each
segment
• Segmentation of geographies on the basis of affluence and
lifestyle of people living in each geography so that sales and
distribution strategies can be formulated accordingly
• Segmentation of web site visitors on the basis of browsing
behavior to understand the level of engagement and affinity
towards the brand

Hence, it is critical that the segments created on the basis of an


objective segmentation methodology must be different with respect to
the stated objective (e.g. response to an offer).

However, in case of a non-objective methodology, the segments are


different with respect to the “generic profile” of observations
belonging to each segment, but not with regards to any specific
outcome of interest.

The most common techniques used for building an objective


segmentation are CHAID and CRT. Each of these techniques attempt
to maximize the difference among segments with regards to the
stated objective (sometimes referred to as the target for
segmentation). CHAID uses a chi square statistic for the same while
CRT uses Gini impurity.

The most common techniques for building non-objective


segmentation are cluster analysis, K nearest neighbor techniques etc.

Each of these techniques uses a distance measure (e.g. Euclidian


distance, Manhattan distance, Mahalanobis distance etc.) This is
done to maximize the distance between the two segments.
This implies maximum difference between the segments with regards
to a combination of all the variables (or factors).

How to create segments for model


development ?
1. Commonly adopted methodology

If you have thoroughly followed this article till here, we are now good
to delve into the methodology adopted for creating segments. Of
course, with the sole objective of building separate models for each
segments.

Let us consider an example.

Here we’ll build a logistic regression model for predicting likelihood


of a customer to respond to an offer. A very similar approach can also
be used for developing a linear regression model. I’ve discussed it in
the following section.

Logistic regression: It uses 1 or 0 indicator in the historical


campaign data, which indicates whether the customer has responded
to the offer or not.

Usually, one uses the target (or ‘Y’ known as dependent variable) that
has been identified for model development to undertake an objective
segmentation. Remember, a separate model will be built for each
segment. A segmentation scheme which provides the maximum
difference between the segments with regards to the objective is
usually selected. Below is a simple example of this approach.

Fig-1: Sample segmentation for building a logistic


regression – commonly adopted methodology

The above segmentation scheme is the best possible objective


segmentation developed, because the segments demonstrate the
maximum separation with regards to the objectives (i.e. response
rate).

In the above tree, each separation should represent a statistically


significant difference between the nodes with respect to the target. If
a CHAID algorithm is used for developing the segmentation tree, then
Chi Square value for each separation should be significantly different
from zero (as measured by the “p” value of the separation).

In addition, it is the common business intuition (which may not


always have a sound statistical rationale), to develop separate models
if the difference in response rates between adjacent node is at least
30% (e.g. if the response rate in a particular node is 0.7% and the
same for the adjacent node is 0.5% then the difference in response
rate is ~30%)

The commonly adopted approach would suggest that one should build
separate models for each of the terminal (or end) nodes, which have
been depicted in green in Fig-1. But, is this the best approach from a
modeling perspective? To answer that question, we need to find out a
measure for evaluating a segmentation scheme from a modeling
perspective.

The most effective measure for evaluating a segmentation scheme for


the purpose of building separate models is the lift in predictive power
that can be achieved by building segmented models. The following
example is used to illustrate the same.

Let us assume that a logistic model is developed on the entire


population to predict the likelihood of response.

Let us designate this as Model-1 (mostly analysts describe this as the


parent model), let the Gini for this model be 0.57. Now as part of the
segmented model development approach, five separate models are
built, one for each end node (mostly analysts describe them as child
models).

• Model-2.1 logistic model built for predicting response for


the segment “age less than 30”
• Model-2.2 logistic model built for predicting response for
the segment “age between 30 and 48 and income less than
INR 800K”
• Model-2.3 logistic model built for predicting response for
the segment “age between 30 and 48 and income greater
than or equal to INR 800K”
• Model-2.4 logistic model built for predicting response for
the segment “age more than 48 and income less than INR 1
Million”
• Model-2.5 logistic model built for predicting response for
the segment “age more than 48 and income greater than or
equal to INR 1 Million”

After building 5 separate models, the score or the predicted


probability is calculated for each observation (or record) and the 5
data sets (for each end node) are appended.
The Gini of the combined data set is compared with the Gini of
model-1. Then, the ratio of the two is designated as the lift in
predictive power. For instance, if the Gini of the combined data set is
0.6, then the lift will be 1.05. It can be seen that, though
segmentation is the best possible objective segmentation, yet it
provides only 5% extra lift in predictive power.

Let us find out why this may be the case. It should be noted
that, when one is developing a linear model, the lift in Adjusted R
Square should be considered instead of lift in Gini.

While building the overall model (Model-1), one can always use
appropriate dummy variables to represent the segmentation. For
instance, one can use the following dummies (it should be noted that
due to degree of freedom constraint there will be one less than all
possible number of dummies)

• Dummy-1: Age less than 30 then 1 else 0


• Dummy-2: Age between 30 and 48 then 1 else 0
• Dummy-3 Income less than INR 800K
• Dummy-4 Income between 800K and 1 Million

The predictive power of the model will be even better if one uses
dummies to replicate the segmentation tree

• Dummy-1: age less than 30


• Dummy-2: age between 30 and 48 and income less than INR
800K
• Dummy-3: age between 30 and 48 and income greater than
INR 800K
• Dummy-4: age more than 48 and income less than INR 1
Million

These dummies would provide the same differentiation in response


rate as that of the five individual segments. Hence, it can be seen that
the differentiation in response rate which is provided by the
segmentation can easily be replicated by using a set of dummy
variables in an overall regression model.

However, this is not the complete explanation behind the low lift in
predictive power. It is also important to consider the set of
segmented models or the child models. Fig-2 provides the list of
variables in the child models.

The variables in each model are ordered in accordance to their


predictive strength (as measured by the Wald Chi Square and the
Standardized Betas). The same color has been used to depict a
particular variable across segments so as to make the comparison
easier.

Fig-2: Variables across the 5 child models

It can be observed that the variables in each of the five child models
are quite similar, though the relative ordering of the variables is
slightly different. This implies that the drivers of response are similar
across segments. In addition, if one considers the predictive pattern
of a particular variable across segments, one can observe something
even more interesting.

Fig-3 depicts the predictive pattern for the variable “Number of


purchase in the last 12 months”. For depicting the predictive pattern,
the Weight of Evidence (WoE) has been plotted.

Weight of Evidence is a common measure used for understanding if a


particular range of value for a variable has a relatively higher or
lower concentration of the desired target. A positive value of WoE
indicates that there is a higher concentration of the target and vice-
versa.

In this case, for higher values of number of purchases, one observes a


higher WoE; indicating a relatively higher concentration of
the respondents. While building a linear model, the average value of
the target across each range of the variable should be used for
understanding the predictive pattern.

A visual inspection of the graph reveals that though the individual


WoEs are different across the segments, yet the trend is very similar.
This implies that the predictive pattern of the variable is same across
segments. Therefore, the impact of variable in overall model is not
very different as compared to the segment wise impact.

In other words, it means that there is no interaction effect between


the segmenting variables (i.e. age and income) aJnd the predictor
variable ““Number of purchase in the last 12 months”. Hence, the
segmentation is not expected to yield any benefit with regards to lift
in predictive power.

It should also be noted that in this case the information value of the
variables are also similar across segments. In case of a linear model,
partial R square can be used instead of information value. If this is
the case for most of highly predictive variables, then segmentation
would add limited value to the overall predictive power.
Fig-3: Predictive Pattern of the Variable “Number of
purchase in the last 12 months” Across the Five Segments

2. The Appropriate Methodology

In order to harness the interaction effect between the segmenting


variables and the predictor variables, it is important to devise a
segmentation scheme where predictors and predictive patterns of the
variables are different across segments.

This will help one to create a scenario where the predictive power of
the segmented models is higher than the predictive power of the
overall model. Fig-4 provides an alternate segmentation scheme for
the problem described earlier.

Fig-4: Sample segmentation for building a logistic


regression – alternate methodology
In this case, one would develop the following segmented models
(child models):

• Model-3.1 logistic model built for predicting response for


the segment “No purchase in the last 12 months”
• Model-3.2 logistic model built for predicting response for
the segment “At least 2 purchases in the last 12 months”
• Model-3.3 logistic model built for predicting response for
the segment “More than 2 purchase in last 12 months and
No product reviews submitted ever”
• Model-3.4 logistic model built for predicting response for
the segment “More than 2 purchase in last 12 months and At
least one product review submitted”

The variables in the child models have been depicted in Fig-5. As in


earlier case, the common variables have been highlighted in same
color. It can be observed that in this case, the extent of overlap of the
variables between segments is very limited. Therefore, each segment
represents a homogeneous set of customers for whom the driver of
response are almost completely different.

Hence, in this case the Gini of the segmented system of models is


significantly higher compared to the Gini of the overall model. This
segmentation provides a significantly superior predictive power
which is created due to the interaction of the segmenting variables
and the predictor variables.

Fig-5: Variables across the 4 child models


Fig-6: Predictive Pattern of the Variable “Number of
purchase in the last 24 months” Across the Five Segments

In this case, one can observe that the predictive pattern of a


particular variable is significantly different across segments. The
lines in Fig-5 are much more dispersed and look different from one
another as compared to Fig-3, which indicates that the predictive
pattern of the variable is different across segments.

Therefore the impact of the variable in the overall model is quite very
different as compared to the segment wise impact. In other words, it
means that there is significant interaction effect between the
segmenting variables and the predictor variable “No. of different
categories of purchase made in last 18 months”.

Hence, the segmentation is expected to produce superior lift in


predictive power. It should also be noted that in this case the
information value of the variables are also different across segments.
Using Machine Learning for
Segmentation
There is another interesting aspect to it. For segmentation, it is also
good to consider machine learning algorithms based on multiple
trees. Multiple Additive Regression Tree, Random Forest and
Stochastic Gradient Boosting are techniques that use a multitude of
trees and an ensemble of the same for making predictions.

For instance if one considers stochastic gradient boosting, at a very


simplistic (and possibly amateurish level), the method involves
building an ensemble of trees wherein the residual from the first tree
is used as the target for the second tree and so on till no further
improvement in predictive power is observed.

Each tree in this case consists of a few nodes and ensures that it does
not over fit the data. In reality, each of these tree is expected to
capture the interaction effect instead of fitting too closely to the
target at hand. One can relate this philosophy (at a broad level) with
the idea behind creation of segments for developing models, wherein
the objective of the segmentation is not to achieve a closer fit with
the target but to identify interaction effects.

In fact, a possible way of identifying the segments for developing


separate models may involve considering the nodes of the first few
trees in a stochastic gradient boosting ensemble of trees and consider
if those are appropriate for building segmented models.

End Notes
In this article, we learnt the following aspects:
1. While performing segmentation for the purpose of model
development, one should strive to achieve a lift in predictive
power and not merely differentiation with regards to the
target of interest.
2. Segmentation capture interaction effects, whereby the
relative importance of the predictors or the predictive
pattern changes across segments.
3. The process of building such segmentations involve
developing child models for each segment and investigating
the variables that are selected in each of the child models.
4. A graphical examination of the highly predictive variables
across segments can also yield valuable insights. Therefore,
the process requires iteration and careful analysis of the
child models. Sole reliance on a CHAID or CRT algorithm
may not yield the appropriate result. One must remember
that the ultimate objective is to enhance the predictive
power of the segmented system.

Did you find the article useful ? Have you used any of these
techniques in your market segmentation process? Do share your
opinions and suggestions in the comments section below.

About the Authors

Sandhya Kuruganti and Hindol Basu are authors of a book on business


analytics titled “Business Analytics: Applications to Consumer
Marketing”, recently published by McGraw Hill. The book is available
on Flipkart and Amazon India/UK/Canada/US. They are seasoned
analytics professionals with a collective industry experience of more
than 30 years.

If you like what you just read & want to continue your
analytics learning, subscribe to our emails, follow us on
twitter or like our facebook page.

You can also read this article on Analytics Vidhya's Android APP
Related Articles

Trick to enhance power of Regression model

We, as analysts, specialize in optimization of already optimized


processes. As the optimization gets finer, opportunity to make the
process better gets thinner. One of the predictive modeling technique
used frequently use is regression (Linear or Logistic). Another equally
competing technique (typically considered as a challenger) is Decision
tree. What…

October 21, 2013

In "Business Analytics"

Heart Sound Segmentation using Deep Learning - A doctor in making?

Introduction The idea of doing a project on heart sound segmentation


came from a recent breakthrough I heard over the internet. One of
the influencers I follow - Andrew Ng published a research paper a
while back - which essentially is a state-of-the-art method for
detecting heart disease. It was…

November 30, 2017

In "Deep Learning"

Top 10 Pretrained Models to get you Started with Deep Learning (Part
1 - Computer Vision)

Part 1 of this series looks at state-of-the-art pretrained models


available for the field of computer vision, and how you can get started
with them.

July 27, 2018

In "Deep Learning"

You might also like