Segmentation-Factor Analysis
Disclaimer: This material is protected under copyright act AnalytixLabs ©, 2011. Unauthorized use and/or duplication of this material or any part of this material, including data, in any
form without explicit and written permission from AnalytixLabs is strictly prohibited. Any violation of this copyright will attract legal action.
Learn to Evolve
Introduction to Factor Analysis - PCA
Purpose of PCA
• To find a linear combination of the original variables that has the largest variance
possible.
• The weights in the linear combination need some restriction (e.g. unit length), or the
problem is not well defined: the variance could be made arbitrarily large by scaling up the weights.
[Figure: scatter plot of LTN (vertical axis) against LTG (horizontal axis), with two candidate directions marked: 0.7071*LTN + 0.7071*LTG and 0.5034*LTN + 0.8641*LTG]
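The idea of comparing unit-length linear combinations by their variance can be sketched in R. This is a minimal illustration under stated assumptions: the LTN and LTG values are simulated stand-ins (the slide's data are not available), and combo_var is a hypothetical helper name.

```r
set.seed(1)

# Simulated stand-ins for LTN and LTG (assumption: not the slide's data)
n   <- 200
LTG <- rnorm(n, mean = 0, sd = 5)
LTN <- 0.6 * LTG + rnorm(n, mean = 0, sd = 3)
X   <- scale(cbind(LTN, LTG), center = TRUE, scale = FALSE)  # subtract means only

# Variance of the linear combination a[1]*LTN + a[2]*LTG
combo_var <- function(a) {
  stopifnot(abs(sum(a^2) - 1) < 1e-3)  # unit-length restriction (up to rounding)
  var(as.vector(X %*% a))
}

equal_weights <- c(0.7071, 0.7071)          # one candidate direction
pc1_weights   <- prcomp(X)$rotation[, 1]    # direction found by PCA

var_equal <- combo_var(equal_weights)
var_pc1   <- combo_var(pc1_weights)

# The PCA direction should never have less variance than another unit-length direction
c(equal_weights = var_equal, first_component = var_pc1)
```

Any other pair of weights with squared sum 1 can be passed to combo_var to check that it does not beat the first principal component.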
What is happening?
• Standard regression problem with response y and regressors X1, X2, …, Xp.
• You may want to summarize these p variables (e.g. responses to p questions) with one
number (“index”) that best captures the diversity in responses.
• E.g. it is common to add the responses, or average them, perhaps adjusting for
questions that are reverse coded.
• It should already be clear that simple averaging may not be the best way
to summarize the original p questions.
Reduction of Dimension
• Often able to replace the original variables X1, X2, …, Xp with a few new variables,
say, U1, U2, …, Uk where k is much smaller than p.
• By plotting the first two or three pairs of these new variables you can often see
structure you wouldn’t otherwise be able to see (e.g. clustering).
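A minimal sketch of this reduction, assuming R's built-in iris data as a stand-in for X1, …, Xp (p = 4) and keeping k = 2 new variables. The choice of data set and of k is an assumption made for illustration only.

```r
# iris ships with R; its four measurements play the role of X1, ..., Xp
X   <- iris[, 1:4]
pca <- prcomp(X, center = TRUE, scale. = TRUE)

# Keep only the first k = 2 new variables U1 and U2
U <- pca$x[, 1:2]
dim(U)  # one row per observation, two columns

# Plotting U1 against U2 can reveal grouping that is hard to see in four dimensions;
# points are coloured by species only to make that grouping visible
if (interactive()) {
  plot(U, col = as.integer(iris$Species), pch = 19,
       xlab = "U1 (first component score)",
       ylab = "U2 (second component score)")
}
```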
Interpretation
• In rarer cases the new variables, U1, U2, …, Uk, are interpretable and point to
some new facet of the study.
• As you will see, however, one must be very careful with this use of Principal
Components since it is a prime opportunity to go astray and over-interpret.
• Look for weights a11, a12, …, a1p such that U1 = a11*X1 + a12*X2 + … + a1p*Xp has the
largest variance subject to the restriction that (a11)^2 + (a12)^2 + … + (a1p)^2 = 1
• The numbers a11, a12, …, a1p are called different things in different books. In
SAS they are arrayed in a column and called the first principal component
“eigenvector”.
• If the Xi variables have had their individual means subtracted off, then the new
variable U1 is called the first principal component, or in most texts, the first
principal component score.
What’s Next?
• Look for weights a21, a22, …, a2p such that U2 = a21*X1 + a22*X2 + … + a2p*Xp has the
next largest variance subject to the restriction that (a21)^2 + (a22)^2 + … + (a2p)^2 = 1
• The numbers a21, a22, …, a2p are called different things in different books. In SAS
they are arrayed in a column and called the second principal component
“eigenvector”.
• If the Xi variables have had their individual means subtracted off, then the new
variable U2 is called the second principal component, or in most texts, the
second principal component score.
What’s New?
• Any two arrays of weights will cross-multiply and sum to 0. Example:
a11*a21 + a12*a22 + … + a1p*a2p = 0
• Same as saying: any two of the new variables will be uncorrelated. Example:
corr(U1,U2)=0.
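These properties can be checked numerically. The sketch below assumes R's built-in USArrests data and standardized variables; prcomp() stores the weight vectors as columns of its rotation element and the scores in x.

```r
pca <- prcomp(USArrests, center = TRUE, scale. = TRUE)

A  <- pca$rotation   # columns are the weight vectors ("eigenvectors")
a1 <- A[, 1]
a2 <- A[, 2]

sum(a1^2)            # restriction on the first set of weights: squares sum to 1
sum(a2^2)            # same restriction on the second set
sum(a1 * a2)         # cross-products of the two sets of weights sum to 0

U <- pca$x           # component scores U1, U2, ...
cor(U[, 1], U[, 2])  # the new variables are uncorrelated

# U1 rebuilt by hand as a11*X1 + ... + a1p*Xp on the standardized variables
Xs        <- scale(USArrests, center = TRUE, scale = TRUE)
U1_manual <- as.vector(Xs %*% a1)
max(abs(U1_manual - U[, 1]))  # difference is numerically negligible
```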
How Far Does This Go?
• We will look at two or three criteria for how many of these scores to
construct. We’ll start with our common sense.
• Most of the time it is not as hard as it might sound. Basically, we will look
at “how much variance” in the original data is summarized by each new
component variable.
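The "how much variance" summary can be computed directly from a fitted PCA. This is a sketch on R's built-in USArrests data; the eigenvalue-greater-than-1 cut-off shown at the end is one common rule of thumb for standardized variables and is an assumption here, not necessarily one of the criteria the slides go on to use.

```r
pca <- prcomp(USArrests, center = TRUE, scale. = TRUE)

eigenvalues <- pca$sdev^2                     # variance of each component score
prop_var    <- eigenvalues / sum(eigenvalues) # share of total variance
cum_var     <- cumsum(prop_var)               # running total of that share

variance_table <- data.frame(
  component  = seq_along(eigenvalues),
  eigenvalue = round(eigenvalues, 3),
  proportion = round(prop_var, 3),
  cumulative = round(cum_var, 3)
)
variance_table

# Rule of thumb (assumption): keep components whose eigenvalue exceeds 1,
# i.e. components that summarize more variance than one standardized variable
n_keep <- sum(eigenvalues > 1)
n_keep
```

summary(pca) reports the same proportions, and screeplot(pca) draws the eigenvalues against the component number.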
Two Basic Constructs
• Mean
• Projection
Recall
[Figure: scree plot of eigenvalue against number of components (1 to 5); the plot suggests two components]
Loadings
Definition
Component loadings are the ordinary product-moment correlations between each original variable
and each component score.
Interpretation
By looking at the component loadings one can
ascertain which of the original variables tend to
“load” on a given new variable. This may facilitate
interpretations, creation of subscales, etc.
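The definition above translates directly into a correlation between the original variables and the scores. The sketch assumes R's built-in USArrests data and standardized variables; for that case the same loadings can also be obtained by scaling each weight vector by the standard deviation of its component, which the code checks.

```r
pca <- prcomp(USArrests, center = TRUE, scale. = TRUE)

# Loadings as defined: correlation of each original variable with each score
loadings_cor <- cor(USArrests, pca$x)
round(loadings_cor, 3)

# Equivalent route for standardized variables: weight * sd of the component
loadings_alt <- sweep(pca$rotation, 2, pca$sdev, "*")
max(abs(loadings_cor - loadings_alt))  # numerically negligible

# Variables that "load" most strongly on the first component
sort(abs(loadings_cor[, 1]), decreasing = TRUE)
```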
Introduction to Segmentation
Segmentation
Each individual is so different
that ideally we would want to reach out to each one of them in a different way
Solution: Identify segments where people share the same characteristics and target each of
these segments in a different way
Total Population (1000), split into four segments:
• Avg. delinquency age = 15 days, Avg. age = 33 yrs., Avg. Utilization = 60%
• Avg. delinquency age = 12 days, Avg. age = 25 yrs., Avg. Utilization = 90%
• Avg. delinquency age = 0 days, Avg. age = 35 yrs., Avg. Utilization > 80%
• Avg. delinquency age = 75 days, Avg. age = 50 yrs., Avg. Utilization = 40%
We can exclude the group with avg. delinquency age = 75 days from mailing
This type of segmentation is known as ‘Subjective Segmentation’. It gives the salient characteristics of
the best customers
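A rule-based split of this kind, followed by a profile of each segment and an exclusion from mailing, might be sketched as follows. Everything here is an assumption for illustration: the customer data are simulated, and the column names, segment labels and cut-offs (60 days, 80% utilization) are hypothetical rather than taken from the slide.

```r
set.seed(42)

# Simulated customers (assumption: made-up values, 1000 rows as on the slide)
n <- 1000
customers <- data.frame(
  delinquency_days = sample(c(0, 10, 20, 60, 90), n, replace = TRUE),
  age              = round(runif(n, 21, 65)),
  utilization      = round(runif(n, 0, 100))
)

# Hand-written (subjective) rules with hypothetical cut-offs
customers$segment <- with(customers,
  ifelse(delinquency_days >= 60, "high_delinquency",
  ifelse(utilization >= 80,      "high_utilization", "other")))

# Profile: average of each characteristic within each segment
profile <- aggregate(cbind(delinquency_days, age, utilization) ~ segment,
                     data = customers, FUN = mean)
profile

# Exclude the high-delinquency segment from the mailing
mailing_list <- subset(customers, segment != "high_delinquency")
nrow(mailing_list)
```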
Applications of Segmentation
Business Tasks & Appropriate Segmentation
Types of Segmentation
Models can be developed in domains of risk, marketing or collections.
• Customer Segmentation
Value Based Segmentation: Customer ranking and segmentation according to current and
expected/estimated customer value
Life Stage Segmentation: Segmentation according to the life stage to which the customer currently belongs
1. Rule-based: Hypothesis driven
• What: Segment customers manually, based on 1 to 3 factors, to drive a specific business objective
• When: Only a couple of factors are thought to drive the segments; there is a known hypothesis for cutting the data to create segments
• Technique: Cross-tabs and conditional data cuts
• Example: Cable client segmented prospects on their potential telecom spend and used the segmentation to align sales resources and offers to improve go-to-market strategy

2. Supervised: With a dependent variable (behavioral segmentation)
• What: Segment customers using a predictive algorithm, based on a high number of factors that potentially drive a specific outcome
• When: Data-driven segments are desired, but first and foremost segments need to be differentiated on a specific outcome/metric (e.g. revenue)
• Technique: CHAID
• Example: Telecom client segmented customers on various factors that drive churn propensity and targeted high churn segments with retention campaigns and offers

3. Unsupervised: Without a dependent variable
• What: Segment customers using a clustering algorithm, based on a high number of factors
• When: Data-driven segments are desired; segments need to be differentiated across many behavioral factors
• Technique: TwoStep, K-Means
• Example: Retail client segmented customers on behavioral shopping factors that included category spend, shopping frequency/tendency, and store/channel shopped to inform merchandising and offer strategy
RFM SEGMENTATION
RFM SEGMENTATION- STEPS
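The individual steps are not reproduced in the text, so the following is only a sketch of one common way to build an RFM (recency, frequency, monetary) segmentation, not necessarily the procedure on the slides. The transaction log is simulated, and the column names, the reference date as_of, the three-level scoring and the helper score3 are all assumptions.

```r
set.seed(7)

# Hypothetical transaction log (assumption: simulated data)
tx <- data.frame(
  customer_id = sample(1:50, 500, replace = TRUE),
  date        = as.Date("2012-01-01") + sample(0:364, 500, replace = TRUE),
  amount      = round(runif(500, 5, 500), 2)
)
as_of <- as.Date("2013-01-01")  # reference date for measuring recency

# Step 1: one row per customer with recency (days), frequency and monetary value
recency   <- tapply(as.numeric(as_of - tx$date), tx$customer_id, min)
frequency <- tapply(tx$amount, tx$customer_id, length)
monetary  <- tapply(tx$amount, tx$customer_id, sum)
rfm <- data.frame(
  customer_id = as.integer(names(recency)),
  recency     = as.vector(recency),
  frequency   = as.vector(frequency),
  monetary    = as.vector(monetary)
)

# Step 2: score each measure from 1 (worst) to 3 (best) by rank
score3 <- function(x) cut(rank(x, ties.method = "first"), breaks = 3, labels = FALSE)
rfm$R <- score3(-rfm$recency)   # fewer days since last purchase is better
rfm$F <- score3(rfm$frequency)
rfm$M <- score3(rfm$monetary)

# Step 3: combine the three scores into a segment code such as "333"
rfm$segment <- paste0(rfm$R, rfm$F, rfm$M)
head(rfm[order(rfm$segment, decreasing = TRUE), ])
```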
Behavioral Segmentation: Hierarchical Vs. Non-hierarchical
Behavioral Segmentation: Subjective Segmentation-Cluster Analysis
Highest value segment: Frequent (highest revenue per customer)

Metric | Big Ticket | Frequent | Small Ticket | Infrequent | Returner | Overall
% Customers | 9.8 | 4.2 | 13.5 | 69.5 | 6.6 | 100.0
% Revenue | 27.4 | 33.6 | 15.4 | 13.5 | 10.1 | 100.0
Revenue per customer ($) | 1,038 | 8,618 | 1,209.1 | 220 | 1,613.5 | 1,077.2
Visits per customer | 3.1 | 34.2 | 16.1 | 2.1 | 8.3 | 4.8
Basket size ($) | 970.1 | 252.7 | 75.1 | 105.2 | 165.1 | 224.8
Average departments shopped | 3.6 | 5.5 | 1.9 | 1.2 | 2.9 | 1.9
Stores shopped | 1.1 | 3.0 | 1.8 | 1.1 | 1.2 | 1.7
Returning propensity (%) | 0.3 | 6.5 | 5.5 | 0.3 | 25.5 | 3.2
Shopped in December (%) | 15.1 | 70.8 | 53.3 | 19.4 | 23.3 | 26.6
Shopped on Memorial Day (%) | 1.6 | 17.9 | 2.4 | 0.9 | 2.1 | 2.2
Shopped on Labor Day (%) | 1.0 | 14.1 | 1.8 | 0.6 | 1.5 | 1.7
Shopped on President's Day (%) | 0.7 | 12.0 | 1.8 | 0.6 | 1.8 | 1.5
Average Discount Rate (%) | 14.8 | 11.4 | 6.6 | 4.5 | 10.6 | 11.2
Customer lifetime (months) | 25.2 | 46.2 | 42.2 | 28.4 | 27.2 | 30.8
Note that key profile variables are not always the same as basis variables used
to generate the segmentation
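The distinction between basis and profile variables can be sketched with k-means. The data below are simulated and every column name is hypothetical; the choice of three basis variables, four clusters and standardization before clustering are assumptions made for illustration.

```r
set.seed(123)

# Hypothetical customer-level data (assumption: simulated values)
n <- 300
cust <- data.frame(
  visits          = rpois(n, 5) + 1,
  basket_size     = round(rlnorm(n, meanlog = 4.5, sdlog = 0.6), 1),
  departments     = sample(1:6, n, replace = TRUE),
  discount_rate   = round(runif(n, 0, 25), 1),        # profile variable only
  lifetime_months = sample(1:60, n, replace = TRUE)   # profile variable only
)

# Basis variables: the only columns the clustering algorithm sees
basis <- c("visits", "basket_size", "departments")
km <- kmeans(scale(cust[, basis]), centers = 4, nstart = 25)
cust$segment <- km$cluster

# Profile: describe each segment on ALL variables, basis or not
profile <- aggregate(. ~ segment, data = cust, FUN = mean)
profile

# Share of customers in each segment, as in the "% Customers" row of the table
pct_customers <- 100 * prop.table(table(cust$segment))
round(pct_customers, 1)
```

Standardizing the basis variables first keeps a variable with a large scale, such as basket size, from dominating the distance calculation.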
Subjective Segmentation: Cluster Analysis Process
Step 1: Data Cleaning and Preparing the data set for analysis
Step 2: Creating New Relevant Variables
Step 3: Selection of Variables
Rule6 (0.31, 38.4, 0): No wine purchase in Sep to Nov 2012; Age = >46; Total wine transactions in Sep to Nov 2012 = 4,5,6,7,11; Average of unit price of liquor purchase in Sep to Nov 2012 = <=457.778
Rule7 (0.14, 7.3, 0): No wine purchase in Sep to Nov 2012; Age = >46; Total wine transactions in Sep to Nov 2012 = 4,5,6,7,11; Average of unit price of liquor purchase in Sep to Nov 2012 = (457.778,1550]
Rule8 (35.9): No wine purchase in Sep to Nov 2012; Age = <=46
Rule9 (4.7): Average unit price of wine purchase in Sep to Nov 2012 = <=980; Age = <=46
Decision Trees: CHAID Segmentation
CHAID Algorithm
Q&A
Sample ‘R’ Codes
Principal Component Analysis in R
To find the best low-dimensional representation of the variation in a multivariate data set
Hierarchical clustering is a method of cluster analysis which seeks to build a hierarchy of clusters
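A self-contained sketch of this idea, assuming R's built-in USArrests data as a stand-in for a real data set. Standardizing the variables, using the "ward.D2" linkage and cutting at five clusters are choices made for this illustration only.

```r
# Stand-in data (assumption): standardize so no variable dominates the distances
mydata <- scale(USArrests)

d   <- dist(mydata, method = "euclidean")  # pairwise distance matrix
fit <- hclust(d, method = "ward.D2")       # build the hierarchy bottom-up

groups <- cutree(fit, k = 5)               # cut the hierarchy into 5 clusters
table(groups)                              # cluster sizes

# Describe each cluster by the mean of the original (unscaled) variables
aggregate(USArrests, by = list(cluster = groups), FUN = mean)

# Dendrogram with the five clusters outlined
if (interactive()) {
  plot(fit)
  rect.hclust(fit, k = 5, border = "red")
}
```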
# Hierarchical Clustering
# mydata: a numeric data frame or matrix with one row per observation
d <- dist(mydata, method = "euclidean")  # distance matrix
fit <- hclust(d, method = "ward.D")      # "ward.D" is the method called "ward" in R <= 3.0.3
plot(fit)                                # display dendrogram
groups <- cutree(fit, k = 5)             # cut tree into 5 clusters

# Classification Tree
library(rpart)  # provides rpart() and the kyphosis data set
# grow tree
fit <- rpart(Kyphosis ~ Age + Number + Start,
             method = "class", data = kyphosis)
# plot tree
plot(fit, uniform = TRUE,
     main = "Classification Tree for Kyphosis")
text(fit, use.n = TRUE, all = TRUE, cex = .8)
Join us on:
Twitter - http://twitter.com/#!/AnalytixLabs
Facebook - http://www.facebook.com/analytixlabs
LinkedIn - http://www.linkedin.com/in/analytixlabs
Blog - http://www.analytixlabs.co.in/category/blog/