
Factor Analysis and Segmentation

Disclaimer: This material is protected under the copyright act, AnalytixLabs ©, 2011. Unauthorized use and/or duplication of this material or any part of this material, including data, in any form without explicit and written permission from AnalytixLabs is strictly prohibited. Any violation of this copyright will attract legal action.

Learn to Evolve
Introduction to Factor Analysis - PCA
Purpose of PCA

• To find the linear combination of the original variables that has the largest possible variance.

• Some restriction on the weights in the linear combination is needed, or the problem is not well defined.

• We usually require the sum of the squared weights to equal 1.


Example

[Figure: "Helmet Data" scatter plot of LTN against LTG, with two candidate directions overlaid: 0.7071*LTN + 0.7071*LTG and 0.5034*LTN + 0.8641*LTG]
What is happening?

• Trying to find a direction where the physical scatter of points is most clearly "jutting out"
• This “diversity” may be just what you are looking for in your data
• Why would anyone want to find such directions?
Principal Components Regression

• Standard regression problem with response y and regressors X1, X2, …, Xp.

• X1, X2, …, Xp may be exactly collinear or nearly so.

• Least squares estimates of the regression coefficients are not possible, or not reliable, in that case.

• Can use Principal Components to address the problem.
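
A minimal sketch of this idea in R, assuming a response vector y and a numeric regressor matrix X with nearly collinear columns (hypothetical names): the regressors are replaced by their first few principal component scores, which are exactly uncorrelated.

# Principal Components Regression (sketch)
> pcs <- princomp(X, cor=TRUE)      # PCs of the regressors
> k <- 2                            # keep the first k components
> fit <- lm(y ~ pcs$scores[, 1:k])  # regress y on the PC scores
> summary(fit)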


Intelligent Index Formation

• May have answers to p questions, say X1, X2, …, Xp.

• And you may want to summarize these p responses with one number (“index”)
that best captures the diversity in responses.

• E.g., it is common to add the responses, or average them, perhaps being sensitive to questions that are reverse-coded.

• It should already be clear to you that simple averaging may not be the best way to summarize the original p questions.
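
As an alternative, here is a minimal sketch in R, assuming the p responses sit in a numeric data frame items (hypothetical name): the first principal component score is the variance-maximizing weighted index.

# Index formation via the first principal component (sketch)
> fit <- princomp(items, cor=TRUE)
> index <- fit$scores[, 1]   # one index value per respondent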
Reduction of Dimension

• Often able to replace the original variables X1, X2, …, Xp with a few new variables,
say, U1, U2, …, Uk where k is much smaller than p.

• By plotting the first two or three pairs of these new variables you can often see
structure you wouldn’t otherwise be able to see (e.g. clustering).
Interpretation

• In rarer cases the new variables, U1, U2, …, Uk, are interpretable and point to
some new facet of the study.

• As you will see, however, one must be very careful with this use of Principal Components, since it is a prime opportunity to go astray and over-interpret.

• This is often where PCA is confused with Factor Analysis.


How Does PCA Work?

• Look for weights a11, a12, …, a1p such that U1 = a11*X1 + a12*X2 + … + a1p*Xp has the largest variance, subject to the restriction that a11² + a12² + … + a1p² = 1.
• The numbers a11, a12, …, a1p are called different things in different books. In SAS they are arrayed in a column and called the first principal component "eigenvector".
• If the Xi variables have had their individual means subtracted off, then the new variable U1 is called the first principal component, or in most texts, the first principal component score.
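
A minimal sketch of this computation in R, assuming a numeric data matrix X (hypothetical name):

# First principal component by hand (sketch)
> S <- cov(X)                           # all pairwise covariances
> a1 <- eigen(S)$vectors[, 1]           # weights a11, ..., a1p
> sum(a1^2)                             # restriction: equals 1
> U1 <- scale(X, scale=FALSE) %*% a1    # score, with means subtracted off
> var(U1)                               # the largest achievable variance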
What’s Next?

• Look for weights a21, a22, …, a2p such that U2 = a21*X1 + a22*X2 + … + a2p*Xp has the next largest variance, subject to the restriction that a21² + a22² + … + a2p² = 1.

• The numbers a21, a22, …, a2p are called different things in different books. In SAS
they are arrayed in a column and called the second principal component
“eigenvector”.

• If the Xi variables have had their individual means subtracted off, then the new
variable U2 is called the second principal component, or in most texts, the
second principal component score.
What’s New?

• Any two arrays of weights will cross-multiply and sum to 0. Example: (a11·a21) + (a12·a22) + … + (a1p·a2p) = 0.
• This is the same as saying that any two of the new variables will be uncorrelated. Example: corr(U1, U2) = 0.
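
These two facts are easy to verify in R, continuing with the same hypothetical matrix X:

# Orthogonal weights, uncorrelated scores (sketch)
> A <- eigen(cov(X))$vectors        # eigenvectors as columns
> sum(A[, 1] * A[, 2])              # cross-products sum to 0
> U <- scale(X, scale=FALSE) %*% A  # all component scores
> cor(U[, 1], U[, 2])               # essentially 0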
How Far Does This Go?

• Until the original data are described adequately.

• We will look at two or three criteria for how many of these scores to
construct. We’ll start with our common sense.

• Most of the time it is not as hard as it might sound. Basically, we will look
at “how much variance” in the original data is summarized by each new
component variable.
Two Basic Constructs

• Weights (we used "a" to denote them).
  • Arrayed in columns and called "eigenvectors" in SAS output.
  • Come from looking at all pairwise covariances among the original p variables.
• Scores (we used "u" to denote them).
  • Called "principal components"; these are the new variables.
• Typically use weights for interpretation and development of subscales.
• Typically use scores for clustering and as a substitute for the original data.
Geometry

[Figure: points are projected onto the principal component direction through the mean; the distance of a projection from the mean is essentially the score.]
Recall

       Eigenvalue    Difference    Proportion    Cumulative
  1    4.19711750    3.52963341    0.8394        0.8394
  2    0.66748410    0.57285125    0.1335        0.9729
  3    0.09463284    0.05392125    0.0189        0.9918
  4    0.04071159    0.04065762    0.0081        1.0000
  5    0.00005397                  0.0000        1.0000

(Each proportion is the eigenvalue divided by the total variance; here the eigenvalues sum to 5, the number of standardized variables.)
Scree Plots

[Figure: scree plot for the Hospital Data, eigenvalue vs. number of components (1 to 5); the sharp elbow suggests two components.]
Loadings

Definition: Component loadings are the ordinary product-moment correlations between each original variable and each component score.

Interpretation: By looking at the component loadings one can ascertain which of the original variables tend to "load" on a given new variable. This may facilitate interpretation, creation of subscales, etc.
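
A minimal sketch in R, assuming a numeric data frame mydata (hypothetical name): loadings are just correlations between the original variables and the component scores.

# Component loadings as correlations (sketch)
> fit <- princomp(mydata, cor=TRUE)
> cor(mydata, fit$scores[, 1:2])   # correlation of each original
                                   # variable with each score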
Introduction to Segmentation
Segmentation
Each individual is so different that, ideally, we would want to reach out to each one of them in a different way.

Problem: the volume is too large for customization at the individual level.

Solution: identify segments in which people share similar characteristics, and target each of these segments in a different way.

Segmentation is for better targeting


Cluster Analysis
Example
Business Example
Consider a portfolio of 1,000 credit customers. The business wants to apply different strategies to different groups of people. How can the company group them into similar segments?

In this case we need profiling such as the following:

Total Population (1,000) splits into four profile groups:

• Group 1: avg. delinquency age = 15 days; avg. age = 33 yrs.; avg. utilization = 60%
• Group 2: avg. delinquency age = 12 days; avg. age = 25 yrs.; avg. utilization = 90%
• Group 3: avg. delinquency age = 0 days; avg. age = 35 yrs.; avg. utilization > 80%
• Group 4: avg. delinquency age = 75 days; avg. age = 50 yrs.; avg. utilization = 40%

We can exclude the group with avg. delinquency age = 75 days from the mailing.

This type of segmentation is known as 'Subjective Segmentation'. It gives the salient characteristics of the best customers.
Applications of Segmentation
Business Tasks & Appropriate Segmentation
Types of Segmentation
Models can be developed in the domains of risk, marketing, or collections.
• Customer Segmentation

 Value-Based Segmentation: customer ranking and segmentation according to current and expected/estimated customer value

 Life Stage Segmentation: segmentation according to the current life stage to which the customer belongs

 Loyalty Segmentation: segmentation according to current and previous value

 Behavioral Segmentation: customer segmentation based on behavioral attributes


Value Based Segmentation
Life Stage Segmentation
Loyalty Segmentation
There are 3 approaches to behavioral segmentation:

1. Rule-based (hypothesis driven)
   • Description: segment customers manually, based on 1 to 3 factors, to drive a specific business objective.
   • When to do: only a couple of factors are thought to drive the segments; a known hypothesis for cutting the data to create segments.
   • Suggested technique: cross-tabs and conditional data cuts.
   • Client example: a cable client segmented prospects on their potential telecom spend and used the segmentation to align sales resources and offers to improve go-to-market strategy.

2. Supervised (with a dependent variable)
   • Description: segment customers using a predictive algorithm, based on a high number of factors that potentially drive a specific outcome.
   • When to do: data-driven segments are desired, but first and foremost the segments need to be differentiated on a specific outcome/metric (e.g. revenue).
   • Suggested technique: CHAID.
   • Client example: a telecom client segmented customers on various factors that drive churn propensity and targeted high-churn segments with retention campaigns and offers.

3. Unsupervised (without a dependent variable)
   • Description: segment customers using a clustering algorithm based on a high number of factors.
   • When to do: data-driven segments are desired; segments need to be differentiated across many behavioral factors.
   • Suggested technique: TwoStep, K-means.
   • Client example: a retail client segmented customers on behavioral shopping factors that included category spend, shopping frequency/tendency, and store/channel shopped to inform merchandising and offer strategy.
RFM Segmentation
RFM Segmentation: Steps
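
A minimal sketch of a common RFM scoring recipe in R, assuming a transactions table trans with columns cust_id, txn_date (a Date), and amount (all hypothetical names): recency, frequency, and monetary value are scored into quintiles and concatenated into a segment code.

# RFM scoring (sketch)
> snapshot <- as.Date("2013-01-01")   # assumed analysis date
> r <- aggregate(txn_date ~ cust_id, trans,
+                function(d) as.numeric(snapshot - max(d)))
> f <- aggregate(txn_date ~ cust_id, trans, length)
> m <- aggregate(amount ~ cust_id, trans, sum)
> rfm <- Reduce(function(x, y) merge(x, y, by="cust_id"), list(r, f, m))
> names(rfm) <- c("cust_id", "recency", "frequency", "monetary")
# quintile scores, 1 (worst) to 5 (best); assumes distinct quantile breaks
> score <- function(x, rev=FALSE) {
+   q <- as.numeric(cut(x, quantile(x, 0:5/5), include.lowest=TRUE, labels=1:5))
+   if (rev) 6 - q else q
+ }
> rfm$R <- score(rfm$recency, rev=TRUE)   # low recency is good
> rfm$F <- score(rfm$frequency)
> rfm$M <- score(rfm$monetary)
> rfm$RFM <- paste0(rfm$R, rfm$F, rfm$M)  # e.g. "555" = best cell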
Behavioral Segmentation: Hierarchical vs. Non-hierarchical
Behavioral Segmentation: Subjective Segmentation-Cluster Analysis
Highest-value segment: Frequent

                                 Big Ticket  Frequent  Small Ticket  Infrequent  Returner  Overall
% Customers                          9.8        4.2       13.5          69.5        6.6      100.0
% Revenue                           27.4       33.6       15.4          13.5       10.1      100.0
Revenue per customer ($)           1,038      8,618      1,209.1        220      1,613.5   1,077.2
Visits per customer                  3.1       34.2       16.1           2.1        8.3        4.8
Basket size ($)                    970.1      252.7       75.1         105.2      165.1      224.8
Average departments shopped          3.6        5.5        1.9           1.2        2.9        1.9
Stores shopped                       1.1        3.0        1.8           1.1        1.2        1.7
Returning propensity (%)             0.3        6.5        5.5           0.3       25.5        3.2
Shopped in December (%)             15.1       70.8       53.3          19.4       23.3       26.6
Shopped on Memorial Day (%)          1.6       17.9        2.4           0.9        2.1        2.2
Shopped on Labor Day (%)             1.0       14.1        1.8           0.6        1.5        1.7
Shopped on President's Day (%)       0.7       12.0        1.8           0.6        1.8        1.5
Average Discount Rate (%)           14.8       11.4        6.6           4.5       10.6       11.2
Customer lifetime (months)          25.2       46.2       42.2          28.4       27.2       30.8

Note that key profile variables are not always the same as basis variables used
to generate the segmentation
Subjective Segmentation: Cluster Analysis Process
Step 1: Preparing the data set for analysis
Step 2: Selection of variables
Step 3: Data cleaning and creating new relevant variables
Step 4: Tackling the outliers
Step 5: Treatment of missing values
Step 6: Multicollinearity check
Step 7: Standardization
Step 8: Getting the cluster solution
Step 9: Checking the optimality of the solution
Process Flow for Cluster Analysis
Subjective Segmentation: K-Means Clustering Algorithm
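
A minimal sketch of the standardize-cluster-check loop in R (steps 7 to 9 of the process flow above), assuming a cleaned numeric data frame mydata (hypothetical name); the elbow in the within-cluster sum of squares guides the choice of k.

# K-means with an elbow check (sketch)
> z <- scale(mydata)                  # step 7: standardization
> set.seed(42)                        # k-means starts are random
> wss <- sapply(2:8, function(k)
+               kmeans(z, centers=k, nstart=25)$tot.withinss)
> plot(2:8, wss, type="b", xlab="Number of clusters",
+      ylab="Total within-cluster SS")    # step 9: look for the elbow
> fit <- kmeans(z, centers=4, nstart=25)  # step 8: chosen solution
> table(fit$cluster)                      # cluster sizes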
Objective Segmentation: Decision Trees
Root: N in node: 50,000; average: 0.4
Split on: Average unit price of wine purchase in Sep to Nov 2012

  <= 980: N in node: 5,658; average: 0.7
    Split on: Age
      <= 46: N in node: 2,283; average: 0.0
      > 46: N in node: 3,375; average: 1.2
        Split on: Total number of items bought in the sub-product level wine in Aug to Sep 2012
          <= 4: N in node: 916; average: 2.4
          > 4: N in node: 2,459; average: 0.7

  > 980: N in node: 1,206; average: 5.8

  No purchase: N in node: 42,229; average: 0.2
    Split on: Age
      <= 46: N in node: 17,619; average: 0.0
      > 46: N in node: 24,610; average: 0.4
        Split on: Total number of wine transactions in Aug to Sep 2012
          1,2,3: N in node: 1,187; average: 1.6
          4,5,6,7,11: N in node: 23,425; average: 0.3
            Split on: Average unit price of liquor purchase in Sep to Nov 2012
              <= 457.778: N in node: 18,874; average: 0.3
              (457.778, 1550]: N in node: 3,585; average: 0.1
              > 1550: N in node: 964; average: 1.0
Decision Tree Example: Business Rules
Business rule statistics and description

Rule1: propensity to buy = 5.80; % customers = 2.5
  Average unit price of wine purchase in Sep to Nov 2012 = >980

Rule2: propensity to buy = 2.40; % customers = 1.9
  Average unit price of wine purchase in Sep to Nov 2012 = <=980; Age = >46; Total number of items bought in sub-product level wine in Aug to Sep 2012 = <=4

Rule3: propensity to buy = 1.60; % customers = 2.4
  No wine purchase in Sep to Nov 2012; Age = >46; Total wine transactions in Sep to Nov 2012 = 1,2,3

Rule4: propensity to buy = 1.04; % customers = 2.0
  No wine purchase in Sep to Nov 2012; Age = >46; Total wine transactions in Sep to Nov 2012 = 4,5,6,7,11; Average unit price of liquor purchase in Sep to Nov 2012 = >1,550

Rule5: propensity to buy = 0.69; % customers = 5.0
  Average unit price of wine purchase in Sep to Nov 2012 = <=980; Age = >46; Total number of items bought in sub-product level wine in Aug to Sep 2012 = >4

Rule6: propensity to buy = 0.31; % customers = 38.4
  No wine purchase in Sep to Nov 2012; Age = >46; Total wine transactions in Sep to Nov 2012 = 4,5,6,7,11; Average unit price of liquor purchase in Sep to Nov 2012 = <=457.778

Rule7: propensity to buy = 0.14; % customers = 7.3
  No wine purchase in Sep to Nov 2012; Age = >46; Total wine transactions in Sep to Nov 2012 = 4,5,6,7,11; Average unit price of liquor purchase in Sep to Nov 2012 = (457.778, 1550]

Rule8: propensity to buy = 0.0 (from the corresponding tree node); % customers = 35.9
  No wine purchase in Sep to Nov 2012; Age = <=46

Rule9: propensity to buy = 0.0 (from the corresponding tree node); % customers = 4.7
  Average unit price of wine purchase in Sep to Nov 2012 = <=980; Age = <=46
Decision Trees: CHAID Segmentation
CHAID Algorithm
Q&A
Sample ‘R’ Codes
Principal Component Analysis in R
To find the best low-dimensional representation of the variation in a multivariate data set

# Principal Components Analysis
# entering raw data and extracting PCs
# from the correlation matrix
> fit <- princomp(mydata, cor=TRUE)
> summary(fit) # print variance accounted for
> loadings(fit) # pc loadings
> plot(fit, type="lines") # scree plot
> fit$scores # the principal component scores
> biplot(fit) # joint plot of scores and loadings
Clustering in R
Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in
some sense or another) to each other than to those in other groups (clusters)

Hierarchical clustering is a method of cluster analysis which seeks to build a hierarchy of clusters

# Hierarchical Clustering
> d <- dist(mydata, method="euclidean") # distance matrix
> fit <- hclust(d, method="ward.D2") # Ward's minimum-variance method
> plot(fit) # display dendrogram
> groups <- cutree(fit, k=5) # cut tree into 5 clusters

# draw dendrogram with red borders around the 5 clusters
> rect.hclust(fit, k=5, border="red")
Clustering in R
A method of unsupervised clustering in which you fix the number of clusters prior to clustering.

# K-Means Cluster Analysis
> set.seed(123) # starting centers are random; fix the seed for reproducibility
> fit <- kmeans(mydata, 5) # 5-cluster solution
# get cluster means
> aggregate(mydata, by=list(fit$cluster), FUN=mean)
# append cluster assignment
> mydata <- data.frame(mydata, fit$cluster)
Decision Tree in R
It helps us explore the structure of a data set while developing easy-to-visualize decision rules for predicting a categorical (classification tree) or continuous (regression tree) outcome.

# Classification Tree with rpart


> library(rpart)

# grow tree
> fit <- rpart(Kyphosis ~ Age + Number + Start,
+ method="class", data=kyphosis)
# plot tree
> plot(fit, uniform=TRUE,
+ main="Classification Tree for Kyphosis")
> text(fit, use.n=TRUE, all=TRUE, cex=.8)

# Conditional Inference Tree for Kyphosis


# No pruning required
> library(party)
> fit <- ctree(Kyphosis ~ Age + Number + Start,
+ data=kyphosis)
> plot(fit, main="Conditional Inference Tree for Kyphosis")
Contact us
Visit us on: http://www.analytixlabs.in/

For course registration, please visit: http://www.analytixlabs.co.in/course-registration/

For more information, please contact us: http://www.analytixlabs.co.in/contact-us/


Or email: info@analytixlabs.co.in

Call us; we would love to speak with you: (+91) 88021-73069

Join us on:
Twitter - http://twitter.com/#!/AnalytixLabs
Facebook - http://www.facebook.com/analytixlabs
LinkedIn - http://www.linkedin.com/in/analytixlabs
Blog - http://www.analytixlabs.co.in/category/blog/
