RSM8522-2024 2 Market Segmentation and Targeting Via RFM
Sridhar Moorthy
Rotman School of Management
University of Toronto
©Moorthy 2024
Last class …
• The Pilgrim Bank case showed that
― consumers are heterogeneous.
― 80% of profits might be generated by 20% of consumers
• Taking a long-term customer equity perspective means moving from customer profits to
customer lifetime value (CLV)
• We can use CLV calculations to evaluate and plan marketing actions for acquisition,
retention, and development.
― For example, in the Tuscan Lifestyles example, we saw how we can evaluate whether a marketing
action geared toward increasing acquisitions is worth pursuing from a CLV perspective
Today we will begin our discussion of how to do market segmentation,
starting with RFM
• We already know from the Pilgrim Bank case that not all customers are equally
profitable.
• But can we identify which customers will be more profitable, i.e., identify observable
variables that predict profitability?
• If the answer is yes, then we can segment the market based on those observable predictors
and target only those customers who are likely to be most profitable.
― This targeting strategy is likely to yield higher profits than “mass marketing.”
There are many ways to segment markets, as you learnt in RSM 8901
What does RFM stand for?
R: Recency of last purchase (“how long ago was your last purchase”)
F: Frequency of past purchases in a given time-period
M: Monetary value of past purchases in a given time-period
• RFM is based on past behavior. It can only be used to segment customers for whom R, F, and M data
are available. In other words, RFM segmentation is potentially useful for developmental marketing,
rather than acquisition marketing.
• RFM is based on the premise that customers with a “high” RFM index—made a purchase recently, buy
often, and spend a lot—are our best targets for (development) marketing efforts.
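As a rough illustration of the three definitions above, the following R sketch derives R, F, and M from a small, invented transaction log. The data frame tx and its columns are hypothetical; the output columns are named last, purch, and total_ only to mirror the bookstore data used later, and treating M as total (rather than average) spend is an assumption.

```r
# Hypothetical transaction log: one row per purchase (all values invented).
tx <- data.frame(
  customer   = c("A", "A", "A", "B", "C", "C"),
  months_ago = c(2, 7, 11, 9, 1, 3),       # months between the purchase and today
  amount     = c(40, 25, 35, 120, 15, 20)  # dollars spent on that purchase
)

# R: months since the most recent purchase (smaller = more recent).
recency <- tapply(tx$months_ago, tx$customer, min)
# F: number of purchases in the observation window.
frequency <- tapply(tx$amount, tx$customer, length)
# M: total dollars spent in the observation window (assumption: total, not average).
monetary <- tapply(tx$amount, tx$customer, sum)

rfm <- data.frame(
  customer = names(recency),
  last     = as.numeric(recency),
  purch    = as.numeric(frequency),
  total_   = as.numeric(monetary)
)
print(rfm)
```

Only customers who appear in the log get a row, which is why RFM applies to existing customers rather than to prospects.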
[Diagram: recency (more recent), frequency (more frequent), and monetary value (larger purchase amount) each have a positive (+) effect on the likelihood of buying again in response to marketing effort.]
Empirical evidence for the “recency effect”: meal preparation service
Let us see how we can do an RFM analysis in an actual case-study.
A bookstore wants to use RFM analysis to decide whom to target with a
book offer
• Stan Lawton, marketing director, pulls a random sample of 50,000 customers from the
database and mails the book in question, Art History of Florence, to the sample
― 4,522 customers out of 50,000 end up buying the book (“buyer”)
• Based on this test Lawton wants to determine which of the remaining 500,000 customers
in his database should be approached with the Art History of Florence offer.
We will first explore the assumptions of RFM analysis …
In the data set we have a response variable and RFM variables
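Response: buyerdummy (1 if the customer bought the book, 0 otherwise)
Recency: last (months since last purchase)
Frequency: purch
Monetary value: total_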
The dataset looks like this …
• psych: we will use describeBy from this package to create summary statistics by category
Do buyers and non-buyers differ on R, F, and M variables?
Indeed:
Are the RFM variables independent?
[Output not reproduced: correlations among the RFM variables, with p-values (statistical significance).]
How to segment customers based on RFM profile: 3 methods
1. Seat-of-the-pants
• Example: one-time buyers vs. repeat-buyers, people
who bought less than 6 months ago, 6mo-1yr ago,
more than 1yr ago, etc.
2. Independent N-tile
• Classify customers into recency quintile/decile
• Independently classify customers into frequency
quintile/decile
• Independently classify customers into monetary value
quintile/decile
• For each customer, aggregate the three indices
3. Sequential N-tile
• Classify customers into recency quintile/decile
• Within each recency quintile/decile, classify customers into
frequency quintile/decile
• Within each recency-frequency quintile/decile, classify
customers into monetary quintile/decile
• For each customer, aggregate the three indices
The three methods compared
1. The seat-of-the-pants approach is the easiest, but it relies on intuition—which is not always reliable.
2. The independent N-tile approach, being based on data, is likely to yield better predictions than seat-
of-the-pants. Also:
― Quite easy to execute: only 3 sorts required.
― Interpretation of the three RFM components is unambiguous: for example, a frequency score of 5 for one
customer means the same as a frequency score of 5 for another customer, regardless of their recency scores.
― However: in small samples, especially with skewed distributions, it might result in empty cells and an uneven
distribution of aggregate RFM scores.
3. The sequential N-tile approach is likely to yield the best predictions because of finer sorting:
― The finer sorting also tends to produce a more even distribution of aggregate RFM scores, especially when the
underlying distributions are skewed
― However: Harder to execute than independent n-tiles: for example, with quintiles at each stage, you need to do
one sort for R, 5 sorts for F, and 25 sorts for M.
― Also: the frequency and monetary scores are harder to interpret. For example, a frequency rank of 5 for a
customer with a recency rank of 5 may not mean the same thing as a frequency rank of 5 for a customer with a
recency rank of 4, since the frequency rank is dependent on the recency rank.
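The interpretation point above can be seen in a small R sketch. It uses invented data and two tiles instead of five to keep the example short; the helper tile() and the column names rec, freq_indep, and freq_seq are hypothetical, and the bin numbers are raw (no reordering applied).

```r
# Invented data: months since last purchase and number of past purchases.
data <- data.frame(last = 1:8, purch = c(6, 7, 8, 9, 1, 2, 3, 6))

# Hypothetical helper: split a variable into two tiles at its median.
halves <- c(0, 0.5, 1)
tile <- function(x) {
  .bincode(x, quantile(x, probs = halves), right = TRUE, include.lowest = TRUE)
}

data$rec        <- tile(data$last)
data$freq_indep <- tile(data$purch)                     # one sort over all customers
data$freq_seq   <- ave(data$purch, data$rec, FUN = tile) # one sort inside each recency tile
print(data)
```

Customers 1 and 8 both have purch = 6. Under the independent method they receive the same frequency tile; under the sequential method they receive different tiles, because each is ranked only against customers in the same recency tile.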
Let us execute the independent n-tiles method on our bookstore data.
The dataset looks like this …
The .bincode function assigns each observation the number of the bin (here, the quintile) to which its
value of last belongs. To change from quintiles to deciles (10 segments), specify probs = seq(0, 1, 0.1).
The vector rec_iq has the same length as last and contains the numbers 1 to 5, with 1
referring to the most recent group (i.e., the smallest values of last).
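A minimal sketch of the recency assignment, following the same .bincode pattern this deck uses for frequency. The ten values of last are invented stand-ins for the bookstore data, and breaks is a hypothetical helper variable.

```r
# Toy stand-in for the bookstore data: months since last purchase (invented values).
data <- data.frame(last = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10))

# Quintile cut-offs for last, then the bin (1 to 5) each customer falls into.
breaks <- quantile(data$last, probs = seq(0, 1, 0.2))
data$rec_iq <- .bincode(data$last, breaks, right = TRUE, include.lowest = TRUE)

print(table(data$rec_iq))
```

Because bin 1 holds the smallest values of last, it already corresponds to the most recent buyers and needs no reordering.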
describeBy(data$last, data$rec_iq): summary stats for last by
recency quintile
Visually: ggplot(data = data, aes(x = rec_iq, y = last)) + geom_bar(stat = "summary", fun = "mean")
Note: Quintile 1 groups consumers with the lowest number of months since last purchase,
which is what we want
Are the most recent purchasers most likely to buy?
Yes!
Next we assign people to frequency quintiles
data$freq_iq <- .bincode(data$purch, quantile(data$purch, probs = seq(0, 1, 0.2)), right = TRUE, include.lowest = TRUE)
ggplot(data = data, aes(x = freq_iq, y = purch)) + geom_bar(stat = "summary", fun = "mean")
Why are there only four quintiles? Looking at the histogram of purch
provides a clue
ggplot(data=data,aes(x=purch)) + geom_histogram()
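One plausible reading of the clue: when many customers share the same low purchase count, two or more quintile cut-offs coincide, and the bin between tied cut-offs stays empty. The R sketch below shows the mechanism on invented purchase counts; it is an illustration, not the bookstore data.

```r
# Invented purchase counts with many one-time buyers (a right-skewed distribution).
purch <- c(1, 1, 1, 1, 1, 1, 2, 3, 4, 5)

breaks <- quantile(purch, probs = seq(0, 1, 0.2))
print(breaks)                     # several cut-offs coincide at the lowest value
print(anyDuplicated(breaks) > 0)  # TRUE when two or more cut-offs are tied

freq_bins <- .bincode(purch, breaks, right = TRUE, include.lowest = TRUE)
print(table(freq_bins))           # fewer than five bins are populated
```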
Are the most frequent purchasers, the ones with the highest probability of
purchase, in the first quintile?
ggplot(data = data, aes(x = freq_iq, y = buyerdummy)) + geom_bar(stat = "summary", fun = "mean")
No!
Reorder freq_iq so that group 1 is most likely to buy
data$freq_iq <- 6 - data$freq_iq
ggplot(data = data, aes(x = freq_iq, y = buyerdummy)) + geom_bar(stat = "summary", fun = "mean")
Yes!
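Finally, we assign people to monetary value quintiles and, anticipating the
same issue as with frequency, reorder the bin numbers …
# Assign quintiles (same pattern as for recency and frequency)
data$mv_iq <- .bincode(data$total_, quantile(data$total_, probs = seq(0, 1, 0.2)), right = TRUE, include.lowest = TRUE)
# Reorder
data$mv_iq <- 6 - data$mv_iq
ggplot(data = data, aes(x = mv_iq, y = total_)) + geom_bar(stat = "summary", fun = "mean")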
Are people in group 1 most likely to buy?
ggplot(data = data, aes(x = mv_iq, y = buyerdummy)) + geom_bar(stat = "summary", fun = "mean")
Yes!
Aggregate the individual R, F, and M indices into a composite RFM
index and predict average response rate by this composite index
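A minimal sketch of this aggregation step, on invented data. The composite formula mirrors the one this deck shows for the sequential method; the names rfmindex_iq and RFM_response_iq are assumed by analogy with rfmindex_sq and RFM_response_sq and may differ from the original code.

```r
# Toy data: quintile indices already assigned, plus the 0/1 response (invented values).
data <- data.frame(
  rec_iq     = c(1, 1, 1, 5, 5, 5),
  freq_iq    = c(1, 1, 1, 5, 5, 5),
  mv_iq      = c(2, 2, 2, 4, 4, 4),
  buyerdummy = c(1, 1, 0, 0, 0, 1)
)

# Composite 3-digit index: R in the hundreds place, F in the tens, M in the units.
data$rfmindex_iq <- 100 * data$rec_iq + 10 * data$freq_iq + data$mv_iq

# Average response rate within each RFM cell, attached to every customer in that cell.
data$RFM_response_iq <- ave(data$buyerdummy, data$rfmindex_iq, FUN = mean)

print(unique(data[, c("rfmindex_iq", "RFM_response_iq")]))
```

Since buyerdummy is 0/1, its mean within a cell is that cell's observed response rate.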
Relationship between response rate and RFM index
Calculate the break-even response rate using the cost of the marketing
effort and the bookstore’s margin
describeBy(data$buyerdummy, data$target_iq)
## group: 0
##    vars     n mean   sd median trimmed mad min max range skew kurtosis se
## X1    1 26732 0.05 0.21      0       0   0   0   1     1 4.29     16.4  0
## --------------------------------------------------------
## group: 1
##    vars     n mean   sd median trimmed mad min max range skew kurtosis se
## X1    1 23268 0.14 0.35      0    0.05   0   0   1     1 2.07     2.28  0
Average response rate of the target is higher than the break-even rate and higher than the
response rate for the entire sample (.0904)
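The break-even rate itself can be sketched as follows, assuming the dollar figures that appear in this deck's profit calculation: an $18 price, $9 and $3 treated as per-sale costs (an assumption about what those figures represent), and $0.50 per mailing. A mailing breaks even when response rate × margin equals the cost per mailing. Variable names are hypothetical.

```r
# Dollar figures taken from the profit calculation in this deck.
price         <- 18
per_sale_cost <- 9 + 3   # assumption: both figures are costs incurred per book sold
mail_cost     <- 0.5     # cost of mailing one offer

margin     <- price - per_sale_cost
break_even <- mail_cost / margin   # response rate at which a mailing just pays for itself
print(round(break_even, 4))

# Response rates reported in the deck, compared with the break-even rate.
target_rate <- 0.1405
sample_rate <- 0.0904
print(c(target = target_rate > break_even, whole_sample = sample_rate > break_even))
```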
Visualization of targeting strategy
# of prospects to target and number of expected buyers
# Calculate the number of mails sent under the targeting policy.
exp_mail_iq <- 500000 * mean(data$target_iq)
Mailing to 46.54% of the database means mailing to 500,000 × 46.54% = 232,680 prospects.
# Calculate the expected number of responses under the targeting policy.
exp_res_iq <- exp_mail_iq * mean(data[data$target_iq == 1, ]$buyerdummy)
From which we expect to net 14.05% × 232,680 = 32,692 buyers.
Profits under RFM-based targeting strategy
Suppose, instead, we had not followed the RFM approach and sent the
offer to all 500,000 consumers, i.e., no targeting
• Our response rate would have been the average response rate for the entire sample (9.04%).
• Our expected number of buyers would have been 9.04% × 500,000 = 45,200.
And our profits and ROI would have been …
• Net profit: ($18 − $9 − $3) × 45,200 − $0.50 × 500,000 = $21,200 (instead of $79,812 with RFM
targeting)
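The two profit figures can be reproduced with a short R sketch that uses only numbers stated in this deck. The function profit() and the variable names are hypothetical.

```r
margin    <- 18 - 9 - 3   # dollars earned per book sold
mail_cost <- 0.5          # dollars per offer mailed

# Net profit = margin on buyers minus the cost of every mailing sent.
profit <- function(n_mailed, n_buyers) margin * n_buyers - mail_cost * n_mailed

profit_rfm <- profit(n_mailed = 232680, n_buyers = 32692)  # RFM targeting
profit_all <- profit(n_mailed = 500000, n_buyers = 45200)  # no targeting

print(c(rfm_targeting = profit_rfm, no_targeting = profit_all))
```

Targeting reaches fewer buyers (32,692 versus 45,200) but avoids more than half of the mailing cost, which is where the profit difference comes from.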
Step 1: assign R, F, and M indices to each customer based on quintiles or deciles
― Sort customer database from “best” to “worst” on the variable (here “best” refers to variable
value that corresponds to the highest probability of purchase)
― Decide how many segments to classify customers into; then calculate “cut-off values” for the
variable that demarcate each segment
• Five segments: quintiles; ten segments: deciles.
― Classify each customer into the quintile or decile s/he belongs to, using the cut-off values
calculated in the previous step
― Confirm that customers in the “top group” have the highest probability of buying. If not, reverse
the index so that they do.
Step 2: combine the three indices into a composite 3-digit index
2. Assign every customer a 3-digit composite index, e.g., 125, 555, etc., based on his/her
recency, frequency, and monetary value indices
― For example, with quintile classification in the previous step, a customer who was in the top-most
quintile on recency, the second quintile on frequency, and the bottom quintile on monetary value
would get a composite index of 125.
― With quintiles for R, F, and M, we will normally end up with 125 segments; with deciles we will
normally have 1000 segments.
― Given the classification rule above, a customer with a composite index 125 should have a higher
probability of buying than a customer with index 555.
Step 3: estimate the average response rate for each RFM cell
3. This is the percentage of customers within each 3-digit composite index who responded to the
marketing action (e.g., catalog mailing)
― average of 0/1 outcome variable (not respond/respond)
Step 4: calculate the break-even response rate for the marketing initiative
Step 5: target the marketing initiative to the RFM cells with average
response rate greater than the breakeven rate
5. Select the 3-digit segments with average response rate above the break-even response
rate and target the offer only to customers in those groups
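Step 5 might be applied to the rest of the database roughly as follows. The cell response rates and the rollout customers are invented for illustration, the break-even rate uses this deck's cost and margin figures, and the names cell_rates, target_cells, and rollout are hypothetical.

```r
# Cell-level response rates estimated from the test mailing (invented values).
cell_rates <- c("111" = 0.21, "125" = 0.10, "345" = 0.07, "555" = 0.02)
break_even <- 0.5 / 6   # mailing cost divided by margin

# Cells worth mailing: average response rate above the break-even rate.
target_cells <- names(cell_rates)[cell_rates > break_even]

# Hypothetical customers from the rest of the database, already assigned an RFM index.
rollout <- data.frame(id = 1:5, rfmindex = c(111, 555, 125, 345, 111))
rollout$target <- as.integer(as.character(rollout$rfmindex) %in% target_cells)
print(rollout)
```

The remaining customers must be binned with the same cut-off values as the test sample for the cell labels to be comparable.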
Now let us turn to the sequential n-tile approach
Sample code for implementing sequential RFM
# R index (same as before)
data$rec_sq <- .bincode(data$last, quantile(data$last, probs = seq(0, 1, 0.2)), right = TRUE, include.lowest = TRUE)
# F index (not the same as before: computed within each recency quintile)
data$freq_sq <- 0
for (i in 1:5) {
  data[data$rec_sq == i, ]$freq_sq <- .bincode(data[data$rec_sq == i, ]$purch,
    quantile(data[data$rec_sq == i, ]$purch, probs = seq(0, 1, 0.2)), right = TRUE, include.lowest = TRUE)
}
# M index (not the same as before: computed within each recency-frequency cell)
data$mv_sq <- 0
for (i in 1:5) {
  for (j in 1:5) {
    data[data$rec_sq == i & data$freq_sq == j, ]$mv_sq <- .bincode(data[data$rec_sq == i & data$freq_sq == j, ]$total_,
      quantile(data[data$rec_sq == i & data$freq_sq == j, ]$total_, probs = seq(0, 1, 0.2)), right = TRUE, include.lowest = TRUE)
  }
}
The composite RFM index for sequential RFM is calculated in the same
way as before
data$rfmindex_sq <- 100 * data$rec_sq + 10 * data$freq_sq + data$mv_sq
ggplot(data = data, aes(x = as.factor(rfmindex_sq), y = buyerdummy)) +
  geom_bar(stat = "summary", fun = "mean") +
  labs(title = "% Buyers by RFM Index (Sequential)",
       x = "RFM Index", y = "% Buyer")
The targeting decision is again made in the same way as before
# whether to target
data$target_sq[data$RFM_response_sq > break_even] <- 1
data$target_sq[data$RFM_response_sq <= break_even] <- 0
Profitability of targeting using sequential RFM
Comparing the three methods: no targeting, independent n-tile,
sequential n-tile
RFM is widely used in industry
• For example, FedEx used RFM analysis “to separate growing customers with additional
upside potential from those who had reached the limit of their growth. The key differences
between the two clusters, such as sales contact rates, automation status, discounts levels,
etc. could then be worked into promotional programs designed to continue to grow these
customers with untapped upside potential.” (Sellers and Hughes 2009)
Note: how the “recency effect” operates depends on product category
• Strongest for frequently-purchased goods, e.g., things you buy in a grocery store.
― For these products, if your last purchase was a long time ago (longer than the inter-purchase time
in the category), then it is likely that “you have moved on” and are unlikely to be a buyer.
Essentially, you have to weigh recency of purchase against the normal interpurchase
time in the category.
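One hedged way to express this weighing is to divide months since last purchase by the category's typical inter-purchase time. All values below are invented, and the threshold of 1 is an illustrative assumption rather than a rule from this deck.

```r
# Months since last purchase for three hypothetical customers.
last <- c(1, 4, 14)
# Assumed typical inter-purchase time, in months, for each customer's category.
ipt <- c(2, 6, 6)

# A ratio above 1 means the customer has gone longer than one normal purchase cycle.
recency_ratio <- last / ipt
likely_lapsed <- recency_ratio > 1

print(data.frame(last, ipt, recency_ratio = round(recency_ratio, 2), likely_lapsed))
```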
Limitations of RFM analysis
• Being based solely on “behavior,” it doesn’t use any other information, such as
demographic characteristics of the consumer, or information on the purchase
environment in the past (such as prices), which, if incorporated, could lead to better
predictions.
Takeaways