BBBC Presentation
BOOKBINDERS BOOK CLUB
• Currently has a database of 500,000 readers and sends a mailing once a month
• Plans to implement predictive models to improve the efficacy of its direct mail program
CONTENT
1 CASE SETTING
2 DATA
3 PROBLEM STATEMENT
4 THREE MODEL ANALYSIS
5 PROFIT ANALYSIS
6 CONCLUSIONS AND RECOMMENDATIONS
2 DATA
OVERALL RESPONDENTS: 20,000 customers in Pennsylvania, New York and Ohio who received a special brochure for The Art History of Florence along with their regular mail and had a 9.03% response rate for the book purchase.
PREDICT: 1,600 customers (1,200 NOT PURCHASED, 400 PURCHASED)
VALIDATE: 2,300 customers
* THE CASE SUGGESTS THREE MODELS OF ANALYSIS
DEPENDENT VARIABLE:
CHOICE: Whether the customer purchased The Art History of Florence. 1 corresponds to a purchase and 0 corresponds to a non-purchase.
INDEPENDENT VARIABLES:
GENDER, AMOUNT PURCHASED, FREQUENCY, LAST PURCHASE, FIRST PURCHASE, P_CHILD, P_YOUTH, P_COOK, P_DIY, P_ART
THREE MODELS
RFM REGRESSION LOGIT
CASE DATA HOLD OUT DATA CASE DATA HOLD OUT DATA CASE DATA HOLD OUT DATA
RFM
RFM analysis is a marketing technique used to determine quantitatively which customers are the best ones by examining how recently a customer has purchased (recency), how often they purchase (frequency), and how much the customer spends (monetary).
• Assign scores to each variable (1-12)
• Sum total scores and rank by score
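The two steps above can be sketched as follows. The slides do not say how the 1-12 scores are assigned, so this sketch assumes equal-count rank bins, with 12 as the best score and a more recent purchase scoring higher. The field names (`months_since_last_purchase`, `frequency`, `amount_purchased`) are hypothetical.

```python
def score_1_to_12(values, higher_is_better=True):
    """Assign each value a score from 1 to 12 using equal-count rank bins (assumed scheme)."""
    n = len(values)
    # Order indices from worst to best so that the best customers land in bin 12.
    order = sorted(range(n), key=lambda i: values[i], reverse=not higher_is_better)
    scores = [0] * n
    for rank, idx in enumerate(order):
        scores[idx] = rank * 12 // n + 1
    return scores


def rfm_rank(customers):
    """Sum the three scores per customer and return customers sorted best first."""
    # Recency: fewer months since the last purchase is treated as better (assumption).
    r = score_1_to_12([c["months_since_last_purchase"] for c in customers],
                      higher_is_better=False)
    f = score_1_to_12([c["frequency"] for c in customers])
    m = score_1_to_12([c["amount_purchased"] for c in customers])
    scored = [dict(c, rfm_score=r[i] + f[i] + m[i]) for i, c in enumerate(customers)]
    return sorted(scored, key=lambda c: c["rfm_score"], reverse=True)
```

With this scheme the total score ranges from 3 to 36, and the ranked list is what gets cut into deciles.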
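DECILE COLUMN: Customers are ranked by score and split into ten equal groups: 1,600/10 = 160 (case data) and 2,300/10 = 230 (hold out data).
MAIL COLUMN: Number of customers mailed in each decile.
C. MAIL COLUMN: Cumulative number of customers mailed.
RESPONSE COLUMN: Based on the score ranking, we count the ‘CHOICE’ observations of each decile.
C. RESPONSE COLUMN: Cumulative count of responses.
PERCENTAGE COLUMN: Response divided by Mail for each decile.
C. PERCENTAGE COLUMN: Cumulative Response divided by cumulative Mail.
INDEX COLUMN: Represents the response % per decile, divided by the response % for all the deciles.
INDEX = 37% / 9%
C. INDEX COLUMN: Represents the cumulative response % per decile, divided by the response % for all the deciles.
C. INDEX = 37% / 9%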
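A minimal sketch of how these columns could be computed, assuming the input is the list of 0/1 ‘CHOICE’ values already sorted by score (best first), at least ten customers and at least one purchase. The function name and dictionary keys are illustrative, not taken from the slides.

```python
def decile_table(choices_ranked):
    """Build one row per decile from 0/1 CHOICE values sorted by score, best first."""
    n = len(choices_ranked)
    overall_rate = sum(choices_ranked) / n  # response % for all the deciles
    rows, cum_mail, cum_resp = [], 0, 0
    for d in range(10):
        start, end = d * n // 10, (d + 1) * n // 10
        mail = end - start                         # MAIL
        response = sum(choices_ranked[start:end])  # RESPONSE
        cum_mail += mail
        cum_resp += response
        rows.append({
            "decile": d + 1,
            "mail": mail,
            "c_mail": cum_mail,
            "response": response,
            "c_response": cum_resp,
            "percentage": response / mail,
            "c_percentage": cum_resp / cum_mail,
            "index": (response / mail) / overall_rate,
            "c_index": (cum_resp / cum_mail) / overall_rate,
        })
    return rows
```

For the slide's own example, a decile percentage of 37% against an overall 9% gives an index of roughly 4.1. In the last row the cumulative index is always 1, because the cumulative percentage equals the overall rate.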
• Build gain charts
[Figure: RFM gain charts for CASE DATA and HOLD OUT DATA, plotting Cumulative Baseline against Cumulative Actual Response.]
✓ NOT a good model, since it does not perform better than random selection: the cumulative response hits fall below the 50% baseline that was set.
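The two curves in a gain chart can be sketched as below, assuming the baseline is the number of responses expected from mailing customers in random order and that the input is the 0/1 ‘CHOICE’ list sorted by model score. The `beats_baseline` check is a deliberately strict illustration, not the criterion used in the slides.

```python
from itertools import accumulate


def gain_curves(choices_ranked):
    """Return (cumulative baseline, cumulative actual response) for a ranked 0/1 list."""
    n = len(choices_ranked)
    total = sum(choices_ranked)
    # Actual: running count of purchases as we mail down the ranked list.
    actual = list(accumulate(choices_ranked))
    # Baseline: purchases expected after mailing k customers chosen at random.
    baseline = [total * (k + 1) / n for k in range(n)]
    return baseline, actual


def beats_baseline(choices_ranked):
    """True if the actual curve never drops below the baseline (strict, assumed test)."""
    baseline, actual = gain_curves(choices_ranked)
    return all(a >= b for a, b in zip(actual, baseline))
```

A ranking that puts buyers first, such as `[1, 1, 0, 0]`, stays above the baseline, while `[0, 0, 1, 1]` falls below it.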
REGRESSION
• Run the model with ‘Choice’ as the dependent variable
• Choose all the other variables as independent variables
✓ GENDER: the coefficient is negative, so the probability of purchase is higher for female (0) customers.
✓ AMOUNT PURCHASED: the coefficient is 0.000, so the variable has no impact on the choice.
✓ FREQUENCY: the higher the frequency, the lower the probability of purchase.
✓ LAST PURCHASE: the longer the time since the last purchase, the higher the probability of purchase.
✓ FIRST PURCHASE: not significant.
✓ P_ART: the coefficient is positive (+), meaning the probability of purchase is higher for customers with Art book purchases than for those with other types of books.
• Substitute the values of each observation in the REG equation
• Rank ‘Predicted Choice’ in decreasing order
PROBABILITY OF CHOICE =
0.36 - 0.13 * (GENDER) + 0.00 * (AMOUNT PURCHASED)
- 0.009 * (FREQUENCY) + 0.097 * (LAST PURCHASE)
- 0.002 * (FIRST PURCHASE) - 0.126 * (P_CHILD)
- 0.096 * (P_YOUTH) - 0.141 * (P_COOK)
- 0.135 * (P_DIY) + 0.118 * (P_ART)
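A minimal sketch of the substitution and ranking steps, using the rounded coefficients from the equation above. The dictionary keys are hypothetical column names, and each customer is assumed to be a dict holding one numeric value per variable.

```python
# Rounded coefficients as printed on the slide; keys are hypothetical column names.
REG_INTERCEPT = 0.36
REG_COEFFICIENTS = {
    "GENDER": -0.13,
    "AMOUNT_PURCHASED": 0.00,
    "FREQUENCY": -0.009,
    "LAST_PURCHASE": 0.097,
    "FIRST_PURCHASE": -0.002,
    "P_CHILD": -0.126,
    "P_YOUTH": -0.096,
    "P_COOK": -0.141,
    "P_DIY": -0.135,
    "P_ART": 0.118,
}


def predicted_choice(customer):
    """Substitute one customer's values into the linear regression equation."""
    return REG_INTERCEPT + sum(
        coef * customer[name] for name, coef in REG_COEFFICIENTS.items()
    )


def rank_by_predicted_choice(customers):
    """Sort customers by 'Predicted Choice' in decreasing order."""
    return sorted(customers, key=predicted_choice, reverse=True)
```

Because this is a linear model, the predicted value is not bounded between 0 and 1, so it is best read as a ranking score.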
✓ Customers in the first deciles have higher scores.
✓ Percentage column (Response/Mail) represents the % of customers that purchased. In the first decile, 118 out of 160 purchased the book (CASE DATA) and 86 out of 230 purchased the book (HOLD OUT DATA).
[Figure: REGRESSION gain charts for CASE DATA and HOLD OUT DATA, plotting Cumulative Baseline against Cumulative Actual Response.]
LOGIT
• Run the model with ‘Choice’ as the dependent variable
• Choose all the other variables as independent variables
• Substitute the values of each observation in the LOGIT equation
• Rank ‘Response Prob’ in decreasing order
SCORE =
0.35 - 0.86 * (GENDER) + 0.00 * (AMOUNT PURCHASED)
- 0.08 * (FREQUENCY) + 0.61 * (LAST PURCHASE)
- 0.01 * (FIRST PURCHASE) - 0.81 * (P_CHILD)
- 0.64 * (P_YOUTH) - 0.92 * (P_COOK)
- 0.91 * (P_DIY) + 0.69 * (P_ART)
P = exp(SCORE) / (1 + exp(SCORE))
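A minimal sketch of the LOGIT scoring step, assuming P is the standard logistic transform of SCORE, that is exp(SCORE) / (1 + exp(SCORE)). The coefficients are the rounded values printed on the slide, and the dictionary keys are hypothetical column names.

```python
import math

# Rounded coefficients as printed on the slide; keys are hypothetical column names.
LOGIT_INTERCEPT = 0.35
LOGIT_COEFFICIENTS = {
    "GENDER": -0.86,
    "AMOUNT_PURCHASED": 0.00,
    "FREQUENCY": -0.08,
    "LAST_PURCHASE": 0.61,
    "FIRST_PURCHASE": -0.01,
    "P_CHILD": -0.81,
    "P_YOUTH": -0.64,
    "P_COOK": -0.92,
    "P_DIY": -0.91,
    "P_ART": 0.69,
}


def logit_score(customer):
    """Substitute one customer's values into the SCORE equation."""
    return LOGIT_INTERCEPT + sum(
        coef * customer[name] for name, coef in LOGIT_COEFFICIENTS.items()
    )


def response_prob(customer):
    """Logistic transform of SCORE; algebraically equal to exp(s) / (1 + exp(s))."""
    return 1.0 / (1.0 + math.exp(-logit_score(customer)))


def rank_by_response_prob(customers):
    """Sort customers by 'Response Prob' in decreasing order."""
    return sorted(customers, key=response_prob, reverse=True)
```

Unlike the linear regression output, this value always lies strictly between 0 and 1, so it can be read as a probability.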
✓ Customers in the first deciles have higher scores.
✓ Percentage column (Response/Mail) represents the % of customers that purchased. In the first decile, 120 out of 160 purchased the book (CASE DATA) and 86 out of 230 purchased the book (HOLD OUT DATA).
[Figure: LOGIT gain charts for CASE DATA and HOLD OUT DATA, plotting Cumulative Baseline against Cumulative Actual Response.]
[Figure: gain charts comparing Baseline, Logit Model, RFM Model and Regression Model on the case data (1,600 customers) and the hold out data (2,300 customers).]
RECOMMENDATIONS
SHORT TERM
✓ Target deciles 1 through 4 for ‘The Art History of Florence’.
✓ For the remaining deciles, send sales promotions that appeal to customers with opposing characteristics.
LONG TERM
✓ Target a lookalike audience to reach new people who are likely to be interested in the products because they're similar to the best existing customers.
✓ Repeat this type of analysis for the club's other book offerings that have a different target audience than the one already analyzed.
QUESTIONS?
BOOKBINDERS
BOOK CLUB