Book Binders Logistics


1. Estimate a logistic regression model using “buyer” as the dependent variable and the
following as predictor variables

2. Create and interpret the odds-ratios for each of the predictors. Summarize and
interpret the results (so that a marketing manager can understand them). Which
variables are significant? Which seem to be ‘important’?

Each odds ratio gives the change in the odds of purchase for a one-unit increase in that
predictor, holding the other predictors constant. Each additional month since the last purchase
decreases the odds of purchasing by 9.04%. Each additional dollar of total spending increases
the odds by 0.11%. Being female decreases the odds by 52.33%. Each additional children’s book
purchased decreases the odds by 17%, each additional youth book by 10.7%, each additional
cook book by 23.7%, and each additional do-it-yourself book by 41.7%. Each additional
reference book increases the odds by 26.5%, each additional art book by 217.6%, and each
additional geography book by 77.5%.

All of the variables are statistically significant because every p-value is below 0.05. The
variables do_it, art, and geog seem important because each changes the odds of purchasing by
more than 25% per unit, a greater effect than other variables have. Total_ (total dollars
spent) can also be important even though its percentage is small, because one dollar is a very
small unit of measurement. A larger increase in dollars spent produces a correspondingly
larger increase in the odds of purchasing.
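
The percentages above follow from the odds ratios as (odds ratio - 1) x 100. A minimal Python sketch of that conversion is below; the function names are illustrative, and the odds ratios 0.9096 and 1.0011 are back-computed from the reported -9.04% and +0.11% rather than taken from model output.

```python
import math

def odds_ratio_from_coef(beta: float) -> float:
    """A logistic regression coefficient becomes an odds ratio via exp()."""
    return math.exp(beta)

def pct_change_in_odds(odds_ratio: float) -> float:
    """Percent change in the odds for a one-unit increase in the predictor."""
    return (odds_ratio - 1.0) * 100.0

def odds_ratio_for_change(odds_ratio_per_unit: float, units: float) -> float:
    """Odds ratios compound multiplicatively over several units."""
    return odds_ratio_per_unit ** units

# Months since last purchase: odds ratio 0.9096 is a 9.04% decrease per month.
months_pct = pct_change_in_odds(0.9096)

# Total dollars spent: 0.11% per dollar compounds to roughly 11.6% per $100.
per_100_dollars_pct = pct_change_in_odds(odds_ratio_for_change(1.0011, 100))
```

The second calculation illustrates why total_ can matter despite its small per-dollar odds ratio.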

3. Assign each customer to a decile based on his or her predicted probability of purchase. Hint:
The “predicted probability of purchase” is the variable “purch_prob” that came out of the logistic
regression after you issued the “predict.glm” command. It represents the best prediction of the
logit model of how likely a customer is to buy "The Art History of Florence."
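
One way to assign deciles can be sketched in Python as follows. This is an illustration rather than the method used for the assignment: the function name is hypothetical, and it assumes decile 1 holds the customers with the highest predicted probabilities.

```python
def assign_deciles(probs):
    """Return a decile (1..10) for each probability, 1 = highest probabilities.

    Customers are ranked from highest to lowest predicted probability and
    split into ten groups of (as near as possible) equal size.
    """
    n = len(probs)
    # Indices of customers ordered from highest to lowest probability.
    order = sorted(range(n), key=lambda i: probs[i], reverse=True)
    deciles = [0] * n
    for rank, i in enumerate(order):
        deciles[i] = rank * 10 // n + 1
    return deciles

# Toy data: 20 made-up probabilities, two customers per decile.
example_probs = [i / 20 for i in range(20)]
example_deciles = assign_deciles(example_probs)
```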

4. Create a bar chart plotting response rate by decile (as just defined above).
Hint: The “response rate” is not the same as the predicted probability of purchase. It is the
share of customers in a decile that have bought "The Art History of Florence."

5. Generate a report showing the number of customers, the number of buyers of “The Art
History of Florence” and the response rate to the offer by decile for the random sample (i.e. the
50,000 customers) in the dataset.
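
The report amounts to a group-by on decile. A minimal Python sketch, assuming one decile label and one 0/1 buyer flag per customer (the function name and toy inputs are hypothetical):

```python
from collections import defaultdict

def decile_report(deciles, buyers):
    """Return rows of (decile, customers, buyers, response_rate)."""
    n_customers = defaultdict(int)
    n_buyers = defaultdict(int)
    for d, b in zip(deciles, buyers):
        n_customers[d] += 1
        n_buyers[d] += b  # b is 1 for a buyer, 0 otherwise
    return [
        (d, n_customers[d], n_buyers[d], n_buyers[d] / n_customers[d])
        for d in sorted(n_customers)
    ]

# Toy data: four customers in two deciles, one buyer.
toy_report = decile_report([1, 1, 2, 2], [1, 0, 0, 0])
```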
6. For the 50,000 customers in the dataset run a logistic regression model where you
predict response only based on the “child” variable. Why is the odds ratio for “child” different
than in the logistic regression in Part I? Please be specific and investigate beyond simply stating
the statistical problem.

The odds ratio differs substantially from Part I because the single-variable model does not
account for the effects of the other predictors. It relates the child variable alone to the
probability of buying “The Art History of Florence.” When only children’s book purchases are
considered, the odds of purchasing increase with each additional children’s book. When the
other variables are included, as shown above, the odds of purchasing decrease. This reversal
can occur when child is correlated with the omitted predictors, so that its coefficient absorbs
part of their effect. Predicting the response from one variable alone therefore weights that
variable incorrectly.
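
The reversal can be illustrated with a small Python sketch. The counts are entirely made up and are not drawn from the BookBinders data; "heavy" and "light" stand in for any omitted predictor that is correlated with child.

```python
def odds_ratio(buyers_a, n_a, buyers_b, n_b):
    """Odds of buying in group A divided by odds of buying in group B."""
    odds_a = buyers_a / (n_a - buyers_a)
    odds_b = buyers_b / (n_b - buyers_b)
    return odds_a / odds_b

# Hypothetical counts: (buyers, customers) for child-book buyers vs. others,
# split by a second, omitted variable (heavy vs. light purchasers).
heavy = {"child": (160, 800), "no_child": (50, 200)}
light = {"child": (8, 200), "no_child": (40, 800)}

or_heavy = odds_ratio(*heavy["child"], *heavy["no_child"])
or_light = odds_ratio(*light["child"], *light["no_child"])

# Pooling the two groups ignores the omitted variable.
pooled_child = (160 + 8, 800 + 200)
pooled_no_child = (50 + 40, 200 + 800)
or_pooled = odds_ratio(*pooled_child, *pooled_no_child)
```

In this toy example the odds ratio for child is below 1 within each group but above 1 when the groups are pooled, because child-book buyers are concentrated among the heavy purchasers.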

7. Use the information from the report in question 5 above to create a table showing the lift
and cumulative lift for each decile. You may want to use Excel for these calculations.
8. Create a chart showing the cumulative lift by decile, along with a reference line
corresponding to the “no model” baseline.
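
As an alternative to Excel, lift and cumulative lift can be sketched in Python. The sketch assumes the common definition (decile response rate divided by the overall response rate, with a "no model" baseline of 1.0) and takes rows of (customers, buyers) ordered from the top decile down; the function name and toy rows are hypothetical.

```python
def lift_table(rows):
    """rows: list of (n_customers, n_buyers), best decile first.

    Returns a list of (lift, cumulative_lift) per decile.
    """
    total_customers = sum(n for n, _ in rows)
    total_buyers = sum(b for _, b in rows)
    overall_rate = total_buyers / total_customers
    table = []
    cum_customers = 0
    cum_buyers = 0
    for n, b in rows:
        cum_customers += n
        cum_buyers += b
        lift = (b / n) / overall_rate
        cum_lift = (cum_buyers / cum_customers) / overall_rate
        table.append((lift, cum_lift))
    return table

# Toy data with two groups instead of ten deciles.
toy_lift = lift_table([(100, 30), (100, 10)])
```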

9. Use the information from the report in question 5 above to create a table showing the gains
and cumulative gains for each decile. You may want to use Excel for these calculations.
10. Create a chart showing the cumulative gains by decile along with a reference line
corresponding to ‘no model’.
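
Gains can be sketched the same way in Python. The sketch assumes gains are defined as the share of all buyers captured in each decile, with the "no model" reference being the cumulative share of customers mailed; names and toy rows are hypothetical.

```python
def gains_table(rows):
    """rows: list of (n_customers, n_buyers), best decile first.

    Returns a list of (gains, cumulative_gains, no_model_baseline) per decile.
    """
    total_customers = sum(n for n, _ in rows)
    total_buyers = sum(b for _, b in rows)
    table = []
    cum_customers = 0
    cum_buyers = 0
    for n, b in rows:
        cum_customers += n
        cum_buyers += b
        table.append((
            b / total_buyers,                 # gains for this decile
            cum_buyers / total_buyers,        # cumulative gains
            cum_customers / total_customers,  # "no model" reference line
        ))
    return table

# Toy data with two groups instead of ten deciles.
toy_gains = gains_table([(100, 30), (100, 10)])
```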

Part IV
Use the following cost information to assess the profitability of using logistic regression to
determine which of the remaining 500,000 customers should receive a specific offer:

Cost to mail each offer: $0.50
Selling price of each book, including shipping: $18.00
Wholesale price paid by BookBinders to the publisher: $9.00
Shipping costs paid by BookBinders: $3.00

11. What is the breakeven response rate?


8.33%
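
The 8.33% figure is the mailing cost divided by the margin on each sale: $0.50 / ($18.00 - $9.00 - $3.00). A short Python sketch of that arithmetic (the function name is illustrative):

```python
def breakeven_rate(mail_cost, price, wholesale, shipping):
    """Response rate at which the margin from buyers just covers mailing cost."""
    margin_per_sale = price - wholesale - shipping
    return mail_cost / margin_per_sale

breakeven = breakeven_rate(mail_cost=0.50, price=18.00, wholesale=9.00, shipping=3.00)
breakeven_pct = round(breakeven * 100, 2)  # 8.33
```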

12. For the customers in the dataset, create a new variable (call it “target”) with a value of 1 if
the customer’s predicted probability is greater than the breakeven response rate and 0
otherwise.
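
A minimal Python sketch of the target flag, assuming the strict "greater than" comparison stated in this question (the function name is hypothetical):

```python
# Breakeven response rate from the cost information: 0.50 / (18 - 9 - 3).
BREAKEVEN = 0.50 / (18.00 - 9.00 - 3.00)

def target_flag(purch_prob, breakeven=BREAKEVEN):
    """1 if the predicted probability exceeds the breakeven rate, else 0."""
    return 1 if purch_prob > breakeven else 0

# Made-up predicted probabilities for three customers.
flags = [target_flag(p) for p in (0.25, 0.05, 0.10)]
```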

13. Assume that BookBinders mails the offer to buy “The Art History of Florence” only to
their target customers (i.e., those whose predicted probability of buying is greater than or equal
to the breakeven rate).
a. What is the expected number of buyers from this mailing?
3323

b. What is the expected response rate from this mailing?


0.2133072

c. What is the expected profit (in dollars) from this mailing?


$12,158

d. What is the expected return on marketing expenditures from this mailing?


156.27%
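
The profit and return figures can be checked with a short Python sketch. The number of offers mailed is not stated above; 15,560 is an assumption back-solved from the reported profit, (3,323 x $6.00 - $12,158) / $0.50, and it implies a response rate close to, but not identical with, the reported 0.2133.

```python
MAIL_COST = 0.50               # cost to mail each offer
MARGIN = 18.00 - 9.00 - 3.00   # price - wholesale - shipping = $6.00 per sale

def mailing_profit(n_mailed, n_buyers):
    """Margin earned from buyers minus the cost of all offers mailed."""
    return n_buyers * MARGIN - n_mailed * MAIL_COST

def rome(n_mailed, n_buyers):
    """Return on marketing expenditure: profit divided by mailing cost."""
    return mailing_profit(n_mailed, n_buyers) / (n_mailed * MAIL_COST)

n_buyers = 3323
n_mailed = 15560  # assumption: back-solved from the reported profit, not from the source

profit = mailing_profit(n_mailed, n_buyers)            # 12158.0
rome_pct = round(rome(n_mailed, n_buyers) * 100, 2)    # 156.27
```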

Part V: Predicting total spending (5 points)


Another manager at BookBinders wants to use this dataset for a different purpose: to
understand what factors can explain how much a customer spends over time. Specifically, she
would like to understand how the overall spending for each customer (the “total_” variable) can
be explained by the following variables:
• The number of months since the customer’s first purchase (the “first” variable)
• The total number of geography books they have purchased (the “geog” variable)
• The total number of art books they have purchased (the “art” variable)
• The total number of cook books they have purchased (the “cook” variable)
14. What kind of statistical method would be best for addressing this question? Conduct the
appropriate analysis, show the results, and explain how each of the above 4 variables affects
total spending.

A linear regression is the appropriate method for finding how these variables affect total
spending for each customer. Holding the other variables constant, each additional month since
the first purchase (first) is associated with a $1.23 increase in a customer’s total spending.
Each additional geography book purchased is associated with a $14.71 increase, each additional
art book with a $14.19 increase, and each additional cook book with a $15.26 increase.
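
Because the model is linear, the effects add up. A small Python sketch using the coefficients reported above (the intercept is not reported, so only changes in predicted spending are computed; the function name and example customer are hypothetical):

```python
# Slopes reported in the answer above, in dollars per unit.
COEFS = {"first": 1.23, "geog": 14.71, "art": 14.19, "cook": 15.26}

def spending_change(deltas):
    """Change in predicted total spending for the given changes in predictors."""
    return sum(COEFS[name] * change for name, change in deltas.items())

# Example: 12 more months since the first purchase and one more art book.
example_change = spending_change({"first": 12, "art": 1})  # about $28.95
```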
