"Counting Your Customers" the Easy Way: An Alternative to the Pareto/NBD Model
Author(s): Peter S. Fader, Bruce G. S. Hardie and Ka Lok Lee

Source: Marketing Science, Spring, 2005, Vol. 24, No. 2 (Spring, 2005), pp. 275-284
Published by: INFORMS
Stable URL: https://www.jstor.org/stable/40056956
Marketing Science
Vol. 24, No. 2, Spring 2005, pp. 275-284
issn 0732-2399 | eissn 1526-548X | 05 | 2402 | 0275
doi 10.1287/mksc.1040.0098
©2005 INFORMS
"Counting Your Customers" the Easy Way:

An Alternative to the Pareto/NBD Model
Peter S. Fader
The Wharton School, University of Pennsylvania, 749 Huntsman Hall, 3730 Walnut Street,
Philadelphia, Pennsylvania 19104-6340, faderp@wharton.upenn.edu
Bruce G. S. Hardie
London Business School, Regent's Park, London NW1 4SA, United Kingdom, bhardie@london.edu
Ka Lok Lee
Catalina Health Resource, Blue Bell, Pennsylvania 19422, kaloklee@alumni.upenn.edu
Today's managers are very interested in predicting the future purchasing patterns of their customers, which can then serve as an input into "lifetime value" calculations. Among the models that provide such capabilities, the Pareto/NBD "counting your customers" framework proposed by Schmittlein et al. (1987) is highly regarded. However, despite the respect it has earned, it has proven to be a difficult model to implement, particularly because of computational challenges associated with parameter estimation.
We develop a new model, the beta-geometric/NBD (BG/NBD), which represents a slight variation in the
behavioral "story" associated with the Pareto/NBD but is vastly easier to implement. We show, for instance,
how its parameters can be obtained quite easily in Microsoft Excel. The two models yield very similar results
in a wide variety of purchasing environments, leading us to suggest that the BG/NBD could be viewed as an
attractive alternative to the Pareto/NBD in most applications.
Key words: customer base analysis; repeat buying; Pareto/NBD; probability models; forecasting; lifetime value
History: This paper was received August 11, 2003, and was with the authors 7 months for 2 revisions;
processed by Gary Lilien.
1. Introduction
Faced with a database containing information on the frequency and timing of transactions for a list of customers, it is natural to try to make forecasts about future purchasing. These projections often range from aggregate sales trajectories (e.g., for the next 52 weeks), to individual-level conditional expectations (i.e., the best guess about a particular customer's future purchasing, given information about his past behavior). Many other related issues may arise from a customer-level database, but these are typical of the questions that a manager should initially try to address. This is particularly true for any firm with serious interest in tracking and managing "customer lifetime value" (CLV) on a systematic basis. There is a great deal of interest, among marketing practitioners and academics alike, in developing models to accomplish these tasks.

One of the first models to explicitly address these issues is the Pareto/NBD "counting your customers" framework originally proposed by Schmittlein et al. (1987), called hereafter SMC. This model describes repeat-buying behavior in settings where customer "dropout" is unobserved: It assumes that customers buy at a steady rate (albeit in a stochastic manner) for a period of time, and then become inactive. More specifically, time to "dropout" is modelled using the Pareto (exponential-gamma mixture) timing model, and repeat-buying behavior while active is modelled using the NBD (Poisson-gamma mixture) counting model. The Pareto/NBD is a powerful model for customer-base analysis, but its empirical application can be challenging, especially in terms of parameter estimation.

Perhaps because of these operational difficulties, relatively few researchers actively followed up on the SMC paper soon after it was published (as judged by citation counts). However, it has received a steadily increasing amount of attention in recent years as many researchers and managers have become concerned about issues such as customer churn, attrition, retention, and CLV. While a number of researchers (e.g., Balasubramanian et al. 1998, Jain and Singh 2002, Mulhern 1999, Niraj et al. 2001) refer to the applicability and usefulness of the Pareto/NBD, only a small handful claim to have actually implemented it. Nevertheless, some of these papers (e.g., Reinartz and Kumar 2000, Schmittlein and Peterson 1994) have, in turn, become quite popular and widely cited.
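The behavioral story behind the Pareto/NBD described above (steady but stochastic buying while active, followed by unobserved dropout) can be made concrete with a small simulation. The following is a minimal Python sketch under stated assumptions, not the code used by SMC or in this paper: the function name `simulate_pareto_nbd` and the parameter values are hypothetical, NumPy is assumed to be available, and the gamma "scale" parameters of the model are treated as rate parameters, so NumPy's `scale` argument is set to their reciprocal.

```python
import numpy as np


def simulate_pareto_nbd(n_customers, r, alpha, s, beta, T, seed=0):
    """Simulate (x, t_x) for a panel observed over (0, T] under the Pareto/NBD story.

    Assumption: alpha and beta act as rate parameters of the gamma mixing
    distributions, so E[lambda] = r / alpha and E[mu] = s / beta.
    """
    rng = np.random.default_rng(seed)

    # Heterogeneity: each customer draws a transaction rate and a dropout rate.
    lam = rng.gamma(shape=r, scale=1.0 / alpha, size=n_customers)
    mu = rng.gamma(shape=s, scale=1.0 / beta, size=n_customers)

    # The unobserved lifetime is exponential with rate mu; buying stops at dropout.
    lifetime = rng.exponential(scale=1.0, size=n_customers) / mu
    active_time = np.minimum(lifetime, T)

    # While active, purchases follow a Poisson process with rate lam.
    x = rng.poisson(lam * active_time)

    # Given x events of a Poisson process on (0, active_time], the event times
    # are i.i.d. uniform, so the last one is active_time times the maximum of
    # x uniforms, which can be drawn directly as U ** (1 / x).
    u = rng.random(n_customers)
    t_x = np.where(x > 0, active_time * u ** (1.0 / np.maximum(x, 1)), 0.0)
    return x, t_x


if __name__ == "__main__":
    # Hypothetical parameter values chosen only for illustration.
    x, t_x = simulate_pareto_nbd(4000, r=0.5, alpha=10.0, s=0.5, beta=10.0, T=52.0)
    print("penetration:", np.mean(x > 0))
    print("mean purchases among buyers:", x[x > 0].mean())
```

Only the frequency `x` and the recency `t_x` are returned, since these two summaries (together with the observation length `T`) are all that the models discussed here require.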
The objective of this paper is to develop a new model, the beta-geometric/NBD (BG/NBD), which represents a slight variation in the behavioral "story" that lies at the heart of SMC's original work, but is vastly easier to implement. We show, for instance, how its parameters can be obtained quite easily in Microsoft Excel, with no appreciable loss in the model's ability to fit or predict customer purchasing patterns. We develop the BG/NBD model from first principles and present the expressions required for making individual-level statements about future buying behavior. We compare and contrast its performance to that of the Pareto/NBD via a simulation and an illustrative empirical application. The two models yield very similar results, leading us to suggest that the BG/NBD should be viewed as an attractive alternative to the Pareto/NBD model.

Before developing the BG/NBD model, we briefly review the Pareto/NBD model (§2). In §3 we outline the assumptions of the BG/NBD model, deriving the key expressions at the individual level and for a randomly chosen individual, in §§4 and 5, respectively. This is followed by the aforementioned simulation and empirical analysis. We conclude with a discussion of several issues that arise from this work.

2. The Pareto/NBD Model
The Pareto/NBD model is based on five assumptions:
(i) While active, the number of transactions made by a customer in a time period of length $t$ is distributed Poisson with mean $\lambda t$.
(ii) Heterogeneity in the transaction rate $\lambda$ across customers follows a gamma distribution with shape parameter $r$ and scale parameter $\alpha$.
(iii) Each customer has an unobserved "lifetime" of length $\tau$. This point at which the customer becomes inactive is distributed exponential with dropout rate $\mu$.
(iv) Heterogeneity in dropout rates across customers follows a gamma distribution with shape parameter $s$ and scale parameter $\beta$.
(v) The transaction rate $\lambda$ and the dropout rate $\mu$ vary independently across customers.

The Pareto/NBD (and, as we will see shortly, the BG/NBD) requires only two pieces of information about each customer's past purchasing history: his "recency" (when his last transaction occurred) and "frequency" (how many transactions he made in a specified time period). The notation used to represent this information is $(X = x, t_x, T)$, where $x$ is the number of transactions observed in the time period $(0, T]$ and $t_x$ ($0 < t_x \le T$) is the time of the last transaction. Using these two key summary statistics, SMC derive expressions for a number of managerially relevant quantities, such as:
• $E[X(t)]$, the expected number of transactions in a time period of length $t$ (SMC, Equation (17)), which is central to computing the expected transaction volume for the whole customer base over time.
• $P(X(t) = x)$, the probability of observing $x$ transactions in a time period of length $t$ (SMC, Equations (A40), (A43), and (A45)).
• $E(Y(t) \mid X = x, t_x, T)$, the expected number of transactions in the period $(T, T + t]$ for an individual with observed behavior $(X = x, t_x, T)$ (SMC, Equation (22)).

The likelihood function associated with the Pareto/NBD model is quite complex, involving numerous evaluations of the Gaussian hypergeometric function. Besides being unfamiliar to most researchers working in the areas of database marketing and CRM analysis, multiple evaluations of the Gaussian hypergeometric are very demanding from a computational standpoint. Furthermore, the precision of some numerical procedures used to evaluate this function can vary substantially over the parameter space (Lozier and Olver 1995); this can cause major problems for numerical optimization routines as they search for the maximum of the likelihood function.

To the best of our knowledge, the only published paper reporting a successful implementation of the Pareto/NBD model using standard maximum likelihood estimation (MLE) techniques is Reinartz and Kumar (2003), and the authors comment on the associated computational burden. As an alternative to MLE, SMC proposed a three-step method-of-moments estimation procedure, which was further refined by Schmittlein and Peterson (1994). While simpler than MLE, the proposed algorithm is still not easy to implement; furthermore, it does not have the desirable statistical properties commonly associated with MLE. In contrast, the BG/NBD model, to be introduced in the next section, can be implemented very quickly and efficiently via MLE, and its parameter estimation does not require any specialized software or the evaluation of any unconventional mathematical functions.

3. BG/NBD Assumptions
Most aspects of the BG/NBD model directly mirror those of the Pareto/NBD. The only difference lies in the story being told about how/when customers become inactive. The Pareto timing model assumes that dropout can occur at any point in time, independent of the occurrence of actual purchases. If we assume instead that dropout occurs immediately after a purchase, we can model this process using the beta-geometric (BG) model.

More formally, the BG/NBD model is based on the following five assumptions (the first two of

which are identical to the corresponding Pareto/NBD assumptions):
(i) While active, the number of transactions made by a customer follows a Poisson process with transaction rate $\lambda$. This is equivalent to assuming that the time between transactions is distributed exponential with transaction rate $\lambda$, i.e.,
$$f(t_j \mid t_{j-1}; \lambda) = \lambda e^{-\lambda (t_j - t_{j-1})}, \quad t_j > t_{j-1} \ge 0.$$
(ii) Heterogeneity in $\lambda$ follows a gamma distribution with pdf
$$f(\lambda \mid r, \alpha) = \frac{\alpha^r \lambda^{r-1} e^{-\lambda \alpha}}{\Gamma(r)}, \quad \lambda > 0. \tag{1}$$
(iii) After any transaction, a customer becomes inactive with probability $p$. Therefore the point at which the customer "drops out" is distributed across transactions according to a (shifted) geometric distribution with pmf
$$P(\text{inactive immediately after } j\text{th transaction}) = p(1-p)^{j-1}, \quad j = 1, 2, 3, \ldots$$
(iv) Heterogeneity in $p$ follows a beta distribution with pdf
$$f(p \mid a, b) = \frac{p^{a-1} (1-p)^{b-1}}{B(a, b)}, \quad 0 \le p \le 1, \tag{2}$$
where $B(a, b)$ is the beta function, which can be expressed in terms of gamma functions: $B(a, b) = \Gamma(a)\Gamma(b)/\Gamma(a + b)$.
(v) The transaction rate $\lambda$ and the dropout probability $p$ vary independently across customers.

4. Model Development at the Individual Level

4.1. Derivation of the Likelihood Function
Consider a customer who had $x$ transactions in the period $(0, T]$ with the transactions occurring at $t_1, t_2, \ldots, t_x$:

[Timeline: $0$, $t_1$, $t_2$, ..., $t_x$, $T$]

We derive the individual-level likelihood function in the following manner:
• the likelihood of the first transaction occurring at $t_1$ is a standard exponential likelihood component, which equals $\lambda e^{-\lambda t_1}$.
• the likelihood of the second transaction occurring at $t_2$ is the probability of remaining active at $t_1$ times the standard exponential likelihood component, which equals $(1-p)\lambda e^{-\lambda (t_2 - t_1)}$.
This continues for each subsequent transaction, until:
• the likelihood of the $x$th transaction occurring at $t_x$ is the probability of remaining active at $t_{x-1}$ times the standard exponential likelihood component, which equals $(1-p)\lambda e^{-\lambda (t_x - t_{x-1})}$.
• Finally, the likelihood of observing zero purchases in $(t_x, T]$ is the probability the customer became inactive at $t_x$, plus the probability he remained active but made no purchases in this interval, which equals $p + (1-p) e^{-\lambda (T - t_x)}$.

Therefore,
$$L(\lambda, p \mid t_1, t_2, \ldots, t_x, T) = \lambda e^{-\lambda t_1} (1-p) \lambda e^{-\lambda (t_2 - t_1)} \cdots (1-p) \lambda e^{-\lambda (t_x - t_{x-1})} \cdot \{ p + (1-p) e^{-\lambda (T - t_x)} \}$$
$$= p (1-p)^{x-1} \lambda^x e^{-\lambda t_x} + (1-p)^x \lambda^x e^{-\lambda T}.$$

As pointed out earlier for the Pareto/NBD, note that information on the timing of the $x$ transactions is not required; a sufficient summary of the customer's purchase history is $(X = x, t_x, T)$.

Similar to SMC, we assume that all customers are active at the beginning of the observation period; therefore, the likelihood function for a customer making 0 purchases in the interval $(0, T]$ is the standard exponential survival function:
$$L(\lambda \mid X = 0, T) = e^{-\lambda T}.$$

Thus, we can write the individual-level likelihood function as
$$L(\lambda, p \mid X = x, T) = (1-p)^x \lambda^x e^{-\lambda T} + \delta_{x>0}\, p (1-p)^{x-1} \lambda^x e^{-\lambda t_x}, \tag{3}$$
where $\delta_{x>0} = 1$ if $x > 0$, 0 otherwise.

4.2. Derivation of P(X(t) = x)
Let the random variable $X(t)$ denote the number of transactions occurring in a time period of length $t$ (with a time origin of 0). To derive an expression for $P(X(t) = x)$, we recall the fundamental relationship between interevent times and the number of events: $X(t) \ge x \Leftrightarrow T_x \le t$, where $T_x$ is the random variable denoting the time of the $x$th transaction. Given our assumption regarding the nature of the dropout process,
$$P(X(t) = x) = P(\text{active after } x\text{th purchase}) \cdot P(T_x \le t \text{ and } T_{x+1} > t) + \delta_{x>0} \cdot P(\text{becomes inactive after } x\text{th purchase}) \cdot P(T_x \le t).$$

Given the assumption that the time between transactions is characterized by the exponential distribution, $P(T_x \le t \text{ and } T_{x+1} > t)$ is simply the Poisson probability

that $X(t) = x$, and $P(T_x \le t)$ is the Erlang-$x$ cdf. Therefore,
$$P(X(t) = x \mid \lambda, p) = (1-p)^x \frac{(\lambda t)^x e^{-\lambda t}}{x!} + \delta_{x>0}\, p (1-p)^{x-1} \left[ 1 - e^{-\lambda t} \sum_{j=0}^{x-1} \frac{(\lambda t)^j}{j!} \right]. \tag{4}$$

4.3. Derivation of E[X(t)]
Given that the number of transactions follows a Poisson process, $E[X(t)]$ is simply $\lambda t$ if the customer is active at $t$. For a customer who becomes inactive at $\tau \le t$, the expected number of transactions in the period $(0, \tau]$ is $\lambda \tau$.

However, what is the likelihood that a customer becomes inactive at $\tau$? Conditional on $\lambda$ and $p$,
$$P(\tau > t) = P(\text{active at } t \mid \lambda, p) = \sum_{j=0}^{\infty} (1-p)^j \frac{(\lambda t)^j e^{-\lambda t}}{j!} = e^{-\lambda p t}.$$

This implies that the pdf of the dropout time is given by $g(\tau \mid \lambda, p) = \lambda p e^{-\lambda p \tau}$. (Note that this takes on an exponential form. However, it features an explicit association with the transaction rate $\lambda$, in contrast with the Pareto/NBD, which has an exponential dropout process that is independent of the transaction rate.) It follows that the expected number of transactions in a time period of length $t$ is given by
$$E(X(t) \mid \lambda, p) = \lambda t \cdot P(\tau > t) + \int_0^t \lambda \tau\, g(\tau \mid \lambda, p)\, d\tau = \frac{1}{p} - \frac{1}{p} e^{-\lambda p t}. \tag{5}$$

5. Model Development for a Randomly Chosen Individual
All the expressions developed above are conditional on the transaction rate $\lambda$ and the dropout probability $p$, both of which are unobserved. To derive the equivalent expressions for a randomly chosen customer, we take the expectation of the individual-level results over the mixing distributions for $\lambda$ and $p$, as given in (1) and (2). This yields the following results.
• Taking the expectation of (3) over the distribution of $\lambda$ and $p$ results in the following expression for the likelihood function for a randomly chosen customer with purchase history $(X = x, t_x, T)$:
$$L(r, \alpha, a, b \mid X = x, t_x, T) = \frac{B(a, b+x)}{B(a, b)} \frac{\Gamma(r+x)\, \alpha^r}{\Gamma(r) (\alpha + T)^{r+x}} + \delta_{x>0} \frac{B(a+1, b+x-1)}{B(a, b)} \frac{\Gamma(r+x)\, \alpha^r}{\Gamma(r) (\alpha + t_x)^{r+x}}. \tag{6}$$

The four BG/NBD model parameters $(r, \alpha, a, b)$ can be estimated via the method of maximum likelihood in the following manner. Suppose we have a sample of $N$ customers, where customer $i$ had $X_i = x_i$ transactions in the period $(0, T_i]$, with the last transaction occurring at $t_{x_i}$. The sample log-likelihood function is given by
$$LL(r, \alpha, a, b) = \sum_{i=1}^{N} \ln\left[ L(r, \alpha, a, b \mid X_i = x_i, t_{x_i}, T_i) \right]. \tag{7}$$
This can be maximized using standard numerical optimization routines.
• Taking the expectation of (4) over the distribution of $\lambda$ and $p$ results in the following expression for the probability of observing $x$ purchases in a time period of length $t$:
$$P(X(t) = x \mid r, \alpha, a, b) = \frac{B(a, b+x)}{B(a, b)} \frac{\Gamma(r+x)}{\Gamma(r)\, x!} \left( \frac{\alpha}{\alpha + t} \right)^r \left( \frac{t}{\alpha + t} \right)^x + \delta_{x>0} \frac{B(a+1, b+x-1)}{B(a, b)} \left[ 1 - \left( \frac{\alpha}{\alpha + t} \right)^r \left\{ \sum_{j=0}^{x-1} \frac{\Gamma(r+j)}{\Gamma(r)\, j!} \left( \frac{t}{\alpha + t} \right)^j \right\} \right]. \tag{8}$$
• Finally, taking the expectation of (5) over the distribution of $\lambda$ and $p$ results in the following expression for the expected number of purchases in a time period of length $t$:
$$E(X(t) \mid r, \alpha, a, b) = \frac{a + b - 1}{a - 1} \left[ 1 - \left( \frac{\alpha}{\alpha + t} \right)^r {}_2F_1\!\left( r, b; a + b - 1; \frac{t}{\alpha + t} \right) \right], \tag{9}$$
where ${}_2F_1(\cdot)$ is the Gaussian hypergeometric function. (See the appendix for details of the derivation.)

Note that this final expression requires a single evaluation of the Gaussian hypergeometric function, but it is important to emphasize that this expectation is only used after the likelihood function has been maximized. A single evaluation of the Gaussian hypergeometric function for a given set of parameters is relatively straightforward, and can be closely approximated with a polynomial series, even in a modeling environment such as Microsoft Excel.

In order for the BG/NBD model to be of use in a forward-looking customer-base analysis, we need to obtain an expression for the expected number of transactions in a future period of length $t$ for an individual with past observed behavior $(X = x, t_x, T)$.
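Expressions (6) and (7) translate almost directly into code. The sketch below is a minimal Python illustration, not the authors' Excel or MATLAB implementation: it assumes NumPy and SciPy are available, the function names `bgnbd_log_likelihood` and `fit_bgnbd` are hypothetical, and the choices to work on the log scale and to run a Nelder-Mead search over log-parameters are assumptions made here for numerical stability and to keep the four parameters positive.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import betaln, gammaln


def bgnbd_log_likelihood(params, x, t_x, T):
    """Per-customer log of the likelihood in (6) for arrays x, t_x, T."""
    r, alpha, a, b = params
    x = np.asarray(x, dtype=float)
    t_x = np.asarray(t_x, dtype=float)
    T = np.asarray(T, dtype=float)

    # Factor shared by both terms of (6): Gamma(r + x) * alpha^r / Gamma(r).
    ln_common = gammaln(r + x) - gammaln(r) + r * np.log(alpha)

    # First term of (6): B(a, b + x) / B(a, b) * (alpha + T)^-(r + x).
    ln_term1 = betaln(a, b + x) - betaln(a, b) - (r + x) * np.log(alpha + T)

    # Second term of (6) applies only when x > 0. x is clipped at 1 so the
    # beta function always receives valid arguments; the x = 0 entries are
    # then masked out with -inf (i.e., a zero contribution).
    x_pos = np.maximum(x, 1.0)
    ln_term2 = (betaln(a + 1.0, b + x_pos - 1.0) - betaln(a, b)
                - (r + x) * np.log(alpha + t_x))
    ln_term2 = np.where(x > 0, ln_term2, -np.inf)

    return ln_common + np.logaddexp(ln_term1, ln_term2)


def fit_bgnbd(x, t_x, T, start=(1.0, 1.0, 1.0, 1.0)):
    """Maximize the sample log-likelihood (7) over (r, alpha, a, b)."""

    def negative_ll(log_params):
        value = -np.sum(bgnbd_log_likelihood(np.exp(log_params), x, t_x, T))
        # Guard against overflow at extreme parameter values.
        return value if np.isfinite(value) else 1e300

    result = minimize(negative_ll, np.log(start), method="Nelder-Mead")
    return np.exp(result.x), -result.fun


if __name__ == "__main__":
    # Made-up recency/frequency data for six customers (illustration only;
    # far too few observations for meaningful estimates).
    x = np.array([0, 2, 1, 0, 5, 3])
    t_x = np.array([0.0, 30.4, 1.7, 0.0, 36.1, 20.0])
    T = np.array([38.9, 38.9, 38.7, 38.6, 38.4, 38.3])
    estimates, ll = fit_bgnbd(x, t_x, T)
    print(estimates, ll)
```

Any general-purpose optimizer could be substituted for the Nelder-Mead search; the only model-specific piece is `bgnbd_log_likelihood`.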

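Expressions (8) and (9) can be evaluated in the same spirit. The following Python sketch is an illustration only: the function names `bgnbd_pmf` and `bgnbd_expected_transactions` are hypothetical, and it assumes SciPy's `hyp2f1` as the routine for the Gaussian hypergeometric function. It computes the probability of observing x purchases, and the expected number of purchases, in a period of length t for a randomly chosen customer.

```python
import numpy as np
from scipy.special import betaln, gammaln, hyp2f1


def bgnbd_pmf(x, t, r, alpha, a, b):
    """P(X(t) = x | r, alpha, a, b) from (8), for a nonnegative integer x and t > 0."""
    ln_ratio = np.log(alpha / (alpha + t))
    ln_z = np.log(t / (alpha + t))

    def ln_nbd_weight(j):
        # ln of Gamma(r + j) / (Gamma(r) j!) * (t / (alpha + t))^j
        return gammaln(r + j) - gammaln(r) - gammaln(j + 1.0) + j * ln_z

    # First term of (8): the customer is still active after x purchases.
    term1 = np.exp(betaln(a, b + x) - betaln(a, b)
                   + ln_nbd_weight(x) + r * ln_ratio)
    if x == 0:
        return term1

    # Second term of (8): the customer dropped out right after purchase x.
    j = np.arange(x, dtype=float)
    partial_sum = np.exp(ln_nbd_weight(j)).sum()
    term2 = np.exp(betaln(a + 1.0, b + x - 1.0) - betaln(a, b)) * (
        1.0 - np.exp(r * ln_ratio) * partial_sum)
    return term1 + term2


def bgnbd_expected_transactions(t, r, alpha, a, b):
    """E[X(t) | r, alpha, a, b] from (9); undefined as written when a == 1."""
    z = t / (alpha + t)
    return (a + b - 1.0) / (a - 1.0) * (
        1.0 - (alpha / (alpha + t)) ** r * hyp2f1(r, b, a + b - 1.0, z))


if __name__ == "__main__":
    # BG/NBD parameter estimates reported in Table 2 of this paper; t = 39 is
    # an arbitrary illustrative horizon, not a replication of any figure.
    r, alpha, a, b = 0.243, 4.414, 0.793, 2.426
    print([bgnbd_pmf(x, 39.0, r, alpha, a, b) for x in range(8)])
    print(bgnbd_expected_transactions(39.0, r, alpha, a, b))
```

A useful consistency check on any implementation is that the probabilities from (8) sum to one over x and that their mean agrees with (9).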
We provide a careful derivation in the appendix, but here is the key expression:
$$E(Y(t) \mid X = x, t_x, T, r, \alpha, a, b) = \frac{\dfrac{a + b + x - 1}{a - 1} \left[ 1 - \left( \dfrac{\alpha + T}{\alpha + T + t} \right)^{r+x} {}_2F_1\!\left( r + x, b + x; a + b + x - 1; \dfrac{t}{\alpha + T + t} \right) \right]}{1 + \delta_{x>0} \dfrac{a}{b + x - 1} \left( \dfrac{\alpha + T}{\alpha + t_x} \right)^{r+x}}. \tag{10}$$
Once again, this expectation requires a single evaluation of the Gaussian hypergeometric function for any customer of interest, but this is not a burdensome task. The remainder of the expression is simple arithmetic.

6. Simulation
While the underlying behavioral story associated with the proposed BG/NBD model is quite similar to that of the Pareto/NBD, we have not yet provided any assurance that the empirical performance of the two models will be closely aligned with each other. In this section, therefore, we discuss a comprehensive simulation study that provides a thorough understanding of when the BG/NBD can (and cannot) serve as a close proxy to the Pareto/NBD. More specifically, we create a wide variety of purchasing environments (by manipulating the four parameters of the Pareto/NBD model) to look for limiting conditions under which the BG/NBD model does a poor job of capturing the underlying purchasing process.

6.1. Simulation Design
To create these simulated purchasing environments, we chose three levels for each of the four Pareto/NBD parameters, then generated a full-factorial design of $3^4 = 81$ different "worlds." For the two shape parameters ($r$ and $s$) we used values of 0.25, 0.50, and 0.75; for each of the two scale parameters ($\alpha$ and $\beta$) we used values of 5, 10, and 15. When we translate these various combinations into meaningful summary statistics it becomes easy to see the wide variation across these simulated worlds. For instance, buyer penetration (i.e., the number of customers who make at least one purchase, or $1 - P(0)$) varies from a low of 13% to a high of 76%. Likewise, average purchase frequency (i.e., mean number of purchases among buyers, or $E[X]/(1 - P(0))$) ranges from 2.1 up to 8.2 purchases per period. It is worth noting that this broad range covers the observed values from the original Schmittlein and Peterson (1994) application as well as the actual dataset used in our empirical analysis (to be discussed in the next section).

For each of the 81 simulated worlds, we created a synthetic panel of 4,000 households, then simulated the Pareto/NBD purchase (and dropout) process for a period of 104 weeks. We then ran the BG/NBD model on the first 52 weeks for each of these datasets, and used the estimated parameters to generate forecasts for a holdout period covering the remaining 52 weeks. We evaluate the performance of the BG/NBD based on the mean absolute percent error (MAPE) calculated across this 52-week forecast sales trajectory. If the MAPE value is a low number (below, say, 5%), we have faith in the applicability of the BG/NBD for that particular set of underlying parameters; otherwise, we need to look more carefully to understand why the BG/NBD is not doing an adequate job of matching the Pareto/NBD sales projection.

6.2. Simulation Results
In general, the BG/NBD performed quite well in this holdout-forecasting task. The average value of the MAPE statistic was 2.68%, and the worst case across all 81 worlds was a reasonably acceptable 6.97%. However, upon closer inspection we noticed an interesting, systematic trend across the worlds with relatively high values of MAPE. In Table 1 we summarize the relevant summary statistics for the 10 worst simulated worlds in contrast with the remaining 71 worlds. Notice that the BG/NBD forecasts tend to be relatively poor when penetration and/or purchase frequency are extremely low.

Upon further reflection about the differences between the two model structures, this result makes sense. Under the Pareto/NBD model, dropout can occur at any time, even before a customer has made his first purchase after the start of the observation period. However, under the BG/NBD, a customer cannot become inactive before making his first purchase. If penetrations and/or buying rates are fairly high, then this difference becomes relatively inconsequential. However, in a world where active buyers are either uncommon or very slow in making their purchases, the BG/NBD will not do such a good job of mimicking the Pareto/NBD.

Beyond this one source of deviation, there do not appear to be any other patterns associated with higher versus lower values of MAPE. For instance, the Pearson correlation between MAPE and penetration for the 71 worlds with "good behavior" is a modest 0.142. (In contrast, across all 81 worlds, this correlation is 0.379.) Therefore, when we set aside the worlds with sparse buying, the BG/NBD appears to be very robust.

Table 1 Summary of Simulation Results
                   MAPE    Penetration    Average purchase frequency
Worst 10 worlds    5.29    26%            2.6
Other 71 worlds    2.32    43%            3.8

It would be a simple matter to extend the BG/NBD model to allow for a segment of "hard core nonbuyers." This would require only one additional

parameter and would likely overcome this problem completely, but we do not see the likelihood or severity of this problem to be extreme enough to warrant such an extension as part of the basic model. Nevertheless, we encourage managers to continually monitor summary statistics such as penetration and purchase frequency; for many firms this is already a routine practice.

Having established the robustness (and an important limiting condition) of the BG/NBD, we now turn to a more thorough investigation of its performance (relative to the Pareto/NBD) in an actual dataset.

7. Empirical Analysis
We explore the performance of the BG/NBD model using data on the purchasing of CDs at the online retailer CDNOW. The full dataset focuses on a single cohort of new customers who made their first purchase at the CDNOW website in the first quarter of 1997. We have data covering their initial (trial) and subsequent (repeat) purchase occasions for the period January 1997 through June 1998, during which the 23,570 Q1/97 triers bought nearly 163,000 CDs after their initial purchase occasions. (See Fader and Hardie 2001 for further details about this dataset.)

For the purposes of this analysis, we take a 1/10th systematic sample of the customers. We calibrate the model using the repeat transaction data for the 2,357 sampled customers over the first half of the 78-week period and forecast their future purchasing over the remaining 39 weeks. For customer $i$ ($i = 1, \ldots, 2{,}357$), we know the length of the time period during which repeat transactions could have occurred ($T_i = 39 -$ time of first purchase), the number of repeat transactions in this period ($x_i$), and the time of his last repeat transaction ($t_{x_i}$). (If $x_i = 0$, $t_{x_i} = 0$.) In contrast to Fader and Hardie (2001), we are focusing on the number of transactions, not the number of CDs purchased.

Maximum likelihood estimates of the model parameters $(r, \alpha, a, b)$ are obtained by maximizing the log-likelihood function given in (7) above. Standard numerical optimization methods are employed, using the Solver tool in Microsoft Excel, to obtain the parameter estimates. (Identical estimates are obtained using the more sophisticated MATLAB programming language.) To implement the model in Excel, we rewrite the likelihood function, (6), as
$$L(r, \alpha, a, b \mid X = x, t_x, T) = A_1 \cdot A_2 \cdot (A_3 + \delta_{x>0} A_4),$$
where
$$A_1 = \frac{\Gamma(r+x)\, \alpha^r}{\Gamma(r)}, \quad A_2 = \frac{\Gamma(a+b)\, \Gamma(b+x)}{\Gamma(b)\, \Gamma(a+b+x)}, \quad A_3 = \left( \frac{1}{\alpha + T} \right)^{r+x}, \quad A_4 = \frac{a}{b + x - 1} \left( \frac{1}{\alpha + t_x} \right)^{r+x}.$$
(The terms $A_3$ and $A_4$ follow from (6) by factoring $A_1 A_2$ out of both terms, using $B(a+1, b+x-1)/B(a, b+x) = a/(b+x-1)$.)

This is very easy to code in Excel; see Figure 1 for complete details. (A note on how to implement the model in Excel, along with a copy of the complete spreadsheet, can be found at http://brucehardie.com/notes/004/.)

Figure 1 Screenshot of Excel Worksheet for Parameter Estimation

The parameters of the Pareto/NBD model are also obtained via MLE, but this task could be performed only in MATLAB due to the computational demands of the model. The parameter estimates and corresponding log-likelihood function values for the two models are reported in Table 2. Looking at the log-likelihood function values, we observe that the BG/NBD model provides a better fit to the data.

In Figure 2, we examine the fit of these models visually: The expected numbers of people making 0, 1, ..., 7+ repeat purchases in the 39-week model calibration period from the two models are compared to the actual frequency distribution. The fits of the two models are very close. On the basis of the chi-square

goodness-of-fit test, we note that the BG/NBD model provides a better fit to the data ($\chi^2_3 = 4.82$, $p = 0.19$) than the Pareto/NBD ($\chi^2_3 = 11.99$, $p = 0.007$).

Table 2 Model Estimation Results
            BG/NBD      Pareto/NBD
r           0.243       0.553
$\alpha$    4.414       10.578
a           0.793
b           2.426
s                       0.606
$\beta$                 11.669
LL          -9,582.4    -9,595.0

Figure 2 Predicted Versus Actual Frequency of Repeat Transactions

The performance of these models becomes more apparent when we consider how well the models track the actual number of (total) repeat transactions over time. During the 39-week calibration period, the tracking performance of the BG/NBD and Pareto/NBD models is practically identical. In the subsequent 39-week forecast period, both models track the actual (cumulative) sales trajectory, with the Pareto/NBD performing slightly better than the BG/NBD (under-forecasting by 2% versus 4%), but both models demonstrate superb tracking/forecasting capabilities.

Our final, and perhaps most critical, examination of the relative performance of the two models focuses on the quality of the predictions of individual-level transactions in the forecast period (Weeks 40-78) conditional on the number of observed transactions in the model calibration period. For the BG/NBD model, these are computed using (10). For the Pareto/NBD, as noted earlier, the equivalent expression is represented by Equation (22) in SMC.

In Figure 3, we report these conditional expectations along with the average of the actual number of transactions that took place in the forecast period, broken down by the number of calibration-period repeat transactions. (For each $x$, we are averaging over customers with different values of $t_x$.)

Figure 3 Conditional Expectations

Both the BG/NBD and Pareto/NBD models provide excellent predictions of the expected number of transactions in the holdout period. It appears that the Pareto/NBD offers slightly better predictions than the BG/NBD, but it is important to keep in mind that the groups towards the right of the figure (i.e., buyers with larger values of $x$ in the calibration period) are extremely small. An important aspect that is hard to discern from the figure is the relative performance for the very large "zero class" (i.e., the 1,411 people who made no repeat purchases in the first 39 weeks). This group makes a total of 334 transactions in Weeks 40-78, which comprises 18% of all of the forecast period transactions. (This is second only to the 7+ group, which accounts for 22% of the forecast period transactions.) The BG/NBD conditional expectation for the zero class is 0.23, which is much closer to the actual average (334/1,411 = 0.24) than that predicted by the Pareto/NBD (0.14).

Nevertheless, these differences are not necessarily meaningful. Taken as a whole across the full set of 2,357 customers, the predictions for the BG/NBD and Pareto/NBD models are indistinguishable from each other and from the actual transaction numbers. This is confirmed by a three-group ANOVA ($F_{2,7068} = 2.65$), which is not significant at the usual 5% level.

The means reported in Figure 3 mask the variability in the individual-level numbers. Consider, for example, the 100 customers who made three repeat transactions in the calibration period. In the course of the 39-week forecast period, this group of customers made anywhere between 0 and 10 repeat transactions, with an average of 1.56 transactions. The individual-level BG/NBD conditional expectations vary from 0.04 to

2.57, with an average of 1.52; the Pareto/NBD conditional expectations vary from 0.09 to 2.84, with an average of 1.71. Table 3 reports the correlations between the actual number of repeat transactions and the BG/NBD and Pareto/NBD conditional expectations, computed across all 2,357 customers.

Table 3    Correlations Between Forecast Period Transaction Numbers

              Actual    BG/NBD    Pareto/NBD
Actual        1.000
BG/NBD        0.626     1.000
Pareto/NBD    0.630     0.996     1.000

We observe that the correlation between the actual number of forecast period transactions and the associated BG/NBD conditional expectations is 0.626. Is this high or low? To the best of our knowledge, no other researchers have reported such measures of individual-level predictive performance, which makes it difficult for us to assess whether this correlation is good or bad. (We hope that future research will shed light on this issue.)

Given the objectives of this research, it is of greater interest to compare the BG/NBD predictions with those of the Pareto/NBD model. The differences are negligible: The correlation between these two sets of numbers is an impressive 0.996.

This analysis demonstrates the high degree of validity of both models, particularly for the purposes of forecasting a customer's future purchasing, conditional on his past buying behavior. Furthermore, it demonstrates that the performance of the BG/NBD model mirrors that of the Pareto/NBD model.

8. Discussion

Many researchers have praised the Pareto/NBD model for its sensible behavioral story, its excellent empirical performance, and the useful managerial diagnostics that arise quite naturally from its formulation. We fully agree with these positive assessments and have no misgivings about the model whatsoever, besides its computational complexity. It is simply our intention to make this type of modeling framework more broadly accessible so that many researchers and practitioners can benefit from the original ideas of SMC.

The BG/NBD model arises by making a small, relatively inconsequential, change to the Pareto/NBD assumptions. The transition from an exponential distribution to a geometric process (to capture customer dropout) does not require any different psychological theories, nor does it have any noteworthy managerial implications. When we evaluate the two models on their primary outcomes (i.e., their ability to fit and predict repeat transaction behavior), they are effectively indistinguishable from each other.

As Albers (2000) notes, the use of marketing models in actual practice is becoming less of an exception, and more of a rule, because of spreadsheet software. It is our hope that the ease with which the BG/NBD model can be implemented in a familiar modeling environment will encourage more firms to take better advantage of the information already contained in their customer transaction databases. Furthermore, as key personnel become comfortable with this type of model, we can expect to see growing demand for more complete (and complex) models, and more willingness to commit resources to them.

Beyond the purely technical aspects involved in deriving the BG/NBD model and comparing it to the Pareto/NBD, we have attempted to highlight some important managerial aspects associated with this kind of modeling exercise. For instance, to the best of our knowledge, this is only the second empirical validation of the Pareto/NBD model, the first being Schmittlein and Peterson (1994). (Other researchers, e.g., Reinartz and Kumar 2000, 2003; Wu and Chen 2000, have employed the model extensively, but do not report on its performance in a holdout period.) We find that both models yield very accurate forecasts of future purchasing, both at the aggregate level as well as at the level of the individual (conditional on past purchasing).

Besides using these empirical tests as a basis to compare models, we also want to call more attention to these analyses, with particular emphasis on conditional expectations, as the proper yardsticks that all researchers should use when judging the absolute performance of other forecasting models for CLV-related applications. It is important for a model to be able to accurately project the future purchasing behavior of a broad range of past customers, and its performance for the zero class is especially critical, given the typical size of that "silent" group.

In using this model, there are several implementation issues to consider. First, the model should be applied separately to customer cohorts defined by the time (e.g., quarter) of acquisition, acquisition channel, etc. (Blattberg et al. 2001). (For a very mature customer base, the model could be applied to coarse RFM-based segments.) Second, if we are using one cohort's parameters as the basis for, say, another cohort's conditional expectation calculations, we must be confident that the two cohorts are comparable. Third, we must acknowledge an implicit assumption when using the forecasts generated using a model such as that developed in the paper: We are assuming that future marketing activities targeted at the group of customers will basically be the same as those observed in the past. (Of course, such models can be used to provide a baseline against which we can
examine the impact of changes in marketing activity.) Finally, as with the Pareto/NBD, the BG/NBD must be augmented by a model of purchase amount before it can be used as the basis for CLV calculations. Two candidate models are the normal-normal mixture (Schmittlein and Peterson 1994) and the gamma-gamma mixture (Colombo and Jiang 1999). A natural starting point for any such extension would be to assume that purchase amount is independent of purchase timing (Schmittlein and Peterson 1994).

The BG/NBD easily lends itself to relevant generalizations, such as the inclusion of demographics or measures of marketing activity. (In fact, some potential end users of models such as the BG/NBD and the Pareto/NBD may view the inclusion of such variables as a necessary condition for implementation.) However, great care must be exercised when undertaking such extensions: To the extent that customer segments have been formed on the basis of past behavior (e.g., using the RFM framework) and these segments have been targeted with different marketing activities (Elsner et al. 2004), we must be aware of econometric issues such as endogeneity bias (Shugan 2004) and sample selection bias. If such extensions are undertaken, the BG/NBD in its basic form would still serve as an appropriate (and hard-to-beat) benchmark model and should be viewed as the right starting point for any customer-base analysis exercise in a "noncontractual" setting (i.e., where the opportunities for transactions are continuous and the time at which customers become inactive is unobserved).

Acknowledgments

The first author acknowledges the support of the Wharton-SMU (Singapore Management University) Research Center. The second author acknowledges the support of ESRC Grant R000223742 and the London Business School Centre for Marketing.

Appendix

In this appendix, we derive the expressions for $E[X(t)]$ and $E(Y(t) \mid X = x, t_x, T)$. Central to these derivations is Euler's integral for the Gaussian hypergeometric function:

\[
{}_2F_1(a, b; c; z) = \frac{1}{B(b, c-b)} \int_0^1 t^{b-1} (1-t)^{c-b-1} (1-zt)^{-a} \, dt, \quad c > b.
\]

Derivation of $E[X(t)]$

To arrive at an expression for $E[X(t)]$ for a randomly chosen customer, we need to take the expectation of (5) over the distribution of $\lambda$ and $p$. First we take the expectation with respect to $\lambda$, giving us

\[
E(X(t) \mid r, \alpha, p) = \frac{1}{p} - \frac{\alpha^r}{p(\alpha + pt)^r}.
\]

The next step is to take the expectation of this over the distribution of $p$. We first evaluate

\[
\int_0^1 \frac{1}{p} \, \frac{p^{a-1}(1-p)^{b-1}}{B(a,b)} \, dp = \frac{a+b-1}{a-1}.
\]

Next, we evaluate

\[
\int_0^1 \frac{\alpha^r}{p(\alpha + pt)^r} \, \frac{p^{a-1}(1-p)^{b-1}}{B(a,b)} \, dp
= \alpha^r \int_0^1 \frac{p^{a-2}(1-p)^{b-1}}{B(a,b)} (\alpha + pt)^{-r} \, dp
\]

letting $q = 1 - p$ (which implies $dp = -dq$)

\[
= \left(\frac{\alpha}{\alpha + t}\right)^{r} \int_0^1 \frac{q^{b-1}(1-q)^{a-2}}{B(a,b)} \left(1 - \frac{t}{\alpha + t}\, q\right)^{-r} dq
\]

which, recalling Euler's integral for the Gaussian hypergeometric function,

\[
= \left(\frac{\alpha}{\alpha + t}\right)^{r} \frac{B(a-1, b)}{B(a,b)} \; {}_2F_1\!\left(r, b; a+b-1; \frac{t}{\alpha + t}\right).
\]

It follows that

\[
E(X(t) \mid r, \alpha, a, b) = \frac{a+b-1}{a-1} \left[ 1 - \left(\frac{\alpha}{\alpha + t}\right)^{r} {}_2F_1\!\left(r, b; a+b-1; \frac{t}{\alpha + t}\right) \right].
\]

Derivation of $E(Y(t) \mid X = x, t_x, T)$

Let the random variable $Y(t)$ denote the number of purchases made in the period $(T, T+t]$. We are interested in computing the conditional expectation $E(Y(t) \mid X = x, t_x, T)$, the expected number of purchases in the period $(T, T+t]$ for a customer with purchase history $X = x, t_x, T$.

If the customer is active at $T$, it follows from (5) that

\[
E(Y(t) \mid \lambda, p) = \frac{1}{p} - \frac{1}{p} e^{-\lambda p t}. \tag{A1}
\]

What is the probability that a customer is active at $T$? Given our assumption that all customers are active at the beginning of the initial observation period, a customer cannot drop out before he has made any transactions; therefore,

\[
P(\text{active at } T \mid X = 0, T, \lambda, p) = 1.
\]

For the case where purchases were made in the period $(0, T]$, the probability that a customer with purchase history $(X = x, t_x, T)$ is still active at $T$, conditional on $\lambda$ and $p$, is simply the probability that he did not drop out at $t_x$ and made no purchase in $(t_x, T]$, divided by the probability of making no purchases in this same period. Recalling that this second probability is simply the probability that the customer became inactive at $t_x$, plus the probability he remained active but made no purchases in this interval, we have

\[
P(\text{active at } T \mid X = x, t_x, T, \lambda, p) = \frac{(1-p) e^{-\lambda (T - t_x)}}{p + (1-p) e^{-\lambda (T - t_x)}}.
\]

Multiplying this by $[(1-p)^{x-1} \lambda^x e^{-\lambda t_x}] / [(1-p)^{x-1} \lambda^x e^{-\lambda t_x}]$ gives us

\[
P(\text{active at } T \mid X = x, t_x, T, \lambda, p) = \frac{(1-p)^x \lambda^x e^{-\lambda T}}{L(\lambda, p \mid X = x, t_x, T)}, \tag{A2}
\]

where the expression for $L(\lambda, p \mid X = x, t_x, T)$ is given in (3). (Note that when $x = 0$, the expression given in (A2) equals 1.)
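As a rough illustration of (A1) and (A2), the following Python sketch evaluates both quantities for a single customer whose $\lambda$ and $p$ are treated as known. The function names and the numerical values are illustrative assumptions, not part of the original analysis, and the individual-level likelihood is written in the form implied by the multiplication step that leads to (A2).

```python
import math


def expected_purchases_if_active(lam, p, t):
    """(A1): expected purchases in (T, T + t] for a customer active at T."""
    return (1.0 - math.exp(-lam * p * t)) / p


def likelihood(lam, p, x, t_x, T):
    """Individual-level likelihood L(lambda, p | X = x, t_x, T).

    Written as the denominator implied by (A2): the 'still active' term
    plus, when x > 0, the 'dropped out right after the purchase at t_x' term.
    """
    still_active = (1 - p) ** x * lam ** x * math.exp(-lam * T)
    if x == 0:
        return still_active
    dropped_out = p * (1 - p) ** (x - 1) * lam ** x * math.exp(-lam * t_x)
    return still_active + dropped_out


def p_active(lam, p, x, t_x, T):
    """(A2): probability that the customer is still active at T."""
    return (1 - p) ** x * lam ** x * math.exp(-lam * T) / likelihood(lam, p, x, t_x, T)


def p_active_direct(lam, p, x, t_x, T):
    """The ratio stated just before (A2), used here as a cross-check."""
    if x == 0:
        return 1.0
    survive = (1 - p) * math.exp(-lam * (T - t_x))
    return survive / (p + survive)


if __name__ == "__main__":
    # Hypothetical customer: 3 repeat purchases, the last at week 20, observed to week 30.
    lam, p = 0.1, 0.2
    print(p_active(lam, p, x=3, t_x=20.0, T=30.0))
    print(expected_purchases_if_active(lam, p, t=39.0))
```

The two versions of the active probability should agree for any $x > 0$, which is a quick way to confirm the algebra of the multiplication step.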
Multiplying (A1) and (A2) yields

\[
\begin{aligned}
E(Y(t) \mid X = x, t_x, T, \lambda, p)
&= \frac{(1-p)^x \lambda^x e^{-\lambda T} \left( 1/p - (1/p)\, e^{-\lambda p t} \right)}{L(\lambda, p \mid X = x, t_x, T)} \\
&= \frac{p^{-1}(1-p)^x \lambda^x e^{-\lambda T} - p^{-1}(1-p)^x \lambda^x e^{-\lambda (T + pt)}}{L(\lambda, p \mid X = x, t_x, T)}.
\end{aligned}
\tag{A3}
\]

(Note that this reduces to (A1) when $x = 0$, which follows from the result that a customer who made zero purchases in the time period $(0, T]$ must be assumed to be active at time $T$.)

As the transaction rate $\lambda$ and dropout probability $p$ are unobserved, we compute $E(Y(t) \mid X = x, t_x, T)$ for a randomly chosen customer by taking the expectation of (A3) over the distribution of $\lambda$ and $p$, updated to take into account the information $X = x, t_x, T$:

\[
E(Y(t) \mid X = x, t_x, T, r, \alpha, a, b)
= \int_0^1 \int_0^\infty E(Y(t) \mid X = x, t_x, T, \lambda, p) \, f(\lambda, p \mid r, \alpha, a, b, X = x, t_x, T) \, d\lambda \, dp. \tag{A4}
\]

By Bayes theorem, the joint posterior distribution of $\lambda$ and $p$ is given by

\[
f(\lambda, p \mid r, \alpha, a, b, X = x, t_x, T)
= \frac{L(\lambda, p \mid X = x, t_x, T) \, f(\lambda \mid r, \alpha) \, f(p \mid a, b)}{L(r, \alpha, a, b \mid X = x, t_x, T)}. \tag{A5}
\]

Substituting (A3) and (A5) in (A4), we get

\[
E(Y(t) \mid X = x, t_x, T, r, \alpha, a, b) = \frac{\mathsf{A} - \mathsf{B}}{L(r, \alpha, a, b \mid X = x, t_x, T)}, \tag{A6}
\]

where

\[
\begin{aligned}
\mathsf{A} &= \int_0^1 \int_0^\infty p^{-1}(1-p)^x \lambda^x e^{-\lambda T} f(\lambda \mid r, \alpha) \, f(p \mid a, b) \, d\lambda \, dp \\
&= \frac{B(a-1, b+x)}{B(a,b)} \, \frac{\Gamma(r+x)\, \alpha^r}{\Gamma(r) (\alpha + T)^{r+x}}
\end{aligned}
\tag{A7}
\]

and

\[
\begin{aligned}
\mathsf{B} &= \int_0^1 \int_0^\infty p^{-1}(1-p)^x \lambda^x e^{-\lambda (T + pt)} f(\lambda \mid r, \alpha) \, f(p \mid a, b) \, d\lambda \, dp \\
&= \int_0^1 \frac{p^{a-2}(1-p)^{b+x-1}}{B(a,b)} \left\{ \int_0^\infty \frac{\alpha^r \lambda^{r+x-1} e^{-\lambda (\alpha + T + pt)}}{\Gamma(r)} \, d\lambda \right\} dp \\
&= \frac{\Gamma(r+x)\, \alpha^r}{\Gamma(r) B(a,b)} \int_0^1 p^{a-2}(1-p)^{b+x-1} (\alpha + T + pt)^{-(r+x)} \, dp
\end{aligned}
\]

letting $q = 1 - p$ (which implies $dp = -dq$)

\[
= \frac{\Gamma(r+x)\, \alpha^r}{\Gamma(r) B(a,b) (\alpha + T + t)^{r+x}} \int_0^1 q^{b+x-1}(1-q)^{a-2} \left( 1 - \frac{t}{\alpha + T + t}\, q \right)^{-(r+x)} dq
\]

which, recalling Euler's integral for the Gaussian hypergeometric function,

\[
= \frac{B(a-1, b+x)}{B(a,b)} \, \frac{\Gamma(r+x)\, \alpha^r}{\Gamma(r) (\alpha + T + t)^{r+x}} \; {}_2F_1\!\left( r+x, b+x; a+b+x-1; \frac{t}{\alpha + T + t} \right). \tag{A8}
\]

Substituting (6), (A7), and (A8) in (A6) and simplifying, we get

\[
E(Y(t) \mid X = x, t_x, T, r, \alpha, a, b)
= \frac{\dfrac{a+b+x-1}{a-1} \left[ 1 - \left( \dfrac{\alpha + T}{\alpha + T + t} \right)^{r+x} {}_2F_1\!\left( r+x, b+x; a+b+x-1; \dfrac{t}{\alpha + T + t} \right) \right]}{1 + \delta_{x>0} \, \dfrac{a}{b+x-1} \left( \dfrac{\alpha + T}{\alpha + t_x} \right)^{r+x}}.
\]

References

Albers, Sönke. 2000. Impact of types of functional relationships, decisions, and solutions on the applicability of marketing models. Internat. J. Res. Marketing 17(2-3) 169-175.

Balasubramanian, S., S. Gupta, W. Kamakura, M. Wedel. 1998. Modeling large datasets in marketing. Statistica Neerlandica 52(3) 303-323.

Blattberg, Robert C., Gary Getz, Jacquelyn S. Thomas. 2001. Customer Equity. Harvard Business School Press, Boston, MA.

Colombo, Richard, Weina Jiang. 1999. A stochastic RFM model. J. Interactive Marketing 13(Summer) 2-12.

Elsner, Ralf, Manfred Krafft, Arnd Huchzermeier. 2004. Optimizing Rhenania's direct marketing business through dynamic multilevel modeling (DMLM) in a multicatalog-brand environment. Marketing Sci. 23(2) 192-206.

Fader, Peter S., Bruce G. S. Hardie. 2001. Forecasting repeat sales at CDNOW: A case study. Part 2 of 2. Interfaces 31(May-June) S94-S107.

Jain, Dipak, Siddhartha S. Singh. 2002. Customer lifetime value research in marketing: A review and future directions. J. Interactive Marketing 16(Spring) 34-46.

Lozier, D. W., F. W. J. Olver. 1995. Numerical evaluation of special functions. Walter Gautschi, ed. Mathematics of Computation 1943-1993: A Half-Century of Computational Mathematics. Proc. Sympos. Appl. Math. American Mathematical Society, Providence, RI.

Mulhern, Francis J. 1999. Customer profitability analysis: Measurement, concentration, and research directions. J. Interactive Marketing 13(Winter) 25-40.

Niraj, Rakesh, Mahendra Gupta, Chakravarthi Narasimhan. 2001. Customer profitability in a supply chain. J. Marketing 65(July) 1-16.

Reinartz, Werner, V. Kumar. 2000. On the profitability of long-life customers in a noncontractual setting: An empirical investigation and implications for marketing. J. Marketing 64(October) 17-35.

Reinartz, Werner, V. Kumar. 2003. The impact of customer relationship characteristics on profitable lifetime duration. J. Marketing 67(January) 77-99.

Schmittlein, David C., Robert A. Peterson. 1994. Customer base analysis: An industrial purchase process application. Marketing Sci. 13(Winter) 41-67.

Schmittlein, David C., Donald G. Morrison, Richard Colombo. 1987. Counting your customers: Who are they and what will they do next? Management Sci. 33(January) 1-24.

Shugan, Steven M. 2004. Endogeneity in marketing decision models. Marketing Sci. 23(1) 1-3.

Wu, Couchen, Hsiu-Li Chen. 2000. Counting your customers: Compounding customer's in-store decisions, interpurchase time, and repurchasing behavior. Eur. J. Oper. Res. 127(1) 109-119.