Counting Your Customers the Easy Way: An Alternative to the Pareto/NBD Model
REFERENCES
Linked references are available on JSTOR for this article:
https://www.jstor.org/stable/40056956?seq=1&cid=pdf-
reference#references_tab_contents
INFORMS is collaborating with JSTOR to digitize, preserve and extend access to Marketing
Science
Bruce G. S. Hardie
London Business School, Regent's Park, London NW1 4SA, United Kingdom, bhardie@london.edu
Ka Lok Lee
Catalina Health Resource, Blue Bell, Pennsylvania 19422, kaloklee@alumni.upenn.edu
Today's managers are very interested in predicting the future purchasing patterns of their customers, which
can then serve as an input into "lifetime value" calculations. Among the models that provide such capabilities,
the Pareto/NBD "counting your customers" framework proposed by Schmittlein et al. (1987) is highly
regarded. However, despite the respect it has earned, it has proven to be a difficult model to implement,
particularly because of computational challenges associated with parameter estimation.
We develop a new model, the beta-geometric/NBD (BG/NBD), which represents a slight variation in the
behavioral "story" associated with the Pareto/NBD but is vastly easier to implement. We show, for instance,
how its parameters can be obtained quite easily in Microsoft Excel. The two models yield very similar results
in a wide variety of purchasing environments, leading us to suggest that the BG/NBD could be viewed as an
attractive alternative to the Pareto/NBD in most applications.
Key words: customer base analysis; repeat buying; Pareto/NBD; probability models; forecasting; lifetime value
History: This paper was received August 11, 2003, and was with the authors 7 months for 2 revisions;
processed by Gary Lilien.
... occurring at $t_2$ is the probability of remaining active at $t_1$ times the standard exponential likelihood component, which equals $(1-p)\lambda e^{-\lambda(t_2-t_1)}$.
This continues for each subsequent transaction, until:
• the likelihood of the $x$th transaction occurring at $t_x$ is the probability of remaining active at $t_{x-1}$ ...

$$P(X(t)=x \mid \lambda,p) = P(\text{active after $x$th purchase}) \cdot P(T_x \le t \text{ and } T_{x+1} > t) + \delta_{x>0} \cdot P(\text{becomes inactive after $x$th purchase}) \cdot P(T_x \le t).$$

Given the assumption that the time between transactions is characterized by the exponential distribution, $P(T_x \le t \text{ and } T_{x+1} > t)$ is simply the Poisson probability that $X(t) = x$, and $P(T_x \le t)$ is the Erlang-$x$ cdf. Therefore,

$$P(X(t)=x \mid \lambda,p) = (1-p)^x \frac{(\lambda t)^x e^{-\lambda t}}{x!} + \delta_{x>0}\, p(1-p)^{x-1}\left[1 - e^{-\lambda t}\sum_{j=0}^{x-1}\frac{(\lambda t)^j}{j!}\right]. \qquad (4)$$

4.3. Derivation of E[X(t)]
Given that the number of transactions follows a Poisson process, $E[X(t)]$ is simply $\lambda t$ if the customer is active at $t$. For a customer who becomes inactive at $\tau \le t$, the expected number of transactions in the period $(0, \tau]$ is $\lambda\tau$.
However, what is the likelihood that a customer becomes inactive at $\tau$? Conditional on $\lambda$ and $p$,

$$P(\tau > t) = P(\text{active at } t \mid \lambda, p) = \sum_{j=0}^{\infty} (1-p)^j \frac{(\lambda t)^j e^{-\lambda t}}{j!} = e^{-\lambda p t}.$$

This implies that the pdf of the dropout time is given by $g(\tau \mid \lambda, p) = \lambda p e^{-\lambda p \tau}$. (Note that this takes on an exponential form. However, it features an explicit association with the transaction rate $\lambda$, in contrast with the Pareto/NBD, which has an exponential dropout process that is independent of the transaction rate.) It follows that the expected number of transactions in a time period of length $t$ is given by

$$E(X(t) \mid \lambda, p) = \lambda t \cdot P(\tau > t) + \int_0^t \lambda\tau\, g(\tau \mid \lambda, p)\, d\tau \; \ldots$$

The four BG/NBD model parameters $(r, \alpha, a, b)$ can be estimated via the method of maximum likelihood in the following manner. Suppose we have a sample of $N$ customers, where customer $i$ had $X_i = x_i$ transactions in the period $(0, T_i]$, with the last transaction occurring at $t_{x_i}$. The sample log-likelihood function is given by

$$LL(r,\alpha,a,b) = \sum_{i=1}^{N} \ln\left[L(r,\alpha,a,b \mid X_i = x_i, t_{x_i}, T_i)\right]. \qquad (7)$$

This can be maximized using standard numerical optimization routines.
• Taking the expectation of (4) over the distribution of $\lambda$ and $p$ results in the following expression for the probability of observing $x$ purchases in a time period of length $t$:

$$P(X(t)=x \mid r,\alpha,a,b) = \frac{B(a,b+x)}{B(a,b)}\,\frac{\Gamma(r+x)}{\Gamma(r)\,x!}\left(\frac{\alpha}{\alpha+t}\right)^{r}\left(\frac{t}{\alpha+t}\right)^{x} + \delta_{x>0}\,\frac{B(a+1,b+x-1)}{B(a,b)}\left[1 - \left(\frac{\alpha}{\alpha+t}\right)^{r}\left\{\sum_{j=0}^{x-1}\frac{\Gamma(r+j)}{\Gamma(r)\,j!}\left(\frac{t}{\alpha+t}\right)^{j}\right\}\right].$$

• Finally, taking the expectation of (5) over the distribution of $\lambda$ and $p$ results in the following expression for the expected number of purchases in a time period of length $t$: ...
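As a rough illustration, the expression for $P(X(t)=x \mid r,\alpha,a,b)$ can be evaluated with nothing more than log-gamma functions. The sketch below is not the authors' code: the function names `log_beta` and `bgnbd_pmf` and the parameter values in the usage lines are illustrative assumptions, and it assumes $t > 0$.

```python
import math


def log_beta(a, b):
    """Natural log of the beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def bgnbd_pmf(x, t, r, alpha, a, b):
    """P(X(t) = x | r, alpha, a, b) for a randomly chosen customer (requires t > 0).

    First term: still active after the x-th purchase, exactly x purchases by t.
    Second term (x > 0 only): dropped out right after the x-th purchase,
    which itself happened no later than t.
    """
    log_q = math.log(alpha / (alpha + t))  # log of alpha / (alpha + t)
    log_s = math.log(t / (alpha + t))      # log of t / (alpha + t)

    def log_nbd_kernel(j):
        # log of Gamma(r + j) / (Gamma(r) j!) * (t / (alpha + t))^j
        return math.lgamma(r + j) - math.lgamma(r) - math.lgamma(j + 1) + j * log_s

    first = math.exp(log_beta(a, b + x) - log_beta(a, b)
                     + log_nbd_kernel(x) + r * log_q)
    if x == 0:
        return first

    partial = sum(math.exp(log_nbd_kernel(j)) for j in range(x))
    # Clamp at zero to absorb floating-point rounding when the bracket is tiny.
    bracket = max(0.0, 1.0 - math.exp(r * log_q) * partial)
    second = math.exp(log_beta(a + 1, b + x - 1) - log_beta(a, b)) * bracket
    return first + second


# Hypothetical parameter values, chosen only to exercise the function.
probs = [bgnbd_pmf(x, t=39.0, r=0.25, alpha=4.0, a=0.8, b=2.5) for x in range(8)]
print(probs)
```

Summing `bgnbd_pmf` over a long enough range of `x` should give a total very close to 1, which is a convenient sanity check on any implementation of this formula.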
... parameter and would likely overcome this problem completely, but we do not see the likelihood or severity of this problem to be extreme enough to warrant such an extension as part of the basic model. Nevertheless, we encourage managers to continually monitor summary statistics such as penetration and purchase frequency; for many firms this is already a routine practice.
Having established the robustness (and an important limiting condition) of the BG/NBD, we now turn to a more thorough investigation of its performance (relative to the Pareto/NBD) in an actual dataset.

7. Empirical Analysis
We explore the performance of the BG/NBD model using data on the purchasing of CDs at the online retailer CDNOW. The full dataset focuses on a single cohort of new customers who made their first purchase at the CDNOW website in the first quarter of 1997. We have data covering their initial (trial) and subsequent (repeat) purchase occasions for the period January 1997 through June 1998, during which the 23,570 Q1/97 triers bought nearly 163,000 CDs after their initial purchase occasions. (See Fader and Hardie 2001 for further details about this dataset.)
For the purposes of this analysis, we take a 1/10th systematic sample of the customers. We calibrate the model using the repeat transaction data for the 2,357 sampled customers over the first half of the 78-week period and forecast their future purchasing over the remaining 39 weeks. For customer $i$ ($i = 1, \ldots, 2{,}357$), we know the length of the time period during which repeat transactions could have occurred ($T_i = 39 - {}$time of first purchase), the number of repeat transactions in this period ($x_i$), and the time of his last repeat transaction ($t_{x_i}$). (If $x_i = 0$, $t_{x_i} = 0$.) In contrast to Fader and Hardie (2001), we are focusing on the number of transactions, not the number of CDs purchased.
Maximum likelihood estimates of the model parameters $(r, \alpha, a, b)$ are obtained by maximizing the log-likelihood function given in (7) above. Standard numerical optimization methods are employed, using the Solver tool in Microsoft Excel, to obtain the parameter estimates. (Identical estimates are obtained using the more sophisticated MATLAB programming language.) To implement the model in Excel, we rewrite the log-likelihood function, (6), as

$$L(r,\alpha,a,b \mid X = x, t_x, T) = A_1 \cdot A_2 \cdot (A_3 + \delta_{x>0} A_4),$$

where

$$A_1 = \frac{\Gamma(r+x)\,\alpha^{r}}{\Gamma(r)}, \qquad A_2 = \frac{\Gamma(a+b)\,\Gamma(b+x)}{\Gamma(b)\,\Gamma(a+b+x)}, \qquad \ldots$$

This is very easy to code in Excel - see Figure 1 for complete details. (A note on how to implement the model in Excel, along with a copy of the complete spreadsheet, can be found at http://brucehardie.com/notes/004/.)
The parameters of the Pareto/NBD model are also obtained via MLE, but this task could be performed only in MATLAB due to the computational demands of the model. The parameter estimates and corresponding log-likelihood function values for the two models are reported in Table 2. Looking at the log-likelihood function values, we observe that the BG/NBD model provides a better fit to the data.
In Figure 2, we examine the fit of these models visually: The expected numbers of people making 0, 1, ..., 7+ repeat purchases in the 39-week model calibration period from the two models are compared to the actual frequency distribution. The fits of the two models are very close. On the basis of the chi-square ...
Figure 1 Screenshot of Excel Worksheet for Parameter Estimation
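The calculation such a worksheet performs can also be sketched outside Excel. The following is a minimal sketch, not the authors' spreadsheet or MATLAB code. $A_1$ and $A_2$ follow the definitions given in the text. The forms used here for $A_3 = (\alpha+T)^{-(r+x)}$ and $A_4 = \frac{a}{b+x-1}(\alpha+t_x)^{-(r+x)}$ are assumptions taken from the standard statement of the BG/NBD likelihood, because they are not legible in this copy. The function names and the toy `customers` list are hypothetical.

```python
import math


def bgnbd_log_likelihood_one(x, t_x, T, r, alpha, a, b):
    """ln L(r, alpha, a, b | X = x, t_x, T) = ln[A1 * A2 * (A3 + delta_{x>0} * A4)]."""
    ln_a1 = math.lgamma(r + x) - math.lgamma(r) + r * math.log(alpha)
    ln_a2 = (math.lgamma(a + b) + math.lgamma(b + x)
             - math.lgamma(b) - math.lgamma(a + b + x))
    # Assumed form of A3 (not legible in the source text).
    ln_a3 = -(r + x) * math.log(alpha + T)
    if x == 0:
        return ln_a1 + ln_a2 + ln_a3
    # Assumed form of A4 (not legible in the source text).
    ln_a4 = math.log(a) - math.log(b + x - 1) - (r + x) * math.log(alpha + t_x)
    # ln(A3 + A4) via log-sum-exp to avoid underflow for large x.
    m = max(ln_a3, ln_a4)
    return ln_a1 + ln_a2 + m + math.log(math.exp(ln_a3 - m) + math.exp(ln_a4 - m))


def bgnbd_log_likelihood(params, customers):
    """Sample log-likelihood: the sum of individual log-likelihoods, as in (7)."""
    r, alpha, a, b = params
    return sum(bgnbd_log_likelihood_one(x, t_x, T, r, alpha, a, b)
               for x, t_x, T in customers)


# Hypothetical (x, t_x, T) records in weeks; not taken from the CDNOW data.
customers = [(0, 0.0, 30.0), (2, 20.5, 38.0), (1, 5.0, 35.5)]
print(bgnbd_log_likelihood((0.25, 4.0, 0.8, 2.5), customers))
```

Maximizing `bgnbd_log_likelihood` over the four positive parameters with any general-purpose optimizer plays the role that Solver plays in the worksheet. Working on the log scale is a design choice that keeps the gamma-function ratios from overflowing.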
              BG/NBD      Pareto/NBD
$r$            0.243         0.553
$\alpha$       4.414        10.578
$a$            0.793
$b$            2.426
$s$                          0.606
$\beta$                     11.669
LL          -9,582.4      -9,595.0
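To give the BG/NBD estimates a more tangible reading, the short sketch below converts them into two summary quantities. It assumes the mixing distributions that the model name implies: a gamma distribution with shape $r$ and rate $\alpha$ for the transaction rate, and a beta distribution with parameters $a$ and $b$ for the dropout probability. It also assumes that time is measured in weeks, as in the calibration period. The variable names are illustrative.

```python
# Estimates as reported in the table above.
r, alpha, a, b = 0.243, 4.414, 0.793, 2.426
ll_bgnbd, ll_pareto = -9582.4, -9595.0

# Assumption: lambda ~ gamma(shape r, rate alpha), so E[lambda] = r / alpha.
mean_rate = r / alpha

# Assumption: p ~ beta(a, b), so E[p] = a / (a + b).
mean_dropout = a / (a + b)

# Difference in log-likelihood between the two fitted models.
ll_gap = ll_bgnbd - ll_pareto

print(f"mean transaction rate while active: {mean_rate:.4f} per week")
print(f"mean dropout probability per transaction: {mean_dropout:.4f}")
print(f"log-likelihood gap (BG/NBD minus Pareto/NBD): {ll_gap:.1f}")
```

These are population averages over heterogeneous customers, so they describe the cohort as a whole rather than any individual buyer.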
Table 3 Correlations Between Forecast Period Transaction Numbers
              Actual   BG/NBD   Pareto/NBD
Actual         1.000
BG/NBD         0.626    1.000
Pareto/NBD     0.630    0.996    1.000

... 2.57, with an average of 1.52; the Pareto/NBD conditional expectations vary from 0.09 to 2.84, with an average of 1.71. Table 3 reports the correlations between the actual number of repeat transactions and the BG/NBD and Pareto/NBD conditional expectations, computed across all 2,357 customers.
We observe that the correlation between the actual number of forecast period transactions and the associated BG/NBD conditional expectations is 0.626. Is this high or low? To the best of our knowledge, no other researchers have reported such measures of individual-level predictive performance, which makes it difficult for us to assess whether this correlation is good or bad. (We hope that future research will shed light on this issue.)
Given the objectives of this research, it is of greater interest to compare the BG/NBD predictions with those of the Pareto/NBD model. The differences are negligible: The correlation between these two sets of numbers is an impressive 0.996.
This analysis demonstrates the high degree of validity of both models, particularly for the purposes of forecasting a customer's future purchasing, conditional on his past buying behavior. Furthermore, it demonstrates that the performance of the BG/NBD model mirrors that of the Pareto/NBD model.

8. Discussion
Many researchers have praised the Pareto/NBD model for its sensible behavioral story, its excellent empirical performance, and the useful managerial diagnostics that arise quite naturally from its formulation. We fully agree with these positive assessments and have no misgivings about the model whatsoever, besides its computational complexity. It is simply our intention to make this type of modeling framework more broadly accessible so that many researchers and practitioners can benefit from the original ideas of SMC.
The BG/NBD model arises by making a small, relatively inconsequential, change to the Pareto/NBD assumptions. The transition from an exponential distribution to a geometric process (to capture customer dropout) does not require any different psychological theories, nor does it have any noteworthy managerial implications. When we evaluate the two models on their primary outcomes (i.e., their ability to fit and predict repeat transaction behavior), they are effectively indistinguishable from each other.
As Albers (2000) notes, the use of marketing models in actual practice is becoming less of an exception, and more of a rule, because of spreadsheet software. It is our hope that the ease with which the BG/NBD model can be implemented in a familiar modeling environment will encourage more firms to take better advantage of the information already contained in their customer transaction databases. Furthermore, as key personnel become comfortable with this type of model, we can expect to see growing demand for more complete (and complex) models - and more willingness to commit resources to them.
Beyond the purely technical aspects involved in deriving the BG/NBD model and comparing it to the Pareto/NBD, we have attempted to highlight some important managerial aspects associated with this kind of modeling exercise. For instance, to the best of our knowledge, this is only the second empirical validation of the Pareto/NBD model - the first being Schmittlein and Peterson (1994). (Other researchers - e.g., Reinartz and Kumar 2000, 2003; Wu and Chen 2000 - have employed the model extensively, but do not report on its performance in a holdout period.) We find that both models yield very accurate forecasts of future purchasing, both at the aggregate level as well as at the level of the individual (conditional on past purchasing).
Besides using these empirical tests as a basis to compare models, we also want to call more attention to these analyses - with particular emphasis on conditional expectations - as the proper yardsticks that all researchers should use when judging the absolute performance of other forecasting models for CLV-related applications. It is important for a model to be able to accurately project the future purchasing behavior of a broad range of past customers, and its performance for the zero class is especially critical, given the typical size of that "silent" group.
In using this model, there are several implementation issues to consider. First, the model should be applied separately to customer cohorts defined by the time (e.g., quarter) of acquisition, acquisition channel, etc. (Blattberg et al. 2001). (For a very mature customer base, the model could be applied to coarse RFM-based segments.) Second, if we are using one cohort's parameters as the basis for, say, another cohort's conditional expectation calculations, we must be confident that the two cohorts are comparable. Third, we must acknowledge an implicit assumption when using the forecasts generated using a model such as that developed in the paper: We are assuming that future marketing activities targeted at the group of customers will basically be the same as those observed in the past. (Of course, such models can be used to provide a baseline against which we can ...
Derivation of E[X(t)]
To arrive at an expression for $E[X(t)]$ for a randomly chosen customer, we need to take the expectation of (5) over the distribution of $\lambda$ and $p$. First we take the expectation with respect to $\lambda$, giving us ...

$$P(\text{active at } T \mid X = x, t_x, T, \lambda, p) = \frac{(1-p)\,e^{-\lambda(T-t_x)}}{\delta_{x>0}\,p + (1-p)\,e^{-\lambda(T-t_x)}}.$$

Multiplying this by $\left[(1-p)^{x-1}\lambda^{x}e^{-\lambda t_x}\right] / \left[(1-p)^{x-1}\lambda^{x}e^{-\lambda t_x}\right]$ gives us ...

$$\ldots = \frac{L(\lambda, p \mid X = x, t_x, T)\, f(\lambda \mid r, \alpha)\, f(p \mid a, b)}{L(r, \alpha, a, b \mid X = x, t_x, T)}.$$

Substituting (A3) and (A5) in (A4), we get

$$E(Y(t) \mid X = x, t_x, T, r, \alpha, a, b) = \ldots$$

Fader, Peter S., Bruce G. S. Hardie. 2001. Forecasting repeat sales at CDNOW: A case study. Part 2 of 2. Interfaces 31(May-June) S94-S107.
Jain, Dipak, Siddhartha S. Singh. 2002. Customer lifetime value research in marketing: A review and future directions. J. Inter-