Lawrence 1966

Models of Consumer Purchasing Behaviour
Author(s): Raymond J. Lawrence

Source: Journal of the Royal Statistical Society. Series C (Applied Statistics), Vol. 15, No. 3,
Research into Marketing (1966), pp. 216-233
Published by: Wiley for the Royal Statistical Society
Stable URL: http://www.jstor.org/stable/2985301
Accessed: 11-03-2016 04:25 UTC
MODELS OF CONSUMER PURCHASING BEHAVIOUR
RAYMOND J. LAWRENCE
University of Lancaster
It is of interest that the term "model" is tending to oust the older
term "theory", although the two words have much in common, be-
cause of its close connection with the idea of measurement and exact
mathematical relationships. It has been pointed out (Langhoff, 1965)
that the change in terminology has its uses: "Whatever the reason for
shifting from the term 'theory' to the term 'model', it offers tremendous
semantic advantages for the acceptance and proper positioning of
science and the scientist in industry generally. Theory had become a
'long-haired' word. Not only was it associated with the abstract, it
also connoted impracticality. Model, having been identified with scale
models of boats, locomotives and airplanes, communicates more
graphically the correct notions of theory to men who are unacquainted
with the scientific importance of theoretical constructs."
Within the field of fully specified mathematical models we are here
concerned with a particular subset of models which treat consumer
purchasing behaviour (a) stochastically and (b) from data based on
internal behavioural evidence. These two terms require further
clarification.
Behavioural Evidence
Consumer buying presents a picture of frustrating complexity. In
a multi-brand market almost every customer has his own individual
pattern of brand purchases through time. Theoretical constructs are
needed as a basis for aggregating the data and reducing it to manage-
able (i.e. understandable) proportions.
The most commonly used construct is the time-period. Purchases
which occur between certain calendar dates are added together, as in
the case of ordinary sales figures for the week or book month. Com-
parisons over time can be made by comparing the figures for "equal
periods". The word construct is deliberately used in this context to
draw attention to the fact that the standard time-period comparisons
are an artificial convention, although the method is so familiar from
long usage that it has the appearance of being the natural way to keep
track of events. The limitations appear when one considers how little
information is conveyed by a rise or drop in sales: the message is simply
"something has happened". The bare figures give no idea of what,
where or why. Analysis is needed to answer these questions, and here
one comes back to constructs. Types of trade outlet, sales territories
and socio-demographic classes of customer are typical breakdowns of
the figures which are used in an attempt to track down reasons for the
change in sales. Managers have in practice a theory, although they
would often hesitate to use the word themselves, about the constructs
which are likely to be helpful in isolating the causes of events affecting
their markets.
There are a number of other ways of aggregating sales information.
One is over customers, e.g. by counting up how many accounts fall
into different turnover categories. Some surprising results are apt to
show up, as in the case quoted by Sevin (1965) where 78% of a com-
pany's customers were found to account for 2% of sales volume. In
the consumer goods field an analogous operation would be a count of
customers by quantity purchased, which frequently shows up the
enormous importance of relatively few heavy buyers. A further possi-
bility (which brings us back to the point about behavioural evidence)
is to use the concept of brand loyalty as a means of subdividing the
market in an operationally useful way.
Brand loyalty can be regarded as a mental state, to be measured by
agreement with such statements as "I definitely intend to buy brand X
next time", "I always use X", etc. Unfortunately there is considerable
evidence that expressed intentions are not followed in many cases, and
more generally there seems to be an unbridged gap between measure-
ments of attitude and the occurrence of the behaviour which would
logically accompany that attitude. (For the social science evidence
see Festinger (1964). Similar conclusions are reached from a study
of the marketing literature in Haskins (1964).) An alternative approach
is to bypass the uncertain area of intention, attitude and other mental
states and to take a measure of brand loyalty directly from observed
behaviour. Perhaps the first systematic step in this direction was a
study by George Brown (1952-53) based on brand purchase sequences
taken from Chicago Tribune panel data. Brown classified such patterns
as AAAAAA as showing "undivided loyalty"; ABABAB as "divided
loyalty"; AAABBB as "unstable loyalty", etc. Later work has used
an arbitrary percentage figure, e.g. 75% or more of purchases of one
brand, as an operational definition of the brand loyal consumer.
Models which use behavioural evidence are based on records of
specific acts, usually records of continuous consumer purchases through
time. In this regard they have a kinship with psychological investiga-
tions such as animal learning experiments, where observed behaviour
has to be taken as the criterion of changes which have occurred. In
contrast, many models of human behaviour make use of verbal state-
ments from subjects about their preferences, feelings, reactions, etc.
This type of model-building may have greater long-run prospects of
usefulness but we will not be concerned with it here.
Describing Behaviour Stochastically
The application of model-building in the social sciences is a com-
paratively recent phenomenon, and the famous paper on statistical
learning theory by Estes (1950) is often taken as the starting date. A
basic difficulty is that few acts of human or animal behaviour are fully
predictable once an attempt is made to move beyond purely physio-
logical response measurements: even the most thoroughly conditioned
of laboratory rats is capable of delivering an occasional surprise. Thus
the immediate prospect of developing completely deterministic theories,
akin to those of the physical sciences, is slim. Attention has therefore
moved toward the stochastic or probabilistic expression of behaviour.
If the state of social science theory does not go as far as enabling an
investigator to say "this rat will turn right at the end of the maze", it
may make sense for him to adopt statements like "the probability that
this rat will turn right at the end of the maze is .95".
The probabilistic expression brings with it problems of interpreta-
tion. In one sense it is purely descriptive, and means only that a rat
will average 95 right turns in 100 runs. But it is natural to extend the
meaning beyond a count of overt, observable acts of turning and to
interpret the probability as "a state of the rat", i.e. to assume that
some semi-permanent change in behavioural disposition has come
about such that the rat itself is now a " 95 right-turner".
The transition from a description of behaviour to an assumed
corresponding internal state of the organism is not without difficulties.
Bush and Mosteller (1955, p. 15) acknowledge the problem when they
write: "We conceive that every organism possesses a 'true' proba-
bility p, at the start of each trial. As far as the mathematical system is
concerned, the physical basis for this probability is irrelevant. A
variety of physiological models might provide it. Any sort of fictitious
device that aids us to think about probabilities is as acceptable as any
other. [For example one may think in terms of spinning disks with
sectors marked to correspond with probabilities; urns containing
black and white balls; random number tables, etc.] The above
heuristic devices have little to do, of course, either with the real world
or with the mathematical system. However, we postulate that organisms
behave as if they possessed such probability mechanisms."
An alternative explanation is put forward by Newell and Simon
(1963): "Stochastic theories (theories incorporating random elements)
can be interpreted in either of two ways: the random element can be
regarded as being at the heart of things-a genuine and integral
characteristic of the subject's constitution-or it can be regarded as
an artifact for summing up the unpredictable resultant of a host of
minor (and additive) factors that influence behaviour. There appears
to be, at present, no satisfactory way to choose empirically between
these two interpretations of the random element."
The above quotations are useful aids toward understanding the
assumptions built into the stochastic model of behaviour. A further
point is worth adding. To describe behaviour stochastically is to sum
up the interplay of motivations, influences, and choice mechanisms
within an individual in a single probability value which must be in
some sense their "settled outcome" in terms of disposition toward
subsequent behaviour. Thus the probability, if it is to be adopted as
the best available concept to quantify action tendency, expresses the
action-oriented resultant of all relevant internal mental processes.
In application to consumer purchasing the state of the system usually
corresponds to the state of a customer. The probabilities which express
this state are related to brands. P(i) = 0.5, for instance, represents the
state of an individual who stands an even chance of buying the ith
brand next time. The sum of the probabilities for all i = 1,2, . . .,n is
taken to be unity. In Markov process models the consumer is defined
to be in state i when his preceding purchase was brand i, and the transi-
tion probability P(ij), including P(ii) or the probability of remaining
"loyal" to brand i, represents the likelihood of a switch from brand i
to brand j on the next purchase occasion.
Levels of Analysis
It has been assumed so far that purchasing behaviour models are
concerned with events at the individual behaviour, or micro, level.
When it comes to aggregation to represent a population of consumers
there is in theory no problem: an average can be struck of the individual
probabilities of buying brand A, preferably with appropriate weighting
so that the big purchasers count more heavily. The aggregate proba-
bility for the group should then approximate to brand A's market
share in the buying period. One of the simplest brand-switching
models, Al Rohloff's gain/loss analysis (Rohloff, 1963), follows exactly
these lines. A family's purchases in two successive periods are allo-
cated as switches from one brand to another (or the same) brand ac-
cording to two rules: (1) Repeat business for a brand is equal to the
minimum of the share points for the brand in the two periods; (2) An
increase in share points for a brand is allocated to the brands losing
share points according to the relative magnitude of their respective
losses.
The rules amount to a straightforward allocation, on a pro rata basis,
of brand gains to brands which lose share of the household's purchases.
The allocation is purely mathematical, and no theoretical claims are
made in its favour. It is just a convenient way of summarizing data.
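The two allocation rules can be sketched in a few lines of Python. This is only an illustration: the function name `gain_loss`, the dictionary layout and the share-point figures are assumptions made here, not details taken from Rohloff's paper.

```python
def gain_loss(before, after):
    """Allocate one household's share points between two periods.

    before, after: dicts mapping brand -> share points (same total in both).
    Returns a dict {(from_brand, to_brand): share points}.
    """
    brands = sorted(set(before) | set(after))
    flows = {}
    # Rule 1: repeat business is the minimum of the two periods' share points.
    for b in brands:
        flows[(b, b)] = min(before.get(b, 0.0), after.get(b, 0.0))
    losses = {b: max(before.get(b, 0.0) - after.get(b, 0.0), 0.0) for b in brands}
    gains = {b: max(after.get(b, 0.0) - before.get(b, 0.0), 0.0) for b in brands}
    total_loss = sum(losses.values())
    # Rule 2: each gain is split among the losing brands pro rata to their losses.
    for gainer, gain in gains.items():
        for loser, loss in losses.items():
            if gain > 0 and loss > 0:
                flows[(loser, gainer)] = gain * loss / total_loss
    return flows


# Hypothetical household: brand A gains 20 points, B and C lose 10 each.
flows = gain_loss({"A": 50, "B": 30, "C": 20}, {"A": 70, "B": 20, "C": 10})
for (src, dst), points in sorted(flows.items()):
    print(src, "->", dst, points)
```

With these invented figures brand A's gain is drawn equally from B and C, because their losses are equal.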
Confusion unfortunately arises because another method of estimating
probabilities is in use, one derived entirely from macro or aggregate
information. If 100 people buy brand A this time and 20 of them buy
brand B next time, the probability of a switch from A to B is said to
be 0.2. This estimate may be valid for the group but it is not necessarily
valid for any individual within the group unless the group is
completely homogeneous. It could be that 40 people have a 0.5
likelihood of moving from A to B and the remaining 60 a zero likelihood,
i.e. they never buy brand B. The observed macro-level data could
equally well be explained on this hypothesis.
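A short calculation makes the point concrete. The helper `expected_switchers` and the way the groups are written down are hypothetical conveniences; the figures are those of the example just given.

```python
def expected_switchers(groups):
    """groups: list of (number of people, individual A-to-B switch probability)."""
    return sum(people * prob for people, prob in groups)


homogeneous = [(100, 0.2)]            # everyone has the same 0.2 chance
mixed = [(40, 0.5), (60, 0.0)]        # 40 half-and-half buyers, 60 who never buy B

print(expected_switchers(homogeneous))
print(expected_switchers(mixed))
```

Both populations lead to the same expected number of switchers, so the aggregate count alone cannot distinguish them.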
As long ago as 1934 Paul Lazarsfeld talked of "assuming a sort of
standard consumer and attributing to him as a measure of degree of
importance what originally was a measure of frequency in the group
investigated" (Lazarsfeld, 1934). It is unwise, however, to push too
far the assumption that group characteristics can be translated to in-
dividuals within the group. More recently, Ronald Howard (1963)
has pointed out the fallacy of the argument in the case of Markov
process models: "The main point to make about the aggregation
problem is that if you assume a Markov model for customer behavior,
you must state whether you think this model applies to the individual
customer or to the market as a whole. If you model the whole popu-
lation by the Markov transition matrix, then you are dealing with the
Markov process as a flow model, not as a stochastic process. If the
Markov model is applied to the individual customer then there will
be a fluctuation in the number of customers purchasing each brand in
the steady state; this fluctuation can be predicted by the methods we
have discussed."
Howard's discussion is mainly statistical. He is prepared to believe
that each of the ck customers of brand K has the same probability of
purchasing brand J n periods later, i.e. he accepts the homogeneity of
the k-buying population. His point is that there are N separate bi-
nomial distributions in an N brand market, so that the total number
of customers who will be buying brand J at time n will have a distri-
bution that is the convolution of all these N binomial distributions.
Howard derives the expected value of the convolution and also its
variance, the latter being the real point of his article because he is
mainly concerned to demonstrate the possibility of fluctuation around
the expected value of a Markov process model, whereas a "pure"
Markov model incorporates the assumption that the actual flow will be
the expected flow.
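Howard's point about fluctuation can be illustrated with the standard binomial results. If the customers of each brand K switch to brand J independently with a common probability, the number arriving from K is binomial, and the mean and variance of the total are sums over the N brands. The sketch below assumes that independence; the customer counts and probabilities are invented for illustration.

```python
def flow_mean_and_variance(customers, probs):
    """Mean and variance of the number of brand J buyers at time n.

    customers[k]: number of current customers of brand k
    probs[k]:     probability that a brand k customer buys brand J at time n
    """
    mean = sum(c * p for c, p in zip(customers, probs))
    variance = sum(c * p * (1 - p) for c, p in zip(customers, probs))
    return mean, variance


# Hypothetical three-brand market with 1,000 customers in all.
mean, variance = flow_mean_and_variance([500, 300, 200], [0.8, 0.2, 0.1])
print(mean, variance, variance ** 0.5)
```

A pure flow model would report only the mean; the stochastic reading adds a standard deviation around it.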
Commenting on the same point, Ehrenberg (1965, p. 361) maintains
that it could be perfectly adequate to use a flow model with proba-
bilities, or more exactly proportions, derived from aggregate data. If
a proportion 0.7 of a population buys brand A again, it makes no
difference for such a model whether every individual in the population
has the same probability 0.7 of buying again or whether, at the other
extreme, a proportion 0.7 of the group is certain to buy again and the
remaining 0.3 have a zero probability of repurchase. He believes that
the only relevant criterion is whether such a model works, by fitting the
available data to an adequate degree of approximation and by supply-
ing usable deductions.
It seems doubtful whether it would be fruitful to pursue model-
building which entirely neglected heterogeneity at the micro-level.
Both aggregate and individual data may have to be used. It is im-
portant to watch for the careless translation of evidence from one field
to the other, and confusion between the two levels at which a model
might be applicable.
Macro-Level Models
It may be useful to refer briefly to types of model which have been
put forward at the macro-level. These models are non-stochastic except
in the Howard sense that a process which may be conceptualized in
probabilistic terms can have a variance associated with the proba-
bilities, so that the outcome is not fully deterministic.
Two main areas of aggregate consumer behaviour have been in-
vestigated: the rate at which sales of new brands build up, and the rate
at which sales decline, particularly when advertising and promotional
support is withdrawn. An example of the first category is the Fourt
and Woodlock model (1960), which assumes that a constant proportion,
r, of the homes which have not yet tried a new brand will buy it for
the first time in the succeeding time-period. Given a ceiling level of
penetration, x, which is the proportion of households which will ever
try the brand, the increment in penetration in the ith time-period is
given as a first approximation by the formula
$r x (1 - r)^{i-1}$.
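As a rough numerical illustration of this first approximation, the increment in period i is r times x times (1 - r) raised to the power i - 1, so cumulative penetration climbs towards the ceiling x. The function name and the parameter values below are hypothetical.

```python
def penetration_increments(r, x, periods):
    """Increment in penetration for periods 1..periods.

    r: constant proportion of remaining non-triers who try in each period
    x: ceiling penetration (proportion of households that will ever try)
    """
    return [r * x * (1 - r) ** (i - 1) for i in range(1, periods + 1)]


increments = penetration_increments(r=0.3, x=0.4, periods=5)
cumulative = 0.0
for period, inc in enumerate(increments, start=1):
    cumulative += inc
    print(period, round(inc, 4), round(cumulative, 4))
```

Each increment is smaller than the last, and the running total never exceeds x.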
To allow for buyer heterogeneity, and in particular the "stretch-
out" effect of light buyers coming in as first-time triers over a long
period of time, a refinement of the model allows a gradual increase in
the ceiling penetration figure which is represented by the line $x_0 + ik$,
i.e. the slope is a linear function of the time-period.
The way in which the use of a new drug spreads through a medical
community has been described in terms of a "snowball" or "chain-
reaction" process (Coleman et al., 1957), where the proportion (y) of
doctors who have prescribed the drug at time t can be modelled by the
differential equation $dy/dt = ky(1 - y)$. James Coleman outlines a "no
brand loyalty" model in which the states labelled 1, 2, 3, . . . are the
states of having bought brand A once, twice, three times, etc. in suc-
cession without having switched to another brand. The state 0 is the
state of having bought another brand at the last purchase. The lack
of brand loyalty is represented by a constant for the rate of moving on
from state to state: since it is a constant, the probability of buying
brand A next time is no greater whether the brand has been bought
once or many times previously. Under equilibrium conditions where
the number crossing into state i per unit is equal to the number going
out, Coleman shows that the proportion $p_i$ of all consumers who, at
the time of measurement, have made exactly i purchases of brand A
since buying a different brand is given by
$p_i = (1 - x)x^i$,
where x is a function of the transition rates between states. Coleman
(1964a) suggests that deviations of the data from predictions given by
the model could be taken as evidence of brand loyalty.
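The equilibrium result is a geometric distribution: the proportion in state i is (1 - x) times x to the power i, so each state holds a fixed fraction x of the state before it. A minimal sketch follows; the function name and the value chosen for x are illustrative only.

```python
def no_loyalty_distribution(x, max_state):
    """Proportion of consumers in states 0..max_state under the no-loyalty model."""
    return [(1 - x) * x ** i for i in range(max_state + 1)]


props = no_loyalty_distribution(x=0.6, max_state=8)
ratios = [later / earlier for earlier, later in zip(props, props[1:])]
print([round(p, 4) for p in props])
print([round(q, 4) for q in ratios])
```

Observed proportions whose successive ratios rise with i, rather than staying constant, would be the kind of deviation Coleman proposes to read as brand loyalty.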
A number of models have incorporated hypotheses about the rate of
decline in sales in the absence of advertising or promotional support.
Nerlove and Arrow (1962) assume a "decay factor" operating on a
stock of goodwill. Telser (1962), in a detailed study of the sales effect-
iveness of American cigarette advertising, has an expression for the
depreciation of advertising capital with time. Vidale and Wolfe (1957)
report an exponential "sales decay constant" from studies of many
product categories. A number of relatively crude models relating sales
directly to advertising (e.g. Weinberg, 1960; Fox, 1960; and the
"dynamic difference" approach) imply some assumption about sales
declining in the absence of advertising support.
Micro-Models-Constant Probability
Returning to the individual consumer, the simplest form of stochastic
model for brand-purchasing behaviour is to assume that each person
has a single probability of buying a given brand which remains con-
stant, and to regard each purchase decision as an independent event.
The hypothesis has been tested by Frank (1962) using Chicago Tribune
consumer panel data for regular and instant coffee purchases. Frank
used the distribution of runs approach. If a customer makes $n_1$ purchases
of brand A and $n_2$ non-brand A purchases, the expected number
of runs (uninterrupted sequences of brand A or non-brand A purchases)
in $n = n_1 + n_2$ trials will be, on a purely random basis,
$M = \frac{2 n_1 n_2}{n} + 1$.
The actual number of runs can be compared with the expected
number. If fewer runs appear than would be anticipated by chance
there is evidence of clustering, i.e. long sequences of brand A or non-
brand A purchases, which would indicate that the probability of
brand A purchases was not constant but had fluctuated during the
period of analysis.
A normal deviate (K) can be computed from:
$K = \frac{r + \frac{1}{2} - M}{\sigma}$,
where r is the observed number of runs and
$\sigma = \sqrt{\frac{2 n_1 n_2 (2 n_1 n_2 - n)}{n^2 (n - 1)}}$.
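The runs calculation for a single customer can be sketched as follows. The expected number of runs is taken as 2 n1 n2 / n + 1 and the standard deviation as the usual large-sample value for the runs test; the half-unit continuity correction in K is an assumption of this sketch, as are the function name and the example sequence.

```python
import math


def runs_test(purchases, brand="A"):
    """Return (r, M, K) for one customer's purchase sequence."""
    flags = [p == brand for p in purchases]
    n1 = sum(flags)             # purchases of the brand
    n2 = len(flags) - n1        # purchases of any other brand
    n = n1 + n2
    if n1 == 0 or n2 == 0 or n < 3:
        raise ValueError("need both kinds of purchase and at least three in all")
    # A new run starts at every change between brand and non-brand.
    r = 1 + sum(a != b for a, b in zip(flags, flags[1:]))
    m = 2 * n1 * n2 / n + 1
    sigma = math.sqrt(2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1)))
    k = (r + 0.5 - m) / sigma   # half-unit continuity correction (assumed)
    return r, m, k


# Hypothetical strongly clustered sequence: five A purchases, then five B.
r, m, k = runs_test("AAAAABBBBB")
print(r, m, round(k, 2))
```

A markedly negative K, as in this clustered example, is the pattern taken as evidence against a constant purchase probability.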
Frank found many more normal deviates in the negative tail of the
distribution than could be accommodated by the random hypothesis,
i.e. some 20% of families showed fewer runs than the "constant
probabilities" assumption would allow although there were a large
proportion of families behaving in a manner consistent with the hypothesis.
Kuehn (1962, p. 402) comments shrewdly on this interpretation
when he says: "Frank sets up his hypothesis, tests it at some level of
significance for each of a large number of cases (families), and then
interprets the results as though all cases not shown to deviate statistically
on an individual basis are consistent with the hypothesis. Actually,
the hypothesis was that consumers have a constant probability of
purchase, and the results indicated that a larger number of the indi-
vidual cases tested lay outside the confidence limits than is consistent
with the hypothesis, thereby rejecting the hypothesis in toto!"
The constant probability model does not appeal intuitively, but
Frank's approach indicates a useful method of analysis which is
preferable to dismissing the model out of hand. The numbers of low
deviates show the tendency of individuals' purchases to cluster by
brand and could provide the basis for "loyalty" comparisons across
product categories, although such comparisons have not yet ap-
peared in the published literature.
It is more doubtful whether constant probabilities at the micro-level
can be disproved by macro-level data. For example, Herniter (1965)
quotes data of purchase sequence frequencies in which the empirical
probability of a given brand being purchased following various pre-
ceding patterns of brand purchase is calculated. The variable 1 in a
sequence signifies that the brand in question was bought and the
variable 0 indicates the purchase of some other brand. Thus the
sequence 011 means that the brand in question was bought on the last
two of three purchase occasions. It is possible to analyse consumer
panel records for a period and establish how frequently a pattern
like 011 is followed by a 1 or a 0, giving results such as those shown
in Table I.
TABLE I
Sequence    Empirical probability of brand repurchase
000         0.11
001         0.52
010         0.36
011         0.64
100         0.38
101         0.64
110         0.49
111         0.89
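A table of this kind can be built from panel records by counting, for every three-purchase pattern, how often the next purchase is the brand in question. The sketch below assumes each household's record has already been coded as a string of 1s and 0s; the function name and the three toy histories are hypothetical and are not Herniter's data.

```python
from collections import defaultdict


def repurchase_table(histories, k=3):
    """Empirical probability of a 1 following each length-k pattern."""
    counts = defaultdict(lambda: [0, 0])    # pattern -> [occurrences, followed by 1]
    for history in histories:
        for t in range(k, len(history)):
            pattern = history[t - k:t]
            counts[pattern][0] += 1
            counts[pattern][1] += history[t] == "1"
    return {p: hits / total for p, (total, hits) in counts.items()}


table = repurchase_table(["0111", "0110", "1111"])
for pattern in sorted(table):
    print(pattern, table[pattern])
```

Because the counts pool all households, the resulting figures are group proportions in the sense discussed in the text.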
This approach was pioneered and developed by Al Kuehn (1958).
However, it should be noticed that the empirical repurchase proba-
bility for the group says nothing about individual repurchase proba-
bilities unless, as some authors seem tacitly to assume, the whole
group is treated as homogeneous. Frank (1962) has shown that micro-
level data in which there is no "learning", or tendency for the purchase
of a brand to lead to the same brand being purchased next time, can
be aggregated to produce spurious evidence that repeat purchase
probabilities do increase as the number of previous consecutive pur-
chases increase. The reason is that the sub-group which has made
(say) eight consecutive purchases of a brand contains more people of
a loyal type, so that the observed higher repurchase probability is a
function of this selectivity and not of reinforcement resulting from re-
peated purchasing and experience of a brand. The same argument
applies in reverse. The high repurchase probability following the
sequence 111 may only show that the population of consumers is
heterogeneous, with no implication about micro-level behaviour.
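The selectivity effect can be reproduced with a small exact calculation. Suppose the population consists of a few types, each with a constant purchase probability and no learning at all; the repurchase proportion among those who have just made k consecutive purchases is then a ratio of weighted powers. The two types used here are invented, and the function name is hypothetical.

```python
def repurchase_after_run(types, k):
    """Aggregate repurchase proportion after k consecutive purchases.

    types: list of (population share, constant purchase probability)
    """
    bought_k_plus_one = sum(w * p ** (k + 1) for w, p in types)
    bought_k = sum(w * p ** k for w, p in types)
    return bought_k_plus_one / bought_k


# Hypothetical population: half "loyal" (0.9), half "occasional" (0.1).
types = [(0.5, 0.9), (0.5, 0.1)]
for k in range(6):
    print(k, round(repurchase_after_run(types, k), 3))
```

The proportion rises with k although no individual's probability ever changes; the longer the run, the more the sub-group is dominated by the 0.9 type.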
An interesting parallel of misleading aggregation comes from an
experiment carried out by Virginia Voeks (1954), in which subjects
were conditioned to an eye-blink response. Each individual learned
the response suddenly or "jumpwise", i.e. a run of no responses was
followed by consistent responding, but the learning took place on
different trials for different individuals. When the data was added
together to show the percentage responding after 1,2,3, . . .,12 trials a
gradually increasing curve resulted. In this case there was no question
of any individual having learned gradually, as the individual records
showed. The apparent slow and continuous increase in learning at the
aggregate level was entirely due to one individual after another joining
the group who had acquired the response as the trials went on. (For
an illuminating comment on this work see Hilgard (1956).) There is a
powerful lesson in this example about making assumptions from
aggregate data about the pattern of individual behaviour.
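The same effect is easy to reproduce. In the sketch below every subject switches from never responding to always responding on a single trial; the trial numbers are made up for illustration and are not Voeks's data.

```python
def aggregate_curve(jump_trials, n_trials):
    """Proportion of subjects responding on each of trials 1..n_trials.

    jump_trials: for each subject, the trial from which responding is consistent.
    """
    subjects = len(jump_trials)
    return [sum(1 for j in jump_trials if j <= t) / subjects
            for t in range(1, n_trials + 1)]


# Hypothetical subjects, each learning "jumpwise" on a different trial.
curve = aggregate_curve([2, 4, 5, 7, 9, 12], n_trials=12)
print([round(v, 2) for v in curve])
```

Each individual record is a step from 0 to 1, yet the pooled curve rises gradually across the twelve trials.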
First-Order Markov Process Models
The attractive mathematical development of Markov processes has
led to a number of attempts to apply them to consumer purchasing.
The simplest assumption is to regard the transition rates between
states as constant over time. First, however, some definitions are in
order. The fixed probability model describes the consumer as being in
a state $p_i$ (i = 1,2, . . .,n) in an n brand market, where $p_i$ is the probability
that brand i will be purchased on any occasion. The Markov
process model characterizes the buyer by a transition rate $P_{ij}$ representing
the conditional probability that, given that he is in state i, he will
next move to state j. First-order Markov processes have been mainly
used in the marketing literature, state i corresponding to the situation
where a consumer's last purchase was brand i. A typical repeat-buying
and brand-switching matrix of this type takes the form shown in
Table II.
TABLE II
Period 1          Period 2 brand buyers
brand buyers      A      B      C
A                 0.8    0.1    0.1
B                 0.2    0.7    0.1
C                 0.1    0.2    0.7
The rows add to unity and the table shows that a proportion 0.8 of
brand A buyers in the first period will buy it again in the second period,
0.1 will switch to brand B and 0.1 will switch to brand C. Similar
matrices can be built up empirically from consumer panel purchase
records, although not without some awkwardness and artificiality
which will be discussed later. It will be noticed that Table II represents
a switch to the macro-level of analysis: as usually employed, the transition
matrix is derived from aggregate data and indicates that 0.8 of
buyers of brand A will have bought brand A again next time; 0.8 is a
proportion at the macro-level, not a probability for the individual
purchaser.
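Used at the macro-level, the matrix of Table II simply redistributes one period's brand shares into the next. A minimal sketch, in which the starting shares are hypothetical and only the matrix comes from Table II:

```python
# Transition proportions from Table II (rows: period 1 brand, columns: period 2 brand).
P = [
    [0.8, 0.1, 0.1],   # A
    [0.2, 0.7, 0.1],   # B
    [0.1, 0.2, 0.7],   # C
]


def next_shares(shares, matrix):
    """Brand shares in the next period, treating the matrix as a flow model."""
    n = len(matrix)
    return [sum(shares[i] * matrix[i][j] for i in range(n)) for j in range(n)]


shares = [0.5, 0.3, 0.2]    # hypothetical period 1 shares of A, B, C
print([round(s, 4) for s in next_shares(shares, P)])
```

Nothing in this calculation depends on whether the 0.8 applies to each buyer individually or only to the group as a whole.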
A first-order stationary Markov process suffers from the incon-
venience of having no memory. By the independence-of-path assump-
tion, the transition probability applicable to a particular state depends
only on which state the system is in. The path by which the state was
reached is irrelevant. To take an urn model example, if k urns contain
balls numbered 1,2, . . .,k and the procedure is to draw a ball from an
urn, note the number on the ball and to make the next drawing from
the urn of that number, then at any moment the transition probability
$p_{jk}$ depends only on the composition of balls in the jth urn; it does not
depend on the preceding drawings by which the jth urn was reached.
In application to consumer purchasing, a Markov model requires
the preceding purchase to govern the state of the system. If brand A
was purchased last, it is irrelevant how that state was reached-whether
it followed a whole series of A purchases, or whether it was an odd,
exceptional choice. It is not surprising to learn that this assumption
does not fit with the realities of buying behaviour. Kuehn (1962,
p. 395) has shown that the pattern of four preceding purchases does
influence the fifth choice in the case of frozen orange juice concentrate
buying and concludes "this finding raises some question about the
uses currently being made of purchase-to-purchase Markov Chain
Analyses which assume that only the most recent purchase of the
consumer is influential".
Ehrenberg points out another serious difficulty with the assumption
of stationary transition rates. It is well known that Markovian matrices
converge rather rapidly to near steady state values (Maffei, 1960). A
corollary is that they diverge very fast if taken backwards through time.
Since the matrix is assumed to have a constant value there is no
reason why the transition rates of period 1 should not be used to calculate
the theoretically preceding matrix in period 0 or in period -1,
since the numbering of the periods is arbitrary. When this is done it
is soon found that the matrices "blow up" by including numbers with
negative values or exceeding the number of purchasers, which is im-
possible (Ehrenberg, 1965, p. 355).
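The effect of running a stationary matrix backwards can be seen by inverting the matrix of Table II, which is what a step back in time amounts to under the constancy assumption. The sketch uses exact fractions and a plain Gauss-Jordan elimination; the choice of Table II as the example and the helper name are assumptions made here, not Ehrenberg's own calculation.

```python
from fractions import Fraction


def inverse(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with exact fractions."""
    n = len(matrix)
    # Augment each row with the matching row of the identity matrix.
    a = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if a[r][col] != 0)
        a[col], a[pivot] = a[pivot], a[col]
        pivot_value = a[col][col]
        a[col] = [v / pivot_value for v in a[col]]
        for r in range(n):
            if r != col and a[r][col] != 0:
                factor = a[r][col]
                a[r] = [v - factor * w for v, w in zip(a[r], a[col])]
    return [row[n:] for row in a]


# Table II written in tenths so that the arithmetic stays exact.
P = [[Fraction(x, 10) for x in row] for row in [[8, 1, 1], [2, 7, 1], [1, 2, 7]]]
for row in inverse(P):
    print([str(v) for v in row])
```

Already one step back, the "transition proportions" include negative values and values above one, neither of which can describe real purchasers.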
Perhaps the main usable property of Markovian mathematics is the
possibility of the steady state calculation. Harary and Lipstein (1962)
see great potential in this: "It is within this context that the steady-state
predictions of brand shares can be useful for evaluating advertising and
promotion activity. Since the steady-state shares are predictions in
the future, it is possible to make relative comparisons between ad-
vertising campaigns at a number of points in time. This is particularly
valuable in advertising experiments, which should be conducted for
extended periods of time, but which frequently break down because
of other factors."
A brief look at a steady state matrix is sufficient to show the arti-
ficiality of the situation which it represents. In the two-state case a
steady state matrix has identical rows (Feller, 1957) and with multiple
states the rows approximate each other more closely as the matrix
transitions to its stable state, finally becoming an idempotent matrix
in the limit. If the rows are very similar or identical to each other
there can be no weighting of values along the diagonal of the matrix
representing the repurchase probability of each brand. A character-
istic feature of actual purchase behaviour thus disappears. Harary and
Lipstein push other mathematical aspects of Markov processes to the
limit, for example by defining a brand as an absorbing state which can
gain but never lose customers and then calculating how long it takes
for the brand in question to swallow up the entire market. It is difficult
to put much faith in the adoption of convenient properties of Markov
chains without evidence that the processes derived from the mathe-
matics have some correspondence with real world events.
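The loss of diagonal weighting can be checked numerically by raising the matrix of Table II to a high power. The helper names and the choice of fifty steps are arbitrary assumptions; only the matrix itself comes from the text.

```python
P = [
    [0.8, 0.1, 0.1],
    [0.2, 0.7, 0.1],
    [0.1, 0.2, 0.7],
]


def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def matrix_power(matrix, steps):
    """matrix multiplied by itself, steps factors in all (steps >= 1)."""
    result = matrix
    for _ in range(steps - 1):
        result = matmul(result, matrix)
    return result


steady = matrix_power(P, 50)
for row in steady:
    print([round(v, 4) for v in row])
```

After fifty transitions the three rows agree to many decimal places, so the probability of buying brand A next is the same whichever brand was bought last, and multiplying the limiting matrix by itself leaves it unchanged.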
Apart from theoretical considerations difficulties arise in adapting
buying data to the requirements of a Markov model. The question of
the time-period to take is an awkward one. If it is too short, many
people will not have bought in either the first or the second period and
therefore have either to be omitted from the analysis or regarded as
"transitioning" into an artificial category "did not buy". A longer
time-period reduces the non-buyers problem but then a number of
customers will be found to have bought a variety of different brands
in each period and it is hard to decide how to represent them as being
in a single state. Draper and Nolin (1964) escape from the dilemma
by labelling the consumer according to the brand of cake mix which
accounted for her largest expenditure during a quarter, but it is
hardly a satisfactory method. These authors also average the transition
probabilities over two six-quarter periods to obtain "average transition
matrices"; the differences between them were found not to be statis-
tically significant, but the "trends" which they showed were duly
discussed.
For practical marketing purposes it is valuable to differentiate con-
sumers by their weight or frequency of purchase, and here again the
labelling by Markovian state does not discriminate. Herniter and
Magee (1961) say that "there is evidence that in certain circumstances,
the population must be classified as a minimum into three groups:
customers or users, testers or triers, and non-customers. The second
group is made up of individuals who are searching for an adequate
product or service. They will try the product and will either drop it
or become regular users relatively quickly". Alternatively, a hard-core
and a switchers group may be the important categories. There is some
evidence that marked differences exist in the case of American tooth-
paste users, the heavy buyers being more brand loyal than the medium
and light buyers over all brands (Lawrence, 1966).
In summary, first-order stationary Markov process models have a
number of shortcomings. There is no a priori reason why the assump-
tion of constant transition rates should be any more exact than the
assumption of constant probabilities. The models need validating in
the sense of being shown to yield values which are later found to correspond
with actual performance. To date much of the published work has
dealt with the theoretical future of a system based on one transition
matrix which may provide interesting information in itself; but a
transition matrix does not guarantee that a simple Markov process can
be applied. Some work by Styan and Smith (1964) has shown that a
single transition frequency matrix can be assumed to underlie 24
separate matrices representing the week-to-week switching of housewives
between the categories of (1) detergent-only buyers, (2) soap powder
buyers, (3) buyers of both powders, and (4) buyers of neither powder.
That is to say, each of the 24 matrices could be regarded as a random
variation of a single matrix by a suitable chi-squared test. However,
in this case the Markov process had already reached a stable state
represented by the idempotent matrix with all the rows the same. The
model had therefore no predictive value and only showed that "things
were as they were". This result is not strong evidence to set against
the cases where the Markov process independence-of-path assumptions
have been shown not to square with the facts, as one would be inclined
to expect on intuitive grounds.
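The point about the idempotent matrix can be checked directly. In the sketch below the four equilibrium shares are hypothetical figures, not those of Styan and Smith; the only property used is that every row of the matrix is the same.

```python
def mat_mul(a, b):
    """Multiply two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def next_shares(shares, p):
    """Advance a vector of category shares by one period."""
    n = len(p)
    return [sum(shares[i] * p[i][j] for i in range(n)) for j in range(n)]


def close(x, y, tol=1e-12):
    """True when two equal-length vectors agree within a small tolerance."""
    return all(abs(u - v) <= tol for u, v in zip(x, y))


# Hypothetical equilibrium shares of the four buyer categories.
EQUILIBRIUM = [0.4, 0.3, 0.2, 0.1]

# A stable-state matrix: every row equals the equilibrium shares.
P_STABLE = [list(EQUILIBRIUM) for _ in range(4)]
P_SQUARED = mat_mul(P_STABLE, P_STABLE)

if __name__ == "__main__":
    print(all(close(r, s) for r, s in zip(P_SQUARED, P_STABLE)))
    print(next_shares([1.0, 0.0, 0.0, 0.0], P_STABLE))
```

Multiplying the matrix by itself returns the same matrix, and any starting shares are mapped to the equilibrium in a single step, so the model forecasts nothing that the current shares do not already state.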
Higher Order Markov Process Models
One way to extend the range of Markov models is to incorporate
more of the past of the system into the definition of its states. Thus
with two alternatives, 0 and 1, in the original system it is possible to
define four compound events, 00, 01, 10 and 11. The transitions
between these states can be set out as a matrix showing the probability
of state 00 moving to state 01, and so on. Zeros appear in the matrix
as the system cannot move directly from the state 00 to state 11 without
passing through the intermediate state 01. George Miller (1952)
writes: "In principle it is possible to extend the Markov definition in-
definitely to take into account as much of the past history of the
system as one desires." Harary and Lipstein make a similar suggestion
for consumer brand-switching behaviour but as yet little applied work
seems to have been done in this area.
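The construction of the compound states can be sketched as follows. The four second-order probabilities are hypothetical; the structure of the resulting matrix, including the zeros, follows from the definition of the states alone.

```python
from itertools import product


def compound_transition_matrix(second_order):
    """Re-express a second-order two-alternative chain on compound states.

    second_order[(x, y)] is P(next = 1 | the last two events were x then y).
    Returns the list of compound states and a dict keyed by (from, to).
    """
    states = list(product((0, 1), repeat=2))  # (0,0), (0,1), (1,0), (1,1)
    matrix = {}
    for (x, y) in states:
        p_one = second_order[(x, y)]
        for (y_next, z) in states:
            if y_next != y:
                # The second element of the old state must become the first
                # element of the new one; any other move is impossible.
                matrix[(x, y), (y_next, z)] = 0.0
            else:
                matrix[(x, y), (y_next, z)] = p_one if z == 1 else 1.0 - p_one
    return states, matrix


# Hypothetical second-order probabilities of alternative 1 coming next.
SECOND_ORDER = {(0, 0): 0.1, (0, 1): 0.6, (1, 0): 0.3, (1, 1): 0.9}
STATES, MATRIX = compound_transition_matrix(SECOND_ORDER)

if __name__ == "__main__":
    for origin in STATES:
        print(origin, [MATRIX[origin, dest] for dest in STATES])
```

Half of the sixteen cells are structural zeros: state 00 can move only to 00 or 01, and so on for the other rows.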
Second, third and higher order Markov chains depend on taking
into account two, three or more preceding states of the system. The
mathematical basis for examining various hypotheses has been given
by Anderson and Goodman (1957), who indicate tests for the order of
a Markov chain; for whether several samples are from the same Markov
chain of a given order; for whether transition probabilities can be
assumed constant, etc. The theory is well prepared but higher order
Markov processes in a multi-brand market would involve complica-
tions which have not been tackled empirically. In any case, models
of the following type may offer better possibilities of application than
the stationary approach.
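The test of whether transition probabilities can be assumed constant may be sketched as a Pearson chi-squared comparison of each period's transition counts with the pooled estimate. This is one common form of such a test, offered in the spirit of the tests mentioned above; that it matches the cited authors' formulation in every detail is an assumption, and the degrees of freedom shown assume that no cell of the pooled matrix is empty. The counts are hypothetical.

```python
def stationarity_chi_squared(count_matrices):
    """Pearson statistic for constancy of transition probabilities.

    Each element of `count_matrices` is a square matrix of transition counts
    for one period. Returns (statistic, degrees_of_freedom).
    """
    m = len(count_matrices[0])
    pooled = [[sum(c[i][j] for c in count_matrices) for j in range(m)]
              for i in range(m)]
    statistic = 0.0
    for i in range(m):
        row_total = sum(pooled[i])
        if row_total == 0:
            continue  # no buyer ever started from this state
        for j in range(m):
            p_ij = pooled[i][j] / row_total  # pooled transition probability
            if p_ij == 0:
                continue
            for c in count_matrices:
                expected = sum(c[i]) * p_ij
                if expected > 0:
                    statistic += (c[i][j] - expected) ** 2 / expected
    dof = (len(count_matrices) - 1) * m * (m - 1)
    return statistic, dof


# Hypothetical transition counts for two periods and two states.
PERIOD_1 = [[30, 10], [10, 30]]
PERIOD_2 = [[20, 20], [20, 20]]
STATISTIC, DOF = stationarity_chi_squared([PERIOD_1, PERIOD_2])

if __name__ == "__main__":
    print(round(STATISTIC, 4), DOF)
```

The statistic would then be referred to a chi-squared table with the stated degrees of freedom; a small value is consistent with a single underlying matrix.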
Markov Process Models based on Probability States
The statistical description of learning processes was greatly advanced
by the work of Bush and Mosteller (1955). They were dissatisfied with
the probability of a particular response being taken as the state of a
system since in learning experiments the probability tended to increase
as trials went on, thus invalidating the time-independence assumption
of a normal Markov process. Their innovation was to make the proba-
bilities themselves the states of the system, affected by outside influences
which could be expressed mathematically as "operators". Thus the
probability of any event could vary in principle continuously from
zero to one. However, only two states (changed probabilities) could
be reached from any given state (probability of response). If the be-
haviour was reinforced, the gain operator specified the amount by
which the probability of a response was increased; the loss operator
similarly specified the reduction in the probability of the response which
would occur. Both operators were assumed to be linear. A new sort
of Markov process was introduced in this way: it was independent of
the time path because, no matter how a particular level of probability
had been arrived at historically, the onward transition rates were
specified by the gain and loss operators.
Al Kuehn (1962) has demonstrated the adaptation of the Bush-
Mosteller model to consumer brand purchasing. If an individual's
probability of purchasing a given brand on the nth occasion is p(n),
and he does in fact purchase that brand, then his probability of buying
it next time, p(n + 1), is given by
p(n + 1) = (1 - a)p(n) + bs + (a - b),
where s is the equilibrium (projected) market share for the brand and
a and b are constants depending on the product class but independent
of the brand.
If some other brand was purchased on the nth occasion, the proba-
bility of the given brand being bought next is given by the loss operator
p(n + 1) = (1 - a)p(n) + bs.
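A worked consequence of the two operators may help. Setting p(n + 1) = p(n) in each gives the value towards which the probability tends under an unbroken run of purchases or of rejections. The symbols p_U and p_L are introduced here for illustration and do not appear in the original; the derivation assumes 0 < a ≤ 1.

```latex
\begin{align*}
\text{gain: } p_U &= (1-a)\,p_U + bs + (a-b)
  &&\Longrightarrow\quad p_U = 1 - \frac{b}{a}\,(1-s),\\
\text{loss: } p_L &= (1-a)\,p_L + bs
  &&\Longrightarrow\quad p_L = \frac{b}{a}\,s,\\
p_U - p_L &= 1 - \frac{b}{a}.
\end{align*}
```

If b > 0 and 0 < s < 1, then p_U < 1 and p_L > 0: on these assumptions no record of purchases, however long, makes the next purchase certain or impossible. When b = a the two limits coincide at s.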
Heterogeneity of buyers can be represented in two ways: by the
vertical distance separating the operators (a - b) and by the slope of
the operators (1 - a). Kuehn (1962, pp. 392-393) says that the operators
are functions of the time elapsed between purchases and the merchandizing
activity of competitors, and gives examples of the values of
the parameters for high frequency, medium frequency and low fre-
quency purchasers.
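The operation of the gain and loss operators on a record of purchases can be sketched as follows. The parameter values and the starting probability are hypothetical, chosen so that 0 ≤ b ≤ a ≤ 1; they are not Kuehn's estimates.

```python
def gain(p, a, b, s):
    """Probability of buying the brand next time after buying it this time."""
    return (1 - a) * p + b * s + (a - b)


def loss(p, a, b, s):
    """Probability of buying the brand next time after buying another brand."""
    return (1 - a) * p + b * s


def probability_path(p0, bought, a, b, s):
    """Apply the operators along a record of purchases.

    `bought` is a sequence of booleans, True when the brand was purchased.
    Returns the successive probabilities, starting with p0.
    """
    path = [p0]
    for purchased in bought:
        operator = gain if purchased else loss
        path.append(operator(path[-1], a, b, s))
    return path


# Hypothetical values: slope 1 - a = 0.7, vertical distance a - b = 0.2.
A, B, S = 0.3, 0.1, 0.4
PATH = probability_path(0.5, [True, True, False, True], A, B, S)

if __name__ == "__main__":
    print([round(p, 5) for p in PATH])
```

Whatever the current probability, the two operators differ by the constant a - b, and both have the slope 1 - a; these are the two quantities through which differences between buyers enter the model.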
An attractive feature of the model is that it is based on a psycho-
logical theory of the learning process derived principally from the work
of Hull and Skinner. The gain operator represents the positive influence
of reinforcement on behaviour; the loss operator shows the negative
influence of punishment or of work required in responding. However,
there must be doubts about the direct application of the theory to con-
sumer purchasing:
(1) The Bush-Mosteller model is explicitly a mathematization of a
reinforcement theory of learning, the experimental support for which
leans heavily on studies of how rats learn to run mazes. Learning is
described as a process of small increments which can be expressed as
a gradually increasing probability of "success". There are no a priori
grounds for assuming that consumers learn about brands in this fashion.
An alternative theory of one-trial, "quick" learning is equally respectable
academically (it is common to Gestalt psychology and to such
authorities as Skinner and Guthrie within the associationist school)
and may better describe the behaviour of at least some consumers of
some categories of product.
(2) In animal learning experiments the reward is clear-cut: food
for a hungry subject, typically. Reinforcement of the behaviour asso-
ciated with rewards is observed to occur. In mathematical versions of
experimental evidence, a reward always increases the probability of a
repeat performance via suitably defined operators. Adaptations of the
learning axiom to consumer behaviour, as in the work of Al Kuehn,
implicitly equate the purchase of a brand with a "reward" which in-
creases the repurchase probability for that brand automatically
through the gain operator. But it is possible that a purchase may
prove disappointing and cause the buyer to decide never to buy that
brand again. All purchases are not equally "rewards", and the sign of
the change in purchase probabilities is not necessarily positive. Further,
Bush and Mosteller point out (pp. 330, 332) that their model takes no
account of response intensity or difference in reward. In the consumer-
buying analogy, the corresponding assumption must be that all pur-
chases produce the same intensity of consumer reaction. Not only is
there always satisfaction, but that satisfaction is always of the same
amount whichever brand was purchased. It is difficult to believe that
a model which incorporates such assumptions is adequate to deal with
consumer purchase behaviour.
(3) Luce (1959) points out that learning theorists have tended to
concentrate their research on choices between two alternatives. How-
ever: "Complete data concerning the choices that a person makes
from each possible pair of alternatives taken from a set of three or
more alternatives do not appear to determine what choice he will
make when the whole set is presented. ... For the most part, present-
day psychologists have been willing to ignore - or, to be more accurate,
to bypass and postpone - the connections between pairwise choices and
more general ones. And so the relations have remained obscure."
Consumer behaviour in most markets is a question of choice between
multiple alternatives. A learning model based on dichotomies of
choices may not describe it satisfactorily.
Suggestions for Future Developments
The investigators who have developed statistical learning theory
have not been altogether successful in defending themselves against the
accusation that their work amounts to "blind curve-fitting". The
charge lies even more heavily against model-builders who fasten
equations on to empirical data unless they are able to show that a very
wide range of evidence is accommodated by their formulations. The
need is for an adequate theoretical framework to deal with the process
or system under study from which hypotheses can be derived and put
to the test.
The nature of learning is a case in point. The reinforcement mech-
anism underlies most of the models which have been developed up to
the present time. Several alternative types of learning process have
been suggested by psychologists: the slow, incremental acquisition of
conditioning or the insightful perception leading to a sudden change
in behaviour; deliberate, incidental and latent learning (for a discussion
of the application of this formulation to the effects of advertising,
see Kelvin, 1962); and so on. We need to know which type of
process is best applicable to consumer purchasing, or which combina-
tions of processes. For it is possible that one product category or one
type of purchaser, distinguished by personality and/or socio-demo-
graphic characteristics, is best described by rapid learning. The
corresponding buying pattern would be marked by a distinct cleavage
as the advantages of brand B over brand A became suddenly apparent.
In another product category behaviour might be better described in
incremental learning terms, with gradual reinforcement occurring as
succeeding purchases are made. One would like to know the relevant
dimensions of products: whether low-value, frequent-purchase items
of limited salience could be grouped together; whether low differ-
entiation between brands or the stage of the product in its life cycle
would prove to be the best basis for aggregating products by the type
of learning process involved. Differences between individuals might
alternatively provide a more satisfactory method of classification.
To take a concrete example, what happens when a long string of
brand A purchases is suddenly interrupted by a brand B purchase?
The "logical" consumer would presumably decide between the brands
and continue by buying one or the other, but not both. Perhaps a
period of uncertainty would follow, marked by alternations between
brand A and brand B. A theory that might be applied in this case is
the three-state Markov learning model of Theios (1961), which sug-
gests that an animal in a learning experiment enters into a distinct
phase once it has made the first correct response and leaves it again
following its last failure. The probabilities of a response are 0, 1/2, and 1
during the pre-conditioning, conditioning, and conditioned stages
respectively. (See also Bower, 1961.) Finally, the consumer might be
entering an experimental phase and prove likely to go on to try brands
C and D. In each case an appropriate theory could be produced and
a model built from it. Little work has been done on behavioural indi-
cations of the nature of learning processes in the consumer goods field.
It is an area in which students of marketing are well placed to repay
some of the debt which they owe to the social sciences.
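As an indication of how a three-stage model of this kind might be set up, the following sketch traces the expected probability of the learned response over successive trials, taking the response probability in the intermediate stage as one half. The two per-trial transition chances (`enter`, `leave`) and their values are hypothetical parameters of the sketch and are not taken from Theios or Bower.

```python
# Response probabilities in the pre-conditioning, conditioning and
# conditioned stages; the middle value of one half is an assumption.
RESPONSE_PROB = [0.0, 0.5, 1.0]


def expected_response_curve(enter, leave, trials):
    """Expected probability of the learned response on successive trials.

    `enter` is the per-trial chance of moving from the first stage to the
    second; `leave` is the chance of moving from the second to the third.
    """
    stages = [1.0, 0.0, 0.0]  # every subject starts in the first stage
    curve = []
    for _ in range(trials):
        curve.append(sum(w * r for w, r in zip(stages, RESPONSE_PROB)))
        pre, mid, done = stages
        stages = [pre * (1 - enter),
                  pre * enter + mid * (1 - leave),
                  mid * leave + done]
    return curve


CURVE = expected_response_curve(enter=0.3, leave=0.2, trials=60)

if __name__ == "__main__":
    print([round(p, 3) for p in CURVE[:6]])
```

In the consumer analogy the "response" would be the purchase of brand B, and the intermediate stage would correspond to the period of alternation between the two brands.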
Secondly, future investigation might concentrate on the question of
consumer heterogeneity. Current stochastic theory tends to locate the
source of variability within individuals, by postulating that each indi-
vidual has a probability of a given response and that all individuals
are identical. At the other extreme variability can be thought of as
existing between individuals, with each person having a response proba-
bility of either 0 or 1 and the observed variation in purchasing be-
haviour being due to the relative proportions of such individuals in the
population. A combination of the two approaches is needed. Distri-
butions along both dimensions can be considered simultaneously in a
more realistic representation of market conditions. Some work dealing
with one brand at a time (Chatfield et al., 1966, and earlier references
given there) is of this kind, and James Coleman (1964b) has suggested
a number of models which pursue the theme of variation in response
probability.
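The difference between the two extremes, and a combination of them, can be made concrete. In the sketch below a population is a list of (proportion, purchase probability) segments; all figures are hypothetical, and each individual is assumed to buy independently on successive occasions with his own fixed probability.

```python
def market_share(segments):
    """Aggregate share of the brand: the mean purchase probability."""
    return sum(weight * p for weight, p in segments)


def repeat_rate(segments):
    """P(buys the brand next time | bought it this time) in the aggregate."""
    return sum(weight * p * p for weight, p in segments) / market_share(segments)


# Variability wholly within individuals: everyone has the same probability.
HOMOGENEOUS = [(1.0, 0.3)]

# Variability wholly between individuals: probabilities of 1 or 0 only.
POLARISED = [(0.3, 1.0), (0.7, 0.0)]

# A combination: a loyal minority and an occasional-buying majority.
MIXED = [(0.2, 0.9), (0.8, 0.15)]

if __name__ == "__main__":
    for name, population in [("homogeneous", HOMOGENEOUS),
                             ("polarised", POLARISED),
                             ("mixed", MIXED)]:
        print(name, round(market_share(population), 3),
              round(repeat_rate(population), 3))
```

All three populations give the brand the same share of purchases, yet their repeat-purchase rates differ widely, so aggregate share alone cannot distinguish between the two sources of variability.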
In conclusion, consumer behaviour has been treated as a dichotomy,
"bought brand A" and "bought some other brand than A", in many
models of decision processes. The simplification may be justified in
some cases for ease of computation but there is a risk that significant
differences in behaviour can be obscured. It has been powerfully
argued that paired product comparisons are misleading and should
be abandoned (Blankenship, 1966). In the wider field of consumer
purchasing it is also desirable to allow for the multiple-choice situation
of the marketplace and to devise analytical methods which are capable
of handling it in its full complexity.
REFERENCES
ANDERSON, T. W. and GOODMAN, L. A. (1957). Statistical inference about Markov
chains. Ann. Math. Statist., 28, 89-110.
BLANKENSHIP, A. B. (1966). Let's bury paired comparisons. J. Advert. Res., 6, 13-17.
BOWER, G. H. (1961). General Three-State Markov Learning Models. Technical Report
No. 41. Institute of Mathematical Studies in the Social Sciences, Stanford
University.
BROWN, G. H. (1952-53). Brand loyalty-Fact or fiction? Advert. Age, 23 (June 9,
June 30, August 11, October 6, December 1, 1952); 24 (January 26, 1953).
BUSH, R. R. and MOSTELLER, F. (1955). Stochastic Models for Learning. New York:
Wiley.
CHATFIELD, C., EHRENBERG, A. S. C. and GOODHARDT, G. J. (1966). Progress on a
simplified model of stationary purchasing behaviour. J. R. Statist. Soc. A,
129, 317-367.
COLEMAN, J. S. (1964a). Introduction to Mathematical Sociology, p. 321. Glencoe, Ill.:
Free Press.
- (1964b). Models of Change and Response Uncertainty. Englewood Cliffs, N.J.:
Prentice-Hall.
COLEMAN, J., KATZ, E. and MENZEL, H. (1957). The diffusion of an innovation
among physicians. Sociometry, 20, 253-270.
DRAPER, JEAN E. and NOLIN, L. H. (1964). A Markov chain analysis of brand
preferences. J. Advert. Res., 4, 33-38.
EHRENBERG, A. S. C. (1965). An appraisal of Markov brand-switching models.
J. Marketing Res., 2, 347-362.
ESTES, W. K. (1950). Toward a statistical theory of learning. Psychol. Rev., 57,
94-107.
FELLER, W. (1957). An Introduction to Probability Theory and its Applications, Vol. 1,
p. 385. New York: Wiley.
FESTINGER, L. (1964). Behavioral support for opinion change. Publ. Opinion Quart.,
28, 404-417.
FOURT, L. A. and WOODLOCK, J. W. (1960). Early prediction of market success for
new grocery products. J. Marketing, 25, 31-38.
FOX, H. W. (1960). Advertising efficacy: An analytical study. N.A.A. Bull. (Feb.),
53-59.
FRANK, R. E. (1962). Brand choice as a probability process. J. Busin., 35, 43-56.
HARARY, F. and LIPSTEIN, B. (1962). The dynamics of brand loyalty: A Markovian
approach. Operat. Res., 10, 35.
HASKINS, J. B. (1964). Factual recall as a measure of advertising effectiveness. J.
Advert. Res., 4, 2-8.
HERNITER, J. D. (1965). Stochastic market models and the analysis of consumer
panel data. (Paper at 27th National Meeting, Operations Research Society of
America.)
HERNITER, J. D. and MAGEE, J. F. (1961). Customer behavior as a Markov process.
Operat. Res., 9, 105-122.
HILGARD, E. R. (1956). Theories of Learning, p. 72. New York: Appleton-Century-
Crofts.
HOWARD, R. A. (1963). Stochastic process models of consumer behaviour. J. Advert.
Res., 3, 39.
KELVIN, R. P. (1962). Advertising and Human Memory, p. 16. London: Business
Publications.
KUEHN, A. A. (1958). An analysis of the dynamics of consumer behavior and its
implications for marketing management. (Unpublished Ph.D. dissertation.)
Graduate School of Industrial Administration, Carnegie Institute of Tech-
nology.
- (1962). Consumer brand choice-A learning process? In Quantitative Tech-
niques in Marketing Analysis. (ed. R. E. Frank, A. A. Kuehn and W. F. Massy).
Homewood, Ill.: Irwin.
LANGHOFF, P. (1965). Models, Measurement and Marketing, p. 13. New York: Prentice-
Hall.
LAWRENCE, R. J. (1966). New analyses of consumer purchasing patterns. Business
(March), 60.
LAZARSFELD, P. F. (1934). The psychological aspect of market research. Harvard
Busin. Rev., 13, 54-71.
LUCE, R. D. (1959). Individual Choice Behavior: A Theoretical Analysis, p. 3. New York:
Wiley.
MAFFEI, R. B. (1960). Brand preferences and simple Markov processes. Operat.
Res., 8, 210-218.
MILLER, G. A. (1952). Finite Markov processes in psychology. Psychometrika, 17,
149-167.
NERLOVE, M. and ARROW, K. J. (1962). Optimal advertising policy under dynamic
conditions. Economica, 29, 129-142.
NEWELL, A. and SIMON, H. A. (1963). Computers in psychology. In Handbook of
Mathematical Psychology (ed. R. D. Luce, R. R. Bush and E. Galanter),
Vol. 1, p. 368. New York: Wiley.
ROHLOFF, A. C. (1963). New ways to analyze brand-to-brand competition. In
Proc. Winter Conf. American Marketing Association: Toward Scientific Marketing
(ed. S. A. Greyser), pp. 224-240. Chicago: American Marketing Association.
SEVIN, C. K. (1965). Marketing Productivity Analysis. New York: McGraw-Hill.
STYAN, G. P. H. and SMITH, H., Jr (1964). Markov chains applied to marketing.
J. Marketing Res., 1, 50-55.
TELSER, L. G. (1962). Advertising and cigarettes. J. Polit. Econ., 70, 471-499.
THEIOS, J. (1961). A Three-State Model for Learning. Technical Report No. 40. Institute
of Mathematical Studies in the Social Sciences, Stanford University.
VIDALE, M. L. and WOLFE, H. B. (1957). An operations-research study of sales
response to advertising. Operat. Res., 5, 370-381.
VOEKS, VIRGINIA W. (1954). Acquisition of S-R connections: A test of Hull's and
Guthrie's theories. J. Exp. Psychol., 47, 137-147.
WEINBERG, R. S. (1960). An Analytical Approach to Advertising Expenditure Strategy.
New York: Association of National Advertisers.
