Models of Consumer Purchasing Behaviour

Author(s): Raymond J. Lawrence


Source: Journal of the Royal Statistical Society. Series C (Applied Statistics), Vol. 15, No. 3,
Research into Marketing (1966), pp. 216-233
Published by: Wiley for the Royal Statistical Society
Stable URL: http://www.jstor.org/stable/2985301
Accessed: 11-03-2016 04:25 UTC

This content downloaded from 129.96.252.188 on Fri, 11 Mar 2016 04:25:59 UTC
All use subject to JSTOR Terms and Conditions
MODELS OF CONSUMER PURCHASING BEHAVIOUR

RAYMOND J. LAWRENCE

University of Lancaster

It is of interest that the term "model" is tending to oust the older

term "theory", although the two words have much in common, because

of its close connection with the idea of measurement and exact

mathematical relationships. It has been pointed out (Langhoff, 1965)

that the change in terminology has its uses: "Whatever the reason for

shifting from the term 'theory' to the term 'model', it offers tremendous

semantic advantages for the acceptance and proper positioning of

science and the scientist in industry generally. Theory had become a

'long-haired' word. Not only was it associated with the abstract, it

also connoted impracticality. Model, having been identified with scale

models of boats, locomotives and airplanes, communicates more

graphically the correct notions of theory to men who are unacquainted

with the scientific importance of theoretical constructs."

Within the field of fully specified mathematical models we are here

concerned with a particular subset of models which treat consumer

purchasing behaviour (a) stochastically and (b) from data based on

internal behavioural evidence. These two terms require further

clarification.

Behavioural Evidence

Consumer buying presents a picture of frustrating complexity. In

a multi-brand market almost every customer has his own individual

pattern of brand purchases through time. Theoretical constructs are

needed as a basis for aggregating the data and reducing it to manageable

(i.e. understandable) proportions.

The most commonly used construct is the time-period. Purchases

which occur between certain calendar dates are added together, as in

the case of ordinary sales figures for the week or book month. Comparisons

over time can be made by comparing the figures for "equal

periods". The word construct is deliberately used in this context to

draw attention to the fact that the standard time-period comparisons

are an artificial convention, although the method is so familiar from

long usage that it has the appearance of being the natural way to keep

track of events. The limitations appear when one considers how little

information is conveyed by a rise or drop in sales: the message is simply

"something has happened". The bare figures give no idea of what,

where or why. Analysis is needed to answer these questions, and here

one comes back to constructs. Types of trade outlet, sales territories

and socio-demographic classes of customer are typical breakdowns of


the figures which are used in an attempt to track down reasons for the

change in sales. Managers have in practice a theory, although they

would often hesitate to use the word themselves, about the constructs

which are likely to be helpful in isolating the causes of events affecting

their markets.

There are a number of other ways of aggregating sales information.

One is over customers, e.g. by counting up how many accounts fall

into different turnover categories. Some surprising results are apt to

show up, as in the case quoted by Sevin (1965) where 78% of a company's

customers were found to account for 2% of sales volume. In

the consumer goods field an analogous operation would be a count of

customers by quantity purchased, which frequently shows up the

enormous importance of relatively few heavy buyers. A further possibility

(which brings us back to the point about behavioural evidence)

is to use the concept of brand loyalty as a means of subdividing the

market in an operationally useful way.

Brand loyalty can be regarded as a mental state, to be measured by

agreement with such statements as "I definitely intend to buy brand X

next time", "I always use X", etc. Unfortunately there is considerable

evidence that expressed intentions are not followed in many cases, and

more generally there seems to be an unbridged gap between measurements

of attitude and the occurrence of the behaviour which would

logically accompany that attitude. (For the social science evidence

see Festinger (1964). Similar conclusions are reached from a study

of the marketing literature in Haskins (1964).) An alternative approach

is to bypass the uncertain area of intention, attitude and other mental

states and to take a measure of brand loyalty directly from observed

behaviour. Perhaps the first systematic step in this direction was a

study by George Brown (1952-53) based on brand purchase sequences

taken from Chicago Tribune panel data. Brown classified such patterns

as AAAAAA as showing "undivided loyalty"; ABABAB as "divided

loyalty"; AAABBB as "unstable loyalty", etc. Later work has used

an arbitrary percentage figure, e.g. 75% or more of purchases of one

brand, as an operational definition of the brand loyal consumer.
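
The percentage definition lends itself to direct computation. The following is a minimal Python sketch and is not taken from any of the studies cited: the function name `loyal_brand` and the coding of a purchase record as a string of brand letters are assumptions made for illustration, and the 75% threshold is simply the example figure quoted above.

```python
from collections import Counter


def loyal_brand(purchases, threshold=0.75):
    """Return the brand taking at least `threshold` of purchases, else None."""
    if not purchases:
        return None
    brand, count = Counter(purchases).most_common(1)[0]
    return brand if count / len(purchases) >= threshold else None


# Hypothetical purchase records, one letter per purchase occasion.
for record in ("AAAAAA", "ABABAB", "AAABBB", "AAABAA"):
    print(record, loyal_brand(record))
```

On this definition a record such as ABABAB or AAABBB has no loyal brand at all, which shows how much of Brown's finer classification is lost when a single percentage cut-off is used.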

Models which use behavioural evidence are based on records of

specific acts, usually records of continuous consumer purchases through

time. In this regard they have a kinship with psychological investigations

such as animal learning experiments, where observed behaviour

has to be taken as the criterion of changes which have occurred. In

contrast, many models of human behaviour make use of verbal statements

from subjects about their preferences, feelings, reactions, etc.

This type of model-building may have greater long-run prospects of

usefulness but we will not be concerned with it here.

Describing Behaviour Stochastically

The application of model-building in the social sciences is a comparatively

recent phenomenon, and the famous paper on statistical

learning theory by Estes (1950) is often taken as the starting date. A

basic difficulty is that few acts of human or animal behaviour are fully

predictable once an attempt is made to move beyond purely physiological

response measurements: even the most thoroughly conditioned

of laboratory rats is capable of delivering an occasional surprise. Thus

the immediate prospect of developing completely deterministic theories,

akin to those of the physical sciences, is slim. Attention has therefore

moved toward the stochastic or probabilistic expression of behaviour.

If the state of social science theory does not go as far as enabling an

investigator to say "this rat will turn right at the end of the maze", it

may make sense for him to adopt statements like "the probability that

this rat will turn right at the end of the maze is .95".

The probabilistic expression brings with it problems of interpretation.

In one sense it is purely descriptive, and means only that a rat

will average 95 right turns in 100 runs. But it is natural to extend the

meaning beyond a count of overt, observable acts of turning and to

interpret the probability as "a state of the rat", i.e. to assume that

some semi-permanent change in behavioural disposition has come

about such that the rat itself is now a ".95 right-turner".

The transition from a description of behaviour to an assumed

corresponding internal state of the organism is not without difficulties.

Bush and Mosteller (1955, p. 15) acknowledge the problem when they

write: "We conceive that every organism possesses a 'true' probability

p, at the start of each trial. As far as the mathematical system is

concerned, the physical basis for this probability is irrelevant. A

variety of physiological models might provide it. Any sort of fictitious

device that aids us to think about probabilities is as acceptable as any

other. [For example one may think in terms of spinning disks with

sectors marked to correspond with probabilities; urns containing

black and white balls; random number tables, etc.] The above

heuristic devices have little to do, of course, either with the real world

or with the mathematical system. However, we postulate that organisms

behave as if they possessed such probability mechanisms."

An alternative explanation is put forward by Newell and Simon

(1963): "Stochastic theories (theories incorporating random elements)

can be interpreted in either of two ways: the random element can be

regarded as being at the heart of things - a genuine and integral

characteristic of the subject's constitution - or it can be regarded as

an artifact for summing up the unpredictable resultant of a host of

minor (and additive) factors that influence behaviour. There appears

to be, at present, no satisfactory way to choose empirically between

these two interpretations of the random element."

The above quotations are useful aids toward understanding the

assumptions built into the stochastic model of behaviour. A further

point is worth adding. To describe behaviour stochastically is to sum

up the interplay of motivations, influences, and choice mechanisms

within an individual in a single probability value which must be in


some sense their "settled outcome" in terms of disposition toward

subsequent behaviour. Thus the probability, if it is to be adopted as

the best available concept to quantify action tendency, expresses the

action-oriented resultant of all relevant internal mental processes.

In application to consumer purchasing the state of the system usually

corresponds to the state of a customer. The probabilities which express

this state are related to brands. P(i) = 0.5, for instance, represents the

state of an individual who stands an even chance of buying the ith

brand next time. The sum of the probabilities for all i = 1, 2, ..., n is

taken to be unity. In Markov process models the consumer is defined

to be in state i when his preceding purchase was brand i, and the transition

probability P(ij), including P(ii) or the probability of remaining

"loyal" to brand i, represents the likelihood of a switch from brand i

to brand j on the next purchase occasion.
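
As an illustration of how the quantities P(ij) are commonly estimated from purchase records, the sketch below counts brand-to-brand transitions and divides each count by its row total. It is only a sketch under stated assumptions: the function name `transition_proportions` and the sample records are hypothetical, and pooling several buyers' records in this way treats them as if they were alike.

```python
from collections import Counter, defaultdict


def transition_proportions(sequences):
    """Estimate P(ij) as the share of purchases of brand i followed by brand j."""
    counts = defaultdict(Counter)
    for seq in sequences:
        # Each adjacent pair of purchases is one observed transition.
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return {prev: {nxt: c / sum(row.values()) for nxt, c in row.items()}
            for prev, row in counts.items()}


# Hypothetical purchase records for two households.
estimates = transition_proportions(["AAB", "ABA"])
for prev, row in sorted(estimates.items()):
    print(prev, {nxt: round(p, 3) for nxt, p in sorted(row.items())})
```

Each row of the result sums to unity, as required of the P(ij) for a given i.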

Levels of Analysis

It has been assumed so far that purchasing behaviour models are

concerned with events at the individual behaviour, or micro, level.

When it comes to aggregation to represent a population of consumers

there is in theory no problem: an average can be struck of the individual

probabilities of buying brand A, preferably with appropriate weighting

so that the big purchasers count more heavily. The aggregate probability

for the group should then approximate to brand A's market

share in the buying period. One of the simplest brand-switching

models, Al Rohloff's gain/loss analysis (Rohloff, 1963), follows exactly

these lines. A family's purchases in two successive periods are allocated

as switches from one brand to another (or the same) brand according

to two rules: (1) Repeat business for a brand is equal to the

minimum of the share points for the brand in the two periods; (2) An

increase in share points for a brand is allocated to the brands losing

share points according to the relative magnitude of their respective

losses.
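
The two rules can be sketched in Python as follows. This is an illustrative reading of the rules as stated here, not Rohloff's own procedure: the function name `gain_loss`, the dictionary layout and the share figures are hypothetical, and the household's share points are assumed to have the same total in both periods.

```python
def gain_loss(period1, period2):
    """Allocate one household's share points between two periods.

    period1, period2: dicts mapping brand -> share points (same total assumed).
    Returns {(from_brand, to_brand): share points}.
    """
    brands = sorted(set(period1) | set(period2))
    change = {b: period2.get(b, 0.0) - period1.get(b, 0.0) for b in brands}
    total_loss = sum(-c for c in change.values() if c < 0)
    flows = {}
    for b in brands:
        # Rule 1: repeat business is the minimum of the two share figures.
        repeat = min(period1.get(b, 0.0), period2.get(b, 0.0))
        if repeat > 0:
            flows[(b, b)] = repeat
    for gainer in brands:
        if change[gainer] <= 0:
            continue
        for loser in brands:
            if change[loser] < 0:
                # Rule 2: a gain is shared among the losing brands in
                # proportion to the size of their respective losses.
                flows[(loser, gainer)] = (
                    change[gainer] * -change[loser] / total_loss
                )
    return flows


# Hypothetical household: A gains 20 points, B and C lose 10 each.
flows = gain_loss({"A": 50, "B": 30, "C": 20}, {"A": 70, "B": 20, "C": 10})
for (src, dst), points in sorted(flows.items()):
    print(f"{src} -> {dst}: {points:.1f}")
```

The allocated flows add up to the household's total share points, so nothing is created or lost by the bookkeeping.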

The rules amount to a straightforward allocation, on a pro rata basis,

of brand gains to brands which lose share of the household's purchases.

The allocation is purely mathematical, and no theoretical claims are

made in its favour. It is just a convenient way of summarizing data.

Confusion unfortunately arises because another method of estimating

probabilities is in use which is derived entirely from macro or aggregate

information. If 100 people buy brand A this time and 20 of them buy

brand B next time, the probability of a switch from A to B is said to

be 0.2. This estimate may be valid for the group but it is not necessarily

valid for any individual within the group unless the group is

completely homogeneous. It could be that 40 people have a 0.5

likelihood of moving from A to B and the remaining 60 a zero likelihood,

i.e. they never buy brand B. The observed macro-level data could

equally well be explained on this hypothesis.
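
The arithmetic behind this example can be checked in a few lines. The sketch below uses only the figures given in the text; the function name `expected_switchers` is an assumption made for illustration.

```python
def expected_switchers(groups):
    """groups: list of (number of buyers, individual probability of A -> B)."""
    return sum(n * p for n, p in groups)


homogeneous = [(100, 0.2)]              # every buyer has probability 0.2
heterogeneous = [(40, 0.5), (60, 0.0)]  # 40 buyers at 0.5, 60 never buy B

print(expected_switchers(homogeneous))
print(expected_switchers(heterogeneous))
```

Both populations lead one to expect 20 switchers out of 100, so the aggregate proportion alone cannot distinguish between them.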

As long ago as 1934 Paul Lazarsfeld talked of "assuming a sort of


standard consumer and attributing to him as a measure of degree of

importance what originally was a measure of frequency in the group

investigated" (Lazarsfeld, 1934). It is unwise, however, to push too

far the assumption that group characteristics can be translated to individuals

within the group. More recently, Ronald Howard (1963)

has pointed out the fallacy of the argument in the case of Markov

process models: "The main point to make about the aggregation

problem is that if you assume a Markov model for customer behavior,

you must state whether you think this model applies to the individual

customer or to the market as a whole. If you model the whole population

by the Markov transition matrix, then you are dealing with the

Markov process as a flow model, not as a stochastic process. If the

Markov model is applied to the individual customer then there will

be a fluctuation in the number of customers purchasing each brand in

the steady state; this fluctuation can be predicted by the methods we

have discussed."

Howard's discussion is mainly statistical. He is prepared to believe

that each of the $c_k$ customers of brand K has the same probability of

purchasing brand J n periods later, i.e. he accepts the homogeneity of

the K-buying population. His point is that there are N separate binomial

distributions in an N brand market, so that the total number

of customers who will be buying brand J at time n will have a distribution

that is the convolution of all these N binomial distributions.

Howard derives the expected value of the convolution and also its

variance, the latter being the real point of his article because he is

mainly concerned to demonstrate the possibility of fluctuation around

the expected value of a Markov process model, whereas a "pure"

Markov model incorporates the assumption that the actual flow will be

the expected flow.
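
The point can be made concrete with the standard result that the mean and variance of a sum of independent binomial counts are the sums of the component means and variances. The sketch below rests on that result rather than on Howard's own derivation; the function name `brand_count_moments`, the customer numbers and the probabilities are all hypothetical, and buyers are assumed to act independently of one another.

```python
import math


def brand_count_moments(customers, probs):
    """Mean and variance of the number of buyers of brand J at time n.

    customers[k]: number of present buyers of brand k.
    probs[k]: probability that a brand-k buyer buys J n periods later.
    """
    mean = sum(c * p for c, p in zip(customers, probs))
    variance = sum(c * p * (1 - p) for c, p in zip(customers, probs))
    return mean, variance


# Hypothetical three-brand market of 1,000 customers.
mean, variance = brand_count_moments([500, 300, 200], [0.8, 0.2, 0.1])
print(f"expected buyers of J: {mean:.1f}")
print(f"standard deviation:   {math.sqrt(variance):.2f}")
```

A pure flow model would report only the first figure; the second measures the fluctuation around it that arises when the model is applied to individual customers.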

Commenting on the same point, Ehrenberg (1965, p. 361) maintains

that it could be perfectly adequate to use a flow model with probabilities,

or more exactly proportions, derived from aggregate data. If

a proportion 0.7 of a population buys brand A again, it makes no

difference for such a model whether every individual in the population

has the same probability 0.7 of buying again or whether, at the other

extreme, a proportion 0.7 of the group is certain to buy again and the

remaining 0.3 have a zero probability of repurchase. He believes that

the only relevant criterion is whether such a model works, by fitting the

available data to an adequate degree of approximation and by supplying

usable deductions.

It seems doubtful whether it would be fruitful to pursue model-building

which entirely neglected heterogeneity at the micro-level.

Both aggregate and individual data may have to be used. It is important

to watch for the careless translation of evidence from one field

to the other, and confusion between the two levels at which a model

might be applicable.


Macro-Level Models

It may be useful to refer briefly to types of model which have been

put forward at the macro-level. These models are non-stochastic except

in the Howard sense that a process which may be conceptualized in

probabilistic terms can have a variance associated with the probabilities,

so that the outcome is not fully deterministic.

Two main areas of aggregate consumer behaviour have been investigated:

the rate at which sales of new brands build up, and the rate

at which sales decline, particularly when advertising and promotional

support is withdrawn. An example of the first category is the Fourt

and Woodlock model (1960), which assumes that a constant proportion,

r, of the homes which have not yet tried a new brand will buy it for

the first time in the succeeding time-period. Given a ceiling level of

penetration, x, which is the proportion of households which will ever

try the brand, the increment in penetration in the ith time-period is

given as a first approximation by the formula

$$r x (1 - r)^{i-1}.$$
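
A short sketch shows how the first-approximation formula behaves. The values r = 0.3 and x = 0.4 and the function name `penetration_increments` are hypothetical, chosen only for illustration.

```python
def penetration_increments(r, x, periods):
    """Increment in penetration in time-periods 1..periods: r * x * (1 - r)**(i - 1)."""
    return [r * x * (1 - r) ** (i - 1) for i in range(1, periods + 1)]


cumulative = 0.0
for i, inc in enumerate(penetration_increments(r=0.3, x=0.4, periods=6), start=1):
    cumulative += inc
    print(f"period {i}: increment {inc:.4f}, cumulative {cumulative:.4f}")
```

The increments shrink geometrically, and the cumulative penetration approaches the ceiling x without exceeding it.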

To allow for buyer heterogeneity, and in particular the "stretch-out"

effect of light buyers coming in as first-time triers over a long

period of time, a refinement of the model allows a gradual increase in

the ceiling penetration figure, which is represented by the line $x_0 + ik$,

i.e. the ceiling is a linear function of the time-period.

The way in which the use of a new drug spreads through a medical

community has been described in terms of a "snowball" or "chain-reaction"

process (Coleman et al., 1957), where the proportion (y) of

doctors who have prescribed the drug at time t can be modelled by the

differential equation $dy/dt = ky(1 - y)$. James Coleman outlines a "no

brand loyalty" model in which the states labelled 1, 2, 3, ... are the

states of having bought brand A once, twice, three times, etc. in succession

without having switched to another brand. The state 0 is the

state of having bought another brand at the last purchase. The lack

of brand loyalty is represented by a constant for the rate of moving on

from state to state: since it is a constant, the probability of buying

brand A next time is no greater whether the brand has been bought

once or many times previously. Under equilibrium conditions where

the number crossing into state i per unit is equal to the number going

out, Coleman shows that the proportion pi of all consumers who, at

the time of measurement, have made exactly i purchases of brand A

since buying a different brand is given by

$$p_i = (1 - x) x^i,$$

where x is a function of the transition rates between states. Coleman

(1964a) suggests that deviations of the data from predictions given by

the model could be taken as evidence of brand loyalty.
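
The equilibrium distribution is geometric, each proportion being x times the one before, and this is easily tabulated. In the sketch below the value x = 0.6 and the function name `run_length_proportions` are hypothetical.

```python
def run_length_proportions(x, max_i):
    """Proportions p_0 .. p_max_i, where p_i = (1 - x) * x**i."""
    return [(1 - x) * x ** i for i in range(max_i + 1)]


for i, p in enumerate(run_length_proportions(x=0.6, max_i=5)):
    print(f"p_{i} = {p:.4f}")
```

Observed proportions that fall off more slowly than this at long run lengths would be deviations of the kind that could be read as evidence of brand loyalty.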

A number of models have incorporated hypotheses about the rate of


decline in sales in the absence of advertising or promotional support.

Nerlove and Arrow (1962) assume a "decay factor" operating on a

stock of goodwill. Telser (1962), in a detailed study of the sales effectiveness

of American cigarette advertising, has an expression for the

depreciation of advertising capital with time. Vidale and Wolfe (1957)

report an exponential "sales decay constant" from studies of many

product categories. A number of relatively crude models relating sales

directly to advertising (e.g. Weinberg, 1960; Fox, 1960; and the

"dynamic difference" approach) imply some assumption about sales

declining in the absence of advertising support.

Micro-Models: Constant Probability

Returning to the individual consumer, the simplest form of stochastic

model for brand-purchasing behaviour is to assume that each person

has a single probability of buying a given brand which remains constant,

and to regard each purchase decision as an independent event.

The hypothesis has been tested by Frank (1962) using Chicago Tribune

consumer panel data for regular and instant coffee purchases. Frank

used the distribution of runs approach. If a customer makes $n_1$ purchases

of brand A and $n_2$ non-brand A purchases, the expected number

of runs (uninterrupted sequences of brand A or non-brand A purchases)

in $n = n_1 + n_2$ trials will be, on a purely random basis,

$$M = \frac{2 n_1 n_2}{n} + 1.$$

The actual number of runs can be compared with the expected

number. If fewer runs appear than would be anticipated by chance

there is evidence of clustering, i.e. long sequences of brand A or non-

brand A purchases, which would indicate that the probability of

brand A purchases was not constant but had fluctuated during the

period of analysis.

A normal deviate (K) can be computed from:

$$K = \frac{r + \tfrac{1}{2} - M}{\sigma},$$

where r is the observed number of runs and

$$\sigma = \sqrt{\frac{2 n_1 n_2 (2 n_1 n_2 - n)}{n^2 (n - 1)}}.$$
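
The calculation can be sketched in Python using the standard formulae of the two-category runs test: an expected number of runs of 2n1n2/n + 1, a variance of 2n1n2(2n1n2 - n)/(n²(n - 1)), and a continuity correction of one half. These standard forms are assumed here, and the function names and purchase records are hypothetical.

```python
import math


def count_runs(seq):
    """Number of uninterrupted runs of identical items in seq."""
    return 1 + sum(1 for a, b in zip(seq, seq[1:]) if a != b)


def runs_deviate(purchases, brand="A"):
    """Normal deviate K for the observed number of brand / non-brand runs."""
    flags = [p == brand for p in purchases]
    n1 = sum(flags)          # purchases of the brand
    n2 = len(flags) - n1     # purchases of any other brand
    n = n1 + n2
    if n1 == 0 or n2 == 0:
        raise ValueError("both kinds of purchase must occur")
    r = count_runs(flags)
    expected = 2 * n1 * n2 / n + 1
    sigma = math.sqrt(2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1)))
    return (r + 0.5 - expected) / sigma


# Hypothetical records: a clustered buyer and a strictly alternating one.
print(round(runs_deviate("AAAAABBBBB"), 2))
print(round(runs_deviate("ABABABABAB"), 2))
```

A strongly negative deviate, as for the first record, indicates fewer runs than chance would give, i.e. clustering of purchases by brand.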

Frank found many more normal deviates in the negative tail of the

distribution than could be accommodated by the random hypothesis,

i.e. some 20% of families showed fewer runs than the "constant

probabilities" assumption would allow, although there was a large

proportion of families behaving in a manner consistent with the hypothesis.

Kuehn (1962, p. 402) comments shrewdly on this interpretation

when he says: "Frank sets up his hypothesis, tests it at some level of


significance for each of a large number of cases (families), and then

interprets the results as though all cases not shown to deviate statistically

on an individual basis are consistent with the hypothesis. Actually,

the hypothesis was that consumers have a constant probability of

purchase, and the results indicated that a larger number of the individual

cases tested lay outside the confidence limits than is consistent

with the hypothesis, thereby rejecting the hypothesis in toto!"

The constant probability model does not appeal intuitively, but

Frank's approach indicates a useful method of analysis which is

preferable to dismissing the model out of hand. The numbers of low

deviates show the tendency of individuals' purchases to cluster by

brand and could provide the basis for "loyalty" comparisons across

product categories, although such comparisons have not yet appeared

in the published literature.

It is more doubtful whether constant probabilities at the micro-level

can be disproved by macro-level data. For example, Herniter (1965)

quotes data of purchase sequence frequencies in which the empirical

probability of a given brand being purchased following various preceding

patterns of brand purchase is calculated. The variable 1 in a

sequence signifies that the brand in question was bought and the

variable 0 indicates the purchase of some other brand. Thus the

sequence 011 means that the brand in question was bought on the last

two of three purchase occasions. It is possible to analyse consumer

panel records for a period and establish how frequently a pattern

like 011 is followed by a 1 or a 0, giving results such as those shown

in Table I.

TABLE I

Sequence    Empirical probability of brand repurchase

000         0.11
001         0.52
010         0.36
011         0.64
100         0.38
101         0.64
110         0.49
111         0.89

This approach was pioneered and developed by Al Kuehn (1958).

However, it should be noticed that the empirical repurchase probability

for the group says nothing about individual repurchase probabilities

unless, as some authors seem tacitly to assume, the whole

group is treated as homogeneous. Frank (1962) has shown that micro-level

data in which there is no "learning", or tendency for the purchase

of a brand to lead to the same brand being purchased next time, can

be aggregated to produce spurious evidence that repeat purchase

probabilities do increase as the number of previous consecutive purchases

increases. The reason is that the sub-group which has made

(say) eight consecutive purchases of a brand contains more people of

a loyal type, so that the observed higher repurchase probability is a

function of this selectivity and not of reinforcement resulting from repeated

purchasing and experience of a brand. The same argument

applies in reverse. The high repurchase probability following the

sequence 111 may only show that the population of consumers is

heterogeneous, with no implication about micro-level behaviour.
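
The selectivity effect can be reproduced with a population made up of two types of consumer, each with a constant purchase probability and therefore no learning at all. The sketch is illustrative only: the half-and-half mixture of probabilities 0.9 and 0.1 and the function name `repurchase_after_run` are hypothetical, and successive purchases are assumed independent for each individual.

```python
def repurchase_after_run(types, k):
    """Aggregate probability of buying the brand after k consecutive purchases.

    types: list of (share of population, constant purchase probability).
    """
    # Chance that a consumer picked at random shows k purchases in a row,
    # and the chance of k purchases in a row followed by one more.
    run_of_k = sum(w * p ** k for w, p in types)
    run_of_k_plus_1 = sum(w * p ** (k + 1) for w, p in types)
    return run_of_k_plus_1 / run_of_k


# Hypothetical mixture: half the population buys with probability 0.9,
# the other half with probability 0.1; nobody's probability ever changes.
mixture = [(0.5, 0.9), (0.5, 0.1)]
for k in range(5):
    print(k, round(repurchase_after_run(mixture, k), 3))
```

The aggregate repurchase probability climbs with the length of the preceding run simply because long runs come mostly from the high-probability type.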

An interesting parallel of misleading aggregation comes from an

experiment carried out by Virginia Voeks (1954), in which subjects

were conditioned to an eye-blink response. Each individual learned

the response suddenly or "jumpwise", i.e. a run of no responses was

followed by consistent responding, but the learning took place on

different trials for different individuals. When the data was added

together to show the percentage responding after 1,2,3, . . .,12 trials a

gradually increasing curve resulted. In this case there was no question

of each individual having learned gradually, as the individual records

showed. The apparent slow and continuous increase in learning at the

aggregate level was entirely due to one individual after another joining

the group who had acquired the response as the trials went on. (For

an illuminating comment on this work see Hilgard (1956).) There is a

powerful lesson in this example about making assumptions from

aggregate data about the pattern of individual behaviour.
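
The effect is easily imitated. In the sketch below each subject switches once and for all from not responding to responding; the switch points are hypothetical and are not Voeks's data, and the function name `percent_responding` is an assumption made for illustration.

```python
def percent_responding(first_response_trial, n_trials):
    """Percentage of subjects responding on each of trials 1..n_trials.

    first_response_trial[s] is the trial on which subject s begins to
    respond; from then on the subject responds on every trial.
    """
    n = len(first_response_trial)
    return [100.0 * sum(1 for f in first_response_trial if f <= t) / n
            for t in range(1, n_trials + 1)]


# Hypothetical switch points for eight subjects.
switch_points = [2, 3, 3, 5, 6, 8, 9, 11]
print(percent_responding(switch_points, 12))
```

Every individual record is a single jump, yet the group percentages rise in small steps across the twelve trials.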

First-Order Markov Process Models

The attractive mathematical development of Markov processes has

led to a number of attempts to apply them to consumer purchasing.

The simplest assumption is to regard the transition rates between

states as constant over time. First, however, some definitions are in

order. The fixed probability model describes the consumer as being in

a state $p_i$ (i = 1, 2, ..., n) in an n brand market, where $p_i$ is the probability

that brand i will be purchased on any occasion. The Markov

process model characterizes the buyer by a transition rate $P_{ij}$ representing

the conditional probability that, given that he is in state i, he will

next move to state j. First-order Markov processes have been mainly

used in the marketing literature, state i corresponding to the situation

where a consumer's last purchase was brand i. A typical repeat-buying

and brand-switching matrix of this type takes the form shown in

Table II.

TABLE II

Period 1            Period 2 brand buyers
brand buyers        A       B       C

A                   0.8     0.1     0.1
B                   0.2     0.7     0.1
C                   0.1     0.2     0.7

The rows add to unity and the table shows that a proportion 0.8 of

brand A buyers in the first period will buy it again in the second period,

0.1 will switch to brand B and 0.1 will switch to brand C. Similar

matrices can be built up empirically from consumer panel purchase

records, although not without some awkwardness and artificiality

which will be discussed later. It will be noticed that Table II represents

a switch to the macro-level of analysis: as usually employed, the transition

matrix is derived from aggregate data and indicates that 0.8 of

buyers of brand A will have bought brand A again next time; 0.8 is a

proportion at the macro-level, not a probability for the individual

purchaser.

A first-order stationary Markov process suffers from the inconvenience

of having no memory. By the independence-of-path assumption,

the transition probability applicable to a particular state depends

only on which state the system is in. The path by which the state was

reached is irrelevant. To take an urn model example, if k urns contain

balls numbered 1,2, . . .,k and the procedure is to draw a ball from an

urn, note the number on the ball and to make the next drawing from

the urn of that number, then at any moment the transition probability

$p_{jk}$ depends only on the composition of balls in the jth urn; it does not

depend on the preceding drawings by which the jth urn was reached.

In application to consumer purchasing, a Markov model requires

the preceding purchase to govern the state of the system. If brand A

was purchased last, it is irrelevant how that state was reached-whether

it followed a whole series of A purchases, or whether it was an odd,

exceptional choice. It is not surprising to learn that this assumption

does not fit with the realities of buying behaviour. Kuehn (1962,

p. 395) has shown that the pattern of four preceding purchases does

influence the fifth choice in the case of frozen orange juice concentrate

buying and concludes "this finding raises some question about the

uses currently being made of purchase-to-purchase Markov Chain

Analyses which assume that only the most recent purchase of the

consumer is influential".

Ehrenberg points out another serious difficulty with the assumption

of stationary transition rates. It is well known that Markovian matrices

converge rather rapidly to near steady state values (Maffei, 1960). A

corollary is that they diverge very fast if taken backwards through time.

Since the matrix is assumed to have a constant value there is no

reason why the transition rates of period 1 should not be used to calculate

the theoretically preceding matrix in period 0 or in period -1,

since the numbering of the periods is arbitrary. When this is done it

is soon found that the matrices "blow up" by including numbers with

negative values or exceeding the number of purchasers, which is impossible

(Ehrenberg, 1965, p. 355).
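
The backward calculation can be illustrated with the transition matrix of Table II. The sketch below is not Ehrenberg's own computation: it works with brand shares rather than whole matrices, the starting shares of 0.2, 0.3 and 0.5 are hypothetical, and the helper names `solve` and `step_back` are assumptions made for illustration.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting: bring the largest entry in this column up.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def step_back(shares, P):
    """Brand shares one period earlier, given shares_now = shares_prev * P."""
    transposed = [list(col) for col in zip(*P)]
    return solve(transposed, shares)


P = [[0.8, 0.1, 0.1],   # Table II transition matrix
     [0.2, 0.7, 0.1],
     [0.1, 0.2, 0.7]]

shares = [0.2, 0.3, 0.5]   # hypothetical present shares of A, B and C
for period in range(1, 5):
    shares = step_back(shares, P)
    print(f"{period} period(s) back:", [round(s, 4) for s in shares])
```

With these figures the shares cease to be admissible after only a few backward steps: one brand acquires a negative share while another exceeds the whole market.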

Perhaps the main usable property of Markovian mathematics is the

possibility of the steady state calculation. Harary and Lipstein (1962)

see great potential in this: "It is within this context that the steady-state

predictions of brand shares can be useful for evaluating advertising and


promotion activity. Since the steady-state shares are predictions in

the future, it is possible to make relative comparisons between advertising

campaigns at a number of points in time. This is particularly

valuable in advertising experiments, which should be conducted for

extended periods of time, but which frequently break down because

of other factors."

A brief look at a steady state matrix is sufficient to show the artificiality

of the situation which it represents. In the two-state case a

steady state matrix has identical rows (Feller, 1957) and with multiple

states the rows approximate each other more closely as the matrix

transitions to its stable state, finally becoming an idempotent matrix

in the limit. If the rows are very similar or identical to each other

there can be no weighting of values along the diagonal of the matrix

representing the repurchase probability of each brand. A character-

istic feature of actual purchase behaviour thus disappears. Harary and

Lipstein push other mathematical aspects of Markov processes to the

limit, for example by defining a brand as an absorbing state which can

gain but never lose customers and then calculating how long it takes

for the brand in question to swallow up the entire market. It is difficult

to put much faith in the adoption of convenient properties of Markov

chains without evidence that the processes derived from the mathe-

matics have some correspondence with real world events.
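
The loss of the diagonal weighting in the steady state can be checked numerically. The sketch below is an illustration only: the three-brand matrix is hypothetical, with high repurchase probabilities on the diagonal, and the helper names are not taken from any of the papers cited. It raises the matrix to a high power and compares the rows.

```python
def matmul(a, b):
    """Multiply two square matrices given as nested lists."""
    size = len(a)
    return [
        [sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
        for i in range(size)
    ]


def matrix_power(matrix, exponent):
    """Raise a square matrix to a positive integer power by repeated multiplication."""
    result = matrix
    for _ in range(exponent - 1):
        result = matmul(result, matrix)
    return result


# Hypothetical transition matrix with strong repurchase probabilities on the diagonal.
TRANSITION = [
    [0.8, 0.1, 0.1],
    [0.2, 0.7, 0.1],
    [0.1, 0.2, 0.7],
]

LIMIT = matrix_power(TRANSITION, 200)

for row in LIMIT:
    print(", ".join(f"{value:.4f}" for value in row))
```

In the printed limit every row is the same to four decimal places, so the probability of buying a brand no longer depends on the brand bought previously, and multiplying the limit matrix by itself leaves it unchanged.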

Apart from theoretical considerations difficulties arise in adapting

buying data to the requirements of a Markov model. The question of

the time-period to take is an awkward one. If it is too short, many

people will not have bought in either the first or the second period and

therefore have either to be omitted from the analysis or regarded as

"transitioning" into an artificial category "did not buy". A longer

time-period reduces the non-buyers problem but then a number of

customers will be found to have bought a variety of different brands

in each period and it is hard to decide how to represent them as being

in a single state. Draper and Nolin (1964) escape from the dilemma

by labelling the consumer according to the brand of cake mix which

accounted for her largest expenditure during a quarter, but it is

hardly a satisfactory method. These authors also average the transition

probabilities over two six-quarter periods to obtain "average transition

matrices"; the differences between them were found not to be statis-

tically significant, but the "trends" which they showed were duly

discussed.
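
The labelling rule can be made concrete with a small sketch. The panel records, the household identifiers and the function names below are all hypothetical; the code is one possible reading of the largest-expenditure rule, not a reconstruction of Draper and Nolin's procedure. Each household is assigned to a single state per quarter and the transitions between two quarters are then counted.

```python
from collections import Counter, defaultdict

# Hypothetical panel records: (household, quarter, brand, expenditure).
PURCHASES = [
    ("h1", 1, "A", 3.0), ("h1", 1, "B", 2.5), ("h1", 2, "B", 4.0),
    ("h2", 1, "A", 1.0), ("h2", 2, "A", 1.0), ("h2", 2, "B", 0.9),
    ("h3", 1, "B", 2.0),  # h3 buys nothing in quarter 2
]


def dominant_brand_states(purchases):
    """Label each household-quarter by the brand with the largest total spend."""
    spend = defaultdict(Counter)
    for household, quarter, brand, amount in purchases:
        spend[(household, quarter)][brand] += amount
    return {key: totals.most_common(1)[0][0] for key, totals in spend.items()}


def transition_counts(states, first, second, missing="did not buy"):
    """Count moves between two quarters, with an artificial state for non-buyers."""
    households = sorted({household for household, _ in states})
    counts = Counter()
    for household in households:
        origin = states.get((household, first), missing)
        destination = states.get((household, second), missing)
        counts[(origin, destination)] += 1
    return counts


STATES = dominant_brand_states(PURCHASES)
COUNTS = transition_counts(STATES, first=1, second=2)
for (origin, destination), n in sorted(COUNTS.items()):
    print(f"{origin} -> {destination}: {n}")
```

In this invented example household h2 is recorded as a plain repeat buyer of brand A, although nearly half of its second-quarter expenditure went to brand B. This is the kind of information that the single-state label discards.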

For practical marketing purposes it is valuable to differentiate con-

sumers by their weight or frequency of purchase, and here again the

labelling by Markovian state does not discriminate. Herniter and

Magee (1961) say that "there is evidence that in certain circumstances,

the population must be classified as a minimum into three groups:

customers or users, testers or triers, and non-customers. The second

group is made up of individuals who are searching for an adequate

product or service. They will try the product and will either drop it


or become regular users relatively quickly". Alternatively, a hard-core

and a switchers group may be the important categories. There is some

evidence that marked differences exist in the case of American tooth-

paste users, the heavy buyers being more brand loyal than the medium

and light buyers over all brands (Lawrence, 1966).

In summary, first-order stationary Markov process models have a

number of shortcomings. There is no a priori reason why the assump-

tion of constant transition rates should be any more exact than the

assumption of constant probabilities. The models need validifying in

the sense of proving to yield values which are later found to correspond

with actual performance. To date much of the published work has

dealt with the theoretical future of a system based on one transition

matrix which may provide interesting information in itself; but a

transition matrix does not guarantee that a simple Markov process can

be applied. Some work by Styan and Smith (1964) has shown that a

single transition frequency matrix can be assumed to underlie 24

separate matrices representing the week-to-week switching of house-

wives between the categories of (1) detergent only buyers, (2) soap

powder buyers, (3) both powder buyers, and (4) no powder buyers.

That is to say, each of the 24 matrices could be regarded as a random

variation of a single matrix by a suitable chi-squared test. However,

in this case the Markov process had already reached a stable state

represented by the idempotent matrix with all the rows the same. The

model had therefore no predictive value and only showed that "things

were as they were". This result is not strong evidence to set against

the cases where the Markov process independence-of-path assumptions

have been shown not to square with the facts, as one would be inclined

to expect on intuitive grounds.
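
A test of whether several observed matrices are random variations of a single matrix can be sketched as follows. This is a minimal illustration of a Pearson chi-squared homogeneity statistic of the general kind described by Anderson and Goodman (1957); the two weekly count matrices are hypothetical and the function names are assumptions made for the example. The statistic is compared with a chi-squared table on (T - 1)m(m - 1) degrees of freedom, where T is the number of matrices and m the number of states.

```python
def pooled_matrix(count_matrices):
    """Estimate one transition matrix from the summed counts of all periods."""
    size = len(count_matrices[0])
    totals = [
        [sum(m[i][j] for m in count_matrices) for j in range(size)]
        for i in range(size)
    ]
    return [[cell / sum(row) for cell in row] for row in totals]


def homogeneity_chi_squared(count_matrices):
    """Return the Pearson statistic and its degrees of freedom."""
    size = len(count_matrices[0])
    pooled = pooled_matrix(count_matrices)
    statistic = 0.0
    for counts in count_matrices:
        for i in range(size):
            row_total = sum(counts[i])
            for j in range(size):
                expected = row_total * pooled[i][j]
                if expected > 0:
                    statistic += (counts[i][j] - expected) ** 2 / expected
    degrees = (len(count_matrices) - 1) * size * (size - 1)
    return statistic, degrees


# Hypothetical switching counts for two successive weeks (rows: from, columns: to).
WEEKLY_COUNTS = [
    [[80, 20], [30, 70]],
    [[78, 22], [33, 67]],
]

STATISTIC, DEGREES = homogeneity_chi_squared(WEEKLY_COUNTS)
print(f"chi-squared = {STATISTIC:.3f} on {DEGREES} degrees of freedom")
```

For these invented counts the statistic falls well below 5.991, the 5 per cent point of chi-squared on two degrees of freedom, so the two matrices would be accepted as variations of one underlying matrix. As the text notes, such a result says nothing about whether the process has any predictive value.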

Higher Order Markov Process Models

One way to extend the range of Markov models is to incorporate

more of the past of the system into the definition of its states. Thus

with two alternatives, 0 and 1, in the original system it is possible to

define four compound events, 00, 01, 10 and 11. The transitions

between these states can be set out as a matrix showing the probability

of state 00 moving to state 01, and so on. Zeros appear in the matrix

as the system cannot move directly from the state 00 to state 11 without

passing through the intermediate state 01. George Miller (1952)

writes: "In principle it is possible to extend the Markov definition in-

definitely to take into account as much of the past history of the

system as one desires." Harary and Lipstein make a similar suggestion

for consumer brand-switching behaviour but as yet little applied work

seems to have been done in this area.
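
The construction of compound states can be shown in a short sketch. The conditional probabilities below are hypothetical, and the layout is only one way of writing the idea down: each state records the last two choices, and a move from state "ab" is possible only to a state beginning with "b".

```python
from itertools import product

# Hypothetical probabilities of choosing alternative 1 given the last two choices.
P_ONE_GIVEN_HISTORY = {"00": 0.1, "01": 0.6, "10": 0.3, "11": 0.9}

# Compound states in the order 00, 01, 10, 11.
STATES = ["".join(pair) for pair in product("01", repeat=2)]


def compound_matrix(p_one):
    """Build the first-order matrix over compound states from second-order probabilities."""
    matrix = []
    for origin in STATES:
        row = []
        for destination in STATES:
            if destination[0] != origin[1]:
                row.append(0.0)  # structurally impossible move, e.g. 00 -> 11
            elif destination[1] == "1":
                row.append(p_one[origin])
            else:
                row.append(1.0 - p_one[origin])
        matrix.append(row)
    return matrix


MATRIX = compound_matrix(P_ONE_GIVEN_HISTORY)
for origin, row in zip(STATES, MATRIX):
    print(origin, row)
```

Half of the sixteen cells are necessarily zero. With k alternatives and a history of length r there are k to the power r compound states, which suggests why the approach becomes unwieldy in a multi-brand market.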

Second, third and higher order Markov chains depend on taking

into account two, three or more preceding states of the system. The

mathematical basis for examining various hypotheses has been given

by Anderson and Goodman (1957), who indicate tests for the order of


a Markov chain; for whether several samples are from the same Markov

chain of a given order; for whether transition probabilities can be

assumed constant, etc. The theory is well prepared but higher order

Markov processes in a multi-brand market would involve complica-

tions which have not been tackled empirically. In any case, models

of the following type may offer better possibilities of application than

the stationary approach.

Markov Process Models based on Probability States

The statistical description of learning processes was greatly advanced

by the work of Bush and Mosteller (1955). They were dissatisfied with

the probability of a particular response being taken as the state of a

system since in learning experiments the probability tended to increase

as trials went on, thus invalidating the time-independence assumption

of a normal Markov process. Their innovation was to make the proba-

bilities themselves the states of the system, affected by outside influences

which could be expressed mathematically as "operators". Thus the

probability of any event could vary in principle continuously from

zero to one. However, only two states (changed probabilities) could

be reached from any given state (probability of response). If the be-

haviour was reinforced, the gain operator specified the amount by

which the probability of a response was increased; the loss operator

similarly specified the reduction in the probability of the response which

would occur. Both operators were assumed to be linear. A new sort

of Markov process was introduced in this way: it was independent of

the time path because, no matter how a particular level of probability

had been arrived at historically, the onward transition rates were

specified by the gain and loss operators.

Al Kuehn (1962) has demonstrated the adaptation of the Bush-Mosteller

model to consumer brand purchasing. If an individual's

probability of purchasing a given brand on the nth occasion is p(n),

and he does in fact purchase that brand, then his probability of buying

it next time, p(n + 1), is given by

p(n + 1) = (1 - a)p(n) + bs + (a - b),

where s is the equilibrium (projected) market share for the brand and

a and b are constants depending on the product class but independent

of the brand.

If some other brand was purchased on the nth occasion, the proba-

bility of the given brand being bought next is given by the loss operator

p(n + 1) = (1 - a)p(n) + bs.

Heterogeneity of buyers can be represented in two ways: by the

vertical distance separating the operators (a - b) and by the slope of

the operators (1 - a). Kuehn (1962, pp. 392-393) says that the operators

are functions of the time elapsed between purchases and the merchandizing

activity of competitors, and gives examples of the values of


the parameters for high frequency, medium frequency and low fre-

quency purchasers.
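
The working of the gain and loss operators can be followed in a minimal simulation. The parameter values a = 0.4, b = 0.2 and s = 0.3, the starting probability and the function names are assumptions made for the illustration; they are not Kuehn's estimates. The sketch also computes the fixed point of each operator, obtained by setting p(n + 1) = p(n) in the two equations: bs/a for the loss operator and 1 - b(1 - s)/a for the gain operator.

```python
import random


def gain(p, a, b, s):
    """Purchase probability after the brand has been bought."""
    return (1 - a) * p + b * s + (a - b)


def loss(p, a, b, s):
    """Purchase probability after some other brand has been bought."""
    return (1 - a) * p + b * s


def limits(a, b, s):
    """Fixed points of the loss and gain operators (lower and upper bounds on p)."""
    lower = b * s / a
    upper = 1 - b * (1 - s) / a
    return lower, upper


def simulate(p0, a, b, s, occasions, seed=0):
    """Follow one consumer's purchase probability over successive occasions."""
    rng = random.Random(seed)
    p, path = p0, [p0]
    for _ in range(occasions):
        bought = rng.random() < p
        p = gain(p, a, b, s) if bought else loss(p, a, b, s)
        path.append(p)
    return path


# Hypothetical parameter values, chosen only so that b <= a.
A, B, S = 0.4, 0.2, 0.3
LOWER, UPPER = limits(A, B, S)
PATH = simulate(0.5, A, B, S, occasions=20)

print(f"lower limit {LOWER:.2f}, upper limit {UPPER:.2f}")
print(" ".join(f"{p:.2f}" for p in PATH))
```

Under these assumed values the probability never falls below 0.15 or rises above 0.65 however long the run of purchases or rejections, so the model as written excludes both complete loyalty and complete rejection.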

An attractive feature of the model is that it is based on a psycho-

logical theory of the learning process derived principally from the work

of Hull and Skinner. The gain operator represents the positive influence

of reinforcement on behaviour; the loss operator shows the negative

influence of punishment or of work required in responding. However,

there must be doubts about the direct application of the theory to con-

sumer purchasing:

(1) The Bush-Mosteller model is explicitly a mathematization of a

reinforcement theory of learning, the experimental support for which

leans heavily on studies of how rats learn to run mazes. Learning is

described as a process of small increments which can be expressed as

a gradually increasing probability of "success". There are no a priori

grounds to assume that consumers learn about brands in this fashion.

An alternative theory of one-trial, "quick" learning is equally respectable

academically (it is common to Gestalt psychology and to such

authorities as Skinner and Guthrie within the associationist school)

and may better describe the behaviour of at least some consumers of

some categories of product.

(2) In animal learning experiments the reward is clear-cut: food

for a hungry subject, typically. Reinforcement of the behaviour asso-

ciated with rewards is observed to occur. In mathematical versions of

experimental evidence, a reward always increases the probability of a

repeat performance via suitably defined operators. Adaptations of the

learning axiom to consumer behaviour, as in the work of Al Kuehn,

implicitly equate the purchase of a brand with a "reward" which in-

creases the repurchase probability for that brand automatically

through the gain operator. But it is possible that a purchase may

prove disappointing and cause the buyer to decide never to buy that

brand again. All purchases are not equally "rewards", and the sign of

the change in purchase probabilities is not necessarily positive. Further,

Bush and Mosteller point out (pp. 330, 332) that their model takes no

account of response intensity or difference in reward. In the consumer-

buying analogy, the corresponding assumption must be that all pur-

chases produce the same intensity of consumer reaction. Not only is

there always satisfaction, but that satisfaction is always of the same

amount whichever brand was purchased. It is difficult to believe that

a model which incorporates such assumptions is adequate to deal with

consumer purchase behaviour.

(3) Luce (1959) points out that learning theorists have tended to

concentrate their research on choices between two alternatives. How-

ever: "Complete data concerning the choices that a person makes

from each possible pair of alternatives taken from a set of three or

more alternatives do not appear to determine what choice he will

make when the whole set is presented. . . . For the most part,

present-day psychologists have been willing to ignore - or, to be more accurate


to bypass and postpone - the connections between pairwise choices and

more general ones. And so the relations have remained obscure."

Consumer behaviour in most markets is a question of choice between

multiple alternatives. A learning model based on dichotomies of

choices may not describe it satisfactorily.

Suggestions for Future Developments

The investigators who have developed statistical learning theory

have not been altogether successful in defending themselves against the

accusation that their work amounts to "blind curve-fitting". The

charge lies even more heavily against model-builders who fasten

equations on to empirical data unless they are able to show that a very

wide range of evidence is accommodated by their formulations. The

need is for an adequate theoretical framework to deal with the process

or system under study from which hypotheses can be derived and put

to the test.

The nature of learning is a case in point. The reinforcement mech-

anism underlies most of the models which have been developed up to

the present time. Several alternative types of learning process have

been suggested by psychologists: the slow, incremental acquisition of

conditioning or the insightful perception leading to a sudden change

in behaviour; deliberate, incidental and latent learning (for a discussion

of the application of this formulation to the effects of advertising,

see Kelvin, 1962); and so on. We need to know which type of

process is best applicable to consumer purchasing, or which combina-

tions of processes. For it is possible that one product category or one

type of purchaser, distinguished by personality and/or socio-demo-

graphic characteristics, is best described by rapid learning. The

corresponding buying pattern would be marked by a distinct cleavage

as the advantages of brand B over brand A became suddenly apparent.

In another product category behaviour might be better described in

incremental learning terms, with gradual reinforcement occurring as

succeeding purchases are made. One would like to know the relevant

dimensions of products: whether low-value, frequent-purchase items

of limited salience could be grouped together; whether low differ-

entiation between brands or the stage of the product in its life cycle

would prove to be the best basis for aggregating products by the type

of learning process involved. Differences between individuals might

alternatively provide a more satisfactory method of classification.

To take a concrete example, what happens when a long string of

brand A purchases is suddenly interrupted by a brand B purchase?

The "logical" consumer would presumably decide between the brands

and continue by buying one or the other, but not both. Perhaps a

period of uncertainty would follow, marked by alternations between

brand A and brand B. A theory that might be applied in this case is

the three-state Markov learning model of Theios (1961), which sug-

gests that an animal in a learning experiment enters into a distinct


phase once it has made the first correct response and leaves it again

following its last failure. The probabilities of a response are 0, 1/2, and 1

during the pre-conditioning, conditioning, and conditioned stages

respectively. (See also Bower, 1961.) Finally, the consumer might be

entering an experimental phase and prove likely to go on to try brands

C and D. In each case an appropriate theory could be produced and

a model built from it. Little work has been done on behavioural indi-

cations of the nature of learning processes in the consumer goods field.

It is an area in which students of marketing are well placed to repay

some of the debt which they owe to the social sciences.

Secondly, future investigation might concentrate on the question of

consumer heterogeneity. Current stochastic theory tends to locate the

source of variability within individuals, by postulating that each indi-

vidual has a probability of a given response and that all individuals

are identical. At the other extreme variability can be thought of as

existing between individuals, with each person having a response proba-

bility of either 0 or 1 and the observed variation in purchasing be-

haviour being due to the relative proportions of such individuals in the

population. A combination of the two approaches is needed. Distri-

butions along both dimensions can be considered simultaneously in a

more realistic representation of market conditions. Some work dealing

with one brand at a time (Chatfield et al., 1966, and earlier references

given there) is of this kind, and James Coleman (1964b) has suggested

a number of models which pursue the theme of variation in response

probability.
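
The difference between the two extremes can be seen in a short calculation. The sketch below assumes, purely for illustration, that each individual buys the brand with a fixed probability p on every occasion and that successive purchases are independent given p; the segment sizes and probabilities are hypothetical. Under that assumption the proportion of buyers who buy again is E[p^2] / E[p].

```python
def share_and_repeat_rate(segments):
    """segments: list of (population proportion, purchase probability) pairs."""
    share = sum(weight * p for weight, p in segments)
    bought_twice = sum(weight * p * p for weight, p in segments)
    return share, bought_twice / share


# Three hypothetical populations with the same aggregate brand share of 0.30.
HOMOGENEOUS = [(1.0, 0.3)]             # all variability within individuals
POLARISED = [(0.3, 1.0), (0.7, 0.0)]   # all variability between individuals
MIXED = [(0.2, 0.9), (0.8, 0.15)]      # a combination of the two

for name, segments in [
    ("homogeneous", HOMOGENEOUS),
    ("polarised", POLARISED),
    ("mixed", MIXED),
]:
    share, repeat = share_and_repeat_rate(segments)
    print(f"{name}: share {share:.2f}, repeat rate {repeat:.2f}")
```

All three populations show a 30 per cent share, yet the repeat rate is 0.30, 1.00 and 0.60 respectively. The brand share alone therefore cannot distinguish between the sources of variability.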

In conclusion, consumer behaviour has been treated as a dichotomy,

"bought brand A" and "bought some other brand than A", in many

models of decision processes. The simplification may be justified in

some cases for ease of computation but there is a risk that significant

differences in behaviour can be obscured. It has been powerfully

argued that paired product comparisons are misleading and should

be abandoned (Blankenship, 1966). In the wider field of consumer

purchasing it is also desirable to allow for the multiple-choice situation

of the marketplace and to devise analytical methods which are capable

of handling it in its full complexity.

REFERENCES

ANDERSON, T. W. and GOODMAN, L. A. (1957). Statistical inference about Markov

chains. Ann. Math. Statist., 28, 89-110.

BLANKENSHIP, A. B. (1966). Let's bury paired comparisons. J. Advert. Res., 6, 13-17.

BOWER, G. H. (1961). General Three-State Markov Learning Models. Technical Report

No. 41. Institute of Mathematical Studies in the Social Sciences, Stanford

University.

BROWN, G. H. (1952-53). Brand loyalty - Fact or fiction? Advert. Age, 23 (June 9,

June 30, August 11, October 6, December 1, 1952); 24 (January 26, 1953).

BUSH, R. R. and MOSTELLER, F. (1955). Stochastic Models for Learning. New York:

Wiley.


CHATFIELD, C., EHRENBERG, A. S. C. and GOODHARDT, G. J. (1966). Progress on a

simplified model of stationary purchasing behaviour. J. R. Statist. Soc. A,

129, 317-367.

COLEMAN, J. S. (1964a). Introduction to Mathematical Sociology, p. 321. Glencoe, Ill.:

Free Press.

- (1964b). Models of Change and Response Uncertainty. Englewood Cliffs, N.J.:

Prentice-Hall.

COLEMAN, J., KATZ, E. and MENZEL, H. (1957). The diffusion of an innovation

among physicians. Sociometry, 20, 253-270.

DRAPER, JEAN E. and NOLIN, L. H. (1964). A Markov chain analysis of brand

preferences. J. Advert. Res., 4, 33-38.

EHRENBERG, A. S. C. (1965). An appraisal of Markov brand-switching models.

J. Marketing Res., 2, 347-362.

ESTES, W. K. (1950). Toward a statistical theory of learning. Psychol. Rev., 57,

94-107.

FELLER, W. (1957). An Introduction to Probability Theory and its Applications, Vol. 1,

p. 385. New York: Wiley.

FESTINGER, L. (1964). Behavioral support for opinion change. Publ. Opinion Quart.,

28, 404-417.

FOURT, L. A. and WOODLOCK, J. W. (1960). Early prediction of market success for

new grocery products. J. Marketing, 25, 31-38.

FOX, H. W. (1960). Advertising efficacy: An analytical study. N.A.A. Bull. (Feb.),

53-59.

FRANK, R. E. (1962). Brand choice as a probability process. J. Busin., 35, 43-56.

HARARY, F. and LIPSTEIN, B. (1962). The dynamics of brand loyalty: A Markovian

approach. Operat. Res., 10, 35.

HASKINS, J. B. (1964). Factual recall as a measure of advertising effectiveness. J.

Advert. Res., 4, 2-8.

HERNITER, J. D. (1965). Stochastic market models and the analysis of consumer

panel data. (Paper at 27th National Meeting, Operations Research Society of

America.)

HERNITER, J. D. and MAGEE, J. F. (1961). Customer behavior as a Markov process.

Operat. Res., 9, 105-122.

HILGARD, E. R. (1956). Theories of Learning, p. 72. New York: Appleton-Century-

Crofts.

HOWARD, R. A. (1963). Stochastic process models of consumer behaviour. J. Advert.

Res., 3, 39.

KELVIN, R. P. (1962). Advertising and Human Memory, p. 16. London: Business

Publications.

KUEHN, A. A. (1958). An analysis of the dynamics of consumer behavior and its

implications for marketing management. (Unpublished Ph.D. dissertation.)

Graduate School of Industrial Administration, Carnegie Institute of Tech-

nology.

- (1962). Consumer brand choice - A learning process? In Quantitative Techniques

in Marketing Analysis (ed. R. E. Frank, A. A. Kuehn and W. F. Massy).

Homewood, Ill.: Irwin.

LANGHOFF, P. (1965). Models, Measurement and Marketing, p. 13. New York: Prentice-

Hall.

LAWRENCE, R. J. (1966). New analyses of consumer purchasing patterns. Business

(March), 60.

LAZARSFELD, P. F. (1934). The psychological aspect of market research. Harvard

Busin. Rev., 13, 54-71.

LUCE, R. D. (1959). Individual Choice Behavior: A Theoretical Analysis, p. 3. New York:

Wiley.

MAFFEI, R. B. (1960). Brand preferences and simple Markov processes. Operat.

Res., 8, 210-218.


MILLER, G. A. (1952). Finite Markov processes in psychology. Psychometrika, 17,

149-167.

NERLOVE, M. and ARROW, K. J. (1962). Optimal advertising policy under dynamic

conditions. Economica, 29, 129-142.

NEWELL, A. and SIMON, H. A. (1963). Computers in psychology. In Handbook of

Mathematical Psychology (ed. R. D. Luce, R. R. Bush and E. Galanter),

Vol. 1, p. 368. New York: Wiley.

ROHLOFF, A. C. (1963). New ways to analyze brand-to-brand competition. In

Proc. Winter Conf. American Marketing Association: Toward Scientific Marketing

(ed. S. A. Greyser), pp. 224-240. Chicago: American Marketing Association.

SEVIN, C. K. (1965). Marketing Productivity Analysis. New York: McGraw-Hill.

STYAN, G. P. H. and SMITH, H., Jr (1964). Markov chains applied to marketing.

J. Marketing Res., 1, 50-55.

TELSER, L. G. (1962). Advertising and cigarettes. J. Polit. Econ., 70, 471-499.

THEIOS, J. (1961). A Three-State Model for Learning. Technical Report No. 40. Institute

of Mathematical Studies in the Social Sciences, Stanford University.

VIDALE, M. L. and WOLFE, H. B. (1957). An operations-research study of sales

response to advertising. Operat. Res., 5, 370-381.

VOEKS, VIRGINIA W. (1954). Acquisition of S-R connections: A test of Hull's and

Guthrie's theories. J. Exp. Psychol., 47, 137-147.

WEINBERG, R. S. (1960). An Analytical Approach to Advertising Expenditure Strategy.

New York: Association of National Advertisers.
