Quant Mark Econ
DOI 10.1007/s11129-011-9116-1

Conversion of ordinal attitudinal scales: An inferential Bayesian approach

Michael Evans · Zvi Gilula · Irwin Guttman

Received: 22 February 2011 / Accepted: 6 November 2011
© Springer Science+Business Media, LLC 2011
Abstract The need for scale conversion may arise whenever an attitude of individuals is measured by independent entrepreneurs, each using an ordinal scale of its own with a possibly different number of (arbitrary) ordinal categories. Such situations are quite common in the marketing realm. The conversion of a score of an individual measured on one scale into an estimated score on a similar scale with a different range is the concern of this paper. An inferential Bayesian approach is adopted to analyze the situation where we believe the scale with fewer categories can be obtained by collapsing the finer scale. This leads to inferences concerning rules for the conversion of scales. Further, we propose a method for testing the validity of such a model. The use of the proposed methodology is exemplified on real data from surveys concerning performance evaluation and satisfaction.

Keywords Ordinal scales · Collapsing scales · Scale conversions · Bayesian inference

JEL Classification C11
M. Evans I. Guttman
Department of Statistics, University of Toronto, Toronto, ON, Canada M5S 3G3
e-mail: mevans@utstat.utoronto.ca
Z. Gilula (B)
Department of Statistics, Hebrew University, Jerusalem 91905, Israel
e-mail: zvi.gilula@huji.ac.il
I. Guttman
Department of Statistics, State University of New York, Buffalo, NY 14214, USA
e-mail: sttirwin@acsu.buffalo.edu
1 Introduction
The need for scale conversion may arise whenever an attitude of individuals is measured by independent entrepreneurs, each using an ordinal scale of its own with a possibly different number of (arbitrary) ordinal categories.
In the marketing realm, it is not uncommon to encounter situations where
satisfaction of customers is measured independently by several competing
marketing research companies. Each company has its own satisfaction scale
and they all use (non-identical) independent samples of customers from the
same population.
Although scales may differ by the number of ordinal categories used, they
may all share the same structure, where the lowest category refers to very
dissatisfied and the last category refers to most satisfied. Another not uncommon situation arises when a company that used to measure the satisfaction of its own customers in the past by an R-category scale is now employing a scale of only K < R categories, for a variety of reasons (this has been the case with several universities measuring student satisfaction with courses attended).
The problem of how many ordinal categories an attitudinal scale should
have is an old one. See for instance (Miller 1956), and the classical paper
(Green and Rao 1970). Although some compelling scientific arguments are
mentioned in the literature as to some optimal aspects of ordinal scales, there
is no universal consensus as to the number of categories to use. As a result,
situations as described above occur quite frequently.
It might be of interest to be able to answer the question: can we systematically convert a score of an individual measured on a given scale into an estimated score on a similar-in-nature scale with a different number of categories? An attempt to answer this question is the concern of this paper.
The methodology proposed in this paper assumes the following:
(a) All the underlying scales measure the same kind of attitude (satisfaction,
say). We do NOT propose to convert a satisfaction score measured by
one scale to an agreement score measured by another scale.
(b) All the underlying scales are ordinal-categorical scales (and not interval
or ratio scales). Ordinality of categories reflects the ordinal nature of an
attitude as a concept (more, most, better, best, etc.).
(c) All samples measured on these scales come from the same population
and no individual is sampled more than once.
Suppose then that we have two scales, labelled I and II, that are presumably
measuring the same phenomenon. Scale I classifies responses into R ordered
values and scale II classifies responses into K ordered values and suppose that
R > K.
Let $X \in \{1, \dots, R\}$ and $Y \in \{1, \dots, K\}$ denote the values taken by a randomly selected population element on scale I and scale II respectively. Let $p_i = P(X = i)$, for $i = 1, \dots, R$, and $q_j = P(Y = j)$, for $j = 1, \dots, K$, denote the distributions of these variables over the population in question. Of course, these distributions are unknown and we suppose that data have been collected from sampling studies to make inferences about these distributions. Let $(f_1, \dots, f_R)$, with $\sum_{i=1}^{R} f_i = n$, denote the counts from the sampling study involving scale I and $(g_1, \dots, g_K)$, with $\sum_{j=1}^{K} g_j = m$, denote the counts from the sampling study involving scale II.
Our problem is concerned with determining how we should combine the
results from the two studies. We take the point-of-view that our aim should
be collapsing, in some fashion, scale I into scale II and then combining our
inferences. In the typical application it is not at all clear how this collapsing
should take place. We would like this collapsing to depend on the data and
proceed according to sound inferential principles rather than follow some ad
hoc procedure.
In Section 2 we formulate a model for the collapsing. This is perhaps the simplest model for investigating the possibility that two scales are in effect identical, as the coarser scale is obtained from the finer scale by a direct mapping of whole categories on the finer scale to whole categories on the coarser scale. Of course, we could imagine a much more complicated relationship where categories on the coarser scale are split, and our approach does not address this possibility. So a conclusion that there is no collapsing is not an assertion that the two scales are not effectively identical. We note,
however, that a model for the more elaborate situation will require more
assumptions as our model only needs the assumption of two independent
samples. Furthermore, inferences for the more involved model will still require
an assessment of whether or not a collapsing is feasible and we believe that this
will entail using the methods we describe here. We make some more comments
on this in Section 7.
We find it most convenient to place our approach in a Bayesian context and
include discussion on appropriate priors for this problem in Section 2. Also in
Section 2 we discuss a computational approach for implementing a Bayesian
analysis and provide a small artificial example. In Section 3 we discuss the basic
Bayesian inferences that follow from the model and provide the results of a
simulation study to demonstrate the effectiveness of our approach. In Section 4
we show how inferences about the true collapsing lead to rules for converting
one scale to another and to an assessment of the uncertainty associated with
such rules. In Section 5 we consider methods for checking the model specified
in Section 2. In Section 6 we apply these results to a problem of some practical
importance and draw some conclusions in Section 7. We note that we do not
compare our approach to any existing methodology as, surprisingly, there does
not appear to be any literature on this problem.
2 The model, the prior, and the posterior
Given that we have observed $(f_1, \dots, f_R)$ and $(g_1, \dots, g_K)$, the likelihood for the unknown p and q is given by

$$L(p, q \mid f, g) = \prod_{i=1}^{R} p_i^{f_i} \prod_{j=1}^{K} q_j^{g_j}, \quad (1)$$

where $p \in S_R$, $q \in S_K$ and $S_N$ denotes the $(N-1)$-dimensional simplex. Let $\pi_R$ denote a prior on p. Typically, we will take this prior to be a uniform prior, i.e., Dirichlet$_R(1, \dots, 1)$, to reflect noninformativity, but we will also consider a general Dirichlet$_R(a_1, \dots, a_R)$.
Now suppose that p is given. Of course, we do not know these values, but we will effectively put a prior on the set of all possible collapsings assuming that we know p. In other words, we will specify the prior hierarchically by specifying a prior for the collapsings given the value of p, and then placing a marginal prior on p as we have already specified.
First we develop a model for the collapsing. Let $(P_1, \dots, P_K)$ denote an ordered partition of $\{1, \dots, R\}$ into K disjoint subsets with $\bigcup_{i=1}^{K} P_i = \{1, \dots, R\}$ such that $P_i \neq \emptyset$ for all i and, if $1 \le i < j \le K$, then $P_i = \{s, s+1, \dots, t\}$, $P_j = \{u, u+1, \dots, v\}$ with $1 \le s \le t < u \le v \le R$. The set $P_i$ consists of those categories on scale I that are collapsed to form category i on scale II. Note that we have restricted attention to those collapsings for which no empty sets are allowed in the partition, because we know that we can observe observations in each category of scale II. Note that the number of such partitions $(P_1, \dots, P_K)$ equals the number of solutions of $z_1 + \cdots + z_K = R$ in positive integers $z_1, \dots, z_K$. Therefore, there are $\binom{R-1}{K-1}$ such partitions. Typically, for the applications we have in mind, this number will not be too large.
It turns out to be convenient for tabulation purposes to denote an ordered partition $(P_1, \dots, P_K)$ by the notation $(i_1, \dots, i_K)$, where $1 \le i_1 < i_2 < \cdots < i_K = R$ and $i_j$ denotes the highest category in $P_j$. We denote this set of K-tuples by $T_{R,K}$. We can systematically tabulate $T_{R,K}$ by running through the K-tuples $(i_1, \dots, i_{K-1}, R)$, starting with $(1, 2, \dots, K-1, R)$, letting $i_{K-1}$ range from $K-1$ to $R-1$, then changing $i_{K-2}$ to $K-1$ and letting $i_{K-1}$ range from $K$ up to $R-1$, etc. The following example should make the notation clear.
Example 1 (Tabulating partitions.) If R = 5 and K = 3, then there are $\binom{4}{2} = 6$ elements of $T_{5,3}$. In particular, $(i_1, i_2, i_3) = (2, 4, 5) \in T_{5,3}$ specifies the partition $(P_1, P_2, P_3) = (\{1, 2\}, \{3, 4\}, \{5\})$. Following our rule, the set $T_{5,3}$ can be listed as $T_{5,3} = \{(1, 2, 5), (1, 3, 5), (1, 4, 5), (2, 3, 5), (2, 4, 5), (3, 4, 5)\}$.
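The listing of $T_{R,K}$ just described is easy to automate; the following sketch (ours, not from the paper) enumerates the K-tuples and recovers the corresponding partition:

```python
from itertools import combinations

def tabulate_T(R, K):
    """List T_{R,K}: all K-tuples (i_1,...,i_K) with
    1 <= i_1 < ... < i_{K-1} < i_K = R, where i_j is the highest
    scale-I category collapsed into scale-II category j."""
    return [c + (R,) for c in combinations(range(1, R), K - 1)]

def to_partition(tup):
    """Recover the ordered partition (P_1,...,P_K) from its tuple code."""
    parts, lo = [], 1
    for hi in tup:
        parts.append(set(range(lo, hi + 1)))
        lo = hi + 1
    return parts

# Example 1: R = 5, K = 3 gives binom(4,2) = 6 partitions.
print(tabulate_T(5, 3))
# [(1, 2, 5), (1, 3, 5), (1, 4, 5), (2, 3, 5), (2, 4, 5), (3, 4, 5)]
print(to_partition((2, 4, 5)))  # [{1, 2}, {3, 4}, {5}]
```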
Therefore, proceeding as if the categories in scale II arise from a collapsing of scale I, the likelihood (1) is no longer correct; rather, the likelihood for $(p, (P_1, \dots, P_K))$ is given by

$$L(p, (P_1, \dots, P_K) \mid f, g) = \prod_{i=1}^{R} p_i^{f_i} \prod_{j=1}^{K} \Big( \sum_{l \in P_j} p_l \Big)^{g_j}. \quad (2)$$
The likelihood expresses the information in the data concerning the possible collapsings and the value of p. The correctness of Eq. 2 depends on the assumption that the coarser scale is obtained from the finer scale by a collapsing of the form we have described and, as previously noted, there are other possibilities. In Section 5 we provide methods for checking the correctness of this assumption.
If we denote the conditional prior on the set of all ordered partitions given p by $\pi(P_1, \dots, P_K \mid p)$, then the marginal prior on $(P_1, \dots, P_K)$ is

$$\pi(P_1, \dots, P_K) = \int_{S_R} \pi(P_1, \dots, P_K \mid p)\, \pi_R(p)\, dp. \quad (3)$$

As we will see, we need to compute this quantity to implement some of our inferences. Of course, we will also need to compute the posterior probability of $(P_1, \dots, P_K)$, namely,

$$\pi(P_1, \dots, P_K \mid f, g) \propto \int_{S_R} \prod_{i=1}^{R} p_i^{f_i} \prod_{j=1}^{K} \Big( \sum_{l \in P_j} p_l \Big)^{g_j} \pi(P_1, \dots, P_K \mid p)\, \pi_R(p)\, dp. \quad (4)$$
Clearly the choice of $\pi(P_1, \dots, P_K \mid p)$ is a key step in the analysis. One possible specification for $\pi(P_1, \dots, P_K \mid p)$ is the uniform prior. The uniform prior puts weight $1/\binom{R-1}{K-1}$ on each possible partition and so Eq. 3 equals this quantity no matter what prior is placed on p. With this choice, and $\pi_R$ equal to the Dirichlet$_R(a_1, \dots, a_R)$ density, we have that Eq. 4 is proportional to

$$\int_{S_R} \prod_{i=1}^{R} p_i^{f_i + a_i - 1} \prod_{j=1}^{K} \Big( \sum_{l \in P_j} p_l \Big)^{g_j} dp.$$
Now, letting $u_j = \sum_{l \in P_j} p_l$, and using

$$u \sim \text{Dirichlet}_K \Big( \sum_{l \in P_1} (f_l + a_l), \dots, \sum_{l \in P_K} (f_l + a_l) \Big)$$

when $p \sim \text{Dirichlet}_R(f_1 + a_1, \dots, f_R + a_R)$, we see that Eq. 4 is proportional to

$$\frac{\Gamma\big( \sum_{i=1}^{R} f_i + \sum_{i=1}^{R} a_i \big)}{\prod_{j=1}^{K} \Gamma\big( \sum_{l \in P_j} (f_l + a_l) \big)} \int_{S_K} \prod_{j=1}^{K} u_j^{g_j + \sum_{l \in P_j} (f_l + a_l) - 1}\, du \propto \prod_{j=1}^{K} \frac{\Gamma\big( g_j + \sum_{l \in P_j} (f_l + a_l) \big)}{\Gamma\big( \sum_{l \in P_j} (f_l + a_l) \big)},$$

where in the last step we have dropped factors that do not depend on the partition.
So to obtain the posterior we need to sum this quantity over all partitions to
evaluate the normalizing constant for the posterior probability function.
The same argument establishes closed-form expressions for Eqs. 3 and 4 for
a more general family of priors. We summarize this in the following result.
Lemma Suppose that the prior on $(p, (P_1, \dots, P_K))$ is given by

$$\pi(P_1, \dots, P_K \mid p) \propto \prod_{j=1}^{K} \Big( \sum_{l \in P_j} p_l \Big)^{b_j}, \quad (5)$$

for $b_j > -1$, $j = 1, \dots, K$, and $p \sim \text{Dirichlet}_R(a_1, \dots, a_R)$. Then we have that

$$\pi(P_1, \dots, P_K) \propto \prod_{j=1}^{K} \frac{\Gamma\big( b_j + \sum_{l \in P_j} a_l \big)}{\Gamma\big( \sum_{l \in P_j} a_l \big)} \quad (6)$$

and

$$\pi(P_1, \dots, P_K \mid f, g) \propto \prod_{j=1}^{K} \frac{\Gamma\big( b_j + g_j + \sum_{l \in P_j} (f_l + a_l) \big)}{\Gamma\big( \sum_{l \in P_j} (f_l + a_l) \big)}. \quad (7)$$
Note that the uniform prior on $(P_1, \dots, P_K)$ corresponds to $b_1 = \cdots = b_K = 0$. If $a_1 = \cdots = a_R = 1$, i.e., we place a uniform prior on p, and $b_1 > 0$ while $b_2 = \cdots = b_K = 0$, then partitions that have more elements in $P_1$ will receive more prior weight than partitions with fewer elements in $P_1$. Similar interpretations can be made for other choices of the $b_j$, although they are difficult to describe precisely. It would seem, however, that if we want to emphasize partitions that place more elements in certain cells of the partition, then we should choose the corresponding $b_j$ relatively large. We will refer to the prior given in Eq. 5 by the K-tuple $[b_1, \dots, b_K]$ hereafter, whenever the prior on p is uniform. So, for example, $[0, \dots, 0]$ corresponds to the uniform prior on the set of partitions and the uniform prior on $S_R$.
We can systematically tabulate any prior and posterior via Eqs. 6 and 7 by using the method we have previously described for listing the elements of $T_{R,K}$. The following example illustrates this.
Table 1 Several priors on partitions in Example 2

partition \ prior   [0, 0, 0]   [10, 0, 0]   [10, 10, 0]   [1, 1, 1]   [-.5, 0, 0]
(1, 2, 5)           .167        .011         .004          .143        .229
(1, 3, 5)           .167        .011         .040          .190        .229
(1, 4, 5)           .167        .011         .239          .143        .229
(2, 3, 5)           .167        .121         .040          .190        .114
(2, 4, 5)           .167        .121         .438          .190        .114
(3, 4, 5)           .167        .725         .239          .143        .086
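The entries of Table 1 can be reproduced from Eq. 6. A minimal sketch, assuming a uniform Dirichlet prior on p (all $a_l = 1$, so the sums of the $a_l$ over $P_j$ reduce to the block sizes $|P_j|$); the helper names are ours:

```python
from itertools import combinations
from math import lgamma, exp

def partition_sizes(tup):
    """Sizes |P_1|,...,|P_K| of the partition coded by (i_1,...,i_K)."""
    sizes, lo = [], 0
    for hi in tup:
        sizes.append(hi - lo)
        lo = hi
    return sizes

def prior_on_partitions(R, K, b):
    """pi(P_1,...,P_K) from Eq. 6 with a_1 = ... = a_R = 1, so that
    sum_{l in P_j} a_l = |P_j|; b = [b_1,...,b_K] as in the text."""
    T = [c + (R,) for c in combinations(range(1, R), K - 1)]
    w = {}
    for tup in T:
        s = partition_sizes(tup)
        w[tup] = exp(sum(lgamma(b[j] + s[j]) - lgamma(s[j]) for j in range(K)))
    tot = sum(w.values())
    return {tup: v / tot for tup, v in w.items()}

# Reproduce two columns of Table 1 for R = 5, K = 3.
print(round(prior_on_partitions(5, 3, [1, 1, 1])[(1, 2, 5)], 3))     # 0.143
print(round(prior_on_partitions(5, 3, [-0.5, 0, 0])[(1, 2, 5)], 3))  # 0.229
```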
Table 2 Posterior probabilities on partitions in Example 2

partition \ prior   [0, 0, 0]   [10, 0, 0]   [10, 10, 0]   [1, 1, 1]   [-.5, 0, 0]
(1, 2, 5)           .007        .001         .000          .006        .007
(1, 3, 5)           .189        .018         .037          .185        .208
(1, 4, 5)           .181        .017         .162          .167        .199
(2, 3, 5)           .200        .176         .059          .208        .189
(2, 4, 5)           .398        .351         .714          .409        .376
(3, 4, 5)           .025        .437         .028          .024        .020
Example 2 (Tabulating priors and posteriors in Example 1.) In Table 1 we have tabulated several priors for the situation described in Example 1. Note that the left column lists the elements of $T_{5,3}$, while the first row specifies several priors. From this we can see that the larger the value of $b_i > 0$ is, the more prior weight is being placed on partitions that have more elements in $P_i$, while the closer $b_i$ is to $-1$ the more prior weight is being placed on partitions that place fewer elements in $P_i$.
Suppose we observe $(f_1, f_2, f_3, f_4, f_5) = (4, 3, 6, 3, 7)$ and $(g_1, g_2, g_3) = (7, 9, 7)$. Note that $(g_1, g_2, g_3)$ corresponds exactly to the partition (2, 4, 5). In Table 2 we have tabulated the posterior probabilities for the priors in Table 1. Notice that the correct partition (2, 4, 5) is the posterior mode for each analysis except that using the prior [10, 0, 0]. This prior puts too much weight on $P_1$ containing most of the collapsed categories and so (3, 4, 5) results as the mode. Still, for a relatively small amount of data the methodology is clearly on the right track in identifying suitable collapsings. Also worth noting is that the most incorrect collapsing, given by (1, 2, 5), is always given low posterior probability. The benefit of having appropriate strong prior information is clearly demonstrated by the prior [10, 10, 0].
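Tables like Table 2 can be reproduced by evaluating Eq. 7 over the list $T_{R,K}$. A minimal sketch, assuming the uniform prior $a_l = 1$ on p and working on the log scale for numerical stability; the function name is ours:

```python
from itertools import combinations
from math import lgamma, exp

def posterior_on_partitions(f, g, b, a=None):
    """pi(P_1,...,P_K | f, g) from Eq. 7 for scale-I counts f (length R),
    scale-II counts g (length K) and prior hyperparameters b_j (and a_l,
    defaulting to the uniform a_l = 1)."""
    R, K = len(f), len(g)
    a = a or [1] * R
    fa = [f[l] + a[l] for l in range(R)]  # f_l + a_l
    T = [c + (R,) for c in combinations(range(1, R), K - 1)]
    logw = {}
    for tup in T:
        lo, lw = 0, 0.0
        for j, hi in enumerate(tup):
            s = sum(fa[lo:hi])  # sum over l in P_j of (f_l + a_l)
            lw += lgamma(b[j] + g[j] + s) - lgamma(s)
            lo = hi
        logw[tup] = lw
    mx = max(logw.values())
    w = {tup: exp(lw - mx) for tup, lw in logw.items()}
    tot = sum(w.values())
    return {tup: v / tot for tup, v in w.items()}

# Data of Example 2 with the uniform prior [0, 0, 0] on partitions.
post = posterior_on_partitions([4, 3, 6, 3, 7], [7, 9, 7], [0, 0, 0])
print(max(post, key=post.get))  # (2, 4, 5), the correct partition
```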
3 Inferences for collapsings
One possibility for inferences is to use those based on the highest posterior density (hpd) principle (see Box and Tiao 1973 for a discussion of various Bayesian inference terms). For this we record a $\gamma$-credible region of the form

$$B_\gamma(f, g) = \{(P_1, \dots, P_K) : \pi(P_1, \dots, P_K \mid f, g) > k_\gamma(f, g)\}$$

where $k_\gamma(f, g)$ is the largest value such that $\pi(B_\gamma(f, g) \mid f, g) \ge \gamma$, where $\pi(\cdot \mid f, g)$ denotes posterior measure. The size of such regions, say for $\gamma = .95$, then tells us something about the accuracy of our inferences. This gives a nested sequence of subsets, indexed by $\gamma$, with the smallest set containing only the posterior mode.
In general, inferences based on the highest posterior density principle suffer from a lack of invariance. For example, if we wanted to make inferences about the continuous parameters p, or q as derived from p (see Section 4), then a $\gamma$-hpd region does not transform appropriately under 1-1, smooth transformations. A class of inferences that are invariant under such transformations is given by the relative surprise principle (see Evans et al. 2006 and Evans and Shakhatreh 2008 for discussion of relative surprise inferences). A $\gamma$-relative surprise credible region for the true partition is of the form

$$C_\gamma(f, g) = \left\{ (P_1, \dots, P_K) : \frac{\pi(P_1, \dots, P_K \mid f, g)}{\pi(P_1, \dots, P_K)} \ge k_\gamma(f, g) \right\}$$

and $k_\gamma(f, g)$ is the largest value such that $\pi(C_\gamma(f, g) \mid f, g) \ge \gamma$. We see that the relative belief ratio (see Evans and Shakhatreh 2008) for $(P_1, \dots, P_K)$, given by

$$RB(P_1, \dots, P_K) = \pi(P_1, \dots, P_K \mid f, g)/\pi(P_1, \dots, P_K),$$

is a measure of how belief in the validity of the partition $(P_1, \dots, P_K)$ has
changed from a priori to a posteriori. Note that a relative belief ratio is similar
to a Bayes factor, which is the ratio of the posterior odds to the prior odds,
as both are measuring the change in belief from a priori to a posteriori. The
relative belief ratio measures change in belief on the probability scale while
the Bayes factor measures change in belief on the odds scale.
Accordingly, $C_\gamma(f, g)$ contains all the partitions whose relative belief ratios are above some cut-off. The partition that maximizes the relative belief ratio is referred to as the least relative surprise estimate (LRSE) (Evans, Guttman and Swartz 2006), where the terminology Least in LRSE is explained after Eq. 9. The computations of the LRSE and the credible regions $C_\gamma(f, g)$ are carried out in a fashion similar to computing a posterior mode and hpd regions. For the examples in this paper, we tabulated each possible value of the posterior density (using Eq. 7) and the prior density (using Eq. 6), tabulated the values of the relative belief ratio using these quantities, and obtained the LRSE by searching the list to find where this ratio is maximized. To find credible regions, namely, to compute $k_\gamma(f, g)$ in $C_\gamma(f, g)$, we proceeded by first ordering the partitions by the value of their relative belief ratios from largest to smallest. Starting with the LRSE, which has the largest relative belief ratio, we added partitions to the set $C_\gamma(f, g)$, going down the ordered list, until the sum of the posterior probabilities of the added partitions is just greater than or equal to $\gamma$. The relative belief ratio corresponding to the last partition added to $C_\gamma(f, g)$ gives the value of $k_\gamma(f, g)$. So the credible regions $C_\gamma(f, g)$ are nested, namely, $C_{\gamma_1}(f, g) \subset C_{\gamma_2}(f, g)$ whenever $\gamma_1 \le \gamma_2$, with the smallest nonempty one containing only the LRSE. Note that the LRSE partition is the same as the maximum a posteriori (MAP) partition, i.e., the mode of the posterior, whenever the prior $\pi(P_1, \dots, P_K)$ is uniform, but otherwise these estimates can be different, as demonstrated in Example 3.
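The accumulation just described is easy to sketch in code. Below we hard-code the [0, 0, 0] posterior column of Table 2 and the uniform prior 1/6; note that with these rounded table values the first four partitions already carry mass .968, so we display $\gamma = .99$, which reproduces the five-partition region and the cut-off 0.150 that the paper reports (presumably the difference from the paper's $\gamma = .95$ is rounding in the tabled values). The helper name is ours:

```python
# Posterior probabilities for the prior [0, 0, 0], copied from Table 2,
# and the uniform prior 1/6 on the six partitions of Example 1.
posterior = {(1, 2, 5): .007, (1, 3, 5): .189, (1, 4, 5): .181,
             (2, 3, 5): .200, (2, 4, 5): .398, (3, 4, 5): .025}
prior = {tup: 1 / 6 for tup in posterior}

def credible_region(posterior, prior, gamma):
    """gamma-relative surprise region: rank partitions by relative belief
    ratio RB = posterior/prior, then add partitions in decreasing order of
    RB until the accumulated posterior mass is at least gamma; k_gamma is
    the RB of the last partition added."""
    ranked = sorted(posterior, key=lambda t: posterior[t] / prior[t], reverse=True)
    region, mass = set(), 0.0
    for tup in ranked:
        region.add(tup)
        mass += posterior[tup]
        if mass >= gamma:
            break
    k = posterior[tup] / prior[tup]
    return region, k

region, k = credible_region(posterior, prior, 0.99)
print(sorted(region))  # every partition except (1, 2, 5)
print(round(k, 3))     # 0.15
```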
In the cited references various optimality properties have been proved for relative surprise inferences in the class of all Bayesian inferences. Note, in particular, that because relative surprise inferences are based on relative belief ratios, we can expect these inferences to be less dependent on the prior than, say, hpd inferences, as we are dividing the posterior by the prior. In fact, it can be proven that the relative belief ratio is independent of the choice of the marginal prior $\pi(P_1, \dots, P_K)$.
Consider now inferences for Example 1.
Example 3 (Estimating the true partition in Example 2.) In Table 3 we have given the relative belief ratios $RB(P_1, \dots, P_K)$ for the problem discussed in Example 2. These are obtained by dividing the entries in Table 2 by the corresponding entries in Table 1. Notice that the LRSE is given by the correct partition (2, 4, 5) in every case, while the mode failed for the prior [10, 0, 0]. This is characteristic of relative surprise inferences, as the prior has less influence on these inferences than on other Bayesian inferences.
From the posterior probabilities in Table 2 and the relative belief ratios in Table 3, we can construct $C_\gamma(f, g)$. For example, when the prior is [0, 0, 0], then

$$C_{.95}(f, g) = \{(1, 3, 5), (1, 4, 5), (2, 3, 5), (2, 4, 5), (3, 4, 5)\}, \quad (8)$$

i.e., every partition but (1, 2, 5) is included and $k_{.95}(f, g) = 0.150$. This is not very informative, but then we do not have a lot of data. Actually, from Table 2, the posterior content of $C_{.95}(f, g)$ is 0.993 in this case.
Suppose we wished to assess the hypothesis that a certain partition is true, say,

$$H_0 : (P_1, \dots, P_K) = (P_1^*, \dots, P_K^*),$$

i.e., we want to assess whether or not the actual collapsing is given by $(P_1^*, \dots, P_K^*)$, as prescribed by some theory. This can be assessed by computing

$$\gamma^* = \inf\{\gamma : (P_1^*, \dots, P_K^*) \in C_\gamma(f, g)\}$$

and concluding that we have evidence against $H_0$ when $\gamma^*$ is near 1. Equivalently, we can compute the relative surprise tail probability given by

$$\pi\left( \frac{\pi(P_1, \dots, P_K \mid f, g)}{\pi(P_1, \dots, P_K)} \le \frac{\pi(P_1^*, \dots, P_K^* \mid f, g)}{\pi(P_1^*, \dots, P_K^*)} \,\middle|\, f, g \right), \quad (9)$$
Table 3 Relative belief ratios for partitions in Example 2

partition \ prior   [0, 0, 0]   [10, 0, 0]   [10, 10, 0]   [1, 1, 1]   [-.5, 0, 0]
(1, 2, 5)           0.042       0.091        0.000         0.042       0.031
(1, 3, 5)           1.132       1.636        0.925         0.974       0.908
(1, 4, 5)           1.084       1.546        0.678         1.168       0.869
(2, 3, 5)           1.198       1.455        1.475         1.095       1.658
(2, 4, 5)           2.383       2.901        1.630         2.153       3.298
(3, 4, 5)           0.150       0.603        0.117         0.168       0.233
i.e., the posterior probability of obtaining a relative belief ratio no larger than that obtained for the hypothesized value. Small values of Eq. 9 provide evidence against $(P_1^*, \dots, P_K^*)$ in the sense that such a value is surprising in light of the data we have observed. Note that if we put $(P_1^*, \dots, P_K^*)$ equal to the LRSE, then Eq. 9 equals 1, and so the LRSE is the least surprising partition.
Example 4 (Testing a partition in Example 2.) Suppose we want to test the null hypothesis $H_0$ that the true partition is (2, 4, 5) when the prior is [0, 0, 0]. Then it is clear that the tail probability is 1 and we have no evidence against $H_0$. If we tested the null hypothesis $H_0$ that the true partition is (1, 2, 5) when the prior is [0, 0, 0], then the tail probability is $1 - 0.993 = 0.007$ and we have substantial evidence against $H_0$.
We consider now the general effectiveness of our methodology. For this we focus on the performance of the LRSE when there is a collapsing. It can be proved (see Evans and Jang 2011) that, for any problem where we are estimating a quantity $\theta$ that takes on one of finitely many values, the LRSE is a Bayes rule with respect to the loss function $I(a \neq \theta)/\pi(\theta)$, where $\pi(\theta)$ is the prior probability that $\Theta = \theta$ and $I(a \neq \theta) = 1$ when $a \neq \theta$ and is 0 otherwise. For this loss function the prior risk of an estimator $\delta(x)$ (the average loss under the sampling model for the data x and the prior) is equal to $\sum_\theta M(\delta(X) \neq \theta \mid \theta)$, where the sum is taken over all the possible values for $\theta$ and $M(\delta(X) \neq \theta \mid \theta)$ is the conditional prior probability that the estimator doesn't equal $\theta$ given that $\Theta = \theta$. So selecting $\delta$ to be the LRSE minimizes the sum of the conditional probabilities of making an error. By contrast the posterior mode, or MAP estimator, minimizes the unconditional prior probability of making an error, namely, $\sum_\theta \pi(\theta) M(\delta(X) \neq \theta \mid \theta)$.
The advantage of the LRSE is that it treats all the possible values of $\theta$ equivalently and doesn't weight them by their prior probabilities. In cases where we inadvertently assign a relatively small prior probability to the true value of $\theta$, the LRSE will perform much better than the MAP estimator. Of course, they are equivalent when the prior weights are uniform. Note that it is equivalent to say that the LRSE maximizes $\sum_\theta M(\delta(X) = \theta \mid \theta)$, namely, the sum of the conditional probabilities of not making an error.
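The difference between the LRSE and the MAP estimator is visible directly in the [10, 0, 0] analysis of Example 3, where the two estimators disagree precisely because the prior is far from uniform. A small sketch using the (rounded) values from Tables 1 and 2:

```python
# Prior and posterior probabilities for the prior [10, 0, 0],
# copied from Tables 1 and 2 (rounded values).
prior = {(1, 2, 5): .011, (1, 3, 5): .011, (1, 4, 5): .011,
         (2, 3, 5): .121, (2, 4, 5): .121, (3, 4, 5): .725}
posterior = {(1, 2, 5): .001, (1, 3, 5): .018, (1, 4, 5): .017,
             (2, 3, 5): .176, (2, 4, 5): .351, (3, 4, 5): .437}

map_est = max(posterior, key=posterior.get)                   # mode of the posterior
lrse = max(posterior, key=lambda t: posterior[t] / prior[t])  # maximizes RB

print(map_est)  # (3, 4, 5): pulled toward the heavily weighted partitions
print(lrse)     # (2, 4, 5): the correct partition
```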
The following example examines the performance of the LRSE for the estimation of the correct collapsing by computing the probabilities $c_{(i_1,\dots,i_K)} = M(\delta_{LRSE}(X) = \theta \mid \theta)$ for $\theta \in T_{R,K}$, where $\delta_{LRSE}$ is the LRSE partition and X is comprised of $(f_1, \dots, f_R)$ and $(g_1, \dots, g_K)$. So $c_{(i_1,\dots,i_K)}$ is the conditional prior probability, given that the true partition is the one specified by $(i_1, \dots, i_K)$, that the LRSE will equal this partition. The larger the value of $c_{(i_1,\dots,i_K)}$, the better is the performance of the LRSE.
Example 5 (A simulation study.) In Table 4 we present the values of $c_{(i_1,\dots,i_K)}$ for several cases where $(p_1, \dots, p_R) \sim \text{Dirichlet}(1, 1, \dots, 1)$ and $(P_1, \dots, P_K)$ has a uniform prior. Since $(f_1, \dots, f_R) \sim \text{Multinomial}(n, p_1, \dots, p_R)$, we have $(g_1, \dots, g_K) \sim \text{Multinomial}(m, \sum_{i \in P_1} p_i, \dots, \sum_{i \in P_K} p_i)$ for a specified $(P_1, \dots, P_K)$. Note that we estimated the $c_{(i_1,\dots,i_K)}$ using Monte Carlo based on a sample of $10^4$. The estimates are felt to be accurate to about .02.

Table 4 Results of the simulation study in Example 5 of the performance of the LRSE of the true collapsing when using uniform priors

(R, K)   (m, n)       c_(i_1,...,i_K)
(3, 2)   (5, 5)       c_(1,3) = 0.755, c_(2,3) = 0.734
(3, 2)   (5, 10)      c_(1,3) = 0.756, c_(2,3) = 0.756
(3, 2)   (20, 20)     c_(1,3) = 0.836, c_(2,3) = 0.833
(3, 2)   (20, 50)     c_(1,3) = 0.852, c_(2,3) = 0.853
(3, 2)   (500, 500)   c_(1,3) = 0.961, c_(2,3) = 0.961
(5, 3)   (10, 10)     c_(1,2,5) = 0.611, c_(1,3,5) = 0.340, c_(1,4,5) = 0.560, c_(2,3,5) = 0.403, c_(2,4,5) = 0.337, c_(3,4,5) = 0.602
(5, 3)   (500, 500)   c_(1,2,5) = 0.929, c_(1,3,5) = 0.826, c_(1,4,5) = 0.879, c_(2,3,5) = 0.887, c_(2,4,5) = 0.824, c_(3,4,5) = 0.929
(10, 3)  (50, 50)     min(c_(i_1,i_2,10)) = 0.181, max(c_(i_1,i_2,10)) = 0.644
(10, 3)  (500, 500)   min(c_(i_1,i_2,10)) = 0.523, max(c_(i_1,i_2,10)) = 0.872
(10, 7)  (50, 50)     min(c_(i_1,...,10)) = 0.145, max(c_(i_1,...,10)) = 0.603
(10, 7)  (500, 500)   min(c_(i_1,...,10)) = 0.516, max(c_(i_1,...,10)) = 0.885
(20, 2)  (50, 50)     min(c_(i_1,20)) = 0.211, max(c_(i_1,20)) = 0.702
(20, 2)  (500, 500)   min(c_(i_1,20)) = 0.476, max(c_(i_1,20)) = 0.853

Here $c_{(i_1,\dots,i_K)}$ is the conditional prior probability, given that the true partition is the one specified by $(i_1, \dots, i_K)$, that the LRSE will equal this partition; the max and min refer to the maximum and minimum values of these prior probabilities over all the possible partitions for a particular choice of R and K.
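A Monte Carlo estimate of $c_{(i_1,\dots,i_K)}$ can be sketched as follows; this is our own illustration, not the authors' code. We draw p from its uniform Dirichlet prior, generate the scale-I and scale-II samples, and record whether the LRSE recovers the true partition. With uniform priors on both p and the partitions, the LRSE coincides with the posterior mode, so maximizing Eq. 7 suffices. Shown for (R, K) = (3, 2), true partition (1, 3), (m, n) = (20, 20):

```python
import random
from itertools import combinations
from math import lgamma

random.seed(1)

def lrse(f, g):
    """With uniform priors the LRSE maximizes Eq. 7 with a_l = 1, b_j = 0."""
    R, K = len(f), len(g)
    fa = [x + 1 for x in f]
    best, best_lw = None, float("-inf")
    for c in combinations(range(1, R), K - 1):
        tup, lo, lw = c + (R,), 0, 0.0
        for j, hi in enumerate(tup):
            s = sum(fa[lo:hi])
            lw += lgamma(g[j] + s) - lgamma(s)
            lo = hi
        if lw > best_lw:
            best, best_lw = tup, lw
    return best

def estimate_c(true_tup, R, K, n, m, sims=2000):
    """Monte Carlo estimate of c_{true_tup}: the conditional probability,
    given the true partition, that the LRSE equals it."""
    hits = 0
    for _ in range(sims):
        gam = [random.gammavariate(1.0, 1.0) for _ in range(R)]
        p = [x / sum(gam) for x in gam]          # p ~ Dirichlet(1,...,1)
        f = [0] * R
        for i in random.choices(range(R), weights=p, k=n):
            f[i] += 1
        # collapse p to the coarser scale, then sample the scale-II counts
        q, lo = [], 0
        for hi in true_tup:
            q.append(sum(p[lo:hi]))
            lo = hi
        g = [0] * K
        for j in random.choices(range(K), weights=q, k=m):
            g[j] += 1
        if lrse(f, g) == true_tup:
            hits += 1
    return hits / sims

print(estimate_c((1, 3), 3, 2, n=20, m=20))  # cf. c_(1,3) = 0.836 in Table 4
```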
The case (R, K) = (3, 2) is the simplest possibility, as $\#(T_{3,2}) = 2$. We see that even for very small sample sizes the LRSE is very accurate, as it is correct about 75% of the time when (m, n) = (5, 5) and is correct 96% of the time when (m, n) = (500, 500). This probability increases as we increase n or m. Similar results are reported for the case (R, K) = (5, 3), where $\#(T_{5,3}) = 6$. When (R, K) = (10, 3) then $\#(T_{10,3}) = 36$ and when (R, K) = (10, 7) then $\#(T_{10,7}) = 84$, so we cannot list all the $c_{(i_1,\dots,i_K)}$ and have reported only $\max(c_{(i_1,\dots,i_K)})$ and $\min(c_{(i_1,\dots,i_K)})$. Note that, when (R, K) = (10, 3) and (m, n) = (50, 50), the median value of $c_{(i_1,i_2,i_3)}$ is 0.305; when (m, n) = (500, 500) the median value of $c_{(i_1,i_2,i_3)}$ is 0.623; when (R, K) = (10, 7) and (m, n) = (50, 50) the median value of $c_{(i_1,\dots,i_7)}$ is 0.268 and is 0.670 when (m, n) = (500, 500).
For other priors on p the results are similar. As one would expect, an informative prior (5) improves performance for those partitions that receive more prior mass, in the sense that $c_{(i_1,\dots,i_K)}$ increases compared to when the uniform prior is used, while $c_{(i_1,\dots,i_K)}$ drops for those partitions for which the prior probability is lower than with the uniform.
It should be noted that the required simulation time becomes exorbitant as $\#(T_{R,K})$ increases, as we need to estimate $c_{(i_1,\dots,i_K)}$ for each $(i_1, \dots, i_K)$, and this constrains the range of values of R and K for which we can obtain simulation results. There is another effect due to the increase in $\#(T_{R,K})$. As illustrated in Table 4, the greater $\#(T_{R,K})$ is, the more difficult it is for the LRSE to exactly equal the true partition. Perhaps this is partly due to the greater number of possibilities for a partition, but it also appears that there is an intrinsic complexity to a particular partition as well. This is illustrated in the case (R, K) = (20, 2), (m, n) = (50, 50), where we obtained the following values for the $c_{(i_1,20)}$, namely,

$$(c_{(1,20)}, c_{(2,20)}, \dots, c_{(19,20)}) = (0.702, 0.375, 0.294, 0.257, 0.245, 0.244, 0.230, 0.212, 0.212, 0.214, 0.212, 0.212, 0.219, 0.227, 0.248, 0.268, 0.303, 0.370, 0.702).$$
We see that splits near the ends are easier to estimate than splits in the middle. A plausible conjecture for this is that there are simply more splits in or near the middle than at the ends.
We note also that the performance of the LRSE is undoubtedly much better from a practical point of view than the results indicated in Table 4. This is because the results displayed in Table 4 make no reference to how close the LRSE is to the true partition. Clearly, if the LRSE differs from the true partition in only a few partition elements, then this could be viewed as satisfactory performance. It isn't completely clear, however, what to use as the measure of distance between partitions in general. For example, when (R, K) = (20, 2), (m, n) = (50, 50) we obtained 0.540 for the conditional probability that the LRSE was in {(9, 20), (10, 20), (11, 20)}, given that the true partition is (10, 20), which is much higher than the conditional probability of 0.214 recorded for the LRSE being exactly equal to (10, 20). Further, we obtained 0.756 for the conditional probability that the LRSE is in {(8, 20), (9, 20), (10, 20), (11, 20), (12, 20)}, given that the true partition is (10, 20). While a measure of distance seems obvious in this case, for a general choice of (R, K) more study is required to determine an appropriate distance measure. Accordingly, we leave the study of this aspect of the performance of the LRSE to a further investigation, but note that the simulations indicate that the LRSE should do very well from this enlarged perspective.
4 Conversion of scales
A problem of major interest is how we should convert one scale to another when we believe that one scale is a collapsing of the other. In one direction it seems quite clear how to do this.
Once we have selected an estimate of the collapsing, this immediately provides a conversion rule from the finer to the coarser scale, i.e., from scale I to scale II. To assess the uncertainty in this rule we can look at the rules that would arise from each of the partitions in a $\gamma$-credible region for the collapsing. We illustrate this in the following example.
Example 6 (Converting from the finer to the coarser scale in Example 2.) In
Example 3 we showed that the LRSE partition is always (2, 4, 5) for all the
priors specified in Example 2. The conversion rule from scale I to scale II,
using the LRSE partition, is as shown in Table 5. For example, individuals
assigned a score of 1 or 2 on scale I are assigned a score of 1 on scale II.

Table 5 Conversion of scale I to scale II in Example 2 using the LRSE collapsing

Scale I | Scale II
1       | 1
2       | 1
3       | 2
4       | 2
5       | 3
To assess the uncertainty entailed in this conversion rule we now present, in
Table 6, the full set of conversion rules obtained via the .95-relative surprise
region (8) given in Example 3, when using the prior [0, 0, 0]. Of course, values
1 and 5 on scale I must always convert to 1 and 3 on scale II. From the table we
see that there is little variation in converting 3 on scale I but more variation
for both 2 and 4; for example, four of the five partitions in the .95-credible
region convert 3 to a 2.
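As an illustration, the Table 5 rule amounts to locating a scale-I score within the partition intervals. The following is a Python sketch of our own (the paper's computations were done in R), where a partition is encoded by its right endpoints, e.g. (2, 4, 5) for P1 = {1, 2}, P2 = {3, 4}, P3 = {5}:

```python
import bisect

def convert_fine_to_coarse(score: int, endpoints: tuple) -> int:
    """Map a scale-I score to the scale-II category (1-based) whose
    partition element contains it; endpoints are the right ends of the
    partition elements."""
    return bisect.bisect_left(endpoints, score) + 1

lrse = (2, 4, 5)  # LRSE partition from Example 3
print([convert_fine_to_coarse(s, lrse) for s in range(1, 6)])  # [1, 1, 2, 2, 3]
```

The output reproduces the Table 5 mapping: scores 1 and 2 go to category 1, scores 3 and 4 to category 2, and score 5 to category 3.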
A conversion rule for expanding scale II to scale I is more problematic.
To see this, suppose we have selected a rule to convert from the finer to the
coarser scale. Then, if we wanted to convert from the coarser to the finer scale,
we have multiple possible assignments for any categories that correspond to
a proper collapsing on the finer scale. A number of rules are possible and we
consider two: one deterministic, called the maximal rule, while the other
corresponds to a random assignment, called the random rule. These rules are
based on having chosen a specific collapsing as represented by the partition
(P_1, ..., P_K).

For the maximal rule we look at estimates p̂_i of the p_i, corresponding to a
collapsed category, and choose, for the conversion of a value on the coarser
scale, the value in the finer scale that maximizes p̂_i. For the random rule we
use the estimates p̂_i to construct the conditional distributions for each
partition element and then randomly allocate a value on scale II to a value
on scale I using the appropriate conditional distribution. The random rule
seems somewhat more realistic, as not every observation on a given category on
scale II corresponding to a collapsing would be assigned the same category on
scale I.
Table 6 A .95-credible region (based on the partitions in Eq. 8) for conversion of scale I to scale
II in Example 2 when using the uniform prior

Scale I | Scale II | Scale II | Scale II | Scale II (LRSE) | Scale II
1       | 1        | 1        | 1        | 1               | 1
2       | 2        | 2        | 1        | 1               | 1
3       | 2        | 2        | 2        | 2               | 1
4       | 3        | 2        | 3        | 2               | 2
5       | 3        | 3        | 3        | 3               | 3
Table 7 Possible conversions of scale II to scale I in Example 2 when using the uniform prior and
based on partitions in the .95-relative surprise region

Scale II | Scale I | Scale I | Scale I | Scale I (LRSE) | Scale I
1        | 1       | 1       | 1       | 1              | 3
2        | 3       | 3       | 3       | 3              | 4
3        | 5       | 5       | 5       | 5              | 5
Logically, it makes sense to base the estimates p̂_i on the conditional
posterior of p given that the particular collapsing in question holds. Fixing
the partition (P_1, ..., P_K), the likelihood for p is given by Eq. 2 as a
function of p. Now suppose that we have placed a Dirichlet_R(a_1, ..., a_R)
prior on p to start. The conditional prior distribution of q is then

Dirichlet_K( Σ_{l∈P_1} a_l, ..., Σ_{l∈P_K} a_l )

and, from Eq. 2, the conditional posterior distribution of q is

Dirichlet_K( g_1 + Σ_{l∈P_1} (f_l + a_l), ..., g_K + Σ_{l∈P_K} (f_l + a_l) ).

Given q, the conditional prior of (p_{l_i}/q_i, ..., p_{u_i}/q_i),
corresponding to P_i = (l_i, l_i + 1, ..., u_i), is
Dirichlet_{u_i − l_i + 1}(a_{l_i}, ..., a_{u_i}) and the conditional posterior
is Dirichlet_{u_i − l_i + 1}(f_{l_i} + a_{l_i}, ..., f_{u_i} + a_{u_i}). So the
conditional LRSE of p_l/q_i, for l satisfying l_i ≤ l ≤ u_i, is given by
f_l/(f_{l_i} + ··· + f_{u_i}), which we note does not involve the prior. The
maximal rule then converts i on scale II to the value l satisfying
l_i ≤ l ≤ u_i which maximizes f_l/(f_{l_i} + ··· + f_{u_i}). The random rule
converts i on scale II to the value l on scale I, satisfying l_i ≤ l ≤ u_i,
with probability f_l/(f_{l_i} + ··· + f_{u_i}).
We illustrate this in the following example.
Example 7 (Converting from the coarser to the finer scale in Example 2.) In
Table 7 we give the results of converting scale II to scale I, using the maximal
rule, for each of the partitions in Eq. 8. This again gives us a sense of the
uncertainties involved in this conversion.
We note that Table 6 is based on the .95-credible region (8) that we obtained
for the collapsing, and so can be interpreted as a .95-credible region for the true
conversion from scale I to scale II. This is not the case, however, for Table 7
as these are obtained by a somewhat ad hoc, albeit intuitively reasonable rule.
There is an element of nonidentifiability in converting from scale II to scale I.
In Table 8 we record the conditional distributions corresponding to the
random rule for the LRSE partition in Example 2. We can see from this that
the choice between 1 and 2 on scale I for a 1 on scale II is somewhat uncertain.
Table 8 Conversion probabilities for converting scale II to scale I in Example 2 based on the
LRSE partition

Scale II \ Scale I | 1   | 2   | 3   | 4   | 5
1                  | 4/7 | 3/7 | 0   | 0   | 0
2                  | 0   | 0   | 2/3 | 1/3 | 0
3                  | 0   | 0   | 0   | 0   | 1
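The maximal and random rules can be implemented directly from the scale-I counts f. The sketch below is our own Python illustration (not the authors' R code), and the counts f are hypothetical values chosen only to reproduce the conditional probabilities of Table 8:

```python
import random

f = [4, 3, 2, 1, 5]                    # hypothetical scale-I counts f_1, ..., f_5
partition = [(1, 2), (3, 4), (5, 5)]   # LRSE partition (2, 4, 5) as (l_i, u_i) pairs

def maximal_rule(i: int) -> int:
    """Convert scale-II category i to the l in P_i maximizing f_l."""
    lo, hi = partition[i - 1]
    return max(range(lo, hi + 1), key=lambda l: f[l - 1])

def random_rule(i: int, rng: random.Random) -> int:
    """Convert scale-II category i to l in P_i with probability
    f_l / (f_{l_i} + ... + f_{u_i})."""
    lo, hi = partition[i - 1]
    values = list(range(lo, hi + 1))
    return rng.choices(values, weights=[f[l - 1] for l in values])[0]

print([maximal_rule(i) for i in (1, 2, 3)])  # [1, 3, 5], the LRSE column of Table 7
rng = random.Random(0)
print([random_rule(1, rng) for _ in range(5)])  # draws 1 w.p. 4/7 and 2 w.p. 3/7
```

Note that the maximal rule reproduces the LRSE column of Table 7, while repeated draws from the random rule reproduce the row probabilities of Table 8 in the long run.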
5 Model checking
A question of some interest is whether or not the model for collapsings that
we have developed in Section 2 makes sense in light of the observed data
(f_1, ..., f_R) and (g_1, ..., g_K). This is effectively model checking and should
be carried out before we proceed to make inferences about an appropriate
collapsing.
One approach to model checking is based on the minimal sufficient statistic
for the sampling model given by Eq. 2. Once this statistic is determined we
use the conditional distribution of a discrepancy statistic given the value of the
minimal sufficient statistic to assess the model via a P-value. Unfortunately, in
this problem it is very difficult to determine the minimal sufficient statistic let
alone determine the conditional distribution.
An alternative approach might be to select a discrepancy statistic, such
as Pearson's chi-squared statistic, to compare the observed g_i with the
estimated expected values based on the observed f, for a particular partition
(P_1, ..., P_K), and do this for every possible partition. This, however, entails
the use of (R−1 choose K−1) dependent chi-squared tests and it is not at all
clear how to combine these to compute an appropriate P-value.
We adopt an alternative strategy for model checking here. Suppose we
are satisfied that samples have been correctly obtained from the relevant
population, so we feel confident that the underlying multinomial model leading
to likelihood (1) is correct. Now suppose that we have no information that
would suggest that scale II is a collapsing of scale I. In such a case it makes
sense to put independent priors on p and q and then use the posterior to
make inferences about these quantities. Suppose we use the same prior on
p as prescribed in Section 2, namely, p ~ Dirichlet_R(a_1, ..., a_R) independent
of q ~ Dirichlet_K(1, ..., 1). Although other choices are possible for the prior
on q, a noninformative prior would seem to make sense when we are testing a
model.
Now consider the null hypothesis H_0: q is a collapsing of p, and we will
consider H_0 as a subset of S_R × S_K. Notice that, provided we place a
continuous prior on S_R × S_K, then H_0 is a subset having prior probability
equal to 0. The consequence of this is that we can think of the
Dirichlet_R(a_1, ..., a_R) × Dirichlet_K(1, ..., 1) prior as being effectively
on the complement H_0^c, while the prior on H_0 is as described in Section 2.
The prior on H_0 only comes into play if we agree that H_0 makes sense and, to
assess this, we use the prior on the complement H_0^c. This shows that there is
no conflict in our prior assessments, as the prior on p is the same.
To assess H_0 we proceed as described in Evans et al. (1993) and Evans et al.
(1997). For an arbitrary point (p, q) ∈ S_R × S_K, let d_{H_0}(p, q) be a
measure of the distance (p, q) is from H_0. For example, we will use

d_{H_0}(p, q) = min_{(P_1, ..., P_K)} Σ_{i=1}^K ( q_i − Σ_{l∈P_i} p_l )^2,   (10)
i.e., least-squares distance, although other choices are possible and in fact
may prove more suitable for detecting specific types of deviations from the
model of a collapsing existing. Provided the amount of data is large enough,
however, the use of Eq. 10 will detect deviations from model correctness.
Now we measure the concentrations of the prior distribution and the posterior
distribution of (p, q) about H_0 by the prior and posterior distributions of
d_{H_0}(p, q), respectively. Naturally, if H_0 is true then we should see that
the posterior distribution of d_{H_0}(p, q) concentrates much more closely
about 0 than the prior distribution of d_{H_0}(p, q). To assess this formally,
let π_{d_{H_0}} and π_{d_{H_0}}(· | f, g) be the prior and posterior densities
of d_{H_0}(p, q). Then to assess H_0 we compute the relative surprise tail
probability

Π( π_{d_{H_0}}(d_{H_0}(p, q) | f, g) / π_{d_{H_0}}(d_{H_0}(p, q))
   ≤ π_{d_{H_0}}(0 | f, g) / π_{d_{H_0}}(0)  |  f, g ),   (11)
where Π(· | f, g) is the posterior of (p, q) based on the
Dirichlet_R(a_1, ..., a_R) × Dirichlet_K(1, ..., 1) prior. Then
π_{d_{H_0}}(0 | f, g)/π_{d_{H_0}}(0) is the relative belief ratio of H_0, and
this measures how the data have changed beliefs in H_0. An alternative to H_0
is given by a nonzero value of d_{H_0}, and so we see that Eq. 11 is the
posterior probability that an alternative to H_0 does not have a relative
belief ratio greater than that of H_0. As such, when Eq. 11 is small there are
alternatives that have a larger relative belief ratio than H_0 with high
posterior probability and this is evidence against H_0. Note that the
posterior of (p, q) is the Dirichlet_R(a_1 + f_1, ..., a_R + f_R) ×
Dirichlet_K(1 + g_1, ..., 1 + g_K) measure.
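The distance in Eq. 10 can be evaluated by enumerating all (R−1 choose K−1) interval partitions of {1, ..., R}. The following is a minimal Python sketch of our own (the paper's implementation is in R); the probability vectors p and q at the bottom are illustrative values, not the paper's data:

```python
from itertools import combinations

def d_H0(p, q):
    """min over partitions (P_1,...,P_K) of sum_i (q_i - sum_{l in P_i} p_l)^2."""
    R, K = len(p), len(q)
    best = float("inf")
    # cuts are the K-1 interior boundaries; bounds[i]:bounds[i+1] slices out P_{i+1}
    for cuts in combinations(range(1, R), K - 1):
        bounds = (0, *cuts, R)
        dist = sum((q[i] - sum(p[bounds[i]:bounds[i + 1]])) ** 2
                   for i in range(K))
        best = min(best, dist)
    return best

# if q is an exact collapsing of p, the distance is (numerically) 0
p = [0.1, 0.2, 0.3, 0.25, 0.15]
q = [0.3, 0.55, 0.15]          # the collapsing (2, 4, 5) of p
print(round(d_H0(p, q), 12))   # 0.0
```

Brute-force enumeration is exactly what the authors describe doing, and is feasible whenever (R−1 choose K−1) is not large.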
It is straightforward to sample from both the posterior and prior
distributions of (p, q) and from such Monte Carlo samples we can construct
estimates of π_{d_{H_0}} and π_{d_{H_0}}(· | f, g) and estimate (11). The only
difficulty lies in evaluating (10) for each sample and we simply do this by
brute force. Provided (R−1 choose K−1) is not large, this is feasible. We
illustrate this procedure in the following example.

[Fig. 1 The prior (dashed) and posterior (solid) densities of d_{H_0} in
Example 8; density plotted against squared distance.]

[Fig. 2 The relative belief ratios of d_{H_0} in Example 8, plotted against
squared distance.]
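The model-checking procedure just described can be sketched end to end. The counts f and g below are hypothetical stand-ins (this is our own Python illustration, not the authors' R code), and the histogram-based density estimates are deliberately crude:

```python
from itertools import combinations
import numpy as np

rng = np.random.default_rng(0)
R, K = 5, 3
f = np.array([4, 3, 2, 1, 5])   # hypothetical scale-I counts
g = np.array([6, 4, 5])         # hypothetical scale-II counts

def d_H0(p, q):
    """Least-squares distance (10), minimized over all interval partitions."""
    best = np.inf
    for cuts in combinations(range(1, R), K - 1):
        b = (0, *cuts, R)
        best = min(best, sum((q[i] - p[b[i]:b[i + 1]].sum()) ** 2
                             for i in range(K)))
    return best

def distance_sample(alpha_p, alpha_q, n=5000):
    """Draw (p, q) independently from Dirichlets and return the d_H0 values."""
    P, Q = rng.dirichlet(alpha_p, n), rng.dirichlet(alpha_q, n)
    return np.array([d_H0(P[j], Q[j]) for j in range(n)])

prior_d = distance_sample(np.ones(R), np.ones(K))
post_d = distance_sample(np.ones(R) + f, np.ones(K) + g)

# crude histogram density estimates of d_H0 on a common grid
edges = np.linspace(0, max(prior_d.max(), post_d.max()), 40)
h_prior, _ = np.histogram(prior_d, bins=edges, density=True)
h_post, _ = np.histogram(post_d, bins=edges, density=True)
rb = np.where(h_prior > 0, h_post / np.where(h_prior > 0, h_prior, 1.0), np.inf)

# Eq. 11: posterior probability that the relative belief ratio at the
# sampled distance does not exceed the relative belief ratio at 0
idx = np.clip(np.digitize(post_d, edges) - 1, 0, len(rb) - 1)
tail = float(np.mean(rb[idx] <= rb[0]))
print("relative belief ratio at 0:", round(float(rb[0]), 2))
print("tail probability (11):", round(tail, 3))
```

A tail probability near 1 indicates no evidence against H_0, while a value near 0 is evidence against the collapsing model, mirroring the conclusions drawn in Examples 8 and 9.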
Example 8 (Testing the model in Example 2.) In Fig. 1 we have plotted the
prior and posterior densities of d_{H_0}(p, q) when p ~ Dirichlet_R(1, ..., 1)
and q ~ Dirichlet_K(1, ..., 1), based upon Monte Carlo samples of size
N = 10^5 from the prior and posterior. Note that the prior and posterior
densities are very similar, reflecting the fact that we have a very small
amount of data in this example. Irrespective of that, it is the difference
between these distributions that is of importance, as represented by the
relative belief ratio π_{d_{H_0}}(d_{H_0}(p, q) | f, g)/π_{d_{H_0}}(d_{H_0}(p, q))
graphed in Fig. 2. From both plots it would appear that the posterior is
concentrating closer to H_0 than the prior. The relative belief ratio peaks
at 0 and is π_{d_{H_0}}(0 | f, g)/π_{d_{H_0}}(0) = 2.794. Therefore, Eq. 11
equals 1.00 and we have no evidence against H_0, i.e., no evidence against the
hypothesis that scale I has been collapsed into scale II, and we can now
proceed to inference about the specific collapsing, as we have already
discussed.
Table 9 Car satisfaction data for scale I

Category |  1 |  2 |   3 |   4 |   5
Count    | 19 | 19 | 107 | 225 | 257
Table 10 Car satisfaction data for scale II

Category | 1 |   2 |   3
Count    | 5 | 381 | 115
6 Real data examples
We now consider some practical examples.
Example 9 (Satisfaction with cars survey.) This example concerns a pan-
European survey on driver satisfaction with their cars. The survey was done
in 2001 by TNS, a well-known European marketing analytics company. Data
were given to Zvi Gilula on condition that if used for publication, no disclosure
of the make of the underlying cars is permitted. The survey was carried
out on 10 different compact cars made in Europe and Japan and was done
independently in 5 countries (Italy, Germany, Spain, the UK, and France). The
same 10-category scale was originally used and then was adapted for each
country to allow ranking of cars by their distribution of satisfaction,
following the methodology reported by Yakir and Gilula (1999). In France a
5-category scale was used, while in Germany a 3-category scale was used.
The following satisfaction data were obtained based on samples of 627
in France and 501 in Germany on a car that is not made in either country
(Tables 9 and 10).
In Fig. 3 we have plotted the prior and posterior densities for the squared
distance used in model checking. In this case (11) is effectively 0 and we have
strong evidence against a collapsing as we have defined it. This suggests that
French and German respondents have different attitudes (possibly cultural)
towards this car, or at least that a more sophisticated model for the collapsing
is needed to explain the results, if indeed their attitudes are the same.

[Fig. 3 The prior (dashed) and posterior (solid) densities of d_{H_0} in
Example 9; density plotted against squared distance.]

Table 11 Student evaluation data for scale I

Category | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20
Count    | 3 | 3 | 0 | 0 | 3 | 0 | 3 | 0 | 3 |  0 |  3 |  0 |  6 |  6 |  6 |  0 | 12 |  6 | 12 | 39
Example 10 (Student evaluation data.) Student evaluation of instructors at the
Hebrew University has been done for years on a 20-category scale. In the
year 2000, the provost established a professional committee to re-examine
the entire evaluation methodology and come up with recommendations for
improvement. One major recommendation was to replace the original scale by
an 11-category scale. An experiment was carried out by one of us (Gilula) on
second-year BA students in psychology taking a course in research methods.
The class had about 140 students and 105 of them filled in the regular
20-category form. An anonymous sample of 43 of them later filled in the
11-category form. Although the two underlying samples are not fully
independent, we used our proposed methodology for illustrative purposes. The
data appear in Tables 11 and 12.
There are (19 choose 10) = 92,378 possible collapsings. With this number the
computation time for the model checking is considerable, but the computations
for inference about the true collapsing are still very fast. In Fig. 4 we have
plotted the prior and posterior densities for the squared distance used in
model checking. We see that the posterior has concentrated much more closely
about 0 than the prior. In fact, Eq. 11 here is 0.967, so there is clearly no
evidence against the model.
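The count of possible collapsings follows from choosing the K−1 = 10 interior cut points among the R−1 = 19 positions between adjacent categories, which can be checked directly:

```python
import math

# number of ways to collapse a 20-category scale into 11 ordered categories
print(math.comb(19, 10))  # 92378
```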
We then proceeded to make inference about which of the possible collapsings
was appropriate. Experience with these surveys led to the hypothesis that an
appropriate collapsing was given by (4, 5, 7, 8, 10, 11, 13, 15, 17, 19, 20).
In Fig. 5 we have plotted the relative belief ratios for each of the
collapsings, where they are labelled sequentially according to the ordering we
discussed in Example 1. The hypothesized collapsing corresponded to #86388,
with relative belief ratio equal to 45.477, while the LRSE corresponds to
collapsing #81260, with relative belief ratio equal to 47.987. So the
hypothesized collapsing is well supported, and in fact Eq. 11 equaled .991
when we tested the null hypothesis that this is the true collapsing. The LRSE
collapsing is given by (3, 5, 7, 8, 9, 11, 13, 16, 17, 19, 20), which we see is
similar to the hypothesized collapsing. Note that because we are using uniform
priors, the LRSE is also the posterior mode.

Table 12 Student evaluation data for scale II

Category | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11
Count    | 2 | 1 | 1 | 0 | 1 | 1 | 3 | 6 | 5 |  7 | 16

[Fig. 4 The prior (dashed) and posterior (solid) densities of d_{H_0} in
Example 10; density plotted against squared distance.]
[Fig. 5 The relative belief ratios of the collapsings in Example 10.]
7 Conclusions
We have formalized the problem of collapsing one categorical ordinal scale
into another as an inference problem. Inferences then proceed by placing a
prior on the parameter for the finer scale and a prior on the set of collapsings
and applying Bayesian methodology. We have developed a test to assess
whether or not a collapsing makes sense for a given data set. Furthermore,
we have discussed how one can convert one scale into another and assess the
uncertainty in such a conversion. The authors will be happy to provide the R
code that was used to implement the calculations.
Encouraged both by the reviewers' fruitful comments and by our initial results,
we plan to extend this research to address the following issues.
1. The neutrality constraint. In many (if not most) cases, the attitudinal scale
has an odd number of categories to allow for a neutral category as the
middle category. It is not uncommon to have such a scale collapsed into
a coarser scale having an even number of categories without an explicit
neutral category. In such cases it is not clear whether such a category
can be collapsed over or must be retained. This adds an interesting
constraint not addressed by the methodology discussed here. We are now
investigating a new approach that utilizes stochastic ordering which, we
hope, can accommodate this constraint.
2. Cut-point modeling. Ordinality of scales has traditionally prompted the use
of latent variable models with censoring (e.g. Rossi et al. 2001). Typically,
to use such an approach, more data are needed per individual, such as
several attitudinal responses per individual. Such data are
not easily disclosed, and we are currently looking for cooperative sources.
3. More than two scales. The problem of finding an optimal conversion
mechanism for 3 or more scales is challenging. As with log-linear models,
this will require an extended methodology.
4. Time dependence. Sometimes surveys are administered at different points
of time which may affect the multinomial probabilities underlying the
scales considered. We are investigating this interesting avenue.
We conclude this paper with the hope that this research will stimulate
interest among marketing scientists.
Acknowledgements The authors thank the Editor and two referees for many helpful comments.
References
Box, G. E. P., & Tiao, G. C. (1973). Bayesian inference in statistical analysis. New York: John Wiley
and Sons.
Evans, M., Gilula, Z., & Guttman, I. (1993). Computational issues in the Bayesian analysis of
categorical data: Loglinear and Goodman's RC model. Statistica Sinica, 3, 391–406.
Evans, M., Gilula, Z., Guttman, I., & Swartz, T. (1997). Bayesian analysis of stochastically ordered
distributions of categorical variables. Journal of the American Statistical Association, 92(437),
208–214.
Evans, M., Guttman, I., & Swartz, T. (2006). Optimality and computations for relative surprise
inferences. Canadian Journal of Statistics, 34(1), 113–129.
Evans, M., & Jang, G. H. (2011). Inferences from prior-based loss functions. Technical Report
No. 1104, Dept. of Statistics, U. of Toronto.
Evans, M., & Shakhatreh, M. (2008). Optimal properties of some Bayesian inferences. Electronic
Journal of Statistics, 2, 1268–1280.
Green, P. E., & Rao, V. R. (1970). Rating scales and information recovery - How many scales and
response categories to use? Journal of Marketing, 34, 33–39.
Miller, G. A. (1956). The magical number seven, plus or minus two: Some limits on our capacity
for processing information. Psychological Review, 63, 81–97.
Rossi, P. E., Gilula, Z., & Allenby, G. M. (2001). Overcoming scale usage heterogeneity:
A Bayesian hierarchical approach. Journal of the American Statistical Association, 96(453),
20–31.
Yakir, B., & Gilula, Z. (1999). An insightful comparison of brands on an arbitrary ordinal scale.
Journal of the Marketing Research Society, 40, 249–264.