On The Compatibility of Nested Logit Models With Utility Maximization

Journal of Econometrics 43 (1990) 373-388.
North-Holland
ON THE COMPATIBILITY OF NESTED LOGIT MODELS WITH UTILITY MAXIMIZATION*
Axel BÖRSCH-SUPAN
University of Mannheim, D-6800 Mannheim, West Germany
The paper examines the relationship between the nested multinomial logit (NMNL) model and rational consumer behavior. We show that the Daly-Zachary-McFadden (DZM) condition for the validity of stochastic utility maximization in NMNL models is unnecessarily strong and leads too often to the rejection of NMNL models with large dissimilarity parameters. In many cases, the distribution of the stochastic utility components in the NMNL model can be modified in a way that preserves the NMNL choice probabilities on a subset of R^I, where I is the number of alternatives in the choice set. On this subset, the DZM condition can be violated without violating the restrictions of stochastic utility maximization.
1. Introduction
Discrete choice models are increasingly popular in applied econometric work. Most popular is the multinomial logit (MNL) model, which is easy to compute even for a large number of alternatives but suffers from the 'independence of irrelevant alternatives'. The multinomial probit (MNP) model avoids this restriction but is computationally infeasible for problems with more than a few alternatives. McFadden (1978) introduced the nested multinomial logit (NMNL) model as a compromise between functional flexibility and computational feasibility. This model has since enjoyed a large number of applications, particularly in the analysis of transportation policies.
In general, all of the above models have structural microeconomic interpretations as demand functions derived from stochastic utility maximization [McFadden (1981)]. For the NMNL model, however, certain restrictions on the parameters that control the correlation among unobserved attributes have to be satisfied for this interpretation to hold. This is frequently not the case in practical applications.
The aforementioned parameter conditions (the Daly-Zachary-McFadden [DZM] condition) essentially guarantee the nonnegativity of the density function that characterizes the NMNL model. The frequent violation of the DZM condition may be interpreted as a weakness of the specification of this density function, i.e., the functional form of the NMNL model. Because in all discrete choice models the underlying density function should just be viewed as an approximation to something much more complicated, we propose a slight modification of the NMNL model that extends its applicability to many cases in which the DZM condition does not hold.

*This paper owes much to many very helpful discussions with Dan McFadden and to the comments of two anonymous referees. All remaining errors are mine. The research was conducted at the J.F. Kennedy School of Government at Harvard University. Partial financial support by the National Science Foundation is gratefully acknowledged.

0304-4076/90/$3.50 © 1990, Elsevier Science Publishers B.V. (North-Holland)
This modification consists of constructing an underlying distribution function for the stochastic component of the choice model whose density is nonnegative in R^I and which coincides with the distribution function of the NMNL model on a subset of R^I, where I is the number of alternatives in the choice set. On this subset, the DZM condition can be violated without violating the restrictions of stochastic utility maximization. This set can be characterized by the signs of derivatives of choice probabilities. For two-level NMNL models with few alternatives per branch, the sign conditions reduce to very simple expressions.
The paper proceeds in three steps. First, we briefly review the hypothesis of stochastic utility maximization and the NMNL model. Second, we illustrate the above-mentioned construction in a simple example. Finally, we generalize this construction to a broad class of discrete choice models and apply the results to two-level NMNL models.
2. Stochastic utility maximization
We assume a sample of T consumers, each choosing among I discrete alternatives. Each alternative i provides utility u_{it} to consumer t. Utility u_{it} consists of a deterministic component v_{it} (usually specified as a linear combination Σ_{k=1}^{K} X_{it}^{k} β_k of K characteristics X_{it}^{k}, with β to be estimated) and an additive disturbance ε_{it}:

$$u_{it} = v_{it} + \varepsilon_{it}. \qquad (1)$$

A consumer is said to maximize his stochastic utility [McFadden (1981)] if he prefers alternative i over alternative j if and only if

$$u_{it} > u_{jt}. \qquad (2)$$
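Relations (1) and (2) imply the probability that consumer t chooses alternative i from the set of all I alternatives:

$$P_i(v_t) = \Pr(\varepsilon_{jt} < \varepsilon_{it} + v_{it} - v_{jt},\ j = 1,\dots,I,\ j \ne i) = \int_{\{\varepsilon_{jt} < \varepsilon_{it} + v_{it} - v_{jt},\ j \ne i\}} dF(\varepsilon_t), \qquad i = 1,\dots,I, \qquad (3)$$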
where F denotes the joint c.d.f. of ε_t = (ε_{1t}, …, ε_{It}), assumed to be i.i.d. across consumers, and v_t = (v_{1t}, …, v_{It}).
For F multivariate normal, eq. (3) defines the choice probabilities of the
multinomial probit (MNP) model; for F extreme value distributed and i.i.d.
across alternatives, eq. (3) generates the multinomial logit (MNL) model.
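Eq. (3) can be made concrete by simulation. The following is a minimal sketch (our illustration, not part of the paper) that assumes i.i.d. standard type I extreme value disturbances: it draws ε_t, applies rule (2), and compares the simulated choice frequencies with the closed-form MNL probabilities exp(v_i)/Σ_j exp(v_j). The function names and the utility vector are hypothetical.

```python
import math
import random


def gumbel_draw(rng: random.Random) -> float:
    """One standard type I extreme value draw via the inverse of F(e) = exp(-exp(-e))."""
    u = rng.random()
    while u == 0.0:  # guard against log(0)
        u = rng.random()
    return -math.log(-math.log(u))


def simulate_choice_probs(v, draws=100_000, seed=1):
    """Monte Carlo version of eq. (3): u_i = v_i + eps_i, pick the largest u_i."""
    rng = random.Random(seed)
    counts = [0] * len(v)
    for _ in range(draws):
        u = [vi + gumbel_draw(rng) for vi in v]          # eq. (1)
        best = max(range(len(v)), key=lambda i: u[i])   # rule (2)
        counts[best] += 1
    return [c / draws for c in counts]


def mnl_probs(v):
    """Closed-form MNL probabilities exp(v_i) / sum_j exp(v_j)."""
    m = max(v)
    ex = [math.exp(vi - m) for vi in v]
    total = sum(ex)
    return [e / total for e in ex]
```

With 100,000 draws the simulated frequencies typically agree with the closed form to about two decimal places; the pure-Python loop trades speed for transparency.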
Eq. (3) shows that a discrete choice model can equivalently be specified by the c.d.f. F of the stochastic utility components ε_i, i = 1, …, I, or by a set of choice probabilities {P_i | i = 1, …, I}. We will call F the generating function of the P_i. Any c.d.f. will generate a discrete choice model. However, the multiple integral in eq. (3) may be impractical to evaluate, as in the case of multinomial probit models with more than four alternatives. Conversely, any set of choice probabilities that satisfies a set of compatibility conditions defines a stochastic utility maximization model with an implied joint distribution of the stochastic utility components. These compatibility conditions are [Williams (1977), Daly and Zachary (1979), McFadden (1981)]:

$$P_i(v) \ge 0, \qquad \sum_{i=1}^{I} P_i(v) = 1, \qquad P_i(v) = P_i(v_1 + \alpha, \dots, v_I + \alpha) \quad \text{for all } \alpha \in \mathbb{R}, \qquad (4)$$
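$$\frac{\partial P_i(v)}{\partial v_j} = \frac{\partial P_j(v)}{\partial v_i}, \qquad i, j = 1,\dots,I, \qquad (5)$$

$$(-1)^{j}\,\frac{\partial^{j} P_i(v)}{\partial v_{i_1}\cdots\partial v_{i_j}} \ge 0 \quad \text{for distinct } i_1,\dots,i_j \ne i,\ j = 1,\dots,I-1. \qquad (6)$$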
Condition (4) represents the basic requirements of nonnegativity and adding-up of the I choice probabilities as well as the dependence of the comparison only on the differences in utilities ('translation invariance'). Condition (5) guarantees the integrability of the P_i and is a straightforward analogue to the Slutsky condition in continuous demand analysis. Condition (6) is the essential requirement for the implied distribution function to be properly defined, i.e., to have a nonnegative density function.
3. The nested multinomial logit model

In the nested multinomial logit model [NMNL, McFadden (1978)], the I alternatives are grouped in K subsets, each consisting of I(k) alternatives.²
We will denote such a hierarchical choice model ('tree') by T(I(1), …, I(K)). The choice can be visualized as first among the K subsets, then among the I(k) alternatives in the chosen subset k. The NMNL choice probabilities can be decomposed into

$$P_i(v) = Q_k(v) \cdot P_{i|k}(v), \qquad (7)$$
where Q_k denotes the marginal choice probability of subset k and P_{i|k} the conditional probability of choosing alternative i among the I(k) alternatives in subset k. Q_k and P_{i|k} have the familiar functional form of simple conditional logit choice probabilities [McFadden (1978)]:

$$P_i(v) = \underbrace{\frac{\exp(v_i/\theta_k)}{E(k)}}_{P_{i|k}} \cdot \underbrace{\frac{E(k)^{\theta_k}}{\sum_{l=1}^{K} E(l)^{\theta_l}}}_{Q_k}. \qquad (8)$$

Here E(k) = Σ_{i=J(k)}^{J(k)+I(k)-1} exp(v_i/θ_k) denotes the exponential of the so-called inclusive value of subset k, J(k) the index of the first alternative in this subset, and θ_k the so-called dissimilarity parameter of subset k. In terms of the c.d.f. in eq. (3), the NMNL model parametrizes the generating function F by the dissimilarity parameters θ_k:
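$$F_{NMNL}(\varepsilon;\theta) = \exp\left\{-\sum_{k=1}^{K}\left[\sum_{i=J(k)}^{J(k)+I(k)-1}\exp(-\varepsilon_i/\theta_k)\right]^{\theta_k}\right\}. \qquad (9)$$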
Each θ_k not equal to one introduces a nonzero correlation among the disturbances ε_i within subset k. Therefore, the NMNL model overcomes the assumption of i.i.d. disturbances across alternatives, which generates the so-called 'independence of irrelevant alternatives', the major disadvantage of the MNL model.
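A direct transcription of eqs. (7) and (8) may help fix the notation. The sketch below is an illustration under our own conventions: nests are passed as lists of 0-based alternative indices, the helper names are hypothetical, and the log-sum-exp trick is used only for numerical stability. Setting every θ_k = 1 should reproduce the MNL probabilities.

```python
import math


def logsumexp(xs):
    """Numerically stable log(sum(exp(x)))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def nmnl_probs(v, nests, theta):
    """Two-level NMNL choice probabilities, eqs. (7)-(8).

    v      -- deterministic utilities v_1..v_I (0-based list)
    nests  -- list of K lists of alternative indices (the subsets)
    theta  -- list of K dissimilarity parameters theta_k
    Returns (P, Q): choice probabilities and marginal subset probabilities.
    """
    # log E(k) = log sum_{i in k} exp(v_i / theta_k)
    log_e = [logsumexp([v[i] / th for i in nest]) for nest, th in zip(nests, theta)]
    log_num = [th * le for th, le in zip(theta, log_e)]   # log E(k)^theta_k
    log_den = logsumexp(log_num)
    q = [math.exp(ln - log_den) for ln in log_num]        # Q_k
    p = [0.0] * len(v)
    for k, (nest, th) in enumerate(zip(nests, theta)):
        for i in nest:
            p_cond = math.exp(v[i] / th - log_e[k])         # P_{i|k}
            p[i] = q[k] * p_cond                            # eq. (7)
    return p, q
```

For instance, `nmnl_probs([0.3, -0.2, 0.1], [[0, 1], [2]], [1.8, 1.0])` evaluates the T(2,1) model used in section 4 with a dissimilarity parameter above one.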
The compatibility of the NMNL model with the stochastic utility maximization hypothesis can be established by verifying conditions (4) through (6). Only condition (6) is restrictive; it is equivalent to

$$0 < \theta_k \le 1, \qquad k = 1,\dots,K, \qquad (10)$$

if condition (6) is to be valid for all deterministic utility components v ∈ R^I [McFadden (1979), Daly and Zachary (1979); referred to as the DZM condition]. Only if condition (10) is satisfied is the generating function F a proper cumulative distribution function.
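The statement that only condition (6) restricts θ can be probed numerically. The sketch below (hypothetical helper names; a T(2,2) tree with θ_1 = 2.5, chosen to violate (10)) checks adding-up and translation invariance from (4) and the symmetry condition (5) by central differences. It is a spot check at one point, not a proof.

```python
import math


def nmnl_probs(v, nests, theta):
    """Compact two-level NMNL probabilities, eqs. (7)-(8)."""
    e = [sum(math.exp(v[i] / th) for i in nest) for nest, th in zip(nests, theta)]
    den = sum(ek ** th for ek, th in zip(e, theta))
    p = [0.0] * len(v)
    for ek, th, nest in zip(e, theta, nests):
        for i in nest:
            p[i] = math.exp(v[i] / th) / ek * ek ** th / den
    return p


def cross_derivative(v, nests, theta, i, j, h=1e-5):
    """Central-difference approximation of dP_i / dv_j."""
    up = list(v)
    up[j] += h
    dn = list(v)
    dn[j] -= h
    return (nmnl_probs(up, nests, theta)[i] - nmnl_probs(dn, nests, theta)[i]) / (2 * h)
```

Even with θ_1 = 2.5 the probabilities add up to one, are unchanged by a common shift of all utilities, and have symmetric cross derivatives; what fails outside (10) is only the sign condition (6).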
The dissimilarity parameters θ_k are usually estimated from the data by maximum likelihood methods. The second inequality in condition (10) is quite frequently violated in empirical analysis.³ A violation may be interpreted as a misspecification of choice model (3), for instance an unsuitable choice of the generating function F. On the other hand, the requirement that condition (6) hold for all v ∈ R^I may be overly restrictive, because economic theory usually restricts the set of data points that are sensible for a specific application of the choice model to a subset of R^I.
4. A three-alternative example
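The following example shows the potential for a reconciliation of NMNL models with the stochastic utility maximization hypothesis. We will illustrate the effect of θ_1 > 1 in the simplest case of a three-alternative NMNL model in which the first two alternatives constitute a group, denoted by T(2,1). Writing θ for θ_1, this model is generated by the cumulative distribution function

$$F(\varepsilon_1, \varepsilon_2, \varepsilon_3) = \exp\left\{-\left[\exp(-\varepsilon_1/\theta) + \exp(-\varepsilon_2/\theta)\right]^{\theta} - \exp(-\varepsilon_3)\right\}. \qquad (11)$$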
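Because of translation invariance [condition (4)], we can reduce F to the two-dimensional space of differences without losing information. Define the differences η_1 = ε_2 - ε_1 and η_2 = ε_3 - ε_1. Their joint distribution function is

$$G(\eta) = \frac{(1 + \exp(-\eta_1/\theta))^{\theta-1}}{(1 + \exp(-\eta_1/\theta))^{\theta} + \exp(-\eta_2)}. \qquad (12)$$

We define as pseudo-density the function g = ∂²G/∂η_1∂η_2,

$$g(\eta) = \frac{\exp(-\eta_1/\theta)\,\exp(-\eta_2)\,(1 + \exp(-\eta_1/\theta))^{\theta-2}}{\left[(1 + \exp(-\eta_1/\theta))^{\theta} + \exp(-\eta_2)\right]^{2}}\;\Delta, \qquad (13)$$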
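Eqs. (12) to (14) can be cross-checked numerically. The sketch below is our illustration: the closed form of g is our own differentiation of (12), and θ = 1.6 is an arbitrary value above one. It compares g with a finite-difference estimate of ∂²G/∂η_1∂η_2 and confirms that G evaluated at (v_1 - v_2, v_1 - v_3) equals the NMNL probability P_1 from eq. (8).

```python
import math


def G(eta1, eta2, theta):
    """Joint c.d.f. of (eta_1, eta_2) = (eps_2 - eps_1, eps_3 - eps_1), eq. (12)."""
    s = 1.0 + math.exp(-eta1 / theta)
    return s ** (theta - 1.0) / (s ** theta + math.exp(-eta2))


def g(eta1, eta2, theta):
    """Pseudo-density d^2 G / (d eta_1 d eta_2): a positive factor times Delta."""
    a = math.exp(-eta1 / theta)
    s = 1.0 + a
    d = s ** theta + math.exp(-eta2)
    q1 = s ** theta / d                          # marginal probability of subset {1, 2}
    delta = 2.0 * q1 - (theta - 1.0) / theta     # discriminant, eq. (14)
    return a * math.exp(-eta2) * s ** (theta - 2.0) / d ** 2 * delta


def mixed_fd(f, x, y, theta, h=1e-4):
    """Central-difference estimate of the mixed partial d^2 f / (dx dy)."""
    return (f(x + h, y + h, theta) - f(x + h, y - h, theta)
            - f(x - h, y + h, theta) + f(x - h, y - h, theta)) / (4.0 * h * h)


def nmnl_p1(v, theta):
    """P_1 of the T(2,1) model, computed directly from eq. (8)."""
    e1 = math.exp(v[0] / theta) + math.exp(v[1] / theta)
    return math.exp(v[0] / theta) / e1 * e1 ** theta / (e1 ** theta + math.exp(v[2]))
```

At a point such as (η_1, η_2) = (0, -3) with θ = 1.6 the pseudo-density is negative, which is exactly the situation the rest of this section deals with.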
³Examples are: McFadden, Talvitie, and Associates (1977), Coslett (1978), Small and Brownstone (1982), Hensher (1984), Börsch-Supan (1985).
[Figure 1: the (η_1, η_2) plane, with η_1 = ε_2 - ε_1 and η_2 = ε_3 - ε_1. The choice regions P_1 (ε_1 + v_1 > ε_2 + v_2 and ε_1 + v_1 > ε_3 + v_3), P_2 (ε_2 + v_2 > ε_1 + v_1 and ε_2 + v_2 > ε_3 + v_3) and P_3 (ε_3 + v_3 > ε_1 + v_1 and ε_3 + v_3 > ε_2 + v_2) meet at (w_1, w_2) = (v_1 - v_2, v_1 - v_3). Also shown: the curve (η_1, w*) = (η_1, h_1(η_1)), the curve (η_1, w**) = (η_1, h_2(η_1)), and the region where g(η_1, η_2) < 0.]
Fig. 1. Choice probabilities at deterministic utility (v_1, v_2, v_3) and density function of the trinary NMNL model defined by F(ε_1, ε_2, ε_3) = exp{-[exp(-ε_1/θ) + exp(-ε_2/θ)]^θ - exp(-ε_3)}.
where the discriminant term Δ,

$$\Delta = \frac{2\,(1 + \exp(-\eta_1/\theta))^{\theta}}{(1 + \exp(-\eta_1/\theta))^{\theta} + \exp(-\eta_2)} - \frac{\theta - 1}{\theta} = 2\,Q_1 - \frac{\theta - 1}{\theta}, \qquad (14)$$

signs the pseudo-density g(η); here Q_1 is the marginal choice probability of the first subset at w = η.

If θ ≤ 1, then Δ > 0 and g(η) represents a proper density function. However, if θ > 1, then

$$g(\eta) > 0 \iff \eta_2 > h_1(\eta_1) = \log\frac{\theta - 1}{\theta + 1} - \theta\log\left[1 + \exp(-\eta_1/\theta)\right]. \qquad (15)$$

The function h_1(η_1), defined by g(η_1, η_2) = 0, is plotted in fig. 1. For θ > 1, h_1(η_1) approaches log[(θ - 1)/(θ + 1)] as η_1 → ∞ and η_1 + log[(θ - 1)/(θ + 1)] as η_1 → -∞. For each η_1, a point w* = h_1(η_1) exists such that g(η_1, η_2) ≥ 0 for η_2 ≥ w* and g(η_1, η_2) < 0 for η_2 < w*. As θ approaches one from above, the constant log[(θ - 1)/(θ + 1)] approaches -∞, so that the region with a negative pseudo-density vanishes. Let

$$B(\theta) = \{\eta \in \mathbb{R}^2 : \eta_2 \ge h_1(\eta_1)\} \qquad (16)$$

denote the area of nonnegative 'density' g(η). The complement of B(θ) [where g(η) is negative] cannot be enclosed by any set of (η_1, η_2) corresponding to the area that defines an observed choice probability P_i (see fig. 1). In other words, even if θ > 1, NMNL choice probabilities are always nonnegative. This, of course, also follows from their definition as products of simple conditional logit choice probabilities in eq. (8).
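The boundary h_1 in (15) and its two asymptotes can be verified directly from Δ. A minimal sketch, assuming θ = 1.6 for illustration (function names hypothetical):

```python
import math


def discriminant(eta1, eta2, theta):
    """Delta = 2 Q_1 - (theta - 1)/theta, eq. (14); it carries the sign of g."""
    s_theta = (1.0 + math.exp(-eta1 / theta)) ** theta
    q1 = s_theta / (s_theta + math.exp(-eta2))
    return 2.0 * q1 - (theta - 1.0) / theta


def h1(eta1, theta):
    """Curve g = 0 for theta > 1, eq. (15)."""
    return math.log((theta - 1.0) / (theta + 1.0)) - theta * math.log1p(math.exp(-eta1 / theta))
```

Moving η_2 slightly above or below h_1(η_1) flips the sign of Δ, and for θ ≤ 1 the term (θ - 1)/θ is nonpositive, so Δ stays positive everywhere.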
If at least one observed point (w_1, w_2) lies outside of B(θ), that is, below the function h_1, the NMNL model cannot explain the observed choices by the stochastic utility maximization hypothesis, because condition (6) (nonnegative density) is violated. However, we can construct a subset A(θ) ⊂ B(θ) such that for all points in this subset the NMNL choice probabilities do indeed represent a choice system compatible with stochastic utility maximization.

Fig. 1 illustrates the construction. We will choose the support of the new distribution of (η_1, η_2) to be a subset of B(θ) such that the new distribution function coincides with the NMNL distribution function everywhere on that subset and has a proper density function, in the sense that the density function is nonnegative in the entire domain R² and integrates to one.
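For each point (η_1, η_2), define

$$L(\eta_1, \eta_2) = \int_{-\infty}^{\eta_2} g(\eta_1, y)\,dy. \qquad (17)$$

Because of (15) and (16), L(η_1, η_2) < 0 for η_2 < h_1(η_1). For each η_1, the continuity of g(η) implies the existence of a point w** = h_2(η_1) such that L(η) ≥ 0 for η_2 ≥ w**. Evaluating L(η_1, η_2) yields

$$L(\eta_1, \eta_2) \ge 0 \iff \eta_2 \ge h_2(\eta_1) = \log(\theta - 1) - \theta\log\left[1 + \exp(-\eta_1/\theta)\right]. \qquad (18)$$

The set A(θ), defined by

$$A(\theta) = \{\eta \in \mathbb{R}^2 : \eta_2 \ge h_2(\eta_1)\}, \qquad (19)$$

is the region where the NMNL model can be reconciled with utility maximization. Since h_2(η_1) - h_1(η_1) = log(θ + 1) > 0, A(θ) is indeed a subset of B(θ). Define a modified density function

$$g^*(\eta) = \begin{cases} g(\eta) & \text{if } \eta \in A(\theta), \\ 0 & \text{otherwise.} \end{cases} \qquad (20)$$

This amounts to offsetting the 'negative mass' outside B(θ) with the positive mass in B(θ)\A(θ) in a way that preserves the NMNL choice probabilities in A(θ):

$$G(\eta_1, \eta_2) = \int_{-\infty}^{\eta_1}\int_{-\infty}^{\eta_2} g(x, y)\,dy\,dx = \int_{-\infty}^{\eta_1}\int_{-\infty}^{\eta_2} g^*(x, y)\,dy\,dx \qquad \text{if } (\eta_1, \eta_2) \in A(\theta).$$

The second equality holds because L(x, h_2(x)) = 0 and h_2 is increasing, so that for every x ≤ η_1 the inner integrals of g and g* over (-∞, η_2] coincide.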
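The curve h_2 in (18) is where the column integral L in (17) changes sign. The sketch below is our illustration: Simpson's rule with a truncated lower limit stands in for the improper integral, and θ = 1.6 is arbitrary. It checks that L(η_1, h_2(η_1)) vanishes, that L changes sign across h_2, and that h_2 - h_1 = log(θ + 1) > 0.

```python
import math


def g(eta1, eta2, theta):
    """Pseudo-density of the T(2,1) model, eqs. (13)-(14)."""
    a = math.exp(-eta1 / theta)
    s = 1.0 + a
    d = s ** theta + math.exp(-eta2)
    delta = 2.0 * s ** theta / d - (theta - 1.0) / theta
    return a * math.exp(-eta2) * s ** (theta - 2.0) / d ** 2 * delta


def h1(eta1, theta):
    """Boundary of B(theta), eq. (15)."""
    return math.log((theta - 1.0) / (theta + 1.0)) - theta * math.log1p(math.exp(-eta1 / theta))


def h2(eta1, theta):
    """Lower boundary of A(theta), eq. (18)."""
    return math.log(theta - 1.0) - theta * math.log1p(math.exp(-eta1 / theta))


def simpson(f, lo, hi, n=4000):
    """Composite Simpson rule on [lo, hi]; n must be even."""
    step = (hi - lo) / n
    total = f(lo) + f(hi)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(lo + k * step)
    return total * step / 3.0


def L(eta1, eta2, theta, tail=40.0):
    """L(eta) of eq. (17), with the lower limit -inf truncated at eta_2 - tail."""
    return simpson(lambda y: g(eta1, y, theta), eta2 - tail, eta2)
```

The truncation is harmless here because g decays like exp(y) as y → -∞.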
For points in A(θ), the new choice model defined by g* is equivalent to the NMNL model. For points outside of A(θ), the choice probabilities will in general differ from those of the NMNL model, because in B(θ)\A(θ) we have 0 = g* ≤ g and elsewhere g ≤ g* = 0. However, g* is a proper density function in the entire domain R².

There are, of course, many other ways to construct supports that yield proper density functions. For instance, renormalizing g(η_1, η_2) in B(θ) yields the largest possible support and a continuous density function [g* is discontinuous on the boundary of A(θ)]. However, the attraction of the proposed construction is that it exactly reproduces NMNL choice probabilities on its support. The possibility of this construction therefore reconciles the use of NMNL choice probabilities with utility maximization even if θ > 1, as long as all relevant data are contained in A(θ).⁴
⁴Relevant data points include all observed and projected deterministic utility components v_i. In general, the v_i will be linear combinations X_iβ of characteristics X_i for alternative i; see eq. (1). The coefficients β will be estimated from a data set (d_i, X_i) that includes the dependent indicator variable for alternative i, d_i.
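The key property of g* in (20), that it reproduces the NMNL distribution function on A(θ), can be checked by brute-force integration. This is a sketch under stated assumptions: θ = 1.6, the infinite integration ranges are truncated, and the nested Simpson rule is our choice. The inner integral starts at h_2(x) because g* is zero below the boundary of A(θ); the result should match G from (12) at any point in A(θ).

```python
import math

THETA = 1.6  # hypothetical dissimilarity parameter that violates condition (10)


def G(e1, e2, th=THETA):
    """NMNL c.d.f. of the differences, eq. (12)."""
    s = 1.0 + math.exp(-e1 / th)
    return s ** (th - 1.0) / (s ** th + math.exp(-e2))


def g(e1, e2, th=THETA):
    """Pseudo-density, eqs. (13)-(14)."""
    a = math.exp(-e1 / th)
    s = 1.0 + a
    d = s ** th + math.exp(-e2)
    delta = 2.0 * s ** th / d - (th - 1.0) / th
    return a * math.exp(-e2) * s ** (th - 2.0) / d ** 2 * delta


def h2(e1, th=THETA):
    """Lower boundary of A(theta), eq. (18)."""
    return math.log(th - 1.0) - th * math.log1p(math.exp(-e1 / th))


def simpson(f, lo, hi, n=400):
    """Composite Simpson rule on [lo, hi]; n must be even."""
    step = (hi - lo) / n
    acc = f(lo) + f(hi)
    for k in range(1, n):
        acc += (4 if k % 2 else 2) * f(lo + k * step)
    return acc * step / 3.0


def cdf_from_g_star(e1, e2, th=THETA, tail=40.0):
    """Integral of g* over (-inf, e1] x (-inf, e2], with the x range truncated at e1 - tail."""
    def inner(x):
        lo = h2(x, th)  # g* vanishes below the boundary of A(theta)
        return simpson(lambda y: g(x, y, th), lo, e2) if lo < e2 else 0.0
    return simpson(inner, e1 - tail, e1)
```

Integrating g directly on [h_2(x), η_2] rather than integrating the discontinuous g* over a fixed range keeps Simpson's rule accurate.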
Because of

$$L(w) = \int_{-\infty}^{w_2} g(w_1, y)\,dy = \frac{\partial P_1}{\partial w_1},$$

the inclusion of (w_1, w_2) in A(θ) can easily be checked by evaluating the first derivative of a choice probability:

$$\frac{\partial P_1}{\partial w_1} = -\frac{\partial P_1(v)}{\partial v_2} \ge 0 \iff Q_1(v) \ge \frac{\theta - 1}{\theta}. \qquad (21)$$

Therefore, the above compatibility condition can also be interpreted as a restriction on the admissible choice probabilities that is satisfied whenever the choice probability of the first subset is sufficiently large. Eq. (21) also implies that the set A(θ) is not empty in the three-alternative, two-subset nested choice model T(2,1) with 0 < θ < ∞.
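The compatibility check of this section can be carried out with nothing but choice probabilities. The sketch below (hypothetical helper names; θ = 1.8 and the utility grid are illustrative) computes L(w) = -∂P_1/∂v_2 by central differences and checks on a grid that the three statements L(w) ≥ 0, Q_1 ≥ (θ - 1)/θ, and w_2 ≥ h_2(w_1) agree, skipping points too close to the boundary to classify reliably.

```python
import math


def t21_probs(v, theta):
    """P_1 and Q_1 of T(2,1): alternatives 1 and 2 share the parameter theta, eq. (8)."""
    e1 = math.exp(v[0] / theta) + math.exp(v[1] / theta)
    den = e1 ** theta + math.exp(v[2])
    q1 = e1 ** theta / den
    p1 = math.exp(v[0] / theta) / e1 * q1
    return p1, q1


def L_of_v(v, theta, h=1e-6):
    """L(w) = dP_1/dw_1 = -dP_1/dv_2, by central differences."""
    up = [v[0], v[1] + h, v[2]]
    dn = [v[0], v[1] - h, v[2]]
    return -(t21_probs(up, theta)[0] - t21_probs(dn, theta)[0]) / (2 * h)


def h2(w1, theta):
    """Lower boundary of A(theta), eq. (18)."""
    return math.log(theta - 1.0) - theta * math.log1p(math.exp(-w1 / theta))
```

Because (θ - 1)/θ < 1 for every finite θ, pushing v_3 far down drives Q_1 toward one, which is why A(θ) is never empty in this model.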
5. General case
The above construction can be generalized. We first prove a local compatibility theorem that is valid for any discrete choice model.
Theorem 1. Let {P_i | i = 1, …, I} define a system of choice probabilities satisfying the adding-up and translation invariance conditions (4) and the symmetry condition (5), but not necessarily the global nonnegativity condition (6). Let A = [a_1, b_1] × … × [a_I, b_I] be an I-dimensional interval. If for all v ∈ A the choice probabilities {P_i(v) | i = 1, …, I} have nonnegative mixed partial derivatives of any order up to I - 1, then these choice probabilities define a discrete choice model compatible with stochastic utility maximization, even when the generating function F is not a proper c.d.f. because its associated pseudo-density f is negative in the complement of A.
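Proof. Let v = (v_1, …, v_I) be a point in A. Because of translation invariance [condition (4)], we can map v, A, and F into the (I - 1)-dimensional space of differences relative to the first alternative:

$$w_i = v_1 - v_i, \qquad i = 2,\dots,I,$$

$$C = [c_2, d_2] \times \cdots \times [c_I, d_I],$$

$$c_i = \min(a_1 - b_i,\ b_1 - a_i), \qquad d_i = \max(a_1 - b_i,\ b_1 - a_i), \qquad i = 2,\dots,I,$$

$$G^1(w_2,\dots,w_I) = \int_{-\infty}^{\infty} D_1F(z,\ w_2 + z,\ \dots,\ w_I + z)\,dz.$$

With these definitions [McFadden (1981, eq. (5.3))],

$$P_1(v) = P_1(0,\ v_2 - v_1,\ \dots,\ v_I - v_1) = G^1(v_1 - v_2,\ \dots,\ v_1 - v_I) = G^1(w) \qquad (22)$$

and

$$P_1(v) = \int_{\varepsilon_1=-\infty}^{\infty}\int_{\varepsilon_2=-\infty}^{\varepsilon_1+w_2}\cdots\int_{\varepsilon_I=-\infty}^{\varepsilon_1+w_I} f(\varepsilon)\,d\varepsilon = \int_{\eta_2=-\infty}^{w_2}\cdots\int_{\eta_I=-\infty}^{w_I} g^1(\eta)\,d\eta, \qquad (23)$$

where D_1F denotes the derivative of F with respect to its first argument and g^1 = ∂^{I-1}G^1/∂w_2⋯∂w_I is the pseudo-density of the differences.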
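Consider the first choice probability for w ∈ C. Only points with w_i ≤ d_i are relevant. Define

$$g^{1*}(w) = \begin{cases}
g^1(w) & \text{if } w_i > c_i \text{ for all } i = 2,\dots,I,\\[4pt]
\dfrac{\partial^{I-2} P_1(w)}{\partial w_2\cdots[\partial w_j]\cdots\partial w_I} & \text{if } w_j = c_j,\ w_i > c_i \text{ for } i = 2,\dots,I,\ i \ne j,\\[4pt]
\qquad\vdots & \\[4pt]
\dfrac{\partial P_1(w)}{\partial w_j} & \text{if } w_i = c_i \text{ for } i = 2,\dots,I,\ i \ne j,\ w_j > c_j,\\[4pt]
P_1(w) & \text{if } w_i = c_i \text{ for all } i = 2,\dots,I,\\[4pt]
0 & \text{if } w_i < c_i \text{ for at least one } i = 2,\dots,I,
\end{cases} \qquad (24)$$

where a derivative in square brackets is left out. The derivatives of the choice probabilities are nonnegative by assumption; consequently, g^{1*} ≥ 0.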
Furthermore,

$$\int_{\eta_2=-\infty}^{w_2}\cdots\int_{\eta_I=-\infty}^{w_I} g^{1*}(\eta)\,d\eta = P_1(c) + \sum_{j=2}^{I}\int_{c_j}^{w_j}\frac{\partial P_1(c_2,\dots,\eta_j,\dots,c_I)}{\partial \eta_j}\,d\eta_j + \cdots + \int_{c_2}^{w_2}\cdots\int_{c_I}^{w_I} g^1(\eta)\,d\eta = P_1(w),$$

which shows that the choice model based on g^{1*} generates the same choice probability of the first alternative as the original choice model based on g^1.
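For I = 3 the construction in (24) places a point mass P_1(c) at the corner of C, line densities ∂P_1/∂w_j on the two lower faces, and g^1 inside, and these pieces add up to P_1(w). The sketch below checks this numerically for the T(2,1) model of section 4, taking θ = 1.6 and the rectangle C = [0, 1] × [0, 1.5], which lies inside A(θ) (both are assumptions for illustration); the closed-form partial derivatives are our own differentiation of (12).

```python
import math

THETA = 1.6  # hypothetical value above one


def parts(x, y, th=THETA):
    """G of eq. (12), its partials dG/deta_1 and dG/deta_2, and the pseudo-density g."""
    a = math.exp(-x / th)
    s = 1.0 + a
    d = s ** th + math.exp(-y)
    G = s ** (th - 1.0) / d
    G1 = (a / th) * s ** (th - 2.0) * (s ** th - (th - 1.0) * math.exp(-y)) / d ** 2
    G2 = s ** (th - 1.0) * math.exp(-y) / d ** 2
    g = a * math.exp(-y) * s ** (th - 2.0) / d ** 2 * (2.0 * s ** th / d - (th - 1.0) / th)
    return G, G1, G2, g


def simpson(f, lo, hi, n=200):
    """Composite Simpson rule on [lo, hi]; n must be even."""
    step = (hi - lo) / n
    acc = f(lo) + f(hi)
    for k in range(1, n):
        acc += (4 if k % 2 else 2) * f(lo + k * step)
    return acc * step / 3.0


def decomposed_mass(c, w):
    """Corner mass + two face densities + interior density, as in (24) for I = 3."""
    corner = parts(c[0], c[1])[0]                                 # P_1(c)
    face1 = simpson(lambda x: parts(x, c[1])[1], c[0], w[0])      # face eta_2 = c_2
    face2 = simpson(lambda y: parts(c[0], y)[2], c[1], w[1])      # face eta_1 = c_1
    interior = simpson(lambda x: simpson(lambda y: parts(x, y)[3], c[1], w[1]), c[0], w[0])
    return corner + face1 + face2 + interior
```

All four ingredients are nonnegative on this rectangle, which is the nonnegativity that Theorem 1 requires of g^{1*}.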
Analogous arguments show the possibility of defining density functions that preserve the other choice probabilities. These definitions trivially coincide in the complement of A. It remains to be shown that the definitions are compatible with each other in the interior and at the boundary of A. Using the Slutsky condition (5), this follows from⁵

$$\frac{\partial^{I-1}}{\partial w_2\cdots\partial w_I}\int_{\eta_2=-\infty}^{w_2}\cdots\int_{\eta_I=-\infty}^{w_I} g^{1*}(\eta)\,d\eta = \frac{\partial^{I-1} P_1}{\partial w_2\cdots\partial w_I} = (-1)^{I-1}\frac{\partial^{I-1} P_1(v)}{\partial v_2\cdots\partial v_I} = (-1)^{I-1}\frac{\partial^{I-1} P_j(v)}{\partial v_1\cdots[\partial v_j]\cdots\partial v_I},$$

where the last expression is the corresponding derivative of the jth choice probability, which is generated in the same way by g^{j*}.
Hence f*, defined by

$$g^{1*}(\eta_2,\dots,\eta_I) = \int_{-\infty}^{\infty} f^*(z,\ \eta_2 + z,\ \dots,\ \eta_I + z)\,dz,$$

is a well-defined function that generates the same choice probabilities in A as the original f. Because the original model satisfied the adding-up condition (4), f* integrates to one. Nonnegativity was shown above. Hence f* is also a proper density function, which finishes the proof. Q.E.D.
⁵The square brackets denote terms to be left out. The argument shown applies to the interior points of A. Leaving out one or more integrations yields the compatibility of the derivatives in definition (24).
For practical purposes, it must be checked whether the nonnegativity condition of Theorem 1 is satisfied in an interval that contains all relevant deterministic utility components. In general, this involves evaluating all mixed partial derivatives of all choice probabilities up to order I - 1, a number that increases combinatorially with I.
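To make the combinatorial growth concrete, the sketch below counts the derivatives under one reading, which is our assumption: for each P_i, every mixed partial derivative with respect to a nonempty set of the other I - 1 utilities, of order 1 to I - 1. That gives I(2^{I-1} - 1) derivatives before the symmetry condition (5) is used to remove duplicates.

```python
from itertools import combinations


def mixed_partials(I):
    """List (i, S): P_i differentiated once with respect to each v_j, j in S.

    S runs over the nonempty subsets of the other I - 1 alternatives, so the
    orders range from 1 to I - 1 (our reading of the condition, an assumption).
    """
    out = []
    for i in range(I):
        others = [j for j in range(I) if j != i]
        for order in range(1, I):
            out.extend((i, s) for s in combinations(others, order))
    return out
```

For I = 3 this gives 9 derivatives; for I = 10 it already gives 5110.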
However, in the case of the nested multinomial logit model, the tree-like structure yields much more convenient conditions that involve derivatives of the choice probabilities of order less than the number of alternatives in each subset. Also, these derivatives have simple functional forms, namely mixed polynomials in the marginal choice probabilities of these subsets. Taking first derivatives, we obtain a generalization of eq. (21), which applied to the simple trinary model T(2,1).
Theorem 2. In two-level NMNL models, a necessary criterion for the nonnegativity condition in Theorem 1 is that the marginal choice probabilities of the subsets of similar alternatives are sufficiently large:

$$Q_k \ge \frac{\theta_k - 1}{\theta_k}, \qquad k = 1,\dots,K. \qquad (25)$$

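Proof. Using the definitions in eq. (8), for two alternatives i ≠ j in subset k,

$$\frac{\partial P_i}{\partial y_j} = \exp\left(-\frac{y_i + y_j}{\theta_k}\right)\cdot\frac{E(k)^{\theta_k - 2}}{\sum_l E(l)^{\theta_l}}\cdot\left\{\frac{1 - \theta_k}{\theta_k} + \frac{E(k)^{\theta_k}}{\sum_l E(l)^{\theta_l}}\right\},$$

and the term in curly braces equals Q_k - (θ_k - 1)/θ_k, which is nonnegative if and only if (25) holds. Q.E.D.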
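The derivative formula in the proof of Theorem 2 can be spot-checked by finite differences. The sketch assumes y_j = -v_j; y is not defined in this section, so this sign convention is our assumption, chosen because it makes exp(-y_i/θ_k) match the terms of E(k). It uses a hypothetical T(2,2) example and confirms that the sign of the derivative follows Q_k - (θ_k - 1)/θ_k.

```python
import math


def nmnl(v, nests, theta):
    """Two-level NMNL: returns (P, Q, E, sum_l E(l)^theta_l), eqs. (7)-(8)."""
    e = [sum(math.exp(v[i] / th) for i in nest) for nest, th in zip(nests, theta)]
    den = sum(ek ** th for ek, th in zip(e, theta))
    q = [ek ** th / den for ek, th in zip(e, theta)]
    p = [0.0] * len(v)
    for k, nest in enumerate(nests):
        for i in nest:
            p[i] = math.exp(v[i] / theta[k]) / e[k] * q[k]
    return p, q, e, den


def dP_dy(v, nests, theta, i, j, h=1e-6):
    """Central difference of P_i with respect to y_j, taking y_j = -v_j (assumption)."""
    up = list(v)
    up[j] -= h   # y_j + h
    dn = list(v)
    dn[j] += h   # y_j - h
    return (nmnl(up, nests, theta)[0][i] - nmnl(dn, nests, theta)[0][i]) / (2 * h)


def dP_dy_closed(v, nests, theta, i, j, k):
    """Closed form from the proof of Theorem 2 for i != j in subset k."""
    _, q, e, den = nmnl(v, nests, theta)
    th = theta[k]
    return (math.exp((v[i] + v[j]) / th) * e[k] ** (th - 2.0) / den
            * ((1.0 - th) / th + q[k]))
```

With utilities that make the first subset unattractive, such as v = (-2, -2, 1, 1) and θ = (1.8, 1.4), Q_1 falls below (θ_1 - 1)/θ_1 and the within-subset derivative turns negative, so criterion (25) fails.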
As opposed to the case of the three-alternative model T(2,1) with a single subset of similar alternatives, the condition in Theorem 2 is generally not sufficient. Furthermore, the K criteria in (25) may be conflicting for models with K > 1. For instance, in the four-alternative, two-subset model T(2,2), we obtain

$$Q_1 \ge \frac{\theta_1 - 1}{\theta_1} \qquad \text{and} \qquad Q_2 = 1 - Q_1 \ge \frac{\theta_2 - 1}{\theta_2},$$

which cannot be satisfied if θ_1 = θ_2 > 2.
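The conflict between the K criteria in (25) is easy to tabulate for T(2,2). A small sketch (the function name is hypothetical) returns the interval of Q_1 admissible under both criteria, or None when it is empty:

```python
def t22_feasible_interval(theta1, theta2):
    """Interval of Q_1 satisfying both criteria (25) in T(2,2), or None if empty."""
    lo = max(0.0, (theta1 - 1.0) / theta1)          # Q_1 >= (theta_1 - 1)/theta_1
    hi = min(1.0, 1.0 - (theta2 - 1.0) / theta2)    # 1 - Q_1 >= (theta_2 - 1)/theta_2
    return (lo, hi) if lo <= hi else None
```

For example, θ_1 = 3 with θ_2 = 1.2 still leaves Q_1 ∈ [2/3, 5/6], so whether the criteria conflict depends on both parameters, not on θ_1 alone.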

Theorem 2 shows that it is not always possible to find a subset of data points in which the NMNL model can be reconciled with stochastic utility maximization. The following theorem presents a necessary and sufficient criterion for the nonnegativity condition in Theorem 1.
Theorem 3. Let T(I(1), …, I(K)) denote a two-level nested multinomial logit model with K subsets of I(k) alternatives. Let g_{I(1),…,I(K)} denote the pseudo-density function that generates the model T(I(1), …, I(K)). Let v = (v_1, …, v_I) be a point in the interval A ⊂ R^I. The nonnegativity condition of Theorem 1 is satisfied if and only if all members of the set

$$\left\{ g_{n_1,\dots,n_K}\big(Q_1(v),\dots,Q_K(v);\ \theta_1,\dots,\theta_K\big) \;\middle|\; 0 \le n_1 \le I(1),\ \dots,\ 0 \le n_K \le I(K) \right\} \qquad (26)$$

are nonnegative at the marginal choice probabilities Q_1(v), …, Q_K(v). Moreover, the functions g_{n_1,…,n_K} are signed by mixed polynomials h_{n_1,…,n_K} in Q_1, …, Q_K of degree I(1) - 1, …, I(K) - 1.
Proof. The proof is a simple but tedious exercise of evaluating all derivatives in eq. (24), continuing the proof of Theorem 2. Note that the mixed derivatives with respect to y_i and y_j are always nonnegative unless alternatives i and j belong to a common subset. Finally, repeated application of

$$\frac{\partial Q_k}{\partial y_i} = \exp(-y_i/\theta_m)\cdot E(m)^{-1}\cdot\{Q_k(Q_m - \delta_{km})\}$$

(Theorem 2) yields the signing part h_{n_1,…,n_K}. Here, m denotes the subset of alternative i, and δ_km = 1 for k = m and 0 otherwise. Q.E.D.

Table 1
Density-signing polynomials for some typical nested multinomial logit models

NMNL model   Density   Nontrivial derivatives
T(2,1)       h_21      h_20
T(2,1,1)     h_211     h_210 = h_201, h_200
T(2,2)       h_22      h_21, h_12, h_20, h_02
T(3,1)       h_31      h_30, h_21, h_20
T(3,2)       h_32      h_31, h_30, h_22, h_21, h_20, h_12, h_02
T(3,3)       h_33      h_32, h_23, h_31, h_13, h_30, h_03, h_22, h_21, h_12, h_20, h_02

$$h_{33} = h_{33}(Q_1, Q_2; \theta_1, \theta_2) = 120\,Q_1^2Q_2^2 - 72\,\tau_2\,Q_1^2Q_2 - 72\,\tau_1\,Q_1Q_2^2 + 54\,\tau_1\tau_2\,Q_1Q_2 + 6\,\tau_2\sigma_2\,Q_1^2 + 6\,\tau_1\sigma_1\,Q_2^2 - 6\,\tau_1\tau_2\sigma_2\,Q_1 - 6\,\tau_1\sigma_1\tau_2\,Q_2 + \tau_1\sigma_1\tau_2\sigma_2,$$

for τ_k = (θ_k - 1)/θ_k and σ_k = (θ_k - 2)/θ_k.
Theorem 3 reduces the evaluation of all mixed derivatives up to order I - 1 of I choice probabilities to the evaluation of a much smaller set of mixed polynomials in the marginal probabilities of choosing subsets of similar alternatives. This set of mixed polynomials can be identified as the set of pseudo-density functions of all NMNL models that are generated by deleting one or more alternatives from the original model. Table 1 presents the sets of polynomials for some typical nested multinomial logit models.
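The coefficients of h_33 (120 = 5!, 72 = 4!·3, 54 = 3!·3·3, 6 = 3! = 2!·3) suggest a counting rule. The sketch below encodes our own reconstruction of that rule, which is an assumption and not stated in the paper: the alternatives differentiated in subset k are grouped into b_k blocks, a block of size j carries the factor c_j = ∏_{l=1}^{j-1}(θ_k - l)/θ_k (so c_2 = τ_k and c_3 = τ_kσ_k), a grouping with m blocks in total among N differentiated alternatives has weight (-1)^{N-m}(m - 1)!, and subset k contributes Q_k^{b_k - 1}. All function names are hypothetical.

```python
import math
from itertools import product


def set_partitions(items):
    """Yield every partition of `items` into nonempty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in set_partitions(rest):
        for idx in range(len(part)):          # add `first` to an existing block
            yield part[:idx] + [[first] + part[idx]] + part[idx + 1:]
        yield [[first]] + part                 # or open a new block


def block_factor(size, theta):
    """c_size = prod_{l=1}^{size-1} (theta - l)/theta; c_2 = tau, c_3 = tau*sigma."""
    c = 1.0
    for l in range(1, size):
        c *= (theta - l) / theta
    return c


def subset_terms(n, theta):
    """Map b -> sum over partitions of n alternatives into b blocks of prod c_|B|."""
    terms = {}
    for part in set_partitions(list(range(n))):
        b = len(part)
        terms[b] = terms.get(b, 0.0) + math.prod(block_factor(len(B), theta) for B in part)
    return terms


def signing_poly(ns, Q, theta):
    """Hypothetical generator of h_{n_1...n_K}(Q; theta) under the assumed counting rule."""
    active = [k for k, n in enumerate(ns) if n > 0]
    N = sum(ns)
    choices = [sorted(subset_terms(ns[k], theta[k]).items()) for k in active]
    total = 0.0
    for combo in product(*choices):
        m = sum(b for b, _ in combo)           # total number of blocks
        term = (-1) ** (N - m) * math.factorial(m - 1)
        for (b, weight), k in zip(combo, active):
            term *= weight * Q[k] ** (b - 1)
        total += term
    return total
```

Under this rule the generator returns Q_k - τ_k for a pair of alternatives in one subset (the criterion of Theorem 2), 2Q_1 - τ_1 for T(2,1) (the discriminant Δ of eq. (14)), and the nine-term expression for h_33.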
6. Conclusions
We showed that the Daly-Zachary-McFadden condition for the validity of stochastic utility maximization in nested multinomial logit models is unnecessarily strong. Therefore, we may too often reject NMNL models because of their large dissimilarity parameters.

Theorem 1 presents a weaker validity condition for a large class of discrete choice systems that can be invoked when the domain of estimated and projected deterministic utilities can be restricted to a subset of R^I.

Theorem 3 applies this condition to NMNL models. Here, the condition has a convenient form that involves mixed polynomials in the marginal choice probabilities with degrees equal to the numbers of alternatives in each relevant subset minus one. Moreover, Theorem 2 provides a simple necessary condition for Theorem 1. It reveals that NMNL models with large dissimilarity parameters are only compatible with stochastic utility maximization when the marginal choice probabilities of subsets of similar alternatives are sufficiently large.
References
Börsch-Supan, Axel, 1985, Tenure choice and housing demand, in: Konrad Stahl and Raymond Struyk, eds., U.S. and German housing markets: Comparative economic analyses (Urban Institute Press, Washington, DC) 55-105.
Coslett, Stephen R., 1978, Efficient estimation of discrete choice models from choice based samples, Unpublished Ph.D. dissertation (University of California at Berkeley, CA).
Daly, Andrew and S. Zachary, 1979, Improved multiple choice models, in: David Hensher and Q. Dalvi, eds., Identifying and measuring the determinants of mode choice (Teakfield, London) 335-357.
Hensher, David A., 1984, Full information maximum likelihood estimation of a nested logit mode-choice model, Working paper no. 13, Dimensions of automobile demand project (Macquarie University, North Ryde).
McFadden, Daniel, A. Talvitie, and Associates, 1977, Demand model estimation and validation: Urban travel demand forecasting project, Final report (Institute for Transportation Studies, University of California at Berkeley, CA).
McFadden, Daniel, 1978, Modelling the choice of residential location, in: A. Karlqvist, ed., Spatial interaction theory and residential location (North-Holland, Amsterdam) 75-96.
McFadden, Daniel, 1979, Quantitative methods for analyzing travel behavior of individuals: Some recent developments, in: David Hensher and P. Stopher, eds., Behavioral travel modeling (Croom Helm, London).
McFadden, Daniel, 1981, Econometric models of probabilistic choice, in: Charles F. Manski and Daniel McFadden, eds., Structural analysis of discrete data with econometric applications (MIT Press, Cambridge, MA) 198-272.
Small, Kenneth A. and David Brownstone, 1982, Efficient estimation of nested logit models: An application to trip timing, Economic research program research memorandum no. 296 (Princeton University, Princeton, NJ).
Williams, H., 1977, On the formation of travel demand models and economic evaluation measures of user benefit, Environment and Planning A 9.
