On The Compatibility of Nested Logit Models With Utility Maximization
North-Holland
Axel BÖRSCH-SUPAN
University of Mannheim, D-6800 Mannheim, West Germany
The paper examines the relationship between the nested multinomial logit (NMNL) model and
rational consumer behavior. We show that the Daly-Zachary-McFadden (DZM) condition of the
validity of stochastic utility maximization in NMNL models is unnecessarily strong and leads too
often to rejection of NMNL models with large dissimilarity parameters. In many cases, the
distribution of the stochastic utility components in the NMNL model can be modified in a way
that preserves the NMNL choice probabilities on a subset of $R^I$, where I is the number of
alternatives in the choice set. On this subset, the DZM condition can be violated without violating
the restrictions of stochastic utility maximization.
1. Introduction
Discrete choice models are increasingly popular in applied econometric
work. Most popular is the multinomial logit (MNL) model which is easy to
compute even for a large number of alternatives, but suffers from the ‘inde-
pendence of irrelevant alternatives’. The multinomial probit (MNP) model
avoids this restriction but is computationally infeasible for problems with
more than a few alternatives. McFadden (1978) introduced the nested multi-
nomial logit (NMNL) model as a compromise between functional flexibility
and computational feasibility. This model has since enjoyed a large number
of applications, particularly in the analysis of transportation policies.
In general, all of the above models have structural microeconomic interpre-
tations as demand functions derived from stochastic utility maximization
[McFadden (1981)]. For the NMNL model, however, certain restrictions on
the parameters that control the correlation among unobserved attributes have
to be satisfied for this interpretation to hold. This is frequently not the case in
practical applications.
The aforementioned parameter conditions (the Daly-Zachary-McFadden
[DZM] condition) essentially guarantee the nonnegativity of the density func-
tion that characterizes the NMNL model. The frequent violation of the DZM
*This paper owes much to many very helpful discussions with Dan McFadden and the
comments by two anonymous referees. All remaining errors are mine. The research was conducted
at the J.F. Kennedy School of Government at Harvard University. Partial financial support by the
National Science Foundation is gratefully acknowledged.
u,t’ u,,. (4
Relations (1) and (2) imply the probability that consumer t chooses alternative
i from the set of all I alternatives:
\[ P_i(u_t) = \int_{\{\varepsilon \,:\, u_{it} + \varepsilon_{it} \ge u_{jt} + \varepsilon_{jt},\ j = 1, \ldots, I,\ j \ne i\}} dF(\varepsilon), \tag{3} \]
A. Börsch-Supan, Compatibility of nested logit models 375
where F denotes the joint c.d.f. of $\varepsilon_t = (\varepsilon_{1t}, \ldots, \varepsilon_{It})$, assumed to be i.i.d. across
consumers, and $u_t = (u_{1t}, \ldots, u_{It})$.
For F multivariate normal, eq. (3) defines the choice probabilities of the
multinomial probit (MNP) model; for F extreme value distributed and i.i.d.
across alternatives, eq. (3) generates the multinomial logit (MNL) model.
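The MNL case of eq. (3) can be illustrated by simulation. The following is a minimal sketch, assuming standard Gumbel (type I extreme value) disturbances with c.d.f. exp(-exp(-e)) and three illustrative utility values that are not taken from the paper; the function names are hypothetical. It compares simulated choice shares with the closed-form logit probabilities.

```python
import math
import random


def mnl_probabilities(u):
    """Closed-form multinomial logit probabilities for deterministic utilities u."""
    m = max(u)  # subtract the maximum for numerical stability
    weights = [math.exp(x - m) for x in u]
    total = sum(weights)
    return [w / total for w in weights]


def gumbel_draw(rng):
    """Inverse-c.d.f. draw from F(e) = exp(-exp(-e))."""
    v = rng.random()
    while v == 0.0:  # avoid log(0); random() lies in [0, 1)
        v = rng.random()
    return -math.log(-math.log(v))


def simulate_choice_shares(u, draws=100_000, seed=0):
    """Monte Carlo version of eq. (3): share of draws in which each alternative has maximal utility."""
    rng = random.Random(seed)
    counts = [0] * len(u)
    for _ in range(draws):
        totals = [x + gumbel_draw(rng) for x in u]
        counts[totals.index(max(totals))] += 1
    return [c / draws for c in counts]


u = [0.5, 0.0, -1.0]  # illustrative utilities (assumption)
closed_form = mnl_probabilities(u)
simulated = simulate_choice_shares(u)
```

With this many draws the simulated shares should agree with the closed form up to Monte Carlo error of a few thousandths.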
Eq. (3) shows that a discrete choice model can equivalently be specified by
the c.d.f. F of the stochastic utility components $\varepsilon_i$, $i = 1, \ldots, I$, or by a set of
choice probabilities $\{P_i \mid i = 1, \ldots, I\}$. We will call F the generating function
of the $P_i$. Any c.d.f. will generate a discrete choice model. However, the
multiple integral in eq. (3) may be impractical to evaluate, as in the case of
multinomial probit models with more than four alternatives. Conversely, any set of
choice probabilities that satisfies a set of compatibility conditions defines a
stochastic utility maximization model with an implied joint distribution of the
stochastic utility components. These compatibility conditions are [Williams
(1977), Daly and Zachary (1979), McFadden (1981)]:
(4
\[ \partial P_i(u)/\partial u_j = \partial P_j(u)/\partial u_i, \tag{5} \]
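The symmetry of the cross-derivatives of the choice probabilities can be checked numerically. The sketch below is an illustration only: it uses the simple MNL probabilities as the test case, central finite differences with a hypothetical step size, and utility values chosen for the example.

```python
import math


def mnl_probabilities(u):
    """Multinomial logit probabilities, used here only as a test case."""
    m = max(u)
    weights = [math.exp(x - m) for x in u]
    total = sum(weights)
    return [w / total for w in weights]


def cross_derivative(prob, u, i, j, step=1e-5):
    """Central finite-difference estimate of dP_i/du_j."""
    up = list(u)
    down = list(u)
    up[j] += step
    down[j] -= step
    return (prob(up)[i] - prob(down)[i]) / (2.0 * step)


u = [0.3, -0.2, 1.1]  # illustrative utilities (assumption)
n = len(u)
jacobian = [[cross_derivative(mnl_probabilities, u, i, j) for j in range(n)]
            for i in range(n)]

# Largest deviation from symmetry across all pairs (i, j).
max_asymmetry = max(abs(jacobian[i][j] - jacobian[j][i])
                    for i in range(n) for j in range(n))
```

For the MNL model the off-diagonal derivatives equal -P_i P_j, so the asymmetry should vanish up to finite-difference error.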
where $Q_k$ denotes the marginal choice probability of subset k and $P_{i|k}$ the
conditional probability of choosing alternative i among the I(k) alternatives
in subset k. $Q_k$ and $P_{i|k}$ have the familiar functional form of simple
conditional logit choice probabilities [McFadden (1978)]:
\[ P_i(u) = \frac{\exp(u_i/\theta_k)}{E(k)} \cdot \frac{E(k)^{\theta_k}}{\sum_{l=1}^{K} E(l)^{\theta_l}}. \tag{8} \]
Here $E(k) = \sum_{j=J(k)}^{J(k)+I(k)-1} \exp(u_j/\theta_k)$ denotes the exponential of the
so-called inclusive value of subset k, J(k) the index of the first alternative in
this subset, and $\theta_k$ the so-called dissimilarity parameter of subset k. In terms
of the c.d.f. in eq. (3), the NMNL model parametrizes the generating function
F by the dissimilarity parameters $\theta_k$:
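\[ F_{NMNL}(\varepsilon; \theta) = \exp\left\{ -\sum_{k=1}^{K} \left[ \sum_{i=J(k)}^{J(k)+I(k)-1} \exp(-\varepsilon_i/\theta_k) \right]^{\theta_k} \right\}. \tag{9} \]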
Each $\theta_k$ not equal to one introduces a nonzero correlation among the
disturbances $\varepsilon_i$ within subset k. Therefore, the NMNL model overcomes the
assumption of i.i.d. disturbances across alternatives, the so-called
‘independence of irrelevant alternatives’, that is the major disadvantage of the MNL
model.
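The NMNL choice probabilities described above can be sketched in a few lines. This is a minimal illustration, assuming that E(k) sums exp(u_j/theta_k) over the alternatives of subset k (the sign convention under which the conditional probabilities sum to one); the function name, the nest layout, and the numerical values are hypothetical.

```python
import math


def nmnl_probabilities(u, nests, theta):
    """Nested logit probabilities P_i = Q_k * P_{i|k}.

    u     : list of deterministic utilities, one per alternative
    nests : list of lists of alternative indices (a partition of range(len(u)))
    theta : list of dissimilarity parameters, one per nest
    """
    # E(k): sum of exp(u_j / theta_k) over the alternatives in nest k.
    inclusive = [sum(math.exp(u[j] / th) for j in nest)
                 for nest, th in zip(nests, theta)]
    denom = sum(e ** th for e, th in zip(inclusive, theta))
    probs = [0.0] * len(u)
    for nest, th, e in zip(nests, theta, inclusive):
        q_k = e ** th / denom                       # marginal probability of nest k
        for i in nest:
            probs[i] = q_k * math.exp(u[i] / th) / e  # times conditional probability
    return probs


nests = [[0, 1], [2]]  # alternatives 0 and 1 share a nest (assumption)
p_nested = nmnl_probabilities([0.0, 0.0, 0.0], nests, [0.5, 1.0])
p_mnl = nmnl_probabilities([0.4, -0.3, 0.1], nests, [1.0, 1.0])
```

With all dissimilarity parameters equal to one the function reduces to the MNL model; with theta = 0.5 in the first nest and equal utilities, the two nested alternatives together receive less than two thirds of the probability mass, reflecting their correlated disturbances.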
The compatibility of the NMNL model with the stochastic utility maximiza-
tion hypothesis can be established by verifying conditions (4) through (6).
Only condition (6) is restrictive and equivalent to
4. A three-alternative example
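\[ G(\eta) = \frac{(1 + \exp(-\eta_1/\theta))^{\theta-1}}{(1 + \exp(-\eta_1/\theta))^{\theta} + \exp(-\eta_2)}. \tag{12} \]
We define as pseudo-density the function $g = \partial^2 G/\partial\eta_1 \partial\eta_2$,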
0 + exPbh/We
s(v) =
(1+exp(-rll/e))e+exp(-112)
(13)
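The pseudo-density can be evaluated numerically from eq. (12) alone. The sketch below is an illustration under stated assumptions: the closed form in `pseudo_density_closed` was obtained here by differentiating eq. (12) by hand and is not transcribed from eq. (13); the finite-difference step and the evaluation points are arbitrary choices.

```python
import math


def G(eta1, eta2, theta):
    """Eq. (12): choice probability as a function of the utility differences."""
    a = 1.0 + math.exp(-eta1 / theta)
    return a ** (theta - 1.0) / (a ** theta + math.exp(-eta2))


def pseudo_density_fd(eta1, eta2, theta, step=1e-3):
    """Central finite-difference approximation of d^2 G / (d eta1 d eta2)."""
    return (G(eta1 + step, eta2 + step, theta)
            - G(eta1 + step, eta2 - step, theta)
            - G(eta1 - step, eta2 + step, theta)
            + G(eta1 - step, eta2 - step, theta)) / (4.0 * step * step)


def pseudo_density_closed(eta1, eta2, theta):
    """Mixed derivative of eq. (12), derived for this sketch (assumption)."""
    a = 1.0 + math.exp(-eta1 / theta)
    b = math.exp(-eta2)
    sign_term = (theta + 1.0) * a ** theta - (theta - 1.0) * b
    return ((a - 1.0) * b * a ** (theta - 2.0) * sign_term
            / (theta * (a ** theta + b) ** 3))


points = [(0.0, -3.0), (0.0, 2.0), (-1.5, 0.5), (2.0, -1.0)]
errors = [abs(pseudo_density_fd(e1, e2, 2.0) - pseudo_density_closed(e1, e2, 2.0))
          for e1, e2 in points]
negative_example = pseudo_density_closed(0.0, -3.0, 2.0)  # theta > 1
positive_example = pseudo_density_closed(0.0, -3.0, 0.5)  # theta < 1
```

For theta = 2 the pseudo-density is negative at (0, -3), whereas for theta = 0.5 it is positive at the same point, since the only factor that can change sign is positive whenever theta is at most one.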
Examples are McFadden, Talvitie, and Associates (1977), Cosslett (1978), Small and
Brownstone (1982), Hensher (1984), and Börsch-Supan (1985).
[Figure 1. Panel labels: P1 ($\varepsilon_1 + v_1 > \varepsilon_3 + v_3$, $\varepsilon_1 + v_1 > \varepsilon_2 + v_2$); P2 ($\varepsilon_2 + v_2 > \varepsilon_1 + v_1$, $\varepsilon_2 + v_2 > \varepsilon_3 + v_3$); P3 ($\varepsilon_3 + v_3 > \varepsilon_1 + v_1$, $\varepsilon_3 + v_3 > \varepsilon_2 + v_2$). Marked points: $(\eta_1, \eta_2) = (v_1 - v_2, v_1 - v_3)$ and $(\eta_1, w^{**}) = (v_1 - v_2, h_2(v_1 - v_2))$.]
Fig. 1. Choice probabilities at deterministic utility $(v_1, v_2, v_3)$ and density function of the trinary
NMNL model defined by $F(\varepsilon_1, \varepsilon_2, \varepsilon_3) = \exp(-[\exp(-\varepsilon_1/\theta) + \exp(-\varepsilon_2/\theta)]^{\theta} - \exp(-\varepsilon_3))$.
A=
7-Q+ exphl/~))o e-1
--=2.p,-B,
e-1
(14)
(1 + ev( -7,/W” + exph) e
The function $h_1(\eta_1)$ defined by $g(\eta_1, \eta_2) = 0$ is plotted in fig. 1. For $\theta > 1$,
$h_1(\eta_1)$ approaches $\log[(\theta - 1)/(\theta + 1)]$ as $\eta_1 \to \infty$ and $\eta_1 + \log[(\theta - 1)/(\theta + 1)]$
as $\eta_1 \to -\infty$. For each $\eta_1$, a point $w^* = h_1(\eta_1)$ exists such that $g(\eta_1, \eta_2) \ge 0$
for $\eta_2 \ge w^*$ and $g(\eta_1, \eta_2) < 0$ for $\eta_2 < w^*$. As $\theta$ approaches one from above,
the constant $\log[(\theta - 1)/(\theta + 1)]$ approaches $-\infty$, so that the region with a
negative pseudo-density vanishes. Let
(17)
Because of (15) and (16), $L(\eta_1, \eta_2) < 0$ for $\eta_2 < h_1(\eta_1)$. For each $\eta_1$, the
continuity of $g(\eta)$ implies the existence of a point $w^{**} = h_2(\eta_1)$ such that
$L(\eta) \ge 0$ for $\eta_2 \ge w^{**}$. Evaluating $L(\eta_1, \eta_2)$ yields
(19)
is the region where the NMNL model can be reconciled with utility maximiza-
tion. Define a modified density function
This amounts to offsetting the ‘negative mass’ outside $B(\theta)$ with the positive
mass in $B(\theta) \setminus A(\theta)$ in a way that preserves the NMNL choice probabilities in
$A(\theta)$:
For points in $A(\theta)$, the new choice model defined by $g^*$ is equivalent to the
NMNL model. For points outside of $A(\theta)$, the choice probabilities will in
general be different from those of the NMNL model because in $B(\theta) \setminus A(\theta)$
$0 = g^* \le g$ and elsewhere $g \le g^* = 0$. However, $g^*$ is a proper density
function in the entire domain $R^2$.
There are, of course, many other ways to construct supports that yield
proper density functions. For instance, renormalizing $g(\eta_1, \eta_2)$ in $B(\theta)$ yields
the largest possible support and a continuous density function [$g^*$ is
discontinuous on the boundary of $A(\theta)$]. However, the attractiveness of the proposed
construction is that it exactly reproduces NMNL choice probabilities on its
support. The possibility of this construction therefore justifies the use of
NMNL choice probabilities even if $\theta > 1$, as long as all relevant data are
contained in $A(\theta)$.4
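The sign pattern of the pseudo-density in this section can be cross-checked numerically. The sketch below is an illustration only: the closed form for the boundary $h_1$ was derived here by setting the mixed derivative of eq. (12) to zero, and it is offered as an assumption that happens to reproduce the two limits stated in the text; the step size, grid points, and offsets are arbitrary.

```python
import math


def G(eta1, eta2, theta):
    """Eq. (12)."""
    a = 1.0 + math.exp(-eta1 / theta)
    return a ** (theta - 1.0) / (a ** theta + math.exp(-eta2))


def pseudo_density(eta1, eta2, theta, step=1e-3):
    """Finite-difference approximation of g = d^2 G / (d eta1 d eta2)."""
    return (G(eta1 + step, eta2 + step, theta)
            - G(eta1 + step, eta2 - step, theta)
            - G(eta1 - step, eta2 + step, theta)
            + G(eta1 - step, eta2 - step, theta)) / (4.0 * step * step)


def h1(eta1, theta):
    """Zero set of g in eta2 for theta > 1 (derived for this sketch, an assumption)."""
    return (math.log((theta - 1.0) / (theta + 1.0))
            - theta * math.log1p(math.exp(-eta1 / theta)))


theta = 2.0
sign_checks = []
for eta1 in (-3.0, 0.0, 3.0):
    boundary = h1(eta1, theta)
    above = pseudo_density(eta1, boundary + 0.5, theta)  # expected positive
    below = pseudo_density(eta1, boundary - 0.5, theta)  # expected negative
    sign_checks.append(above > 0.0 and below < 0.0)

# Limits of h1 for large positive and large negative eta1.
limit = math.log((theta - 1.0) / (theta + 1.0))
gap_right = h1(40.0, theta) - limit
gap_left = h1(-40.0, theta) - (-40.0 + limit)

# The constant log[(theta - 1)/(theta + 1)] as theta approaches one from above.
constants = [math.log((t - 1.0) / (t + 1.0)) for t in (1.5, 1.1, 1.01)]
```

The constants decrease without bound as theta approaches one, which is the numerical counterpart of the vanishing region of negative pseudo-density.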
4 Relevant data points include all observed and projected deterministic utility components $u_i$. In
general, the $u_i$ will be linear combinations $X_i\beta$ of characteristics $X_i$ for alternative i, see eq. (1).
The coefficients $\beta$ will be estimated from a data set $(d_i, X_i)$ that includes the dependent indicator
variable for alternative i, $d_i$.
Because of
the inclusion of $(u_1, u_2)$ in $A(\theta)$ can easily be checked by evaluating the first
derivative of a choice probability:
5. General case
\[ G^1(w_2, \ldots, w_I) = \int_{-\infty}^{\infty} D_1 F(z, w_2 + z, \ldots, w_I + z)\, dz. \]
\[ P_1(u) = P_1(0, u_2 - u_1, \ldots, u_I - u_1) = G^1(u_1 - u_2, \ldots, u_1 - u_I) = G^1(w) \tag{22} \]
and
Consider the first choice probability for $w \in C$. Only points with $w_i < d_i$ are
relevant. Define
P,(u)=/“‘~ -/ g’(q)dq
lj2=-x. ?j,=-cc
+ ... +
=P,(c)+Sw2~Pl(q2.Y,...,Wl)dqZ+ e..
(‘2 2
+f- [:ig’(+G
which shows that the choice model based on $g^{1,*}$ generates the same choice
probability of the first alternative as the original choice model based on $g^1$.
Analogous arguments show the possibility of defining density functions that
preserve the other choice probabilities. These definitions trivially coincide in
the complement of A. It remains to be shown that the definitions are
compatible with each other in the interior and at the boundary of A. Using the
Slutsky condition (5), this follows from
1
a a a a
Jq,=-m”’ [I
WI Y
-... - . .._...~
aw, [aw,
1
awj aw, q,=-cc
ni
x . . .
/ 1),=-C-Z
s’3*(dd’rl
a a a
aw, &w
=-... - ...-...
aw, [aw,1 I
a a a
= -...
aw,
-...
aw,
-
[ 1
aw,
... &-<bJ
I
a a a a
I1 I
9 Y
vi
X . . .
I v,=--30
g’,*(ddv
g1’*(?J2,...,Tjf)=Jcc
D*f*(z,qi+z,...,~lI+Z)dz,
-cc
The square brackets denote terms to be left out. The argument shown applies to the interior
points of $A(\theta)$. Leaving out one or more integrations yields the compatibility of the derivatives in
definition (24).
E(k)e”-2 1 - 8, E(k)‘”
= exp( -Y,~,,+&) .
xE(j)B, ’ i 6, + xE(j)‘/ ’
:
10<n,<I(l),...,O<n,~I(K)} (26)
are nonnegative at the marginal choice probabilities $Q_1(v), \ldots, Q_K(v)$. Moreover,
the functions $g_{n_1, \ldots, n_K}$ are signed by mixed polynomials $h_{n_1, \ldots, n_K}$ in
$Q_1, \ldots, Q_K$ of degree $I(1) - 1, \ldots, I(K) - 1$.
Proof. The proof is a simple but tedious exercise of evaluating all derivatives
in eq. (24), continuing the proof of Theorem 2. Note that the mixed derivatives
Table 1
Density-signing polynomials for some typical nested multinomial logit models
T(231) h:
T(2.1.1) h:,. h:
T(2.2) h:,, h:,, h:, h:
T(3.1) h:,, h:, h:
T(3.2) h:,, h,,, h:,, h:,, h:, h:,, h;,
T(3,3) h 33 h:;, h::, h:,, h:,, h,,, h:,, h:,, h’j, h:, h;
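\[ h_{33} = h_{33}(Q_1, Q_2, \theta_1, \theta_2) = 120\, Q_1^2 Q_2^2 + 72\, \tau_2 Q_1^2 Q_2 + 72\, \tau_1 Q_1 Q_2^2 + 54\, \tau_1 \tau_2 Q_1 Q_2 \]
\[ \qquad +\, 6\, \tau_1 \sigma_1 Q_2^2 + 6\, \tau_2 \sigma_2 Q_1^2 + 6\, \tau_1 \sigma_1 \tau_2 Q_2 + 6\, \tau_2 \sigma_2 \tau_1 Q_1 + \tau_1 \sigma_1 \tau_2 \sigma_2 \]
for $\tau_k = (\theta_k - 1)/\theta_k$, $\sigma_k = (\theta_k - 2)/\theta_k$.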
6. Conclusions
References
Börsch-Supan, Axel, 1985, Tenure choice and housing demand, in: Konrad Stahl and Raymond
Struyk, eds., U.S. and German housing markets: Comparative economic analyses (Urban
Institute Press, Washington, DC) 55-105.
Cosslett, Stephen R., 1978, Efficient estimation of discrete choice models from choice based
samples, Unpublished Ph.D. dissertation (University of California at Berkeley, CA).
Daly, Andrew and S. Zachary, 1979, Improved multiple choice models, in: David Hensher and
Q. Dalvi, eds., Identifying and measuring the determinants of mode choice (Teakfield,
London) 335-357.
Hensher, David A., 1984, Full information maximum likelihood estimation of a nested logit
mode-choice model, Working paper no. 13, Dimensions of automobile demand project
(Macquarie University, North Ryde).
McFadden, Daniel, A. Talvitie, and Associates, 1977, Demand model estimation and validation:
Urban travel demand forecasting project, Final report (Institute for Transportation Studies,
University of California at Berkeley, CA).
McFadden, Daniel, 1978, Modelling the choice of residential location, in: A. Karlqvist, ed.,
Spatial interaction theory and residential location (North-Holland, Amsterdam) 75-96.
McFadden, Daniel, 1979, Quantitative methods for analyzing travel behavior of individuals: Some
recent developments, in: David Hensher and P. Stopher, eds., Behavioral travel modeling
(Croom Helm, London).
McFadden, Daniel, 1981, Econometric models of probabilistic choice, in: Charles F. Manski and
Daniel McFadden, eds., Structural analysis of discrete data with econometric applications
(MIT Press, Cambridge, MA) 198-272.
Small, Kenneth A. and David Brownstone, 1982, Efficient estimation of nested logit models:
An application to trip timing, Economic research program research memorandum no. 296
(Princeton University, Princeton, NJ).
Williams, H., 1977, On the formation of travel demand models and economic evaluation measures
of user benefit, Environment and Planning A 9.