
Statistical and Econometric Methods for Transportation Engineering (CE687A)
Nested logit model
Aditya Medury
Lecture 20

2022-23, Semester I
IIT Kanpur
Disclaimer

This course material is being distributed as part of CE687A, titled “Statistical and
Econometric Methods for Transportation Engineering”, at IIT Kanpur during
semester I of the academic year 2022-23. Its contents are being shared in
confidence, for the sole purpose of instruction, and are only meant for the
students registered in this course. Any form of distribution, reproduction or
uploading of these materials anywhere, or with anyone, outside this course is
strictly prohibited.

For discrete choice modelling, parts of the discussion are also adapted from
Kenneth Train’s Discrete Choice Methods with Simulation

Multinomial logit (MNL) model

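Probability of choosing alternative $i$ for individual $n$:

$$P_{in} \equiv P_n(i \mid \mathcal{C}) = \frac{e^{V_{in}}}{\sum_j e^{V_{jn}}} = \frac{e^{\boldsymbol{\beta}_i' \mathbf{x}_{in}}}{\sum_j e^{\boldsymbol{\beta}_j' \mathbf{x}_{jn}}}$$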

If the choice (dependent variable) is modelled as follows:

$$y_{in} = \begin{cases} 1, & \text{if alternative } i \text{ is chosen for individual } n \\ 0, & \text{otherwise} \end{cases}$$

• The likelihood function can be written as:

$$L(\boldsymbol{\beta}) = \prod_{n} \prod_{i} P_{in}^{\,y_{in}}, \qquad LL(\boldsymbol{\beta}) = \sum_{n} \sum_{i} y_{in} \log P_{in}$$
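The probability and log-likelihood expressions above can be sketched in a few lines of Python. This is only an illustration: the utilities `V` and choices `y` are made-up values, and the function names are hypothetical rather than taken from the course material.

```python
import math

def mnl_probabilities(utilities):
    """MNL probabilities P_in for one individual, given systematic utilities V_in."""
    v_max = max(utilities)  # subtracting the max avoids overflow; probabilities are unchanged
    exps = [math.exp(v - v_max) for v in utilities]
    total = sum(exps)
    return [e / total for e in exps]

def log_likelihood(utilities_by_person, chosen):
    """LL = sum_n sum_i y_in log P_in; only the chosen alternative (y_in = 1) contributes."""
    ll = 0.0
    for utilities, i in zip(utilities_by_person, chosen):
        ll += math.log(mnl_probabilities(utilities)[i])
    return ll

# Hypothetical systematic utilities: 3 individuals x 3 alternatives
V = [[0.5, 0.0, -0.2], [0.1, 0.3, 0.0], [-0.4, 0.2, 0.6]]
y = [0, 1, 2]  # index of the alternative chosen by each individual

probs = [mnl_probabilities(v) for v in V]
ll = log_likelihood(V, y)
print(probs[0], ll)
```

In estimation, the utilities would be computed from the coefficients and the data, and an optimizer would search for the coefficients that maximize `log_likelihood`.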
Goodness of fit

$$R^2_{Pseudo} = 1 - \frac{\log L_{MLE}}{\log L_0}$$

$$\text{Adjusted-}R^2_{Pseudo} = 1 - \frac{\log L_{MLE} - K}{\log L_0}$$
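A small numerical sketch of the two measures follows. It assumes that $L_0$ is the likelihood with all coefficients set to zero (equal shares across alternatives) and that $K$ is the number of estimated parameters; the slide does not define either symbol, and the fitted log-likelihood and $K$ below are hypothetical values.

```python
import math

def pseudo_r2(ll_mle, ll_0, n_params=0):
    """1 - (LL_MLE - K) / LL_0; with K = 0 this is the unadjusted pseudo R-squared."""
    return 1.0 - (ll_mle - n_params) / ll_0

# Equal-shares benchmark for 151 individuals choosing among 3 alternatives
ll_0 = 151 * math.log(1 / 3)
ll_mle = -120.0  # hypothetical log-likelihood at the estimated coefficients
K = 4            # hypothetical number of estimated parameters

r2 = pseudo_r2(ll_mle, ll_0)
r2_adj = pseudo_r2(ll_mle, ll_0, n_params=K)
print(round(r2, 4), round(r2_adj, 4))
```

Because both log-likelihoods are negative, subtracting $K$ in the numerator lowers the measure, which penalizes models with more parameters.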

Model selection: Likelihood Ratio Test

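• Under the null hypothesis that the restricted model is true, a $\chi^2$ test statistic can be developed as follows:

$$-2\left[LL_R(\boldsymbol{\beta}_R) - LL_F(\boldsymbol{\beta}_F)\right] \sim \chi^2_{df_R - df_F}$$

where $R$ and $F$ denote the restricted and the full (unrestricted) model, and the degrees of freedom of the test equal the number of restrictions imposed.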
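The test can be sketched with the standard library alone. The log-likelihood values and the number of restrictions below are hypothetical, and `chi2_sf` is an illustrative helper (not a library function) that uses the closed-form series for the $\chi^2$ upper tail with integer degrees of freedom.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    chi = math.sqrt(x)
    z = math.exp(-x / 2) / math.sqrt(2 * math.pi)  # standard normal density at chi
    if df % 2 == 0:
        # exp(-x/2) * sum_{r=0}^{df/2 - 1} (x/2)^r / r!
        term, total = 1.0, 1.0
        for r in range(1, df // 2):
            term *= x / (2 * r)
            total += term
        return math.exp(-x / 2) * total
    # odd df: 2*Q(chi) + 2*Z(chi) * sum_{r=1}^{(df-1)/2} chi^(2r-1) / (1*3*...*(2r-1))
    q = 0.5 * math.erfc(chi / math.sqrt(2))  # standard normal upper tail at chi
    term, total = chi, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return 2 * q + 2 * z * total

ll_restricted = -120.0  # hypothetical LL of the restricted model
ll_full = -115.0        # hypothetical LL of the full model
n_restrictions = 2      # hypothetical number of restrictions

lr_stat = -2 * (ll_restricted - ll_full)
p_value = chi2_sf(lr_stat, n_restrictions)
print(lr_stat, p_value)
```

A small p-value leads to rejecting the restricted model in favour of the full model.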

Elasticity

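• Elasticity is defined as the percent change in the outcome probability $P_{in}$ caused by a 1% change in $x_{ink}$:

$$E^{P_{in}}_{x_{ink}} = \frac{\partial P_{in}}{\partial x_{ink}} \times \frac{x_{ink}}{P_{in}} = \frac{\partial P_{in} / P_{in}}{\partial x_{ink} / x_{ink}}$$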

• For multinomial logit models:

$$E^{P_{in}}_{x_{ink}} = \left[1 - P_{in}\right] \beta_{ik} x_{ink}$$
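The MNL result follows from differentiating the logit probability. The sketch below assumes that $x_{ink}$ enters only the utility of alternative $i$, linearly, with coefficient $\beta_{ik}$:

```latex
\begin{align}
\frac{\partial P_{in}}{\partial x_{ink}}
  &= \frac{\beta_{ik} e^{V_{in}}}{\sum_j e^{V_{jn}}}
   - \frac{e^{V_{in}} \, \beta_{ik} e^{V_{in}}}{\left(\sum_j e^{V_{jn}}\right)^2}
   = \beta_{ik} P_{in} \left(1 - P_{in}\right) \\
E^{P_{in}}_{x_{ink}}
  &= \beta_{ik} P_{in} \left(1 - P_{in}\right) \times \frac{x_{ink}}{P_{in}}
   = \left[1 - P_{in}\right] \beta_{ik} x_{ink}
\end{align}
```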

Cross-elasticity

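Cross-elasticity is defined as the percent change in the outcome probability $P_{in}$ caused by a 1% change in $x_{jnk}$, an attribute of another alternative $j \neq i$:

$$E^{P_{in}}_{x_{jnk}} = \frac{\partial P_{in}}{\partial x_{jnk}} \times \frac{x_{jnk}}{P_{in}} = \frac{\partial P_{in} / P_{in}}{\partial x_{jnk} / x_{jnk}} = -P_{jn} \beta_{jk} x_{jnk}$$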
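The direct and cross-elasticity formulas can be checked against numerical derivatives. The sketch below uses a single hypothetical attribute (for example, distance) with one generic coefficient `beta` shared by all alternatives, which is a simplification of the alternative-specific $\beta_{jk}$ in the formulas; all values and function names are illustrative.

```python
import math

def mnl_probabilities(utilities):
    exps = [math.exp(v) for v in utilities]
    total = sum(exps)
    return [e / total for e in exps]

beta = -0.5          # hypothetical coefficient on attribute k
x = [4.0, 5.0, 6.0]  # hypothetical attribute values x_jnk for three alternatives
P = mnl_probabilities([beta * xj for xj in x])

def direct_elasticity(i):
    """Elasticity of P_in with respect to its own attribute x_ink."""
    return (1.0 - P[i]) * beta * x[i]

def cross_elasticity(j):
    """Elasticity of P_in (for any i != j) with respect to x_jnk."""
    return -P[j] * beta * x[j]

def numerical_elasticity(i, j, h=1e-6):
    """Finite-difference elasticity of P_in with respect to x_jnk."""
    x_up = list(x)
    x_up[j] += h
    P_up = mnl_probabilities([beta * xj for xj in x_up])
    return (P_up[i] - P[i]) / h * x[j] / P[i]

print(direct_elasticity(0), numerical_elasticity(0, 0))
print(cross_elasticity(1), numerical_elasticity(0, 1), numerical_elasticity(2, 1))
```

The cross-elasticity with respect to $x_{jnk}$ does not depend on $i$: every other alternative is affected by the same percentage, which is the proportional substitution implied by the MNL model.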

MNL example (route choice, example 13.1, Washington et al.)

• A survey of 151 commuters was conducted in suburban State College, Pennsylvania.


• Information was collected on their route selection on their morning trip from home to work
(all commute by driving personal vehicles).
• Distance was measured precisely from the vehicle parking spot at the trip origin to the
vehicle parking spot at the trip destination, so there is a variance in distances among
commuters even though they departed and arrived in the same general areas.

MNL example (route choice, example 13.1, Washington et al.)

• Commuters had a choice of three alternate routes: a four-lane arterial, a two-lane highway,
and a limited access four-lane freeway.
• Each of these three routes shared some common portions for access and egress because,
for example, the same road to the downtown area is used by both freeway and two-lane
road alternatives since the freeway exits onto the same city street as the two-lane road.

Route choices and explanatory variables

• Route choices:
a. four-lane arterial (speed limit = 60 km/h, 2 lanes each direction)
t. two-lane highway (speed limit = 60 km/h, 1 lane each direction)
f. limited access four-lane freeway (speed limit = 90 km/h, 2 lanes each direction)
• The variables are both individual-specific and alternative-specific
Image source: Washington, S., Karlaftis, M. G., Mannering, F., & Anastasopoulos, P. (2020). Statistical and econometric methods for transportation data analysis. CRC Press.
MNL output

Image source: Washington et al. (2020).
Should effect of distance vary across routes?

Image source: Washington et al. (2020).
How about elasticities for distance variable?

Image source: Washington et al. (2020).
Nested logit model

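• The IIA (independence of irrelevant alternatives) property of the MNL model implies proportional substitution across all alternatives.
• In practice, however, a change in the attributes of one alternative may draw choices disproportionately from some alternatives (its closer substitutes) rather than proportionally from all of them.
• The nested logit model uses a tree-like hierarchical structure, grouping similar alternatives into nests, to represent this scenario.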

Image source: Washington et al. (2020).
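Proportional substitution under IIA can be illustrated with a small MNL sketch. The mode names and utilities below are hypothetical: when one alternative is removed, the probabilities of all remaining alternatives grow by the same factor.

```python
import math

def mnl_probabilities(utilities):
    """MNL probabilities from a dict {alternative: systematic utility}."""
    total = sum(math.exp(v) for v in utilities.values())
    return {alt: math.exp(v) / total for alt, v in utilities.items()}

# Hypothetical systematic utilities for four modes
V = {"auto": 1.0, "carpool": -0.4, "bus": 0.7, "rail": 0.3}

before = mnl_probabilities(V)
after = mnl_probabilities({alt: v for alt, v in V.items() if alt != "rail"})

# Growth factor of each remaining alternative after rail is removed
growth = {alt: after[alt] / before[alt] for alt in after}
print(growth)
```

Every growth factor equals $1 / (1 - P_{rail})$, regardless of how similar each mode is to rail. The nested logit model relaxes exactly this pattern.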
Nested logit model: substitution patterns

• For any two alternatives that are in the same nest, the ratio of probabilities is
independent of the attributes or existence of all other alternatives. That is, IIA holds
within each nest.
• For any two alternatives in different nests, the ratio of probabilities can depend on the
attributes of other alternatives in the two nests. IIA does not hold in general for
alternatives in different nests.

Image source: Washington et al. (2020).
Example of substitution patterns

• Consider a mode choice scenario with a given nested structure.
• If a choice alternative is removed, there is proportional substitution across twigs within a branch but not across branches.
• $P(\text{Bus}) / P(\text{Rail}) = 1.5$
• $P(\text{Auto}) / P(\text{Carpool}) = 4$

Image source: Train, K. E. (2009). Discrete choice methods with simulation. Cambridge university press.
Nested logit model: probability estimation

Let the utility function for a given choice be written as:

$$U_{in} = W_{kn} + Y_{in} + \epsilon_{in}$$

• $W_{kn}$ depends only on variables that describe nest $k$. These variables differ over nests but not over alternatives within each nest.
• $Y_{in}$ depends on variables that describe alternative $i \in B_k$. These variables vary over the alternatives within nest $k$.

Let the probability of choosing $i$ be decomposed as follows:

$$P_{in} = P_{in \mid B_k} \times P_{B_k n}$$

Nested logit model: probability estimation

$$P_{in} = P_{in \mid B_k} \times P_{B_k n}$$

• $P_{in \mid B_k}$ is the conditional probability of choosing alternative $i \in B_k$ given that an alternative in nest $B_k$ is chosen.
• $P_{B_k n}$ is the marginal probability of choosing an alternative in nest $B_k$ (with the marginality being over all alternatives in $B_k$).
• This equality is exact, since any probability can be written as the product of a
marginal and a conditional probability.

Nested logit model: probability estimation

$$P_{in} = P_{in \mid B_k} \times P_{B_k n}$$

$$P_{in \mid B_k} = \frac{e^{Y_{in}/\lambda_k}}{\sum_{j \in B_k} e^{Y_{jn}/\lambda_k}}$$

$$P_{B_k n} = \frac{e^{W_{kn} + \lambda_k I_{kn}}}{\sum_{l=1}^{K} e^{W_{ln} + \lambda_l I_{ln}}}$$

• $I_{kn} = \log \sum_{j \in B_k} e^{Y_{jn}/\lambda_k}$ is referred to as the inclusive value (or logsum).
• It helps capture the expected utility that decision maker $n$ receives from the choice alternatives in nest $B_k$.
• If $\lambda_k$ is between zero and one for all $k$, the model is consistent with utility maximization for all possible values of the explanatory variables.
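The conditional, marginal and inclusive-value expressions can be combined into a short function. This is a minimal sketch: the nest structure, the utility components `Y` and `W`, and the values of $\lambda_k$ are all hypothetical, and the function name is illustrative.

```python
import math

def nested_logit_probabilities(Y, W, lam, nests):
    """
    Y:     {alternative: Y_in}      alternative-level utility component
    W:     {nest: W_kn}             nest-level utility component
    lam:   {nest: lambda_k}         nest parameter
    nests: {nest: [alternatives]}   nest membership B_k
    Returns {alternative: P_in}.
    """
    # Inclusive value I_kn = log sum_{j in B_k} exp(Y_jn / lambda_k)
    I = {k: math.log(sum(math.exp(Y[j] / lam[k]) for j in alts))
         for k, alts in nests.items()}
    denom = sum(math.exp(W[k] + lam[k] * I[k]) for k in nests)
    P = {}
    for k, alts in nests.items():
        p_nest = math.exp(W[k] + lam[k] * I[k]) / denom       # P_{B_k n}
        for i in alts:
            p_cond = math.exp(Y[i] / lam[k]) / math.exp(I[k])  # P_{in | B_k}
            P[i] = p_cond * p_nest
    return P

# Hypothetical mode choice setting with two nests
nests = {"auto": ["auto alone", "carpool"], "transit": ["bus", "rail"]}
Y = {"auto alone": 0.6, "carpool": -0.1, "bus": 0.3, "rail": 0.1}
W = {"auto": 0.0, "transit": 0.0}
lam = {"auto": 0.5, "transit": 0.5}

P = nested_logit_probabilities(Y, W, lam, nests)

# Remove rail from the transit nest and recompute
nests_no_rail = {"auto": ["auto alone", "carpool"], "transit": ["bus"]}
P_no_rail = nested_logit_probabilities(Y, W, lam, nests_no_rail)

for alt in P_no_rail:
    print(alt, round(P[alt], 4), round(P_no_rail[alt], 4))
```

With these values, removing rail leaves the ratio of auto alone to carpool unchanged, while bus (in the same nest as rail) gains proportionally more than the two auto alternatives.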
Nested logit model contains MNL as a special case

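$$P_{in} = P_{in \mid B_k} \times P_{B_k n}$$

$$P_{in \mid B_k} = \frac{e^{Y_{in}/\lambda_k}}{\sum_{j \in B_k} e^{Y_{jn}/\lambda_k}}, \qquad P_{B_k n} = \frac{e^{W_{kn} + \lambda_k I_{kn}}}{\sum_{l=1}^{K} e^{W_{ln} + \lambda_l I_{ln}}}$$

• $I_{kn} = \log \sum_{j \in B_k} e^{Y_{jn}/\lambda_k}$ is referred to as the inclusive value (or logsum).
• Setting $\lambda_k = 1$ for all $k$ reduces the nested logit to MNL, so the restriction $\lambda_k = 1$ can be examined with a hypothesis test.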
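The reduction to MNL can be verified by substituting $\lambda_k = 1$ into the two component probabilities, using $e^{I_{kn}} = \sum_{j \in B_k} e^{Y_{jn}}$:

```latex
\begin{align}
P_{in} &= \frac{e^{Y_{in}}}{\sum_{j \in B_k} e^{Y_{jn}}}
          \times \frac{e^{W_{kn}} \sum_{j \in B_k} e^{Y_{jn}}}
                      {\sum_{l=1}^{K} e^{W_{ln}} \sum_{j \in B_l} e^{Y_{jn}}} \\
       &= \frac{e^{W_{kn} + Y_{in}}}{\sum_{l=1}^{K} \sum_{j \in B_l} e^{W_{ln} + Y_{jn}}}
\end{align}
```

The result is the MNL probability with systematic utility $V_{in} = W_{kn} + Y_{in}$, and the nest structure no longer matters.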

Nested logit model estimation

$$P_{in} = P_{in \mid B_k} \times P_{B_k n}$$

$$P_{in \mid B_k} = \frac{e^{Y_{in}/\lambda_k}}{\sum_{j \in B_k} e^{Y_{jn}/\lambda_k}}, \qquad P_{B_k n} = \frac{e^{W_{kn} + \lambda_k I_{kn}}}{\sum_{l=1}^{K} e^{W_{ln} + \lambda_l I_{ln}}}$$

• The model can be estimated by maximum likelihood, simultaneously optimizing over the parameters in the lower and upper models (full information maximum likelihood).
• A sequential approach can also be used, wherein MNL models are first estimated for each nest, and then the upper model containing the inclusive values is estimated.
• However, the standard errors are expected to be biased towards zero under the sequential approach → the coefficients thus appear more significant than they are.
• The variances of the error terms also vary across nests due to differential scaling.
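The two estimation approaches can be related through the log-likelihood. Writing $\boldsymbol{\theta}$ (a symbol introduced here for illustration) for all parameters of the lower and upper models, the decomposition $P_{in} = P_{in \mid B_k} \times P_{B_k n}$ gives:

```latex
\begin{align}
LL(\boldsymbol{\theta})
  &= \sum_{n} \sum_{k=1}^{K} \sum_{i \in B_k} y_{in}
     \log\left(P_{in \mid B_k} \, P_{B_k n}\right) \\
  &= \underbrace{\sum_{n} \sum_{k=1}^{K} \sum_{i \in B_k} y_{in} \log P_{in \mid B_k}}_{\text{lower model}}
   + \underbrace{\sum_{n} \sum_{k=1}^{K} \Big(\sum_{i \in B_k} y_{in}\Big) \log P_{B_k n}}_{\text{upper model}}
\end{align}
```

Full information maximum likelihood maximizes the sum of the two terms jointly. The sequential approach maximizes the lower-model term first and then treats the resulting inclusive values as known regressors in the upper-model term.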
Nested logit
(Example 13.4, Washington et al.)

• To illustrate the estimation of a nested logit model using full-information maximum likelihood, consider an example of a model of motorcyclists’ injury severity.
• The data consist of 2273 single-vehicle motorcycle accidents in the state of Indiana.
• There are four possible severity outcomes:
  • no-injury (property damage only and possible injury)
  • non-incapacitating injury
  • incapacitating injury
  • fatality

Image source: Washington et al. (2020).
Model output

Testing for whether the nesting assumption can be rejected

Image source: Washington et al. (2020).
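One common way to carry out such a test is a t-test of the null hypothesis $\lambda_k = 1$ (under which the nest collapses to MNL). The sketch below uses a hypothetical estimate and standard error, not the values from the model output, and relies on the large-sample normal approximation for the p-value.

```python
import math

def t_test_against_one(lam_hat, std_err):
    """t-statistic and two-sided p-value (normal approximation) for H0: lambda_k = 1."""
    t_stat = (lam_hat - 1.0) / std_err
    p_value = math.erfc(abs(t_stat) / math.sqrt(2))  # equals 2 * (1 - Phi(|t|))
    return t_stat, p_value

# Hypothetical inclusive value parameter estimate and its standard error
t_stat, p_value = t_test_against_one(lam_hat=0.6, std_err=0.15)
print(round(t_stat, 3), round(p_value, 4))
```

Note that the test is against one, not zero: an estimate significantly below one supports the nested structure, while failing to reject $\lambda_k = 1$ suggests the simpler MNL model is adequate.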
Other discrete choice alternatives

• Chapter 4 of Train (2009) also discusses other models related to the nested logit, which can be described collectively as generalized extreme value (GEV) models.
• Section 4.2.3 of Train (2009) also contains a brief discussion of how the probability formulae used for nested logit estimation differ across texts, specifically whether or not the utilities in the lower model are divided by $\lambda_k$.

Comments
Discussion
Questions

E-mail: amedury@iitk.ac.in
