GY460 Techniques of Spatial Analysis: Steve Gibbons

GY460 Techniques of Spatial Analysis
Lecture 6: Probabilistic choice models
Steve Gibbons
Introduction
• Sometimes useful to model the choices of individuals, firms, or other agents over discrete alternatives
– Choice of transport mode
– Choice of firm location amongst regions
– Choice of cities or country to migrate to
• Theoretical framework
– Random utility model
• Empirical methods:
– Micro: Probit, logit, multinomial logit
– Aggregate: Poisson, OLS, gravity

The “Random Utility” choice model
Random Utility Model
• RUM underlies economic interpretation of discrete choice models. Developed by Daniel McFadden for econometric applications
– see JoEL January 2001 for Nobel lecture; also Manski (2001) Daniel McFadden and the Econometric Analysis of Discrete Choice, Scandinavian Journal of Economics, 103(2), 217-229
• Preferences are functions of biological taste templates, experiences, other personal characteristics
– Some of these are observed, others unobserved
– Allows for taste heterogeneity
• Discussion below is in terms of individual utility (e.g. migration, transport mode choice) but similar reasoning applies to firm choices
• Individual i’s utility from a choice j can be decomposed into two components:
$U_{ij} = V_{ij} + \varepsilon_{ij}$
• $V_{ij}$ is deterministic – common to everyone, given the same characteristics and constraints
– representative tastes of the population e.g. effects of time and cost on travel mode choice
• $\varepsilon_{ij}$ is random
– reflects idiosyncratic tastes of i and unobserved attributes of choice j
• $V_{ij}$ is a function of attributes of alternative j (e.g. price and time) and observed consumer and choice characteristics.
$V_{ij} = \beta t_{ij} + \gamma p_{ij} + \delta z_{ij}$
• We are interested in finding $\beta$, $\gamma$, $\delta$
• Let’s forget about z now for simplicity

RUM and binary choices
• Consider two choices e.g. bus or car
• We observe whether an individual uses one or the other
• Define
$y_i = 1$ if i chooses bus
$y_i = 0$ if i chooses car
• What is the probability that we observe an individual choosing to travel by bus?
• Assume utility maximisation
• Individual chooses bus (y=1) rather than car (y=0) if utility of commuting by bus exceeds utility of commuting by car
RUM and binary choices
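• So choose bus if $U_{i1} > U_{i0}$
$V_{i1} + \varepsilon_{i1} > V_{i0} + \varepsilon_{i0}$
$\varepsilon_{i1} - \varepsilon_{i0} > -(V_{i1} - V_{i0})$
• So the probability that we observe an individual choosing bus travel is
$\mathrm{Prob}\left[\varepsilon_{i1} - \varepsilon_{i0} > -(V_{i1} - V_{i0})\right]$
$= \mathrm{Prob}\left[\varepsilon_{i1} - \varepsilon_{i0} > -\beta(t_{i1} - t_{i0}) - \gamma(p_{i1} - p_{i0})\right]$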
The linear probability model
• Assume probability depends linearly on observed characteristics (price and time)
$\mathrm{Prob}(i \text{ chooses bus}) = \beta(t_{i1} - t_{i0}) + \gamma(p_{i1} - p_{i0})$
• Then you can estimate by linear regression
$y_{i1} = \beta(t_{i1} - t_{i0}) + \gamma(p_{i1} - p_{i0}) + \varepsilon_{i1}$
• Where $y_{i1}$ is the “dummy variable” for mode choice (1 if bus, 0 if car)
• Other consumer and choice characteristics can be included (the zs in the first slide in this section)
The linear probability model
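• Unfortunately this has some undesirable properties
– e.g. the fitted regression line is not confined to the range 0 to 1
[Figure: Prob(bus) on the vertical axis (0 to 1) against $V_i$ on the horizontal axis, with the linear regression line]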
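As a rough illustration of the problem, the sketch below fits a linear probability model by OLS to a small made-up data set. The values of `v` and `y` are hypothetical, and an intercept is included for convenience even though the slide's equation omits one. The fitted "probabilities" at the extremes fall outside the unit interval.

```python
# Hypothetical data (not from the lecture): v is the observed utility
# advantage of bus over car, y = 1 if the commuter takes the bus, 0 if car.
v = [-3, -2, -1, 0, 1, 2, 3]
y = [0, 0, 0, 0, 1, 1, 1]

n = len(v)
v_bar = sum(v) / n
y_bar = sum(y) / n

# OLS slope and intercept for the linear probability model y = a + b*v + e
slope = sum((vi - v_bar) * (yi - y_bar) for vi, yi in zip(v, y)) / sum(
    (vi - v_bar) ** 2 for vi in v
)
intercept = y_bar - slope * v_bar

# Fitted values are read as Prob(bus), but nothing keeps them in [0, 1]
fitted = [intercept + slope * vi for vi in v]
for vi, pi in zip(v, fitted):
    flag = "" if 0.0 <= pi <= 1.0 else "  <- outside [0, 1]"
    print(f"v = {vi:+d}  fitted Prob(bus) = {pi:.3f}{flag}")
```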
Non-linear probability model
• Better for probability function to have a shape something like:
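[Figure: Prob(bus) on the vertical axis against $V_i$ on the horizontal axis, an S-shaped curve bounded between 0 and 1]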
Probits and logits
• Common assumptions:
– Cumulative normal distribution function – “Probit”
– Logistic function – “Logit”
$\mathrm{Prob}(i \text{ chooses bus}) = \dfrac{\exp(V_i)}{1 + \exp(V_i)}$
• Estimation by maximum likelihood
$\mathrm{Prob}(y_i = 1) = F(x_i'\beta)$
$\mathrm{Prob}(y_i = 0) = 1 - F(x_i'\beta)$
$\ln L = \sum_{i=1}^{n} \left\{ y_i \ln F(x_i'\beta) + (1 - y_i) \ln\left[1 - F(x_i'\beta)\right] \right\}$
Example
• McFadden, D. (1974) The Measurement of Urban Travel Demand, Journal of Public Economics, 3
• Methods of commuting in San Francisco Bay area

Example 1
McFadden (1974) car versus bus commute modes in SF Bay area
Characteristic                                           Coefficient   (t)
Family income, $                                            0.000095   (0.774)
Car-bus cost, cents per round trip                         -0.01022*   (3.726)
Car-bus vehicle time costs (one way minutes x wage)        -0.01479    (2.460)
Bus total access time costs (one way minutes x wage)       -0.00314    (0.818)
Constant                                                    0.3832     (0.428)
Multiple choices and the “multinomial logit”
Multiple choices
• We often want to think about many more than two choices

– Choice of regional location
– Choice of transport mode with many alternatives
– Choice amongst a sample of schools
• How can we extend the binary choice logit model?
• Random Utility model extends to many choices
$U_{ij} = V_{ij} + \varepsilon_{ij}$
• Choose choice k if utility higher than for all other choices
$V_{ik} + \varepsilon_{ik} > V_{ij} + \varepsilon_{ij}$ for all $j \neq k$

Multinomial logit (1)
• Again we need to assume some distribution for the unobserved factor $\varepsilon$
• One type of distribution (extreme value) gives a simple solution for the probability that choice k is made:
$\mathrm{Prob}(i \text{ chooses } k) = \dfrac{\exp(V_{ik})}{\sum_j \exp(V_{ij})}$
• This is a generalisation of the logit model with many alternatives = “multinomial logit” or “conditional logit”
$\ln L = \sum_{j=1}^{J} \sum_{i=1}^{n} y_{ij} \ln \mathrm{Prob}(i \text{ chooses } j)$
Multinomial logit (2)
• Recall: $V_{ij}$ is a linear function of observed characteristics of the individuals and their choices, e.g. for travel mode choice
$V_{ij} = \beta t_{ij} + \gamma p_{ij} + \delta_j z_{ij}$
• Parameters estimated:
• For an individual characteristic that is common across choices
(e.g. income, gender): one parameter per choice
– For at least one choice this is zero (base case).
• For a characteristic which varies only across choices e.g. price of transport: one parameter common across choices
Example: Value of time
• MNL models are used to estimate the “value of travel time” from observed commuter behaviour
• Three transport choices: bus (0), train (1), car (2)
• Choosing bus as the base case:
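$V_{i1} = \gamma(\text{price}_1 - \text{price}_0) + \beta(\text{time}_1 - \text{time}_0) + \text{sex}_i(\delta_1 - \delta_0) + \text{companycar}_i(\theta_1 - \theta_0)$
$V_{i2} = \gamma(\text{price}_2 - \text{price}_0) + \beta(\text{time}_2 - \text{time}_0) + \text{sex}_i(\delta_2 - \delta_0) + \text{companycar}_i(\theta_2 - \theta_0)$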
Example 1: Value of time
• For example, from Truong and Hensher, Economic Journal, 95 (1985) p. 15 for bus/train/car choices in Sydney 1982
Example 2: immigration
• Scott, Coomes and Izyumov (2005) The Location Choice of Employment-Based Immigrants among U.S. Metro Areas, Journal of Regional Science 45(1) 113-145
• Estimate the impact of metropolitan area characteristics on
destination choice for US migrants in 1995
• 298 destination MSAs
Example 2: immigration
Source: Scott, Coomes et al (note: they also report models which include individual Xs)
The independence of irrelevant alternatives problem (IIA)
and the nested logit model
Multinomial logit and “IIA”
• Many applications in economic and geographical journals (and other research areas)
• The multinomial logit model is the workhorse of multiple choice modelling in all disciplines. Easy to compute
• But it has a drawback

Independence of Irrelevant Alternatives
• Consider market shares
– Red bus 20%
– Blue bus 20%
– Train 60%
• IIA assumes that if red bus company shuts down, the market shares become
– Blue bus 20% + 5% = 25%
– Train 60% + 15% = 75%
• Because the ratio of blue bus trips to train trips must stay at 1:3
• Model assumes that ‘unobserved’ attributes of all alternatives are perceived as equally similar
• But will people unable to travel by red bus really switch to travelling by train?
• Most likely outcome is (assuming supply of bus seats is elastic)

– Blue bus: 40%
– Train: 60%
• This failure of multinomial/conditional logit models is called the Independence of Irrelevant Alternatives assumption (IIA)
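The red bus/blue bus arithmetic can be checked with a short sketch. It assumes, purely for illustration, that each alternative's deterministic utility equals the log of its initial market share, which reproduces the 20/20/60 split under the multinomial logit formula; the function name `mnl_shares` is hypothetical.

```python
import math


def mnl_shares(v):
    """Multinomial logit shares exp(V_k) / sum_j exp(V_j) for a dict of utilities."""
    total = sum(math.exp(x) for x in v.values())
    return {k: math.exp(x) / total for k, x in v.items()}


# Illustrative utilities chosen so that the initial shares are 20%, 20%, 60%
v = {
    "red bus": math.log(0.2),
    "blue bus": math.log(0.2),
    "train": math.log(0.6),
}

before = mnl_shares(v)

# Red bus company shuts down: drop it from the choice set, utilities unchanged
after = mnl_shares({k: x for k, x in v.items() if k != "red bus"})

for mode in ("blue bus", "train"):
    print(f"{mode}: {before[mode]:.0%} -> {after[mode]:.0%}")

# The blue bus to train ratio is unchanged by the removal
print(before["blue bus"] / before["train"], after["blue bus"] / after["train"])
```

The model reallocates the red bus share in proportion to the remaining shares, which is why blue bus ends at 25% rather than the more plausible 40%.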

• It is easy to see why this is:
• Ratio of probabilities of choosing k (e.g. red bus) and another choice l (e.g. train) is just
$\dfrac{\exp(V_{ik})}{\exp(V_{il})}$
• All other choices drop out of this odds ratio
• There are models that overcome this, e.g…

Nested Logit Model
• Multinomial logit model can be generalised to relax IIA assumption

– Nested Logit (Nested Multinomial Logit)
[Tree diagram: top level, Car (1) versus Public transport (2); within Public transport, Bus (3) versus Train (4)]
• Characteristics of Bus and Train affect decision of whether to use Car or Public Transport
• Estimate by sequential logits…
Nested Logit Model
• Value placed on choices available in second stage (3,4) enters into calculation of choice probabilities in first stage (2)…
• Logit for bus versus train to estimate $V_3$ and $V_4$
• Define the ‘Inclusive Value’ of public transport as
$I_2 = \ln\left[\exp(V_3) + \exp(V_4)\right]$
• Estimate logit model for Car (1) versus Public (2) using:
$\mathrm{Prob}(\text{Public}) = \dfrac{\exp(V_2 + I_2)}{\exp(V_2 + I_2) + \exp(V_1)}$
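The two-stage calculation can be sketched as follows, taking the utilities as given rather than estimated. The function name `nested_logit_probs` and the numerical values of the utilities are hypothetical; the formulas are the inclusive value and first-stage logit written above.

```python
import math


def nested_logit_probs(v1, v2, v3, v4):
    """Choice probabilities for Car (1) versus Public (2), with Bus (3) and Train (4) nested in Public."""
    # Second stage: logit for bus versus train, conditional on public transport
    p_bus_given_public = math.exp(v3) / (math.exp(v3) + math.exp(v4))

    # Inclusive value of the public transport nest
    i2 = math.log(math.exp(v3) + math.exp(v4))

    # First stage: logit for car versus public, with the inclusive value added to V2
    p_public = math.exp(v2 + i2) / (math.exp(v2 + i2) + math.exp(v1))

    return {
        "car": 1.0 - p_public,
        "bus": p_public * p_bus_given_public,
        "train": p_public * (1.0 - p_bus_given_public),
        "inclusive_value": i2,
    }


# Hypothetical utilities: all zero, so every option is equally attractive
equal = nested_logit_probs(0.0, 0.0, 0.0, 0.0)
print(equal)

# Hypothetical utilities: improving the train raises the inclusive value
# and therefore the first-stage probability of choosing public transport
better_train = nested_logit_probs(0.0, 0.0, 0.0, 1.0)
print(better_train)
```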
Example: Transport mode choice
• Asensio, J., Transport Mode Choice by Commuters to Barcelona’s CBD, Urban Studies, 39(10), 2002
• Travel mode for suburban commuters
• Sample of 1381 commuters from a travel survey
• Records mode of transport and other individual characteristics
[Tree diagram: Private car versus Public transport; within Public transport, Train versus Bus]
Example: Transport mode choice
• Asensio, J., Transport Mode Choice by Commuters to Barcelona’s CBD, Urban Studies, 39(10), 2002
– Some selected coefficients
Variable                           Parameter
Cost                                  -0.002
Travel time by car                    -0.054
Travel time by public transport       -0.018
Sex (car)                              0.889
Sex (bus)                             -1.001

• We don’t know the units of measurement, but how much more valuable is time saved by car than time saved by public transport?
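One way to approach the question is to take the ratio of each time coefficient to the cost coefficient, which gives the implied value of a unit of time in cost units. The sketch below does this arithmetic with the selected coefficients from the table; the variable names are hypothetical, and the units remain unknown, as the slide notes.

```python
# Selected coefficients from the table above
cost = -0.002
time_car = -0.054
time_public = -0.018

# Implied value of time: utility lost per unit of time divided by
# utility lost per unit of cost, i.e. cost units per unit of time
value_of_time_car = time_car / cost
value_of_time_public = time_public / cost

# Relative valuation: does not depend on the cost coefficient at all
relative_value = time_car / time_public

print(f"Value of time, car:              {value_of_time_car:.1f} cost units per time unit")
print(f"Value of time, public transport: {value_of_time_public:.1f} cost units per time unit")
print(f"Car time savings relative to public transport: {relative_value:.1f}x")
```

On these coefficients, 0.054/0.002 = 27 and 0.018/0.002 = 9, so a unit of time saved by car is valued at three times a unit saved by public transport.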
Other discrete choice applications
• Firm location choices, e.g. Head, K. and T. Mayer (2004), Market Potential and the Location of Japanese Investment in the European Union, Review of Economics and Statistics, 86(4), 959-972 (seminar reading)
• School choice, e.g. Barro, L. (2002) School choice through relocation: evidence from the Washington, D.C. area, Journal of Public Economics, 86, p. 155-189
• Migration destinations
• Residential choice
Aggregate choice models
Micro and aggregated choice models
• Micro level logit choice models often have aggregated equivalents
$\mathrm{Prob}(i \text{ chooses } k) = \dfrac{\exp(V_k)}{\sum_j \exp(V_j)}$
$\ln \mathrm{Prob}(i \text{ chooses } k) = V_k - \ln \sum_j \exp(V_j)$
$\ln(n_k / N) = \alpha + x_k'\beta + \varepsilon_k$
• i.e. if you only have choice characteristics, you could use a choice-level
regression of the proportion of individuals making each choice on the
choice characteristics
• Obviously log(n_k) would work too (why?)

Micro and aggregated choice models
• In fact, a Poisson model on aggregated data gives exactly the same coefficient estimates as the conditional logit model
• Which is based on ML estimation of
$\mathrm{Prob}(\text{number choosing } k = n_k) = \dfrac{\exp(-\lambda_k)\, \lambda_k^{n_k}}{n_k!}$
$\ln \lambda_k = \alpha + x_k'\beta$
• See Guimaraes et al Restats (2003)
– though this equivalence was known before this ‘discovery’
• Here’s an example…
Data (295 i’s 3 j’s)
id choice d x
1 American 0 18.97627
1 Japan 0 7.542373
1 Europe 1 3.461017
2 American 1 18.97627
2 Japan 0 7.542373
2 Europe 0 3.461017
3 American 1 18.97627
3 Japan 0 7.542373
3 Europe 0 3.461017
4 American 0 18.97627
4 Japan 1 7.542373
4 Europe 0 3.461017
5 American 1 18.97627
5 Japan 0 7.542373
5 Europe 0 3.461017
Conditional logit
Conditional (fixed-effects) logistic regression   Number of obs   =        885
                                                  LR chi2(1)      =     129.65
                                                  Prob > chi2     =     0.0000
Log likelihood = -259.26785                       Pseudo R2       =     0.2000
------------------------------------------------------------------------------
      choice |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   .0999331   .0091997    10.86   0.000      .081902    .1179642
------------------------------------------------------------------------------
Simpler data
choice n x p
American 192 18.97627 0.650847
Japan 64 7.542373 0.216949
Europe 39 3.461017 0.132203

Poisson
Poisson regression                                Number of obs   =          3
                                                  LR chi2(1)      =     129.65
                                                  Prob > chi2     =     0.0000
Log likelihood = -9.3973119                       Pseudo R2       =     0.8734
------------------------------------------------------------------------------
           n |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   .0999331   .0091997    10.86   0.000      .081902    .1179642
       _cons |   3.364614   .1450806    23.19   0.000     3.080262    3.648967
------------------------------------------------------------------------------
OLS
. reg lnp x
      Source |       SS       df       MS              Number of obs =       3
-------------+------------------------------           F(  1,     1) =  370.23
       Model |  1.32738687     1  1.32738687           Prob > F      =  0.0331
    Residual |  .003585331     1  .003585331           R-squared     =  0.9973
-------------+------------------------------           Adj R-squared =  0.9946
       Total |   1.3309722     2  .665486102           Root MSE      =  .05988
------------------------------------------------------------------------------
         lnp |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |    .101293   .0052644    19.24   0.033      .034403     .168183
       _cons |  -2.339238     .06295   -37.16   0.017    -3.139094   -1.539383
------------------------------------------------------------------------------
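The three coefficient estimates can be reproduced from the aggregated counts alone. The sketch below is an illustration in plain Python, not the Stata routines used for the output above: it maximises the conditional logit and Poisson likelihoods by Newton's method and computes the OLS slope of ln(n_k/N) on x. The function names are hypothetical; the counts and x values are those in the "Simpler data" table.

```python
import math

# Aggregated data from the "Simpler data" table: American, Japan, Europe
n = [192, 64, 39]
x = [18.97627, 7.542373, 3.461017]
N = sum(n)


def clogit_beta(n, x, iters=50):
    """Newton's method for the one-parameter conditional logit on grouped counts."""
    beta, total = 0.0, sum(n)
    for _ in range(iters):
        w = [math.exp(beta * xk) for xk in x]
        p = [wk / sum(w) for wk in w]
        mean_x = sum(pk * xk for pk, xk in zip(p, x))
        var_x = sum(pk * (xk - mean_x) ** 2 for pk, xk in zip(p, x))
        score = sum(nk * xk for nk, xk in zip(n, x)) - total * mean_x
        beta += score / (total * var_x)
    return beta


def poisson_fit(n, x, iters=50):
    """Newton's method for Poisson regression ln(lambda_k) = alpha + beta * x_k."""
    alpha, beta = math.log(sum(n) / len(n)), 0.0
    for _ in range(iters):
        lam = [math.exp(alpha + beta * xk) for xk in x]
        g0 = sum(nk - lk for nk, lk in zip(n, lam))                   # score, alpha
        g1 = sum((nk - lk) * xk for nk, lk, xk in zip(n, lam, x))     # score, beta
        h00 = sum(lam)                                                # information matrix
        h01 = sum(lk * xk for lk, xk in zip(lam, x))
        h11 = sum(lk * xk * xk for lk, xk in zip(lam, x))
        det = h00 * h11 - h01 * h01
        alpha += (h11 * g0 - h01 * g1) / det
        beta += (h00 * g1 - h01 * g0) / det
    return alpha, beta


def ols(x, y):
    """Intercept and slope of a simple regression of y on x."""
    xb, yb = sum(x) / len(x), sum(y) / len(y)
    slope = sum((a - xb) * (b - yb) for a, b in zip(x, y)) / sum((a - xb) ** 2 for a in x)
    return yb - slope * xb, slope


b_clogit = clogit_beta(n, x)
a_pois, b_pois = poisson_fit(n, x)
a_ols, b_ols = ols(x, [math.log(nk / N) for nk in n])

print(f"conditional logit beta: {b_clogit:.7f}")
print(f"Poisson beta:           {b_pois:.7f}  (constant {a_pois:.6f})")
print(f"OLS beta:               {b_ols:.6f}  (constant {a_ols:.6f})")
```

The conditional logit and Poisson slopes coincide because, once the Poisson constant is concentrated out, the two first-order conditions for the slope are the same equation. The OLS slope is close but not identical.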
Aggregate v micro choice models
• Hence, there’s little point in using conditional logit if you only have
choice-characteristics
• Conditional/multinomial logit is good if you have individual and group-level characteristics
• The aggregated OLS version gives rise to “Spatial interaction” models of flows between origins and destinations
• = Gravity models
• Widely applied (generally a-theoretically) in migration, trade and commuting applications
– e.g. See Head (2003) Gravity for beginners
Gravity/spatial interaction/migration/trade models
• Flow from place j to place k modelled as
$\ln(n_{jk}) = x_{jk}'\beta + \alpha_j + \delta_k + \varepsilon_{jk}$
• Typically characteristics of destination and source include some measure of “attraction”, e.g. population mass (or “market potential” in trade models), wages (endogenous)
• And a measure of the cost of moving between place j and place k (e.g. log distance)
$\ln(n_{jk}) = \rho \ln d_{jk} + x_{jk}'\beta + \alpha_j + \delta_k + \varepsilon_{jk}$
• Hence gravity – after Newton
$\ln(\text{Force}_{jk}) = \ln G + \ln \text{mass}_j + \ln \text{mass}_k - 2 \ln \text{dist}_{jk}$
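To see why the log-linear form is called a gravity model, the sketch below generates "flows" exactly from Newton's law for a handful of made-up places and recovers the distance elasticity of -2 by OLS. The masses, distances, and the constant `G` are hypothetical illustration values, and there is no error term, so the fit is exact; real flow data would of course be noisy.

```python
import math

G = 1.0  # arbitrary constant for illustration

# Hypothetical "masses" (e.g. populations) and pairwise distances
mass = {"A": 5.0, "B": 8.0, "C": 2.0, "D": 12.0}
dist = {
    ("A", "B"): 10.0,
    ("A", "C"): 25.0,
    ("A", "D"): 40.0,
    ("B", "C"): 15.0,
    ("B", "D"): 60.0,
    ("C", "D"): 5.0,
}

# Build the regression data: ln(Force) net of the two mass terms, against ln(distance)
log_d, log_f_adj = [], []
for (j, k), d in dist.items():
    force = G * mass[j] * mass[k] / d ** 2
    log_d.append(math.log(d))
    log_f_adj.append(math.log(force) - math.log(mass[j]) - math.log(mass[k]))

# Simple OLS of adjusted log force on log distance
d_bar = sum(log_d) / len(log_d)
f_bar = sum(log_f_adj) / len(log_f_adj)
elasticity = sum((a - d_bar) * (b - f_bar) for a, b in zip(log_d, log_f_adj)) / sum(
    (a - d_bar) ** 2 for a in log_d
)
constant = f_bar - elasticity * d_bar

print(f"distance elasticity = {elasticity:.4f}")
print(f"constant (ln G)     = {constant:.4f}")
```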

Gravity/spatial interaction/migration/trade models
• Strong distance decay effects

– Typical elasticities -0.5 to -2.0
• Even for internet site visits!: see Blum and Goldfarb (2006) Journal of
International Economics
• Trade literature has many examples
• Disdier and Head (2003) The Puzzling Persistence Of The Distance Effect On Bilateral Trade, Review of Economics and Statistics
– Finds mean distance elasticity of -0.9 from about 1500 studies
Conclusion
• Generally possible to model ‘choices’ as discrete, or as flows
• Discrete choice models offer the advantage of
– Including micro-level (individual/firm) characteristics
– An underlying structural model (RUM)
• Aggregate flow models
– Simpler to compute
– No need for the distributional assumptions necessary for maximum likelihood (nonlinear) methods
– Can’t separate individual from aggregate factors
