
GY460 Techniques of Spatial Analysis

Lecture 6: Probabilistic choice models

Steve Gibbons
Introduction

• It is sometimes useful to model the choices of individuals, firms or other agents over discrete alternatives
– Choice of transport mode

– Choice of firm location amongst regions

– Choice of cities or country to migrate to

• Theoretical framework
– Random utility model

• Empirical methods:
– Micro: Probit, logit, multinomial logit

– Aggregate: Poisson, OLS, gravity


The “Random Utility” choice model
Random Utility Model

• RUM underlies economic interpretation of discrete choice models. Developed by Daniel McFadden for econometric applications
– see JoEL January 2001 for Nobel lecture; also Manski (2001) Daniel McFadden and the Econometric Analysis of Discrete Choice, Scandinavian Journal of Economics, 103(2), 217-229

• Preferences are functions of biological taste templates, experiences, other personal characteristics
– Some of these are observed, others unobserved

– Allows for taste heterogeneity

• Discussion below is in terms of individual utility (e.g. migration, transport mode choice) but similar reasoning applies to firm choices
Random Utility Model

• Individual i’s utility from a choice j can be decomposed into two components:

$$U_{ij} = V_{ij} + \varepsilon_{ij}$$

• $V_{ij}$ is deterministic – common to everyone, given the same characteristics and constraints
– representative tastes of the population e.g. effects of time and cost on travel mode choice

• $\varepsilon_{ij}$ is random
– reflects idiosyncratic tastes of i and unobserved attributes of choice j
Random Utility Model

• $V_{ij}$ is a function of attributes of alternative j (e.g. price and time) and observed consumer and choice characteristics.

$$V_{ij} = \beta t_{ij} + \gamma p_{ij} + \delta z_{ij}$$

• We are interested in finding $\beta$, $\gamma$, $\delta$

• Let’s forget about z for now, for simplicity


RUM and binary choices

• Consider two choices e.g. bus or car

• We observe whether an individual uses one or the other

• Define
$y_i = 1$ if i chooses bus
$y_i = 0$ if i chooses car

• What is the probability that we observe an individual choosing to travel by bus?
• Assume utility maximisation

• Individual chooses bus (y=1) rather than car (y=0) if utility of commuting by bus exceeds utility of commuting by car
RUM and binary choices

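• So choose bus if $U_{i1} > U_{i0}$

$$V_{i1} + \varepsilon_{i1} > V_{i0} + \varepsilon_{i0}$$
$$\varepsilon_{i1} - \varepsilon_{i0} > -(V_{i1} - V_{i0})$$
• So the probability that we observe an individual choosing bus travel is

$$\mathrm{Prob}\left[\varepsilon_{i1} - \varepsilon_{i0} > -(V_{i1} - V_{i0})\right]$$
$$= \mathrm{Prob}\left[\varepsilon_{i1} - \varepsilon_{i0} > -\beta(t_{i1} - t_{i0}) - \gamma(p_{i1} - p_{i0})\right]$$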
The linear probability model

• Assume probability depends linearly on observed characteristics (price and time)

$$\mathrm{Prob}(i \text{ chooses bus}) = \beta(t_{i1} - t_{i0}) + \gamma(p_{i1} - p_{i0})$$

• Then you can estimate by linear regression

$$y_{i1} = \beta(t_{i1} - t_{i0}) + \gamma(p_{i1} - p_{i0}) + \varepsilon_{i1}$$

• Where $y_{i1}$ is the “dummy variable” for mode choice (1 if bus, 0 if car)
• Other consumer and choice characteristics can be included (the zs in the first slide in this section)
The linear probability model

• Unfortunately this has some undesirable properties

[Figure: Prob(bus), on an axis from 0 to 1, plotted against $V_i$, with the linear regression line]
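The problem with the linear probability model can be seen in a few lines of Python. This is a minimal sketch on made-up data (the values of `v` and `y` are hypothetical, not taken from the lecture): it fits the regression line by ordinary least squares and flags fitted "probabilities" that fall outside the interval from 0 to 1.

```python
# Hypothetical data: v is the utility difference between bus and car,
# y is 1 if the individual chose bus and 0 if they chose car.
v = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
y = [0, 0, 0, 0, 1, 1, 1]

n = len(v)
v_bar = sum(v) / n
y_bar = sum(y) / n

# Ordinary least squares with one regressor: slope = S_vy / S_vv.
s_vy = sum((a - v_bar) * (b - y_bar) for a, b in zip(v, y))
s_vv = sum((a - v_bar) ** 2 for a in v)
slope = s_vy / s_vv
intercept = y_bar - slope * v_bar

# Fitted values are meant to be probabilities, but nothing bounds them.
fitted = [intercept + slope * a for a in v]
for a, f in zip(v, fitted):
    flag = "" if 0.0 <= f <= 1.0 else "  <- outside [0, 1]"
    print(f"v = {a:+.1f}  fitted Prob(bus) = {f:.3f}{flag}")
```

In this toy sample the fitted values at both ends of the range of `v` lie outside the unit interval, which is the undesirable property a non-linear probability function avoids.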
Non-linear probability model

• Better for probability function to have a shape something like:

[Figure: Prob(bus), on an axis from 0 to 1, plotted against $V_i$]
Probits and logits
• Common assumptions:
– Cumulative normal distribution function – “Probit”

– Logistic function – “Logit”

$$\mathrm{Prob}(i \text{ chooses bus}) = \frac{\exp(V_i)}{1 + \exp(V_i)}$$

• Estimation by maximum likelihood

$$\mathrm{Prob}(y_i = 1) = F(x_i'\boldsymbol{\beta})$$
$$\mathrm{Prob}(y_i = 0) = 1 - F(x_i'\boldsymbol{\beta})$$
$$\ln L = \sum_{i=1}^{n} \left\{ y_i \ln F(x_i'\boldsymbol{\beta}) + (1 - y_i) \ln\left[1 - F(x_i'\boldsymbol{\beta})\right] \right\}$$
Example
• McFadden, D. (1974) The Measurement of Urban Travel Demand, Journal of Public Economics, 3

• Methods of commuting in San Francisco Bay area


Example 1

McFadden (1974) car versus bus commute modes in SF Bay area

Characteristics                                         Coefficient   (t)
Family income $                                          0.000095     (0.774)
Car-bus cost, cents per round trip                      -0.01022*     (3.726)
Car-bus vehicle time costs (one way minutes x wage)     -0.01479      (2.460)
Bus total access time costs (one way minutes x wage)    -0.00314      (0.818)
Constant                                                 0.3832       (0.428)
Multiple choices and the “multinomial logit”
Multiple choices

• We often want to think about many more than two choices
– Choice of regional location

– Choice of transport mode with many alternatives

– Choice amongst a sample of schools

• How can we extend the binary choice logit model?

• Random Utility model extends to many choices

$$U_{ij} = V_{ij} + \varepsilon_{ij}$$
• Choose choice k if utility higher than for all other choices

$$V_{ik} + \varepsilon_{ik} > V_{ij} + \varepsilon_{ij} \quad \text{for all } j \neq k$$


Multinomial logit (1)

• Again we need to assume some distribution for the unobserved factor

• One type of distribution (extreme value) gives a simple solution for the probability that choice k is made:

$$\mathrm{Prob}(i \text{ chooses } k) = \frac{\exp(V_{ik})}{\sum_j \exp(V_{ij})}$$

• This is a generalisation of the logit model with many alternatives = “multinomial logit” or “conditional logit”

$$\ln L = \sum_{j=1}^{J} \sum_{i=1}^{n} y_{ij} \ln \mathrm{Prob}(i \text{ chooses } j)$$
Multinomial logit (2)

• Recall: $V_{ij}$ is a linear function of observed characteristics of the individuals and their choices. e.g. for travel mode choice

$$V_{ij} = \beta t_{ij} + \gamma p_{ij} + \delta_j z_{ij}$$

• Parameters estimated:
• For an individual characteristic that is common across choices (e.g. income, gender): one parameter per choice
– For at least one choice this is zero (base case).

• For a characteristic which varies only across choices e.g. price of transport: one parameter common across choices
Example: Value of time
• MNL models used to estimate “value of travel time” from observed commuter behaviour
• Three transport choices: bus (0), train (1), car (2)
• Choosing bus as the base case:

$$V_{i1} = \gamma(\text{price}_1 - \text{price}_0) + \beta(\text{time}_1 - \text{time}_0) + \text{sex}_i(\delta_1 - \delta_0) + \text{companycar}_i(\theta_1 - \theta_0)$$
$$V_{i2} = \gamma(\text{price}_2 - \text{price}_0) + \beta(\text{time}_2 - \text{time}_0) + \text{sex}_i(\delta_2 - \delta_0) + \text{companycar}_i(\theta_2 - \theta_0)$$
Example 1: Value of time

• For example, from Truong and Hensher, Economic Journal, 95 (1985) p. 15 for bus/train/car choices in Sydney 1982
Example 2: immigration

• Scott, Coomes and Izyumov (2005) The Location Choice of Employment-Based Immigrants among U.S. Metro Areas. Journal of Regional Science 45(1) 113-145
• Estimate the impact of metropolitan area characteristics on
destination choice for US migrants in 1995
• 298 destination MSAs
Example 2: immigration

Source: Scott, Coomes et al (note: they also report models which include individual Xs)
The independence of irrelevant alternatives problem (IIA)
and the nested logit model
Multinomial logit and “IIA”

• Many applications in economic and geographical journals (and other research areas)

• The multinomial logit model is the workhorse of multiple choice modelling in all disciplines. Easy to compute

• But it has a drawback


Independence of Irrelevant Alternatives
• Consider market shares
– Red bus 20%

– Blue bus 20%

– Train 60%

• IIA assumes that if red bus company shuts down, the market shares become
– Blue bus 20% + 5% = 25%

– Train 60% + 15% = 75%

• Because the ratio of blue bus trips to train trips must stay at 1:3
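The red bus arithmetic can be reproduced with the multinomial logit formula. This is an illustrative sketch: the utilities are simply set to the log of each market share so that the model reproduces the shares above, and the helper name `mnl_shares` is made up for this example.

```python
import math

def mnl_shares(v):
    """Multinomial logit shares for a dict of alternative -> utility."""
    total = sum(math.exp(a) for a in v.values())
    return {k: math.exp(a) / total for k, a in v.items()}

# Assumed utilities: V_k = ln(share_k), which reproduces the 20/20/60 split.
v = {
    "red bus": math.log(0.2),
    "blue bus": math.log(0.2),
    "train": math.log(0.6),
}
before = mnl_shares(v)

# Remove the red bus and recompute; the remaining utilities are unchanged.
v_without_red = {k: a for k, a in v.items() if k != "red bus"}
after = mnl_shares(v_without_red)

for k in after:
    print(f"{k}: {before[k]:.2f} -> {after[k]:.2f}")
print("blue bus : train ratio after =", after["blue bus"] / after["train"])
```

The red bus share is reallocated in proportion to the existing shares, so the blue bus to train ratio stays at 1:3.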
Independence of Irrelevant Alternatives
• Model assumes that ‘unobserved’ attributes of all alternatives are perceived as equally similar

• But will people unable to travel by red bus really switch to travelling by train?

• Most likely outcome is (assuming supply of bus seats is elastic)
– Blue bus: 40%

– Train: 60%

• This property of multinomial/conditional logit models is called the Independence of Irrelevant Alternatives (IIA) assumption


Independence of Irrelevant Alternatives

• It is easy to see why this is:

• Ratio of probabilities of choosing k (e.g. red bus) and another choice l (e.g. train) is just

$$\frac{\exp(V_{ik})}{\exp(V_{il})}$$

• All other choices drop out of this odds ratio
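Spelling out the step, as a sketch that uses only the multinomial logit formula from earlier in the lecture: the denominator is the same for every alternative, so it cancels in the ratio.

```latex
\frac{\mathrm{Prob}(i \text{ chooses } k)}{\mathrm{Prob}(i \text{ chooses } l)}
  = \frac{\exp(V_{ik}) \big/ \sum_j \exp(V_{ij})}{\exp(V_{il}) \big/ \sum_j \exp(V_{ij})}
  = \frac{\exp(V_{ik})}{\exp(V_{il})}
  = \exp(V_{ik} - V_{il})
```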

• There are models that overcome this, e.g…


Nested Logit Model

• Multinomial logit model can be generalised to relax IIA assumption
– Nested Logit (Nested Multinomial Logit)

[Tree diagram: the first stage is Car (1) versus Public transport (2); within Public transport, the second stage is Bus (3) versus Train (4)]

• Characteristics of Bus and Train affect decision of whether to use Car or Public Transport
• Estimate by sequential logits…
Nested Logit Model

• Value placed on choices available in second stage (3,4) enter into calculation of choice probabilities in first stage (2)…

• Logit for bus versus train to estimate $V_3$ and $V_4$

• Define the ‘Inclusive Value’ of public transport as

$$I_2 = \ln\left[\exp(V_3) + \exp(V_4)\right]$$

• Estimate logit model for Car (1) versus Public (2) using:

$$\mathrm{Prob}(\text{Public}) = \frac{\exp(V_2 + I_2)}{\exp(V_2 + I_2) + \exp(V_1)}$$
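The two-stage calculation can be sketched numerically. The utilities below are hypothetical numbers chosen for illustration, not estimates from any study, and the code follows the formulas exactly as written above, with the inclusive value entering the first stage with a coefficient of one.

```python
import math

# Hypothetical utilities (illustrative only).
v_car = 0.4      # V_1
v_public = -0.6  # V_2: component common to both public transport modes
v_bus = 0.1      # V_3
v_train = 0.5    # V_4

# Second stage: logit for bus versus train, conditional on public transport.
p_bus_given_public = math.exp(v_bus) / (math.exp(v_bus) + math.exp(v_train))
p_train_given_public = 1.0 - p_bus_given_public

# Inclusive value of public transport: I_2 = ln[exp(V_3) + exp(V_4)].
inclusive_value = math.log(math.exp(v_bus) + math.exp(v_train))

# First stage: logit for car versus public transport.
public_term = math.exp(v_public + inclusive_value)
p_public = public_term / (public_term + math.exp(v_car))
p_car = 1.0 - p_public

# Unconditional probabilities of the three final alternatives.
p_bus = p_public * p_bus_given_public
p_train = p_public * p_train_given_public
print(f"car {p_car:.3f}, bus {p_bus:.3f}, train {p_train:.3f}")
```

With a coefficient of one on the inclusive value, these probabilities coincide with a plain multinomial logit over car, bus and train. Applied nested logit work usually estimates a coefficient on the inclusive value, and it is a coefficient different from one that lets substitution patterns differ within and across nests.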
Example: Transport mode choice

• Asensio, J., Transport Mode Choice by Commuters to Barcelona’s CBD, Urban Studies, 39(10), 2002

• Travel mode for suburban commuters

• Sample of 1381 commuters from a travel survey

• Records mode of transport and other individual characteristics

[Tree diagram: Private car versus Public transport; within Public transport, Train versus Bus]
Example: Transport mode choice

• Asensio, J., Transport Mode Choice by Commuters to Barcelona’s CBD, Urban Studies, 39(10), 2002
– Some selected coefficients

Variable                           Parameter
Cost                               -0.002
Travel time by car                 -0.054
Travel time by public transport    -0.018
Sex (car)                           0.889
Sex (bus)                          -1.001

• We don’t know the units of measurement, but how much more valuable is time saved by car than time saved by public transport?
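One way to approach the question is to take ratios of coefficients, since a ratio of two marginal utilities does not depend on the scale of utility. The sketch below uses the coefficients listed above; the variable names are made up, and the units remain unknown, so only the ratios are interpretable.

```python
# Selected coefficients as listed on the slide (units unknown).
coef_cost = -0.002
coef_time_car = -0.054
coef_time_public = -0.018

# Relative disutility of a unit of travel time by car versus public transport.
relative_value = coef_time_car / coef_time_public

# Implied value of time: cost units a commuter would give up per unit of time saved.
value_of_time_car = coef_time_car / coef_cost
value_of_time_public = coef_time_public / coef_cost

print(f"time by car is valued {relative_value:.1f} times as much as time by public transport")
print(f"value of time, car: {value_of_time_car:.1f} cost units per time unit")
print(f"value of time, public transport: {value_of_time_public:.1f} cost units per time unit")
```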
Other discrete choice applications

• Firm location choices e.g. Head, K. and T. Mayer (2004), Market Potential and the Location of Japanese Investment in the European Union, Review of Economics and Statistics, 86(4) 959-972 (seminar reading)
• School choice e.g. Barrow, L. (2002) School choice through relocation: evidence from the Washington, D.C. area, Journal of Public Economics, 86 p.155-189
• Migration destinations

• Residential choice
Aggregate choice models
Micro and aggregated choice models

• Micro level logit choice models often have aggregated equivalents

$$\mathrm{Prob}(i \text{ chooses } k) = \frac{\exp(V_k)}{\sum_j \exp(V_j)}$$

$$\ln \mathrm{Prob}(i \text{ chooses } k) = V_k - \ln \sum_j \exp(V_j)$$

$$\ln(n_k / N) = \alpha + x_k'\boldsymbol{\beta} + \varepsilon_k$$
• i.e. if you only have choice characteristics, you could use a choice-level regression of the log proportion of individuals making each choice on the choice characteristics

• Obviously log(n_k) would work too (why?)


Micro and aggregated choice models

• In fact, a Poisson model on aggregated data gives exactly the same coefficient estimates as the conditional logit model

• Which is based on ML estimation of

$$\mathrm{Prob}(\text{number choosing } k = n_k) = \frac{\exp(-\lambda_k)\,\lambda_k^{n_k}}{n_k!}$$
$$\ln \lambda_k = \alpha + x_k'\boldsymbol{\beta}$$
• See Guimaraes et al Restats (2003)
– though this equivalence was known before this ‘discovery’

• Here’s an example…
Data (295 i’s 3 j’s)

id choice d x
1 American 0 18.97627
1 Japan 0 7.542373
1 Europe 1 3.461017
2 American 1 18.97627
2 Japan 0 7.542373
2 Europe 0 3.461017
3 American 1 18.97627
3 Japan 0 7.542373
3 Europe 0 3.461017
4 American 0 18.97627
4 Japan 1 7.542373
4 Europe 0 3.461017
5 American 1 18.97627
5 Japan 0 7.542373
5 Europe 0 3.461017
Conditional logit

Conditional (fixed-effects) logistic regression Number of obs = 885


LR chi2(1) = 129.65
Prob > chi2 = 0.0000
Log likelihood = -259.26785 Pseudo R2 = 0.2000

------------------------------------------------------------------------------
choice | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
x | .0999331 .0091997 10.86 0.000 .081902 .1179642
------------------------------------------------------------------------------
Simpler data

choice n x p

American 192 18.97627 0.650847

Japan 64 7.542373 0.216949

Europe 39 3.461017 0.132203


Poisson

Poisson regression Number of obs = 3


LR chi2(1) = 129.65
Prob > chi2 = 0.0000
Log likelihood = -9.3973119 Pseudo R2 = 0.8734

------------------------------------------------------------------------------
n | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
x | .0999331 .0091997 10.86 0.000 .081902 .1179642
_cons | 3.364614 .1450806 23.19 0.000 3.080262 3.648967
------------------------------------------------------------------------------
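The equivalence can be checked without Stata. The sketch below is a hand-rolled Newton-Raphson in plain Python, not the routine Stata uses; it takes the three aggregated rows from the "Simpler data" slide and assumes, as in that example, that x varies only across choices. Function names and the number of iterations are arbitrary choices for this illustration.

```python
import math

# Aggregated data from the "Simpler data" slide.
n = [192, 64, 39]
x = [18.97627, 7.542373, 3.461017]

def fit_conditional_logit(n, x, steps=50):
    """Newton-Raphson for beta in Prob(k) = exp(beta*x_k) / sum_j exp(beta*x_j)."""
    beta = 0.0
    total = sum(n)
    for _ in range(steps):
        w = [math.exp(beta * xk) for xk in x]
        p = [wk / sum(w) for wk in w]
        mean_x = sum(pk * xk for pk, xk in zip(p, x))
        var_x = sum(pk * (xk - mean_x) ** 2 for pk, xk in zip(p, x))
        score = sum(nk * xk for nk, xk in zip(n, x)) - total * mean_x
        beta += score / (total * var_x)  # score divided by the information
    return beta

def fit_poisson(n, x, steps=50):
    """Newton-Raphson for (alpha, beta) in ln(lambda_k) = alpha + beta*x_k."""
    alpha, beta = math.log(sum(n) / len(n)), 0.0
    for _ in range(steps):
        lam = [math.exp(alpha + beta * xk) for xk in x]
        g0 = sum(nk - lk for nk, lk in zip(n, lam))
        g1 = sum((nk - lk) * xk for nk, lk, xk in zip(n, lam, x))
        h00 = sum(lam)
        h01 = sum(lk * xk for lk, xk in zip(lam, x))
        h11 = sum(lk * xk * xk for lk, xk in zip(lam, x))
        det = h00 * h11 - h01 * h01
        # Solve the 2x2 system (information matrix) * step = score.
        alpha += (h11 * g0 - h01 * g1) / det
        beta += (h00 * g1 - h01 * g0) / det
    return alpha, beta

beta_clogit = fit_conditional_logit(n, x)
alpha_poisson, beta_poisson = fit_poisson(n, x)
print("conditional logit beta:", round(beta_clogit, 7))
print("Poisson beta:          ", round(beta_poisson, 7))
print("Poisson constant:      ", round(alpha_poisson, 6))
```

The two slope estimates agree because the Poisson constant absorbs the total number of choosers, leaving the same first-order condition for the slope as in the conditional logit.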
OLS

. reg lnp x

Source | SS df MS Number of obs = 3


-------------+------------------------------ F( 1, 1) = 370.23
Model | 1.32738687 1 1.32738687 Prob > F = 0.0331
Residual | .003585331 1 .003585331 R-squared = 0.9973
-------------+------------------------------ Adj R-squared = 0.9946
Total | 1.3309722 2 .665486102 Root MSE = .05988

------------------------------------------------------------------------------
lnp | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
x | .101293 .0052644 19.24 0.033 .034403 .168183
_cons | -2.339238 .06295 -37.16 0.017 -3.139094 -1.539383
------------------------------------------------------------------------------
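The OLS regression of log shares on x can be reproduced with the textbook formulas for a single regressor. This is a small sketch using the three rows of the "Simpler data" slide; with only three observations it is an illustration of the mechanics rather than a serious estimate.

```python
import math

# Choice shares p and characteristic x from the "Simpler data" slide.
x = [18.97627, 7.542373, 3.461017]
p = [0.650847, 0.216949, 0.132203]
lnp = [math.log(pk) for pk in p]

m = len(x)
x_bar = sum(x) / m
y_bar = sum(lnp) / m

# OLS with one regressor: slope = S_xy / S_xx, intercept = y_bar - slope * x_bar.
s_xy = sum((xk - x_bar) * (yk - y_bar) for xk, yk in zip(x, lnp))
s_xx = sum((xk - x_bar) ** 2 for xk in x)
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar

print("slope on x:", round(slope, 6))
print("constant:  ", round(intercept, 6))
```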
Aggregate v micro choice models

• Hence, there’s little point in using conditional logit if you only have
choice-characteristics

• Conditional/multinomial logit is good if you have individual and group-level characteristics
• The aggregated OLS version gives rise to “Spatial interaction” models of flows between origins and destinations
• = Gravity models

• Widely applied (generally a-theoretically) in migration, trade and commuting applications
– e.g. See Head (2003) Gravity for beginners
Gravity/spatial interaction/migration/trade models

• Flow from place j to place k modelled as

$$\ln(n_{jk}) = x_{jk}'\boldsymbol{\beta} + \mu_j + \eta_k + \varepsilon_{jk}$$
• Typically characteristics of destination and source include some measure of “attraction” e.g. population mass (or “market potential” in trade models), wages (endogenous)
• And measure of the cost in moving between place j and k (e.g. log distance)
$$\ln(n_{jk}) = \rho \ln d_{jk} + x_{jk}'\boldsymbol{\beta} + \mu_j + \eta_k + \varepsilon_{jk}$$
• Hence gravity – after Newton

$$\ln(\text{Force}_{jk}) = \ln G + \ln \text{mass}_j + \ln \text{mass}_k - 2 \ln \text{dist}_{jk}$$


Gravity/spatial interaction/migration/trade models

• Strong distance decay effects
– Typical elasticities -0.5 to -2.0

• Even for internet site visits!: see Blum and Goldfarb (2006) Journal of International Economics
• Trade literature has many examples

• Disdier and Head (2003) The Puzzling Persistence Of The Distance Effect On Bilateral Trade, Review of Economics and Statistics
– Finds mean distance elasticity of -0.9 from about 1500 estimates
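To read these elasticities, recall that in a log-log gravity equation the coefficient on log distance gives the proportional change in flows for a proportional change in distance. The sketch below is purely illustrative: it assumes everything else is held fixed and shows what doubling distance does to the predicted flow at the elasticities quoted above.

```python
# Elasticities quoted above: the typical range and the mean from Disdier and Head.
elasticities = [-0.5, -0.9, -2.0]
distance_ratio = 2.0  # doubling the distance between origin and destination

# With ln(flow) = rho * ln(distance) + ..., flows scale by distance_ratio ** rho.
flow_ratios = {rho: distance_ratio ** rho for rho in elasticities}
for rho, ratio in flow_ratios.items():
    print(f"elasticity {rho:+.1f}: flow falls to {ratio:.3f} of its original level")
```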
Conclusion

• Generally possible to model ‘choices’ as discrete, or as flows

• Discrete choice models offer the advantage of
– Including micro-level (individual/firm) characteristics

– An underlying structural model (RUM)

• Aggregate flow models
– Simpler to compute

– No need for distributional assumptions necessary for maximum likelihood (nonlinear) methods

– Can’t separate individual from aggregate factors
