Microeconometrie Chapitre2 MultinomialModels
Multinomial Models
Théophile T. Azomahou
University Clermont Auvergne, CNRS, CERDI
Maastricht University, School of Business and Economics
Email: theophile.azomahou@uca.fr
1. Introduction
Chapter 1 considered models for discrete outcome variables that can take
two possible values (0/1). Here we consider several possible outcomes,
usually mutually exclusive.
Estimation is most often by maximum likelihood because the data are clearly
multinomial distributed. For some complications, however, moment-based
estimation is used instead.
2. Multinomial models
Assume there are m alternatives and the dependent variable y is defined to take
value j if the jth alternative is taken, $j = 1, \dots, m$. (Some authors instead
consider m + 1 alternatives with $j = 0, 1, \dots, m$.) Define the probability that
alternative j is chosen as:
\[ p_j = P(y = j), \qquad j = 1, \dots, m. \]
Introduce m binary variables, one for each alternative:
\[ y_j = \begin{cases} 1 & \text{if } y = j \\ 0 & \text{otherwise} \end{cases} \]
Thus $y_j$ equals one if alternative j is the observed outcome and the remaining $y_k$
equal zero, so for each observation on y exactly one of $y_1, y_2, \dots, y_m$ will be
nonzero.
The multinomial density for one observation can then be conveniently written
as:
\[ f(y) = p_1^{y_1} \times p_2^{y_2} \times \cdots \times p_m^{y_m} = \prod_{j=1}^{m} p_j^{y_j} \tag{3} \]
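As a small illustration of density (3), the sketch below evaluates the product for a single observation coded as a vector of binary indicators. The probabilities and the function name `multinomial_density` are made up for the example.

```python
from math import prod

def multinomial_density(y, p):
    """Return f(y) = prod_j p_j**y_j for a vector y of binary indicators."""
    if len(y) != len(p):
        raise ValueError("y and p must have the same length")
    if sorted(y) != [0] * (len(y) - 1) + [1]:
        raise ValueError("y must contain exactly one 1 and zeros elsewhere")
    if abs(sum(p) - 1.0) > 1e-12:
        raise ValueError("probabilities must sum to one")
    return prod(pj ** yj for pj, yj in zip(p, y))

# Hypothetical probabilities for m = 3 alternatives.
p = [0.2, 0.5, 0.3]
y = [0, 1, 0]  # alternative 2 is the observed outcome
density = multinomial_density(y, p)
print(density)
```

Because exactly one indicator is nonzero, the product simply picks out the probability of the observed alternative.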
For regression models, introduce a subscript i for the ith individual and regressors
$x_i$. A model for the probability that individual i chooses the jth alternative is:
\[ p_{ij} = P(y_i = j) = F_j(x_i, \beta), \qquad j = 1, \dots, m, \quad i = 1, \dots, N. \]
The functional form for Fj should be such that probabilities lie between 0 and 1
and sum over j to one. Different functional specifications for Fj correspond to
specific models, notably multinomial logit, nested logit, multinomial probit,
ordered, sequential, and multivariate models.
4. Multinomial Logit
The simplest specification is the multinomial logit model, proposed by Luce
(1959). The commonly used variants of this model differ according to whether or
not regressors vary across alternatives.
Because $\sum_{j=1}^{m} p_{ij} = 1$, an equivalent model is obtained by defining $x_{ij}$ to be
deviations of regressors from the values of alternative 1, say, and setting $x_{i1} = 0$.
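The invariance to this normalization can be checked numerically. The sketch below assumes the usual conditional logit form $p_{ij} = e^{x_{ij}'\beta} / \sum_l e^{x_{il}'\beta}$. The data, the coefficient values, and the helper name `cl_probs` are hypothetical.

```python
from math import exp

def cl_probs(x, beta):
    """Conditional logit probabilities for one individual.

    x is a list of m regressor vectors (one per alternative) and beta is
    the coefficient vector common to all alternatives.
    """
    v = [sum(b * xk for b, xk in zip(beta, xj)) for xj in x]
    vmax = max(v)  # subtracting the maximum avoids overflow in exp
    w = [exp(vj - vmax) for vj in v]
    total = sum(w)
    return [wj / total for wj in w]

# Hypothetical regressors for m = 3 alternatives and two attributes.
x = [[1.0, 2.0], [0.5, 3.0], [2.0, 1.0]]
beta = [0.4, -0.3]

# Deviations from the values of alternative 1, so that x_dev[0] is zero.
x_dev = [[a - b for a, b in zip(xj, x[0])] for xj in x]

p_raw = cl_probs(x, beta)
p_dev = cl_probs(x_dev, beta)
print(p_raw)
print(p_dev)
```

Both calls return the same probabilities, which lie strictly between 0 and 1 and sum to one.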
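The marginal effects are given by (– Exercise –):
\[ \frac{\partial p_{ij}}{\partial x_{ik}} = p_{ij}(\delta_{ijk} - p_{ik})\beta =
\begin{cases} p_{ij}(1 - p_{ik})\beta & \text{if } j = k \\ -p_{ij}\, p_{ik}\, \beta & \text{if } j \neq k \end{cases} \]
where $\delta_{ijk} = 1[j = k]$.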
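One way to work through the exercise is to compare the analytical expression with a finite-difference derivative. The sketch below does this under the assumption of the usual conditional logit form $p_{ij} = e^{x_{ij}'\beta} / \sum_l e^{x_{il}'\beta}$. All numbers and function names are illustrative.

```python
from math import exp

def cl_probs(x, beta):
    """Conditional logit probabilities for one individual."""
    v = [sum(b * xk for b, xk in zip(beta, xj)) for xj in x]
    vmax = max(v)
    w = [exp(vj - vmax) for vj in v]
    total = sum(w)
    return [wj / total for wj in w]

def cl_marginal_effect(p, j, k, beta):
    """Analytical effect of x_ik on p_ij: p_ij * (delta_jk - p_ik) * beta."""
    delta = 1.0 if j == k else 0.0
    return [p[j] * (delta - p[k]) * b for b in beta]

def numeric_effect(x, beta, j, k, h=1e-6):
    """Central finite-difference derivative of p_ij with respect to x_ik."""
    out = []
    for r in range(len(beta)):
        x_up = [row[:] for row in x]
        x_dn = [row[:] for row in x]
        x_up[k][r] += h
        x_dn[k][r] -= h
        out.append((cl_probs(x_up, beta)[j] - cl_probs(x_dn, beta)[j]) / (2 * h))
    return out

# Hypothetical data: m = 3 alternatives, two attributes.
x = [[1.0, 2.0], [0.5, 3.0], [2.0, 1.0]]
beta = [0.4, -0.3]
p = cl_probs(x, beta)

own = cl_marginal_effect(p, 0, 0, beta)    # j = k
cross = cl_marginal_effect(p, 0, 1, beta)  # j != k
own_num = numeric_effect(x, beta, 0, 0)
cross_num = numeric_effect(x, beta, 0, 1)
print(own, own_num)
print(cross, cross_num)
```

In this example the own effects carry the signs of the elements of β and the cross effects carry the opposite signs.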
Théophile T. Azomahou (CERDI), February 20-28, 2020
In this case, the sign of the own marginal effects is the same as the sign of β k ,
while the sign of the cross effects is opposite to the sign of the coefficient.
The marginal effects in this model are the effects of changing a regressor by one
unit on the probabilities of choosing each alternative (– Exercise –):
\[ \frac{\partial p_{ij}}{\partial x_i} = p_{ij}\Big(\beta_j - \sum_{h=1}^{m} p_{ih}\beta_h\Big) \equiv p_{ij}\big(\beta_j - \bar{\beta}_p\big) \tag{10} \]
where $\bar{\beta}_p = \sum_h p_{ih}\beta_h$ is a probability-weighted average of the alternative-specific
coefficients, using the choice probabilities $p_{ih}$ as weights.
Remarks:
From this expression we can see that the sign of a parameter estimate does
not necessarily correspond to the sign of the effect of an increase in the
regressor on the probability of choosing this alternative.
For that reason, testing whether a single coefficient is different from zero
is of limited interest. More subtly, the sign of the marginal
effects can differ across individuals, because the weighted average $\bar{\beta}_p$ uses
individual-specific choice probabilities as weights.
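Both remarks can be illustrated with a small numerical sketch. It assumes the usual multinomial logit form $p_{ij} = e^{x_i'\beta_j} / \sum_l e^{x_i'\beta_l}$ with the coefficients of the first alternative normalized to zero. The coefficient values and the two regressor values are invented for the example.

```python
from math import exp

def mnl_probs(x, betas):
    """Multinomial logit probabilities; betas[j] is the vector for alternative j."""
    v = [sum(b * xk for b, xk in zip(bj, x)) for bj in betas]
    vmax = max(v)
    w = [exp(vj - vmax) for vj in v]
    total = sum(w)
    return [wj / total for wj in w]

def mnl_marginal_effects(x, betas):
    """Marginal effects p_ij * (beta_j - beta_bar), one row per alternative."""
    p = mnl_probs(x, betas)
    k = len(x)
    m = len(betas)
    # Probability-weighted average of the alternative-specific coefficients.
    beta_bar = [sum(p[h] * betas[h][r] for h in range(m)) for r in range(k)]
    return [[p[j] * (betas[j][r] - beta_bar[r]) for r in range(k)] for j in range(m)]

# Hypothetical coefficients: one regressor, three alternatives, base = first.
betas = [[0.0], [1.0], [2.0]]

me_a = mnl_marginal_effects([1.0], betas)   # individual with x = 1
me_b = mnl_marginal_effects([-3.0], betas)  # individual with x = -3
print(me_a[1][0], me_b[1][0])
```

The second alternative has a positive coefficient, yet its marginal effect is negative for the individual with x = 1 and positive for the individual with x = -3. For each individual the effects sum to zero across alternatives, since the probabilities always sum to one.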
The two models can be combined into what some authors call a mixed logit
model.
The coefficients in the CL and MNL models can also be given a more direct
logit-like interpretation in terms of relative risk. This is because the models can
be reexpressed as binary logit models.
For the MNL model, comparison is to a base category, which is the
alternative normalized to have coefficients equal to zero. To see this note
that the multinomial logit probabilities (9) imply that the conditional
probability of observing alternative j given that either alternative j or
alternative k is observed is:
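\[
\begin{aligned}
P(y = j \mid y = j \text{ or } k) &= \frac{p_j}{p_j + p_k} \\
&= \frac{e^{x'\beta_j}}{e^{x'\beta_j} + e^{x'\beta_k}} \\
&= \frac{e^{x'(\beta_j - \beta_k)}}{1 + e^{x'(\beta_j - \beta_k)}}
\end{aligned}
\tag{11}
\]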
which is a logit model with coefficient (β j − β k ).
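The reduction to a binary logit in (11) can be verified numerically. The sketch below assumes multinomial logit probabilities of the form $p_j = e^{x'\beta_j} / \sum_l e^{x'\beta_l}$. The regressor and coefficient values are hypothetical.

```python
from math import exp

def mnl_probs(x, betas):
    """Multinomial logit probabilities; betas[j] is the vector for alternative j."""
    v = [sum(b * xk for b, xk in zip(bj, x)) for bj in betas]
    vmax = max(v)
    w = [exp(vj - vmax) for vj in v]
    total = sum(w)
    return [wj / total for wj in w]

# Hypothetical values; the first alternative is the base category (beta = 0).
x = [1.0, 0.5]
betas = [[0.0, 0.0], [0.8, -0.2], [-0.5, 1.0]]
p = mnl_probs(x, betas)

j, k = 1, 2  # zero-based positions of the two alternatives being compared
conditional = p[j] / (p[j] + p[k])

# Binary logit with coefficient vector (beta_j - beta_k).
d = sum((bj - bk) * xk for bj, bk, xk in zip(betas[j], betas[k], x))
binary_logit = exp(d) / (1.0 + exp(d))
print(conditional, binary_logit)
```

Setting k to the base category gives a binary logit with coefficient β_j itself, which is the relative-risk reading of the MNL coefficients.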
\[ U_j = V_j + \varepsilon_j, \qquad j = 1, \dots, m \tag{14} \]
where the tilde and second subscript j denote differencing with respect to
reference alternative j.
which is a bivariate integral that generally does not have an analytical solution.
Estimation of MNP
\[ \varepsilon \sim N(0, \Sigma) \tag{17} \]
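Since the choice probabilities have no analytical form, one simple way to approximate them is a crude frequency simulator: draw ε from N(0, Σ), compute the utilities in (14), and record how often each alternative has the highest utility. The sketch below is only an illustration of that idea, not an estimation method taken from these slides. The values of V and Σ and the function names are hypothetical.

```python
import random
from math import sqrt

def cholesky(a):
    """Lower-triangular L with L L' = a, for a symmetric positive definite a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low

def simulate_mnp_probs(v, sigma, draws=20000, seed=0):
    """Share of draws in which each alternative has the highest utility."""
    rng = random.Random(seed)
    low = cholesky(sigma)
    m = len(v)
    counts = [0] * m
    for _ in range(draws):
        z = [rng.gauss(0.0, 1.0) for _ in range(m)]
        # eps = L z has covariance matrix sigma.
        eps = [sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(m)]
        u = [v[j] + eps[j] for j in range(m)]
        counts[u.index(max(u))] += 1
    return [c / draws for c in counts]

identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
sigma_corr = [[1.0, 0.5, 0.0], [0.5, 1.0, 0.0], [0.0, 0.0, 1.0]]

p_equal = simulate_mnp_probs([0.0, 0.0, 0.0], identity)
p_shift = simulate_mnp_probs([1.0, 0.0, 0.0], sigma_corr)
print(p_equal)
print(p_shift)
```

With equal V and independent errors each alternative is chosen about one third of the time. Raising V for the first alternative raises its simulated probability. The frequency simulator is not smooth in the parameters, so smoother simulators are normally preferred in estimation.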
\[ y_i^* = x_i'\beta + u_i \tag{19} \]
For example, for very low $y^*$ health status is poor, for $y^* > \alpha_1$ health status
improves to fair, for $y^* > \alpha_2$ it improves further to good, and so on. In general,
for an m-alternative ordered model we define:
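\[ y_i = j \quad \text{if} \quad \alpha_{j-1} < y_i^* \le \alpha_j, \qquad j = 1, \dots, m, \]
with $\alpha_0 = -\infty$ and $\alpha_m = \infty$. Writing F for the CDF of $u_i$, the probabilities are
\[ P(y_i = j) = F(\alpha_j - x_i'\beta) - F(\alpha_{j-1} - x_i'\beta), \]
and differentiating with respect to $x_i$ gives the marginal effects:
\[ \frac{\partial P(y_i = j)}{\partial x_i} = \big[F'(\alpha_{j-1} - x_i'\beta) - F'(\alpha_j - x_i'\beta)\big]\beta \tag{22} \]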
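The sketch below works through an ordered probit example, taking F to be the standard normal CDF. It uses $P(y_i = j) = F(\alpha_j - x_i'\beta) - F(\alpha_{j-1} - x_i'\beta)$, whose derivative with respect to $x_i$ is $[F'(\alpha_{j-1} - x_i'\beta) - F'(\alpha_j - x_i'\beta)]\beta$, and checks that derivative by finite differences. The thresholds, coefficients, and regressor values are invented.

```python
from math import erf, exp, inf, pi, sqrt

def norm_cdf(z):
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def norm_pdf(z):
    return exp(-0.5 * z * z) / sqrt(2.0 * pi)

def ordered_probs(x, beta, cuts):
    """P(y = j) = F(alpha_j - x'beta) - F(alpha_{j-1} - x'beta), j = 1..m."""
    xb = sum(b * xk for b, xk in zip(beta, x))
    a = [-inf] + list(cuts) + [inf]  # alpha_0 = -inf, alpha_m = +inf
    return [norm_cdf(a[j + 1] - xb) - norm_cdf(a[j] - xb) for j in range(len(a) - 1)]

def ordered_marginal_effects(x, beta, cuts):
    """[F'(alpha_{j-1} - x'beta) - F'(alpha_j - x'beta)] * beta, one row per j."""
    xb = sum(b * xk for b, xk in zip(beta, x))
    a = [-inf] + list(cuts) + [inf]
    return [[(norm_pdf(a[j] - xb) - norm_pdf(a[j + 1] - xb)) * b for b in beta]
            for j in range(len(a) - 1)]

# Hypothetical thresholds (m = 3 outcomes), coefficients, and regressors.
cuts = [-0.5, 0.8]
beta = [0.6, -0.4]
x = [1.0, 2.0]

probs = ordered_probs(x, beta, cuts)
me = ordered_marginal_effects(x, beta, cuts)

# Central finite-difference check for the first regressor.
h = 1e-6
up = ordered_probs([x[0] + h, x[1]], beta, cuts)
dn = ordered_probs([x[0] - h, x[1]], beta, cuts)
me_num = [(u - d) / (2 * h) for u, d in zip(up, dn)]
print(probs)
print([row[0] for row in me], me_num)
```

For a regressor with a positive coefficient the effect is negative on the lowest category and positive on the highest, while the sign for intermediate categories is not determined by the coefficient alone.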
Note that the $p_{ijk}$ define probabilities of mutually exclusive events and
$\sum_j \sum_k p_{ijk} = 1$. Define $m_1 \times m_2$ corresponding binary indicator variables
$y_{jk} = 1$ if $(y_1 = j, y_2 = k)$ and $y_{jk} = 0$ otherwise. Then the joint density for the
ith observation is
\[ f(y_{1i}, y_{2i}) = \prod_{j=1}^{m_1} \prod_{k=1}^{m_2} p_{ijk}^{y_{ijk}} \tag{24} \]
The log-likelihood is then
\[ \ln L = \sum_{i=1}^{N} \sum_{j=1}^{m_1} \sum_{k=1}^{m_2} y_{ijk} \ln p_{ijk} \]
and estimation is by ML.
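The log-likelihood is straightforward to evaluate once the indicators are built. The sketch below uses a tiny made-up sample with $m_1 = m_2 = 2$ and, for simplicity, the same cell probabilities for every observation. In a regression model the $p_{ijk}$ would depend on regressors and parameters.

```python
from math import log

def indicators(y1, y2, m1, m2):
    """y[i][j-1][k-1] = 1 if (y1_i = j, y2_i = k) and 0 otherwise."""
    return [[[1 if (a == j and b == k) else 0 for k in range(1, m2 + 1)]
             for j in range(1, m1 + 1)]
            for a, b in zip(y1, y2)]

def loglik(y, p):
    """ln L = sum_i sum_j sum_k y_ijk * ln p_ijk."""
    return sum(y[i][j][k] * log(p[i][j][k])
               for i in range(len(y))
               for j in range(len(y[i]))
               for k in range(len(y[i][j])))

# Hypothetical sample of N = 3 observations with outcomes coded 1 or 2.
y1 = [1, 2, 2]
y2 = [2, 2, 1]

# Hypothetical cell probabilities p_jk, identical across observations.
cell = [[0.1, 0.2], [0.3, 0.4]]
p = [cell for _ in y1]

y = indicators(y1, y2, 2, 2)
ll = loglik(y, p)
print(ll)
```

Only the observed cell of each observation contributes, so here the log-likelihood equals ln 0.2 + ln 0.4 + ln 0.3.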
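\[
\begin{aligned}
y_1^* &= x_1'\beta_1 + \varepsilon_1 \\
y_2^* &= x_2'\beta_2 + \varepsilon_2
\end{aligned}
\tag{25}
\]
where $\varepsilon_1$ and $\varepsilon_2$ are jointly normal with means zero, variances one, and
correlation ρ. Then the bivariate probit model specifies the observed outcomes
to be:
\[ y_1 = \begin{cases} 2 & \text{if } y_1^* > 0 \\ 1 & \text{if } y_1^* \le 0 \end{cases} \tag{26} \]
\[ y_2 = \begin{cases} 2 & \text{if } y_2^* > 0 \\ 1 & \text{if } y_2^* \le 0 \end{cases} \tag{27} \]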
where we use values (2, 1) rather than (1, 0) to be consistent with the notation in
this lecture. Observe that if ρ = 0 this specification collapses to two separate
probit models for y1 and y2. When ρ ≠ 0, there is no closed-form solution for the
probabilities.
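For example,
\[
\begin{aligned}
p_{22} &= P(y_1 = 2, y_2 = 2) \\
&= P(y_1^* > 0, y_2^* > 0) \\
&= P(-\varepsilon_1 < x_1'\beta_1, -\varepsilon_2 < x_2'\beta_2) \\
&= P(\varepsilon_1 < x_1'\beta_1, \varepsilon_2 < x_2'\beta_2) \\
&= \int_{-\infty}^{x_1'\beta_1} \int_{-\infty}^{x_2'\beta_2} \phi_2(z_1, z_2, \rho)\, dz_1\, dz_2 \\
&= \Phi_2(x_1'\beta_1, x_2'\beta_2, \rho)
\end{aligned}
\]
where $\phi_2(z_1, z_2, \rho)$ and $\Phi_2(x_1'\beta_1, x_2'\beta_2, \rho)$ are, respectively, the standardized
bivariate normal density and CDF for $(z_1, z_2)$ with zero means, unit variances, and
correlation ρ. The fourth equality holds because the zero-mean bivariate normal is
symmetric about the origin, so $(-\varepsilon_1, -\varepsilon_2)$ has the same distribution as $(\varepsilon_1, \varepsilon_2)$.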
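Performing similar algebra for the other possible outcomes yields:
\[ p_{jk} = P(y_1 = j, y_2 = k) = \Phi_2(q_1 x_1'\beta_1, q_2 x_2'\beta_2, q_1 q_2 \rho) \]
where $q_l = 1$ if $y_l = 2$ and $q_l = -1$ if $y_l = 1$, for $l = 1, 2$. The correlation
argument changes sign whenever exactly one of the two inequalities is reversed,
because $(-\varepsilon_1, \varepsilon_2)$ has correlation $-\rho$.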
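The four bivariate probit probabilities can be computed numerically even without a closed form. The sketch below evaluates $\Phi_2$ through the one-dimensional representation $\Phi_2(a, b, \rho) = \int_{-\infty}^{a} \phi(z)\, \Phi\big((b - \rho z)/\sqrt{1 - \rho^2}\big)\, dz$, approximated with Simpson's rule. This is a substitute chosen to keep the example dependency-free, not the routine an econometrics package would use. The index values and ρ are hypothetical, and the sign pattern follows $q_l = 1$ if $y_l = 2$ and $q_l = -1$ if $y_l = 1$, with correlation argument $q_1 q_2 \rho$.

```python
from math import erf, exp, pi, sqrt

def norm_pdf(z):
    return exp(-0.5 * z * z) / sqrt(2.0 * pi)

def norm_cdf(z):
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def bvn_cdf(a, b, rho, n=2000):
    """Standard bivariate normal CDF for |rho| < 1 via Simpson's rule (n even)."""
    lo, hi = -8.5, min(a, 8.5)  # mass outside [-8.5, 8.5] is negligible
    if hi <= lo:
        return 0.0
    s = sqrt(1.0 - rho * rho)

    def f(z):
        return norm_pdf(z) * norm_cdf((b - rho * z) / s)

    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(lo + i * h)
    return total * h / 3.0

def biprobit_probs(xb1, xb2, rho):
    """p_jk for j, k in {1, 2}, using q = +1 for outcome 2 and q = -1 for outcome 1."""
    probs = {}
    for j in (1, 2):
        for k in (1, 2):
            q1 = 1 if j == 2 else -1
            q2 = 1 if k == 2 else -1
            probs[(j, k)] = bvn_cdf(q1 * xb1, q2 * xb2, q1 * q2 * rho)
    return probs

# Hypothetical index values x1'beta1 and x2'beta2, and correlation.
xb1, xb2, rho = 0.3, -0.4, 0.5

p_corr = biprobit_probs(xb1, xb2, rho)
p_indep = biprobit_probs(xb1, xb2, 0.0)
print(p_corr)
print(p_indep[(2, 2)], norm_cdf(xb1) * norm_cdf(xb2))
```

The four probabilities sum to one, and with ρ = 0 the joint probability reduces to the product of two univariate probit probabilities.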
Read the data and organize the response variables according to the following
specifications
Estimation of the multinomial logit model
Estimation of the multinomial probit model
Estimation of the ordered multinomial model
Estimation of the bivariate probit model
Comment on results