Econometrics - Qualitative Response Models
• Problems with the linear probability model (LPM):
• Non-normality of $u_i$
• This is not necessarily a problem: by the Central Limit Theorem, OLS estimators are approximately normally distributed in large samples.
• Heteroscedastic variance of $u_i$
• Because the variance of a Bernoulli distribution is a function of its mean: $\mathrm{Var}(u_i) = P_i(1 - P_i)$
• Non-fulfillment of $0 \le E(Y_i \mid X_i) \le 1$
• The real problem with using OLS to estimate the LPM
• Because the model is linear in $X_i$, fitted probabilities can fall below 0 or above 1
• Questionable $R^2$
• Usually much lower than in models with a continuous dependent variable
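The problems listed above can be seen in a small numerical sketch. The data below are hypothetical (invented for illustration, not from the source), and the closed-form single-regressor OLS formulas stand in for a regression routine. The fitted values at the extremes of $X$ leave the unit interval, and the implied Bernoulli variance $P(1-P)$ changes with $X$.

```python
# Hypothetical toy data (not from the source): x is a regressor, y a binary outcome.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Closed-form OLS slope and intercept for a single regressor.
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar


def lpm_fitted(x_value):
    """Fitted 'probability' from the linear probability model."""
    return b0 + b1 * x_value


for xi in x:
    p_hat = lpm_fitted(xi)
    in_range = 0.0 <= p_hat <= 1.0
    # The Bernoulli variance P(1 - P) is only meaningful for a valid probability.
    variance = p_hat * (1.0 - p_hat) if in_range else float("nan")
    print(f"x={xi:2d}  fitted={p_hat:6.3f}  valid={in_range}  var={variance:.3f}")
```

With these made-up numbers, the fitted value is negative at the smallest $X$ and exceeds 1 at the largest $X$, which is the bounds problem the logit and probit models are designed to avoid.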
• However, the logit coefficients only estimate changes in the log of the odds. After the logit estimation you still need to compute marginal effects to recover the marginal change in $\Pr(Y = 1)$ per unit of $X$, through $P_i = \dfrac{1}{1 + e^{-(\beta_0 + \beta_1 X_i)}}$
• In Stata
• Use the logit command to estimate the logit equation
• Follow it with either mfx or margins, dydx(*) to recover the marginal effects (margins replaced mfx starting with Stata 11)
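As a rough illustration of why marginal effects must be computed after a logit, the sketch below evaluates the logistic probability and its derivative with respect to $X$, which is $\beta_1 P_i (1 - P_i)$. The coefficient values are hypothetical, and the function names are illustrative only, not Stata's.

```python
import math


def logit_prob(b0, b1, x):
    """P(Y = 1 | X = x) under the logit model."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))


def logit_marginal_effect(b0, b1, x):
    """dP/dX at x: the coefficient scaled by P(1 - P)."""
    p = logit_prob(b0, b1, x)
    return b1 * p * (1.0 - p)


# Hypothetical coefficients, chosen only for illustration.
b0, b1 = -2.0, 0.5

for x in (0.0, 4.0, 8.0):
    p = logit_prob(b0, b1, x)
    me = logit_marginal_effect(b0, b1, x)
    print(f"x={x:4.1f}  P={p:.4f}  dP/dX={me:.4f}")
```

The coefficient $\beta_1$ is constant, but the marginal effect on the probability is not: it is largest where $P_i = 0.5$ and shrinks toward the tails, which is why it has to be evaluated at specific values of $X$ or averaged over the sample.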
The LPM, Logit, and Probit Models
• The Probit Model
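• Uses the normal CDF: based on the pdf $f(X) = \frac{1}{\sqrt{2\sigma^2\pi}} e^{-\frac{(X-\mu)^2}{2\sigma^2}}$, whose CDF is $F(X_0) = \int_{-\infty}^{X_0} \frac{1}{\sqrt{2\sigma^2\pi}} e^{-\frac{(X-\mu)^2}{2\sigma^2}}\,dX$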
• Say that some decision $Y_i \in \{0, 1\}$ is based on an unobservable latent index $I_i$ that is determined by a set of variables $X_i$ (expressed as $I_i = \beta_0 + \beta_1 X_i$), such that the larger $I_i$ is, the greater the probability that $Y_i = 1$. There must exist some threshold $I_i^*$ such that $Y_i = 1$ if $I_i \ge I_i^*$ and $Y_i = 0$ if $I_i < I_i^*$. If $I_i^*$ is normally distributed with the same mean and variance as $I_i$, then it is possible to estimate the parameters of the index.
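• $P_i = P(Y = 1 \mid X) = P(I_i \ge I_i^*) = P(\beta_0 + \beta_1 X \ge Z_i) = F(\beta_0 + \beta_1 X)$, where $Z_i \sim N(0, 1)$ and $F$ is the standard normal CDF: $F(I_i) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{I_i} e^{-z^2/2}\,dz = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\beta_0 + \beta_1 X} e^{-z^2/2}\,dz$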
• $P_i$ is then the area under the standard normal curve from $-\infty$ to $I_i$. We recover $I_i$ by taking the inverse of $F$: $I_i = F^{-1}(P_i) = \beta_0 + \beta_1 X$
• In Stata
• Use the probit command
• Follow it with the mfx or margins, dydx(*) command post-estimation
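The probit mapping between the index and the probability can be sketched with Python's standard-library `statistics.NormalDist`. The coefficients are hypothetical and the function names are illustrative. The sketch computes $P_i = F(\beta_0 + \beta_1 X)$, inverts it to recover the index, and evaluates the marginal effect $\beta_1 f(\beta_0 + \beta_1 X)$, where $f$ is the standard normal pdf.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def probit_prob(b0, b1, x):
    """P(Y = 1 | X = x): standard normal CDF evaluated at the index."""
    return STD_NORMAL.cdf(b0 + b1 * x)


def probit_index(p):
    """Recover the index I = F^{-1}(P) from a probability."""
    return STD_NORMAL.inv_cdf(p)


def probit_marginal_effect(b0, b1, x):
    """dP/dX at x: the coefficient scaled by the normal density at the index."""
    return b1 * STD_NORMAL.pdf(b0 + b1 * x)


# Hypothetical coefficients, chosen only for illustration.
b0, b1 = -1.0, 0.5

for x in (0.0, 2.0, 4.0):
    p = probit_prob(b0, b1, x)
    print(
        f"x={x:3.1f}  index={b0 + b1 * x:5.2f}  P={p:.4f}  "
        f"recovered index={probit_index(p):5.2f}  "
        f"dP/dX={probit_marginal_effect(b0, b1, x):.4f}"
    )
```

As in the logit case, the marginal effect depends on where along the curve it is evaluated, so a single coefficient does not summarize the change in probability.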
Extensions and Applications
• Multinomial Responses
• Multiple outcomes that are determined by the same set of regressors.
• Outcomes must be mutually exclusive (if you choose one, you cannot choose the others)
• Ex: transportation choice (car vs. bus vs. train vs. …), career choice after SHS (work vs. continue education vs. part-time vs. stop)
• The underlying theory is a random utility model: say an individual chooses from a set $\{A, B, C\}$ of mutually exclusive alternatives. The individual will choose $A$ (that is, $A \succ B$ and $A \succ C$) if $U(A) > U(B)$ and $U(A) > U(C)$.
• Note $U(\cdot) = \beta_0 + \beta_1 X$, that is, the utility from a choice can be a function of regressors
• Recall that in the binary case, the odds ratio can be expressed as $\frac{P_i}{1 - P_i} = e^{Z_i}$, where $Z_i = \beta_0 + \beta_1 X_i$
• In the multinomial case, where multiple outcomes $j = 1, \dots, m$ are determined by the same regressors, $p_{ij} = \Pr(y_i = j) = \frac{e^{\beta_{0j} + \beta_{1j} X_i}}{\sum_{k=1}^{m} e^{\beta_{0k} + \beta_{1k} X_i}} = \frac{e^{Z_{ij}}}{\sum_{k=1}^{m} e^{Z_{ik}}}$, $j = 1, \dots, m$
• A positive coefficient, for example, does not necessarily mean that an increase in that regressor increases the probability of that alternative.
• The coefficient is interpreted relative to the reference (base) category: a positive coefficient means that an increase in the regressor raises the odds of that alternative relative to the base category.
• The model is estimated by maximum likelihood.
• In Stata
• Use the mlogit or mprobit command
• Follow it with margins, dydx(*)
• Using the rrr (relative risk ratio) option of mlogit makes interpretation a bit easier: you interpret not in terms of changes in probability, but in terms of changes in likelihood relative to the base category
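The point about interpreting multinomial coefficients relative to the base category can be checked numerically. The sketch below assumes the usual identifying normalization (the base category's coefficients are set to zero) and uses hypothetical coefficients for three alternatives. Alternative B has a positive slope, yet its probability falls as $X$ rises because alternative C's slope is larger, while B's relative risk ratio against the base A equals $e^{\beta_{1B}}$.

```python
import math


def mnl_probs(coefs, x):
    """Multinomial logit probabilities for one observation.

    coefs is a list of (intercept, slope) pairs, one per alternative;
    the base alternative is assumed to have coefficients (0, 0).
    """
    z = [b0 + b1 * x for b0, b1 in coefs]
    z_max = max(z)  # subtract the max before exponentiating for numerical stability
    exp_z = [math.exp(v - z_max) for v in z]
    total = sum(exp_z)
    return [e / total for e in exp_z]


# Hypothetical coefficients: A is the base category, B and C have positive slopes.
coefs = [(0.0, 0.0), (0.0, 0.5), (0.0, 2.0)]

p_at_0 = mnl_probs(coefs, 0.0)
p_at_1 = mnl_probs(coefs, 1.0)

# Relative risk ratio for B vs. the base A for a one-unit increase in X.
rrr_b = (p_at_1[1] / p_at_1[0]) / (p_at_0[1] / p_at_0[0])

print("P(A), P(B), P(C) at X=0:", [round(p, 4) for p in p_at_0])
print("P(A), P(B), P(C) at X=1:", [round(p, 4) for p in p_at_1])
print("RRR of B vs. A:", round(rrr_b, 4), " exp(0.5):", round(math.exp(0.5), 4))
```

This is why the marginal effects on probabilities are computed separately rather than read off the coefficients.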
• Ordered Responses
• For outcomes with natural ordering
• E.g., self-rated health (excellent, good, fair, poor), Credit Ratings (AAA, AA, Baa, C…)
• Start with an index model without an intercept: $y_i^* = \mathbf{X}_i'\boldsymbol{\beta} + u_i$
• As $y^*$ crosses a series of increasing unknown thresholds, we move up the ordering of alternatives.
• For some very low $y^*$, health status is poor; for $y^* > \alpha_{fair}$, health status is fair; for $y^* > \alpha_{good}$, where $\alpha_{good} > \alpha_{fair}$, health status is good; and so on.
• Visually, this is $\text{health status} = \begin{cases} \text{poor} & \text{if } y^* \le \alpha_{fair} \\ \text{fair} & \text{if } \alpha_{fair} < y^* \le \alpha_{good} \\ \text{good} & \text{if } y^* > \alpha_{good} \\ \vdots \end{cases}$
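• In general, for an $m$-alternative ordered model, we define
• $y_i = j$ if $\alpha_{j-1} < y_i^* \le \alpha_j$, where $\alpha_0 = -\infty$ and $\alpha_m = \infty$
• $\Pr(y_i = j) = \Pr(\alpha_{j-1} < y_i^* \le \alpha_j) = \Pr(\alpha_{j-1} < \mathbf{X}_i'\boldsymbol{\beta} + u_i \le \alpha_j)$
• $= \Pr(\alpha_{j-1} - \mathbf{X}_i'\boldsymbol{\beta} < u_i \le \alpha_j - \mathbf{X}_i'\boldsymbol{\beta}) = F(\alpha_j - \mathbf{X}_i'\boldsymbol{\beta}) - F(\alpha_{j-1} - \mathbf{X}_i'\boldsymbol{\beta})$
• where $F(\cdot)$ is the cdf of $u_i$ (logistic for the ordered logit, standard normal for the ordered probit)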
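The probability formula above can be sketched for the ordered probit case, taking $F$ to be the standard normal CDF. The cutpoints and index values are hypothetical, and `ordered_probit_probs` is an illustrative name, not a Stata or library function. Setting the cumulative probability to 0 below the first cutpoint and 1 above the last plays the role of $\alpha_0 = -\infty$ and $\alpha_m = \infty$.

```python
from statistics import NormalDist


def ordered_probit_probs(xb, cutpoints):
    """Pr(y = j) for j = 1..m given the index xb and m - 1 increasing cutpoints.

    Implements F(alpha_j - xb) - F(alpha_{j-1} - xb) with F the standard
    normal CDF, F(alpha_0 - xb) = 0 and F(alpha_m - xb) = 1.
    """
    cdf = NormalDist().cdf
    cumulative = [0.0] + [cdf(a - xb) for a in cutpoints] + [1.0]
    return [cumulative[j] - cumulative[j - 1] for j in range(1, len(cumulative))]


# Hypothetical cutpoints separating four categories: poor, fair, good, excellent.
cutpoints = [-1.0, 0.0, 1.0]

low_index = ordered_probit_probs(0.0, cutpoints)
high_index = ordered_probit_probs(1.5, cutpoints)

labels = ["poor", "fair", "good", "excellent"]
for label, p_low, p_high in zip(labels, low_index, high_index):
    print(f"{label:9s}  Pr at xb=0.0: {p_low:.4f}   Pr at xb=1.5: {p_high:.4f}")
```

Raising the index shifts probability mass from the lowest category toward the highest, while the middle categories can move in either direction.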
• Estimated using maximum likelihood
• This is different from count data estimation or skewed data estimation (which may use Poisson pseudo-maximum likelihood)
• In Stata
• Use the ologit and oprobit commands
• However, getting the marginal effects isn't as straightforward:
• Use mfx, predict(outcome(#)), where # is the outcome whose marginal effects on the probability you would like to view. Do this for every outcome if you want to see how the marginal effects change across all outcomes.
• See also the margins and predict post-estimation commands
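To see why marginal effects have to be requested outcome by outcome, the sketch below differentiates $\Pr(y_i = j) = F(\alpha_j - \mathbf{X}_i'\boldsymbol{\beta}) - F(\alpha_{j-1} - \mathbf{X}_i'\boldsymbol{\beta})$ with respect to one regressor for an ordered probit, giving $[f(\alpha_{j-1} - \mathbf{X}_i'\boldsymbol{\beta}) - f(\alpha_j - \mathbf{X}_i'\boldsymbol{\beta})]\beta_k$ with $f$ the standard normal pdf. The numbers are hypothetical and the function name is illustrative only.

```python
from statistics import NormalDist


def ordered_probit_marginal_effects(xb, cutpoints, beta_k):
    """dPr(y = j)/dx_k for every outcome j of an ordered probit.

    The density is taken as 0 at alpha_0 = -inf and alpha_m = +inf,
    so the effects across outcomes telescope to zero.
    """
    pdf = NormalDist().pdf
    density = [0.0] + [pdf(a - xb) for a in cutpoints] + [0.0]
    return [(density[j - 1] - density[j]) * beta_k for j in range(1, len(density))]


# Hypothetical index value, cutpoints, and coefficient.
effects = ordered_probit_marginal_effects(xb=0.2, cutpoints=[-1.0, 0.0, 1.0], beta_k=0.5)

for outcome, effect in enumerate(effects, start=1):
    print(f"outcome {outcome}: dPr/dx = {effect:+.4f}")
print("sum of effects:", round(sum(effects), 12))
```

With a positive coefficient, the effect is negative for the lowest outcome and positive for the highest, the sign for middle outcomes depends on where the index sits, and the effects sum to zero because the probabilities must sum to one.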
References
• Cameron, C., and Trivedi, P., (2005). Microeconometrics.
• Gujarati, D., and Porter, D., (2009). Basic Econometrics. Singapore City: McGraw-Hill.