Econometrics

Qualitative Response Models


Outline
• Why Use Qualitative Response Models

• The LPM, Logit, and Probit Models

• Extensions and Applications


• Ordered responses
• Multinomial responses
Why Use Qualitative Response Models
• You want to estimate a model where the dependent variable is a qualitative indicator
• Could be a binary dummy variable (yes or no, participate or not)…
• What influences labor force participation decisions?
• What affects access to decent work?
• What influences the decision to use a type of technology?
• Who receives remittances?
• Who is likely to receive conditional cash transfers?

• …or a categorical or ordered variable
• What influences the mode of transport people use? (bus, car, walk, bike, train, etc.)
• What affects the nature of workers’ employment? (permanent, casual, project-based, multiple)
• What affects the credit ratings of companies? (C, B, Baa, AAA, etc.)
• What affects consumer satisfaction with service delivery? (e.g., responses to "Delivery is good": strongly disagree, somewhat disagree, neutral, somewhat agree, strongly agree)
The LPM, Logit, and Probit Models
• Will OLS work?
• Yes, but with serious problems.
• Applying OLS to a binary dependent variable gives what is called the Linear Probability Model (LPM).
• Consider 𝑌𝑖 = 𝛽0 + 𝛽1 𝑋𝑖 + 𝑢𝑖
• Where
• 𝑌𝑖 = 1[person 𝑖 goes to school]
• This is an indicator function: it takes the value 1 if the logical statement in the brackets is true, and 0 otherwise
• Equivalently, $Y_i \in \{0, 1\}$, where $Y_i = 1$ if the statement is true and 0 otherwise
• 𝑋𝑖 = family income

• $E(Y_i \mid X_i)$ can be interpreted as the conditional probability that the event $Y_i = 1$ will occur given $X_i$, that is, $\Pr(Y_i = 1 \mid X_i)$
• $Y_i$ here follows the Bernoulli probability distribution, where $P_i$ is the probability that $Y_i = 1$ and $1 - P_i$ is the probability that $Y_i = 0$
• If there are $n$ independent trials, each with probability of success $p$ and failure $1 - p$, and $X$ is the number of successes, then $X$ follows the binomial distribution, $X \sim b(n, p)$, with mean $np$ and variance $np(1 - p)$
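Because $Y_i$ takes only the values 0 and 1 (so $Y_i^2 = Y_i$), its conditional mean and variance follow in one step from the Bernoulli distribution. This worked derivation shows why the LPM's $E(Y_i \mid X_i)$ is a probability, and why the variance of $u_i = Y_i - E(Y_i \mid X_i)$ depends on $X_i$:

```latex
\begin{aligned}
E(Y_i \mid X_i) &= 1 \cdot P_i + 0 \cdot (1 - P_i) = P_i = \beta_0 + \beta_1 X_i \\
\operatorname{Var}(u_i \mid X_i) = \operatorname{Var}(Y_i \mid X_i)
  &= E(Y_i^2 \mid X_i) - \left[E(Y_i \mid X_i)\right]^2 = P_i - P_i^2 = P_i(1 - P_i)
\end{aligned}
```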
• Since $E(Y_i \mid X_i) = P_i$ is a probability, it must satisfy $0 \le E(Y_i \mid X_i) \le 1$

• Problems:
• Non-normality of $u_i$
• Because $Y_i$ is 0 or 1, $u_i = Y_i - \beta_0 - \beta_1 X_i$ takes only two values
• This is not necessarily a problem: by the Central Limit Theorem, OLS estimators tend to be normally distributed in large samples even when $u_i$ is not.
• Heteroscedastic variance of $u_i$
• Because the variance of a Bernoulli variable is a function of its mean: $\operatorname{Var}(u_i \mid X_i) = P_i(1 - P_i)$, which changes with $X_i$
• Non-fulfillment of $0 \le E(Y_i \mid X_i) \le 1$
• The real problem with using OLS to estimate the LPM
• Because the model is linear in $X_i$, predicted probabilities $\hat{Y}_i$ can fall below 0 or exceed 1
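• Questionable $R^2$
• Because $Y_i$ takes only the values 0 and 1, no straight line fits the scatter of observations well, so the conventional $R^2$ is usually much lower than in typical regressions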

• We need alternatives that keep predicted probabilities within the 0 to 1 range: something that resembles a cumulative distribution function (CDF).
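A minimal Python/NumPy sketch of these problems, under stated assumptions: the data are simulated, the regressor $X$ is hypothetical, and the true probability is assumed to follow an S-shaped logistic curve in $X$. OLS on the 0/1 outcome then yields fitted "probabilities" outside $[0, 1]$, and the implied Bernoulli variance $\hat{P}_i(1 - \hat{P}_i)$ differs across observations and even turns negative where the fit leaves the unit interval.

```python
import numpy as np

def fit_lpm(x, y):
    """OLS of a 0/1 outcome on a constant and x: the linear probability model."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, X @ beta

rng = np.random.default_rng(0)
x = np.linspace(-4.0, 4.0, 401)            # hypothetical regressor (e.g., scaled family income)
p_true = 1.0 / (1.0 + np.exp(-2.0 * x))    # assumed S-shaped true Pr(Y = 1 | X)
y = rng.binomial(1, p_true).astype(float)  # Y_i = 1[event occurs], Bernoulli(p_true)

beta, fitted = fit_lpm(x, y)
implied_var = fitted * (1.0 - fitted)      # Bernoulli variance P(1 - P) at the fitted values
outside = (fitted < 0.0) | (fitted > 1.0)

print("intercept, slope:", beta)
print("range of fitted 'probabilities':", fitted.min(), fitted.max())
print("share of fitted values outside [0, 1]:", outside.mean())
print("range of implied variances:", implied_var.min(), implied_var.max())
```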
• The Logit Model
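• Follows the representation
$$P_i = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X_i)}} \quad\text{or}\quad P_i = \frac{1}{1 + e^{-Z_i}} = \frac{e^{Z_i}}{1 + e^{Z_i}}, \quad\text{where } Z_i = \beta_0 + \beta_1 X_i$$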
• Known as the (cumulative) logistic distribution function
• $Z_i \in (-\infty, +\infty)$, $P_i \in (0, 1)$, and $P_i$ is non-linearly related to $Z_i$
• But this cannot be estimated using OLS because it is non-linear in 𝑋’s and 𝛽’s.
• This can be linearized.
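• Note that $1 - P_i = 1 - \dfrac{e^{Z_i}}{1 + e^{Z_i}} = \dfrac{1}{1 + e^{Z_i}}$,
• So we can write the odds ratio (the odds in favor of $Y_i = 1$) as $\dfrac{P_i}{1 - P_i} = \dfrac{e^{Z_i}/(1 + e^{Z_i})}{1/(1 + e^{Z_i})} = e^{Z_i}$
• And if you take the log of the odds ratio, $L_i = \ln\left(\dfrac{P_i}{1 - P_i}\right) = \ln e^{Z_i} = Z_i = \beta_0 + \beta_1 X_i$
• This is now linear in both $X$ and the $\beta$'s; $L_i$ is called the logit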
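A small check of this linearization in plain Python, with illustrative coefficients $\beta_0 = -1.5$ and $\beta_1 = 0.8$ chosen only for the example: the odds $P_i/(1 - P_i)$ equal $e^{Z_i}$, and the log of the odds returns $Z_i$ exactly, even though $P_i$ itself is non-linear in $X_i$.

```python
import math

def logistic(z):
    """Cumulative logistic distribution: P = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def log_odds(p):
    """Logit transform: L = ln(P / (1 - P))."""
    return math.log(p / (1.0 - p))

beta0, beta1 = -1.5, 0.8   # illustrative coefficients, not estimates
for x in (0.0, 1.0, 2.5, 5.0):
    z = beta0 + beta1 * x
    p = logistic(z)
    odds = p / (1.0 - p)
    print(f"X={x:3.1f}  Z={z:+.2f}  P={p:.4f}  odds={odds:.4f}  e^Z={math.exp(z):.4f}  L={log_odds(p):+.2f}")
```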
• Properties
• As $P$ goes from 0 to 1 (that is, as $Z$ varies from $-\infty$ to $+\infty$), $L_i$ goes from $-\infty$ to $+\infty$
• Although $L_i$ is linear in $X$, the probabilities themselves are not, unlike in the LPM.
• Can have multiple regressors
• If $\beta_1$ is positive, an increase in $X$ raises the log of the odds, so the odds of $Y = 1$ increase; if $\beta_1$ is negative, the odds of $Y = 1$ decrease as $X$ increases

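• Estimation is done using maximum likelihood (ML)
• This is ideally a large-sample method; the standard errors are asymptotic.
• Significance of individual coefficients is tested using the standard normal $Z$ statistic.
• Joint significance is tested using the likelihood ratio (LR) statistic, which is asymptotically $\chi^2$ with degrees of freedom equal to the number of regressors excluding the intercept
• A pseudo $R^2$ is reported instead of the usual $R^2$: McFadden's $R^2 = 1 - \ln L_{UR} / \ln L_0$, where $L_{UR}$ is the likelihood of the fitted model and $L_0$ that of the intercept-only model, or the count $R^2 = \dfrac{\#\text{ of correct predictions}}{\#\text{ of observations}}$ (predicting $Y_i = 1$ when $\hat{P}_i > 0.5$)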

• However, the logit coefficients only measure changes in the log of the odds ratio. You still need to compute marginal effects after the logit estimation to recover the marginal change in $\Pr(Y = 1)$ per unit of $X$, through $P_i = \dfrac{1}{1 + e^{-(\beta_0 + \beta_1 X_i)}}$, which gives $\dfrac{\partial P_i}{\partial X_i} = \beta_1 P_i (1 - P_i)$

• In Stata
• Check out the logit command to estimate the logit equation
• Follow it with either the mfx or the margins, dydx(*) command to recover the marginal effects
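A self-contained NumPy sketch of the estimation steps on this slide, assuming simulated data with true coefficients $\beta_0 = -0.5$ and $\beta_1 = 1.2$. It maximizes the logit log-likelihood by Newton-Raphson (one standard ML algorithm, not necessarily what Stata's logit uses internally), then computes asymptotic standard errors, $Z$ statistics, the LR statistic, McFadden's and the count $R^2$, and the marginal effect $\beta_1 P_i(1 - P_i)$, both at the mean of $X$ (the default of mfx) and averaged over the sample (the default of margins, dydx(*)).

```python
import numpy as np

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logit(X, y, tol=1e-10, max_iter=50):
    """Logit by maximum likelihood (Newton-Raphson). X must include a column of ones."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = logistic(X @ beta)
        score = X.T @ (y - p)                        # gradient of the log-likelihood
        info = (X * (p * (1.0 - p))[:, None]).T @ X  # information matrix = minus the Hessian
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    p = logistic(X @ beta)
    info = (X * (p * (1.0 - p))[:, None]).T @ X
    se = np.sqrt(np.diag(np.linalg.inv(info)))       # asymptotic standard errors
    return beta, se, p

def log_lik(y, p):
    return float(np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))

# Simulated data (assumed true model: beta0 = -0.5, beta1 = 1.2)
rng = np.random.default_rng(1)
n = 2000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = (rng.random(n) < logistic(-0.5 + 1.2 * x)).astype(float)

beta, se, p_hat = fit_logit(X, y)
z_stats = beta / se                                  # compare with standard normal critical values

ll_full = log_lik(y, p_hat)
ll_null = log_lik(y, np.full(n, y.mean()))           # intercept-only model
lr_stat = 2.0 * (ll_full - ll_null)                  # asymptotically chi-squared, df = 1 slope here
mcfadden_r2 = 1.0 - ll_full / ll_null
count_r2 = np.mean((p_hat > 0.5) == (y == 1.0))      # correct predictions / observations

# Marginal effect of X on Pr(Y = 1): dP/dX = beta1 * P * (1 - P)
p_at_mean = logistic(beta[0] + beta[1] * x.mean())
me_at_mean = beta[1] * p_at_mean * (1.0 - p_at_mean) # evaluated at the mean of X
ame = np.mean(beta[1] * p_hat * (1.0 - p_hat))       # averaged over the sample

print("beta:", beta, "se:", se, "z:", z_stats)
print("LR:", lr_stat, "McFadden R2:", mcfadden_r2, "Count R2:", count_r2)
print("ME at mean:", me_at_mean, "AME:", ame)
```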
• The Probit Model
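• Uses the normal CDF, based on the pdf $f(X) = \dfrac{1}{\sqrt{2\sigma^2\pi}}\, e^{-\frac{(X - \mu)^2}{2\sigma^2}}$, whose CDF is $F(X_0) = \displaystyle\int_{-\infty}^{X_0} \dfrac{1}{\sqrt{2\sigma^2\pi}}\, e^{-\frac{(X - \mu)^2}{2\sigma^2}}\, dX$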
• Say that some decision $Y_i \in \{0, 1\}$ is based on an unobservable latent index $I_i$ that is determined by a set of variables $X_i$ (expressed $I_i = \beta_0 + \beta_1 X_i$), such that the larger $I_i$, the greater the probability that $Y_i = 1$. There must exist some threshold $I_i^*$ such that $Y_i = 1$ if $I_i \ge I_i^*$ and $Y_i = 0$ if $I_i < I_i^*$. The threshold $I_i^*$ is also unobservable, but if it is normally distributed with the same mean and variance as $I_i$, then it is possible to estimate the parameters of the index.
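• $P_i = P(Y_i = 1 \mid X_i) = P(I_i \ge I_i^*) = P(Z_i \le \beta_0 + \beta_1 X_i) = F(\beta_0 + \beta_1 X_i)$, where $Z_i \sim N(0, 1)$ is the standard normal variable and $F$ is the standard normal CDF:
$$F(I_i) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{I_i} e^{-z^2/2}\, dz = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\beta_0 + \beta_1 X_i} e^{-z^2/2}\, dz$$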
• $P_i$ is then the area under the standard normal curve from $-\infty$ to $I_i$. We recover $I_i$ by taking the inverse of $F$: $I_i = F^{-1}(F(I_i)) = F^{-1}(P_i) = \beta_0 + \beta_1 X_i$
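A minimal sketch of this index mapping using only Python's standard library (statistics.NormalDist, available from Python 3.8), with illustrative coefficients $\beta_0 = -1$ and $\beta_1 = 0.5$ assumed for the example: $F$ turns the index $I_i$ into $P_i$, and $F^{-1}$ recovers $I_i$ from $P_i$.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)

beta0, beta1 = -1.0, 0.5                   # illustrative index coefficients (assumed)
for x in (0.0, 2.0, 4.0):
    index = beta0 + beta1 * x              # I_i = beta0 + beta1 * X_i
    p = std_normal.cdf(index)              # P_i = F(I_i): area under N(0, 1) up to I_i
    recovered = std_normal.inv_cdf(p)      # I_i = F^{-1}(P_i)
    print(f"X={x}: I={index:+.3f}  P={p:.4f}  F^-1(P)={recovered:+.3f}")
```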
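• This is also estimated using maximum likelihood
• As with the logit model, the estimated coefficients do not directly give the marginal contribution of $X$ to the probability that $Y = 1$; they give its contribution to the standard normal index $I_i$.
• Need to recover marginal effects: $\dfrac{\partial P_i}{\partial X_i} = \phi(\beta_0 + \beta_1 X_i)\,\beta_1$, where $\phi$ is the standard normal pdf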

• In Stata
• Use the probit command
• Followed by the mfx or margins, dydx(*) commands post-estimation
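The probit marginal effect follows from the chain rule, $\partial P_i/\partial X_i = \phi(\beta_0 + \beta_1 X_i)\beta_1$. A short standard-library sketch, with illustrative coefficients and a hypothetical set of $X$ values (assumptions, not data), computes it at each $X$, averages it over the sample, and compares it with a numerical derivative of $F$:

```python
from statistics import NormalDist

std_normal = NormalDist()   # mean 0, standard deviation 1

def probit_prob(beta0, beta1, x):
    """P_i = F(beta0 + beta1 * X_i), F = standard normal CDF."""
    return std_normal.cdf(beta0 + beta1 * x)

def probit_marginal_effect(beta0, beta1, x):
    """dP_i/dX_i = phi(beta0 + beta1 * X_i) * beta1, phi = standard normal pdf."""
    return std_normal.pdf(beta0 + beta1 * x) * beta1

beta0, beta1 = -1.0, 0.5              # illustrative coefficients (assumed)
xs = [0.0, 1.0, 2.0, 3.0, 4.0]        # hypothetical sample values of X
effects = [probit_marginal_effect(beta0, beta1, x) for x in xs]
ame = sum(effects) / len(effects)     # average marginal effect over the sample

h = 1e-6                              # step for a central finite-difference check
numeric = [(probit_prob(beta0, beta1, x + h) - probit_prob(beta0, beta1, x - h)) / (2 * h) for x in xs]
for x, me, nd in zip(xs, effects, numeric):
    print(f"X={x}: analytic {me:.6f}  numerical {nd:.6f}")
print("average marginal effect:", ame)
```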
Extensions and Applications
• Multinomial Responses
• Multiple outcomes that are determined by the same set of regressors.
• Outcomes must be mutually exclusive (choosing one rules out the others)
• Ex: transportation choice (car vs. bus vs. train vs. …), career choice after SHS (work vs. continue education vs. part-time vs. stop)
• The theory behind it relies on a random utility model. Say an individual chooses from a set of mutually exclusive alternatives $\{A, B, C\}$. The individual will choose $A$ (that is, $A \succ B$ and $A \succ C$) if $U(A) > U(B)$ and $U(A) > U(C)$.
• Note that $U(\cdot) = \beta_0 + \beta_1 X$ plus a random component; that is, the utility from a choice can be a function of regressors

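• Recall that in the binary case, the odds ratio can be expressed as $\dfrac{P_i}{1 - P_i} = e^{Z_i}$
• In the multinomial case, where multiple outcomes $j = 1, \dots, m$ are determined by the same regressors,
$$p_{ij} = \Pr(y_i = j) = \frac{e^{\beta_{0j} + \beta_{1j} X_i}}{\sum_{k=1}^{m} e^{\beta_{0k} + \beta_{1k} X_i}} = \frac{e^{Z_{ij}}}{\sum_{k=1}^{m} e^{Z_{ik}}}, \quad j = 1, \dots, m, \quad\text{where } Z_{ij} = \beta_{0j} + \beta_{1j} X_i$$
• The coefficients of one outcome (the base category) are normalized to zero, since adding the same constant to every $Z_{ij}$ leaves the probabilities unchanged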
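The link between the random utility model and this formula can be checked by simulation. A minimal NumPy sketch, assuming (this is not stated on the slide) that utility is $U_j = Z_{ij} + \varepsilon_j$ with i.i.d. type I extreme value (Gumbel) errors, the standard assumption under which utility maximization yields the multinomial logit. The coefficients for alternatives $A$, $B$, $C$ are hypothetical, with $C$ as the base category. The last line also shows that adding a common constant to every $Z_{ij}$ leaves the probabilities unchanged, which is why one category's coefficients are normalized to zero.

```python
import numpy as np

def mnl_probs(z):
    """Multinomial logit: p_j = exp(Z_j) / sum_k exp(Z_k)."""
    ez = np.exp(z - np.max(z))   # subtracting the max avoids overflow; the ratio is unchanged
    return ez / ez.sum()

# Hypothetical coefficients (beta0_j, beta1_j) for A, B, C; C is the base category (0, 0)
beta0 = np.array([0.5, 0.2, 0.0])
beta1 = np.array([0.3, 0.6, 0.0])
x = 2.0
z = beta0 + beta1 * x            # systematic utilities Z_ij = beta0_j + beta1_j * X_i

# Random utility: U_j = Z_j + eps_j, eps_j i.i.d. standard Gumbel (type I extreme value)
rng = np.random.default_rng(42)
draws = 200_000
utility = z + rng.gumbel(loc=0.0, scale=1.0, size=(draws, z.size))
choice = utility.argmax(axis=1)  # pick A if U(A) > U(B) and U(A) > U(C), and so on
simulated = np.bincount(choice, minlength=z.size) / draws

formula = mnl_probs(z)
print("simulated choice shares:", simulated)
print("logit formula:          ", formula)
print("unchanged by a common shift:", np.allclose(formula, mnl_probs(z + 5.0)))
```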
• A positive coefficient, for example, does not necessarily mean that an increase in that regressor increases the probability of that alternative.
• Coefficients are relative to the reference (base) category: $\beta_{1j}$ measures how a unit increase in $X$ changes the log of the odds of alternative $j$ relative to the base category, $\ln(p_{ij}/p_{i,\text{base}})$
• This is estimated using maximum likelihood.

• In Stata
• Check out the mlogit and mprobit commands
• Followed by margins, dydx(*)
• Using the rrr (relative risk ratio) option with mlogit makes interpretation a bit easier: instead of changes in probability, you interpret $e^{\beta_{1j}}$, the factor by which the relative risk $p_{ij}/p_{i,\text{base}}$ changes for a unit increase in $X$
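A minimal NumPy sketch with hypothetical coefficients (outcome 0 is the base category, normalized to zero) illustrating both points on this slide: $e^{\beta_{1j}}$ is exactly the factor by which $p_{ij}/p_{i,\text{base}}$ changes when $X$ rises by one unit, yet the probability of outcome 1 can fall even though its coefficient is positive, because the relative risk of outcome 2 grows faster.

```python
import numpy as np

def mnl_probs(beta0, beta1, x):
    """p_j = exp(beta0_j + beta1_j x) / sum_k exp(beta0_k + beta1_k x)."""
    z = beta0 + beta1 * x
    ez = np.exp(z - z.max())         # stabilized; probabilities unchanged
    return ez / ez.sum()

# Hypothetical coefficients; outcome 0 is the base category (normalized to zero)
beta0 = np.array([0.0, 0.4, -1.0])
beta1 = np.array([0.0, 0.3, 0.9])

p_before = mnl_probs(beta0, beta1, 2.0)
p_after = mnl_probs(beta0, beta1, 3.0)

# Relative risk ratio: exp(beta1_j) = factor change in p_j / p_base per unit increase in X
rrr = np.exp(beta1)
rr_change = (p_after[1] / p_after[0]) / (p_before[1] / p_before[0])

print("Pr(y=1) at X=2 and X=3:", p_before[1], p_after[1])  # falls despite beta1_1 > 0
print("rrr for outcome 1:", rrr[1], " observed change in p_1/p_0:", rr_change)
```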
• Ordered Responses
• For outcomes with a natural ordering
• E.g., self-rated health (excellent, good, fair, poor), credit ratings (AAA, AA, Baa, C…)
• Start with an index model without an intercept (the intercept cannot be identified separately from the thresholds): $y_i^* = \mathbf{X}_i'\boldsymbol{\beta} + u_i$
• As $y^*$ crosses a series of increasing unknown thresholds, we move up the ordering of alternatives.
• For example, for very low $y^*$, health status is poor; for $y^* > \alpha_{fair}$, health status is fair; then for $y^* > \alpha_{good}$, where $\alpha_{good} > \alpha_{fair}$, health status is good; and so on…
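• In piecewise form:
$$\text{health status} = \begin{cases} \text{poor} & \text{if } y^* \le \alpha_{fair} \\ \text{fair} & \text{if } \alpha_{fair} < y^* \le \alpha_{good} \\ \text{good} & \text{if } \alpha_{good} < y^* \le \alpha_{excellent} \\ \text{excellent} & \text{if } y^* > \alpha_{excellent} \end{cases}$$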

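• In general, for an $m$-alternative ordered model, we define
• $y_i = j$ if $\alpha_{j-1} < y_i^* \le \alpha_j$, where $\alpha_0 = -\infty$ and $\alpha_m = \infty$
• $\Pr(y_i = j) = \Pr(\alpha_{j-1} < y_i^* \le \alpha_j) = \Pr(\alpha_{j-1} < \mathbf{X}_i'\boldsymbol{\beta} + u_i \le \alpha_j)$
• $= \Pr(\alpha_{j-1} - \mathbf{X}_i'\boldsymbol{\beta} < u_i \le \alpha_j - \mathbf{X}_i'\boldsymbol{\beta}) = F(\alpha_j - \mathbf{X}_i'\boldsymbol{\beta}) - F(\alpha_{j-1} - \mathbf{X}_i'\boldsymbol{\beta})$
• Where $F(\cdot)$ is the CDF of $u_i$: the standard normal CDF for the ordered probit, the logistic CDF for the ordered logit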
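A minimal standard-library sketch of this formula, assuming hypothetical thresholds for poor, fair, good, and excellent and a hypothetical value of the index $\mathbf{X}_i'\boldsymbol{\beta}$; swapping the CDF switches between the ordered probit and the ordered logit.

```python
import math
from statistics import NormalDist

def ordered_probs(xb, cutpoints, cdf):
    """Pr(y = j) = F(alpha_j - x'b) - F(alpha_{j-1} - x'b), with alpha_0 = -inf, alpha_m = +inf."""
    upper = [cdf(a - xb) for a in cutpoints] + [1.0]   # F(alpha_j - x'b); F(+inf) = 1
    lower = [0.0] + upper[:-1]                          # F(alpha_{j-1} - x'b); F(-inf) = 0
    return [u - l for u, l in zip(upper, lower)]

probit_cdf = NormalDist().cdf        # ordered probit: u_i ~ N(0, 1)

def logit_cdf(t):                    # ordered logit: u_i standard logistic
    return 1.0 / (1.0 + math.exp(-t))

# Hypothetical increasing thresholds for poor | fair | good | excellent, and index x'b
cutpoints = [-1.0, 0.0, 1.2]
xb = 0.3
print("ordered probit:", ordered_probs(xb, cutpoints, probit_cdf))
print("ordered logit: ", ordered_probs(xb, cutpoints, logit_cdf))
```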
• Estimated using maximum likelihood
• This differs from count-data estimation (e.g., Poisson regression) and from estimation with skewed, non-negative outcomes, which often uses Poisson pseudo-maximum likelihood (PPML)

• In Stata
• Check out the ologit and oprobit commands
• However, getting the marginal effects isn't as straightforward:
• Check mfx, predict(outcome(#)), where # is the outcome whose marginal effects you would like to view; do this for every outcome if you want to see how the marginal effects change across all outcomes.
• Check the margins or the predict post-estimation commands
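For the ordered probit, differentiating $\Pr(y_i = j) = F(\alpha_j - \mathbf{X}_i'\boldsymbol{\beta}) - F(\alpha_{j-1} - \mathbf{X}_i'\boldsymbol{\beta})$ with respect to a regressor $x_k$ gives $\beta_k[f(\alpha_{j-1} - \mathbf{X}_i'\boldsymbol{\beta}) - f(\alpha_j - \mathbf{X}_i'\boldsymbol{\beta})]$, with $f$ the standard normal pdf: one marginal effect per outcome, which is why the Stata commands above are run outcome by outcome. The sketch below computes these effects directly (not via Stata) at a single hypothetical index value, with hypothetical thresholds and coefficient $\beta_k$. The effects sum to zero across outcomes because the probabilities always sum to one.

```python
from statistics import NormalDist

std_normal = NormalDist()

def ordered_probit_probs(xb, cutpoints):
    """Pr(y = j) = F(alpha_j - x'b) - F(alpha_{j-1} - x'b), alpha_0 = -inf, alpha_m = +inf."""
    upper = [std_normal.cdf(a - xb) for a in cutpoints] + [1.0]
    lower = [0.0] + upper[:-1]
    return [u - l for u, l in zip(upper, lower)]

def ordered_probit_marginal_effects(xb, cutpoints, beta_k):
    """dPr(y = j)/dx_k = beta_k * [f(alpha_{j-1} - x'b) - f(alpha_j - x'b)]."""
    dens_upper = [std_normal.pdf(a - xb) for a in cutpoints] + [0.0]  # f(+inf) = 0
    dens_lower = [0.0] + dens_upper[:-1]                               # f(-inf) = 0
    return [beta_k * (lo - up) for up, lo in zip(dens_upper, dens_lower)]

cutpoints = [-1.0, 0.0, 1.2]   # hypothetical thresholds: poor | fair | good | excellent
xb, beta_k = 0.3, 0.8          # hypothetical index value and coefficient on x_k

effects = ordered_probit_marginal_effects(xb, cutpoints, beta_k)
for label, me in zip(["poor", "fair", "good", "excellent"], effects):
    print(f"{label:>9}: {me:+.4f}")
print("sum across outcomes:", sum(effects))
```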
