
CHAPTER FOUR

BINARY CHOICE MODELS

4. Introduction

This chapter describes models in which the dependent variable Y is dichotomous. Such models are called
limited dependent variable models, or qualitative or categorical variable models. We concentrate on the
binary case, where Yi can take only two values. One example is a model of women's labor force
participation (LFP). The dependent variable in this case is labor force participation, which takes the
value one (1) if the woman participates in the labor force and zero (0) if she does not.

Various explanatory variables could be included: both continuous variables such as age and dichotomous
variables such as gender or educational achievement. Other examples are models of the determinants of
willingness to pay (WTP) for pure water supply in the rural areas of Chencha Woreda in SNNP Regional State
and of the determinants of house ownership by households in Arba Minch town. In both cases the dependent
variable is dichotomous (also called dummy, binary, qualitative or limited).

\[ Y_i = f(\text{quantitative variables and qualitative variables}) \]

Where the response variable (Y) is dummy, categorical, limited, binary or qualitative. Such models are
commonly used in social science and medical research, and they pose interesting estimation and
interpretation challenges. The following are the most commonly used binary dependent variable models:

A. The Linear Probability Models
B. The Logit Models
C. The Probit Models

4.1. LINEAR PROBABILITY MODELS

The linear probability model simply applies the ordinary least squares (OLS) method, discussed in chapters 2
and 3, to a dichotomous dependent variable. Assume that we want to study the determinants of labor force
participation (LFP) of adult men in a particular town. Since the dependent variable, labor force
participation, is a nominal variable, it takes the value 1 (participates) or 0 (does not participate).
Suppose we routinely apply OLS to the linear model

\[ Y_i = X_i'\beta + u_i \]

Arba Minch University 2022


Where Yi equals 1 if the ith individual participates and 0 otherwise, Xi is the vector containing the values
of all explanatory variables for observation i, and β is a vector containing all the coefficients. This model
is called a linear probability model (LPM) because the conditional expectation of the dependent variable
(labor force participation), given the values of the explanatory variables, can be interpreted as the
conditional probability that the event (participating in the labor force) will occur. The conditional
expectation for the ith observation is given by
\[ E(Y_i \mid X_i) = X_i'\beta = P_i = \Pr(Y_i = 1 \mid X_i) \]

Probability of not being employed:

\[ 1 - P_i = \Pr(Y_i = 0 \mid X_i) = 1 - X_i'\beta \]

The conditional expectation of the dependent variable is thus equal to the probability of the event
happening, given the values of the explanatory variables, Pr(Yi = 1 | Xi). Since

\[ Y_i = X_i'\beta + u_i \]

the error term is

\[ u_i = Y_i - X_i'\beta \]

From these equations the probability distribution of the dependent variable and the error term is given as
follows:
Value of Yi    ui           Probability of ui    Probability of Yi
1              1 - Xi'β     Xi'β                 Xi'β
0              -Xi'β        1 - Xi'β             1 - Xi'β
Total                       1                    1
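As a brief worked step that is not spelled out in the notes, the table implies the mean and variance of the error term. Writing P_i for X_i'β (a shorthand used only in this sketch):

```latex
\begin{aligned}
E(u_i) &= (1 - P_i)\,P_i + (-P_i)\,(1 - P_i) = 0 \\
\operatorname{Var}(u_i) &= (1 - P_i)^2\,P_i + (-P_i)^2\,(1 - P_i) = P_i\,(1 - P_i)
\end{aligned}
```

Because P_i changes with the values of the explanatory variables, the variance of the error term is not constant across observations.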
For the labor force participation example introduced above, the mathematical model is given as:

\[ \text{employment}_i = \beta_0 + \beta_1(\text{married}_i) + \beta_2(\text{age}_i) + \beta_3(\text{schooling}_i) + u_i \]

Suppose that we have data for 30 observations on labor force participation (employment), marital status,
age and years of schooling. The data is given in this table.

The application of the ordinary least squares method to the linear probability model gives the regression
result displayed in Figure 1.

The interpretation of LPM estimates is straightforward, as with any OLS estimates. For example, the
coefficient of married is negative and significant at the 5% level of significance. It indicates that married
individuals are less likely to be employed than unmarried individuals: keeping other things constant, the
probability of being employed decreases by 0.38 as one moves from an unmarried to a married individual.
β2 = −0.00097 means that, keeping other things constant, as the age of the individual increases by one year
the probability of being employed decreases by 0.097 percentage points. However, the relationship between
employment and age is not significant. Similarly, β3 = 0.0911 implies that as schooling increases by one
year the probability of being employed increases by 9.1 percentage points.
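. reg Employment Married Age Schooling

      Source |       SS       df       MS              Number of obs =      30
-------------+------------------------------           F(3, 26)      =    4.96
       Model |  2.62055889     3   .87351963           Prob > F      =  0.0075
    Residual |  4.57944111    26   .17613235           R-squared     =  0.3640
-------------+------------------------------           Adj R-squared =  0.2906
       Total |         7.2    29  .248275862           Root MSE      =  .41968

------------------------------------------------------------------------------
  Employment |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     Married |  -.3803107   .1562374    -2.43   0.022    -.7014613   -.0591602
         Age |  -.0009682   .0066986    -0.14   0.886    -.0147373     .012801
   Schooling |   .0911182   .0375996     2.42   0.023     .0138312    .1684052
       _cons |  -.2227047   .6153338    -0.36   0.720    -1.487541    1.042132
------------------------------------------------------------------------------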

Figure 1: Regression results of the LPM with Labor Force Participation as dependent variable

Limitations of the LPM


First, the LPM assumes that the probability of labor force participation moves linearly with the value of
the explanatory variable, no matter how small or large that value is. Secondly, by logic, the probability
value must lie between 0 and 1, but there is no guarantee that the estimated probability values from the LPM
will lie within these limits. This is because OLS does not take into account the restriction that the
estimated probabilities must lie within the bounds of 0 and 1. Thirdly, the usual assumption that the error
term is normally distributed cannot hold when the dependent variable takes only the values 0 and 1. Finally,
the error term in the LPM is heteroscedastic, making the traditional significance tests suspect.
For all these reasons, the LPM is not the preferred choice for modeling dichotomous variables. The
alternatives discussed in the literature are logit and probit.
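The second limitation can be illustrated with the estimates in Figure 1. The sketch below plugs the reported coefficients into the fitted equation for two hypothetical individuals (their characteristics are invented for illustration and are not taken from the data set) and shows that the fitted "probabilities" can fall below 0 or above 1.

```python
# Coefficients copied from the OLS output in Figure 1.
LPM_COEF = {
    "_cons": -0.2227047,
    "Married": -0.3803107,
    "Age": -0.0009682,
    "Schooling": 0.0911182,
}


def lpm_probability(married, age, schooling):
    """Fitted LPM value X'b for one individual."""
    return (
        LPM_COEF["_cons"]
        + LPM_COEF["Married"] * married
        + LPM_COEF["Age"] * age
        + LPM_COEF["Schooling"] * schooling
    )


# Hypothetical individuals (assumed values, not observations from the sample).
low = lpm_probability(married=1, age=41, schooling=4)
high = lpm_probability(married=0, age=30, schooling=16)

print(f"married, 41 years old, 4 years of schooling:    {low:.4f}")
print(f"unmarried, 30 years old, 16 years of schooling: {high:.4f}")
```

The first fitted value is negative and the second exceeds one, so neither can be read as a probability.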

4.3 LOGIT REGRESSION MODEL (NON-LINEAR PROBABILITY MODEL)

The serious limitation of the LPM is that the predicted probability of an event occurring may lie outside
the natural limit, 0 ≤ Pi ≤ 1. This happens because the LPM assumes a linear relationship between the
predicted probability and the level of the explanatory variable (Xi).

However, in reality, we cannot have probabilities that fall below 0 or above 1. Therefore, we need other
estimation techniques that guarantee that the predicted probability lies within the natural limit, 0 ≤ Pi ≤ 1.

The main objective of the Logit model is to guarantee that the predicted probability of the event
occurring, given the values of the explanatory variables, remains within the [0, 1] bounds. That means,

0 ≤ Pr(Y = 1|X) ≤ 1 for all X

This requires a nonlinear functional form for the probability. This is possible if we assume that the
error term (ui) follows some cumulative distribution function (CDF). The two important nonlinear functions
proposed for this are the logistic CDF and the normal CDF. The logistic CDF is given as follows:

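\[ \Pr(Y_i = 1 \mid X_i) = P_i = \frac{1}{1 + e^{-Z_i}} = \frac{e^{Z_i}}{1 + e^{Z_i}} \]

\[ \Pr(Y_i = 0 \mid X_i) = 1 - P_i = \frac{1}{1 + e^{Z_i}} \]

Where \( Z_i = X_i'\beta + u_i = \beta_0 + \beta_1 X_{1i} + \dots + \beta_k X_{ki} + u_i \)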

It is easy to verify that as Zi ranges from −∞ to +∞, Pi ranges between 0 and 1 and that Pi is non-linearly
related to Zi. That means 0 ≤ Pi ≤ 1 for all real numbers Zi. This ensures that the predicted probability
(Pi) lies between 0 and 1. Thus, the Logit model satisfies the two conditions:
A. 0 ≤ Pi ≤ 1
B. Pi is non-linearly related to Xi
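The bounds can also be checked numerically. The following is a minimal sketch of the logistic CDF written out above; the function name and the grid of Z values are chosen here for illustration only.

```python
import math


def logistic_cdf(z):
    """P = 1 / (1 + exp(-z)), evaluated in a numerically stable way."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)  # for negative z, use the equivalent form e^z / (1 + e^z)
    return ez / (1.0 + ez)


for z in (-10.0, -2.0, 0.0, 2.0, 10.0):
    p = logistic_cdf(z)
    print(f"Z = {z:6.1f}   Pr(Y=1) = {p:.6f}   Pr(Y=0) = {1.0 - p:.6f}")
```

Equal steps in Z produce unequal steps in the probability, which is the non-linearity referred to in condition B.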

The odds ratio


When we use the Logit model, we keep the estimated probabilities inside the 0-1 range. But while ensuring
that the predicted probability of an event occurring (Pi) lies in the natural interval, 0 ≤ Pi ≤ 1, we have
created an estimation problem: Pi is nonlinear in the parameters and in the explanatory variables, so we
cannot apply OLS. However, we can linearize the Logit model as follows.

Take the ratio of the probability of the event occurring (in our case, being employed), Pi, to the
probability of the event not occurring, 1 − Pi. The resulting ratio is called the odds ratio.

\[ \frac{P_i}{1 - P_i} = \frac{e^{Z_i} / (1 + e^{Z_i})}{1 / (1 + e^{Z_i})} = e^{Z_i} \]

To linearize the above odds ratio, take the natural log of both sides. The resulting equation is called the
log of the odds ratio (Logit).

\[ \ln\left(\frac{P_i}{1 - P_i}\right) = \ln(e^{Z_i}) \Rightarrow L_i = Z_i \]

\[ L_i = Z_i = X_i'\beta + u_i \]

Where Li is the Logit (which is linearly related to Xi), Xi is the vector of values of the explanatory
variables and β is a vector including all coefficients (the βs).
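The two steps of the linearization can be verified numerically. In this short sketch the value of Z is an arbitrary assumption; the point is only that the odds equal e^Z and that the log of the odds returns Z.

```python
import math


def odds(p):
    """Ratio of the probability of the event to the probability of no event."""
    return p / (1.0 - p)


def logit(p):
    """Natural log of the odds."""
    return math.log(odds(p))


z = 0.8  # assumed value of the index Z, for illustration only
p = math.exp(z) / (1.0 + math.exp(z))

print(f"P = {p:.6f}")
print(f"odds = {odds(p):.6f}, e^Z = {math.exp(z):.6f}")
print(f"logit = {logit(p):.6f}, Z = {z:.6f}")
```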

4.3.1 Characteristics of the Logit Model

a. As Zi goes from −∞ to +∞, the predicted probability (Pi) goes from 0 to 1. In other words, as Xi goes
   from −∞ to +∞, the predicted probability (Pi) goes from 0 to 1. This implies that in the Logit model,
   the predicted probability (Pi) lies in the natural limit, 0 ≤ Pi ≤ 1.
b. Even if the Logit (L) is linear in X, the probabilities themselves are not. This property is in contrast with the
LPM model where the probabilities increase linearly with X.
c. If the Logit (L) is positive, it means that, when the value of the regressor increases, the odds that the
regressand equals 1 (meaning some event of interest happens) increases. If L is negative, the odds that the
regressand equals 1 decreases as the value of X increases. To put it differently, the Logit becomes negative
and increasingly large in magnitude as the odds ratio decreases from 1 to 0 and becomes increasingly large
and positive as the odds ratio increases from 1 to infinity.
d. More formally, the interpretation of the Logit model given above is as follows: β1, the slope, measures the
change in L for a unit change in X, that is, it tells how the log-odds in favor of success (being employed)
change as X1 changes by a unit. The intercept β0 is the value of the log of odds in favor of success (being
employed) when all the explanatory variables are zero. Like most interpretations of intercepts, this
interpretation may not have any physical meaning.
e. Given a certain value of the explanatory variable, say, X*, if we actually want to estimate not the odds
   in favor of success but the probability of success itself, this can be done directly from
   \( P_i = e^{Z_i} / (1 + e^{Z_i}) \), once the estimates of β0 and β1 are available.
f. Whereas the LPM assumes that Pi is linearly related to Xi, the Logit model assumes that the log of the odds
ratio is linearly related to Xi.

4.3.2 Estimation and interpretation of the Logit model

For estimation purpose, we can write a simple Logit model (with one independent variable, X) as follows:

\[ \ln\left(\frac{P_i}{1 - P_i}\right) = L_i = \beta_0 + \beta_1 X_i + u_i \]
This model is estimated using the maximum likelihood method (MLM). This is not done manually, but by
software like Stata. In this section we'll see how to interpret the regression results of the binary Logit
model.

Take the example given in the LPM discussion. Assume that we want to study the determinants of labor force
participation (LFP) of adult men in a particular town, and that we have data for 30 observations on labor
force participation (employment), marital status, age and years of schooling. There are three relevant
Stata outputs for a Logit model:

i. Logit (logs of odds ratio) output & interpretation

This is the direct interpretation of the coefficients as changes in the log of the odds ratio.

Figure 2: Regression results of the Logit Model with Labor Force Participation as dependent variable
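Logistic regression                               Number of obs   =         30
                                                  LR chi2(3)      =      13.89
                                                  Prob > chi2     =     0.0031
Log likelihood = -13.244611                       Pseudo R2       =     0.3440

------------------------------------------------------------------------------
  Employment |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     Married |  -2.608477   1.199746    -2.17   0.030    -4.959936   -.2570191
         Age |  -.0098882   .0389641    -0.25   0.800    -.0862564      .06648
   Schooling |   .6741644   .3215753     2.10   0.036     .0438885     1.30444
       _cons |  -5.266331   4.120739    -1.28   0.201    -13.34283     2.81017
------------------------------------------------------------------------------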

β0 = −5.27 is the value of the log of the odds ratio in favor of being employed when all explanatory
variables (marriage, age and schooling) are zero. Marriage has a significant negative effect on employment
at the 5% level of significance. Since the coefficient of marriage is negative, married individuals are less
likely to be employed than unmarried individuals. The value β1 = −2.61 implies that as one moves from
unmarried to married individuals the log of the odds in favor of being employed decreases by 2.61.
β2 = −0.0099 means that a one-year increase in age leads to a 0.0099 decrease in the log-odds in favor of
being employed, keeping other things constant. Similarly, β3 = 0.67 indicates that, keeping other things
constant, a one-year increase in years of schooling results in a 0.67 increase in the log-odds in favor of
being employed. Therefore, from this Logit output we can see which explanatory variables significantly
affect the dependent variable, and whether the effect is positive or negative.

Also note the Pseudo R-squared in the output. It plays the role of the R² used in linear regression
(chapters 2 and 3). Pseudo R-squared compares the unrestricted log likelihood l_ur of the model we are
estimating with the restricted log likelihood l_r of a model with only an intercept. If the independent
variables have no explanatory power, the restricted log likelihood will be the same as the unrestricted one
and the pseudo R-squared will be 0.

\[ \text{Pseudo } R^2 = 1 - \frac{l_{ur}}{l_r} \]
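The Pseudo R² reported in Figure 2 can be reproduced from the other numbers in the same output. The sketch below assumes the standard definition of the likelihood ratio statistic, LR = 2(l_ur − l_r), to back out the restricted log likelihood, which Stata does not print in this output.

```python
# Values copied from the Stata output in Figure 2.
log_lik_unrestricted = -13.244611  # "Log likelihood"
lr_chi2 = 13.89                    # "LR chi2(3)", reported to two decimals

# Assumption: LR = 2 * (l_ur - l_r), so l_r = l_ur - LR / 2.
log_lik_restricted = log_lik_unrestricted - lr_chi2 / 2.0

pseudo_r2 = 1.0 - log_lik_unrestricted / log_lik_restricted

print(f"restricted log likelihood = {log_lik_restricted:.4f}")
print(f"pseudo R-squared          = {pseudo_r2:.3f}")
```

Because the LR statistic is rounded in the output, the result agrees with the reported 0.3440 only up to rounding.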

Besides the Pseudo R², the Likelihood Ratio (LR) test can be used to judge whether the model as a whole
significantly explains the variation in the regressand (Y). H0: the model is no better than a restricted
model with only an intercept. H1: the explanatory variables can significantly explain some variation in Y.
In this case, the LR test reveals that Married, Age and Schooling jointly explain some variation in
Employment (Chi2(3) = 13.89, p = 0.0031), so H0 is rejected in favor of H1.
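The p-value of the LR test can be checked by hand. The sketch below uses a closed-form expression for the upper tail of a chi-square distribution that holds only for 3 degrees of freedom (the number of explanatory variables here); for other degrees of freedom a statistical library routine would be needed. The function name is chosen for illustration.

```python
import math


def chi2_sf_df3(x):
    """Upper-tail probability Pr(X > x) for a chi-square variable with 3 d.f.

    Valid for 3 degrees of freedom only:
    Pr(X > x) = erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2).
    """
    return math.erfc(math.sqrt(x / 2.0)) + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)


lr_chi2 = 13.89  # "LR chi2(3)" from Figure 2
p_value = chi2_sf_df3(lr_chi2)

print(f"Prob > chi2 = {p_value:.5f}")
```

The computed value should agree with the "Prob > chi2" line of the output up to rounding of the LR statistic.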

ii. The odds ratio interpretation

The odds ratio is the ratio of the probability of success to the probability of failure.
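. logit, or

Logistic regression                               Number of obs   =         30
                                                  LR chi2(3)      =      13.89
                                                  Prob > chi2     =     0.0031
Log likelihood = -13.244611                       Pseudo R2       =     0.3440

------------------------------------------------------------------------------
  Employment | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     Married |   .0736466   .0883572    -2.17   0.030     .0070134    .7733534
         Age |   .9901606   .0385807    -0.25   0.800      .917359     1.06874
   Schooling |   1.962392   .6310569     2.10   0.036     1.044866    3.685626
       _cons |   .0051625   .0212734    -1.28   0.201     1.60e-06    16.61273
------------------------------------------------------------------------------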

Figure 3: Regression results of the Odds ratio of Logit Model with Labor Force Participation as dependent variable

The reported values are the exponentials of the Logit coefficients. An odds ratio greater than 1 means that
an increase in the explanatory variable raises the odds in favor of success, an odds ratio less than 1 means
that it lowers them, and an odds ratio of 1 means that the variable leaves the odds unchanged. For example,
the odds ratio for marriage is 0.074. This indicates that the odds in favor of being employed for married
individuals are 0.074 times those of their unmarried counterparts. Age's odds ratio is 0.99, which is less
than one: as age increases by one year, the odds in favor of being employed are multiplied by 0.99 (a
decrease of about 1%), keeping other things constant, although this effect is not statistically significant.
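The link between Figure 2 and Figure 3 can be checked directly: exponentiating each Logit coefficient should reproduce the corresponding odds ratio. The following sketch uses the coefficients as printed, so small rounding differences are possible.

```python
import math

# Logit coefficients copied from Figure 2.
logit_coef = {
    "Married": -2.608477,
    "Age": -0.0098882,
    "Schooling": 0.6741644,
    "_cons": -5.266331,
}

# Odds ratio = exp(coefficient).
odds_ratio = {name: math.exp(b) for name, b in logit_coef.items()}

for name, value in odds_ratio.items():
    change = 100.0 * (value - 1.0)  # percentage change in the odds per unit increase
    print(f"{name:10s} odds ratio = {value:.7f}   change in odds = {change:+.1f}%")
```

The last column restates each odds ratio as a percentage change in the odds, which is often easier to communicate than the ratio itself.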
iii. Probability interpretation (Marginal Effect Methods)

This shows how the probability of success changes as the independent variable changes. As specified
above,

\[ p = \frac{1}{1 + e^{-Z_i}} = \frac{e^{Z_i}}{1 + e^{Z_i}} \]

\[ \frac{\partial p}{\partial x} = \beta \, p \, (1 - p) \]

The Stata result given below is the marginal effect after logit of the above employment data.
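. mfx

Marginal effects after logistic
      y  = Pr(Employment) (predict)
         =  .69517648
------------------------------------------------------------------------------
variable |      dy/dx    Std. Err.     z    P>|z|  [    95% C.I.   ]      X
---------+--------------------------------------------------------------------
Married* |  -.4849757      .16352   -2.97   0.003  -.805475 -.164476   .566667
     Age |  -.0020954      .00821   -0.26   0.799  -.018188  .013997   41.3333
School~g |   .1428596      .06298    2.27   0.023   .019429   .26629   11.8333
------------------------------------------------------------------------------
(*) dy/dx is for discrete change of dummy variable from 0 to 1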

Figure 4: Regression results of the marginal effect of Logit Model with Labor Force Participation as dependent variable

The above marginal effect after logit result shows the effect of each explanatory variable on the
probability of being employed. dy/dx of marriage = −0.4850 indicates that, keeping other things constant at
their average levels, as one moves from unmarried to married the probability of being employed decreases by
48.5 percentage points. Similarly, dy/dx of age = −0.0021 indicates that as age increases by one year the
probability of being employed decreases by 0.21 percentage points (assuming that all other explanatory
variables are at their average values).
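The figures in Figure 4 can be approximately reproduced from the Logit coefficients in Figure 2 and the sample means in column X of Figure 4. This is a sketch of the calculation at the means, not a description of Stata's internal routine: continuous variables use β·p·(1 − p), and the dummy variable uses the difference in predicted probability between Married = 1 and Married = 0.

```python
import math

# Logit coefficients (Figure 2) and sample means (column X of Figure 4).
coef = {"_cons": -5.266331, "Married": -2.608477, "Age": -0.0098882, "Schooling": 0.6741644}
mean = {"Married": 0.566667, "Age": 41.3333, "Schooling": 11.8333}


def logistic_cdf(z):
    return 1.0 / (1.0 + math.exp(-z))


def index(x):
    """Linear index Z = b0 + sum of b_k * x_k for the values in x."""
    return coef["_cons"] + sum(coef[name] * value for name, value in x.items())


# Predicted probability with every variable at its mean.
p_bar = logistic_cdf(index(mean))

# Continuous regressors: dp/dx = beta * p * (1 - p), evaluated at the means.
me_age = coef["Age"] * p_bar * (1.0 - p_bar)
me_schooling = coef["Schooling"] * p_bar * (1.0 - p_bar)

# Dummy regressor: discrete change in p as Married goes from 0 to 1,
# holding Age and Schooling at their means.
me_married = (
    logistic_cdf(index({**mean, "Married": 1}))
    - logistic_cdf(index({**mean, "Married": 0}))
)

print(f"Pr(Employment) at means = {p_bar:.4f}")
print(f"dy/dx Married   = {me_married:.4f}")
print(f"dy/dx Age       = {me_age:.5f}")
print(f"dy/dx Schooling = {me_schooling:.4f}")
```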

4.4 THE PROBIT REGRESSION (NON-LINEAR MODEL)


The probit model is similar to the logit model, but to explain the behavior of a dichotomous dependent
variable (Y) it uses the normal cumulative distribution function (CDF) instead of the logistic CDF. For this
reason, the model is sometimes called the normit model.
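[Figure: cumulative normal distribution function and logistic distribution function, with Pi rising from 0 to 1]

Cumulative normal distribution function:

\[ G(Z) = P_i = \int_{-\infty}^{Z} \frac{1}{\sqrt{2\pi}} \, e^{-\frac{1}{2} z^2} \, dz \]

Logistic distribution function:

\[ G(Z) = P_i = \frac{e^{Z}}{1 + e^{Z}} = \frac{e^{\beta_0 + \beta_1 X_i}}{1 + e^{\beta_0 + \beta_1 X_i}} \]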
As can be seen from the above graph, the logistic distribution has a flatter tail than the normal
distribution, that is, it approaches 0 and 1 more slowly. This is because the variance of the logistic
distribution (π²/3) is greater than the variance of the standard normal distribution (1). The difference
between the coefficients of the Logit model and those of the Probit model is attributable to the difference
in the variance of the two distributions.
Assume that in our employment example the decision to be employed or not depends on an unobservable
variable, say the utility index Ii (the difference between the utility obtained from being employed and the
utility obtained from being unemployed). If the utility obtained from being employed is greater than the
utility obtained from being unemployed, the individual decides to be employed, and vice versa. This
variable, Ii, is also known as a latent variable and is determined by one (or more) explanatory variables.

\[ I_i = \beta_0 + \beta_1 X_i \]
Where Xi is the explanatory variable affecting the continuous variable (the difference between the utility
obtained from being employed and the utility obtained from being unemployed).

How is the unobservable variable related to the actual decision to be employed? As before, let Y = 1 if the
individual gets employed and Y = 0 if he/she does not. Now it is reasonable to assume that there is a
critical or threshold level of the index, call it u* (utility obtained from being unemployed), such that if
u (utility obtained from being employed) exceeds u*, the individual will get employed; otherwise he/she will
not. The threshold u*, like Ii, is not observable, but if we assume that it is normally distributed with the
same mean and variance, it is possible not only to estimate the parameters of the index given above, but
also to get some information about the unobservable index itself.

The probability that u>u* can be calculated as follows:

\[ P_i = \Pr(Y_i = 1 \mid X_i) = \Pr(u \ge u^*) = \Pr(Z_i \le \beta_0 + \beta_1 X_i) = F(\beta_0 + \beta_1 X_i) \]

Where Pi = Pr(Yi = 1 | Xi) is the probability that an individual gets employed, Zi is the standard normal
variable, which is normally distributed with a mean of 0 and a variance of 1, and F is the standard normal
CDF, which can be explicitly written as follows.

\[ F(Z) = P_i = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{Z} e^{-\frac{1}{2} z^2} \, dz \]

The area under the standard normal curve from −∞ up to Ii measures the probability of being employed and is
given by:

\[ P_i = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{I_i} e^{-\frac{1}{2} z^2} \, dz, \quad \text{which is} \quad P_i = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\beta_0 + \beta_1 X_i} e^{-\frac{1}{2} z^2} \, dz \]

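Therefore, the Probit model can be specified as follows:

\[ Y_i = \begin{cases} 1 & \text{if } u \ge u^* \\ 0 & \text{otherwise} \end{cases} \]

\[ \Pr(Y_i = 1 \mid X_i) = \Pr(u \ge u^*) = F(Z) = P_i = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{Z} e^{-\frac{1}{2} z^2} \, dz \]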
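The integral has no closed form, but it can be evaluated through the error function. The sketch below is one way to compute the probit probability for a single explanatory variable; the coefficient values are hypothetical and chosen only so that the index crosses zero at X = 4.

```python
import math


def standard_normal_cdf(z):
    """F(z) for the standard normal distribution, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def probit_probability(beta0, beta1, x):
    """Pr(Y = 1 | X = x) = F(beta0 + beta1 * x)."""
    return standard_normal_cdf(beta0 + beta1 * x)


# Hypothetical coefficients, for illustration only (not estimates from the data).
beta0, beta1 = -1.0, 0.25

for x in (0, 4, 8):
    print(f"X = {x}: Pr(Y=1) = {probit_probability(beta0, beta1, x):.4f}")
```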

This is estimated using the maximum likelihood method (MLM). The coefficients from the Probit model are
difficult to interpret because they measure the change in the unobservable Ii associated with a change in
one of the explanatory variables. A more useful measure is what we call the marginal effects.

Coefficients estimated by Logit and Probit differ in magnitude because different distribution functions are
being fitted. However, the two models give approximately the same marginal effects and the same signs for
the coefficients of the independent variables. Since it is difficult to interpret the coefficients of the
Probit model, we will interpret the marginal effects.

Suppose that, from the above employment data, the probit regression is given as follows:
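Probit regression                                 Number of obs   =         30
                                                  LR chi2(3)      =      13.85
                                                  Prob > chi2     =     0.0031
Log likelihood = -13.26742                        Pseudo R2       =     0.3429

------------------------------------------------------------------------------
  Employment |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     Married |   -1.45382   .6299875    -2.31   0.021    -2.688572   -.2190669
         Age |  -.0061331   .0237867    -0.26   0.797    -.0527542     .040488
   Schooling |   .3869652   .1759325     2.20   0.028     .0421439    .7317866
       _cons |  -3.047916   2.399248    -1.27   0.204    -7.750356    1.654523
------------------------------------------------------------------------------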

Figure 5: Regression results of the Probit Model with Labor Force Participation as dependent variable
The above result is the output of the probit model. We can interpret the sign of each coefficient and its
significance (by using the z-test). However, it is difficult to interpret the magnitude of the coefficients.
This output can also be used to evaluate the significance of the whole model (Pseudo R2 and the LR test, see
the example given for the Logit model).
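The difference in scale between the two sets of coefficients can be seen by dividing each Logit coefficient in Figure 2 by the corresponding Probit coefficient in Figure 5. The sketch below compares these ratios with π/√3, the ratio of the standard deviations of the logistic and standard normal distributions. The comparison is a rough rule of thumb only; in a given sample the ratios need not equal this value.

```python
import math

# Coefficients copied from Figure 2 (Logit) and Figure 5 (Probit).
logit_coef = {"Married": -2.608477, "Age": -0.0098882, "Schooling": 0.6741644}
probit_coef = {"Married": -1.45382, "Age": -0.0061331, "Schooling": 0.3869652}

# Standard deviation of the logistic distribution relative to the standard normal.
sd_ratio = math.pi / math.sqrt(3.0)

ratios = {name: logit_coef[name] / probit_coef[name] for name in logit_coef}

print(f"pi / sqrt(3) = {sd_ratio:.4f}")
for name, ratio in ratios.items():
    print(f"{name:10s} logit / probit = {ratio:.4f}")
```

In this example all three ratios are of the same order as π/√3 and the signs of the coefficients agree across the two models.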

Marginal effects after Probit

The interpretation of marginal effects in the probit model is similar to that in the logit model. For
example, the marginal effect of married = −0.47 means that married individuals are less likely to be
employed than their unmarried counterparts: as one moves from unmarried to married individuals, the
probability of being employed decreases by 47 percentage points, keeping the other variables constant at
their average levels.
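. mfx

Marginal effects after probit
      y  = Pr(Employment) (predict)
         =  .67502799
------------------------------------------------------------------------------
variable |      dy/dx    Std. Err.     z    P>|z|  [    95% C.I.   ]      X
---------+--------------------------------------------------------------------
Married* |  -.4692287      .16282   -2.88   0.004  -.788358    -.1501   .566667
     Age |  -.0022073      .00854   -0.26   0.796  -.018947  .014532   41.3333
School~g |   .1392695      .06012    2.32   0.021    .02143  .257109   11.8333
------------------------------------------------------------------------------
(*) dy/dx is for discrete change of dummy variable from 0 to 1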

Figure 6: Regression results of the Marginal Effects of Probit Model with Labor Force Participation as dependent variable
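The probit marginal effects can be approximately reproduced from the coefficients in Figure 5 and the sample means in column X of Figure 6. As a sketch under the assumption that the effects are evaluated at the means, continuous variables use β·f(Z), where f is the standard normal density, and the dummy variable uses the difference in predicted probability between Married = 1 and Married = 0.

```python
import math

# Probit coefficients (Figure 5) and sample means (column X of Figure 6).
coef = {"_cons": -3.047916, "Married": -1.45382, "Age": -0.0061331, "Schooling": 0.3869652}
mean = {"Married": 0.566667, "Age": 41.3333, "Schooling": 11.8333}


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def index(x):
    """Linear index Z = b0 + sum of b_k * x_k for the values in x."""
    return coef["_cons"] + sum(coef[name] * value for name, value in x.items())


z_bar = index(mean)
p_bar = normal_cdf(z_bar)

# Continuous regressors: dp/dx = beta * f(Z), evaluated at the means.
me_age = coef["Age"] * normal_pdf(z_bar)
me_schooling = coef["Schooling"] * normal_pdf(z_bar)

# Dummy regressor: discrete change in p as Married goes from 0 to 1.
me_married = (
    normal_cdf(index({**mean, "Married": 1}))
    - normal_cdf(index({**mean, "Married": 0}))
)

print(f"Pr(Employment) at means = {p_bar:.4f}")
print(f"dy/dx Married   = {me_married:.4f}")
print(f"dy/dx Age       = {me_age:.5f}")
print(f"dy/dx Schooling = {me_schooling:.4f}")
```

Comparing these values with Figure 4 shows that the logit and probit marginal effects are close to each other even though the underlying coefficients differ in scale.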
