Chapter 5

Categorical dependent variable models
Introduction

• Standard linear regression models are applied when the
dependent variable is continuous, such as asset returns, rental
value of properties, income, saving, expenditure, or output.
• However, there are many situations in which the dependent
variable in a regression equation simply represents a discrete
choice taking only a limited number of values.
• Models involving dependent variables of this kind are called
categorical (limited, discrete or qualitative) dependent
variable models.
Cont’d

• In such models, the values that the dependent variable may
take are limited to certain integers (e.g. 0, 1, 2, 3, and 4) or
are even binary (only 0 or 1).
• Categorical dependent variable models may be used when a
decision maker faces a choice among a set of alternatives
meeting the following criteria:
1. The number of choices is finite.
2. The choices are mutually exclusive (the person chooses only
one of the alternatives).
3. The choices are exhaustive (all possible alternatives are
included).
Cont’d

• The first criterion is the binding one. We can always redefine the
available choices so that they satisfy the last two criteria (for
example, adding a "none of these" alternative makes the set exhaustive).
• Throughout our discussion we shall restrict ourselves to cases
of qualitative choice where the set of alternatives is binary.
• For the sake of convenience, the dependent variable is coded as
0 or 1.
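As a minimal illustration of this coding, assuming hypothetical loan outcomes (the labels and values below are made up, not taken from the chapter), a binary outcome can be mapped to 0/1. With this coding, the sample mean of the dependent variable equals the share of ones, i.e. an estimate of P(Y = 1), which is the quantity the models below try to explain.

```python
# Hypothetical loan outcomes (illustrative only, not data from this chapter).
outcomes = ["default", "repaid", "repaid", "default", "repaid"]

# Code the binary dependent variable: 1 = defaulted, 0 = repaid.
y = [1 if outcome == "default" else 0 for outcome in outcomes]

# With 0/1 coding, the mean of y is the proportion of defaulters,
# i.e. a sample estimate of P(Y = 1).
share_default = sum(y) / len(y)

print(y)
print(share_default)
```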
The linear probability model (LPM)

• The problem with the fitted LPM in this example is that for any individual whose
income is more than birr 60,000, the model-predicted probability of
defaulting is negative.
• For instance, with income measured in thousands of birr, the predicted probability of
defaulting for an individual whose income is birr 80,000 is 0.15 - 0.0025(80) = -0.05.
• Clearly, such predictions cannot be allowed to stand since we know
that the probability of an event is always a number between 0 and
1 (inclusive), that is, probabilities can never be negative.
• The LPM can also produce probabilities that are greater than one.
• Thus, the use of the LPM when the dependent variable is
categorical may lead to nonsense probabilities.
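The fitted LPM implied by the arithmetic above is P(default) = 0.15 - 0.0025 × income, with income in thousands of birr. The short sketch below evaluates that line at a few incomes chosen for illustration (the function name and the incomes are assumptions) and flags any prediction outside [0, 1].

```python
def lpm_default_probability(income_thousand_birr: float) -> float:
    """Fitted LPM from the example: P(default) = 0.15 - 0.0025 * income,
    where income is measured in thousands of birr (hypothetical helper name)."""
    return 0.15 - 0.0025 * income_thousand_birr


# Incomes chosen for illustration, in thousands of birr.
for income in (20, 40, 80, 100):
    p = lpm_default_probability(income)
    # A valid probability must lie in [0, 1]; the LPM does not enforce this.
    note = "" if 0.0 <= p <= 1.0 else "  <- not a valid probability"
    print(f"income = {income:>3} thousand birr: predicted P = {p:+.3f}{note}")
```

Any income above 60 (thousand birr) yields a negative fitted value, which is exactly the problem described above.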
The logit model

• The logit model overcomes this limitation of the LPM by
using a function (the logistic function) that transforms the regression model in
such a way that the fitted values (that is, the probabilities) are
bounded within the (0, 1) interval.
• Visually, the fitted regression model appears as an S-shaped curve rather
than a straight line. This is shown in Figure 1 below.
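A minimal sketch of that transformation, assuming the standard logistic form P = 1 / (1 + e^(-z)) with a linear index z = b0 + b1·x. The coefficients b0 and b1 below are hypothetical and only serve to trace out the S-shape: z is unbounded, but every fitted P stays strictly between 0 and 1.

```python
import math


def logistic(z: float) -> float:
    """Logistic CDF: maps any real index z into (0, 1) in exact arithmetic."""
    # Branch on the sign of z so math.exp never overflows for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


# Hypothetical coefficients, chosen only to show the S-shape.
b0, b1 = -3.0, 0.1

for x in (0, 15, 30, 45, 60):
    z = b0 + b1 * x  # linear index: unbounded
    print(f"x = {x:>2}: z = {z:+.1f}, P = {logistic(z):.3f}")
```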
Illustration
Interpretation of results

• Debt-to-Income ratio
• The coefficient of debt-to-income ratio is positive. This implies that
an increase in the debt-to-income ratio increases the probability of
defaulting, keeping all other covariates fixed.
• Household income
• The coefficient of income is negative. Thus, an increase in income
decreases the probability of defaulting, keeping all other covariates
fixed.
• Number of residents in the household
• Since the coefficient is positive, an increase in the number of
residents in the household leads to an increase in the probability of
defaulting on a bank loan, keeping all other covariates fixed.
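Why the sign of a logit coefficient gives the direction of the effect: in a logit model, the marginal effect of covariate x_k on the probability is dP/dx_k = β_k · P · (1 − P). Because P(1 − P) is always positive, the marginal effect has the same sign as β_k, although its size depends on where the individual sits on the S-curve. The sketch below uses hypothetical coefficient values that only share the signs reported above; they are not the chapter's estimates, and the variable names are illustrative.

```python
import math


def logit_prob(z: float) -> float:
    """Logit probability P = 1 / (1 + exp(-z)) for a linear index z."""
    return 1.0 / (1.0 + math.exp(-z))


def marginal_effect(beta_k: float, z: float) -> float:
    """dP/dx_k = beta_k * P * (1 - P), evaluated at index z."""
    p = logit_prob(z)
    return beta_k * p * (1.0 - p)


# Hypothetical coefficients: signs match the discussion above, magnitudes are made up.
coefs = {"debt_to_income": 0.08, "income": -0.03, "residents": 0.20}
z = 0.5  # hypothetical linear-index value for one borrower

for name, beta_k in coefs.items():
    print(f"{name}: beta = {beta_k:+.2f}, dP/dx = {marginal_effect(beta_k, z):+.4f}")
```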
Cont’d

• Level of education
• The positive coefficient implies that an increase in the level of education of
an individual increases the probability of his/her defaulting, keeping all other covariates fixed.
• Home ownership status
• The coefficient of home ownership is positive. Since the reference
category is Own (reside in own house), the odds of defaulting (and hence
the probability) are higher for the non-reference category (those who reside in
rented houses).
• The odds ratio is calculated as $OR = e^{\hat{\beta}}$, where $\hat{\beta}$ is the estimated coefficient of home ownership.
• The interpretation is that the odds of defaulting are 1.377 times higher for
those who reside in rented houses as compared to those who reside in
own houses, keeping all other covariates fixed.
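The odds-ratio arithmetic can be checked directly. The sketch below, assuming only the reported odds ratio of 1.377, recovers the implied logit coefficient ln(1.377) and then applies the odds ratio to a hypothetical baseline default probability of 0.10 for owners (an assumed value, not from the chapter). It also shows that an odds ratio is not a ratio of probabilities.

```python
import math

odds_ratio = 1.377           # reported odds ratio: renters vs owners
beta = math.log(odds_ratio)  # implied coefficient, since OR = exp(beta)

# Hypothetical baseline: assume an owner's probability of default is 0.10.
p_owner = 0.10
odds_owner = p_owner / (1.0 - p_owner)        # probability -> odds
odds_renter = odds_owner * odds_ratio         # apply the odds ratio
p_renter = odds_renter / (1.0 + odds_renter)  # odds -> probability

print(f"implied coefficient: {beta:.3f}")
print(f"P(owner) = {p_owner:.3f}, P(renter) = {p_renter:.3f}")
print(f"probability ratio = {p_renter / p_owner:.3f} (not equal to the odds ratio)")
```

With this assumed baseline, the renters' default probability comes out at about 0.133, so the probability ratio (about 1.33) is smaller than the odds ratio of 1.377; the two are approximately equal only when the event is rare.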
Cont’d

• Retired
• The coefficient of retired (whether the individual is retired or
not) is positive, and the odds ratio is $OR = e^{\hat{\beta}}$, with $\hat{\beta}$
the estimated coefficient of retired.
• Since the reference category is Yes (retired), the odds of
defaulting are about 22 times higher for those who are not
retired than for those who are retired, keeping all
other covariates fixed.
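The role of the reference category can be made concrete. Since the odds ratio is e raised to the coefficient, switching the reference category (here, making "not retired" the reference instead of "retired") flips the sign of the coefficient and inverts the odds ratio. The sketch below uses the approximate odds ratio of 22 quoted above; the exact estimate is not given here, and the variable names are illustrative.

```python
import math

or_not_retired = 22.0            # approximate odds ratio: not retired vs retired
beta = math.log(or_not_retired)  # coefficient with "retired" as the reference

# Re-coding with "not retired" as the reference negates the coefficient...
beta_recoded = -beta
# ...so the odds ratio becomes the reciprocal, 1/22.
or_retired = math.exp(beta_recoded)

print(f"coefficient: {beta:.3f} -> {beta_recoded:.3f}")
print(f"odds ratio (retired vs not retired): {or_retired:.4f}")
```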
