
Categorical dependent variable models

Introduction

• Standard linear regression models are applied when the
dependent variable is continuous, such as asset
returns, rental value of properties, saving, expenditure,
output, etc.

• But there are many situations in which the dependent
variable in a regression equation simply represents a
discrete choice assuming only a limited number of
values.

• Models involving dependent variables of this kind are
called categorical (limited, discrete or qualitative)
dependent variable models.
• In discrete choice models, the values that the dependent
variables may take are limited to certain integers (e.g. 0,
1, 2, 3, and 4) or even binary (only 0 or 1).

• Throughout our discussion we shall restrict ourselves to
cases of qualitative choice where the set of alternatives
is binary.

• For the sake of convenience the dependent variable is
given a value of 0 or 1.
• The independent variables that affect the success or
failure of companies (that is, indicators of financial
status) may be:

– working capital to total assets ratio
– retained earnings to total assets ratio
– earnings before interest and taxes to total assets ratio
– sales to total assets ratio.

• Thus, we would predict the probability of failure of
companies on the basis of these explanatory variables.
The linear probability model (LPM)
• The problem with this model is that for any individual
whose income is more than birr 60,000, the model-
predicted probability of defaulting is negative.

• For instance, the probability of defaulting of an individual
whose income is birr 80,000 (income entered in thousands
of birr) is:

0.15 - 0.0025(80) = -0.05


• Clearly, such predictions cannot be allowed to stand
since we know that the probability of an event is
always a number between 0 and 1 (inclusive), that is,
probabilities can never be negative.

• The LPM can also produce probabilities that are
greater than one.

• Thus, the use of the LPM when the dependent variable
is categorical may lead to nonsense probabilities.
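
The out-of-range problem can be checked numerically. The sketch below uses the fitted line implied by the calculation above (intercept 0.15, slope -0.0025, income in thousands of birr). The function name lpm_probability and the income values other than 80 are assumptions made here for illustration.

```python
def lpm_probability(income_thousands, intercept=0.15, slope=-0.0025):
    """Predicted probability of default from the linear probability model."""
    return intercept + slope * income_thousands


# Income in thousands of birr. 80 is the case worked in the text;
# 20 and 40 are illustrative values added for comparison.
for income in (20, 40, 80):
    p = lpm_probability(income)
    status = "valid" if 0.0 <= p <= 1.0 else "outside [0, 1]"
    print(f"income = birr {income},000: predicted probability = {p:.4f} ({status})")
```

Nothing in the linear formula restricts the prediction to the interval from 0 to 1, which is why the logit model is used instead.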
The logit model
Illustration
EViews procedure
First import the data from Excel to EViews
Click on Quick and then select Estimate Equation…
Under Method: in the Equation Estimation pop-up window,
select Binary – Binary Choice (Logit, Probit, Extreme Value)
From Binary estimation method:, select Logit
Click on OK to view the results of the fitted multiple logistic
regression model
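
For readers without EViews, the estimation step can be sketched in Python. This is a minimal illustration, not the routine EViews runs: it assumes a single explanatory variable (the fitted model in these slides has several), uses toy data invented here, and maximises the logit likelihood by Newton-Raphson. The names logistic and fit_logit are hypothetical helpers.

```python
import math


def logistic(z):
    """Logistic function: maps any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logit(x, y, iterations=25):
    """Fit P(y = 1) = logistic(b0 + b1 * x) by Newton-Raphson."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        p = [logistic(b0 + b1 * xi) for xi in x]
        # Score vector (gradient of the log-likelihood).
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum(xi * (yi - pi) for xi, yi, pi in zip(x, y, p))
        # Information matrix (negative Hessian), a symmetric 2x2 matrix.
        w = [pi * (1.0 - pi) for pi in p]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        # Newton step: add (information matrix)^-1 times the score.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Toy data invented for illustration only (not the data set used in the slides).
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [0, 0, 1, 0, 1, 0, 1, 1]

b0, b1 = fit_logit(x, y)
fitted = [logistic(b0 + b1 * xi) for xi in x]
print(f"intercept = {b0:.4f}, slope = {b1:.4f}")
print("fitted probabilities:", [round(p, 3) for p in fitted])
```

Unlike the linear probability model, every fitted probability lies strictly between 0 and 1 whatever the value of the explanatory variable.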
• Is the fitted model adequate (is the model a good fit to
the data)?

• To answer this we can use the Hosmer and Lemeshow
Test.

• The null hypothesis of this test is that the model fits
the data well.

• If the null is rejected, we have to re-specify the model.


In the equation window, click on View and select
Goodness-of-Fit Test (Hosmer-Lemeshow)
Click on OK in the new dialog box to view the results of the Hosmer and
Lemeshow Test.

The p-value of the H-L Statistic (0.3867) is greater than 0.05.

Thus, we do not reject the null hypothesis: there is no evidence of
lack of fit, so the model is taken to fit the data well.
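
The decision rule applied here can be written out as a short sketch. The function name hosmer_lemeshow_decision is hypothetical; the p-value 0.3867 and the 0.05 significance level are the ones quoted above.

```python
def hosmer_lemeshow_decision(p_value, alpha=0.05):
    """Apply the Hosmer-Lemeshow decision rule at significance level alpha.

    The null hypothesis is that the model fits the data well.
    """
    if p_value > alpha:
        return "do not reject H0: no evidence of lack of fit"
    return "reject H0: re-specify the model"


print(hosmer_lemeshow_decision(0.3867))
```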
• If the coefficient of a qualitative explanatory variable is:

– negative, then the probability of defaulting is higher for the
reference category (the category that is assigned the value
zero) than for the non-reference category.

– positive, then the probability of defaulting is higher for the
non-reference category than for the reference category.
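
The sign rule can be illustrated with odds ratios: in a logit model, exponentiating the coefficient of a 0/1 variable gives the odds of defaulting for the non-reference category relative to the reference category. The coefficient values below are hypothetical, chosen only to show one negative and one positive case, and odds_ratio is a helper name introduced here.

```python
import math


def odds_ratio(coefficient):
    """Odds of the non-reference category relative to the reference category."""
    return math.exp(coefficient)


# Hypothetical coefficients, not taken from the fitted model in the slides.
for beta in (-0.8, 0.8):
    ratio = odds_ratio(beta)
    higher = "non-reference" if ratio > 1.0 else "reference"
    print(f"coefficient = {beta:+.1f}: odds ratio = {ratio:.3f}, "
          f"defaulting is more likely in the {higher} category")
```

An odds ratio below one corresponds to a negative coefficient, and an odds ratio above one to a positive coefficient.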
Interpretation of results
Debt-to-Income ratio

• Debt-to-income ratio is a quantitative explanatory
variable.

• The coefficient of debt-to-income ratio is positive.

• This implies that an increase in the debt-to-income ratio
increases the probability of defaulting, keeping all
other covariates fixed.
Household income

• Household income is a quantitative variable.

• The coefficient of income is negative.

• Thus, an increase in income decreases the probability
of defaulting, keeping all other covariates fixed.
Number of residents in the household

• This variable is again quantitative.

• Since the coefficient is positive, an increase in the number
of residents leads to an increase in the probability of
defaulting on a bank loan.
Level of education

• The positive coefficient implies that an increase in the
level of education of an individual increases the
probability of his/her defaulting.
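
The sign-based readings in this section rest on a property of the logit model: the effect of a small change in a quantitative variable on the probability equals the coefficient times p(1 - p), and since p(1 - p) is always positive the effect has the same sign as the coefficient. The sketch below illustrates this with hypothetical coefficient and index values; marginal_effect is a helper name introduced here.

```python
import math


def logistic(z):
    """Logistic function: maps any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def marginal_effect(beta, linear_index):
    """Change in P(default) per unit change in x, other covariates held fixed."""
    p = logistic(linear_index)
    return beta * p * (1.0 - p)


# Hypothetical coefficients and linear-index values, for illustration only.
for beta in (0.5, -0.5):
    for index in (-1.0, 0.0, 1.0):
        effect = marginal_effect(beta, index)
        print(f"beta = {beta:+.1f}, index = {index:+.1f}: marginal effect = {effect:+.4f}")
```

The size of the effect depends on where the individual sits on the logistic curve, so the coefficient gives the direction of the effect but not a constant change in probability.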
