Binary Logistic Regression
Analysis

The Concept of Binary Logistic Regression
Regression techniques are used to assess the strength of the relationship between a dependent variable and one or more independent variables. They help in predicting the value of a dependent variable from one or more independent variables, and in determining how much variance in a single response (dependent variable) is accounted for by a set of independent variables.

Linear regression analysis requires the outcome/criterion variable to be measured as a continuous variable. However, there may be situations in which the researcher would like to predict an outcome that is dichotomous/binary.

In such situations, a scholar can use binary logistic regression to assess the impact of one or more predictor variables on the outcome. Logistic regression analysis is a method for determining the cause-effect relationship of the independent variable(s) with the dependent variable.

Logistic regression predicts group membership.

• Since logistic regression calculates the probability of the event occurring over the probability of it not occurring, the results of the analysis are expressed as an odds ratio.
• Logistic regression determines the impact of multiple independent variables, presented simultaneously, on the prediction of membership in one or the other of the two dependent-variable categories.
• In logistic regression, the outcome of interest is coded 1 while the other category is coded 0.
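The odds and odds-ratio ideas in the bullets above can be sketched in a few lines of Python; the probabilities used here are illustrative, not from the study:

```python
# Minimal sketch (illustrative numbers): converting a predicted probability
# of the target outcome (coded 1) into odds, and comparing two groups with
# an odds ratio.

def odds(p):
    """Odds of an event = P(event) / P(no event)."""
    return p / (1.0 - p)

p_group_a = 0.75   # hypothetical probability of the outcome in group A
p_group_b = 0.60   # hypothetical probability of the outcome in group B

odds_a = odds(p_group_a)      # 0.75 / 0.25 = 3.0
odds_b = odds(p_group_b)      # 0.60 / 0.40 = 1.5

odds_ratio = odds_a / odds_b  # 3.0 / 1.5 = 2.0
print(odds_a, odds_b, odds_ratio)
```

An odds ratio of 2.0 here would read as "the odds of the outcome in group A are twice those in group B", which is exactly how the Exp(B) column of the output is interpreted later.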
Example Scenarios
A scholar may utilize binary logistic regression in the following situations:
● A store would like to assess the factors that lead to the return/non-return of a customer.
● A college would like to assess the admission (admit/do not admit) of a student based on age, grade, and aptitude test results.
● An analyst would like to assess whether a particular candidate wins/loses an election based on time spent in the constituency, whether previously elected, and the number of issues resolved.
● An HR researcher would like to ascertain how factors such as experience, years of education, previous salary, and university ranking affect the selection of a candidate in a job interview.
● A scholar would like to predict the choice of bank (Public or Private) based on independent variables that include Technology, Interest Rates, Value Added Services, Perceived Risks, Reputation, and others.
Assumptions
• Logistic regression does not assume a linear relationship between the dependent and independent variables.
• The independent variables need not be interval-scaled, normally distributed, linearly related, or of equal variance within each group.
• The error terms (residuals) do not need to be normally distributed.
• The dependent variable must be dichotomous (two categories) for binary logistic regression.
• The categories (groups) of the dependent variable must be mutually exclusive and exhaustive: a case can be in only one group, and every case must be a member of one of the groups.
• Larger samples are needed than for linear regression. A minimum of 50 cases per predictor is recommended (Field, 2013).
• Leblanc and Fitzgerald (2000) suggest a minimum of 30 observations per independent variable.
Case Processing Summary and Encoding

The first section of the output shows the Case Processing Summary, highlighting the cases included in the analysis. In this example we have a total of 341 respondents.

The Dependent Variable Encoding table shows the coding for the criterion variable: in this case, those who will choose Private sector banks are classified as 1, while those who will not choose Private sector banks are classified as 0.
Block 0

The next section of the output, headed Block 0, reports the results of the analysis without any of our independent variables in the model. This will serve as a baseline later for comparing the model with our predictor variables included.

In general, this block is not usable on its own, as there are no predictors in the model.
Goodness-of-fit statistics help you to determine whether the model adequately describes the data.

The Omnibus Tests of Model Coefficients table is used to test model fit. If the model chi-square is significant, there is a significant improvement in fit compared with the null (Block 0) model; hence, the model shows a good fit.

The Hosmer and Lemeshow test is another test of model fit. The Hosmer-Lemeshow statistic indicates a poor fit if the significance value is less than 0.05. Here the significance value is above 0.05, so the model adequately fits the data: there is no meaningful difference between the observed and the predicted values, which are approximately equal.
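The omnibus test is a likelihood-ratio test: the model chi-square is minus twice the difference between the null-model and full-model log-likelihoods. A minimal sketch, where both log-likelihood values are hypothetical:

```python
# Minimal sketch of the omnibus (likelihood-ratio) chi-square, using
# hypothetical log-likelihoods. SPSS reports -2LL values directly; the
# test statistic is the drop in -2LL when predictors are added.

ll_null = -230.0   # hypothetical log-likelihood of the intercept-only model
ll_model = -120.0  # hypothetical log-likelihood with predictors included

chi_square = -2.0 * (ll_null - ll_model)  # improvement in fit
df = 5  # degrees of freedom = number of predictors added (hypothetical)

print(chi_square)  # compared against a chi-square distribution with df = 5
```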
Model Summary shows the pseudo R-square statistics. "Pseudo" means that they do not technically measure explained variance, but they can be used as an approximation of the variation in the criterion variable accounted for by the model.
Nagelkerke's R-square is the one normally used; it is an adjusted version of the Cox & Snell R-square that rescales the statistic to cover the full range from 0 to 1.
In this case we can say that approximately 70.7% of the variation in the criterion variable can be attributed to the predictor variables in the model.
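Both pseudo R-square statistics are computed from the null and full-model log-likelihoods. A minimal sketch, where the log-likelihoods are hypothetical and only the sample size (n = 341) comes from the example:

```python
import math

# Minimal sketch of Cox & Snell and Nagelkerke pseudo R-squares.
# The log-likelihoods are hypothetical; n = 341 is from the example output.

n = 341
ll_null = -230.0   # hypothetical intercept-only log-likelihood
ll_model = -120.0  # hypothetical full-model log-likelihood

# Cox & Snell R-square cannot reach 1 even for a perfect model.
cox_snell = 1.0 - math.exp((2.0 / n) * (ll_null - ll_model))

# Nagelkerke divides by the maximum attainable Cox & Snell value,
# rescaling the statistic to span the full 0-to-1 range.
max_cox_snell = 1.0 - math.exp((2.0 / n) * ll_null)
nagelkerke = cox_snell / max_cox_snell

print(round(cox_snell, 3), round(nagelkerke, 3))
```

Note that Nagelkerke is always at least as large as Cox & Snell, which is why SPSS reports both side by side.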
Classification Table
● The next Classification table provides an indication of how well the model is able to predict the correct category once the predictors are added into the study. We can compare this with the Classification Table shown for Block 0 to see how much improvement there is when the predictor variables are included in the model. The model correctly classified 75.1 percent of cases overall (sometimes referred to as the percentage accuracy in classification, PAC).
● In other words, it presents information on the degree to which the observed outcomes are predicted by your model, compared with the baseline rate obtained by always predicting that a respondent would choose a Private sector bank.
Classification Table
● The percentages in the first two rows provide information regarding the specificity and sensitivity of the model in terms of predicting group membership on the dependent variable.
● Specificity (also called the true negative rate) refers to the percentage of cases observed to fall into the non-target (or reference) category (e.g., those who will not select a Private bank) that were correctly predicted by the model to fall into that group (e.g., predicted not to select Private). The specificity for this model is 17.8%.
● Sensitivity (also called the true positive rate) refers to the percentage of cases observed to fall into the target group (Y = 1; e.g., those who will select a Private bank) that were correctly predicted by the model to fall into that group (e.g., predicted to select Private Bank). The sensitivity for the model is 96.4%.
● Overall, the accuracy rate was very good, at 75.7%. The model exhibits good sensitivity, since among those persons who will choose Private banks over Public banks, 96.4% were correctly predicted to choose Private banks based on the model.
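Accuracy, sensitivity, and specificity all come from the 2x2 classification counts. A minimal sketch, using hypothetical counts chosen only to be consistent with the percentages quoted above (the actual cell counts are not shown in the deck):

```python
# Minimal sketch of classification-table metrics from hypothetical 2x2 counts
# (chosen to roughly reproduce the quoted 96.4% / 17.8% / 75.7% figures).

tp = 242  # observed 1, predicted 1 (true positives)
fn = 9    # observed 1, predicted 0 (false negatives)
tn = 16   # observed 0, predicted 0 (true negatives)
fp = 74   # observed 0, predicted 1 (false positives)

total = tp + fn + tn + fp                # 341 cases

sensitivity = tp / (tp + fn)             # true positive rate
specificity = tn / (tn + fp)             # true negative rate
accuracy = (tp + tn) / total             # overall percentage correct (PAC)

print(round(sensitivity, 3), round(specificity, 3), round(accuracy, 3))
```

The very high sensitivity combined with very low specificity shows the model mostly predicts the majority (Private) category, which is why the overall accuracy alone can be misleading.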
Variables in the Equation

● Odds are the ratio of the probability that the event occurs to the probability that it does not: P(event)/P(no event).
● This table shows the relationship between the predictors and the outcome.
● B is the predicted change in the log odds: for a one-unit change in the predictor, the odds of the outcome change by a factor of Exp(B).
● The B coefficients can be negative or positive, and each has a Wald statistic and an associated significance value.
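The link between B and Exp(B) can be sketched as follows. The coefficient below is hypothetical, chosen as ln(1.367) so that it reproduces the Exp(B) of 1.367 quoted in this example:

```python
import math

# Minimal sketch of the B <-> Exp(B) relationship. The coefficient is
# hypothetical (ln(1.367), implied by the Exp(B) quoted in the example):
# a one-unit increase in the predictor multiplies the odds by exp(B).

b = math.log(1.367)        # hypothetical coefficient (log odds ratio)
exp_b = math.exp(b)        # Exp(B), the odds ratio reported by SPSS

base_odds = 1.5            # hypothetical odds at some predictor value
new_odds = base_odds * exp_b  # odds after a one-unit increase in the predictor

print(round(exp_b, 3))
```

A positive B gives Exp(B) > 1 (odds increase with the predictor); a negative B gives Exp(B) < 1 (odds decrease), matching the odds-ratio interpretation on the next slide.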
Variables in the Equation
Odds Ratio = 1
● The probability of falling into the target group is equal to the probability of falling into the non-target group.
Odds Ratio > 1 (probability of the event occurring increases)
● The probability of falling into the target group is greater than the probability of falling into the non-target group. The event is likely to occur.
Odds Ratio < 1 (probability of the event occurring decreases)
● The probability of falling into the target group is less than the probability of falling into the non-target group. The event is unlikely to occur.
● We can say that the odds of a customer choosing a Private bank offering Value Added Services are 1.367 times higher than for Public sector banks that do not offer Value Added Services, with a 95% CI of 1.097 to 1.703.
● The important thing about this confidence interval is that it does not cross 1 (both values are greater than 1). This is important because values greater than 1 mean that as the predictor variable increases, so do the odds of (in this case) selecting a Private bank. Values less than 1 mean the opposite: as the predictor increases, the odds of selecting a Private bank decrease.
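The confidence interval for an odds ratio is computed on the log-odds scale (B ± 1.96 × SE) and then exponentiated. A minimal sketch, where the standard error is a hypothetical value chosen to reproduce an interval close to the 1.097 to 1.703 quoted above:

```python
import math

# Minimal sketch of a 95% CI for an odds ratio. B is implied by the quoted
# Exp(B) = 1.367; the standard error is hypothetical, chosen to roughly
# reproduce the quoted interval (1.097 to 1.703).

b = math.log(1.367)  # coefficient on the log-odds scale
se = 0.112           # hypothetical standard error of B
z = 1.96             # critical value for a 95% interval

lower = math.exp(b - z * se)
upper = math.exp(b + z * se)

print(round(lower, 3), round(upper, 3))

# Because the whole interval lies above 1, the predictor has a
# statistically significant positive association with the outcome.
```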
References
● https://bookdown.org/chua/ber642_advanced_regression/binary-logistic-regression.html
