
Session 2.

Binary Logistic Regression

Andrew Rogers
andrew.rogers@bristol.ac.uk

1
Move from linear to logistic regression

• Not all data is continuous


• Many data are binary: [0, 1]
• Yes/no
• Accept/reject
• Brand A/brand B
• The linear regression model is not suitable
• Predicted values can fall outside the [0, 1] range
• Dependent variable ~ Bin(1, μ), not N(μ, σ²)
• Non-constant variance μ(1 − μ)
Move from linear to logistic regression

• Hence, while the linear equation z = β₀ + β₁X₁ + … + βₖXₖ could theoretically produce values anywhere on the continuous scale (−∞, +∞), the logit form P(Y) = 1 / (1 + e^(−z)) ensures values in [0, 1].
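
For intuition, here is a minimal sketch (in Python rather than SPSS, purely for illustration) of how the logistic transformation maps an unbounded z onto the (0, 1) probability scale; the example values are made up.

```python
import numpy as np

def logistic(z):
    """Map an unbounded linear predictor z to a probability between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-z))

# z can take any value on (-inf, +inf); P(Y) always stays between 0 and 1
for z in (-5.0, -1.0, 0.0, 1.0, 5.0):
    print(f"z = {z:+.1f}  ->  P(Y) = {logistic(z):.3f}")
```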
The Logistic Curve
[Figure: the logistic curve. Vertical axis: probability P(Y), which ranges between 0 and 1; horizontal axis: z, whose values can be unbounded and continuous.]

3
The Logistic curve

[Figure: the logistic curve, annotated. At large positive z the outcome is highly likely to be "1" (e.g. "accept"); at large negative z it is highly likely to be "0" (e.g. "reject"); around P(Y) = 0.5 there is a "grey area" where we are less confident whether the case is a "0" or a "1".]

4
Model estimation

• The model is estimated using the log(likelihood) function

Step 1: P(Y=1) = e^z / (1 + e^z)   OR equivalently   P(Y=1) = 1 / (1 + e^(−z)), where z = β₀ + β₁X₁ + … + βₖXₖ

Step 2: for each observation i, the likelihood is Lᵢ = P(Yᵢ=1)^Yᵢ × (1 − P(Yᵢ=1))^(1 − Yᵢ)

Step 3: Likelihood = ∏ᵢ Lᵢ (the product over all observations)

Step 4: Log(L) = LN(Likelihood)

Step 5: −2LL = −2 × Σ Log(Lᵢ)

The aim is to select the model coefficients (β) that minimise the −2LL
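
A small numpy sketch of Steps 2–5, assuming a vector of observed outcomes y (0/1) and fitted probabilities p; it only evaluates the −2LL for given coefficients and does not perform the optimisation that SPSS carries out. The data are made up.

```python
import numpy as np

def minus_two_log_likelihood(y, p):
    """-2LL for binary outcomes y (0/1) and predicted probabilities p."""
    log_l = y * np.log(p) + (1 - y) * np.log(1 - p)  # per-observation log-likelihood
    return -2.0 * np.sum(log_l)

# hypothetical outcomes and fitted probabilities for five observations
y = np.array([1, 0, 1, 1, 0])
p = np.array([0.8, 0.3, 0.6, 0.9, 0.2])
print(minus_two_log_likelihood(y, p))  # smaller values indicate a better fit
```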

5
Model diagnostics

• The Wald statistic (similar to the t-statistic in linear regression) is used to test each coefficient, though it follows a Chi-square distribution.
• It can be prone to Type II errors for large β (rejecting genuinely significant results), because a large coefficient inflates its standard error.

• A β coefficient is estimated for each of the X independent variables

• The odds ratio is commonly used. It shares the same property as a β in linear regression (i.e. it is the effect on the dependent variable of a 1 unit change in the independent variable). It is the exponentiated β and is interpreted as a ratio (i.e. Exp(β) = 1.1 means a 1 unit increase in X increases the odds by 10%)
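
As a hedged illustration of where these quantities come from (using Python's statsmodels rather than SPSS, with simulated data and made-up variable names): the Wald statistic is the squared ratio of B to its standard error, and Exp(B) is simply the exponentiated coefficient.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# simulated data: a binary outcome and two hypothetical predictors
rng = np.random.default_rng(0)
df = pd.DataFrame({"x1": rng.normal(size=200), "x2": rng.normal(size=200)})
df["y"] = rng.binomial(1, 1 / (1 + np.exp(-(0.5 + 1.0 * df["x1"]))))

model = sm.Logit(df["y"], sm.add_constant(df[["x1", "x2"]])).fit(disp=0)

wald = (model.params / model.bse) ** 2   # Wald statistic, compared to Chi-square(1)
exp_b = np.exp(model.params)             # Exp(B): odds ratio per 1-unit change in X
print(pd.DataFrame({"B": model.params, "S.E.": model.bse,
                    "Wald": wald, "Exp(B)": exp_b}))
```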

6
Assumptions of logistic regression

• Assumptions
• The model is linear in the logit, not in the probability: the assumption is that there is a linear relationship between the predictors and the logit form.
• i.e. the z = a + bX model feeding the logistic curve
• Errors are independent (as per linear regression)
• Multicollinearity should be avoided (as per linear regression)

• Issues
• Incomplete predictors – insufficient combinations of the independent variables to build a viable prediction model
• (e.g. trying to predict brand choice from age/gender when there is no data on younger males' choices)
• Complete separation – when the dependent variable can be predicted exactly by a linear combination of the independent variables (a quick cross-tabulation check for both issues is sketched below)
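
One quick way to spot both issues before fitting is to cross-tabulate the categorical predictors against the outcome: empty cells point to incomplete predictors, and rows where only one outcome ever occurs hint at (quasi-)complete separation. A pandas sketch with made-up data:

```python
import pandas as pd

# hypothetical data: brand choice by age group and gender
df = pd.DataFrame({
    "choice":    [1, 0, 1, 1, 0, 0, 1, 0],
    "age_group": ["young", "young", "old", "old", "old", "young", "old", "old"],
    "gender":    ["m", "f", "m", "f", "m", "f", "m", "f"],
})

# empty cells   -> too few combinations to estimate those effects
# one-sided rows -> possible (quasi-)complete separation
print(pd.crosstab([df["age_group"], df["gender"]], df["choice"]))
```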
Explore the data set loan_data.sav

As with all analysis, it is useful to explore and understand the data first.
• Categorical (nominal) variables will need to be recoded into a set of binary variables (as seen in linear regression); a sketch of this recoding follows below.
• SPSS automatically does this in the model building stage
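
The slides rely on SPSS doing the recoding automatically; as a point of reference, a hypothetical pandas equivalent (the column name and categories are invented):

```python
import pandas as pd

df = pd.DataFrame({"education": ["school", "college", "degree", "college"]})

# one binary (dummy) column per category, dropping one to act as the base category
dummies = pd.get_dummies(df["education"], prefix="ed", drop_first=True)
print(dummies)
```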

9
• Load the variables into the logistic regression (Analyze/Regression/Binary Logistic).
• Click Categorical and identify which variables are categorical (nominal). A rough non-SPSS equivalent of this step is sketched below.
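
For readers working outside SPSS, a hedged statsmodels equivalent of this step: pd.read_spss needs the pyreadstat package, and the variable names (default, age, income, education) are assumptions about loan_data.sav rather than its actual contents.

```python
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_spss("loan_data.sav")  # reads the SPSS file into a DataFrame

# C(...) marks a nominal predictor, so it is dummy-coded automatically
model = smf.logit("default ~ age + income + C(education)", data=df).fit()
print(model.summary())
```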

10
• Select the iteration history option to assess the change in Log
Likelihood

11
• The algorithm starts with "Block 0".
• All cases are classified into one group

• Just a constant in the model

• Note that the categorical variable coding has been produced
12
• The model compares the full model with the constant-only model (a bit like the ANOVA F-test in linear regression)
• It then tests whether the improvement in Log Likelihood is statistically significant
• The reason that the Log Likelihood is multiplied by −2 is that the resulting statistic follows a Chi-square distribution, which makes it convenient to assess statistically.

[Diagram: −2LL of Block 0 (constant-only model) minus −2LL of Block 1 (full model) gives the Difference, which is tested against a Chi-square distribution; a significant result means we reject H0: no difference in the models.]
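
A hedged sketch of the same Block 0 vs Block 1 comparison in statsmodels, reusing the hypothetical loan_data.sav variable names from the earlier sketch: the difference in −2LL is referred to a Chi-square distribution with degrees of freedom equal to the number of extra parameters.

```python
import pandas as pd
import statsmodels.formula.api as smf
from scipy import stats

df = pd.read_spss("loan_data.sav")  # hypothetical variable names, as before

null_model = smf.logit("default ~ 1", data=df).fit(disp=0)                            # Block 0
full_model = smf.logit("default ~ age + income + C(education)", data=df).fit(disp=0)  # Block 1

chi_sq = -2 * null_model.llf - (-2 * full_model.llf)  # difference in -2LL
df_diff = full_model.df_model - null_model.df_model   # extra parameters in Block 1
p_value = stats.chi2.sf(chi_sq, df_diff)
print(f"Chi-square = {chi_sq:.2f}, df = {df_diff:.0f}, p = {p_value:.4f}")
```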

13
• The classification table is very useful for understanding how many of the observations are categorised correctly.
• Also, for those not classified correctly, what (if any) is the systematic error?
• The risk in this analysis is accepting a high number of applicants who defaulted, having predicted that they would not default.
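
A pandas sketch of the same classification table, reusing the hypothetical full_model and df from the previous sketch and the 0.5 cut-off that SPSS uses by default:

```python
import pandas as pd

predicted = (full_model.predict(df) >= 0.5).astype(int)  # classify at the 0.5 cut-off
print(pd.crosstab(df["default"], predicted,
                  rownames=["Observed"], colnames=["Predicted"]))
```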

14
Interpreting the coefficients in the output:
• B: coefficient of the X variable in model z, i.e. the change in z given a one unit change in X.
• Wald: ratio of the B estimate to its S.E.; large values mean more confidence in the B estimate (similar to the t-ratio in linear regression).
• Sig.: significance level of B (<.05 desirable).
• Exp(B): the odds ratio (odds of "1" vs odds of "0"), i.e. the exponentiated B. Exp(B) > 1 means the odds of the outcome occurring increase; Exp(B) < 1 means the outcome becomes less likely.
• Categorical variables are automatically assigned a "base" category, and their coefficients express probabilities versus the base.
• ! Note, many variables are non-significant. Re-run the model, removing the larger Sig levels, one by one.
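
A rough sketch of that "remove the largest Sig level, one at a time" advice as a loop (again with the hypothetical predictors used above; in practice the choice of which variables to keep should also be guided by theory, not p-values alone):

```python
import statsmodels.formula.api as smf

predictors = ["age", "income", "C(education)"]  # hypothetical predictor list

while predictors:
    model = smf.logit("default ~ " + " + ".join(predictors), data=df).fit(disp=0)
    pvals = model.pvalues.drop("Intercept")
    if pvals.max() <= 0.05:          # every remaining term is significant: stop
        break
    worst = pvals.idxmax()           # term with the largest Sig level
    # dummy terms are named e.g. "C(education)[T.degree]", so match on the prefix
    predictors = [p for p in predictors if not worst.startswith(p)]

print(model.summary())
```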

15
