
Logistic regression model

Case studies for choice models
Choice models cater to cases where the response variable is categorical:
- Home loan / credit card / consumer loan defaults {default vs. no default}
- Fraud detection {fraud vs. no fraud}
- Customer churn analysis {churn vs. no churn}
- Propensity-to-buy models {buy vs. no buy}

Linear regression is a bad choice when response variables are categorical

- Clearly, the simplest model could be y = 1 when tumor size is greater than 5.
- In the first model, one could do that by saying y_predicted > 0.5.
- Adding a few more grey points should not result in a new model or a new line, because in reality the cut has not changed.
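To make the last point concrete, here is a minimal sketch that fits an ordinary least-squares line to hypothetical tumor-size data and reports where the line crosses 0.5. The numbers and the helper names `fit_line` and `cut_point` are invented for illustration and are not taken from the slides. Adding a few large tumors, all labeled 1, moves the fitted cut even though the underlying rule (size greater than 5) is unchanged.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def cut_point(xs, ys):
    """Tumor size at which the fitted line predicts exactly 0.5."""
    intercept, slope = fit_line(xs, ys)
    return (0.5 - intercept) / slope


# Hypothetical data: y = 1 exactly when tumor size > 5.
sizes = [1, 2, 3, 4, 6, 7, 8, 9]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
cut_before = cut_point(sizes, labels)

# Add three large tumors; the true rule (size > 5) is unchanged.
cut_after = cut_point(sizes + [20, 25, 30], labels + [1, 1, 1])

print(f"cut before: {cut_before:.2f}, cut after: {cut_after:.2f}")
```

With this data the cut starts at 5 and moves above 6 once the extra points are added, so a tumor of size 6 would be scored below 0.5 by the refitted line.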

General structure for choice models
Example: Home loan default

X (inputs):
- Income
- Debt to Income
- Default on other loans
- Salaried vs. Business
- Expense to Income

Output: Credit Score -> Probability of default

Logistic regression model


Instead of predicting an absolute value, we predict the probability of an event.
[Figure: Sigmoid function P(z) = 1/(1+exp(-z)). x-axis: Tumor Size; y-axis: Probability of Cancer.]
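As a small illustration, the sigmoid P(z) = 1/(1+exp(-z)) can be written in a few lines. This is a sketch rather than a reference implementation. The two-branch form is an implementation choice made here to avoid overflow for large negative z; the slides do not specify it.

```python
import math


def sigmoid(z):
    """P(z) = 1 / (1 + exp(-z)), computed without overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)          # safe: z < 0, so exp(z) < 1
    return ez / (1.0 + ez)


# The output always lies between 0 and 1 and equals 0.5 at z = 0.
for z in (-6, -2, 0, 2, 6):
    print(z, round(sigmoid(z), 4))
```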

Error function (analogy)

- Y = 0: Error = (p - 0)
- Y = 1: Error = (1 - p)

Minimize the error: $p^{1-y}(1-p)^{y}$

Maximize (roughly MLE, Maximum Likelihood): $p^{y}(1-p)^{1-y}$
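The two expressions can be checked numerically for a single example. The sketch below uses a hypothetical predicted probability of 0.9; the function names `likelihood` and `error` are chosen here for illustration. For either label the two quantities sum to 1, which is why minimizing one is equivalent to maximizing the other.

```python
def likelihood(p, y):
    """p**y * (1 - p)**(1 - y): equals p when y = 1 and 1 - p when y = 0."""
    return (p ** y) * ((1 - p) ** (1 - y))


def error(p, y):
    """p**(1 - y) * (1 - p)**y: equals 1 - p when y = 1 and p when y = 0."""
    return (p ** (1 - y)) * ((1 - p) ** y)


p = 0.9  # hypothetical predicted probability of the event
for y in (0, 1):
    print(f"y={y}: likelihood={likelihood(p, y):.2f}, error={error(p, y):.2f}")
```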

Estimate parameters using Maximum Likelihood

$$\max_{\beta} \sum_i \left[ y_i \ln p(z_i) + (1 - y_i) \ln\bigl(1 - p(z_i)\bigr) \right] \quad \text{where } z_i = \beta^\top x_i$$
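A minimal sketch of maximum-likelihood estimation for one feature follows, assuming z_i is a linear function of x_i, written here as b0 + b1*x_i. The data, the parameter names `b0` and `b1`, the learning rate, and the choice of plain gradient ascent are all assumptions made for illustration; statistical packages typically use faster optimizers.

```python
import math


def sigmoid(z):
    """P(z) = 1 / (1 + exp(-z)), computed without overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def log_likelihood(b0, b1, xs, ys):
    """Sum of y*ln(p(z)) + (1 - y)*ln(1 - p(z)) with z = b0 + b1*x."""
    eps = 1e-12  # guard against ln(0)
    total = 0.0
    for x, y in zip(xs, ys):
        p = min(max(sigmoid(b0 + b1 * x), eps), 1.0 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total


def fit(xs, ys, learning_rate=0.05, steps=5000):
    """Gradient ascent on the mean log-likelihood (illustrative optimizer)."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # d(logL)/d(b0) = sum(y - p), d(logL)/d(b1) = sum((y - p) * x)
        residuals = [y - sigmoid(b0 + b1 * x) for x, y in zip(xs, ys)]
        b0 += learning_rate * sum(residuals) / n
        b1 += learning_rate * sum(r * x for r, x in zip(residuals, xs)) / n
    return b0, b1


# Hypothetical, non-separable data: tumor size vs. cancer (1) / no cancer (0).
sizes = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
labels = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]

b0, b1 = fit(sizes, labels)
ll_start = log_likelihood(0.0, 0.0, sizes, labels)
ll_fitted = log_likelihood(b0, b1, sizes, labels)
print(f"b0={b0:.3f}, b1={b1:.3f}, logL: {ll_start:.3f} -> {ll_fitted:.3f}")
```

The fitted log-likelihood should be higher than the starting value at b0 = b1 = 0, and the slope b1 should be positive because larger tumors are more often labeled 1 in this data.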

Churn Model Example

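Setting Threshold for classification

[Figure: a threshold on the predicted score separates Positive from Negative]

- High threshold -> high accuracy among predicted positives, low capture
- Low threshold -> low accuracy among predicted positives, high capture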
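The trade-off can be seen on a small example. In this sketch "accuracy" is interpreted as precision (the share of predicted positives that are truly positive) and "capture" as recall (the share of actual positives that are caught); that reading, the scores, and the labels are assumptions made for illustration.

```python
def precision_and_capture(scores, labels, threshold):
    """Classify score >= threshold as positive; return (precision, capture)."""
    predicted = [s >= threshold for s in scores]
    true_pos = sum(1 for p, y in zip(predicted, labels) if p and y == 1)
    flagged = sum(predicted)
    actual_pos = sum(labels)
    precision = true_pos / flagged if flagged else 0.0
    capture = true_pos / actual_pos if actual_pos else 0.0
    return precision, capture


# Hypothetical model scores and true labels (1 = positive).
scores = [0.95, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 1, 0, 1, 0, 1, 0, 0, 0]

high = precision_and_capture(scores, labels, 0.75)
low = precision_and_capture(scores, labels, 0.35)
print("high threshold (precision, capture):", high)
print("low threshold  (precision, capture):", low)
```

On this data the high threshold flags only true positives but misses two of the five, while the low threshold catches all five at the cost of two false alarms.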

Picking a threshold: KS Chart
- Divide the population into deciles.
- Take the upper limit of each decile and plot the cumulative percentage of good and bad examples.
- Pick the score/threshold of the decile where the separation between good and bad is the maximum.
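The KS procedure can be sketched as follows. The assumptions here are that a higher score means "more likely bad", that label 1 marks a bad example, that the population is ranked from highest to lowest score, and that the boundary score of each decile is the lowest score it contains. The data and the helper name `ks_table` are hypothetical.

```python
def ks_table(scores, labels, n_bins=10):
    """Rows of (boundary_score, cum_bad, cum_good, separation) per bin.

    Assumes at least n_bins examples and at least one example of each class.
    """
    ranked = sorted(zip(scores, labels), reverse=True)  # highest score first
    n = len(ranked)
    total_bad = sum(y for _, y in ranked)
    total_good = n - total_bad
    rows = []
    for k in range(1, n_bins + 1):
        end = (k * n) // n_bins          # examples covered by the first k bins
        seen = ranked[:end]
        bad = sum(y for _, y in seen)
        good = end - bad
        cum_bad = bad / total_bad
        cum_good = good / total_good
        rows.append((seen[-1][0], cum_bad, cum_good, cum_bad - cum_good))
    return rows


# Hypothetical population of 20 scored examples (1 = bad, 0 = good).
scores = [i / 20 for i in range(20, 0, -1)]
labels = [1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0]

rows = ks_table(scores, labels)
best = max(rows, key=lambda row: row[3])
print(f"threshold={best[0]:.2f}, KS separation={best[3]:.3f}")
```

The row with the largest separation gives both the KS statistic and the score to use as the threshold.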

Truth Table to measure accuracy

True Negative Rate = True Negative / Total Actual False (specificity)
True Positive Rate = True Positive / Total Actual True (sensitivity)

                  | Actual True    | Actual False
Predicted True    | True Positive  | False Positive
Predicted False   | False Negative | True Negative
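A short sketch of how the four cells and the two rates can be computed from scores and labels. Sensitivity is taken as True Positive / Total Actual True and specificity as True Negative / Total Actual False, which are the standard definitions. The scores, labels, and the 0.5 threshold are hypothetical.

```python
def confusion_counts(scores, labels, threshold):
    """Return (tp, fp, fn, tn), predicting True when score >= threshold."""
    tp = fp = fn = tn = 0
    for score, actual in zip(scores, labels):
        predicted_true = score >= threshold
        if predicted_true and actual == 1:
            tp += 1
        elif predicted_true and actual == 0:
            fp += 1
        elif not predicted_true and actual == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


# Hypothetical model scores and true labels (1 = actual True).
scores = [0.95, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 1, 0, 1, 0, 1, 0, 0, 0]

tp, fp, fn, tn = confusion_counts(scores, labels, threshold=0.5)
sensitivity = tp / (tp + fn)   # true positive rate
specificity = tn / (tn + fp)   # true negative rate
print(f"TP={tp} FP={fp} FN={fn} TN={tn}")
print(f"sensitivity={sensitivity:.2f} specificity={specificity:.2f}")
```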

Max sensitivity and Specificity
Choose the threshold that gives the best balance of sensitivity and specificity, for example where their sum is maximized; raising one generally lowers the other.
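One common way to make this choice concrete is to scan candidate thresholds and keep the one with the largest sensitivity + specificity. The sketch below does that on hypothetical data; using the sum as the criterion is an assumption made here, and other balances are possible.

```python
def sensitivity_specificity(scores, labels, threshold):
    """Predict True when score >= threshold; return (sensitivity, specificity)."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    tn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 0)
    actual_true = sum(labels)
    actual_false = len(labels) - actual_true
    return tp / actual_true, tn / actual_false


def best_threshold(scores, labels):
    """Try each observed score as a threshold; keep the largest sum."""
    best_t, best_sum = None, -1.0
    for t in sorted(set(scores)):
        sens, spec = sensitivity_specificity(scores, labels, t)
        if sens + spec > best_sum:
            best_t, best_sum = t, sens + spec
    return best_t, best_sum


# Hypothetical model scores and true labels (1 = actual True).
scores = [0.95, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 1, 1, 0, 0, 1, 0, 0, 0]

threshold, total = best_threshold(scores, labels)
print(f"best threshold={threshold}, sensitivity + specificity={total:.2f}")
```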

Goodness of fit: ROC Curve

- The dotted line represents the case where the model has not learnt anything, i.e. it picks the same percentage of false positives and true positives.
- The area under the blue curve therefore represents the goodness of fit (0.5 < Area < 1).
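The area under the ROC curve can be computed by sweeping the threshold from the highest score to the lowest and applying the trapezoidal rule. This is a minimal sketch that assumes no two examples share the same score; the data and function names are hypothetical.

```python
def roc_points(scores, labels):
    """(false positive rate, true positive rate) points, strictest threshold first.

    Assumes no tied scores and at least one example of each class.
    """
    ranked = sorted(zip(scores, labels), reverse=True)
    total_pos = sum(labels)
    total_neg = len(labels) - total_pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    for _, actual in ranked:
        if actual == 1:
            tp += 1
        else:
            fp += 1
        points.append((fp / total_neg, tp / total_pos))
    return points


def area_under_curve(points):
    """Trapezoidal rule over consecutive ROC points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical model scores and true labels (1 = actual True).
scores = [0.95, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 1, 0, 1, 0, 1, 0, 0, 0]

points = roc_points(scores, labels)
auc = area_under_curve(points)
print(f"AUC = {auc:.2f}")
```

A model that ranks examples at random would trace the diagonal and give an area near 0.5; a perfect ranking gives 1.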
