SCA - Module 10

Logistic Regression

Week 11
Logistic Regression

▪ Logistic regression is a variation of ordinary regression in which the
dependent variable is categorical. The independent variables may be
continuous or categorical
▪ It is generally used when the dependent variable is binary (0 or 1),
as in the credit-approval decision example that we have been using,
in which Y = 1 if the loan is approved and Y = 0 if it is rejected
▪ Binary outcomes are common in many other business situations, for
example classifying customers as buyers or nonbuyers, or credit-card
transactions as fraudulent or not
▪ To classify an observation using logistic regression, estimate the
probability p that it belongs to category 1, P(Y = 1), and, consequently,
the probability 1 - p that it belongs to category 0, P(Y = 0)
▪ The cutoff value is typically 0.5
▪ For instance, if p > 0.5, the observation is classified into
category 1; otherwise it is classified into category 0
▪ In logistic regression, the dependent variable is called the logit, which
is the natural logarithm of p / (1 - p), that is, logit(p) = ln(p / (1 - p))
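The cutoff rule and the logit can be sketched in a few lines of Python. This is only an illustration: the function names `logit` and `classify` are made up for this sketch, and the 0.5 cutoff is the typical value mentioned above, not a fixed rule.

```python
import math

def logit(p):
    """Natural logarithm of p / (1 - p); defined only for 0 < p < 1."""
    return math.log(p / (1.0 - p))

def classify(p, cutoff=0.5):
    """Return category 1 if p exceeds the cutoff, otherwise category 0."""
    return 1 if p > cutoff else 0

# Show the logit and the resulting category for a few probabilities.
for p in (0.2, 0.5, 0.8):
    print(p, round(logit(p), 4), classify(p))
```

Note that p = 0.5 gives a logit of zero, so the 0.5 cutoff on p corresponds to checking whether the logit is positive.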
▪ The ratio p / (1 - p) is called the odds
▪ Odds are a common notion in gambling. For example, if the probability of
winning a game is p = 0.2, then 1 - p = 0.8, so the odds of winning are
0.2/0.8 = 1/4
▪ That is, on average you would win once for every four times you lose
▪ The values of the predictor variables are transformed into probabilities
by a logistic function
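As a quick check of the gambling example, the sketch below computes the odds for p = 0.2 and then uses the standard logistic function, 1 / (1 + e^(-z)), to map the log-odds back to a probability. The helper names `odds` and `logistic` are chosen for this illustration only.

```python
import math

def odds(p):
    """Odds in favour of an event with probability p (0 <= p < 1)."""
    return p / (1.0 - p)

def logistic(z):
    """Logistic function: maps any real number z to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

p = 0.2
win_odds = odds(p)            # 0.2 / 0.8, i.e. one win per four losses
log_odds = math.log(win_odds)
recovered = logistic(log_odds)  # should be p again, up to rounding error

print("odds:", win_odds)
print("log-odds:", log_odds)
print("probability recovered from log-odds:", recovered)
```

Because the logistic function undoes the log-odds transformation, a model that predicts the log-odds from the predictor variables can always be turned back into a probability.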
Logistic Regression-Example
This example uses Excel to determine whether a credit application is likely
to be approved. The dataset includes basic information on previous history:
• Average credit points
• Rebounds
• Years of credit history
Approval  Points  Rebounds  Years
0         13      4         7
1         14      5         5
0         14      5         7
1         13      10        10
1         15      5         6
0         15      5         5
0         18      3         3
1         18      7         6
1         22      6         8
0         22      10        4
1         25      12        12
0         25      5         6
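The slides work this example in Excel. As a rough Python counterpart, the sketch below fits a logistic model to the twelve rows of the table by plain gradient ascent on the log-likelihood. Everything beyond the data is an assumption of this sketch, not the Excel procedure: the predictors are standardized first, and the learning rate (0.1) and number of steps (5000) are arbitrary choices.

```python
import math

# Rows copied from the table: (approval, points, rebounds, years).
DATA = [
    (0, 13, 4, 7), (1, 14, 5, 5), (0, 14, 5, 7), (1, 13, 10, 10),
    (1, 15, 5, 6), (0, 15, 5, 5), (0, 18, 3, 3), (1, 18, 7, 6),
    (1, 22, 6, 8), (0, 22, 10, 4), (1, 25, 12, 12), (0, 25, 5, 6),
]

def standardize(columns):
    """Rescale each column to mean 0 and standard deviation 1."""
    scaled = []
    for col in columns:
        mean = sum(col) / len(col)
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / len(col))
        scaled.append([(v - mean) / sd for v in col])
    return scaled

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, x):
    """Estimated P(Y = 1) for one row x (first entry is the intercept term)."""
    return logistic(sum(w * v for w, v in zip(weights, x)))

def log_likelihood(weights, rows, labels):
    return sum(
        y * math.log(predict(weights, x)) + (1 - y) * math.log(1.0 - predict(weights, x))
        for x, y in zip(rows, labels)
    )

def fit(rows, labels, rate=0.1, steps=5000):
    """Gradient ascent on the mean log-likelihood (assumed settings)."""
    weights = [0.0] * len(rows[0])
    n = len(rows)
    for _ in range(steps):
        grad = [0.0] * len(weights)
        for x, y in zip(rows, labels):
            err = y - predict(weights, x)
            for j, v in enumerate(x):
                grad[j] += err * v / n
        weights = [w + rate * g for w, g in zip(weights, grad)]
    return weights

labels = [row[0] for row in DATA]
cols = standardize([[row[j] for row in DATA] for j in (1, 2, 3)])
rows = [[1.0] + [col[i] for col in cols] for i in range(len(DATA))]

weights = fit(rows, labels)
probs = [predict(weights, x) for x in rows]
predicted = [1 if p > 0.5 else 0 for p in probs]  # 0.5 cutoff

print("weights (intercept, points, rebounds, years):", weights)
print("log-likelihood:", log_likelihood(weights, rows, labels))
print("predicted:", predicted)
print("actual:   ", labels)
```

The fitted weights apply to the standardized predictors, so they are not directly comparable with coefficients estimated on the raw columns in a spreadsheet.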
Quiz
Highlight the clusters which will change in the iterations

Data provided by 3 companies
R1 2 7
R2 1 4
S1 2 5
S2 4 6
T1 1 2
T2 4 7

Seed values
A1 2, 7
B1 4, 6
C1 1, 2

Maximum iterations: 3
Or the results converged

Data Distance to Cluster Cluster
A1 2 10 1 3
A2 2 5 1 3
A3 8 4 1 3
B1 5 8 1 3
B2 7 5 1 3
B3 6 4 1 3
C1 1 2 1 3
C2 4 9 1 3

Deadline: Today (11:59 pm)
Submit Online (LMS)
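The iteration rule in the quiz (start from the seed values, stop after 3 iterations or once the results converge) matches the usual k-means procedure. A minimal sketch for the six company data points follows. It assumes Euclidean distance, which the slide does not state, and the names `POINTS`, `SEEDS`, `nearest` and `k_means` are illustrative only. Cluster indices 0, 1, 2 correspond to the seeds A1, B1, C1.

```python
import math

# Data points and seed values as listed in the quiz.
POINTS = {
    "R1": (2, 7), "R2": (1, 4),
    "S1": (2, 5), "S2": (4, 6),
    "T1": (1, 2), "T2": (4, 7),
}
SEEDS = [(2, 7), (4, 6), (1, 2)]  # A1, B1, C1

def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance, assumed)."""
    dists = [math.dist(point, c) for c in centroids]
    return dists.index(min(dists))

def k_means(points, seeds, max_iterations=3):
    """Assign points to clusters until nothing changes or the limit is hit."""
    centroids = list(seeds)
    assignment = {}
    for _ in range(max_iterations):
        new_assignment = {name: nearest(p, centroids) for name, p in points.items()}
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Move each centroid to the mean of its current members.
        for k in range(len(centroids)):
            members = [points[n] for n, c in assignment.items() if c == k]
            if members:
                centroids[k] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return assignment, centroids

assignment, centroids = k_means(POINTS, SEEDS)
for name, cluster in assignment.items():
    print(name, "-> cluster", cluster)
print("centroids:", centroids)
```

Comparing the assignment from one pass to the next shows which points, if any, change cluster between iterations.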
