
Logistic Regression

PRESENTATION TITLE 1
Logistic Regression – What and Why
Business decisions often involve understanding or estimating the probability of events or behaviors that are dichotomous, that is, a binary dependent variable (DV)*.
E.g., predicting from demographic and behavioral data whether a person will (DV = 1) or will not (DV = 0) subscribe to a magazine or use a product.
E.g., predicting whether a cell phone user will churn (DV = 1), switching to another carrier by the end of the year; DV = 0 if the customer is retained.
*Note: If an independent variable (IDV) is binary, it is treated as a dummy variable. If the dependent variable (DV) is binary, the problem calls for logistic regression.
Logistic Regression
Simmons.xls
Simmons Stores owns and operates a national chain of women’s apparel stores and uses direct-mail promotions. Five thousand copies of an expensive four-color sales catalog have been printed, and each catalog includes a coupon that provides a $50 discount on purchases of $200 or more.
The catalogs are expensive, so Simmons would like to send them only to the customers who have the highest probability of using the coupon.

Logistic Regression
Management thinks that annual spending at Simmons stores
and whether a customer has a Simmons credit card are two
variables that might be helpful in predicting whether a
customer who receives the catalog will use the coupon.
Simmons conducted a pilot study using a random sample of 50 Simmons credit card customers and 50 other customers who do not have a Simmons credit card.
The amount each customer spent at Simmons last year is recorded in thousands of dollars, and the credit card information is coded as 1 if the customer has a Simmons credit card and 0 if not.

Logistic Regression
1. Help Simmons estimate whether a catalog recipient will use the coupon.
2. Estimate the probability of using the coupon for
customers who spend $2000 annually and do not have a
Simmons credit card.
3. Estimate the probability of using the coupon for
customers who spend $2000 annually and have a
Simmons credit card.


Categorical Dependent Variables – Logistic Regression
The dependent variable, y, assumes two discrete values,
such as 0 and 1.

Logistic Regression Equation
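• Linear Regression Equation:
$$E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n$$
• Logistic Regression Equation, if there is only one independent variable x:
$$E(y) = \frac{e^{\beta_0 + \beta_1 x}}{1 + e^{\beta_0 + \beta_1 x}}$$
• Logistic Regression Equation, if there is a set of independent variables x1, x2, …, xn:
$$E(y) = \frac{e^{\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n}}{1 + e^{\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n}}$$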
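A minimal Python sketch of the logistic regression equation for a set of independent variables, $E(y) = e^{z}/(1+e^{z})$ with $z = \beta_0 + \beta_1 x_1 + \cdots + \beta_n x_n$. The function name `logistic_expectation` and the example coefficients are hypothetical, chosen only for illustration. When $z \ge 0$ it uses the algebraically equivalent $1/(1+e^{-z})$ so that `math.exp` cannot overflow.

```python
import math
from typing import Sequence


def logistic_expectation(b0: float, betas: Sequence[float], xs: Sequence[float]) -> float:
    """E(y) = e^z / (1 + e^z), where z = b0 + b1*x1 + ... + bn*xn."""
    if len(betas) != len(xs):
        raise ValueError("need exactly one coefficient per independent variable")
    z = b0 + sum(b * x for b, x in zip(betas, xs))
    if z >= 0:
        # Same value as e^z / (1 + e^z), rewritten so exp() cannot overflow.
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


# Hypothetical coefficients b0 = 0, b1 = 1 with a single independent variable x.
for x in (-10, -2, 0, 2, 10):
    print(x, round(logistic_expectation(0.0, [1.0], [x]), 4))
```

With these illustrative coefficients the printed values climb from near 0, through exactly 0.5 at x = 0, to near 1, which is the S-shaped curve of the logistic equation.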

Logistic Regression equation
The graph of E(y) against x is S-shaped.

y = 0 if the customer did not use the coupon
y = 1 if the customer used the coupon
x1 = annual spending at Simmons ($1000s)
x2 = 0 if the customer does not have a Simmons credit card
x2 = 1 if the customer has a Simmons credit card

Logistic Regression equation

E(y) is the probability that y = 1 given a particular set of values for the independent variables x1, x2, …, xn.
When the coefficient of x is positive, E(y) approaches 1 as x becomes larger and approaches 0 as x becomes smaller.

$$E(\hat{y}) = \frac{e^{-2.14 + 0.34164\,(\text{spending}) + 1.09873\,(\text{card})}}{1 + e^{-2.14 + 0.34164\,(\text{spending}) + 1.09873\,(\text{card})}}$$

Logistic Regression equation
1. Estimate of P(y = 1 | x1 = 2, x2 = 0): substitute x1 = 2 and x2 = 0 into the estimated equation. The probability of using the coupon for customers who spent $2,000 and do not have the credit card is 0.1880.
2. Estimate of P(y = 1 | x1 = 2, x2 = 1): substitute x1 = 2 and x2 = 1 into the estimated equation. The probability of using the coupon for customers who spent $2,000 and have the credit card is 0.4099.

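The two substitutions can be reproduced with a short Python sketch. `coupon_probability` is a hypothetical helper name. The slides print the intercept rounded to -2.14, which gives about 0.189 rather than 0.1880 for the first case, so the sketch assumes an unrounded intercept of -2.14637, a value that reproduces both 0.1880 and 0.4099.

```python
import math

# Slope coefficients as printed on the slides; the intercept -2.14637 is an
# assumed unrounded version of the slides' -2.14.
B0, B_SPENDING, B_CARD = -2.14637, 0.341643, 1.09873


def coupon_probability(spending: float, card: int) -> float:
    """Estimated P(y = 1) for annual spending in $1000s and card = 0 or 1."""
    z = B0 + B_SPENDING * spending + B_CARD * card
    return math.exp(z) / (1.0 + math.exp(z))


p_no_card = coupon_probability(2, 0)  # spent $2,000, no Simmons card
p_card = coupon_probability(2, 1)     # spent $2,000, has a Simmons card
print(f"no card: {p_no_card:.4f}, card: {p_card:.4f}")
```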

Odds Ratio
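The odds ratio measures the impact on the odds of a one-unit increase in one independent variable while the other independent variables are held constant.

Odds ratio = odds1 / odds0, where

$$\text{odds}_0 = \frac{P(y=1 \mid x_1=2, x_2=0)}{1 - P(y=1 \mid x_1=2, x_2=0)}, \qquad \text{odds}_1 = \frac{P(y=1 \mid x_1=2, x_2=1)}{1 - P(y=1 \mid x_1=2, x_2=1)}$$

or, equivalently, odds ratio $= e^{\beta_i}$ for a one-unit increase in $x_i$.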
Odds Ratio & Estimated Odds Ratio


Odds0 = 0.1880 / (1 - 0.1880) = 0.2315
Odds1 = 0.4099 / (1 - 0.4099) = 0.6946
Estimated odds ratio = Odds1 / Odds0 = 0.6946 / 0.2315 ≈ 3.00
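A small Python check of this arithmetic, starting from the two rounded probabilities on the slide; `odds` is a hypothetical helper name.

```python
def odds(p: float) -> float:
    """Odds that y = 1 for probability p: p / (1 - p)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must lie in [0, 1)")
    return p / (1.0 - p)


odds0 = odds(0.1880)  # spent $2,000, no Simmons card
odds1 = odds(0.4099)  # spent $2,000, has a Simmons card
ratio = odds1 / odds0
print(f"odds0 = {odds0:.4f}, odds1 = {odds1:.4f}, odds ratio = {ratio:.2f}")
```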
Odds Ratio (Formula) & Estimated Odds Ratio (Value)

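Calculation of the odds ratios from the estimated equation:

1. Odds ratio for x1 (annual spending): $e^{\beta_1} = e^{0.341643} = 1.41$
2. Odds ratio for x2 (Simmons credit card): $e^{\beta_2} = e^{1.09873} = 3.00$, which agrees with the estimated odds ratio of 3 computed from the two probabilities.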
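The Python sketch below computes both odds ratios from the coefficients and checks numerically that $e^{\beta_i}$ equals the ratio of odds for a one-unit change in $x_i$ at several spending levels. This works because the odds implied by the logistic equation are $e^{z}$, so the intercept cancels in the ratio; the intercept here is the slides' rounded -2.14. The helper name `odds_from_equation` and the spending levels tried are illustrative choices.

```python
import math

# Coefficients from the estimated Simmons equation; the intercept cancels in odds ratios.
B0, B_SPENDING, B_CARD = -2.14, 0.341643, 1.09873


def odds_from_equation(spending: float, card: int) -> float:
    """Odds p / (1 - p) implied by the logistic equation (equal to e^z)."""
    z = B0 + B_SPENDING * spending + B_CARD * card
    p = math.exp(z) / (1.0 + math.exp(z))
    return p / (1.0 - p)


or_spending = math.exp(B_SPENDING)  # effect of +$1,000 in annual spending
or_card = math.exp(B_CARD)          # effect of holding a Simmons card

# The odds ratio for a one-unit change is the same at every spending level.
for s in (1, 2, 5):
    assert math.isclose(odds_from_equation(s + 1, 0) / odds_from_equation(s, 0), or_spending)
    assert math.isclose(odds_from_equation(s, 1) / odds_from_equation(s, 0), or_card)

print(f"spending: {or_spending:.2f}, card: {or_card:.2f}")
```

Read as multipliers, each additional $1,000 of annual spending multiplies the estimated odds of using the coupon by about 1.41, and holding a Simmons card multiplies them by about 3.00, other things being equal.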

Wald’s Chi-Square
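Can be used to test one parameter or several parameters simultaneously.
Null hypothesis H0: θ = θ0 (a specified value, often zero).
Alternative hypothesis Ha: θ ≠ θ0.
Measures the horizontal distance, along the parameter axis, between the estimate θ̂ and θ0, scaled by the precision (standard error) of the estimate.
For a single regression coefficient, the Wald statistic is $W = \left(\hat{\beta}_i / \mathrm{SE}(\hat{\beta}_i)\right)^2$, which follows a chi-square distribution with one degree of freedom; regression output typically tests one coefficient at a time. When k parameters are tested jointly, the statistic has k degrees of freedom.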
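The single-coefficient case can be sketched in Python with only the standard library: for one degree of freedom, the chi-square upper-tail probability at W equals `math.erfc(sqrt(W / 2))`. The function name `wald_test` is hypothetical, and because the slides do not report standard errors, the estimate and standard error used below are made-up illustrative numbers.

```python
import math
from typing import Tuple


def wald_test(estimate: float, std_error: float, null_value: float = 0.0) -> Tuple[float, float]:
    """Single-parameter Wald test: W = ((estimate - null_value) / SE)^2 ~ chi-square(1)."""
    if std_error <= 0:
        raise ValueError("standard error must be positive")
    w = ((estimate - null_value) / std_error) ** 2
    # Upper tail of chi-square(1) at w equals P(|Z| > sqrt(w)) = erfc(sqrt(w / 2)).
    p_value = math.erfc(math.sqrt(w / 2.0))
    return w, p_value


# Illustrative (made-up) inputs: a coefficient of 1.96 with standard error 1.0.
w, p = wald_test(1.96, 1.0)
print(f"W = {w:.4f}, p = {p:.4f}")
```

A p-value below the chosen significance level (e.g., 0.05) leads to rejecting H0 that the coefficient equals zero.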

Hosmer-Lemeshow Test for Goodness of Fit
• A statistical test of overall predictive accuracy based on the actual and predicted values of the dependent variable, typically comparing observed and expected counts within groups formed by predicted probability.
• A better-fitting model shows smaller differences between the observed and predicted classifications.
• Well-fitted models have non-significant chi-square values.
• Model fit is accepted if the p-value of the chi-square statistic is greater than 0.05.
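A minimal Python sketch of the statistic, assuming the common construction: sort cases by predicted probability, split them into g groups (ten by default), sum (observed - expected)² / expected over both outcome classes in every group, and compare the total with a chi-square distribution on g - 2 degrees of freedom. The function names are hypothetical. To stay within the standard library, the p-value uses the closed-form chi-square tail that holds only for even degrees of freedom; `scipy.stats.chi2.sf` is a general alternative.

```python
import math
from typing import Sequence, Tuple


def chi2_sf_even_df(x: float, df: int) -> float:
    """P(chi-square with df degrees of freedom > x); closed form valid only for even df."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form needs a positive, even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for j in range(1, df // 2):
        term *= half / j  # running value of (x/2)^j / j!
        total += term
    return math.exp(-half) * total


def hosmer_lemeshow(y: Sequence[int], p: Sequence[float], groups: int = 10) -> Tuple[float, int, float]:
    """Return (HL chi-square, degrees of freedom, p-value)."""
    if len(y) != len(p) or len(y) < groups:
        raise ValueError("need matching y and p with at least one case per group")
    rows = sorted(zip(p, y), key=lambda r: r[0])  # order cases by predicted probability
    n = len(rows)
    stat = 0.0
    for g in range(groups):
        chunk = rows[g * n // groups:(g + 1) * n // groups]
        size = len(chunk)
        observed1 = sum(yi for _, yi in chunk)  # actual number of y = 1
        expected1 = sum(pi for pi, _ in chunk)  # predicted number of y = 1
        observed0, expected0 = size - observed1, size - expected1
        stat += (observed1 - expected1) ** 2 / expected1
        stat += (observed0 - expected0) ** 2 / expected0
    df = groups - 2
    return stat, df, chi2_sf_even_df(stat, df)


# Toy data (made up): every predicted probability is 0.5 and outcomes alternate,
# so each group of two has one 1 and one 0, matching its expected count exactly.
y_toy = [1, 0] * 10
p_toy = [0.5] * 20
print(hosmer_lemeshow(y_toy, p_toy))
```

In the toy case the statistic is 0 and the p-value is 1, the most favorable outcome; a p-value above 0.05 is read as acceptable fit.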

Thank you
