Business Analytics and Operations Research

Roger R. Gung, Ph.D.
Sr. Director, Business Analytics & Operations Research
roger.gung@phoenix.edu

6/5/2017 Copyright 2016 University of Phoenix. All Rights Reserved. 1


Generalized Linear Model (GLM)

• A generalized linear model (GLM) is a flexible generalization of ordinary linear regression that allows the dependent/target variable to have an error distribution other than the normal distribution.
• A GLM generalizes linear regression by relating the linear model to the dependent/target variable through a link function g:
$$ g(E[Y]) = \beta_0 + \sum_i \beta_i X_i $$
• Two major types of GLM:
  - Logistic Regression
  - Count Regression


Logistic Regression

• Logistic regression is a regression model in which the dependent/target variable is categorical:
  - Binary dependent variable
  - Multiple-category dependent variable
• Cases where the dependent variable has more than two outcome categories may be analyzed with multinomial logistic regression or, if the categories are ordered, with ordinal logistic regression.
• This course covers the case of a binary dependent variable, that is, one that can take only two values, "0" and "1", which represent outcomes such as pass/fail, win/lose, alive/dead, or healthy/sick.
• Link functions for a binary dependent variable:
  - Logit function
  - Probit function


Logistic Regression

• Logit function:
$$ \operatorname{logit}(p) = \log\frac{p}{1 - p} $$
[Figure: p (vertical axis, 0 to 1) plotted against logit(p) (horizontal axis, -10 to 10)]
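
The logit link and its inverse can be sketched in a few lines of Python. The function names `logit` and `inverse_logit` are illustrative choices, not part of the course material.

```python
import math


def logit(p: float) -> float:
    """Log odds log(p / (1 - p)) of a probability p strictly between 0 and 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    return math.log(p / (1.0 - p))


def inverse_logit(z: float) -> float:
    """Map a linear-predictor value z back to a probability (logistic function)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)  # avoids overflow for large negative z
    return ez / (1.0 + ez)


if __name__ == "__main__":
    for p in (0.1, 0.5, 0.9):
        z = logit(p)
        print(f"p={p:.1f}  logit(p)={z:+.3f}  back to p={inverse_logit(z):.3f}")
```

The inverse function is what turns a fitted linear predictor into the estimated probability that a logistic regression reports.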


Logistic Regression

• Probit function: the inverse CDF of the standard normal distribution
$$ \operatorname{probit}(p) = \Phi^{-1}(p) $$
[Figure: p (vertical axis, 0 to 1) plotted against probit(p) (horizontal axis, -10 to 10)]
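
A minimal sketch of the probit link, using the standard library's `statistics.NormalDist` (Python 3.8 or later). The names `probit` and `inverse_probit` are illustrative.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
_STD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def probit(p: float) -> float:
    """Inverse CDF of the standard normal distribution, for p in (0, 1)."""
    return _STD_NORMAL.inv_cdf(p)


def inverse_probit(z: float) -> float:
    """Standard normal CDF: maps a linear-predictor value z to a probability."""
    return _STD_NORMAL.cdf(z)


if __name__ == "__main__":
    for p in (0.025, 0.5, 0.975):
        print(f"p={p:.3f}  probit(p)={probit(p):+.3f}")
```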


Count Regression

Count Model Mean-Variance Relationship

Model                       Mean     Variance
Poisson                     $\mu$    $\mu$
Negative Binomial (NB1)     $\mu$    $\mu(1 + \alpha)$
Negative Binomial (NB2)     $\mu$    $\mu(1 + \alpha\mu)$
Poisson Inverse Gaussian    $\mu$    $\mu(1 + \alpha\mu^2)$
Negative Binomial P         $\mu$    $\mu(1 + \alpha\mu^{\rho})$
Generalized Poisson         $\mu$    $\mu(1 + \alpha\mu)^2$

• Poisson Regression
$$ \log(\mu) = \beta_0 + \sum_i \beta_i X_i $$
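
As a rough illustration of the mean-variance relationships, the sketch below codes the variance functions for the Poisson, NB1, and NB2 models in their usual textbook parameterization (an assumption here: alpha is the dispersion parameter) and adds a simple sample check for overdispersion. All function names are hypothetical.

```python
from statistics import mean, variance


def poisson_variance(mu: float) -> float:
    """Poisson: the variance equals the mean."""
    return mu


def nb1_variance(mu: float, alpha: float) -> float:
    """NB1: variance is a constant multiple of the mean, mu * (1 + alpha)."""
    return mu * (1.0 + alpha)


def nb2_variance(mu: float, alpha: float) -> float:
    """NB2: variance grows quadratically in the mean, mu * (1 + alpha * mu)."""
    return mu * (1.0 + alpha * mu)


def dispersion_ratio(counts) -> float:
    """Sample variance divided by sample mean.

    A ratio well above 1 suggests the data are overdispersed relative to a
    Poisson model, so one of the negative binomial variants may fit better.
    """
    return variance(counts) / mean(counts)


if __name__ == "__main__":
    sample = [0, 1, 2, 3, 4]  # made-up counts, for illustration only
    print("dispersion ratio:", dispersion_ratio(sample))
    print("NB2 variance at mu=2, alpha=0.5:", nb2_variance(2.0, 0.5))
```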


SAS Proc Logistic

[Two image-only slides; their content is not recoverable from the extracted text]


Variable Significance Level

• P-value: significance level of each variable's impact on the model's goodness of fit.
• Odds ratio: significance of each variable's impact on the target variable. The magnitude is measured by comparing the odds at the best level of the driver to the odds at its worst level:
$$ \text{Odds Ratio} = \left.\frac{p}{1 - p}\right|_{\text{Best Level}} \div \left.\frac{p}{1 - p}\right|_{\text{Worst Level}} $$
• Information Value (IV) and Weight of Evidence (WOE): significance of each variable's impact on the magnitude of the target variable.
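
The odds ratio between the best and worst level of a driver can be computed directly from the two probabilities. This is a minimal sketch; the function names and the example probabilities are made up for illustration.

```python
def odds(p: float) -> float:
    """Odds p / (1 - p) for a probability p strictly between 0 and 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    return p / (1.0 - p)


def odds_ratio(p_best: float, p_worst: float) -> float:
    """Odds at the best level divided by odds at the worst level."""
    return odds(p_best) / odds(p_worst)


if __name__ == "__main__":
    # Hypothetical driver: 50% response at its best level, 20% at its worst.
    print(odds_ratio(0.5, 0.2))
```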


Nonlinear Regression

• Nonlinear regression is a form of regression analysis in which the response/target variable is modeled by a function that is a nonlinear combination of the independent variables.
• Examples of nonlinear functions include exponential, logarithmic, trigonometric, power, and Gaussian functions.
[The equations on this slide are not recoverable from the extracted text. The surviving wording is: "Convert ... to ...", "Do not convert to log ... because ... is not the random noise for ...", and "Similarly for the following forms: ...".]

Nonlinear Regression for Marketing

• Each independent variable has a lift factor that describes its diminishing return (financial benefit) to the dependent variable.
• It is a multiplicative/interactive model that can be converted to an additive model.
• Marketing demand model:
[Equations not recoverable from the extracted text: a product over the independent variables with coefficients $\beta_i$, followed by its logarithmic (additive) form.]

Binning Nonlinear Variable

• It is not easy to find the best nonlinear function for each numerical variable.
• Convert numerical variables into categorical variables.
• Apply a decision tree to bin numerical variables.
• Each bin/level will have its own linear coefficient.

[Figure not recovered; one region of the plot is annotated "less population"]
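
A minimal sketch of the binning step, assuming the cut points have already been chosen (for example by a decision tree, which is not implemented here). The cut points for age below are hypothetical, and bin 0 is arbitrarily used as the reference level.

```python
from bisect import bisect_right


def assign_bin(value: float, cut_points: list) -> int:
    """Index of the bin containing value.

    cut_points must be sorted. Bin 0 is everything below the first cut point;
    bin k (k >= 1) covers cut_points[k-1] <= value < cut_points[k].
    """
    return bisect_right(cut_points, value)


def dummy_code(bin_index: int, n_bins: int) -> list:
    """One 0/1 indicator per bin except bin 0, which is the reference level.

    Each indicator gets its own linear coefficient in the regression.
    """
    return [1 if bin_index == k else 0 for k in range(1, n_bins)]


if __name__ == "__main__":
    age_cuts = [25, 35, 50]          # hypothetical cut points
    n_bins = len(age_cuts) + 1
    for age in (19, 25, 30, 62):
        b = assign_bin(age, age_cuts)
        print(age, "-> bin", b, "-> indicators", dummy_code(b, n_bins))
```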


Binning Nonlinear Variable

• Binning nonlinear variables can also be applied to GLMs.
• The dependent variable needs to be judgmentally binned to observe the nonlinearity, because it is a probability measurement.

[Figure not recovered; one region of the plot is annotated "less population"]


Receiver Operating Characteristic

• The receiver operating characteristic (ROC) curve is a graphical plot that illustrates the diagnostic ability of a binary classifier system as its discrimination threshold is varied.
• The ROC curve is created by plotting the true positive rate (TPR) against the false positive rate (FPR) at various threshold settings.
• The true positive rate is also known as sensitivity, recall, or probability of detection in machine learning.
• The false positive rate is also known as the fall-out or probability of false alarm, and can be calculated as (1 - specificity).


ROC Basic Concept

[Image-only slide; content not recoverable from the extracted text]

ROC Space

[Image-only slide; content not recoverable from the extracted text]


Constructing ROC Curve

• In binary classification, the class prediction for each instance is based on a continuous random variable X, a score computed for the instance (e.g. the estimated probability in logistic regression).
• Given a threshold parameter T, the instance is classified as positive if X > T, and negative otherwise.
• X follows a probability density $f_1(x)$ if the instance actually belongs to the positive class, and $f_0(x)$ otherwise.
• TPR is given by
$$ \mathrm{TPR}(T) = \int_T^{\infty} f_1(x)\,dx $$
• FPR is given by
$$ \mathrm{FPR}(T) = \int_T^{\infty} f_0(x)\,dx $$
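
With a finite sample, the integrals become simple proportions. The sketch below sweeps the threshold T over the observed scores and records (T, FPR, TPR) at each one, classifying an instance as positive when its score exceeds T. The function name and the toy data are illustrative only.

```python
def roc_points(scores, labels):
    """Empirical ROC points as (threshold, FPR, TPR) tuples.

    scores: model scores, higher means more likely positive.
    labels: 1 for the positive class, 0 for the negative class.
    An instance is classified positive when score > threshold.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    thresholds = [float("-inf")] + sorted(set(scores))
    points = []
    for t in thresholds:
        tp = sum(1 for s, y in zip(scores, labels) if s > t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s > t and y == 0)
        points.append((t, fp / n_neg, tp / n_pos))
    return points


if __name__ == "__main__":
    toy_scores = [0.9, 0.8, 0.4, 0.3]   # made-up scores
    toy_labels = [1, 0, 1, 0]
    for t, fpr, tpr in roc_points(toy_scores, toy_labels):
        print(f"T={t}: FPR={fpr:.2f}, TPR={tpr:.2f}")
```

Plotting TPR against FPR for these points traces the ROC curve from (1, 1) at the lowest threshold down to (0, 0) at the highest.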


Area under Curve, C-statistic

• The ROC curve is created by plotting the true positive rate (TPR) against the false positive rate (FPR) at various threshold settings.
• The area under the curve (AUC), also known as the C-statistic, is equal to the probability that a classifier will rank a randomly chosen positive instance higher than a randomly chosen negative one.
• AUC is given by
$$ \mathrm{AUC} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} I(T' > T)\, f_1(T')\, f_0(T)\, dT'\, dT = P(X_1 > X_0) $$
where $X_1$ is the score of a positive instance and $X_0$ is the score of a negative instance.
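
The ranking interpretation of AUC can be checked directly on a small sample by comparing every positive score with every negative score. This is a sketch, not an efficient implementation; counting ties as one half is an assumption (the usual Mann-Whitney convention) that the slide does not state.

```python
def auc_by_ranking(scores, labels):
    """Fraction of (positive, negative) pairs in which the positive instance
    has the higher score; tied pairs count as one half (assumed convention)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    toy_scores = [0.9, 0.8, 0.4, 0.3]   # made-up scores
    toy_labels = [1, 0, 1, 0]
    print(auc_by_ranking(toy_scores, toy_labels))
```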


Weight of Evidence

• In logistic regression, we usually have categorical variables. One can use Weight of Evidence (WOE) to transform categorical variables into numerical variables.
• The WOE of each level of a categorical variable measures the strength of that level for separating "yes" from "no" in the response variable.
• Benefits of WOE transformation:
  - The linearity assumption in logistic regression holds naturally (explained on the next slide)
  - Reduces the degrees of freedom and helps with the overfitting problem
  - Checking collinearity between categorical variables becomes easy


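WOE and Naïve Bayes

• For independent variable $X_j$, $j = 1, 2, \ldots, n$, the WOE of the $i$th level is calculated as follows:
$$ \mathrm{WOE}_i = \ln\frac{\text{share of all Yes outcomes that fall in the } i\text{th level}}{\text{share of all No outcomes that fall in the } i\text{th level}} = \ln\frac{P(X_j = i \mid Y = \text{Yes})}{P(X_j = i \mid Y = \text{No})} $$
• The Naïve Bayes classifier is given by:
$$ \ln\frac{P(Y = 1 \mid \mathbf{x})}{P(Y = 0 \mid \mathbf{x})} = \ln\frac{P(Y = 1)\, P(\mathbf{x} \mid Y = 1) / P(\mathbf{x})}{P(Y = 0)\, P(\mathbf{x} \mid Y = 0) / P(\mathbf{x})} $$
$$ = \ln\frac{P(Y = 1)\, P(x_1 \mid Y = 1)\, P(x_2 \mid Y = 1) \cdots P(x_n \mid Y = 1)}{P(Y = 0)\, P(x_1 \mid Y = 0)\, P(x_2 \mid Y = 0) \cdots P(x_n \mid Y = 0)} $$
$$ = \ln\frac{P(Y = 1)}{P(Y = 0)} + \sum_{j=1}^{n} \ln\frac{P(x_j \mid Y = 1)}{P(x_j \mid Y = 0)} $$
The terms of the sum form the WOE vector.
• The linearity assumption holds after WOE transformation!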


WOE and Information Value (IV)

• Weight of evidence indicates the predictive power of a particular level of the variable.
• Information value (IV) assesses the differentiating power of the variable, that is, how much its levels differ from one another with regard to the outcome, such as pass or fail.
• IV is given by:
$$ \mathrm{IV} = \sum_{\text{all } i} \big[ P(X_j = i \mid Y = \text{Yes}) - P(X_j = i \mid Y = \text{No}) \big] \times \mathrm{WOE}_i $$
• One rule of thumb regarding IV is (Siddiqi, Naeem 2006):
  - IV < 0.02: unpredictive
  - 0.02 <= IV < 0.1: weak
  - 0.1 <= IV < 0.3: medium
  - 0.3 <= IV < 0.5: strong
  - IV >= 0.5: suspicious and should be checked for over-predicting


Calculation of WOE & IV

• Here is an example of calculating WOE and IV:

Degree Level   #PSI      #REG     #NonREG   Distribution_REG   Distribution_NonREG   WOE      IV
Associate      29,553    4,224    25,329    16.4%              8.1%                  0.701    0.058
Bachelor       119,216   12,820   106,396   49.8%              34.2%                 0.376    0.059
Master         21,911    1,774    20,137    6.9%               6.5%                  0.063    0.000
Doctorate      1,268     46       1,222     0.2%               0.4%                  -0.787   0.002
Missing        164,978   6,871    158,107   26.7%              50.8%                 -0.643   0.155
ALL            336,926   25,735   311,191   100%               100%                           0.274
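
The table can be reproduced from the raw counts. The sketch below takes the #REG and #NonREG counts per level, computes each level's share of all REG and of all NonREG outcomes, and from those the WOE and the IV contribution. The function name `woe_iv` and the dictionary layout are illustrative choices.

```python
import math


def woe_iv(counts):
    """WOE and IV contribution per level, plus the total IV.

    counts maps level -> (n_yes, n_no), e.g. (#REG, #NonREG).
    Returns (per_level, total_iv) where per_level maps level -> (woe, iv).
    Levels with a zero count would need special handling (not covered here).
    """
    total_yes = sum(yes for yes, _ in counts.values())
    total_no = sum(no for _, no in counts.values())
    per_level = {}
    total_iv = 0.0
    for level, (yes, no) in counts.items():
        dist_yes = yes / total_yes          # Distribution_REG
        dist_no = no / total_no             # Distribution_NonREG
        woe = math.log(dist_yes / dist_no)
        iv = (dist_yes - dist_no) * woe
        per_level[level] = (woe, iv)
        total_iv += iv
    return per_level, total_iv


# Counts taken from the Degree Level table: (#REG, #NonREG)
degree_counts = {
    "Associate": (4224, 25329),
    "Bachelor": (12820, 106396),
    "Master": (1774, 20137),
    "Doctorate": (46, 1222),
    "Missing": (6871, 158107),
}

if __name__ == "__main__":
    levels, iv_total = woe_iv(degree_counts)
    for name, (woe, iv) in levels.items():
        print(f"{name:10s} WOE={woe:+.3f}  IV={iv:.3f}")
    print(f"Total IV = {iv_total:.3f}")
```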


Conversion of WOE Coefficients

• When one uses WOE-transformed variables to build a logistic regression, the corresponding coefficient estimate represents the change in log odds per one-unit increase in WOE. However, this coefficient cannot be easily interpreted.
• To obtain coefficients for the original variable, the WOE coefficient can be converted as follows:
  - Multiply the coefficient by the WOE of each level
  - Choose one level as the reference (for example Undergraduate, the Bachelor row in the table below), and subtract the multiplied result of the reference from that of each level
  - The result is the coefficient for the original variable Degree Level, and its interpretation is simple. For example, the log odds of REG for an AA PSI is 0.318 more than that of an undergraduate PSI

Degree Level   WOE      Coefficient Estimate   Multiplied   Compared with Reference (Undergraduate)
Associate      0.701    0.980                  0.687        0.318
Bachelor       0.376    0.980                  0.369        --
Master         0.063    0.980                  0.062        -0.307
Doctorate      -0.787   0.980                  -0.771       -1.140
Missing        -0.643   0.980                  -0.631       -0.999
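
The conversion steps can be sketched as follows, using the rounded WOE values and the coefficient estimate 0.980 from the table. Because the inputs are rounded to three decimals, the results may differ from the table in the last digit. The function name is hypothetical.

```python
def convert_woe_coefficient(woe_by_level, coefficient, reference):
    """Convert one WOE coefficient into per-level effects relative to a reference.

    Step 1: multiply the coefficient by each level's WOE.
    Step 2: subtract the reference level's product from every level's product.
    The result is the difference in log odds versus the reference level.
    """
    multiplied = {level: coefficient * woe for level, woe in woe_by_level.items()}
    ref_value = multiplied[reference]
    return {level: value - ref_value for level, value in multiplied.items()}


# Rounded WOE values from the Degree Level table.
degree_woe = {
    "Associate": 0.701,
    "Bachelor": 0.376,
    "Master": 0.063,
    "Doctorate": -0.787,
    "Missing": -0.643,
}

if __name__ == "__main__":
    effects = convert_woe_coefficient(degree_woe, 0.980, reference="Bachelor")
    for level, effect in effects.items():
        print(f"{level:10s} {effect:+.3f}")
```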


Survival Analysis

• Survival analysis analyzes the expected duration of time until one or more events happen, such as death in biological organisms and failure in mechanical systems.
• Kaplan-Meier survival analysis is a non-parametric statistical analysis used to estimate the survival function from lifetime data.
• Kaplan-Meier survival analysis can take into account some types of censored data:
  - Right-censoring
  - Left-censoring

[Figure: timelines of Individuals 1 to 4 on a calendar-time axis between "Experiment Start Time" and "Experiment End Time"; Individual 2 is labeled "Right censoring" and Individual 4 is labeled "Left censoring"]


Kaplan-Meier Survival Analysis

[Figure: the timelines of Individuals 1 to 4 redrawn on a survival-time axis (Individual 2: right censoring; Individual 4: left censoring), above a step plot of survival probability against survival time with levels marked 1.0, 2/3, 2/4, and 1/4, annotated "Survival probability is unknown from this time point."]

• The Kaplan-Meier survival probability is often viewed as the actual probability.
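
A minimal sketch of the Kaplan-Meier product-limit estimate, handling right-censoring only (left-censoring is not covered). At each distinct event time, the running survival probability is multiplied by one minus the fraction of at-risk individuals who experienced the event. The function name and the toy data are illustrative and do not reproduce the figure above.

```python
def kaplan_meier(durations, observed):
    """Kaplan-Meier survival curve as a list of (time, survival probability).

    durations: follow-up time for each individual.
    observed:  True if the event occurred at that time, False if the
               individual was right-censored at that time.
    Individuals censored at time t are still counted as at risk at t.
    """
    data = sorted(zip(durations, observed))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        events = 0
        leaving = 0
        # Group everyone whose time equals t.
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                events += 1
            leaving += 1
            i += 1
        if events:
            survival *= 1.0 - events / n_at_risk
            curve.append((t, survival))
        n_at_risk -= leaving
    return curve


if __name__ == "__main__":
    # Made-up data: four individuals, the second one right-censored at t=3.
    for t, s in kaplan_meier([2, 3, 4, 5], [True, False, True, True]):
        print(f"t={t}: S(t)={s:.3f}")
```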


Survival Modeling

• The survival function is the basic model employed to describe time-to-event phenomena.
• The survival function, S(t), is the probability of an individual surviving beyond time t:
$$ S(t) = \Pr(T > t) = \int_t^{\infty} f(u)\,du = 1 - F(t) $$
$$ f(t) = -\frac{dS(t)}{dt} $$
[Figure labels: S(t), -f(t)]

Survival Modeling
- Hazard Rate/Function

• The hazard rate is the death rate at time t conditional on survival until time t or later, that is, T >= t.
• Suppose an individual has survived to time t. The hazard rate is the limit, per unit of time, of the probability that it will not survive an additional time $\Delta t$:
$$ h(t) = \lim_{\Delta t \to 0} \frac{\Pr(t \le T < t + \Delta t \mid T \ge t)}{\Delta t} = \frac{f(t)}{S(t)} = -\frac{d \log S(t)}{dt} $$
• The cumulative hazard H(t) and the survival function are related by
$$ H(t) = \int_0^t h(u)\,du = -\log S(t) $$
$$ S(t) = e^{-H(t)} $$

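
The relation S(t) = exp(-H(t)) can be checked numerically. The sketch below integrates a hazard function with the trapezoidal rule and exponentiates the negative cumulative hazard. The two hazards used in the demonstration (a constant hazard and a linearly increasing one) are illustrative assumptions, chosen because their survival functions have closed forms.

```python
import math


def survival_from_hazard(hazard, t: float, steps: int = 10_000) -> float:
    """S(t) = exp(-H(t)), with H(t) the integral of hazard(u) from 0 to t.

    The integral is approximated with the trapezoidal rule on `steps`
    equal sub-intervals.
    """
    dt = t / steps
    cumulative = 0.0
    for k in range(steps):
        a = k * dt
        b = a + dt
        cumulative += 0.5 * (hazard(a) + hazard(b)) * dt
    return math.exp(-cumulative)


if __name__ == "__main__":
    # Constant hazard 0.1: S(t) = exp(-0.1 t)
    print(survival_from_hazard(lambda u: 0.1, 5.0), math.exp(-0.5))
    # Hazard h(u) = 2u: H(t) = t^2, so S(t) = exp(-t^2)
    print(survival_from_hazard(lambda u: 2.0 * u, 2.0), math.exp(-4.0))
```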
Survival Modeling
- Competing Risks

• Assume there are k competing risks (death factors).
$$ h_i(t) = \lim_{\Delta t \to 0} \frac{\Pr(t \le T < t + \Delta t,\ d = i \mid T \ge t)}{\Delta t} $$
$$ h_T(t) = \sum_{i=1}^{k} h_i(t) $$
$$ H_T(t) = \int_0^t h_T(u)\,du = -\log S(t) $$
$$ S(t) = e^{-H_T(t)} $$

Survival Modeling
- Parametric Regression

• Define the hazard function as a parametric regression function:
$$ h(t) = h_0(t)\, e^{\sum_i \beta_i(t) X_i} $$
• Parametric regression:
$$ \log[h(t)] = \log[h_0(t)] + \sum_i \beta_i(t) X_i = \beta_0(t) + \sum_i \beta_i(t) X_i $$

Survival Modeling
- Semi-parametric Regression

• Cox (1972) defined the proportional hazard rate:
$$ \frac{h(t \mid \mathbf{X})}{h(t \mid \mathbf{X}^*)} = \frac{h_0(t)\, e^{\sum_i \beta_i X_i}}{h_0(t)\, e^{\sum_i \beta_i X_i^*}} = e^{\sum_i \beta_i (X_i - X_i^*)} $$
where $\mathbf{X}^*$ is the reference attribute vector.
• Because the baseline hazard rate $h_0(t)$ cancels in this ratio, we can fit the parameters/coefficients of the regressors with a maximum (partial) likelihood estimator, without worrying about $h_0(t)$.
• After getting the coefficients, fit the baseline hazard rate $h_0(t)$.

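
Under the proportional hazards assumption, the hazard ratio between an attribute vector and a reference vector depends only on the coefficients, not on time or on the baseline hazard. A minimal sketch, with a hypothetical function name and made-up coefficient values:

```python
import math


def hazard_ratio(beta, x, x_ref) -> float:
    """exp(sum_i beta_i * (x_i - x_ref_i)): hazard of x relative to x_ref."""
    return math.exp(sum(b * (xi - ri) for b, xi, ri in zip(beta, x, x_ref)))


if __name__ == "__main__":
    beta = [0.7, -0.2]              # made-up coefficients
    print(hazard_ratio(beta, x=[1.0, 3.0], x_ref=[0.0, 3.0]))
```

A coefficient of log(2) on a 0/1 covariate therefore doubles the hazard at every time point.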
Partial Likelihood Estimator for Cox
Proportional Hazard Survival Model

• In a sample of size n consisting of $(T_j, X_j)$, $j = 1, 2, \ldots, n$, we assume the censoring time is non-informative in that, given $X_j$, the event and censoring times for the jth individual are independent.
• Let $t_1 < t_2 < \cdots < t_D$ denote the ordered event times and $X_{(k)i}$ be the ith covariate associated with the individual whose death time is $t_k$.
• Define the risk set at time $t_k$, $R(t_k)$, as the set of all individuals who are still under study at the time just prior to $t_k$.
• The partial likelihood based on the hazard rate is expressed by
$$ L(\boldsymbol{\beta}) = \prod_{k=1}^{D} \frac{\exp\left(\sum_{i=1}^{p} \beta_i X_{(k)i}\right)}{\sum_{j \in R(t_k)} \exp\left(\sum_{i=1}^{p} \beta_i X_{ji}\right)} $$
$$ \log L(\boldsymbol{\beta}) = \sum_{k=1}^{D} \sum_{i=1}^{p} \beta_i X_{(k)i} - \sum_{k=1}^{D} \log\left[\sum_{j \in R(t_k)} \exp\left(\sum_{i=1}^{p} \beta_i X_{ji}\right)\right] $$

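
The log partial likelihood can be evaluated directly for a given coefficient vector. The sketch below assumes distinct event times (no ties) and takes the risk set at an event time to be everyone whose observed time is at least that event time. It only evaluates the function; maximizing it over beta would require an optimizer, which is not shown. The function name and the toy data are illustrative.

```python
import math


def cox_log_partial_likelihood(beta, times, events, covariates) -> float:
    """Log partial likelihood of a Cox proportional hazards model.

    beta:       list of p coefficients.
    times:      observed time for each individual.
    events:     True if the event occurred at that time, False if censored.
    covariates: list of covariate lists, one list of p values per individual.
    Assumes no tied event times.
    """
    def linear(x):
        return sum(b * v for b, v in zip(beta, x))

    log_lik = 0.0
    for k, (t_k, is_event) in enumerate(zip(times, events)):
        if not is_event:
            continue  # censored individuals contribute only through risk sets
        # Risk set: individuals still under study just prior to t_k.
        risk_sum = sum(
            math.exp(linear(covariates[j]))
            for j in range(len(times))
            if times[j] >= t_k
        )
        log_lik += linear(covariates[k]) - math.log(risk_sum)
    return log_lik


if __name__ == "__main__":
    toy_times = [1, 2, 3]                 # made-up data
    toy_events = [True, False, True]
    toy_x = [[1.0], [0.0], [1.0]]
    for b in (0.0, 0.5, 1.0):
        print(b, cox_log_partial_likelihood([b], toy_times, toy_events, toy_x))
```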
Life-Span Survival Modeling
- Fitted Functions with 2 Competing Risks

Student Attributes: Associates with High School Diploma (transfer credits >= 24, FPA from Aug 2009 to Aug 2010)
  College = School of Business
  Employer Billing = No
  Gender = Male
  Marital Status = Single
  Managed Own Funds = No
  Region = Northeast
  Age = 35
  AGI = 30k

[Figure: survival probability (0 to 1) by week (1 to 161), with three curves: Graduation, Withdraw, and Combined]

Life-Span Survival Modeling
- Fitted Function vs Actual

Student Attributes: Associates with High School Diploma (transfer credits >= 24 and FPA from Aug 2009 to Aug 2010)
  College = School of Business
  Employer Billing = No
  Gender = Male
  Marital Status = Single
  Managed Own Funds = No
  Region = Northeast
  Age = any
  AGI = any

[Figure: survival probability (0.0 to 1.0) by week (1 to 161), with two curves: Fitted and Actual]

Life-Span Survival Modeling
- Expected LS vs Actual LS

Student Attributes: Associates with High School Diploma (transfer credits >= 24 and FPA from Aug 2009 to Aug 2010)
  College = School of Business
  Employer Billing = No
  Gender = Male
  Marital Status = Single
  Managed Own Funds = No
  Region = Northeast
  Age = 35
  AGI = 30k

[Figure: actual LS (weeks, 0 to 90) plotted against expected LS (weeks) in 5-week bins from 20-25 to 75-80]