Business Analytics and Operations Research

Roger R. Gung, Ph.D.
Sr. Director, Business Analytics & Operations Research
roger.gung@phoenix.edu
6/5/2017 Copyright 2016 University of Phoenix. All Rights Reserved. 1

Generalized Linear Model (GLM)
Generalized Linear Model is a flexible generalization of ordinary linear
regression that allows for dependent/target variables that have error
distribution models other than a normal distribution
GLM generalizes linear regression by allowing the linear model to be related to
the dependent/target variable via a link function

Two major types of GLM:

Logistic Regression
Count Regression

Logistic Regression
Logistic Regression is a regression model where the dependent/target
variable is categorical
Binary dependent variable
Multiple category dependent variable
Cases where the dependent variable has more than two outcome categories may be
analyzed with multinomial logistic regression or,
if the multiple categories are ordered, with ordinal logistic regression
This course covers the case of a binary dependent variable, that is, where it
can take only two values, "0" and "1", which represent outcomes such as
pass/fail, win/lose, alive/dead or healthy/sick
Link function for binary dependent variable:
Logit function
Probit function

Logistic Regression
Logit function: $\operatorname{logit}(p) = \log\dfrac{p}{1-p}$
[Figure: p (vertical axis, 0 to 1) plotted against logit(p) (horizontal axis, -10 to 10)]
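A minimal Python sketch of the logit link and its inverse may help make the
mapping concrete. The function names `logit` and `inv_logit` are chosen here
for illustration and do not come from the slides.

```python
import math

def logit(p: float) -> float:
    """Log-odds of a probability p strictly between 0 and 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    return math.log(p / (1.0 - p))

def inv_logit(z: float) -> float:
    """Inverse of the logit: maps a linear predictor z back to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

# Probabilities below 0.5 give negative log-odds, above 0.5 positive.
for p in (0.1, 0.5, 0.9):
    print(p, logit(p), inv_logit(logit(p)))
```

In a logistic regression, the linear model predicts logit(p), and `inv_logit`
converts that prediction back into a probability.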

Logistic Regression
Probit function: Inverse CDF of standard normal distribution, $\operatorname{probit}(p) = \Phi^{-1}(p)$
[Figure: p (vertical axis, 0 to 1) plotted against probit(p) (horizontal axis, -10 to 10)]
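The probit link can be sketched with the Python standard library, which
provides the standard normal CDF and its inverse through
`statistics.NormalDist`. The names `probit` and `inv_probit` are illustrative
only.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist(mu=0.0, sigma=1.0)

def probit(p: float) -> float:
    """Inverse CDF of the standard normal distribution at p (0 < p < 1)."""
    return _STD_NORMAL.inv_cdf(p)

def inv_probit(z: float) -> float:
    """Standard normal CDF: maps a linear predictor z back to a probability."""
    return _STD_NORMAL.cdf(z)

for p in (0.1, 0.5, 0.9):
    print(p, probit(p))
```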

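Count Regression
Count Model Mean-Variance Relationship

Model                       Mean    Variance
Poisson                     $\mu$   $\mu$
Negative Binomial (NB1)     $\mu$   $\mu(1 + \alpha)$
Negative Binomial (NB2)     $\mu$   $\mu(1 + \alpha\mu)$
Poisson Inverse Gaussian    $\mu$   $\mu(1 + \alpha\mu^{2})$
Negative Binomial P         $\mu$   $\mu + \alpha\mu^{\rho}$
Generalized Poisson         $\mu$   $\mu(1 + \alpha\mu)^{2}$

Poisson Regression: $\log(\mu) = \beta_0 + \sum_i \beta_i X_i$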
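As a sketch of how the log link works in Poisson regression, the Python below
assumes the standard form log(mean) = b0 + sum of b_i * x_i and evaluates the
Poisson log-likelihood for given coefficients. The function names and the
inputs are hypothetical, and no fitting routine is shown.

```python
import math

def poisson_mean(beta0, betas, x):
    """Mean count under the log link: mu = exp(beta0 + sum(beta_i * x_i))."""
    return math.exp(beta0 + sum(b * xi for b, xi in zip(betas, x)))

def poisson_log_likelihood(beta0, betas, rows, counts):
    """Sum of log Poisson probabilities of the observed counts."""
    total = 0.0
    for x, y in zip(rows, counts):
        mu = poisson_mean(beta0, betas, x)
        # log P(Y = y) = y*log(mu) - mu - log(y!)
        total += y * math.log(mu) - mu - math.lgamma(y + 1)
    return total

# Hypothetical data: one regressor, three observations.
rows = [[0.0], [1.0], [2.0]]
counts = [1, 2, 4]
print(poisson_log_likelihood(0.0, [0.7], rows, counts))
```

Maximizing this log-likelihood over the coefficients is what a Poisson
regression procedure does internally.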


SAS Proc Logistic

Variable Significance Level
P-Value: Significance level of each variable's impact on model goodness of fit

Odds Ratio: Measures the magnitude of each variable's impact on the target
variable by comparing the odds of the best level to the odds of the worst
level of the driver. It is defined as follows:
$\text{Odds Ratio} = \dfrac{p_{\text{best level}} \,/\, (1 - p_{\text{best level}})}{p_{\text{worst level}} \,/\, (1 - p_{\text{worst level}})}$
Information Value (IV) and Weight of Evidence (WOE): Measure the magnitude of
each variable's impact on the target variable
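The odds ratio described above can be sketched in a few lines of Python. The
arguments `p_best` and `p_worst` are hypothetical names for the target rate at
the best and worst levels of a driver.

```python
def odds(p: float) -> float:
    """Odds corresponding to a probability p strictly between 0 and 1."""
    return p / (1.0 - p)

def odds_ratio(p_best: float, p_worst: float) -> float:
    """Odds of the best level divided by the odds of the worst level."""
    return odds(p_best) / odds(p_worst)

# Hypothetical rates: 50% at the best level, 20% at the worst level.
print(odds_ratio(0.5, 0.2))
```

A ratio close to 1 indicates that the driver barely separates its best and
worst levels; larger values indicate a stronger impact.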

Nonlinear Regression
Nonlinear Regression is a form of regression analysis in which the
response/target variable is modeled by a function which is a nonlinear
combination of independent variables
Examples of nonlinear functions include exponential functions, logarithmic
functions, trigonometric functions, power functions, and Gaussian functions

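Example: a model with an exponential term and additive random noise [equation lost in extraction]
Convert the exponential term to a new variable, so that the model is linear in
the new variable
Do not convert the model to log form, because the additive error is not the
random noise for the log of the response
Similarly for forms containing a log term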

Nonlinear Regression for Marketing
Each independent variable $x_i$ has a lift factor $\beta_i$ to describe the
diminishing return (financial benefit) to the dependent variable
It is a multiplicative/interactive model that can be converted to an additive model
Marketing demand model:
$y = e^{\beta_0} \prod_i x_i^{\beta_i} \, e^{\varepsilon}$
$\log y = \beta_0 + \sum_i \beta_i \log x_i + \varepsilon$
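To illustrate converting a multiplicative model to an additive one, the sketch
below assumes a single driver and a power-law form y = a * x^b, which is an
assumption made for illustration. Taking logs gives log y = log a + b * log x,
which ordinary least squares can fit. The data are synthetic and noise-free.

```python
import math

def fit_log_log(xs, ys):
    """Fit y = a * x**b by least squares on (log x, log y); returns (a, b)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_x, mean_y = sum(lx) / n, sum(ly) / n
    sxx = sum((u - mean_x) ** 2 for u in lx)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(lx, ly))
    b = sxy / sxx                        # slope = lift factor (exponent)
    a = math.exp(mean_y - b * mean_x)    # intercept back-transformed
    return a, b

# Hypothetical spend levels; demand generated with a = 10, b = 0.5.
spend = [1.0, 2.0, 4.0, 8.0]
demand = [10.0 * s ** 0.5 for s in spend]
a, b = fit_log_log(spend, demand)
print(a, b)
```

An exponent between 0 and 1 corresponds to diminishing returns: each doubling
of the driver multiplies the response by less than 2.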

Binning Nonlinear Variable
It is not easy to find the best nonlinear function for each numerical variable
Convert numerical variables into categorical variables
Apply a decision tree to bin numerical variables
Each bin/level will have a linear coefficient
[Figure: binned variable chart; annotation: less population]

Binning Nonlinear Variable
Binning nonlinear variables can also be applied to GLM
The dependent variable needs to be judgmentally binned to observe the
nonlinearity, because it is a probability measurement
[Figure: binned variable chart; annotation: less population]

Receiver Operating Characteristic
Receiver operating characteristic curve, i.e. ROC curve, is a graphical plot that
illustrates the diagnostic ability of a binary classifier system as its discrimination
threshold is varied
ROC curve is created by plotting the true positive rate (TPR) against the false
positive rate (FPR) at various threshold settings
The true-positive rate is also known as sensitivity, recall or probability of
detection in machine learning
The false-positive rate is also known as the fall-out or probability of false
alarm and can be calculated as (1 - specificity)

ROC Basic Concept

ROC Space

Constructing ROC Curve
In binary classification, the class prediction for each instance is made based on
a continuous random variable X, which is a score computed for the instance
(e.g. estimated probability in Logistic Regression)
Given a threshold parameter T, the instance is classified as positive if X > T,
and negative otherwise
X follows a probability density $f_1(x)$ if the instance actually belongs to
class positive, and $f_0(x)$ otherwise.
TPR is given by
$\mathrm{TPR}(T) = \int_T^{\infty} f_1(x)\,dx$
FPR is given by
$\mathrm{FPR}(T) = \int_T^{\infty} f_0(x)\,dx$
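The construction above can be sketched empirically: instead of integrating
densities, count the positives and negatives whose score exceeds each
threshold. The scores and labels below are hypothetical, and `rates` and
`roc_points` are illustrative names.

```python
def rates(scores, labels, threshold):
    """Return (TPR, FPR) when an instance is called positive if score > threshold."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s > threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s > threshold)
    return tp / n_pos, fp / n_neg

def roc_points(scores, labels):
    """(FPR, TPR) pairs as the threshold sweeps from the highest score down."""
    thresholds = sorted(set(scores), reverse=True) + [float("-inf")]
    points = []
    for t in thresholds:
        tpr, fpr = rates(scores, labels, t)
        points.append((fpr, tpr))
    return points

scores = [0.9, 0.8, 0.4, 0.3]   # hypothetical model scores
labels = [1, 0, 1, 0]           # 1 = positive, 0 = negative
for point in roc_points(scores, labels):
    print(point)
```

Plotting the returned pairs with FPR on the horizontal axis and TPR on the
vertical axis gives the empirical ROC curve.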

Area under Curve, C-statistics
ROC curve is created by plotting the true positive rate (TPR) against the false
positive rate (FPR) at various threshold settings
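The area under the curve (AUC), also known as the C-statistic, is equal to the
probability that a classifier will rank a randomly chosen positive instance higher
than a randomly chosen negative one
AUC is given by
$\mathrm{AUC} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} I(T' > T)\, f_1(T')\, f_0(T)\, dT'\, dT = P(X_1 > X_0)$
where $f_1$ and $f_0$ are the score densities of positive and negative
instances, and $X_1$ and $X_0$ are the scores of a randomly chosen positive
and negative instance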
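The ranking interpretation of AUC suggests a direct, if slow, way to compute
it: compare every positive score with every negative score. The sketch below
counts ties as half a win, which is a common convention assumed here and not
stated in the slides; the data are hypothetical.

```python
def auc_by_ranking(scores, labels):
    """Share of (positive, negative) pairs where the positive scores higher."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5   # assumed tie convention
    return wins / (len(positives) * len(negatives))

scores = [0.9, 0.8, 0.4, 0.3]   # hypothetical model scores
labels = [1, 0, 1, 0]
print(auc_by_ranking(scores, labels))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation of
positives from negatives.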

Weight of Evidence
In logistic regression, we usually have categorical variables. One can choose to
use Weight of Evidence (WOE) to transform categorical variables into numerical
variables
WOE of each level of a categorical variable measures the strength of the level
for separating yes and no in the response variable
Benefits of WOE transformation
Linear assumption in logistic regression naturally holds (explained on the next slide)
Reduces degrees of freedom and helps with the overfitting problem
Check of collinearity between categorical variables becomes easy

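WOE and Naïve Bayes
For independent variable $X_i$, $i = 1, 2, \ldots, p$, the WOE of the $j$th level is
calculated as follows:
$\mathrm{WOE}_{ij} = \ln\dfrac{P(X_i = j\text{th level} \mid \text{Yes})}{P(X_i = j\text{th level} \mid \text{No})}$
Naïve Bayes classifier is given by:
$\ln\dfrac{P(Y=1 \mid \mathbf{x})}{P(Y=0 \mid \mathbf{x})} = \ln\dfrac{P(Y=1)\,P(\mathbf{x} \mid Y=1)/P(\mathbf{x})}{P(Y=0)\,P(\mathbf{x} \mid Y=0)/P(\mathbf{x})}$
$= \ln\dfrac{P(Y=1)\,P(x_1 \mid Y=1)\,P(x_2 \mid Y=1) \cdots P(x_p \mid Y=1)}{P(Y=0)\,P(x_1 \mid Y=0)\,P(x_2 \mid Y=0) \cdots P(x_p \mid Y=0)}$
$= \ln\dfrac{P(Y=1)}{P(Y=0)} + \sum_{i=1}^{p} \ln\dfrac{P(x_i \mid Y=1)}{P(x_i \mid Y=0)}$
The terms of the sum form the WOE vector: each term is the WOE of the observed
level of $X_i$
Linear assumption after WOE transformation holds!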

WOE and Information Value (IV)
Weight of evidence indicates the predictive power of a particular level of the
variable
Information value (IV) assesses the differentiating power of the variable, which
indicates how much the levels differ from one another with regard to the
outcome, such as pass or fail
IV is given by:
$\mathrm{IV} = \sum_{\text{all } j} \left[ P(X = j \mid \text{Yes}) - P(X = j \mid \text{No}) \right] \times \mathrm{WOE}_j$
One rule of thumb regarding IV is (Siddiqi, Naeem 2006):

IV < 0.02: unpredictive
0.02 <= IV < 0.1: weak
0.1 <= IV < 0.3: medium
0.3 <= IV < 0.5: strong
IV >= 0.5: suspicious and should be checked for over-predicting

Calculation of WOE & IV
Here is an example of calculating WOE and IV

Degree Level      #PSI    #REG  #NonREG  Distribution_REG  Distribution_NonREG     WOE     IV
Associate       29,553   4,224   25,329             16.4%                 8.1%   0.701  0.058
Bachelor       119,216  12,820  106,396             49.8%                34.2%   0.376  0.059
Master          21,911   1,774   20,137              6.9%                 6.5%   0.063  0.000
Doctorate        1,268      46    1,222              0.2%                 0.4%  -0.787  0.002
Missing        164,978   6,871  158,107             26.7%                50.8%  -0.643  0.155
ALL            336,926  25,735  311,191              100%                 100%          0.274
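The WOE and IV columns can be recomputed from the #REG and #NonREG counts in
the table. The sketch below assumes WOE is the natural log of the ratio of the
two distribution columns and that each level's IV is the difference of the two
distributions times its WOE; the function name `woe_iv` is illustrative.

```python
import math

# (#REG, #NonREG) per Degree Level, taken from the table above.
counts = {
    "Associate": (4224, 25329),
    "Bachelor": (12820, 106396),
    "Master": (1774, 20137),
    "Doctorate": (46, 1222),
    "Missing": (6871, 158107),
}

def woe_iv(level_counts):
    """Return {level: (WOE, IV contribution)} from (yes, no) counts per level."""
    total_yes = sum(yes for yes, _ in level_counts.values())
    total_no = sum(no for _, no in level_counts.values())
    result = {}
    for level, (yes, no) in level_counts.items():
        dist_yes = yes / total_yes      # Distribution_REG
        dist_no = no / total_no         # Distribution_NonREG
        woe = math.log(dist_yes / dist_no)
        result[level] = (woe, (dist_yes - dist_no) * woe)
    return result

table = woe_iv(counts)
for level, (woe, iv) in table.items():
    print(f"{level:10s} WOE={woe:7.3f} IV={iv:6.3f}")
total_iv = sum(iv for _, iv in table.values())
print("Total IV:", round(total_iv, 3))
```

Under the rule of thumb quoted earlier, a total IV between 0.1 and 0.3 would
place Degree Level in the medium range.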

Conversion of WOE Coefficients
When one uses WOE-transformed variables to build logistic regression, the
corresponding coefficient estimates represent the change of log odds in relation
to the increase of one unit in WOE. However, this coefficient cannot be easily
interpreted.
In order to obtain the coefficients for original variables, conversion of WOE
coefficients can be performed as follows:
Multiply the coefficient with WOE for each level
Choose one level as reference (for example Undergraduate), and subtract the
multiplied result for reference from each level
Then we get the coefficient for the original variable Degree Level, and the
interpretation is simple. For example, the log odds of REG for AA PSI is 0.318
more than that of undergraduate PSI

Degree Level     WOE  Coefficient Estimate  Multiplied  Compared with Reference (Undergraduate)
Associate      0.701                 0.980       0.687     0.318
Bachelor       0.376                 0.980       0.369        --
Master         0.063                 0.980       0.062    -0.307
Doctorate     -0.787                 0.980      -0.771    -1.140
Missing       -0.643                 0.980      -0.631    -0.999
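The conversion steps can be sketched as follows, using the WOE values and the
coefficient estimate 0.980 from the table. Treating the Bachelor row as the
Undergraduate reference is an assumption based on the "--" entry in that row.
Results may differ from the table in the last digit because the table appears
to use unrounded WOE values.

```python
woe = {
    "Associate": 0.701,
    "Bachelor": 0.376,
    "Master": 0.063,
    "Doctorate": -0.787,
    "Missing": -0.643,
}
coefficient = 0.980       # coefficient estimate of the WOE-transformed variable
reference = "Bachelor"    # assumed to be the Undergraduate reference level

def convert_woe_coefficient(woe_by_level, coef, ref_level):
    """Log-odds difference of each level relative to the reference level."""
    multiplied = {level: coef * w for level, w in woe_by_level.items()}
    base = multiplied[ref_level]
    return {level: value - base for level, value in multiplied.items()}

effects = convert_woe_coefficient(woe, coefficient, reference)
for level, effect in effects.items():
    print(f"{level:10s} {effect:7.3f}")
```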

Survival Analysis
Survival analysis is for analyzing the expected duration of time until one or
more events happen, such as death in biological organisms and failure in
mechanical systems
Kaplan-Meier survival analysis is a non-parametric statistical analysis used to
estimate the survival function from lifetime data
Kaplan-Meier survival analysis can take into account some types of censored
data:
Right-censoring
Left-censoring
[Figure: timelines of Individuals 1 to 4 against Calendar Time, between Experiment Start Time and Experiment End Time; "Right censoring" is labeled next to Individual 2 and "Left censoring" next to Individual 4]

Kaplan-Meier Survival Analysis
[Figure: Individuals 1 to 4 aligned on Survival Time ("Right censoring" next to Individual 2, "Left censoring" next to Individual 4), and a step plot of Survival Probability against Survival Time with levels 1.0, 2/3, 2/4 and 1/4; annotation: Survival probability is unknown from this time point.]
Kaplan-Meier survival probability is often viewed as actual probability.
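A minimal sketch of the Kaplan-Meier product-limit estimator is given below.
It handles right-censoring only (left-censoring is not covered), and the
durations and event flags are hypothetical, not the four individuals from the
slide.

```python
def kaplan_meier(durations, events):
    """Return [(time, survival probability)] at each distinct event time.

    events[i] is 1 if the event was observed at durations[i],
    and 0 if the individual was right-censored at that time.
    """
    data = sorted(zip(durations, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = censored = 0
        while i < len(data) and data[i][0] == t:
            if data[i][1] == 1:
                deaths += 1
            else:
                censored += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk   # product-limit step
            curve.append((t, survival))
        at_risk -= deaths + censored             # leave the risk set
    return curve

# Hypothetical data: the second individual is right-censored at time 3.
print(kaplan_meier([2, 3, 4, 5], [1, 0, 1, 1]))
```

The censored individual contributes to the risk set up to time 3 but never
triggers a drop in the curve, which is how the estimator uses partial
information.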

Survival Modeling
Survival function is the basic model employed to describe
time-to-event phenomena.
Survival function, S(t), is the probability of an individual
surviving beyond time t.
$S(t) = \Pr(T > t) = \int_t^{\infty} f(u)\,du = 1 - F(t)$
$f(t) = -\dfrac{dS(t)}{dt}$
[Figure: plot of S(t) against t, with slope -f(t)]
Survival Modeling
- Hazard Rate/Function
Hazard rate is the death rate at time t conditional on
survival until time t or later, that is T >= t.
Suppose that an individual has survived for a time t; the hazard rate is the
limit, per unit time, of the probability that it will not survive for an
additional time $\Delta t$.
$h(t) = \lim_{\Delta t \to 0} \dfrac{\Pr(t \le T < t + \Delta t \mid T \ge t)}{\Delta t} = \dfrac{f(t)}{S(t)} = -\dfrac{d \log[S(t)]}{dt}$
$H(t) = \int_0^t h(u)\,du = -\log[S(t)]$
$S(t) = e^{-H(t)}$
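The identities between hazard, cumulative hazard and survival can be checked
numerically. The sketch below assumes a Weibull lifetime with hypothetical
shape and scale parameters, chosen only because its survival function has a
closed form; the slides do not specify a distribution.

```python
import math

SHAPE, SCALE = 1.5, 10.0   # hypothetical Weibull parameters

def survival(t):
    """S(t) for the assumed Weibull lifetime."""
    return math.exp(-((t / SCALE) ** SHAPE))

def density(t):
    """f(t) = -dS(t)/dt for the assumed Weibull lifetime."""
    return (SHAPE / SCALE) * (t / SCALE) ** (SHAPE - 1) * survival(t)

def hazard(t):
    """h(t) = f(t) / S(t)."""
    return density(t) / survival(t)

def cumulative_hazard(t, steps=1000):
    """H(t) = integral of h from 0 to t, by the midpoint rule."""
    width = t / steps
    return sum(hazard((i + 0.5) * width) for i in range(steps)) * width

t = 5.0
print(cumulative_hazard(t), -math.log(survival(t)))   # H(t) vs -log S(t)
print(math.exp(-cumulative_hazard(t)), survival(t))   # exp(-H(t)) vs S(t)
```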
Survival Modeling
- Competing Risks
Assume there are k competing risks (death factors).
$h_i(t) = \lim_{\Delta t \to 0} \dfrac{\Pr(t \le T < t + \Delta t,\ d = i \mid T \ge t)}{\Delta t}$
$h_T(t) = \sum_{i=1}^{k} h_i(t)$
$H_T(t) = \int_0^t h_T(u)\,du = -\log[S(t)]$
$S(t) = e^{-H_T(t)}$
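As a small illustration of how competing risks combine, the sketch below
assumes constant cause-specific hazards, which is a simplifying assumption
made here so that the cumulative hazard is just the total hazard times t. The
hazard values are hypothetical.

```python
import math

def overall_survival(t, cause_hazards):
    """S(t) when each competing risk has a constant hazard rate."""
    total_hazard = sum(cause_hazards)     # h_T = sum of the h_i
    return math.exp(-total_hazard * t)    # S(t) = exp(-H_T(t)), H_T(t) = h_T * t

weekly_hazards = [0.01, 0.02]   # hypothetical rates for two competing risks
print(overall_survival(10.0, weekly_hazards))
```

Because the hazards add, the overall survival equals the product of the
survival curves that each risk would produce on its own.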
Survival Modeling
- Parametric Regression
Define the hazard function as a parametric regression function.
$h(t) = h_0(t)\, e^{\sum_i \beta_i(t) X_i}$
Parametric Regression:
$\log[h(t)] = \log[h_0(t)] + \sum_i \beta_i(t) X_i = \beta_0(t) + \sum_i \beta_i(t) X_i$
Survival Modeling
- Semi-parametric Regression
Cox (1972) defined proportional Hazard Rate:
$\dfrac{h(t \mid \mathbf{x})}{h(t \mid \mathbf{x}^{*})} = \dfrac{h_0(t)\, e^{\sum_i \beta_i X_i}}{h_0(t)\, e^{\sum_i \beta_i X_i^{*}}} = e^{\sum_i \beta_i (X_i - X_i^{*})}$
where $\mathbf{x}^{*}$ is the referenced attribute vector
From this ratio, we fit parameters/coefficients for regressors
using the maximum likelihood estimator, without worrying
about baseline hazard rate h0(t).
After getting coefficients, fit baseline hazard rate h0(t).
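The key property of the proportional hazard model is that the baseline hazard
cancels in the ratio. The sketch below uses a hypothetical baseline hazard and
hypothetical coefficients to show that the ratio is the same at every time t.

```python
import math

def cox_hazard(t, x, betas, baseline_hazard):
    """h(t | x) = h0(t) * exp(sum(beta_i * x_i))."""
    return baseline_hazard(t) * math.exp(sum(b * xi for b, xi in zip(betas, x)))

def hazard_ratio(x, x_ref, betas):
    """exp(sum(beta_i * (x_i - x_ref_i))); the baseline hazard is not needed."""
    return math.exp(sum(b * (xi - ri) for b, xi, ri in zip(betas, x, x_ref)))

betas = [0.5, -0.2]                 # hypothetical coefficients
x, x_ref = [1.0, 3.0], [0.0, 1.0]   # attribute vector and reference vector

def baseline(t):
    """Hypothetical baseline hazard h0(t)."""
    return 0.01 * t

for t in (1.0, 5.0):
    print(t, cox_hazard(t, x, betas, baseline) / cox_hazard(t, x_ref, betas, baseline))
print(hazard_ratio(x, x_ref, betas))
```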
Partial Likelihood Estimator for Cox
Proportional Hazard Survival Model
In a sample of size n consisting of (Tj, Xj), j = 1, 2, ..., n, we assume the
censoring time is non-informative in that, given Xj, the event and
censoring time for the jth individual are independent.
Let t1 < t2 < ... < tD denote the ordered event times and X(k)i be the ith
covariate associated with the individual whose death time is tk.
Define the risk set at time tk, R(tk), as the set of all individuals who are
still under study at the time just prior to tk.
The partial likelihood based on the hazard rate is expressed by
$L(\beta) = \prod_{k=1}^{D} \dfrac{\exp\left(\sum_{i=1}^{p} \beta_i X_{(k)i}\right)}{\sum_{j \in R(t_k)} \exp\left(\sum_{i=1}^{p} \beta_i X_{ji}\right)}$
$\log L(\beta) = \sum_{k=1}^{D} \sum_{i=1}^{p} \beta_i X_{(k)i} - \sum_{k=1}^{D} \log\left[\sum_{j \in R(t_k)} \exp\left(\sum_{i=1}^{p} \beta_i X_{ji}\right)\right]$
where p is the number of covariates
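The log partial likelihood can be evaluated directly from its definition. The
sketch below assumes there are no tied event times and takes the risk set at
an event time to be everyone whose observed time is at least that event time;
the function name and the data are hypothetical.

```python
import math

def cox_partial_log_likelihood(betas, times, events, covariates):
    """Log partial likelihood for a Cox model, assuming no tied event times.

    times[j]      observed time of individual j (event or censoring)
    events[j]     1 if the event was observed, 0 if censored
    covariates[j] list of covariate values for individual j
    """
    scores = [sum(b * x for b, x in zip(betas, row)) for row in covariates]
    log_lik = 0.0
    for k, (t_k, observed) in enumerate(zip(times, events)):
        if observed != 1:
            continue   # censored individuals only appear in risk sets
        # Risk set: individuals still under study just prior to t_k.
        risk_sum = sum(math.exp(scores[j]) for j, t_j in enumerate(times) if t_j >= t_k)
        log_lik += scores[k] - math.log(risk_sum)
    return log_lik

# Hypothetical data: three individuals, one covariate, one censored.
times = [1.0, 2.0, 3.0]
events = [1, 0, 1]
covariates = [[1.0], [0.0], [0.5]]
print(cox_partial_log_likelihood([0.3], times, events, covariates))
```

Maximizing this function over the coefficients gives the Cox estimates; the
baseline hazard never enters the calculation.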
Life-Span Survival Modeling
- Fitted Functions with 2 Competing Risks
Students Attributes:
Associates with High School Diploma
(transfer credits >= 24, FPA from Aug 2009 to Aug 2010)
College = School of Business
Employer Billing = No
Gender = Male
Marital Status = Single
Managed Own Funds = No
Region = Northeast
Age = 35
AGI = 30k
[Figure: fitted curves for Graduation, Withdraw and Combined; vertical axis from 0 to 1, horizontal axis Week from 1 to 161]
- Fitted Function vs Actual
(transfer credits >= 24 and FPA from Aug 2009 to Aug 2010)
Employer Billing = No
Gender = Male
Marital Status = Single
Managed Own Funds = No
Region = Northeast
Age = any
AGI = any
[Figure: Fitted and Actual curves; vertical axis from 0.0 to 1.0, horizontal axis Week from 1 to 161]
- Expected LS vs Actual LS
(transfer credits >= 24 and FPA from Aug 2009 to Aug 2010)
Employer Billing = No
Gender = Male
Marital Status = Single
Managed Own Funds = No
Region = Northeast
Age = 35
AGI = 30k
[Figure: Actual LS (weeks), vertical axis from 0 to 90, against Expected LS (weeks) in bins from 20-25 to 75-80]
