Business Analytics and Operations Research
Roger R. Gung, Ph.D.
Sr. Director, Business Analytics & Operations Research
roger.gung@phoenix.edu
Logit function
$$\operatorname{logit}(p) = \log\frac{p}{1-p}$$
[Figure: p (0 to 1) plotted against logit(p) (-10 to 10), shown in two panels]
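A minimal Python sketch of the logit and its inverse, the logistic function whose S-shaped curve the plot traces. The function names are illustrative; the inverse uses the usual two-branch form so that `math.exp` never overflows.

```python
import math

def logit(p: float) -> float:
    """Log-odds of a probability p, defined for 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return math.log(p / (1.0 - p))

def inv_logit(z: float) -> float:
    """Logistic (sigmoid) function: maps any real z back to a probability in (0, 1)."""
    # Branch on the sign of z so that math.exp never overflows for large |z|
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

# Trace the S-shaped curve over the same range as the plot's horizontal axis
for z in range(-10, 11, 2):
    print(f"logit(p) = {z:4d}  ->  p = {inv_logit(z):.4f}")
```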
Poisson Regression
$$\log(\lambda) = \beta_0 + \sum_i \beta_i X_i$$
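A minimal sketch, assuming a single predictor x and observed counts y, of fitting the log-linear Poisson model (λ is the expected count) by Newton-Raphson on the log-likelihood. Newton-Raphson is a standard choice here, not necessarily how the slides' models were fitted, and `fit_poisson_1d` is a hypothetical helper rather than a library API.

```python
import math

def fit_poisson_1d(x, y, max_iter=50, tol=1e-10):
    """Fit log(lambda) = b0 + b1*x by Newton-Raphson on the Poisson log-likelihood."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0  # start from the intercept-only model
    for _ in range(max_iter):
        mu = [math.exp(b0 + b1 * xi) for xi in x]  # current expected counts
        # Score: gradient of the log-likelihood
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for xi, yi, mi in zip(x, y, mu))
        # Fisher information (negative Hessian): symmetric 2x2 matrix [[a, b], [b, c]]
        a = sum(mu)
        b = sum(mi * xi for xi, mi in zip(x, mu))
        c = sum(mi * xi * xi for xi, mi in zip(x, mu))
        det = a * c - b * b
        # Newton step = inverse(information) @ score
        d0 = (c * g0 - b * g1) / det
        d1 = (a * g1 - b * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1

x = [0, 0, 0, 1, 1, 1]
y = [1, 2, 3, 4, 6, 8]
b0, b1 = fit_poisson_1d(x, y)
print(f"b0 = {b0:.4f}, b1 = {b1:.4f}, rate ratio exp(b1) = {math.exp(b1):.4f}")
```

With one binary predictor the fitted rates equal the group means (2 and 6 here), so exp(b1) recovers the rate ratio of 3.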
Information Value (IV) and Weight of Evidence (WOE): measure the strength of each
variable's impact on the target variable
Convert each variable to its WOE values
6/5/2017 Copyright 2016 University of Phoenix. All Rights Reserved. 10
Nonlinear Regression for Marketing
[Equation: regression model with log-transformed terms]
Binning Nonlinear Variable
It is not easy to find the best nonlinear function for each numerical variable
Convert numerical variables into categorical variables
Apply decision tree to bin numerical variables
Each bin/level gets its own linear coefficient
[Figure: bins of a numerical variable; sparsely populated bins are labeled "less population"]
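One way to let a decision tree choose the bins, sketched in plain Python under stated assumptions: a binary target, greedy CART-style splits that minimize weighted Gini impurity, and hypothetical `max_depth` and `min_leaf` limits. This is an illustration of tree-based binning, not necessarily the tree algorithm or settings used in these slides.

```python
import bisect

def gini(labels):
    """Gini impurity 2p(1-p) of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def tree_cut_points(x, y, max_depth=2, min_leaf=2):
    """Greedy CART-style binning of numeric x against a binary target y.

    Recursively picks the split that most reduces weighted Gini impurity and
    returns the sorted cut points (midpoints between adjacent x values).
    """
    rows = sorted(zip(x, y))
    cuts = []

    def split(segment, depth):
        if depth == 0 or len(segment) < 2 * min_leaf:
            return
        labels = [lab for _, lab in segment]
        best_imp, best_i = gini(labels), None  # a split must beat the parent
        for i in range(min_leaf, len(segment) - min_leaf + 1):
            if segment[i - 1][0] == segment[i][0]:
                continue  # cannot separate identical x values
            left, right = labels[:i], labels[i:]
            imp = (len(left) * gini(left) + len(right) * gini(right)) / len(segment)
            if imp < best_imp:
                best_imp, best_i = imp, i
        if best_i is None:
            return  # no split improves purity
        cuts.append((segment[best_i - 1][0] + segment[best_i][0]) / 2.0)
        split(segment[:best_i], depth - 1)
        split(segment[best_i:], depth - 1)

    split(rows, max_depth)
    return sorted(cuts)

def assign_bin(value, cuts):
    """Index of the bin (0 .. len(cuts)) that a value falls into."""
    return bisect.bisect_right(cuts, value)

x = list(range(10))
y = [1, 1, 1, 0, 0, 0, 0, 1, 1, 1]  # high at both ends, low in the middle
cuts = tree_cut_points(x, y)
print(cuts, [assign_bin(v, cuts) for v in x])
```

The toy target is high at both ends of x and low in the middle, a shape one linear coefficient on x cannot capture; after binning, each of the resulting bins can be given its own coefficient (for example through dummy variables).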
The receiver operating characteristic (ROC) curve is a graphical plot that
illustrates the diagnostic ability of a binary classifier system as its discrimination
threshold is varied
ROC curve is created by plotting the true positive rate (TPR) against the false
positive rate (FPR) at various threshold settings
The true-positive rate is also known as sensitivity, recall or probability of
detection in machine learning
The false-positive rate is also known as the fall-out or probability of false alarm, and
can be calculated as (1 - specificity)
In binary classification, the class prediction for each instance is made based on
a continuous random variable X, which is a score computed for the instance
(e.g. estimated probability in Logistic Regression)
Given a threshold parameter T, the instance is classified as positive if X >T,
and negative otherwise
X follows a probability density $f_1(x)$ if the instance actually belongs to class
"positive", and $f_0(x)$ otherwise.
TPR is given by
$$\mathrm{TPR}(T) = \int_{T}^{\infty} f_1(x)\,dx$$
FPR is given by
$$\mathrm{FPR}(T) = \int_{T}^{\infty} f_0(x)\,dx$$
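The threshold rule above (classify as positive if X > T) can be swept over every observed score to trace the ROC curve empirically. A minimal sketch with made-up scores and labels; `roc_points` is a hypothetical helper.

```python
def roc_points(scores, labels):
    """(FPR, TPR) pairs for a classifier that predicts positive when score > T.

    Thresholds sweep every distinct score from high to low, then -inf,
    so the curve runs from (0, 0) to (1, 1).
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    thresholds = sorted(set(scores), reverse=True) + [float("-inf")]
    points = []
    for t in thresholds:
        tp = sum(1 for s, y in zip(scores, labels) if s > t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s > t and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points

scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
labels = [1, 1, 0, 1, 0, 0]
for fpr, tpr in roc_points(scores, labels):
    print(f"FPR = {fpr:.3f}  TPR = {tpr:.3f}")
```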
The area under the curve (AUC), also known as the C-statistic, is equal to the
probability that a classifier will rank a randomly chosen positive instance higher
than a randomly chosen negative one
AUC is given by
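$$\mathrm{AUC} = \int_{0}^{1} \mathrm{TPR}\big(\mathrm{FPR}^{-1}(x)\big)\,dx = \int_{\infty}^{-\infty} \mathrm{TPR}(T)\,\mathrm{FPR}'(T)\,dT = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} I(T' > T)\,f_1(T')\,f_0(T)\,dT'\,dT = P(X_1 > X_0)$$
where $X_1$ is the score of a positive instance, $X_0$ the score of a negative instance, and $I(\cdot)$ is the indicator function.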
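Because the AUC equals the probability that a random positive instance outranks a random negative one, it can be estimated directly by comparing all positive/negative pairs. A minimal sketch with a hypothetical helper and toy data; counting tied pairs as 1/2 is a common convention.

```python
def auc_pairwise(scores, labels):
    """AUC as P(X1 > X0): the fraction of (positive, negative) pairs in which the
    positive instance has the higher score; tied pairs count as 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))

scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
labels = [1, 1, 0, 1, 0, 0]
print(auc_pairwise(scores, labels))
```

For this toy data 8 of the 9 pairs are ordered correctly, so the AUC is 8/9 (about 0.889), the same value as the trapezoidal area under the empirical ROC points for these scores. The double loop costs O(n_pos * n_neg); a rank-based (Mann-Whitney U) computation gives the same number faster on large data.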
WOE vector
Linear assumption after WOE transformation holds!
Degree Level #PSI #REG #NonREG Distribution_REG Distribution_NonREG WOE IV
Associate 29,553 4,224 25,329 16.4% 8.1% 0.701 0.058
Bachelor 119,216 12,820 106,396 49.8% 34.2% 0.376 0.059
Master 21,911 1,774 20,137 6.9% 6.5% 0.063 0.000
Doctorate 1,268 46 1,222 0.2% 0.4% -0.787 0.002
Missing 164,978 6,871 158,107 26.7% 50.8% -0.643 0.155
ALL 336,926 25,735 311,191 100% 100% 0.274
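The WOE and IV columns in the table are consistent with WOE = ln(Distribution_REG / Distribution_NonREG) for each level and IV = sum over levels of (Distribution_REG - Distribution_NonREG) * WOE. The sketch below recomputes them from the #REG and #NonREG counts. `woe_iv` is a hypothetical helper, and it assumes every level has nonzero counts in both groups (otherwise the log is undefined and some smoothing would be needed).

```python
import math

def woe_iv(counts):
    """counts: {level: (n_reg, n_nonreg)} -> ({level: (woe, iv_contribution)}, total_iv)."""
    total_reg = sum(r for r, _ in counts.values())
    total_non = sum(n for _, n in counts.values())
    table, total_iv = {}, 0.0
    for level, (r, n) in counts.items():
        dist_reg = r / total_reg   # share of all REG falling in this level
        dist_non = n / total_non   # share of all NonREG falling in this level
        woe = math.log(dist_reg / dist_non)
        iv = (dist_reg - dist_non) * woe
        table[level] = (woe, iv)
        total_iv += iv
    return table, total_iv

degree = {
    "Associate": (4224, 25329),
    "Bachelor": (12820, 106396),
    "Master": (1774, 20137),
    "Doctorate": (46, 1222),
    "Missing": (6871, 158107),
}
table, total_iv = woe_iv(degree)
for level, (w, v) in table.items():
    print(f"{level:10s} WOE={w:7.3f} IV={v:.3f}")
print(f"Total IV = {total_iv:.3f}")
```

Run on the Degree Level counts, this reproduces the table's WOE and IV values to three decimals, including the total IV of 0.274.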
Survival analysis is for analyzing the expected duration of time until one or
more events happen, such as death in biological organisms and failure in
mechanical systems
Kaplan-Meier survival analysis is a non-parametric statistical analysis used to
estimate the survival function from lifetime data
Kaplan-Meier survival analysis can take into account some types of censored
data:
Right-censoring
Left-censoring
[Figure: survival times of individuals 1 to 3 with censored observations, and the resulting Kaplan-Meier curve of survival probability vs. survival time (labeled values 1.0, 2/3, 2/4, 1/4); survival probability is unknown beyond the last observed time point]
Kaplan-Meier survival probability is often viewed as actual probability.
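A minimal Kaplan-Meier estimator in plain Python. It handles right-censoring only (left-censored observations are not supported by this simple form), and the function name and toy data are illustrative.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate for right-censored data.

    times:  observed time for each individual
    events: 1 if the event (e.g. death, failure) was observed, 0 if right-censored
    Returns a list of (t, S(t)) at each distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group everyone who shares this time; censored individuals at t still count as at risk
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed  # both events and censorings leave the risk set after t
    return curve

print(kaplan_meier([2, 3, 3, 5, 6], [1, 0, 1, 1, 0]))
```

With 5 individuals, S drops to 4/5 at t = 2, then by a factor of 3/4 at t = 3 (the individual censored at t = 3 still counts as at risk), and by 1/2 at t = 5, giving 0.8, 0.6 and 0.3.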
$$f(t) = -\frac{dS(t)}{dt}$$
Survival Modeling
- Hazard Rate/Function
$$S(t) = e^{-H(t)}$$
Survival Modeling
- Competing Risks
$$h_i(t) = \lim_{\Delta t \to 0} \frac{\Pr(t \le T < t + \Delta t,\ d = i \mid T \ge t)}{\Delta t}$$
$$h_T(t) = \sum_{i=1}^{k} h_i(t)$$
$$H_T(t) = \int_0^t h_T(u)\,du = -\log[S(t)]$$
$$S(t) = e^{-H_T(t)}$$
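A numerical sketch of these relations: add up cause-specific hazards into the total hazard, integrate it with the midpoint rule to get the cumulative hazard, and exponentiate to get survival. The two weekly hazard functions below (named for graduation and withdrawal to echo the later slides) are invented for illustration only.

```python
import math

def cumulative_hazard(hazards, t, steps=1000):
    """H_T(t) = integral from 0 to t of h_T(u) du, where h_T(u) = sum_i h_i(u).

    Uses the midpoint rule; `hazards` is a list of cause-specific hazard functions.
    """
    dt = t / steps
    total = 0.0
    for k in range(steps):
        u = (k + 0.5) * dt
        total += sum(h(u) for h in hazards) * dt  # h_T(u) * du
    return total

def survival(hazards, t, steps=1000):
    """S(t) = exp(-H_T(t))."""
    return math.exp(-cumulative_hazard(hazards, t, steps))

# Hypothetical weekly hazards for two competing risks
def h_graduate(week):
    return 0.002 * week / 52.0  # rises over time

def h_withdraw(week):
    return 0.01  # constant

for week in (13, 26, 52, 104):
    print(week, round(survival([h_graduate, h_withdraw], week), 3))
```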
Survival Modeling
- Parametric Regression
Parametric Regression:
$$\log[h(t)] = \log[h_0(t)] + \sum_i \beta_i(t) X_i = \beta_0(t) + \sum_i \beta_i(t) X_i$$
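A minimal sketch of evaluating this log-linear hazard model, where beta0(t) plays the role of log[h0(t)] and each coefficient may depend on time. All functions and numbers are hypothetical. When a coefficient is constant, exp(beta_i) is the hazard ratio for a one-unit change in X_i; a time-varying beta_i(t) makes that ratio change with t.

```python
import math

def log_hazard(t, x, beta0, betas):
    """log h(t | x) = beta0(t) + sum_i beta_i(t) * x_i.

    beta0 is the log baseline hazard log h0(t); betas holds one coefficient
    function per covariate, so coefficients may vary with time.
    """
    return beta0(t) + sum(b(t) * xi for b, xi in zip(betas, x))

def hazard(t, x, beta0, betas):
    return math.exp(log_hazard(t, x, beta0, betas))

# Hypothetical example: constant (exponential) baseline hazard of 0.01 per week
def beta0(t):
    return math.log(0.01)

def beta_fixed(t):   # time-invariant effect: hazard doubles when x = 1
    return math.log(2.0)

def beta_fading(t):  # time-varying effect that shrinks as t grows
    return 1.0 - 0.1 * t

for t in (1, 5, 10):
    hr_fixed = hazard(t, [1], beta0, [beta_fixed]) / hazard(t, [0], beta0, [beta_fixed])
    hr_fading = hazard(t, [1], beta0, [beta_fading]) / hazard(t, [0], beta0, [beta_fading])
    print(t, round(hr_fixed, 3), round(hr_fading, 3))
```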
Survival Modeling
- Semi-parametric Regression
Partial Likelihood Estimator for Cox
Proportional Hazard Survival Model
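$$L(\beta) = \prod_{j:\,\delta_j = 1} \frac{\exp\left(\sum_i \beta_i X_{ji}\right)}{\sum_{l \in R(t_j)} \exp\left(\sum_i \beta_i X_{li}\right)}$$
where $\delta_j = 1$ if individual $j$'s event time $t_j$ is observed (not censored) and $R(t_j)$ is the set of individuals still at risk at $t_j$.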
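A minimal sketch of the Cox partial likelihood and its maximization for a single covariate by Newton-Raphson, assuming no tied event times (the Breslow or Efron approximations are the usual ways to handle ties). Function names and toy data are hypothetical; this illustrates the estimator, not the code behind the results in these slides.

```python
import math

def cox_log_partial_likelihood(beta, times, events, x):
    """Log partial likelihood of a one-covariate Cox model (no tied event times)."""
    ll = 0.0
    for ti, di, xi in zip(times, events, x):
        if di == 1:  # only observed events contribute a factor
            risk = [xj for tj, xj in zip(times, x) if tj >= ti]  # risk set R(t_i)
            ll += beta * xi - math.log(sum(math.exp(beta * xj) for xj in risk))
    return ll

def fit_cox_1d(times, events, x, iters=25):
    """Maximize the log partial likelihood over beta with Newton-Raphson."""
    beta = 0.0
    for _ in range(iters):
        score, info = 0.0, 0.0
        for ti, di, xi in zip(times, events, x):
            if di != 1:
                continue
            risk = [xj for tj, xj in zip(times, x) if tj >= ti]
            w = [math.exp(beta * xj) for xj in risk]
            s0 = sum(w)
            s1 = sum(wj * xj for wj, xj in zip(w, risk))
            s2 = sum(wj * xj * xj for wj, xj in zip(w, risk))
            mean = s1 / s0
            score += xi - mean             # first derivative
            info += s2 / s0 - mean * mean  # minus the second derivative
        beta += score / info
    return beta

times = [1, 2, 3, 4, 5, 6]
events = [1, 1, 0, 1, 1, 1]  # 0 = right-censored
x = [1, 0, 1, 1, 0, 0]
beta_hat = fit_cox_1d(times, events, x)
print(f"beta_hat = {beta_hat:.3f}, hazard ratio = {math.exp(beta_hat):.3f}")
```

At the fitted value the derivative of the log partial likelihood is zero, and exp(beta_hat) is the estimated hazard ratio for x = 1 versus x = 0; the baseline hazard h0(t) never has to be specified, which is what makes the model semi-parametric.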
Life-Span Survival Modeling
- Fitted Functions with 2 Competing Risks
Student Attributes:
Associates with High School Diploma (transfer credits >= 24, FPA from Aug 2009 to Aug 2010)
College = School of Business
Employer Billing = No
Gender = Male
Marital Status = Single
Managed Own Funds = No
Region = Northeast
Age = 35
AGI = 30k
[Figure: fitted survival probability (0 to 1) vs. week (1 to 161) for the Graduation and Withdraw risks and their Combined curve]
Life-Span Survival Modeling
- Fitted Function vs Actual
Student Attributes:
Associates with High School Diploma (transfer credits >= 24 and FPA from Aug 2009 to Aug 2010)
College = School of Business
Employer Billing = No
Gender = Male
Marital Status = Single
Managed Own Funds = No
Region = Northeast
Age = any
AGI = any
[Figure: fitted vs. actual survival probability (0 to 1) by week (1 to 161)]
Life-Span Survival Modeling
- Expected LS vs Actual LS
Student Attributes:
Associates with High School Diploma (transfer credits >= 24 and FPA from Aug 2009 to Aug 2010)
College = School of Business
Employer Billing = No
Gender = Male
Marital Status = Single
Managed Own Funds = No
Region = Northeast
Age = 35
AGI = 30k
[Figure: actual LS (weeks, 0 to 90) plotted against expected LS (weeks) in 5-week bins from 20-25 to 75-80]