Professional Documents
Culture Documents
Odeling THE Xpert: An Introduction To Logistic Regression
Odeling THE Xpert: An Introduction To Logistic Regression
• Electronically available
Medical Claims
• Standardized
Diagnosis, Procedures,
Doctor/Hospital, Cost • Not 100% accurate
• Under-reporting is common
Pharmacy Claims • Claims for hospital visits
can be vague
Drug, Quantity, Doctor,
Medication Cost
• Dependent Variable
Claims Sample • Quality of care
• Independent Variables
Expert Review • ongoing use of narcotics
• only on Avandia, not a good first
choice drug
Variable Extraction
• Dependent Variable
Claims Sample • Quality of care
• Independent Variables
Expert Review • Diabetes treatment
• Patient demographics
• Healthcare utilization
Expert Assessment • Providers
• Claims
• Prescriptions
Variable Extraction
• h
P (y = 1) ⇧ (p1 ) = 0.9, ⇧ (p2 ) = 0.8, ⇧ (q1 ) =
• Then )P (y = 0) = 1 P⇧(y(p= 1 ) 1)
= ⇧ (p2 ) = ⇧ (q1 ) = 0.5
• Independent variables x1 , x2 , . . . , xk
• Uses the Logistic Response ⇧ (x1Function
, x2 , . . . , xk ) = logistic ( 0 +
1 1
P (y = 1) = logistic
( + x(r)+ =x2 +...+ k xk )
1+e 0 1 1 2
1 + exp ( r)
• Nonlinear transformation of0 ,linear 1 , .regression
. . , k equation to
produce number between 0 and 1
L (⇧)
15.071x –Modeling the Expert: An Introduction to Logistic Regression 1
⇧ (Visits, Narcotics) = logistic ( 2
Understanding the Logistic Function
1
P (y = 1) = (
1+e 0 + 1 x1 + 2 x2 +...+ k xk )
1.0
• Positive values are
0.8
predictive of class 1
0.6
P(y = 1)
0.4
• Negative values are
0.2
predictive of class 0 0.0
-4 -2 0 2 4
1
P (y = 1) = (
1+e 0 + 1 x1 + 2 x2 +...+ k xk )
1
P (y = 1) = (
1+e 0 + 1 x1 + 2 x2 +...+ k xk )
log(Odds) = 0 + 1 x1 + 2 x2 + ... + k xk
(sensitivity) on y-axis
1.0
• Proportion of poor
0.8
care caught
0.4
(1-specificity) on x-axis
0.2
• Proportion of good
care labeled as poor
0.0
simultaneously
1.0
0.8
• High threshold
• Low specificity
0.0
1.0
for best trade off
0.8
• cost of failing to
0.6
• costs of raising false
0.4
alarms
0.2
0.0
1.0
1
for best trade off
0.81
0.8
• cost of failing to
0.63
0.6
• costs of raising false
0.44
0.4
alarms
0.25
0.2
0.07
0.0
1.0
1
0
for best trade off 0.1
0.81
0.8
• cost of failing to
0.63
0.2
0.6
• costs of raising false 0.4
0.3
0.44
0.4
alarms 0.5
0.6
0.7
0.25
0.2
0.8
0.9
0.07
0.0
1
0.0 0.2 0.4 0.6 0.8 1.0
• Measures of accuracy
N = number of observations
the curve
1.0
• Interpretation AUC = 0.775
0.8
• Given a random
0.6
proportion of the time
you guess which is
0.4
which correctly
0.2
• Less affected by
sample balance than
0.0
Maximum of 1
1.0
•
(perfect prediction)
0.8
True positive rate
0.6
0.4
0.2
0.0
Maximum of 1
1.0
•
(perfect prediction)
0.8
True positive rate
• Minimum of 0.5
0.6
(just guessing)
0.4
0.2
0.0