regression models
Ewout W. Steyerberg, PhD
Clinical epidemiologist
Frank E. Harrell, PhD
Biostatistician
Personal background
Ewout Steyerberg:
Erasmus MC, Rotterdam, the Netherlands
Learning objectives:
knowledge of
Performance objectives
To be able to explain why validation is
necessary for predictive models
To be able to judge the adequacy of a
validation procedure
3 examples of regression
Quality of life one year after surgery:
continuous outcome, linear regression
Death at 30 days after surgery:
binary outcome, logistic regression
Long term survival:
time-to-outcome, Cox regression
Evaluation of predictions
Calibration
are predictions correct on average?
are low and high predictions both correct?
Discrimination
can the model distinguish low-risk from
high-risk patients?
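The two questions above can be made concrete with a minimal sketch. It assumes predicted probabilities and observed 0/1 outcomes are available as plain lists; the function names and the five example patients are hypothetical and not taken from the slides. Calibration is checked here only "on average" (mean predicted risk against observed event rate), and discrimination is summarized with the c-statistic (concordance probability).

```python
def calibration_in_the_large(predicted, observed):
    """Return (mean predicted risk, observed event rate)."""
    n = len(predicted)
    return sum(predicted) / n, sum(observed) / n


def c_statistic(predicted, observed):
    """Concordance: probability that a patient with the event received a
    higher predicted risk than a patient without it (ties count 1/2)."""
    events = [p for p, y in zip(predicted, observed) if y == 1]
    non_events = [p for p, y in zip(predicted, observed) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for p_event in events:
        for p_non in non_events:
            if p_event > p_non:
                concordant += 1.0
            elif p_event == p_non:
                concordant += 0.5
    return concordant / (len(events) * len(non_events))


# Hypothetical data: five patients, two deaths within 30 days.
predicted = [0.05, 0.10, 0.20, 0.30, 0.40]
observed = [0, 0, 1, 0, 1]

mean_predicted, event_rate = calibration_in_the_large(predicted, observed)
c = c_statistic(predicted, observed)
```

In this toy example the mean predicted risk is lower than the observed event rate, so the predictions are too low on average even though most event/non-event pairs are ranked correctly. The two properties are assessed separately.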
[Figure: plot with both axes running from 0.0 to 0.4; x-axis: Predicted probability of 30-day mortality]
3 types of validation
Apparent: performance on sample used to
develop model
Internal: performance on population
underlying the sample
External: performance on related but
slightly different population
Apparent validity
Easy to calculate
Results in optimistic performance
estimates
Internal validity
More difficult to calculate
Test model in new data, random from
underlying population
External validity
Moderately easy to calculate when new
data are available
Test model in new data, different from
development population
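One way to test a model in new data is to compare predicted and observed risk within groups of patients ordered by predicted risk. The sketch below assumes the model's predictions for the new patients are already computed; the function name, the number of groups, and the eight example patients are hypothetical illustrations, not data from the slides.

```python
def calibration_table(predicted, observed, n_groups=4):
    """Sort patients by predicted risk, split them into n_groups groups of
    (nearly) equal size, and return one row per group:
    (group size, mean predicted risk, observed event rate)."""
    pairs = sorted(zip(predicted, observed))
    n = len(pairs)
    rows = []
    for g in range(n_groups):
        group = pairs[g * n // n_groups:(g + 1) * n // n_groups]
        if not group:
            continue  # more groups requested than patients available
        mean_pred = sum(p for p, _ in group) / len(group)
        obs_rate = sum(y for _, y in group) / len(group)
        rows.append((len(group), mean_pred, obs_rate))
    return rows


# Hypothetical external validation sample of eight patients.
new_predicted = [0.02, 0.04, 0.08, 0.10, 0.15, 0.20, 0.30, 0.40]
new_observed = [0, 0, 0, 1, 0, 1, 1, 1]

rows = calibration_table(new_predicted, new_observed, n_groups=2)
```

Each row pairs a mean predicted risk with an observed event rate. Large gaps between the two in the low-risk or high-risk groups suggest the model does not transfer well to the new population.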
[Figure: plot with both axes running from 0.0 to 0.4; x-axis: Predicted probability of 30-day mortality]
Summary
Apparent validity gives an optimistic
estimate of model performance
Internal validity may be estimated by
bootstrapping
External validity should be determined
in other populations
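The bootstrap estimate of internal validity can be sketched as follows. This is a minimal illustration of one common optimism-correction scheme, not necessarily the exact procedure used by the authors. It assumes a linear regression with a single predictor and R-squared as the performance measure; the simulated data, the function names, and the number of bootstrap samples are all hypothetical.

```python
import random


def fit_line(x, y):
    """Least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def r_squared(model, x, y):
    """R-squared of a fitted (intercept, slope) model on data (x, y)."""
    intercept, slope = model
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def bootstrap_corrected_r2(x, y, n_boot=200, seed=1):
    """Return (apparent R2, estimated optimism, optimism-corrected R2)."""
    rng = random.Random(seed)
    n = len(x)
    apparent = r_squared(fit_line(x, y), x, y)
    diffs = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        bx = [x[i] for i in idx]
        by = [y[i] for i in idx]
        if len(set(bx)) < 2 or len(set(by)) < 2:
            continue  # degenerate resample: model cannot be fitted
        model = fit_line(bx, by)
        # Optimism: performance in the bootstrap sample minus performance
        # of the same bootstrap model in the original sample.
        diffs.append(r_squared(model, bx, by) - r_squared(model, x, y))
    optimism = sum(diffs) / len(diffs)
    return apparent, optimism, apparent - optimism


# Simulated development sample of 30 patients (hypothetical).
data_rng = random.Random(0)
x = [data_rng.gauss(0, 1) for _ in range(30)]
y = [0.5 * xi + data_rng.gauss(0, 1) for xi in x]

apparent, optimism, corrected = bootstrap_corrected_r2(x, y)
```

The apparent R-squared is computed on the same sample that was used to fit the model. Subtracting the average optimism over the bootstrap samples gives an estimate of how the model would perform in new patients from the same underlying population.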
Key references
Tutorial and book on multivariable models
(Harrell 1996, Stat Med 15:361-87;
Harrell, Regression Modeling Strategies, Springer 2001)
Links
Interactive textbook on predictive
modeling
http://www.neri.org/symptom/mockup/Chapter_8/