7071 I

Validation of predictive
regression models
Ewout W. Steyerberg, PhD
Clinical epidemiologist
Frank E. Harrell, PhD
Biostatistician
Personal background
Ewout Steyerberg:
Erasmus MC, Rotterdam, the Netherlands
Frank Harrell: Health Evaluation Sciences,
Univ of Virginia, Charlottesville, VA, USA
Validation of predictions from regression models is of paramount importance
Learning objectives:
knowledge of
common types of regression models
fundamental assumptions of regression
models
performance criteria of predictive
models
principles of different types of validation
Performance objectives
To be able to explain why validation is
necessary for predictive models
To be able to judge the adequacy of a
validation procedure
Predictive models provide quantitative estimates of an outcome, e.g.
Quality of life one year after surgery
Death at 30 days after surgery
Long term survival
Predictive models are often based on regression analysis
y ~ a + sum(bi*xi)
y: outcome variable
a: intercept
bi: regression coefficient i
xi: predictor variable i
i in [1,many], usually 2 to 20
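The generic form above can be written out as a minimal sketch. The intercept, coefficients and predictor values below are made-up numbers chosen for easy arithmetic, not estimates from any real model.

```python
def linear_predictor(intercept, coefficients, predictors):
    """Return a + sum(b_i * x_i) for one patient."""
    if len(coefficients) != len(predictors):
        raise ValueError("need exactly one coefficient per predictor")
    return intercept + sum(b * x for b, x in zip(coefficients, predictors))

# Hypothetical values, for illustration only.
a = 1.0
b = [0.5, -2.0]
x = [4.0, 1.5]
lp = linear_predictor(a, b, x)  # 1.0 + 0.5*4.0 - 2.0*1.5
```

How the linear predictor is turned into a prediction depends on the type of regression model.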
3 examples of regression
Quality of life one year after surgery:
continuous outcome, linear regression
Death at 30 days after surgery:
binary outcome, logistic regression
Long term survival:
time-to-outcome, Cox regression
Predictive models make assumptions
Distribution
Linearity of continuous variables
Additivity of effects
Example: a simple logistic regression model
30-day mortality ~ a + b1*sex + b2*age
Assumptions:
Distribution of 30-day mortality is binomial
Age has a linear effect
The effects of sex and age can be added
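As a sketch of how such a model produces a prediction: in logistic regression the linear predictor is on the log-odds scale, and the inverse logit converts it to a probability. The coefficient values and the 0/1 coding of sex below are assumptions for illustration; the slides do not report the fitted coefficients.

```python
import math

def predicted_mortality(sex, age, a, b_sex, b_age):
    """Predicted 30-day mortality from logit(p) = a + b_sex*sex + b_age*age."""
    lp = a + b_sex * sex + b_age * age   # linear in age, additive in sex and age
    return 1.0 / (1.0 + math.exp(-lp))   # inverse logit maps lp to (0, 1)

# Hypothetical coefficients (NOT from the slides): sex coded 0/1, age in years.
A, B_SEX, B_AGE = -6.0, 0.4, 0.05

p_60 = predicted_mortality(sex=0, age=60, a=A, b_sex=B_SEX, b_age=B_AGE)
p_80 = predicted_mortality(sex=0, age=80, a=A, b_sex=B_SEX, b_age=B_AGE)
```

With these invented coefficients, 20 extra years of age add 20 * 0.05 = 1.0 to the log odds regardless of sex, which is what the linearity and additivity assumptions mean in practice.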
Assessing model assumptions
Examine model residuals
Perform specific tests
add nonlinear terms, e.g. age + age^2
add interaction terms, e.g. sex*age
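A minimal sketch of the extra terms such tests rely on: the model is refitted with an age-squared term and a sex-by-age product term added as predictors. The function and key names are hypothetical; fitting and testing the extended model would be done with statistical software.

```python
def expanded_predictors(sex, age):
    """Predictor values for a model extended with age^2 and sex*age terms."""
    return {
        "sex": sex,
        "age": age,
        "age_squared": age ** 2,  # added to check linearity of the age effect
        "sex_x_age": sex * age,   # added to check additivity of sex and age
    }

row = expanded_predictors(sex=1, age=70)
```

If the coefficient of an added term is clearly different from zero, the corresponding assumption is in doubt.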
Model assumptions and predictions
Better predictions if assumptions are met
Some violation inherent in empirical data
Evaluate predictions in new data
Evaluation of predictions
Calibration
average of predictions correct?
low and high predictions correct?
Discrimination
distinguish low risk from high risk
patients?
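Both criteria can be computed directly from predicted probabilities and observed outcomes. The sketch below uses invented toy data; the function names are illustrative and not from any particular library. It computes the area under the ROC curve as the proportion of (death, survivor) pairs in which the death received the higher prediction, and checks whether the average prediction is correct.

```python
def c_statistic(predictions, outcomes):
    """Area under the ROC curve; tied pairs count as 1/2."""
    events = [p for p, y in zip(predictions, outcomes) if y == 1]
    nonevents = [p for p, y in zip(predictions, outcomes) if y == 0]
    if not events or not nonevents:
        raise ValueError("need at least one event and one non-event")
    score = 0.0
    for pe in events:
        for pn in nonevents:
            if pe > pn:
                score += 1.0
            elif pe == pn:
                score += 0.5
    return score / (len(events) * len(nonevents))

def calibration_in_the_large(predictions, outcomes):
    """Observed event rate minus mean prediction (0 means the average is correct)."""
    n = len(outcomes)
    return sum(outcomes) / n - sum(predictions) / n

# Toy data, invented for illustration.
preds = [0.05, 0.10, 0.20, 0.30, 0.40, 0.15]
died = [0, 0, 1, 0, 1, 0]

auc = c_statistic(preds, died)
citl = calibration_in_the_large(preds, died)
```

Whether low and high predictions are also correct needs a further check, for example comparing observed and predicted mortality within groups of patients with similar predictions.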
Example: predicted probabilities
[Figure: actual 30-day mortality (y-axis, 0.0 to 0.4) plotted against predicted probability of 30-day mortality (x-axis, 0.0 to 0.4). Area under ROC: 0.77. Calibration: OK.]
3 types of validation
Apparent: performance on sample used to
develop model
Internal: performance on population
underlying the sample
External: performance on related but
slightly different population
Apparent validity
Easy to calculate
Results in optimistic performance
estimates
Apparent estimates optimistic since same data used for:
Definition of model structure:
e.g. selection and coding of variables
Estimation of model parameters:
e.g. regression coefficients
Evaluation of model performance:
e.g. calibration and discrimination
Internal validity
More difficult to calculate
Test model in new data, randomly drawn from
the underlying population
Why internal validation?
An honest estimate of performance should
be obtained, at least for a population
similar to the development sample
Internally validated performance sets an
upper limit to what may be expected in
other settings (external validity)
External validity
Moderately easy to calculate when new
data are available
Test model in new data, different from
development population
Why external validation?
Various factors may differ from
development population, including
different selection of patients
different definitions of variables
different diagnostic or therapeutic
procedures
Internal validation techniques
Split-sample:
development / validation
Cross-validation:
alternating development / validation
extreme: n-1 develop / 1 validate
(jack-knife)
Bootstrap
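The difference between split-sample and cross-validation lies in how patients are assigned to development and validation parts. A minimal sketch of the cross-validation assignment, with a hypothetical helper name and a small n for readability:

```python
import random

def k_fold_indices(n, k, seed=0):
    """Yield (development, validation) index lists for k-fold cross-validation."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k roughly equal parts
    for i in range(k):
        validation = folds[i]
        development = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield development, validation

splits = list(k_fold_indices(n=10, k=5))         # alternating development / validation
leave_one_out = list(k_fold_indices(n=10, k=10)) # extreme: n-1 develop / 1 validate
```

Each patient is used for validation exactly once. A split-sample design corresponds to taking only one such split, so part of the data is never used for model development.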
Bootstrap is the preferred internal validation technique
bootstrap sample for model development:
n patients drawn with replacement
original sample for validation: n patients
difference: optimism
efficiency: development and validation on
n patients
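The procedure above can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' implementation: the data are simulated, the true coefficients are invented, and the logistic model is fitted with a small hand-written Newton-Raphson routine using a fixed number of steps.

```python
import math
import random

def expit(z):
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination (small systems only)."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [mr - f * mc for mr, mc in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def fit_logistic(X, y, iterations=8):
    """Logistic regression by Newton-Raphson; each row of X starts with 1."""
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(iterations):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for row, yi in zip(X, y):
            p = expit(sum(bj * xj for bj, xj in zip(beta, row)))
            w = p * (1.0 - p)
            for i in range(k):
                grad[i] += (yi - p) * row[i]
                for j in range(k):
                    hess[i][j] += w * row[i] * row[j]
        beta = [bj + sj for bj, sj in zip(beta, solve(hess, grad))]
    return beta

def auc(beta, X, y):
    """Area under the ROC curve of the model `beta` applied to (X, y)."""
    pred = [expit(sum(bj * xj for bj, xj in zip(beta, row))) for row in X]
    ev = [p for p, yi in zip(pred, y) if yi == 1]
    ne = [p for p, yi in zip(pred, y) if yi == 0]
    wins = sum((pe > pn) + 0.5 * (pe == pn) for pe in ev for pn in ne)
    return wins / (len(ev) * len(ne))

def bootstrap_corrected_auc(X, y, n_boot=200, seed=1):
    rng = random.Random(seed)
    n = len(y)
    apparent = auc(fit_logistic(X, y), X, y)
    optimism_sum, used = 0.0, 0
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # n patients, with replacement
        Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
        if len(set(yb)) < 2:
            continue  # a sample with one outcome class cannot be fitted or scored
        beta_b = fit_logistic(Xb, yb)               # develop in bootstrap sample
        optimism_sum += auc(beta_b, Xb, yb) - auc(beta_b, X, y)  # vs original sample
        used += 1
    optimism = optimism_sum / used
    return apparent, optimism, apparent - optimism

# Simulated patients; the true coefficients are invented for illustration.
sim = random.Random(42)
X, y = [], []
for _ in range(150):
    sex = sim.randint(0, 1)
    age = sim.gauss(0.0, 1.0)  # age, centred and scaled (hypothetical)
    X.append([1.0, float(sex), age])
    y.append(1 if sim.random() < expit(-1.0 + 0.5 * sex + 1.0 * age) else 0)

apparent, optimism, corrected = bootstrap_corrected_auc(X, y)
```

Every model is developed on n patients and tested on n patients, which is the efficiency argument made above. In a real analysis all modeling steps, including any variable selection, would be repeated inside each bootstrap sample.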
Example: bootstrap results for logistic regression model
30-day mortality ~ a + b1*sex + b2*age
Apparent area under the ROC curve: 0.77
Mean area in 200 bootstrap samples: 0.772
Mean area of the 200 bootstrap models tested in original sample: 0.762
Optimism in apparent performance: 0.772 - 0.762 = 0.01
Optimism-corrected area: 0.77 - 0.01 = 0.76
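The arithmetic behind these numbers, written out with the values reported on the slide (the variable names are illustrative):

```python
apparent_auc = 0.77         # model developed and tested in the original sample
mean_auc_bootstrap = 0.772  # bootstrap models tested in their own bootstrap samples
mean_auc_original = 0.762   # the same bootstrap models tested in the original sample

optimism = mean_auc_bootstrap - mean_auc_original
corrected_auc = apparent_auc - optimism
```

The optimism is estimated from the bootstrap models only; it is then subtracted from the apparent performance of the model fitted in the original sample.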
External validation techniques
Temporal validation: same
investigators, validate in recent years
Spatial validation (other place): same
investigators, cross-validate in centers
Fully external: other investigators, other
centers
Example: external validity of logistic regression model
30-day mortality ~ a + b1*sex + b2*age
Apparent area in 785 patients: 0.77
Tested in 20,318 other patients: 0.74
Tested by other investigators: ?
Example: external validation
[Figure: actual 30-day mortality (y-axis, 0.0 to 0.4) plotted against predicted probability of 30-day mortality (x-axis, 0.0 to 0.4). Area under ROC: 0.74. Calibration: reasonable.]
Summary
Apparent validity gives an optimistic
estimate of model performance
Internal validity may be estimated by
bootstrapping
External validity should be determined
in other populations
Key references
tutorial and book on multivariable models
(Harrell 1996: Stat Med 15: 361-87;
Harrell: Regression Modeling Strategies, Springer 2001)
empirical evaluations of strategies
(Steyerberg 2000: Stat Med 19: 1059-79)
internal validation
(Steyerberg 2001: JCE 54: 774-81)
external validation
(Justice 1999: Ann Intern Med 130: 515-24;
Altman 2000: Stat Med 19: 453-73)
Links
Interactive textbook on predictive modeling
http://www.neri.org/symptom/mockup/Chapter_8/
Harrell's Regression Modeling Strategies
http://hesweb1.med.virginia.edu/biostat/rms/
