
Validation of predictive regression models
Ewout W. Steyerberg, PhD
Clinical epidemiologist
Frank E. Harrell, PhD
Biostatistician

Personal background
Ewout Steyerberg:
Erasmus MC, Rotterdam, the Netherlands

Frank Harrell:
Health Evaluation Sciences, Univ of Virginia, Charlottesville, VA, USA

Validation of predictions from regression models is of paramount importance

Learning objectives: knowledge of
common types of regression models
fundamental assumptions of regression models
performance criteria of predictive models
principles of different types of validation

Performance objectives
To be able to explain why validation is
necessary for predictive models
To be able to judge the adequacy of a
validation procedure

Predictive models provide quantitative estimates of an outcome, e.g.
Quality of life one year after surgery
Death at 30 days after surgery
Long term survival

Predictive models are often based on regression analysis
y ~ a + sum(bi*xi)
y: outcome variable
a: intercept
bi: regression coefficient i
xi: predictor variable i
i in [1,many], usually 2 to 20
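
The formula above can be sketched in Python as follows. The function names and the coefficient values are assumptions made up for illustration; they are not estimates from the slides. The logistic transformation is the standard way to turn the linear predictor into a probability when the outcome is binary.

```python
import math

def linear_predictor(intercept, coefficients, predictors):
    """Return a + sum(b_i * x_i) for one patient."""
    if len(coefficients) != len(predictors):
        raise ValueError("need exactly one coefficient per predictor")
    return intercept + sum(b * x for b, x in zip(coefficients, predictors))

def logistic(lp):
    """Map a linear predictor to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-lp))

# Hypothetical values, for illustration only (not from the slides)
a = -6.0          # intercept
b = [0.5, 0.05]   # b1 for sex (coded 0/1), b2 for age in years
x = [1, 70]       # one patient: sex = 1, age = 70

lp = linear_predictor(a, b, x)   # -6.0 + 0.5*1 + 0.05*70
risk = logistic(lp)              # predicted probability for a binary outcome
```

For a continuous outcome (linear regression) the linear predictor itself is the prediction; for a binary outcome (logistic regression) it is transformed to a probability as shown.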

3 examples of regression
Quality of life one year after surgery:
continuous outcome, linear regression
Death at 30 days after surgery:
binary outcome, logistic regression
Long term survival:
time-to-outcome, Cox regression

Predictive models make assumptions
Distribution
Linearity of continuous variables
Additivity of effects

Example: a simple logistic regression model
30-day mortality ~ a + b1*sex + b2*age
Assumptions:
Distribution of 30-day mortality is binomial
Age has a linear effect
The effects of sex and age can be added

Assessing model assumptions


Examine model residuals
Perform specific tests
add nonlinear terms, e.g. age + age^2
add interaction terms, e.g. sex*age

Model assumptions and predictions
Better predictions if assumptions are met
Some violation inherent in empirical data
Evaluate predictions in new data

Evaluation of predictions
Calibration
average of predictions correct?
low and high predictions correct?
Discrimination
distinguish low risk from high risk
patients?
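
Both criteria can be computed directly from predicted probabilities and observed outcomes. The sketch below is a minimal illustration, not the authors' code: the function names and the toy data are assumptions. Discrimination is measured with the area under the ROC curve (c-statistic); calibration is checked first on average and then separately in low and high risk groups.

```python
def c_statistic(predictions, outcomes):
    """Area under the ROC curve: share of (event, non-event) pairs in which
    the event has the higher prediction; ties count as one half."""
    events = [p for p, y in zip(predictions, outcomes) if y == 1]
    non_events = [p for p, y in zip(predictions, outcomes) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    score = 0.0
    for pe in events:
        for pn in non_events:
            if pe > pn:
                score += 1.0
            elif pe == pn:
                score += 0.5
    return score / (len(events) * len(non_events))

def calibration_in_the_large(predictions, outcomes):
    """Return (mean predicted risk, observed event rate)."""
    n = len(outcomes)
    return sum(predictions) / n, sum(outcomes) / n

def calibration_by_group(predictions, outcomes, n_groups=2):
    """Sort by predicted risk, split into groups, and return
    (mean predicted, observed rate) for each group, low risk first."""
    if not 1 <= n_groups <= len(predictions):
        raise ValueError("n_groups must be between 1 and the sample size")
    pairs = sorted(zip(predictions, outcomes))
    size = len(pairs) // n_groups
    rows = []
    for g in range(n_groups):
        start = g * size
        end = (g + 1) * size if g < n_groups - 1 else len(pairs)
        chunk = pairs[start:end]
        mean_pred = sum(p for p, _ in chunk) / len(chunk)
        observed = sum(y for _, y in chunk) / len(chunk)
        rows.append((mean_pred, observed))
    return rows

# Toy data, made up for illustration
preds = [0.05, 0.10, 0.10, 0.20, 0.30, 0.40, 0.25, 0.60]
obs = [0, 0, 0, 0, 1, 0, 1, 1]

auc = c_statistic(preds, obs)
mean_pred, event_rate = calibration_in_the_large(preds, obs)
groups = calibration_by_group(preds, obs, n_groups=2)
```

In practice more than two risk groups (for example tenths of predicted risk) or a smoothed calibration curve would be used; two groups keep the toy example readable.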

Example: predicted probabilities
[Figure: calibration plot; x-axis: predicted probability of 30-day mortality (0.0 to 0.4); y-axis: actual 30-day mortality (0.0 to 0.4)]
Area under ROC: 0.77
Calibration: OK

3 types of validation
Apparent: performance on sample used to
develop model
Internal: performance on population
underlying the sample
External: performance on related but
slightly different population

Apparent validity
Easy to calculate
Results in optimistic performance
estimates

Apparent estimates optimistic since same data used for:
Definition of model structure:
e.g. selection and coding of variables
Estimation of model parameters:
e.g. regression coefficients
Evaluation of model performance:
e.g. calibration and discrimination

Internal validity
More difficult to calculate
Test model in new data, random from
underlying population

Why internal validation?


Honest estimate of performance should
be obtained, at least for a population
similar to the development sample
Internally validated performance sets an
upper limit to what may be expected in
other settings (external validity)

External validity
Moderately easy to calculate when new
data are available
Test model in new data, different from
development population

Why external validation?


Various factors may differ from
development population, including
different selection of patients
different definitions of variables
different diagnostic or therapeutic
procedures

Internal validation techniques


Split-sample:
development / validation
Cross-validation:
alternating development / validation
extreme: n-1 develop / 1 validate
(jack-knife)
Bootstrap
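
The split-sample and cross-validation schemes listed above differ only in how patients are assigned to development and validation. A minimal sketch of the index bookkeeping follows; the function names and the default seed are assumptions for illustration. Setting k equal to n gives the extreme case of n-1 patients for development and 1 for validation.

```python
import random

def split_sample(n, fraction_development=0.5, seed=0):
    """Randomly split patient indices 0..n-1 once into development and validation."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = int(round(n * fraction_development))
    return idx[:cut], idx[cut:]

def k_fold_indices(n, k, seed=0):
    """Yield (development, validation) index lists for k-fold cross-validation.
    Every patient appears in a validation part exactly once."""
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and n")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i, validation in enumerate(folds):
        development = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield development, validation

# 10 patients: one random split, 5-fold cross-validation, and leave-one-out
dev_half, val_half = split_sample(10)
five_fold = list(k_fold_indices(10, 5))
leave_one_out = list(k_fold_indices(10, 10))
```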

Bootstrap is the preferred internal validation technique
bootstrap sample for model development:
n patients drawn with replacement
original sample for validation: n patients
difference: optimism
efficiency: development and validation on
n patients
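
The procedure above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the one-predictor logistic model fitted by plain gradient ascent, the function names, and the deterministic toy data are all made up so that the example is self-contained. A real analysis would repeat every modeling step (including variable selection) in each bootstrap sample and use a proper fitting routine.

```python
import math
import random

def fit_logistic(xs, ys, steps=200, rate=0.5):
    """Fit p = logistic(a + b*x) by simple gradient ascent (illustrative only)."""
    a = b = 0.0
    n = len(xs)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            grad_a += y - p
            grad_b += (y - p) * x
        a += rate * grad_a / n
        b += rate * grad_b / n
    return a, b

def auc(model, xs, ys):
    """Area under the ROC curve of a fitted model in the data (xs, ys)."""
    a, b = model
    scores = [a + b * x for x in xs]
    events = [s for s, y in zip(scores, ys) if y == 1]
    non_events = [s for s, y in zip(scores, ys) if y == 0]
    wins = sum((e > m) + 0.5 * (e == m) for e in events for m in non_events)
    return wins / (len(events) * len(non_events))

def bootstrap_validate(xs, ys, n_boot=50, seed=1):
    """Return (apparent, optimism, optimism-corrected) area under the ROC curve."""
    rng = random.Random(seed)
    n = len(xs)
    apparent = auc(fit_logistic(xs, ys), xs, ys)
    optimism_sum = 0.0
    done = 0
    while done < n_boot:
        draw = [rng.randrange(n) for _ in range(n)]   # n patients, with replacement
        bx = [xs[i] for i in draw]
        by = [ys[i] for i in draw]
        if len(set(by)) < 2:
            continue   # area undefined without both outcomes; draw again
        model = fit_logistic(bx, by)                  # develop in bootstrap sample
        optimism_sum += auc(model, bx, by) - auc(model, xs, ys)  # test in original
        done += 1
    optimism = optimism_sum / n_boot
    return apparent, optimism, apparent - optimism

# Deterministic toy data (assumption): outcome more likely at higher x
xs = [(i - 20) / 10 for i in range(40)]
ys = [1 if x + 0.8 * math.sin(7 * i) > 0.3 else 0 for i, x in enumerate(xs)]

apparent, optimism, corrected = bootstrap_validate(xs, ys)
```

With a single predictor and no selection step the optimism is expected to be small; it grows with the number of candidate predictors relative to the number of events.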

Example: bootstrap results for logistic regression model
30-day mortality ~ a + b1*sex + b2*age
Apparent area under the ROC curve: 0.77
Mean area of 200 bootstrap samples: 0.772
Mean area of 200 tests in original: 0.762
Optimism in apparent performance: 0.01
Optimism-corrected area: 0.76
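
The arithmetic behind these numbers, written out with the values from the slide (the variable names are illustrative):

```python
apparent_auc = 0.77          # model developed and tested in the original sample
mean_auc_bootstrap = 0.772   # models tested in their own bootstrap samples
mean_auc_original = 0.762    # the same models tested in the original sample

# Optimism: how much performance drops when leaving the development data
optimism = mean_auc_bootstrap - mean_auc_original

# Optimism-corrected performance: apparent performance minus optimism
corrected_auc = apparent_auc - optimism
```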

External validation techniques


Temporal validation: same
investigators, validate in recent years
Spatial validation (other place): same
investigators, cross-validate in centers
Fully external: other investigators, other
centers

Example: external validity of logistic regression model
30-day mortality ~ a + b1*sex + b2*age
Apparent area in 785 patients: 0.77
Tested in 20,318 other patients: 0.74
Tested by other investigators: ?

Example: external validation
[Figure: calibration plot; x-axis: predicted probability of 30-day mortality (0.0 to 0.4); y-axis: actual 30-day mortality (0.0 to 0.4)]
Area under ROC: 0.74
Calibration: reasonable

Summary
Apparent validity gives an optimistic
estimate of model performance
Internal validity may be estimated by
bootstrapping
External validity should be determined
in other populations

Key references
Tutorial and book on multivariable models
(Harrell 1996: Stat Med 15: 361-87;
Harrell 2001: Regression Modeling Strategies, Springer)
Empirical evaluations of strategies
(Steyerberg 2000: Stat Med 19: 1059-79)
Internal validation
(Steyerberg 2001: JCE 54: 774-81)
External validation
(Justice 1999: Ann Intern Med 130: 515-24;
Altman 2000: Stat Med 19: 453-73)

Links
Interactive textbook on predictive modeling
http://www.neri.org/symptom/mockup/Chapter_8/
Harrell's Regression Modeling Strategies
http://hesweb1.med.virginia.edu/biostat/rms/
