Survival Analysis-Debby Raden
Introduction
Survival analysis: the analysis of data on the time from a
well-defined time origin until the occurrence of
some particular event or end-point.
Things that need to be precisely defined
Examples of “time until any well-defined event”:
Death
Disease (diagnosis, start of symptoms, relapse)
Remission of diarrhea
Quitting smoking
Menopause
Special features of survival data
Censored observations
[Figure: labels “True S(t)” and “Good alternatives”]
For example:
In a follow-up study, censoring occurs due to the end of the study,
loss to follow-up, or early withdrawal.
Reasons for censoring:
o Patients decide to move to another hospital
o Patients quit treatment because of side-effects of a drug
o Failures occur after the end of the study, etc.
Censoring—Random Censoring
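Ideally, we would like to observe the “complete data” t1, t2, t3, …, tn.
Due to censoring, we only observe the “right-censored data”:
$$y_i = \begin{cases} t_i, & \text{if } t_i \le c_i \\ c_i, & \text{if } t_i > c_i \end{cases}$$
where ci is the censoring time of individual i.
The censoring indicators:
$$\delta_i = \begin{cases} 1, & \text{if the observation is uncensored, } t_i \le c_i \\ 0, & \text{if the observation is censored, } t_i > c_i \end{cases}$$
Data: (y1, δ1), (y2, δ2), …, (yn, δn), and possibly some covariates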
Example:
yi   25   18   17   22   27
δi    1    0    1    0    1
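The construction of (yi, δi) from failure and censoring times can be sketched in a few lines of Python. The failure times `t` and censoring times `c` below are hypothetical (the slides only give the observed pairs); they were chosen so that the result matches the example table.

```python
# Hypothetical failure times t_i and censoring times c_i (not from the slides),
# chosen so that the observed pairs reproduce the example table.
t = [25, 30, 17, 40, 27]
c = [28, 18, 20, 22, 35]


def right_censor(t, c):
    """Return observed times y_i = min(t_i, c_i) and indicators delta_i = 1 if t_i <= c_i."""
    y = [min(ti, ci) for ti, ci in zip(t, c)]
    delta = [1 if ti <= ci else 0 for ti, ci in zip(t, c)]
    return y, delta


y, delta = right_censor(t, c)
print(y)      # [25, 18, 17, 22, 27]
print(delta)  # [1, 0, 1, 0, 1]
```

In practice only `y` and `delta` are available; `t` is unknown whenever `delta` is 0.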
Study Time and Patient Time
[Figure: patient follow-up shown in study time and in patient time]
Survival Function and Hazard Function
Cumulative Distribution Function
F(t) = Pr (T ≤ t)
Survival Function S(t)
S(t) = Pr (T > t) = 1 − F(t)
Characteristics of S(t):
(a) S(t) = 1 if t < 0
(b) $S(\infty) = \lim_{t \to \infty} S(t) = 0$
(c) S(t) is non-increasing in t
Density Function f(t)
$$f(t) = \frac{d}{dt}F(t) = -\frac{d}{dt}S(t)$$
Note that: f(t) ≥ 0 and $\int_0^\infty f(t)\,dt = 1$.
Hazard Function λ(t)
$$\lambda(t) = \lim_{\Delta t \to 0} \frac{\Pr(t \le T < t + \Delta t \mid T \ge t)}{\Delta t} = \frac{f(t)}{S(t)}$$
Here:
λ(t)Δt ≈ the proportion of individuals experiencing failure in (t, t+Δt)
among those surviving up to t
Example:
(a) Constant hazard λ(t) = λ0
(b) Increasing hazard λ(t2) ≥ λ(t1), if t2 ≥ t1
(c) Decreasing hazard λ(t2) ≤ λ(t1), if t2 ≥ t1
(d) U-shape hazard (human mortality for age at death)
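The first three hazard shapes can be illustrated with a single parametric family. The sketch below uses the Weibull hazard, which is an assumption made here for illustration and is not named on the slides; the shape values and time points are arbitrary. A U-shaped hazard cannot be produced by this family.

```python
def weibull_hazard(t, shape, scale=1.0):
    """Weibull hazard: (shape / scale) * (t / scale) ** (shape - 1)."""
    return (shape / scale) * (t / scale) ** (shape - 1)


times = [0.5, 1.0, 2.0, 4.0]  # arbitrary evaluation points

# shape = 1 gives a constant hazard, shape > 1 increasing, shape < 1 decreasing
shapes = {"constant": 1.0, "increasing": 2.0, "decreasing": 0.5}

hazards = {
    label: [weibull_hazard(t, shape) for t in times]
    for label, shape in shapes.items()
}

for label, values in hazards.items():
    print(label, [round(v, 3) for v in values])
```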
Cumulative Hazard Function (chf) Λ(t)
Definition: Cumulative hazard function (chf) Λ(t).
(a) If T is discrete, let the xi’s be the mass points of T
(b) If T is continuous, $\Lambda(t) = \int_0^t \lambda(u)\,du$
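Relationship among Functions
(a) If T is discrete, with mass points xi and λ(xi) = Pr (T = xi | T ≥ xi),
$$S(t) = \prod_{i:\, x_i \le t} \{1 - \lambda(x_i)\}$$
(b) If T is continuous,
$$S(t) = \exp\{-\Lambda(t)\} = \exp\left\{-\int_0^t \lambda(u)\,du\right\}, \qquad f(t) = \lambda(t)\,S(t)$$
Since S(∞) = 0, $\int_0^\infty \lambda(u)\,du = \infty$.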
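For continuous T, the standard relationships S(t) = exp{−Λ(t)} and f(t) = λ(t)S(t) can be checked numerically. The sketch below assumes a Weibull hazard with hypothetical parameters (not from the slides) and integrates it with a simple midpoint rule.

```python
import math

SHAPE, SCALE = 1.5, 2.0  # hypothetical Weibull parameters, for illustration only


def hazard(t):
    """Weibull hazard lambda(t)."""
    return (SHAPE / SCALE) * (t / SCALE) ** (SHAPE - 1)


def cumulative_hazard(t, steps=10000):
    """Lambda(t): integral of lambda(u) from 0 to t, by the midpoint rule."""
    width = t / steps
    return sum(hazard((k + 0.5) * width) for k in range(steps)) * width


def survival_from_hazard(t):
    """S(t) = exp(-Lambda(t))."""
    return math.exp(-cumulative_hazard(t))


def survival_closed_form(t):
    """Known Weibull survival function, used only as a reference."""
    return math.exp(-((t / SCALE) ** SHAPE))


def density(t):
    """f(t) = lambda(t) * S(t)."""
    return hazard(t) * survival_closed_form(t)


for t in [1.0, 3.0, 10.0]:
    print(t, round(survival_from_hazard(t), 5), round(survival_closed_form(t), 5))
```

The cumulative hazard keeps growing as t increases, which is what forces S(t) towards 0.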
METHODS IN SURVIVAL ANALYSIS
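Non-Parametric Procedures
Estimating Survival Function (for no censored observations)
$$\hat{S}(t) = \frac{\text{number of individuals with survival times} > t}{n}$$
Equivalently,
$$\hat{S}(t) = 1 - \hat{F}(t)$$
where $\hat{F}(t)$ is the empirical distribution function of the n observed survival times.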
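With no censoring, the estimate is simply the fraction of subjects still alive after time t. A minimal sketch, using hypothetical uncensored survival times that are not taken from the slides:

```python
def empirical_survival(times, t):
    """Fraction of observed survival times strictly greater than t."""
    return sum(1 for x in times if x > t) / len(times)


# Hypothetical, fully observed survival times (months)
uncensored_times = [2, 4, 9, 13, 17, 20]

for t in [0, 9, 20]:
    print(t, empirical_survival(uncensored_times, t))
```

This estimator breaks down as soon as some times are censored, which is the motivation for the life-table and Kaplan-Meier estimates.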
Example: Life-table estimates of the Survivor Function
Kaplan-Meier Estimate
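Let y(1) < y(2) < … < y(k), k ≤ n, be the distinct, uncensored,
and ordered failure times.
Let N(i) be the number of individuals at risk just before y(i) and d(i) the number of failures at y(i). Then
$$\hat{S}(t) = \prod_{i:\, y_{(i)} \le t} \left(1 - \frac{d_{(i)}}{N_{(i)}}\right)$$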
For example
Data: 3, 2+, 0, 1, 5+, 3, 5 (“+” marks a censored time)
N(1)=7, N(2)=6, N(3)=4, N(4)=2
d(1)=1, d(2)=1, d(3)=2, d(4)=1
Now estimate S(t) at each uncensored time:
i   Uncensored time y(i)   d(i)   N(i)   Estimate of S at y(i)
1   0                      1      7      6/7 = 0.857
2   1                      1      6      0.857 x 5/6 = 0.714
3   3                      2      4      0.714 x 2/4 = 0.357
4   5                      1      2      0.357 x 1/2 = 0.179
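The product-limit calculation for this example can be sketched in plain Python. The function name and the 0/1 event coding are choices made here, not taken from the slides. Counting everyone with time ≥ y(i) as at risk implements the convention that censored observations tied with an event are placed after it.

```python
def kaplan_meier(times, events):
    """Product-limit estimate.

    times  : observed times y_i
    events : 1 if the time is an uncensored failure, 0 if censored
    Returns a list of rows (y_(i), N_(i), d_(i), S_hat at y_(i)).
    """
    rows = []
    surv = 1.0
    failure_times = sorted({t for t, e in zip(times, events) if e == 1})
    for y in failure_times:
        n_at_risk = sum(1 for t in times if t >= y)
        d = sum(1 for t, e in zip(times, events) if t == y and e == 1)
        surv *= 1 - d / n_at_risk
        rows.append((y, n_at_risk, d, surv))
    return rows


# Data from the example: 3, 2+, 0, 1, 5+, 3, 5
times = [3, 2, 0, 1, 5, 3, 5]
events = [1, 0, 1, 1, 0, 1, 1]

km_rows = kaplan_meier(times, events)
for y, n, d, s in km_rows:
    print(y, n, d, round(s, 3))
```

The printed N(i) and d(i) columns should agree with the counts listed in the example.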
Steps in Kaplan-Meier Estimation
Example:
10 individuals followed up to 24 months
Follow-up times:
17 4 8+ 20 24+ 13 16+ 2 9 10+
ti (months)   Conditional survival at ti   Survival function S(t)
2             9/10 = 0.900                 0.900
4             8/9 = 0.889                  0.900 x 0.889 = 0.800
9             6/7 = 0.857                  0.800 x 0.857 = 0.686
13            4/5 = 0.800                  0.686 x 0.800 = 0.549
17            2/3 = 0.667                  0.549 x 0.667 = 0.366
20            1/2 = 0.500                  0.366 x 0.500 = 0.183
If the largest observed time is uncensored, the Kaplan-Meier
estimate reaches the value 0 for t ≥ the largest observed time.
If the largest observed time is censored, the estimate never reaches 0.
In this case, we say that S(t) is undetermined for t > the largest
observed time.
Notes
The calculations are made at the times when events occur. Censoring times are
skipped. Censored observations only contribute information up to the time when
they are withdrawn.
Kaplan & Meier call this method the “product-limit estimate”, for it is the limit of the
life table with the shortest possible intervals (as short as needed to include only
one event).
The method is theoretically designed for exact event times. With approximate
(rounded) times (e.g., years), ties can occur:
o More than one event at a given time: no problem.
o Event(s) and censored observations at the same time. Convention: place the
censored observations after the events at each failure time with ties
(suggested by Kaplan & Meier, 1958).
The K-M estimate is a nonparametric method which can be applied to either discrete
or continuous data.
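Greenwood’s Formula
For estimating the variance of the Kaplan-Meier estimate:
$$\widehat{\mathrm{Var}}[\hat{S}(t)] = \hat{S}(t)^2 \sum_{i:\, y_{(i)} \le t} \frac{d_{(i)}}{N_{(i)}\,(N_{(i)} - d_{(i)})}$$
Confidence Interval
Greenwood method (1926): an approximate 100(1 − α)% confidence interval for S(t) is
$$\hat{S}(t) \pm z_{1-\alpha/2}\,\sqrt{\widehat{\mathrm{Var}}[\hat{S}(t)]}$$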
Remark
Remark 1
Property: when n is large, the K-M estimate of S(t) is approximately normally distributed
with mean S(t) and variance σ(t)², where σ(t)² can be estimated by Greenwood’s formula.
Remark 2
The accuracy of the K-M estimate and Greenwood’s formula relies on a large sample of
uncensored data. Make sure that you have at least, say, 20 or 30 uncensored failure times
in your data set before using these methods.
Remark 3
Greenwood’s formula is more appropriate when S(t) is not close to 0 or 1 (0 << S(t) << 1).
Using Greenwood’s formula, the confidence limits could fall above 1 or below 0. In this case,
we usually replace these limits by 1 or 0.
For example, if a 95% confidence interval is (0.845, 1.130), we use (0.845, 1) instead.
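As a sketch of how Greenwood’s variance and the truncated confidence limits could be computed, the code below reuses the 10-individual example (follow-up times 17, 4, 8+, 20, 24+, 13, 16+, 2, 9, 10+). It assumes the standard Greenwood sum d(i) / [N(i)(N(i) − d(i))] and a normal-approximation interval with z = 1.96 for 95% limits. The function name is hypothetical, and with only 6 uncensored times this falls short of the sample size suggested in Remark 2, so the numbers are illustrative only.

```python
import math


def km_greenwood(times, events, z=1.96):
    """Kaplan-Meier estimate with Greenwood standard errors.

    Returns rows (time, S_hat, standard error, lower limit, upper limit),
    with the limits truncated to the interval [0, 1].
    """
    rows = []
    surv, gw_sum = 1.0, 0.0
    for y in sorted({t for t, e in zip(times, events) if e == 1}):
        n = sum(1 for t in times if t >= y)
        d = sum(1 for t, e in zip(times, events) if t == y and e == 1)
        surv *= 1 - d / n
        if n > d:
            gw_sum += d / (n * (n - d))
            se = surv * math.sqrt(gw_sum)
        else:
            # Everyone at risk fails: the estimate is 0 and the Greenwood term
            # is undefined; this sketch reports a standard error of 0.
            se = 0.0
        lower = max(0.0, surv - z * se)
        upper = min(1.0, surv + z * se)
        rows.append((y, surv, se, lower, upper))
    return rows


# 10 individuals followed up to 24 months: 17 4 8+ 20 24+ 13 16+ 2 9 10+
times = [17, 4, 8, 20, 24, 13, 16, 2, 9, 10]
events = [1, 1, 0, 1, 0, 1, 0, 1, 1, 0]

gw_rows = km_greenwood(times, events)
for y, s, se, lo, hi in gw_rows:
    print(y, round(s, 3), round(se, 3), round(lo, 3), round(hi, 3))
```

At t = 2 the untruncated upper limit exceeds 1, so the reported interval is cut off at 1, as described in Remark 3.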
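Proportional Hazard Model (PHM)
Proportional Hazard Model (Cox, 1972)
Assume that at any given time t, the hazard in those exposed to a
certain risk factor [ h1(t) ] is a multiple of some underlying hazard [ h0(t) ].
Let’s call that multiplying factor $e^{\beta}$ (so that it always has a positive value,
regardless of β’s value), so that
$$h(t) = h_0(t)\,e^{\beta X}$$
When X=1: $h_1(t) = h_0(t)\,e^{\beta}$
When X=0: $h(t) = h_0(t)$
Thus
$$\frac{h_1(t)}{h_0(t)} = e^{\beta}$$
Proportionality assumption: the hazard ratio does not depend on t.
OR
$$\log h_1(t) = \log h_0(t) + \beta$$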
Notes
The baseline hazard can be constant (as in the exponential model) or
changing (as in other parametric models).
In fact, the baseline hazard could have any shape. It is not necessary to
figure out the shape of the baseline hazard,
as long as the hazard in the exposed is always a multiple ($e^{\beta}$) of that of the
unexposed (proportional).
Because β can be estimated without estimating h0(t), this is
considered a semiparametric model.
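Problem
Compare the hazards of two groups that are identical with respect
to all characteristics except that X1=1 in one group (exposed) and
X1=0 in the other (unexposed).
Relative hazard
$$\frac{h_0(t)\,e^{\beta_1 \cdot 1 + \beta_2 X_2 + \dots + \beta_k X_k}}{h_0(t)\,e^{\beta_1 \cdot 0 + \beta_2 X_2 + \dots + \beta_k X_k}} = e^{\beta_1}$$
The Cox model can be expressed as
a function of survival in exposed and
unexposed:
$$S_1(t) = [S_0(t)]^{e^{\beta}}$$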
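Under proportional hazards, the survival function in the exposed equals the unexposed survival function raised to the power e^β, whatever the baseline shape. The sketch below checks this numerically; the baseline hazard `h0` and the coefficient `BETA` are hypothetical choices made for illustration and do not come from the slides.

```python
import math

BETA = 0.7  # hypothetical log hazard ratio


def h0(t):
    """Hypothetical baseline hazard with an arbitrary (here linear) shape."""
    return 0.05 + 0.02 * t


def h1(t):
    """Hazard in the exposed: baseline multiplied by exp(BETA)."""
    return h0(t) * math.exp(BETA)


def survival(hazard, t, steps=10000):
    """S(t) = exp(-integral of the hazard from 0 to t), by the midpoint rule."""
    width = t / steps
    cumulative = sum(hazard((k + 0.5) * width) for k in range(steps)) * width
    return math.exp(-cumulative)


for t in [1.0, 5.0, 10.0]:
    ratio = h1(t) / h0(t)                      # constant hazard ratio
    s_exposed = survival(h1, t)                # from the exposed hazard
    s_power = survival(h0, t) ** math.exp(BETA)  # S0(t) raised to exp(BETA)
    print(t, round(ratio, 4), round(s_exposed, 4), round(s_power, 4))
```

The last two columns agree at every t, and the hazard ratio stays at e^β regardless of t.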
Center           XJ   XM   XW
Forsyth Co.       0    0    0
Jackson           1    0    0
Minneapolis       0    1    0
Washington Co.    0    0    1
That is: xJ=1 for Jackson, 0 otherwise; xM=1 for Minneapolis, 0 otherwise;
xW=1 for Washington Co., 0 otherwise.
In the model:
$$h(t) = h_0(t)\,e^{\beta_J x_J + \beta_M x_M + \beta_W x_W}$$
Interpretation of the regression coefficients
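With this coding, Forsyth Co. is the reference category and each coefficient compares one center with it. A small sketch of the coding and of the implied relative hazards follows; the coefficient values are hypothetical placeholders, not estimates from any study.

```python
import math

# Hypothetical coefficients for the three indicator variables (illustration only)
BETAS = {"J": 0.30, "M": -0.10, "W": 0.05}


def dummy_code(center):
    """Return (xJ, xM, xW); Forsyth Co. is the reference with all zeros."""
    return (
        1 if center == "Jackson" else 0,
        1 if center == "Minneapolis" else 0,
        1 if center == "Washington Co." else 0,
    )


def relative_hazard(center):
    """Hazard relative to Forsyth Co.: exp(bJ*xJ + bM*xM + bW*xW)."""
    x_j, x_m, x_w = dummy_code(center)
    return math.exp(BETAS["J"] * x_j + BETAS["M"] * x_m + BETAS["W"] * x_w)


for center in ["Forsyth Co.", "Jackson", "Minneapolis", "Washington Co."]:
    print(center, dummy_code(center), round(relative_hazard(center), 3))
```

Each exponentiated coefficient is the hazard ratio for that center versus Forsyth Co., and the reference center has a relative hazard of 1.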
Checking Proportionality
Graphical analysis:
Survival curves: not very sensitive, except to detect a total lack of
proportionality (cross-over)