Survival Analysis-Debby Raden


Survival Analysis

time to event analysis

Debby Syahru Romadlon & Raden Ahmad Dedy M


Outline
• Introduction
• Survival Function and Hazard Function
• Methods in Survival Analysis
• Non-Parametric Procedures
• Life-Table Analysis
• Kaplan-Meier Estimate
• Proportional Hazard Model (PHM)
• Check for Proportional Hazard
• Two Sample Testing
Objectives

• Introduce the concepts,
• Analytical methods,
• Applications in survival analysis
Introduction
Survival analysis is the analysis of data that correspond to the time from a well-defined time origin until the occurrence of some particular event or end-point.

• A failure time (survival time, lifetime), T, is a non-negative random variable.
• For most applications, the value of T is the time from a certain event to a failure event.
Things that need to be precisely defined

1. “TIME” (with origin)

Time since recruitment into the study
Time since randomization (in a clinical trial)
Time since employment
Time since diagnosis (prognosis studies)
Time since infection (e.g. HIV)
Time since menarche
Calendar time
Age

2. “EVENT” (with date or precise time in the appropriate scale)

Death
Disease (diagnosis, start of symptoms, relapse)
Remission of diarrhea
Quitting smoking
Menopause
Examples of “time until any well-defined event”:

a. Time from start of treatment to a failure time
b. Time from onset of infection to onset of disease
c. Time until death
d. Time until relapse or progression of cancer
e. Time until engraftment after bone marrow transplant
f. Time until relapse after quitting smoking
g. Time until the commission of a crime after a criminal is released from jail
h. Time from birth to death = age at death
i. Time from birth to onset of a disease = onset age
Special features of survival data

(1) Survival data are generally not symmetrically distributed (they tend to be positively skewed)
(2) Survival times are frequently censored (i.e., the end-point of interest has not been observed for that individual)

Important assumption for censored data:
The actual survival time of an individual, t, is independent of any mechanism which causes that individual’s survival time to be censored at time c, where c < t (i.e., uninformative censoring)
Key Assumption—Uninformative Censoring

Censored observations have, on average, the same experience after being censored as those remaining under observation (“uninformative censoring”: censoring is not related to the risk of the event).

Figure: true S(t) if losses “die” immediately after being lost (e.g., because the more severe cases go to another place for treatment). In that case the true S(t) lies below the estimated curve.
Key Assumption—Uninformative Censoring

Good alternatives

There is no perfect way to assess the validity of this assumption; however, good alternatives are available:
• Use common sense and judgement
• Examine baseline characteristics of losses and retained observations

Figure: true S(t) if losses have a better prognosis (e.g., they feel so good that they do not care about being in the study anymore). In that case the true S(t) lies above the estimated curve.
Censoring—Random Censoring

This type of censoring will be the main censoring mechanism that we deal with. It occurs when the censoring time varies from individual to individual and is unknown in advance.

For example:
• In a follow-up study, censoring occurs due to the end of the study, loss to follow-up, or early withdrawals
• Reasons for censoring:
o Patients decide to move to another hospital
o Patients quit treatment because of side-effects of a drug
o Failures occur after the end of study, etc.
Censoring—Random Censoring

Under random censoring, what data are actually observed?

• Ideally, we would like to observe the “complete data” t1, t2, t3, …, tn
• Due to censoring, we only observe “right-censored data”:
    yi = ti if ti ≤ ci
    yi = ci if ti > ci
  that is, yi = min(ti, ci)
• The censoring indicators:
    δi = 1 if the observation is uncensored, ti ≤ ci
    δi = 0 if the observation is censored, ti > ci
• Data: (y1, δ1), (y2, δ2), …, (yn, δn) and possibly some covariates
Censoring—Random Censoring

Example: A set of observed survival data is

yi   25   18   17   22   27
δi    1    0    1    0    1

The data can also be presented as

25   18+   17   22+   27
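The construction of (yi, δi) can be sketched in a few lines of Python. The latent failure times and censoring times below are hypothetical values chosen only so that the result matches the example data; in practice ti is never seen for a censored subject.

```python
def observe(true_times, censor_times):
    """Return the observed pairs (y_i, delta_i) under right censoring."""
    data = []
    for t, c in zip(true_times, censor_times):
        y = min(t, c)                 # we see whichever comes first
        delta = 1 if t <= c else 0    # 1 = failure observed, 0 = censored
        data.append((y, delta))
    return data


def plus_notation(data):
    """Format the data with a trailing '+' on censored times."""
    return " ".join(str(y) if d == 1 else f"{y}+" for y, d in data)


# Hypothetical latent failure and censoring times (illustration only).
true_times = [25, 30, 17, 40, 27]
censor_times = [28, 18, 20, 22, 35]

data = observe(true_times, censor_times)
print(data)
print(plus_notation(data))
```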
Study Time and Patient Time

Study Time
The calendar time period in which an individual is in the study.

Patient Time
The period of time that a patient spends in the study, measured from that patient’s time origin.
Survival Function and Hazard Function
Cumulative Distribution Function

Definition: Cumulative distribution function F(t)

F(t) = Pr (T ≤ t)
Survival Function S(t)

Definition: Survival function S(t)

S(t) = Pr (T > t) = 1 − Pr (T ≤ t) = 1 − F(t)

Characteristics of S(t):
• (a) S(t) = 1 if t < 0
• (b) S(∞) = lim t→∞ S(t) = 0
• (c) S(t) is non-increasing in t

• In general, the survival function S(t) provides useful summary statistics, such as the median survival time, t-year survival rate, etc.
• Although the “outcome” is the proportion surviving up to a given time, the survival curve also allows estimating survival percentiles, e.g., the median survival time.
Density Function f(t)

Definition: Density function f(t)

(a) If T is a discrete random variable:
f(t) = Pr (T = t)
(b) If T is (absolutely) continuous:
f(t) = dF(t)/dt

Note that, since S(t) = 1 − F(t):
f(t) = −dS(t)/dt
Hazard Function λ(t)

Definition: Hazard function λ(t)

(a) If T is discrete:
λ(t) = Pr (T = t | T ≥ t)
(b) If T is (absolutely) continuous:
λ(t) = lim Δt→0 Pr (t ≤ T < t+Δt | T ≥ t) / Δt = f(t) / S(t)

Here:
λ(t)Δt ≈ the proportion of individuals experiencing failure in (t, t+Δt) among those surviving up to t
Hazard Function λ(t)

Example:
(a) Constant hazard λ(t) = λ0
(b) Increasing hazard λ(t2) ≥ λ(t1), if t2 ≥ t1
(c) Decreasing hazard λ(t2) ≤ λ(t1), if t2 ≥ t1
(d) U-shape hazard (human mortality for age at death)

Remark: Modeling the hazard function is one approach to parametric modeling.
Cumulative Hazard Function (chf) Λ(t)
Definition: Cumulative hazard function (chf) Λ(t).
(a) If T is discrete, let the xi’s be the mass points:
Λ(t) = Σ_{xi ≤ t} λ(xi)
(b) If T is (absolutely) continuous:
Λ(t) = ∫_0^t λ(u)du
Relationship among Functions

(a) If T is discrete, with mass points xi:
S(t) = Π_{xi ≤ t} [1 − λ(xi)]

(b) If T is (absolutely) continuous:
S(t) = Pr (T > t) = Pr (T ≥ t), and f(t) = λ(t) S(t)
Relationship among Functions

A well-known relationship among the density, hazard, and survival functions is:

λ(t) = f(t) / S(t)

Thus S(t) = e^(−Λ(t)) = e^(−∫_0^t λ(u)du)

Or
Λ(t) = −ln S(t)

When T is a continuous variable, we also have

∫_0^∞ λ(u)du = ∞
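The relationship S(t) = exp(−Λ(t)) can be checked numerically. The sketch below assumes a Weibull hazard with hypothetical shape and scale values (not taken from the slides), integrates it with a simple midpoint rule, and compares the result with the closed-form Weibull survival function.

```python
import math

SHAPE, SCALE = 1.5, 10.0   # hypothetical Weibull parameters


def hazard(t):
    """Weibull hazard: (shape/scale) * (t/scale)^(shape-1)."""
    return (SHAPE / SCALE) * (t / SCALE) ** (SHAPE - 1)


def cumulative_hazard(t, steps=10000):
    """Midpoint-rule approximation of the integral of hazard(u) over [0, t]."""
    h = t / steps
    return sum(hazard((k + 0.5) * h) for k in range(steps)) * h


def survival_closed_form(t):
    """Weibull survival function S(t) = exp(-(t/scale)^shape)."""
    return math.exp(-((t / SCALE) ** SHAPE))


t = 8.0
Lam = cumulative_hazard(t)
S_from_hazard = math.exp(-Lam)          # S(t) = exp(-Lambda(t))
S_exact = survival_closed_form(t)
print(Lam, S_from_hazard, S_exact)
```

The two survival values should agree up to the integration error, and −ln S(t) recovers Λ(t).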
METHODS IN SURVIVAL ANALYSIS
Methods in Survival Analysis

Variable of interest: TIME to occurrence of an EVENT

Usual primary objective(s):

(1) To estimate the SURVIVAL FUNCTION (cumulative survival)
Methods: LIFE TABLE
         KAPLAN-MEIER

(2) To compare survival curves in different groups
Methods: LOGRANK (non-parametric)
         MODELS: PHM (COX)
Non-Parametric Procedures
Estimating Survival Function (for no censored observations)

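(1) Suppose we have a single sample of survival times, where none of the observations are censored.
(2) The survival function S(t) is defined as the probability that an individual survives for a time greater than t.
(3) S(t) is estimated by the empirical survival function:

Ŝ(t) = (number of individuals with survival times > t) / (number of individuals in the data set)

Equivalently,

Ŝ(t) = 1 − F̂(t)

where the empirical distribution function F̂(t) is

F̂(t) = (number of individuals with survival times ≤ t) / (number of individuals in the data set)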
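A minimal sketch of the empirical survival function for uncensored data follows. It uses the convention S(t) = Pr(T > t) from the definition of the survival function; the sample times are hypothetical.

```python
def empirical_cdf(times, t):
    """Proportion of survival times less than or equal to t."""
    return sum(1 for x in times if x <= t) / len(times)


def empirical_survival(times, t):
    """Proportion of survival times strictly greater than t."""
    return sum(1 for x in times if x > t) / len(times)


# Hypothetical uncensored survival times (months).
times = [2, 4, 9, 13, 17, 20]

for t in (0, 9, 20):
    print(t, empirical_survival(times, t), 1 - empirical_cdf(times, t))
```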
Life-Table Analysis
Life-table estimates of the survivor function

The life-table estimate of the survivor function:
• It is obtained by first dividing the period of observation into a series of time intervals.
• These intervals need not be of equal length, although they usually are.
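The slides do not reproduce the life-table formula, so the sketch below assumes the common actuarial convention: subjects censored in an interval are counted as at risk for half of it, giving an adjusted number at risk n'_j = n_j − c_j/2 and a conditional survival (n'_j − d_j)/n'_j. The interval counts are hypothetical, and the function reports survival to the end of each interval.

```python
def life_table_survival(n_start, deaths, censored):
    """Actuarial life-table estimate, one row per interval.

    Each row is (n at start, deaths, censored, adjusted n,
    conditional survival, cumulative survival to end of interval).
    Assumes the adjusted number at risk is positive in every interval.
    """
    rows, n, surv = [], n_start, 1.0
    for d, c in zip(deaths, censored):
        n_adj = n - c / 2.0          # censored subjects count for half the interval
        p = (n_adj - d) / n_adj      # conditional probability of surviving the interval
        surv *= p
        rows.append((n, d, c, n_adj, p, surv))
        n -= d + c                   # subjects entering the next interval
    return rows


# Hypothetical counts for three intervals.
rows = life_table_survival(20, deaths=[2, 3, 1], censored=[2, 0, 4])
for row in rows:
    print(row)
```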
Example: Life-table estimates of the Survivor Function
[Example tables not recovered from the slides]
Kaplan-Meier Estimate
The Kaplan-Meier Estimate

• The Kaplan-Meier estimator (1958, JASA) is a nonparametric estimator for the survival function S.
• Consider now either random censoring or type-I censoring.

Objective: Estimate the survival function

Cumulative survival at time t, S(t)
The complement of the cumulative incidence

S(t) = 1 − [Cumulative incidence]

The Kaplan-Meier Estimate
• Assume uninformative censoring. That is, assume that Ti is independent of Ci for each i. The data are
(y1, δ1), (y2, δ2), …, (yn, δn)

• Let y(1) < y(2) < … < y(k), k ≤ n, be the distinct, uncensored, and ordered failure times.

• In the presence of censoring, S(t) is estimated as a product of conditional probabilities: the Kaplan-Meier estimate (Kaplan & Meier, 1958).
The Kaplan-Meier Estimate
• For example
• Data: 3, 2+, 0, 1, 5+, 3, 5

• (y(1), y(2), y(3), y(4)) = (0, 1, 3, 5)

• Suppose y(i−1) < t < y(i). A principle of nonparametric estimation of S is to assign positive probability to, and only to, uncensored failure times. Therefore, we try to estimate

S(t) = Pr (T > y(1)) × Pr (T > y(2) | T > y(1)) × … × Pr (T > y(i−1) | T > y(i−2))
The Kaplan-Meier Estimate

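• How to estimate S(t)? Define
N(j) = number of individuals at risk at y(j), i.e., the number of yi ≥ y(j)
d(j) = number of failures observed at y(j)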
The Kaplan-Meier Estimate
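For the example data:
N(1)=7, N(2)=6, N(3)=4, N(4)=2
d(1)=1, d(2)=1, d(3)=2, d(4)=1
Now estimate each factor, the conditional probability of surviving beyond y(j), by

1 − d(j)/N(j)

The Kaplan-Meier estimate is thus

Ŝ(t) = Π_{j: y(j) ≤ t} [1 − d(j)/N(j)]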
The Kaplan-Meier Estimate
Example: 3, 2+, 0, 1, 5+, 3, 5

Uncensored times   0   1   3   5
di                 1   1   2   1
Ni                 7   6   4   2
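The counts in this example can be reproduced with a short product-limit sketch. The function name and data layout are illustrative; at each distinct uncensored time it counts the number at risk N (all observations with y ≥ that time) and the number of failures d, then multiplies the running estimate by 1 − d/N.

```python
def kaplan_meier(times, events):
    """Return rows (y_j, N_j, d_j, S_hat) at each distinct uncensored time."""
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    surv, table = 1.0, []
    for y in event_times:
        n_at_risk = sum(1 for t in times if t >= y)
        d = sum(1 for t, e in zip(times, events) if t == y and e == 1)
        surv *= 1 - d / n_at_risk      # conditional survival at y
        table.append((y, n_at_risk, d, surv))
    return table


# Data from the example: 3, 2+, 0, 1, 5+, 3, 5  (event = 0 marks a '+').
times = [3, 2, 0, 1, 5, 3, 5]
events = [1, 0, 1, 1, 0, 1, 1]

table = kaplan_meier(times, events)
for row in table:
    print(row)
```

For these data the estimate steps through 6/7, 5/7, 5/14 and 5/28 at times 0, 1, 3 and 5.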
Steps in Kaplan-Meier Estimation

1. Sort the survival times from smallest to greatest.
2. For each time of occurrence of an event, compute the conditional survival (1 − “hazard”).
3. For each time of occurrence of an event, calculate the survival function (multiplying conditional probabilities of survival).
The Kaplan-Meier Estimate
Example:
10 individuals followed up to 24 months

6 died, 4 censored before end of follow up

Follow-up times:
17 4 8+ 20 24+ 13 16+ 2 9 10+
The Kaplan-Meier Estimate
ti    conditional survival at ti    survival function S(t)
2     9/10 = 0.900                  0.900
4     8/9 = 0.889                   0.900 x 0.889 = 0.800
9     6/7 = 0.857                   0.800 x 0.857 = 0.686
13    4/5 = 0.800                   0.686 x 0.800 = 0.549
17    2/3 = 0.667                   0.549 x 0.667 = 0.366
20    1/2 = 0.500                   0.366 x 0.500 = 0.183

Ranking: 2 4 8+ 9 10+ 13 16+ 17 20 24+
The Kaplan-Meier Estimate
If the largest observed time is uncensored, the Kaplan-Meier estimate reaches the value 0 for t ≥ the largest observed time.

If the largest observed time is censored, the Kaplan-Meier estimate does not go down to 0 and is unreliable for t > the largest yi.

In this case, we say that S(t) is undetermined for t > the largest observed time.
Notes
• The calculations are made at the times when events occur. Censoring times are skipped. Censored observations only contribute information up to the time when they are withdrawn.
• Kaplan & Meier call this method the “product-limit estimate”, for it is the limit of the life table with the shortest possible intervals (as short as needed to include only one event).
• The method is theoretically designed for exact event times. With approximate (rounded) times (e.g., years), ties can occur:
  • More than one event at one given time: no problem.
  • Event(s) and censored observations at the same time: by convention, place the censored observations after the events at each failure time with ties (suggested by Kaplan & Meier, 1958).
• The K-M estimate is a nonparametric method which can be applied to either discrete or continuous data.
Greenwood’s Formula
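• For estimating the variance of the Kaplan-Meier estimate:

Var[Ŝ(t)] ≈ [Ŝ(t)]² × Σ_{j: y(j) ≤ t} d(j) / [N(j) (N(j) − d(j))]

Greenwood’s Formula

• Confidence interval
• Greenwood method (1926): Ŝ(t) ± z_{α/2} × SE[Ŝ(t)], where SE[Ŝ(t)] is the square root of the variance above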
Remark
Remark 1
• Property: when n is large, Ŝ(t) is approximately normally distributed with mean S(t) and variance σ(t)²,
• where σ(t)² can be estimated by Greenwood’s formula.
Remark 2
The accuracy of the K-M estimate and Greenwood’s formula relies on a large sample of uncensored data. Make sure that you have at least, say, 20 or 30 uncensored failure times in your data set before using these methods.
Remark 3
Greenwood’s formula is more appropriate when S(t) is well inside (0, 1), i.e., 0 << S(t) << 1. Using Greenwood’s formula, the confidence limits can fall above 1 or below 0. In that case, we usually replace these limits by 1 or 0.
For example, if a 95% confidence interval is (0.845, 1.130), we use (0.845, 1) instead.
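As a sketch of Greenwood's formula and of the truncation in Remark 3, the code below assumes the standard form Var[Ŝ(t)] ≈ Ŝ(t)² Σ d/(N(N − d)) over event times up to t, builds a plain Ŝ(t) ± 1.96 SE interval, and clips it to [0, 1]. It is applied to the 10-subject example (17 4 8+ 20 24+ 13 16+ 2 9 10+); the function name is illustrative.

```python
import math


def km_with_greenwood(times, events, z=1.96):
    """Rows (time, S_hat, SE, lower, upper) with limits clipped to [0, 1]."""
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    surv, gsum, rows = 1.0, 0.0, []
    for y in event_times:
        n = sum(1 for t in times if t >= y)
        d = sum(1 for t, e in zip(times, events) if t == y and e == 1)
        surv *= 1 - d / n
        if n > d:                       # term is undefined when everyone at risk fails
            gsum += d / (n * (n - d))
        se = surv * math.sqrt(gsum)     # Greenwood standard error
        lower = max(0.0, surv - z * se)
        upper = min(1.0, surv + z * se)
        rows.append((y, surv, se, lower, upper))
    return rows


times = [17, 4, 8, 20, 24, 13, 16, 2, 9, 10]
events = [1, 1, 0, 1, 0, 1, 0, 1, 1, 0]

rows = km_with_greenwood(times, events)
for y, s, se, lo, hi in rows:
    print(f"t={y:2d}  S={s:.3f}  SE={se:.3f}  95% CI=({lo:.3f}, {hi:.3f})")
```

With only six events, the sample is far smaller than Remark 2 recommends, so the intervals are wide; at t = 2 the unclipped upper limit exceeds 1 and is replaced by 1.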
Proportional Hazard Model (PHM)
Proportional Hazard Model (COX, 1972)
• Assume that at any given time (t), the hazard in those exposed to a certain risk factor [h1(t)] is a multiple of some underlying hazard [h0(t)].
• Let’s call that multiplying factor “e^β” (so that it always has a positive value, regardless of β’s value), so that

h(t) = h0(t) × e^(βX)

When X=1: h(t) = h1(t) = h0(t) × e^β
When X=0: h(t) = h0(t)

β = Cox regression coefficient, determined by partial likelihood estimation
Proportional Hazard Model (COX, 1972)

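• According to this model, β is the logarithm of the relative hazard (RH):

β = log [h1(t) / h0(t)] = log (RH)

• Thus RH = e^β
Proportionality assumption

• Assumes that changes in levels of the independent variables will produce proportionate changes in the hazard function, independent of time:

h1(t) / h0(t) = e^β for all t

OR, equivalently,

log h1(t) = log h0(t) + β for all t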
Notes
• The baseline hazard can be constant (as in the exponential model) or changing (as in other parametric models).
• In fact, the baseline hazard could have any shape. It is not necessary to figure out the shape of the baseline hazard, as long as the hazard in the exposed is always a multiple (e^β) of that of the unexposed (proportional).

Notes

• Cox’s brilliant idea was to find a procedure to estimate β in the presence of censored data without needing to specify or estimate h0(t): partial (or “conditional”) likelihood.

• The fact that there is no need to estimate h0(t) in order to estimate β is why this is considered a semiparametric model.

• The assumption of “proportionality” is implicit in the fact that there is only one RH (one β) for the whole follow-up.
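To illustrate how β can be estimated without h0(t), here is a minimal sketch of partial likelihood maximization for a single binary covariate, using Newton's method. It assumes no tied event times, and it borrows the exposed/unexposed follow-up times from the two-sample example near the end of this deck; the function names are illustrative, and real analyses would rely on an established survival library.

```python
import math


def score_and_information(beta, times, events, x):
    """Score and observed information of the log partial likelihood.

    One covariate, no tied event times assumed. For each event, the risk
    set is every subject whose observed time is >= the event time.
    """
    score, info = 0.0, 0.0
    for ti, ei, xi in zip(times, events, x):
        if ei == 0:
            continue                      # censored subjects only enter risk sets
        risk_x = [xj for tj, xj in zip(times, x) if tj >= ti]
        w = [math.exp(beta * xj) for xj in risk_x]
        s0 = sum(w)
        s1 = sum(wj * xj for wj, xj in zip(w, risk_x))
        s2 = sum(wj * xj * xj for wj, xj in zip(w, risk_x))
        mean = s1 / s0                    # weighted mean of x in the risk set
        score += xi - mean
        info += s2 / s0 - mean ** 2       # weighted variance of x in the risk set
    return score, info


def fit_cox(times, events, x, iterations=25):
    """Newton-Raphson iterations starting from beta = 0."""
    beta = 0.0
    for _ in range(iterations):
        score, info = score_and_information(beta, times, events, x)
        beta += score / info
    return beta


# Exposed (x = 1): 2 4 8+ 9 10+ 13 16+ 17 20 24+
# Unexposed (x = 0): 3+ 5 7+ 11+ 14 18 19+ 24+ 24+ 24+
times = [2, 4, 8, 9, 10, 13, 16, 17, 20, 24, 3, 5, 7, 11, 14, 18, 19, 24, 24, 24]
events = [1, 1, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0]
x = [1] * 10 + [0] * 10

beta_hat = fit_cox(times, events, x)
print("beta:", beta_hat, "relative hazard:", math.exp(beta_hat))
```

Note that h0(t) never appears: each event contributes only the comparison between the subject who failed and the others still at risk at that time.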
Notes

• The assumption of proportionality is analogous to the assumption of lack of multiplicative interaction (uniform RH across multiple strata, strata of time in this case) needed to calculate Mantel-Haenszel or logrank tests.

• If the hazards are not proportional, particularly when the curves cross (“qualitative interaction”), the model, the β, and the RH are irrelevant. Beware of using this model as a black box!
Extend to the multivariate situation
• As with any regression method, the Cox model can be extended to the multivariate situation:

h(t) = h0(t) × exp(β1x1 + β2x2 + … + βkxk)

• Problem
• Compare the hazard of two groups that are identical with respect to all characteristics except that X1=1 in one group (exposed) and X1=0 in another (unexposed)
Relative hazard

RH = [h0(t) × exp(β1·1 + β2x2 + … + βkxk)] / [h0(t) × exp(β1·0 + β2x2 + … + βkxk)] = e^β1
• The Cox model can be expressed as a function of survival in exposed and unexposed:

S1(t) = [S0(t)]^exp(βx)

• Although usually this is not the primary objective, S0(t) (and h0(t)) can be estimated once the β’s have been obtained (Kalbfleisch & Prentice, 1980).
• Thus, adjusted survival curves can be obtained from the Cox model. Other non-parametric alternatives to obtain adjusted survival estimates have been described (Nieto & Coresh, 1996).
Interpretation of the regression coefficients
• For a binary variable (X1={1, 0}):
β1 = log (RH) corresponding to exposed (x1=1) compared to unexposed (x1=0), adjusted for the other x’s.
• For a continuous variable:
β1 = adjusted log (RH) corresponding to an increment of one unit in x1:

RH = e^β1

• To calculate the RH corresponding to an increment of 10 units in x1, for example:

RH = e^(10 × β1)
Interpretation of the regression coefficients
• For a categorical ordinal variable
• The same as for a continuous variable
• Where only one term x1 is included, only one estimate of the RH is obtained
• Assumption
• LINEARITY:
The RH comparing x1=5 with x1=4 is identical to the RH comparing x1=2 with x1=1
• If this assumption is not true, 2 possible strategies:
(1) Use dummy variables
(2) Use quadratic terms (x1²), polynomial regression, etc.
Interpretation of the regression coefficients
• For a non-ordinal categorical variable:
• “Factor” the variable, i.e., define indicator (dummy) variables:
• Example: “Center”, in a study with participants from Jackson, Forsyth Co., Minneapolis, and Washington Co.
(1) Select one of the categories as reference: e.g., Forsyth Co.
(2) Create one indicator variable for each of the remaining categories:

                  XJ   XM   XW
Forsyth Co.        0    0    0
Jackson            1    0    0
Minneapolis        0    1    0
Washington Co.     0    0    1

That is: xJ=1 for Jackson, 0 otherwise; xM=1 for Minneapolis, 0 otherwise; xW=1 for Washington Co., 0 otherwise.

In the model:
h(t) = h0(t) × exp(β1x1 + … + βJxJ + βMxM + βWxW)
Interpretation of the regression coefficients

βJ = Adjusted log RH comparing participants from Jackson with participants from Forsyth

βM = Adjusted log RH comparing participants from Minneapolis with participants from Forsyth

βW = Adjusted log RH comparing participants from Washington Co. with participants from Forsyth
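The indicator coding for “Center” can be sketched as follows. The coefficient values are hypothetical placeholders used only to show how the relative hazard versus the reference category (Forsyth Co.) is read off the dummy variables.

```python
import math

REFERENCE = "Forsyth Co."
LEVELS = ("Jackson", "Minneapolis", "Washington Co.")   # order: xJ, xM, xW


def encode_center(center):
    """Return (xJ, xM, xW); the reference category maps to (0, 0, 0)."""
    if center != REFERENCE and center not in LEVELS:
        raise ValueError(f"unknown center: {center}")
    return tuple(int(center == level) for level in LEVELS)


def relative_hazard(center, betas):
    """RH of a center versus the reference, other covariates held equal."""
    return math.exp(sum(b * xi for b, xi in zip(betas, encode_center(center))))


# Hypothetical coefficients (beta_J, beta_M, beta_W), for illustration only.
betas = (0.30, -0.10, 0.05)

for center in (REFERENCE,) + LEVELS:
    print(center, encode_center(center), round(relative_hazard(center, betas), 3))
```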
Estimation of variance and SE

• In addition to the regression coefficient (β), it is possible to estimate the variance and standard error [SE(β)].

• HYPOTHESIS TEST (H0: β = 0)

Wald test:

z = β / SE(β)

(or its square, with a χ² distribution (1 d.f.))

Estimation of variance and SE

• CONFIDENCE LIMITS (95%):

β ± 1.96 × SE(β)

and the corresponding CL for the relative hazard:

exp[β ± 1.96 × SE(β)]

• Note: For a RH for an increment of more than one unit, it is necessary to multiply the SE as well to obtain the CL. For example, for an increment of 10 units:

exp[10 × β ± 1.96 × 10 × SE(β)]
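The Wald test and confidence limits can be sketched as below, assuming the usual normal-approximation forms z = β/SE(β) and exp[kβ ± 1.96 × k × SE(β)] for an increment of k units. The coefficient and standard error are hypothetical values, not results from any study.

```python
import math


def wald_summary(beta, se, units=1.0, z_crit=1.96):
    """Wald statistic plus RH and 95% CL for an increment of `units`."""
    b, s = units * beta, units * se      # both beta and SE scale with the increment
    z = beta / se                        # the test statistic does not depend on units
    return {
        "z": z,
        "chi2": z ** 2,                  # chi-square with 1 d.f.
        "rh": math.exp(b),
        "ci": (math.exp(b - z_crit * s), math.exp(b + z_crit * s)),
    }


# Hypothetical estimate: beta = 0.05 per unit of x1, SE = 0.02.
one_unit = wald_summary(0.05, 0.02)
ten_units = wald_summary(0.05, 0.02, units=10)
print(one_unit)
print(ten_units)
```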
How to deal with interaction
• Note that the coefficients for x1 are calculated assuming that x2, for example, is the same in numerator and denominator, regardless of its value.
• Assumption: there is no interaction between x1 and x2. To assess whether this assumption is true and handle possible interaction, 2 possible strategies:
• (1) Stratified analysis:
E.g.: calculate β1 in those with x2=1 (β1,x2=1) and in those with x2=0 (β1,x2=0):
Check if β1,x2=1 ≈ β1,x2=0
(Advantage: Simple, easy to explain)

• (2) Add an interaction term to the regression:

h1(t) = h0(t) × exp{β1x1 + β2x2 + β3x3 + … + βkxk + β12(x1x2)}
Checking Proportionality

• Add interaction term exposure*time: create a time-dependent interaction term, redefined at each event time:

h(t) = h0(t) × exp{β1x + β2(x × t)}

• Can use time as a continuous variable, or categorized in the relevant intervals.

• The Wald statistic of this term assesses whether there is a “statistically significant departure from proportionality”
Checking Proportionality

• Graphical Analysis:
• Survival curves: not very sensitive, except to assess total lack of proportionality (cross-over)
Checking Proportionality

• Graphical Analysis (cont’d):

• Log of cumulative hazard function [log(−log(S))]: if the hazards are proportional, the curves have to be parallel:
S1(t) = [S0(t)]^exp(βx)

• Taking logarithms and multiplying by −1 on both sides:
−log S1(t) = exp(βx) × [−log S0(t)]

• Taking logarithms again:
log[−log S1(t)] = βx + log[−log S0(t)]
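The parallel-curves result can be checked numerically. The sketch below assumes a hypothetical baseline survival function and a hypothetical β with x = 1, builds S1(t) = [S0(t)]^exp(β), and confirms that the vertical gap between the two log(−log S) curves equals β at every time point.

```python
import math

BETA = 0.7   # hypothetical log relative hazard (x = 1 in the exposed)


def s0(t):
    """Hypothetical baseline survival function (illustration only)."""
    return math.exp(-0.05 * t ** 1.3)


def s1(t):
    """Survival in the exposed under proportional hazards."""
    return s0(t) ** math.exp(BETA)


def log_minus_log(s):
    return math.log(-math.log(s))


time_points = (1, 2, 5, 10, 20)
gaps = [log_minus_log(s1(t)) - log_minus_log(s0(t)) for t in time_points]
print(gaps)   # each gap should equal BETA up to rounding error
```

In real data the curves are estimated (for example by Kaplan-Meier), so approximately parallel curves, rather than an exactly constant gap, are what supports proportionality.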
Check for Proportional Hazard
Figure: Baseline hazard functions for operation status data
Figure: Log minus log plot for two levels of operation
Methods for the examination of assumptions of the Cox Model

Assumption: How to assess

Assumptions shared by all survival analysis:
- Non-informative censoring: Knowledge about subject/study populations
- No secular trends: Comparing baseline characteristics

Assumptions more unique to the Cox model:
- Proportionality: Interaction x*time; Graphical analysis; Stratification
- No interaction: Stratification; Add interaction terms

Assumptions shared by other multiple regression methods:
- Linearity: Categorizations of continuous variables; Quadratic terms; Residuals
- Extreme values: Measures of influence; Residuals/outliers
- Adequate variables selection: Knowledge about subject; Stepwise methods; Best-model; Residuals
Two Sample Testing
Examples from the Precursors Study

 Survival according to smoking:


Log-rank Test for Right Censored Data
• Ideas:
• Create a 2x2 table at each uncensored failure time
• The construction of each 2x2 table is based on the corresponding risk set.
• Combine information from tables
• The null hypothesis is
H0: λA(t) = λB(t) (or, SA(t) = SB(t)) for all t

Note: “for all t” might be replaced by “for observed t”.

When do we reject H0

The null hypothesis is H0: λA(t) = λB(t) for all t

Consider three different kinds of alternatives:

(A1) H1: λA ≠ λB no prior knowledge
(A2) H1: λA < λB treatment A is better
(A3) H1: λA > λB treatment B is better

Usually the significance level of a test is set at 0.05.

Example from the statistical output
• Objective: Compare the survival function of two (or more) groups
• Exposed/Unexposed (2 samples) in this case
• Use Log-rank test
• Example
• Exposed: 2 4 8+ 9 10+ 13 16+ 17 20 24+
• Unexposed: 3+ 5 7+ 11+ 14 18 19+ 24+ 24+ 24+
Example from the statistical output

Test         Chi-Square   DF   Pr > Chi-Square
Log-Rank     2.0640       1    0.1508
Wilcoxon     1.7468       1    0.1863
-2 Log(LR)   1.6773       1    0.1953
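The log-rank line of this output can be reproduced by hand. The sketch below follows the idea described earlier: at each uncensored failure time it forms the 2x2 table from the risk set, accumulates observed minus expected events in the exposed group together with the hypergeometric variance, and refers the statistic to a chi-square distribution with 1 d.f. The function name is illustrative.

```python
import math


def logrank(times_a, events_a, times_b, events_b):
    """Two-sample log-rank chi-square (1 d.f.) and its p-value."""
    pooled = [(t, e, "A") for t, e in zip(times_a, events_a)]
    pooled += [(t, e, "B") for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in pooled if e == 1})

    o_minus_e, variance = 0.0, 0.0
    for y in event_times:
        n = sum(1 for t, _, _ in pooled if t >= y)                    # total at risk
        n_a = sum(1 for t, _, g in pooled if t >= y and g == "A")     # at risk in A
        d = sum(1 for t, e, _ in pooled if t == y and e == 1)         # events at y
        d_a = sum(1 for t, e, g in pooled if t == y and e == 1 and g == "A")
        o_minus_e += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)

    chi2 = o_minus_e ** 2 / variance
    p_value = math.erfc(math.sqrt(chi2 / 2))   # upper tail of chi-square, 1 d.f.
    return chi2, p_value


# Exposed: 2 4 8+ 9 10+ 13 16+ 17 20 24+
exposed_t = [2, 4, 8, 9, 10, 13, 16, 17, 20, 24]
exposed_e = [1, 1, 0, 1, 0, 1, 0, 1, 1, 0]
# Unexposed: 3+ 5 7+ 11+ 14 18 19+ 24+ 24+ 24+
unexposed_t = [3, 5, 7, 11, 14, 18, 19, 24, 24, 24]
unexposed_e = [0, 1, 0, 0, 1, 1, 0, 0, 0, 0]

chi2, p_value = logrank(exposed_t, exposed_e, unexposed_t, unexposed_e)
print(f"Log-rank chi-square = {chi2:.4f}, p = {p_value:.4f}")
```

The result agrees with the Log-Rank row of the table (2.0640 with p = 0.1508), so H0 is not rejected at the 0.05 level.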
Compensating Bias

• KEY ASSUMPTION when comparing two curves:
• “Bias” due to censoring (if present) is the same across study groups (compensating bias).

• Generalizations of Log-rank tests for K groups are available.
Summary

• After this course, students should understand:
  • The features and assumptions of survival data
  • The definition of survival function and hazard function
  • The use and interpretation of analytic methods for survival data, including
    • Life table analysis
    • Kaplan-Meier analysis
    • Log-rank test
    • Cox proportional hazard model
References

• Collett, David. (2013). Modelling Survival Data in Medical Research. Second Edition. Chapman & Hall.
• Kleinbaum, David G., Klein, Mitchel. (2012). Survival Analysis: A Self-Learning Text. Third Edition. Springer-Verlag New York.
• Hosmer, David W., Lemeshow, Stanley, May, Susanne. (2008). Applied Survival Analysis: Regression Modeling of Time to Event Data (Wiley Series in Probability and Statistics). John Wiley & Sons, Inc.
• Selvin, Steve. (2008). Survival Analysis for Epidemiologic and Medical Research (Practical Guides to Biostatistics and Epidemiology). Cambridge University Press.
