Cox Proportional Hazards (PH) Model
Week 9
Overview (1/2)
What is a lifetime variable?
T: time (from a pre-determined origin) to an event of
interest (T>0).
Examples of events
death (biomedicine)
disease occurrence/recurrence (biomedicine)
failure of a machine component (industrial eng.)
marriage, divorce, birth of the first child …
(sociology)
employment, unemployment, strike,… (labor
economics)
Overview (2/2)
Applications:
biomedical research
industrial statistics – reliability
sociology
labor economics (James Heckman: Nobel prize
winner 2000)
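Hazard function:
$\lambda(t) = \lim_{\Delta \to 0} \frac{\Pr(T \in [t, t+\Delta) \mid T \ge t)}{\Delta} = \frac{f(t)}{S(t)}$ (conditional density)
$\lambda(t)\,dt = \Pr(T \in [t, t+dt) \mid T \ge t)$ (conditional probability)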
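As a rough numerical illustration of the hazard as a conditional density, the sketch below compares the limit form Pr(t <= T < t + delta | T >= t) / delta with the ratio f(t) / S(t). It assumes an exponential lifetime with rate 0.5; the distribution, the rate, and all function names are chosen here for illustration and are not from the slides.

```python
import math

RATE = 0.5  # assumed exponential rate, for illustration only


def survival(t):
    """S(t) = Pr(T > t) for an exponential lifetime."""
    return math.exp(-RATE * t)


def density(t):
    """f(t) for an exponential lifetime."""
    return RATE * math.exp(-RATE * t)


def hazard_from_limit(t, delta=1e-6):
    """Finite-difference version of Pr(t <= T < t + delta | T >= t) / delta."""
    return (survival(t) - survival(t + delta)) / (survival(t) * delta)


def hazard_from_density(t):
    """Hazard as the conditional density f(t) / S(t)."""
    return density(t) / survival(t)


# Both forms should agree (and, for the exponential, equal RATE at every t).
limit_value = hazard_from_limit(2.0)
ratio_value = hazard_from_density(2.0)
```

For the exponential case both values stay close to the assumed rate at any t, which is the constant-hazard property of that distribution.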
exp k 2 t S 2 t
k
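Mean
The mean (for a positive random variable), assuming $t S(t) \to 0$ as $t \to \infty$:
$E(T) = \int_0^\infty t f(t)\,dt = -\int_0^\infty t\,dS(t) = -\left[\, t S(t) \big|_{t=0}^{t=\infty} - \int_0^\infty S(t)\,dt \right] = \int_0^\infty S(t)\,dt$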
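Mean residual life at time t:
$E(T - t \mid T > t) = \frac{\int_t^\infty (u - t) f(u)\,du}{S(t)} = \frac{\int_t^\infty S(u)\,du}{S(t)}$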
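The identity E(T) = integral of S(t) over (0, infinity), and the related ratio (integral of S(u) from t to infinity) / S(t), which is the mean residual life, can be checked numerically. This is a minimal sketch that assumes an exponential lifetime with rate 0.5 and uses a plain trapezoidal rule on a truncated range; the rate, the truncation point, and the helper names are illustrative assumptions.

```python
import math

RATE = 0.5  # assumed exponential rate, so the true mean is 1 / RATE = 2


def survival(t):
    """S(t) for an exponential lifetime."""
    return math.exp(-RATE * t)


def integrate(func, lower, upper, steps=100_000):
    """Composite trapezoidal rule on [lower, upper]."""
    width = (upper - lower) / steps
    total = 0.5 * (func(lower) + func(upper))
    for i in range(1, steps):
        total += func(lower + i * width)
    return total * width


def mean_lifetime(upper=80.0):
    """Approximate E(T) as the integral of S(t) over [0, upper]."""
    return integrate(survival, 0.0, upper)


def mean_residual_life(t, upper=80.0):
    """Approximate the integral of S(u) over [t, t + upper], divided by S(t)."""
    return integrate(survival, t, t + upper) / survival(t)


mean_value = mean_lifetime()
residual_value = mean_residual_life(3.0)
```

Under the exponential assumption both quantities should come out close to 2, since that distribution is memoryless.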
Comparison (1/5)
Comparison between survival analysis and other
statistics courses:
1. Data type
Other statistics courses: the data $T_1, \ldots, T_n$ are a random
sample of T.
Parametric analysis: assume $T_i \overset{iid}{\sim} f_\theta(t)$
Likelihood: $L(\theta) = \prod_{i=1}^{n} f_\theta(t_i)$; maximize it (easy to
construct)
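To make the complete-data likelihood concrete, here is a small sketch for one parametric choice. It assumes an exponential density f(t) = theta * exp(-theta * t), for which the log-likelihood is n log(theta) - theta * sum(t_i) and the maximizer is n / sum(t_i). The sample values are hypothetical, not data from the slides.

```python
import math

# Hypothetical complete (uncensored) lifetimes, for illustration only.
lifetimes = [2.0, 3.5, 1.2, 4.8, 0.5]


def log_likelihood(theta, sample):
    """Log of L(theta) = prod f(t_i) for the exponential density."""
    return len(sample) * math.log(theta) - theta * sum(sample)


def exponential_mle(sample):
    """Closed-form maximizer of the exponential likelihood."""
    return len(sample) / sum(sample)


theta_hat = exponential_mle(lifetimes)

# The log-likelihood at theta_hat should be no smaller than at nearby values.
at_mle = log_likelihood(theta_hat, lifetimes)
below = log_likelihood(0.9 * theta_hat, lifetimes)
above = log_likelihood(1.1 * theta_hat, lifetimes)
```

This only works as written because every lifetime is fully observed; with censored observations the likelihood has to be built differently.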
Comparison (2/5)
Nonparametric analysis: $T_i \overset{iid}{\sim} f(t)$, but the form of f(t)
is unknown.
Empirical estimator of F(t): $\hat{F}(t) = \sum_{i=1}^{n} I(T_i \le t) / n$
Empirical estimator of S(t): $\hat{S}(t) = \sum_{i=1}^{n} I(T_i > t) / n$
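The two empirical estimators are just proportions of the sample at or below t and above t. A minimal sketch, using a hypothetical complete sample and illustrative function names:

```python
# Hypothetical complete (uncensored) lifetimes, for illustration only.
lifetimes = [2.0, 3.5, 1.2, 4.8, 0.5]


def empirical_cdf(sample, t):
    """Proportion of observations with T_i <= t."""
    return sum(1 for value in sample if value <= t) / len(sample)


def empirical_survival(sample, t):
    """Proportion of observations with T_i > t."""
    return sum(1 for value in sample if value > t) / len(sample)


cdf_at_2 = empirical_cdf(lifetimes, 2.0)        # 3 of 5 values are <= 2.0
surv_at_2 = empirical_survival(lifetimes, 2.0)  # 2 of 5 values are > 2.0
```

By construction the two estimates sum to 1 at every t. Both need every T_i to be observed, which is exactly what fails once some observations are censored.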
Comparison (3/5)
Survival analysis: complete data are usually not
available.
Example: right censoring. Some patients are
still alive at the time of analysis (some $T_i$'s are
unknown, but we still have partial information).
Comparison (4/5)
2. Quantity of interest
Linear regression
$E(T \mid Z) = \beta^T Z$
Survival analysis
$\lambda(t \mid Z) = \lambda_0(t) \exp(\beta^T Z)$
proportional hazards model
$g\{E(T \mid Z)\} = \beta^T Z$ ($g(\cdot)$: the link function in GLM)
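The word "proportional" refers to the fact that, under this model, the ratio of the hazards for two covariate vectors does not depend on t, whatever the baseline hazard is. The sketch below illustrates this with an assumed Weibull-type baseline hazard and hypothetical coefficients; none of these numbers come from the slides.

```python
import math


def baseline_hazard(t, shape=1.5):
    """Assumed baseline hazard (Weibull form with unit scale)."""
    return shape * t ** (shape - 1)


def ph_hazard(t, beta, z):
    """Hazard under the PH model: baseline(t) * exp(beta' z)."""
    linear_predictor = sum(b * value for b, value in zip(beta, z))
    return baseline_hazard(t) * math.exp(linear_predictor)


beta = [0.5, -1.0]  # hypothetical regression coefficients
z_a = [1.0, 0.0]    # hypothetical covariates for subject A
z_b = [0.0, 1.0]    # hypothetical covariates for subject B

# The baseline cancels in the ratio, so it is the same at every time point.
ratios = [ph_hazard(t, beta, z_a) / ph_hazard(t, beta, z_b)
          for t in (0.5, 1.0, 4.0)]
```

Every entry of `ratios` equals exp(0.5 - (-1.0)) = exp(1.5) up to rounding; changing `baseline_hazard` to any other positive function leaves the ratios unchanged.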
Comparison (5/5)
3. Inference methods
Survival analysis: nonparametric and semiparametric
methods are preferable because of their flexibility
and robustness.
Classical nonparametric techniques: rank-based
(Wilcoxon rank-sum test, etc.); not suitable
for analyzing survival data, which may be
incomplete.
Regression Analysis
(Cox PH Model) (1/5)
Why is the Cox PH model popular? The Cox PH
model is “robust”: it will closely approximate
the correct parametric model.
If the correct model is:
Weibull: the Cox model will approximate the Weibull model
Exponential: the Cox model will approximate the
exponential model
$HR = \frac{\hat{h}(t, X^*)}{\hat{h}(t, X)} = \exp\left[\sum_{i=1}^{p} \hat{\beta}_i (X_i^* - X_i)\right]$
where
$X^* = (X_1^*, X_2^*, \ldots, X_p^*)$
and
$X = (X_1, X_2, \ldots, X_p)$
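The hazard ratio formula translates directly into a few lines of code: sum the coefficient times the covariate difference over all covariates, then exponentiate. The coefficients and covariate vectors below are hypothetical and the function name is illustrative.

```python
import math


def hazard_ratio(coefs, x_star, x):
    """HR = exp(sum_i beta_i * (x_star_i - x_i)) for two covariate vectors."""
    return math.exp(sum(b * (xs - xi) for b, xs, xi in zip(coefs, x_star, x)))


coefs = [0.2, -0.7]   # hypothetical estimated coefficients
x_star = [60.0, 1.0]  # hypothetical subject: first covariate 60, second 1
x = [50.0, 0.0]       # hypothetical subject: first covariate 50, second 0

hr = hazard_ratio(coefs, x_star, x)  # exp(0.2 * 10 - 0.7 * 1)
```

Covariates that take the same value in both vectors contribute nothing to the sum, which is why a single-covariate comparison reduces to exp of one coefficient times the difference.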
95% CI for the HR: $\exp\left[\hat{\beta}_1 \pm 1.96\, s_{\hat{\beta}_1}\right]$
No interaction: simple formula
Interaction: complex formula
Example
> library(survival)
> data(ovarian)   # ovarian cancer data
The data consist of the following variables:
futime: survival or censoring time (in days)
fustat: 0 = censored, 1 = dead
age: age in years
resid.ds: residual disease present (1 = no, 2 = yes)
rx: treatment group (1 = treatment A, 2 = treatment B)
ecog.ps: ECOG performance status (1 is better)
Result (1/2)
Call:
coxph(formula = Surv(futime, fustat) ~ age + resid.ds +
rx + ecog.ps, data = ovarian)
covariate    coef  exp(coef)  se(coef)       z       p
age         0.125      1.133    0.0469   2.662  0.0078
resid.ds    0.826      2.285    0.7896   1.046  0.3000
rx         -0.914      0.401    0.6533  -1.400  0.1600
ecog.ps     0.336      1.400    0.6439   0.522  0.6000
Result (2/2)
The estimated hazard ratio for an increase of 10 years in
age is HR(10) = exp(10 × 0.125) = 3.490. Thus, for
every increase of 10 years in age, the hazard of death is
estimated to increase roughly 3.5-fold, holding the other
covariates fixed.
95% CI for the HR per one-year increase in age, exp(β1):
$\exp\left[\hat{\beta}_1 \pm 1.96\, s_{\hat{\beta}_1}\right] = \exp[0.125 \pm 1.96 \times 0.0469] \approx (1.034, 1.242)$
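The arithmetic for the age effect can be reproduced from the reported coefficient (0.125) and standard error (0.0469). This sketch computes the hazard ratio and a Wald-type 95% interval on the hazard-ratio scale; the `scale` argument, which rescales the comparison to a 10-year difference, and the function name are illustrative additions rather than part of the R output.

```python
import math


def wald_ci_hr(coef, se, scale=1.0, z=1.96):
    """Return (HR, lower, upper) for a `scale`-unit increase in the covariate."""
    estimate = math.exp(scale * coef)
    lower = math.exp(scale * (coef - z * se))
    upper = math.exp(scale * (coef + z * se))
    return estimate, lower, upper


# Age coefficient and standard error as reported in the coxph output.
hr_1, lower_1, upper_1 = wald_ci_hr(0.125, 0.0469)               # per 1 year
hr_10, lower_10, upper_10 = wald_ci_hr(0.125, 0.0469, scale=10)  # per 10 years
```

The one-year interval comes out at roughly (1.03, 1.24) and the 10-year point estimate at roughly 3.49. Because age enters the model without interaction terms, the 10-year interval is obtained by multiplying the endpoints on the log scale by 10 before exponentiating.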