
Cox Proportional Hazard (PH) Model
Week 9

Jerry D.T. Purnomo, Ph.D.



Overview (1/2)
• What is a lifetime variable?
  T: time (from a pre-determined origin) to an event of interest (T > 0).
• Examples of events:
  - death (biomedicine)
  - disease occurrence/recurrence (biomedicine)
  - failure of a machine component (industrial engineering)
  - marriage, divorce, birth of the first child … (sociology)
  - employment, unemployment, strike, … (labor economics)

Overview (2/2)
• Applications:
  - biomedical research
  - industrial statistics (reliability)
  - sociology
  - labor economics (James Heckman: Nobel Prize winner, 2000)

Important Descriptive Measures for T


• Key points:
  - Understand the meaning and application of each measure.
  - Understand the mathematical relationships between the different measures.

The Survival Function


• The survival function is defined as

$$S(t) = \Pr(T > t) = 1 - F(t) = \int_t^{\infty} f(u)\,du,$$

where

$$f(t) = \frac{dF(t)}{dt} = -\frac{dS(t)}{dt}$$

is the density function.
• S(t) is the probability that the event has not occurred by time t.
• S(t) is a non-increasing function.
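A minimal numerical check of these definitions, written in Python as an illustrative sketch. It assumes an exponential lifetime with rate λ = 0.5 (a choice made only for the demo), approximates ∫_t^∞ f(u)du with a trapezoid rule truncated at a large upper limit, and approximates −dS/dt with a central difference.

```python
import math

LAM = 0.5  # assumed rate of an illustrative exponential lifetime

def f(t):
    """Density f(t) = lam * exp(-lam * t) for t >= 0."""
    return LAM * math.exp(-LAM * t)

def F(t):
    """Distribution function F(t) = Pr(T <= t)."""
    return 1.0 - math.exp(-LAM * t)

def S(t):
    """Survival function S(t) = Pr(T > t) = 1 - F(t)."""
    return 1.0 - F(t)

def tail_integral(t, upper=60.0, n=100_000):
    """Trapezoid approximation of the integral of f(u) from t to `upper` (a stand-in for infinity)."""
    h = (upper - t) / n
    total = 0.5 * (f(t) + f(upper))
    for i in range(1, n):
        total += f(t + i * h)
    return total * h

def minus_dS_dt(t, eps=1e-6):
    """Central-difference approximation of -dS(t)/dt."""
    return -(S(t + eps) - S(t - eps)) / (2 * eps)

t = 2.0
print(S(t), tail_integral(t))  # both approximately exp(-1) = 0.368
print(f(t), minus_dS_dt(t))    # both approximately 0.5 * exp(-1) = 0.184
```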

The Hazard Function (1/5)


• The hazard function is defined as

$$\lambda(t) = \frac{f(t)}{S(t)} = \lim_{\Delta \to 0} \frac{\Pr(t \le T < t + \Delta \mid T \ge t)}{\Delta} \quad \text{(conditional density)}$$

$$\lambda(t)\,dt \approx \Pr(t \le T < t + dt \mid T \ge t) \quad \text{(conditional probability)}$$
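The limit definition can be checked numerically. This sketch assumes a Weibull lifetime with S(t) = exp{−(t/b)^k}, where the shape k = 1.5 and scale b = 2 are illustrative assumptions, and compares f(t)/S(t) with Pr(t ≤ T < t + Δ | T ≥ t)/Δ for shrinking Δ (for continuous T, Pr(T ≥ t) = S(t)).

```python
import math

K, B = 1.5, 2.0  # assumed Weibull shape and scale

def S(t):
    """Weibull survival function S(t) = exp(-(t/B)^K)."""
    return math.exp(-((t / B) ** K))

def f(t):
    """Weibull density f(t) = (K/B) (t/B)^(K-1) exp(-(t/B)^K)."""
    return (K / B) * (t / B) ** (K - 1) * S(t)

def hazard_from_density(t):
    """Hazard as a conditional density: lambda(t) = f(t) / S(t)."""
    return f(t) / S(t)

def hazard_by_limit(t, delta):
    """Pr(t <= T < t + delta | T >= t) / delta, for a small delta (continuous T)."""
    cond_prob = (S(t) - S(t + delta)) / S(t)
    return cond_prob / delta

t = 1.0
for delta in (1e-1, 1e-3, 1e-5):
    # the limit form approaches f(t)/S(t) as delta shrinks
    print(delta, hazard_by_limit(t, delta), hazard_from_density(t))
```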

The Hazard Function (2/5)


• Three cases of the hazard function:
  - human mortality: bathtub shape
  - positive aging: the hazard increases as time passes
  - negative aging: the hazard decreases as time passes
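The Weibull family is a convenient way to picture these shapes: its hazard (k/b)(t/b)^(k−1) increases when the shape k > 1 and decreases when k < 1. The bathtub curve below is a hypothetical construction (a decreasing Weibull hazard plus a constant plus an increasing Weibull hazard), chosen only for illustration; it is not a fitted model of human mortality.

```python
def weibull_hazard(t, shape, scale):
    """Weibull hazard (shape/scale) * (t/scale)^(shape-1); increasing if shape > 1, decreasing if shape < 1."""
    return (shape / scale) * (t / scale) ** (shape - 1)

def aging_hazard(t):
    """Positive aging: hazard increases with t (shape 2, an assumed value)."""
    return weibull_hazard(t, 2.0, 10.0)

def infant_hazard(t):
    """Negative aging: hazard decreases with t (shape 0.5, an assumed value)."""
    return weibull_hazard(t, 0.5, 10.0)

def bathtub_hazard(t):
    """Hypothetical bathtub: early decreasing + constant 0.05 + late increasing hazard."""
    return infant_hazard(t) + 0.05 + weibull_hazard(t, 4.0, 20.0)

grid = [0.5, 1, 2, 5, 10, 20, 30]
print([round(bathtub_hazard(t), 3) for t in grid])  # falls first, then rises
```

Adding hazards is legitimate here because the sum of non-negative hazards is again a hazard (the hazard of the minimum of independent lifetimes).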

The Hazard Function (3/5)


• The cumulative hazard function is defined as

$$\Lambda(t) = \int_0^t \lambda(u)\,du$$

• The cumulative hazard function is a non-decreasing function of t.
• Λ(t) is easier to estimate than λ(t).

The Hazard Function (4/5)


• Important relationship between the survival function and the hazard function:

$$\lambda(t) = -\frac{d \log S(t)}{dt}, \qquad \Lambda(t) = \int_0^t \lambda(u)\,du = -\log S(t)$$

$$S(t) = \exp\left(-\int_0^t \lambda(u)\,du\right) = \exp(-\Lambda(t))$$

where Λ(t) is the cumulative hazard function.
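A sketch of these relationships, assuming the same illustrative Weibull hazard (shape 1.5, scale 2), for which Λ(t) = (t/2)^1.5 in closed form. Λ(t) is obtained by integrating λ(u) with a trapezoid rule, S(t) is recovered as exp(−Λ(t)), and λ(t) is recovered from −d log S(t)/dt by a central difference.

```python
import math

K, B = 1.5, 2.0  # assumed Weibull shape and scale

def hazard(u):
    """Weibull hazard lambda(u) = (K/B) (u/B)^(K-1)."""
    return (K / B) * (u / B) ** (K - 1)

def cum_hazard(t, n=10_000):
    """Lambda(t) = integral of lambda(u) from 0 to t, by the trapezoid rule."""
    h = t / n
    total = 0.5 * (hazard(0.0) + hazard(t))
    for i in range(1, n):
        total += hazard(i * h)
    return total * h

def survival_from_hazard(t):
    """S(t) = exp(-Lambda(t)), using the numerical Lambda(t)."""
    return math.exp(-cum_hazard(t))

def S_closed(t):
    """Closed-form Weibull survival, used as a reference."""
    return math.exp(-((t / B) ** K))

def hazard_from_survival(t, eps=1e-6):
    """lambda(t) = -d log S(t) / dt, by a central difference."""
    return -(math.log(S_closed(t + eps)) - math.log(S_closed(t - eps))) / (2 * eps)

t = 3.0
print(cum_hazard(t), (t / B) ** K)
print(survival_from_hazard(t), S_closed(t))
print(hazard_from_survival(t), hazard(t))
```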

The Hazard Function (5/5)


Example: two-sample comparison
Case 1: the two hazard curves have a crossing point.
Case 2: one hazard dominates the other:

$$\lambda_1(t) = k\,\lambda_2(t) \quad \text{(hazards are proportional)}$$

$$S_1(t) = \exp\left(-\int_0^t \lambda_1(u)\,du\right) = \exp(-\Lambda_1(t)) = \exp(-k\,\Lambda_2(t)) = \{S_2(t)\}^k$$

(The survival functions have no crossing.)
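Under proportional hazards, S1(t) = {S2(t)}^k, so for k > 1 group 1 has lower survival at every t. The sketch assumes a Weibull survival curve for group 2 (shape 1.5, scale 2) and k = 2, both illustrative choices.

```python
import math

def S2(t):
    """Assumed survival curve for group 2: Weibull, exp(-(t/2)^1.5)."""
    return math.exp(-((t / 2.0) ** 1.5))

def S1(t, k):
    """Group 1 under lambda1 = k * lambda2: S1(t) = exp(-k * Lambda2(t)) = S2(t)**k."""
    return S2(t) ** k

k = 2.0
grid = [0.5 * i for i in range(1, 21)]
# With k > 1, group 1 has lower survival at every t, so the curves never cross.
print(all(S1(t, k) < S2(t) for t in grid))
```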

Mean
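• The mean (for a positive random variable):

$$E(T) = \int_0^{\infty} t f(t)\,dt = -\int_0^{\infty} t\,dS(t) = -\Big[t\,S(t)\Big]_{t=0}^{t=\infty} + \int_0^{\infty} S(t)\,dt = \int_0^{\infty} S(t)\,dt$$

(integration by parts, a very useful skill in survival analysis; the boundary term vanishes because t S(t) → 0 as t → ∞ when E(T) is finite)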
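The identity E(T) = ∫_0^∞ S(t)dt can be checked numerically. This sketch assumes a Weibull lifetime (shape 1.5, scale 2), whose mean is b·Γ(1 + 1/k), and truncates both integrals at t = 50, where S(t) is negligible for these parameters.

```python
import math

K, B = 1.5, 2.0  # assumed Weibull shape and scale

def S(t):
    return math.exp(-((t / B) ** K))

def f(t):
    return (K / B) * (t / B) ** (K - 1) * S(t)

def trapezoid(g, a, b, n=100_000):
    """Composite trapezoid rule for the integral of g over [a, b]."""
    h = (b - a) / n
    total = 0.5 * (g(a) + g(b))
    for i in range(1, n):
        total += g(a + i * h)
    return total * h

UPPER = 50.0  # stand-in for infinity; S(50) = exp(-125) here
mean_density = trapezoid(lambda t: t * f(t), 0.0, UPPER)  # integral of t f(t)
mean_survival = trapezoid(S, 0.0, UPPER)                  # integral of S(t)
mean_exact = B * math.gamma(1 + 1 / K)                    # Weibull mean
print(mean_density, mean_survival, mean_exact)
```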

Mean Residual Life (1/2)


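• The mean residual life is defined as

$$
\begin{aligned}
r(t) &= E(T - t \mid T > t) = \frac{\int_t^{\infty} (u - t) f(u)\,du}{S(t)} \\
&= \frac{-\int_t^{\infty} (u - t)\,dS(u)}{S(t)} = \frac{-\big[(u - t)\,S(u)\big]_{u=t}^{u=\infty} + \int_t^{\infty} S(u)\,d(u - t)}{S(t)} \\
&= \frac{\int_t^{\infty} S(u)\,du}{S(t)}
\end{aligned}
$$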
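A sketch of r(t) = ∫_t^∞ S(u)du / S(t), truncating the integral at a large upper limit. Two illustrative survival functions are assumed: an exponential with rate 0.5, whose memoryless property gives a constant r(t) = 2, and a Weibull with shape 1.5 (increasing hazard), whose mean residual life decreases with t.

```python
import math

def trapezoid(g, a, b, n=50_000):
    """Composite trapezoid rule for the integral of g over [a, b]."""
    h = (b - a) / n
    total = 0.5 * (g(a) + g(b))
    for i in range(1, n):
        total += g(a + i * h)
    return total * h

def mean_residual_life(S, t, upper=80.0):
    """r(t) = integral of S(u) over [t, upper] divided by S(t); `upper` stands in for infinity."""
    return trapezoid(S, t, upper) / S(t)

LAM = 0.5
S_exp = lambda u: math.exp(-LAM * u)              # exponential: memoryless
S_weib = lambda u: math.exp(-((u / 2.0) ** 1.5))  # Weibull, shape 1.5: positive aging

for t in (0.0, 1.0, 3.0):
    print(t, mean_residual_life(S_exp, t), mean_residual_life(S_weib, t))
```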

Mean Residual Life (2/2)


• The relationship between r(t) and S(t):

$$S(t) = \frac{r(0)}{r(t)} \exp\left(-\int_0^t \frac{du}{r(u)}\right)$$

• Mean residual life is often used in actuarial science (insurance).
• r(t) is the expected remaining lifetime of a person whose current age is t.
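The formula can be checked on a case with a known answer. Assuming T ~ Uniform(0, 1) (an illustrative choice), T given T > t is uniform on (t, 1), so r(t) = (1 − t)/2 and S(t) = 1 − t; the sketch rebuilds S(t) from r(t) alone, using a trapezoid rule for ∫_0^t du/r(u).

```python
import math

def r(t):
    """Mean residual life of T ~ Uniform(0, 1): r(t) = (1 - t) / 2 for 0 <= t < 1."""
    return (1.0 - t) / 2.0

def survival_from_mrl(t, n=20_000):
    """S(t) = r(0)/r(t) * exp(-integral_0^t du / r(u)), with a trapezoid rule for the integral."""
    h = t / n
    integral = 0.5 * (1.0 / r(0.0) + 1.0 / r(t))
    for i in range(1, n):
        integral += 1.0 / r(i * h)
    integral *= h
    return r(0.0) / r(t) * math.exp(-integral)

for t in (0.2, 0.5, 0.8):
    print(t, survival_from_mrl(t), 1.0 - t)  # exact S(t) = 1 - t for Uniform(0, 1)
```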

Comparison (1/5)
Comparison between survival analysis and other
statistics courses:
1. Data type
Other statistics courses: the data T1, ..., Tn are a random sample of T.
Parametric analysis: assume T1, ..., Tn are iid with density f(t; θ).
Likelihood:

$$L(\theta) = \prod_{i=1}^{n} f(t_i; \theta)$$

→ maximize it (easy to construct).
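As an illustration of parametric maximum likelihood with complete data, the sketch assumes exponential lifetimes, f(t; λ) = λe^(−λt), simulates a sample with a fixed seed, and maximizes log L(λ) = Σ log f(t_i; λ) by golden-section search. For this model the maximizer also has the closed form n / Σ t_i, which the numerical answer should match.

```python
import math
import random

def exp_log_likelihood(lam, times):
    """log L(lam) = sum_i log f(t_i; lam) for iid exponential data, f(t; lam) = lam * exp(-lam * t)."""
    return sum(math.log(lam) - lam * t for t in times)

def maximize_1d(g, lo, hi, tol=1e-9):
    """Golden-section search for the maximizer of a unimodal function g on [lo, hi]."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if g(c) < g(d):
            a = c  # maximum lies in [c, b]
        else:
            b = d  # maximum lies in [a, d]
    return (a + b) / 2

rng = random.Random(1)                              # simulated complete (uncensored) sample
times = [rng.expovariate(0.5) for _ in range(500)]  # true rate 0.5, an assumption for the demo

lam_numeric = maximize_1d(lambda lam: exp_log_likelihood(lam, times), 1e-6, 10.0)
lam_closed = len(times) / sum(times)  # from d log L / d lam = n/lam - sum(t) = 0
print(lam_numeric, lam_closed)
```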

Comparison (2/5)
Nonparametric analysis: T1, ..., Tn are iid with density f(t), but the form of f(t) is unknown.
Empirical estimator of F(t):

$$\hat F(t) = \frac{1}{n}\sum_{i=1}^{n} I(T_i \le t)$$

Empirical estimator of S(t):

$$\hat S(t) = \frac{1}{n}\sum_{i=1}^{n} I(T_i > t)$$

Nonparametric estimator of f(t): needs a smoothing technique.
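The two empirical estimators translate directly into code. The sample below is made up for illustration, and the estimators assume every T_i is observed (complete data).

```python
def empirical_cdf(times, t):
    """F_hat(t) = (1/n) * sum_i I(T_i <= t)."""
    return sum(1 for x in times if x <= t) / len(times)

def empirical_survival(times, t):
    """S_hat(t) = (1/n) * sum_i I(T_i > t) = 1 - F_hat(t)."""
    return sum(1 for x in times if x > t) / len(times)

times = [2.0, 3.5, 3.5, 5.0, 8.0]  # made-up complete (uncensored) sample
for t in (0.0, 3.5, 6.0, 10.0):
    print(t, empirical_cdf(times, t), empirical_survival(times, t))
```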

Comparison (3/5)
• Survival analysis: complete data are usually not available.
• Example: right censoring → some patients are still alive at the time of analysis (some Ti's are unknown, but we still have partial information: each such Ti exceeds the patient's censoring time).
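Under right censoring, the usual convention is to observe Y_i = min(T_i, C_i) together with the indicator δ_i = I(T_i ≤ C_i). The sketch builds these pairs from hypothetical, randomly generated failure and censoring times; when δ_i = 0 we only know that T_i > Y_i.

```python
import random

def right_censor(failure_times, censor_times):
    """Observed data under right censoring: Y_i = min(T_i, C_i), delta_i = I(T_i <= C_i)."""
    return [(min(t, c), int(t <= c)) for t, c in zip(failure_times, censor_times)]

rng = random.Random(7)
T = [rng.expovariate(0.5) for _ in range(8)]   # hypothetical true failure times
C = [rng.uniform(0.0, 4.0) for _ in range(8)]  # hypothetical censoring times (e.g. end of study)
for y, delta in right_censor(T, C):
    # delta == 0 means only the partial information T_i > y is available
    print(round(y, 2), delta)
```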

Comparison (4/5)
2. Quantity of interest
Linear regression:

$$E(T \mid Z) = \beta^T Z$$

Survival analysis:

$$\lambda(t \mid Z) = \lambda_0(t)\exp(\beta^T Z)$$

→ proportional hazards model

Generalized linear model:

$$g\{E(T \mid Z)\} = \beta^T Z \quad (g(\cdot)\text{: the link function in GLM})$$

Comparison (5/5)
3. Inference methods
• Survival analysis: nonparametric and semiparametric methods are preferable due to their flexibility and robustness.
• Classical nonparametric techniques are rank-based (Wilcoxon rank-sum test, etc.) → not suitable for analyzing survival data, which may be incomplete.

Regression Analysis
(Cox PH Model) (1/5)
• Why is the Cox PH model popular? The Cox PH model is "robust": it will closely approximate the correct parametric model.
• If the correct model is:
  - Weibull: the Cox model will approximate the Weibull model
  - exponential: the Cox model will approximate the exponential model

Cox PH Model (2/5)


• Preliminary: regression models for survival data
  T: the failure time
  C: the censoring time
  X: a 1×p covariate vector
• Objective: model T based on X

Cox PH Model (3/5)


Model choice 1: the proportional hazard model

$$h(t, X) = h_0(t)\exp(\beta^T X) = \underbrace{h_0(t)}_{\text{baseline hazard}}\;\underbrace{\exp\left(\sum_{i=1}^{p} \beta_i X_i\right)}_{\text{exponential}}$$

• Baseline hazard h0(t): involves t but not the X's.
• Exponential: involves the X's but not t (the X's are time-independent).
• The covariate effect is "multiplicative" on the hazard.
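A sketch of the PH hazard showing that the ratio of two individuals' hazards does not depend on t. The baseline hazard h0(t) = 0.1·√t and the coefficients are hypothetical; with X_a − X_b = (1, 0) and β = (0.7, −0.2), the ratio is exp(0.7) ≈ 2.014 at every t.

```python
import math

def cox_hazard(t, x, beta, baseline):
    """h(t, X) = h0(t) * exp(sum_i beta_i * X_i); `baseline` is any function h0(t)."""
    return baseline(t) * math.exp(sum(b * xi for b, xi in zip(beta, x)))

h0 = lambda t: 0.1 * t ** 0.5  # hypothetical baseline hazard (depends on t only)
beta = [0.7, -0.2]             # hypothetical coefficients
x_a, x_b = [1.0, 3.0], [0.0, 3.0]

for t in (0.5, 1.0, 4.0, 9.0):
    ratio = cox_hazard(t, x_a, beta, h0) / cox_hazard(t, x_b, beta, h0)
    print(t, ratio)  # h0(t) cancels, leaving exp(0.7) at every t
```

Setting all X's to zero gives exp(0) = 1, so the model returns the baseline hazard itself, which is why h0(t) is called the baseline.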

Cox PH Model (4/5)


• An important feature of this formula, which concerns the proportional hazards (PH) assumption, is that the baseline hazard is a function of t but does not involve the X's. In contrast, the exponential factor involves the X's but does not involve t. The X's here are called time-independent X's.
• X's that involve t are time-dependent; they require the extended Cox model, in which the hazards are no longer proportional.

Cox PH Model (5/5)


Model choice 2: the accelerated failure time (AFT) model (log-linear model)

$$\log T = Z^T \beta + \varepsilon \quad\Longrightarrow\quad T = \exp(Z^T \beta)\,\exp(\varepsilon)$$

• The covariate effect is "multiplicative" on the failure time.
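Under the AFT model, holding ε fixed, changing Z multiplies the failure time itself by exp(Zᵀβ). The sketch assumes a single covariate with a hypothetical coefficient 0.4 and normal errors (which gives a log-normal AFT model), and shows that moving Z from 0 to 1 stretches every simulated failure time by the same factor exp(0.4) ≈ 1.492.

```python
import math
import random

def aft_time(z, beta, eps):
    """log T = Z^T beta + eps, i.e. T = exp(Z^T beta) * exp(eps)."""
    return math.exp(sum(zi * bi for zi, bi in zip(z, beta))) * math.exp(eps)

beta = [0.4]                                      # hypothetical coefficient
rng = random.Random(3)
errors = [rng.gauss(0.0, 0.5) for _ in range(5)]  # assumed normal errors
for e in errors:
    t0 = aft_time([0.0], beta, e)
    t1 = aft_time([1.0], beta, e)
    print(round(t0, 3), round(t1, 3), t1 / t0)  # ratio is exp(0.4) for every error draw
```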

Property of Cox PH Model (1/3)


• Properties of the Cox PH model:
1. If all the X's are equal to zero, the formula reduces to the baseline hazard function. This property of the Cox model is the reason why h0(t) is called the baseline hazard function.
2. The baseline hazard, h0(t), is an unspecified function. It is this property that makes the Cox model a semiparametric model.

Property of Cox PH Model (2/3)


3. Even though the baseline hazard part of the model is unspecified, it is still possible to estimate the β's in the exponential part of the model.
4. The measure of effect, which is called a hazard ratio, is calculated without having to estimate the baseline hazard function.
5. h(t, X) and S(t, X) can be estimated for the Cox model using a minimum of assumptions.

Property of Cox PH Model (3/3)


6. The Cox model is preferred to the logistic model:
   - Cox model: uses survival times and censoring.
   - Logistic model: uses a (0,1) outcome; ignores survival times and censoring.

ML Estimation of Cox PH Model


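• The parameters are the β's in the general Cox model formula, h(t, X) = h0(t) exp(Σ βi Xi).
• The corresponding estimates of these parameters are called maximum likelihood (ML) estimates and are denoted β̂i. For the Cox model, L is Cox's partial likelihood, which does not involve h0(t).
• Steps for obtaining ML estimates:
  a. form L from the model
  b. maximize ln L by solving

$$\frac{\partial \ln L}{\partial \beta_i} = 0, \qquad i = 1, \dots, p$$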
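For the Cox model, the L in step (a) is Cox's partial likelihood, log PL(β) = Σ over events i of [βx_i − log Σ_{j: y_j ≥ y_i} exp(βx_j)]. The sketch below solves step (b) by Newton-Raphson for a single covariate, assuming no tied event times (R's coxph handles ties, using the Efron approximation by default). The three-subject toy data are made up so that the maximizer has the closed form β̂ = −(1/2) log 2.

```python
import math

def cox_log_partial_likelihood(beta, times, events, x):
    """log PL(beta) for one covariate, assuming no tied event times."""
    total = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if d_i:
            risk = [j for j, t_j in enumerate(times) if t_j >= t_i]  # risk set at t_i
            total += beta * x[i] - math.log(sum(math.exp(beta * x[j]) for j in risk))
    return total

def cox_newton(times, events, x, beta=0.0, tol=1e-10, max_iter=50):
    """Solve d log PL / d beta = 0 by Newton-Raphson (single covariate, no ties)."""
    for _ in range(max_iter):
        score, info = 0.0, 0.0
        for i, (t_i, d_i) in enumerate(zip(times, events)):
            if not d_i:
                continue  # censored subjects only contribute through risk sets
            w = [(math.exp(beta * x[j]), x[j]) for j, t_j in enumerate(times) if t_j >= t_i]
            s0 = sum(wj for wj, _ in w)
            s1 = sum(wj * xj for wj, xj in w)
            s2 = sum(wj * xj * xj for wj, xj in w)
            score += x[i] - s1 / s0             # observed x minus risk-set weighted mean
            info += s2 / s0 - (s1 / s0) ** 2    # risk-set weighted variance
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta

times, events, x = [1.0, 2.0, 3.0], [1, 1, 0], [1.0, 0.0, 1.0]  # toy data
beta_hat = cox_newton(times, events, x)
print(beta_hat, -0.5 * math.log(2))  # analytic maximizer for this toy data
```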

Hazard Ratio (1/3)


• Hazard ratio:

$$HR = \frac{\hat h(t, X^*)}{\hat h(t, X)}$$

where

$$X^* = (X_1^*, X_2^*, \dots, X_p^*) \quad\text{and}\quad X = (X_1, X_2, \dots, X_p)$$

denote the sets of X's for two individuals.

Hazard Ratio (2/3)


• To interpret HR, we want HR > 1, i.e., ĥ(t, X*) > ĥ(t, X).
• Typical coding: X*: group with larger h
  X: group with smaller h
• Summarizing:

$$HR = \frac{\hat h(t, X^*)}{\hat h(t, X)} = \frac{\hat h_0(t)\exp\left(\sum_{i=1}^{p} \hat\beta_i X_i^*\right)}{\hat h_0(t)\exp\left(\sum_{i=1}^{p} \hat\beta_i X_i\right)} = \exp\left[\sum_{i=1}^{p} \hat\beta_i\,(X_i^* - X_i)\right]$$
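Because the baseline hazard cancels, the hazard ratio needs only the coefficients and the two covariate vectors. The usage below plugs in the coefficients from the ovarian example output (age, resid.ds, rx, ecog.ps) and two hypothetical patients who differ only by 10 years of age, which reproduces HR(10) = exp(1.25) ≈ 3.490.

```python
import math

def hazard_ratio(beta_hat, x_star, x):
    """HR = exp(sum_i beta_hat_i * (X*_i - X_i)); the baseline hazard h0(t) cancels."""
    return math.exp(sum(b * (xs - xi) for b, xs, xi in zip(beta_hat, x_star, x)))

# Coefficients from the example output: age, resid.ds, rx, ecog.ps
beta_hat = [0.125, 0.826, -0.914, 0.336]
older = [60, 2, 1, 1]    # hypothetical patient
younger = [50, 2, 1, 1]  # same covariates except age
print(hazard_ratio(beta_hat, older, younger))  # exp(10 * 0.125), about 3.49
```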

Hazard Ratio (3/3)


• General rule: if X1 is a (0,1) exposure variable, then

$$HR = \exp(\hat\beta_1)$$

(= effect of exposure adjusted for the other X's), provided that no other X's are product terms involving the exposure.

Cox PH Model (95% CI)


• Large-sample 95% confidence interval for HR:

$$\exp\left(\hat\beta_1 \pm 1.96\, s_{\hat\beta_1}\right)$$

• No interaction: simple formula
• Interaction: complex formula
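A sketch of the large-sample interval, exp(β̂1 ± 1.96 s), applied to the age coefficient from the ovarian example output (β̂ = 0.125, s = 0.0469). It covers the no-interaction case only.

```python
import math

def hr_confidence_interval(beta_hat, se, z=1.96):
    """Large-sample CI for HR = exp(beta): exponentiate the endpoints of beta_hat +/- z * se."""
    return math.exp(beta_hat - z * se), math.exp(beta_hat + z * se)

# Usage with the age coefficient reported in the example output
low, high = hr_confidence_interval(0.125, 0.0469)
print(round(low, 3), round(high, 3))
```

Exponentiating the endpoints (rather than building an interval around exp(β̂) directly) keeps the interval inside (0, ∞) and mirrors the asymmetry of the HR scale.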

Example
> library(survival)
> data(ovarian)   # ovarian cancer data
The data consist of the following variables:
• futime: survival time (in days) after diagnosis of the cancer
• fustat: 0 = censored, 1 = dead
• age: age in years
• resid.ds: residual disease present (1 = no, 2 = yes)
• rx: 1 = treatment A, 2 = treatment B
• ecog.ps: ECOG performance status, a measure of the patient's general level of functioning (1 is better)

Result (1/2)
Call:
coxph(formula = Surv(futime, fustat) ~ age + resid.ds + rx + ecog.ps, data = ovarian)

covariate    coef  exp(coef)  se(coef)       z       p
age         0.125      1.133    0.0469   2.662  0.0078
resid.ds    0.826      2.285    0.7896   1.046  0.3000
rx         -0.914      0.401    0.6533  -1.400  0.1600
ecog.ps     0.336      1.400    0.6439   0.522  0.6000

Result (2/2)
• The estimated hazard ratio for an increase of 10 years in age is HR(10) = exp(10 × 0.125) = 3.490. Thus, for every 10-year increase in age, the hazard of death is estimated to be multiplied by roughly 3.5, adjusting for the other covariates.
• 95% CI for β1:

$$\hat\beta_1 \pm 1.96\, s_{\hat\beta_1} = 0.125 \pm 1.96 \times 0.0469 = 0.125 \pm 0.092 = (0.033,\ 0.217)$$

• 95% CI for HR: (exp(0.033), exp(0.217)) = (1.034, 1.242)
