Professional Documents
Culture Documents
Cox Regression
Cox Regression
Cox regression
Torben Martinussen
Department of Biostatistics
University of Copenhagen
Survival analysis
T = T∗ ∧ C D = I(T ∗ ≤ C)
ubiquitous with T ∗ being the true survival time and C the (potential) censoring
time and possibly covariates Xi (t).
Hazard-function: α(t) = limh↓0 h1 P(t ≤ T ∗ < t + h | T ∗ ≥ t, Ft− ).
where R
t
Λi (t) = 0 Yi (s)α(s) ds (compensator), Yi (t) = I(t ≤ Ti ) (at risk process).
Intensity: Y (t)α(t).
• in other words Z t
−logS(t) = A(t) = α(s)ds
0
• So
d
logS(t) α(t) = −
dt
• Notation, may use both α(t) and λ(t) for the hazard function. Forgive me.
• But keep in mind that intensity function is Y (t)α(t).
Parametric Models
Λ̂(t)
Malignant melanoma
In the period 1962-77 205 patients had their tumour removed and were
followed until 1977.
At the end of 1977:
• 57 died of mgl. mel.
• 14 died of non-related mgl. mel.
• 134 were still alive.
Purpose: Study effect on survival of sex, age, thickness of tumour, ulceration, ..
Malignant melanoma
1 10 3 1 76 1972 6.76 1
2 30 3 1 56 1968 0.65 0
3 35 2 1 41 1977 1.34 0
4 99 3 0 71 1968 2.90 0
5 185 1 1 52 1965 12.08 1
6 204 1 1 28 1971 4.84 1
7 210 1 1 77 1972 5.16 1
8 232 3 0 60 1974 3.22 1
9 232 1 1 49 1968 12.88 1
. . . . . . . .
. . . . . . . .
. . . . . . . .
203 4688 2 0 42 1965 0.48 0
204 4926 2 0 50 1964 2.26 0
205 5565 2 0 41 1962 2.90 0
Cox-Regression
models and life tables, Journal of the Royal Statistical Society, Series B, 34,
pp. 187-220.
• Regression techniques are very useful for dealing with many covariates
• Can be used to learn about treatment effect while correcting for other
covariates.
• Cox model:
λi (t) = λ0 (t) exp(β1 Xi1 + ... + βp Xip ),
where λ0 (t) is baseline hazard for a subject with covariates 0.
• Note: λ0 (t) is not further specified!
Consider the Cox-model for the melanoma with the explanatory variables sex
and log(thickness) (lthick):
We get:
> melanoma$lthick=log(melanoma$thick)
> fit=coxph(Surv(days,status==1)~ factor(sex)+lthick,data=melanoma)
> fit
Call:
coxph(formula = Surv(days, status == 1) ~ factor(sex) + lthick,
data = melanoma)
PBC-data
PBC data (primary biliary cirrhosis): 418 patients are followed until death or
censoring.
PBC is a fatal chronic liver disease.
Important explanatory variables:
• Age: in years
• Albumin: serum albumin (mg/dl)
• Bilirubin: serum bilirunbin (mg/dl)
• Edema: 0 no edema, 0.5 untreated or successfully treated 1 edema
despite diuretic therapy
• Prothrombin time: standardised blood clotting time
> library(survival)
> data(pbc)
> with(pbc,cbind(time,status,age,edema,bili,protime,albumin)[1:5,])
time status age edema bili protime albumin
[1,] 400 2 58.76523 1.0 14.5 12.2 2.60
[2,] 4500 0 56.44627 0.0 1.1 10.6 4.14
[3,] 1012 2 70.07255 0.5 1.4 12.0 3.48
[4,] 1925 2 54.74059 0.5 1.8 10.3 2.54
[5,] 1504 1 38.10541 0.0 3.4 10.9 3.53
>
> fit.pbc<-coxph(Surv(time, status==2) ~ age+factor(edema)+log(bili)+log(protime)+
log(albumin),data=pbc)
> fit.pbc
Call:
coxph(formula = Surv(time, status == 2) ~ age + factor(edema) +
log(bili) + log(protime) + log(albumin), data = pbc)
Likelihood ratio test=232 on 6 df, p=0 n=416 (2 observations deleted due to missingness)
Likelihood construction
Likelihood construction
Pn
• Let S0 (t, β) = i=1 Yi (t) exp(XiT β).
• Based on martingale equation can you then suggest an estimator of A0 (t)
assuming β is known?
Likelihood construction
• The "estimator" Z t
1
Â0 (t, β) = dN· (s)
0 S0 (t, β)
is called the Breslow-estimator.
• Now plug this estimator into the likelihood function, and arrive at a
function only depending on β!
• This function, L(β) is called Cox’s partial likelihood function.
where
n
X
S0 (t, β) = Yi (t) exp(XiT β).
i=1
• We have presented theory with time-const. X ’s, but can easily handle
time-var. covariates.
• Now let’s look into the asymptotical properties of Cox’s estimators.
• The first partial derivative of S0 (t, β) with respect to β is denoted :
n
X
S1 (t, β) = Yi (t) exp(XiT β)Xi
i=1
Asymptotic properties
Theorem
• Under the standard conditions, then, as n → ∞,
D
n1/2 (β̂ − β0 ) → N(0, Σ−1 ),
Asymptotic properties
• Key to the proof is that the score evaluated in the true point β0 is a
martingale (evaluated at τ ):
n Z
X τ
U(β0 ) = (Xi − E(t, β0 ))dMi (t) (6)
i=1 0
where
S2 (t, β)
V (t, β) = − E(t, β)⊗2 . (7)
S0 (t, β)
• Lindeberg condition of martingale CLT is also fulfilled
Asymptotic properties
IID decomposition
• The score process evaluated at β0
n
X
n−1/2 U(β0 , t) = n−1/2 1i (t) + op (1)
i=1
where
t
Z
s1 (β0 , t)
1i (t) = Xi − dMi (s)
0 s0 (β0 , t)
IID decomposition
n1/2 (Λ̂0 (t, β̂) − Λ0 (t)) ≈ n1/2 (Λ̂0 (t, β0 ) − Λ0 (t)) + Dβ Λ̂0 (t, β̂)n1/2 (β̂ − β0 )
P
since n−1 S0 (s, β)→s0 (s, β).
IID decomposition
• It thus follows that
n1/2 (Λ̂0 (t, β̂) − Λ0 (t))
is asymptotically equivalent to
n
X
n−1/2 2i (t), (8)
i=1
where
Resampling
Resampling
• Just saw that : n1/2 (β̂ − β, Λ̂0 − Λ0 ) was asymptotically equivalent to the
processes
n
X
∆1 = n−1/2 ˆ1i Gi ,
i=1
n
X
∆2 (t) = n−1/2 ˆ2i (t)Gi ,
i=1
where
Wish to study a treatment effect while correcting for other variables. May use
Cox-model:
λi (t) = λ0 (t) exp(β1 Xi1 + ... + βp Xip )
λ0 (t) is the baseline hazard for a subject with covariates 0.
The Cox models ability to deal with many covariates comes from the
regression structure. Some assumptions have been made
• The effects of covariates are additive and linear on the log risk scale.
• If covariates interact with each other the regression model should include
interaction terms.
• The relative risk between the hazard rate for two subjects is constant over
time
λ0i (t)
c(β1 , ..., βp ) =
λi (t)
We will try to check all these assumptions with the latter being the key
assumption.
Melanoma data:
Consider the Cox-model for melanoma data with explanatory variables sex and
log(thickness) (lt):
Checking proportionality
• Constant relative risk between the hazard rates for two subjects
λ0i (t)
c(β1 , ..., βp ) =
λi (t)
• In the 2-sample case the assumption is equivalent to proportional
intensities for the two groups, i.e.,
• A plot of Λ̂2 (t) versus Λ̂1 (t) should give a straight line through (0,0) with
slope exp(β1 ).
• Similarly
log(Λ2 (t)) − log(Λ1 (t)) = β1
so plotting log(Λ̂k (t)), k = 1, 2, versus t should give parallel curves.
Torben Martinussen — Cox regression — 20. september 2012
Slide 31/51
UNIVERSITY OF COPENHAGEN DEPARTMENT OF BIOSTATISTICS
The stratified Cox model contains different baselines for different strata :
-1
-2
Logcums
-3
-4
-5
Parallel?
Torben Martinussen — Cox regression — 20. september 2012
Slide 33/51
UNIVERSITY OF COPENHAGEN DEPARTMENT OF BIOSTATISTICS
• Graphical test may reveal serious violations, but they may be difficult to
use in general.
• If effect is expected to wear off with time or to increase with time, one may
try to describe this effect and then fit a Cox model with a time-varying
explanatory variable f (t)
λ1 (t)
exp(β + θ · f (t)) =
λ2 (t)
• For a given choice of f (t) we can then test if θ is 0, i.e, if the time-varying
effect can be left out of the model. A common choice of f (t) is ln(t). More
specialized methods and programs are available.
• Note that
with
X (t) = I(t > 1400)) · ulc
being a time-varying covariate.
• So this is just a Cox-model again, but now with a time-varying covariate.
n= 367
These methods are quite useful but also have some limitations:
• Graphical method:
I Not parallel. What is acceptable?
I What if a given covariate is continuous?
• Test: Ad hoc method. Which transformation to use?
• Time-changing effect. Also ad-hoc in that the time where effects changes
are rarely known in practice.
• All methods: They assume that model is ok for all the other covariates.
The score function, evaluated in the estimate β̂, and seen as a function of time,
can for example be written as
n Z t n Z t
X S1 (s, β̂) X
U(β̂, t) = (Xi − )dNi (s) = Xi d M̂i (s)
i=1 0 S0 (s, β̂) i=1 0
Hence
n o
n−1/2 U(β̂, t) ≈ n−1/2 U(β0 , t) − I(β̂, t)(I(β̂, τ ))−1 U(β0 , τ )
where
n
X n Z
X t
M1 (t) = M1i (t) = (Xi (s) − e(s, β0 ))dMi (s)
i=1 i=1 0
for almost any sequence of the counting processes that determines M̂ and I.
Torben Martinussen — Cox regression — 20. september 2012
Slide 42/51
UNIVERSITY OF COPENHAGEN DEPARTMENT OF BIOSTATISTICS
To summarize the cumulative score process plots one may look at test
statistics like
PBC-data
> library(timereg)
> fit=cox.aalen(Surv(days,status==1)~ prop(factor(sex))+prop(lthick)+
prop(ulc), data=melanoma)
> summary(fit)
PBC-data
prop(factor(sex))1 prop(lthick)
10
6
Cumulative coefficients
Cumulative coefficients
4
5
2
0
0
-2
-5
-4
-10
-6
Time Time
prop(ulc)
6
Cumulative coefficients
4
2
0
-2
-4
-6
Time
Cumulative Residuals
Cumulative residuals
Call:
cum.residuals(fit.gof, melanoma, cum.resid = 1)
10
5
Cumulative residuals
0
-5
-10
3 4 5 6 7
Cumulative residuals
Call:
cum.residuals(fit.gof, melanoma, cum.resid = 1)
10
5
Cumulative residuals
0
-5
-10
Summary