Cox Regression

UNIVERSITY OF COPENHAGEN DEPARTMENT OF BIOSTATISTICS
Faculty of Health Sciences
Cox regression
Torben Martinussen
Department of Biostatistics
University of Copenhagen
20. september 2012

Slide 1/51
Survival analysis
Standard setup for right-censored survival data. IID copies of (T , D) where
T = T∗ ∧ C D = I(T ∗ ≤ C)
ubiquitous with T ∗ being the true survival time and C the (potential) censoring
time and possibly covariates Xi (t).
Hazard-function: α(t) = limh↓0 h1 P(t ≤ T ∗ < t + h | T ∗ ≥ t, Ft− ).
Counting process: Ni (t) = I(Ti ≤ t, Di = 1)
Martingale: Mi (t) = Ni (t) − Λi (t)
where R
t
Λi (t) = 0 Yi (s)α(s) ds (compensator), Yi (t) = I(t ≤ Ti ) (at risk process).
Intensity: Y (t)α(t).
Torben Martinussen — Cox regression — 20. september 2012

Slide 2/51
Modelling Survival Data
• To model and analyze survival data and deal with censorings in a

convenient way one often specifies the model through the hazard rate:
α(t) = risk of dying among people at risk at time t
• The hazard function is closely related to the survival probability

Z t
P(T > t) = S(t) = exp(− α(s)ds)
0
• in other words Z t
−logS(t) = A(t) = α(s)ds
0
• So
d
logS(t) α(t) = −
dt
• Notation, may use both α(t) and λ(t) for the hazard function. Forgive me.
• But keep in mind that intensity function is Y (t)α(t).

Slide 3/51
Parametric Models
• To give precise and clear answers. Kaplan-Meier curves and log-rank

tests are not always sufficient.
• By making simplifying reasonable assumptions something more can be
said through parametric models.
• The simplest possible survival model simply claims that the intensity is
constant over time
λ(t) = λ, Λ(t) = λt
When the rate is constant the survival time is exponentially distributed.
• One may validate the model by considering the non-parametric estimator
of the cumulative intensity (Nelson-Aalen)
Λ̂(t)
• Should be approximately linear if the model is correct

Slide 4/51
Malignant melanoma
In the period 1962-77 205 patients had their tumour removed and were
followed until 1977.
At the end of 1977:
• 57 died of mgl. mel.
• 14 died of non-related mgl. mel.
• 134 were still alive.
Purpose: Study effect on survival of sex, age, thickness of tumour, ulceration, ..

Slide 5/51
Malignant melanoma
N time status sex age year thickness ulcer
1 10 3 1 76 1972 6.76 1
2 30 3 1 56 1968 0.65 0
3 35 2 1 41 1977 1.34 0
4 99 3 0 71 1968 2.90 0
5 185 1 1 52 1965 12.08 1
6 204 1 1 28 1971 4.84 1
7 210 1 1 77 1972 5.16 1
8 232 3 0 60 1974 3.22 1
9 232 1 1 49 1968 12.88 1
. . . . . . . .
. . . . . . . .
. . . . . . . .
203 4688 2 0 42 1965 0.48 0
204 4926 2 0 50 1964 2.26 0
205 5565 2 0 41 1962 2.90 0

Slide 6/51
Parametric models - the Weibull model

• Let X , a (p-dimensional) covariate. The Weibull regression model has
hazard function
αθ (t) = λγ(λt)γ−1 exp(X T β), (1)
where θ = (λ, γ, β); β denotes the regression parameters.
• The hazard (1) may be written as a so-called proportional hazards model
λ0 (t) exp(X T β),
• Likelihood function based on Ti = Ti∗ ∧ Ci Di = I(Ti∗ ≤ Ci ), Xi (ignoring

censoring part) is :
R Ti
αθ (t) dt
Y
L(θ) = αθ (Ti )Di e− 0
• Standard MLE applies,

√
Σ̂ = I −1 (θ̂)
D
n(θ̂ − θ)→N(0, Σ),
where I(θ) is minus the derivative of U(θ).

• How would you compute θ̂ if you were to implement it in R?
Slide 7/51
Parametric models - the Weibull model
• Do exercise 3.7 (a) in MS, p. 76.

• Read in the melanoma-data, and fit the Weibull-model with only sex in the
model, see p. 69 in MS
• Any suggestion on how to check the validity of the Weibull-model in this
case?

Slide 8/51
Cox-Regression
• Cox-regression is the regression technique for survival analysis, Cox

(1972).
• Per 2005. The two most cited statistical papers:
I (1) With 25,869 citations (currently cited 1,984 times per year): Kaplan, E. L.
and Meier, P. (1958) Nonparametric estimation from incomplete observations,

Journal of the American Statistical Association, 53, pp. 457-481.
I (2) With 18,193 citations (1,342 per year), Cox, D. R. (1972) Regression
models and life tables, Journal of the Royal Statistical Society, Series B, 34,
pp. 187-220.
• Regression techniques are very useful for dealing with many covariates
• Can be used to learn about treatment effect while correcting for other
covariates.
• Cox model:
λi (t) = λ0 (t) exp(β1 Xi1 + ... + βp Xip ),
where λ0 (t) is baseline hazard for a subject with covariates 0.
• Note: λ0 (t) is not further specified!

Slide 9/51
The Cox model
• The regression coefficients β1 , ..., βp represent the effects of the

covariates.
• β1 is the effect of Xi1 when we have corrected for the other covariates.
• β1 may be interpreted in terms of the relative risk when the covariate Xi1 is
increased 1:
λ0 (t) exp(β1 (Xi1 + 1) + ... + βp Xip )
= exp (β1 )
λ0 (t) exp(β1 Xi1 + ... + βp Xip )
• If β1 > 0 the risk of dying increases as Xi1 increases, and if β1 < 0 the risk
of dying decreases as Xi1 increases.
• The quantity
β̂1 Xi1 + ... + β̂p Xip
is called the prognostic index for the ith subject.

Slide 10/51
Two-Sample Cox Model
• Consider a simple 2-sample situation where we wish to study effect of sex.

• Defining a covariate for the ith patient

1 Male
Xi =
0 Female
• The hazard can be written as
λi (t) = λ0 (t) exp(βXi )
giving the contributions λ0 (t) exp(β) when Xi = 1 and λ0 (t) when Xi = 0.

• Note again that λ0 (t) is unspecified meaning that which distribution is
unspecified?
• In a moment we derive estimators for the Cox model, but let’s already use
them. Can be calculated in R (or SAS, Stata, SPSS).
• http://nyheder.ku.dk/alle_nyheder/2012/2012.9/
forskere-paaviser-klar-sammenhaeng-mellem-antistof-og-leddegigt/
• BMJ-paper

Slide 11/51
Example: Melanoma data
Consider the Cox-model for the melanoma with the explanatory variables sex
and log(thickness) (lthick):
λi (t) = λ0 (t) exp (β1 · sexi + β2 · lti )
We get:
> melanoma$lthick=log(melanoma$thick)
> fit=coxph(Surv(days,status==1)~ factor(sex)+lthick,data=melanoma)
> fit
Call:
coxph(formula = Surv(days, status == 1) ~ factor(sex) + lthick,
data = melanoma)
coef exp(coef) se(coef) z p

factor(sex)1 0.458 1.58 0.269 1.70 8.8e-02
lthick 0.781 2.18 0.157 4.96 6.9e-07
Likelihood ratio test=33.5 on 2 df, p=5.45e-08 n= 205
What does the relative risks of 1.58 and 2.18 mean?

Slide 12/51
PBC-data
PBC data (primary biliary cirrhosis): 418 patients are followed until death or
censoring.
PBC is a fatal chronic liver disease.
Important explanatory variables:
• Age: in years
• Albumin: serum albumin (mg/dl)
• Bilirubin: serum bilirunbin (mg/dl)
• Edema: 0 no edema, 0.5 untreated or successfully treated 1 edema
despite diuretic therapy
• Prothrombin time: standardised blood clotting time

Slide 13/51
Fitting Cox’s model in R
> library(survival)
> data(pbc)
> with(pbc,cbind(time,status,age,edema,bili,protime,albumin)[1:5,])
time status age edema bili protime albumin
[1,] 400 2 58.76523 1.0 14.5 12.2 2.60
[2,] 4500 0 56.44627 0.0 1.1 10.6 4.14
[3,] 1012 2 70.07255 0.5 1.4 12.0 3.48
[4,] 1925 2 54.74059 0.5 1.8 10.3 2.54
[5,] 1504 1 38.10541 0.0 3.4 10.9 3.53
>
> fit.pbc<-coxph(Surv(time, status==2) ~ age+factor(edema)+log(bili)+log(protime)+
log(albumin),data=pbc)
> fit.pbc
Call:
coxph(formula = Surv(time, status == 2) ~ age + factor(edema) +
log(bili) + log(protime) + log(albumin), data = pbc)

age 0.0405 1.0413 0.00771 5.25 1.6e-07
factor(edema)0.5 0.2819 1.3256 0.22523 1.25 2.1e-01
factor(edema)1 1.0125 2.7524 0.28975 3.49 4.8e-04
log(bili) 0.8590 2.3609 0.08324 10.32 0.0e+00
log(protime) 2.3599 10.5902 0.77314 3.05 2.3e-03
log(albumin) -2.5158 0.0808 0.65262 -3.85 1.2e-04
Likelihood ratio test=232 on 6 df, p=0 n=416 (2 observations deleted due to missingness)

Slide 14/51
Likelihood construction
• Likelihood function based on Ti = Ti∗ ∧ Ci Di = I(Ti∗ ≤ Ci ), Xi is:

Y R Ti YY Rτ
α(Ti )Di e− 0
α(t) dt
= α(t)∆Ni (t) e− 0 Yi (t)α(t) dt
i i t
I τ is end of observation period
I Yi (t) = I(t ≤ Ti ) is the at risk indicator.

• Can also write it as
YY Rτ
dA(t)∆Ni (t) e− 0 Yi (t)dA(t)
i t
Rt
where A(t) = 0
α(s) ds.
T Rt
• Cox-model: A(t) = A0 (t)eX β
with A0 (t) = 0
α0 (s) ds.

Slide 15/51
Pn
• Let S0 (t, β) = i=1 Yi (t) exp(XiT β).
• Based on martingale equation can you then suggest an estimator of A0 (t)
assuming β is known?

Slide 16/51
• The "estimator" Z t
1
Â0 (t, β) = dN· (s)
0 S0 (t, β)
is called the Breslow-estimator.
• Now plug this estimator into the likelihood function, and arrive at a
function only depending on β!
• This function, L(β) is called Cox’s partial likelihood function.

Slide 17/51
Cox’s proportional hazards model

• The Cox model is
λi (t) = Yi (t)λ0 (t) exp(XiT β), (2)
where X = (X1 , ..., Xp ) is a p-dimensional locally bounded predictable
covariate and Yi (t) is an at risk indicator.
• The regression parameter β is estimated as the maximizer to Cox’s partial
likelihood function
Y Y exp (X T β) ∆Ni (t)
i
L(β) = , (3)
t
S0 (t, β)
i
where
n
X
S0 (t, β) = Yi (t) exp(XiT β).
i=1
• Λ0 (t) is estimated by the Breslow estimator

Z t
1
Λ̂0 (t) = Λ̂0 (t, β̂) = dN· (t). (4)
0 S0 (t, β̂)

Slide 18/51
• We have presented theory with time-const. X ’s, but can easily handle
time-var. covariates.
• Now let’s look into the asymptotical properties of Cox’s estimators.
• The first partial derivative of S0 (t, β) with respect to β is denoted :
n
X
S1 (t, β) = Yi (t) exp(XiT β)Xi
i=1
• Show that β̂ is found as the solution to the score equation U(β̂) = 0,

where
n Z τ
X S1 (t, β)
U(β) = (Xi (t) − E(t, β))dNi (t) E(t, β) = . (5)
0 S0 (t, β)
i=1
• Show also that U(β0 ) can be written as a martingale evaluated at τ . Here,

β0 denotes the true β.

Slide 19/51
Asymptotic properties
Theorem
• Under the standard conditions, then, as n → ∞,
n−1/2 U(β0 )→N(0, Σ),

D
D
n1/2 (β̂ − β0 ) → N(0, Σ−1 ),
and Σ is estimated consistently by n−1 I(β̂), with I minus the derivative of

U.
• Under the standard conditions, then, as n → ∞,

D
n1/2 (Λ̂0 (t, β̂) − Λ0 (t)) → U(t)
where U(t) is a zero-mean Gaussian process with covariance function

Φ(t). Not a martingale though, why?
Get back to how Φ(t) can be estimated consistently

Slide 20/51
• Key to the proof is that the score evaluated in the true point β0 is a
martingale (evaluated at τ ):
n Z
X τ
U(β0 ) = (Xi − E(t, β0 ))dMi (t) (6)
i=1 0
• The predictable variation process of n−1/2 U(β0 ) is

n Z
X τ
hn−1/2 U(β0 )i = n−1 (Xi (t) − E(t, β0 ))⊗2 Yi (t) exp(XiT (t)β0 )dΛ0 (t)
i=1 0
Z τ
−1 P
=n V (t, β0 )S0 (t, β0 )dΛ0 (t)→Σ.
0
where
S2 (t, β)
V (t, β) = − E(t, β)⊗2 . (7)
S0 (t, β)
• Lindeberg condition of martingale CLT is also fulfilled

Slide 21/51
• It follows that U(β0 ) converges in distribution to a normal variate with

zero-mean and variance Σ.
• A Taylor series expansion of the score gives
n1/2 (β̂ − β0 ) = (n−1 I(β ∗ ))−1 n−1/2 U(β0 ),
where β ∗ is on the line segment between β0 and β̂.

• Consistency of β̂ and the results above give that n1/2 (β̂ − β0 ) converges
to the postulated normal distribution.

Slide 22/51
IID decomposition
• The score process evaluated at β0
n
X
n−1/2 U(β0 , t) = n−1/2 1i (t) + op (1)
i=1
where
t
Z
s1 (β0 , t)
1i (t) = Xi − dMi (s)
0 s0 (β0 , t)
with sj (β0 , t) the limit in prob. of n−1 Sj (β0 , t).

• That is a sum of zero-mean iid terms!
• The variance of n1/2 (β̂ − β0 ) may be estimated consistently by
( n
)
−1
X
Σ̃ = nI (β̂, τ ) ˆ⊗2
1i (τ ) I −1 (β̂, τ ).
i=1
where ˆ1i (τ ) is given by 1i (τ ) replacing unknowns with their empirical

counterparts.
Slide 23/51
IID decomposition
• More important! Try the same for the Breslow-estimator.
n1/2 (Λ̂0 (t, β̂) − Λ0 (t)) ≈ n1/2 (Λ̂0 (t, β0 ) − Λ0 (t)) + Dβ Λ̂0 (t, β̂)n1/2 (β̂ − β0 )
• Dβ Λ̂0 (t, β̂) conv. in probability so last term is essentially a sum of

zero-mean iid’s.
• But also
Z t
1
n1/2 (Λ̂0 (t, β0 ) − Λ0 (t)) = dM· (s)
0 S 0 (s, β)
Z t
1
≈ n−1/2 dMi (s),
0 s0 (s, β)
P
since n−1 S0 (s, β)→s0 (s, β).

Slide 24/51
IID decomposition
• It thus follows that
n1/2 (Λ̂0 (t, β̂) − Λ0 (t))
is asymptotically equivalent to
n
X
n−1/2 2i (t), (8)
i=1
where
2i (t) =3i (t) + H T (β0 , t)I(β0 , τ )−1 1i (τ ),

Z t
3i (t) = S0−1 (β0 , s)dMi (s),
0
Rt
and where H(β, t) is the derivative of 0
S0−1 (β, s)dN.(s) wrt β.
• The variance of n1/2 (Λ̂0 (t, β̂) − Λ0 (t)) thus may be estimated by
n
X
n−1 ⊗2
2i (t)
i=1

Slide 25/51
Resampling
• Now, n1/2 (Λ̂0 (t, β̂) − Λ0 (t)) is asymptotically equivalent to

n
X
n−1/2 ˆ2i (t)Gi (9)
i=1
where G1 , ..., Gn are iid N(0, 1).

• This is very useful for constructing confidence bands for the survival
predictions and to make tests about the baseline and survival function
predictions.
• Taylor expansion:
n1/2 (S0 (Λ̂0 , β̂, t) − S0 (Λ0 , β, t)) = −S0 (Λ0 , β, t)

n o
n1/2 exp(X0T β)(Λ̂0 (t) − Λ0 (t)) + Λ0 (t) exp(X0T β)X0T (β̂ − β) . (10)

Slide 26/51
Resampling
• Just saw that : n1/2 (β̂ − β, Λ̂0 − Λ0 ) was asymptotically equivalent to the
processes
n
X
∆1 = n−1/2 ˆ1i Gi ,
i=1
n
X
∆2 (t) = n−1/2 ˆ2i (t)Gi ,
i=1
where G1 , ..., Gn are independent standard normals.

• It follows that n1/2 (Ŝ0 − S0 ) has the same asymptotic distribution as
n
X
∆S (t) = S0 (t)n−1/2 ˆ4i (t)Gi , (11)
i=1
where
4i (t) = exp(Z0T β)2i (t) + A(t) exp(X0T β)X0T 1i .

Slide 27/51
Cox-Regression: checking assumptions
Wish to study a treatment effect while correcting for other variables. May use
Cox-model:
λi (t) = λ0 (t) exp(β1 Xi1 + ... + βp Xip )
λ0 (t) is the baseline hazard for a subject with covariates 0.
The Cox models ability to deal with many covariates comes from the
regression structure. Some assumptions have been made
• The effects of covariates are additive and linear on the log risk scale.
• If covariates interact with each other the regression model should include
interaction terms.
• The relative risk between the hazard rate for two subjects is constant over
time
λ0i (t)
c(β1 , ..., βp ) =
λi (t)
We will try to check all these assumptions with the latter being the key
assumption.

Slide 28/51
Interaction between categorical and cont. variables
Melanoma data:
Consider the Cox-model for melanoma data with explanatory variables sex and
log(thickness) (lt):
λi (t) = λ0 (t) exp (β1 · sexi + β2 · lti )
How do we check for interaction in this situation?

• Group the continuous variable (log(thickness)) and proceed as before.
• Allow for two different β2 ’s, one for each value of sex.
• Fit this latter model in R, and interpret output.

Slide 29/51
Interaction between categorical and cont. variables
> fit=coxph(Surv(days,status==1)~ factor(sex)+factor(sex)*lthick,data=melanoma)

> fit
Call:
coxph(formula = Surv(days, status == 1) ~ factor(sex) + factor(sex) *
lthick, data = melanoma)

factor(sex)1 1.111 3.037 1.817 0.612 5.4e-01
lthick 0.834 2.302 0.214 3.895 9.8e-05
factor(sex)1:lthick -0.113 0.893 0.311 -0.363 7.2e-01
Cox-model: λ0 (t) exp (β1 · sex + β2 · lt + β3 · sexlt)

with

β1 + (β2 + β3 ) · lt, sex = 1
β1 · sex + β2 · lt + β3 · sexlt =
β2 · lt, sex = 0
• The baseline λ0 (t) is the hazard for?

• The effect of lt is estimated to ? for male and to ? for female
• Is there an interaction between sex and lt?

Slide 30/51
Checking proportionality
• Constant relative risk between the hazard rates for two subjects
λ0i (t)
c(β1 , ..., βp ) =
λi (t)
• In the 2-sample case the assumption is equivalent to proportional
intensities for the two groups, i.e.,
exp(β1 )λ1 (t) = λ2 (t).
• Implies that the cumulative intensities are proportional

Z t
Λ2 (t) = λ2 (s)ds = exp(β1 )Λ1 (t)
0
• A plot of Λ̂2 (t) versus Λ̂1 (t) should give a straight line through (0,0) with
slope exp(β1 ).
• Similarly
log(Λ2 (t)) − log(Λ1 (t)) = β1
so plotting log(Λ̂k (t)), k = 1, 2, versus t should give parallel curves.
Slide 31/51
The Stratified Cox model
The stratified Cox model contains different baselines for different strata :
λik (t) = λk (t) exp(β1 Xi1 + ... + βp Xip ) k = 1, ..., K
λk (t) is the baseline hazard for a subject in strata k .

• The regression coefficients β1 , ..., βp represent the effects of the
covariates as in the simple Cox regression model.
• Melanoma-data: we might stratify according to ulceration.
• The baselines give the mortality of the two defined by ulceration (yes/no)
when all other covariates are 0.

Slide 32/51
Log-cumulative hazard plot

> fit=coxph(Surv(days,status==1)~ factor(sex)+lthick+strata(ulc),data=melanoma)
> fit.detail=coxph.detail(fit)
> logcum1=log(cumsum(fit.detail$hazard[c(17:57)]))
> logcum0=log(cumsum(fit.detail$hazard[1:16]))
> plot(c(fit.detail$time[17:57],3700),c(logcum1,logcum1[41]),type=’s’,xlab="Time (in days)",ylab="Logcums")
> lines(c(fit.detail$time[1:16],3700),c(logcum0,logcum0[16]),type=’s’,lty=2)
-1
-2
Logcums
-3
-4
-5
500 1000 1500 2000 2500 3000 3500
Time (in days)
Parallel?
Slide 33/51
Checking Proportionality: tests
• Graphical test may reveal serious violations, but they may be difficult to
use in general.
• If effect is expected to wear off with time or to increase with time, one may
try to describe this effect and then fit a Cox model with a time-varying
explanatory variable f (t)
λ1 (t)
exp(β + θ · f (t)) =
λ2 (t)
• For a given choice of f (t) we can then test if θ is 0, i.e, if the time-varying
effect can be left out of the model. A common choice of f (t) is ln(t). More
specialized methods and programs are available.

Slide 34/51
> fit=coxph(Surv(days/365,status==1)~ factor(sex)+ulc+lthick,data=melanoma)

> time.test=cox.zph(fit,transform="log")
> time.test
rho chisq p
factor(sex)1 -0.0627 0.230 0.6312
ulc -0.1335 0.956 0.3283
lthick -0.2942 4.022 0.0449
GLOBAL NA 8.773 0.0325
Based on this, the Cox model is rejected.

Slide 35/51
• Another option is to have a time-changing effect of ulc.

• Imagine for instance that effect of ulc is most important in an initial phase.
We try the first 4 years, approx 1400 days.
• We hence replace βulc · ulc by βulc (t) · ulc with
βulc (t) = βulc,1 + βulc,2 I(t > 1400)
• Note that
βulc (t) · ulc = (βulc,1 + βulc,2 I(t > 1400)) · ulc

= βulc,1 · ulc + βulc,2 · X (t)
with
X (t) = I(t > 1400)) · ulc
being a time-varying covariate.
• So this is just a Cox-model again, but now with a time-varying covariate.

Slide 36/51
Time-changing effect of covariates
• Such a model can easily be fitted in R using the survSplit-function.

> melanoma1=survSplit(melanoma,cut=c(1400),end="days",start="start",
event="status")
> melanoma1$ulcnew=melanoma1$ulc*as.numeric(melanoma1$days>1400)
>
> fit1=coxph(Surv(start,days,status==1)~ factor(sex)+lthick+
actor(ulc)+ factor(ulcnew), data=melanoma1)
> summary(fit1)
Call:
coxph(formula = Surv(start, days, status == 1) ~ factor(sex) +
lthick + factor(ulc) + factor(ulcnew), data = melanoma1)
n= 367
coef exp(coef) se(coef) z Pr(>|z|)

factor(sex)1 0.3744 1.4541 0.2701 1.386 0.165759
lthick 0.5741 1.7756 0.1801 3.187 0.001436 **
factor(ulc)1 1.6967 5.4561 0.5024 3.377 0.000732 ***
factor(ulcnew)1 -1.5515 0.2119 0.6451 -2.405 0.016170 *
---
• Relative risk Ulc/no Ulc in the first 4 years, and after the first 4 years?
Conclusion?

Slide 37/51
• The procedures described above are the traditional goodness-of-fit

tools.
• Make tests against specific deviations: Replace X1 with (X1 , X1 (log (t))),
say (β1 → β1 + βp+1 · log (t)). Test the null βp+1 = 0.
• Graphical method
• Time-changing effect.
These methods are quite useful but also have some limitations:
• Graphical method:
I Not parallel. What is acceptable?
I What if a given covariate is continuous?
• Test: Ad hoc method. Which transformation to use?
• Time-changing effect. Also ad-hoc in that the time where effects changes
are rarely known in practice.
• All methods: They assume that model is ok for all the other covariates.

Slide 38/51
Cumulative martingale residuals
Alternative: Cumulative martingale residuals, Lin, Wein and Ying (1993).

The martingales under the Cox regression model can be written as
Z t
Mi (t) = Ni (t) − Yi (s) exp(X T β)dΛ0 (s)
0
Z t
M̂i (t) = Ni (t) − Yi (s) exp(X T β̂)d Λ̂0 (s)
0
Z t
1
= Ni (t) − Yi (s) exp(XiT (s)β̂) dN· (s).
0 S0 (s, β̂)
One idea is now to look at different groupings of the these residuals and see if
they behave as they should under the model.

Slide 39/51
The score function, evaluated in the estimate β̂, and seen as a function of time,
can for example be written as
n Z t n Z t
X S1 (s, β̂) X
U(β̂, t) = (Xi − )dNi (s) = Xi d M̂i (s)
i=1 0 S0 (s, β̂) i=1 0
so it is equivalent to cumulating the martingale residuals (in certain way). Also,

by Taylor series expansion,
n−1/2 U(β̂, t) ≈ n−1/2 U(β0 , t) − (n−1 I(β̂, t))n1/2 (β̂ − β)

n1/2 (β̂ − β) ≈ (n−1 I(β̂, τ ))−1 n−1/2 U(β0 , τ )
Hence
n o
n−1/2 U(β̂, t) ≈ n−1/2 U(β0 , t) − I(β̂, t)(I(β̂, τ ))−1 U(β0 , τ )

Slide 40/51
n−1/2 U(β̂, t) is thus asymptotically equivalent to the process

n−1/2 M1 (t) − I(t, β̂)I −1 (τ, β̂)M1 (τ ) , (12)
where
n
X n Z
X t
M1 (t) = M1i (t) = (Xi (s) − e(s, β0 ))dMi (s)
i=1 i=1 0
with e(t, β0 ) = limp E(t, β0 ).
Note: n−1/2 U(β̂, τ ) = 0

Slide 41/51

The distribution of the process n−1/2 M1 (t) (t ∈ [0, τ ]) is asymptotically
equivalent to
n Z
X t
n−1/2 (Xi (s) − E(s, β̂))dNi (s)Gi
i=1 0
where G1 , ..., Gn are independent standard normals.

Since Mi ’s are i.i.d. with variance E(Ni ). Alternatively to this resampling
approach one may also ([?]), show
n
X Z t
n−1/2 M̂1i (t)Gi , with M̂1i (t) = (Xi (s) − E(s, β̂))d M̂i (s).
i=1 0
Therefore it can be shown that n1/2 U(β̂, t) is asymptotically equivalent to

n
X
n−1/2 M̂1i (t) − I(t, β̂)I −1 (τ, β̂)M̂1i (τ ) Gi ,
i=1
for almost any sequence of the counting processes that determines M̂ and I.
Slide 42/51
To summarize the cumulative score process plots one may look at test
statistics like
sup |Uj (β̂, t)| |, j = 1 . . . , p,

t∈[0,τ ]

Slide 43/51
PBC-data
> library(timereg)
> fit=cox.aalen(Surv(days,status==1)~ prop(factor(sex))+prop(lthick)+
prop(ulc), data=melanoma)
> summary(fit)
Proportional Cox terms :

Coef. SE Robust SE D2log(L)^-1 z P-val
prop(factor(sex))1 0.381 0.274 0.281 0.271 1.36 0.174000
prop(lthick) 0.576 0.162 0.172 0.179 3.34 0.000847
prop(ulc) 0.939 0.315 0.307 0.324 3.06 0.002230
Test for Proportionality
sup| hat U(t) | p-value H_0
prop(factor(sex))1 3.27 0.282
prop(lthick) 8.17 0.012
prop(ulc) 4.31 0.054
> plot(fit,score=T)

Slide 44/51
PBC-data
prop(factor(sex))1 prop(lthick)
10
6
Cumulative coefficients
4
5
2
0
0
-2
-5
-4
-10
-6
0 500 1500 2500 0 500 1500 2500
Time Time
prop(ulc)
6
4
2
0
-2
-4
-6
0 500 1500 2500
Time

Slide 45/51
Cumulative Residuals
• LWY also suggested :

Z t
Mc (t, z) = KzT (s)d M̂(s)
0
where Kz (t) is an n × 1 matrix with elements I(Xi1 (t) ≤ z) for i = 1, . . . , n

focusing here on the first continuous covariate X1 , say.
• Thus cumulating residuals versus both time and the covariate values.
• A 1-dimensional process :
Z τ
Mc (z) = KzT (t)d M̂(t), (13)
0
which can be plotted against z.

Rτ
• If X1 (t) = X1 then 0 KzT (t)d M̂(t) =
P
i M̂i (τ )I(Xi1 ≤ z) which is a
process in z.
• This will reveal if the functional form is appropriate. These processes can
also be resampled.

Slide 46/51
Cumulative residuals
> fit.gof=cox.aalen(Surv(days,status==1)~ prop(lthick),melanoma,

residuals=1)
>
> resids=cum.residuals(fit.gof,melanoma,cum.resid=1)
> summary(resids)
Test for cumulative MG-residuals
Grouped cumulative residuals not computed, you must provide

modelmatrix to get these (see help)
Residual versus covariates consistent with model
sup| hat B(t) | p-value H_0: B(t)=0

5.772 0.512
Call:
cum.residuals(fit.gof, melanoma, cum.resid = 1)

Slide 47/51
10
5
0
-5
-10
3 4 5 6 7

Slide 48/51
> fit.gof=cox.aalen(Surv(days,status==1)~ prop(thick),melanoma,

residuals=1)
> resids=cum.residuals(fit.gof,melanoma,cum.resid=1)
> summary(resids)
Test for cumulative MG-residuals
Grouped cumulative residuals not computed, you must provide

modelmatrix to get these (see help)
Residual versus covariates consistent with model
sup| hat B(t) | p-value H_0: B(t)=0

10.885 0.022
Call:
cum.residuals(fit.gof, melanoma, cum.resid = 1)

Slide 49/51
10
5
0
-5
-10
0 500 1000 1500

Slide 50/51
Summary
• Cox’s proportional hazards model.

• Nice model with easily interpretable parameters - relative risks!
• Is is used heavily in Biostatistcs.
• Are the relative risks really not depending on time? Check model carefully.
• Resampling techniques useful for evaluating asymptotics distributions.
• Useful goodness-of-fit based on cumulative residuals with p-values.

Slide 51/51

Cox Regression

Uploaded by

Document Information

Original Description:

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Cox Regression

Uploaded by

Copyright:

Available Formats

UNIVERSITY OF COPENHAGEN DEPARTMENT OF BIOSTATISTICS

Faculty of Health Sciences

20. september 2012

Standard setup for right-censored survival data. IID copies of (T , D) where

Counting process: Ni (t) = I(Ti ≤ t, Di = 1)

Martingale: Mi (t) = Ni (t) − Λi (t)

Torben Martinussen — Cox regression — 20. september 2012

Modelling Survival Data

• To model and analyze survival data and deal with censorings in a

α(t) = risk of dying among people at risk at time t

• The hazard function is closely related to the survival probability

Torben Martinussen — Cox regression — 20. september 2012

• To give precise and clear answers. Kaplan-Meier curves and log-rank

• Should be approximately linear if the model is correct

Torben Martinussen — Cox regression — 20. september 2012

Torben Martinussen — Cox regression — 20. september 2012

N time status sex age year thickness ulcer

Torben Martinussen — Cox regression — 20. september 2012

Parametric models - the Weibull model

λ0 (t) exp(X T β),

• Likelihood function based on Ti = Ti∗ ∧ Ci Di = I(Ti∗ ≤ Ci ), Xi (ignoring

• Standard MLE applies,

where I(θ) is minus the derivative of U(θ).

Parametric models - the Weibull model

• Do exercise 3.7 (a) in MS, p. 76.

Torben Martinussen — Cox regression — 20. september 2012

• Cox-regression is the regression technique for survival analysis, Cox

and Meier, P. (1958) Nonparametric estimation from incomplete observations,

Torben Martinussen — Cox regression — 20. september 2012

The Cox model

• The regression coefficients β1 , ..., βp represent the effects of the

Torben Martinussen — Cox regression — 20. september 2012

Two-Sample Cox Model

• Consider a simple 2-sample situation where we wish to study effect of sex.

• The hazard can be written as

λi (t) = λ0 (t) exp(βXi )

giving the contributions λ0 (t) exp(β) when Xi = 1 and λ0 (t) when Xi = 0.

Torben Martinussen — Cox regression — 20. september 2012

Example: Melanoma data

λi (t) = λ0 (t) exp (β1 · sexi + β2 · lti )

coef exp(coef) se(coef) z p

Likelihood ratio test=33.5 on 2 df, p=5.45e-08 n= 205

What does the relative risks of 1.58 and 2.18 mean?

Torben Martinussen — Cox regression — 20. september 2012

Torben Martinussen — Cox regression — 20. september 2012

Fitting Cox’s model in R

coef exp(coef) se(coef) z p

Torben Martinussen — Cox regression — 20. september 2012

• Likelihood function based on Ti = Ti∗ ∧ Ci Di = I(Ti∗ ≤ Ci ), Xi is:

I τ is end of observation period

I Yi (t) = I(t ≤ Ti ) is the at risk indicator.

Torben Martinussen — Cox regression — 20. september 2012

Torben Martinussen — Cox regression — 20. september 2012

Torben Martinussen — Cox regression — 20. september 2012

Cox’s proportional hazards model

• Λ0 (t) is estimated by the Breslow estimator

Torben Martinussen — Cox regression — 20. september 2012

Cox’s proportional hazards model

• Show that β̂ is found as the solution to the score equation U(β̂) = 0,

• Show also that U(β0 ) can be written as a martingale evaluated at τ . Here,

Torben Martinussen — Cox regression — 20. september 2012

n−1/2 U(β0 )→N(0, Σ),

and Σ is estimated consistently by n−1 I(β̂), with I minus the derivative of

where ˆ1i (τ ) is given by 1i (τ ) replacing unknowns with their empirical

2i (t) =3i (t) + H T (β0 , t)I(β0 , τ )−1 1i (τ ),

4i (t) = exp(Z0T β)2i (t) + A(t) exp(X0T β)X0T 1i .