Gap Time Distributions: DFCI Biostat, Nov 12, 1999

Gap Time Distributions
Interested in the distribution of $T_2 - T_1$, where $T_1$ = time from baseline to event 1 and $T_2$ = time from baseline to event 2.

Examples:
- Duration of response. $T_1$ = time to start of response, $T_2$ = time to relapse/progression.
- Survival beyond progression. $T_1$ = time to progression, $T_2$ = time to death.
- Recurrent events (time between $j$th and $(j+1)$st). $T_1$ = time to $j$th event, $T_2$ = time to $(j+1)$st event.
Comparisons of treatment groups are primarily descriptive. E.g., consider duration of response:
- only a subset of cases respond
- responders from different treatment groups may differ in ways other than treatment received
- thus cannot use the baseline randomization to infer a causal effect of treatment

Terminology and examples used in the rest of the talk will be for survival beyond progression, but the same issues arise for other gap time distribution inference problems.
E9486 Multiple Myeloma
Groups defined by markers measured at time of progression. Is survival beyond progression different among groups?
451 cases with disease progression, 413 dead, 41 still alive
Following slides: Plots of the raw data, and KM estimates of survival beyond progression for groups formed by time of progression, show strong evidence for dependence.
[Figure: E9486. Scatter plot of survtime - progtime against progtime (years), with dead and alive cases marked.]

[Figure: Survival Beyond Progression by Year of Progression. Kaplan-Meier curves (probability against years), P = 0.0000000000039. Groups by year of progression: 0-1 (98 events / 98 cases), 1-2 (88 events / 91 cases), 2-3 (129 events / 131 cases), >3 (98 events / 134 cases).]
Another approach: fit a proportional hazards model with response = survival beyond progression, covariate = time to progression. Use penalized partial likelihood to give a flexible estimate of the effect.

    z <- cox.spline(a,usurv-uprogtime,ustat,uprogtime)
    cox.spline.plot(z,1, font=5, lwd=2,xlab="Time of Progression",
                    main="Survival Beyond Progression Hazard Ratio")
[Figure: Survival Beyond Progression Hazard Ratio. Log hazard ratio plotted against time of progression.]
Identifiability Issues

Let
$T_1$ = time from baseline to progression,
$T_2$ = time from baseline to death,
$F_1(x) = P(T_1 \le x)$,
$S(y \mid x) = P(T_2 - T_1 > y \mid T_1 = x)$,
and
$S(y) = P(T_2 - T_1 > y)$.

Then

$$S(y) = \int_0^\infty S(y \mid x)\, dF_1(x).$$

That is, the marginal distribution $S(y)$ is the weighted average of the conditional distributions $S(y \mid x)$. Weight given to $S(y \mid x)$ is the probability (density) that $T_1 = x$.
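As a numerical illustration of the weighted-average identity, here is a minimal Python sketch. It assumes, purely for illustration, the model used in the simulation later in the talk: T1 exponential with rate 1, and the gap time given T1 = x Weibull with shape 2 and scale 1 + 0.5x. The function names cond_surv and marginal_surv are hypothetical, and the midpoint rule with a finite upper limit is just one convenient way to approximate the integral.

```python
import math

def cond_surv(y, x):
    """S(y|x) under the assumed model: gap | T1 = x is Weibull(shape 2, scale 1 + 0.5x)."""
    return math.exp(-((y / (1.0 + 0.5 * x)) ** 2))

def marginal_surv(y, upper=40.0, steps=40000):
    """Approximate S(y) = integral of S(y|x) dF1(x), with F1 = Exp(1), by the midpoint rule."""
    h = upper / steps
    total = 0.0
    for k in range(steps):
        x = (k + 0.5) * h
        # The weight on S(y|x) is the density of T1 at x, here exp(-x)
        total += cond_surv(y, x) * math.exp(-x) * h
    return total

for y in (0.5, 1.0):
    print("y =", y,
          " S(y|0) =", round(cond_surv(y, 0.0), 4),
          " S(y|2) =", round(cond_surv(y, 2.0), 4),
          " S(y) =", round(marginal_surv(y), 4))
```

Under this assumed model the conditional curves increase with x, so the marginal value lies between the conditional value for early progression and that for late progression.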
With limited follow-up, $S(y \mid x)$ is not identifiable or estimable for all $(x, y)$. Let $c$ = the maximum follow-up. Then the data only contain information on $S(y \mid x)$ for $(x, y)$ combinations with $x + y \le c$. E.g., for E9486, $c = 11.5$ years.

In the plot of the data, $x$ is on the horizontal axis and $y$ on the vertical. Only have information on $S(y \mid x)$ for $(x, y)$ points below the line $y = 11.5 - x$.
[Figure: E9486. y = survtime - progtime plotted against x = progtime (years), dead and alive cases marked, with the lines x = 4.5 and y = 11.5 - x.]
Since we do not have information on $S(y \mid x)$ for all $(x, y)$, we cannot estimate the marginal distribution $S(y)$.

Exception: If $T_1 \le b$ always, and $b < c$, then $S(y)$ is estimable for $y \le c - b$.

E.g., for duration of response, if response always occurs within 6 months of entry when it occurs at all, and $c = 5$ years, then $S(y)$ is estimable for $y \le 4.5$.
Dependent Censoring

Let $C$ be the potential censoring time measured from baseline. Suppose $C$ is independent of $(T_1, T_2)$. Actually observe $\min\{T_1, C\}$, $I(T_1 \le C)$, $\min\{T_2, C\}$, $I(T_2 \le C)$.

For the gap time distribution, the failure time is $T_2 - T_1$ and the censoring time is $C - T_1$. If $T_2 - T_1$ is correlated with $T_1$, then $C - T_1$ will be correlated with $T_2 - T_1$, so the censoring and failure times will not be independent when measured from $T_1$.

In E9486, censoring from the accrual and follow-up periods should be roughly uniform over $(7, 11.5)$, which is the region between the two diagonal lines, below. (Of course, our follow-up is not quite that consistent or reliable.)
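The induced dependence can be checked with a small simulation. The Python sketch below is only an illustration: it assumes the model from the simulation later in the talk (T1 exponential with rate 1, gap given T1 Weibull with shape 2 and scale 1 + 0.5 T1, C uniform on (0, 2.5)) and works with the latent, uncensored quantities. The helper names pearson and simulate are hypothetical.

```python
import math
import random

def pearson(a, b):
    """Sample Pearson correlation of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / math.sqrt(saa * sbb)

def simulate(n=20000, mc=2.5, seed=1):
    """Draw latent (T1, gap, C) under the assumed model; C is independent of (T1, T2)."""
    rng = random.Random(seed)
    t1 = [rng.expovariate(1.0) for _ in range(n)]
    # random.weibullvariate(alpha, beta): alpha is the scale, beta is the shape
    gap = [rng.weibullvariate(1.0 + 0.5 * x, 2.0) for x in t1]
    c = [rng.uniform(0.0, mc) for _ in range(n)]
    gap_cens = [ci - x for ci, x in zip(c, t1)]  # censoring time measured from T1
    return t1, gap, c, gap_cens

t1, gap, c, gap_cens = simulate()
print("cor(T1, T2 - T1)     =", round(pearson(t1, gap), 3))
print("cor(C, T2 - T1)      =", round(pearson(c, gap), 3))
print("cor(C - T1, T2 - T1) =", round(pearson(gap_cens, gap), 3))
```

Measured from baseline, the censoring time is essentially uncorrelated with the gap time; measured from T1, it is clearly negatively correlated with it, because subtracting T1 brings in the dependence between T1 and the gap.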
[Figure: E9486. y = survtime - progtime plotted against x = progtime (years), dead and alive cases marked, with the diagonal lines y = 7 - x and y = 11.5 - x.]
On the gap-time scale, consider the subjects at risk at $y = 1$ year.
- Subjects censored at this time will all have long times to progression.
- Since they have long times to progression, they will have longer survival beyond progression, because of the correlation between the two quantities.
- Hence the censored subjects are not a random subset of the risk set at any given time (dependent censoring).

Thus standard methods applied to the marginal gap time data (e.g., Kaplan-Meier) will be biased, even when $S(y)$ is identifiable. (They will be unbiased for $y$ up to the difference between the minimum of the support of the censoring distribution and the maximum of the support of the progression distribution, since no gap time can be censored before that point.)
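A small simulation makes the bias visible. The Python sketch below is an illustration under assumptions, not the analysis from the talk: it uses the model from the simulation later in the talk (T1 exponential with rate 1, gap given T1 Weibull with shape 2 and scale 1 + 0.5 T1, C uniform on (0, 2.5)), a deliberately large sample so that sampling noise is small, and a hand-written Kaplan-Meier routine. The names km_surv and gap_data are hypothetical.

```python
import random

def km_surv(times, events, t):
    """Kaplan-Meier estimate of P(T > t); events are 1 for failure, 0 for censored."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    i = 0
    while i < len(data) and data[i][0] <= t:
        cur = data[i][0]
        d = 0  # failures at this time
        m = 0  # all observations (failures and censorings) at this time
        while i < len(data) and data[i][0] == cur:
            d += data[i][1]
            m += 1
            i += 1
        if d:
            surv *= 1.0 - d / at_risk
        at_risk -= m
    return surv

def gap_data(n=20000, mc=2.5, seed=2):
    """Latent gap times for everyone, plus observed gap data for observed progressions."""
    rng = random.Random(seed)
    true_gaps, obs_gap, obs_event = [], [], []
    for _ in range(n):
        t1 = rng.expovariate(1.0)
        gap = rng.weibullvariate(1.0 + 0.5 * t1, 2.0)  # (scale, shape)
        c = rng.uniform(0.0, mc)
        true_gaps.append(gap)
        if t1 < c:  # progression observed before censoring
            obs_gap.append(min(t1 + gap, c) - t1)
            obs_event.append(1 if t1 + gap <= c else 0)
    return true_gaps, obs_gap, obs_event

true_gaps, obs_gap, obs_event = gap_data()
true_marginal = sum(g > 1.0 for g in true_gaps) / len(true_gaps)
km_at_1 = km_surv(obs_gap, obs_event, 1.0)
print("true marginal P(T2 - T1 > 1):", round(true_marginal, 3))
print("KM on observed gap data:     ", round(km_at_1, 3))
```

Because subjects with late progression are censored early on the gap-time scale, the Kaplan-Meier estimate leans toward the shorter gap times of early progressors and falls below the true marginal probability.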
What to Estimate / How to Estimate it?

1. Focus on the conditional distribution $S(y \mid x)$.
- Identifiable for $x + y \le c$.
- $T_2 - T_1$ and $C - T_1$ are conditionally independent given $T_1$, so generally dependent censoring will not be a problem
- Can model the dependence of the distribution of $T_2 - T_1$ on $T_1$ (e.g., Cox model)
- Inferences on other factors from tests and Cox models stratified on TTP groups are approximately valid
- Can give nonparametric kernel-type estimators (e.g., Dabrowska, SJS, 1987). See the function surv.smooth() in the local S library.
Is the marginal distribution really of interest with dependence?

2. Focus on the conditional distribution

$$H(y \mid x) = P(T_2 - T_1 > y \mid T_1 \le x).$$

Identifiable for $x + y \le c$.

Lin, Sun, Ying (Bka, 1999) give a consistent estimator (below), and in unpublished work have developed a generalization of the logrank test.

Lin, Sun, Ying: Let $H(x, y) = P(T_1 \le x,\ T_2 - T_1 > y)$ and $G(t) = P(C > t)$. Index subjects with the subscript $i$, and let $\tilde T_{ji} = \min\{T_{ji}, C_i\}$, $\delta_{ji} = I(T_{ji} \le C_i)$, $j = 1, 2$, $i = 1, \ldots, n$. Note that $H(y \mid x) = H(x, y)/H(x, 0)$.
With no censoring, the EDF

$$\frac{1}{n} \sum_i I(T_{1i} \le x,\ T_{2i} - T_{1i} > y)$$

is unbiased for $H(x, y)$. With censoring, note that $G(t) > 0$ for $t < c$. Also, $I(\tilde T_{2i} - \tilde T_{1i} > y) = 0$ when $\delta_{1i} = 0$. If $x + y \le c$, then

$$E\{ I(\tilde T_{1i} \le x,\ \tilde T_{2i} - \tilde T_{1i} > y)/G(y + \tilde T_{1i}) \mid T_{1i}, T_{2i} \}$$
$$= E\{ I(T_{1i} \le x,\ T_{2i} - T_{1i} > y,\ C_i - T_{1i} > y)/G(y + T_{1i}) \mid T_{1i}, T_{2i} \}$$
$$= I(T_{1i} \le x,\ T_{2i} - T_{1i} > y),$$

so

$$\frac{1}{n} \sum_i I(\tilde T_{1i} \le x,\ \tilde T_{2i} - \tilde T_{1i} > y)/G(y + \tilde T_{1i})$$

is unbiased for $H(x, y)$. Substituting a consistent estimator for $G(t)$ then gives a consistent estimator for $H(x, y)$.

Can use the Kaplan-Meier estimator $\hat G(t)$, computed from the data $(\tilde T_{2i}, 1 - \delta_{2i})$.
Note that the full data set measured from baseline is used to estimate G. Asymptotic variance is not trivial to calculate.
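The inverse-weighting construction above can be sketched in a few lines of Python. This is an illustration under assumptions, not the survbrec() function from the talk: the names km_curve, step_eval and lsy are hypothetical, the Kaplan-Meier routine is hand-written, and the data are simulated from the model used in the simulation later in the talk (T1 exponential with rate 1, gap given T1 Weibull with shape 2 and scale 1 + 0.5 T1, C uniform on (0, 2.5)) with a large sample so that sampling noise is small.

```python
import bisect
import random

def km_curve(times, events):
    """Kaplan-Meier step function: distinct sorted times and the survival value from each time on."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    ts, ss = [], []
    i = 0
    while i < len(data):
        cur, d, m = data[i][0], 0, 0
        while i < len(data) and data[i][0] == cur:
            d += data[i][1]
            m += 1
            i += 1
        surv *= 1.0 - d / at_risk
        at_risk -= m
        ts.append(cur)
        ss.append(surv)
    return ts, ss

def step_eval(ts, ss, t):
    """Right-continuous step function value at t (1.0 before the first time)."""
    k = bisect.bisect_right(ts, t)
    return 1.0 if k == 0 else ss[k - 1]

def lsy(x, y, r_time, r_stat, s_time, s_stat):
    """Sketch of the weighted estimate of H(y|x) = H(x, y) / H(x, 0)."""
    # G is estimated from the full baseline data, with censorings treated as events
    ts, ss = km_curve(s_time, [1 - d for d in s_stat])
    num = den = 0.0
    for r, dr, s in zip(r_time, r_stat, s_time):
        if dr == 1 and r <= x and s - r > 0:
            den += 1.0 / step_eval(ts, ss, r)          # term of H(x, 0)
            if s - r > y:
                num += 1.0 / step_eval(ts, ss, r + y)  # term of H(x, y)
    return num / den if den > 0 else float("nan")

rng = random.Random(3)
n, mc, x, y = 20000, 2.5, 1.0, 1.0   # x + y <= mc, so H(y|x) is identifiable
r_time, r_stat, s_time, s_stat = [], [], [], []
n_cond = n_cond_surv = 0
for _ in range(n):
    t1 = rng.expovariate(1.0)
    t2 = t1 + rng.weibullvariate(1.0 + 0.5 * t1, 2.0)  # (scale, shape)
    c = rng.uniform(0.0, mc)
    if t1 <= x:  # latent truth, available only because the data are simulated
        n_cond += 1
        n_cond_surv += t2 - t1 > y
    r_time.append(min(t1, c))
    r_stat.append(1 if t1 <= c else 0)
    s_time.append(min(t2, c))
    s_stat.append(1 if t2 <= c else 0)

true_cond = n_cond_surv / n_cond
est = lsy(x, y, r_time, r_stat, s_time, s_stat)
print("latent H(1|1) in this sample:", round(true_cond, 3))
print("weighted estimate:           ", round(est, 3))
```

The common factor 1/n cancels in the ratio, so the sketch works with the weighted sums directly, which is also how survbrec() is organized.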
S Function survbrec(): survival beyond recurrence

    survbrec <- function(tp,rtime,rstat,stime,sstat,
                         tp2=maxstime-max(tp),maxstime=max(stime)) {
    ### Estimates conditional gap time distribution
    ###   H(tp[j]|tp2)=P(S-R>tp[j]|R<=tp2)  (tp2 is a scalar)
    ### rtime, stime are potentially censored observations on
    ###   R, S, measured from baseline
    ### rstat, sstat are 1 for events, 0 for censored observations
    ### Generally H is not identifiable if tp+tp2>max(stime) but
    ###   can override the observed max(stime) with maxstime arg.
      out <- tp
      subr <- stime-rtime > 0 & rstat == 1 & rtime <= tp2
      if (length(subr[subr])==0) return(rep(NA,length(tp)))
      if (all(sstat == 1)) {# no censoring, dist identifiable
        h0 <- length(stime[subr])
        for (j in 1:length(tp)) {
          h1 <- length(stime[stime-rtime > tp[j] & subr])
          out[j] <- h1/h0
        }
      } else {
        # Kaplan-Meier estimate of the censoring distribution G
        cd <- survfit(Surv(stime,1-sstat)~1)
        h0 <- sum(1/(summary(cd,times=sort((rtime)[subr]))$surv))
        for (j in 1:length(tp)) {
          if (tp[j]>maxstime-tp2) out[j] <- NA else {
            i <- stime-rtime > tp[j] & subr
            out[j] <- if (length(i[i])==0) 0 else {sum(1/(
              summary(cd,times=sort((rtime+tp[j])[i]))$surv))/h0
            }
          }
        }
      }
      out
    }
    > # 9486; H(i|2)
    > survbrec(1:10,d$progtime,d$progstat,d$survtime,d$survstat,2)
     [1] 0.35475379 0.19061398 0.12707598 0.07412766 0.04264862
     [6] 0.02346133 0.01434252 0.01304146 0.02603643         NA
    > ## NOTE: not monotone
    > # 9486; H(i|5)
    > survbrec(1:7,d$progtime,d$progstat,d$survtime,d$survstat,5)
    [1] 0.51997911 0.30615689 0.20832986 0.12561816 0.06387121
    [6] 0.03879853         NA
Simulation: $T_1 \sim \mathrm{Exp}(1)$, $T_2 - T_1 \mid T_1 \sim \mathrm{Weibull}(1 + .5\,T_1,\ 2)$ (scale, shape), $\mathrm{cor}(T_1, T_2 - T_1) = .53$, $C \sim U(0, 2.5)$, $n = 200$.

On average expect 73 cases without progression, 70 progressed but alive, and 57 progressed and died.
    f1 <- function(tpp,cutoff,n=200,mc=2.5){
      u1 <- rexp(n) #progtime
      u2 <- u1+rweibull(n,shape=2,scale=1+.5*u1) #survtime
      truc <- truu <- tpp
      sub <- u1<=cutoff; nc <- length(sub[sub])
      for (j in 1:length(tpp)) {
        xj <- u2-u1>tpp[j]
        truu[j] <- sum(xj)/n
        truc[j] <- sum(xj & sub)/nc
      }
      ct <- mc*runif(n) #censoring
      i1 <- ifelse(ct<u1,0,1)
      u1 <- pmin(u1,ct)
      i2 <- ifelse(ct<u2,0,1)
      u2 <- pmin(u2,ct)
      d2 <- survbrec(tpp,u1,i1,u2,i2,cutoff,maxstime=mc)
      sub <- u2-u1>0 & i1 == 1
      k1 <- summary(survfit(Surv(u2-u1,i2)~1,subset=sub),
                    times=tpp)$surv
      cbind(truu,truc,d2,k1)
    }
    > ntri <- 500
    > tpp <- c(.5,1)
    > out <- array(NA,c(length(tpp),4,ntri))
    > for (i in 1:ntri) out[,,i] <- f1(tpp,1)
    > dimnames(out) <- list(format(tpp),c("True Unc",
    +   "True Cond","LSY","KM"),NULL)
    > # Estimates of means
    > apply(out,c(1,2),mean)
        True Unc True Cond       LSY        KM
    0.5  0.87127 0.8360188 0.8393883 0.8450474
    1.0  0.59050 0.4958633 0.4990447 0.5027919
    > # Standard errors of means
    > sqrt(apply(out,c(1,2),var)/ntri)
           True Unc   True Cond         LSY          KM
    0.5 0.001071808 0.001477317 0.002585383 0.001610749
    1.0 0.001596997 0.002019595 0.003563854 0.002712360
    > # Estimates of variances
    > apply(out,c(1,2),var)
            True Unc   True Cond         LSY          KM
    0.5 0.0005743859 0.001091232 0.003342102 0.001297256
    1.0 0.0012752004 0.002039382 0.006350528 0.003678449
True Unc = True unconditional probability $S(y)$
True Cond = True conditional probability $H(y \mid x)$
LSY = Lin Sun Ying estimator of $H(y \mid 1)$
KM = Kaplan-Meier applied to the gap time data
- Conditional and marginal distributions are different
- LSY is essentially unbiased for the true conditional distribution
- KM is biased as an estimator of the true unconditional distribution
- It is coincidence that KM is also nearly unbiased for the conditional distribution. The KM would be the same for any value of cutoff above, while $H(y \mid x)$ would vary with $x$ = cutoff.
- Variance of LSY estimator is substantially larger than KM. How efficient is the LSY procedure?
Summary

If the time to the initiating event is correlated with the gap time to the terminating event, then
- In general the marginal gap time distribution is not identifiable
- When it is identifiable, standard methods for inference on the marginal distribution may be invalid due to dependent censoring
- Various conditional distributions can be estimated, and inference should focus on these.

LSY estimator is not monotone, and its efficiency properties are not clear.
