PBHS32700 Lecture15


Lecture 15: More Non-Parametric Estimation Methods and Testing in Survival Analysis

James J. Dignam

Department of Public Health Sciences


University of Chicago

J. Dignam (UChicago) Lecture 15 Mar. 3, 2020 1 / 43


Non-parametric procedures

The last lecture described non-parametric (distribution-free) methods to estimate the survivor function and hazard function: the Kaplan-Meier method and the life-table method.
It also briefly described hazard estimators derived from the KM and life-table survival curve estimates.
Here, a few additional approaches and related quantities are described, followed by a review of commonly used testing procedures after nonparametric estimation.



Life-table estimates of the cumulative hazard function

Since H(t) = −log(S(t)), we can estimate H(t) by

    Ĥ(t) = −log(Ŝ(t)) = −Σ_{j=1}^{k} log(1 − d_j / n′_j)    (1)

where n′_j = n_j − c_j/2, and t is in the interval from t_k to t_{k+1}.


The life-table estimate of the hazard function in the k-th time interval (from t_k to t_{k+1}) is given by

    ĥ(t) = d_k / ((n′_k − d_k/2) τ_k)    (2)

where τ_k = t_{k+1} − t_k.
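Formulas (1) and (2) can be sketched in Python (a hypothetical illustration, not part of the lecture's Stata workflow; the interval counts below are made up):

```python
import math

def life_table_hazards(d, c, n, tau):
    """d[j]: events, c[j]: censored, n[j]: entering interval j, tau[j]: width.
    Returns (cumulative hazard at each interval end, interval hazard)."""
    H, h, cum = [], [], 0.0
    for dj, cj, nj, tj in zip(d, c, n, tau):
        n_prime = nj - cj / 2.0                     # n'_j = n_j - c_j/2
        cum += -math.log(1.0 - dj / n_prime)        # H(t) = -log S(t), eq. (1)
        H.append(cum)
        h.append(dj / ((n_prime - dj / 2.0) * tj))  # interval hazard, eq. (2)
    return H, h

H, h = life_table_hazards(d=[1, 2], c=[0, 2], n=[10, 9], tau=[10, 10])
```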



Time to discontinuation of the use of an IUD

. ltable time status, interval(10) hazard

             Beg.     Cum.     Std.             Std.
  Interval   Total  Failure   Error   Hazard   Error   [95% Conf. Int.]
 -------------------------------------------------------------------------------
   10   20     18   0.1176   0.0781   0.0125   0.0088   0.0000   0.0298
   20   30     14   0.1176   0.0781   0.0000        .        .        .
   30   40     13   0.2588   0.1126   0.0174   0.0123   0.0000   0.0414
   50   60     10   0.3412   0.1267   0.0118   0.0117   0.0000   0.0348
   70   80      7   0.4353   0.1392   0.0154   0.0153   0.0000   0.0454
   90  100      6   0.6235   0.1429   0.0400   0.0277   0.0000   0.0943
  100  110      4   0.7741   0.1448   0.0500   0.0484   0.0000   0.1449
 -------------------------------------------------------------------------------



The Hazard Function - Estimating λ(t )

An alternate way of formulating (similar to actuarial, but also like KM)


Imagine dividing time into small intervals of widths l_1, l_2, ..., l_j, with endpoints τ_0 < τ_1 < τ_2 < ... < τ_j:

    Interval:      (τ_0, τ_1]   (τ_1, τ_2]   ...   (τ_{j−1}, τ_j]
    Events:            d_1          d_2      ...        d_j
    Person-time:       N_1          N_2      ...        N_j

where d_j is the number of events in interval j and N_j is the total person-time observed in interval j.



The Hazard Function - Estimating λ(t )

Assume:
λ_j = event rate for subjects in interval j, constant within the interval
d_j ~ Poisson(µ_j), where µ_j = N_j λ_j
Then the maximum likelihood estimate for λ_j is given by:

    λ̂_j = d_j / N_j

i.e., the estimated rate in interval j equals the observed number of cases divided by the person-time at risk (compare with the life-table estimate of the hazard).
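The MLE is just a ratio per interval, so it is easy to check by hand; a small Python sketch (not the course software), using event counts and person-time taken from the first intervals of the IUD stptime output on the next slide:

```python
# Piecewise-constant (Poisson) model: lambda_j = d_j / N_j in each interval.
events      = [1, 1, 1, 1, 0]            # d_j for (0-10], (10-20], ...
person_time = [180, 160, 133, 114, 100]  # N_j (person-months at risk)
rates = [d / N for d, N in zip(events, person_time)]
```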



Estimating the Hazard
IUD data. Estimate interval hazard rates using average hazard
idea in STATA:
* Need to stset with id() option - make a unique pt identifier
. gen PTid = _n
. stset time, failure(status) id(PTid)
. . .
. stptime, at(10,20,30,40,50,60,70,80,90,100,110)

         Cohort |  person-time   failures       rate     [95% Conf. Interval]
 ---------------+--------------------------------------------------------------
        (0-10]  |      180          1       .00555556    .0007826    .0394393
       (10-20]  |      160          1       .00625       .0008804    .0443692
       (20-30]  |      133          1       .0075188     .0010591    .0533765
       (30-40]  |      114          1       .00877193    .0012356    .0622726
       (40-50]  |      100          0       0                   .           .
       (50-60]  |       89          1       .01123596    .0015827    .0797648
       (60-70]  |       70          0       0                   .           .
       (70-80]  |       65          1       .01538462    .0021671    .1092165
       (80-90]  |       60          0       0                   .           .
 (90-1.0e+02]   |       50          2       .04          .0100039    .1599375
     > 1.0e+02  |       25          1       .04          .0056345    .2839629
 ---------------+--------------------------------------------------------------
         total  |     1046          9       .00860421    .0044769    .0165365

The hazards differ a bit from the actuarial estimates (due to how the time boundaries are defined), but are not inconsistent with those estimates in a small dataset.



Estimating the Hazard

Larger dataset: data on survival among 863 kidney transplant patients.
* Need to stset with id() option
. stset timeyrs, failure(death) id(ptid)
. stptime, at(1,2,3,4,5,6,7,8,9) output("kidney_haz", replace)

        failure _d: death == 1
   analysis time _t: timeyrs
                id: ptid

        Cohort |  person-time   failures       rate     [95% Conf. Interval]
 --------------+---------------------------------------------------------------
       (0 - 1] |   751.41889       65       .08650302    .0678348    .1103087
       (1 - 2] |   612.75222       19       .03100764    .0197783    .0486125
       (2 - 3] |   517.29706       18       .03479625    .0219231    .0552284
       (3 - 4] |    429.5024       13       .03026758    .0175751    .0521265
       (4 - 5] |   347.68925        5       .01438066    .0059856    .03455
       (5 - 6] |    257.2731        7       .02720844    .0129712    .0570726
       (6 - 7] |   181.32786        8       .04411898    .0220638    .0882207
       (7 - 8] |   111.19918        4       .03597149    .0135007    .0958427
       (8 - 9] |   48.233404        1       .02073252    .0029205    .1471816
           > 9 |    3.421628        0       0                   .           .
 --------------+---------------------------------------------------------------
         total |    3260.115      140       .04294327    .0363878    .0506798



Estimating the Hazard - Kidney Data
Plot of λ(t )
. use kidney_haz
. twoway (scatter _Rate _group) if _group < 11, ylabel(0(.02).16, format(%5.3g)) ymtick(, format(%3.0g)) xtitle(year)



The Nelson-Aalen Estimator of Λ(t )

Recall that Λ(t) = −log{S(t)}.

Therefore, a natural estimator of the cumulative hazard function is

    Λ̂(t) = −log{Ŝ_KM(t)}

Another estimator: recall also that Λ(t) = ∫_0^t λ(s) ds, which provides an additional estimator. Using the interval setup from before (interval widths l_j, events d_j, person-time N_j), with λ̂_j = d_j / N_j,

    Λ̂(t) = ∫_0^t λ̂(s) ds = Σ_{t_j ≤ t} l_j (d_j / N_j)



The Nelson-Aalen Estimator of Λ(t )

Let the intervals shrink: J → ∞ so that l_j = τ_j − τ_{j−1} → 0.

Each interval then contains at most a single subject with d_j = 1 (or d_j subjects if there are ties).
No one enters or leaves within an interval (aside from censoring).
For sufficiently small intervals,

    N_j ≈ l_j n_j



The Nelson-Aalen Estimator of Λ(t )
Definition: The Nelson-Aalen estimator of Λ(t) is given by:

    Λ̂(t) = Σ_{t_j ≤ t} d_j / n_j

where the sum is over the distinct observed event times.


As the total sample size increases, the total number of jumps
grows and the jump sizes get smaller, resulting in a better
approximation to a continuous function.
Variance from Greenwood's formula:

    Var[Λ̂(t)] = Σ_{t_j ≤ t} d_j / n_j²

In small samples, the Nelson-Aalen estimator has better properties than −log{Ŝ_KM(t)}.
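A minimal Python sketch of the estimator (not the course's Stata code; observations censored at an event time are treated as still at risk at that time, the usual convention):

```python
def nelson_aalen(times, events):
    """Nelson-Aalen estimate: list of (t_j, H_hat(t_j)) at distinct event
    times. events[i] is 1 for an observed event, 0 for censoring."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    H, out, i = 0.0, [], 0
    while i < len(data):
        t = data[i][0]
        d = c = 0
        while i < len(data) and data[i][0] == t:  # gather ties at time t
            if data[i][1] == 1:
                d += 1
            else:
                c += 1
            i += 1
        if d > 0:
            H += d / n_at_risk                    # jump of size d_j / n_j
            out.append((t, H))
        n_at_risk -= d + c
    return out

# Tiny hypothetical sample: events at 1, 2, 4; censoring at 3.
est = nelson_aalen([1, 2, 3, 4], [1, 1, 0, 1])
```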



Estimating the NA Hazard
In Stata
Calculate the N-A estimator at specified time points and plot it

. sts list, na at(0(1)10)

          Beg.          Nelson-Aalen   Std.
   Time   Total   Fail    Cum. Haz.   Error   [95% Conf. Int.]
 ----------------------------------------------------------------
      0      0      0       0.0000   0.0000        .        .
      1    668     65       0.0836   0.0104   0.0655   0.1067
      2    564     19       0.1143   0.0126   0.0922   0.1418
      3    481     18       0.1489   0.0150   0.1222   0.1813
      4    387     13       0.1791   0.0172   0.1484   0.2161
      5    295      5       0.1940   0.0184   0.1610   0.2337
      6    220      7       0.2214   0.0212   0.1836   0.2670
      7    147      8       0.2650   0.0262   0.2183   0.3217
      8     80      4       0.2967   0.0308   0.2421   0.3635
      9     18      1       0.3223   0.0400   0.2526   0.4112
     10      1      0            .        .        .        .
 ----------------------------------------------------------------
 Note: Nelson-Aalen function is calculated over full data and evaluated at
       indicated times; it is not calculated from aggregates shown at left.

The cumulative hazard is important for model form diagnostic purposes.



Estimating the NA Hazard

Plot the cumulative hazard estimate for males and females


. sts graph, na by(sex)

[Figure: Nelson-Aalen cumulative hazard estimates, by female (female = 0 vs. female = 1); x-axis: analysis time, 0-10; y-axis: 0.00-0.40.]



Kernel Estimate of the Hazard λ(t )

Basic Idea: At each time t, let the estimate of λ(t) be a weighted average, per unit of time, of the jumps in Λ̂(t) at nearby time points.
1 Choose a bandwidth ±b outside of which jumps are not averaged.
2 Choose a kernel or weight function K (·). For example,
Uniform kernel: K (x) = 1/2 for −1 ≤ x ≤ 1
Epanechnikov kernel: K (x) = .75(1 − x 2 ) for −1 ≤ x ≤ 1

Calculate the estimate (with denominator in units of time):

    λ̂(t) = (1/b) Σ_{i: |t − t_i| ≤ b} K((t − t_i)/b) (d_i / n_i)

The standard error is given by

    SE[λ̂(t)] = { (1/b²) Σ_{i: |t − t_i| ≤ b} K²((t − t_i)/b) (d_i / n_i²) }^{1/2}
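The point estimate is simple to sketch in Python (a hypothetical illustration with made-up event times and risk-set sizes, not the course's Stata code):

```python
# Kernel-smoothed hazard from the Nelson-Aalen jumps d_i / n_i,
# using the Epanechnikov kernel and bandwidth b.
def epanechnikov(x):
    return 0.75 * (1.0 - x * x) if -1.0 <= x <= 1.0 else 0.0

def smoothed_hazard(t, event_times, d, n, b):
    """lambda_hat(t) = (1/b) * sum_{|t - t_i| <= b} K((t - t_i)/b) d_i/n_i."""
    return sum(
        epanechnikov((t - ti) / b) * (di / ni)
        for ti, di, ni in zip(event_times, d, n)
    ) / b

lam = smoothed_hazard(t=5.0, event_times=[4.0, 5.0, 7.0],
                      d=[1, 1, 1], n=[20, 18, 15], b=2.0)
```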



Kernel Estimation of the Hazard
As b becomes larger, the estimate of λ̂ becomes smoother. We
trade off roughness and bias by manipulating b
. sts graph, hazard kernel(gaussian) width(1) yscale(range(0,.15)) ylabel(0 .05 .10 .15)

[Figure: Smoothed hazard estimate; x-axis: analysis time, 0-8; y-axis: 0-.15.]



The Average Hazard Rate as a Data Summary

As we have seen, from time-to-event data we can estimate the incidence rate or incidence density. To compute it, we divide the number of responses by the sum of observation time over all individuals:

    λ̂ = D / Σ_{i=1}^{N} t_i

where
t_i is the follow-up time for patient i - add up each patient's time observed until failure or censoring
D is the number of failure events (relapses, deaths, etc.); when events are recorded using a 0/1 indicator d_i, then D = Σ_{i=1}^{N} d_i

This is sometimes also called the average hazard rate or incidence rate.
This estimator λ̂ happens to be the maximum likelihood estimator for the exponential survival model.
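As a quick check of the formula, a one-line Python sketch using the IUD totals from the stptime output earlier (9 events over 1046 person-months):

```python
# Average hazard rate: lambda_hat = D / sum(t_i).
D = 9             # total failure events
total_time = 1046 # total person-months of follow-up
lam_hat = D / total_time  # average hazard per person-month
```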



Estimation of the Average Hazard Rate

We can obtain the average hazard rate using the stsum or strate
command in Stata. For the IUD data:
. stsum

        failure _d: status
   analysis time _t: time

        |                incidence     no. of  |------ Survival time ------|
        | time at risk      rate      subjects     25%      50%      75%
 -------+---------------------------------------------------------------------
  total |     1046        .0086042       18        36       93      107

. * same as if one adds up needed quantities directly
. egen events = total(status)
. egen pmonths = total(time)
. display events/pmonths

.00860421



Estimation of the Average Hazard Rate
Changing time units: In a randomized trial evaluating tamoxifen for early stage breast cancer, there were 438 deaths among 1395 patients over about 12 years. The total person-months of follow-up is 201,134.6:

. stset timesurv, failure(dead)
. stsum
        |                incidence     no. of
        | time at risk      rate      subjects
 -------+---------------------------------------
  total |   201134.6      .0021776      1395

We can convert this to years by dividing by 12, giving 16,761.2 person-years. The death rate is

    438 / 201,134.6 = .00218

per person-month, or .0261 per person-year (multiply by 12). We can also express this as 2.61 per hundred per year, or roughly a 2.6% per year mortality rate.
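The unit conversion is pure arithmetic and can be sketched directly (the trial totals are the ones quoted above):

```python
# 438 deaths over 201,134.6 person-months, converted to yearly scales.
deaths = 438
person_months = 201134.6
rate_pm = deaths / person_months   # per person-month, ~0.00218
rate_py = rate_pm * 12             # per person-year, ~0.0261
per_100_py = rate_py * 100         # ~2.6 deaths per 100 per year
```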
Average Hazard Rate Comparison

We can compute within relevant subject strata. Then, to contrast two groups, we can form a difference or ratio of average hazard rates. In the tamoxifen trial, the ratio of mortality by treatment group is:

. sort trt
. stsum, by(trt)

        |                incidence     no. of
    trt | time at risk      rate      subjects
 -------+---------------------------------------
      1 |  98724.69994    .0024107       696
      2 | 102409.9001     .0019529       699
 -------+---------------------------------------
  total |     201134.6    .0021776      1395

The rate ratio (tamoxifen/placebo) is .0019529/.0024107 = 0.81, or a 19% relative mortality reduction. The absolute difference is .0004578 per person-month, or 0.0055 per person-year, i.e. .55/100 patients/year. These are summaries upon which we may want to make inference about treatment.



The Hazard Ratio

Inference on the hazard rate ratio essentially tests

    H0: HR = 1.0  versus  HA: HR ≠ 1.0

Since the two rates λ_1 and λ_2 are constant (time-independent), the alternative here will be some constant, so we have HA: λ_2 = φ × λ_1.
The logrank test, widely used in survival analysis, provides a test of HR = 1.0 under specific conditions - it works for time-independent and time-varying hazards, as long as the ratio is constant in the latter case.



Comparing Two Survival Distributions

More generally, how do we compare two survival distributions, or summaries thereof, to make inference?
Example Remission times, in weeks, of leukemia patients
randomized to receive either 6-mercaptopurine (6-MP) or not (control).

6-MP Control
(N=21) (N=21)
6,6,6,6*,7, 1,1,2,2,3,
9*,10,10*,11*, 4,4,5,5,8,
13,16,17*,19*, 8,8,8,11,
20*,22,23,25*, 11,12,12,15,
32*,32*,34*,35* 17,22,23

* denotes a censored observation



Comparing Two Survival Distributions

[Figure: Kaplan-Meier survival curves by treatment group (placebo vs. 6-MP); x-axis: weeks, 0-40; y-axis: survival probability, 0.00-1.00.]

Question of interest: Is there a difference between these survival curves? That is, is 6-MP treatment associated with a longer time in cancer remission?
Comparing Two Survival Distributions

One Approach: Compare survival probabilities at a particular time point t_0.
Use the Kaplan-Meier estimates to obtain Ŝ_0(t_0) − Ŝ_1(t_0).
Use Greenwood's variance estimator for the difference in survival probabilities to test:

    H0: S_0(t_0) = S_1(t_0)
    HA: S_0(t_0) ≠ S_1(t_0)

using the test statistic:

    Z = (Ŝ_0(t_0) − Ŝ_1(t_0)) / √(V̂[Ŝ_0(t_0)] + V̂[Ŝ_1(t_0)])  ~  N(0, 1) under H0
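The test reduces to a normal z statistic once the KM estimates and Greenwood variances are in hand; a Python sketch with hypothetical numbers (the Ŝ and variance values below are made up for illustration):

```python
import math

# Hypothetical KM estimates and Greenwood variances at a fixed time t0.
S0_hat, V0 = 0.60, 0.0030   # group 0: S_hat(t0), Var_hat[S_hat(t0)]
S1_hat, V1 = 0.45, 0.0035   # group 1

z = (S0_hat - S1_hat) / math.sqrt(V0 + V1)        # ~N(0,1) under H0
# Two-sided p-value via the standard normal CDF (expressed with erf).
p = 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0))))
```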



Comparing Two Survival Distributions

Problems:
1 Comparing survival at a single time point is subjective and may not reflect the clinical question at hand
2 Testing differences at a single time point is inefficient, since we ignore information from the rest of the survival curve

Goal: Compare entire survival curves (up to the time interval sampled)



The Logrank Test

To facilitate, we specify the following:


Let t_1 < t_2 < ... < t_D denote the set of distinct survival times occurring in the pooled sample (both groups combined).

We want to test H0: S_1(t) = S_0(t), or H0: λ_1(t) = λ_0(t), for all t,
vs. HA: S_1(t) ≠ S_0(t), or HA: λ_1(t) ≠ λ_0(t).

Specifically, the alternative we will specify is of the form

    HA: S_1(t) = [S_0(t)]^φ  or  HA: λ_1(t) = φ λ_0(t)



The Logrank Test (cont.)
At each failure time, consider a 2 × 2 table of the form:

                 Failure
               Yes        No          Risk Set
  Group 0     d_0k    y_0k − d_0k      y_0k
  Group 1     d_1k    y_1k − d_1k      y_1k
  Total       d_k     y_k − d_k        y_k

For failure time t_k, compute the observed number of deaths in group 1 and the expected number of deaths under H0.
Observed: o_1k = d_1k
Expected: e_1k = y_1k d_k / y_k
At each failure time, compute:

    u_k = o_1k − e_1k



The Logrank Test (cont.)
Under H0:

    E(u_k) = 0
    Var(u_k) = y_1k y_0k (y_k − d_k) d_k / (y_k² (y_k − 1)) = v_k

The logrank statistic is given by:

    T_L = [Σ_{k=1}^D (o_k − e_k)]² / Σ_{k=1}^D v_k = [Σ_{k=1}^D u_k]² / Σ_{k=1}^D v_k

Assuming independent failure times and for large n,

    T_L ~ χ²_1  (approximately),  or  Z = √T_L ~ N(0, 1)
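The steps above can be sketched in Python (a minimal implementation, not the course's Stata code), applied to the 6-MP/control remission times listed earlier; censored observations tied with events at a time are counted as at risk at that time, the standard convention:

```python
def logrank(times0, events0, times1, events1):
    """Two-sample logrank statistic T_L = U^2 / V."""
    data = ([(t, e, 0) for t, e in zip(times0, events0)]
            + [(t, e, 1) for t, e in zip(times1, events1)])
    U = V = 0.0
    for tk in sorted({t for t, e, _ in data if e == 1}):
        y = sum(1 for t, _, _ in data if t >= tk)             # y_k at risk
        y1 = sum(1 for t, _, g in data if t >= tk and g == 1)  # y_1k
        d = sum(1 for t, e, _ in data if t == tk and e == 1)   # d_k
        d1 = sum(1 for t, e, g in data if t == tk and e == 1 and g == 1)
        U += d1 - y1 * d / y                                   # o_1k - e_1k
        if y > 1:
            V += y1 * (y - y1) * d * (y - d) / (y ** 2 * (y - 1))  # v_k
    return U * U / V

control = [1, 1, 2, 2, 3, 4, 4, 5, 5, 8, 8, 8, 8, 11, 11, 12, 12, 15, 17, 22, 23]
mp_t = [6, 6, 6, 6, 7, 9, 10, 10, 11, 13, 16, 17, 19, 20, 22, 23, 25, 32, 32, 34, 35]
mp_e = [1, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0]
T_L = logrank(control, [1] * 21, mp_t, mp_e)   # ~16.79, matching sts test
```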



Logrank Statistic - 6-MP leukemia example

In Stata
. stset time, failure(relapse)
. sts test txgrp

        failure _d: relapse
   analysis time _t: time

Log-rank test for equality of survivor functions

         |  Events    Events
   txgrp | observed  expected
 --------+--------------------
 Control |    21       10.75
    6-mp |     9       19.25
 --------+--------------------
   Total |    30       30.00

          chi2(1) =  16.79
         Pr>chi2 =  0.0000



Logrank Test and Proportional Hazards

Recall the alternative hypothesis to 'no difference in survival' we specified earlier was of the form

    HA: S_1(t) = [S_0(t)]^φ  or  HA: λ_1(t) = φ λ_0(t)

The logrank test has the most power (probability of rejecting H0 when in fact it is false) for the case where the hazard ratio remains constant over time. This is called the proportional hazards case. The test will still detect differences deviating (to a reasonable extent) from this form.



Proportional Hazards
Proportional hazards looks like this on the survival curve scale:

[Figure: survival curves for two groups under proportional hazards.]



Weighted Logrank Statistics

What if PH does not hold, or we are interested in other alternatives? Consider weighting (Obs − Exp) differently over time.
This will enable us to inflate the influence of early or late differences (note: weight = 1 under the standard logrank test)
→ Potential for increased power under non-proportional hazards

    T_W = [Σ_{k=1}^D w_k (o_k − e_k)]² / Σ_{k=1}^D w_k² v_k = [Σ_{k=1}^D w_k u_k]² / Σ_{k=1}^D w_k² v_k
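The weighted statistic changes only the accumulation step; a Python sketch (not the course software) with w_k = 1 for the ordinary logrank test or w_k = n at risk for the Gehan-Breslow version described on the next slide:

```python
def weighted_logrank(times0, events0, times1, events1, gehan=False):
    """T_W = (sum w_k u_k)^2 / (sum w_k^2 v_k); w_k = 1 gives the logrank
    statistic, w_k = number at risk gives Gehan-Breslow (a sketch only)."""
    data = ([(t, e, 0) for t, e in zip(times0, events0)]
            + [(t, e, 1) for t, e in zip(times1, events1)])
    num = den = 0.0
    for tk in sorted({t for t, e, _ in data if e == 1}):
        y = sum(1 for t, _, _ in data if t >= tk)
        y1 = sum(1 for t, _, g in data if t >= tk and g == 1)
        d = sum(1 for t, e, _ in data if t == tk and e == 1)
        d1 = sum(1 for t, e, g in data if t == tk and e == 1 and g == 1)
        u = d1 - y1 * d / y
        v = y1 * (y - y1) * d * (y - d) / (y ** 2 * (y - 1)) if y > 1 else 0.0
        w = float(y) if gehan else 1.0
        num += w * u
        den += w * w * v
    return num * num / den
```

With all weights equal to 1, this reduces to the unweighted logrank statistic from the previous slides.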



Weighted Logrank Statistics

Choices for w_k
1 w_k = n_k gives the Gehan-Breslow test (weights equal to the total number of subjects at risk at each failure time). Applies greater weight to early failure times.
2 w_k = Ŝ_KM(t_k−) gives the generalized Wilcoxon test (weights equal to the pooled estimate of survival (across groups) just prior to time t_k). Applies greater weight to early failure times. Equivalent to the Wilcoxon rank-sum statistic when there is no censoring.



Weighted Logrank Statistics

A more general approach to weighted tests:

The G^{ρ,γ} family, Fleming and Harrington (1991):

    w_k = [Ŝ_KM(t_k−)]^ρ [1 − Ŝ_KM(t_k−)]^γ

ρ = γ = 0 gives the usual logrank statistic
ρ = 1 and γ = 0 gives the generalized Wilcoxon test
ρ = 0 and γ > 0 gives more weight to late differences



Weighted Logrank Statistics - under proportional hazards

logrank test: p < 0.0001 Wilcoxon test: p < 0.0001


Weighted Logrank Statistics - early difference in hazards that later converges

logrank test: p = 0.082 Wilcoxon test: p = 0.0003


Weighted Logrank Statistics - late diverging hazards

logrank test: p = 0.033 Wilcoxon test: p = 0.084


Weighted Logrank Statistics

Implementation in Stata - leukemia example


We know that the (unweighted) logrank statistic will be most powerful under proportional hazards. How can we (informally) check the proportional hazards assumption?
If we have proportional hazards, then

    λ_1(t) = φ λ_0(t)

so that

    log Λ_1(t) = log(φ) + log Λ_0(t)

So, if the log cumulative hazards are roughly parallel, the logrank test will be most powerful.



Weighted Logrank Statistics
Look at the plot:

** Compute the Nelson-Aalen estimate for each group
. sts gen cumhaz = na, by(txgrp)
. gen logH = log(cumhaz)
. gen logH0 = logH if (txgrp==0)
. gen logH1 = logH if (txgrp==1)
. scatter logH0 time, connect(l) || scatter logH1 time, connect(l)

[Figure: logH0 and logH1 plotted against time (0-40); y-axis −2 to 2.]



Weighted Logrank Statistics

The plot is not too bad... Do we expect the generalized Wilcoxon test to be more or less powerful than the logrank test for this situation?

. sts test txgrp, fh(1,0)

Fleming-Harrington test for equality of survivor functions

        |  Events    Events       Sum of
  txgrp | observed  expected       ranks
 -------+-------------------------------------
      0 |    21       10.75     6.8977592
      1 |     9       19.25    -6.8977592
 -------+-------------------------------------
  Total |    30       30.00             0

         chi2(1) =  14.15
        Pr>chi2 =  0.0002



Weighted Logrank Statistics

Try weighting late failures more:

. sts test txgrp, fh(0,2)

        failure _d: relapse
   analysis time _t: time

Fleming-Harrington test for equality of survivor functions

        |  Events    Events       Sum of
  txgrp | observed  expected       ranks
 -------+-------------------------------------
      0 |    21       10.75     1.6946697
      1 |     9       19.25    -1.6946697
 -------+-------------------------------------
  Total |    30       30.00             0

         chi2(1) =  12.35
        Pr>chi2 =  0.0004

Here, downweighting the large early difference diminishes significance (a tiny bit) relative to the unweighted logrank test.



Choosing the Right Logrank Statistic

How should weights be chosen?


For scientific inference it is not reasonable to look at the survival
curves first, then choose weights.
The standard (unweighted) logrank test is by far the most commonly used version
First, ask whether there is a reason to believe we will have
non-proportional hazards
If not, use the logrank test - most powerful under PH and reasonable
under minor to moderate deviations
If so, consider what survival differences are most meaningful (early vs
late)
→ Childhood cancer (late differences may be more important)
→ Advanced stage cancer (early differences may be more important)



Summary

Nonparametric Estimation and Tests


Nonparametric estimation and testing are the most common form of analysis in one major research construct: the randomized trial. This is because, aside from stratification factors that can be accommodated easily, covariates do not typically influence the main comparison of interest.
Next: modeling survival data - needed to study or account for covariates related to outcome. The hazard function will largely be the basis on which we model factors related to survival time.

