Lecture - Final Survival Analysis
Alternative terminology
• Event analysis, Time-to-event analysis
• Survival analysis: studies involving time to death (biomedical
sciences)
• Reliability theory / Reliability analysis (engineering)
• Duration analysis / Duration modeling (economics)
• Event history analysis (sociology)
Uses
• Clinical trials
• Cohort studies
2/8/2018 Dr.Haftom Temesgen
Survival Analysis
• In many medical studies, the primary endpoint is time
until an event occurs (e.g. death, remission)
• Data are typically subject to censoring when a study
ends before the event occurs
• Survival Function - A function describing the
proportion of individuals surviving to or beyond a
given time. Notation:
– T survival time of a randomly selected individual
– t a specific point in time.
– S(t) = P(T > t) Survival Function
– λ(t) instantaneous failure rate at time t, also known as the hazard
function
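The survival function and the hazard function (written λ(t) here) are linked by a standard textbook relationship for a continuous survival time T. The block below is a sketch of that relationship, not something stated on the slides; f(t), the probability density of T, is a symbol introduced here for illustration.

```latex
\lambda(t)
  = \lim_{\Delta t \to 0}
    \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t}
  = \frac{f(t)}{S(t)},
\qquad
S(t) = \exp\!\left( -\int_0^t \lambda(u)\, du \right)
```

Read left to right: the hazard is the event rate among those still at risk at time t, and a larger cumulative hazard up to t means a smaller proportion surviving to t.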
SURVIVAL ANALYSIS HISTORY
Why use survival analysis?
1. Why not compare mean time-to-event
between your groups using a t-test or linear
regression?
   That approach ignores censoring.
2. Why not compare proportion of events in
your groups using risk/odds ratios or logistic
regression?
   That approach ignores time.
Survival Analysis: Terms
• Time-to-event: The time from entry into a study
until a subject has a particular outcome
Data Structure: Survival Analysis
Two-variable outcome:
• Time variable: t_i = time at last
disease-free observation or time at
event
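As a rough illustration of a two-variable outcome, each subject can be stored as a time paired with an indicator of whether the event was observed. The slide text above only lists the time variable, so the `event` field, the `Subject` class, and all values below are assumptions made for this sketch.

```python
from dataclasses import dataclass


@dataclass
class Subject:
    time: float  # t_i: time at last disease-free observation or time at event
    event: bool  # assumed second variable: True = event observed, False = censored


# Hypothetical records, for illustration only
subjects = [
    Subject(time=5.0, event=True),
    Subject(time=8.0, event=False),
    Subject(time=12.0, event=True),
]

n_events = sum(s.event for s in subjects)
n_censored = len(subjects) - n_events
print(n_events, n_censored)
```

Keeping the two values together on one record makes it harder to analyze the times while forgetting which of them are censored.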
Censoring
B. The age at which children are able to count from 1–10 at school.
Some children are already able to count before joining school.
[Figure: follow-up timelines by student; x-axis: time in terms (1 to 12). "Survived" marks a student still enrolled at the end of the study period.]
Outcome data: Censored 2, Event 3, Total 5
SURVIVAL ANALYSIS - CENSORING
Consequences of mishandling or ignoring censored data:
Example
Student cohort, N = 50, event of interest = Graduation
Still enrolled at the end of the study, N = 6
No longer enrolled but did not graduate, N = 4
Options: code all 10 as missing, or code the 4 as missing and the 6 as
graduated as of study end
Consequences:
• Mean time to degree is over- or understated
• Risk of selection bias
Ignoring censored records completely, or arbitrarily assigning event dates,
introduces bias into the results.
Including the censored data produces less bias (Newell/Nyun 2011).
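The effect of mishandling censored records can be sketched with a small product-limit calculation. The example uses the Kaplan-Meier estimator, a standard technique that the lines above do not name, and compares the estimated median time to event under three treatments of the same data. The data, the function names, and the choice of the median as the summary are all assumptions made for illustration.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time (product-limit estimate)."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_removed = 0
        # Group all records sharing the same time t
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_removed += 1
            i += 1
        if n_events > 0:
            surv *= 1 - n_events / at_risk
            curve.append((t, surv))
        at_risk -= n_removed  # events and censored records both leave the risk set
    return curve


def km_median(curve):
    """First time at which the estimated survival drops to 0.5 or below."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


# Hypothetical cohort: time in terms; event = 1 (graduated) or 0 (censored)
times = [6, 8, 8, 10, 12]
events = [1, 1, 0, 1, 0]

# Option 1: drop censored records entirely
kept = [(t, e) for t, e in zip(times, events) if e == 1]
median_dropped = km_median(kaplan_meier([t for t, _ in kept], [e for _, e in kept]))

# Option 2: pretend every censored record had the event at its last observed time
median_forced = km_median(kaplan_meier(times, [1] * len(times)))

# Option 3: keep censored records as censored
median_km = km_median(kaplan_meier(times, events))

print(median_dropped, median_forced, median_km)
```

In this toy cohort both shortcuts give a shorter median time to event than the estimate that keeps the censored records in the risk set until they drop out.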
SURVIVAL ANALYSIS – HANDLING CENSORED DATA
• S(t) = P[ T ≥ t ] = 1 – P[ T < t ] (equal to P(T > t) when T is continuous)
• A function describing the proportion of
individuals surviving to or beyond a given time.
[Figure: x-axis: months since surgery (0 to 9)]
Hazard Function
Since the significance value of the test is less than 0.05, we can
conclude that the survival curves of at least two groups are different.
The pairwise tests for Basic service customers show that their survival
curve is statistically significantly different from those of E-service and Plus
service customers.
The pairwise tests for Total service and Basic service customers show that
their survival curves are not statistically distinguishable, since the
significance value of their pairwise comparison is greater than 0.05.
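A pairwise comparison of two survival curves can be sketched as follows. The slides do not say which test produced the significance values above, so this example substitutes the two-group log-rank test, one common choice, purely as an assumption. The function name and the data are hypothetical, and the sketch does not guard against the degenerate case where the variance is zero.

```python
import math


def logrank_two_groups(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, p-value)."""
    pooled = [(t, e, 0) for t, e in zip(times_a, events_a)]
    pooled += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in pooled if e})

    observed_a = expected_a = variance = 0.0
    for t in event_times:
        # Subjects still at risk just before time t, by group
        n_a = sum(1 for u, _, g in pooled if u >= t and g == 0)
        n_b = sum(1 for u, _, g in pooled if u >= t and g == 1)
        # Events occurring exactly at time t
        d_a = sum(1 for u, e, g in pooled if u == t and e and g == 0)
        d_b = sum(1 for u, e, g in pooled if u == t and e and g == 1)
        n, d = n_a + n_b, d_a + d_b

        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)

    chi2 = (observed_a - expected_a) ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical groups with no censoring: group A fails early, group B late
chi2, p = logrank_two_groups([1, 2, 3], [1, 1, 1], [4, 5, 6], [1, 1, 1])
print(round(chi2, 3), p < 0.05)
```

With these made-up groups the p-value falls below 0.05, which is the situation described above as the curves being statistically significantly different; two identical groups would give a statistic of 0.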
Summary
• With the Life Tables procedure, we have examined
the distribution of the time to churn, broken down
by levels of the factor Customer category.
The survival table is a descriptive table that details the time until the
drug takes effect.
The table is sectioned by each level of Treatment, and each observation
occupies its own row in the table. As a result, the table is very large, so
only the section corresponding to the first 14 cases to respond to the New
drug is shown here.
Survival Tables
Status: Indicates whether the case experienced the terminal event or was
censored.
Survival Tables
The plot for the New drug is below that of the Existing drug throughout
most of the trial, which suggests that the new drug may give faster relief
than the old.
To determine whether these differences are due to chance, look at the
comparisons tables.
Comparing Survival Curves
The means and medians for survival time table offers a quick numerical
comparison of the "typical" times to effect for each of the medications.
Since there is a lot of overlap in the confidence intervals, it is unlikely
that there is much difference in the "average" survival time.
The percentiles table gives estimates of the first quartile, median, and
third quartile of the survival distribution.
Since the significance values of the tests are all greater than 0.10, we
cannot conclude that the survival curves differ.
• The shape of the survival function and the regression coefficients for
the predictors are estimated from observed subjects; the model can
then be applied to new cases that have measurements for the
predictor variables.
• Note that information from censored subjects, that is, those that do
not experience the event of interest during the time of observation,
contributes usefully to the estimation of the model.
Cox Regression…
• The Cox Regression procedure is useful for modeling the
time to a specified event, based upon the values of given
covariates.
Example. Do men and women have different risks of developing
lung cancer based on cigarette smoking? By constructing a Cox
Regression model, with cigarette usage (cigarettes smoked per day)
and gender entered as covariates, you can test hypotheses
regarding the effects of gender and cigarette usage on time-to-
onset for lung cancer.
The Proportional Hazards Model…
The baseline hazard function measures the potential for the event to
occur, independently of the covariates.
• The shape of the hazard function over time is defined by the
baseline hazard, for all cases.
• The covariates simply help to determine the overall magnitude
of the function.
• The value of the hazard is equal to the product of the baseline
hazard and a covariate effect.
• While the baseline hazard is dependent upon time, the
covariate effect is the same for all time points.
• Thus, the ratio of the hazards for any two cases at any time
period is the ratio of their covariate effects. This is the
proportional hazards assumption.
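The bullets above can be checked numerically with a small sketch. It assumes the usual Cox parametrization, in which the covariate effect is exp(β·x); the slides only say "a covariate effect", so that form, the baseline hazard chosen here, and the coefficient and covariate values are all hypothetical.

```python
import math


def baseline_hazard(t):
    """Hypothetical baseline hazard h0(t); it depends on time only."""
    return 0.1 * t


def hazard(t, x, beta):
    """h(t | x) = h0(t) * exp(beta . x): baseline hazard times covariate effect."""
    covariate_effect = math.exp(sum(b * xi for b, xi in zip(beta, x)))
    return baseline_hazard(t) * covariate_effect


beta = [0.5, -0.2]    # hypothetical regression coefficients
case_a = [1.0, 3.0]   # hypothetical covariate values for one case
case_b = [0.0, 1.0]   # and for another

# The hazard ratio between the two cases, evaluated at several time points
ratios = [hazard(t, case_a, beta) / hazard(t, case_b, beta) for t in (1, 2, 5, 10)]
print(ratios)
```

The baseline hazard cancels in the ratio, so every entry equals the ratio of the two covariate effects regardless of t, which is what the proportional hazards assumption states.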
Cox regression data considerations
Data:
Your time variable should be quantitative, but your status variable can
be categorical.
Independent variables (covariates) can be continuous or categorical; if
categorical, they should be dummy- or indicator-coded.
Strata variables should be categorical, coded as integers or short
strings.
Assumptions:
Observations should be independent, and the hazard ratio should be
constant across time; that is, the proportionality of hazards from one
case to another should not vary over time.
The case processing summary shows that 726 cases are censored. These
are customers who have not churned.
The categorical variable codings are a useful reference for
interpreting the regression coefficients for categorical covariates.
In the fourth step, age is removed from the model, likely because the
variation in time to churn that is explained by age is also explained
by employ and address; thus, when these variables are added to the
model, age is no longer necessary. Finally, marital is added in the fifth step.
Variable Selection
Since the significance value of the change is less than 0.05, you can be
confident that custcat contributes to the model.
The churn hazard for a customer who has worked for the same
employer for three years is reduced by 100%−(100%×0.9203^3)=22.1%.
However, the significance value for this coefficient is greater than 0.10,
so any observed difference between these customer categories could be
due to chance.
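The 22.1% reduction quoted above for three years with the same employer is consistent with compounding a per-year hazard ratio of 0.9203 over three years. The short check below works through that arithmetic; reading 0.9203 as a per-year Exp(B) is an assumption, and the variable names are illustrative.

```python
exp_b = 0.9203  # assumed per-year hazard ratio, Exp(B), taken from the figure quoted above
years = 3

# A hazard ratio compounds multiplicatively over each additional year
reduction_pct = 100 * (1 - exp_b ** years)
print(f"{reduction_pct:.1f}%")
```

For a single year the same formula gives a reduction of roughly 8%, so the three-year figure is noticeably less than three times the one-year figure.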