
Introduction to Survival Analysis

Survival analysis is a family of statistical methods for analyzing duration
data, that is, data in which the outcome variable of interest is the time until
an event occurs. It is used in a variety of areas, including reliability and
failure-time analysis and clinical data analysis.
Survival analysis involves a random variable: time. By time, we mean years,
months, weeks, or days from the beginning of follow-up of an individual until
an event occurs; alternatively, time can refer to the age of an individual when
an event occurs.
Survival analysis can be carried out in several ways, depending on the question
being asked. Common tools include Kaplan Meier curves, Cox regression models,
the hazard function and the survival function. To compare the survival of two
different groups, we perform the Log-Rank test. To describe the effect of
categorical and quantitative variables on survival, we use Cox proportional
hazards regression, parametric survival models, etc.
Before proceeding, we need to define certain terms: Event, Time, Censoring,
Survival Function, etc.
• Event: The occurrence of interest in the survival analysis study, such as
the death of a person from a particular disease, recovery following a
medical treatment, recovery following vaccination, failure of a machine on
the manufacturing shop floor, the onset of a disease, etc.
• Time: The time from the beginning of observation of a subject until the
event occurs. For example, for a mechanical machine running to failure we
need to know
a) the time at which the machine starts
b) the time at which the machine fails
c) the time at which the machine is lost or shut down and so leaves the
survival analysis study.

• Censoring/Censored Observation: A subject is described as censored if
the defined event of the study is not observed for that subject during
the observation period. A censored subject may or may not experience the
event after the end of the observation. The subject is called censored in
the sense that nothing was observed about the subject after the time of
censoring.
Censored observations are of 3 types

i. Right Censored: Right censoring arises in many problems. It
happens when we are not certain what happened to subjects after
a certain point in time, that is, when the true event time t is
greater than the censoring time c (c < t). This happens when
subjects cannot be followed for the entire time because they
died, were lost to follow-up, or withdrew from the study before
the event was observed.
ii. Left Censored: Left censoring is the opposite. We are not certain
what happened to subjects before some point in time, that is, the
true event time t is less than the censoring time c (c > t).
iii. Interval Censored: Interval censoring is when we know the event
happened within an interval (not before its start and not after
its end) but we do not know exactly when in the interval it
happened. It combines left and right censoring: the event time is
known only to lie between two time points.

• Survival Function S(t): The probability that a subject survives
longer than time t. The survivor function gives the probability that
the random variable T exceeds the specified time t, that is,
S(t) = P(T > t).
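The three censoring types can be told apart by the bounds known for the true event time. The following is a minimal sketch, assuming each observation is stored as a hypothetical (lower, upper) pair: `lower` is the latest time at which the event was known not to have occurred, and `upper` is the earliest time at which it was known to have occurred (infinity if it was never seen). The function name and the encoding are illustrative choices, not part of the original text.

```python
import math

def censoring_type(lower, upper):
    """Classify one observation from the bounds on its true event time T."""
    if lower == upper:
        return "exact"              # event time observed directly
    if math.isinf(upper):
        return "right-censored"     # only know T > lower (c < t)
    if lower == 0:
        return "left-censored"      # only know T < upper (c > t)
    return "interval-censored"      # know lower < T <= upper

# Hypothetical observations: (lower bound, upper bound) on the event time
observations = [(5, 5), (8, math.inf), (0, 3), (4, 6)]
labels = [censoring_type(lo, hi) for lo, hi in observations]
```

Storing bounds rather than a single time keeps all three censoring types in one representation; for purely right-censored data, a (time, event flag) pair is usually enough.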

Kaplan Meier Estimator


The Kaplan Meier estimator is used to estimate the survival function from
lifetime data. It is a non-parametric technique, also known as the
product-limit estimator. It estimates the probability of surviving beyond a
given time, where the event may be a major event in a medical trial, death,
the failure of a machine, or any other significant event.
The Kaplan-Meier survival curve is a nonparametric maximum likelihood
estimate of the probability of survival over a given length of time, with time
divided into many small intervals.
The survival probability at any point in time is written as:

\[ S(t) = \prod_{t_i \le t} \frac{\text{Number good at start} - \text{Number that failed}}{\text{Number good at start}} \]

\[ S(t) = \prod_{t_i \le t} \frac{n_i - d_i}{n_i} \]
where n_i is the number of subjects still at risk (surviving and under
observation) just prior to time t_i, and d_i is the number of subjects that
failed at time t_i. If no censoring occurs, n_i is simply the number of
survivors just prior to t_i. When there is censoring, n_i equals the number of
survivors less the number of censored cases (i.e., subjects who were no
longer observed at a given time).
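As an illustration, the product-limit calculation can be sketched in a few lines of Python. This is a minimal sketch, not a reference implementation: it assumes right-censored data given as two parallel lists (`times` and `events`, with 1 for an observed failure and 0 for a censored subject), takes the product over event times up to and including t, and treats subjects censored at time t_i as still at risk at t_i. The function name and the sample data are made up for the example.

```python
from collections import Counter

def kaplan_meier(times, events):
    """Return [(t_i, S(t_i))] at each distinct time with at least one failure."""
    failures = Counter(t for t, e in zip(times, events) if e == 1)
    exits = Counter(times)      # failures and censorings both leave the risk set
    at_risk = len(times)        # n_i: subjects under observation just before t_i
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = failures.get(t, 0)  # d_i: failures at t_i
        if d > 0:
            survival *= (at_risk - d) / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]     # drop everyone whose observation ended at t
    return curve

# Hypothetical data: six subjects, two of them censored (event flag 0)
times = [2, 3, 3, 5, 6, 8]
events = [1, 1, 0, 1, 0, 1]
curve = kaplan_meier(times, events)
```

The curve only steps down at times where a failure occurs; censored subjects reduce the number at risk for later steps without producing a drop themselves.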
The Kaplan-Meier method produces both a tabular estimate and a graphical
stairstep curve for use in analysis. It is simple to use as long as the time of
interest falls within the curve, before the last censored time. The method
rests on certain assumptions: censoring is unrelated to either survival or
failure, the survival probabilities are the same for all subjects regardless of
when their observation period began, and event times are accurately
recorded. Because these assumptions are minimal, the Kaplan-Meier method
can be applied to a wide range of time-to-event data.

Assumptions of Kaplan Meier Survival
In real-life cases, we do not know the true survival function, so the Kaplan
Meier estimator approximates it from the study data. There are 3 assumptions
of Kaplan Meier Survival
i. Survival probabilities are the same for subjects who joined the study
late and those who joined early. Factors that can affect survival are
assumed not to change over the course of the study.
ii. Events occur at the specified (recorded) times.
iii. Censoring does not depend on the outcome: subjects who are censored
have the same survival prospects as those who remain under observation.
Interpretation of the survival curve: the Y-axis shows the probability that a
subject has not yet experienced the event. The X-axis shows time, so each point
on the curve is the probability of surviving up to that time. Each drop in the
survival function (approximated by the Kaplan-Meier estimator) is caused by the
event of interest happening for at least one observation.
The plot is often accompanied by confidence intervals, which describe the
uncertainty about the point estimates. Wider confidence intervals indicate
higher uncertainty. This happens when few participants remain at risk, which
results from observations both dying and being censored.
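The text does not say how the confidence intervals are computed. One common choice is Greenwood's formula for the variance of the Kaplan-Meier estimate, which is assumed here purely for illustration: Var[S(t)] is approximated by S(t)^2 times the sum of d_i / (n_i (n_i - d_i)) over event times up to t. The sketch below builds a plain 95% interval (z = 1.96) clipped to [0, 1]; the function name and data are hypothetical.

```python
import math

def km_with_confidence(times, events, z=1.96):
    """Return [(t_i, S, lower, upper)] using Greenwood's variance (an assumption)."""
    at_risk = len(times)
    survival, var_sum = 1.0, 0.0
    rows = []
    for t in sorted(set(times)):
        tied = [e for x, e in zip(times, events) if x == t]
        d = sum(tied)                      # failures at this time
        if d > 0:
            survival *= (at_risk - d) / at_risk
            if at_risk > d:                # term is undefined once S drops to 0
                var_sum += d / (at_risk * (at_risk - d))
            se = survival * math.sqrt(var_sum)
            lower = max(0.0, survival - z * se)
            upper = min(1.0, survival + z * se)
            rows.append((t, survival, lower, upper))
        at_risk -= len(tied)               # failures and censorings leave the risk set
    return rows

# Hypothetical data: six subjects, two of them censored (event flag 0)
times = [2, 3, 3, 5, 6, 8]
events = [1, 1, 0, 1, 0, 1]
rows = km_with_confidence(times, events)
```

Because each term in the sum has the number at risk in its denominator, the accumulated variance grows faster as fewer subjects remain, which is why the intervals widen toward the right of the plot. Statistical packages often use a log or log-log transformed interval instead of this plain one.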

Important things to consider for Kaplan Meier Estimator Analysis

I. We need to perform the Log Rank Test to draw inferences about
differences between groups.
II. Kaplan Meier results can easily be biased, because the Kaplan Meier
method is a univariate approach and does not adjust for other variables.
III. Removing censored data changes the shape of the curve and biases
the fit.
IV. Statistical tests and observations can become misleading if a
continuous variable is dichotomized.
V. Dichotomizing means using a statistical measure such as the median
to split the data into groups, which may cause problems in the data set.
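Point III can be checked on a small example. The sketch below, using made-up data and a hypothetical helper name, compares the Kaplan-Meier estimate of S(5) computed with the censored subjects kept against the estimate obtained after deleting them. It assumes right-censored data with event flags of 1 (failure) and 0 (censored).

```python
def km_survival_at(times, events, t_query):
    """Kaplan-Meier estimate of S(t_query), product over event times <= t_query."""
    survival = 1.0
    for t in sorted(set(times)):
        if t > t_query:
            break
        n = sum(1 for x in times if x >= t)                         # at risk just before t
        d = sum(1 for x, e in zip(times, events) if x == t and e)   # failures at t
        survival *= (n - d) / n
    return survival

# Hypothetical data: six subjects, two of them censored (event flag 0)
times = [2, 3, 3, 5, 6, 8]
events = [1, 1, 0, 1, 0, 1]

with_censoring = km_survival_at(times, events, 5)

# Same data after (incorrectly) deleting the censored subjects
kept = [(t, e) for t, e in zip(times, events) if e == 1]
dropped = km_survival_at([t for t, _ in kept], [e for _, e in kept], 5)
```

In this example the estimate falls from 4/9 to 1/4 once the censored subjects are deleted. They were known to be alive up to their censoring times, and discarding them throws that information away, so survival is underestimated.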

Log-Rank Test
The log-rank test is a nonparametric hypothesis test for comparing the survival
distributions of two or more strata. It compares the hazard function estimates
of the groups at each observed event time. In other words, this test allows for
comparisons of differences in survival times for an event among different
groups of observations (e.g., different brands of batteries, differences among
batteries from multiple vendors, different battery charging management
techniques).

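The general form of the test statistic, shown here for three groups, is:

\[ \chi^2 = \frac{(O_a - E_a)^2}{E_a} + \frac{(O_b - E_b)^2}{E_b} + \frac{(O_c - E_c)^2}{E_c} \]

where O_a, O_b, and O_c are the observed failures and E_a, E_b, and E_c are
the expected failures of groups a, b, and c, respectively.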

Under the null hypothesis, this test statistic has approximately a chi-squared
distribution with degrees of freedom equal to one less than the number of
groups being compared. The null hypothesis for the log-rank test is that all
groups have an equal hazard rate/survival distribution. Rejecting the null
hypothesis leads to the conclusion that at least one of the survival groups has
a different hazard rate/survival distribution from the others.
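The observed and expected failure counts can be accumulated over the distinct event times. The sketch below assumes the simple form of the statistic, the sum over groups of (O - E)^2 / E, where the expected count for a group at each event time is the total number of failures at that time multiplied by the group's share of the subjects at risk. Many software packages use a variance-based version of the log-rank statistic instead, so results may differ slightly. The function name and the battery data are hypothetical.

```python
def log_rank_statistic(times, events, groups):
    """Return (statistic, degrees of freedom, observed, expected) per group.

    Assumes every group has subjects at risk at some event time, so that
    each expected count is greater than zero.
    """
    labels = sorted(set(groups))
    observed = {g: 0.0 for g in labels}
    expected = {g: 0.0 for g in labels}
    event_times = sorted({t for t, e in zip(times, events) if e})
    for t in event_times:
        n_total = sum(1 for x in times if x >= t)
        d_total = sum(1 for x, e in zip(times, events) if x == t and e)
        for g in labels:
            n_g = sum(1 for x, gg in zip(times, groups) if gg == g and x >= t)
            d_g = sum(1 for x, e, gg in zip(times, events, groups)
                      if gg == g and x == t and e)
            observed[g] += d_g
            expected[g] += d_total * n_g / n_total   # failures expected under H0
    statistic = sum((observed[g] - expected[g]) ** 2 / expected[g] for g in labels)
    return statistic, len(labels) - 1, observed, expected

# Hypothetical failure times for two battery brands, no censoring
times = [1, 2, 3, 4, 5, 6]
events = [1, 1, 1, 1, 1, 1]
groups = ["a", "a", "a", "b", "b", "b"]
statistic, dof, observed, expected = log_rank_statistic(times, events, groups)
```

The returned statistic would then be compared with a chi-squared distribution with `dof` degrees of freedom to obtain a p-value.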

Advantages & Disadvantages of Kaplan Meier Estimator

• Advantages
i. Does not require many features: only the time to the event (and
whether the event was observed) is required.
ii. Provides an average overview of survival for the population studied.

• Disadvantages
i. Many variables cannot be related to survival and examined
simultaneously.
ii. If censored data are removed, the model will be biased at the time
of fitting.
iii. The magnitude of the effect of a variable on the event cannot be
estimated properly.

Conclusion
The Kaplan-Meier method is very useful in epidemiology, especially for the
analysis of time-to-event data. It is used in survival analysis to analyze both
the patients who reached a certain event and those who were censored during a
given period of time. It is also well suited to comparing groups of
participants, such as a control group and a treatment group. Statistical
software such as SPSS, Stata, SAS and R packages can be used to generate
survival tables and Kaplan-Meier curves, as well as other relevant tables such
as the overall comparisons table. The KM estimate is also applied in other
disciplines such as engineering, economics, physics, etc.
