
Survival Analysis

Analysis of time to event data


By:
Haftom Temesgen (PhD)
Department of Biostatistics
College of Health Sciences
Mekelle University

2/8/2018 Dr.Haftom Temesgen


January 23, 2018
Learning Objectives:
At the end of this presentation, participants should be able to:
• Describe the basics of survival analysis
– Measure time and events
– Understand censoring
– Understand survival and hazard functions
– Understand models and hypothesis testing
• Produce and interpret Kaplan-Meier survival curves and estimates
• Fit and interpret the Cox proportional hazards model



Regression vs. Survival Analysis

Technique             Predictor variables                  Outcome variable       Censoring permitted?
Linear regression     Categorical or continuous            Normally distributed   No
Logistic regression   Categorical or continuous            Binary                 No
Survival analyses     Time and categorical or continuous   Binary                 Yes
Regression vs. Survival Analysis

Technique             Mathematical model          Yields
Linear regression     Y = B1X + B0 (linear)       Linear changes
Logistic regression   ln(P/(1-P)) = B1X + B0      Odds ratios
Survival analyses     h(t) = h0(t) exp(B1X)       Hazard ratios
Survival Analysis Background
Definition
• A statistical method for studying the time to an event. The term
“survival” suggests that the event of interest is death but the
technique is useful for other types of events.

Alternative terminology
• Event analysis, time-to-event analysis
• Survival analysis: studies involving time to death (biomedical sciences)
• Reliability theory / reliability analysis (engineering)
• Duration analysis / duration modeling (economics)
• Event history analysis (sociology)
Uses
• Clinical trials
• Cohort studies
Survival Analysis
• In many medical studies, the primary endpoint is time
until an event occurs (e.g. death, remission)
• Data are typically subject to censoring when a study
ends before the event occurs
• Survival function: a function describing the proportion of individuals surviving beyond a given time. Notation:
– T: survival time of a randomly selected individual
– t: a specific point in time
– S(t) = P(T > t): survival function
– λ(t): instantaneous failure rate at time t, also known as the hazard function
SURVIVAL ANALYSIS HISTORY

• Origins unknown; has been around for a few hundred years
• Techniques developed in the medical / biological sciences
• World War II: military vehicles (reliability and failure time analysis)
• The Kaplan-Meier estimator was introduced with the publication of "Nonparametric Estimation from Incomplete Observations" by E. L. Kaplan and Paul Meier, 1958
• Cited 34,000 times as of 2011
http://articles.chicagotribune.com/2011-08-18/news/ct-met-meier-obit-20110818_1_clinical-trials-research-experimental-treatment
What is survival analysis?
• Model time to failure or time to event
– Unlike linear regression, survival analysis has a dichotomous
(binary) outcome
– Unlike logistic regression, survival analysis analyzes the time to
an event

• Able to account for censoring


• Can compare survival between 2+ groups
• Assess relationship between covariates and survival
time
What do we mean by Time?
• Length of follow-up till the event of interest
occurs
• Follow-up can start at (for example)
1. Randomization into a clinical trial
2. Time of employment
3. First contact on record in retrospective cohort
• Age of the individual at the time of the event

Why use survival analysis?
1. Why not compare mean time-to-event
between your groups using a t-test or linear
regression?
-- ignores censoring
2. Why not compare proportion of events in
your groups using risk/odds ratios or logistic
regression?
--ignores time

Survival Analysis: Terms
• Time-to-event: The time from entry into a study
until a subject has a particular outcome

• Censoring: Subjects are said to be censored if they are lost to follow-up or drop out of the study, or if the study ends before they die or have an outcome of interest.

Data Structure: Survival Analysis

Two-variable outcome :
• Time variable: ti = time at last disease-free observation or time at event

• Censoring variable: ci = 1 if the subject had the event; ci = 0 if no event by time ti
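
As a small illustration of this two-variable outcome, the sketch below stores each subject as a (time, event indicator) pair. The values are taken from the five-student degree-completion example in this deck; the variable names are placeholders chosen here, not part of any package.

```python
# Each subject is a (t_i, c_i) pair: follow-up time and event indicator.
# c_i = 1 means the event was observed; c_i = 0 means the subject was censored.
# Values mirror the five-student example in this deck (terms enrolled, graduated?).
records = [(5, 0), (9, 1), (14, 0), (7, 1), (8, 1)]

n_events = sum(c for _, c in records)
n_censored = len(records) - n_events
print(f"events={n_events}, censored={n_censored}, total={len(records)}")
```

Keeping the censored rows, rather than dropping them as missing, is what lets later methods use their follow-up time.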

Censoring



Censoring (cont…)



Types of censoring



Types of censoring….



Types of censoring….



What kind of censoring?

A. Leukæmia patients are given a drug or a placebo. Survival time is the duration from remission to relapse. The study ends at 52 weeks with some patients yet to relapse.

B. The age at which children are able to count from 1–10 at school.
Some children are already able to count before joining school.



Censored observations :
• Individuals who have not experienced “the event” by the end of the
study
• Right censoring
o Study participant can’t be located
o or lives beyond the end of the study
o or drops out before the study is completed
o or is still enrolled
o An observation with incomplete information
o Don’t have to handle these individuals as “missing”
o Do have to follow rules with respect to censored data
o Number of censored observations should be small relative to non-censored
o Censored and non-censored populations should be similar (Kaplan-Meier)
SURVIVAL ANALYSIS TO ANALYZE DEGREE COMPLETION – CENSORING

Student     Terms enrolled   Graduation_status
Student 1   5                0
Student 2   9                1
Student 3   14               0
Student 4   7                1
Student 5   8                1

[Figure: timeline of Students 1 to 5 against time in terms (1 to 12). Student 1 dropped out after 5 terms; Student 3 "survived", still enrolled at the end of the study period.]

Outcome data
Censored   Event   Total
2          3       5
SURVIVAL ANALYSIS - CENSORING
Consequences of mishandling or ignoring censored data:

Example
Student cohort, N = 50, event of interest = Graduation
Still enrolled at the end of the study, N = 6
No longer enrolled but did not graduate, N = 4
Options: Code all 10 as missing or code 4 as missing, 6 as
graduated as of study end
Consequences:
Mean time to degree is over- or understated
Risk of selection bias
Ignoring censored records completely or arbitrarily assigning event dates introduces bias into the results
Inclusion of the censored data produces less bias (Newell/Nyun 2011).
SURVIVAL ANALYSIS – HANDLING CENSORED DATA

Two methods to produce the cumulative probability of survival that the survival graph is based upon:

1. SPSS Life Table: in each time period, the effective size of the cohort is reduced by ½ of the censored group

2. Kaplan-Meier Survival Table: the survival probability estimate for each time period, except the first, is a compound conditional probability
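
The "compound conditional probability" idea can be sketched in a few lines of Python. This is a minimal illustration, not SPSS's implementation: the function name `kaplan_meier` is hypothetical, and the data are the five students from the degree-completion example, where the "event" is graduation and "survival" means not yet graduated.

```python
from collections import Counter
from fractions import Fraction

def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time (product-limit estimate)."""
    n_events_at = Counter(t for t, e in zip(times, events) if e == 1)
    n_leaving_at = Counter(times)          # events and censored cases both leave
    at_risk = len(times)
    survival = Fraction(1)
    curve = []
    for t in sorted(n_leaving_at):
        d = n_events_at.get(t, 0)
        if d:
            # Conditional probability of passing t, given still at risk at t,
            # multiplied into the running product.
            survival *= 1 - Fraction(d, at_risk)
            curve.append((t, survival))
        at_risk -= n_leaving_at[t]
    return curve

times = [5, 9, 14, 7, 8]      # terms enrolled
events = [0, 1, 0, 1, 1]      # 1 = graduated, 0 = censored
curve = kaplan_meier(times, events)
for t, s in curve:
    print(t, float(s))
```

Exact fractions are used only so the small example has no rounding noise; ordinary floats would work equally well on real data.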



Survival Analysis

• Survival analysis deals with making inferences about event rates
• Rate at t = rate among those at risk at t
• Summarizes survival by the median (the time by which 50% have had the event)
• Not by the mean (computing it requires everyone to have an event)
• Outcome variable = event time
• Examples of events:
– Death, infection, prostate cancer death,
hospitalization
– Recurrence of cancer after treatment
Survival Function

• S(t) = P[ T > t ] = 1 – P[ T ≤ t ]
• A function describing the proportion of individuals surviving beyond a given time
• Plot: Y axis = % alive, X axis = time
• Proportion of population still without the event by time t



Survival Curve
[Figure: survival curve. Y axis = proportion alive (0.0 to 1.0), X axis = months since surgery (0 to 9).]
Hazard Function

• Also termed incidence rate, instantaneous risk, force of mortality
• λ(t)
• Event rate at t among those at risk for an event
• Key function
• Estimated in a straightforward way
– Censored



Time to Cardiovascular Adverse Event in VIGOR Trial



Hazard Function

• Event = death, scale = months since Tx
• “λ(t) = 1% at t = 12 months”
• “At 1 year, patients are dying at a rate of 1% per month”
• “At 1 year the chance of dying in the following month is 1%”
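
The last reading is an approximation. As a rough check, assuming the hazard stays constant at 1% per month over the following month (an assumption made here for illustration), the probability of dying in that month among those alive at 12 months is 1 − exp(−0.01), which is very close to 1%:

```python
import math

hazard_per_month = 0.01  # λ(t) at t = 12 months, taken from the slide

# Probability of the event within the next month for someone still at risk,
# assuming the hazard is constant over that month.
prob_next_month = 1 - math.exp(-hazard_per_month * 1.0)
print(f"{prob_next_month:.5f}")
```

The two numbers agree closely because the hazard is small; for larger hazards or longer windows the rate and the probability drift apart.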
Relationship between survivor function and hazard
function
• Survivor function S(t) defines the probability of surviving longer than time t
– this is what the Kaplan-Meier curves show.
– Hazard function is the negative derivative of the survivor function over time, divided by the survivor function: h(t) = –[dS(t)/dt] / S(t)
• instantaneous risk of event at time t, given survival to t (conditional failure rate)
• Survivor and hazard functions can be converted
into each other
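
For reference, the standard conversion formulas are sketched below. They assume T is continuous with density f(t); the symbol f(t) is introduced here for illustration and is not used elsewhere in these slides.

```latex
f(t) = -\frac{dS(t)}{dt}, \qquad
h(t) = \frac{f(t)}{S(t)} = -\frac{d}{dt}\ln S(t), \qquad
S(t) = \exp\!\left(-\int_0^t h(u)\,du\right)
```

Reading left to right: the hazard is the event density rescaled by the proportion still at risk, and integrating the hazard recovers the survivor function.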



Kaplan-Meier

• One way to estimate survival


• Nice, simple, can compute by hand
• Can add stratification factors
• Cannot evaluate covariates like Cox model



Limit of Kaplan-Meier curves
• What happens when you have several covariates that you
believe contribute to survival?
• Example
– Smoking, diabetes, hypertension, contribute to time to
myocardial infarct

• Can use stratified K-M curves for 2 or maybe 3 covariates

• Need another approach for many covariates: the multivariable Cox proportional hazards model is the most common



Multivariable method: Cox
proportional hazards
• Needed to assess effect of multiple
covariates on survival

• Cox proportional hazards is the most commonly used multivariable survival method



Cox proportional hazard model

• Works with the hazard model
• Conveniently separates baseline hazard function from covariates
– Baseline hazard function over time
• h(t) = h0(t) exp(B1X), where any intercept is absorbed into the baseline hazard h0(t)
– Covariates are time independent
– B1 is used to calculate the hazard ratio, exp(B1), which is similar to the relative risk



Cox Proportional Hazards Model…

• Add covariates to the model
• Proportional change in the hazard (on the log scale)
• Can test the effect of the factor as in linear regression: H0: β = 0



Limitations of Cox PH model

• Does not accommodate variables that change over time

– Most variables (e.g. gender, ethnicity) are constant


• If necessary, one can program time-dependent
variables



Summary
• Survival analysis quantifies time to a single, dichotomous event
• Handles censored data well
• Survival and hazard can be mathematically converted
to each other
• Kaplan-Meier survival curves can be compared
statistically and graphically
• Cox proportional hazards models help distinguish
individual contributions of covariates on survival,
provided certain assumptions are met.
Life Tables
• Life Tables is a descriptive procedure for
examining the distribution of time-to-event
variables. Additionally, you can compare the
distribution by levels of a factor variable.



Life Tables
• Example: use the data set telco.sav.
• Using Life Tables to Examine Customer Time to
Churn
• As part of its efforts to reduce customer churn, a
telecommunications company is interested in
examining the "time to churn".

• Use Life Tables to examine the distribution of "time to churn" by levels of the company-assigned Customer category.
Life Tables
• To run a Life Tables analysis, from the menus choose:
• Analyze > Survival > Life Tables...
The time a customer has been with the company is measured in months, but we would like to see the life table by quarters.
► Select Months with service as the time variable.
► Type 60 as the maximum time interval to display and type 3 as the number of intervals to display by.
► Select Churn within last month as the status variable.
► Click Define Event.
► Select Customer category as a factor.
► Click Define Range.
► Select Survival in the Plot Type group.
► Choose to compare levels of the first factor Pairwise.
► Click Continue.
► Click OK in the Life Tables dialog box.



The life table is a descriptive table summarizing the time to churn. The
table is sectioned by each level of Customer category. As a result, the
table is very large, so only the section corresponding to Basic
service customers is shown here.
Interval Start Time: The time period that marks the beginning of the interval.

Number Entering Interval: The number of surviving cases at the beginning of the interval. This value decreases steadily with each interval as customers who terminated service or who have not been customers for very long are dropped from further analysis.

Number Withdrawing during Interval: The number of censored cases in this interval. These are still active customers, but so far they have not been customers longer than the time period indicated by this interval.

Number Exposed to Risk: The number of surviving cases minus one half the censored cases. This is intended to account for the effect of the censored cases.

Number of Terminal Events: The number of cases that experience the terminal event in this interval. These are customers that cancelled their service.

Proportion Terminating: The ratio of terminal events to the number exposed to risk.

Proportion Surviving: One minus the proportion terminating.

Cumulative Proportion Surviving at End of Interval: The proportion of cases surviving from the start of the table to the end of the interval.

Probability Density: An estimate of the probability of experiencing the terminal event during the interval.

Hazard Rate: An estimate of the risk of experiencing the terminal event during the interval, conditional upon surviving to the start of the interval.
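
The life-table quantities described above can be sketched in Python as follows. This is an illustrative calculation with made-up counts, not the telco.sav data and not SPSS's own code; the function name `life_table` and the dictionary keys are hypothetical.

```python
def life_table(n_entering, withdrawn, events):
    """Actuarial life table: one row per interval.

    withdrawn[i] = censored cases in interval i, events[i] = terminal events.
    """
    rows = []
    cum_surviving = 1.0
    n = n_entering
    for w, d in zip(withdrawn, events):
        exposed = n - w / 2            # half of the censored cases are counted at risk
        p_terminating = d / exposed
        p_surviving = 1 - p_terminating
        cum_surviving *= p_surviving   # running product across intervals
        rows.append({
            "entering": n,
            "exposed": exposed,
            "p_terminating": p_terminating,
            "p_surviving": p_surviving,
            "cum_surviving": cum_surviving,
        })
        n -= w + d                     # both groups leave before the next interval
    return rows

# Hypothetical counts: 100 customers, two intervals.
rows = life_table(100, withdrawn=[10, 2], events=[19, 14])
for row in rows:
    print(row)
```

In this toy example the first interval has 95 cases exposed to risk (100 minus half of 10), and the cumulative proportion surviving after two intervals is about 0.64.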
The greatest number and proportion of terminal events
occur within the first year, which suggests that customers
should be monitored more closely during their first year to
be sure of their satisfaction with the company's service.



Survival Curve:
The survival curves give a visual representation of the life tables.
Any point on the survival curve shows the probability that a customer of a given service category will remain a customer past that time.

Total service and Basic service customers appear to have the lowest survival curves, and E-service customers appear to have lower curves than Plus service customers.
To determine whether these differences are due to chance, look at the comparisons tables.
Comparisons:

This table provides an overall test of the equality of survival times across groups. The test statistic is based upon the differences in group mean scores.

Since the significance value of the test is less than 0.05, we can conclude that the survival curves of at least two groups are different.

Look at the pairwise tests to confirm which groups are different.


Comparisons (cont…)

The pairwise tests for Basic service customers show that their survival curve is statistically significantly different from those of E-service and Plus service customers.

The pairwise tests for Total service and Basic service customers show that their survival curves are not statistically distinguishable, since the significance value of their pairwise comparison is >0.05.
Summary
• With the Life Tables procedure, we have examined
the distribution of the time to churn, broken down
by levels of the factor Customer category.

• The comparison tests show that Total service and Basic service customers have the lowest survival curves, and E-service customers have lower curves than Plus service customers.



Kaplan-Meier Survival Analysis
• Kaplan-Meier Survival Analysis is a descriptive
procedure for examining the distribution of time-
to-event variables.

• Additionally, we can compare the distribution by levels of a factor variable or produce separate analyses by levels of a stratification variable.



Example:
• A pharmaceutical company is developing an anti-inflammatory medication for treating chronic pain.

• Of particular interest is the time it takes for the drug to take effect and how it compares to an existing medication. Shorter times to effect are considered better.
• The results of a clinical trial are collected in pain_medication.sav.

• Use Kaplan-Meier Survival Analysis to examine the distribution of "time to effect" and compare the effectiveness of the two treatments.



► To run a Kaplan-Meier Survival Analysis, from the menus choose:
► Analyze > Survival > Kaplan-Meier...
► Select Time to effect as the time variable.
► Select Effect status as the status variable.
► Click Define Event.
► Select Treatment as a factor.
► Click Compare Factor.
► Click Options



Survival Tables

The survival table is a descriptive table that details the time until the drug takes effect.
The table is sectioned by each level of Treatment, and each observation occupies its own row in the table. As a result, the table is very large, so only the section corresponding to the first 14 cases to respond to the New drug is shown here.
Survival Tables

Time: The time at which the event or censoring occurred.

Status: Indicates whether the case experienced the terminal event or was
censored.
Survival Tables

Cumulative Proportion Surviving at the Time: The proportion of cases surviving from the start of the table until this time.
When multiple cases experience the terminal event at the same time, these estimates are printed once for that time period and apply to all the cases whose drug took effect at that time.
Survival Tables (cont…)

Number of Cumulative Events: The number of cases that have experienced the terminal event from the start of the table until this time.

Number of Remaining Cases: The number of cases that, at this time, have yet to experience the terminal event or be censored.
Survival Curve

Any point on the survival curve shows the probability that a patient on a given treatment will not have experienced relief by that time.

The plot for the New drug is below that of the Existing drug throughout most of the trial, which suggests that the new drug may give faster relief than the old.
To determine whether these differences are due to chance, look at the comparisons tables.
Comparing Survival Curves

The means and medians for survival time table offers a quick numerical
comparison of the "typical" times to effect for each of the medications.
Since there is a lot of overlap in the confidence intervals, it is unlikely
that there is much difference in the "average" survival time.



Comparing Survival Curves

The percentiles table gives estimates of the first quartile, median, and
third quartile of the survival distribution.

The interpretation of percentiles for survival curves is that the 75th percentile is the latest time that at least 75 percent of the patients have yet to feel relief.
Comparing Survival Curves

This table provides overall tests of the equality of survival times across groups.

Since the significance values of the tests are all greater than 0.10, we cannot determine a difference between the survival curves.
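
To give a feel for what such an overall test computes, here is a minimal sketch of one common choice, the two-group log-rank test. The slides do not say which tests the output table reports, so treat this as an illustration of the general idea rather than a reproduction of the SPSS output. The function name and the toy data are hypothetical.

```python
import math

def logrank_test(times1, events1, times2, events2):
    """Two-group log-rank test; returns (chi-square with 1 df, p-value)."""
    data = [(t, e, 0) for t, e in zip(times1, events1)]
    data += [(t, e, 1) for t, e in zip(times2, events2)]
    event_times = sorted({t for t, e, _ in data if e == 1})
    observed = expected = variance = 0.0
    for t in event_times:
        n1 = sum(1 for u, _, g in data if u >= t and g == 0)   # at risk, group 1
        n2 = sum(1 for u, _, g in data if u >= t and g == 1)   # at risk, group 2
        d1 = sum(1 for u, e, g in data if u == t and e == 1 and g == 0)
        d2 = sum(1 for u, e, g in data if u == t and e == 1 and g == 1)
        n, d = n1 + n2, d1 + d2
        observed += d1
        expected += d * n1 / n          # events expected in group 1 if curves are equal
        if n > 1:
            variance += d * (n1 / n) * (n2 / n) * (n - d) / (n - 1)
    chi2 = (observed - expected) ** 2 / variance if variance > 0 else 0.0
    p_value = math.erfc(math.sqrt(chi2 / 2))   # upper tail of chi-square, 1 df
    return chi2, p_value

# Hypothetical times to effect; every case is an observed event (no censoring).
new_drug = [1, 2, 3, 4]
old_drug = [5, 6, 7, 8]
chi2, p_value = logrank_test(new_drug, [1] * 4, old_drug, [1] * 4)
print(chi2, p_value)
```

A small p-value would indicate that the observed event counts differ from what equal survival curves would predict; a large one, as in the slide, gives no evidence of a difference.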



Summary:

• With the Kaplan-Meier Survival Analysis procedure, we have examined the distribution of time to effect for two different medications.

• The comparison tests show that there is not a statistically significant difference between them.



Cox Regression
• It builds a predictive model for time-to-event data.

• The model produces a survival function that predicts the probability that the event of interest has not yet occurred by a given time t for given values of the predictor variables.

• The shape of the survival function and the regression coefficients for
the predictors are estimated from observed subjects; the model can
then be applied to new cases that have measurements for the
predictor variables.

• Note that information from censored subjects, that is, those that do
not experience the event of interest during the time of observation,
contributes usefully to the estimation of the model.
Cox Regression…
• The Cox Regression procedure is useful for modeling the time to a specified event, based upon the values of given covariates.
• Example. Do men and women have different risks of developing lung cancer based on cigarette smoking? By constructing a Cox Regression model, with cigarette usage (cigarettes smoked per day) and gender entered as covariates, you can test hypotheses regarding the effects of gender and cigarette usage on time-to-onset for lung cancer.

• The basic model offered by the Cox Regression procedure is the proportional hazards model, which can be extended through the specifications of a strata variable or time-dependent covariates.
The Proportional Hazards Model

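The Proportional Hazards Model…
• The hazard function measures the potential for the event to occur at time t among those still at risk; the baseline hazard function measures this potential independently of the covariates.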
• The shape of the hazard function over time is defined by the
baseline hazard, for all cases.
• The covariates simply help to determine the overall magnitude
of the function.
• The value of the hazard is equal to the product of the baseline
hazard and a covariate effect.
• While the baseline hazard is dependent upon time, the
covariate effect is the same for all time points.
• Thus, the ratio of the hazards for any two cases at any time
period is the ratio of their covariate effects. This is the
proportional hazards assumption.
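
The constant-ratio property can be checked numerically. In the sketch below the baseline hazard, the coefficient value, and the function names are all hypothetical choices made for illustration; whatever baseline is used, the ratio of hazards for two cases does not depend on t.

```python
import math

def baseline_hazard(t):
    # Hypothetical baseline: hazard rising linearly with time.
    return 0.02 * t

def hazard(t, x, beta):
    # Baseline hazard multiplied by the covariate effect exp(beta * x).
    return baseline_hazard(t) * math.exp(beta * x)

beta = 0.4  # hypothetical regression coefficient
ratios = [hazard(t, 1, beta) / hazard(t, 0, beta) for t in (1, 5, 10)]
print(ratios)  # every entry is approximately exp(0.4)
```

The baseline hazard cancels in the ratio, which is why the model can estimate covariate effects without specifying the baseline's shape.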
Cox regression data consideration
Data:
Your time variable should be quantitative, but your status variable can
be categorical.
Independent variables (covariates) can be continuous or categorical; if
categorical, they should be dummy- or indicator-coded.
Strata variables should be categorical, coded as integers or short
strings.
Assumptions:
Observations should be independent, and the hazard ratio should be
constant across time; that is, the proportionality of hazards from one
case to another should not vary over time.

The latter assumption is known as the proportional hazards assumption.
Stratified Proportional Hazards
• When the proportional hazards assumption is not met with respect to a categorical covariate, we can sometimes correct the problem by specifying that covariate as a stratification variable.

• Separate baseline hazards are computed for each level of the stratification variable, while the regression coefficients for the remaining covariates are equal across strata.
Proportional Hazards
• If the proportional hazards assumption does not hold (that is, the hazard ratio is not constant across time), you may need to use the Cox with Time-Dependent Covariates procedure.

• If you have no covariates, or if you have only one categorical covariate, you can use the Life Tables or Kaplan-Meier procedure to examine survival or hazard functions for your sample(s).



Summary
• The basic model offered by the Cox Regression
procedure is the proportional hazards model, which
can be extended through the specifications of a strata
variable or time-dependent covariates.

• The proportional hazards model assumes that the time to event and the covariates are related through the following equation



Summary…



Using Cox Regression to Model Customer Time to Churn

• As part of its efforts to reduce customer churn, a telecommunications company is interested in modeling the "time to churn" in order to determine the factors that are associated with customers who are quick to switch to another service.

• To this end, a random sample of customers is selected and their time spent as customers, whether they are still active customers, and various demographic fields are pulled from the database.

• This information is collected in telco.sav.


Using Cox Regression to Model Customer Time to Churn…

• Use Cox Regression to determine which attributes are associated with shorter "time to churn".

• The company is especially interested in the relationship of the company-assigned Customer category to churn, so be sure that the final model contains this variable.



Cox Regression analysis
• Analyze > Survival > Cox Regression...
► Select Months with service [tenure] as the time variable.
► Select Churn within last month [churn] as the status variable.
► Click Define Event.
► Select Age in years [age] through Years at current address
[address] and Level of education [ed] through Number of people in
household [reside] as covariates.

► Select Forward: LR as the variable selection method.
► Click Next in the Block group.
► Select Customer category [custcat] as a covariate.

Using the Enter method in a second block guarantees that Customer category will be included in the model.



► Click Categorical.
► Select marital, ed, retire, gender, and custcat as categorical covariates.
► Click Continue.
► Click Plots in the Cox Regression dialog box.
► Select Survival and Hazard in the Plot Type group.
► Choose to have separate lines produced for each level of custcat.
► Click Continue.
► Click OK in the Cox Regression dialog box



The status variable identifies whether the event has occurred for a
given case. If the event has not occurred, the case is said to be censored.

Censored cases contribute no event of their own to the estimation of the regression coefficients, but they remain in the risk set until they are censored, and they are used to compute the baseline hazard.

The case processing summary shows that 726 cases are censored. These are customers who have not churned.
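
One way to see how censored cases still inform a Cox fit is through the partial log-likelihood that the coefficients maximize. The sketch below handles a single covariate and assumes no tied event times; the function name and the toy data are hypothetical, and real software also performs the maximization, which is omitted here.

```python
import math

def cox_partial_loglik(beta, times, events, x):
    """Partial log-likelihood for one covariate, assuming no tied event times."""
    loglik = 0.0
    for i, (t_i, e_i) in enumerate(zip(times, events)):
        if e_i != 1:
            continue  # censored cases add no term of their own...
        # ...but they appear in the risk set of every earlier event.
        risk_set = [j for j, t_j in enumerate(times) if t_j >= t_i]
        denom = sum(math.exp(beta * x[j]) for j in risk_set)
        loglik += beta * x[i] - math.log(denom)
    return loglik

times = [2, 3, 5, 7]     # hypothetical months with service
events = [1, 0, 1, 1]    # 1 = churned, 0 = censored
x = [1, 0, 1, 0]         # hypothetical binary covariate

with_censored = cox_partial_loglik(0.0, times, events, x)
without_censored = cox_partial_loglik(0.0, [2, 5, 7], [1, 1, 1], [1, 1, 0])
print(with_censored, without_censored)
```

Dropping the censored customer changes the value because that customer was still at risk when the first event occurred.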
The categorical variable codings are a useful reference for
interpreting the regression coefficients for categorical covariates.

By default, the reference category is the "last" category.


For example, even though Married customers have variable values of
1 in the data file, they are coded as 0 for the purposes of the regression.



The omnibus tests are measures of how well the model performs. The
chi-square change from previous step is the difference between the –2
log-likelihood of the model at the previous step and the current step.

In the fourth step, age is removed from the model, likely because the
variation in time to churn that is explained by age is also explained
by employ and address; thus, when these variables are added to the
model, age is no longer necessary. Finally, marital is added in the fifth step
Variable Selection



The change from previous step and change from previous block both
report the effect of adding custcat to the model selected in Block 1.

Since the significance value of the change is less than 0.05, you can be
confident that custcat contributes to the model.



Variable selection

The final model includes marital, address, employ, and custcat.

To understand the effects of individual predictors, look at Exp(B), which can be interpreted as the predicted multiplicative change in the hazard for a unit increase in the predictor.



The value of Exp(B) for marital means that the churn hazard for an
unmarried customer is 1.541 times that of a married customer. Recall from
the categorical variable codings that unmarried = 1 for the regression.
The value of Exp(B) for address means that the churn hazard is reduced by 100%−(100%×0.941)=5.9% for each year a customer has lived at the same address.
The churn hazard for a customer who has lived at the same address for five years is reduced by 100%−(100%×0.941^5)=26.2%.
Likewise, the value of Exp(B) for employ means that the churn hazard
is reduced by 100%−(100%×0.920)=8.0% for each year a customer has
worked for the same employer.

The churn hazard for a customer who has worked for the same employer for three years is reduced by 100%−(100%×0.920^3)=22.1%.
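
The percentage reductions quoted for address and employ follow from raising Exp(B) to the number of units. A short check, using only the Exp(B) values given in the text (the helper name `percent_reduction` is a placeholder chosen here):

```python
def percent_reduction(exp_b, units=1):
    """Percent reduction in hazard after `units` unit increases in the predictor."""
    return 100 * (1 - exp_b ** units)

address_1yr = percent_reduction(0.941)       # one year at the same address
address_5yr = percent_reduction(0.941, 5)    # five years at the same address
employ_1yr = percent_reduction(0.920)        # one year with the same employer
employ_3yr = percent_reduction(0.920, 3)     # three years with the same employer

for name, value in [("address, 1 yr", address_1yr), ("address, 5 yr", address_5yr),
                    ("employ, 1 yr", employ_1yr), ("employ, 3 yr", employ_3yr)]:
    print(f"{name}: {value:.1f}%")
```

The effect compounds multiplicatively, so five years at the same address gives less than five times the one-year reduction.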



The regression coefficient for the first category, corresponding to Basic
service customers, suggests that the hazard for Basic service customers is
1.129 times that of Total service customers;

However, the significance value for this coefficient is greater than 0.10,
so any observed difference between these customer categories could be
due to chance.



By contrast, the significance values for the second and third categories,
corresponding to E-service and Plus service customers, are less than 0.05,
which means they are statistically different from the Total service.

The regression coefficients suggest that the hazard for E-service customers is 0.563 times that of Total service customers, and the hazard for Plus service customers is 0.518 times that of Total service customers.



Thank you
