Professional Documents
Culture Documents
Stata UK 2009 Handout
Stata UK 2009 Handout
Stata UK 2009 Handout
of Health Sciences, University of Leicester, UK Epidemiology & Biostatistics, Karolinska Institutet, Stockholm, Sweden 3 MRC Clinical Trials Unit, London, UK
1 Department
Paul C Lambert
1/52
Outline
1 2 3 4 5 6 7 8
The stpm2 command Why We Need Flexible Models Time-Dependent Eects Quantifying Dierences Average Survival Curve Attained Age as the Time-Scale Relative Survival Crude Mortality
Paul C Lambert
2/52
Paul C Lambert
3/52
Linear Predictor
i = s (ln(t)|, k0 ) + x For models on the log cumulative hazard scale.
stpm2 ml hazard.ado
program stpm2 ml hazard version 10.0 args todo b lnf g negH g1 g2 tempvar xb dxb mleval xb = b, eq(1) mleval dxb = b, eq(2) local st exp(-exp(xb)) local ht dxb*exp(xb) mlsum lnf = _d*ln(ht) + ln(st) /** then deal with late entry, first and *** *** second derivatives etc **/ end
Paul C Lambert Flexible Parametric Survival Models UK Stata User Group 2009, London 5/52
Paul C Lambert
6/52
Women diagnosed with breast cancer in England and Wales 1986-1990 with follow-up to 1995(Coleman et al., 1999). As an example I will investigate the eect of deprivation (in ve groups) on all-cause mortality in women who were diagnosed under the age of 50 years. Follow-up will be restricted to 5 years. Due to their age, most of the women who die within 5 years will die due to their cancer.
Paul C Lambert
7/52
Deprivation Group
Least Deprived 2 3 4 Most Deprived
Paul C Lambert
8/52
Paul C Lambert
9/52
Time-Dependent Eects
The dierence between two hazard rates may not be proportional. We can choose to,
1 2 3
Ignore. Model on a dierent scale. Fit an interaction between the covariate and time.
Paul C Lambert
10/52
Time-Dependent Eects
A proportional hazards model can be written ln [Hi (t|xi )] = i = s (ln(t)|, k0 ) + xi With D time-dependent eects we write,
D
s (ln(t)| j , kj )xij + xi
There is a set of spline variables for each time-dependent eect. For any time-dependent eect there is an interaction between the covariate and the spline variables. The number of spline variables for a particular time-dependent eect will depend on the number of knots, kj
Paul C Lambert Flexible Parametric Survival Models UK Stata User Group 2009, London 11/52
There is no need to split the time-scale when tting time-dependent eects. When time-dependence is a linear function of ln(t) and N = 50, 000, 50% censored and no ties.
stcox using tvc() - 28 minutes, 24 seconds. stpm2 using dftvc(1) - 0 minutes, 2.5 seconds.
Paul C Lambert
12/52
Deprivation Group
40 0 1 Least Deprived Most Deprived 2 3 4 Years from Diagnosis 5
Paul C Lambert
13/52
Paul C Lambert
14/52
Quantifying Dierences
A key advantage of using a parametric model over the Cox model is that we can transform the model parameters to express dierences between groups in dierent ways. The hazard ratio is a relative measure and a greater understanding of the impact of an exposure can be obtained by also looking at absolute dierences. The predict command of stpm2 makes the predictions easy. They work in a similar way as the hrnumerator() and hrdenominator() commands.
Paul C Lambert
15/52
Paul C Lambert
16/52
Paul C Lambert
17/52
Paul C Lambert
19/52
This is what stcurve does. A dierent concept is the mean survival for a population with a particular covariate distribution.
N
Spop (t) =
i=1
Survival Function
Paul C Lambert
22/52
Survival curve calculated for each subject in the study population and then averaged. In large studies use the timevar() option.
Paul C Lambert Flexible Parametric Survival Models UK Stata User Group 2009, London 23/52
Paul C Lambert
24/52
Paul C Lambert
25/52
Paul C Lambert
26/52
Outcome is for femoral neck fractures. Risk of fracture varies by age. Age is used as the main time-scale. Alternative way of adjusting for age. Gives the age specic incidence rates.
Paul C Lambert Flexible Parametric Survival Models UK Stata User Group 2009, London 27/52
Cox Model Incidence rate ratio (no orchiectomy) Incidence rate ratio (orchiectomy)
.
Royston-Parmar Model Incidence rate ratio (no orchiectomy) Incidence rate ratio (orchiectomy)
Paul C Lambert Flexible Parametric Survival Models
Proportional Hazards
Incidence Rate (per 1000 pys) 75 50 25 10 5 1 Control No Orchiectomy Orchiectomy
.1 40 60 Age 80 100
Paul C Lambert
29/52
.1 40 60 Age 80 100
Paul C Lambert
30/52
20 10 5 2 1 50 60 70 Age
horizontal lines from piecewise Poisson model
80
90
100
Paul C Lambert
31/52
20
10
0 50 60 70 Age 80 90 100
Paul C Lambert
32/52
Relative Survival is used in population-based cancer studies. Growing interest in other disease areas: HIV (Bhaskaran et al., 2008), CHD (Nelson et al., 2008). Relative Survival is used to measure mortality associated with a particular disease. Avoids needing information on cause of death. Important as cause of death may not be recorded or may be inaccurately recorded. We use expected mortality (from routine data sources).
Paul C Lambert
33/52
If we transform to the survival scale, Relative Survival = Observed Survival Expected Survival R(t) = S(t) S (t)
Paul C Lambert
34/52
Analyse all 115,331 women diagnosed with breast cancer. Compare 5 age groups.
Relative Survival
. stpm2 agegrp2-agegrp5, df(5) scale(hazard) bhazard(rate)
Paul C Lambert
36/52
The excess hazard ratios come from a poor tting model. The eect of age is nearly always time-dependent. The inclusion of time-dependent eects is the same as for standard survival models. Relative and standard survival are now analysed within the same framework.
Paul C Lambert Flexible Parametric Survival Models UK Stata User Group 2009, London 37/52
Crude Mortality
Patient survival is the most important single measure of cancer patient care (the diagnosis and treatment of cancer) and is of considerable interest to clinicians, patients, researchers, politicians, health administrators, and public health professionals (Dickman and Adami, 2006). Little attention has been paid to the fact that each of these consumers of survival statistics have quite dierent needs. The standard approach of estimating net survival (relative survival or cause-specic survival) is useful for comparing populations but not necessarily relevant to individual patients.
Paul C Lambert
38/52
Paul C Lambert
39/52
Cronin and Feuer(Cronin and Feuer, 2000) showed how crude mortality due to cancer and due to other causes can be calculated from life tables. Available in Paul Dickmans strs command. Calculated separately in age groups. Time-scale split into large (yearly) time intervals. No individual level prediction using continuous covariate.
Paul C Lambert
41/52
Crude mortality can be estimated after tting a relative survival model. The tting of the relative survival model is not any dierent, but we do some tricky calculations postestimation. The exible parametric models allow individual level covariates to be modelled. See Lambert et al. (2009) for details.
Paul C Lambert
42/52
(u)du
S (u)R(u)(u)du
S (u)R(u)h (u)du
UK Stata User Group 2009, London 43/52
Integrating
The integration is performed numerically by splitting the time-scale into a large number, n, of small intervals (e.g. 1000). The predicted value of the integrand at each of the n values of t is obtained. The crude probability of death is the sum of the these predicted values. The variance is a bit trickier, as the observation-specic derivatives need to be obtained. These are calculated numerically (Statas predictnl command). The approach is similar to that used by Carstensen when calculating survival functions from Poisson based survival models(Carstensen, 2006)
Paul C Lambert Flexible Parametric Survival Models UK Stata User Group 2009, London 44/52
Example
28,943 men diagnosed with prostate cancer aged 40-90 in England and Wales between 1986-1988 inclusive and followed up to 1995. Restricted cubic splines are used to
Model the baseline excess hazard (6 knots). Model the main eect of age (4 knots). Model time-dependence of age (4 knots).
Paul C Lambert
45/52
stpm2cm is a post estimation command. It will calculate the crude probability of death due to cancer and other causes with associated condence intervals.
stpm2cm
. stpm2cm using uk popmort, /// mergeby( year sex region caquint age) maxt(10) /// diagage(agediag) diagyear(1986) attyear( year) /// attage( age) diagsex(1) /// at(agercs1 a1 agercs2 a2 agercs3 a3) /// stub(ciagediag) tgen(ci tagediag) /// mergegen(region 1 caquint 1) nobs(1000) ci
Paul C Lambert
46/52
Net Probability
Crude Probability
1.0 0.8 0.6 0.4 0.2 0.0 0 2 4 6 8 10 Years from Diagnosis
65 years
55 years 85 years
Paul C Lambert
47/52
Probability of Death
10
Alive
Paul C Lambert
48/52
10
Alive
Paul C Lambert
49/52
10
Alive
Paul C Lambert
50/52
Further Extensions
Update to Stata 11. Univariate and shared frailty models. Multiple Events. Competing Risks. Survey options? Cure models. Estimation of loss in expectation of life. Enhance ability to model multiple time-scale.
Paul C Lambert
51/52
References I
Bhaskaran, K., Hamouda, O., Sannes, M., Boufassa, F., Johnson, A. M., Lambert, P. C., Porter, K., and Collaboration, C. A. S. C. A. D. E. (2008). Changes in the risk of death after hiv seroconversion compared with mortality in the general population. JAMA, 300(1):5159. Carstensen, B. (2006). Demography and epidemiology: Practical use of the lexis diagram in the computer age or: Who needs the cox-model anyway? Technical report, Department of Biostatistics, University of Copenhagen. Coleman, M., Babb, P., Damiecki, P., Grosclaude, P., Honjo, S., Jones, J., Knerer, G., Pitard, A., Quinn.M.J., Sloggett, A., and De Stavola, B. (1999). Cancer survival trends in England and Wales, 1971-1995: deprivation and NHS Region. Oce for National Statistics, London. Cronin, K. A. and Feuer, E. J. (2000). Cumulative cause-specic mortality for cancer patients in the presence of other causes: a crude analogue of relative survival. Statistics in Medicine, 19(13):17291740. Dickman, P. W. and Adami, H.-O. (2006). Interpreting trends in cancer patient survival. J Intern Med, 260(2):103117. Dickman, P. W., Adolfsson, J., Astrm, K., and Steineck, G. (2004). Hip fractures in men with prostate cancer treated with orchiectomy. Journal of Urology, 172(6 Pt 1):22082212. Lambert, P. C., Dickman, P. W., Nelson, C. P., and Royston, P. (2009). Estimating the crude probability of death due to cancer and other causes using relative survival models. Statistics in Medicine, (in press). Lambert, P. C. and Royston, P. (2009). Further development of exible parametric models for survival analysis. The Stata Journal, 9:265290. Nelson, C. P., Lambert, P. C., Squire, I. B., and Jones, D. R. (2007). Flexible parametric models for relative survival, with application in coronary heart disease. Statistics in Medicine, 26(30):54865498. Nelson, C. P., Lambert, P. C., Squire, I. B., and Jones, D. R. (2008). Relative survival: what can cardiovascular disease learn from cancer? European Heart Journal, 29(7):941947. Nieto, F. J. and Coresh, J. (1996). Adjusting survival curves for confounders: a review and a new method. Am J Epidemiol, 143(10):10591068. Royston, P. (2001). Flexible parametric alternatives to the Cox model, and more. The Stata Journal, 1:128.
75
50
1 df: AIC = 53746.92, BIC = 53788.35 2 df: AIC = 53723.60, BIC = 53771.93 3 df: AIC = 53521.06, BIC = 53576.29
25
4 df: AIC = 53510.33, BIC = 53572.47 5 df: AIC = 53507.78, BIC = 53576.83 6 df: AIC = 53511.59, BIC = 53587.54
7 df: AIC = 53510.06, BIC = 53592.91 8 df: AIC = 53510.78, BIC = 53600.54 9 df: AIC = 53509.62, BIC = 53606.28 10 df: AIC = 53512.35, BIC = 53615.92
Paul C Lambert
1.1
1.2
Deprivation Group 3
1.3
1.4
df for Splines
1.1
1.2
Deprivation Group 4
1.3
1.4
1.3
1.4
Hazard Ratio
Paul C Lambert Flexible Parametric Survival Models UK Stata User Group 2009, London
The default knots positions tend to work fairly well. Unless the knots are in stupid places then there is usually very little dierence in the tted values. The graphs on the following page shows for 5 df (4 internal knots) the tted hazard and survival functions with the internal knot locations randomly selected.
Paul C Lambert
75
50
13.7 55.8 60.5 64.3 6.1 10.9 61.8 68.4 4.5 25.5 55.5 87.1
25
42.4 52.2 84.1 89.8 21.1 26.5 56.4 94.8 11.8 27.7 40.8 72.2
42.2 46.1 87.2 89.4 5.8 67.6 69.9 71.5 9.8 23.2 35.3 59.5 10.2 10.9 57.7 80.7
Paul C Lambert
Paul C Lambert
Model Model Model Model Model Model (a) (b) (c) (d) (e)
Baseline dfb 5 8 5 3 8
Time-dependent dft 3 5 5 3 8
age dfa 3 5 3 3 8
No. of Parameters 18 39 24 16 81
Paul C Lambert
Age 55
1 .8 .6 .4 .2 0 0
Age 65
10
Age 75
1 Crude Probability .8 .6 .4 .2 0 0 2 4 6 8 Years from Diagnosis 10 Crude Probability 1 .8 .6 .4 .2 0 0
Age 85
10
Model (a)
Model (b)
Model (c)
Model (d)
Model (e)
Paul C Lambert