Download as pdf or txt
Download as pdf or txt
You are on page 1of 7

Computer Methods and Programs in Biomedicine 177 (2019) 133–139

Contents lists available at ScienceDirect

Computer Methods and Programs in Biomedicine


journal homepage: www.elsevier.com/locate/cmpb

Evaluate Cutpoints: Adaptable continuous data distribution system for


determining survival in Kaplan-Meier estimator
Magdalena Ogłuszka, Magdalena Orzechowska, Dorota Jedroszka,
˛ Piotr Witas,
Andrzej K. Bednarek∗
Department of Molecular Carcinogenesis, Medical University of Lodz, Lodz, Poland

a r t i c l e i n f o a b s t r a c t

Article history: Background and Objective: Growing evidence of transcriptional and metabolomic differentiation induced
Received 21 August 2018 many studies which analyze such differentiation in context of outcome of disease progression, treatment
Revised 15 February 2019
or influence of many different factors affecting cellular and tissue metabolism. Particularly, cancer re-
Accepted 22 May 2019
searchers are looking for new biomarkers that can serve as a diagnostic/prognostic factor and its fur-
ther corresponding relationship regarding clinical effects. As a result of the increasing interest in use
Keywords: of dichotomization of continuous variables involving clinical or epidemiological data (gene expression,
Disease-free survival biomarkers, biochemical parameters, etc.) there is a large demand for cutoff point determination tools
ROC curve with simultaneous lack of software offering stratification of patients based on continuous and binary vari-
Prognosis
ables. Therefore, we developed “Evaluate Cutpoints” application offering wide set of statistical and graph-
Survival analysis
ical methods for cutpoint optimization enabling stratification of population into two or three groups.
Software
Methods: Application is based on R language including algorithms of packages such as survival, survMisc,
OptimalCutpoints, maxstat, Rolr, ggplot2, GGally and plotly offering Kaplan-Meier plots and ROC curves with
cutoff point determination.
Results: All capabilities of Evaluate Cutpoints were illustrated with example analysis of estrogen, pro-
gesterone and human epidermal growth factor 2 receptors in breast cancer cohort. Through ROC curve
the cutoff points were established for expression of ESR1, PGR and ERBB2 in correlation with their im-
munohistochemical status (cutoff: 1301.253, 243.35, 11,434.438, respectively; sensitivity: 94%, 85%, 64%,
respectively; specificity: 93%, 86%, 91%, respectively). Through disease-free survival analysis we divided
patients into two and three groups regarding expression of ESR1, PGR and ERBB2. Example algorithm
cutp showed that lowered expression of ESR1 and ERBB2 was more favorable (HR = 2.07, p = 0.0412;
HR = 2.79, p = 0.0777, respectively), whereas heightened PGR expression was correlated with better prog-
nosis (HR = 0.192, p = 0.0115).
Conclusions: This work presents application Evaluate Cutpoints that is freely available to download at
http://wnbikp.umed.lodz.pl/Evaluate-Cutpoints/. Currently, many softwares are used to split continuous
variables such as Cutoff Finder and X-Tile, which offer distinct algorithms. Unlike them, Evaluate Cut-
points allows not only dichotomization of populations into groups according to continuous variables and
binary variables, but also stratification into three groups as well as manual selection of cutoff point thus
preventing potential loss of information.
© 2019 The Authors. Published by Elsevier B.V.
This is an open access article under the CC BY-NC-ND license.
(http://creativecommons.org/licenses/by-nc-nd/4.0/)

1. Introduction cal research is focused on identification of associations of potential


biomarkers with outcome that may serve as prognostic factor, fur-
In the advent of high-throughput technologies such as next thermore assessing its relationship with health effect. As a result
generation sequencing, mass spectrometry and metabolomics of increasing interest in the use of dichotomization of continuous
along with quantitative analysis of data, more and more biomedi- variables involving clinical or epidemiological data (gene expres-
sions, biomarkers, biochemical parameters, etc.) there is a large de-
mand for cutoff point determination tools. The “optimal” cutpoint

Corresponding author. being defined as that threshold value of the continuous covariate
E-mail address: andrzej.bednarek@umed.lodz.pl (A.K. Bednarek).

https://doi.org/10.1016/j.cmpb.2019.05.023
0169-2607/© 2019 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license. (http://creativecommons.org/licenses/by-nc-nd/4.0/)
134 M. Ogłuszka, M. Orzechowska and D. Jedroszka
˛ et al. / Computer Methods and Programs in Biomedicine 177 (2019) 133–139

distribution, which best separates low and high risk patients with user also specifies the statistical method that will be used to calcu-
respect to some outcome [1,2]. In general, there are several sim- late the cutoff point. When the user chooses to divide the popula-
ple approaches that can possibly determine the cutoff point (i.e. tion into three groups, they also indicate if the cutoff point should
using the mean, median or quantile distribution). However, due be estimated using the hierarchical clustering method, or whether
to the fact that many factors can affect the pathogenic process of they will set the cutoff points manually. The query is then pro-
cancer and the effect of treatment cannot be unequivocally con- cessed by the server and the results include estimated cutoff points
firmed, due to the risk of recurrence, it is necessary to use the and graphical visualization.
appropriate statistical methods. One of the best known methods
accounting for the uncertainties associated with clinically relevant 2.2. Methods for cutoff point value determination
endpoints is survival analysis. Survival analysis is a set of statisti-
cal methods used to determine the time elapsed for any event to Compartmentation of the population into two groups can be
occur; in biomedical research for instance it can be death or re- performed with four statistical methods based on significance of
lapse. At the other hand choosing the best cutoff point is often correlation with binary outcome or survival time. The user can also
a compromise between the highest sensitivity (the smallest per- determine the cutoff point value manually.
centage of false negative results) and the highest specificity (the Methods based on significance of correlation with binary out-
smallest percentage of false positive results) which for measurable come: Youden index and minimization of the distance between
data is graphically represented in the form of ROC curve [3,4]. Of- PROC plot and point (0,1). Application uses the package Optimal-
ten the categorization of only two groups results in loss of infor- Cutpoints to generate ROC plots and estimate: cuttoff points, pos-
mation, therefore in some cases it is necessary to distinguish the itive predictive value (PPV), negative predictive value (NPV), sen-
third, middle group. There are several possibilities for use continu- sitivity (Se), specificity (Sp), positive diagnostic likelihood ratio
ous variable with survival analysis and Kaplan-Meier plot. The sim- (LR+), negative diagnostic likelihood ratio (LR-), false positives (FP),
plest approach is to divide continuous predictor into groups based false negatives (FN).
on quantile groups an example is PROGgene web application [5]. Methods based on significance of correlation with survival
Unfortunately, such non-systemic determination of cut-off point time: maximally selected rank statistics and Cox proportional haz-
lacks reproducibility and different studies are difficult to compare. ard model. Calculation of maximally selected rank statistics and es-
The statistical-mathematical methods of a cut-off point determi- timation of cutoff value (the most significant split based on the
nation for continuous predictor are implemented in several algo- standarized log-rank test) is performed with the use of maxstat
rithms like OptimalCutpoints [6], maxstat [7] but those programs package. As for the second method, application uses coxph function
are not user friendly and require minimal programing knowledge. from the survival package to fit Cox proportional hazard model to
Cutoff Finder is a one of the best web based and user friendly ap- the binary (outcome) and continuous (survival time and biomarker
plications that one may use with continuous variable predictor [8]. value) covariates. Cutpoint is then computed with the cutp func-
Another one is X-tile a standalone software [9]. Both programs are tion (survMisc package).
widely use to assess the association between marker of gene ex- Adaptive (manual) selection of the cutoff value: the user can
pression and survival, and automatically select the optimum cut select a cutoff value based on a scalable, interactive heatmap (gen-
point in continuous variable like gene expression or CpG methyla- erated with the use of plotly package) that illustrates all cuttoff
tion studies [10–12]. Here we present Evaluate Cutpoint software points (biomarker values arranged from lowest to highest). The in-
enabling estimation of one or two cutpoints, by using different tensity of the color (from blue to yellow) of each field represents
kinds of statistical methods, and graphical visualization of results. the probability value, calculated using the coxph function from the
Our software suite has several advantages over algorithms men- survival package. Selection of a field on the heatmap results in gen-
tioned above and the most important is a wider selection of sta- eration of Kaplan-Meier plot and a table with estimated hazard ra-
tistical algorithms for cut-off point determination. Application has tio, confidence interval and p-value. Cutpoint adaptability increases
been tested using clinical and gene expression data of Breast Inva- the value of the algorithm, since the statistically significant cutoff
sive Carcinoma (BC) cohort from The Cancer Genome Atlas (TCGA). value may not be optimal biologically or medically. In such cases,
We investigated whether the expression together with immunohis- despite lower statistical significance, we may receive better infor-
tochemically determined status of ESR1, PGR and ERBB2 have signif- mation from a medical point of view.
icant impact on disease free survival in breast cancer patients. Classification of the population into three groups based on sur-
vival time and binary outcome can be executed in the applica-
2. Materials and methods tion with the hierarchical clustering method (function rhier from
the Rolr package). Firstly, the algorithm splits the cohort into two
Evaluate Cutpoints is an application developed using the R lan- groups by estimation of the optimal cutpoint with the highest log-
guage [13], Shiny framework and R packages (R version 3.4.1): rank statistics. The procedure is then repeated in the resulting
survival, survMisc, OptimalCutpoints [6], maxstat [7], Rolr, ggplot2, groups to obtain two supplementary cutoff values. Second opti-
GGally and plotly. It consists of two main layers – the first one gen- mal cutpoint is the one with larger test statistics. Application omits
erates HTML dynamically, the second separates the real-time data all rows (observations) with NA values. User can also manually se-
analysis. Software is available through the web interface. lect two cutpoints from the sliders to observe the changes in the
Kaplan-Meier plot and pairwise comparison of hazard ratio, confi-
2.1. Workflow dence intervals and p-values between the resulting groups.
One should bear in mind that slight differences in results given
The input dataset can be any tab-separated file. Observations by employed methods, when they are indeed comparable, are
should be placed in rows, columns should represent variables. The natural. Firstly, theory behind an R function might use slightly
next step is to choose the number of resulting groups (two or different mathematics (as in the case of cutp and maxstat.test [14]),
three). The user then defines variables needed to perform the anal- secondly - there are potential differences in implementation of
ysis: biomarker, survival time and outcome. When the user wants algorithms performing the calculations, as well as in algorithms
to split the cohort into two groups, they also determine whether themselves. Moreover, various methods can be expected to give
the cutoff value should be estimated manually or based on the consistent results provided that cutpoints are defined unambigu-
significance of correlation with the binary outcome or survival. The ously for the data at hand. If there is an intermediate region
M. Ogłuszka, M. Orzechowska and D. Jedroszka
˛ et al. / Computer Methods and Programs in Biomedicine 177 (2019) 133–139 135

in the range of the interesting covariate that does not clearly tor 2 (ERBB2) receptors in BC tissue. BC is heterogeneous disease,
differentiate between survival curves, one can in principle end up therefore the evaluation of its molecular subtype is essential step
with significantly non-equal cutpoints from the former. in clinical management.
We used publicly available gene expression (RNAseqV2, RSEM
2.3. Data visualization and analysis normalized) and clinical data of BC cohort extracted from TCGA
(data status of Jan 28th, 2016). We defined cuttoff points for all
For each cuttoff point estimation method, the application gener- three receptors using all algorithms available in Evaluate Cutpoints
ates the histogram, Kaplan-Meier plot and calculates hazard ratio, application and all resulting figures were automatically generated.
confidence intervals and p-values (pairwise in case of division into Among the methods which we used for cutoff point deter-
three groups). Kaplan-Meier plot is generated using a combination mination was the one based on the measurement of specificity
of survival, ggplot2 and plotly packages. It is scalable and interac- and sensitivity. We focused on two methods established based on
tive (when the user hovers over the plot, survival probability and ROC curve (Youden Index and minimization of the distance be-
time is shown). Hazard ratio, confidence intervals and p-values are tween ROC plot and point (0,1)). The methods optimize the cuttoff
estimated with the use of coxph function from survival package. point using correlation between gene expression measurements as
biomarker and outcome information which in our case was im-
3. Results munohistochemically determined receptors status. The cutoff point
for ESR1 was determined as 1301.253 with sensitivity greater than
Once the input data has been prepared, we illustrate the func- 94% and specificity greater than 93% and was exactly the same
tionality of Evaluate Cutpoints application by the analysis of estro- for both methods (Fig. 1). For PGR the optimal cutoff point was
gen (ESR1), progesterone (PGR) and human epidermal growth fac- also the same for both methods and was determined as 243.35

Fig. 1. Cutoff point determination using ROC curve. A) and B) Histogram of ESR1 gene expression in 1093 breast cancer patients from TCGA data classifying them into two
groups according to cutoff point determined by Youden index and ROC plot (0, 1), respectively; C) ROC curve as a result of Youden index method with optimal cutoff point
for ESR1 and the quality of the prediction assessed by the AUC; D) ROC curve as a result of method of minimizing distance to the point (0.1) with optimal cutoff point for
ESR1 and the quality of the prediction assessed by the AUC;.
136 M. Ogłuszka, M. Orzechowska and D. Jedroszka
˛ et al. / Computer Methods and Programs in Biomedicine 177 (2019) 133–139

Table 1
Statistics for cutoff point optimization using methods based on ROC curve.

Youden index

cutpoint se sp PPV NPV DLR+ DLR- FP FN AUC

ESR1 1301.253 0.947 0.932 0.979 0.837 14.022 0.057 16 43 0.96


PGR 243.35 0.85 0.868 0.929 0.739 6.457 0.173 45 105 0.922
ERBB2 11,434.438 0.646 0.912 0.684 0.898 7.387 0.388 49 58 0.813
ROC (0.1)

ESR1 1301.253 0.947 0.932 0.979 0.837 14.022 0.057 16 43 0.96


PGR 243.35 0.85 0.868 0.929 0.739 6.457 0.173 45 105 0.922
ERBB2 10,753.2 0.659 0.884 0.624 0.898 5.674 0.386 65 56 0.813

Fig. 2. Disease-free survival analysis according to expression of ESR1 performed with cutp algorithm. A) Kaplan-Meier plot according to expression cutpoint of ESR1
calculated. B) Histogram of ESR1 gene expression in 1093 breast cancer patients from TCGA data classifying them into two groups according to cutoff point determined by
cutp survival algorithm.

Fig. 3. Disease-free survival analysis according to expression of ESR1 performed with maxstat algorithm. A) Plot of standardized log-rank statistics of ESR1 expression.
B) Histogram of ESR1 gene expression in 1093 breast cancer patients from TCGA data classifying them into two groups according to cutoff point determined by maxstat
survival algorithm. C) Kaplan-Meier plot according to expression cutpoint of ESR1.

with sensitivity equal 85% and specificity greater than 86%. In the HR = 2.78, p = 0.0791; adaptive: HR = 2.79, p = 0.0777, respectively),
case of ERBB2 the results were different depending on the method. whereas higher expression of PGR was more favorable (cutp:
Youden Index indicated cuttoff point as 11,434.438 with sensitiv- HR = 0.192, p = 0.0115; maxstat: HR = 0.222, p = 0.0246; adaptive:
ity greater than 64% and specificity greater than 91% while accord- HR = 0.108, p < 0.01) (Table 2). Figures for PGR and ERBB2 may be
ing to the method minimizing the distance, the cutoff point was found in Supplementary information (Supplementary Figures S9-
10,753.2 with sensitivity greater than 65% and specificity greater S12; S13-S18; S19-S24).
than 88%. In addition, the area under the curve (AUC) was equal Next, we stratified the population of patients into three groups
0.96, 0.922 and 0.813 for ESR1, PGR and ERBB2, respectively, indi- with simultaneous consideration of the survival variables using
cating high quality of the test prediction (Table 1). Figures for PGR Rolr package. For ESR1 the first optimal cutpoint was determined
and ERBB2 may be found in Supplementary information (Supple- as 134.9501 and the second one as 1932. For PGR first cutpoint
mentary Figures S1-S8). was 4.0357 and second cutpoint was 6667.0359, and for ERBB2
Cutpoint, optimal cutoff point; se, sensitivity; sp, specificity; first cutpoint was determined as 3743.1147 and second cutpoint
PPV, positive predictive value; NPV, negative predictive value; as 10,819.5356 (Table 3). Kaplan-Meier plots were generated and
DLR+, value set for positive diagnostic likelihood ratio, DLR-, value showed the differences in DFS between patients with low, medium
set for negative diagnostic likelihood ratio; FP, false positives; FN, and high expression of ESR1, PGR and ERBB2. In case of ESR1
false negatives; AUC, area under curve. and ERBB2 low expression of receptors was correlated with the
Disease-free survival (DFS) analysis showed that regardless most favorable prognosis, while low expression of PGR was asso-
of algorithm used, lowered ESR1 and ERBB2 expression was re- ciated with the worst prognosis (statistically significant are only:
lated with better recurrence prognosis (cutp: HR = 2.07, p = 0.0412 ESR1: HR = 0.167, p = 0.0446 for low vs high; PGR: HR = 7.98e+09,
(Fig. 2); maxstat: HR = 2.39, p = 0.0423 (Fig. 3); adaptive: HR = 2.42, p = 0.00301 for low vs high and HR = 9.13, p = 0.00865 for medium
p = 0.0396 (Fig. 4) and cutp: HR = 2.79, p = 0.0777; maxstat: vs high) (Table 4). Additionally, we could also chose Adapt option
M. Ogłuszka, M. Orzechowska and D. Jedroszka
˛ et al. / Computer Methods and Programs in Biomedicine 177 (2019) 133–139 137

Table 2
Statistics of DFS analysis according to three algorithms.

cutp

gene cutpoint No. of patients in No. of patients in HR 95% CI p-value


group < cutpoint group cutpoint >

ESR1 3890 328 711 2.07 1.01–4.24 0.0412


PGR 5542 896 197 0.192 0.0459–0.798 0.0115
ERBB2 3743 181 912 2.79 0.851–9.13 0.0777
maxstat

ESR1 1836 298 795 2.39 1–5.69 0.0423


PGR 6538 928 165 0.222 0.0532–0.93 0.0246
ERBB2 3714 180 913 2.78 0.848–9.09 0.0791
adaptive

ESR1 1913 299 794 2.42 1.02–5.75 0.0396


PGR 6645 931 162 0.108 0.0148–0.794 0.00816
ERBB2 3736 181 912 2.79 0.851–9.13 0.0777

HR, hazard ratio; CI, confidence intervals.

for manual selection of optimal cutoff points classifying patients


into three groups (Fig. 5B and D). Figures and statistic tables for
PGR and ERBB2 may be found in Supplementary information (Sup-
plementary Figures S25-S28).
As an Evaluate Cutpoints validation method we used Cutoff
Finder tool. From example data set GSE2034 downloaded from Cut-
off Finder website we chose ESR_20522_at as a biomarker. For cut-
off point determination based on ROC curve, immunohistologically
determined estrogen receptor status (ER_IHC) was employed as an
outcome variable. Both Cutoff Finder and Evaluate Cutpoints re-
ported nearly identical results. Cutoff point and AUC obtained from
Cutoff Finder were 10.11 and 0.94 respectively (Supplementary Fig-
ures S39-S32), and from Evaluate Cutpoint 10.128 and 0.939, both
ROC(0,1) and Youden methods (Supplementary Figures S33-S36).
All methods showed also the same sensitivity and specificity val-
ues, 91.9% and 85.7%, respectively. Of the example data set we
could not perform comparative DFS analysis due to lack of appro-
priate data; instead distant metastasis-free survival (DMFS) was ex-
amined with dmfs_time as time variable, dmfs_event as outcome
variable and ESR1 expression (ESR1 205225_at) as biomarker. Both
tools showed exactly the same results (cutpoint: 12.52, HR = 1.41
(95% CI: 0.96–2.06), p = 0.079) (Supplementary Table S1, Supple-
mentary Figures S37-S40). Adaptive selection of cutpoint showed
range of 11.8–12.8 with 12.5–12.6 being the most statistically sig-
nificant cutpoints (Supplementary Figures S41-S42).

4. Discussion

In our work, we present the freely available web ap-


plication Evaluate Cutpoints (http://wnbikp.umed.lodz.pl/
Evaluate-Cutpoints/), which allows estimation of one or two
cutoff points using various statistical algorithms and graphical
visualization of the results. The program is straightforward to use:
the user uploads the molecular data and select the method for
cutoff point determination. As a results user get estimated optimal
cutoff points, Kaplan-Meier plots and other overview plots as well
as interactive heat map, which allows manual selection of the
optimal cutoff point.
Many programs and applications that are used to split contin-
uous data exist with the example of X-tile and Cutoff Finder [8,9].
They differ from each other with the statistical algorithms offered,
Fig. 4. Disease-free survival analysis according to expression of ESR1 performed
the way the data are presented, the number of groups resulting
with adaptive algorithm. A) Kaplan-Meier plot according to expression cutpoint from the division and ease of use. We believe that Evaluate Cut-
of ESR1. B) Histogram of ESR1 gene expression in 1093 breast cancer patients from points combines the advantages and eliminates the deficiencies of
TCGA data classifying them into two groups according to cutoff point determined prior created software. It allows split of studied population into
by adaptive algorithm. C) Heatmap showing p-values of distinct cutoff points for
groups according to continuous variables (time, biomarker value)
each from ESR1 expression value, which are represented by differential intensity of
the colors. and binary variables (observation result). Major advantage of
138 M. Ogłuszka, M. Orzechowska and D. Jedroszka
˛ et al. / Computer Methods and Programs in Biomedicine 177 (2019) 133–139

Table 3
Cutoff point optimization using Rolr package and number of patients in each from three
groups (low, medium, high).

GENE CUTPOINT1 CUTPOINT2 LOW MEDIUM HIGH

ESR1 134.9501 1932.0478 155 (14.2%) 145 (13.3%) 793 (72.6%)


PGR 4.0357 6667.0359 17 (1.56%) 915 (83.7%) 161 (14.7%)
ERBB2 3743.1147 10,819.5356 181 (16.6%) 640 (58.6%) 272 (24.9%)

Evaluate Cutpoints is availability of algorithm for stratification not methods and calculates hazard ratio, which is not available in
only into two but also into three groups, unlike the Cutoff Finder, X-tile software. We have tested our software in comparison with
which offers dichotomization only. Such approach seems to be the Cutoff Finder using the same data and when the same statistical
most natural, however could cause loss of information. Addition- algorithms were used, results are identical (supplementary data).
ally, the user can manually adapt the cutoff point and observe In general, Evaluate Cutpoints as well as Cutoff Finder and X-Tile
how the survival probability and biological properties fit together. were developed to analyze data from growing high-throughput
Moreover, the software generates interactive heat map, in which gene expression experiments. However, such algorithms may be
the color is changing according to p-value for each possible cutoff used with another genomic continuous variable like recently
point. Cutoff Finder is not characterized by the flexibility of choice reported differential methylation of CpG islands in clear cell renal
and in consequence it is not possible to identify other potential cell carcinoma [11]. There are several important concerns pointing
cutoff points with high statistical significance. Furthermore, Eval- that discretization of continuous variables generates discontinuous
uate Cutpoints estimates cutoff point using five different statistical model of response that is not truly relevant for a data like gene

Fig. 5. Disease-free survival analysis according to expression of ESR1 performed with Rolr and adaptive algorithms. A) Kaplan-Meier plot according to expression
cutpoint of ESR1 by Rolr algorithm. B) Kaplan-Meier plot according to expression cutpoint of ESR1 by adaptive algorithm. C and D) Histogram of ESR1 gene expression in
1093 breast cancer patients from TCGA data classifying them into two groups according to cutoff point determined by Rolr and adaptive algorithm, respectively.
M. Ogłuszka, M. Orzechowska and D. Jedroszka
˛ et al. / Computer Methods and Programs in Biomedicine 177 (2019) 133–139 139

Table 4 Conflict of interest


Statistics for cutoff point optimization using Rolr package.

HR 95% CI p-value All co-authors declare no conflict of interests.


ESR1
Supplementary materials
Low vs High 0.167 0.0228–1.22 0.0446
Low vs Medium 0.245 0.0285–2.11 0.165
Medium vs High 0.501 0.176–1.43 0.187 Supplementary material associated with this article can be
PGR
found, in the online version, at doi:10.1016/j.cmpb.2019.05.023.

Low vs High 7.98e+09 0-Inf 0.00301 References


Low vs Medium 2.47 0.33–18.3 0.36
Medium vs High 9.13 1.24–67 0.00865 [1] M. Mazumdar, J.R. Glassman, Categorizing a prognostic variable: review of
ERBB2 methods, code for easy implementation and applications to decision-making
about cancer treatments, Stat. Med. 19 (20 0 0) 113–132.
Low vs High 0.341 0.0951–1.23 0.0848 [2] M. Mazumdar, A. Smith, J. Bacik, Methods for categorizing a prognostic variable
Low vs Medium 0.405 0.122–1.35 0.128 in a multivariable setting, Stat. Med. 22 (2003) 559–571, doi:10.1002/sim.1333.
Medium vs High 0.737 0.375–1.45 0.375 [3] J. Fan, S. Upadhye, A. Worster, Understanding receiver operating characteristic
(ROC) curves, CJEM 8 (2006) 19–20.
HR, hazard ratio; CI, confidence intervals. [4] L. Gonçalves, A. Subtil, M.R. Oliveira, P. Bermudez, ROC curve estimation: an
overview, REVSTAT–Stat. J. 12 (2014) 1–20.
[5] C.P. Goswami, H. Nakshatri, PROGgene: gene expression based survival analysis
expression. On another hand, most researchers and clinicians agree web application for multiple cancers, J. Clin. Bioinforma. 3 (2013) 22, doi:10.
1186/2043-9113-3-22.
to use immunohistochemical staining for survival analysis though [6] M. López-Ratón, M. Rodríguez-Álvarez, C. Cadarso-Suárez, F. Gude-Sampedro,
its categorized test data represents continuous protein concentra- OptimalCutpoints: an R package for selecting optimal cutpoints in diagnostic
tion. Another concern is that majority of microarray based gene tests, J. Stat. Softw. 61 (2014) 1–36, doi:10.18637/jss.v061.i08.
[7] T. Hothorn, B. Lausen, On the exact distribution of maximally selected rank
expression experiments are hardly comparable and using those statistics, Comput. Stat. Data Anal. 43 (2003) 121–137.
for survival analysis has a very limited clinical usage. This is also [8] J. Budczies, F. Klauschen, B.V. Sinn, B. Gyorffy, W.D. Schmitt, S. Darb-Esfahani,
true, but we should point that those are raver preliminary results C. Denkert, Cutoff Finder: a comprehensive and straightforward Web applica-
tion enabling rapid biomarker cutoff optimization, PLoS One 7 (2012) e51862,
directing follow-up analyses and not a true clinical marker selec-
doi:10.1371/journal.pone.0051862.
tion. Most important is that growing evidence of a standardized [9] R.L. Camp, M. Dolled-Filhart, D.L. Rimm, X-tile: a new bio-informatics tool for
RNAseq data like TCGA database are very useful for comparable biomarker assessment and outcome-based cut-point optimization, Clin. Cancer
survival analysis. cBioPortal on-line tool are quite useful for TCGA Res. 10 (2004) 7252–7259, doi:10.1158/1078- 0432.CCR- 04- 0713.
[10] A. Jouinot, G. Assie, R. Libe, M. Fassnacht, T. Papathomas, O. Barreau, B. de
database analysis including survival analysis based on gene ex- la Villeon, S. Faillot, N. Hamzaoui, M. Neou, K. Perlemoine, F. Rene-Corail,
pression variables [15,16]. However, cBioPortal survival analysis is S. Rodriguez, M. Sibony, F. Tissier, B. Dousset, S. Sbiera, C. Ronchi, M. Kroiss,
based on cutoff point chosen arbitrary by researcher that requires E. Korpershoek, R. de Krijger, J. Waldmann, D.K Bartsch, M. Quinkler, M. Hais-
saguerre, A. Tabarin, O. Chabre, N. Sturm, M. Luconi, F. Mantero, M. Man-
numerous tests. Alternatively, as showed in our analysis, TCGA nelli, R. Cohen, V. Kerlan, P. Touraine, G. Barrande, L. Groussin, X. Bertagna,
gene expression data can be used for survival analysis using our E. Baudin, L. Amar, F. Beuschlein, E. Clauser, J. Coste, J. Bertherat, DNA methy-
Evaluate Cutpoints or another algorithm discussed above. lation is an independent prognostic marker of survival in adrenocortical cancer,
J. Clin. Endocrinol. Metab. 102 (2017) 923–932, doi:10.1210/jc.2016-3205.
Evaluate Cutpoints was developed to facilitate the cutoff point [11] J.-H. Wei, A. Haddad, K.-J. Wu, H.-W. Zhao, P. Kapur, Z.-L. Zhang, L.-Y. Zhao,
estimation and optimization from continuous data. The software Z.-H. Chen, Y.-Y. Zhou, J.-C. Zhou, B. Wang, Y.-H. Yu, M.-Y. Cai, D. Xie, B. Liao,
was created for application in biomedical research, but it could be C.-X. Li, P.-X. Li, Z.-R. Wang, F.-J. Zhou, L. Shi, Q.-Z. Liu, Z.-L. Gao, D.-L. He,
W. Chen, J.-T. Hsieh, Q.-Z. Li, V. Margulis, J.-H. Luo, A CpG-methylation-based
also utilized in other areas where the division of continuous data assay to predict survival in clear cell renal cell carcinoma, Nat. Commun. 6
is necessary. The program can also be further extended with other (2015) 8699, doi:10.1038/ncomms9699.
methods for estimating cutoff point and their visualization. [12] E. Camacho-Urkaray, J. Santos-Juanes, F.B. Gutierrez-Corres, B. Garcia,
L.M. Quiros, I. Guerra-Merino, J.J. Aguirre, I. Fernandez-Vega, Establishing cut-
off points with clinical relevance for bcl-2, cyclin D1, p16, p21, p27, p53, Sox11
Acknowledgments and WT1 expression in glioblastoma - a short report., Cell. Oncol. (Dordr) 41
(2018) 213–221, doi:10.1007/s13402- 017- 0362- 4.
All co-authors testify that our article entitled Evaluate Cut- [13] R Development Core Team, R: aA Language and Environment for Statisti-
cal Computing, R Foundation for Statistical Computing, Vienna, Austria, 2008.
points: adaptable continuous data distribution system for deter- http://www.R-project.org.
mining survival in Kaplan-Meier estimator submitted to Computer [14] C. Contal, J. O’Quigley, An application of changepoint methods in studying the
Methods and Programs in Biomedicine has not been published effect of age on survival in breast cancer, Comput. Stat. Data Anal. 30 (1999)
253–270.
in whole or in part elsewhere is not currently being considered [15] E. Cerami, J. Gao, U. Dogrusoz, B.E. Gross, S.O. Sumer, B.A. Aksoy, A. Jacobsen,
for publication in another journal and all authors have been per- C.J. Byrne, M.L. Heuer, E. Larsson, Y. Antipin, B. Reva, A.P. Goldberg, C. Sander,
sonally and actively involved in substantive work leading to the N. Schultz, The cBio cancer genomics portal: an open platform for explor-
ing multidimensional cancer genomics data, Cancer Discov. 2 (2012) 401–404,
manuscript, and will hold themselves jointly and individually re- doi:10.1158/2159- 8290.CD- 12- 0095.
sponsible for its content. [16] J. Gao, B.A. Aksoy, U. Dogrusoz, G. Dresdner, B. Gross, S.O. Sumer, Y. Sun, A. Ja-
This study was funded by Medical University of Lodz 503/0- cobsen, R. Sinha, E. Larsson, E. Cerami, C. Sander, N. Schultz, Integrative anal-
ysis of complex cancer genomics and clinical profiles using the cBioPortal, Sci.
078-02/503-01-004. Signal 6 (2013) pl1, doi:10.1126/scisignal.2004088.

You might also like