Deep Learning Approach To Predict Sentinel Lymph Node Status Directly From Routine Histology of Primary Melanoma Tumours


European Journal of Cancer 154 (2021) 227–234

Available online at www.sciencedirect.com

ScienceDirect

journal homepage: www.ejcancer.com

Original Research

Deep learning approach to predict sentinel lymph node status directly from routine histology of primary melanoma tumours

Titus J. Brinker a,*, Lennard Kiehl a, Max Schmitt a, Tanja B. Jutzi a, Eva I. Krieghoff-Henning a, Dieter Krahl b, Heinz Kutzner c, Patrick Gholam d, Sebastian Haferkamp e, Joachim Klode f, Dirk Schadendorf f, Achim Hekler a, Stefan Fröhling g, Jakob N. Kather g,h, Sarah Haggenmüller a, Christof von Kalle i, Markus Heppt j, Franz Hilke k, Kamran Ghoreschi k, Markus Tiemann l, Ulrike Wehkamp m, Axel Hauschild m,1, Michael Weichenthal m,1, Jochen S. Utikal n,o,1

a Digital Biomarkers for Oncology Group, National Center for Tumor Diseases, German Cancer Research Center, Heidelberg, Germany
b Private Laboratory of Dermatohistopathology, Mönchhofstraße 52, 69120, Heidelberg, Germany
c Dermatopathology Laboratory, Friedrichshafen, Germany
d Department of Dermatology, University Hospital Heidelberg, Heidelberg, Germany
e Department of Dermatology, University Hospital Regensburg, Regensburg, Germany
f Department of Dermatology, University Hospital Essen, Essen, Germany
g Translational Medical Oncology, German Cancer Research Center (DKFZ), National Center for Tumor Diseases (NCT), 69120, Heidelberg, Germany
h Department of Medicine III, University Hospital RWTH Aachen, Aachen, Germany
i Department of Clinical-Translational Sciences, Charité University Medicine and Berlin Institute of Health (BIH), Berlin, Germany
j Department of Dermatology, University Hospital Erlangen, Erlangen, Germany
k Department of Dermatology, Venereology and Allergology, Charité - Universitätsmedizin, Berlin, Germany
l Institute for Hematopathology Hamburg, Hamburg, Germany
m Skin Cancer Unit, German Cancer Research Center (DKFZ), Heidelberg, Germany
n Department of Dermatology, Venereology and Allergology, University Medical Center Mannheim, Ruprecht-Karl University of Heidelberg, Mannheim, Germany
o Department of Dermatology, University Hospital (UKSH), Kiel, Germany

Received 26 March 2021; received in revised form 18 May 2021; accepted 20 May 2021
Available online 20 July 2021

* Corresponding author: Digital Biomarkers for Oncology Group, National Center for Tumor Diseases (NCT), German Cancer Research Center
(DKFZ), Im Neuenheimer Feld 460, Heidelberg, 69120, Germany.
E-mail address: titus.brinker@dkfz.de (T.J. Brinker).
1 These authors contributed equally.

https://doi.org/10.1016/j.ejca.2021.05.026
0959-8049/© 2021 The Author(s). Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

KEYWORDS: Melanoma; Skin cancer; Artificial intelligence; Neural network model; Lymph node biopsy; Sentinel; Histology; Machine learning; Biomarkers; Pathology

Abstract

Aim: Sentinel lymph node status is a central prognostic factor for melanomas. However, the surgical excision involves some risks for affected patients. In this study, we therefore aimed to develop a digital biomarker that can predict lymph node metastasis non-invasively from digitised H&E slides of primary melanoma tumours.

Methods: A total of 415 H&E slides from primary melanoma tumours with known sentinel node (SN) status from three German university hospitals and one private pathological practice were digitised (150 SN positive/265 SN negative). Two hundred ninety-one slides were used to train artificial neural networks (ANNs). The remaining 124 slides were used to test the ability of the ANNs to predict sentinel status. ANNs were trained and/or tested on data sets that were matched or not matched between SN-positive and SN-negative cases for patient age, ulceration, and tumour thickness, factors that are known to correlate with lymph node status.

Results: The best accuracy was achieved by an ANN that was trained and tested on unmatched cases, with an area under the receiver operating characteristic (AUROC) of 61.8% ± 0.2%. In contrast, ANNs that were trained and/or tested on matched cases achieved an AUROC of 55.0% ± 3.5% or less.

Conclusion: Our results indicate that the image classifier can predict lymph node status to some, albeit so far not clinically relevant, extent. It may do so by mostly detecting equivalents of factors on histological slides that are already known to correlate with lymph node status. Our results provide a basis for future research with larger data cohorts.

1. Introduction

Malignant melanoma accounts for approximately 75% of all skin cancer-related deaths [1,2]. Sentinel lymph node (SLN) status is a strong prognostic factor for the survival of melanoma patients. SLN biopsy (SLNB) is routinely performed for patients presenting with a primary melanoma with a Breslow thickness of 1 mm or more for staging purposes and to identify patients that are more likely to benefit from adjuvant treatments [3–5]. Conversely, the curative impact of lymph node dissection was limited in several clinical trials [6].

The probability of sentinel node (SN) positivity is 15–25% overall [7], ranging from less than 10% for T1 tumours to approximately 45% for T4 tumours [8]. SLNBs by themselves but especially in conjunction with subsequent completion lymph node dissection in the tumour area can be associated with considerable morbidity, such as scar formation, lymphedema, and/or infection in some patients [9]. Thus, it would be highly desirable to determine the likelihood of SLN+ in advance so that SLNB could be omitted for a clinically relevant proportion of patients.

Risk factors known to be associated with positive lymph node status (SLN+) in melanomas are increasing Breslow thickness and ulceration [10–12]. Younger age, mitotic rate, and the level of tumour-infiltrating lymphocytes may also influence SLN status [13,14]. BRAF mutations may be correlated with SLN positivity [15]. Expression analysis of isolated markers or of marker combinations may also contribute to predicting lymphatic metastases [16–18] and in addition may identify those patients that will relapse in spite of a negative SN biopsy and therefore can be identified as candidates for adjuvant therapy [19]. There is a high clinical need to predict SLN status non-invasively, reproducibly, and with high accuracy, especially for subgroups of patients with high-risk factors for surgery and multiple comorbidities [20,21].

Recently, deep learning-based artificial neural networks (ANNs) have proven their potential in skin cancer image analysis [22–24] as well as digital pathology for melanoma [25–30]. Kulkarni et al. predicted distant visceral recurrence on digitised sections of the primary tumour using a combination of precomputed features, a convolutional neural network (CNN) and a recurrent neural network [31].

In this study, we aimed to develop a deep learning-based digital biomarker to predict the likelihood of SLN+ from digitised H&E slides using whole slide images (WSIs) of primary tumours.

2. Materials and methods

2.1. Histopathological slides and clinical characteristics

Ethics approval was obtained from the ethics committees of the respective universities before the study was initiated. A total of 415 digitised H&E-stained pathology slides of primary melanoma tumours from different patients were collected from three different German university hospitals (Heidelberg 1, Mannheim, and Kiel) and one pathological practice (Dres Krahl, Heidelberg 2) and scanned with a specialised slide scanner
(Heidelberg 1 and 2 and Mannheim with a ZEISS Axio Scan.Z1, resolution = 0.22 µm/pixel; Kiel with a 3DHistech Panoramic 1000, resolution = 0.25 µm/pixel). Available clinical data included patient age, tumour thickness, ulceration, and SLN status (Table 1).

We used iterative stratification [32,33] to distribute the slides of each university hospital into separate training/validation (n = 291) and test (n = 124) sets. Stratification occurred with respect to clinical data (patient age, tumour thickness, and ulceration) and number of tiles per slide, which were generated in proportion to the annotation area and which were thus approximately proportional to the area of visible tumour tissue.

From the full set, which was unmatched for tumour thickness and other possibly relevant factors between the SLN+ and SLN− groups, we paired each case from the minority class to the closest match of the majority class within the respective set regarding the clinical data tumour thickness, ulceration, and patient age, resulting in an alternative ‘nested’ matched data set with n = 288 (training/validation: n = 200, test: n = 88).

2.2. ANN input preparation

Images, cell features, and clinical data were used as separate inputs for the ANN. All WSIs were annotated by a bioinformatician instructed by an experienced dermatohistopathologist to mark the tumour area. The tumour area was partitioned into tiles of 256 × 256 pixels as input for the ANN.

We also computed cell features of each tile using QuPath [34]. We ran automated cell detection on each tile, classified each cell into one of three classes (tumour, immune, and other; Fig. 1), clustered cells of the same class and computed the same seven cell features as Kulkarni et al. (see Appendix for details). Tiles with more than 65% background pixels (Luma >217) or tiles where more than 80% of the detected cells were classified as “other” were discarded, resulting in a total of 1.4 million tiles. Tile labels were inherited from the corresponding slide. The number of tiles per slide varied greatly (range: 38–53483) depending on lesion size.

Finally, in addition to the SLN status, we included clinical data (tumour thickness, ulceration, and patient age), each normalised to a value between 0 and 1. A combination of cell features and clinical data was performed using concatenation.

2.3. ANN training and testing

We selected the commonly used CNN architecture ResNeXt50 [35], which was pretrained on ImageNet-1k [36] as an image feature extractor. Our library of choice was PyTorch Lightning [37] on top of PyTorch [38]. Cell features and/or clinical data were first fed through a three-layer fully connected network and then used to scale the image features in Squeeze and Excitation style [39]. The classification head consisted of a two-layer fully connected network.

During training, we randomly cropped images and resized them to 224 × 224 and used RandAugment [40] to increase generalisation. For all our experiments, we used SGDP [41] with a maximum learning rate of 0.04, a maximum momentum of 0.95, and a batch size of 256. The learning rate followed a cyclical schedule with two cycles [42]. During the first cycle, CNN parameters were not updated, whereas maximum learning rate was

Table 1
Clinical data distribution in the data set (median ± median absolute deviation).

Parameter                       SLN+        SLN−        All         P*
Overall, n                      150         265         415
  Tumour thickness (mm)         2.5 ± 1.3   1.7 ± 0.8   2.0 ± 1.0   0.038
  Age (years)                   63 ± 12     66 ± 10     65 ± 10     0.031
  Ulceration (yes/no/unknown)   40/59/51    49/99/117   89/158/168  0.776
Heidelberg 1 (HD1), n           13          68          81
  Tumour thickness (mm)         4.5 ± 1.9   1.2 ± 0.35  1.4 ± 0.5   0.079
  Age (years)                   63 ± 12     66 ± 10     65 ± 10     0.277
  Ulceration (yes/no/unknown)   0/0/13      0/0/68      0/0/81      0.803
Heidelberg 2 (HD2), n           19          55          74
  Tumour thickness (mm)         1.8 ± 0.6   1.2 ± 0.35  1.4 ± 0.5   0.406
  Age (years)                   53 ± 14     63 ± 12     62 ± 12     0.239
  Ulceration (yes/no/unknown)   0/9/10      6/20/29     6/29/39     0.255
Kiel (KI), n                    62          92          154
  Tumour thickness (mm)         3.0 ± 1.6   2.4 ± 1.3   2.6 ± 1.4   0.110
  Age (years)                   65 ± 9      68 ± 8      67 ± 8      0.064
  Ulceration (yes/no/unknown)   29/31/2     28/60/4     57/91/6     0.047
Mannheim (MA), n                56          50          106
  Tumour thickness (mm)         2.3 ± 1.0   2.5 ± 1.5   2.4 ± 1.2   0.491
  Age (years)                   61 ± 12     68 ± 10     65 ± 10     0.263
  Ulceration (yes/no/unknown)   11/19/26    15/19/16    26/38/42    0.987

*P values from multivariate logistic regression using two-sided t-test.
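
The nested matched data set described in Section 2.1 pairs each case of the minority class with its closest majority-class case on the clinical variables summarised in Table 1. The paper does not state the distance measure, the tie-breaking rule, or how unknown ulceration was encoded, so the following is only a minimal sketch under assumed choices: min-max normalisation of each variable to [0, 1], Euclidean distance, and greedy nearest-neighbour matching without replacement. The `Case` type, the `match_cases` function, and the 0.5 code for unknown ulceration are hypothetical.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass(frozen=True)
class Case:
    case_id: str
    sln_positive: bool
    thickness_mm: float
    age_years: float
    ulcerated: float  # assumption: 1.0 = yes, 0.0 = no, 0.5 = unknown


def _normalise(values):
    """Min-max scale a list of numbers to [0, 1]; constant columns map to 0."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [0.0 if span == 0 else (v - lo) / span for v in values]


def match_cases(cases):
    """Greedily pair each minority-class case with its nearest unused majority-class case."""
    if not cases:
        return []
    columns = [
        _normalise([c.thickness_mm for c in cases]),
        _normalise([c.age_years for c in cases]),
        _normalise([c.ulcerated for c in cases]),
    ]
    vectors = {c.case_id: tuple(col[i] for col in columns) for i, c in enumerate(cases)}

    positives = [c for c in cases if c.sln_positive]
    negatives = [c for c in cases if not c.sln_positive]
    # The shorter list is the minority class (assumed Euclidean distance below).
    minority, majority = sorted((positives, negatives), key=len)

    def distance(a, b):
        return sqrt(sum((x - y) ** 2 for x, y in zip(vectors[a.case_id], vectors[b.case_id])))

    unused = list(majority)
    pairs = []
    for case in minority:
        best = min(unused, key=lambda other: distance(case, other))
        unused.remove(best)  # without replacement: each majority case is used once
        pairs.append((case, best))
    return pairs
```

Greedy matching depends on the order in which minority cases are processed; an optimal assignment would need a different algorithm, and nothing in the text indicates which variant the authors used.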

Fig. 1. Pipeline overview. (A) Tumour regions on whole slide images were identified and annotated as regions of interest. (B) Marked regions were tessellated into tiles of 256 × 256 pixels. (C) Cells were detected, classified (tumour, immune, and other), and clustered, and seven cell features (see Appendix for details) were computed. (D) Clinical and/or cell features were processed by an MLP and were fed into the model by SE. (E) A CNN was trained to extract relevant image features directly from the tiles to predict lymph node status on the tile level. (F) Image features extracted by CNN were combined with processed clinical and/or cell features. (G) Statistical evaluation was done on slide level by averaging all tile scores of a particular slide. MLP, multilayer perceptron; SE, Squeeze and Excitation; CNN, convolutional neural network.
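
Two steps of the pipeline in Fig. 1 are simple enough to sketch in code: discarding tiles that are mostly background, and averaging tile scores into a slide score (panel G). The text gives the thresholds (more than 65% of pixels with luma above 217) but not the luma formula, so the Rec. 601 weights below are an assumption. The function names and the default decision threshold of 0.5 are also hypothetical; according to the text, the actual threshold was chosen on the validation set.

```python
from statistics import mean

LUMA_THRESHOLD = 217            # from the text: brighter pixels count as background
MAX_BACKGROUND_FRACTION = 0.65  # from the text: tiles above this fraction are discarded


def luma(rgb):
    """Rec. 601 luma of an 8-bit (R, G, B) pixel. Assumption: the formula is not named in the paper."""
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b


def is_mostly_background(tile_pixels):
    """Return True if more than 65% of the tile's pixels are background."""
    background = sum(1 for px in tile_pixels if luma(px) > LUMA_THRESHOLD)
    return background / len(tile_pixels) > MAX_BACKGROUND_FRACTION


def slide_score(tile_scores):
    """Slide-level score: the plain average of all tile scores of that slide."""
    return mean(tile_scores)


def predict_slide(tile_scores, threshold=0.5):
    """Call a slide positive if its mean tile score reaches the threshold (0.5 is a placeholder)."""
    return slide_score(tile_scores) >= threshold
```

Here a tile is represented as a flat list of RGB tuples to keep the sketch free of image libraries; a real pipeline would operate on arrays read from the whole slide image.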

halved and all parameters were updated during the second cycle. Finally, we logarithmically decreased the learning rate up to the first layer by a factor of 100.

To account for the large variance in the number of tiles in between slides, we resampled a maximum of 500 tiles per slide each epoch and then sampled each batch to contain roughly half positive and half negative tiles with replacement (for further details, see Appendix). Slide scores were calculated by taking the average scores of their respective tiles. A slide was predicted as positive when a certain threshold was reached on the validation set. Performance was measured by calculating the area under the receiver operating characteristic (AUROC) and the balanced accuracy on a holdout test set. Because of the limited interpretability of the AUROC in an imbalanced classification setting [43], we additionally included average precision (AP), a metric similar to the area under the precision-recall curve.

3. Results

3.1. Distribution of features in training/validation and test sets

The ‘unmatched’ test set contained all available images from the participating clinics. Thus, this set contained more SLN-negative slides than SLN-positive slides. The ‘matched’ test set, in contrast, contained the same number of SLN-positive and SLN-negative cases, which were matched for tumour thickness, ulceration, and patient age. These features were therefore significantly different in the unmatched test set (Table 1) but not in the matched test set. Note that the matched test set is a subset of the unmatched test set and therefore has fewer samples.

3.2. Performance of classifiers on the unmatched test set

Fig. 2, top panel, shows the mean AUROC of our trained models on the test set that is unmatched for clinical data. Models could encompass three separate inputs individually or in combination: image features extracted by a CNN, cell composition calculated by QuPath, and the clinical data. Table 2 provides a more detailed overview of the best-performing models.

The model that was trained on the unmatched training set and received only images as input achieved the highest AUROC of (61.8 ± 0.2)% and the highest AP of (46.4 ± 2.7)%, which were significantly better than those of a random classifier (50% AUROC and 46/124 = 37% AP). The corresponding clinical data classifier performed on par with (61.6 ± 0.3)% AUROC and (45.1 ± 0.5)% AP. A combination of images and clinical data did not yield a better performance, suggesting that similar information may be encoded by both classifiers.

The models that were trained on matched images but tested on the unmatched test set did not perform better than a random classifier. In particular, the pure image classifier trained on slides matched for clinical data performed slightly worse than random.

Cellular composition did not correlate with SLN status in our cohort, as this classifier also showed similar performance to a random classifier with (52.1 ± 1.0)% AUROC and (37.3 ± 0.5)% AP. Any combination with cell composition features performed worse than without those.

3.3. Performance after matching

Performance of the clinical data and image classifiers trained on the unmatched training set and tested on the

Fig. 2. AUROC comparison of input types (image features, cell features, and clinical data), their combinations, and the corresponding
model. All models were trained six times on data from all clinics, either on matched or unmatched training images, and subsequently tested
on both the matched and the unmatched test set. The black vertical line indicates the expected performance of a random classifier.
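
Fig. 2 compares models by AUROC, averaged over six training runs, against the 50% expected from a random classifier. As an illustration of how such numbers can be obtained, the sketch below computes the AUROC from slide scores in its rank-based (Mann-Whitney) form and summarises repeated runs as mean and standard deviation. The function names are hypothetical, and the text does not say whether the sample or the population standard deviation was reported; the sample version is assumed here.

```python
from statistics import mean, stdev


def auroc(labels, scores):
    """Probability that a random positive slide outscores a random negative one (ties count 0.5)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def summarise_runs(values):
    """Mean and sample standard deviation over repeated training runs (needs at least two runs)."""
    return mean(values), stdev(values)
```

A classifier that assigns the same score to every slide gets exactly 0.5 under this definition, which corresponds to the vertical baseline drawn in Fig. 2.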

unmatched test set were very similar. Thus, we investigated whether the image classifier trained on unmatched images might identify features in the test set that merely reflect the clinical data or whether it might use other, independent features. The difference between unmatched and matched training/validation and test sets was that there were fewer SLN− patients, and therefore, similar numbers of SLN+ and SLN− samples in the matched sets in such a way that the clinical data did not correlate with the diagnosis in the matched sets anymore.

After matching the test set for tumour thickness and patient age, all models regardless of input performed similarly to the random classifier or worse, including models trained on an unmatched set but tested on the matched test set (Fig. 2, bottom panel). The image classifier trained on unmatched images performed only slightly above baseline with (55.0 ± 3.5)% AUROC and (55.1 ± 3.9)% AP. The clinical data classifier showed an almost perfect random performance on the matched test set. Note that the AP baseline in the matched setting is 44/88 = 50%.

In conjunction with the on-par performance of the image classifier with clinical data in the unmatched/unmatched setting, this indicates that the image classifier mostly detected features corresponding to the clinical data in the image and not independent features.

4. Discussion

In this study, we show that SN status can be predicted to some extent using CNN-based image analysis. Our data indicate that the CNN may mostly achieve this by detecting morphological equivalents of features that are already known to correlate with lymph node status, namely, tumour thickness, ulceration, and patient age.

As with the clinical features, the performance of our image classifier is so far not highly accurate. This may be because of several reasons. For instance, melanomas are composed of tumour cells, which are not tightly connected to begin with. Therefore, morphological changes associated with the ‘classical’ epithelial-mesenchymal transition (EMT) may be required to a lesser extent to

Table 2
Overview of relevant models when training and testing were done on the unmatched set.

Input/model              AUROC (%)    AP (%)       BA (%)       Sensitivity (%)  Specificity (%)
Cell composition         52.1 ± 1.0   37.3 ± 0.5   50.7 ± 2.4   37.7 ± 31.5      63.7 ± 27.4
Clinical data            61.6 ± 0.3   45.1 ± 0.5   59.4 ± 1.4   48.9 ± 10.7      69.9 ± 8.3
Cell + Clinical          60.9 ± 1.7   44.6 ± 1.4   60.9 ± 1.9   67.4 ± 6.1       54.5 ± 9.7
Image features           61.8 ± 0.2   46.4 ± 2.7   56.5 ± 2.0   48.2 ± 14.2      64.7 ± 11.1
Image + Cell             56.5 ± 1.8   43.8 ± 2.0   55.6 ± 4.1   55.8 ± 5.3       55.3 ± 13.3
Image + Clinical         61.3 ± 0.4   44.9 ± 0.8   59.0 ± 1.9   52.9 ± 7.5       65.2 ± 7.9
Image + Clinical + Cell  61.6 ± 1.3   45.5 ± 1.4   58.6 ± 2.8   59.1 ± 8.5       58.1 ± 8.5

Metrics are given as mean ± standard deviation over six runs over the whole test set. AUROC, area under the receiver operating characteristic; AP, average precision; BA, balanced accuracy.
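
The metrics in Table 2 can be illustrated with a short sketch. Sensitivity, specificity, and balanced accuracy follow from thresholded slide predictions, while average precision (AP) is computed from the ranked scores. For a random classifier the expected AP equals the fraction of positive slides, which is where the baselines of 46/124 = 37% (unmatched test set) and 44/88 = 50% (matched test set) quoted in the Results come from. The function names are hypothetical, and the AP definition used here (mean of the precision at the rank of each positive slide, ties broken by input order) is one common convention that may differ in detail from the authors' implementation.

```python
def confusion_metrics(labels, predictions):
    """Sensitivity, specificity and balanced accuracy; assumes both classes are present."""
    tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 0)
    fn = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 0)
    fp = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 1)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity, (sensitivity + specificity) / 2


def average_precision(labels, scores):
    """Mean of the precision values at the rank of each positive slide; assumes at least one positive."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits, precisions = 0, []
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions)


def random_ap_baseline(n_positive, n_total):
    """Expected AP of a random classifier: the fraction of positive slides."""
    return n_positive / n_total
```

Because the AP baseline moves with class balance while the AUROC baseline stays at 50%, AP values from the unmatched and matched test sets are not directly comparable without reference to their respective baselines.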

enable the tumour cells to spread. Hence, there may be few morphological changes of the tumour cells themselves or of tumour architecture available for the CNN to use.

In addition, some tumour cells may have the ability to spread into the lymph nodes downstream of the primary tumour but may not have spread just yet. Vice versa, single tumour cells in lymph nodes may be missed during the histopathological workup. Therefore, the ‘ground truth’ underlying our classes of SLN+ and SLN− is unavoidably not completely correct. Another factor may also contribute to a ‘fuzzy’ ground truth. We only analysed one slide per case, and particularly, superficially spreading melanomas may be highly heterogeneous. Based on the assumption that the area of the tumour with the highest vertical spread is the area of the most aggressive tumour growth, we used tissue sections encompassing this area for our analyses. Nevertheless, as we cannot tell at present with any certainty which area of the tumour or which cell clone is going to spread, we may have missed areas of relevance in some cases using this approach. Also, all tiles derived from the same slide received the same label, so that the classifier was likely also trained with tiles from tumour areas that did not contain tumour cells with spreading potential. Vice versa, slides in the test set were classified by ‘majority vote’, so if only a small area of the lesion was able to spread, this area might not have sufficed to induce the correct classification.

The question still remains to what extent melanoma cells have to undergo changes to spread into lymph nodes and/or distant tissues. Using the same cell composition as Kulkarni et al. used to predict visceral metastases did not improve predictive accuracy in this setting, indicating divergent biological requirements for these two types of tumour spreading. It appears most likely that both biological differences and access to lymph vessels contribute to tumour cell spread.

Although our CNN did not detect features beyond tumour thickness and patient age that predict lymph node positivity, studies with melanoma patients and experiments in mice indicate that biological differences may play a role in melanoma spread to the lymph nodes [44]. Also, melanoma thickness and ulceration, factors that correlate with lymph node positivity, do not merely represent a function of tumour growing time but may also reflect more aggressive growth. In thin melanomas, biological differences may be especially relevant for the formation of lymph node and distant metastases. For instance, it was shown that mitotic count correlates with positive lymph nodes in this subgroup in particular. Therefore, it may be rewarding to analyse this subgroup more thoroughly in further studies using CNN-based image analysis. As thin melanomas rarely spread and lymph node analysis is not necessarily performed, however, it will be challenging to obtain enough samples to generate a reliable classifier for this group. Our data set, which was obtained from routine clinical practice, did not contain enough of these cases to perform such analyses.

4.1. Limitations

We did not use an external test set to validate our results, and our data set was comparatively small. Thus, it might be possible that the CNN used features for classification that do not represent true biological differences but artefacts or features that correlate with lymph node status by chance in the data set. For instance, in a small data set, a CNN may learn to associate features that reoccur on many tiles of one tumour with its class, although they reflect patient-individual features only. However, the fact that the, albeit moderate, better-than-random classification accuracy disappears completely when either the slides of the training or of the test set are matched for factors that are known to correlate with lymph node status may argue for a classification based on relevant image features.

5. Conclusions

In summary, our data indicate that it might be possible to predict the probability of lymph node positivity by CNN-based image analysis of the primary tumour. However, additional studies are required to confirm these results and to improve the prediction to increase clinical applicability further.

Author contributions

TJB: Study concepts, Study design, Data acquisition, Data analysis and interpretation, Manuscript
preparation. TBJ: Study concepts, Study design, Data analysis and interpretation. EKH: Study concepts, Data analysis and interpretation, Manuscript preparation. LK: Study concepts, Study design, Data analysis and interpretation, Statistical analysis, Manuscript preparation. MS: Study concepts, Study design, Quality control of data and algorithms. JSU: Data acquisition. DK: Data acquisition. Axel H, Wehkamp U, Tiemann M: Data acquisition. MW: Data acquisition. Achim H: Data analysis and interpretation. SH: Manuscript preparation. All authors were involved in manuscript editing and manuscript review.

Funding

This research was funded by the German Ministry of Health, Berlin, Germany (Grant: Tumor Behavior Prediction Initiative (TPI); Grant Holder: Dr. med. Titus J. Brinker).

Conflict of interest statement

The authors declare the following financial interests/personal relationships which may be considered as potential competing interests: T.J.B. would like to disclose that he owns a health technology company (Smart Health Heidelberg GmbH, Handschuhsheimer Landstr. 9/1, 69120 Heidelberg, Germany, https://smarthealth.de), which develops mobile apps, outside the submitted work. All remaining authors have declared no conflicts of interest.

Acknowledgements

The authors would like to thank Astrid Doppler from Melanom Info Deutschland (MID) for her support with the ethics approval.

Appendix A. Supplementary data

Supplementary data to this article can be found online at https://doi.org/10.1016/j.ejca.2021.05.026.

References

[1] Mahbod A, Schaefer G, Ellinger I, Ecker R, Pitiot A, Wang C. Fusing fine-tuned deep features for skin lesion classification. Comput Med Imag Graph 2019;71:19–29.
[2] Han SS, Kim MS, Lim W, Park GH, Park I, Chang SE. Classification of the clinical images for benign and malignant cutaneous tumors using a deep learning algorithm. J Invest Dermatol 2018;138:1529–38.
[3] Verver D, van Klaveren D, van Akkooi ACJ, Rutkowski P, Powell BWEM, Robert C, et al. Risk stratification of sentinel node-positive melanoma patients defines surgical management and adjuvant therapy treatment considerations. Eur J Canc 2018;96:25–33.
[4] Peach H, Board R, Cook M, Corrie P, Ellis S, Geh J, et al. Current role of sentinel lymph node biopsy in the management of cutaneous melanoma: a UK consensus statement. J Plast Reconstr Aesthetic Surg 2020;73:36–42.
[5] Eigentler TK, Mühlenbein C, Follmann M, Schadendorf D, Garbe C. S3-Leitlinie Diagnostik, Therapie und Nachsorge des Melanoms - update 2015/2016, Kurzversion 2.0. J Dtsch Dermatol Ges 2017;15:e1–41.
[6] Bartlett EK. Current management of regional lymph nodes in patients with melanoma. J Surg Oncol 2019;119:1186.
[7] Vetto JT, Hsueh EC, Gastman BR, Dillon LD, Monzon FA, Cook RW, et al. Guidance of sentinel lymph node biopsy decisions in patients with T1–T2 melanoma using gene expression profiling. Future Oncol 2019;15:1207–17.
[8] El Sharouni M-A, Witkamp AJ, Sigurdsson V, van Diest PJ. Trends in sentinel lymph node biopsy enactment for cutaneous melanoma. Ann Surg Oncol 2019;26:1494–502.
[9] Renner P, Torzewski M, Zeman F, Babilas P, Kroemer A, Schlitt HJ, et al. Increasing morbidity with extent of lymphadenectomy for primary malignant melanoma. Lymphatic Res Biol 2017;15:146–52.
[10] Hanna AN, Sinnamon AJ, Roses RE, Kelz RR, Elder DE, Xu X, et al. Relationship between age and likelihood of lymph node metastases in patients with intermediate thickness melanoma (1.01-4.00 mm): a National Cancer Database study. J Am Acad Dermatol 2019;80:433–40.
[11] Conic RRZ, Ko J, Damiani G, Funchain P, Knackstedt T, Vij A, et al. Predictors of sentinel lymph node positivity in thin melanoma using the National Cancer Database. J Am Acad Dermatol 2019;80:441–7.
[12] Chang JM, Kosiorek HE, Dueck AC, Leong SPL, Vetto JT, White RL, et al. Stratifying SLN incidence in intermediate thickness melanoma patients. Am J Surg 2018;215:699–706.
[13] Egger ME, Stevenson M, Bhutiani N, Jordan AC, Scoggins CR, Philips P, et al. Age and lymphovascular invasion accurately predict sentinel lymph node metastasis in T2 melanoma patients. Ann Surg Oncol 2019;26:3955–61.
[14] Fortes C, Mastroeni S, Caggiati A, Passarelli F, Ricci F, Michelozzi P. High level of TILs is an independent predictor of negative sentinel lymph node in women but not in men. Arch Dermatol Res 2020. https://doi.org/10.1007/s00403-020-02067-0.
[15] Manninen AA, Gardberg M, Juteau S, Ilmonen S, Jukonen J, Andersson N, et al. BRAF immunohistochemistry predicts sentinel lymph node involvement in intermediate thickness melanomas. PLoS One 2019;14:e0216043.
[16] Toberer F, Haenssle HA, Laimer M, Heinzel-Gutenbrunner M, Enk A, Hartschuh W, et al. Vascular endothelial growth factor receptor-3 expression predicts sentinel node status in primary cutaneous melanoma. Acta Derm Venereol 2020;100:adv00235.
[17] Bellomo D, Arias-Mejias SM, Ramana C, Heim JB, Quattrocchi E, Sominidi-Damodaran S, et al. Model combining tumor molecular and clinicopathologic risk factors predicts sentinel lymph node metastasis in primary cutaneous melanoma. JCO Precis Oncol 2020;4:319–34.
[18] Mulder EEAP, Dwarkasing JT, Tempel D, van der Spek A, Bosman L, Verver D, et al. Validation of a clinicopathological and gene expression profile model for sentinel lymph node metastasis in primary cutaneous melanoma. Br J Dermatol 2020. https://doi.org/10.1111/bjd.19499.
[19] Eggermont AMM, Bellomo D, Arias-Mejias SM, Quattrocchi E, Sominidi-Damodaran S, Bridges AG, et al. Identification of stage I/IIA melanoma patients at high risk for disease relapse using a clinicopathologic and gene expression model. Eur J Canc 2020;140:11–8.
[20] Moody JA, Ali RF, Carbone AC, Singh S, Hardwicke JT. Complications of sentinel lymph node biopsy for melanoma - a systematic review of the literature. Eur J Surg Oncol 2017;43:270–7.
[21] Tejera-Vaquerizo A, Ribero S, Puig S, Boada A, Paradela S, Moreno-Ramírez D, et al. Survival analysis and sentinel lymph node status in thin cutaneous melanoma: a multicenter observational study. Cancer Med 2019;8:4235–44.
[22] Brinker TJ, Hekler A, Enk AH, Klode J, Hauschild A, Berking C, et al. Deep learning outperformed 136 of 157 dermatologists in a head-to-head dermoscopic melanoma image classification task. Eur J Canc 2019;113:47–54.
[23] Brinker TJ, Hekler A, Enk AH, Berking C, Haferkamp S, Hauschild A, et al. Deep neural networks are superior to dermatologists in melanoma image classification. Eur J Canc 2019;119:11–7.
[24] Hekler A, Utikal JS, Enk AH, Hauschild A, Weichenthal M, Maron RC, et al. Superior skin cancer classification by the combination of human and artificial intelligence. Eur J Canc 2019;120:114–21.
[25] Hekler A, Utikal JS, Enk AH, Solass W, Schmitt M, Klode J, et al. Deep learning outperformed 11 pathologists in the classification of histopathological melanoma images. Eur J Canc 2019;118:91–6.
[26] Hekler A, Utikal JS, Enk AH, Berking C, Klode J, Schadendorf D, et al. Pathologist-level classification of histopathological melanoma images with deep neural networks. Eur J Canc 2019;115:79–83.
[27] Campanella G, Hanna MG, Geneslaw L, Miraflor A, Werneck Krauss Silva V, Busam KJ, et al. Clinical-grade computational pathology using weakly supervised deep learning on whole slide images. Nat Med 2019;25:1301–9.
[28] Kather JN, Krisam J, Charoentong P, Luedde T, Herpel E, Weis C-A, et al. Predicting survival from colorectal cancer histology slides using deep learning: a retrospective multicenter study. PLoS Med 2019;16:e1002730.
[29] Maron RC, Schlager JG, Haggenmüller S, von Kalle C, Utikal JS, Meier F, et al. A benchmark for neural network robustness in skin cancer classification. Eur J Canc, PII: S0959-8049(21)00442-1, https://doi.org/10.1016/j.ejca.2021.06.047.
[30] Haggenmüller S, Maron RC, Hekler A, Utikal JS, Barata C, Barnhill RL, et al. Skin cancer classification via convolutional neural networks: systematic review of studies involving human experts. Eur J Canc, PII: S0959-8049(21)00444-5, https://doi.org/10.1016/j.ejca.2021.06.049.
[31] Kulkarni PM, Robinson EJ, Sarin Pradhan J, Gartrell-Corrado RD, Rohr BR, Trager MH, et al. Deep learning based on standard H&E images of primary melanoma tumors identifies patients at risk for visceral recurrence and death. Clin Canc Res 2020;26:1126–34.
[32] Sechidis K, Tsoumakas G, Vlahavas I. On the stratification of multi-label data. Machine learning and knowledge discovery in databases. Springer Berlin Heidelberg; 2011. p. 145–58.
[33] Szymański P, Kajdanowicz T. A network perspective on stratification of multi-label data. 2017. arXiv [statML].
[34] Bankhead P, Loughrey MB, Fernández JA, Dombrowski Y, McArt DG, Dunne PD, et al. QuPath: open source software for digital pathology image analysis. Sci Rep 2017;7. https://doi.org/10.1038/s41598-017-17204-5.
[35] Xie S, Girshick R, Dollár P, Tu Z. Aggregated residual transformations for deep neural networks. Proc IEEE 2017.
[36] Krizhevsky A, Sutskever I, Hinton GE. ImageNet classification with deep convolutional neural networks. In: Pereira F, Burges CJC, Bottou L, Weinberger KQ, editors. Advances in neural information processing systems 25. Curran Associates, Inc.; 2012. p. 1097–105.
[37] Falcon WA. PyTorch lightning. GitHub Note: https://github.com/williamFalcon/pytorch-Lightning Cited.by.2019;3.
[38] Paszke A, Gross S, Massa F, Lerer A, Bradbury J, Chanan G, et al. PyTorch: an imperative style, high-performance deep learning library. In: Wallach H, Larochelle H, Beygelzimer A, Alché-Buc F, Fox E, Garnett R, editors. Advances in neural information processing systems 32. Curran Associates, Inc.; 2019. p. 8026–37.
[39] Hu J, Shen L, Sun G. Squeeze-and-excitation networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition; 2018. p. 7132–41.
[40] Cubuk ED, Zoph B, Shlens J, Le QV. RandAugment: practical automated data augmentation with a reduced search space. 2019. arXiv [csCV].
[41] Heo B, Chun S, Oh SJ, Han D, Yun S, Uh Y, et al. Slowing down the weight norm increase in momentum-based optimizers. 2020. arXiv preprint arXiv:2006.08217.
[42] Smith LN. Cyclical learning rates for training neural networks. In: 2017 IEEE winter conference on applications of computer vision. WACV; 2017. p. 464–72.
[43] Saito T, Rehmsmeier M. The precision-recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets. PLoS One 2015;10:e0118432.
[44] Werner-Klein M, Scheitler S, Hoffmann M, Hodak I, Dietz K, Lehnert P, et al. Genetic alterations driving metastatic colony formation are acquired outside of the primary tumour in melanoma. Nat Commun 2018;9:595.
