


International Journal of Computer Assisted Radiology and Surgery
https://doi.org/10.1007/s11548-022-02764-3

ORIGINAL ARTICLE

Uncertainty estimation for margin detection in cancer surgery using mass spectrometry
Fahimeh Fooladgar1 · Amoon Jamzad2 · Laura Connolly2 · Alice Santilli2 · Martin Kaufmann3 · Kevin Ren4 ·
Purang Abolmaesumi1 · John F. Rudan3 · Doug McKay3 · Gabor Fichtinger2 · Parvin Mousavi2

Received: 15 January 2022 / Accepted: 19 September 2022


© CARS 2022

Abstract
Purpose Rapid evaporative ionization mass spectrometry (REIMS) is an emerging technology for clinical margin detection.
Deployment of REIMS depends on construction of reliable deep learning models that can categorize tissue according to
its metabolomic signature. Challenges associated with developing these models include the presence of noise during data
acquisition and the variance in tissue signatures between patients. In this study, we propose integration of uncertainty estimation
in deep models to factor predictive confidence into margin detection in cancer surgery.
Methods iKnife is used to collect 693 spectra of cancer and healthy samples acquired from 91 patients during basal cell
carcinoma resection. A Bayesian neural network and two baseline models are trained on these data to perform classification
as well as uncertainty estimation. The samples with high estimated uncertainty are then removed, and new models are trained
using the clean data. The performance of proposed and baseline models, with different ratios of filtered data, is then compared.
Results The data filtering does not improve the performance of the baseline models as they cannot provide reliable estimations
of uncertainty. In comparison, the proposed model demonstrates a statistically significant improvement in average balanced
accuracy (75.2%), sensitivity (74.1%) and AUC (82.1%) after removing uncertain training samples. We also demonstrate that
if highly uncertain samples are predicted and removed from the test data, sensitivity further improves to 88.2%.
Conclusions This is the first study that applies uncertainty estimation to inform model training and deployment for tissue
recognition in cancer surgery. Uncertainty estimation is leveraged in two ways: by factoring a measure of input noise in
training the models and by including predictive confidence in reporting the outputs. We empirically show that considering
uncertainty for model development can help improve the overall accuracy of a margin detection system using REIMS.

Keywords Rapid evaporative ionization mass spectrometry · Cancer surgery · Uncertainty estimation · Basal cell carcinoma ·
iKnife

Fahimeh Fooladgar, Amoon Jamzad, and Laura Connolly contributed equally to this work.

Corresponding author: Parvin Mousavi, mousavi@queensu.ca

1 Department of Electrical and Computer Engineering, University of British Columbia, Vancouver, BC, Canada
2 School of Computing, Queen's University, Kingston, ON, Canada
3 Department of Surgery, Queen's University, Kingston, ON, Canada
4 Department of Pathology and Molecular Medicine, Queen's University, Kingston, ON, Canada

Introduction

Basal cell carcinoma (BCC) is amongst the most common cancer types, with an increasing incidence rate [1]. Although BCC has low metastatic risk, it is often diagnosed on the face, neck, and head, and can lead to significant scarring and facial deformity. If left untreated, it can also cause bleeding and ulceration [1]. One of the most common treatment options for BCC is surgical excision, which involves removing the cancerous lesion along with some of the surrounding healthy tissue. Achieving negative resection margins, which indicate that the entire cancer was removed, is an important factor for reducing disease recurrence [2]. The current standard of care for evaluating surgical margins is microscopic investigation by a histopathologist following excision. This process can be time intensive and expensive.


Additionally, it is a challenging task for surgeons to identify resection margins of BCC intraoperatively. Therefore, providing clinical tools that offer real-time information about the pathology of the tissue during resection is critical.

Rapid evaporative ionization mass spectrometry (REIMS) is an emerging technology that enables real-time biochemical analysis of human tissue [3]. Unlike other mass spectrometry techniques, REIMS requires no sample preparation, which makes it ideal for intraoperative or perioperative use. The intelligent knife, known as the iKnife (Waters Corp., Milford, MA, USA), is a commercially available REIMS solution designed for intraoperative margin assessment. The iKnife is made up of a mass spectrometer that is connected to an electrocautery device so that it can analyze the smoke that is released during tissue incineration (burn) [4]. Recent studies suggest that it is feasible to use the iKnife for margin detection in BCC [4,5]. Solutions are based on the application of machine learning (ML) and deep learning (DL) methods to build models that can distinguish cancer from healthy tissue using the metabolomic signature of the tissue.

One of the challenges of developing such technology is the noisy nature of REIMS data. The observed noise can be categorized into data noise and label noise. Data noise is a result of several factors such as instrumentation, measurement drift over time, hemorrhage, and burns that include more than one tissue type (i.e., when a larger cautery tip is used). The instrumentation drift can be attributed to measurement deviations that occur over time after the instrument is calibrated, while the instrumentation noise can generally be attributed to the background signal, which is a digitized voltage measurement [6]. The background signal fluctuates; hence, the separation of the pure analyte signal from background is challenging. This noise can usually be removed with background suppression but still introduces potential errors. Finally, the location where ions strike the detector within the mass spectrometer causes variance in the signal. Separately, label noise is caused by uncertain or incorrect annotation. Annotation of iKnife data in BCC surgery is challenging as the surgeon excises several different layers of tissue and the boundaries between different tissue types are not always clear. In addition to noise, some REIMS samples can also be considered out of distribution (OOD), indicating that their metabolic signatures do not fall within the general distribution of the other samples. In BCC, encountering OOD data is not uncommon because the biochemical composition of similar tissue may vary from patient to patient and the surgeon may even encounter unique pathologies. The inclusion of noisy/OOD samples during model training is undesirable as it can skew the input data distribution and lead to random and even incorrect predictions [7]. One way to detect these samples is to leverage uncertainty estimation.

In the context of deep learning, uncertainty estimation refers to the degree of a model's lack of confidence in its output being correct [8]. Overall, depending on the structure of the underlying neural network model, uncertainty estimation methods have been based on: (1) Bayesian networks, (2) single deterministic networks, (3) ensemble models, and (4) those that utilize test-time data augmentation techniques [9]. Two types of uncertainty can be estimated from deep models, aleatoric and epistemic uncertainty [10]. Aleatoric uncertainty is associated with the data, and hence the input to a model, whereas epistemic uncertainty is a measure of the knowledge of a model and is related to its output [11]. A common approach for quantifying these two types of uncertainty is via Bayesian neural networks (BNN). Kendall and Gal [10] provide a unified Bayesian deep learning framework to decompose these two uncertainties. An ensemble of deep networks is an alternative to Bayesian inference, where the outputs of different models, resulting from alternate initialization and noise in the stochastic gradients, are used to estimate the predictive uncertainty. Although this technique estimates the total predictive uncertainty, it does not decompose it into aleatoric and epistemic uncertainty. To make an inference, this approach also needs to maintain several independent models to calculate the variance of their output predictions as the uncertainty metric.

BNNs have been successfully applied to several medical imaging applications. DeVries and Taylor [12] apply this technique for image segmentation and use uncertainty estimation to remove instances of the data that fail to affect the segmentation pipelines. Senapati et al. [13] also apply this concept to segmentation and disease classification and use uncertainty information to inform segmentation errors. Finally, Ruhe et al. [14] investigate the use of uncertainty estimation to improve the trustworthiness of mortality prediction in the ICU.

Uncertainty estimation is important to the clinical implementation of REIMS in two ways: (i) to address errors made in its prediction of the tissue type that may otherwise go unnoticed during deployment or (ii) to detect noisy and OOD input samples during training of tissue classification models. Assigning a measure of predictive uncertainty to the output of deep networks allows the surgeon to use this additional information to augment their decision making. For instance, if the model predicts that a tissue sample is cancer at the margin but with low uncertainty, the surgeon may choose to either investigate the tissue more closely or continue to excise more tissue to achieve a negative margin. Moreover, for emerging technologies like REIMS, where database development is still underway, uncertainty estimation can be factored into database consolidation to remove noisy or OOD samples.

In this study, we investigate the application of uncertainty estimation for intraoperative margin detection with REIMS. Figure 1 demonstrates an overview of our proposed workflow. We train a BNN using mass spectra collected from resected BCC tissues.


Fig. 1 Left: clinical application of uncertainty estimation in BCC resection with the iKnife during both model training and deployment. Right: different surgical feedback options for communicating model output: a Positive cancer prediction with low uncertainty, b Positive healthy prediction with low uncertainty, c Prediction with high aleatoric uncertainty (indicates to resample), d Positive cancer prediction with high epistemic uncertainty (indicates the prediction is uncertain)

The model is used to predict benign and BCC tissue along with the aleatoric and epistemic uncertainty associated with each prediction. We then investigate the effects of removing highly uncertain samples from training on predictive accuracy, sensitivity, and specificity. We also discuss the potential of including uncertainty estimation in tissue classification reporting. The integration of this uncertainty estimation approach with REIMS will ultimately improve the reliability of using computational methods for margin detection in cancer surgery.

Materials and methods

Data

Data used in this study were collected from 100 tissue samples resected from 91 BCC patients recruited from the skin clinic at our institution. The protocol for patient recruitment was approved by the institutional HREB. After the skin tissue was resected by the surgeon, a derma-pathologist cut and inspected the cross section of the specimens. The protocol for tissue resection and data acquisition is similar to Jamzad et al. [5]. Briefly, the derma-pathologist collected several mass spectra samples from the BCC region, as well as from surrounding benign regions, noting the different layers of the normal skin including dermis, epidermis, and adipose. Tissue sampling was performed via contact of the tip of an electrosurgical cautery unit (cut mode at 35 Watt) with the tissue. Each mass spectrum represents the distribution of the number of ions per mass-to-charge ratio, i.e., m/z. The acquired dataset consists of 693 mass spectra including 252 BCC and 441 normal skin (259 adipose, 123 dermis, 59 epidermis).

Data preprocessing

Each mass spectrum was initially preprocessed using the Abstract Model Builder (AMX) software (Waters Corp., Milford, MA, USA). The steps included total ion current normalization, lock mass correction, binning to a resolution of 0.1 m/z, and sub-band selection in the m/z 100–1000 range. To reduce the dimensionality of the peaks, we further applied abundance-based down-sampling by picking the highest peak in each 1 m/z range (picking one of ten peaks). Therefore, each spectrum is represented by 900 intensity peaks, which we use as features for training our models.

To handle the data scarcity, increase the generalizability of the models, and balance the distribution of data between cancer and non-cancer classes, we used an intensity-aware data augmentation approach that we particularly designed for mass spectrometry data [5]. Briefly, this method applies random shifts in peak locations (calibration error), adds low-power noise to abundant peaks, and adds low-frequency medium-power noise to non-abundant peaks (background noise) of existing spectra to generate new data. Since this augmentation approach is intensity-aware, it preserves the tissue-specific molecular signature of the mass spectra. As will be explained in the next section, the augmentation was only applied to training data following data stratification.
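As an illustration of the two steps above, the following minimal NumPy sketch performs the abundance-based down-sampling (one peak kept per 1 m/z window, 900 features in total) and a simplified intensity-aware augmentation. It is not the AMX pipeline or the exact augmentation of [5]; the noise powers, shift range, and abundance threshold are assumptions chosen for illustration only.

```python
import numpy as np

# Assumed layout: a preprocessed spectrum covers m/z 100-1000 at 0.1 m/z
# resolution, i.e., 9000 bins; down-sampling keeps the highest peak in
# each 1 m/z window, yielding 900 features per spectrum.
N_BINS, BINS_PER_MZ = 9000, 10


def downsample_spectrum(spectrum: np.ndarray) -> np.ndarray:
    """Abundance-based down-sampling: max intensity per 1 m/z window."""
    assert spectrum.shape == (N_BINS,)
    return spectrum.reshape(-1, BINS_PER_MZ).max(axis=1)  # shape (900,)


def augment_spectrum(peaks: np.ndarray, rng: np.random.Generator,
                     max_shift: int = 1, abundance_q: float = 0.9) -> np.ndarray:
    """Intensity-aware augmentation sketch (shift and noise magnitudes are
    illustrative assumptions, not the published values)."""
    aug = np.roll(peaks, rng.integers(-max_shift, max_shift + 1))  # calibration shift
    abundant = aug > np.quantile(aug, abundance_q)
    aug = aug + abundant * rng.normal(0.0, 0.01 * aug.max(), aug.shape)    # low-power noise on tall peaks
    aug = aug + (~abundant) * rng.normal(0.0, 0.05 * aug.max(), aug.shape)  # stronger, background-like noise elsewhere
    return np.clip(aug, 0.0, None)


rng = np.random.default_rng(0)
raw = rng.random(N_BINS)                 # stand-in for one normalized spectrum
features = downsample_spectrum(raw)      # 900 intensity peaks used as model input
augmented = augment_spectrum(features, rng)
print(features.shape, augmented.shape)   # (900,) (900,)
```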


Fig. 2 The overall block diagram of the proposed framework at the training phase. A BNN model is trained on all of the training samples. Based on the measurement of uncertainty, OOD and noisy data are removed from the training data to produce a cleaner dataset and train a new model

Proposed method

An approach to estimate uncertainty using BNNs is to infer the predictive distribution of the network target y, p(y | x), where x is the input. In this approach, the predictive uncertainty is composed of two different types of uncertainty, epistemic and aleatoric. The epistemic uncertainty is considered as the uncertainty of the model in its prediction. Hence, we hypothesize that samples with lower epistemic uncertainty are in-distribution data, which have characteristics similar to most of the data in our training set. Additionally, samples with higher epistemic uncertainty are OOD. Therefore, the epistemic uncertainty of the prediction could be used to filter OOD data. The aleatoric uncertainty is considered as the uncertainty of the observed data. As previously discussed, such uncertainty can be due to the inherent noise in the measurements or labels, or wrongly annotated data. Hence, we hypothesize that samples with higher aleatoric uncertainty are either noisy tissue burns or noisy annotations of the data.

In our method, we train a BNN model to estimate uncertainty and detect the OOD and noisy data. We then filter highly uncertain samples to yield a clean dataset. Henceforward, the model is trained only with the clean data and is expected to result in improved prediction and estimation of uncertainty at test time. A block diagram of our model and the training process is illustrated in Fig. 2.

Uncertainty estimation

Although various approximations of BNNs have been proposed, in this study we leverage Monte Carlo (MC) dropout for BNN approximation [15]. Kendall and Gal [10] demonstrate how to calculate the aleatoric and epistemic uncertainty using BNNs with MC dropout. They propose to modify the BNNs by placing a Gaussian distribution over the output layer of the network, where the mean (μ_i) and variance (σ_i^2) of the Gaussian distribution are considered as the output of the network for the i-th sample of N training data. Considering the output of the network being transformed through a sigmoid function, the predictions of the network for sample i are computed as:

p_{i,m} = \mathrm{sigmoid}(\mu_{i,m} + \sigma_{i,m}\,\epsilon), \quad \epsilon \sim \mathcal{N}(0, 1)   (1)

Fitting a Gaussian distribution to the output layer of the network requires Monte Carlo approximation during training, sampling Eq. (1) M times, where ε is drawn from a unit Gaussian distribution each time. Hence, the prediction of the network can be computed by:

p_i = \frac{1}{M}\sum_{m=1}^{M} p_{i,m} = \frac{1}{M}\sum_{m=1}^{M} \mathrm{sigmoid}(\mu_{i,m} + \sigma_{i,m}\,\epsilon), \quad \epsilon \sim \mathcal{N}(0, 1)   (2)

At test time, the network has already learned to predict μ and σ^2; hence, ε = 0 in Eq. (2). By enabling MC dropout at test time, for each sample i at each forward pass of the model, we sample from the approximate posterior distribution; therefore, we have {μ_i, σ_i^2}_{m=1}^{M}. The variance of the predictions from the forward passes is the epistemic uncertainty, while the average of σ^2 can be considered as the aleatoric uncertainty. Consequently, the total predictive uncertainty for the i-th sample is:

\mathrm{uncertainty}_i = \underbrace{\frac{1}{M}\sum_{m=1}^{M} p_{i,m}^{2} - \bar{p}_i^{2}}_{\text{Epistemic}} + \underbrace{\frac{1}{M}\sum_{m=1}^{M} \sigma_{i,m}^{2}}_{\text{Aleatoric}}   (3)

where \bar{p}_i denotes the mean of \{p_{i,m}\}_{m=1}^{M}.



To train the method, we utilized the following loss function proposed in [15]:

L_x = \frac{-1}{N}\sum_{i=1}^{N}\left[ y_i \log\!\left(\frac{1}{M}\sum_{m=1}^{M} p_{i,m}\right) + (1 - y_i)\log\!\left(\frac{1}{M}\sum_{m=1}^{M}\left(1 - p_{i,m}\right)\right)\right]   (4)

This loss approximates the log likelihood loss function through Monte Carlo integration.

Model architecture and training

The architecture of the model used in our study constitutes a neural network to transform the input x, while the network output is split to predict both μ and σ^2. The network is constructed as three fully connected layers with 128, 64, and 2 units, respectively. The first two layers are followed by batch normalization, transformation through a ReLU activation function, and a dropout layer [16]. The input of the model is a spectrum with 900 peaks. The model was trained with mini-batch stochastic gradient descent and the Adam optimizer [16]. Training starts from random initial weights and continues with a learning rate of 0.0001 and a batch size of 16 for 100 epochs. Using the proposed BNN model, we can simultaneously classify the samples and estimate their epistemic and aleatoric uncertainty to detect the OOD and noisy data. Hence, for each sample x, we have M = 50 estimations of μ and σ^2, where the epistemic uncertainty is computed as the variance of the M predictions of the model and the aleatoric uncertainty is computed as the average of the predicted σ^2.
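The described architecture and training recipe might be realized as in the following PyTorch sketch. The paper does not state its framework, so PyTorch, the dropout rate, and the choice to treat the two output units as a logit mean and log-variance are assumptions; the loss term follows the form of Eq. (4).

```python
import torch
import torch.nn as nn


class BayesianSpectrumNet(nn.Module):
    """900 peaks -> FC(128) -> FC(64) -> FC(2); the 2 outputs are taken as
    the Gaussian mean and log-variance of the logit (illustrative sketch)."""

    def __init__(self, n_peaks: int = 900, p_drop: float = 0.5):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Linear(n_peaks, 128), nn.BatchNorm1d(128), nn.ReLU(), nn.Dropout(p_drop),
            nn.Linear(128, 64), nn.BatchNorm1d(64), nn.ReLU(), nn.Dropout(p_drop),
            nn.Linear(64, 2),
        )

    def forward(self, x):
        out = self.backbone(x)
        return out[:, 0], out[:, 1]  # mu, log_var


def mc_loss(mu, log_var, y, n_mc: int = 50):
    """Monte Carlo approximation of the cross-entropy over sampled logits, Eq. (4)."""
    sigma = torch.exp(0.5 * log_var)
    eps = torch.randn(n_mc, *mu.shape, device=mu.device)
    p = torch.sigmoid(mu + sigma * eps).mean(dim=0).clamp(1e-6, 1 - 1e-6)
    return -(y * torch.log(p) + (1 - y) * torch.log(1 - p)).mean()


model = BayesianSpectrumNet()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# One illustrative training step on a random mini-batch of 16 spectra.
x = torch.rand(16, 900)
y = torch.randint(0, 2, (16,)).float()
mu, log_var = model(x)
loss = mc_loss(mu, log_var, y)
optimizer.zero_grad()
loss.backward()
optimizer.step()
print(float(loss))
```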
Experiments

To evaluate the model, we stratify the data by patients into training, validation, and test sets. The training and validation sets contain 436 and 93 spectra, respectively. Of those, 158 training samples and 37 validation samples have been labeled as BCC. The remaining 278 and 56 samples of the training and validation sets are benign cases. The test set contains 164 spectra collected from 20 patients, with 57 BCC and 107 benign cases. We augment our training set with a ratio of 5:2 of cancer to normal spectra. For each training task, the best model is selected based on its performance on the validation set. The final performance of the model is evaluated based on the test data. For all experiments, a decision threshold of 0.5 was used for binary classification (i.e., positive vs. negative margins). We use sensitivity (true positive rate), specificity (true negative rate), balanced accuracy (average of sensitivity and specificity), and AUC, the area under the receiver operating characteristic curve, as performance measures.

The proposed model is first trained using the entire augmented training dataset as a baseline. Then, the trained model is used to estimate the aleatoric and epistemic uncertainty of each data point in the training set. The uncertainty information is then used to "filter" the training set, i.e., exclude samples with high uncertainty values. The clean subset of data is then used to train a new model and its performance is compared with the baseline. In this study, we filter 20 and 30% of the data based on three criteria: aleatoric, epistemic, and aleatoric + epistemic uncertainty. Each configuration is trained 30 times with different initializations to increase the generalization, and the performances of the models on the test set are compared.

We also implement two baseline models: a PCA/LDA approach, which is commonly used in the literature for analysis of mass spectrometry data, and a deep model with the same architecture as our proposed model but without the Bayesian output and uncertainty estimation capability. To filter training data for the baseline models, uncertainty is estimated using their probability output, p(y | x, θ), as 1 − max_k p(y = k | x, θ), where x is the feature vector and y is the class label. We further explore the distribution of filtered (highly uncertain) samples, as well as the patient-wise distribution of misclassified data in the test set. Finally, we investigate if excluding uncertain samples during deployment can improve the performance of tissue classification.
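A minimal sketch of the filtering procedure, together with the probability-based uncertainty used for the baseline models, is given below; the 20/30% filter ratios follow the text, while the array names and toy data are illustrative assumptions.

```python
import numpy as np


def probability_uncertainty(probs: np.ndarray) -> np.ndarray:
    """Baseline uncertainty: 1 - max_k p(y = k | x), for class probabilities
    of shape (n_samples, n_classes)."""
    return 1.0 - probs.max(axis=1)


def filter_uncertain(X: np.ndarray, y: np.ndarray, uncertainty: np.ndarray,
                     filter_ratio: float = 0.2):
    """Drop the filter_ratio fraction of training samples with the highest
    uncertainty (aleatoric, epistemic, or their sum) and return the rest."""
    n_keep = int(round(len(X) * (1.0 - filter_ratio)))
    keep_idx = np.argsort(uncertainty)[:n_keep]   # most certain samples first
    return X[keep_idx], y[keep_idx]


# Toy usage: combine the two uncertainty types and remove the top 20%.
rng = np.random.default_rng(0)
X_train = rng.random((436, 900))
y_train = rng.integers(0, 2, 436)
aleatoric = rng.random(436)
epistemic = rng.random(436)
X_clean, y_clean = filter_uncertain(X_train, y_train, aleatoric + epistemic, 0.2)
print(X_clean.shape)  # (349, 900); a new model would then be trained on this subset
```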
Results and discussion

Performance Table 1 summarizes the performance of the baseline models in comparison to the proposed model using different ratios of filtered training data. The second column of the table indicates the percentage of filtered samples and the uncertainty criteria for filtering. As can be seen, the proposed model results in a higher AUC compared to the baseline models when no filtering is used (row 7 versus rows 1 and 4). The overall sensitivity of all models without filtering is relatively low, which can be due to patient heterogeneity or the distribution of mass spectra in the feature space, i.e., peak distribution.

According to Table 1, neither of the baseline models is improved by data filtering. As mentioned before, the model probability is not able to correctly estimate the uncertainty. However, for the proposed approach, all filtering configurations improve the sensitivity statistically significantly (p-value < 0.001 for rows 8–13 compared to row 7, based on a one-tail Wilcoxon signed-rank test). However, the overall balanced accuracy is not always significantly improved. Filtering samples with high epistemic uncertainty decreases the specificity; hence, the improvement in the balanced accuracies is no longer statistically significant (p-values of 0.26 and 0.16 for 20 and 30% filtering, respectively). Although the average AUC value decreased slightly by only filtering the epistemically uncertain data, the change is not statistically significant (p-values of 0.87 and 0.67 for 20 and 30% filtering, respectively).


Table 1 Comparison of the performance of the baseline models and the proposed BNN method with different ratios of filtered training data

Method               Uncertainty filtering   Balanced accuracy   Sensitivity   Specificity   AUC
PCA/LDA (Baseline)   0%                      69.7 ± 2.3          66.0 ± 3.3    73.5 ± 1.7    73.9 ± 1.3
                     20% Prob.               68.6 ± 1.4          54.0 ± 2.6    83.2 ± 3.0    73.5 ± 1.5
                     30% Prob.               69.8 ± 1.5          52.6 ± 4.4    86.9 ± 2.5    73.6 ± 2.0
MLP (Baseline)       0%                      73.8 ± 2.4          59.4 ± 5.4    88.2 ± 3.8    78.6 ± 2.0
                     20% Prob.               72.7 ± 1.8          63.3 ± 3.1    82.1 ± 3.9    78.4 ± 1.6
                     30% Prob.               71.9 ± 1.9          61.8 ± 5.7    82.0 ± 5.4    76.8 ± 1.9
BNN (Ours)           0%                      72.1 ± 1.8          60.5 ± 3.7    83.7 ± 2.7    81.1 ± 2.3
                     20% Epi.                72.9 ± 3.3          69.9 ± 7.1    75.8 ± 4.4    79.8 ± 3.2
                     30% Epi.                72.9 ± 2.8          73.0 ± 4.4    73.1 ± 6.5    80.6 ± 2.4
                     20% Ale.                75.3 ± 2.6          74.0 ± 4.6    76.1 ± 5.2    81.5 ± 3.0
                     30% Ale.                73.5 ± 2.8          72.3 ± 4.9    74.6 ± 5.4    81.2 ± 2.9
                     20% Ale. + Epi.         75.2 ± 2.9          74.1 ± 3.8    77.3 ± 4.8    82.1 ± 2.6
                     30% Ale. + Epi.         74.9 ± 2.9          71.6 ± 3.3    77.3 ± 5.6    82.3 ± 2.5

The bolded row (20% Ale. + Epi.) indicates the best model with the highest performance. Here, p% filtering indicates that p% of the highly uncertain samples have been removed from the training set, and a new model has been trained on the remaining certain samples. For the baseline models, uncertainty is estimated from the model probability output (Prob.). For the proposed approach, both aleatoric (Ale.) and epistemic (Epi.) uncertainties are used for filtering.

When samples that have both high epistemic and aleatoric uncertainties are filtered (the last two rows of Table 1), the improvements in accuracy and AUC become statistically significant (p-values of 0.0001 and 0.008 for balanced accuracy with 20% and 30% filtering, respectively, and p-values of 0.003 and 0.009 for AUC with 20 and 30% filtering, respectively). The best results are achieved when 20% of the samples with high aleatoric + epistemic uncertainty are filtered. In this case, although the specificity of the model is reduced compared to no filtering, it is important to note that the cost of false negatives is much higher than that of false positives since the application is margin detection in cancer surgery. Therefore, a model with higher sensitivity that detects all cancer at margins is preferred.
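For reference, a paired comparison of this kind (30 runs of a filtered configuration against 30 runs of the unfiltered model) could be carried out with SciPy's one-tailed Wilcoxon signed-rank test, as sketched below with synthetic stand-in values rather than the study's actual per-run results.

```python
import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(0)

# Synthetic balanced accuracies for 30 paired runs (illustrative values only).
baseline_runs = rng.normal(72.1, 1.8, 30)   # no filtering
filtered_runs = rng.normal(75.2, 2.9, 30)   # 20% Ale. + Epi. filtering

# One-tailed Wilcoxon signed-rank test: is the filtered model better?
stat, p_value = wilcoxon(filtered_runs, baseline_runs, alternative="greater")
print(f"statistic={stat:.1f}, p-value={p_value:.4g}")
```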
Exploration of uncertain data As mentioned in the data collection section, the non-cancer samples in this study were collected from different layers of the skin. Figure 3 visualizes the distribution of different tissue types in the augmented training set that was used for the baseline model (first row of Table 1), and after filtering out 20% of the samples with high aleatoric + epistemic uncertainty (second to last row of Table 1). It can be seen that although the models were trained for binary classification of cancer vs. healthy tissue (without specifying adipose, dermis, and epidermis), the proportion of uncertain, filtered samples among different tissue types is very similar. This indicates that the protocol that was used for data collection is not biased toward a specific tissue type. It also can be seen that the portion of removed uncertain spectra for real and augmented data is similar, which demonstrates that the proposed augmentation approach, while simulating noise to increase data variability, does not increase the uncertainty level of the data.

To evaluate the relationship between the estimated uncertainty and the peak distribution in spectra, we further examine highly uncertain data in the training set. Two sample spectra with high estimated epistemic and aleatoric uncertainty are illustrated in Fig. 4 left and right, respectively. In the epistemic uncertain sample, we noticed multiple patterns of high intensity peaks that can be interpreted as a combination of different molecular signatures that is very different from the rest of the dataset, i.e., an out of distribution sample. For the sample that was detected as aleatoric uncertain, the distribution of peaks is unremarkable. However, although the label of the data sample is adipose, the concentration of peaks in the triglyceride range (m/z 860–940) is relatively smaller than in the phospholipid range (m/z 670–800), which is in contrast with the adipose label [17]. One hypothesis is that the adipose region was small and tumor cells had progressed through the region; then, during data collection, the cautery mostly burnt BCC, which added noise to the label and increased the aleatoric uncertainty.

Deployment To better evaluate the proposed method in the deployment phase, the distribution of true/false positive/negative predictions in the test set is visualized in Fig. 5 for the baseline and one of the filtered models. As expected from Table 1, the number of false negatives is significantly reduced at test time when the model was trained with more certain samples. False negatives are considered to be more critical than false positives in cancer diagnosis and treatment.


Fig. 3 Distribution of different tissue types in the training data following intensity-aware augmentation (Top), and the data distribution after filtering out 20% of the training samples with high aleatoric + epistemic uncertainty (Bottom). It can be seen that the uncertain samples are evenly distributed between data points and are not specific to a certain tissue type

Fig. 4 Visualization of two sample spectra from the test set with high levels of estimated epistemic (Left) and aleatoric (Right) uncertainties. The epistemic uncertain spectrum shows a mixture of high-intensity signatures that make the sample out of distribution. Although the peak distribution of the aleatoric uncertain spectrum is in-distribution, its label is determined to be noisy

In the case of mass spectrometry sampling of tissue during cancer resection, false negatives refer to an undetected positive margin, i.e., leaving cancer cells behind in the body. Therefore, the high sensitivity achieved by the proposed method is preferred.

Figure 5 also illustrates that the false positives are patient specific. For instance, most of the benign samples from patient 9 are misclassified. Further examination of the data reveals a low level of generated ions from the tissue burns for this patient, which resulted in relatively lower intensity and distribution of peaks in the spectra. A sample dermis spectrum from this patient, along with a common dermis spectrum, is visualized in Fig. 6 for comparison. The difference in the distribution of the peaks can be due to tissue structures that are specific to the patient and sampled during data collection. The size and hydration level of the resected tissue specimen can also affect the spectra collected from that tissue sample.

Excluding the highly uncertain samples from decision making during model deployment can further improve decisions. To test this, we chose one of the filtered models from the second to last row of Table 1 and re-calculated the sensitivity on the test set by excluding the uncertain samples. We observed that the initial sensitivity of 80.7% increased to 84.9, 88.2, and 89.6% by rejecting 10, 20, and 30% of highly uncertain spectra, respectively. This empirically shows that providing the surgeon with uncertainty estimation and a strategy to deal with uncertainty may further improve the performance of decision making. In terms of margin assessment using mass spectrometry, high epistemic uncertainty can be handled by not making the decision solely based on the deep model prediction and using other sources of information. On the other hand, high aleatoric uncertainty can be interpreted as noisy spectra or unknown molecular signatures, and the suggestion would be to re-sample the area.
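A minimal sketch of this test-time rejection strategy is shown below, assuming per-sample total uncertainties are available at deployment: the most uncertain fraction of test spectra is deferred (e.g., flagged for re-sampling or for review against other information), and sensitivity is computed on the retained predictions. The threshold rule, names, and toy data are illustrative assumptions.

```python
import numpy as np


def sensitivity_with_rejection(y_true, y_pred, uncertainty, reject_ratio=0.2):
    """Defer the reject_ratio most uncertain test samples and compute the
    true positive rate (sensitivity) on the remaining predictions."""
    y_true, y_pred, uncertainty = map(np.asarray, (y_true, y_pred, uncertainty))
    threshold = np.quantile(uncertainty, 1.0 - reject_ratio)
    keep = uncertainty <= threshold                    # accepted predictions
    tp = np.sum((y_pred[keep] == 1) & (y_true[keep] == 1))
    fn = np.sum((y_pred[keep] == 0) & (y_true[keep] == 1))
    return tp / (tp + fn), np.flatnonzero(~keep)       # sensitivity, deferred indices


# Toy usage with synthetic labels, predictions, and uncertainties.
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, 164)
y_pred = np.where(rng.random(164) < 0.8, y_true, 1 - y_true)  # ~80% correct
unc = rng.random(164)
sens, deferred = sensitivity_with_rejection(y_true, y_pred, unc, reject_ratio=0.2)
print(f"sensitivity on accepted samples: {sens:.3f}, deferred: {len(deferred)}")
```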
Conclusions and future work

The inclusion of DL in any clinical context requires robust and reliable predictions and reporting. In this paper, we apply uncertainty estimation to achieve this and propose a new approach to model training and deployment of REIMS imaging. To do so, we compute probabilistic uncertainties during binary classification and relay them back to their clinical and biochemical significance. This information is then used to detect OOD and noisy samples and to clean our training data by removing them and retraining our model. We demonstrate that this approach results in higher balanced accuracy and sensitivity with a smaller, more refined dataset. Our experiments demonstrate that data-centric methods are effective in yielding high-quality datasets in order to enhance the classification performance of models, especially in medical applications.


Fig. 5 Patient-wise classification of the test set using: Left, baseline model, and Right, model trained with 20% filtering of high aleatoric + epistemic uncertain samples from the training set. Each column depicts all the samples of each patient. The color of the circles is associated with real data labels, and the size is associated with the correctness of classification. As can be seen, the number of false negatives is significantly reduced when highly uncertain samples were removed during the training phase, which indicates the effectiveness of our contributions in increasing the sensitivity

Fig. 6 Comparison of two mass spectra acquired from the dermis region. The left spectrum has relatively lower peak distribution and intensity
compared to a common dermis spectrum presented in the right figure. The left spectrum was misclassified by the model while having high estimated
aleatoric uncertainty

In addition to using uncertainty for refining training data, we also proposed a feedback approach for communicating uncertainty to the surgeon during deployment and empirically evaluated it. For BCC resection, it has been shown that REIMS technology is a viable tool for tissue classification and, consequently, margin detection. Therefore, the integration of uncertainty in this same approach can be used to further augment clinical decision making by communicating a measure of predictive confidence to the surgeon as well. In the future, we would like to integrate this new training strategy with more complex DL architectures to improve the overall classification accuracy, as well as apply this approach to multi-label classification problems.

Funding We would like to thank the following sources of funding: the Natural Sciences and Engineering Research Council of Canada (NSERC), the Canadian Institutes of Health Research (CIHR), the Southeastern Ontario Academic Medical Organization (SEAMO) Innovation Fund, the Britton Smith Chair in Surgery to J. Rudan, and the Canada Research Chair to G. Fichtinger.

Declarations

Conflict of interest The authors declare no conflicts of interest.

Ethical approval This study was approved by the Queen's University Health Sciences Research Ethics Board.

Informed consent All patients that participated in the study gave informed verbal and written consent.

References

1. Manoli S-M, Moutsoudis A, Papageorgiou C, Lallas K, Rigas H-M, Kyrmanidou E, Papadimitriou I, Paschou E, Spyridis I, Gkentsidi T, Sotiriou E, Vakirlis E, Ioannidis D, Apalla Z, Lallas A (2020) Real-life data on basal cell carcinoma treatment: insights on clinicians' therapeutic choices from an institutional hospital registry. Dermatol Ther 33(6):14414
2. Filho RB, de Carvalho Fantini B, Dos Santos CA, Melo RV, Rosan I, Chahud F, da Silva Souza C (2019) Attributes and risk factors of positive margins on 864 excisions of basal cell carcinomas: a single-center retrospective study. J Dermatol Treat 31(6):589–596
3. Balog J, Sasi-Szabo L, Kinross J, Lewis MR, Muirhead LJ, Veselkov K, Mirnezami R, Dezso B (2013) Intraoperative tissue identification using rapid evaporative ionization mass spectrometry. Sci Transl Med 5(2):194
4. Santilli A, Jamzad A, Janssen N, Kaufmann M, Connolly L, Vanderbeck K, Wang A, McKay D, Rudan J, Fichtinger G, Mousavi P (2020) Perioperative margin detection in BCC using a deep learning framework: a feasibility study. Int J CARS 15:887–896

5. Jamzad A, Sedghi A, Santilli AML, Janssen NNY, Kaufmann M, Ren KYM, Vanderbeck K, Wang A, Mckay D, Rudan JF, Fichtinger G, Mousavi P (2020) Improved resection margins in surgical oncology using intraoperative mass spectrometry. In: Medical image computing and computer assisted intervention, MICCAI, Lecture notes in computer science, vol 12263. Springer, Cham. https://doi.org/10.1007/978-3-030-59716-0_5
6. Wells G, Prest H, William C, Iv R. Signal, noise, and detection limits in mass spectrometry. Application note, chemical analysis
7. Loquercio A, Segu M, Scaramuzza D (2020) A general framework for uncertainty estimation in deep learning. IEEE Robotics Autom Lett 5(2):3153–3160
8. Vranken JF, van de Leur RR, Gupta DK, Juarez Orozco LE, Hassink RJ, van der Harst P, Doevendans PA, Gulshad S, van Es R (2021) Uncertainty estimation for deep learning-based automated analysis of 12-lead electrocardiograms. Eur Heart J Digital Health 2(3):401–415
9. Gawlikowski J, Tassi CRN, Ali M, Lee J, Humt M, Feng J, Kruspe A, Triebel R, Jung P, Roscher R et al (2021) A survey of uncertainty in deep neural networks. arXiv:2107.03342
10. Kendall A, Gal Y (2017) What uncertainties do we need in Bayesian deep learning for computer vision? arXiv:1703.04977
11. Hüllermeier E, Waegeman W (2021) Aleatoric and epistemic uncertainty in machine learning: an introduction to concepts and methods. Mach Learn 110(3):457–506
12. DeVries T, Taylor GW (2018) Leveraging uncertainty estimates for predicting segmentation quality. arXiv:1807.00502
13. Senapati J, Roy AG, Pölsterl S, Gutmann D, Gatidis S, Schlett C, Peters A, Bamberg F, Wachinger C (2020) Bayesian neural networks for uncertainty estimation of imaging biomarkers. In: Lecture notes in computer science (including subseries lecture notes in artificial intelligence and lecture notes in bioinformatics), vol 12436. LNCS, pp 270–280
14. Ruhe D, Cinà G, Tonutti M, de Bruin D, Elbers P (2019) Bayesian modelling in practice: using uncertainty to improve trustworthiness in medical applications. arXiv:1906.08619
15. Gal Y, Ghahramani Z (2016) Dropout as a Bayesian approximation: representing model uncertainty in deep learning. In: International conference on machine learning, pp 1050–1059
16. Murphy KP (2022) Probabilistic machine learning: advanced topics. MIT Press, Cambridge
17. St-John ER, Al-Khudairi R, Ashrafian H, Athanasiou T, Takats Z, Hadjiminas DJ, Darzi A, Leff DR (2017) Diagnostic accuracy of intraoperative techniques for margin assessment in breast cancer surgery. Ann Surg 265(2):300–310

Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

