
Engineering Applications of Artificial Intelligence 129 (2024) 107597

Contents lists available at ScienceDirect

Engineering Applications of Artificial Intelligence


journal homepage: www.elsevier.com/locate/engappai

MEDS-Net: Multi-encoder based self-distilled network with bidirectional maximum intensity projections fusion for lung nodule detection

Muhammad Usman a,b,c,∗, Azka Rehman b,d, Abdullah Shahid b, Siddique Latif e, Yeong-Gil Shin a

a Department of Computer Science and Engineering, Seoul National University, 1 Gwanak-ro, Gwanak-gu, Seoul, 08826, South Korea
b Center for Artificial Intelligence in Medicine and Imaging, HealthHub Co. Ltd., Seoul, 06524, South Korea
c SeoulDynamics Inc., Seongnam-si, Gyeonggi-do, 13449, South Korea
d Department of Biomedical Sciences, Seoul National University Graduate School, Seoul, 03080, South Korea
e School of Computer Science and Engineering, University of New South Wales, Sydney NSW 2052, Australia

ARTICLE INFO

Keywords:
Bidirectional maximum intensity projection
Computer-aided detection (CADe)
Multi-encoder network
Pulmonary nodule detection
Self-distillation

ABSTRACT

Early detection of lung cancer increases cure rates, making precise detection of lung nodules in computed tomography (CT) images crucial. However, creating an accurate computer-aided detection (CADe) system is challenging due to lung structure complexity and nodule heterogeneity. We present a lung nodule detection scheme, inspired by radiologist workflow, fusing CT scan sub-volumes with bidirectional maximum intensity projection (MIP) images within a single architecture. We have developed a novel multi-encoder-based network (MEDS-Net) using self-distillation to learn from three types of inputs: 3D sub-volume, forward, and backward MIP images. MEDS-Net utilizes three encoders to extract insights, passed to the decoder via attention units to suppress redundant features. The decoder employs a self-distillation mechanism, connecting to a distillation block with four auxiliary lung nodule detectors, accelerating convergence and enhancing learning ability. The auxiliary detectors are used during inference to reduce false positives. Our scheme, tested on 888 LIDC-IDRI dataset scans, achieved a 93.6% competition performance metric score, illustrating that fusing bidirectional MIP images with raw CT slices and auxiliary detectors empowers MEDS-Net to accurately detect lung nodules while minimizing false positives.

1. Introduction

Lung cancer is the most frequent cancer type, with the highest mortality rates globally (Sung et al., 2021). The survival rate of patients highly depends upon the early detection of pulmonary nodules (Baldwin, 2015). Computed tomography (CT) is an extensively employed and effective medical imaging modality for diagnosing pulmonary nodules (Sluimer et al., 2006). However, it is often a time-consuming and tedious job to manually identify nodules in CT scans. Radiologists need to read the CT scans slice by slice, and depending upon the slice thickness, a chest CT may contain hundreds of slices. Additionally, manual lung nodule detection is error-prone (Seemann et al., 1999). To address this problem, computer-aided detection (CADe) systems are becoming popular in aiding radiologists in accurately diagnosing lung cancer.

Numerous efforts (e.g., Zhao et al. (2022), Zhou et al. (2022)) have been made in designing CADe systems. Recently, deep learning (DL) based techniques have made vast inroads in lung nodule detection (Zhang et al., 2018; Halder et al., 2020). In DL-based CADe systems, convolutional neural networks (CNNs) have been extensively employed in 2D and 3D to achieve significantly improved performance (Monkam et al., 2019). Such CADe systems typically consist of two stages: screening and false positive (FP) reduction. The former is typically applied to identify suspicious regions (i.e., nodule candidates) in a patient's exam. The latter stage reduces the FPs by applying additional CNNs (Cao et al., 2020; Pezeshk et al., 2018; Zhao et al., 2022; Zhou et al., 2022). Although such CADe systems have demonstrated promising performance for lung nodule detection, these techniques add computational complexity by increasing the number of CNNs to train and are more prone to error, as nodules missed at the first stage cannot be detected at the second stage. Consequently, the sensitivity at the detection stage is kept high, significantly increasing the FPs, which adds load to the FP reduction stage. To overcome the challenges associated with dual-stage CADe systems, single-stage frameworks were proposed to detect lung nodules (Li and Fan, 2020; Zhu et al., 2022; Luo et al., 2022). These frameworks considerably improve the computational complexity and ease the training and optimization process; however, they suffer from low performance that limits their real-time clinical deployment.

∗ Corresponding author at: Department of Computer Science and Engineering, Seoul National University, 1 Gwanak-ro, Gwanak-gu, Seoul, 08826, South Korea.
E-mail address: ussman@snu.ac.kr (M. Usman).

https://doi.org/10.1016/j.engappai.2023.107597
Received 19 July 2023; Received in revised form 23 September 2023; Accepted 20 November 2023
Available online 4 December 2023
0952-1976/© 2023 Elsevier Ltd. All rights reserved.

Despite the advancements in performance, a significant portion of existing CADe systems do not align with the clinical workflow, leading to challenges in lung nodule detection. Without incorporating domain-specific knowledge, these systems often resort to more intricate models applied across multiple stages for accurate nodule identification. In typical clinical practice, radiologists predominantly use the maximum intensity projection (MIP) images (Wallis et al., 1989) of CT scans for initial nodule screening, turning to the raw CT slices for final confirmation. Notably, only a handful of studies (Masood et al., 2020; Zheng et al., 2019) have integrated MIP images to enhance lung nodule detection, aligning more closely with clinical practices. Although these studies have demonstrated the value of aligning with clinical workflows using relatively simpler, conventional networks, they have often employed multiple networks across different stages. This multi-network approach not only adds complexity but also presents hurdles for real-time clinical deployment. To this end, our work introduces a cohesive, single-stage CADe system that synergizes the 3D patch of CT scans with MIP images for precise lung nodule detection. In a pioneering effort to minimize training overhead while enhancing performance, our methodology incorporates self-distillation into our proposed architecture for lung nodule detection.

Below is a summary of the main contributions of this study in contrast to the existing literature.

1.1. Summary

1. We design a novel single-stage multi-encoder-based CNN that utilizes the self-distillation mechanism and improves the performance compared to state-of-the-art single-stage as well as most dual-stage frameworks on the public LIDC-IDRI dataset.
2. We propose to utilize bidirectional MIP images of three slab thicknesses, i.e., 3, 5, and 10 mm, for lung nodule detection, which improves the network's ability to distinguish nodules from non-nodular structures in the lung region.
3. To reduce false positives, we introduce our proposed method, which uniquely leverages outputs from auxiliary branches of the same architecture for both lung nodule candidate detection and subsequent false positive reduction. This approach stands out as it effectively utilizes the same network in a dual capacity, a feature not commonly observed in existing methods.

2. Related work

2.1. Dual stage methods for lung nodule detection

Most DL-based computer-aided detection (CADe) systems for lung nodule detection consist of two stages: screening and false positive (FP) reduction. The former is typically applied to identify suspicious regions (i.e., nodule candidates) in a patient's exam. The latter stage reduces the FPs by using an additional network (e.g., CNNs). For instance, Cao et al. (2020) proposed a two-stage CADe system with an improved UNet architecture for candidate detection. They used a two-phase inference scheme, i.e., rough segmentation and then fine segmentation in the smaller local region. To eliminate the false positives, they used a 3D-CNN-based ensemble architecture. Pezeshk et al. (2018) applied 3D-CNNs for the initial screening of lung nodules, and FPs are reduced by an ensemble of 3D-CNNs trained using different transformations applied to both the positive and negative patches to augment the training set. Zhao et al. (2022) employed a high-resolution fused attention module, containing channel and spatial attention, to incorporate the global and spatial information of focal nodules for detecting the nodule candidates. In the second stage, an adaptive 3D CNN structure is designed to further reduce the false positives, which extracts multilevel contextual information via an adaptive 3D convolution kernel. Yuan et al. (2021) presented a dual-stage scheme that utilized a 3D residual UNet with channel attention and applied multi-task learning with a multi-branch classifier network for nodule detection and FP reduction, respectively. Pereira et al. (2021) used a mask region-based convolutional neural network (Mask R-CNN) to detect bounding boxes in axial slices and used a classifier ensemble based on CT attenuation patterns to boost 3D pulmonary nodule classification performance. Mei et al. (2021) used a 3D region proposal network to generate pulmonary nodule candidates, and the multi-scale feature maps were used to reduce the FPs. These dual-stage methods are computationally expensive to run, and misdiagnosis and omission errors in the preceding stages significantly impact the FP reduction in the subsequent stage. In contrast, we propose a single-stage network for lung nodule detection.

2.2. Single stage methods for lung nodule detection

Due to the issues associated with dual-stage CADe systems, some studies proposed single-stage CADe systems that simultaneously perform nodule detection and false positive reduction using a single network. For instance, Li and Fan (2020) used a 3D encoder–decoder architecture for lung nodule detection. To effectively reduce the false positives, they utilized a dynamically scaled cross-entropy loss function. Wu et al. (2021) combined image enhancement and a dual-branch neural network to improve the visibility of lung nodules by suppressing the noise from the background. The enhanced images were fed to an architecture based on two UNets to detect the nodules. Zhu et al. (2022) utilized a 3D residual UNet with improved attention gates to reduce the false positives. They also applied a channel interaction unit prior to the detection head and the gradient harmonizing mechanism loss function to combat the problem of data imbalance. Luo et al. (2022) detected the position, radius, and offset of nodules by using an anchor-free 3D sphere representation-based center-points matching detection network. Single-stage techniques greatly reduce the computational complexity and are easier to train and optimize; however, they suffer from lower performance, limiting their clinical usage.

2.3. MIP for lung nodule detection

In the real-time clinical workflow, radiologists mostly first examine the maximum intensity projection (MIP) images (Wallis et al., 1989) of a CT scan to roughly screen the nodule candidates and then use the raw CT scan slices to finalize the decision. MIP helps to summarize the shape variations appearing along the z-axis, which improves the visibility of lung nodules by distinguishing them from surrounding structures inside the lung. It can be taken in both directions (i.e., forward and backward), and each highlights the structural variations occurring in that direction. Very few studies have utilized MIP images to improve lung nodule detection performance and comply with the clinical workflow. For instance, Masood et al. (2020) leveraged MIP to extract meaningful insights of lung nodules from 2D views (i.e., axial, coronal, and sagittal) and used a multidimensional region-based fully convolutional neural network based CAD system for lung nodule detection and classification, respectively. Similarly, Zheng et al. (2019) attempted to follow the clinical workflow by using MIP with different slab thicknesses (i.e., 5 mm, 10 mm, 15 mm) to train four different UNet architectures and merged their outputs for nodule candidate detection. They trained two different-sized DNNs, which classified the 3D candidate patch into nodule or non-nodule for false positive reduction. Based on the results, they show that the performance of CADe for nodule detection can be improved by leveraging MIP images. However, this study utilized four MIP images with four different networks; each network was provided with partial information, which is contrary to the conventional clinical workflow. In clinical practice, the radiologist leverages raw CT slices along with MIP images of various thicknesses to effectively detect lung nodules (Guleryuz Kizil et al., 2020). It is infeasible for radiologists to solely rely on MIP images for diagnosis as it provides the highlights of


Table 1
Summary of comparison of our work with the existing studies.

| Author (Year) | 2D/3D slices | Forward MIP | Backward MIP | False positive reduction | Single-stage network | Self-distillation/auxiliary outputs |
|---|---|---|---|---|---|---|
| Pezeshk et al. (2018) | ✓ | ✗ | ✗ | ✓ | ✗ | ✗ |
| Zheng et al. (2019) | ✓ | ✓ | ✗ | ✓ | ✗ | ✗ |
| Cao et al. (2020) | ✓ | ✗ | ✗ | ✓ | ✗ | ✗ |
| Li and Fan (2020) | ✓ | ✗ | ✗ | ✗ | ✓ | ✗ |
| Masood et al. (2020) | ✓ | ✓ | ✗ | ✗ | ✓ | ✗ |
| Pereira et al. (2021) | ✓ | ✗ | ✗ | ✓ | ✗ | ✗ |
| Zhou et al. (2022) | ✓ | ✗ | ✗ | ✓ | ✗ | ✗ |
| Zhu et al. (2022) | ✓ | ✗ | ✗ | ✗ | ✓ | ✗ |
| Mei et al. (2021) | ✓ | ✗ | ✗ | ✓ | ✗ | ✗ |
| Luo et al. (2022) | ✓ | ✗ | ✗ | ✗ | ✓ | ✗ |
| Zhao et al. (2022) | ✓ | ✗ | ✗ | ✓ | ✗ | ✗ |
| Ours (2023) | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |

adjacent slices by suppressing the lower intensities, and in some cases, it also subdues the nodules lying at the lower intensity values. Therefore, radiologists examine the raw CT scans complementarily with MIP images of various thicknesses to draw accurate decisions on lung nodules (Gruden et al., 2002). More particularly, it has proven to be more effective for small nodules and nodules with higher malignancy (Jabeen et al., 2019). Nevertheless, in this work, we completely emulate the clinical workflow by simultaneously incorporating the raw CT slices and MIP images of various thicknesses into the proposed multi-encoder based self-distilled network (MEDS-Net). We further introduce the novel concept of bidirectional MIP images, which enables MEDS-Net to exploit the enriched insights to detect the lung nodules.

2.4. Self-distillation for lung nodule detection

Conventional knowledge distillation (KD) uses a pre-trained teacher model to produce soft labels as ground truth to train a student model (Hinton et al., 2015). It was originally introduced to compress a big network (teacher) into a small network (student), i.e., network compression (Hui et al., 2018; Tung and Mori, 2019). It computes a distillation loss between teacher and student, for which the implementation methods can be logits-based (Jiang et al., 2019; Phuong and Lampert, 2019), feature-based (Liu et al., 2020; Xu et al., 2020b), and hybrid (Li et al., 2017; Xu et al., 2020a). For instance, Qin et al. (2021) used feature-based KD for the segmentation task in medical images. In recent works, KD has been extended to its self-learning version called self-distillation, in which teacher and student networks share the same backbone architecture. Self-distillation helps models converge to flat minima, which inherently aids generalization, prevents the vanishing gradient problem, and enables the network to learn more discriminating features (Zhang et al., 2019). Subsequently, it has achieved impressive results in several visual tasks, such as image classification (Zhang et al., 2019; Luan et al., 2019; Xu et al., 2020b) and image segmentation (Wang et al., 2022; Ni et al., 2022). Most importantly, self-distillation has also proven effective for improving the performance of the main (teacher) model for medical image segmentation (Dong et al., 2022; Xie et al., 2022). These schemes perform layer-wise context distillation to optimize the shallow networks during the training process, which eventually helps the deepest architecture obtain better performance. However, these techniques do not utilize the shallow networks during the inference process, despite their having the same backbone as the deepest network. In contrast to the previous techniques, in this work we not only employ self-distillation during the training process to improve the lung nodule detection ability of the proposed MEDS-Net, but also, at inference, we exploit the shallow networks, called auxiliary detectors, to reduce the false positives.

We highlight the differences between our work and the existing work in Table 1.

1. While some studies utilize forward MIP for lung nodule detection, we are the first to exploit bidirectional (forward and backward) MIP images to boost the detection performance.
2. None of the studies used self-distillation in CADe systems for lung nodule detection. Our model is equipped with self-distillation to better optimize the model.
3. None of the previous work on single-stage models achieved better performance compared to the dual-stage techniques, whereas our model utilizes auxiliary outputs from various levels of the decoder block to reduce false positives in a single stage and achieves state-of-the-art performance compared to dual-stage models.

3. Proposed methodology

In this section, we describe the proposed framework for lung nodule detection in thoracic CT scans, which mimics the clinical practices of radiologists who utilize maximum intensity projection (MIP) images (Wallis et al., 1989). Our proposed multi-encoder-based self-distilled network (MEDS-Net) is designed to fully exploit the MIP images along with a 3D patch of the CT scan, as shown in Fig. 1. The proposed MEDS-Net leverages the self-distillation mechanism and auxiliary outputs to optimize the learning process and reduce false positives at the inference stage, respectively. The details of each component of the proposed framework are described in the following subsections.

3.1. Bidirectional maximum intensity projection

A maximum intensity projection (MIP) image consists of the maximum values encountered along the plane of projection, i.e., the z-axis. MIP images of thoracic CT scans have proven effective for lung nodule detection (Gruden et al., 2002), and they have been extensively utilized by radiologists to quickly screen lung nodules. MIP images can be generated with various slab thicknesses; however, research has shown that slab thicknesses up to 10 mm are the most effective, and increasing the slab thickness further reduces the detection sensitivity (Zheng et al., 2020). Subsequently, in this work, we utilize MIP images with three slab thicknesses, i.e., 3 mm, 5 mm, and 10 mm. The generation of a MIP image can be described as follows:

I_{x,y} = max_{Δz} D_{x,y,Δz}    (1)

where I_{x,y} and D_{x,y,Δz} represent the generated MIP image and the sub-volume or slab of the CT scan, respectively. Here, Δz denotes the slab thickness, which in our case is set to 3, 5, and 10 mm. Traditionally, MIP is taken in one direction, which helps to distinguish nodules from the vessels penetrating in one direction only. In this work, we take it one step further to improve the visibility of nodules by utilizing


Fig. 1. The illustration of our proposed computer-aided detection system: the multi-encoder based self-distilled network (MEDS-Net), which incorporates three different inputs, i.e., a 3D sub-volume of the CT scan and forward and backward MIP images of three slab thicknesses (i.e., 3, 5 and 10 mm).

the bidirectional MIP. Therefore, the range of Δz is defined differently while generating the forward and backward MIP images, which can be formulated as follows:

Δz = { [φ_c, φ_c + φ_m],  for the forward MIP image
     { [φ_c − φ_m, φ_c],  for the backward MIP image    (2)

where φ_c and φ_m represent the depth of the current slice in the scan and the thickness of the MIP to be generated, i.e., 3, 5, and 10 mm, respectively. The process of generating the bidirectional MIP images, in forward and backward directions, with three slab thicknesses, is visually demonstrated in Fig. 2(a). Fig. 2(b) shows the changes in the appearance of nodules in forward and backward MIP images. Increasing the thickness of the MIP image improves the visibility of vessels, which helps to distinguish the nodule from other non-nodular structures. The forward and backward MIP of the same depth provide different aspects of the nodule. To further elaborate the intuition of introducing bidirectional MIP images, in Fig. 3, backward and forward MIP images of 3 mm thickness with corresponding normal slices are shown for four selected nodules, i.e., (a), (b), (c), and (d). In all the samples, the nodules' appearance in the backward MIP is more distinguishable, whereas, in the forward MIP, nodules appear more connected with vessels, which makes the detection harder. Therefore, incorporating backward MIP images into the proposed MEDS-Net can greatly complement the network by enhancing the nodules' visibility. Moreover, we also normalize the slice thickness along the z-axis of the raw CT scan to 1 mm while keeping the original pixel spacing in the xy-plane for the 3D patch-based input.

3.2. Multi-encoder based self-distilled network

Our proposed multi-encoder-based self-distilled network (MEDS-Net) for the end-to-end lung nodule detection framework is demonstrated in Fig. 1. We summarize its working as follows: (1) first, a 3D sub-volume (stack of slices) consisting of a total of 11 slices (i.e., 11 × 256 × 256) of 1 mm thickness is fed to a dense block, which squeezes the representation into three channels by using dense units; (2) the compressed features from the dense block, along with the forward and backward MIP images, are input into the encoder block, which contains three deep encoders to extract meaningful features from each three-channel input; (3) the extracted features at various levels of the three encoders are concatenated and combined with the corresponding decoder layers via attention units in the decoder block; (4) the decoder block utilizes deconvolutional layers to up-sample the latent features at each stage, which are utilized as gating signals for the attention units. At five levels of the decoder, the up-scaled refined features are input to the distillation block and main detector to produce the detection results; (5) finally, the main detector produces all the nodule candidates, while the distillation block utilizes four auxiliary detectors to reshape the refined multi-scale features to the same dimensions as the input image, i.e., 256 × 256, for acquiring the nodule candidates.

The details of each block of the proposed MEDS-Net are described in the following subsections.

3.2.1. Dense block

To incorporate the slice-level spatial and sequential information that is crucial to detect the small nodules, we input a stack of slices,


Fig. 2. Demonstration of bidirectional MIP images. (a) Process of generating the forward and backward MIP images. (b) Examples of forward and backward MIP of thicknesses
3 mm, 5 mm, and 10 mm (left to right).

called sub-volume, to the proposed architecture. Concretely, the five adjacent slices from both sides of the central slice, i.e., the 3D sub-volume with dimensions 11 × 256 × 256, are extracted and input into a 3D dense block, which is shown in Fig. 4(a). The 3D dense block takes 11 slices of 1 mm thickness as input and transforms them into a meaningful latent representation of the same dimensions as the three MIP images, i.e., 3 × 256 × 256. The 3D dense block is composed of four sets of dense units followed by max-pooling layers, and the fifth dense unit is followed by a reshaping layer, which squeezes the information to the desired dimensions. Fig. 4(b) shows the architecture of each dense unit, which is composed of five sets of 3D convolution, ReLU activation, and batch normalization. The output of each set is propagated to all the subsequent sets using skip connections. After each dense unit, features are down-sampled along the depth; subsequently, at the end of the dense block, we have three channels of feature maps. These feature maps are further fed into the encoder block to extract the underlying insights pertaining to nodule detection.

3.2.2. Encoder block

In Fig. 1, the overall structure of the encoder block is demonstrated, which includes three coding paths, namely, the 3D sub-volume, forward, and backward MIP encoders. All three encoders are kept identical and are inspired by the conventional UNet encoder (Ronneberger et al., 2015). Nevertheless, our encoders are deeper than the encoder in the standard UNet to effectively learn meaningful insights from the lung structure, which are crucial to distinguish the lung nodules. Each encoder consists of six Conv-Blocks, which down-sample the input to lower dimensions while extracting meaningful features. As shown in Fig. 5(a), a Conv-Block has two stages; the first stage has two 2D convolutional layers followed by a batch normalization layer, then a skip connection is used to combine the input with the output features of the first convolution stage. This type of short skip connection tends to stabilize gradient updates (He et al., 2016). In the second stage, the combined features are passed to the same set of layers as the first stage, followed by an additional down-sampling layer. The width of each of our branches is kept small to avoid overfitting. Our first Conv-Block contains eight feature maps, which get doubled after every Conv-Block, and eventually, our final Conv-Block has 256 features. Finally, the output features from the second to sixth Conv-Blocks of all three encoders are concatenated and fed into the decoder block via attention units.

3.2.3. Decoder block

The proposed architecture leverages rich information extracted from three types of inputs, i.e., the 3D sub-volume consisting of 1 mm thick CT slices, and the forward and backward MIP images. This information is extracted from the encoders at multiple scales and passed by skip connections into the decoder block. In the decoder block, we exploit attention gates (Schlemper et al., 2019) to enable the network to focus on the expected nodule regions in the coarse features coming from the encoder block while discarding the redundant ones. The attention units achieve this goal by performing element-wise multiplication of the input feature maps and attention coefficients to highlight important features, as described in Fig. 6. Overall, the decoder block consists of six deconvolution (De-Conv) blocks that up-sample the input features while extracting useful information at different levels. Fig. 5(b) describes the architecture of the De-Conv block, which consists of two stages. In the first stage, two convolution blocks are followed by a batch normalization layer. The input and output features of the first stage are combined using a skip connection and forwarded to stage two. The second stage is similar to the first one, followed by an up-sampling layer. The decoder upscales the dimensions of the features step by step until the final deconvolution block, which has the same feature dimensions as the ground truth mask, i.e., 256 × 256. Further, we extract the outputs of each De-Conv block to feed forward to the distillation block, which utilizes these features to reconstruct the nodule masks.

3.3. Self-distillation mechanism

The proposed framework employs a self-distillation mechanism to improve the training process by using a self-distillation block, which


Fig. 3. The illustration of 3 mm thick backward and forward MIP images with a corresponding normalized slice for four selected nodules. Each row, i.e., (a), (b), (c), and (d),
represents one nodule sample.

has been proven effective in optimizing the network's performance for classification tasks (Zhang et al., 2019). We extract the features from four intermediate De-Conv blocks of the decoder block to pass into the self-distillation block. The self-distillation block consists of four auxiliary detectors that distill the knowledge from the main detector. The main detector is connected with the deepest De-Conv block, therefore having access to the most refined features, which provide enriched insights for lung nodule detection. Each auxiliary detector block shares the same architecture, which is demonstrated in Fig. 5(c). The auxiliary detector first immediately up-scales the features extracted from the decoder block to the same dimensions as the input's height and width, i.e., 256 × 256. The second layer is a 2D convolution layer with ReLU activation, and the final layer is a convolution layer with a single filter followed by softmax activation. Similarly, the main detector architecture, which is shown in Fig. 5(d), has a 2D convolution layer with ReLU activation and a convolution layer with a single filter followed by softmax activation.

During the training process, the main detector is trained with a supervised training scheme by utilizing the ground truth, whereas all the auxiliary detectors are trained as student models via distillation from the main detector. Here, the main detector can be conceptually regarded as the teacher model. To improve the learning ability of the auxiliary or student detectors in the proposed framework, we utilize three types of losses during the training process:

Loss 1: The Dice loss, calculated between the labels of the training data and the outputs of all the detector networks. It provides an equal opportunity for each detector network to leverage the supervision from the actual labels for optimizing its weights.

Loss 2: The KL (Kullback–Leibler) divergence loss under the supervision of the main detector, which acts as the teacher network. We utilize the output of each auxiliary detector, i.e., the softmax output, and the output of the main detector to compute the KL divergence. This loss helps the distillation of knowledge from the main detector to the auxiliary detectors, forcing them to learn to produce similar results as the deepest detector.

Loss 3: The L2 loss from hints (Adriana et al., 2015), computed between the feature maps of the main detector network and each auxiliary detector. This loss enables the auxiliary detectors to distill the implicit knowledge in the feature maps of the main detector, which induces the auxiliary detectors' feature maps to fit the feature maps of the main detector.

3.4. False positive reduction using auxiliary detectors

In contrast to previous studies (e.g., Pezeshk et al. (2018) and Yuan et al. (2021)) on dual-stage techniques, our framework exploits the


Fig. 4. Description of 3D dense block which transforms the 3D sub-volume of normalized thoracic CT scans into three-channel latent representation.

Fig. 6. The architecture of attention unit used in residual connections from encoders to
decoder block. Gating signal and feature maps come from the previous de-conv block
and the concatenated encoder features, respectively.

Fig. 5. The architectural details of convolutional, de-convolutional, and detector blocks are illustrated in (a), (b), and (c), respectively. The architecture of the attention unit is used in residual connections from encoders to decoder blocks; the gating signal and feature maps come from the previous de-conv block and the concatenated encoder features, respectively.

auxiliary detectors of the same network to reduce the false positives at the time of inference. All the nodules detected by the main detector are considered nodule candidates, and the auxiliary detectors complement the detection of the main detector to validate each nodule candidate. Concretely, our main detector provides all the nodule candidates, and we determine the 3D bounding box, by incorporating all the nodular voxels, which acts as a region of proposal (RoP) for each nodule candidate. Similar RoPs of the same dimensions and positions are extracted from the 3D outputs of all four auxiliary detectors to detect the false positives. Given that n is the number of voxels in RoP_i in the ith detector, we can define the true positive determination criteria as follows:

$$isTP = thr\left(\frac{1}{nk}\sum_{i=1}^{k}\sum_{j=1}^{n} RoP_i(j),\; \tau\right) \tag{3}$$

$$thr(\theta, \tau) = \begin{cases} 1, & \theta > \tau \\ 0, & \text{otherwise} \end{cases} \tag{4}$$

In the above equations, k is the number of auxiliary detectors, which in our case is four, and τ is the threshold on the nodule and non-nodule probabilities, while thr(·) is a function that binarizes the probability value θ, which lies in the range of 0 to 1. We normalize the sum of probabilities by nk to limit the accumulated value to 1. We apply the same criteria to all the nodule candidates after inference to remove the false positives.
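The criteria of Eqs. (3) and (4) can be illustrated with a short NumPy sketch; the RoP arrays, the 8-voxel toy size, and τ = 0.5 are illustrative assumptions rather than the paper's implementation:

```python
import numpy as np

def is_true_positive(rops, tau=0.5):
    """Vote across auxiliary detectors (Eqs. (3)-(4)).

    rops: list of k arrays, one per auxiliary detector, each holding the n
    per-voxel nodule probabilities inside the candidate's RoP.
    """
    k = len(rops)                    # number of auxiliary detectors (four here)
    n = rops[0].size                 # voxels per RoP (same box in every detector)
    # Sum all probabilities and normalize by n*k so the score stays in [0, 1].
    theta = sum(np.asarray(r, dtype=float).sum() for r in rops) / (n * k)
    return 1 if theta > tau else 0   # thr(theta, tau)

# Toy check: strong agreement keeps the candidate, weak agreement drops it.
print(is_true_positive([np.full(8, 0.9)] * 4))  # 1
print(is_true_positive([np.full(8, 0.1)] * 4))  # 0
```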


Fig. 7. Description of the steps involved in the lung parenchyma segmentation. (a) Raw thoracic CT image. (b) Mask of the air regions. (c) Mask after the removal of irrelevant
objects other than the lung region. (d) Dilated mask of the lung parenchyma. (e) Image of segmented lung parenchyma.
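A rough, hypothetical sketch of the Fig. 7 steps on a single axial slice is given below; the −320 HU air threshold and the naive 4-neighbour dilation are assumptions for illustration, and the watershed step used in the paper is replaced here by a crude border cleanup:

```python
import numpy as np

def dilate(mask, it=1):
    # Naive 4-neighbour binary dilation (toy stand-in for real morphology ops;
    # np.roll wrap-around is ignored for this small example).
    m = mask.astype(bool)
    for _ in range(it):
        m = m | np.roll(m, 1, 0) | np.roll(m, -1, 0) | np.roll(m, 1, 1) | np.roll(m, -1, 1)
    return m

def segment_lung(ct_slice, air_thresh=-320.0, fill_hu=-1000.0):
    """Rough sketch of the Fig. 7 pipeline on one axial slice of HU values.

    (b) air mask -> (c) drop outside-body air (the paper uses a watershed)
    -> (d) dilate to keep wall-attached lesions -> (e) apply the mask.
    """
    air = ct_slice < air_thresh                # (b) air regions
    lung = air.copy()
    lung[0, :] = lung[-1, :] = False           # (c) crude border cleanup
    lung[:, 0] = lung[:, -1] = False
    lung = dilate(lung, it=2)                  # (d) keep boundary information
    return np.where(lung, ct_slice, fill_hu)   # (e) masked slice

# Toy slice: a -600 HU lung blob inside +40 HU tissue.
ct = np.full((20, 20), 40.0)
ct[5:15, 5:15] = -600.0
out = segment_lung(ct)
```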

4. Experimental setup

4.1. Datasets

In this work, we utilize the lung image database consortium and image database resource initiative (LIDC/IDRI) dataset (Armato et al., 2011) for all the experimentation. The LIDC/IDRI dataset consists of 1018 thoracic computed tomography (CT) scans acquired from 1010 different patients, which are annotated to facilitate computer-aided systems for the assessment of lung nodule detection, classification, and quantification. The resolution of CT scans becomes particularly crucial when the primary goal is the detection of lung nodules, which often require a higher resolution to distinguish and accurately classify. In the LIDC/IDRI dataset, slice thickness varies from 0.6 mm to 5.0 mm. However, scans with a slice thickness greater than 2.5 mm are typically not recommended for lung nodule screening (Kazerooni et al., 2014), as such scans might fail to capture the minute details of smaller nodules, potentially leading to false negatives. We therefore remove all scans with a slice thickness greater than 2.5 mm.

However, while refining our dataset, we are also cognizant of the fact that in real-world settings we often encounter noisy data, which may not always adhere to ideal conditions. It is essential for a computer-aided detection system to be robust to such variabilities. By focusing our study on high-resolution scans, we aim to build a foundation upon which the system's capabilities can be honed. Future iterations and validations of the model could consider noisy and less-than-ideal scans to gauge and further improve the system's robustness.

The LIDC/IDRI dataset is annotated in two steps by four radiologists. Firstly, each radiologist annotated the nodules individually, and later all the radiologists jointly analyzed each annotation to reassess their annotations. For this study, we only utilized nodules having diameters equal to or greater than 3 mm due to their clinical significance (Team, 2011). In order to avoid possible outliers, we only considered the nodules accepted by at least 3 out of 4 radiologists as the reference standard. After applying the aforementioned criteria, 888 scans were left, which were utilized for all the experimentation.

4.2. Data preprocessing

We preprocess all the scans prior to feeding them into the proposed MEDS-Net. Our preprocessing consists of two stages: lung parenchyma segmentation and scan normalization.

In the first stage, we automatically segment the lung parenchyma to exclude irrelevant regions such as clothes, machine objects, tissues, spine, or ribs in the scans. Fig. 7 demonstrates the steps followed to perform lung segmentation. Initially, an air mask is created, and the watershed algorithm (Angulo and Jeulin, 2007) is applied to remove the regions other than the lung, as shown in Fig. 7(b) and (c), respectively. In order to avoid missing wall-attached lesions that might be nodules, we keep more boundary information by binary morphology operations, i.e., closing and dilation. Finally, the binarized scan is masked with the original scan.

In the second stage, we perform the scan normalization. Since the data in the LIDC/IDRI dataset are collected from various CT scanners, we set the window level from −1000 HU to 400 HU and normalized images to the range between 0 and 1. We also normalized the slice thickness to 1 mm and cropped each slice equally from all sides, while keeping the lung at the center, to normalize to 256 × 256 dimensions.

4.3. Training strategy

We divide the whole dataset into ten subsets to perform 10-fold cross-validation. For each fold, the training, validation, and test set splits are set to 80%, 10%, and 10% of the total scans, respectively. The batch size is set to 3 due to the memory limitations of the GPU. We used the Adam optimizer with an initial learning rate of 0.001 and first and second momentum of 0.9 for the decay of the learning rate. We use early stopping with a patience of 10 epochs to avoid overfitting.

This study exploits three types of losses to train the model, as described in Section 3.3, which can be written as

$$\mathcal{L}_{Total} = \mathcal{L}_1 + \mathcal{L}_2 + \mathcal{L}_3 \tag{5}$$

To formulate these losses, we denote our main and auxiliary detectors by θ_m and θ_{i/k}, respectively, where i ∈ [[1, k]] and k is the number of auxiliary detectors.

$$\mathcal{L}_1 = \mathcal{L}_{DSC}(\theta_m, y) + (1 - \alpha) \sum_{i=1}^{k} \mathcal{L}_{DSC}(\theta_{i/k}, y) \tag{6}$$

$$\mathcal{L}_{DSC}(x, y) = 1 - \frac{2\,|x \cap y|}{|x \cup y|} \tag{7}$$

The second loss, the Kullback–Leibler (KL) divergence between each auxiliary detector and the main detector, can be written as:

$$\mathcal{L}_2 = \alpha \sum_{i=1}^{k} KL(\theta_{i/k}, \theta_m) \tag{8}$$

The third supervision comes from the features of the main, deepest detector to each auxiliary detector. Due to the difference in the size of the features inputted to each detector, additional convolution layers in the detector blocks are employed to align the dimensions prior to the extraction of feature maps to calculate the feature loss. The feature loss can be formulated as:

$$\mathcal{L}_3 = \lambda \sum_{i=1}^{k} \left\lVert F_{i/k} - F_m \right\rVert \tag{9}$$

where F_{i/k} and F_m represent the features from the θ_{i/k} and θ_m detectors, while the two hyper-parameters α and λ moderate the role of each loss.

4.4. Evaluation parameters

While comparing different CADe systems for lung nodule detection, it is imperative to focus on metrics that not only quantify performance but also signify clinical relevance. In this section, we elaborate on the evaluation metrics used throughout our manuscript to quantify the performance of our lung nodule detection system. We also elucidate
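The combined objective of Eqs. (5)–(9) can be sketched in NumPy on flattened probability maps. Assumptions: the standard |x| + |y| Dice denominator, an L2 norm for the feature term (the paper describes it as an L2 hint loss), and illustrative values α = 0.5 and λ = 0.1:

```python
import numpy as np

def dice_loss(p, y, eps=1e-6):
    # Eq. (7) on soft predictions, with the standard |x| + |y| denominator.
    return 1.0 - 2.0 * (p * y).sum() / (p.sum() + y.sum() + eps)

def kl_div(p, q, eps=1e-12):
    # KL(p || q) on per-voxel nodule probabilities, clipped for stability.
    p, q = np.clip(p, eps, 1.0), np.clip(q, eps, 1.0)
    return (p * np.log(p / q)).sum()

def total_loss(main_p, aux_ps, main_f, aux_fs, y, alpha=0.5, lam=0.1):
    """Eqs. (5), (6), (8), (9); alpha and lam values are illustrative."""
    l1 = dice_loss(main_p, y) + (1 - alpha) * sum(dice_loss(a, y) for a in aux_ps)
    l2 = alpha * sum(kl_div(a, main_p) for a in aux_ps)            # distillation
    l3 = lam * sum(np.linalg.norm(f - main_f) for f in aux_fs)     # hint loss
    return l1 + l2 + l3

# Sanity check: perfect predictions give (near) zero loss.
y = np.array([1.0, 1.0, 0.0, 0.0])
f = np.ones(3)
print(total_loss(y, [y] * 4, f, [f] * 4, y) < 1e-4)  # True
```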


the importance of our chosen evaluation metrics in the context of the proposed MEDS-Net's efficacy.

4.4.1. Competition Performance Metric (CPM)

The Competition Performance Metric (CPM) is a measure specifically devised for comparing different computer-aided detection systems in a competitive setting. Mathematically, the CPM for a system can be represented as:

$$CPM = \frac{\sum_{i=1}^{N} S_i}{N}$$

where S_i is the sensitivity of the system for the ith test case and N is the total number of test cases. A higher CPM indicates superior performance, denoting that the detection system is both accurate and efficient.

4.4.2. Lung nodule detection sensitivity (%)

Sensitivity, often referred to as the True Positive Rate, quantifies how effectively a system identifies the presence of lung nodules. It is calculated using the formula:

$$Sensitivity(\%) = \frac{TP}{TP + FN} \times 100$$

where TP is the number of true positives (actual nodules correctly identified) and FN is the number of false negatives (actual nodules missed by the system). Sensitivity offers a direct measure of the system's capability to correctly identify nodules, an essential trait for any clinical detection system.

4.4.3. Average number of nodule candidates per scan

This metric gives an average count of potential nodule candidates identified by the system for each scan. While not a direct measure of accuracy, it provides insight into the system's tendency towards being conservative or aggressive in its detection approach. It is given by:

$$Average\ Candidates\ per\ Scan = \frac{\sum_{i=1}^{N} C_i}{N}$$

where C_i represents the number of candidates detected for the ith scan and N is the total number of scans.

4.4.4. False positives per scan

This metric measures the average number of false positive detections per scan, providing an indication of the system's propensity for making erroneous identifications. It can be represented as:

$$FP\ per\ Scan = \frac{\sum_{i=1}^{N} FP_i}{N}$$

where FP_i is the number of false positives for the ith scan and N is the total number of scans. A lower value for this metric is desirable, as it implies fewer incorrect nodule identifications per scan.

The Average Number of Nodule Candidates per Scan and False Positives per Scan jointly provide insight into the system's precision and potential for over-detection. A balanced value indicates that MEDS-Net can effectively differentiate nodules from non-nodular structures without being overly conservative or aggressive.

5. Experimental results and discussion

In this section, we perform a thorough examination of the proposed MEDS-Net framework, focusing on both its overall performance and the findings from our ablation study. We begin in Section 5.1 by juxtaposing the performance of our MEDS-Net with that of previously published studies. Specifically, Section 5.1.1 evaluates our method for detecting lung nodule candidates against existing research, while Section 5.1.2 extends the comparison to assess our framework following the false positive reduction phase. Transitioning to Section 5.2, we direct our attention to a comprehensive ablation study. Here, the efficacy of every individual input and module within MEDS-Net is scrutinized. While Section 5.2.1 narrows its focus to the nodule candidates detection stage, Section 5.2.2 broadens the lens to examine the entirety of the MEDS-Net framework.

Table 2
Performance comparison with other computer-aided detection systems on the LIDC/IDRI dataset at the nodule candidate detection stage.

CADe systems            Sensitivity (%)   Total number of candidates   Average number of candidates per scan
Zheng et al. (2019)         95.40                 18,116                          20.40
Pereira et al. (2021)       98.15                 44,111                          49.67
Wang et al. (2019)          96.80                 53,484                          60.20
Setio et al. (2017)         98.30                754,975                         850.2
Zhao et al. (2022)          94.70                 18,116                          14.10
Our method                  97.80                 19,190                          21.61

Note: The results presented are sourced directly from previously published research papers.

5.1. Overall performance analysis

Typical CADe systems consist of two stages, i.e., candidate detection and false positive reduction, and mostly two separate networks are trained to achieve this goal. Nevertheless, in this study, the proposed framework combines both stages by employing self-distillation during training and utilizing the auxiliary detectors for false positive reduction. For a comprehensive analysis of the proposed CADe system, we compare our performance with previously published methods for the candidate detection stage as well as after the false positive reduction stage. Below, we compare the performance of the proposed MEDS-Net with multiple state-of-the-art studies.

5.1.1. Performance at nodule candidates detection stage

All the nodules detected by the main (deepest) detector of the proposed MEDS-Net are considered nodule candidates. The nodule detection capability of the proposed framework has been compared with recently published deep learning-based individual candidate detection systems. Table 2 summarizes the results of the candidate detection stages of previously published works. Pereira et al. (2021) and Setio et al. (2017) outperform the proposed method in terms of sensitivity. However, in order to localize as many candidates as possible, these CADe systems obtained a several-fold higher number of FPs. Although Zheng et al. (2019) and Zhao et al. (2022) achieved the lowest candidates per scan, these CADe systems had lower sensitivity compared to our proposed scheme. Most importantly, among these two CADe systems, Zheng et al. (2019) obtained a superior sensitivity of 95% by exploiting unidirectional MIP images of various thicknesses, which shows the efficacy of MIP images for lung nodule detection. Zheng et al. (2019) trained four individual 2D networks (i.e., UNet) with unidirectional MIP images to obtain such higher sensitivity. Nevertheless, our framework achieved 97.8% sensitivity with a single self-distillation-based network by leveraging the bidirectional MIP images. The results demonstrate that incorporating bidirectional MIP images in the proposed self-distilled network significantly improves the performance on nodule detection.

5.1.2. Performance after false positives reduction

To provide a comparative analysis of the complete pipeline of the proposed framework, we chose recent DL-based studies which utilized the LIDC/IDRI dataset. We select the competition performance metric (CPM) (Niemeijer et al., 2010) for comparison. Table 3 summarizes the performance of these techniques. In the table, the columns under 'False Positive Per Scan' represent the sensitivity of the CADe systems at various allowed false positive rates. Specifically, for each false positive rate (e.g., 0.5 false positives per scan), the corresponding value illustrates the proportion of actual nodules the system could detect. Therefore, a higher value indicates superior nodule detection capability for that specific false positive allowance. Most of the listed methods are dual-stage, in which nodule candidate detection and false positive reduction are performed with different networks. Among all the listed methods, our proposed method outperforms in terms of CPM, which demonstrates
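The evaluation metrics above reduce to a few lines of Python. One assumption: following common practice (and matching the row averages in Table 3), the CPM here is taken as the mean sensitivity over the seven false-positives-per-scan operating points:

```python
def sensitivity_pct(tp, fn):
    # Sensitivity (%) = TP / (TP + FN) * 100
    return 100.0 * tp / (tp + fn)

def fps_per_scan(fp_counts):
    # Average number of false positives over all scans.
    return sum(fp_counts) / len(fp_counts)

def cpm(sens_at_fp_rates):
    # Mean sensitivity over the operating points (0.125 ... 8 FPs/scan in Table 3).
    return sum(sens_at_fp_rates) / len(sens_at_fp_rates)

# Checks against numbers reported elsewhere in the paper:
print(round(sensitivity_pct(1160, 26), 2))  # 97.81 (main detector, Table 5)
print(round(cpm([0.883, 0.915, 0.928, 0.941, 0.953, 0.962, 0.968]), 3))  # 0.936
```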


Table 3
Complete pipeline performance comparison with other computer-aided detection systems on the LIDC/IDRI dataset.
CADe system Year Type of scheme False positive per scan CPM
0.125 0.25 0.5 1 2 4 8
Wang et al. (2019) 2018 Dual-stage 0.788 0.847 0.895 0.934 0.952 0.959 0.963 0.903
Pezeshk et al. (2018) 2019 Dual-stage 0.637 0.723 0.804 0.865 0.907 0.938 0.952 0.832
Zheng et al. (2019) 2019 Dual-stage 0.876 0.899 0.912 0.927 0.942 0.948 0.953 0.922
Cao et al. (2020) 2020 Dual-stage 0.848 0.899 0.925 0.936 0.949 0.957 0.96 0.925
Li and Fan (2020) 2020 Single-stage 0.60 0.674 0.751 0.824 0.85 0.853 0.859 0.773
Pereira et al. (2021) 2021 Dual-stage 0.812 0.875 0.918 0.949 0.957 0.970 0.976 0.922
Zhou et al. (2022) 2022 Dual-stage 0.742 0.840 0.899 0.925 0.944 0.954 0.959 0.895
Zhao et al. (2022) 2022 Dual-stage 0.656 0.754 0.833 0.917 0.951 0.97 0.977 0.865
Zhu et al. (2022) 2022 Single-stage 0.782 0.834 0.893 0.917 0.932 0.952 0.956 0.895
Mei et al. (2021) 2022 Dual-stage 0.712 0.802 0.865 0.901 0.937 0.946 0.955 0.874
Guo et al. (2021) 2022 Single-stage 0.832 0.886 0.928 0.937 0.946 0.952 0.959 0.92
Our framework 2023 Single-stage 0.883 0.915 0.928 0.941 0.953 0.962 0.968 0.936

Table 4
Summary of results obtained from different configurations of the proposed model on the LIDC/IDRI dataset for the lung nodule detection stage.

Model #  3D Sub-volume  Forward MIP  Backward MIP  Attention  Self-distillation  Sensitivity (%)  Total number of candidates  Candidates per scan
1             ✓             ✗             ✗            ✓             ✓                90.22              42,476                   47.83
2             ✗             ✓             ✗            ✓             ✓                93.84              36,281                   40.86
3             ✗             ✗             ✓            ✓             ✓                94.10              35,958                   40.49
4             ✓             ✓             ✗            ✓             ✓                96.12              24,345                   27.42
5             ✗             ✓             ✓            ✓             ✓                95.78              27,863                   31.38
6             ✓             ✓             ✓            ✗             ✗                92.16              41,954                   47.25
7             ✓             ✓             ✓            ✓             ✗                93.59              30,096                   33.89
8             ✓             ✓             ✓            ✗             ✓                95.45              22,847                   25.73
9             ✓             ✓             ✓            ✓             ✓                97.81              19,190                   21.61

the evident effectiveness of the proposed auxiliary detectors-based false positive reduction scheme. Pereira et al. (2021) had slightly better performance for higher FPs per scan (i.e., ≥1). A possible explanation is that they had utilized a two-stage scheme in which they first exploited a mask region-convolutional neural network (Mask R-CNN) for nodule detection and employed a classifier ensemble based on CT attenuation patterns for false positive reduction, making the technique more complex. Nevertheless, our method performs significantly better for smaller values of FPs per scan (i.e., <1). Similarly, Zheng et al. (2019) obtained better performance than Pereira et al. (2021) for smaller values of FPs per scan (i.e., <1), which can be attributed to the incorporation of MIP images, which enhances the nodule detection capability of networks. Zheng et al. (2019) utilized unidirectional MIP images, whereas our scheme exploits bidirectional MIPs with different levels of thickness to achieve superior results. Most importantly, among single-stage methods, our technique achieved significantly better performance for the whole range of FPs per scan, which proves that incorporating self-distillation greatly assists the optimization of the network to improve the lung nodule detection rate efficiently.

5.2. Ablation study

To demonstrate the effectiveness of each component incorporated into the proposed MEDS-Net architecture, we implemented various versions of the proposed framework. We analyzed results after the nodule detection stage as well as after the false positives reduction stage. The following subsections describe the details of these experiments and the corresponding results.

5.2.1. Analysis of candidates detection stage

We initiate our evaluation by analyzing the contribution of each component of MEDS-Net for the lung nodule detection stage. For this purpose, we implemented eight downgraded versions of MEDS-Net, having different configurations and architecture, to evaluate their lung nodule detection performance. Table 4 summarizes the results obtained from these models for the lung nodule detection stage. First, to fairly analyze the effect of each input individually, models 1 to 3 utilize single-encoder-based architecture which takes only one input among the 3D sub-volume, forward, and backward MIP images, respectively. These networks are equipped with attention units and self-distillation mechanisms for lung nodule detection tasks. The results demonstrate that models 2 and 3, with MIP images of various thicknesses, achieved better sensitivity with a lower number of candidates than the 3D input-based model. This proves that the incorporation of MIP images significantly improves the learning ability of networks, which helps to better distinguish between nodule and non-nodular structures. Meanwhile, models 2 and 3 depict quite similar performance, which shows that forward and backward MIP images provide different yet equal quantities of information, i.e., unidirectional insights, to the network, and their individual effect is the same. Models 4 and 5 are trained with two types of inputs by using dual-encoder-based architecture having attention units and a self-distillation mechanism. Particularly, we combined the 3D sub-volume with forward MIP images and forward MIP images with backward MIP images as inputs to models 4 and 5, respectively. It can be observed that combining the two types of inputs improves the sensitivity and also reduces the number of candidates per scan. However, model 4 performed slightly better than model 5, which can be attributed to the diverse type of inputs that enable the network to leverage the exceptional abilities of MIP images to provide deeper insights and the 3D patch input that improves the localization of nodules.

The visual examination of feature maps offers a deeper understanding of performance differences, and it is paramount to appreciate the advantages brought by the inclusion of forward and backward MIP images in our proposed MEDS-Net. To elucidate this, we delve into the Class-Activation Maps (CAMs) from models 1, 4, and 9, corresponding to single, dual, and triple input configurations, detailed in Table 4. Refer to Fig. 9, which contrasts the CAMs for the said models against the ground truth. Model 1, relying solely on raw 3-D slices, manifests more diffuse feature maps, inadvertently drawing attention to non-nodular regions, leading to increased false positives. In contrast, model 4, assimilating forward MIP images and 3-D raw slices, presents more concentrated feature maps. Yet, it occasionally struggles to discern bronchi and vessels from genuine nodules. Model 9, the cornerstone of our proposal, the MEDS-Net, takes in a trio of inputs: raw 3-D CT slices and both forward and backward MIP images. Evidently, it exhibits the most adeptness, underscoring that the bidirectional MIP images amplify


the MEDS-Net's efficacy in distinguishing nodules from non-nodular areas.

From models 6 to 9, we analyze the effectiveness of incorporating the attention units and self-distillation in our proposed architecture. For this purpose, we fixed our input pipeline and utilized the multi-encoder-based architecture incorporating the 3D sub-volume and bidirectional (forward and backward) MIP images. Among the four versions, in model 6, we observe severe degradation in the sensitivity when attention units and self-distillation are removed. It also immensely increases the number of nodule candidates, which shows that the attention and self-distillation mechanisms significantly contribute to reducing the number of false positives at the candidate detection stage. Concretely, the individual incorporation of attention units and the self-distillation mechanism improves sensitivity by 3.94% and 5.78%, respectively. Most importantly, the self-distillation mechanism greatly reduces the number of candidates, improving the architecture's efficiency. Finally, the best performance is shown by model 9, which represents the proposed MEDS-Net architecture, having attention and a self-distillation mechanism with three types of inputs. The results evidence that the combination of the attention and self-distillation mechanisms greatly increases our architecture's learning ability, enabling it to better distinguish between nodules and non-nodular structures in the complex lung region.

To analyze the impact of self-distillation, we implemented a variant of the proposed architecture without self-distillation. Since MEDS-Net is a deeper network, it is critical to ensure that the network is optimally trained without encountering vanishing gradient problems (Glorot and Bengio, 2010). Subsequently, we analyzed the effect of self-distillation during the training process. The training and validation curves from the deepest detector of both variants are shown in Fig. 10. Conceptually, self-distillation is an extended version of deep supervision, which is proven to be effective in addressing the vanishing gradient problem for segmentation networks (Dou et al., 2016). The results show that the self-distillation mechanism in the proposed framework assists the training process to optimize network learning. It improves the convergence rate and helps optimize the learning by achieving lower training and validation losses. These improvements can be attributed to the supervision of the intermediate layers at multiple levels, which encourages the earlier layers to learn more meaningful and representative features for improved detection of lung nodules.

We further analyze the performance of each detector, including the main and auxiliary detectors, in the proposed MEDS-Net. Table 5 summarizes the results obtained from all the detectors, including the baseline implementation of the multi-encoder-based architecture without self-distillation. An ensemble result is obtained by including all the nodule candidates detected by each detector. The results of those detectors with inferior performance to the baseline model have been highlighted in red. Further, the qualitative results of each detector are illustrated in Fig. 8 on three different slices containing two or more nodules. The results demonstrate that (i) the proposed MEDS-Net has effectively detected almost all the nodules and achieved a sensitivity of 99.41% with its five detectors; (ii) the overall performance of deeper detectors is improved; however, several nodules missed by deeper networks have been detected by the less deep networks, indicating that each detector has detected nodules independently, as also demonstrated in Fig. 8; and (iii) self-distillation plays a crucial role in improving the deepest network, the main detector. Introducing four auxiliary detectors improves the sensitivity of the main detector by 4.08% and significantly reduces the number of false positives per scan.

Fig. 8. Illustration of detected nodule candidates by all four auxiliary detectors and the main detector on three samples, i.e., (a), (b), and (c). Bounding boxes in green, yellow, and red colors represent the gold standard, true positives, and false positives, respectively.

5.2.2. Analysis after false positive reduction

In this section, we analyze the effect of each input incorporated in the proposed MEDS-Net architecture on the complete pipeline. Concretely, the MEDS-Net exploits bidirectional MIP images along with the 3D patch to detect the lung nodules; therefore, it is important to analyze the impact of bidirectional MIP images after false positive reduction. We implemented single-encoder and dual-encoder versions of the proposed architecture with a complete pipeline that includes the false positive reduction stage. We implemented three single-encoder-based variants that take one input at a time and trained these models with the 3D patch, forward, and backward image inputs individually. Similarly, we implemented two dual-encoder-based variants, i.e., one with bidirectional MIP image inputs and another with the 3D patch and forward MIP images. All the variants are implemented with a self-distillation mechanism and followed by the same pipeline. To compare the performance of these variant models with the proposed MEDS-Net for nodule detection, we analyze the free-response receiver operating characteristic (FROC) (Bandos et al., 2009) curves of each variant shown in Fig. 11. The single-encoder-based architecture with 3D patch input has the lowest performance, and it becomes worse when we reduce the number of false positives per scan. Since the 3D patch of


Fig. 9. Visualization of CAMs for diverse input configurations in the MEDS-Net. Each row showcases an input image demarcated by the ground truth, succeeded by CAMs from
models #1, #4, and #9.

Fig. 10. Comparison of the training and validation curves of the proposed multi-encoder-based network with and without the self-distillation mechanism.

scan consists of 11 slices of 1 mm thickness (i.e., a depth of 10 mm), which is insufficient to extract detailed insights of the nodule within its complex surroundings in the lung, it is challenging to detect nodules without false positives. Nodules bigger than 10 mm appear similar to vessels, and small vessels have a quite similar structural appearance to nodules, which confuses the network and results in poor performance. Since, conceptually, both directions of MIP images contain the same amount of information, the single-encoder versions with forward and backward MIP images demonstrate similar, though better, performance compared to the 3D patch variant. MIP image-based models are able to achieve almost 80% sensitivity even with 0.125 FPs per scan, which demonstrates the effectiveness of MIP images for lung nodule detection.

We also compared dual-encoder-based architectures, for which we implemented two combinations. At first, we utilized bidirectional MIP images and achieved significantly improved performance, which can be attributed to the fact that MIP images enhance the visibility of lung nodules, making it easier for the network to distinguish them


Fig. 11. Free-response receiver operating characteristic (FROC) curves of various versions of proposed MEDS-Net architecture along with MEDS-Net results on the LIDC/IDRI
database.
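A FROC curve such as Fig. 11 plots sensitivity against false positives per scan while sweeping the candidate-score threshold. A minimal sketch, assuming a pooled list of scored candidates with known true/false labels and at most one candidate per ground-truth nodule:

```python
def froc_points(candidates, n_nodules, n_scans, thresholds):
    """candidates: (score, is_true_positive) pairs pooled over all scans."""
    pts = []
    for t in sorted(thresholds, reverse=True):
        kept = [hit for score, hit in candidates if score >= t]
        tps = sum(kept)                 # candidates matching a real nodule
        fps = len(kept) - tps           # remaining candidates are false positives
        pts.append((fps / n_scans, tps / n_nodules))  # (FPs/scan, sensitivity)
    return pts

# Toy example: 4 candidates pooled over 2 scans containing 2 nodules.
cands = [(0.9, True), (0.8, False), (0.7, True), (0.2, False)]
print(froc_points(cands, n_nodules=2, n_scans=2, thresholds=[0.5]))  # [(0.5, 1.0)]
```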

Table 5
Performance comparison of each detector of the proposed MEDS-Net for lung nodule detection on different sizes of nodules.
Total number of scans: 888. Total number of nodules: 1186 (3–10 mm: 905, 10–20 mm: 231, ≥20 mm: 50).

Feature level                                  3–10 mm  10–20 mm  ≥20 mm  Total detected  Sensitivity (%)  False positives (FPs)  FPs per scan
Multi-encoder net (without self-distillation)    862       202       50        1114            93.93              28,910              32.56
Auxiliary detector (1/4)                         841       195       46        1082            91.23              30,226              34.04
Auxiliary detector (2/4)                         857       198       48        1103            93.00              26,954              30.35
Auxiliary detector (3/4)                         856       205       50        1111            93.68              22,047              24.83
Auxiliary detector (4/4)                         877       212       50        1139            96.04              19,338              21.78
Main detector                                    889       221       50        1160            97.81              18,004              20.27
Ensemble                                         901       228       50        1179            99.41              36,780              41.42

from other anatomical structures. Most importantly, bidirectional MIP images cover a reasonable depth of the scan, i.e., 30 mm, which enables the network to better explore the insights of each suspected nodule region. However, MIP images have certain limitations in representing the information, as they rely on maximum intensities, suppressing the information in the intermediate intensities. Therefore, in clinical practice, radiologists examine the nodules in the raw CT slices after the initial screening in MIP images. To this end, we also implemented a dual-encoder-based variant with a 3D patch and forward MIP images. The results demonstrate that the combination of a 1 mm thick stack of slices with unidirectional MIP images is more effective than bidirectional MIPs, as it equips the network with rich information that contains the high-level 3D aspect in the form of the MIP as well as low-level, 2D slice-level information that helps to capture the subtle variations and precisely locate the small nodules. Finally, combining raw 3D patches of the scan with bidirectional MIP images in MEDS-Net depicts the optimal and most consistent performance for lung nodule detection.

To assess the effectiveness of the proposed auxiliary false positive reduction technique, we implemented a dual-stage variant of our framework. In this scheme, MEDS-Net remains consistent for detecting nodule candidates, as in the single-stage approach. However, for false positive reduction, two distinct 3-D CNNs are used that are fed with volume sizes of 32 × 32 × 32 and 16 × 16 × 16 voxels, analogous to the method in Zheng et al. (2019). All nodule candidates identified by MEDS-Net are used to train both 3-D CNNs, which then compute the likelihood of a nodule being present. Table 6 summarizes the performance metrics: false positive reduction (evaluated via the CPM score) and computational efficiency (measured in terms of the number of networks involved, the total count of network parameters, cumulative training time, and overall inference duration). The findings suggest that our proposed technique not only excels over traditional false positive reduction methods but also showcases computational efficiency. The shorter training and inference times are especially crucial for real-time clinical scenarios. The noteworthy performance in reducing false positives can be attributed to the enhanced learning capability of the auxiliary branches in MEDS-Net, a result of shared weights, which bolsters nodule recognition. Given that our method requires no additional networks, only one network is trained, leading to a reduction in the number of network parameters. This translates to less time being needed for both training and inference, emphasizing the proposed scheme's suitability for clinical implementation.

6. Conclusions and future works

This work proposes a novel multi-encoder-based self-distillation network (MEDS-Net) that leverages bidirectional maximum intensity projection (MIP) images for a lung nodule detection system. Particularly, along with a 3D patch of the scan, the proposed framework utilized forward and backward MIP images of 3, 5, and 10 mm thickness as input to improve the network learning. Most importantly, unlike

13
M. Usman et al. Engineering Applications of Artificial Intelligence 129 (2024) 107597

Table 6
Performance and computational comparison of single-stage and dual-stage versions of proposed MEDS-Net for lung nodule
detection.
Framework details CPM Total No. of Total number Total training Total inference
networks trained of parameters time (h) time (s)
MEDS-Net (Single-stage 0.936 1 510,566 3.416 25
architecture)
MEDS-Net (Dual-stage 0.928 3 723,301 4.666 38
architecture)

conventional computer-aided detection (CADe) systems, the study exploited the auxiliary detectors of the same network to reduce the false positives, which avoids the extra computational costs caused by separate false positive reduction networks. The key highlights are as follows:

1. The framework has been comprehensively evaluated on the widely used LIDC/IDRI public dataset. The results show that the proposed MEDS-Net has excellent detection performance for diverse nodule morphologies and is able to distinguish false-positive nodules accurately. The CPM score of the proposed method reaches 0.936, superior to the most advanced competitive networks.
2. MEDS-Net exploits multi-scale feature learning by using intermediate auxiliary output branches, originating from various levels of the decoder block, to minimize false positives.
3. The experiments have shown that employing self-distillation significantly improves the learning process by supervising the intermediate layers of the decoder block. Ablation experiments have also demonstrated that all the components of the proposed MEDS-Net are chosen carefully for effective lung nodule detection.

The exceptional nodule detection capability of our proposed CADe system holds promise not only in assisting radiologists with swift lung nodule diagnosis but also in laying the groundwork for a holistic, automated lung diagnosis system capable of both nodule segmentation and classification based on the severity and texture of lung nodules. Although the study effectively emulated clinical workflows via the MEDS-Net architecture, a clear limitation is the framework's interpretability, a critical facet in the realm of medical imaging. Moreover, the validation of our framework was predominantly anchored on the benchmark LIDC/IDRI dataset; to ascertain its broad applicability and robustness, it is imperative to deploy and evaluate it in real-time clinical scenarios. As a forward-looking measure, our subsequent endeavors aim to bolster the framework's interpretability, potentially through algorithm unrolling techniques, and to undertake comprehensive evaluations in real-world clinical settings.

CRediT authorship contribution statement

Muhammad Usman: Conceptualization, Methodology, Software, Validation, Formal analysis, Writing – original draft. Azka Rehman: Software, Writing – original draft, Investigation, Validation. Abdullah Shahid: Software, Investigation, Visualization. Siddique Latif: Formal analysis, Writing – review & editing. Yeong-Gil Shin: Validation, Supervision.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

The authors do not have permission to share data.

References

Adriana, R., Nicolas, B., Ebrahimi, K.S., Antoine, C., Carlo, G., Yoshua, B., 2015. FitNets: Hints for thin deep nets. In: Proc. ICLR, Vol. 2.
Angulo, J., Jeulin, D., 2007. Stochastic watershed segmentation. In: ISMM (1). pp. 265–276.
Armato, III, S.G., McLennan, G., Bidaut, L., McNitt-Gray, M.F., Meyer, C.R., Reeves, A.P., Zhao, B., Aberle, D.R., Henschke, C.I., Hoffman, E.A., et al., 2011. The lung image database consortium (LIDC) and image database resource initiative (IDRI): a completed reference database of lung nodules on CT scans. Med. Phys. 38 (2), 915–931.
Baldwin, D.R., 2015. Prediction of risk of lung cancer in populations and in pulmonary nodules: significant progress to drive changes in paradigms. pp. 1–3.
Bandos, A.I., Rockette, H.E., Song, T., Gur, D., 2009. Area under the free-response ROC curve (FROC) and a related summary index. Biometrics 65 (1), 247–256.
Cao, H., Liu, H., Song, E., Ma, G., Xu, X., Jin, R., Liu, T., Hung, C.-C., 2020. A two-stage convolutional neural networks for lung nodule detection. IEEE J. Biomed. Health Inform. 24 (7), 2006–2015.
Dong, H., Chen, Z., Zhao, J., Yuan, M., Yu, F., Zhang, J., Zhang, L., 2022. Abdominal organ segmentation via self training.
Dou, Q., Chen, H., Jin, Y., Yu, L., Qin, J., Heng, P.-A., 2016. 3D deeply supervised network for automatic liver segmentation from CT volumes. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, pp. 149–157.
Glorot, X., Bengio, Y., 2010. Understanding the difficulty of training deep feedforward neural networks. In: Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics. JMLR Workshop and Conference Proceedings, pp. 249–256.
Gruden, J.F., Ouanounou, S., Tigges, S., Norris, S.D., Klausner, T.S., 2002. Incremental benefit of maximum-intensity-projection images on observer detection of small pulmonary nodules revealed by multidetector CT. Am. J. Roentgenol. 179 (1), 149–157.
Guleryuz Kizil, P., Hekimoglu, K., Coskun, M., Akcay, S., 2020. Diagnostic importance of maximum intensity projection technique in the identification of small pulmonary nodules with computed tomography.
Guo, Z., Zhao, L., Yuan, J., Yu, H., 2021. MSANet: Multiscale aggregation network integrating spatial and channel information for lung nodule detection. IEEE J. Biomed. Health Inform. 26 (6), 2547–2558.
Halder, A., Dey, D., Sadhu, A.K., 2020. Lung nodule detection from feature engineering to deep learning in thoracic CT images: a comprehensive review. J. Digit. Imaging 33 (3), 655–677.
He, K., Zhang, X., Ren, S., Sun, J., 2016. Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 770–778.
Hinton, G., Vinyals, O., Dean, J., 2015. Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531.
Hui, Z., Wang, X., Gao, X., 2018. Fast and accurate single image super-resolution via information distillation network. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 723–731.
Jabeen, N., Qureshi, R., Sattar, A., Baloch, M., 2019. Diagnostic accuracy of maximum intensity projection in diagnosis of malignant pulmonary nodules. Cureus 11 (11).
Jiang, L., Zhou, W., Li, H., 2019. Knowledge distillation with category-aware attention and discriminant logit losses. In: 2019 IEEE International Conference on Multimedia and Expo (ICME). IEEE, pp. 1792–1797.
Kazerooni, E.A., Austin, J.H., Black, W.C., Dyer, D.S., Hazelton, T.R., Leung, A.N., McNitt-Gray, M.F., Munden, R.F., Pipavath, S., 2014. ACR–STR practice parameter for the performance and reporting of lung cancer screening thoracic computed tomography (CT): 2014 (resolution 4). J. Thorac. Imaging 29 (5), 310–316.
Li, Y., Fan, Y., 2020. DeepSEED: 3D squeeze-and-excitation encoder-decoder convolutional neural networks for pulmonary nodule detection. In: 2020 IEEE 17th International Symposium on Biomedical Imaging (ISBI). IEEE, pp. 1866–1869.
Li, Q., Jin, S., Yan, J., 2017. Mimicking very efficient network for object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 6356–6364.
Liu, L., Ouyang, W., Wang, X., Fieguth, P., Chen, J., Liu, X., Pietikäinen, M., 2020. Deep learning for generic object detection: A survey. Int. J. Comput. Vis. 128, 261–318.
Luan, Y., Zhao, H., Yang, Z., Dai, Y., 2019. MSD: Multi-self-distillation learning via multi-classifiers within deep neural networks. arXiv preprint arXiv:1911.09418.

Luo, X., Song, T., Wang, G., Chen, J., Chen, Y., Li, K., Metaxas, D.N., Zhang, S., 2022. SCPM-Net: An anchor-free 3D lung nodule detection network using sphere representation and center points matching. Med. Image Anal. 75, 102287.
Masood, A., Sheng, B., Yang, P., Li, P., Li, H., Kim, J., Feng, D.D., 2020. Automated decision support system for lung cancer detection and classification via enhanced RFCN with multilayer fusion RPN. IEEE Trans. Ind. Inform. 16 (12), 7791–7801.
Mei, J., Cheng, M.-M., Xu, G., Wan, L.-R., Zhang, H., 2021. SANet: A slice-aware network for pulmonary nodule detection. IEEE Trans. Pattern Anal. Mach. Intell.
Monkam, P., Qi, S., Ma, H., Gao, W., Yao, Y., Qian, W., 2019. Detection and classification of pulmonary nodules using convolutional neural networks: a survey. IEEE Access 7, 78075–78091.
Ni, Z.-L., Zhou, X.-H., Wang, G.-A., Yue, W.-Q., Li, Z., Bian, G.-B., Hou, Z.-G., 2022. SurgiNet: Pyramid attention aggregation and class-wise self-distillation for surgical instrument segmentation. Med. Image Anal. 76, 102310.
Niemeijer, M., Loog, M., Abramoff, M.D., Viergever, M.A., Prokop, M., van Ginneken, B., 2010. On combining computer-aided detection systems. IEEE Trans. Med. Imaging 30 (2), 215–223.
Pereira, F.R., De Andrade, J.M.C., Escuissato, D.L., De Oliveira, L.F., 2021. Classifier ensemble based on computed tomography attenuation patterns for computer-aided detection system. IEEE Access 9, 123134–123145.
Pezeshk, A., Hamidian, S., Petrick, N., Sahiner, B., 2018. 3-D convolutional neural networks for automatic detection of pulmonary nodules in chest CT. IEEE J. Biomed. Health Inform. 23 (5), 2080–2090.
Phuong, M., Lampert, C.H., 2019. Distillation-based training for multi-exit architectures. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 1355–1364.
Qin, D., Bu, J.-J., Liu, Z., Shen, X., Zhou, S., Gu, J.-J., Wang, Z.-H., Wu, L., Dai, H.-F., 2021. Efficient medical image segmentation based on knowledge distillation. IEEE Trans. Med. Imaging 40 (12), 3820–3831.
Ronneberger, O., Fischer, P., Brox, T., 2015. U-Net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, pp. 234–241.
Schlemper, J., Oktay, O., Schaap, M., Heinrich, M., Kainz, B., Glocker, B., Rueckert, D., 2019. Attention gated networks: Learning to leverage salient regions in medical images. Med. Image Anal. 53, 197–207.
Seemann, M., Staebler, A., Beinert, T., Dienemann, H., Obst, B., Matzko, M., Pistitsch, C., Reiser, M., 1999. Usefulness of morphological characteristics for the differentiation of benign from malignant solitary pulmonary lesions using HRCT. Eur. Radiol. 9 (3), 409–417.
Setio, A.A.A., Traverso, A., De Bel, T., Berens, M.S., Van Den Bogaard, C., Cerello, P., Chen, H., Dou, Q., Fantacci, M.E., Geurts, B., et al., 2017. Validation, comparison, and combination of algorithms for automatic detection of pulmonary nodules in computed tomography images: the LUNA16 challenge. Med. Image Anal. 42, 1–13.
Sluimer, I., Schilham, A., Prokop, M., Van Ginneken, B., 2006. Computer analysis of computed tomography scans of the lung: a survey. IEEE Trans. Med. Imaging 25 (4), 385–405.
Sung, H., Ferlay, J., Siegel, R.L., Laversanne, M., Soerjomataram, I., Jemal, A., Bray, F., 2021. Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA: Cancer J. Clin. 71 (3), 209–249.
Team, N.L.S.T.R., 2011. Reduced lung-cancer mortality with low-dose computed tomographic screening. N. Engl. J. Med. 365 (5), 395–409.
Tung, F., Mori, G., 2019. Similarity-preserving knowledge distillation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 1365–1374.
Wallis, J.W., Miller, T.R., Lerner, C.A., Kleerup, E.C., 1989. Three-dimensional display in nuclear medicine. IEEE Trans. Med. Imaging 8 (4), 297–303.
Wang, J., Wang, J., Wen, Y., Lu, H., Niu, T., Pan, J., Qian, D., 2019. Pulmonary nodule detection in volumetric chest CT scans using CNNs-based nodule-size-adaptive detection and classification. IEEE Access 7, 46033–46044.
Wang, X., Zheng, Y., Zhang, X., Cao, T., Li, C., Wang, X., Wang, Y., Fang, Z., Wang, Y., 2022. Self-knowledge distillation for the object segmentation based on atrous spatial pyramid. In: Journal of Physics: Conference Series, Vol. 2294, no. 1. IOP Publishing, 012023.
Wu, Z., Zhou, Q., Wang, F., 2021. Coarse-to-fine lung nodule segmentation in CT images with image enhancement and dual-branch network. IEEE Access 9, 7255–7262.
Xie, L., Cai, W., Gao, Y., 2022. DMCGNet: A novel network for medical image segmentation with dense self-mimic and channel grouping mechanism. IEEE J. Biomed. Health Inform. 26 (10), 5013–5024.
Xu, G., Liu, Z., Li, X., Loy, C.C., 2020a. Knowledge distillation meets self-supervision. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part IX. Springer, pp. 588–604.
Xu, K., Rui, L., Li, Y., Gu, L., 2020b. Feature normalized knowledge distillation for image classification. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XXV. Springer, pp. 664–680.
Yuan, H., Wu, Y., Cheng, J., Fan, Z., Zeng, Z., 2021. Pulmonary nodule detection using 3-D residual U-net oriented context-guided attention and multi-branch classification network. IEEE Access 10, 82–98.
Zhang, L., Song, J., Gao, A., Chen, J., Bao, C., Ma, K., 2019. Be your own teacher: Improve the performance of convolutional neural networks via self distillation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 3713–3722.
Zhang, J., Xia, Y., Cui, H., Zhang, Y., 2018. Pulmonary nodule detection in medical images: A survey. Biomed. Signal Process. Control 43, 138–147.
Zhao, D., Liu, Y., Yin, H., Wang, Z., 2022. An attentive and adaptive 3D CNN for automatic pulmonary nodule detection in CT image. Expert Syst. Appl. 118672.
Zheng, S., Cui, X., Vonder, M., Veldhuis, R.N., Ye, Z., Vliegenthart, R., Oudkerk, M., van Ooijen, P.M., 2020. Deep learning-based pulmonary nodule detection: Effect of slab thickness in maximum intensity projections at the nodule candidate detection stage. Comput. Methods Programs Biomed. 196, 105620.
Zheng, S., Guo, J., Cui, X., Veldhuis, R.N., Oudkerk, M., Van Ooijen, P.M., 2019. Automatic pulmonary nodule detection in CT scans using convolutional neural networks based on maximum intensity projection. IEEE Trans. Med. Imaging 39 (3), 797–805.
Zhou, Z., Gou, F., Tan, Y., Wu, J., 2022. A cascaded multi-stage framework for automatic detection and segmentation of pulmonary nodules in developing countries. IEEE J. Biomed. Health Inform.
Zhu, X., Wang, X., Shi, Y., Ren, S., Wang, W., 2022. Channel-wise attention mechanism in the 3D convolutional network for lung nodule detection. Electronics 11 (10), 1600.
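For readers reproducing the CPM figures reported in Table 6: the CPM is the average FROC sensitivity at seven false-positive rates (1/8, 1/4, 1/2, 1, 2, 4, and 8 FPs per scan; see Bandos et al., 2009, and the LUNA16 protocol in Setio et al., 2017). The sketch below is illustrative only, not the paper's evaluation code: the pooled `(confidence, is_true_positive)` candidate format is an assumption, and each true nodule is assumed to be matched by at most one candidate.

```python
def cpm_score(candidates, n_scans, n_nodules):
    """Competition Performance Metric: mean FROC sensitivity at
    1/8, 1/4, 1/2, 1, 2, 4 and 8 false positives per scan.

    candidates: iterable of (confidence, is_true_positive) pairs pooled
    over the whole test set; each true nodule is assumed to be matched
    by at most one candidate (illustrative simplification).
    """
    # Sweep the confidence threshold from high to low, accumulating
    # sensitivity and false positives per scan along the FROC curve.
    sens, fps = [], []
    tp = fp = 0
    for conf, is_tp in sorted(candidates, key=lambda c: -c[0]):
        if is_tp:
            tp += 1
        else:
            fp += 1
        sens.append(tp / n_nodules)
        fps.append(fp / n_scans)
    # Highest sensitivity reached at or below each of the seven
    # operating points; 0.0 if the curve never gets that far left.
    points = (1 / 8, 1 / 4, 1 / 2, 1, 2, 4, 8)
    s_at = [max((s for s, f in zip(sens, fps) if f <= p), default=0.0)
            for p in points]
    return sum(s_at) / len(s_at)
```

Note that the official LUNA16 evaluation additionally reports bootstrapped confidence intervals over scans; the simple threshold sweep above only yields the point estimate.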
