Unsupervised Desmoking of Laparoscopy Images Using Multi-Scale Desmokenet
1 Introduction
In laparoscopic surgery, artefacts such as specular reflections, surgical smoke [2], blood and inadequate illumination degrade the image quality, which reduces the effectiveness of computer vision algorithms that assist with tasks such as tracking and detection [14]. Robust performance of tracking and detection algorithms cannot be realized without removing these artefacts
c Springer Nature Switzerland AG 2020
J. Blanc-Talon et al. (Eds.): ACIVS 2020, LNCS 12002, pp. 1–12, 2020.
https://doi.org/10.1007/978-3-030-40605-9_37
from the images. This study focuses on the smoke generated during laparoscopic surgery, which obscures the surgeon's operative field and impairs the ability to carry out various procedures, especially in image-guided surgical systems. Hence, smoke removal is essential to realize the full potential and benefits of laparoscopic surgery. Hardware-based techniques [23] and other computer vision methods [13,24] have been developed to remove the smoke, but each is bound by certain limitations. A digital solution can serve as a useful tool to better visualize the operative field and produce high-quality images for the computer vision pipeline of the system. In recent times, advancements in computer vision have been greatly influenced by deep learning. New networks, objective functions and training strategies have enabled impressive results and breakthroughs on problems that were difficult to solve with traditional, non-learning-based approaches. Further, Goodfellow et al. formulated Generative Adversarial Networks (GANs) [8], which have driven significant progress in image synthesis. The competitive learning between the generator and discriminator networks drives GANs to realize an implicit loss function that helps produce images that are sharper and of higher perceptual quality. Both networks learn the distribution of the training images: the generator tries to produce fake images, while the discriminator tries to distinguish the fake images from the real images of the training dataset. The two engage in a min-max optimization, i.e. the generator minimizes the discriminator's ability to distinguish fake images from real ones, while the discriminator maximizes its ability to identify real images. GANs have also demonstrated outstanding results on the task of image-to-image translation, which requires either paired or unpaired data.
Previously, Isola et al. proposed the Pix2Pix framework [10], which uses a conditional generative adversarial network to learn a mapping between input and output images from paired data. Tasks such as generating photographs from semantic layouts [11], sketches [21] and edges follow a similar approach. Extending Pix2Pix, CycleGAN [31] tackles image translation with unpaired data. It learns a mapping between two data domains, X and Y, rather than between individual image pairs. This makes the approach unsupervised and allows image generation even in the absence of ground truth. Similar work can be seen in UNIT [17], DiscoGAN [12] and DualGAN [30].
In the present study, we focus on smoke removal as single-image desmoking that translates a smoke image into an image resembling smoke-free images. The translation from the smoke domain to the smoke-free domain considerably reduces the smoke component and enhances the image. The main contributions of the paper are as follows:
1. An unsupervised image-to-image translation framework for single-image desmoking of laparoscopic images.
2. A new generator architecture built from multi-scale feature extraction (MSFE) blocks. The MSFE blocks perform robust feature extraction and help capture the smoke features at multiple scales at each encoder level.
3. A new loss function, called the structure-consistency loss. This loss ensures that the structure in the image is maintained through the translation framework, resulting in desmoked images whose structure and edges are identical to those of the smoke image.
2 Related Work
Previously, Tchaka et al. [24] applied the dark channel prior (DCP) dehazing method [9] to smoke removal. They improved upon DCP with two modifications: decreasing the emphasis on pixel values in the range where smoke is expected to be present, and thresholding the dark channel by a constant value. They then performed histogram equalization to enhance contrast. In [13], Kotwal et al. framed the problem as Bayesian inference and performed joint de-smoking and de-noising using a probabilistic graphical model, where the uncorrupted image is represented as a Markov random field (MRF) and maximum-a-posteriori (MAP) estimation is used to obtain the enhanced images. The method was extended in [1] to perform specular removal along with de-smoking and de-noising. Earlier desmoking methods also applied solutions from the related problem of dehazing. The
mathematical model for atmospheric scattering is given as

I(x) = J(x) t(x) + A (1 − t(x)),  (1)

where I(x) is the hazy image, J(x) the haze-free image, t(x) the transmission map and A the global atmospheric light, at each pixel coordinate x.
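Eq. 1 is applied independently at each pixel. The plain-Python sketch below composes a hazy intensity from Eq. 1 and inverts it to recover the scene radiance; the clamp on t(x) is a common safeguard against noise amplification, not something stated in this paper:

```python
def atmospheric_scattering(J, t, A):
    """Compose a hazy intensity I(x) from the haze-free intensity J(x),
    transmission t(x) and global atmospheric light A (Eq. 1)."""
    return J * t + A * (1.0 - t)

def recover_scene(I, t, A, t_min=0.1):
    """Invert Eq. 1 to estimate the haze-free intensity J(x).
    t is clamped to t_min so dense smoke does not amplify noise."""
    return (I - A) / max(t, t_min) + A
```

For example, a scene intensity of 0.4 seen through transmission 0.5 under atmospheric light 1.0 appears as 0.7, and inverting the model recovers 0.4.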
Wang et al. [27] devised a variational method that estimates the smoke veil for smoke removal, assuming the smoke component has low contrast and low inter-channel differences. The method yielded considerable image enhancement and higher visual quality. Luo et al. [18] introduced a visibility-driven defogging framework that recovers the atmospheric veil using a bilateral grid and obtains the defogged image via Eq. 1. That work also proposed no-reference image quality assessment metrics to quantify the naturalness and sharpness of an image. Recently, Bolkar et al. produced a synthetic smoke image dataset using Perlin noise and applied transfer learning with the AOD-Net model for smoke removal [4]. In [28], Wang et al. trained an encoder-decoder architecture with Laplacian image
3 Method
This section describes the proposed method for single-image desmoking of laparoscopic images. Figure 1 gives an overview of the image-to-image translation framework. It consists of two generator networks, G_DS and G_S, that generate synthetic desmoked and smoke images respectively, and two discriminator networks, D_DS and D_S, that distinguish synthesized desmoked images from real smoke-free images and synthesized smoke images from real smoke images. The two mapping functions are desmoking (DS) and re-smoking (RS), which map smoke to desmoked images and smoke-free to smoke images respectively. In addition to the adversarial and cycle-consistency losses, we include a structure-consistency loss, which maintains and preserves structure and edge information during the mapping operations.
To ensure that structure is preserved between the images of the mapping operations, we include the structure-consistency loss. Such a loss is essential for reconstructing structure and edges similar to those of the input images. We use Canny edge detection to obtain the edge information in an image. The image gradients of the input image serve as the ground truth, and through an L1 norm we drive the image gradients of the translated images to be similar to those of the input image. This forces the generator networks to produce images that resemble the input images in both edge and structural information, and also helps reduce artefacts and distortions. The structure-consistency loss is given as

L_struct = α1 L_SF + α2 L_SB,
where L_SF and L_SB are the losses for the forward and backward translations respectively. The weights α1 and α2 are set to 0.5 and 1 respectively to control the amount of edge information reconstructed. α1 is set lower than α2 because the edge information of the input image only approximately resembles that of the translated image, while the edge information of the reconstructed input image should match the input image exactly.
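As a rough sketch of this loss, the snippet below substitutes simple finite-difference gradients for the Canny detector and combines the forward and backward terms with the α weights; the exact form of the combination is an assumption based on the description above, not the paper's stated formula:

```python
import numpy as np

def edge_map(img):
    """Approximate edge magnitude with horizontal/vertical intensity
    differences (a stand-in for the Canny detector used in the paper)."""
    gx = np.abs(np.diff(img, axis=1, prepend=img[:, :1]))
    gy = np.abs(np.diff(img, axis=0, prepend=img[:1, :]))
    return gx + gy

def structure_consistency_loss(x, x_translated, x_reconstructed,
                               alpha1=0.5, alpha2=1.0):
    """L1 distance between edge maps of the input and of the translated
    / reconstructed images, weighted alpha1 < alpha2 as in the text."""
    e = edge_map(x)
    l_sf = np.mean(np.abs(edge_map(x_translated) - e))     # forward term
    l_sb = np.mean(np.abs(edge_map(x_reconstructed) - e))  # backward term
    return alpha1 * l_sf + alpha2 * l_sb
```

When translated and reconstructed images match the input exactly, the loss is zero, as expected of a consistency term.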
propose a similar technique for the task of desmoking. Our multi-scale feature extraction blocks use different kernel sizes. The input feature map passes sequentially through a conv 3 × 3, conv 5 × 5, conv 7 × 7 branch with increasing kernel sizes, and through a conv 7 × 7, conv 5 × 5, conv 3 × 3 branch with decreasing kernel sizes. This effectively varies the receptive field and captures distinctive features from the input feature map. The output feature maps from each convolution operation are concatenated, as depicted in Fig. 2. The concatenated blocks are further convolved with particular kernel sizes, and the resulting feature maps are concatenated and then convolved with a 1 × 1 kernel to maintain the desired channel dimension. If the channel dimension of the input feature map is M, then the channel dimension after every convolution operation is kept at M. We also include residual learning to make learning more efficient and to let information flow from the input layer to the output layer; this is achieved with a skip connection and element-wise addition.
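The wiring of an MSFE block can be sketched as follows. The `conv` stub only mimics the shape behaviour of a same-padded convolution (fixed weights, no learning), and the arrangement of the intermediate convolutions is simplified relative to Fig. 2; the channel count M and the residual skip follow the description above:

```python
import numpy as np

def conv(x, ksize, out_ch):
    """Toy same-padded convolution stand-in: averages over a ksize
    window per channel, then projects to out_ch channels with a fixed
    linear map.  Shapes match a real conv; weights are not learned."""
    h, w, c = x.shape
    pad = ksize // 2
    xp = np.pad(x, ((pad, pad), (pad, pad), (0, 0)), mode="edge")
    pooled = np.zeros_like(x)
    for i in range(ksize):
        for j in range(ksize):
            pooled += xp[i:i + h, j:j + w, :]
    pooled /= ksize * ksize
    proj = np.ones((c, out_ch)) / c  # fixed projection weights
    return pooled @ proj

def msfe_block(x):
    """MSFE wiring: an incremental 3->5->7 branch, a decremental
    7->5->3 branch, concatenation, a 1x1 projection back to M channels
    and a residual skip.  Channel count M is preserved throughout."""
    m = x.shape[-1]
    a1 = conv(x, 3, m); a2 = conv(a1, 5, m); a3 = conv(a2, 7, m)  # incremental
    b1 = conv(x, 7, m); b2 = conv(b1, 5, m); b3 = conv(b2, 3, m)  # decremental
    cat = np.concatenate([a1, a2, a3, b1, b2, b3], axis=-1)
    out = conv(cat, 1, m)  # 1x1 conv restores the M-channel dimension
    return out + x         # residual connection
```

Running a feature map through the block leaves its H × W × M shape unchanged, which is what allows the blocks to be stacked at every encoder level.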
Generator Architecture: We adopt an encoder-decoder design for the generator. Each encoder layer consists of a multi-scale feature extraction block and is downsampled by a stride-2 convolution. The bottleneck consists of 6 residual blocks, similar to [31]. Each decoder layer consists of a pixel shuffle [22] operation that receives input feature maps from the respective encoder and decoder layers, as depicted in Fig. 3. Once the desired spatial resolution is reached, convolution operations produce the output image with the same dimensions as the input image. This architectural design allows the generator network to reconstruct the images.
Discriminator Architecture: We adopt the same PatchGAN discriminator architecture as in [31]. It consists of five convolutional layers that classify whether 70 × 70 overlapping image patches are synthetic or real.
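The 70 × 70 figure follows from the receptive-field recurrence of the five-layer stack. The kernel/stride configuration below (4 × 4 kernels with strides 2, 2, 2, 1, 1) is the standard PatchGAN of [31], assumed here rather than stated in this excerpt:

```python
def receptive_field(layers):
    """Receptive field of a conv stack given as (kernel, stride) pairs,
    using the recurrence r <- r + (k - 1) * jump; jump <- jump * stride."""
    r, jump = 1, 1
    for k, s in layers:
        r += (k - 1) * jump
        jump *= s
    return r

# Assumed PatchGAN configuration from [31]: five 4x4 convs,
# strides 2, 2, 2, 1, 1.
patchgan = [(4, 2), (4, 2), (4, 2), (4, 1), (4, 1)]
```

`receptive_field(patchgan)` evaluates to 70, so each output unit of the discriminator judges one 70 × 70 input patch.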
4 Experiments and Results
This section evaluates the proposed approach on the ITEC Smoke Cholec80 real laparoscopic image dataset, which is composed of 80 different cholecystectomy procedures. We first introduce the dataset and discuss the implementation details, then evaluate each component of the proposed approach and analyze the results. Finally, we compare against other state-of-the-art de-smoking methods.
ITEC Smoke Cholec80 Image Dataset: The public dataset [15] consists of 100K smoke/non-smoke images extracted from the Cholec80 dataset [25]. We selected 1200 images each for the smoke and smoke-free domains as our training set, 100 random images from the training set as the validation set, and 200 images as the test set. Because the camera arrangement produces black corners, the images are center cropped to 240 × 320, which prevents the network from learning unnecessary information. The selected images have varying levels of smoke at different depths, ensuring the network learns from a diverse set of images.
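The preprocessing crop is straightforward; the sketch below assumes frames larger than 240 × 320 in the usual H × W × C layout:

```python
import numpy as np

def center_crop(frame, out_h=240, out_w=320):
    """Center-crop a frame to out_h x out_w, discarding the black
    corners introduced by the endoscope's circular optics."""
    h, w = frame.shape[:2]
    top, left = (h - out_h) // 2, (w - out_w) // 2
    return frame[top:top + out_h, left:left + out_w]
```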
Fig. 4. Ablation study on the utility of the MSFE blocks and the structure-consistency loss. a: input smoke images; b: images generated with plain convolution blocks instead of MSFE blocks; c: images generated without the structure-consistency loss in the overall objective; d: images from the proposed method.
Table 1. Quantitative results of the ablation study on the validation set (mean image quality; methods a-d as in Fig. 4).

Method   BRISQE   CEIQ
a        19.14    3.345
b        17.95    3.350
c        17.50    3.317
d        17.43    3.352
epochs with ADAM as the optimizer. The batch size is fixed at one. Experimentally, the λ1 and λ2 terms in the complete objective function are set to 10 and 0.1 respectively. TensorFlow was used to train the network on an NVIDIA K80 GPU.
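The complete objective is a weighted sum of the three losses. Which λ weights which term is not stated in this excerpt; following the CycleGAN convention, the sketch below assumes λ1 = 10 on the cycle-consistency term and λ2 = 0.1 on the structure-consistency term:

```python
def total_objective(l_adv, l_cyc, l_struct, lam1=10.0, lam2=0.1):
    """Assumed weighting: adversarial + lam1 * cycle-consistency
    + lam2 * structure-consistency."""
    return l_adv + lam1 * l_cyc + lam2 * l_struct
```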
Experimental Results: Compared with plain convolution layers, the multi-scale feature extraction blocks capture features at different scales, which is essential for detecting and reducing the smoke component in the image and also leads to better contrast. Adding the structure-consistency loss to the adversarial and cycle-consistency losses yields sharper images with enhanced structural information, as shown in Fig. 4. The quantitative ablation results on the validation dataset, in terms of (i) the Blind/Referenceless Image Spatial Quality Evaluator (BRISQE) [20] and (ii) the Quality Assessment of Contrast-Distorted Images (CEIQ) [29], are shown in Table 1. Lower BRISQE and higher CEIQ values indicate better perceptual quality.
Fig. 5. Qualitative evaluation of smoke removal on randomly selected images from test
set. a: input smoke images, b: Non-local Dehaze, c: DehazeNet, d: DCP, e: CycleGAN,
f: Proposed Multi-scale DesmokeNet
We compare the proposed method with Non-local Dehaze [3], single image haze removal using dark channel prior (DCP) [9], DehazeNet [5] and CycleGAN [31]. Figure 5 shows qualitative results on four random images. The desmoked images generated by the comparison methods are clearly lower in contrast, darker and remove less smoke than those of the proposed method. The images from the proposed method look more promising and of higher quality. The quantitative results in Table 2 likewise indicate better performance of the proposed method over the other state-of-the-art methods.
Table 2. Quantitative evaluation on the test dataset. The values denote the mean image quality.
5 Conclusion
This paper introduced a new unsupervised learning method for removing smoke from laparoscopic images. The proposed method consists of a new generator architecture comprising novel multi-scale feature extraction blocks that help alleviate the smoke component at different scales. Further, the new structure-consistency loss, in addition to the adversarial and cycle-consistency losses, effectively preserves the structure of the image. The proposed method is compared qualitatively and quantitatively and shows an edge over other state-of-the-art desmoking methods.
As surgical smoke removal currently relies on mechanical solutions that still suffer lag time, a digital visualization pipeline that automatically removes smoke from the frames would be of great use to practitioners and surgeons; we therefore plan to extend this work into an algorithm that performs in real time. Further, by exploiting the spatio-temporal consistency between sequences of frames, we would try to ensure that the level of smoke removal does not depend on the amount of smoke but only on its presence in a frame.
References
1. Baid, A., Kotwal, A., Bhalodia, R., Merchant, S., Awate, S.P.: Joint desmoking,
specularity removal, and denoising of laparoscopy images via graphical models and
Bayesian inference. In: 2017 IEEE 14th International Symposium on Biomedical
Imaging, ISBI 2017, pp. 732–736. IEEE (2017)
2. Barrett, W.L., Garber, S.M.: Surgical smoke: a review of the literature. Surg.
Endosc. 17(6), 979–987 (2003)
3. Berman, D., Avidan, S., et al.: Non-local image dehazing. In: Proceedings of the
IEEE Conference on Computer Vision and Pattern Recognition, pp. 1674–1682
(2016)
4. Bolkar, S., Wang, C., Cheikh, F.A., Yildirim, S.: Deep smoke removal from min-
imally invasive surgery videos. In: 2018 25th IEEE International Conference on
Image Processing (ICIP), pp. 3403–3407. IEEE (2018)
5. Cai, B., Xu, X., Jia, K., Qing, C., Tao, D.: DehazeNet: an end-to-end system for
single image haze removal. IEEE Trans. Image Process. 25(11), 5187–5198 (2016)
6. Cai, Z., Fan, Q., Feris, R.S., Vasconcelos, N.: A unified multi-scale deep convo-
lutional neural network for fast object detection. In: Leibe, B., Matas, J., Sebe,
N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9908, pp. 354–370. Springer, Cham
(2016). https://doi.org/10.1007/978-3-319-46493-0_22
7. Chen, L.-C., Zhu, Y., Papandreou, G., Schroff, F., Adam, H.: Encoder-decoder
with atrous separable convolution for semantic image segmentation. In: Ferrari, V.,
Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11211, pp.
833–851. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01234-2_49
8. Goodfellow, I., et al.: Generative adversarial nets. In: Advances in Neural Information Processing Systems, pp. 2672–2680 (2014)
9. He, K., Sun, J., Tang, X.: Single image haze removal using dark channel prior.
IEEE Trans. Pattern Anal. Mach. Intell. 33(12), 2341–2353 (2010)
10. Isola, P., Zhu, J.Y., Zhou, T., Efros, A.A.: Image-to-image translation with condi-
tional adversarial networks. In: Proceedings of the IEEE Conference on Computer
Vision and Pattern Recognition, pp. 1125–1134 (2017)
11. Karacan, L., Akata, Z., Erdem, A., Erdem, E.: Learning to generate images of out-
door scenes from attributes and semantic layouts. arXiv preprint arXiv:1612.00215
(2016)
12. Kim, T., Cha, M., Kim, H., Lee, J.K., Kim, J.: Learning to discover cross-domain
relations with generative adversarial networks. In: Proceedings of the 34th Inter-
national Conference on Machine Learning - Volume 70, pp. 1857–1865. JMLR.org
(2017)
13. Kotwal, A., Bhalodia, R., Awate, S.P.: Joint desmoking and denoising of
laparoscopy images. In: 2016 IEEE 13th International Symposium on Biomedi-
cal Imaging (ISBI), pp. 1050–1054. IEEE (2016)
14. Lawrentschuk, N., Fleshner, N.E., Bolton, D.M.: Laparoscopic lens fogging: a
review of etiology and methods to maintain a clear visual field. J. Endourol. 24(6),
905–913 (2010)
15. Leibetseder, A., Primus, M.J., Petscharnig, S., Schoeffmann, K.: Real-time image-
based smoke detection in endoscopic videos. In: Proceedings of the on Thematic
Workshops of ACM Multimedia 2017, pp. 296–304. ACM (2017)
16. Lim, B., Son, S., Kim, H., Nah, S., Mu Lee, K.: Enhanced deep residual networks
for single image super-resolution. In: Proceedings of the IEEE Conference on Com-
puter Vision and Pattern Recognition Workshops, pp. 136–144 (2017)
17. Liu, M.Y., Breuel, T., Kautz, J.: Unsupervised image-to-image translation net-
works. In: Advances in Neural Information Processing Systems, pp. 700–708 (2017)
18. Luo, X., McLeod, A.J., Pautler, S.E., Schlachta, C.M., Peters, T.M.: Vision-based
surgical field defogging. IEEE Trans. Med. Imaging 36(10), 2021–2030 (2017)
19. Mao, X., Li, Q., Xie, H., Lau, R.Y., Wang, Z., Paul Smolley, S.: Least squares gen-
erative adversarial networks. In: Proceedings of the IEEE International Conference
on Computer Vision, pp. 2794–2802 (2017)
20. Mittal, A., Moorthy, A.K., Bovik, A.C.: No-reference image quality assessment in
the spatial domain. IEEE Trans. Image Process. 21(12), 4695–4708 (2012)
21. Sangkloy, P., Lu, J., Fang, C., Yu, F., Hays, J.: Scribbler: controlling deep image
synthesis with sketch and color. In: Proceedings of the IEEE Conference on Com-
puter Vision and Pattern Recognition, pp. 5400–5409 (2017)
22. Shi, W., et al.: Real-time single image and video super-resolution using an efficient
sub-pixel convolutional neural network. In: Proceedings of the IEEE Conference
on Computer Vision and Pattern Recognition, pp. 1874–1883 (2016)
23. Takahashi, H., et al.: Automatic smoke evacuation in laparoscopic surgery: a sim-
plified method for objective evaluation. Surg. Endosc. 27(8), 2980–2987 (2013)
24. Tchaka, K., Pawar, V.M., Stoyanov, D.: Chromaticity based smoke removal in
endoscopic images. In: Medical Imaging 2017: Image Processing, vol. 10133, p.
101331M. International Society for Optics and Photonics (2017)
25. Twinanda, A.P., Shehata, S., Mutter, D., Marescaux, J., De Mathelin, M., Padoy,
N.: EndoNet: a deep architecture for recognition tasks on laparoscopic videos. IEEE
Trans. Med. Imaging 36(1), 86–97 (2016)
26. Vishal, V., Sharma, N., Singh, M.: Guided unsupervised desmoking of laparoscopic
images using cycle-desmoke. In: Zhou, L., et al. (eds.) OR 2.0/MLCN -2019. LNCS,
vol. 11796, pp. 21–28. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-32695-1_3
27. Wang, C., Cheikh, F.A., Kaaniche, M., Beghdadi, A., Elle, O.J.: Variational based
smoke removal in laparoscopic images. Biomed. Eng. Online 17(1), 139 (2018)
28. Wang, C., Mohammed, A.K., Cheikh, F.A., Beghdadi, A., Elle, O.J.: Multiscale
deep desmoking for laparoscopic surgery. In: Medical Imaging 2019: Image Pro-
cessing, vol. 10949, p. 109491Y. International Society for Optics and Photonics
(2019)
29. Yan, J., Li, J., Fu, X.: No-reference quality assessment of contrast-distorted images
using contrast enhancement. arXiv preprint arXiv:1904.08879 (2019)
30. Yi, Z., Zhang, H., Tan, P., Gong, M.: DualGAN: unsupervised dual learning for
image-to-image translation. In: Proceedings of the IEEE International Conference
on Computer Vision, pp. 2849–2857 (2017)
31. Zhu, J.Y., Park, T., Isola, P., Efros, A.A.: Unpaired image-to-image translation
using cycle-consistent adversarial networks. In: Proceedings of the IEEE Interna-
tional Conference on Computer Vision, pp. 2223–2232 (2017)