
IEEE TRANSACTIONS ON MEDICAL IMAGING, VOL. 41, NO. 11, NOVEMBER 2022

Triplet Cross-Fusion Learning for Unpaired Image Denoising in Optical Coherence Tomography

Mufeng Geng, Xiangxi Meng, Lei Zhu, Zhe Jiang, Mengdi Gao, Zhiyu Huang, Bin Qiu, Yicheng Hu, Yibao Zhang, Qiushi Ren, and Yanye Lu

Abstract — Optical coherence tomography (OCT) is a widely-used modality in clinical imaging, which inevitably suffers from speckle noise. Deep learning has proven its superior capability in OCT image denoising, but the difficulty of acquiring a large number of well-registered OCT image pairs limits the development of paired learning methods. To solve this problem, some unpaired learning methods have been proposed, in which the denoising networks can be trained with unpaired OCT data. However, the majority of them are modified from the cycleGAN framework. These cycleGAN-based methods train at least two generators and two discriminators, while only one generator is needed for inference. The dual-generator and dual-discriminator structures of cycleGAN-based methods demand a large amount of computing resources, which may be redundant for OCT denoising tasks. In this work, we propose a novel triplet cross-fusion learning (TCFL) strategy for unpaired OCT image denoising. The model complexity of our strategy is much lower than those of the cycleGAN-based methods. During training, the clean components and the noise components from the triplet of three unpaired images are cross-fused, helping the network extract more speckle noise information to improve the denoising accuracy. Furthermore, the TCFL-based network, which is trained with triplets, can deal with limited training data scenarios. The results demonstrate that the TCFL strategy outperforms state-of-the-art unpaired methods both qualitatively and quantitatively, and even achieves denoising performance comparable with paired methods. Code is available at: https://github.com/gengmufeng/TCFL-OCT.

Index Terms — Unpaired learning, optical coherence tomography, image restoration.
Manuscript received 18 March 2022; revised 10 May 2022 and 9 June 2022; accepted 15 June 2022. Date of publication 20 June 2022; date of current version 27 October 2022. This work was supported in part by the Beijing Natural Science Foundation under Grant Z210008; in part by the Shenzhen Science and Technology Program under Grant KQTD20180412181221912 and Grant JCYJ20200109140603831; and in part by the Open Project through the Key Laboratory of Carcinogenesis and Translational Research, Ministry of Education/Beijing, under Grant 2022 Open Project-2. (Corresponding author: Yanye Lu.)

This work involved human subjects or animals in its research. Approval of all ethical and experimental procedures and protocols was granted by the Ethics Committee of Peking University First Hospital under Application No. 2017-47, and performed in line with the Ethical Standards of the Institutional and/or National Research Committee, and the Declaration of Helsinki.

Mufeng Geng, Lei Zhu, Zhe Jiang, Mengdi Gao, Zhiyu Huang, Bin Qiu, Yicheng Hu, and Qiushi Ren are with the Institute of Medical Technology, Peking University Health Science Center, Peking University, Beijing 100191, China, also with the Department of Biomedical Engineering, College of Future Technology, Peking University, Beijing 100871, China, also with the Institute of Biomedical Engineering, Peking University Shenzhen Graduate School, Shenzhen 518055, China, also with the Institute of Biomedical Engineering, Shenzhen Bay Laboratory, Shenzhen 518071, China, and also with the National Biomedical Imaging Center, Beijing 100871, China (e-mail: gmfpku@pku.edu.cn; zhulei@stu.pku.edu.cn; gjiang47@163.com; 1901111963@pku.edu.cn; hzy_bme@pku.edu.cn; qiub@pku.edu.cn; huyc@stu.pku.edu.cn; qren@pku.edu.cn).

Xiangxi Meng and Yibao Zhang are with the Key Laboratory of Carcinogenesis and Translational Research (Ministry of Education), Peking University Cancer Hospital and Institute, Beijing 100142, China (e-mail: mengxiangxi@pku.edu.cn; zhangyibao@pku.edu.cn).

Yanye Lu is with the Institute of Medical Technology, Peking University Health Science Center, Peking University, Beijing 100191, China, and also with the Institute of Biomedical Engineering, Peking University Shenzhen Graduate School, Shenzhen 518055, China (e-mail: yanye.lu@pku.edu.cn).

Digital Object Identifier 10.1109/TMI.2022.3184529

1558-254X © 2022 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See https://www.ieee.org/publications/rights/index.html for more information.

I. INTRODUCTION

Optical coherence tomography (OCT), a non-invasive cross-sectional imaging technique, plays an essential role in clinical imaging [1], such as brain imaging, ophthalmology, cardiology, etc. In practice, OCT images are inevitably susceptible to speckle noise, which severely degrades the image quality and affects clinical diagnosis. Existing OCT denoising methods can be divided into two categories: hardware-based methods and software-based methods. As for the hardware-based approaches, such as improving the light source, the noise of detectors and scanners can be reduced. However, the speckle or white noise in the imaging system cannot be eliminated [2], limiting the development of hardware-based methods. In contrast, software-based methods post-process OCT images directly, and are not restricted to specific acquisition processes. A range of traditional algorithmic methods have been developed for OCT image denoising, including block-matching 3D (BM3D) [3], nonlocal-means (NLM) [4], the noise adaptive wavelet thresholding algorithm [5], etc. Although these methods can reduce the speckle noise to various extents, they always cause certain noise residuals and the loss of many structural details in the denoised images.

Recently, deep learning has been widely utilized in the medical imaging field due to its powerful modeling capability, with applications such as low-dose reconstruction [6], material decomposition [7], super-resolution [8], segmentation [9], as well as OCT image denoising. Compared with traditional algorithms,


deep learning-based OCT denoising methods have demonstrated their superior capabilities in both speckle noise reduction and structure preservation. In order to improve the denoising effect, researchers have made some attempts at optimizing deep network structures [10], [11] or designing structure-sensitive loss functions [12], [13]. Specifically, Xu et al. combined deep nonlinear convolutional neural network (CNN) mapping and residual learning and proposed the OCTNet, realizing speckle noise reduction and texture preservation for low-quality OCT images [10]. Shen et al. reported a double-path parallel CNN (DPNet) to extract deeper denoising information from the noisy OCT images [11]. Ma et al. designed an edge-sensitive loss function inspired by the edge preservation index (EPI) to make the deep denoiser sensitive to edge-related details [12]. Qiu et al. presented a perceptually-sensitive loss function to preserve more structure information in the denoised OCT images [13]. Meanwhile, there are also a number of OCT denoising works based on generative adversarial networks (GANs) to better regularize the training process of denoising tasks [14], [15]. Although the above-mentioned deep learning methods have achieved impressive results in OCT denoising tasks, they all follow the supervised training strategy, requiring noisy-clean paired OCT images. Clean labels are usually obtained by multi-frame averaging, which is time-consuming. It has been repeatedly demonstrated that the averaging operation might introduce motion artifacts and lose some critical structure details, including small lesions or abnormal features (e.g., hyper reflection), due to unconscious eye movement or body jitter during scanning [16], [17]. Thus, it is hard to acquire ideally-registered noisy-clean OCT image pairs in practice. When supervised deep learning networks are trained with such non-ideal noisy-clean image pairs, the trained networks tend to smooth some critical structure details [16], [17]. Furthermore, some weakly-supervised OCT denoising methods have been proposed based on the Noise2Noise (N2N) strategy [18]–[20]. Although the N2N strategy mitigates the need for noisy-clean image pairs, capturing well-registered noisy-noisy OCT image pairs remains a very challenging problem due to the unconscious eye movement in clinical scanning. Both supervised and weakly-supervised methods belong to paired methods, where the requirement of paired training data limits their development.

In order to avoid the above-mentioned problem of paired training data, some unpaired OCT image denoising methods have been proposed. Guo et al. reported the nonlocal-GAN model to denoise OCT images [21], which considered that the background areas of OCT images mainly contained pure "real" noise, and used the background areas to guide the whole image denoising process. However, in practice, the background areas of OCT images seldom purely contain noise information. Therefore, the background areas inevitably introduce some wrong noise priors, limiting the denoising accuracy. Most existing unpaired OCT image denoising methods are based on the cycleGAN framework [16], [17], [22]–[25]. Although cycleGAN was not originally designed for OCT image denoising, some researchers have tried to optimize it in terms of network structures [17], [23], [24] or objective loss functions [16], [25] to fit OCT image denoising tasks. They treat the OCT denoising problem as a domain translation between the noisy domain and the clean domain, which requires at least two generators (one for noisy-to-clean translation and the other for clean-to-noisy translation) and at least two discriminators (one for noisy image discrimination and the other for clean image discrimination). As a result, the network structures of these methods are always complex with huge numbers of parameters, which may be redundant for OCT image denoising tasks and may cause overfitting and poor generality [15].

In this work, we propose a novel triplet cross-fusion learning (TCFL) strategy for unpaired OCT image denoising. The model complexity of our strategy is much lower than those of the cycleGAN-based methods: only one generator and one discriminator are utilized in training. In addition, unlike most existing denoising methods whose training is based on two paired/unpaired images, the TCFL-based network is trained on three unpaired images (i.e., a triplet), including two unpaired noisy images and one clean image. During training, the clean components and the noise components from the triplets are cross-fused, helping the network extract more speckle noise information to improve the denoising accuracy. To enable the capabilities of noise reduction and structure preservation, based on the TCFL strategy, global residual learning is utilized, and an identity loss is designed. The proposed TCFL strategy is verified over three OCT datasets. The results demonstrate that the proposed strategy outperforms existing unpaired state-of-the-art methods both qualitatively and quantitatively, and even achieves denoising performance comparable with paired methods. This might be because the information interaction of clean components and noise components among the training samples is enhanced through our TCFL strategy, which helps deep learning networks better distinguish between the clean components and the noise components. In summary, the contributions of this paper are as follows:

• We propose a TCFL strategy for unpaired OCT image denoising, where three unpaired images are cross-fused during training, helping the denoising network extract more speckle noise information to improve the denoising accuracy.

• Based on the TCFL strategy, an identity loss is designed to enable the convergence of the denoising network. Ablation studies demonstrate that the identity loss is essential for effective denoising.

• Compared with cycleGAN-based methods, which involve at least two generators and two discriminators, the proposed TCFL strategy can be implemented with only one generator and one discriminator, ensuring lower model complexity. Extensive experimental results demonstrate that the proposed strategy has reached the state of the art in the unpaired OCT image denoising field.

• The use of three unpaired images results in a large number of triplet combinations based on fixed training data, which can deal with the problem of limited training data. Related experiments are conducted to confirm this.


Fig. 1. Training data requirements of (a) paired image denoising and (b) unpaired image denoising.

II. RELATED WORKS

Over the years, numerous deep learning-based image denoising methods have been proposed, which can be divided into paired and unpaired learning methods. Fig. 1 compares their training data requirements.

A. Paired Image Denoising

Paired learning methods can be further divided into supervised methods and weakly-supervised methods. As for supervised image denoising, noisy-clean image pairs are needed to train the network. Zhang et al. proposed DnCNN, which first used a CNN and global residual learning in image denoising [26]. DnCNN verifies the effectiveness of deep CNNs for image denoising tasks, and has been optimized and widely used in various medical image denoising scenarios [6]. In the field of OCT image denoising, a number of supervised learning-based methods have been proposed [10]–[15], which have been reviewed in Section I. However, in practice, it is always hard and expensive to acquire a large number of noisy-clean image pairs to train the deep learning network, limiting the development of supervised image denoising.

Weakly-supervised denoising networks can be trained using a pair of noisy images. Lehtinen et al. first statistically demonstrated that, under some reasonable assumptions on the noise distribution, it is possible to retrieve the underlying clean image using only noisy images during training, and proposed a weakly-supervised denoising strategy, N2N. After that, N2N was extended to various image denoising tasks [19], [27]. Qiu et al. comprehensively studied the denoising performance of four common deep networks in the OCT image denoising task following the N2N strategy [19]. Ahmed et al. reported a collaborative technique to train multiple N2N denoisers to address low-dose computed tomography (CT) denoising tasks [27]. Although N2N avoids the use of clean-noisy paired images, the collection of a large number of well-registered noisy-noisy image pairs might also be challenging, especially for medical imaging, due to the unconscious body movement during scanning.

B. Unpaired Image Denoising

Unpaired learning methods mainly include self-supervised methods and cycleGAN-based methods [28]. Self-supervised denoising methods train the deep denoising networks based on a single noisy image, including Noise2Self [29], Recorrupted-to-Recorrupted (R2R) [30], Neighbor2Neighbor (NBR) [31], etc. Noise2Self utilized a blind-spot strategy to make deep networks learn the clean image from a single noisy image, while R2R and NBR manually created paired noisy-noisy images from a single noisy image to meet the N2N condition. These self-supervised methods were mainly used in natural image denoising and involved requirements in terms of noise characteristics, such as a zero-mean noise distribution. These requirements are implausible for some medical image denoising tasks, limiting the applications in this area.

Some researchers tried to use cycleGAN in unpaired image denoising [22], [32]–[34], whose training is based on two unpaired noisy-clean images. Zhu et al. first proposed cycleGAN for unpaired image-to-image translation [22]. After that, cycleGAN was widely adopted in the field of image denoising, including OCT image denoising. Specifically, Guo et al. proposed a structure-aware noise reduction GAN (SNRGAN) for unpaired OCT image denoising, which learned the mapping function between two domains (the clean and the noisy domains) and introduced a structure-aware loss based on the structural similarity (SSIM) index [16]. Similarly, Manakov et al. developed a hybrid discriminator cycleGAN (HDcycleGAN) to realize unpaired OCT image denoising, which introduced skip connections into generators and utilized a shared discriminator [23]. Das et al. reported an unpaired denoising framework inspired by cycleGAN to perform fast and reliable OCT denoising [24]. Huang et al. presented a disentangled representation GAN (DRGAN), which also introduced the cycle-consistency idea and transformed the speckle noise reduction problem into a noise disentanglement problem [17]. Wu et al. proposed a structure-preserved cycleGAN (SPcycleGAN) to denoise OCT images and used a structure-specific cross-domain description to preserve retinal details [25]. These works have demonstrated the capability of the cycleGAN framework to address OCT image denoising tasks.

III. METHODS

A. OCT Speckle Noise Model

In clinical settings, considering the limited dynamic range of display monitors and the perception of the human eye, projecting the measured OCT data into the logarithm space to acquire the finally-displayed OCT images is widely adopted [35]. As a result, the speckle noise is converted from multiplicative (data-dependent) to additive (data-independent) [36]. Thus, in this work, we model a speckle noise-corrupted OCT image I as the sum of its clean component C and its speckle noise N:

I = C + N. (1)

We aim to restore C from I to improve the OCT image quality. Fig. 2 shows a noisy OCT image and its corresponding


Fig. 2. Speckle noise-corrupted OCT image (left), the corresponding clean OCT image (middle), and the speckle noise (right).

clean image and speckle noise. The clean image is generated with 50-frame averaging, and the speckle noise is the difference between the noisy image and the clean image. We can hardly see any structural information in the speckle noise image, which supports the rationality of the additive noise model in (1).

B. Motivation

In the field of OCT image denoising, in contrast with paired learning, unpaired learning breaks the limit of paired training data, reducing the difficulty of data acquisition. Among unpaired OCT image denoising methods, cycleGAN and its variants (such as HDcycleGAN [23], SNRGAN [16], DRGAN [17], SPcycleGAN [25], etc.) have become the mainstream solutions, due to the similarities between image-to-image translation and image denoising. In detail, cycleGAN was first proposed to solve unpaired image-to-image translation problems, which aimed to translate images between two domains, X = {I1, I2, ..., Im} and Y = {C1, C2, ..., Cm}, in the absence of paired samples. It is noteworthy that the image-to-image translation through cycleGAN is bidirectional, consisting of both the mapping from X to Y and the inverse mapping from Y to X. As a result, cycleGAN-based methods contain at least two generators (G1 for X → Y and G2 for Y → X) and at least two discriminators (D1 for discrimination of X and D2 for discrimination of Y). Migrating to OCT image denoising, X is regarded as the noisy domain, and Y corresponds to the clean domain. That is, the OCT image denoising problem is treated as a special case of image-to-image translation through cycleGAN. Existing cycleGAN-based OCT image denoising methods train both G1 for X → Y and G2 for Y → X, while only G1 is needed for inference. The dual-generator and dual-discriminator structures of cycleGAN-based methods demand a large amount of computing resources, which may be redundant for OCT denoising tasks. Thus, there is a demand for an alternative to cycleGAN-based unpaired image denoising methods that reduces the model complexity with fewer generators and fewer discriminators.

In this work, based on the observation that the speckle noise in OCT images is additive in the logarithm space, we design a simple yet effective clean-to-noisy translation approach: a fake noisy OCT image can be synthesized by adding a clean OCT image and the speckle noise of another noisy OCT image. In order to enable the loss constraint and the convergence of the network, the denoising network is trained on three unpaired OCT images, including two unpaired noisy images and one clean image. A discriminator is also introduced to help the denoising network generate realistic fake images. The proposed strategy involves only one generator and one discriminator. Obviously, compared with cycleGAN-based methods, the model complexity is reduced significantly with our proposed strategy.

Fig. 3. The proposed triplet cross-fusion learning (TCFL) strategy, whose training utilizes only one generator (i.e., noise predictor) and one discriminator. The training of the noise predictor consists of three sequential parts: (a) Noisy2Predicted-Clean (N2P) part, (b) Predicted-Clean2Predicted-Clean (P2P) part, and (c) Clean2Predicted-Clean (C2P) part. (d) Discriminator and loss calculation.

C. Triplet Cross-Fusion Learning

1) Overview: The proposed TCFL strategy is able to utilize concise structures to finish the training of the denoising network in an end-to-end manner. Three unpaired OCT images are used to train the TCFL-based network, including one noisy image IA, another noisy image IB, and one clean image CC. As shown in Fig. 3, only one generator (i.e., the noise predictor) and one discriminator are utilized under the proposed TCFL strategy. The training of the noise predictor consists of three sequential parts: the Noisy2Predicted-Clean (N2P) part, the Predicted-Clean2Predicted-Clean (P2P) part, and the Clean2Predicted-Clean (C2P) part. Algorithm 1 presents the pseudocode of the training and inference of the TCFL-based network.

2) N2P Part: In this part, IA and IB first go through the noise predictor pn(·) to generate two speckle noise images NA and NB, respectively:

NA = pn(IA),
NB = pn(IB). (2)

Then, two fake clean images CA1 and CB1 can be estimated by subtracting the predicted noise from the noisy inputs (i.e., global


residual learning). It has been reported in the literature that global residual learning is able to help deep denoisers protect more structural details [10], [26]:

CA1 = IA − NA,
CB1 = IB − NB. (3)

It can be seen that in this part the noise predictor pn(·) predicts two speckle noise images (NA and NB) and two clean images (CA1 and CB1) from two noisy images (IA and IB) as inputs, in an independent and parallel manner.

Algorithm 1 Training and Inference of the TCFL-Based Network
Require: Three unpaired OCT images, IA, IB, and CC;
Require: A noise predictor pn and a discriminator D;
Require: Initialize parameters θpn and θD randomly;
for each epoch do
    Sample the batch;
    for each iteration do
        Noisy2Predicted-Clean (N2P) part:
            pn(IA) → NA;
            pn(IB) → NB;
            IA − NA → CA1;
            IB − NB → CB1;
        Predicted-Clean2Predicted-Clean (P2P) part:
            CA1 + NB → ĨA (cross-fusion);
            CB1 + NA → ĨB (cross-fusion);
            ĨA − pn(ĨA) → CA2;
            ĨB − pn(ĨB) → CB2;
        Clean2Predicted-Clean (C2P) part:
            CC + NA → ĨC1 (cross-fusion);
            CC + NB → ĨC2 (cross-fusion);
            ĨC1 − pn(ĨC1) → CC1;
            ĨC2 − pn(ĨC2) → CC2;
        Calculate identity loss and GAN loss;
        Update pn: θpn ← Adam;
        Update D: θD ← Adam;
Return: Trained pn;
Inference: Given a noisy image ID from the test set, predict the clean image CD: ID − pn(ID) → CD.

3) P2P Part: Due to the data-independence of the speckle noise in the logarithm space, we can synthesize two fake noisy images by a cross-fusion mechanism. Specifically, the cross-fusion mechanism synthesizes a new noisy image by adding the clean component from one image and the noise component from another image. That is, one fake noisy image ĨA can be obtained by adding CA1 and NB; the other fake noisy image ĨB can be obtained by adding CB1 and NA:

ĨA = CA1 + NB,
ĨB = CB1 + NA. (4)

After that, ĨA and ĨB are input into the noise predictor pn(·), and subtraction operations are subsequently performed to estimate two clean images CA2 and CB2, respectively:

CA2 = ĨA − pn(ĨA),
CB2 = ĨB − pn(ĨB). (5)

Although the speckle noise components in ĨA and IA are different, their clean components (i.e., CA1 and CA2) should be identical. Analogically, CB1 and CB2 should also be identical.

4) C2P Part: Similar to the synthesis of ĨA and ĨB, we can synthesize two fake noisy images ĨC1 and ĨC2 based on CC, NA, and NB:

ĨC1 = CC + NA,
ĨC2 = CC + NB. (6)

Then, ĨC1 and ĨC2 go through pn(·) and subtraction operations to estimate two clean images CC1 and CC2, respectively:

CC1 = ĨC1 − pn(ĨC1),
CC2 = ĨC2 − pn(ĨC2). (7)

Ideally, CC, CC1, and CC2 should be identical.

5) Discriminator: In this work, we adopt a discriminator to realize adversarial learning with the generator (i.e., the noise predictor). The generator in TCFL is designed to estimate noise, whereas the discriminator of TCFL is used to distinguish real clean OCT images CC from the predicted OCT images C′ = {CA1, CA2, CB1, CB2, CC1, CC2}, as shown in Fig. 3 (d).

6) Loss Function: The total loss of TCFL, Ltotal, can be expressed as:

Ltotal = LGAN + λLiden, (8)

where LGAN denotes the GAN loss contributed by the discriminator D; Liden indicates the specifically-designed identity loss of the generator G; and λ is the weight of Liden.

Inside (8), LGAN can be written as:

LGAN = E[log D(C)] + E[log(1 − D(C′))], (9)

where E[·] represents the expectation operator; C is the real clean image; and C′ denotes the fake clean image. D tries to maximize this objective, while G tries to minimize it.

The identity loss Liden is defined as follows:

Liden = Liden_A + Liden_B + μ(Liden_C1 + Liden_C2), (10)

where μ is a loss weight, and {Liden_A, Liden_B, Liden_C1, Liden_C2} can be expressed as:

Liden_A = ‖CA1 − CA2‖1,
Liden_B = ‖CB1 − CB2‖1,
Liden_C1 = ‖CC1 − CC‖1,
Liden_C2 = ‖CC2 − CC‖1, (11)

where ‖·‖1 represents the L1 norm; Liden_A indicates the identity loss between CA1 and CA2; Liden_B denotes the identity loss between CB1 and CB2; Liden_C1 is the identity loss between CC1 and CC; and Liden_C2 represents the identity loss between CC2 and CC.

The designed identity loss Liden enables the convergence of the denoising network. Under the constraints of Liden, the clean components and the noise components from

Fig. 4. An representative implementations of the noise predictor


and the discriminator. (a) DnCNN (noise predictor). (b) PatchGAN
(discriminator).

{IA , IB , CC } can be cross-fused diversely, resulting in that the


network can extract more speckle noise information. Thus,
the clean components and noise components can be better
distinguished.
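The N2P/P2P/C2P data flow of (2)–(7) can be sketched as follows. This is a minimal NumPy illustration on toy arrays; the placeholder `p_n` merely stands in for the trained DnCNN noise predictor and is our assumption, not the paper's model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for the triplet under the additive model of Eq. (1):
# two unpaired noisy images (clean + speckle noise) and one clean image.
C_A, C_B, C_C = (rng.random((4, 4)) for _ in range(3))
N_A_true, N_B_true = 0.1 * rng.standard_normal((2, 4, 4))
I_A, I_B = C_A + N_A_true, C_B + N_B_true

def p_n(img):
    # Hypothetical noise predictor standing in for the trained CNN p_n(.)
    return img - img.mean()

# N2P: predict noise, then estimate clean images by global residual learning
N_A, N_B = p_n(I_A), p_n(I_B)
C_A1, C_B1 = I_A - N_A, I_B - N_B              # Eq. (3)

# P2P: cross-fuse clean and noise components into fake noisy images
I_A_tilde, I_B_tilde = C_A1 + N_B, C_B1 + N_A  # Eq. (4)
C_A2 = I_A_tilde - p_n(I_A_tilde)              # Eq. (5)
C_B2 = I_B_tilde - p_n(I_B_tilde)

# C2P: fuse the real clean image with the predicted noise
I_C1, I_C2 = C_C + N_A, C_C + N_B                # Eq. (6)
C_C1, C_C2 = I_C1 - p_n(I_C1), I_C2 - p_n(I_C2)  # Eq. (7)
```

The identity loss then penalizes any disagreement between CA1 and CA2, between CB1 and CB2, and between CC1/CC2 and CC.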

Fig. 5. Example images from PKU37, DUKE17, and DUKE28. For each image, four ROIs are chosen to calculate unsupervised metrics.

IV. EXPERIMENTAL SETUP AND RESULTS

A. Experimental Setup

1) Network Implementation: The proposed TCFL strategy can be implemented based on existing CNNs. In this work, we adopted a classical denoising CNN, DnCNN [26], as the noise predictor and utilized PatchGAN [22] as the discriminator. DnCNN has been widely used in many image denoising tasks, and is simple yet effective. PatchGAN constrains the image style and texture at the scale of patches, and has also been widely used. Fig. 4 shows their network architectures. Compared with the original version in [26], the DnCNN adopted in this work was modified: 1) the residual learning was discarded so that the network outputs noise; 2) ReLU was replaced with Leaky ReLU to fix the dying ReLU problem [37]. The PatchGAN used in this work kept the same architecture as the version in [22]. We named the network implementation of TCFL using DnCNN and PatchGAN as TCFL-DnCNN.

2) Data Preparation: In this work, we evaluated the proposed TCFL strategy over three OCT datasets, including one not-yet-public dataset (namely PKU37¹) and two publicly available datasets (namely DUKE17 [38] and DUKE28 [39]). Fig. 5 shows example images from the three datasets. PKU37 was acquired with a customized spectral-domain OCT (SDOCT) system, whose lateral resolution and axial resolution were 16 μm and 6 μm, respectively. The central wavelength of the light source was 845 nm, and the full width at half maximum bandwidth was 45 nm. Averaging 50 frames was adopted to acquire the clean images; that is, in PKU37, one clean image corresponds to 50 noisy images with independent speckle noise. More details about the acquisition modes of PKU37 can be found in [13]. Table I presents the main information of the three datasets, and the detailed information of the other two public datasets can be found in [38] and [39].

¹PKU37 will be released at https://wiki.milab.wiki/display/LF/Open+Source+Project soon.

TABLE I
THREE OCT DATASETS USED IN THIS WORK

For the division of the training set, validation set, and test set, the images from 20 subjects in PKU37 were selected for training (namely PKU37-train), and the images from the remaining 17 subjects in PKU37 were used for validation (namely PKU37-validation). PKU37-validation was used for the hyperparameter tuning of all deep learning-based methods. In order to verify the generalization of the deep learning-based methods, both DUKE17 and DUKE28 were used for cross-domain testing.

3) Training Details: We implemented the proposed TCFL-based network with the PyTorch framework [40]. An NVIDIA RTX 3090 GPU was used for all training. Adam [41] was adopted as the optimizer, with momentum 1 and momentum 2 set as 0.5 and 0.999, respectively. The batch size was 2, and the learning rate was set to 0.0002. For model initialization, the weights in the convolution and batch normalization layers were random numbers drawn from the normal distributions N(0, 0.02) and N(1.0, 0.02), respectively, and the biases were both initialized as 0. The training epoch number was 100, and the training time was 40 hours. The weights λ and μ in (8) and (10) were empirically set as 6 and 1, respectively, which were determined by a number of experiments.
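The generator objective of (8), (10), and (11) with the reported weights λ = 6 and μ = 1 can be sketched as below. This is a NumPy illustration, not the paper's PyTorch code; the GAN term is shown as a precomputed scalar, and the function names are ours:

```python
import numpy as np

def l1(a, b):
    # L1 distance between two images (mean absolute difference)
    return float(np.abs(a - b).mean())

def identity_loss(C_A1, C_A2, C_B1, C_B2, C_C1, C_C2, C_C, mu=1.0):
    # Eq. (10)-(11): consistency between clean estimates of the same content,
    # with mu = 1 as reported in the paper
    return (l1(C_A1, C_A2) + l1(C_B1, C_B2)
            + mu * (l1(C_C1, C_C) + l1(C_C2, C_C)))

def total_loss(gan_loss, iden_loss, lam=6.0):
    # Eq. (8): L_total = L_GAN + lambda * L_iden, with lambda = 6 as reported
    return gan_loss + lam * iden_loss

# When all clean estimates agree exactly, only the GAN term remains.
zeros = np.zeros((4, 4))
print(total_loss(gan_loss=1.0, iden_loss=identity_loss(*([zeros] * 7))))  # 1.0
```

In the actual training loop, the adversarial term of (9) would come from the PatchGAN discriminator, and both losses would be backpropagated through the noise predictor.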


TABLE II
Q UANTITATIVE C OMPARISON OF THE P ROPOSED TCFL-D N CNN AND S EVEN T RADITIONAL OR U NPAIRED M ETHODS ON PKU37-VALIDATION

4) Baselines and Quantitative Measurements: To better verify the effectiveness of the proposed TCFL strategy, nine methods were implemented for comparison, including two traditional algorithms and seven deep learning-based methods. The traditional algorithms include BM3D [3] and NLM [4]. The deep learning-based methods consist of one supervised method (OCTNet [10]), one weakly-supervised method (N2N-DnCNN [19]), and five unpaired methods (nonlocal-GAN [21], R2R-DnCNN [30], NBR-DnCNN [31], DRGAN [17], and SPcycleGAN [25]). For the traditional algorithms, we determined their parameters with a range of experiments to achieve the best denoising performance. The deep learning-based methods were implemented according to the publicly available codes released by the authors. All deep learning-based methods adopted the same division of training set, validation set, and test set as TCFL-DnCNN. For a fair comparison, we tried different hyperparameters for each deep learning-based method, and the hyperparameters leading to the optimal denoising results on PKU37-validation were selected as the final hyperparameters. The number of image pairs during one training epoch was also the same, 1000 pairs.

In terms of quantitative comparison, three unsupervised metrics and three supervised metrics were used. The adopted unsupervised metrics included the signal-to-noise ratio (SNR), the contrast-to-noise ratio (CNR), and EPI. Referring to [10], [15], [17], the unsupervised metrics were calculated based on three signal regions of interest (ROIs) and one background ROI. As shown in Fig. 5, the signal ROIs (red rectangles) were located at or near the retinal layers, and the background ROI (green rectangle) was a homogeneous region. As for the supervised metrics, in addition to two commonly-used metrics, the peak signal-to-noise ratio (PSNR) and SSIM, we also calculated the gradient conduction mean square error (GCMSE) [42], which is very sensitive to edges and borders. A smaller GCMSE value indicates better denoising performance, and larger values of the other quantitative metrics denote higher performance.

B. Comparison With Traditional/Unpaired Methods

As shown in Fig. 6, we selected one representative OCT image from PKU37-validation to visualize the denoising performance of all methods. Two ROIs were magnified for a clearer comparison. Figs. 6 (b1) and (b2) illustrate the denoised results of the traditional algorithms, and Figs. 6 (c1)-(c6) show the denoised results of the six unpaired methods. Most unpaired methods generate fewer noise residues than the traditional algorithms. Specifically, BM3D introduces some streak artifacts after denoising, and NLM is not competent at the speckle noise reduction task, leaving a lot of noise residues. Among the unpaired methods, the image denoised by nonlocal-GAN loses many retinal layer signals; both R2R-DnCNN and NBR-DnCNN enhance the original noisy OCT image only slightly, with many noise residues remaining; the denoising result of DRGAN has low contrast, which heavily weakens the blood flow information, as indicated by the arrows in Fig. 6 (c4); SPcycleGAN reduces speckle noise to a certain extent, while it also loses some structural details, as indicated by the arrows in Fig. 6 (c5). In contrast, the proposed TCFL-DnCNN not only reduces the speckle noise effectively, but also preserves many structural details and blood flow information well.

Table II summarizes the quantitative results (mean ± standard deviation) of the traditional and unpaired methods over PKU37-validation. It can be observed that the proposed TCFL-DnCNN outperforms the traditional and unpaired reference methods in terms of most quantitative metrics except EPI. NLM achieves a higher EPI value than TCFL-DnCNN. Considering that EPI is calculated using the images before and after denoising, we speculate that this is because NLM could not effectively eliminate the speckle noise, so the difference between the images before and after denoising was not significant. We also compared the inference time of the different algorithms, as shown in Table II. The proposed TCFL-DnCNN, R2R-DnCNN, and NBR-DnCNN are much faster than the other methods, making them promising candidates for in vivo OCT denoising.

C. Comparison With Paired Methods

Fig. 6 (d1) presents the denoising result of the supervised OCTNet, and Fig. 6 (d2) shows the denoising result of the weakly-supervised N2N-DnCNN. It can be seen that, although OCTNet and N2N-DnCNN generate smoother images than the proposed TCFL-DnCNN, they (especially N2N-DnCNN) suffer from the over-smoothness problem, losing some structural details or blood flow information, as indicated by the arrows in Figs. 6 (d1) and (d2). In contrast, TCFL-DnCNN achieves a good balance between denoising and structure preservation, not only improving the visual quality but also preserving essential structures.

Table III presents the quantitative results of TCFL-DnCNN, OCTNet, and N2N-DnCNN on PKU37-validation. TCFL-DnCNN achieves the highest EPI and the lowest GCMSE,


Fig. 6. Denoising performance comparison of different methods on PKU37-validation. (a1) Clean. (a2) Noisy. (b1) BM3D. (b2) NLM. (c1) nonlocal-GAN. (c2) R2R-DnCNN. (c3) NBR-DnCNN. (c4) DRGAN. (c5) SPcycleGAN. (c6) Our TCFL-DnCNN. (d1) OCTNet. (d2) N2N-DnCNN.

both of which are sensitive to edges and texture. In terms of SNR, CNR, and SSIM, the proposed TCFL-DnCNN ranks second best. Although OCTNet and N2N-DnCNN achieve higher PSNR as a result of over-smoothness, the higher PSNR values do not mean that the denoised images have better structural and texture preservation. The quantitative results confirm our visual observation that TCFL-DnCNN tends to preserve more structural information. As for the inference time, as shown in Table III, both TCFL-DnCNN and N2N-DnCNN are faster than OCTNet.

TABLE III
Quantitative Comparison of the Proposed TCFL-DnCNN and Two Paired Methods on PKU37-Validation

D. Generalization Comparison

Medical imaging, including OCT, always suffers from the domain-shift problem, due to different scanning protocols, reconstruction methods, patients, etc. Thus, we also compared the generalization capabilities of the deep learning-based methods via two cross-domain tests. Specifically, the deep learning-based models were trained with PKU37-train and then tested over DUKE17 and DUKE28, to verify their capabilities of denoising "unseen" OCT images.
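The ROI-based SNR and CNR used in these comparisons can be computed along the following lines. The exact formulations follow [10], [15], [17]; the dB-scale variants and the toy ROIs below are assumptions for illustration:

```python
import numpy as np

def snr_db(signal_roi: np.ndarray, background_roi: np.ndarray) -> float:
    # One common OCT variant: peak signal power over background variance, in dB.
    return 10.0 * np.log10(signal_roi.max() ** 2 / background_roi.var())

def cnr_db(signal_roi: np.ndarray, background_roi: np.ndarray) -> float:
    # Contrast between a signal ROI and the background ROI, in dB.
    contrast = abs(signal_roi.mean() - background_roi.mean())
    return 10.0 * np.log10(contrast / np.sqrt(signal_roi.var() + background_roi.var()))

# Hypothetical ROIs: in the paper, three signal ROIs near the retinal layers are
# evaluated against one homogeneous background ROI and averaged.
signal = np.array([[9.0, 10.0], [10.0, 11.0]])
background = np.array([[0.0, 2.0], [0.0, 2.0]])
print(snr_db(signal, background), cnr_db(signal, background))
```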


TABLE IV
Quantitative Comparison of Generalization for the Proposed TCFL-DnCNN and Five Unpaired Methods on DUKE17 and DUKE28

Fig. 7. Denoising performance comparison of the cross-domain test with different deep learning-based methods on DUKE17. (a1) Clean. (a2) Noisy. (b1) nonlocal-GAN. (b2) R2R-DnCNN. (b3) NBR-DnCNN. (b4) DRGAN. (b5) SPcycleGAN. (b6) Our TCFL-DnCNN. (c1) OCTNet. (c2) N2N-DnCNN.

Fig. 7 and Fig. 8 illustrate denoising examples of the different methods over DUKE17 and DUKE28, respectively. The OCT images of DUKE28 are normal, while those of DUKE17 are pathological. All deep learning-based methods showed certain generalization capabilities. Similar to the denoising performance on PKU37-validation, OCTNet and N2N-DnCNN generate over-smoothed images which lose some structural information; nonlocal-GAN discards a lot of meaningful signals; many noise residuals remain in the images denoised by R2R-DnCNN and NBR-DnCNN; the images denoised by DRGAN have low contrast; and SPcycleGAN loses some structural details. In contrast, the proposed TCFL-DnCNN not only reduces the speckle noise effectively but also preserves more structural details (especially some layer information), as indicated by the arrows in Fig. 7 and Fig. 8.

Table IV presents the quantitative results of TCFL-DnCNN and the five unpaired reference methods on DUKE17 and DUKE28, where TCFL-DnCNN reaches the best scores in all metrics. Table V summarizes the quantitative results of TCFL-DnCNN, OCTNet, and N2N-DnCNN on DUKE17 and DUKE28. On DUKE17, compared with OCTNet and N2N-DnCNN, the EPI and GCMSE of TCFL-DnCNN are the best, and the SNR, CNR, and SSIM of TCFL-DnCNN achieve the second best. On DUKE28, TCFL-DnCNN yields better scores than OCTNet and N2N-DnCNN in EPI, SSIM, and GCMSE, which are sensitive to edges and texture. Although OCTNet and N2N-DnCNN are superior to TCFL-DnCNN in terms of SNR, CNR, and PSNR, they over-smooth the denoised images. These results reveal that the proposed TCFL-DnCNN has a generalization capability superior to all reference methods, in terms of reducing speckle noise, preventing over-smoothed images, and preserving structural details. This might be because the information interaction between the clean components and the noise components among the training samples is enhanced through our TCFL strategy, which helps deep learning networks better distinguish between the clean components and the noise components.

V. DISCUSSIONS

A. Loss Weight Setting and Ablation Studies

As indicated in (8) and (10), there are two loss weights, λ and μ, in our proposed loss function. We investigated


the influence of different loss weights on the TCFL-DnCNN denoising performance, and the related quantitative results over PKU37-validation, DUKE17, and DUKE28 are shown in Fig. 9. We first set μ to 1 and tried different λ from {2, 3, 4, 5, 6, 7, 8, 9, 10}. As illustrated in Fig. 9 (a), when λ = 6, TCFL-DnCNN achieves the best denoising performance in terms of PSNR, SSIM, and GCMSE. Then, we fixed λ to 6 and explored different μ from {0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4}. As shown in Fig. 9 (b), μ = 1 is the optimal value for TCFL-DnCNN. Thus, we set λ and μ to 6 and 1, respectively.

Fig. 8. Denoising performance comparison of the cross-domain test with different deep learning-based methods on DUKE28. (a1) Clean. (a2) Noisy. (b1) nonlocal-GAN. (b2) R2R-DnCNN. (b3) NBR-DnCNN. (b4) DRGAN. (b5) SPcycleGAN. (b6) Our TCFL-DnCNN. (c1) OCTNet. (c2) N2N-DnCNN.

TABLE V
Quantitative Comparison of Generalization for the Proposed TCFL-DnCNN and Two Paired Methods on DUKE17 and DUKE28

Fig. 9. The influence of different (a) λ and (b) μ on the TCFL-DnCNN denoising performance on PKU37-validation, DUKE17, and DUKE28.

As indicated in (10), there are four identity loss terms, L_iden_A, L_iden_B, L_iden_C1, and L_iden_C2. We also carried out ablation studies for the four identity loss terms. PKU37 was used for training and validation. It is noteworthy that during the ablation studies, λ and μ were fixed to 6 and 1, respectively. Table VI summarizes the quantitative results of different combinations of identity loss terms on PKU37-validation. Fig. 10 shows the related visual results.

From Table VI and Fig. 10, we can observe that when only L_iden_A or L_iden_B was used as L_iden (i.e., L_iden = L_iden_A or L_iden = L_iden_B), the denoising performance of TCFL-DnCNN is relatively poor. Taking L_iden = L_iden_A as an example, only I_A → N_A → C_A1 → I_A → C_A2 could form a closed-loop constraint by calculating the identity loss between C_A1 and C_A2, while there is no loss constraint for I_B → N_B → C_B1 → I_B → C_B2. As a result, it could not be guaranteed that N_B bears the desired speckle noise, and the subsequently synthesized I_A might contain some structural information of I_B.

When both L_iden_A and L_iden_B were adopted as L_iden (i.e., L_iden = L_iden_A + L_iden_B), the denoising performance of TCFL-DnCNN is significantly improved compared with the situations of L_iden = L_iden_A or L_iden = L_iden_B. This is because both I_A → N_A → C_A1 → I_A → C_A2 and I_B → N_B → C_B1 → I_B → C_B2 get closed-loop constraints.

By adding L_iden_C1 or L_iden_C2 to L_iden_A and L_iden_B (i.e., L_iden = L_iden_A + L_iden_B + μL_iden_C1 or L_iden = L_iden_A + L_iden_B + μL_iden_C2), the generator was able to look at real


clean images and generate more real-like clean images. Thus, the denoising performance could be further improved.

When all four identity loss terms were utilized as L_iden (i.e., L_iden = L_iden_A + L_iden_B + μ(L_iden_C1 + L_iden_C2)), the clean components and the noise components from the three unpaired images are cross-fused more diversely. The network could extract more speckle noise information. As a result, the noise could be reduced more effectively, and the structural details could be preserved better, achieving the best denoising performance. These results reveal that the identity loss is essential for the denoising performance of TCFL-DnCNN.

Fig. 10. Visual comparison of the ablation study for different identity loss combinations on PKU37-validation.

TABLE VI
Quantitative Results of the Ablation Study for Different Identity Loss Combinations on PKU37-Validation

B. Universality Study

In order to verify the universal applicability of the proposed TCFL strategy, besides DnCNN, we also implemented the strategy with two other widely-used CNNs as the noise predictor, i.e., U-Net [43] and SRDenseNet [44]. These three CNN models represent different network architectures: DnCNN, U-Net, and SRDenseNet correspond to the single-path, the U-shaped, and the multi-path architectures, respectively. In addition, we also compared the situations where CNNs are used to predict noise with the situations where CNNs are adopted to estimate clean images (content) directly, as shown in Fig. 11. That is, ablation studies for the residual learning were carried out.

Fig. 11. Model architecture comparison of a CNN as the noise predictor and as the content predictor.

Fig. 12. Denoising performance comparison of different CNNs as the noise (or content) predictor through the proposed TCFL strategy on PKU37-validation.

Fig. 12 shows the denoising examples of different CNNs as the noise (or content) predictor through the proposed TCFL strategy on PKU37-validation. Generally speaking, all three CNNs show effective denoising capabilities. DnCNN and U-Net outperform SRDenseNet in speckle noise reduction. Furthermore, all CNNs improve their denoising performance when used as noise predictors, especially SRDenseNet. That is, the effectiveness of the global residual learning has been verified. Table VII summarizes the quantitative results of the different CNNs on PKU37-validation, which support our visual observations. These studies have demonstrated the universality of the proposed TCFL strategy; that is, the strategy can be implemented with CNNs of different structures.

In addition to exploring the impact of different network architectures on the denoising performance of the TCFL strategy, we also compared the denoising performance of two TCFL-DnCNNs with different activation functions (i.e., Leaky ReLU and ReLU). In terms of quantitative metrics, when the activation function of TCFL-DnCNN was replaced from


TABLE VII
Quantitative Results of Different CNNs Under the Proposed TCFL Strategy on PKU37-Validation

TABLE VIII
Quantitative Results of TCFL-DnCNN Under Different Speckle Noise Levels on DUKE17

Leaky ReLU to ReLU, the PSNR decreased from 29.418 to 29.035, the SSIM decreased from 0.635 to 0.618, and the GCMSE increased from 7.723 to 8.984 on PKU37-validation. The visual denoising comparison is presented in Fig. 12. We can observe that although Leaky ReLU outperforms ReLU quantitatively, ReLU preserves more structural information visually. In the field of medical imaging, the macroscopic image appearance is often more informative than the commonly used quantitative denoising evaluation metrics. These results reveal that Leaky ReLU tends to obtain better quantitative results, while ReLU is preferable with respect to structure preservation.
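The global residual learning evaluated in this section, where the CNN acts as a noise predictor rather than a content predictor, can be sketched as follows; the one-layer `backbone` is only a placeholder for DnCNN, U-Net, or SRDenseNet:

```python
import torch
import torch.nn as nn

class NoisePredictorDenoiser(nn.Module):
    """Global residual learning: the backbone estimates the noise component,
    and the content is recovered as the input minus the predicted noise."""
    def __init__(self, backbone: nn.Module):
        super().__init__()
        self.backbone = backbone

    def forward(self, noisy: torch.Tensor) -> torch.Tensor:
        predicted_noise = self.backbone(noisy)  # CNN as noise predictor
        return noisy - predicted_noise          # denoised (content) image

# Placeholder backbone standing in for DnCNN / U-Net / SRDenseNet.
backbone = nn.Conv2d(1, 1, kernel_size=3, padding=1, bias=False)
denoiser = NoisePredictorDenoiser(backbone)
noisy = torch.randn(1, 1, 64, 64)
denoised = denoiser(noisy)  # same shape as the input
```

A content predictor would instead return `self.backbone(noisy)` directly, which is the alternative compared in Fig. 11.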

C. Robustness Analysis
1) Noise Robustness: It is necessary to explore the noise robustness of a trained deep learning network. Thus, we applied the TCFL-DnCNN trained on PKU37-train to a set of DUKE17 test scenarios with different noise levels. The simulation of pseudo-speckle-noise-corrupted OCT images followed [17], [45], where a parameter σ² is employed to control the noise level, and a larger σ² value corresponds to a noisier image. We tried different σ² from {0.02, 0.05, 0.1, 0.2}. Based on the clean images in DUKE17, four test sets with different noise levels were simulated.
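The exact pseudo-speckle simulation follows [17], [45]; the sketch below uses one common multiplicative model driven by σ², which is an assumption rather than the paper's exact formulation:

```python
import numpy as np

def add_pseudo_speckle(clean: np.ndarray, sigma2: float,
                       rng: np.random.Generator) -> np.ndarray:
    # Assumed multiplicative model: noisy = clean * (1 + n), n ~ N(0, sigma2).
    # A larger sigma2 yields a noisier image; intensities assumed in [0, 1].
    n = rng.normal(0.0, np.sqrt(sigma2), size=clean.shape)
    return np.clip(clean * (1.0 + n), 0.0, 1.0)

rng = np.random.default_rng(0)
clean = np.full((8, 8), 0.5)  # hypothetical clean patch
noisy_sets = {s2: add_pseudo_speckle(clean, s2, rng)
              for s2 in (0.02, 0.05, 0.1, 0.2)}  # the four tested noise levels
```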
Fig. 13 illustrates the simulated noisy OCT images and the corresponding denoised results by TCFL-DnCNN. It can be seen that the proposed TCFL-DnCNN can deal with different noise levels and restore the noisy images effectively. As the value of σ² increases, the quality of the simulated noisy images gets worse, and the denoising performance of TCFL-DnCNN exhibits slight decay. However, TCFL-DnCNN still denoises the simulated noisy images effectively even when σ² is 0.2. Table VIII presents the related quantitative results, which support the visual observations. These results demonstrate the noise robustness of the proposed TCFL strategy.

Fig. 13. The denoising performance of TCFL-DnCNN under different speckle noise levels on DUKE17.

2) Training With Limited Data: As we have elaborated in Section III, the TCFL-based network was trained using triplets, each including two noisy images and one clean image. In contrast, existing denoising methods used only one image (e.g., nonlocal-GAN [21], R2R-DnCNN [30], and NBR-DnCNN [31]) or two images (e.g., OCTNet [10], N2N-DnCNN [19], DRGAN [17], and SPcycleGAN [25]) for training. For example, the


denoising training set D contains n noisy images and m clean images. When the unpaired denoising networks are trained based on two unpaired noisy-clean images (i.e., a doublet), the number of all possible doublet combinations from D is C_n^1 × C_m^1 in combinatorics. As for the proposed TCFL-based networks, the number of all possible triplet combinations from D is C_n^2 × C_m^1, which is obviously larger than the number of doublet combinations. Thus, our strategy can deal with some limited training data scenarios. We compared TCFL-DnCNN with the other seven deep learning-based methods (i.e., OCTNet, N2N-DnCNN, nonlocal-GAN, R2R-DnCNN, NBR-DnCNN, DRGAN, and SPcycleGAN) over six limited training data scenarios, where the numbers of images in the training set are reduced to 75%, 50%, 25%, 10%, 5%, and 2%, respectively. For a fair comparison, the reference methods adopted some routine augmentation methods (including flipping, cropping, shifting, zooming, and some combinations) to keep the numbers of their training pairs consistent with that of TCFL-DnCNN.

Fig. 14. Quantitative results of different methods trained in limited training data scenarios. (a)-(c) correspond to PKU37-validation, (d)-(f) correspond to DUKE17, and (g)-(i) correspond to DUKE28.

Fig. 15. Denoising performance comparison of different methods in limited training data scenarios on DUKE28.

Fig. 14 presents the quantitative results of the different methods over PKU37-validation, DUKE17, and DUKE28. It can be seen that the supervised OCTNet and the weakly-supervised N2N-DnCNN outperform TCFL-DnCNN in some metrics at the beginning (i.e., 100% training data). However, as the proportion of training data decreases, TCFL-DnCNN shows only slight declines in all metrics, while the declines of the reference methods are much heavier. When the proportion is less than 50%, TCFL-DnCNN is superior to all reference methods in all quantitative metrics. Fig. 15 visually compares TCFL-DnCNN with the seven reference methods on DUKE28. In Fig. 15, all reference methods suffer from over-smoothness and structural loss problems as the proportion decreases. However, the proposed TCFL-DnCNN still denoises the OCT images effectively, even when the proportion is 2%. These results demonstrate the tolerance of the TCFL strategy to scenarios with limited training data.

D. Application in Pronounced Retinal Pathologies

In this work, three OCT denoising datasets were used. The clean OCT images are always obtained by multi-frame averaging, which is very time-consuming, so such images are not easy to obtain. The difficulty of acquiring clean OCT images limits the diversity of OCT denoising datasets to some extent, especially for patients with severe pathologies.
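The doublet-versus-triplet counting argument above can be checked numerically; the values of n and m here are illustrative:

```python
from math import comb

# Number of distinct training combinations from n noisy and m clean images.
n, m = 100, 10                       # illustrative dataset sizes
doublets = comb(n, 1) * comb(m, 1)   # one noisy + one clean image per sample
triplets = comb(n, 2) * comb(m, 1)   # two noisy + one clean image (TCFL triplet)
print(doublets, triplets)            # prints: 1000 49500
```

The quadratic growth of the triplet count in n is what lets the TCFL strategy keep generating diverse training samples even when the image set shrinks.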


Fig. 16. Visual comparison before and after TCFL-DnCNN denoising for two pronounced retinal pathologies (retinal fluid and vitreomacular traction syndrome) and a non-foveal region.

Fig. 17. Comparison of the segmentation results after denoising by various methods. (a1) Ground truth. (a2) No denoising. (b1) BM3D. (b2) NLM. (c1) nonlocal-GAN. (c2) R2R-DnCNN. (c3) NBR-DnCNN. (c4) DRGAN. (c5) SPcycleGAN. (c6) Our TCFL-DnCNN. (d1) OCTNet. (d2) N2N-DnCNN.

Thus, the three datasets used in this work do not comprise pronounced pathologies such as are frequently found in clinical routine, and they are restricted to the region of the fovea. In order to further verify the generalization capabilities of the proposed TCFL strategy, we applied TCFL-DnCNN trained on PKU37-train to denoise two noisy OCT images with pronounced retinal pathologies (i.e., retinal fluid and vitreomacular traction syndrome) and a noisy non-foveal OCT image, which were provided by our clinical collaborators. Fig. 16 shows the related visual comparison before and after TCFL-DnCNN denoising. It can be seen that TCFL-DnCNN improves the visual quality obviously. It is noteworthy that although PKU37-train does not contain pathological images, the trained TCFL-DnCNN can still effectively denoise OCT images with various pathologies, shapes, and structures. This is because the characteristics of the speckle noise in OCT images are largely determined by the imaging system rather than the samples [5].

E. Application in Retinal Layer Segmentation

Image denoising can be a pre-processing step for some high-level vision tasks (such as segmentation, detection, classification, etc.) to mitigate the deterioration due to noise. In order to further demonstrate the effectiveness of our TCFL strategy, we compared the influence of different OCT image denoising methods on the downstream retinal layer segmentation task. That is, we first trained different deep learning-based denoising networks using PKU37-train. Then we applied the trained deep denoisers and two traditional denoising algorithms to denoise the OCT images of a public retinal layer segmentation dataset [46]. Finally, we adopted the denoised OCT images as the inputs of a trained segmentation network (U-Net) to acquire the retinal layer segmentation results. In the retinal layer segmentation dataset [46], there are 220 image pairs from 20 subjects, and each pair consists of an OCT retinal image as the input and a retinal layer segmentation mask as the label. The layer segmentation masks include the background area, the total retina area, and the retinal pigment epithelium and drusen complex (RPEDC) area. Fig. 17 shows the visual segmentation results after denoising by the various methods. We can observe that the proposed TCFL-DnCNN achieves the most obvious improvement for the downstream segmentation task. Two quantitative evaluation metrics were calculated, including the pixel accuracy (PA) and the mean intersection over union (MIoU). Table IX presents the quantitative results, which also demonstrate the superiority of our TCFL-DnCNN in serving the downstream segmentation task.

F. Model Complexity

As mentioned above, most unpaired OCT image denoising works are based on cycleGAN frameworks, which have


at least two generators and two discriminators, whereas the proposed TCFL strategy contains only one generator and one discriminator. To better demonstrate that the model complexity of the TCFL strategy is much lower than those of the cycleGAN-based methods, we compare TCFL-DnCNN with two cycleGAN-based methods (i.e., DRGAN [17] and SPcycleGAN [25]) in terms of model parameters. As listed in Table X, the number of parameters of TCFL-DnCNN is significantly smaller than those of DRGAN and SPcycleGAN.

TABLE IX
Quantitative Results of the Retinal Layer Segmentation After Various Denoising Methods

TABLE X
Model Complexity Comparison of TCFL-DnCNN and Two CycleGAN-Based Deep Networks

G. Outlook

The proposed image denoising strategy is based on the assumption that the OCT speckle noise in logarithm space is additive and data-independent. In practice, considering factors such as OCT signal acquisition and image reconstruction, the speckle noise in logarithm space might not be completely signal-independent. As a result, the OCT images synthesized by the simple adding operation might contain a few structural residues from the other OCT images. However, these structural residues can be ignored by the deep learning networks, due to their powerful modeling capabilities. The above denoising results confirm this conclusion.

As for some other medical imaging modalities, such as low-dose CT, the noise is always data-dependent. Our strategy cannot deal with these situations with data-dependent noise. A novel synthetic method for the fake noisy images should be explored instead of the simple adding operation in (4) and (6). We will explore a more generalized unpaired medical image denoising strategy which can deal with both signal-independent and signal-dependent noise in future research.

Furthermore, the proposed TCFL can be extended to quadruple or quintuple cross-fusion learning. This brings two benefits. First, it will further enhance the information interaction among training samples by quadruple or quintuple cross-fusion. Second, quadruple or quintuple cross-fusion learning can deal with scenarios with even fewer training samples. We will conduct further studies in this direction.

VI. CONCLUSION

In this work, we propose a novel triplet cross-fusion learning (TCFL) strategy to denoise OCT images with unpaired data. Compared with cycleGAN-based methods, the model complexity of our strategy is reduced significantly. Three widely-used CNNs were adopted to implement the TCFL strategy, and three datasets were used for evaluation. The results demonstrate that the proposed strategy outperforms existing unpaired methods both qualitatively and quantitatively. Compared with existing supervised and weakly-supervised methods, the proposed strategy has competitive quantitative results and demonstrates better structure preservation capability visually. We have also verified the universality and generalization capability of the TCFL strategy. Furthermore, the TCFL strategy can deal with some limited training data scenarios.

REFERENCES

[1] A. M. Zysk, F. T. Nguyen, A. L. Oldenburg, D. L. Marks, and S. A. Boppart, "Optical coherence tomography: A review of clinical development from bench to bedside," J. Biomed. Opt., vol. 12, no. 5, 2007, Art. no. 051403.
[2] T. Klein, R. André, W. Wieser, T. Pfeiffer, and R. Huber, "Joint aperture detection for speckle reduction and increased collection efficiency in ophthalmic MHz OCT," Biomed. Opt. Exp., vol. 4, no. 4, pp. 619–634, 2013.
[3] B. Chong and Y.-K. Zhu, "Speckle reduction in optical coherence tomography images of human finger skin by wavelet modified BM3D filter," Opt. Commun., vol. 291, pp. 461–469, Mar. 2013.
[4] J. Aum, J.-H. Kim, and J. Jeong, "Effective speckle noise suppression in optical coherence tomography images using nonlocal means denoising filter with double Gaussian anisotropic kernels," Appl. Opt., vol. 54, no. 13, pp. D43–D50, 2015.
[5] F. Zaki, Y. Wang, H. Su, X. Yuan, and X. Liu, "Noise adaptive wavelet thresholding for speckle noise removal in optical coherence tomography," Biomed. Opt. Exp., vol. 8, no. 5, pp. 2720–2731, 2017.
[6] M. Geng et al., "Content-noise complementary learning for medical image denoising," IEEE Trans. Med. Imag., vol. 41, no. 2, pp. 407–419, Feb. 2022.
[7] M. Geng et al., "PMS-GAN: Parallel multi-stream generative adversarial network for multi-material decomposition in spectral computed tomography," IEEE Trans. Med. Imag., vol. 40, no. 2, pp. 571–584, Feb. 2021.
[8] B. Qiu et al., "N2NSR-OCT: Simultaneous denoising and super-resolution in optical coherence tomography images using semisupervised deep learning," J. Biophoton., vol. 14, no. 1, Jan. 2021, Art. no. e202000282.
[9] L. Zhu et al., "Synergistically segmenting choroidal layer and vessel using deep learning for choroid structure analysis," Phys. Med. Biol., vol. 67, no. 8, Apr. 2022, Art. no. 085001.
[10] M. Xu, C. Tang, F. Hao, M. Chen, and Z. Lei, "Texture preservation and speckle reduction in poor optical coherence tomography using the convolutional neural network," Med. Image Anal., vol. 64, Aug. 2020, Art. no. 101727.
[11] Z. Shen, M. Xi, C. Tang, M. Xu, and Z. Lei, "Double-path parallel convolutional neural network for removing speckle noise in different types of OCT images," Appl. Opt., vol. 60, no. 15, pp. 4345–4355, May 2021.
[12] Y. Ma, X. Chen, W. Zhu, X. Cheng, D. Xiang, and F. Shi, "Speckle noise reduction in optical coherence tomography images based on edge-sensitive cGAN," Biomed. Opt. Exp., vol. 9, no. 11, pp. 5129–5146, Nov. 2018.
[13] B. Qiu et al., "Noise reduction in optical coherence tomography images using a deep neural network with perceptually-sensitive loss function," Biomed. Opt. Exp., vol. 11, no. 2, pp. 817–830, 2020.


[14] A. K. Nilesh, D. Rupali, D. Ambedkar, and K. Y. Phaneendra, "SiameseGAN: A generative model for denoising of spectral domain optical coherence tomography images," IEEE Trans. Med. Imag., vol. 40, no. 1, pp. 180–192, Jan. 2021.
[15] M. Wang et al., "Semi-supervised capsule cGAN for speckle noise reduction in retinal OCT images," IEEE Trans. Med. Imag., vol. 40, no. 4, pp. 1168–1183, Apr. 2021.
[16] Y. Guo et al., "Structure-aware noise reduction generative adversarial network for optical coherence tomography image," in Proc. Int. Workshop Ophthalmic Med. Image Anal. Cham, Switzerland: Springer, 2019, pp. 9–17.
[17] Y. Huang et al., "Noise-powered disentangled representation for unsupervised speckle reduction of optical coherence tomography images," IEEE Trans. Med. Imag., vol. 40, no. 10, pp. 2600–2614, Oct. 2021.
[18] J. Lehtinen et al., "Noise2Noise: Learning image restoration without clean data," in Proc. ICML, 2018, pp. 1–12.
[19] B. Qiu et al., "Comparative study of deep neural networks with unsupervised Noise2Noise strategy for noise reduction of optical coherence tomography images," J. Biophoton., vol. 14, no. 11, Nov. 2021, Art. no. e202100151.
[20] Z. Jiang et al., "Weakly supervised deep learning-based optical coherence tomography angiography," IEEE Trans. Med. Imag., vol. 40, no. 2, pp. 688–698, Feb. 2021.
[21] A. Guo, L. Fang, M. Qi, and S. Li, "Unsupervised denoising of opti-
[31] T. Huang, S. Li, X. Jia, H. Lu, and J. Liu, "Neighbor2Neighbor: Self-supervised denoising from single noisy images," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2021, pp. 14781–14790.
[32] Y. Li, H. Wang, and X. Dong, "The denoising of desert seismic data based on cycle-GAN with unpaired data training," IEEE Geosci. Remote Sens. Lett., vol. 18, no. 11, pp. 2016–2020, Nov. 2020.
[33] J. Song, J.-H. Jeong, D.-S. Park, H.-H. Kim, D.-C. Seo, and J. C. Ye, "Unsupervised denoising for satellite imagery using wavelet directional CycleGAN," IEEE Trans. Geosci. Remote Sens., vol. 59, no. 8, pp. 6823–6839, Aug. 2021.
[34] J. Gu, T. S. Yang, J. C. Ye, and D. H. Yang, "CycleGAN denoising of extreme low-dose cardiac CT using wavelet-assisted noise disentanglement," Med. Image Anal., vol. 74, Dec. 2021, Art. no. 102209.
[35] S. Aumann, S. Donner, J. Fischer, and F. Müller, "Optical coherence tomography (OCT): Principle and technical realization," in High Resolution Imaging in Microscopy and Ophthalmology. Cham, Switzerland: Springer, 2019, pp. 59–85.
[36] H. M. Salinas and D. C. Fernandez, "Comparison of PDE-based nonlinear diffusion approaches for image enhancement and denoising in optical coherence tomography," IEEE Trans. Med. Imag., vol. 26, no. 6, pp. 761–771, Jun. 2007.
[37] B. Xu, N. Wang, T. Chen, and M. Li, "Empirical evaluation of rectified
cal coherence tomography images with nonlocal-generative adversarial activations in convolutional network,” 2015, arXiv:1505.00853.
network,” IEEE Trans. Instrum. Meas., vol. 70, pp. 1–12, 2021. [38] L. Fang, S. Li, Q. Nie, J. A. Izatt, C. A. Toth, and S. Far-
[22] J.-Y. Zhu, T. Park, P. Isola, and A. A. Efros, “Unpaired image-to-image siu, “Sparsity based denoising of spectral domain optical coherence
translation using cycle-consistent adversarial networks,” in Proc. IEEE tomography images,” Biomed. Opt. Exp., vol. 3, no. 5, pp. 927–942,
Int. Conf. Comput. Vis. (ICCV), Oct. 2017, pp. 2223–2232. May 2012.
[23] I. Manakov, M. Rohm, C. Kern, B. Schworm, K. Kortuem, and V. Tresp, [39] L. Fang et al., “Fast acquisition and reconstruction of optical coherence
“Noise as domain shift: Denoising medical images by unpaired image tomography images via sparse representation,” IEEE Trans. Med. Imag.,
translation,” in Domain Adaptation and Representation Transfer and vol. 32, no. 11, pp. 2034–2049, Nov. 2013.
Medical Image Learning With Less Labels and Imperfect Data. Cham, [40] A. Paszke et al., “PyTorch: An imperative style, high-performance deep
Switzerland: Springer, 2019, pp. 3–10. learning library,” in Proc. Adv. Neural Inf. Process. Syst., vol. 32,
[24] V. Das, S. Dandapat, and P. K. Bora, “Unsupervised super-resolution Dec. 2019, pp. 8026–8037.
of OCT images using generative adversarial network for improved [41] Z. Zhang, “Improved Adam optimizer for deep neural networks,” in
age-related macular degeneration diagnosis,” IEEE Sensors J., vol. 20, Proc. IEEE/ACM 26th Int. Symp. Quality Service (IWQoS), Jun. 2018,
no. 15, pp. 8746–8756, Aug. 2020. pp. 1–2.
[25] M. Wu, W. Chen, Q. Chen, and H. Park, “Noise reduction for SD-OCT [42] J. Lopez-Randulfe, C. Veiga, J. J. Rodriguez-Andina, and J. Farina,
using a structure-preserving domain transfer approach,” IEEE J. Biomed. “A quantitative method for selecting denoising filters, based on a new
Health Informat., vol. 25, no. 9, pp. 3460–3472, Sep. 2021. edge-sensitive metric,” in Proc. IEEE Int. Conf. Ind. Technol. (ICIT),
[26] K. Zhang, W. Zuo, Y. Chen, D. Meng, and L. Zhang, “Beyond a Mar. 2017, pp. 974–979.
Gaussian denoiser: Residual learning of deep CNN for image denoising,” [43] O. Ronneberger, P. Fischer, and T. Brox, “U-Net: Convolutional net-
IEEE Trans. Image Process., vol. 26, no. 7, pp. 3142–3155, Jul. 2017. works for biomedical image segmentation,” in Proc. Int. Conf. Med.
[27] A. M. Hasan, M. R. Mohebbian, K. A. Wahid, and P. Babyn, “Hybrid- Image Comput. Comput.-Assist. Intervent. Cham, Switzerland: Springer,
collaborative Noise2Noise denoiser for low-dose CT images,” IEEE 2015, pp. 234–241.
Trans. Radiat. Plasma Med. Sci., vol. 5, no. 2, pp. 235–244, Mar. 2021. [44] T. Tong, G. Li, X. Liu, and Q. Gao, “Image super-resolution using
[28] C. Tian, L. Fei, W. Zheng, Y. Xu, W. Zuo, and C.-W. Lin, “Deep learning dense skip connections,” in Proc. IEEE Int. Conf. Comput. Vis. (ICCV),
on image denoising: An overview,” Neural Netw., vol. 131, pp. 251–275, Oct. 2017, pp. 4799–4807.
Nov. 2020. [45] M. Li, R. Idoughi, B. Choudhury, and W. Heidrich, “Statistical model for
[29] J. Batson and L. Royer, “Noise2Self: Blind denoising by self- OCT image denoising,” Biomed. Opt. Exp., vol. 8, no. 9, pp. 3903–3917,
supervision,” in Proc. Int. Conf. Mach. Learn., 2019, pp. 524–533. 2017.
[30] T. Pang, H. Zheng, Y. Quan, and H. Ji, “Recorrupted-to- [46] S. J. Chiu, J. A. Izatt, R. V. O’Connell, K. P. Winter, C. A. Toth,
recorrupted: Unsupervised deep learning for image denoising,” in Proc. and S. Farsiu, “Validated automatic segmentation of AMD pathology
IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2021, including drusen and geographic atrophy in SD-OCT images,” Invest.
pp. 2043–2052. Ophthalmol. Vis. Sci., vol. 53, no. 1, pp. 53–61, 2012.

Authorized licensed use limited to: Hong Kong Polytechnic University. Downloaded on June 01,2024 at 05:15:34 UTC from IEEE Xplore. Restrictions apply.

You might also like