Intelligent Systems with Applications 18 (2023) 200203


Fast MRI reconstruction using StrainNet with dual-domain loss on spatial and frequency spaces

Worapan Kusakunniran a,*, Sarattha Karnjanapreechakorn a, Thanongchai Siriapisith b, Pairash Saiviroonporn b

a Faculty of Information and Communication Technology, Mahidol University, 999 Phuttamonthon 4 Road, Salaya, 73170, Nakhon Pathom, Thailand
b Department of Radiology, Faculty of Medicine Siriraj Hospital, Mahidol University, 2 Wanglang Road, Bangkok Noi, 10700, Bangkok, Thailand

Keywords: Fast MRI; Image reconstruction; Encoder-decoder; CNN; Frequency domain

Abstract

One of the main challenges in obtaining a high throughput in the MRI process is the slow signal acquisition. This process can be improved using a parallel imaging technique, where fewer raw data are acquired simultaneously with multiple radio frequency (RF) coils to reconstruct a final MR image. Nowadays, all multi-coil MRI machines include a parallel imaging technique for image reconstruction. However, parallel imaging still cannot accelerate sufficiently to reduce the overall acquisition time. Instead, this paper proposes a solution relying on a deep convolution neural network (CNN) to generate high-quality reconstructed MR images at higher acceleration factors. The proposed method, called StrainNet, performs the reconstruction by encoding the under-sampled data (i.e., from the sped-up acquisition) into high-level features. Then the key part of the network, called the Strainer, is applied to discard irrelevant information, and the remaining features are decoded back into reconstructed MR images. The proposed network can be trained end-to-end with a newly presented loss function, Dual-Domain Loss (DDL), combining both spatial and frequency losses. The experimental results are based on the fastMRI dataset and show that StrainNet outperforms the competing methods for both 4- and 8-fold accelerations.

1. Introduction

Being a non-invasive imaging technology makes Magnetic Resonance Imaging (MRI) a widespread diagnostic tool. A significant challenge of MRI is the overall processing time, which is inherently long when compared with other imaging tools, and thus leads to a lower patient throughput. So, current and ongoing MRI research focuses on increasing the speed of data acquisition and improving the reconstruction process.

Modern MRI machines use a multi-coil system with multiple radio frequency (RF) coils, because more than one RF coil can simultaneously record different parts of a target object, rather than scanning in a standard sequential order. However, to perform such a parallel process, specialized mathematical algorithms are required to combine multiple under-sampled MR images, from multiple RF coils, into a single fully-sampled MR image. That is where parallel imaging (PI) is used in the MRI process. Three well-known PI techniques are SENSitivity Encoding (SENSE) Pruessmann et al. (1999), Generalized Autocalibrating Partially Parallel Acquisitions (GRAPPA) Griswold et al. (2002), and Compressed Sensing (CS) Lustig et al. (2008). Due to their dominant performances, SENSE and GRAPPA are currently used in modern MRI machines.

The recent emergence of machine learning approaches to a variety of problems makes them a promising technique for the MR image reconstruction problem Hammernik et al. (2016), Wang et al. (2016), Hammernik et al. (2018). In addition, the excellent performance of deep learning technology has made it a gold standard for machine learning tasks. The deep learning technique suits the MR image reconstruction problem well, due to the expandable complexity of the network structure. As also mentioned by Liang et al. (2020), deep learning based approaches were shown to have a high potential for the task of MRI reconstruction.

Recently, several research works have applied deep learning technology to MR image reconstruction with a high acceleration factor and

* Corresponding author.
E-mail addresses: worapan.kun@mahidol.edu (W. Kusakunniran), j.sarattha@gmail.com (S. Karnjanapreechakorn), thanongchai@gmail.com (T. Siriapisith),
pairash.sai@gmail.com (P. Saiviroonporn).

https://doi.org/10.1016/j.iswa.2023.200203
Received 14 December 2022; Received in revised form 31 January 2023; Accepted 10 February 2023
Available online 14 February 2023
2667-3053/© 2023 The Author(s). Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

indicated promising results Zbontar et al. (2018), Zhu et al. (2018). However, large datasets are needed to compensate for the model complexity and prevent an overfitting phenomenon, in order to achieve high quality reconstruction results at a high acceleration rate. Therefore, Facebook AI Research (FAIR) collaborated with NYU Langone Health on a research project competition, fastMRI, to investigate the use of Artificial Intelligence (AI) to make MRI acquisition faster. The fastMRI competition provided large-scale raw MRI datasets with baseline results Zbontar et al. (2018).

Based on our literature review, the remaining research gap in speeding up the MRI reconstruction process using machine learning based approaches concerns the quality of reconstructed images, especially for 8-fold accelerations. Details in a reconstructed image are missing, when compared with a corresponding ground truth image, as can be seen in Fig. 17.

This paper introduces a deep convolutional neural network-based solution to reconstruct MR images. The proposed network, StrainNet, is developed based on the encoder-decoder structure Cho et al. (2014a, 2014b), Ronneberger et al. (2015). Depending on the problem domain, the encoder-decoder structure makes it easy to scale the complexity of both the encoder and decoder modules. It strives to reconstruct an MR image at a high acceleration rate. The raw MR data in the k-space contains a large amount of high-frequency signals, which the conventional encoder-decoder method cannot comprehend. The Strainer module of the proposed network is introduced to address this challenge. It is located between every encoder and decoder pair. Along with StrainNet, a new type of loss function is also proposed, called Dual-Domain Loss (DDL), to maximize the network performance. The proposed network is trained and evaluated using the fastMRI dataset Zbontar et al. (2018).

The main contributions of this research paper are as follows. First, the Strainer module is introduced and added in every layer between an encoder module and a corresponding decoder module. It helps in reducing small-size artifacts in the reconstruction process. It also enhances the extraction of multi-scale features, without expanding the complexity of the network. Second, a loss function is proposed to combine losses from both the spatial and frequency domains. It benefits the training process, since both domains are related to the reconstruction, where the original data is in the frequency space and the output is reconstructed in the spatial space. Third, the number of parameters of the proposed network is significantly smaller when compared with an original encoder-decoder based network. The reconstruction performance remains the same, but with faster training and inference processes. Fourth, the proposed method preserves more details in reconstructed images, when compared with existing methods, as shown in Figs. 13–18.

The rest of the paper is organized as follows. Background and related work are described in section 2. The proposed method is introduced in section 3. Experimental results are explained in section 4, and discussion and conclusion are stated in sections 5 and 6, respectively.

2. Background and related work

2.1. Parallel imaging

The slow process of MR acquisition is caused by the full amount of raw data in the frequency domain, i.e., k-space. The data needs to be acquired line by line until the raw MR data covers an entire field of view (FOV). With the old generation of single RF coil MRI machines, this accumulates into the total processing time of the MR acquisition. After the acquisition process, an MR image in the spatial domain can be constructed by applying an inverse Fourier transform function (F^-1) to the MR data in the frequency domain. The basis MR image in the Cartesian space x ∈ C^M, i.e., the spatial domain, is related to the k-space in the Cartesian space k ∈ C^M via the equation below.

k = F(x) + α    (1)

where F is the Fourier transform and α is the noise that can be measured by the MRI machine Baert (2007), Sriram et al. (2020).

However, modern MRI machines contain a multi-RF coil system that can scan different object parts concurrently. They acquire each coil's k-space data, which is modulated by the coil's own sensitivity to compute MR signals. Each coil's sensitivity varies with the distance between the coil and the object location. So the acquired MR images, after being transformed, look like a set of diverse images with inhomogeneous brightness. The k-space data acquired by the i-th coil is described below.

k_i = F(S_i x) + α_i,  i = 1, 2, ..., N    (2)

where i is the coil index, N is the total number of coils, and S_i is the i-th coil's sensitivity matrix. Then all of the coils' MR data are combined by a reconstruction algorithm as below.

I_recon = R(Σ_{i=1}^{N} k_i)    (3)

where I_recon is the reconstructed MR image and R is a reconstruction algorithm.

Since the MRI machine can acquire k-space data simultaneously with the multi-coil system, each coil's k-space contains some redundant information spread from neighboring RF coils. For this reason, all modern MRI machines can accelerate the acquisition process accordingly. This process reduces the number of acquired k-space lines (an under-sampled k-space), which leads to a decrease in acquisition time. However, when the under-sampled k-spaces are transformed back to the spatial domain, the result is MR images that contain alias artifacts Hamilton et al. (2017). Hence, the reconstruction algorithm R in equation (3) has to combine all coils' k-spaces. It also has to reduce or eliminate such artifacts by utilizing the spread of redundant data.

2.2. Image reconstruction

The state-of-the-art conventional reconstruction methods used in modern MRI machines can be divided into two main categories: SENSE and GRAPPA. These two reconstruction techniques use different types of data as inputs.

2.2.1. SENSE

Sensitivity encoding for fast MRI (SENSE) is a reconstruction method that operates in the spatial domain. So the acquired k-space data needs to be transformed into the spatial domain before SENSE is applied. However, SENSE requires another key input: the coils' sensitivity maps, which can be measured by a pre-scanning process in the MRI machine. This is needed because an MRI machine that uses SENSE for the reconstruction obtains aliased MR images with superimposed pixels. This phenomenon happens because increasing the spacing between sampled k-space lines reduces the FOV and causes aliasing. Each coil's superimposed MR image occurs with different weights according to the coil's sensitivity. Therefore, the coils' sensitivity maps can be used to solve equation (4) in order to reconstruct a clean and complete MR image Pruessmann et al. (1999), Baert (2007), Hamilton et al. (2017), Blaimer et al. (2004).

F = invert(S) × A    (4)

where F is the unfolded MR image matrix, S is the coils' sensitivity map matrix, and A is the aliased images matrix.

2.2.2. GRAPPA

Generalized autocalibrating partially parallel acquisitions, or GRAPPA, is a method that operates on local k-space data, unlike the SENSE method, by synthesizing missing k-space lines from redundant


Fig. 1. An overview structure of the encoder-decoder network.

information Griswold et al. (2002). Another difference from the SENSE method is that GRAPPA uses under-sampled k-spaces that preserve the central area data, called the Auto Calibration Signal (ACS). This ACS is used to calculate a GRAPPA weight (W) related to each missing k-space line. This can be done because the central information in the k-space corresponds to low spatial frequency information, which can be reconstructed into the overall structure of the MR image. Therefore, each missing k-space line can be filled with a linear combination, utilizing the GRAPPA weight and neighboring sampled k-space data from all coils, as a convolutional equation below Griswold et al. (2002), Baert (2007), Hamilton et al. (2017), Blaimer et al. (2004).

K_m = W ∗ K    (5)

where K_m is the missing k-space data and K is the known k-space data. After every coil's missing k-space data is filled, the inverse Fourier transform (F^-1) is used to transform all k-space data back to the spatial space. Then, the coils' MR images are combined into a single MR image using a sum-of-squares method.

2.3. Encoder-decoder convolution neural network

In recent years, many types of deep convolution neural networks have been adopted to solve challenging problems in medical related fields, as they can be scaled according to each problem's size and complexity. One standout network type is the encoder-decoder type. The encoder-decoder convolution neural network consists of two major parts: encoder and decoder. The encoder module performs as an image feature extractor that enables the network to learn an image's features. The decoder module then takes the extracted and known image features and transforms them back to a problem's requirement, e.g., a segmented object or a reconstructed image. Typically, the encoder-decoder network contains four encoder and decoder modules, but this can be varied with the problem's complexity. This type of architecture makes the encoder-decoder network robust for object segmentation, image reconstruction, and super-resolution problems.

Ronneberger et al. Ronneberger et al. (2015) proposed the state-of-the-art encoder-decoder convolutional neural network with a shape similar to the U character, called U-net, as shown in Fig. 1, for biomedical image segmentation. U-net achieves very high performance on various biomedical image segmentation problems. Each encoder module consecutively decreases an input image's size by two times. This process enables U-net to learn image features at multiple scales. Then these extracted features are passed to each corresponding decoder module at the same level of U-net. Each decoder module expands an input image's size by two times.

The U-net structure can achieve outstanding segmentation results, especially on medical images. Many research works have improved and extended the original U-net structure. Zhou et al. proposed an improved version of U-net, called U-net++, for medical image segmentation Zhou et al. (2018). U-net++ contains a larger number of dense layers and a series of nested feature extractors when compared with the original U-net. So, the number of trainable parameters of U-net++ increases significantly, and U-net++ can outperform the original U-net. Milletari et al. presented an encoder-decoder network whose structure looks similar to the V character instead of the U character Milletari et al. (2016). V-net also contains encoder and decoder modules in the same way as U-net, but each module is designed for volume-type input data, e.g., an MRI volume. Gu et al. proposed an enhanced U-net that changes the original encoder modules to a pre-trained ResNet-34 for more efficient feature extraction Gu et al. (2019).

Since the emergence of the parallel imaging technique for multi-coil MR machines, there have been many techniques for the reconstruction process, including the conventional techniques, SENSE and GRAPPA, and more advanced techniques that require additional input data Lustig et al. (2008), Hammernik et al. (2016, 2018), Donoho (2006). However, not many studies apply deep convolution neural networks to the MRI reconstruction process. For example, Zhu et al. presented a CNN-based solution that can transform and reconstruct under-sampled k-space data into fully-sampled MR images, producing promising reconstructed MR images Zhu et al. (2018). Recently, the baseline method provided in the fastMRI competition adopted U-net for MR image reconstruction on both knee and brain MR images Zbontar et al. (2018). These studies open up the application of deep CNNs to MR image reconstruction. Therefore, this paper proposes StrainNet, which also relies on


Fig. 2. An overview structure of the proposed StrainNet.

Fig. 3. An overview structure of the developed encoder module.

the encoder-decoder structure, similar to U-net, as a backbone with additional Strainer modules. In addition, a Dual-Domain loss function is introduced for better performance in the training process.

3. Proposed method

3.1. StrainNet

In this paper, StrainNet is proposed based on an encoder-decoder type of convolution neural network. It takes multi-coil under-sampled MR images as input and outputs a single reconstructed MR image. Fig. 2 shows the architecture of StrainNet, which consists of three parts across four levels in depth. First, the encoder part acts as a feature extractor in consecutive order. The encoder module also decreases an input image's size by half at each level, for multi-scale feature extraction. Second, the Strainer part receives the extracted features from the encoder and filters out non-relevant features. Then, the remaining features are passed to the third part, i.e., the decoder. The filtered features are then decoded and expanded consecutively until a reconstructed image has the same size as the input image. In this research work, StrainNet has four layers in depth. So, the encoder, Strainer, and decoder all exist in all four layers, as illustrated in Fig. 2, where the same colors refer to the same levels.

3.1.1. Encoder module

The encoder module is StrainNet's feature extractor, which takes multi-coil MR images as input data. Each encoder module extracts features and decreases the input images' sizes by half at every level for various-scale features. A deeper network can generate more high-level features, since the quality of extracted features is related to the number of feature extractors (i.e., filtering kernels) at each level. However, a large number of filtering kernels leads to a large number of trainable parameters, where the network can struggle to fit the input data samples. Eventually, the training process is affected by an overfitting phenomenon. Therefore, the encoder module in each level is carefully constructed based on a trade-off between the performance and the network complexity. The encoding process can be summarized as in the equation below.

x_encode,i = E_i(x_multicoil)    (6)

where E_i is an encoder module in the network at the i-th level and x_multicoil is a multi-coil under-sampled MR image.

As shown in Fig. 3, the encoder module structure displays the data flow through each sub-module. Each encoder module consists of two sub-module groups combining 3x3 convolutions, normalizations, Leaky ReLU activations, and dropouts. This structure is designed for the network's simplification and scalability. If more trainable parameters are needed at a level, more sub-module groups can be appended accordingly. Moreover, the filter number in the convolution layer can be adjusted based on the problem's complexity. The normalizations help the module to normalize extracted features into the same scale. The Leaky ReLU activation transforms the extracted features from the linear into the non-linear space, enhancing important features.

The dropout discards weakly extracted features. Finally, the input image's size is reduced by half for a smaller-scaled feature extraction,


using a max pooling. Then the extracted features are passed to the Strainer module and the next encoder module.

Fig. 4. An overview structure of the proposed Strainer module.

3.1.2. Strainer module

The Strainer module is placed between the encoder and the decoder parts inside StrainNet at every level, as shown in Fig. 2. The Strainer module enables the network to select good-quality extracted features at various scales. Moreover, the selected features are passed to the decoder module as extra input data for the MRI reconstruction. The feature selection process can be described below.

x_select,i = Str_i(E_i(x_multicoil))    (7)

where Str_i is the Strainer module in the network at the i-th level, and x_select,i is the selected features.

Fig. 4 demonstrates the structure of the Strainer module, which consists of two major components. They are concatenated with residual input data at the end. The first component is a group of multi-scale feature pooling called Residual Multi-Kernel Pooling (RMP) Gu et al. (2019). It acts as a strainer to discard low-quality features at multiple scales. Therefore, RMP is adapted to help StrainNet select good-quality features from mixed and complex multi-scale features. The RMP structure consists of four max-pooling kernels, 2x2, 3x3, 5x5, and 6x6, as shown in Fig. 5, for handling multi-scale features. It is then followed by a 1x1 convolution for dimension reduction. The selected features are concatenated with the residual feature map at the end.

The second component is a Zoom-In/Zoom-Out unit. It consists of three convolution layers with a residual concatenation. Zoom-In/Zoom-Out can be separated into two sub-units. The first is Zoom-In, which decreases the feature size at the second convolution with a zooming factor. The second sub-unit is Zoom-Out, which increases the feature size one step back. Then the features are passed to the third convolution and the concatenation, respectively. Zoom-In with a zooming factor enables StrainNet to learn roughly basic features from the encoder module at a lower level. Zoom-Out re-sizes the features to match the decoder module's required size at the same level.

The Zoom-In/Zoom-Out unit's flow helps the network to learn basic encoded features as additional data. Fig. 2 shows that the decoder module at the i-th level receives input features from the (i+1)-th level that have already been decoded. In addition, the basic undecoded features help each decoder module to learn how to reconstruct an output image from the selected good features. The zooming factor is determined by how much the feature size is decreased. This enables Zoom-In/Zoom-Out to learn more than one level of basic encoded features. Fig. 6 shows the diagram of the Zoom-In/Zoom-Out structure. For the simplicity of the experiment, Zoom-In/Zoom-Out is used with a zooming factor of 2. This lets the network use basic encoded features from one level below.

3.1.3. Decoder module

The decoder module is a process that reconstructs an image from the selected encoded features and increases the image's size consecutively. The decoding process can be described below.

x_recon,i = D_i(Str_i(E_i(x_multicoil)) + x_recon,i+1)    (8)

where D_i is the decoder module in the network at the i-th level, and x_recon,i is the reconstructed image. Equation (8) shows that features at the same network level are processed through encoding, filtering, and decoding steps. It also uses the reconstructed result from the level below, x_recon,i+1, as an input alongside the filtered features.

Fig. 7 displays the decoder module structure, which consists of two additional layers, a pixel shuffle and a convolution layer, besides the two groups of encoder sub-modules. The pixel shuffle is particularly used for increasing the features' sizes, instead of a convolution transpose or bilinear interpolation like other techniques Ronneberger et al. (2015), Zhou et al. (2018), Gu et al. (2019). Then, the additional convolutional layer is applied for dimension matching. Finally, the two groups of encoder sub-modules are used as the central reconstructor unit. Since both the encoder module and the decoder module use the same encoder sub-module, it can prevent the network from struggling to fit the input dataset.

The pixel shuffle was proposed by Shi et al. Shi et al. (2016) for image and video super-resolution. Unlike the conventional techniques, bi-cubic or linear interpolation, which usually use a mathematical algorithm, the pixel shuffle utilizes additional feature channels to increase the feature size, as shown in Fig. 8. The pixel shuffle is performed by taking features from multiple channels and shuffling them into a single channel with a larger size. As an example, as shown in Fig. 8, 2x2 features from 4 channels are shuffled into a 4x4 feature in a single channel. So, this step reduces the number of channels, but in exchange increases the feature size in a single channel.

Therefore, it can achieve a higher quality reconstructed image than interpolation methods or even the well-known transpose convolution. This is because, in the training process, the upsampling process is also trained by the network for the reconstruction process. Consequently, the network can learn to select the correct data from each channel to use as supplementary data for expanding the features' sizes. Also, a network that uses the pixel shuffle tends to output an image with fewer artifacts when compared with the other techniques.

3.2. Dual-domain loss

Fig. 9a and 9b show an under-sampled MR image and an artificial blurred MR image that is affected by a motion blur in the horizontal axis, respectively. It can be seen that the two images look close to each other in terms of the superimposed pixel phenomenon. When transforming the under-sampled k-space back to the spatial domain, it generates an aliased MR image, in which some pixels are folded into neighborhood areas. Similarly, in the blurred image, a group of pixels is linearly translated into the position of nearby pixels by the blur parameters Tiwari et al. (2013).

When magnitude spectrum images are created by applying the Fourier transform on both the under-sampled and artificial blurred MR images, as shown in Fig. 10a and 10b, they both have the same characteristics. There are visible light vertical lines that represent the superimposed pixel phenomenon. However, the magnitude spectrum image of the blurred MR image contains an equable spreading of light vertical lines, unlike the magnitude spectrum image of the under-sampled MR image, with its random spreading of light vertical lines due to the under-sampling process. These can be distinguished by the graph of the average magnitude intensity value along both magnitude spectrum images' horizontal axis (i.e., frequency).

The graph of the artificial blurred MR image, as shown in Fig. 11b, contains smoother lines between each peak than the under-sampled MR


Fig. 5. An overview structure of RMP.

Fig. 6. An overview structure of the proposed Zoom-In/Zoom-Out component.

Fig. 7. An overview structure of the decoder module.

Fig. 8. An overview structure of the pixel shuffle operation.

Fig. 9. Illustration of (a) the under-sampled MR image, and (b) the artificial blurred MR image.

Fig. 10. Illustration of (a) the magnitude spectrum image of the under-sampled MR image, and (b) the magnitude spectrum image of the artificial blurred MR image.

image (Fig. 11a). Each peak in the graph represents the visible light vertical lines in the magnitude spectrum images (Fig. 10a and 10b). When counting the number of peaks in the graph of the blurred MR image, it is equal to the number of blur parameters used to generate the blurred


Fig. 12. Examples of under-sampled k-space images for (a) SENSE and (b)
GRAPPA techniques and their corresponding under-sampled MR images (c) and
(d) respectively, after applying the inverse Fourier transform.

Fig. 11. Illustration of (a) the average magnitude intensity value along the
horizontal axis graph of the under-sampled spectrum MR image, and (b) the
This loss function enables the network to learn differences between
artificial blurred spectrum MR image. spatial reconstructed and ground truth MR images. However, the net-
work can also learn the differences in the frequency data space. This
will fill the missing data in the magnitude spectrum image (i.e., visi-
MR image (Fig. 9b), where the central peak can be counted as two ble light vertical lines). Eventually, it will lead to a decrease in aliased
peaks Tiwari et al. (2013), Moghaddam and Jamzad (2007). Therefore, artifacts in the reconstructed MR image.
the magnitude spectrum image can identify a motion blur and its pa- Since the conventional-based MR image reconstruction techniques
rameters. Then a de-blurring process can be applied by constructing the are operated on different domains, where SENSE is on the spatial do-
motion blur model from this knowledge Tiwari et al. (2013), Moghad- main, and GRAPPA is on the frequency domain. Therefore, the under-
dam and Jamzad (2007), Mayntx et al. (1999), Ji and Liu (2008). However, the under-sampled MR image, on both the magnitude spectrum image and its graph, is more complex than the blurred image, so a traditional modeling process such as de-blurring cannot be applied. The presented evidence shows that both the aliased and blurred MR images have similar spatial and frequency domain characteristics. Instead of modeling the aliased frequency data for a de-aliasing process as in motion de-blurring, this research proposes a new loss function, called Dual-Domain Loss (DDL), to utilize the sophistication of the deep convolutional neural network and its ability to learn non-linearly complex features for training StrainNet. DDL is a combination of two 𝐿1 losses on the spatial and frequency (i.e., magnitude spectrum) data spaces. The DDL equation is described below:

𝐷𝐷𝐿 = 𝛼1 𝐿1,𝑠𝑝𝑎𝑡𝑖𝑎𝑙 + 𝛼2 (𝜆𝐿1,𝑓𝑟𝑒𝑞𝑢𝑒𝑛𝑐𝑦) (9)

with

𝐿1 = ||𝑥𝑟𝑒𝑐𝑜𝑛 − 𝑥𝑔𝑡||1 (10)

where 𝑥𝑟𝑒𝑐𝑜𝑛 is the reconstructed MR image, 𝑥𝑔𝑡 is the ground truth MR image, 𝜆 is a rescaling factor for 𝐿1,𝑓𝑟𝑒𝑞𝑢𝑒𝑛𝑐𝑦, and 𝛼1 and 𝛼2 are the weights of the spatial and frequency 𝐿1 losses, respectively (such that 𝛼1 + 𝛼2 = 1).

sampling processes of these techniques are varied. SENSE requires an under-sampled k-space to be acquired by reducing the k-space lines equally, as shown in Fig. 12a. In contrast, GRAPPA requires the acquired k-space to be preserved in the central areas (i.e., ACS lines), along with equispaced reduction of the remaining k-space lines, for calculating GRAPPA's weights, as shown in Fig. 12b.

Consequently, transformed MR images from these two techniques are very diverse, as shown in Fig. 12c and 12d. Since the GRAPPA technique typically outputs higher-quality reconstructed MR images, most deep convolutional neural network techniques, including the proposed StrainNet, use the under-sampled k-space similar to the GRAPPA type Zbontar et al. (2018), Zhu et al. (2018). Therefore, StrainNet has to learn to reconstruct the GRAPPA-typed under-sampled MR images.

4. Experimental results

In this section, experiments are operated on the multi-coil knee MR images from the fastMRI dataset Zbontar et al. (2018), Knoll et al. (2020). The fastMRI dataset consists of fully sampled k-space data from 1,594 scans of four different MRI machines. All k-space data samples have 15 channels. The dataset consists of two pulse sequences: coronal proton-density weighting with fat suppression (PDFS, 798 scans)
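For illustration, the two ideas above — a GRAPPA-type mask that preserves central ACS lines while randomly dropping outer k-space columns, and the Dual-Domain Loss of Eqs. (9)–(10) — can be sketched in NumPy. The function names, the `center_fraction` value, and the use of a centered 2D FFT magnitude spectrum are assumptions for exposition, not the authors' exact implementation.

```python
import numpy as np

def grappa_type_mask(num_cols, acceleration=4, center_fraction=0.08, seed=0):
    # Keep a fully sampled central band (the ACS lines) and randomly keep
    # outer columns so that roughly 1/acceleration of k-space remains.
    rng = np.random.default_rng(seed)
    mask = np.zeros(num_cols, dtype=bool)
    num_center = int(round(num_cols * center_fraction))
    start = (num_cols - num_center) // 2
    mask[start:start + num_center] = True
    prob = (num_cols / acceleration - num_center) / (num_cols - num_center)
    outside = np.flatnonzero(~mask)
    mask[outside] = rng.random(outside.size) < prob
    return mask

def l1(a, b):
    # Eq. (10): mean absolute error between two images.
    return np.mean(np.abs(a - b))

def dual_domain_loss(x_recon, x_gt, alpha1=0.7, alpha2=0.3, lam=0.05):
    # Eq. (9): alpha1 * spatial L1 + alpha2 * (lam * frequency L1), with the
    # frequency term taken on the magnitude spectrum of a centered 2D FFT
    # (an assumption about the exact spectrum used).
    spatial = l1(x_recon, x_gt)
    mag = lambda x: np.abs(np.fft.fftshift(np.fft.fft2(x)))
    frequency = l1(mag(x_recon), mag(x_gt))
    return alpha1 * spatial + alpha2 * lam * frequency
```

The defaults reflect the paper's final setting of (𝛼1, 𝛼2) = (0.7, 0.3) and 𝜆 = 0.05; in a training framework the same formula would be written with that framework's tensor operations instead of NumPy.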
Table 1
Comparison between the proposed method and the existing networks' performance for the multi-coil knee task on the validation dataset.

Model                                    Acceleration   NMSE             PSNR           SSIM
                                                        PD      PDFS     PD     PDFS    PD      PDFS
fastMRI network (Zbontar et al., 2018)   4-fold         0.0054  0.0112   37.58  36.39   0.9287  0.8655
U-Net (Ronneberger et al., 2015)         4-fold         0.0046  0.0107   38.37  36.64   0.9354  0.8693
CE-Net (Gu et al., 2019)                 4-fold         0.2422  0.2015   20.96  23.64   0.6431  0.6756
U-Net++ (Zhou et al., 2018)              4-fold         0.0222  0.0259   31.35  32.94   0.8387  0.8289
StrainNet                                4-fold         0.0044  0.0107   38.60  36.67   0.9373  0.8695
fastMRI network (Zbontar et al., 2018)   8-fold         0.0120  0.0181   34.12  34.23   0.8915  0.8286
U-Net (Ronneberger et al., 2015)         8-fold         0.0087  0.0156   35.54  34.89   0.8997  0.8324
CE-Net (Gu et al., 2019)                 8-fold         0.2514  0.2045   20.79  23.55   0.6139  0.6658
U-Net++ (Zhou et al., 2018)              8-fold         0.0256  0.0384   30.73  31.27   0.8164  0.7836
StrainNet                                8-fold         0.0078  0.0150   36.01  35.09   0.9048  0.8351
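As a reference for how numbers like those in Table 1 are produced, NMSE and PSNR can be computed as below. This is a sketch following common fastMRI-style definitions (an assumption, since the paper does not spell out the exact formulas); SSIM is usually taken from a library such as scikit-image and is omitted here.

```python
import numpy as np

def nmse(gt, pred):
    # Normalized mean squared error: ||gt - pred||_2^2 / ||gt||_2^2.
    return float(np.linalg.norm(gt - pred) ** 2 / np.linalg.norm(gt) ** 2)

def psnr(gt, pred, peak=None):
    # Peak signal-to-noise ratio in dB; the peak value defaults to the
    # ground-truth maximum (an assumption about the convention used).
    peak = gt.max() if peak is None else peak
    mse = np.mean((gt - pred) ** 2)
    return float(10 * np.log10(peak ** 2 / mse))
```

Lower NMSE and higher PSNR/SSIM indicate a better reconstruction, which is how the rows of Tables 1 and 2 should be read.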
Table 2
Reconstruction results of the proposed method for each combination of 𝛼1 and 𝛼2 on the validation dataset.

𝛼1    𝛼2    NMSE             PSNR           SSIM
            PD      PDFS     PD     PDFS    PD      PDFS
0.0   1.0   0.3729  0.2022   19.00  23.69   0.3789  0.5468
0.1   0.9   0.0059  0.0118   37.19  36.15   0.9243  0.8611
0.2   0.8   0.0060  0.0119   37.09  36.11   0.9229  0.8607
0.3   0.7   0.0061  0.0120   37.02  36.07   0.9230  0.8607
0.4   0.6   0.0060  0.0118   37.11  36.13   0.9236  0.8609
0.5   0.5   0.0057  0.0115   37.39  36.26   0.9260  0.8627
0.6   0.4   0.0054  0.0114   37.62  36.33   0.9275  0.8631
0.7   0.3   0.0060  0.0118   37.13  36.15   0.9240  0.8615
0.8   0.2   0.0057  0.0116   37.38  36.24   0.9262  0.8628
0.9   0.1   0.0059  0.0117   37.19  36.19   0.9240  0.8618
1.0   0.0   0.0060  0.0118   37.12  36.16   0.9241  0.8618
and without fat suppression (PD, 796 scans). The knee images were cropped to a size of 320 × 320 pixels. All experiments use randomly masked k-space with both 4-fold and 8-fold accelerations to simulate the MRI machine's acceleration process.

The proposed StrainNet is compared with three existing networks for image reconstruction (see Table 1). The three networks used in the comparisons are:

• U-Net Ronneberger et al. (2015)
• CE-Net Gu et al. (2019)
• U-Net++ Zhou et al. (2018)

All three are state-of-the-art encoder-decoder networks that present good performances in many fields, e.g., image segmentation, image reconstruction, and image super-resolution. U-Net is a pinnacle and baseline for the encoder-decoder network Ronneberger et al. (2015), while CE-Net enhances the U-Net's encoder part by replacing the traditional encoder module with ResNet-34 He et al. (2016) and shows significantly improved results Gu et al. (2019).

On the other hand, U-Net++ is an improved version of U-Net that creates nested groups of the encoder-decoder module to form a more complex structure. U-Net++ shows better results on medical image segmentation when compared with U-Net Zhou et al. (2018). Therefore, it is promising to compare StrainNet with these networks. All experiments are trained for 100 epochs with the RMSProp algorithm to minimize the 𝐿1 loss for the three networks and the Dual-Domain Loss for StrainNet. Evaluation metrics used for measuring the performance are normalized mean squared error (NMSE), peak signal-to-noise ratio (PSNR), and structural similarity (SSIM).

Our experiment begins with finding optimum values for 𝛼1 and 𝛼2. However, to speed up this optimization step, a smaller-sized version of the proposed network (i.e., with a starting feature size of 1x320x320x32) is used in this process. This smaller version can significantly reduce the training time while preserving the same architecture's benefits. The experimental results for each combination of 𝛼1 and 𝛼2 are shown in Table 2.

In Table 2, it can be seen that the highest performance is obtained when 𝛼1=0.6 and 𝛼2=0.4. Then, the full-sized version of the proposed network is further investigated around this optimal point. It is then concluded that, with the full-sized version, the best combination is 𝛼1=0.7 and 𝛼2=0.3.

So, in the remaining experiments, the proposed StrainNet is trained on multi-coil under-sampled MR images with the Dual-Domain Loss, where (𝛼1, 𝛼2) and 𝜆 are empirically set to (0.7, 0.3) and 0.05, respectively. The three compared networks are trained with the root sum-of-squares (RSS), combining multi-coil images into single MR images, and the 𝐿1 loss. Experimental results are shown in Table 1, which compares StrainNet and the three networks with the other published fastMRI baseline results Zbontar et al. (2018). All networks are trained with the fastMRI training dataset and tested on the fastMRI validation dataset. The fastMRI testing dataset is not used because the fastMRI competition only provides the ground truth MR images (fully-sampled MR images) for the training and validation datasets. Table 1 presents the networks' performances on three metrics with two types of pulse sequences.

Examples of reconstructed MR images are shown in Fig. 13b - 14e and 15b - 16e for 4-fold and 8-fold accelerations, respectively, with both pulse sequences. Reconstructed images are shown with the ground-truth MR images for a comparison. Fig. 17b - 17f and 18b - 18f show some zoomed-in regions for 4-fold and 8-fold accelerations, respectively.

5. Discussion

The experimental results show promising reconstruction performances in both scenarios of 4-fold and 8-fold accelerations. Table 1
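For reference, the root sum-of-squares (RSS) combination mentioned above, which forms single-image training targets from multi-coil data for the compared networks, is a standard formulation and can be sketched as follows; the helper name and coil-axis convention are illustrative assumptions.

```python
import numpy as np

def rss(coil_images, coil_axis=0):
    # Root sum-of-squares: combine (possibly complex) multi-coil images
    # into a single magnitude image along the coil axis.
    return np.sqrt(np.sum(np.abs(coil_images) ** 2, axis=coil_axis))
```

Applied to a stack of 15 coil images of size 320 × 320, this yields one 320 × 320 magnitude image.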
Fig. 13. Sample reconstructions for 4-fold undersampling with the PD pulse sequence. They are (a) the Ground Truth image, (b) the U-Net reconstructed image, (c) the CE-Net reconstructed image, (d) the U-Net++ reconstructed image, and (e) the StrainNet reconstructed image.

Fig. 14. Sample reconstructions for 4-fold undersampling with the PDFS pulse sequence. They are (a) the Ground Truth image, (b) the U-Net reconstructed image, (c) the CE-Net reconstructed image, (d) the U-Net++ reconstructed image, and (e) the StrainNet reconstructed image.

Fig. 15. Sample reconstructions for 8-fold undersampling with the PD pulse sequence. They are (a) the Ground Truth image, (b) the U-Net reconstructed image, (c) the CE-Net reconstructed image, (d) the U-Net++ reconstructed image, and (e) the StrainNet reconstructed image.

Fig. 16. Sample reconstructions for 8-fold undersampling with the PDFS pulse sequence. They are (a) the Ground Truth image, (b) the U-Net reconstructed image, (c) the CE-Net reconstructed image, (d) the U-Net++ reconstructed image, and (e) the StrainNet reconstructed image.
shows that StrainNet outperforms the fastMRI baseline network and the other three networks in every evaluation metric on both 4-fold and 8-fold accelerations. Table 1 also shows that the improved U-Net type networks, including CE-Net and U-Net++, struggle to reconstruct the under-sampled MR image. This can be caused by replacing the standard encoder module with ResNet-34 in CE-Net and by the nested groups of encoder-decoder modules in U-Net++. These two networks present higher performances on other problems; however, in the MR image reconstruction problem, the traditional encoder-decoder module outperforms the improved versions. Although StrainNet achieved only slightly higher performance than U-Net on the 4-fold acceleration, StrainNet outperforms U-Net significantly for the 8-fold acceleration. The zoomed-in images in Fig. 17b - 17f and 18b - 18f also indicate that the reconstructed images from StrainNet preserve tiny details better than the other networks, with a superior contrast. This can be useful when diagnosing a disease. Besides, for future work, alternative solutions could be based on a type of generative adversarial network Sohan and Yousuf (2020). Also, the quality of reconstructed images could be further validated using a well-trained classification model, e.g., Aurna et al. (2022). This could confirm that details of diseases remain in the reconstructed images to some extent, if these images can be correctly classified.

6. Conclusion

In this paper, StrainNet was proposed for the MRI reconstruction. After masking k-space data, to simulate the acceleration process in the MRI machine, the multi-coil under-sampled k-space was transformed
Fig. 17. Sample reconstructions for 4-fold undersampling with the zoomed-in region. They are (a) the Ground Truth image with the highlighted zoomed area, (b) the Ground Truth zoomed image, (c) the U-Net zoomed reconstructed image, (d) the CE-Net zoomed reconstructed image, (e) the U-Net++ zoomed reconstructed image, and (f) the StrainNet zoomed reconstructed image.

Fig. 18. Sample reconstructions for 8-fold undersampling with the zoomed-in region. They are (a) the Ground Truth image with the highlighted zoomed area, (b) the Ground Truth zoomed image, (c) the U-Net zoomed reconstructed image, (d) the CE-Net zoomed reconstructed image, (e) the U-Net++ zoomed reconstructed image, and (f) the StrainNet zoomed reconstructed image.
back to the spatial under-sampled MR images. Then, StrainNet was used to combine the multi-coil under-sampled MR images and reconstruct the fully-sampled single MR image. The structure of StrainNet was based on the well-known encoder-decoder type of CNN that performs well on reconstruction problems. StrainNet also included the multi-layer pooling between the encoder and decoder modules on every level, besides the traditional encoder-decoder module. This could help the network eliminate irrelevant information while the under-sampled MR images were being trained. This research also proposed a new type of loss function, called Dual-Domain Loss (DDL), by adding the 𝐿1 loss on the frequency data to the standard 𝐿1 loss on the spatial data.

Reconstruction results of the proposed network on the multi-coil knee validation dataset could outperform U-Net and the other encoder-decoder networks in every metric, including NMSE, PSNR, and SSIM. StrainNet can preserve small information that is usually difficult to reconstruct, especially in the 4-fold acceleration.

Besides the MR image reconstruction, StrainNet can be adapted to other problems, such as de-blurring an image affected by motion blur or other types of blurring. The DDL adopts information from the frequency and spatial domains, which the two hyper-parameters can control. Moreover, the blurring effects can be explained in the frequency domain. Therefore, the proposed StrainNet can learn to apply this information for the de-blurring process.

The proposed method has the potential to achieve a better performance, especially on reconstructing details in the spatial domain from high-frequency data. To do this, it is necessary to increase the complexity of the proposed architecture. Therefore, it would need more computational resources and time for training the reconstruction model, since the number of parameters could be expanded from approximately 500 million to 1.5 billion.

CRediT authorship contribution statement

Worapan Kusakunniran: Conceptualization, Data curation, Formal analysis, Methodology, Software, Validation, Writing – original draft. Sarattha Karnjanapreechakorn: Conceptualization, Data curation, Formal analysis, Methodology, Software, Validation, Writing – original draft. Thanongchai Siriapisith: Conceptualization, Data curation, Formal analysis, Methodology, Validation, Writing – review & editing. Pairash Saiviroonporn: Conceptualization, Data curation, Formal analysis, Methodology, Validation, Writing – review & editing.
Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

No data was used for the research described in the article.

References

Aurna, N. F., Yousuf, M. A., Taher, K. A., Azad, A., & Moni, M. A. (2022). A classification of MRI brain tumor based on two stage feature level ensemble of deep CNN models. Computers in Biology and Medicine, 146, Article 105539.
Baert, A. (2007). Parallel imaging in clinical MR applications. Springer Science & Business Media.
Blaimer, M., Breuer, F., Mueller, M., Heidemann, R. M., Griswold, M. A., & Jakob, P. M. (2004). SMASH, SENSE, PILS, GRAPPA: How to choose the optimal method. Topics in Magnetic Resonance Imaging, 15, 223–236.
Cho, K., Van Merriënboer, B., Bahdanau, D., & Bengio, Y. (2014a). On the properties of neural machine translation: Encoder-decoder approaches. arXiv preprint, arXiv:1409.1259.
Cho, K., Van Merriënboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., & Bengio, Y. (2014b). Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint, arXiv:1406.1078.
Donoho, D. L. (2006). Compressed sensing. IEEE Transactions on Information Theory, 52, 1289–1306.
Griswold, M. A., Jakob, P. M., Heidemann, R. M., Nittka, M., Jellus, V., Wang, J., Kiefer, B., & Haase, A. (2002). Generalized autocalibrating partially parallel acquisitions (GRAPPA). Magnetic Resonance in Medicine, 47, 1202–1210.
Gu, Z., Cheng, J., Fu, H., Zhou, K., Hao, H., Zhao, Y., Zhang, T., Gao, S., & Liu, J. (2019). CE-net: Context encoder network for 2D medical image segmentation. IEEE Transactions on Medical Imaging, 38, 2281–2292.
Hamilton, J., Franson, D., & Seiberlich, N. (2017). Recent advances in parallel imaging for MRI. Progress in Nuclear Magnetic Resonance Spectroscopy, 101, 71–95.
Hammernik, K., Klatzer, T., Kobler, E., Recht, M. P., Sodickson, D. K., Pock, T., & Knoll, F. (2018). Learning a variational network for reconstruction of accelerated MRI data. Magnetic Resonance in Medicine, 79, 3055–3071.
Hammernik, K., Knoll, F., Sodickson, D., & Pock, T. (2016). Learning a variational model for compressed sensing MRI reconstruction. In Proceedings of the international society of magnetic resonance in medicine (ISMRM) (p. 1088).
He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 770–778).
Ji, H., & Liu, C. (2008). Motion blur identification from image gradients. In 2008 IEEE conference on computer vision and pattern recognition (pp. 1–8). IEEE.
Knoll, F., Murrell, T., Sriram, A., Yakubova, N., Zbontar, J., Rabbat, M., Defazio, A., Muckley, M. J., Sodickson, D. K., Zitnick, C. L., & Recht, P. M. (2020). Advancing machine learning for MR image reconstruction with an open competition: Overview of the 2019 fastMRI challenge. Magnetic Resonance in Medicine, 84, 3054–3070.
Liang, D., Cheng, J., Ke, Z., & Ying, L. (2020). Deep magnetic resonance image reconstruction: Inverse problems meet neural networks. IEEE Signal Processing Magazine, 37, 141–151.
Lustig, M., Donoho, D. L., Santos, J. M., & Pauly, J. M. (2008). Compressed sensing MRI. IEEE Signal Processing Magazine, 25, 72–82.
Mayntx, C., Aach, T., & Kunz, D. (1999). Blur identification using a spectral inertia tensor and spectral zeros. In Proceedings 1999 international conference on image processing (cat. 99CH36348) (pp. 885–889). IEEE.
Milletari, F., Navab, N., & Ahmadi, S. A. (2016). V-net: Fully convolutional neural networks for volumetric medical image segmentation. In 2016 fourth international conference on 3D vision (3DV) (pp. 565–571). IEEE.
Moghaddam, M. E., & Jamzad, M. (2007). Motion blur identification in noisy images using mathematical models and statistical measures. Pattern Recognition, 40, 1946–1957.
Pruessmann, K. P., Weiger, M., Scheidegger, M. B., & Boesiger, P. (1999). SENSE: Sensitivity encoding for fast MRI. Magnetic Resonance in Medicine, 42, 952–962.
Ronneberger, O., Fischer, P., & Brox, T. (2015). U-net: Convolutional networks for biomedical image segmentation. In International conference on medical image computing and computer-assisted intervention (pp. 234–241). Springer.
Shi, W., Caballero, J., Huszár, F., Totz, J., Aitken, A. P., Bishop, R., Rueckert, D., & Wang, Z. (2016). Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 1874–1883).
Sohan, K., & Yousuf, M. A. (2020). 3D bone shape reconstruction from 2D X-ray images using med generative adversarial network. In 2020 2nd international conference on advanced information and communication technology (ICAICT) (pp. 53–58). IEEE.
Sriram, A., Zbontar, J., Murrell, T., Zitnick, C. L., Defazio, A., & Sodickson, D. K. (2020). GrappaNet: Combining parallel imaging with deep learning for multi-coil MRI reconstruction. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 14315–14322).
Tiwari, S., Shukla, V. P., Singh, A., & Biradar, S. (2013). Review of motion blur estimation techniques. Journal of Image and Graphics, 1, 176–184.
Wang, S., Su, Z., Ying, L., Peng, X., Zhu, S., Liang, F., Feng, D., & Liang, D. (2016). Accelerating magnetic resonance imaging via deep learning. In 2016 IEEE 13th international symposium on biomedical imaging (ISBI) (pp. 514–517). IEEE.
Zbontar, J., Knoll, F., Sriram, A., Murrell, T., Huang, Z., Muckley, M. J., Defazio, A., Stern, R., Johnson, P., Bruno, M., et al. (2018). fastMRI: An open dataset and benchmarks for accelerated MRI. arXiv preprint, arXiv:1811.08839.
Zhou, Z., Rahman Siddiquee, M. M., Tajbakhsh, N., & Liang, J. (2018). UNet++: A nested U-net architecture for medical image segmentation. In Deep learning in medical image analysis and multimodal learning for clinical decision support (pp. 3–11). Springer.
Zhu, B., Liu, J. Z., Cauley, S. F., Rosen, B. R., & Rosen, M. S. (2018). Image reconstruction by domain-transform manifold learning. Nature, 555, 487–492.