Path-Restore: Learning Network Path Selection For Image Restoration

You might also like

Download as pdf or txt
Download as pdf or txt
You are on page 1of 10

Path-Restore: Learning Network Path Selection for Image Restoration

Ke Yu1 Xintao Wang1 Chao Dong2 Xiaoou Tang1 Chen Change Loy3
1
CUHK - SenseTime Joint Lab, The Chinese University of Hong Kong
2
Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences
3
Nanyang Technological University, Singapore
{yk017, wx016, xtang}@ie.cuhk.edu.hk chao.dong@siat.ac.cn ccloy@ntu.edu.sg
arXiv:1904.10343v1 [cs.CV] 23 Apr 2019

Performance‐Complexity Trade‐Off
Abstract 3
ൈ10‐3
smooth region, mild noise
smooth region, severe noise
texture region, severe noise
Very deep Convolutional Neural Networks (CNNs)
have greatly improved the performance on various image
restoration tasks. However, this comes at a price of in- 2

creasing computational burden, which limits their practi-


MSE Loss zoom in
cal usages. We believe that some corrupted image regions
are inherently easier to restore than others since the dis-
tortion and content vary within an image. To this end, we 1

propose Path-Restore, a multi-path CNN with a pathfinder


that could dynamically select an appropriate route for each
image region. We train the pathfinder using reinforcement
learning with a difficulty-regulated reward, which is related 0
3 5 12 20
to the performance, complexity and “the difficulty of restor- Network Depth Input 5 layers 12 layers 20 layers

ing a region”. We conduct experiments on denoising and Figure 1. The relation between performance and network depth for
mixed restoration tasks. The results show that our method different image regions. The noise level is gradually increasing
could achieve comparable or superior performance to exist- from the left to the right side. The MSE loss is shown on the left
while visual results are presented on the right.
ing approaches with less computational cost. In particular,
our method is effective for real-world denoising, where the
noise distribution varies across different regions of a sin- Do we really need a very deep CNN to process all the
gle image. We surpass the state-of-the-art CBDNet [13] regions of a distorted image? In reality, a large variation of
by 0.94 dB and run 29% faster on the realistic Darmstadt image content and distortion may exist in a single image,
Noise Dataset [27]. Models and codes will be released.1 and some regions are inherently easier to process than oth-
ers. An example is shown in Figure 1, where we use several
denoising CNNs with different depth to restore a noisy im-
1. Introduction age. It is observed that a smooth region with mild noise in
the red box is well restored with a 5-layer CNN, and fur-
Image restoration aims at estimating a clean image from
ther increasing the network depth can hardly bring any im-
its distorted observation. Very deep Convolutional Neu-
provement. In the green box, a similar smooth region yet
ral Networks (CNNs) have achieved great success on var-
with severe noise requires a 12-layer CNN to process, and
ious restoration tasks. For example, in the NTIRE 2018
a texture region in the blue box still has some artifacts after
Challenge [32], the top-ranking super-resolution methods
being processed by a 20-layer CNN. This observation moti-
are built on very deep models such as EDSR [23] and
vates the possibility of saving computations by selecting an
DenseNet [15], achieving impressive performance even
optimal network path for each region based on its content
when the input images are corrupted with noise and blur.
and distortion.
However, as the network becomes deeper, the increasing
computational cost makes it less practical for real-world ap- In this study, we propose Path-Restore, a novel frame-
plications. work that can dynamically select a path for each image re-
gion with specific content and distortion. In particular, a
1 https://yuke93.github.io/Path-Restore/ pathfinder is trained together with a multi-path CNN to find
the optimal dispatch policy for each image region. Since solve image restoration problems.
path selection is non-differentiable, the pathfinder is trained
in a reinforcement learning (RL) framework driven by a re- 2. Related Work
ward that encourages both high performance and few com-
putations. There are mainly two challenges. First, the multi- CNN for Image Restoration. Image restoration has been
path CNN should be able to handle a large variety of dis- extensively studied for decades. CNN-based methods have
tortions with limited computations. Second, the pathfinder achieved dramatic improvement in several restoration tasks
requires an informative reward to learn a good policy for including denoising [22, 16, 6, 13], deblurring [38, 30, 26],
different regions with diverse contents and distortions. deblocking [8, 35, 11, 12] and super-resolution [9, 18, 19,
We make two contributions to address these challenges: 31, 21, 34].
Although most of the previous papers focus on address-
(1) We devise a dynamic block as the basic unit of our
ing a specific degradation, several pioneering works aim at
multi-path CNN. A dynamic block contains a shared path
dealing with a large variety of distortions simultaneously.
followed by several dynamic paths with different complex-
For example, Zhang et al. [40] propose DnCNN that em-
ity. The proposed framework allows flexible design on the
ploys a single CNN to address different levels of Gaussian
number of paths and path complexity according to the spe-
noise. Guo et al. [13] develop CBDNet that estimates a
cific task at hand.
noise map as a condition to handle diverse real noise. How-
(2) We devise a difficulty-regulated reward, which values
ever, processing all the image regions using a single path
the performance gain of hard examples more than simple
makes these methods less efficient. Smooth regions with
ones, unlike conventional reward functions. Specifically,
mild distortions do not need such a deep network to restore.
the difficulty-regulated reward scales the performance gain
Yu et al. [39] propose RL-Restore to address mixed dis-
of different regions by an adaptive coefficient that repre-
tortions. RL-Restore is the seminal work that applies deep
sents the difficulty of restoring this region. The difficulty is
reinforcement learning in solving image restoration prob-
made proportional to a loss function (e.g., MSE loss), since
lems. Their method adaptively selects a sequence of CNNs
a large loss suggests that the region is difficult to restore,
to process each distorted image, achieving comparable per-
such as the region in blue box of Figure 1. We experimen-
formance with a single deep network while using fewer
tally find that this reward helps the pathfinder learn a rea-
computational resources. Unlike RL-Restore that requires
sonable dispatch policy.
manual design for a dozen CNNs, we only use one net-
The proposed method differs significantly from several
work to achieve dynamic processing. Our method can thus
works that explore ways to process an input image dynam-
be more easily migrated and generalized to various tasks.
ically. Yu et al. [39] propose RL-Restore that dynamically
We also empirically find that our method runs faster and
selects a sequence of small CNNs to restore a distorted im-
produces more spatially consistent results than RL-Restore
age. Wu et al. [37] and Wang et al. [33] propose to dy-
when processing a large image. Moreover, our framework
namically skip some blocks in ResNet [14] for each im-
enjoys the flexibility to balance the trade-off between per-
age, achieving fast inference while maintaining good per-
formance and complexity by merely adjusting the reward
formance for the image classification task. These existing
function, which is not possible in Yu et al. [39].
dynamic networks treat all image regions equally, and thus
Dynamic Networks. Dynamic networks have been inves-
cannot satisfy our demand to select different paths for dif-
tigated to achieve a better trade-off between speed and per-
ferent regions. We provide more discussions in the related
formance in different tasks. Bengio et al. [5] use condi-
work section.
tional computation to selectively activate one part of the
We summarize the merits of Path-Restore as follows:
network at a time. Recently, several approaches [10, 37, 33]
1) Thanks to the dynamic path-selection mechanism, our
are proposed to save the computational cost in ResNet [14]
method achieves comparable or superior performance to ex-
by dynamically skipping some residual blocks. While the
isting approaches with faster speed on various restoration
aforementioned methods mainly address one single task,
tasks. 2) Our method is particularly effective in real-world
Rosenbaum et al. [28] develop a routing network to facili-
denoising, where the noise distribution is spatially variant
tate multi-task learning, and their method seeks an appropri-
within an image. We achieve the state-of-the-art perfor-
ate network path for each specific task. However, the path
mance on the realistic Darmstadt Noise Dataset (DND) [27]
selection is irrelevant to the image content in each task.
benchmark2 3) Our method is capable of balancing the per-
Existing dynamic networks successfully explore a rout-
formance and complexity trade-off by simply adjusting a
ing policy that depends on either the whole image content
parameter in the proposed reward function during training.
or the task to address. However, we aim at selecting dy-
4) After RL-Restore [39], this is another successful attempt
namic paths to process different regions of a distorted im-
of combining reinforcement learning and deep learning to
age. In this case, both the content (e.g., smooth regions or
2 https://noise.visinf.tu-darmstadt.de/benchmark/#results srgb rich textures) and the restoration task (e.g., distortion type
Input Output GT
1

Dynamic
Dynamic
Dynamic

Block
Block
Block
Conv

Conv
… … L2 loss

Pathfinder Reward
Conv

Conv

Conv

Conv
Forward

+ 1 Backward 



+

LSTM
Conv

Conv
Conv

Conv

Fc
Fc
Path
Pathfinder Selection
+

Figure 2. Framework Overview. Path-Restore is composed of a multi-path CNN and a pathfinder. The multi-path CNN contains N dynamic
blocks, each of which has M optional paths. The number of paths is made proportional to the number of distortion types we aim to address.
The pathfinder is able to dynamically select paths for different image regions according to their contents and distortions.

and severity) have a weighty influence on the performance, path, there are M dynamic paths denoted by f1i , f2i , . . . , fM
i
.
and they should be both considered in the path selection. Each dynamic path contains a residual block except that the
To address this specific challenge, we propose a dynamic first path is a bypass connection. According to the output of
block to offer diverse paths and adopt a difficulty-regulated pathfinder, the dynamic path with the highest probability is
reward to effectively train the pathfinder. The contributions activated, where the path index is denoted by ai . Therefore,
are new in the literature. the i-th dynamic block can be formulated as:

3. Methodology xi+1 = fai i (fSi (xi )), (1)

Our task is to recover a clear image y from its distorted where xi and xi+1 denote the input and output of the i-th
observation x. There might exist multiple types of distor- dynamic block, respectively. Note that in two different dy-
tions (e.g., blur and noise), and each distortion may have namic blocks, the parameters of each corresponding path
different degrees of severity. We wish that our method could are different, while the parameters of pathfinder are shared.
scale well to complex restoration tasks effectively and ef- As can be seen from Eq. (1), the image features go
ficiently. Therefore, the challenges are mainly two-fold. through the shared path and one dynamic path. The shared
First, the network should be capable of addressing a large path is designed to explore the similarity among different
variety of distortions efficiently. Second, the pathfinder re- tasks and images. The dynamic paths offer different op-
quires an informative reward to learn an effective dispatch tions of complexity. Intuitively, simple samples should be
policy based on the content and distortion of each image led to the bypass path to save computations, while hard ex-
region. amples might be guided to another path according to its
We propose Path-Restore to address the above chal- content and distortion. The number of dynamic paths can
lenges. We first provide an overview of the architecture of be designed based on the complexity of distortion types in a
Path-Restore in Sec. 3.1. In Sec. 3.2, we then introduce the specific task. We show various options in our experiments.
structure of pathfinder and the reward to evaluate its policy.
Finally, we detail the training algorithm in Sec. 3.3. 3.2. Pathfinder
The pathfinder is the key component to achieve path se-
3.1. Architecture of Path-Restore
lection. As shown in Figure 2, the pathfinder contains C
We aim to design a framework that can offer different op- convolutional layers, followed by two fully-connected lay-
tions of complexity. To this end, as shown in Figure 2, Path- ers with a Long Short-Term Memory (LSTM) module in the
Restore is composed of a convolutional layer at the start- middle. The number of convolutional layers depends on the
and end-point, and N dynamic blocks in the middle. In the specific task, which will be specified in Sec. 4. The LSTM
i
i-th dynamic block fDB , there is a shared path fSi that every module is used to capture the correlations of path selection
image region should pass through. Paralleling to the shared in different dynamic blocks. The pathfinder accounts for
path, a pathfinder fP F generates a probabilistic distribu- less than 3% of the overall computations.
tion of plausible paths for selection. Following the shared Since path selection is non-differentiable, we formulate
the sequential path selection as a Markov Decision Pro- Algorithm 1 REINFORCE (1 update)
cess (MDP) and adopt reinforcement learning to train the Get a batch of K image pairs, {x(1) , y (1) }, · · · , {x(K) , y (K) }
pathfinder. We will first clarify the state and action of this (k) (k) (k) (k) (k)
With policy π, derive (s1 , a1 , r1 , . . . , sN , aN , rN )
(k)

MDP, and then illustrate the proposed difficulty-regulated Compute gradients using REINFORCE
reward to train the pathfinder.
K N N
State and Action. In the i-th dynamic block, the state si 1 XX (k) (k)
X (k)
∆θ = ∇θ log π(ai |si ; θ)( rj − b(k) )
consists of the input features xi and the hidden state of K i=1 j=i
k=1
LSTM hi , denoted by si = {xi , hi }. Given the state si ,
the pathfinder fP F generates a distribution of path selec- Update parameters θ ← θ + β∆θ
tion, which can be formulated as fP F (si ) = π(a|si ). In the
training phase, the action (path index) is sampled from this
probabilistic distribution, denoted by ai ∼ π(a|si ). While 3.3. Training Algorithm
in the testing phase, the action is determined by the highest The training process is composed of two stages. In the
probability, i.e., ai = argmaxa π(a|xi ). first stage, we train the multi-path CNN with random pol-
Difficulty-Regulated Reward. In an RL framework, the icy of path selection as a good initialization. In the second
pathfinder is trained to maximize a cumulative reward, and stage, we then train the pathfinder and the multi-path CNN
thus a proper design of reward function is critical. Exist- simultaneously, so that the two components are better asso-
ing dynamic models for classification [37, 33] usually use a ciated and optimized.
trade-off between accuracy and computations as the reward. The First Stage. We first train the multi-path CNN with
However, in our task, a more effective reward is required randomly selected routes. The loss function consists of two
to learn a reasonable dispatch policy for different image parts. First, a final L2 loss constrains the output images to
regions that have diverse contents and distortions. There- have a reasonable quality. Second, an intermediate loss en-
fore, we propose a difficulty-regulated reward that not only forces the states observed by the pathfinder at each dynamic
considers performance and complexity, but also depends on block to be consistent. Specifically, the loss function is for-
“the difficulty of restoring an image region”. In particular, mulated as:
the reward at the i-th dynamic block is formulated as: N
X
 L1 = ||y − ŷ||22 +α ||y − flast (xi )||22 , (4)
−p(1 − 1{1} (ai )), 1 ≤ i < N, i=1
ri =
−p(1 − 1{1} (ai )) + d(−∆L2 ), i = N,
(2) where ŷ and y denote the output image and the ground truth,
where p is the reward penalty for choosing a complex path respectively. The N represents the number of dynamic
in one dynamic block. The 1{1} (·) represents an indica- blocks. The last convolutional layer of Path-Restore is de-
tor function and ai is the selected path index in the i-th noted by flast , and α is the weight of the intermediate loss.
dynamic block. When the first path (bypass connection) If no intermediate loss is adopted, the features observed at
is selected, i.e., ai = 1, the reward is not penalized. The different dynamic blocks may vary a lot from each other,
performance and difficulty are only considered in the N -th which makes it more difficult for the pathfinder to learn a
dynamic block. In particular, −∆L2 represents the perfor- good dispatch policy.
mance gain in terms of L2 loss. The difficulty is denoted by The Second Stage. We then train the pathfinder together
d in the following formula: with the multi-path CNN. In this stage, the multi-path CNN
is trained using the loss function in Eq. (4) with α = 0,
 i.e., intermediate loss does not take effect. Meanwhile, we
Ld /L0 , 0 ≤ Ld < L0 ,
d= (3) train the pathfinder using the REINFORCE algorithm [36],
1, Ld ≥ L0 ,
as shown in Algorithm 1. The parameters of pathfinder and
learning rate are denoted by θ and β, respectively. We adopt
where Ld is the loss function of the output image and L0
an individual baseline reward b(k) to reduce the variance of
is a threshold. As Ld approaches zero, the difficulty d de-
gradients and stabilize training. In particular, b(k) is the re-
creases, indicating the input region is easy to restore. In this
ward of the k-th image when the bypass connection is se-
case, the performance gain is regulated by d < 1 and the re-
lected in all dynamic blocks.
ward also becomes smaller, so that the proposed difficulty-
regulated reward will penalize the pathfinder for wasting 4. Experiments
too many computations on easy regions. There are multi-
ple choices for Ld such as L2 loss and VGG loss [17], and In this section, we first clarify the implementation de-
different loss functions may lead to different policy of path tails. In Sec. 4.1, we present the results of blind Gaus-
selection. sian denoising. In Sec. 4.2, we then show the capability
Table 1. Settings of architecture and reward for different tasks. 50 images in the DIV2K training set (denoted by DIV2K-
Architecture Reward
Task T50). In addition to uniform Gaussian noise, we also con-
N M C Threshold(L0 ) Penalty(p)
duct experiments on spatially variant Gaussian noise using
Denoising 6 2 2 5×10−4 8×10−6 the settings in [41]. Specifically, “linear” denotes that the
Mixed 5 4 4 0.01 4×10−5
Gaussian noise level gradually increases from 0 to 50 from
the left to the right side of an image. Another spatially vari-
of Path-Restore to address a mixed of complex distortions. ant noise, denoted by “peaks”, is obtained by translating
In Sec. 4.3, we further demonstrate that our method per- and scaling Gaussian distributions as in [41], and the noise
forms well on real-world denoising. Finally, in Sec. 4.4, we level also ranges from 0 to 50.
offer an ablation study to validate the effectiveness of the Quantitative and Qualitative Results. We report the
network architecture and the difficulty-regulated reward. PSNR performance and average FLOPs in Table 2. For
Network Architecture. As shown in Table 1, we select dif- both uniform and spatially variant denoising on different
ferent architectures for different restoration tasks. Specifi- datasets, Path-Restore is able to achieve about 26% speed-
cally, we employ N = 6 dynamic blocks and M = 2 paths up than DnCNN [40] while achieving comparable perfor-
for all the denoising tasks, and the pathfinder has C = 2 mance. The acceleration on DIV2K-T50 is slightly larger
convolutional layers. For the task to address mixed distor- than that on CBSD68, because high-resolution images in
tions, we adopt N = 5 dynamic blocks, M = 4 paths, DIV2K-T50 tend to have more smooth regions that can
and C = 4 convolutional layers in the pathfinder. These be restored in short network paths. When processing a
parameters are chosen empirically on the validation set for 512×512 image, the CPU runtime of DnCNN is 3.16s,
good performance and also for fair comparison with base- while the runtime of Path-Restore is 2.37s and 2.99s given
line methods. σ = 10 and σ = 50, respectively. In practice, Path-Restore
Training Details. As shown in Table 1, the threshold of is about 20% faster than DnCNN on average.
difficulty L0 and the reward penalty p are set based on the Several qualitative results of spatially variant (type
specific task. Generally it requires larger threshold and re- “peaks”) Gaussian denoising are shown in Figure 3. Al-
ward penalty when the distortions are more severe. For all though most of the restored images are visually compara-
tasks, the difficulty d is measured by L2 loss. In the first ble, Path-Restore achieves better results on some regions
training stage, the coefficient of intermediate loss α is set with fine-grained textures, since Path-Restore could select
as 0.1. Both training stages take 800k iterations. Training more complex paths to process hard examples with detailed
images are cropped into 63×63 patches and the batch size textures.
is 32. The learning rate is initially 2×10−4 and decayed by Path Selection. We first visualize the path selection for
a half after every quarter of the training process. We use uniform Gaussian denoising in Figure 4. The input noisy
Adam [20] as the optimizer and implement our method on images are shown in the first row, and the policy under p =
TensorFlow [1]. 8 × 10−6 and p = 5 × 10−6 are presented in the second and
Evaluation Details. During testing, each image is split into third row, respectively. The green color represents a short
63×63 regions with a stride 53. After being processed by and simple path while the red color stands for a long and
the multi-path CNN, all the regions are merged into a large complex path.
image with overlapping pixels averaged. In each task, we It is observed that a larger reward penalty p leads to
report the number of floating point operations (FLOPs) that a shorter path, as the pathfinder learns to avoid choosing
are required to process a 63×63 region on average. We also overly complex paths that have very high reward penalty.
record the CPU3 runtime to process a 512×512 image. Although the noise level is uniform within each image, the
4.1. Evaluation on Blind Gaussian Denoising pathfinder learns a dispatch policy that depends on the im-
age content. For example, the pathfinder focuses on pro-
Training and Testing Details. We first conduct exper- cessing the subject of an image (e.g., the airplane and pen-
iments on a blind denoising task where the distortion is guin in the left two images) while assigning the smooth
Gaussian noise with a wide range of standard deviation [0, background to simple paths.
50]. The training dataset includes the first 750 images of The dispatch policy for spatially variant (type “linear”)
DIV2K [3] training set (denoted by DIV2K-T750) and 400 Gaussian denoising are shown in Figure 5. As the noise
images used in [7]. For fair comparison, we use the offi- level gradually increases from left to right, the chosen paths
cially released codes to train DnCNN [40] with the same become more and more complex. This demonstrates that
training dataset as ours. Following [40], we test our model the pathfinder learns a dispatch policy based on distortion
on CBSD684 , and we further do evaluation on the remaining level. Moreover, on each blue dash line, the noise level re-
3 Intel(R) Xeon(R) CPU E5-2640 v3 @ 2.60GHz mains the same but the path selection is quite diverse, again
4 Color images of BSD68 dataset [29]. demonstrating the pathfinder could select short paths for
Table 2. PSNR and average FLOPs of blind Gaussian denoising on CBSD68 and DIV2K-T50 datasets. The unit of FLOPs is Giga (×109 ).
Path-Restore is consistently 25%+ faster than DnCNN with comparable performance on different noise settings.
Dataset CBSD68 [29] DIV2K-T50 [2]
uniform spatially variant uniform spatially variant
Noise
σ=10 σ=50 FLOPs linear peaks FLOPs σ=10 σ=50 FLOPs linear peaks FLOPs
DnCNN [40] 36.07 27.96 5.31G 31.17 31.15 5.31G 37.32 29.64 5.31G 32.82 32.64 5.31G
Path-Restore 36.04 27.96 4.22G 31.18 31.15 4.22G 37.26 29.64 4.20G 32.83 32.64 4.17G
DnCNN
Path‐Restore

Figure 3. Qualitative results of spatially variant (type “peaks”) Gaussian denoising. While most visual results are comparable, Path-Restore
recovers texture regions better than DnCNN since these regions are processed with long paths.
Input
10
8

long
10
5

short

Figure 4. The policy of path selection for uniform Gaussian denoising σ = 30. The pathfinder learns to select long paths for the objects
with detailed textures while process the smooth background with short paths.
Input Path selection Input Path selection Noise distribution
long
50

short
Figure 5. The policy of path selection for spatially variant (type “linear”) Gaussian denoising. The pathfinder learns a dispatch policy based
on both the content and the distortion, i.e., short paths for smooth and clean regions while long paths for textured and noisy regions.

smooth regions while choose long paths for texture regions. comparison, we test RL-Restore in two ways for large im-
ages: 1) each 63×63 region is processed using a specific
4.2. Evaluation on Mixed Distortions toolchain (denoted by RL-Restore-re), and 2) each 63×63
Training and Testing Details. We further evaluate our region votes for a toolchain, and the whole image is pro-
method on a complex restoration task as that in RL- cessed with a unified toolchain that wins the most votes (de-
Restore [39], where an image is corrupted by different lev- noted by RL-Restore-im).
els of Gaussian blur, Gaussian noise and JPEG compres- Quantitative Results. The quantitative results are shown
sion simultaneously. Following [39], we use DIV2K-T750 in Table 3. The results on 63×63 sub-images show that
for training and DIV2K-T50 for testing. We report results Path-Restore achieves a slightly better performance com-
not only on 63×63 sub-images as in [39] but also on the pared with RL-Restore. While testing on large images with
whole images with 2K resolution. To conduct a thorough 2K resolution, our method shows more apparent gain (0.2
Path‐Restore RL‐Restore‐im RL‐Restore‐re
26.18dB 27.31dB 24.04dB 30.08dB

25.98dB 26.78dB 24.32dB 30.04dB

26.55dB 27.59dB 24.51dB 30.35dB

Figure 6. Qualitative results of addressing mixed distortions. RL-Restore [39] tends to generate artifacts across different regions (see the
white arrow), while Path-Restore is able to handle diverse distortions and produce more spatially consistent results (zoom in for best view).

Table 3. Results of addressing mixed distortions on DIV2K- Table 4. Results of real-world denoising on the Darmstadt Noise
T50 [3] compared with two variants of RL-Restore [39]. Dataset [27]. Unpublished works are denoted by “*”.
Image size 63x63 image 2K image Method PSNR / SSIM FLOPs (G) Time (s)
Metric PSNR / SSIM PSNR / SSIM FLOPs(G) CBDNet [13] 38.06 / 0.9421 6.94 3.95
RL-Restore-re N/A 25.61 / 0.8264 1.34 NLH* 38.81 / 0.9520 N/A N/A
RL-Restore-im 26.45 / 0.5587 25.55 / 0.8251 0.948 DRSR v1.4* 39.09 / 0.9509 N/A N/A
Path-Restore 26.48 / 0.5667 25.81 / 0.8327 1.38 MLDN* 39.23 / 0.9516 N/A N/A
RU sRGB* 39.51 / 0.9528 N/A N/A
Path-Restore 39.00 / 0.9542 5.60 3.06
dB higher) than RL-Restore-re and RL-Restore-im with just Path-Restore-Ext 39.72 / 0.9591 22.6 12.6
a tiny increase in computational complexity (FLOPs). The
CPU runtime of RL-Restore-re, RL-Restore-im and Path-
sensor sizes.
Restore are 1.34s, 1.09s and 0.954s, respectively. These
Quantitative Results. The quantitative results are pre-
results show that Path-Restore runs faster than RL-Restore
sented in Table 4, where CBDNet [13] is the method
in practice. The reason is that RL-Restore needs extra time
that achieves the highest performance among all the pub-
to switch among different models while this overhead is not
lished works. Using the same training datasets as CBD-
needed in the unified framework – Path-Restore.
Net, Path-Restore significantly improves the performance
Qualitative Results. As shown in Figure 6, RL-Restore- by nearly 1 dB with less computational cost and CPU run-
re tends to produce inconsistent boundaries and appearance time. Path-Restore-Ext denotes an extended deeper model
between adjacent 63×63 regions, as each region may be trained with more data, which further improves 0.7 dB and
processed by entirely different models. RL-Restore-im fails currently achieves the state-of-the-art performance on the
to address severe noise or blur in some regions because only DND benchmark. More details are presented in the supple-
a single toolchain can be selected for the whole image. On mentary material.
the contrary, thanks to the dynamic blocks in a unified net- Qualitative Results. As shown in Figure 7, CBDNet fails
work, Path-Restore can not only remove the distortions in to remove the severe noise in dim regions and tends to gen-
different regions but also yield spatially consistent results. erate artifacts along edges. This may be caused by the in-
correct noise estimation for real-world images. On the con-
4.3. Evaluation on Real-World Denoising
trary, Path-Restore can successfully clean the noise in dif-
Training and Testing Details. We further conduct exper- ferent regions and recover sharp edges without artifacts, in-
iments on real-word denoising. The training data are con- dicating that our path selection is more robust to handle real
sistent with CBDNet [13] for a fair comparison. In par- noise compared with the noise estimation in CBDNet.
ticular, we use BSD500 [25] and Waterloo [24] datasets to
4.4. Ablation Study
synthesize noisy images, and use the same noise model as
CBDNet. A real dataset RENOIR [4] is also adopted for Architecture of Dynamic Block. We investigate the num-
training as CBDNet. We evaluate our method on the Darm- ber of paths in each dynamic block. For the task to address
stadt Noise Dataset (DND) [27]. The DND contains 50 re- mixed distortions, the number of paths is originally set as
alistic high-resolution paired noisy and noise-free images, 4 since there are 3 paths for various distortions and 1 by-
captured by four different consumer cameras with differing pass connection. As shown in Table 5, we alternatively use
Input CBDNet Path‐Restore Input CBDNet Path‐Restore

Figure 7. Qualitative results on the Darmstadt Noise Dataset [27]. Path-Restore recovers clean results with sharp edges.

Table 5. Ablation study on the number of paths in a dynamic block. WĞƌĨŽƌŵĂŶĐĞͲŽŵƉůĞdžŝƚLJdƌĂĚĞͲKĨĨ


DIV2K-T50 [2] ϯϮ͘ϵϱ ŝĨĨŝĐƵůƚLJͲƌĞŐƵůĂƚĞĚ
Dataset EŽŶͲƌĞŐƵůĂƚĞĚ
63×63 sub-image 2K image
ŶEE
Metric PSNR FLOPs (G) PSNR FLOPs (G) ϯϮ͘ϵϬ
4 paths 26.48 1.00 25.81 1.38 W^EZ
2 paths 26.46 1.09 25.80 1.48 ϯϮ͘ϴϱ

ϯϮ͘ϴϬ
2 paths to address the same task. It is observed that the
FLOPs increase by nearly 10% when achieving comparable ϯϮ͘ϳϱ
ϰ͘ϱ ϱ͘Ϭ ϱ͘ϱ ϲ͘Ϭ
performance with the original setting. &>KWƐ ϭĞϵ
Difficulty-Regulated Reward. A performance-complexity Figure 8. Performance-complexity trade-off vs. non-regulated re-
trade-off can be achieved by adjusting the reward penalty p ward and DnCNN for spatially variant (type “linear”) Gaussian
while training. The original setting is p = 8 × 10−6 for the denoising on DIV2K-T50 [2].
denoising task, and we gradually reduce p to 3 × 10−6 . As
shown in Figure 8, the blue curve depicts that PSNR and Input Difficulty‐regulated Non‐regulated
FLOPs both increase as the reward penalty decreases. This
is reasonable since a smaller reward penalty encourages the
pathfinder to select a longer path for each region. We also
adjust the depth of DnCNN to achieve the trade-off between
PSNR and FLOPs, which is shown in the green curve of
Figure 9. Path selection of regulated and non-regulated reward.
Figure 8. We observe that Path-Restore is consistently bet-
ter than DnCNN by 0.1 dB with the same FLOPs. When
achieving the same performance, our method is nearly 30%
5. Conclusion
faster than DnCNN.
We then study the impact of “difficulty” in the proposed We have devised a new framework for image restora-
reward function. In particular, we set difficulty d ≡ 1 tion with low computational cost. Combining reinforce-
in Eq. (3) to form a non-regulated reward. As shown ment learning and deep learning, we have proposed Path-
in Figure 8, the orange curve is consistently below the Restore that enables path selection for each image region.
blue curve, indicating that the proposed difficulty-regulated Specifically, Path-Restore is composed of a multi-path CNN
reward helps Path-Restore achieve a better performance- and a pathfinder. The multi-path CNN offers several paths
complexity trade-off. We further present the path-selection to address different types of distortions, and the pathfinder
policy of different rewards in Figure 9. Compared with is able to find the optimal path to efficiently restore each
non-regulated reward, our difficulty-regulated reward saves region. To learn a reasonable dispatch policy, we propose
more computations when processing easy regions (e.g., re- a difficulty-regulated reward that encourages the pathfinder
gion in the red box), while using a longer path to process to select short paths for easy regions. Path-Restore achieves
hard regions (e.g., region in the blue box). comparable or superior performance to existing methods
with much fewer computations on various restoration tasks. [16] V. Jain and S. Seung. Natural image denoising with convo-
Our method is especially effective for real-world denoising, lutional networks. In NIPS, 2009. 2
where the noise distribution is diverse across different im- [17] J. Johnson, A. Alahi, and L. Fei-Fei. Perceptual losses for
age regions. real-time style transfer and super-resolution. In ECCV, 2016.
4
References [18] J. Kim, J. Kwon Lee, and K. Mu Lee. Accurate image super-
resolution using very deep convolutional networks. In CVPR,
[1] M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, 2016. 2
M. Devin, S. Ghemawat, G. Irving, M. Isard, et al. Tensor- [19] J. Kim, J. K. Lee, and K. M. Lee. Deeply-recursive convolu-
flow: A system for large-scale machine learning. In 12th tional network for image super-resolution. In CVPR, 2016.
{USENIX} Symposium on Operating Systems Design and 2
Implementation ({OSDI} 16), pages 265–283, 2016. 5
[20] D. P. Kingma and J. Ba. Adam: A method for stochastic
[2] E. Agustsson and R. Timofte. Ntire 2017 challenge on single optimization. In ICLR, 2015. 5
image super-resolution: Dataset and study. In CVPRW, 2017.
[21] C. Ledig, L. Theis, F. Huszár, J. Caballero, A. Cunningham,
6, 8
A. Acosta, A. Aitken, A. Tejani, J. Totz, Z. Wang, et al.
[3] E. Agustsson and R. Timofte. Ntire 2017 challenge on sin- Photo-realistic single image super-resolution using a genera-
gle image super-resolution: Dataset and study. In CVPRW, tive adversarial network. In CVPR, 2017. 2
volume 3, page 2, 2017. 5, 7
[22] S. Lefkimmiatis. Non-local color image denoising with con-
[4] J. Anaya and A. Barbu. Renoir–a dataset for real low-light
volutional neural networks. In CVPR, 2017. 2
image noise reduction. Journal of Visual Communication
[23] B. Lim, S. Son, H. Kim, S. Nah, and K. M. Lee. Enhanced
and Image Representation, 51:144–154, 2018. 7
deep residual networks for single image super-resolution. In
[5] Y. Bengio, N. Léonard, and A. Courville. Estimating or prop-
CVPRW, volume 1, page 4, 2017. 1
agating gradients through stochastic neurons for conditional
computation. arXiv preprint arXiv:1308.3432, 2013. 2 [24] K. Ma, Z. Duanmu, Q. Wu, Z. Wang, H. Yong, H. Li, and
L. Zhang. Waterloo exploration database: New challenges
[6] J. Chen, J. Chen, H. Chao, and M. Yang. Image blind denois-
for image quality assessment models. IEEE Transactions on
ing with generative adversarial network based noise model-
Image Processing, 26(2):1004–1016, 2017. 7
ing. In Proceedings of the IEEE Conference on Computer
Vision and Pattern Recognition, pages 3155–3164, 2018. 2 [25] D. Martin, C. Fowlkes, D. Tal, and J. Malik. A database
of human segmented natural images and its application to
[7] Y. Chen and T. Pock. Trainable nonlinear reaction diffusion:
evaluating segmentation algorithms and measuring ecologi-
A flexible framework for fast and effective image restora-
cal statistics. In null, page 416. IEEE, 2001. 7
tion. IEEE transactions on pattern analysis and machine
intelligence, 39(6):1256–1272, 2017. 5 [26] S. Nah, T. H. Kim, and K. M. Lee. Deep multi-scale con-
volutional neural network for dynamic scene deblurring. In
[8] C. Dong, Y. Deng, C. Change Loy, and X. Tang. Compres-
CVPR, 2017. 2
sion artifacts reduction by a deep convolutional network. In
ICCV, 2015. 2 [27] T. Plotz and S. Roth. Benchmarking denoising algorithms
[9] C. Dong, C. C. Loy, K. He, and X. Tang. Image super- with real photographs. In CVPR, pages 1586–1595, 2017. 1,
resolution using deep convolutional networks. TPAMI, 2, 7, 8
38(2):295–307, 2016. 2 [28] C. Rosenbaum, T. Klinger, and M. Riemer. Routing net-
[10] M. Figurnov, M. D. Collins, Y. Zhu, L. Zhang, J. Huang, works: Adaptive selection of non-linear functions for multi-
D. P. Vetrov, and R. Salakhutdinov. Spatially adaptive com- task learning. In ICLR, 2017. 2
putation time for residual networks. In CVPR, volume 2, [29] S. Roth and M. J. Black. Fields of experts. International
page 7, 2017. 2 Journal of Computer Vision, 82(2):205, 2009. 5, 6
[11] J. Guo and H. Chao. Building dual-domain representations [30] J. Sun, W. Cao, Z. Xu, and J. Ponce. Learning a convolu-
for compression artifacts reduction. In ECCV, 2016. 2 tional neural network for non-uniform motion blur removal.
[12] J. Guo and H. Chao. One-to-many network for visually In CVPR, 2015. 2
pleasing compression artifacts reduction. arXiv preprint [31] Y. Tai, J. Yang, and X. Liu. Image super-resolution via deep
arXiv:1611.04994, 2016. 2 recursive residual network. In CVPR, 2017. 2
[13] S. Guo, Z. Yan, K. Zhang, W. Zuo, and L. Zhang. Toward [32] R. Timofte, S. Gu, J. Wu, L. V. Gool, L. Zhang, et al. Ntire
convolutional blind denoising of real photographs. In CVPR, 2018 challenge on single image super-resolution: Methods
2019. 1, 2, 7 and results. In CVPRW, 2018. 1
[14] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learn- [33] X. Wang, F. Yu, Z.-Y. Dou, and J. E. Gonzalez. Skipnet:
ing for image recognition. In Proceedings of the IEEE con- Learning dynamic routing in convolutional networks. In
ference on computer vision and pattern recognition, pages ECCV, 2018. 2, 4
770–778, 2016. 2 [34] X. Wang, K. Yu, S. Wu, J. Gu, Y. Liu, C. Dong, C. C.
[15] G. Huang, Z. Liu, L. Van Der Maaten, and K. Q. Weinberger. Loy, Y. Qiao, and X. Tang. Esrgan: Enhanced super-
Densely connected convolutional networks. In CVPR, vol- resolution generative adversarial networks. arXiv preprint
ume 1, page 3, 2017. 1 arXiv:1809.00219, 2018. 2
[35] Z. Wang, D. Liu, S. Chang, Q. Ling, Y. Yang, and T. S.
Huang. D3: Deep dual-domain based fast restoration of
JPEG-compressed images. In CVPR, 2016. 2
[36] R. J. Williams. Simple statistical gradient-following algo-
rithms for connectionist reinforcement learning. Machine
learning, 8(3-4):229–256, 1992. 4
[37] Z. Wu, T. Nagarajan, A. Kumar, S. Rennie, L. S. Davis,
K. Grauman, and R. Feris. Blockdrop: Dynamic inference
paths in residual networks. In CVPR, pages 8817–8826,
2018. 2, 4
[38] L. Xu, J. S. J. Ren, C. Liu, and J. Jia. Deep convolutional
neural network for image deconvolution. NIPS, 2014. 2
[39] K. Yu, C. Dong, L. Lin, and C. C. Loy. Crafting a toolchain
for image restoration by deep reinforcement learning. In
CVPR, pages 2443–2452, 2018. 2, 6, 7
[40] K. Zhang, W. Zuo, Y. Chen, D. Meng, and L. Zhang. Beyond
a gaussian denoiser: Residual learning of deep cnn for image
denoising. TIP, 2017. 2, 5, 6
[41] K. Zhang, W. Zuo, and L. Zhang. Ffdnet: Toward a fast
and flexible solution for cnn based image denoising. IEEE
Transactions on Image Processing, 2018. 5

You might also like