Training-Free, Single-Image Super-Resolution Using a Dynamic Convolutional Network
Abstract—The typical approach for solving the problem of single-image super-resolution (SR) is to learn a nonlinear mapping between the low-resolution (LR) and high-resolution (HR) representations of images in a training set. Training-based approaches can be tuned to give high accuracy on a given class of images, but they call for retraining if the HR → LR generative model deviates or if the test images belong to a different class, which limits their applicability. In contrast, we propose a solution that does not require a training dataset. Our method relies on constructing a dynamic convolutional network (DCN) to learn the relation between consecutive scales of the Gaussian and Laplacian pyramids. The relation is in turn used to predict the detail at a finer scale, thus leading to SR. Comparisons with state-of-the-art techniques on standard datasets show that the proposed DCN approach results in about 0.8 and 0.3 dB gain in peak signal-to-noise ratio for 2× and 3× SR, respectively. The structural similarity index is on par with that of the competing techniques.

Index Terms—Convolutional neural network (CNN), deep learning, dynamic convolutional network (DCN), Gaussian/Laplacian pyramids, super-resolution (SR).

I. INTRODUCTION

SINGLE-IMAGE super-resolution (SR) is an important tool in applications such as biomedical imaging [1] and face hallucination [2]. In single-image SR [3], [4], one either infers local image properties from the low-resolution (LR) image or learns them over a collection of given high-resolution (HR)–LR pairs. The learning approaches employ dictionaries or neural networks, which capture the LR–HR association. In this letter, we develop a technique that infers HR image features starting from the LR image without going through the standard process of training, obviating the need for a training dataset. Before proceeding further, we review recent techniques that specifically tackle the single-image SR problem and highlight their strengths and weaknesses. Some of these techniques will be used for making performance comparisons. A thorough review of single-image SR techniques is available in [5].

A. Related Literature

A landmark contribution was made recently by Yang et al. [6], [7], who trained dictionaries for LR and HR image patches. The key assumption is that the LR and HR patches have the same sparse representation in their respective dictionaries. The sparse representation corresponding to an LR patch from an unseen image is used to synthesize the corresponding HR patch, thus leading to SR. With this as the central idea, Yang et al. [8] also developed a coupled LR/HR dictionary model optimized using a joint cost function. Kim and Kwon used kernel ridge regression and incorporated image priors to suppress ringing artifacts [9]. Timofte et al. proposed anchored neighborhood regression [10], in which they solve a ridge regression problem with neighborhood constraints, leading to a closed-form solution for the regression coefficients. This approach is fast and gives qualitatively the same results as the competing techniques. In [11], they developed an advanced version of the algorithm, which learns from the patches in the local neighborhood of the anchor patch from the training dataset and not from the dictionary. These were by far the best performing techniques before the advent of neural-network SR approaches.

Within the learning paradigm, the SR problem is essentially posed as one of discovering the nonlinear association between the LR and HR patches. Dong et al. used a convolutional neural network (CNN) [12], [13] to learn the end-to-end mapping between LR and HR pairs [14], [15]. The training is data-intensive and time-consuming, whereas the run-time complexity is low, leading to fast SR. Recently, Dong et al. proposed a threefold acceleration strategy:
1) introducing a deconvolution layer at the end of the CNN;
2) reducing the dimensionality of the input feature; and
3) employing smaller filter sizes,
all of which resulted in a 40× speed-up. This method is referred to as fast SRCNN [16]. Shi et al. developed an efficient subpixel CNN to perform SR [17]. The early layers operate on the LR image, whereas the final layer relies on subpixel convolution to upscale the image. Kim et al. proposed a very deep CNN architecture that learns the residual images instead of the HR image [18]. They showed that the very deep SR technique overcomes the limitations of SRCNN. They also proposed a method called deep recursive CNN, which incorporates skip connections between each hidden layer and the output layer.
$$d_0 = x_0 - \hat{x}_0 = (I_1 - \hat{G}_1 U_1 L_1 G_1)\, x_0, \qquad (3)$$

where $I_1$ is the identity matrix. Similarly,

$$d_1 = (I_2 - \hat{G}_2 U_2 L_2 G_2)\, x_1. \qquad (4)$$

From (1), (3), and (4), we get an expression for the coarser-level difference image in terms of the finer-level one as

$$d_1 = \underbrace{(I_2 - \hat{G}_2 U_2 L_2 G_2)\, L_1 G_1\, (I_1 - \hat{G}_1 U_1 L_1 G_1)^{-1}}_{P_1}\, d_0.$$

$P_1$ is rank-deficient and hence not invertible, making the problem of recovering $d_0$ given $P_1$ and $d_1$ ill-posed. Moreover, $P_1$ is a huge matrix, of size $mn \times mn$ for an $m \times n$ image. For instance, for a $256 \times 256$ image, storing $P_1$ alone would require about 16 GB of memory in double-precision representation. The high demand on memory makes it impractical to handle the image all at once. The mapping between $d_1$ and $d_0$ is therefore modeled using a DCN, and the filter-sets are learnt based on the input LR image.
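To make the pyramid relation concrete, the following is a minimal Python sketch (ours, not the authors' code) that computes the Gaussian levels and the detail images of (3) and (4), with OpenCV's pyrDown/pyrUp standing in for the smoothing, decimation, upsampling, and interpolating operators $G$, $L$, $U$, $\hat{G}$; the function name and the file name are placeholders.

```python
import cv2
import numpy as np

def gauss_laplace_levels(x0: np.ndarray):
    """Return Gaussian levels (x0, x1, x2) and Laplacian details (d0, d1)."""
    x0 = x0.astype(np.float64)
    x1 = cv2.pyrDown(x0)   # coarser Gaussian level: smooth (G1), decimate (L1)
    x2 = cv2.pyrDown(x1)   # next level: G2, L2
    # Detail images, cf. eqs. (3) and (4): subtract the upsampled-and-
    # interpolated coarser level from the finer one.
    d0 = x0 - cv2.pyrUp(x1, dstsize=(x0.shape[1], x0.shape[0]))
    d1 = x1 - cv2.pyrUp(x2, dstsize=(x1.shape[1], x1.shape[0]))
    return (x0, x1, x2), (d0, d1)

# Usage on a grayscale LR input (placeholder file name):
img = cv2.imread("input_lr.png", cv2.IMREAD_GRAYSCALE)
(_, _, _), (d0, d1) = gauss_laplace_levels(img)
```

The DCN described next is fit to map the coarser detail d1 to the finer detail d0 of this same image; no external training set is involved.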
A. Network Architecture

We use an $L$-layer CNN with the rectified linear unit (ReLU) activation function $\eta$ for modeling the nonlinearity. For moderate network depths, the ReLU activation is known to overcome the problem of vanishing gradients [30]. Since the images in the Laplacian pyramid, unlike those in the Gaussian pyramid, contain both positive and negative values, we separate the detail images into positive ($D_1^+$) and negative ($-D_1^-$) components. For instance, $D_1$ is divided into $D_1^+$ and $D_1^-$, so that $D_1 = D_1^+ - D_1^-$, where $D_1^+, D_1^- \succeq 0$. Next, we upsample $D_1^+$ and $D_1^-$ to get $\hat{D}_0^+$ and $\hat{D}_0^-$, respectively, and pass them through two CNNs. Each stage of the CNN is a tensor filter. Correspondingly, for the positive-part and negative-part images, we have the filters $K_i^+$ and $K_i^-$, respectively, for $i = 1, 2, \ldots, L$. These filters take the upsampled images $\hat{D}_0^+$ and $\hat{D}_0^-$ as input and predict the detail images at the immediately finer scale, resulting in $\tilde{D}_0^+$ and $\tilde{D}_0^-$, respectively. The prediction equations are as follows:

$$\tilde{D}_0^+ = \eta(K_L^+ \ast \eta(\cdots \ast \eta(K_1^+ \ast \hat{D}_0^+))),$$
$$\tilde{D}_0^- = \eta(K_L^- \ast \eta(\cdots \ast \eta(K_1^- \ast \hat{D}_0^-))),$$

where $\eta$ is the ReLU activation and $\ast$ denotes convolution. The positive-part prediction $\tilde{D}_0^+$ and the negative-part prediction $\tilde{D}_0^-$ are combined to give $\tilde{D}_0 = \tilde{D}_0^+ - \tilde{D}_0^-$. The cost function measures the fidelity between the predicted detail and the true one, and is expressed using the Frobenius norm as

$$\mathcal{C}_{\mathrm{CNN}} = \mathcal{C}_{\mathrm{CNN}}^{+} + \mathcal{C}_{\mathrm{CNN}}^{-} = \|D_0 - \tilde{D}_0\|_F^2. \qquad (5)$$
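The two-branch prediction and the cost of (5) can be sketched in PyTorch as follows. This is our illustration, not the authors' released code: the channel widths, the bicubic stand-in for the upsampling step, and the toy tensors are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Branch(nn.Module):
    """One of the two CNNs (filters K_i^+ or K_i^-), here with L = 3 layers."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 9, padding=4), nn.ReLU(),   # eta after every stage
            nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 5, padding=2), nn.ReLU(),   # output stays nonnegative
        )
    def forward(self, d_hat):
        return self.net(d_hat)

def split_pos_neg(d):
    # D = D+ - D-, with D+ and D- elementwise nonnegative
    return d.clamp(min=0), (-d).clamp(min=0)

def upsample2(t):
    # stand-in for the upsampling-and-interpolation step (U, G-hat)
    return F.interpolate(t, scale_factor=2, mode="bicubic", align_corners=False)

pos_net, neg_net = Branch(), Branch()
d1 = torch.randn(1, 1, 64, 64)           # coarse detail D1 (toy stand-in)
d0_true = torch.randn(1, 1, 128, 128)    # finer detail D0 from the pyramid
d1_pos, d1_neg = split_pos_neg(d1)
d0_tilde = pos_net(upsample2(d1_pos)) - neg_net(upsample2(d1_neg))
loss_cnn = torch.sum((d0_true - d0_tilde) ** 2)  # eq. (5), squared Frobenius norm
```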
B. Incorporating Regularization

Since the detail images are structured and sparse, we incorporate appropriate regularizers in the optimization.

2) Contrast-Enhancing Regularization: The Laplacian pyramid contains high-frequency details largely comprising texture and edges. In order to preserve texture and edge content, local contrast-enhancing loss functions have been suggested by Liu et al. [31]. We incorporate this as an additional regularizer acting on the predicted HR image $\tilde{X}_{-1} = \hat{X}_{-1} + \tilde{D}_{-1}$, in a $3 \times 3$ patch-based fashion. Since not all patches contain edges or texture, we use a weight $w_k$ for the $k$th patch $P_k$ that indicates whether the patch contains an edge/texture ($w_k = 1$) or not ($w_k = 0$). The resulting contrast regularizer is given by

$$\mathcal{C}_{\mathrm{contrast}} = -\sum_{k} w_k \sum_{p_i, p_j \in P_k} \left(\tilde{X}_{-1}(p_i) - \tilde{X}_{-1}(p_j)\right)^2, \qquad (6)$$

where $p_i$ and $p_j$ denote the pixel coordinates in the patch. In order to determine the 1/0 weight assignment for $w_k$, we use a Canny edge-map of the upsampled image $\hat{X}_{-1}$.

3) Smoothness-Preserving Regularization: To enforce local smoothness in regions lacking edges/texture, we add a complementary regularization functional:

$$\mathcal{C}_{\mathrm{smoothness}} = \sum_{k} \bar{w}_k \sum_{p_i, p_j \in P_k} \left(\tilde{X}_{-1}(p_i) - \tilde{X}_{-1}(p_j)\right)^2,$$

where the complementary weights $\bar{w}_k$ are set to 1 for smooth patches and 0 otherwise. The contrast regularizer carries a negative sign since it has to be maximized, whereas the smoothness regularizer does not, since it has to be minimized to favor a smooth reconstruction.
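The patch weights and the two regularizers can be computed as in the following NumPy sketch (ours); the Canny thresholds and the non-overlapping 3 × 3 tiling are assumptions.

```python
import cv2
import numpy as np

def patch_regularizers(x_tilde, x_hat, patch=3):
    """Return (C_contrast, C_smoothness) for a predicted HR image x_tilde.

    x_hat is the upsampled image, used only to build the Canny edge map
    that sets w_k = 1 on edge/texture patches. Thresholds are assumed.
    """
    edges = cv2.Canny(x_hat.astype(np.uint8), 100, 200)
    c_contrast, c_smooth = 0.0, 0.0
    h, w = x_tilde.shape
    for r in range(0, h - patch + 1, patch):
        for c in range(0, w - patch + 1, patch):
            p = x_tilde[r:r + patch, c:c + patch].ravel()
            # sum of squared pairwise intensity differences within the patch
            spread = np.sum((p[:, None] - p[None, :]) ** 2) / 2.0
            if edges[r:r + patch, c:c + patch].any():   # w_k = 1: edge/texture
                c_contrast -= spread                    # eq. (6), to be maximized
            else:                                       # w_k-bar = 1: smooth
                c_smooth += spread                      # to be minimized
    return c_contrast, c_smooth
```

For use inside the optimization loop below, the same computation would be written with differentiable tensor operations so that gradients reach the CNN filters.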
The total objective for learning the CNN filters is given by

$$\mathcal{C}_{\mathrm{total}} = \mathcal{C}_{\mathrm{CNN}} + \lambda_1 \mathcal{C}_{\mathrm{sparsity}} + \lambda_2 \mathcal{C}_{\mathrm{contrast}} + \lambda_3 \mathcal{C}_{\mathrm{smoothness}},$$

where $\{\lambda_1, \lambda_2, \lambda_3\}$ are the regularization weights. We optimize the cost $\mathcal{C}_{\mathrm{total}}$ with respect to the CNN filters using the ADAM optimization technique [32]. Denote the optimized filter-set by $\{\hat{K}_i^+, \hat{K}_i^-\}$. Our SR algorithm is based on the assumption that the relationship that exists between $D_1$ and $D_0$ also holds between $D_0$ and $D_{-1}$. The assumption is justified given the interscale correlation and the recurrence of image patches [33], [34]. Therefore, we predict $\tilde{D}_{-1}^+$ and $\tilde{D}_{-1}^-$ as follows:

$$\tilde{D}_{-1}^+ = \eta(\hat{K}_L^+ \ast \eta(\cdots \ast \eta(\hat{K}_1^+ \ast \hat{D}_0^+))), \qquad (7)$$
$$\tilde{D}_{-1}^- = \eta(\hat{K}_L^- \ast \eta(\cdots \ast \eta(\hat{K}_1^- \ast \hat{D}_0^-))), \qquad (8)$$

and construct $\tilde{D}_{-1} = \tilde{D}_{-1}^+ - \tilde{D}_{-1}^-$. Finally, the super-resolved image is obtained as $\tilde{X}_{-1} = \hat{X}_{-1} + \tilde{D}_{-1}$.
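Continuing the PyTorch sketch above (reusing pos_net, neg_net, split_pos_neg, upsample2, d1_pos, d1_neg, and d0_true), the single-image fitting loop and the finer-scale prediction of (7) and (8) might look as follows; the learning rate, iteration count, regularization weight, and the stand-in L1 sparsity term are illustrative assumptions, and differentiable versions of the contrast/smoothness terms would be added where indicated.

```python
import torch

opt = torch.optim.Adam(
    list(pos_net.parameters()) + list(neg_net.parameters()), lr=1e-3)  # ADAM [32]
lam1 = 1e-3  # assumed weight; lam2, lam3 would scale the contrast/smoothness terms

for _ in range(200):   # filters are learnt from this single image, no dataset
    opt.zero_grad()
    d0_tilde = pos_net(upsample2(d1_pos)) - neg_net(upsample2(d1_neg))
    c_cnn = torch.sum((d0_true - d0_tilde) ** 2)   # C_CNN, eq. (5)
    c_sparsity = d0_tilde.abs().sum()              # stand-in sparsity regularizer
    c_total = c_cnn + lam1 * c_sparsity            # + lam2*C_contrast + lam3*C_smoothness
    c_total.backward()
    opt.step()

# Eqs. (7)-(8): apply the learnt filters one scale finer to predict D_{-1}.
with torch.no_grad():
    d0_pos, d0_neg = split_pos_neg(d0_true)
    d_minus1 = pos_net(upsample2(d0_pos)) - neg_net(upsample2(d0_neg))
# The SR image is then x_hat_minus1 + d_minus1, where x_hat_minus1 is the
# upsampled input image.
```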
IV. EXPERIMENTAL RESULTS

We consider a three-layer CNN ($L = 3$). The first layer has sixteen $9 \times 9$ filters, the second layer consists of one hundred and twenty-eight $3 \times 3$ filters, and the third one has eight $5 \times 5$ filters. We apply the SR method to the Y channel of the YCbCr decomposition and upsample the other channels using bicubic interpolation. We validate on two standard databases, Set5 and Set14, which cover a variety of image classes.
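The luminance-only processing can be sketched as follows (our illustration); super_resolve_y is a hypothetical handle to the DCN pipeline above, and the OpenCV color conversion is an assumption.

```python
import cv2
import numpy as np

def super_resolve_color(bgr, super_resolve_y, scale=2):
    """SR on the luminance channel; chrominance is bicubically upsampled."""
    y, cr, cb = cv2.split(cv2.cvtColor(bgr, cv2.COLOR_BGR2YCrCb))
    h, w = y.shape
    y_sr = super_resolve_y(y, scale)   # DCN applied to the Y channel
    size = (w * scale, h * scale)
    cr_up = cv2.resize(cr, size, interpolation=cv2.INTER_CUBIC)
    cb_up = cv2.resize(cb, size, interpolation=cv2.INTER_CUBIC)
    sr = cv2.merge([np.clip(y_sr, 0, 255).astype(np.uint8), cr_up, cb_up])
    return cv2.cvtColor(sr, cv2.COLOR_YCrCb2BGR)
```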
TABLE I
COMPARISON OF THE DCN METHOD WITH THE STATE-OF-THE-ART TECHNIQUES

                               Set5                                      Set14
Metric    Scale  Bicubic   SC     SRCNN  SelfExSR   DCN    Bicubic   SC     SRCNN  SelfExSR   DCN
PSNR       2×     27.93   28.17   28.37    28.49   29.56    24.46   24.86   24.78    24.85   25.40
           3×     25.78   26.74   26.35    26.45   26.50    23.00   23.60   23.38    23.47   23.62
mw-PSNR    2×     27.10   27.36   27.55    27.62   28.91    24.59   24.77   24.93    25.01   25.69
           3×     25.59   26.02   26.10    26.17   26.42    23.46   23.99   23.82    23.87   24.24
SSIM       2×     0.929   0.932   0.935    0.936   0.946    0.823   0.830   0.836    0.838   0.852
           3×     0.889   0.908   0.900    0.901   0.899    0.767   0.791   0.783    0.786   0.796
ms-SSIM    2×     0.973   0.979   0.976    0.976   0.983    0.948   0.951   0.953    0.953   0.963
           3×     0.945   0.950   0.955    0.955   0.957    0.907   0.927   0.921    0.921   0.931