
A Learning Based Image Super-resolution: Use of Nonhomogeneous AR Prior

K. P. Upla, SVNIT, Surat, Gujarat, India (email: kpu@eced.svnit.ac.in)
S. Ravishanker, Amrita School of Engineering, Bangalore, India (email: sravishankar2002@gmail.com)
P. P. Gajjar and M. V. Joshi, DA-IICT, Gandhinagar, Gujarat, India (email: {prakash gajjar,mv joshi}@daiict.ac.in)

Abstract—In this paper, we propose a new technique for image super-resolution. Given a single low resolution (LR) observation and a database consisting of low resolution images and their high resolution versions, we obtain super-resolution for the LR observation. We first obtain a close approximation to the super-resolved image by learning the high frequency components using the available database. The final super-resolved image is obtained by a regularization technique using a proper prior model. A Discrete Cosine Transform (DCT) based approach is used to learn the high frequency details of the observation. The super-resolution is then cast as a restoration problem. The LR image is modeled as an aliased and noisy version of the high resolution (HR) image using a linear model. In order to preserve the HR texture we use a nonhomogeneous autoregressive (AR) process as a prior. The model parameters are obtained by segmenting the learned initial estimate into a number of homogeneous regions and estimating the AR parameters for each region. We then arrive at a cost function consisting of a data fitting term and a prior term which makes use of the estimated parameters, and minimize it using particle swarm optimization. We show the efficacy of the proposed method by comparing our results with standard interpolation methods and existing super-resolution techniques. The advantage of the proposed method is that it does not require a number of low resolution observations, as the motion based methods do, and hence the task of registration is not required.

I. INTRODUCTION

In many applications high resolution images lead to better classification, analysis and interpretation. The resolution of an image depends on the density of sensing elements in the camera. A high end camera with large memory storage capability can be used to capture high resolution images. In some applications, such as wildlife sensor networks and video surveillance, it may not be feasible to employ a costly camera. In such applications algorithmic approaches can be helpful for obtaining high resolution images from low resolution images captured using low cost cameras.

The super-resolution idea was first proposed by Tsai and Huang [?]. They use a frequency domain approach and employ motion as a cue. In [?], the authors use a maximum a posteriori (MAP) framework for jointly estimating the registration parameters and the high-resolution image for severely aliased observations. The authors in [?] describe a MAP-MRF based super-resolution technique using blur as a cue and recover both the high-resolution scene intensity and the depth fields simultaneously. The authors in [?] present a technique for image interpolation using the wavelet transform. They estimate the wavelet coefficients at a higher scale from a single low resolution observation and achieve interpolation by taking the inverse wavelet transform. The authors in [?] propose a learning based technique for super-resolving a single frame image using a database of high resolution images. They learn the high frequency details from the database of high resolution images, obtain an initial estimate of the image to be super-resolved, formulate regularization using wavelet and MRF model priors, and employ simulated annealing for optimization.

Recently, many researchers have been working on learning based techniques for super-resolution. In this approach the missing information of the high resolution image is learned using a database consisting of HR-LR pairs or a database consisting of high resolution images only. Freeman et al. [?] propose an example based super-resolution technique. They estimate the missing high-frequency details by interpolating the input low-resolution image to the desired scale. The super-resolution is performed by nearest neighbor based estimation of high-frequency patches, based on the corresponding patches of the input low-frequency image. Brandi et al. [?] propose an example-based approach for video super-resolution. They restore the high-frequency information of an interpolated block by searching a database for a similar block and adding the high frequency of the chosen block to the interpolated one. They use the high frequency of key HR frames instead of the database to increase the quality of non-key restored frames. In [?], the authors address the problem of super-resolution from a single image using a multi-scale tensor voting framework. They consider all three color channels simultaneously to produce a multi-scale edge representation that guides the process of high-resolution color image reconstruction, which is subjected to the back projection constraint. The authors in [?] recover the super-resolution image through a neighbor embedding algorithm. They employ histogram matching for selecting training images with more closely related contents. In [?] the authors propose a neighbor embedding based super-resolution through edge detection and Feature Selection (NeedFS).
They propose a combination of appropriate features for preserving edges as well as smoothing the color regions. The training patches are learned with different neighborhood sizes depending on edge detection. In [?] the authors propose a super-resolution technique using zoom as a cue. Here the SR image is modeled as a homogeneous AR model and the model parameters are obtained using the most zoomed image. The drawback of this method is the assumption that the entire region (the SR region that corresponds to the least zoomed region) is homogeneous, which is not true for real images. In [?], Krishna and Joshi propose a model based multiresolution fusion in which the nonhomogeneous AR parameters are estimated using the available panchromatic image and used during regularization.

In this paper, we propose a learning based method to obtain super-resolution from a single image. Our approach is based on the methods proposed in [?] (ICVGIP-08) and [?] (IGARSS-09). In [?] the authors first learn an initial HR estimate and use an inhomogeneous Gaussian Markov random field (IGMRF) as the prior for regularization. A wavelet based learning was used to obtain the initial estimate, and the IGMRF model parameters were estimated using the local gradient as the standard deviation. Although the method works well and has advantages when compared to using a homogeneous AR prior, it has the following drawbacks: 1. The initial high resolution estimate obtained using the wavelet based approach assumes that a primitive edge element is confined to a local region, which is not true in practice. Also, the learned edges are limited to horizontal, vertical and diagonal directions. This leads to an initial estimate that may not be a close approximation to the super-resolved image. 2. The IGMRF parameters estimated at every location are based on the approach proposed by Jalobeanu et al. [?]. There the authors use a maximum likelihood (ML) estimate and a simple approximation of the local variance in order to reduce the computational complexity. This leads to an approximation of the true IGMRF parameter values. Also, since these parameters are estimated using the initial HR estimate derived from the wavelet based learning, the errors in the estimated parameters are larger.

In this paper we first learn the high frequency content of the super-resolved image using a database of training images consisting of LR-HR pairs. The learned HR image is used as a close approximation to the final solution. Instead of a wavelet based approach, we use a DCT based learning for estimating the higher frequencies. Since the problem is ill-posed, a prior model is needed in order to make the solution better posed. We use a nonhomogeneous AR model as the prior, and the prior parameters are estimated using the learned HR image. These parameters are used while regularizing the solution. The final cost function, consisting of a data fitting term and a prior term, is minimized using particle swarm optimization.

II. DCT BASED APPROACH FOR LEARNING THE INITIAL HR ESTIMATE

In this section, a DCT based approach to learn the high frequency details of the super-resolved image for a decimation factor of 2 (q = 2) is described. Each set in the database consists of a pair of low resolution and high resolution images. The test image (observed low resolution image) and the LR training images are of size M × M pixels. The corresponding HR training images are of size 2M × 2M pixels. We first upsample the test image and all the low resolution training images by a factor of 2 to create images of size 2M × 2M pixels each. A standard interpolation technique can be used for this. It may be noted that while learning the high frequency contents of the upsampled test image, we make use of the upsampled LR training images and their corresponding true high resolution images. The learning is done as follows. We divide each of the images, i.e., the upsampled test image, the upsampled low resolution training images and their high resolution versions, into blocks of size 4 × 4. The motivation for dividing into 4 × 4 blocks comes from the theory of JPEG compression, where an image is divided into 8 × 8 blocks in order to extract the redundancy in each block. However, in this case we are interested in learning the non-aliased frequency components from the HR training images using the aliased test image and the aliased LR training images. This is done by taking the DCT of each block of all the upsampled images and their true HR versions in the database, as well as of the upsampled test image. Fig. ??(a) shows the DCT blocks of the upsampled test image, whereas Fig. ??(b) shows the DCT blocks of the upsampled LR training images and the HR training images. We learn the DCT coefficients for each block in the test image from the corresponding blocks in the HR images in the database. It is reasonable to assume that when we interpolate the test image and the low resolution training images to 2M × 2M pixels, the distortion due to aliasing and interpolation is minimum in the lower frequencies. This means that the lower frequencies in the upsampled test image and the corresponding high resolution images are significantly similar. Hence we learn only those DCT coefficients that correspond to the high frequencies, which are already aliased and are now also distorted due to interpolation.

We consider the upsampled LR training images to find the best matching DCT coefficients for each of the blocks in the test image. Let C_T(i, j), 1 ≤ i, j ≤ 4, be the DCT coefficient at location (i, j) in a 4 × 4 block of the test image. Similarly, let C_LR^(m)(i, j) and C_HR^(m)(i, j), m = 1, 2, ..., L, be the DCT coefficients at location (i, j) in the block at the same position in the mth upsampled LR image and the mth HR image, respectively. Here L is the number of training sets in the database. Now the best matching HR image block for the considered (upsampled) low resolution image block is obtained as

\hat{m} = \arg\min_m \sum_{i+j > \mathrm{Threshold}} \| C_T(i,j) - C_{LR}^{(m)}(i,j) \|^2. \quad (1)

Here, m̂ is the index of the training image that gives the minimum for the block.
The non-aliased DCT coefficients of the best matching HR image block are then copied into the corresponding locations in the block of the upsampled test image. In effect, we learn the non-aliased DCT coefficients for the test image block from the set of LR-HR images. The coefficients that correspond to the low frequencies are not altered. Thus, at location (i, j) in a block we have

\hat{C}_T(i,j) = \begin{cases} C_{HR}^{(\hat{m})}(i,j) & \text{if } (i+j) > \mathrm{Threshold}, \\ C_T(i,j) & \text{otherwise}. \end{cases}

The DCT is used here since it is a real transform with good energy compaction and low computational complexity. This procedure is repeated for every block in the test image. We conducted the experiment with different Threshold values. We began with Threshold = 2, where all the coefficients except the DC coefficient are learned, and subsequently increased the Threshold value and repeated the experiment. The best results were obtained when the Threshold was set to 4, which corresponds to learning a total of 10 coefficients from the best matching HR image in the database. After learning the DCT coefficients for every block in the test image, we take the inverse DCT to get the high spatial resolution image and consider it as the close approximation to the HR image.
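To make the learning procedure concrete, a minimal Python sketch of Eq. (1) and the coefficient replacement rule is given below. This is an illustration rather than the authors' implementation: scipy's dctn/idctn are assumed for the 2-D DCT, cubic spline zoom stands in for the unspecified standard interpolation technique, and the function names are our own.

import numpy as np
from scipy.fft import dctn, idctn
from scipy.ndimage import zoom

B = 4          # block size
THRESHOLD = 4  # learn coefficients with (i + j) > THRESHOLD, 1-based indices

def upsample(img, factor=2):
    # Stand-in for the "standard interpolation technique" (cubic spline zoom).
    return zoom(img, factor, order=3)

def learn_initial_estimate(test_lr, train_lr, train_hr):
    """test_lr: (M, M) array; train_lr: list of (M, M); train_hr: list of (2M, 2M)."""
    test_up = upsample(test_lr)
    train_up = [upsample(t) for t in train_lr]
    out = np.empty_like(test_up)
    ii, jj = np.meshgrid(np.arange(1, B + 1), np.arange(1, B + 1), indexing="ij")
    high = (ii + jj) > THRESHOLD  # the 10 high-frequency locations for THRESHOLD = 4
    H, W = test_up.shape
    for r in range(0, H, B):
        for c in range(0, W, B):
            ct = dctn(test_up[r:r + B, c:c + B], norm="ortho")
            # Eq. (1): index of the training pair whose upsampled LR block best
            # matches the test block over the high-frequency coefficients.
            errs = [np.sum((ct[high] - dctn(t[r:r + B, c:c + B], norm="ortho")[high]) ** 2)
                    for t in train_up]
            m_hat = int(np.argmin(errs))
            # Copy the non-aliased high frequencies of the best HR block; the
            # low-frequency coefficients of the test block are left unchanged.
            c_hr = dctn(train_hr[m_hat][r:r + B, c:c + B], norm="ortho")
            ct[high] = c_hr[high]
            out[r:r + B, c:c + B] = idctn(ct, norm="ortho")
    return out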
III. SEGMENTATION OF THE LEARNED IMAGE

Following the approach of Krishna and Joshi [?], the learned initial HR estimate is segmented into a number of homogeneous regions, and the model parameters are then estimated for each of these regions.

IV. IMAGE FORMATION MODEL

In this work, we obtain super-resolution for an image from a single observation. The observed image Y is of size M × M pixels. Let y represent the lexicographically ordered vector of size M² × 1 which contains the pixels from image Y, and let z be the super-resolved image. The observed image can be modeled as

y = Dz + n, \quad (2)

where D is the decimation matrix which takes care of aliasing. For an integer decimation factor of q, the decimation matrix D consists of q² non-zero elements along each row at appropriate locations. We estimate this decimation matrix from the initial estimate; the general form of the decimation matrix is described below. n is the i.i.d. noise vector with zero mean and variance σ_n², of size M² × 1. The multivariate noise probability density is given by

P(n) = \frac{1}{(2\pi\sigma_n^2)^{M^2/2}} \, e^{-\frac{1}{2\sigma_n^2} n^T n}. \quad (3)

Our problem is to estimate z given y, which is an ill-posed inverse problem. It may be mentioned here that the captured observation is not blurred; in other words, we assume an identity matrix for the blur. Generally, the decimation model that obtains the aliased pixel intensities from the high resolution pixels, for a decimation factor of q, has the form [?]

D = \frac{1}{q^2} \begin{bmatrix} 1\,1\cdots 1 & & & 0 \\ & 1\,1\cdots 1 & & \\ & & \ddots & \\ 0 & & & 1\,1\cdots 1 \end{bmatrix}. \quad (4)

The decimation matrix in Eq. (4) indicates that a low resolution pixel intensity y(i, j) is obtained by averaging the intensities of the q² pixels corresponding to the same scene location in the high resolution image and adding the noise intensity n(i, j).
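As an illustration of Eqs. (2) and (4), the Python sketch below constructs the sparse decimation matrix that averages each q × q block of the lexicographically ordered HR image. The function name and the sparse representation are our choices; this is not the estimation procedure itself.

import numpy as np
from scipy.sparse import lil_matrix

def decimation_matrix(N, q):
    """D of size (N/q)^2 x N^2: each row averages one q x q block of the
    lexicographically ordered HR image, as in Eq. (4)."""
    M = N // q
    D = lil_matrix((M * M, N * N))
    for i in range(M):
        for j in range(M):
            row = i * M + j
            for di in range(q):
                for dj in range(q):
                    col = (i * q + di) * N + (j * q + dj)
                    D[row, col] = 1.0 / q**2
    return D.tocsr()

# Usage: for q = 2, a 4x4 HR image z maps to a 2x2 LR image y = Dz.
z = np.arange(16, dtype=float)   # lexicographically ordered HR image
D = decimation_matrix(4, 2)
y = D @ z                        # each entry averages one 2x2 HR block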
V. TEXTURE MODELING

The texture modeling described in this section follows the learning based approach to multiresolution fusion proposed by Joshi, Bruzzone and Chaudhuri (IEEE Trans. Geoscience and Remote Sensing, vol. 44, no. 9, pp. 2549-2562, Sep. 2006) and by Krishna and Joshi [?].

Natural images consist of smooth regions, edges and texture areas. We regularize the solution using a texture preserving prior. We capture the features of texture by applying different filters to the image and computing histograms of the filtered images. These histograms estimate the marginal distribution of the image and are used as the features of the image. We use a filter bank that consists of two kinds of filters: Laplacian of Gaussian (LoG) filters and Gabor filters.

A. Filter Bank

Gaussian filters play an important role due to their nice low pass frequency property. The two dimensional Gaussian function can be defined as

G(x, y | x_0, y_0, \sigma_x, \sigma_y) = \frac{1}{2\pi\sigma_x\sigma_y} \, e^{-\left[\frac{(x-x_0)^2}{2\sigma_x^2} + \frac{(y-y_0)^2}{2\sigma_y^2}\right]}. \quad (5)

Here (x_0, y_0) are location parameters and (σ_x, σ_y) are scale parameters. The Laplacian of Gaussian (LoG) filter is a radially symmetric center-surround Gaussian filter with (x_0, y_0) = (0, 0) and σ_x = σ_y = T. Hence the LoG filter can be represented by

F(x, y | 0, 0, T) = c \cdot (x^2 + y^2 - T^2) \, e^{-\frac{x^2 + y^2}{T^2}}. \quad (6)

Here c is a constant and T is the scale parameter. We can choose different scales with T = 1/√2, 1, 2, 3, and so on.

The Gabor filter with sinusoidal frequency ω and amplitude modulated by the Gaussian function can be represented by

F_\omega(x, y) = G(x, y | 0, 0, \sigma_x, \sigma_y) \, e^{-j\omega x}. \quad (7)

A simple case of Eq. (7) with both sine and cosine components can be chosen as

G(x, y | 0, 0, T, \theta) = c \cdot e^{-\frac{1}{2T^2}\left[4(x\cos\theta + y\sin\theta)^2 + (-x\sin\theta + y\cos\theta)^2\right]} \times e^{-i\frac{2\pi}{T}(x\cos\theta + y\sin\theta)}. \quad (8)
By varying the frequency and rotating the filter in the x-y plane, we can obtain a bank of filters. We can choose different scales T = 2, 4, 6, 8 and so on. Similarly, the orientation θ can be varied as θ = 0°, 30°, 60°, 90° and so on.
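A possible realization of the filter bank of Eqs. (6) and (8) is sketched below in Python, with the scales and orientations suggested in the text. The normalization constant c is set to 1 and the 15 × 15 support is an arbitrary illustrative choice.

import numpy as np

def log_filter(T, size=15):
    """LoG filter of Eq. (6) with c = 1 on a size x size grid."""
    r = size // 2
    x, y = np.meshgrid(np.arange(-r, r + 1), np.arange(-r, r + 1))
    return (x**2 + y**2 - T**2) * np.exp(-(x**2 + y**2) / T**2)

def gabor_filter(T, theta, size=15):
    """Complex Gabor filter of Eq. (8) with c = 1."""
    r = size // 2
    x, y = np.meshgrid(np.arange(-r, r + 1), np.arange(-r, r + 1))
    u = x * np.cos(theta) + y * np.sin(theta)    # rotated coordinates
    v = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(4 * u**2 + v**2) / (2 * T**2))  # Gaussian amplitude
    carrier = np.exp(-1j * 2 * np.pi * u / T)           # complex sinusoid
    return envelope * carrier

# Bank: LoG at T = 1/sqrt(2), 1, 2, 3 and Gabor filters at T = 2, 4, 6, 8
# with theta = 0, 30, 60, 90 degrees, as suggested in the text.
bank = [log_filter(T) for T in (1 / np.sqrt(2), 1, 2, 3)]
bank += [gabor_filter(T, np.deg2rad(a)) for T in (2, 4, 6, 8)
         for a in (0, 30, 60, 90)]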
B. Marginal Distribution Prior

As mentioned earlier, the histograms of the filtered images estimate the marginal distribution of the image. We use this marginal distribution as a prior. We obtain the close approximation Z_C of the HR image using the DCT based learning approach described in Section II and assume that the marginal distribution of the super-resolved image should match that of the close approximation Z_C. Let B be a bank of filters. We apply each of the filters in B to Z_C, obtain the filtered images Z_C^{(α)}, where α = 1, ..., |B|, and compute the histogram H_C^{(α)} of each Z_C^{(α)}. Similarly, we apply each of the filters in B to the initial HR estimate, obtain the filtered images Z^{(α)}, α = 1, ..., |B|, and compute the histogram H^{(α)} of each Z^{(α)}. We define the marginal distribution prior term as

C_H = \sum_{\alpha=1}^{|B|} |H_C^{(\alpha)} - H^{(\alpha)}|. \quad (9)
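The prior term of Eq. (9) may be computed as in the following sketch, assuming the bank built above. Filtering is done by FFT-based convolution and the histograms share a common binning; the bin count and the use of the real part of the complex Gabor responses are illustrative choices not fixed by the text.

import numpy as np
from scipy.signal import fftconvolve

def marginal_prior(z, z_c, bank, bins=32):
    """Eq. (9): sum over filters of the L1 distance between the histograms
    of the filtered candidate z and the filtered close approximation z_c."""
    cost = 0.0
    for f in bank:
        # Take the real part so complex Gabor responses give real histograms.
        rz = fftconvolve(z, np.real(f), mode="same")
        rc = fftconvolve(z_c, np.real(f), mode="same")
        lo = min(rz.min(), rc.min())
        hi = max(rz.max(), rc.max())
        hz, _ = np.histogram(rz, bins=bins, range=(lo, hi))
        hc, _ = np.histogram(rc, bins=bins, range=(lo, hi))
        cost += np.abs(hc - hz).sum()
    return cost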
VI. SUPER-RESOLVING THE IMAGE

The final cost function, consisting of the data fitting term and the marginal distribution prior term, can be expressed as

\hat{z} = \arg\min_z \left[ \frac{\| y - Dz \|^2}{2\sigma_n^2} + \lambda \sum_{\alpha=1}^{|B|} |H_C^{(\alpha)} - H^{(\alpha)}| \right], \quad (10)

where λ is a suitable weight for the regularization term. Since the cost function contains a non-linear term, it cannot be minimized using a simple gradient descent optimization technique. We employ particle swarm optimization and thereby avoid computationally complex optimization methods like simulated annealing.
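Combining the pieces, the bracketed cost of Eq. (10) can be evaluated as below. This is a minimal sketch that reuses the decimation_matrix() and marginal_prior() helpers from the earlier sketches, with σ_n, λ and the image shape supplied by the user.

import numpy as np

def sr_cost(z_vec, y_vec, D, z_c, bank, sigma_n=1.0, lam=0.01, shape=(128, 128)):
    """Cost of Eq. (10): data fit ||y - Dz||^2 / (2 sigma_n^2) plus
    lam times the marginal distribution prior of Eq. (9)."""
    data_term = np.sum((y_vec - D @ z_vec) ** 2) / (2.0 * sigma_n**2)
    prior_term = marginal_prior(z_vec.reshape(shape), z_c, bank)
    return data_term + lam * prior_term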
Let S be the swarm. The swarm S is populated with images Z_p, p = 1, ..., |S|, expanded using existing interpolation techniques such as bicubic interpolation, Lanczos interpolation and learning based approaches. Each pixel in this swarm is a particle. The dimension of the search space for each image is D = N × N. The i-th image of the swarm can be represented by a D-dimensional vector Z_i = (z_{i1}, z_{i2}, ..., z_{iD})^T. The velocity of the particles in this image can be represented by another D-dimensional vector V_i = (v_{i1}, v_{i2}, ..., v_{iD})^T. The best previously visited position of the i-th image is denoted as P_i = (p_{i1}, p_{i2}, ..., p_{iD})^T. Defining g as the index of the best particle in the swarm, the swarm is manipulated according to the following two equations [?]:

v_{id}^{n+1} = w\, v_{id}^{n} + c_1 r_1 (p_{id}^{n} - z_{id}^{n}) + c_2 r_2 (p_{gd}^{n} - z_{id}^{n}), \quad (11)

z_{id}^{n+1} = z_{id}^{n} + v_{id}^{n+1}, \quad (12)

where d = 1, 2, ..., D; i = 1, 2, ..., |S|; w is the inertia weight; r_1 and r_2 are random numbers uniformly distributed in [0, 1]; n is the iteration number; and c_1 and c_2 are the cognitive and social parameters, respectively. The fitness function in our case is the cost function that has to be minimized.
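A compact sketch of the update rules in Eqs. (11) and (12) is given below, using a cost function such as sr_cost() from Section VI as the fitness. The parameter values (w, c1, c2 and the iteration count) are typical PSO settings assumed for illustration and are not specified in the text.

import numpy as np

def pso(cost, init_images, w=0.7, c1=1.5, c2=1.5, iters=200, seed=None):
    """Minimize cost() over images; init_images is the swarm S, e.g. bicubic,
    Lanczos and learning based expansions of the observation (one per particle)."""
    rng = np.random.default_rng(seed)
    Z = np.stack([im.ravel() for im in init_images]).astype(float)  # positions
    V = np.zeros_like(Z)                    # velocities
    P = Z.copy()                            # personal best positions
    pcost = np.array([cost(z) for z in Z])  # personal best costs
    g = int(np.argmin(pcost))               # index of the global best particle
    for _ in range(iters):
        r1 = rng.random(Z.shape)
        r2 = rng.random(Z.shape)
        # Eq. (11): inertia + cognitive pull toward P_i + social pull toward P_g.
        V = w * V + c1 * r1 * (P - Z) + c2 * r2 * (P[g] - Z)
        # Eq. (12): move each particle along its updated velocity.
        Z = Z + V
        c = np.array([cost(z) for z in Z])
        better = c < pcost
        P[better] = Z[better]
        pcost[better] = c[better]
        g = int(np.argmin(pcost))
    return P[g].reshape(init_images[0].shape)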

VII. EXPERIMENTAL RESULTS

In this section, we present the results of the proposed super-resolution method. We compare the performance of the proposed method with that of other methods on the basis of image quality. All the experiments were conducted on real images. Each observed image is of size 64 × 64 pixels. The super-resolved images are of size 128 × 128. We used the quantitative measure mean square error (MSE) for comparison of the results. The MSE used here is

MSE = \frac{\sum_{i,j} [f(i,j) - \hat{f}(i,j)]^2}{\sum_{i,j} [f(i,j)]^2}, \quad (13)

where f(i, j) is the original high resolution image and \hat{f}(i, j) is the estimated super-resolved image.

[Results to be included here: images, tables and discussion.]

VIII. CONCLUSION

We have presented a technique to obtain super-resolution for an image captured using a low cost camera. The high frequency content of the super-resolved image is learnt from a database of low resolution images and their high resolution versions. The suggested technique for learning the high frequency content yields a close approximation to the solution. The LR observation is represented using a linear model, and the marginal distribution is used as prior information for regularization. The cost function, consisting of a data fitting term and a marginal distribution prior term, is optimized using particle swarm optimization. The optimization process converges rapidly. It may be concluded that the proposed method yields better results in both smoother regions and texture regions and greatly reduces the optimization time.