Proceedings of the IEEE International Conference on Multimedia and Expo (ICME) 2017 10-14 July 2017

AN ACCURATE DEEP CONVOLUTIONAL NEURAL NETWORKS MODEL FOR NO-REFERENCE IMAGE QUALITY ASSESSMENT

Bahetiyaer Bare, Ke Li, Bo Yan

School of Computer Science, Shanghai Key Laboratory of Intelligent Information Processing,
Fudan University, Shanghai 201203, China
byan@fudan.edu.cn

ABSTRACT

The goal of image quality assessment (IQA) is to use computational models to measure the consistency between image quality and subjective evaluations. In recent years, convolutional neural networks (CNNs) have been widely used in the image processing community and have achieved performance leaps over non-CNN methods. In this work, we describe an accurate deep CNNs model for no-reference IQA. Taking image patches as input, our deep CNNs model is an end-to-end method that requires none of the handcrafted features and pre-processing procedures employed by previous no-reference IQA methods. The proposed model consists of six convolutional layers, two fully connected layers, one max pooling layer and two sum layers. The experimental results verify that our model outperforms the state-of-the-art no-reference IQA methods and most of the full-reference IQA metrics.

Index Terms— Image quality assessment, Deep learning, Convolutional neural networks

1. INTRODUCTION

With the development of social networks and the increasing number of imaging devices, an enormous amount of visual data is making its way to consumers. Digital images are subject to a wide variety of distortions during acquisition, processing, compression, storage, transmission and reproduction, any of which may degrade visual quality. A perceptual evaluation process is therefore needed for digital images. While human subjective judgments of images are the most reliable assessment, they are time consuming and difficult to obtain. Thus image quality assessment (IQA) methods are used to automatically predict the visual quality of images. IQA methods can be used to optimize image processing algorithms and to benchmark image processing systems and algorithms. Therefore, IQA plays a very important role in the image processing community.

Objective quality assessment methods can be categorized into three groups based on whether and how reference images are used: full-reference IQA methods, reduced-reference IQA methods, and no-reference IQA methods. Taking the full information of the original image as reference, full-reference IQA methods perform better than the other types of methods. Among the various kinds of full-reference IQA methods, the structural similarity (SSIM) index [1] has become a standard for image processing applications. It predicts quality better than classic metrics such as PSNR or MSE because it treats structural similarity as an important factor. In [2], Zhang et al. proposed a feature similarity (FSIM) index based on the fact that the human visual system (HVS) understands an image mainly according to its low-level features. According to the experimental results, FSIM achieves state-of-the-art performance on various image quality databases, and its predictions are very close to those of the HVS.

Although full-reference IQA provides a useful and effective way to evaluate quality differences, in many applications the reference image is not available, so no-reference IQA is required. Because humans often cannot judge a distorted image without a reference image, no-reference IQA is very challenging from a computational perspective. No-reference measures can directly quantify image degradations by exploiting features that are discriminant for those degradations. The most successful approaches are Natural Scene Statistics (NSS) based methods. Early NSS based methods extracted features in transform domains via the DCT [3] or wavelet transform [4]. However, extracting features in a transform domain is very slow. BRISQUE [5] extracts features from the spatial domain, which leads to a significant reduction in computation time. Very different from NSS based methods, CORNIA [6] demonstrates that it is possible to learn discriminant image features directly from raw image pixels instead of using handcrafted features.

In recent years, with the explosion of CNN-based methods in computer vision and image processing tasks, several CNN-based no-reference IQA methods have emerged. Among them, Kang et al. [7] proposed a CNNs model for no-reference IQA and achieved results very similar to those of full-reference IQA methods.

This work is supported in part by the National Key Research and Development Plan (Grant No. 2016YFC0801005), and NSFC (Grant Nos. 61522202 and 61370158).

Inspired by [7], Liang et al. [8] proposed an IQA method using a similar scene as reference and implemented both full-reference and no-reference IQA methods. Their no-reference IQA method, named CNN-NR-d, achieved state-of-the-art performance on two popular IQA databases. However, these methods have limitations: the shallow network structure and the training strategy limit their stability.

In this paper, in order to address the limitations of previous CNN-based no-reference IQA methods, we propose an accurate deep CNNs model for no-reference IQA. The main contributions of our work lie in three aspects: i) inspired by [9], we propose a deep CNNs model for no-reference IQA with better stability and performance; ii) we achieve an end-to-end IQA method without the pre-processing used by previous CNN-based methods; iii) by training the proposed model on image patches labeled with their FSIM [2] values, we achieve state-of-the-art performance.

The rest of the paper is organized as follows. In Section 2, we introduce related work. We then describe our proposed model in Section 3. In Section 4, we present experimental results. Finally, we draw conclusions in Section 5.

2. RELATED WORK

No-reference IQA methods aim to predict the quality of distorted images without reference images. A large amount of study has been conducted to address the no-reference IQA problem. The vast majority of no-reference IQA algorithms attempt to detect specific types of distortion such as blurring, blocking, ringing, or various forms of noise. In our research, we focus on more general-purpose no-reference IQA algorithms which do not attempt to detect specific types of distortion. Methods of this type typically reformulate the IQA problem as a classification and regression problem in which the regressors/classifiers are trained using specific features. The relevant features are either discovered via machine learning or specified using natural-scene statistics [10].

In [11], Tang et al. presented a learning-based no-reference IQA method named LBIQ, which measures various low-level quality features derived from natural-scene and texture statistics. LBIQ estimates quality via a regression-based combination of the features. In [12, 13], Ye and Doermann presented the CBIQ-I and CBIQ-II algorithms, which operate based on visual codebooks.

Another popular approach to no-reference IQA is to use natural scene statistics (NSS). The main idea in this approach is that natural images exhibit certain statistical regularities that are affected by the presence of distortion. These methods always require training, where a regression model maps the measured features to an associated quality score. In [14], Moorthy and Bovik presented the BIQI algorithm, which estimates quality based on statistical features extracted using the 9/7 DWT. In [4], Moorthy and Bovik presented the DIIVINE algorithm, which improves upon BIQI by using a steerable pyramid transform with two scales and six orientations. In [15, 3], Saad et al. presented the BLIINDS-I and BLIINDS-II algorithms, which estimate quality based on DCT statistics. In [5], Mittal et al. presented the BRISQUE algorithm, a fast no-reference IQA algorithm that employs statistics measured in the spatial domain.

Since 2012, CNNs have been widely applied in computer vision with the success of deep CNNs in image classification. Successful deep CNNs models such as the deep residual network [9] have been continuously proposed and have achieved outstanding performance. In [7], Kang et al. proposed convolutional neural networks for no-reference IQA and achieved performance leaps over classic methods. In [8], Liang et al. proposed an IQA method using a similar scene as reference, also realized with a dual-path convolutional neural network; for comparison, they implemented full-reference and no-reference IQA methods using the same network architecture.

Although CNN-based IQA methods achieved outstanding performance, they have limitations in three aspects. First, they take the subjective evaluation value of the entire image as the label of each small patch, so all patches taken from the same image share the same label. Although this works for uniformly distributed noise, it gives the trained model poor generalization ability. Second, the input images of these two methods go through pre-processing, so the methods are not end-to-end; we believe that convolutional neural networks can learn features from raw images without pre-processing. Third, the networks they use are shallow; techniques like the deep residual network [9] motivate us to use a deeper network architecture.

3. IQA WITH PROPOSED MODEL

In most IQA use cases, no reference image is provided for the distorted images. Although some very good no-reference IQA methods exist, their performance is limited by the lack of a large-scale image quality assessment database and by shallow network architectures. Therefore, we propose a deep CNNs model for no-reference IQA in this section. The model can accurately predict the quality of distorted images and has good generalization ability. The proposed model is detailed in the following subsections.

3.1. Deep CNNs Model for No-Reference IQA

We present an accurate deep CNNs model for the no-reference IQA problem. For an input image, we first split it into multiple 32 × 32 patches without overlap. Then, the proposed model predicts a quality score for each patch. Finally, the quality score of the input image is the average of the patch scores.
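To make this patch-based scoring concrete, the following is a minimal sketch of the split-and-average procedure, assuming a trained per-patch scorer is available as a callable; the function and variable names are mine, not the paper's.

```python
import numpy as np

def predict_image_score(image, predict_patch_score, patch_size=32):
    """Split an H x W x 3 image into non-overlapping 32 x 32 patches,
    score each patch with the trained network, and average the scores."""
    h, w = image.shape[:2]
    scores = []
    for y in range(0, h - patch_size + 1, patch_size):
        for x in range(0, w - patch_size + 1, patch_size):
            patch = image[y:y + patch_size, x:x + patch_size]
            scores.append(predict_patch_score(patch))  # hypothetical per-patch model call
    return float(np.mean(scores))
```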

[Fig. 1 diagram: input image patches pass through six convolutional layers (kernel size and padding noted per layer), two sum layers, a max pooling layer (size 2, stride 2) and two FC layers, producing the predicted label for the loss.]
Fig. 1. The network architecture of our method. Our network consists of six convolutional layers, one max pooling layer, two sum layers and two fully connected (FC) layers. The output of the second FC layer is the predicted quality score of the input image patch. The overall quality score of the input image is the average of the predicted scores of its patches.

The network architecture of our model is shown in Fig. 1. As shown in this figure, our model consists of six convolutional layers, each taking the rectified linear unit (ReLU) [16] as activation function, two fully connected layers and a max pooling layer. It is worth noting that we add two sum layers to our network, inspired by the deep residual network [9].
3.2. Layers

The network configuration is listed in Table 1. Our model has six convolutional layers to extract local features, each taking ReLU [16] as its activation function. Denote C_j as the feature map of the j-th layer, and W_j and B_j as the weight and bias of the j-th layer; local information is then propagated into deeper layers by Eq. (1), where * denotes convolution. We set the bias B_j to zero in our model.

    C_{j+1} = max(0, W_j * C_j + B_j)    (1)

In order to reduce complexity and computation cost, we add pooling layers to our model. We employ max pooling with a 2 × 2 window, so after the pooling layer the feature map size is halved. Max pooling is applied as in Eq. (2), where R is the pooling region of the corresponding feature map.

    C_{j+1} = max_R C_j    (2)

Our model is a deep CNNs model with six convolutional layers. To converge easily and to avoid vanishing gradients, we add two sum layers to our method. As revealed in [9], sum layers are very effective for training deep CNN models. The first sum layer adds the outputs of the second and third convolutional layers and sends the result to the pooling layer. Similarly, the second sum layer adds the outputs of the 5th and 6th convolutional layers and sends the result to the first fully connected layer. The second fully connected layer outputs the predicted value. The image score is predicted by minimizing the Euclidean loss

    min_W || f(X; W) − Y ||    (3)

where X and Y denote the input image patch and its label, respectively, and f(X; W) is the predicted score of X given network weights W.

3.3. Learning

Stochastic gradient descent (SGD) and backpropagation are used to solve for the parameters W in Eq. (3) that minimize the loss between the predicted score and the ground truth. In particular, the weights of the convolutional and fully connected layers are updated as in Eq. (4) [8]:

    Δ_{i+1} = m · Δ_i − η · ∂L/∂W_i^j
    W_{i+1}^j = W_i^j + Δ_{i+1} − λ η W_i^j    (4)

where m is the momentum factor, η is the learning rate, j is the index of the layer, and Δ_{i+1} is the gradient increment for training iteration i. λ is the weight decay factor. The momentum factor and weight decay factor were fixed to 0.9 and 0.0005, respectively, in our model training. The learning rate is set to different values, ranging from 0.01 to 0.00001, at different epochs. Experiment details are further explained in the next section.
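As a concrete illustration of the update rule in Eq. (4), below is a minimal numpy sketch of one SGD step with momentum and weight decay for a single weight tensor; the variable names (weights, delta, grad) are mine, not the paper's.

```python
import numpy as np

def sgd_step(weights, delta, grad, lr=0.01, momentum=0.9, weight_decay=0.0005):
    """One update following Eq. (4): delta <- m*delta - lr*grad,
    weights <- weights + delta - weight_decay*lr*weights."""
    delta = momentum * delta - lr * grad
    weights = weights + delta - weight_decay * lr * weights
    return weights, delta

# usage with dummy values
w = np.zeros((3, 3))
d = np.zeros_like(w)
g = np.ones_like(w)          # gradient dL/dW, as produced by backpropagation
w, d = sgd_step(w, d, g)
```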

[Fig. 2 images: reference image; score=0.849, DMOS=44.84; score=0.425, DMOS=72.49; score=0.841, DMOS=48.07; score=0.725, DMOS=56.7; score=0.917, DMOS=25.73]

Fig. 2. An example of the predicted quality of our method. We show the performance of our model on five different kinds of distortion at different degrees. The score of each image is predicted by our model (1 is the best quality and 0 is the worst). For comparison, we give the DMOS value of each distorted image (a higher DMOS represents lower perceptual quality).

Table 1. Configuration of our deep CNNs model

Layer name    Padding   Filter size   Stride   Output size
input         0         -             -        32 × 32 × 3
conv1/relu    0         5 × 5         1        28 × 28 × 50
conv2/relu    0         3 × 3         1        26 × 26 × 50
conv3/relu    1         3 × 3         1        26 × 26 × 50
sum1          -         -             -        26 × 26 × 50
max pooling   -         2 × 2         2        13 × 13 × 50
conv4/relu    0         3 × 3         1        11 × 11 × 25
conv5/relu    0         3 × 3         1        9 × 9 × 25
conv6/relu    1         3 × 3         1        9 × 9 × 25
sum2          -         -             -        9 × 9 × 25
fc1           -         -             -        1024
fc2           -         -             -        1
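For reference, below is a minimal sketch of the Table 1 configuration written in PyTorch (the paper itself was trained with MatConvNet [17]); the class and variable names are mine, the sum layers are realized as element-wise additions of the indicated convolution outputs as described in Section 3.2, and the ReLU between the two FC layers is an assumption not stated in the table.

```python
import torch
import torch.nn as nn

class DeepIQANet(nn.Module):
    """Sketch of Table 1: six conv layers with ReLU, two sum (element-wise
    addition) layers, one 2x2 max pooling layer and two FC layers."""
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 50, kernel_size=5, padding=0)   # 32x32x3 -> 28x28x50
        self.conv2 = nn.Conv2d(50, 50, kernel_size=3, padding=0)  # -> 26x26x50
        self.conv3 = nn.Conv2d(50, 50, kernel_size=3, padding=1)  # -> 26x26x50
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)         # -> 13x13x50
        self.conv4 = nn.Conv2d(50, 25, kernel_size=3, padding=0)  # -> 11x11x25
        self.conv5 = nn.Conv2d(25, 25, kernel_size=3, padding=0)  # -> 9x9x25
        self.conv6 = nn.Conv2d(25, 25, kernel_size=3, padding=1)  # -> 9x9x25
        self.fc1 = nn.Linear(9 * 9 * 25, 1024)
        self.fc2 = nn.Linear(1024, 1)
        self.relu = nn.ReLU()

    def forward(self, x):
        c1 = self.relu(self.conv1(x))
        c2 = self.relu(self.conv2(c1))
        c3 = self.relu(self.conv3(c2))
        s1 = c2 + c3                     # sum1: add outputs of conv2 and conv3
        p = self.pool(s1)
        c4 = self.relu(self.conv4(p))
        c5 = self.relu(self.conv5(c4))
        c6 = self.relu(self.conv6(c5))
        s2 = c5 + c6                     # sum2: add outputs of conv5 and conv6
        f = torch.flatten(s2, 1)
        return self.fc2(self.relu(self.fc1(f)))  # ReLU between FCs is an assumption
```

The element-wise additions play the role of the sum layers drawn in Fig. 1, which is why the paired convolutional layers keep matching output sizes in Table 1.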

4. EXPERIMENT

In this section, we present experiments to validate the effectiveness of the proposed model. We use the deep learning toolbox MatConvNet [17] to train the deep CNNs model for no-reference IQA. The LIVE [18] database is used to train our model. At the training stage, we first extract 32 × 32 patches with stride 32 from the images in the LIVE database. Since different image patches have different quality values, we compute the FSIM [2] value for each patch and take it as the label.
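A minimal sketch of this label-generation step is shown below; fsim is a stand-in for any FSIM implementation, the function names are mine, and it is my assumption that the FSIM label is computed between a distorted patch and the co-located patch of its reference image.

```python
def make_training_pairs(distorted, reference, fsim, patch_size=32):
    """Cut aligned 32x32 patches (stride 32) from a distorted image and its
    reference, labeling each distorted patch with its FSIM score."""
    pairs = []
    h, w = distorted.shape[:2]
    for y in range(0, h - patch_size + 1, patch_size):
        for x in range(0, w - patch_size + 1, patch_size):
            d = distorted[y:y + patch_size, x:x + patch_size]
            r = reference[y:y + patch_size, x:x + patch_size]
            pairs.append((d, fsim(d, r)))  # (input patch, FSIM label)
    return pairs
```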
Then we train our model with different learning rates: the learning rate is divided by 10 every five epochs, beginning at 0.1 and, once it reaches 0.00001, staying fixed at that value. Finally, our deep CNNs model is obtained after 80 epochs of training. The experimental results are presented in the following subsections.
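The schedule just described can be written compactly as a sketch; the function name is mine, and the constants follow the text (start at 0.1, divide by 10 every five epochs, floor at 1e-5, 80 epochs in total).

```python
def learning_rate(epoch, base=0.1, drop_every=5, floor=1e-5):
    """Learning rate for a given epoch: divided by 10 every five epochs,
    then held at the floor value once it is reached."""
    return max(base * (0.1 ** (epoch // drop_every)), floor)

schedule = [learning_rate(e) for e in range(80)]  # 80 training epochs
```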
4.1. Datasets

The following two datasets are used in our experiments.

(1) LIVE [18]: a total of 779 distorted images with five different distortions - JP2k compression (JP2K), JPEG compression (JPEG), White Noise (WN), Gaussian blur (BLUR) and Fast Fading (FF) - at 7-8 degradation levels, derived from 29 reference images. Each image has a Differential Mean Opinion Score (DMOS) in the range [0, 100]. Higher DMOS indicates lower quality.

Table 2. SROCC and LCC scores for testing on LIVE (SSIM, PSNR and FR-DCNN are full-reference methods; the rest are no-reference methods)

        SSIM   PSNR   FR-DCNN   BRISQUE   CORNIA   CNN-NR   CNN-NR-d   Ours
LCC     0.945  0.875  0.977     0.942     0.935    0.953    0.968      0.974
SROCC   0.948  0.876  0.975     0.940     0.942    0.956    0.967      0.971


(2) TID2008 [19]: 1700 distorted images with 17 different distortions, derived from 25 reference images at 4 degradation levels. In our experiments, we only consider the four common distortions that are shared with the LIVE dataset, i.e. JP2K, JPEG, WN and BLUR. Each image is associated with a Mean Opinion Score (MOS) in the range [0, 9]. Contrary to DMOS, a higher MOS indicates higher quality.

Table 3. SROCC and LCC scores for testing on TID2008

        BRISQUE   CORNIA   CNN-NR   CNN-NR-d   Ours
SROCC   0.882     0.890    0.920    0.921      0.957
LCC     0.892     0.880    0.903    0.920      0.939
4.2. Evaluation

We use Pearson's (linear) correlation coefficient (LCC) and the Spearman rank-order correlation coefficient (SROCC) to evaluate the performance of IQA algorithms. LCC measures the linear dependence between two quantities, and SROCC measures how well one quantity can be described as a monotonic function of another.
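Both coefficients are available off the shelf; below is a minimal sketch using SciPy, with made-up predicted and subjective scores purely as placeholder inputs.

```python
from scipy.stats import pearsonr, spearmanr

predicted  = [0.97, 0.84, 0.63, 0.42, 0.91]   # model outputs (placeholder values)
subjective = [0.95, 0.80, 0.60, 0.45, 0.90]   # ground-truth opinion scores (placeholder)

lcc, _ = pearsonr(predicted, subjective)      # linear correlation (LCC)
srocc, _ = spearmanr(predicted, subjective)   # rank-order correlation (SROCC)
print(f"LCC={lcc:.3f}, SROCC={srocc:.3f}")
```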
4.3. Evaluation on LIVE

In order to test the correlation between our method and human opinions, we conducted a test on the LIVE [18] dataset. Since our approach is trained entirely on the LIVE dataset by setting FSIM [2] values as the labels of image patches, we can test our method on LIVE. We compare our method with two widely used full-reference methods, one deep CNN-based full-reference method, and state-of-the-art no-reference IQA methods: PSNR, SSIM [1], FR-DCNN [8], BRISQUE [5], CORNIA [6], CNN-NR [7], and CNN-NR-d [8]. Our experiment setup may differ from the other no-reference IQA methods, which use 80% of the data for training and the remaining 20% for testing. Because they randomly select the 80% training split and repeat the train-test process up to 1000 times, their experimental results are very close to testing on all images of the dataset, so it is fair to compare our method with the other no-reference IQA methods.

The experimental results are listed in Table 2. As shown in this table, our method outperforms the state-of-the-art no-reference methods in both SROCC and LCC. When compared with full-reference IQA methods, our method produces results very similar to the state-of-the-art deep CNN-based full-reference method. It is worth noting that our method outperforms the two widely used full-reference IQA metrics.

4.4. Cross Dataset Test

In order to test the generalization ability of our method, we trained the proposed method on the whole LIVE [18] dataset and conducted a cross dataset test on the TID2008 [19] dataset. This test follows the protocol of previous no-reference IQA methods [5, 6, 7, 8]. Only the four types of distortion that are shared by LIVE and TID2008 are examined in this experiment. Because our method is trained by taking the FSIM [2] value as the label of the image patches, we need no mapping process, unlike other CNN-based no-reference IQA methods [7, 8].

The experimental results are listed in Table 3. We again compare the performance of our method with the state-of-the-art no-reference IQA methods: BRISQUE [5], CORNIA [6], CNN-NR [7], and CNN-NR-d [8]. As can be seen from this table, our method achieves the best performance on the TID2008 dataset among the no-reference IQA methods. Because our method is deeper than the other CNN-based no-reference IQA methods [7, 8] and learns different scores for different small patches taken from the same image, it generalizes better than the compared state-of-the-art methods.

4.5. Discussion

Our model achieves good performance on two well-known databases. By taking the FSIM value as the label of image patches and applying a deeper CNNs model, we achieve better results than previous no-reference IQA methods. A deeper CNNs model and a larger image patch size will further improve the performance of CNN-based methods. In Fig. 2, we show the scores our model predicts for the same image suffering from different distortions at different levels. As can be observed from this figure, our method predicts image quality consistently with the HVS.

5. CONCLUSION

In this paper, we proposed a highly accurate deep CNNs model for no-reference image quality assessment. We achieved an end-to-end no-reference image quality assessment method without any pre-processing or post-processing procedures. By taking a large set of image patches as the training set and taking their FSIM values as labels, our model achieves promising quality prediction results. We used the contents of the LIVE dataset to train and test our method. In order to verify the stability of our method, we also conducted a cross dataset test on the TID2008 dataset. The experimental results confirm a performance leap relative to the compared state-of-the-art methods.

6. REFERENCES

[1] Zhou Wang, Alan C. Bovik, Hamid R. Sheikh, and Eero P. Simoncelli, "Image quality assessment: from error visibility to structural similarity," IEEE Transactions on Image Processing, vol. 13, no. 4, pp. 600-612, 2004.

[2] Lin Zhang, Lei Zhang, Xuanqin Mou, and David Zhang, "FSIM: a feature similarity index for image quality assessment," IEEE Transactions on Image Processing, vol. 20, no. 8, pp. 2378-2386, 2011.

[3] Michele A. Saad, Alan C. Bovik, and Christophe Charrier, "Blind image quality assessment: A natural scene statistics approach in the DCT domain," IEEE Transactions on Image Processing, vol. 21, no. 8, pp. 3339-3352, 2012.

[4] Anush Krishna Moorthy and Alan Conrad Bovik, "Blind image quality assessment: From natural scene statistics to perceptual quality," IEEE Transactions on Image Processing, vol. 20, no. 12, pp. 3350-3364, 2011.

[5] Anish Mittal, Anush Krishna Moorthy, and Alan Conrad Bovik, "No-reference image quality assessment in the spatial domain," IEEE Transactions on Image Processing, vol. 21, no. 12, pp. 4695-4708, 2012.

[6] Peng Ye, Jayant Kumar, Le Kang, and David Doermann, "Unsupervised feature learning framework for no-reference image quality assessment," in Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on. IEEE, 2012, pp. 1098-1105.

[7] Le Kang, Peng Ye, Yi Li, and David Doermann, "Convolutional neural networks for no-reference image quality assessment," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 1733-1740.

[8] Yudong Liang, Jinjun Wang, Xingyu Wan, Yihong Gong, and Nanning Zheng, "Image quality assessment using similar scene as reference," in European Conference on Computer Vision. Springer, 2016, pp. 3-18.

[9] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, "Deep residual learning for image recognition," arXiv preprint arXiv:1512.03385, 2015.

[10] Damon M. Chandler, "Seven challenges in image quality assessment: past, present, and future research," ISRN Signal Processing, vol. 2013, 2013.

[11] Huixuan Tang, Neel Joshi, and Ashish Kapoor, "Learning a blind measure of perceptual image quality," in Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on. IEEE, 2011, pp. 305-312.

[12] Peng Ye and David Doermann, "No-reference image quality assessment using visual codebooks," in IEEE International Conference on Image Processing (ICIP), 2011, pp. 3150-3153.

[13] Peng Ye and David Doermann, "No-reference image quality assessment using visual codebooks," IEEE Transactions on Image Processing, vol. 21, no. 7, pp. 3129-3138, 2012.

[14] Anush Krishna Moorthy and Alan Conrad Bovik, "A two-step framework for constructing blind image quality indices," IEEE Signal Processing Letters, vol. 17, no. 5, pp. 513-516, 2010.

[15] Michele A. Saad, Alan C. Bovik, and Christophe Charrier, "A DCT statistics-based blind image quality index," IEEE Signal Processing Letters, vol. 17, no. 6, pp. 583-586, 2010.

[16] Vinod Nair and Geoffrey E. Hinton, "Rectified linear units improve restricted Boltzmann machines," in International Conference on Machine Learning, 2010, pp. 807-814.

[17] Andrea Vedaldi and Karel Lenc, "MatConvNet: Convolutional neural networks for MATLAB," in Proceedings of the 23rd ACM International Conference on Multimedia. ACM, 2015, pp. 689-692.

[18] Hamid R. Sheikh, Zhou Wang, Lawrence Cormack, and Alan C. Bovik, "LIVE image quality assessment database release 2," online, http://live.ece.utexas.edu/research/quality, 2005.

[19] Nikolay Ponomarenko, Vladimir Lukin, Alexander Zelensky, Karen Egiazarian, M. Carli, and F. Battisti, "TID2008 - a database for evaluation of full-reference visual quality assessment metrics," Advances of Modern Radioelectronics, vol. 10, no. 4, pp. 30-45, 2009.
