An Accurate Deep Convolutional Neural Networks Model For No-Reference Image Quality Assessment
978-1-5090-6067-2/17/$31.00 ©2017 IEEE — ICME 2017
no-reference IQA and achieved very similar results to full-reference IQA methods. Inspired by [7], Liang et al. [8] proposed an IQA method that uses a similar scene as reference and implemented both full-reference and no-reference IQA methods in their paper. Their no-reference IQA method, named CNN-NRd, achieved state-of-the-art performance on two popular IQA databases. However, these methods have limitations: the shallow network structure and the training strategy limit their stability.

In this paper, in order to address the limitations of previous CNNs-based no-reference IQA methods, we propose an accurate deep CNNs model for no-reference IQA. The main contributions of our work lie in three aspects: i) inspired by [9], we propose a deep CNNs model for no-reference IQA with better stability and performance; ii) we achieve an end-to-end IQA method without the pre-processing used by previous CNNs-based methods; iii) by training the proposed model on image patches labeled with their FSIM [2] values, we achieve state-of-the-art performance.

The rest of the paper is organized as follows. In Section 2, we introduce related work. Then we describe our proposed model in Section 3. In Section 4, we present experimental results. Finally, we draw conclusions in Section 5.

2. RELATED WORK

No-reference IQA methods aim to predict the quality of distorted images without reference images. A large amount of study has been conducted to address the no-reference IQA problem. The vast majority of no-reference IQA algorithms attempt to detect specific types of distortion such as blurring, blocking, ringing, or various forms of noise. In our research, we focus on more general-purpose no-reference IQA algorithms, which do not attempt to detect specific types of distortion. Methods of this type typically reformulate the IQA problem as a classification and regression problem in which the regressors/classifiers are trained using specific features. The relevant features are either discovered via machine learning or specified by using natural-scene statistics [10].

In [11], Tang et al. presented a learning-based no-reference IQA method named LBIQ, which measures various low-level quality features derived from natural-scene and texture statistics. LBIQ estimates quality via a regression-based combination of the features. In [12, 13], Ye and Doermann presented the CBIQ-I and CBIQ-II algorithms, which operate based on visual codebooks.

Another popular approach to no-reference IQA is to use natural scene statistics (NSS). The main idea of this approach is that natural images exhibit certain statistical regularities that are affected by the presence of distortion. Methods of this kind always require training, where a regression model is used to map the measured features to an associated quality score. In [14], Moorthy and Bovik presented the BIQI algorithm, which estimates quality based on statistical features extracted using the 9/7 DWT. In [4], Moorthy and Bovik presented the DIIVINE algorithm, which improves upon BIQI by using a steerable pyramid transform with two scales and six orientations. In [15, 3], Saad et al. presented the BLIINDS-I and BLIINDS-II algorithms, which estimate quality based on DCT statistics. In [5], Mittal et al. presented the BRISQUE algorithm, a fast no-reference IQA algorithm that employs statistics measured in the spatial domain.

Since 2012, CNNs have been widely applied in computer vision, following the success of deep CNNs in image classification. Successful deep CNNs models such as the deep residual network [9] have been continuously proposed and have achieved outstanding performance. In [7], Kang et al. proposed convolutional neural networks for no-reference IQA and achieved a performance leap over classic methods. In [8], Liang et al. proposed an IQA method that uses a similar scene as reference, also realized with a dual-path convolutional neural network. For comparison, they also implemented full-reference and no-reference IQA methods using the same network architecture.

Although CNNs-based IQA methods have achieved outstanding performance, they have limitations in three aspects. First, they take the subjective evaluation value of the entire image as the label of each small patch, so all image patches taken from the same image share the same label. Although this works for uniformly distributed noise, it gives the trained model poor generalization ability. Second, the input images of these two methods go through preprocessing, so neither method is end-to-end; we believe that convolutional neural networks can learn features from raw images without preprocessing. Third, the networks they used are shallow; revolutionary techniques such as the deep residual network [9] motivate us to use a deeper network architecture.

3. IQA WITH PROPOSED MODEL

In most IQA cases, no reference image is provided for distorted images. Although some excellent no-reference IQA methods exist, their performance is limited by the lack of a large-scale image quality assessment database and by shallow network architectures. Therefore, we propose a deep CNNs model for no-reference IQA in this section. The model can accurately predict the quality of distorted images and has good generalization ability. The proposed model is detailed in the following subsections.

3.1. Deep CNNs Model for No-Reference IQA

We present an accurate deep CNNs model for the no-reference IQA problem. For an input image, we first split it into multiple 32 × 32 small patches without overlap. Then, the proposed model predicts a quality score for each small patch. Finally, the quality score of the input image is the average of the patch scores.
Fig. 1. The network architecture of our method. Our network consists of six convolutional layers, one max pooling layer, two sum layers and two fully connected (FC) layers. The output of the second FC layer is the predicted quality score of the input image patch. The overall quality score of the input image is the average of the predicted scores of the small patches.

The network architecture of our model is shown in Fig. 1. As shown in this figure, our model consists of six convolutional layers with rectified linear units (ReLU) [16] as activation functions, two fully connected layers and a max pooling layer. It is worth noting that we add two sum layers to our network, inspired by the deep residual network [9].

3.2. Layers

The network configuration is listed in Table 1. Our model has six convolutional layers to extract local features. Each convolutional layer takes ReLU [16] as its activation function. Denote C_j as the feature map of the j-th layer, and W_j and B_j as the weight and bias of the j-th layer; then local information is extracted into deeper layers by Eq. (1), where * denotes convolution. We set the bias B_j to zero in our model.

    C_{j+1} = max(0, W_j * C_j + B_j)    (1)

In order to reduce complexity and computation cost, we add pooling layers to our model. We employ max pooling with a 2 × 2 window, so after a pooling layer the feature map size is halved. Max pooling is applied as in Eq. (2), where R is the pooling region of the corresponding feature map.

    C_{j+1} = max_R C_j    (2)

Our model is a deep CNNs model with six convolutional layers. In order to converge easily and prevent gradients from vanishing, we add two sum layers to our model. As revealed in [9], the sum layer is very effective for training deep CNNs models. The first sum layer adds the outputs of the second and third convolutional layers and sends the result to the pooling layer. Similarly, the second sum layer adds the outputs of the 5th and 6th convolutional layers and sends the result to the first fully connected layer. The second fully connected layer outputs the predicted value. The image score is predicted by minimizing the following Euclidean loss,

    min_W || f(X; W) − Y ||    (3)

where X and Y denote the input image patch and its label respectively, and f(X; W) is the predicted score of X with network weights W.

3.3. Learning

Stochastic gradient descent (SGD) and backpropagation are used to solve for the parameters W in Eq. (3) that minimize the loss between predicted score and ground truth. In particular, the weights of the convolutional and fully connected layers are updated as in Eq. (4) [8]:

    Δ_{i+1} = m · Δ_i − η · ∂L/∂W_i^j
    W_{i+1}^j = W_i^j + Δ_{i+1} − λ η W_i^j    (4)

where m is the momentum factor, η is the learning rate, j is the index of the layer and Δ_{i+1} is the gradient increment for training iteration i.
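As a concrete illustration of Eqs. (1), (2) and (4), here is a minimal single-channel numpy sketch of the three operations: a sliding-window convolution with ReLU (written as cross-correlation, as is conventional in CNN code), 2 × 2 max pooling, and the momentum update. It is a toy version for one 2-D feature map under our own simplifying assumptions, not the paper's multi-channel implementation:

```python
import numpy as np

def conv_relu(C, W, B=0.0):
    """Eq. (1): C_{j+1} = max(0, W_j * C_j + B_j) for one 2-D map
    ('valid' sliding window; B_j is zero in the paper's model)."""
    kh, kw = W.shape
    out = np.zeros((C.shape[0] - kh + 1, C.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(C[i:i + kh, j:j + kw] * W) + B
    return np.maximum(out, 0.0)  # ReLU activation

def max_pool_2x2(C):
    """Eq. (2): max over each 2x2 region R, halving the map size."""
    h, w = C.shape[0] // 2 * 2, C.shape[1] // 2 * 2
    return C[:h, :w].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def sgd_momentum_step(W, delta, grad, lr, m=0.9, wd=0.0005):
    """Eq. (4): delta' = m*delta - lr*grad;  W' = W + delta' - wd*lr*W."""
    delta_new = m * delta - lr * grad
    return W + delta_new - wd * lr * W, delta_new
```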
[Panel labels: Reference image; score=0.849, DMOS=44.84; score=0.425, DMOS=72.49]
Fig. 2. An example of the predicted quality of our method. We show the performance of our model on five different kinds of distortion at different degrees. The score of each image is predicted by our model (1 is the best quality and 0 is the worst). For comparison, we give the DMOS value for each distorted image (a higher DMOS represents lower perceptual quality).
λ is the weight decay factor. The momentum factor and weight decay factor were fixed to 0.9 and 0.0005 respectively when training our model. The learning rate is set to different values ranging from 0.01 to 0.00001 at different epochs. Experimental details are further explained in the next section.

Table 1. Configuration of our deep CNNs model

4. EXPERIMENT

4.1. Datasets

The following two datasets are used in our experiments.

(1) LIVE [18]: A total of 779 distorted images with five different distortions - JPEG2000 compression (JP2K), JPEG compression (JPEG), White Noise (WN), Gaussian blur (BLUR) and Fast Fading (FF) - at 7-8 degradation levels, derived from 29 reference images. Each image has a Differential Mean Opinion Score (DMOS) in the range [0, 100].
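The experiments are evaluated with SROCC and LCC scores (Table 2). These two correlation measures can be computed with a small numpy sketch (a simplification of ours that assumes no tied scores; ties would need average ranks):

```python
import numpy as np

def lcc(x, y):
    """Pearson linear correlation coefficient (LCC)."""
    x = np.asarray(x, float)
    y = np.asarray(y, float)
    xm, ym = x - x.mean(), y - y.mean()
    return float((xm @ ym) / np.sqrt((xm @ xm) * (ym @ ym)))

def srocc(x, y):
    """Spearman rank-order correlation (SROCC): the LCC of the ranks.
    Ranks computed via double argsort, which assumes no ties."""
    rank = lambda v: np.argsort(np.argsort(np.asarray(v))).astype(float)
    return lcc(rank(x), rank(y))
```

Since a higher model score means better quality while a higher DMOS means worse, the raw correlation against DMOS is negative; its magnitude is what indicates agreement.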
Table 2. SROCC and LCC scores for testing on LIVE
5. CONCLUSION

In this paper, we proposed a highly accurate deep CNNs model for no-reference image quality assessment. We achieved an end-to-end no-reference image quality assessment method without any pre-processing or post-processing procedures. By taking a large-scale set of image patches as the training set and using their FSIM values as labels, our model achieves promising quality prediction results. We used the contents of the LIVE dataset to train and test our method. In order to verify the stability of our method, we also conducted a cross-dataset test on the TID2008 dataset. The experimental results confirm a performance leap relative to the ten compared state-of-the-art methods.

[8] … "… using similar scene as reference," in European Conference on Computer Vision. Springer, 2016, pp. 3–18.

[9] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, "Deep residual learning for image recognition," arXiv preprint arXiv:1512.03385, 2015.

[10] Damon M. Chandler, "Seven challenges in image quality assessment: past, present, and future research," ISRN Signal Processing, vol. 2013, 2013.

[11] Huixuan Tang, Neel Joshi, and Ashish Kapoor, "Learning a blind measure of perceptual image quality," in Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on. IEEE, 2011, pp. 305–312.