Quantifying image similarity using measure of enhancement by entropy
Eric A. Silva*(a), Karen Panetta(a), Sos S. Agaian(b)
(a) Department of Electrical & Computer Engineering, Tufts University, 161 College Avenue, Medford, MA 02155
(b) College of Engineering, University of Texas at San Antonio, 6900 North Loop 1604 West, San Antonio, TX 78249
ABSTRACT
Measurement of image similarity is important for a number of image processing applications. Image
similarity assessment is closely related to image quality assessment in that quality is based on the apparent
differences between a degraded image and the original, unmodified image. Automated evaluation of image
compression systems relies on accurate quality measurement. Current algorithms for measuring similarity
include mean squared error (MSE), peak signal-to-noise ratio (PSNR), and structural similarity (SSIM).
Each has limitations in consistency, accuracy, or computational cost.
In this paper, we show that a modified version of the measurement of enhancement by entropy (EME) can
be used as an image similarity measure, and thus an image quality measure. Until now, EME has generally
been used to measure the level of enhancement obtained using a given enhancement algorithm and
enhancement parameter. The similarity-EME (SEME) is based on the EME for enhancement. We will
compare SEME to existing measures over a set of images subjectively judged by humans. Computer
simulations have demonstrated its promise through a set of examples, as well as comparison to both
subjective ratings and state-of-the-art objective methods on a database of images compressed with JPEG.
Keywords: image quality assessment, image similarity
1. INTRODUCTION
Digital images are subject to a wide variety of distortions during acquisition, processing, storage,
transmission and reproduction, any of which may result in a degradation of visual quality. For applications
in which images are ultimately to be viewed by human beings, the only “correct” method of quantifying
visual image quality is through subjective evaluation. In practice, however, subjective evaluation is usually
too inconvenient, time-consuming and expensive. The goal of research in objective image quality
assessment is to develop quantitative measures that can automatically predict perceived image quality. An
objective image quality metric can play a variety of roles in image processing applications. First, it can be
used to dynamically monitor and adjust image quality; second, it can be used to optimize algorithms and
parameter settings of image processing systems; third, it can be used to benchmark image processing
systems and algorithms [1].
Measurement of image similarity is very important for image compression applications. When evaluating
image compression systems, the better an image similarity measure approximates the human visual system
(HVS), the more accurate the results of the evaluation. The most common measures applied to image
similarity are mean squared error (MSE) and peak signal-to-noise ratio (PSNR). It has been shown that
MSE and PSNR lack a critical feature: the ability to assess image similarity across distortion types [2], [1].
The distortions that digital images are subjected to result in a decrease in image quality. All image quality
and similarity measures are approximations (of varying accuracy) of the human visual system. The
opinions of human observers are the best benchmark for a similarity measure hoping to emulate the HVS.
* eric.silva@tufts.edu; phone 1 781 515 7056; www.ece.tufts.edu/~esilva02
The complexity of the HVS is so great that it will not be completely understood with existing
psychophysical means. However, the use of even a “simplified model” of the HVS in similarity measures
improves performance [10]. We use the opinions of human observers to evaluate our measure and others.
In the last three decades, a great deal of effort has gone into the development of image quality measures [7].
See Table 1 for examples of existing similarity measures.
Table 1. Image quality measures (F is the reference image, \hat{F} the distorted image, and O\{\cdot\} a Laplacian operator)

Normalized Cross-Correlation:
NK = \sum_{j=1}^{M}\sum_{k=1}^{N} F(j,k)\,\hat{F}(j,k) \Big/ \sum_{j=1}^{M}\sum_{k=1}^{N} [F(j,k)]^2

Maximum Difference:
MD = \max_{j,k} \left| F(j,k) - \hat{F}(j,k) \right|

Image Fidelity:
IF = 1 - \left[ \sum_{j=1}^{M}\sum_{k=1}^{N} [F(j,k) - \hat{F}(j,k)]^2 \Big/ \sum_{j=1}^{M}\sum_{k=1}^{N} [F(j,k)]^2 \right]

Laplacian Mean Square Error:
LMSE = \sum_{j=2}^{M-1}\sum_{k=2}^{N-1} [O\{F(j,k)\} - O\{\hat{F}(j,k)\}]^2 \Big/ \sum_{j=2}^{M-1}\sum_{k=2}^{N-1} [O\{F(j,k)\}]^2
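For concreteness, the measures in Table 1 can be sketched in a few lines of NumPy. The discrete Laplacian used for O{·} below is one common choice and is an assumption, since the operator is not specified here:

```python
import numpy as np

def classical_measures(F, Fhat):
    """Sketch of the Table 1 measures; F is the reference image,
    Fhat the distorted image (2-D arrays)."""
    F = F.astype(float)
    Fhat = Fhat.astype(float)
    # Normalized cross-correlation
    nk = np.sum(F * Fhat) / np.sum(F ** 2)
    # Maximum difference
    md = np.max(np.abs(F - Fhat))
    # Image fidelity
    fidelity = 1.0 - np.sum((F - Fhat) ** 2) / np.sum(F ** 2)
    # Laplacian mean square error; a 4-neighbor Laplacian is assumed
    def lap(I):
        return (I[:-2, 1:-1] + I[2:, 1:-1] + I[1:-1, :-2] + I[1:-1, 2:]
                - 4.0 * I[1:-1, 1:-1])
    lmse = np.sum((lap(F) - lap(Fhat)) ** 2) / np.sum(lap(F) ** 2)
    return nk, md, fidelity, lmse
```

For identical images the sketch returns NK = 1, MD = 0, IF = 1, and LMSE = 0, matching the definitions.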
2. BACKGROUND
Many existing image similarity measures are simple and can be represented as mathematical quantities.
These include MSE, PSNR, mean absolute error (MAE), and root mean squared (RMS) error. Others
attempt to model the human visual system (HVS), such as the universal quality index (UQI) and SSIM. In
this section we will define the measures compared to our similarity measure: MSE, PSNR, and SSIM.
MSE and PSNR simply and objectively quantify the error signal; they do not model the human visual
system. They both have low computational complexities when compared to image similarity measures that
attempt to model the HVS. MSE and PSNR are acceptable image similarity measures when the images in
question differ by simply increasing distortion of a certain type [5]. The mathematical similarity measures
fail to capture image quality when they are used to measure across distortion types. This is critical when
evaluating compression techniques.
Table 2 in Section 3.1 shows this clearly, as does [1]. The advantages of MSE and PSNR are that they are
very fast and easy to implement; however, they only quantify the error signal. Although
PSNR is more accurate than MSE, it still fails when used to assess similarity across distortion types.
Below MSE, PSNR, and SSIM are defined. With PSNR and SSIM, greater values indicate greater image
similarity, while with MSE greater values indicate lower image similarity.
MSE = \frac{1}{hw} \sum_{j=1}^{h} \sum_{k=1}^{w} I_{j,k}^{2}    (1)

Where h and w are the image height and width and I_{j,k} is the error (difference) signal at pixel (j, k).
Figure 1: Difference Mean Opinion Score vs. Mean Squared Error for 233 mean opinion scores
PSNR = 20 \log_{10}\left(\frac{MAX}{\sqrt{MSE}}\right)    (2)

Where MAX is the maximum possible pixel value (255 for 8-bit images).
Figure 2: Difference Mean Opinion Score vs. Peak Signal-to-Noise Ratio for 233 mean opinion scores
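A minimal sketch of MSE and PSNR as defined in equations (1) and (2), assuming 8-bit images (MAX = 255):

```python
import numpy as np

def mse(x, y):
    """Equation (1): mean squared error of the difference image I = x - y."""
    d = x.astype(float) - y.astype(float)
    return np.mean(d ** 2)

def psnr(x, y, max_val=255.0):
    """Equation (2): PSNR in dB; max_val = 255 assumes 8-bit images."""
    m = mse(x, y)
    if m == 0:
        return float("inf")   # identical images
    return 20.0 * np.log10(max_val / np.sqrt(m))
```

Note that PSNR is just a logarithmic rescaling of MSE, which is why the two measures rank images almost identically.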
MSSIM(X, Y) = \frac{1}{M} \sum_{j=1}^{M} SSIM(x_j, y_j)    (3)

Where X and Y are the reference and distorted images, respectively, x_j and y_j are the local windows, and M is the number of windows.
\mu_x = \frac{1}{N} \sum_{i=1}^{N} x_i    (4)
Where x is the input block, N is the number of pixels in that block, and µ is the mean intensity of x.
c(x, y) = \frac{2\sigma_x \sigma_y + C_2}{\sigma_x^2 + \sigma_y^2 + C_2}    (5)
Where x and y are the reference and distorted blocks and the standard deviation σx is used as an estimate of
contrast in x, which is shown below in equation (6). C2 is a constant, and for the purposes of this paper was
C2 = 2C3.
\sigma_x = \left( \frac{1}{N-1} \sum_{i=1}^{N} (x_i - \mu_x)^2 \right)^{1/2}    (6)
Where x is the input block and N is the number of pixels in that block.
s(x, y) = \frac{\sigma_{xy} + C_3}{\sigma_x \sigma_y + C_3}    (7)
Where x and y are the reference and distorted blocks and σx is as defined in equation (6). Equation (8)
shows the definition of σxy. C3 is a constant.
\sigma_{xy} = \frac{1}{N-1} \sum_{i=1}^{N} (x_i - \mu_x)(y_i - \mu_y)    (8)

Where x and y are the input blocks, µx and µy are their mean intensities, and N is the number of pixels per block.
Figure 3: Difference Mean Opinion Score vs. Structural Similarity for 233 mean opinion scores
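The SSIM contrast and structure components in equations (4)-(8) can be sketched for a single pair of blocks as follows. The numeric values of C2 and C3 are assumptions, following the common convention C2 = (0.03 × 255)² together with this paper's relation C2 = 2C3:

```python
import numpy as np

def ssim_components(x, y, C2=58.5225, C3=29.26125):
    """Contrast c(x, y) and structure s(x, y) terms (equations (5)
    and (7)) for one block pair; constants are assumed values."""
    x = x.astype(float).ravel()
    y = y.astype(float).ravel()
    N = x.size
    mu_x, mu_y = x.mean(), y.mean()                    # equation (4)
    sx = np.sqrt(np.sum((x - mu_x) ** 2) / (N - 1))    # equation (6)
    sy = np.sqrt(np.sum((y - mu_y) ** 2) / (N - 1))
    sxy = np.sum((x - mu_x) * (y - mu_y)) / (N - 1)    # equation (8)
    c = (2 * sx * sy + C2) / (sx ** 2 + sy ** 2 + C2)  # equation (5)
    s = (sxy + C3) / (sx * sy + C3)                    # equation (7)
    return c, s
```

For identical blocks both terms equal 1, their maximum, which is the expected behavior of a similarity measure on an undistorted image.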
3. EXPERIMENT
In Section 3, we will describe the various experiments and methods used to evaluate our measure and
others. In Section 3.1 we will describe the difficulty in measuring similarity across distortion types. In
Section 3.2 we will discuss the use of subjective human opinions to evaluate image similarity measures.
3.1. Measuring similarity across distortion types
Table 2 shows an image (a) distorted five different ways. In each instance, the MSE and PSNR are nearly
equivalent, which would suggest each image is of equal quality. This is clearly not the case. The SEME
and the SSIM measures vary substantially from distortion type to distortion type. With the exception of the
contrast stretch distortion, the SEME and SSIM measures have the same rank order. This simple example
illustrates one of the primary problems with mean squared error that is shared by peak signal-to-noise ratio.
3.2. Subjective quality opinions of JPEG compressed images
The Laboratory for Image and Video Engineering (LIVE) at The University of Texas at Austin created the
LIVE database—a database of images judged by humans [1]. These human opinions of image quality were
used to evaluate our similarity measure as well as others. The LIVE database provides 233 images
distorted to varying degrees by JPEG compression, each with an associated difference mean opinion score
(DMOS) [3].
The DMOS of a given image represents the similarity and/or quality of that image as judged by a group of
human observers. The higher the correlation between the DMOSs and the scores of a given similarity
measure, the better the measure. The DMOS is computed by taking the mean opinion score (MOS) of a
test image and subtracting it from the MOS of the original, undistorted image [3]. See equations (9) and
(10).
Each image has a MOS, which is the average quality score given to an image by test subjects, without
outliers. Further discussion of mean opinion scores is available in [3].
MOS = \frac{1}{N} \sum_{i=1}^{N} score_i    (9)

DMOS = MOS_{original} - MOS_{test}    (10)
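Computed directly from the definitions, a minimal sketch follows; the exact form of equation (10) is reconstructed from the prose (the test image's MOS subtracted from the original's MOS) and should be treated as an assumption:

```python
import numpy as np

def mos(scores):
    """Equation (9): mean opinion score over subject ratings
    (outlier rejection, per [3], is omitted here)."""
    return float(np.mean(scores))

def dmos(original_scores, test_scores):
    """Equation (10) as described in the text: the MOS of the test
    image subtracted from the MOS of the original image."""
    return mos(original_scores) - mos(test_scores)
```

Under this convention a heavily distorted image receives a large DMOS, which is why lower DMOS values indicate higher quality in Figures 1-7.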
The scatter plots in Figures 1-7 visually show the relationship between each similarity
measure. The performance metrics in section 4.1 provide an objective assessment of similarity measures
for JPEG images. It is important to note that these performance metrics do not evaluate similarity measures
across distortion types.
The performance metrics in section 4.1 show the relationship between each similarity measure and the
subjective opinions. We would expect to see a strong correlation in the scatter plots of successful similarity
measures. Correlation coefficients between the objective and subjective scores are computed after non-
linear regression analysis [3].
4. NEW SIMILARITY MEASURE
In this section we define a new similarity measure. The EME is a measure of entropy. It was originally
created to determine optimal enhancement parameters by measuring image quality blindly (without a
reference or “original” image) [4]. To measure the similarity of images, the entropy of the error signal (the
difference between the images) is measured with SEME. The absolute values of the minimum and
maximum luminances are taken to accumulate differences, to prevent certain types of differences from
canceling out other differences. The logarithmic nature of the original EME has been eliminated leaving a
linear function [4]. This seems counterintuitive due to the logarithmic nature of the human visual system,
but yields better results for similarity measurement. The HVS lacks a complete or nearly complete
mathematical model, therefore the omission of the logarithmic function should be of little concern [5].
In equations (11) and (12) we define the two variants of SEME.
SEME_r = \frac{1}{k_1 k_2} \sum_{j=1}^{k_2} \sum_{i=1}^{k_1} \frac{I^{w}_{\max;i,j} - I^{w}_{\min;i,j}}{I^{w}_{\max;i,j} + I^{w}_{\min;i,j}}    (11)
Where k1 and k2 are the number of horizontal and vertical blocks in an image (which is related to the
block and image size) and Imax; i,j and Imin; i,j are the maximum and minimum values of the block,
respectively.
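A sketch of SEME-r over the error signal, assuming (per the text above) that I_max and I_min are the absolute values of each error-image block's maximum and minimum; the handling of partial boundary blocks is also an assumption:

```python
import numpy as np

def seme_r(x, y, w=3):
    """Sketch of SEME-r (equation (11)) with w x w blocks on the
    error image x - y. Blocks that do not fit entirely inside the
    image are dropped (an assumed boundary policy)."""
    d = x.astype(float) - y.astype(float)   # error signal
    h, width = d.shape
    k2, k1 = h // w, width // w             # vertical, horizontal block counts
    total = 0.0
    for j in range(k2):
        for i in range(k1):
            blk = d[j * w:(j + 1) * w, i * w:(i + 1) * w]
            imax, imin = abs(blk.max()), abs(blk.min())
            if imax + imin > 0:             # all-zero blocks contribute nothing
                total += (imax - imin) / (imax + imin)
    return total / (k1 * k2)
```

Identical images produce an all-zero error signal, so every block term vanishes and SEME-r is 0.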
SEME_m = \frac{1}{k_1 k_2} \sum_{j=1}^{k_2} \sum_{i=1}^{k_1} \left[ \frac{I^{w}_{\max;i,j} - I^{w}_{\min;i,j}}{I^{w}_{\max;i,j} + I^{w}_{\min;i,j}} \, MSE(I^{w}_{i,j}) \right]    (12)
Where k1 and k2 are the number of horizontal and vertical blocks in an image (which is related to the
block and image size) and Imax; i,j and Imin; i,j are the maximum and minimum values of the block,
respectively. MSE(I^w_{i,j}) is the mean squared error of the block.
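SEME-m differs from SEME-r only in weighting each block term by that block's mean squared error; a sketch under the same assumptions as the SEME-r sketch (absolute block extrema, partial boundary blocks dropped):

```python
import numpy as np

def seme_m(x, y, w=3):
    """Sketch of SEME-m (equation (12)): each block's SEME-r term is
    weighted by the block's mean squared error."""
    d = x.astype(float) - y.astype(float)   # error signal
    h, width = d.shape
    k2, k1 = h // w, width // w
    total = 0.0
    for j in range(k2):
        for i in range(k1):
            blk = d[j * w:(j + 1) * w, i * w:(i + 1) * w]
            imax, imin = abs(blk.max()), abs(blk.min())
            if imax + imin > 0:
                total += (imax - imin) / (imax + imin) * np.mean(blk ** 2)
    return total / (k1 * k2)
```

The MSE weighting makes large-amplitude block errors count for more, which is consistent with SEME-m's higher accuracy in Table 3.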
Figure 4: Difference Mean Opinion Score vs. SEME-r-3 (SEME-r with 3 × 3 pixel windows) for 233
mean opinion scores
Figure 5: Difference Mean Opinion Score vs. SEME-r-4 (SEME-r with 4 × 4 pixel windows) for 233
mean opinion scores
Figure 6: Difference Mean Opinion Score (DMOS) vs. SEME-m-3 (SEME-m with 3 × 3 pixel windows)
for 233 mean opinion scores
Figure 7: Difference Mean Opinion Score (DMOS) vs. SEME-m-4 (SEME-m with 4 × 4 pixel windows)
for 233 mean opinion scores
\rho = 1 - \frac{6 \sum D^2}{N(N^2 - 1)}    (14)

Where D is the difference between the rank assigned by the similarity measure in question and the rank of
the subjective reference score for each data point, and N is the number of data points.
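Equation (14) can be sketched directly from ranks; this simple formula is exact only when there are no tied values, which is an assumption of the sketch:

```python
import numpy as np

def spearman_rocc(a, b):
    """Equation (14): Spearman rank-order correlation coefficient
    from per-point rank differences D (assumes no ties)."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    ra = np.argsort(np.argsort(a)) + 1   # 1-based ranks of a
    rb = np.argsort(np.argsort(b)) + 1   # 1-based ranks of b
    D = ra - rb
    N = len(a)
    return 1.0 - 6.0 * np.sum(D ** 2) / (N * (N ** 2 - 1))
```

A perfectly monotonic relationship gives ρ = 1 and a perfectly reversed one gives ρ = -1, regardless of any non-linear mapping between the two score scales.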
Consistency
Prediction consistency can be measured by calculating the outlier ratio (OR) of the data. Outliers are
defined as those data points outside a window of two standard deviations [3].
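A rough sketch of the outlier ratio; [3] defines the window using per-image subjective confidence intervals, so this global two-standard-deviation version is a simplification and an assumption:

```python
import numpy as np

def outlier_ratio(predicted, subjective):
    """Fraction of data points whose prediction error falls outside
    two standard deviations of the errors (simplified from [3])."""
    err = np.asarray(predicted, float) - np.asarray(subjective, float)
    sigma = err.std()
    outliers = np.sum(np.abs(err - err.mean()) > 2.0 * sigma)
    return outliers / err.size
```

A lower OR indicates a more consistent measure; in Table 3 SEME-r-3 attains the lowest OR of the measures compared.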
Performance Summary
Table 3. Similarity measure performance summary. CC: correlation coefficient, OR: outlier ratio, RMS: root mean
squared error, MAE: mean absolute error, SROCC: Spearman rank-order correlation coefficient.

              Non-linear Regression            Rank-order
Model         CC      OR      RMS     MAE      SROCC
PSNR          0.852   0.217   8.36    6.75     0.844
MSSIM         0.923   0.166   5.96    4.52     0.903
MSE           0.858   0.200   8.22    6.35     0.844
SEME-r-3      0.855   0.154   8.28    6.08     0.790
SEME-r-4      0.859   0.183   8.18    5.92     0.811
SEME-m-3      0.895   0.211   7.12    5.48     0.877
SEME-m-4      0.901   0.194   6.95    5.31     0.881
The greatest consistency can be found using SEME-r-3; however, the greatest monotonicity and accuracy is
still found using MSSIM, followed closely by SEME-m-4 and SEME-m-3. SEME-m-4 is nearly as
accurate as MSSIM, but can be computed in half as much time.
4.3. Speed
Running speed of similarity and quality measures is important in systems where adjustments in
compression, etc. are made in real-time, such as streaming video systems. The table below shows the
estimated performance of similarity measures in terms of running time. The elapsed time was taken by
running each measure on a set of 233 images.
Table 4. Running times of a MATLAB implementation on a 1.5 GHz PowerPC with 512 MB RAM, run on 233 images.
5. CONCLUSION
The capacity for an image similarity measure to evaluate across distortion types is critical. We have
introduced a new measure that accomplishes this and exceeds the best existing measure in terms of
consistency. We have introduced a computationally fast algorithm that is a closer approximation to the
human visual system than existing fast algorithms. We have shown that SEME measures image similarity
across distortion types with high accuracy, speed, and consistency. Within the class of similarity measures
that successfully measure across distortion types, our new measure is greater than twice as fast as MSSIM.
MSSIM still represents the greatest amount of accuracy and monotonicity, but is two to four times slower
than the competing measures.
6. REFERENCES
1. Wang, Bovik, Sheikh, et al. “Image Quality Assessment: From Error Visibility to Structural Similarity,”
IEEE Transactions on Image Processing, vol. 13, pp. 1-12, April 2004.
2. Michael P. Eckert, Andrew P. Bradley, “Perceptual quality metrics applied to still image compression,”
Signal Processing 70, pp. 177-200, 1998.
3. “Final report from the video quality experts group on the validation of objective models of video quality
assessment, phase II,” March 2003. http://www.vqeg.com/.
4. Sos S. Agaian, Blair Silver, and Karen A. Panetta, “Transform coefficient histogram-based image
enhancement algorithms using contrast entropy,” IEEE Transactions on Image Processing, vol. 16, pp.
741-758, Mar. 2007.
5. Zhou Wang and Alan C. Bovik, “A Universal Image Quality Index,” IEEE Signal Processing Letters,
vol. 9, no. 3, pp. 81-84, Mar. 2002.
6. Ramesh Jain, S. N. Jayaram Murthy, Peter L-J Chen (Luong Tran), Shankar Chatterjee, “Similarity
Measures for Image Databases,” SPIE Proceedings Storage and Retrieval for Image and Video Databases,
pp. 58-65, 1995.
7. Ahmet M. Eskicioglu and Paul S. Fisher, “Image Quality Measures and Their Performance,” IEEE
Transactions on Communications, vol. 43, pp. 2959-2965, Dec. 1995.
8. Blair Silver, Sos Agaian, and Karen Panetta, “Logarithmic transform coefficient histogram matching
with spatial equalization,” Proceedings: Visual Information Processing XIV, SPIE Defense and Security
Symposium 2005, Vol 5817, Mar. 2005. pp. 237-249.
9. Hamid Rahim Sheikh and Alan C. Bovik, “Image Information and Visual Quality,” IEEE Transactions
on Image Processing, vol. 15, no. 2, Feb. 2006.
10. A. M. Eskicioglu and P. S. Fisher, “A survey of quality measures for gray scale image compression,” in
Proc. 1993 Space and Earth Science Data Compression Workshop (NASA conference Publication 3191),
Snowbird, Utah, Apr. 2, 1993, pp. 49-61.
11. G. G. Kuperman and D. L. Wilson, “Objective and subjective assessment of image compression
algorithms,” in Society for Information Display Int. Symp. Digest of Technical Papers, vol. 22, pp. 627-
630, 1991.
12. T. N. Pappas and R. J. Safranek, “Perceptual criteria for image quality evaluation,” in Handbook of
Image and Video Proc., A. Bovik, Ed. New York: Academic, 2000.
13. Z. Wang, H. R. Sheikh, and A. C. Bovik, “Objective video quality assessment,” in The Handbook of
Video Databases: Design and Applications, B. Furht and O. Marques, Eds. Boca Raton, FL: CRC Press,
2003.
14. S. Winkler, “Issues in vision modeling for perceptual video quality assessment,” Signal Processing,
vol. 78, pp. 231–252, 1999.
15. Harilaos Koumaras, Anastasios Kourtis, and Drakoulis Martakos, “Evaluation of Video Quality Based
on Objectively Estimated Metric,” Journal of Communications and Networks, Vol. 7, No. 3, Sept. 2005.
16. Y. K. Lai and J. Kuo, “A Haar wavelet approach to compressed image quality measurement,” Journal
of Visual Communication and Image Representation, vol. 11, pp. 81–84, 2000.
17. J. Lauterjung, “Picture quality measurement,” in Proc. Int. Broadcasting Convention (IBC’98),
Amsterdam, 1998, pp. 413-417.