
Quantifying image similarity using measure of enhancement by entropy

Eric A. Silva*ᵃ, Karen Panettaᵃ, Sos S. Agaianᵇ

ᵃDepartment of Electrical & Computer Engineering, Tufts University, 161 College Avenue, Medford, MA 02155
ᵇCollege of Engineering, University of Texas at San Antonio, 6900 North Loop 1604 West, San Antonio, TX 78249

ABSTRACT

Measurement of image similarity is important for a number of image processing applications. Image
similarity assessment is closely related to image quality assessment in that quality is based on the apparent
differences between a degraded image and the original, unmodified image. Automated evaluation of image
compression systems relies on accurate quality measurement. Current algorithms for measuring similarity
include mean squared error (MSE), peak signal-to-noise ratio (PSNR), and structural similarity (SSIM).
They have limitations: MSE and PSNR lack consistency and accuracy across distortion types, while SSIM incurs a greater computational cost.
In this paper, we show that a modified version of the measurement of enhancement by entropy (EME) can
be used as an image similarity measure, and thus an image quality measure. Until now, EME has generally
been used to measure the level of enhancement obtained using a given enhancement algorithm and
enhancement parameter. The similarity-EME (SEME) is based on the EME for enhancement. We will
compare SEME to existing measures over a set of images subjectively judged by humans. Computer
simulations have demonstrated its promise through a set of examples, as well as comparison to both
subjective ratings and state-of-the-art objective methods on a database of images compressed with JPEG.
Keywords: image quality assessment, image similarity

1. INTRODUCTION
Digital images are subject to a wide variety of distortions during acquisition, processing, storage,
transmission and reproduction, any of which may result in a degradation of visual quality. For applications
in which images are ultimately to be viewed by human beings, the only “correct” method of quantifying
visual image quality is through subjective evaluation. In practice, however, subjective evaluation is usually
too inconvenient, time-consuming and expensive. The goal of research in objective image quality
assessment is to develop quantitative measures that can automatically predict perceived image quality. An
objective image quality metric can play a variety of roles in image processing applications. First, it can be
used to dynamically monitor and adjust image quality; second, it can be used to optimize algorithms and
parameter settings of image processing systems; third, it can be used to benchmark image processing
systems and algorithms [1].
Measurement of image similarity is very important for image compression applications. When evaluating
image compression systems, the better an image similarity measure approximates the human visual system
(HVS), the more accurate the results of the evaluation. The most common measures applied to image
similarity are mean squared error (MSE) and peak signal-to-noise ratio (PSNR). It has been shown that
MSE and PSNR lack a critical feature: the ability to assess image similarity across distortion types [2], [1].
The distortions that digital images are subjected to result in a decrease in image quality. All image quality
and similarity measures are approximations (of varying accuracy) of the human visual system. The
opinions of human observers are the best benchmark for a similarity measure hoping to emulate the HVS.

* eric.silva@tufts.edu; phone 1 781 515 7056; www.ece.tufts.edu/~esilva02
The complexity of the HVS is so great that it will not be completely understood with existing
psychophysical means. However, the use of even a “simplified model” of the HVS in similarity measures
improves performance [10]. We use the opinions of human observers to evaluate our measure and others.
In the last three decades, a great deal of effort has gone into the development of image quality measures [7].
See Table 1 for examples of existing similarity measures.
Table 1. Image quality measures

Peak Mean Square Error:
PMSE = \frac{1}{MN}\sum_{j=1}^{M}\sum_{k=1}^{N}\left[F(j,k)-\hat{F}(j,k)\right]^{2} \Big/ \left[\max\{F(j,k)\}\right]^{2}

Average Difference:
AD = \sum_{j=1}^{M}\sum_{k=1}^{N}\left[F(j,k)-\hat{F}(j,k)\right] \Big/ MN

Structural Content:
SC = \sum_{j=1}^{M}\sum_{k=1}^{N}\left[F(j,k)\right]^{2} \Big/ \sum_{j=1}^{M}\sum_{k=1}^{N}\left[\hat{F}(j,k)\right]^{2}

Normalized Cross-Correlation:
NK = \sum_{j=1}^{M}\sum_{k=1}^{N}F(j,k)\,\hat{F}(j,k) \Big/ \sum_{j=1}^{M}\sum_{k=1}^{N}\left[F(j,k)\right]^{2}

Correlation Quality:
CQ = \sum_{j=1}^{M}\sum_{k=1}^{N}F(j,k)\,\hat{F}(j,k) \Big/ \sum_{j=1}^{M}\sum_{k=1}^{N}F(j,k)

Maximum Difference:
MD = \max\left\{\left|F(j,k)-\hat{F}(j,k)\right|\right\}

Image Fidelity:
IF = 1-\left[\sum_{j=1}^{M}\sum_{k=1}^{N}\left[F(j,k)-\hat{F}(j,k)\right]^{2} \Big/ \sum_{j=1}^{M}\sum_{k=1}^{N}\left[F(j,k)\right]^{2}\right]

Laplacian Mean Square Error:
LMSE = \sum_{j=2}^{M-1}\sum_{k=2}^{N-1}\left[O\{F(j,k)\}-O\{\hat{F}(j,k)\}\right]^{2} \Big/ \sum_{j=2}^{M-1}\sum_{k=2}^{N-1}\left[O\{F(j,k)\}\right]^{2}

Normalized Absolute Error:
NAE = \sum_{j=1}^{M}\sum_{k=1}^{N}\left|O\{F(j,k)\}-O\{\hat{F}(j,k)\}\right| \Big/ \sum_{j=1}^{M}\sum_{k=1}^{N}\left|O\{F(j,k)\}\right|

Normalized Mean Square Error:
NMSE = \sum_{j=1}^{M}\sum_{k=1}^{N}\left[O\{F(j,k)\}-O\{\hat{F}(j,k)\}\right]^{2} \Big/ \sum_{j=1}^{M}\sum_{k=1}^{N}\left[O\{F(j,k)\}\right]^{2}

Note: F(j,k) denotes the original image and \hat{F}(j,k) the distorted image. For LMSE, O{F(j,k)} = F(j+1,k) + F(j-1,k) + F(j,k+1) + F(j,k-1) - 4F(j,k). For NAE and NMSE, O{F(j,k)} is defined in three ways: (1) O{F(j,k)} = F(j,k), (2) O{F(j,k)} = [F(j,k)]^{1/3}, (3) O{F(u,v)} = H{(u²+v²)^{1/2}} F(u,v) (in the cosine transform domain).
Reviews of image and video quality assessment algorithms can be found in [2], [12], [13], [14], [15], [16], and [17]. In [5], Wang and Bovik introduced the Universal Quality Index (UQI), which can successfully measure image similarity across distortion types.
More recently, Wang, Bovik, et al. introduced the Structural Similarity (SSIM) measure [1], which also measures similarity across distortion types. Both UQI and SSIM measure similarity with greater accuracy and consistency than MSE and PSNR, but incur greater computational cost. In this paper we focus our experiments on MSE and PSNR, because they are so widely used, and on SSIM, because of its high performance. In the last three decades, a great deal of effort has gone into the development of quality assessment methods that take advantage of known characteristics of the human visual system (HVS) [1].
Our goal was to develop a fast image similarity measure that approximates the human visual system (HVS)
and is consistent as well as accurate. It is necessary to use an objective measure to evaluate the quality of
compressed images. MSE and PSNR are common, but they are ineffective at approximating human
subjectivity across distortion types. They also have limited performance within a distortion type. These
measures are simple and can be easily represented mathematically, which may account for their popularity
[2]. SSIM and UQI are accurate and can measure across distortion types, but they are computationally
complex.
The proliferation of new compression techniques requires methods to evaluate their performance. The ability to
measure image similarity and quality across distortion types is critical when comparing different
compression schemes. In this paper, we introduce a new similarity measure based upon the measure of
enhancement by entropy (EME) that successfully does so [4]. We call this similarity-EME, or SEME. We
compare existing and new similarity measures with human opinions using various performance metrics.
The rest of the paper is organized as follows. Section 2 presents background on a subset of similarity measures. Section 3 describes the quantification of subjective assessments. Section 4 presents the proposed measure and examines its efficiency by comparing the subjectively derived data to the corresponding results from our objective measure and others.

2. BACKGROUND
Many existing image similarity measures are simple and can be represented as mathematical quantities.
These include MSE, PSNR, mean absolute error (MAE), and root mean squared (RMS) error. Others
attempt to model the human visual system (HVS), such as the universal quality index (UQI) and SSIM. In
this section we will define the measures compared to our similarity measure: MSE, PSNR, and SSIM.
MSE and PSNR simply and objectively quantify the error signal and they do not model the human visual
system. They both have low computational complexities when compared to image similarity measures that
attempt to model the HVS. MSE and PSNR are acceptable image similarity measures when the images in
question differ by simply increasing distortion of a certain type [5]. The mathematical similarity measures
fail to capture image quality when they are used to measure across distortion types. This is critical when
evaluating compression techniques.
Table 2 in Section 3.1 shows this clearly, as does [1]. The advantages of MSE and PSNR are that they are very fast and easy to implement; however, they merely quantify the error signal without regard for how it is perceived. Although PSNR is more accurate than MSE, it still fails when used to assess similarity across distortion types.
Below MSE, PSNR, and SSIM are defined. With PSNR and SSIM, greater values indicate greater image
similarity, while with MSE greater values indicate lower image similarity.
MSE = \frac{1}{hw}\sum_{j=1}^{h}\sum_{k=1}^{w} I_{j,k}^{2}

where I is the error (difference) image and h and w are its height and width.

Figure 1: Difference Mean Opinion Score vs. Mean Squared Error for 233 mean opinion scores

" MAX %
PSNR = 20 log10 $ '
# MSE &

Figure 2: Difference Mean Opinion Score vs. Peak Signal-to-Noise Ratio for 233 mean opinion scores
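To make the two definitions above concrete, a minimal NumPy sketch is given below. The function names, the default MAX value of 255 for 8-bit images, and the handling of a zero error are our own assumptions, not taken from the paper.

import numpy as np

def mse(reference: np.ndarray, distorted: np.ndarray) -> float:
    """Mean squared error of the difference image I = reference - distorted."""
    diff = reference.astype(np.float64) - distorted.astype(np.float64)
    return float(np.mean(diff ** 2))

def psnr(reference: np.ndarray, distorted: np.ndarray, max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB, assuming 8-bit images (MAX = 255)."""
    err = mse(reference, distorted)
    if err == 0:
        return float("inf")  # identical images
    return 20.0 * np.log10(max_value / np.sqrt(err))

As stated above, greater PSNR values indicate greater similarity, while greater MSE values indicate lower similarity.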

The structural similarity (SSIM) index is defined in [1] by equations (1)-(8).


MSSIM(X, Y) = \frac{1}{M}\sum_{j=1}^{M} SSIM(x_j, y_j)    (1)

Where X and Y are the reference and distorted images, respectively, and x_j and y_j are the j-th local windows of each.

SSIM(x, y) = [l(x, y)]^{\alpha} \cdot [c(x, y)]^{\beta} \cdot [s(x, y)]^{\gamma}    (2)


Where x and y are the reference and distorted blocks to operate on, l(x, y) is the luminance comparison,
c(x, y) is the contrast comparison, and s(x, y) is the structure comparison. These are described below in
equations (3), (5), and (7). The relative importance of luminance, contrast, and structure can be adjusted
with the parameters α, β, and γ, respectively.
l(x, y) = \frac{2\mu_x \mu_y + C_1}{\mu_x^2 + \mu_y^2 + C_1}    (3)
Where x and y are the reference and distorted blocks and µ is the mean intensity of each respective block, which is shown below in equation (4). C1 is a constant.

\mu_x = \frac{1}{N}\sum_{i=1}^{N} x_i    (4)

Where x is the input block, N is the number of pixels in that block, and µ is the mean intensity of x.
c(x, y) = \frac{2\sigma_x \sigma_y + C_2}{\sigma_x^2 + \sigma_y^2 + C_2}    (5)
Where x and y are the reference and distorted blocks and the standard deviation σx is used as an estimate of
contrast in x, which is shown below in equation (6). C2 is a constant, and for the purposes of this paper was
C2 = 2C3.
\sigma_x = \left[\frac{1}{N-1}\sum_{i=1}^{N}(x_i - \mu_x)^2\right]^{1/2}    (6)
Where x is the input block and N is the number of pixels in that block.
s(x, y) = \frac{\sigma_{xy} + C_3}{\sigma_x \sigma_y + C_3}    (7)
Where x and y are the reference and distorted blocks and σx is as defined in equation (6). Equation (8)
shows the definition of σxy. C3 is a constant.

\sigma_{xy} = \frac{1}{N-1}\sum_{i=1}^{N}(x_i - \mu_x)(y_i - \mu_y)    (8)

Where x and y are the input blocks, µ_x and µ_y are their mean intensities, and N is the number of pixels in each block.
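The following is a hedged NumPy sketch of equations (1)-(8) with α = β = γ = 1. The constants C1 and C2 are not specified in the excerpt above; the values below follow the defaults suggested in [1] (C1 = (0.01·255)², C2 = (0.03·255)²), and the non-overlapping 8 × 8 tiling is our simplification.

import numpy as np

def ssim_block(x: np.ndarray, y: np.ndarray,
               C1: float = (0.01 * 255) ** 2,
               C2: float = (0.03 * 255) ** 2) -> float:
    """SSIM of one reference/distorted block pair, eqs. (2)-(8) with alpha = beta = gamma = 1."""
    x = x.astype(np.float64).ravel()
    y = y.astype(np.float64).ravel()
    C3 = C2 / 2.0                        # C2 = 2*C3, as stated in the text
    mu_x, mu_y = x.mean(), y.mean()      # eq. (4)
    sd_x = x.std(ddof=1)                 # eq. (6), N-1 normalization
    sd_y = y.std(ddof=1)
    cov_xy = ((x - mu_x) * (y - mu_y)).sum() / (x.size - 1)   # eq. (8)
    l = (2 * mu_x * mu_y + C1) / (mu_x ** 2 + mu_y ** 2 + C1)  # eq. (3)
    c = (2 * sd_x * sd_y + C2) / (sd_x ** 2 + sd_y ** 2 + C2)  # eq. (5)
    s = (cov_xy + C3) / (sd_x * sd_y + C3)                     # eq. (7)
    return l * c * s                                           # eq. (2)

def mssim(X: np.ndarray, Y: np.ndarray, win: int = 8) -> float:
    """Mean SSIM over non-overlapping win x win windows, eq. (1)."""
    scores = []
    h, w = X.shape
    for j in range(0, h - win + 1, win):
        for i in range(0, w - win + 1, win):
            scores.append(ssim_block(X[j:j + win, i:i + win], Y[j:j + win, i:i + win]))
    return float(np.mean(scores))

Note that the reference implementation in [1] uses an 11 × 11 circular-symmetric Gaussian sliding window rather than non-overlapping blocks; the simpler tiling here is used only to keep the sketch short.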

Figure 3: Difference Mean Opinion Score vs. Structural Similarity for 233 mean opinion scores
3. EXPERIMENT
In Section 3, we will describe the various experiments and methods used to evaluate our measure and
others. In Section 3.1 we will describe the difficulty in measuring similarity across distortion types. In
Section 3.2 we will discuss the use of subjective human opinions to evaluate image similarity measures.

3.1. Measuring similarity across distortion types


In this section we will illustrate the problem that occurs when most traditional measures are used to assess
similarity across distortion types. Popular similarity measures like mean squared error (MSE) and peak
signal-to-noise ratio (PSNR) simply and objectively quantify the error signal. They were not designed
specifically for the error signals created when an image is distorted, but for signal distortion in general.
Table 2. An original image with five variations, each with four quality/similarity measures (rank order in parentheses)

(a) Original
(b) Contrast stretch:   MSE = 272.9, SEME = 0.2930 (3rd), SSIM = 0.9470 (2nd), PSNR = 23.77
(c) Mean shift:         MSE = 272.3, SEME = 0.0000 (1st), SSIM = 0.9837 (1st), PSNR = 21.78
(d) JPEG compression:   MSE = 275.1, SEME = 0.4326 (4th), SSIM = 0.6383 (5th), PSNR = 23.74
(e) Blur:               MSE = 272.2, SEME = 0.4819 (5th), SSIM = 0.6944 (4th), PSNR = 23.78
(f) Impulsive noise:    MSE = 266.7, SEME = 0.1220 (2nd), SSIM = 0.7135 (3rd), PSNR = 23.87

Table 2 shows an image (a) distorted five different ways. In each instance, the MSE and PSNR are nearly
equivalent, which would suggest each image is of equal quality. This is clearly not the case. The SEME
and the SSIM measures vary substantially from distortion type to distortion type. With the exception of the
contrast stretch distortion, the SEME and SSIM measures have the same rank order. This simple example
illustrates one of the primary problems with mean squared error that is shared by peak signal-to-noise ratio.
3.2. Subjective quality opinions of JPEG compressed images
The Laboratory for Image and Video Engineering (LIVE) at The University of Texas at Austin created the
LIVE database—a database of images judged by humans [1]. These human opinions of image quality were
used to evaluate our similarity measure as well as others. The LIVE database provides 233 images
distorted to varying degrees by JPEG compression, each with an associated difference mean opinion score
(DMOS) [3].
The DMOS of a given image represents the similarity and/or quality of that image as judged by a group of human observers. The higher the correlation between the DMOSs and the scores of a given similarity measure, the better the measure. The DMOS is computed by subtracting the mean opinion score (MOS) of a test image from the MOS of the original, undistorted image [3]. See equations (9) and (10).
Each image has a MOS, which is the average quality score given to the image by the test subjects, with outliers removed. Further discussion of mean opinion scores is available in [3].
MOS = \frac{1}{N}\sum_{i=1}^{N} score_i    (9)

DMOS = MOS_{original} - MOS_{distorted}    (10)
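For concreteness, a minimal sketch of equations (9) and (10), assuming the per-image score lists have already had outliers removed upstream:

import numpy as np

def mos(scores) -> float:
    """Mean opinion score of one image, eq. (9); outlier removal is assumed done beforehand."""
    return float(np.mean(scores))

def dmos(original_scores, distorted_scores) -> float:
    """Difference mean opinion score, eq. (10)."""
    return mos(original_scores) - mos(distorted_scores)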

The scatter plots in Figures 1-7 visually show the relationship between each similarity measure and the subjective opinions; we would expect to see a strong correlation in the scatter plots of successful similarity measures. The performance metrics in Section 4.1 provide an objective assessment of the similarity measures for JPEG images. It is important to note that these performance metrics do not evaluate the similarity measures across distortion types. Correlation coefficients between the objective and subjective scores are computed after a non-linear regression analysis [3].
4. NEW SIMILARITY MEASURE
In this section we define a new similarity measure. The EME is a measure of entropy. It was originally
created to determine optimal parameters for enhancement by measuring image quality blindly (without a
reference or “original” image) [4]. To measure the similarity of images, the entropy of the error signal (the
difference between the images) is measured with SEME. The absolute values of the minimum and
maximum luminances are taken to accumulate differences, to prevent certain types of differences from
canceling out other differences. The logarithmic component of the original EME has been eliminated, leaving a linear function [4]. This seems counterintuitive given the logarithmic nature of the human visual system, but it yields better results for similarity measurement. The HVS lacks a complete, or even nearly complete, mathematical model, so the omission of the logarithm should be of little concern [5].
In equations (11) and (12) we define the two variants of SEME.
w w
1 k2 k1 I max;i , j " I min;i , j
SEME r = !! I w w
(11)
k1k 2 j =1 i =1 max;i , j + I min;i, j

Where k1 and k2 are the numbers of horizontal and vertical blocks in the image (which are determined by the block and image sizes), and I^{w}_{max;i,j} and I^{w}_{min;i,j} are the maximum and minimum values of block (i, j), respectively.
SEME_m = \frac{1}{k_1 k_2}\sum_{j=1}^{k_2}\sum_{i=1}^{k_1}\left[\frac{I^{w}_{max;i,j} - I^{w}_{min;i,j}}{I^{w}_{max;i,j} + I^{w}_{min;i,j}}\, MSE(I^{w}_{i,j})\right]    (12)
Where k1 and k2 are the numbers of horizontal and vertical blocks in the image (which are determined by the block and image sizes), I^{w}_{max;i,j} and I^{w}_{min;i,j} are the maximum and minimum values of block (i, j), respectively, and MSE(I^{w}_{i,j}) is the mean squared error of that block.
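To make the block structure concrete, here is a minimal NumPy sketch of equations (11) and (12) under our reading of the text: SEME operates on the error (difference) image and absolute values are taken to prevent cancellation. The window size default, the exact way the absolute values enter, and the small epsilon guarding an all-zero block are our own assumptions, not taken from the paper.

import numpy as np

def seme(reference: np.ndarray, distorted: np.ndarray,
         win: int = 4, modulated: bool = False, eps: float = 1e-12) -> float:
    """SEME-r (eq. 11) or, with modulated=True, SEME-m (eq. 12).

    One plausible reading of the text: work on the absolute error image and use the
    per-block maximum and minimum of |reference - distorted|. The eps term guarding
    an all-zero block is our addition.
    """
    err = np.abs(reference.astype(np.float64) - distorted.astype(np.float64))
    h, w = err.shape
    terms = []
    for j in range(0, h - win + 1, win):        # k2 vertical block positions
        for i in range(0, w - win + 1, win):    # k1 horizontal block positions
            block = err[j:j + win, i:i + win]
            i_max, i_min = block.max(), block.min()
            ratio = (i_max - i_min) / (i_max + i_min + eps)
            if modulated:
                ratio *= np.mean(block ** 2)    # MSE of the error block, as in eq. (12)
            terms.append(ratio)
    return float(np.mean(terms))                # 1/(k1*k2) times the double sum

Under these assumptions, seme(ref, dist, win=3) and seme(ref, dist, win=3, modulated=True) would correspond to the SEME-r-3 and SEME-m-3 variants reported below, and win=4 to the 4 × 4 variants.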

Figure 4: Difference Mean Opinion Score vs. SEME-r-3 (SEME-r with 3 × 3 pixel windows) for 233
mean opinion scores
Figure 5: Difference Mean Opinion Score vs. SEME-r-4 (SEME-r with 4 × 4 pixel windows) for 233
mean opinion scores


Figure 6: Difference Mean Opinion Score (DMOS) vs. SEME-m-3 (SEME-m with 3 × 3 pixel windows)
for 233 mean opinion scores
Figure 7: Difference Mean Opinion Score (DMOS) vs. SEME-m-4 (SEME-m with 4 × 4 pixel windows)
for 233 mean opinion scores

4.1. Fitting with Logistic Function


Fitting to a logistic function was used in favor of a linear fit, because subjective testing may produce nonlinear results at the extremes of the testing range. The stability and error-variance of an objective similarity measure are the critical characteristics, not linearity [3].
Therefore, a nonlinear regression was performed on the data prior to analysis. The following logistic
function was used on the recommendation of the Video Quality Experts Group (VQEG).
DMOS_p = \frac{K}{1 + e^{-r(SM - t_0)}}    (13)
In equation (13), SM is the similarity measure value, K, r, and t0 are fitted parameters, and DMOSp is the transformed value. The DMOSp (transformed objective scores) and DMOS (subjective scores) values are used to compute the performance metrics described below.
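As an illustration, a hedged SciPy sketch of fitting equation (13); the initial parameter guesses and the helper names are our own assumptions:

import numpy as np
from scipy.optimize import curve_fit

def logistic(sm, K, r, t0):
    """Equation (13): DMOS_p = K / (1 + exp(-r * (SM - t0)))."""
    return K / (1.0 + np.exp(-r * (sm - t0)))

def fit_logistic(sm_scores: np.ndarray, dmos: np.ndarray):
    """Non-linear regression of objective scores onto DMOS; returns (K, r, t0)."""
    p0 = [dmos.max(), 1.0, np.median(sm_scores)]   # rough starting point (assumption)
    params, _ = curve_fit(logistic, sm_scores, dmos, p0=p0, maxfev=10000)
    return params

The transformed scores DMOS_p = logistic(sm_scores, *params) are then compared against the subjective DMOS values when computing the metrics in Section 4.2.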
4.2. Performance metrics
The VQEG has devised three performance metric categories to compare video quality assessment models: accuracy, monotonicity, and consistency. We use these to compare still image similarity models.
Accuracy
The accuracy of the model can be measured by computing a correlation coefficient between objective
similarity measurements and subjective scores.
Monotonicity
Prediction monotonicity can be measured with the Spearman rank order correlation coefficient (SROCC)
[3]. This measures the correlation between the rank order of the objective similarity measure's scores and the rank order of the subjective scores (DMOSs). The SROCC is described by the following equation.

\rho = 1 - \frac{6\sum D^2}{N(N^2 - 1)}    (14)
Where D is the difference between the rank of the similarity measure's score and the rank of the corresponding subjective reference score, and N is the number of data points.
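A minimal sketch of equation (14), which is exact only when there are no ties in either ranking; the helper name is our own:

import numpy as np
from scipy.stats import rankdata

def srocc(objective: np.ndarray, dmos: np.ndarray) -> float:
    """Spearman rank-order correlation coefficient, eq. (14)."""
    d = rankdata(objective) - rankdata(dmos)     # rank differences D
    n = len(objective)
    return 1.0 - 6.0 * np.sum(d ** 2) / (n * (n ** 2 - 1))

scipy.stats.spearmanr gives the tie-corrected value directly and can be used as a cross-check.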

Consistency
Prediction consistency can be measured by calculating the outlier ratio (OR) of the data. Outliers are
defined as those data points falling outside a window of two standard deviations [3].
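One simplified way to compute the outlier ratio is sketched below; here the two-standard-deviation window is taken over the prediction residuals, whereas [3] defines the window per image from the spread of the raw subjective scores, which requires the individual ratings.

import numpy as np

def outlier_ratio(dmos: np.ndarray, dmos_p: np.ndarray) -> float:
    """Fraction of points whose prediction error falls outside +/- 2 standard deviations.

    Simplified: the threshold is 2x the standard deviation of the residuals;
    [3] defines it per image from the raw subjective scores.
    """
    residual = dmos - dmos_p
    threshold = 2.0 * residual.std()
    return float(np.mean(np.abs(residual) > threshold))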
Performance Summary
Table 3. Similarity measure performance summary, CC: correlation coefficient, OR: outlier ratio, RMS: root mean
squared error, MAE: mean absolute error, SROCC: Spearman rank order correlation coefficient
(CC, OR, RMS, and MAE are computed after non-linear regression; SROCC is computed on rank order.)

Model       CC      OR      RMS    MAE    SROCC
PSNR        0.852   0.217   8.36   6.75   0.844
MSSIM       0.923   0.166   5.96   4.52   0.903
MSE         0.858   0.200   8.22   6.35   0.844
SEME-r-3    0.855   0.154   8.28   6.08   0.790
SEME-r-4    0.859   0.183   8.18   5.92   0.811
SEME-m-3    0.895   0.211   7.12   5.48   0.877
SEME-m-4    0.901   0.194   6.95   5.31   0.881

The greatest consistency is found using SEME-r-3; however, the greatest monotonicity and accuracy are
still found using MSSIM, followed closely by SEME-m-4 and SEME-m-3. SEME-m-4 is nearly as accurate
as MSSIM, but can be computed in less than half the time.

4.3. Speed
Running speed of similarity and quality measures is important in systems where adjustments to compression and other parameters are made in real time, such as streaming video systems. The table below shows the measured running time of each similarity measure; a small timing sketch follows the table. The elapsed time was taken by running each measure on a set of 233 images.

Table 4: Elapsed time for a MATLAB implementation on a PowerPC 1.5 GHz with 512 MB RAM, run over 233 images.

Measure Elapsed Time (seconds)


MSE 253 ||||||||||||
PSNR 288 ||||||||||||||
MSSIM 1207 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
SEME-r-3 328 ||||||||||||||||
SEME-r-4 335 |||||||||||||||||
SEME-m-3 579 |||||||||||||||||||||||||||
SEME-m-4 458 |||||||||||||||||||||||
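As referenced above, timings of this kind can be reproduced with a simple harness such as the Python sketch below; the measure callables and the list of image pairs are placeholders, and wall-clock results will of course differ from the MATLAB figures in Table 4.

import time

def time_measure(measure, image_pairs) -> float:
    """Total elapsed wall-clock time for one measure over (reference, distorted) pairs."""
    start = time.perf_counter()
    for ref, dist in image_pairs:
        measure(ref, dist)
    return time.perf_counter() - start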

5. CONCLUSION
The capacity for an image similarity measure to evaluate across distortion types is critical. We have
introduced a new measure that accomplishes this and exceeds the best existing measure in terms of
consistency. We have introduced a computationally fast algorithm that is a closer approximation to the
human visual system than existing fast algorithms. We have shown that SEME measures image similarity
across distortion types with high accuracy, speed, and consistency. Within the class of similarity measures that successfully measure across distortion types, our new measure is more than twice as fast as MSSIM. MSSIM still offers the greatest accuracy and monotonicity, but it is two to four times slower than the competing measures.
6. REFERENCES
1. Wang, Bovik, Sheikh, et al., "Image Quality Assessment: From Error Visibility to Structural Similarity," IEEE Transactions on Image Processing, vol. 13, pp. 1-12, April 2004.
2. Michael P. Eckert, Andrew P. Bradley, “Perceptual quality metrics applied to still image compression,”
Signal Processing 70, pp. 177-200, 1998.
3. "Final report from the Video Quality Experts Group on the validation of objective models of video quality assessment, Phase II," March 2003. http://www.vqeg.com/.
4. Sos S. Agaian, Blair Silver, and Karen A. Panetta, “Transform coefficient histogram-based image
enhancement algorithms using contrast entropy,” IEEE Transactions on Image Processing, vol. 16, pp.
741-758, Mar. 2007.
5. Zhou Wang and Alan C. Bovik, “A Universal Image Quality Index,” IEEE Signal Processing Letters,
vol. 9, no. 3, pp. 81-84, Mar. 2002.
6. Ramesh Jain, S. N. Jayaram Murthy, Peter L-J Chen (Luong Tran), Shankar Chatterjee, “Similarity
Measures for Image Databases,” SPIE Proceedings Storage and Retrieval for Image and Video Databases,
pp. 58-65, 1995.
7. Ahmet M. Eskicioglu and Paul S. Fisher, “Image Quality Measures and Their Performance,” IEEE
Transactions on Communications, vol. 43, pp. 2959-2965, Dec. 1995.
8. Blair Silver, Sos Agaian, and Karen Panetta, “Logarithmic transform coefficient histogram matching
with spatial equalization,” Proceedings: Visual Information Processing XIV, SPIE Defense and Security
Symposium 2005, Vol 5817, Mar. 2005. pp. 237-249.
9. Hamid Rahim Sheikh and Alan C. Bovik, “Image Information and Visual Quality,” IEEE Transactions
on Image Processing, vol. 15, no. 2, Feb. 2006.
10. A. M. Eskicioglu and P. S. Fisher, “A survey of quality measures for gray scale image compression,” in
Proc. 1993 Space and Earth Science Data Compression Workshop (NASA conference Publication 3191),
Snowbird, Utah, Apr. 2, 1993, pp. 49-61.
11. G. G. Kuperman and D. L. Wilson, “Objective and subjective assessment of image compression
algorithms,” in Society for Information Display Int. Symp. Digest of Technical Papers, vol. 22, pp. 627-
630, 1991.
12. T. N. Pappas and R. J. Safranek, “Perceptual criteria for image quality evaluation,” in Handbook of
Image and Video Proc., A. Bovik, Ed. New York: Academic, 2000.
13. Z. Wang, H. R. Sheikh, and A. C. Bovik, “Objective video quality assessment,” in The Handbook of
Video Databases: Design and Applications, B. Furht and O. Marques, Eds. Boca Raton, FL: CRC Press,
2003.
14. S. Winkler, “Issues in vision modeling for perceptual video quality assessment,” Signal Processing,
vol. 78, pp. 231–252, 1999.
15. Harilaos Koumaras, Anastasios Kourtis, and Drakoulis Martakos, “Evaluation of Video Quality Based
on Objectively Estimated Metric,” Journal of Communications and Networks, Vol. 7, No. 3, Sept. 2005.
16. Y. K. Lai and J. Kuo, “A Haar wavelet approach to compressed image quality measurement,” Journal
of Visual Communication and Image Representation, vol. 11, pp. 81–84, 2000.
17. J. Lauterjung, “Picture quality measurement,” in Proc. Int. Broadcasting Convention (IBC’98),
Amsterdam, 1998, pp. 413-417.
