Performance Analysis of PSNR and SSIM Against Frame Drop and Its Subjective Score


Manish K Thakur (1), Vikas Saxena (2), J P Gupta (3)

(1) Jaypee Institute of Information Technology, Noida, India, mthakur.jiit@gmail.com
(2) Jaypee Institute of Information Technology, Noida, India, vikas.saxena@jiit.ac.in
(3) Sharda University, Greater Noida, India, jaip.gupta@gmail.com

Abstract
Due to the ever growing need for multimedia data in applications such as video conferencing and streaming, which involve human subjects, good perceived visual quality of videos is often desired; this raises the need for quality assessment. This paper discusses the steps involved in subjective quality assessment and the available objective quality metrics, such as peak signal to noise ratio (PSNR) and the structural similarity index (SSIM), along with their performance with respect to human visual system (HVS) models. The paper presents an HVS model which incorporates frame drop distortion and its impact on subjective scores. Simulation results show that PSNR and SSIM are inefficient in analysing the quality deterioration caused by frame drop.

Keywords: Frame Drop, Temporal Distortion, Spatial Distortion, PSNR, SSIM, Quality Metrics,
Subjective Quality Assessment

1. Introduction
Multimedia information is a centre of attraction in everyone's day to day life, and in the last decade this has accelerated due to ever growing internet/intranet technologies. From the end user's perspective, perceptually good quality multimedia data is always desired; maintaining that quality is therefore a continuing challenge for researchers. Traditionally, the guaranteed channel bandwidth of a network facilitates data flow with little risk of error and no significant packet loss, and thus does not affect the quality of service (QoS). But modern network applications such as multimedia over IP and video conferencing, where dynamic and voluminous data flow across the network is common, raise the issue of maintaining data quality (QoS) during network communication [1-5]. Standardisation of QoS is therefore a motivating problem before the research community. Packet drop and jitter are the major distortions in networks. In the various remedial approaches to packet loss in traditional networks, recovery is based on either copying or averaging the previous packet in place of the lost one; but in real time data flows such as video conferencing, a delayed or dropped packet recovered late is of no use to the application [6]. The received video is therefore either not recovered completely or has comparatively fewer frames at the receiver end, resulting in frame drop. Due to late or early arrival of packets (positive or negative jitter), video frames at the receiving end might get disordered, resulting in frame swapping [7]. In both cases there is positional disorder in the received video stream. Apart from transmission, visual information may get distorted during compression, analysis, and storage. These distortions might be accidental or due to malicious intent, and they lead to deterioration in visual quality [8-13]. Distortions in a video stream may appear in the spatial or the temporal domain.

Pixel bits of a video frame that are changed accidentally or deliberately are classified as spatial distortions (SD). Video frames that are dropped, swapped, or averaged are classified as temporal distortions (TD). Like SD, temporal distortions may be accidental or malicious. These distortions heavily deteriorate visual quality and badly affect QoS, which is never desired. To standardise QoS, quality assessment of two videos (reference and distorted) for an introduced percentage of distortion (either SD or TD) is required. The various approaches to video quality assessment are broadly categorised into subjective and objective models. Subjective models are based upon human perception of visual quality, which

International Journal of Digital Content Technology and its Applications (JDCTA), Volume 7, Number 3, February 2013. doi:10.4156/jdcta.vol7.issue3.83



are the basis of the human visual system (HVS) and are further employed in designing many objective models [14, 15]. For quantifying the result, two types of objective metrics are applied: statistical metrics and perceptual metrics, such as PSNR and SSIM respectively. Statistical metrics employ various mathematical functions, calculating pixel-by-pixel weighted differences between the reference and distorted video, while perceptual metrics use HVS characteristics such as luminance contrast sensitivity, frequency contrast sensitivity, and masking effects [16-19]. PSNR and SSIM were actually designed for quality analysis of images. They are also used for video quality analysis by averaging the computed indices over each frame of the reference and distorted videos; but because of this simple frame averaging, they might be inadequate for temporal distortions (e.g. frame drop). It is therefore necessary to analyse whether these metrics are capable of assessing the visual quality loss due to temporal distortion. This paper is organized as follows: section 2 consolidates the requirements of subjective experiments. Objective metrics are discussed in section 3. In section 4, a subjective model is presented which analyses the impact of frame drop on visual quality; this section also describes the experiments performed to analyse the performance of the PSNR and SSIM metrics. Analysis of these experiments is addressed in section 5, followed by the conclusion and references.

2. Subjective Quality Assessment


In subjective video quality assessment, human subjects observe video quality under different schemes, viz. full reference (FR), reduced reference (RR), and no reference (NR). In FR, quality estimation is done by comparing the distorted video with the reference video, and the human subject scores the quality on a prescribed scale [20, 21]. All scores together provide the mean opinion score (MOS), computed by averaging the gathered subjective measurements; the obtained MOS describes the decrement in video quality. Fig. 1 and the subsequent paragraphs consolidate the steps involved in subjective video quality assessment under full reference.

Reference and Distorted Video: Reference/original videos and a set of distorted videos (distorted intentionally or accidentally) are required for quality assessment. Reference videos can be taken from publicly available data sets, such as the interlaced videos provided by the Video Quality Experts Group, the Laboratory for Image and Video Engineering data sets, or the videos provided by www.media.xiph.org, or they can be captured with high end equipment by researchers [2, 22-25]. A video sequence can be intentionally distorted (SD or TD) by attackers in order to misuse it, or it can be distorted unintentionally during processing. Some common distortions and video attacks are listed in Table 1.

Human Subjects: A survey of numerous papers suggests that typically 30 trained or untrained human subjects are used in subjective experiments [14, 20, 21]. Generally, untrained subjects are involved, and subjects are given a brief training regarding the experiment before it starts.
Figure 1. Steps involved during subjective quality measurement: reference and distorted video, human subjects, viewing conditions and setup, experiment procedure, experiment sessions, scoring policy, subject rejection scheme, and computation of mean opinion score.


Table 1. Most Often Temporal/Spatial Distortions/Attacks in Video [7, 10, 23-27]

Blocking Effects: Periodic discontinuities appear at block boundaries in each frame of the compressed video. This generally occurs in compressed videos where block based compression has been used (e.g. MPEG-1, H.261).

Blur: Due to removal of high frequency components of the video sequence during compression, which generally uses the DCT transform, objects in a video frame become vague and less distinct.

Jitter: Abrupt transitions during display of video frames, caused by delay during frame transmission over networks; it is related to the variance of frame/packet delays.

Bit Inversion: Some pixel bits of a video frame might be inverted due to bit errors or processing over video frames.

Compression: Significant data (such as additional data added by the creator of the video stream for ownership) can be lost in lossy compression, which generally uses the DCT transform.

Frame Drop: Video frames can be dropped intentionally by attackers, dropped during transmission in a network, or dropped due to inadequate buffering space at the receiver; some frames can also be dropped from the reconstruction of a video.

Frame Averaging: To restore the video sequence to its original length, some dropped frames might be replaced by averaging the adjacent frames or by copying the last non dropped frame.

Frame Swapping: Video frames can be swapped intentionally by attackers, or swapped due to late arrival of video frames during transmission.

Geometrical Attacks: Video frames might be intentionally scaled (down or up), rotated, or translated, resulting in loss of spatial information of a video frame.

Experiment Sessions: Experiments can be conducted in one session or in multiple sessions, based upon the total time required. To avoid fatigue, experiments are generally conducted in multiple sessions of at most 30 minutes each; however, one has to ensure identical environments across sessions.

Experiment Procedure: There are two popular viewing styles: single stimulus continuous quality evaluation (SSCQE), where one video is viewed at a time, and double stimulus continuous quality evaluation, where viewers watch two videos simultaneously on a split screen [15]. Generally the SSCQE procedure is followed for video quality assessment.

Viewing Conditions and Setup: The atmospheric environment during the experiments is crucial. Experiments can be conducted under artificial light or proper sunlight. Further, CRT monitors are generally used, since they give high visual quality on which degradation can be properly visualized [2]. Care must also be taken with the viewing distance and angle [22].

Scoring Policy: During subjective experiments, human subjects rate the quality. These scores can be quantitative (subjects rate the quality on a scale of 5 or 10) or qualitative [2, 14]. In the qualitative approach, ratings are based upon qualitative terms such as bad, poor, fair, good, and excellent.

Subject Rejection: Non-serious human subjects must be rejected during subjective experiments [23]. Subjects can be rejected at random, or a scheme can be devised for it. One such scheme is to select subjects at random during the experiments and repeat the same experiments with other sets of subjects; the repeated process will identify the non-serious subjects.

Mean Opinion Score: After rejection of non-serious subjects, the MOS of the various experiments is computed, which quantifies the quality degradation [14, 28]. It can be a simple MOS or a differential MOS.
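The scoring, rejection, and MOS steps above can be sketched in code. This is a minimal illustration only: the score layout (one row of ratings per subject) and the correlation-based rejection rule are assumptions made for this sketch, not the exact screening procedure of BT.500 [20].

```python
# MOS computation sketch: rows = human subjects, columns = test videos.
# A subject whose ratings correlate poorly with the panel average is
# treated as non-serious and rejected (an assumed rejection rule;
# BT.500 specifies a more elaborate screening procedure).

def mean(xs):
    return sum(xs) / len(xs)

def correlation(a, b):
    # Pearson correlation between one subject's ratings and the panel average.
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = sum((x - ma) ** 2 for x in a) ** 0.5
    sb = sum((y - mb) ** 2 for y in b) ** 0.5
    return cov / (sa * sb)

def mos(scores, reject_below=0.5):
    # Panel average per video, using all subjects first.
    panel = [mean(col) for col in zip(*scores)]
    kept = [s for s in scores if correlation(s, panel) >= reject_below]
    return [round(mean(col), 2) for col in zip(*kept)]

ratings = [
    [5, 4, 2, 1],   # serious subject
    [5, 4, 3, 1],   # serious subject
    [4, 5, 2, 1],   # serious subject
    [1, 2, 5, 5],   # non-serious subject (anti-correlated, rejected)
]
print(mos(ratings))  # [4.67, 4.33, 2.33, 1.0]
```

The MOS of each test video is thus computed only over the subjects who survive the screening, mirroring the rejection step described above.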


3. Objective Video Quality Assessment


Objective quality metrics can be mathematical or perceptual models, and they are classified according to the availability of reference visual information. These metrics can be designed for FR quality assessment, where the reference information is available alongside the distorted information; NR quality assessment, where quality analysis is done without any reference information; or RR, where only partial information about the reference video is available [29-31]. Some objective quality metrics are PSNR, SSIM, VQM (video quality metric), and the Sarnoff JND model [9, 19, 28, 32].

PSNR: A full reference quality metric, and the most often used metric for analyzing image and video quality. It is a mathematical model and computes the quality (PSNR index) as given in equation 1, as the logarithmic ratio of the maximum amplitude of the picture elements to the mean square error (MSE). The MSE, given in equation 2, is defined as the mean squared difference between picture elements of the reference video frame and picture elements of the distorted video frame [33, 34].
PSNR [dB] = 10 log10( (2^n - 1)^2 / MSE )                                  (1)

MSE = (1 / MN) Σ_{i=0}^{M-1} Σ_{j=0}^{N-1} [ a_ij - a'_ij ]^2              (2)

where n is the number of bits per picture element of the reference video, M × N is the frame size of the video data, a_ij is a picture element of the reference video, and a'_ij is the corresponding picture element of the distorted video. PSNR lies in the range 0 dB to 100 dB; visual information without any distortion has a PSNR of 100 dB, while a PSNR between 30 and 40 dB is perceptually acceptable. PSNR is the most often used quality metric because of its simple and fast computation, but since it is computed from pixel-by-pixel differences, it often fails to match the HVS model [16, 19, 35, 36]. It can be observed from Fig. 2 that the PSNR of the 8 × 8 blocks of Fig. 2b and Fig. 2c with respect to Fig. 2a is similar, whereas the distortion in Fig. 2b is perceptually more visible because its distortions are centralized, rather than distributed as in Fig. 2c.

SSIM: A more computationally complex metric based on properties of the HVS model that is starting to replace the most widely used PSNR metric [28, 37]. It estimates the quality of a video by computing luminance, contrast, and structure information for each frame of the reference and distorted video sequences (equations 3-7).
SSIM(x, y) = l(x, y) · c(x, y) · s(x, y)                                   (3)

l(x, y) = (2 μ_x μ_y + C1) / (μ_x^2 + μ_y^2 + C1)                          (4)


Figure 2. 8 × 8 blocks of different grey images: (a) a reference block, (b) a block with centralized distortions, and (c) a block with distributed distortions.

c(x, y) = (2 σ_x σ_y + C2) / (σ_x^2 + σ_y^2 + C2)                          (5)

s(x, y) = (σ_xy + C3) / (σ_x σ_y + C3)                                     (6)

where l(x, y) is the luminance comparison (mean intensity), c(x, y) is the contrast comparison (unbiased estimate of the standard deviation), s(x, y) is the structure comparison (each signal normalized by its own standard deviation), and C1, C2, and C3 are constants defined as follows, where L is the dynamic range of the pixel values and K1, K2 << 1:

C1 = (K1 L)^2,   C2 = (K2 L)^2,   C3 = C2 / 2                              (7)

An SSIM value of 1 indicates that the compared frames are identical, while lower values indicate degradation and structural dissimilarity [19]. The SSIM metric is based on the fact that the HVS is more sensitive to structural changes than to luminance and contrast changes; this gives it an edge over other metrics (PSNR), and it does not suffer the problem illustrated by Fig. 2.
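The two metrics of this section can be sketched for a single frame pair as follows. This is a minimal illustration assuming 8-bit grey-scale frames stored as nested lists; note that SSIM is computed here from global mean/variance/covariance statistics, a simplification of the original metric, which averages a locally windowed SSIM map instead.

```python
import math

def psnr(ref, dist, n_bits=8):
    # Equations 1-2: MSE over all pixels, then log ratio to peak amplitude.
    pairs = [(a, b) for ra, rb in zip(ref, dist) for a, b in zip(ra, rb)]
    mse = sum((a - b) ** 2 for a, b in pairs) / len(pairs)
    peak = (2 ** n_bits - 1) ** 2
    return float('inf') if mse == 0 else 10 * math.log10(peak / mse)

def ssim_global(ref, dist, L=255, K1=0.01, K2=0.03):
    # Equations 3-7 with C3 = C2/2, which collapses c(x,y)*s(x,y) into one
    # factor; global statistics are an assumed simplification of the
    # standard 11x11 windowed SSIM.
    x = [p for row in ref for p in row]
    y = [p for row in dist for p in row]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((p - mx) ** 2 for p in x) / (n - 1)
    vy = sum((p - my) ** 2 for p in y) / (n - 1)
    cxy = sum((p - mx) * (q - my) for p, q in zip(x, y)) / (n - 1)
    C1, C2 = (K1 * L) ** 2, (K2 * L) ** 2
    return ((2 * mx * my + C1) * (2 * cxy + C2)) / \
           ((mx ** 2 + my ** 2 + C1) * (vx + vy + C2))

ref = [[20, 18, 16], [14, 12, 10]]
dist = [[21, 18, 17], [13, 12, 10]]
print(round(psnr(ref, dist), 2))
print(ssim_global(ref, ref))  # identical frames give SSIM = 1.0
```

For a whole video, both indices are averaged over the frame pairs, which is exactly the averaging step that makes them blind to dropped frames.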

4. Experimental Details
Many authors have conducted subjective experiments to analyze the impact of packet drop (resulting in frame drop) on experienced quality [1, 4, 5, 7, 18, 25]. In our experiments, we observed the impact of contiguous frame drop in five video sequences (Fig. 3) available at www.media.xiph.org. Frame drops in a reference video of length m give a distorted video of length n (m > n), and this mismatch in the number of frames between the reference and distorted videos should have a considerable impact on the quality experienced by subjects, with respect to the selected parameters contiguous frame drop (cfd) and total dropped frames (tdf). A contiguous frame drop in a compressed video might result in the loss of many I frames, which are essential for the respective codec (e.g. MPEG or H.261/H.263), so a bigger chunk of cfd is


undesirable with compressed videos [6]. Therefore, we uncompressed the five video sequences of Fig. 3 and used the raw (uncompressed) sequences in our experiments. The uncompressed video sequences comprised between 126 and 251 frames, at a frame rate of 25 fps. We conducted several experiments with these raw videos by introducing SD and TD (bit errors using LSB watermarking, and frame drop, respectively) and analyzed the experienced quality. In parallel, PSNR and SSIM indices were calculated to analyse their performance on temporally distorted videos. Our experiments are categorised under the following scenarios, with experiments conducted specifically for scenarios 2 and 3:

Scenario 1: The reference videos are only spatially distorted, i.e. no TD is present in the distorted video with respect to the reference, so any experienced or computed distortion (by SSIM and PSNR) is due to SD only.

Scenario 2: The reference videos are only temporally distorted (frame drop), i.e. no SD is present in the distorted video, so any experienced or computed distortion is due to TD only.

Figure 3. Reference videos (available at www.media.xiph.org) used in section 4: Bus, Coastguard, Galleon, Football, and Stefan.

Here we tuned both cfd from low (few frames contiguously dropped) to high, and tdf from low (m > n) to high (m >> n). High tdf with low cfd indicates multiple instances of dropped frames in small chunks, whereas high cfd with low tdf indicates fewer instances of dropped frames. We considered the following cases:

Case 4.1: reference videos are distorted with low contiguous frame drop (we call cfd of 1 to 5% low) and low total dropped frames (we call tdf of 1 to 10% low).
Case 4.2: cfd is low and tdf is high (we call tdf of 11 to 20% high).
Case 4.3: cfd is high (we call cfd of 6 to 10% high) and tdf is low.
Case 4.4: cfd is high and tdf is high.

Consider a reference video VR as {VR1, VR2, ..., VRm}, a distorted video VD as {VD1, VD2, ..., VDn}, and one chunk of contiguously dropped frames lying between VRj and VRj+c, where c is the count of contiguously dropped frames in the reference video from index j. Each case was tested for the following four possibilities:

Possibility 1: VRj ≈ VRj+c (i.e. the adjacent frames on either side of the dropped chunk are almost similar), and VRj+1 to VRj+c-1 are also almost similar to VRj and VRj+c, like the frames of the video sequence Galleon.

Possibility 2: VRj ≈ VRj+c, but VRj+1 to VRj+c-1 differ from VRj and VRj+c; since VRj differs from VRj+1, there is a change of scene from the previous non dropped frame to the first dropped frame, and similarly from the last dropped frame VRj+c-1 to the next non dropped frame VRj+c, like the frames of the video sequence Stefan.

Possibility 3: VRj differs from VRj+c (i.e. the adjacent frames on either side of the dropped chunk are different), and VRj+1 to VRj+c-1 are almost similar to VRj, like the frames of the video sequences Bus and Coastguard.

Possibility 4: VRj differs from VRj+c, and VRj+1 to VRj+c-1 also differ from VRj; since VRj differs from VRj+1, there is a change of scene from the previous non dropped frame to the first dropped frame, and similarly from the last dropped frame VRj+c-1 to the next non dropped frame VRj+c, like the frames of the video sequence Football.

Scenario 3: The reference videos are distorted both temporally and spatially, so the experienced or computed distortion (SSIM and PSNR) is due to both SD and TD. We considered the same cases and possibilities as in scenario 2, investigated under low and high SD


(i.e. if the bit error is introduced by watermark embedding at the 4th LSB then SD is high, while for the 1st LSB it is low). To conduct the experiments for scenario 2, we distorted the video sequences by dropping frames (starting, middle, or last frames) such that cfd ranges over 1%-10% (i.e. 2 to 25 frames of each video) and total dropped frames over 1%-20% (i.e. 2 to 50 frames). To conduct the experiments for scenario 3, we intentionally introduced SD at specific positions (at the 1st LSB and the 4th LSB) using LSB watermarking [38] in the same dataset, already carrying TD, used in scenario 2. In total, we used 5 reference videos and 80 distorted videos (5 reference videos × 4 cases × 4 possibilities) for scenario 2, and 5 reference videos and 160 distorted videos (80 distorted videos of scenario 2 × 2 types of SD, i.e. 1st LSB and 4th LSB watermarking) for scenario 3. We briefed 30 naive viewers on the experiment procedure and conducted the experiments in sessions (each of 30 minutes, with similar environments) using the SSCQE procedure. We asked subjects to rate the quality on a scale of 5, where 1 represents the worst quality and 5 excellent. During the experiments we disqualified 4 non-serious subjects and removed their scores, so the MOS computation is based upon the scores of 26 subjects. Further, we quantified the experienced average distortion as categorical data (least, less, above average, average, below average, high, and highest), corresponding to MOS ranges of 5 to 4.9, 4.8 to 4.7, 4.6 to 4.4, 4.4 to 4.0, 3.9 to 3.0, 2.9 to 2.0, and 1.9 to 1.0 respectively.
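The scenario 2 distortion step can be sketched as follows. This is a minimal illustration; the chunk start positions and the percentage bookkeeping are assumptions made for the sketch, not the paper's exact procedure.

```python
# Drop `tdf_pct` percent of a video's frames in contiguous chunks of
# `cfd_pct` percent each, starting at the given positions (beginning,
# middle, or end of the sequence), modelling cases 4.1-4.4.

def drop_frames(frames, cfd_pct, tdf_pct, starts):
    m = len(frames)
    chunk = max(1, round(m * cfd_pct / 100))   # contiguous frame drop (cfd)
    total = max(1, round(m * tdf_pct / 100))   # total dropped frames (tdf)
    dropped = set()
    for s in starts:
        for i in range(s, min(s + chunk, m)):
            if len(dropped) < total:
                dropped.add(i)
    return [f for i, f in enumerate(frames) if i not in dropped]

# Case 4.2-like setting: small chunks (cfd 4%), many of them (tdf 16%).
video = list(range(100))           # stand-in for a 100-frame sequence
distorted = drop_frames(video, 4, 16, starts=[10, 30, 50, 70])
print(len(video), len(distorted))  # 100 84
```

A high-cfd/low-tdf setting (cases 4.3) would instead use one or two large chunks; the distorted sequence is always shorter than the reference (n < m), which is the length mismatch discussed in the analysis.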

5. Analysis
The average scores experienced in the different scenarios are presented in Table 2, and the average distortion computed by the PSNR and SSIM metrics in Table 3 and Table 4. Generally, the PSNR and SSIM quality metrics compute the quality frame by frame and average it over the entire video sequence; here, however, we computed the average PSNR and SSIM of a video sequence only up to length n (the length of the distorted video), because the videos have dissimilar lengths (m and n, with m > n). In the subjective experiments it was observed that the drop of a bigger chunk (high cfd), as in cases 4.3 and 4.4, is experienced more strongly by subjects than smaller chunks (low cfd), as in cases 4.1 and 4.2. Further, possibilities 1 and 3 are experienced less than possibilities 2 and 4, where an abrupt change is observed between non-dropped adjacent frames. Notably, when cfd was low in possibilities 1 and 2, the missing clip was experienced less by subjects than with high cfd. If tdf is low it is less noticeable to the subject, but with high tdf the subject experiences the early completion of the video. The distortions in scenario 2 and scenario 3 (with low SD) produced almost identical observations (Table 2), suggesting that low SD makes no significant impact on perceived quality under either low or high TD. But the deterioration in quality in scenario 3 (high SD) was experienced most strongly; this may be interpreted as high SD having a significant impact on perceived quality irrespective of high or low TD. We observed poor performance of the quality metrics (PSNR and SSIM) with respect to frame drop distortion, as detailed in Tables 3 and 4.

It can be observed from Table 2 (experienced quality), Table 3 (quality computed using the PSNR index), and Table 4 (quality computed using the SSIM index) that even though the experienced quality deterioration is least when only a few frames are dropped, SSIM and PSNR report huge deterioration.

Table 2. Average experienced distortions by dropping frames in Scenario 2 and Scenario 3
              Scenario 2
Possibility   Case 4.1    Case 4.2    Case 4.3    Case 4.4
1             Least       Least       Less        Above Avg.
2             Least       Less        Above Avg.  Avg.
3             Least       Above Avg.  Avg.        Below Avg.
4             Least       Avg.        Below Avg.  High

              Scenario 3, with low SD
Possibility   Case 4.1    Case 4.2    Case 4.3    Case 4.4
1             Least       Less        Less        Above Avg.
2             Less        Less        Above Avg.  Avg.
3             Less        Above Avg.  Avg.        Below Avg.
4             Less        Avg.        Below Avg.  High

              Scenario 3, with high SD
Possibility   Case 4.1    Case 4.2    Case 4.3    Case 4.4
1             Avg.        Avg.        Below Avg.  High
2             Avg.        Below Avg.  Below Avg.  High
3             Below Avg.  Below Avg.  High        Highest
4             Below Avg.  High        High        Highest

Table 3. Average distortions using PSNR by dropping frames in Scenario 2 and Scenario 3

              Scenario 2
Possibility   Case 4.1   Case 4.2   Case 4.3   Case 4.4
1             41.83      33.15      30.74      30.62
2             40.27      32.67      28.78      28.15
3             39.85      29.45      26.54      27.39
4             39.21      29.16      25.89      25.78

              Scenario 3, with low SD
Possibility   Case 4.1   Case 4.2   Case 4.3   Case 4.4
1             36.79      25.37      22.96      22.65
2             35.41      24.12      22.45      22.08
3             35.06      23.85      22.21      21.58
4             34.22      23.17      21.76      21.32

              Scenario 3, with high SD
Possibility   Case 4.1   Case 4.2   Case 4.3   Case 4.4
1             24.41      22.10      20.09      20.06
2             24.27      22.07      20.01      19.89
3             24.04      21.76      19.78      19.73
4             23.98      21.33      19.54      19.29

Table 4. Average distortions using SSIM by dropping frames in Scenario 2 and Scenario 3

              Scenario 2
Possibility   Case 4.1   Case 4.2   Case 4.3   Case 4.4
1             0.89       0.73       0.71       0.62
2             0.78       0.69       0.69       0.61
3             0.73       0.66       0.61       0.60
4             0.70       0.63       0.61       0.58

              Scenario 3, with low SD
Possibility   Case 4.1   Case 4.2   Case 4.3   Case 4.4
1             0.83       0.68       0.64       0.67
2             0.76       0.67       0.63       0.66
3             0.71       0.64       0.61       0.64
4             0.69       0.63       0.61       0.60

              Scenario 3, with high SD
Possibility   Case 4.1   Case 4.2   Case 4.3   Case 4.4
1             0.71       0.63       0.60       0.56
2             0.70       0.61       0.59       0.55
3             0.62       0.58       0.54       0.53
4             0.61       0.58       0.51       0.51

The obvious reason is that dropping frames leads to misalignment/disordering of the video frames, which makes a significant impact on the values computed by both metrics. Further, if the disorder appears at the beginning of a video sequence it has more impact on the computed quality than disorder appearing in the middle or last frames; yet subjects might not experience a comparable deterioration in visual quality from such positional disorder.
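The misalignment effect can be illustrated with a toy example in which each "frame" is a single number and the per-frame error is an absolute difference (an assumption standing in for a full image metric): dropping an early frame shifts every subsequent comparison by one position, while dropping the last frame leaves all compared pairs aligned.

```python
# Toy illustration: frames are scalars; after dropping reference frame 1,
# a naive frame-by-frame comparison pairs distorted frame i with
# reference frame i, so every pair after the drop point is misaligned
# even though the distorted frames themselves are undistorted content.

ref = [10, 20, 30, 40, 50, 60]

dist_early = ref[:1] + ref[2:]   # frame index 1 dropped -> length 5
print([abs(r - d) for r, d in zip(ref, dist_early)])  # [0, 10, 10, 10, 10]

dist_end = ref[:-1]              # last frame dropped instead
print([abs(r - d) for r, d in zip(ref, dist_end)])    # [0, 0, 0, 0, 0]
```

This mirrors the observation above: the earlier the dropped frame, the more comparisons are corrupted, and the larger the deterioration a frame-averaged metric reports, regardless of what the viewer actually perceives.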

6. Conclusion
In this paper we presented a performance analysis of the PSNR and SSIM metrics against one of the most frequent temporal distortions, frame drop. We first presented a subjective model; then, through subjective experiments involving 26 human subjects (4 non-serious subjects were disqualified), 5 reference video sequences, and 80/160 distorted video sequences (spatially distorted by LSB watermarking and temporally distorted by frame drop), we obtained opinion scores for the various test cases and computed the MOS. The same video sets (reference and distorted) were used to compute the PSNR and SSIM indices. In the analysis we observed that the behavior of these metrics is unpredictable, and that they are inefficient in analysing the quality deterioration due to frame drop.


7. References
[1] K. Stuhlmüller, N. Färber, M. Link, and B. Girod, "Analysis of Video Transmission over Lossy Channels," IEEE Journal on Selected Areas in Communications, vol. 18, no. 6, pp. 1012-1032, 2000.
[2] G. Zhai, J. Cai, W. Lin, X. Yang, W. Zhang, and M. Etoh, "Cross-Dimensional Perceptual Quality Assessment for Low Bit-Rate Videos," IEEE Transactions on Multimedia, vol. 10, no. 7, pp. 1316-1324, 2008.
[3] P. Pace, M. Belcastro, and E. Viterbo, "Fast and accurate PQoS estimation over 802.11g wireless network," in Proceedings ICC 2008, pp. 262-267, 2008.
[4] C. Yim and A. C. Bovik, "Evaluation of temporal variation of video quality in packet loss networks," Signal Processing: Image Communication, vol. 26, pp. 24-38, 2011.
[5] M. H. Pinson, S. Wolf, and G. Cermak, "HDTV Subjective Quality of H.264 vs. MPEG-2, With and Without Packet Loss," IEEE Transactions on Broadcasting, vol. 56, no. 1, pp. 86-91, 2010.
[6] Z.-N. Li and M. S. Drew, Fundamentals of Multimedia, Pearson Education.
[7] S. R. Gulliver and G. Ghinea, "The Perceptual and Attentive Impact of Delay and Jitter in Multimedia Delivery," IEEE Transactions on Broadcasting, vol. 53, no. 2, pp. 449-458, 2007.
[8] E. D. Gelasca, "Full-Reference Objective Quality Metrics for Video Watermarking, Video Segmentation and 3D Model Watermarking," Ph.D. Dissertation.
[9] G. Gvozden, M. Grgic, S. Grgic, and M. Gosta, "Comparison of Video Coding Standards used in Mobile Applications," in Handbook of Research on Mobile Multimedia, Second Edition, Information Science Reference, vol. 1, pp. 133-149.
[10] Y. Wang and A. Pearmain, "Blind MPEG-2 Video Watermarking Robust Against Geometric Attacks: A Set of Approaches in DCT Domain," IEEE Transactions on Image Processing, vol. 15, no. 6, pp. 1536-1543, 2006.
[11] X. Jiang, T. Sun, J. Li, and Y. Yun, "A Novel Real-Time MPEG-2 Video Watermarking Scheme in Copyright Protection," in Proceedings of IWDW 2008, LNCS 5450, pp. 45-51, 2009.
[12] http://www.cyberlawsindia.net/index1.html. Last accessed on Aug 22, 2012.
[13] http://cyber.law.harvard.edu/metaschool/fisher/integrity/Links/Articles/winick.html. Last accessed on Aug 22, 2012.
[14] D. S. Hands, "A Basic Multimedia Quality Model," IEEE Transactions on Multimedia, vol. 6, no. 6, pp. 806-816, 2004.
[15] Z. Wang, L. Lu, and A. C. Bovik, "Video Quality Assessment Based on Structural Distortion Measurement," Signal Processing: Image Communication, vol. 19, no. 2, pp. 121-132, 2004.
[16] S. Winkler and P. Mohandas, "The Evolution of Video Quality Measurement: From PSNR to Hybrid Metrics," IEEE Transactions on Broadcasting, vol. 54, no. 3, pp. 1-9, 2008.
[17] Z. Wang, H. R. Sheikh, and A. C. Bovik, "Objective video quality assessment," in The Handbook of Video Databases: Design and Applications (B. Furht and O. Marques, eds.), CRC Press, 2003.
[18] M. H. Pinson and S. Wolf, "A New Standardized Method for Objectively Measuring Video Quality," IEEE Transactions on Broadcasting, vol. 50, no. 3, pp. 312-322, 2004.
[19] A. Bovik, The Essential Guide to Image Processing, 2nd Edition, Academic Press, Elsevier, 2009.
[20] "Methodology for Subjective Assessment of the Quality of Television Pictures," Recommendation BT.500-11, International Telecommunication Union, Geneva, Switzerland, 2002.
[21] "Subjective Video Quality Assessment Methods for Multimedia Applications," Recommendation P.910, International Telecommunication Union, Geneva, Switzerland, 1996.
[22] R. Feghali, F. Speranza, D. Wang, and A. Vincent, "Video Quality Metric for Bit Rate Control via Joint Adjustment of Quantization and Frame Rate," IEEE Transactions on Broadcasting, vol. 53, no. 1, pp. 441-446, 2007.
[23] A. K. Moorthy, K. Seshadrinathan, R. Soundararajan, and A. C. Bovik, "Wireless video quality assessment: A study of subjective scores and objective algorithms," IEEE Transactions on Circuits and Systems for Video Technology, vol. 20, no. 4, pp. 587-599, 2010.
[24] K. Seshadrinathan, R. Soundararajan, A. C. Bovik, and L. K. Cormack, "Study of subjective and objective quality assessment of video," IEEE Transactions on Image Processing, vol. 19, no. 6, pp. 1427-1441, 2010.
[25] K. Seshadrinathan and A. C. Bovik, "Motion tuned spatio-temporal quality assessment of natural videos," IEEE Transactions on Image Processing, vol. 19, no. 2, pp. 335-350, 2010.
[26] M. Yuen and H. R. Wu, "A survey of hybrid MC/DPCM/DCT video coding distortions," Signal Processing, vol. 70, no. 3, pp. 247-278, 1998.
[27] M. K. Thakur, V. Saxena, and J. P. Gupta, "A Full Reference Algorithm for Dropped Frames Identification in Uncompressed Video Using Genetic Algorithm," International Journal of Digital Content Technology and its Applications, vol. 6, no. 20, pp. 562-573, 2012.
[28] S. Chikkerur, V. Sundaram, M. Reisslein, and L. J. Karam, "Objective Video Quality Assessment Methods: A Classification, Review, and Performance Comparison," IEEE Transactions on Broadcasting, vol. 57, no. 2, pp. 165-182, 2011.
[29] H. R. Sheikh and A. C. Bovik, "Image Information and Visual Quality," IEEE Transactions on Image Processing, vol. 15, no. 2, pp. 430-444, 2006.
[30] C. Keimel, T. Oelbaum, and K. Diepold, "No-reference video quality evaluation for high-definition video," in Proc. IEEE Int. Conf. Acoust., Speech and Signal Process., pp. 1145-1148, 2009.
[31] S. Hemami and A. Reibman, "No-reference image and video quality estimation: Applications and human-motivated design," Signal Processing: Image Communication, vol. 25, no. 7, pp. 469-481, 2010.
[32] J. Lubin and D. Fibush, "Sarnoff JND vision model," T1A1.5 Working Group Document #97-612, ANSI T1 Standards Committee, 1997.
[33] H. M. Xin and F. Zhao, "Improved Denoising Method of Two Dimensional Gel Electrophoresis Images," International Journal of Digital Content Technology and its Applications, vol. 6, no. 11, pp. 352-360, 2012.
[34] M. K. Thakur, V. Saxena, and J. P. Gupta, "A performance analysis of objective video quality metrics for digital video watermarking," in Proceedings, 3rd IEEE International Conference on Computer Science and Information Technology (ICCSIT 2010), vol. 4, pp. 12-17, 2010.
[35] B. Girod, "What's wrong with mean-squared error," in Digital Images and Human Vision (A. B. Watson, ed.), MIT Press, pp. 207-220, 1993.
[36] H. Duan and G. Chen, "A New Digital Halftoning Algorithm by Integrating Modified Pulse-Coupled Neural Network with Random Number Generator," International Journal of Digital Content Technology and its Applications, vol. 6, no. 12, pp. 29-37, 2012.
[37] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, "Image Quality Assessment: From Error Visibility to Structural Similarity," IEEE Transactions on Image Processing, vol. 13, no. 4, pp. 1-14, 2004.
[38] B. Abdullah, I. Rosziati, and M. S. Nazib, "A New Digital Watermarking Algorithm using Combination of Least Significant Bit (LSB) and Inverse Bit," Journal of Computing, vol. 3, issue 4, pp. 1-8, 2011.
