An Evaluation of Video Quality Assessment Metrics For Passive Gaming Video Streaming



Nabajeet Barman, Kingston University, London, United Kingdom, n.barman@kingston.ac.uk
Steven Schmidt, Quality and Usability Lab, TU Berlin, Berlin, Germany, steven.schmidt@tu-berlin.de
Saman Zadtootaghaj, Deutsche Telekom AG, Berlin, Germany, saman.zadtootaghaj@telekom.de
Maria G. Martini, Kingston University, London, United Kingdom, m.martini@kingston.ac.uk
Sebastian Möller, Quality and Usability Lab, TU Berlin, Berlin, Germany, sebastian.moeller@tu-berlin.de

ABSTRACT

Video quality assessment is imperative to estimate and hence manage the Quality of Experience (QoE) in video streaming applications to the end-user. Recent years have seen a tremendous advancement in the field of objective video quality assessment (VQA) metrics, with the development of models that can predict the quality of the videos streamed over the Internet. However, no work so far has attempted to study the performance of such quality assessment metrics on gaming videos, which are artificial and synthetic and have different streaming requirements than traditionally streamed videos. Towards this end, we present in this paper a study of the performance of objective quality assessment metrics for gaming videos considering passive streaming applications. Objective quality assessment considering eight widely used VQA metrics is performed on a dataset of 24 reference videos and 576 compressed sequences obtained by encoding them at 24 different resolution-bitrate pairs. We present an evaluation of the performance behavior of the VQA metrics. Our results indicate that VMAF predicts subjective video quality ratings the best, while NIQE turns out to be a promising alternative as a no-reference metric in some scenarios.

CCS CONCEPTS

• Information systems → Multimedia streaming;

KEYWORDS

Gaming Video Streaming, Quality Assessment, QoE

ACM Reference format:
Nabajeet Barman, Steven Schmidt, Saman Zadtootaghaj, Maria G. Martini, and Sebastian Möller. 2018. An Evaluation of Video Quality Assessment Metrics for Passive Gaming Video Streaming. In Proceedings of 23rd Packet Video Workshop, Amsterdam, Netherlands, June 12–15, 2018 (Packet Video'18), 6 pages. DOI: 10.1145/3210424.3210434

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Packet Video'18, Amsterdam, Netherlands
© 2018 Copyright held by the owner/author(s). Publication rights licensed to ACM. 978-1-4503-5773-9/18/06. . . $15.00
DOI: 10.1145/3210424.3210434

1 INTRODUCTION

Gaming video streaming applications are becoming increasingly popular. They can be divided into two different, but related, applications: interactive and passive services. Interactive gaming video streaming applications are commonly known as cloud gaming, where the actual gameplay is performed on a cloud server. The user receives the rendered gameplay video back on a client device and then inputs corresponding game commands. Such applications have received lots of attention, resulting in the rapid development and acceptance of such services [1]. On the other hand, passive gaming video streaming refers to applications such as Twitch.tv (https://www.twitch.tv/), where viewers can watch the gameplay of other gamers. Such applications have received much less attention from both the gaming and video community, despite the fact that Twitch.tv, with its nine million subscribers and about 800 thousand active viewers at the same time, is alone responsible for the 4th highest peak Internet traffic in the USA [2]. With the increasing popularity of such services, along with demand for other over-the-top services such as Netflix and YouTube, the demand on network resources has also increased. Therefore, to provide the end-user with a service at a reasonable Quality of Experience (QoE) and satisfy the user expectation of anytime, anyplace and any-content video service availability, it is necessary to optimize the video delivery process.

For the assessment of video quality, typically subjective tests are carried out. However, these tests are time-consuming and expensive. Thus, numerous efforts are being made to predict the video quality through video quality assessment (VQA) metrics. Depending on the availability and the amount of reference information, objective VQA algorithms can be categorized into full-reference (FR), reduced-reference (RR), and no-reference (NR) metrics. So far, these metrics have been developed and tested for non-gaming videos, usually considering video on demand (VoD) streaming applications. Also, some of the metrics such as NIQE and BRISQUE are based on Natural Scene Statistics (for details see Section 2). Gaming videos, on the other hand, are artificial and synthetic in nature and have different streaming requirements (1-pass, Constant Bitrate (CBR)), and hence the performance of these VQA metrics remains an open question. Our earlier study in [3] found some differences in the performance of such metrics when comparing gaming videos to non-gaming videos. Towards this end,

we present in this paper the evaluation and analysis of some of the most widely used VQA metrics. Since FR and RR metrics cannot be used for applications such as live video streaming, where reference information is absent, we provide a more detailed discussion of the performance of the NR metrics. We believe that the insight gained from this study will help to improve or design better performing VQA metrics. The remainder of the paper is organized as follows: Section 2 presents a discussion of the eight VQA metrics used in this work. Section 3 describes the dataset and the evaluation methodology. The results and main observations are presented in Section 4, and Section 5 finally concludes the paper.

2 OVERVIEW OF VQA METRICS

We start with a brief introduction of the eight VQA metrics considered in this work. The primary focus of this work is to evaluate the performance of the existing VQA metrics on gaming video content, which has not been investigated so far.

2.1 FR metrics

FR metrics refer to the VQA metrics which require the availability of full reference information. We selected Peak Signal to Noise Ratio (PSNR), Structural Similarity Index Metric (SSIM) [4] and Video Multi-Method Assessment Fusion (VMAF) [5] as our three FR metrics. Due to its simplicity and ease of computation, PSNR is one of the most widely used metrics for both image and video quality assessment. SSIM, which computes the structural similarity between two images, was shown to correlate better with subjective judgement and hence is also widely used for both image and video quality assessment [4]. For video quality assessment, frame-level PSNR and SSIM scores are temporally pooled (usually averaged) over the video duration to obtain a single score. VMAF is a fusion-based metric which combines scores from three different metrics to obtain a single score between 0 and 100, with a higher score denoting higher quality. The choice of VMAF along with PSNR and SSIM is influenced by our previous work, in which these metrics showed a very high correlation with subjective scores [3].

2.2 RR Metrics

Reduced-reference metrics are used when only partial information about the reference video is available. As such, they are less accurate than FR metrics but are useful in applications where only limited source information is available, such as limited-bandwidth transmissions. We used Spatio-temporal reduced reference entropic differences (ST-RRED), an RR metric proposed by the authors in [6], since it is one of the most widely used RR metrics with very good performance on various VQA databases [7]. It measures the amount of spatial and temporal information differences in terms of wavelet coefficients of the frames and frame differences between the distorted and received videos. In this work, we use the recently developed optimized version of ST-RRED, known as ST-RREDOpt, which calculates only the desired sub-band, resulting in almost the same performance as ST-RRED while being almost ten times faster computationally [8]. In addition, we also use the recently proposed spatial efficient entropic differencing for quality assessment (SpEED-QA) model, which is almost 70 times faster than the original implementation of ST-RRED and seven times faster than ST-RREDOpt, as it considers only the spatial domain for its computation [9]. For both these metrics, we used the default settings and implementation as provided by the authors.

2.3 NR Metrics

NR metrics try to predict the quality without using any source information. Since for gaming applications a high-quality reference video is typically not available, the development of well-performing no-reference metrics is of very high importance. Blind/referenceless image spatial quality evaluator (BRISQUE) [10] is an NR metric which tries to quantify the possible loss of naturalness in an image by using the locally normalized luminance coefficients. Blind image quality index (BIQI) is a modular NR metric based on distorted-image statistics, built on a natural scene statistics (NSS) model [11]. Natural Image Quality Evaluator (NIQE) is a learning-based NR quality estimation metric which uses statistical features based on the space domain NSS model [12].

For the FR metrics, we use the results made available in the dataset. For ST-RREDOpt, SpEED-QA and BIQI, we used the implementations made available by the authors with the default settings. NIQE (https://de.mathworks.com/help/images/ref/niqe.html) and BRISQUE (https://de.mathworks.com/help/images/ref/brisque.html) calculations were done using the inbuilt MATLAB functions (version: R2017b).

3 EVALUATION DATASET AND METHODOLOGY

3.1 Evaluation Dataset

For this work, we use GamingVideoSET, a public open-source dataset made available by the authors in [13]. We briefly describe the dataset and the data used in this work and refer the reader to the dataset and associated publication for further information. GamingVideoSET consists of a total of 24 gaming video sequences of 30 seconds duration, obtained from two recorded video sequences from each of the 12 games considered. The dataset also provides subjective test results for 90 gaming video sequences obtained by encoding six gaming videos at 15 different resolution-bitrate pairs (three resolutions: 1080p, 720p and 480p) using the H.264/AVC compression standard. In addition, a total of 576 encoded videos, obtained by encoding the 24 reference videos at 24 different resolution-bitrate pairs (inclusive of the ones used for subjective assessment), are provided in MP4 format. The encoding mode used is 1-pass, Constant Bitrate (CBR). In the rest of this paper, we refer to the part of the dataset reporting the subjective results as the subjective dataset and to the whole dataset as the full dataset.

3.2 Evaluation Methodology

The standard practice to evaluate how well a VQA metric performs is to measure the correlation between the objective metric scores and subjective scores. In this work, we measure the performance of the objective metrics in two phases. In the first phase, we compare the performance of the VQA metrics with subjective scores considering the subjective dataset. In the second phase, for a comprehensive evaluation of the VQA metrics on the full dataset, we compare the VQA metric performance with a benchmark VQA metric. Since the

Figure 1: Quality vs. Bitrate plots for eight different quality metrics for 1080p resolution.

encoded videos available are MP4, for the FR and RR metric calculations we instead use the decoded, raw YUV videos obtained from the encoded MP4 videos. (The videos at 480p and 720p resolution were rescaled to 1080p YUV format using a bilinear scaling filter, as was done by the authors of GamingVideoSET for subjective quality evaluation.) For the NR metric calculations we instead use the encoded videos at their original resolution (without scaling the 480p and 720p videos to 1080p), due to the reasons discussed later in Section 4.6.

4 RESULTS

4.1 VQA Metrics Variation With Bitrates

Figure 1 shows the rate-distortion results for the eight VQA metrics for all twenty-four videos considering different bitrates for the 1080p resolution. Similar results are obtained for the 720p and 480p resolution videos but are not presented here due to lack of space. It can be observed that for the FR and RR metrics, at higher bitrates, the quality gap between various contents (due to content complexity) decreases. Both RR metrics show identical behavior, with both reaching saturation at higher bitrates. For the NR metrics, almost the reverse trend is observed, with an increased quality gap at higher bitrates compared to lower bitrates.

4.2 Comparison of VQA metrics with MOS

The performance of a VQA metric with respect to subjective ratings is evaluated in terms of Pearson Linear Correlation Coefficient (PLCC) and Spearman's Rank Correlation Coefficient (SROCC) values. Negative PLCC and SROCC values indicate that higher values of the respective metric correspond to lower quality, and vice versa. Table 1 shows the correlation values of the eight VQA metrics. (While the authors in [13] make available both raw MOS scores and MOS scores after outlier detection, in this work we consider only the raw MOS scores, without any subjective score processing.) The results are reported separately for each resolution and also considering all three resolutions combined (all data). It can be observed that VMAF results in the highest performance in terms of both PLCC and SROCC values across all three resolutions and all data. The two RR metrics have similar performance in terms of correlation values across all resolution-bitrate pairs and over all data. Hence, for applications where increased speed of computation is of high importance, SpEED-QA can be selected as the RR metric, as it is almost seven times faster than ST-RREDOpt. Among the NR metrics, BIQI performs the worst. BRISQUE and NIQE result in almost the same performance for the 1080p and 720p resolutions, but for the 480p resolution and all data, NIQE performs better than BRISQUE.

4.3 Impact of resolution on VQA metrics

It can be observed that, in general, the performance of the VQA metrics varies across resolutions. For the FR and NR metrics, the performance decreases as one moves from higher-resolution to lower-resolution videos. In contrast, both RR metrics resulted in higher correlation in terms of PLCC with MOS scores for the 720p resolution videos, followed by the 1080p and 480p resolution videos. Fisher's Z-test (http://psych.unl.edu/psycrs/statpage/biv_corr_comp_eg.pdf) to assess the significance of the difference between two correlation coefficients indicates that the difference between 720p and 1080p is not statistically significant, while the difference between 720p and 480p is significant, Z = 2.954, p < 0.01. For all eight VQA metrics, the performance for the 480p resolution

Table 1: Comparison of the performance of the VQA metric scores with MOS ratings in terms of PLCC and SROCC values. All Data refers to the combined data of all three resolution-bitrate pairs. The best performing metric is shown in bold.

Table 2: Comparison of the performance of the VQA metric scores with VMAF scores in terms of PLCC and SROCC values. All Data refers to the combined data of all three resolution-bitrate pairs. The best performing metric is shown in bold.

                          480p            720p            1080p           All Data
              Metrics     PLCC   SROCC    PLCC   SROCC    PLCC   SROCC    PLCC   SROCC
FR Metrics    PSNR         0.62   0.60     0.79   0.77     0.91   0.92     0.87   0.87
              SSIM         0.56   0.56     0.68   0.70     0.80   0.83     0.70   0.74
RR Metrics    ST-RREDOpt  -0.66  -0.85    -0.74  -0.89    -0.77  -0.91    -0.53  -0.61
              SpEED-QA    -0.68  -0.88    -0.76  -0.92    -0.77  -0.93    -0.55  -0.63
NR Metrics    BRISQUE     -0.68  -0.68    -0.79  -0.79    -0.77  -0.78    -0.14  -0.14
              BIQI        -0.57  -0.54    -0.70  -0.71    -0.67  -0.68    -0.05  -0.05
              NIQE        -0.75  -0.77    -0.81  -0.81    -0.78  -0.76    -0.42  -0.42
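The PLCC and SROCC values reported in the tables, and the Fisher Z-test used in Section 4.3, can be computed as sketched below. This is a minimal illustration with placeholder score arrays (the values are invented for the example, not taken from the paper) and assumes SciPy is available:

```python
import numpy as np
from scipy import stats

# Placeholder per-sequence scores; in the paper these would be the MOS
# ratings and the corresponding VQA metric outputs (e.g. PSNR in dB).
mos = np.array([1.8, 2.4, 3.1, 3.6, 4.2, 4.5])
metric = np.array([28.0, 31.5, 33.2, 35.0, 38.4, 40.1])

# PLCC measures linear agreement; SROCC measures monotonic (rank) agreement.
plcc, _ = stats.pearsonr(metric, mos)
srocc, _ = stats.spearmanr(metric, mos)

def fisher_z(r1, n1, r2, n2):
    """Z statistic for the difference between two independent correlation
    coefficients, using Fisher's r-to-z transform."""
    z1, z2 = np.arctanh(r1), np.arctanh(r2)
    se = np.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    return (z1 - z2) / se

# Hypothetical comparison of a 720p PLCC against a 480p PLCC,
# each computed over 30 sequences (illustrative numbers only).
z = fisher_z(0.76, 30, 0.68, 30)
```

A |Z| above 1.96 would indicate a difference between two correlations that is significant at p < 0.05 (two-tailed); this is the style of comparison behind the Z = 2.954 value reported in Section 4.3.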

(cf. Table 1) is considerably lower compared to the same VQA metric's performance for the 720p and 1080p resolutions. Also, the decrease in performance is higher for some metrics than for others. We explain this observation using an example metric, PSNR, as shown in Figure 2b. Based on the figure, it can be observed that PSNR at 480p resolution is not able to capture the variation in MOS across bitrates (cf. Figure 2a), as its value for the 480p resolution remains almost constant even at higher bitrates. VMAF, on the other hand, as evident from Figure 2c, captures this variation quite well and hence results in increased performance overall and also across each individual resolution.

4.4 Comparison of VQA metrics with VMAF

In the previous section, we presented and evaluated the performance of the eight VQA metrics based on the subjective ratings, using six reference gaming video sequences and 15 resolution-bitrate pairs. It was found that across all conditions, VMAF resulted in the highest performance in terms of both PLCC and SROCC values. In the absence of subjective ratings for the full dataset, and taking into account the fact that our previous results showed superior performance of VMAF among all eight VQA metrics, we consider the VMAF values as the reference score. We then evaluate the remaining seven VQA metrics on the full dataset (24 reference video sequences and a total of 24 resolution-bitrate pairs, resulting in a total of 576 encoded video sequences). Table 2 shows the PLCC and SROCC correlation values of the seven VQA metrics with VMAF scores. It can be observed that PSNR results in the highest correlation, followed by SSIM. Similar to the correlation values with MOS reported in Table 1, both RR metrics show similar performance. Also, similar to the results reported in Table 1, for some metrics the correlation values vary significantly over different resolutions. At 1080p, PSNR results in the highest PLCC scores and SpEED-QA results in the highest SROCC values. At 720p and 480p, NIQE results in the highest PLCC scores and SpEED-QA results in the highest SROCC values. These results indicate the high potential of RR and NR metrics for quality evaluation in applications limited to a single resolution where full reference information is not available.

4.5 Comparative performance analysis of NR metrics

While the VQA metrics in general perform quite well, their performance decreases when considering multiple resolutions. Compared to the FR and RR metrics, the performance degradation of the NR metrics for all data was considerably high. We investigate the reason behind such performance degradation across multiple resolution-bitrate pairs using Figure 3, which shows the scatter plots of BRISQUE, BIQI and NIQE against VMAF scores considering all three resolutions. It can be observed from Figure 3 that, when considering individual resolutions, the NR metric values are reasonably well correlated with the VMAF values and increase approximately linearly, and hence result in reasonable PLCC scores. When considering all

resolution-bitrate pairs, however, the spread of values is no longer linear, hence the lower correlation scores. Among the three NR metrics, NIQE results in a much lower spread for each individual resolution and when considering all data, as compared to BIQI and BRISQUE. Hence, NIQE results in a higher overall prediction quality when using both MOS scores and VMAF scores as the benchmark. BRISQUE, on the other hand, results in almost similar performance as NIQE for the 1080p and 720p resolutions, but the correlation values decrease for 480p (wider spread of the scores) and all data. BIQI performs the worst among all three.

The difference in values per resolution can be attributed to the fact that, while for the FR and RR metric calculations we used the rescaled YUV videos for the 720p and 480p resolutions, for the NR metric calculations we used the downscaled, compressed MP4 videos. This, along with the lack of proper training with videos of different resolutions, as well as the absence of parameters in the models which can capture the differences due to a change in resolution, results in lower correlation scores when considering all resolution-bitrate pairs. We discuss next the results obtained for the NR metric performance evaluation when considering the upscaled YUV videos, as was done for the FR and RR metric evaluation.

Figure 2: MOS (with 95% confidence interval), PSNR and VMAF values for the CSGO video sequence at different resolution-bitrate pairs. A similar behavior is observed for other video sequences (relevant results not reported here due to lack of space).

4.6 NR metric evaluation with rescaling

As mentioned before, the three NR metrics were evaluated on videos without rescaling. We briefly present and discuss the results obtained with the rescaled YUV videos and the limitations of the same. Figure 4 shows the variation of the NIQE scores for one of the sample gaming videos (FIFA) over the 24 different resolution-bitrate pairs. While for the 1080p resolution videos the NIQE values indicate higher quality with increasing encoding bitrate (as one would expect), for the 720p resolution videos the estimated quality remains approximately the same even at higher bitrates. For 480p, the trend actually reverses, with NIQE estimating a poorer quality at higher bitrates. A similar behavior is observed for BRISQUE and BIQI. A possible reason behind such behavior could be that these NR metrics, which are based on natural scene statistics, are not able to capture the combined effect of quality loss due to compression and quality loss due to rescaling, a common method used for resolution switching in adaptive streaming applications such as Dynamic Adaptive Streaming over HTTP (DASH) and HTTP Live Streaming (HLS). Hence, while the results for the NR metrics when considering the compressed, low-resolution versions without upscaling (480p and 720p) are as expected, the same metrics are not capable of estimating MOS values when rescaled versions of the sequences are considered. This indicates the unsuitability of their usage for applications such as DASH and HLS, where quality adaptation uses multiple resolution-bitrate pairs and the videos are usually rescaled to the native resolution (1080p in our case). Further investigation into the design of these metrics can help to overcome this shortcoming and perhaps also increase their performance. Training and evaluation of these metrics considering rescaled, multiple resolution-bitrate pairs can possibly lead to improved prediction accuracy.

5 CONCLUSION AND FUTURE WORK

In this paper, we presented an objective evaluation and analysis of the performance of eight different VQA metrics on gaming videos considering a passive, live streaming scenario. At first, on a subset of GamingVideoSET consisting of 90 video sequences, we evaluated the performance of the VQA metrics against MOS scores. We found that VMAF results in the highest correlation with subjective scores, followed by SSIM and NIQE. It was observed that many metrics failed to capture the MOS variation at lower resolutions, hence resulting in lower correlation values. Then we evaluated the performance of the rest of the VQA metrics against VMAF on

the full test dataset. The performance of the NR metrics decreased when considering different resolution-bitrate pairs together. Also, when considering rescaled videos, the NR metrics result in erroneous predictions. Possible reasons could be attributed to the lack of proper training, the gaming video content, etc., which we plan to investigate in our future work.

We believe that the observations and discussions presented in this work will be helpful to improve the prediction efficiency of the existing metrics as well as to develop better performing NR VQA metrics with a focus on live gaming video streaming applications. In addition to the passive gaming services discussed in this work, a well-performing NR metric can also be used for predicting video quality for interactive cloud gaming services. It should be noted that our current subjective evaluation was limited in terms of the number of videos considered. Also, the gaming videos used in this work were limited to a 30 fps frame rate. As future work, we plan to extend our subjective analysis using more videos and also include higher frame rate videos.

Figure 3: Scatter plots showing the variation of the NR metrics with respect to VMAF scores considering all three resolutions over the whole dataset.

Figure 4: NIQE score variation for one of the sample gaming video sequences (FIFA) considering the rescaled YUV videos for 720p and 480p resolution. Similar patterns are observed for other videos but are not presented here due to lack of space.

ACKNOWLEDGMENT

This work is part of a project that has received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 643072 and was supported by the German Research Foundation (DFG) within project MO 1038/21-1.

REFERENCES

[1] S. Shirmohammadi, M. Abdallah, D. T. Ahmed, Y. Lu, and A. Snyatkov. Introduction to the special section on visual computing in the cloud: Cloud gaming and virtualization. IEEE Transactions on Circuits and Systems for Video Technology, 25(12):1955–1959, 2015.
[2] D. Fitzgerald and D. Wakabayashi. Apple Quietly Builds New Networks. https://www.wsj.com/articles/apple-quietly-builds-new-networks-1391474149, February 2014. [Online; accessed 27-February-2017].
[3] N. Barman, S. Zadtootaghaj, M. G. Martini, S. Möller, and S. Lee. A Comparative Quality Assessment Study for Gaming and Non-Gaming Videos. In Tenth International Conference on Quality of Multimedia Experience (QoMEX), May 2018.
[4] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli. Image Quality Assessment: From Error Visibility to Structural Similarity. IEEE Transactions on Image Processing, 13(4):600–612, 2004.
[5] Netflix. VMAF - Video Multi-Method Assessment Fusion. https://github.com/Netflix/vmaf. [Online; accessed 12-Dec-2018].
[6] R. Soundararajan and A. C. Bovik. Video quality assessment by reduced reference spatio-temporal entropic differencing. IEEE Transactions on Circuits and Systems for Video Technology, 23(4):684–694, April 2013.
[7] A. C. Bovik, R. Soundararajan, and C. Bampis. On the Robust Performance of the ST-RRED Video Quality Predictor. http://live.ece.utexas.edu/research/Quality/ST-RRED/.
[8] C. G. Bampis, P. Gupta, R. Soundararajan, and A. C. Bovik. Source code for the optimized Spatio-Temporal Reduced Reference Entropy Differencing Video Quality Prediction Model. http://live.ece.utexas.edu/research/Quality/STRRED_opt_demo.zip, 2017.
[9] C. G. Bampis, P. Gupta, R. Soundararajan, and A. C. Bovik. SpEED-QA: Spatial Efficient Entropic Differencing for Image and Video Quality. IEEE Signal Processing Letters, 24(9):1333–1337, Sept 2017.
[10] A. Mittal, A. K. Moorthy, and A. C. Bovik. No-reference image quality assessment in the spatial domain. IEEE Transactions on Image Processing, 21(12):4695–4708, Dec 2012.
[11] A. K. Moorthy and A. C. Bovik. A two-step framework for constructing blind image quality indices. IEEE Signal Processing Letters, 17(5):513–516, May 2010.
[12] A. Mittal, R. Soundararajan, and A. C. Bovik. Making a "Completely Blind" Image Quality Analyzer. IEEE Signal Processing Letters, 20(3):209–212, March 2013.
[13] N. Barman, S. Zadtootaghaj, S. Schmidt, M. G. Martini, and S. Möller. GamingVideoSET: A Dataset for Gaming Video Streaming Applications. In 16th Annual Workshop on Network and Systems Support for Games (NetGames), Amsterdam, Netherlands, June 2018.
