5j Future 2020 02 078
Article info

Article history:
Received 11 July 2019
Received in revised form 14 November 2019
Accepted 26 February 2020
Available online 4 March 2020

Keywords:
Adaptive caching
Edge computing
k-means clustering
Mean-shift clustering
Social-viewports
Video streaming
Virtual reality

Abstract

This paper proposes a novel social-viewport adaptive caching scheme (SACS) for virtual reality (VR) streaming in an edge-computing platform. In VR contents with 360 degree views where only a part of the entire view (i.e., the viewport) is shown and the remaining parts are decoded but not shown, we collect and record multiple clients' viewports of the same VR contents in local proximity on the edge-computing platform. We extract a social-viewport map, which represents where most of the local clients are directing their attention. By utilizing the social-viewport map, under our proposed scheme, k-means and mean-shift clustering algorithms are adopted to partition 360 degree views into multiple clusters with the nearest mean of hit-ratios from multiple clients. Accordingly, in order to save cache storage while maintaining a high-quality VR streaming service, we adaptively assign different encoding rates with various levels to multiple viewports. We implement the proposed scheme using a commercial EdgeX foundry edge-computing platform. A measurement-based experiment reveals that the proposed scheme achieves a maximum storage reduction of almost 74%, with a 92% hit-ratio to the highest encoded viewports.
https://doi.org/10.1016/j.future.2020.02.078
0167-739X/© 2020 Elsevier BV All rights reserved.
Y. Yang, J. Lee, N. Kim et al. / Future Generation Computer Systems 108 (2020) 424–431 425
viewport. They first addressed the fundamental trade-offs between caching, computing, and communication for emerging VR services. This perspective would be useful for efficient caching of VR contents with 360 degree views, which require higher data rates (i.e., 4K streaming) with only part of the entire view being consumed. Thus, there is room to improve on the previous work by considering the characteristics of VR contents.

In this light, this paper proposes a novel social-viewport adaptive caching scheme (SACS). The proposed scheme consists of the following:

• By considering the situation in which only part of the entire view (the viewport) is shown, while the remaining parts are decoded but not shown, viewports of multiple clients for the same VR contents are collected and recorded in local proximity.

levels. This method provides users with a video consisting of high-bitrate tiles and a low-bitrate layer in order to avoid any blank or frozen tiles in the VR video even when users' views are quickly changing. Similar to [14], Kurutepe et al. [15] proposed selective streaming of multiview video for head-tracking 3D displays, which simply provides viewport-adaptive streaming. However, while such layer-based and multiview-based coding approaches provide novel ways of streaming, they still require high computing resources for encoding or decoding in practical situations. To reduce bandwidth usage and also to prevent the exceptional situation of blank tiles, Aladagli et al. [16] proposed a system that predicts head trajectories in 360 degree VR video streaming. With the proposed system, the tiles along the predicted head trajectories can be prefetched as high-bitrate tiles so that the occurrence of blank tiles can be prevented.
Fig. 3. Example of cluster formation and centroid discovery with the mean-shift algorithm.
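The cluster formation illustrated in Fig. 3 can be made concrete with a minimal pure-Python sketch of Gaussian-kernel mean-shift. It is an illustration under stated assumptions, not the authors' implementation: viewports are assumed to be 2-D points, the bandwidth `h` is held fixed instead of being recomputed from the kth closest neighbour, and the names `shift_point`, `mean_shift`, `tol` and `merge_dist` are hypothetical.

```python
import math

def shift_point(x, points, h):
    """One Gaussian-kernel mean-shift update: kernel-weighted mean of all points."""
    weights = [math.exp(-((x[0] - p[0]) ** 2 + (x[1] - p[1]) ** 2) / (2 * h * h))
               for p in points]
    total = sum(weights)
    return (sum(w * p[0] for w, p in zip(weights, points)) / total,
            sum(w * p[1] for w, p in zip(weights, points)) / total)

def mean_shift(points, h=1.0, tol=1e-4, merge_dist=0.5, max_iter=200):
    """Shift every point toward a dense region, then group nearby shifted points."""
    shifted = []
    for x in points:
        for _ in range(max_iter):
            nxt = shift_point(x, points, h)
            step = math.dist(x, nxt)  # magnitude of the shifting vector
            x = nxt
            if step < tol:
                break
        shifted.append(x)
    # Aggregate shifted points that lie close to the first member of a group.
    groups = []
    for s in shifted:
        for g in groups:
            if math.dist(s, g[0]) < merge_dist:
                g.append(s)
                break
        else:
            groups.append([s])
    # Representative of each cluster: average of its shifted points.
    return [(sum(p[0] for p in g) / len(g), sum(p[1] for p in g) / len(g))
            for g in groups]

# Hypothetical viewport centers forming two dense regions.
viewports = [(0.0, 0.0), (0.2, 0.1), (0.1, -0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
centers = mean_shift(viewports, h=1.0)
```

With these sample points the two dense regions are far apart relative to `h`, so the number of returned representatives is discovered from the data rather than supplied as an input.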
with the nearest mean of hit-ratio in an iterative manner. As a result, $J_i$ is efficiently decided without manual configuration.

Algorithm 1 k-means algorithm
Input: K (the number of encoding levels), viewport map
Output: a set of K clusters with the nearest mean of hit-ratio
1: arbitrarily choose K viewports from the viewport map as the initial cluster centers
2: repeat
3:   (re)assign each viewport to the cluster to which the viewport is the most similar, based on the mean value of the viewports in the cluster
4:   update the cluster means, i.e., calculate the mean value of the viewports for each cluster
5: until no change

3.4. Mean-shift clustering algorithm

The mean-shift algorithm is a kernel-based fixed-point iteration that shifts each data instance toward a higher-density region with respect to the other data instances [17]. It can form clusters in non-convex shapes without requiring the number of clusters as an input. Therefore, it does not require a predetermined number of clusters derived from prior knowledge of the saliency areas in the video; rather, it discovers this information by exploring the dataset. As described in Fig. 3, it analyzes the density between data instances to form clusters in dense regions.

Accordingly, the mean-shift algorithm can form clusters in arbitrary shapes, even non-convex ones. When a cluster's shape is non-convex, the data instances' mean and center of mass in the cluster tend to be highly dissimilar. For clustering techniques that calculate the representative of a cluster as the average of the data instances, the representative of a non-convex cluster often cannot properly typify the cluster. However, the mean-shift algorithm can identify the regions where the data instances in a cluster gather densely and extract those spatial points as the representatives of the clusters [18]. Therefore, mean-shift clustering can extract the viewport patterns of a video even if the patterns are non-convex in their vector space representations.

The mean-shift algorithm requires a kernel bandwidth, h, as an input in order to decide the sensitivity in forming clusters. The shape of the kernel and its bandwidth determine the number and the weights of the data instances covered by the kernel during the clustering process. The bandwidth is often set as the distance from the data instance currently being updated to the kth closest data instance, with a predetermined value of k. In this paper, a Gaussian kernel is used, and the bandwidth is set according to a sensitivity hyperparameter and the distance to the kth closest data instance. The thresholds required in the algorithm are calculated from the input hyperparameters, and the bandwidth is adaptively recalculated in every iteration of the data shift process. After the data instances are shifted into dense regions, the clusters, G, are finalized by aggregating the nearby shifted data instances into groups. The cluster representatives, z, are calculated by taking the average of the shifted data instances belonging to each cluster.

As described, the mean-shift algorithm shifts a data instance $x_i$ toward a higher-density region with respect to the other data instances $x_n$, $\forall n = 1, 2, \ldots, N$, which are assumed to be stationary at that moment. The update function for the lth iteration with a bandwidth h is

$$f_{\mathrm{update}}(x_i^{l}, h) = x_i^{l+1} = \sum_{n=1}^{N} \frac{\exp\left(-\frac{d^2(x_i^{l}, x_n)}{2h^2}\right)}{\sum_{m=1}^{N} \exp\left(-\frac{d^2(x_i^{l}, x_m)}{2h^2}\right)}\, x_n .$$

By rewriting the update rule in terms of the amount of shift in every iteration, the shifting vector of the viewport currently being updated can be found:

$$\vec{m}(x_i^{l}, h) = \sum_{n=1}^{N} \frac{\exp\left(-\frac{d^2(x_i^{l}, x_n)}{2h^2}\right)}{\sum_{m=1}^{N} \exp\left(-\frac{d^2(x_i^{l}, x_m)}{2h^2}\right)}\, x_n - x_i^{l} .$$

Algorithm 2 mean-shift algorithm
Input: sensitivity parameter and k (hyperparameters for determining the bandwidth and thresholds), viewport map
Output: a set of clusters whose number is not predefined
1: set the thresholds for the iteration stop criteria with the sensitivity parameter
2: for each viewport map data instance, x, do
3:   repeat
4:     shift the viewport with the update rule, $f_{\mathrm{update}}$
5:     update the bandwidth, h, with the distance to the kth closest viewport and the sensitivity parameter
6:   until the magnitude of the shifting vector, $\vec{m}$, is smaller than the threshold
7: end for
8: aggregate the nearby shifted viewports, whose distances are smaller than a threshold, into a set of groups to form clusters
9: calculate the clusters' representatives by averaging the shifted viewports belonging to each cluster

3.5. Implementations

We implement our proposed SACS on (1) a commercial EdgeX foundry edge-computing platform for the cache side and (2) an
Table 1
Summary of viewport distribution for VR contents.
results mean SD CV
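The three summary columns named in Table 1 can be computed for any list of viewport samples with the standard library. The sketch below assumes that SD is the sample standard deviation and that CV is the coefficient of variation (SD divided by the mean); the extracted table does not state either convention. The function name `summarize` and the sample values are hypothetical.

```python
import statistics

def summarize(samples):
    """Return (mean, SD, CV) for a list of numeric viewport samples."""
    mean = statistics.mean(samples)
    sd = statistics.stdev(samples)  # sample standard deviation (assumption)
    cv = sd / mean                  # coefficient of variation (assumption)
    return mean, sd, cv

# Hypothetical horizontal viewport positions, in degrees, for one content.
yaw_degrees = [170, 180, 175, 185, 190]
mean, sd, cv = summarize(yaw_degrees)
```

A small CV indicates viewports concentrated around one spot, while a large CV indicates a widespread distribution.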
Fig. 7. Thumbnails of the contents: (a) Muay Thai, (b) climbing and (c) drone view.
Fig. 8. Viewport distribution of VR contents for (a) Muay Thai demonstration, (b) Climbing advertisement and (c) Drone view.
Fig. 9. Heatmap graphs for contents that have concentrated and widespread viewport densities, respectively: (a) climbing and (b) drone view.
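A viewport heatmap of the kind shown in Fig. 9 can be approximated by counting viewport centers per tile. The sketch below is illustrative only: it assumes viewport centers are given as normalized (x, y) coordinates in [0, 1] on the flattened 360 degree frame and uses a 4×4 tile grid. The function names `tile_heatmap` and `concentration` and the sample points are hypothetical.

```python
def tile_heatmap(centers, rows=4, cols=4):
    """Count viewport centers falling into each tile of a rows x cols grid."""
    grid = [[0] * cols for _ in range(rows)]
    for x, y in centers:
        col = min(int(x * cols), cols - 1)  # clamp x == 1.0 into the last column
        row = min(int(y * rows), rows - 1)  # clamp y == 1.0 into the last row
        grid[row][col] += 1
    return grid

def concentration(grid, top=1):
    """Fraction of all viewports that fall into the `top` most-viewed tiles."""
    counts = sorted((c for row in grid for c in row), reverse=True)
    return sum(counts[:top]) / sum(counts)

# Hypothetical samples: one concentrated content and one widespread content.
concentrated = [(0.55, 0.55), (0.60, 0.52), (0.58, 0.60), (0.52, 0.58), (0.10, 0.90)]
widespread = [(0.1, 0.1), (0.4, 0.3), (0.6, 0.8), (0.9, 0.2), (0.3, 0.7)]

heat_a = tile_heatmap(concentrated)
heat_b = tile_heatmap(widespread)
```

Comparing `concentration(heat_a)` with `concentration(heat_b)` gives a single number for how strongly the viewports of a content gather on a few tiles.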
viewport. We made heatmap graphs for each content we tested and used them to understand and contrast with the results generated by our mean-shift algorithm.

Fig. 10 shows the cache storage usage of the proposed schemes with the rule-based algorithm, the k-means clustering algorithm and the mean-shift algorithm with respect to the three VR contents, where N = 20 (1 second per chunk) and M = 4×4 = 16 (16 tiles at each chunk). For differential encoding rates, the compression ratios of the K = 3 encoding levels are 1.0, 0.42 and 0.1 for levels 1, 2 and 3, respectively.

Under these settings, we consider (1) a prior scheme that only streams the original tiles encoded at the highest-quality bitrate, as a benchmark without our proposed scheme, (2) the rule-based algorithm with $J_1$ = 0.3, $J_2$ = 0.5, and $J_3$ = 0.2 (denoted by "Rule-based (0.3, 0.5, 0.2)"), (3) the rule-based algorithm with $J_1$ = 0.5, $J_2$ = 0.3, and $J_3$ = 0.2 (denoted by "Rule-based (0.5, 0.3, 0.2)"), (4) the k-means clustering algorithm and (5) the mean-shift clustering algorithm.

As shown in Fig. 10, the cache storage reduction gain, which is defined as 1 − (VR content storage usage with the adaptive encoding rate / storage usage with the highest encoding rate), increases when the viewports are very close to particular spots with low variance. This is due to the fact that a smaller portion of the overall viewports is encoded with high encoding rates. The gain of the proposed rule-based algorithms increases as the value of $J_1$ decreases. Gains of 48%–77% and 39%–67% are achieved by Rule-based (0.3, 0.5, 0.2) and Rule-based (0.5, 0.3, 0.2), respectively. However, it is difficult to manually find a suitable configuration by considering the trade-off between cache storage savings and the quality level of the VR streaming service. In the proposed configurations, the proposed k-means clustering algorithm finds an efficient encoding rule without manual configuration and outperforms the rule-based algorithms. Also, the proposed mean-shift algorithm, which performs clustering automatically like k-means, uses storage less efficiently than the k-means algorithm but, as shown in Fig. 11, achieves a higher hit-ratio than the k-means algorithm. It is worth noting that we can manually increase $J_2$ or $J_3$ in the rule-based algorithm to achieve more storage reduction.

Finally, Fig. 11 shows the hit-ratio performance of the proposed schemes, which represents the hit-ratio to the tiles with the highest encoding level (level 1). This can be translated into the quality level of the VR streaming service. Here, similar to the results of cache storage reduction, the hit-ratio increases when the viewports are very close to particular spots, with low variance. This is due to the fact that most of the viewports are concentrated on those points of interest. In this sense, the k-means clustering algorithm has a similar performance to the others by achieving a 78%–92% hit-ratio. Furthermore, the mean-shift algorithm has higher performance than k-means and the other algorithms. Accordingly, we can conclude that the proposed k-means and mean-shift clustering algorithms automatically find an efficient encoding rule within the trade-off between cache storage savings and the quality level of the VR streaming service.

5. Discussion

5.1. k-means

Since the k-means algorithm automatically produces k clusters at a time, we can choose an appropriate encoding level for each cluster. Therefore, we can select each tile's bitrate by checking the number of users in each cluster. After creating a bitrate map for the tiles (or clusters) of the content, we encode the content with the applicable map by using FFmpeg with spatial encoding. However, since the number of clusters, k, must be chosen before clustering with the k-means algorithm, it can be difficult to obtain a reasonable number of clusters, because heuristic steps are needed to find a proper number for each content.

5.2. mean-shift

More advanced than k-means clustering, mean-shift-based clustering is able to examine the dataset to determine a proper number of clusters to form, as described in the previous section. When the video contents become complex, the appropriate number of saliency areas is hard to find, and the users' viewport patterns become diverse. Mean-shift clustering can find proper saliency spots in videos, which improves the quality of service by increasing the hit-ratio.

Mean-shift clustering can be applied to videos with complex contents and viewport patterns to improve the quality of service by retaining the hit-ratio without a significant increase in the video data size. In the mean-shift-clustering-based VR streaming scheme, both the mean and the centroid of a cluster are used, since the cluster mean tends to represent the generalized locations of the social viewports while the cluster centroid tends to indicate the movement or change in the social viewports.

The standard deviations from the mean and the centroid are calculated to determine the regions for high, medium, and low bitrate. The regions within 2 standard deviations are encoded at high bitrate, those within 3 standard deviations at medium bitrate, and the rest at low bitrate.

5.3. Delay analysis

We provide a simple analysis of the benefit of adopting the proposed cache system in terms of start-up delay. Here, we assume that the conventional system receives the VR contents from the end (cloud) server with a 100 ms round-trip time (RTT), denoted by $D_{e2c}$, and that the proposed SACS receives them from the cache server with a 50 ms RTT, denoted by $D_{c2c}$. We consider the climbing content with the k-means algorithm, so that the cached chunk size $S_c$ is about 60% (= 0.6 × 16 Mb) of the original chunk size ($S_o$ = 16 Mb), as shown in Fig. 10. With the data rate R (= 50 Mbps) at the bottleneck link (access network), the start-up delay to receive the first chunk can be estimated as $L_{SACS} = D_{c2c} + S_c/R = 0.05 + 9.6/50 = 0.242$ s, whereas $L_{origin} = D_{e2c} + S_o/R = 0.1 + 16/50 = 0.42$ s in the conventional system. Finally, since SACS can reduce the total chunk size by 39%–77% depending on the hit-ratio, the start-up delay can also be reduced by 40%–71% compared to the conventional system.

5.4. Future works

In the proposed SACS, there are challenging issues in supporting a multitude of social data from clients with various intentions. For example, if some clients want to see tiles that are low in popularity, the SACS may not satisfy those clients. To address this challenge, the SACS can be used adaptively depending on the network condition. Specifically, clients actively measure their network condition (e.g., throughput) and determine whether to turn on the SACS. For instance, if the throughput is too low to obtain high- or medium-bitrate tiles, they use the proposed SACS. Otherwise, they prefer to receive high-encoded VR contents without using the SACS. Such adaptive mechanisms will be considered in our future work.

6. Conclusions

In this paper, we proposed a novel social-viewport adaptive caching scheme (SACS) for VR streaming in an edge-computing platform. By considering a trade-off between cache storage savings and the quality of VR streaming services, efficient encoding rules over multiple viewports are provided. The proposed
solution is implemented on a commercial EdgeX foundry edge-computing platform, and a measurement-based experiment shows that the proposed scheme achieves a maximum storage reduction of almost 74% with a 92% hit-ratio to the highest encoded viewports.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgment

This research was supported by the Basic Science Research Program of the National Research Foundation of South Korea under Grant NRF-2018R1C1B6001849.

References

[1] Z. Research, Global (VR) virtual reality market size forecast to reach USD 26.89 billion by 2022, GlobeNewswire News Room (2018) https://globenewswire.com/news-release/2018/05/01/1494026/0/en/Global-VR-Virtual-Reality-Market-Size-Forecast-to-Reach-USD-26-89-Billion-by-2022.html.
[2] T. Netzer, Vodafone & Saguna research results: MEC improves video streaming user experience, Saguna (2017) https://www.saguna.net/blog/improving-video-streaming-customer-experience-with-multi-access-edge-computing-research-results-from-vodafone-and-saguna/.
[3] A. Bentaleb, B. Taani, A.C. Begen, C. Timmerer, R. Zimmermann, A survey on bitrate adaptation schemes for streaming media over HTTP, IEEE Commun. Surv. Tutor. 21 (1) (2019) 562–585, http://dx.doi.org/10.1109/comst.2018.2862938.
[4] D.H. Lee, C. Dovrolis, A.C. Begen, Caching in HTTP adaptive streaming, in: Proceedings of Network and Operating System Support on Digital Audio and Video Workshop - NOSSDAV '14, 2013, http://dx.doi.org/10.1145/2597176.2578270.
[5] S. Li, J. Xu, M. van der Schaar, W. Li, Popularity-driven content caching, in: IEEE INFOCOM 2016 - The 35th Annual IEEE International Conference on Computer Communications, 2016, http://dx.doi.org/10.1109/infocom.2016.7524381.
[6] J. Ahn, S.H. Jeon, H.-S. Park, A novel proactive caching strategy with community-aware learning in CoMP-enabled small-cell networks, IEEE Commun. Lett. 22 (9) (2018) 1918–1921, http://dx.doi.org/10.1109/lcomm.2018.2856761.
[7] S. Wang, X. Zhang, Y. Zhang, L. Wang, J. Yang, W. Wang, A survey on mobile edge networks: Convergence of computing, caching and communications, IEEE Access 5 (2017) 6757–6779, http://dx.doi.org/10.1109/ACCESS.2017.2685434.
[8] J. Chakareski, VR/AR immersive communication: Caching, edge computing, and transmission trade-offs, in: Proceedings of the Workshop on Virtual Reality and Augmented Reality Network, in: VR/AR Network '17, ACM, New York, NY, USA, 2017, pp. 36–41, http://dx.doi.org/10.1145/3097895.3097902.
[9] A. Zare, A. Aminlou, M.M. Hannuksela, M. Gabbouj, HEVC-compliant tile based streaming of panoramic video for virtual reality applications, in: Proceedings of the 24th ACM International Conference on Multimedia, in: MM '16, ACM, New York, NY, USA, 2016, pp. 601–605, http://dx.doi.org/10.1145/2964284.2967292.
[10] C. Ozcinar, A. De Abreu, A. Smolic, Viewport-aware adaptive 360° video streaming using tiles for virtual reality, in: 2017 IEEE International Conference on Image Processing (ICIP), 2017, pp. 2174–2178, http://dx.doi.org/10.1109/ICIP.2017.8296667.
[11] M. Hosseini, V. Swaminathan, Adaptive 360 VR video streaming: Divide and conquer, in: 2016 IEEE International Symposium on Multimedia (ISM), 2016, pp. 107–110.
[12] M. Hosseini, V. Swaminathan, Adaptive 360 VR video streaming: Divide and conquer, in: 2016 IEEE International Symposium on Multimedia (ISM), 2016, pp. 107–110, http://dx.doi.org/10.1109/ISM.2016.0028.
[13] X. Corbillon, G. Simon, A. Devlic, J. Chakareski, Viewport-adaptive navigable 360-degree video delivery, in: 2017 IEEE International Conference on Communications (ICC), 2017, pp. 1–7, http://dx.doi.org/10.1109/ICC.2017.7996611.
[14] A. Taghavi Nasrabadi, A. Mahzari, J.D. Beshay, R. Prakash, Adaptive 360-degree video streaming using layered video coding, in: 2017 IEEE Virtual Reality (VR), 2017, pp. 347–348, http://dx.doi.org/10.1109/VR.2017.7892319.
[15] E. Kurutepe, M.R. Civanlar, A.M. Tekalp, Selective streaming of multi-view video for head-tracking 3D displays, in: 2007 IEEE International Conference on Image Processing, Vol. 3, 2007, pp. III-77–III-80, http://dx.doi.org/10.1109/ICIP.2007.4379250.
[16] A.D. Aladagli, E. Ekmekcioglu, D. Jarnikov, A. Kondoz, Predicting head trajectories in 360° virtual reality videos, in: 2017 International Conference on 3D Immersion (IC3D), 2017, pp. 1–6, http://dx.doi.org/10.1109/IC3D.2017.8251913.
[17] scikit-learn, Clustering, scikit-learn (2019) https://scikit-learn.org/stable/modules/clustering.html.
[18] D. Comaniciu, P. Meer, Mean shift: A robust approach toward feature space analysis, IEEE Trans. Pattern Anal. Mach. Intell. 24 (5) (2002) 603–619, http://dx.doi.org/10.1109/34.1000236.

Yousung Yang is currently pursuing an M.S. degree and received his B.S. degree from the Department of Software, Gachon University, South Korea. He presented his project work at Samsung SOSCON 2018 with the Electronics and Telecommunications Research Institute (ETRI), South Korea. His research interests include IoT systems, VR streaming, object detection and machine learning in edge computing systems.

Joohyung Lee received the B.S., M.S., and Ph.D. degrees from the Korea Advanced Institute of Science and Technology, Daejeon, South Korea, in 2008, 2010, and 2014, respectively. From 2012 to 2013, he was a Visiting Researcher with the Information Engineering Group, Department of Electronic Engineering, City University of Hong Kong, Hong Kong. From 2014 to 2017, he was a Senior Engineer with Samsung Electronics. He is currently an Assistant Professor with the Department of Software, Gachon University, South Korea. He has contributed several articles to the International Telecommunication Union Telecommunication (ITU-T) and the 3rd Generation Partnership Project (3GPP). His research work is at the intersection of mobile systems and machine learning, focusing on edge computing architectures to optimize the trade-off between latency, energy, bandwidth and accuracy for video analytics. He received the Best Paper Award at the Integrated Communications, Navigation, and Surveillance Conference in 2011, and an award for outstanding contribution in reviewing at Elsevier Computer Communications. He has been an IEEE Senior Member since 2019, and a Technical Reviewer for several conferences and journals, such as the IEEE Communications Letters, the IEEE Transactions on Vehicular Technology, and Elsevier Computer Communications.

Nakyoung Kim is currently pursuing a Ph.D. in the School of Electrical Engineering, KAIST, Daejeon, Republic of Korea. She received her B.S. degree in computer and electrical engineering and M.S. degree in electrical engineering in 2015 and 2016, respectively, from the Georgia Institute of Technology, Atlanta, Georgia, United States. Her research interests are IoT, machine learning, artificial intelligence, data analysis, and data processing. Her recent research investigates the analysis of power-related IoT data with machine learning techniques. She has contributed to ITU-T SG 13, SG 20 and the Focus Group on Data Processing and Management (FG-DPM) as a contributor and an editor of articles on IoT and data processing. In the Internet Engineering Task Force (IETF), she is contributing to an article on integrating AI into IoT networks and IoT data processing, and she participates in IETF hackathons regarding the solution development.

Kwihoon Kim received the M.S. and Ph.D. degrees from KAIST in 2000 and 2019, respectively. He worked at LG DACOM from 2000 to 2005 and has been a research engineer at ETRI since 2005. He is now a principal research engineer on the intelligent IoE networking research team, ETRI. He has been an editor and rapporteur of ITU-T SG 11 since 2006. His fields of interest are fog/edge computing, Internet of Things, 5G/IMT2020, deep learning, machine learning, reinforcement learning, GAN and knowledge-converged intelligent service.