Future Generation Computer Systems 108 (2020) 424–431
Social-viewport adaptive caching scheme with clustering for virtual reality streaming in an edge computing platform

Yousung Yang a, Joohyung Lee a,*, Nakyoung Kim b, Kwihoon Kim c

a Department of Software, Gachon University, Seongnam 13120, South Korea
b Department of Electrical Engineering, KAIST, Daejeon 34141, South Korea
c KSB Convergence Research Department, ETRI, Daejeon 34129, South Korea
Article info

Article history:
Received 11 July 2019
Received in revised form 14 November 2019
Accepted 26 February 2020
Available online 4 March 2020

Keywords:
Adaptive caching
Edge computing
k-means clustering
Mean-shift clustering
Social-viewports
Video streaming
Virtual reality

Abstract

This paper proposes a novel social-viewport adaptive caching scheme (SACS) for virtual reality (VR) streaming in an edge-computing platform. In VR contents with 360 degree views where only a part of the entire view (i.e., the viewport) is shown and the remaining parts are decoded but not shown, we collect and record multiple clients' viewports of the same VR contents in local proximity on the edge-computing platform. We extract a social-viewport map, which represents where most of the local clients are directing their attention. By utilizing the social-viewport map, under our proposed scheme, k-means and mean-shift clustering algorithms are adopted to partition 360 degree views into multiple clusters with the nearest mean of hit-ratios from multiple clients. Accordingly, in order to save cache storage while maintaining a high-quality VR streaming service, we adaptively assign different encoding rates with various levels to multiple viewports. We implement the proposed scheme using a commercial EdgeX foundry edge-computing platform. A measurement-based experiment reveals that the proposed scheme achieves a maximum storage reduction of almost 74%, with a 92% hit-ratio to the highest encoded viewports.

© 2020 Elsevier B.V. All rights reserved.
* Corresponding author.
E-mail addresses: stardung86@gc.gachon.ac.kr (Y. Yang), j17.lee@gachon.ac.kr (J. Lee), nkim71@kaist.ac.kr (N. Kim), kwihooi@etri.re.kr (K. Kim).
URL: https://sites.google.com/view/joohyung (J. Lee).
https://doi.org/10.1016/j.future.2020.02.078
0167-739X/© 2020 Elsevier B.V. All rights reserved.

1. Introduction

With rapid developments in commercialized virtual reality (VR) products (e.g., head-mounted displays, 360 degree cameras, smartphones) and wireless networking technologies, explosive growth of the demand for mobile VR streaming services has been witnessed [1]. From the perspective of video streaming services, a video should start up quickly and play without re-buffering. It is worth noting that, according to the research results in [2], just one buffering event decreases the amount of video played by 39%.

This concern has motivated studies on various video streaming schemes that aim at reducing the start-up delay and re-buffering rate [3,4]. One of the most popular methods is that of a client controlling the video encoding bitrate at the application layer by inspecting the dynamic network transmission rates. Dynamic adaptive streaming over HTTP (DASH) is the first international standard to realize such a method; it works by breaking the content into a sequence of small chunks with different encoding bitrates, and a client requesting the chunk corresponding to its network condition from a server [3]. However, due to the large distance between the clients and the server, the start-up delay may remain unsatisfactory even though the DASH protocol is adopted. To alleviate this, operators use their cache servers in order to relocate their own content servers close to end clients, so that the physical distance can be shortened [4].

Due to limited and relatively expensive cache storage, several approaches have been proposed in [4–7] for efficient cache management. The typical way to deal with this problem is to increase the content hit-ratio at the cache server. In order to improve the content hit-ratio, Li et al. consider a popularity-driven content caching scheme [5], which is more responsive to continuously changing trends of content popularity, for an optimal caching policy. In recent work [6], machine learning has been used to predict content request probabilities from a content rating matrix in the community. Nevertheless, most existing work has been confined to content-based cache management. For instance, it only considers which content should be stored in or removed from the cache server by estimating or learning the popularity of content. To the best of our knowledge, less attention has been paid to the importance of regions within a specific content for cache management. In [8], the authors propose to partially cache VR contents over multiple base stations by considering the client's
viewport. They first addressed the fundamental trade-offs between caching, computing, and communication for emerging VR services. This perspective would be useful for efficient caching of VR contents with 360 degree views, which require higher data rates (i.e., 4K streaming) with only part of the entire view being consumed. Thus, there is room to improve on the previous work by considering the characteristics of VR contents.

In this light, this paper proposes a novel social-viewport adaptive caching scheme (SACS). The proposed scheme consists of the following:

• By considering the situation in which only part of the entire view (the viewport) is shown, while the remaining parts are decoded but not shown, viewports of multiple clients for the same VR contents are collected and recorded in local proximity.
• By generating a social-viewport map, which provides general information on where most local clients are directing their attention over 360 degree views, the proposed scheme partitions 360 degree views into multiple clusters with the nearest mean of hit-ratios from multiple clients by adopting various clustering schemes such as k-means clustering and mean-shift clustering.
• Correspondingly, different encoding rates with k levels are adaptively assigned to the viewports for efficient caching of VR contents.

Therefore, under the proposed SACS approach, important viewports over entire 360 degree views are handled differently for efficient caching. This is beneficial for managing relatively expensive cache storage for VR contents. The contributions of this paper are summarized as follows:

• This paper designs the SACS, which consists of generating a social-viewport map, feedback from clients, a virtualized cache server, and a differential encoding mechanism.
• The proposed SACS is implemented using a commercial EdgeX foundry edge computing platform and the Android operating system (OS).
• The measurement-based experiment shows that the SACS achieves a maximum storage reduction gain of almost 74% with a 92% hit-ratio to the highest encoded viewports.

2. Related work

For low-latency and bandwidth-efficient VR streaming, various adaptive VR streaming schemes have been extensively proposed to consider the situation in which only part of the entire view is shown and the remaining parts are decoded but not shown. For instance, a tile-based streaming method for VR contents which uses HEVC was proposed in [9]. Its main goal is to reduce data usage for fetching VR contents by encoding each tile with a different encoding rate. Similarly, Cagri Ozcinar [10] integrated tiling and DASH. In order to provide viewport-aware bitrate level selection, an end-to-end streaming system was implemented for encoding and displaying at a time. Mohammad Hosseini et al. [11,12] proposed their own streaming system based on MPEG-DASH SRD, which provides a spatial relationship description between tiles to efficiently utilize bandwidth resources. They partitioned the 3D mesh into multiple 3D sub-meshes and constructed a 3D geometry to show tiled 360 VR videos in the 3D space optimally. Their results showed that their proposed method can save up to 72% of the required bandwidth for VR streaming. Corbillon et al. [13] proposed a cube-map projection based streaming scheme with a quality emphasized region (QER) and discussed the optimization of the proposed algorithms in VR live streaming. Afshin et al. [14] studied a layered streaming solution which streams multiple layers with different bitrate levels. This method provides users a video consisting of high bitrate tiles and a low bitrate layer in order to avoid any blank or frozen tiles in the VR video even when users' views are quickly changing. Similar to [14], Kurutepe et al. [15] proposed multiview video streaming for head-tracking 3D displays, which simply provides viewport adaptive streaming. However, while such layered-based and multiview-based coding approaches provide novel ways of streaming, they still require high computing resources for encoding or decoding in practical situations. Not only to reduce bandwidth usage, but also to prevent the exceptional situation of blank tiles, A. Deniz Aladagli et al. [16] proposed a system that predicts head trajectories in 360 VR video streaming. With the proposed system, the tiles in the predicted head trajectories can be prefetched with high bitrate tiles so that the occurrence of blank tiles can be prevented.

While these streaming solutions provide real-time encoding and streaming depending on the network condition or device type, they require much higher computing complexity compared to naive basic schemes. To the best of our knowledge, there has not been any work on a VR cache system on a commercial edge computing framework, nor on utilizing the concept of a social-viewport from multiple clients. To analyze the social viewport, we adopt multiple clustering-based machine learning algorithms. The proposed work does not require real-time coding, and is thus useful and highly applicable for non-real-time VR contents by efficiently utilizing virtualized cache storage.

3. Proposed scheme

In this section, we present the scheme that we build for adaptive caching by using several algorithms that are regarded as easy to deploy, namely the rule-based, k-means clustering, and mean-shift clustering algorithms. We introduce our basic system model for adaptive caching in Section 3.1 and the algorithms in Sections 3.2–3.4.

3.1. System model

We consider L clients that request the same VR content and feed back their viewports to an intermediate node called the "edge computing server" via a base station while VR streaming, as shown in Fig. 1. Here, the edge-computing server consists of three modules: (1) the encoding and caching module, (2) the k-means clustering module, and (3) the social-viewport map generation module. By integrating these three modules onto the edge-computing server, the SACS is proposed as a novel edge-computing platform.

The detailed procedure consists of the seven steps shown in Fig. 1. When VR clients consume the VR contents, they send their viewports to the proposed edge computing system (SACS) via the base station. Then, the social-viewport map generation module collects the viewports and generates a social-viewport map corresponding to some arbitrary VR content. Here, the social-viewport map (SM) is modeled by an M × N matrix such as

\[
SM = \begin{bmatrix}
u_{11} & u_{12} & \cdots & u_{1N} \\
u_{21} & u_{22} & \cdots & u_{2N} \\
\vdots & \vdots & \ddots & \vdots \\
u_{M1} & u_{M2} & \cdots & u_{MN}
\end{bmatrix},
\]

where N and M denote the number of chunks in the VR contents and the number of viewports, respectively. Each entry of SM represents the hit value collected from L clients. An example of SM with the above model is depicted in Fig. 2.

The k-means clustering module periodically acquires a social-viewport map from the social-viewport map generation module. Based on the map, encoding rules over 360 degree views are determined. Afterwards, it sends the encoding rules to the encoding
and caching module. Finally, the VR contents are encoded with the specified encoding rule and the caching module is updated.

Fig. 1. Proposed system model and procedure.

Fig. 2. Example of social-viewports map.

Fig. 3. Example for clusters formation and centroid discovery with mean-shift algorithm.

In this context, when a service provider defines the number of encoding levels with the specified ratio of overall views, we define I as the set of encoding levels, where |I| = K denotes the total number of encoding levels. Here, the compression ratio corresponding to each $i \in I$ is denoted by $\alpha_i$, where $\alpha_1 > \alpha_2 > \cdots > \alpha_K$ and $0 \le \alpha_i \le 1$. Then, for some VR content, the raw data of each viewport is assumed to be R Mbits. The overall cache storage usage CU of this VR content is then bounded as

\[
M \times N \times \alpha_K R \;\le\; CU \;\le\; M \times N \times \alpha_1 R .
\]

From the above equation, considering a trade-off between the saving in cache storage and the quality of the VR streaming service, different encoding rates can be adaptively assigned over multiple viewports. Here, the goal when constructing an encoding rule is to save cache storage while maintaining an acceptable quality level of the VR streaming service.

3.2. Rule-based algorithm

In the proposed system model, as a baseline algorithm to create an encoding rule by referring to the social-viewport map, a rule-based algorithm is proposed which provides the freedom to manually define a fixed encoding rule. To model the algorithm, for each $i \in I$, we define $J_i$ as the ratio of viewports assigned to encoding level i. To apply the rule-based algorithm, a sorting algorithm (e.g., quicksort) is adopted to sort the views in hit-ratio order, which is generally known to scale with O(N log N). Afterwards, we set I = {1, 2, 3}, that is, K = 3. Accordingly, we set, for example, J1 = 0.3, J2 = 0.4, and J3 = 0.3. That is, the top 30 percent of viewports are encoded with the highest bitrate. Then, the next 40 percent of viewports are encoded with the middle bitrate. Finally, the remaining 30 percent of viewports are encoded with the lowest bitrate.

3.3. k-means clustering algorithm

A k-means clustering algorithm is adopted to automatically improve the encoding rule generation. Here, similar to the rule-based algorithm, the service provider sets the number of encoding levels K, which is used as the input value to the algorithm as the number of clusters. In this context, given the partition $G_1, \ldots, G_K$, the following optimization problem, a well-known non-convex and NP-hard problem, must be solved to find the representatives $z_1, \ldots, z_K$ of each group:

\[
\arg\min_{z_1, \ldots, z_K} \sum_{1 \le j \le K} \frac{1}{M} \sum_{i \in G_j} (x_i - z_j)^2 ,
\]

where, for convenience, $x_i$ is redefined as the hit value of a certain viewport among M viewports, and $z_j = (1/|G_j|) \sum_{i \in G_j} x_i$. Then, the heuristic-based k-means clustering algorithm described in Algorithm 1 partitions 360 degree views into K clusters
with the nearest mean of hit-ratio in an iterative manner. As a result, $J_i$ is efficiently decided without manual configuration.

Fig. 4. Implementation for the proposed VR client.

Algorithm 1 k-means algorithm
Input: K (the number of encoding levels), viewport map
Output: a set of K clusters with the nearest mean of hit-ratio
1: Arbitrarily choose K viewports from the viewport map as the initial cluster centers
2: repeat
3:   (Re)assign each viewport to the cluster to which the viewport is the most similar, based on the mean value of the viewports in the cluster
4:   Update the cluster means, i.e., calculate the mean value of the viewports for each cluster
5: until no change

3.4. Mean-shift clustering algorithm

The mean-shift algorithm is a kernel-based fixed-point iteration that shifts each data instance towards a higher density region with respect to the other data instances [17]. The mean-shift algorithm can form clusters in non-convex shapes without requiring the number of clusters as an input. Therefore, it does not require a predetermined number of clusters, which would come from prior knowledge of the saliency areas in the video; rather, it discovers this information by exploring the dataset. As described in Fig. 3, it analyzes the density between data instances to form clusters in dense regions.

Accordingly, the mean-shift algorithm can form clusters in arbitrary shapes, even non-convex ones. When a cluster's shape is non-convex, the data instances' mean and center of mass in the cluster tend to be highly dissimilar. For clustering techniques which calculate the representative of a cluster as the average of the data instances, the representative of a non-convex cluster often cannot properly typify the cluster. However, the mean-shift algorithm can identify the regions where the data instances in a cluster gather densely and extract those spatial points as the representatives of the clusters [18]. Therefore, mean-shift clustering can extract the viewport patterns of a video even if the patterns are non-convex in their vector space representations.

The mean-shift algorithm requires an input for the bandwidth of the kernel, h, in order to decide the sensitivity in forming clusters. The shape of the kernel and its bandwidth determine the number and the weights of the data instances to be covered by the kernel during the clustering process. The bandwidth is often set as the distance from the data instance currently being updated to the kth closest data instance, with a predetermined value of k. In this paper, a Gaussian kernel is used, and the bandwidth is set according to a sensitivity hyperparameter $\lambda$ and the distance of the kth closest data instance. Thresholds required in the algorithm are calculated with the input hyperparameters, and the bandwidth is adaptively recalculated at every iteration of the data shift process. After shifting the data instances into dense regions, the clusters, G, are finalized by aggregating the nearby shifted data instances into groups. The cluster representatives, z, are calculated by taking the average of the shifted data instances belonging to each cluster.

As described, the mean-shift algorithm shifts a data instance $x_i$ toward a higher density region with respect to the other data instances $x_n$, $\forall n = 1, 2, \ldots, N$, which are assumed to be stationary at that moment. The update function for the l-th iteration with a bandwidth h is

\[
f_{update}(x_i^{l}, h) = x_i^{l+1} = \sum_{n=1}^{N} \frac{\exp\left(-\frac{d^2(x_i^{l}, x_n)}{2h^2}\right)}{\sum_{m=1}^{N} \exp\left(-\frac{d^2(x_i^{l}, x_m)}{2h^2}\right)} \, x_n .
\]

By rewriting the update rule from the perspective of the amount of shift in every iteration, the shifting vector of the viewport currently being updated can be found as

\[
\vec{m}(x_i^{l}, h) = \sum_{n=1}^{N} \frac{\exp\left(-\frac{d^2(x_i^{l}, x_n)}{2h^2}\right)}{\sum_{m=1}^{N} \exp\left(-\frac{d^2(x_i^{l}, x_m)}{2h^2}\right)} \, x_n - x_i^{l} .
\]

Algorithm 2 mean-shift algorithm
Input: $\lambda$ and k (hyperparameters for determining bandwidth and threshold), viewport map
Output: a set of clusters whose number is not predefined
1: Set thresholds for the iteration stop criteria with $\lambda$
2: for each viewport map data, x, do
3:   repeat
4:     Shift the viewport with the update rule, $f_{update}$
5:     Update the bandwidth, h, with the distance of the kth closest viewport and the sensitivity parameter, $\lambda$
6:   until the magnitude of the shifting vector, $\vec{m}$, is smaller than the threshold
7: end for
8: Aggregate the nearby shifted viewports, whose distances are smaller than a threshold, into a set of groups to form clusters
9: Calculate the clusters' representatives by averaging the shifted viewports belonging to each cluster

3.5. Implementations

We implement our proposed SACS on (1) a commercial EdgeX foundry edge-computing platform for the cache side and (2) an
Android platform for the client side. We make the following developments to support our proposed scheme:

• On the VR client side, to extract the viewport value and feed it back to the edge computing server while consuming the VR contents, a VR content viewer is developed by using the Google VR software development kit (SDK) to get accurate positions presented as yaw and pitch typed values. The application works in the following steps: (1) fetch the list of contents from the middleware server, (2) the user chooses a content from the list, and (3) the user puts the device into a VR device such as Gear VR or Google Cardboard to watch the VR content. Fig. 4 shows the simple steps to set and check the target edge computing middleware URL and to consume the VR contents from the VR device. We adopt the Retrofit library to feed back the viewport value, since the Retrofit library is based on a representational state transfer (REST) API and can be run in most Android OS versions. Thus, the viewport value is wrapped up with JavaScript object notation (JSON), typed, and delivered to the edge-computing platform by the REST API.
• On the cache side, we modify the coredata and metadata microservices provided by the EdgeX foundry edge computing platform (i.e., EdgeX), which is running on a Docker container. Specific information about the VR contents is stored in the metadata microservice, and the social-viewport map generation module is implemented with the coredata microservice. Since the original EdgeX system provides a default profile that is only applicable to simple sensor values, we make a new profile and value descriptor to get a list-based viewport type. On EdgeX, we implement a REST API to support the identification of multiple clients and interworking with the original server. Furthermore, since EdgeX supports aggregation for multiple edges, we can easily aggregate its data or device information with simple steps. The k-means clustering module is implemented with Python 3.6 to construct an encoding rule, whereby the Flask library is used to send a POST request for the encoding rule to the encoding and caching module. Furthermore, a dashboard server is implemented to monitor the social-viewport map status in real time.
• The encoding and caching module is implemented with Nginx and FFMPEG, where HEVC tile modes are adopted. For the streaming protocol, the HTTP live streaming (HLS) protocol is used. By using the HLS protocol, most devices which do not support tile-based DASH streaming can also fetch contents as usual.
• The dashboard service shows the social-viewport data with timestamps, as depicted in Fig. 5. It can help a service provider to efficiently manage its own cache server by analyzing the basic viewport trend of VR contents.

Fig. 5. Implementation for the web dashboard service.

Fig. 6. Configured test bed of the proposed scheme.

Table 1
Summary of viewport distribution for VR contents.

Contents      Mean   SD    CV
Muay Thai     9      1.2   0.13
Climbing      8      1.5   0.17
Drone view    8      2     0.21

4. Performance evaluation

In this section, we provide detailed evaluation results that we have obtained to illustrate the behavior of the proposed SACS. We conducted a measurement-based experiment in which 50 clients at the university, aged between 20 and 23 years, were
equipped with VR devices. The configured testbed of the proposed scheme is shown in Fig. 6, which shows a desktop workstation containing the EdgeX microservices with Docker.

Fig. 7. Thumbnails of the contents: (a) Muay Thai, (b) climbing, and (c) drone view.

Fig. 8. Viewport distribution of VR contents for (a) Muay Thai demonstration, (b) climbing advertisement, and (c) drone view.

Fig. 9. Heatmap graphs for contents that have concentrated and widespread viewport density, respectively: (a) climbing and (b) drone view.
We used three different VR contents: (1) a Muay Thai demonstration, (2) a climbing advertisement, and (3) a landscape video (drone view). The different VR contents were expected to have different viewport distributions. Fig. 7 shows a representative thumbnail of each content: (a) contains only one person who demonstrates Muay Thai, also known as Thai boxing; (b) contains two groups, a climber who demonstrates climbing under a rock and people sailing a boat; and (c) contains a landscape view recorded by a drone.
Fig. 10. Cache storage usage of proposed schemes with varying VR contents.

Also, Fig. 8 shows the viewport distribution of each VR content from the 50 participants in the test. It should be noted that a violin plot is used to present the viewport distribution patterns efficiently, including both the variance and the probability density of the data at different values. As shown in Fig. 8(a), the viewport distribution for the Muay Thai content tends to cluster around particular areas (e.g., the 10th and 5th viewports), with low variance. That is, the Muay Thai content has certain interesting points (e.g., a person demonstrating Muay Thai skill), on which most viewports concentrate. On the other hand, in Fig. 8(c), the viewport distribution for the drone view has high variance and the data points are spread out. This is because there are no particularly interesting points, and the entire set of viewports is expected to be seen by clients. Finally, in Fig. 8(b), the climbing advertisement example has medium variance and multiple interesting points over the entire set of viewports.
We provide in Table 1 a summary of the viewport distribution for the different VR contents. Accordingly, based on these viewport distributions, which we call the social-viewport map, the encoding rule corresponding to a particular type of VR content is decided. The obtained gain will differ according to the viewport distribution.
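To make the step from a viewport distribution to an encoding rule concrete, the following Python sketch derives encoding levels from one column of hit values in two ways: with fixed ratios, as in the rule-based algorithm of Section 3.2, and with a one-dimensional k-means, as in Section 3.3. It is a minimal illustration under stated assumptions: the hit values are hypothetical, the function names are invented for this example, the initial k-means centers are chosen deterministically (Algorithm 1 allows an arbitrary choice), and the storage estimate assumes every viewport has the same raw size. The compression ratios 1.0, 0.42, and 0.1 are the ones reported in the evaluation settings.

```python
def rule_based_levels(hits, ratios):
    """Sort viewports by hit value and split them by fixed ratios."""
    m = len(hits)
    order = sorted(range(m), key=lambda i: -hits[i])   # most viewed first
    levels = [0] * m
    start, cum = 0, 0.0
    for level, ratio in enumerate(ratios, start=1):
        cum += ratio
        end = m if level == len(ratios) else round(cum * m)
        for i in order[start:end]:
            levels[i] = level                          # level 1 = highest bitrate
        start = end
    return levels


def kmeans_levels(hits, k, max_iter=100):
    """1-D k-means on hit values; clusters are ranked by their mean."""
    uniq = sorted(set(hits), reverse=True)
    if k < 2 or len(uniq) < k:
        raise ValueError("need 2 <= k <= number of distinct hit values")
    # Deterministic initial centers spread over the observed values.
    centers = [float(uniq[round(j * (len(uniq) - 1) / (k - 1))]) for j in range(k)]
    for _ in range(max_iter):
        # Assignment step: nearest center for every viewport.
        labels = [min(range(k), key=lambda j: (h - centers[j]) ** 2) for h in hits]
        # Update step: recompute each center as the mean of its members.
        new_centers = []
        for j in range(k):
            members = [h for h, lab in zip(hits, labels) if lab == j]
            new_centers.append(sum(members) / len(members) if members else centers[j])
        if new_centers == centers:                     # "until no change"
            break
        centers = new_centers
    ranked = sorted(range(k), key=lambda j: -centers[j])
    level_of = {cluster: rank + 1 for rank, cluster in enumerate(ranked)}
    return [level_of[lab] for lab in labels]


def storage_fraction(levels, ratios_by_level):
    """Storage relative to encoding every viewport at the highest level."""
    return sum(ratios_by_level[lv - 1] for lv in levels) / len(levels)


# Hypothetical hit values for M = 16 viewports of one chunk.
hits = [0, 1, 0, 2, 1, 12, 14, 2, 1, 13, 15, 3, 0, 1, 2, 0]
compression = (1.0, 0.42, 0.1)

rule = rule_based_levels(hits, (0.3, 0.4, 0.3))
km = kmeans_levels(hits, k=3)
print("rule-based:", rule, round(storage_fraction(rule, compression), 3))
print("k-means:   ", km, round(storage_fraction(km, compression), 3))
```

With these illustrative values, the fixed ratios always place five viewports at the highest level, whereas k-means places only the four heavily viewed ones there, which is why the clustering approach can save more storage on concentrated contents.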
Fig. 11. Hit ratio of proposed schemes with varying VR contents.

Before forming clusters, we created heatmaps that visually show the big picture of the viewers' points of interest, which contain most of the viewports. As shown in Fig. 9, (a) shows the heatmap graph for a content which has a lower density of viewports, while (b) shows the graph for a content which has a higher density of
viewports. We made heatmap graphs for each content we tested and used them to understand and contrast with the results generated by our mean-shift algorithm.

Fig. 10 shows the cache storage usage of the proposed schemes with the rule-based algorithm, the k-means clustering algorithm, and the mean-shift algorithm with respect to the three VR contents, where N = 20 (1 second per chunk) and M = 4 × 4 = 16 (16 tiles in each chunk). For differential encoding rates, the compression ratios are $\alpha_1 = 1.0$, $\alpha_2 = 0.42$, and $\alpha_3 = 0.1$, where K = 3.

Under these settings, we consider (1) a prior scheme which only streams the original tiles encoded with the highest bitrate, as a benchmark without our proposed scheme, (2) the rule-based algorithm with J1 = 0.3, J2 = 0.5, and J3 = 0.2 (denoted by "Rule-based (0.3, 0.5, 0.2)"), (3) the rule-based algorithm with J1 = 0.5, J2 = 0.3, and J3 = 0.2 (denoted by "Rule-based (0.5, 0.3, 0.2)"), (4) the k-means clustering algorithm, and (5) the mean-shift clustering algorithm.

As shown in Fig. 10, the cache storage reduction gain, which is defined as 1 − (VR content storage usage with the adaptive encoding rate / storage usage with the highest encoding rate), increases when the viewports are very close to particular spots with low variance. This is due to the fact that a lower portion of the overall viewports is encoded with high encoding rates. The gain of the proposed rule-based algorithms increases with a lower value of J1. Gains of 48%–77% and 39%–67% are achieved by Rule-based (0.3, 0.5, 0.2) and Rule-based (0.5, 0.3, 0.2), respectively. However, it is difficult to manually find a suitable configuration by considering the trade-off between cache storage savings and the quality level of the VR streaming service. In the proposed configurations, the proposed k-means clustering algorithm finds an efficient encoding rule without manual configuration and outperforms the rule-based algorithms. Also, the proposed mean-shift algorithm, which performs clustering automatically like k-means, uses storage less efficiently than the k-means algorithm but, as shown in Fig. 11, achieves a higher hit-ratio than the k-means algorithm. It is worth noting that we can manually increase J2 or J3 in the rule-based algorithm to achieve more storage reduction.

Finally, Fig. 11 shows the hit-ratio performance of the proposed schemes, which represents the hit-ratio to the tiles with the highest encoding level ($\alpha_1$). This can be translated into the quality level of the VR streaming service. Here, similar to the results of cache storage reduction, the hit-ratio increases when the viewports are very close to particular spots, with low variance. This is due to the fact that most of the viewports are concentrated on those points of interest. In this sense, the k-means clustering algorithm has a similar performance to the others by achieving a 78%–92% hit-ratio. Furthermore, the mean-shift algorithm has higher performance than k-means and the other algorithms. Accordingly, we can conclude that the proposed k-means and mean-shift clustering algorithms automatically find an efficient encoding rule within the trade-off between cache storage savings and the quality level of the VR streaming service.

5. Discussion

5.1. k-means

Since the k-means algorithm automatically obtains k clusters at a time, we can choose an appropriate encoding level for each cluster. Therefore, we can select each tile's bitrate by checking the number of users in each cluster. After creating a bitrate map for the tiles (or clusters) of the content, we encode the content with the applicable map by using FFMPEG with spatial encoding. However, since we have to choose the number of clusters, k, before we start clustering with the k-means algorithm, it can be an obstacle to creating a reasonable number of clusters, because heuristic steps are needed to find the proper number for each content.

5.2. Mean-shift

More advanced than k-means clustering, mean-shift based clustering is able to examine the dataset to determine a proper number of clusters to form, as described in the previous section. When the video contents get complex, the appropriate number of saliency areas is hard to find, and the users' viewport patterns become diverse. Mean-shift clustering can find proper saliency spots in videos, which improves the quality of service by increasing the hit-ratio.

Mean-shift clustering can be applied to videos with complex contents and viewport patterns to improve the quality of the service by retaining the hit-ratio without a significant increase in the video data size. In the mean-shift clustering based VR streaming scheme, both the mean and the centroid of a cluster are used, since the cluster mean tends to represent the generalized locations of the social viewports while the cluster centroid tends to indicate the movement or change in the social viewports.

The standard deviations from the mean and the centroid are calculated to determine the regions for high, medium, and low bitrate. The regions within 2 standard deviations are encoded in high bitrate, those within 3 standard deviations in medium bitrate, and the rest in low bitrate.

5.3. Delay analysis

With the adoption of the proposed cache system, we provide a simple analysis to address the benefit in terms of start-up delay. Here, we assume that the conventional system receives the VR contents from the end (cloud) server with a 100 ms round trip time (RTT) delay, denoted by $D_{e2c}$, and that the proposed SACS receives them from the cache server with a 50 ms RTT, denoted by $D_{c2c}$. Here, we consider the climbing content with the k-means algorithm, so that the cached chunk size $S_c$ is about 60% (= 0.6 × 16 Mb) of the original chunk size $S_o$ (= 16 Mb), as shown in Fig. 10. With the data rate R (= 50 Mbps) at the bottleneck link (access network), the start-up delay to receive the first chunk can be estimated as $L_{SACS} = D_{c2c} + S_c/R = 0.05 + 9.6/50 = 0.242$ s, while we get $L_{origin} = D_{e2c} + S_o/R = 0.1 + 16/50 = 0.42$ s in the conventional system. Finally, since SACS can reduce the total chunk size by 39%–77% depending on the hit ratio, we can also reduce the start-up delay by 40%–71% compared to the conventional system.

5.4. Future works

In the proposed SACS, there are challenging issues in supporting a multitude of social data from clients with various intentions, as follows: if some clients want to see tiles that are low in popularity, the SACS may not satisfy those clients. To address this challenge, we can adaptively use the SACS depending on the network condition. Specifically, clients actively measure their network condition (e.g., throughput) and determine whether to turn on the SACS or not. For instance, if the throughput is too low to get high or medium bitrate tiles, they use the proposed SACS. Otherwise, they prefer to receive high-encoded VR contents without using the SACS. Such adaptive mechanisms will be considered as our future work.

6. Conclusions

In this paper, we proposed a novel social-viewport adaptive caching scheme (SACS) for VR streaming in an edge-computing platform. By considering a trade-off between cache storage savings and the quality of VR streaming services, efficient encoding rules over multiple viewports are provided. The proposed
solution is implemented on a commercial EdgeX Foundry edge-computing platform, and a measurement-based experiment shows that the proposed scheme achieves a maximum storage reduction of almost 74% with a 92% hit-ratio to the highest encoded viewports.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgment

This research was supported by the Basic Science Research Program of the National Research Foundation of South Korea under Grant NRF-2018R1C1B6001849.

References

[1] Z. Research, Global (vr) virtual reality market size forecast to reach USD 26.89 billion by 2022, GlobeNewswire News Room (2018) https://globenewswire.com/news-release/2018/05/01/1494026/0/en/Global-VR-Virtual-Reality-Market-Size-Forecast-to-Reach-USD-26-89-Billion-by-2022.html.
[2] T. Netzer, Vodafone & saguna research results: Mec improves video streaming user experience, Saguna (2017) https://www.saguna.net/blog/improving-video-streaming-customer-experience-with-multi-access-edge-computing-research-results-from-vodafone-and-saguna/.
[3] A. Bentaleb, B. Taani, A.C. Begen, C. Timmerer, R. Zimmermann, A survey on bitrate adaptation schemes for streaming media over http, IEEE Commun. Surv. Tutor. 21 (1) (2019) 562–585, http://dx.doi.org/10.1109/comst.2018.2862938.
[4] D.H. Lee, C. Dovrolis, A.C. Begen, Caching in HTTP adaptive streaming, in: Proceedings of Network and Operating System Support on Digital Audio and Video Workshop - NOSSDAV '14, 2013, http://dx.doi.org/10.1145/2597176.2578270.
[5] S. Li, J. Xu, M. van der Schaar, W. Li, Popularity-driven content caching, in: IEEE INFOCOM 2016 - The 35th Annual IEEE International Conference on Computer Communications, 2016, http://dx.doi.org/10.1109/infocom.2016.7524381.
[6] J. Ahn, S.H. Jeon, H.-S. Park, A novel proactive caching strategy with community-aware learning in comp-enabled small-cell networks, IEEE Commun. Lett. 22 (9) (2018) 1918–1921, http://dx.doi.org/10.1109/lcomm.2018.2856761.
[7] S. Wang, X. Zhang, Y. Zhang, L. Wang, J. Yang, W. Wang, A survey on mobile edge networks: Convergence of computing, caching and communications, IEEE Access 5 (2017) 6757–6779, http://dx.doi.org/10.1109/ACCESS.2017.2685434.
[8] J. Chakareski, VR/AR immersive communication: Caching, edge computing, and transmission trade-offs, in: Proceedings of the Workshop on Virtual Reality and Augmented Reality Network, in: VR/AR Network '17, ACM, New York, NY, USA, 2017, pp. 36–41, http://dx.doi.org/10.1145/3097895.3097902.
[9] A. Zare, A. Aminlou, M.M. Hannuksela, M. Gabbouj, HEVC-compliant tile based streaming of panoramic video for virtual reality applications, in: Proceedings of the 24th ACM International Conference on Multimedia, in: MM '16, ACM, New York, NY, USA, 2016, pp. 601–605, http://dx.doi.org/10.1145/2964284.2967292.
[10] C. Ozcinar, A. De Abreu, A. Smolic, Viewport-aware adaptive 360° video streaming using tiles for virtual reality, in: 2017 IEEE International Conference on Image Processing (ICIP), 2017, pp. 2174–2178, http://dx.doi.org/10.1109/ICIP.2017.8296667.
[11] M. Hosseini, V. Swaminathan, Adaptive 360 vr video streaming: divide and conquer, in: 2016 IEEE International Symposium on Multimedia (ISM), 2016, pp. 107–110.
[12] M. Hosseini, V. Swaminathan, Adaptive 360 vr video streaming: Divide and conquer, in: 2016 IEEE International Symposium on Multimedia (ISM), 2016, pp. 107–110, http://dx.doi.org/10.1109/ISM.2016.0028.
[13] X. Corbillon, G. Simon, A. Devlic, J. Chakareski, Viewport-adaptive navigable 360-degree video delivery, in: 2017 IEEE International Conference on Communications (ICC), 2017, pp. 1–7, http://dx.doi.org/10.1109/ICC.2017.7996611.
[14] A. Taghavi Nasrabadi, A. Mahzari, J.D. Beshay, R. Prakash, Adaptive 360-degree video streaming using layered video coding, in: 2017 IEEE Virtual Reality (VR), 2017, pp. 347–348, http://dx.doi.org/10.1109/VR.2017.7892319.
[15] E. Kurutepe, M.R. Civanlar, A.M. Tekalp, Selective streaming of multi-view video for head-tracking 3D displays, in: 2007 IEEE International Conference on Image Processing, Vol. 3, 2007, http://dx.doi.org/10.1109/ICIP.2007.4379250, III-77–III-80.
[16] A.D. Aladagli, E. Ekmekcioglu, D. Jarnikov, A. Kondoz, Predicting head trajectories in 360° virtual reality videos, in: 2017 International Conference on 3D Immersion (IC3D), 2017, pp. 1–6, http://dx.doi.org/10.1109/IC3D.2017.8251913.
[17] scikit learn, Clustering, Scikit Learn. (2019) https://scikit-learn.org/stable/modules/clustering.html.
[18] D. Comaniciu, P. Meer, Mean shift: A robust approach toward feature space analysis, IEEE Trans. Pattern Anal. Mach. Intell. 24 (5) (2002) 603–619, http://dx.doi.org/10.1109/34.1000236.

Yousung Yang is currently a student pursuing an MS degree and received his BS degree from the Software Department, Gachon University, South Korea. He presented his project work at Samsung SOSCON 2018 with the Electronics and Telecommunications Research Institute (ETRI), South Korea. His research interests include IoT systems, VR streaming, object detection and machine learning in edge computing systems.

Joohyung Lee received the BS, MS, and Ph.D. degrees from the Korea Advanced Institute of Science and Technology, Daejeon, South Korea, in 2008, 2010, and 2014, respectively. From 2012 to 2013, he was a Visiting Researcher with the Information Engineering Group, Department of Electronic Engineering, City University of Hong Kong, Hong Kong. From 2014 to 2017, he was a Senior Engineer with Samsung Electronics. He is currently an Assistant Professor with the Department of Software, Gachon University, South Korea. He has contributed several articles to the International Telecommunication Union Telecommunication (ITU-T) and the 3rd Generation Partnership Project (3GPP). His research work is at the intersection of mobile systems and machine learning, focusing on edge computing architectures to optimize the trade-off between latency, energy, bandwidth and accuracy for video analytics. He received the Best Paper Award at the Integrated Communications, Navigation, and Surveillance Conference in 2011, and the Award for outstanding contribution in reviewing at Elsevier Computer Communications. He has been an IEEE Senior member since 2019, and a Technical Reviewer for several conferences and journals, such as the IEEE Communications Letters, the IEEE Transactions on Vehicular Technology, and Elsevier Computer Communications.

Nakyoung Kim is currently a student pursuing a Ph.D. in the School of Electrical Engineering, KAIST, Daejeon, Republic of Korea. She received her BS degree in computer and electrical engineering and MS degree in electrical engineering in 2015 and 2016, respectively, from the Georgia Institute of Technology, Atlanta, Georgia, United States. Her research interests are IoT, machine learning, artificial intelligence, data analysis, and data processing. Her recent research investigates analysis on power-related IoT data with machine learning techniques. She has contributed to ITU-T SG 13, SG 20 and the Focus Group on Data Processing and Management (FG-DPM) as a contributor and an editor of articles on IoT and data processing. In the Internet Engineering Task Force (IETF), she is contributing to an article on integrating AI into IoT network and IoT data processing as she participates in IETF hackathons regarding the solution development.

Kwihoon Kim received the MS and Ph.D. degrees from KAIST in 2000 and 2019, respectively. He worked at LG DACOM from 2000 to 2005 and has been a research engineer at ETRI since 2005. He is now a principal research engineer in the intelligent IoE networking research team, ETRI. He has been an editor and rapporteur of ITU-T SG 11 since 2006. His fields of interest are fog/edge computing, Internet of Things, 5G/IMT2020, deep learning, machine learning, reinforcement learning, GAN and knowledge-converged intelligent service.
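
As a worked check of the start-up delay estimate given just before Section 5.4 (Future works), the relation L = D + S / R can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the function name `startup_delay` is hypothetical, and the two delay terms (0.05 s for D_c2c and 0.10 s for D_e2c) are not restated in that passage; they are back-computed here from the quoted totals of 0.242 s and 0.42 s and should be treated as assumptions rather than measured values.

```python
def startup_delay(delay_s, chunk_size_mb, rate_mbps):
    """Start-up delay for the first chunk: L = D + S / R (seconds)."""
    return delay_s + chunk_size_mb / rate_mbps


# Values quoted in the text.
R = 50.0          # data rate at the bottleneck link, in Mbps
S_O = 16.0        # original chunk size, in Mb
S_C = 0.6 * S_O   # cached chunk size, about 60% of the original, in Mb

# ASSUMPTION: delay terms back-computed from the quoted totals
# (0.242 s and 0.42 s); they are not stated in the passage itself.
D_C2C = 0.05      # delay term used with SACS, in seconds
D_E2C = 0.10      # delay term of the conventional system, in seconds

l_sacs = startup_delay(D_C2C, S_C, R)
l_origin = startup_delay(D_E2C, S_O, R)

print(f"L_SACS   = {l_sacs:.3f} s")
print(f"L_origin = {l_origin:.3f} s")
```

Under these assumed delay terms, the two results agree with the 0.242 s and 0.42 s quoted in the text. Other cached chunk sizes can be explored by changing `S_C`.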

