
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TMM.2016.2612760, IEEE Transactions on Multimedia

Towards Encrypted Cloud Media Center with Secure Deduplication
Yifeng Zheng, Xingliang Yuan, Xinyu Wang, Jinghua Jiang, Cong Wang, and Xiaolin Gui

Abstract: The explosive growth of multimedia content, especially videos, is pushing forward the paradigm of cloud-based media hosting today. However, the wide attack surface of the public cloud and the growing security awareness in society are both calling for data encryption before outsourcing to the cloud. How to preserve all the service benefits of a cloud media center when the videos are encrypted remains to be fully explored. In this paper, we present a secure system architecture design as our initial effort in this direction, which bridges the advancements of video coding techniques and secure deduplication. Our design equips the cloud with the crucial deduplication functionality to eliminate the extra storage and bandwidth cost that would otherwise be incurred by hosting encrypted videos from different entities. The design is also carefully tailored to the scalable video coding (SVC) techniques to support heterogeneous networks and devices for high-quality adaptive video dissemination. We show fully functional system implementations with a structure-aware encryption design and structure-aware deduplication strategies that are both completely compliant with the video format in SVC. Extensive security analysis and experiments via our prototype deployed on the Azure cloud platform show the practicality of the design. Our work can also be easily extended to support other media applications that employ media files with scalable structures.

Index Terms: Cloud media center, secure deduplication, scalable video coding, layer-level deduplication.

Y. Zheng, X. Yuan, X. Wang, J. Jiang, and C. Wang are with the Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong, and J. Jiang is also with the Department of Computer Science and Technology, Xi’an Jiaotong University, Xi’an, China (e-mail: yifeng.zheng@my.cityu.edu.hk, xinglyuan3-c@my.cityu.edu.hk, xinywang@cityu.edu.hk, jinghua2-c@my.cityu.edu.hk, congwang@cityu.edu.hk).
X. Gui is with the Department of Computer Science and Technology, Xi’an Jiaotong University, Xi’an, China (e-mail: xlgui@mail.xjtu.edu.cn).
A preliminary version [1] of this paper was presented at the 10th ACM Symposium on Information, Computer and Communications Security (ASIACCS’15).

I. INTRODUCTION

The explosive growth of multimedia content, especially videos, is pushing forward the paradigm of cloud-based media hosting today. Following this trend, many emerging media applications, such as media live streaming [2], media coding [3], media transcoding [4], and media contrast enhancement [5], are being increasingly deployed at the cloud for the well-understood service benefits [6].

While leveraging the cloud media center is quite promising, the wide attack surface of the public cloud and the growing security awareness in society are both calling for data encryption before outsourcing to the cloud. On the one hand, the public cloud might be vulnerable to security breaches, and unauthorized data disclosure incidents have occurred from time to time in recent years [7], [8]. On the other hand, semantically rich media data such as videos can easily reveal content-sensitive information. For guaranteed confidentiality, data encryption is considered by many as the only viable approach to adopt when building privacy-assured cloud-based applications [9].

Despite being quite effective in addressing the security concerns, directly applying data encryption to multimedia data would invalidate many benefits of deploying cloud-based media applications. Accordingly, there have been recent endeavors in the literature to investigate how to enable the cloud to support various desirable functionalities over encrypted multimedia data, such as encrypted feature extraction [10], [11], encrypted watermark detection [12], encrypted scalable sharing [13], [14], and encrypted social discovery [15]. How to preserve all the service benefits of a cloud media center when the videos are encrypted remains to be fully explored.

In this work, we show a secure system design along this direction, which aims to bring together the advancements of video coding techniques and secure deduplication. We target the crucial deduplication functionality at the cloud, which can eliminate the burdensome storage and bandwidth overhead when storing encrypted videos from different entities. Our design is also tailored to the scalable video coding (SVC) techniques from the very beginning, and supports ubiquitous adaptive video dissemination in the context of heterogeneous networks and devices.

Specifically, for deduplication over encrypted data, message-locked encryption (MLE) [16] is known as the state-of-the-art approach. It generally uses keys deterministically derived from the data (e.g., the hash value) to generate tags for duplicate checking in the encrypted domain. But directly applying MLE over videos would not be suitable, as MLE is known to be vulnerable to off-line brute-force guessing attacks when the target plaintext is drawn from a small space or is otherwise predictable [17]. In video applications, popular videos, trending searches, and near-duplicate videos might all fall into this predictable category and could be an easy breach point for such off-line guessing attacks, threatening the video confidentiality guarantee. Besides, for proper video dissemination, the encrypted deduplication design must also prevent malicious users from illegitimately accessing unauthorized videos by simply using the checking tags [18].

We propose a non-trivial secure deduplication framework that addresses the above problems and suits the needs of cloud-based video applications. Specifically, it supports secure deduplication with strong video protection

1520-9210 (c) 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

against malicious users and an untrusted cloud. Building on top of recent advancements in secure deduplication, it supports secure deduplication with resistance to bounded data leakage and with defense against off-line brute-force attacks over predictable videos. Meanwhile, we provide designs for our secure framework in both the centralized and decentralized settings, where the decentralized setting provides stronger security than the centralized one. To the best of our knowledge, no prior work enables an encrypted cloud media center with such comprehensive protection.

Under this encrypted framework, we then consider how to meet the fast-growing demand for adaptively disseminating videos to heterogeneous networks and devices, such as PCs, smart mobile devices, and smart TVs [19]. One direct approach is to store multiple encrypted versions of the same video content at the cloud. However, this incurs a considerable amount of storage and bandwidth overhead [20], increasing the capital cost of using cloud services. To mitigate this burden while preserving the adaptive delivery functionality, we resort to the SVC techniques [21]. With its special structure of layers, including one base layer and several enhancement layers, SVC enables multiple versions of the same video content to be contained in a single video file, which can greatly improve the storage efficiency and dissemination scalability [22]. In light of these benefits, we carefully tailor our secure deduplication design to be compatible with the inherent characteristics of SVC videos. The proposed structure-aware layer-level deduplication strategies enable encrypted SVC video deduplication while efficiently supporting adaptive video delivery.

Aiming for a fully functional system implementation, we also present a structure-aware encryption mechanism for SVC videos, similar to the works in [23], [24], with further optimizations on the storage part to support efficient video retrieval and dissemination. The structure-aware encryption mechanism and the structure-aware deduplication strategies are both completely compliant with the video format in SVC. Thorough security analysis shows that our system design achieves strong protection of the video confidentiality. We conduct experiments through an end-to-end prototype implementation deployed on Azure, with about 17,000 lines of code. Various performance measures justify the effectiveness and efficiency of our system. To cover a wide range of encrypted cloud media applications, we also show how to extend our work to support other media files that inherently have scalable structures.

The rest of this paper is organized as follows. Section II describes the related work. Section III presents our problem formulation. Section IV presents the preliminaries. Section V formulates the general system framework for secure deduplication. Section VI provides the construction of adaptive video delivery with structure-aware secure deduplication. Section VII presents the security analysis. Section VIII gives the experiment results. Section IX concludes the whole paper.

II. RELATED WORK

A. Secure Deduplication

To support encrypted deduplication, Douceur et al. [25] first propose convergent encryption (CE), which encrypts data with a convergent key derived by computing the hash value of the data content. CE has been deployed in many systems [26]–[29]. Later, Bellare et al. [16] formalize CE under the notion of message-locked encryption (MLE). Abadi et al. [30] strengthen the security definitions of MLE by considering plaintext distributions that may depend on the public system parameters. Bellare et al. [31] further consider privacy for messages that are both correlated and dependent on the public system parameters, and extend MLE to interactive MLE (iMLE), which is a theoretical construction that mainly relies on extremely complex fully homomorphic encryption. In [32], Li et al. leverage secret sharing and distribute the secret shares of the convergent keys across multiple key management servers, so that users need not manage the keys on their own. In [33], Stanek et al. study secure deduplication that provides differential security for popular and unpopular data. They use CE to encrypt popular data and design a two-layer encryption scheme for the deduplication of encrypted unpopular data.

Although MLE and its variants ([30]–[33]) support cross-user deduplication in the encrypted domain, they are not resistant to off-line brute-force attacks over predictable data [17]. Moreover, they do not consider a strong security model in which bounded data leakage might occur [18], i.e., the data hash might be disclosed.

In order to defend against off-line brute-force attacks in secure destination-based deduplication, Bellare et al. [17] propose to use a key server to obliviously provide message-derived keys for encryption. They also adopt rate-limiting strategies on the key server to mitigate online brute-force attacks in practice. Later, Duan [34] resorts to threshold signatures and extends the framework of [17] to the setting of distributed key servers. In [35], Puzio et al. resort to a server to further encrypt the CE-encrypted data collected from users. Very recently, Liu et al. [36] propose a scheme capable of defending against off-line brute-force attacks without introducing an additional independent server. However, their scheme requires a number of online users to actively assist the cloud in performing the duplicate check and to help transfer encryption keys.

Despite being useful in defending against off-line brute-force attacks, none of these works considers the bounded data leakage setting, in which the data encryption key might be leaked [18]. Very recently, Xu et al. [18] propose a secure source-based deduplication scheme that is resilient to bounded data leakage. However, the scheme does not offer a defense against off-line brute-force attacks over predictable data.

Different from existing work, in this paper we carefully design the encrypted cloud media center with comprehensive protection. It supports secure deduplication with resilience to bounded data leakage and with defense against off-line brute-force attacks. Moreover, our system bridges the gap between adaptive video delivery and structure-aware secure deduplication.

B. SVC Video Security

Our proposed research also relates to the branch of work on SVC video security. In [23], Wei et al. propose a format-compliant SVC encryption scheme, considering the scenario


when SVC videos are delivered through an open network. The scheme achieves format compliance by constructing new network abstraction layer units to substitute the original ones, so that after encryption the video ciphertext still has the standard SVC structure. Deng et al. [37] propose an efficient block-based SVC encryption scheme. Their target scenario is that a pay TV broadcaster wants to offer all users the base layer of the broadcast program, but requires that the enhancement layers be accessible only to authorized users. Therefore, the proposed scheme leaves the base layer in cleartext and adopts secure pseudorandom permutations on macroblocks and subblocks to encrypt the enhancement layers. In [13], Wu et al. investigate scalable access control on SVC videos in cloud computing. They propose a new scalable access control scheme based on ciphertext-policy attribute-based encryption (CP-ABE), which can embed multiple messages into one ciphertext. The scheme enables users with different privileges to derive the corresponding layer keys and, in turn, the corresponding layers. Later, Ma et al. [14] propose a two-dimensional scalable access control scheme for SVC. Unlike [13], which uses hash chains to generate access keys, the scheme adopts a CP-ABE based approach so as to resist user collusion. In [38], Wei et al. present a hybrid authentication scheme for SVC videos. The scheme uses a message authentication code (MAC) to authenticate the base layer, and verifies the enhancement layers via extracted video features. Different from the above work, in this paper we investigate structure-aware secure deduplication over encrypted SVC videos.

Portions of the work presented in this paper have previously appeared as an extended abstract in [1]. We have substantially revised the article and improved many technical details compared to [1]. The primary improvements are as follows. Firstly, we provide a new Section V-C to extend the proposed system framework to the decentralized setting of multiple agency servers, enhancing the security and reliability. The security of this new design is analyzed in detail in Section VII. We also add Section V-D to show how our system framework can be adapted to meet other security notions in secure deduplication. Secondly, we add Section VI-C to generalize the construction of secure SVC video deduplication, and also discuss structure-aware secure deduplication over other scalable media in a new Section VI-D. We also elaborate on the encryption for both the SVC video header and content in Section VIII-A. Thirdly, we provide a security evaluation in a new Section VIII-D to measure the effectiveness of rate limiting in slowing down online brute-force attacks. Finally, we redo all the experiments and extend the performance evaluation.

III. PROBLEM STATEMENT

A. System Architecture

We consider an encrypted cloud video hosting service involving three different entities, as illustrated in Fig. 1: the cloud media center (abbr. cloud), the user, and the agency server. Cloud serves as a video hosting platform that stores encrypted videos outsourced by users. It enforces deduplication so as to eliminate the storage and bandwidth redundancy, and is required to adaptively disseminate the encrypted videos to heterogeneous devices and networks. After outsourcing the encrypted videos, the user may delete the local copies and later access her own videos at cloud. The agency server, hosted by a third party, helps our system defend against off-line brute-force attacks. In practice, it can be hosted by an independent economical cloud service provider whose services are getting cheaper [39], [40], or be a gateway server located within the boundary of a large corporate customer of the cloud video hosting service [17]. Specifically, the agency server provides assistance to support secure deduplication in a controllable fashion, and helps users properly encrypt videos in a way that enables secure deduplication. We note that the multi-server model has been commonly adopted in the literature to facilitate various security-aware cloud applications [11], [41]–[44].

Fig. 1. Illustration of our system architecture. (Figure labels: Encrypted Cloud Media Center; Agency Server; Encrypted Deduplication; Adaptive Delivery; Defense against Off-line Attacks; User Devices.)

B. Threat Model

Our security goal is to provide strong protection for the video confidentiality. In our system, two types of adversaries are considered, i.e., the external adversary and the internal adversary.

The external adversary may refer to a user who might obtain some knowledge of a video (e.g., a hash value) via some public channel [18], [32], and attempt to cheat cloud about the video ownership. For example, file hashes are widely used over the Internet for integrity verification of downloaded files, and they are not really meant to be secret [45]. We assume that the external adversary will not upload a fake video to compromise the integrity of other users’ videos. Video tampering detection is not the focus of our work, and it can be handled by various orthogonal mechanisms such as proof of storage [46].

The internal adversary is honest-but-curious, and may refer to cloud or the agency server. First, cloud honestly executes the designated deduplication protocols, yet intends to extract the video content underlying users’ encrypted videos. In particular, cloud may try to launch off-line brute-force attacks over target predictable videos so as to recover the video contents. Second, the third-party agency server honestly executes the assigned functions, but also tries to infer useful information about users’ videos. We will consider the settings of a single agency server and of decentralized agency servers, respectively. In the former case, it is assumed that there is no collusion between cloud and the agency server. In the latter case, cloud is allowed to corrupt a certain number of agency


servers. Note that in this paper, we do not consider the case where cloud modifies or deletes users’ videos.

IV. PRELIMINARIES

A. Oblivious Pseudorandom Function

An oblivious pseudorandom function (OPRF) protocol enables two parties, say A holding an input x and B holding a secret key sk, to jointly and securely compute a pseudorandom function (PRF) $f_{sk}(x)$. The protocol is executed obliviously in the sense that A only learns the output value, while B learns nothing from the process [47]. In our system, the user interacts with the agency server via an OPRF protocol similar to [17], which is built from blind RSA signatures [48]. Let $(N, e)$ and $(N, d)$ be the agency server’s public key and secret key, respectively, as in the RSA cryptosystem. Let $H : \{0,1\}^* \rightarrow \mathbb{Z}_N^*$ and $G : \mathbb{Z}_N^* \rightarrow \{0,1\}^{\lambda}$ be two hash functions. Algorithm 1 shows the RSA-OPRF protocol. Given a message M (the PRF input), the user produces a blinded hash x of the message and sends it to the agency server. The agency server signs x with its secret key d (the PRF key) and returns the signature y to the user, who then removes the blinding to derive the actual signature and computes the hash of the signature, i.e., $G(H(M)^d \bmod N)$, as the PRF output.

Algorithm 1 The RSA-OPRF protocol
Input: Message $M$; secret key of the agency server $d$.
Output: PRF result $G(z)$.
User: // Blind message
1: $\gamma \xleftarrow{\$} \mathbb{Z}_N^*$;
2: $h \leftarrow H(M)$;
3: $x \leftarrow h \cdot \gamma^{e} \bmod N$;
4: Send $x$ to the agency server.
Agency server: // Sign
5: $y \leftarrow x^{d} \bmod N$;
6: Send $y$ to the user.
User: // Produce the PRF output
7: $z \leftarrow y \cdot \gamma^{-1} \bmod N$;
8: Return $G(z)$.

B. Scalable Video Coding

SVC enables multiple representations of the same video content to be contained in a single video file. An SVC video consists of one base layer and several enhancement layers. The base layer renders the basic visual quality, while the enhancement layers can enhance the basic quality by supplementing the base layer along different scalability dimensions such as resolution, which we are particularly interested in as the first instantiation. Note that during the decoding of an SVC video, if a higher layer exists, a lower layer must be present, but not the other way around. In other words, if the SVC layers are discarded from the highest one downwards, the remaining layers are still decodable for visual rendering. Thus, using a single SVC video, a user is able to adaptively enjoy different representations of the same video content by using different numbers of layers. Besides, SVC eliminates the redundancy between multiple representations of the same video content via various inter-layer prediction techniques [49]. In short, in terms of storage efficiency and dissemination scalability, it is beneficial to store SVC videos at the cloud media center to achieve adaptive video dissemination. Without loss of generality, we denote an SVC video $SV$ with $l$ layers as $SV = (m_1, m_2, \ldots, m_l)$, where $m_1$ is the base layer and $m_i$ is the $(i-1)$th enhancement layer for $i \in [2, l]$ [13]. Table I gives the description of the notations used in this paper.

V. SECURE SYSTEM FRAMEWORK FOR THE ENCRYPTED CLOUD MEDIA CENTER

In this section, we study secure deduplication in the encrypted cloud media center. In order to strongly protect the video confidentiality while supporting the crucial deduplication functionality, we formulate a secure system framework supporting secure deduplication with resistance to off-line brute-force attacks over predictable videos by cloud, and to ownership cheating attacks by the user. Note that we do not explicitly consider the underlying video structure when designing the framework, and defer structure-aware secure video deduplication to Section VI. Such treatment makes the system framework also suitable for generic data, e.g., textual files and images. We start by describing our design rationale for addressing the threats mentioned in Section III-B. Then, we present the secure system framework in detail. For simplicity of exposition, we consider a single agency server during the framework design. Subsequently, we provide an extension to a decentralized setting of multiple agency servers, enhancing the security and reliability. Finally, we investigate how the system framework can be adapted to meet other security notions in secure deduplication.

A. Design Rationale

Deduplication is crucial for the encrypted cloud media center to eliminate the burdensome storage and bandwidth overhead when storing encrypted videos from different users. Our system targets secure source-based deduplication¹, where the video redundancy is eliminated at the source side. More precisely, the duplicate check is performed before users upload their encrypted videos, so that the transmission of duplicate videos is saved. Our security design focuses on addressing potential security threats by building on top of the recent advancements in secure deduplication [17], [18].

¹ Destination-based deduplication requires all data to be uploaded to cloud for the duplicate check.

First, we consider a strong security model of secure deduplication, i.e., the bounded leakage setting first proposed by Xu et al. in [18], in which a certain amount of deterministically and efficiently extractable information of the plaintext data could be leaked. Under such a model, MLE is not suitable for use in our system, as its encryption key is not leakage resilient. In particular, the key is generated from the data in a deterministic way and might already be leaked before the encryption process in practice [18]. For similar reasons, under such a model, simply using the plaintext video hash as a proxy for the video ownership could also be insecure.

To resist the threats from bounded data leakage, we note that the following treatment inspired by [18] could be used.


First, the duplicate check is achieved at cloud via the hash value $H(V)$ of the video V sent by the user. Second, the key τ for encrypting a video is randomly selected by the user who initially uploads that video. She also hides τ with a one-time message-derived mask computed via a keyed hash function $h_s(\cdot)$, where s is a random string. In this way, even if the video hash value $H(V)$ is leaked, the video is still well protected since τ is randomly generated. Moreover, the mask enables all the subsequent users owning the duplicate video to extract the key τ, and further prove to cloud that they indeed have the video via a proofs-of-ownership (PoW) protocol.

TABLE I
NOTATIONS USED IN THIS PAPER

Notation    Description
V           Video
SV          SVC video
$m_i$       An SVC layer
u           Initial user
u′          Subsequent user
L           No. of layers of an SVC video of the initial user
L′          No. of layers of an SVC video of the subsequent user
$L_c$       No. of layers of an SVC video stored at cloud
α           Message-derived tag
β           Message-derived label
r           Masked key
$H(V)$      Hash value of a video
$k_1$       Signing key for generating message-derived tag
$k_2$       Signing key for generating message-derived label
τ           Video encryption key
$C_τ$       Ciphertext of encryption key
$C_V$       Video ciphertext

The above treatment, however, is not directly suitable for use in the encrypted cloud media center to strongly protect the video confidentiality. Specifically, it is vulnerable to off-line brute-force attacks over predictable videos. More precisely, if such treatment is adopted for secure deduplication, given the video ciphertext $C_V$ and knowing that its underlying video V is from a dictionary $D_V = \{V_1, V_2, \ldots, V_n\}$, cloud is able to recover V by launching the following two kinds of off-line brute-force attacks. In the first attack, cloud can compute the hash of each candidate $V_i$ ($i \in [1, n]$) in $D_V$, and then compare the computed hashes with the received (stored) hash $H(V)$. When two hashes are found equal, the target video V is recovered. In the second attack, cloud may first produce a set of candidate keys $\mu_{set} = \{\mu_1, \mu_2, \ldots, \mu_n\}$ by using each $V_i$ in $D_V$ to unmask the masked random key. Then, cloud can try to use each candidate key $\mu_i$ to decrypt the target video ciphertext, and compare the decryption result with the candidate $V_i$ in the dictionary $D_V$. When a match is found, the target video is recovered.

In order to provide strong video protection in the encrypted cloud media center, we resort to an agency server for assistance in our system (inspired by [17]), maintaining the security strength of the above treatment while avoiding its vulnerability. Specifically, we leverage the agency server to obliviously embed secrets in the video hash value $H(V)$. An OPRF protocol is initiated between the user and the agency server, producing a message-derived tag α and label β, i.e., $\alpha = \mathrm{OPRF}^{RSA}_{k_1}(H(V))$ and $\beta = \mathrm{OPRF}^{RSA}_{k_2}(H(V))$, where $k_1$ and $k_2$ are two different secret keys of the agency server for signing. The tag α, instead of the plaintext hash, is used for the duplicate check at cloud. The label β is embedded in the one-time message-derived mask during the initial upload of a new video, and assists the recovery of the random key τ during the subsequent upload of a duplicate video. We note that through this enhancement, our system can protect predictable videos against the off-line brute-force attack by cloud, and thus strongly protect the video confidentiality, which will be analyzed in detail in Section VII.

B. The Proposed Secure System Framework

Based on the above design intuition, we are now ready to present our secure system framework for the encrypted cloud media center. The service flow of our system framework consists of three phases, i.e., initial upload, subsequent upload, and video retrieval. Let $SE = (KGen, Enc, Dec)$ be a deterministic symmetric encryption scheme with a λ-bit key and $h_K : \{0,1\}^* \rightarrow \{0,1\}^{\lambda}$ be a keyed hash function. Each phase is elaborated as follows:

Initial Upload. Suppose user u is the initial one who uploads the video V. The protocol of initial upload consists of the following steps:
(1) User u runs the RSA-OPRF protocol (see Algorithm 1) with the agency server and derives the message-derived tag α and label β;
(2) User u sends α to cloud for the duplicate check and gets a “non-duplicate” response;
(3) User u then encrypts V with a random key τ to produce the video ciphertext $C_V \leftarrow Enc_{\tau}(V)$, and generates a masked key $r \leftarrow h_s(V \| \beta) \oplus \tau$, where s is a random string. Besides, u encrypts τ under her private key sk, which is pre-generated via KGen, to produce the ciphertext $C_{\tau} \leftarrow Enc_{sk}(\tau)$. Finally, u sends $\{C_V, s, r, C_{\tau}\}$ to cloud;
(4) Cloud computes $H(C_V)$ for later use in ownership verification in the phase of subsequent upload.

Subsequent Upload. Suppose u′ is a subsequent user who also tries to upload V. The protocol of subsequent upload consists of the following steps:
(1) User u′ derives the message-derived tag α and label β via the RSA-OPRF protocol with the agency server;
(2) User u′ sends α to cloud for the duplicate check and gets a “duplicate” response along with $(s, r)$, which indicates the requirement of running a PoW protocol;
(3) User u′ recovers the encryption key τ via $\tau \leftarrow h_s(V \| \beta) \oplus r$. Then u′ encrypts V with τ to produce the ciphertext $C_V$. Finally, u′ computes $H(C_V)$ and sends it to cloud for equality verification;
(4) After successful verification, u′ is admitted as an owner of $C_V$ and encrypts the recovered τ under her private key sk′ to produce the ciphertext $C'_{\tau} \leftarrow Enc_{sk'}(\tau)$, which is then stored at cloud.

Video Retrieval. Suppose now user u wants to access her video V. The protocol of video retrieval consists of the following steps:
(1) User u sends the request for $C_V$ to cloud;


(2) Cloud checks whether $u$ is an owner of $C_V$. If she is, cloud returns $\{C_V, C_\tau\}$ to $u$. Otherwise, cloud rejects the request;
(3) Upon receiving $\{C_V, C_\tau\}$, $u$ uses her private key $sk$ to recover the decryption key $\tau$ from $C_\tau$. Finally, $u$ decrypts $C_V$ with the recovered $\tau$ and derives the video $V$, i.e., $V \leftarrow \mathrm{Dec}_\tau(C_V)$.

C. Decentralized Secure System Framework - Multiple Agency Servers

In the above proposed system framework, we consider a single agency server for the defense against off-line brute-force attacks. Sometimes, from the perspective of security and reliability, one agency server might not be enough. Specifically, a single agency server is not able to provide compromise resilience and fault tolerance. On the one hand, when the single agency server is corrupted, the defense will be invalidated. On the other hand, when the single agency server breaks down, the system is not able to run normally since the user cannot derive $\alpha$ and $\beta$. Therefore, to address the limitations of the centralized setting of a single agency server, we also consider extending the proposed framework to the decentralized setting of multiple agency servers. Note that introducing decentralized agency servers is affordable in practice. As mentioned before, the agency service can be provided by other independent economical cloud service providers, and the multi-server model has been widely used in the literature for secure applications.
One heuristic approach in the decentralized setting can be considered as follows. Each agency server holds its own secret keys. When a user needs to derive $\alpha$ and $\beta$, she runs the RSA-OPRF protocol with each agency server. Then she uses the concatenation of the PRF outputs from all RSA-OPRF runs to derive $\alpha$ and $\beta$, respectively. In this way, the system can provide compromise resilience since the defense line against off-line brute-force attacks is now established by multiple agency servers. However, this approach does not help with the reliability issue. In particular, once an agency server breaks down, the user is not able to derive valid $\alpha$ and $\beta$ any more.
In order to simultaneously improve the system security and reliability, we resort to threshold cryptography. Such treatment also appears in existing work [32], [34].
In the decentralized setting, we replace the underlying RSA signature scheme of the OPRF protocol with a threshold version [50]. Let $n$ be the number of signers and $t < n$ a threshold parameter. A $(t, n)$-threshold signature scheme allows any subset of $t$ signers to produce a valid signature, but prohibits any $t - 1$ or fewer signers from doing so. Recall that the agency server in our system framework plays the role of a signer. It assists in generating the message-derived tag $\alpha$ and label $\beta$, via signing the blinded input. Therefore, applying the $(t, n)$-threshold RSA signature scheme in the decentralized setting of $n$ agency servers enables a user to derive $\alpha$ and $\beta$ from any subset of $t$ agency servers. In this way, if an attacker wants to break the defense line against off-line brute-force attacks, it must corrupt at least $t$ out of $n$ agency servers. Meanwhile, the system no longer relies on a single point. Hence, both the security and reliability are boosted.

Algorithm 2 The Threshold RSA-OPRF protocol
Input: Message: $M$; Secret key of agency server $i$: $s_i$.
Output: PRF output: $z$.
User: // Blind message
1: $\gamma \xleftarrow{\$} \mathbb{Z}_N^*$;
2: $h \leftarrow H(M)$;
3: $x \leftarrow h \cdot \gamma^e \bmod N$;
4: Send $x$ to a subset $S$ of $t$ agency servers.
Agency server $i$: // Sign
5: $y_i \leftarrow x^{2\Delta s_i} \bmod N$; // $\Delta = n!$
6: Send $y_i$ to the user.
User: // Compute the PRF output.
7: $\lambda^S_{0,i} = \Delta \prod_{j \in S \setminus \{i\}} \frac{j}{j - i}$, for each $i \in S$;
8: $w = \prod_{i \in S} y_i^{2\lambda^S_{0,i}}$;
9: $y = w^a x^b$; // $a, b \in \mathbb{Z}$ such that $4\Delta^2 a + eb = 1$
10: $z \leftarrow y \cdot \gamma^{-1} \bmod N$;
11: Ret $G(z)$.

Algorithm 2 illustrates how the threshold RSA-OPRF protocol works in the setting of $n$ agency servers (we assume the setup process of threshold RSA signatures is properly done). Note that here we do not explicitly differentiate the generation of $\alpha$ and $\beta$. Different from Algorithm 1 in the centralized setting, the PRF key $d$ is now split into $n$ secret shares and each secret share $s_i$ is held privately by agency server $i$. The splitting is based on Shamir's secret sharing method so that the PRF key $d$ can be reconstructed from any subset of $t$ shares. The threshold RSA-OPRF protocol consists of the following steps:
(1) Upon the input of a message $M$ (the PRF input), the user produces a blinded hash $x$ of the message $M$, and sends it to a subset $S$ of $t$ agency servers;
(2) Upon receiving the blinded hash, agency server $i \in S$ signs it with the secret key $s_i$, and returns the blind signature share $y_i$ to the user;
(3) After receiving $t$ blind signature shares, the user first computes the value $\lambda^S_{0,i}$ for each $i \in S$, through the standard Lagrange interpolation formula;
(4) The user combines the blind signature shares to produce a final blind signature $y$, from which she further removes the blinding to derive the valid signature $z$;
(5) The user computes the hash of $z$ as the PRF output.

D. Further Investigation
The proposed general secure system framework focuses on defending against the threats posed by data predictability and bounded data leakage. We now show that this general framework can be flexibly adapted to meet other security notions proposed by existing work on secure deduplication. In particular, we investigate how to support the security for lock-dependent messages as defined in [30].
In [30], Abadi et al. consider the case of plaintext distributions that may depend on the public parameters of the schemes in secure deduplication. Such inputs are referred to as lock-dependent messages. They point out that, for lock-dependent messages, using deterministic tags for duplicate check in secure deduplication may not provide the desired security. Specifically, if an adversary is allowed


to specify a distribution of plaintexts, it may use the fact that the tags are deterministic to leak unnecessary information on the messages. For example, the adversary may select a distribution that is concentrated on messages whose tags share a particular property, such as that they all start with a zero bit. To avoid using deterministic tags for duplicate check and achieve the security for lock-dependent messages, they design a randomized tag generation scheme that supports equality testing. Specifically, a tag for the message $M$ for duplicate check is computed as $\varepsilon = (g^r, g^{rF(M)})$, where $g$ is a generator of a group, $r$ is a random number, and $F$ is a hash function. The equality testing is based on the use of a bilinear map.
If the security for lock-dependent data is considered, our general system framework can seamlessly integrate this technique to avoid using deterministic tags. Let $e : G \times G \rightarrow G_T$ be a bilinear map, where $G$ and $G_T$ are groups of prime order $p$, and $G$ is generated by the generator $g$. The bilinear map has the following bilinearity property: $e(g^a, g^b) = e(g, g)^{ab}$, where $a, b \in \mathbb{Z}_p$. Further, let $F : \{0, 1\}^* \rightarrow \mathbb{Z}_p$ be a hash function. We can adopt this technique as follows. After deriving $\alpha$ from the agency server(s), the user produces the tag as $\varepsilon = (g^r, g^{rF(\alpha)})$, which is sent to cloud for duplicate check. Given two tags $\varepsilon_1 = (g^{r_1}, g^{r_1 F(\alpha_1)})$ and $\varepsilon_2 = (g^{r_2}, g^{r_2 F(\alpha_2)})$, cloud tests $e(g^{r_1}, g^{r_2 F(\alpha_2)}) \stackrel{?}{=} e(g^{r_2}, g^{r_1 F(\alpha_1)})$ to check duplicates. By bilinearity, the two sides equal $e(g, g)^{r_1 r_2 F(\alpha_2)}$ and $e(g, g)^{r_1 r_2 F(\alpha_1)}$, so for nonzero $r_1, r_2$ the test passes exactly when $F(\alpha_1) = F(\alpha_2)$.

E. Discussion on Near-duplicate Video Detection
We note that there are some plaintext works (e.g., [51]) that study near-duplicate video detection for storage reduction. However, our system framework does not consider supporting near-duplicate video detection to eliminate encrypted near-duplicate video copies, for the following reasons.
First, our target service setting is different from that of near-duplicate video detection, which usually aims to discover the near-exact video copies among the videos of a content provider like YouTube [51], [52]. Second, the limitation of existing plaintext techniques of near-duplicate video detection prevents our design from eliminating encrypted near-duplicate video copies from different users. The general principle of identifying near-duplicate videos in the plaintext domain is measuring the distance-based similarity of a compact representation of videos called video signatures [53], which inevitably incurs false positives [51], [52]. In particular, two videos that are identified as near-duplicates under certain similarity measures may not be perceptually similar, i.e., not really near-duplicate copies. If we directly eliminated encrypted near-duplicate video copies identified based on existing plaintext techniques, it would cause the loss of user videos, seriously impacting the system service and user experience.

VI. ADAPTIVE VIDEO DELIVERY WITH STRUCTURE-AWARE SECURE DEDUPLICATION
In this section, we illustrate how to bridge the gap between video coding and secure deduplication, enabling the encrypted cloud media center with efficient adaptive dissemination while supporting structure-aware secure deduplication. In particular, we exploit the SVC technique to support efficient adaptive video delivery, and show how to perform effective structure-aware deduplication over encrypted SVC videos. First, we motivate the importance of layer-level deduplication for SVC videos. Then, based on some important observations, we present the detailed construction for secure SVC video deduplication, under the above designed system framework. Note that for simplicity of exposition, we only present the construction in the setting of a single agency server; it can be extended accordingly to the setting of multiple agency servers, as described in Section V-C. Further, we show how to generalize the construction of secure SVC video deduplication. Finally, we discuss structure-aware secure deduplication over other file types with scalable structures.

A. Layer-level Deduplication
The common deduplication strategies, i.e., file-level deduplication and block-level deduplication, are not suitable for SVC videos. Consider a realistic scenario: a user has the base layer and an enhancement layer for a source content, while another user only has the same base layer. Both of them use some cloud data hosting platform with deduplication. At the file level, the two SVC videos are different as their contents are not identical. Thus, by file-level deduplication, the two SVC videos cannot be deduplicated. On the other hand, it is also not a desirable choice to directly split an SVC video into blocks and perform block-level deduplication, because the layers in an SVC video are formatted in a special structure [21], [38]. Hence, applying block-level deduplication may destroy this structure and is not suitable for SVC. To address the above challenges, we propose to exploit the layered nature of SVC videos to enforce structure-aware layer-level deduplication, which treats each SVC layer as a unit for deduplication.

B. Secure SVC Video Deduplication
Before going into the details of our construction of secure SVC video deduplication, we describe two vital observations which facilitate our design. First, we note that the base layer in an SVC video plays a critical role. It serves as the foundation of an SVC video and as the reference basis for higher enhancement layers [21]. This observation indicates that if two SVC videos do not have the same base layer, they can hardly have duplicate layers. It inspires us to utilize the base layer for the duplicate check of a given SVC video. Second, for the same source content, users who have the same base layer may own different numbers of enhancement layers, owing to their heterogeneous devices and network environments. This indicates that we only need to store a single copy of the SVC video with the highest quality (i.e., with the highest number of layers) for adaptive dissemination.
With the above observations, the main idea of secure SVC video deduplication in our system design can be described as follows. To upload an SVC video, a user first sends the message-derived tag $\alpha_1$ of the base layer and the number of layers to cloud for duplicate check. If there is no match for $\alpha_1$ at cloud, it is considered that the user has a new SVC video and all layers should be uploaded. Otherwise, cloud has already stored an SVC version with the same base layer and thus the version of the user may contain a certain number of duplicate layers. In this case, if the user's SVC version has fewer layers


than the one at cloud, she has to run the PoW protocol over all her layers with cloud. Otherwise, the PoW protocol only needs to be run over the duplicate layers, while the additional layers should be uploaded.
We are now ready to present our construction of secure layer-level deduplication over encrypted SVC videos. It also consists of three phases, i.e., initial upload, subsequent upload, and SVC video retrieval. Let $SV = \{m_1, m_2, \ldots, m_l\}$ denote the SVC video to be uploaded, where $l$ is the number of layers. Each phase is elaborated as follows:
Initial Upload. Suppose the initial user $u$ has $L$ layers (i.e., $l = L$) to upload. The protocol of initial upload consists of the following steps:
(1) User $u$ interacts with the agency server to derive the message-derived tag $\alpha_1$ of the base layer and a message-derived label set $\{\beta_i\}_{1 \le i \le L}$ for all layers, where $\alpha_1 = \mathrm{OPRF}^{\mathrm{RSA}}_{k_1}(H(m_1))$ and $\beta_i = \mathrm{OPRF}^{\mathrm{RSA}}_{k_2}(H(m_i))$. Then, the user sends $\alpha_1$ and the number $L$ of layers to cloud;
(2) Cloud performs duplicate check based on $\alpha_1$. As $\alpha_1$ does not exist, cloud returns a "non-duplicate" response to $u$;
(3) User $u$ encrypts each layer in the way specified in the proposed framework. For each layer $m_i$, $u$ produces a layer ciphertext $C_i$, a masked layer key $r_i$ along with a random string $s_i$, and a layer key ciphertext $C_{\tau_i}$, where $\tau_i$ is the layer key. Then $u$ sends $\{C_i, (s_i, r_i), C_{\tau_i}\}_{1 \le i \le L}$ to cloud;
(4) Cloud computes $H(C_i)$ over each layer ciphertext $C_i$ for later use in the PoW protocol.
Subsequent Upload. Suppose the subsequent user $u'$ has $L'$ layers (i.e., $l = L'$) and the already stored SVC version in cloud has $L_c$ layers. The protocol of subsequent upload consists of the following steps:
(1) User $u'$ derives the message-derived tag $\alpha_1$ of the base layer and a message-derived label set $\{\beta_i\}_{1 \le i \le L'}$ for all layers. Then, the user sends $\alpha_1$ and the number $L'$ of layers to cloud;
(2) Cloud first performs duplicate check via $\alpha_1$ and finds a match. It then checks the relation between $L'$ and $L_c$. If $L' \le L_c$, it returns a "duplicate" response along with $\{(s_i, r_i)\}_{1 \le i \le L'}$. Otherwise, it returns a "duplicate" response along with $\{(s_i, r_i)\}_{1 \le i \le L_c}$;
(3) If $L' \le L_c$, user $u'$ runs the PoW protocol over all layers to earn the ownership from cloud. Recall that the layer keys $\{\tau_i\}_{1 \le i \le L'}$ are recovered during the PoW process. Then $u'$ encrypts each $\tau_i$ using her private key $sk'$ and stores $\{C'_{\tau_i}\}_{1 \le i \le L'}$ in cloud;
(4) If $L' > L_c$, for the $L_c$ preceding (duplicate) layers $\{m_i\}_{1 \le i \le L_c}$ of the SVC video, $u'$ runs the PoW protocol over each of them to earn the ownership, and produces the ciphertexts $\{C'_{\tau_i}\}_{1 \le i \le L_c}$ of the recovered layer keys $\{\tau_i\}_{1 \le i \le L_c}$. For each additional layer $m_i$ ($L_c < i \le L'$), $u'$ produces a layer ciphertext $C_i$, a masked layer key $r_i$ along with a random string $s_i$, and a layer key ciphertext $C'_{\tau_i}$, where $\tau_i$ is the layer key. Finally, $u'$ stores $\{C_i, (s_i, r_i)\}_{L_c < i \le L'}$ along with $\{C'_{\tau_i}\}_{1 \le i \le L'}$ in cloud.
Note that when $L' > L_c$, cloud will update $L_c$ to $L'$ only when user $u'$ passes the PoW protocol enforced over all duplicate layers. If the number of layers actually owned by user $u'$ is less than the claimed one, cloud does not update $L_c$ and only marks $u'$ as the owner of the duplicate layers over which she passes the PoW protocol successfully. That is, during the subsequent upload, regardless of the number of layers claimed by a user, she is marked as the owner of a duplicate layer only when she passes the PoW protocol over it.
SVC Video Retrieval. Suppose now user $u$ wants to access her SVC video. The protocol of SVC video retrieval consists of the following steps:
(1) User $u$ sends the request of $l_{req}$ layers for $SV$ to cloud;
(2) Cloud checks whether $u$ is an owner of all the requested layers. If she is, cloud returns the corresponding layer and key ciphertexts $\{C_i, C_{\tau_i}\}_{1 \le i \le l_{req}}$. Otherwise, cloud rejects the request or only returns the ones owned by $u$;
(3) Upon receiving the layer and key ciphertexts, user $u$ uses her private key to recover each layer key $\tau_i$ and then recovers each layer $m_i$.

C. Generalized Secure SVC Video Deduplication
The above construction of secure SVC video deduplication is based on the assumption that the duplicate layers of different users originate from the same source SVC video, considering that duplicates usually come from the same source in practice. Therefore, if a match for the base layer is found, the succeeding layers are considered identical. However, we are also able to relax this assumption to accommodate more complicated cases, by slightly changing the construction and generalizing it.
We now show how to generalize the above construction by describing the main modifications. The main idea is to perform duplicate check not only over the base layer, but also over the enhancement layers. In particular, to generalize the construction, some changes need to be made to the two upload phases, which can be summarized as follows:
(1) To upload an SVC video with $l$ layers, before the user interacts with the cloud, she runs the RSA-OPRF protocol with the agency server to derive a tag set $\{\alpha_i\}_{1 \le i \le l}$ and a label set $\{\beta_i\}_{1 \le i \le l}$. Then the user sends the tag set $\{\alpha_i\}_{1 \le i \le l}$ to cloud for duplicate check;
(2) If the tag $\alpha_1$ of the base layer does not exist at cloud, it is considered that the SVC video does not contain any duplicate layers. Thus, cloud informs the user that all layers should be uploaded;
(3) If the tag $\alpha_1$ of the base layer exists at cloud, it is considered that an SVC version with the same base layer has been stored. In this case, cloud first locates all stored enhancement layers that are related to the matched base layer, and forms a candidate set of duplicate layers, which also includes the base layer. Then, by using the tags $\{\alpha_i\}_{2 \le i \le l}$ of the enhancement layers, cloud checks whether there exist duplicate enhancement layers. In particular, cloud sequentially performs duplicate check from tag $\alpha_2$ to $\alpha_l$, and stops as soon as no match is found for a tested tag;
(4) After the sequential duplicate check, cloud outputs a final set of duplicate layers, over which the user is required to run the PoW protocol to derive the ownership. Note that the user also needs to upload the non-duplicate layers, if she has any.
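The sequential duplicate check of the generalized construction can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the function name `sequential_duplicate_check`, the dictionary `index` (mapping a base-layer tag to the set of enhancement-layer tags stored for that base layer), and the byte-string tags are all hypothetical. The tags are assumed to have been derived beforehand via the RSA-OPRF protocol, and the PoW protocol and encryption are out of scope here.

```python
from typing import Dict, Sequence, Set


def sequential_duplicate_check(
    index: Dict[bytes, Set[bytes]],  # hypothetical cloud-side index: base tag -> stored enhancement tags
    tags: Sequence[bytes],           # tags alpha_1..alpha_l sent by the user, base layer first
) -> int:
    """Return how many leading layers of the upload are duplicates.

    0 means the base-layer tag is unknown, so every layer must be uploaded.
    Otherwise layers 1..k are duplicates (PoW required) and layers k+1..l
    are new (to be encrypted and uploaded).
    """
    if not tags:
        raise ValueError("an SVC video has at least a base layer")

    # Steps (2)/(3): look up the base-layer tag to obtain the candidate set.
    candidates = index.get(tags[0])
    if candidates is None:
        return 0

    duplicates = 1  # the base layer itself is a duplicate
    # Step (3): test alpha_2..alpha_l in order, stopping at the first miss.
    for tag in tags[1:]:
        if tag not in candidates:
            break
        duplicates += 1
    return duplicates


# Example with made-up tags: the cloud stores a base layer and two enhancement layers.
index = {b"a1": {b"a2", b"a3"}}
upload = [b"a1", b"a2", b"x3", b"a3"]
k = sequential_duplicate_check(index, upload)
pow_layers, new_layers = upload[:k], upload[k:]
```

Stopping at the first miss reflects the layered dependency of SVC: once one enhancement layer differs, this sketch treats all higher layers as non-duplicates, even if a later tag happens to be in the candidate set.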


D. Secure Deduplication over Other Scalable Media
Although our proposed structure-aware secure deduplication focuses on encrypted SVC videos, we emphasize that the underlying design can also be flexibly extended to support other media applications that employ media files with scalable structures.
In real-world media applications, files of almost all formats could be exchanged [13]. Specifically, the content of various media file formats, e.g., text, JPEG 2000, H.26x and presentations, can be divided into logical units. Each unit itself is meaningful, and as more units are supplemented, more information can be provided. Such media content can be referred to as scalable media content [13]. The SVC video studied in this paper is one kind of such scalable media, which in turn indicates that the proposed idea/construction for SVC is also able to embrace other scalable media.
Based on our design, we now summarize the core points for performing structure-aware secure deduplication over other scalable media. First, the deduplication design should take into account the underlying structures of scalable media. It should be designed at the logical unit level for scalable media so as to achieve effectiveness, rather than at the file level or block level. Second, the message-derived tags of the basis units should be utilized for duplicate check, if they are the reference units of such scalable media. This can help the storage server shrink the range of duplicate check and quickly locate the duplicates.

VII. SECURITY ANALYSIS
In this section, formal security analysis is provided to show that our system design meets the security goals. In particular, we will show that our system design is able to address the threats from the external adversary (i.e., the user) and internal adversary (i.e., cloud or the agency server), as mentioned in Section III-B. More precisely, our system design is able to address the threats of bounded data leakage and off-line brute-force attacks over predictable videos, respectively.
We first prove that the proposed secure video deduplication scheme, denoted by SVD, is secure in the bounded leakage setting. For simplicity of description, we denote with $(E, D, \langle \mathrm{PoW}_P, \mathrm{PoW}_V \rangle)$ the three components of our SVD scheme, where $E$ is the user-side algorithm of encoding a video $V$, which takes as input a video $V$ and outputs $(\alpha, s, r, C_V, \tau)$; $D$ is the user-side algorithm of decrypting a video ciphertext $C_V$, which takes as input a video ciphertext $C_V$ and the key $\tau$, and outputs $V$; and $\langle \mathrm{PoW}_P, \mathrm{PoW}_V \rangle$ are the prover algorithm and verifier algorithm in the PoW protocol, respectively, where $P$ takes as input $V$ and outputs $x_0 \in \{\tau, \perp\}$, and $V$ takes as input $(\alpha, s, r)$ and outputs $x_1 \in \{\mathrm{Accept}, \mathrm{Reject}\}$ and $x_2 \in \{H(C_V), \perp\}$. Our security proof will follow the security framework in [18].
We now give the security formulation. In particular, the formulation will address the protection of any physical bit of a video from the external adversary user and the honest-but-curious internal adversary cloud. Roughly speaking, a probabilistic polynomial time (PPT) external/internal adversary is not able to learn any new information on any physical bit of a video $V$ from the secure source-based deduplication process beyond the side channel leakage.
The SVD security game $G^{\mathrm{SVD}}_{A}(\eta_0, \eta_1)$ between a PPT adversary $A$ and a challenger w.r.t. the SVD scheme is defined as below, where $\eta_0 > \eta_1 \ge \lambda$. Here, $\eta_0$ denotes the lower bound of min-entropy of the challenged video $V$ at the beginning of the game, and the adversary is allowed to learn at most $(\eta_0 - \eta_1)$ bits of information of video $V$ from the challenger.
Setup. Make the description of $(E, D, \langle \mathrm{PoW}_P, \mathrm{PoW}_V \rangle)$ public, and let $V$ be sampled from any distribution over $\{0, 1\}^B$ with min-entropy $\ge \eta_0$, where the public integer parameter $B \ge \eta_0$ is polynomially bounded in $\lambda$. The challenger sends $H(V)$ to the adversary $A$.
Learning-I. The adversary $A$ can make a Leak-Query to the challenger as follows:
• Leak-Query($F$): This query consists of a PPT-computable function $F$. To respond to this query, the challenger computes $w := F(V)$ and sends $w$ to the adversary, where the bit-length of the output $w$ is required to be smaller than $(\eta_0 - \eta_1)$.
Commit. The adversary $A$ chooses a subset of $f$ indices $i_1, i_2, \cdots, i_f$ from $[1, |V|]$, where $f \ge 1$ and $f + |w| \le \eta_0 - \eta_1$. The challenger finds the sub bitstream $\gamma \in \{0, 1\}^f$ of $V$, such that, for each $j \in [1, f]$, $\gamma[j] = V[i_j]$. The challenger chooses a random bit $b \in \{0, 1\}$ and sets $\gamma_b := \gamma$ and $\gamma_{1-b} \xleftarrow{\$} \{0, 1\}^f$. The challenger sends $(\gamma_0, \gamma_1)$ to the adversary $A$.
Guess-I. Denote with $\mathrm{View}^{\mathrm{Commit}}_{A}$ the view of the adversary $A$. Given $\mathrm{View}^{\mathrm{Commit}}_{A}$ as input, another PPT algorithm called "extractor" $A^*$ outputs a guess $b_{A^*}$ of value $b$.
Learning-II. The adversary $A$ can adaptively make the following queries to the challenger, where concurrent queries are not allowed and each query is described as follows:
• Encode-Query: In response to the Encode-Query, the challenger runs the probabilistic encoding algorithm on $V$ to generate $(C_0, C_V, \tau) := E(V, 1^\lambda)$, where $C_0 = (\alpha, s, r)$. Then, the challenger sends $(C_0, C_V)$ to the adversary. The adversary can make exactly one query of this type.
• Verify-Query: The challenger, running the prover algorithm $\mathrm{PoW}_P$ with input $V$, obtains $(x_0; x_1, x_2) := \langle \mathrm{PoW}_P(V), A \rangle$ via interacting with adversary $A$, which replaces the verifier algorithm $V$. The adversary knows the values of $x_1$ and $x_2$, and can make polynomially many queries of this type.
• Prove-Query: The challenger, running the verifier algorithm $\mathrm{PoW}_V$ with input $C_0$, obtains $(x_0; x_1, x_2) := \langle A, \mathrm{PoW}_V(C_0) \rangle$ via interacting with the adversary $A$, which replaces the prover algorithm $\mathrm{PoW}_P$. The adversary $A$ knows the value of $x_0$, and can make polynomially many queries of this type.
Guess-II. The adversary $A$ outputs a guess $b_A \in \{0, 1\}$ of value $b$.

Definition 1. (Secure SVD). Let $\lambda$ be the security parameter and $\eta_0 > \eta_1 \ge \lambda$. We say an SVD scheme $(E, D, \langle \mathrm{PoW}_P, \mathrm{PoW}_V \rangle)$ is $(\eta_0, \eta_1)$-secure, if for any (internal or external) PPT adversary $A$, there exists some PPT extractor algorithm $A^*$, such that in the security game $G^{\mathrm{SVD}}_{A}(\eta_0, \eta_1)$

$\Pr[b_A = b] \le \Pr[b_{A^*} = b] + \mathrm{negl}(\lambda)$    (1)

The above definition requires $\Pr[b_A = b] \le \Pr[b_{A^*} = b] + \mathrm{negl}(\lambda)$, which means that the adversary $A$ essentially


cannot learn any new information on physical bits of the video $V$ during the Learning-II phase.
Theorem 1. Let $\eta_0 > \eta_1 = 2\lambda + \Omega(\lambda)$, and let the hash function $H$ be a collision-resistant full domain hash function. If the encryption scheme SE is semantically secure and the hash function family $\{h_s(\cdot)\}$ is a universal hash, the SVD scheme is $(\eta_0, \eta_1)$-secure.
Proof. For any PPT adversary $A_{\mathrm{SVD}}$ against the proposed SVD scheme, we show that we can construct a PPT adversary $A_{\mathrm{SE}}$ against the underlying semantically secure symmetric encryption scheme SE.
Construction of $A_{\mathrm{SE}}$: The adversary $A_{\mathrm{SE}}$ is given a ciphertext $C_V = \mathrm{SE.Enc}_\tau(V)$, where the encryption key $\tau$ and the input video $V$ are unknown and $V$ has at least $\eta_0$ bits of min-entropy. The adversary $A_{\mathrm{SE}}$ is allowed to learn any output of $F(V)$ from the oracle $O_V$, where the PPT-computable function $F$ is chosen by $A_{\mathrm{SE}}$.
The adversary $A_{\mathrm{SE}}$ can simulate a security game $G^{\mathrm{Sim}}$ as below, where $A_{\mathrm{SE}}$ plays the role of challenger and $A_{\mathrm{SVD}}$ plays the role of adversary:
Setup. The adversary $A_{\mathrm{SE}}$ learns the hash value $H(V)$ from the oracle $O_V$, and sends $\alpha = \mathrm{OPRF}(k_1, H(V))$ to $A_{\mathrm{SVD}}$.
Learning-I. The adversary $A_{\mathrm{SE}}$ simply forwards the Leak-Query made by $A_{\mathrm{SVD}}$ to the oracle $O_V$ and forwards the response given by the oracle to $A_{\mathrm{SVD}}$.
Commit. The adversary $A_{\mathrm{SE}}$ learns the value of the challenged sub bitstream $\gamma = V[i_1] \| V[i_2] \| \cdots \| V[i_f]$ from the oracle $O_V$ and then exactly follows the rest of the Commit phase in the real game $G^{\mathrm{SVD}}_{A}$.
Guess-I. Denote with $b^{\mathrm{Sim}}_{A^*_{\mathrm{SVD}}} \in \{0, 1\}$ the output of the extractor $A^*_{\mathrm{SVD}}$.
Learning-II. Challenger $A_{\mathrm{SE}}$ answers the queries made by $A^*_{\mathrm{SVD}}$ as follows:
• Encode-Query: To respond to the encode query, the challenger $A_{\mathrm{SE}}$ independently and randomly chooses $\hat{\tau} \xleftarrow{\$} \mathrm{KGen}(1^\lambda)$ and $s_1, s_2 \xleftarrow{\$} \{0, 1\}^\lambda$, and sets $C_{\hat{\tau}} := (s_1, s_2, \alpha)$. It lets $(C_0, C_1) = (C_{\hat{\tau}}, C_V)$, and sends $(C_0, C_1)$ to $A_{\mathrm{SVD}}$. Recall that $A_{\mathrm{SE}}$ is given the ciphertext $C_V$, and $\alpha$ is obtained from the oracle $O_V$ and the OPRF protocol in the Setup phase.
• Verify-Query: The adversary $A_{\mathrm{SE}}$ runs the prover algorithm and $A_{\mathrm{SVD}}$ replaces the verifier algorithm. Denote the message received from $A_{\mathrm{SVD}}$ as $(y_1, y_2)$. If $(y_1, y_2) = (s_1, s_2)$, then send $H(C_V)$ to $A_{\mathrm{SVD}}$; otherwise, send a random value $H \xleftarrow{\$} \{0, 1\}^\lambda$ to $A_{\mathrm{SVD}}$.
• Prove-Query: The adversary $A_{\mathrm{SE}}$ runs $V(C_0)$ to interact with adversary $A^*_{\mathrm{SVD}}$, which replaces the prover algorithm, following the description in game $G^{\mathrm{SVD}}_{A}$ exactly.
Guess-II. The adversary $A_{\mathrm{SVD}}$ outputs a guess $b^{\mathrm{Sim}}_{A_{\mathrm{SVD}}} \in \{0, 1\}$ of $b$. The game $G^{\mathrm{Sim}}$ simulated by $A_{\mathrm{SE}}$ completes.
Finally, $A_{\mathrm{SE}}$ outputs $\gamma_{b^{\mathrm{Sim}}_{A_{\mathrm{SVD}}}} \in \{\gamma_0, \gamma_1\}$, and wins the semantic-security game w.r.t. encryption scheme SE if $\gamma_{b^{\mathrm{Sim}}_{A_{\mathrm{SVD}}}} = \gamma = V[i_1] \| V[i_2] \| \cdots \| V[i_f]$. So far, $A_{\mathrm{SE}}$ has received at most $(\lambda + \eta_0 - \eta_1)$ bits (in terms of length) of message about the unknown video $V$ from the oracle $O_V$. Therefore, according to Lemma 2.2 in [54], after leakage from the oracle, the unknown video $V$ should have at least $(\eta_1 - \lambda) = \lambda + \Omega(\lambda)$ bits of min-entropy.

Claim 1. Suppose that SE is a semantically secure symmetric key encryption scheme and $\{h_s(\cdot)\}$ is a universal hash family. The simulated game $G^{\mathrm{Sim}}$ is computationally indistinguishable from the real game $G^{\mathrm{Real}} = G^{\mathrm{SVD}}_{A_{\mathrm{SVD}}}(\eta_0, \eta_1)$ in the view of adversary $A_{\mathrm{SVD}}$.

Proof. All messages that the adversary $A_{\mathrm{SVD}}$ obtains in game $G^{\mathrm{Real}}$ from the challenger are $(Q^{\mathrm{Real}}, \alpha, s, h_s(V \| \beta) \oplus \tau, C_V)$, where $Q^{\mathrm{Real}} = (w, \gamma)$, $w$ is the output of Leak-Query, and $\gamma$ is computed in the Commit phase. Similarly, the counterpart in game $G^{\mathrm{Sim}}$ is $(Q^{\mathrm{Sim}}, \alpha, s_1, s_2, C_V)$. For the same adversary $A_{\mathrm{SVD}}$ with the same random coins, $Q^{\mathrm{Real}} = Q^{\mathrm{Sim}}$. So for simplicity we just write them as $Q$.
Sample a video $V'$ from $\{0, 1\}^{|V|}$ following the same distribution from which $V$ is sampled. Generate a key $\tau' = \mathrm{SE.KGen}(1^\lambda)$ and encrypt $V'$ with key $\tau'$ to obtain the video ciphertext $C_{V'} = \mathrm{SE.Enc}_{\tau'}(V')$. Let $Y \approx_{ind} Z$ denote that random variables $Y$ and $Z$ are computationally indistinguishable. We have

$(Q, \alpha, s, h_s(V \| \beta) \oplus \tau, C_V)$    (2)
$\approx_{ind} (Q, \alpha, s, h_s(V \| \beta) \oplus \tau, C_{V'})$    (3)
$\approx_{ind} (Q, \alpha, s_1, s_2, C_{V'})$    (4)
$\approx_{ind} (Q, \alpha, s_1, s_2, C_V)$.    (5)

The above equations can be explained as follows. First, Eq (2) $\approx_{ind}$ Eq (3) holds because SE is a ciphertext-indistinguishable symmetric encryption scheme. In particular, given the information $(Q, \alpha, s, h_s(V \| \beta) \oplus \tau)$ about video $V$, the unknown video $V$ still has at least $\Omega(\lambda)$ entropy, hence its encryption $C_V$ is computationally indistinguishable from an encryption $C_{V'}$ of a random video $V' \in \{0, 1\}^{|V|}$ under a random encryption key $\tau'$, where $V'$ is sampled following the same distribution from which $V$ is sampled. Second, Eq (3) $\approx_{ind}$ Eq (4) follows directly from the leftover hash lemma [55], which applies to the universal hash $\{h_s\}$. Note that $C_{V'}$ is independent of the other terms in these two equations. Finally, Eq (4) $\approx_{ind}$ Eq (5) again relies on the ciphertext-indistinguishability property of the encryption scheme SE. Note that $s_1, s_2$ are independent of the other terms in these two equations. This completes the proof for Claim 1.

Claim 2. There exists some PPT extractor $A^*_{\mathrm{SVD}}$, such that $\Pr[b^{\mathrm{Sim}}_{A_{\mathrm{SVD}}} = b^{\mathrm{Sim}}] \le \Pr[b^{\mathrm{Sim}}_{A^*_{\mathrm{SVD}}} = b^{\mathrm{Sim}}] + \mathrm{negl}(\lambda)$.

Proof. During the Learning-II phase of the simulated game $G^{\mathrm{Sim}}$, the challenger $A_{\mathrm{SE}}$ does not make any new queries to $O_V$, and all responses that $A_{\mathrm{SE}}$ provides to $A_{\mathrm{SVD}}$ are computed from randomly sampled values and information that $A_{\mathrm{SVD}}$ has already known before Learning-II (i.e., the hash value $H(V)$), except the ciphertext $C_V$. Straightforwardly, we have

$\Pr[A^{O_V}_{\mathrm{SE}}(C_V, |V|) = \gamma] = \Pr[b^{\mathrm{Sim}}_{A_{\mathrm{SVD}}} = b^{\mathrm{Sim}}]$


As the underlying symmetric encryption scheme SE is semantically secure, there exists a PPT algorithm $C$, such that
$$\Pr[A_{SE}^{O_V}(C_V, |V|) = \gamma] \le \Pr[C^{O_V}(|V|) = \gamma] + negl(\lambda)$$

An extractor $A^*_{SVD}$ can be constructed based on algorithm $C$ as follows. Let $\gamma_C$ be the output of $C$. If $\gamma_C = \gamma_{\hat{b}} \in \{\gamma_0, \gamma_1\}$ for some $\hat{b} \in \{0,1\}$, then $A^*_{SVD}$ outputs $b^{Sim}_{A^*_{SVD}} := \hat{b}$; otherwise $A^*_{SVD}$ outputs a random bit $b^{Sim}_{A^*_{SVD}} \overset{\$}{\leftarrow} \{0,1\}$. We have
$$\begin{aligned}
\Pr[b^{Sim}_{A_{SVD}} = b^{Sim}] &= \Pr[A_{SE}^{O_V}(C_V, |V|) = \gamma] \\
&\le \Pr[C^{O_V}(|V|) = \gamma] + negl(\lambda) \\
&\le \Pr[b^{Sim}_{A^*_{SVD}} = b^{Sim}] + negl(\lambda)
\end{aligned} \tag{6}$$

Hence, there exists some PPT extractor $A^*_{SVD}$, such that $\Pr[b^{Sim}_{A_{SVD}} = b^{Sim}] \le \Pr[b^{Sim}_{A^*_{SVD}} = b^{Sim}] + negl(\lambda)$.

Moreover, it is implied by Claim 1 that
$$|\Pr[b^{Sim}_{A_{SVD}} = b^{Sim}] - \Pr[b^{Real}_{A_{SVD}} = b^{Real}]| \le negl(\lambda) \tag{7}$$
$$|\Pr[b^{Sim}_{A^*_{SVD}} = b^{Sim}] - \Pr[b^{Real}_{A^*_{SVD}} = b^{Real}]| \le negl(\lambda) \tag{8}$$

We have the following by combining Eq (6), Eq (7) and Eq (8):
$$\Pr[b^{Real}_{A_{SVD}} = b^{Real}] \le \Pr[b^{Real}_{A^*_{SVD}} = b^{Real}] + negl(\lambda) \tag{9}$$

Therefore, the proposed secure video deduplication scheme SVD is $(\eta_0, \eta_1)$-secure according to Definition 1. This completes the proof.

In the above security analysis, we show that our proposed secure video deduplication scheme is secure in the bounded leakage setting. We now analyze the security against off-line brute-force attacks over predictable videos.

Theorem 2. For a predictable video that falls in a dictionary $D = \{V_1, V_2, \cdots, V_n\}$, our design prevents cloud from launching off-line brute-force attacks.

Proof. As described in Section V-A, if cloud knows that the plaintext underlying a target video ciphertext is from a relatively small message space (or a dictionary), it can compute the hashes of all candidates in the dictionary. We note that leveraging the agency server in our system design can effectively prevent off-line brute-force attacks.

Firstly, instead of directly exposing the hash H(V) to cloud for duplicate check, we leverage the agency server to assist the generation of the message-derived tag α. Without the private key of the agency server, cloud is not able to produce α for each candidate in the dictionary by itself. Hence, for each trial, it is required to interact with the agency server to obtain α. Secondly, cloud may also try to recover the video encryption key and then obtain the video plaintext, as it owns the video ciphertext $C_V$ and the masked key $\tau \oplus h_s(V \| \beta)$ (along with a random string s). Likewise, without accessing the agency server, cloud is not able to generate the message-derived label β for each candidate in the dictionary. Thus, for each trial, cloud is required to interact with the agency server to get β and unmask the key by XORing a mask derived from β and V. In short, our design effectively prevents off-line brute-force attacks over predictable videos in a controllable fashion.

We now further analyze the security against the agency server in both the centralized and decentralized settings:

The case of single agency server. We also consider that the agency server is semi-trusted. It faithfully assists in the generation of the message-derived tag α and label β via its private keys, but is interested in H(V). Accordingly, our system adopts the RSA-OPRF protocol, so the input H(V) is protected against the agency server.

The case of decentralized agency servers. Recall that in the setting of decentralized agency servers, we resort to threshold cryptography and replace the underlying RSA scheme of the OPRF protocol with a threshold one. During the interaction with different agency servers, the input H(V) is still blinded to each agency server. Therefore, each involved agency server is also not able to learn any information about H(V). Besides, we note that the decentralization of agency servers significantly enhances the security strength of our system framework. In particular, the security of the threshold RSA signature guarantees that the attacker has to corrupt at least a threshold number of agency servers so as to break the defense line against off-line brute-force attacks. This indicates that the defense line becomes much stronger as the attack now demands much more effort, compared with the centralized setting of a single agency server.

It should be noted that even if collusion attacks between agency servers could be possible, the security of our decentralized scheme against off-line brute-force attacks will not be compromised. In particular, the collusion between a threshold number of agency servers will just reveal the signing keys to themselves. As analyzed above, as long as cloud does not have access to the signing keys, it is prevented from launching off-line brute-force attacks. In short, from the security perspective, leveraging multiple agency servers avoids the problem of single point of failure, significantly enhancing the security of our system.

Note that the proposed construction of secure SVC video deduplication is built under the secure deduplication framework. Ownership verification via the PoW protocol is enforced over each duplicate layer, ensuring that only the user who indeed owns the duplicate layers can earn the ownership.

Discussion. With the assistance of the agency server, cloud is prevented from launching off-line brute-force attacks over predictable videos. Hence, if cloud intends to recover a target predictable video, it is forced to mount online brute-force attacks, i.e., each trial is required to access the agency server to obtain the tag α or the label β (the tag-based method can be easier than the label-based one as it only requires tag equality testing). In this case, several proper rate-limiting strategies, e.g., fixed delay and bounded query, can be adopted at the agency server side to effectively slow


down the online brute-force rates [17]. We will show the effectiveness of rate-limiting in the following section.

VIII. EXPERIMENTS

A. SVC Video Encryption

We first illustrate how to perform proper encryption over SVC videos in our system, including the video header and content (layers). Fig. 2 illustrates the underlying structure of an SVC video. At a high level, an SVC video consists of a file header and a number of frames. In the file header, the indispensable information includes sequence parameter set (SPS), subset sequence parameter set (SSPS), and picture parameter set (PPS). The base layer refers to the SPS, while each enhancement layer refers to a corresponding SSPS. And each layer relates to a corresponding PPS [56]. Each parameter set is in the format of a network abstraction layer unit (NALU), which is composed of a NALU header and payload, as shown in Fig. 2 (b). Different kinds of parameter sets can be identified by the Type field in the NALU header, while each parameter set of the same type (i.e., SSPS and PPS) can be identified to a corresponding layer by the NALU payload.

Fig. 2. Structures of SVC and NALU. Note that the SVC extension header is specified for the NALUs of enhancement layers. (a) SVC structure: an SVC file header (SPS, SSPS, ..., PPS, ...) followed by frames 1, 2, ..., where frame i consists of the NALUs of the base layer, enhancement layer 1, and so on. (b) NALU structure: NALU header (F, NRI, Type), SVC extension header (R, I, PRID, N, DID, QID, TID, U, D, O, RR), and payload.

To encrypt the SVC video header, one straightforward way is to treat the parameter sets as a whole and encrypt it. However, such treatment is not scalable as it requires the user to retrieve the whole encrypted header information each time when requesting an SVC video, regardless of the number of layers requested. Instead, we adopt a separate encryption mechanism, i.e., encrypting each parameter set independently. In this way, upon receiving the request of a certain number of layers of an SVC video, cloud can just return the corresponding encrypted parameter sets along with the encrypted video content. Note that the parameter sets are encrypted by the user’s secret key and thus do not get involved in deduplication.

Regarding the SVC video content, each frame at a high level, as shown in Fig. 2 (a), consists of a base layer and several enhancement layers [21]. To encrypt the SVC video content, one straightforward method is to encrypt each layer per frame. However, this method is not directly compatible with the underlying SVC structure [23], [24]. Specifically, in the SVC standard each layer is also divided into its own NALUs. A NALU pair, (Type 14, Type 1) or (Type 14, Type 5), is used together to represent the base layer, while NALUs of Type 20 are used for the enhancement layers, and each of them can be identified by the dependency id (DID) in the SVC extension header of the NALU. If the whole layer is encrypted, a user client requesting an SVC video is not able to perform timely decryption until the whole layer ciphertext is downloaded, which in turn affects the timely decoding for playback.

Instead of performing layer-level encryption for the SVC video content, we adopt the NALU-level encryption mechanism inspired by [23], in which the payload of each NALU is encrypted individually and the header information is left in cleartext. Consequently, the client is able to perform decryption and then decoding as long as one encrypted NALU is received. The processing of each layer can be executed in a pipelined fashion at the user client. Regarding the implementation, the user client extracts the NALUs belonging to the same layer per frame, encrypts them one by one, and uploads them to cloud.

B. Implementation

We implement our system prototype with roughly 7,000 lines of C++ code and 10,000 lines of Java code. It is deployed on Azure D4 instances. For cryptographic primitives, we use the GMP library⁴ and the OpenSSL library⁵ to implement blind signature, symmetric encryption (AES/CBC-256), and full-domain hash function (SHA-256). For video coding, we collect videos from the VIRAT [57] and DASH [58] benchmarks, and encode them into SVC videos with the JSVM software⁶. The decoding of SVC videos is done with the Open SVC decoder [59]. Besides, we integrate the Open SVC decoder library into an open-source video player, MPlayer⁷, and re-compile it to play the SVC videos.

⁴ The GNU Multiple Precision Arithmetic Library: https://gmplib.org
⁵ OpenSSL Project: http://www.openssl.org
⁶ Joint Video Team: SVC reference software (JSVM software), 2011.
⁷ MPlayer: http://www.mplayerhq.hu/design7/dload.html

Entity Implementation. The implementation of each entity is presented as follows. (1) User Client: it is developed in C++ and processes user’s requests. To generate the tag α and label β, the user client communicates with the agency server. We also implement a basic access control mechanism, which includes user login and register operations; (2) Agency server: it is also developed in C++. The agency server will sign the blinded input received from the user client and return the result; (3) Application server: it is implemented in Java and has three functions. Firstly, it handles the request from the user client for duplicate check, queries the storage server, and then returns the query result to the user client. Secondly, it processes the PoW protocol. Lastly, it verifies the access permission upon receiving the request of video download from the user client, and returns the corresponding SVC videos; (4) Storage server:


it stores SVC video ciphertexts along with tags for duplicate check and ciphertext hashes for PoW, and the user profiles with encrypted private keys and video ownerships. Note that we use Thrift with cross-language Remote Procedure Call (RPC)⁸ to create network services between the entities.

⁸ Apache Thrift: http://thrift.apache.org

Storage Optimization. In our implementation, the NALUs belonging to the same layer are grouped as a layer block at the storage server. During the download of SVC videos, the storage server needs to quickly locate the corresponding layer blocks and their NALUs. In the SVC standard [21], a start code prefix 0x00000001 is added between each NALU for separation. Regarding our NALU-based encryption, using the start code to fetch NALUs may not meet the performance requirements for fast NALU fetching during video retrieval.

To address this issue, we utilize the Key-Length-Value (KLV) encoding standard⁹ to package the NALUs, optimizing the storage of SVC. In particular, each NALU is encoded into a Key-Length-Value triplet, where Key identifies the NALU (its frame ID), Length specifies the NALU’s length, and Value is the NALU itself. Consequently, the storage server can efficiently distinguish each encrypted NALU. Upon receiving the request of SVC video download, the storage server can efficiently fetch the encrypted NALUs with the help of Length, and combine those with the same frame ID.

⁹ BT.1563: Data encoding protocol using key-length-value: http://www.itu.int/rec/R-REC-BT.1563-1-201103-I/en

C. Visual Experience

Fig. 3 displays the different qualities for a given SVC video after the decryption and decoding at the user client. It is observed that the results are consistent with the inherent characteristics of SVC. In particular, the more layers the video has, the higher the video quality is. Therefore, our security design does not affect the visual scalability of SVC.

Fig. 3. Visual experience of different qualities for a given SVC video after the decryption and decoding at the user client. Panels: (a) 144P, (b) 240P, (c) 1080P.

D. Security Evaluation

The Case of Single Agency Server. In our system, we first consider an agency server to enforce that brute-force attacks over predictable videos can only be launched online. We now show that with some proper rate-limiting strategy, online brute-force attacks can be slowed down effectively. In particular, we adopt one of the common rate-limiting strategies called fixed delay [17], which works by introducing an artificial delay t_D before the agency server responds to a query. Since the process of accessing the agency server is indispensable for each trial of brute-force attacks, the increased time cost of tag generation slows down the attack.

To demonstrate the effectiveness, we simulate the brute-force attack and estimate the required time. As assumed, the attacker can narrow the video spaces via the sizes of SVC layers, the SVC structure, etc. Thus, the number of trials of brute-force attacks can be greatly reduced. We use the off-line brute-force attack as our baseline, and evaluate its costs under two situations, i.e., introducing the agency server with and without the fixed delay strategy. We note that the time cost of off-line brute-force attacks can be just considered as the SHA-256 computing time. We use a single-layer SVC video with a size of 100 KB as the attacker’s target, and evaluate the attack under three different message spaces, i.e., 2^15, 2^20 and 2^25. Table II shows the results, where the “None” column corresponds to no rate limiting. We can observe that by just introducing a 1 second delay for response at the agency server, the attack can be slowed down significantly. For example, under the 2^20 message space, the off-line attack only needs roughly 18 minutes to succeed, while using the fixed delay, more than 12 days are required. Overall, the fixed-delay rate limiting slows down brute-force attacks by roughly 975 times compared with the off-line attack.

TABLE II
EFFECTIVENESS EVALUATION OF FIXED-DELAY (1 SECOND) RATE LIMITING. THE NONE COLUMN CORRESPONDS TO NO RATE LIMITING.

Space    Off-line t1 (s)    None t2 (s)    Fixed delay t3 (s)
2^15     33.76              167.28         32934.28
2^20     1080.61            5352.98        1053927.98
2^25     34579.52           171295.38      33725726.38

The Case of Decentralized Agency Servers. We now further analyze the effectiveness of rate limiting in the setting of decentralized agency servers. Recall that in such a setting, the tag α can be successfully produced with the assistance of any subset of t out of n agency servers. Hence, the online brute-force rate will be increased by a factor of n/t. We remark that compared with a single agency server, the effectiveness of rate limiting in the decentralized setting is lowered but controllable, via properly selecting the parameters of the (t, n)-threshold signature scheme. On the other hand, recall that the decentralization of agency servers avoids single point of failure from the perspective of both security and reliability.

E. Performance Evaluation

Storage Consumption. We measure the increase in storage at cloud to support the proposed secure deduplication design. To store an SVC video with l layers, the storage overhead includes the tag α_1 (32 bytes) for duplicate check, l masked keys r with seeds s (64 bytes each), l owner key ciphertexts (32 bytes each) and l hashes of the layer ciphertexts (32 bytes each). Recall that the message-derived tag of the base layer is used for duplicate check, as two SVC videos which do not have the same base layer can hardly have duplicate layers [38]. In total, the storage overhead for an SVC video with l layers is 32 + 128l bytes, which is linear in the number of layers and independent of the video size. Fig. 4 shows the relation between the storage overhead and the sizes of SVC


videos with different number of layers. It is observed that our security design incurs little storage overhead, compared with the video sizes under different number of layers. For example, given a SVC video with 1MB size and 7 layers, the storage overhead is only 928 bytes, just 0.09% of the video size. And for a fixed number of layers, the ratio quickly drops and becomes negligible as the video size gets larger.

Fig. 4. Ratio of the storage overhead to the SVC video size. (Plot: storage overhead ratio on a logarithmic axis from 10^-7 to 10^-3 versus SVC video size from 0 to 200 MB, with curves for 1, 3, 5 and 7 layers.)

Fig. 5. Computation costs for initial upload of different number of layers of a SVC video with size 70MB. There are totally three layers. (Plot: time in seconds versus number of layers, 1 to 3, for the components tGen, vEnc and rGen.)
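The storage-overhead figures quoted above can be checked with a few lines of arithmetic. The following is a minimal sketch, not part of the authors' prototype: the constant and function names are hypothetical, the byte counts are the ones listed in the Storage Consumption paragraph, and 1 MB is assumed to mean 10^6 bytes.

```python
# Per-item sizes in bytes, taken from the Storage Consumption paragraph.
TAG_BYTES = 32            # message-derived tag of the base layer (one per video)
MASKED_KEY_BYTES = 64     # masked key r together with its seed s (one per layer)
OWNER_KEY_CT_BYTES = 32   # owner key ciphertext (one per layer)
LAYER_HASH_BYTES = 32     # hash of the layer ciphertext (one per layer)


def storage_overhead_bytes(num_layers: int) -> int:
    """Metadata overhead for a video with l layers: 32 + 128 * l bytes."""
    if num_layers < 1:
        raise ValueError("a video has at least a base layer")
    per_layer = MASKED_KEY_BYTES + OWNER_KEY_CT_BYTES + LAYER_HASH_BYTES
    return TAG_BYTES + per_layer * num_layers


def overhead_ratio(num_layers: int, video_size_bytes: int) -> float:
    """Overhead as a fraction of the video size (the quantity plotted in Fig. 4)."""
    return storage_overhead_bytes(num_layers) / video_size_bytes


if __name__ == "__main__":
    one_mb = 10**6  # assumption: 1 MB = 10^6 bytes
    print(storage_overhead_bytes(7))                   # 928
    print(f"{100 * overhead_ratio(7, one_mb):.2f}%")   # 0.09%
```

Because the overhead depends only on the number of layers, the ratio falls in inverse proportion to the video size, which matches the trend described for Fig. 4.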

Computation Performance. We measure the performance of different computation components during the initial upload. Recall that there are four computation components: (1) generation of tag α_1 via RSA-OPRF; (2) encryption of SVC video; (3) generation of the masked key r (label β via RSA-OPRF included); (4) encryption of key τ. We note that the time of key encryption is negligible compared to others, so we focus on the first three ones, which are denoted as tGen, vEnc and rGen, respectively. Besides, vEnc is regarded as the necessary operation to protect the video confidentiality. Thus, the computation overhead for secure deduplication lies in the components tGen and rGen. We evaluate the computation performance for both the centralized and decentralized settings.

Fig. 6. Computation costs for initial upload of SVC videos with the same number (three) of layers but different sizes. (Plot: time in seconds versus SVC video size, 70 to 669 MB, for the components tGen, vEnc and rGen.)
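To make the rGen component more concrete, the sketch below masks a layer key τ as r = τ ⊕ h_s(V‖β), the form of the masked key used in the proof of Theorem 2, and unmasks it again. It is an illustration under stated assumptions rather than the authors' implementation: HMAC-SHA256 keyed with the seed s stands in for the universal hash h_s, the label β is passed in as opaque bytes (its generation via RSA-OPRF is not shown), and all function names are hypothetical.

```python
import hashlib
import hmac
import os

KEY_BYTES = 32  # assumed key length: a 256-bit symmetric key


def _mask(seed: bytes, video: bytes, label: bytes) -> bytes:
    # Stand-in for h_s(V || beta): HMAC-SHA256 keyed with the random seed s.
    return hmac.new(seed, video + label, hashlib.sha256).digest()


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def mask_key(tau: bytes, video: bytes, label: bytes) -> tuple[bytes, bytes]:
    """Return (s, r), where r = tau XOR h_s(V || beta) for a fresh seed s."""
    if len(tau) != KEY_BYTES:
        raise ValueError("tau must be a 32-byte key")
    seed = os.urandom(KEY_BYTES)
    return seed, _xor(tau, _mask(seed, video, label))


def unmask_key(seed: bytes, masked: bytes, video: bytes, label: bytes) -> bytes:
    """Recover tau; only a holder of the same V and beta gets the right key."""
    return _xor(masked, _mask(seed, video, label))


if __name__ == "__main__":
    tau = os.urandom(KEY_BYTES)
    layer, beta = b"layer bytes", b"label from the agency server"
    s, r = mask_key(tau, layer, beta)
    assert unmask_key(s, r, layer, beta) == tau
    assert unmask_key(s, r, b"a different layer", beta) != tau
```

With these sizes, the seed and the masked key together occupy 64 bytes per layer, in line with the per-layer storage figure given for the masked key.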


The case of single agency server: Fig. 5 shows the time costs of the three components tGen, vEnc and rGen, when different number of layers of a 70MB SVC video is uploaded. The network delays in the RSA-OPRF protocol are not considered. It is shown that the time consumed by vEnc and rGen linearly increases with the number of layers, as the relevant operations should be conducted for each layer. In contrast, the time consumed by tGen remains constant since it is only related to the base layer. Fig. 6 further illustrates the time costs of the three components tGen, vEnc and rGen for SVC videos with different sizes but with the same number of layers. When the video sizes increase, the layer sizes increase as well, so each computation component will take more time to complete. For SVC video sizes ranging from 70MB to 669MB, on average, the total computation costs of the three components vary from 1.047 seconds to 9.6207 seconds.

The case of decentralized agency servers: The decentralization of agency servers only affects the tGen and rGen components, since they proceed under the agency server’s assistance. Recall that the threshold RSA-OPRF protocol requires the user to perform the combination of blind signature shares, which is the extra computation compared with the non-threshold one. Hence, we focus on measuring the time consumed by the user to perform blind signature share combination. Fig. 7 shows the time to combine the shares of blind signatures, for varying the threshold t when there are 16 agency servers in total. It is observed that time grows linearly with t. Moreover, the cost is only on the order of milliseconds.

Fig. 7. Time to combine the shares of blind signatures for varying the threshold t out of totally sixteen agency servers. (Plot: time in milliseconds versus threshold t.)

Time Savings from Secure Deduplication. Similar to prior work on secure source-based deduplication [18], we also measure the time savings provided by our customized secure deduplication design, compared with a basic policy Enc+No-Dedup in which the encrypted duplicate SVC videos (layers) are always sent to cloud. The running time of that policy includes the encryption time and the network transfer time, while in our system it includes the time of all customized computation and network interactions for a user to obtain ownership of duplicates. We consider two network scenarios, i.e., a fast network (20Mbps) and an extremely fast network (100Mbps) [60]. Note that for simplicity of presentation, we only evaluate the case of single agency server. As illustrated above, the additional time costs in the decentralized setting would be trivial.

Fig. 8 compares the running time between our system and the Enc+No-Dedup policy over different number of duplicate


layers of a 70MB SVC video. It is observed that in both network scenarios, our system always consumes much less time than the compared policy. The reason is that our system enforces source-based deduplication, and thus network transfer of duplicate layers is saved. Although some specific security mechanisms such as the RSA-OPRF and PoW protocols are involved in our system, their impact on the time savings rendered by deduplication is modest. In Fig. 9, we further compare the running time between our system and the policy over SVC videos of sizes from 70MB to 669MB, but with the same number of duplicate layers. As shown, for the 70MB SVC video in the 20Mbps network scenario, the compared policy takes 28.9431 seconds, while the time for our system is only 1.7065 seconds; in the 100Mbps network scenario, the compared policy takes 6.0296 seconds, while the time for our system is only 1.7061 seconds. On average, the time savings via our secure system can achieve about 94.37% in the 20Mbps network and about 72.92% in the 100Mbps network, respectively. Note that although the time savings naturally decrease as the network speed increases, storage savings still matter to cloud and are independent of the network conditions.

Fig. 8. Running time of our system and Enc+No-Dedup policy, over different number of duplicate layers for a 70MB SVC video with totally three layers. (Plot: time in seconds versus number of layers, 1 to 3; series: our system and Enc+No-Dedup, each at 20Mbps and 100Mbps.)

Fig. 9. Running time of our system and Enc+No-Dedup setting, over SVC videos in different sizes but with the same number (three) of duplicate layers. (Plot: time in seconds versus SVC video size, 70 to 669 MB; series: our system and Enc+No-Dedup, each at 20Mbps and 100Mbps.)

IX. CONCLUSION

In this paper, we have designed and implemented an encrypted cloud media center hosting encrypted SVC videos. We have first formulated a secure system framework supporting secure deduplication while strongly protecting the video confidentiality. It is resistant to the adversaries in the bounded leakage setting, and the adversaries launching brute-force attacks over predictable videos, respectively. We have then leveraged the layered structure of SVC and proposed a customized design of layer-level secure SVC video deduplication. We have provided thorough security analysis to show the security strengths of our system design. Our implementation has adopted a format-compliant SVC encryption strategy, and optimized the storage of encrypted SVC videos for efficient dissemination. The extensive experiments have demonstrated the practicality of our system.

REFERENCES

[1] Y. Zheng, X. Yuan, X. Wang, J. Jiang, C. Wang, and X. Gui, “Enabling encrypted cloud media center with secure deduplication,” in Proc. of ACM ASIACCS, 2015.
[2] F. Chen, C. Zhang, F. Wang, J. Liu, X. Wang, and Y. Liu, “Cloud-assisted live streaming for crowdsourced multimedia content,” IEEE Trans. on Multimedia, vol. 17, no. 9, pp. 1471–1483, 2015.
[3] H. Yue, X. Sun, J. Yang, F. Wu et al., “Cloud-based image coding for mobile devices-toward thousands to one compression,” IEEE Trans. on Multimedia, vol. 15, no. 4, pp. 845–857, 2013.
[4] G. Gao, W. Zhang, Y. Wen, Z. Wang, and W. Zhu, “Towards cost-efficient video transcoding in media cloud: Insights learned from user viewing patterns,” IEEE Trans. on Multimedia, vol. 17, no. 8, pp. 1286–1296, 2015.
[5] S. Wang, K. Gu, S. Ma, W. Lin, X. Liu, and W. Gao, “Guided image contrast enhancement based on retrieved images in cloud,” IEEE Trans. on Multimedia, vol. PP, no. 99, p. 1, 2015.
[6] W. Zhu, C. Luo, J. Wang, and S. Li, “Multimedia cloud computing,” IEEE Signal Processing Magazine, vol. 28, no. 3, pp. 59–69, 2011.
[7] BBC, “Apple to tighten iCloud security after celebrity leaks,” http://www.bbc.com/news/technology-29076899, 2014.
[8] K. Vinton, “Data Breach Bulletin: Staples, NeedMyTranscript, iCloud, Sourcebooks,” http://www.forbes.com/sites/katevinton/2014/10/24/data-breach-bulletin-staples-needmytranscript-com-icloud-sourcebooks/, 2014.
[9] C. Wang, K. Ren, W. Lou, and J. Li, “Toward secure and effective data utilization in public cloud,” IEEE Network Magazine, vol. 26, no. 6, pp. 69–74, 2012.
[10] C.-Y. Hsu, C.-S. Lu, and S.-C. Pei, “Image feature extraction in encrypted domain with privacy-preserving sift,” IEEE Trans. on Image Processing, vol. 21, no. 11, pp. 4593–4607, 2012.
[11] Z. Qin, J. Yan, K. Ren, C. W. Chen, and C. Wang, “Towards efficient privacy-preserving image feature extraction in cloud computing,” in Proc. of ACM MM, 2014.
[12] Q. Wang, W. Zeng, and J. Tian, “A compressive sensing based secure watermark detection and privacy preserving storage framework,” IEEE Trans. on Image Processing, vol. 23, no. 3, pp. 1317–1328, 2014.
[13] Y. Wu, Z. Wei, and R. H. Deng, “Attribute-based access to scalable media in cloud-assisted content sharing networks,” IEEE Trans. on Multimedia, vol. 15, no. 4, pp. 778–788, 2013.
[14] C. Ma and C. W. Chen, “Secure media sharing in the cloud: Two-dimensional-scalable access control and comprehensive key management,” in Proc. of IEEE ICME, 2014.
[15] X. Yuan, X. Wang, C. Wang, A. Squicciarini, and K. Ren, “Enabling privacy-preserving image-centric social discovery,” in Proc. of IEEE ICDCS, 2014.
[16] M. Bellare, S. Keelveedhi, and T. Ristenpart, “Message-locked encryption and secure deduplication,” in Proc. of EUROCRYPT, 2013.
[17] M. Bellare, S. Keelveedhi, and T. Ristenpart, “Dupless: Server-aided encryption for deduplicated storage,” in Proc. of USENIX Security, 2013.
[18] J. Xu, E. Chang, and J. Zhou, “Weak leakage-resilient client-side deduplication of encrypted data in cloud storage,” in Proc. of ACM ASIACCS, 2013.
[19] Y. Zhou, T. Z. J. Fu, D. M. Chiu, and Y. Huang, “An adaptive cloud downloading service,” IEEE Trans. on Multimedia, vol. 15, no. 4, pp. 802–810, 2013.
[20] Y. Sanchez, T. Schierl, C. Hellge, T. Wiegand, D. Hong, D. D. Vleeschauwer, W. V. Leekwijck, and Y. L. Louedec, “idash: improved dynamic adaptive streaming over http using scalable video coding,” in Proc. of ACM MMSys, 2011, pp. 257–264.
[21] H. Schwarz, D. Marpe, and T. Wiegand, “Overview of the scalable video coding extension of the h.264/avc standard,” IEEE Trans. on Circuits and Systems for Video Technology, vol. 17, no. 9, pp. 1103–1120, 2007.
[22] S. Xiang, “Scalable streaming,” https://sites.google.com/site/svchttpstreaming/storagesaving.


[23] Z. Wei, Y. Wu, X. Ding, and R. H. Deng, “A scalable and format-compliant encryption scheme for h.264/svc bitstreams,” Signal Processing: Image Communication, vol. 27, no. 9, pp. 1011–1024, 2012.
[24] T. Stutz and A. Uhl, “A survey of h.264 avc/svc encryption,” IEEE Trans. on Circuits and Systems for Video Technology, vol. 22, no. 3, pp. 325–339, 2012.
[25] J. R. Douceur, A. Adya, W. J. Bolosky, D. Simon, and M. Theimer, “Reclaiming space from duplicate files in a serverless distributed file system,” in Proc. of IEEE ICDCS, 2002.
[26] P. Anderson and L. Zhang, “Fast and secure laptop backups with encrypted de-duplication,” in Proc. of USENIX LISA, 2010.
[27] Z. Wilcox-O’Hearn and B. Warner, “Tahoe: the least-authority filesystem,” in Proc. of ACM StorageSS, 2008.
[28] Gnunet, “GNU’s framework for secure peer-to-peer networking,” https://gnunet.org/.
[29] Freenet, “The Free Network,” http://freenetproject.org/.
[30] M. Abadi, D. Boneh, I. Mironov, A. Raghunathan, and G. Segev, “Message-locked encryption for lock-dependent messages,” in Proc. of CRYPTO, 2013.
[31] M. Bellare and S. Keelveedhi, “Interactive message-locked encryption and secure deduplication,” in Proc. of PKC, 2015.
[32] J. Li, X. Chen, M. Li, J. Li, P. P. Lee, and W. Lou, “Secure deduplication with efficient and reliable convergent key management,” IEEE Trans. on Parallel and Distributed Systems, vol. 25, no. 6, pp. 1615–1625, 2014.
[33] J. Stanek, A. Sorniotti, E. Androulaki, and L. Kencl, “A secure data deduplication scheme for cloud storage,” in Proc. of FC, 2014.
[34] Y. Duan, “Distributed key generation for encrypted deduplication: achieving the strongest privacy,” in Proc. of ACM CCSW, 2014.
[35] P. Puzio, R. Molva, M. Önen, and S. Loureiro, “Cloudedup: secure deduplication with encrypted data for cloud storage,” in Proc. of IEEE CloudCom, 2013.
[36] J. Liu, N. Asokan, and B. Pinkas, “Secure deduplication of encrypted data without additional independent servers,” in Proc. of ACM CCS, 2015.
[37] R. H. Deng, X. Ding, Y. Wu, and Z. Wei, “Efficient block-based transparent encryption for h.264/svc bitstreams,” Multimedia Systems, vol. 20, no. 2, pp. 165–178, 2014.
[38] Z. Wei, Y. Wu, R. H. Deng, and X. Ding, “A hybrid scheme for authenticating scalable video codestreams,” IEEE Trans. on Information Forensics and Security, vol. 9, no. 4, pp. 543–553, 2014.
[39] Fortune.com, “Cloud Price Cutting Isn’t Dead Yet,” http://fortune.com/2016/01/15/cloud-price-cuts-continue/.
[40] Infoworld.com, “Faster and cheaper: The cloud is becoming harder to resist,” http://www.infoworld.com/article/2865212/cloud-computing/the-cloud-is-getting-both-faster-and-cheaper.html.
[41] E. Stefanov and E. Shi, “Multi-cloud oblivious storage,” in Proc. of ACM CCS, 2013.
[42] Q. Wang, J. Wang, S. Hu, Q. Zou, and K. Ren, “Sechog: Privacy-preserving outsourcing computation of histogram of oriented gradients in the cloud,” in Proc. of ACM AsiaCCS, 2016.
[43] V. Nikolaenko, S. Ioannidis, U. Weinsberg, M. Joye, N. Taft, and D. Boneh, “Privacy-preserving matrix factorization,” in Proc. of ACM CCS, 2013.
[44] V. Nikolaenko, U. Weinsberg, S. Ioannidis, M. Joye, D. Boneh, and N. Taft, “Privacy-preserving ridge regression on hundreds of millions of records,” in Proc. of IEEE S&P, 2013.
[45] Apache OpenOffice, “How to verify the integrity of the downloaded file?” https://www.openoffice.org/download/checksums.html.
[46] C. Wang, S. S. M. Chow, Q. Wang, K. Ren, and W. Lou, “Privacy-preserving public auditing for secure cloud storage,” IEEE Trans. on Computers, vol. 62, no. 2, pp. 362–375, 2013.
[47] S. Jarecki and X. Liu, “Efficient oblivious pseudorandom function with applications to adaptive ot and secure computation of set intersection,” in Proc. of Theory of Cryptography Conference, 2009, pp. 577–594.
[48] M. Bellare, C. Namprempre, D. Pointcheval, and M. Semanko, “The one-more-rsa-inversion problems and the security of chaum’s blind signature scheme,” Journal of Cryptology, vol. 16, no. 3, pp. 185–215, 2003.
[49] Y. Sanchez, T. Schierl, C. Hellge, T. Wiegand, D. Hong, D. D. Vleeschauwer, W. V. Leekwijck, and Y. L. Louédec, “Efficient http-based streaming using scalable video coding,” Signal Processing: Image Communication, vol. 27, no. 4, pp. 329–342, 2012.
[50] V. Shoup, “Practical threshold signatures,” in Proc. of EUROCRYPT, 2000.
[51] A. Katiyar and J. B. Weissman, “Videdup: An application-aware framework for video de-duplication,” in Proc. of USENIX HotStorage, 2011.
[52] Y. Chen, W. He, Y. Hua, and W. Wang, “Compoundeyes: Near-duplicate detection in large scale online video systems in the cloud,” in Proc. of IEEE INFOCOM, 2016.
[53] J. Song, Y. Yang, Z. Huang, H. T. Shen, and J. Luo, “Effective multiple feature hashing for large-scale near-duplicate video retrieval,” IEEE Trans. on Multimedia, vol. 15, no. 8, pp. 1997–2008, 2013.
[54] Y. Dodis, R. Ostrovsky, L. Reyzin, and A. D. Smith, “Fuzzy extractors: How to generate strong keys from biometrics and other noisy data,” SIAM J. on Computing, vol. 38, no. 1, pp. 97–139, 2008.
[55] B. Barak, Y. Dodis, H. Krawczyk, O. Pereira, K. Pietrzak, F. Standaert, and Y. Yu, “Leftover hash lemma, revisited,” in Proc. of CRYPTO, 2011.
[56] IETF, “RTP payload format for scalable video coding,” https://tools.ietf.org/html/rfc6190#page-8.
[57] S. Oh, A. Hoogs, A. Perera, N. Cuntoor, C.-C. Chen, J. T. Lee, S. Mukherjee, J. Aggarwal, H. Lee, L. Davis et al., “A large-scale benchmark dataset for event recognition in surveillance video,” in Proc. of IEEE CVPR, 2011.
[58] S. Lederer, C. Müller, and C. Timmerer, “Dynamic adaptive streaming over http dataset,” in Proc. of ACM MMSys, 2012.
[59] M. Blestel and M. Raulet, “Open svc decoder: a flexible svc library,” in Proc. of ACM MM, 2010.
[60] Akamai, “The Akamai State of the Internet Report,” http://www.akamai.com/stateoftheinternet/.
