Professional Documents
Culture Documents
Towards Differential Query Services in
Towards Differential Query Services in
Towards Differential Query Services in
X, MONTH YEAR 1
AbstractCloud computing as an emerging technology trend is expected to reshape the advances in information technology. In a cost-
efcient cloud environment, a user can tolerate a certain degree of delay while retrieving information from the cloud to reduce costs. In
this paper, we address two fundamental issues in such an environment: privacy and efciency. We rst review a private keyword-based
le retrieval scheme that was originally proposed by Ostrovsky. Their scheme allows a user to retrieve les of interest from an untrusted
server without leaking any information. The main drawback is that it will cause a heavy querying overhead incurred on the cloud, and
thus goes against the original intention of cost efciency. In this paper, we present a scheme, termed efcient information retrieval for
ranked query (EIRQ), based on an aggregation and distribution layer (ADL), to reduce querying overhead incurred on the cloud. In
EIRQ, queries are classied into multiple ranks, where a higher ranked query can retrieve a higher percentage of matched les. A user
can retrieve les on demand by choosing queries of different ranks. This feature is useful when there are a large number of matched
les, but the user only needs a small subset of them. Under different parameter settings, extensive evaluations have been conducted
on both analytical models and on a real cloud environment, in order to examine the effectiveness of our schemes.
In this paper, we introduce a novel concept, differential research that is similar to ours can be found in the areas
query services, to COPS, where the users are allowed of private searching [3][11].
to personally decide how many matched les will be Unlike searchable encryption [2], [12], where the user
returned. This is motivated by the fact that under certain conducts searches on encrypted data, private searching
cases, there are a lot of les matching a users query, performs keyword-based searches on unencrypted data.
but the user is interested in only a certain percentage Private searching was rst proposed in [3], [4], which
of matched les. To illustrate, let us assume that Alice allows a server to lter streaming data without compro-
wants to retrieve 2% of the les that contain keywords mising user privacy. Their solution requires the server
A, B, and Bob wants to retrieve 20% of the les that to return a buffer of size O(f log(f )) when f les match
contain keywords A, C. The cloud holds 1,000 les, a users query. Each le is associated with a survival
where {F1 , . . . , F500 } and {F501 , . . . , F1000 } are described rate, which denotes the probability of this le being
by keywords A, B and A, C, respectively. In the successfully recovered by the user. Based on the Paillier
Ostrovsky scheme, the cloud will have to return 2, 000 cryptosystem [13], the les that mismatch a query will
les. In the COPS scheme, the cloud will have to return not survive in the buffer, but the matched les enjoy a
1, 000 les. In our scheme, the cloud only needs to return high survival rate.
200 les. Therefore, by allowing the users to retrieve Among various extensions, Refs. [5], [6] further re-
matched les on demand, the bandwidth consumed in duced the communication cost from O(f log(f )) to O(f )
the cloud can be largely reduced. by solving a set of linear equations to recover f matched
Motivated by this goal, we propose a scheme, termed les. However, their scheme requires the decryption of
Efcient Information retrieval for Ranked Query (EIRQ), one more buffer, thus the computation cost is higher than
in which each user can choose the rank of his query the Ostrovsky scheme. Ref. [8] presented an efcient de-
to determine the percentage of matched les to be re- coding mechanism which allows the recovery of les that
turned. The basic idea of EIRQ is to construct a privacy- collide in a buffer position. Ref. [9] proposed a recursive
preserving mask matrix that allows the cloud to lter out extraction mechanism, which requires a buffer of size
a certain percentage of matched les before returning to O(f ) when f les match a users query. Ref. [10] pro-
the ADL. This is not a trivial work, since the cloud needs posed two new communication-optimal constructions;
to correctly lter out les according to the rank of queries one uses Reed-Solomon codes and allows for a zero-
without knowing anything about user privacy. Focusing error, and the other is based on irregular LDPC codes
on different design goals, we provide two extensions: and allows for lower computation cost at the server. The
the rst extension emphasizes simplicity by requiring above private searching schemes only support searching
the least amount of modications from the Ostrovsky for OR of keywords or AND of two sets of keywords.
scheme, and the second extension emphasizes privacy by Ref. [11] extended the types of queries to support dis-
leaking the least amount of information to the cloud. junctive normal forms (DNF) of keywords. The main
Our key contributions are as follows: drawback of existing private searching schemes is that
1) We propose three EIRQ schemes based on the both the computation and communication costs grow
ADL to provide a cost-efcient solution for private linearly with the number of users executing queries.
searching in cloud computing. Thus, when applying these schemes to a large-scale
2) The EIRQ schemes can protect user privacy while cloud environment, querying costs will be extensive.
providing a differential query service that allows Our previous work [7] was the rst to make private
each user to retrieve matched les on demand. searching techniques applicable to a cloud environment.
3) We provide two solutions to adjust related param- However, Ref. [7] requires the cloud to return all of the
eters; one is based on the Ostrovsky scheme, and matched les, which may cause a waste of bandwidth
the other is based on Bloom lters. when only a small percentage of les are of interest.
4) Extensive experiments were performed using a To alleviate the problem, we introduced the concept of
combination of simulations and real cloud deploy- differential query services in [14]. The main difference
ments to validate our schemes. between this work and [14] is that we provide two
extensions to address different aspects of the problem,
The remainder of this paper is organized as follows.
and we conduct extensive experiments on a real cloud
We introduce related work in Section 2 before presenting
to verify the effectiveness of the proposed schemes.
preliminaries in Section 3. We describe EIRQ schemes in
Section 4 and adjust the parameters in Section 5. After 3 BACKGROUND
analyzing the performance and security of the proposed
schemes in Section 6, we conduct evaluations in Section 3.1 System Model
7. Finally, we conclude this paper in Section 8. The system mainly consists of three entities1 : the aggre-
gation and distribution layer (ADL), many users, and the
2 R ELATED WORK 1. The users inside an organization share data in the cloud. Thus, we
assume that a management server maintained by the organization is
Our work aims to provide differential query services in charge of managing the authorized users and related keys. Limited
while protecting user privacy from the cloud. Existing by the space, we do not detail this entity here.
IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. XX, NO. X, MONTH YEAR 3
buffer. For le Fj , the corresponding c-e pair, denoted Algorithm 1 The EIRQ-Efcient scheme
as (cj , ej ), is generated as follows: the bits in query Q MatrixConstruct (run by the ADL with public key pk)
corresponding to keywords in Fj are multiplied together for i = 1 to d do
to form cj = Dic[i]Fj Q[i], where Dic[i] denotes the set l to be the highest rank of queries choosing Dic[i]
i-th keyword in the dictionary, and le content |Fj | is for j = 1 to r do
|F |
powered to cj to form ej = cj j . if j r l then
Then, the cloud constructs a buffer of size . Let B M [i, j] = Epk (1)
denote the buffer, where the i-th entry, denoted as B[i], else
consists of two parts, denoted as B[i, 1] and B[i, 2], both M [i, j] = Epk (0)
of which are initialized with an encryption of 0 under adjust and so that le survival rate is 1
the users public key. To map (cj , ej ) to the buffer, the FileFilter (run by the cloud)
cloud randomly chooses an entry, say p , and multiplies for each le Fj stored in the cloud do
(cj , ej ) to this entry by performing B[p , 1] = B[p , 1] cj for i = 1 to d do |F |
and B[p , 2] = B[p , 2] ej . The mapping operation will k = j mod r; cj = Dic[i]Fj M [i, k]; ej = cj j
be performed times. After mapping all pairs to the map (cj , ej ) times to a buffer of size
buffer, each buffer entry has one of the three statuses:
survival, collision, and mismatch. If only one matched le
is mapped, the entry state is survival; if more than one
returned. Suppose that queries are classied into 0 r
matched le is mapped, the entry state is collision; if no
ranks. Rank-0 queries have the highest rank and Rank-r
matched les are mapped, the entry state is mismatch.
queries have the lowest rank. In this paper, we simply
Step 3. The user runs the FileRecover algorithm to
determine this relationship by allowing Rank-i queries
recover les. The user decrypts the buffer, entry by entry,
to retrieve (1 i/r) percent of matched les. Therefore,
to obtain the plaintext c-e pairs. For the entries in the
Rank-0 queries can retrieve 100% of matched les, and
survival state, le content can be recovered by dividing
Rank-r queries cannot retrieve any les.
the plaintext e value by the plaintext c value.
Secondly, we should determine which matched les
The security of the Ostrovsky scheme derives from the
will be returned and which will not. In this paper, we
semantic security of the Paillier cryptosystem. The key
simply determine the probability of a le being returned
technique of their scheme is that the les mismatching a
by the highest rank of queries matching this le. Specif-
users query are processed to encrypted 0s, which have
ically, we rst rank each keyword by the highest rank
no impact on the matched les, even if they are mapped
of queries choosing it, and then rank each le by the
in the same entry. Thus, the buffer size only depends on
highest rank of its keywords. If the le rank is i, then the
the number of matched les, which is much smaller than
probability of being ltered out is i/r. Therefore, Rank-0
the number of les stored in the cloud.
les will be mapped into a buffer with probability 1, and
Rank-r les will not be mapped at all. Since unneeded
4 S CHEME D ESCRIPTION les have been ltered out before mapping, the mapped
In this section, we will describe the original EIQR scheme les should survive in the buffer with probability 1. In
and its two extensions. To distinguish the three EIRQ Section 5, we will illustrate how to adjust the buffer size
schemes, we name the original EIRQ scheme as EIRQ- and mapping times to achieve this goal.
Efcient, the rst extension as EIRQ-Simple, and the EIRQ-Efcient mainly consists of four algorithms, with
second extension as EIRQ-Privacy, in this paper. its working process being shown in Fig. 2-(b). Since algo-
The basic idea of EIQR-Efcient is to construct a rithms QueryGen and ResultDivide are easily understood,
privacy-preserving mask matrix with which the cloud can we only provide the details of algorithms MatrixCon-
lter out a certain percentage of matched les before struct and FileFilter in Alg. 1.
mapping them to a buffer. As proven in the Ostrovsky Step 1. The user runs the QueryGen algorithm to send
scheme, the le survival rate is determined by the buffer keywords and the rank of the query to the ADL. Since
size and mapping times . Therefore, the basic idea of the ADL is assumed to be a trusted third party, this query
two extensions is that, for each rank i {0, . . . , r}, the will be sent without encryption.
ADL adjusts the buffer size i and the mapping times i
Step 2. After aggregating enough user queries, the
to make the le survival rate qi approach 1i/r. To better
ADL runs the MatrixConstruct algorithm to send a mask
illustrate the working process of the EIRQ schemes, we
matrix to the cloud. The mask matrix M is a d-row and
provide examples in the supplementary le.
r-column matrix, where d is the number of keywords
in the dictionary, and r is the lowest query rank. Let
4.1 The EIRQ-Efcient Scheme M [i, j] denote the element in the i-th row and the j-
Before illustrating EIQR-Efcient, two fundamental th column, and let l be the highest rank of queries
problems should be resolved: that choose the i-th keyword Dic[i] in the dictionary.
Firstly, we should determine the relationship between M is constructed as follows: for the i-th row of M that
query rank and the percentage of matched les to be corresponds to Dic[i], M [i, 1], . . . , M [i, r l] are set to
IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. XX, NO. X, MONTH YEAR 5
element by element, to form a resulting row. Each el- an array of m bits, all of which are initially set to 0. It
ement in the resulting row corresponds to a c value. Let uses k independent random hash functions h1 , . . . , hk ,
cj,1 , . . . , cj,m denote Fj s c values, where m = max i . The with range 0, . . . , m 1, to map each member in U to
cloud powers the le content |Fj | to cj,k to form ej,k , and random k bits. For each member s U , the bits hi (s)
maps (cj,k , ej,k ) to the buffer once, where 1 k m. are set to 1 if s S; otherwise, they are set to 0, for
Note that with the MatrixConstruct algorithm, we can 1 i k. A location can be set to 1 multiple times, but
make sure that, for a Rank-l le, the number of c values only the rst change has an effect. To check if a member
larger than 0 is l . Therefore, although m c-e pairs will s is in S, we check whether all hi (s) are set to 1. If not,
be mapped, only l of them will take effect, which is then clearly s is not a member of S. Otherwise, there is
equal to mapping c-e pairs l times to a buffer. still a certain probability (false positive) that s is not in
S, since k bits may be set to 1 by other members.
The false positive is calculated as follows [17]: after all
5 PARAMETER SETTING
members of S are hashed, the probability for a specic
5.1 Ostrovsky Parameter Setting bit to be 0 is (1 1/m)kn e(kn/m) . A false positive
The Ostrovsky scheme has proven that, given f les that occurs when each of k locations of one non-member are
match a query, when each le is randomly mapped set to 1, which is (1(11/m)kn )k (1e(kn/m) )k . Let
times into a buffer of 2 f entries, the le failure rate g = k ln(1 e(kn/m) ). Minimizing the false positive is
will be lower than f /2 , i.e., the le survival rate will be equivalent to minimizing k, which in turn is equivalent
higher than 1f /2 . Therefore, given a threshold failure to minimizing g. When k = ln 2 (m/n), we have
rate p > 0, if we map each le log(f /p ) times2 into a dg/dk = 0. In this case, the false positive is minimized to
buffer of size 2 f log(f /p ), then the real failure rate p (0.6185)m/n . That is to say, given the number of members
is smaller than p , and the real le survival rate q will n and the threshold false positive p , we can make the
be higher than 1 p . Furthermore, we know that, given real false positive approximate p when each member
the estimated number of the matched les, two factors is hashed log0.6185 (p ) ln 2 times to a Bloom lter of
have an impact on le survival rate: the buffer size and log0.6185 (p ) n bits. Recall that the le failure rate in
the mapping times. EIRQ schemes denotes the probability of a missing le,
Suppose that queries are classied into 0 r ranks, i.e., the probability that all mappings of each le collide.
where fi les match Rank-i query but mismatch higher In a sense, the failure rate is equivalent to the false
ranked queries, and fi les match Rank-i query. The probability in Bloom lters. Therefore, we let the number
Ostrovsky parameter setting is as follows: the ADL of members n, the threshold false positive p , and the
determines a threshold value > 0, and then adjusts number of bits m in Bloom lters represent the number
parameters with Eq. 1-3. EIRQ-Efciency lters out a of les that match the query f , threshold failure rate p ,
certain percentage of matched les before mapping them and buffer size in EIRQ schemes, respectively.
into the buffer, and thus all remaining les should be The Bloom lter parameter setting is as follows: the
returned. EIRQ-Efciency adopts one buffer, where the ADL determines a threshold value > 0, and then
le survival rate is 100%. EIRQ-Simple returns multiple adjusts parameters with Eq. 4-6. Here, EIRQ-Efcient
buffers with different le survival rates, one for each will use Eq. 4, EIRQ-Simple will use Eq. 5, and EIRQ-
rank. EIRQ-Privacy still adopts one buffer, but with Privacy will use Eq. 6 to adjust the parameters under the
different mapping times for les of different ranks. Bloom lter parameter setting.
Therefore, EIRQ-Efcient will use Eq. 1, EIRQ-Simple r
i
will use Eq. 2, and EIRQ-Privacy will use Eq. 3 to adjust = log0.6185 () ln 2, = ( fi (1 )) (4)
the parameters under the Ostrovsky parameter setting. i=0
ln 2 r
r r i i
f (1 ri ) i i = log0.6185 ( + ) ln 2, i = fi (5)
= log( i=0 i ), = 2 fi (1 ) (1) r ln 2
r r
i=0 i i
i = log0.6185 ( + ) ln 2, = ( f ) (6)
i
i = log(fi /( + )), i = 2 i fi (2) r i=0
ln 2 i
r
i r Note that, in Eqs. 1 and 4, p = , and thus, all of the
i = log(fi /( + )), = 2 i fi (3) les that are mapped into the buffer will survive with a
r i=0 rate higher than 1 ; in Eqs. 2, 3, 5, and 6, pi = i/r + ,
and thus the les that match Rank-i queries will survive
5.2 Bloom Filter Parameter Setting in the buffer with a rate higher than 1 i/r .
An alternative solution is to use Bloom lters [17], [18]
to adjust the parameters in our schemes. Bloom lter is 6 A NALYSIS
a technique that is used to represent a subset S of n 6.1 Security Analysis
members from a universe U . A Bloom lter consists of
We will show that EIRQ schemes can provide search
2. log is the abbreviation of log2 privacy, access privacy, and rank privacy as follows.
IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. XX, NO. X, MONTH YEAR 7
TABLE 1
No Rank vs. Three EIRQ schemes
Scheme Computation Communication under Ostrovsky setting Communication under Bloom lter setting 2
Search privacy. In the three schemes, the combined higher ranked queries. Furthermore, in No Rank and
query sent to the cloud is encrypted under the ADLs EIRQ-Efcient, the threshold le survival rate p is set to
public key with the Paillier cryptosystem. The query is a ; in EIRQ-Simple and EIRQ-Privacy, pi is set to i/r + .
matrix of encrypted 0s and 1s. The Paillier cryptosystem Computational cost. We only consider the cost of the
is semantically secure, and the ciphertext of every 1 or exponential operation, which is the most expensive. In
0 is different from other 1s or 0s. Therefore, the cloud both parameter settings, the results are the same. In
cannot deduce what each user is searching for from the EIRQ-Simple, the computational cost is r times more
encrypted query. than No Rank since, for each ranked query, the cloud
Access privacy. In the three schemes, the cloud pro- needs to process it on the le collection once. In EIRQ-
cesses the encrypted query on each le in a collection, Privacy, the computational cost is max(i ) times more
and maps the processing result into a buffer, which is than No Rank since, for each le, the cloud needs to ex-
encrypted with the ADLs public key. The cloud conducts ecute max(i ) exponentiations with the matrix elements.
this process for all les in the same way. Therefore, In EIRQ-Efcient, the computational cost is much the
the cloud cannot know which les are actually returned same as in No Rank, since the cloud needs to execute
from the encrypted buffer. exponentiation once for each le.
Rank privacy. In EIRQ-Simple, the messages from the Communication cost. Under the Ostrovsky parame-
ADL to the cloud are r encrypted queries, the buffer ter setting, for EIRQ-Simple, the ADL sends r queries,
size, and the mapping times, where r is the information, each of which is of size O(d), to the cloud, which
which we leak more than [3]. Given r, the cloud only will return r buffers to the ADL, each of which is
knows the number of query ranks without knowing of size O(fi log(fi /(i/r + ))); for EIRQ-Privacy, the
how many users are in each rank, nor which users are ADL sends a mask matrix of size O(max i d) to the
r
in which ranks. Therefore, EIRQ-Simple can protect the cloud, which will return a buffer of size O( i=0 (fi
basic level of rank privacy for a user. In EIRQ-Privacy,
log(fi /(i/r + )))) to the ADL; for EIRQ-Efcient, the
the message from the ADL to the cloud is a d-row and m- ADL sends a mask matrix of size O(r r d) to
the cloud,
column mask matrix, where d is the number of keywords whichr
will return a buffer of size O( i=0 (f i (1 i/r))
in the dictionary, and m = max i is the maximal value f (1i/r)
log( i=0 i )) to the ADL. Under the Bloom lter
of mapping times. Here, no extra information is leaked parameter setting, the query sizes are the same as those
more than [3]. Therefore, EIRQ-Privacy provides a high under the Ostrovsky parameter setting. The buffer sizes
level of user rank privacy. In EIRQ-Efcient, the message returned from the cloud in EIRQ-Simple, EIRQ-Privacy,
from the ADL to the cloud is a d-row and r-column r
andr EIRQ-Efcient are O( i=0 (fi log
r0.6185 (i/r + ))),
mask matrix, where d is the number of keywords in O( i=0 (fi log0.6185 (i/r +))), and O( i=0 (fi (1i/r))
the dictionary, and r is the lowest rank of user queries. log0.0.6185 ()), respectively.
Here, r is the information that we leak more than [3].
Therefore, EIRQ-Efcient can protect the basic level of
rank privacy for a user. 7 E VALUATION
In this section, we will compare three EIRQ schemes
6.2 Performance Analysis from the following aspects: le survival rate and compu-
tation/communication cost incurred on the cloud. Then,
We compare the performance between No Rank and the based on the simulation results, we deploy our program
three EIRQ schemes under different parameter settings in Amazon Elastic Compute Cloud (EC2) to test the
(see Table 1). In No Rank, the ADL only combines user transfer-in and transfer-out time incurred on the cloud
queries, but does not provide differential query services. when executing private searches. Note that the energy-
In the supplementary le, we also provide a comparison performance trade-off is crucial to the success of cloud
of performance between No Rank and the work in [3], computing, and existing energy-saving techniques are
[6]. Suppose that queries are classied into 0 r ranks,
t les stored in the cloud whose keywords constitute log(x)
2. O(log0.6185 (x)) = O(log(x)), as log0.6185 (x) = log(0.6185) , where
a dictionary of size d, fi les matching Rank-i queries, 1/ log(0.6185) is a constant. We keep log0.6185 (x) in the complexity
and fi les matching Rank-i queries but mismatching analysis for the ease of comparison.
IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. XX, NO. X, MONTH YEAR 8
Fig. 6. Comparison of communication cost under the Bloom lter setting. The x-axis denotes the number of queries
in each rank, and the y-axis denotes the buffer size (KB).
4 4 4
6000 x 10 x 10 x 10
No Rank 2 2.5 2.5
No Rank No Rank No Rank
EIRQSimple
EIRQSimiple 2 EIRQSimiple 2 EIRQSimple
4000 EIRQPrivate 1.5
EIRQPrivacy EIRQPrivacy EIRQPrivacy
EIRQEfficient 1.5
EIRQEfficient EIRQEfficient 1.5 EIRQEfficient
1
2000 1 1
0.5
0.5 0.5
0 0 0 0
5 10 15 20 25 5 10 15 20 25 5 10 15 20 25 5 10 15 20 25
(a) 4 common keywords (b) 2 common keywords (c) 1 common keyword (d) Random keywords
Fig. 7. Comparison of communication cost under the Ostrovsky setting. The x-axis denotes the number of queries in
each rank, and the y-axis denotes the buffer size (KB).
2000 6000
No Rank No Rank specifying queries of different ranks. By further reducing
EIRQSimple EIRQSimple
1500
EIRQPrivacy EIRQPrivacy
the communication cost incurred on the cloud, the EIRQ
4000
EIRQEfficient EIRQEfficient schemes make the private searching technique more ap-
1000
plicable to a cost-efcient cloud environment. However,
2000
500 in the EIRQ schemes, we simply determine the rank of
0 0
each le by the highest rank of queries it matches. For
5 10 15 20 25 0 5 10 15 20 25
(a) 4 Common keywords (a) 1 Common keyword
our future work, we will try to design a exible ranking
mechanism for the EIRQ schemes.
Fig. 10. Transfer-out time in real cloud under the Ostro-
vsky setting. The x-axis denotes the number of users, and
ACKNOWLEDGMENTS
the y-axis denotes the transferring time (s). This research was supported in part by NSF grants ECCS
1231461, ECCS 1128209, CNS 1138963, CNS 1065444,
and CCF 1028167; NSFC grants 61272151 and 61073037,
First, we test the transfer-in time in the real cloud, ISTCP grant 2013DFB10070, the China Hunan Provin-
which is mainly incurred by receiving queries from the cial Science & Technology Program under Grant Num-
ADL. Under both parameter settings, the query size ber 2012GK4106, and the Mobile Health Ministry of
for No Rank, EIRQ-Simple, EIRQ-Privacy, and EIRQ- Education-China Mobile Joint Laboratory (MOE-DST
Efcient can be calculated with O(d), O(r d), O(max i No. [2012]311).
d), and O(r d), respectively. Given d = 100, r = 4, and
|w| = 1KB, the query size for No Rank, EIRQ-Simple, R EFERENCES
and EIRQ-Efcient is about 100KB, 400KB, and 400KB, [1] P. Mell and T. Grance, The nist denition of cloud computing
(draft), NIST Special Publication, 2011.
respectively. For EIRQ-Privacy, the mapping times are [2] R. Curtmola, J. Garay, S. Kamara, and R. Ostrovsky, Searchable
calculated in different ways under different parameter symmetric encryption: improved denitions and efcient con-
settings. Under the Bloom lter parameter setting, the structions, in Proc. of ACM CCS, 2006.
[3] R. Ostrovsky and W. Skeith, Private searching on streaming
mapping times are 7, 4, 1, 1, respectively, and thus the data, in Proc. of CRYPTO, 2005.
query size is about 700KB. However, under the Ostro- [4] , Private searching on streaming data, Journal of Cryptology,
vsky parameter setting, the mapping times depend on 2007.
[5] J. Bethencourt, D. Song, and B. Waters, New constructions and
the number of matched les, which in turn depends on practical applications for private stream searching, in Proc. of
the common interests among queries. The comparisons IEEE S&P, 2006.
of transfer-in time are shown in Fig. 8. [6] , New techniques for private stream searching, ACM Trans-
actions on Information and System Security, 2009.
Then, we test the transfer-out time at the cloud, which [7] Q. Liu, C. Tan, J. Wu, and G. Wang, Cooperative private search-
is mainly incurred by returning les to the ADL. The ing in clouds, Journal of Parallel and Distributed Computing, 2012.
results are shown in Figs. 9 and 10. In all cases, EIRQ- [8] G. Danezis and C. Diaz, Improving the decoding efciency of
private search, in IACR Eprint archive number 024, 2006.
Efcient consumes the least amount of transfer time, [9] , Space-efcient private search with applications to rateless
and EIRQ-Simple works better than No-Rank under the codes, Financial Cryptography and Data Security, 2007.
Bloom lter setting. For example, under the Ostrovsky [10] M. Finiasz and K. Ramchandran, Private stream search at the
same communication cost as a regular search: Role of ldpc codes,
scheme, No-Rank consumes from 83.6s to 1191.8s, EIRQ- in Proc. of IEEE ISIT, 2012.
simple consumes from 189.8s to 1597.6s, EIRQ-Privacy [11] X. Yi and E. Bertino, Private searching for single and conjunctive
consumes from 83.3s to 1099.9s, and EIRQ-Efcient con- keywords on streaming data, in Proc. of ACM Workshop on Privacy
in the Electronic Society, 2011.
sumes from 57.4s to 475.1s when there are 4 common [12] B. Hore, E.-C. Chang, M. H. Diallo, and S. Mehrotra, Indexing
keywords; No-Rank consumes from 191.1s to 3857.5s, encrypted documents for supporting efcient keyword search,
EIRQ-simple consumes from 181.5s to 5369.7s, EIRQ- in Secure Data Management, 2012.
[13] P. Paillier, Public-key cryptosystems based on composite degree
Privacy consumes from 161.8s to 3323.4s, and EIRQ- residuosity classes, in Proc. of EUROCRYPT, 1999.
Efcient consumes from 81.3s to 1502.7s when there is [14] Q. Liu, C. C. Tan, J. Wu, and G. Wang, Efcient information
1 common keyword. retrieval for ranked queries in cost-effective cloud environments,
in Proc. of IEEE INFOCOM, 2012.
Therefore, EIRQ-Efcient is most suitable to be de- [15] S. Yu, C. Wang, K. Ren, and W. Lou, Achieving secure, scalable,
ployed to a cloud environment. For example, the time and ne-grained data access control in cloud computing, in Proc.
to transfer a query from the ADL to the cloud consumes of IEEE INFOCOM, 2010.
[16] G. Wang, Q. Liu, J. Wu, and M. Guo, Hierarchical attribute-based
less than 100 seconds, and the time to transfer the buffer encryption and scalable user revocation for sharing data in cloud
from the cloud to the ADL consumes less than 500 servers, Computers & Security, 2011.
seconds, under 4 common keywords. [17] M. Mitzenmacher, Compressed bloom lters, IEEE/ACM Trans-
actions on Networking, 2002.
[18] D. Guo, J. Wu, H. Chen, and X. Luo, Theory and network ap-
plications of dynamic bloom lters, in Proc. of IEEE INFOCOM,
8 C ONCLUSION 2006.
In this paper, we proposed three EIRQ schemes based [19] A. Berl, E. Gelenbe, M. Di Girolamo, G. Giuliani, H. De Meer,
M. Q. Dang, and K. Pentikousis, Energy-efcient cloud comput-
on an ADL to provide differential query services while ing, The Computer Journal, 2010.
protecting user privacy. By using our schemes, a user [20] E. Gelenbe, R. Lent, and M. Douratsos, Choosing a local or
can retrieve different percentages of matched les by remote cloud, in Proc. of IEEE NCCA, 2012.
IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. XX, NO. X, MONTH YEAR 11