
Estimating Frequency Moments of Data Streams using

Random Linear Combinations

Sumit Ganguly

Indian Institute of Technology, Kanpur


e-mail: sganguly@iitk.ac.in

Abstract. The problem of estimating the kth frequency moment F_k for any non-negative k, over a data stream by looking at the items exactly once as they arrive, was considered in a seminal paper by Alon, Matias and Szegedy [1, 2]. The space complexity of their algorithm is Õ(n^{1−1/k}). For k > 2, their technique does not apply to data streams with arbitrary insertions and deletions. In this paper, we present an algorithm for estimating F_k for k > 2, over general update streams, whose space complexity is Õ(n^{1−1/(k−1)}) and whose time complexity for processing each stream update is Õ(1).
Recently, an algorithm for estimating F_k over general update streams with similar space complexity has been published by Coppersmith and Kumar [7]. Our technique (a) is fundamentally different from that of [7], (b) is simpler and symmetric, and (c) is significantly more efficient in terms of the time required to process a stream update (Õ(1) compared with Õ(n^{1−1/(k−1)})).

1 Introduction

A data stream can be viewed as a sequence of updates, that is, insertions and deletions
of items. Each update is of the form (l, ±v), where, l is the identity of the item and v
is the change in frequency of l such that |v| ≥ 1. The items are assumed to draw their
identities from the domain [N ] = {0, 1, . . . , N − 1}. If v is positive, then the operation
is an insertion operation, otherwise, the operation is a deletion operation. The frequency
of an item with identity l, denoted by fl , is the sum of the changes in frequencies of
l from the start of the stream.
In this paper, we are interested in computing the kth
frequency moment F_k = Σ_l f_l^k, for k > 2 and k integral, by looking at the items
exactly once when they arrive.
The problem of estimating frequency moments over data streams using randomized
algorithms was first studied in a seminal paper by Alon, Matias and Szegedy [1, 2].
They present an algorithm, based on sampling, for estimating F_k, for k ≥ 2, to within any specified approximation factor ε and with confidence that is a constant greater than 1/2. The space complexity of this algorithm is s = Õ(n^{1−1/k}) (suppressing the term 1/ε^2) and the time complexity per update is Õ(n^{1−1/k}), where n is the number of distinct elements in the stream. This algorithm assumes that frequency updates are restricted to the form (l, +1).
One problem with the sampling algorithm of [1, 2] is that it is not applicable to
streams with arbitrary deletion operations. For some applications, the ability to handle
deletions in a stream may be important. For example, a network monitoring applica-
tion might be continuously maintaining aggregates over the number of currently open
connections per source or destination.
In this paper, we present an algorithm for estimating F_k, for k > 2, to within an accuracy of (1 ± ε) with confidence at least 2/3. (The method can be boosted using the median of averages technique to return high confidence estimates in the standard way [1, 2].) The algorithm handles arbitrary insertions and legal deletions (i.e., the net frequency of every item is non-negative) from the stream and generalizes the random linear combinations technique of [1, 2] designed specifically for estimating F_2. The space complexity of our method is Õ(n^{1−1/(k−1)}) and the time complexity to process each update is Õ(1), where functions of k and ε that do not involve n are treated as constants.
In [7], Coppersmith and Kumar present an algorithm for estimating F_k over general
update streams. Their algorithm has similar space complexity (i.e., Õ(n^{1−1/(k−1)})) as the
one we design in this paper. The principal differences between our work and the work
in [7] are as follows.
1. Different Technique. Our method constructs random linear combinations of the
   frequency vector using randomly chosen roots of unity, that is, we construct the
   sketch Z = Σ_l f_l x_l, where x_l is a randomly chosen kth root of unity. Coppersmith
   and Kumar construct random linear combinations C = Σ_l f_l x_l, where, for l ∈ [N],
   x_l = −1/n^{1−1/(k−1)} or 1 − 1/n^{1−1/(k−1)} with probability 1 − 1/n^{1−1/(k−1)} and 1/n^{1−1/(k−1)},
   respectively.
2. Symmetric and Simpler Algorithm. Our technique is a symmetric method for all
   k ≥ 2, and is a direct generalization of the sketch technique of Alon, Matias and
   Szegedy [1, 2]. In particular, for every k ≥ 2, E[Re Z^k] = F_k. The method of
   Coppersmith and Kumar gives complicated expressions for estimating F_k, for k ≥ 4.
   For k = 4, their estimator is C^4 − B_n F_2^2 (where B_n ≈ n^{−4/3}(1 − n^{−2/3})^2),
   and requires, in addition, an estimate of F_2 to within an accuracy factor of
   (1 ± n^{−1/3}). The estimator expressions for higher values of k (particularly, for
   powers of 2) are not shown in [7]. These expressions require auxiliary moment
   estimation and are quite complicated.
3. Time Efficient. Our method is significantly more efficient in terms of the time taken
   to process an arrival over the stream. The time complexity to process a stream
   update in our method is Õ(1), whereas the time complexity of the Coppersmith-Kumar
   technique is Õ(n^{1−1/(k−1)}).
The recent and unpublished work in [11] presents an algorithm for estimating F_k, for k > 2 and for the append-only streaming model (used by [1, 2]), with space complexity Õ(n^{1−2/(k+1)}). Although the algorithm in [11] improves on the asymptotic space complexity of the algorithm presented in this paper, it cannot handle deletion operations over the stream. Further, the method used by [11] is significantly different from the techniques used in this paper, or from the techniques used by Coppersmith and Kumar [7].

Lower bounds. The work in [1, 2] shows a space lower bound for this problem of Ω(n^{1−5/k}), for any k > 5. Subsequently, the space lower bounds have been strengthened to Ω(ε^2 n^{1−(2+ε)/k}), for k > 2 and ε > 0, by Bar-Yossef, Jayram, Kumar and Sivakumar [3], and further to Ω(n^{1−2/k}) by Chakrabarti, Khot and Sun [5]. Saks and Sun [14] show that estimating the L_p distance d between two streaming vectors to within a factor of d^δ requires space Ω(n^{1−2/p−4δ}).

Other Related Work. For the special case of computing F_2, [1, 2] presents an algorithm with O(log n + log m) space and time complexity, where m is the sum of the frequencies. Random linear combinations based on random variables drawn from stable distributions were considered by [13] to estimate F_p, for 0 < p ≤ 2. The work in [9] presents a sketch technique to estimate the difference between two streams based on the L_1 metric norm. There has been substantial work on the problem of estimating F_0
and related metrics (set expression cardinalities over streams) for the various models of
data streams [10, 1, 4, 12].
The rest of the paper is organized as follows. Section 2 describes the method and
Section 3 presents formal lemmas and their proofs. Finally we conclude in Section 4.

2 An overview of the method


In this section, we present a simple description of the algorithm and some of its proper-
ties. The lemmas and theorems stated in this section are proved formally in Section 3.
Throughout the paper, we treat k as a fixed given value larger than 1.

2.1 Sketches using random linear combinations of kth roots of unity


Let x be a randomly chosen root of the equation x^k = 1, such that each of the k roots is chosen with equal probability 1/k. Given a complex number z, its conjugate is denoted by z̄. For any j, 1 ≤ j ≤ k, the following basic property holds, as shown below.

    E[x^j] = E[x̄^j] = 0, if 1 ≤ j < k, and E[x^j] = E[x̄^j] = 1, if j = k.   (1)
     
Proof. Let j = k. Then, E[x^j] = E[x^k] = E[1] = 1, since x is a root of unity.
Let 1 ≤ j < k and let u be the elementary kth root of unity, that is, u = e^{2π√−1/k}. Then,

    E[x^j] = (1/k) Σ_{l=1}^{k} (u^l)^j = (1/k) Σ_{l=1}^{k} (u^j)^l = u^j (1 − u^{jk}) / (k(1 − u^j))

where the last equality follows from the sum of a geometric progression in the complex field. Since u^k = 1, it follows that u^{jk} = 1. Further, since u is the elementary kth root of unity, u^j = e^{2πj√−1/k} ≠ 1, for 1 ≤ j < k. Thus, the expression (1 − u^{jk})/(1 − u^j) = 0. Therefore, E[x^j] = 0, for 1 ≤ j < k.
The conjugation operator is a 1-1 and onto operator in the field of complex numbers. Further, if x is a root of x^k = 1, then x̄^k = conj(x^k) = 1̄ = 1, and therefore x̄ is also a kth root of unity. Thus, the conjugation operator, applied to the group of kth roots of unity, results in a permutation of the elements in the group (actually, it is an isomorphism). It therefore follows that the sum of the jth powers of the roots of unity is equal to the sum of the jth powers of the conjugates of the roots of unity. Thus, E[x̄^j] = E[x^j]. ⊓⊔
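Property (1) is easy to verify numerically. The following Python sketch (the helper name is ours, not the paper's) computes E[x^j] exactly by averaging over all k roots of unity:

```python
import cmath

# Numerical check of property (1): averaging x^j over all k-th roots of
# unity gives (up to floating-point error) 0 for 1 <= j < k and 1 for j = k.
def expected_power(k: int, j: int) -> complex:
    """Exact value of E[x^j] for x uniform over the k-th roots of unity."""
    roots = [cmath.exp(2j * cmath.pi * r / k) for r in range(1, k + 1)]
    return sum(x ** j for x in roots) / k
```

For instance, expected_power(5, 3) is 0 and expected_power(5, 5) is 1, up to rounding error, in agreement with equation (1).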
Let Z be the random variable defined as Z = Σ_{l∈[N]} f_l x_l. The variable x_l, for each l ∈ [N], is a randomly chosen root of x^k = 1. The family of variables {x_l} is assumed to be 2k-wise independent. The following lemma shows that Re Z^k is an unbiased estimator of F_k. Following [1, 2], we call Z a sketch. The random variable Z can be efficiently maintained with respect to stream updates as follows. First, we choose a random hash function θ : [N] → [k] drawn from a family of hash functions that is 2k-wise independent. Further, we pre-compute the kth roots of unity into an array A[1..k] of complex numbers, that is, A[r] = e^{2πr√−1/k}, for r = 1, 2, . . . , k. For every stream update (l, v), we update the sketch as follows.

    Z = Z + v · A[θ(l)]

The space required to maintain the hash function θ is Õ(k), and the time required for processing a stream update is also Õ(k).
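The update rule above can be sketched in Python as follows. This is an illustrative rendering, not the paper's implementation: for simplicity it assigns each item a fully random root of unity via a lazily filled dictionary, whereas the paper uses a 2k-wise independent hash function θ to keep the space at Õ(k).

```python
import cmath
import random

# Illustrative sketch of maintaining Z = sum_l f_l * x_l under stream
# updates (l, v). The class name and the dictionary-based stand-in for
# the hash function theta are our own simplifications.
class RootsOfUnitySketch:
    def __init__(self, k: int, seed: int = 0):
        self.k = k
        self.rng = random.Random(seed)
        # Precompute the k-th roots of unity: A[r] = e^{2*pi*i*(r+1)/k}.
        self.A = [cmath.exp(2j * cmath.pi * (r + 1) / k) for r in range(k)]
        self.theta = {}          # stand-in for the 2k-wise independent hash
        self.Z = 0 + 0j

    def update(self, l: int, v: int) -> None:
        """Process the stream update (l, v): Z <- Z + v * A[theta(l)]."""
        if l not in self.theta:
            self.theta[l] = self.rng.randrange(self.k)
        self.Z += v * self.A[self.theta[l]]

    def estimate(self) -> float:
        """Re Z^k, an unbiased estimate of F_k (Lemma 1)."""
        return (self.Z ** self.k).real
```

As Lemma 2 below shows, this single-sketch estimator is unbiased but has high variance; the hash table organization of Section 2.2 addresses this.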
 
Lemma 1. E[Re Z^k] = F_k.

As the following lemma shows, the variance of this estimator is quite high.

Lemma 2. Var[Re Z^k] = O(k^{2k} F_2^k).

This implies that Var[Re Z^k]/(E[Re Z^k])^2 = O(F_2^k/F_k^2), which could be as large as n^{k−2}. To reduce the variance, we organize the sketches in a hash table.

2.2 Organizing sketches in a hash table


Let φ : {0, 1, . . . , N−1} → [B] be a hash function that maps the domain {0, 1, . . . , N−1} into a hash table consisting of B buckets. The hash function φ is drawn from a family of hash functions H that is 2k-wise independent. The random bits used by the hash family are independent of the random bits used by the family {x_l}_{l∈{0,1,...,N−1}}; equivalently, the random bits used to generate φ and θ are independent. The indicator variable y_{l,b}, for any domain element l ∈ {0, 1, . . . , N−1} and bucket b ∈ [B], is defined as y_{l,b} = 1 if φ(l) = b and y_{l,b} = 0 otherwise. Associated with each bucket b is a sketch Z_b of the elements that have hashed to that bucket. The random variables Y_b and Z_b are defined as follows.

    Z_b = Σ_l f_l · x_l · y_{l,b},   Y_b = Re Z_b^k,   and   Y = Σ_{b∈[B]} Y_b

Maintaining the hash table of sketches in the presence of stream updates is analogous to maintaining Z. As discussed previously, let θ : {0, 1, . . . , N−1} → [k] denote a random hash function that is chosen from a 2k-wise independent family of hash functions (and independently of the bits used by φ), and let A[1..k] be an array whose jth entry is e^{2πj√−1/k}, for j = 1, . . . , k. For every stream update (l, v), we perform the following operation.

    Z_{φ(l)} = Z_{φ(l)} + v · A[θ(l)]

The time complexity of the update operation is Õ(k). The sketches in all buckets except the bucket numbered φ(l) are left unchanged.
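A hedged Python sketch of the bucketed structure just described, again with fully random φ and θ standing in for the 2k-wise independent families of the paper:

```python
import cmath
import random

# Illustrative hash table of B bucket sketches. Each update (l, v)
# touches only bucket phi(l): Z_{phi(l)} += v * A[theta(l)].
# phi and theta are stored lazily as dictionaries for simplicity.
class BucketedSketch:
    def __init__(self, k: int, B: int, seed: int = 0):
        self.k, self.B = k, B
        self.rng = random.Random(seed)
        self.A = [cmath.exp(2j * cmath.pi * (r + 1) / k) for r in range(k)]
        self.phi = {}            # item -> bucket in [B]
        self.theta = {}          # item -> root index in [k]
        self.Z = [0 + 0j] * B    # one sketch Z_b per bucket

    def update(self, l: int, v: int) -> None:
        if l not in self.phi:
            self.phi[l] = self.rng.randrange(self.B)
            self.theta[l] = self.rng.randrange(self.k)
        self.Z[self.phi[l]] += v * self.A[self.theta[l]]

    def estimate(self) -> float:
        """Y = sum_b Re Z_b^k, whose expectation is F_k (Lemma 5)."""
        return sum((z ** self.k).real for z in self.Z)
```

Note that each update touches a single bucket, which is what keeps the per-update time at Õ(k).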
The main observation of the paper is that the hash partitioning of the sketch Y into {Y_b}_{b∈[B]} reduces the variance of Y significantly, while maintaining that E[Y] = F_k. This is stated in the lemma below.

Lemma 3. Let B ≤ 2n^{1−1/k}. Then, Var[Y] = O(F_k^2 n^{k−2}/B^{k−1}).

A hash table organization of the sketches is normally used to reduce the time complexity of processing each stream update [6, 8]. However, for k > 2, the hash table organization of the sketches has the additional effect of reducing the variance.
Finally, we keep s_1 independent copies Y[0], . . . , Y[s_1 − 1] of the variable Y. The average of these variables is denoted by Ȳ; thus Var[Ȳ] = (1/s_1)Var[Y]. The main result of the paper is summarized below; it states that Ȳ estimates F_k to within an accuracy factor of (1 ± ε) with probability at least 2/3.

Theorem 4. Let n^{1−1/(k−1)} ≤ B ≤ 2 · n^{1−1/(k−1)} and s_1 = 6 · 2^k · k^{3k}/ε^2. Then,
Pr[|Ȳ − F_k| > εF_k] ≤ 1/3.

The space usage of the algorithm is therefore Õ(B · s_1) = Õ(n^{1−1/(k−1)}) bits, since a logarithmic overhead is required to store each sketch Z_b. To boost the confidence of the answer to at least 1 − 2^{−Ω(s_2)}, a standard technique of returning the median value among s_2 such average estimates can be used, as shown in [1, 2].
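The averaging-plus-median boosting just described can be sketched generically as follows; median_of_averages and the toy estimator are our own illustrative names, not the paper's.

```python
import random
from statistics import median

# Median-of-averages boosting: average s1 independent copies of a basic
# estimator to cut its variance, then take the median of s2 such averages
# to push the failure probability down to 2^{-Omega(s2)}.
def median_of_averages(estimator, s1: int, s2: int) -> float:
    averages = [sum(estimator() for _ in range(s1)) / s1 for _ in range(s2)]
    return median(averages)

# Toy usage with a noisy unbiased estimator of the value 100.
rng = random.Random(7)
estimate = median_of_averages(lambda: 100 + rng.uniform(-50, 50), s1=30, s2=9)
```

In the paper's setting, the role of estimator() is played by one independent computation of Y over the same stream.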
The algorithm assumes that the number of buckets in the hash table is B, where n^{1−1/(k−1)} ≤ B ≤ 2 · n^{1−1/(k−1)}. Since, in general, the number of distinct items in the stream is not known in advance, one possible method that can be used is as follows. First, estimate n to within a factor of (1 ± 1/8) using an algorithm for estimating F_0, such as [10, 1, 2, 4]. This can be done with high probability, in space O(log N). Keep 2 log N + 4 groups of (independent) hash tables, such that the ith group uses B_i = ⌈2^{i/2}⌉ buckets. Each group of hash tables uses the data structure described earlier. At the time of inference, n is first estimated as n̂, and then we choose the hash table group indexed by i = 2 · ⌈(1 − 1/(k−1)) log(8 · n̂/7)⌉. This ensures that the hash table size B_i satisfies n^{1−1/(k−1)} ≤ B_i ≤ 2 · n^{1−1/(k−1)}, with high probability. Since the number of hash table groups is O(log N), this construction adds an overhead factor of O(log N) to both the space complexity and the update time complexity. In the remainder of the paper, we assume that n is known exactly, with the understanding that this assumption can be removed as described.
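The group-selection rule can be sketched as follows (assuming, as the expression B_i = ⌈2^{i/2}⌉ suggests, that the logarithm is taken base 2; the function name is ours):

```python
import math

# Sketch of the group-selection rule described above: given the estimate
# n_hat of the number of distinct items, pick the group index
# i = 2 * ceil((1 - 1/(k-1)) * log2(8 * n_hat / 7)), whose table has
# B_i = ceil(2^{i/2}) buckets. Base-2 logarithm is our assumption.
def choose_group(n_hat: float, k: int) -> tuple:
    i = 2 * math.ceil((1 - 1 / (k - 1)) * math.log2(8 * n_hat / 7))
    B_i = math.ceil(2 ** (i / 2))
    return i, B_i
```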

3 Analysis

The j th frequency moment of the set of elements that map to P bucket b under the hash
j
function φ, is a random variable denoted by Fj,b . Thus, Fj,b
P = l fl yl,b . Further, since
every element in the stream hashes to exactly one bucket, b Fj,b = Fj . We define hl,b ,
for l ∈ {0, 1, . . . , N − 1} and b ∈ [B] to be hl,b = fl · yl,b . Thus, Fj,b = l hjl,b , for
P
j ≥ 1.
Notation: Marginal expectations. The random variables Y, {Y_b}_{b∈[B]} are functions of two families of random variables, namely x = {x_l}_{l∈{0,1,...,N−1}}, used to generate the random roots of unity, and y = {y_{l,b}}, l ∈ {0, 1, . . . , N−1} and b ∈ [B], used to map elements to buckets in the hash table. Our independence assumptions imply that these two families are mutually independent (i.e., their seeds use independent random bits), that is, Pr{x = u and y = v} = Pr{x = u} · Pr{y = v}. Let W = W(x, y) be a random variable that is a function of the random variables in x and y. For a fixed random choice of y = y_0, E_x[W] denotes the marginal expectation of W as a function of x; that is, E_x[W] = Σ_u W(u, y_0) Pr{x = u}. It follows that E[W] = E_y[E_x[W]].
Overview of the analysis. The main steps in the proof of Theorem 4 are as follows. In Section 3.1, we show that E_x[Y] = F_k. In Section 3.2, we show that Var[Re Z^k] ≤ k^{2k} F_2^k. In Section 3.3, using this result, we show that Var[Y] ≤ k^{2k} Σ_b E_y[F_{2,b}^k]. Section 3.4 shows that E_y[F_{2,b}^k] ≤ k^k (2/B + 2^k · n^{k−2}/B^k) F_k^2 and also concludes the proof of Theorem 4. Finally, we conclude in Section 4.
Notation: Multinomial expansion. Let X be defined as X = Σ_{l∈{0,1,...,N−1}} a_l, where a_l ≥ 0, for l ∈ {0, 1, . . . , N−1}. Then, X^k can be written as

    X^k = Σ_{s=1}^{k} Σ_{e_1+···+e_s=k, e_1>0,...,e_s>0} (k choose e_1, e_2, . . . , e_s) Σ_{l_1<l_2<···<l_s} a_{l_1}^{e_1} a_{l_2}^{e_2} · · · a_{l_s}^{e_s}

where s is the number of distinct terms in the product and e_i is the exponent of the ith product term. The indices l_i are therefore necessarily distinct, l_i ∈ {0, 1, . . . , N−1}, i = 1, 2, . . . , s. For easy reference, the above equation is written and used in the following form.

    X^k = Σ_{s,e:Q(e,s)} C(e) Σ_{l:R(e,l,s)} Π_{j=1}^{s} a_{l_j}^{e_j}.   (2)

where Q(e, s) ≡ [1 ≤ s ≤ k and e = (e_1, e_2, . . . , e_s) is s-dimensional with Σ_{j=1}^{s} e_j = k]; R(e, l, s) ≡ [l = (l_1, l_2, . . . , l_s) is s-dimensional and 0 ≤ l_1 < l_2 < · · · < l_s ≤ N−1]; and the multinomial coefficient C(e) = (k choose e_1, . . . , e_s). In this notation, the following inequality holds.

    Σ_{l:R(e,l,s)} Π_{j=1}^{s} a_{l_j}^{e_j} ≤ Π_{j=1}^{s} (Σ_l a_l^{e_j}).   (3)

By setting n = k and a_1 = a_2 = · · · = a_k = 1, we obtain

    k^k = Σ_{e,s} C(e) (k choose s) > Σ_{e,s} C(e).

Squaring both sides, we obtain k^{2k} = (Σ_{e,s} C(e) (k choose s))^2 > Σ_{e,s} C^2(e). We therefore have the following inequalities.

    Σ_{e,s} C(e) < k^k,   Σ_{e,s} C^2(e) < k^{2k}.   (4)
3.1 Expectation

In this section, we show that E[Re Z^k] = F_k, thereby proving Lemma 1, and that E_x[Y] = F_k.

Proof (of Lemma 1). Since the family of variables {x_l} is k-wise independent,

    E[Π_{j=1}^{s} x_{l_j}^{e_j}] = Π_{j=1}^{s} E[x_{l_j}^{e_j}].

Applying equation (2) to Z^k = (Σ_l f_l x_l)^k and using linearity of expectation and the k-wise independence of the x_l's, we obtain

    E[Z^k] = Σ_{s,e:Q(e,s)} C(e) Σ_{l:R(e,l,s)} (Π_{j=1}^{s} f_{l_j}^{e_j}) (Π_{j=1}^{s} E[x_{l_j}^{e_j}]).

Using equation (1), we note that the term Π_{j=1}^{s} E[x_{l_j}^{e_j}] = 0 if s > 1, since in this case e_j < k for each j = 1, . . . , s. Thus, the above summation reduces to

    E[Z^k] = Σ_l f_l^k = F_k.

Since F_k is real, E[Re Z^k] is also F_k, proving Lemma 1. ⊓⊔
Lemma 5. Suppose that the family of random variables {x_l} is k-wise independent. Then, E_x[Y_b] = F_{k,b} and E_x[Y] = E[Y] = F_k.

Proof. We first show that E_x[Y_b] = F_{k,b}. E_x[Z_b^k] = E_x[(Σ_l f_l y_{l,b} x_l)^k] = E_x[(Σ_l h_{l,b} x_l)^k], by letting h_{l,b} = f_l · y_{l,b}. By an argument analogous to the proof of Lemma 1, we obtain E_x[(Σ_l h_{l,b} x_l)^k] = Σ_l h_{l,b}^k = Σ_l f_l^k y_{l,b}^k = Σ_l f_l^k y_{l,b} = F_{k,b} (since the y_{l,b}'s are binary variables). Since F_{k,b} is always real, E_x[Y_b] = E_x[Re Z_b^k] = F_{k,b}. Finally, E_x[Y] = E_x[Σ_b Y_b] = Σ_b E_x[Y_b] = Σ_b F_{k,b} = F_k, since each element is hashed to exactly one bucket. Further, E[Y] = E_y[E_x[Y]] = E_y[F_k] = F_k. ⊓⊔

3.2 Variance of Re Z^k

In this section, we estimate the variance of Re Z^k and derive some simple corollaries.

Lemma 6. Let W = Re (Σ_l a_l x_l)^k. Then, Var[W] ≤ k^{2k} (Σ_l a_l^2)^k.

Proof. Let X = (Σ_l a_l x_l)^k. Then, Var[W] = E[W^2] − (E[W])^2 ≤ E[X X̄] − (E[W])^2. Using equation (2) for X and X̄, we obtain the following.

    X = Σ_{s,e:Q(e,s)} C(e) Σ_{l:R(e,l,s)} (Π_{j=1}^{s} a_{l_j}^{e_j}) (Π_{j=1}^{s} x_{l_j}^{e_j})

    X̄ = Σ_{t,g:Q(g,t)} C(g) Σ_{m:R(g,m,t)} (Π_{j'=1}^{t} a_{m_{j'}}^{g_{j'}}) (Π_{j'=1}^{t} x̄_{m_{j'}}^{g_{j'}})

Multiplying the above two equations, we obtain

    X · X̄ = Σ_{s,e:Q(e,s)} Σ_{t,g:Q(g,t)} C(e) · C(g) Σ_{l:R(e,l,s)} Σ_{m:R(g,m,t)}
            (Π_{j=1}^{s} a_{l_j}^{e_j}) (Π_{j'=1}^{t} a_{m_{j'}}^{g_{j'}}) (Π_{j=1}^{s} x_{l_j}^{e_j}) (Π_{j'=1}^{t} x̄_{m_{j'}}^{g_{j'}}).

The general form of the product of random variables that arises in the multinomial expansion of X X̄ is (Π_{j=1}^{s} x_{l_j}^{e_j})(Π_{j'=1}^{t} x̄_{m_{j'}}^{g_{j'}}). Since the random variables x_l are 2k-wise independent, using equation (1), it follows that

    E[(Π_{j=1}^{s} x_{l_j}^{e_j})(Π_{j'=1}^{t} x̄_{m_{j'}}^{g_{j'}})] = 1, if s = t = 1 and e_1 = g_1 = k;
                                                                    1, if s = t, t > 1, e = g and l = m;
                                                                    0, otherwise.

This directly yields the following.

    E[X X̄] = Σ_{e,s:Q(e,s)} C^2(e) Σ_{l:R(e,l,s)} Π_{j=1}^{s} a_{l_j}^{2e_j}
           ≤ Σ_{e,s} C^2(e) Π_{j=1}^{s} (Σ_l a_l^{2e_j}), by equation (3)
           ≤ Σ_{e,s} C^2(e) Π_{j=1}^{s} (Σ_l a_l^2)^{e_j}, since Σ_l a_l^{2e_j} ≤ (Σ_l a_l^2)^{e_j}
           = (Σ_l a_l^2)^k Σ_{e,s} C^2(e), since Σ_{j=1}^{s} e_j = k
           ≤ (Σ_l a_l^2)^k · k^{2k}, by equation (4). ⊓⊔

By letting a_l = f_l, l ∈ {0, 1, . . . , N−1}, Lemma 6 yields

    Var[Re Z^k] = Var[Re (Σ_l f_l x_l)^k] ≤ k^{2k} F_2^k,   (5)

which is the statement of Lemma 2. Letting a_l = h_{l,b} = f_l · y_{l,b}, where b is a fixed bucket index and l ∈ {0, 1, . . . , N−1}, yields the following equation.

    E_x[Y_b^2] ≤ k^{2k} F_{2,b}^k, for b ∈ [B].   (6)

 
3.3 Var[Y]: Vanishing of cross-bucket terms

We now consider the problem of obtaining an upper bound on Var[Y]. Note that Var[Y] = E_y[E_x[Y^2]] − (E_y[E_x[Y]])^2. From Lemma 5, E_y[E_x[Y]] = F_k. Thus,

    Var[Y] = E_y[E_x[Y^2]] − F_k^2.   (7)

Lemma 7. Var[Y] ≤ k^{2k} Σ_b E_y[F_{2,b}^k], under the independence assumptions stated above.
Proof. E_x[Y^2] = E_x[(Σ_b Y_b)^2] = E_x[Σ_b Y_b^2 + Σ_{a≠b} Y_a Y_b] = Σ_b E_x[Y_b^2] + Σ_{a≠b} E_x[Y_a Y_b].
We now consider E_x[Y_a Y_b], for a ≠ b. Recall that Y_a = Re Z_a^k (and Y_b is defined analogously). For any two complex numbers z and w, (Re z)(Re w) = (1/2) Re(z(w + w̄)). Thus, Y_a Y_b = (Re Z_a^k)(Re Z_b^k) = (1/2) Re(Z_a^k Z_b^k + Z_a^k Z̄_b^k).
Let us first consider E_x[Z_a^k Z̄_b^k]. The general term involving a product of random variables is (Π_{j=1}^{s} f_{l_j}^{e_j}) · (Π_{j'=1}^{t} f_{m_{j'}}^{g_{j'}}) · (Π_{j=1}^{s} y_{l_j,a} x_{l_j}^{e_j}) · (Π_{j'=1}^{t} y_{m_{j'},b} x̄_{m_{j'}}^{g_{j'}}). Consider the last two product terms in this expression, that is, (Π_{j=1}^{s} y_{l_j,a} x_{l_j}^{e_j}) · (Π_{j'=1}^{t} y_{m_{j'},b} x̄_{m_{j'}}^{g_{j'}}). For any 1 ≤ j ≤ s and 1 ≤ j' ≤ t, it is not possible that l_j = m_{j'}; that is, the same element cannot simultaneously hash to two distinct buckets a and b (recall that a ≠ b). By 2k-wise independence, we therefore obtain that the only way the above product term can be nonzero (i.e., 1) in expectation is that s = t = 1 and therefore e_1 = k and g_1 = k. Thus, we have E_x[Z_a^k Z̄_b^k] = Σ_{l,m} h_{l,a}^k h_{m,b}^k = F_{k,a} F_{k,b}.
Using the same observation, it can be argued that E_x[Z_a^k Z_b^k] = F_{k,a} F_{k,b}. It follows that E_x[(1/2)(Z_a^k Z_b^k + Z_a^k Z̄_b^k)] = F_{k,a} F_{k,b}, which is a real number. Therefore E_x[Y_a Y_b] = E_x[Re((1/2)(Z_a^k Z_b^k + Z_a^k Z̄_b^k))] = F_{k,a} F_{k,b}.
By equation (7), Var[Y] = E_y[E_x[Y^2]] − F_k^2. Further, from Lemma 5, F_k = Σ_b F_{k,b} (for every fixed choice of y). We therefore have

    Var[Y] = E_y[E_x[Y^2] − (Σ_b F_{k,b})^2]
           = E_y[Σ_b E_x[Y_b^2] + Σ_{a≠b} E_x[Y_a Y_b] − (Σ_b F_{k,b})^2]
           = E_y[Σ_b E_x[Y_b^2] + Σ_{a≠b} F_{k,a} F_{k,b} − (Σ_b F_{k,b})^2], by the above argument
           = E_y[Σ_b E_x[Y_b^2] − Σ_b F_{k,b}^2]
           ≤ E_y[Σ_b E_x[Y_b^2]]
           ≤ E_y[Σ_b k^{2k} F_{2,b}^k], by equation (6). ⊓⊔

3.4 Calculation of E[F_{2,b}^k]

Given a t-dimensional vector e = (e_1, . . . , e_t) such that e_i > 0, for 1 ≤ i ≤ t, and Σ_{j=1}^{t} e_j = k, we define the function ψ(e) as follows. Without loss of generality, let the exponents e_j be arranged in non-decreasing order. Let r = r(e) denote the largest index such that e_r < k/2. Then, we define

    ψ(e) = n^{Σ_{j=1}^{r} (1 − 2e_j/k)} / B^t

The motivation for this definition stems from its use in the following lemma.

Lemma 8. Suppose Σ_{j=1}^{t} e_j = k and e_j > 0, for j = 1, . . . , t. Then, Π_{j=1}^{t} F_{2e_j} ≤ ψ(e) · F_k^2 · B^t.

Proof. From [1, 2], F_j ≤ n^{1−j/k} F_k^{j/k}, if j < k, and F_j ≤ F_k^{j/k}, if j ≥ k. Thus,

    Π_{j=1}^{t} F_{2e_j} = (Π_{j=1}^{r} F_{2e_j}) (Π_{j=r+1}^{t} F_{2e_j}) ≤ (Π_{j=1}^{r} n^{1−2e_j/k} F_k^{2e_j/k}) (Π_{j=r+1}^{t} F_k^{2e_j/k})
                        = n^{Σ_{j=1}^{r}(1−2e_j/k)} F_k^{Σ_{j=1}^{t} 2e_j/k} = ψ(e) · B^t · F_k^2, since Σ_j e_j = k. ⊓⊔

The function ψ satisfies the following property, which we use later.

Lemma 9. If B < 2 · n^{1−1/k}, then ψ(e) ≤ max(2/B, 2^k · n^{k−2}/B^k).

Proof. Let e be a t-dimensional vector. If t = 1, then ψ(e) = 1/B. If t = r, then ψ(e) = n^{t−2}/B^t ≤ 2^k · n^{k−2}/B^k. If t ≥ r + 2, then ψ(e) = (1/B^t) · n^{t−((t−r)+Σ_{j=1}^{r} 2e_j/k)} < 2^t · n^{t−2}/B^t ≤ 2^k · n^{k−2}/B^k. Finally, let t = r + 1. Then,

    ψ(e) = n^{t−1−Σ_{j=1}^{r} 2e_j/k}/B^t ≤ 2^t · n^{t−1−2(t−1)/k}/B^t,

since Σ_{j=1}^{r} e_j ≥ r = t − 1. Thus,

    ψ(e) ≤ ψ(e) · (2 · n^{1−1/k}/B)^{k−t} ≤ (2^t · n^{t−1−2(t−1)/k}/B^t)(2 · n^{1−1/k}/B)^{k−t}
         = 2^k · n^{k−2−(t−2)/k}/B^k ≤ 2^k · n^{k−2}/B^k,

where the first inequality follows from the assumption that B < 2 · n^{1−1/k} and the second inequality follows because t ≥ 2. ⊓⊔

Lemma 10. Let B < 2 · n^{1−1/k}. Then, E[F_{2,b}^k] < k^k F_k^2 (2/B + 2^k · n^{k−2}/B^k).

Proof. For a fixed b, the variables y_{l,b} are k-wise independent. F_{2,b} is a linear function of the y_{l,b}. Thus, F_{2,b}^k is a symmetric multinomial of degree k, as follows.

    F_{2,b}^k = (Σ_l f_l^2 y_{l,b})^k = Σ_{s,e} C(e) Σ_{l_1<l_2<···<l_s} f_{l_1}^{2e_1} · · · f_{l_s}^{2e_s} y_{l_1,b} y_{l_2,b} · · · y_{l_s,b}.

Taking expectations, and using the k-wise independence of the y_{l,b}'s, we have

    E[F_{2,b}^k] = Σ_{s,e} C(e) Σ_{l_1<···<l_s} f_{l_1}^{2e_1} · · · f_{l_s}^{2e_s} E[y_{l_1,b} y_{l_2,b} · · · y_{l_s,b}]
               = Σ_{s,e} C(e) Σ_{l_1<···<l_s} f_{l_1}^{2e_1} · · · f_{l_s}^{2e_s} E[y_{l_1,b}] E[y_{l_2,b}] · · · E[y_{l_s,b}]
               = Σ_{s,e} C(e) Σ_{l_1<···<l_s} f_{l_1}^{2e_1} · · · f_{l_s}^{2e_s} (1/B^s), since E[y_{l_j,b}] = 1/B
               ≤ Σ_{s,e} C(e) · (1/B^s) · Π_{j=1}^{s} F_{2e_j}
               ≤ Σ_{s,e} C(e) · ψ(e) · F_k^2, by Lemma 8
               ≤ Σ_{s,e} C(e) · F_k^2 · (2/B + 2^k · n^{k−2}/B^k), by Lemma 9
               ≤ k^k · F_k^2 · (2/B + 2^k · n^{k−2}/B^k), since Σ_{s,e} C(e) < k^k. ⊓⊔

Combining the result of Lemma 7 with Lemma 10, we obtain the following bound on Var[Y].

    Var[Y] ≤ k^{3k} · F_k^2 · (2 + 2^k · n^{k−2}/B^{k−1})   (8)

Recall that Ȳ is the average of s_1 independent estimators, each calculating Y. The main theorem of the paper now follows simply.

Proof (of Theorem 4). By Chebyshev's inequality, Pr[|Ȳ − F_k| > εF_k] < Var[Ȳ]/(ε^2 F_k^2). Substituting equation (8) and the value of s_1 from Theorem 4, we have Var[Ȳ]/(ε^2 · F_k^2) ≤ 1/3. ⊓⊔

4 Conclusions

The paper presents a method for estimating the kth frequency moment, for k > 2, of data streams with general update operations. The algorithm has space complexity Õ(n^{1−1/(k−1)}) and is based on constructing random linear combinations using randomly chosen kth roots of unity. A gap remains between the lower bound for this problem, namely Ω(n^{1−2/k}), for k > 2, as proved in [3, 5], and the space complexity of the known algorithms for this problem.

References
1. Noga Alon, Yossi Matias, and Mario Szegedy. “The Space Complexity of Approximating
the Frequency Moments”. In Proceedings of the 28th Annual ACM Symposium on the Theory
of Computing (STOC), pages 20–29, Philadelphia, Pennsylvania, May 1996.
2. Noga Alon, Yossi Matias, and Mario Szegedy. “The space complexity of approximating
frequency moments”. Journal of Computer Systems and Sciences, 58(1):137–147, 1998.
3. Ziv Bar-Yossef, T.S. Jayram, Ravi Kumar, and D. Sivakumar. “An information statistics
approach to data stream and communication complexity”. In Proceedings of the 34th ACM
Symposium on Theory of Computing (STOC), 2002, pages 209–218, Princeton, NJ, 2002.
4. Ziv Bar-Yossef, T.S. Jayram, Ravi Kumar, D. Sivakumar, and Luca Trevisan. “Counting
distinct elements in a data stream”. In Proceedings of the 6th International Workshop on
Randomization and Approximation Techniques in Computer Science, RANDOM 2002, Cam-
bridge, MA, 2002.
5. Amit Chakrabarti, Subhash Khot, and Xiaodong Sun. “Near-Optimal Lower Bounds on
the Multi-Party Communication Complexity of Set Disjointness”. In Proceedings of the
18th Annual IEEE Conference on Computational Complexity, CCC 2003, Aarhus, Denmark,
2003.
6. Moses Charikar, Kevin Chen, and Martin Farach-Colton. “Finding frequent items in data
streams”. In Proceedings of the 29th International Colloquium on Automata Languages and
Programming, 2002.
7. Don Coppersmith and Ravi Kumar. “An improved data stream algorithm for estimating
frequency moments”. In Proceedings of the Fifteenth ACM-SIAM Symposium on Discrete
Algorithms, New Orleans, LA, 2004.
8. G. Cormode and S. Muthukrishnan. “What’s Hot and What’s Not: Tracking Most Frequent
Items Dynamically”. In Proceedings of the Twenty-second ACM SIGACT-SIGMOD-SIGART
Symposium on Principles of Database Systems, San Diego, California, May 2003.
9. Joan Feigenbaum, Sampath Kannan, Martin Strauss, and Mahesh Viswanathan. “An Ap-
proximate L1 -Difference Algorithm for Massive Data Streams”. In Proceedings of the 40th
Annual IEEE Symposium on Foundations of Computer Science, New York, NY, October
1999.
10. Philippe Flajolet and G.N. Martin. “Probabilistic Counting Algorithms for Database Appli-
cations”. Journal of Computer Systems and Sciences, 31(2):182–209, 1985.
11. Sumit Ganguly. “A bifocal technique for estimating frequency moments over data streams”.
Manuscript, April 2004.
12. Sumit Ganguly, Minos Garofalakis, and Rajeev Rastogi. “Processing Set Expressions over
Continuous Update Streams”. In Proceedings of the 2003 ACM SIGMOD International
Conference on Management of Data, San Diego, CA, 2003.
13. Piotr Indyk. “Stable Distributions, Pseudo Random Generators, Embeddings and Data
Stream Computation”. In Proceedings of the 41st Annual IEEE Symposium on Foundations
of Computer Science, pages 189–197, Redondo Beach, CA, November 2000.
14. M. Saks and X. Sun. “Space lower bounds for distance approximation in the data stream
model”. In Proceedings of the 34th ACM Symposium on Theory of Computing (STOC),
2002, 2002.
