Viden Io Data Analytics Bloom Filter and FM Algo Final
• x: An element
• S: A set of elements
• Input: x, S
• Output:
– TRUE if x is in S
– FALSE if x is not in S
3/13/2023 MODULE-I DATA ANALYTICS 9
Insert(x)
Compute h1(x), h2(x), ..., hk(x) and set each of these k bits in the Bloom filter to 1.
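A minimal Python sketch of the insert step, together with the matching lookup, is shown below. It assumes the k hash functions are obtained by salting SHA-256 with the function index; the slides do not specify the hash family, and the names `BloomFilter`, `insert` and `might_contain` are hypothetical.

```python
import hashlib


class BloomFilter:
    """Bit vector B of n bits queried through k hash functions (illustrative names)."""

    def __init__(self, n: int, k: int) -> None:
        self.n = n              # number of bits in B
        self.k = k              # number of hash functions
        self.bits = [0] * n     # B starts as all 0s

    def _positions(self, x: str) -> list[int]:
        # Assumed hash family: h_i(x) = SHA-256("i:x") mod n, for i = 1..k.
        positions = []
        for i in range(1, self.k + 1):
            digest = hashlib.sha256(f"{i}:{x}".encode()).digest()
            positions.append(int.from_bytes(digest[:8], "big") % self.n)
        return positions

    def insert(self, x: str) -> None:
        # Insert(x): compute h_1(x), ..., h_k(x) and set those bits to 1.
        for p in self._positions(x):
            self.bits[p] = 1

    def might_contain(self, x: str) -> bool:
        # TRUE only if all k bits are 1. This can be a false positive,
        # but an inserted element is never reported FALSE.
        return all(self.bits[p] == 1 for p in self._positions(x))


bf = BloomFilter(n=64, k=3)
for word in ["apple", "banana", "cherry"]:
    bf.insert(word)

print(bf.might_contain("banana"))   # True: inserted elements are always found
print(bf.might_contain("durian"))   # usually False, occasionally a false positive
```

The lookup is one-sided by design: a FALSE answer is always correct, while a TRUE answer is only probably correct.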
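Optimal number of hash functions: $k = \frac{n}{m}\ln 2$, where n is the number of bits in the filter and m is the number of inserted items.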
Bloom Filter -- Analysis
• What fraction of the bit vector B are 1s?
– Throwing k∙m darts at n targets
– So the fraction of 1s is $1 - e^{-km/n}$
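• What happens as we keep increasing k? Each query checks more bits, but B also fills with 1s faster, so the false-positive rate $\left(1 - e^{-km/n}\right)^k$ first falls and then rises.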
• “Optimal” value of k: $\frac{n}{m}\ln 2$
– In our case (n/m = 8 bits per item): optimal k = 8 ln 2 ≈ 5.545 ≈ 6
• Error (false-positive rate) at k = 6: $\left(1 - e^{-6/8}\right)^6 \approx 0.0216$
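The analysis above can be checked numerically. This sketch evaluates the fraction of 1s, the resulting false-positive rate and the optimal k for n/m = 8; the absolute sizes n = 8000 and m = 1000 are an assumption chosen only to give that ratio, and the function names are hypothetical.

```python
import math


def fraction_of_ones(n: int, m: int, k: int) -> float:
    # k*m darts thrown at n targets: fraction of bits set to 1 is about 1 - e^(-km/n)
    return 1 - math.exp(-k * m / n)


def false_positive_rate(n: int, m: int, k: int) -> float:
    # A non-member is reported TRUE only if all k of its bits are 1.
    return fraction_of_ones(n, m, k) ** k


def optimal_k(n: int, m: int) -> float:
    # Continuous optimum k = (n/m) ln 2
    return (n / m) * math.log(2)


n, m = 8_000, 1_000   # assumed sizes giving n/m = 8 bits per item
print(f"optimal k = {optimal_k(n, m):.3f}")
for k in range(1, 11):
    print(k, round(false_positive_rate(n, m, k), 4))
```

The sweep shows the behaviour described for increasing k: the rate drops until k = 6 and climbs again afterwards, with k = 5 and k = 6 giving almost the same value.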
Example-1
• n, the size of the Bloom filter (number of bits) = 11
• m, the number of items inserted = 2
• k, the number of hash functions = 4
• Pr(all k cells are set to 1) = $\left(1 - e^{-km/n}\right)^k = \left(1 - e^{-8/11}\right)^4 \approx 0.071$
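Example-1 can be evaluated directly. The sketch also computes the same quantity without replacing $(1 - 1/n)^{km}$ by $e^{-km/n}$; that second form is added here only for comparison (it still treats the k probed bits as independent), and shows that the e-approximation is fairly loose for a filter as small as n = 11.

```python
import math

n, m, k = 11, 2, 4  # Example-1: 11 bits, 2 items, 4 hash functions

# Formula used in the example: Pr = (1 - e^(-km/n))^k
approx = (1 - math.exp(-k * m / n)) ** k

# Before the large-n step (1 - 1/n)^(km) ~ e^(-km/n); bits still assumed independent
no_e_approx = (1 - (1 - 1 / n) ** (k * m)) ** k

print(f"with e-approximation:    {approx:.4f}")
print(f"without e-approximation: {no_e_approx:.4f}")
```

The two values come out at about 0.071 and 0.081; the gap shrinks as n grows.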
Example-2
(1/e ≈ 0.37)
Counting Distinct Elements
Definition
• A data stream consists of elements chosen from a universe of size N.
• Maintain a count of the number of distinct elements seen so far.
Example stream: {2,4,3,2,3,4,0,4,2,3,3,2}
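For reference, the exact answer for this stream can be obtained by remembering every element seen. This baseline is added only as a point of comparison; its memory grows with the number of distinct elements (up to N), which is the cost a streaming estimator such as Flajolet-Martin is meant to avoid.

```python
stream = [2, 4, 3, 2, 3, 4, 0, 4, 2, 3, 3, 2]

# Exact distinct count: store every distinct element seen so far.
seen = set()
for a in stream:
    seen.add(a)

print(len(seen), sorted(seen))  # 4 [0, 2, 3, 4]
```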
Binary Bits
Conversion to binary (3 bits):
{010,100,011,010,011,100,000,100,010,011,011,010}
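The conversion step is a fixed-width binary formatting of each value. The width of 3 bits is assumed from the example, since every value lies in 0..7.

```python
stream = [2, 4, 3, 2, 3, 4, 0, 4, 2, 3, 3, 2]

width = 3  # enough bits for the values 0..7 in this example
binary = [format(a, f"0{width}b") for a in stream]
print(binary)
```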
Trailing Zeros
Computing r(a), the number of trailing zeros in the binary form of a (this example takes r(0) = 0):
{1,2,0,1,0,2,0,2,1,0,0,1}
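A small sketch of r(a) follows. Because 000 has no 1 bit, its trailing-zero count is a matter of convention; the code follows the example's choice r(0) = 0. The function name `r` simply mirrors the slide notation.

```python
def r(a: int) -> int:
    """Number of trailing zeros in the binary form of a (r(0) = 0 by convention)."""
    if a == 0:
        return 0
    count = 0
    while a % 2 == 0:   # lowest bit is 0: shift it out and count it
        a //= 2
        count += 1
    return count


stream = [2, 4, 3, 2, 3, 4, 0, 4, 2, 3, 3, 2]
print([r(a) for a in stream])
```

Each 3 (binary 011) ends in a 1, so it contributes r = 0, including the 11th element of the stream.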
Distinct Elements
So, r(a):
{1,2,0,1,0,2,0,2,1,0,0,1}
R = max r(a) = 2
Estimate = $2^R = 2^2 = 4$, which matches the true number of distinct values {0, 2, 3, 4}.
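The whole procedure fits in a few lines: track R = max r(h(a)) over the stream and report $2^R$. In this sketch the hash h defaults to the identity, because the worked example applies r to the values directly; the alternative hash h(a) = (3a + 1) mod 5 in the usage lines is hypothetical and is included only to show how coarse a single-hash estimate can be.

```python
from typing import Callable, Iterable


def trailing_zeros(a: int) -> int:
    # r(a); r(0) is taken as 0, matching the worked example
    if a == 0:
        return 0
    return (a & -a).bit_length() - 1   # index of the lowest 1 bit


def fm_estimate(stream: Iterable[int], h: Callable[[int], int] = lambda a: a) -> int:
    # Single-hash Flajolet-Martin: R = max r(h(a)) over the stream, estimate = 2^R.
    # Only R is stored, not the elements themselves.
    R = 0
    for a in stream:
        R = max(R, trailing_zeros(h(a)))
    return 2 ** R


stream = [2, 4, 3, 2, 3, 4, 0, 4, 2, 3, 3, 2]
print(fm_estimate(stream))                            # identity hash, as in the example
print(fm_estimate(stream, lambda a: (3 * a + 1) % 5)) # hypothetical hash function
```

With the hypothetical hash the estimate drops to 2 although the stream has 4 distinct values. Practical versions combine the estimates from many hash functions (for example, medians of averages) to reduce this variance.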