Review Notes: Internet Algorithm Models
1. Background
$\Pr(E_1 \cup E_2) = \Pr(E_1) + \Pr(E_2) - \Pr(E_1 \cap E_2)$
Applications:
- 2 independent events: $\Pr(A \cap B) = \Pr(A)\Pr(B)$
- 2 disjoint events: $\Pr(E_1 \cup E_2) = \Pr(E_1) + \Pr(E_2)$
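2. Theory summary
Main topics | Content | Take note

1. Verifying Polynomial Identities

2. Axioms of Probability

3. Verifying Matrix Multiplication
All the $r_j$ are chosen at random. Write $D = AB - C$ and assume, without loss of generality, that $d_{ii} \neq 0$. A wrong product goes undetected only if
$\sum_{j=1}^{n} d_{ij} r_j = 0 \iff r_i = -\frac{\sum_{j=1, j \neq i}^{n} d_{ij} r_j}{d_{ii}}$   (3.1)
Suppose we have already chosen every $r_j$ (j = 1 to n) at random, so that only $r_i$ is left. At that point $\sum_{j=1, j \neq i}^{n} d_{ij} r_j$ takes some fixed value. Applying the principle of deferred decisions, the probability that the choice of $r_i$ satisfies equation (3.1) is at most 1/2, because $r_i$ can only take the value 0 or 1.
So the failure probability of one run of ALGORITHM(1.3) is at most 1/2. Running it n times independently gives a failure probability of at most $(1/2)^n$.

Take note:
Example: let x1, x2, x3, x4, x5, x6 be 6 numbers chosen at random. Compute the probability that x1 + x2 + x3 + x4 + x5 + x6 is divisible by 6.
Fix x1, ..., x5 first. Whatever their sum is modulo 6 (it may be 0, 1, or any other residue), exactly one residue of x6 makes the total divisible by 6. Assuming x6 is uniform modulo 6, this probability is 1/6.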
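The randomized check above can be sketched in a few lines. This is a minimal illustration, assuming square or compatible integer matrices stored as lists of rows; the names `mat_vec` and `verify_product` are hypothetical and do not come from the notes.

```python
import random


def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a column vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def verify_product(a, b, c, rounds=20, rng=None):
    """Randomized test of the claim A * B == C.

    Each round draws r with independent 0/1 entries and compares A(Br) with Cr.
    If A * B != C, a single round misses the error with probability at most 1/2,
    so `rounds` independent rounds miss it with probability at most (1/2)**rounds.
    """
    rng = rng or random.Random()
    n = len(c[0])  # r needs one entry per column of C
    for _ in range(rounds):
        r = [rng.randint(0, 1) for _ in range(n)]
        if mat_vec(a, mat_vec(b, r)) != mat_vec(c, r):
            return False  # a witness r was found: definitely A * B != C
    return True  # no witness found: A * B == C with high probability
```

A `False` answer is always correct; only a `True` answer carries the one-sided error bounded above.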
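4. A Randomized Min-Cut Algorithm (Karger Algorithm)
Let $E_i$ be the event that the edge contracted in iteration i is not in the min-cut, and $F_i = \bigcap_{j \le i} E_j$. Let k be the size of the min-cut.
Iteration 1: every vertex has degree at least k, so the graph has at least $nk/2$ edges and $\Pr(E_1) \ge 1 - \frac{k}{nk/2} = 1 - \frac{2}{n}$.
Iteration 2: after the first contraction the graph has n-1 vertices left.
Therefore $\Pr(E_2 \mid F_1) \ge 1 - \frac{2}{n-1}$.
Similarly, at iteration i the graph has n - i + 1 vertices left, so
$\Pr(E_i \mid F_{i-1}) \ge 1 - \frac{2}{n-i+1}$.
Putting this together:
$\Pr(F_{n-2}) = \Pr(E_{n-2} \cap F_{n-3})$
$= \Pr(E_{n-2} \mid F_{n-3}) \Pr(F_{n-3})$
$= \dots$
$\ge \prod_{i=1}^{n-2} \left(1 - \frac{2}{n-i+1}\right) = \prod_{i=1}^{n-2} \frac{n-i-1}{n-i+1} = \frac{2}{n(n-1)}$
We take the smallest result over $n(n-1)\ln n$ runs of the program using ALGORITHM 1.4:
$\Pr(\text{fail}) \le \left(1 - \frac{2}{n(n-1)}\right)^{n(n-1)\ln n} \le e^{-2 \ln n} = \frac{1}{n^2}$

Take note:
Inequality: $1 - x \le e^{-x}$. You can prove this inequality by taking the derivative.
Remark: the failure probability does not depend on the number of edges of the graph, only on the number of vertices.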
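A compact way to experiment with the contraction algorithm is sketched below. It is only an illustration, assuming a connected multigraph given as a list of `(u, v)` pairs over vertices `0..n-1`; the function names are hypothetical. Instead of picking a random remaining edge at each step, it processes the edges in a uniformly random order and skips edges whose endpoints are already merged, which selects contractions with the same distribution.

```python
import math
import random


def contract_until_two(edges, n, rng):
    """One run of random contraction; returns the size of the cut it finds."""
    parent = list(range(n))  # union-find forest over the merged vertices

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    order = list(edges)
    rng.shuffle(order)
    groups = n
    for u, v in order:
        if groups == 2:
            break
        root_u, root_v = find(u), find(v)
        if root_u != root_v:  # skip self-loops of the contracted graph
            parent[root_u] = root_v
            groups -= 1
    # Edges whose endpoints ended in different groups form the cut.
    return sum(1 for u, v in edges if find(u) != find(v))


def min_cut(edges, n, runs=None, seed=None):
    """Smallest cut over repeated runs; defaults to about n(n-1) ln n runs."""
    rng = random.Random(seed)
    if runs is None:
        runs = math.ceil(n * (n - 1) * math.log(n))
    return min(contract_until_two(edges, n, rng) for _ in range(runs))
```

Every run returns the size of some genuine cut, so the minimum over runs can only overestimate the true min-cut, never underestimate it.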
3. Exercises:
http://docs.google.com/View?id=dgmqjfk5_188cq53p6ft
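1. Background
1.1. The inclusion-exclusion principle:
$\Pr(E_1 \cup E_2) = \Pr(E_1) + \Pr(E_2) - \Pr(E_1 \cap E_2)$
Applications:
- 2 independent events: $\Pr(A \cap B) = \Pr(A)\Pr(B)$
- 2 disjoint events: $\Pr(E_1 \cup E_2) = \Pr(E_1) + \Pr(E_2)$
1.2. Bayes' Law (for disjoint events $E_1$, $E_2$ that cover the sample space):
$\Pr(E_1 \mid B) = \frac{\Pr(B \cap E_1)}{\Pr(B)} = \frac{\Pr(B \cap E_1)}{\Pr(B \cap E_1) + \Pr(B \cap E_2)}$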
2. Theory summary
Main topics | Content | Take note

Take note: therefore $\sum_{y} \Pr((X = x) \cap (Y = y)) = \Pr(X = x)$.

2. The Bernoulli and Binomial Random Variables [or Bernoulli and indicator random variable]
Consider the outcome of an experiment:
Y = 1 if the outcome is a success,
Y = 0 otherwise,
with Pr(Y = 1) = p; E[Y] = 1 · p + 0 · (1-p) = p.

Binomial Random Variable
We call X a binomial random variable with parameters n and p if:
$\Pr(X = k) = \binom{n}{k} p^k (1-p)^{n-k}$, for k = 0, 1, ..., n.

Take note:
There are 2 ways to define a random variable:
1. A definition based on logic.
2. A definition based on probability, that is, stating the set $\Omega$ and the probability of each event in $\Omega$.
The second way gives a more rigorous definition and is used more often.
3. Conditional Expectation
Consider the part of the sample space $\Omega$ on which Z = z.
$E[Y \mid Z = z] = \sum_{y} y \Pr(Y = y \mid Z = z)$
is called the expectation of the random variable Y conditioned on Z = z.

Decomposition Law:
$E[X] = \sum_{y} \Pr(Y = y) E[X \mid Y = y]$
The proof of this formula is similar to the proof of linearity of expectation.

Theorem on the expectation of an expectation:
$E[Y] = E[E[Y \mid Z]]$
Proof: let $f(Z) = E[Y \mid Z]$. Then $E[E[Y \mid Z]] = E[f(Z)]$ and
$E[f(Z)] = \sum_{z} \Pr(Z = z) f(z) = \sum_{z} \Pr(Z = z) E[Y \mid Z = z] = E[Y]$
(the last equality follows from the decomposition law).

Take note:
Example: consider 2 standard dice (standard meaning 6 faces, each face with probability 1/6 and showing a different number from 1 to 6). Roll them once to get 2 numbers X1 and X2. Let X = X1 + X2.
$E[X \mid X_1 = 2] = \sum_{x} x \Pr(X = x \mid X_1 = 2)$
Note that $6 \ge X_1, X_2 \ge 1$; here $X_1 = 2$, so $8 \ge X \ge 3$.
$E[X \mid X_1 = 2] = \sum_{x=3}^{8} x \Pr(X = x \mid X_1 = 2) = \sum_{x=3}^{8} x \cdot \frac{1}{6} = \frac{11}{2}$

Compare: if $E_1$, $E_2$ are mutually exclusive events ($E_1 \cap E_2 = \emptyset$) that together fill the sample space, then for any event B:
$\Pr(B) = \Pr(B \cap E_1) + \Pr(B \cap E_2)$
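The two-dice example and the identity $E[Y] = E[E[Y \mid Z]]$ can be checked by exact enumeration. The sketch below is only an illustration; the helper name `expectation` and the constant `ROLLS` are assumptions, not part of the notes.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes (x1, x2) of two standard dice.
ROLLS = list(product(range(1, 7), repeat=2))


def expectation(value, condition=lambda x1, x2: True):
    """Exact E[value | condition] over the uniform two-dice sample space."""
    matching = [roll for roll in ROLLS if condition(*roll)]
    return Fraction(sum(value(*roll) for roll in matching), len(matching))


def total(x1, x2):
    return x1 + x2


# E[X | X1 = 2] for X = X1 + X2.
e_given_two = expectation(total, lambda x1, x2: x1 == 2)

# Decomposition law: sum over z of Pr(X1 = z) * E[X | X1 = z].
decomposed = sum(
    Fraction(1, 6) * expectation(total, lambda x1, x2, z=z: x1 == z)
    for z in range(1, 7)
)
```

Here `e_given_two` reproduces the value 11/2 from the example, and `decomposed` equals the unconditional `expectation(total)`.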
4. The Geometric Distribution
X is a geometric random variable with parameter p if:
$\Pr(X = n) = (1-p)^{n-1} p$
From this we can compute:
$\Pr(X \ge n) = \sum_{i \ge n} \Pr(X = i) = \sum_{i \ge n} (1-p)^{i-1} p = (1-p)^{n-1} p \cdot \frac{1}{1 - (1-p)} = (1-p)^{n-1}$

Take note:
The definition of a geometric random variable is given in the form of a probability distribution (see Chapter 1).
In terms of meaning, X being a geometric random variable with parameter p means X is the number of trials needed to obtain the first success.
1. Binomial Random Variable:
The number of trials: fixed = n
The number of successes: X
2. Geometric Random Variable:
The number of trials: X
The number of successes: fixed = 1

Geometric Expectation
Formula for the expectation of a positive-integer variable: let X be a discrete random variable that takes only positive integer values. Then
$E[X] = \sum_{i \ge 1} \Pr(X \ge i)$
Applying this formula, the expectation of a geometric random variable is
$E[X] = \sum_{i \ge 1} (1-p)^{i-1} = \frac{1}{p}$

Proof (of the expectation formula for a positive-integer variable):
$E[X] = \sum_{j \ge 1} j \Pr(X = j) = \sum_{j \ge 1} \sum_{i=1}^{j} \Pr(X = j) = \sum_{i \ge 1} \sum_{j \ge i} \Pr(X = j) = \sum_{i \ge 1} \Pr(X \ge i)$

Extra: Coupon Collector's Problem
Problem: there are n types of coupons in a box, and the number of coupons of each type is very large. Each time we draw 1 coupon. How many draws do we need to collect all n types of coupons?

Take note: this problem has many practical applications, so read both the method of analysis and the solution carefully.

Problem Analysis:
The problem asks for the number of draws needed to collect n types. What, then, is the difference between having n-1 types and having n types? Once we hold 1 type of coupon, getting another type is very easy: the probability that the next draw gives a new type is (n-1)/n. When we already hold n-1 types, the probability of drawing the n-th type is only 1/n.
So drawing one more new type does not depend on what we did before, only on the number of types collected up to the present moment. That is, the number of coupons we must draw to go from i-1 types to i types depends only on the value of i.

Take note: this uses the technique of a branching process with 0 generations in memory, i.e. memoryless.

Proof:
Let $X_i$ be the number of coupons drawn from the moment we hold i-1 types until the moment we hold i types.
Each $X_i$ (i = 1, 2, ..., n) is a geometric random variable with parameter
$p_i = 1 - \frac{i-1}{n} = \frac{n-i+1}{n}$.
Hence:
$E[X_i] = \frac{1}{p_i} = \frac{n}{n-i+1}$
Hence:
$E[X] = E\left[\sum_{i=1}^{n} X_i\right] = \sum_{i=1}^{n} E[X_i] = \sum_{i=1}^{n} \frac{n}{n-i+1}$
Setting k = n - i + 1 we get:
$E[X] = n \sum_{k=1}^{n} \frac{1}{k} = n H(n)$

Take note:
H(n) is called the Harmonic number.
$H(n) = \ln(n) + \Theta(1)$.
To prove this we only need the integral inequality from 1 to n for the function f(x) = 1/x:
$f(\lceil x \rceil) \le f(x) \le f(\lfloor x \rfloor)$
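The result $E[X] = nH(n)$ is easy to compare against a simulation. The sketch below is an illustration only, assuming each draw is uniform over the n types; the function names are hypothetical.

```python
import random
from fractions import Fraction


def harmonic(n):
    """Exact harmonic number H(n) = 1 + 1/2 + ... + 1/n."""
    return sum(Fraction(1, k) for k in range(1, n + 1))


def expected_draws(n):
    """Exact expected number of draws to collect all n coupon types: n * H(n)."""
    return n * harmonic(n)


def simulate_draws(n, rng):
    """Draw uniformly random coupon types until all n types have been seen."""
    seen = set()
    draws = 0
    while len(seen) < n:
        seen.add(rng.randrange(n))
        draws += 1
    return draws


def average_draws(n, trials, seed=0):
    """Empirical mean of simulate_draws over independent trials."""
    rng = random.Random(seed)
    return sum(simulate_draws(n, rng) for _ in range(trials)) / trials
```

For a moderate n, `average_draws(n, trials)` should settle close to `float(expected_draws(n))` as the number of trials grows.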
5. Application: The Expected Run-Time of Quicksort
Quicksort is a relatively simple and efficient algorithm. The crux of Quicksort is choosing the pivot sensibly, so as to avoid the algorithm's n^2 worst case. If we choose the pivot at random, does the algorithm become better? To answer this question we analyze the running time of Quicksort with a randomly chosen pivot.

Probabilistic Analysis:
In Quicksort, once the pivot has been chosen, our job is to compare the pivot with each number in the subarray. On closer analysis, this comparison is the characteristic statement of the loop in Quicksort. Therefore, counting these comparisons gives the running time of the algorithm. Call the number of comparisons X.
Let y_1, y_2, ..., y_n be the sequence in sorted order. Let X_ij be the Bernoulli random variable with:
X_ij = 1 if y_i and y_j are compared at some point during the sort, for i, j = 1, 2, ..., n; i < j;
X_ij = 0 otherwise.
We have:
$X = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} X_{ij}$
Hence:
$E[X] = E\left[\sum_{i=1}^{n-1} \sum_{j=i+1}^{n} X_{ij}\right] = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} E[X_{ij}]$   (5.1)
The next task is to compute $E[X_{ij}]$.
Consider the numbers from position i to position j: y_i, y_{i+1}, ..., y_j (i < j). Clearly at some point a pivot is chosen from this range. y_i and y_j are compared exactly when that first pivot is y_i or y_j, and each of the j - i + 1 numbers is equally likely to be it, so
$E[X_{ij}] = \frac{2}{j-i+1}$.
Substituting into (5.1) and setting k = j - i + 1:
$E[X] = \sum_{k=2}^{n} (n-k+1) \frac{2}{k} = (2n+2) \sum_{k=2}^{n} \frac{1}{k} - 2(n-2+1)$
Simplifying, we get:
$E[X] = (2n+2)H(n) - 4n$, that is, $E[X] = \Theta(n \ln n)$.

Take note: the technique used here is the Bernoulli random variable (indicator) technique, together with the Harmonic number H(n).
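To see the formula $(2n+2)H(n) - 4n$ in action, one can count pivot comparisons in a randomized Quicksort and compare the average with the exact value. The sketch below assumes distinct keys and counts one comparison per non-pivot element per partition step; the function names are hypothetical.

```python
import random
from fractions import Fraction


def quicksort_count(items, rng):
    """Sort a list with random pivots; return (sorted_list, comparisons).

    Each non-pivot element is charged one comparison against the pivot,
    which is the quantity X analyzed in the notes.
    """
    if len(items) <= 1:
        return list(items), 0
    pivot_index = rng.randrange(len(items))
    pivot = items[pivot_index]
    rest = items[:pivot_index] + items[pivot_index + 1:]
    smaller = [x for x in rest if x < pivot]
    larger = [x for x in rest if x >= pivot]
    left, left_count = quicksort_count(smaller, rng)
    right, right_count = quicksort_count(larger, rng)
    return left + [pivot] + right, len(rest) + left_count + right_count


def expected_comparisons(n):
    """Exact E[X] = (2n + 2) H(n) - 4n for n distinct keys."""
    h_n = sum(Fraction(1, k) for k in range(1, n + 1))
    return (2 * n + 2) * h_n - 4 * n


def average_comparisons(n, trials, seed=0):
    """Empirical mean comparison count over independent random runs."""
    rng = random.Random(seed)
    data = list(range(n))
    return sum(quicksort_count(data, rng)[1] for _ in range(trials)) / trials
```

Because the pivot is random, the average does not depend on the input order, only on n.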
Exercises
Exercise 2.3:
1. Let f(x) be a convex function (f''(x) >= 0). Prove that: $E[f(X)] \ge f(E[X])$
2. Prove that: $E[X^k] \ge (E[X])^k$
Exercise 2.6: there are 2 standard dice (standard meaning 6 faces, each face with probability 1/6 and showing a different number from 1 to 6). Rolling the dice gives 2 numbers X1 and X2. Let X = X1 + X2.
(a) Compute E[X | X1 is even]
(b) Compute E[X | X1 = X2]
(c) Compute E[X1 | X = 9]
(d) Compute E[X1 - X2 | X = k]
Exercise 2.7: let X and Y be 2 independent geometric random variables with parameters p and q respectively.
(a) Compute Pr(X = Y)
(b) Compute E[max(X,Y)]
(c) Compute Pr(min(X,Y) = k)
(d) Compute $E[X \mid X \le Y]$
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Answers:
____________________________________________________
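Exercise 2.3:
1. Let f(x) be a convex function (f''(x) >= 0). Prove that: $E[f(X)] \ge f(E[X])$
Apply the Taylor expansion around the point $\mu = E[X]$: for some c between $\mu$ and x,
$f(x) = f(\mu) + \frac{f'(\mu)(x-\mu)}{1!} + \frac{f''(c)(x-\mu)^2}{2!} \ge f(\mu) + \frac{f'(\mu)(x-\mu)}{1!}$
Taking expectations of both sides:
$E[f(X)] \ge E\left[f(\mu) + \frac{f'(\mu)(X-\mu)}{1!}\right] = E[f(\mu)] + E\left[\frac{f'(\mu)(X-\mu)}{1!}\right] = f(\mu) = f(E[X])$.

Exercise 2.6:
(a) $E[X \mid X_1 \text{ even}] = E[X_1 \mid X_1 \text{ even}] + E[X_2] = 4 + \frac{1}{6}\sum_{x=1}^{6} x = 4 + \frac{7}{2} = \frac{15}{2}$
(b) $E[X \mid X_1 = X_2] = \sum_{x=1}^{6} 2x \cdot \frac{1}{6} = \frac{1}{3}\sum_{x=1}^{6} x = \frac{21}{3} = 7$
(c) $E[X_1 \mid X = 9] = \sum_{x=3}^{6} x \Pr(X_1 = x \mid X = 9) = \sum_{x=3}^{6} x \frac{\Pr(X_1 = x \cap X = 9)}{\Pr(X = 9)} = \sum_{x=3}^{6} x \cdot \frac{1/36}{4/36} = \frac{1}{4}(3+4+5+6) = \frac{9}{2}$
(d) $E[X_1 - X_2 \mid X = k] = 0$.
Because X1 and X2 are independent and play identical roles in the expression above, $E[X_1 \mid X = k]$ and $E[X_2 \mid X = k]$ must be equal.
(A proof by contradiction is also a good approach, because E[X] is defined as a mapping from R to R.)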
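The four conditional expectations of Exercise 2.6 can be confirmed by enumerating the 36 outcomes. The helper name `cond_exp` below is an assumption made for this sketch.

```python
from fractions import Fraction
from itertools import product


def cond_exp(value, condition):
    """Exact E[value(X1, X2) | condition(X1, X2)] for two fair dice."""
    rolls = [r for r in product(range(1, 7), repeat=2) if condition(*r)]
    return Fraction(sum(value(*r) for r in rolls), len(rolls))


answer_a = cond_exp(lambda a, b: a + b, lambda a, b: a % 2 == 0)  # E[X | X1 even]
answer_b = cond_exp(lambda a, b: a + b, lambda a, b: a == b)      # E[X | X1 = X2]
answer_c = cond_exp(lambda a, b: a, lambda a, b: a + b == 9)      # E[X1 | X = 9]
answers_d = [
    cond_exp(lambda a, b: a - b, lambda a, b, k=k: a + b == k)    # E[X1 - X2 | X = k]
    for k in range(2, 13)
]
```

The list `answers_d` covers every possible sum k from 2 to 12, so it checks the symmetry argument for part (d) in full.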
____________________________________________________
Exercise 2.7: let X and Y be 2 independent geometric random variables with parameters p and q respectively.
(a) Compute Pr(X = Y)
$\Pr(X = Y) = \sum_{n \ge 1} \Pr(X = n \cap Y = n) = \sum_{n \ge 1} \Pr(X = n)\Pr(Y = n)$
$= \sum_{n \ge 1} (1-p)^{n-1} p (1-q)^{n-1} q = pq \sum_{n \ge 1} ((1-p)(1-q))^{n-1}$
$= pq \cdot \frac{1}{1 - (1-p)(1-q)} = \frac{pq}{p + q - pq}$
(b) Compute E[max(X,Y)]
Since X and Y are geometric with parameters p and q, E[X] = 1/p and E[Y] = 1/q.
Let X1 be a Bernoulli random variable with
X1 = TRUE if and only if X = 1, i.e. the first trial succeeds. Pr(X1 = TRUE) = p;
X1 = FALSE otherwise.
E[max(X,Y)] = Pr(X1 = TRUE) E[max(X,Y) | X1 = TRUE] + Pr(X1 = FALSE) E[max(X,Y) | X1 = FALSE]
= p * E[Y] + (1-p) * E[max(X,Y) | X1 = FALSE]   (b-1)
(because X1 = TRUE if and only if X = 1, and then max(X,Y) = Y).
When X > 1, let X* be the number of further trials needed until the first success. Then E[X | X1 = FALSE] = E[X* + 1], and
E[max(X,Y)] = p * E[Y] + (1-p) * E[max(X* + 1, Y)]
Let Y1 and Y* be the analogues of X1 and X*, with X replaced by Y. Proceeding as above we obtain:
E[max(X,Y)] = p * E[Y] + (1-p) * ( q * E[max(X* + 1, Y) | Y1 = TRUE] + (1-q) * E[max(X* + 1, Y* + 1)] )
= p * E[Y] + (1-p) * ( q * E[X* + 1] + (1-q) * E[max(X*,Y*) + 1] ).
By the memoryless property of the geometric distribution, E[X*] = E[X], E[Y*] = E[Y] and E[max(X*,Y*)] = E[max(X,Y)], with E[X] = 1/p and E[Y] = 1/q. Substituting:
E[max(X,Y)] = p/q + (1-p) * ( q * (1/p + 1) + (1-q) * (E[max(X,Y)] + 1) )
Hence:
$E[\max(X, Y)] = \frac{\frac{p}{q} + \frac{q}{p} + 1 - p - q}{p + q - pq}$
(c) Compute Pr(min(X,Y) = k)
$\Pr(\min(X,Y) = k) = (1-p)^{k-1} p (1-q)^{k-1} q + (1-p)^{k-1} p (1-q)^{k} + (1-p)^{k} (1-q)^{k-1} q$
$= (1-p)^{k-1} (1-q)^{k-1} (p + q - pq)$
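Hence: $E[X_i \mid X_{i-1}] = X_{i-1} \cdot \frac{X_{i-1}}{n} + (X_{i-1} + 1)\left(1 - \frac{X_{i-1}}{n}\right) = 1 + X_{i-1}\left(1 - \frac{1}{n}\right)$
Taking the expectation of both sides: $E[X_i] = E[E[X_i \mid X_{i-1}]] = 1 + a E[X_{i-1}]$, with a = 1 - 1/n.
Unrolling this recurrence we obtain: $E[X_{2n}] = a^{2n-1} X_1 + a^{2n-2} + \dots + a + 1$, with a = 1 - 1/n.
Substituting $X_1 = 1$ and simplifying:
$E[X_{2n}] = a^{2n-1} + a^{2n-2} + \dots + a + 1 = \frac{1 - a^{2n}}{1 - a} = n\left(1 - \left(1 - \frac{1}{n}\right)^{2n}\right) \approx n(1 - e^{-2})$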
____________________________________________________
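Exercise 2.22: the input is a random sequence of n numbers: a_1, a_2, ..., a_n. Each number a_i is swapped with its adjacent number until it reaches its sorted position.
Compute the Expected Number of Swaps of Bubble Sort.
Proof:
We say a_i and a_j are inverted if (i < j) AND (a_i > a_j). Each swap of adjacent numbers removes exactly one inversion, so the number of swaps equals the number of inverted pairs.
Let X_ij be a Bernoulli random variable with:
X_ij = 1 if a_i and a_j form an inverted pair.
Assuming each a_i is drawn independently and uniformly from {1, ..., n}:
$\Pr(X_{ij} = 1) = \sum_{k=1}^{n} \Pr(a_j = k \cap a_i > k) = \sum_{k=1}^{n} \frac{1}{n} \cdot \frac{n-k}{n} = 1 - \frac{1}{n^2}\sum_{k=1}^{n} k = \frac{1}{2} - \frac{1}{2n}$
$E[X] = E\left[\sum_{i=1}^{n-1} \sum_{j=i+1}^{n} X_{ij}\right] = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \left(\frac{1}{2} - \frac{1}{2n}\right) = \frac{(n-1)^2}{4}$
If instead the input is a uniformly random permutation of n distinct numbers, each pair is inverted with probability 1/2 and $E[X] = \frac{n(n-1)}{4}$.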
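Both input models for the inversion count can be checked by brute force on small n. The sketch below is illustrative; the function names are hypothetical, and the enumeration is only practical for small n.

```python
from fractions import Fraction
from itertools import permutations, product


def inversions(seq):
    """Number of pairs i < j with seq[i] > seq[j]."""
    return sum(
        1
        for i in range(len(seq))
        for j in range(i + 1, len(seq))
        if seq[i] > seq[j]
    )


def bubble_sort_swaps(seq):
    """Run bubble sort on a copy and count adjacent swaps."""
    data = list(seq)
    swaps = 0
    for end in range(len(data) - 1, 0, -1):
        for i in range(end):
            if data[i] > data[i + 1]:
                data[i], data[i + 1] = data[i + 1], data[i]
                swaps += 1
    return swaps


def mean_inversions(sequences):
    """Exact average inversion count over an explicit list of equally likely inputs."""
    sequences = list(sequences)
    return Fraction(sum(inversions(s) for s in sequences), len(sequences))


n = 4
independent_model = mean_inversions(product(range(1, n + 1), repeat=n))
permutation_model = mean_inversions(permutations(range(1, n + 1)))
```

For n = 4, `independent_model` corresponds to $(n-1)^2/4$ and `permutation_model` to $n(n-1)/4$.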
____________________________________________________
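Exercise 2.23: the input is a random sequence of n numbers: a_1, a_2, ..., a_n.
Compute the Expected Runtime of Linear Insertion Sort.
Proof:
After sorting, the numbers are in the order 1, 2, ..., n.
Suppose that before sorting, the numbers 1, 2, ..., n are at positions x_1, x_2, ..., x_n respectively. Here (x_1, x_2, ..., x_n) is a permutation of (1, 2, ..., n).
The notes charge |x_i - i| swaps for bringing the number at position x_i to position i. The total number of swaps is then
$X = \sum_{i=1}^{n} |x_i - i|$
Each x_i is uniform over {1, ..., n}, so
$E[|x_i - i|] = \sum_{k=1}^{i-1} \Pr(x_i = k)(i - k) + \sum_{k=i+1}^{n} \Pr(x_i = k)(k - i) = \frac{1}{n}\left(\sum_{k=1}^{i-1} (i-k) + \sum_{k=i+1}^{n} (k-i)\right) = \frac{1}{n}\left(\sum_{j=1}^{i-1} j + \sum_{j=1}^{n-i} j\right)$
Hence:
$E[X] = \sum_{i=1}^{n} E[|x_i - i|] = \frac{1}{n}\left(\sum_{i=1}^{n}\sum_{j=1}^{i-1} j + \sum_{i=1}^{n}\sum_{j=1}^{n-i} j\right)$
Swapping the order of summation:
$E[X] = \frac{1}{n}\left(\sum_{j=1}^{n-1} (n-j)j + \sum_{j=1}^{n-1} (n-j)j\right) = \frac{2}{n}\sum_{j=1}^{n-1} (n-j)j$
$= \frac{2}{n}\left(n \sum_{j=1}^{n-1} j - \sum_{j=1}^{n-1} j^2\right) = \frac{2}{n}\left(\frac{n^2(n-1)}{2} - \frac{(n-1)n(2n-1)}{6}\right) = \frac{n^2 - 1}{3}$
So $E[X] = \Theta(n^2)$.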
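The closed form $(n^2 - 1)/3$ for the expected total displacement $\sum_i |x_i - i|$ of a random permutation can be verified exhaustively for small n. The function name below is an assumption made for this sketch.

```python
from fractions import Fraction
from itertools import permutations


def expected_total_displacement(n):
    """Exact average of sum |x_i - i| over all permutations of 1..n."""
    perms = list(permutations(range(1, n + 1)))
    total = sum(
        sum(abs(x - i) for i, x in enumerate(perm, start=1))
        for perm in perms
    )
    return Fraction(total, len(perms))
```

The enumeration grows as n!, so it is meant only as a sanity check on the algebra for n up to about 8.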
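1. Background
1.1. The inclusion-exclusion principle:
$\Pr(E_1 \cup E_2) = \Pr(E_1) + \Pr(E_2) - \Pr(E_1 \cap E_2)$
Applications:
- 2 independent events: $\Pr(A \cap B) = \Pr(A)\Pr(B)$
- 2 disjoint events: $\Pr(E_1 \cup E_2) = \Pr(E_1) + \Pr(E_2)$
1.2. Bayes' Law (for disjoint events $E_1$, $E_2$ that cover the sample space):
$\Pr(E_1 \mid B) = \frac{\Pr(B \cap E_1)}{\Pr(B)} = \frac{\Pr(B \cap E_1)}{\Pr(B \cap E_1) + \Pr(B \cap E_2)}$
1.3. Expectation
1.4. Binomial Distribution: n trials, each succeeding with probability p
1.5. Geometric Distribution: trials repeated until 1 success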
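2. Theory summary
Main topics | Content | Take note

2. Balls into Bins
Throw n balls into n bins uniformly at random. We use
$\frac{k^k}{k!} < \sum_{i \ge 0} \frac{k^i}{i!} = e^k$
Therefore the probability that some bin holds M or more balls is at most:
$n \binom{n}{M} \left(\frac{1}{n}\right)^M \le \frac{n}{M!} \le n \left(\frac{e}{M}\right)^M$
Substituting $M = 3 \ln n / \ln \ln n$ and rewriting everything in exponential form, we can prove that this probability is at most 1/n.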
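A quick simulation makes the $3 \ln n / \ln \ln n$ threshold concrete. The sketch below assumes n balls thrown uniformly into n bins; the function names are hypothetical.

```python
import math
import random
from collections import Counter


def max_load(n, rng):
    """Throw n balls into n bins uniformly at random; return the fullest bin's load."""
    counts = Counter(rng.randrange(n) for _ in range(n))
    return max(counts.values())


def load_threshold(n):
    """The bound M = 3 ln n / ln ln n from the notes (needs n >= 3)."""
    return 3 * math.log(n) / math.log(math.log(n))


def observed_loads(n, trials, seed=0):
    """Maximum loads seen over several independent experiments."""
    rng = random.Random(seed)
    return [max_load(n, rng) for _ in range(trials)]
```

In practice the observed maximum load sits well below the threshold, which is consistent with the bound being an upper estimate that holds with probability at least 1 - 1/n.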
3. The Poisson Distribution

4. Application: Hashing
Problem: Set Membership
Given a set S = {s_1, s_2, ..., s_m} that is a subset of a very large universe U. For an arbitrary element x chosen from U, we must answer the question: "Is x an element of S or not?". This question is called the Set Membership Problem.

Take note: to get a feel for what "very large" means, you can think of S as the set of songs on your computer, while U is the set of all pieces of music in the world.

4.1. Chain Hashing
The most classical method is to build a hash table for searching. You can use a random hash function. This method always gives the correct answer, and the search time is fairly small.
By the calculation in section 2, the probability that the maximum load reaches $3 \ln n / \ln \ln n$ is at most 1/n. So the search time exceeds $\Theta(\ln n / \ln \ln n)$ with probability at most 1/n.
The drawback of this method is that it needs too much memory: the m elements of S may not fit in RAM.
4.2. Fingerprint Method
We define a fingerprint function as follows:
f: S -> B
where B is the set of b-bit binary numbers. Clearly B has 2^b elements.
We only need to store m items of b bits each in RAM, that is, m*b bits in total.
Computing f(x) is exactly finding the fingerprint of x.

ALGORITHM
Compute f(x).
Compare f(x) with all f(s_i); s_i in S.
There are 2 possible cases:
Case 1: if f(x) <> f(s_i) for all i = 1, 2, ..., m => x is not in S.
Case 2: if there exists 1 <= i <= m with f(x) = f(s_i) => x is in S.

PROBABILISTIC ANALYSIS:
Case 1: if f(x) <> f(s_i) for all i = 1, 2, ..., m => x is not in S.
Suppose on the contrary that x is in S; then there must exist i with f(x) = f(s_i), 1 <= i <= m. Contradiction!
Case 2: if there exists 1 <= i <= m with f(x) = f(s_i) => x is in S.
This claim is not necessarily true, because x may not be in S while the fingerprint of x happens to coincide with the fingerprint of some s_i. We call this event a false positive: we mistakenly accept x.
Is the false positive probability large? If it is too large, we should not use this algorithm.
$\Pr(\text{false positive}) = \Pr(x \notin S \cap (\exists k: f(x) = f(s_k)))$
$= \Pr(\exists i: f(x) = f(s_i)) \Pr(x \notin S \mid \exists i: f(x) = f(s_i))$
$\approx (1 - \Pr(\forall i: f(x) \neq f(s_i))) \Pr(x \notin S)$   (4.2)
Here $\Pr(x \notin S) \approx 1$, because S is a subset of m elements of the very large set U.
$\Pr(\forall i: f(x) \neq f(s_i)) = \prod_{i=1}^{m} \Pr(f(x) \neq f(s_i)) = \prod_{i=1}^{m} (1 - \Pr(f(x) = f(s_i))) = \left(1 - \frac{1}{2^b}\right)^m$
Substituting into equation (4.2):
$\Pr(\text{false positive}) = 1 - \left(1 - \frac{1}{2^b}\right)^m \ge 1 - e^{-m/2^b}$
For example, with $m = 2^{16}$ and b = 32, the false positive probability is less than $\frac{1}{65536}$.

Take note:
fingerprint = the print of a finger.
A false negative is when x is in S but our computation says that x is not in S.
Whether an error is called negative or positive reflects the effect of the wrong result on the application at hand. Here we treat S as a very important set: "better to take one by mistake than to miss one".
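The fingerprint method can be sketched as follows. This is only an illustration: it assumes string items and uses the top b bits of SHA-256 as the fingerprint function f, which is a stand-in chosen for this sketch rather than anything the notes prescribe; all function names are hypothetical.

```python
import hashlib
import math


def fingerprint(item, bits):
    """Map a string to a b-bit integer (top `bits` bits of its SHA-256 digest)."""
    digest = hashlib.sha256(item.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") >> (256 - bits)


def build_table(items, bits):
    """Store only the fingerprints of the members of S."""
    return {fingerprint(s, bits) for s in items}


def maybe_member(x, table, bits):
    """False means x is certainly not in S; True may be a false positive."""
    return fingerprint(x, bits) in table


def false_positive_probability(m, bits):
    """1 - (1 - 2**-bits)**m, computed stably with log1p and expm1."""
    return -math.expm1(m * math.log1p(-(2.0 ** -bits)))
```

With `bits` small (say 4) false positives become easy to observe, while members of S are never rejected, which matches the two cases of the analysis.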
4.3. Bloom Filter Method
As in the fingerprint method, we use a mapping f from the set S into the set of n-bit values.
Now, instead of each element producing its own fingerprint, we need only a single sequence of n bits, which we call the Bloom, to store all the f(s_i). Whenever f(s_i) returns a value that has a bit equal to 1, we set that bit in the Bloom.
Example: the Bloom has n = 4 bits, initially 0000; after computing f(s_1) = 2 = 0010 we get the Bloom 0010. After computing f(s_2) = 10 = 1010 we get the Bloom 1010.
The Bloom Filter method uses one Bloom and hash functions h_i; i = 1, 2, ..., k.
We say y lies in the Bloom if all the 1-bit positions of y are set in the Bloom.
Example: y = 8 = 1000 lies in the Bloom 1010; y = 1001 does not lie in the Bloom 1010.

ALGORITHM
Compute h_i(x); i = 1, 2, ..., k.
x is in S <=> h_i(x) lies in the Bloom for all i = 1, 2, ..., k.

PROBABILISTIC ANALYSIS:
Case 1: there exists i such that h_i(x) does not lie in the Bloom => x is not in S.
Suppose on the contrary that x is in S; then h_i(x) must lie in the Bloom for every i, 1 <= i <= k. Contradiction!
Case 2: if for all i, 1 <= i <= k, h_i(x) lies in the Bloom => x is in S.
This claim is not necessarily true, because x may not be in S while every h_i(x) still lies in the Bloom. As in the fingerprint method, we call this event a false positive: we mistakenly accept x.
$\Pr(\text{false positive}) = \Pr(x \notin S \cap (\forall i: h_i(x) \text{ in Bloom}))$
$= \Pr(\forall i: h_i(x) \text{ in Bloom}) \Pr(x \notin S \mid \forall i: h_i(x) \text{ in Bloom})$   (4.3)
(the bits of the Bloom are treated as independent of one another).
$\Pr(x \in S) \approx 0$, because S is a subset of m elements of the very large set U.
Suppose we store the Bloom as an array A[1:n]. Consider any one bit A[j]: setting bits to 1 in the Bloom is done k times, once for each of the k functions h(x), and each time over the m elements of S. Hence:
$\Pr(A[j] = 0) = (\Pr(h_i(x) \neq j))^{mk} = \left(1 - \frac{1}{n}\right)^{mk}$
Substituting into equation (4.3):
$\Pr(\text{false positive}) = \left(1 - \left(1 - \frac{1}{n}\right)^{mk}\right)^k = (1 - p)^k$, where $p = \left(1 - \frac{1}{n}\right)^{mk} \approx e^{-mk/n}$.
Then $k = -\frac{n}{m} \ln p$ and
$\Pr(\text{false positive}) = e^{k \ln(1-p)} = e^{-\frac{n}{m} \ln p \ln(1-p)}$.
This attains its minimum when $\ln p \cdot \ln(1-p)$ attains its maximum, which by symmetry happens at p = 1/2, i.e. $k = \frac{n}{m} \ln 2$.
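A minimal Bloom filter sketch follows. It assumes the common index-based variant that the analysis above uses (each h_i(x) is a position in A, not an n-bit pattern), and it derives the k hash functions from SHA-256 with a per-function prefix, which is a choice made for this sketch only. The class and function names are hypothetical.

```python
import hashlib
import math


class BloomFilter:
    """n-bit array A with k hash positions per item (one byte per bit for clarity)."""

    def __init__(self, n_bits, k):
        self.n_bits = n_bits
        self.k = k
        self.bits = bytearray(n_bits)

    def _positions(self, item):
        # h_i(item): hash the item with prefix i, reduce to an index in [0, n_bits).
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.n_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item):
        # False is always correct; True may be a false positive.
        return all(self.bits[pos] for pos in self._positions(item))


def false_positive_rate(n_bits, m, k):
    """(1 - (1 - 1/n)^(mk))^k, the estimate derived in the notes."""
    return (1 - (1 - 1 / n_bits) ** (m * k)) ** k


def optimal_k(n_bits, m):
    """k = (n/m) ln 2, the value that makes p = 1/2."""
    return (n_bits / m) * math.log(2)
```

Usage is `bf = BloomFilter(1000, 7)`, then `bf.add("some-item")` and `"some-item" in bf`; rounding `optimal_k(n_bits, m)` to the nearest integer gives a reasonable k for given n and m.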
3. Exercises:
http://docs.google.com/View?id=dgmqjfk5_184c7sdskcv
Exercises
Exercise 5.21:
In open addressing, the hash table is implemented with an array and uses no linked lists at all. Each entry of the table is either empty or holds exactly 1 element.
You can follow the links below to learn more.
http://en.wikipedia.org/wiki/Open_addressing
http://courseweb.xu.edu.ph/courses/ics20/supplements/holte/open-addr.htm
For each key k in the table we define a probe sequence h(k,0), h(k,1), ..., h(k,n); n is the number of entries in the table.
To insert key k we compute h(k,0), h(k,1), ... in turn until we find an empty slot in which to insert k; after n failed attempts we conclude that the table is full and nothing more can be inserted.
Searching works the same way: compute h(k,0), h(k,1), ... in turn until we find key k, or find an empty slot, which shows that key k is not in the table.
Assume h(k,j) can take any of the n entries of the table uniformly at random and that all the h(k,j) are independent.
After using this table to store m = n/2 elements, we receive a request to find key k in the table.
Let X_i be the number of probes needed to insert the i-th key. Let X = max {X_i}; 1 <= i <= m, the largest number of probes needed to insert any of the m keys.
(a) Prove $\Pr(X > 2 \log n) \le 1/n$
(b) Prove that the expectation of the longest probe sequence needed is E[X] = O(log m). Note: n = 2m.
The method above is also called Double Hashing. Example: h(k,i) = a*h(k) + b*h(i) (mod n), i.e. it uses 2 hash functions.
(c) Open addressing with Linear Probing looks like a special case of this method: h(k,i) = h(k) + i (mod n), where h(i) = i. But this is not quite so, because h(k,i) and h(k,i+1) are no longer independent of each other. Describe the effect of this difference and find a way to apply the Double Hashing analysis to find the expectation of the longest probe sequence needed for Open addressing with Linear Probing. (Exam K52 - CNTT - ĐHBK HN)
Exercise 5.22:
Suppose the list of songs you like is X, and the list of songs I like is Y. We know that |X| = |Y| = n.
We build the Bloom filters of the sets X and Y using m bits and k hash functions.
(a) Compute the expectation of the number of bit positions that differ between the Bloom filters of X and Y
(b) Compute $E[|X \cap Y|]$
(c) Explain why we can use this method to find people with the same taste in music instead of comparing all the lists directly
Answer: (a) $n(2p - p^2)$, where $p = \left(1 - \frac{1}{n}\right)^{mk}$;
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Answer
Exercise 5.21:
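Proof:
(a) Prove $\Pr(X > 2 \log n) \le 1/n$
At the i-th insertion the table holds i - 1 entries. We must compute h(i,j) until we find an empty entry. So X_i is a geometric random variable with parameter $p_i = 1 - \frac{i-1}{n}$.
For a geometric random variable with parameter p:
$\Pr(X \ge j) = \sum_{l \ge j} \Pr(X = l) = \sum_{l \ge j} p(1-p)^{l-1} = p(1-p)^{j-1} \sum_{l \ge 0} (1-p)^l = (1-p)^{j-1} p \cdot \frac{1}{1-(1-p)} = (1-p)^{j-1}$
Hence $\Pr(X_i > 2 \log m) = \Pr(X_i \ge 2 \log m + 1) = (1 - p_i)^{2 \log m}$; substituting $p_i = 1 - \frac{i-1}{n}$ and n = 2m (logarithms base 2) gives:
$\Pr(X_i > 2 \log m) = \left(\frac{i-1}{2m}\right)^{2 \log m} \le \left(\frac{m}{2m}\right)^{2 \log m} = \frac{1}{m^2}$
The same computation with threshold $2 \log n$ gives $\Pr(X_i > 2 \log n) \le \frac{1}{n^2}$, and a union bound over the m = n/2 keys gives $\Pr(X > 2 \log n) \le \frac{m}{n^2} = \frac{1}{2n} \le \frac{1}{n}$.
(b) Prove that the expectation of the longest probe sequence h(i,j) needed is E[X] = O(log m). Note: n = 2m.
A probe sequence has at most n probes, so
$E[X] = \sum_{x \le 2 \log n} x \Pr(X = x) + \sum_{x > 2 \log n} x \Pr(X = x)$
$< 2 \log n \sum_{x \le 2 \log n} \Pr(X = x) + n \sum_{x > 2 \log n} \Pr(X = x)$
$\le 2 \log n + n \cdot \frac{1}{n} = 2 \log n + 1 = O(\log m)$
Above we used $\Pr(X \le 2 \log n) \le 1$ and the result $\Pr(X > 2 \log n) \le \frac{1}{n}$ from part (a).
Note:
Let X_i be the number of probes needed to insert the i-th key.
Above we observed that X_i is a geometric random variable with parameter $p_i = 1 - \frac{i-1}{n}$. Therefore:
$E[X_{i+1}] = \frac{1}{p_{i+1}} = \frac{n}{n-i} = \frac{1}{1 - \alpha}$, where $\alpha = \frac{i}{n}$.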