
13. RANDOMIZED ALGORITHMS

‣ contention resolution
‣ global min cut
‣ linearity of expectation
‣ max 3-satisfiability
‣ universal hashing
‣ Chernoff bounds
‣ load balancing

Lecture slides by Kevin Wayne
Copyright © 2005 Pearson-Addison Wesley
http://www.cs.princeton.edu/~wayne/kleinberg-tardos
Last updated on 1/5/22 12:31 PM


Randomization

Algorithmic design patterns.
・Greedy.
・Divide-and-conquer.
・Dynamic programming.
・Network flow.
・Randomization.

Randomization. Allow a fair coin flip in unit time.
[in practice: access to a pseudo-random number generator]

Why randomize? Can lead to the simplest, fastest, or only known algorithm for
a particular problem.

Ex. Symmetry-breaking protocols, graph algorithms, quicksort, hashing,
load balancing, closest pair, Monte Carlo integration, cryptography, ….

13. RANDOMIZED ALGORITHMS ‣ contention resolution


Contention resolution in a distributed system

Contention resolution. Given n processes P1, …, Pn, each competing for
access to a shared database. If two or more processes access the database
simultaneously, all processes are locked out. Devise a protocol to ensure all
processes get through on a regular basis.

Restriction. Processes can't communicate.

Challenge. Need a symmetry-breaking paradigm.

[figure: processes P1, P2, …, Pn all contending for one shared database]
Contention resolution: randomized protocol

Protocol. Each process requests access to the database at time t with
probability p = 1/n.

Claim. Let S[i, t] = event that process i succeeds in accessing the database at
time t. Then 1/(e · n) ≤ Pr[S(i, t)] ≤ 1/(2n).

Pf. By independence, Pr[S(i, t)] = p (1 − p)^(n−1): process i requests access,
and none of the remaining n − 1 processes requests access.
・Setting p = 1/n, we have Pr[S(i, t)] = (1/n) (1 − 1/n)^(n−1). ▪
  [p = 1/n is the value that maximizes Pr[S(i, t)]; by the facts below,
  (1 − 1/n)^(n−1) lies between 1/e and 1/2, which gives the claimed bounds]

Useful facts from calculus. As n increases from 2, the function:
・(1 − 1/n)^n converges monotonically from 1/4 up to 1/e.
・(1 − 1/n)^(n−1) converges monotonically from 1/2 down to 1/e.


Contention resolution: randomized protocol

Claim. The probability that process i fails to access the database in
e · n rounds is at most 1/e. After e · n (c ln n) rounds, the probability is ≤ n^(−c).

Pf. Let F[i, t] = event that process i fails to access the database in rounds 1
through t. By independence and the previous claim, we have
Pr[F(i, t)] ≤ (1 − 1/(en))^t.

・Choose t = ⌈e · n⌉:

   Pr[F(i, t)] ≤ (1 − 1/(en))^⌈en⌉ ≤ (1 − 1/(en))^(en) ≤ 1/e

・Choose t = ⌈e · n⌉ ⌈c ln n⌉:

   Pr[F(i, t)] ≤ (1/e)^(c ln n) = n^(−c)  ▪
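To make the protocol concrete, here is a minimal simulation sketch in Java (not from the original slides; the class and method names are illustrative). Each round, every process independently requests access with probability 1/n, and a process succeeds exactly when it is the unique requester:

   import java.util.Random;

   public class ContentionResolution {
       // Simulate the randomized protocol: each of n processes requests
       // access each round with probability 1/n; a process succeeds in a
       // round iff it is the only requester. Returns the number of rounds
       // until every process has succeeded at least once.
       static int roundsUntilAllSucceed(int n, Random rng) {
           boolean[] succeeded = new boolean[n];
           int remaining = n, rounds = 0;
           while (remaining > 0) {
               rounds++;
               int requester = -1, count = 0;
               for (int i = 0; i < n; i++)
                   if (rng.nextDouble() < 1.0 / n) { requester = i; count++; }
               if (count == 1 && !succeeded[requester]) {
                   succeeded[requester] = true;
                   remaining--;
               }
               // if count >= 2, all requesters are locked out this round
           }
           return rounds;
       }

       public static void main(String[] args) {
           Random rng = new Random(42);
           int n = 100;
           System.out.println("rounds = " + roundsUntilAllSucceed(n, rng));
           // the analysis predicts success within about 2e·n·ln n rounds w.h.p.
           System.out.println("2e n ln n ~ " + (int) (2 * Math.E * n * Math.log(n)));
       }
   }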

Contention resolution: randomized protocol

Claim. The probability that all processes succeed within 2e · n ln n rounds
is ≥ 1 − 1/n.

Pf. Let F[t] = event that at least one of the n processes fails to access the
database in any of the rounds 1 through t. By the union bound and the
previous slide,

   Pr[F[t]] = Pr[∪_{i=1..n} F[i, t]] ≤ Σ_{i=1..n} Pr[F[i, t]] ≤ n (1 − 1/(en))^t

・Choosing t = 2 ⌈en⌉ ⌈ln n⌉ yields Pr[F[t]] ≤ n · n^(−2) = 1/n. ▪

Union bound. Given events E1, …, En,  Pr[∪_{i=1..n} Ei] ≤ Σ_{i=1..n} Pr[Ei].


13. RANDOMIZED ALGORITHMS ‣ global min cut


Global minimum cut

Global min cut. Given a connected, undirected graph G = (V, E),
find a cut (A, B) of minimum cardinality.

Applications. Partitioning items in a database, identifying clusters of related
documents, network reliability, network design, circuit design, TSP solvers.

Network flow solution.
・Replace every edge (u, v) with two antiparallel edges (u, v) and (v, u).
・Pick some vertex s and compute a min s–v cut separating s from each
  other node v ∈ V.

False intuition. Global min-cut is harder than min s–t cut.


Contraction algorithm

Contraction algorithm. [Karger 1995]
・Pick an edge e = (u, v) uniformly at random.
・Contract edge e.
  - replace u and v by a single new super-node w
  - preserve edges, updating endpoints of u and v to w
  - keep parallel edges, but delete self-loops
・Repeat until the graph has just two nodes u1 and v1.
・Return the cut (all nodes that were contracted to form v1).

[figure: contracting edge u–v merges u and v into super-node w,
keeping parallel edges and deleting self-loops]
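One concrete way to implement the contraction step is random sampling over the original edge list, with a union–find structure tracking which vertices have merged into which super-node. A minimal sketch (not from the slides; names are illustrative): sampling an edge whose endpoints already share a super-node corresponds to a deleted self-loop and is skipped.

   import java.util.Random;

   public class KargerMinCut {
       int[] parent;
       int find(int x) { return parent[x] == x ? x : (parent[x] = find(parent[x])); }

       // One trial of the contraction algorithm on a connected multigraph
       // given as an edge list over vertices 0..n-1. Returns the size of
       // the cut found by this trial.
       int contract(int n, int[][] edges, Random rng) {
           parent = new int[n];
           for (int i = 0; i < n; i++) parent[i] = i;
           int supernodes = n;
           while (supernodes > 2) {
               int[] e = edges[rng.nextInt(edges.length)];
               int ru = find(e[0]), rv = find(e[1]);
               if (ru == rv) continue;       // self-loop: deleted, resample
               parent[ru] = rv;              // contract: merge the two super-nodes
               supernodes--;
           }
           int cut = 0;                      // count edges crossing the two super-nodes
           for (int[] e : edges)
               if (find(e[0]) != find(e[1])) cut++;
           return cut;
       }
   }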

Contraction algorithm

Claim. The contraction algorithm returns a min cut with probability ≥ 2/n².

Pf. Consider a global min-cut (A*, B*) of G.
・Let F* be the edges with one endpoint in A* and the other in B*.
・Let k = |F*| = size of the min cut.
・In the first step, the algorithm contracts an edge in F* with probability k/|E|.
・Every node has degree ≥ k, since otherwise (A*, B*) would not be
  a min-cut ⇒ |E| ≥ ½ k n ⇔ k/|E| ≤ 2/n.
・Thus, the algorithm contracts an edge in F* with probability ≤ 2/n.

[figure: cut (A*, B*) with crossing edges F*. Reference: Thore Husfeldt]
Contraction algorithm

Claim. The contraction algorithm returns a min cut with probability ≥ 2/n².

Pf. [continued] Consider a global min-cut (A*, B*) of G, with F* and k as before.
・Let Gʹ be the graph after j iterations. There are nʹ = n − j supernodes.
・Suppose no edge in F* has been contracted. The min-cut in Gʹ is still k.
・Since the value of the min-cut is k, |Eʹ| ≥ ½ k nʹ ⇔ k/|Eʹ| ≤ 2/nʹ.
・Thus, the algorithm contracts an edge in F* with probability ≤ 2/nʹ.
・Let Ej = event that an edge in F* is not contracted in iteration j.

   Pr[E1 ∩ E2 ∩ … ∩ En−2] = Pr[E1] × Pr[E2 | E1] × … × Pr[En−2 | E1 ∩ … ∩ En−3]
                          ≥ (1 − 2/n) (1 − 2/(n−1)) ⋯ (1 − 2/4) (1 − 2/3)
                          = ((n−2)/n) ((n−3)/(n−1)) ⋯ (2/4) (1/3)
                          = 2 / (n(n−1))
                          ≥ 2/n²  ▪


Contraction algorithm

Amplification. To amplify the probability of success, run the contraction
algorithm many times, with independent random choices.

Claim. If we repeat the contraction algorithm n² ln n times,
then the probability of failing to find the global min-cut is ≤ 1/n².

Pf. By independence, the probability of failure is at most

   (1 − 2/n²)^(n² ln n) = [(1 − 2/n²)^(½ n²)]^(2 ln n) ≤ (e^(−1))^(2 ln n) = 1/n²

using (1 − 1/x)^x ≤ 1/e. ▪
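In code, amplification is just a repetition loop around single trials. A minimal sketch, assuming the hypothetical KargerMinCut class from the earlier sketch (with n, edges, rng as there):

   // Amplification: run about n^2 ln n independent contraction trials and
   // return the best cut found; by the claim above, this fails to find a
   // global min cut with probability <= 1/n^2.
   static int minCut(int n, int[][] edges, java.util.Random rng) {
       KargerMinCut karger = new KargerMinCut();
       long trials = Math.round((double) n * n * Math.log(n));
       int best = Integer.MAX_VALUE;
       for (long t = 0; t < trials; t++)
           best = Math.min(best, karger.contract(n, edges, rng));
       return best;
   }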


Contraction algorithm: example execution

[figure: several independent trials of the contraction algorithm on the same
graph; one trial finds the min cut. Reference: Thore Husfeldt]


Global min cut: context

Remark. Overall running time is slow, since we perform Θ(n² log n) iterations
and each takes Ω(m) time.

Improvement. [Karger–Stein 1996] O(n² log³ n).
・Early iterations are less risky than later ones: the probability of contracting
  an edge in the min cut hits 50% when n/√2 nodes remain.
・Run the contraction algorithm until n/√2 nodes remain.
・Run the contraction algorithm twice on the resulting graph and
  return the best of the two cuts.

Extensions. Naturally generalizes to handle positive weights.

Best known. [Karger 2000] O(m log³ n), faster than the best known max-flow
algorithm or deterministic global min-cut algorithm.
13. RANDOMIZED ALGORITHMS ‣ linearity of expectation


Expectation

Expectation. Given a discrete random variable X, its expectation E[X]
is defined by

   E[X] = Σ_{j=0}^∞ j · Pr[X = j]

Waiting for a first success. Coin is heads with probability p and tails with
probability 1 − p. How many independent flips X until the first heads?

   E[X] = Σ_{j=0}^∞ j · Pr[X = j] = Σ_j j (1 − p)^(j−1) p
        = p/(1 − p) · Σ_j j (1 − p)^j = p/(1 − p) · (1 − p)/p² = 1/p

[j − 1 tails, then 1 head; uses Σ_{j=0}^∞ j x^j = x/(1 − x)²]
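A quick empirical check of the waiting-time identity E[X] = 1/p (a sketch; not from the slides):

   import java.util.Random;

   public class WaitingTime {
       public static void main(String[] args) {
           Random rng = new Random(7);
           double p = 0.25;
           long total = 0;
           int trials = 1_000_000;
           for (int t = 0; t < trials; t++) {
               int flips = 1;
               while (rng.nextDouble() >= p) flips++;  // flip until first heads
               total += flips;
           }
           // average should be close to 1/p = 4
           System.out.println("average flips = " + (double) total / trials);
       }
   }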

Expectation: two properties

Useful property. If X is a 0/1 random variable, E[X] = Pr[X = 1].
Pf. E[X] = Σ_{j=0}^∞ j · Pr[X = j] = Σ_{j=0}^1 j · Pr[X = j] = Pr[X = 1]. ▪

Linearity of expectation. Given two random variables X and Y defined over
the same probability space, E[X + Y] = E[X] + E[Y], even if X and Y are
not independent.

Benefit. Decouples a complex calculation into simpler pieces.


Guessing cards

Game. Shuffle a deck of n cards; turn them over one at a time;
try to guess each card.

Memoryless guessing. No psychic abilities; can't even remember what's
been turned over already. Guess a card from the full deck uniformly at random.

Claim. The expected number of correct guesses is 1.
Pf. [surprisingly effortless using linearity of expectation]
・Let Xi = 1 if the i-th prediction is correct and 0 otherwise.
・Let X = number of correct guesses = X1 + … + Xn.
・E[Xi] = Pr[Xi = 1] = 1/n.
・By linearity of expectation, E[X] = E[X1] + … + E[Xn] = 1/n + … + 1/n = 1. ▪
Guessing cards

Game. Shuffle a deck of n cards; turn them over one at a time;
try to guess each card.

Guessing with memory. Guess a card uniformly at random from the cards
not yet seen.

Claim. The expected number of correct guesses is Θ(log n).
Pf.
・Let Xi = 1 if the i-th prediction is correct and 0 otherwise.
・Let X = number of correct guesses = X1 + … + Xn.
・E[Xi] = Pr[Xi = 1] = 1/(n − (i − 1)).
・E[X] = E[X1] + … + E[Xn] = 1/n + … + 1/2 + 1/1 = H(n). ▪
  [ln(n+1) < H(n) < 1 + ln n]


Coupon collector

Coupon collector. Each box of cereal contains a coupon. There are n
different types of coupons. Assuming all boxes are equally likely to contain
each coupon, how many boxes before you have ≥ 1 coupon of each type?

Claim. The expected number of steps is Θ(n log n).
Pf.
・Phase j = time between collecting j and j + 1 distinct coupons.
・Let Xj = number of steps you spend in phase j. In phase j, the probability
  of success per box is (n − j)/n, so the expected waiting time is n/(n − j).
・Let X = number of steps in total = X0 + X1 + … + Xn−1. Then

   E[X] = Σ_{j=0}^{n−1} E[Xj] = Σ_{j=0}^{n−1} n/(n − j) = n Σ_{i=1}^{n} 1/i = n H(n)  ▪
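A small simulation of the coupon-collector process (a sketch; not from the slides):

   import java.util.Random;

   public class CouponCollector {
       // Number of boxes opened until all n coupon types have been seen.
       static int boxesNeeded(int n, Random rng) {
           boolean[] seen = new boolean[n];
           int distinct = 0, boxes = 0;
           while (distinct < n) {
               boxes++;
               int coupon = rng.nextInt(n);   // each type equally likely
               if (!seen[coupon]) { seen[coupon] = true; distinct++; }
           }
           return boxes;
       }

       public static void main(String[] args) {
           Random rng = new Random(1);
           int n = 50;
           // expected value is n H(n), roughly 224 for n = 50
           System.out.println("boxes = " + boxesNeeded(n, rng));
       }
   }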

13. RANDOMIZED ALGORITHMS ‣ max 3-satisfiability


Maximum 3-satisfiability

Maximum 3-satisfiability. Given a 3-SAT formula, find a truth assignment
that satisfies as many clauses as possible. [exactly 3 literals per clause, and
each literal corresponds to a different variable]

   C1 = x2 ∨ x3 ∨ x4
   C2 = x2 ∨ x3 ∨ x4
   C3 = x1 ∨ x2 ∨ x4
   C4 = x1 ∨ x2 ∨ x3
   C5 = x1 ∨ x2 ∨ x4

Remark. NP-hard optimization problem.

Simple idea. Flip a coin, and set each variable true with probability ½,
independently for each variable.
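The coin-flipping idea in code: draw one uniform assignment and count satisfied clauses. A minimal sketch (not from the slides; the clause encoding is an assumption: the positive literal of variable v is +v, its negation is −v, and variables are numbered from 1):

   import java.util.Random;

   public class Max3Sat {
       // Count clauses satisfied by an assignment. Each clause is three
       // nonzero ints: +v means variable v, -v means its negation.
       static int satisfied(int[][] clauses, boolean[] truth) {
           int count = 0;
           for (int[] clause : clauses) {
               for (int lit : clause) {
                   boolean val = truth[Math.abs(lit)];
                   if (lit > 0 ? val : !val) { count++; break; }
               }
           }
           return count;
       }

       // Set each variable true with probability 1/2, independently.
       static boolean[] randomAssignment(int vars, Random rng) {
           boolean[] truth = new boolean[vars + 1];   // index 0 unused
           for (int v = 1; v <= vars; v++) truth[v] = rng.nextBoolean();
           return truth;
       }
   }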
Maximum 3-satisfiability: analysis

Claim. Given a 3-SAT formula with k clauses, the expected number of
clauses satisfied by a random assignment is 7k/8.

Pf. Consider the random variable Zj = 1 if clause Cj is satisfied, and 0
otherwise; let Z = number of clauses satisfied by the random assignment.
A clause fails only when all three of its (distinct) literals are false, which
happens with probability (1/2)³ = 1/8. By linearity of expectation,

   E[Z] = Σ_{j=1}^{k} E[Zj] = Σ_{j=1}^{k} Pr[clause Cj is satisfied] = 7k/8  ▪


The probabilistic method

Corollary. For any instance of 3-SAT, there exists a truth assignment that
satisfies at least a 7/8 fraction of all clauses.

Pf. A random variable is at least its expectation some of the time. ▪

Probabilistic method. [Paul Erdős] Prove the existence of a non-obvious
property by showing that a random construction produces it with positive
probability!

Maximum 3-satisfiability: analysis

Q. Can we turn this idea into a 7/8-approximation algorithm?
A. Yes (but a random variable can almost always be below its mean).

Lemma. The probability that a random assignment satisfies ≥ 7k/8 clauses
is at least 1/(8k).

Pf. Let pj be the probability that exactly j clauses are satisfied;
let p be the probability that ≥ 7k/8 clauses are satisfied. Then

   7k/8 = E[Z] = Σ_{j≥0} j pj
              = Σ_{j<7k/8} j pj + Σ_{j≥7k/8} j pj
              ≤ (7k/8 − 1/8) Σ_{j<7k/8} pj + k Σ_{j≥7k/8} pj
              ≤ (7k/8 − 1/8) · 1 + k p

[first inequality: the number of satisfied clauses is an integer, and the
largest integer below 7k/8 is at most 7k/8 − 1/8]

Rearranging terms yields p ≥ 1/(8k). ▪


Maximum 3-satisfiability: analysis

Johnson's algorithm. Repeatedly generate random truth assignments until
one of them satisfies ≥ 7k/8 clauses.

Theorem. Johnson's algorithm is a 7/8-approximation algorithm.

Pf. By the previous lemma, each iteration succeeds with probability ≥ 1/(8k).
By the waiting-time bound, the expected number of trials to find a
satisfying assignment is at most 8k. ▪
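Johnson's algorithm is a loop around random assignments; a sketch reusing the hypothetical Max3Sat helpers from above (the integer comparison 8 · satisfied ≥ 7k avoids fractions):

   import java.util.Random;

   public class JohnsonMax3Sat {
       // Repeatedly draw uniform random assignments until one satisfies
       // >= 7k/8 clauses; the lemma bounds the expected number of trials by 8k.
       static boolean[] run(int[][] clauses, int vars, Random rng) {
           int k = clauses.length;
           while (true) {
               boolean[] truth = Max3Sat.randomAssignment(vars, rng);
               if (8 * Max3Sat.satisfied(clauses, truth) >= 7 * k) return truth;
           }
       }
   }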

Maximum satisfiability

Extensions.
・Allow one, two, or more literals per clause.
・Find a max weighted set of satisfied clauses.

Theorem. [Asano–Williamson 2000] There exists a 0.784-approximation
algorithm for MAX-SAT.

Theorem. [Karloff–Zwick 1997, Zwick+computer 2002] There exists a 7/8-
approximation algorithm for the version of MAX-3-SAT in which each clause
has at most 3 literals.

Theorem. [Håstad 1997] Unless P = NP, there is no ρ-approximation algorithm
for MAX-3-SAT (and hence MAX-SAT) for any ρ > 7/8, so it is very unlikely
that the simple randomized algorithm for MAX-3-SAT can be improved upon.


Monte Carlo vs. Las Vegas algorithms

Monte Carlo. Guaranteed to run in poly-time, likely to find the correct answer.
Ex: contraction algorithm for global min cut.

Las Vegas. Guaranteed to find the correct answer, likely to run in poly-time.
Ex: randomized quicksort, Johnson's MAX-3-SAT algorithm.

Remark. Can always convert a Las Vegas algorithm into a Monte Carlo one
(stop the algorithm after a certain point), but there is no known method
(in general) to convert the other way.

RP and ZPP

RP. [Monte Carlo] Decision problems solvable with one-sided error in poly-time.

One-sided error.
・If the correct answer is no, always return no.
・If the correct answer is yes, return yes with probability ≥ ½.
  [can decrease the probability of a false negative to 2^(−100) by 100
  independent repetitions]

ZPP. [Las Vegas] Decision problems solvable in expected poly-time.
[running time can be unbounded, but fast on average]

Theorem. P ⊆ ZPP ⊆ RP ⊆ NP.

Fundamental open questions. To what extent does randomization help?
Does P = ZPP? Does ZPP = RP? Does RP = NP?


13. RANDOMIZED ALGORITHMS ‣ universal hashing
Dictionary data type

Dictionary. Given a universe U of possible elements, maintain a subset
S ⊆ U so that inserting, deleting, and searching in S is efficient.

Dictionary interface.
・create(): initialize a dictionary with S = ∅.
・insert(u): add element u ∈ U to S.
・delete(u): delete u from S (if u is currently in S).
・lookup(u): is u in S?

Challenge. The universe U can be extremely large, so defining an array of
size |U| is infeasible.

Applications. File systems, databases, Google, compilers, checksums, P2P
networks, associative arrays, cryptography, web caching, etc.


Hashing

Hash function. h : U → { 0, 1, …, n − 1 }.

Hashing. Create an array a of length n. When processing element u,
access array element a[h(u)].

Collision. When h(u) = h(v) but u ≠ v.
・A collision is expected after Θ(√n) random insertions. [birthday paradox]
・Separate chaining: a[i] stores a linked list of the elements u with h(u) = i.

[figure: separate chaining, e.g. a[0] → jocularly → seriously, a[1] null,
a[2] → suburban → untravelled → considerating, a[n−1] → browsing]
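A minimal separate-chaining dictionary for strings (a sketch, not from the slides). Note it fixes an ad-hoc hash function, which is exactly what the next slides warn about for adversarial inputs:

   import java.util.LinkedList;

   public class ChainedHashSet {
       private final LinkedList<String>[] a;

       @SuppressWarnings("unchecked")
       ChainedHashSet(int n) {
           a = new LinkedList[n];
           for (int i = 0; i < n; i++) a[i] = new LinkedList<>();
       }

       // Ad-hoc, deterministic hash: fine on random inputs, attackable otherwise.
       private int h(String u) {
           return Math.floorMod(u.hashCode(), a.length);
       }

       void insert(String u) { if (!lookup(u)) a[h(u)].add(u); }
       void delete(String u) { a[h(u)].remove(u); }
       boolean lookup(String u) { return a[h(u)].contains(u); }
   }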

Ad-hoc hash function

Ad-hoc hash function, à la the Java string library:

   int hash(String s, int n) {
      int hash = 0;
      for (int i = 0; i < s.length(); i++)
         hash = (31 * hash) + s.charAt(i);
      return hash % n;
   }

Deterministic hashing. If |U| ≥ n², then for any fixed hash function h,
there is a subset S ⊆ U of n elements that all hash to the same slot.
Thus, Θ(n) time per lookup in the worst case.

Q. But isn't an ad-hoc hash function good enough in practice?


Algorithmic complexity attacks

When can't we live with an ad-hoc hash function?
・Obvious situations: aircraft control, nuclear reactor, pacemaker, ….
・Surprising situations: denial-of-service (DoS) attacks, where a malicious
  adversary learns your ad-hoc hash function (e.g., by reading the Java API)
  and causes a big pile-up in a single slot that grinds performance to a halt.

Real-world exploits. [Crosby–Wallach 2003]
・Linux 2.4.20 kernel: save files with carefully chosen names.
・Perl 5.8.0: insert carefully chosen strings into an associative array.
・Bro server: send carefully chosen packets to DoS the server,
  using less bandwidth than a dial-up modem.
Hashing performance

Ideal hash function. Maps m elements uniformly at random to n hash slots.
・Running time depends on the lengths of the chains.
・Average length of a chain = α = m/n.
・Choose n ≈ m ⇒ expect O(1) per insert, lookup, or delete.

Challenge. A hash function h that achieves O(1) per operation.
Approach. Use randomization for the choice of h.
[adversary knows the randomized algorithm you're using,
but doesn't know the random choice that the algorithm makes]


Universal hashing (Carter–Wegman 1980s)

A universal family of hash functions is a set of hash functions H mapping a
universe U to the set { 0, 1, …, n − 1 } such that:
・For any pair of elements u ≠ v: Pr_{h ∈ H} [h(u) = h(v)] ≤ 1/n,
  where h is chosen uniformly at random from H.
・Can select a random h efficiently.
・Can compute h(u) efficiently.

Ex. U = { a, b, c, d, e, f }, n = 2.

H = { h1, h2 } is not universal:

           a  b  c  d  e  f
    h1(x)  0  1  0  1  0  1
    h2(x)  0  0  0  1  1  1

    Pr_{h ∈ H} [h(a) = h(b)] = 1/2
    Pr_{h ∈ H} [h(a) = h(c)] = 1   ← violates the definition
    Pr_{h ∈ H} [h(a) = h(d)] = 0
    ...

H = { h1, h2, h3, h4 } is universal:

           a  b  c  d  e  f
    h1(x)  0  1  0  1  0  1
    h2(x)  0  0  0  1  1  1
    h3(x)  0  0  1  0  1  1
    h4(x)  1  0  0  1  1  0

    Pr_{h ∈ H} [h(a) = h(b)] = 1/2
    Pr_{h ∈ H} [h(a) = h(c)] = 1/2
    Pr_{h ∈ H} [h(a) = h(d)] = 1/2
    Pr_{h ∈ H} [h(a) = h(e)] = 1/2
    Pr_{h ∈ H} [h(a) = h(f)] = 0
    ...

Universal hashing: analysis

Proposition. Let H be a universal family of hash functions mapping a
universe U to the set { 0, 1, …, n − 1 }; let h ∈ H be chosen uniformly at
random from H; let S ⊆ U be a subset of size at most n; and let u ∉ S.
Then, the expected number of items in S that collide with u is at most 1.

Pf. For any s ∈ S, define the random variable Xs = 1 if h(s) = h(u), and 0
otherwise. Let X be a random variable counting the total number of collisions
with u. By linearity of expectation, and since each Xs is a 0–1 random variable,

   E_{h∈H}[X] = E[Σ_{s∈S} Xs] = Σ_{s∈S} E[Xs] = Σ_{s∈S} Pr[Xs = 1] ≤ Σ_{s∈S} 1/n = |S| · 1/n ≤ 1  ▪

Q. OK, but how do we design a universal class of hash functions?


Designing a universal family of hash functions

Modulus. We will use a prime number p for the size of the hash table.

Integer encoding. Uniquely identify each element u ∈ U with a base-p
integer of r digits: x = (x1, x2, …, xr).
[distinct elements have different encodings]

Hash function. Let A = set of all r-digit, base-p integers. For each
a = (a1, a2, …, ar) where 0 ≤ ai < p, define

   ha(x) = (Σ_{i=1}^{r} ai xi) mod p

which maps the universe U to the set { 0, 1, …, p − 1 }.

Hash function family. H = { ha : a ∈ A }.
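The family H = { ha : a ∈ A } is easy to implement; below is a minimal sketch (not from the slides). Drawing the coefficient vector a uniformly at random selects a random h ∈ H; the sketch assumes p < 2^31 so that each product fits in a long:

   import java.util.Random;

   public class UniversalHash {
       private final long p;     // prime modulus; assume p < 2^31 to avoid overflow
       private final long[] a;   // random coefficients, 0 <= a_i < p

       // Select a random member h_a of H = { h_a : a in A }.
       UniversalHash(long p, int r, Random rng) {
           this.p = p;
           this.a = new long[r];
           for (int i = 0; i < r; i++) a[i] = Math.floorMod(rng.nextLong(), p);
       }

       // h_a(x) = (sum_i a_i x_i) mod p, for x = (x_1, ..., x_r) given as
       // base-p digits, each in [0, p).
       long hash(long[] x) {
           long sum = 0;
           for (int i = 0; i < x.length; i++)
               sum = (sum + a[i] * (x[i] % p)) % p;
           return sum;
       }
   }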
Designing a universal family of hash functions

Theorem. H = { ha : a ∈ A } is a universal family of hash functions.

Pf. Let x = (x1, x2, …, xr) and y = (y1, y2, …, yr) encode two distinct elements of U.
We need to show that Pr[ha(x) = ha(y)] ≤ 1/p.
・Since x ≠ y, there exists an integer j such that xj ≠ yj.
・We have ha(x) = ha(y) iff

   aj (yj − xj) ≡ Σ_{i≠j} ai (xi − yi)  mod p,

  i.e., aj z ≡ m mod p, where z = yj − xj and m = Σ_{i≠j} ai (xi − yi).
・Can assume a was chosen uniformly at random by first selecting all
  coordinates ai where i ≠ j, then selecting aj at random. Thus, we can
  assume ai is fixed for all coordinates i ≠ j.
・Since p is prime, aj z ≡ m mod p has at most one solution among p
  possibilities. [see the lemma on the next slide; here's where we use
  that p is prime]
・Thus Pr[ha(x) = ha(y)] ≤ 1/p. ▪


Number theory fact

Fact. Let p be prime, and let z ≢ 0 mod p. Then α z ≡ m mod p has
at most one solution 0 ≤ α < p.

Pf.
・Suppose 0 ≤ α1 < p and 0 ≤ α2 < p are two different solutions.
・Then (α1 − α2) z ≡ 0 mod p; hence (α1 − α2) z is divisible by p.
・Since z ≢ 0 mod p, we know that z is not divisible by p.
・It follows that (α1 − α2) is divisible by p.
・This implies α1 = α2, contradicting α1 ≠ α2. ▪

Bonus fact. Can replace "at most one" with "exactly one" in the above fact.
Pf idea. Euclid's algorithm.

Universal hashing: summary

Goal. Given a universe U, maintain a subset S ⊆ U so that insert, delete,
and lookup are efficient.

Universal hash function family. H = { ha : a ∈ A }, where

   ha(x) = (Σ_{i=1}^{r} ai xi) mod p

・Choose p prime so that m ≤ p ≤ 2m, where m = |S|.
  [Fact: there exists a prime between m and 2m; can find such a prime
  using another randomized algorithm (!)]

Consequence.
・Space used = Θ(m).
・Expected number of collisions per operation is ≤ 1
  ⇒ O(1) time per insert, delete, or lookup.


13. RANDOMIZED ALGORITHMS ‣ Chernoff bounds
Chernoff bounds (above mean)

Theorem. Suppose X1, …, Xn are independent 0-1 random variables. Let
X = X1 + … + Xn. Then for any µ ≥ E[X] and for any δ > 0, we have

   Pr[X > (1 + δ)µ] < [ e^δ / (1 + δ)^(1+δ) ]^µ

i.e., the sum of independent 0-1 random variables is tightly centered
on the mean.

Pf. We apply a number of simple transformations.
・For any t > 0,

   Pr[X > (1 + δ)µ] = Pr[e^(tX) > e^(t(1+δ)µ)] ≤ e^(−t(1+δ)µ) · E[e^(tX)]

  [f(x) = e^(tx) is monotone in x; Markov's inequality: Pr[X > a] ≤ E[X]/a]
・Now, by the definition of X and independence,

   E[e^(tX)] = E[e^(t Σi Xi)] = Πi E[e^(t Xi)]

Pf. [continued]
・Let pi = Pr[Xi = 1]. Then, since 1 + α ≤ e^α for any α ≥ 0,

   E[e^(t Xi)] = pi e^t + (1 − pi) = 1 + pi (e^t − 1) ≤ e^(pi (e^t − 1))

・Combining everything:

   Pr[X > (1 + δ)µ] ≤ e^(−t(1+δ)µ) Πi E[e^(t Xi)]
                    ≤ e^(−t(1+δ)µ) Πi e^(pi (e^t − 1))
                    ≤ e^(−t(1+δ)µ) e^(µ(e^t − 1))

  [first inequality: previous slide; second: inequality above;
  third: Σi pi = E[X] ≤ µ]
・Finally, choosing t = ln(1 + δ) yields the claimed bound. ▪

Chernoff bounds (below mean)

Theorem. Suppose X1, …, Xn are independent 0-1 random variables.
Let X = X1 + … + Xn. Then for any µ ≤ E[X] and for any 0 < δ < 1, we have

   Pr[X < (1 − δ)µ] < e^(−δ²µ/2)

Pf idea. Similar.

Remark. Not quite symmetric, since it only makes sense to consider δ < 1.


13. RANDOMIZED ALGORITHMS ‣ load balancing
Load balancing

Load balancing. System in which m jobs arrive in a stream and need to be
processed immediately on n identical processors. Find an assignment that
balances the workload across processors.

Centralized controller. Assign jobs in round-robin manner. Each processor
receives at most ⌈m/n⌉ jobs.

Decentralized controller. Assign jobs to processors uniformly at random.
How likely is it that some processor is assigned "too many" jobs?


Load balancing

Analysis. [m = n jobs]
・Let Xi = number of jobs assigned to processor i.
・Let Yij = 1 if job j is assigned to processor i, and 0 otherwise.
・We have E[Yij] = 1/n.
・Thus, Xi = Σj Yij, and µ = E[Xi] = 1.
・Applying the Chernoff bound with δ = c − 1 yields Pr[Xi > c] < e^(c−1) / c^c.
・Let γ(n) be the number x such that x^x = n, and choose c = e γ(n). Then

   Pr[Xi > c] < e^(c−1)/c^c < (e/c)^c = (1/γ(n))^(e γ(n)) = n^(−e) < 1/n²

・Union bound ⇒ with probability ≥ 1 − 1/n, no processor receives more
  than e γ(n) = Θ(log n / log log n) jobs.

[Bonus fact: with high probability, some processor receives
Θ(log n / log log n) jobs]
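A quick simulation of the decentralized controller with m = n, comparing the observed maximum load to the Θ(log n / log log n) prediction (a sketch; not from the slides):

   import java.util.Random;

   public class LoadBalancing {
       public static void main(String[] args) {
           Random rng = new Random(3);
           int n = 1_000_000;              // n jobs onto n processors
           int[] load = new int[n];
           for (int j = 0; j < n; j++)
               load[rng.nextInt(n)]++;     // each job picks a processor at random
           int max = 0;
           for (int x : load) max = Math.max(max, x);
           // prediction: max load grows like log n / log log n
           double ln = Math.log(n);
           System.out.println("max load = " + max
               + ", log n / log log n ~ " + ln / Math.log(ln));
       }
   }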

Load balancing: many jobs

Theorem. Suppose the number of jobs m = 16 n ln n. Then on average,
each of the n processors handles µ = 16 ln n jobs. With high probability,
every processor will have between half and twice the average load.

Pf.
・Let Xi, Yij be as before.
・Applying the Chernoff bounds with δ = 1 (above mean) and δ = ½
  (below mean) yields

   Pr[Xi > 2µ] < (e/4)^(16 ln n) < (1/e)^(2 ln n) = 1/n²

   Pr[Xi < ½µ] < e^(−½ (½)² (16 ln n)) = 1/n²

・Union bound ⇒ every processor has load between half and
  twice the average with probability ≥ 1 − 2/n. ▪
