
Chris Umans
January 26, 2000
Office Hours: 3:00-5:00 Tuesdays and Thursdays in Soda 583
umans@cs.berkeley.edu
Sections: 10:00-11:00 in Cory 289
          11:00-12:00 in Etcheverry 3109

* Graph representation practice: If A is the adjacency matrix of a graph G, what
  does the transpose A^T represent? If G is an undirected graph this is just G. If G is a
  directed graph, A^T represents G with all of the edge directions reversed.

* Graph representation practice: If A is the adjacency matrix of an undirected graph
  G, what do the entries of AA^T represent? Since A is symmetric, AA^T = A^2. The
  diagonal entries (i, i) are the degree of vertex i. The other entries (i, j) are the number
  of paths of length exactly 2 from i to j.

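  To make the AA^T observation concrete, here is a minimal C sketch (the particular
  graph is made up for illustration); it prints AA^T so the diagonal entries can be
  checked against the vertex degrees:

      #include <stdio.h>

      #define N 4

      int main(void) {
          /* Adjacency matrix of a small undirected graph with edges
             0-1, 0-2, 1-2, 2-3 (degrees 2, 2, 3, 1). */
          int A[N][N] = {
              {0, 1, 1, 0},
              {1, 0, 1, 0},
              {1, 1, 0, 1},
              {0, 0, 1, 0}
          };
          int P[N][N];

          /* P = A A^T; since A is symmetric this is also A^2. */
          for (int i = 0; i < N; i++)
              for (int j = 0; j < N; j++) {
                  P[i][j] = 0;
                  for (int k = 0; k < N; k++)
                      P[i][j] += A[i][k] * A[j][k];
              }

          /* Diagonal entries are the vertex degrees; off-diagonal entry (i, j)
             counts the paths of length exactly 2 between i and j. */
          for (int i = 0; i < N; i++) {
              for (int j = 0; j < N; j++)
                  printf("%2d ", P[i][j]);
              printf("\n");
          }
          return 0;
      }
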
* Master Theorem, easier version: This variant of the CLR Master Theorem is less
  detailed, but is sufficient for most situations:

  Theorem 1  Given a recurrence T(n) = aT(n/b) + Θ(n^d), and T(1) = Θ(1), the
  solution is:

  1. Θ(n^(log_b a)), if a > b^d;
  2. Θ(n^d log n),  if a = b^d;
  3. Θ(n^d),        if a < b^d.

* Master Theorem practice:

  - The running time of Mergesort obeys the following recurrence: T(n) = 2T(n/2) +
    Θ(n) and T(1) = Θ(1). Here a = 2, b = 2, and d = 1, so by the Master Theorem,
    we have T(n) = Θ(n log n), since a = b^d.
  - The "naive" integer multiplication algorithm from section 6 of the reading yields
    the following recurrence: T(n) = 4T(n/2) + Θ(n), and T(1) = Θ(1). Here a = 4,
    b = 2, and d = 1, so by the Master Theorem we have T(n) = Θ(n^2), since a > b^d.
  - The improved integer multiplication algorithm from section 6 of the reading yields
    the recurrence: T(n) = 3T(n/2) + Θ(n), and T(1) = Θ(1). Here a = 3, b = 2,
    and d = 1, so by the Master Theorem we have T(n) = Θ(n^(log_2 3)) ≈ Θ(n^1.59),
    since a > b^d.

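  As a quick cross-check of these three examples, here is a small C sketch (the function
  and its printed output are my own, not part of the exercise) that classifies a recurrence
  T(n) = aT(n/b) + Θ(n^d) by comparing a with b^d:

      #include <math.h>
      #include <stdio.h>

      /* Print the asymptotic solution of T(n) = a T(n/b) + Theta(n^d) using the
         simplified Master Theorem above (exact comparison of a with b^d is fine
         for the small integer inputs used here). */
      void master(double a, double b, double d) {
          double bd = pow(b, d);
          if (a > bd)
              printf("a=%g b=%g d=%g: Theta(n^%.3f)\n", a, b, d, log(a) / log(b));
          else if (a == bd)
              printf("a=%g b=%g d=%g: Theta(n^%g log n)\n", a, b, d, d);
          else
              printf("a=%g b=%g d=%g: Theta(n^%g)\n", a, b, d, d);
      }

      int main(void) {
          master(2, 2, 1);   /* Mergesort:               Theta(n log n)    */
          master(4, 2, 1);   /* naive multiplication:    Theta(n^2)        */
          master(3, 2, 1);   /* improved multiplication: Theta(n^1.585...) */
          return 0;
      }
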

* Homework 1, Problem 1.3-7, alternative solution: The problem asks for a
  Θ(n log n)-time algorithm to determine if any two numbers out of a set S of n real
  numbers sum to a real number x. One idea is to sort the n numbers in increasing order
  a_1, a_2, a_3, ..., a_n, and then for each a_i, perform a binary search for x - a_i. If this
  binary search finds some other a_j = x - a_i, then we know that a_i + a_j = x, and we have
  found the desired two numbers. Otherwise, we know that for each a_i, there is no other
  a_j that sums with it to x. The sorting takes time Θ(n log n), and we perform n binary
  searches, each taking time Θ(log n), for a total of Θ(n log n) + n · Θ(log n) = Θ(n log n).

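  A minimal C sketch of this idea (the array and target in main() are made up; note that
  a_j must be a different element than a_i, which the search checks explicitly):

      #include <stdio.h>
      #include <stdlib.h>

      int cmp(const void *a, const void *b) {
          double d = *(const double *)a - *(const double *)b;
          return (d > 0) - (d < 0);
      }

      /* Return 1 if two distinct elements of s[0..n-1] sum to x, else 0. */
      int two_sum(double *s, int n, double x) {
          qsort(s, n, sizeof *s, cmp);                 /* Theta(n log n)          */
          for (int i = 0; i < n; i++) {                /* n binary searches ...    */
              double target = x - s[i];
              int lo = 0, hi = n - 1;
              while (lo <= hi) {                       /* ... each Theta(log n)    */
                  int mid = (lo + hi) / 2;
                  if (s[mid] == target) {
                      if (mid != i) return 1;          /* must be a different element */
                      if (mid > 0 && s[mid - 1] == target) return 1;
                      if (mid < n - 1 && s[mid + 1] == target) return 1;
                      break;
                  } else if (s[mid] < target) {
                      lo = mid + 1;
                  } else {
                      hi = mid - 1;
                  }
              }
          }
          return 0;
      }

      int main(void) {
          double s[] = {8.0, -1.0, 3.5, 2.5, 7.0};
          printf("%d\n", two_sum(s, 5, 6.0));   /* 3.5 + 2.5 = 6.0, prints 1 */
          return 0;
      }
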
* Homework 1, Problem 3-1(b), alternative solution: We are asked for an
  asymptotically tight bound for:

      S(n) = Σ_{k=1}^{n} log^s k.

  For the upper bound we notice that each term in the sum is at most log^s n. There are
  n terms, so S(n) ≤ n log^s n = O(n log^s n).

  We'd like to use the same strategy for the lower bound, but the best lower bound we
  can give for each term is log^s 1 = 0, which doesn't get us anywhere! Really, most terms
  in the sum are much larger than zero, and we can capture this intuition by "splitting
  the summation":

      S(n) = Σ_{k=1}^{n/2 - 1} log^s k + Σ_{k=n/2}^{n} log^s k.

  Now, if we use a lower bound on each term, the terms from the second half of the
  summation will give a meaningful lower bound:

      S(n) = Σ_{k=1}^{n/2 - 1} log^s k + Σ_{k=n/2}^{n} log^s k ≥ (n/2) · 0 + (n/2) · log^s(n/2).

  Remember that we want S(n) = Ω(n log^s n), which means S(n) ≥ c · n log^s n for all
  n ≥ n_0, for some constants c and n_0. We now notice that log(n/2) = log n - 1 is only
  "slightly" smaller than log n. In fact we have 2(log n - 1) ≥ log n for n ≥ 4. Therefore,
  we have:

      2^(s+1) · (n/2) · log^s(n/2) = 2 · (n/2) · (2(log n - 1))^s ≥ n log^s n.

  Since s is a constant, 2^(s+1) is also a constant, and finally, we have:

      S(n) ≥ (n/2) log^s(n/2) ≥ (1/2^(s+1)) · n log^s n,

  for n ≥ 4, which implies S(n) = Ω(n log^s n).


Chris Umans
February 2, 2000
Office Hours: 3:00-5:00 Tuesdays and Thursdays in Soda 583
umans@cs.berkeley.edu
Sections: 10:00-11:00 in Cory 289
          11:00-12:00 in Etcheverry 3109

* DFS practice: Given a graph G, is it true that every spanning forest of the graph is
  produced by the DFS algorithm for some ordering of the vertices and adjacency lists
  in the adjacency list structure? No! Some examples:

  [Figure: three example spanning forests, (1), (2), and (3), on vertices a through f.]

  Example (1) cannot have been produced by a DFS. (Why?) However it could have
  been produced by a BFS. Example (2) could have been produced by a DFS. We can
  determine from the figure that the DFS would have to have visited the right neighbor
  of the root before the left. Example (3) could have been produced by a DFS (notice
  it is a forest, not a single tree). In what order did it visit the vertices? First d, then
  e, f, c, b, a.

* Strongly connected components, example proof: One of the key properties
  used in the strongly connected components algorithm states that "if C1 and C2 are
  two strongly connected components, and there is an edge from a vertex of C1 to a
  vertex of C2, then the first vertex visited in C1 has a larger f-value than any vertex
  in C2." How do we prove a statement like this? Here, it works best to consider the
  different possible ways a DFS might visit the vertices. We can break them into two
  cases: either (1) a vertex of C1 is the first vertex among C1 and C2 to be visited, or
  (2) a vertex of C2 is the first vertex among C1 and C2 to be visited. These two cases
  cover all the possibilities.

  In the first case, call v the first vertex visited in C1. By the "white-path theorem," we
  must eventually traverse an edge leading into C2 before finishing v. At this point, we
  visit and finish all vertices in C2, because C2 is strongly connected. Sometime after
  this point, we back up to v and finish it. So f(v) is larger than the f-value for any
  vertex in C2, as required.

  In the second case, we visit some vertex in C2 first, and then we know that we will
  visit all of C2, since it is a strongly connected component. We also know that we
  cannot get to any vertex in C1 until we visit and finish all vertices in C2, because
  otherwise there would be a directed path from some vertex in C2 to a vertex in C1,
  which would make them a single strongly connected component, contrary to assumption.
  So all the vertices in C2 are finished before we even visit a vertex of C1, so if v is the
  first vertex visited in C1, we have f(v) > d(v) and d(v) greater than the f-value for
  any vertex in C2, as required.

* Strongly connected components, example: We worked through the following
  example of the strongly connected components algorithm. The first DFS occurs in G^T,
  pictured on the right. The discovery and finishing times are recorded. In the process,
  we record a list of vertices in descending order of finishing times; in this example the
  list is a, c, d, f, e, g, b.

  [Figure: the reversed graph G^T on vertices a through g, with discovery/finishing
  times recorded at each vertex.]

  Now the second DFS is run in G, using the list of decreasing finishing times as a guide.
  We start at a, and discover a, b, and c (the first strongly connected component). At
  this point the DFS gets "stuck," as intended, and restarts at an unvisited vertex. The
  unvisited vertex in the list with the next highest finishing time is d. Starting from d,
  we discover d, e, and f (the second strongly connected component). Again, we get
  stuck, and restart at the last unvisited vertex g, which is a strongly connected component
  by itself.

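  Here is a minimal C sketch of these two passes (the small digraph in main() is made
  up, and vertices 0..6 are printed as a..g only for readability). The first DFS runs on
  G^T and records vertices in order of finishing time; the second runs on G in decreasing
  order of those finishing times, and each tree it grows is one strongly connected component:

      #include <stdio.h>

      #define N 7

      int g[N][N];          /* adjacency matrix of G                          */
      int visited[N];
      int order[N], idx;    /* vertices in increasing finishing time (pass 1) */
      int scc[N];           /* component label assigned in pass 2             */

      /* Pass 1: DFS in G^T (g[w][v] != 0 means v -> w in G^T). */
      void dfs_gt(int v) {
          visited[v] = 1;
          for (int w = 0; w < N; w++)
              if (g[w][v] && !visited[w]) dfs_gt(w);
          order[idx++] = v;                 /* record v as it finishes */
      }

      /* Pass 2: DFS in G, labelling everything reached with one SCC number. */
      void dfs_g(int v, int label) {
          visited[v] = 1;
          scc[v] = label;
          for (int w = 0; w < N; w++)
              if (g[v][w] && !visited[w]) dfs_g(w, label);
      }

      int main(void) {
          /* Made-up digraph with SCCs {a,b,c}, {d,e,f}, {g}. */
          int edges[][2] = {{0,1},{1,2},{2,0},{2,3},{3,4},{4,5},{5,3},{5,6}};
          for (int i = 0; i < 8; i++) g[edges[i][0]][edges[i][1]] = 1;

          for (int v = 0; v < N; v++)
              if (!visited[v]) dfs_gt(v);

          for (int v = 0; v < N; v++) visited[v] = 0;
          int label = 0;
          for (int i = N - 1; i >= 0; i--)          /* decreasing finishing time */
              if (!visited[order[i]]) dfs_g(order[i], label++);

          for (int v = 0; v < N; v++)
              printf("vertex %c is in component %d\n", 'a' + v, scc[v]);
          return 0;
      }
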
* Strongly connected components, practice: Why do we go to the trouble of
  reversing all the graph edges for the first DFS and then computing the second DFS in
  descending order of finishing values? Couldn't we just do the first DFS on the original
  graph and compute the second DFS in order of increasing finishing values? Here is a
  counterexample:

  [Figure: a small digraph with discovery/finishing times (0/9, 5/8, 6/7, 2/3, 1/4)
  recorded at each vertex, and the strongly connected components marked with dashed
  lines.]

  The discovery and finishing times are recorded for each vertex, for an initial DFS
  starting at the leftmost vertex. The strongly connected components are marked with
  dashed lines. Notice that the earliest finishing value occurs in the strongly connected
  component that is a source in the strongly connected component DAG. If we ran the
  second DFS according to increasing finishing times, we would start in this component
  and group all of the graph into a single strongly connected component, which is
  incorrect.

  Is there a counterexample to the conjecture that we can find strongly connected
  components by computing the second DFS in order of decreasing discovery times? How
  about increasing discovery times?


Chris Umans
February 9, 2000
Office Hours: 3:00-5:00 Tuesdays and Thursdays in Soda 583
umans@cs.berkeley.edu
Sections: 10:00-11:00 in Cory 289
          11:00-12:00 in Etcheverry 3109

* Prim's and Dijkstra's algorithms, review: It is useful to separate the strategy
  behind the two algorithms from the implementation details. For Prim's algorithm, the
  idea is simple: at each step, we add the lightest edge that grows the tree. For Dijkstra's
  algorithm, the idea is similar: at each step, we add an edge that grows the shortest
  path tree; we pick an edge from the existing tree to a vertex with the smallest "shortest
  path estimate."

  The implementation uses priority queues to ensure that at each step, we can efficiently
  identify a suitable edge to add. Recall the three operations we need our priority queues
  to support:

  1. Insert: add a vertex to the queue.
  2. DeleteMin: remove the vertex with the smallest key.
  3. DecreaseKey: decrease the key of some vertex in the queue.

  A simple implementation of priority queues uses min-heaps. Recall that a min-heap
  is a complete binary tree with any unfilled positions grouped at the right end of the
  bottom level. This tree satisfies the heap property: the key of each parent is at most
  the keys of its children. To Insert an element, we place it in the next unfilled position
  at the bottom of the tree, and "percolate up," swapping it with its parent until the
  heap property is satisfied. To perform a DeleteMin, we remove the root of the tree,
  replace it with the rightmost leaf element, and then "sift down," exchanging this
  element with its smaller child until the heap property is satisfied. To perform a
  DecreaseKey, we change the key value and "percolate up" until the heap property is
  satisfied.

  The main fact used in analyzing the running times of these operations is that the
  depth of the heap is O(log n), where n is the number of nodes in the heap. Since all
  of the operations involve moving an element (at worst) from the top to the bottom of
  the tree, or vice versa, all three operations run in time O(log n).

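  Here is a minimal array-based min-heap sketch in C with the three operations. It stores
  keys only; a real Prim or Dijkstra implementation would also store vertex ids and keep
  track of each vertex's position in the array so that DecreaseKey can find it.

      #include <stdio.h>

      #define MAXN 100

      int heap[MAXN + 1];   /* 1-based: the children of position i are 2i and 2i+1 */
      int heapsize = 0;

      void swap(int i, int j) { int t = heap[i]; heap[i] = heap[j]; heap[j] = t; }

      /* Move the element at position i up until its parent is no larger. */
      void percolate_up(int i) {
          while (i > 1 && heap[i] < heap[i / 2]) {
              swap(i, i / 2);
              i /= 2;
          }
      }

      /* Move the element at position i down, swapping with its smaller child. */
      void sift_down(int i) {
          while (2 * i <= heapsize) {
              int child = 2 * i;
              if (child + 1 <= heapsize && heap[child + 1] < heap[child]) child++;
              if (heap[i] <= heap[child]) break;
              swap(i, child);
              i = child;
          }
      }

      void insert(int key) { heap[++heapsize] = key; percolate_up(heapsize); }

      int delete_min(void) {
          int min = heap[1];
          heap[1] = heap[heapsize--];          /* rightmost leaf replaces the root */
          sift_down(1);
          return min;
      }

      void decrease_key(int i, int newkey) { heap[i] = newkey; percolate_up(i); }

      int main(void) {
          insert(7); insert(3); insert(9); insert(1);
          decrease_key(3, 0);                  /* position 3 currently holds 9 */
          while (heapsize > 0) printf("%d ", delete_min());
          printf("\n");                        /* prints: 0 1 3 7 */
          return 0;
      }
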
* Exchange Property, proof: The Exchange Property is a useful tool in proving things
  about Minimum Cost Spanning Trees (MCSTs). It states:

  Lemma 2 (Exchange Property)  Given spanning trees T and T', for any edge
  e' ∈ T' - T, there exists an edge e ∈ T - T', with the property that T - {e} ∪ {e'}
  is a spanning tree.

  Notice that we haven't mentioned anything about edge weights yet. Here is a simple
  example:

  [Figure: three copies (a), (b), (c) of the same graph; (a) shows T in bold, (b) shows
  T with the edge e' added, and (c) shows T' in bold.]

  Figure (a) pictures the graph with tree T in bold; figure (c) pictures the graph with
  tree T' in bold. In this example, we have chosen the labeled edge e' ∈ T' - T, and we
  want to perform an exchange using this edge. We can think of first adding edge e' to
  T (pictured in figure (b)), and then removing the edge e ∈ T - T' whose existence is
  guaranteed by the lemma.

  How do we prove this lemma? It might be useful to use the example as a guide. We
  first notice that in the general case, adding edge e' to T creates a unique cycle, because
  T is a spanning tree. Then to restore a spanning tree, we need only remove a single
  edge e on that cycle. The lemma requires that this edge be in T - T'. What would
  have to happen for us not to be able to find an edge e ∈ T - T' on the cycle? What
  could go wrong? All of the edges on the cycle (other than e') are already in T, so it
  would have to be that all of them were also in T'. But that can't happen, because T'
  is a tree, and together with e' these edges would form a cycle in T'! So an edge e as
  required by the lemma must exist, and that proves the lemma.

* Transforming a spanning tree into a MCST: A very useful Theorem one can
  prove using the Exchange Property is the following:

  Theorem 3  Let T be a spanning tree and let M be a MCST. Using a series of exchange
  moves, we can transform T into M:

      T -> T_1 -> T_2 -> T_3 -> ... -> T_k = M.

  Furthermore, the weights of each successive spanning tree are non-increasing; i.e.
  w(T_i) ≥ w(T_{i+1}) for all i.

  Proof: The proof is by induction on the number of edges T and M share. As a base
  case, if they share all n - 1 edges, then T = M, and the theorem is trivially true.

  Otherwise, pick e' to be the lightest edge that is not in both trees. That is, e' is the
  lightest edge among all those edges in T - M and M - T. We'll perform an exchange
  on edge e' and see what we end up with.

  Since e' comes from either M or T, there are two cases:

  1. e' ∈ T - M: if we perform an exchange using the edge e ∈ M - T guaranteed by
     the Exchange Property, we obtain a new spanning tree M' = M - {e} ∪ {e'}. Like
     e', e is an edge that is not in both trees. Since e' was chosen to be the lightest
     such edge, we know that w(e') ≤ w(e). We conclude that w(M') ≤ w(M), and
     because M was a MCST, it must be that equality holds, and therefore M' is also a
     MCST. Tree M' has more edges in common with T than M did, so by induction,
     we can identify the required sequence of exchanges to transform T into M', and
     then reverse the exchange we just performed to get M.

  2. e' ∈ M - T: if we perform an exchange using the edge e ∈ T - M guaranteed by
     the Exchange Property, we obtain a new spanning tree T' = T - {e} ∪ {e'}. As
     before, we know that w(e') ≤ w(e). We conclude that w(T') ≤ w(T), so this
     exchange is a perfectly good first step for our sequence. Tree T' has more edges
     in common with M than T did, so by induction, we can identify the required
     sequence of exchanges to transform T' into M, completing the sequence of
     exchanges required by the theorem.


* Application of Theorem 3: The question we want to answer is the following:
  Suppose we have a graph G in which all edge weights are distinct, and let M be a
  MCST of G. Is M unique?

  The answer turns out to be "yes," but it is not so straightforward to prove this from
  scratch. Using Theorem 3 above, it becomes pretty simple.

  Suppose, for the purpose of contradiction, that we have two distinct MCSTs M1 and
  M2. Applying the theorem, we see that we can transform M1 into M2 using a series
  of exchange moves, in which the weights of successive trees are non-increasing. But
  the very first exchange in this sequence, which transforms M1 into some other spanning
  tree M1', must alter the weight somehow, since the two edges being exchanged have
  distinct weights. Since M1 is a MCST, it cannot be the case that the weight decreases.
  Therefore it must increase, which contradicts Theorem 3. We conclude that there
  cannot have been two distinct MCSTs.


Chris Umans
February 16, 2000
Office Hours: 3:00-5:00 Tuesdays and Thursdays in Soda 583
umans@cs.berkeley.edu
Sections: 10:00-11:00 in Cory 289
          11:00-12:00 in Etcheverry 3109

* Union-Find Analysis: We want to analyze the number of steps required to perform
  m operations (Unions or Finds) on n elements. We assume that we use both the
  Union-By-Rank and Path Compression heuristics:

  1. Union-By-Rank: each element has a RANK value, which we should think of as
     an overestimate (why?) of the height of that element's subtree. Singleton sets
     start out with a RANK of 0. When we Link two subtrees, we always link the
     one with smaller RANK to the one with larger RANK. If both subtrees have the
     same RANK, then it doesn't matter which we link to which; however, the RANK
     of the new root of the combined tree is incremented in this situation. Note that
     this is the only scenario in which an element's RANK can change.

  2. Path Compression: when we perform a Find on an element x, we follow its parent
     pointers up the tree to the root. Path Compression changes the parent pointers of
     these elements (the ones visited traversing the path from x to the root) to point
     directly to the root, to reduce the work in future Find operations.

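  A minimal C sketch of the data structure with both heuristics (the element count and
  the example unions in main() are made up):

      #include <stdio.h>

      #define MAXN 100

      int parent[MAXN];
      int rank_[MAXN];      /* the RANK value described above */

      void make_set(int x) { parent[x] = x; rank_[x] = 0; }

      /* Find with path compression: every element visited on the way to the
         root is re-pointed directly at the root. */
      int find(int x) {
          if (parent[x] != x)
              parent[x] = find(parent[x]);
          return parent[x];
      }

      /* Union by rank: link the root of smaller RANK under the root of larger RANK. */
      void union_sets(int x, int y) {
          int rx = find(x), ry = find(y);
          if (rx == ry) return;
          if (rank_[rx] < rank_[ry]) {
              parent[rx] = ry;
          } else if (rank_[rx] > rank_[ry]) {
              parent[ry] = rx;
          } else {
              parent[ry] = rx;
              rank_[rx]++;           /* the only place a RANK ever changes */
          }
      }

      int main(void) {
          for (int i = 0; i < 6; i++) make_set(i);
          union_sets(0, 1);
          union_sets(2, 3);
          union_sets(1, 3);
          printf("%d\n", find(0) == find(2));   /* 1: same set      */
          printf("%d\n", find(0) == find(5));   /* 0: different set */
          return 0;
      }
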
  We begin by proving two properties of RANKs.

  Observation 1  The number of elements of RANK k is at most n/2^k.

  Proof: Let x be an element with RANK k. We argue that the number of elements
  in the tree rooted at x is at least 2^k. We prove this by induction on k. For the base
  case, if RANK(x) = 0, we have only a single element, so the tree rooted at x has size
  1 = 2^0. For the general case, we ask "How could x get a RANK of k?" As noted above,
  this can only happen if we link two trees, each with RANK k - 1. By the induction
  hypothesis the size of each of these (disjoint) trees is ≥ 2^(k-1). Therefore the size of
  the resulting tree (with RANK k) is at least 2 · 2^(k-1) = 2^k.

  Now, we notice that the elements below x in the tree have RANKs strictly smaller
  than k. Therefore, for every element x of RANK k, we have at least 2^k - 1 other
  elements, none of RANK k, in the tree below x. Therefore, if there are n total elements,
  we can only accommodate n/2^k elements of RANK k, at most.

  Observation 2  The RANKs of the n elements range from 0 to log n.

  Proof: Clearly, no RANK can be smaller than 0, since all elements start with RANK
  0, and RANKs can only increase. Now suppose that the largest RANK r was greater
  than log n. Then by Observation 1, the number of elements of RANK r is at most
  n/2^r < 1. Since the number of elements of RANK r is an integer, it must be 0;
  therefore, the largest rank is at most log n.

  Now, we group the RANKs as follows: RANK k is in group log*(k), where log* is the
  iterated log function. Notice that each group contains RANKs that go from k + 1 to
  2^k, where k is some "tower of twos" (e.g. 2^2^2). As in the readings, we will sometimes
  refer to a group by the range of RANKs that it contains, like this: (k, 2^k].

  We can now make two further observations about RANKs and groups:

  Observation 3  There are at most log* n different groups.

  Proof: By Observation 2, the largest RANK is at most log n. Therefore, the largest
  group number is log*(log n), which equals (log* n) - 1. Since the groups are numbered
  starting at 0, we have at most (log* n) - 1 + 1 = log* n distinct groups.

  Observation 4  The number of elements in group (k, 2^k] is at most n/2^k.

  Proof: The elements in group (k, 2^k] have RANKs between k + 1 and 2^k, inclusive.
  By algebra, and Observation 1, we have:

      Σ_{j=k+1}^{2^k} n/2^j  ≤  Σ_{j=k+1}^{∞} n/2^j  =  n/2^k.

  Now, we are ready to bound the number of steps in m operations on n elements. Since
  all of the work in a Union operation is done in the two Finds that it requires, we will
  simply analyze the amount of work done in m Finds, and we will only be off by a
  constant.

  The work done during Find(x) is simply the number of steps we must travel to get
  from x to the root. We will account for these steps summed over all m operations using
  amortized analysis. This is just an accounting trick that makes the analysis easier
  to think about: we will "charge" some steps to the Find operations themselves, and
  some steps we will "charge" to particular elements. In the end, the sum of all of these
  charged steps is just the total number of steps performed, which is what we want.

  For each Find(x) operation, we will charge the steps as follows. Remember that we
  are traveling from x to the root, and as we go, we encounter elements with larger
  and larger RANKs. For those pointers that we follow that go from an element whose
  RANK is in group i to an element whose RANK is in a higher group, we charge one
  step to the Find operation. Since there are at most log* n different groups (by
  Observation 3), the total charge to each Find operation is at most log* n.

  For the other pointers that we follow while performing a Find(x) operation, we charge
  a step to the element from which the pointer originates. Suppose u is some vertex
  on the path from x to the root. Notice that every time we follow a pointer out of u,
  the RANK of u's parent increases (because of path compression), unless u's parent is
  already the root. This gives us a way to bound the number of times we can follow a
  pointer out of u before the RANK of u's parent changes to a new group. Specifically,
  we can follow a pointer out of u at most 2^k times while u's parent is still in group
  (k, 2^k].

  Now, let us sum over all of the elements in group (k, 2^k]. By Observation 4, there are
  at most n/2^k of these elements, and by the previous paragraph, we charged each of
  them at most 2^k times. This leads to a total charge among all elements of
  n/2^k · 2^k = n, while the parent RANKs remain in group (k, 2^k].

  As we perform more operations, the parent RANKs move to higher and higher groups;
  remember that RANKs (and therefore group numbers) never decrease. So to account
  for the total charge, we need to sum over the at most log* n groups, giving us a total
  charge among all elements during all m operations of at most n log* n.

  Finally, summing the charge to the m Find operations, and the charge to the elements,
  we get that the total work for m operations is O((m + n) log* n).



Chris Umans
February 23, 2000
Office Hours: 3:00-5:00 Tuesdays and Thursdays in Soda 583
umans@cs.berkeley.edu
Sections: 10:00-11:00 in Cory 289
          11:00-12:00 in Etcheverry 3109


* Huffman codes for Fibonacci frequencies: In the second part of the short exercise,
  we are given n characters c_1, c_2, ..., c_n whose frequencies are Fibonacci numbers, i.e.
  f(c_i) = F_i, where F_i is given by the recurrence: F_1 = 1, F_2 = 1, and
  F_i = F_{i-1} + F_{i-2} for i ≥ 3. Based on the first part of the exercise, we guess that
  this will result in a highly unbalanced Huffman tree, that looks like this:

  [Figure: a completely unbalanced tree; c_n is the leaf at the top level, c_{n-1} the leaf
  one level down, and so on, with c_1 and c_2 as the two deepest siblings.]

  This would result in a Huffman code of: 0^(n-1) for c_1, 0^(n-2)1 for c_2, and 0^(n-i)1
  for c_i, i ≥ 3.

  How do we prove that this tree gets built? Recall the example on a small number
  of characters in the first part of the exercise. We should show that at each step of the
  algorithm, the two smallest frequencies considered by the algorithm are the frequency
  at the root of the tree built so far, and the next smallest frequency among the individual
  characters not yet in the tree. More formally, we need to show that for all i ≥ 1, at step
  i of the algorithm, there is a single tree consisting of all of the characters c_1, c_2, ..., c_i,
  and that the frequency at the root of this tree, Σ_{j=1}^{i} F_j, is strictly less than F_{i+2}.
  If this is true, then the next two trees linked by the algorithm are the tree so far, and
  the single character c_{i+1}, with frequency F_{i+1}.

  So, what we need to prove boils down to this: Σ_{j=1}^{i} F_j < F_{i+2}, for all i ≥ 1. This
  cries out for an induction proof. The base case, when i = 1, is simple:

      Σ_{j=1}^{i} F_j = Σ_{j=1}^{1} F_j = F_1 = 1 < 2 = F_3 = F_{i+2}.

  For the induction step, consider the definition of F_{i+2}: it is equal to F_{i+1} + F_i. We
  can replace one of these with a sum and make it smaller, using the induction hypothesis.
  After trying both, it seems that it is easiest to replace F_{i+1}, giving us:

      F_{i+2} = F_{i+1} + F_i > (Σ_{j=1}^{i-1} F_j) + F_i = Σ_{j=1}^{i} F_j,

  where the inequality holds by induction. That completes the proof.

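  A quick numerical sanity check of this inequality in C, for small i only (the sharper
  identity in the final comment is a standard Fibonacci fact, not something proved here):

      #include <stdio.h>

      int main(void) {
          long F[40], sum = 0;
          F[1] = 1; F[2] = 1;
          for (int i = 3; i < 40; i++) F[i] = F[i - 1] + F[i - 2];

          /* Check sum_{j=1}^{i} F_j < F_{i+2} for i = 1 .. 37. */
          for (int i = 1; i <= 37; i++) {
              sum += F[i];
              if (sum >= F[i + 2]) { printf("fails at i = %d\n", i); return 1; }
          }
          printf("holds for i = 1 .. 37 (in fact sum = F_{i+2} - 1)\n");
          return 0;
      }
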

* Greedy Choice Property: The book "defines" something it calls the Greedy Choice
  Property, which is supposed to give you a way to determine if a greedy strategy will
  work on a given problem. This is sort of a vague concept, that translates roughly
  into: "The Greedy Choice Property holds for a particular greedy algorithm if the
  greedy choice made at each step of that algorithm can't hurt you." We looked at three
  examples of greedy algorithms, and the statements of the Greedy Choice Property for
  each.

  - Prim's MST algorithm: Here, the Greedy Choice Property states that if tree
    X is contained in some MST of G, and e is the lightest edge in G that extends
    X, then X ∪ {e} is also contained in some MST of G.

    When we apply the Greedy Choice Property to prove correctness of Prim's
    algorithm, X is the tree grown so far, and we apply the statement above to show
    that after each step of the algorithm the tree so far is contained in some MST.
    Therefore, after the last step, we must have a MST.

  - Kruskal's MST algorithm: The Greedy Choice Property here is very similar
    to the one for Prim's algorithm. It states that if forest X is contained in some
    MST of G, and e is the lightest edge in G whose addition to X does not create a
    cycle, then X ∪ {e} is contained in some MST.

    We apply this statement to prove the correctness of Kruskal's in exactly the same
    way we used the Greedy Choice Property to prove correctness of Prim's algorithm.

  - Huffman's algorithm: The Greedy Choice Property here states that if x and y
    are the characters with the two lowest frequencies, then there exists some optimal
    Huffman tree in which x and y are siblings.

    We use this to prove correctness as follows: the Greedy Choice Property tells
    us that there exists an optimal Huffman tree T in which x and y are siblings.
    A separate argument says that if T is an optimal tree in which x and y are
    siblings, then combining x and y into a single meta-character z whose frequency
    is f(x) + f(y) gives an optimal tree for this new (smaller) character set (CLR
    Lemma 17.3). Therefore, at each step in the algorithm, the tree produced "so
    far" is contained in some optimal tree; therefore after the last step, we must have
    an optimal tree.

* Proof of Greedy Choice Property for Huffman's algorithm: It is instructive
  to recall how we proved the Greedy Choice Property for, say, Prim's algorithm. There
  we wanted to show that extending X by e was "safe." We argued that if T is a MST
  containing X, then either (1) T also contains e, and we are done, or (2) T does not
  contain e. In case (2) we showed how to swap e with some other edge e' already in T
  to form a new tree T'; we also showed that the swap can only have decreased the cost,
  so T' must also be a MST, and it contains X and e, as required.

  For the Greedy Choice Property for Huffman's algorithm, we pursue a similar strategy.
  We consider an optimal tree T. If T has x and y as siblings, we are done. Otherwise,
  tree T looks like this:

  [Figure: an optimal tree T in which x and y are not siblings; a and b are the two
  deepest siblings in T.]

  Here, a and b are the deepest siblings in T. We will form a new tree T'' by exchanging
  x with a and y with b, and argue that B(T'') ≤ B(T), so T'' is optimal if T was. Recall
  that the cost of a tree is defined by: B(T) = Σ_{c ∈ C} f(c) d_T(c), where C is the set
  of characters, f(c) is the frequency of character c, and d_T(c) is the depth of c in tree T.

  We first consider T' obtained by swapping x and a. We will show that B(T') ≤ B(T)
  by showing that the net decrease is non-negative:

      B(T) - B(T') = Σ_{c ∈ C} f(c) d_T(c) - Σ_{c ∈ C} f(c) d_{T'}(c)
                   = f(x) d_T(x) + f(a) d_T(a) - f(x) d_T(a) - f(a) d_T(x)
                   = [f(a) - f(x)][d_T(a) - d_T(x)].

  Since x and y were chosen to be the characters with the smallest frequencies, we
  know that f(x) ≤ f(a), and by our choice of a and b, we know that d_T(x) ≤ d_T(a).
  Therefore both of the terms in the product on the last line above are non-negative, so
  B(T) - B(T') is non-negative.

  Finally, we transform T' into T'' by swapping y and b, and using an identical argument,
  we show that B(T'') ≤ B(T'). Here we use the fact that f(y) ≤ f(b), and
  d_{T'}(y) ≤ d_{T'}(b). The result is an optimal tree T'' that includes x and y as siblings,
  which shows that the Greedy Choice Property holds for the Huffman algorithm.



Chris Umans
March 1, 2000
Office Hours: 3:00-5:00 Tuesdays and Thursdays in Soda 583
umans@cs.berkeley.edu
Sections: 10:00-11:00 in Cory 289
          11:00-12:00 in Etcheverry 3109

* Strassen's algorithm example: In the short exercises you are asked to multiply two
  2 × 2 matrices using Strassen's algorithm. It's a good idea to first compute the result
  using ordinary matrix multiplication to check our results:

      A = [ 1  3 ]    B = [ 8  4 ]    C = AB = [ 26  10 ]
          [ 5  7 ]        [ 6  2 ]             [ 82  34 ]

  Now we move on to Strassen's method. We first need to compute the 7 intermediate
  results (using a total of 7 multiplications):

      P1 = (a12 - a22)(b21 + b22) = (3 - 7)(6 + 2) = -32
      P2 = (a11 + a22)(b11 + b22) = (1 + 7)(8 + 2) = 80
      P3 = (a11 - a21)(b11 + b12) = (1 - 5)(8 + 4) = -48
      P4 = (a11 + a12) b22        = (1 + 3) · 2    = 8
      P5 = a11 (b12 - b22)        = 1 · (4 - 2)    = 2
      P6 = a22 (b21 - b11)        = 7 · (6 - 8)    = -14
      P7 = (a21 + a22) b11        = (5 + 7) · 8    = 96

  Now, using addition and subtraction of the intermediate results, we compute the 4
  entries of the result matrix C:

      c11 = P1 + P2 - P4 + P6 = (-32) + 80 - 8 + (-14) = 26
      c12 = P4 + P5           = 8 + 2                  = 10
      c21 = P6 + P7           = (-14) + 96             = 82
      c22 = P2 - P3 + P5 - P7 = 80 - (-48) + 2 - 96    = 34

  As expected, these numbers agree with our computation of C = AB above.

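  The same computation in C, checked against ordinary multiplication (a small sketch;
  the variable names simply mirror the entries above):

      #include <stdio.h>

      int main(void) {
          int a11 = 1, a12 = 3, a21 = 5, a22 = 7;   /* A from the example */
          int b11 = 8, b12 = 4, b21 = 6, b22 = 2;   /* B from the example */

          /* The 7 products. */
          int P1 = (a12 - a22) * (b21 + b22);
          int P2 = (a11 + a22) * (b11 + b22);
          int P3 = (a11 - a21) * (b11 + b12);
          int P4 = (a11 + a12) * b22;
          int P5 = a11 * (b12 - b22);
          int P6 = a22 * (b21 - b11);
          int P7 = (a21 + a22) * b11;

          /* Combine with additions and subtractions only. */
          int c11 = P1 + P2 - P4 + P6;
          int c12 = P4 + P5;
          int c21 = P6 + P7;
          int c22 = P2 - P3 + P5 - P7;

          printf("Strassen: %d %d / %d %d\n", c11, c12, c21, c22);
          printf("Ordinary: %d %d / %d %d\n",
                 a11 * b11 + a12 * b21, a11 * b12 + a12 * b22,
                 a21 * b11 + a22 * b21, a21 * b12 + a22 * b22);
          return 0;
      }
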
* Recursive algorithms for matrix multiplication, and why they work: Strassen's
  algorithm is one example of a recursive algorithm for matrix multiplication. There
  are two fundamental (and not necessarily obvious) facts about matrices that justify
  the correctness of these recursive algorithms.

  Fact 1  We can subdivide n × n matrices M1 and M2 and compute their product piece
  by piece. This is called blocking. Strassen's algorithm divides matrices into four
  n/2 × n/2 matrices, which is the case we will verify here. But one can just as easily
  block matrices in other ways (e.g. into nine n/3 × n/3 submatrices). Let us consider
  the blocking used by Strassen's algorithm:

      M1 M2 = [ A  B ] [ E  G ] = [ AE + BF   AG + BH ]
              [ C  D ] [ F  H ]   [ CE + DF   CG + DH ]

  Clearly, if A, B, C, D, E, F, G and H were just numbers, the multiplication above is
  correct; we would just be multiplying 2 × 2 matrices, the usual way. The amazing fact
  is that this also is correct if A, B, C, ..., H are entire matrices. To see this, consider
  the ij-th entry in the result matrix M1 M2. Say it lies in the upper-left quadrant
  (an identical argument can be made for the case in which it lies in any of the other
  quadrants). We know that it should be the dot-product of the i-th row of M1 and the
  j-th column of M2. The way we are computing it here, though, is by adding the ij-th
  entry of the matrix product AE to the ij-th entry of the matrix product BF. But what
  are these values? The ij-th entry of AE is simply the dot product of the first half of
  M1's row i with the first half of M2's column j. The ij-th entry of BF is simply the
  dot product of the second half of M1's row i with the second half of M2's column j.
  When we add them together, we get the dot product of M1's row i with M2's column
  j, which is exactly the correct value for the ij-th entry of M1 M2!

  Fact 2  Arithmetic works with matrices in place of numbers. For example, if I write
  (a + b)c - bc, I know by algebra that I can rewrite this as
  (a + b)c - bc = (ac + bc) - bc = ac + (bc - bc) = ac. I am assuming a, b and c are,
  say, real numbers. It turns out that one can do the same algebraic manipulations when
  a, b and c are matrices. We know how to multiply and add matrices, and one can verify
  that the various rules used in grade-school algebra (multiplication distributes over
  addition, associativity, etc.) hold for matrices as well. In the terminology of abstract
  algebra, we say that matrices form a ring.

  Fact 1 allows us to apply a method for multiplying 2 × 2 matrices recursively. Fact 2
  is the reason we can verify algebraically that Strassen's fancy method of getting the
  four entries in the result matrix is correct, and then be sure that the algebra still works
  out when the "entries" are entire matrices instead of just single numbers.

* More on recursive matrix multiplication: In presenting Strassen's algorithm as
  an improvement over the "naive" recursive matrix multiplication, CLR and the readings
  are excited that Strassen's method is able to reduce 8 multiplications to 7. But
  Strassen's method also uses 18 additions, as opposed to only 4 in the naive algorithm.
  Why do we care so much about multiplications, and not additions?

  The answer is that we apply Strassen's 2 × 2 matrix multiplication method recursively.
  Each multiplication is actually a matrix multiplication, which is very costly, while each
  addition is just a matrix addition, which is comparatively much faster. It is instructive
  to look at the running time recurrences in each case:

      Naive method (8 mults, 4 additions):       T(n) = 8T(n/2) + 4(n/2)^2
      Strassen's method (7 mults, 18 additions): T(n) = 7T(n/2) + 18(n/2)^2

  (in both cases the additive term is Θ(n^2)). Using the Master Theorem for recurrences,
  we see that both recurrences fall into the case in which the number of recursive calls
  dominates, and the solutions are T(n) = Θ(n^(log_2 8)) = Θ(n^3) and
  T(n) = Θ(n^(log_2 7)) = Θ(n^2.81...), respectively. We now can see why multiplications
  matter so much more than additions. The number of multiplications translates directly
  to the number of recursive calls (which dictates the running time for these recurrences).
  The number of additions, on the other hand, simply contributes to the "f(n)" term in
  the recurrences. No matter what constant this number is, the f(n) term will always be
  Θ(n^2), so increasing the number of additions does not hurt the asymptotic running
  time at all.

* Other Strassen-like results: Victor Pan has a method for multiplying 68 × 68
  matrices using only 132,464 multiplications (the naive method uses 314,432). He also
  has a method for multiplying 70 × 70 matrices using only 143,640 multiplications,
  and 72 × 72 matrices using only 155,424 multiplications. And you probably thought
  Strassen must have spent too much time with matrices!

  There is an important point here, though. Pan's method for multiplying, say, 72 × 72
  matrices can be applied recursively, in the same way Strassen's method can. We simply
  need to block the matrices into n/72 × n/72 submatrices at each recursive call! This
  gives a general algorithm for multiplying matrices, with a running time recurrence of:

      T(n) = 155,424 T(n/72) + c (n/72)^2.

  Here c is the number of additions Pan's method requires. Notice that we don't even
  need to know this number, since we can see that the c (n/72)^2 term is just Θ(n^2),
  and that is all the information we need to solve the recurrence. The solution of this
  recurrence is, by the Master Theorem, T(n) = Θ(n^(log_72 155,424)) = Θ(n^2.795...).
  This is indeed a small improvement over Strassen's algorithm. But notice how much
  harder Pan needed to work on larger matrices: Strassen only needed to save a single
  multiplication, while Pan needed to save over half of the multiplications (some 217,824
  multiplications saved)!

  Question 1 on Homework 7 asks you to figure out how many multiplications you would
  need to save on 3 × 3 matrices in order to improve Strassen's result.


Chris Umans
March 8, 2000
Office Hours: 3:00-5:00 Tuesdays and Thursdays in Soda 583
umans@cs.berkeley.edu
Sections: 10:00-11:00 in Cory 289
          11:00-12:00 in Etcheverry 3109


* Discrete Fourier Transform example: In the short exercises you are asked to
  compute the Discrete Fourier Transform (DFT) of the vector (0, 1, 2, 3). Some people
  elected to do this by tracing through the FFT algorithm. Here we will simply do the
  appropriate polynomial evaluations, which seems a less error-prone method.

  Remember that the input vector a = (a_0, a_1, ..., a_{n-1}) gives the coefficients of a
  polynomial A(x). Here we have:

      A(x) = 0 + 1 · x + 2 · x^2 + 3 · x^3.

  The DFT requires us to evaluate the polynomial A(x) at the n complex n-th roots of
  unity ω_n^0, ω_n^1, ω_n^2, ..., ω_n^{n-1}. Here we notice that the complex 4-th roots of
  unity are just 1, i, -1, -i. (We can see this by plugging in 4 for n in the two equations
  ω_n = e^{2πi/n} and e^{iu} = cos(u) + i sin(u).) We compute the DFT vector
  y = (y_0, y_1, y_2, y_3) as follows:

      y_0 = A(ω_4^0) = A(1)  = 0 + 1 · (1)  + 2 · (1)^2  + 3 · (1)^3  = 6
      y_1 = A(ω_4^1) = A(i)  = 0 + 1 · (i)  + 2 · (i)^2  + 3 · (i)^3  = -2 - 2i
      y_2 = A(ω_4^2) = A(-1) = 0 + 1 · (-1) + 2 · (-1)^2 + 3 · (-1)^3 = -2
      y_3 = A(ω_4^3) = A(-i) = 0 + 1 · (-i) + 2 · (-i)^2 + 3 · (-i)^3 = -2 + 2i

  Therefore the DFT of (0, 1, 2, 3) is (6, -2 - 2i, -2, -2 + 2i).

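  A minimal C99 sketch of exactly this computation (direct evaluation at the four roots
  of unity, not the FFT):

      #include <complex.h>
      #include <stdio.h>

      int main(void) {
          const double PI = 3.141592653589793;
          double a[] = {0, 1, 2, 3};    /* coefficients of A(x) = 0 + 1x + 2x^2 + 3x^3 */
          int n = 4;

          for (int k = 0; k < n; k++) {
              double complex w = cexp(2.0 * PI * I * k / n);   /* w_n^k           */
              double complex y = 0, p = 1;                     /* p runs through w^j */
              for (int j = 0; j < n; j++) {
                  y += a[j] * p;
                  p *= w;
              }
              printf("y%d = %.1f %+.1fi\n", k, creal(y), cimag(y));
          }
          return 0;
      }

  Compiled with a C99 compiler (and -lm), this prints the four values above, up to
  floating-point rounding.
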
* The Fast Fourier Transform (FFT) algorithm: This is an incredibly important
  and beautiful algorithm, that computes the DFT in time O(n log n), instead of the
  O(n^2) time we get using the "obvious" algorithm of just performing the n separate
  polynomial evaluations.

  Our goal is to evaluate a polynomial A(x) = a_0 + a_1 x + a_2 x^2 + ... + a_{n-1} x^{n-1}
  at the complex n-th roots of unity. For the application we have in mind (multiplying
  polynomials), any n distinct points will do, but as we will see, these particular n
  points are critical for the divide and conquer approach we will take. The complex
  n-th roots of unity are: 1 = ω_n^0, ω_n^1, ω_n^2, ..., ω_n^{n-1}, where ω_n is defined to
  be e^{2πi/n}. It is important to remember that these are simply complex numbers,
  nothing more.

  The FFT is a divide and conquer algorithm. As with any divide and conquer algorithm,
  we first need to specify how to split the problem into subproblems. Our input is
  an (n - 1)-degree polynomial A(x), which we will split into two (n/2 - 1)-degree
  polynomials as follows:

      A(x) = A_even(x^2) + x A_odd(x^2),                                   (1)

  where A_even and A_odd simply use the even and odd coefficients of A, respectively (we
  assume n is even):

      A_even(z) = a_0 + a_2 z + a_4 z^2 + a_6 z^3 + ... + a_{n-2} z^{n/2-1}
      A_odd(z)  = a_1 + a_3 z + a_5 z^2 + a_7 z^3 + ... + a_{n-1} z^{n/2-1}.

  We write A_even and A_odd as polynomials in z to emphasize that while they are derived
  from A(x), they are not the same as A(x), nor are they formed from a subset of the
  terms of A(x). It is easy to verify, however, that Equation 1 above is correct; i.e. in
  order to evaluate A at a point x, we need only to evaluate A_even and A_odd at the
  point x^2, and then combine the results with one further multiplication and addition.
  This leads to the following strategy for achieving our goal of computing the n values:

      A(ω_n^0), A(ω_n^1), A(ω_n^2), ..., A(ω_n^{n-1}).

  Using our way of splitting up A, we should compute:

      A_even((ω_n^0)^2), A_even((ω_n^1)^2), A_even((ω_n^2)^2), ..., A_even((ω_n^{n-1})^2),

  and

      A_odd((ω_n^0)^2), A_odd((ω_n^1)^2), A_odd((ω_n^2)^2), ..., A_odd((ω_n^{n-1})^2),

  and finally combine each of the n pairs: A(ω_n^k) = A_even((ω_n^k)^2) + ω_n^k A_odd((ω_n^k)^2),
  for k = 0, 1, 2, ..., n - 1.

  Now, for the first time, we will use a property of the special set of n numbers at which
  we chose to evaluate A. It would seem that our divide and conquer approach has us
  evaluating A at n points by evaluating A_even and A_odd at n points, which isn't an
  improvement. However, by the magic of complex roots of unity, it turns out that we
  are actually evaluating A_even and A_odd at only n/2 distinct points, and, further, these
  n/2 distinct points are exactly the complex n/2-th roots of unity! This is called the
  Halving Lemma in CLR, and we restate it here:

  Lemma 4  Let ω_n^0, ω_n^1, ..., ω_n^{n-1} be the complex n-th roots of unity, and let
  ω_{n/2}^0, ω_{n/2}^1, ..., ω_{n/2}^{n/2-1} be the complex n/2-th roots of unity. We have:

      (ω_n^0)^2       = (ω_n^{n/2})^2   = ω_{n/2}^0
      (ω_n^1)^2       = (ω_n^{n/2+1})^2 = ω_{n/2}^1
      (ω_n^2)^2       = (ω_n^{n/2+2})^2 = ω_{n/2}^2
        ...
      (ω_n^{n/2-1})^2 = (ω_n^{n-1})^2   = ω_{n/2}^{n/2-1}.

  We now have a bona fide divide and conquer algorithm. It goes as follows: to evaluate
  an (n - 1)-degree polynomial A(x) at the complex n-th roots of unity, recursively
  evaluate A_even and A_odd, two (n/2 - 1)-degree polynomials, at the complex n/2-th
  roots of unity; then combine the results according to Equation 1 using n multiplications
  and additions. The running time T(n) of this algorithm is expressed by the following
  (familiar) recurrence:

      T(n) = 2T(n/2) + Θ(n),

  whose solution, by the Master Theorem, is T(n) = Θ(n log n).

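  A minimal recursive C99 sketch of this scheme (it allocates scratch arrays at every call
  to keep the code short; a serious implementation would avoid that, and n must be a
  power of 2):

      #include <complex.h>
      #include <stdio.h>
      #include <stdlib.h>

      /* Evaluate the polynomial with coefficients a[0..n-1] at the n complex
         n-th roots of unity; results go in y[0..n-1]. */
      void fft(const double complex *a, double complex *y, int n) {
          if (n == 1) { y[0] = a[0]; return; }

          double complex *ae = malloc(n / 2 * sizeof *ae);   /* even coefficients */
          double complex *ao = malloc(n / 2 * sizeof *ao);   /* odd coefficients  */
          double complex *ye = malloc(n / 2 * sizeof *ye);
          double complex *yo = malloc(n / 2 * sizeof *yo);
          for (int j = 0; j < n / 2; j++) { ae[j] = a[2 * j]; ao[j] = a[2 * j + 1]; }

          fft(ae, ye, n / 2);   /* A_even at the n/2-th roots of unity (Halving Lemma) */
          fft(ao, yo, n / 2);   /* A_odd  at the n/2-th roots of unity                 */

          const double PI = 3.141592653589793;
          for (int k = 0; k < n / 2; k++) {
              double complex w = cexp(2.0 * PI * I * k / n);           /* w_n^k */
              y[k]         = ye[k] + w * yo[k];   /* Equation 1 at w_n^k             */
              y[k + n / 2] = ye[k] - w * yo[k];   /* ... and at w_n^{k+n/2} = -w_n^k */
          }
          free(ae); free(ao); free(ye); free(yo);
      }

      int main(void) {
          double complex a[4] = {0, 1, 2, 3}, y[4];
          fft(a, y, 4);
          for (int k = 0; k < 4; k++)
              printf("y%d = %.1f %+.1fi\n", k, creal(y[k]), cimag(y[k]));
          return 0;
      }

  Running it on the vector (0, 1, 2, 3) reproduces, up to rounding, the DFT computed by
  hand in the earlier example.
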
* Application: polynomial multiplication: Our motivation for trying to get a fast
  algorithm for the DFT was polynomial multiplication. Here we are given two polynomials
  A(x) and B(x), each of degree at most n, and we want to compute C(x) = A(x)B(x).
  We noticed that just as we can uniquely express a degree n polynomial with its n + 1
  coefficients, we can also uniquely express a degree n polynomial with its evaluation at
  n + 1 distinct points. Furthermore, given an evaluation of A(x) and B(x) at 2n + 1
  distinct points x_1, x_2, ..., x_{2n+1}, we can obtain an evaluation of C(x) = A(x)B(x)
  at the same set of 2n + 1 distinct points by pointwise multiplication. That is, if
  A(x_i) = y_i and B(x_i) = z_i, then C(x_i) = A(x_i)B(x_i) = y_i z_i. Finally, since C(x)
  can have degree at most 2n, these 2n + 1 points uniquely determine C(x), and we can
  recover the coefficients by interpolation.

  Our algorithm for polynomial multiplication is thus: (1) evaluate A(x) and B(x) at
  2n + 1 distinct points, (2) multiply them pairwise to obtain the value of C(x) = A(x)B(x)
  at 2n + 1 distinct points, and (3) interpolate to find the coefficients of C(x). Step (2)
  takes Θ(n) operations. The FFT algorithm allows us to do step (1) and step (3) in
  time Θ(n log n) (using the complex (2n + 1)-th roots of unity as our distinct points),
  for an overall running time of Θ(n log n).

* The Inverse DFT: For the application above, we claimed that it was possible to
  use the FFT algorithm to interpolate, or in this case, to compute the Inverse DFT.
  Remember that the DFT transforms a vector a = (a_0, a_1, ..., a_{n-1}) of coefficients of
  a degree n - 1 polynomial A(x) into a vector y = (y_0, y_1, ..., y_{n-1}) of evaluations of
  A(x) at the complex n-th roots of unity. The Inverse DFT computes the vector a of
  coefficients from the vector y of evaluations. In the polynomial-multiplication application
  above, we use the Inverse DFT to recover the coefficients of C(x) after we obtain a
  vector of evaluations of C(x) at the complex (2n + 1)-th roots of unity.

  In computing the DFT, we computed from the n entries of the a vector, the n entries
  of the y vector, whose formulas were:

      y_k = Σ_{j=0}^{n-1} a_j (ω_n^k)^j.

  It turns out that we can express the a's in terms of the y's by the following formulas
  (proved in CLR):

      a_k = (1/n) Σ_{j=0}^{n-1} y_j (ω_n^{-k})^j.

  These are remarkably similar! It should not be surprising, then, that we can compute
  the Inverse DFT using the FFT algorithm in which we replace ω_n by its inverse ω_n^{-1},
  and divide all of the entries in the result vector by n (although we certainly haven't
  proved that this works).

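  To make the last remark concrete, here is a small companion sketch (reusing fft() and
  the includes from the example above). It uses the standard conjugation trick, which
  amounts to running the FFT with ω_n replaced by ω_n^{-1}, and then divides by n:

      /* Inverse DFT: recover the coefficient vector a from the evaluation vector y. */
      void inverse_fft(const double complex *y, double complex *a, int n) {
          double complex *tmp = malloc(n * sizeof *tmp);
          for (int j = 0; j < n; j++) tmp[j] = conj(y[j]);    /* conjugate the input   */
          fft(tmp, a, n);
          for (int j = 0; j < n; j++) a[j] = conj(a[j]) / n;  /* conjugate, divide by n */
          free(tmp);
      }

  Applied to (6, -2 - 2i, -2, -2 + 2i), this recovers the coefficient vector (0, 1, 2, 3).
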


Chris Umans
March 15, 2000
Office Hours: 3:00-5:00 Tuesdays and Thursdays in Soda 583
umans@cs.berkeley.edu
Sections: 10:00-11:00 in Cory 289
          11:00-12:00 in Etcheverry 3109

* Computing binomial coefficients: In the short exercises you are asked to compute,
  using three different methods, the binomial coefficient C(n, k) defined as follows:

      C(n, k) = C(n - 1, k - 1) + C(n - 1, k),   n > 0, k > 0
      C(n, 0) = 1,                               n ≥ 0
      C(0, k) = 0,                               k > 0.

  The recursive method follows easily from the definitions:

      long recursive_C(int n, int k) {
          if (k == 0) return 1;
          if (n == 0) return 0;
          return recursive_C(n - 1, k - 1) + recursive_C(n - 1, k);
      }

  One can prove by a straightforward induction that the running time of recursive_C(n, k)
  is at least C(n, k), which is at least exponential in the worst case. The space required
  is Θ(n), the maximum depth of the recursion tree.

  The dynamic programming method fills in a (n + 1) × (k + 1) table, in which entry (i, j)
  is intended to be C(i, j). The only thing we need to consider before coding this is what
  order to fill in the table. We notice that entry (i, j) depends on entries (i - 1, j - 1)
  and (i - 1, j). If we visualize the table with entry (0, 0) in the upper left corner, this
  means that each entry depends on the entry diagonally above and to the left of it, and
  the entry directly above it. If we fill in the table from top to bottom, and left to right
  within each row, we will be sure that when we compute entry (i, j), the table entries
  consulted in computing entry (i, j) are already filled in. The code follows:

      long dynamic_prog_C(int n, int k) {
          long C[n + 1][k + 1];
          for (int i = 0; i <= n; i++) C[i][0] = 1;
          for (int j = 1; j <= k; j++) C[0][j] = 0;
          for (int i = 1; i <= n; i++)
              for (int j = 1; j <= k; j++)
                  C[i][j] = C[i - 1][j - 1] + C[i - 1][j];
          return C[n][k];
      }

  As with all dynamic programming algorithms, the running time is the number of entries
  in the table, times the amount of work to fill in each entry. In this case, we have Θ(nk)
  table entries, and we fill in each with O(1) work, for a total running time of Θ(nk).
  The space required is the size of the table, or Θ(nk).

  Finally, using the formula C(n, k) = n! / ((n - k)! k!), we can write the following code:

      long formula_C(int n, int k) {
          if (k == 0) return 1;
          if (n == 0) return 0;
          long x = 1, n_minus_k_fact = 1, k_fact = 1;
          for (int i = 1; i <= n; i++) {
              x = x * i;
              if (i == n - k) n_minus_k_fact = x;
              if (i == k) k_fact = x;
          }
          return x / (n_minus_k_fact * k_fact);
      }

  Counting multiplications as constant time and constant space operations (which is
  slightly unrealistic here, because the numbers we are computing are so large), the
  running time is O(n), and the space required is a constant. A better running time of
  O(k) can be obtained using the observation that n!/(n - k)! = n(n - 1)(n - 2) ··· (n - k + 1)
  and k! can both be computed with only k multiplications.

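  That observation gives, for example, the following variant (a sketch; like the code above
  it ignores overflow, and the final division is exact because the numerator equals
  C(n, k) · k!):

      long formula_C_fast(int n, int k) {
          long falling = 1, k_fact = 1;
          for (int i = 0; i < k; i++) {
              falling *= n - i;      /* n (n-1) ... (n-k+1), i.e. n!/(n-k)! */
              k_fact  *= i + 1;      /* k!                                  */
          }
          return falling / k_fact;
      }
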
* Framework for dynamic programming: In devising a dynamic programming
  algorithm, there are 4 steps/questions to be answered:

  1. What are the subproblems?
  2. How do we express the solution to each subproblem in terms of solutions to smaller
     subproblems?
  3. What is the correct order to fill in the dynamic programming table?
  4. What is the running time and space requirement of the algorithm?

  Step 2 and sometimes step 1 are the difficult parts; once these are complete, it should
  be fairly easy to complete step 3, and write the actual (pseudo-)code.

  A few notes on steps 1 and 2: by a subproblem, we mean an instance of the original
  problem with a different (smaller) input. In determining the subproblems, sometimes
  it is necessary to first rephrase the original problem. In figuring out the recursive
  expression in step 2, we use the so-called optimal substructure of the problem: we
  often consider a range of initial choices for constructing a solution, and then each
  choice leads to some number of subproblems. The next few examples should make
  these observations more concrete.

* Chain Matrix Multiplication: We want to multiply a chain of rectangular matrices
  A_1, A_2, ..., A_n with dimensions m_0, m_1, m_2, ..., m_n (matrix A_i has dimensions
  m_{i-1} × m_i). Since the number of operations required varies with the order we choose
  to multiply the matrices, we want to determine the optimal way of parenthesizing the
  matrices. Let's see how the solution we saw in lecture fits into the framework above:

  1. The problem asks: What is the optimal way to multiply A_1 · A_2 · ... · A_n?
     The subproblems ask: What is the optimal way to multiply A_i · A_{i+1} · ... · A_j?
     Notice that the subproblems are all instances of the same problem, with different
     inputs, namely: all sub-chains of matrices A_i, A_{i+1}, ..., A_j. There are O(n^2)
     such subproblems, and each is defined by a pair of endpoints (i, j). Notice that
     the problem on the original input is just subproblem (1, n).

  2. First, we define some notation: let M(i, j) be the number of operations for
     multiplying A_i, ..., A_j in an optimal order. Now, we want to write a recursive
     expression for M(i, j), so we "think recursively": the optimal grouping first breaks
     the matrices at some k between i and j, multiplies the left half in an optimal order,
     multiplies the right half in an optimal order, and then does one final multiplication
     of the left result matrix and the right result matrix. In our notation:

         M(i, j) = min_{i ≤ k < j} [ M(i, k) + M(k + 1, j) + m_{i-1} m_k m_j ]
         M(i, i) = 0 for all i,

     where the term m_{i-1} m_k m_j is the cost of the final step. (A short code sketch
     appears after this list.)

  3. The dynamic programming table for this example has size O(n^2), and we want
     to place the value of M(i, j) in entry (i, j), for all 1 ≤ i ≤ j ≤ n. We always
     fill in the table from the smallest subproblem up to the largest subproblem. Here
     the "size" of subproblem (i, j) is simply the number of matrices in its input, or
     j - i + 1. We can verify that in our recursive expression, M(i, j) depends only on
     solutions to smaller subproblems. So, we fill in the table in the following order:

         fill in M(i, i) = 0         for i = 1, 2, ..., n       (size 1)
         fill in M(i, i + 1)         for i = 1, 2, ..., n - 1   (size 2)
         fill in M(i, i + 2)         for i = 1, 2, ..., n - 2   (size 3)
           ...
         fill in M(i, i + (n - 1))   for i = 1                  (size n).

  4. As noted, the size of the table in this example is O(n^2), and the work required
     to fill in each entry is O(n), since we are taking the minimum over O(n) values.
     Therefore the running time is O(n^3). The space required is the size of the table,
     O(n^2).

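  Here is the promised sketch in C (the dimensions in m[] are made up, and it computes
  only the optimal cost, not the actual parenthesization):

      #include <limits.h>
      #include <stdio.h>

      #define N 4                               /* number of matrices */

      int main(void) {
          int m[N + 1] = {10, 20, 5, 30, 8};    /* A_i is m[i-1] x m[i]             */
          long M[N + 1][N + 1];                 /* M[i][j]: min ops for A_i ... A_j */

          for (int i = 1; i <= N; i++) M[i][i] = 0;        /* size-1 chains       */
          for (int len = 2; len <= N; len++)               /* then size 2, 3, ... */
              for (int i = 1; i + len - 1 <= N; i++) {
                  int j = i + len - 1;
                  M[i][j] = LONG_MAX;
                  for (int k = i; k < j; k++) {            /* try every split point */
                      long cost = M[i][k] + M[k + 1][j] + (long)m[i - 1] * m[k] * m[j];
                      if (cost < M[i][j]) M[i][j] = cost;
                  }
              }

          printf("minimum number of multiplications: %ld\n", M[1][N]);
          return 0;
      }
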
* Longest Common Subsequence (LCS): In this problem we are given two character
  strings x = x_1 x_2 ... x_m and y = y_1 y_2 ... y_n, and we wish to find the longest common
  subsequence of x and y. A subsequence is just a portion of a string obtained by omitting
  characters. We want to find a longest subsequence that occurs in both x and y. For
  example, if x = nonsense and y = longest, the longest common subsequence of x and
  y is ones. Let's devise a dynamic programming algorithm using the framework above:

  1. The problem asks: What is the longest common subsequence of x_1 x_2 ... x_m and
     y_1 y_2 ... y_n? The subproblems ask: What is the longest common subsequence of
     x_i x_{i+1} ... x_m and y_j y_{j+1} ... y_n? There are O(n^2) subproblems, and we can
     describe each by a pair of indices (i, j). As always, the subproblems are all instances
     of the same problem with differing inputs.

  2. We introduce the notation L(i, j) to represent the length of the longest common
     subsequence in subproblem (i, j). We need to write a recursive definition for
     L(i, j), expressing it in terms of smaller subproblems. We might start by considering
     the "first" choice in finding a LCS of two strings: if the first characters match,
     then the LCS is just the first character, plus the LCS of the remainder of the two
     strings. Otherwise, the LCS is either the LCS of the first string with the second
     string after discarding its first character, or the LCS of the second string with
     the first string after discarding its first character. In our notation, we have:

         L(i, j) = 1 + L(i + 1, j + 1)                  if x_i = y_j
         L(i, j) = max[ L(i + 1, j), L(i, j + 1) ]      otherwise
         L(i, n) = 1 if y_n appears in string x_i x_{i+1} ... x_m, and 0 otherwise
         L(m, j) = 1 if x_m appears in string y_j y_{j+1} ... y_n, and 0 otherwise.

     For the base case, we noticed that the length of a LCS of two strings is easy to
     determine if either string is just a single character: in this case, the length of a
     LCS is 1 if that character appears in the other string, and 0 otherwise. (A short
     code sketch appears after this list.)

  3. Like the chain matrix multiplication example, the dynamic programming table
     for this example has size O(n^2), and we want to place the value of L(i, j) in entry
     (i, j), for all 1 ≤ i ≤ m and 1 ≤ j ≤ n. The order I described in section for filling
     in the table was slightly inaccurate. Here is a good way to visualize what the
     correct order should be:

         [Figure: the table of entries (i, j); entry (i, j) depends on the entry to its
         right, the entry below it, and the entry diagonally below and to its right.]

     Entry (i, j) in the table depends on three other entries (from our recursive
     definition above). It depends on the entry to its right, the entry below it, and the
     entry diagonally to its right and below it, as pictured. Therefore, we can fill in
     the table in the following order: first, fill in the last column and the last row,
     which correspond to the base cases. Then, fill in the table from bottom to top,
     and right to left within each row. It is easy to see that at the time we get to
     filling in each entry (i, j), the other three entries upon which it depends will have
     already been filled in.

  4. The size of the table is O(n^2), and the time to fill in each entry is O(1); therefore
     the running time is O(n^2).

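  Here is the promised sketch in C. It fills in the table suffix by suffix exactly as
  described, except that an extra row and column of zeros (indices m and n) stands in
  for the two special base cases, which yields the same values; strings are 0-indexed.

      #include <stdio.h>
      #include <string.h>

      #define MAXLEN 100

      /* Length of a longest common subsequence of x and y. */
      int lcs_length(const char *x, const char *y) {
          int m = strlen(x), n = strlen(y);
          int L[MAXLEN + 1][MAXLEN + 1];

          for (int i = 0; i <= m; i++) L[i][n] = 0;
          for (int j = 0; j <= n; j++) L[m][j] = 0;

          for (int i = m - 1; i >= 0; i--)          /* bottom to top */
              for (int j = n - 1; j >= 0; j--) {    /* right to left */
                  if (x[i] == y[j])
                      L[i][j] = 1 + L[i + 1][j + 1];
                  else
                      L[i][j] = L[i + 1][j] > L[i][j + 1] ? L[i + 1][j] : L[i][j + 1];
              }
          return L[0][0];
      }

      int main(void) {
          printf("%d\n", lcs_length("nonsense", "longest"));   /* 4, for "ones" */
          return 0;
      }
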
* All-Pairs Shortest Paths: In this problem, we are given a graph G = (V, E) with
  edge weights w(v_i, v_j) for all edges (v_i, v_j) ∈ E. (There are no negative cycles, and
  for convenience, we will assume w(v_i, v_j) = ∞ if (v_i, v_j) is not in E.) We want to
  find the shortest path from v_i to v_j for all vertex pairs (v_i, v_j). Let's see how the
  dynamic programming solution to this problem fits into the framework above:

  1. This is an example of a case in which we alter the problem statement in order
     to be able to clearly state the subproblems. Our modified problem asks: What
     is the shortest path from v_i to v_j using (some of) the vertices v_1, v_2, ..., v_n as
     intermediate vertices on the path? The subproblems ask: What is the shortest path
     from v_i to v_j using (some of) the vertices v_1, v_2, ..., v_k as intermediate vertices
     on the path? Notice that since we will be solving all of the subproblems, we can
     extract the solution to our original problem from these solutions, since, among
     other things, we will have the shortest paths for all vertex pairs (v_i, v_j). Here
     there are O(n^3) subproblems, and each can be described by a triple (i, j, k).

  2. We introduce the notation T(i, j, k) to represent the length of the shortest path
     from v_i to v_j using (some of) the intermediate vertices v_1, v_2, ..., v_k. We want a
     recursive expression for T(i, j, k) in terms of smaller subproblems. As with the
     LCS problem, we begin by noticing that there are two cases: either the shortest
     path doesn't use vertex v_k, or it does; these two possibilities are pictured below
     in (a) and (b), respectively:

         [Figure: (a) a shortest path from v_i to v_j that avoids v_k; (b) a shortest path
         from v_i to v_j that passes through v_k, formed from a path from v_i to v_k and
         a path from v_k to v_j, each using only v_1, ..., v_{k-1} as intermediate vertices.]

     In (a), the length of the shortest path from v_i to v_j using v_1 ... v_k as intermediate
     vertices is simply the length of the shortest path from v_i to v_j using vertices
     v_1 ... v_{k-1} as intermediate vertices. In (b), the length is the length of the shortest
     path from v_i to v_k using vertices v_1 ... v_{k-1} as intermediate vertices plus the
     length of the shortest path from v_k to v_j using vertices v_1 ... v_{k-1} as intermediate
     vertices. This observation allows us to write the following recursive expression for
     T(i, j, k):

         T(i, j, k) = min[ T(i, j, k - 1), T(i, k, k - 1) + T(k, j, k - 1) ]
         T(i, j, 0) = w(v_i, v_j).

     (A short code sketch appears after this list.)

  3. Here it is difficult to visualize the dynamic programming table (since it has 3
     dimensions), but it is easy to describe the order in which we must fill it in. The
     "size" of subproblem (i, j, k) is just k, the number of allowed intermediate vertices.
     It is easy to verify that in the expression for T(i, j, k) above, T(i, j, k) depends
     only on smaller subproblems. We can fill in the table in the following order:

         fill in T(i, j, 0) for all i, j   (size 0)
         fill in T(i, j, 1) for all i, j   (size 1)
         fill in T(i, j, 2) for all i, j   (size 2)
           ...
         fill in T(i, j, n) for all i, j   (size n)

     These last entries are the answers to our original question about the shortest
     paths between all pairs of vertices.

  4. Here, the table has size O(n^3), and the work to fill in each entry is O(1), for a
     running time of O(n^3).

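  Here is the promised sketch in C, usually called the Floyd-Warshall algorithm (the
  weight matrix is made up; INF stands in for "no edge"):

      #include <stdio.h>

      #define N 4
      #define INF (1 << 29)

      int main(void) {
          /* T starts as the k = 0 table: T(i, j, 0) = w(v_i, v_j), 0 on the diagonal. */
          int T[N][N] = {
              {0,   3,   INF, 7  },
              {8,   0,   2,   INF},
              {5,   INF, 0,   1  },
              {2,   INF, INF, 0  }
          };

          /* Each pass over k overwrites T with the k-th table.  Reusing one table
             is safe because T(i, k, k) = T(i, k, k-1) and T(k, j, k) = T(k, j, k-1). */
          for (int k = 0; k < N; k++)
              for (int i = 0; i < N; i++)
                  for (int j = 0; j < N; j++)
                      if (T[i][k] + T[k][j] < T[i][j])
                          T[i][j] = T[i][k] + T[k][j];

          for (int i = 0; i < N; i++) {
              for (int j = 0; j < N; j++) printf("%6d", T[i][j]);
              printf("\n");
          }
          return 0;
      }
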


Chris Umans
March 22, 2000
Office Hours: 3:00-5:00 Tuesdays and Thursdays in Soda 583
umans@cs.berkeley.edu
Sections: 10:00-11:00 in Cory 289
          11:00-12:00 in Etcheverry 3109


* Graphical LP example: In the short exercises you are asked to solve the following
  Linear Program (LP) in two variables (which I have renamed x and y):

      maximize    3x + 5y
      subject to  x + 2y ≤ 6
                  x - y  ≤ 2
                  x      ≤ 3
                  x, y   ≥ 0

  The following figure gives a graphical representation of the problem:

  [Figure: the constraint lines x + 2y = 6, x - y = 2, and x = 3, a line of constant
  objective value with arrows showing the direction in which it is slid, and the shaded
  feasible region with corners (0, 0), (2, 0), (3, 1), (3, 3/2), and (0, 3), labeled with
  objective values 0, 6, 14, 33/2, and 15.]

  The line defined by each of the constraints is pictured, along with a line representing
  a constant value of the objective function. The feasible region is shaded. We imagine
  "sliding" the objective function line in the direction of decreasing objective function
  value (in the direction of the arrows in the figure), until it first touches the feasible
  region. The point of intersection will be the maximum value of the objective function
  subject to the constraints. This will always happen at a corner (or along an entire line,
  in which case we can pick an endpoint of that line, which is a corner).

  To determine which corner attains the maximum value of the objective function, we
  evaluate the objective function at each of the corners of the feasible region. In this
  example, there are 5 corners, each marked with a heavy dot in the figure, and each is
  labeled with its coordinates and the value of the objective function at that point. The
  optimum value of 33/2 is attained at (3, 3/2). The solution to the LP is thus: x = 3,
  y = 3/2.

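  A tiny C sketch of this last step (it only evaluates the objective at the five corners
  read off the figure; it does not find the corners itself):

      #include <stdio.h>

      int main(void) {
          double corners[5][2] = {{0, 0}, {2, 0}, {3, 1}, {3, 1.5}, {0, 3}};
          double best = -1, bx = 0, by = 0;

          for (int i = 0; i < 5; i++) {
              double x = corners[i][0], y = corners[i][1];
              double obj = 3 * x + 5 * y;              /* the objective 3x + 5y */
              printf("(%g, %g): objective %g\n", x, y, obj);
              if (obj > best) { best = obj; bx = x; by = y; }
          }
          printf("optimum %g at (%g, %g)\n", best, bx, by);
          return 0;
      }
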
* Why we care: Many real-world problems fit into the general framework "maximize
  or minimize some objective function, subject to some constraints." It turns out that
  many such formulations are very hard to solve efficiently. For example, we might want
  to require that the variables in an LP are integers, which is natural for many problems
  that seem to need a Boolean variable. We might also want an objective function that
  includes a quadratic term (one that multiplies two variables together). It turns out
  that both of these formulations are (probably) phenomenally hard to solve exactly.
  But, Linear Programming is easy, in the following two ways:

  - A Linear Program can be solved in polynomial time. The algorithms that do this
    are complex and often not useful in practice.

  - The simplex algorithm solves most Linear Programs very efficiently in practice,
    although on some degenerate cases it requires exponential running time. The
    simplex algorithm takes advantage of the fact that the optimum lies at a corner
    of the feasible region. It starts at some corner, and repeatedly moves to a better
    neighboring corner, until it can't improve the value of the objective function
    anymore. Of course, in general this algorithm is operating in n-dimensional space,
    moving from corner to corner of the polytope that is the feasible region. In this
    course, you are only expected to be able to trace the steps the simplex algorithm
    might take on 2 (and possibly 3) variable problems, where one can reasonably
    expect to be able to draw the feasible region graphically.

  Because we have an efficient way to solve LPs, and they provide a quite general
  framework for expressing optimization problems, it's often worth putting a fair amount
  of effort into trying to express an optimization problem as a LP. If such a transformation
  can be found, it automatically gives an efficient solution to the optimization problem.

• Example: production scheduling: One of the major uses of Linear Programming
is in business, where a company is trying to make decisions that maximize profit
while obeying certain constraints. Here is a simple example: a business produces two
products, A and B, and it must decide how many of each to produce in the next month.
Say product A sells for $10 per unit and B sells for $12 per unit. Then to maximize
profit, the business should solve the following LP:

    maximize    10a + 12b
    subject to  a, b ≥ 0

Here a is the amount of A to produce and b is the amount of B to produce. What is
the solution to this LP? The solution is to produce an infinite amount of A and B. In
LP terminology we say that this problem is unbounded. Clearly this is not realistic, so
we should add some constraints to model the situation more accurately. Let's say that
the company can produce no more than 10,000 units of either product. We should
then add the constraints:

    a ≤ 10000
    b ≤ 10000

Now, the optimum is achieved by producing 10,000 units of each. This is reasonable
but not terribly interesting. Now suppose there is a single packaging machine that
must package each unit of A or B produced by the company before it is shipped, and
suppose that it takes 1 minute to package a unit of A and 3 minutes to package a unit
of B. Furthermore, the packaging machine is run during working hours, which are 40
hrs/week, for 4 weeks. We get the following additional constraint:

    a + 3b ≤ 9600,

because the machine can operate for a total of 4 · 40 · 60 = 9600 minutes. Now it's
harder to see what the optimum is, but we could use our LP-solver to find it.
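As a sketch of how one might hand this LP to an off-the-shelf solver, here is the same problem expressed for scipy.optimize.linprog (which minimizes, so the objective is negated); the numbers are exactly the ones above:

    from scipy.optimize import linprog

    # maximize 10a + 12b  <=>  minimize -10a - 12b
    c = [-10, -12]
    # a <= 10000, b <= 10000, a + 3b <= 9600
    A_ub = [[1, 0], [0, 1], [1, 3]]
    b_ub = [10000, 10000, 9600]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)])
    print(res.x, -res.fun)   # the solver should report a = 9600, b = 0, for a profit of 96000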
To make the problem even more realistic, we could consider planning a schedule for
the next 12 months. We would then have 24 variables (a1, a2, ..., a12 and b1, b2, ..., b12)
instead of 2, representing the amount of each product produced each month. Our
constraints could be modified to account for anticipated different prices from month to
month, different monthly costs, and the possibility of producing extra units one month
and storing them (at some cost, of course) until they are needed in a future month.
The readings have such an example. The point is that it is often very easy to formulate
this type of decision-making as an LP, and by adding constraints, we can model quite
complex scenarios.
• Example: maximum delay in a network: Suppose I send an email to a friend
and I want to know how long I must wait before I am certain that the email has
been received. The maximum delay will depend on the characteristics of the network
through which the message passes. We model the network with the following simple
graph:
    [Figure: a layered graph with source s on the left, entry vertices s1, s2, ..., sn,
    exit vertices t1, t2, ..., tn, and destination t on the right; edges go from s to
    each si, from the si to the tj, and from each tj to t.]
In this model, the source computer is located at the vertex s, the destination is at
vertex t, the vertices labeled s1, s2, ..., sn represent the possible points of entry to the
network, and the vertices labeled t1, t2, ..., tn represent the possible exit points. The
edges (si, tj) represent possible routes through the network. Every edge e in the graph
has an associated weight w(e) that is an upper bound on the delay along that route. To
solve my problem, I am interested in the path from s to t with the maximum weight.
We will try to write this problem as a Linear Program.
This is an example of a problem where the LP variables are not at all obvious. For
this problem we will have a variable associated with each vertex, which, by abuse of
notation, we will label with the same name as the vertex itself. So, the LP variables
are s, t, s1, s2, ..., sn, t1, t2, ..., tn. The LP is:

    minimize    s − t
    subject to  s − si ≥ w(s, si)     for all i
                si − tj ≥ w(si, tj)   for all i, j
                tj − t ≥ w(tj, t)     for all j

This is fairly counter-intuitive. To understand how these constraints are useful, notice
that we are trying to minimize the gap between s and t. That is, the LP is trying to
push t as close to s as it possibly can. We want the gap between s and t to be the
length of the longest path, so we need constraints that prevent the gap from becoming
too small. Each constraint in the LP forces the gap between the variables representing
the endpoints of an edge to be at least as large as the length of that edge. In particular,
we can prove the following claim:

Claim 1 For all paths p from s to t, a feasible assignment to the variables in the LP
above must have s − t ≥ length(p).
Proof: Say the path p goes from s to si to tj to t. Let's see what information we can
get from the constraints we know these four variables must obey:

    s − si ≥ w(s, si)
    si − tj ≥ w(si, tj)
    tj − t ≥ w(tj, t)

If we sum these inequalities, we get s − t ≥ length(p), as required.
Since this claim holds for all paths, it holds, in particular, for the longest path. This
means that the optimum value of the LP can be no smaller than the length of the
longest path. To finish proving that the optimum actually gives us the length of the
longest path, we need to exhibit a feasible solution for which the objective function
actually attains the value length(p), where p is a longest s-t path.
It turns out that setting t = 0, tj = w(tj, t) for all j, si = maxj (w(si, tj) + w(tj, t)) for
all i, and s to the length of the longest s-t path, we get a feasible solution. The proof
is not difficult, but we did not get to it in section.
This problem is meant to give you some ideas that should be useful for the last problem
on the homework, which asks you to give an LP formulation of the single-pair shortest
paths problem.
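To see the feasible assignment in action, here is a small Python check on a made-up two-entry, two-exit network (the weights are purely illustrative): it builds the assignment described above and verifies every LP constraint, and that s − t equals the longest path length.

    # illustrative weights: w_s[i] = w(s, s_i), w_mid[i][j] = w(s_i, t_j), w_t[j] = w(t_j, t)
    w_s, w_mid, w_t = [3, 1], [[2, 5], [4, 1]], [2, 6]
    n = 2

    # the longest s-t path goes s -> s_i -> t_j -> t for the best choice of i and j
    longest = max(w_s[i] + w_mid[i][j] + w_t[j] for i in range(n) for j in range(n))

    # the assignment from the text: t = 0, t_j = w(t_j, t), s_i = max_j (w(s_i, t_j) + w(t_j, t))
    t_val = 0
    t_vars = [w_t[j] for j in range(n)]
    s_vars = [max(w_mid[i][j] + w_t[j] for j in range(n)) for i in range(n)]
    s_val = longest

    # every constraint asks that the gap across an edge be at least that edge's weight
    assert all(s_val - s_vars[i] >= w_s[i] for i in range(n))
    assert all(s_vars[i] - t_vars[j] >= w_mid[i][j] for i in range(n) for j in range(n))
    assert all(t_vars[j] - t_val >= w_t[j] for j in range(n))
    print(s_val - t_val, longest)   # both equal the longest path length (14 here)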
Chris Umans
April 5, 2000

Office Hours: 3:00-5:00 Tuesdays and Thursdays in Soda 583
umans@cs.berkeley.edu

Sections: 10:00-11:00 in Cory 289
          11:00-12:00 in Etcheverry 3109
• Network flow example: In the short exercises, you are asked to draw the residual
graph after each augmenting path is added to the flow in a simple flow network. The
graphs should look something like this:
    [Figure: four panels (a)-(d). Panel (a) shows the original network, with source s,
    a first layer of vertices 1-5, a second layer of vertices 6-9, and sink t; panels
    (b)-(d) show the residual graph after each successive augmenting path is added.]
All the edge capacities in the figures are one. Figure (a) is the original directed graph,
which is the same as the residual graph with respect to the "zero" flow (no flow on
any edge). The first augmenting path we find is: s → 1 → 6 → t. We push one unit
of flow along this path, and get the residual graph pictured in (b). Next we find the
augmenting path s → 2 → 8 → t, and push one unit of flow along it. This yields the
residual graph in (c). Finally we find the augmenting path s → 3 → 7 → t, and push
one unit of flow along it, yielding the residual graph in (d). In this residual graph,
there are no augmenting paths, so we know that the total flow of three units that we
have achieved is a maximum flow.
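This augmenting-path loop is short to write down. Here is a minimal Python sketch (the graph below is a stand-in, not the exact network from the exercise); it repeatedly finds an augmenting path in the residual graph by BFS and pushes flow along it:

    from collections import deque

    def max_flow(capacity, s, t):
        """capacity: dict mapping (u, v) -> edge capacity. Returns the max-flow value."""
        residual = dict(capacity)                    # residual capacities, updated as we push flow
        for (u, v) in list(capacity):
            residual.setdefault((v, u), 0)           # make sure reverse edges exist
        flow = 0
        while True:
            parent = {s: None}                       # BFS for an s-t path with residual capacity
            queue = deque([s])
            while queue and t not in parent:
                u = queue.popleft()
                for (a, b), cap in residual.items():
                    if a == u and cap > 0 and b not in parent:
                        parent[b] = u
                        queue.append(b)
            if t not in parent:
                return flow                          # no augmenting path: the flow is maximum
            path, v = [], t                          # recover the path and its bottleneck capacity
            while parent[v] is not None:
                path.append((parent[v], v))
                v = parent[v]
            bottleneck = min(residual[e] for e in path)
            for (u, v) in path:                      # push flow: update both edge directions
                residual[(u, v)] -= bottleneck
                residual[(v, u)] += bottleneck
            flow += bottleneck

    # a stand-in unit-capacity network in the same spirit as the figure
    edges = {('s', 1): 1, ('s', 2): 1, ('s', 3): 1, (1, 6): 1, (2, 8): 1, (3, 7): 1,
             (6, 't'): 1, (8, 't'): 1, (7, 't'): 1}
    print(max_flow(edges, 's', 't'))   # 3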
• Max-Flow/Min-Cut Theorem: This is an important theorem, and working through
the proof is a good way to become familiar with properties of flows and augmenting
paths.

Theorem 5 (Max-Flow/Min-Cut Theorem) Given a flow network G, the value
of a maximum flow in G equals the capacity of a minimum cut in G.

Proof: First, a reminder of some definitions, via some examples:
    [Figure: (a) a small flow network on vertices s, a, b, c, t, with each edge labeled
    "flow/capacity" (for example 2/3 and 1/1); the flow pictured has value 2. (b) the
    same network with the edge capacities only, and a dashed line indicating the cut
    S = {s, a, b}, T = {c, t}, which has capacity 3.]

The value of a ow f , denoted jf j is the amount of ow leaving the sour e s (equivalently, the amount of ow entering the sink t). In the example network pi tured in (a),
ea h edge is labeled with the ow along that edge and the edge apa ity; the value of
the ow in this example is 2.
The apa ity of a ut (S; T ), denoted (S; T ) is the sum of the apa ities of the edges
rossing the ut in the s ! t dire tion. (Remember that a ut is a partition of the
verti es into two sets, S and T , su h that S ontains the sour e s and T ontains the
sink t). In the example network pi tures in (b), the dashed line represents a ut (i.e.,
S = fs; a; bg and T = f ; tg), whose apa ity is 3.
We an now pro eed with the proof. Our strategy is to prove the following two statements, whi h together imply the theorem:
1. The value of any ow in G is at most the apa ity of a minimum ut in G.
2. The value of a maximum ow f in G equals the apa ity of some ut in G.
Proof of (1). Let (S, T) be a minimum cut in G, and let f be any flow in G. First,
we claim that:

    |f| = Σ_{u∈S, v∈V} f(u, v).

Why is this true? Let's break down the sum by considering the different vertices that
u might be. When u is the source s, we sum all the outgoing flow from s, which we know
equals |f|. When u is any other vertex in S, we sum the outgoing flow minus the
incoming flow, which we know by conservation of flow to be zero. The incoming flow
is subtracted because of the convention that f(u, v) = −f(v, u).
Let's break up the sum another way:

    |f| = Σ_{u∈S, v∈V} f(u, v) = Σ_{u∈S, v∈T} f(u, v) + Σ_{u∈S, v∈S} f(u, v).

The first summation on the last line is exactly the flow crossing the cut. The second
summation on the last line equals zero, because of the same fact we used above: for
every u, v ∈ S, we are summing both f(u, v) and f(v, u), and we know that f(u, v) =
−f(v, u). Noting that the flow along an edge, f(u, v), can be no larger than the capacity
of that edge, c(u, v), we get, finally:

    |f| = Σ_{u∈S, v∈V} f(u, v)
        = Σ_{u∈S, v∈T} f(u, v) + Σ_{u∈S, v∈S} f(u, v)
        = Σ_{u∈S, v∈T} f(u, v)
        ≤ Σ_{u∈S, v∈T} c(u, v) = c(S, T),

which proves part (1). Notice also that we proved along the way that |f| = Σ_{u∈S, v∈T} f(u, v)
for any flow f and any cut (S, T); we will use that again in the second part.
Proof of (2). Let f be a maximum flow in G. We will construct a cut (S, T) such
that c(S, T) = |f|. We know that there can be no augmenting paths in the residual
network Gf. If there were, then f could be augmented and it could not have been
a maximum flow. Let S be the set of all vertices reachable from the source s in the
residual network Gf. This set cannot include the sink t, because then there would be
an augmenting path. If we let T be the vertices not reachable from s in the residual
network, then (S, T) is a cut in G.
What can we say about the edges in the original network G that cross the cut (S, T)
in the s → t direction? Consider such an edge (u, v), where u ∈ S and v ∈ T. If the
flow along edge (u, v) is less than the capacity of edge (u, v), then the excess capacity
will show up as an edge from u to v in the residual graph with non-zero capacity. But that
can't be, because by our choice of S and T, u is reachable from s in the residual graph,
but v is not! Therefore the flow along edge (u, v) must equal its capacity. We say that
edge (u, v) is saturated.
Since we have just argued that all edges that cross the cut in the s → t direction are
saturated, we have:

    Σ_{u∈S, v∈T} f(u, v) = Σ_{u∈S, v∈T} c(u, v) = c(S, T).

Putting that together with the fact that |f| = Σ_{u∈S, v∈T} f(u, v), which we proved earlier,
we have:

    |f| = Σ_{u∈S, v∈T} f(u, v) = Σ_{u∈S, v∈T} c(u, v) = c(S, T),

which is what we set out to prove.
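The proof of (2) is constructive: once a maximum flow has been found, the minimum cut can be read off the residual graph. A small sketch in Python (assuming the max-flow routine above is adjusted to also return its final residual dictionary):

    def min_cut(residual, capacity, s):
        """Given the residual capacities left after a maximum flow, return the cut (S, T)
        described in the proof and its capacity, which equals the max-flow value."""
        S, stack = {s}, [s]                          # S = vertices reachable from s in Gf
        while stack:
            u = stack.pop()
            for (a, b), cap in residual.items():
                if a == u and cap > 0 and b not in S:
                    S.add(b)
                    stack.append(b)
        # every original edge crossing from S to the rest of the graph is saturated
        cut_capacity = sum(cap for (u, v), cap in capacity.items()
                           if u in S and v not in S)
        return S, cut_capacity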
Chris Umans
April 12, 2000

Office Hours: 3:00-5:00 Tuesdays and Thursdays in Soda 583
umans@cs.berkeley.edu

Sections: 10:00-11:00 in Cory 289
          11:00-12:00 in Etcheverry 3109
• The P versus NP question: The question of whether P equals NP is one of the
most important open questions in Computer Science and Math. The P/NP problem
is a deep and significant problem; here's part of the reason why: there are at least two
dramatically different "worlds" we might be living in, and we don't know which one
we are in.
– In World 1, P ≠ NP. We think we live in this world, but we are not sure. In
this world, there are easy problems (ones in P) and many hard problems (NP-complete
problems).
– In World 2, P = NP. This is a strange world. Here, designing algorithms is as
easy as describing how to recognize a solution. There is (in some sense) a single
"super" algorithm that can be easily adapted to solve almost every problem in
CS (like Linear Programming on a much grander scale). And, interestingly, all
currently-used cryptography can be broken.
• Why you should care: P, NP, and NP-completeness are practically useful concepts
as well. Picture an algorithm designer who knows about these concepts versus an
algorithm designer who doesn't. The algorithm designer who doesn't know about NP-completeness
will probably waste lots of time throughout their career trying to invent
efficient algorithms for problems for which such algorithms are probably not possible
(some of you experienced this first-hand with the Garage Sale Problem, before the
TAs told you it was NP-complete). The algorithm designer who does know about
NP-completeness, when given a hard problem, can often prove with relative ease that
trying to find an efficient exact algorithm is a waste of time, and go on to more fruitful
alternative approaches (such as approximation algorithms).
• What you need to be able to do: Given a problem Qnew, you should know how to
prove it is NP-complete. This involves two steps:
1. Prove Qnew is in NP.
2. Reduce a known NP-complete problem Qold to Qnew.
A problem Q that is NP-complete is one of the "hardest" problems in the class NP;
moreover, if you can solve Q in polynomial time, then every problem in NP can be
solved in polynomial time, and hence P = NP. Before we describe how to do each
step, we need to know exactly what is meant by a "problem."
• Optimization and decision problems: For our purposes, a problem is a question
with the following form:
1. Given ______ (an instance), what is the shortest/longest/minimum/maximum ______ (structure)?
E.g., Given a graph G, what is the traveling salesperson tour of minimum length?
2. Given ______ (an instance), is there a ______ (structure)?
E.g., Given a graph G and an integer k, is there a traveling salesperson tour of
length ≤ k?
The first type of problem is an optimization problem. The answer to the question it
asks is some optimal structure (like a shortest tour in the graph). The second type of
problem is a decision problem. The answer to the question it asks is YES or NO. It is
important to remember that P and NP are classes of decision problems. Optimization
problems can be transformed into decision problems by adding a "bound" k to the
problem instance, as we did above. On your homework you are asked to show that the
decision and optimization versions of a problem are "equivalent," up to a polynomial
factor.
• Showing a problem is in NP: A (decision) problem Q is in NP if there exists a
polynomial-time checking algorithm. This algorithm is given an instance I of Q and a
"potential solution" C to Q, and it outputs either YES or NO. We require that:
1. If I is a "yes" instance, then there exists some C (a certificate) that makes the
checking algorithm output YES.
2. If I is a "no" instance, then no C makes the checking algorithm output YES.
In practice, you should think in terms of a short certificate you would need to provide
to "convince" the checker that a "yes" instance really is a YES instance, together with
an efficient procedure for the checker. Some examples (for problems you have seen):
– 3-SAT: The instance is a 3-CNF Boolean formula; the certificate is a (purported)
satisfying assignment. The checker assigns the values specified by the assignment,
and checks that each clause contains a TRUE literal.
– Hamiltonian Path: The instance is a graph G; the certificate is a list of
vertices that (supposedly) forms a Hamiltonian path in G. The checker makes
sure that every vertex appears exactly once in the list, and that there is an edge
between each pair of consecutive vertices in the list.
– Maximum Independent Set: The instance is a graph G and an integer k; the
certificate is a (supposedly) independent set of vertices of size k. The checker
makes sure no pair of the vertices has an edge between them, and that there are
indeed k different vertices in the set.
– Graph Isomorphism: The instance is a pair of graphs G1 and G2; the certificate is
a (purported) isomorphism between G1 and G2 (i.e., a bijective mapping π from
the vertices of G1 to the vertices of G2). The checker verifies that π is indeed
bijective, that for every edge (u, v) in G1, the edge (π(u), π(v)) belongs to
G2, and that for every edge (u, v) in G2, the edge (π⁻¹(u), π⁻¹(v)) belongs to G1.
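These checkers are typically only a few lines of code. As an illustration, here is a Python sketch of the checking procedure for Maximum Independent Set, taking the instance (G, k) and a purported certificate:

    def check_independent_set(edges, k, certificate):
        """edges: a set of (u, v) pairs; certificate: a purported independent set of size k.
        Runs in polynomial time and answers YES (True) exactly when the certificate is valid."""
        vertices = set(certificate)
        if len(vertices) < k:                      # must contain k distinct vertices
            return False
        for (u, v) in edges:
            if u in vertices and v in vertices:    # an edge inside the set: not independent
                return False
        return True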
• Reductions: A polynomial-time reduction from a decision problem Qold to Qnew is a
polynomial-time algorithm R that is given an (arbitrary) instance of Qold and produces
an instance of Qnew. We require that "yes" instances of Qold map to "yes" instances
of Qnew, and that "no" instances of Qold map to "no" instances of Qnew. Graphically,
the mapping looks like this:

    [Figure: the reduction R maps the YES instances of Qold into the YES instances
    of Qnew, and the NO instances of Qold into the NO instances of Qnew.]

To remember which direction the reduction goes in, it is useful to remember our purpose
in performing the reduction in the first place. A reduction from Qold to Qnew allows
us to argue: "If Qnew has a polynomial-time algorithm, then Qold does." Why is this
true? Suppose we had an efficient algorithm A for Qnew. Then given any instance I of
Qold, we could apply the reduction R, obtaining an instance R(I) of Qnew, and solve
that instance using A. Because yes maps to yes and no maps to no, we know that the
yes/no answer for R(I) is also the yes/no answer for I. If you are not sure if you have
reduced in the right direction, try making this argument, and see if it works.
• A sample reduction: Let's try reducing 3-SAT to Independent Set. Here are the
two (decision) problem definitions:
– 3-SAT: Given a Boolean formula in 3-CNF form (an AND of ORs, where each
OR involves at most 3 literals), is there a satisfying assignment?
– Independent Set: Given a graph G = (V, E) and an integer k, is there an independent
set V′ ⊆ V of size at least k? (An independent set is a set of vertices, no
two of which are connected by an edge.)
Our reduction takes an instance of 3-SAT, a 3-CNF formula φ with m clauses, which looks
something like:

    (x ∨ y ∨ ¬z) ∧ (¬x ∨ w ∨ z) ∧ ... ∧ (...)

and produces a graph G and an integer k. The graph looks like this:
    [Figure: one triangle of vertices per clause, each vertex labeled with a literal
    (for example x, ¬x, ¬z); contradictory literals in different triangles are joined
    by edges.]
In general, there will be a triangle for each of the m clauses (or an edge or a single vertex
if there are only 2 or 1 literals in the clause). To help us reason about the problem, we
label the vertices of the triangle with the literals in the corresponding clause. Then we
connect every pair of vertices labeled with contradictory literals. Finally, we set k = m.
To show that "yes maps to yes and no maps to no," we prove the following claim.
Notice that we need to prove both directions of this claim, even though the reduction
is only a mapping in one direction.

Claim 2 φ has a satisfying assignment if and only if G has an independent set of size
at least k.
Proof: (→) Suppose φ has a satisfying assignment. Then under that satisfying
assignment there must be at least one TRUE literal in each clause. Pick one TRUE literal in
each clause, and include the corresponding vertex in the independent set. This set
has size m = k, and it is in fact an independent set because we cannot have included
any vertices labeled with contradictory literals, since the TRUEs came from a truth
assignment.
(←) Suppose G has an independent set V′ of size at least k. No more than one vertex
from each triangle can be in V′, so V′ must have size exactly k = m. Furthermore, no
two vertices labeled with contradictory literals can be in V′, because it is an independent
set. Then, we can set the literals labeling vertices in V′ to TRUE in a consistent way,
and set the other variables in any way we wish, and we will have at least one TRUE
literal in each clause, which is a satisfying assignment.
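The reduction itself is mechanical enough to write as code. The sketch below uses one convenient (made-up) encoding: a formula is a list of clauses, each clause a list of signed variable indices, so [[1, 2, -3], [-1, 4, 3]] stands for (z1 ∨ z2 ∨ ¬z3) ∧ (¬z1 ∨ z4 ∨ z3).

    def sat_to_independent_set(clauses):
        """Build the Independent Set instance (vertices, edges, k) from a 3-CNF formula."""
        vertices = []          # one vertex per literal occurrence: (clause index, literal)
        edges = set()
        for c, clause in enumerate(clauses):
            group = [(c, lit) for lit in clause]
            vertices.extend(group)
            # a triangle (or an edge, or a single vertex) for each clause
            for i in range(len(group)):
                for j in range(i + 1, len(group)):
                    edges.add(frozenset((group[i], group[j])))
        # connect every pair of vertices labeled with contradictory literals
        for u in vertices:
            for v in vertices:
                if u[0] != v[0] and u[1] == -v[1]:
                    edges.add(frozenset((u, v)))
        k = len(clauses)
        return vertices, edges, k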
Chris Umans
April 19, 2000

Office Hours: 3:00-5:00 Tuesdays and Thursdays in Soda 583
umans@cs.berkeley.edu

Sections: 10:00-11:00 in Cory 289
          11:00-12:00 in Etcheverry 3109
• Proving 0/1 integer programming NP-complete: This was the short exercise.
The 0/1 integer programming problem is defined in CLR as follows:
– 0/1-IP: Given an integer m × n matrix A and an integer m-vector b, is there a
0/1 n-vector x such that Ax ≤ b?
Recall that there are two steps in proving a problem is NP-complete. First, we show
that 0/1-IP is in NP. We ask ourselves, "For a yes instance, what short certificate would
convince an efficient verifier that this is indeed a yes instance?" Clearly, the solution
vector x would suffice. The checking procedure takes an instance A, b and a 0/1 vector
x that is a (purported) solution, and checks that Ax ≤ b. As this involves checking m
inequalities, it can be done easily in polynomial time. This shows that 0/1-IP is in NP.
Now we follow the hint and reduce from 3-CNF-SAT to 0/1-IP. Recall the definition
of 3-CNF-SAT:
– 3-CNF-SAT: Given a Boolean formula φ that is an AND of ORs, with each OR
involving at most 3 literals, is there a satisfying assignment for φ?
The reduction R needs to perform the following mapping:
    [Figure: R maps the YES instances of 3-CNF-SAT into the YES instances of 0/1-IP,
    and the NO instances into the NO instances.]
That is, given an (arbitrary) instance of 3-CNF-SAT, φ, which looks something like:

    (z1 ∨ z2 ∨ ¬z3) ∧ (¬z1 ∨ z4 ∨ z3) ∧ ... ∧ (...),

we need to produce an instance of 0/1-IP (which consists of m inequalities involving
n 0/1 variables), specified by an m × n matrix A and an m-vector b.
Before attempting the transformation, let's remind ourselves of the claim we will need
to be able to prove:

Claim 3 φ has a satisfying assignment if and only if there exists a 0/1 vector x for
which Ax ≤ b.

Since the 3-CNF-SAT instance involves finding a truth assignment to the z variables,
it seems natural to have the x variables of the IP correspond to the z variables. So,
we set n to be the number of z variables in φ, and we have one 0/1 x variable for each
in our instance of 0/1-IP. We will associate the 0/1 value of variable xi in the IP with
the FALSE and TRUE values of the corresponding zi variable in φ.
Now, we need to constrain the x variables so that there is a solution to Ax ≤ b if and
only if there is a satisfying assignment to φ. Notice that the IP solution must satisfy
ALL of the inequalities, just as a satisfying assignment must satisfy ALL of the clauses
of φ. This suggests that we should have a constraint in our IP for each clause in φ.
For a clause in φ with no negated variables, like (z1 ∨ z2 ∨ z3), it is pretty clear that
we want a constraint like:

    x1 + x2 + x3 ≥ 1.

Thinking of our correspondence between IP variables and variables of φ, this constraint
says "at least one variable in the clause must be true."
What constraint should we write for a clause like (z1 ∨ z2 ∨ ¬z3)? We need a (linear!)
function f that behaves like ¬. That is, we would like f(x3) = 0 if x3 = 1 and f(x3) = 1
if x3 = 0. It seems that f(x3) = 1 − x3 will do the trick. Then we can encode a clause
like (z1 ∨ z2 ∨ ¬z3) with a constraint like:

    x1 + x2 + (1 − x3) ≥ 1.

We now have all we need to specify the reduction. We set m to be the number of clauses
in φ, and for each clause, we write the corresponding constraint on the x variables.
Using standard algebraic manipulation we can get all of the constraints into the
form a1·x1 + a2·x2 + ... + an·xn ≤ bj, where the ai and bj are constants. From there we get
the matrix A and the vector b. Finally, we can prove the claim above:
Proof (of claim): (→) Assume φ has a satisfying assignment. Then for each i we can
set the IP variable xi to 1 if zi is TRUE in the satisfying assignment, or 0 if zi is
FALSE. Since the satisfying assignment has at least one TRUE literal in each clause, there
is at least one xi set to 1 (or (1 − xi) set to 1) in each constraint of the IP; hence we
have described a vector x for which Ax ≤ b.
(←) Assume there is some vector x for which Ax ≤ b. Then for each i, set the variable
zi to TRUE if xi is set to 1 and FALSE if xi is set to 0. Since every IP constraint is
obeyed, every clause in φ has a TRUE literal in it. We have exhibited a satisfying assignment,
and therefore φ is satisfiable.
• k-coloring example: As further practice with reductions, consider the 3-coloring
problem (which you will prove to be NP-complete in a homework):
– 3-Coloring: Given a graph G, is there a coloring of the vertices with 3 colors so
that no two adjacent vertices are colored with the same color?
We will try to prove that the 4-coloring problem is NP-complete:
– 4-Coloring: Given a graph G, is there a coloring of the vertices with 4 colors so
that no two adjacent vertices are colored with the same color?
We will skip over verifying that 4-coloring is in NP; it should be quite easy.
We now want a reduction R that performs the following mapping:

    [Figure: R maps the YES instances of 3-COLORING into the YES instances of
    4-COLORING, and the NO instances into the NO instances.]

We are given an (arbitrary) graph G, and we want the reduction to produce a graph
G′ so that we can prove the following claim:

Claim 4 G is 3-colorable if and only if G′ is 4-colorable.
As a first attempt, we might try setting G′ = G. One direction of the claim works: if
G is 3-colorable, then G′ = G is certainly 4-colorable (we can just not use one color).
However, the other direction is problematic. If G′ is 4-colorable, we don't know that
G is 3-colorable. We need to find a way to "force" G′ to "waste" one color, so that the
remaining coloring is essentially a 3-coloring of G. We can do this by defining G′ to
be G with one extra vertex v that is connected to every other vertex. Having specified
our reduction, we can now prove the claim:
Proof (of claim): (→) Suppose we have a 3-coloring of G. Then we can use the same
colors to color all of G′ except the extra vertex v with 3 colors, and then color v with
the 4th color. Hence G′ is 4-colorable.
(←) Suppose we have a 4-coloring of G′. Notice that whatever color is used by the
extra vertex v cannot be used by any of the other vertices (because v is connected
to every other vertex). Therefore the remaining vertices of G′ (which are exactly the
vertices of G) must be colored with only 3 colors, which implies that G is 3-colorable.
Notice that we can generalize this reduction. We can prove that the problem k-Coloring,
for any k ≥ 4, is NP-complete (e.g., 29-Coloring is NP-complete). We use the same
idea, but now we need to add k − 3 extra vertices, each of which is connected to every
other vertex (including the other k − 4 extra vertices).
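The generalized reduction is only a few lines; here is a Python sketch (undirected edges are stored as frozensets, a convention chosen just for this illustration):

    def three_to_k_coloring(vertices, edges, k):
        """Reduce 3-Coloring of G = (vertices, edges) to k-Coloring, for k >= 4:
        add k - 3 new vertices, each connected to every other vertex."""
        extra = [("extra", i) for i in range(k - 3)]
        new_vertices = list(vertices) + extra
        new_edges = {frozenset(e) for e in edges}
        for x in extra:
            for v in new_vertices:
                if v != x:
                    new_edges.add(frozenset((x, v)))
        return new_vertices, new_edges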
Chris Umans
April 27, 2000

Office Hours: 3:00-5:00 Tuesdays and Thursdays in Soda 583
umans@cs.berkeley.edu

Sections: 10:00-11:00 in Cory 289
          11:00-12:00 in Etcheverry 3109
• Solving a modular linear equation, example: The short exercise asked you to
solve the following equation:

    35x ≡ 10 (mod 50)

By the theorem from lecture, this equation has a solution if and only if gcd(35, 50)
divides 10. We should check that first. By inspection, d = gcd(35, 50) = 5 (we could
have computed it using Euclid's algorithm), and 5 divides 10, so the equation has a
solution. Also by the theorem, it has d = 5 distinct solutions modulo 50. To find the
solutions, we run ExtendedEuclid on 50 and 35:

    EE(50, 35) → EE(35, 15) → EE(15, 5) → EE(5, 0)
    (5, −2, 3)   (5, 1, −2)   (5, 0, 1)   (5, 1, 0)

The top row (from left to right) is meant to indicate the series of recursive calls to
ExtendedEuclid (EE); we use the fact that EE(a, b) calls EE(b, a mod b). The bottom
row (from right to left) shows the triple returned by each call; we use the fact that if
EE(b, a mod b) returns (d, x′, y′), then EE(a, b) returns (d, y′, x′ − ⌊a/b⌋·y′).
The result gives us a linear combination of 50 and 35 that equals 5:

    (3)(35) + (−2)(50) = 5.

Now we consider this equation modulo 50:

    (3)(35) + (−2)(50) ≡ 5 (mod 50)
    (3)(35) ≡ 5 (mod 50)
    (6)(35) ≡ 10 (mod 50)

The last line is obtained by multiplying both sides by 2. Now we have a solution to the
equation, namely 6. The theorem tells us that the other 4 distinct solutions (modulo
50) can be obtained by adding multiples of 50/d = 50/5 = 10 to this solution. The 5
distinct solutions modulo 50 are therefore 6, 16, 26, 36, 46.
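The same computation is easy to carry out in code. A small Python sketch (extended_euclid follows the recursion used above; solve_modular returns all d solutions):

    def extended_euclid(a, b):
        """Return (d, x, y) with d = gcd(a, b) and a*x + b*y = d."""
        if b == 0:
            return (a, 1, 0)
        d, x, y = extended_euclid(b, a % b)
        return (d, y, x - (a // b) * y)

    def solve_modular(a, b, n):
        """All solutions of a*x = b (mod n), or [] if there are none."""
        d, x, _ = extended_euclid(a, n)
        if b % d != 0:
            return []
        x0 = (x * (b // d)) % n
        return sorted((x0 + i * (n // d)) % n for i in range(d))

    print(extended_euclid(50, 35))    # (5, -2, 3): the combination (3)(35) + (-2)(50) = 5
    print(solve_modular(35, 10, 50))  # [6, 16, 26, 36, 46]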
• Christofides' algorithm: We discussed an approximation algorithm for TSP with the
triangle inequality in lecture; here is a more clever variant. Recall the approximation
algorithm from class: we are given a complete graph G whose edge weights satisfy the
triangle inequality. We find a minimum spanning tree M in G, and form a tour T′ by
going "twice around" this tree, as pictured below:

    [Figure: a minimum spanning tree, with the tour that goes "twice around" it.]
We then transformed tour T′ into tour T by "short-circuiting" the tour so that it
visited every vertex exactly once. That is, starting at the root, we followed T′ until it
was about to visit an already-visited vertex, at which point we moved directly to the
next unvisited vertex on tour T′.
We noticed that because an optimum traveling salesperson tour of G yields a spanning
tree with the same or smaller cost (by simply deleting an edge from the tour), we have
w(M) ≤ w(OPT). Here w(M) is the weight of spanning tree M, and w(OPT) is the
length of the optimum traveling salesperson tour.
Because of the triangle inequality, the short-circuiting process cannot have increased
the tour length, so we have:

    w(T) ≤ w(T′) = 2w(M) ≤ 2w(OPT),

which means that we have an approximation algorithm with ratio 2, since it returns a
tour that is no longer than twice the optimum.
Christofides' algorithm introduces a clever twist, and improves the ratio to 1.5. It
begins in the same way, by finding in G = (V, E) a minimum spanning tree M.
It then finds a minimum weight perfect matching on V′, the odd-degree vertices of
M. This requires some explanation. First, notice that M (indeed any graph) has an
even number of odd-degree vertices. This is because the sum of the degrees equals
twice the number of edges, an even number. Now, a perfect matching is simply a
matching in which every vertex is touched by an edge in the matching. The weight of
a matching is just the sum of the weights of the edges in the matching. A minimum
weight perfect matching (MWPM) is a perfect matching with minimum weight. There
is a polynomial-time algorithm that finds a minimum weight perfect matching, but we
won't describe it here.
Let us call the MWPM we find P. We add the edges of P to our MST M. In our
example graph, we might get something like this:

    [Figure: the spanning tree M together with the matching edges of P; every vertex
    of the combined graph has even degree.]
Notice that in the resulting graph, every vertex has even degree! We have "repaired"
all of the odd-degree vertices in V′ by matching them with other odd-degree vertices.
What do we know about graphs in which every vertex has even degree? There is an
Euler circuit (a tour that traverses every edge exactly once and returns to its starting
point). We let T′ be an Euler circuit, and then short-circuit it to get an honest traveling
salesperson tour T as before.
Now, how long is our tour T? We know that w(T) ≤ w(T′) as before, and that
w(T′) = w(M) + w(P), since T′ traverses every edge in the spanning tree M and the
perfect matching P exactly once. As before we have the inequality w(M) ≤ w(OPT).
We need to know the relationship between w(OPT) and w(P).
We claim that 2w(P) ≤ w(OPT). This is not hard to see: first, notice that an optimum
traveling salesperson tour on only the vertices in V′ (call it OPT′) can be no longer
than OPT, by the triangle inequality: we could just short-circuit OPT to skip over
the vertices not in V′, making the tour no longer in the process. Now, since there
are an even number of vertices in V′, OPT′ is a cycle of even length. Therefore, by
taking every other edge around OPT′, we get a perfect matching on V′, and by taking
the other edges around OPT′ we get another perfect matching on V′. Each of these
perfect matchings can have weight no smaller than the MWPM, so we have:

    w(OPT) ≥ w(OPT′) ≥ 2w(P).

This implies that w(P) ≤ 0.5·w(OPT). Putting everything together, we can bound the
length of our tour T:

    w(T) ≤ w(T′) = w(M) + w(P) ≤ w(OPT) + 0.5·w(OPT) = 1.5·w(OPT).

Therefore, the algorithm we just described attains an approximation ratio of 1.5, since
it returns a tour that is no longer than 1.5 times the optimum.
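Here is a structural sketch of Christofides' algorithm in Python, using networkx for the graph primitives. The call min_weight_perfect_matching stands in for the polynomial-time matching routine mentioned above (what networkx ships under this name varies by version, so treat that routine as an assumed helper):

    import networkx as nx

    def christofides_sketch(G, min_weight_perfect_matching):
        """G: a complete weighted nx.Graph obeying the triangle inequality.
        min_weight_perfect_matching: assumed helper returning a set of matching edges."""
        M = nx.minimum_spanning_tree(G)                      # minimum spanning tree
        odd = [v for v in M.nodes if M.degree(v) % 2 == 1]   # always an even number of these
        P = min_weight_perfect_matching(G.subgraph(odd))     # MWPM on the odd-degree vertices
        multigraph = nx.MultiGraph(M)
        multigraph.add_edges_from(P)                         # now every vertex has even degree
        tour, seen = [], set()
        for u, _ in nx.eulerian_circuit(multigraph):         # traverse every edge exactly once
            if u not in seen:                                # short-circuit repeated vertices
                seen.add(u)
                tour.append(u)
        return tour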
Chris Umans
May 3, 2000

Office Hours: 3:00-5:00 Tuesdays and Thursdays in Soda 583
umans@cs.berkeley.edu

Sections: 10:00-11:00 in Cory 289
          11:00-12:00 in Etcheverry 3109
• A pseudo-polynomial-time algorithm for knapsack: The Knapsack optimization
problem is defined as follows:
– Given n elements with integer values v1, v2, ..., vn and weights w1, w2, ..., wn, and a
capacity W, what is the collection of elements with maximum total value whose
weight does not exceed W?
The decision version of this problem is NP-complete (it should seem similar to Subset-Sum
and Bin-Packing), but we will not prove that here. Instead we will focus on a
dynamic programming algorithm that produces an exact solution to this problem.
We first describe the subproblems and the dynamic programming table. Define K(i, v)
to be the minimum weight of a subset of the first i elements with total value v (or ∞
if no such subset exists). Let us denote by V the maximum value of any element; i.e.,
V = max_i v_i. The dynamic programming table looks something like this:
V = maxi vi . The dynami programming table looks something like this:
v
0 1 2 3
i

nV

1
2
n

We know that nV is an upper bound on the maximum possible value attainable, so
it is safe to allocate only nV columns in the table. To figure out the solution to the
knapsack problem from this table (once all the cells are filled in), we consider the values
in the bottom row from right to left. Starting in column nV, if the bottommost cell
has weight ≤ W, then we have a solution with value nV. If not, there is no way to
select items with value nV and permissible weight. So we check whether the weight in cell
(n, nV − 1) is ≤ W. If so, we have a solution with value nV − 1. We continue in this
fashion until we see a cell (n, v) with weight ≤ W; this v is then the solution to the
problem.
To fill in the table, we use the following recurrence:

    K(i, v) = min{ K(i − 1, v) (exclude item i),  K(i − 1, v − vi) + wi (include item i) }
    K(1, v) = w1 if v1 = v, and ∞ otherwise.

The running time of this algorithm is O(n · nV) = O(n^2·V). At first glance this seems
like a perfectly good efficient algorithm for a problem we said was NP-complete. Why
does this not show P = NP? The running time depends on V, the maximum value of
the elements in the problem. This value might be exponentially large, because we can
represent a value of size about 2^n using n bits in the input to the problem. However, for
"reasonable" values of the elements, this solution works well. This is what is known
as a pseudo-polynomial-time algorithm, because its running time depends polynomially
on the magnitude of the values in the problem instance, instead of polynomially on the
size of their representation.
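A direct implementation of this dynamic program, as a Python sketch (the two-dimensional table K(i, v) is collapsed into a single row that is updated once per item, which does not change the recurrence):

    import math

    def knapsack_exact(values, weights, W):
        """Pseudo-polynomial knapsack: maximum total value with total weight at most W.
        K[v] = minimum weight of a subset with total value exactly v."""
        n = len(values)
        upper = n * max(values)                        # nV bounds any attainable value
        K = [math.inf] * (upper + 1)
        K[0] = 0
        for i in range(n):                             # consider the items one at a time
            for v in range(upper, values[i] - 1, -1):  # right to left, so item i is used at most once
                K[v] = min(K[v], K[v - values[i]] + weights[i])
        # scan from the largest value down for the first value with permissible weight
        return max(v for v in range(upper + 1) if K[v] <= W)

    print(knapsack_exact([6, 10, 12], [1, 2, 3], 5))   # 22: take the items of value 10 and 12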
• A polynomial-time approximation scheme: The algorithm we just described can
be adapted to give an approximation algorithm for knapsack that is the best sort of
approximation algorithm we can hope for: for any given ε > 0, we will be able to devise
a polynomial-time algorithm that gets ε-close to the optimum.
Let's consider what a "bad" case for the exact algorithm above might look like. We
might have really large values for the elements, like this:

    v1 = 10000034
    v2 = 28000001
    v3 = 34000110
    v4 = 13500003

Since we only need an approximate solution, a good idea would be to truncate these
values, keeping only the most significant digits, and solve the resulting knapsack problem,
which now has smaller values.
Let's make this idea more formal: define a new instance of knapsack from the old one
by truncating the last k bits of every value. That is, vi′ = ⌊vi / 2^k⌋ for each i, and then
the new maximum value is V′ = ⌊V / 2^k⌋. Now, let's run the exact dynamic programming
algorithm on this new instance, and then multiply the value it achieves by 2^k at the
end. We'll call the value output by this procedure APPROX.
We notice that the set of feasible subsets of items (subsets whose total weight does not exceed
W) remains unchanged, so our algorithm's solution is a valid solution to the original
instance. Because each element potentially lost some value in the scaling down / scaling
up process, APPROX ≤ OPT, where OPT is the value of an optimum solution to
the original problem. But how much smaller than OPT can APPROX really be?
We claim that each item lost at most 2^k in value (because we essentially ignored the
lower-order k bits). Since there are n items, we have:

    APPROX ≥ OPT − n·2^k.

We then have:

    APPROX / OPT ≥ (OPT − n·2^k) / OPT = 1 − n·2^k / OPT ≥ 1 − n·2^k / V,

where the last inequality is true because OPT ≥ V (we can always include only the
highest-value element).
So now, if we want to get ε-close to OPT, we set k = log2(εV/n). Plugging in, we get:

    APPROX ≥ (1 − n·2^k / V)·OPT = (1 − ε)·OPT.

The running time of the approximation algorithm is:

    O(n^2·V′) = O(n^2·V/2^k) = O(n^3/ε),

which is polynomial for any given choice of ε > 0.
What we have described is a polynomial-time approximation scheme, or "PTAS." A
PTAS is a class of approximation algorithms, one for each ε > 0, that gets ε-close to
the optimum. Each algorithm in the class runs in polynomial time. The technique
of truncating values that we used here can be applied more generally to problems for
which there is a pseudo-polynomial-time exact algorithm.
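The approximation scheme is a thin wrapper around the exact algorithm. A sketch, reusing knapsack_exact (and the math import) from the previous sketch; the weights in the usage line are made up for illustration:

    def knapsack_approx(values, weights, W, eps):
        """Knapsack approximation scheme: truncate the low-order k bits of every value,
        run the exact DP on the smaller values, and scale the answer back up.
        The returned value is at least (1 - eps) times the optimum."""
        n, V = len(values), max(values)
        k = max(0, math.floor(math.log2(eps * V / n)))   # chosen so that n * 2^k <= eps * V
        scaled = [v >> k for v in values]                # v'_i = floor(v_i / 2^k)
        return knapsack_exact(scaled, weights, W) << k   # APPROX

    # the four large values from the example above, with illustrative weights
    print(knapsack_approx([10000034, 28000001, 34000110, 13500003], [1, 2, 3, 4], 5, 0.1))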