The Master Theorem: the solution of the recurrence T(n) = a T(n/b) + Θ(n^d) is

    Θ(n^{log_b a}),  if a > b^d;
    Θ(n^d log n),    if a = b^d;
    Θ(n^d),          if a < b^d.
Consider S(n) = Σ_{k=1}^{n} log k. Split the sum in two:

    S(n) = Σ_{k=1}^{n/2-1} log k + Σ_{k=n/2}^{n} log k.
Now, if we use a lower bound on each term, the terms from the second half of the summation will give a meaningful lower bound:

    S(n) = Σ_{k=1}^{n/2-1} log k + Σ_{k=n/2}^{n} log k ≥ Σ_{k=n/2}^{n} log(n/2) ≥ (n/2) log(n/2) = Ω(n log n).
Chris Umans
February 2, 2000
[Figure: three example search trees, labeled (1), (2), and (3), on vertices a through f.]
Example (1) cannot have been produced by a DFS. (Why?) However it could have been produced by a BFS. Example (2) could have been produced by a DFS. We can determine from the figure that the DFS would have to have visited the right neighbor of the root before the left. Example (3) could have been produced by a DFS (notice it is a forest, not a single tree). In what order did it visit the vertices? First d, then e, f, c, b, a.
Strongly connected components, example proof: One of the key properties used in the strongly connected components algorithm states that "if C1 and C2 are two strongly connected components, and there is an edge from a vertex of C1 to a vertex of C2, then the first vertex visited in C1 has a larger f-value than any vertex in C2." How do we prove a statement like this? Here, it works best to consider the different possible ways a DFS might visit the vertices. We can break them into two cases: either (1) a vertex of C1 is the first vertex among C1 and C2 to be visited, or (2) a vertex of C2 is the first vertex among C1 and C2 to be visited. These two cases cover all the possibilities.

In the first case, call v the first vertex visited in C1. By the "white-path theorem," we must eventually traverse an edge leading into C2 before finishing v. At this point, we visit and finish all vertices in C2, because C2 is strongly connected. Sometime after this point, we back up to v and finish it. So f(v) is larger than the f-value for any vertex in C2, as required.
In the second case, we visit some vertex in C2 first, and then we know that we will visit all of C2, since it is a strongly connected component. We also know that we cannot get to any vertex in C1 until we visit and finish all vertices in C2, because otherwise there would be a directed path from some vertex in C2 to a vertex in C1, which would make them a single strongly connected component, contrary to assumption. So all the vertices in C2 are finished before we even visit a vertex of C1, so if v is the first vertex visited in C1, we have f(v) > d(v) and d(v) greater than the f-value for any vertex in C2, as required.
Strongly connected components, example: We worked through the following example of the strongly connected components algorithm. The first DFS occurs in G^T, pictured on the right. The discovery and finishing times are recorded. In the process, we record a list of vertices in descending order of finishing times; in this example the list is a, c, d, f, e, g, b.
[Figure: the graph G^T, with discovery/finishing times recorded at each vertex; a receives 0/13, and the remaining vertices receive 1/12, 4/11, 2/3, 5/6, 8/9, and 7/10.]
Now the second DFS is run in G, using the list of decreasing finishing times as a guide. We start at a, and discover a, b and c (the first strongly connected component). At this point the DFS gets "stuck," as intended, and restarts at an unvisited vertex. The unvisited vertex in the list with the next highest finishing time is d. Starting from d, we discover d, e and f (the second strongly connected component). Again, we get stuck, and restart at the last unvisited vertex g, which is a strongly connected component by itself.
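For concreteness, here is a minimal sketch in C of the two-pass algorithm just traced (the array names and the adjacency-matrix representation are my own choices, not from the notes): the first DFS runs on G^T and appends vertices in order of finishing, and the second DFS sweeps that list in descending finishing time, starting one component per restart.

#define MAXN 100

int n;                      /* number of vertices */
int adj[MAXN][MAXN];        /* adj[u][v] = 1 iff edge u -> v in G */
int visited[MAXN];
int order[MAXN], cnt;       /* vertices in order of finishing time */
int comp[MAXN];             /* component label for each vertex */

/* first pass: DFS on G^T (follow edges backwards), record finish order */
void dfs1(int u) {
    visited[u] = 1;
    for (int v = 0; v < n; v++)
        if (adj[v][u] && !visited[v]) dfs1(v);
    order[cnt++] = u;       /* u finishes: append it to the list */
}

/* second pass: DFS on G, labeling everything reachable with label c */
void dfs2(int u, int c) {
    comp[u] = c;
    for (int v = 0; v < n; v++)
        if (adj[u][v] && comp[v] < 0) dfs2(v, c);
}

void scc(void) {
    int c = 0;
    cnt = 0;
    for (int u = 0; u < n; u++) { visited[u] = 0; comp[u] = -1; }
    for (int u = 0; u < n; u++)
        if (!visited[u]) dfs1(u);
    /* sweep in descending finishing time; each restart is one SCC */
    for (int i = cnt - 1; i >= 0; i--)
        if (comp[order[i]] < 0) dfs2(order[i], c++);
}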
Strongly connected components, practice: Why do we go to the trouble of reversing all the graph edges for the first DFS and then computing the second DFS in descending order of finishing values? Couldn't we just do the first DFS on the original graph and compute the second DFS in order of increasing finishing values? Here is a counterexample:
[Figure: a counterexample graph with discovery/finishing times (such as 1/4 and 2/3) marked at the vertices, and the strongly connected components outlined with dashed lines.]
The discovery and finishing times are recorded for each vertex, for an initial DFS starting at the leftmost vertex. The strongly connected components are marked with dashed lines. Notice that the earliest finishing value occurs in the strongly connected component that is a source in the strongly connected component DAG. If we ran the second DFS according to increasing finishing times, we would start in this component and group all of the graph into a single strongly connected component, which is incorrect.
Is there a counterexample to the conjecture that we can find strongly connected components by computing the second DFS in order of decreasing discovery times? How about increasing discovery times?
Chris Umans
February 9, 2000
umans@cs.berkeley.edu
Lemma 2 (Exchange Property) Given spanning trees T and T', for any edge e' ∈ T' - T, there exists an edge e ∈ T - T', with the property that T - {e} ∪ {e'} is a spanning tree.
Notice that we haven't mentioned anything about edge weights yet. Here is a simple example:

[Figure: (a) the graph with tree T in bold and the edge e' labeled; (b) T with e' added, creating a cycle containing e; (c) the graph with tree T' in bold.]
Figure (a) pictures the graph with tree T in bold; figure (c) pictures the graph with tree T' in bold. In this example, we have chosen the labeled edge e' ∈ T' - T, and we want to perform an exchange using this edge. We can think of first adding edge e' to T (pictured in figure (b)), and then removing the edge e ∈ T - T' whose existence is guaranteed by the lemma.
How do we prove this lemma? It might be useful to use the example as a guide. We first notice that in the general case, adding edge e' to T creates a unique cycle, because T is a spanning tree. Then to restore a spanning tree, we need only to remove a single edge e on that cycle. The lemma requires that this edge be in T - T'. What would have to happen for us not to be able to find an edge e ∈ T - T' on the cycle? What could go wrong? All of the edges on the cycle are already in T, so it would have to be that all of the edges were also in T'. But that can't happen, because then T' would contain the entire cycle, and T' is a tree! So an edge e as required by the lemma must exist, and that proves the lemma.
Transforming a spanning tree into a MCST: A very useful theorem one can prove using the Exchange Property is the following:

Theorem 3 Let T be a spanning tree and let M be a MCST. Using a series of exchange moves, we can transform T into M:

    T → T1 → T2 → T3 → ... → Tk = M.

Furthermore, the weights of each successive spanning tree are non-increasing; i.e. w(Ti) ≥ w(T_{i+1}) for all i.
Proof: The proof is by induction on the number of edges T and M share. As a base case, if they share all n - 1 edges, then T = M, and the theorem is trivially true. Otherwise, pick e' to be the lightest edge that is not in both trees. That is, e' is the lightest edge among all those edges in T - M and M - T. We'll perform an exchange on edge e' and see what we end up with.

Since e' comes from either M or T, there are two cases:

1. e' ∈ T - M: if we perform an exchange using the edge e ∈ M - T guaranteed by the Exchange Property, we obtain a new spanning tree M' = M - {e} ∪ {e'}. Like e', e is an edge that is not in both trees. Since e' was chosen to be the lightest such edge, we know that w(e') ≤ w(e). We conclude that w(M') ≤ w(M), and because M was a MCST, it must be that equality holds, and therefore M' is also a MCST. Tree M' has more edges in common with T than M did, so by induction, we can identify the required sequence of exchanges to transform T into M', and then reverse the exchange we just performed to get M.
2. e' ∈ M - T: if we perform an exchange using the edge e ∈ T - M guaranteed by the Exchange Property, we obtain a new spanning tree T' = T - {e} ∪ {e'}. As before, we know that w(e') ≤ w(e). We conclude that w(T') ≤ w(T), so this exchange is a perfectly good first step for our sequence. Tree T' has more edges in common with M than T did, so by induction, we can identify the required sequence of exchanges to transform T' into M, completing the sequence of exchanges required by the theorem.
Application of Theorem 3: The question we want to answer is the following: Suppose we have a graph G in which all edge weights are distinct, and let M be a MCST of G. Is M unique?

The answer turns out to be "yes," but it is not so straightforward to prove this from scratch. Using Theorem 3 above, it becomes pretty simple.
Suppose, for the purpose of contradiction, that we have two distinct MCSTs M1 and M2. Applying the theorem, we see that we can transform M1 into M2 using a series of exchange moves, in which the weights of successive trees are non-increasing. But the very first exchange in this sequence, which transforms M1 into some other spanning tree M1', must alter the weight somehow, since the two edges being exchanged have distinct weights. Since M1 is a MCST, it cannot be the case that the weight decreases. Therefore it must increase, which contradicts Theorem 3. We conclude that there cannot have been two distinct MCSTs.
Chris Umans
umans@cs.berkeley.edu
hypothesis, the size of each of these (disjoint) trees is at least 2^{k-1}. Therefore the size of the resulting tree (with RANK k) is at least 2 · 2^{k-1} = 2^k.

Now, we notice that the elements below x in the tree have RANKs strictly smaller than k. Therefore, for every element x of RANK k, the tree below x contains at least 2^k elements (counting x), and none of them other than x itself has RANK k. Since these trees are disjoint, if there are n total elements, we can only accommodate n/2^k elements of RANK k, at most.
Now, we group the RANKs as follows: RANK k is in group log*(k), where log* is the iterated log function. Notice that each group contains RANKs that go from k + 1 to 2^k, where k is some "tower of twos" (e.g. 2^{2^2}). As in the readings, we will sometimes refer to a group by the range of RANKs that it contains, like this: (k, 2^k].

We can now make two further observations about RANKs and groups:

Observation 3: there are at most log* n different groups.

Observation 4: the number of elements whose RANK lies in group (k, 2^k] is at most

    Σ_{j=k+1}^{2^k} n/2^j < Σ_{j=k+1}^{∞} n/2^j = n/2^k.
Now, we are ready to bound the number of steps in m operations on n elements. Since all of the work in a Union operation is done in the two Finds that it requires, we will simply analyze the amount of work done in m Finds, and we will only be off by a constant.

The work done during Find(x) is simply the number of steps we must travel to get from x to the root. We will account for these steps, summed over all m operations, using amortized analysis. This is just an accounting trick that makes the analysis easier to think about: we will "charge" some steps to the Find operations themselves, and some steps we will "charge" to particular elements. In the end, the sum of all of these charged steps is just the total number of steps performed, which is what we want.
For each Find(x) operation, we will charge the steps as follows. Remember that we are traveling from x to the root, and as we go, we encounter elements with larger and larger RANKs. For those pointers that we follow that go from an element whose RANK is in group i to an element whose RANK is in group i + 1 or greater, we charge one step to the Find operation. Since there are at most log* n different groups (by Observation 3), the total charge to each Find operation is at most log* n.
For the other pointers that we follow while performing a Find(x) operation, we charge a step to the element from which the pointer originates. Suppose u is some vertex on the path from x to the root. Notice that every time we follow a pointer out of u, the RANK of u's parent increases (because of path compression), unless u's parent is already the root. This gives us a way to bound the number of times we can follow a pointer out of u before the RANK of u's parent moves to a new group. Specifically, we can follow a pointer out of u at most 2^k times while u's parent is still in group (k, 2^k].
Now, let us sum over all of the elements in group (k, 2^k]. By Observation 4, there are at most n/2^k of these elements, and by the previous paragraph, we charged each of them at most 2^k times. This leads to a total charge among all elements of n/2^k · 2^k = n, while the parent RANKs remain in group (k, 2^k].
As we perform more operations, the parent RANKs move to higher and higher groups -- remember that RANKs (and therefore group numbers) never decrease. So to account for the total charge, we need to sum over the at most log* n groups, giving us a total charge among all elements during all m operations of at most n log* n.

Finally, summing the charge to the m Find operations, and the charge to the elements, we get that the total work for m operations is O((m + n) log* n).
Chris Umans
umans@cs.berkeley.edu
Huffman codes for Fibonacci frequencies: In the second part of the short exercise, we are given n characters c1, c2, ..., cn whose frequencies are Fibonacci numbers, i.e. f(ci) = Fi, where Fi is given by the recurrence: F1 = 1, F2 = 1 and Fi = F_{i-1} + F_{i-2} for i ≥ 3. Based on the first part of the exercise, we guess that this will result in a highly unbalanced Huffman tree, that looks like this:
[Figure: a path-like Huffman tree with cn nearest the root, then c_{n-1}, ..., c4, c3, and finally c1 and c2 as the deepest siblings.]
This would result in a Huffman code of: 0^{n-1} for c1, 0^{n-2}1 for c2, and 0^{n-i}1 for ci, i ≥ 3.
How do we prove that this tree gets built? Recall the example on a small number of characters in the first part of the exercise. We should show that at each step of the algorithm, the two smallest frequencies considered by the algorithm are the frequency at the root of the tree built so far, and the next smallest frequency among the individual characters not yet in the tree. More formally, we need to show that for all i ≥ 1, at step i of the algorithm, there is a single tree consisting of all of the characters c1, c2, ..., ci, and that the frequency at the root of this tree, Σ_{j=1}^{i} Fj, is strictly less than F_{i+2}. If this is true, then the next two trees linked by the algorithm are the tree so far, and the single character c_{i+1}, with frequency F_{i+1}.
So, what we need to prove boils down to this: Σ_{j=1}^{i} Fj < F_{i+2}, for all i ≥ 1. This cries out for an induction proof. The base case, when i = 1, is simple:

    Σ_{j=1}^{i} Fj = Σ_{j=1}^{1} Fj = F1 = 1 < 2 = F3 = F_{i+2}.
For the induction step, consider the definition of F_{i+2}: it is equal to F_{i+1} + Fi. We can replace one of these with a sum and make it smaller, using the induction hypothesis. After trying both, it seems that it is easiest to replace F_{i+1}, giving us:

    F_{i+2} = F_{i+1} + Fi > (Σ_{j=1}^{i-1} Fj) + Fi = Σ_{j=1}^{i} Fj,

where the inequality F_{i+1} > Σ_{j=1}^{i-1} Fj is the induction hypothesis.
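Incidentally, the prefix sums can be computed exactly: Σ_{j=1}^{i} Fj = F_{i+2} - 1, which is slightly stronger than the strict inequality we need. A throwaway C check of this identity (my own, not part of the exercise):

#include <stdio.h>

int main(void) {
    long F[50];
    F[1] = F[2] = 1;
    for (int i = 3; i < 50; i++) F[i] = F[i-1] + F[i-2];

    long sum = 0;
    for (int i = 1; i <= 40; i++) {
        sum += F[i];                 /* sum = F_1 + ... + F_i */
        if (sum != F[i+2] - 1)       /* check sum = F_{i+2} - 1 < F_{i+2} */
            printf("fails at i = %d\n", i);
    }
    printf("sum F_j = F_{i+2} - 1 holds for i = 1..40\n");
    return 0;
}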
Greedy Choice Property: The book "defines" something it calls the Greedy Choice Property, which is supposed to give you a way to determine if a greedy strategy will work on a given problem. This is sort of a vague concept, that translates roughly into: "The Greedy Choice Property holds for a particular greedy algorithm if the greedy choice made at each step of that algorithm can't hurt you." We looked at three examples of greedy algorithms, and the statements of the Greedy Choice Property for each.
- Prim's MST algorithm: Here, the Greedy Choice Property states that if tree X is contained in some MST of G, and e is the lightest edge in G that extends X, then X ∪ {e} is also contained in some MST of G.

  When we apply the Greedy Choice Property to prove correctness of Prim's algorithm, X is the tree grown so far, and we apply the statement above to show that after each step of the algorithm the tree so far is contained in some MST. Therefore, after the last step, we must have a MST.
- Kruskal's MST algorithm: The Greedy Choice Property here is very similar to the one for Prim's algorithm. It states that if forest X is contained in some MST of G, and e is the lightest edge in G whose addition to X does not create a cycle, then X ∪ {e} is contained in some MST.

  We apply this statement to prove the correctness of Kruskal's in exactly the same way we used the Greedy Choice Property to prove correctness of Prim's algorithm.
- Huffman's algorithm: The Greedy Choice Property here states that if x and y are the characters with the two lowest frequencies, then there exists some optimal Huffman tree in which x and y are siblings.

  We use this to prove correctness as follows: the Greedy Choice Property tells us that there exists an optimal Huffman tree T in which x and y are siblings. A separate argument says that if T is an optimal tree in which x and y are siblings, then combining x and y into a single meta-character z whose frequency is f(x) + f(y) gives an optimal tree for this new (smaller) character set (CLR Lemma 17.3). Therefore, at each step in the algorithm, the tree produced "so far" is contained in some optimal tree; therefore after the last step, we must have an optimal tree.
Proof of Greedy Choice Property for Huffman's algorithm: It is instructive to recall how we proved the Greedy Choice property for, say, Prim's algorithm. There we wanted to show that extending X by e was "safe." We argued that if T is a MST containing X, then either (1) T also contains e, and we are done, or (2) T does not contain e. In case (2) we showed how to swap e with some other edge e' already in T to form a new tree T'; we also showed that the swap can only have decreased the cost, so T' must also be a MST, and it contains X and e, as required.

For the Greedy Choice Property for Huffman's algorithm, we pursue a similar strategy. We consider an optimal tree T. If T has x and y as siblings, we are done. Otherwise, tree T looks like this:
[Figure: the tree T, with x and y somewhere in the tree, and a and b as the deepest siblings.]
Here, a and b are the deepest siblings in T. We will form a new tree T'' by exchanging x with a and y with b, and argue that B(T'') ≤ B(T), so T'' is optimal if T was. Recall that the cost of a tree is defined by: B(T) = Σ_{c ∈ C} f(c) d_T(c), where C is the set of characters, f(c) is the frequency of character c, and d_T(c) is the depth of c in tree T.
We first consider T', obtained by swapping x and a. We will show that B(T') ≤ B(T) by showing that the net decrease is non-negative:

    B(T) - B(T') = Σ_{c ∈ C} f(c) d_T(c) - Σ_{c ∈ C} f(c) d_{T'}(c)
                 = f(x) d_T(x) + f(a) d_T(a) - f(x) d_T(a) - f(a) d_T(x)
                 = [f(a) - f(x)][d_T(a) - d_T(x)]
Since x and y were chosen to be the characters with the smallest frequencies, we know that f(x) ≤ f(a), and by our choice of a and b, we know that d_T(x) ≤ d_T(a). Therefore both of the terms in the product on the last line above are non-negative, so B(T) - B(T') is non-negative.
Finally, we transform T' into T'' by swapping y and b, and using an identical argument, we show that B(T'') ≤ B(T'). Here we use the fact that f(y) ≤ f(b), and d_{T'}(y) ≤ d_{T'}(b). The result is an optimal tree T'' that includes x and y as siblings, which shows that the Greedy Choice Property holds for the Huffman algorithm.
Chris Umans
March 1, 2000
umans@cs.berkeley.edu
matrices, which is the case we will verify here. But one can just as easily block matrices in other ways (e.g. into 9 submatrices of size n/3 × n/3). Let us consider the blocking used by Strassen's algorithm:

    [ A  B ] [ E  G ]   [ AE + BF  AG + BH ]
    [ C  D ] [ F  H ] = [ CE + DF  CG + DH ]
      (M1)     (M2)           (M1 M2)
We know how to multiply and add matrices, and one can verify that the various rules used in grade-school algebra (multiplication distributes over addition, associativity, etc.) hold for matrices as well. In the terminology of abstract algebra, we say that matrices form a ring.
Fact 1 allows us to apply a method for multiplying 2 × 2 matrices recursively. Fact 2 is the reason we can verify algebraically that Strassen's fancy method of getting the four entries in the result matrix is correct, and then be sure that the algebra still works out when the "entries" are entire matrices instead of just single numbers.
More on recursive matrix multiplication: In presenting Strassen's algorithm as an improvement over the "naive" recursive matrix multiplication, CLR and the readings are excited that Strassen's method is able to reduce 8 multiplications to 7. But Strassen's method also uses 18 additions, as opposed to only 4 in the naive algorithm. Why do we care so much about multiplications, and not additions?
The answer is that we apply Strassen's 2 × 2 matrix multiplication method recursively. Each multiplication is actually a matrix multiplication, which is very costly, while each addition is just a matrix addition, which is comparatively much faster. It is instructive to look at the running time recurrences in each case:

    T(n) = 8 T(n/2) + Θ(n^2)    (naive)
    T(n) = 7 T(n/2) + Θ(n^2)    (Strassen)

where the Θ(n^2) term accounts for the matrix additions.
Using the Master Theorem for recurrences, we see that both recurrences fall into the case in which the number of recursive calls dominates, and the solutions are T(n) = Θ(n^{log 8}) = Θ(n^3) and T(n) = Θ(n^{log 7}) = Θ(n^{2.81...}), respectively. We now can see why multiplications matter so much more than additions. The number of multiplications translates directly to the number of recursive calls (which dictates the running time for these recurrences). The number of additions, on the other hand, simply contributes to the "f(n)" term in the recurrences. No matter what constant this number is, the f(n) term will always be Θ(n^2), so increasing the number of additions does not hurt the asymptotic running time at all.
Other Strassen-like results: Victor Pan has a method for multiplying 68 × 68 matrices using only 132,464 multiplications (the naive method uses 314,432). He also has a method for multiplying 70 × 70 matrices using only 143,640 multiplications, and 72 × 72 matrices using only 155,424 multiplications. And you probably thought Strassen must have spent too much time with matrices!
There is an important point here, though. Pan's method for multiplying, say, 72 × 72 matrices can be applied recursively, in the same way Strassen's method can. We simply need to block the matrices into n/72 × n/72 submatrices at each recursive call! This gives a general algorithm for multiplying matrices, with a running time recurrence of:

    T(n) = 155,424 T(n/72) + c (n/72)^2.

Here c is the number of additions Pan's method requires. Notice that we don't even need to know this number, since we can see that the c (n/72)^2 term is just Θ(n^2), and that is all the information we need to solve the recurrence. The solution of this recurrence is, by the Master Theorem, T(n) = Θ(n^{log_72 155424}) = Θ(n^{2.795...}). This is indeed a small improvement over Strassen's algorithm. But notice how much harder Pan needed to work on larger matrices: Strassen only needed to save a single multiplication, while Pan needed to save over half of the multiplications (some 217,824 multiplications saved)!
Question 1 on Homework 7 asks you to figure out how many multiplications you would need to save on 3 × 3 matrices in order to improve Strassen's result.
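To see where the exponents quoted above come from, recall that a recurrence T(n) = a T(n/b) + Θ(n^2) with a > b^2 solves to Θ(n^{log_b a}). A quick C computation of log_b a for the methods mentioned (my own snippet, just to check the arithmetic):

#include <stdio.h>
#include <math.h>

int main(void) {
    /* naive 2x2 blocking: 8 recursive calls on n/2 x n/2 blocks */
    printf("naive:    %f\n", log(8.0) / log(2.0));        /* 3.000000 */
    /* Strassen: 7 calls on n/2 x n/2 blocks */
    printf("Strassen: %f\n", log(7.0) / log(2.0));        /* 2.807355 */
    /* Pan: 155,424 calls on n/72 x n/72 blocks */
    printf("Pan:      %f\n", log(155424.0) / log(72.0));  /* 2.795122... */
    return 0;
}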
Chris Umans
March 8, 2000
Discrete Fourier Transform example: In the short exercises you are asked to compute the Discrete Fourier Transform (DFT) of the vector (0, 1, 2, 3). Some people elected to do this by tracing through the FFT algorithm. Here we will simply do the appropriate polynomial evaluations, which seems a less error-prone method.

Remember that the input vector a = (a0, a1, ..., a_{n-1}) gives the coefficients of a polynomial A(x). Here we have:

    A(x) = 0 + 1x + 2x^2 + 3x^3.
The DFT requires us to evaluate the polynomial A(x) at the n complex n-th roots of unity ω_n^0, ω_n^1, ω_n^2, ..., ω_n^{n-1}. Here we notice that the complex 4-th roots of unity are just 1, i, -1, -i (plug n = 4 into ω_n = e^{2πi/n}). We compute the DFT vector y = (y0, y1, y2, y3) as follows:

    y0 = A(ω_4^0) = A(1)  = 0 + 1(1)  + 2(1)^2  + 3(1)^3  = 6
    y1 = A(ω_4^1) = A(i)  = 0 + 1(i)  + 2(i)^2  + 3(i)^3  = -2 - 2i
    y2 = A(ω_4^2) = A(-1) = 0 + 1(-1) + 2(-1)^2 + 3(-1)^3 = -2
    y3 = A(ω_4^3) = A(-i) = 0 + 1(-i) + 2(-i)^2 + 3(-i)^3 = -2 + 2i

Therefore the DFT of (0, 1, 2, 3) is (6, -2 - 2i, -2, -2 + 2i).
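The same computation can be checked mechanically. A small C program (my own, using C99 complex arithmetic) that evaluates A(x) at the four 4-th roots of unity:

#include <stdio.h>
#include <math.h>
#include <complex.h>

int main(void) {
    const double pi = acos(-1.0);
    double a[4] = {0, 1, 2, 3};                        /* coefficients of A(x) */
    for (int k = 0; k < 4; k++) {
        double complex w = cexp(2 * pi * I * k / 4);   /* omega_4^k */
        double complex y = 0, p = 1;                   /* p runs through w^0, w^1, ... */
        for (int j = 0; j < 4; j++) { y += a[j] * p; p *= w; }
        /* prints 6+0i, -2-2i, -2+0i, -2+2i (up to tiny floating-point residue) */
        printf("y%d = %g%+gi\n", k, creal(y), cimag(y));
    }
    return 0;
}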
The Fast Fourier Transform (FFT) algorithm: This is an incredibly important and beautiful algorithm, that computes the DFT in time O(n log n), instead of the O(n^2) time we get using the "obvious" algorithm of just performing the n separate polynomial evaluations.

Our goal is to evaluate a polynomial A(x) = a0 + a1 x + a2 x^2 + ... + a_{n-1} x^{n-1} at the complex n-th roots of unity. For the application we have in mind (multiplying polynomials), any n distinct points will do, but as we will see, these particular n points are critical for the divide and conquer approach we will take. The complex roots of unity have a special structure, captured in the following lemma:
Lemma 4 Let ω_n^0, ω_n^1, ..., ω_n^{n-1} be the complex n-th roots of unity, and let ω_{n/2}^0, ω_{n/2}^1, ..., ω_{n/2}^{n/2-1} be the complex n/2-th roots of unity. We have:

    (ω_n^0)^2 = (ω_n^{n/2})^2   = ω_{n/2}^0
    (ω_n^1)^2 = (ω_n^{n/2+1})^2 = ω_{n/2}^1
    (ω_n^2)^2 = (ω_n^{n/2+2})^2 = ω_{n/2}^2
    ...
    (ω_n^{n/2-1})^2 = (ω_n^{n-1})^2 = ω_{n/2}^{n/2-1}.
Recall that the DFT is given by

    y_k = Σ_{j=0}^{n-1} a_j (ω_n^k)^j.

It turns out that we can express the a's in terms of the y's by the following formulas (proved in CLR):

    a_k = (1/n) Σ_{j=0}^{n-1} y_j (ω_n^{-k})^j.
These are remarkably similar! It should not be surprising, then, that we can compute the Inverse DFT using the FFT algorithm in which we replace ω_n by its inverse ω_n^{-1}, and divide all of the entries in the result vector by n (although we certainly haven't proved that this works).
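Here is a minimal recursive FFT in C along these lines (my own sketch; it assumes n is a power of 2 and allocates scratch arrays freely rather than aiming for efficiency). Lemma 4 is exactly what makes the two half-size subproblems be DFTs of size n/2.

#include <complex.h>
#include <math.h>
#include <stdlib.h>

/* Computes y[k] = A(w_n^k), k = 0..n-1, where A has coefficients a[0..n-1].
   Assumes n is a power of 2. */
void fft(int n, const double complex *a, double complex *y) {
    if (n == 1) { y[0] = a[0]; return; }

    double complex *even = malloc((n/2) * sizeof *even);
    double complex *odd  = malloc((n/2) * sizeof *odd);
    double complex *ye   = malloc((n/2) * sizeof *ye);
    double complex *yo   = malloc((n/2) * sizeof *yo);

    for (int j = 0; j < n/2; j++) {     /* split A into even/odd coefficients */
        even[j] = a[2*j];
        odd[j]  = a[2*j + 1];
    }
    fft(n/2, even, ye);                 /* evaluate both halves at the  */
    fft(n/2, odd, yo);                  /* (n/2)-th roots of unity      */

    const double pi = acos(-1.0);
    for (int k = 0; k < n/2; k++) {
        double complex w = cexp(2 * pi * I * k / n);   /* w_n^k */
        /* A(x) = A_even(x^2) + x * A_odd(x^2), and by Lemma 4
           (w_n^k)^2 = (w_n^{k+n/2})^2 = w_{n/2}^k */
        y[k]       = ye[k] + w * yo[k];
        y[k + n/2] = ye[k] - w * yo[k];    /* since w_n^{k+n/2} = -w_n^k */
    }
    free(even); free(odd); free(ye); free(yo);
}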
Chris Umans
umans@cs.berkeley.edu
One can prove by a straightforward induction that the running time of the recursive program for C(n, k) is at least C(n, k), which is exponential in the worst case. The space required is Θ(n), the maximum depth of the recursion tree.
The dynamic programming method fills in a (n+1) × (k+1) table, in which entry (i, j) is intended to be C(i, j). The only thing we need to consider before coding this is what order to fill in the table. We notice that entry (i, j) depends on entries (i-1, j-1) and (i-1, j). If we visualize the table with entry (0, 0) in the upper left corner, this means that each entry depends on the entry diagonally above and to the left of it, and the entry directly above it. If we fill in the table from top to bottom, and left to right within each row, we will be sure that when we compute entry (i, j), the table entries consulted in computing entry (i, j) are already filled in. The code follows:

dynamic_prog_C(n, k) {
    for(i = 0; i <= n; i++) C[i][0] = 1;
    for(j = 1; j <= k; j++) C[0][j] = 0;
    for(i = 1; i <= n; i++)
        for(j = 1; j <= k; j++)
            C[i][j] = C[i-1][j-1] + C[i-1][j];
    return C[n][k];
}
As with all dynamic programming algorithms, the running time is the number of entries in the table, times the amount of work to fill in each entry. In this case, we have Θ(nk) table entries, and we fill in each with O(1) work, for a total running time of Θ(nk). The space required is the size of the table, or Θ(nk).

Finally, using the formula C(n, k) = n! / ((n-k)! k!), we can write the following code:
formula_C(n, k) {
    if(k == 0) return 1;
    if(n == 0) return 0;
    x = 1;
    n_minus_k_fact = k_fact = 1;    /* handles the k == n edge case, since 0! = 1 */
    for(i = 1; i <= n; i++) {
        x = x * i;
        if(i == n-k) n_minus_k_fact = x;
        if(i == k) k_fact = x;
    }
    return(x / (n_minus_k_fact * k_fact));
}
Counting multiplications as constant time and constant space operations (which is slightly unrealistic here, because the numbers we are computing are so large), the running time is O(n), and the space required is a constant. A better running time of O(k) can be obtained using the observation that n!/(n-k)! = n(n-1)(n-2) ... (n-k+1) and k! can both be computed with only k multiplications.
Framework for dynamic programming: In devising a dynamic programming algorithm, there are 4 steps/questions to be answered:

1. What are the subproblems?
2. How do we express the solution to each subproblem in terms of solutions to smaller subproblems?
3. What is the correct order to fill in the dynamic programming table?
4. What is the running time and space requirement of the algorithm?
Step 2 and sometimes step 1 are the difficult parts; once these are complete, it should be fairly easy to complete step 3, and write the actual (pseudo-)code.

A few notes on steps 1 and 2: by a subproblem, we mean an instance of the original problem with a different (smaller) input. In determining the subproblems, sometimes it is necessary to first rephrase the original problem. In figuring out the recursive expression in step 2, we use the so-called optimal substructure of the problem -- we often consider a range of initial choices for constructing a solution, and each choice leads to some number of subproblems. The next few examples should make these observations more concrete.
Chain Matrix Multiplication: We want to multiply a chain of rectangular matrices A1, A2, ..., An with dimensions m0, m1, m2, ..., mn (matrix Ai has dimensions m_{i-1} × m_i). Since the number of operations required varies with the order we choose to multiply the matrices, we want to determine the optimal way of parenthesizing the matrices. Let's see how the solution we saw in lecture fits into the framework above:

1. The problem asks: What is the optimal way to multiply A1 A2 ... An?
   The subproblems ask: What is the optimal way to multiply Ai A_{i+1} ... Aj?
   Notice that subproblems are all instances of the same problem, with different inputs, namely: all sub-chains of matrices Ai, A_{i+1}, ..., Aj. There are O(n^2) such subproblems, and each is defined by a pair of endpoints (i, j). Notice that the problem on the original input is just subproblem (1, n).
2. First, we define some notation: let M(i, j) be the number of operations for multiplying Ai, ..., Aj in an optimal order. Now, we want to write a recursive expression for M(i, j), so we "think recursively": the optimal grouping first breaks the matrices at some k between i and j, multiplies the left half in an optimal order, multiplies the right half in an optimal order, and then does one final multiplication of the left result matrix and the right result matrix. In our notation:

       M(i, j) = min_{i ≤ k < j} [ M(i, k) + M(k+1, j) + m_{i-1} m_k m_j ]
       M(i, i) = 0 for all i,

   where the term m_{i-1} m_k m_j is the cost of the final step.

3. The dynamic programming table for this example has size O(n^2), and we want to place the value of M(i, j) in entry (i, j), for all 1 ≤ i ≤ j ≤ n. We always fill in the table from the smallest subproblem up to the largest subproblem. Here the "size" of subproblem (i, j) is simply the number of matrices in its input, or j - i + 1. We can verify that in our recursive expression, M(i, j) depends only on subproblems of strictly smaller size, so filling in the table in order of increasing size works.
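A direct C rendering of steps 2 and 3 (a sketch in the style of dynamic_prog_C above; the array names are my own):

#define MAXN 100
#define INF  1000000000

int n;              /* number of matrices */
int m[MAXN + 1];    /* dimensions: A_i is m[i-1] x m[i] */
int M[MAXN + 1][MAXN + 1];

int chain_order(void) {
    for (int i = 1; i <= n; i++) M[i][i] = 0;      /* base cases */
    for (int len = 2; len <= n; len++)             /* subproblems by size */
        for (int i = 1; i + len - 1 <= n; i++) {
            int j = i + len - 1;
            M[i][j] = INF;
            for (int k = i; k < j; k++) {          /* try every split point */
                int cost = M[i][k] + M[k+1][j] + m[i-1]*m[k]*m[j];
                if (cost < M[i][j]) M[i][j] = cost;
            }
        }
    return M[1][n];
}

Filling by increasing chain length guarantees that M(i, k) and M(k+1, j) are already computed when entry (i, j) needs them.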
determine if either string is just a single character -- in this case, the length of a LCS is 1 if that character appears in the other string, and 0 otherwise.

3. Like the chain matrix multiplication example, the dynamic programming table for this example has size O(n^2), and we want to place the value of L(i, j) in entry (i, j), for all 1 ≤ i, j ≤ n. The order I described in section for filling in the table was slightly inaccurate. Here is a good way to visualize what the correct order should be:
[Figure: the L table with rows indexed by i and columns by j; arrows show entry (i, j) depending on the entry to its right, the entry below it, and the entry diagonally right-and-below.]
Entry (i, j) in the table depends on three other entries (from our recursive definition above). It depends on the entry to its right, the entry below it, and the entry diagonally to its right and below it, as pictured. Therefore, we can fill in the table in the following order: first, fill in the last column and the last row, which correspond to the base cases. Then, fill in the table from bottom to top, and right to left within each row. It is easy to see that at the time we get to filling in each entry (i, j), the other three entries upon which it depends will have already been filled in.

4. The size of the table is O(n^2), and the time to fill in each entry is O(1); therefore the running time is O(n^2).
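A C sketch matching this orientation (mine; it assumes the standard suffix recurrence, where L(i, j) is the LCS length of the suffixes starting at positions i and j, and uses an extra all-zero row and column as the base cases, which is equivalent to the single-character base case mentioned above):

#define MAXN 100

int L[MAXN + 2][MAXN + 2];

/* LCS length of x[1..n] and y[1..n]; the strings are 1-indexed (x[0] unused) */
int lcs(const char *x, const char *y, int n) {
    for (int i = 1; i <= n + 1; i++) L[i][n+1] = 0;   /* last column and row: */
    for (int j = 1; j <= n + 1; j++) L[n+1][j] = 0;   /* empty-suffix bases   */
    for (int i = n; i >= 1; i--)          /* bottom to top */
        for (int j = n; j >= 1; j--)      /* right to left within each row */
            if (x[i] == y[j])
                L[i][j] = L[i+1][j+1] + 1;            /* diagonal entry */
            else
                L[i][j] = L[i+1][j] > L[i][j+1] ? L[i+1][j] : L[i][j+1];
    return L[1][1];
}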
All-Pairs Shortest Paths: In this problem, we are given a graph G = (V, E) with edge weights w(vi, vj) for all edges (vi, vj) ∈ E. (There are no negative cycles, and for convenience, we will assume w(vi, vj) = ∞ if (vi, vj) ∉ E). We want to find the shortest path from vi to vj for all vertex pairs (vi, vj). Let's see how the dynamic programming solution to this problem fits into the framework above:

1. This is an example of a case in which we alter the problem statement in order to be able to clearly state the subproblems. Our modified problem asks: What is the shortest path from vi to vj using (some of) the vertices v1, v2, ..., vn as intermediate vertices on the path? The subproblems ask: What is the shortest path from vi to vj using (some of) the vertices v1, v2, ..., vk as intermediate vertices on the path? Having solved all of the subproblems, we can extract the solution to our original problem from these solutions, since, among other things, we will have the shortest paths for all vertex pairs (vi, vj). Here there are O(n^3) subproblems, and each can be described by a triple (i, j, k).
2. We introduce the notation T(i, j, k) to represent the length of the shortest path from i to j using (some of) the intermediate vertices v1, v2, ..., vk. We want a recursive expression for T(i, j, k) in terms of smaller subproblems. As with the LCS problem, we begin by noticing that there are two cases: either the shortest path doesn't use vertex vk, or it does; these two possibilities are pictured below in (a) and (b), respectively:
[Figure: (a) a shortest path from vi to vj using only v1, ..., v_{k-1} as intermediates; (b) a shortest path from vi to vj that passes through vk.]
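The two cases lead to the standard recurrence T(i, j, k) = min( T(i, j, k-1), T(i, k, k-1) + T(k, j, k-1) ), with T(i, j, 0) = w(vi, vj). A C sketch of the resulting algorithm (mine; it uses the usual observation that the k dimension can be updated in place, so the table needs only O(n^2) space):

#define MAXN 100
#define INF  1000000000

int n;
int d[MAXN][MAXN];   /* initialized to w(vi, vj); INF for non-edges, 0 on the diagonal */

void all_pairs(void) {
    for (int k = 0; k < n; k++)          /* allow v_k as an intermediate vertex */
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                /* case (b): it is now cheaper to go through v_k */
                if (d[i][k] < INF && d[k][j] < INF && d[i][k] + d[k][j] < d[i][j])
                    d[i][j] = d[i][k] + d[k][j];
}

There are O(n^3) subproblems and O(1) work per subproblem, for a running time of Θ(n^3).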
Chris Umans
umans@cs.berkeley.edu
Graphical LP example: In the short exercises you are asked to solve the following Linear Program (LP) in two variables (which I have renamed x and y):

    maximize 3x + 5y
    subject to x + 2y ≤ 6
               x - y ≤ 2
               x ≤ 3
               x, y ≥ 0

The following figure gives a graphical representation of the problem:
[Figure: the constraint lines x + 2y = 6, x - y = 2, and x = 3, the shaded feasible region, and a line representing the objective function; the corners are labeled (0, 0) obj = 0, (2, 0) obj = 6, (3, 1) obj = 14, (3, 3/2) obj = 33/2, and (0, 3) obj = 15.]
The line defined by each of the constraints is pictured, along with a line representing a constant value for the objective function. The feasible region is shaded. We imagine "sliding" the objective function line in the direction of decreasing objective function value (in the direction of the arrows in the figure), until it first touches the feasible region. The point of intersection will be the maximum value of the objective function subject to the constraints. This will always happen at a corner (or along an entire line, in which case we can pick an endpoint of that line, which is a corner).
To determine which corner attains the maximum value of the objective function, we evaluate the objective function at each of the corners of the feasible region. In this example, there are 5 corners, each marked with a heavy dot in the figure, and each is labeled with its coordinates, and the value of the objective function at that point. The optimum value of 33/2 is attained at (3, 3/2). The solution to the LP is thus: x = 3, y = 3/2.
Why we care: Many real-world problems fit into the general framework "maximize or minimize some objective function, subject to some constraints." It turns out that many such formulations are very hard to solve efficiently. For example, we might want to require that the variables in an LP are integers, which is natural for many problems that seem to need a Boolean variable. We might also want an objective function that includes a quadratic term (one that multiplies two variables together). It turns out that both of these formulations are (probably) phenomenally hard to solve exactly.

But, Linear Programming is easy, in the following two ways:
- A Linear Program can be solved in polynomial time. The algorithms that do this are complex and often not useful in practice.

- The simplex algorithm solves most Linear Programs very efficiently in practice, although on some degenerate cases it requires exponential running time. The simplex algorithm takes advantage of the fact that the optimum lies at a corner of the feasible region. It starts at some corner, and repeatedly moves to a better neighboring corner, until it can't improve the value of the objective function anymore. Of course, in general this algorithm is operating in n-dimensional space, moving from corner to corner of the polytope that is the feasible region. In this course, you are only expected to be able to trace the steps the simplex algorithm might take on 2 (and possibly 3) variable problems, where one can reasonably expect to be able to draw the feasible region graphically.
Because we have an efficient way to solve LPs, and they provide a quite general framework for expressing optimization problems, it's often worth putting a fair amount of effort into trying to express an optimization problem as a LP. If such a transformation can be found, it automatically gives an efficient solution to the optimization problem.
Example: production scheduling: One of the major uses of Linear Programming is in business, where a company is trying to make decisions that maximize profit, while obeying certain constraints. Here is a simple example: a business produces two products, A and B, and it must decide how many of each to produce in the next month. Say product A sells for $10 per unit and B sells for $12 per unit. Then to maximize profit, the business should solve the following LP:

    maximize 10a + 12b
    subject to a, b ≥ 0

Here a is the amount of A to produce and b is the amount of B to produce. What is the solution to this LP? The solution is to produce an infinite amount of A and B. In LP terminology we say that this problem is unbounded. Clearly this is not realistic, so we should add some constraints to model the situation more accurately. Let's say that the company can produce no more than 10,000 units of either product. We should then add the constraints:

    a ≤ 10000
    b ≤ 10000
Now, the optimum is achieved by producing 10,000 units of each. This is reasonable but not terribly interesting. Now suppose there is a single packaging machine that must package each unit of A or B produced by the company before it is shipped, and suppose that it takes 1 minute to package a unit of A and 3 minutes to package a unit of B. Furthermore, the packaging machine is run during working hours, which are 40 hrs/week, for 4 weeks. We get the following additional constraint:

    a + 3b ≤ 9600,

because the machine can operate for a total of 4 · 40 · 60 = 9600 minutes. Now it's harder to see what the optimum is, but we could use our LP-solver to find it.
To make the problem even more realistic, we could consider planning a schedule for the next 12 months. We would then have 24 variables (a1, a2, ..., a12 and b1, b2, ..., b12) instead of 2, representing the amount of each product produced each month. Our constraints could be modified to account for anticipated different prices from month to month, different monthly costs, and the possibility of producing extra units one month and storing them (at some cost, of course) until they are needed in a future month. The readings have such an example. The point is that it is often very easy to formulate this type of decision-making as a LP, and by adding constraints, we can model quite complex scenarios.
Example: maximum delay in a network: Suppose I send an email to a friend and I want to know how long I must wait before I am certain that the email has been received. The maximum delay will depend on the characteristics of the network through which the message passes. We model the network with the following simple graph:
[Figure: vertex s connected to entry points s1, s2, ..., sn; edges (si, tj) leading to exit points t1, t2, ..., tn; and each tj connected to t.]
In this model, the source computer is located at the vertex s, the destination is at vertex t, the vertices labeled s1, s2, ..., sn represent the possible points of entry to the network, and the vertices labeled t1, t2, ..., tn represent the possible exit points. The edges (si, tj) represent possible routes through the network. Every edge e in the graph has an associated weight w(e) that is an upper bound on the delay along that route. To solve my problem, I am interested in the path from s to t with the maximum weight. We will try to write this problem as a Linear Program.

This is an example of a problem where the LP variables are not at all obvious. For this problem we will have a variable associated with each vertex, which, by abuse of notation, we will label with the same name as the vertex itself. So, the LP variables are s, t, s1, s2, ..., sn, t1, t2, ..., tn. The LP is:

    minimize s - t
    subject to s - si ≥ w(s, si)     for all i
               si - tj ≥ w(si, tj)   for all i, j
               tj - t ≥ w(tj, t)     for all j
This is fairly counter-intuitive. To understand how these constraints are useful, notice that we are trying to minimize the gap between s and t. That is, the LP is trying to push t as close to s as it possibly can. We want the gap between s and t to be the length of the longest path, so we need constraints that prevent the gap from becoming too small. Each constraint in the LP forces the gap between the variables representing endpoints of an edge to be at least as large as the length of that edge. In particular, we can prove the following claim:

Claim 1 For all paths p from s to t, a feasible assignment to the variables in the LP above must have s - t ≥ length(p).
Proof: Say the path p goes from s to si to tj to t. Let's see what information we can get from the constraints we know these four variables must obey:

    s - si ≥ w(s, si)
    si - tj ≥ w(si, tj)
    tj - t ≥ w(tj, t)

If we sum these inequalities, we get s - t ≥ length(p) as required.
Since this claim holds for all paths, it holds, in particular, for the longest path. This means that the optimum value of the LP can be no smaller than the length of the longest path. To finish proving that the optimum actually gives us the length of the longest path, we need to exhibit a feasible solution for which the objective function actually attains the value length(p), where p is a longest s-t path.

It turns out that by setting t = 0, tj = w(tj, t) for all j, si = max_j (w(si, tj) + w(tj, t)) for all i, and s to the length of the longest s-t path, we get a feasible solution. The proof is not difficult, but we did not get to it in section.

This problem is meant to give you some ideas that should be useful for the last problem on the homework, which asks you to give a LP formulation of the single-pair shortest paths problem.
Chris Umans
April 5, 2000
Network flow example: In the short exercises, you are asked to draw the residual graph after each augmenting path is added to the flow in a simple flow network. The graphs should look something like this:
[Figure: four copies, (a) through (d), of a network on vertices s, 1-9, and t; (a) is the original graph, and (b)-(d) are the residual graphs after each of the three augmentations described below.]
All the edge capacities in the figures are one. Figure (a) is the original directed graph, which is the same as the residual graph with respect to the "zero" flow (no flow on any edge). The first augmenting path we find is: s → 1 → 6 → t. We push one unit of flow along this path, and get the residual graph pictured in (b). Next we find the augmenting path s → 2 → 8 → t, and push one unit of flow along it. This yields the residual graph in (c). Finally we find the augmenting path s → 3 → 7 → t, and push one unit of flow along it, yielding the residual graph in (d). In this residual graph, there are no augmenting paths, so we know that the total flow of three units that we have achieved is a maximum flow.
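The process just traced -- repeatedly find an augmenting path in the residual graph and push flow along it -- is easy to write down for integer capacities. A compact C sketch (my own; it finds augmenting paths by BFS on the residual capacities, which is the Edmonds-Karp variant):

#define MAXN 100
#define INF  1000000000

int n;
int cap[MAXN][MAXN];   /* residual capacities; initially the edge capacities */
int pred[MAXN];

/* BFS in the residual graph for an s -> t path; returns 1 if one exists */
int augmenting_path(int s, int t) {
    int queue[MAXN], head = 0, tail = 0;
    for (int v = 0; v < n; v++) pred[v] = -1;
    pred[s] = s;
    queue[tail++] = s;
    while (head < tail) {
        int u = queue[head++];
        for (int v = 0; v < n; v++)
            if (cap[u][v] > 0 && pred[v] < 0) {
                pred[v] = u;
                queue[tail++] = v;
            }
    }
    return pred[t] >= 0;
}

int max_flow(int s, int t) {
    int flow = 0;
    while (augmenting_path(s, t)) {
        int delta = INF;                      /* bottleneck capacity on the path */
        for (int v = t; v != s; v = pred[v])
            if (cap[pred[v]][v] < delta) delta = cap[pred[v]][v];
        for (int v = t; v != s; v = pred[v]) {
            cap[pred[v]][v] -= delta;         /* push delta units forward */
            cap[v][pred[v]] += delta;         /* add the reverse residual edge */
        }
        flow += delta;
    }
    return flow;
}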
Max-Flow/Min-Cut Theorem: This is an important theorem, and working through the proof is a good way to become familiar with properties of flows and augmenting paths.
[Figure: (a) a flow network with each edge labeled flow/capacity (for example 2/3 and 1/1); (b) the same network with a dashed line marking the cut S = {s, a, b}, T = {c, t}.]
The value of a flow f, denoted |f|, is the amount of flow leaving the source s (equivalently, the amount of flow entering the sink t). In the example network pictured in (a), each edge is labeled with the flow along that edge and the edge capacity; the value of the flow in this example is 2.

The capacity of a cut (S, T), denoted c(S, T), is the sum of the capacities of the edges crossing the cut in the s → t direction. (Remember that a cut is a partition of the vertices into two sets, S and T, such that S contains the source s and T contains the sink t). In the example network pictured in (b), the dashed line represents a cut (i.e., S = {s, a, b} and T = {c, t}), whose capacity is 3.
We can now proceed with the proof. Our strategy is to prove the following two statements, which together imply the theorem:

1. The value of any flow in G is at most the capacity of a minimum cut in G.
2. The value of a maximum flow f in G equals the capacity of some cut in G.

Proof of (1). Let (S, T) be a minimum cut in G, and let f be any flow in G. First, we claim that:

    |f| = Σ_{u ∈ S, v ∈ V} f(u, v).
Why is this true? Let's break down the sum, by considering the different vertices u might be. When u is the source s, we sum all the outgoing flow from s, which we know equals |f|. When u is any other vertex in S, we sum the outgoing flow minus the incoming flow, which we know by conservation of flow to be zero. The incoming flow is subtracted because of the convention that f(u, v) = -f(v, u). Next, we split the sum according to whether v lies in T or in S:

    Σ_{u ∈ S, v ∈ V} f(u, v) = Σ_{u ∈ S, v ∈ T} f(u, v) + Σ_{u ∈ S, v ∈ S} f(u, v).
The first summation on the last line is exactly the flow crossing the cut. The second summation on the last line equals zero, because of the same fact we used above: for every u, v ∈ S, we are summing f(u, v) and f(v, u), and we know that f(u, v) = -f(v, u). Noting that the flow along an edge f(u, v) can be no larger than the capacity of that edge c(u, v), we get, finally:

    |f| = Σ_{u ∈ S, v ∈ V} f(u, v) = Σ_{u ∈ S, v ∈ T} f(u, v) ≤ Σ_{u ∈ S, v ∈ T} c(u, v) = c(S, T),
which proves part (1). Notice also that we proved along the way that |f| = Σ_{u ∈ S, v ∈ T} f(u, v), for any flow f and any cut (S, T); we will use that again in the second part.
Proof of (2). Let f be a maximum flow in G. We will construct a cut (S, T) such that c(S, T) = |f|. We know that there can be no augmenting paths in the residual network Gf. If there were, then f could be augmented and it could not have been a maximum flow. Let S be the set of all vertices reachable from the source s in the residual network Gf. This set cannot include the sink t, because then there would be an augmenting path. If we let T be the vertices not reachable from s in the residual network, then (S, T) is a cut in G.
What can we say about the edges in the original network G that cross the cut (S, T) in the s → t direction? Consider such an edge (u, v), where u ∈ S and v ∈ T. If the flow along edge (u, v) is less than the capacity of edge (u, v), then the excess capacity will show up as an edge from u to v in the residual graph with non-zero capacity. But that can't be, because by our choice of S and T, u is reachable from s in the residual graph, but v is not! Therefore the flow along edge (u, v) must equal its capacity. We say that edge (u, v) is saturated.
Since we have just argued that all edges that cross the cut in the s → t direction are saturated, we have:

    Σ_{u ∈ S, v ∈ T} f(u, v) = Σ_{u ∈ S, v ∈ T} c(u, v) = c(S, T).

Putting that together with the fact that |f| = Σ_{u ∈ S, v ∈ T} f(u, v), which we proved earlier, we have:

    |f| = Σ_{u ∈ S, v ∈ T} f(u, v) = Σ_{u ∈ S, v ∈ T} c(u, v) = c(S, T),

which proves part (2).
Chris Umans
umans@cs.berkeley.edu
A problem Q that is NP-complete is one of the "hardest" problems in the class NP; moreover, if you can solve Q in polynomial time, then every problem in NP can be solved in polynomial time, and hence P = NP. Before we describe how to do each step, we need to know exactly what is meant by a "problem."

Optimization and decision problems: For our purposes, a problem is a question with the following form:

1. Given [an instance], what is the shortest/longest/minimum/maximum [structure]?
   E.g., Given a graph G, what is the traveling salesperson tour of minimum length?

2. Given [an instance], is there a [structure]?
   E.g., Given a graph G and an integer k, is there a traveling salesperson tour of length at most k?
The first type of problem is an optimization problem. The answer to the question it asks is some optimal structure (like a shortest tour in the graph). The second type of problem is a decision problem. The answer to the question it asks is YES or NO. It is important to remember that P and NP are classes of decision problems. Optimization problems can be transformed into decision problems by adding a "bound" k to the problem instance, as we did above. On your homework you are asked to show that the decision and optimization versions of a problem are "equivalent", up to a polynomial factor.
Showing a problem is in NP: A (decision) problem Q is in NP if there exists a polynomial-time checking algorithm. This algorithm is given an instance I of Q and a "potential solution" C to Q, and it outputs either YES or NO. We require that

1. If I is a "yes" instance, then there exists some C (a certificate) that makes the checking algorithm output YES.

2. If I is a "no" instance, then no C makes the checking algorithm output YES.

In practice, you should think in terms of a short certificate you would need to provide to "convince" the checker that a "yes" instance really is a YES instance, together with an efficient procedure for the checker. Some examples (for problems you have seen):
- 3-SAT: The instance is a 3-CNF Boolean formula; the certificate is a (purported) satisfying assignment. The checker assigns the values specified by the assignment, and checks that each clause contains a TRUE.

- Hamiltonian Path: The instance is a graph G; the certificate is a list of vertices that (supposedly) forms a Hamiltonian path in G. The checker makes sure that every vertex appears exactly once in the list, and that there is an edge between each pair of subsequent vertices in the list.
- Maximum Independent Set: The instance is a graph G and an integer k; the certificate is a (supposedly) independent set of the vertices of size k. The checker makes sure no pair of the vertices has an edge between them, and that there are indeed k different vertices in the set (see the sketch after this list).

- Graph Isomorphism: The instance is a pair of graphs G1 and G2; the certificate is a (purported) isomorphism π between G1 and G2 (i.e., a bijective mapping π from the vertices of G1 to the vertices of G2). The checker verifies that π is indeed bijective, that for every edge (u, v) in G1, the edge (π(u), π(v)) belongs to G2, and that for every edge (u, v) in G2, the edge (π^{-1}(u), π^{-1}(v)) belongs to G1.
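As an example of how mechanical these checkers are, here is a sketch of the Maximum Independent Set checker in C (the representation choices are my own): it runs in O(k^2) time plus the time to read the graph, comfortably polynomial.

#define MAXN 100

int n;                    /* number of vertices in G */
int adj[MAXN][MAXN];      /* adjacency matrix of G */

/* certificate: a list cert[0..k-1] of (supposedly) k distinct,
   pairwise non-adjacent vertices; returns 1 for YES, 0 for NO */
int check_independent_set(const int *cert, int k) {
    for (int i = 0; i < k; i++) {
        if (cert[i] < 0 || cert[i] >= n) return 0;    /* not a vertex */
        for (int j = i + 1; j < k; j++) {
            if (cert[i] == cert[j]) return 0;         /* not k distinct vertices */
            if (adj[cert[i]][cert[j]]) return 0;      /* an edge inside the set */
        }
    }
    return 1;
}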
Reductions: A polynomial-time reduction from a decision problem Q_old to Q_new is a polynomial-time algorithm R that is given an (arbitrary) instance of Q_old and produces an instance of Q_new. We require that "yes" instances of Q_old map to "yes" instances of Q_new, and that "no" instances of Q_old map to "no" instances of Q_new. Graphically, the mapping looks like this:
[Figure: the map R sends the YES instances of Q_old into the YES instances of Q_new, and the NO instances of Q_old into the NO instances of Q_new.]
To remember which direction the reduction goes in, it is useful to remember our purpose in performing the reduction in the first place. A reduction from Q_old to Q_new allows us to argue: "If Q_new has a polynomial-time algorithm, then Q_old does." Why is this true? Suppose we had an efficient algorithm A for Q_new. Then given any instance I of Q_old, we could apply the reduction R, obtaining an instance R(I) of Q_new, and solve that instance using A. Because yes maps to yes and no maps to no, we know that the yes/no answer for R(I) is also the yes/no answer for I. If you are not sure if you have reduced in the right direction, try making this argument, and see if it works.
A sample reduction: Let's try reducing 3-SAT to Independent Set. Here are the two (decision) problem definitions:

- 3-SAT: Given a Boolean formula φ in 3-CNF form (an AND of ORs, where each clause contains at most 3 literals), is there a satisfying assignment?

- Independent Set: Given a graph G and an integer k, is there an independent set (a set of vertices containing no pair of adjacent vertices) of size at least k?

The reduction builds a graph G from the formula φ, with a triangle of vertices for each clause:

[Figure: the graph built from an example formula; each clause becomes a triangle whose vertices are labeled with the clause's literals (such as ~x and ~z), and contradictory literals are joined by edges.]
In general, there will be a triangle for each of the m clauses (or an edge or a single vertex if there are only 2 or 1 literals in the clause). To help us reason about the problem we label the vertices of the triangle with the literals in the corresponding clause. Then we connect every pair of vertices labeled with contradictory literals. Finally we set k = m.
To show that "yes maps to yes and no maps to no," we prove the following claim. Notice that we need to prove both directions of this claim, even though the reduction is only a mapping in one direction.

Claim 2 φ has a satisfying assignment if and only if G has an independent set of size at least k.
Proof: (→) Suppose φ has a satisfying assignment. Then under that satisfying assignment there must be at least one TRUE in each clause. Pick one TRUE literal in each clause, and include the corresponding vertex in the independent set. This set has size m = k, and it is in fact an independent set because we cannot have included any vertices labeled with contradictory literals, since the TRUEs came from a truth assignment.
(←) Suppose G has an independent set V' of size at least k. No more than one vertex from each triangle can be in V', so V' must have size exactly k = m. Furthermore, no two vertices labeled with contradictory literals can be in V' because it is an independent set. Then, we can set the literals labeling vertices in V' to TRUE in a consistent way, and set the other variables in any way we wish, and we will have at least one TRUE literal in each clause, which is a satisfying assignment.
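The construction itself is only a few loops. A sketch in C (all names and the literal encoding are my own; it assumes every clause has exactly 3 literals, though as noted above 1- and 2-literal clauses become single vertices or edges):

#define MAXC 100                 /* maximum number of clauses */

int m;                           /* number of clauses */
int lit[MAXC][3];                /* lit[c][j]: +v for z_v, -v for NOT z_v; variables numbered 1..n */
int adj[3*MAXC][3*MAXC];         /* graph on vertices 3c + j */

/* build G: a triangle per clause, plus edges between contradictory literals */
void build_graph(void) {
    for (int c = 0; c < m; c++)                   /* the clause triangles */
        for (int j = 0; j < 3; j++)
            for (int jj = j + 1; jj < 3; jj++)
                adj[3*c + j][3*c + jj] = adj[3*c + jj][3*c + j] = 1;

    for (int c = 0; c < m; c++)                   /* contradictory-literal edges */
        for (int j = 0; j < 3; j++)
            for (int cc = 0; cc < m; cc++)
                for (int jj = 0; jj < 3; jj++)
                    if (lit[c][j] == -lit[cc][jj])    /* z_v vs NOT z_v */
                        adj[3*c + j][3*cc + jj] = 1;
    /* the Independent Set instance is (adj, k = m) */
}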
Chris Umans
umans@cs.berkeley.edu
We want a reduction R that performs the following mapping:

[Figure: the reduction R maps YES instances of 3-CNF-SAT to YES instances of 0/1-IP, and NO instances to NO instances.]

That is, given an (arbitrary) instance of 3-CNF-SAT, φ, which looks something like:

    (z1 ∨ z2 ∨ ¬z3) ∧ (¬z1 ∨ z4 ∨ z3) ∧ ... ∧ ( ... ),

we need to produce an instance of 0/1-IP (which consists of m inequalities involving n 0/1 variables) specified by an m × n matrix A and an m-vector b.
Before attempting the transformation, let's remind ourselves of the claim we will need to be able to prove:

Claim 3 φ has a satisfying assignment if and only if there exists a 0/1 vector x for which Ax ≥ b.
Since the 3-CNF-SAT instance involved finding a truth assignment to the z variables, it seems natural to have the x variables of the IP correspond to the z variables. So, we set n to be the number of z variables in φ, and we have one 0/1 x variable for each in our instance of 0/1-IP. We will associate the 0/1 values of variable xi in the IP with the FALSE and TRUE values of the corresponding zi variable in φ.

Now, we need to constrain the x variables so that there is a solution to Ax ≥ b if and only if there is a satisfying assignment to φ. Notice that the IP solution must satisfy ALL of the inequalities, just as a satisfying assignment must satisfy ALL of the clauses of φ. This suggests that we should have a constraint in our IP for each clause in φ.

For a clause in φ with no negated variables, like (z1 ∨ z2 ∨ z3), it is pretty clear that we want a constraint like:

    x1 + x2 + x3 ≥ 1

Thinking of our correspondence between IP variables and variables of φ, this constraint says "at least one variable in the clause must be true."
What constraint should we write for a clause like (z1 ∨ z2 ∨ ¬z3)? We need a (linear!) function f that behaves like ¬. That is, we would like f(x3) = 0 if x3 = 1 and f(x3) = 1 if x3 = 0. It seems that f(x3) = 1 - x3 will do the trick. Then we can encode a clause like (z1 ∨ z2 ∨ ¬z3) with a constraint like:

    x1 + x2 + (1 - x3) ≥ 1.
We now have all we need to specify the reduction. We set m to be the number of clauses in φ, and for each clause, we write the corresponding constraint on the x variables. Using standard algebraic manipulation we can get all of the constraints into the form a1 x1 + a2 x2 + ... + an xn ≥ bj, where the ai and bj are constants. From there we get the matrix A and the vector b. Finally, we can prove the claim above:
Proof (of claim): (→) Assume φ has a satisfying assignment. Then for each i we can set the IP variable xi to 1 if zi is TRUE in the satisfying assignment, or 0 if zi is FALSE. Since the satisfying assignment has at least one TRUE in each clause, there is at least one xi set to 1 (or (1 - xi) set to 1) in each constraint of the IP; hence we have described a vector x for which Ax ≥ b.

(←) Assume there is some vector x for which Ax ≥ b. Then for each i, set the variable zi to TRUE if xi is set to 1 and FALSE if xi is set to 0. Since every IP constraint is obeyed, every clause in φ has a TRUE in it. We have exhibited a satisfying assignment and therefore φ is satisfiable.
k-coloring example: As further practice with reductions, consider the 3-coloring problem (which you will prove to be NP-complete in a homework):

- 3-Coloring: Given a graph G, is there a coloring of the vertices with 3 colors so that no two adjacent vertices are colored with the same color?
We will try to prove that the 4-coloring problem is NP-complete:

- 4-Coloring: Given a graph G, is there a coloring of the vertices with 4 colors so that no two adjacent vertices are colored with the same color?

We will skip over verifying that 4-coloring is in NP; it should be quite easy.

We now want a reduction R that performs the following mapping:
[Figure: the reduction R maps YES instances of 3-COLORING to YES instances of 4-COLORING, and NO instances to NO instances.]
We are given an (arbitrary) graph G, and we want the reduction to produce a graph G' so that we can prove the following claim:

Claim 4 G is 3-colorable if and only if G' is 4-colorable.
As a first attempt, we might try setting G' = G. One direction of the claim works -- if G is 3-colorable, then G' = G is certainly 4-colorable (we can just not use one color). However, the other direction is problematic. If G' is 4-colorable, we don't know that G is 3-colorable. We need to find a way to "force" G' to "waste" one color, so that the remaining coloring is essentially a 3-coloring of G. We can do this by defining G' to be G with one extra vertex v that is connected to every other vertex. Having specified our reduction, we can now prove the claim:
Proof (of claim): (→) Suppose we have a 3-coloring of G. Then we can use the same colors to color all of G' except the extra vertex v with 3 colors, and then color v with the 4-th color. Hence G' is 4-colorable.

(←) Suppose we have a 4-coloring of G'. Notice that whatever color is used by the extra vertex v cannot be used by any of the other vertices (because v is connected to every other vertex). Therefore the remaining vertices of G' (which are exactly the vertices of G) must be colored with only 3 colors, which implies that G is 3-colorable.
Notice that we can generalize this reduction. We can prove the problem k-Coloring, for any k ≥ 4, is NP-complete (e.g., 29-Coloring is NP-complete). We use the same idea, but now we need to add k - 3 extra vertices, each of which is connected to every other vertex (including the other k - 4 extra vertices).
Chris Umans
umans@cs.berkeley.edu
Christofides' algorithm: We discussed an approximation algorithm for TSP with the triangle inequality in lecture; here is a more clever variant. Recall the approximation algorithm from class: we are given a complete graph G whose edge weights satisfy the triangle inequality. We find a minimum spanning tree M in G, and form a tour T' by going "twice around" this tree, as pictured below:

[Figure: a minimum spanning tree, with the "twice around" tour traced along both sides of its edges.]
We then transformed tour T' into tour T by "short-circuiting" the tour so that it visited every vertex exactly once. That is, starting at the root, we followed T' until it was about to visit an already-visited vertex, at which point we moved directly to the next unvisited vertex on tour T'.

We noticed that because an optimum traveling salesperson tour of G yields a spanning tree with the same or smaller cost (by simply deleting an edge from the tour), we have w(M) ≤ w(OPT). Here w(M) is the weight of spanning tree M, and w(OPT) is the length of the optimum traveling salesperson tour.
Because of the triangle inequality, the short-circuiting process cannot have increased the tour length, so we have:

    w(T) ≤ w(T') = 2 w(M) ≤ 2 w(OPT),

which means that we have an approximation algorithm with ratio 2, since it returns a tour that is no longer than twice the optimum.
Christofides' algorithm introduces a clever twist, and improves the ratio to 1.5. It begins in the same way, by finding in G = (V, E) a minimum spanning tree M. It then finds a minimum weight perfect matching on V', the odd-degree vertices of M. This requires some explanation. First, notice that M (indeed any graph) has an even number of odd-degree vertices. This is because the sum of the degrees equals twice the number of edges, an even number. Now, a perfect matching is simply a matching in which every vertex is touched by an edge in the matching. The weight of a matching is just the sum of the weights of the edges in the matching. A minimum weight perfect matching (MWPM) is a perfect matching with minimum weight. There is a polynomial-time algorithm that finds a minimum weight perfect matching, but we won't describe it here.
Let us call the MWPM we find P. We add the edges of P to our MST M. In our example graph, we might get something like this:

[Figure: the MST M together with the matching edges of P joining the odd-degree vertices.]
Notice that in the resulting graph, every vertex has even degree! We have "repaired" all of the odd-degree vertices in V' by matching them with other odd-degree vertices. What do we know about graphs in which every vertex has even degree? There is an Euler circuit (a tour that traverses every edge exactly once and returns to its starting point). We let T' be an Euler circuit, and then short-circuit it to get an honest traveling salesperson tour T as before.
Now, how long is our tour T? We know that w(T) ≤ w(T') as before, and that w(T') = w(M) + w(P), since T' traverses every edge in the spanning tree M and the perfect matching P exactly once. As before we have the inequality w(M) ≤ w(OPT). We need to know the relationship between w(OPT) and w(P).
We claim that 2 w(P) ≤ w(OPT). This is not hard to see: first, notice that an optimum traveling salesperson tour on only the vertices in V' (call it OPT') can be no longer than OPT, by the triangle inequality -- we could just short-circuit OPT to skip over the vertices not in V', making the tour no longer in the process. Now, since there are an even number of vertices in V', OPT' is a cycle of even length. Therefore, by taking every other edge around OPT', we get a perfect matching on V', and by taking the other edges around OPT' we get another perfect matching on V'. Each of these perfect matchings has weight no smaller than the MWPM, so we have:

    w(OPT) ≥ w(OPT') ≥ 2 w(P).
This implies that w(P) ≤ 0.5 w(OPT). Putting everything together, we can bound the length of our tour T:

    w(T) ≤ w(T') = w(M) + w(P) ≤ w(OPT) + 0.5 w(OPT) = 1.5 w(OPT).

Therefore, the algorithm we just described attains an approximation ratio of 1.5, since it returns a tour that is no longer than 1.5 times the optimum.
Chris Umans
May 3, 2000
umans@cs.berkeley.edu
We notice that the feasible subsets of items (subsets whose total weight does not exceed W) remain unchanged, so our algorithm's solution is a valid solution to the original instance. Because each element potentially lost some value in the scaling down/scaling up process, APPROX ≤ OPT, where OPT is the value of an optimum solution to the original problem. But how much smaller than OPT can APPROX really be? We claim that each item lost at most 2^k in value (because we essentially ignored the lower order k bits). Since there are n items, we have:

    APPROX ≥ OPT - n 2^k.

We then have:

    APPROX / OPT ≥ (OPT - n 2^k) / OPT = 1 - n 2^k / OPT ≥ 1 - n 2^k / V,

where the last inequality uses the fact that OPT ≥ V, for V the largest item value.
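The scaling step this analysis refers to is just dropping the low-order k bits of each value. A minimal C illustration (mine, not from the exercise):

/* scale each value down (what the algorithm works with) and back up
   (how we interpret the answer); each item loses less than 2^k in value */
void scale_values(long v[], int n, int k) {
    for (int i = 0; i < n; i++)
        v[i] = (v[i] >> k) << k;   /* e.g. v = 1000, k = 4: 1000 -> 992, a loss of 8 < 16 */
}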