Trees and Optimization


1.1. TREES AND OPTIMIZATION


The word tree suggests branching out and never completing a cycle.
Many computational applications use trees to organize data or decisions.
Cleverly managed trees provide data structures that enable many algorithms to run quickly. In this section, we explore optimization problems
concerning trees. First we need definitions and basic properties.
1.1.1. DEFINITION. A forest is a graph with no cycle (an acyclic
graph). A tree is a connected forest. A leaf (or pendant vertex)
is a vertex of degree 1. A spanning subgraph is a subgraph containing all vertices. A spanning tree is a spanning subgraph that
is a tree.
1.1.2. PROPOSITION. A tree with at least two vertices has at least two leaves. If v is a leaf of a tree G, then G − v is a tree.

Proof: The first statement holds because the endpoints of a maximal nontrivial path in an acyclic graph have degree 1. Next, since a cut-vertex v of a connected graph G has a neighbor in each component of G − v, a leaf cannot be a cut-vertex. Thus when v is a leaf of a tree G, the graph G − v is connected and acyclic.

PROPERTIES OF TREES
Trees have many equivalent definitions. Verifying any one shows
that a graph is a tree, and then the others are available for use.
1.1.3. THEOREM. For a graph G on n vertices, the following properties are equivalent (and define the class of trees on n vertices).
A) G is connected and has no cycles.
B) G is connected and has n − 1 edges.
C) G has n − 1 edges and no cycles.
D) G has exactly one u, v-path whenever u, v ∈ V(G).

Proof: We prove the equivalence of A, B, and C by showing that any two of {connected, acyclic, n − 1 edges} imply the third.
A ⇒ {B, C}. Use induction on n; for n = 1, an acyclic graph has no edge. For n > 1, Proposition 1.1.2 provides a leaf x and implies that G − x is a tree with n − 1 vertices. By the induction hypothesis, G − x has n − 2 edges. Since d(x) = 1, it follows that G has n − 1 edges.
B ⇒ {A, C}. Delete edges of cycles in G until the resulting graph G′ is acyclic. Since G is connected and no edge of a cycle is a cut-edge (Lemma 0.44), G′ is connected. Since G′ is acyclic and A ⇒ {B, C}, the graph G′ has n − 1 edges. Hence no edges were deleted, and G itself is acyclic.
C ⇒ {A, B}. Let the components of G be G1, . . . , Gk, with |V(Gi)| = ni for all i. Since G is acyclic, each Gi is connected and acyclic, so |E(Gi)| = ni − 1. Thus |E(G)| = Σ_{i=1}^k |E(Gi)| = n − k. Since |E(G)| = n − 1, we have k = 1, and G is connected.
A ⇒ D. Since G is connected, for u, v ∈ V(G) there is a u, v-path. To prohibit a second path, we use extremality. Over all pairs of distinct paths with the same endpoints, let {P, Q} be a pair with minimum total length. By this choice, P and Q have no common vertices other than their endpoints. Hence P ∪ Q is a cycle in G, which contradicts condition A.
D ⇒ A. Existence of the paths implies that G is connected. Uniqueness of the paths prohibits cycles.
To characterize trees among multigraphs, one must add to Theorem
1.1.3(D) a prohibition of loops. The rest remains the same. The defining
properties have many applications.
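For illustration, here is a minimal Python sketch (ours, not part of the text) that tests property B of Theorem 1.1.3: count the edges and check connectivity by a depth-first search.

    def is_tree(n, edges):
        # Property B of Theorem 1.1.3: connected with n - 1 edges.
        # Vertices are 0..n-1 (n >= 1); edges is a list of pairs.
        if len(edges) != n - 1:
            return False
        adj = {v: [] for v in range(n)}
        for u, v in edges:
            adj[u].append(v)
            adj[v].append(u)
        seen, stack = {0}, [0]
        while stack:
            for w in adj[stack.pop()]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        return len(seen) == n      # connected iff the search reaches all

For example, is_tree(4, [(0, 1), (1, 2), (1, 3)]) returns True.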
1.1.4. PROPOSITION. If T is a tree with k edges and G is a graph with δ(G) ≥ k, then T is a subgraph of G. Also, this inequality is sharp.

Proof: Since Kk has minimum degree k − 1 and contains no tree with k edges, no value of δ(G) less than k can force the appearance of T.
The sufficiency of δ(G) ≥ k follows by induction on k. When k = 0, the graph G has a vertex, and the claim holds. When k > 0, let T′ be a tree on k vertices obtained from T by deleting a leaf v with neighbor u. Since δ(G) ≥ k > k − 1, the induction hypothesis applies to yield T′ as a subgraph of G. Let x be the vertex in this copy of T′ that represents u. Because T′ has only k − 1 vertices other than u, some y ∈ NG(x) does not appear in this copy of T′. Adding the edge xy to represent uv enlarges this copy of T′ in G to a copy of T in G.
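The proof is constructive, and the following hedged Python sketch (input format and names are our own) attaches the vertices of T one leaf at a time, exactly as in the induction. It assumes the tree has k edges and every vertex of g_adj has degree at least k, so an unused neighbor always exists.

    def embed_tree(tree_adj, g_adj):
        # Place the tree's vertices in an order where each vertex after
        # the first has an earlier neighbor, then map each new vertex to
        # an unused G-neighbor of its parent's image.
        order, seen, stack = [], set(), [next(iter(tree_adj))]
        while stack:
            u = stack.pop()
            if u in seen:
                continue
            seen.add(u)
            order.append(u)
            stack.extend(tree_adj[u])
        phi, used = {}, set()
        for u in order:
            if not phi:
                v = next(iter(g_adj))      # arbitrary start vertex
            else:
                p = next(w for w in tree_adj[u] if w in phi)
                # an unused neighbor exists because deg >= k > |used| - 1
                v = next(x for x in g_adj[phi[p]] if x not in used)
            phi[u] = v
            used.add(v)
        return phi                         # tree vertex -> G vertex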


Since edges of cycles are not cut-edges, we can delete edges from a
connected graph to obtain a spanning tree. We next use two properties of
trees to prove a result about spanning trees: 1) Since a tree has no cycles,
every edge is a cut-edge. 2) Since a tree has a unique connecting path for
each vertex pair, adding any edge creates exactly one cycle. We use subtraction and addition to indicate deletion and inclusion of single edges.
1.1.5. PROPOSITION. If T and T′ are two spanning trees of a connected graph G and e ∈ E(T) − E(T′), then there exists e′ ∈ E(T′) − E(T) such that T − e + e′ and T′ + e − e′ are both spanning trees of G.

Proof: The figure below illustrates T and T′ sharing two edges, with E(T) bold and E(T′) solid. The specified edge e is a cut-edge of T; let U and Ū be the vertex sets of the components of T − e. Adding e to T′ creates a unique cycle C. The path C − e contained in T′ has its endpoints in U and Ū, so it has an edge e′ with one endpoint in U and one in Ū.
Since e is the only edge of T joining U and Ū, we have e′ ∈ E(T′) − E(T). Since e′ joins the components of T − e, the graph T − e + e′ is a spanning tree. Since e′ lies on the unique cycle formed by adding e to T′, the graph T′ + e − e′ also is a spanning tree.



In any graph, the maximal subgraphs not having cut-vertices are useful subgraphs that form a tree-like structure.
1.1.6. DEFINITION. A block of a graph G is a maximal connected
graph H such that H is a subgraph of G and has no cut-vertex. If
G itself is connected and has no cut-vertex, then G is a block.
1.1.7. Example. Blocks. If H is a block of G, then H has no cut-vertex, but H may contain cut-vertices of G. For example, the graph below has five blocks: three copies of K2, one of K3, and one subgraph that is neither a cycle nor complete.

1.1.8. REMARK. Properties of blocks. An edge of a cycle cannot itself be a block, because it belongs to a larger subgraph having no cut-vertex. Hence an edge is a block of G if and only if it is a cut-edge of G (the blocks of a tree are its edges). If a block has more than two vertices, then it is 2-connected. The blocks of a graph are its isolated vertices, its cut-edges, and its maximal 2-connected subgraphs.
1.1.9. PROPOSITION. Two blocks in a graph share at most one vertex.
Proof: Given blocks B1 and B2 sharing two vertices, choose x ∈ V(B1 ∩ B2). Since each block has no cut-vertex, deleting x leaves a path within Bi from every vertex of Bi − x to each vertex of (B1 ∩ B2) − x. Hence B1 ∪ B2 − x is connected. Now B1 ∪ B2 is a subgraph with no cut-vertex, which contradicts the maximality of B1 and B2.
Thus the blocks of a graph G form a decomposition of G. When two
blocks of G share a vertex, it must be a cut-vertex of G. The interaction
between blocks and cut-vertices is described by an auxiliary graph.
1.1.10. DEFINITION. The block-cutpoint graph of a graph G is a bipartite graph H in which one partite set consists of the cut-vertices of G, and the other has a vertex bi for each block Bi of G. We include vbi as an edge of H if and only if v ∈ Bi.
The block-cutpoint graph of a connected graph G is a tree (Exercise 13) whose leaves are blocks of G. A graph G with connectivity 1 has at least two leaf blocks that each contain exactly one cut-vertex of G.
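For computation, blocks and cut-vertices can be found in linear time by depth-first search. The following Python sketch (a standard Hopcroft–Tarjan-style routine, not from the text; isolated vertices are omitted for brevity) returns the blocks, the cut-vertices, and the incidence pairs of the block-cutpoint graph for a connected input.

    def block_cutpoint_graph(adj):
        # adj: vertex -> iterable of neighbors (connected graph assumed).
        num, low, counter = {}, {}, [0]
        stack, blocks, cuts = [], [], set()
        def dfs(u, parent):
            counter[0] += 1
            num[u] = low[u] = counter[0]
            children = 0
            for v in adj[u]:
                if v not in num:
                    stack.append((u, v))
                    children += 1
                    dfs(v, u)
                    low[u] = min(low[u], low[v])
                    if low[v] >= num[u]:          # u separates v's subtree
                        if parent is not None or children > 1:
                            cuts.add(u)
                        block = set()
                        while True:               # pop one block's edges
                            x, y = stack.pop()
                            block |= {x, y}
                            if (x, y) == (u, v):
                                break
                        blocks.append(block)
                elif v != parent and num[v] < num[u]:
                    stack.append((u, v))          # back edge
                    low[u] = min(low[u], num[v])
        dfs(next(iter(adj)), None)
        # edges of the block-cutpoint graph: cut-vertex v -- block i
        incidence = [(v, i) for i, B in enumerate(blocks)
                     for v in B if v in cuts]
        return blocks, cuts, incidence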


OPTIMAL SPANNING TREES


In a connected graph with many spanning trees (see Chapter 6 for
enumeration), which is best? For example, the Minimum Connector
Problem seeks a connected subgraph with minimum total weight in a
graph with weighted edges. For nonnegative weights, the solution is a
spanning tree. Naively, we iteratively include an edge of smallest weight
that creates no cycle. Locally optimal heuristics are often called greedy
algorithms. This is one of the rare instances where a greedy algorithm
finds an optimal solution.
1.1.11. ALGORITHM. (Kruskal's Algorithm; Minimum-Weight Spanning Trees)
Input: A weighted connected graph.
Idea: Maintain an acyclic spanning subgraph H, enlarging it by edges with low weight to form a spanning tree. Consider edges in nondecreasing order of weight, breaking ties arbitrarily.
Initialization: Set E(H) = ∅.
Iteration: If the next cheapest edge joins two components of H, then include it; otherwise, discard it. Terminate when H is connected.
No added edge creates a cycle, so each new edge connects two components. We begin with n components and reduce this number by one with each step. As long as more than one component remains, there are edges joining components (since the input graph is connected), and we have not yet considered them (since every edge considered is in H or completes a cycle with H). Thus n − 1 steps are performed and produce a subgraph that is a tree. We will prove that this tree has minimum weight.
1.1.12. Example. Kruskal's algorithm uses only the order of the weights, not their magnitude. In the example below, edges are labeled (and considered) in increasing order of weight. Edges of equal weight may be examined in any order; the resulting trees have the same cost. Here, the four cheapest edges are selected, but then we cannot take the fifth or sixth.


1.1.13. THEOREM. (Kruskal [1956]). In a connected weighted graph G, Kruskal's Algorithm constructs a minimum-weight spanning tree.

Proof: Let T be a tree produced by Kruskal's Algorithm, and let T* be a minimum spanning tree. If T ≠ T*, let e be the first edge chosen for T that is not in T*. Adding e to T* creates one cycle, which contains an edge e′ ∉ E(T) since T has no cycle. Now T* + e − e′ is a spanning tree.
Since T* contains e′ and all the edges of T chosen before e, both e and e′ are available when the algorithm chooses e, and hence w(e) ≤ w(e′). Thus T* + e − e′ is a spanning tree with weight at most w(T*) that contains a longer initial segment of T. Since T is finite, iterating this switch leads to a minimum-weight spanning tree containing T. Phrased extremally, we have proved that a minimum-weight spanning tree agreeing with T for the longest initial segment must be T itself.
To implement Kruskal's Algorithm, first sort the m edges by weight. Maintain for each vertex the label of the current component containing it. Accept the next cheapest edge if its endpoints have different labels. Merge the two components involving an accepted edge by assigning one of the labels to every vertex having the other label. By always merging the smaller component into the larger, each label will change at most log2 n times, and the total number of changes is at most n log2 n.
In this implementation, the time complexity is governed by the sorting of edge weights. Analysis of Kruskal's Algorithm often assumes pre-sorted weights; otherwise, other algorithms may do better. Prim's Algorithm (Exercise 25) grows a spanning tree from a single vertex by iteratively adding the cheapest edge that incorporates a new vertex. It and Kruskal's Algorithm are comparable when weights are pre-sorted.
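The label-merging implementation just described can be sketched as follows (Python, with our own input format: a list of weighted edges on vertices 0, . . . , n − 1). Each vertex is relabeled at most log2 n times, as argued above.

    def kruskal(n, edges):
        # edges: list of (weight, u, v) triples with vertices 0..n-1.
        comp = list(range(n))                 # component label per vertex
        members = [[v] for v in range(n)]     # vertices holding each label
        tree = []
        for w, u, v in sorted(edges):         # nondecreasing weight
            a, b = comp[u], comp[v]
            if a == b:
                continue                      # edge would close a cycle
            if len(members[a]) < len(members[b]):
                a, b = b, a                   # relabel the smaller side
            for x in members[b]:
                comp[x] = a
            members[a] += members[b]
            tree.append((u, v, w))
        return tree                           # n-1 edges if G is connected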
Both Borůvka [1926] and Jarník [1930] posed and solved the minimum spanning tree problem. Modern improvements use clever data structures to merge components quickly. Fast versions appear in Tarjan [1984] for when the edges are pre-sorted and in Gabow–Galil–Spencer–Tarjan [1986] for when they are not. Thorough discussion and further references appear in Ahuja–Magnanti–Orlin [1993, Chapter 13]. More recent developments appear in Karger–Klein–Tarjan [1995].


Next we seek a spanning tree with the most leaves. When our graph
models a communication network, we seek the smallest set of vertices to
protect so that all surviving vertices after an attack can communicate.
The non-leaf vertices in a spanning tree form a set S such that G[S] is connected and every vertex outside S is adjacent to some vertex of S. Such a


set S is a connected dominating set. A smallest connected dominating


set is always the set of non-leaves in a spanning tree with the most leaves; this equivalence was perhaps first noted in Hedetniemi–Laskar [1984].
The problem of maximizing the number of leaves in a spanning tree of G is NP-complete (Garey–Johnson [1979, p. 206]), so we may be content
to find a spanning tree with many leaves. NP-completeness increases the
value of constructive proofs for general bounds. A constructive proof that
all graphs in some class have spanning trees with at least t leaves becomes
an algorithm to produce such a tree for graphs in this class.
Ding–Johnson–Seymour [2001] solved this extremal problem in terms of the numbers of vertices and edges. If n ≠ t + 2 and G has at least n + (t choose 2) edges, then G has a spanning tree with more than t leaves. The result is sharp: some n-vertex graph with n + (t choose 2) − 1 edges has no spanning tree with more than t leaves (Exercise 31).
Earlier, the extremal problem was studied in terms of the number of
vertices and the minimum degree. Let l(n , k) be the largest t such that
every connected n-vertex graph with minimum degree at least k has a
spanning tree with at least t leaves. When G = Cn, spanning trees have only 2 leaves, so l(n, 2) = 2. For k ≥ 2, the cycle generalizes to a k-regular graph that has a linear number of nonleaves in each spanning tree.
1.1.14. Example. l(n, k) ≤ ((k − 2)/(k + 1))n + 2. Let s = ⌊n/(k + 1)⌋. We construct a graph Gn,k, with n vertices and minimum degree k, whose spanning trees all have at least 3s − 2 nonleaf vertices. Begin with complete graphs R1, . . . , Rs, each of order at least k + 1, together having n vertices. Choose xi, yi ∈ Ri; let W = {x1, . . . , xs} ∪ {y1, . . . , ys}. Delete the edges x1y1, . . . , xsys. Let Z = {xi yi+1 : 1 ≤ i ≤ s}, with indices taken mod s, and add the edges in Z to complete Gn,k. Note that δ(Gn,k) = k.
Consider a spanning tree T. Any two edges in Z form an edge cut, so T lacks at most one edge of Z. If xj yj+1 ∉ E(T), then T contains an xi, yi-path in Ri for each i. Now the nonleaves contain some vertex of Ri − W for each i, plus all of W − {xj, yj+1}. If Z ⊆ E(T), then T lacks an xi, yi-path in Ri for exactly one value of i, say j. This forces at least 3(s − 1) nonleaves in V − Rj, and k ≥ 2 forces an additional nonleaf at xj or yj.


For k = 3, there are several proofs that the construction in Example 1.1.14 is optimal. For k = 4, the optimal bound l(n, 4) = (2/5)n + 8/5 was proved in Griggs–Wu [1990] and in Kleitman–West [1991] (two small graphs have no tree with (2/5)n + 2 leaves). Griggs–Wu [1990] also proved that l(n, 5) = (3/6)n + 2. The proofs are algorithmic, constructing a tree with at least this many leaves. We present a proof for k = 3.
1.1.15. THEOREM. (Linial–Sturtevant [unpub.], Griggs–Wu [1990], Kleitman–West [1991]) Every connected N-vertex graph G with δ(G) ≥ 3 has a spanning tree with at least N/4 + 2 leaves.

Proof: We provide an algorithm to grow such a tree. Let T denote the current tree, with n vertices and l leaves. If x is a leaf of T, then the external degree of x, denoted d(x), is |NG(x) − V(T)|. The operation of expansion at x consists of adding to T the d(x) edges from x to NG(x) − V(T). We grow T by operations, where each operation consists of some number of expansions. Note that expansion preserves the property that all edges from T to G − V(T) are incident to leaves of T.
A leaf x of T with d(x) = 0 is dead; no expansion is possible at a dead leaf, and it remains a leaf in the final tree. Let m be the number of dead leaves in T. An expansion that makes y a dead leaf kills y. We call an operation admissible if its effect on T satisfies the augmentation inequality 3Δl + Δm ≥ Δn, where Δl, Δm, and Δn denote the changes in the numbers of leaves, dead leaves, and vertices in T, respectively.
We grow a spanning tree by admissible operations. If G is not 3-regular, we begin with the edges at a vertex of maximum degree. If G is 3-regular and every edge belongs to a triangle, then G = K4, and the claim holds. Otherwise, G is 3-regular and has an edge in no triangle; in this case we begin with such an edge and the four edges incident to it.


If T is grown to a spanning tree with L leaves by admissible operations, then all leaves eventually die. The final operation will kill at least two leaves not counted by the augmentation inequality, so the total of Δm from the augmentation inequalities for the operations will be at most L − 2. We begin with 4 leaves and 6 vertices if G is 3-regular; otherwise with r leaves and r + 1 vertices for some r > 3. Summing the augmentation inequalities over all operations yields 3(L − 4) + (L − 2) ≥ N − 6 if G is 3-regular and 3(L − r) + (L − 2) ≥ N − r − 1 otherwise. These simplify to 4L ≥ N + 8 and 4L ≥ N + 2r + 1 ≥ N + 9, respectively, which yield L ≥ N/4 + 2.
It remains to present admissible operations that can be applied until
T absorbs all vertices and to show that the last operation kills two extra
leaves. We use the three operations shown below, applying O2 only when
O1 is not available.


O1: If d(x) ≥ 2 for some current leaf x, then expanding at x yields Δl = d(x) − 1, Δn = d(x), and Δm ≥ 0. The augmentation inequality reduces to 2d(x) ≥ 3, which is satisfied when d(x) ≥ 2.
O2: If d(x) ≤ 1 for every current leaf x and some vertex outside T has at least two neighbors in T, then expanding at one of them yields Δl = 0 and Δm ≥ 1 = Δn, and the augmentation inequality holds.
O3: If y is the only neighbor of x outside T and y has r neighbors not in T, where r ≥ 2, then expanding at x and then y yields Δl = r − 1, Δn = r + 1, and Δm ≥ 0. The augmentation inequality reduces to 3(r − 1) ≥ r + 1, which holds when r ≥ 2.
Because δ(G) ≥ 3, every vertex outside T has at least two neighbors in T or at least two neighbors outside T. Hence at least one of these operations is available until T becomes a spanning tree.
Now consider the final operation. If it is O1 or O3, then the two new leaves are dead, which contributes an extra 2 to Δm not counted by the augmentation inequality. If the final operation is O2, then the new leaf is dead and has at least two neighbors that are current leaves of T and also become dead. Therefore, the contribution to Δm is at least 3 instead of at least 1. In each case, the contributions to Δm from the augmentation inequalities sum to at most L − 2.
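The operations translate directly into a procedure. The sketch below (Python, our own formulation) always starts from a star at a maximum-degree vertex, so it omits the special start used above for 3-regular graphs; it assumes a connected input with minimum degree at least 3.

    def grow_leafy_tree(adj):
        # adj: vertex -> set of neighbors; connected, min degree >= 3.
        root = max(adj, key=lambda v: len(adj[v]))
        in_t = {root} | set(adj[root])        # start with a star at root
        tree = [(root, u) for u in adj[root]]
        deg = {v: 0 for v in adj}             # degree within the tree
        deg[root] = len(adj[root])
        for u in adj[root]:
            deg[u] = 1
        def ext(x):                           # external neighbors of x
            return [u for u in adj[x] if u not in in_t]
        def expand(x):                        # add all edges from x outward
            for u in ext(x):
                tree.append((x, u))
                in_t.add(u)
                deg[x] += 1
                deg[u] += 1
        while len(in_t) < len(adj):
            leaves = [v for v in in_t if deg[v] == 1]
            x = next((v for v in leaves if len(ext(v)) >= 2), None)
            if x is not None:                 # O1
                expand(x)
                continue
            live = [v for v in leaves if ext(v)]
            x = next((v for v in live
                      if sum(w in in_t for w in adj[ext(v)[0]]) >= 2), None)
            if x is not None:                 # O2
                expand(x)
                continue
            x = live[0]                       # O3: expand x, then its
            y = ext(x)[0]                     # unique outside neighbor
            expand(x)
            expand(y)
        return tree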
The graph Gn,k of Example 1.1.14 contains many copies of Kk+1 − e, the graph obtained from Kk+1 by deleting one edge. Forbidding this induced subgraph forces more of the vertices to be leaves; Griggs–Kleitman–Shastri [1989] proved that every (K4 − e)-free connected n-vertex graph with minimum degree at least 3 has a spanning tree with at least (n + 4)/3 leaves. The proof is difficult. Exercise 34 considers an easier variation.
The construction of Example 1.1.14 is optimal for k ≤ 5. It was long thought to be essentially optimal for all k. However, Alon [1990] showed probabilistically that for large n some k-regular graph has no dominating set of size less than (1 + o(1))·((1 + ln(k + 1))/(k + 1))·n. Since connected dominating sets are dominating sets, the number of leaves that can be guaranteed therefore cannot grow faster than about (1 − ln(k + 1)/(k + 1))n (noted by Mubayi).
Kleitman and West [1991] showed that Alon's probabilistic construction is close to optimal. The connection between the two results apparently was not noticed until seven years after they were proved. For large k, one cannot avoid having at least about (n ln k)/k nonleaves, and trees this good exist. The proof below is simpler than that in Kleitman–West [1991] and makes the result asymptotically sharp. Here 1 + ε replaces a constant greater than 2.5 in the original result.
1.1.16.* THEOREM. (Caro–West–Yuster [2000]) Given ε > 0 and k large in terms of ε, every connected graph with order N and minimum degree k has a spanning tree with more than [1 − ((1 + ε) ln k)/k]N leaves.

Proof: We grow such a tree. Begin with a star at a vertex of degree k and iteratively expand the current tree T, which has n vertices, l leaves, and external degree d(x) at each leaf x. Expansion at a leaf adds all outside neighbors, so only leaves have outside neighbors. Each operation combines one or more expansions to satisfy the augmentation inequality rΔl + ΔM ≥ (r − 1)Δn, where r is a parameter to be chosen in terms of k, and M is a measure of total deadness of leaves.
A leaf is more dead as it has fewer external neighbors. We will choose ε0, . . . , εr with ε0 ≥ · · · ≥ εr = 0 and say that a leaf x with d(x) = i has deadness εi (let εi = 0 for i > r). Set M = Σ_{i=0}^{r−1} εi·mi, where T has mi leaves with external degree i.
For the final tree, M = ε0·L, and initially M ≥ 0. When we grow a tree from the initial star, summing the augmentation inequalities yields r(L − k) + ε0·L ≥ (r − 1)(N − k − 1). Thus L ≥ [(r − 1)N + k + 1 − r]/(r + ε0). When r ≤ k, we can discard k + 1 − r from the numerator. Dividing top and bottom by r and applying 1/(1 + ε0/r) > 1 − ε0/r then yields

L > (1 − 1/r)(1 − ε0/r)N > (1 − (1 + ε0)/r)N,

so we will choose r and ε0 so that (1 + ε0)/r < ((1 + ε) ln k)/k.

We define operations Oi and Pi for each i, applied only when the largest external degree is i. Operation Oi is one expansion at a leaf with external degree i. Operation Pi consists of expansion at a leaf with external degree i plus expansion at one of those new vertices (see figure below). When the maximum external degree is i, we perform Pi at a vertex x with d(x) = i if the number of vertices introduced by the second expansion is at least 2r + εi − i. If no such choice is available, then we perform Oi. This procedure always chooses some operation, and we grow a spanning tree. The operations will be admissible because (1) Pi provides enough leaves, and (2) when Pi is not available, Oi increases deadness by enough.
When the second expansion in Pi introduces s leaves, Δn = s + i and Δl = s + i − 2. We need r(s + i − 2) + ΔM ≥ (r − 1)(s + i). Always ΔM ≥ −εi, since x is no longer a leaf. Hence s + i − 2r − εi ≥ 0 suffices, and this holds since Pi is used only when s ≥ 2r + εi − i.

Now consider Oi. We need r(i − 1) + ΔM ≥ (r − 1)i, or ΔM ≥ r − i. We ignore contributions to ΔM from new leaves, since we have no control over their external degree. For each edge joining a new vertex y to a current leaf z other than x, expansion at x reduces d(z) from j to j − 1 for some j, and we gain εj−1 − εj. Since i is the maximum external degree, j ≤ i.
Let cj = εj−1 − εj. If c1 ≥ · · · ≥ cr ≥ 0, then each edge back to the current tree increases deadness by at least ci. With c1, . . . , cr chosen this way, εi = Σ_{j=i+1}^r cj. Now ΔM ≥ q·ci − εi, where q counts the edges from the new leaves to old leaves other than x. Admissibility of Oi follows if q is large enough to make q·ci − εi ≥ r − i.
When Pi is unavailable, each new leaf has fewer than 2r + εi − i neighbors outside V(T) ∪ NG(x). It also has at most i neighbors in NG[x] − V(T). Hence it has more than k − 2r − εi neighbors other than x in T, so q > i(k − 2r − εi). Admissibility of Oi follows if ci·i(k − 2r − εi) ≥ r − i + εi.
We specify r and nonincreasing c1, . . . , cr to satisfy this inequality for all i. Set ci = b/i for 1 ≤ i ≤ r (we will choose b in terms of ε). We then want b(k − 2r) ≥ r − i + (1 + b)εi. Since εi ≥ εi+1, the right side is largest when i = 1, so it suffices to establish the inequality when i = 1, where it simplifies to ε1 ≤ (bk + 1 − (2b + 1)r)/(1 + b). Our choice of ci yields ε0 = b·Σ_{i=1}^r 1/i ≤ b[ln r + 1/(2r) + .577] (see Knuth [1973, p. 73–78] for the bound on the harmonic number Σ_{i=1}^r 1/i).


Since ε1 = ε0 − b, when r ≥ 2 we have

ε1 ≤ b[ln r + 1/(2r) − .423] < b ln r < b ln k.

Therefore, b ln k ≤ (bk + 1 − (1 + 2b)r)/(1 + b) suffices, so we set r = ⌊(b/(1 + 2b))(k − (1 + b) ln k)⌋. Now the augmentation inequality holds for Oi.
With these choices, we have proved that l(n, k) ≥ (1 − (1 + ε0)/r)n. Since (1 + ε0)/r < (1 + 2b)·((1 + b)/b + ln k)/(k − (1 + b) ln k) = ((1 + 2b) ln k/k)(1 + O(1/ln k)), choosing b < ε/3 yields the desired lower bound when k is sufficiently large.

Note that for small ε, the proof of Theorem 1.1.16 sets r asymptotically to about εk, but it still requires r ≥ 2, so roughly k > 2/ε is needed. Caro–West–Yuster [2000] also gave a probabilistic algorithm for connected domination that does as well as Theorem 1.1.16. For each fixed k with k ≥ 6, the exact value of l(n, k) in terms of n remains unknown.

OPTIMAL SEARCH TREES AND CODING


Trees are used in computer science to model hierarchical structures.
1.1.17. DEFINITION. A rooted graph is a graph with one vertex distinguished as a root. In a rooted tree, the neighbor of a vertex on the
path from it to the root is its parent, and the other neighbors are its
children. An ordered tree is a rooted tree in which the children of
each vertex are given a fixed (left-to-right) order. A binary tree is
an ordered tree in which every vertex has zero or two children.
The root in a rooted tree has no parent. Ordered trees are also called
rooted plane trees or planted trees, since the ordering of children
yields a natural drawing in the plane. In a binary tree, the left subtree and right subtree are the subgraphs obtained by deleting the root
r; they are rooted at the left and right children of r, respectively. Some
applications of binary trees allow vertices to have one child, still designated as left or right. In most discussions of k-ary trees, each vertex has
0 or k children. Vertices in rooted trees are often called nodes.
1.1.18. Example. Below are the five binary trees with four leaves.


Binary trees support data storage for efficient access. If we associate each item with a leaf, then we can access them by a search from the root
that always says which subtree at the current node contains the desired
leaf. Given access probabilities among n items, we want to associate the
items with the n leaves of a binary tree to minimize the expected search
length. The length of a search is the distance from the root to the leaf.
Alternatively, with large computer files and limited storage, we want
binary codes for characters to minimize total length. The relative character frequencies define probabilities. Treating the items as messages
with probabilities p1 , . . . , pn , we want to assign binary codewords to minimize the expected message length. The problems of minimizing expected
search length and expected message length are equivalent.
The length of codewords may vary, so a way to recognize the end of
a codeword is needed. If no codeword is a prefix of another, then the current word ends when the bits since the end of the previous word form a
codeword. This prefix-free condition allows the codewords to correspond
to the leaves of a binary tree by associating left with 0 and right with 1.
The expected length of a message is Σ pi·li, where the ith word has probability pi and code-length li. Constructing the optimal code is surprisingly easy (n = 1 can also be used as the basis).
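The prefix-free condition is what makes decoding a simple walk down the tree. A minimal Python sketch follows (the node format is our own assumption): starting from the root, follow 0 left and 1 right; reaching a leaf ends a word.

    def decode(bits, root):
        # Nodes are ('leaf', symbol) or ('node', left, right); 0 = left.
        out, node = [], root
        for b in bits:
            node = node[1] if b == '0' else node[2]
            if node[0] == 'leaf':
                out.append(node[1])
                node = root                   # word ends at a leaf
        return out

    tree = ('node', ('leaf', 'a'), ('node', ('leaf', 'b'), ('leaf', 'c')))
    assert decode('010011', tree) == list('abac')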
1.1.19. ALGORITHM. (Huffman's Algorithm [1952]; Prefix-free Coding).
Input: Weights (frequencies or probabilities) p1, . . . , pn.
Output: Prefix-free code (equivalently, a binary tree).
Idea: Infrequent messages should have longer codes; put infrequent messages deeper by combining them into parent nodes.
Initial case: If n = 2, the optimal length is one, and 0, 1 are the codes assigned to the two messages (the tree consists of a root and two leaves).
Recursion: If n > 2, replace the two least likely items p and p′ with a single item q having weight p + p′. Solve the smaller problem with n − 1 items. Give children with weights p and p′ to the leaf for q. That is, replace the codeword computed for q with its extensions by 1 and 0, assigned to the items that were replaced.
1.1.20. Example. Huffman coding. Suppose the relative frequencies of 8 messages are 5, 1, 1, 7, 8, 2, 3, 6. The algorithm iteratively combines lightest items to form the tree on the left below, working from the bottom up. The tree is redrawn on the right with leaves labeled by frequencies and codewords. Placed in the original order, the codewords are 100, 00000, 00001, 01, 11, 0001, 001, and 101. For the expected length, we compute Σ pi·li = 90/33 < 3; the expected length of a code using the eight words of length 3 would be 3.
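Huffman's Algorithm is usually implemented bottom-up with a priority queue, which is equivalent to the recursion above. The Python sketch below (tie-breaking and node format are our own choices) reproduces the total 90 of Example 1.1.20.

    import heapq

    def huffman(freqs):
        heap = [(p, i, ('leaf', i)) for i, p in enumerate(freqs)]
        heapq.heapify(heap)                   # index i breaks weight ties
        while len(heap) > 1:
            p1, i1, t1 = heapq.heappop(heap)  # two least likely items
            p2, i2, t2 = heapq.heappop(heap)
            heapq.heappush(heap, (p1 + p2, min(i1, i2), ('node', t1, t2)))
        codes = [None] * len(freqs)
        def walk(t, word):
            if t[0] == 'leaf':
                codes[t[1]] = word
            else:
                walk(t[1], word + '0')
                walk(t[2], word + '1')
        walk(heap[0][2], '')
        return codes

    freqs = [5, 1, 1, 7, 8, 2, 3, 6]          # Example 1.1.20
    codes = huffman(freqs)
    assert sum(f * len(c) for f, c in zip(freqs, codes)) == 90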

1.1.21. THEOREM. For a distribution p1, . . . , pn, Huffman's Algorithm produces the prefix-free code with minimum expected length.

Proof: We use induction on n. For n = 2, the algorithm produces the only binary tree. Consider n > 2. Given a tree with n leaves, greedily assigning messages to leaves with depths in reverse order to probabilities minimizes the expected length. Thus every optimal code has two least likely messages at leaves of greatest depth. Since every leaf at maximum depth has another leaf as its sibling, we may thus assume that the least likely messages appear as siblings at greatest depth; permuting items at a given depth does not change the expected length.
Let T be an optimal tree, having the items with least probabilities pn and pn−1 as sibling leaves of greatest depth. Let T′ be the tree obtained from T by deleting these leaves. The tree T′ yields a code for q1, . . . , qn−1, where qn−1 = pn−1 + pn and otherwise qi = pi. Let k be the depth of the leaf for qn−1 in T′. The cost for T is the cost for T′ plus qn−1, since we lose k·qn−1 and gain (k + 1)(pn−1 + pn) in changing T′ to T.
This holds no matter which sibling pair at greatest depth we combine to form T′, so we optimize T by optimizing T′ for q1, . . . , qn−1. By the induction hypothesis, T′ is optimized by applying Huffman's algorithm to {qi}. Since the replacement of {pn−1, pn} by qn−1 is the first step of Huffman's algorithm for {pi}, we conclude that Huffman's algorithm generates the optimal tree for p1, . . . , pn.


Huffman's algorithm computes an optimal code, but how does it compare to a balanced tree with every codeword having length ⌊lg n⌋ or ⌈lg n⌉? If n = 2^k and the words are equally likely, then the balanced tree with all leaves at depth k is optimal, as produced by the algorithm. With pi = 1/n, the resulting expected length is Σ k·pi = −Σ pi lg pi; the latter quantity is called the entropy of the discrete probability distribution {pi}. The formula is no coincidence; entropy is a lower bound on the expected length. This holds for all codes with binary codewords, not just prefix-free codes.
1.1.22. THEOREM. (Shannon) For every probability distribution {pi} on n messages and every binary code for these messages, the expected length of codewords is at least −Σ pi lg pi.

Proof: We use induction on n. For n = 1 = p1, the entropy is zero, as is the expected length for the optimal code, since there is no need to use any digits. For n > 1, let W be the set of words in an optimal code, with
W0 and W1 being the subsets starting with 0 and 1, respectively. If all
words start with the same bit, then deleting the first bit of each reduces
the expected length, and the code is not optimal.
Hence W0 and W1 are codes for smaller sets. Let qj be the sum of
the probabilities for the messages in Wj ; normalizing the given probabilities by qj gives the probability distributions for code Wj . Since the words
within Wj all start with the same bit, the expected length is at least 1
more than the optimal expected length for the normalized distribution
over its words.
Applying the induction hypothesis to both W0 and W1, we find that the expected length for W is at least

q0[1 − Σ_{i∈W0} (pi/q0) lg(pi/q0)] + q1[1 − Σ_{j∈W1} (pj/q1) lg(pj/q1)]
  = 1 − Σ_{i∈W0} pi(lg pi − lg q0) − Σ_{j∈W1} pj(lg pj − lg q1)
  = 1 + q0 lg q0 + q1 lg q1 − Σ_{i∈W} pi lg pi.

It suffices to prove that 1 + q0 lg q0 + q1 lg q1 ≥ 0 when q0 + q1 = 1. Let f(x) = x lg x. The function f is convex for x > 0 (since f″ is positive), so 1 + f(x) + f(1 − x) ≥ 1 + 2f(.5) = 0.
Huffman's algorithm comes close to Shannon's bound. If each pi is a power of 1/2, then the Huffman code achieves the entropy bound (Exercise 36).
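For the distribution of Example 1.1.20, a few lines of Python confirm the chain entropy ≤ Huffman ≤ fixed-length (values are approximate):

    from math import log2

    freqs = [5, 1, 1, 7, 8, 2, 3, 6]
    probs = [f / sum(freqs) for f in freqs]
    entropy = -sum(p * log2(p) for p in probs)
    print(entropy, 90 / 33, 3)    # about 2.70 <= about 2.73 <= 3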

When the probabilities vary greatly, Huffman coding is much more efficient than codes with words of equal length. The length of a computer file coded for compactness may be only half its length under ASCII coding, which assigns 8 digits per character. Coding individual files accentuates this; the distribution of characters in a program source file may be much different from that of a document or another program.
However, we may want the codewords in the same order as the message words. This makes searching easy, because we can store at each internal vertex a word between the largest message word at a leaf of the left
subtree and the smallest message word at a leaf of the right subtree. The
expected length will be longer than in Huffman coding, but these alphabetic prefix-free codes can be easier to use while almost as efficient.
Since the items must appear at leaves in left-to-right order, the leaves in any subtree get one of the (n choose 2) sets of consecutive messages. The final merge to complete an optimal alphabetic tree must combine an optimal tree for the first k leaves and an optimal tree for the last n − k leaves, for some k. To choose the best k, we solve the subproblems for all consecutive segments. This algorithmic technique of solving all subproblems is called dynamic programming.
1.1.23. ALGORITHM. (Optimal Alphabetic Trees).
Input: Frequencies p1 , . . . , pn and fixed left-to-right ordering of leaves.
Initialization: Set c(S) = 0 for each singleton leaf set S.
Iteration: For i from 2 through n, compute a cost c(S) for each segment S of i consecutive nodes. With Sk being the first k of these and S′k = S − Sk, the cost is c(S) = (Σ_{j∈S} pj) + min_k [c(Sk) + c(S′k)].
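A direct transcription of the dynamic program (Python sketch; the indexing is our own) computes the optimal cost for the whole list in O(n³) time:

    def alphabetic_cost(p):
        # c[i][j]: optimal cost c(S) for the consecutive segment p[i..j].
        n = len(p)
        pre = [0]
        for x in p:
            pre.append(pre[-1] + x)           # prefix sums of weights
        c = [[0] * n for _ in range(n)]
        for length in range(2, n + 1):
            for i in range(n - length + 1):
                j = i + length - 1
                c[i][j] = (pre[j + 1] - pre[i]) + \
                          min(c[i][k] + c[k + 1][j] for k in range(i, j))
        return c[0][n - 1]

    # alphabetic_cost([3, 2, 2, 3, 6, 2, 3, 2]) == 68; compare Example 1.1.26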
1.1.24. THEOREM. Algorithm 1.1.23 computes optimal alphabetic trees in time O(n³).


Proof: When two adjoining segments are merged, the search path to each leaf lengthens by 1. This explains the combining cost Σ_{j∈S} pj in the algorithm. Since an optimal tree must merge optimal subtrees, induction on i proves that the algorithm finds an optimal tree.
A separate dynamic program computes the (n choose 2) combining costs Σ_{j∈S} pj in advance, in increasing order of |S|, using constant time per computation. The algorithm then computes a potential combination for each choice of two adjoining segments, in increasing order of the size of the union. Such a pair is specified by the start and end of the first segment (possibly equal) and the end of the second segment. The algorithm performs (n choose 3) + (n choose 2) constant-time computations to find all values c(S), always keeping the best value found among the |S| − 1 candidates for c(S).


Knuth [1971] showed how to manage the computation more cleverly to do it in quadratic time, even for a more general problem. Yao [19??] later found further refinements. Nagaraj [1997] is a tutorial on optimal binary search trees with a good bibliography.
Hu and Tucker developed a faster algorithm for optimal alphabetic
codes. It computes a not-necessarily-alphabetic tree, discards that tree
while keeping its depth information for leaves, and then finds an alphabetic tree with leaves at those depths.
1.1.25. ALGORITHM. (Hu–Tucker Algorithm).
Input: Frequencies and fixed left-to-right ordering of leaves.
Step 1. Iteratively merge two compatible items with least total weight, where items are compatible if all items between them have already participated in merges. Replace the merged items by a parent with their total weight, placed between their former positions in the list.
Step 2. The output of Step 1 is a binary tree. Compute the depth of each original item (the number of merges it participated in).
Step 3. Construct an alphabetic tree with the leaves at these depths by iteratively pairing leftmost adjacent items with the largest depths, replacing them with one item having the next smaller depth.
1.1.26. Example. Consider input frequencies 3, 2, 2, 3, 6, 2, 3, 2 in order. In Step 1 of the Hu–Tucker Algorithm, we first combine the leftmost 2s, but the other 2s are not compatible. We combine one of these with the 3 between them and then combine two 3s across the initially combined item. Proceeding yields the tree on the left below.
This tree is not alphabetic. The depths computed in Step 2 are 3, 3, 3, 3, 2, 4, 4, 3 in order. To form the corresponding alphabetic tree, combine the 4s, then three pairs of neighboring 3s, etc.
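Steps 1 and 2 can be sketched naively as below (Python; the quadratic pair scan and tie-breaking are our own simplifications, far from the fast implementation mentioned next). On the input of Example 1.1.26 it reproduces the depth list 3, 3, 3, 3, 2, 4, 4, 3.

    def hu_tucker_depths(freq):
        # items: [weight, crossable?, original leaf indices underneath]
        items = [[w, False, {i}] for i, w in enumerate(freq)]
        depth = [0] * len(freq)
        while len(items) > 1:
            best = None
            for i in range(len(items)):
                for j in range(i + 1, len(items)):
                    # compatible: everything strictly between is crossable
                    if all(items[t][1] for t in range(i + 1, j)):
                        s = items[i][0] + items[j][0]
                        if best is None or s < best[0]:
                            best = (s, i, j)      # leftmost on ties
            s, i, j = best
            leaves = items[i][2] | items[j][2]
            for t in leaves:                      # each leaf below gains depth
                depth[t] += 1
            del items[j]
            items[i] = [s, True, leaves]          # parent replaces left child
        return depth

    assert hu_tucker_depths([3, 2, 2, 3, 6, 2, 3, 2]) == [3, 3, 3, 3, 2, 4, 4, 3]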

Step 1 maintains a shrinking list of the current weights, with the crossable ones (results of merges) marked. The pairs that later are compatible are not affected by where between its two inputs the output of a merge is placed. Hence the location also does not affect the resulting depths. If the new item replaces its left child in the list, and the algorithm breaks ties by merging the lexicographically leftmost least-weight compatible pair, then the merged pair always consists of two consecutive items. This produces a fast implementation (Garsia–Wachs [1977]).
The proof of correctness is surprisingly hard (it may be skipped without loss of continuity). Hu [1973] shortened the original Hu–Tucker [1971] proof. We follow the still shorter proof in Hu–Kleitman–Tamaki [1979], which permits a more general optimization criterion.
The proof has two main steps. Feasibility: the depths resulting from Step 2 are realizable by an alphabetic tree. Optimality: the tree resulting from Step 1 has minimum cost among a class of trees that includes all alphabetic trees. The final alphabetic tree is then also optimal, because Σ pi·li does not change when we rearrange the tree to make it alphabetic without changing the depths of the leaves.
The proof of feasibility requires technical lemmas about the weights
of nodes and their order of formation. As Step 1 proceeds, the items in a
segment bounded by noncrossable nodes (including the boundary nodes)
are pairwise compatible. When a noncrossable item is merged, the two
compatibility sets involving it combine. Therefore, if u and v are compatible and both are crossable, then they are compatible with the same
set of nodes as long as they both exist.
Let T be the tree produced by Step 1 of the algorithm. We write w(u) for the weight of a node u in T, with w(u) ≼ w(v) meaning that w(u) < w(v) or that w(u) = w(v) with u to the left of v in the list. Now no two nodes or pairs of nodes are considered equal in weight. A compatible pair (u, v) in T is a locally minimal compatible pair (LMCP) if w(v) ≼ w(x) for all x compatible with u and w(u) ≼ w(y) for all y compatible with v. The algorithm chooses an LMCP. Also, each node belongs to at most one LMCP, so the current LMCPs are pairwise disjoint.
1.1.27.* LEMMA. Merging an LMCP {x, y} cannot decrease the weight of the lightest node compatible with another node v. In particular, merging an LMCP preserves other LMCPs.
Proof: Let a be the weight of the lightest node compatible with v before merging x and y. By symmetry, we may assume that y is not between v and x. If v becomes compatible with a node of weight less than a, then the merge must make some node compatible with v that was not before. Hence x or y is noncrossable, and there is no noncrossable node between v and x. This requires v and x to be compatible, so a ≼ w(x).
Suppose u becomes newly compatible with v when x and y merge. If u is compatible with y before the merge, then local minimality implies w(x) ≼ w(u), so a ≼ w(u). If u is not compatible with y before the merge, then x is noncrossable, x is between u and v, and v is between x and y. Now v and y are compatible, so a ≼ w(y). Since u is compatible with x in this case, local minimality implies w(y) ≼ w(u), and again a ≼ w(u).
1.1.28.* LEMMA. Given a list of crossable and noncrossable nodes with weights, every sequence of LMCP merges produces the same tree.
Proof: It suffices to prove that the choice of the first merge does not affect the final tree. We use induction; with at most three items, there
can be only one LMCP to choose. Consider a longer list with {u , v} and
{x , y} as distinct LMCPs available to be merged. These pairs are disjoint, and by Lemma 1.1.27 each remains an LMCP if we merge the other.
The pairs that become LMCPs after merging {u , v} or {x , y} may differ.
Nevertheless, the induction hypothesis guarantees that the choice of the
next LMCP after {u , v} does not affect the tree produced by starting with
{u , v}. In particular, the next merge could be {x , y}. Similarly, starting with {x , y} always leads to the same tree, and one way to proceed is
to merge {u , v} next. Since merging these two pairs in the two possible
orders produces the same list, the tree at the end is the same no matter
which is merged first.
The next lemma aids the proof of feasibility. Its awkward hypotheses about relative position are needed. For example, if y is noncrossable,
then merging x and y to form z can make z compatible with a node of
weight less than w(y) on the side of y away from x.
1.1.29.* LEMMA. If x and y merge in some list of LMCP merges, then no node with weight less than w(y) on the x side of y can subsequently become compatible with an ancestor z of {x, y} until two nodes are merged that have the original positions of x and y between them.
Proof: A node u on the x side of y can become compatible with z only when a noncrossable node v that is beyond x on the x side of y is merged, at a time when u is compatible with v. If v merges with z, then local minimality implies w(z) ≼ w(u), but also w(y) ≼ w(z) since z is an ancestor of y. If v merges with some other node on the x side of y, then local minimality implies w(v) ≼ w(u). We have also w(y) ≼ w(v) unless v earlier became compatible with an ancestor of x and y after their merge. Hence there is no first time the undesired event can happen.
We say that v crosses over u if u is crossable, exists when v is formed, and has a descendant that lies between two descendants of v at some time in the node list. If v crosses over u, then they become compatible when v is formed. We write u ∼ v to mean that u and v are compatible.


1.1.30.* LEMMA. If v crosses over u in forming T, then w(u) ≼ w(v).

Proof: Let a and d be the children of v, and let b and c be the children of u, listed in each case from left to right. The possible orderings of these in the node list are (a, b, c, d), (b, a, c, d), and (a, b, d, c). If a ∼ b and c ∼ d when u is formed, then local minimality in the formation of u implies w(c) ≼ w(a) and w(b) ≼ w(d), which in turn implies w(u) ≼ w(v).
Otherwise, we apply Lemma 1.1.29 with {x, y} = {b, c} and use induction on the number of merges that cross over u before the formation of v. If v is the first to cross over u, then Lemma 1.1.29 again implies w(c) ≼ w(a) and w(b) ≼ w(d) (the ordering (a, b, c, d) requires two invocations of Lemma 1.1.29; the other two orderings use Lemma 1.1.29 once and the local minimality of {b, c} once). For the induction step, let t be the last vertex to cross over u before v. By the induction hypothesis, w(u) ≼ w(t). By arguments like those made already for u and v, we have w(t) ≼ w(v).
After v is formed, the set of nodes compatible with v is the same as the set compatible with u as long as both exist. Since w(u) ≼ w(v), this implies that u merges before v.
1.1.31.* LEMMA. Let u and v be compatible crossable nodes with parents u′ and v′ in T, respectively. If w(u) ≼ w(v), then w(u′) ≼ w(v′).

Proof: Suppose that u merges with x to form u′, and v merges with y to form v′. Since u ∼ v, the set of nodes compatible with v is the same as the set compatible with u as long as both exist. Since w(u) ≼ w(v), this implies that u merges before v. Since v is compatible with x when u′ is formed, w(x) ≼ w(v). Since by Lemma 1.1.27 the weight of the lightest node compatible with v cannot decrease during its existence, w(u) ≼ w(y). Together, these inequalities imply w(u′) ≼ w(v′).
In a tree, let l(u) be the depth of a node u (its distance from the root).
The depth list of a tree is the list of depths of its leaves, in order.
1.1.32.* THEOREM. The depth list resulting from Step 2 of the Hu–Tucker algorithm is realizable by an alphabetic tree.
Proof: A nonnegative integer list is the depth list of an alphabetic tree if
and only if the highest numbers occur in even-length blocks and the list
obtained by replacing these in pairs with the next lower number is realizable. This holds because in every alphabetic tree the leaves of greatest
depth occur in consecutive pairs of siblings.


If a depth list is not realizable, then the bottom-up process of converting it to an alphabetic tree fails at some point where k is the largest entry and some maximal segment of entries equal to k has odd length. Since each depth except that of the root has an even number of nodes, the tree produced by Step 1 has a merge between an element a of this segment and an element d of another segment of nodes at depth k, where between a and d there is an element b with a smaller depth. Before a and d can be merged to form v, the leaf b must become crossable by merging into a parent node u. Hence v crosses over u. By Lemma 1.1.30, this implies that u and v are compatible crossable nodes with w(u) ≼ w(v).
It thus suffices to show that if u and v are compatible crossable nodes with w(u) ≼ w(v), then l(u) ≥ l(v). Since w(u) ≼ w(v), it follows that v is not a descendant of u. We use induction on the distance from v to the closest common ancestor of u and v to prove the claim. If this distance is 0, then u is a descendant of v, and the statement is immediate. Otherwise, let u′ and v′ be the parents of u and v, respectively. By Lemma 1.1.31, w(u′) ≼ w(v′); also u′ and v′ are compatible crossable nodes. By the induction hypothesis, l(u′) ≥ l(v′), which yields l(u) ≥ l(v).
Given an input w (a list of weights, each designated as crossable or
noncrossable), let C be the class of trees formed by merges that can be ordered so that every merge is a compatible pair (in the current order) and
the list of depths of the nodes formed, in the order of formation, is nonincreasing. The optimal Huffman (non-alphabetic) trees for the frequency
lists 1,2,1 and 1,2,2,2,1 do not have such an ordering, since they require
the first merge to be a noncompatible pair. Since alphabetic trees merge
only adjacent nodes in the current list, every alphabetic tree with input
w belongs to C. Therefore, it suffices to show that the tree T produced
by Step 1 belongs to C and achieves minimum cost over the trees in C.
1.1.33.* THEOREM. The Hu–Tucker Algorithm produces an optimal alphabetic binary tree.

Proof: By Lemma 1.1.28, every algorithm that always merges LMCPs produces T. Hence it suffices to show that for every input w, there is a tree of minimum cost in C having a merge order in which the first (deepest) merge is an LMCP. This implies simultaneously that T ∈ C and that T has minimum cost in C.
We use induction on the length n of the input; the claim holds trivially for n = 2. Choose an optimal tree T* in C and a bottom-up merge ordering of T* that minimize the sum of weights of the first merged pair. Let this pair be {u, v}, with parent v′; it suffices to show that {u, v} is an LMCP. If not, then we may assume by symmetry that w(u) > w(x) for some node x compatible with v; let x have the least such weight (leftmost if there is a tie).
By the induction hypothesis, subsequent merges are LMCPs. First suppose that before x merges, v′ merges with some node y. Since v ∼ x, also x is compatible with v′ and with every node compatible with v′ (regardless of whether x is crossable). In particular, x ∼ y, but w(v′) ≥ w(u) > w(x), which contradicts the choice of {v′, y} as an LMCP.
Next suppose that x eventually merges with v′. We wish to replace the first merge {u, v} with {v, x} (forming x′) and replace the {v′, x} merge with {u, x′}; this would yield the same list of depths of merges and a cheaper tree, since w(u) > w(x). Since v ∼ x, this list will also be a list of compatible merges unless u is noncrossable and some intervening merge in forming T* crosses the position of u. We must eliminate this possibility.
We consider this together with the remaining case, which is when x merges with some y other than v′ before v′ merges. Here replacing the initial {u, v} merge by {v, x} and the later {x, y} merge by {u, y} would again yield the same list of depths of merges, and it would produce either a cheaper tree or one with the same cost but a cheaper first merge. Again these pairs are compatible at the time of merge, and the new list will be a list of compatible merges unless u is initially noncrossable and some intervening merge crosses the position of u.
If u is noncrossable, then the hypothesis v ∼ x implies that v and x are on the same side of u. Suppose {r, s} is the first merge crossing the position of u, with s on the same side of u as v and x. If s ∼ v initially, then the choice of x implies w(x) ≼ w(s), and thus {r, s} is not an LMCP when it merges. We prove that w(x) ≼ w(z) when z is a node on the v side of u that is not initially compatible with v but becomes compatible with v before x merges. In particular, s has this property and could not form an LMCP with r.
We use induction on the number of merges until z becomes compatible with v. We have already argued that w(x) ≼ w(z) if no such merges occur. Otherwise, z becomes compatible with v via some merge {a, b} in which b ∼ z. If a ∼ v initially, then the choice of x implies that w(x) ≼ w(a). If instead a became compatible with v before the {a, b} merge, then the induction hypothesis yields w(x) ≼ w(a) again. In either case, {a, b} being an LMCP implies that w(a) ≼ w(z).
We have proved that there is no first merge crossing over a noncrossable u before x merges. Thus the choice of the initial cheapest deepest
compatible merge must be an LMCP, which completes the proof.



EXERCISES
1.1.1. (−) Prove that a tree with maximum degree k has at least k leaves.

1.1.2. (−) Prove that every graph with n vertices and k edges has at least n − k components.

1.1.3. (−) Prove C ⇒ {A, B} in Theorem 1.1.3 by adding edges to connect components.


1.1.4. (−) Characterization of trees.
a) Prove that a multigraph is a tree if and only if it is connected and every edge is a cut-edge.
b) Prove that a multigraph is a tree if and only if every way of adding an edge (without adding a vertex) creates exactly one cycle.
c) Explain why (b) fails if the condition applies only to nonadjacent pairs.


1.1.5. (−) Prove that a connected n-vertex graph has exactly one cycle if and only if it has exactly n edges.


1.1.6. (−) Every tree is bipartite. Prove that every tree has a leaf in its larger partite set (in both if they have equal size).


1.1.7. (−) Let T be a tree in which every vertex has degree 1 or degree k. Determine the possible values for |V(T)|.

1.1.8. (−) Let T be a tree in which all vertices adjacent to leaves have degree at least 3. Prove that T has some pair of leaves with a common neighbor.

1.1.9. (−) Let G be a tree. Prove that there is a partition of V(G) into two nonempty sets such that each vertex has at least half of its neighbors in its own set in the partition if and only if G is not a star.
1.1.10. (−) There are five cities in a network. The cost of building a road directly between i and j is the entry ai,j in the matrix below. Note that ai,j = ∞ indicates that there is a mountain in the way and the road cannot be built. Determine the least cost of making all the cities reachable from each other.

     0   3   5  11   9
     3   0   3   9   8
     5   3   0   ∞  10
    11   9   ∞   0   7
     9   8  10   7   0
1.1.11. (−) In the graph K1 ∨ C4, assign the weights 1, 1, 2, 2, 3, 3, 4, 4 to the edges in two ways: one way so that the minimum-weight spanning tree is unique, and another way so that the minimum-weight spanning tree is not unique.
1.1.12. (−) Compute a code with minimum expected length for a set of ten messages whose relative frequencies are 1, 2, 3, 4, 5, 5, 6, 7, 8, 9. What is the expected length of a message in this optimal code?

1.1.13. Prove that the block-cutpoint graph (Definition 1.1.10) of a connected graph G is a tree in which all leaves correspond to blocks of G.

1.1.14. Prove that a connected graph having exactly two vertices that are not cut-vertices is a path.

1.1.15. Let T be a tree with k vertices. Let G be a graph that does not contain K3 or K2,t. Prove that if δ(G) > (k − 2)(t − 1), then T occurs as an induced subgraph of G. (Zaker [2011])

1.1.16. (−) Let G be an n-vertex graph with n ≥ 4. Prove that if G has at least 2n − 3 edges, then G has two cycles of the same length. (Comment: Chen–Jacobson–Lehel–Shreve [1999] strengthens this.)

1.1.17. (−) Prove that every n-vertex graph with n + 1 edges has a cycle of length at most ⌈(2n + 2)/3⌉. For each n, construct an example achieving this bound.

1.1.18. (−) Prove that every n-vertex graph with n + 2 edges has a cycle of length at most ⌈(n + 2)/2⌉. For each n, construct an example with no shorter cycle.

1.1.19. (−) Let T be a tree with k edges, and let G be an n-vertex graph with more than n(k − 1) − (k choose 2) edges. Use Proposition 1.1.4 to prove that T ⊆ G if n > k.

1.1.20. Give proof or infinitely many counterexamples for these statements:


a) If T is a minimum-weight spanning tree of a weighted graph G, then the u, v-path in T is a minimum-weight u, v-path in G.
b) One can produce a minimum weighted spanning path in a complete graph
with nonnegative edge weights by iteratively selecting the edge of least weight so
that the edges selected so far form a forest with maximum degree 2.
1.1.21. Suppose that in the hypercube Qk , each edge whose endpoints differ in
coordinate i is given weight 2i. Compute the minimum weight of a spanning tree.
1.1.22. Let G be a weighted graph with distinct edge weights. Without using Kruskal's algorithm, prove that G has only one minimum-weight spanning tree.
1.1.23. Let G be a weighted connected graph. Prove that no matter how ties are broken in choosing the next edge for Kruskal's Algorithm, the list of weights of a minimum spanning tree (in nondecreasing order) is unique.
1.1.24. Let F be a spanning forest of a connected weighted graph G. Among all
the edges of G having endpoints in different components of F , let e be one of minimum weight. Prove that among all the spanning trees of G that contain F , there
is one of minimum weight that contains e. Use this to give another proof that
Kruskals algorithm works.
1.1.25. (−) Prim's Algorithm grows a spanning tree from an arbitrary vertex of a weighted graph G, iteratively adding the cheapest edge between a vertex already absorbed and a vertex not yet absorbed, finishing when the other n − 1 vertices of G have been absorbed. (Ties are broken arbitrarily.) Prove that Prim's Algorithm produces a minimum-weight spanning tree of G. (Jarník [1930], Prim [1957], Dijkstra [1959], independently)
1.1.26. Let v be a vertex in a connected graph G. Obtain an algorithm for finding, among all minimum spanning trees of G, one that minimizes the degree of
v. Prove that it works.
1.1.27. (−) A minimax or bottleneck spanning tree is a spanning tree in which the maximum weight of the edges is as small as possible. Prove that every minimum-weight spanning tree is a bottleneck spanning tree.
1.1.28. Let T be a minimum-weight spanning tree in a weighted connected graph
G. Prove that T omits some heaviest edge from every cycle in G.
1.1.29. Given a connected weighted graph, iteratively delete a heaviest non-cut-edge until the resulting graph is acyclic. Prove that the subgraph remaining is a minimum-weight spanning tree.
1.1.30. (−) Let T be a minimum-weight spanning tree in G, and let T′ be another spanning tree in G. Prove that T′ can be transformed into T using steps that exchange one edge of T′ for one edge of T, such that the edge set is always a spanning tree and the total weight never increases.
1.1.31. Form a graph G by replacing an edge of Kt+1 with a path of length n − t connecting its endpoints, through n − t − 1 new vertices. Prove that G has n vertices and n + (t choose 2) − 1 edges but has no spanning tree with more than t leaves. (Comment: Ding–Johnson–Seymour [2001] showed that every n-vertex graph with at least n + (t choose 2) edges has a spanning tree with more than t leaves.)
1.1.32. (−) Upper bounds on l(n, k).
a) Form an n-vertex graph G from a cyclic arrangement of cliques of sizes ⌈k/2⌉, ⌊k/2⌋, 1, . . . , ⌈k/2⌉, ⌊k/2⌋, 1 (in order) by letting every vertex be adjacent to all vertices in the clique before it and the clique after it (G is k-regular). In terms of k and n, determine the maximum number of leaves in a spanning tree of G.
b) For k even, place n vertices on a circle and let each vertex be adjacent to the k/2 closest vertices in each direction. For 3k/2 + 2 ≤ n < 5(k + 1)/3, prove that this graph has no spanning tree with at least (k − 2)n/(k + 1) + 2 leaves.

1.1.33. Let l(G) be the maximum number of leaves in a spanning tree of G, and let f(G) = Σ_{v∈V(G)} (d(v) − 2)/(d(v) + 1). Linial conjectured that l(G) ≥ f(G) for all G; this is false. Prove that f(G) − l(G) can be arbitrarily large by considering the graph Gm with 10m vertices formed by adding a matching joining K5m and mC5.
1.1.34. (−) Let G be an n-vertex graph other than K3 in which every edge belongs to a triangle. Prove that G has a spanning tree with at least (n + 5)/3 leaves and that this is sharp for an infinite family of graphs. (Hint: Grow a tree by operations satisfying an appropriate augmentation inequality. To get the constant right, an extra dead leaf may need to be guaranteed at the beginning.)
1.1.35. Prove that the number of binary trees with n + 1 leaves equals the number of ordered trees with n + 1 vertices (Example 1.1.18 shows the five binary trees
with four leaves). (Hint: Show that the two families satisfy the same recurrence.)


1.1.36. (−) Suppose that n messages occur with probabilities p1, . . . , pn and that each pi is a power of 1/2 (each pi > 0 and Σ pi = 1).
a) Prove that the two least likely messages have equal probability.
b) Prove that the expected message length of the Huffman code for this distribution is −Σ pi lg pi.
1.1.37. Given frequencies p1, . . . , pn, Huffman's algorithm finds the prefix-free encoding that minimizes Σ pi·li, where li is the length of the word assigned to item i (equivalently, the length from the root to the ith leaf in the corresponding binary tree). Consider instead the objective function max_i{pi·t^li}, where t is a real number in the interval [1, 2]. (Hu–Kleitman–Tamaki)
a) An analogue of Huffman's algorithm builds a binary tree from the bottom up by iteratively replacing the two least frequent items, having weights p and p′, by a single item with frequency t·max{p, p′}. Prove that the resulting binary tree minimizes max_i{pi·t^li}.
b) (+) Use the algorithm in part (a) to prove that a tree always exists with max_i{pi·t^li} ≤ t·Σ_{j=1}^n pj. (Hint: Use induction on n. Consider an optimal tree built by the algorithm. Focus on the deepest level where it has more than two nodes, if any exists. Modify the part of the input corresponding to leaves at that level.)
1.1.38. (−) The Fibonacci tree Tk with Fk leaves is the rooted tree defined as follows. Let T1 and T2 consist of the root only. For k > 2, let the left subtree be Tk−1 and the right subtree be Tk−2. For a path from the root, let left branches cost 1 and right branches cost c, with c > 0. (Here Fk is the adjusted Fibonacci number with F0 = F1 = 1.)
a) Let T be a binary tree with n leaves in which every vertex has 0 or 2 children (not necessarily a Fibonacci tree). Prove that the difference between the total cost of paths to leaves and the total cost of paths to non-leaves is (n − 1)(1 + c).
b) For c = 2, prove that Tk has the minimum total cost of paths to leaves among all rooted plane binary trees with Fk leaves. (Hint: Prove that the cost to each internal vertex of Tk is less than the cost to every potential vertex that is not internal to Tk.) (Horibe [1983])
1.1.39. Lopsided binary trees. Fix p ∈ (0, 1). From an internal node of a tree, assign probability p to the left branch and 1 − p to the right branch. The probability of reaching any node is the product of the probabilities along the path from the root. The entropy function H is given by H(p1, . . . , pn) = −Σ pi lg pi. A p-maximal tree is a tree maximizing the entropy of the leaf probability distribution over all trees with the same number of leaves.
a) Prove that H(p1, . . . , pn) equals H(p, 1 − p) times the sum of the probabilities of internal nodes, and that the sum of the probabilities of internal nodes equals the expected path length to a leaf.
b) A binary tree with left-branch cost 1 and right-branch cost c > 1 is c-minimal if the total cost of paths to leaves is minimal among all binary trees with the same number of leaves. Prove that if p^c = 1 − p, then a binary tree is c-minimal if and only if it is p-maximal. (Comment: for c = 2, the corresponding p is (√5 − 1)/2.) (Horibe [1988])
