
Algorithms and Complexity 2, CS2870

Gregory Gutin

December 11, 2011

Abstract

These notes accompany the second-year course CS2870: Algorithms and Complexity 2. All computer scientists should know the basics of graph theory, its algorithms and applications, as well as the basics of computational complexity. Permission is given to freely copy and distribute this document in an unchanged form. You may not modify the text and redistribute it without written permission from the author.

Contents

1 Basic notions on undirected and directed graphs  9
  1.1 Introduction to graph theory: Graph models  9
    1.1.1 Matchings  9
    1.1.2 Traffic models  11
  1.2 Degrees in graphs  11
    1.2.1 Basic definitions and results  12
    1.2.2 Havel-Hakimi algorithm  13
    1.2.3 Pseudocode of Havel-Hakimi algorithm (extra material)  15
  1.3 Degrees in digraphs  16
  1.4 Subgraphs  17
  1.5 Isomorphism of graphs and digraphs  19
  1.6 Classes of graphs  21
  1.7 Graph data structures  22
  1.8 Solutions  23

2 Walks, Connectivity and Trees  25
  2.1 Walks, trails, paths and cycles in graphs  25
  2.2 Connectivity  26
  2.3 Edge-connectivity and Vertex-connectivity (extra material)  30
  2.4 Basic properties of trees and forests  32
  2.5 Spanning trees and forests  33
  2.6 Greedy-type algorithms and minimum weight spanning trees  34
  2.7 Solutions  38

3 Directed graphs  41
  3.1 Acyclic digraphs  41
    3.1.1 Acyclic ordering of acyclic digraphs  41
    3.1.2 Longest and Shortest paths in acyclic digraphs  44
    3.1.3 Analyzing projects using PERT/CPM (extra material)  46
  3.2 Distances in digraphs  48
    3.2.1 Breadth First Search  48
    3.2.2 Dijkstra's algorithm  50
    3.2.3 The Floyd-Warshall algorithm  50
  3.3 Strong connectivity in digraphs  53
    3.3.1 Basics of strong connectivity  53
    3.3.2 Algorithms for finding strong components  54
  3.4 Application: Solving the 2-Satisfiability Problem (extra material)  55
  3.5 Solutions  58

4 Colourings of Graphs, Independent Sets and Cliques  61
  4.1 Basic definitions of vertex colourings  61
  4.2 Bipartite graphs and digraphs  62
  4.3 Periods of digraphs and Markov chains (extra material)  64
  4.4 Computing chromatic number  65
  4.5 Greedy colouring and interval graphs  67
  4.6 Edge colourings (extra material)  69
  4.7 Independent Sets and Cliques  69

5 Matchings in graphs  73
  5.1 Matchings in (general) graphs  73
  5.2 Matchings in bipartite graphs  74
  5.3 Application of matchings  77
  5.4 Solutions  78

6 Euler trails and Hamilton cycles in graphs  81
  6.1 Euler trails in multigraphs  81
  6.2 Chinese Postman Problem  84
  6.3 Hamilton cycles  86
  6.4 Travelling Salesman Problem  87

7 NP-completeness  89
  7.1 Why is arranging objects hard?  89
  7.2 Brute force optimisation  89
  7.3 How to spot an explosive algorithm  90
    7.3.1 Exponential functions  91
    7.3.2 Good algorithms  91
  7.4 Tractable problems  92
  7.5 The class NP  93
    7.5.1 Minimisation and backtracking  93
    7.5.2 Examples of NP-complete problems  94
    7.5.3 Alternative definition of the class NP  95
  7.6 Proving that a problem is NP-complete  97
  7.7 Proceeding in the face of intractable problems  99
  7.8 TSP Heuristics  99

Chapter 1

Basic notions on undirected and directed graphs


1.1 Introduction to graph theory: Graph models

We start with some simple problems that can be usefully modelled as graphs. While considering these models we introduce some basic notions from graph theory.

1.1.1 Matchings

(Undirected) graphs are structures defined by vertices (sometimes also called nodes) and edges between some pairs of vertices. Figure 1.1 gives an example of a graph H with vertices j1, j2, j3, j4, j5, p1, ..., p7 and edges j1p1, j1p3, j2p5, j3p2, j3p3, j4p3, j4p5, j4p6, j5p4, j5p7.

The graph H may represent the following problem. A recruitment agency currently has 5 jobs in IT available and 7 people looking for a job in IT. Not every person can do every job, and the list of jobs with the persons able to do them is written above (j1p1, j1p3, ..., j5p7). The agency gets £2000 for every person employed, and thus is interested in finding appropriate people for all five jobs. The natural question is what is the maximum amount of money the agency can make in this situation. It is not difficult to check that the agency can make £2000 × 5 = £10000 in this particular example by using the following assignment: {j1p1, j2p5, j3p2, j4p6, j5p7}.

The above example can be easily generalized. We are given jobs j1, ..., jm, people p1, ..., pn, and a list of job-person pairs such that each pair jipk in the list indicates that

Figure 1.1: A graph H.

job ji can be done by person pk. The objective is to determine the maximum number of jobs that can be done under the condition that no person does more than one job. The best (maximum) assignment of jobs to persons is a maximum matching in the corresponding graph. It turns out that the general problem can be better understood and solved if it is modelled as a graph similar to the one in Figure 1.1. In fact, graph theory provides a fast algorithm to solve the problem (using computers) even if the numbers of jobs and people are quite large. The algorithm is non-trivial and is based on certain notions and results in graph theory. We will study this algorithm, but it is perhaps useful to try to design such an algorithm yourself.

Already the graph H in Figure 1.1 and the corresponding problem allow us to introduce some important notions of graph theory. The graph H is bipartite, i.e., its set of vertices can be partitioned into two partite sets such that no edge joins vertices from the same partite set. In our case no two jobs should be joined and no two people should be joined.

We said we wanted to find a maximum matching in H. What is a matching in a graph? It is a collection of edges with no common vertices. In particular, the collection j1p1, j2p5, j3p2 is a matching, but j1p1, j1p3, j3p2 is not (j1 is in two edges). A matching is maximum if it contains the maximum possible number of edges in the graph in hand.

The term matching appeared because of another practical problem. In a small village there are a number of girls and a number of boys, and some girls know some boys. What is the maximum number of marriages that can be arranged such that a girl may only marry a boy she knows? There is a theorem, Hall's theorem, which answers a weaker question: is it possible to marry off all girls (boys)? Sometimes the latter question is called the marriage problem.
Let us suppose that the graph H in Figure 1.1 models an instance of the marriage problem, where the ji (Jane, Janet, etc.) are girls and the pk (Peter, Paul, etc.) are boys. One

Figure 1.2: A digraph D

of the simplest questions to ask is how many boys Jane (j1 ) knows. We see that Jane knows two boys p1 and p3 . We say that the degree of j1 is 2. The degree of j2 (j3 , j4 , j5 , respectively) is 1 (2,3,2, respectively). The degree of each of the vertices p1 , p2 , p4 , p6 , p7 is 1, the degree of p5 is 2, and the degree of p3 is 3. You may check that the sum of all degrees is twice the number of edges in H. This is true for every graph. Try to understand why. We prove this fact later on.
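These degree counts are easy to verify mechanically. A small Python sketch, with the edge list of H copied from the text (the Counter-based tally is just one convenient way to do the bookkeeping):

```python
# Degrees in the graph H of Figure 1.1, computed from its edge list.
from collections import Counter

edges = [("j1", "p1"), ("j1", "p3"), ("j2", "p5"), ("j3", "p2"), ("j3", "p3"),
         ("j4", "p3"), ("j4", "p5"), ("j4", "p6"), ("j5", "p4"), ("j5", "p7")]

degree = Counter()
for u, v in edges:        # each edge contributes 1 to the degree of both end-vertices
    degree[u] += 1
    degree[v] += 1

print(degree["j1"])                            # 2: Jane knows two boys
print(sum(degree.values()) == 2 * len(edges))  # True: degree sum = twice the edges
```

Running it reproduces the degrees listed above and confirms that their sum is twice the number of edges.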

1.1.2 Traffic models

Graphs can be used for journey planning. Indeed, the system of UK roads can be considered as a graph whose vertices are junctions and whose edges are the intervals of roads between junctions. Clearly, every edge has a weight (for example, the time needed to travel along this edge). Another example is the London Underground (LU). In the LU people often need to find how to get from one vertex to another, i.e., from one station to another. Their journey is a path in the LU graph, i.e., a sequence of distinct vertices (= stations) such that each vertex is joined to the previous vertex by an edge. In Figure 1.1, p1 j1 p3 j4 p6 is a path. Tourist buses in London and Oxford visit certain places of interest and return to their original place. So, they move along a cycle, i.e., a path plus an extra edge between the first and last vertices of the path.

The LU graph is an undirected graph, as we can travel along each edge in both directions. In some other graph models this is impossible. For example, the London road system cannot be represented as an undirected graph, since it has one-way streets. A one-way street has a direction in which travel is allowed. This situation is represented adequately by a directed graph, i.e., a graph in which every edge has a direction. Edges of directed graphs are normally called arcs. Figure 1.2 depicts a directed graph (for short, digraph).

1.2 Degrees in graphs

In this section, we will study degrees of undirected graphs.


1.2.1 Basic definitions and results

The degree of a vertex x in a graph G, denoted by dG(x), is the number of edges of G in which x is one of the two end-vertices. Normally, a graph G is written as a pair G = (V, E), where V is the vertex set and E is the edge set. A vertex y is a neighbour of a vertex x in G if xy ∈ E. Observe that the total number of neighbours of x is its degree.

One of the first observations in graph theory is the following proposition, called the sum-of-degrees proposition. In the proposition we use the symbol |E| for the number of elements in the set E, i.e., the number of edges in G. Similarly, |V| denotes the number of vertices in G.

Proposition 1.2.1 The sum of degrees of vertices in a graph G = (V, E) equals twice the number of edges in G. In notation, ∑_{x∈V} dG(x) = 2|E|.

Proof: Every edge e ∈ E with end-vertices y and z contributes 2 to the sum, as it contributes 1 to dG(y) and 1 to dG(z). QED

Question 1.2.2 Check the above proposition for the graph H in Fig. 1.1.

The sum-of-degrees proposition implies that every graph has an even number of vertices of odd degree. Indeed, let G = (V, E) be a graph, and partition V into the set V1 of vertices of odd degree and the set V2 of vertices of even degree. We have

2|E| = ∑_{x∈V} dG(x) = ∑_{x∈V1} dG(x) + ∑_{x∈V2} dG(x).

Since 2|E| is even and ∑_{x∈V2} dG(x) is even (as a sum of even numbers), ∑_{x∈V1} dG(x) must be even, too. A sum of odd numbers is even only when the number of summands is even, so |V1| is even.

Theorem 1.2.3 Every graph G with at least two vertices has a pair of vertices of the same degree.

Proof: Let G have n vertices. If all vertices of G had different degrees, the degrees would range over 0 to n − 1 inclusive, each value occurring exactly once. However, a graph cannot have both a vertex of degree 0 (a vertex with no edges) and a vertex of degree n − 1 (joined to all other vertices in G). QED
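Both consequences above (the even number of odd-degree vertices, and Theorem 1.2.3) are easy to sanity-check on random graphs; a Python sketch (function name is mine):

```python
import random

def degree_list(n, edges):
    """Degrees of vertices 0..n-1, computed from an edge list."""
    d = [0] * n
    for u, v in edges:
        d[u] += 1
        d[v] += 1
    return d

random.seed(0)
for trial in range(200):
    n = random.randint(2, 10)
    # random simple graph: keep each vertex pair as an edge with probability 1/2
    edges = [(u, v) for u in range(n) for v in range(u + 1, n)
             if random.random() < 0.5]
    d = degree_list(n, edges)
    assert sum(d) == 2 * len(edges)                    # Proposition 1.2.1
    assert sum(1 for x in d if x % 2 == 1) % 2 == 0    # evenly many odd degrees
    assert len(set(d)) < n                             # Theorem 1.2.3: a repeat
print("200 random graphs checked")
```

Of course a test like this proves nothing; the proofs above do.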


Question 1.2.4 (a) Compute the number of edges in a graph with 10 vertices, each of degree 3.

(b) Compute the number of edges in a graph with n vertices, each of degree 4.

(c) Is there a graph with vertex degrees 5, 4, 3, 2, 1?

(d) Draw a graph with vertex degrees 2, 2, 3, 3, 3, 3.

(e) Draw a graph with vertex degrees 1, 1, 1, 1, 1, 1, 6.

1.2.2 Havel-Hakimi algorithm

Now we consider the question of when a sequence of non-negative integers is the sequence of degrees of the vertices of a graph. For example, the sequence 0, 0, 0, 0 is the degree sequence of the graph on 4 vertices with no edges. The sequence 1, 1 is the degree sequence of the graph with 2 vertices joined by a unique edge. However, 0, 1 is not the degree sequence of any graph: if we assume that a graph H with degree sequence 0, 1 does exist, then H has two vertices, one of degree 0 and another of degree 1. The vertex of degree 1 implies that H has an edge, while the vertex of degree 0 implies that H has no edge, a contradiction.

A sequence of non-negative integers which is the degree sequence of some graph is called graphic. Havel and Hakimi suggested an algorithm for checking whether a sequence in hand is graphic or not. The algorithm is recursive and based on the following theorem.

Theorem 1.2.5 (HH theorem) Let i1, i2, ..., in with i1 ≥ i2 ≥ ... ≥ in be a sequence of non-negative integers. This sequence is graphic if and only if the following sequence is graphic: replace i1 by 0 and decrease the first i1 numbers of i2, i3, ..., in by one.

It is very important that the operation of this theorem is applied to a non-increasing sequence i1, i2, ..., in. Let us consider the following examples.

Question 1.2.6 Check whether the sequence 4, 2, 4, 3, 4, 1 is graphic and, if it is, construct a graph with this degree sequence.

Solution: First we rewrite the given sequence in non-increasing order: 4, 4, 4, 3, 2, 1. Assume that 4, 4, 4, 3, 2, 1 is graphic and H is a graph with this sequence with vertices u, v, w, x, y, z. We now apply the HH theorem.

Figure 1.3: A graphic realization of 4, 4, 4, 3, 2, 1

u v w x y z
4 4 4 3 2 1   (apply HH)
0 3 3 2 1 1   (apply HH)
0 0 2 1 0 1   (apply HH)
0 0 0 0 0 0

Obviously, the last sequence is graphic and, hence, the original sequence is graphic as well. We build H going along the transformations above from bottom to top. First we depict the six vertices, then we add edges wx and wz. Then we add edges vw, vx and vy. Finally, we add edges between vertex u and vertices v, w, x, y. See Figure 1.3.

Question 1.2.7 Check whether the sequence 3, 2, 4, 3, 4, 1 is graphic and, if it is, construct a graph with this degree sequence.

Solution: First we rewrite the given sequence in non-increasing order: 4, 4, 3, 3, 2, 1. Assume that 4, 4, 3, 3, 2, 1 is graphic and H is a graph with this sequence with vertices u, v, w, x, y, z. We now apply the HH theorem.

u v w x y z
4 4 3 3 2 1   (apply HH)
0 3 2 2 1 1   (apply HH)
0 0 1 1 0 1   (apply HH)
0 0 0 0 0 1

Obviously, the last sequence is not graphic (a single vertex of degree 1 has no vertex to be joined to) and, hence, the original sequence is not graphic either.

Question 1.2.8 Check whether the following sequences are graphic and, when they are, construct the corresponding graph:

(a) 3, 3, 3, 3, 3, 3

(b) 5, 4, 3, 3, 1, 0

(c) 4, 4, 4, 4, 4, 2, 2

(d) 5, 4, 4, 4, 4, 3, 2

Question 1.2.9 [A solution is given at the end of the chapter] Using the Havel-Hakimi algorithm, check whether each of the following sequences is graphic and, when it is, construct the corresponding graph. Justify your answers.

(i) 5, 3, 3, 3, 3, 3, 2

(ii) 6, 4, 4, 2, 2, 1, 1

Question 1.2.10 [A solution is given at the end of the chapter] Consider the following sequences of natural numbers. Which of the sequences are degree sequences of trees? Justify your answers. For every degree sequence of a tree, construct the corresponding tree. (You may use the Havel-Hakimi algorithm where appropriate.)

(i) 4, 3, 3, 4, 4, 2, 1

(ii) 2, 1, 1, 1, 1, 4

1.2.3 Pseudocode of Havel-Hakimi algorithm (extra material)

Now we will produce pseudocode for checking whether a sequence of non-negative integers is graphic. We use an array d to represent this sequence. The procedure sort in lines 2 and 7 sorts the array d using any sorting algorithm (merge sort, for example). Its input is an unsorted array d and its output is the array sorted in non-increasing order: d[0] ≥ d[1] ≥ d[2] ≥ ... . The input of our pseudocode consists of the array d and the number m of its positive elements.

1  // d is an array of n integers with m positive elements, all smaller than n
2  sort(d)
3  while (m > 0 && d[0] <= m-1) do
4  {
5      for i from 1 to d[0] do d[i] := d[i]-1
6      d[0] := 0
7      sort(d)
8      k := 0
9      for j from 0 to m-1 do
10     {
11         if (d[j] > 0) k := k+1
12     }
13     m := k
14 }
15 if (m == 0) print "graphic"
16 else print "non-graphic"

Figure 1.4: A digraph D

To see that the above code is correct, it suffices to observe that the code implements the Havel-Hakimi procedure considered earlier. The pseudocode discovers that the current array d is not graphic only if d[0], the largest element of d, is larger than the number m of positive elements in d minus 1 (d[0] itself). Otherwise, the pseudocode runs to the end, i.e., m = 0, which means, by the Havel-Hakimi theorem, that d is graphic. To make our pseudocode more efficient, we observe that the value of each d[i] is bounded from above by n − 1. Thus, d[i] = O(n) and we can implement sort to run in time O(n). To this end, we can use counting sort (see [CLR1990] for a description and analysis of counting sort). So, each time we execute line 7, we perform O(n) operations. The same is true for the remaining lines. Clearly, we have at most n iterations of the while loop of line 3 and, thus, the overall time complexity of our pseudocode is O(n²).

Question 1.2.11 What is the time complexity of the pseudocode above when sort is implemented by merge sort?
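The same procedure in runnable form: a Python sketch that mirrors the pseudocode, using a full sort in each round rather than counting sort (so this is the slower, merge-sort-style variant):

```python
def is_graphic(d):
    """Havel-Hakimi test, following the pseudocode above."""
    d = sorted(d, reverse=True)             # non-increasing order
    m = sum(1 for x in d if x > 0)          # number of positive elements
    while m > 0 and d[0] <= m - 1:
        for i in range(1, d[0] + 1):        # decrease the next d[0] elements
            d[i] -= 1
        d[0] = 0
        d.sort(reverse=True)
        m = sum(1 for x in d if x > 0)
    return m == 0

print(is_graphic([4, 2, 4, 3, 4, 1]))   # True  (Question 1.2.6)
print(is_graphic([3, 2, 4, 3, 4, 1]))   # False (Question 1.2.7)
```

Both answers agree with the worked examples above.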

1.3 Degrees in digraphs

An undirected graph G = (V, E) has a set of vertices V and a set of edges E. For an edge xy ∈ E, one can move from x to y and from y to x. A directed graph (digraph, for short) D = (V, A) has vertices V and arcs A. For an arc xy ∈ A, one can move only from x to y. (The situation when one can move both from x to y and from y to x in a digraph is reflected by having both arcs xy and yx.) For example, the digraph D in Figure 1.4 has vertices V = {x, y, z, u, v, w} and arcs A = {xz, yz, zu, uv, uw, wu}.

Since there are arcs coming into a vertex x of a digraph and arcs leaving x, the notion of degree is not enough for digraphs. Instead, we have two parameters: the out-degree d+(x) and the in-degree d−(x) of a vertex x are the numbers of arcs leaving and coming into x, respectively. For example, in D of Figure 1.4, d+(x) = 1, d−(x) = 0, d+(y) = 1, d−(y) = 0, and d+(z) = 1, d−(z) = 2. The out-degree and in-degree are called semi-degrees. Instead of the sum-of-degrees proposition for undirected graphs, we have the sum-of-semi-degrees proposition:

Proposition 1.3.1 The sum of out-degrees of vertices in a digraph D = (V, A) equals the number of arcs in D. In notation, ∑_{x∈V} d+(x) = |A|. Also, ∑_{x∈V} d−(x) = |A|.

The out-neighbours of a vertex x in a digraph D = (V, A) are all vertices y for which xy ∈ A. The in-neighbours of x are all vertices y for which yx ∈ A. Observe that the number of out-neighbours of x is its out-degree and the number of in-neighbours of x is its in-degree. The vertex z is the only out-neighbour of vertex x in Figure 1.4. The vertex z has two in-neighbours: x and y (in Figure 1.4).

Question 1.3.2 (a) Compute the number of arcs in a digraph with out-degree sequence 3, 2, 2, 1, 1, 1, 0.

(b) Draw a digraph with in-degree sequence 2, 2, 2, 1, 1, 1.

(c) Draw a digraph with 6 vertices in which all in-degrees are different.
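The semi-degree bookkeeping is a one-pass computation over the arc list; a sketch for the digraph D of Figure 1.4, which also checks the sum-of-semi-degrees proposition:

```python
# Out- and in-degrees of the digraph D of Figure 1.4, from its arc list.
arcs = [("x", "z"), ("y", "z"), ("z", "u"), ("u", "v"), ("u", "w"), ("w", "u")]
vertices = {"x", "y", "z", "u", "v", "w"}

outdeg = {v: 0 for v in vertices}
indeg = {v: 0 for v in vertices}
for u, v in arcs:
    outdeg[u] += 1   # arc uv leaves u ...
    indeg[v] += 1    # ... and comes into v

print(outdeg["z"], indeg["z"])   # 1 2, as in the text
print(sum(outdeg.values()) == len(arcs) == sum(indeg.values()))  # True
```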

1.4 Subgraphs

The deletion of an edge e from an (undirected) graph G = (V, E) means the transformation that changes G into G − e = (V, E − {e}). Consider the London Underground (LU) graph. If we delete some of the edges of the LU graph (i.e., stop trains running between some stations), we obtain a subgraph of the LU graph, which we call a spanning subgraph since it has the same vertices as the LU graph, but fewer edges. In Figure 1.5, H is a spanning subgraph of G.

The deletion of a vertex v from a graph G = (V, E) means the deletion of v and of all edges of G with end-vertex v. Indeed, if we close one of the stations of the LU graph, we effectively close all edges to this station. If we delete some vertices and edges from a graph G, we get a subgraph of G. If only vertices are deleted, we speak of an induced subgraph. If G = (V, E) and we delete a set W of vertices of G, we say that the remaining subgraph is induced by V − W. The third (unnamed) graph in Figure 1.5 is an induced subgraph of G. It is induced by a, b, c, f. If we delete any vertex from the graph H in Figure 1.5, we obtain a graph which is neither a spanning nor an induced subgraph of G.

The deletion of vertices and/or edges is of interest for network reliability applications. Indeed, in a computer network we are interested in the minimum number of links


Figure 1.5: A graph and its subgraphs

Figure 1.6: A graph representing a network of computers

between computers that have to be shut down before one can no longer communicate from some computer to another in the network. The larger this number, the more reliable the network. Assume that the graph in Figure 1.6 represents a network of computers. Here, it is enough to shut down just one computer, h, to make the network disconnected. However, two links, the edges ch and hd, have to be deleted before the network is disconnected. Clearly, no single link failure will make the network disconnected.

Some subgraphs of graphs are of special interest. The most important of them are paths and cycles. Many lines of the LU graph are paths. For example, the Jubilee Line is a path. Recall that a path P is a sequence of distinct vertices such that any vertex of P is joined by an edge to its predecessor and/or successor in P. Some lines are not paths. For example, the Metropolitan Line is not a path, as there are two branches leaving Harrow-on-the-Hill station (one to Uxbridge and another to Watford, Chesham and Amersham, which is divided into further branches). The Circle Line is an example of a cycle.
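The two deletion operations and induced subgraphs can be sketched in a few lines. Edges are stored as frozensets so that ab and ba count as the same edge; the small triangle-plus-pendant example is made up for illustration, not taken from Figure 1.5:

```python
def delete_vertex(V, E, v):
    """Delete vertex v and every edge with end-vertex v."""
    return V - {v}, {e for e in E if v not in e}

def induced_subgraph(V, E, W):
    """Subgraph induced by W: keep exactly the edges with both ends in W."""
    return set(W), {e for e in E if e <= set(W)}

# A triangle a-b-c with a pendant vertex d attached to c (a made-up example)
V = {"a", "b", "c", "d"}
E = {frozenset(p) for p in ("ab", "bc", "ac", "cd")}

V2, E2 = delete_vertex(V, E, "c")
print(sorted("".join(sorted(e)) for e in E2))   # ['ab']: deleting c removed 3 edges
```

Deleting edges only would give a spanning subgraph; deleting vertices (with their incident edges) and nothing else gives an induced subgraph, as in the text.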

Question 1.4.1 (a) Draw the subgraph of G (in Figure 1.6) induced by the vertices a, c, d, e, f, g.

(b) What are the degrees of the vertices of a path (a cycle)?


Figure 1.7: Isomorphic graphs

Figure 1.8: Isomorphic graphs

1.5 Isomorphism of graphs and digraphs

A graph can be drawn in many different ways, and a casual viewer may think that different drawings represent different graphs. This provides the motivation for the following definition. Two graphs G and H are isomorphic if there is a one-to-one correspondence (called an isomorphism) between the vertices of G and H such that a pair of vertices in G are joined by an edge if and only if the corresponding pair of vertices in H are joined by an edge. See Figure 1.7 for three different drawings of the same (up to isomorphism) graph. To verify whether two graphs are isomorphic or not, it is useful to check their parameters: to be isomorphic they must have the same number of vertices, the same number of edges, the same degree sequence (up to permutation of numbers), etc.

Question 1.5.1 Prove that the two graphs in Figure 1.8 are isomorphic.

Solution: Consider the one-to-one correspondence given by a↔v, b↔x, c↔y, d↔u. It is easy to verify that this correspondence is an isomorphism. In particular, there is an edge between a and b, and there is an edge between v and x.

Question 1.5.2 Prove that the two graphs in Figure 1.9 are not isomorphic.

Solution: Each of the graphs G and H has just one vertex of degree 3. However, the vertex of degree 3 in G is joined by an edge to just one vertex of degree 1, while the vertex of degree 3 in H is joined by edges to two vertices of degree 1. Hence, G and H are not isomorphic.
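For very small graphs, isomorphism can be tested by brute force over all one-to-one correspondences, exactly as in the definition; a sketch (factorially many correspondences, so toy examples only — the two sample graphs below are my own, not those of the figures):

```python
from itertools import permutations

def isomorphic(V1, E1, V2, E2):
    """Brute-force isomorphism test: try every one-to-one correspondence."""
    V1, V2 = list(V1), list(V2)
    if len(V1) != len(V2) or len(E1) != len(E2):
        return False                         # parameter check, as in the text
    E1 = {frozenset(e) for e in E1}          # edges are unordered pairs
    E2 = {frozenset(e) for e in E2}
    for perm in permutations(V2):
        f = dict(zip(V1, perm))              # candidate correspondence
        if {frozenset((f[u], f[v])) for u, v in E1} == E2:
            return True
    return False

# P4 (a path on 4 vertices) vs. a star: same sizes, different degree sequences
P4 = ({1, 2, 3, 4}, {(1, 2), (2, 3), (3, 4)})
star = ({1, 2, 3, 4}, {(1, 2), (1, 3), (1, 4)})
print(isomorphic(*P4, *star))   # False
```

The early size checks implement the parameter comparison mentioned above; equal parameters alone, however, do not guarantee isomorphism.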


Figure 1.9: Non-isomorphic graphs

Figure 1.10: Isomorphic graphs

Question 1.5.3 Prove that two graphs in Figure 1.10 are isomorphic.

Question 1.5.4 Prove that two graphs in Figure 1.11 are not isomorphic.

Question 1.5.5 Draw two non-isomorphic graphs with

(a) 6 vertices and 10 edges

(b) 6 vertices and 11 edges

Two digraphs D and H are isomorphic if there is a one-to-one correspondence between their vertices that preserves arcs, i.e., if vertices x and y in D correspond to vertices a and b in H, then xy is an arc in D if and only if ab is an arc in H.

Question 1.5.6 Prove that the two digraphs in Figure 1.12 are isomorphic.

Figure 1.11: Non-isomorphic graphs


Figure 1.12: A digraph Q and its converse R.

The converse of a digraph D is obtained from D by reversing the directions of all arcs in D. In Figure 1.12, R is the converse of Q. Question 1.5.7 Give three examples of digraphs on 6 vertices whose converses are not isomorphic to the original digraphs.
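Forming the converse is a one-line transformation on the arc list; a sketch with a made-up arc set (the actual arcs of Q in Figure 1.12 are not listed in the text):

```python
def converse(arcs):
    """Return the converse digraph: the same vertices, every arc reversed."""
    return [(v, u) for (u, v) in arcs]

Q = [("a", "b"), ("b", "c"), ("c", "a"), ("d", "e")]   # hypothetical arc set
R = converse(Q)
print(R)                 # [('b', 'a'), ('c', 'b'), ('a', 'c'), ('e', 'd')]
print(converse(R) == Q)  # True: the converse of the converse is the original
```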

1.6 Classes of graphs

Paths and cycles can be considered as graphs themselves. A path on n vertices, denoted by Pn, is a graph with vertices v1, v2, ..., vn and edges v1v2, v2v3, ..., vn−1vn. If we add the edge vnv1 to Pn, we get a cycle on n vertices, denoted by Cn.

A complete graph on n vertices, denoted by Kn, is a graph on n vertices in which every two vertices are joined by an edge (are adjacent). If we fix n, then there is only one complete graph on n vertices (up to isomorphism). Thus, we may speak of the complete graphs on 3, 4, 5, etc. vertices. See Figure 1.13 for K3 and K4 and Figure 1.7 for three different drawings of K4. Clearly, C3 = K3. The graph Kn = (V, E) has n(n − 1)/2 edges. Indeed, the degree of every vertex is n − 1. By the sum-of-degrees proposition, 2|E| = ∑_{v∈V} d(v) = n(n − 1). Hence, |E| = n(n − 1)/2.

A complete bipartite graph, denoted by Kp,q, is a graph on p + q vertices whose vertices are partitioned into two partite sets P and Q such that |P| = p, |Q| = q, every vertex of P is adjacent to every vertex of Q, and no two vertices of P (of Q, respectively) are adjacent. Clearly, Kp,q has pq edges. See Figure 1.13 for K2,3 and K3,3.

The vertices of the n-cube, denoted by Qn, can be viewed as binary (0,1)-sequences with n elements (= coordinates). For example, Q3 has vertices 000, 001, 010, 100, 011, 101, 110, 111. The graph Q3 is depicted in Figure 1.13. Two vertices of Qn are adjacent if and only if they differ in exactly one coordinate. For example, vertex 000 is adjacent only to 001, 010, 100.


Figure 1.13: Some graphs (K3, K4, K2,3, K3,3, Q3)

Thus, a vertex of the n-cube is a sequence i1 i2 i3 ... in of digits equal to 0 or 1. There exist exactly 2ⁿ such sequences, i.e., Qn has 2ⁿ vertices. For a digit i, let ī denote its complement: ī = 0 for i = 1 and ī = 1 for i = 0. By definition, i1 i2 ... in is adjacent exactly to ī1 i2 i3 ... in, i1 ī2 i3 ... in, i1 i2 ī3 ... in, ..., i1 i2 i3 ... īn. Thus, every vertex of Qn is adjacent to exactly n vertices. To compute the number of edges in Qn = (V, E), we use the sum-of-degrees proposition: 2|E| = ∑_{v∈V} d(v) = n·2ⁿ. Hence, |E| = n·2ⁿ⁻¹.

Question 1.6.1 Compute the number of vertices and edges in Q5.
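For small n, the counting arguments above can be checked by brute force; a sketch (the quadratic scan over vertex pairs is fine at this size):

```python
from itertools import product

def n_cube(n):
    """Q_n: vertices are 0/1-strings of length n; edges join strings that
    differ in exactly one coordinate."""
    V = ["".join(bits) for bits in product("01", repeat=n)]
    E = [(u, v) for i, u in enumerate(V) for v in V[i + 1:]
         if sum(a != b for a, b in zip(u, v)) == 1]
    return V, E

V3, E3 = n_cube(3)
print(len(V3), len(E3))   # 8 12, i.e. 2^3 vertices and 3*2^2 edges
```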

1.7 Graph data structures

For the adjacency matrix representation of a digraph D = (V, A), we assume that the vertices of D are labelled v1, v2, ..., vn in some arbitrary but fixed manner. The adjacency matrix M(D) = [mij] of a digraph D is an n × n matrix such that mij = 1 if vivj ∈ A and mij = 0 otherwise. The adjacency matrix representation is a very convenient and fast tool for checking whether there is an arc from one vertex to another. A drawback of this representation is the fact that to check all adjacencies, without using any other information besides the adjacency matrix, one needs Ω(n²) time. Thus, the majority of algorithms using the adjacency matrix cannot have complexity lower than Ω(n²) (this holds in particular if we include the time needed to construct the adjacency matrix).

The adjacency list representation of a digraph D = (V, A) consists of a pair of arrays Adj+ and Adj−. Each of Adj+ and Adj− consists of |V| (linked) lists, one for every vertex in V. For each x ∈ V, the linked list Adj+(x) (Adj−(x), respectively) contains all out-neighbours of x (in-neighbours of x, respectively) in some fixed order (see Figure 1.14). Using the adjacency list Adj+(x) (Adj−(x)) one can obtain all out-neighbours

Figure 1.14: A directed multigraph and a representation by adjacency lists Adj+.

(in-neighbours) of a vertex x in O(|Adj+(x)|) (O(|Adj−(x)|)) time. A drawback of the adjacency list representation is the fact that one needs, in general, more than constant time to verify whether xy ∈ A. Indeed, to decide this we have to search sequentially through Adj+(x) (or Adj−(y)) until we either find y (respectively, x) or reach the end of the list.

Question 1.7.1 Give the adjacency lists Adj− for the graph in Fig. 1.14.

In the rest of this book, we will sometimes use, for simplicity, adjacency matrices, but we will not take into consideration the time needed to construct such matrices (they are inputs). Notice, however, that in practice adjacency lists are used more often.
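Both representations can be built in one pass over the arc list; a sketch for the digraph D of Figure 1.4, with Python lists in dictionaries standing in for the linked lists:

```python
# Two representations of the digraph D of Figure 1.4 (arcs xz, yz, zu, uv, uw, wu).
order = ["x", "y", "z", "u", "v", "w"]          # arbitrary but fixed labelling
idx = {v: i for i, v in enumerate(order)}
arcs = [("x", "z"), ("y", "z"), ("z", "u"), ("u", "v"), ("u", "w"), ("w", "u")]

n = len(order)
M = [[0] * n for _ in range(n)]                 # adjacency matrix
adj_out = {v: [] for v in order}                # adjacency lists Adj+
adj_in = {v: [] for v in order}                 # adjacency lists Adj-
for u, v in arcs:
    M[idx[u]][idx[v]] = 1
    adj_out[u].append(v)
    adj_in[v].append(u)

print(M[idx["z"]][idx["u"]])   # 1: constant-time arc lookup in the matrix
print(adj_out["u"])            # ['v', 'w']: out-neighbours in list order
```

The matrix answers "is zu an arc?" in O(1); the list Adj+(u) yields all out-neighbours of u in time proportional to their number, matching the trade-off described above.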

1.8 Solutions

Question 1.2.9 (i) Assume that 5, 3, 3, 3, 3, 3, 2 is graphic and H is a graph with this degree sequence with vertices u, v, w, t, x, y, z. We now apply the HH algorithm.

u v w t x y z
5 3 3 3 3 3 2
0 2 2 2 2 2 2
0 0 1 1 2 2 2
0 0 1 1 0 1 1
0 0 0 0 0 1 1

Since the last sequence is graphic, the initial one is graphic as well. The corresponding graph is depicted in the figure below.


(ii) The HH algorithm shows that the sequence is not graphic.

u v w t x y z
6 4 4 2 2 1 1
0 3 3 1 1 0 0
0 0 2 0 0 0 0

Question 1.2.10 (i): This is not a sequence of degrees of a tree as it has only one item 1 (every tree with at least 2 vertices has two leaves). (ii) Lets use the Havel-Hakimi algorithm for 2, 2, 1, 1, 1, 1, 4. Order the sequence and assume that 4, 2, 1, 1, 1, 1 is graphic and H is a graph with this degree sequence with vertices u, v, w, x, y, z. uvwxyz 421111 010001 000000

Since the last sequence is graphic, the initial one is graphic as well. The corresponding graph G has edges uv, uw, ux, uy, vz and is a tree.
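The tabular computations above can be mechanized. Below is a Python sketch of the Havel-Hakimi test (the function name is ours): repeatedly remove the largest degree d and subtract 1 from the d next-largest degrees, re-sorting in between.

```python
def is_graphic(seq):
    """Havel-Hakimi test: True iff seq is the degree sequence of a graph."""
    seq = sorted(seq, reverse=True)
    while seq and seq[0] > 0:
        d = seq.pop(0)               # remove the largest degree
        if d > len(seq):             # not enough remaining vertices
            return False
        for i in range(d):           # join it to the d next-largest degrees
            seq[i] -= 1
            if seq[i] < 0:
                return False
        seq.sort(reverse=True)
    return True

print(is_graphic([5, 3, 3, 3, 3, 3, 2]))  # True, as in Question 1.2.9(i)
print(is_graphic([6, 4, 4, 2, 2, 1, 1]))  # False, as in Question 1.2.9(ii)
```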

Chapter 2

Walks, Connectivity and Trees


2.1 Walks, trails, paths and cycles in graphs

A walk in a graph G is an alternating sequence of vertices and edges v1 e1 v2 e2 v3 . . . vn−1 en−1 vn such that ei = vi vi+1, i = 1, 2, . . . , n − 1. For example, in Figure 2.1, x, xz, z, zx, x, xv, v is a walk. A walk is closed if v1 = vn and open if v1 ≠ vn. In Figure 2.1, x, xz, z, zx, x, xv, v is an open walk, and x, xz, z, zx, x is a closed walk. Certainly, a walk is determined by v1 v2 . . . vn, i.e., the sequence of its vertices. For example, the walk x, xz, z, zx, x, xv, v can be written as xzxv. We say that a walk v1 v2 . . . vn is from v1 to vn; its first vertex is v1 and its last vertex is vn. We say that n − 1 is the length of the walk (its number of edges). Walk xzxv is from x to v and its length is 3. A trail in a graph G is a walk in which all edges are distinct. Walk x, xz, z, zx, x, xv, v in Figure 2.1 is not a trail as edge xz (= zx) repeats itself. Walk uvwuy is a trail. A path is a walk with distinct vertices. In a cycle all vertices are distinct with the exception of the first and last ones, which coincide. Thus, in a cycle v1 v2 . . . vn we have v1 = vn. We say that v1 v2 . . . vn is through vi for any i = 1, 2, . . . , n. In Figure 2.1, trail uvwuy is not a path as u repeats itself. Trail uvwx is a path and trail uvwxu is a cycle.


Figure 2.1: A graph G


Figure 2.2: A digraph D

Question 2.1.1 Which of the following walks is a trail (path, cycle): ztyt, ywvxw, ywvx, ywvzty? The following proposition is one of the reasons why mostly paths and cycles, and not general walks and trails, are of interest in graph theory. Proposition 2.1.2 Let G be a graph and let W be a walk in G with first vertex u and last vertex v ≠ u. Then G has a path from u to v. Proof: Consider a shortest walk W′ from u to v. Suppose that some vertices in W′ coincide, i.e., there is a vertex w such that W′ = u . . . w . . . w . . . v. However, this means that G has a walk u . . . w . . . v, which is shorter than W′, a contradiction. Hence, all vertices of W′ are distinct, and thus W′ is a path. QED The definitions and results of this section hold also for digraphs. For example, in the digraph of Figure 2.2, zuwuv is a trail, zuv is a path and uwu is a cycle.

2.2

Connectivity

A graph G is connected if there is a walk from any vertex of G to any other vertex of G. By Proposition 2.1.2, G is connected if and only if there is a path from any vertex of G to any other vertex of G. Clearly, every complete graph and every complete bipartite graph are connected, for they are in one piece. Suppose that a graph G with vertices v1, v2, . . . , vn is connected. Merge a path from v1 to v2 with a path from v2 to v3, with a path from v3 to v4, etc., with a path from vn−1 to vn. As a result, we get a walk containing all vertices of G. At the same time, if a graph H has a walk containing all vertices, then parts of this walk provide walks between pairs of vertices in H. Thus, we have obtained the following: Proposition 2.2.1 A graph G is connected if and only if G has a walk containing all vertices.

Figure 2.3: A graph H.


Some graphs consist of many pieces. Consider Figure 2.3. The graph H there is not connected; it is disconnected. It consists of two pieces: one is the subgraph induced by vertices j5, p4, p7, and the other is the subgraph induced by the rest of the vertices. These two subgraphs are called connectivity components of H. In general, connectivity components of a graph G are maximal connected induced subgraphs of G. There are many applications of connectivity. Consider one of them. Treat the countries of the world as vertices of a graph. We say that two countries have strong economical relations if the total annual trade between them (in both directions) is at least 1,000,000. Finding connectivity components in this graph would allow us to see which countries are economically dependent on each other, directly or indirectly. Now we will introduce a simple, yet very important, technique in algorithmic graph theory called depth-first search (DFS). DFS allows us, in particular, to find connectivity components of a graph very efficiently. Let G = (V, E) be a graph. In DFS, we start from an arbitrary vertex of G. At every stage of DFS, we visit some vertex x of G. If x has an unvisited neighbour y, we visit the vertex y (if x has more than one unvisited neighbour, we choose y as an arbitrary unvisited neighbour). If x has no unvisited neighbour, we call x explored and return to the predecessor of x (the vertex from which we have moved to x). If x does not have a predecessor, we find an unvisited vertex to restart the above procedure. If such a vertex does not exist, we stop. Each time we restart, we start a new connectivity component. Actually, DFS can be used not only to find connectivity components, but also to compute a spanning forest in a graph, as we will see later. In our formal description of DFS for connectivity components (DFS-CC), each vertex x of G gets a stamp: visit(x) = 0 when x has not been visited yet and visit(x) = 1 once x has been visited.
In the following description, N(v) is the set of neighbours of a vertex v. To list all vertices of each connectivity component, we use root(v) and List(v): root(v) equals some vertex x in the connectivity component containing v (we may call x a


Figure 2.4: Disconnected graph H.

root-vertex) and List(v) is a set such that if List(v) is non-empty it contains all vertices belonging to the same component as v, i.e., vertices with the same root-vertex.

DFS-CC
Input: A graph G = (V, E).
Output: List(v) for each v ∈ V, such that if List(v) is non-empty it contains all vertices belonging to the same component as v.
1. for v ∈ V do {root(v) := v; visit(v) := 0}
2. for v ∈ V do if visit(v) = 0 then DFS-PROC(v)
3. for v ∈ V do List(v) := ∅
4. for v ∈ V do {u := root(v); List(u) := List(u) ∪ {v}}

DFS-PROC(v):
1. visit(v) := 1
2. for u ∈ N(v) do {if visit(u) = 0 then {root(u) := root(v); DFS-PROC(u)}}

Clearly, the main body of the algorithm takes O(|V|) time. The total time for executing the different calls of the procedure DFS-PROC is O(|E|) since ∑x∈V d(x) = 2|E| by the sum-of-degrees proposition. As a result, the time complexity of DFS-CC is O(|V| + |E|). Thus, we have the following: Proposition 2.2.2 For a graph G = (V, E), we can find all connectivity components in time O(|V| + |E|).


Question 2.2.3 Apply DFS-CC to find connectivity components in graph H of Figure 2.4. Assume that in the loops of DFS-CC and DFS-PROC the vertices of H are considered in alphabetical order. Solution: Since in line 2 of DFS-CC the vertices of H are considered in alphabetical order, a is visited first and the first four vertices to be visited are a, d, b, c (in this order). We'll have root(a) = root(d) = root(b) = root(c) = a. After that the next four vertices will be visited in the order e, g, f, h. As a result, root(e) = root(g) = root(f) = root(h) = e. Finally, the last four vertices will be visited in the following order: k, l, m, n, and we'll have root(k) = root(l) = root(m) = root(n) = k. In line 4 of DFS-CC, we'll get List(a) = {a, b, c, d}, List(e) = {e, f, g, h}, List(k) = {k, l, m, n}. The rest of the lists will remain empty. Thus, we've obtained three connectivity components in H.
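For concreteness, DFS-CC translates almost line by line into Python (our transcription; the small disconnected graph at the bottom is a made-up example, not graph H).

```python
def connectivity_components(vertices, adj):
    """DFS-CC: return the components of a graph given as adjacency lists."""
    root = {v: v for v in vertices}
    visited = {v: False for v in vertices}

    def dfs(v):
        visited[v] = True
        for u in adj[v]:
            if not visited[u]:
                root[u] = root[v]   # u inherits the root of its component
                dfs(u)

    for v in vertices:
        if not visited[v]:
            dfs(v)                  # each restart opens a new component

    components = {}
    for v in vertices:
        components.setdefault(root[v], []).append(v)
    return list(components.values())

# Hypothetical disconnected graph: a-b-c is one component, d-e another.
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b"], "d": ["e"], "e": ["d"]}
print(connectivity_components(["a", "b", "c", "d", "e"], adj))
# [['a', 'b', 'c'], ['d', 'e']]
```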

Figure 2.5: A graph G

Question 2.2.4 The algorithm DFS-CC is applied to nd connectivity components in the graph G of Figure 2.5. Assume that in the loops of DFS-CC and DFS-PROC the vertices of G are considered in the natural order. Give the order in which the vertices of G are visited. Solution: The vertices of G are visited in the following order: 1, 2, 3, 5, 4, 6, 13, 16, 14, 15, 17, 7, 8, 10, 9, 11, 12. [This answer can be considered as a model answer to an exam question of this type.] Question 2.2.5 The algorithm DFS-CC is applied to nd connectivity components in the graph G of Figure 2.6. Assume that in the loops of DFS-CC and DFS-PROC the vertices of G are considered in the natural order. Give the order in which the vertices of G are visited.


Figure 2.6: Disconnected graph G

Question 2.2.6 The algorithm DFS-CC is applied to nd connectivity components in graph H of Figure 2.3. Assume that in the loops of DFS-CC and DFS-PROC the vertices of H are considered in the following order: j1 , j2 , . . . , j5 , p1 , p2 , . . . , p7 . Give the order in which the vertices of H are visited. How many connectivity components are in H?

2.3

Edge-connectivity and Vertex-connectivity (extra material)

Some other applications of connectivity are related to reliability of networks. A typical question is as follows: a graph G is connected; what is the minimum number of edges that have to be deleted from G to make G disconnected? Clearly, the larger that number, called the edge-connectivity of G, the more reliable the network represented by G. The edge-connectivity of G is denoted by λ(G). One can easily see that λ(Pn) = 1 (n ≥ 2) and λ(Cn) = 2 (n ≥ 3). Question 2.3.1 Prove that λ(Kn) = n − 1 (n ≥ 2) and λ(Kp,q) = min{p, q} (max{p, q} > 1). In general, for a graph G = (V, E), λ(G) ≤ min{dG(x) : x ∈ V}, since deleting all edges with a common end-vertex will leave that vertex isolated from the rest of the graph. For many graphs we have simply λ(G) = min{dG(x) : x ∈ V}. In particular, λ(Kn) = min{d(x) : x ∈ V} = n − 1. However, for the graph G in Figure 2.7, λ(G) = 2 (delete edges ch, hd and G becomes disconnected; deletion of any single edge will not make G disconnected), but min{dG(x) : x ∈ V} = 3.



Figure 2.7: A graph

For a pair x, y of vertices in a graph G, λxy(G) is the minimum number of edges whose deletion from G results in a graph in which x and y belong to different connectivity components. The parameter λxy(G) is called the local edge-connectivity between x and y. Notice that λ(G) = min{λxy(G) : x ≠ y ∈ V(G)}. A pair P, Q of paths are called edge-disjoint if no edge of P is an edge of Q and no edge of Q is an edge of P. The following important theorem links edge-disjoint paths and local edge-connectivity. Theorem 2.3.2 (Menger) For a pair x, y of distinct vertices of a graph G, λxy(G) equals the maximum number of edge-disjoint paths between x and y. Sometimes, reliability of a network (graph) G is determined not by the minimum number of edges whose deletion makes G disconnected, but by the minimum number of vertices of G whose deletion makes G disconnected. The last parameter is called the connectivity (or, vertex connectivity) of G. The connectivity of G is denoted by κ(G). One can easily see that κ(Pn) = 1 (n ≥ 3) and κ(Cn) = 2 (n ≥ 3). By definition, we assume that κ(Kn) = n − 1. All connected graphs apart from K1 have positive vertex connectivity. Question 2.3.3 Prove that κ(Kp,q) = min{p, q} (max{p, q} > 1). Theorem 2.3.4 For every graph G = (V, E), κ(G) ≤ λ(G) ≤ min{d(x) : x ∈ V}. Proof: If G is disconnected, then κ(G) = λ(G) = 0 and thus the inequality follows. Assume now that G is connected. Let F be a subset of edges in G such that G − F is disconnected and |F| = λ(G). Let U be a set of vertices formed by taking one vertex from each edge in F. Then G − U is disconnected and κ(G) ≤ |U| ≤ |F| = λ(G). Thus, κ(G) ≤ λ(G). Let y ∈ V be such that d(y) = min{d(x) : x ∈ V}. If we delete all edges having y as an end-vertex, we disconnect y from the rest of the graph. Thus, min{d(x) : x ∈ V} = d(y) ≥ λ(G). QED


Figure 2.8: Trees

Similarly to λxy(G), we define κxy(G), which is the minimum number of vertices whose deletion makes G disconnected such that x and y belong to different components. Notice that κ(G) = min{κxy(G) : x ≠ y ∈ V(G)}. Theorem 2.3.5 (Menger) For a pair x, y of distinct vertices of a graph G, κxy(G) equals the maximum number of paths between x and y with the following property: no pair of the paths has any common vertices apart from x and y. Menger's Theorems make it possible to construct efficient algorithms to compute λ(G) and κ(G) and their local variations, but this material is outside the scope of these lecture notes.

2.4

Basic properties of trees and forests

A forest is a graph with no cycle. A tree is a connected forest. Trees and forests play an important role in many applications of graph theory, especially in computer science.

Theorem 2.4.1 Let T be a tree with n vertices. Then
(a) T has n − 1 edges;
(b) addition of an edge between two non-adjacent vertices in T creates exactly one cycle;
(c) there is exactly one path between any pair of vertices in T;
(d) deletion of an edge from T creates a disconnected graph with two connectivity components.

According to (d), a tree is a minimally connected graph. Thus, if we want to connect a set of newly created camps by roads with minimum expenses, we should construct a tree system of the roads. Question 2.4.2 Prove that for a forest F = (V, E) consisting of c trees, we have |E| = |V| − c.


Solution: Let T1 = (V1, E1), T2 = (V2, E2), . . . , Tc = (Vc, Ec) be the trees of F = (V, E). By (a) of Theorem 2.4.1, we have |Ei| = |Vi| − 1 (i = 1, 2, . . . , c). Thus,

|E| = ∑_{i=1}^{c} |Ei| = ∑_{i=1}^{c} (|Vi| − 1) = ∑_{i=1}^{c} |Vi| − c = |V| − c.

QED A vertex of degree 1 is called a leaf. Theorem 2.4.3 (Leaf Theorem) Every tree with at least two vertices has a vertex of degree 1. Proof: Let T = (V, E) be a tree. Since T is connected and has at least two vertices, d(v) ≥ 1 for every v ∈ V. Assume that d(v) ≥ 2 for every v ∈ V. By the sum-of-degrees proposition, 2|E| ≥ 2n, where n = |V|. Thus, |E| ≥ n. But |E| = n − 1, a contradiction. So, there is a vertex of degree 1. QED The Leaf Theorem can be improved as follows: Question 2.4.4 [A solution is given at the end of the chapter.] Let T be a tree with at least two vertices. Prove that T has at least two leaves.

2.5

Spanning trees and forests

Let G be a connected graph. Let us construct a connected spanning subgraph H of G with the minimum number of edges by deleting edges one by one. We claim that H is a tree. Indeed, H is connected. Assume that H has a cycle. But by deleting an edge of that cycle we get a connected spanning subgraph of G, a contradiction to the minimality of H. Thus, every connected graph G has a tree as a spanning subgraph; it is called a spanning tree T of G. If G has several components, we can find a spanning tree in each component. As a result we get a spanning forest of G. The following pseudo-code finds a spanning forest in a graph G. The pseudo-code DFS-SF is a DFS algorithm, a modification of DFS-CC in Section 2.2.

DFS-SF
Input: A graph G = (V, E).
Output: the set F of all edges that belong to a spanning forest of G
1. F := ∅; for v ∈ V do visit(v) := 0



Figure 2.9: Graphs

2. for v ∈ V do if visit(v) = 0 then DFS-PROC1(v)

DFS-PROC1(v):
1. visit(v) := 1
2. for u ∈ N(v) do {if visit(u) = 0 then {F := F ∪ {uv}; DFS-PROC1(u)}}

Question 2.5.1 Apply DFS-SF to nd spanning trees in graphs depicted in Figure 2.9.
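A Python transcription of DFS-SF (ours): the only change from DFS-CC is that the tree edge vu is recorded whenever the search moves to an unvisited vertex u.

```python
def spanning_forest(vertices, adj):
    """DFS-SF: collect the edges of a DFS spanning forest of the graph."""
    visited = {v: False for v in vertices}
    forest = []

    def dfs(v):
        visited[v] = True
        for u in adj[v]:
            if not visited[u]:
                forest.append((v, u))   # vu is a tree edge
                dfs(u)

    for v in vertices:
        if not visited[v]:
            dfs(v)
    return forest

# A made-up 4-cycle a-b-c-d-a: any spanning tree keeps 3 of its 4 edges.
adj = {"a": ["b", "d"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "a"]}
print(spanning_forest(["a", "b", "c", "d"], adj))
# [('a', 'b'), ('b', 'c'), ('c', 'd')]
```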

2.6

Greedy-type algorithms and minimum weight spanning trees

In many applications, weighted graphs are of interest. A graph G = (V, E) is weighted if there is an assignment of edges to non-negative real numbers, i.e., every edge has a weight. The weights may reflect various parameters such as distances between vertices, or the time or cost of going between vertices. Graph G in Figure 2.10 is a weighted graph. The weight of a graph is the sum of the weights of its edges. The following minimum connector problem is of interest: Given a weighted connected graph G, find a spanning connected subgraph of G of minimum weight. This problem arises in applications. For example, suppose we created a number of new villages in the middle of nowhere and want to connect the villages with roads such that the total distance of the roads is minimum. According to the properties of trees, the minimum weight connector is a spanning tree of minimum weight. To find this tree T the following greedy algorithm can be used. Rank the edges of G as e1, e2, . . . , em such that w(e1) ≤ w(e2) ≤ · · · ≤ w(em). Pick edges in that order one by one and add them to the (initially empty) T, except when the current edge creates a cycle with previously chosen edges.



Figure 2.10: A weighted graph and its minimum weight spanning trees

The usefulness of the greedy algorithm is due to the following result, which we do not prove. Theorem 2.6.1 The greedy algorithm always finds a minimum weight spanning tree. Question 2.6.2 Find a minimum weight spanning tree in graph G in Figure 2.10. How many minimum weight spanning trees does G have? Solution: We use the greedy algorithm. We rank the edges of G in the following order: cd, bc, ef, bf, be, ad, ab. We start from an empty T. We pick edges cd, bc, ef and bf without creating any cycle and thus add them to T. Edge be cannot be added to T as it creates the cycle befb with edges ef and bf chosen earlier. We add edge ad to T, but we do not add ab to T as it creates the cycle abcda with previously chosen edges. As a result we get T in Figure 2.10. We can rank the edges of G slightly differently: cd, bc, ef, be, bf, ad, ab. Then the greedy algorithm constructs Q in Figure 2.10 (edge be gets chosen before bf, and bf cannot be picked as it creates a cycle with previously chosen edges). Thus, we get another minimum weight spanning tree. Since bc and ef have the same weight we can have several rankings of the edges, but the order of these two edges does not matter, since they both will be chosen no matter what ranking is considered. Thus, G has exactly two minimum weight spanning trees. Question 2.6.3 Find a minimum weight spanning tree and the number of minimum weight spanning trees in the graphs of Figure 2.11. The greedy algorithm for finding a minimum weight spanning tree is often called Kruskal's algorithm. We give a pseudo-code of an implementation of Kruskal's algorithm below. This is a relatively simple code, though not the most efficient implementation of the algorithm. More efficient implementations won't be considered in this course.



Figure 2.11: Weighted graphs

Kruskal's Algorithm
Input: A connected graph G = (V, E) with weights on the edges.
Output: A minimum weight spanning tree T = (V, F) of G.
1. for v ∈ V do root(v) := v
2. F := ∅
3. sort the edges of G in the non-decreasing order e1, e2, . . . , em of their weights (i.e., w(e1) ≤ . . . ≤ w(em)) and output it as a queue Q.
4. until Q = ∅ do {
delete the head uv of Q from Q;
if root(u) ≠ root(v) then {
add uv to F;
for each x ∈ V do {if root(x) = root(v) then root(x) := root(u)}
}
}
Theorem 2.6.4 The above implementation of Kruskal's algorithm is correct (i.e., always produces the right solution to the minimum weight spanning tree problem).

Proof: We apply Theorem 2.6.1. Initially any greedy algorithm starts from the zero-edge subgraph of the graph G. Assume that in the course of the algorithm we have produced a forest with connectivity components (i.e., trees) T1, T2, . . . , Tp (the trees include all vertices of G, i.e., some of them may consist of a single vertex). The algorithm chooses the next edge uv, which is the lightest edge among those edges that have not been considered (for inclusion in the minimum weight spanning tree) by the algorithm so far. The algorithm must include uv if and only if its inclusion does not create a cycle in T1 ∪ T2 ∪ . . . ∪ Tp. This means that the algorithm must include uv if and only if u and v belong to different components of T1 ∪ T2 ∪ . . . ∪ Tp, i.e., to different trees. Thus, the implementation of the algorithm above starts from creating the zero-edge subgraph of G in Steps 1 and 2. Step 1 indicates that each component of the subgraph consists of a single vertex, which is the root of the component. In Step 3 we sort all edges of G and keep them in a queue Q (the lightest edge is the head of Q). In Step 4 we delete the lightest edge uv from the current Q. Then we check whether u and v belong to the same connectivity component of the current forest by comparing the root vertices of their components. If u and v belong to different components, we add uv to T and merge the two components by assigning the root of u as the root of all vertices in the component of v. The loop of Step 4 lasts until Q is empty. QED The next theorem gives the running time of the implementation. Theorem 2.6.5 The above implementation of Kruskal's algorithm can be run in time O(|V|² + |E| log |E|). Proof: Steps 1 and 2 run in time O(|V|). Step 3 can run in time O(|E| log |E|) using one of the efficient sorting algorithms.
In Step 4 we consider all |E| edges of G, but only |V| − 1 of them will be included in T (since T has |V| vertices and |V| − 1 edges by Theorem 2.4.1). Any included edge uv will require O(|V|) operations to merge the components of u and v. Thus, Step 4 will require O(|E| + (|V| − 1)|V|) = O(|E| + |V|²) operations. QED Question 2.6.6 Applying Kruskal's algorithm, find a minimum weight spanning tree in the graphs G of Figures 2.10 and 2.11. There is another algorithm for finding a minimum weight spanning tree T. This is not the greedy algorithm, but a greedy-type algorithm called Prim's algorithm. The main idea of Prim's algorithm is to build T by starting from a single vertex and increasing the current T by appending to it new vertices. It is possible to prove that Prim's algorithm always solves the problem if at each iteration we add a vertex x ∈ V − VT with minimum weight min{w(xz) : z ∈ VT}, i.e., with the minimum distance to the current T.

Prim's Algorithm
Input: A connected graph G = (V, E) with weight w(uv) on every edge uv. We assume that w(uv) = ∞ if uv ∉ E.
Output: A minimum weight spanning tree T = (VT, ET) of G.
1. ET := ∅; choose a vertex u; VT := {u}
2. for v ∈ V − {u} do dist(v) := w(vu)
3. until V = VT do {
4. find x ∈ V − VT with minimum dist(x) (here dist(x) = w(xv), v ∈ VT);
VT := VT ∪ {x}; ET := ET ∪ {xv}
5. for y ∈ V − VT do dist(y) := min{dist(y), w(yx)}
}
Step 1 initializes T. Step 2 gives every vertex outside T its initial value of dist, so that in the first iteration the vertex nearest to u is chosen. The loop starting at Step 3 increases T by adding one vertex x at a time. The added vertex x, as we pointed out above, must have the minimum distance to the current T. Step 5 updates dist(y) for each y outside T. The update is correct since only one vertex has been added to T, so dist(y) decreases only if w(yx) < dist(y). Theorem 2.6.7 The above implementation of Prim's algorithm runs in time O(|V|²). Proof: Steps 1 and 2 require O(|V|) operations. In each iteration of the loop of Step 3 we spend O(|V|) time finding x (and v), updating T and dist(y) for each y outside T. There are |V| iterations of the loop. So, we get O(|V|²) operations as required. QED Question 2.6.8 Applying Prim's algorithm, find a minimum weight spanning tree in the graphs G of Figures 2.10 and 2.11.
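Both pseudo-codes above can be transcribed into short Python sketches (ours). The example graph is Figure 2.10 as reconstructed from the worked solution of Question 2.6.2, so the weights are our reading of that solution; either algorithm yields a spanning tree of total weight 13.

```python
def kruskal(vertices, edges):
    """Kruskal: edges is a list of (weight, u, v); returns MST edge list."""
    root = {v: v for v in vertices}
    tree = []
    for w, u, v in sorted(edges):
        if root[u] != root[v]:              # uv joins two components
            tree.append((u, v, w))
            old, new = root[v], root[u]     # merge v's component into u's
            for x in vertices:
                if root[x] == old:
                    root[x] = new
    return tree

def prim(vertices, weight, start):
    """Prim: weight maps (u, v) -> w (both directions);
    returns the total weight of a minimum spanning tree."""
    inf = float("inf")
    in_tree = {start}
    dist = {v: weight.get((start, v), inf) for v in vertices if v != start}
    total = 0
    while len(in_tree) < len(vertices):
        x = min(dist, key=dist.get)         # closest vertex to the tree
        total += dist.pop(x)
        in_tree.add(x)
        for y in dist:
            dist[y] = min(dist[y], weight.get((x, y), inf))
    return total

# Figure 2.10 as reconstructed from Question 2.6.2 (weights are ours).
V = ["a", "b", "c", "d", "e", "f"]
E = [(1, "c", "d"), (2, "b", "c"), (2, "e", "f"), (3, "b", "f"),
     (3, "b", "e"), (5, "a", "d"), (6, "a", "b")]
W = {}
for w, u, v in E:
    W[(u, v)] = W[(v, u)] = w
T = kruskal(V, E)
print(sum(w for _, _, w in T))  # 13
print(prim(V, W, "a"))          # 13, the same total weight
```

The root-merging loop in kruskal is the O(|V|) component merge of the simple implementation; a disjoint-set (union-find) structure would make it faster, as the notes hint when mentioning more efficient implementations.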

2.7

Solutions

Question 2.4.4 Induction on the number of vertices n. It's trivial for n = 2.


Suppose it's true for all trees with n − 1 ≥ 2 vertices. We prove it for an arbitrary tree T with n vertices. By the Leaf Theorem, T has a leaf x. Consider T − x; by the induction hypothesis, T − x has two leaves y, z. If x is not adjacent to either of them, T has three leaves: x, y, z. Since x can be adjacent to only one of them, let's assume that x is adjacent to y. Then T has two leaves: x, z.


Chapter 3

Directed graphs
3.1 Acyclic digraphs

For undirected graphs we have studied trees, which are connected acyclic graphs, since trees have numerous applications. Similarly, acyclic digraphs have numerous applications. A digraph is acyclic if it has no directed cycle. Actually, when we speak of cycles or paths in digraphs, we always mean directed cycles and paths without stating it. Digraph D in Figure 1.4 is not acyclic as it has cycle uwu. The digraphs in Figure 1.12 are acyclic. Clearly, the operation of converse cannot create cycles in acyclic digraphs, and thus the converse of an acyclic digraph is an acyclic digraph.

3.1.1

Acyclic ordering of acyclic digraphs

Proposition 3.1.1 Every acyclic digraph has a vertex of in-degree zero as well as a vertex of out-degree zero.

Proof: Let D be a digraph in which all vertices have positive out-degrees. We show that D = (V, A) has a cycle. Choose a vertex v1 in D. Since d+(v1) > 0, there is a vertex v2 such that v1v2 is an arc. As d+(v2) > 0, v2v3 is an arc for some v3. Proceeding in this manner, we obtain walks of the form v1v2 . . . vk. As V is finite, there exists the least k > 2 such that vk = vi for some 1 ≤ i < k. Clearly, vivi+1 . . . vk is a cycle. Thus an acyclic digraph D has a vertex of out-degree zero. Since the converse H of D is also acyclic, H has a vertex v of out-degree zero. Clearly, the vertex v has in-degree zero in D. QED


Proposition 3.1.1 allows one to check whether a digraph D is acyclic: if D has a vertex of out-degree zero, then delete this vertex from D and consider the resulting digraph; otherwise, D contains a cycle. Let D be a digraph and let x1, x2, . . . , xn be an ordering of its vertices. We call this ordering an acyclic ordering if, for every arc xixj in D, we have i < j. Clearly, an acyclic ordering of D induces an acyclic ordering of every subdigraph H of D. Since no cycle has an acyclic ordering, no digraph with a cycle has an acyclic ordering. On the other hand, the following holds: Proposition 3.1.2 Every acyclic digraph has an acyclic ordering of its vertices. Proof: We give a constructive proof by describing a procedure that generates an acyclic ordering of the vertices in an acyclic digraph D = (V, A). At the first step, we choose a vertex v with in-degree zero. (Such a vertex exists by Proposition 3.1.1.) Set x1 = v and delete x1 from D. At the ith step, we find a vertex u of in-degree zero in the remaining acyclic digraph, set xi = u and delete xi from the remaining acyclic digraph. The procedure has |V| steps. Suppose that xixj is an arc in D, but i > j. As xj was chosen before xi, this means that xj was not of in-degree zero at the jth step of the procedure; a contradiction. QED Question 3.1.3 Find all acyclic orderings for digraph Q in Figure 1.12. Solution: Vertex a is the only vertex of in-degree 0 in Q. Hence a is the first vertex in any acyclic ordering. When we delete a from Q, we get only b of in-degree 0. So, a, b is the beginning of any acyclic ordering. After deleting b, we get c and d of in-degree 0. So, now we may choose either c or d as the next vertex in an acyclic ordering. If we choose c, getting the partial acyclic ordering a, b, c, and delete it, d becomes the only vertex of in-degree 0. We include d in the ordering and get a, b, c, d, delete d, choose e, and then the remaining f. Thus, a, b, c, d, e, f is an acyclic ordering.
If we choose d, getting the partial acyclic ordering a, b, d, and delete it, c becomes the only vertex of in-degree 0. We include c in the ordering and get a, b, d, c, delete c, choose e, and then the remaining f. Thus, a, b, d, c, e, f is another acyclic ordering. We see that there are only two acyclic orderings of Q and they are given above. Question 3.1.4 Find all acyclic orderings for digraphs in Figure 3.1.
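The peeling procedure from the proof of Proposition 3.1.2 is easy to code. In the Python sketch below (ours), the arcs of Q are a reconstruction consistent with the worked solution of Question 3.1.3; the function also detects cycles, implementing the acyclicity check that follows Proposition 3.1.1.

```python
def acyclic_ordering(vertices, arcs):
    """Repeatedly peel off a vertex of in-degree zero; returns an acyclic
    ordering, or None if the digraph has a cycle (no such vertex exists
    at some step)."""
    indeg = {v: 0 for v in vertices}
    out = {v: [] for v in vertices}
    for x, y in arcs:
        indeg[y] += 1
        out[x].append(y)
    order = []
    zero = [v for v in vertices if indeg[v] == 0]
    while zero:
        v = min(zero)                # break ties alphabetically
        zero.remove(v)
        order.append(v)
        for u in out[v]:             # "delete" v from the digraph
            indeg[u] -= 1
            if indeg[u] == 0:
                zero.append(u)
    return order if len(order) == len(vertices) else None

# Our reconstruction of digraph Q, consistent with Question 3.1.3.
arcs = [("a", "b"), ("b", "c"), ("b", "d"), ("c", "e"), ("d", "e"), ("e", "f")]
print(acyclic_ordering("abcdef", arcs))   # ['a', 'b', 'c', 'd', 'e', 'f']
print(acyclic_ordering("uv", [("u", "v"), ("v", "u")]))  # None: cycle uvu
```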

Now we consider an algorithm for finding an acyclic ordering of an acyclic digraph. Recall that, for a vertex x of a digraph D = (V, A), N+(x) denotes the set of out-neighbours of x, i.e., vertices y such that xy ∈ A.


Figure 3.1: Acyclic digraphs

DFS-A(D)
Input: A digraph D = (V, A) on n vertices.
Output: An acyclic ordering v1, . . . , vn of D.
1. for v ∈ V do tvisit(v) := 0
2. i := n + 1
3. for v ∈ V do {if tvisit(v) = 0 then DFS-PROC2(v)}

DFS-PROC2(v)

1. tvisit(v) := 1
2. for u ∈ N+(v) do {if tvisit(u) = 0 then DFS-PROC2(u)}
3. i := i − 1, vi := v.

Theorem 3.1.5 The algorithm DFS-A correctly determines an acyclic ordering of any acyclic digraph in time O(|V | + |A|). Figure 3.2 illustrates the result of applying DFS-A to an acyclic digraph starting from vertex x and restarting from vertex z. The resulting acyclic ordering is z, w, u, y, x, v. Question 3.1.6 Find all acyclic orderings for digraphs in Figure 3.1 using DFS-A.
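A Python transcription of DFS-A (ours): instead of the decreasing counter i, each vertex is prepended to the ordering when DFS-PROC2 finishes with it, which has the same effect. The three-vertex digraph at the bottom is a made-up example.

```python
def dfs_acyclic_ordering(vertices, out):
    """DFS-A: return an acyclic ordering of an acyclic digraph.
    out[v] is the list of out-neighbours N+(v)."""
    visited = {v: False for v in vertices}
    order = []                     # built back to front

    def dfs(v):
        visited[v] = True
        for u in out[v]:
            if not visited[u]:
                dfs(u)
        order.insert(0, v)         # v_i := v, with i decreasing

    for v in vertices:
        if not visited[v]:
            dfs(v)
    return order

# A small made-up acyclic digraph: z -> x, z -> y, x -> y.
out = {"x": ["y"], "y": [], "z": ["x", "y"]}
order = dfs_acyclic_ordering(["x", "y", "z"], out)
print(order)
# Check the acyclic-ordering property: every arc goes left to right.
pos = {v: i for i, v in enumerate(order)}
print(all(pos[x] < pos[y] for x in out for y in out[x]))  # True
```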


Figure 3.2: The result of applying DFS-A to an acyclic digraph

3.1.2

Longest and Shortest paths in acyclic digraphs

Let D = (V, A, w) be an arc-weighted acyclic digraph. We'll show that longest paths from a vertex s to the rest of the vertices can be found quite easily, using dynamic programming. Without loss of generality, we may assume that the in-degree of s is zero. Let L = v1, v2, . . . , vn be an acyclic ordering of the vertices of D such that v1 = s. Denote by ℓ(vi) the length of a longest path from s to vi. Clearly, ℓ(v1) = 0. For every i, 2 ≤ i ≤ |V|, we have

ℓ(vi) = max{ℓ(vj) + w(vj, vi) : vj ∈ N−(vi)} if N−(vi) ≠ ∅, and ℓ(vi) = −∞ otherwise,   (3.1)

where N−(vi) is the set of in-neighbours of vi. The correctness of this formula can be shown by the following argument. We may assume that vi is reachable from s. Since the ordering L is acyclic, the vertices of a longest path P from s to vi belong to {v1, v2, . . . , vi}. Let vk be the vertex just before vi in P. By induction, ℓ(vk) is computed correctly using (3.1). The term ℓ(vk) + w(vk, vi) is one of the terms in the right-hand side of (3.1). Clearly, it provides the maximum. The algorithm has two phases: the first finds an acyclic ordering, the second implements Formula (3.1). The complexity of this algorithm is O(|V| + |A|) since the first phase runs in time O(|V| + |A|) (see DFS-A) and the second phase requires the same asymptotic time due to the formula ∑x∈V d−(x) = |A|. We illustrate the above algorithm and how to find an actual longest path from s to another vertex in Question 3.1.9. Question 3.1.7 Give a pseudo-code of the above algorithm.


Figure 3.3: Weighted digraphs

Question 3.1.8 What modifications to the above algorithm are required to obtain an algorithm for finding a shortest path from a fixed vertex s to each other vertex of an arc-weighted acyclic digraph D? [A solution is given at the end of the chapter.] Question 3.1.9 Use the above algorithm to find the length of a longest path from a to f in digraph Q in Figure 3.3. Solution: For a vertex x, let ℓ(x) be the length of a longest path from a to x. Consider the acyclic ordering a, b, d, c, e, f. Let's establish ℓ(x) in the order of the acyclic ordering. We have ℓ(a) = 0. Vertex a is the only in-neighbour of b, so ℓ(b) = ℓ(a) + w(ab) = 2. Similarly for d: ℓ(d) = ℓ(a) + w(ad) = 3. Vertex c has three in-neighbours a, b, d. Hence, ℓ(c) = max{ℓ(a) + w(ac), ℓ(b) + w(bc), ℓ(d) + w(dc)} = max{1, 6, 8} = 8. For e, we have ℓ(e) = max{ℓ(c) + w(ce), ℓ(d) + w(de)} = max{12, 6} = 12. Finally, ℓ(f) = max{ℓ(c) + w(cf), ℓ(e) + w(ef)} = 14. So, the length of a longest path from a to f is 14. Question 3.1.10 Use the above algorithm to find the length of a shortest path from a to f in digraph R of Figure 3.3. Solution: For a vertex x, let s(x) be the length of a shortest path from a to x. Note that a, b, c, d, e, f is an acyclic ordering of the vertices of R. Let's establish s(x) in the order of the acyclic ordering. We have s(a) = 0, s(b) = s(a) + w(ab) = 2, s(c) = s(b) + w(bc) = 2 + 3 = 5. Since d has two in-neighbours, s(d) = min{s(a) + w(ad), s(b) + w(bd)} = min{1, 6} = 1. We have s(e) = min{s(c) + w(ce), s(d) + w(de)} = min{8, 6} = 6, and s(f) = min{s(c) + w(cf), s(e) + w(ef)} = min{5 − 1, 6 − 2} = 4. Question 3.1.11 Find an acyclic ordering of the vertices of the digraph D in Figure 3.4. Using the longest path algorithm for acyclic digraphs, compute the length of a longest path from s to t in D. Justify your answer. [A solution is given at the end of the chapter.]


CHAPTER 3. DIRECTED GRAPHS



Figure 3.4: An acyclic digraph D

3.1.3 Analyzing projects using PERT/CPM (extra material)

Often a large project consists of many activities, some of which can be done in parallel, while others can start only after certain activities have been accomplished. In such cases, the Critical Path Method (CPM) and the Program Evaluation and Review Technique (PERT) are of interest. They allow one to predict when the project will be finished and to monitor the progress of the project. They allow one to identify certain activities which should be finished on time if the predicted completion time is to be achieved.

CPM and PERT were developed independently in the late 1950s. They have many features in common and several others that distinguish them. However, over the years the two methods have practically merged into one combined approach often called PERT/CPM. Notice that PERT/CPM has been used in a large number of projects including new plant construction, NASA space exploration, movie production and ship building. PERT/CPM has many tools for project management, but we will restrict ourselves to a brief introduction and refer the reader to various books on operations research for more information on the method.

We will introduce PERT/CPM using an example. Suppose the tasks to complete construction of a house are as follows (in brackets we give their duration in days): Wiring (5), Plumbing (8), Walls & Ceilings (10), Floors (4), Exterior Decorating (3) and Interior Decorating (12). We cannot start doing Floors before Wiring and Plumbing have been accomplished, we cannot do Walls & Ceilings before Wiring has been finished, we cannot do Exterior Decorating before the task Walls & Ceilings has been completed, and we cannot do Interior Decorating before Walls & Ceilings and Floors have been finished. How much time do we need to accomplish the construction?

To solve the problem we first construct a digraph N, which is called an activity-on-node (AON) project network¹. We associate the vertices of N with the starting and
¹Original versions of PERT and CPM used another type of network, the activity-on-arc (AOA) project network, but AOA networks are significantly harder to construct and change than AON networks and it


Figure 3.5: House construction network

finishing points of the project (vertices S and F) and with the activities described above, i.e., Wiring (Wi), Plumbing (Pl), Floors (Fl), Walls & Ceilings (WC), Interior Decorating (ID) and Exterior Decorating (ED). The network N is a vertex-weighted digraph, where the weights of S and F are 0 and the weight of any other vertex is the duration of the corresponding activity. Observe that the duration of the house construction project equals the maximum weight of an (S, F)-path.

As in the example above, in the general case an AON network D is a vertex-weighted digraph with starting and finishing vertices S and F. Our initial aim is to find the maximum weight of an (S, F)-path in D. Since D is an acyclic digraph, this can be done in linear time using the algorithm described above after the vertex splitting procedure (replace every vertex x by two vertices x′, x′′ and the arc x′x′′; every arc xy is replaced by the arc x′′y′). We can also use dynamic programming directly: for a vertex x of D let t(x) be the earliest time by which the activity corresponding to x can be accomplished. Then t(S) = 0 and for any other vertex x we have

t(x) = w(x) + max{t(y) : y ∈ N⁻(x)},

where N⁻(x) is the set of in-neighbours of x and w(x) is the duration of the activity associated with x. To ensure that we know the value of t(y) for each in-neighbour y of x, we consider the vertices of D in an acyclic ordering. In the example above, S, Pl, Wi, Fl, WC, ID, ED, F is an acyclic ordering. Thus, we

makes more sense to use AON networks rather than AOA ones.


have: t(S) = 0, t(Pl) = 8 + 0 = 8, t(Wi) = 5 + 0 = 5, t(Fl) = w(Fl) + max{t(Pl), t(Wi)} = 4 + 8 = 12, t(WC) = 10 + 5 = 15, t(ID) = w(ID) + max{t(Fl), t(WC)} = 12 + 15 = 27, t(ED) = 3 + 15 = 18, t(F) = max{t(ID), t(ED)} = 27. The following path is of weight 27: S, Wi, WC, ID, F.

Every maximum weight (S, F)-path is called critical and every vertex (and the corresponding activity) belonging to a critical path is critical. Observe that to ensure that the project takes no longer than required, no critical activity should be delayed. At the same time, a delay of a non-critical activity may not affect the duration of the project. For example, if we do Plumbing in 11 days instead of 8 days, the project will be finished in 27 days anyway. This means that the project manager should monitor mainly critical activities and may delay non-critical activities in order to enforce critical ones (e.g., by moving workforce from a non-critical activity to a critical one). The manager may want to expedite the project (if, for example, earlier completion will result in a considerable bonus) by spending more money on it. This issue can be investigated using linear programming (studied, e.g., in CS3490).
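The computation of the earliest times t(x) above can be sketched as follows (a Python sketch: `durations`, `preds` and `order` are illustrative names; the numbers are the ones from the house construction example in the text):

```python
# Earliest completion times t(x) in an AON project network, via an acyclic ordering.
def earliest_times(order, preds, durations):
    t = {}
    for x in order:                                   # vertices in acyclic order
        before = [t[y] for y in preds.get(x, [])]     # t(y) for in-neighbours y of x
        t[x] = durations[x] + (max(before) if before else 0)
    return t

# House construction example: durations (days) and precedence constraints.
durations = {"S": 0, "Pl": 8, "Wi": 5, "Fl": 4, "WC": 10, "ID": 12, "ED": 3, "F": 0}
preds = {"Pl": ["S"], "Wi": ["S"], "Fl": ["Pl", "Wi"], "WC": ["Wi"],
         "ID": ["Fl", "WC"], "ED": ["WC"], "F": ["ID", "ED"]}
order = ["S", "Pl", "Wi", "Fl", "WC", "ID", "ED", "F"]
t = earliest_times(order, preds, durations)           # t["F"] == 27, as in the text
```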

3.2 Distances in digraphs

The distance from a vertex x in a digraph D to a vertex y is the length of a shortest path from x to y, if such a path exists, and it equals ∞ otherwise. The distance is denoted by dist(x, y). Algorithms for finding distances are of importance in many applications of digraphs.

3.2.1 Breadth First Search

Breadth First Search (BFS) is an algorithm for finding distances from a vertex s in a digraph D to all other vertices in D. BFS is based on the following simple idea. Starting at s, we visit each out-neighbour x of s. We set dist(s, x) := 1 and pred(x) := s (s is the predecessor of x). Now we visit all vertices y not yet visited which are out-neighbours of vertices x of distance 1 from s. We set dist(s, y) := 2 and pred(y) := x. We continue in this fashion until we have reached all vertices which are reachable from s (this will happen after at most n − 1 iterations, where n is the number of vertices in D). For the rest of the vertices z (not reachable from s), we set dist(s, z) := ∞. A more formal description of BFS is as follows. At the end of the algorithm, pred(v) = nil means that either v = s or v is not reachable from s. The correctness of the algorithm is due to the

3.2. DISTANCES IN DIGRAPHS

Figure 3.6: A digraph D in which the bold arcs indicate arcs used by BFS to nd distances from s.

fact that, upon termination, dist(s, x) equals the actual distance from s to x for every x ∈ V. For a vertex x, N⁺(x) denotes the set of out-neighbours of x.

BFS
Input: A digraph D = (V, A) and a vertex s ∈ V.
Output: dist(s, v) and pred(v) for all v ∈ V.

1. for v ∈ V do { dist(s, v) := ∞; pred(v) := nil }
2. dist(s, s) := 0; a queue Q := {s}
3. while Q ≠ ∅ do {
     delete a vertex u, the head of Q, from Q;
     for v ∈ N⁺(u) do {
       if dist(s, v) = ∞ then { dist(s, v) := dist(s, u) + 1; pred(v) := u; put v to the end of Q }
     }
   }

The complexity of the above algorithm is O(n + m), where n = |V| and m = |A|. Indeed, Step 1 requires O(n) time. The time to perform Step 3 is O(m) as the out-neighbours of every vertex are considered only once and ∑_{x∈V} d⁺(x) = m, by the sum-of-semi-degrees proposition. The algorithm is illustrated in Figure 3.6.

Question 3.2.1 (a) Using BFS, find distances from the vertex b in the digraph of Figure 1.14. (b) Using BFS, find distances from the vertex a in the digraph of Figure 1.14.

Question 3.2.2 Explain how we can use pred in BFS to find the actual shortest paths from s.
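A direct Python transcription of the BFS pseudocode above might look like this (a sketch; the adjacency-list representation `out_nbrs`, mapping every vertex to N⁺(v), is an assumption):

```python
from collections import deque

INF = float("inf")

def bfs_distances(out_nbrs, s):
    """BFS from s; returns dist and pred as in the text's pseudocode."""
    dist = {v: INF for v in out_nbrs}     # Step 1: all distances infinite,
    pred = {v: None for v in out_nbrs}    # all predecessors nil (None)
    dist[s] = 0                           # Step 2
    q = deque([s])
    while q:                              # Step 3
        u = q.popleft()                   # delete the head of Q
        for v in out_nbrs[u]:
            if dist[v] == INF:            # v not yet visited
                dist[v] = dist[u] + 1
                pred[v] = u
                q.append(v)               # put v at the end of Q
    return dist, pred
```

Unreachable vertices keep dist = ∞ and pred = None, matching the convention in the text.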


3.2.2 Dijkstra's algorithm

The next algorithm, due to Dijkstra, finds the distances from a given vertex s in a weighted digraph D = (V, A, c) to the rest of the vertices, provided that all the weights c(xy) of arcs xy are non-negative. In the course of the execution of Dijkstra's algorithm, the vertex set of D is partitioned into two sets, P and Q. Moreover, a parameter δv is assigned to every vertex v ∈ V. Initially all vertices are in Q. In the process of the algorithm, the vertices reachable from s move from Q to P. While a vertex v is in Q, the corresponding parameter δv is an upper bound on dist(s, v). Once v moves to P, we have δv = dist(s, v). A formal description of Dijkstra's algorithm follows. In the description, N⁺(v) is the set of out-neighbours of v.

Dijkstra's algorithm
Input: A weighted digraph D = (V, A, c), such that c(a) ≥ 0 for every a ∈ A, and a vertex s ∈ V.
Output: The parameter δv for every v ∈ V such that δv = dist(s, v).

1. P := ∅; Q := V; δs := 0; for v ∈ V − s do δv := ∞
2. while Q ≠ ∅ do {
     find v ∈ Q such that δv = min{δu : u ∈ Q};
     Q := Q − v; P := P ∪ {v};
     for u ∈ Q ∩ N⁺(v) do δu := min{δu, δv + c(v, u)}
   }

Theorem 3.2.3 Dijkstra's algorithm determines the distances from s to all other vertices in time O(n² + m), where n = |V|, m = |A|.

Figure 3.7 illustrates Dijkstra's algorithm.

Question 3.2.4 Execute Dijkstra's algorithm on the digraph in Figure 3.8.
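The formal description above can be sketched in Python as follows (a sketch of the O(n² + m) version in the text, with a linear scan for the minimum rather than a priority queue; `out_nbrs` and `c` are assumed representations of the arcs and their weights):

```python
INF = float("inf")

def dijkstra(vertices, out_nbrs, c, s):
    """Text-style Dijkstra: returns delta[v] = dist(s, v); requires c[(u, v)] >= 0."""
    delta = {v: INF for v in vertices}
    delta[s] = 0
    Q = set(vertices)                        # P is implicit: P = vertices - Q
    while Q:
        v = min(Q, key=lambda u: delta[u])   # vertex of Q with smallest delta
        Q.remove(v)                          # v moves from Q to P
        for u in out_nbrs.get(v, []):        # relax arcs from v into Q
            if u in Q:
                delta[u] = min(delta[u], delta[v] + c[(v, u)])
    return delta
```

With a binary heap instead of the linear scan, the running time improves to O((n + m) log n).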

3.2.3 The Floyd-Warshall algorithm

Strong connectivity for digraphs is the analogue of connectivity for undirected graphs. A digraph D is strongly connected if for any pair x, y of vertices in D, there is a path from x to y and a path from y to x.

Figure 3.7: Execution of Dijkstra's algorithm. The white vertices are in Q; the black vertices are in P. The number above each vertex is the current value of the parameter δ. (a) The situation after performing the first step of the algorithm. (b)–(g) The situation after each successive iteration of the loop in the second step of the algorithm. The bold arcs give a shortest path tree.

The Floyd-Warshall algorithm allows one to find all pairwise distances between the vertices of a strongly connected digraph D. We assume that we are given a strong weighted digraph D = (V, A, c) such that some weights c(a) of arcs may be negative, but D has no cycle of negative total weight. (If such a cycle exists, the distance between some pairs of vertices of the cycle is not well defined, since walks between them can be made of arbitrarily small total weight.)
In this subsection, it is convenient to assume that V = {1, 2, . . . , n}. Denote by δ^m_{ij} the length of a shortest (i, j)-path in the subgraph of D induced by {1, 2, . . . , m − 1} ∪ {i, j}, for all 1 ≤ m ≤ n + 1. In particular, δ^1_{ij} is the length of the arc ij, if it exists. Observe that a shortest (i, j)-path in the subgraph of D induced by {1, 2, . . . , m} ∪ {i, j} either does not include the vertex m, in which case δ^{m+1}_{ij} = δ^m_{ij}, or does include it, in which case δ^{m+1}_{ij} = δ^m_{im} + δ^m_{mj}. Therefore,

δ^{m+1}_{ij} = min{δ^m_{ij}, δ^m_{im} + δ^m_{mj}}.    (3.2)

Observe that δ^m_{ii} = 0 for all i = 1, 2, . . . , n, and, furthermore, for all pairs i, j such that i ≠ j, δ^1_{ij} = c(i, j) if ij ∈ A and δ^1_{ij} = ∞, otherwise. Formula (3.2) is also correct when there is no (i, j)-path in the subgraph of D induced by {1, 2, . . . , m} ∪ {i, j}. Clearly, δ^{n+1}_{ij} is the length of a shortest (i, j)-path (in D). It is also easy to verify that O(|V|³) operations are required to compute δ^{n+1}_{ij} for all pairs i, j.

Figure 3.8: A digraph with non-negative weights on the arcs.

The above assertions can readily be implemented as a formal algorithm, but we leave it as an exercise.

Question 3.2.5 Give a pseudo-code of the Floyd-Warshall algorithm.
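One possible answer to this question, sketched in Python rather than pseudocode (the vertex names 1..n and the arc-weight dict `c` are assumptions; the table `d` is updated in place, so one copy of the δ values suffices):

```python
INF = float("inf")

def floyd_warshall(n, c):
    """Implements Formula (3.2); vertices are 1..n, c[(i, j)] is the weight of arc ij."""
    d = [[INF] * (n + 1) for _ in range(n + 1)]   # d[i][j] plays the role of delta_ij
    for i in range(1, n + 1):
        d[i][i] = 0                               # delta_ii = 0
    for (i, j), w in c.items():
        d[i][j] = w                               # delta^1_ij = c(i, j)
    for m in range(1, n + 1):                     # allow m as an intermediate vertex
        for i in range(1, n + 1):
            for j in range(1, n + 1):
                d[i][j] = min(d[i][j], d[i][m] + d[m][j])
    return d                                      # d[i][j] = dist(i, j)
```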

Theorem 3.2.6 A weighted digraph D has a negative cycle if and only if δ^m_{ii} < 0 for some m, i ∈ {1, 2, . . . , n}.

Question 3.2.7 Let D be a weighted digraph and let s be a fixed vertex in D. Which algorithms can be used for finding distances from s to the rest of the vertices if (a) D is acyclic and s is of in-degree zero? (b) D is strongly connected and all weights are non-negative? (c) D is strongly connected, some weights are negative, but there is no negative cycle?


Figure 3.9: Digraphs H, R and Q

3.3 Strong connectivity in digraphs

Recall that a digraph D is strongly connected if for any pair x, y of vertices in D, there is a path from x to y and a path from y to x. Strong connectivity has many applications, see, e.g., [BG2000].

3.3.1 Basics of strong connectivity

Proposition 3.3.1 A digraph D is strongly connected if and only if it has a closed walk containing all vertices of D.

Proof: Suppose D has a closed walk W containing all vertices of D. For each pair x ≠ y of vertices of D, a part of W from x to y gives a walk from x to y. Thus, D is strongly connected. If D is strongly connected, list its vertices v1, v2, . . . , vn and merge a (v1, v2)-path with a (v2, v3)-path with a (v3, v4)-path . . . with a (vn, v1)-path. We have obtained a closed walk containing all vertices of D. QED

In Figure 3.9, H is strongly connected, while R is not. To see that H is strongly connected it is enough to observe, by Proposition 3.3.1, that H has a closed walk containing all vertices. Indeed, abcefcedca is such a walk. To see that R is not strongly connected, it is enough to observe that there is no path from a to x, as all arcs between the vertices x, y and the rest of the digraph are directed from x, y.

A maximal strongly connected induced subgraph of a digraph D is called a strong component of D.

Proposition 3.3.2 The vertices of a digraph D can be partitioned into several subsets each of which induces a strong component of D.

The essence of this proposition is that any digraph can be partitioned into strong components and arcs between them. If we contract strong components to single vertices and


Figure 3.10: Digraphs H, R and Q

get rid of parallel arcs, we get an acyclic digraph called the strong component digraph of D. For example, digraph R in Figure 3.9 has strong components induced by vertices {x, y}, {a, b, d}, {c, f, e}, respectively and Q is the strong component digraph of R.

3.3.2 Algorithms for finding strong components

To find the strong components of a digraph D the following algorithm, called the cycle-contraction algorithm, can be used. Find a cycle in D and contract it to a single vertex, naming this vertex by the set of vertices of this cycle. We proceed in this way until we get an acyclic digraph H. The names of its vertices are the vertex sets of the strong components of D, and H itself is the strong component digraph of D.

Question 3.3.3 Find the strong components and strong component digraph of digraph H of Figure 3.10.

Solution: Contract cycle abca and then cycle cefc. We get digraph Q in Figure 3.10, which is the strong component digraph of H. The strong components of H are the subgraph of H induced by {a, b, c, f, e} and vertex d.

Question 3.3.4 Find the strong components and strong component digraph of digraph R of Figure 3.10.

To have a full description of the cycle-contraction algorithm one needs to specify how to find a cycle. This can be done, but the algorithm will not be very efficient. Instead, we'll define a much faster algorithm based on the algorithm DFS-A for acyclic orderings in acyclic digraphs.

SCA(D)
Input: A digraph D = (V, A).
Output: The vertex sets of strong components of D.


Figure 3.11: (a) A digraph D; the order of vertices found by DFS-A is shown. (b) The converse D′ of D; the bold arcs are the arcs of a DFS forest for D′.

1. Call DFS-A(D) to compute the ordering v1, v2, . . . , vn.
2. Compute the converse D′ of D.
3. Call DFS-CC(D′) of Section 2.2, but in the main loop of DFS-CC consider the vertices according to the ordering v1, v2, . . . , vn and use N⁺(v) rather than N(v).

Figure 3.11 illustrates the strong component algorithm (SCA). The complexity of SCA is O(|V| + |A|).

Question 3.3.5 Using SCA find the strong components of a digraph D = (V, A) with V = {a, b, c, d, e} and A = {ab, ba, ac, cd, dc, de}.

Solution: When we start from a, DFS-A finds the following ordering: a, c, d, e, b. The converse D′ of D has the same vertices as D and arc set {ab, ba, ca, cd, dc, ed}. Finally, DFS-CC applied to D′ with vertex ordering a, c, d, e, b finds three strong components of D. They have vertex sets {a, b}, {c, d} and {e}.

Question 3.3.6 Using SCA find the strong components and strong component digraphs of digraphs H and R of Figure 3.10.
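The three steps of SCA can be sketched in Python as follows (a sketch; the first DFS pass is recursive, which is fine for small digraphs, and `out_nbrs` is an assumed adjacency-list representation with every vertex as a key):

```python
def strong_components(out_nbrs):
    """SCA sketch: DFS on D records an ordering, then DFS on the converse D'
    in that ordering collects the strong components."""
    vs = list(out_nbrs)
    seen, order = set(), []

    def visit(u):                     # step 1: record vertices by finishing time
        seen.add(u)
        for v in out_nbrs[u]:
            if v not in seen:
                visit(v)
        order.append(u)

    for u in vs:
        if u not in seen:
            visit(u)

    conv = {u: [] for u in vs}        # step 2: the converse digraph D'
    for u in vs:
        for v in out_nbrs[u]:
            conv[v].append(u)

    comps, seen = [], set()
    for u in reversed(order):         # step 3: DFS on D' in the ordering of step 1
        if u not in seen:
            comp, stack = [], [u]
            seen.add(u)
            while stack:
                x = stack.pop()
                comp.append(x)
                for y in conv[x]:
                    if y not in seen:
                        seen.add(y)
                        stack.append(y)
            comps.append(set(comp))
    return comps
```

On the digraph of Question 3.3.5 this yields the components {a, b}, {c, d} and {e}.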

3.4 Application: Solving the 2-Satisfiability Problem (extra material)

In this section we deal with a problem that is not a problem on graphs, but it has applications to several problems on graphs, in particular when we want to decide whether a given undirected graph has an orientation with certain properties. We will show how to solve this problem efficiently using the algorithm for strong components of digraphs from the previous section.


A boolean variable x is a variable that can assume only two values: 0 and 1. The sum of boolean variables x1 + x2 + · · · + xk is defined to be 1 if at least one of the xi's is 1, and 0 otherwise. The negation x̄ of a boolean variable x is the variable that assumes the value 1 − x. Hence the negation of x̄ is x. Let X be a set of boolean variables. For every x ∈ X there are two literals over x, namely x itself and x̄. A clause C over a set of boolean variables X is a sum of literals over the variables from X. The size of a clause is the number of literals it contains. For example, if u, v, w are boolean variables with values u = 0, v = 0 and w = 1, then C = (u + v + w) is a clause of size 3, its value is 1 and the literals in C are u, v and w.

An assignment of values to the set of variables X of a boolean expression is called a truth assignment. If the variables are x1, . . . , xk, then we denote a truth assignment by t = (t1, . . . , tk). Here it is understood that xi will be assigned the value ti for i = 1, . . . , k.

The 2-satisfiability problem, also called 2-SAT, is the following problem. Let X = {x1, . . . , xk} be a set of boolean variables and let C1, . . . , Cr be a collection of clauses, all of size 2, for which every literal is over X. Decide if there exists a truth assignment t = (t1, . . . , tk) to the variables in X such that the value of every clause will be 1. This is equivalent to asking whether or not the boolean expression F = C1 · C2 · · · Cr can take the value 1. Depending on whether this is possible or not, we say that F is satisfiable or unsatisfiable. Here · stands for boolean multiplication, that is, 1 · 1 = 1, 1 · 0 = 0 · 1 = 0 · 0 = 0. For a given truth assignment t = (t1, . . . , tk) and literal q we denote by q(t) the value of q when we use the truth assignment t (i.e. if q = x̄3 and t3 = 1, then q(t) = 1 − 1 = 0).

To illustrate the definitions, let X = {x1, x2, x3} and let C1 = (x1 + x3), C2 = (x2 + x̄3), C3 = (x̄1 + x̄3) and C4 = (x̄2 + x3).
Then it is not difficult to check that F = C1 · C2 · C3 · C4 is satisfiable and that taking x1 = 0, x2 = 1, x3 = 1 we obtain F = 1. If we allow more than 2 literals per clause then we obtain the more general problem Satisfiability (also called SAT), which is much harder to solve than 2-SAT.

Below we will show how to reduce 2-SAT to the problem of finding the strong components in a certain digraph. We shall also show how to find a satisfying truth assignment if one exists. Let C1, . . . , Cr be clauses of size 2 such that the literals are taken among the variables x1, . . . , xk and their negations, and let F = C1 · · · Cr be an instance of 2-SAT. Construct a digraph DF as follows: Let V(DF) = {x1, . . . , xk, x̄1, . . . , x̄k} (i.e. DF has two vertices for each variable, one for the variable and one for its negation). For every choice of p, q ∈ V(DF) such that some Ci has the form Ci = (p + q), A(DF) contains an arc from p̄ to q and an arc from q̄ to p (recall that the negation of x̄ is x). See Figure 3.12 for examples of 2-SAT expressions and the corresponding digraphs. The first expression is satisfiable, the second is not.

Lemma 3.4.1 If DF has a (p, q)-path, then it also has a (q̄, p̄)-path. In particular, if p and q belong to the same strong component in DF, then p̄ and q̄ belong to the same strong component


Figure 3.12: The digraph DF is shown for two instances of 2-SAT. In (a) F = (x1 + x3) · (x2 + x̄3) · (x̄1 + x̄3) · (x̄2 + x3) and in (b) F = (x1 + x2) · (x̄1 + x2) · (x̄2 + x3) · (x̄2 + x̄3)

in DF.

Lemma 3.4.2 If DF contains a path from p to q, then, for every satisfying truth assignment t, p(t) = 1 implies q(t) = 1.

The following is an easy corollary of Lemma 3.4.1 and Lemma 3.4.2.

Corollary 3.4.3 If t is a satisfying truth assignment, then for every strong component D′ of DF and every choice of distinct vertices p, q ∈ V(D′) we have p(t) = q(t). Furthermore, we also have p̄(t) = q̄(t).

Lemma 3.4.4 F is satisfiable if and only if for every i = 1, 2, . . . , k, no strong component of DF contains both the variable xi and its negation x̄i.

The lemmas above imply the following algorithm for solving 2-SAT.

2-SAT Solver
Input: An instance F = C1 · C2 · · · Cp of 2-SAT; the set R of all variables of F and their negations.
Output: Whether F is satisfiable; if it is, a truth assignment satisfying F.

1. Construct DF = (V, A) as follows: V := R; A := ∅;
   for i from 1 to p do A := A ∪ {(x̄, y), (ȳ, x)}, where x, y are the literals in Ci.

58

CHAPTER 3. DIRECTED GRAPHS

2. Find the strong components D1, D2, . . . , Ds of DF (the components are in acyclic order, i.e., there is no arc from Dj to Di if i < j).
3. sat := 1;
   for i from 1 to s do if Di contains a variable and its negation then sat := 0;
   if sat = 1 then F is satisfiable else F is unsatisfiable
4. if sat = 1 do {
   for each literal r ∈ R do r := 2;   (the value 2 means "not yet assigned")
   for i from s downto 1 do {
     for each literal r in Di do if r = 2 then { r := 1; r̄ := 0 }
   }
}

Proposition 3.4.5 Let p be the number of clauses and q the number of variables of an instance of 2-SAT. We can solve 2-SAT in time O(p + q).

Proof: It is enough to prove that 2-SAT Solver can be run in time O(p + q). Since R has 2q literals and, at the end of Step 1, A contains 2p arcs, we need O(p + q) time to construct DF = (V, A) in Step 1. Notice that DF has 2q vertices and 2p arcs. Thus, using SCA we can perform Step 2 in time O(p + q). It is easy to see that the remaining two steps also run in time O(p + q). QED

Question 3.4.6 Using 2-SAT Solver, determine which instance of 2-SAT in Figure 3.12 is satisfiable and find its satisfying truth assignment.

Question 3.4.7 Using 2-SAT Solver, determine whether the following instance of 2-SAT is satisfiable and, if it is, find its satisfying truth assignment: (x + y)(x + ȳ)(y + z)(x̄ + z̄).

3.5 Solutions

Question 3.1.8 There are two alternatives: (a) replace max by min and use the same formula; (b) replace the sign of all weights of the digraph and use the longest path algorithm for the new weighted digraph; the length of a shortest path in D is then minus the length of the corresponding longest path.

Question 3.1.11 One acyclic ordering is s, a, b, c, d, e, f, t (there are others).


Let l(x) be the length of a longest path from s to a vertex x. Determine l(x) for all vertices in the order of the acyclic ordering. l(s) = 0. l(a) = l(s) + w(sa) = 3. l(b) = l(a) + w(ab) = 3 + 1 = 4. l(c) = max{l(s) + w(sc), l(a) + w(ac)} = max{2, 1} = 2. l(d) = max{l(b) + w(bd), l(c) + w(cd)} = max{0, 6} = 6. l(e) = 4. l(f) = 10. l(t) = max{l(b) + w(bt), l(d) + w(dt), l(f) + w(ft)} = 15. The length of a longest path from s to t is 15.


Chapter 4

Colourings of Graphs, Independent Sets and Cliques


4.1 Basic definitions of vertex colourings

A vertex colouring of a graph G = (V, E) is an assignment of a colour to every vertex of G. A proper vertex colouring assigns different colours to the end-vertices of every edge. For a positive integer k, a vertex k-colouring is a colouring that uses exactly k different colours. For example, three vertex colourings of a graph are depicted in Figure 4.1. The first colouring (the left hand side one) is not a proper vertex colouring since there we have two edges with end-vertices of the same colour. The second and third colourings are proper: they are a vertex 4-colouring and a vertex 3-colouring. A graph is said to be vertex k-colourable if it has a proper vertex k-colouring. So, the graph in Figure 4.1 is vertex 4- and 3-colourable. The chromatic number χ(G) of a graph G is the minimum k such that G is vertex k-colourable. To see that the chromatic number of the graph H in Figure 4.1 is 3, it is enough to observe that H is vertex 3-colourable and H has no proper vertex 2-colouring as H

Figure 4.1: Three vertex colourings of a graph



Figure 4.2: Bipartite graphs

contains (as a subgraph) K3, the complete graph on 3 vertices, and χ(K3) = 3. Recall that a complete graph Kn on n vertices has all vertices adjacent to each other. The vertices of a complete bipartite graph Kn,m can be partitioned into two sets, partite sets V1 and V2, such that every edge has one end-vertex in V1 and the other in V2, and every vertex of V1 is adjacent to each vertex of V2. It is clear that χ(Kn) = n. Also, χ(Kn,m) = 2 since every vertex from one partite set can get colour 1 and every vertex from the other partite set can get colour 2, and we obtain a proper vertex 2-colouring.

4.2 Bipartite graphs and digraphs

A graph G is bipartite if its vertices can be partitioned into two sets, partite sets V1 and V2, such that every edge has one end-vertex in V1 and the other in V2. Clearly, a complete bipartite graph is a bipartite graph. Let G be a bipartite graph with partite sets V1, V2. If we assign colour 1 to every vertex in V1 and colour 2 to every vertex in V2, we will get a proper vertex 2-colouring of G. Thus, χ(G) ≤ 2. (If G has no edges, then χ(G) = 1, and if G has an edge, then χ(G) = 2.) Moreover, it is easy to see that if χ(G) ≤ 2, then G is bipartite. Thus, the class of bipartite graphs coincides with the class of graphs of chromatic number at most 2. The following is a characterization of bipartite graphs in terms of cycle lengths.

Theorem 4.2.1 A graph is bipartite if and only if it has no cycle of odd length.

Thus, the graph in Figure 4.1 is not bipartite (it has cycles of length 3). On the other hand, the graphs in Figure 4.2 are all bipartite. Every tree is bipartite as a tree has no cycles (in particular, no cycles of odd length!).

Question 4.2.2 What is the minimum number of edges that we need to delete from the graph H of Figure 4.1 to make it bipartite?
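Since a graph is bipartite exactly when χ(G) ≤ 2, bipartiteness can be tested by attempting a proper 2-colouring with BFS (a sketch; the adjacency-list representation `nbrs` is an assumption):

```python
from collections import deque

def two_colour(vertices, nbrs):
    """Try to properly 2-colour a graph; return the colouring, or None
    if some edge forces equal colours (i.e. the graph has an odd cycle)."""
    colour = {}
    for s in vertices:                          # handle each connected component
        if s in colour:
            continue
        colour[s] = 1
        q = deque([s])
        while q:
            u = q.popleft()
            for v in nbrs[u]:
                if v not in colour:
                    colour[v] = 3 - colour[u]   # give v the other colour
                    q.append(v)
                elif colour[v] == colour[u]:
                    return None                 # an edge inside a colour class
    return colour
```

Returning None corresponds, by Theorem 4.2.1, to finding an odd cycle.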


Figure 4.3: Non-bipartite graphs (a), (b) and (c)

Figure 4.4: A digraph D and its underlying graph UG(D)

Solution: The graph H is not bipartite as it has cycles of odd length. If we delete the two diagonal edges, then we get the left hand side graph of Figure 4.2, i.e., a bipartite graph. So, if we delete these two edges from H, it will become bipartite. On the other hand, deletion of only one edge of H will leave H with at least one cycle of length 3 (odd length). Thus, 2 is the required minimum number.

Question 4.2.3 What is the minimum number of edges that we need to delete from each of the graphs in Figure 4.3 to make it bipartite?

The underlying graph of a digraph D (denoted by UG(D)) is obtained from D by replacing arcs by (undirected) edges and eliminating parallel edges. See Figure 4.4. A digraph is called bipartite if its underlying graph is bipartite. One may think that the following claim is true: a digraph is bipartite if and only if it has no (directed) cycles of odd length. However, this claim is wrong. Indeed, the digraph L with vertices v1, v2, v3 and arcs v1v2, v2v3, v1v3 has no cycles (i.e., it is acyclic), but UG(L) is K3 and thus is not bipartite. However, the claim is true for strongly connected digraphs.

Theorem 4.2.4 A strongly connected digraph D is bipartite if and only if D has no cycle of odd length.


4.3 Periods of digraphs and Markov chains (extra material)

Theorem 4.2.4 has a generalization, which is of importance in applications. To state the generalization, we need some additional definitions. The greatest common divisor of a set S of positive integers is the largest positive integer k that divides every integer from S. For example, the greatest common divisor of 6, 4, 8, 10 is 2 and the greatest common divisor of 6, 9 is 3. The period, p(D), of a strongly connected digraph D is the greatest common divisor of the lengths of the cycles in D. Since D is strongly connected, it has a cycle, so p(D) is defined. If D is bipartite, then p(D) ≥ 2.

Theorem 4.3.1 If a strongly connected digraph D = (V, A) has period p 2, then V can be partitioned into sets V1 , V2 , . . . , Vp such that every arc with the initial end-vertex in Vi has the terminal end-vertex in Vi+1 for every i = 1, 2, . . . , p, where Vp+1 = V1 .

Consider an algorithm, by Balcer and Veinott, to compute p(D) for a strongly connected digraph D. If, for a vertex x with d⁺(x) ≥ 2, we contract all out-neighbours of x and delete any parallel arcs obtained, then the resulting digraph has the same period as the original digraph by Theorem 4.3.1. Repeating this iteration, we will finally obtain a cycle C. The length of C is the desired period. For an example, see Figure 4.5.

Question 4.3.2 Find p(H) and p(L) for the following two digraphs using the Balcer-Veinott algorithm. The digraph H is given by V(H) = {a, b, c, d}, A(H) = {ab, bc, cd, da, ad} and the digraph L is given by V(L) = {a, b, c, d, e, f}, A(L) = {ab, bc, cd, da, be, ef, fb}.

Theorem 4.3.1 and the above algorithm are of interest due to applications. Let us consider one of them. It concerns so-called Markov chains, but we will speak of a similar process. Suppose we have n water containers S1, S2, . . . , Sn. A container Si has a fraction pi of the water in it, where p1 + p2 + · · · + pn = 1. At every stage, some fraction pij of the water in container Si is put into container Sj (for all i, j). We are interested to know how the water will be distributed after a very large number of stages. To find out, one constructs a digraph D with vertices S1, S2, . . . , Sn and arcs SiSj for all i, j such that pij > 0. If p(D) = 1, then after a sufficiently large number of stages, the water distribution will be almost stationary, i.e., every container will have a certain fraction of the water all the time (with possibly very small variations). If p(D) ≥ 2, then the water will be moved cyclically (after a large number of stages) and the cycle will have length p(D).

Figure 4.5: Illustrating the Balcer-Veinott algorithm
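The period can also be computed without contractions, from BFS levels: for a strongly connected digraph, p(D) is the gcd of level(u) + 1 − level(v) over all arcs uv, where the levels are BFS distances from any fixed root. This level-based method is an alternative to the Balcer-Veinott contraction algorithm in the text, sketched here under the assumption that the input is strongly connected:

```python
from collections import deque
from math import gcd

def period(vertices, out_nbrs):
    """Period of a strongly connected digraph via BFS levels."""
    root = vertices[0]
    level = {root: 0}
    q = deque([root])
    while q:                                  # BFS computing level(v) for all v
        u = q.popleft()
        for v in out_nbrs.get(u, []):
            if v not in level:
                level[v] = level[u] + 1
                q.append(v)
    p = 0
    for u in vertices:                        # gcd of the arc "level slacks"
        for v in out_nbrs.get(u, []):
            p = gcd(p, abs(level[u] + 1 - level[v]))
    return p
```

On the digraphs H and L of Question 4.3.2 this gives p(H) = 2 and p(L) = 1.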

4.4 Computing chromatic number

We already know that χ(Kn) = n and χ(Kn,m) = 2. If n is even, then the cycle Cn is a bipartite graph, so χ(Cn) = 2. If n is odd, Cn is not bipartite. Hence, χ(Cn) ≥ 3. However, we can colour all vertices of Cn but one with colours 1 and 2 without creating an edge with end-vertices of the same colour (colour vertices along the cycle 1, 2, 1, 2, etc.). Assign colour 3 to the remaining vertex. Clearly, we get a proper vertex 3-colouring of Cn. Hence, χ(Cn) = 3 when n is odd.

Let us consider another class of graphs called wheels. The wheel Wn consists of the cycle Cn−1 and an extra vertex that is adjacent to every vertex of the cycle. See Figure 4.6.

Figure 4.6: Wheels W7, W4 and W5

Since the extra vertex requires an extra colour, we have χ(Wn) = 3 if n is odd and χ(Wn) = 4 if n is even.

Figure 4.7: Undirected graphs G, H and R

Question 4.4.1 Find the chromatic numbers of the graphs in Figure 4.7. Justify your answers.

Solution: χ(G) = 4 since χ(G′) = 4, where G′ is the subgraph of G induced by a, b, c, d. A proper 4-colouring of G′ can be extended to a proper 4-colouring of G. χ(H) = 3 since c(w) = c(y) = 1, c(v) = c(u) = 2, c(x) = c(z) = 3 is a proper 3-colouring of H and H contains K3. χ(R) = 4 since χ(R − f) = 3 (odd cycle) and f must get a fourth colour.

Question 4.4.2 A lecture timetable is to be drawn up. Since some students wish to attend several lectures, certain lectures must not coincide. The asterisks in the following table show which pairs of lectures cannot coincide. How many periods are needed to timetable all five lectures?

lec.  a  b  c  d  e
 a    -  *  -  *  *
 b    *  -  *  -  *
 c    -  *  -  *  *
 d    *  -  *  -  *
 e    *  *  *  *  -

Solution: We represent this situation by a graph G with vertices a, b, c, d, e. Two vertices of G are adjacent if the corresponding lectures cannot coincide. We see that G = W5. We know that χ(W5) = 3. Thus, the vertices of G can be partitioned into sets V1, V2, V3 such that no edge has both end-vertices in the same Vi. That means the lectures in each Vi can coincide. Thus, we need 3 periods of time to schedule all five lectures.


4.5 Greedy colouring and interval graphs

For a graph G with vertices ordered v1, v2, . . . , vn (in some order), the greedy colouring procedure assigns colours 1, 2, . . . to the vertices in such a way that the vertices are considered in the above order and vi receives the smallest colour distinct from the colours of its already coloured neighbours. We illustrate the procedure using the wheel W6 with V(W6) = {v1, . . . , v6} and E(W6) = {v1v2, v2v5, v5v6, v6v4, v4v1} ∪ {v3vi : i ≠ 3}. We colour the vertices in the order v1, v2, . . . , v6. We have colour(v1) = 1; colour(v2) = 2 since v2 has v1 as a neighbour; colour(v3) = 3 since v1, v2 are neighbours of v3; colour(v4) = 2 since v1 is a neighbour of v4, but v2 is not; colour(v5) = 1 as v1 is not a neighbour of v5; colour(v6) = 4 as v6 has neighbours of colours 1, 2 and 3. Thus, the colouring uses 4 colours. We know that χ(W6) = 4 (recall that χ(C5) = 3 and the central vertex requires an extra colour). Thus, the obtained colouring is optimal.

However, very often the greedy procedure does not produce an optimal colouring. For example, consider C6 with vertices 1, 2, . . . , 6 and edges 12, 23, 34, 45, 56, 61. Let v1 = 1, v2 = 4, v3 = 2, v4 = 3, v5 = 5, v6 = 6. Then the greedy procedure gives colour 1 to both vertices 1 and 4, colour 2 to vertices 2 and 5, and colour 3 to vertices 3 and 6. However, χ(C6) = 2. Thus, the colouring obtained by the greedy procedure is not optimal.

Theorem 4.5.1 Let Δ(G) be the maximum degree of a vertex in a connected graph G. If G is not regular (i.e., not every degree is Δ(G)), then χ(G) ≤ Δ(G). If G is regular, then χ(G) ≤ Δ(G) + 1.

Proof: Let G be non-regular and let x be a vertex of G with degree d(x) < Δ(G). Use BFS to find the distances from x to the rest of the vertices. Let l be the longest distance from x to a vertex in G. Order the vertices of G as v1, . . . , vn such that first we list the vertices at distance l from x, then those at distance l − 1 from x, etc. (so that x = vn). Suppose we have coloured all vertices v1, . . . , vi−1 with colours 1, 2, . . . , Δ(G) using the greedy algorithm and we want to colour vi, i < n. Since dist(x, vi) > 0, among the neighbours of vi at most d(vi) − 1 ≤ Δ(G) − 1 have been coloured (since there is a vertex vj on a shortest path from x to vi and, clearly, j > i). Thus, the greedy algorithm can choose a colour for vi among 1, 2, . . . , Δ(G) that is not used for any neighbour of vi in the set {v1, . . . , vi−1} of already coloured vertices. It remains to consider the case of vn = x. Since d(x) < Δ(G), the greedy algorithm can find a colour for vn among 1, 2, . . . , Δ(G) that is not used for any neighbour of vn. QED

In fact, a slightly stronger theorem holds:

Theorem 4.5.2 (Brooks' Theorem) Let Δ(G) be the maximum degree of a vertex in a connected graph G. If G is not complete and G is not an odd cycle, then χ(G) ≤ Δ(G).
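The greedy procedure described above can be sketched in a few lines; the code below reproduces both examples (the good order on W6 and the bad order on C6):

```python
def greedy_colouring(adj, order):
    """Greedy colouring: scan vertices in the given order, give each the
    smallest colour (1, 2, ...) not used on an already-coloured neighbour."""
    colour = {}
    for v in order:
        used = {colour[u] for u in adj[v] if u in colour}
        c = 1
        while c in used:
            c += 1
        colour[v] = c
    return colour

# W6 as in the text: 5-cycle v1 v2 v5 v6 v4 plus hub v3 adjacent to all.
edges = [(1, 2), (2, 5), (5, 6), (6, 4), (4, 1)] + [(3, i) for i in (1, 2, 4, 5, 6)]
adj = {v: set() for v in range(1, 7)}
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)

print(greedy_colouring(adj, [1, 2, 3, 4, 5, 6]))
# {1: 1, 2: 2, 3: 3, 4: 2, 5: 1, 6: 4} -- four colours, optimal for W6

# C6 with the bad order from the text: greedy needs 3 colours although chi(C6) = 2.
c6 = {v: set() for v in range(1, 7)}
for u, v in [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 1)]:
    c6[u].add(v)
    c6[v].add(u)
print(greedy_colouring(c6, [1, 4, 2, 3, 5, 6]))
```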


Figure 4.8: Intervals and the corresponding interval graph

Consider the following register problem. To speed up arithmetic computations, the values of variables are stored in fast registers. The number of registers is restricted, so we want to use them economically. For every variable we have a time interval during which the variable is used (between its first and last uses). If two variables have non-intersecting time intervals, we can allocate the same register to the two variables. To find the minimum number of registers needed, we construct a graph G whose vertices correspond to the variables and two vertices are adjacent if the corresponding two time intervals intersect. The chromatic number of G equals the minimum number of registers since variables corresponding to vertices of the same colour have non-intersecting intervals.

Similar graphs, called interval graphs, appear in other applications. A graph G is interval if there is a set of intervals corresponding to the vertices of G such that two vertices of G are adjacent if and only if the corresponding intervals intersect. See Figure 4.8. Not every graph is interval.

Question 4.5.3 Prove that K2,2 is not an interval graph.

Solution: Let K2,2 have vertices x1, x2, y1, y2 and edges xiyj, where 1 ≤ i, j ≤ 2. Assume that K2,2 is interval, i.e., there are four intervals I1, I2, J1, J2 corresponding to x1, x2, y1, y2, respectively. This means that every Ik intersects every Js, but I1 and I2 do not intersect, nor do J1 and J2. It is easy to check that this is not possible.

Question 4.5.4 Prove that every Kn is an interval graph.

Question 4.5.5 Construct the interval graph H for the following family of intervals: {[1, 3], [2, 4], [1, 5], [3.5, 6], [3.5, 7]}.

For a graph G, let ω(G) denote the number of vertices in a largest complete subgraph of G. For example, ω(Kn) = n and ω(Kn,m) = 2. Clearly, ω(G) ≤ χ(G), since the vertices of a complete subgraph of G require different colours in any proper vertex colouring.


Theorem 4.5.6 Let G be an interval graph. Then χ(G) = ω(G). Moreover, a proper vertex ω(G)-colouring can be obtained by the greedy colouring procedure provided the vertices of G are ordered v1, v2, . . . , vn in increasing order of the left end-points of their intervals.

Question 4.5.7 Prove that the chromatic number of the graph H constructed in Question 4.5.5 equals 4 without producing the actual vertex colouring. Find a proper vertex 4-colouring of H.
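A sketch of the greedy procedure of Theorem 4.5.6 on the intervals of Question 4.5.5 (the letter names a–e for the five intervals are our own convention):

```python
# Greedy colouring of an interval graph in increasing order of left end-points
# (the intervals are those of Question 4.5.5).
intervals = {'a': (1, 3), 'b': (2, 4), 'c': (1, 5), 'd': (3.5, 6), 'e': (3.5, 7)}

def intersects(I, J):
    return max(I[0], J[0]) <= min(I[1], J[1])

# Sort by left end-point, then colour greedily.
order = sorted(intervals, key=lambda v: intervals[v][0])
colour = {}
for v in order:
    used = {colour[u] for u in colour if intersects(intervals[u], intervals[v])}
    c = 1
    while c in used:
        c += 1
    colour[v] = c

print(colour)                  # four colours are used
print(max(colour.values()))    # 4 = omega(H): {b, c, d, e} is a clique
```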

4.6 Edge colourings (extra material)

Sometimes we need to colour the edges rather than the vertices of a graph G. Such a colouring is called a proper edge colouring if no two edges of G with a common end-vertex are assigned the same colour. The minimum number of colours in a proper edge colouring of G is its chromatic index, denoted by χ′(G). If Δ(G) is the maximum degree of a vertex in G, then it is easy to see that χ′(G) ≥ Δ(G).

Question 4.6.1 Consider Kn,n with vertices x1, x2, . . . , xn, y1, y2, . . . , yn and edges xiyj (1 ≤ i, j ≤ n). Prove that χ′(Kn,n) = n. Hint: See Figure 4.9.

Thus, χ′(Kn,n) = Δ(Kn,n). It is easy to see that χ′(K3) = 3 = Δ(K3) + 1. The following theorem is the main result on the chromatic index.

Theorem 4.6.2 (Vizing) If Δ(G) is the maximum degree of a vertex in G, then Δ(G) ≤ χ′(G) ≤ Δ(G) + 1.
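For Question 4.6.1, one standard construction is to colour the edge xi yj with colour ((i + j) mod n) + 1. This is our own choice of construction (not necessarily the one drawn in Figure 4.9); a quick check for n = 3:

```python
n = 3  # K_{3,3}; the same rule works for every n

# Colour edge x_i y_j with ((i + j) mod n) + 1: the edges at a fixed x_i get
# n distinct colours (as j varies), and likewise at each y_j (as i varies),
# so the colouring is a proper edge colouring with n colours.
colour = {(i, j): (i + j) % n + 1 for i in range(n) for j in range(n)}

# Check properness: no two edges sharing an end-vertex get the same colour.
ok = all(len({colour[(i, j)] for j in range(n)}) == n for i in range(n)) and \
     all(len({colour[(i, j)] for i in range(n)}) == n for j in range(n))
print(ok, len(set(colour.values())))   # True 3
```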

4.7 Independent Sets and Cliques

Given a graph G = (V, E), a set X ⊆ V is an independent set of G if no edge contains two vertices from X. In other words, X ⊆ V and for any two vertices in X there is no edge connecting them. For example, {a, c} and {b, d} are independent sets in the graph G of Fig. 4.10. The set {a, d, e} is an independent set in the graph H of Fig. 4.10.

Suppose that you are organising a cruise where none of the guests are allowed to know each other. You have a number of applicants who want to go on your cruise, and you know exactly which of the applicants know each other. Finally, you want as many people on your cruise as possible. How do you decide who to invite on the cruise?


Figure 4.9: Optimal K3,3 edge colouring

Hopefully, after a little thought, you realise that this is not an easy problem if you have several thousand applicants and many of them know each other. In fact, the problem is equivalent to finding a maximum independent set in a graph, i.e., an independent set with the maximum number of vertices. Build a graph where each vertex corresponds to an applicant, and add an edge between two vertices if the corresponding applicants know each other. Now any independent set corresponds to a set of applicants in which nobody knows anybody else. So if we could find a maximum independent set in a graph, then we could also solve our cruise problem.
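A brute-force solution of the cruise problem on a toy instance (the applicants and the "who knows whom" list below are made up for illustration):

```python
from itertools import combinations

def max_independent_set(vertices, edges):
    """Largest independent set by brute force -- fine for a handful of
    applicants, hopeless for thousands (the problem is NP-hard)."""
    for k in range(len(vertices), 0, -1):
        for subset in combinations(vertices, k):
            s = set(subset)
            if not any(u in s and v in s for u, v in edges):
                return s
    return set()

# Hypothetical "who knows whom" graph for six applicants.
people = ['Ann', 'Bob', 'Cat', 'Dan', 'Eve', 'Fay']
knows = [('Ann', 'Bob'), ('Ann', 'Cat'), ('Bob', 'Cat'),
         ('Cat', 'Dan'), ('Dan', 'Eve'), ('Eve', 'Fay')]

print(max_independent_set(people, knows))   # a largest mutually-unacquainted group
```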

Figure 4.10: Graphs

Question 4.7.1 Find the cardinality of a maximum independent set in the graph H of Fig. 4.10.

Solution: Observe that S = {a, d, e} is an independent set in H. To show that S is a maximum independent set in H, we will prove that H contains no independent set with 4 vertices. Suppose H does contain an independent set I with 4 vertices, i.e., I is obtained by deleting two vertices from V(H). Notice that I cannot contain c, as c has three neighbours and we cannot eliminate all of them by deleting only two vertices from V(H). Thus, c ∉ I. Observe that H − c is a path, and no single vertex of this path can be deleted to eliminate all of its edges.

Question 4.7.2 Find the cardinality of a maximum independent set in the graphs G and F of Fig. 4.10.

The Cartesian product of graphs G and H, written G □ H, is the graph with vertex set V(G) × V(H) = {(u, v) : u ∈ V(G), v ∈ V(H)} in which (u, v) is adjacent to (u′, v′) if and only if (1) u = u′ and vv′ ∈ E(H) or (2) v = v′ and uu′ ∈ E(G).

Question 4.7.3 Show that K2 □ K2 is isomorphic to C4.

Solution: Let the first K2 have vertices 1, 2 and edge 12, and the second K2 have vertices 1′, 2′ and edge 1′2′. Then V(K2 □ K2) = {(1, 1′), (1, 2′), (2, 1′), (2, 2′)}. By the definition we have the following edges: (1, 1′)(1, 2′), (1, 2′)(2, 2′), (2, 2′)(2, 1′), (2, 1′)(1, 1′). There are no other edges, so K2 □ K2 is a cycle with 4 vertices.

Question 4.7.4 Show that K2 □ K3 is isomorphic to the graph H of Fig. 4.10.

The Cartesian product provides a link between independent sets and chromatic numbers as follows:

Theorem 4.7.5 A graph G with n vertices has a proper vertex m-colouring if and only if G □ Km has an independent set with n vertices.

A set X of vertices of a graph G is a clique if any two distinct vertices of X are linked by an edge of G. Observe that X is a clique in G if and only if X is an independent set in the complement of G, denoted Ḡ.

Question 4.7.6 Find the cardinality of a maximum clique in each graph of Fig. 4.10.

Let ω(G) be the number of vertices in a largest clique of graph G. Observe that ω(G) ≤ χ(G) for each graph G (the vertices of a clique require different colours in any proper vertex colouring of G).
The graphs in which every induced subgraph H satisfies χ(H) = ω(H) are called perfect. These graphs are very important; they include interval graphs and bipartite graphs.
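The definition of the Cartesian product can be checked mechanically. A sketch confirming Question 4.7.3: K2 □ K2 has 4 vertices and 4 edges with every vertex of degree 2, i.e., it is C4:

```python
from itertools import combinations

def cartesian_product(V1, E1, V2, E2):
    """Cartesian product G1 x G2 as in the definition above."""
    V = [(u, v) for u in V1 for v in V2]
    E = [((u, v), (u2, v2)) for (u, v), (u2, v2) in combinations(V, 2)
         if (u == u2 and ((v, v2) in E2 or (v2, v) in E2)) or
            (v == v2 and ((u, u2) in E1 or (u2, u) in E1))]
    return V, E

# K2 x K2: 4 vertices, 4 edges, every vertex of degree 2 -- the cycle C4.
V, E = cartesian_product([1, 2], [(1, 2)], ['a', 'b'], [('a', 'b')])
print(len(V), len(E))   # 4 4
```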

Question 4.7.7 Prove that a bipartite graph is perfect.

Chapter 5

Matchings in graphs
5.1 Matchings in (general) graphs

In a graph G, a matching is a collection of edges with no common end-vertices. Edges af, be, cd form a matching in graph R of Figure 5.1; af, bf, cd is not a matching as edges af and bf have a common vertex, f. All matchings of graph H in Figure 5.1 consist of only one edge. A matching M in a graph G is called maximum if no matching of G contains more edges than M. In a graph G with n vertices (n even), a matching is called perfect if it contains n/2 edges. For example, af, be, cd is a perfect matching in graph R of Figure 5.1. Clearly, H in Figure 5.1 has no perfect matching. We will see below that Q has no perfect matching either. It is natural to ask when a graph has a perfect matching. Tutte was the first to answer this question, in 1947.

Figure 5.1: Matchings in graphs: R has a perfect matching, H has only matchings of size 1, Q has no perfect matching

Let S be a set of vertices in a graph G. If we delete S from G we get the graph G − S with its connected components; some of these components have an even number of vertices, the others an odd number. Let o(G − S) denote the number of components with an odd number of vertices. Suppose that G has a perfect matching M. For every connected component R of G − S with an odd number of vertices, M can have at most (|V(R)| − 1)/2 edges within R, and thus at least one vertex of R must be matched (in M) to a vertex outside of R, which can only belong to S (all edges going outside of R have an end-vertex in S). Since these vertices of S must be distinct, o(G − S) ≤ |S|. This inequality is a necessary condition for a graph G = (V, E) to have a perfect matching (for every S ⊆ V). Tutte proved that this condition is also sufficient.

Theorem 5.1.1 (Tutte) A graph G = (V, E) has a perfect matching if and only if o(G − S) ≤ |S| for every S ⊆ V.

This theorem can be used to show that graph H in Figure 5.1 has no perfect matching. Indeed, if we delete vertex e from H (S = {e} and, thus, |S| = 1), we get four components with an odd number of vertices. Let S = {x, y} for graph Q in Figure 5.1. Observe that o(Q − S) = 4 > 2 = |S| and, thus, by Tutte's theorem Q has no perfect matching.

To see whether a matching is maximum or not, one can use the following result of Berge (1957). For a matching M in a graph G, a path P is M-augmenting if the edges of P alternate between edges in M and edges not in M and the end-vertices of P are not end-vertices of edges in M.

Theorem 5.1.2 (Berge) A matching M in a graph G is maximum if and only if G has no M-augmenting path.

For example, the matching ad, be in graph R of Figure 5.1 is not maximum since the path c e b d a f is an augmenting path. If we delete the edges of the matching from the path and take the remaining edges of the path, we get a larger matching: ce, bd, af. Thus, this theorem can be used to increase the size of a matching. For bipartite graphs, the corresponding procedure is considered in the next section.
Question 5.1.3 Find a maximum matching in Q of Figure 5.1.

5.2 Matchings in bipartite graphs

In this section we consider bipartite graphs. Recall that the vertex set V of a bipartite graph G = (V, E) can be partitioned into two sets X and Y such that every edge of G has one end-vertex in X and the other in Y. The set of neighbours of a vertex x ∈ X (i.e., the vertices of Y that are adjacent to x) is denoted by N(x). For a set Z ⊆ X, N(Z) is the union of N(x) over all x ∈ Z.

Figure 5.2: Bipartite graphs H, Q and R

Theorem 5.2.1 (Hall) A bipartite graph G with partite sets X and Y (|X| ≤ |Y|) has a matching of size |X| if and only if |N(Z)| ≥ |Z| for every Z ⊆ X.

In Figure 5.2, graph R has no perfect matching by Hall's theorem. Indeed, N({a, b, c}) = {a′, b′} and, thus, |N({a, b, c})| = 2 < 3 = |{a, b, c}|. Hall's theorem allows us to check, in particular, whether a bipartite graph has a perfect matching, but it does not allow us to find such a matching. Thus, we consider an algorithm based on augmenting paths that finds a maximum matching in a bipartite graph. In each iteration, the algorithm starts from a current matching M, builds an augmenting tree and tries to use the tree to increase the size of the matching by deleting some edges of M and adding more edges.

Suppose we have a matching M in a bipartite graph G. If M covers (as end-vertices of its edges) all vertices of G, we obviously cannot find a larger matching. Let x be a vertex of G that is not covered by M. We build a tree T from x (as a root) by first taking x as the only vertex of T, then adding every neighbour y of x together with the edge xy to T. If one of the neighbours z is not covered by M, we increase M by adding the edge xz. If all neighbours are covered by M, we add to T all edges of M with end-vertices in N(x). This adds to T a set L of new vertices (the other end-vertices of those edges of M). We now add to T all neighbours of vertices in L that are not yet in T, together with one edge per neighbour connecting it to a vertex in L. If one of the newly added vertices is not covered by M, we can construct an augmenting path, delete the corresponding edge of M


and add to M two edges instead. If all of the last added vertices are covered by M, we add to T the corresponding edges of M and proceed as above. If at any stage we add a vertex z ≠ x which is not covered by M, we can obtain an augmenting path from x to z. The path P is the unique path from x to z in the current T (it is unique as T is a tree). By deleting from M all edges that are in P and adding to M all edges of P that are not in M, we obtain a matching larger than M.

The iteration described above either finds an augmenting path, and thus increases the size of the current matching, or fails to find an augmenting path once all vertices are already in T. The latter possibility implies that the current matching is maximum and the algorithm stops. One can start the algorithm from a trivial matching, i.e., a single edge of G. However, to have fewer iterations, it is advisable to start from a matching produced by a simple algorithm such as the greedy algorithm. The greedy algorithm for matchings starts from an edge and repeatedly adds the first possible edge to the current matching. Of course, the greedy algorithm normally does not produce a maximum matching.

To illustrate the algorithm, let us consider graph H in Figure 5.2. The greedy algorithm produces the matching M = {aa′, bc′, cd′}. We build a tree T starting from the uncovered vertex d. The only neighbour of d is c′, so c′ and the edge dc′ are added to T. Since c′ is covered by M, the edge bc′ ∈ M and the vertex b are added to T. Then a′ and ba′ are added to T. Then a and aa′ ∈ M are added to T. Then b′ and ab′ are added to T. Since b′ is not covered by M, we finish building T and consider the augmenting path that starts in d and ends in b′ (this path is always unique as T is a tree): d c′ b a′ a b′. Now we delete all edges of M on the path (bc′ and aa′) and add the remaining edges of the path (dc′, ba′ and ab′) to M. We obtain {ab′, ba′, cd′, dc′}. The last matching is maximum as it is perfect.
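The tree-growing procedure above can be implemented compactly as a recursive augmenting-path search (Kuhn's algorithm, which finds the same augmenting paths). The edge list for H below is our reading of Figure 5.2, consistent with the worked example:

```python
def max_bipartite_matching(adj):
    """Maximum matching by repeated augmenting-path search (Kuhn's
    algorithm). adj maps each X-vertex to its list of Y-neighbours."""
    match = {}          # Y-vertex -> X-vertex currently matched to it

    def augment(x, seen):
        for y in adj[x]:
            if y not in seen:
                seen.add(y)
                # y is free, or its current partner can be re-matched elsewhere:
                if y not in match or augment(match[y], seen):
                    match[y] = x
                    return True
        return False

    for x in adj:
        augment(x, set())
    return {(x, y) for y, x in match.items()}

# Graph H of Figure 5.2 (our reading of its edges).
H = {'a': ["a'", "b'"], 'b': ["a'", "c'"], 'c': ["b'", "d'"], 'd': ["c'"]}
M = max_bipartite_matching(H)
print(sorted(M))   # a perfect matching of size 4
```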
Question 5.2.3 Find a maximum matchings in graph R of Figure 5.2. Start from the matching aa , db . Question 5.2.4 Find a maximum matching in a graph G with V (G) = {1, 2, . . . , 5, 1 , 2 , . . . , 5 } and E(G) = {11 , 21 , 24 , 25 , 32 , 34 , 35 , 43 , 45 , 51 , 52 , 53 }. Question 5.2.5 Six workers, A, B, C, D, E and F, are to be assigned to ve tasks, , , , and . For safety reasons, task must be done by two people working together. Worker A can perform tasks and , B asks and , C tasks and , D tasks , and , E task only, F tasks and . Each worker can perform only one task. (a) Find a way to assign the workers to the tasks such that all tasks will be carried out.

(b) Suppose that C can no longer do task … for health reasons. Explain why it is no longer possible to carry out all tasks.

5.3 Application of matchings

Solving systems of linear equations by the Gauss elimination algorithm, one first needs to arrange the equations in the system such that the coefficients of the diagonal variables are non-zero. This can be achieved by using the algorithm for finding a maximum matching in a bipartite graph. For convenience, we will consider an equivalent problem: given a square matrix A = [aij], rearrange its columns such that every diagonal entry aii becomes non-zero. We illustrate a general solution to this problem by the following example. Consider the matrix

A =
1 1 0 0
1 0 1 0
0 1 0 1
0 0 1 0

We see that all diagonal entries of A but a11 are zero. To find a rearrangement of the columns of A that makes all diagonal entries non-zero, construct a bipartite graph B(A) corresponding to A. Let the partite sets of B(A) be {1, 2, 3, 4} and {1′, 2′, 3′, 4′}. Edge ij′ is in B(A) if and only if aij ≠ 0. It is easy to check that B(A) is actually graph H in Figure 5.2, where a is 1, b is 2, c is 3 and d is 4. We need to find a perfect matching in B(A). We already know that M = {12′, 21′, 34′, 43′} is a perfect matching in B(A). This matching tells us that, because of edge 12′, the first column of A will be replaced by column 2; because of 21′, column 2 will be replaced by column 1; column 3 by column 4; and column 4 by column 3. As a result we get the matrix

A′ =
1 1 0 0
0 1 0 1
1 0 1 0
0 0 0 1

Question 5.3.1 For the following matrix

A =
1 1 0 0 0
0 0 0 1 0
1 0 0 1 0
1 1 1 0 1
0 0 1 0 0


find a column rearrangement which puts A in a zero-free diagonal form. Hint: use the result of Question 5.2.2.

Question 5.3.2 For the following matrix

A =
1 0 0 0 0
1 0 0 1 1
0 1 0 1 1
0 0 1 0 1
1 1 1 0 0

find a column rearrangement which puts A in a zero-free diagonal form. Hint: use the result of Question 5.2.4.

Question 5.3.3 Let G be a bipartite graph with vertices V(G) = {1, 2, 3, 4, 5, 1′, 2′, 3′, 4′, 5′} and edges E(G) = {12′, 13′, 15′, 21′, 22′, 31′, 43′, 52′, 54′}. Starting from the matching M = {12′, 21′, 43′, 54′} of G, find a maximum matching of G. Using this matching, find a column permutation that puts the matrix

A =
0 1 1 0 1
1 1 0 0 0
1 0 0 0 0
0 0 1 0 0
0 1 0 1 0

in a 0-free-diagonal form. Write down the 0-free-diagonal matrix.

5.4 Solutions

Question 5.3.3 Let M = {12′, 21′, 43′, 54′}. To increase M, we grow an alternating tree with root 5′. The tree gives us an augmenting path 5′ 1 2′ 2 1′ 3. Thus, we can replace 12′, 21′ in M by 15′, 22′, 31′, which gives us a perfect matching M′ = {15′, 22′, 31′, 43′, 54′}. To find the column permutation, notice that B(A) = G.


M′ shows that the required permutation replaces col. 1 by col. 5, col. 2 by col. 2, col. 3 by col. 1, col. 4 by col. 3, and col. 5 by col. 4. As a result, we get the matrix

A′ =
1 1 0 1 0
0 1 1 0 0
0 0 1 0 0
0 0 0 1 0
0 1 0 0 1
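The solution can be verified mechanically. A sketch: column i of the permuted matrix is column j of A whenever ij′ is an edge of the perfect matching M′ = {15′, 22′, 31′, 43′, 54′}:

```python
# Verify the column permutation of Question 5.3.3.
A = [[0, 1, 1, 0, 1],
     [1, 1, 0, 0, 0],
     [1, 0, 0, 0, 0],
     [0, 0, 1, 0, 0],
     [0, 1, 0, 1, 0]]

matching = {1: 5, 2: 2, 3: 1, 4: 3, 5: 4}      # i -> j for each edge ij' of M'
perm = [matching[i] - 1 for i in range(1, 6)]  # 0-based source column per position
A2 = [[row[j] for j in perm] for row in A]

for row in A2:
    print(row)
print(all(A2[i][i] == 1 for i in range(5)))    # True: zero-free diagonal
```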


Chapter 6

Euler trails and Hamilton cycles in graphs


6.1 Euler trails in multigraphs

In this section we study undirected and directed multigraphs, i.e., generalizations of undirected and directed graphs in which parallel edges and arcs are allowed. A trail T in a directed or undirected multigraph G is called Euler if T is closed (i.e., starts and ends at the same vertex) and includes every arc or edge of G (exactly once, as T is a trail). When you are asked to draw a figure without taking pen off the paper, you draw an Euler trail in the corresponding multigraph G. If G has no Euler trail, you cannot draw G without taking pen off the paper. Euler was the first to characterize (undirected) multigraphs with Euler trails. Again, here some obvious necessary conditions are also sufficient conditions. This was the very first theorem in graph theory.

Theorem 6.1.1 A multigraph G has an Euler trail if and only if G is connected and every vertex in G has even degree.

This theorem allows us to conclude, in particular, that any connected multigraph whose vertices are all of degree 4 has an Euler trail. Suppose that a connected multigraph G has all vertices of even degree apart from two, x and y. By adding a new edge xy, we make the degrees of all vertices even, i.e., the graph becomes Euler, i.e., it has an Euler trail. Similarly, if we have a connected multigraph H with 2k vertices x1, x2, . . . , x2k of odd degree, then at least k edges, say, x1x2, x3x4, . . . , x2k−1x2k, are needed to make H Euler. Clearly, k is the minimum such number.
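An Euler trail guaranteed by Theorem 6.1.1 can actually be found in linear time. A sketch of Hierholzer's algorithm, a standard method for this (the multigraph below is a made-up example):

```python
def euler_trail(adj):
    """Closed Euler trail by Hierholzer's algorithm. adj is a dict
    vertex -> list of neighbours (with multiplicity) of a connected
    multigraph in which every vertex has even degree."""
    adj = {v: list(ns) for v, ns in adj.items()}   # local copy; we consume edges
    start = next(iter(adj))
    stack, trail = [start], []
    while stack:
        v = stack[-1]
        if adj[v]:
            u = adj[v].pop()
            adj[u].remove(v)    # remove the other copy of the edge vu
            stack.append(u)
        else:
            trail.append(stack.pop())
    return trail                # closed walk using every edge exactly once

# A made-up multigraph with all degrees even: triangle a-b-c plus the
# edge a-d taken twice (a parallel pair).
G = {'a': ['b', 'c', 'd', 'd'], 'b': ['a', 'c'], 'c': ['a', 'b'], 'd': ['a', 'a']}
t = euler_trail(G)
print(t)   # 5 edges, so 6 vertices are listed, first = last
```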


Figure 6.1: Multigraphs

Question 6.1.2 What is the minimum number of edges to add to the multigraphs in Figure 6.1 to make them Euler?

Question 6.1.3 Let G be a connected graph. Prove that we can orient the edges of G to form a digraph D in which |d+(x) − d−(x)| ≤ 1 for each vertex x.

Solution: Let 2k be the number of vertices of G of odd degree. Add k edges to G between vertices of odd degree to obtain a multigraph G′ in which all vertices have even degree. Let F be the set of added edges. By Theorem 6.1.1, G′ has an Euler trail T. Now we start from the first vertex of T, traverse T in an arbitrary direction, and each time we visit an edge xy from x to y we orient it as an arc (x, y). Each edge will be oriented and we will obtain a digraph D′ with the same Euler trail T. Clearly, the in-degree and out-degree of each vertex z in D′ are the same (we enter z as many times as we leave it). Now delete all oriented edges of F from D′. We obtain a digraph D. Since a vertex z of D′ is incident to at most one edge of F, either its in-degree in D is the same as in D′ or it drops by 1. Similarly, the out-degree of z in D is the same as in D′ or it drops by 1. Moreover, if the in-degree drops by 1, then the out-degree remains the same, and vice versa. So, in D, we have either d+(z) = d−(z) or |d+(z) − d−(z)| = 1.

In bioinformatics, researchers use so-called edge-coloured multigraphs (for example, to study chromosome arrangements in a cell). An edge-coloured multigraph is an undirected multigraph in which every edge has a colour; the colours will be integers 1, 2, . . . . In bioinformatics, we are interested in properly coloured Euler trails. An Euler trail T is called properly coloured (PC) if any two consecutive edges of T have different colours. For a vertex x of G and a colour i, di(x) denotes the number of edges of colour i incident to x.

Theorem 6.1.4 (Kotzig's Theorem) An edge-coloured multigraph G has a properly coloured Euler trail if and only if G is connected, each vertex of G is of even degree, and for every vertex x and every colour i, di(x) ≤ Σj≠i dj(x).

Question 6.1.5 Let G be a connected graph in which each vertex degree is even. Prove that we can assign colours 1, 2, 3 to the edges of G such that G has a PC Euler trail. Is the same true when only two colours are assigned?

Solution: By Theorem 6.1.1, G has an Euler trail T. Colour the edges of T using colours 1 and 2 alternately. This way all consecutive pairs of edges get different colours apart from, possibly, the last and the first edges of T (if the last edge is coloured 1). In this case, recolour the last edge with colour 3. Now T becomes a PC Euler trail. We cannot restrict ourselves to colours 1 and 2 only: for example, consider the cycle of length 3, two of whose edges will have the same colour whatever the colouring.

Theorem 6.1.1 is a special case of Kotzig's Theorem (use a new colour for each edge of G). The following characterization of directed multigraphs with Euler trails (i.e., Euler directed multigraphs) generalizes the previous characterization for the undirected case. It is also a special case of Kotzig's Theorem. To see this, replace every arc (x, y) of D by edges xz and zy of colours 1 and 2, respectively, where z is a new vertex, different for each arc (x, y). In general, such a transformation is called Häggkvist's transformation. It is illustrated in Figure 6.2.

Figure 6.2: Häggkvist's transformation
Theorem 6.1.6 A directed multigraph D has an Euler trail if and only if the underlying graph UG(D) is connected and, for every vertex x of D, d+(x) = d−(x).



Figure 6.3: Weighted graphs

In particular, if a digraph H is k-regular, i.e., d+(x) = d−(x) = k for every vertex x in H, then H is Euler.

6.2 Chinese Postman Problem

Suppose a mail carrier traverses all edges in a road network (a connected graph G), starting and ending at the same vertex. The edges of G have non-negative weights representing distance or time. We'd like to find a closed walk of minimum total length that uses all the edges of G. This is called the Chinese Postman Problem because it was first introduced by the Chinese mathematician Guan Meigu (1962).

Let us analyze the problem. If all vertex degrees of G are even, then G has an Euler trail and, thus, any Euler trail is a solution to the problem (we do not need to take any edge more than once). Suppose that G has two vertices x, y of odd degree. Then we can find a shortest path P between x and y (for example, using Dijkstra's algorithm) and, by doubling the edges of G belonging to P, we get a new multigraph G′ all of whose degrees are even. It is possible to prove that an Euler trail of G′, viewed as a closed walk in G, is a solution to the problem in this case.

Question 6.2.1 Solve the Chinese Postman Problem for the graph F in Figure 6.3.

Solution: The graph F has only two vertices of odd degree: a and b. The shortest path between a and b is adb. Thus, we double the edges ad and db in F to obtain a new multigraph F′. The trail gabefbdcbdadg is an Euler trail of F′, which, as a closed walk in F, is a solution to the Chinese Postman Problem for F.

Space for a figure of F′.

Question 6.2.2 Solve the Chinese Postman Problem for the graph H in Figure 6.3.

Now assume that G has 2k vertices of odd degree (every graph has an even number of vertices of odd degree, see Chapter 1). Then we can find a shortest path between every pair of these vertices and form a complete graph K on the 2k vertices such that the weight w(xy) of an edge xy of K is the distance between x and y in G. Now we find a minimum-weight perfect matching F in K and, for each edge uv of F, double all edges of G that are used in a shortest path between u and v in G. As a result we obtain a new multigraph G′ whose Euler trail, viewed as a closed walk in G, is a solution to the Chinese Postman Problem for G.
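The general method just described can be sketched in code. For simplicity, the minimum-weight pairing of the odd-degree vertices is found by brute force and we only compute the length of an optimal closed walk; the instance is that of Question 6.2.4, with the vertices a–e numbered 0–4:

```python
def chinese_postman_length(n, wedges):
    """Length of a shortest closed walk using every edge of a connected
    weighted graph (brute force over pairings of the odd-degree vertices)."""
    INF = float('inf')
    dist = [[INF] * n for _ in range(n)]
    deg = [0] * n
    total = 0
    for u, v, w in wedges:
        dist[u][v] = dist[v][u] = min(dist[u][v], w)
        deg[u] += 1
        deg[v] += 1
        total += w
    for i in range(n):
        dist[i][i] = 0
    for k in range(n):                       # Floyd-Warshall shortest distances
        for i in range(n):
            for j in range(n):
                dist[i][j] = min(dist[i][j], dist[i][k] + dist[k][j])

    odd = [v for v in range(n) if deg[v] % 2 == 1]

    def best_pairing(vs):
        # Cheapest perfect matching on the odd vertices, by recursion.
        if not vs:
            return 0
        x, rest = vs[0], vs[1:]
        return min(dist[x][y] + best_pairing([z for z in rest if z != y])
                   for y in rest)

    return total + best_pairing(odd)

# Question 6.2.4: K4 on {a,b,c,d} plus the pendant edge de, all weights 1.
edges = [(0, 1, 1), (0, 2, 1), (0, 3, 1), (1, 2, 1), (1, 3, 1), (2, 3, 1), (3, 4, 1)]
print(chinese_postman_length(5, edges))   # 10
```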

Question 6.2.3 Solve the Chinese Postman Problem for the graph G in Figure 6.3.

Solution: The graph G has four vertices of odd degree: a, c, e, f. We need to find the distances between them. By inspection, we find: dist(a, c) = 6, dist(a, e) = 4, dist(a, f) = 9, dist(c, f) = 5, dist(c, e) = 9, dist(e, f) = 8. In the complete graph K on vertices a, c, e, f, the edges ae and cf of weights dist(a, e) = 4 and dist(c, f) = 5 form a lightest perfect matching, of weight 4 + 5 = 9. The corresponding shortest paths are ae and cbf. Thus, we double the edges ae, cb and bf in G to form a new multigraph G′. The trail daeabcbefbfcd is an Euler trail in G′ which, as a closed walk in G, is a solution to the Chinese Postman Problem for G.

Space for figures of K and G′.

Question 6.2.4 Solve the Chinese Postman Problem for a graph G with V (G) = {a, b, c, d, e}, E(G) = {ab, ac, ad, bc, bd, cd, de} in which each edge is of weight 1.


6.3 Hamilton cycles

A cycle in a directed or undirected graph G is Hamilton if it includes all vertices of G. Unlike multigraphs with Euler trails, there is no characterization of graphs with Hamilton cycles. Moreover, the problem of checking whether a graph has a Hamilton cycle is computationally intractable (more precisely, it is NP-complete), and thus no fast algorithm is known for checking whether a graph has a Hamilton cycle. There are only some sufficient conditions that guarantee that a graph has a Hamilton cycle. In particular:

Theorem 6.3.1 (Dirac's Theorem) A graph with n ≥ 3 vertices in which every vertex is of degree at least n/2 has a Hamilton cycle.

In particular, K_{m,m} has a Hamilton cycle, since each of its n = 2m vertices has degree m = n/2.

Of course, there are many graphs with a Hamilton cycle that do not satisfy this condition. In particular, consider a graph G on n ≥ 5 vertices consisting of a single cycle. Clearly, G has a Hamilton cycle (namely itself), but every vertex has degree 2 < n/2, so G does not satisfy the condition of Dirac's Theorem.

The following theorem is stronger than Dirac's Theorem (see Question 6.3.3):

Theorem 6.3.2 (Ore's Theorem) Let G be a graph with n ≥ 3 vertices in which, for every pair x, y of distinct non-adjacent vertices, we have d(x) + d(y) ≥ n. Then G has a Hamilton cycle.

Question 6.3.3 Derive Dirac's Theorem from Ore's Theorem.

There are better sufficient conditions that cover more graphs, but they will not be studied in this course. There are also sufficient conditions for digraphs to have a Hamilton cycle. Analogously to Ore's Theorem, we have the following:

Theorem 6.3.4 (Meyniel's Theorem) Let D be a strong digraph of order n ≥ 2. If d(x) + d(y) ≥ 2n - 1 for all pairs of non-adjacent vertices in D, then D has a Hamilton cycle.

Question 6.3.5 Derive Ore's Theorem from Meyniel's Theorem.

Solution: Consider a graph G on n vertices satisfying the conditions of Ore's Theorem, i.e., d(x) + d(y) ≥ n for each pair x, y of non-adjacent vertices. Let D be the digraph obtained from G by replacing every edge xy with the two arcs (x, y) and (y, x). First note that G is connected: if x and y lay in different components with n1 and n2 vertices, they would be non-adjacent with d(x) + d(y) ≤ (n1 - 1) + (n2 - 1) ≤ n - 2 < n, a contradiction. Hence D is strong. In D, every degree doubles (each edge becomes two arcs), so d(x) + d(y) ≥ 2n ≥ 2n - 1 for all pairs x, y of non-adjacent vertices. By Meyniel's Theorem, D has a Hamilton cycle, and this cycle corresponds to a Hamilton cycle in G.
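Ore's condition is easy to test mechanically. The following sketch (the function name and the example graphs are ours, purely for illustration) checks the condition for a graph given as an edge list:

```python
from itertools import combinations

def satisfies_ore(n, edges):
    """Check Ore's condition: d(x) + d(y) >= n for every pair of
    distinct non-adjacent vertices x, y (vertices are 0..n-1)."""
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return all(len(adj[x]) + len(adj[y]) >= n
               for x, y in combinations(range(n), 2)
               if y not in adj[x])

# K4 minus one edge: the only non-adjacent pair (2, 3) has degree sum 4 >= 4
print(satisfies_ore(4, [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3)]))  # True
# C5 (a 5-cycle): every non-adjacent pair has degree sum 4 < 5
print(satisfies_ore(5, [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]))  # False
```

Note that the 5-cycle fails the test yet still has a Hamilton cycle, exactly as discussed above: the condition is sufficient, not necessary.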



Figure 6.4: An example for DTH

6.4 Travelling Salesman Problem

In the Travelling Salesman Problem (TSP), we are given K_n with non-negative weights assigned to its edges. We need to find a Hamilton cycle of K_n of minimum weight (the weight of a cycle is the sum of the weights of its edges).

TSP is even more difficult than the problem of checking whether a graph G has a Hamilton cycle. Indeed, we can transform the latter into TSP by considering K_n with the same vertices as G, in which an edge xy is of weight 0 if it belongs to G and of weight 1 if it does not. Clearly, G has a Hamilton cycle if and only if K_n with the weights assigned above has a Hamilton cycle of weight 0 (which is of minimum weight, of course).

Consider a minimum-weight Hamilton cycle C in K_n with non-negative weights assigned to its edges. If we delete an edge from C, we get a Hamilton path P, which is a spanning tree of K_n. P is not necessarily a minimum-weight spanning tree, but clearly the weight of a minimum-weight spanning tree gives a lower bound on the weight of a minimum-weight Hamilton cycle C in K_n.

Since TSP is very difficult, there are heuristic algorithms (heuristics, for short) for finding a good solution rather than an optimal one. We consider one such heuristic, called the double tree heuristic (DTH). The heuristic starts by finding a minimum-weight spanning tree T in K_n (with weights on edges). Then it doubles every edge of T, which results in a multigraph M. This multigraph is connected and has only vertices of even degree, so M is Euler. We find an Euler trail R in M. (Notice that R includes all vertices of K_n.) Then we traverse R and delete all repeated vertices from it (apart from the first and last). We get a Hamilton cycle H.

DTH is illustrated in Figure 6.4. Suppose that a minimum-weight spanning tree is given on the left-hand side of the figure. After doubling its edges, we get the multigraph on the right-hand side of the figure. We find an


Euler trail (written as a sequence of vertices for simplicity): acbcdfdedghgdca. After deleting vertex repetitions, we get the tour T = acbdfegha. If the TSP instance satisfies the triangle inequality, i.e. w(xz) ≤ w(xy) + w(yz) for all vertices x, y, z, then it is not difficult to show that the solution found by DTH always has weight at most twice that of the optimal one.
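The steps of DTH can be sketched end to end. The code below is our own illustration, not taken from the notes: it uses Prim's algorithm for the spanning tree and Hierholzer's algorithm for the closed Euler trail, then shortcuts repeated vertices.

```python
import heapq

def double_tree_heuristic(n, w):
    """Sketch of the double tree heuristic on K_n.
    w[i][j] is the (symmetric, non-negative) weight of edge ij."""
    # 1. Minimum-weight spanning tree T (Prim's algorithm from vertex 0).
    in_tree, tree = {0}, []
    pq = [(w[0][v], 0, v) for v in range(1, n)]
    heapq.heapify(pq)
    while len(in_tree) < n:
        _, u, v = heapq.heappop(pq)
        if v in in_tree:
            continue                      # stale entry
        in_tree.add(v)
        tree.append((u, v))
        for x in range(n):
            if x not in in_tree:
                heapq.heappush(pq, (w[v][x], v, x))
    # 2. Double every edge of T: the multigraph M has all degrees even.
    multi = {v: [] for v in range(n)}
    for u, v in tree:
        multi[u] += [v, v]
        multi[v] += [u, u]
    # 3. Closed Euler trail in M (Hierholzer's algorithm).
    stack, trail = [0], []
    while stack:
        v = stack[-1]
        if multi[v]:
            u = multi[v].pop()
            multi[u].remove(v)
            stack.append(u)
        else:
            trail.append(stack.pop())
    # 4. Shortcut: keep only the first occurrence of each vertex.
    seen, tour = set(), []
    for v in trail:
        if v not in seen:
            seen.add(v)
            tour.append(v)
    return tour + [tour[0]]               # Hamilton cycle as vertex sequence

# Example: four "cities" at positions 0, 1, 2, 3 on a line (a metric).
w = [[0, 1, 2, 3],
     [1, 0, 1, 2],
     [2, 1, 0, 1],
     [3, 2, 1, 0]]
print(double_tree_heuristic(4, w))        # [0, 1, 2, 3, 0], weight 6
```

On this small metric instance DTH happens to return an optimal tour; in general the guarantee is only a factor of two under the triangle inequality.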

Chapter 7

NP-completeness
In this chapter we shall consider TSP from the computational complexity point of view. There is strong reason to believe that no efficient algorithm can find the absolute best, or optimum, Hamilton cycle for TSP, and that little can be done beyond trying all possible permutations and picking the one that yields the minimal value of the cost function. This is disappointing because there are factorially many permutations. We shall look at the nature of this factorial explosion, and very briefly at a variety of approaches for finding good rather than optimal solutions.

7.1 Why is arranging objects hard?

Consider TSP in the following layperson's formulation: the salesman is given a map containing several cities and routes between them of varying lengths. The salesman's task is to work out an itinerary which visits every city and covers the minimum distance. A naive approach to this problem is to randomly pick a first city to start at, then randomly pick another city, and add the route from the first city to this city to the itinerary being constructed. Now pick another city at random, and add the route from the second city to this city to the itinerary. Continue until an itinerary has been derived which includes all the cities. The problem is that this may not give the shortest distance.

7.2 Brute force optimisation

Although the naive algorithm above is likely to produce a very inefficient route, it does at least have the advantage of being fast: it grows linearly with the number of cities on the map. Using this as the basic method, an optimal itinerary could be found by constructing all possible itineraries and selecting the one that yields the shortest distance. However,


there are n! ways of ordering the n cities (because the first city can be any of the n cities, the second city can be any of the remaining n - 1, and so on). Assume we have a computer that can execute an entire itinerary construction in 0.1 s. How large a map can we cope with using the exhaustive optimisation algorithm of simply trying every ordering and selecting the best one? Cosmologists believe that the age of the universe since the big bang is about 1.6 × 10^17 s, so the number of tries must certainly be less than 1.6 × 10^18, since we can perform 10 tries per second. Unfortunately the factorial function grows very rapidly: 20! is about 2.4 × 10^18, so even if we had all the time in the world we could only check maps with up to 19 cities. More realistically, we would like the process to complete in less than ten minutes, which limits the optimisation algorithm to a maximum of 6,000 attempts. This restricts us to 7 cities, because 7! = 5,040. This analysis makes for depressing reading. We have deliberately selected a near-trivial construction algorithm that runs quickly at the cost of producing a poor itinerary, and yet optimising the route requires factorial time, limiting us to the optimisation of trivial maps. Even if we had a computer that was 10 times faster we could only make 60,000 checks, which is only sufficient to cope with eight cities, since 8! = 40,320. This is an important result because it means that speeding up the machine by an order of magnitude only increases the maximum allowable map size by one city. If this is the best available algorithm, then we would be justified in claiming that route planning is completely intractable for non-trivial maps.
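The exhaustive optimisation just described can be written down directly. The sketch below (the weight matrix is made up for illustration) fixes the starting city and tries all (n - 1)! orderings of the rest:

```python
from itertools import permutations

def brute_force_tsp(w):
    """Try all orderings of cities 1..n-1 (fixing city 0 as the start)
    and return the cheapest tour. Runs in O((n-1)!) time."""
    n = len(w)
    best_tour, best_cost = None, float("inf")
    for perm in permutations(range(1, n)):
        tour = (0,) + perm + (0,)
        cost = sum(w[tour[i]][tour[i + 1]] for i in range(n))
        if cost < best_cost:
            best_tour, best_cost = tour, cost
    return best_tour, best_cost

w = [[0, 2, 9, 10],
     [2, 0, 6, 4],
     [9, 6, 0, 3],
     [10, 4, 3, 0]]
print(brute_force_tsp(w))   # ((0, 1, 3, 2, 0), 18)
```

Four cities is instantaneous; twenty, as computed above, would outlast the universe.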

7.3 How to spot an explosive algorithm

The details of particular implementations are washed out of the complexity analysis by considering the proportional increase in the number of steps and memory locations required as the size of the problem grows. If, for instance, the number of locations required quadruples when we double the number of cities to be visited, then the space complexity of the algorithm is proportional to the square of the number of inputs. Such an algorithm is called quadratic. Our naive routing algorithm has a time complexity that grows as the factorial of the number of cities, and is therefore a factorial algorithm. If an algorithm requires 15N^2 steps to finish a problem with N inputs then the runtime T is proportional to N^2. Imagine that this algorithm required a large number of 32-bit additions to be performed. A 16-bit computer executing the same algorithm might have to execute two steps to perform each addition, but even so the number of steps would still be at worst 30N^2. In both cases T is proportional to N^2, or in the terminology of complexity analysis T is of the order of N^2, written T = O(N^2). Clearly the constant of proportionality here is a property of the implementation, not just the algorithm. It turns out that algorithms that are factorial (O(N!)) or exponential (O(2^N)) are so expensive that even the fastest computer cannot handle problems with more than


10 to 15 inputs in a reasonable time. This is surprising, so we will consider the behaviour of exponential functions further.
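A quick tabulation makes the gap between the growth rates concrete (the step counts are the raw functions, ignoring constants of proportionality):

```python
import math

# "Steps" required by quadratic, exponential and factorial algorithms:
for n in (5, 10, 15, 20):
    print(n, n**2, 2**n, math.factorial(n))
# 5   25    32        120
# 10  100   1024      3628800
# 15  225   32768     1307674368000
# 20  400   1048576   2432902008176640000
```

Already at 20 inputs the factorial column reaches the 2.4 × 10^18 figure used above, while the quadratic column is a trivial 400.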

7.3.1 Exponential functions

When asked to make estimates, humans tend to think linearly. As a simple example, consider this problem: take a piece of paper and fold it in two. Then fold it in two again. Continue until a total of twenty folds have been made. How thick will the resulting assembly be?

Most people guess somewhere around the thickness of a large paperback book, say 3 to 4 cm. However, the real answer is about 100 metres. A 500-page paperback novel is 2.5 cm thick, so each leaf of paper must be about 0.1 mm thick (since there are two pages to a leaf). Every time the pile of paper is folded, the stack doubles in thickness, so the thickness in metres is

10^-4 × 2 × 2 × ... × 2 = 10^-4 × 2^20 ≈ 104.9 m.

This startling result can easily be made more startling still. The choice of twenty folds was simply an arbitrary small integer. How about 42 folds? In that case the thickness will be 2^42 × 10^-4 m, or about 400,000 km. I chose 42 folds because 400,000 km is roughly the distance from the earth to the moon, truly an astronomical distance.

These answers are surprising because everyday observable processes usually grow at a constant rate, and we are conditioned to think in terms of this linear expansion. Exponential effects are rare, but some of them work in our favour. In a sea port, the largest of ships can be tethered by a few turns of a mooring rope around a bollard. It turns out that the friction between the rope and the bollard goes up exponentially with the length of rope turned around the bollard, so only a short length is required to hold an enormous mass in place. Usually, however, exponentials cause nasty surprises. There is a story about a medieval philosopher who performed a great service for his king and in return was allowed to ask for his heart's desire. The philosopher asked for a chessboard, with a single grain of wheat on the first square, two grains on the second, four on the third, and so on. This seemed to the king (who had a linear mind) a reasonable request, so he granted it.
The story does not record the philosopher's fate after the king discovered the implications of this decision.
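Both calculations are easy to reproduce:

```python
leaf_thickness_m = 1e-4                    # 0.1 mm per leaf

print(leaf_thickness_m * 2**20)            # ~104.9 m after 20 folds
print(leaf_thickness_m * 2**42 / 1000)     # ~439,805 km after 42 folds

# The philosopher's chessboard: 1 + 2 + 4 + ... + 2^63 grains of wheat.
print(sum(2**i for i in range(64)))        # 18446744073709551615
```

The chessboard total, 2^64 - 1, is more wheat than has ever been harvested.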

7.3.2 Good algorithms

We have already seen that algorithms that require factorial or exponential numbers of steps will take so long to execute as to be useless for problem sizes greater than about


ten or fifteen, whereas polynomial time algorithms present far more reasonable demands for computer time and space. We will follow common practice in calling polynomial time algorithms good. Formally, a good algorithm A is an algorithm such that T_A = O(n^k) for some k ≥ 0, where T_A is the worst-case time complexity. Algorithms that have worse than polynomial behaviour are rather loosely called exponential even when they are not strictly so, as in the case of our factorial naive routing algorithm.

It is worth noting that for some applications bad algorithms may be best. For instance, for a problem of size 20, given a choice between an exponential algorithm with T = O(2^N) and a polynomial algorithm with T = O(N^5), we would choose the exponential because 2^20 is less than 20^5. More subtly, the constant of proportionality in an order relation is not solely a property of the implementation. Sometimes algorithms showing quite small polynomial behaviour are known to exhibit very large constants of proportionality. Whilst there will always be some value of N above which an exponential algorithm will take longer than this polynomial algorithm, that value of N might be greater than any real problem in practice.
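The crossover point for the size-20 comparison above is easy to locate by direct search (a small check we added, not part of the notes):

```python
# At size 20 the "exponential" algorithm wins: 2^20 = 1,048,576 while
# 20^5 = 3,200,000. Find the smallest N >= 2 at which 2^N overtakes N^5.
N = 2
while 2**N <= N**5:
    N += 1
print(N)   # 23: 2^23 = 8,388,608 exceeds 23^5 = 6,436,343
```

So for this pair of complexities the exponential algorithm is preferable only up to problems of size 22.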

7.4 Tractable problems

Analysing a single algorithm is useful and can be used to demonstrate that one algorithm is better than another, but this kind of complexity analysis does not tell us if a given algorithm is the best possible solution to a particular problem. Rather remarkably, it is possible to make general statements about certain problem domains that tell us what the performance of the best possible algorithm will be even when we do not know what that algorithm is. More ominously, analysis can demonstrate that for certain problems no polynomial time solution exists. There are even problems for which the lower bound on the time complexity is not just exponential but doubly exponential (2^(2^N)). In fact there are problems where the lower bound on time complexity is of the form

2^(2^(2^(...)))

where the number of 2s can be arbitrarily large depending on the size of the input problem. It should be clear that these problems admit no algorithm that could be useful on any computer.

When discussing problems (as opposed to particular algorithms for solving those problems) we talk about the problem domain. A problem domain is tractable if there is a good, or polynomial time, algorithm for solving it. The class of tractable problems is called P. The following problems are in P:

Graph Connectivity Problem: given a graph G, check whether G is connected;

Digraph Connectivity Problem: given a digraph D, check whether D is strongly connected;

Perfect Bipartite Matching Problem: given a bipartite graph G, check whether G has a perfect matching.

Question 7.4.1 Give further examples of tractable problems.


7.5 The class NP

We are interested in combinatorial optimisation problems. Such problems feature a large space of possible configurations (such as possible itineraries) and a cost function, such as distance. A good solution to the problem minimises the distance.

A decision problem is a mapping from a set of inputs to one of the two outcomes YES or NO. In other words, a decision problem is a set of inputs, called instances, some of which are YES-instances and the rest NO-instances. For example, every instance of the decision version of TSP consists of the number of cities N, all distances between distinct cities, and some threshold L. The instances having a Hamilton cycle of length at most L are YES-instances; the rest are NO-instances.

In fact, every optimisation problem has a corresponding decision problem, formed by asking if there is a solution configuration of the optimisation problem that is better than some threshold. The decision problem has boolean output and a result that can be easily checked: if an itinerary shorter than the threshold is found, it can be demonstrated to fulfil the requirement in polynomial time simply by adding up the lengths of its legs. If, on the other hand, we were required to demonstrate that a particular itinerary is shorter than any other, we would have to generate all possible itineraries, which as we have seen requires factorial time. Hence this device of introducing a threshold does not reduce the time required to find a configuration that meets the requirement, but it does allow us to demonstrate in polynomial time that a configuration, once found, is valid.
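That "easily checked" claim can be made concrete. The sketch below (names and data are ours) verifies a claimed tour for the decision version of TSP in polynomial time:

```python
def verify_tsp_certificate(w, L, tour):
    """Polynomial-time verification of a YES certificate for the decision
    version of TSP: `tour` must be a Hamilton cycle of total length <= L."""
    n = len(w)
    if len(tour) != n + 1 or tour[0] != tour[-1]:
        return False
    if sorted(tour[:-1]) != list(range(n)):     # every city exactly once
        return False
    return sum(w[tour[i]][tour[i + 1]] for i in range(n)) <= L

w = [[0, 2, 9, 10],
     [2, 0, 6, 4],
     [9, 6, 0, 3],
     [10, 4, 3, 0]]
print(verify_tsp_certificate(w, 18, [0, 1, 3, 2, 0]))   # True
print(verify_tsp_certificate(w, 17, [0, 1, 3, 2, 0]))   # False
```

Checking the certificate costs O(n) additions and comparisons; finding it is the hard part.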

7.5.1 Minimisation and backtracking

The process of testing all possible itineraries of the naive routing algorithm can be viewed as a process of backtracking. We have N bins and N cities. We allocate all the cities to bins in an arbitrary way, and then rip up part of the arrangement and try it another way. In fact, as we allocate cities to bins we have a choice of all of the remaining cities. This is what gives us the N × (N - 1) × (N - 2) × ... = N! behaviour of the algorithm.

Now imagine that we had an unlimited supply of computers. We could use one computer to decide which of the N cities to allocate to the first bin, and then create an array of N - 1 computers to decide which of the remaining cities should go into the next bin. Each of these N - 1 branches would then pass its information down to N - 2 computers below it, which would decide in parallel which of the remaining cities to allocate to the next bin. This machine would in fact form a tree of computers N levels deep but up to N! computers wide at the lowest level. We have traded time complexity for space (parallelism) complexity.

There will be at least one path through the tree that corresponds to a minimal configuration of the problem. Imagine, instead of this rapidly branching tree of computers, a single computer with the magical power to guess the right answer correctly. At each branch of the tree this computer would take the branch that leads to the correct result. It is clear that this computer could find the correct result in only N steps, that is, in polynomial time. This magical (and sadly unrealisable) machine is called a non-deterministic machine. A non-deterministic algorithm for some decision problem is an algorithm that is allowed to make an ideal guess at each stage and thus solves the decision problem in polynomial time.

Not all problems have polynomial solutions even on non-deterministic machines, but the set of problems that do have such polynomial solutions using non-deterministic machines is called NP. It is not known whether all problems in NP have polynomial algorithms, that is, whether P = NP, but there is good reason to suppose that P ≠ NP.

Very remarkably, there is a set of problems in NP that can be transformed into each other in polynomial time. This set is called the NP-complete problems. Since they are all convertible to each other in polynomial time, if any one of the several thousand presently known such problems has a polynomial time algorithm then they all have. However, so far no one has ever found one, so the suspicion is that they are all intractable. On the other hand, exponential lower time bounds have not been proven for any of them either.

Since NP-complete problems are believed to be intractable, we cannot guarantee finding an exact solution to every relatively moderate instance of a minimisation NP-complete problem in practical time. However, heuristics may be able to approach a global minimum in polynomial time using well-chosen rules of thumb.

7.5.2 Examples of NP-complete problems

In this section we will give several examples of NP-complete problems. If we want to prove that a problem, say A, is NP-complete, then we have to do the following two steps.

(i) Prove that A belongs to the class NP.

(ii) Show that if we could solve A in polynomial time then we could also solve some other NP-complete problem in polynomial time.

In order to do (ii) above, we take an arbitrary instance of a problem that is known to be NP-complete, and then transform it into an instance of A, using at most a polynomial


number of steps in the process. We then need to show that the answer to this instance of A directly gives us the answer to the original NP-complete problem. We will later give some examples of how to do (ii).

In order to do (i) above, we need to show that A belongs to the class NP. We will give some examples of how to do that later, when we have an alternative definition of the class NP.

If, instead of showing that A is NP-complete, we just wanted to show that A is NP-hard, then we would only need to do (ii) above. This means that an NP-hard problem does not have to lie in the class NP, but it does have the property that if we could solve it in polynomial time, then we could solve every problem in NP in polynomial time. We will, however, mainly be interested in NP-complete problems. We now give a few examples of such problems:

(1) Hamilton Path Problem: given a graph G, check whether G has a Hamilton path, i.e., a sequence v1 v2 ... vn of all vertices such that vi is adjacent to vi+1 for each i < n;

(2) 3-SAT (recall the main terms of Section 3.4): the 3-Satisfiability Problem, also called 3-SAT, is the following problem. Let X = {x1, ..., xk} be a set of boolean variables and let C1, ..., Cr be a collection of clauses, all of size 3, in which every literal is over X. Decide if there exists a truth assignment t = (t1, ..., tk) to the variables in X such that the value of every clause is 1. This is equivalent to asking whether or not the boolean expression F = C1 ∧ ... ∧ Cr can take the value 1; depending on whether this is possible or not, we say that F is satisfiable or unsatisfiable. Notice that 2-SAT is polynomial time solvable, but 3-SAT is not (unless P = NP, which the vast majority believe is not true);

(3) 3-Colouring: given a graph G, check whether χ(G) ≤ 3. Notice that 2-Colouring is polynomial time solvable, but 3-Colouring is not (unless P = NP, which the vast majority believe is not true);

(4) Subset Sum: given a set N = {n1, n2, ..., nk} of integers and an integer l, check whether there is a subset of N whose elements sum to l.

7.5.3 An alternative definition of the class NP

We can now give an alternative definition of the class NP, which we will then use to show that the four problems mentioned in the previous section are all in the class NP. A decision problem A lies in NP if and only if, whenever the answer is YES, we can verify this in polynomial time. In other words, if we know that the answer is YES (and know what the solution is), then we can prove (to anyone who does not know the answer) that the answer is YES in polynomial time.

We will now give a few examples of how to use this definition.

Hamilton Path Problem. If we have a Hamilton path, then we can give the corresponding sequence of vertices to someone who does not know that the answer is YES, and he or she can verify in polynomial time that it is in fact a Hamilton path. So the Hamilton Path Problem belongs to NP. Note that if the answer is NO, then it is not known whether one could convince someone of this in polynomial time. We cannot just check all possible sequences of vertices, as this could not be done in polynomial time. But this does not matter, as we only have to consider the case when the answer is YES. It does mean, however, that it is not known whether the decision problem No Hamilton Path belongs to NP. Note that this problem is in some sense the complement of Hamilton Path: whenever one answers YES the other answers NO, and vice versa.

3-SAT. If we know that the answer is YES, and have the solution (i.e., the corresponding truth assignment), then we can convince anybody that the answer is YES by giving them the truth assignment. One can then check in polynomial time that it in fact makes the given instance of 3-SAT TRUE. So 3-SAT belongs to NP.

Independent Set, Clique and 3-Colouring. As above, if we do have an independent set of size at least k, then we give this set to anybody, and they can convince themselves in polynomial time that it is an independent set of size at least k. So Independent Set belongs to NP. Similarly for Clique and 3-Colouring.

Subset Sum. Again, we just exhibit which integers sum up to l. This can then be checked in polynomial time. So Subset Sum belongs to NP. For example, consider the set

S = {5, 12, 22, 33, 35, 41, 55, 74, 99, 102, 124, 133, 167, 175, 189}

and the number l = 618. It may not be so easy to decide whether there is a subset of S that sums up to 618 (as S is not too large, it can be done, but if S had 10,000 elements the problem would be very difficult). However, if we know the result (and any other


information we may need), then we can easily convince someone that the answer is YES: we simply give them the set {12, 22, 55, 74, 99, 167, 189}, and they can then quickly (in polynomial time) convince themselves that the answer is in fact YES by checking that the set is a subset of S and that its elements sum up to 618.

Question 7.5.1 Hamilton Cycle Problem: given a graph G, check whether G has a Hamilton cycle. Show that the Hamilton Cycle Problem belongs to the class NP of decision problems.
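The Subset Sum verification just described is a two-line computation, using the set and certificate given above:

```python
S = {5, 12, 22, 33, 35, 41, 55, 74, 99, 102, 124, 133, 167, 175, 189}
certificate = {12, 22, 55, 74, 99, 167, 189}

# Polynomial-time verification of the YES answer for l = 618:
print(certificate <= S)    # True: the certificate is a subset of S
print(sum(certificate))    # 618
```

Both checks run in time linear in the size of the certificate, regardless of how hard the original search was.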

7.6 Proving that a problem is NP-complete

Above we discussed that some decision problems are NP-complete. There are many books and research articles in which thousands of problems are listed as NP-complete. Nevertheless, once you encounter a new decision problem you may want to prove that the problem is NP-complete. (This, among other things, will give you an excuse to use a heuristic rather than a polynomial time algorithm to solve the problem.) There are a few ways to prove NP-completeness of a new decision problem. The key to the proof is to choose the right decision problem that is already known to be NP-complete. Consider the following examples:

(1) Colouring: given a graph G and an integer k, check whether χ(G) ≤ k. Clearly, 3-Colouring is a special case of Colouring in which k always equals 3. 3-Colouring is known to be NP-complete. So, if we could solve Colouring in polynomial time, we could surely solve 3-Colouring in polynomial time, too. Thus, Colouring is NP-hard. To prove that it is NP-complete, it suffices to observe that, given an assignment of k or fewer colours to the vertices of G, we can check in polynomial time whether this assignment is a proper vertex colouring.

(2) Independent Set Problem: given a graph G and an integer k, check whether G has an independent set with at least k vertices. Theorem 4.7.5 asserts the following: a graph G with n vertices has a vertex m-colouring if and only if the Cartesian product of G and Km has an independent set with n vertices. Thus, if we could solve the Independent Set Problem in polynomial time, we could also solve Colouring in polynomial time. Hence, the Independent Set Problem is NP-hard. Clearly, it is in NP.

(3) Hamilton Cycle Problem: given a graph G, check whether G has a Hamilton cycle. We assume that we know that the Hamilton Path Problem is NP-complete. Consider a graph H, an instance of the Hamilton Path Problem. Add to H a new vertex x adjacent to each vertex in H. Observe that the new graph L has a Hamilton

cycle if and only if H has a Hamilton path. Thus, if we could solve the Hamilton Cycle Problem in polynomial time, we could also solve the NP-complete Hamilton Path Problem in polynomial time. Hence, the Hamilton Cycle Problem is NP-hard. Clearly, it is in NP.

(4) TSP: given a complete graph with weights on its edges and an integer l, check whether the graph has a Hamilton cycle of total weight at most l. Consider a graph G. Give weight 0 to all edges of G, join all non-adjacent vertices of G by new edges, and assign weight 1 to each of them. Let H be the resulting weighted complete graph, and set l = 0. Observe that H has a Hamilton cycle of weight 0 if and only if G has a Hamilton cycle. Thus, if we could solve TSP in polynomial time, we could also solve the NP-complete Hamilton Cycle Problem in polynomial time. Hence, TSP is NP-hard. Clearly, it is in NP.

Question 7.6.1 The Satisfiability Problem, also called SAT, is the following problem. Let X = {x1, ..., xk} be a set of boolean variables and let C1, ..., Cr be a collection of clauses, of any size, in which every literal is over X. Decide if there exists a truth assignment t = (t1, ..., tk) to the variables in X such that the value of every clause is 1. Prove that SAT is NP-complete.

Question 7.6.2 Clique Problem: given a graph G and an integer k, check whether G has a clique with at least k vertices. Prove that the Clique Problem is NP-complete using the Independent Set Problem.

Question 7.6.3 A set X of vertices of a graph G is called a vertex cover if every edge of G has at least one endvertex in X.

(1) Let G = (V, E) be a graph and let X be a vertex cover of G. Prove that V \ X is an independent set of G.

(2) Let G = (V, E) be a graph and let I be an independent set of G. Prove that V \ I is a vertex cover of G.

Solution: (1) Let u, v ∈ V \ X. Since neither endvertex of a potential edge uv is in the vertex cover X, we have uv ∉ E. Hence, V \ X is an independent set. (2) Let uv ∈ E. Since u and v cannot both be in the independent set I, at least one of them is in V \ I. Hence, V \ I is a vertex cover.

Question 7.6.4 Vertex Cover Problem: given a graph G and an integer l, check whether G has a vertex cover with at most l vertices. Prove that the Vertex Cover Problem is NP-complete. Hint: use the result of the previous question (both (1) and (2)).
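The reduction used in example (3), adding a vertex adjacent to everything, can be checked by brute force on small graphs. The sketch below (our own illustration; the brute-force checkers are exponential and only for tiny inputs) demonstrates the equivalence:

```python
from itertools import permutations

def has_ham_path(n, edges):
    """Brute-force Hamilton path check on vertices 0..n-1."""
    adj = {(u, v) for u, v in edges} | {(v, u) for u, v in edges}
    return any(all((p[i], p[i + 1]) in adj for i in range(n - 1))
               for p in permutations(range(n)))

def has_ham_cycle(n, edges):
    """Brute-force Hamilton cycle check on vertices 0..n-1."""
    adj = {(u, v) for u, v in edges} | {(v, u) for u, v in edges}
    return any(all((p[i], p[(i + 1) % n]) in adj for i in range(n))
               for p in permutations(range(n)))

def reduce_path_to_cycle(n, edges):
    """Add a new vertex n adjacent to every vertex of H, giving L."""
    return n + 1, edges + [(v, n) for v in range(n)]

# H is the path 0-1-2: it has a Hamilton path but no Hamilton cycle,
# while the reduced graph L does have a Hamilton cycle.
H = (3, [(0, 1), (1, 2)])
L = reduce_path_to_cycle(*H)
print(has_ham_path(*H), has_ham_cycle(*H), has_ham_cycle(*L))
# True False True
```

The reduction itself runs in polynomial time; only our verification of it here is exponential.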


7.7 Proceeding in the face of intractable problems

Faced with such an explosive run-time requirement, the following options present themselves:

1. Get a faster computer.

2. Get a better algorithm.

3. Relax the quality requirement. Rather than searching for the best solution, accept good solutions which may be found without searching the entire set of solutions.

As already discussed, option 1 is likely to yield at best marginal improvements for algorithms with explosive run-times, although it could help a great deal if, say, the algorithm had linear run-time behaviour. Option 2 is ideal, but discovering new algorithms is difficult, and, for example, in the case of many real VLSI layout problems it can be shown with a high degree of confidence that all algorithms solving the problem (even ones not yet discovered) will have explosive run-time behaviour. For problems which have only explosive algorithms, option 3 must be adopted.

It is an observable fact that humans can optimise seven-city maps in less than five minutes, and that humans can produce good routes for much larger maps in much less than factorial time. Of course, all this really means is that the human is not slavishly attempting to construct all possible itineraries, but is somehow short-circuiting the process by only testing plausible ones. These short circuits are called heuristics and act as rules of thumb for finding good solutions. There is no guarantee that the solution found by a heuristic is the best one available, and indeed in certain pathological circumstances a heuristic may perform very badly. It is very important to understand the strengths and weaknesses of any heuristic tools that you use and not to treat them as black boxes that will always generate good results.

Heuristics can take several forms. They may be a guiding principle that is itself deterministic and always yields the same result. On the other hand, the algorithm may attempt to guess the answer by making sensible predictions as to the correct way forward. In order to illustrate some of the best known heuristics, we will consider the following well-known combinatorial minimisation problem.

7.8 TSP Heuristics

We can now return to the travelling salesman problem (TSP). Consider TSP with only nine cities. Even for such a problem, finding a minimum-weight


Hamilton cycle requires nearly an hour on a 25 MHz 80486. We observed earlier that using a faster computer will allow us to increase the size of the solvable TSP only slightly.

There are many TSP heuristics (see the CS3490 blue book and [GP2002]). Two of them are especially simple, and a third has already been described:

(1) In the Nearest Neighbour (NN) heuristic we start from an arbitrary initial vertex and go to the nearest vertex (in terms of the weight), from there to the nearest vertex not yet visited, and so on. We are not allowed to return to already visited vertices before all vertices have been visited. From the last visited vertex we return to the initial one.

(2) The greedy heuristic is a similar algorithm to NN, except that we do not always maintain a single path, but instead maintain a number of disjoint paths. We start from an empty set of edges S (forming a collection of paths, each consisting of just one vertex) and continue in the greedy manner, i.e., by adding to S the lightest edge, then the lightest edge among the remaining ones, and so on, as long as a collection of disjoint paths is maintained. Once we have a Hamilton path, we add the edge between its end vertices and stop.

(3) The double tree heuristic was considered in Section 6.4.

Question 7.8.1 Consider an example of an airline that wants to offer connections between the cities Beijing, London, Mexico City, New York, Paris and Tokyo. The distances in miles between each pair of cities are given in the following table:

        B      L     MC     NY      P      T
B       -   5074   7753   6844   5120   1307
L    5074      -   5558   3469    214   5959
MC   7753   5558      -   2090   5725   7035
NY   6844   3469   2090      -   3636   6757
P    5120    214   5725   3636      -   6053
T    1307   5959   7035   6757   6053      -

The problem is described by an edge-weighted complete graph with vertices B, L, MC, NY, P and T (see Fig. 7.1). Find six Hamilton cycles using NN, starting from each of the cities. Compare the lengths of the cycles.

Question 7.8.2 Find a Hamilton cycle in the graph of Fig. 7.1 using the greedy heuristic.
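The NN heuristic is short enough to sketch in full. To avoid spoiling the exercise, the weight matrix below is a small made-up instance rather than the airline data:

```python
def nearest_neighbour(w, start=0):
    """Nearest Neighbour heuristic: repeatedly move to the closest
    unvisited city, then close the tour back to the start."""
    n = len(w)
    tour, unvisited = [start], set(range(n)) - {start}
    while unvisited:
        current = tour[-1]
        nxt = min(unvisited, key=lambda v: w[current][v])
        tour.append(nxt)
        unvisited.remove(nxt)
    tour.append(start)
    return tour, sum(w[tour[i]][tour[i + 1]] for i in range(n))

w = [[0, 2, 9, 10],
     [2, 0, 6, 4],
     [9, 6, 0, 3],
     [10, 4, 3, 0]]
print(nearest_neighbour(w, 0))   # ([0, 1, 3, 2, 0], 18)
```

Trying each starting city, as Question 7.8.1 asks, just means calling the function with each value of `start` and comparing the returned lengths.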


Figure 7.1: An edge-weighted complete graph


References

There is a large number of books on graph theory, algorithms and related topics. Here is a very short list of them; they were all mentioned in Chapters 1-7.

[BG2000] J. Bang-Jensen and G. Gutin, Digraphs: Theory, Algorithms and Applications, Springer, London, 2000.

[CLR1990] T.H. Cormen, C.E. Leiserson and R.L. Rivest, Introduction to Algorithms, MIT Press, Cambridge MA, 1990.

[GY1999] J. Gross and J. Yellen, Graph Theory and Its Applications, CRC Press, 1999.

[GP2002] The Traveling Salesman Problem and its Variations (G. Gutin and A. Punnen, eds.), Kluwer, Dordrecht, 2002.
