
Algorithm Design and Complexity

Course 7
Overview
- Graphs – Introduction
- Representing Graphs
- Practical Examples of Using Graphs
- Search Algorithms
- BFS
- DFS
- Topological Sorting
Graphs – Introduction
- Graphs are very important data structures, as they model a lot of real-life objects
- There are many problems that must be solved on graphs
  - Some of them are very difficult (NP-complete)
  - Others are in P
- In this chapter, we shall consider problems that are in P, which therefore admit polynomial-time algorithms for finding the exact solution
Very Useful in Practice
- There are many open-source libraries for graphs:
  - Graphviz – http://www.graphviz.org/
  - Prefuse – http://prefuse.org/
  - Flare – http://flare.prefuse.org/
  - Gephi
- Useful for:
  - Generating graphs
  - Visualizing graphs
  - Modeling graphs
Representing Graphs
- As input data:
  - Pairs of vertices representing the edges: (Src, Dest)
  - Specialized data formats for representing graphs:
    - RDF
    - GraphML
    - dot
- In memory, for designing algorithms:
  - Adjacency lists
  - Adjacency matrix
  - Incidence matrix
Dot
graph G {node;
Dristor2--Muncii--Iancului--Obor;
Piata_Victoriei1--Gara_de_nord--Crangasi--Grozavesti--Eroilor;
Pacii--Lujerului--Politehnica--Eroilor;
Republica--Titan--Dristor1--Timpuri_Noi--Unirii1--Izvor--Eroilor;
Dristor1--Dristor2;
Unirii1--Unirii2;
Piata_Victoriei1--Piata_Victoriei2;
Piata_Sudului--Eroii_Revolutiei--Tineretului--Unirii2--Romana--Piata_Victoriei2--Aviatorilor--Pipera;
}
GraphML
<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
<graph edgedefault="undirected">

<!-- data schema -->


<key id="name" for="node" attr.name="name" attr.type="string"/>
<key id="gender" for="node" attr.name="gender" attr.type="string"/>

<!-- nodes -->


<node id="1">
<data key="name">Jeff</data>
<data key="gender">M</data>
</node>
<node id="2">
<data key="name">Ed</data>
<data key="gender">M</data>
</node>
<edge source="1" target="2"></edge>
</graph>
</graphml>
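As a minimal sketch of reading GraphML as input data, the snippet below parses the example above with Python's standard `xml.etree.ElementTree` and returns the node attributes, the edge pairs and whether the default edge direction is directed. It only handles `<node>`, `<data>` and `<edge>` elements directly under a single `<graph>`; the function name `read_graphml` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Prefix used in the find() paths below; the URI is the one declared in the file.
NS = {"g": "http://graphml.graphdrawing.org/xmlns"}

GRAPHML = """<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
<graph edgedefault="undirected">
<key id="name" for="node" attr.name="name" attr.type="string"/>
<key id="gender" for="node" attr.name="gender" attr.type="string"/>
<node id="1"><data key="name">Jeff</data><data key="gender">M</data></node>
<node id="2"><data key="name">Ed</data><data key="gender">M</data></node>
<edge source="1" target="2"></edge>
</graph>
</graphml>"""

def read_graphml(text):
    """Return (node attributes, edge pairs, directed flag) from a GraphML string."""
    root = ET.fromstring(text)
    graph = root.find("g:graph", NS)
    directed = graph.get("edgedefault") == "directed"
    nodes = {}
    for node in graph.findall("g:node", NS):
        # Each <data key="..."> child becomes one attribute of the node.
        nodes[node.get("id")] = {d.get("key"): d.text for d in node.findall("g:data", NS)}
    # Each <edge> becomes a (Src, Dest) pair, as in the plain edge-list input format.
    edges = [(e.get("source"), e.get("target")) for e in graph.findall("g:edge", NS)]
    return nodes, edges, directed

nodes, edges, directed = read_graphml(GRAPHML)
print(nodes)  # {'1': {'name': 'Jeff', 'gender': 'M'}, '2': {'name': 'Ed', 'gender': 'M'}}
print(edges)  # [('1', '2')]
```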
Adjacency Lists
- An array with |V| elements
  - One entry for each vertex
- Each component in the array contains the list of neighbors for the index vertex
  - Adj[u], u ∈ V
[Figure: an example directed graph on vertices A–L and its adjacency lists: A: B, G; B: H; C: D, E; D: (empty); E: F; F: (empty); G: B, C; H: A; I: A, J, K; J: K; K: L; L: (empty)]
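A minimal Python sketch of building adjacency lists from (Src, Dest) pairs. The edge list is our reading of the example figure (a directed graph that the extraction partly garbled), so treat it as an assumption; `adjacency_lists` is a hypothetical helper name. The dictionary holds one entry per vertex plus one list item per edge, i.e. Θ(n + m) space.

```python
# Edges of the example graph, as read from the figure (an assumption).
EDGES = [("A", "B"), ("A", "G"), ("B", "H"), ("C", "D"), ("C", "E"),
         ("E", "F"), ("G", "B"), ("G", "C"), ("H", "A"), ("I", "A"),
         ("I", "J"), ("I", "K"), ("J", "K"), ("K", "L")]

def adjacency_lists(vertices, edges):
    """Adj[u] = list of neighbors of u; every vertex gets an entry, even if empty."""
    adj = {u: [] for u in vertices}      # one entry for each vertex
    for src, dst in edges:
        adj[src].append(dst)             # directed: only src learns about dst
    return adj

adj = adjacency_lists("ABCDEFGHIJKL", EDGES)
print(adj["I"])  # ['A', 'J', 'K']
print(adj["D"])  # []
```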
Adjacency Matrix
- A[i, j] = 1 if (i, j) ∈ E
- A[i, j] = 0 if (i, j) ∉ E
[Figure: the example graph on vertices A–L and its 12 × 12 adjacency matrix, with a 1 in row u, column v for each edge (u, v); e.g. row A has 1s in columns B and G]
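The same edge list (again our reading of the example figure) stored as an adjacency matrix, in a minimal sketch with a hypothetical `adjacency_matrix` helper. The n × n table is allocated up front, so it takes Θ(n²) space however few edges there are, but `A[i][j]` answers "is (i, j) an edge?" in Θ(1).

```python
# Edges of the example graph, as read from the figure (an assumption).
EDGES = [("A", "B"), ("A", "G"), ("B", "H"), ("C", "D"), ("C", "E"),
         ("E", "F"), ("G", "B"), ("G", "C"), ("H", "A"), ("I", "A"),
         ("I", "J"), ("I", "K"), ("J", "K"), ("K", "L")]

def adjacency_matrix(vertices, edges):
    """A[i][j] = 1 if (vertices[i], vertices[j]) is an edge, else 0."""
    index = {v: i for i, v in enumerate(vertices)}   # vertex name -> row/column
    n = len(vertices)
    A = [[0] * n for _ in range(n)]                  # n*n cells, whatever m is
    for src, dst in edges:
        A[index[src]][index[dst]] = 1
    return A, index

A, index = adjacency_matrix("ABCDEFGHIJKL", EDGES)
print(A[index["A"]][index["G"]])  # 1
print(A[index["G"]][index["A"]])  # 0: the graph is directed
```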
Incidence Matrix
- B[u, e], u ∈ V, e ∈ E
  - = 1 if edge e leaves vertex u
  - = -1 if edge e enters vertex u
  - = 0 otherwise
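A small sketch of the incidence matrix definition for a directed graph, assuming no self-loops (a self-loop would need both 1 and -1 in the same cell). The three-vertex graph and the helper name are hypothetical. Since every edge leaves exactly one vertex and enters exactly one, every column sums to 0.

```python
def incidence_matrix(vertices, edges):
    """B[u][e] = 1 if edge e leaves u, -1 if it enters u, 0 otherwise (n rows, m columns).
    Assumes no self-loops."""
    index = {v: i for i, v in enumerate(vertices)}
    B = [[0] * len(edges) for _ in vertices]
    for e, (src, dst) in enumerate(edges):
        B[index[src]][e] = 1     # e leaves src
        B[index[dst]][e] = -1    # e enters dst
    return B, index

B, index = incidence_matrix("xyz", [("x", "y"), ("y", "z"), ("x", "z")])
print(B[index["x"]])  # [1, 0, 1]
print(B[index["y"]])  # [-1, 1, 0]
```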
Adjacency Matrix vs Adjacency Lists
- Which one is better?
  - Answer: it depends
- Graphs may be:
  - Sparse: m = O(n)
  - Dense: m = Θ(n²)
- Adjacency matrix:
  - Space required: Θ(n²)
  - Time for going through all edges: Θ(n²)
  - Time for finding if an edge exists: Θ(1)
- Adjacency lists:
  - Space required: Θ(n + m)
  - Time for going through all edges: Θ(n + m)
  - Time for finding if an edge (u, v) exists: O(|Adj(u)|), i.e. Θ(max |Adj(u)|) in the worst case
- |V| = n, |E| = m
- The matrix is better for dense graphs; the lists are better for sparse graphs
Practical Examples of Using Graphs
- Maps (roads, etc.), networks (computer, electric, etc.), the Web, flow networks (traffic – cars or computer networks, pipes, etc.), relations between processes/activities…
- Simple examples:
  - The shortest path on Google Maps
  - The people that are most central (or most important) in a social network
  - Google PageRank
The Web + PageRank

http://en.wikipedia.org/wiki/PageRank
Computer Networks

http://ist.marshall.edu/ist362/pics/OSPF.gif
Social Web
Semantic Web
- Source: http://www.semanticfocus.com/media/insets/rdf-graph.png
Linked Open Data on the Web
(http://richard.cyganiak.de/2007/10/lod/)
Graph Search Algorithms
- Graph search (traversal)
  - A methodology for going through all the nodes in a graph
  - The result is a list of nodes in the order in which they are visited
  - This list should be useful to us!
    - It should contain important information about the graph
- Graph searches are very simple algorithms, but they have quite a few applications:
  - BFS: Lee's algorithm
  - DFS: topological sorting, strongly connected components (SCC), articulation points
Data & Notations
- G = (V, E) the graph (either directed or undirected)
  - V – the set of vertices (|V| = n)
  - E – the set of edges (|E| = m)
- The edges are not weighted
  - We may also consider them to have equal weights: 1
- (u, v) – an edge from node u to node v
- u..v – a path from u to v; we can also denote intermediate nodes: u..x..v, u..y..v
- R(u) – reachable(u) = the set of nodes that can be reached by a path starting from node u
- Adj(u) – the nodes that are adjacent to node u
Data & Notations (2)
- color(u) – the color of node u – encodes the state of node u during the search:
  - WHITE – undiscovered by the search
  - GREY – discovered, but not finished
  - BLACK – finished (any finished node is also discovered):
    - we have discovered all nodes in Adj(u) – for BFS
    - we have discovered all nodes in R(u), i.e. none of them is still WHITE – for DFS
- p(u) – the parent of u – encodes the previous node on the path used to discover node u in the search
- Other notations exist, but are specific to BFS or DFS


Breadth First Search (BFS)
- Uses a start vertex for the traversal: the source s
- Objective: determine the minimum number of edges (the shortest path, considering that all the edges have the same weight = 1) between the source s and all the other vertices of the graph
- We also traverse the vertices in the order of their distance from the source vertex
- δ(s, u) – the cost of an optimal path s..u; δ(s, u) = ∞ <=> u ∉ R(s)
- d[u] = d(s, u) – the cost of the currently discovered path s..u
BFS
- For each node u, we store:
  - d[u] = d(s, u) – the current distance from node s to node u
  - p[u]
  - color[u]
- We also use a queue for storing the discovered nodes that are not yet finished
- The predecessors form a tree called the BFS tree:
  - The edges are (p[u], u), where p[u] != NULL
  - The source s is the root
  - The level of each node in the BFS tree is d[u]
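Because each node stores only its parent, a shortest path s..v is recovered by walking p[] from v back to the root and reversing. A minimal sketch, using a hypothetical parent map and helper name; it should only be called for discovered nodes (d[v] != INF), since an undiscovered node also has p[v] = NULL.

```python
def path_from_source(p, v):
    """Follow parent pointers from v up to the root (p[root] is None) and
    return the path root..v. Only meaningful if v was discovered by BFS."""
    path = []
    while v is not None:
        path.append(v)
        v = p[v]
    path.reverse()           # we collected v..s, the path is s..v
    return path

# Hypothetical parent map produced by a BFS from source "s".
p = {"s": None, "a": "s", "b": "s", "c": "a", "d": "c"}
print(path_from_source(p, "d"))  # ['s', 'a', 'c', 'd']
```

The path has d[v] + 1 nodes, one per level of the BFS tree.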
BFS – Algorithm
BFS(G, s)
    FOREACH (u ∈ V)
        p[u] = NULL; d[u] = INF; color[u] = WHITE   // initialization
    Q = ∅                                            // the queue
    d[s] = 0
    color[s] = GREY
    Q.push(s)
    WHILE (Q.length() > 0)           // while we still have discovered nodes
        u = Q.pop()                  // remove the node at the front of the queue
        FOREACH (v ∈ Adj(u))         // for all neighbors of u
            IF (color[v] == WHITE)   // if the node is undiscovered
                d[v] = d[u] + 1
                p[v] = u
                color[v] = GREY
                Q.push(v)
        color[u] = BLACK             // the current node is finished
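The pseudocode above translates almost line by line into Python. In this sketch, `collections.deque` plays the role of the FIFO queue (`popleft` removes the oldest discovered node), colors are plain strings, and the adjacency lists are our reading of the course's example graph; these are our choices, not part of the original slides.

```python
from collections import deque

WHITE, GREY, BLACK = "WHITE", "GREY", "BLACK"
INF = float("inf")

def bfs(adj, s):
    """BFS from source s over adjacency lists adj; returns (d, p)."""
    d = {u: INF for u in adj}
    p = {u: None for u in adj}
    color = {u: WHITE for u in adj}
    d[s] = 0
    color[s] = GREY
    q = deque([s])
    while q:                          # while we still have discovered nodes
        u = q.popleft()               # FIFO: oldest discovered node first
        for v in adj[u]:
            if color[v] == WHITE:     # undiscovered neighbor
                d[v] = d[u] + 1
                p[v] = u
                color[v] = GREY
                q.append(v)
        color[u] = BLACK              # all of Adj(u) has been discovered
    return d, p

# Adjacency lists of the example graph (our reading of the figure).
ADJ = {"A": ["B", "G"], "B": ["H"], "C": ["D", "E"], "D": [], "E": ["F"],
       "F": [], "G": ["B", "C"], "H": ["A"], "I": ["A", "J", "K"],
       "J": ["K"], "K": ["L"], "L": []}
d, p = bfs(ADJ, "A")
print(d["F"], p["F"])  # 4 E
```

Running it from A gives the same distances as the worked example: I, J, K and L keep d = INF because they are not reachable from A.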
Complexity
- Depends on how the graph is stored
  - It influences how we iterate through Adj(u)
  - O(n + m) for adjacency lists
  - O(n²) for adjacency matrix

WHILE (Q.length() > 0)     // at most n times – once for each node
    u = Q.pop()
    FOREACH (v ∈ Adj(u))   // O(m) iterations in total with adjacency lists; n per node with a matrix
        // do something

- It makes sense to use adjacency lists for BFS


Example
- Source = A
- Queue contents, distances and parents after each step (the node at the front of the queue is popped and its WHITE neighbors are pushed):
  - Q = A; d(A) = 0; p(A) = null
  - Q = G, B; d(B) = d(G) = 1; p(B) = p(G) = A
  - Q = B, C; d(C) = 2; p(C) = G
  - Q = C, H; d(H) = 2; p(H) = B
  - Q = H, D, E; d(D) = d(E) = 3; p(D) = p(E) = C
  - Q = D, E
  - Q = E
  - Q = F; d(F) = 4; p(F) = E
  - Q = Ø
[Figure: the example graph after each step of the BFS]


BFS – Properties & Correctness
- While running BFS on graph G starting from source s: v is pushed into Q at some point ⇔ v ∈ R(s)
  - Not all the nodes in G are traversed
  - Only the nodes that are reachable from s
  - The others remain unexplored, therefore d[u] = δ(s, u) = INF ∀ u ∉ R(s)
- ∀ (u, v) ∈ E, δ(s, v) ≤ δ(s, u) + 1
- Moreover, at the end of BFS, δ(s, v) = δ(s, u) + 1 whenever p[v] = u
BFS – Properties & Correctness (2)
- Loop invariants for the outer loop
  - Let S = {nodes that have been popped out of the queue}
  - color[u] =
    - BLACK if u ∈ S
    - GREY if u ∈ Q
    - WHITE if u ∈ V \ (Q ∪ S)
  - p[u] =
    - != NULL if u ∈ (Q ∪ S) \ {s}
    - NULL if u ∈ V \ (Q ∪ S), and for the source s
  - d[u] =
    - != INF if u ∈ Q ∪ S
    - INF if u ∈ V \ (Q ∪ S)
  - Let Q = {v1, …, vp}, p >= 1:
    - d[v1] ≤ d[v2] ≤ … ≤ d[vp]
BFS – Properties & Correctness (3)
- d[v1] ≤ d[vp] ≤ d[v1] + 1
  - Why?
  - Because the WHITE neighbors u of v1 are pushed into the queue with d[u] = d[v1] + 1
  - And d[v1] ≤ … ≤ d[vp] ≤ d[v1] + 1 = d[u], so the queue stays sorted after the push
- Therefore, the nodes are added to the queue in the order of their d[u]
  - And d[u] never changes again
- Using all the above properties, we can prove that d[v] = δ(s, v) ∀ v ∈ V
  - Thus BFS is correct!
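The queue invariant can also be checked mechanically. The sketch below is an instrumented BFS (hypothetical name `bfs_checked`) that asserts d[v1] ≤ … ≤ d[vp] ≤ d[v1] + 1 before every pop; it tests d[v] == INF instead of keeping colors, which is equivalent here since d[v] is set exactly when v turns GREY. The small graph is made up for illustration.

```python
from collections import deque

INF = float("inf")

def bfs_checked(adj, s):
    """BFS that asserts the queue invariant before every pop:
    d[v1] <= d[v2] <= ... <= d[vp] <= d[v1] + 1."""
    d = {u: INF for u in adj}
    d[s] = 0
    q = deque([s])
    while q:
        ds = [d[v] for v in q]                       # d values, front to back
        assert all(a <= b for a, b in zip(ds, ds[1:])), "queue not sorted by d"
        assert ds[-1] <= ds[0] + 1, "queue spans more than two levels"
        u = q.popleft()
        for v in adj[u]:
            if d[v] == INF:                          # plays the role of WHITE
                d[v] = d[u] + 1
                q.append(v)
    return d

# Hypothetical graph with several paths of different lengths to "t".
G = {"s": ["a", "b"], "a": ["c"], "b": ["c", "t"], "c": ["t"], "t": []}
print(bfs_checked(G, "s"))  # {'s': 0, 'a': 1, 'b': 1, 'c': 2, 't': 2}
```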
Depth First Search (DFS)
- There is no source vertex
  - All the vertices of the graph are traversed
- This traversal does not compute the shortest distance between vertices, but it has a lot of useful applications
- It computes two values for each vertex:
  - d[u] = the discovery time of vertex u
  - f[u] = the finish time of vertex u
  - It uses a discrete time in 1..2n that is incremented each time a value is assigned to the discovery or finish time of a node
Discovery and finish time
- Discovery of a vertex u:
  - When the vertex is seen for the first time in the traversal
  - Changes color from WHITE to GREY
- Finishing a vertex u:
  - When the search leaves the vertex
  - All the nodes that could have been discovered from that vertex are either GREY or BLACK
    - There is no WHITE vertex in R(u)
  - Changes color from GREY to BLACK
Data Structures
- We need a stack (LIFO) to implement the traversal, so that the most recently discovered node is always explored first (in the recursive DFS_Visit, the call stack plays this role)
  - The predecessors are lower in the stack
- For each node u, we use:
  - p[u]
  - color[u]
  - d[u]
  - f[u]
- color[u] and p[u] have the same meaning as for BFS
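Instead of recursion, the stack can be explicit. In this sketch each stack entry is a vertex together with an iterator over its remaining neighbors, so the GREY vertices are exactly those on the stack, with their predecessors below them. The function name and the adjacency lists (our reading of the example figure) are assumptions; it is meant to produce the same d/f times as the recursive DFS_Visit.

```python
def dfs_iterative(adj):
    """DFS with an explicit stack of (vertex, iterator over Adj(vertex)).
    Returns discovery times d, finish times f and parents p."""
    color = {u: "WHITE" for u in adj}
    p = {u: None for u in adj}
    d, f = {}, {}
    time = 0
    for root in adj:                        # choose the roots of the DFS trees
        if color[root] != "WHITE":
            continue
        time += 1
        d[root] = time
        color[root] = "GREY"
        stack = [(root, iter(adj[root]))]
        while stack:
            u, neighbors = stack[-1]        # top of the stack: node being explored
            for v in neighbors:             # resume where we left off in Adj(u)
                if color[v] == "WHITE":
                    p[v] = u
                    time += 1
                    d[v] = time
                    color[v] = "GREY"
                    stack.append((v, iter(adj[v])))
                    break                   # explore v before the rest of Adj(u)
            else:                           # Adj(u) exhausted: u is finished
                stack.pop()
                color[u] = "BLACK"
                time += 1
                f[u] = time
    return d, f, p

# Adjacency lists of the example graph (our reading of the figure).
ADJ = {"A": ["B", "G"], "B": ["H"], "C": ["D", "E"], "D": [], "E": ["F"],
       "F": [], "G": ["B", "C"], "H": ["A"], "I": ["A", "J", "K"],
       "J": ["K"], "K": ["L"], "L": []}
d, f, p = dfs_iterative(ADJ)
print(d["A"], f["A"])  # 1 16
```

On this graph it gives A 1/16, B 2/5, H 3/4, G 6/15, C 7/14, D 8/9, E 10/13 and F 11/12, matching the DFS example, and completes the second tree as I 17/24, J 18/23, K 19/22, L 20/21.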


DFS Trees
- Similar to BFS, the predecessors form a forest of DFS trees:
  - The edges are (p[u], u), where p[u] != NULL
  - The roots of the DFS trees are the vertices that have p[u] = NULL
  - It is a forest because there may be more than a single tree
- All the vertices that are reachable from u along a path of WHITE vertices when u is discovered become descendants of u in the DFS tree (see the White Path Theorem)
- All the ancestors of u in the DFS tree are colored GREY (including the root)
DFS - Algorithm
DFS(G)
    FOREACH (u ∈ V)
        color[u] = WHITE; p[u] = NULL   // initialization
    time = 0
    FOREACH (u ∈ V)
        IF (color[u] == WHITE)          // choose the roots of the DFS trees
            DFS_Visit(u)                // start exploring the node

DFS_Visit(u)                            // exploring a node
    d[u] = ++time                       // the node is discovered
    color[u] = GREY
    FOREACH (v ∈ Adj(u))                // look for undiscovered neighbors
        IF (color[v] == WHITE)
            p[v] = u
            DFS_Visit(v)                // continue exploring this vertex
    color[u] = BLACK
    f[u] = ++time                       // the node is finished
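A direct Python transcription of DFS / DFS_Visit, as a sketch: `time` is shared through `nonlocal`, vertices are taken as roots in dictionary order, and the adjacency lists are our reading of the example figure. Python's default recursion limit (about 1000 frames) makes this recursive form unsuitable for very deep graphs; an explicit stack avoids that.

```python
def dfs(adj):
    """Recursive DFS over all vertices (roots in dict order); returns d, f, p."""
    color = {u: "WHITE" for u in adj}
    p = {u: None for u in adj}
    d, f = {}, {}
    time = 0

    def visit(u):                    # DFS_Visit
        nonlocal time
        time += 1
        d[u] = time                  # u is discovered
        color[u] = "GREY"
        for v in adj[u]:             # look for undiscovered neighbors
            if color[v] == "WHITE":
                p[v] = u
                visit(v)
        color[u] = "BLACK"
        time += 1
        f[u] = time                  # u is finished

    for u in adj:                    # choose the roots of the DFS trees
        if color[u] == "WHITE":
            visit(u)
    return d, f, p

# Adjacency lists of the example graph (our reading of the figure).
ADJ = {"A": ["B", "G"], "B": ["H"], "C": ["D", "E"], "D": [], "E": ["F"],
       "F": [], "G": ["B", "C"], "H": ["A"], "I": ["A", "J", "K"],
       "J": ["K"], "K": ["L"], "L": []}
d, f, p = dfs(ADJ)
print(d["G"], f["G"])  # 6 15
```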
Complexity
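- DFS_Visit is called exactly once for each vertex of the graph:
  - when the vertex is WHITE,
  - i.e. when it is discovered
- Therefore, the complexity is the sum, over all vertices u, of the cost of DFS_Visit(u) without the recursive calls (plus Θ(n) for the initialization):
  - Θ(1 + |Adj(u)|) per vertex with adjacency lists
  - Θ(n) per vertex with an adjacency matrix (a whole row is scanned)
- Therefore, similar to BFS:
  - Θ(n + m) if using adjacency lists
  - Θ(n²) if using an adjacency matrix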
DFS - Example
- The notation next to each vertex means d[u]/f[u]
[Figure: the example graph annotated with DFS times: A 1/16, B 2/5, H 3/4, G 6/15, C 7/14, D 8/9, E 10/13, F 11/12, I 17/, J 18/]
DFS – Example (2)
- Some of the steps have been omitted
[Figure: successive snapshots of the DFS on the example graph]
DFS Tree – Example
- In fact, the edges point in the opposite direction to the one drawn in the figure
[Figure: the DFS forest obtained for the example graph]
DFS – Properties
- The DFS forest can be formally defined as:
  - Arb(G) = {Arb(u) | p(u) = NULL}
  - Arb(u) = (V(u), E(u))
    - V(u) = {v | d(u) < d(v) < f(u)} ∪ {u}
    - E(u) = {(v, z) | v ∈ V(u), z ∈ V(u), p(z) = v}
- If G is undirected:
  - G is a connected graph <=> Arb(G) has a single tree
- For a given graph, running DFS may build different DFS forests (and thus different DFS traversals)
  - Depending on the order of choosing the roots of the trees
  - Depending on the order in which the elements of Adj(u) are chosen
Parenthesis Theorem
- ∀ u, v ∈ V with u ≠ v, up to swapping the names of u and v, there are only three possible arrangements of the discovery and finish times of the two nodes:
  - d[u] < d[v] < f[v] < f[u]: (u … (v … v) … u)
    - v is a descendant of u in the DFS tree
  - d[u] < f[u] < d[v] < f[v]: (u … u) (v … v)
    - neither node is a descendant of the other
  - d[v] < f[v] < d[u] < f[u]: (v … v) (u … u)
    - neither node is a descendant of the other
    - u and v may be in different trees or on different paths in the same tree
- Notation:
  - (u = discovery of u
  - u) = finishing of u
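The theorem gives a cheap consistency test for any set of DFS times: every pair of intervals [d, f] must be nested or disjoint, never crossing. A minimal sketch with hypothetical helper names, checked against the times of the first DFS tree in the example; `is_descendant` applies the first case of the theorem.

```python
def check_parenthesis(d, f):
    """True iff, for every pair u != v, the intervals [d[u], f[u]] and
    [d[v], f[v]] are nested or disjoint (Parenthesis Theorem)."""
    for u in d:
        for v in d:
            if u == v:
                continue
            nested = d[u] < d[v] < f[v] < f[u] or d[v] < d[u] < f[u] < f[v]
            disjoint = f[u] < d[v] or f[v] < d[u]
            if not (nested or disjoint):
                return False
    return True

def is_descendant(d, f, u, v):
    """v is a descendant of u  <=>  (u … (v … v) … u)."""
    return d[u] < d[v] < f[v] < f[u]

# d[u]/f[u] of the first DFS tree in the example.
TIMES = {"A": (1, 16), "B": (2, 5), "H": (3, 4), "G": (6, 15),
         "C": (7, 14), "D": (8, 9), "E": (10, 13), "F": (11, 12)}
d = {u: t[0] for u, t in TIMES.items()}
f = {u: t[1] for u, t in TIMES.items()}
print(check_parenthesis(d, f))                                # True
print(check_parenthesis({"u": 1, "v": 2}, {"u": 3, "v": 4}))  # False: (u (v u) v)
```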
White Path Theorem
- At time d[u], any node v that is:
  - WHITE and
  - reachable from u (in R(u)), such that
  - there exists a path u..v that consists only of WHITE vertices (except u, which is GREY)
- shall become a descendant of u in the same DFS tree that u belongs to
- Alternative formulation:
  - v is a descendant of u in a DFS tree <=> at time d[u] there exists a path u..v that consists only of WHITE vertices (except u, which is GREY)
Edge Classification
- ∀ (u, v) ∈ E, the edge belongs to exactly one of the following classes:
  - Tree edge
    - Any edge that is part of a DFS tree
  - Back edge
    - Any edge from a node to one of its ancestors in the DFS tree
  - Forward edge
    - Any edge from a node to one of its descendants that is not its child
  - Cross edge
    - Any other edge, which cannot be classified in one of the above classes
- What are the colors of the two vertices when the edge (u, v) is explored?
Edge Classification (2)
- Tree edge
  - Any edge that is part of a DFS tree
  - (u, v) => u – GREY; v – WHITE
- Back edge
  - Any edge from a node to one of its ancestors in the DFS tree
  - (u, v) => u – GREY; v – GREY
- Forward edge
  - Any edge from a node to one of its descendants that is not its child
  - (u, v) => u – GREY; v – BLACK
- Cross edge
  - Any other edge, which cannot be classified in one of the above classes
  - (u, v) => u – GREY; v – BLACK
- How can we distinguish a cross edge from a forward edge?
  - Use the relationship between d[u] and d[v]: d[u] < d[v] for a forward edge (v was discovered after u, as its descendant), d[u] > d[v] for a cross edge
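A sketch of edge classification during a directed DFS, combining the colors above with the d[u] vs d[v] test for BLACK targets; only discovery times are needed for that test. The helper name is hypothetical and the adjacency lists are our reading of the example graph, where it finds one back edge, (H, A), which closes the cycle A..B..H..A.

```python
def classify_edges(adj):
    """DFS over a directed graph; labels each edge (u, v) when it is explored."""
    color = {u: "WHITE" for u in adj}
    d = {}
    kind = {}
    time = 0

    def visit(u):
        nonlocal time
        time += 1
        d[u] = time
        color[u] = "GREY"
        for v in adj[u]:
            if color[v] == "WHITE":
                kind[(u, v)] = "tree"
                visit(v)
            elif color[v] == "GREY":
                kind[(u, v)] = "back"      # v is an ancestor still being explored
            elif d[u] < d[v]:
                kind[(u, v)] = "forward"   # v BLACK, discovered after u: a descendant
            else:
                kind[(u, v)] = "cross"     # v BLACK, discovered before u
        color[u] = "BLACK"

    for u in adj:
        if color[u] == "WHITE":
            visit(u)
    return kind

# Adjacency lists of the example graph (our reading of the figure).
ADJ = {"A": ["B", "G"], "B": ["H"], "C": ["D", "E"], "D": [], "E": ["F"],
       "F": [], "G": ["B", "C"], "H": ["A"], "I": ["A", "J", "K"],
       "J": ["K"], "K": ["L"], "L": []}
kind = classify_edges(ADJ)
print(kind[("H", "A")], kind[("G", "B")], kind[("I", "K")])  # back cross forward
```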
Edge Classification (3)
- An undirected graph only has two types of edges:
  - Tree edges
  - Back edges
- There cannot be any forward edges (because the edges have no orientation) or cross edges
- Theorem: A directed graph G is acyclic <=> a DFS of G finds no back edges
  - Demo: on whiteboard
  - The same holds for undirected graphs, as long as the tree edge (u, p[u]), seen again from u, is not counted as a back edge
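For the undirected case, a minimal sketch (hypothetical `has_cycle_undirected`, assuming a simple graph stored as symmetric adjacency lists): a GREY neighbor that is not the parent we just came from means a back edge, hence a cycle.

```python
def has_cycle_undirected(adj):
    """Assumes a simple undirected graph given as symmetric adjacency lists.
    Returns True iff DFS finds a back edge, i.e. iff the graph has a cycle."""
    color = {u: "WHITE" for u in adj}

    def visit(u, parent):
        color[u] = "GREY"
        for v in adj[u]:
            if color[v] == "WHITE":
                if visit(v, u):
                    return True
            elif color[v] == "GREY" and v != parent:
                return True              # back edge (u, v): closes a cycle
            # v == parent is just the tree edge (parent, u) seen from u
        color[u] = "BLACK"
        return False

    # Try every still-WHITE vertex as the root of a new DFS tree.
    return any(color[u] == "WHITE" and visit(u, None) for u in adj)

path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
triangle = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"]}
print(has_cycle_undirected(path), has_cycle_undirected(triangle))  # False True
```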
Application of Graph Searches
- BFS:
  - Finding the shortest path in a maze with obstacles, between a source and a destination
  - Called Lee's algorithm
- DFS:
  - Topological sorting
  - Strongly connected components
  - Articulation points
  - Bridges
  - Biconnected components
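A minimal sketch of Lee's algorithm, assuming a rectangular grid of strings where '#' marks an obstacle and moves go to the four orthogonal neighbors; each free cell is a vertex and BFS distances are move counts. The maze, the helper name and the -1 convention for unreachable cells are illustrative choices.

```python
from collections import deque

def lee(grid, src, dst):
    """Minimum number of moves from src to dst (row, col) in a grid maze,
    or -1 if dst cannot be reached. '#' cells are obstacles."""
    rows, cols = len(grid), len(grid[0])
    dist = [[-1] * cols for _ in range(rows)]     # -1 = undiscovered (WHITE)
    r0, c0 = src
    dist[r0][c0] = 0
    q = deque([src])
    while q:
        r, c = q.popleft()
        if (r, c) == dst:
            return dist[r][c]
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] != "#" and dist[nr][nc] == -1):
                dist[nr][nc] = dist[r][c] + 1
                q.append((nr, nc))
    return -1

MAZE = ["..#.",
        ".##.",
        "....",
        "#.#."]
print(lee(MAZE, (0, 0), (0, 3)))  # 7: down to row 2, across, then back up
```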
Topological Sorting
- Given a DAG (Directed Acyclic Graph)
- Used in real applications:
  - Activity diagrams:
    - Nodes: activities
    - Edges: dependencies between activities
  - Bayesian networks
  - Combinatorial logic
  - Compilers
- We want to order the vertices: find A[1..n] such that ∀ (u, v) ∈ E with A[i] = u, A[j] = v => i < j
Topological Sorting – General
- It is used to sort partially ordered sets
  - Sets on which we can define a partial order relation
  - There are elements that cannot be compared with each other
- Let S be the set
  - ∝ – the partial order relation, ∝ ⊆ S × S
- A topological sort of S is a list A = (s1, …, sn) that consists of all the elements of S such that si ∝ sj => i < j
- In the case of DAGs, the relation is given by the orientation of the edges: si ∝ sj <=> (si, sj) ∈ E
Topological Sorting – Example
- Source: http://serverbob.3x.ro/IA/images/fig572_01_0.jpg
- The example is from CLRS
Topological Sorting – Idea
- Run a DFS on the graph, regardless of how the root vertices of the DFS trees are chosen
- Also use a list for storing the topological sorting
  - After finishing a vertex, insert it at the beginning of the list
- At the end, all the elements in the list are ordered by decreasing finish time:
  - A = (v1, v2, …, vn) => f[vi] > f[vj] ∀ 1 ≤ i < j ≤ n
- Remark: a DAG may have more than one topological sorting
Algorithm
DFS(G)
    FOREACH (u ∈ V)
        color[u] = WHITE; p[u] = NULL   // initialization
    A = ∅                               // the list for storing the topological sorting
    FOREACH (u ∈ V)
        IF (color[u] == WHITE)
            DFS_Visit(u, A)
    PRINT A                             // print the topological sorting

DFS_Visit(u, A)                         // A is passed by reference
    color[u] = GREY
    FOREACH (v ∈ Adj(u))                // look for undiscovered neighbors
        IF (color[v] == WHITE)
            p[v] = u
            DFS_Visit(v, A)
        ELSE IF (color[v] == GREY)      // (u, v) is a back edge
            PRINT 'ERROR: This is not a DAG!'
            EXIT
    color[u] = BLACK
    A = cons(u, A)                      // insert u at the front of list A
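The algorithm above in Python, as a sketch: instead of cons(u, A) on every finish, it appends finished vertices and reverses once at the end, which yields the same order (decreasing finish time). A GREY neighbor raises `ValueError`, mirroring the "not a DAG" exit. The helper name and the dressing-order DAG are hypothetical.

```python
def topological_sort(adj):
    """DFS-based topological sort of a DAG given as adjacency lists.
    Raises ValueError if a back edge (a GREY neighbor) reveals a cycle."""
    color = {u: "WHITE" for u in adj}
    order = []                      # vertices in increasing finish time

    def visit(u):
        color[u] = "GREY"
        for v in adj[u]:
            if color[v] == "WHITE":
                visit(v)
            elif color[v] == "GREY":
                raise ValueError("not a DAG: back edge (%s, %s)" % (u, v))
        color[u] = "BLACK"
        order.append(u)             # u is finished

    for u in adj:
        if color[u] == "WHITE":
            visit(u)
    order.reverse()                 # decreasing finish time == repeated cons(u, A)
    return order

# Hypothetical DAG: an edge (x, y) means "put on x before y".
CLOTHES = {"undershorts": ["pants", "shoes"], "pants": ["belt", "shoes"],
           "belt": ["jacket"], "shirt": ["belt", "tie"], "tie": ["jacket"],
           "jacket": [], "socks": ["shoes"], "shoes": [], "watch": []}
order = topological_sort(CLOTHES)
print(order)
# ['watch', 'socks', 'shirt', 'tie', 'undershorts', 'pants', 'shoes', 'belt', 'jacket']
```

Any other valid order (for example one starting with "undershorts") would be equally correct; the one obtained depends on the order of the roots and of Adj(u).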
Correctness
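- We need to show that ∀ (u, v) ∈ E: f[v] < f[u]
- Let's consider the moment when the DFS explores (u, v)
  - It means that u is GREY
- Is v:
  - GREY? NO, because (u, v) would then be a back edge, which is impossible in a DAG
  - WHITE? Then v becomes a descendant of u: d[u] < d[v] < f[v] < f[u] (Parenthesis Theorem, tree edge)
  - BLACK? Then v is already finished while u is not: d[v] < f[v] < d[u] < f[u] (cross edge) or d[u] < d[v] < f[v] < f[u] (forward edge)
- In every possible case f[v] < f[u], so listing the vertices by decreasing finish time yields a topological sorting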
Conclusions
- We have seen that graphs are very important for modeling real-world structures
- Graph traversals are very simple algorithms
  - They are also very useful
  - Lots of applications
  - We have seen one of them: topological sorting

References
- CLRS – Chapter 23
- MIT OCW – Introduction to Algorithms – video lecture
