Professional Documents
Culture Documents
Aphs
Aphs
Bangalore, India
भारतीय विज्ञान संस्थान
बंगलौर, भारत
DS221 | 2019
Data Structures,
Algorithms & Data
Science Platforms
Yogesh Simmhan
simmhan@iisc.ac.in
L5: Graphs
Graph ADT, Algorithms
Slides courtesy:
Venkatesh Babu, CDS, IISc
2019-08-27 2
CDS.IISc.ac.in | Department of Computational and Data Sciences
What is a Graph?
▪ A graph G = (V,E) is composed of:
V: set of vertices
E: set of edges connecting the vertices in V
▪ An edge e = (u,v) is a pair of vertices
▪ Example:
a b V= {a,b,c,d,e}
E= {(a,b),(a,c),(a,d),
c (b,e),(c,d),(c,e),
(d,e)}
d e
2019-08-27 3
CDS.IISc.ac.in | Department of Computational and Data Sciences
CS16
Applications
▪ Electronic circuit design
▪ Transport networks
▪ Biological Networks
2019-08-27
http://www.pnas.org/content/103/50/19033/F3.expansion.html
4
http://images.slideplayer.com/18/5684225/slides/slide_28.jpg
CDS.IISc.ac.in | Department of Computational and Data Sciences
Applications
LinkedIn Social Network Graph Java Call Graph for Neo4J
http://allthingsgraphed.com/2014/10/16/your-linkedin-network/ http://allthingsgraphed.com/2014/11/12/code-graphs-5-top-open-source-data-projects/
2019-08-27 5
CDS.IISc.ac.in | Department of Computational and Data Sciences
Terminology
▪ If (v0, v1) is an edge in an undirected graph,
‣ v0 and v1 are adjacent, or are neighbors
‣ The edge (v0, v1) is incident on vertices v0 and v1
▪ If <v0, v1> is an edge in a directed graph
‣ v0 is adjacent to v1, and v1 is adjacent from v0
‣ The edge <v0, v1> is incident on v0 and v1
‣ v0 is the source vertex and v1 is the sink vertex
2019-08-27 6
CDS.IISc.ac.in | Department of Computational and Data Sciences
Terminology
▪ Vertices & edges can have labels that uniquely
identify them
‣ Edge label can be formed from the pair of vertex labels it
is incident upon…assuming only one edge can exist
between a pair of vertices
▪ Edge weights indicate some measure of distance or
cost to pass through that edge
3
1 2
2
4 5 6
1 5
4 3
2019-08-27 1 7
CDS.IISc.ac.in | Department of Computational and Data Sciences
Terminology
▪ The degree of a vertex is the number of edges incident to
that vertex
▪ For directed graph,
‣ the in-degree of a vertex v is the number of edges
that have v as the sink vertex
‣ the out-degree of a vertex v is the number of edges
that have v as the source vertex
‣ if di is the degree of a vertex i in a graph G with n
vertices and e edges, the number of edges is
n −1
Examples
G3
G1 3 G2 0 0 in:1, out: 1
0
2
1 2
3 1 23
3 3
1 in: 1, out: 2
3 3 4 5 6
3 2 in: 1, out: 0
1 1 1 1
undirected graphs
directed graph
2019-08-27 9
CDS.IISc.ac.in | Department of Computational and Data Sciences
Terminology
1 2
▪ path is a sequence of vertices
<v1,v2,. . .vk> such that 5
consecutive vertices vi and
vi+1 are adjacent 4 3
1 2 1 2
5 5
4 3 4 3
123453 2345
2019-08-27 10
CDS.IISc.ac.in | Department of Computational and Data Sciences
Terminology
1 2
▪ simple path: no
repeated vertices 5 235
4 3
1 2
▪ cycle: simple path,
except that the last 5 1541
vertex is the same as the
first vertex 4 3
2019-08-27 11
CDS.IISc.ac.in | Department of Computational and Data Sciences
Terminology
▪ Shortest Path: Path between two vertices where
the sum of the edge weights is the smallest
‣ Has to be a simple path (why?)
‣ Assume “unit weight” for edges if not specified
3
1 2 1 2
2
5 4 5 6
1 5
4 3 4 3
123 1
153 1543
2019-08-27 143 12
CDS.IISc.ac.in | Department of Computational and Data Sciences
Connected Graph
▪ connected graph: any two vertices are connected by some path
2019-08-27 13
CDS.IISc.ac.in | Department of Computational and Data Sciences
Graph Diameter
▪ A graph’s dimeter is the distance of its longest
shortest path
▪ if d(u,v) is the distance of the shortest path
between vertices u and v, then:
▪ diameter = Max(d(u,v)), for all u, v in V
▪ A disconnected graph has an infinite diameter
2019-08-27 http://mathworld.wolfram.com/GraphDiameter.html 14
CDS.IISc.ac.in | Department of Computational and Data Sciences
Subgraph
▪ subgraph: subset of vertices and edges forming a
graph
0 0 0 1 2 0
G1
1 2 1 2 3 1 2
3 3
(i) (ii) (iii) (iv)
2019-08-27 15
CDS.IISc.ac.in | Department of Computational and Data Sciences
tree
forest
tree tree
2019-08-27 16
CDS.IISc.ac.in | Department of Computational and Data Sciences
If a graph is not
complete: n=5
m < n(n -1)/2 m = (5*4)/2 = 10
2019-08-27 17
CDS.IISc.ac.in | Department of Computational and Data Sciences
More Connectivity
n = #vertices
n=5
m = #edges m=4
▪ For a tree m = n - 1
n=5
If m < n - 1, G is m=3
not connected
2019-08-27 18
CDS.IISc.ac.in | Department of Computational and Data Sciences
Connected Component
▪ A connected component is a maximal subgraph
that is connected.
▪ Cannot add vertices and edges from original graph and
retain connectedness.
▪ A connected graph has exactly 1 component.
2019-08-27 19
CDS.IISc.ac.in | Department of Computational and Data Sciences
Clique
▪ A subgraph C of a graph G with edges between all
pairs of vertices
G 6 7 6
7 C
8
5
5 Clique
1 4
2019-08-27 https://www.csc.ncsu.edu/faculty/samatova/practical-graph-mining-with-R 20
CDS.IISc.ac.in | Department of Computational and Data Sciences
Maximal Clique
▪ A maximal clique is a clique that is not part of a
larger clique
G 7 7
6 6
8
5
5 Clique
1 4
7
6
Maximal Clique 8
2019-08-27 5 21
CDS.IISc.ac.in | Department of Computational and Data Sciences
source sink
tail head
2019-08-27 22
CDS.IISc.ac.in | Department of Computational and Data Sciences
Graph Representation
▪ Adjacency Matrix
▪ Adjacency Lists
▪ Linked Adjacency Lists
▪ Array Adjacency Lists
2019-08-27 23
CDS.IISc.ac.in | Department of Computational and Data Sciences
Adjacency Matrix
▪ 0/1 n x n matrix, where n = # of vertices
▪ A(i,j) = 1 iff (i,j) is an edge
1 2 3 4 5
2
3 1 0 1 0 1 0
2 1 0 0 0 1
1
3 0 0 0 0 1
4 4 1 0 0 0 1
5
5 0 1 1 1 0
2019-08-27 24
CDS.IISc.ac.in | Department of Computational and Data Sciences
Adjacency Matrix
▪ n2 bits of space
▪ For an undirected graph, may store only lower or
upper triangle (exclude diagonal)
‣ (n2 -n)/2 bits
▪ O(n) time to find vertex degree and/or vertices
adjacent to a given vertex.
2019-08-27 27
CDS.IISc.ac.in | Department of Computational and Data Sciences
Adjacency Lists
▪ Adjacency list for vertex i is a linear list of vertices
adjacent from vertex i.
▪ An array of n adjacency lists.
aList[1] = (2,4)
2 aList[2] = (1,5)
3
1 aList[3] = (5)
4 aList[4] = (5,1)
5
aList[5] = (2,4,3)
2019-08-27 28
CDS.IISc.ac.in | Department of Computational and Data Sciences
• Array Length = n
• # of chain nodes = 2e (undirected graph)
• # of chain nodes = e (digraph)
2019-08-27 29
CDS.IISc.ac.in | Department of Computational and Data Sciences
• Array Length = n
• # of list elements = 2e (undirected graph)
• # of list elements = e (digraph)
2019-08-27 30
CDS.IISc.ac.in | Department of Computational and Data Sciences
▪ Adjacency lists
‣ Each list element is a pair (adjacent vertex, edge weight)
2019-08-27 31
CDS.IISc.ac.in | Department of Computational and Data Sciences
List<Vertex<V,E>> GetVertices();
List<Edge<V,E>> GetEdges();
bool IsEmpty(graph);
}
2019-08-27 33
CDS.IISc.ac.in | Department of Computational and Data Sciences
2019-08-27 34
CDS.IISc.ac.in | Department of Computational and Data Sciences
2019-08-27 35
CDS.IISc.ac.in | Department of Computational and Data Sciences
Breadth-First Search
Example
2
3
8
1
4
5
9
10
6
7 11
Breadth-First Search
Example
2 FIFO Queue
3 1
8
1
4
5
9
10
6
7 11
Breadth-First Search
Example
2 FIFO Queue
3 1
8
1
4
5
9
10
6
7 11
Breadth-First Search
Example
2 FIFO Queue
3
2 4
8
1
4
5
9
10
6
7 11
Breadth-First Search
Example
2 FIFO Queue
3
2 4
8
1
4
5
9
10
6
7 11
Breadth-First Search
Example
2 FIFO Queue
3
4 5 3 6
8
1
4
5
9
10
6
7 11
Breadth-First Search
Example
2 FIFO Queue
3
4 5 3 6
8
1
4
5
9
10
6
7 11
Breadth-First Search
Example
2 FIFO Queue
3
5 3 6
8
1
4
5
9
10
6
7 11
Breadth-First Search
Example
2 FIFO Queue
3
5 3 6
8
1
4
5
9
10
6
7 11
Breadth-First Search
Example
2 FIFO Queue
3
3 6 9 7
8
1
4
5
9
10
6
7 11
Breadth-First Search
Example
2 FIFO Queue
3
3 6 9 7
8
1
4
5
9
10
6
7 11
Breadth-First Search
Example
2 FIFO Queue
3
6 9 7
8
1
4
5
9
10
6
7 11
Breadth-First Search
Example
2 FIFO Queue
3
6 9 7
8
1
4
5
9
10
6
7 11
Breadth-First Search
Example
2 FIFO Queue
3
9 7
8
1
4
5
9
10
6
7 11
Breadth-First Search
Example
2 FIFO Queue
3
9 7
8
1
4
5
9
10
6
7 11
Breadth-First Search
Example
2 FIFO Queue
3
7 8
8
1
4
5
9
10
6
7 11
Breadth-First Search
Example
2 FIFO Queue
3
7 8
8
1
4
5
9
10
6
7 11
Breadth-First Search
Example
2 FIFO Queue
3
8
8
1
4
5
9
10
6
7 11
Breadth-First Search
Example
2 FIFO Queue
3
8
8
1
4
5
9
10
6
7 11
Breadth-First Search
Example
2 FIFO Queue
3
8
1
4
5
9
10
6
7 11
Breadth-First Search
Property
▪ All vertices reachable from the start vertex
(including the start vertex) are visited.
2019-08-27 56
CDS.IISc.ac.in | Department of Computational and Data Sciences
Time Complexity
▪ Each visited vertex is added to (and so removed
from) the queue exactly once
▪ When a vertex is removed from the queue, we
examine its adjacent vertices
▪ O(v) if adjacency matrix is used, where v is number of
vertices in whole graph
▪ O(d) if adjacency list is used, where d is edge degree
▪ Total time
▪ Adjacency matrix: O(w.v), where w is number of
vertices in the connected component that is searched
▪ Adjacency list: O(w+f), where f is number of edges in
the connected component that is searched
2019-08-27 57
CDS.IISc.ac.in | Department of Computational and Data Sciences
Depth-First Search
depthFirstSearch(v) {
Label vertex v as reached;
for(each unreached vertex u
adjacent to v)
depthFirstSearch(u);
}
2019-08-27 58
CDS.IISc.ac.in | Department of Computational and Data Sciences
Depth-First Search
2
3
8
1
4
5
9
10
6
7 11
4
5
9
10
6
7 11
Depth-First Search
2
3
8
1
4
5
9
10
6
7 11
Depth-First Search
2
3
8
1
4
5
9
10
6
7 11
2019-08-27 62
CDS.IISc.ac.in | Department of Computational and Data Sciences
Depth-First Search
2
3
8
1
4
5
9
10
6
7 11
Depth-First Search
2
3
8
1
4
5
9
10
6
7 11
Depth-First Search
2
3
8
1
4
5
9
10
6
7 11
Depth-First Search
2
3
8
1
4
5
9
10
6
7 11
Depth-First Search
2
3
8
1
4
5
9
10
6
7 11
Return to 5.
2019-08-27 67
CDS.IISc.ac.in | Department of Computational and Data Sciences
Depth-First Search
2
3
8
1
4
5
9
10
6
7 11
Do a dfs(3).
2019-08-27 68
CDS.IISc.ac.in | Department of Computational and Data Sciences
Depth-First Search
2
3
8
1
4
5
9
10
6
7 11
2019-08-27 69
CDS.IISc.ac.in | Department of Computational and Data Sciences
Depth-First Search
2
3
8
1
4
5
9
10
6
7 11
Return to 1.
2019-08-27 70
CDS.IISc.ac.in | Department of Computational and Data Sciences
Depth-First Search
2
3
8
1
4
5
9
10
6
7 11
2019-08-27 71
CDS.IISc.ac.in | Department of Computational and Data Sciences
DFS Properties
▪ DFS has same time complexity as BFS
▪ DFS requires O(h) memory for recursive function stack
calls while BFS requires O(w) queue capacity
▪ Same properties with respect to path finding, connected
components, and spanning trees.
‣ Edges used to reach unlabeled vertices define a depth-first
spanning tree when the graph is connected.
▪ One is better than the other for some problems, e.g.
‣ When searching, if the item is far from source (leaves), then
DFS may locate it first, and vice versa for BFS
‣ BFS traverses vertices at same distance (level) from source
‣ DFS can be used to detect cycles (revisits of vertices in current
stack)
2019-08-27 72
CDS.IISc.ac.in | Department of Computational and Data Sciences
2019-08-27 74
CDS.IISc.ac.in | Department of Computational and Data Sciences
0 5
3 4
1
4
2 3
2019-08-27 75
CDS.IISc.ac.in | Department of Computational and Data Sciences
3
2 3
0 5
1
1
4
1
2
1
2019-08-27 76
CDS.IISc.ac.in | Department of Computational and Data Sciences
2019-08-27 77
CDS.IISc.ac.in | Department of Computational and Data Sciences
2019-08-27 78
CDS.IISc.ac.in | Department of Computational and Data Sciences
4
1 1
2
3 4
1
2019-08-27 79
CDS.IISc.ac.in | Department of Computational and Data Sciences
4
1 1
2
15
3 4
1
Revisit, recalculate, re-propagate…
cascading effect
2019-08-27 80
CDS.IISc.ac.in | Department of Computational and Data Sciences
4
1 1
2
9
3 4
1
2019-08-27 81
CDS.IISc.ac.in | Department of Computational and Data Sciences
4
1 1
2
15
3 4
1
BFS with revisits is not
efficient. Can we be smart
about order of visits?
2019-08-27 82
CDS.IISc.ac.in | Department of Computational and Data Sciences
2019-08-27 83
CDS.IISc.ac.in | Department of Computational and Data Sciences
4
c 1
2
j
e f
1
Work out!
2019-08-27 85
CDS.IISc.ac.in | Department of Computational and Data Sciences
4
1 1
2
9
3 4
1
2019-08-27 86
CDS.IISc.ac.in | Department of Computational and Data Sciences
Complexity
▪ Using a linked list for queue, it takes O(v2 + e)
▪ For each vertex,
‣ we linearly search the linked list for smallest: O(v)
‣ we check and update for each incident edge once: O(d)
▪ When a min heap (priority queue) with distance as
priority key, total time is O(e + v log v)
‣ O(log v) to insert or remove from priority queue
‣ O(v) remove min operations
‣ O(e) change d[ ] value operations (insert/update)
▪ When e is O(v2) [highly connected, small diameter],
using a min heap is worse than using a linear list
▪ When a Fibonacci heap is used, the total time is O(e + v
log v)
2019-08-27 87
CDS.IISc.ac.in | Department of Computational and Data Sciences
4
5
9
11
6
7
2019-08-27 88
CDS.IISc.ac.in | Department of Computational and Data Sciences
4
5
9
11
6
7
2019-08-27 89
CDS.IISc.ac.in | Department of Computational and Data Sciences
Spanning Tree
▪ Communication Network Problems
‣ Is the network connected?
‣ Can we communicate between every pair of cities?
‣ Find the components.
‣ Want to construct smallest number of feasible links so
that resulting network is connected.
2019-08-27 91
CDS.IISc.ac.in | Department of Computational and Data Sciences
6 7
7
2019-08-27 92
CDS.IISc.ac.in | Department of Computational and Data Sciences
A Spanning Tree
A Spanning tree, cost = 51.
2
4 3
8 8
1 6 10
2 4 5
4 4 3
5
9
8 11
5 6
2
6 7
7
2019-08-27 93
CDS.IISc.ac.in | Department of Computational and Data Sciences
2
4 3
8 8
1 6 10
2 4 5
4 4 3
5
9
8 11
5 6
2
6 7
7
2019-08-27 94
CDS.IISc.ac.in | Department of Computational and Data Sciences
6 7
7
2019-08-27 95
CDS.IISc.ac.in | Department of Computational and Data Sciences
Graph Clustering
▪ Clustering: The process of dividing a set of input
data into possibly overlapping, subsets, where
elements in each subset are considered related by
some similarity measure
2 Clusters
3 Clusters
Graph Clustering
▪ Between-graph
‣ Clustering a set of graphs
‣ E.g. structural similarity between chemical compounds
▪ Within-graph
‣ Clustering the nodes/edges of a single graph
‣ E.g., In a social networking graph, these clusters could
represent people with same/similar hobbies
K-Means Clustering
k=2, maxcut = 2
h
b d k
a l
g i
c
j
e f
2019-08-27 101
CDS.IISc.ac.in | Department of Computational and Data Sciences
K-Means Clustering
h
b d k
a l
g i
c
j
e f
Pick k random vertices
2019-08-27 102
CDS.IISc.ac.in | Department of Computational and Data Sciences
K-Means Clustering
h
b d k
a l
g i
c
j
e f
2019-08-27 103
CDS.IISc.ac.in | Department of Computational and Data Sciences
K-Means Clustering
h
b d k
a l
g i
c
j
e f
2019-08-27 104
CDS.IISc.ac.in | Department of Computational and Data Sciences
K-Means Clustering
Pick one of the
two colors
h
b d k
a l
g i
c
j
e f
All vertices colored
Pick one of the Cut = 3; Cut > MaxCut
two colors Repeat!
2019-08-27 105
CDS.IISc.ac.in | Department of Computational and Data Sciences
K-Means Clustering
h
b d k
a l
g i
c
j
e f
Pick k random vertices
2019-08-27 106
CDS.IISc.ac.in | Department of Computational and Data Sciences
K-Means Clustering
h
b d k
a l
g i
c
j
e f
2019-08-27 107
CDS.IISc.ac.in | Department of Computational and Data Sciences
K-Means Clustering
Pick one of
the two colors h
b d k
a l
g i
c
j
e f
All vertices colored
Cut = 2; Cut <= MaxCut
Done!
2019-08-27 108
CDS.IISc.ac.in | Department of Computational and Data Sciences
PageRank
▪ Centrality measure of web page quality based on
the web structure
‣ How important is this vertex in the graph?
▪ Random walk
‣ Web surfer visits a page, randomly clicks a link on that
page, and does this repeatedly.
‣ How frequently would each page appear in this surfing?
▪ Intuition
‣ Expect high-quality pages to contain “endorsements”
from many other pages thru hyperlinks
‣ Expect if a high-quality page links to another page, then
the second page is likely to be high quality too
2019-08-27 Simmhan, SSDS, 2016; Lin, Ch 5.3 PAGERANK 109
CDS.IISc.ac.in | Department of Computational and Data Sciences
PageRank, recursively
PageRank Iterations
α=0
Initialize P(n)=1/|G|
2019-08-27 111
Lin, Fig 5.7
CDS.IISc.ac.in | Department of Computational and Data Sciences
Tasks
▪ Self study
‣ Read: Graphs and graph algorithms (online sources)
2019-08-27 112
CDS.IISc.ac.in | Department of Computational and Data Sciences
Questions?