Multi-Threaded Cycle Detection in Undirected Graph


IPCO-630 Assignment

IIT2013134

IIT2013180

Cycle Detection in Undirected Graph


An undirected graph is said to have a cycle if there is a path that starts at a node, follows one or more edges without reusing any edge, and leads back to the same node. Detecting cycles is an essential step, and often a prerequisite, in several advanced applications of graphs. One example is shortest-path search, where a negative cycle (a cycle whose edge weights sum to a negative value) can make the algorithm loop forever, since going around the cycle once more always yields a shorter path.

There are two commonly used approaches for detecting a cycle in an undirected graph:

Union-Find Approach
Depth-First Search Approach

Union-Find Approach
In the union-find approach, edges are scanned one by one and the endpoints of each edge are merged into the same set, so that each set holds the nodes connected so far. If an edge is encountered whose two endpoints already belong to the same set, that edge closes a cycle.
Initially, every node is in a separate set of its own, and all nodes have an equal rank of 1. Edges are then scanned one at a time. If neither endpoint has been added to a bigger set so far, one of the two nodes is arbitrarily made the parent of the other, to denote that they now belong to the same set. For this, a vector called parent is maintained, along with a height (rank) vector.
A node that becomes a root for the first time has a height (or rank) of 2, and whenever a node is added to the set rooted at r, the rank of r increases by 1. When an edge between two nodes in sets rooted at different nodes is encountered, the root with the greater rank is made the parent of the other root; in this way, the two sets are merged. In the find-set operation, every node on the way to the root is re-pointed directly at the root. The pseudocode is given below.
for each edge E = V1-V2 in graph do
    // find the subsets to which V1 and V2 belong
    p1 = V1.find-Set ()
    p2 = V2.find-Set ()
    // if V1 and V2 belong to the same subset, there's a cycle
    if p1 = p2
        print (cycle)
    else
        make-Union (V1, V2)
    end-if
end-loop
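SPACE COMPLEXITY:

O(V)

TIME COMPLEXITY:

O(E · α(V)), where E is the number of edges and α is the very slowly growing inverse Ackermann function: every edge is scanned once, and each find-set costs amortized O(α(V)) time when sets are merged by rank and paths are compressed. If scanning stops at the first cycle, at most V edges are examined, since an acyclic graph on V nodes has at most V - 1 edges.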
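As a point of comparison, the union-find approach can be sketched in Rust as follows. This is a minimal sketch, not code from our implementation: the names DisjointSet, find_set, make_union and has_cycle_union_find are ours, ranks follow the textbook union-by-rank rule (starting at 0 and growing only when two equal-rank trees merge) rather than the size-like counting described above, and the function returns at the first cycle instead of printing.

```rust
/// Disjoint-set forest with union by rank and path compression.
struct DisjointSet {
    parent: Vec<usize>,
    rank: Vec<u32>,
}

impl DisjointSet {
    fn new(n: usize) -> Self {
        // Every node starts as the root of its own singleton set.
        DisjointSet { parent: (0..n).collect(), rank: vec![0; n] }
    }

    /// Returns the root of x's set and points every node on the path directly at it.
    fn find_set(&mut self, x: usize) -> usize {
        let mut root = x;
        while self.parent[root] != root {
            root = self.parent[root];
        }
        let mut node = x;
        while self.parent[node] != root {
            let next = self.parent[node];
            self.parent[node] = root;
            node = next;
        }
        root
    }

    /// Merges the sets rooted at a and b; the higher-ranked root becomes the parent.
    fn make_union(&mut self, a: usize, b: usize) {
        match self.rank[a].cmp(&self.rank[b]) {
            std::cmp::Ordering::Less => self.parent[a] = b,
            std::cmp::Ordering::Greater => self.parent[b] = a,
            std::cmp::Ordering::Equal => {
                self.parent[b] = a;
                self.rank[a] += 1;
            }
        }
    }
}

/// Returns true if the undirected graph on n nodes with these edges has a cycle.
fn has_cycle_union_find(n: usize, edges: &[(usize, usize)]) -> bool {
    let mut sets = DisjointSet::new(n);
    for &(v1, v2) in edges {
        let p1 = sets.find_set(v1);
        let p2 = sets.find_set(v2);
        if p1 == p2 {
            return true; // endpoints already connected: this edge closes a cycle
        }
        sets.make_union(p1, p2);
    }
    false
}

fn main() {
    println!("{}", has_cycle_union_find(3, &[(0, 1), (1, 2), (2, 0)])); // true
    println!("{}", has_cycle_union_find(4, &[(0, 1), (1, 2), (2, 3)])); // false
}
```

Passing the two roots to make_union avoids a second pair of find-set calls inside the merge.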


Depth-First Search Approach (Our Implementation)


Depth-first search is a popular backtracking-based traversal of a graph in which we keep following a chain of connected nodes as long as there's a path. Once we reach a dead end with no unexplored outgoing edge, we backtrack to the previous node and follow another path from there; once all outgoing paths from a node have been examined, we backtrack from that node as well. In this way, a single depth-first search started from some node covers the whole connected component containing that node.
The depth-first search approach is straightforward to use as a cycle-detection technique. We start off by travelling down any path of the graph. Each node we come across is marked as visited, and we then go on to traverse its adjacent vertices. If we encounter an adjacent vertex that is already visited, other than the node we just came from, we've found a cycle. This approach is naturally recursive and can be implemented easily as shown below.

DFS (node, parent)
    visited [node] = true
    for all vertices v adjacent to node
        if v not equal to parent
            if visited[v] = true
                print (cycle)
            else
                DFS (v, node)
            end-if
        end-if
    end-loop
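SPACE COMPLEXITY: O(V)
TIME COMPLEXITY:

O(V + E) with adjacency lists, which falls to O(V) if the search stops at the first cycle found, because an acyclic graph has at most V - 1 edges. With the adjacency matrix used in our implementation, each visited node scans a full row, giving O(V²).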
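A minimal serial Rust sketch of this DFS, together with the driver loop that restarts it from every unvisited node, is shown below. It assumes the graph is given as a symmetric adjacency matrix of type Vec<Vec<bool>>; the helper names dfs, has_cycle and matrix are hypothetical. It uses Option<usize> in place of the pseudocode's -1 parent and returns as soon as a cycle is found instead of printing.

```rust
/// Recursive DFS that reports whether a cycle is reachable from `node`.
/// `parent` is None for a starting node (the pseudocode's -1).
fn dfs(adj: &[Vec<bool>], visited: &mut [bool], node: usize, parent: Option<usize>) -> bool {
    visited[node] = true;
    for v in 0..adj.len() {
        if !adj[node][v] || Some(v) == parent {
            continue; // not a neighbour, or the edge we arrived by
        }
        if visited[v] {
            return true; // a visited neighbour other than the parent closes a cycle
        }
        if dfs(adj, visited, v, Some(node)) {
            return true;
        }
    }
    false
}

/// Starts DFS from every still-unvisited node so every connected component is checked.
fn has_cycle(adj: &[Vec<bool>]) -> bool {
    let mut visited = vec![false; adj.len()];
    for n in 0..adj.len() {
        if !visited[n] && dfs(adj, &mut visited, n, None) {
            return true;
        }
    }
    false
}

/// Builds a symmetric adjacency matrix for `n` nodes from an edge list.
fn matrix(n: usize, edges: &[(usize, usize)]) -> Vec<Vec<bool>> {
    let mut adj = vec![vec![false; n]; n];
    for &(a, b) in edges {
        adj[a][b] = true;
        adj[b][a] = true;
    }
    adj
}

fn main() {
    // Component {0, 1} is a single edge; component {2, 3, 4} is a triangle.
    println!("{}", has_cycle(&matrix(5, &[(0, 1), (2, 3), (3, 4), (4, 2)]))); // true
    println!("{}", has_cycle(&matrix(4, &[(0, 1), (1, 2), (1, 3)]))); // false
}
```

Using Option for the parent keeps the "no parent" case explicit without mixing signed sentinels into unsigned node indices.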

Issues with Serial DFS Approach


In our implementation, we've opted for the depth-first search approach to cycle detection because of its simplicity. The following two points should be kept in mind when detecting cycles in undirected graphs using DFS:
Since we check the visited status of an adjacent node to determine whether there's a cycle, we must remember that the parent of a node, i.e., the node that preceded the current node in the DFS, is always adjacent to it and already visited. Hence we must inspect all adjacent nodes of the current node except its parent. For this purpose, the parent is passed as an additional argument to the recursive method, so that it is skipped while scanning adjacent nodes.


A call to this DFS method performs cycle detection on one connected component only. If the graph has multiple disjoint sets of connected nodes, cycle detection has to be performed on all of them. Hence the DFS method is invoked from the main method iteratively over each unvisited node, which ensures that we check for the presence of a cycle in every connected component of the undirected graph.
for every node n
    if visited[n] = false
        DFS (n, -1)
    end-if
end-loop

Parallelizing the DFS


At first sight, it seems simple to parallelize the DFS approach discussed above: we could simply run each call to DFS from the main method in a separate thread. But there's a blatant mistake in this. Threads run asynchronously, which means we have no control over their order of execution. Because of this asynchronous execution, false cycles may be detected, as described below.

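[Figure: a graph resembling a binary tree during parallel cycle detection. Legend: blue, marked visited by main thread; yellow, marked visited by child thread T1; pink, marked visited by child thread T2; white, unvisited.]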

Shown above is a scenario in which a graph resembling a binary tree is undergoing cycle detection. Suppose that cycle detection was invoked from the main method at the root node (blue). The root node invoked cycle detection on its two adjacent nodes in threads T1 and T2, respectively. The yellow node was marked visited by thread T1 and the pink node by thread T2. The white node, which is adjacent to the yellow node, is still unvisited. If thread T1 next invokes DFS on the unvisited white node (by spawning a new thread), we're good to go. However, there will be a problem if the main thread, which returned to the main method after invoking the recursive calls on the yellow and pink nodes in threads T1 and T2, reaches the white node in its for loop and calls DFS on it before thread T1 can. This situation is depicted in the following figure.

[Figure: the same graph after the main thread has run DFS on the formerly white node, now node 2 (blue); the root is node 1. Legend: marked visited by main thread; marked visited by child thread T1; marked visited by child thread T2.]

At this point, when thread T1 checks its adjacent nodes (excluding its parent, the blue root node marked 1), it sees that node 2 (coloured blue) is already visited. So it concludes that a cycle has been found when there is none.
The above argument makes it clear that we cannot run the DFS calls from the main thread's for loop in new threads. Next, we look at another way to parallelize it. The recursive calls made by DFS explore different subtrees, which suggests that these separate tasks can be taken up by different threads, as shown below.
DFS (node, parent)
    visited [node] = true
    for all vertices v adjacent to node
        if v not equal to parent
            if visited[v] = true
                print (cycle)
            else
                in a new thread: DFS (v, node)
            end-if
        end-if
    end-loop

Issues with Parallel DFS


For the same reasons as described above, this code would not run correctly. Although the recursive calls are now made in new threads, the main thread still returns to the main method and invokes DFS on another unvisited node while the other threads are still running. This again results in the detection of false cycles, as explained earlier. To deal with this, we must guarantee that the main thread doesn't invoke DFS on a new node until the current call to DFS is over.

DFS (node, parent)
    visited [node] = true
    for all vertices v adjacent to node
        if v not equal to parent
            if visited[v] = true
                print (cycle)
            else
                in a new thread: DFS (v, node)
            end-if
        end-if
    end-loop
    wait for child threads to finish


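This is achieved by calling join on each of the threads spawned inside the loop. With this modification, the main thread invokes DFS on a node and then waits until that DFS invocation, including every thread it spawned, is over. Parallelism is therefore confined to a single connected component, where threads explore different subtrees that can only meet at a common node if a real cycle exists. This gives us correctness with parallelism.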
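The joined version can be sketched in Rust with std::thread::scope, which joins every thread spawned inside the scope before it returns, so each DFS call finishes only after its child threads do. This is an illustrative sketch under assumptions, not our submitted code: the shared state lives in a hypothetical Shared struct instead of globals, visited and cycle_found are atomics so threads can update them safely, and the spawn limit imitates the cnt/NTHREADS scheme of our implementation, falling back to an ordinary recursive call once the limit is reached.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering::SeqCst};
use std::thread;

/// State shared by every thread of one search (stand-ins for the report's globals).
struct Shared<'a> {
    adj: &'a [Vec<bool>],     // adjacency matrix
    visited: Vec<AtomicBool>, // visited status of each node
    cycle_found: AtomicBool,  // set once any thread finds a cycle
    spawned: AtomicUsize,     // threads spawned so far (like cnt)
    max_threads: usize,       // spawn limit (like NTHREADS)
}

/// DFS from `node`; `parent` is None for a call made from the driver loop.
fn dfs(s: &Shared, node: usize, parent: Option<usize>) {
    s.visited[node].store(true, SeqCst);
    // All threads spawned in this scope are joined before `scope` returns.
    thread::scope(|scope| {
        for v in 0..s.adj.len() {
            if s.cycle_found.load(SeqCst) {
                return; // some thread already found a cycle
            }
            if !s.adj[node][v] || Some(v) == parent {
                continue;
            }
            if s.visited[v].load(SeqCst) {
                s.cycle_found.store(true, SeqCst);
                return;
            }
            // Atomically reserve a thread slot; recurse inline once the budget is used up.
            let reserved = s
                .spawned
                .fetch_update(SeqCst, SeqCst, |c| (c < s.max_threads).then_some(c + 1))
                .is_ok();
            if reserved {
                scope.spawn(move || dfs(s, v, Some(node)));
            } else {
                dfs(s, v, Some(node));
            }
        }
    });
}

/// Runs the joined parallel DFS from every unvisited node, one component at a time.
fn has_cycle_parallel(adj: &[Vec<bool>], max_threads: usize) -> bool {
    let s = Shared {
        adj,
        visited: (0..adj.len()).map(|_| AtomicBool::new(false)).collect(),
        cycle_found: AtomicBool::new(false),
        spawned: AtomicUsize::new(0),
        max_threads,
    };
    for n in 0..adj.len() {
        if s.cycle_found.load(SeqCst) {
            break; // no wasteful work once a cycle is known
        }
        if !s.visited[n].load(SeqCst) {
            dfs(&s, n, None); // returns only after all of its threads have been joined
        }
    }
    s.cycle_found.load(SeqCst)
}

/// Builds a symmetric adjacency matrix for `n` nodes from an edge list.
fn matrix(n: usize, edges: &[(usize, usize)]) -> Vec<Vec<bool>> {
    let mut adj = vec![vec![false; n]; n];
    for &(a, b) in edges {
        adj[a][b] = true;
        adj[b][a] = true;
    }
    adj
}

fn main() {
    let tree: Vec<(usize, usize)> = (1..15).map(|c| ((c - 1) / 2, c)).collect();
    let ring: Vec<(usize, usize)> = (0..10).map(|i| (i, (i + 1) % 10)).collect();
    println!("{}", has_cycle_parallel(&matrix(15, &tree), 10)); // false
    println!("{}", has_cycle_parallel(&matrix(10, &ring), 10)); // true
}
```

With max_threads set to 0, no thread is ever spawned and the code reduces to the serial DFS.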

Rust Implementation
In our Rust implementation, we use an adjacency matrix to store the graph and a Boolean vector to hold the visited status of the nodes. Overall, we have the following global variables:

IDENTIFIER       TYPE   USAGE
NTHREADS         i32    constant value of 10 denoting the maximum number of threads to use
cnt              i32    count of the threads spawned so far
n                i32    actual number of nodes in the graph
adj [500] [500]  i32    adjacency matrix to store the graph
cycle_found      bool   flag indicating that a cycle has been found
visited [500]    bool   vector storing the visited status of the nodes
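In safe Rust, globals that several threads write to cannot be plain static mut variables, since every access to a static mut needs an unsafe block. One way to declare the table's variables safely is sketched below; this is an assumption about structure, not a copy of our code. The counters and flags become atomics, the matrix is stored once in a std::sync::OnceLock after input is read (as n rows of 500 entries rather than a fixed 500 x 500 array), and load_graph is a hypothetical helper.

```rust
use std::sync::atomic::{AtomicBool, AtomicI32, Ordering};
use std::sync::OnceLock;

const MAX_NODES: usize = 500;

/// Maximum number of threads to use.
const NTHREADS: i32 = 10;
/// Number of threads spawned so far; several threads update it, so it is atomic.
static CNT: AtomicI32 = AtomicI32::new(0);
/// Actual number of nodes in the graph.
static N: AtomicI32 = AtomicI32::new(0);
/// Set by whichever thread first finds a cycle.
static CYCLE_FOUND: AtomicBool = AtomicBool::new(false);
/// Adjacency matrix: written once after input is read, then only read.
static ADJ: OnceLock<Vec<[i32; MAX_NODES]>> = OnceLock::new();

// A const item may be repeated in an array initializer even though AtomicBool is not Copy.
const UNVISITED: AtomicBool = AtomicBool::new(false);
/// Visited status of each node.
static VISITED: [AtomicBool; MAX_NODES] = [UNVISITED; MAX_NODES];

/// Hypothetical helper: stores the node count and the symmetric adjacency matrix.
fn load_graph(n: usize, edges: &[(usize, usize)]) {
    let mut adj = vec![[0i32; MAX_NODES]; n];
    for &(a, b) in edges {
        adj[a][b] = 1;
        adj[b][a] = 1;
    }
    N.store(n as i32, Ordering::SeqCst);
    ADJ.set(adj).expect("graph already loaded");
}

fn main() {
    load_graph(3, &[(0, 1), (1, 2)]);
    VISITED[0].store(true, Ordering::SeqCst);
    CNT.fetch_add(1, Ordering::SeqCst);
    let adj = ADJ.get().unwrap();
    // prints: n = 3, adj[0][1] = 1, visited[0] = true, cnt = 1/10, cycle_found = false
    println!(
        "n = {}, adj[0][1] = {}, visited[0] = {}, cnt = {}/{}, cycle_found = {}",
        N.load(Ordering::SeqCst),
        adj[0][1],
        VISITED[0].load(Ordering::SeqCst),
        CNT.load(Ordering::SeqCst),
        NTHREADS,
        CYCLE_FOUND.load(Ordering::SeqCst)
    );
}
```

Because adj is only read after loading, OnceLock gives lock-free shared access without any unsafe code.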

After the input is read, the DFS method is invoked iteratively on every unvisited node with -1 as the parent. Before making each call, we first check whether a cycle has already been found, so that no wasteful computation is done.

The DFS method begins by declaring a vector, children, to hold references to newly spawned threads, and the current node is marked as visited. Next, we check for the presence of a cycle and return if one has already been found.

After this comes the core portion of the code, where adjacent nodes are scanned. If they are unvisited, a recursive call to DFS is made on them. The recursive call is made either in a new thread or in the current thread, depending on how many threads have been spawned so far: once cnt reaches NTHREADS, no more threads are spawned. At the beginning of each loop iteration, we again check for the presence of a cycle and return if one has been found. This micro-optimization ensures that a costly recursive call isn't made after a cycle has already been found.


After the for loop, we join the spawned threads, if any, using the children vector declared earlier to hold their references.
Finally, the main method checks whether a cycle was found and displays the result.

Speedup Analysis
To compare our code against its serial counterpart, we used the following inputs:

A graph with a single big cycle through all nodes and no edges other than those making up the cycle
A graph with the structure of a complete binary tree
A graph with random edges, their count equal to 1/4 of the maximum possible number of edges
A graph with random edges, their count equal to 1/2 of the maximum possible number of edges
A graph with random edges, their count equal to 3/4 of the maximum possible number of edges

We deliberately skipped the fully connected graph because cycle detection in such a graph involves at most 3 iterations, and comparing the serial and parallel codes over so few iterations wouldn't be meaningful.
The graphs had 350 nodes each, and the following table lists the average speedup ratio of the parallel code for different limits T on the maximum number of threads.
Input Type     Filename                                  T=2   T=5   T=10
Cycle          inp_undirected_350x350_cycle.txt          0.96  0.82  0.77
Tree           inp_undirected_350x350_tree.txt           1.34  1.43  1.58
Random Sparse  inp_undirected_350x350_random_sparse.txt  1.25  1.38  1.51
Random Medium  inp_undirected_350x350_random_medium.txt  1.19  1.31  1.33
Random Dense   inp_undirected_350x350_random_dense.txt   0.87  0.89  0.54

Library Integration
In the library, the method that performs cycle detection in an undirected graph is named cycle_detection and takes two arguments: matrix134180 of type &mut [[i32; 350]; 350] and n1 of type i32. The matrix holds the adjacency matrix of the graph, while the integer holds the number of nodes in the graph.
This method invokes depth-first search through the method dfs_134180, which takes the current node and its parent as arguments. This DFS method spawns new threads, and the maximum number of threads spawned is limited by the global variable NTHREADS180. The global variables used are listed below:

n180: number of nodes in the graph
cycle_found180: status of whether a cycle has been found
visited180 [bool]: vector marking nodes as visited
adj180 [i32] [i32]: the adjacency matrix, stored as a 2-D integer vector
cnt180: number of threads spawned so far
NTHREADS180: maximum number of threads to be spawned
