
Be it social networks, traffic systems, or even molecules, it is often useful to
model a set of interconnected nodes. Graph theory provides a logically sound
and powerful way to do just this. So, let’s take a closer look at what it is, how
it works, and when we might use it.

Not to be confused with a graph on the xy-plane that displays a relationship
between two quantities, this type of graph, most fundamentally, is a data
structure that represents a collection of vertices, or “nodes”, joined by edges.
When modelling the real world, such connections between nodes may have
constraints, such as a direction or a certain value associated with them. For
instance, if we are modelling a traffic system, we may need to represent a
one-way road, or a very congested road. This gives rise to the need for different
features on a graph, like directions or weights.

In a weighted graph, each connection between two nodes has a certain value
associated with it. For example, it may represent the number of bonds in a
molecule of ethene, as below.

[Figure: ethene modelled as a weighted graph. The two carbon nodes are joined
by an edge of weight 2 (the double bond), and each carbon is joined to two
hydrogen nodes by edges of weight 1.]

This is also particularly useful for neural networks because weights between
neurones are important parameters in generating accurate output. In general,
a weight can be thought of as the “magnitude” of an edge which can denote the
value of a relevant metric.
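To make the idea concrete, a weighted graph can be sketched in Python as a
dictionary mapping each node to a list of (neighbour, weight) pairs. The node
labels and the helper function below are illustrative choices, not part of any
standard library.

```python
# The ethene molecule as a weighted, undirected graph: each node maps to
# a list of (neighbour, weight) pairs, where the weight is the bond count.
ethene = {
    "C1": [("C2", 2), ("H1", 1), ("H2", 1)],   # double bond between the carbons
    "C2": [("C1", 2), ("H3", 1), ("H4", 1)],
    "H1": [("C1", 1)], "H2": [("C1", 1)],
    "H3": [("C2", 1)], "H4": [("C2", 1)],
}

def edge_weight(graph, u, v):
    # Return the weight of the edge (u, v), or None if no such edge exists.
    return next((w for n, w in graph[u] if n == v), None)
```

For example, edge_weight(ethene, "C1", "C2") returns 2, the number of bonds
between the two carbons.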

Directed graphs contain edges that may be traversed only in a certain direction.
In an undirected graph, if there exists an edge connecting the pair of nodes (1, 2),
this implies that this edge is also valid as the connection between (2, 1). If this
edge is directed, however, (1, 2) ≠ (2, 1).

[Figure: a directed graph with edges 1 → 2 → 3 → 4.]

This is applicable in the internet layer of the TCP/IP stack, where routers
determine the shortest route from source to destination during packet switching.

Furthermore, directions can denote journeys. In problems like the Travelling
Salesman Problem, this is significant because journeys are often uni-directional.
A constraint may include beginning at a certain source node and ending at a
given destination node. Therefore, directions can help model fundamental con-
straints such as direction of travel. Within the class of directed graphs, there
also exist several other types of graph. One significant aspect of a directed
graph is whether it is cyclic or acyclic. A directed cyclic graph contains a cycle:
when traversing the nodes, it is possible to go round that loop indefinitely.
The graph above does not have this feature, because one has no choice other
than to go from 1 to 2 to 3 to 4 (if we are starting at node 1); therefore it
is acyclic. However, adding just one (directed) edge to this graph makes it cyclic.

[Figure: the same directed graph with one additional edge closing a loop,
making it cyclic.]
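Whether a directed graph contains such a loop can be checked with a traversal
that remembers the nodes on the current path. A minimal sketch, assuming the
graph is given as a dictionary of successor lists (the graph literals below are
illustrative):

```python
def has_cycle(adj):
    # Depth-first search over a directed graph {node: [successors]},
    # tracking the nodes on the current path to spot a loop.
    visited, on_path = set(), set()

    def visit(node):
        if node in on_path:        # reached a node already on this path: a cycle
            return True
        if node in visited:
            return False
        visited.add(node)
        on_path.add(node)
        if any(visit(nxt) for nxt in adj.get(node, [])):
            return True
        on_path.remove(node)       # backtrack: node leaves the current path
        return False

    return any(visit(node) for node in adj)

# The directed path 1 -> 2 -> 3 -> 4 is acyclic; adding one edge from
# node 4 back to node 1 makes the graph cyclic.
acyclic = {1: [2], 2: [3], 3: [4], 4: []}
cyclic = {1: [2], 2: [3], 3: [4], 4: [1]}
```

Here has_cycle(acyclic) is False, while has_cycle(cyclic) is True.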

Graphs that can be “embedded in the plane” are classified as planar. This
means that, whilst preserving the graph as G(V, E) with the set of vertices V
and the set of edges E, it is possible to configure it such that none of the edges
intersect each other (except at their common endpoints). For instance, the
leftmost graph below is actually planar because it can be embedded in the plane
such that no edges touch each other except at shared nodes. It does not appear
to be planar because the edge (6, 1) crosses the edge (2, 3), yet this is easily
solved in the rightmost graph below. Because they are fundamentally the same
graph, as G(V, E) is entirely preserved, their planarity is the same: the second
graph is visibly planar, and thus so is the first. A graph is only non-planar if
there is no way to embed it in the plane.

[Figure: two drawings of the same graph on nodes 1 to 6. In the first, the edge
(6, 1) crosses the edge (2, 3); in the second, the edges are redrawn so that none
cross.]

The complete graph on 4 nodes, formally known as K4, is displayed be-
low. Again, it appears not to be planar because the edge (1, 3) crosses the
edge (2, 4), and yet, when the latter of those edges is redrawn, the crossing
disappears. So, K4 is planar.

[Figure: two drawings of K4 on nodes 1 to 4. In the first, the edge (1, 3) crosses
the edge (2, 4); in the second, the edge (2, 4) is redrawn so that no edges cross.]

Planarity is an important feature of a graph in solving graph problems. This
is particularly so in the famous Three Utilities Problem: there are three utilities
and three houses, and the task is to join all three houses to all three utilities in
the plane, without any of the connections crossing.

[Figure: three utility nodes U1, U2, U3 and three house nodes H1, H2, H3,
with every house to be joined to every utility.]

The nodes by which a face (a region bounded by edges, including the outer,
infinitely large region) is bounded alternate between house and utility, and so
each face is encompassed by at least 4 edges, because each house connects to
a given utility exactly once. In a planar embedding, any edge touches exactly
2 faces. Counting edge-face incidences both ways gives 4F ≤ 2E, so the number
of faces is never more than half the number of edges:

F ≤ (1/2)E
Further, Euler’s Characteristic Formula states that V − E + F = 2 for a
connected planar graph, where V is the number of nodes, E is the number of
edges, and F is the number of faces. Rearranging for F and substituting into
the bound above:

F = E − V + 2
E − V + 2 ≤ (1/2)E
2E − 2V + 4 ≤ E
E ≤ 2V − 4

For the Three Utilities graph, E = 3 × 3 = 9 and V = 6, so
2V − 4 = 2(6) − 4 = 8.
This implies that 9 ≤ 8, and so there must be a flaw in the original assump-
tion (that it would be possible to create a planar graph to connect the houses
to the utilities). Thus, it is impossible to connect the houses and utilities in a
planar arrangement.
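The arithmetic of this argument can be checked directly in a few lines; this is
a sketch of the computation for this one graph, not a general planarity test.

```python
# Three Utilities graph (K3,3): 3 houses, 3 utilities, every pair joined.
V = 3 + 3          # number of nodes
E = 3 * 3          # number of edges
bound = 2 * V - 4  # E would have to satisfy E <= 2V - 4 if the graph were planar

# The bound fails (9 > 8), so no planar embedding exists.
planar_possible = E <= bound
```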

Graphs have all sorts of useful applications, even in abstract Computer Sci-
ence problems. So, it is important to represent them as efficiently accessible
data structures. In an adjacency list, each node is mapped to a list of all of its
neighbours. Consider the below graph.

[Figure: an undirected graph with edges (1, 2), (2, 3), (2, 4), and (4, 5).]

Node 1 neighbours node 2, while node 2 neighbours nodes 3 and 4, and so
on. So, an adjacency list may take the following two-dimensional list structure,
for a graph with n nodes:
adj_list = [[node_1, its neighbours...], [node_2, its neighbours...], ..., [node_n, its neighbours...]].
Then, the corresponding adjacency list for our graph is as follows:
adj_list = [[1, 2], [2, 3, 4], [3, 2], [4, 2, 5], [5, 4]].
Alternatively, but equivalently, we may implement an adjacency list as a dictio-
nary with each key being a node and each value being the list of its neighbours:
adj_list = {node_1: [its neighbours], ..., node_n: [its neighbours]}.
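As a sketch, the dictionary form can be built from a plain list of edges. Note
that in an undirected graph each edge is recorded from both endpoints, so this
version also lists node 1 among node 2’s neighbours.

```python
def build_adj_list(n, edges):
    # Build an undirected adjacency list {node: [neighbours]} for nodes 1..n.
    adj = {node: [] for node in range(1, n + 1)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)   # undirected: record the edge from both ends
    return adj

adj_list = build_adj_list(5, [(1, 2), (2, 3), (2, 4), (4, 5)])
# adj_list is {1: [2], 2: [1, 3, 4], 3: [2], 4: [2, 5], 5: [4]}
```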

Adjacency lists are an efficient way to represent graphs, particularly sparse
graphs: those with many nodes but where each node has few neighbours. Another
method of representing a graph as a data structure uses an adjacency matrix.
This entails creating an n × n table covering all the possible edges of a graph
with n nodes, marking with a 1 all the cells where an edge exists, and with a 0
where there is no edge.
Mathematically, the adjacency matrix is defined as follows.

A_uv = 1 if (u, v) ∈ E
A_uv = 0 if (u, v) ∉ E

where E here denotes the set of edges.

For instance, in the graph above there is an edge between nodes 2 and 3.
Therefore, in the adjacency matrix, the cells (2, 3) and (3, 2) are marked 1.
However, there is no edge between nodes 3 and 5, so the cells (3, 5) and (5, 3)
are marked 0. The adjacency matrix for our graph is below. This could be
implemented in code using a two-dimensional list with one sublist for each row.

    1  2  3  4  5
1   0  1  0  0  0
2   1  0  1  1  0
3   0  1  0  0  0
4   0  1  0  0  1
5   0  0  0  1  0
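The matrix can be built in code from the same edge list; a sketch, with row and
column i corresponding to node i + 1:

```python
def adj_matrix(n, edges):
    # Build the n x n adjacency matrix of an undirected graph on nodes 1..n
    # as a two-dimensional list, with one sublist per row.
    A = [[0] * n for _ in range(n)]
    for u, v in edges:
        A[u - 1][v - 1] = 1
        A[v - 1][u - 1] = 1    # symmetric, since the graph is undirected
    return A

A = adj_matrix(5, [(1, 2), (2, 3), (2, 4), (4, 5)])
```

For example, A[1][2] and A[2][1] are both 1 (the edge between nodes 2 and 3),
while A[2][4] is 0 (no edge between nodes 3 and 5).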

When using graphs to solve a problem, it may be necessary to traverse the
graph, passing through various nodes. Perhaps we need to find the shortest
path between two nodes for a GPS navigation application, or to search for a
particular node. This is called traversal.

One method of traversal is the Depth First Search (DFS). It has time com-
plexity O(n + E), where n is the number of nodes and E is the number of edges;
in other words, the time taken for the algorithm to run increases linearly with
the number of nodes and edges. A simple way to implement this is recursively,
as in the Python code listing below. During a DFS, a record of the visited nodes
is kept (a set in the listing), and the source node is visited first.

Considering the current node’s neighbours, select any unvisited neighbour,
visit it, and repeat the process from there. Once all of a node’s neighbours have
been visited, we must backtrack: since there are no new unvisited nodes to
reach from here, we return to the previous node and continue visiting its un-
visited neighbours. In the recursive implementation this backtracking happens
automatically as each call returns, with the call stack keeping track of the
current path. Equivalently, an explicit stack can be maintained: push the
source node, then repeatedly pop a node, visit it, and push its unvisited
neighbours, until the stack is empty.
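The explicit-stack version of this process can be sketched as follows; returning
the visit order, rather than printing it, is an illustrative choice.

```python
def dfs_iterative(adj_list, source):
    # Depth-first search with an explicit stack; returns the visit order.
    visited, order = set(), []
    stack = [source]
    while stack:
        node = stack.pop()              # popping is the backtracking step
        if node not in visited:
            visited.add(node)
            order.append(node)
            # Push neighbours in reverse so the first-listed neighbour
            # is popped, and therefore explored, first.
            for neighbour in reversed(adj_list[node]):
                if neighbour not in visited:
                    stack.append(neighbour)
    return order
```

On the graph used below, dfs_iterative({1: [2], 2: [3, 4], 3: [2], 4: [2, 5],
5: [4]}, 1) returns [1, 2, 3, 4, 5].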

Listing 1: Depth First Search


def dfs(adj_list, source_node, visited=None):
    if visited is None:
        visited = set()
    if source_node not in visited:
        print(source_node)
        visited.add(source_node)
        for neighbour in adj_list[source_node]:
            dfs(adj_list, neighbour, visited)
Consider the below graph.

[Figure: the same undirected graph as before, with edges (1, 2), (2, 3), (2, 4),
and (4, 5).]

The adjacency list for this graph, as a dictionary, is as follows:
{1: [2], 2: [3, 4], 3: [2], 4: [2, 5], 5: [4]}. Apply this DFS algorithm to the graph,
using the adjacency list, and starting at node 1. Then adj_list[source_node] is
adj_list[1], which is [2], so the DFS algorithm is next applied to node 2,
and so on and so forth in a recursively defined function. This results in
visiting node 3, then node 4, and then node 5.

To optimise the use of resources or energy, or to minimise the time taken
to complete a task, it may be desirable to travel the shortest distance. For
instance, a router in a network has the job of directing packets during packet
switching. It must find the shortest path between two nodes, factoring in
several considerations such as packet congestion. So, we need an algorithm
that takes as input a graph (represented by an adjacency list), a source node,
and a destination node, and outputs the shortest path between them.

One commonly cited method - the greedy method - is a heuristic that takes
the optimal choice at each individual stage. Notably, this does not mean that
the overall resulting path is the shortest, as it is a superficial method; it only
goes one edge deep into the proposed path.

For example, in the below graph, if we are aiming to find the shortest path
from node 1 to node 4, we are initially faced with the choice of going to node 2
or to node 3. Node 1 to node 2 has a weight of 6, whereas node 1 to node 3 has
a weight of only 2. So, node 3 is visited. To get to node 4, the only choice from
here is to go directly there, via the edge of weight 8. The total distance using
this algorithm is 10, whereas taking the alternative route (through node 2)
gives a distance of 7.

[Figure: a weighted graph with edges (1, 2) of weight 6, (1, 3) of weight 2,
(2, 4) of weight 1, and (3, 4) of weight 8.]

The described algorithm, given the adjacency list {1: [[2, 6], [3, 2]], 2: [[4,
1]], 3: [[4, 8]], 4: []}, would output ([1, 3, 4], 10).
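That greedy behaviour can be sketched as follows, using the same adjacency
list format (each neighbour paired with an edge weight). It is a heuristic: it
follows the lightest outgoing edge at each step, and may miss the true shortest
path.

```python
def greedy_path(adj, source, dest):
    # Follow the cheapest edge to an unvisited node at each step.
    # Returns (path, total_distance), or None if a dead end is reached.
    path, total, node = [source], 0, source
    while node != dest:
        choices = [(w, n) for n, w in adj[node] if n not in path]
        if not choices:
            return None
        weight, node = min(choices)     # locally optimal: one edge deep
        path.append(node)
        total += weight
    return path, total

adj = {1: [[2, 6], [3, 2]], 2: [[4, 1]], 3: [[4, 8]], 4: []}
# greedy_path(adj, 1, 4) returns ([1, 3, 4], 10), missing the shorter 1-2-4.
```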

A notable problem that employs techniques in graph theory is the Trav-
elling Salesman Problem (TSP). The TSP is an NP-hard problem that asks for
the shortest route in a weighted, undirected graph that traverses every node
exactly once and returns to the starting node.

The simplest approach is the brute-force method. In this method, every
possible route is traversed and the corresponding distances calculated; the
route with the shortest distance is then outputted. One positive aspect of
this approach is that it always outputs the truly optimal solution, yet a
significant downside is that it runs in O((n − 1)!) time (for n nodes), which is
highly inefficient.
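A brute-force sketch, using an illustrative symmetric weight matrix indexed
from 0; the start node is fixed, so only (n − 1)! orderings of the remaining
nodes are tried.

```python
from itertools import permutations

def tsp_brute_force(w):
    # Exact TSP over a weight matrix w; returns (best_tour, best_cost).
    # O((n - 1)!) time, so only feasible for small n.
    n = len(w)
    best_tour, best_cost = None, float("inf")
    for perm in permutations(range(1, n)):
        tour = (0,) + perm              # start (and end) at node 0
        cost = sum(w[tour[i]][tour[(i + 1) % n]] for i in range(n))
        if cost < best_cost:
            best_tour, best_cost = tour, cost
    return best_tour, best_cost

# An illustrative 4-node instance.
w = [[0, 10, 15, 20],
     [10, 0, 35, 25],
     [15, 35, 0, 30],
     [20, 25, 30, 0]]
```

Here tsp_brute_force(w) returns ((0, 1, 3, 2), 80).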

As in the shortest path problem, the TSP can also be approached using a
type of greedy algorithm known as the Nearest Neighbour Heuristic. In exactly
the same way as before, the nearest unvisited neighbour is always pursued. In
graphs with few nodes, it is not a bad solution, because there are few possible
paths. Yet, as the number of nodes increases, the number of valid paths
increases too, and so it becomes an increasingly weak solution. One way to
evaluate the accuracy of the Nearest Neighbour Heuristic is to find the ratio of
the distance of its solution to the true minimum distance; that way, a percent-
age score can be obtained. For instance, if it outputs a distance of 51, and the
true minimum distance is 50, then it is 2% off (too high, of course, as it can
never be lower than the true minimum). Another greedy algorithm can be used,
whereby we repeatedly attempt to connect the edges with the smallest weight.
An edge is invalid, and therefore rejected, when it would create a cycle (unless
it is the final connection) or give a node a third edge.
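The Nearest Neighbour Heuristic itself can be sketched with the same weight-
matrix convention; the matrix is illustrative, and on larger or less friendly
instances the result can sit well above the true minimum.

```python
def nearest_neighbour_tour(w, start=0):
    # TSP heuristic: repeatedly hop to the closest unvisited node,
    # then return to the start. Returns (tour, total_distance).
    n = len(w)
    tour, total, node = [start], 0, start
    unvisited = set(range(n)) - {start}
    while unvisited:
        nxt = min(unvisited, key=lambda v: w[node][v])
        total += w[node][nxt]
        unvisited.remove(nxt)
        tour.append(nxt)
        node = nxt
    total += w[node][start]    # close the cycle back to the start
    return tour + [start], total

w = [[0, 10, 15, 20],
     [10, 0, 35, 25],
     [15, 35, 0, 30],
     [20, 25, 30, 0]]
```

On this instance, nearest_neighbour_tour(w) happens to return
([0, 1, 3, 2, 0], 80), matching the true minimum; the ratio of heuristic
distance to minimum distance gives the percentage score described above.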

Given a reasonable solution, there are certain algorithms that can be ap-
plied to it to refine it into an even better solution. One of the most common
improvement algorithms is an n-opt improvement, defined as considering all
combinations of n edges and reconfiguring them to shorten the path, whilst
keeping the path a valid solution. Doing this for all possible groups of n
edges makes the solution “n-opt optimal”. For instance, in the below graph, to
make a 2-opt improvement, we would look at all the pairs of edges; namely, the
edges are (1, 5), (2, 4), (2, 3), (2, 5), and (1, 3). We might start with the first
two and swap them around; in other words, we would consider an alternative
graph with edges (1, 2) and (4, 5) instead of (1, 5) and (2, 4). Do this for all
possible pairs of edges, discarding the previous solution whenever the sum of
the weights on the new graph is less than the sum of the weights on the old
one. Then, the resulting graph is 2-opt optimal.

[Figure: two graphs on nodes 1 to 5, showing a tour before and after a 2-opt
edge swap.]
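A 2-opt pass can be sketched as follows: repeatedly reverse a segment of the
tour (equivalent to swapping one pair of edges) and keep the change whenever
it shortens the tour. The weight matrix and starting tour are illustrative.

```python
def tour_length(w, tour):
    # Total weight of the closed tour, including the edge back to the start.
    n = len(tour)
    return sum(w[tour[i]][tour[(i + 1) % n]] for i in range(n))

def two_opt(w, tour):
    # Apply 2-opt moves until no single segment reversal shortens the tour;
    # the result is then 2-opt optimal.
    tour = list(tour)
    improved = True
    while improved:
        improved = False
        for i in range(1, len(tour) - 1):
            for j in range(i + 1, len(tour) + 1):
                candidate = tour[:i] + tour[i:j][::-1] + tour[j:]
                if tour_length(w, candidate) < tour_length(w, tour):
                    tour, improved = candidate, True
    return tour

w = [[0, 10, 15, 20],
     [10, 0, 35, 25],
     [15, 35, 0, 30],
     [20, 25, 30, 0]]
```

Starting from the tour [0, 2, 1, 3] of length 95, two_opt(w, [0, 2, 1, 3])
reaches a tour of length 80.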

Ralph Matta (PEPW)
