Network Flow

Flow Network

Imagine a situation: you have pipes that are connected to each other. They differ in diameter and
can let different amounts of water through. There is one water source and one destination where the
water should be delivered. Which structure would you use to represent such a system?

There may be many options, but in this topic, you will learn about flow networks. They are widely used
in tasks where you need to analyze the amount of products, materials, etc. that can be
transferred via some limited transportation channels. This analysis is often crucial, as it helps to
optimize logistic processes and estimate the amount of goods that can be transported.

Flow networks
Now, what exactly is a flow network?

It is a graph that has the following characteristics:

• It is directed. Just as in the example with water flowing from one pipe to another, each edge
in the graph has a starting vertex and a finishing vertex.

• Each edge of the graph has both a flow and a capacity. In terms of our pipeline system, these are
the current amount of water that is transported through a pipe and the maximum amount it can transport,
respectively.

• There is one source and one sink in the graph. Going back to the pipes: the source is the place
from which the water starts flowing, and the sink is its destination.

Let's visualize a flow network. For example, it may look like this:

Here the numbers near the edges are given in the following format: flow/capacity.

Note that sometimes you can come across flow networks that have several sources or sinks. For
instance, you may need several sources in critical systems, where problems with the source might
threaten human lives (e.g. water and electricity systems in hospitals).
In such cases, for some problems it is useful to transform the system into a network with one
supersource and one supersink. Node X in the graph below is a supersource – a node that is connected
to all sources of the initial network, and node Y is a supersink – a node that, you've guessed it, is
connected to all sinks of the network. Remember that the indegree of the supersource and the outdegree
of the supersink are both zero.
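The transformation can be sketched in a few lines. This is only an illustration: the edge-dictionary format, the vertex names S1, S2, T1, T2, and the choice of infinite capacity for the new edges are assumptions, not something the topic prescribes.

```python
import math

def add_super_terminals(capacity, sources, sinks, super_source="X", super_sink="Y"):
    """Return a copy of `capacity` with a supersource and a supersink added.

    `capacity` maps (u, v) edge tuples to capacities.
    """
    extended = dict(capacity)
    for s in sources:
        # Assumption: unlimited capacity, so a new edge is never the bottleneck.
        extended[(super_source, s)] = math.inf
    for t in sinks:
        extended[(t, super_sink)] = math.inf
    return extended

# Hypothetical network with two sources (S1, S2) and two sinks (T1, T2).
capacity = {("S1", "A"): 4, ("S2", "A"): 3, ("A", "T1"): 5, ("A", "T2"): 2}
extended = add_super_terminals(capacity, ["S1", "S2"], ["T1", "T2"])
```

No edge enters X and no edge leaves Y, which matches the indegree and outdegree rule stated above.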

Now it's time to take a closer look at the capacity and flow, since understanding these properties is
crucial for learning flow networks.

It is important to mention that, formally, the capacity and flow are defined as functions from the set of
edges to the real numbers. The values of these functions are indeed the flow and capacity, which are
mentioned in this topic. Here, the intention is to keep the concept of these characteristics as simple as
possible. However, don't be too surprised if you come across these terms defined as scary mathematical
functions: it is basically the same thing.

Capacity and flow

Let's start with capacity. As mentioned before, it shows the maximum resource that can be pushed
through one edge. The main thing you need to remember about capacity is that it is non-negative. If
you remember the example, it is quite logical, as a pipe cannot let through
a negative amount of water. Therefore, there can never be an edge like this (the number is the
capacity):

Sometimes, you can come across edges with a zero capacity. It means that no flow can be passed
through them.

Now, moving on to the flow, you should keep in mind its characteristics:

• Capacity constraint. The flow of an edge cannot exceed its capacity. It can be only less (edges
AB, CD, and ED below) or equal (edges AC and BE).

• Skew symmetry. When you push a flow through an edge, you subtract it from the flow of
the oppositely directed edge, even if that edge does not exist in the network. In other words, the flow of the
edge from vertex X to Y equals the negative flow of the edge from vertex Y to X. In some sense, you may
think of it as a way to cancel flow. For example, you send trucks with goods from city A to
city B. Later on, you discover that the warehouse of city B does not have enough space
for storing the goods before sending them to other cities. Therefore, you may want to
send them back to A and redistribute them from A to other destinations. You will get a better
understanding of this property when studying flow network algorithms in the upcoming topics.

• Flow conservation. For every vertex except the source and the sink, the sum of the incoming flows
equals the sum of the outgoing ones. Equivalently, if you also count the negative flows of the oppositely
directed edges (you remember about the skew symmetry, right?), all flows of such a vertex
add up to 0. The source and the sink follow a different rule:
the sum of the outgoing flows of the source equals the sum of the incoming
flows of the sink.
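The capacity constraint and flow conservation can be checked mechanically. Below is a minimal sketch, assuming the network is given as two dictionaries keyed by (from, to) tuples that list only real edges; the function name and the small S-A-B-T network are hypothetical.

```python
from collections import defaultdict

def is_valid_flow(capacity, flow, source, sink):
    """Check the capacity constraint and flow conservation."""
    # Capacity constraint: 0 <= flow <= capacity on every edge.
    for edge, f in flow.items():
        if f < 0 or f > capacity.get(edge, 0):
            return False

    # Net flow per vertex: outgoing minus incoming.
    net = defaultdict(int)
    for (u, v), f in flow.items():
        net[u] += f
        net[v] -= f

    # Conservation: every vertex other than the source and the sink balances.
    for vertex, value in net.items():
        if vertex not in (source, sink) and value != 0:
            return False

    # What leaves the source must arrive at the sink.
    return net[source] == -net[sink]

capacity = {("S", "A"): 3, ("S", "B"): 2, ("A", "T"): 3, ("B", "T"): 4}
good = {("S", "A"): 2, ("S", "B"): 2, ("A", "T"): 2, ("B", "T"): 2}
bad = {("S", "A"): 3, ("S", "B"): 2, ("A", "T"): 2, ("B", "T"): 2}  # A receives 3, forwards 2

print(is_valid_flow(capacity, good, "S", "T"))  # True
print(is_valid_flow(capacity, bad, "S", "T"))   # False
```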

For instance, let's count the flows for the vertices A, B, and C of the graph from above:

It could've been the end of this section, but there is one more thing that you certainly need to remember
in order not to get confused. Sometimes, the term flow is used to describe not the flow of an edge, but the total
flow of the graph. In this case, it equals the sum of the flows coming out of the source, which is the
same as the sum of the flows coming into the sink.

For example, look at the following graph. Its sink is node G. There are two incoming edges for
this vertex, EG and FG, and both have a flow of 2. Thus, the flow of the graph is the sum of
these values and equals 4.
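As a small illustration, the total flow can be computed from either end of the network. The edges EG and FG come from the example; the source A and the edges AE and AF are assumptions made only to complete the sketch.

```python
# Flow on each edge, keyed by (from, to). A -> E and A -> F are hypothetical.
flow = {("A", "E"): 2, ("A", "F"): 2, ("E", "G"): 2, ("F", "G"): 2}

def flow_out_of(flow, vertex):
    """Sum of the flows on edges leaving `vertex`."""
    return sum(f for (u, _), f in flow.items() if u == vertex)

def flow_into(flow, vertex):
    """Sum of the flows on edges entering `vertex`."""
    return sum(f for (_, v), f in flow.items() if v == vertex)

print(flow_into(flow, "G"))    # 4
print(flow_out_of(flow, "A"))  # 4
```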

Phew, now you definitely know everything about the capacity and flow! It's the perfect moment to go
on and learn about possible ways of representing flow networks in your code.

Representation
Being a special type of graph, a flow network can indeed be represented as a graph, right? As you may
remember, there are three basic options when choosing a suitable representation. Let's discuss them one
by one and compare them by representing the network from above in different ways.

The first option is to keep one adjacency matrix for the capacities and another matrix for
the flows. It is easy to implement and convenient in many cases, since you can access
the values just by indexing the matrices. However, it is inefficient in terms of memory, especially for
sparse graphs, as the space complexity is $O(|V|^2)$.

For existing edges, you write the values of the flow and capacity in the tables. If an edge between two
vertices doesn't exist, you simply put zero in the corresponding cell.

An adjacency matrix for the flows:

An adjacency matrix for the capacities:

The second option is quite similar and has the same space complexity, but the difference is that you use only
one adjacency matrix to store the pairs of the flow and capacity for each edge. You can easily create it
by combining the two matrices from above:
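Here is a minimal sketch of both matrix-based options. The four-vertex network and its numbers are hypothetical, since the original figure is not reproduced here.

```python
vertices = ["A", "B", "C", "D"]
index = {name: i for i, name in enumerate(vertices)}

# Hypothetical edges as (from, to, flow, capacity).
edges = [("A", "B", 2, 3), ("A", "C", 1, 2), ("B", "D", 2, 2), ("C", "D", 1, 4)]

n = len(vertices)
flow_matrix = [[0] * n for _ in range(n)]
capacity_matrix = [[0] * n for _ in range(n)]
for u, v, f, c in edges:
    flow_matrix[index[u]][index[v]] = f
    capacity_matrix[index[u]][index[v]] = c

# Second option: one matrix whose cells are (flow, capacity) pairs.
combined = [
    [(flow_matrix[i][j], capacity_matrix[i][j]) for j in range(n)]
    for i in range(n)
]

print(combined[index["C"]][index["D"]])  # (1, 4)
print(combined[index["D"]][index["C"]])  # (0, 0), the edge doesn't exist
```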

The final option is to use an adjacency list with the capacity and flow stored in every entry of the
list. It is more efficient in terms of memory, having a space complexity of $O(|V| + |E|)$, as you don't need to store
values for all possible edges, only for the existing ones. Another benefit is that it is easier to add
vertices and edges to such an adjacency list.

However, accessing data is more difficult in this case, as you have to iterate through the adjacent
vertices in order to find the needed values. Additionally, flow and capacity for non-existent edges aren't
stored in the list, so they have to be computed while the program runs, which may be
inconvenient.
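A possible adjacency-list layout is sketched below. The dictionary-of-lists structure, the helper names `add_edge` and `lookup`, and the sample edges are assumptions chosen for illustration.

```python
from collections import defaultdict

# Each vertex maps to a list of its outgoing edges.
adjacency = defaultdict(list)

def add_edge(u, v, flow, capacity):
    adjacency[u].append({"to": v, "flow": flow, "capacity": capacity})

def lookup(u, v):
    """Return (flow, capacity) of the edge u -> v, or (0, 0) if it doesn't exist."""
    for edge in adjacency.get(u, []):  # linear scan over the neighbours of u
        if edge["to"] == v:
            return edge["flow"], edge["capacity"]
    return 0, 0

add_edge("A", "B", 2, 3)
add_edge("A", "C", 1, 2)
add_edge("B", "D", 2, 2)

print(lookup("A", "C"))  # (1, 2)
print(lookup("C", "A"))  # (0, 0)
```

The scan in `lookup` is the access cost mentioned above: finding an edge takes time proportional to the number of neighbours instead of a single index operation.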
All in all, each method has its pros and cons, so you should carefully choose a way of representing the
flow network, depending on the problem you want to solve.

Of course, you can find more advanced structures to store flow networks, but they are out of the scope
of this topic. Nevertheless, you are totally welcome to explore them by yourself and improve your
knowledge of algorithms even further!

Maximum Flow Problem
Imagine that there are roads running from city A to city B. Cars are driving on the roads, but there are
always traffic jams on some sections of roads, and the rest are empty. You need to understand whether
you need new roads or you can distribute the flow of the vehicles in such a way as to make better use of
the existing routes. What is the maximum number of cars that can travel from A to B now?

Important terms
First off, let's define the problem properly. The maximum flow problem, as you may have understood,
is all about finding a flow in a network such that the total flow from the source to the sink is as large as
possible.

Additionally, you need to understand some other terms in order to go on with this topic: in particular,
different types of graphs. While reading about them and looking at examples, you sometimes may
come across edges that are not present in the initial flow network. Remember that these are oppositely
directed edges that appear due to skew symmetry.

Now, let's talk about residual capacities. Formally, a residual capacity is the difference between the
capacity of an edge and its flow. In other words, it is the maximum flow that can be pushed through a
certain edge at the moment. For instance, if an edge AB has a flow of 5 and a capacity of 10, the
residual capacity for it will be 10 − 5 = 5. Mind that pushing the flow through/along the edge means
adding some more flow to the current one. For example, if you have an edge AB with a flow of 3 and a
capacity of 7, you may push a flow of 4 through it. By doing so, you will increase the flow of the edge
to 7.

A closely related notion is the residual graph: a graph in which every edge
has its residual capacity assigned to it.

For example, let's create a residual graph for the following network:

It will look like the one below. Let's name this graph Gr.

You may wonder why the edge 4-2 has a residual capacity of 2. Well, it can be explained like this:
there is currently a flow of two cars going from 2 to 4. But maybe an optimal flow doesn't need to send
them there. Therefore, you put down a residual capacity of 2 on the edge 4-2, meaning that these two
cars may go back from 4 to 2 if needed.
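The construction of a residual graph can be sketched as follows. The function name and the dictionary format are assumptions, and the capacity of 3 on the edge 2-4 is made up for the example; only the flow of 2 comes from the text.

```python
def residual_graph(capacity, flow):
    """Map each (u, v) pair to its residual capacity.

    A forward edge gets capacity - flow. The oppositely directed edge gets
    the flow that could be cancelled, which follows from skew symmetry.
    """
    residual = {}
    for (u, v), c in capacity.items():
        f = flow.get((u, v), 0)
        residual[(u, v)] = residual.get((u, v), 0) + c - f
        residual[(v, u)] = residual.get((v, u), 0) + f
    return residual

capacity = {(2, 4): 3}  # hypothetical capacity
flow = {(2, 4): 2}      # two cars currently go from 2 to 4
residual = residual_graph(capacity, flow)

print(residual[(2, 4)])  # 1, one more car fits
print(residual[(4, 2)])  # 2, two cars may be sent back
```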

In a residual graph, you can find an augmenting path. It is a path from the source to the sink that
consists only of edges with a positive residual capacity. By pushing a flow along such a
path, you increase the total flow of the network. An important property is connected to this path: the
flow is maximal if and only if there is no augmenting path in the residual graph. Indeed, in that case
there is no path along which more flow may be pushed. Consequently, the total flow of the network
can't be increased in any way.
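Searching for an augmenting path is an ordinary graph traversal that ignores edges without residual capacity. The sketch below uses depth-first search; the function name and the residual capacities are hypothetical, chosen so that 1-3-5 is the only augmenting path.

```python
def find_augmenting_path(residual, source, sink):
    """Return a list of vertices from source to sink that uses only edges
    with positive residual capacity, or None if there is no such path."""
    parent = {source: None}
    stack = [source]
    while stack:
        u = stack.pop()
        if u == sink:
            path = []
            while u is not None:
                path.append(u)
                u = parent[u]
            return path[::-1]
        for (a, b), r in residual.items():
            if a == u and r > 0 and b not in parent:
                parent[b] = u
                stack.append(b)
    return None

# Hypothetical residual capacities keyed by (from, to).
residual = {(1, 2): 0, (2, 1): 3, (1, 3): 1, (2, 4): 1,
            (4, 2): 2, (3, 5): 1, (4, 5): 0, (5, 4): 2}
path = find_augmenting_path(residual, 1, 5)
print(path)  # [1, 3, 5]
```

Scanning every edge for each visited vertex keeps the sketch short; an adjacency list would avoid that extra work.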

For example, in the above-mentioned residual graph, you can find an augmenting path 1-3-5. After you
push the flow of 1 through it, there are no more augmenting paths left, and the residual graph is the
following:

Last but not least is the level graph. Each node of this graph stores the length of the shortest path from
the source to it. Recall that the length of a path is simply the number of edges in it.

For instance, for Gr, the level graph will look like the one below. Remember that the number in each
node is the length of the shortest path from the source to that node.
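Levels are typically computed with a breadth-first search from the source. The following sketch assumes the residual graph is a dictionary keyed by (from, to) and only follows edges with positive residual capacity; the numbers are hypothetical.

```python
from collections import deque

def bfs_levels(residual, source):
    """Return {vertex: number of edges on the shortest path from source}."""
    levels = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for (a, b), r in residual.items():
            if a == u and r > 0 and b not in levels:
                levels[b] = levels[u] + 1
                queue.append(b)
    return levels

# Hypothetical residual capacities.
residual = {(1, 2): 3, (1, 3): 1, (2, 4): 1, (4, 2): 2, (3, 5): 1, (4, 5): 2}
levels = bfs_levels(residual, 1)
print(levels)  # {1: 0, 2: 1, 3: 1, 4: 2, 5: 2}
```

Vertices that cannot be reached from the source simply do not appear in the result.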

You may wonder why you need to learn about all these various types of networks, but they are used in
different algorithms that solve the maximum flow problem. Therefore, it is essential to know and
understand all the above-mentioned theory.

Additional examples
The example with the roads and vehicles is not the only application of the maximum flow problem.

For instance, you may come across it while engineering different networks (e.g. an Internet network).
Wires can pass only a limited number of bits per second. Therefore, it is crucial to make
the most efficient use of them and maximize the traffic. In this situation, methods used to solve the
maximum flow problem may come in handy.

Another example is the pipe system. Nowadays, multistory buildings are very common. In order to
provide all residents with a sufficient amount of water without creating too many pipes, the existing ones
should be used in the most efficient way. This is exactly the issue the maximum flow problem can
address! It may be applied to electricity, gas, and other similar systems in the same way.

There are many more applications of the maximum flow problem:

• Baseball elimination. Sometimes sports analysts want to find out whether a certain team can still
finish with the most wins compared to the other ones. One approach to solving this is reducing it
to a maximum flow problem.

• Airline scheduling. Airline companies find it crucial to create the most efficient schedule for
flights. One popular solution is to formulate this task as a maximum flow problem. Thanks to
such an approach, you can easily maximize the number of flights the given number of crews
can perform or, vice versa, find the minimum number of crews needed to perform a certain
number of flights.

• Image segmentation. It is a common task nowadays to try to distinguish the foreground and the
background of a picture. In order to do so, you can break it down into pixels and create a flow
network. In this network, each node is a pixel and it is connected to the neighboring pixels.
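Additionally, there are a source and a sink. At the beginning of the algorithm, you assign the
following values: the likelihood of pixel i being a part of the foreground or the background ($f_i$ and $b_i$,
respectively) and a penalty $p_{ij}$ for classifying two adjacent pixels i and j
differently. Then you maximize the following, where the first sum runs over the pixels labeled as
foreground, the second over the pixels labeled as background, and the third over the adjacent pairs
labeled differently:

$$\text{maximize} \quad \sum_{i} f_i + \sum_{i} b_i - \sum_{i,j} p_{ij}$$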
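To make the segmentation objective concrete, here is a small sketch that only evaluates it for a given labeling; it does not perform the maximization itself. The function name, the three-pixel strip, and all numbers are hypothetical.

```python
def segmentation_score(foreground, f, b, p):
    """Value of the objective for one labeling.

    foreground: set of pixels labeled as foreground (the rest is background)
    f, b:       dicts pixel -> likelihood of foreground / background
    p:          dict (i, j) -> penalty for adjacent pixels i and j
    """
    score = sum(f[i] for i in f if i in foreground)
    score += sum(b[i] for i in b if i not in foreground)
    # A penalty applies only when the two neighbours get different labels.
    score -= sum(pen for (i, j), pen in p.items()
                 if (i in foreground) != (j in foreground))
    return score

f = {0: 5, 1: 4, 2: 1}
b = {0: 1, 1: 2, 2: 6}
p = {(0, 1): 2, (1, 2): 3}

print(segmentation_score({0, 1}, f, b, p))  # 5 + 4 + 6 - 3 = 12
print(segmentation_score({0}, f, b, p))     # 5 + 2 + 6 - 2 = 11
```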

Solving the problem


As you can see from the previous section, the maximum flow problem is common in many fields. As a
result, there are plenty of algorithms that aim to solve it. Below you will see three of the most common
ones.

• Ford-Fulkerson algorithm. The idea behind this algorithm is the following: the initial flow
equals 0. Then you look for augmenting paths in the residual graph and increase the total flow
by the amount of flow that was pushed along each path. When there is no such path left, the
maximum flow has been found and the algorithm stops. The time complexity of this
algorithm is a bit unusual and equals $O(Ef)$, where E is the number of edges and f is the
value of the maximum flow (assuming integer capacities).

• Edmonds-Karp algorithm. Just like in the previous algorithm, the flow equals 0 in the
beginning. Then, on every new step, a residual graph is created. After that you look for the
shortest path from the source to the sink in this graph, and the total flow is increased by the
maximal flow that can be pushed along this path. These steps are repeated while a path from
the source to the sink can be found. As for time complexity, it is $O(VE^2)$, where V is the
number of vertices and E is the number of edges in the graph.

• Dinic's algorithm. In this algorithm, both residual and level graphs are used. To begin with, you
create a level graph. If the sink can be reached from the source in it, you
push as much flow as possible along paths that go from one level to the next, until no such
augmenting path is left. These two steps are repeated
while a path from the source to the sink exists. In general, time complexity equals $O(EV^2)$, but it
can be reduced to $O(VE \log V)$ if you use a data structure called dynamic trees.
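As an illustration of how level graphs and residual capacities work together, here is a compact sketch of Dinic's algorithm. It is one common way to write it, not the only one. The vertex numbering, the function name, and the test network are assumptions; in particular, the capacities 3 and 1 on the last two edges are made up.

```python
from collections import deque

def dinic_max_flow(n, edges, source, sink):
    """Maximum flow on vertices 0..n-1; `edges` is a list of (u, v, capacity)."""
    # Edge e and its reverse e ^ 1 are stored next to each other.
    to, residual = [], []
    graph = [[] for _ in range(n)]
    for u, v, c in edges:
        graph[u].append(len(to)); to.append(v); residual.append(c)
        graph[v].append(len(to)); to.append(u); residual.append(0)

    def build_levels():
        level = [-1] * n
        level[source] = 0
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for e in graph[u]:
                if residual[e] > 0 and level[to[e]] < 0:
                    level[to[e]] = level[u] + 1
                    queue.append(to[e])
        return level

    def push(u, limit, level, next_edge):
        if u == sink:
            return limit
        while next_edge[u] < len(graph[u]):
            e = graph[u][next_edge[u]]
            v = to[e]
            # Only follow edges that lead exactly one level deeper.
            if residual[e] > 0 and level[v] == level[u] + 1:
                pushed = push(v, min(limit, residual[e]), level, next_edge)
                if pushed > 0:
                    residual[e] -= pushed
                    residual[e ^ 1] += pushed
                    return pushed
            next_edge[u] += 1  # this edge is useless for the current level graph
        return 0

    total = 0
    while True:
        level = build_levels()
        if level[sink] < 0:  # the sink is unreachable: the flow is maximum
            return total
        next_edge = [0] * n
        while True:
            pushed = push(source, float("inf"), level, next_edge)
            if pushed == 0:
                break
            total += pushed

# Vertices A..E numbered 0..4; the last two capacities are hypothetical.
edges = [(0, 1, 7), (1, 2, 8), (2, 4, 5), (1, 4, 10), (0, 2, 4), (0, 3, 3), (3, 2, 1)]
print(dinic_max_flow(5, edges, 0, 4))  # 12
```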

Maximum flow of minimum cost


It is important to understand that there may be several maximum flows in a network.

In some cases, you can choose whichever you want. However, frequently there are some other values
that depend on the flow that you pick. For instance, regarding the example with the cars, you may
want to minimize the total length of all routes, as it will keep the amount of pollution produced by cars
as low as possible. The value that is minimized (e.g. price, time, pollution level) is assigned to each
edge and is called the cost of going through that edge.

For example, look at the different flows pushed through the same network:

As you can see, the total flow is the same. Additionally, it is the maximum flow possible for this
network. But let's have a look at the costs of going through the edges:

The total cost is computed by multiplying the flow by the cost and adding the numbers together. Now,
if you count the total costs for both cases, you will get 215 and 158, respectively. Therefore, if you had
to minimize the cost of the flow, you would definitely choose the second one.
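The cost computation itself is a single sum. The sketch below uses a tiny hypothetical network (not the one from the figures) in which two different flows have the same total value of 2 but different costs.

```python
def total_cost(flow, cost):
    """Sum of flow * cost over all edges that carry flow."""
    return sum(f * cost[edge] for edge, f in flow.items())

# Hypothetical per-unit costs of the edges.
cost = {("S", "A"): 1, ("A", "T"): 5, ("A", "B"): 1, ("B", "T"): 1}

# Two flows of value 2: one goes straight to T, the other takes a detour via B.
direct = {("S", "A"): 2, ("A", "T"): 2}
detour = {("S", "A"): 2, ("A", "B"): 2, ("B", "T"): 2}

print(total_cost(direct, cost))  # 2*1 + 2*5 = 12
print(total_cost(detour, cost))  # 2*1 + 2*1 + 2*1 = 6
```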

Conclusion
To sum up, let's refresh the main points about the maximum flow problem:

• The maximum flow problem is the problem of finding the largest possible total flow in a network.

• Residual capacity is the difference between the capacity of an edge and its flow. A residual
graph is a graph that has residual capacities assigned to its edges.

• The augmenting path goes from the source to the sink and consists only of the edges with the
positive residual capacity.

• A level graph is a graph whose nodes store the lengths of the shortest paths from the source to
these nodes.

• The Ford-Fulkerson, Edmonds-Karp, and Dinic's algorithms are the most common ones for
solving the maximum flow problem.

• A variation of the maximum flow problem where you also try to minimize the cost of the total flow
is called the minimum-cost maximum-flow problem.

Ford-Fulkerson Algorithm
By now, you are already familiar with the problem of finding the maximum flow in a network.
Moreover, you even know the gist of the most popular algorithms that solve it. However, sometimes in
order to use an algorithm successfully in various real-life tasks, a deeper understanding is needed. So,
why not dive into exploring the Ford-Fulkerson algorithm?

Algorithm
Before discussing the steps of the algorithm, let's remember the definitions of some important
structures that are used in it.

First off, a flow network is a directed graph, which has a source and a sink, and each edge of which
has flow and capacity. Secondly, a residual graph is used in this algorithm. It is a graph, each edge of
which has a residual capacity assigned to it. Don't forget that the residual capacity is the difference
between the capacity and the flow of an edge. Finally, an augmenting path is a path from the source
to the sink that consists only of edges with a positive residual capacity.

Now, it is time to move on to the algorithm. Do you recall from previous topics that the word flow can
have many meanings? For this very specific reason, this topic will use different symbols to denote
different types of flow.

Let $f(v, w)$ be the flow of the edge that connects vertices v and w. Additionally, $f_{aug}$ stands for the flow of
an augmenting path found on every iteration of the algorithm. The steps of the algorithm are as follows:

1. Initialize the flows of the edges with 0.

2. Find any augmenting path and determine its flow, $f_{aug}$. Let $x_1, x_2, \ldots, x_n$ be the residual
capacities of the edges this path consists of. Then $f_{aug} = \min(x_1, x_2, \ldots, x_n)$. Indeed,
the flow that you send through the augmenting path equals the minimal residual capacity of its
edges. It is pretty straightforward, isn't it?

3. Update the flows of the edges of the network. For every edge of the augmenting path that goes
from a node v to a node w:

$$f(v, w) = f(v, w) + f_{aug}$$

$$f(w, v) = f(w, v) - f_{aug}$$

4. Repeat steps 2 and 3 while there is an augmenting path in the residual graph of the network.
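The four steps can be sketched as follows. The sketch keeps the flow with skew symmetry, so the residual capacity of any pair (v, w) is simply capacity minus flow. Depth-first search is just one possible way to find a path, the function name is made up, and the capacities of AD and DC in the sample network are assumptions. The code also assumes integer capacities and a source that differs from the sink.

```python
from collections import defaultdict

def ford_fulkerson(capacity, source, sink):
    """Maximum flow; `capacity` maps (v, w) to a non-negative integer."""
    flow = defaultdict(int)                      # step 1: all flows are 0
    neighbours = defaultdict(set)
    for v, w in capacity:
        neighbours[v].add(w)
        neighbours[w].add(v)                     # the opposite direction may be used too

    def residual(v, w):
        return capacity.get((v, w), 0) - flow[(v, w)]

    def find_path(v, visited):
        """Return a list of edges from v to the sink, or None."""
        if v == sink:
            return []
        visited.add(v)
        for w in neighbours[v]:
            if w not in visited and residual(v, w) > 0:
                rest = find_path(w, visited)
                if rest is not None:
                    return [(v, w)] + rest
        return None

    max_flow = 0
    while True:
        path = find_path(source, set())          # step 2: any augmenting path
        if path is None:                         # step 4: no path left, stop
            return max_flow
        f_aug = min(residual(v, w) for v, w in path)
        for v, w in path:                        # step 3: update the flows
            flow[(v, w)] += f_aug
            flow[(w, v)] -= f_aug
        max_flow += f_aug

# Capacities of A-D and D-C are hypothetical.
capacity = {("A", "B"): 7, ("B", "C"): 8, ("C", "E"): 5, ("B", "E"): 10,
            ("A", "C"): 4, ("A", "D"): 3, ("D", "C"): 1}
print(ford_fulkerson(capacity, "A", "E"))  # 12
```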

That's it! Doesn't sound too difficult, does it?

However, as you may have noticed, there is one detail that the algorithm lacks: it doesn't tell you how
exactly you should find an augmenting path. This is why it is sometimes called the Ford-Fulkerson
method rather than the Ford-Fulkerson algorithm. Just as any other method, it has some implementations, one
of which is the Edmonds-Karp algorithm.

This algorithm is basically the same as the Ford-Fulkerson one. There is only one difference: in the
Edmonds-Karp algorithm, you push the flow through the augmenting path that is the shortest and not
just through any one. In this case, the shortest means consisting of the smallest number of edges. For
example, imagine that you have the following residual graph:

It has three augmenting paths: A-B-C-D, A-B-D, and A-D. If you use the Ford-Fulkerson algorithm,
you may choose any one of them. In the Edmonds-Karp algorithm, however, you would choose A-D, as
it is the shortest one.
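Below is a sketch of the Edmonds-Karp variant, where breadth-first search guarantees that the chosen augmenting path has the fewest edges. The function name is made up, and the capacities of the graph with the paths A-B-C-D, A-B-D, and A-D are assumptions, since the figure's numbers are not available here.

```python
from collections import defaultdict, deque

def edmonds_karp(capacity, source, sink):
    """Return (max_flow, paths), where `paths` lists the augmenting paths
    in the order in which they were used."""
    residual = defaultdict(int)
    neighbours = defaultdict(list)
    for (u, v), c in capacity.items():
        residual[(u, v)] += c
        neighbours[u].append(v)
        neighbours[v].append(u)  # reverse edge, residual capacity 0 for now

    max_flow, paths = 0, []
    while True:
        # Breadth-first search finds a shortest augmenting path.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v in neighbours[u]:
                if v not in parent and residual[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return max_flow, paths

        path = [sink]
        while parent[path[-1]] is not None:
            path.append(parent[path[-1]])
        path.reverse()

        f_aug = min(residual[(u, v)] for u, v in zip(path, path[1:]))
        for u, v in zip(path, path[1:]):
            residual[(u, v)] -= f_aug
            residual[(v, u)] += f_aug
        max_flow += f_aug
        paths.append(path)

# Hypothetical capacities.
capacity = {("A", "B"): 3, ("B", "C"): 2, ("C", "D"): 2, ("B", "D"): 1, ("A", "D"): 4}
max_flow, paths = edmonds_karp(capacity, "A", "D")
print(paths[0])   # ['A', 'D'], the shortest path is used first
print(max_flow)   # 7
```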

Example
Now, let's look at how the algorithm would work for the following network. The numbers near each
edge are capacities.

As mentioned above, the Ford-Fulkerson algorithm doesn't determine the exact way you should select
each augmenting path. Therefore, if you try to find the maximum flow yourself, don't get scared if the
flows of the edges differ from the ones in the example at some point. The main thing here is to find the
correct total flow of the network.

After each step, you will see a picture of the network on the left and the residual graph on the right. In
the beginning of the algorithm, the flow of each edge in the network equals 0.

So, let's choose the first augmenting path. Let it be A-B-C-E. It goes through the edges AB, BC, and
CE with residual capacities 7, 8, and 5, respectively. Therefore, the maximum flow you can send
through it is 5, as it is the smallest residual capacity among them.

Now that the flows of the edges are updated accordingly, it's time to check if there are any more
augmenting paths. Indeed, there are several of them: A-B-E, A-C-B-E, A-D-C-B-E. Let's look at
A-B-E. The flow sent along the path this time is the following:

$$f_{ABE} = \min(2, 10) = 2$$

You can see that no more flow may be pushed through the edges AB and CE. You wouldn't be able to
find any other path if you looked only at the network itself. However, there is a residual graph and you
can still find some augmenting paths in it. Let's choose the path A-C-B-E. The flow equals
min(4, 5, 8) = 4.

There is only one augmenting path left in the residual graph, so there is nothing to choose from. Let's
push the flow of 1 along the path A-D-C-B-E.

Hooray! No augmenting paths are left, which means that you've finally found the maximum flow of
the network, which equals 5 + 2 + 4 + 1 = 12.
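The walkthrough can be replayed in code by pushing flow along the same four paths in the same order. The capacities of AB, BC, CE, BE, and AC follow from the numbers in the text; the capacities 3 and 1 for AD and DC are assumptions, because the figure is not available here.

```python
# Capacities keyed by (from, to). A-D and D-C are hypothetical values.
capacity = {("A", "B"): 7, ("B", "C"): 8, ("C", "E"): 5, ("B", "E"): 10,
            ("A", "C"): 4, ("A", "D"): 3, ("D", "C"): 1}
residual = dict(capacity)  # all flows are 0 at the start

def push(path):
    """Push as much flow as possible along `path` and return the amount."""
    edges = list(zip(path, path[1:]))
    f_aug = min(residual.get(e, 0) for e in edges)
    for u, v in edges:
        residual[(u, v)] -= f_aug
        residual[(v, u)] = residual.get((v, u), 0) + f_aug
    return f_aug

steps = ["ABCE", "ABE", "ACBE", "ADCBE"]
pushed = [push(path) for path in steps]

print(pushed)       # [5, 2, 4, 1]
print(sum(pushed))  # 12
```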

Complexity and correctness
Given that f is the value of the maximum flow and E is the number of edges in a network, the complexity of the
algorithm is $O(fE)$, which may look a bit unusual. Why is it like this? Imagine that you have a network
that looks like this:

The worst-case scenario for it could be the following: first you find an augmenting path A-B-C-D-E,
and then the path A-D-C-B-E.

As you can see, on each step you find an augmenting path with a flow of 1. You repeat these steps over
and over, leading to f iterations in total. Finding an augmenting path takes $O(E)$, therefore
the complexity of the whole algorithm is $O(fE)$. As for the Edmonds-Karp algorithm, its complexity
doesn't depend on the maximum flow and equals $O(VE^2)$, where V is the number of vertices and E is
the number of edges.
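The worst case can be simulated by forcing the two paths to alternate. The network below is an assumption: it is one network on which A-B-C-D-E and A-D-C-B-E both carry a flow of 1 each time, and the value M = 100 is arbitrary.

```python
M = 100  # hypothetical capacity of the "wide" edges
capacity = {("A", "B"): M, ("B", "C"): 1, ("C", "D"): 1,
            ("D", "E"): M, ("A", "D"): M, ("B", "E"): M}
residual = dict(capacity)

def push(path):
    """Push as much flow as possible along `path` and return the amount."""
    edges = list(zip(path, path[1:]))
    f_aug = min(residual.get(e, 0) for e in edges)
    if f_aug > 0:
        for u, v in edges:
            residual[(u, v)] -= f_aug
            residual[(v, u)] = residual.get((v, u), 0) + f_aug
    return f_aug

iterations = 0
total = 0
while True:
    # Adversarial choice: alternate between the two long paths.
    path = "ABCDE" if iterations % 2 == 0 else "ADCBE"
    f_aug = push(path)
    if f_aug == 0:
        break
    total += f_aug
    iterations += 1

print(total)       # 200, the maximum flow
print(iterations)  # 200, one iteration per unit of flow
```

A shortest-path rule would pick A-B-E and A-D-E instead, and two augmentations would be enough for this network.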

You already know how the algorithm works and what its complexity is. However, have you wondered
why the algorithm works at all? In other words, why is the result found at the end of the algorithm the
maximum flow possible?

The first thing you need to understand is whether the algorithm terminates at some point. With integer
capacities it does: every augmenting path increases the total flow by at least 1, and the capacities of
the edges are finite, so the flow that is pushed through them can't be increased forever.

The next question: why is the total flow of the network that the algorithm finds actually the biggest one
possible? Suppose it is not. Recall that a flow is maximum if and only if there is no augmenting path
in the residual graph. Consequently, an augmenting path can still be found. However, in this case, the
algorithm would still be running, because, as you remember, it stops only when there are no more
augmenting paths from the source to the sink. Therefore, our assumption is wrong, and the flow found
by the Ford-Fulkerson algorithm is maximum for the network.

Fun fact: the Ford-Fulkerson algorithm may not terminate if some of the capacities are irrational numbers.

Conclusion
To sum up, let's go through the main points of the topic:

• The Ford-Fulkerson algorithm is used to solve the maximum flow problem.

• The idea behind the algorithm is the following: you find the augmenting paths in a residual
graph and sum the flows of these paths. When there are no such paths left, the algorithm stops.

• As the way of choosing augmenting paths is not specified, the Ford-Fulkerson algorithm is
sometimes called the Ford-Fulkerson method. The Edmonds-Karp algorithm is an
implementation of it.

• The complexity of the Ford-Fulkerson algorithm is $O(fE)$, where f is the value of the maximum flow and E is
the number of edges in a network.

• The complexity of the Edmonds-Karp algorithm is $O(VE^2)$, where V is the number of vertices
and E is the number of edges.

• For some networks with irrational capacities, the Ford-Fulkerson algorithm may not
terminate.
