Big Homework 2

Data Structures and Algorithms


Graphs & Binary Trees

General mentions
• For this project you will work either in teams of 2 people or alone.
• The homework will be uploaded on the Moodle platform under the Hw2 assignment. If you
encounter problems with the platform, contact your lab assistant by email or Teams.
• The homework must be submitted by the 27th of May, at 8:00. No late submissions will be
accepted.
• You will be asked details about your solutions during the following lab. Projects not
presented at the following lab won’t be graded! Pay attention, it is the last week of the
semester, so there are no other occasions to present it!
• The final submission will contain an archive named
Student1FamilyName_Student1Name_Student2FamilyName_Student2Name_HW2 with:
· the source files of your project (.cpp and .h), grouped in separate folders for each
exercise (do not include the object files (.o), the Code::Blocks project files (.cbp) or the
executables (.exe))
· a README file in which you will briefly specify all the functional sections of the
project, together with instructions for the user; additionally, if you have parts of the
homework that don’t work, you may offer solution ideas for a partial score on these
sections. You will have to explain your propositions at the presentation lab.
• For all questions regarding the project, communicate on Microsoft Teams with your lab
assistant (we recommend creating a new post on the Teams group so that everyone can see
the question & the answer). Don’t forget to mention @General (to notify everyone) or
@TeacherName.
• Warning: we will use plagiarism detection software on your submissions (Stanford’s tool
Moss). Copied homework will be marked with 0 points.
! Observation: You can use the data structures and other exercises that we used in class.
Alternatively, you can use standard C++ implementations, BUT not other custom implementations
from the internet.

1. Network Connectivity Checker (1p)

You are given a network of devices connected via network cables. Each device is represented by a
unique integer identifier. Some devices are connected directly by cables, while others are connected
indirectly through intermediate devices. Your task is to write a function to determine whether all
devices in the network are connected, i.e., whether there exists a path between every pair of devices.
The function should take the following inputs:
1. An integer N (1 ≤ N ≤ 1000), representing the total number of devices in the network.
2. A list of tuples, where each tuple contains two integers u and v (1 ≤ u, v ≤ N, u != v ),
representing a direct connection between device u and device v.
The function should return a boolean value: true if all devices in the network are connected, and
false otherwise.
Note:
- A direct connection between devices u and v means that there is a network cable directly
connecting them.
- Indirect connections are formed through a series of direct connections. For example, if device a is
connected to device b, and device b is connected to device c, then device a is indirectly connected to
device c.
- The network may contain isolated devices (devices not connected to any other device).
Example:
Input:
N=5
connections = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 1)]
Output:
true
In this example, all devices are directly or indirectly connected to each other, forming a closed loop.
Therefore, the function should return true.
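
One possible way to approach this check is a breadth-first traversal that starts from device 1 and counts how many devices it reaches. The sketch below is only an illustration under stated assumptions: the function name allConnected, the use of std::vector and std::pair for the input, and the choice to throw an exception on invalid input are not prescribed by the assignment.

```cpp
#include <cassert>
#include <queue>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical helper: returns true when every device 1..n is reachable from device 1.
bool allConnected(int n, const std::vector<std::pair<int, int>>& connections) {
    if (n < 1) throw std::invalid_argument("N must be at least 1");

    // Adjacency lists; index 0 stays unused because devices are numbered from 1.
    std::vector<std::vector<int>> adj(n + 1);
    for (const auto& link : connections) {
        int u = link.first, v = link.second;
        if (u < 1 || u > n || v < 1 || v > n || u == v)
            throw std::invalid_argument("invalid connection");
        adj[u].push_back(v);  // cables work in both directions
        adj[v].push_back(u);
    }

    std::vector<bool> visited(n + 1, false);
    std::queue<int> frontier;
    frontier.push(1);
    visited[1] = true;
    int reached = 1;

    while (!frontier.empty()) {
        int current = frontier.front();
        frontier.pop();
        for (int next : adj[current]) {
            if (!visited[next]) {
                visited[next] = true;
                ++reached;
                frontier.push(next);
            }
        }
    }
    assert(reached <= n);
    return reached == n;  // an isolated device is never reached, so the result is false
}
```

With this sketch, a network with N = 1 and no connections counts as connected, which is one reasonable reading of the corner case.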

0.25p reading the input + constructing the graph


0.75p algorithm & correct output
-0.25p if not handling errors / corner cases

2. Binary trees (5p)

Andrei and Sebi want to exchange text messages. To make things more interesting, they chose to
send their messages encoded as bits and decode them upon receipt. Instead of using UTF-8
encoding, which could lead them to using 32 bits for a single character, they started implementing a
different way, using the number of occurrences of each character within the text.
Encoding a message is the process of converting it into a different form using a code. The code can
be any other representation of the data, as long as there is a one-to-one correspondence between
input and output. Let’s look at the following example, where we encode the first characters of the
alphabet using 3 bits:

a b c d e f
001 010 011 100 101 110

The bit representations of each letter are called codewords.


Now, the message afbbce becomes 001110010010011101. This is how we encoded our message. To
decode it, we use the same table, but reverse the process. We take groups of 3 consecutive bits and
find the letter associated with each one.

Andrei and Sebi thought of doing something smarter and tried to use a dynamic-length encoding
based on the number of occurrences of a character within the text. They want to use prefix codes,
for which no codeword is also a prefix of some other codeword. For example, we can encode a as
0, b as 101 and c as 100. The message abc becomes 0101100. As you can observe, a (code 0) is not
a prefix of the other 2 codes (101 or 100); the same holds for b and c. They managed to use fewer
bits than with fixed-length codewords, while also preserving the meaning of the message.
To use this, they need a new, modified binary tree. Its leaves should be nodes containing each
character and its number of occurrences within the text, sorted by the number of occurrences. The
codeword for a character is calculated by traversing the path starting from the root of the tree down
to the leaf containing that character, where going to the left child adds a 0 to the resulting codeword,
while going to the right child adds a 1.
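
To see why prefix codes can be decoded without separators, the small sketch below encodes and decodes with the example table from above (a as 0, b as 101, c as 100). It deliberately uses a std::map lookup instead of the encoding tree, so it illustrates the idea only and is not the required implementation; the names exampleCode, encodeWith and decodeWith are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <string>

// Codeword table taken from the example in the text.
std::map<char, std::string> exampleCode() {
    return {{'a', "0"}, {'b', "101"}, {'c', "100"}};
}

// Concatenates the codeword of every character of the text.
std::string encodeWith(const std::map<char, std::string>& code, const std::string& text) {
    std::string bits;
    for (char ch : text) bits += code.at(ch);  // at() throws for unknown characters
    return bits;
}

// Reads bits one by one; because no codeword is a prefix of another,
// the first match is always the right one.
std::string decodeWith(const std::map<char, std::string>& code, const std::string& bits) {
    std::map<std::string, char> reverse;
    for (const auto& entry : code) reverse[entry.second] = entry.first;

    std::string text, pending;
    for (char bit : bits) {
        pending += bit;
        auto hit = reverse.find(pending);
        if (hit != reverse.end()) {
            text += hit->second;
            pending.clear();
        }
    }
    assert(pending.empty());  // leftover bits would mean a malformed input
    return text;
}
```

Calling encodeWith(exampleCode(), "abc") gives 0101100, and decodeWith on that string gives abc back.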

1. The first step would be to create a BST which stores, in each node, a character and its number of
occurrences. We would also want to modify it by adding a new method, called removeMin(), which
removes and returns the smallest value within that BST.
2. The next part would be creating a second tree: the modified binary tree (called encoding tree).
We start with the two least frequent characters in the text and connect them as leaves under the
same parent node containing the sum of their occurrences. The left child will be the one of the two
having fewer occurrences. The new node, containing the sum of the two characters’ occurrences,
will be added among the rest of the leaves in the BST storing characters and occurrences. We then
repeat the process by selecting, from the BST, the next two nodes containing the least number of
occurrences.
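
For step 1, a removeMin() method could look like the sketch below. It assumes a BST ordered by the number of occurrences, with equal counts going to the right; the names FreqNode and FreqBST and the pair return type are hypothetical, and the BST used in class may be organised differently.

```cpp
#include <cassert>
#include <utility>

struct FreqNode {
    char symbol;
    int count;
    FreqNode* left;
    FreqNode* right;
    FreqNode(char s, int c) : symbol(s), count(c), left(nullptr), right(nullptr) {}
};

class FreqBST {
public:
    FreqBST() = default;
    FreqBST(const FreqBST&) = delete;             // raw pointers: forbid accidental copies
    FreqBST& operator=(const FreqBST&) = delete;
    ~FreqBST() { destroy(root_); }

    bool empty() const { return root_ == nullptr; }

    // Ordered by count; equal counts go to the right subtree.
    void insert(char symbol, int count) {
        FreqNode** slot = &root_;
        while (*slot != nullptr)
            slot = (count < (*slot)->count) ? &(*slot)->left : &(*slot)->right;
        *slot = new FreqNode(symbol, count);
    }

    // Removes the leftmost node (smallest count) and returns (symbol, count).
    std::pair<char, int> removeMin() {
        assert(root_ != nullptr);
        FreqNode** slot = &root_;
        while ((*slot)->left != nullptr) slot = &(*slot)->left;
        FreqNode* minNode = *slot;
        *slot = minNode->right;  // the leftmost node has no left child to preserve
        std::pair<char, int> result(minNode->symbol, minNode->count);
        delete minNode;
        return result;
    }

private:
    FreqNode* root_ = nullptr;

    static void destroy(FreqNode* node) {
        if (node == nullptr) return;
        destroy(node->left);
        destroy(node->right);
        delete node;
    }
};
```

For the encoding tree, the stored value would probably need to be a pointer to a tree node rather than a plain character, so that merged parent nodes can be put back into the BST.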
The following picture shows how the encoding tree is built for a given list of characters and their
occurrences within some text:

As you can see, the first selected leaves correspond to ‘f’ and ‘e’, which appear 5 and 9 times,
respectively. We then create their parent node, containing the sum of their occurrences. The process
is then repeated, the parent node now replacing the two previously selected leaves in the collection
from which we choose.
In this example, the resulting codeword for a is 0, for b it is 101, for c it is 100 and so on. We can
easily decode the text 111100001011100 as dcaabf because of the prefix codes (remember: no
codeword is a prefix of another codeword).
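
The merging process can be sketched as follows. Two things here are assumptions and not part of the assignment: std::priority_queue is used as a stand-in for the required BST with removeMin(), and the names CodeNode, buildTree and collectCodes are hypothetical. Treat it as a way to check your paper calculations, not as the expected solution.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <queue>
#include <string>
#include <utility>
#include <vector>

struct CodeNode {
    int count = 0;
    char symbol = '\0';  // meaningful only for leaves
    std::shared_ptr<CodeNode> left;
    std::shared_ptr<CodeNode> right;
};

// Comparator that turns std::priority_queue into a min-heap on count.
struct HigherCount {
    bool operator()(const std::shared_ptr<CodeNode>& a,
                    const std::shared_ptr<CodeNode>& b) const {
        return a->count > b->count;
    }
};

std::shared_ptr<CodeNode> buildTree(const std::vector<std::pair<char, int>>& freqs) {
    assert(!freqs.empty());
    std::priority_queue<std::shared_ptr<CodeNode>,
                        std::vector<std::shared_ptr<CodeNode>>, HigherCount> pending;
    for (const auto& f : freqs) {
        auto leaf = std::make_shared<CodeNode>();
        leaf->symbol = f.first;
        leaf->count = f.second;
        pending.push(leaf);
    }
    while (pending.size() > 1) {
        auto smaller = pending.top(); pending.pop();  // fewest occurrences: left child
        auto larger = pending.top(); pending.pop();   // next fewest: right child
        auto parent = std::make_shared<CodeNode>();
        parent->count = smaller->count + larger->count;
        parent->left = smaller;
        parent->right = larger;
        pending.push(parent);  // the parent competes with the remaining nodes
    }
    return pending.top();
}

// Left edge appends 0, right edge appends 1; codewords are read at the leaves.
void collectCodes(const std::shared_ptr<CodeNode>& node, const std::string& prefix,
                  std::map<char, std::string>& out) {
    if (!node->left && !node->right) {
        out[node->symbol] = prefix.empty() ? "0" : prefix;  // single-symbol text
        return;
    }
    collectCodes(node->left, prefix + "0", out);
    collectCodes(node->right, prefix + "1", out);
}
```

How ties between equal counts are broken is not specified in the text, so a different tie-breaking rule may produce different (but equally valid) codewords.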

Having defined the logic, the two friends started planning the implementation. They would need a
class called EncodingTree which would represent their new data structure. The constructor of this
class receives as parameter the BST mentioned in the first step and creates the resulting binary tree
(You can represent the nodes in the new binary tree however you want to).

The class would also contain the method char* encode(char* text) which receives as parameter a
text and returns its encoded form using the encoding tree and a method char* decode(char* code)
which receives the encoded text and returns its decoded form.
Your task is to help Sebi and Andrei with their implementation!

Task details:
1. The message to be sent before encoding will be read from a file called message.in, while the
encoded message will be written to a file called encoded.out.
2. The encoded message received will be read from a file called encoded.in, while the decoded
message obtained will be written to a file called message.out.
3. For simplicity, the texts will contain only characters of the alphabet, digits and the space
character.
4. Apart from the mentioned data structures, you can use any other data structures you need,
but you may not replace one of the data structures required by the problem with another one
(this means that you must use the BST and the EncodingTree as they were described).
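
Before filling the BST, the occurrences of each character have to be counted. A minimal sketch, assuming the message has already been read into a std::string (for example from message.in) and that unsupported characters should be rejected; the function name countOccurrences and the choice of std::map as a temporary container are hypothetical.

```cpp
#include <cctype>
#include <map>
#include <stdexcept>
#include <string>

// Counts how many times each allowed character (letter, digit or space) occurs.
std::map<char, int> countOccurrences(const std::string& text) {
    std::map<char, int> counts;
    for (char ch : text) {
        bool allowed = std::isalnum(static_cast<unsigned char>(ch)) || ch == ' ';
        if (!allowed) throw std::invalid_argument("unsupported character in message");
        ++counts[ch];  // operator[] creates the entry with 0 on first use
    }
    return counts;
}
```

Each (character, count) pair can then be inserted into the BST. If the input file ends with a newline, it needs to be stripped before calling this function.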

Requirements:
1. Implement the removeMin() method for BST. (0.5p)
2. Obtain the BST whose nodes store the characters and their occurrences from a given input
text. (1p)
3. Implement the constructor for the EncodingTree class. (1p)
4. Implement the encode() method. (1p)
5. Implement the decode() method. (1p)
6. Test your methods by reading the input and writing the output to the specified files (see the
Task details section). (0.5p)

Advice and tips:


1. Before starting the implementation, try to follow the steps manually, on paper, and
understand the intended behavior. Then, try to apply the algorithm to different texts.
2. Follow the implementation steps as they are listed in the Requirements section: start with
the BST, then continue with the constructor, then the encoding and decoding methods.

3. Image Segmentation (4p)

You are working at a big software company, Abode, that creates photo editing programs. Due to the
high demand from the users to have an automatic segmentation tool, your new task is to create such
a tool. The main idea that comes to your mind is to use deep learning, but your team lead tells you
that “It will cost us more time and energy to develop that rather than use graphs”.

You’re intrigued, so you start searching on the web and find that images can be processed as graphs
such that each pixel is a node in the graph. More than that, you come up with a brilliant idea: the
pixels' brightness can be used as weights in segmenting the image. To find a way to cut the picture,
you’ll need a way to pass information about the brightness from one starting node to one final node,
so you add two extra nodes which are connected directly to all of the other nodes.

Moreover, you see that there is no need to pass information between nodes that are not directly
connected. So, you only connect 4-neighbor nodes, with the weight given by the following rule:
if the difference between two neighboring pixels is less than a threshold, the weight is
255 - difference; otherwise it is 0.
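
Taking the sentence above literally, the rule can be written as a small function. This is only a sketch: the name edgeWeight is hypothetical, and if your lab assistant confirms the opposite convention, the two branches simply have to be swapped.

```cpp
#include <cassert>
#include <cstdlib>

// Weight of the edge between two neighboring pixels, as worded in the rule above:
// 255 - difference when the difference is below the threshold, otherwise 0.
int edgeWeight(int pixelA, int pixelB, int threshold) {
    assert(pixelA >= 0 && pixelA <= 255 && pixelB >= 0 && pixelB <= 255);
    int difference = std::abs(pixelA - pixelB);
    return (difference < threshold) ? 255 - difference : 0;
}
```

Note that a difference exactly equal to the threshold is treated as "not less than", so it yields 0 in this sketch.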

By observing the values passed you see that you want to find the final maximum value passed such
that you can minimize the cuts for the graph and segment a part of the image (for simplicity we will
focus only on the foreground encoded as a minimal value in the picture).

Let's consider the following matrix of 3x3:


70 14 30
50 10 15
34 53 78
The pixel with value = 10 has the following 4-neighbors {14,50,53,15}
To assign the weights for the pixel =10 to the other neighbors, we will consider the given formula
so that we will have:
Pixel = 10, 1-neighbor pixel = 14 , the difference between them is |10-14| < 10 so the weight = 0
between vertex with pixel 10 and vertex with pixel 14
Pixel = 10, 2-neighbor pixel = 15, the difference between them |10-15| < 10 - weight = 0
Pixel = 10, 3-neighbor pixel = 53, the difference between them |10-53| > 10 - weight = 255 - |10-53|
Pixel = 10, 4-neighbor pixel = 50, the difference between them |10-50| > 10 - weight = 255 - |10-50|

For the pixels on the edge of the photo we will have only the number of existing neighbors. An
example is:
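
In code, keeping only the existing neighbors amounts to a bounds check. A minimal sketch, assuming pixels are addressed by zero-based (row, column) positions; the function name neighbours4 and the order up, down, left, right are arbitrary choices.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Returns the (row, col) positions of the 4-neighbors that actually exist
// for a pixel in a rows x cols image, in the order up, down, left, right.
std::vector<std::pair<int, int>> neighbours4(int row, int col, int rows, int cols) {
    assert(row >= 0 && row < rows && col >= 0 && col < cols);
    static const int dRow[4] = {-1, 1, 0, 0};
    static const int dCol[4] = {0, 0, -1, 1};

    std::vector<std::pair<int, int>> result;
    for (int k = 0; k < 4; ++k) {
        int r = row + dRow[k];
        int c = col + dCol[k];
        if (r >= 0 && r < rows && c >= 0 && c < cols) result.emplace_back(r, c);
    }
    return result;
}
```

For the 3x3 matrix above, the center pixel gets 4 neighbors, a corner pixel gets 2 and any other border pixel gets 3.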

To find the maximum value that can be passed through the graph (the max flow and, as a by-product,
the min cut), we need to compute augmenting paths through the residual graph and augment the flow.
An augmenting path is a path of edges in the residual graph (which initially is the original
graph) from the source (s) to the sink (t) in which every edge has unused capacity greater than zero.
Every augmenting path has a bottleneck, which is the smallest unused capacity of an edge on the path.
We use the bottleneck value to augment the flow along the path (e.g. add weight = 157 to the flow):
the flow is added along the edges of the path and subtracted along their reverse residual edges. This
creates new valid edges in the reverse direction, which can be used to find further augmenting paths
(eliminating some paths that do not contribute to achieving maximum flow).

The residual graph is the one that contains residual edges, not just the original edges. On the edges
with weights / capacity of 0 there will be no flow.
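
The augmenting-path procedure can be sketched on an adjacency matrix as below. The sketch picks augmenting paths with a breadth-first search, which is the Edmonds-Karp variant of this method; the text does not say which search strategy to use, so that choice, together with the names Matrix, findAugmentingPath and maxFlow, is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <climits>
#include <queue>
#include <vector>

using Matrix = std::vector<std::vector<int>>;

// Breadth-first search for a source-to-sink path whose edges all have
// residual capacity > 0. Fills parent[] and returns true if a path exists.
bool findAugmentingPath(const Matrix& residual, int source, int sink,
                        std::vector<int>& parent) {
    std::fill(parent.begin(), parent.end(), -1);
    parent[source] = source;
    std::queue<int> frontier;
    frontier.push(source);
    while (!frontier.empty()) {
        int u = frontier.front();
        frontier.pop();
        for (int v = 0; v < static_cast<int>(residual.size()); ++v) {
            if (parent[v] == -1 && residual[u][v] > 0) {
                parent[v] = u;
                if (v == sink) return true;
                frontier.push(v);
            }
        }
    }
    return false;
}

// `residual` starts as the capacity matrix and is updated in place,
// so after the call it holds the final residual graph.
int maxFlow(Matrix& residual, int source, int sink) {
    assert(source != sink);
    std::vector<int> parent(residual.size());
    int total = 0;
    while (findAugmentingPath(residual, source, sink, parent)) {
        // The bottleneck is the smallest residual capacity on the path.
        int bottleneck = INT_MAX;
        for (int v = sink; v != source; v = parent[v])
            bottleneck = std::min(bottleneck, residual[parent[v]][v]);
        // Use up capacity along the path and open it on the reverse edges.
        for (int v = sink; v != source; v = parent[v]) {
            residual[parent[v]][v] -= bottleneck;
            residual[v][parent[v]] += bottleneck;
        }
        total += bottleneck;
    }
    return total;
}
```

Keep a copy of the original capacity matrix if you need it later, because this sketch overwrites its argument with the residual graph.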
The cut represents partitioning the graph into two disjoint subsets (the union of the two sets of
vertices is the original graph, but their intersection is empty), keeping the start node (S) and the
sink node (T) in different subsets. In this case we observe that flow is maximized through the nodes
78 and 21. We also need to separate the start node from the sink node, so the edges that will be cut
are 20-78, 78-21 and 21-sink, leaving all the nodes that are part of the foreground of the
picture.
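
A common textbook way to read a minimum cut off the residual graph is to mark every vertex still reachable from the source through edges with residual capacity greater than zero; the cut edges are the original edges that leave this reachable set. The sketch below shows that idea only. The cutting procedure expected in this homework is the one described in the explanation of Example 1 and may differ from it; the names markReachable and minCutEdges are hypothetical.

```cpp
#include <cassert>
#include <utility>
#include <vector>

using Matrix = std::vector<std::vector<int>>;

// Depth-first marking of the vertices reachable through residual capacity > 0.
void markReachable(const Matrix& residual, int u, std::vector<bool>& reachable) {
    reachable[u] = true;
    for (int v = 0; v < static_cast<int>(residual.size()); ++v)
        if (!reachable[v] && residual[u][v] > 0) markReachable(residual, v, reachable);
}

// Returns the original edges (u, v) that go from the source side to the sink side.
std::vector<std::pair<int, int>> minCutEdges(const Matrix& capacity,
                                             const Matrix& residual, int source) {
    assert(capacity.size() == residual.size());
    const int n = static_cast<int>(capacity.size());
    std::vector<bool> reachable(n, false);
    markReachable(residual, source, reachable);

    std::vector<std::pair<int, int>> cut;
    for (int u = 0; u < n; ++u)
        for (int v = 0; v < n; ++v)
            if (reachable[u] && !reachable[v] && capacity[u][v] > 0)
                cut.emplace_back(u, v);
    return cut;
}
```

The residual matrix passed in is assumed to be the one obtained after the maximum flow has been computed.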
Your team lead tells you to start working on the PoC (proof of concept) but with some limitations in
mind:
● The pictures will be of size 6x6.
● The specified threshold will be 10.
● The weight for the start node to the rest of the nodes will be 255.
● The weight for the end node to the rest of the nodes will be 255.

Task details:
● Implement the graph for the image (0.75p)
● Implement the algorithm for finding the maximum flow (1p)
● Implement the algorithm for cutting the graph (1.5p)
● Traverse the graph (0.5p)
● Generate image from the remaining graph (0.25p)

Example of input/output:
Example - 1
Initial Photo:
230 128 255 128 255 230
1 255 1 230 204 255
1 230 1 255 100 255
1 1 1 255 133 255
1 255 1 1 255 255
255 128 255 128 230 255
Explanation:
First, we create a graph whose number of nodes equals the number of pixels in the image (the
matrix size[0] * size[1] = 36 nodes) plus two additional nodes that we will use as source and sink
nodes. The source and sink nodes are connected to all the other nodes with a capacity of 255.
This results in an adjMatrix for the graph as follows:
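
One way to build such a matrix is sketched below. Several details are assumptions, because the text does not fix them: pixel (r, c) is mapped to node r * cols + c, the source and the sink take the last two indices, the source and sink edges are directed (source to pixel, pixel to sink), and the pixel-to-pixel weight rule is passed in as a function. The name buildImageGraph is hypothetical.

```cpp
#include <cassert>
#include <functional>
#include <vector>

using Matrix = std::vector<std::vector<int>>;

// Builds the capacity matrix for an image: one node per pixel plus source and sink.
Matrix buildImageGraph(const Matrix& image, const std::function<int(int, int)>& weight) {
    assert(!image.empty() && !image[0].empty());
    const int rows = static_cast<int>(image.size());
    const int cols = static_cast<int>(image[0].size());
    const int pixels = rows * cols;
    const int source = pixels;    // assumed index of the source node
    const int sink = pixels + 1;  // assumed index of the sink node

    Matrix capacity(pixels + 2, std::vector<int>(pixels + 2, 0));
    for (int r = 0; r < rows; ++r) {
        for (int c = 0; c < cols; ++c) {
            const int node = r * cols + c;
            capacity[source][node] = 255;
            capacity[node][sink] = 255;
            if (c + 1 < cols) {  // right neighbor, stored in both directions
                const int w = weight(image[r][c], image[r][c + 1]);
                capacity[node][node + 1] = w;
                capacity[node + 1][node] = w;
            }
            if (r + 1 < rows) {  // neighbor below, stored in both directions
                const int w = weight(image[r][c], image[r + 1][c]);
                capacity[node][node + cols] = w;
                capacity[node + cols][node] = w;
            }
        }
    }
    return capacity;
}
```

For a 6x6 picture this gives a 38x38 matrix. Visiting only the right and lower neighbor of each pixel is enough, because every 4-neighbor pair is then covered exactly once.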

Afterwards we compute the maximum flow and we get the residual graph which will be the same
size as before.
Residual graph after computing the max flow:

We observe that the residual graph contains multiple values. For every value that is not 0 and is
less than the maximum value, we cut the edges of the node. In this case the maximum is 230, because
we don’t count the sink and the source nodes, which are not part of the image but are helper nodes.
We chose the maximum value because we want just a binary problem [foreground vs background]; by
choosing each value in turn we could generate masks that help us segment each part of the image with
different values. Afterwards, using the remaining graph, we traverse from the first node that has a
connection up to the sink in order to find which nodes are visited, and then generate the new image
from them.

New Photo:

Example - 2
100 107 103 101 183 145
151 151 151 5 132 100
123 32 5 5 132 100
123 5 5 123 123 154
151 5 5 151 100 123
151 5 5 100 100 123

New Photo:
0 0 0 0 0 0
0 0 0 5 0 0
0 0 5 5 0 0
0 5 5 0 0 0
0 5 5 0 0 0
0 5 5 0 0 0

Visual representation (with resize to 200x200):


Example 2 (the original image - left, the segmented image - right)
Example 1 (the original image - left, the segmented image - right)

Links:
https://julie-jiang.github.io/image-segmentation/
https://en.wikipedia.org/wiki/Graph_cuts_in_computer_vision
https://en.wikipedia.org/wiki/Maximum_flow_problem

Glossary
- Bottleneck = the smallest capacity among the edges of a path, which limits the flow along it
- Capacity = the weight of the edge, how much information can be passed from a node to
another, maximum information sent
- Flow / Max flow = Movement of a certain quantity of information through a connected
graph.
- Augmented path = A path in the graph that has available capacity for additional
information to be sent from the source node to the sink node
- Residual graph = when sending flow through the graph, some edges might not use all of their
capacity. A forward edge (flow goes in the direction of the edge) whose flow is less than its
capacity has residual capacity available, which can be used to send more flow. The
reverse edges allow flow in the opposite direction (relative to the original edge), meaning that
they can “cancel out” some of the flow that was previously sent. The residual graph is
constructed based on the original graph and the current flow, including both forward and
reverse edges, representing the potential for sending additional flow or removing flow that
has already been sent.
- Graph cut = partitioning the graph into two disjoint subsets (the union of the two sets of
vertices is the original graph, but their intersection is empty), keeping the start node (S) and
the sink node (T) in different subsets
- In our case min-cut is relevant (smallest total weight of the edges which if removed
would disconnect the source from the sink)
