
This assignment consists of several tasks, some of which involve developing C++ code,
and some of which involve explaining data structures and algorithms, and analysing
their time-complexity characteristics. You will also measure the performance of your
code, and present and evaluate the results.

There are six main tasks, some of which involve sub-tasks.

For the tasks described in Section 3, you are required to produce:

• a written report;
• and C++ source code.

Task 1: Analysis of your Binary Search Tree Implementation of a Dictionary

[Core Task] Analyse the time-complexity characteristics of the following functions from
that implementation. For each function, this analysis should include an explicit statement of
the time complexity of the function using formal notation, and a brief explanation justifying
why it has that complexity.

• lookup()
• insert()
• displayEntries()
• The destructor
• remove()
• displayTree()
• rotateLeft() and rotateRight()
• The copy constructor
• The move constructor
• The copy assignment operator
• The move assignment operator
• removeIf()

Clarifications:

• You should analyse the best, average and worst cases for each member.
• You must have some written explanation to justify each stated complexity (though
you may organise your text to share an explanation between multiple functions, if
relevant).
• You should make appropriate use of Big-O notation (and/or other relevant
asymptotic notation).

Scenario: The Great Wall Problem

Tasks 2, 3, 5 & 6 are based around investigating an efficient solution to a particular
software engineering problem, which is introduced narratively as follows:

Long ago, a great wall was built along the northern border of an ancient kingdom. After the
wall was finished, an artist walked along the wall from west to east, decorating the southern
side of each top brick with a unique symbol.

The artist's apprentice was instructed to follow and copy each symbol onto the northern side
of each brick. However, instead of copying each symbol onto the back of the same brick, the
apprentice accidentally copied the symbol onto the back of the next brick along to the east.
That is, he drew the symbol from the southern side of the first brick onto the northern side of
the second brick, the symbol from the southern side of the second brick onto the northern
side of the third brick, and so forth all the way along the wall. When he reached the eastern
end, he realised his mistake, as there was no brick on which to draw the final symbol. In
panic, he removed the first brick from the wall, and destroyed it.

The years passed, and the local people gave names to the symbols decorating the wall.
They carved these names beneath the symbol on (both sides of) each brick. Many years
later, an earthquake shook the kingdom, and the wall came crashing down. Dismayed, the
King ordered all of the decorated bricks to be brought to his palace. Upon examining the
heap of bricks, the Royal Data Scientist observed that it was readily apparent which was the
north and south side of each brick, as exposure to sunlight had caused the symbols on one
side of the wall to fade more than the other. Thus there was enough information to
efficiently determine the original sequence of symbols.

The Royal Software Engineer noted that processing the symbol names would be
more efficient than processing images of the symbols, and proposed the following
algorithm for computing the original sequence:

1. Load the information from each brick into main memory, organising it in a manner
suitable for efficient searching.
2. Arbitrarily choose one of the bricks as a starting point.
3. Taking the two symbol names from the starting brick, start constructing a result
sequence elsewhere in main memory, northern name followed by southern name.
4. Repeatedly, until no matching brick is found:
   i. Search for the brick with a northern symbol that matches
      the back (easternmost) symbol in the result sequence.
   ii. Add the southern symbol name from that brick to the back of the result
       sequence.
5. Repeatedly, until no matching brick is found:
   i. Search for the brick with a southern symbol that matches
      the front (westernmost) symbol in the result sequence.
   ii. Add the northern symbol name from that brick to the front of the result
       sequence.
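
The five steps above can be sketched in C++ as follows. This is only one possible reading of the algorithm, not a model answer: the `Brick` alias, the function name `reconstruct`, and the choice of two `std::unordered_map` indexes with a `std::deque` result are all assumptions made for illustration. The sketch also assumes that every symbol name is unique, as the scenario states, so the loops cannot cycle.

```cpp
#include <deque>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical brick representation: (northern name, southern name).
using Brick = std::pair<std::string, std::string>;

// Returns the symbol names ordered from west to east.
std::deque<std::string> reconstruct(const std::vector<Brick>& bricks) {
    std::deque<std::string> result;
    if (bricks.empty()) {
        return result;
    }

    // Step 1: index the bricks so they can be searched by either side.
    std::unordered_map<std::string, std::string> southByNorth;  // north -> south
    std::unordered_map<std::string, std::string> northBySouth;  // south -> north
    southByNorth.reserve(bricks.size());
    northBySouth.reserve(bricks.size());
    for (const auto& [north, south] : bricks) {
        southByNorth.emplace(north, south);
        northBySouth.emplace(south, north);
    }

    // Steps 2 and 3: seed the result with an arbitrary brick (here, the first).
    result.push_back(bricks.front().first);
    result.push_back(bricks.front().second);

    // Step 4: extend eastwards until no brick has a matching northern name.
    auto east = southByNorth.find(result.back());
    while (east != southByNorth.end()) {
        result.push_back(east->second);
        east = southByNorth.find(result.back());
    }

    // Step 5: extend westwards until no brick has a matching southern name.
    auto west = northBySouth.find(result.front());
    while (west != northBySouth.end()) {
        result.push_front(west->second);
        west = northBySouth.find(result.front());
    }

    return result;
}
```

With hash-based indexes each search takes constant time on average, so the sketch performs a number of steps proportional to the number of bricks in the average case. Other container choices change that trade-off, which is what the later tasks ask you to investigate.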

Task 2: Implementing the Royal Software Engineer's Algorithm

The C++ standard library provides numerous containers that could be used to implement
the Royal Software Engineer's Algorithm, including:

• std::vector
• std::list
• std::map
• std::unordered_map

Task 2a: The Optimal Solution

[Core Task] Making use of one or more of these standard library containers, implement the
Royal Software Engineer's algorithm in the way that you think will have the most efficient
time performance at runtime (in the average case).

Additional requirements and clarifications:

• If desired, you may use additional standard library containers beyond those
suggested above.
• However, you should not use data structures that you have implemented yourself.
• You may deviate from the Royal Software Engineer's Algorithm if you can think of
ways to be more efficient.

Scenario Extension: The Paranoid Monarch

Returning to the narrative:

The King came to see the Royal Software Engineer, looking concerned. "The Royal
Mathematician has just told me that just because an algorithm is efficient on average, at
worst sometimes it can still be very inefficient. If that happens here, the sequence may not
be recreated in my lifetime!"

The Royal Software Engineer sighed. "That's true in principle, but extremely unlikely for this
algorithm. I wouldn't worry about it."

"But I'm a very unlucky monarch! I don't want to take a chance. I want a solution that is
guaranteed to finish in my lifetime."

Task 2b: The Worst-case Solution

[Advanced Task] Also implement the Royal Software Engineer's algorithm in the way that
you think will have the most efficient worst-case time performance at runtime.

Task 3: Justifying Implementation Choices

[Core Task] Explain which combination(s) of containers you have used to implement
the Royal Software Engineer's Algorithm in Task 2. Justify your choice, paying particular
attention to the time-complexity characteristics of your code.

Clarifications:

• You should state the time complexity guarantees of each of the container operations
that are used in your code. This includes both member functions of the container
classes, and also library functions that operate on a container.
• Your discussion must include justifying (for each step of the Royal Software
Engineer's Algorithm), why you did not choose each of the other standard-
library containers suggested at the start of Task 2.
• If you have attempted Task 2b, then your discussion should distinguish between
average-case and worst-case complexity guarantees.
• Your explanation should include an analysis of the time complexity of the algorithm
as a whole.

Task 4: Understanding Data Structures

[Core Task] Research which data structures are typically used to implement the standard
library containers that you have chosen to use for your implementation(s) in Task 2. State
these in your report, and provide citation(s) for your source of information. Explain
how these data structures provide the time-complexity guarantees for the container
operations that you have used.

For the higher grades, you should:

• thoroughly explain how the internal implementation provides those precise
guarantees;
• discuss both average-case and worst-case complexity;
• identify when a guarantee is based on amortised complexity, where that is relevant
to how you have used the container, explaining the basis for that amortisation.

Task 5: Performance Measurement - The Royal Software Engineer's Algorithm

This task involves generating performance data, and then presenting and evaluating that
data.

Task 5a: Measuring the Royal Software Engineer's Algorithm

[Core Task] Using C++ standard-library timing facilities, write a C++ program that
measures the time usage of your implementation(s) from Task 2. Use your
implementation(s) and the provided input data files to generate performance data.

Additional requirements and information:

• Data files of varying sizes are provided in the test data; only use those that allow
you to generate results from which you will be able to draw meaningful conclusions.
• You should include the time taken to load the data into main memory in your
measurements.
• You should not include the time taken to display the results to the standard output,
as that output is just to allow the assessor to verify the correctness of your code,
rather than being part of the algorithm being investigated.
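
As an illustration of the standard-library timing facilities mentioned above, the following sketch wraps a piece of work in a `std::chrono::steady_clock` measurement. The helper name `time_ms` is an assumption, and the commented usage refers to hypothetical `load_bricks` and `reconstruct` functions standing in for your own code.

```cpp
#include <chrono>
#include <utility>

// Runs `work` once and returns the elapsed time in milliseconds.
// steady_clock is used because it is monotonic, so the measurement is not
// affected by adjustments to the system clock.
template <typename Work>
double time_ms(Work&& work) {
    const auto start = std::chrono::steady_clock::now();
    std::forward<Work>(work)();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Possible usage (load_bricks and reconstruct are hypothetical names):
//
//   std::deque<std::string> sequence;
//   const double elapsed = time_ms([&] {
//       const auto bricks = load_bricks(path);  // loading is inside the timed region
//       sequence = reconstruct(bricks);
//   });
//   // Printing `sequence` happens here, outside the timed region.
```

Keeping the output step outside the lambda matches the requirement that loading is measured but display is not. Repeating each measurement several times and averaging is a common way to reduce noise.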

Task 5b: Presenting the Royal Software Engineer's Results

[Core Task] Present the result data from Task 5a in your report using
table(s) and graph(s).

Clarifications:

• Ensure that all of your raw results data is included in tables.
• If you have results from two implementations, then you should (at least) have one
graph on which both sets of results are plotted, to help a reader to visually compare
the results.

Task 5c: Evaluating the Royal Software Engineer's Algorithm

[Core Task] Evaluate the results in your report.

Clarifications:

• Ensure that you discuss what the results indicate regarding the time complexity of
the implementation(s).

Report Requirements

The report should then be divided into sections for Task 1, Task 3, Task 4, Task 5 and (if
attempted) Task 6. You may introduce subsections for subtasks if desired. Note that no
report content is required for Task 2.

If you complete all six tasks, then your report should be approximately 2000–4000 words in
length, with a maximum limit of 4500 words (excluding any title page, contents page,
tables, graphs, and reference list). You should not include any source code in the report. If
your results data is extensive (e.g. if you have multiple runs from which you have taken
averages), then you may place some auxiliary data tables in an appendix.

Code Requirements

• For Task 1, you must submit the C++ code that you are tasked with analysing.
• You must submit C++ code solutions for Tasks 2 and 5 of this assignment.
• To facilitate testing your solutions, a set of input data files representing instances of
the problem scenario described in Section 3 are available from this link.

Note: These files were generated on a Linux machine, and consequently have Unix line
endings. I recommend reading briefly about line ending conventions (e.g. [1]) if you are not
familiar with line-ending issues.
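
One defensive approach, sketched here under the assumption that the data is read line by line with `std::getline`, is to drop a trailing carriage return in case a file has been converted to Windows line endings along the way. The helper names `strip_trailing_cr` and `read_line` are hypothetical.

```cpp
#include <istream>
#include <string>

// Removes a single trailing '\r', which is left behind when a file with
// Windows (CRLF) line endings is split on '\n' alone.
void strip_trailing_cr(std::string& line) {
    if (!line.empty() && line.back() == '\r') {
        line.pop_back();
    }
}

// Reads one line from `in` and normalises its ending.
// Returns false once no further line can be read.
bool read_line(std::istream& in, std::string& line) {
    if (!std::getline(in, line)) {
        return false;
    }
    strip_trailing_cr(line);
    return true;
}
```

For files that already use Unix line endings the extra check has no effect, so the same code handles both conventions.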

To allow the assessor to easily check the correctness of your implementation, for each of
Tasks 2a, 2b and 6b (if attempted) you are required to provide a batch program that:

1. accepts input data in the same format as the data files provided;
2. takes a file path to the data file as a command-line argument when the program
is launched (note: not as standard input while the program is running);
3. outputs the result sequence, one name per line, to the standard output;
4. does not take any standard input while the program is running, nor produce any
other standard output.
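
A minimal sketch of a driver that meets the four requirements above is shown below. The function name `run_batch` and the `solve` parameter are assumptions: `solve` stands in for whichever Task 2 implementation is being checked, and is expected to take a `std::istream&` and return the names in order (for example as a `std::vector<std::string>`). Diagnostics go to the standard error stream so that the standard output contains only the result sequence.

```cpp
#include <fstream>
#include <iostream>
#include <string>
#include <vector>

// Runs one implementation as a batch program.
// Returns 0 on success and a non-zero exit code on a usage or file error.
template <typename Solver>
int run_batch(int argc, char* argv[], Solver solve) {
    // Requirement 2: the data file path is a command-line argument.
    if (argc != 2) {
        std::cerr << "usage: " << (argc > 0 ? argv[0] : "program")
                  << " <data-file>\n";
        return 1;
    }

    std::ifstream input(argv[1]);
    if (!input) {
        std::cerr << "cannot open " << argv[1] << '\n';
        return 1;
    }

    // Requirement 3: one name per line on the standard output.
    // Requirement 4: nothing is read from standard input and nothing else
    // is written to standard output.
    for (const std::string& name : solve(input)) {
        std::cout << name << '\n';
    }
    return 0;
}

// Possible usage (solve_task2a is a hypothetical function):
//
//   int main(int argc, char* argv[]) {
//       return run_batch(argc, argv, solve_task2a);
//   }
```

Passing the solver as a parameter lets the programs for Tasks 2a, 2b and 6b share this driver, which fits the single-codebase requirement stated later in this section.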

For Tasks 5a and 6c, you are required to provide one or more programs that run timing
measurements. These programs can run measurements for either a single input data file or
for a collection of input data files (depending on how you prefer to generate results). You
may have one timing program that measures multiple implementations, or separate
programs for each additional implementation (if attempted).

Your code for Tasks 2, 5 and 6 should form a single codebase containing multiple programs,
sharing common code files to avoid code duplication.

Finally, please ensure that you have used a '#include' for the header file for any library
containers that you have used. (An eccentricity of the C++ language is that some compilers
may still compile the code if you forget to include the header, but then the code may not
compile if the assessor is using a different compiler.)
