APPROVED_L7_COMP7067_2023-24_sub_brief Feb 2024 - v3 - preEE **

Unit Title: DATA PROCESSING AND ANALYTICS (COMP7067)


Assessment Title: Project report and video on data processing and analytics topics
Unit Level: 7
Assessment Number: 1 of 1
Credit Value of Unit: 20
Date Issued: 19/01/2024
Unit Leader: Ismail Alarab
Submission Due Date: 10/05/2024, Time: 12:30 PM
Other Marker(s): N/A
Submission Location: Turnitin
Quality Assessor (QA): Hari Pandey
Feedback Method: Brightspace

This is a group assignment which carries 100% of the final unit mark.

ASSESSMENT TASK
This is a group assignment with individual elements in which you are asked to design and implement databases for selected
use cases and perform data analytics on given datasets. Your group should have between 3 and 5 members, and you
must enrol in your group on Brightspace by Friday 23/02/2024. After this deadline, you will be assigned to a group
randomly.

Part A
The first part of the assignment focuses on your understanding and implementation of various database technologies. You
are given three use case scenarios as below:

Use-case 1
Consider your passion for art leading to the creation of ArtBase, a company that builds artwork items for art
galleries. The core of this company is a database with a schema that captures all the information that galleries
need to maintain.
Galleries keep information about artists, their names (which are unique), birthplaces, age, and style of art. For each
piece of artwork, the database should include the artist’s name, the year it was made, its unique title, its type of art
(e.g., painting, lithograph, sculpture, photograph), and its price. Pieces of artwork are also classified into groups of
various kinds, for example, portraits, still lifes, works of the 19th century, etc. A given piece may belong to more
than one group. Each group is identified by a name (like those just given) that describes the group. Finally,
galleries keep information about customers. For each customer, galleries keep that person’s unique name,
address, total amount of dollars spent in the gallery, and the artists and groups of art that the customer tends to like.
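
As an illustration of the statement that a piece may belong to more than one group, the following is a minimal sketch of that many-to-many relationship using a junction table in SQLite, accessed through Python's standard sqlite3 module. The table and column names (artist, artwork, art_group, classified_as) and the sample rows are assumptions made for illustration only; they are not part of the brief and are not a model answer.

```python
import sqlite3

# Hypothetical schema: names are illustrative assumptions, not prescribed by the brief.
SCHEMA = """
CREATE TABLE artist (
    name TEXT PRIMARY KEY,            -- artist names are unique
    birthplace TEXT, age INTEGER, style TEXT
);
CREATE TABLE artwork (
    title TEXT PRIMARY KEY,           -- artwork titles are unique
    artist_name TEXT NOT NULL REFERENCES artist(name),
    year INTEGER, type TEXT, price REAL
);
CREATE TABLE art_group (name TEXT PRIMARY KEY);
-- Junction table: one row per (artwork, group) pair.
CREATE TABLE classified_as (
    title TEXT NOT NULL REFERENCES artwork(title),
    group_name TEXT NOT NULL REFERENCES art_group(name),
    PRIMARY KEY (title, group_name)
);
"""

def make_db():
    """Create an in-memory database with a few placeholder rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO artist VALUES ('Artist A', 'City X', 40, 'realism')")
    conn.execute(
        "INSERT INTO artwork VALUES ('Work 1', 'Artist A', 1890, 'painting', 1200.0)"
    )
    conn.executemany(
        "INSERT INTO art_group VALUES (?)",
        [("portraits",), ("still lifes",), ("works of the 19th century",)],
    )
    conn.executemany(
        "INSERT INTO classified_as VALUES (?, ?)",
        [("Work 1", "portraits"), ("Work 1", "works of the 19th century")],
    )
    conn.commit()
    return conn

def groups_of(conn, title):
    """Return the names of all groups a given artwork belongs to, sorted by name."""
    rows = conn.execute(
        "SELECT group_name FROM classified_as WHERE title = ? ORDER BY group_name",
        (title,),
    )
    return [row[0] for row in rows]

if __name__ == "__main__":
    print(groups_of(make_db(), "Work 1"))
```

The composite primary key on the junction table prevents the same artwork from being classified into the same group twice, while still allowing any number of groups per artwork.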

Use-case 2
You are required to implement an academic publishing database. The description of the scenario is given as
follows:
The database contains multiple academics, with their names, email addresses and research fields. These
academics submit publications to scientific journals. Names, research areas, dates of submission and acceptance
of publications must be stored. A publication may be co-authored by multiple authors; however, it can have only one
corresponding author. Academics also review the publications of their peers. A publication can be reviewed by
several academics. For each review, its date and result need to be saved. An academic can be affiliated with
multiple organisations, the names and addresses of which are stored in the database. These organisations can be
subscribed to multiple journals. For the journals, the names, number of issues and the editor's name must be
recorded.
Finally, you store information about journal publishers, including their names, countries, addresses, phone
numbers, and emails. A journal can only have one publisher.
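
One way to picture this scenario in a document-oriented style is sketched below. Plain Python dictionaries stand in for documents of the kind a document database such as MongoDB stores; no database driver is used. The field names, the helper functions make_publication and add_review, and the use of email addresses as academic identifiers are all assumptions made for illustration.

```python
def make_publication(name, research_area, submitted, authors, corresponding):
    """Build a publication document.

    `authors` is a list of academic identifiers (hypothetically, email addresses).
    A single-valued `corresponding_author` field reflects the rule that a
    publication has exactly one corresponding author.
    """
    if not authors:
        raise ValueError("a publication needs at least one author")
    if corresponding not in authors:
        raise ValueError("the corresponding author must be one of the authors")
    return {
        "name": name,
        "research_area": research_area,
        "submitted": submitted,          # ISO date string, e.g. "2024-01-10"
        "accepted": None,                # filled in if the publication is accepted
        "authors": list(authors),        # references to academics, not embedded copies
        "corresponding_author": corresponding,
        "reviews": [],                   # reviews embedded in the publication
    }

def add_review(publication, reviewer, date, result):
    """Append a review (reviewer reference, date, result) to a publication document."""
    publication["reviews"].append(
        {"reviewer": reviewer, "date": date, "result": result}
    )
    return publication

if __name__ == "__main__":
    pub = make_publication(
        "Paper P", "databases", "2024-01-10",
        ["a@example.org", "b@example.org"], "a@example.org",
    )
    add_review(pub, "c@example.org", "2024-02-01", "accept")
    print(pub)
```

In this sketch, reviews are embedded because each review belongs to a single publication, whereas academics are referenced by identifier because the same academic can appear on many publications. Other designs are equally defensible.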

Use-case 3
UniqueTeam is an application that models soccer teams, the games they play, and the players in each team.
There is a set of teams; each team has an ID (unique identifier), name, main stadium, and the city where it is
based. Each team has many players and each player belongs to one team. Each player has a number (unique
identifier), name, date of birth (DoB), start year, and t-shirt number. Teams play matches and, in each match, there
is a host team and a guest team. The match takes place in the stadium of the host team.
For each match, there is a need to keep track of the following: the date on which the game is played, the final result
of the match, and the players who participated in the match. For each player, we record the number of goals he
scored, whether or not he received a yellow card, and whether or not he received a red card. During the match, one player
may substitute another player. You should keep track of the substitution and the time at which it took place.
Each match has exactly three referees. For each referee, we have an ID (unique identifier), name, DoB, and years
of experience. One referee is the main referee and the other two are assistant referees.
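
The rules that each match has a distinct host and guest team and exactly three referees (one main, two assistants) can be expressed as table constraints. The sketch below is one possible approach, using SQLite through Python's standard sqlite3 module. All table and column names, the helper try_add_game, and the placeholder rows are assumptions for illustration. The table is called game rather than match because MATCH is a keyword in SQLite.

```python
import sqlite3

# Hypothetical schema; three NOT NULL referee columns give "exactly three referees".
SCHEMA = """
CREATE TABLE team (id INTEGER PRIMARY KEY, name TEXT, stadium TEXT, city TEXT);
CREATE TABLE referee (
    id INTEGER PRIMARY KEY, name TEXT, dob TEXT, years_experience INTEGER
);
CREATE TABLE game (
    id INTEGER PRIMARY KEY,
    played_on TEXT NOT NULL,
    host_id INTEGER NOT NULL REFERENCES team(id),
    guest_id INTEGER NOT NULL REFERENCES team(id),
    main_referee_id INTEGER NOT NULL REFERENCES referee(id),
    assistant1_id INTEGER NOT NULL REFERENCES referee(id),
    assistant2_id INTEGER NOT NULL REFERENCES referee(id),
    CHECK (host_id <> guest_id),
    CHECK (main_referee_id <> assistant1_id
           AND main_referee_id <> assistant2_id
           AND assistant1_id <> assistant2_id)
);
"""

def make_db():
    """Create an in-memory database with placeholder teams and referees."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO team VALUES (?, ?, ?, ?)",
        [(1, "Team A", "Stadium A", "City A"), (2, "Team B", "Stadium B", "City B")],
    )
    conn.executemany(
        "INSERT INTO referee VALUES (?, ?, ?, ?)",
        [(1, "Referee 1", "1980-01-01", 10),
         (2, "Referee 2", "1985-01-01", 6),
         (3, "Referee 3", "1990-01-01", 3)],
    )
    conn.commit()
    return conn

def try_add_game(conn, played_on, host, guest, main, assistant1, assistant2):
    """Insert a game; return False if any constraint rejects the row."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO game (played_on, host_id, guest_id, main_referee_id,"
                " assistant1_id, assistant2_id) VALUES (?, ?, ?, ?, ?, ?)",
                (played_on, host, guest, main, assistant1, assistant2),
            )
        return True
    except sqlite3.IntegrityError:
        return False

if __name__ == "__main__":
    db = make_db()
    print(try_add_game(db, "2024-03-01", 1, 2, 1, 2, 3))
    print(try_add_game(db, "2024-03-08", 1, 2, 1, 1, 3))
```

No stadium column is stored on the game in this sketch, since the scenario states that the match takes place in the host team's stadium and the value can therefore be derived from host_id.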

Task (Part A):


Choose one of the above use-case scenarios and:
1. Develop an Entity-Relationship model of the information requirements for the selected scenario.
2. Translate your model into an equivalent relational schema. Specify all relation headings, indicating primary
and foreign keys.
3. Implement at least three important entities from your relational schema in SQL, MongoDB, and Neo4J
(these may be three different entities for each technology) and generate sample data for all of the
implementations. Show samples of generated data.
4. Explain why you chose the particular entities to implement with each technology.
5. Come up with 5 test cases for each database technology and implement those using queries. Show the
query code and the output for each database technology.
Include the answers to each of the five points above in your technical report.
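
A test case in the sense of point 5 can be recorded as a question, a query that answers it, and the expected result. The sketch below shows one possible shape for this, using SQLite through Python's standard sqlite3 module and a cut-down version of Use-case 3. The table names, the helper run_test_case, and the sample rows are hypothetical and chosen only so that the expected result can be checked by hand.

```python
import sqlite3

def make_db():
    """Create a small in-memory database with placeholder sample data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE team (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE player (
            number INTEGER PRIMARY KEY, name TEXT,
            team_id INTEGER REFERENCES team(id)
        );
        CREATE TABLE appearance (
            game_id INTEGER,
            player_number INTEGER REFERENCES player(number),
            goals INTEGER DEFAULT 0,
            yellow_card INTEGER DEFAULT 0,
            red_card INTEGER DEFAULT 0,
            PRIMARY KEY (game_id, player_number)
        );
        INSERT INTO team VALUES (1, 'Team A'), (2, 'Team B');
        INSERT INTO player VALUES
            (10, 'Player X', 1), (11, 'Player Y', 1), (20, 'Player Z', 2);
        INSERT INTO appearance VALUES
            (1, 10, 2, 0, 0), (1, 20, 1, 1, 0), (2, 10, 1, 0, 0), (2, 11, 0, 0, 1);
    """)
    return conn

# Test case: "How many goals has each team scored across all games?"
# The query joins three tables: appearance, player and team.
GOALS_PER_TEAM = """
SELECT t.name, SUM(a.goals)
FROM appearance AS a
JOIN player AS p ON p.number = a.player_number
JOIN team AS t ON t.id = p.team_id
GROUP BY t.name
ORDER BY t.name
"""

def run_test_case(conn, description, sql, expected):
    """Run a query, print the outcome, and report whether it matched `expected`."""
    actual = conn.execute(sql).fetchall()
    passed = actual == expected
    print(f"{'PASS' if passed else 'FAIL'}: {description} -> {actual}")
    return passed

if __name__ == "__main__":
    run_test_case(
        make_db(),
        "goals per team",
        GOALS_PER_TEAM,
        [("Team A", 3), ("Team B", 1)],
    )
```

With the sample rows above, Team A's players score 2 + 1 + 0 goals and Team B's player scores 1, which is how the expected result was obtained.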

Part B
In an era of growing data complexity and volume, feature selection and construction techniques play a key role in
understanding data: they help reduce dimensionality and improve learnability in data analytics problems. In both data
and big data processing and analytics, feature selection techniques are important for reducing the time required to build
machine learning models and for improving the performance of these algorithms. Moreover, principal component analysis is
an important algorithm used in data and big data processing for data visualisation, dimensionality reduction, and gaining
insight into the knowledge hidden in the data. For the submission, you are given 3 datasets (listed below this section).
You are asked to define a classification problem of your choice, select one dataset, and perform the following tasks:
1. Define the training and testing set for your dataset.
2. Implement a neural network and one other classification algorithm of your choice and compare the performance for
the dataset you choose.
3. Apply Principal Component Analysis to the dataset and explain its outcome. How does the number of principal
components affect the percentage of variance covered for this dataset?
4. Apply any feature selection method of your choice and compare the performance using any one algorithm used in
question 2 before and after applying the feature selection algorithm.
5. Explainable AI enables the understanding of machine learning model decisions by domain experts. Investigate the
explainability of your machine learning models by explaining why a model made its specific decisions on any three
samples.
You must perform the tests on the test data set to evaluate the results. Use the AUC (area under the curve) and accuracy as
metrics for comparing the performance. Include the answers to each of the five points above in your technical report.
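
To make the two required metrics concrete, the sketch below defines a held-out test set, accuracy, and binary AUC in plain Python with no external libraries. The function names, the 30% test fraction, and the fixed seed are assumptions for illustration; in practice a library implementation would normally be used. The AUC here is the binary form (the probability that a randomly chosen positive sample is scored above a randomly chosen negative one); for a multi-class problem, one common option is to compute it one-vs-rest per class.

```python
import random

def train_test_split(rows, test_fraction=0.3, seed=0):
    """Shuffle the rows reproducibly and split them into (train, test)."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = int(round(len(rows) * test_fraction))
    return rows[n_test:], rows[:n_test]

def accuracy(y_true, y_pred):
    """Fraction of predictions that equal the true label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def auc(y_true, scores):
    """Binary AUC from labels (0 or 1) and predicted scores.

    Counts, over all (positive, negative) pairs, how often the positive
    sample has the higher score; ties count as half.
    """
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))

if __name__ == "__main__":
    train, test = train_test_split(range(10))
    print(len(train), len(test))
    print(accuracy([0, 1, 1, 0], [0, 1, 0, 0]))
    print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

Both metrics should be computed on the test split only, so that the reported figures reflect data the model has not seen during training.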

Datasets (Select one):


Room Occupancy Estimation - UCI Machine Learning Repository
RT-IoT2022 - UCI Machine Learning Repository
Human Activity Recognition Using Smartphones Data Set
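
Whichever dataset is selected, the relationship between the number of principal components and the variance covered (task 3 of Part B) can be explored along the following lines. This is a minimal sketch that assumes NumPy is available and uses synthetic data rather than any of the listed datasets. The function names and the 95% target are illustrative assumptions.

```python
import numpy as np

def explained_variance_ratio(X):
    """Fraction of total variance captured by each principal component, largest first."""
    X = np.asarray(X, dtype=float)
    centred = X - X.mean(axis=0)
    cov = np.atleast_2d(np.cov(centred, rowvar=False))
    # eigvalsh returns eigenvalues of a symmetric matrix in ascending order.
    eigenvalues = np.linalg.eigvalsh(cov)[::-1]
    # Guard against tiny negative values caused by floating-point error.
    eigenvalues = np.clip(eigenvalues, 0.0, None)
    return eigenvalues / eigenvalues.sum()

def components_needed(X, target=0.95):
    """Smallest number of components whose cumulative variance ratio reaches `target`."""
    cumulative = np.cumsum(explained_variance_ratio(X))
    index = int(np.searchsorted(cumulative, target))
    return min(index + 1, len(cumulative))

if __name__ == "__main__":
    # Synthetic data: the third feature is almost a copy of the first.
    rng = np.random.default_rng(0)
    base = rng.normal(size=(200, 2))
    X = np.column_stack(
        [base[:, 0], base[:, 1], base[:, 0] + 0.01 * rng.normal(size=200)]
    )
    print(np.cumsum(explained_variance_ratio(X)))
    print(components_needed(X, target=0.95))
```

When features are measured on very different scales, it is common to standardise them before applying PCA, since otherwise the features with the largest raw variance dominate the components.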

Groups
It is your responsibility to decide whom you would like to work with. You also need to be prepared for any
unforeseen circumstances that may arise when working as a team. You should inform the unit leader via email
about your group members within three weeks of the issue date of this coursework. Otherwise, the remaining
students will be randomly assigned into groups.
Each group should consist of four members. If you would like to form a group of a different size (minimum: 3,
maximum: 5), you can do this in exceptional circumstances only, and you must explain these circumstances in your
report. If you are in a group of 5, you should add new methods (not covered in class), extra results, or similar
additional work. We assume that all members in the group will receive equal marks for the technical report
(80% of the assignment). If group members do not agree with this default mark distribution and raise an issue (e.g.
somebody did not engage or made a poor contribution), the unit leader will contact the group members via email to
resolve the issue and marks can be reallocated.

ORIGINALITY REQUIREMENT
The following originality requirements will apply to this assignment:

You are not allowed to use any Generative AI or other AI powered tools, such as ChatGPT, for this assessment. Any use
of these tools for any part of this assessment would be considered an academic offence.

SUBMISSION FORMAT
There will be two deliverables for this assignment:
1. Technical report (80%) - Group submission. The technical report needs to address both tasks A and B.
2. Individual presentation (20%) – Individual submission. Prepare a ten-minute video presentation, where you
discuss the problem statement, your role in the group, your contribution to the final submission and the steps that
you followed for completing the tasks. The links to the video presentation of every group member should be
included in the project report.

The word counts for the group submission and the individual submissions are as follows:
3 members – 9,000 words total (7,500 for group report, 500 each for individual presentation)
4 members – 12,000 words total (10,000 for group report, 500 each for individual presentation)
5 members – 15,000 words total (12,500 for group report, 500 each for individual presentation)

MARKING CRITERIA
The following criteria will be used to assess the assignment:
Task A (40%):
Each subtask is listed with its weighting and ILO(s), followed by its expectations.

1. Entity Relationship Model (5%), ILO 1, 2
   - All required entities and attributes are listed.
   - Multivalued or any special attributes are identified.
   - Relations and cardinalities (including the optional participation) are correctly identified and described.
   - Types of participation (one-one, one-many, many-many) are correctly identified.
   - State the assumptions you used in creating the ER diagram (if any).

2. Conversion into Relational Schema (5%), ILO 1, 2
   - ERD is correctly translated into a relational schema by depicting all the relations and their attributes.
   - Identification of primary and foreign keys.
   - Used the 8-step conversion rules.

3. Conversion into Databases and sample data (10%), ILO 2
   - Design the database appropriately so that it corresponds to the proposed ER diagram, using SQL, MongoDB and Neo4J.
   - All the required constraints are added and clearly shown.
   - All of the entities have sufficient sample data, and it is clearly shown in all three databases - SQL, MongoDB and Neo4J.

4. Design choices explanation (5%), ILO 2
   - Explanation of the choice of entities for database technologies.
   - How your choice fits the characteristics of these technologies.

5. Use cases and queries (15%), ILO 2
   - Use cases are clear and the queries address them well. Queries are correct and deliver valid results, which are clearly shown.
   - Demonstrate the 5 use cases (can be different for each technology), query code and output for each database technology - SQL, MongoDB and Neo4J.
   - 1% for each query (test case, code and result) for all three databases and hence 5% for each database technology - SQL, MongoDB and Neo4J.

Task B (40%):
Each subtask is listed with its weighting and ILO(s), followed by its expectations.

1. Training and testing set (5%), ILO 4
   - Training and testing data loaded.
   - Detailed description of the selected dataset with all the values displayed and loaded efficiently.

2. Classification models (10%), ILO 4
   - Clearly list the 2 models chosen and justify your choice.
   - Implementation of both classification models.
   - All results displayed and explained clearly.
   - Comparison of the performance of both classification models.

3. PCA (10%), ILO 4, 5
   - PCA implemented on the dataset with adequate outcome explanation.
   - The question in step 3 of Part B of the assignment answered with proper justification from the results obtained.

4. Feature Selection method (10%), ILO 4, 5
   - Justification of the feature selection method chosen.
   - Result explanation after application of the feature selection method to any of the algorithms from point 2.
   - Result comparison with and without the feature selection method.

5. Explainable AI (5%), ILO 3
   - Applying two Explainable AI methods on one machine learning model on at least three samples.
   - Detailed discussion of the results obtained from these methods accompanied by suitable visuals.

Individual Video (20%):
Each subtask is listed with its weighting and ILO(s), followed by its expectations.

Clarity of explanation (5%), ILO 3
   - Problem analysis with justification using relevant sources.
   - Explain and justify the method used.

Presentation slides (10%), ILO 5
   - Your role in the group activity.
   - Approach used for group work.
   - Your contribution to the final submission.
   - Steps followed for completing task.
   - More pictorial representation in slides.
   - Slides should be well organized with not too much text.
   - Presentation skills.

Summary and conclusions (5%), ILO 5
   - Discussion and thorough elaboration on the results with meaningful explanation.

To get higher marks in this assignment:

- You should make assumptions about further possibilities that might exist when designing ER diagrams, beyond
  those listed in the use case scenarios (Task A).
- You should use more complex queries involving multiple entities (more than 2) in the test use cases (Task A).
- You should use at least one classification model not covered in the lectures and at least one feature
  selection algorithm not covered in the lectures (Task B).
- The better the results are presented and compared in the form of graphs, tables and relevant visualisations,
  the more marks will be awarded (Task B).

INTENDED LEARNING OUTCOMES (ILOs)


This unit assesses your ability to:
1. Perform and critically analyse data modelling.
2. Understand the underlying technology of various database systems.
3. Gain critical understanding of Data analytics’ challenges.
4. Gain critical understanding of the most significant pattern recognition algorithms for dealing with Data and Big Data.
5. Be able to interpret the results from Data and Big Data analytics’ algorithms and use the appropriate methods for
reporting the results.

QUESTIONS ABOUT THE BRIEF


Any issues about the assignment can be raised with the lecturer during lectures/seminars or by appointment. Email will be
used for handling questions about the brief when no seminar/lab session is scheduled between the time the questions
arise and the submission deadline.

Unit Leader Signature Ismail Alarab

Unit Title: DATA PROCESSING AND ANALYTICS (COMP7067)


Element: Coursework 01 - Submission

This is a mandatory form, meaning that ALL members MUST submit the below form INDIVIDUALLY for the Unit Leader to consider when marking.

If there were significant issues with engagement or contribution from one or more group members, adjustments will be made accordingly, based on
the following principles:
1. If there is clear evidence of engagement or contribution issues, the Unit Leader reserves the right to adjust any member's marks
accordingly, based on the evidence detailed in the form below.
2. If only part of the group submits this form:
a. the Unit Leader reserves the right to adjust any member's marks accordingly, based on the evidence detailed in the form below.
b. the Unit Leader reserves the right to award zero marks to any member who does not submit the below form.
3. For any group where the forms submitted provide insufficient information for the Unit Leader to make a judgement, the Unit Leader
reserves the right to award every group member an equal mark.
4. For any group where the forms submitted provide clearly contradictory information, the marks awarded will be at the discretion of the Unit
Leader.

IMPORTANT: Every member of the group takes full responsibility for working collaboratively and professionally as a group. The Unit Leader will
provide guidance on the next steps if there is a dispute between group members, but Unit Leaders would not normally be expected to intervene to
resolve disputes between group members.

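Your Group Name or Number (According to Brightspace):

Name and Student Number | Contribution (Please Circle) | Comments (Required when less than full contribution)
Your Name and Student Number: ________________ | None / Partial / Full | ________________
________________ | None / Partial / Full | ________________
________________ | None / Partial / Full | ________________
________________ | None / Partial / Full | ________________
________________ | None / Partial / Full | ________________
________________ | None / Partial / Full | ________________
________________ | None / Partial / Full | ________________
________________ | None / Partial / Full | ________________
________________ | None / Partial / Full | ________________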

Your role within the group and actions/tasks which you completed or significantly contributed to:

Any other comments you would like to make on your group, particularly anything related to the group work:

Student Signature: Date: ____ / ____ / ________

If a piece of coursework is not submitted by the required deadline, the following will apply:
1. If coursework is submitted within 72 hours after the deadline, the maximum mark that can be awarded is 50%. If the
assessment achieves a pass mark, and subject to the overall performance of the unit and the student's profile for the level, it
will be accepted by the Assessment Board as the reassessment piece. This ruling will apply to written coursework and
artefacts only, and to the first attempt only (including any subsequent attempt taken as a first attempt due to
exceptional circumstances).
2. If a first attempt coursework is submitted more than 72 hours after the deadline, a mark of zero (0%) will be awarded.
3. Failure to submit/complete any other types of coursework (which includes resubmission coursework without exceptional
circumstances) by the required deadline will result in a mark of zero (0%) being awarded.
The Standard Assessment Regulations can be found on Brightspace or via
https://www1.bournemouth.ac.uk/students/help-advice/important-information (under Assessment).

Exceptional Circumstances
If you have any valid exceptional circumstances which mean that you cannot meet an assignment submission deadline and you
wish to request an extension, you will need to complete and submit the online Exceptional Circumstances Form together with
appropriate supporting evidence (e.g. GP note) normally before the coursework deadline. Further details on the procedure and
links to the exceptional circumstances forms can be found on Brightspace or via
https://www1.bournemouth.ac.uk/students/help-advice/looking-support/exceptional-circumstances. Please make sure that you read
these documents carefully before submitting anything for consideration. For further guidance on exceptional circumstances please
contact your Programme Leader.

Referencing
You must acknowledge your source every time you refer to others' work, using the BU Harvard Referencing system (Author Date
Method). Failure to do so amounts to plagiarism which is against University regulations. Please refer to
https://libguides.bournemouth.ac.uk/bu-referencing-harvard-style for the University's guide to citation in the Harvard style. Also be
aware of self-plagiarism, which primarily occurs when a student submits a piece of work to fulfil the assessment requirement for a
particular unit and all or part of the content has been previously submitted by that student for formal assessment on the same or a
different unit. Further information on academic offences can be found on Brightspace and from
https://www1.bournemouth.ac.uk/discover/library/using-library/how-guides/how-avoid-academic-offences

Additional Learning Support


Students with Additional Learning Needs may contact the Additional Learning Support Team. Details can be found here:
https://www1.bournemouth.ac.uk/als

IT Support
If you have any problems submitting your assessment please contact the IT Service Desk - +44 (0)1202 965515 - immediately and
before the deadline.

Disclaimer
The information provided in this assignment brief is correct at time of publication. In the unlikely event that any changes
are deemed necessary, they will be communicated clearly via e-mail and Brightspace and a new version of this
assignment brief will be circulated.
