
ICT110

Introduction to Data Science

Task 3

Semester 1, 2024
ICT110 Introduction to Data Science
Task 3

Assessment and Submission Details

Marks: 45% of the Total Assessment for the Course

Due Date: 11:59pm Friday 14th June, Exam week

Submit your assignment to Canvas – Assignments - Task 3. Please follow the submission
instructions in Canvas.

The assignment will be marked out of a total of 100 marks and forms 45% of the total
assessment for the course. ALL assignments will be automatically checked for plagiarism by the
Turnitin system provided through Canvas.

Refer to your Course Outline or the Course Web Site for a copy of the “Student Misconduct,
Plagiarism and Collusion” guidelines.

Late submission will be penalised according to the policy in the course outline. Please note
Saturday and Sunday are included in the count of days late.

Requests for an extension to an assignment MUST be made to the course coordinator prior to
the date of submission. Requests made on the day of submission or after the submission date
will only be considered in exceptional circumstances. Assignment submission extensions
will only be granted in accordance with the official University guidelines.


Background
A series of data sets are provided, and you are welcome to choose whichever you are
interested in.

SPORT:

Suncorp Super Netball Data 2019-2023 by player and team by results.

ACCIDENT:

This data has been extracted from the Queensland Road Crash Database.

WINE:

This dataset is related to red variants of the Portuguese "Vinho Verde" wine.

BUSINESS:

Supermarket Sales for a retailer

The data files are available to download from Task 3 in Canvas.

Assignment Task

You are a member of the team and need to perform data analysis on selected attributes.

Key Questions you need to answer:

Describe the data: Provide a comprehensive overview of the data and its attributes, such as
how many there are, what type each one is, and what each describes. This is your Exploratory
Data Analysis.

Describe the finding/s: What did you find, what did you predict, and what do you think is
important?

You have been requested to prepare a data analysis report about your work and explain your
findings. The potential audiences include other researchers, business representatives, and
government agencies. They may have limited ICT or mathematical knowledge. Therefore, the
report should be technical but have clear explanations describing the findings.

Note: not all columns are related to this purpose.

To prepare the report, please include the following sections:

1. Introduction
Introduce the problem. Include background material as appropriate: who cares about this
problem, what impact it has, where does the data come from, what are the dimensions and
structure of the data.

2. Data Setup
Describe how to load the data, and how the pre-processing is performed.

The original dataset is not ready for analysis, and its form differs from the data forms that
we are familiar with from previous practices. This means we need to do some pre-processing,
either for the whole dataset, or for a subset of the dataset required for each sub task
described later.

Once you have some ideas for exploratory or advanced analysis, you need to adjust the form
of the dataset. This can be achieved either by manipulating records in R by transposition or
subsetting, or with other tools (e.g. Notepad or Excel) before reading them into R. For
simplicity, you can also rename the attributes.

Please clearly explain the way you have cleaned the data in this section. If you use Excel
please still explain the steps that you used for cleaning.
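
The loading and cleaning steps described above can be sketched in base R as follows. This is
only an illustration: the column names, the player names and the values are made up, because
the actual layout of the Task 3 files is not given here. With a real file you would pass its
path to read.csv() instead of the inline text.

```r
# Hypothetical raw data standing in for a downloaded Task 3 file.
# With a real file, use read.csv("your_file.csv") instead (file name is an assumption).
raw_text <- "Player Name,Team,Goals Scored,Season
A. Smith,Firebirds,310,2022
B. Jones,Lightning,,2022
C. Lee,Firebirds,275,2023
C. Lee,Firebirds,275,2023"

raw <- read.csv(text = raw_text, stringsAsFactors = FALSE)

# Inspect dimensions and column types before changing anything
dim(raw)
str(raw)

# Rename the attributes to short, consistent names
names(raw) <- c("player", "team", "goals", "season")

# Remove exact duplicate rows, then rows with a missing goal count
clean <- unique(raw)
clean <- clean[!is.na(clean$goals), ]

# Treat team as a categorical variable
clean$team <- factor(clean$team)

# Keep only the rows and columns needed for one sub task
clean_2023 <- subset(clean, season == 2023, select = c(player, goals))
print(clean_2023)
```

Whichever steps you use, record them in comments like these so that the explanation in the
report matches what the script actually does.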

3. Exploratory Data Analysis [[AT LEAST FOUR OF THE FOLLOWING]]

3.1. One-variable analyses with graphs and tables

One-variable analysis studies one variable (one column/attribute) at a time. It is up to you to
decide which attribute/variable you use for this analysis, but the attribute you select needs
to be related to the research objectives.
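
A minimal sketch of a one-variable analysis is shown below. It uses R's built-in mtcars data
purely as a stand-in for a Task 3 dataset, so the variables (mpg, cyl) are assumptions and
should be replaced with attributes from the data you chose.

```r
# Built-in example data used only as a stand-in for a Task 3 dataset
data(mtcars)

# Numeric variable: summary statistics and a histogram
mpg <- mtcars$mpg
print(summary(mpg))
hist(mpg,
     main = "Distribution of fuel economy",
     xlab = "Miles per gallon")

# Categorical variable: frequency table and a bar chart
cyl_counts <- table(mtcars$cyl)
print(cyl_counts)
barplot(cyl_counts,
        main = "Number of cars by cylinder count",
        xlab = "Cylinders",
        ylab = "Count")
```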

3.2. Two-variable analyses with graphs and tables

A two-variable analysis studies the relation between two variables. It is up to you to decide
which attributes/variables you use for this analysis but the attributes you select need to be
related to the research objectives.
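
A two-variable analysis could look like the sketch below. It again uses the built-in mtcars
data as a stand-in, so the variable names (wt, mpg, cyl) are assumptions rather than columns
from the Task 3 files.

```r
# Built-in example data used only as a stand-in for a Task 3 dataset
data(mtcars)

# Two numeric variables: correlation and a scatter plot
r <- cor(mtcars$wt, mtcars$mpg)
print(r)
plot(mtcars$wt, mtcars$mpg,
     main = "Fuel economy against weight",
     xlab = "Weight (1000 lbs)",
     ylab = "Miles per gallon")

# One numeric and one categorical variable: box plot and a table of group means
boxplot(mpg ~ cyl, data = mtcars,
        main = "Fuel economy by cylinder count",
        xlab = "Cylinders",
        ylab = "Miles per gallon")
mean_by_cyl <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)
print(mean_by_cyl)
```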

4. Advanced Analysis [[AT LEAST TWO OF THE FOLLOWING (you can do the same
type twice on different data)]]

4.1. Regression analyses with graphs

Briefly explain the concept of linear regression (with references). It is up to you to decide
which attributes/variables you use for this analysis but the attributes you select need to be
related to the research objectives.
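
As an illustration, a simple linear regression fits a straight line of the form
y = b0 + b1 * x + error, where b0 is the intercept and b1 is the slope. The sketch below fits
such a line with lm() on the built-in mtcars data, used here only as a stand-in; the choice of
mpg as the response and wt as the predictor is an assumption made for the example.

```r
# Built-in example data used only as a stand-in for a Task 3 dataset
data(mtcars)

# Fit mpg = b0 + b1 * wt by ordinary least squares
fit <- lm(mpg ~ wt, data = mtcars)
print(summary(fit))   # coefficients, R-squared and p-values

# Scatter plot with the fitted regression line
plot(mtcars$wt, mtcars$mpg,
     main = "Linear regression of fuel economy on weight",
     xlab = "Weight (1000 lbs)",
     ylab = "Miles per gallon")
abline(fit, col = "red")

# Predict the response for a new observation (hypothetical car of weight 3)
new_cars <- data.frame(wt = 3)
pred <- predict(fit, newdata = new_cars)
print(pred)
```

In the report, explain in plain language what the slope and the R-squared value mean for your
chosen attributes, since the audience may have limited mathematical knowledge.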

4.2. Clustering with graphs

Briefly explain the concept of clustering and k-means (with references). Perform one clustering
analysis. It is up to you to decide which attribute(s) you use for this analysis but the
attribute(s) you select need to be related to the research objectives.
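
A k-means analysis can be sketched as below, using the built-in iris data as a stand-in. The
choice of attributes, the number of clusters (3) and the seed are all assumptions made for the
example; for your own data you would need to justify the attributes and the value of k.

```r
# Built-in example data used only as a stand-in for a Task 3 dataset
data(iris)

# k-means starts from random centres, so fix the seed for a repeatable result
set.seed(110)

# Select numeric attributes and scale them so that each contributes equally
features <- scale(iris[, c("Petal.Length", "Petal.Width")])

# Run k-means with 3 clusters; nstart tries several random starts and keeps the best
km <- kmeans(features, centers = 3, nstart = 25)
print(table(km$cluster))   # number of observations in each cluster

# Plot the observations coloured by their assigned cluster
plot(iris$Petal.Length, iris$Petal.Width,
     col = km$cluster, pch = 19,
     main = "k-means clusters (k = 3)",
     xlab = "Petal length (cm)",
     ylab = "Petal width (cm)")
```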

5. Conclusion
Sum up your findings and provide some insight into the findings.

6. Reflections

In this part, discuss any difficulties you had performing the analysis and how you solved
those difficulties. Reflect on how the analysis process went for you, what you learnt, and
what you might do differently next time. Aim to write one paragraph.

For all data analysis (Sections 3 & 4), you need to provide both the R script file and an
explanation of the code (as comments in the code). Please submit a single R code file as part
of your submission for compiling and running. Your R code MUST run.
The marking rubric is viewable on Canvas.

Report Format

Your report should be 1,200+ words. The report MUST be formatted using the following
guidelines:

1. Title Page – Include your name as the report’s author.
2. Header – Report title
3. Footer – Your name and the page number
4. Paragraph text – 12 point, single line spacing
5. Headings – In an appropriate type and size
6. Margins – 2.5cm on all margins
7. Page numbering - Introduction and onwards to use conventional numerals (1, 2, 3, 4)
starting on page 1 from the introduction.
8. The report is to be created as a single Microsoft Word document (version 2007 or later).
No other format is acceptable and doing so will result in the deduction of marks.

Please follow the conventions detailed in:

Summers, J. & Smith, B., 2014, Communication Skills Handbook, 4th Ed, Wiley, Australia.

Referencing

References for the explanation of linear regression and clustering (k-means) are required.
These references should follow the Harvard or APA method of referencing. Note that ALL
references should be from journal articles, conference papers, technical papers or a recognized
expert in the field. Use the library databases or Google Scholar to find appropriate articles.
DO NOT use Wikipedia as a reference.

Assignment Return and Release of Grades

Assignment grades will be available on Canvas two weeks after submission. Details of
marking will also be accessible via the online rubrics on Canvas.

Where an assignment is under investigation for alleged plagiarism or collusion, both the
assignment and its grade will be withheld until the investigation has concluded.


Assignment Advice

This assignment will take many weeks to complete and will require a good understanding of
data science theories and practices for successful completion. It is imperative that students take
heed of the following points in relation to doing this assignment:

1. Ensure that you clearly understand the requirements for the assignment – what must
be done and what are the deliverables.
2. If you do not understand any of the assignment requirements – Please ASK the course
coordinator or your tutor.
3. Each time you work on any aspect of the assignment reread the assignment
requirements to ensure that what is required is clearly understood.

End of Assignment
