
TM5140 Data-driven Decision Making

Assignment 1

12/02/2022

Introduction

This assignment is based on Chapter 1 through Chapter 3, as well as the lab sessions covered in each
chapter.

Assignment guidelines:

1. Data preparation

For this assignment, you will select a research question related to the Sri Lankan economy (e.g., education,
agriculture, health, transportation, business, export, environment, telecommunication, retail, IT, and so on),
identify a suitable (recent) data set with which to investigate the question, plan appropriate analyses,
implement them using R software and tidy tools, and prepare a brief report using rmarkdown.
Once you have thought of a few research questions that interest you, find an appropriate dataset for the
analysis. An ideal dataset would be related to or drawn from your own research; if none is available, you may
use something from the internet (e.g., Google Trends data), an annual report, or indicators published by
organizations in Sri Lanka. The dataset must be small enough and suitably formatted for you to analyze,
but large enough, with enough different variables, to demonstrate the knowledge you gained in
Chapters 1, 2 and 3 and the lab sessions.
Save the dataset as a csv file in the following format:
Data_(Sector)_(YourStudentIDNumber).csv
Example:
If you select a dataset from the education sector and your student ID number is XX1111111,
then your csv data file should be saved as Data_Education_XX1111111.csv
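
The naming rule above can also be applied from R. The following is a minimal sketch in base R, not a required method: the helper `data_file_name` and the data frame `my_data` are hypothetical, and the sector and ID are the ones from the example above.

```r
# Hypothetical helper: builds the file name in the format the handout asks for.
data_file_name <- function(sector, student_id) {
  sprintf("Data_%s_%s.csv", sector, student_id)
}

# Placeholder data frame with empty values; replace with your own dataset.
my_data <- data.frame(year = c(2019, 2020, 2021),
                      value = c(NA_real_, NA_real_, NA_real_))

file_name <- data_file_name("Education", "XX1111111")

# tempdir() is used only for illustration; use your project folder instead.
out_path <- file.path(tempdir(), file_name)

# row.names = FALSE keeps R's row index out of the csv file.
write.csv(my_data, out_path, row.names = FALSE)
```

Building the name with a helper avoids typing it by hand, which makes it harder to misspell the sector or ID.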

2. Data documentation

The documentation (i.e., the “Readme.txt” file) that accompanies each project data set is as important as the
data itself. This information permits collaborators and other analysts to understand any limitations or special
characteristics of the data that may impact its use. The following outline and content are recommended and
should be adhered to as closely as possible to make the documentation consistent across all data sets.
Data Set Documentation/Readme Outline:

a) Save the text file in the following format

Readme_(Sector)_(YourStudentIDNumber).txt

b) Your readme file should contain the following

• Sector:
• Title: A descriptive title that matches the content of the data set
• Author: Your name
• Author_ID: Student Index Number
• Email: Email address
• Description: Introduction to your dataset (Max 200 words)
• Source: Data source (e.g., web pages, annual reports, publications). Provide web address references, if
any (i.e., links to any publications or additional documentation, such as a project web site, if available).
• Format: A data frame with xx rows and yy variables.

Replace xx with the number of rows and yy with the number of columns in your dataset.

• Provide a SEPARATE description for EACH column of your dataset using the following format.

column_1_name - Variable description
column_2_name - Variable description
column_3_name - Variable description
...

• Other: Other remarks. Any other information related to the dataset, if any (e.g., special or unusual
incidents)

If you have multiple datasets, save them as separate csv files and prepare a separate readme file
for each dataset.
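
Parts of the readme outline can be filled in directly from the data, in particular the row and column counts and the column names. The sketch below shows one possible way to do this in base R. The helper `write_readme_skeleton` and the data frame `my_data` are hypothetical; the descriptive fields are left blank for you to complete by hand.

```r
# Hypothetical helper: writes a readme skeleton that follows the outline above.
write_readme_skeleton <- function(data, sector, student_id, dir = tempdir()) {
  path <- file.path(dir, sprintf("Readme_%s_%s.txt", sector, student_id))
  lines <- c(
    paste("Sector:", sector),
    "Title: ",
    "Author: ",
    paste("Author_ID:", student_id),
    "Email: ",
    "Description: ",
    "Source: ",
    # nrow() and ncol() supply the xx and yy values of the Format line.
    sprintf("Format: A data frame with %d rows and %d variables.",
            nrow(data), ncol(data)),
    "",
    # One line per column, to be completed with a real description.
    paste(names(data), "- Variable description"),
    "",
    "Other: "
  )
  writeLines(lines, path)
  path
}

# Placeholder data frame with empty values; replace with your own dataset.
my_data <- data.frame(year = c(2020L, 2021L),
                      district = c(NA_character_, NA_character_),
                      value = c(NA_real_, NA_real_))

readme_path <- write_readme_skeleton(my_data, "Education", "XX1111111")
readLines(readme_path)
```

Generating the Format line and the column list from the data keeps the readme consistent with the csv file if the dataset changes later.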

3. Analysis

Produce THREE meaningful visual representations based on the dataset using the R programming language
(ggplot2 R package) and interpret the results. This assignment is based on your understanding of the
concepts discussed in the first three chapters and the lab sessions.
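
As a rough illustration of what three ggplot2 visualizations might look like, the sketch below builds a line chart, a bar chart and a box plot. The data frame `toy`, its column names and its values are placeholders made up for illustration only; they are not real Sri Lankan data, and the chart types are one possible choice rather than the required ones.

```r
library(ggplot2)

# Placeholder values for illustration only; not real data.
toy <- data.frame(
  year   = rep(2018:2021, times = 2),
  sector = rep(c("A", "B"), each = 4),
  value  = c(10, 12, 11, 15, 7, 9, 8, 12)
)

# 1. Line chart: change in value over time, one line per sector.
p_line <- ggplot(toy, aes(x = year, y = value, colour = sector)) +
  geom_line() +
  geom_point() +
  labs(title = "Value by year", x = "Year", y = "Value")

# 2. Bar chart: geom_col() stacks the yearly values, giving a total per sector.
p_bar <- ggplot(toy, aes(x = sector, y = value)) +
  geom_col() +
  labs(title = "Total value by sector", x = "Sector", y = "Total value")

# 3. Box plot: spread of the yearly values within each sector.
p_box <- ggplot(toy, aes(x = sector, y = value)) +
  geom_boxplot() +
  labs(title = "Distribution of value by sector", x = "Sector", y = "Value")

# Printing a ggplot object draws it.
p_line
p_bar
p_box
```

Each plot answers a different question (trend, total, spread), which is one way to make three visualizations complementary rather than repetitive.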

4. Result communication

Use rmarkdown to create a brief (less than five-page) report based on your findings. Include the visualizations
with the R code and a brief explanation of each visualization in your answer.
The following topics should be addressed in your report.

• Research objectives
• Three meaningful visual representations
• Interpretation of the visual representations

• References: This is the last section of your report. Here you should provide all the references that you
used for your report writing.
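
The report sections listed above could be laid out in an rmarkdown file along the following lines. This is only a sketch under stated assumptions: the title, author text, chunk label and output format are placeholders, and the file name uses the example sector and ID from earlier in this handout. The script writes a skeleton file; rendering is switched off by default because it needs the rmarkdown package and pandoc.

```r
# Hypothetical file name following the handout's naming rule.
rmd_path <- file.path(tempdir(), "Report_Education_XX1111111.rmd")

# Three backticks open and close an R code chunk in rmarkdown.
fence <- strrep("`", 3)

skeleton <- c(
  "---",
  'title: "TM5140 Assignment 1"',
  'author: "Your name (XX1111111)"',
  "output: html_document",
  "---",
  "",
  "# Research objectives",
  "",
  "# Visual representations and interpretation",
  "",
  # echo=TRUE shows the R code next to the plot it produces.
  paste0(fence, "{r plot-1, echo=TRUE}"),
  "# ggplot2 code for the first visualization goes here",
  fence,
  "",
  "Interpretation of the first visualization.",
  "",
  "# References"
)
writeLines(skeleton, rmd_path)

# Set to TRUE to knit the report; requires rmarkdown and pandoc.
do_render <- FALSE
if (do_render) {
  rmarkdown::render(rmd_path)
}
```

Repeating the chunk and interpretation pair three times gives one section per visualization, with the references kept as the last section.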

Ethical writers ALWAYS acknowledge the contributions of others and the source of their ideas. Any verbatim
text taken from another author must be enclosed in quotation marks. Please acknowledge every source
(including R code) you use in your writing, whether you paraphrase it, summarize it or enclose it in quotations.
At the end of this activity you will have the following four documents:

1. Data_(Sector)_(YourStudentIDNumber).csv
2. Readme_(Sector)_(YourStudentIDNumber).txt
3. Report_(Sector)_(YourStudentIDNumber).rmd
4. Report_(Sector)_(YourStudentIDNumber).pdf (or .doc or .html)

NOTE: If you have multiple data files save them as follows

1. Data_(Sector)_(YourStudentIDNumber)_(dataset_1).csv

Data_(Sector)_(YourStudentIDNumber)_(dataset_2).csv

2. Readme_(Sector)_(YourStudentIDNumber)_(dataset_1).txt

Readme_(Sector)_(YourStudentIDNumber)_(dataset_2).txt
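
Before submitting, it may help to list the expected file names and check which ones exist. The sketch below is one possible way to do this in base R. The helper `expected_files` is hypothetical, and it assumes that the parentheses in the patterns above mark placeholders, so that a dataset suffix is written as `_dataset_1` and so on.

```r
# Hypothetical helper: lists the file names the handout asks for.
expected_files <- function(sector, student_id, datasets = NULL,
                           report_ext = c("rmd", "pdf")) {
  stem <- paste(sector, student_id, sep = "_")
  # With several datasets, each data and readme file gets its own suffix.
  suffix <- if (is.null(datasets)) "" else paste0("_", datasets)
  c(
    paste0("Data_", stem, suffix, ".csv"),
    paste0("Readme_", stem, suffix, ".txt"),
    paste0("Report_", stem, ".", report_ext)
  )
}

files <- expected_files("Education", "XX1111111",
                        datasets = c("dataset_1", "dataset_2"))

# Shows which expected files exist in the current working directory.
data.frame(file = files, exists = file.exists(files))
```

Calling `expected_files("Education", "XX1111111")` without `datasets` gives the four names for the single-dataset case.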

Deadline: 12 March 2022

If you have any issues with submitting your assignment due to any unexpected circumstances, you can apply
for an extension or special consideration. For special consideration or deadline extensions, please send me
an email (priyangad@uom.lk) on or before 5 March 2022, informing me of your request with the email title:
5140_Assignment1_deadline_extension_(yourIndexNumber). Otherwise, a penalty will be applied for late
submission of assignments (0.5 mark for each extra day).

Discussion Forum

I have initiated a discussion forum for this assignment (Assignment 1-Forum). Use this forum to discuss any
questions or concerns related to this project. I am happy to participate in the forum discussion, but I want
you to take part in this discussion and help each other.
