---
title: 'ETC1010-5510: Introduction to Data Analysis'
author: "Your name"
output:
  html_document:
    css: CSSBackground.css
---

```{r setup, include = TRUE, echo = FALSE, cache = FALSE}
# Please do not touch this R code chunk!
knitr::opts_chunk$set(
  echo = TRUE,
  eval = TRUE,
  out.width = "70%",
  fig.width = 8,
  fig.height = 6,
  fig.retina = 3)

set.seed(6)

filter <- dplyr::filter
```

## Instructions to Students

**This is an individual assignment and you must work on it on your own. Collaboration on the
assignment constitutes collusion. For more on collusion and misconduct, please see this [webpage](https://connect.monash.edu/s/article/FAQ-2144).**

This assignment is designed to simulate a scenario in which you are taking over someone's existing
work and continuing with it to draw some further insights. The dataset you will use has already
gone through preliminary cleaning, and your job is to use this cleaned dataset to answer
questions.

You have just joined the Australian Security Intelligence Organisation (ASIO) as a data analyst. Your
first job is to help ASIO understand the existing landscape of security threats. To get you started in
this new role, ASIO has asked you to perform a short exploratory data analysis (EDA) on a snippet of
global terrorism data that the agency has compiled. You are to communicate your findings about existing
patterns in global terrorism to the chief analytics leader. This is not a formal report, but rather
something you are giving to your manager that describes the data with some interesting insights.

Please make sure you read the hints throughout the assignment to help guide you on the tasks.

The points allocated for each of the elements in the assignment are marked in the questions and in
certain cases, code scaffolding has been provided. However, if you feel this scaffolding is unhelpful,
you are not obliged to use it.

## Marking + Grades

This assignment will be worth **10%** of your total grade. **Due on: 5 March 2024, 4:00 PM
(Melbourne time)**. **Late submissions will not be accepted.**

For this assignment, you will need to upload the following into Moodle:

- The rendered HTML file saved as a PDF. The assignment will only be marked if the PDF is uploaded to
Moodle. **The submitted assignment PDF file must have all the code and output visible.**

- To complete the assignment, you will need to fill in the blanks with appropriate R code for some
questions. These sections are marked with `___`. For other questions, you will need to write the
entire R code chunk.

- **At a minimum, your assignment should be able to be "knitted"** using the `Knit` button for your
R Markdown document, so that you can produce an HTML file, save it as a PDF and upload it to
Moodle. You will be reminded how to save the rendered HTML file as a PDF in the Week 3 tutorials.

If you want to see what the assignment looks like as you progress, remember that you can set the
R chunk option `eval = FALSE`, as in the example below, so that the file still knits:

````markdown
```{r this-chunk-will-not-run, eval = FALSE} `r ''`
a <- 1 + 2
```
````

**If you use `eval = FALSE` or `echo = FALSE`, please remember to set them back to `eval = TRUE`
and `echo = TRUE` before you submit the assignment, so that all your R code runs.**

**IMPORTANT: You must use R code to answer all the questions in the report.**

## Due Date

This assignment is due on 5 March 2024, 4:00 PM. You will submit the knitted HTML file **saved as a
PDF** via Moodle. Please make sure you add your name to the YAML header of the Rmd file before you
knit it and save it as a PDF.

## How to get help on R functions

Remember, you can look up the help file for a function by typing `?function_name`. For example,
`?mean`.

## Load all the libraries that you need here

```{r libraries, eval = TRUE, message = FALSE, warning = FALSE}

library(tidyverse)

```

## Reading and preparing data

```{r read-in-data, message = FALSE, warning = FALSE}
library(readxl)

terror <- read_xlsx("data/globalterrorismdb_0522dist.xlsx")
```

## Question 1: Display the first 10 rows of the data set (1pt)

**Hint:** Check `?head` in your R console.
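
As an illustration only, the sketch below uses a small made-up tibble called `toy` (hypothetical, not the ASIO data) in place of `terror`; the column names `eventid` and `iyear` are assumptions. `head()` returns the first `n` rows, and `n` defaults to 6, so it has to be set explicitly.

```r
# Hypothetical stand-in for `terror`: 15 made-up events
toy <- tibble::tibble(
  eventid     = 1:15,
  iyear       = 1990 + 0:14,
  country_txt = rep(c("France", "Peru", "India"), times = 5)
)

# head() returns the first n rows (n = 6 by default)
first_rows <- head(toy, n = 10)
first_rows
```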

## Question 2: How many observations and variables does the _terror_ data set have (1pt)? Use inline code to complete the sentence below (2pts).

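Write the completed sentence as ordinary text, outside any code chunk, so that the inline code is
evaluated when you knit. It will look like the following:

```{r}
# The number of observations is `r ___` (1pt) and the number of variables is `r ___` (1pt)
```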
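
A minimal sketch on a made-up tibble (`toy`, hypothetical): in R, observations are rows and variables are columns, so the base functions `nrow()` and `ncol()` give the two counts for any data frame, and `dim()` returns both at once.

```r
# Hypothetical stand-in for `terror`: 6 rows (observations) x 3 columns (variables)
toy <- tibble::tibble(
  eventid     = 1:6,
  iyear       = c(1994, 1995, 2000, 2011, 2012, 2015),
  country_txt = c("France", "Peru", "India", "France", "Peru", "India")
)

n_obs  <- nrow(toy)  # number of observations (rows)
n_vars <- ncol(toy)  # number of variables (columns)
dim(toy)             # both at once: rows first, then columns
```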
## Question 3: What is the name of the 3rd variable in this data set (2pts)? Use R commands to answer this question.
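
A short sketch, again on a made-up tibble (`toy`, hypothetical): `names()` returns the column names of a data frame in order, so indexing it with `[3]` gives the third one. Selecting the column by position with dplyr's `select()` is an equivalent tidyverse route.

```r
library(dplyr)

toy <- tibble::tibble(
  eventid     = 1:3,
  iyear       = c(1995, 2001, 2011),
  country_txt = c("France", "Peru", "India")
)

# names() returns all column names in order; [3] picks the third
third_name <- names(toy)[3]
third_name

# tidyselect alternative: keep the 3rd column by position, then read its name
names(select(toy, 3))
```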

## Question 4: Using the terror data set, rename the first variable to event_id and save this new data frame as terror (2pts). Display the first 4 rows and 9 columns corresponding to the country "France" (1pt).

```{r rename_variable, eval=FALSE}
terror <- rename(terror, ___ = ___)

terror %>%
  dplyr::___(country_txt == "France") %>%
  head(___)
```
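
The same three steps (rename, filter rows, limit rows and columns) are sketched below on a made-up 10-column tibble (`toy`, hypothetical). That the first column is called `eventid`, and the filler columns `v5` to `v10`, are assumptions for illustration. Note that `rename()` takes `new_name = old_name`, and `select(1:9)` is one possible way to keep the first nine columns by position.

```r
library(dplyr)

# Hypothetical stand-in with 10 columns; `eventid` as the first column is an
# assumption, v5 to v10 are filler columns
toy <- tibble::tibble(
  eventid     = 1:10,
  iyear       = 2000:2009,
  imonth      = 1:10,
  country_txt = rep(c("France", "Peru"), times = 5),
  v5 = 0, v6 = 0, v7 = 0, v8 = 0, v9 = 0, v10 = 0
)

# rename(data, new_name = old_name)
toy <- rename(toy, event_id = eventid)

france_preview <- toy %>%
  filter(country_txt == "France") %>%  # French events only
  select(1:9) %>%                       # first 9 columns, by position
  head(4)                               # first 4 matching rows
france_preview
```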

## Question 5: Using the terror data set, filter the data to keep only the years 1995 to 2011. Call this new dataset terror_short. (4pt)
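
A minimal sketch on made-up data, assuming the event year is stored in a column called `iyear` (a GTD-style name that this document does not itself show). dplyr's `between()` includes both endpoints, so 1995 and 2011 are kept; the two-condition `filter()` in the comment is equivalent.

```r
library(dplyr)

# Hypothetical events; `iyear` is an assumed column name
toy <- tibble::tibble(
  eventid = 1:6,
  iyear   = c(1990, 1995, 2003, 2011, 2012, 2019)
)

# between(x, left, right) is inclusive: it tests left <= x & x <= right
toy_short <- toy %>%
  filter(between(iyear, 1995, 2011))

# Equivalent: filter(toy, iyear >= 1995, iyear <= 2011)
toy_short
```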

## Question 6: How many variables and observations are in the _terror_short_ data frame (2pts)?

**Hint:** You can use `count` or `nrow` to complete this.
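
The two functions in the hint return the row count in different shapes, as this sketch on a made-up tibble (`toy_short`, hypothetical) shows: `nrow()` gives a plain integer, while dplyr's `count()` gives a one-row tibble with a column `n`. `count()` does not report the number of variables; `ncol()` does.

```r
library(dplyr)

# Hypothetical stand-in for `terror_short`
toy_short <- tibble::tibble(
  eventid     = 1:4,
  iyear       = c(1995, 2003, 2003, 2011),
  country_txt = c("France", "Peru", "India", "Peru")
)

nrow(toy_short)    # a plain integer: 4
count(toy_short)   # a 1 x 1 tibble with column n = 4
ncol(toy_short)    # number of variables: 3
```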

## Question 7: Use terror_short dataframe to answer the following questions.


### Question 7.1: Since 1995, which countries have experienced the most terrorist attacks? Display the five countries with the highest frequency of attacks. (5pt)

```{r, eval=FALSE}
terror_short %>%
  ___(country_txt) %>%
  ___(n = n()) %>%
  arrange(___) %>%
  ___(5)
```
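
The group, count, sort, keep-top-k pattern is sketched below on made-up data (`toy_short`, hypothetical, one row per attack). dplyr's `count(country_txt, sort = TRUE)` is a one-step shortcut for the first three steps. With ties at fifth place, `head(5)` keeps whichever tied row comes first, whereas `slice_max(n, n = 5)` keeps all ties by default.

```r
library(dplyr)

# Hypothetical events, one row per attack
toy_short <- tibble::tibble(
  country_txt = c("Peru", "India", "India", "Iraq", "Iraq", "Iraq",
                  "France", "Chile", "Chile", "Spain")
)

top_countries <- toy_short %>%
  group_by(country_txt) %>%
  summarise(n = n()) %>%   # number of attacks per country
  arrange(desc(n)) %>%     # largest counts first
  head(5)                  # keep the top five
top_countries
```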

### Question 7.2: In Western Europe, which country has experienced the most attacks? (2pt)
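
A sketch on made-up data, assuming the region is stored in a column called `region_txt` with values such as "Western Europe" (a GTD-style name; this document does not show it, so check `names()` first). The key point is to filter to the region before counting, so that countries elsewhere cannot win.

```r
library(dplyr)

# Hypothetical events; `region_txt` is an assumed column name (GTD-style)
toy_short <- tibble::tibble(
  region_txt  = c("Western Europe", "Western Europe", "Western Europe",
                  "Eastern Europe", "South America"),
  country_txt = c("Spain", "Spain", "France", "Ukraine", "Peru")
)

we_top <- toy_short %>%
  filter(region_txt == "Western Europe") %>%  # restrict to the region first
  count(country_txt, sort = TRUE) %>%         # attacks per country, largest first
  slice_head(n = 1)                           # the single most-attacked country
we_top
```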

### Question 7.3: What are the three most common types of attack in Eastern Europe? Use the variable `attacktype1_txt` to answer this question. (2pt)
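
The same filter-then-count idea, applied to `attacktype1_txt`, is sketched below on made-up rows (`toy_short`, hypothetical; `region_txt` is again an assumed column name). `slice_max(order_by = n, n = 3)` keeps the three largest counts in descending order; if there were a tie at third place it would return every tied row, unless `with_ties = FALSE` is set.

```r
library(dplyr)

# Hypothetical events: 10 in Eastern Europe, 1 elsewhere
toy_short <- tibble::tibble(
  region_txt = c(rep("Eastern Europe", 10), "Western Europe"),
  attacktype1_txt = c(rep("Bombing/Explosion", 4), rep("Armed Assault", 3),
                      rep("Assassination", 2), "Hostage Taking (Kidnapping)",
                      "Armed Assault")
)

top3 <- toy_short %>%
  filter(region_txt == "Eastern Europe") %>%  # region first
  count(attacktype1_txt) %>%                  # attacks per attack type
  slice_max(order_by = n, n = 3)              # three most common types
top3
```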

## Question 8: Use terror_short dataframe to answer the following questions.

### Question 8.1: Calculate the number of attacks per month across all years. Display the month with the highest number of attacks. (4pt)

```{r, eval=FALSE}
terror_short %>%
  ___(imonth) %>%
  ___(n = n()) %>%
  arrange(___) %>%
  ___(___)
```
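
A sketch on made-up data (`toy_short`, hypothetical; `iyear` is an assumed column name). Counting by `imonth` alone pools every year together, which is what "across all years" asks for here. One caution for the real data: as far as we know, the GTD codebook records an unknown month as `imonth == 0`, so it is worth checking for such rows (and possibly dropping them with `filter(imonth != 0)`) before ranking; treat this as an assumption to verify.

```r
library(dplyr)

# Hypothetical events: year and month of each attack
toy_short <- tibble::tibble(
  iyear  = c(1995, 1995, 1996, 1996, 1996, 1997, 1997),
  imonth = c(1,    3,    3,    3,    7,    1,    3)
)

month_totals <- toy_short %>%
  count(imonth)                   # attacks per month, pooled over all years

busiest_month <- month_totals %>%
  slice_max(order_by = n, n = 1)  # month with the most attacks
busiest_month
```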

### Question 8.2: Calculate the average number of attacks per month across all years, and display the month with the highest average. Which month has the highest variability, as measured by the standard deviation? (5pt)
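
One reading of this question, sketched on made-up data (`toy_short`, hypothetical; `iyear` is an assumed column name), is a two-step summary: first count attacks in each year-month pair, then take the mean and standard deviation of those yearly counts within each month.

```r
library(dplyr)

# Hypothetical events: year and month of each attack
toy_short <- tibble::tibble(
  iyear  = c(1995, 1995, 1995, 1996, 1996, 1997, 1997, 1997, 1997),
  imonth = c(1,    1,    2,    1,    2,    1,    2,    2,    2)
)

# Step 1: attacks in each month of each year
monthly <- toy_short %>%
  count(iyear, imonth)

# Step 2: summarise those yearly counts month by month
month_stats <- monthly %>%
  group_by(imonth) %>%
  summarise(mean_n = mean(n), sd_n = sd(n))

month_stats %>% slice_max(order_by = mean_n, n = 1)  # highest average
month_stats %>% slice_max(order_by = sd_n, n = 1)    # highest variability
```

A design caveat: `count()` only creates rows for year-month pairs that actually occur, so a month with zero attacks in some year is silently left out of its mean and standard deviation. If that matters, `tidyr::complete()` can add those combinations with `n = 0` before step 2.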

### Question 8.3: Use a boxplot to display the frequency of attacks by month across all years. By visual analysis, which months have the highest and lowest variability? (2pts)
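
A hedged sketch of the plotting step with ggplot2 (attached by `library(tidyverse)`). The `monthly` tibble below is made up and stands in for per-year, per-month counts such as those produced by `count(iyear, imonth)` (`iyear` is an assumed column name). Wrapping `imonth` in `factor()` makes the x axis discrete, so each month gets its own box, built from one count per year.

```r
library(ggplot2)

# Made-up attack counts: one row per year x month
monthly <- tibble::tibble(
  iyear  = rep(1995:1998, times = 3),
  imonth = rep(1:3, each = 4),
  n      = c(5, 7, 6, 9,  2, 3, 2, 4,  8, 12, 4, 10)
)

p <- ggplot(monthly, aes(x = factor(imonth), y = n)) +
  geom_boxplot() +
  labs(x = "Month", y = "Attacks in that month (one value per year)",
       title = "Monthly attack counts across years")
p
```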

### Question 8.4: Recalling the relationship between boxplots and empirical quantiles, state which month has the highest interquartile range of attacks. (4pts)
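
In ggplot2's `geom_boxplot()`, the box runs from the first quartile (25th percentile) to the third quartile (75th percentile), so its height is the interquartile range, IQR = Q3 - Q1. The sketch below computes this numerically on the same made-up `monthly` counts (hypothetical data), using base R's `quantile()` and `IQR()`, which both use the same default quantile definition.

```r
library(dplyr)

# Made-up attack counts: one row per year x month
monthly <- tibble::tibble(
  iyear  = rep(1995:1998, times = 3),
  imonth = rep(1:3, each = 4),
  n      = c(5, 7, 6, 9,  2, 3, 2, 4,  8, 12, 4, 10)
)

month_iqr <- monthly %>%
  group_by(imonth) %>%
  summarise(q1  = quantile(n, 0.25),  # bottom of the box
            q3  = quantile(n, 0.75),  # top of the box
            iqr = IQR(n))             # q3 - q1, the box height

month_iqr %>% slice_max(order_by = iqr, n = 1)  # widest box
```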

### Question 8.5: Do you observe any pattern in the frequency of attacks throughout the year? Use two to three sentences to answer this question. (5pt)

**Hint:** Make sure your boxplot is ordered from January to December.
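
Month numbers plot in calendar order, but month names stored as plain character strings are sorted alphabetically by ggplot2 (Apr, Aug, Dec, ...). A factor with explicit levels fixes the order. This sketch uses made-up counts (hypothetical) and base R's built-in `month.abb` vector of abbreviated month names; note that any month value outside 1 to 12, such as 0, becomes `NA`.

```r
library(dplyr)
library(ggplot2)

# Made-up counts with months stored as numbers, deliberately out of order
monthly <- tibble::tibble(
  imonth = c(10, 2, 12, 1, 2, 10),
  n      = c(4, 6, 3, 5, 7, 5)
)

# levels = 1:12 fixes the order Jan..Dec; month.abb is base R's
# c("Jan", "Feb", ..., "Dec"). Values outside 1:12 (e.g. 0) become NA.
monthly <- monthly %>%
  mutate(month = factor(imonth, levels = 1:12, labels = month.abb))

p <- ggplot(monthly, aes(x = month, y = n)) +
  geom_boxplot() +
  scale_x_discrete(drop = FALSE) +  # show all 12 months, even empty ones
  labs(x = "Month", y = "Attacks per month")
p
```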
