
09/03/2022, 11:56 R Markdown and Vectorisation

R Markdown and Vectorisation


Patrick Baker & Kaitlyn Hammond
7 March 2022

Preamble
After the previous session, you should be comfortable with basic mathematical operations and functions in R,
know how to assign objects and perform operations on those objects, and know how to save your work in a
script.

This week, we will introduce R Markdown ( .Rmd ), the document class we will be using for submitting the
assessed pracs, and show you how to use it to make reports with embedded R code, R output, and R
graphics. We will also show you a few common problems to avoid when you are working with .Rmd files.

In this prac you will also learn the basics of generating vectors and data frames, accessing specific elements in
them, creating new variables, and summarising your data.

After this session you should be able to:

Edit and knit (i.e., compile) an R Markdown document

Generate vectors and data frames

Access and summarise the data in those vectors and data frames

Append a new variable to a data frame

Make a basic histogram of your data.

Introduction to R Markdown
R Markdown is a document format used with R that combines code, contained in “chunks”, with text and
document formatting. When .Rmd files are compiled (a process called “knitting”), the result is a clean
document that contains code, code outputs, and text.

Each week in which there is an assessed practical, there will be an .Rmd template document in your project
folder that you will fill in, knit, and submit as an .html file to the LMS.

Anatomy of an R Markdown document


When you open an R Markdown document in RStudio, it will look something like this:

https://c4944651f55f488c9623f3a769087078.app.rstudio.cloud/file_show?path=%2Fcloud%2Fproject%2FWeek_02_RMarkdown_and_vectorisa… 1/14

There are three key components to any R Markdown document, all of which you can see in this figure. These
are:

the header
the setup chunk
your R code chunks

The header in the R Markdown file creates the title of the document, specifies the author and date, and tells R
Markdown what kind of document to create. It is located at the top of the .Rmd file and looks like this:

The only thing that you will need to do to the header for the assessments in this class is fill in your name and
student ID.

The next key part of the document is the “setup” code chunk.

A code chunk is a section of the document that contains code that will run when the R Markdown file is
compiled. Code chunks are shown as a different color to the normal text that you might write in the R
Markdown file. The “setup” code chunk is where you specify the R packages that you will need for the prac.
Each package is enclosed in the library() function, which just tells R that we would like to have that R
package available for use. Finally, the setup code chunk also contains some information that controls how the
document will be compiled. We will put all of the necessary information into the setup code chunk that is
required to complete the prac. You will not need to do anything to the setup code chunk.

The third key element of the R Markdown document is the set of blank code chunks. This is where you will
enter your R code to answer the questions in the practical handout. Once you are happy with the code you have
written in the code chunk, you can click on the little green arrow in the upper right corner of the code chunk to
run the code.

In addition to these three key elements of the R Markdown document, you can also add plain text outside of
the code chunks. This is where you will provide your interpretation of the R code, R output, or R graphics.

Troubleshooting R Markdown
There are several common mistakes that people make when starting out with R Markdown documents. First,
the code chunks have a fixed structure that must be maintained. They begin and end with three backticks (the
left-hand quote mark, `; see image below). If you remove any of these or add an extra backtick, your code chunk
will not run.

Second, any outputs you want to show need to be typed out to print them. If you assign an object but don’t
type the object name (as in the image below), it will not print.

You need to explicitly state the name of the object within the R code chunk for it to be printed:
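As a minimal sketch of this behaviour (the object name total is made up for illustration), the code below assigns a value and then prints it in two equivalent ways:

```r
# Assignment on its own stores the value but prints nothing
total <- 5 + 3

# Typing the object name on its own line prints the value
total

# print() does the same thing explicitly
print(total)
```

Only the last two lines produce output in the knitted document.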

Finally, many people mistakenly type their explanation to an answer inside, rather than outside, the R code
chunk. This inevitably leads to an error message when R cannot understand what the text is asking:

So, remember: put your R code inside the code chunks and your explanations outside of the code
chunks!

Hint: if you type a # before text inside a code chunk, R treats that text as a comment and does not try to run
it. So if you want to type notes directly in the code chunk, make sure you put a # before the text.
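A short sketch of how comments look inside a code chunk is shown here. The object name mean_height and the three values are only illustrative:

```r
# This whole line is a comment, so R skips it
mean_height <- mean(c(70, 65, 63))  # a comment can also follow code on the same line
mean_height
```

Longer interpretation of your results still belongs outside the chunk as plain text.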

R Markdown Visual Mode


Opening the .Rmd document and seeing all the code can be a bit intimidating if you are not used to using
statistical software, html, or programming! Fortunately, RStudio has recently added a new tool – the Visual
Editor – to its R Markdown panel to make it easier to see what you are doing.

To switch to visual mode, click on the little drawing compass icon in the upper right corner:

Your R Markdown panel will change to look like this:

The visual editor makes it a bit easier to see the structure and formatting of the R Markdown document. This
should make it more obvious where you need to write your code chunks and where you need to write your
explanation.

Knitting your R Markdown document


Once you have finished editing your R Markdown document, you will need to compile it through a process
called “knitting”. To do this, select the knit button (next to the ball of yarn icon) at the top left of your R
Markdown document:

When you do this you will see all sorts of information scroll past in your console panel under the R Markdown
tab. In general, you can ignore this. If the file compiles correctly, an .html file will be generated and should be
listed in the files tab in the bottom-right pane of your RStudio window.

To have a look at the file, either double-click on it or click on it and select “View in Browser”.

To submit your assessment, you will need to export the .html file that you just created from your R Markdown
file to your computer. To do this, tick the box to the left of the document and then select “Export…” from the
“More” dropdown menu:

Now you should have a beautiful document with all your code and outputs to submit to the AGRI90075 Canvas
assessment submission page!

Vectors and vectorisation


Last week we introduced vectors – objects that contain one or more elements, where the elements can be
numbers, letters, or words (basically anything that you can type on a keyboard). In today’s prac, we will be
exploring various aspects of vectors and vectorisation. For example, here is a numeric vector with four
elements:

vector1 <- c(4,7,8,1)

vector1

## [1] 4 7 8 1

When you run this code, you will notice in your environment pane that there is now an object called vector1
with num [1:4] beside it. This tells us that it is a numeric vector with four elements.
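If you would rather check this from code than from the environment pane, the following sketch uses three base R functions to report the same information:

```r
vector1 <- c(4, 7, 8, 1)

class(vector1)   # the type of data stored: "numeric"
length(vector1)  # the number of elements: 4
str(vector1)     # compact summary, similar to what the environment pane shows
```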

If we want to see the entire vector, we can just type the name of the vector as we did above. However, you
may just want to extract one of the elements (eg, the 3rd element of vector1 ). To do this, R uses bracket
( [] ) notation:

vector1[3]

## [1] 8

While this may not seem particularly helpful for this tiny vector, it can be very useful in larger vectors with more
items than can be displayed in R.
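The same bracket notation can pull out more than one element at a time. The sketch below is an illustration only; the object names first_and_last and middle_two are made up:

```r
vector1 <- c(4, 7, 8, 1)

# Give a vector of positions to extract several elements
first_and_last <- vector1[c(1, 4)]
first_and_last   # 4 1

# A colon creates a run of consecutive positions
middle_two <- vector1[2:3]
middle_two       # 7 8
```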

For today’s prac we will explore some useful operations for vector manipulation using a small data set that
contains information describing a number of black cherry trees. Specifically, the dataset includes the diameter,
the height, and the volume measurements for 31 black cherry trees.

We will start by creating a vector of the tree diameters (measured in inches):

diameter <- c(8.3, 8.6, 8.8, 10.5, 10.7, 10.8, 11.0, 11.0, 11.1, 11.2,
              11.3, 11.4, 11.4, 11.7, 12.0, 12.9, 12.9, 13.3, 13.7, 13.8,
              14.0, 14.2, 14.5, 16.0, 16.3, 17.3, 17.5, 17.9, 18.0, 18.0, 20.6)

Hint: You can copy and paste this code straight from the manual!

Now let’s create the vector of heights (measured in feet) for the same trees:

height <- c(70, 65, 63, 72, 81, 83, 66, 75, 80, 75, 79,
            76, 76, 69, 75, 74, 85, 86, 71, 64, 78, 80,
            74, 72, 77, 81, 82, 80, 80, 80, 87)

QUESTION 1: What are the diameter and height of the 17th tree? (You will need to show your R code for
doing this, so that we know that you didn’t just count to the 17th value…)

Data frames
For most data analysis, you will have multiple variables stored in a table or data frame. In this section, we will
build a data frame from multiple vectors, do some calculations, and plot the data.

Let’s take our diameter and height objects and bind them together into a data frame that we will call
cherry :

cherry <- data.frame(diameter, height)

cherry

##    diameter height
## 1       8.3     70
## 2       8.6     65
## 3       8.8     63
## 4      10.5     72
## 5      10.7     81
## 6      10.8     83
## 7      11.0     66
## 8      11.0     75
## 9      11.1     80
## 10     11.2     75
## 11     11.3     79
## 12     11.4     76
## 13     11.4     76
## 14     11.7     69
## 15     12.0     75
## 16     12.9     74
## 17     12.9     85
## 18     13.3     86
## 19     13.7     71
## 20     13.8     64
## 21     14.0     78
## 22     14.2     80
## 23     14.5     74
## 24     16.0     72
## 25     16.3     77
## 26     17.3     81
## 27     17.5     82
## 28     17.9     80
## 29     18.0     80
## 30     18.0     80
## 31     20.6     87

The syntax is straightforward: put the objects that you want to include in the data frame into the command
data.frame(). Note that the order in which you write them in the parentheses is the order in which they will be
written in the data frame (ie, diameter will be the first column and height will be the second column). The data
is now stored in the cherry data frame. If you look at your “environment” pane, you will see a triangle next to
the cherry object. If you click the triangle, you will see the two column names and the first entries for each
column.

If you click on the word cherry , a new tab will open up in your Scripts pane showing you the full data frame.
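You can also inspect a data frame from code rather than by clicking. The sketch below uses a shortened, three-row stand-in called small_trees (a made-up name holding the first three trees) so that it runs on its own:

```r
# A three-row stand-in for the cherry data frame
small_trees <- data.frame(diameter = c(8.3, 8.6, 8.8),
                          height   = c(70, 65, 63))

str(small_trees)       # column names, types, and first values
names(small_trees)     # just the column names
nrow(small_trees)      # number of rows: 3
ncol(small_trees)      # number of columns: 2
head(small_trees, 2)   # the first two rows only
```

The same functions work on the full cherry data frame.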

Removing data
Now that we have combined our vectors into a data frame, we do not need to keep the vectors in the
environment anymore. You can use rm() or remove() to remove the two vectors:

rm(diameter)

rm(height)

The diameter and height objects will no longer be visible in your environment. We like a tidy work
environment…
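As a small sketch of the same idea, rm() can take several objects in one call, and exists() lets you confirm that they are gone. The object names temp_a and temp_b are hypothetical:

```r
temp_a <- c(1, 2, 3)
temp_b <- c(4, 5, 6)

rm(temp_a, temp_b)   # rm() accepts several object names at once

exists("temp_a")     # FALSE: the object has been removed
ls()                 # lists the objects that remain in the environment
```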

Summarising your data


In R, the summary() function is a quick and easy way to generate some descriptive statistics for your data
frame. It is useful to look at a summary of your data before doing any analysis.

summary(cherry)

##     diameter         height
##  Min.   : 8.30   Min.   :63
##  1st Qu.:11.05   1st Qu.:72
##  Median :12.90   Median :76
##  Mean   :13.25   Mean   :76
##  3rd Qu.:15.25   3rd Qu.:80
##  Max.   :20.60   Max.   :87

For numerical data, the summary() function provides you with the mean, median, minimum, maximum, and first
and third quartiles.
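Each of these statistics can also be calculated on its own with a dedicated base R function. The sketch below uses a short, made-up vector called heights so that the results are easy to check by hand:

```r
heights <- c(70, 65, 63, 72, 81)

min(heights)       # smallest value: 63
max(heights)       # largest value: 81
mean(heights)      # arithmetic mean: 70.2
median(heights)    # middle value when sorted: 70
quantile(heights)  # minimum, quartiles, and maximum together
```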

Generating new variables


Often, data analysis requires you to perform calculations on your data that you may want to add to your data
frame. To select a column by name from a data frame, you can use the $ symbol.

cherry$diameter

##  [1]  8.3  8.6  8.8 10.5 10.7 10.8 11.0 11.0 11.1 11.2 11.3 11.4 11.4 11.7 12.0
## [16] 12.9 12.9 13.3 13.7 13.8 14.0 14.2 14.5 16.0 16.3 17.3 17.5 17.9 18.0 18.0
## [31] 20.6

We can also use the dollar sign ( $ ) to add a new variable to a data frame. Here we want to change the data
from imperial units (inches and feet) to metric units (centimetres and metres). To do this, we use “vectorisation”:
we apply an operation to a whole vector at once, and R returns a new vector in which every value has been
changed. Here we convert the diameter measurements in inches to diameter measurements in centimetres (by
multiplying by 2.54) and place the new values in the data frame as the variable diameter_cm :

cherry$diameter_cm <- cherry$diameter * 2.54 ### There are 2.54 cm in 1 inch

cherry

##    diameter height diameter_cm
## 1       8.3     70      21.082
## 2       8.6     65      21.844
## 3       8.8     63      22.352
## 4      10.5     72      26.670
## 5      10.7     81      27.178
## 6      10.8     83      27.432
## 7      11.0     66      27.940
## 8      11.0     75      27.940
## 9      11.1     80      28.194
## 10     11.2     75      28.448
## 11     11.3     79      28.702
## 12     11.4     76      28.956
## 13     11.4     76      28.956
## 14     11.7     69      29.718
## 15     12.0     75      30.480
## 16     12.9     74      32.766
## 17     12.9     85      32.766
## 18     13.3     86      33.782
## 19     13.7     71      34.798
## 20     13.8     64      35.052
## 21     14.0     78      35.560
## 22     14.2     80      36.068
## 23     14.5     74      36.830
## 24     16.0     72      40.640
## 25     16.3     77      41.402
## 26     17.3     81      43.942
## 27     17.5     82      44.450
## 28     17.9     80      45.466
## 29     18.0     80      45.720
## 30     18.0     80      45.720
## 31     20.6     87      52.324

The data frame cherry now has three variables. Notice that we wrote cherry after we added the new
variable in order for us to print out the data frame.
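To see vectorisation in isolation, here is a minimal sketch on made-up data. The object names (weights_lb, weights_kg, a, b) are hypothetical, and the conversion factor is the standard definition of 0.45359237 kg per pound:

```r
weights_lb <- c(10, 20, 30)   # hypothetical weights in pounds

# One multiplication is applied to every element of the vector
weights_kg <- weights_lb * 0.45359237
weights_kg

# Operations between two vectors of equal length work element by element
a <- c(1, 2, 3)
b <- c(10, 20, 30)
a + b   # 11 22 33
a * b   # 10 40 90
```

No loop is needed: R carries out the operation across the whole vector in one step.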

QUESTION 2: Create a new variable height_m . To convert feet to metres, multiply feet by 12 (inches in a
foot), then multiply by 2.54 (centimetres in an inch), then divide by 100 (centimetres in a metre). Add this new
variable to the cherry data frame. Print the data frame.

QUESTION 3: Calculate the conical volume in cubic metres of the trees. To do this you will create a new
variable volume in the cherry data frame that you will calculate using the following formula for the volume
of a cone:

$$V = \frac{1}{3} \pi r^{2} h$$

Where V = volume, h = height, and r = radius (ie, 1/2 of diameter). You will need to ensure that your units are
correct. The best way to do this is to convert the diameter_cm values from centimetres to metres in your
calculation. Do this by dividing by 100. Also, make sure that you use parentheses wisely.
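To illustrate why parentheses matter, the sketch below shows how R's order of operations changes a result. The numbers and object names are arbitrary and are not part of the tree data:

```r
# The exponent is applied before the division: 10 / 4
without_parentheses <- 10 / 2^2
without_parentheses   # 2.5

# Parentheses force the division to happen first: 5^2
with_parentheses <- (10 / 2)^2
with_parentheses      # 25

# R has a built-in constant for pi
pi
```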

QUESTION 4: What is the minimum, maximum, and median volume of the trees in the cherry data frame?

Selecting multiple entries or columns


Earlier we showed that you can use brackets ( [] ) to select a particular element within a vector. You can also
use this notation with data frames. Data frames have two dimensions – rows and columns – so you can use
dataframe[x,y] to select Row x and Column y. Here we select the 2nd entry in the 3rd column:

cherry[2,3]

## [1] 21.844

While it is rare that you will need to select a single value from within a data frame, we can use this notation to
subset a data frame (ie, select specific columns or rows). Here we select columns 2 and 3 from the data frame
(note the use of the colon (:) here):

cherry[,2:3]

##    height diameter_cm
## 1      70      21.082
## 2      65      21.844
## 3      63      22.352
## 4      72      26.670
## 5      81      27.178
## 6      83      27.432
## 7      66      27.940
## 8      75      27.940
## 9      80      28.194
## 10     75      28.448
## 11     79      28.702
## 12     76      28.956
## 13     76      28.956
## 14     69      29.718
## 15     75      30.480
## 16     74      32.766
## 17     85      32.766
## 18     86      33.782
## 19     71      34.798
## 20     64      35.052
## 21     78      35.560
## 22     80      36.068
## 23     74      36.830
## 24     72      40.640
## 25     77      41.402
## 26     81      43.942
## 27     82      44.450
## 28     80      45.466
## 29     80      45.720
## 30     80      45.720
## 31     87      52.324

By leaving the “x” position in our brackets blank, we tell R that we want to select all rows in columns 2 through
3.
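Columns can also be selected by name instead of by position, which is often easier to read. This is a sketch on a three-row stand-in data frame; small_trees and by_name are made-up names:

```r
# A three-row stand-in for the cherry data frame
small_trees <- data.frame(
  diameter    = c(8.3, 8.6, 8.8),
  height      = c(70, 65, 63),
  diameter_cm = c(21.082, 21.844, 22.352)
)

# All rows, two columns chosen by name
by_name <- small_trees[, c("height", "diameter_cm")]
by_name

# Row 2 of the height column
small_trees[2, "height"]   # 65
```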

If we wanted to select all of the data in the first 5 rows, we would do the following:

cherry[1:5,]

##   diameter height diameter_cm
## 1      8.3     70      21.082
## 2      8.6     65      21.844
## 3      8.8     63      22.352
## 4     10.5     72      26.670
## 5     10.7     81      27.178

If you want to pick several columns (or rows), but they are not all next to each other, you can use the
concatenate function c() that you learned last week. If you want to select everything but one particular row or
column, you can use the minus sign ( - ).

cherry[c(1, 3, 5, 12), ] ### Pick and choose your rows

##    diameter height diameter_cm
## 1       8.3     70      21.082
## 3       8.8     63      22.352
## 5      10.7     81      27.178
## 12     11.4     76      28.956

cherry[ ,-1] ### Select everything but the first column

##    height diameter_cm
## 1      70      21.082
## 2      65      21.844
## 3      63      22.352
## 4      72      26.670
## 5      81      27.178
## 6      83      27.432
## 7      66      27.940
## 8      75      27.940
## 9      80      28.194
## 10     75      28.448
## 11     79      28.702
## 12     76      28.956
## 13     76      28.956
## 14     69      29.718
## 15     75      30.480
## 16     74      32.766
## 17     85      32.766
## 18     86      33.782
## 19     71      34.798
## 20     64      35.052
## 21     78      35.560
## 22     80      36.068
## 23     74      36.830
## 24     72      40.640
## 25     77      41.402
## 26     81      43.942
## 27     82      44.450
## 28     80      45.466
## 29     80      45.720
## 30     80      45.720
## 31     87      52.324

If we want to create a new data frame that is a subset of an existing data frame, then we need to assign the
selection to a new name:

not_all_of_cherry <- cherry[,2:3]
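One detail worth knowing when you subset: if you select a single column with brackets, R returns a plain vector rather than a data frame unless you add drop = FALSE. The sketch below shows the difference on a made-up three-row data frame (small_trees, as_vector, and as_frame are hypothetical names):

```r
small_trees <- data.frame(diameter = c(8.3, 8.6, 8.8),
                          height   = c(70, 65, 63))

# Selecting one column returns a vector by default
as_vector <- small_trees[, 2]
is.data.frame(as_vector)   # FALSE

# drop = FALSE keeps the result as a one-column data frame
as_frame <- small_trees[, 2, drop = FALSE]
is.data.frame(as_frame)    # TRUE
```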

QUESTION 5: Using the bracket ( [] ) notation, create a new data frame called cherry_subset that contains
the last 10 values in the height_m column. Print the data frame.

Plotting your data


It is always a good idea to plot your data before doing any statistical analysis. Next week we will spend a lot of
time learning the fundamentals of data visualisation using the ggplot2 package in R. Today, though, we will
use the hist() function in base R to make a simple histogram.

The hist() function will create a histogram of your data so that you can assess the distribution of the data.
This will be very important later on as you learn about data distributions.

Let’s have a look at the distribution of the diameter column in the cherry data frame:

hist(cherry$diameter)
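The default histogram can be tidied up with a few optional arguments to hist(). This is only a sketch: the vector diameters holds the first ten diameter values, and the title, axis label, and number of breaks are arbitrary choices:

```r
# The first ten diameters, so that the example runs on its own
diameters <- c(8.3, 8.6, 8.8, 10.5, 10.7, 10.8, 11.0, 11.0, 11.1, 11.2)

hist(diameters,
     main   = "Tree diameters",      # plot title
     xlab   = "Diameter (inches)",   # x-axis label
     breaks = 5)                     # suggested number of bins; R may adjust it
```

Note that breaks is treated as a suggestion, so the number of bars you see may differ slightly from the number you ask for.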

QUESTION 6: Plot a histogram of the volume of the trees in the cherry data frame.
