
Week 4 Materials: Lecture

Beginner R for Public Health Practitioners and Researchers in Liberia
R Coding and Biostatistics Short Course Series
Laura Skrip, PhD, MPH
Today’s Plan
Content topics, with time in minutes and the skills covered:

Practice Activity (30 minutes)

Defining new variables (cont.) (60 minutes)
• Making space for new objects
• Using the which function to subset and redefine existing variables
• Creating embedded for and if loops to create variables based on conditional statements

Linking datasets (5 minutes)
• Linking two datasets based on a common variable using the cbind() and merge() functions

Introducing the ggplot2 package (20 minutes)
• Installing new packages

Preparing for Course Conclusion (5 minutes)

Beginner R Course - Developed by Laura Skrip


Activity: Cleaning data with the which function
• Read in the diabetes.csv dataset (Source:
https://www.kaggle.com/uciml/pima-indians-diabetes-database).
• Use the which function to reformat the BMI variable into a
categorical variable based on the following classifications from the
US CDC:

BMI Classification
< 18.5 Underweight
18.5 to <25 Healthy
25 to <30 Overweight
30+ Obesity
Hints:
• Create ‘space’ for a new variable.

• Identify the row numbers which satisfy each category.


• Assign the classification (i.e., healthy, overweight, etc.) to the appropriate row numbers of the new variable.
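
The three hints can be sketched on a small, made-up data frame. This is only an illustration of the pattern, not the solution for diabetes.csv: the object name toy, the column name BMI_Category, and the BMI values are all assumptions chosen for the example.

```r
# Toy data standing in for diabetes.csv (values are invented for illustration)
toy <- data.frame(BMI = c(17.0, 22.4, 27.9, 33.6, 25.0, 18.5))

# Hint 1: create 'space' for the new variable (one NA per row)
toy$BMI_Category <- rep(NA, nrow(toy))

# Hint 2: identify the row numbers which satisfy each category
rows_under   <- which(toy$BMI < 18.5)
rows_healthy <- which(toy$BMI >= 18.5 & toy$BMI < 25)
rows_over    <- which(toy$BMI >= 25 & toy$BMI < 30)
rows_obesity <- which(toy$BMI >= 30)

# Hint 3: assign the classification to those row numbers
toy$BMI_Category[rows_under]   <- "Underweight"
toy$BMI_Category[rows_healthy] <- "Healthy"
toy$BMI_Category[rows_over]    <- "Overweight"
toy$BMI_Category[rows_obesity] <- "Obesity"

print(toy)
```

Note that the boundary values follow the table: a BMI of exactly 18.5 is classified as Healthy and a BMI of exactly 25 as Overweight.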



Defining new variables (cont.)
Creating variables that allow us to investigate
our hypotheses



“If loops” to create variables
• An “if loop” (more precisely, an if statement) in R allows you to apply an action only when a condition is met.

• Example 1: Suppose you had a dataset with an Age variable. Age is often collected
as a numeric variable, but sometimes it is helpful to categorize age. You may want
to create a variable that reflects age groups known to be at differential risk of a
disease or condition.
– Action: if age is less than 45, use the category ‘<45’ in the Age_Group variable and if
age is 45 or above, use the category ‘≥45’.

• Example 2: Suppose you are interested in creating a variable that indicates
whether females in the dataset are of childbearing age, as a way to calculate
population at risk of a particular condition affecting this subgroup.
– Action: if age is between 15 and 44 and sex is female, use the category ‘Yes’ in a
variable ChildbearingAge; otherwise (age is < 15 or > 44, or sex is male), use
the category ‘No’.
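
Both examples can be sketched with an if statement inside a for loop. The data frame people and its values are invented for illustration, and the label ">=45" is used as a plain-text stand-in for ‘≥45’; the ages 15 and 44 are treated as inclusive, which is an assumption about what "between" means here.

```r
# Made-up data for illustration only
people <- data.frame(
  Age = c(12, 30, 44, 45, 60, 25),
  Sex = c("Female", "Female", "Female", "Female", "Male", "Male")
)

# Make space for the two new variables
people$Age_Group       <- rep(NA, nrow(people))
people$ChildbearingAge <- rep(NA, nrow(people))

for (i in seq_len(nrow(people))) {
  # Example 1: two age groups split at 45
  if (people$Age[i] < 45) {
    people$Age_Group[i] <- "<45"
  } else {
    people$Age_Group[i] <- ">=45"
  }

  # Example 2: 'Yes' only for females aged 15 to 44 (inclusive), otherwise 'No'
  if (people$Age[i] >= 15 && people$Age[i] <= 44 && people$Sex[i] == "Female") {
    people$ChildbearingAge[i] <- "Yes"
  } else {
    people$ChildbearingAge[i] <- "No"
  }
}

print(people)
```

The else branch covers every row not matched by the if condition, so no row is left as NA.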



“If Loop” syntax in R
‘If loops’ with assignment of answer:
Results_Vector_Two <- rep(NA, 10)
for (i in 1:10) {
  if (i > 5) {
    Results_Vector_Two[i] <- i^2
  }
}

• Note that we often embed an ‘if loop’ within a ‘for loop’


• What do we expect to get for indices > 5 in Results_Vector_Two? What
about for indices ≤ 5?
• The condition in parentheses after if, here (i > 5), is a conditional statement that must be met for the action to occur. The result of this must be TRUE or FALSE.
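
One way to check the answer to the question above is to run the loop and inspect the vector. The sketch below repeats the slide code, written with ^ for exponentiation, and adds two lines that evaluate the condition on its own.

```r
Results_Vector_Two <- rep(NA, 10)
for (i in 1:10) {
  if (i > 5) {
    Results_Vector_Two[i] <- i^2
  }
}

# Indices 1 to 5 never satisfy the condition, so they keep their initial NA;
# indices 6 to 10 hold i squared: 36, 49, 64, 81, 100
print(Results_Vector_Two)

# The condition itself always evaluates to a single TRUE or FALSE
print(5 > 5)  # FALSE
print(6 > 5)  # TRUE
```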
“If Loop” syntax in R
# Results_Vector is assumed to already exist as a vector of length 10
Results_Vector_Two <- rep(NA, 10)
for (i in 1:10) {
  if (Results_Vector[i] == 8) {
    Results_Vector_Two[i] <- i^2
  } else if (Results_Vector[i] != 8) {
    Results_Vector_Two[i] <- Results_Vector[i]
  }
}

• Each condition, in the if and in the else if, is a conditional statement that must be met for its action to occur. The result of this must be TRUE or FALSE.
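
To run the if / else if version, Results_Vector has to exist first. The values below are hypothetical, chosen only so that two elements equal 8; they are not from the course materials.

```r
# Hypothetical input vector (an assumption for this sketch)
Results_Vector <- c(2, 4, 6, 8, 10, 12, 14, 8, 18, 20)

Results_Vector_Two <- rep(NA, 10)
for (i in 1:10) {
  if (Results_Vector[i] == 8) {
    # Where the old value is 8, store the square of the index
    Results_Vector_Two[i] <- i^2
  } else if (Results_Vector[i] != 8) {
    # Everywhere else, copy the old value unchanged
    Results_Vector_Two[i] <- Results_Vector[i]
  }
}

# Positions 4 and 8 held the value 8, so they become 4^2 = 16 and 8^2 = 64
print(Results_Vector_Two)
```

Because the two conditions are exact opposites, a plain else would behave the same way here as long as Results_Vector contains no NA values.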
Activity: Defining new variables with an if loop
• Create a new variable that categorizes risk level, as
defined (arbitrarily and for purposes of demonstration):
– Risk level is high if the person is ≥ 55 years old or has diabetes.
– Risk level is low if the person is < 55 years old and does not have diabetes.
– Risk level is moderate if the person is < 55 years old and has diabetes.
• You may need to look up the syntax for ‘and’ and ‘or’, specifically when
trying to generate a logical answer of TRUE or FALSE.
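
As a pointer for the syntax lookup, here is a minimal sketch of the ‘and’ and ‘or’ operators in R. The vectors age and has_condition are made-up names and values, and the sketch deliberately stops short of building the risk-level variable.

```r
# Made-up values for three people
age           <- c(60, 40, 30)
has_condition <- c(FALSE, TRUE, FALSE)

# | is 'or' and & is 'and'; both compare whole vectors element by element
either <- age >= 55 | has_condition   # TRUE  TRUE  FALSE
both   <- age < 55 & has_condition    # FALSE TRUE  FALSE

# ! is 'not'
neither <- age < 55 & !has_condition  # FALSE FALSE TRUE

# Inside if (), test one row at a time; && and || are the single-value forms
i <- 2
if (age[i] < 55 && has_condition[i]) {
  row_two_matches <- TRUE
} else {
  row_two_matches <- FALSE
}

print(either)
print(both)
print(neither)
print(row_two_matches)
```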



Linking datasets
Allowing for creation of a single data frame when two datasets have a common, identifying variable



Merging two datasets requires a common, identifying variable
• In the VAERS datasets, we have the VAERS_ID variable.

• Let’s merge the data from the 2021VAERSDATA.csv and 2021VAERSVAX.csv spreadsheets into a single data frame.

• What search terms could we use to identify the function for accomplishing this task?

merge(data1, data2, by = "ID1")
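
A minimal sketch of merge() on two tiny data frames that share a VAERS_ID column. The rows are invented for illustration and are not taken from the 2021 VAERS files; the second column in each data frame is there only to give the merge something to combine.

```r
# Invented rows for illustration; only VAERS_ID is shared between the two
data1 <- data.frame(VAERS_ID = c(101, 102, 103),
                    AGE_YRS  = c(34, 58, 71))
data2 <- data.frame(VAERS_ID = c(101, 103, 103, 104),
                    VAX_TYPE = c("COVID19", "FLU", "COVID19", "COVID19"))

# Default merge keeps only IDs found in BOTH data frames.
# ID 103 appears twice in data2, so it produces two rows in the result.
merged <- merge(data1, data2, by = "VAERS_ID")
print(merged)

# all.x = TRUE keeps every row of data1; unmatched IDs get NA in the new columns
merged_all <- merge(data1, data2, by = "VAERS_ID", all.x = TRUE)
print(merged_all)
```

By contrast, cbind() places data frames side by side by row position. It does not match on an ID, so it is only appropriate when both data frames have the same number of rows in the same order.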



Visualizing data
Introducing the ggplot2 package



R-Graph Gallery provides many graph types with sample code that can be modified according to your own dataset.
https://www.r-graph-gallery.com/ggplot2-package.html



Example: Multiple group histogram

• https://www.r-graph-gallery.com/histogram_several_group.html
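
A minimal sketch of a two-group histogram with ggplot2, in the spirit of the gallery example but not copied from it. The data are simulated, and the names toy, group, and value are assumptions for the illustration.

```r
# install.packages("ggplot2")  # run once if the package is not yet installed
library(ggplot2)

# Simulated data: two groups of 200 values with different means
set.seed(1)
toy <- data.frame(
  group = rep(c("A", "B"), each = 200),
  value = c(rnorm(200, mean = 0), rnorm(200, mean = 2))
)

# One histogram per group, overlaid; alpha makes the overlap visible
p <- ggplot(toy, aes(x = value, fill = group)) +
  geom_histogram(alpha = 0.6, position = "identity", bins = 30) +
  labs(x = "Value", y = "Count", fill = "Group")

print(p)
```

To adapt this to your own dataset, replace toy with your data frame and set x and fill to the names of your numeric and grouping columns.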
Final Steps for Beginner R
Demonstrating what you have learned and
where we are going



Outstanding Course Components
• Short post-test to assess what you have learned and how we did

• Data visualization practice!

• Assignment of final project/exercise

• Scheduling…



See you in lab
next time!
