
Unit-2

R Programming Basics: Overview of R, R data types and objects, reading
and writing data, control structures, functions, scoping rules, dates and
times, loop functions, debugging tools, simulation, code profiling.

Creation and Execution of R File in R Studio
R Studio is an integrated development environment (IDE) for R. The IDE is a GUI where
you can write your code, see the results and inspect the variables that are created
while the program runs. R Studio is available as open-source software in both
client (desktop) and server versions.
Creating an R file
There are two ways to create an R file in R Studio:
• Click the File menu, choose New File from the drop-down menu and then select
R Script. A new file opens.
• Click the plus button, which is just below the File menu, and choose R Script
to open a new R script file.

Once you open an R script file, R Studio shows it in a new window at the top left.
The three panels (console, environment/history and files/plots) are still there.
You are now ready to write a script or program in R Studio.
Writing Scripts in an R File
Writing a script in an R file is demonstrated here with a three-line example.
In the first line of the code, a variable a is assigned the value 11. The second
line computes a times 10 and assigns the result to b. The third statement,
print(c(a, b)), concatenates a and b and prints the result.
This is how a script file is written in R. After writing a script file, save it
before executing it.
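The script described above can be written out as follows. This is a reconstruction from the description in the text, since the original listing is not reproduced here.

```r
# Assign the value 11 to a
a <- 11

# Compute a times 10 and store the result in b
b <- a * 10

# Concatenate a and b into one vector and print it
print(c(a, b))
```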
Saving an R File
Let us see how to save the R file. From the File menu you can choose either Save or
Save As. If you click Save, the file is saved under the default name Untitled x,
where x is 1, 2 and so on, depending on how many R scripts you have already opened.

It is better to use Save As, just below Save, so that you can name the script file
yourself. Suppose we have clicked Save As. A dialog window appears in which you can
name the script file test.R. Once you have named it, click the Save button to save
the script file.

So now, we have seen how to open an R script and how to write some code in the R
script file and save the file.
The next task is to execute the R file.
Execution of an R file
There are several ways to execute the commands in an R file.

• Using the Run command: Click the Run button in the GUI, or use the shortcut
Ctrl + Enter.
What does it do?
It executes the line on which the cursor is placed (or the selected lines).
• Using the Source command: Click the Source button in the GUI, or use the
shortcut Ctrl + Shift + S.
What does it do?
It executes the whole R file and prints only the output that you explicitly asked
to print.
• Using the Source with Echo command: Click the Source with Echo button in the
GUI, or use the shortcut Ctrl + Shift + Enter.
What does it do?
It prints the commands as well as the output.
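The Source and Source with Echo buttons correspond to R's source() function, which can also be called from the console. The sketch below is only an illustration: it writes the three-line script to a temporary file (the file name and contents are assumptions made for this example) and runs it both ways.

```r
# Write a small script to a temporary file (illustrative only)
script_path <- tempfile(fileext = ".R")
writeLines(c("a <- 11", "b <- a * 10", "print(c(a, b))"), script_path)

# Like the Source button: runs the whole file, shows only printed output
source(script_path)

# Like Source with Echo: also shows each command as it runs
source(script_path, echo = TRUE)
```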

R Data Types
A variable can store different types of values such as numbers, characters
etc. These different types of data that we can use in our code are
called data types.

Different Types of Data Types


In R, there are 6 basic data types:

• logical
• numeric
• integer
• complex
• character
• raw

1. Logical Data Type


The logical data type in R is also known as boolean data type. It can only have two
values: TRUE and FALSE . For example,

bool1 <- TRUE

print(bool1)

print(class(bool1))

bool2 <- FALSE

print(bool2)

print(class(bool2))

Output

[1] TRUE
[1] "logical"

[1] FALSE

[1] "logical"

In the above example,
• bool1 has the value TRUE,
• bool2 has the value FALSE.

Here, we get "logical" when we check the class of both variables.


Note: You can also define logical variables with a single letter -
T for TRUE or F for FALSE . For example,

is_weekend <- F

print(class(is_weekend)) # "logical"
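One caution about the shorthand: TRUE and FALSE are reserved words, while T and F are ordinary variables that hold those values by default and can be overwritten. A minimal sketch of the difference (the variable names t_after and t_restored are made up for this example):

```r
# T is an ordinary variable, so it can be reassigned
T <- 0
t_after <- isTRUE(T)      # T no longer means TRUE here

# Remove the reassigned copy so that T refers to the built-in value again
rm(T)
t_restored <- isTRUE(T)

print(c(t_after, t_restored))
```

For this reason, spelling out TRUE and FALSE is the safer habit in scripts.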

2. Numeric Data Type


In R, the numeric data type represents real numbers, with or without a decimal part.
Numeric values are stored as double-precision floating-point numbers. For example,

# floating point values

weight <- 63.5

print(weight)

print(class(weight))

# whole number written without a suffix (still numeric)

height <- 182

print(height)

print(class(height))

Output

[1] 63.5

[1] "numeric"

[1] 182

[1] "numeric"

Here, both weight and height are variables of numeric type.


3. Integer Data Type
The integer data type specifies whole-number values without a decimal part. We use
the suffix L to specify integer data. For example,
integer_variable <- 186L

print(class(integer_variable))

Output
[1] "integer"

Here, 186L is an integer data. So we get "integer" when we print the class
of integer_variable .
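The difference between numeric and integer storage can be checked with typeof(). The following is a small sketch, and the variable name count is chosen only for illustration:

```r
height <- 182    # no suffix: stored as a double
count <- 186L    # L suffix: stored as an integer

print(typeof(height))       # "double"
print(typeof(count))        # "integer"
print(is.integer(height))   # FALSE
print(is.numeric(count))    # TRUE: integers also count as numeric
```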

4. Complex Data Type

The complex data type is used to specify complex numbers, which have a real part
and an imaginary part. We use the suffix i to specify the imaginary part. For example,
# 2i represents imaginary part

complex_value <- 3 + 2i

# print class of complex_value

print(class(complex_value))

Output
[1] "complex"

Here, 3 + 2i is of complex data type because it has an imaginary part 2i .
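The two parts of a complex value can be read back with the base functions Re() and Im(). A short sketch (the names real_part and imag_part are illustrative):

```r
complex_value <- 3 + 2i

real_part <- Re(complex_value)   # 3
imag_part <- Im(complex_value)   # 2
print(c(real_part, imag_part))

# A value written with a zero imaginary part is still of class complex
print(class(3 + 0i))
```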

5. Character Data Type

The character data type is used to specify character or string values in a variable.
In programming, a string is a sequence of characters. For example, 'A' is a single
character and "Apple" is a string.
You can use single quotes '' or double quotes "" to represent strings. R treats both
forms the same way, and a single character is simply a string of length one. By
convention, the example below uses:
• '' for a single character
• "" for a longer string
For example,

# create a string variable

fruit <- "Apple"

print(class(fruit))
# create a character variable

my_char <- 'A'

print(class(my_char))

Output

[1] "character"

[1] "character"

Here, both the variables - fruit and my_char - are of character data type.
6. Raw Data Type

A raw data type specifies values as raw bytes. You can use the following
functions to convert character data to raw data and vice versa:
• charToRaw() - converts character data to raw data
• rawToChar() - converts raw data to character data
For example,

# convert character to raw

raw_variable <- charToRaw("Welcome to Programiz")

print(raw_variable)

print(class(raw_variable))

# convert raw to character

char_variable <- rawToChar(raw_variable)

print(char_variable)

print(class(char_variable))

Output

[1] 57 65 6c 63 6f 6d 65 20 74 6f 20 50 72 6f 67 72 61 6d 69 7a

[1] "raw"

[1] "Welcome to Programiz"

[1] "character"

Conversion into Numeric


We can use the as.numeric function to convert values of other data
types into numeric values. The conversion follows a few rules:
• An integer value is converted into the same number stored as a numeric.
• A complex value is converted by discarding the imaginary part of the
number. R issues a warning when it does this.
• Logical values can be converted into numeric as well.
TRUE is converted to 1, and FALSE is converted to 0.
• Character values are converted if the whole string is a valid number. If
the string contains letters or other symbols that do not form a number, the
result is NA and R issues a warning.
The examples in this and the following conversion sections use these sample
values, which are chosen here only for illustration:
> num <- 14.7
> int <- 5L
> comp <- 3 + 2i
> logi <- TRUE
> char <- "hello"
> num2 <- as.numeric(int)
> num2
> num3 <- as.numeric(comp)
> num3
> num4 <- as.numeric(logi)
> num4
> num5 <- as.numeric(char)
> num5
> num6 <- as.numeric("1234")
> num6

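The rules for as.numeric can be checked with literal values. In this sketch the two conversions that raise warnings are wrapped in suppressWarnings() to keep the output short, and the names from_complex and from_text are illustrative:

```r
print(as.numeric(5L))       # 5
print(as.numeric(TRUE))     # 1
print(as.numeric("1234"))   # 1234

# These two conversions raise warnings, which are suppressed here
from_complex <- suppressWarnings(as.numeric(3 + 2i))   # imaginary part dropped
from_text <- suppressWarnings(as.numeric("12ab"))      # not a valid number

print(from_complex)       # 3
print(is.na(from_text))   # TRUE
```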
Conversion into Integer


The as.integer function can convert the values of other data types into
integer values according to the following rules:
• Numeric values can be converted into an integer using the function.
This discards the fractional part of the number (it truncates toward
zero and does not round).
• Complex values can also be converted into integers. The function
removes the imaginary part of the number.
• The conversion from logical values to integers is similar to the
conversion of logical values to numerics. TRUE is converted to 1, and
FALSE is converted to 0.
• Character values can be converted into integers as well by using
the as.integer function. This conversion follows the same rules as the
character to numeric conversion.
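Because as.integer discards the fractional part, it behaves differently from rounding. A brief sketch with literal values:

```r
print(as.integer(14.7))    # 14
print(as.integer(-14.7))   # -14: truncated toward zero, not rounded down
print(round(14.7))         # 15: use round() when rounding is wanted
```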
> int4 <- as.integer(num)
> int4
> int5 <- as.integer(14.7)
> int5
> int6 <- as.integer(comp)
> int6
> int7 <- as.integer(logi)
> int7
> int8 <- as.integer("1234")
> int8
Conversion into Complex
Using the as.complex function, we can convert other values into the complex
data type. The conversion takes place according to the following rules:
• Numeric values can be converted into complex by using the as.complex
function or by adding an imaginary part to them.
• Integer values can be converted into complex values in the same way.
• Logical values become 0+0i for FALSE and 1+0i for TRUE when
converted into complex values using the as.complex function. We can
also convert a logical value into a complex value by adding an imaginary
part to it.
• The conversion from character to complex follows the same rules as the
conversion from character to numeric or integer, with 0i added to the
converted value if it is not NA.
> comp2 <- as.complex(num)
> comp2
> comp3 <- as.complex(int)
> comp3
> comp4 <- as.complex(logi)
> comp4
> comp5 <- as.complex("1234")
> comp5
Conversion into Logical
Conversion into the logical data type can be done by using
the as.logical function, which follows these rules:
• Numeric, integer, and complex values can be converted into logical
values. The function returns FALSE if the value is zero and TRUE if
it is anything else.
• Character values are converted only if they spell a logical value, such
as "TRUE", "true", "T", "FALSE", "false" or "F". Any other string
returns NA.
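A short sketch of these rules with literal values:

```r
print(as.logical(0))         # FALSE
print(as.logical(-2.5))      # TRUE: any non-zero number
print(as.logical("TRUE"))    # TRUE
print(as.logical("false"))   # FALSE
print(as.logical("hello"))   # NA: not a recognised logical string
```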
> logi2 <- as.logical(num)
> logi2
> logi3 <- as.logical(int)
> logi3
> logi4 <- as.logical(comp)
> logi4
> logi5 <- as.logical(char)
> logi5
Conversion into character
We can convert a value of any data type into character data type using
the as.character function. The function converts the original value into a
character string.
> char2 <- as.character("hello")
> char3 <- as.character(comp)
> char2
> char3

R Objects
In contrast to other programming languages like C and Java, variables in R
are not declared with a data type. A variable is assigned an R-object,
and the data type of the R-object becomes the data type of the
variable. There are many types of R-objects. The frequently used ones are −

• Vectors
• Lists
• Matrices
• Arrays
• Factors
• Data Frames
The simplest of these objects is the vector, and there are six data types of
atomic vectors, also termed the six classes of vectors. The other R-objects
are built upon the atomic vectors.
Note that the number of classes in R is not confined to these six types. For
example, we can combine atomic vectors into an array, whose class is array.

Vectors
Vectors are the most basic R data objects and there are six types of atomic vectors.
They are logical, integer, double, complex, character and raw.

A vector is simply a sequence of items that are all of the same type.

To combine items into a vector, use the c() function and separate
the items with commas.

In the example below, we create a vector variable called fruits that combines
strings:

Example

# Vector of strings
fruits <- c("banana", "apple", "orange")

# Print fruits
fruits

In this example, we create a vector that combines numerical values:

Example

# Vector of numerical values


numbers <- c(1, 2, 3)

# Print numbers
numbers

To create a vector with numerical values in a sequence, use the : operator:

Example

# Vector with numerical values in a sequence


numbers <- 1:10

numbers
You can also create numerical values with decimals in a sequence, but note
that if the last element does not belong to the sequence, it is not used:

Example

# Vector with numerical decimals in a sequence


numbers1 <- 1.5:6.5
numbers1

# Vector with numerical decimals in a sequence where the last element
# is not used
numbers2 <- 1.5:6.3
numbers2

Result:

[1] 1.5 2.5 3.5 4.5 5.5 6.5


[1] 1.5 2.5 3.5 4.5 5.5

In the example below, we create a vector of logical values:

Example

# Vector of logical values


log_values <- c(TRUE, FALSE, TRUE, FALSE)

log_values

Vector Length

To find out how many items a vector has, use the length() function:

Example
fruits <- c("banana", "apple", "orange")

length(fruits)

Sort a Vector

To sort items in a vector alphabetically or numerically, use
the sort() function:

Example

fruits <- c("banana", "apple", "orange", "mango", "lemon")


numbers <- c(13, 3, 5, 7, 20, 2)

sort(fruits) # Sort a string


sort(numbers) # Sort numbers
Access Vectors
You can access the vector items by referring to its index number inside
brackets []. The first item has index 1, the second item has index 2, and so
on:

Example

fruits <- c("banana", "apple", "orange")

# Access the first item (banana)


fruits[1]

You can also access multiple elements by referring to different index
positions with the c() function:

Example

fruits <- c("banana", "apple", "orange", "mango", "lemon")

# Access the first and third item (banana and orange)


fruits[c(1, 3)]

You can also use negative index numbers to access all items except the ones
specified:

Example

fruits <- c("banana", "apple", "orange", "mango", "lemon")

# Access all items except for the first item


fruits[c(-1)]

Change an Item
To change the value of a specific item, refer to the index number:

Example
fruits <- c("banana", "apple", "orange", "mango", "lemon")

# Change "banana" to "pear"


fruits[1] <- "pear"

# Print fruits
fruits

Repeat Vectors
To repeat vectors, use the rep() function:

Example

Repeat each value:


repeat_each <- rep(c(1,2,3), each = 3)

repeat_each
Example

Repeat the sequence of the vector:

repeat_times <- rep(c(1,2,3), times = 3)

repeat_times
Example

Repeat each value independently:

repeat_independent <- rep(c(1,2,3), times = c(5,2,1))

repeat_independent

Generating Sequenced Vectors


An earlier example showed how to create a vector with
numerical values in a sequence with the : operator:

Example

numbers <- 1:10

numbers

To make bigger or smaller steps in a sequence, use the seq() function:

Example

numbers <- seq(from = 0, to = 100, by = 20)

numbers

Note: This call to seq() uses three parameters: from is where the sequence
starts, to is where the sequence stops, and by is the interval of the sequence.

Lists
A list in R can contain many different data types inside it. A list is a collection
of data which is ordered and changeable.

To create a list, use the list() function:

Example

# List of strings
thislist <- list("apple", "banana", "cherry")

# Print the list


thislist
Access Lists

You can access the list items by referring to its index number, inside
brackets. The first item has index 1, the second item has index 2, and so on:

Example

thislist <- list("apple", "banana", "cherry")

thislist[1]
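Single brackets return a list that contains the selected item, while double brackets return the item itself. A minimal sketch of the difference (the names sub_list and item are illustrative):

```r
thislist <- list("apple", "banana", "cherry")

sub_list <- thislist[1]    # a list holding one item
item <- thislist[[1]]      # the item itself

print(class(sub_list))     # "list"
print(class(item))         # "character"
```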

Change Item Value

To change the value of a specific item, refer to the index number:

Example

thislist <- list("apple", "banana", "cherry")


thislist[1] <- "blackcurrant"

# Print the updated list


thislist

List Length
To find out how many items a list has, use the length() function:

Example

thislist <- list("apple", "banana", "cherry")

length(thislist)

Check if Item Exists


To find out if a specified item is present in a list, use the %in% operator:

Example

Check if "apple" is present in the list:

thislist <- list("apple", "banana", "cherry")

"apple" %in% thislist

Add List Items


To add an item to the end of the list, use the append() function:

Example

Add "orange" to the list:


thislist <- list("apple", "banana", "cherry")

append(thislist, "orange")

To add an item to the right of a specified index, pass the after argument
(after = index number) to the append() function:

Example

Add "orange" to the list after "banana" (index 2):

thislist <- list("apple", "banana", "cherry")

append(thislist, "orange", after = 2)
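Note that append() returns a new list and leaves the original unchanged, so the result has to be assigned if it should be kept. A short sketch:

```r
thislist <- list("apple", "banana", "cherry")

# append() returns a new list; thislist itself is unchanged
append(thislist, "orange")
print(length(thislist))    # still 3

# Assign the result back to keep the added item
thislist <- append(thislist, "orange")
print(length(thislist))    # now 4
```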

Remove List Items

You can also remove list items. The following example creates a new,
updated list without an "apple" item:

Example

Remove "apple" from the list:

thislist <- list("apple", "banana", "cherry")

newlist <- thislist[-1]

# Print the new list


newlist

Range of Indexes

You can specify a range of indexes by specifying where to start and where to
end the range, by using the : operator:

Example
Return the second, third, fourth and fifth item:

thislist <-
list("apple", "banana", "cherry", "orange", "kiwi", "melon", "mango")

(thislist)[2:5]

Note: The range starts at index 2 (included) and ends at index 5
(included).

Remember that the first item has index 1.


Loop Through a List
You can loop through the list items by using a for loop:

Example

Print all items in the list, one by one:

thislist <- list("apple", "banana", "cherry")

for (x in thislist) {
print(x)
}

Join Two Lists

There are several ways to join, or concatenate, two or more lists in R.

The most common way is to use the c() function, which combines two
elements together:

Example
list1 <- list("a", "b", "c")
list2 <- list(1,2,3)
list3 <- c(list1,list2)

list3

Converting List to Vector


A list can be converted to a vector so that the elements of the vector can be used for
further manipulation. All the arithmetic operations on vectors can be applied after the
list is converted into vectors. To do this conversion, we use the unlist() function. It
takes the list as input and produces a vector.
# Create lists.
list1 <- list(1:5)
print(list1)

list2 <-list(10:14)
print(list2)

# Convert the lists to vectors.


v1 <- unlist(list1)
v2 <- unlist(list2)

print(v1)
print(v2)

# Now add the vectors


result <- v1+v2
print(result)
When we execute the above code, it produces the following result −
[[1]]
[1] 1 2 3 4 5

[[1]]
[1] 10 11 12 13 14

[1] 1 2 3 4 5
[1] 10 11 12 13 14
[1] 11 13 15 17 19
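When the list holds items of different types, unlist() converts them to one common type, because a vector can hold only one type. A small sketch with made-up values:

```r
mixed <- list(1, "a", TRUE)
flat <- unlist(mixed)

print(flat)
print(class(flat))   # "character": every item was converted to text
```

Arithmetic is therefore only meaningful after unlist() when all items in the list were numeric to begin with.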

Matrices
A matrix is a two dimensional data set with columns and rows.

A column is a vertical representation of data, while a row is a horizontal
representation of data.

A matrix can be created with the matrix() function. Specify
the nrow and ncol parameters to set the number of rows and columns:

Example

# Create a matrix
thismatrix <- matrix(c(1,2,3,4,5,6), nrow = 3, ncol = 2)

# Print the matrix


thismatrix

Note: Remember the c() function is used to concatenate items together.
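By default, matrix() fills the cells column by column. The byrow argument changes this to row by row. A brief sketch comparing the two (the names by_column and by_row are illustrative):

```r
by_column <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 3, ncol = 2)
by_row <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 3, ncol = 2, byrow = TRUE)

print(by_column[1, ])   # first row when filled by column: 1 4
print(by_row[1, ])      # first row when filled by row: 1 2
```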

You can also create a matrix with strings:

Example

thismatrix <- matrix(c("apple", "banana", "cherry", "orange"), nrow = 2, ncol = 2)

thismatrix

Access Matrix Items


You can access the items by using [ ] brackets. The first number "1" in the
bracket specifies the row-position, while the second number "2" specifies the
column-position:

Example

thismatrix <- matrix(c("apple", "banana", "cherry", "orange"), nrow = 2, ncol = 2)

thismatrix[1, 2]
The whole row can be accessed if you specify a comma after the number in
the bracket:

Example

thismatrix <- matrix(c("apple", "banana", "cherry", "orange"), nrow = 2, ncol = 2)

thismatrix[2,]

The whole column can be accessed if you specify a comma before the
number in the bracket:

Example

thismatrix <- matrix(c("apple", "banana", "cherry", "orange"), nrow = 2, ncol = 2)

thismatrix[,2]

Access More Than One Row

More than one row can be accessed if you use the c() function:

Example

thismatrix <- matrix(c("apple", "banana", "cherry", "orange", "grape", "pineapple", "pear", "melon", "fig"), nrow = 3, ncol = 3)

thismatrix[c(1,2),]

Access More Than One Column

More than one column can be accessed if you use the c() function:

Example

thismatrix <- matrix(c("apple", "banana", "cherry", "orange", "grape", "pineapple", "pear", "melon", "fig"), nrow = 3, ncol = 3)

thismatrix[, c(1,2)]

Add Rows and Columns

Use the cbind() function to add additional columns in a Matrix:

Example

thismatrix <- matrix(c("apple", "banana", "cherry", "orange", "grape", "pineapple", "pear", "melon", "fig"), nrow = 3, ncol = 3)

newmatrix <- cbind(thismatrix, c("strawberry", "blueberry", "raspberry"))

# Print the new matrix


newmatrix

Note: The new column must have as many elements as the existing matrix
has rows.

Use the rbind() function to add additional rows in a Matrix:

Example

thismatrix <- matrix(c("apple", "banana", "cherry", "orange", "grape", "pineapple", "pear", "melon", "fig"), nrow = 3, ncol = 3)

newmatrix <- rbind(thismatrix, c("strawberry", "blueberry", "raspberry"))

# Print the new matrix


newmatrix

Note: The new row must have as many elements as the existing matrix
has columns.

Remove Rows and Columns

Use negative indexes, wrapped in the c() function, to remove rows and columns in a Matrix:

Example

thismatrix <-
matrix(c("apple", "banana", "cherry", "orange", "mango", "pineapple"),
nrow = 3, ncol =2)

#Remove the first row and the first column


thismatrix <- thismatrix[-c(1), -c(1)]

thismatrix
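When only one row or column is left after removal, R drops the matrix structure and returns a plain vector. Passing drop = FALSE keeps the result as a matrix. A sketch of the difference (the names as_vector and as_matrix are illustrative):

```r
thismatrix <- matrix(c("apple", "banana", "cherry", "orange", "mango", "pineapple"),
                     nrow = 3, ncol = 2)

# Default behaviour: a single remaining column becomes a plain vector
as_vector <- thismatrix[-c(1), -c(1)]
print(is.matrix(as_vector))   # FALSE

# drop = FALSE keeps the two-dimensional structure
as_matrix <- thismatrix[-c(1), -c(1), drop = FALSE]
print(dim(as_matrix))         # 2 1
```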

Check if an Item Exists


To find out if a specified item is present in a matrix, use the %in% operator:

Example

Check if "apple" is present in the matrix:


thismatrix <- matrix(c("apple", "banana", "cherry", "orange"), nrow
= 2, ncol = 2)

"apple" %in% thismatrix

Number of Rows and Columns

Use the dim() function to find the number of rows and columns in a Matrix:

Example

thismatrix <- matrix(c("apple", "banana", "cherry", "orange"), nrow = 2, ncol = 2)

dim(thismatrix)

Matrix Length
Use the length() function to find the total number of cells in a Matrix:

Example

thismatrix <- matrix(c("apple", "banana", "cherry", "orange"), nrow = 2, ncol = 2)

length(thismatrix)

The total number of cells in the matrix is the number of rows multiplied by
the number of columns.

In the example above: Length = 2*2 = 4.

Loop Through a Matrix

You can loop through a Matrix using a for loop. The loop will start at the first
row, moving right:

Example

Loop through the matrix items and print them:

thismatrix <- matrix(c("apple", "banana", "cherry", "orange"), nrow = 2, ncol = 2)

for (rows in 1:nrow(thismatrix)) {
  for (columns in 1:ncol(thismatrix)) {
    print(thismatrix[rows, columns])
  }
}
Combine two Matrices
Again, you can use the rbind() or cbind() function to combine two or more
matrices together:

Example

# Combine matrices
Matrix1 <- matrix(c("apple", "banana", "cherry", "grape"), nrow = 2,
ncol = 2)
Matrix2 <- matrix(c("orange", "mango", "pineapple", "watermelon"), nrow
= 2, ncol = 2)

# Combine by adding the second matrix as rows
Matrix_Combined <- rbind(Matrix1, Matrix2)
Matrix_Combined

# Combine by adding the second matrix as columns
Matrix_Combined <- cbind(Matrix1, Matrix2)
Matrix_Combined

Arrays
Compared to matrices, arrays can have more than two dimensions.

We can use the array() function to create an array, and the dim parameter to
specify the dimensions:

Example

# A vector with values ranging from 1 to 24
thisarray <- c(1:24)
thisarray

# An array with more than one dimension


multiarray <- array(thisarray, dim = c(4, 3, 2))
multiarray

Example Explained

In the example above we create an array with the values 1 to 24.

How does dim=c(4,3,2) work?
The first and second numbers in the bracket specify the number of rows and
columns of each matrix.
The last number in the bracket specifies how many such matrices (layers) the
array contains.

Note: Arrays can only have one data type.
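The meaning of dim = c(4, 3, 2) can be checked by pulling one matrix out of the array. In this sketch the name first_matrix is illustrative:

```r
multiarray <- array(1:24, dim = c(4, 3, 2))

# The first of the two matrices: 4 rows, 3 columns, values 1 to 12
first_matrix <- multiarray[, , 1]
print(dim(first_matrix))      # 4 3
print(range(first_matrix))    # 1 12

# Row 2, column 3 of the second matrix
print(multiarray[2, 3, 2])    # 22
```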


Access Array Items

You can access the array elements by referring to the index position. You can
use the [] brackets to access the desired elements from an array:

Example

thisarray <- c(1:24)


multiarray <- array(thisarray, dim = c(4, 3, 2))

multiarray[2, 3, 2]

The syntax is as follows: array[row position, column position, matrix level]

You can also access a whole row or column from a matrix in an array by
leaving the other index empty. The c() function can be used to list the row or
column numbers:

thisarray <- c(1:24)

# Access all the items from the first row from matrix one
multiarray <- array(thisarray, dim = c(4, 3, 2))
multiarray[c(1),,1]

# Access all the items from the first column from matrix one
multiarray <- array(thisarray, dim = c(4, 3, 2))
multiarray[,c(1),1]

A comma (,) before c() means that we want to access the column.

A comma (,) after c() means that we want to access the row.

Check if an Item Exists


To find out if a specified item is present in an array, use the %in% operator:

Example

Check if the value 2 is present in the array:

thisarray <- c(1:24)


multiarray <- array(thisarray, dim = c(4, 3, 2))

2 %in% multiarray

Number of Rows and Columns

Use the dim() function to find the number of rows, columns and matrices in an array:

Example

thisarray <- c(1:24)


multiarray <- array(thisarray, dim = c(4, 3, 2))
dim(multiarray)

Array Length

Use the length() function to find the total number of elements in an array:

Example

thisarray <- c(1:24)


multiarray <- array(thisarray, dim = c(4, 3, 2))

length(multiarray)

Loop Through an Array


You can loop through the array items by using a for loop:

Example

thisarray <- c(1:24)


multiarray <- array(thisarray, dim = c(4, 3, 2))

for(x in multiarray){
print(x)
}

Factors
Factors are used to categorize data. Examples of factors are:

 Demography: Male/Female
 Music: Rock, Pop, Classic, Jazz
 Training: Strength, Stamina

To create a factor, use the factor() function and add a vector as argument:

Example

# Create a factor
music_genre <- factor(c("Jazz", "Rock", "Classic", "Classic", "Pop", "Jazz", "Rock", "Jazz"))

# Print the factor


music_genre

Result:

[1] Jazz Rock Classic Classic Pop Jazz Rock Jazz

Levels: Classic Jazz Pop Rock


You can see from the example above that the factor has four levels
(categories): Classic, Jazz, Pop and Rock.

To only print the levels, use the levels() function:

Example

music_genre <- factor(c("Jazz", "Rock", "Classic", "Classic", "Pop", "Jazz", "Rock", "Jazz"))

levels(music_genre)

Result:

[1] "Classic" "Jazz" "Pop" "Rock"

You can also set the levels, by adding the levels argument inside
the factor() function:

Example

music_genre <- factor(c("Jazz", "Rock", "Classic", "Classic", "Pop", "Jazz", "Rock", "Jazz"),
levels = c("Classic", "Jazz", "Pop", "Rock", "Other"))

levels(music_genre)

Result:

[1] "Classic" "Jazz" "Pop" "Rock" "Other"
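A level that is declared but never used still belongs to the factor. One way to see this is to count the items per level with table(). This is a small sketch, and the name genre_counts is illustrative:

```r
music_genre <- factor(
  c("Jazz", "Rock", "Classic", "Classic", "Pop", "Jazz", "Rock", "Jazz"),
  levels = c("Classic", "Jazz", "Pop", "Rock", "Other")
)

# Count how many items fall into each level; "Other" appears with a count of 0
genre_counts <- table(music_genre)
print(genre_counts)
```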

Factor Length
Use the length() function to find out how many items there are in the factor:

Example

music_genre <- factor(c("Jazz", "Rock", "Classic", "Classic", "Pop", "Jazz", "Rock", "Jazz"))

length(music_genre)

Result:

[1] 8

Access Factors
To access the items in a factor, refer to the index number, using [] brackets:

Example
Access the third item:

music_genre <- factor(c("Jazz", "Rock", "Classic", "Classic", "Pop", "Jazz", "Rock", "Jazz"))

music_genre[3]

Result:

[1] Classic

Levels: Classic Jazz Pop Rock

Change Item Value


To change the value of a specific item, refer to the index number:

Example

Change the value of the third item:

music_genre <- factor(c("Jazz", "Rock", "Classic", "Classic", "Pop", "Jazz", "Rock", "Jazz"))

music_genre[3] <- "Pop"

music_genre[3]

Result:

[1] Pop

Levels: Classic Jazz Pop Rock

Note that you cannot change the value of a specific item to a level that is not
already defined in the factor. The following example produces a warning and sets
the item to NA:

Example

Trying to change the value of the third item ("Classic") to a value that is not a predefined level
("Opera"):

music_genre <- factor(c("Jazz", "Rock", "Classic", "Classic", "Pop", "Jazz", "Rock", "Jazz"))

music_genre[3] <- "Opera"

music_genre[3]

Result:
Warning message:

In `[<-.factor`(`*tmp*`, 3, value = "Opera") :

invalid factor level, NA generated

However, if you have already specified it inside the levels argument, it will
work:

Example

Change the value of the third item:

music_genre <- factor(c("Jazz", "Rock", "Classic", "Classic", "Pop", "Jazz", "Rock", "Jazz"),
levels = c("Classic", "Jazz", "Pop", "Rock", "Opera"))

music_genre[3] <- "Opera"

music_genre[3]

Result:

[1] Opera

Levels: Classic Jazz Pop Rock Opera
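A level can also be added after the factor has been created, by extending the result of levels(). The following is a sketch of that alternative, not the only way to do it:

```r
music_genre <- factor(c("Jazz", "Rock", "Classic", "Classic", "Pop", "Jazz", "Rock", "Jazz"))

# Extend the existing levels with a new one
levels(music_genre) <- c(levels(music_genre), "Opera")

# The assignment now works without a warning
music_genre[3] <- "Opera"
print(music_genre[3])
print(levels(music_genre))
```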

Data Frames
Data Frames are data displayed in a table format.

A data frame can hold different types of data. While the first column
can be character, the second and third can be numeric or logical. However,
all values within a single column must have the same type.

Use the data.frame() function to create a data frame:

Example

# Create a data frame


Data_Frame <- data.frame (
Training = c("Strength", "Stamina", "Other"),
Pulse = c(100, 150, 120),
Duration = c(60, 30, 45)
)

# Print the data frame


Data_Frame
Summarize the Data
Use the summary() function to summarize the data from a Data Frame:

Example

Data_Frame <- data.frame (


Training = c("Strength", "Stamina", "Other"),
Pulse = c(100, 150, 120),
Duration = c(60, 30, 45)
)

Data_Frame

summary(Data_Frame)


Access Items
We can use single brackets [ ], double brackets [[ ]] or $ to access columns
from a data frame:

Example

Data_Frame <- data.frame (


Training = c("Strength", "Stamina", "Other"),
Pulse = c(100, 150, 120),
Duration = c(60, 30, 45)
)

Data_Frame[1]

Data_Frame[["Training"]]

Data_Frame$Training
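The three forms do not return the same kind of object: single brackets return a data frame with one column, while double brackets and $ return the column as a vector. A short sketch using the Pulse column:

```r
Data_Frame <- data.frame(
  Training = c("Strength", "Stamina", "Other"),
  Pulse = c(100, 150, 120),
  Duration = c(60, 30, 45)
)

print(class(Data_Frame["Pulse"]))     # "data.frame"
print(class(Data_Frame[["Pulse"]]))   # "numeric"

# A vector can be passed straight to functions such as mean()
print(mean(Data_Frame$Pulse))
```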

Add Rows

Use the rbind() function to add new rows in a Data Frame:

Example

Data_Frame <- data.frame (


Training = c("Strength", "Stamina", "Other"),
Pulse = c(100, 150, 120),
Duration = c(60, 30, 45)
)

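# Add a new row (built as a one-row data frame so that Pulse and Duration stay numeric)
New_row_DF <- rbind(Data_Frame, data.frame(Training = "Strength", Pulse = 110, Duration = 110))

# Print the new data frame
New_row_DF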
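Care is needed with column types when adding a row. Combining text and numbers with c() produces a character vector, so a row built that way would turn the numeric columns into text. A sketch of the issue and of one way around it (the names mixed_row and new_row are illustrative):

```r
Data_Frame <- data.frame(
  Training = c("Strength", "Stamina", "Other"),
  Pulse = c(100, 150, 120),
  Duration = c(60, 30, 45)
)

# c() with mixed values converts everything to character
mixed_row <- c("Strength", 110, 110)
print(class(mixed_row))           # "character"

# A one-row data frame keeps each column's own type
new_row <- data.frame(Training = "Strength", Pulse = 110, Duration = 110)
New_row_DF <- rbind(Data_Frame, new_row)
print(class(New_row_DF$Pulse))    # "numeric"
```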

Add Columns

Use the cbind() function to add new columns in a Data Frame:

Example

Data_Frame <- data.frame (


Training = c("Strength", "Stamina", "Other"),
Pulse = c(100, 150, 120),
Duration = c(60, 30, 45)
)

# Add a new column


New_col_DF <- cbind(Data_Frame, Steps = c(1000, 6000, 2000))

# Print the new data frame


New_col_DF

Remove Rows and Columns

Use negative indexes, wrapped in the c() function, to remove rows and columns in a Data Frame:

Example

Data_Frame <- data.frame (


Training = c("Strength", "Stamina", "Other"),
Pulse = c(100, 150, 120),
Duration = c(60, 30, 45)
)

# Remove the first row and column


Data_Frame_New <- Data_Frame[-c(1), -c(1)]

# Print the new data frame


Data_Frame_New

Number of Rows and Columns

Use the dim() function to find the number of rows and columns in a Data
Frame:

Example
Data_Frame <- data.frame (
Training = c("Strength", "Stamina", "Other"),
Pulse = c(100, 150, 120),
Duration = c(60, 30, 45)
)

dim(Data_Frame)

You can also use the ncol() function to find the number of columns
and nrow() to find the number of rows:

Example

Data_Frame <- data.frame (


Training = c("Strength", "Stamina", "Other"),
Pulse = c(100, 150, 120),
Duration = c(60, 30, 45)
)

ncol(Data_Frame)
nrow(Data_Frame)

Data Frame Length


Use the length() function to find the number of columns in a Data Frame
(similar to ncol()):

Example

Data_Frame <- data.frame (


Training = c("Strength", "Stamina", "Other"),
Pulse = c(100, 150, 120),
Duration = c(60, 30, 45)
)

length(Data_Frame)

Combining Data Frames


Use the rbind() function to combine two or more data frames in R vertically:

Example

Data_Frame1 <- data.frame (


Training = c("Strength", "Stamina", "Other"),
Pulse = c(100, 150, 120),
Duration = c(60, 30, 45)
)

Data_Frame2 <- data.frame (


Training = c("Stamina", "Stamina", "Strength"),
Pulse = c(140, 150, 160),
Duration = c(30, 30, 20)
)

New_Data_Frame <- rbind(Data_Frame1, Data_Frame2)


New_Data_Frame

And use the cbind() function to combine two or more data frames in R
horizontally:

Example

Data_Frame3 <- data.frame (


Training = c("Strength", "Stamina", "Other"),
Pulse = c(100, 150, 120),
Duration = c(60, 30, 45)
)

Data_Frame4 <- data.frame (


Steps = c(3000, 6000, 2000),
Calories = c(300, 400, 300)
)

New_Data_Frame1 <- cbind(Data_Frame3, Data_Frame4)


New_Data_Frame1

OPERATOR
An operator is a symbol that tells the R interpreter to perform a specific mathematical or
logical manipulation. The R language is rich in built-in operators and provides the following
types of operators.

Types of Operators
We have the following types of operators in R programming −

 Arithmetic Operators
 Relational Operators
 Logical Operators
 Assignment Operators
 Miscellaneous Operators

Arithmetic Operators
Following table shows the arithmetic operators supported by R language. The
operators act on each element of the vector.

The examples in this table use v <- c(2, 5.5, 6) and t <- c(8, 3, 4).

Operator  Description                                              Example         Result
+         Adds two vectors                                         print(v + t)    [1] 10.0  8.5 10.0
-         Subtracts the second vector from the first               print(v - t)    [1] -6.0  2.5  2.0
*         Multiplies both vectors                                  print(v * t)    [1] 16.0 16.5 24.0
/         Divides the first vector by the second                   print(v / t)    [1] 0.250000 1.833333 1.500000
%%        Gives the remainder of the first vector divided by       print(v %% t)   [1] 2.0 2.5 2.0
          the second
%/%       Gives the quotient of the division of the first          print(v %/% t)  [1] 0 1 1
          vector by the second
^         Raises the first vector to the exponent of the           print(v ^ t)    [1]  256.000  166.375 1296.000
          second vector

Relational Operators
Following table shows the relational operators supported by R language. Each
element of the first vector is compared with the corresponding element of the second
vector. The result of comparison is a Boolean value.

The examples in this table use v <- c(2, 5.5, 6, 9) and t <- c(8, 2.5, 14, 9).
Each operator checks every element of the first vector against the corresponding
element of the second vector.

Operator  Checks if each element of the first vector is   Example         Result
>         greater than the second                         print(v > t)    [1] FALSE  TRUE FALSE FALSE
<         less than the second                            print(v < t)    [1]  TRUE FALSE  TRUE FALSE
==        equal to the second                             print(v == t)   [1] FALSE FALSE FALSE  TRUE
<=        less than or equal to the second                print(v <= t)   [1]  TRUE FALSE  TRUE  TRUE
>=        greater than or equal to the second             print(v >= t)   [1] FALSE  TRUE FALSE  TRUE
!=        unequal to the second                           print(v != t)   [1]  TRUE  TRUE  TRUE FALSE

Logical Operators
Following table shows the logical operators supported by R language. They are applicable
only to vectors of type logical, numeric or complex. Zero is treated as FALSE and
every non-zero number is treated as TRUE.
Each element of the first vector is compared with the corresponding element of the
second vector. The result of comparison is a Boolean value.
Operator: &
Description: It is called the Element-wise Logical AND operator. It combines each
element of the first vector with the corresponding element of the second vector and
gives an output TRUE if both the elements are TRUE.
Example:
v <- c(3,1,TRUE,2+3i)
t <- c(4,1,FALSE,2+3i)
print(v&t)
It produces the following result:
[1]  TRUE  TRUE FALSE  TRUE

Operator: |
Description: It is called the Element-wise Logical OR operator. It combines each
element of the first vector with the corresponding element of the second vector and
gives an output TRUE if one of the elements is TRUE.
Example:
v <- c(3,0,TRUE,2+2i)
t <- c(4,0,FALSE,2+3i)
print(v|t)
It produces the following result:
[1]  TRUE FALSE  TRUE  TRUE

Operator: !
Description: It is called the Logical NOT operator. It takes each element of the
vector and gives the opposite logical value.
Example:
v <- c(3,0,TRUE,2+2i)
print(!v)
It produces the following result:
[1] FALSE  TRUE FALSE FALSE

The logical operators && and || work on single values rather than on whole vectors and
give a single TRUE or FALSE as output. Older versions of R silently used only the first
element of longer vectors; since R 4.3.0, passing a vector with more than one element
to && or || is an error, so the examples below select the first element explicitly.

Operator: &&
Description: Called the Logical AND operator. It takes a single value on each side and
gives TRUE only if both are TRUE.
Example:
v <- c(3,0,TRUE,2+2i)
t <- c(1,3,TRUE,2+3i)
print(v[1] && t[1])
It produces the following result:
[1] TRUE

Operator: ||
Description: Called the Logical OR operator. It takes a single value on each side and
gives TRUE if one of them is TRUE.
Example:
v <- c(0,0,TRUE,2+2i)
t <- c(0,3,TRUE,2+3i)
print(v[1] || t[1])
It produces the following result:
[1] FALSE
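
The difference between the element-wise and the single-value operators can be sketched as follows. This is a minimal illustration; the variable names (nums, as_flags, elementwise, safe) are made up for the example and are not part of the notes above.

```r
# Zero is FALSE; every other number (negative or fractional too) is TRUE.
nums <- c(-2, 0, 0.5, 1, 7)
as_flags <- as.logical(nums)
print(as_flags)

# & compares whole vectors element by element.
elementwise <- c(TRUE, FALSE, TRUE) & c(TRUE, TRUE, FALSE)
print(elementwise)

# && expects single values and stops as soon as the answer is known,
# so the right-hand side (x > 0) is never evaluated here.
x <- NULL
safe <- !is.null(x) && x > 0
print(safe)
```

Because && stops early, it is the usual choice inside if() conditions, while & is the usual choice when filtering vectors.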

Assignment Operators
These operators are used to assign values to vectors.

Operator: <- or = or <<-
Description: Called Left Assignment
Example:
v1 <- c(3,1,TRUE,2+3i)
v2 <<- c(3,1,TRUE,2+3i)
v3 = c(3,1,TRUE,2+3i)
print(v1)
print(v2)
print(v3)
It produces the following result:
[1] 3+0i 1+0i 1+0i 2+3i
[1] 3+0i 1+0i 1+0i 2+3i
[1] 3+0i 1+0i 1+0i 2+3i

Operator: -> or ->>
Description: Called Right Assignment
Example:
c(3,1,TRUE,2+3i) -> v1
c(3,1,TRUE,2+3i) ->> v2
print(v1)
print(v2)
It produces the following result:
[1] 3+0i 1+0i 1+0i 2+3i
[1] 3+0i 1+0i 1+0i 2+3i

Miscellaneous Operators
These operators are used for specific purposes and not for general mathematical or
logical computation.

Operator: :
Description: Colon operator. It creates a series of numbers in sequence for a vector.
Example:
v <- 2:8
print(v)
It produces the following result:
[1] 2 3 4 5 6 7 8

Operator: %in%
Description: This operator is used to identify if an element belongs to a vector.
Example:
v1 <- 8
v2 <- 12
t <- 1:10
print(v1 %in% t)
print(v2 %in% t)
It produces the following result:
[1] TRUE
[1] FALSE

Operator: %*%
Description: This operator performs matrix multiplication. In the example it is used to
multiply a matrix with its transpose.
Example:
M = matrix( c(2,6,5,1,10,4), nrow = 2,ncol = 3,byrow = TRUE)
t = M %*% t(M)
print(t)
It produces the following result:
     [,1] [,2]
[1,]   65   82
[2,]   82  117
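
As a further sketch, %*% works for any two matrices whose inner dimensions agree, and %in% also accepts a whole vector on its left side. The matrices A and B and the names product and members are made up for illustration.

```r
# A is 2 x 3 and B is 3 x 2, so A %*% B is defined and is 2 x 2.
A <- matrix(1:6, nrow = 2, byrow = TRUE)
B <- matrix(1:6, nrow = 3, byrow = TRUE)
product <- A %*% B
print(dim(product))
print(product)

# %in% tests every element on the left against the vector on the right.
members <- c(3, 11) %in% 1:10
print(members)
```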
Taking Input from User in R Programming
Developers often need to interact with users, either to get data or to
provide some sort of result. Most programs today use a dialog box as a way
of asking the user to provide some type of input. As in other programming
languages, in R it is also possible to take input from the user. There are
two methods for doing so in R.

 Using readline() method


 Using scan() method

Using readline() method


In R, the readline() method always returns the input as a string. If one enters the
integer 255, it is read as the string "255", so the value must be converted to the
type that is needed; in this case, the string "255" is converted to the integer
255. Note that readline() waits for input only in an interactive session; in a
non-interactive run it returns an empty string immediately. To convert the
input to the desired data type, there are several functions in R:

 as.integer(n): convert to integer
 as.numeric(n): convert to numeric type (double)
 as.complex(n): convert to complex number (e.g. 3+2i)
 as.Date(n): convert to date, etc.

Syntax:
var = readline();
var = as.integer(var);
Note that one can use "<-" instead of "=".

Example:


# R program to illustrate

# taking input from the user

# taking input using readline()

# this command will prompt you


# to input a desired value

var = readline();

# convert the inputted value to integer

var = as.integer(var);

# print the value

print(var)

Output:

255
[1] 255

One can also show a message in the console window to tell the user what to
enter. To do this, use the argument named prompt of the readline() function.
The prompt argument is optional; when it is omitted, no message is shown.

Syntax:
var1 = readline(prompt = "Enter any number : ");
or,
var1 = readline("Enter any number : ");
Example:


# R program to illustrate

# taking input from the user

# taking input with showing the message

var = readline(prompt = "Enter any number : ");

# convert the inputted value to an integer

var = as.integer(var);
# print the value

print(var)

Output:

Enter any number : 255


[1] 255

Taking multiple inputs in R


Taking multiple inputs in R is the same as taking a single input; one just needs
to call readline() once for each input. One can also group the readline() calls
inside braces.

Syntax:
var1 = readline("Enter 1st number : ");
var2 = readline("Enter 2nd number : ");
var3 = readline("Enter 3rd number : ");
var4 = readline("Enter 4th number : ");
or,
{
var1 = readline("Enter 1st number : ");
var2 = readline("Enter 2nd number : ");
var3 = readline("Enter 3rd number : ");
var4 = readline("Enter 4th number : ");
}
Example:

# R program to illustrate

# taking input from the user

# taking multiple inputs

var1 = readline("Enter 1st number : ");

var2 = readline("Enter 2nd number : ");

var3 = readline("Enter 3rd number : ");


var4 = readline("Enter 4th number : ");

# converting each value

var1 = as.integer(var1);

var2 = as.integer(var2);

var3 = as.integer(var3);

var4 = as.integer(var4);

# print the sum of the 4 number

print(var1 + var2 + var3 + var4)

Output:

Enter 1st number : 12


Enter 2nd number : 13
Enter 3rd number : 14
Enter 4th number : 15
[1] 54
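
Since as.integer() turns text that is not a number into NA, it can help to check the converted value before using it. The sketch below assumes a hypothetical helper, parse_integer(), that is not part of base R; it returns NA when the text is not a whole number.

```r
# Hypothetical helper: convert text to an integer,
# or return NA when the text is not a whole number.
parse_integer <- function(txt) {
  value <- suppressWarnings(as.numeric(trimws(txt)))
  if (is.na(value) || value != floor(value)) {
    return(NA_integer_)
  }
  as.integer(value)
}

print(parse_integer("255"))
print(parse_integer("12.5"))
print(parse_integer("abc"))

# readline() only waits for input in an interactive session,
# so the prompt is guarded with interactive().
if (interactive()) {
  n <- parse_integer(readline("Enter any number : "))
  if (is.na(n)) print("That was not a whole number") else print(n)
}
```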

Taking String and Character input in R

Taking string input works the same way as taking an integer, except that no
conversion is needed, because readline() always returns a string. R has no
separate single-character type: a single character such as "G" is simply a
string of length one, and its type is character. Calling as.character() on the
result of readline() therefore leaves it unchanged; it is shown below only to
make the intended type explicit.

Syntax:
string:
var1 = readline(prompt = "Enter your name : ");
print(var1)
character:
var1 = readline(prompt = "Enter any character : ");
var1 = as.character(var1)
print(var1)
Example:


# R program to illustrate

# taking input from the user

# string input

var1 = readline(prompt = "Enter your name : ");

# character input

var2 = readline(prompt = "Enter any character : ");

# convert to character

var2 = as.character(var2)

# printing values

print(var1)

print(var2)

Output:

Enter your name : GeeksforGeeks


Enter any character : G
[1] "GeeksforGeeks"
[1] "G"

Using scan() method


Another way to take user input in R is the scan() method, which reads input
from the console. It is handy when several values need to be entered quickly
for a mathematical calculation or a dataset. This method reads data in
the form of a vector or list and, by default, expects numeric values. It can
also read input from a file.

Syntax:
x = scan()
scan() keeps reading input until an empty line is entered, so to terminate the
input process press the Enter key once more on an empty line.
Example:
This is a simple way to take input using the scan() method: some
integers are entered and then printed on the next line of
the console.


# R program to illustrate

# taking input from the user

# taking input using scan()

x = scan()

# print the inputted values

print(x)

Output:

1: 1 2 3 4 5 6
7: 7 8 9 4 5 6
13:
Read 12 items
[1] 1 2 3 4 5 6 7 8 9 4 5 6
Explanation:
A total of 12 integers are entered on 2 lines. When the prompt moves to the 3rd
line (13:), pressing the Enter key on the empty line terminates the input process.

Taking double, string, character type values using scan() method
To take double, string or character input, specify the type of the input
in the scan() method. To do this there is an argument called what, by
which one can specify the data type of the input. In R a string and a character
value have the same type, so what = " " and what = character() behave identically.

Syntax:
x = scan(what = double()) # for double
x = scan(what = " ") # for string
x = scan(what = character()) # for character
Example:


# R program to illustrate

# taking input from the user

# double input using scan()

d = scan(what = double())

# string input using 'scan()'

s = scan(what = " ")

# character input using 'scan()'

c = scan(what = character())

# print the inputted values

print(d) # double

print(s) # string

print(c) # character

Output:

1: 123.321 523.458 632.147


4: 741.25 855.36
6:
Read 5 items
1: geeksfor geeks gfg
4: c++ R java python
8:
Read 7 items
1: g e e k s f o
8: r g e e k s
14:
Read 13 items
[1] 123.321 523.458 632.147 741.250 855.360
[1] "geeksfor" "geeks" "gfg" "c++" "R" "java"
"python"
[1] "g" "e" "e" "k" "s" "f" "o" "r" "g" "e" "e" "k" "s"
Explanation:
Here, count of double items is 5, count of string items is 7, count of character
items is 13.

Read File data using scan() method

Reading a file with scan() works the same way as console input; the only
difference is that one passes the file name and the data type to the scan() method.

Syntax:
x = scan("fileDouble.txt", what = double()) # for double
x = scan("fileString.txt", what = " ") # for string
x = scan("fileChar.txt", what = character()) # for character
Example:

# R program to illustrate

# taking input from the user

# string file input using scan()

s = scan("fileString.txt", what = " ")

# double file input using scan()

d = scan("fileDouble.txt", what = double())

# character file input using scan()

c = scan("fileChar.txt", what = character())

# print the inputted values

print(s) # string

print(d) # double
print(c) # character

Output:

Read 7 items
Read 5 items
Read 13 items
[1] "geek" "for" "geeks" "gfg" "c++" "java" "python"
[1] 123.321 523.458 632.147 741.250 855.360
[1] "g" "e" "e" "k" "s" "f" "o" "r" "g" "e" "e" "k" "s"
Save the data file in the same location where the program is saved (the working
directory) for easy access. Otherwise the full path of the file needs to be given inside
the scan() method.
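
The file-reading steps above can be tried without preparing a file by hand. The sketch below writes a temporary file first; the file contents and the variable names (tmp, d, words) are made up for illustration. It also uses the text argument of scan(), which reads from a string instead of a file or the console.

```r
# Create a small temporary file so the example is self-contained.
tmp <- tempfile(fileext = ".txt")
writeLines(c("123.321 523.458", "632.147"), tmp)

# Read doubles from the file; quiet = TRUE hides the "Read n items" message.
d <- scan(tmp, what = double(), quiet = TRUE)
print(d)

# Read strings directly from a piece of text.
words <- scan(text = "geeks for geeks", what = character(), quiet = TRUE)
print(words)

# Remove the temporary file again.
unlink(tmp)
```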

Control Structures in R
R provides different control structures that can be used on their own and
even in combinations to control the flow of the program. These control
structures are:

1. If – else
2. ifelse() function
3. Switch
4. For loops
5. While loops
6. Break statement
7. Next statement
8. Repeat loops
Let’s take a look at these structures one at a time:

1. if – else
The if-else in R enforce conditional execution of code. They are an important
part of R’s decision-making capability. It allows us to make a decision based
on the result of a condition. The if statement contains a condition that
evaluates to a logical output. It runs the enclosed code block if the condition
evaluates to TRUE. It skips the code block if the condition evaluates
to FALSE.
We can use the if statement on its own like:
Code:

a <- 5
b <- 6

if(a<b){
  print("a is smaller than b")
}

Output:

[1] "a is smaller than b"

We use the else statement with the if statement to enact a choice
between two alternatives. If the condition within the if statement evaluates
to FALSE, it runs the code within the else statement. For example:
Code:

if(a>b){
  print("a is greater than b")
} else{
  print("b is greater than a")
}

Output:

[1] "b is greater than a"
We can use the else if statement to select between multiple options. For
example:
Code:

a <- 5
b <- 5

if(a<b){
  print("a is smaller than b")
} else if(a==b) {
  print("a is equal to b")
} else {
  print("a is greater than b")
}

Output:

[1] "a is equal to b"
2. ifelse() Function
The ifelse() function acts like the if-else structure, but it is vectorized: it
tests every element of the condition and returns a result of the same length.
The following is the syntax of the ifelse() function in R:

ifelse(condition, exp_if_true, exp_if_false)

Where condition is the condition that evaluates to either TRUE or FALSE,
exp_if_true is the expression that is returned if the condition results
in TRUE,
exp_if_false is the expression that is returned if the condition results
in FALSE.
Code:

ifelse(a<7,"a is less than 7","a is not less than 7")

Output:

[1] "a is less than 7"
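
Because ifelse() works on whole vectors, one call can label every element at once. A minimal sketch, with a made-up scores vector and a pass mark of 50 chosen only for illustration:

```r
# Label each score as "pass" or "fail" in a single vectorized call.
scores <- c(35, 72, 50, 90)
labels <- ifelse(scores >= 50, "pass", "fail")
print(labels)
```

A plain if statement, by contrast, expects a single TRUE or FALSE.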
3. switch
The switch is an easier way to choose between multiple alternatives than
multiple if-else statements. The R switch takes a single input argument and
returns the alternative selected by the value of the input. Each possible
value of the input is called a case. When the input is a number, as below, switch
picks the alternative at that position; when the input is a string, switch picks
the alternative with the matching name. For example:
Code:

a <- 4

switch(a,
  "1"="this is the first case in switch",
  "2"="this is the second case in switch",
  "3"="this is the third case in switch",
  "4"="this is the fourth case in switch",
  "5"="this is the fifth case in switch"
)

Output:

[1] "this is the fourth case in switch"
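
With a string input, switch() matches the case names, an empty case falls through to the next one, and an unnamed last alternative acts as the default. The sketch below uses a hypothetical function, describe_day(), written only for this illustration.

```r
# Hypothetical example: match a string against named cases.
describe_day <- function(day) {
  switch(day,
    "sat" = ,                 # empty case: falls through to "sun"
    "sun" = "weekend",
    "mon" = "start of the week",
    "some other day"          # unnamed last value: the default
  )
}

print(describe_day("sat"))
print(describe_day("mon"))
print(describe_day("wed"))

# With a numeric input that has no matching position, switch() returns NULL.
no_match <- switch(4, "a", "b", "c")
print(is.null(no_match))
```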
4. for loops
The for loop in R iterates over a sequence to perform repeated tasks.
It uses a loop variable that takes each value of the sequence in turn. The following
is the syntax of for loops in R:

for(variable in sequence){
  Code_to_repeat
}

Where variable is the loop variable,
sequence is the sequence which we need to loop through,
Code_to_repeat is the code that runs every iteration.
Code:

vec <- c(1:10)

for(i in vec){
  print(vec[i])
}

Output:

[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
[1] 6
[1] 7
[1] 8
[1] 9
[1] 10
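
In the example above the values of vec happen to be valid positions in vec, so vec[i] works. When a loop needs positions rather than values, seq_along() is a common choice; the sketch below uses a made-up fruits vector to show it.

```r
# Loop over positions so that each element can be read and a result stored.
fruits <- c("apple", "banana", "cherry")
labels <- character(length(fruits))
for (i in seq_along(fruits)) {
  labels[i] <- paste(i, fruits[i])
}
print(labels)

# seq_along() gives an empty sequence for an empty vector,
# so the loop body below never runs.
empty <- c()
count <- 0
for (i in seq_along(empty)) {
  count <- count + 1
}
print(count)
```

Writing 1:length(empty) instead would produce the sequence 1, 0 and run the loop twice.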
5. while Loops
The while loop in R evaluates a condition. If the condition evaluates
to TRUE it loops through a code block, whereas if the condition evaluates
to FALSE it exits the loop. The while loop in R keeps looping through the
enclosed code block as long as the condition is TRUE. If the condition never
becomes FALSE, the result is an infinite loop, which is something to avoid. The while loop's
syntax is as follows:

while(condition){
  code_to_run
}

Where condition is the condition to run the loop,
code_to_run is the code block that runs if the condition evaluates to TRUE.

Code:

i <- 0

while(i<10){
  print(paste("this is iteration no",i))
  i <- i+1
}

Output:

[1] "this is iteration no 0"
[1] "this is iteration no 1"
[1] "this is iteration no 2"
[1] "this is iteration no 3"
[1] "this is iteration no 4"
[1] "this is iteration no 5"
[1] "this is iteration no 6"
[1] "this is iteration no 7"
[1] "this is iteration no 8"
[1] "this is iteration no 9"

6. break Statement
The break statement can break out of a loop. Imagine a loop searching a
specific element in a sequence. The loop needs to keep going until either it
finds the element or until the end of the sequence. If it finds the element
early, further looping is not needed. In such a case, the R break statement
can “break” us out of the loop early. For example:
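Code:

for(i in vec){
  print(paste("this is iteration no", i))
  if(i==7){
    print("break!!")
    break
  }
}

Output:

[1] "this is iteration no 1"
[1] "this is iteration no 2"
[1] "this is iteration no 3"
[1] "this is iteration no 4"
[1] "this is iteration no 5"
[1] "this is iteration no 6"
[1] "this is iteration no 7"
[1] "break!!"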

7. next Statement

The next statement in R causes the loop to skip the current iteration and
start the next one. For example:
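Code:

for(i in vec){
  if(i==5 || i==7){
    print("next!!")
    next
  }
  print(paste("this is iteration no", i))
}

Output:

[1] "this is iteration no 1"
[1] "this is iteration no 2"
[1] "this is iteration no 3"
[1] "this is iteration no 4"
[1] "next!!"
[1] "this is iteration no 6"
[1] "next!!"
[1] "this is iteration no 8"
[1] "this is iteration no 9"
[1] "this is iteration no 10"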
8. repeat loop
The repeat loop in R starts an infinite loop from the get-go. The only
way to get out of the loop is to use the break statement. The repeat loop is
useful when you don't know the required number of iterations. For
example:
Code:

vec2 <- 1:40
x <- 15
i <- 1

repeat{
  if(i == x){
    print("found it!!")
    break
  }
  print("not found!")
  i <- i+1
}

Output:

The loop prints "not found!" 14 times, followed by "found it!!".

R Functions
A function is a block of code which only runs when it is called.

You can pass data, known as parameters, into a function.

A function can return data as a result.

Creating a Function
To create a function, use the function keyword:

Example
my_function <- function() { # create a function with the name my_function
  print("Hello World!")
}

Call a Function
To call a function, use the function name followed by parenthesis,
like my_function():

Example
my_function <- function() {
print("Hello World!")
}

my_function() # call the function named my_function

Arguments
Information can be passed into functions as arguments.

Arguments are specified after the function name, inside the parentheses. You
can add as many arguments as you want, just separate them with a comma.

The following example has a function with one argument (fname). When the
function is called, we pass along a first name, which is used inside the
function to print the full name:

Example
my_function <- function(fname) {
paste(fname, "Griffin")
}

my_function("Peter")
my_function("Lois")
my_function("Stewie")

Parameters or Arguments?
The terms "parameter" and "argument" can be used for the same thing:
information that is passed into a function.

From a function's perspective:

A parameter is the variable listed inside the parentheses in the function
definition.

An argument is the value that is sent to the function when it is called.

Number of Arguments
By default, a function must be called with the correct number of arguments.
Meaning that if your function expects 2 arguments, you have to call the
function with 2 arguments, not more, and not less:

Example
This function expects 2 arguments, and gets 2 arguments:

my_function <- function(fname, lname) {


paste(fname, lname)
}

my_function("Peter", "Griffin")

If you try to call the function with 1 or 3 arguments, you will get an error:

Example
This function expects 2 arguments, and gets 1 argument:

my_function <- function(fname, lname) {


paste(fname, lname)
}

my_function("Peter")

Default Parameter Value


The following example shows how to use a default parameter value.

If we call the function without an argument, it uses the default value:

Example
my_function <- function(country = "Norway") {
paste("I am from", country)
}

my_function("Sweden")
my_function("India")
my_function() # will get the default value, which is Norway
my_function("USA")

Return Values
To let a function return a result, use the return() function:

Example
my_function <- function(x) {
return (5 * x)
}

print(my_function(3))
print(my_function(5))
print(my_function(9))

The output of the code above will be:


[1] 15
[1] 25
[1] 45

Nested Functions
There are two ways to create a nested function:

 Call a function within another function.


 Write a function within a function.

Example
Call a function within another function:

Nested_function <- function(x, y) {


a <- x + y
return(a)
}

Nested_function(Nested_function(2,2), Nested_function(3,3))

Example Explained
The function tells x to add y.

The first input Nested_function(2,2) is "x" of the main function.

The second input Nested_function(3,3) is "y" of the main function.

The output is therefore (2+2) + (3+3) = 10.

Example
Write a function within a function:

Outer_func <- function(x) {


Inner_func <- function(y) {
a <- x + y
return(a)
}
return (Inner_func)
}
output <- Outer_func(3) # To call the Outer_func
output(5)

Example Explained
You cannot call Inner_func directly, because it has been
defined (nested) inside Outer_func.

We need to call Outer_func first in order to call Inner_func as a second step.

Calling Outer_func(3) returns Inner_func with x fixed at 3, and we store that
returned function in a new variable called output.

We then call output with the desired value of "y", which in this case is
5.

The output is therefore 8 (3 + 5).
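
The same pattern, an outer function that returns an inner function, can be used to build several related functions. The sketch below uses a hypothetical make_adder() function; each returned function remembers the x it was created with.

```r
# Hypothetical function factory: returns a function that adds x to its input.
make_adder <- function(x) {
  function(y) x + y
}

add3 <- make_adder(3)
add10 <- make_adder(10)

print(add3(5))
print(add10(5))
```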

Recursion
R also accepts function recursion, which means a defined function can call
itself.

Recursion is a common mathematical and programming concept. It has the benefit
that you can loop through data to reach a result.

The developer should be very careful with recursion as it can be quite easy to
slip into writing a function which never terminates, or one that uses excess
amounts of memory or processor power. However, when written correctly,
recursion can be a very efficient and mathematically-elegant approach to
programming.

In this example, tri_recursion() is a function that we have defined to call
itself ("recurse"). We use the k variable as the data, which decrements (-1)
every time we recurse. The recursion ends when k is not greater
than 0 (i.e. when it is 0).

To a new developer it can take some time to work out how exactly this
works; the best way to find out is by testing and modifying it.

Example
tri_recursion <- function(k) {
if (k > 0) {
result <- k + tri_recursion(k - 1)
print(result)
} else {
result = 0
return(result)
}
}
tri_recursion(6)
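
For comparison, here is a sketch of two recursive functions that return their result without printing the intermediate values. The names factorial_rec and tri_sum are made up for this illustration; each has a base case that stops the recursion.

```r
# Factorial: the base case n <= 1 stops the recursion.
factorial_rec <- function(n) {
  if (n <= 1) {
    return(1)
  }
  n * factorial_rec(n - 1)
}

# Triangular number: k + (k - 1) + ... + 1, with 0 as the base case.
tri_sum <- function(k) {
  if (k > 0) k + tri_sum(k - 1) else 0
}

print(factorial_rec(5))
print(tri_sum(6))
```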

Dates and Times in R


date() function in R Language is used to return the current date and time.

Syntax: date()
Parameters:
Does not accept any parameters
Ex: date()
[1] "Sun Mar 17 22:11:39 2024"

Sys.Date() Function
The Sys.Date() function is used to return the system's date.
Syntax: Sys.Date()
Parameters:
Does not accept any parameters
Ex: Sys.Date()
[1] "2024-03-17"

Sys.time()
The Sys.time() function is used to return the system's date and time.
Syntax: Sys.time()
Parameters:
Does not accept any parameters

Ex:Sys.time()
[1] "2024-03-17 22:09:02 IST"

Sys.timezone()
Sys.timezone() function is used to return the current time zone.
Syntax: Sys.timezone()
Parameters:
Does not accept any parameters
Ex: Sys.timezone()
[1] "Asia/Calcutta"

Times in R are represented by the POSIXct or POSIXlt class and Dates are
represented by the Date class. The as.Date() function handles dates in R
without time. This function takes the date as a String in the format YYYY-MM-DD
or YYYY/MM/DD and internally represents it as the number of days since 1970-01-01.
Times are stored internally as the number of seconds since 1970-01-01.

The following format codes are used to specify date and time formats; some of them
are used in the examples below.

CODE  MEANING                        CODE  MEANING
%a    Abbreviated weekday            %A    Full weekday
%b    Abbreviated month              %B    Full month
%c    Locale-specific date and time  %d    Decimal day of the month
%H    Decimal hours (24 hour)        %I    Decimal hours (12 hour)
%j    Decimal day of the year        %m    Decimal month
%M    Decimal minute                 %p    Locale-specific AM/PM
%S    Decimal second                 %U    Decimal week of the year (starting on Sunday)
%w    Decimal weekday (0=Sunday)     %W    Decimal week of the year (starting on Monday)
%x    Locale-specific date           %X    Locale-specific time
%y    2-digit year                   %Y    4-digit year
%z    Offset from GMT                %Z    Time zone (character)

So the Date 1970-01-01 is stored as 0 and 1970-01-02 is stored as 1. Let's check this with
an example.

The default input format for Date consists of the year, followed by the month
and day, separated by slashes or dashes.

# Date examples
x <- as.Date("1970-01-01")
y <- as.Date("1970-01-02")
print(x)
print(y)
Yields the output below. Note that typeof(x) returns "double"
and class(x) returns "Date".
> x <- as.Date("1970-01-01")
> y <- as.Date("1970-01-02")
> print(x)
[1] "1970-01-01"
> print(y)
[1] "1970-01-02"
To check the internal value of these dates, use the unclass() function.

# Check internal value of date


unclass(x)
unclass(y)
> unclass(x)
[1] 0
> unclass(y)
[1] 1
1.1 Dates not in Standard Format

If your input dates are not in the standard format, you have to specify the
format as shown below.

z <- as.Date("01/01/1970",format='%m/%d/%Y')
print(z)
> z <- as.Date("01/01/1970",format='%m/%d/%Y')
> print(z)
[1] "1970-01-01"
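
The format codes from the table work in both directions: format() turns a Date into text and as.Date() with a format argument turns text into a Date. A minimal sketch, using only numeric codes so that the result does not depend on the locale; the variable names are made up for illustration.

```r
d <- as.Date("2024-03-17")

# Date to text.
formatted <- format(d, "%d/%m/%Y")
day_of_year <- format(d, "%j")
short_year <- format(d, "%y")
print(formatted)
print(day_of_year)
print(short_year)

# Text to Date, with dots as separators.
parsed <- as.Date("17.03.2024", format = "%d.%m.%Y")
print(parsed)
```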

2. Times in R

Times are stored internally as the number of seconds since 1970-01-01 (UTC). Times in R
are represented either by the POSIXct or the POSIXlt class. Let's see with
examples what each time class brings to us.

POSIXct stores the time as a single large number, whereas POSIXlt stores the time
as a list of values such as second, minute, hour, day, month and year. POSIXlt comes
in handy if you want to get a specific component of a time.
The default input format for POSIX dates consists of the year, followed by the
month and day, separated by slashes or dashes; for time values, the date may be
followed by white space and a time in the form hour:minutes:seconds or
hour:minutes. The time zone is not read from the string; it is taken from the tz
argument or, if that is not given, from the system setting.
2.1 POSIXct Time Class

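The as.POSIXct() function takes a date and time as input and returns a value of
class POSIXct, which internally represents the time as a large number of seconds.
Use unclass() to check this value. In the example below the trailing "PST" in the
input is not used: the output is shown in IST, the time zone of the system on which
the code was run.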

# Using POSIXct
timect <- as.POSIXct("2022-11-08 22:14:35 PST")
print(timect)
class(timect)
unclass(timect)
Output:
> timect <- as.POSIXct("2022-11-08 22:14:35 PST")
> print(timect)
[1] "2022-11-08 22:14:35 IST"
> class(timect)
[1] "POSIXct" "POSIXt"
> unclass(timect)
[1] 1667925875
attr(,"tzone")
[1] ""
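
To control the time zone explicitly, the tz argument of as.POSIXct() can be used. The sketch below assumes the zone names "UTC" and "Asia/Kolkata" are available on the system (they are standard time zone database names); the variable names are made up for illustration.

```r
# The same clock time interpreted in two different time zones.
utc_time <- as.POSIXct("2022-11-08 22:14:35", tz = "UTC")
india_time <- as.POSIXct("2022-11-08 22:14:35", tz = "Asia/Kolkata")

# Seconds since 1970-01-01 00:00:00 UTC.
print(as.numeric(utc_time))
print(as.numeric(india_time))

# India is 5.5 hours ahead of UTC, so the two instants differ by 19800 seconds.
offset_secs <- as.numeric(utc_time) - as.numeric(india_time)
print(offset_secs)

# The India time shown as a UTC clock time.
print(format(india_time, tz = "UTC"))
```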
2.2 POSIXlt Time Class

The as.POSIXlt() function also takes the date and time as a string and
returns a value of class POSIXlt. It internally stores the date and
time parts as a list containing the second, minute, hour, day of the month, month,
year, day of the week, day of the year, etc. You can check these values by calling
unclass(). Note that mon counts months from 0 (so 10 means November) and year counts
years since 1900 (so 122 means 2022).

# Using as.POSIXlt
timelt <- as.POSIXlt("2022-11-08 22:14:35 PST")
print(timelt)
class(timelt)
unclass(timelt)
Yields below output.
> timelt <- as.POSIXlt("2022-11-08 22:14:35 PST")
> print(timelt)
[1] "2022-11-08 22:14:35 IST"
> class(timelt)
[1] "POSIXlt" "POSIXt"
> unclass(timelt)
$sec
[1] 35
$min
[1] 14
$hour
[1] 22
$mday
[1] 8
$mon
[1] 10
$year
[1] 122
$wday
[1] 2
$yday
[1] 311
$isdst
[1] 0
$zone
[1] "IST"
$gmtoff
[1] NA
attr(,"tzone")
[1] "" "IST" "+0630"
attr(,"balanced")
[1] TRUE

3. Operations on Dates & Times in R

You can perform arithmetic operations like + and - on Dates & Times,
and you can also do comparisons like ==, >, <, etc. Following are some
examples.

3.1 Subtract Dates in R

In the below example, I am subtracting the date from another date which results
in differences in the number of days.

# Date Diff
dateDiff <- as.Date("2021-01-01") - as.Date("2020-01-01")
print(dateDiff)

# Output
# Time difference of 366 days
3.2 Subtract Times in R

Now let's subtract one time from another. The result is a time difference
(difftime), printed as a decimal number together with its unit.

#Time Diff
x <- as.POSIXlt("2022-11-08 03:14:35")
y <- as.POSIXlt("2022-11-09 00:00:00")
timeDiff <- y - x
print(timeDiff)

# Output
# Time difference of 20.75694 hours
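
When the unit matters, the difftime() function lets you choose it instead of letting R pick one. A minimal sketch; the time zone is fixed to UTC only to keep the example reproducible, and the variable names are made up for illustration.

```r
start <- as.POSIXct("2022-11-08 03:14:35", tz = "UTC")
end <- as.POSIXct("2022-11-09 00:00:00", tz = "UTC")

# Ask for the difference in minutes.
gap_mins <- difftime(end, start, units = "mins")
print(gap_mins)

# Convert the same difference to plain numbers in other units.
gap_secs <- as.numeric(gap_mins, units = "secs")
gap_hours <- as.numeric(gap_mins, units = "hours")
print(gap_secs)
print(gap_hours)
```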
3.3 Extract Parts of Date & Time

Since POSIXlt stores the time as a list holding all the date and time fields, you
can use the $ operator on the object to get the values. The fields you can use are
"sec", "min", "hour", "mday", "mon", "year", "wday", "yday", "isdst",
"zone", "gmtoff".

# Date & Time Values


timelt$sec
timelt$wday

# Output
#[1] 35
#[1] 2
3.4 Add Days to Dates

By using the + operator let’s add some days to the Date.

# Add days to date


newDate <- as.Date("2021-01-01") + 3
print(newDate)

# Output
#[1] "2021-01-04"
4. Dates & Times Functions in R

Following are some of the Dates and Times functions in R.

4.1 Sys.time()

Sys.time() returns the current system date and time in a format like
"2022-11-09 20:05:17 PST". The result has the classes "POSIXct" and "POSIXt".

x <- Sys.time()
print(x)
class(x)
> x <- Sys.time()
> print(x)
[1] "2024-03-17 23:18:06 IST"
> class(x)
[1] "POSIXct" "POSIXt"
4.2 Find Interval Between Dates

If you have a vector of dates or times, you can use the diff() function to get the
differences between consecutive elements. The result of the below example is the
difference between the first and second dates and the difference between the second and
third dates.

# Differences
datesVec <- as.Date(c("2020-04-21", "2021-06-30", "2021-11-04"))
diff(datesVec)

# Output
#Time differences in days
#[1] 435 127
4.3 Generate Sequence of Dates

By using the seq() function you can generate a sequence of dates of a
specified length. The below example generates 5 dates, one month apart.

dateMonth <- seq(as.Date("2020-04-21"), length.out = 5, by = "month")
dateMonth

# Output
#[1] "2020-04-21" "2020-05-21" "2020-06-21" "2020-07-21" "2020-08-21"
4.4 Truncate Date & Time

The trunc() function is used to truncate date and time values. The below
examples demonstrate truncation to minutes, days, and years.

#truncate
x <- as.POSIXlt("2022-11-08 03:14:35 PST")
trunc(x, "mins")
trunc(x, "days")
trunc(x, "year")
# Output
[1] "2022-11-08 03:14:00 PST"
[1] "2022-11-08 PST"
[1] "2022-01-01 PST"
4.5 strptime()

If you have dates and times in a format other than the standard one,
use strptime() to convert them to the POSIXlt class. This function takes a character
vector that has dates and times and converts it into a POSIXlt object.

#strptime() function example
strDate <- c("11 April, 2022 09:30", "20 November, 2022 10:05")
x <- strptime(strDate, "%d %B, %Y %H:%M")
print(x)
class(x)
Yields below output.
> strDate <- c("11 April, 2022 09:30", "20 November, 2022 10:05")
> x <- strptime(strDate, "%d %B, %Y %H:%M")
> print(x)
[1] "2022-04-11 09:30:00 IST" "2022-11-20 10:05:00 IST"
> class(x)
[1] "POSIXlt" "POSIXt"
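
A minimal sketch of the round trip, assuming an English locale (needed for %B to recognise month names) and using tz = "UTC" so the result does not depend on the machine's time zone. The names iso and ct are illustrative:

```r
strDate <- c("11 April, 2022 09:30", "20 November, 2022 10:05")

# Fixing tz makes the result independent of the local time zone
x <- strptime(strDate, "%d %B, %Y %H:%M", tz = "UTC")

# format() turns the date-time back into text in any layout
iso <- format(x, "%Y-%m-%d %H:%M")
print(iso)

# POSIXlt stores components (year, month, ...); POSIXct stores one number
ct <- as.POSIXct(x)
print(class(ct))
```

POSIXct is usually the more convenient class for storing date-times in data frames.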

Scoping Rules in R
The location where we can find a variable and also access it if required is called the
scope of a variable.

There are mainly two types of variable scopes:

1. Local variable Scope


2. Global variable Scope
Global variable Scope:(created outside the function)

Global variables are those variables that exist throughout the execution of a program. It
can be changed and accessed from any part of the program. It is used both inside and
outside the function.

Ex:(Declaring global variable outside the function)

s = "software"

fun1 = function() {
  paste("R is a", s)
}

fun1()

Output:

[1] "R is a software"

Global variables can be read inside a function. They can also be created or changed
inside a function by using the global assignment operator <<- (also called the super
assignment operator).

a = "Python"

fun1 = function(x) {
  print(x)
  print(a)
}

fun1(5)

Output:

[1] 5
[1] "Python"

f2 = function() {
  a <<- "a studio"
  print(a)
}

f2()

Output:

[1] "a studio"

Local variable Scope: (created inside the function)

Local variables are those variables that exist only within a certain part of a program,
such as a function, and are released when the function call ends. Local variables are
declared inside a block.
f2 = function() {
  b = "language"
  paste("R is a", b)
}

f2()

Output:

[1] "R is a language"

print(b)

Output:

Error: object 'b' not found

b has scope only inside the function, so it cannot be accessed from outside.

Global variable and local variable with same name

Example:

x = "python"

func = function() {
  x = "cobol"
  paste(x, "is a language")
}

func()

Output:

[1] "cobol is a language"
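
The difference between local assignment and the <<- operator can be sketched as follows. The function names shadow and overwrite are hypothetical, chosen only for this illustration:

```r
x <- "python"          # global variable

shadow <- function() {
  x <- "cobol"         # local x; the global x is untouched
  x
}

overwrite <- function() {
  x <<- "fortran"      # <<- changes x in the enclosing environment
  x
}

local_value  <- shadow()
after_shadow <- x      # still "python"
overwrite()
after_super  <- x      # now "fortran"
print(c(local_value, after_shadow, after_super))
```

Because <<- changes state outside the function, it is usually best kept for cases where that side effect is intended.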

Loop Functions

Looping on the Command Line


Writing for and while loops is useful when programming but not particularly easy
when working interactively on the command line. Multi-line expressions with curly
braces are just not that easy to sort through when working on the command line. R
has some functions which implement looping in a compact form to make
your life easier.

 lapply(): Loop over a list and evaluate a function on each element


 sapply(): Same as lapply but try to simplify the result
 apply(): Apply a function over the margins of an array
 tapply(): Apply a function over subsets of a vector
 mapply(): Multivariate version of lapply

apply() function
apply() takes a data frame or matrix as input and returns a vector, list or
array. It is primarily used to avoid explicit loop constructs. It is the most
basic of the apply family of functions and is typically used over a matrix.
The syntax for apply() is as follows:

apply(x,MARGIN,FUN,…)

Parameters

Parameter   Condition   Description

x           Required    A matrix, data frame or array

MARGIN      Required    A vector giving the subscripts which the function will be
                        applied over.
                        1 indicates rows
                        2 indicates columns
                        c(1, 2) indicates rows and columns

FUN         Required    The function to be applied

…           Optional    Any other arguments to be passed to the FUN function

Examples
# Get the sum of each column
data <- matrix(1:9, nrow=3, ncol=3)
data
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9
apply(data, 2, sum)
[1] 6 15 24
# Get the sum of each row
apply(data, 1, sum)
[1] 12 15 18
You can use user-defined functions as well.

# Apply a custom function that squares each element in a matrix


apply(data, 2, function(x){x^2})
[,1] [,2] [,3]
[1,] 1 16 49
[2,] 4 25 64
[3,] 9 36 81
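
As a further sketch of the "…" parameter (the object names data_na, row_range and col_sum are illustrative), any arguments given after FUN are passed on to it:

```r
data <- matrix(1:9, nrow = 3, ncol = 3)

# Range (max - min) of each row, using an anonymous function
row_range <- apply(data, 1, function(r) max(r) - min(r))
print(row_range)

# Extra arguments after FUN are forwarded: na.rm = TRUE goes to sum()
data_na <- data
data_na[2, 2] <- NA
col_sum <- apply(data_na, 2, sum, na.rm = TRUE)
print(col_sum)
```

Without na.rm = TRUE, the sum of the second column would be NA.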

lapply() function
lapply() applies a function to each element of a list and returns a list of the
same length as the input, where each element is the result of applying FUN to the
corresponding input element. lapply() accepts a list, vector or data frame as input
and always returns a list.

The syntax for lapply() is as follows:


lapply(x,FUN,…)

Parameters

Parameter Condition Description

x Required A list

FUN Required The function to be applied

… Optional Any other arguments to be passed to the FUN function

Example
# Get the sum of each list item
data <- list(item1 = 1:5,
item2 = seq(4,36,8),
item4 = c(1,3,5,7,9))
data
$item1
[1] 1 2 3 4 5
$item2
[1] 4 12 20 28 36
$item4
[1] 1 3 5 7 9
lapply(data, sum)
$item1
[1] 15
$item2
[1] 100
$item4
[1] 25

The l in lapply() stands for list. The difference between lapply() and apply() lies
in the return value: the output of lapply() is always a list. lapply() can also be
used on other objects such as data frames and vectors.

lapply() function does not need MARGIN.

A simple example is converting the strings in a character vector to lower case
with the tolower function. We construct a vector with the names of some famous
movies, written in upper case.
movies <- c("SPIDERMAN","BATMAN","VERTIGO","CHINATOWN")
movies_lower <-lapply(movies, tolower)
str(movies_lower)

Output:
## List of 4
##  $ : chr "spiderman"
##  $ : chr "batman"
##  $ : chr "vertigo"
##  $ : chr "chinatown"
We can use unlist() to convert the list into a vector.
movies_lower <-unlist(lapply(movies,tolower))
str(movies_lower)
Output:
## chr [1:4] "spiderman" "batman" "vertigo" "chinatown"

sapply() function
sapply() takes a list, vector or data frame as input and returns a vector
or matrix. It does the same job as lapply() but tries to simplify the result:
instead of a list it returns a vector when every result has length one, or a
matrix when every result has the same length greater than one.

The syntax for sapply() is as follows:


sapply(x,FUN,…)

Parameters
Parameter Condition Description

x Required A list

FUN Required The function to be applied

… Optional Any other arguments to be passed to the FUN function

Example
# Get the sum of each list item and simplify the result into a vector
data <- list(item1 = 1:5,
item2 = seq(4,36,8),
item4 = c(1,3,5,7,9))
data
$item1
[1] 1 2 3 4 5
$item2
[1] 4 12 20 28 36
$item4
[1] 1 3 5 7 9
sapply(data, sum)
item1 item2 item4
15 100 25
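
A short side-by-side sketch of the two functions on the same input (the names as_list, as_vector and ranges are illustrative):

```r
data <- list(item1 = 1:5, item2 = seq(4, 36, 8), item4 = c(1, 3, 5, 7, 9))

as_list   <- lapply(data, sum)   # list of three elements
as_vector <- sapply(data, sum)   # named numeric vector

print(class(as_list))
print(class(as_vector))

# When FUN returns vectors of equal length > 1, sapply() builds a matrix
ranges <- sapply(data, range)
print(dim(ranges))
```

Here range() returns two values per element, so the simplified result is a matrix with two rows and one column per list item.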

tapply() function
The tapply() function breaks the data set up into groups and applies a function to each
group.

Syntax
The syntax for tapply() is as follows:
tapply(x,INDEX,FUN,…,simplify)

Parameters
Parameter Condition Description

x Required A vector

INDEX Required A grouping factor or a list of factors

FUN Required The function to be applied

… Optional Any other arguments to be passed to the FUN function

simplify Optional Returns simplified result if set to TRUE. Default is TRUE.

Example
# Find the age of youngest male and female
data <- data.frame(name=c("Amy","Max","Ray","Kim","Sam","Eve","Bob"),
age=c(24, 22, 21, 23, 20, 24, 21),
gender=factor(c("F","M","M","F","M","F","M")))
data
name age gender
1 Amy 24 F
2 Max 22 M
3 Ray 21 M
4 Kim 23 F
5 Sam 20 M
6 Eve 24 F
7 Bob 21 M
tapply(data$age, data$gender, min)
F M
23 20

Part of the job of a data scientist or researcher is to compute summaries of
variables, for instance measuring the average or grouping data based on a
characteristic. Most data are grouped by ID, city, country, and so on.
Summarizing over groups reveals more interesting patterns.

To understand how it works, let’s use the iris dataset. This dataset is very famous
in the world of machine learning. The purpose of this dataset is to predict the class
of each flower among three species: Setosa, Versicolor, Virginica. The dataset
records the sepal and petal length and width of each flower.

As preliminary work, we can compute the median sepal width for each species.
tapply() is a quick way to perform this computation.
data(iris)
tapply(iris$Sepal.Width, iris$Species, median)
Output:
## setosa versicolor virginica
## 3.4 2.8 3.0
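
To make the "break into groups, then apply" idea concrete, here is a sketch that reproduces the tapply() result in two explicit steps with split() and sapply(). The names groups, by_tapply and by_split are illustrative:

```r
data(iris)

by_tapply <- tapply(iris$Sepal.Width, iris$Species, median)

# The same grouping done in two explicit steps
groups   <- split(iris$Sepal.Width, iris$Species)  # one vector per species
by_split <- sapply(groups, median)

print(by_tapply)
print(by_split)
```

Both approaches give the same medians; tapply() simply combines the split and apply steps in one call.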

mapply()
The mapply() function in R can be used to apply a function to multiple list or vector arguments.

This function uses the following basic syntax:

mapply(FUN, …, MoreArgs = NULL, SIMPLIFY = TRUE, USE.NAMES = TRUE)

where:

FUN: The function to apply

…: Arguments to vectorize over

MoreArgs: A list of other arguments to FUN

SIMPLIFY: Whether or not to reduce the result to a vector.

USE.NAMES: Whether or not to use names if the first … argument has names
The following examples show how to use this function in different scenarios.

Example 1 : Use mapply() to Create a Matrix

The following code shows how to use mapply() to create a matrix by repeating the values c(1, 2, 3)
each 5 times:

#create matrix
mapply(rep, 1:3, times=5)
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    1    2    3
[3,]    1    2    3
[4,]    1    2    3
[5,]    1    2    3

Notice how this is much more efficient than typing out the following:

#create same matrix as previous example
matrix(c(rep(1, 5), rep(2, 5), rep(3, 5)), ncol=3)
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    1    2    3
[3,]    1    2    3
[4,]    1    2    3
[5,]    1    2    3

Example 2: Use mapply() to Find Max Value of Corresponding Elements in Vectors

The following code shows how to use mapply() to find the max value for corresponding elements in
two vectors:

#create two vectors
vector1 <- c(1, 2, 3, 4, 5)
vector2 <- c(2, 4, 1, 2, 10)

#find max value of each corresponding elements in vectors
mapply(max, vector1, vector2)
[1]  2  4  3  4 10

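mapply() is also useful when more than one argument varies. For example, the
following is tedious to type:

list(rep(1, 4), rep(2, 3), rep(3, 2), rep(4, 1))

With mapply(), instead we can do

> mapply(rep, 1:4, 4:1)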
[[1]]
[1] 1 1 1 1

[[2]]
[1] 2 2 2

[[3]]
[1] 3 3

[[4]]
[1] 4

This passes the sequence 1:4 to the first argument of rep() and the
sequence 4:1 to the second argument.
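
The MoreArgs and SIMPLIFY parameters described earlier can be sketched as follows (labels and reps are illustrative names):

```r
# Vectorise over two arguments of paste(); sep is held fixed via MoreArgs
labels <- mapply(paste, c("a", "b", "c"), 1:3,
                 MoreArgs = list(sep = "_"), USE.NAMES = FALSE)
print(labels)

# SIMPLIFY = FALSE always returns a list, even for equal-length results
reps <- mapply(rep, 1:3, times = 2, SIMPLIFY = FALSE)
print(reps)
```

Arguments placed in MoreArgs are passed unchanged to every call, whereas the arguments in "…" are stepped through element by element.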

Debugging in R – How to Easily Overcome Errors in Your Code?
What is Debugging?
The larger your code, the more chances of it having bugs. We need to
remove the bugs after writing the code. This process of removing bugs
from the code is known as debugging.
The Debugging process
With practice and experience, you will find what errors and bugs you are
prone to, and you will adjust your debugging approach according to that.

While there may not be any dedicated debugging process for any
programming language, here is a general process to start you with.

1. Search for the error on the internet


The first thing recommended by most programmers and us as well would
be to search for the error on the internet. Just copy the error statement and
search it. If you are lucky, it would be a common error, and you would find
how to debug it. Even if you don’t find a proper solution, you may get a
general idea of what could be producing the error.

2. Isolating faulty code


Honestly, there is no magic solution to locate where your code is faulty. The
only way is to go through your code and find what might not be working.

One popular approach is to split the code into parts. You test these
individual parts and try to figure out which one is not working. Then you
split the faulty code further and further until you isolate the part where the
problem is occurring.

3. Making it repeatable
Once you have isolated a part of the code that you believe is the root of the
problem, you will have to repeatedly execute it with different changes to
identify the bug and debug it. You have to make it so that you can execute
that part of the code on its own again and again. After that, make small
changes in the code and execute it. Every try will give you more insight into
the problem and its causes.

4. Fixing and testing


As you finally identify the problem, correct it and test it again with different
scenarios to ensure that the fix works. After that, you can put it back into
the whole code and test the entire code again.

Debugging in R
Debugging is the process of finding errors in your code to figure out why it’s behaving in
unexpected ways. This typically involves:

1. Running the code


2. Stopping the code where something suspicious is taking place
3. Looking at the code step-by-step from this point on to either change the values of
some variables, or modify the code itself.

Since R is an interpreted language, debugging in R means debugging functions.

There are a few kinds of problems you’ll run into with R:

 messages give the user a hint that something is wrong, or may be missing. They can
be ignored, or suppressed altogether with suppressMessages().
 warnings don’t stop the execution of a function, but rather give a heads up that
something unusual is happening. They display potential problems.
 errors are problems that are fatal, and result in the execution stopping altogether.
Errors are used when there is no way for the function to continue with its task.
There are many ways to approach these problems when they arise. For example, condition
handling using tools like try(), tryCatch(), and withCallingHandlers() can increase your
code’s robustness by proactively steering error handling.
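
As a minimal sketch of condition handling with tryCatch(), assuming a hypothetical wrapper called safe_log() that is not part of the original notes:

```r
safe_log <- function(x) {
  tryCatch(
    log(x),
    warning = function(w) {          # e.g. log(-1) warns "NaNs produced"
      message("warning caught: ", conditionMessage(w))
      NA_real_
    },
    error = function(e) {            # e.g. log("a") is an error
      message("error caught: ", conditionMessage(e))
      NA_real_
    }
  )
}

ok   <- safe_log(1)     # no condition raised
warn <- safe_log(-1)    # warning handled, NA returned
err  <- safe_log("a")   # error handled, NA returned
print(c(ok, warn, err))
```

The handlers let the script continue and return a well-defined value instead of stopping at the first problem.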

R also includes several advanced debugging tools that can be very helpful for quickly and
efficiently locating problems, which will be the focus of this article. To illustrate, we’ll use an
example adapted from an excellent paper by Roger D. Peng, and show how these tools work
along with some updated ways to interact with them via RStudio. In addition to working with
errors, the debugging tools can also be used on warnings by converting them to errors
via options(warn = 2).

traceback()

If we’ve run our code and it has already crashed, we can use traceback() to try to locate
where this happened. traceback() does this by printing a list of the functions that were called
before the error occurred, called the “call stack.” The call stack is read from bottom to top:

traceback() shows that the error occurred during evaluation of func3(y).
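
The original example code is not reproduced in these notes, so the following is only a sketch with hypothetical definitions of func1(), func2() and func3() that fail in func3(y), as described above:

```r
# Hypothetical three-level call chain; func3() fails on non-numeric input
func3 <- function(z) {
  r <- log(z)
  r^2
}
func2 <- function(y) func3(y)
func1 <- function(x) func2(x)

# try() keeps a script running after the error so the result can be inspected
res <- try(func1("a"), silent = TRUE)
print(inherits(res, "try-error"))

# In an interactive session, run func1("a") directly and then call:
# traceback()
# The innermost call, func3(y), is listed at the top and func1("a") at the bottom.
```

traceback() only reports the most recent uncaught error, so it is called right after the failing call at the console rather than inside try().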

Another way we can use traceback(), besides inserting it directly into the code, is by
using traceback() as an error handler (meaning that it will call immediately if any error
occurs). This can be done using options(error = traceback).

We can also access traceback() directly through the “Show Traceback” button on the
right-hand side of the error message in RStudio.
Debug Mode

While traceback() is certainly useful, it doesn’t show us where, exactly, an error occurred
within a function. For this, we need “debug mode.”

Entering debug mode will pause your function and let you examine and interact with the
environment of the function itself, rather than the usual global environment. In the function’s
runtime environment you’re able to do some useful new things. For example, the
environment pane shows the objects that are saved in the function’s local environment, which
can be inspected by typing their name into the browser prompt.

You can also run code and view the results that normally only the function would see.
Beyond just viewing, you’re able to make changes directly inside debug mode.

You’ll notice that while debugging, the prompt changes to Browse[1]> to let you know that
you’re in debug mode. In this state you’ll still have access to all the usual commands, but also
some extra ones. These can be used via the toolbar that shows up, or by entering the
commands into the console directly:

 ls() to see what objects are available in the current environment


 str() and print() to examine these objects
 n to evaluate the next statement
 s to step into the next line, if it is a function. From there you can go through each line
of the function.
 where to print a stack trace of all active function calls
 f to finish the execution of the current loop or function
 c to leave the debug mode and continue with the regular execution of the function
 Q to stop debug mode, terminate the function, and return to the R prompt

Debug mode sounds pretty useful, right? Here are some ways we can access it.

browser()

One way to enter debug mode is to insert a browser() statement into your code manually,
allowing you to step into debug mode at a pre-specified point.

If you want to use a manual browser() statement on installed code, you can use
print(functionName) to print the function code (or you can download the source code
locally), and use browser() just like you would on your own code.

While you don’t have to run any special code to quit browser(), do remember to remove
the browser() statement from your code once you’re done.
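
A small sketch of a manual browser() call, using a hypothetical func3(). The interactive() guard is an addition for this illustration so that the same code can also run in a non-interactive script:

```r
func3 <- function(z) {
  # Pause only when the input is suspicious, and only in an interactive session
  if (interactive() && !is.numeric(z)) {
    browser()
  }
  r <- log(z)
  r^2
}

value <- func3(exp(2))   # numeric input: runs straight through
print(value)
```

Wrapping browser() in an if() statement like this makes the pause conditional, so ordinary calls are not interrupted.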

debug()

In contrast to browser(), which can be inserted anywhere into your code, debug()
automatically inserts a browser() statement at the beginning of a function.
This can also be achieved by using the “Rerun with Debug” button on the right-hand side of
the error message in RStudio, just under “Show Traceback.”

Once you’re done with debug(), you’ll need to call undebug(), otherwise it’ll enter debug
mode every time the function is called. An alternative is to use debugonce(). You can check
whether a function is in debug mode using isdebugged().
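
The debug flag can be checked without actually entering debug mode, as in this sketch with a hypothetical function f():

```r
f <- function(x) x + 1      # hypothetical function to debug

debug(f)
flag_on <- isdebugged(f)    # TRUE: the next call to f() would open the browser

undebug(f)
flag_off <- isdebugged(f)   # FALSE: f() runs normally again

print(c(flag_on, flag_off))

# debugonce(f) would instead flag f for a single debugged call only
```

Checking isdebugged() is a quick way to find out why a function keeps dropping into the browser.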

Options in RStudio

In addition to debug() and browser(), you can also enter debug mode by setting “editor
breakpoints” in RStudio by clicking to the left of the line in RStudio, or by selecting the line
and typing shift+F9. Editor breakpoints are denoted by a red circle on the left-hand side,
indicating that debug mode will be entered at this line once the source is run.

Editor breakpoints avoid having to modify code with a browser() statement, though it is
important to note that there are some instances where editor breakpoints won’t function
properly, and they cannot be used conditionally (unlike browser(), which can be used in
an if() statement).

You can also have RStudio enter the debug mode for you. For example, you can have
RStudio stop the execution when an error is raised via Debug (on the top bar) > On Error, and
changing it from “Error Inspector” to “Break in Code.”

To prevent debug mode from opening every time an error occurs, RStudio won’t invoke the
debugger unless it looks like some of your own code is on the stack. If this is causing
problems for you, navigate to Tools > Global Options > General > Advanced, and unclick
“Use debug error handler only when my code contains errors.”

If you just want to invoke debug mode every single time there’s ever an error,
use options(error = browser). Note that the function itself is passed, without
parentheses.

recover()

recover() is similar to browser(), but lets you choose which function in the call stack you
want to debug. recover() is not used directly, but rather as an error handler by
calling options(error = recover).

Once put in place, when an error is encountered, recover() will pause R, print the call stack
(though note that this call stack will be upside-down relative to the order in traceback()), and
allow you to select which function’s browser you’d like to enter. This is helpful because
you’ll be able to browse any function on the call stack, even before the error occurred, which
is important if the root cause is a few calls prior to where the error actually takes place.
Once you’ve found the problem, you can switch back to default error handling by removing
the option from your .Rprofile file. Note that previously options(error = NULL) was used to
accomplish this, but this became illegal in R 3.6.0 and as of September 2019 may cause
RStudio to crash the next time you try running certain things, such as .Rmd files.

trace()

The trace() function is slightly more complicated to use, but can be useful when you don’t
have access to the source code (for example, with base functions). trace() allows you to insert
any code at any location in a function, and the functions are only modified indirectly (without
re-sourcing them).

In order to figure out which line of code to use, try: as.list(body(yourFunction))

Note that if called with no additional arguments beyond the function name,
trace(yourFunction) just prints the function message:
Let’s try it out:

Now our function func3() is an object with tracing code:

If we want to see the tracing code to get a better understanding of what’s going on, we can
use body(yourFunction):

At this point, if we call on the function func1(), debug mode will open if r is not a number.
When you’re done, you can remove tracing from a function using untrace()
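
The screenshots of the original example are not included in these notes, so the following is a sketch with a hypothetical func3(). It inserts a message() call rather than browser() so that it also runs in a script; to open debug mode when r is not a number, the tracer could instead be quote(if (!is.numeric(r)) browser()):

```r
# Hypothetical function standing in for code you cannot edit
func3 <- function(z) {
  r <- log(z)
  r^2
}

# List the body to choose an insertion point for the `at` argument
steps <- as.list(body(func3))
print(length(steps))   # "{" plus two statements

# Insert a message before step 3 (the line r^2); no source file is changed
trace(func3, tracer = quote(message("r = ", r)), at = 3, print = FALSE)
traced_value <- func3(1)   # the message is emitted, then the result is returned
was_traced <- inherits(func3, "functionWithTrace")

untrace(func3)             # restore the original function
```

After untrace(), func3() behaves exactly as it did before tracing was added.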

Simulation in R
Simulation is an important (and big) topic for both statistics and a variety of other
areas where there is a need to introduce randomness. Sometimes you want to implement a
statistical procedure that requires random number generation or sampling (e.g. Markov
chain Monte Carlo, the bootstrap, random forests, bagging) and sometimes you want to
simulate a system, where random number generators can be used to model random inputs.

R comes with a set of pseudo-random number generators that allow you to simulate from
well-known probability distributions like the Normal, Poisson, and binomial. Some
example functions for probability distributions in R:

 rnorm: generate random Normal variates with a given mean and standard deviation
 dnorm: evaluate the Normal probability density (with a given mean/SD) at a point (or
vector of points)
 pnorm: evaluate the cumulative distribution function for a Normal distribution
 rpois: generate random Poisson variates with a given rate

For each probability distribution there are typically four functions available, whose
names start with “r”, “d”, “p”, and “q”. The “r” function is the one that actually
simulates random numbers from that distribution. The four prefixes are:

 d for density
 r for random number generation
 p for cumulative distribution
 q for quantile function (inverse cumulative distribution)

If you’re only interested in simulating random numbers, then you will likely only need the
“r” functions and not the others. However, if you intend to simulate from arbitrary
probability distributions using something like rejection sampling, then you will need the
other functions too.

Probably the most common probability distribution to work with is the Normal distribution
(also known as the Gaussian). Working with the Normal distributions requires using these
four functions
dnorm(x, mean = 0, sd = 1, log = FALSE)
pnorm(q, mean = 0, sd = 1, lower.tail = TRUE, log.p = FALSE)
qnorm(p, mean = 0, sd = 1, lower.tail = TRUE, log.p = FALSE)
rnorm(n, mean = 0, sd = 1)
Here we simulate standard Normal random numbers with mean 0 and standard deviation 1.

> ## Simulate standard Normal random numbers
> x <- rnorm(10)
> x
[1] 0.01874617 -0.18425254 -1.37133055 -0.59916772 0.29454513 0.38979430
[7] -1.20807618 -0.36367602 -1.62667268 -0.25647839
We can modify the default parameters to simulate numbers with mean 20 and standard
deviation 2.

> x <- rnorm(10, 20, 2)
> x
[1] 22.20356 21.51156 19.52353 21.97489 21.48278 20.17869 18.09011 19.60970
[9] 21.85104 20.96596
> summary(x)
Min. 1st Qu. Median Mean 3rd Qu. Max.
18.09 19.75 21.22 20.74 21.77 22.20
If you wanted to know the probability that a standard Normal random variable is less
than, say, 2, you could use the pnorm() function to do that calculation.
> pnorm(2)
[1] 0.9772499
You never know when that calculation will come in handy.
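
How the d, p and q functions relate to each other can be sketched as follows (the variable names p, q, d and upper are illustrative):

```r
p <- pnorm(2)            # P(X <= 2) for a standard Normal
q <- qnorm(p)            # the quantile function inverts pnorm(), giving back 2
d <- dnorm(0)            # density at 0, equal to 1 / sqrt(2 * pi)

# lower.tail = FALSE gives the upper tail, P(X > 2) = 1 - p
upper <- pnorm(2, lower.tail = FALSE)

print(c(p = p, q = q, d = d, upper = upper))
```

The same pattern applies to other distributions, for example ppois() and qpois() for the Poisson.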

Setting the random number seed


When simulating any random numbers it is essential to set the random number seed.
Setting the random number seed with set.seed() ensures reproducibility of the sequence of
random numbers.
For example, I can generate 5 Normal random numbers with rnorm().
> set.seed(1)
> rnorm(5)
[1] -0.6264538 0.1836433 -0.8356286 1.5952808 0.3295078
Note that if I call rnorm() again I will of course get a different set of 5 random numbers.
> rnorm(5)
[1] -0.8204684 0.4874291 0.7383247 0.5757814 -0.3053884
If I want to reproduce the original set of random numbers, I can just reset the seed
with set.seed().
> set.seed(1)
> rnorm(5) ## Same as before
[1] -0.6264538 0.1836433 -0.8356286 1.5952808 0.3295078
In general, you should always set the random number seed when conducting a
simulation! It is possible to generate random numbers from other probability distributions
like the Poisson. The Poisson distribution is commonly used to model data that come in
the form of counts.

> rpois(10, 1) ## Counts with a mean of 1
[1] 0 0 1 1 2 1 1 4 1 2
> rpois(10, 2) ## Counts with a mean of 2
[1] 4 1 2 0 1 1 0 1 4 1
> rpois(10, 20) ## Counts with a mean of 20
[1] 19 19 24 23 22 24 23 20 11 22
Simulating a Linear Model
Simulating random numbers is useful but sometimes we want to simulate values that come
from a specific model. For that we need to specify the model and then simulate from it
using the functions described above.

Suppose we want to simulate from the following linear model

y = β₀ + β₁x + ε
where ε ∼ N(0, 2²). Assume x ∼ N(0, 1²), β₀ = 0.5 and β₁ = 2. The variable x might
represent an important predictor of the outcome y. Here’s how we could do that in R.
> ## Always set your seed!
> set.seed(20)
>
> ## Simulate predictor variable
> x <- rnorm(100)
>
> ## Simulate the error term
> e <- rnorm(100, 0, 2)
>
> ## Compute the outcome via the model
> y <- 0.5 + 2 * x + e
> summary(y)
Min. 1st Qu. Median Mean 3rd Qu. Max.
-6.4084 -1.5402 0.6789 0.6893 2.9303 6.5052
We can plot the results of the model simulation.

> plot(x, y)
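
One way to check a simulation like this, sketched here as an addition to the original example, is to fit the same model with lm() and see whether the estimates come out near the true values β₀ = 0.5 and β₁ = 2:

```r
set.seed(20)
x <- rnorm(100)
e <- rnorm(100, 0, 2)
y <- 0.5 + 2 * x + e

# Fit the model we simulated from; estimates should be near 0.5 and 2
fit <- lm(y ~ x)
est <- coef(fit)
print(est)
```

The estimates will not equal the true values exactly, because the error term adds noise, but they should get closer as the sample size grows.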

What if we wanted to simulate a predictor variable x that is binary instead of having a
Normal distribution? We can use the rbinom() function to simulate binary random
variables.
> set.seed(10)
> x <- rbinom(100, 1, 0.5)
> str(x) ## 'x' is now 0s and 1s
int [1:100] 1 0 0 1 0 0 0 0 1 0 ...
Then we can proceed with the rest of the model as before.

> e <- rnorm(100, 0, 2)
> y <- 0.5 + 2 * x + e
> plot(x, y)

We can also simulate from a generalized linear model, where the errors are no longer from
a Normal distribution but come from some other distribution. For example, suppose we
want to simulate from a Poisson log-linear model where

Y ∼ Poisson(μ)
log μ = β₀ + β₁x
and β₀ = 0.5 and β₁ = 0.3. We need to use the rpois() function for this.
> set.seed(1)
>
> ## Simulate the predictor variable as before
> x <- rnorm(100)
Now we need to compute the log mean of the model and then exponentiate it to get the
mean to pass to rpois().
> log.mu <- 0.5 + 0.3 * x
> y <- rpois(100, exp(log.mu))
> summary(y)
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.00 1.00 1.00 1.55 2.00 6.00
> plot(x, y)
You can build arbitrarily complex models like this by simulating more predictors or
making transformations of those predictors (e.g. squaring, log transformations, etc.).

Random Sampling
The sample() function draws randomly from a specified set of (scalar) objects allowing
you to sample from arbitrary distributions of numbers.
> set.seed(1)
> sample(1:10, 4)
[1] 9 4 7 1
> sample(1:10, 4)
[1] 2 7 3 6
>
> ## Doesn't have to be numbers
> sample(letters, 5)
[1] "r" "s" "a" "u" "w"
>
> ## Do a random permutation
> sample(1:10)
[1] 10 6 9 2 1 5 8 4 3 7
> sample(1:10)
[1] 5 10 2 8 6 1 4 3 9 7
>
> ## Sample w/replacement
> sample(1:10, replace = TRUE)
[1] 3 6 10 10 6 4 4 10 9 7
To sample more complicated things, such as rows from a data frame or a list, you can
sample the indices into an object rather than the elements of the object itself.

Here’s how you can sample rows from a data frame.

> library(datasets)
> data(airquality)
> head(airquality)
Ozone Solar.R Wind Temp Month Day
1 41 190 7.4 67 5 1
2 36 118 8.0 72 5 2
3 12 149 12.6 74 5 3
4 18 313 11.5 62 5 4
5 NA NA 14.3 56 5 5
6 28 NA 14.9 66 5 6
Now we just need to create the index vector indexing the rows of the data frame and
sample directly from that index vector.

> set.seed(20)
>
> ## Create index vector
> idx <- seq_len(nrow(airquality))
>
> ## Sample from the index vector
> samp <- sample(idx, 6)
> airquality[samp, ]
Ozone Solar.R Wind Temp Month Day
107 NA 64 11.5 79 8 15
120 76 203 9.7 97 8 28
130 20 252 10.9 80 9 7
98 66 NA 4.6 87 8 6
29 45 252 14.9 81 5 29
45 NA 332 13.8 80 6 14
Other more complex objects can be sampled in this way, as long as there’s a way to index
the sub-elements of the object.
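
The same index-based approach works for a list, as in this sketch (items, idx and pick are hypothetical names):

```r
set.seed(1)

# A hypothetical list with elements of different types
items <- list(a = 1:3, b = "text", c = TRUE, d = c(2.5, 3.5))

# Sample positions, then subset the list with them
idx  <- sample(seq_along(items), 2)
pick <- items[idx]
print(names(pick))
```

Sampling positions rather than elements keeps each selected element intact, whatever its type or length.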

Code Profiling
Typically code profilers are used by developers to help identify performance problems
without having to touch their code. Profilers can answer questions like, “How many times is
each method in my code called?” and, “How long do each of these methods take?” Profilers
also track things like memory allocations and garbage collection. Some profilers can even
track key methods in your code, so you can understand how often SQL statements and web
services are called. In addition, some profilers can track web requests and trace those
transactions to understand the performance of transactions within your code.

Code profilers can track all the way down to each individual line of code. However, most
developers only use profilers when chasing a CPU or memory problem, and need to go out of
their way to try and find those problems. This is because many profilers make applications
run a hundred times slower than usual. While most consider profilers to be a situational tool
not meant for daily use, code profiling can be a total lifesaver when you need it.

Profilers are great for finding the hot path in your code. Figuring out what is using twenty
percent of the total CPU usage of your code, and then determining how to improve that
would be a great example of when to use a code profiler. In addition, profilers are also great
for finding memory leaks early, as well as understanding the performance of dependency
calls and transactions. Profilers help you look for methods that can lead to improvement over
time. A former mentor once told me, “If you can improve something one percent every day,
then over the course of a month, you’ll improve by thirty percent.” What really makes a
difference is continued improvement over time.

Types of code profilers


There are two different types of code profilers: server-side and desktop.

A server-side profiler tracks the performance of key methods in pre-production or
production environments. These profilers measure transaction timing, such as tracking how
long a web request takes, while also giving you increased visibility into errors and logs. An
example of a server-side profiler would be an application performance management tool, or
APM, for short.

Desktop code profiling is slower and requires a lot of overhead, potentially making your app
much slower than it should be. This kind of profiler usually tracks the performance of every
line of code within each individual method. These types of profilers also track memory
allocations and garbage collection to help with memory leaks. Desktop profilers are very good
at finding that hot path, figuring out every single method that’s being called, and identifying
what uses the most CPU.

But there’s also another solution. For the sake of simplicity, we’ll call it a hybrid profiler.

These hybrid code profilers merge key data from server-based profiling with code-level
details on your desktop for use every day. These profilers provide server level insights
combined with the ability to track key methods, every transaction, dependency calls, errors,
and logs.

Tools that align to these three different types of profilers


For server-side code profiling, most companies use APMs. Stackify, for example, has
developed a product called Retrace.

Some options for desktop code profilers include Visual Studio, NProfiler, and others.

There are very few true hybrid code profiling solutions. Among those is Stackify's hybrid
profiler, Prefix, which is free to use.

Profiling R Code
R comes with a profiler to help you optimize your code and improve its performance. In
general, it’s usually a bad idea to focus on optimizing your code at the very beginning of
development. Rather, in the beginning it’s better to focus on translating your ideas into
code and writing code that’s coherent and readable. The problem is that heavily optimized
code tends to be obscure and difficult to read, making it harder to debug and revise. Better
to get all the bugs out first, then focus on optimizing.

Of course, when it comes to optimizing code, the question is what should you optimize?
Well, clearly you should optimize the parts of your code that are running slowly, but how do
we know what parts those are?

This is what the profiler is for. Profiling is a systematic way to examine how much time is
spent in different parts of a program.

Sometimes profiling becomes necessary as a project grows and layers of code are placed
on top of each other. Often you might write some code that runs fine once. But then later,
you might put that same code in a big loop that runs 1,000 times. Now the original code
that took 1 second to run is taking 1,000 seconds to run! Getting that little piece of original
code to run faster will help the entire loop.

It’s tempting to think you just know where the bottlenecks in your code are. The reality is
that profiling is better than guessing. Better to collect some data than to go on hunches
alone. Ultimately, getting the biggest impact on speeding up code depends on knowing
where the code spends most of its time. This cannot be done without some sort of rigorous
performance analysis or profiling.
"We should forget about small efficiencies, say about 97% of the time: premature
optimization is the root of all evil." (Donald Knuth)
The basic principles of optimizing your code are:

 Design first, then optimize

 Remember: Premature optimization is the root of all evil

 Measure (collect data), don’t guess.

 If you’re going to be a scientist, you need to apply the same principles here!

The R Profiler
Using system.time() allows you to test certain functions or code blocks to see if they are
taking excessive amounts of time. However, this approach assumes that you already know
where the problem is and can call system.time() on that piece of code. What if you don’t
know where to start?
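As a minimal sketch of the system.time() approach, the code below times one suspect function. The function slow_sum() is a hypothetical example written only for this illustration; it is not part of R.

```r
# Hypothetical slow function, used only for illustration.
slow_sum <- function(n) {
  total <- 0
  for (i in seq_len(n)) {
    total <- total + sqrt(i)
  }
  total
}

# system.time() evaluates the expression and returns the
# user, system, and elapsed times in seconds.
timing <- system.time(result <- slow_sum(1e6))
print(timing)

# The elapsed (wall clock) time can be extracted by name.
timing[["elapsed"]]
```

This works well when you already suspect slow_sum() is the slow part. It tells you nothing about which of several functions is responsible.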
This is where the profiler comes in handy. The Rprof() function starts the profiler in R.
Note that R must be compiled with profiler support (but this is usually the case). In
conjunction with Rprof(), we will use the summaryRprof() function which summarizes the
output from Rprof() (otherwise it’s not really readable). Note that you should NOT
use system.time() and Rprof() together, or you will be sad.
Rprof() keeps track of the function call stack at regularly sampled intervals and tabulates
how much time is spent inside each function. By default, the profiler samples the function
call stack every 0.02 seconds. This means that if your code runs very quickly (say, under
0.02 seconds), the profiler is not useful. But if your code runs that fast, you probably
don’t need the profiler.
The profiler is started by calling the Rprof() function.
> Rprof() ## Turn on the profiler
You don’t need any other arguments. By default it will write its output to a file
called Rprof.out. You can specify the name of the output file if you don’t want to use this
default.
Once you call the Rprof() function, everything that you do from then on will be measured
by the profiler. Therefore, you usually only want to run a single R function or expression
once you turn on the profiler and then immediately turn it off. The reason is that if you
mix too many function calls together when running the profiler, all of the results will be
mixed together and you won’t be able to sort out where the bottlenecks are. In reality, I
usually only run a single function with the profiler on.
The profiler can be turned off by passing NULL to Rprof().
> Rprof(NULL) ## Turn off the profiler
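Putting the two calls together, a typical session might look like the sketch below. The simulated data and the temporary file name are assumptions made for illustration; any single expression can take the place of the lm() call.

```r
# Simulated data, assumed only for this example.
set.seed(1)
n <- 2e5
x <- rnorm(n)
y <- 2 * x + rnorm(n)

# Write the profile to a temporary file instead of the default "Rprof.out".
prof_file <- tempfile(fileext = ".out")

Rprof(prof_file)   # turn on the profiler
fit <- lm(y ~ x)   # the single expression being profiled
Rprof(NULL)        # turn off the profiler

# The raw output is a plain text file: a header line followed by
# one sampled call stack per line.
head(readLines(prof_file))
```

Keeping only one expression between the two Rprof() calls makes the resulting samples easy to attribute.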
The raw output from the profiler looks something like this. Here I’m calling
the lm() function on some data with the profiler running.
## lm(y ~ x)

sample.interval=10000
"list" "eval" "eval" "model.frame.default" "model.frame" "eval" "eval" "lm"
"list" "eval" "eval" "model.frame.default" "model.frame" "eval" "eval" "lm"
"list" "eval" "eval" "model.frame.default" "model.frame" "eval" "eval" "lm"
"list" "eval" "eval" "model.frame.default" "model.frame" "eval" "eval" "lm"
"na.omit" "model.frame.default" "model.frame" "eval" "eval" "lm"
"na.omit" "model.frame.default" "model.frame" "eval" "eval" "lm"
"na.omit" "model.frame.default" "model.frame" "eval" "eval" "lm"
"na.omit" "model.frame.default" "model.frame" "eval" "eval" "lm"
"na.omit" "model.frame.default" "model.frame" "eval" "eval" "lm"
"na.omit" "model.frame.default" "model.frame" "eval" "eval" "lm"
"na.omit" "model.frame.default" "model.frame" "eval" "eval" "lm"
"lm.fit" "lm"
"lm.fit" "lm"
"lm.fit" "lm"
At each line of the output, the profiler writes out the function call stack, with the most
recently called function first and the top-level call last. For example, on the very first
line of the output you can see that the code is 8 levels deep in the call stack.
This is where you need the summaryRprof() function to help you interpret this data.
Using summaryRprof()
The summaryRprof() function tabulates the R profiler output and calculates how much
time is spent in which function. There are two methods for normalizing the data.
 “by.total” divides the time spent in each function by the total run time

 “by.self” does the same as “by.total” but first subtracts out time spent in functions above
the current function in the call stack. I personally find this output to be much more useful.
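The difference between the two views can be sketched by tabulating a handful of call stacks by hand. The stacks below are made up for illustration, and this is a simplified picture of the idea, not the actual implementation of summaryRprof().

```r
# Hypothetical sampled call stacks, most recently called function first
# (the same order used in the raw profiler output).
stacks <- list(
  c("na.omit", "model.frame", "lm"),
  c("na.omit", "model.frame", "lm"),
  c("lm.fit", "lm"),
  c("lm.fit", "lm"),
  c("lm.fit", "lm"),
  c("lm")
)
interval <- 0.02  # seconds represented by each sample

funs <- unique(unlist(stacks))

# Self count: samples in which the function was the one actually running,
# i.e. it is the first entry of the stack.
self_counts <- vapply(funs, function(f) {
  sum(vapply(stacks, function(s) s[1] == f, logical(1)))
}, numeric(1))

# Total count: samples in which the function appears anywhere in the stack.
total_counts <- vapply(funs, function(f) {
  sum(vapply(stacks, function(s) f %in% s, logical(1)))
}, numeric(1))

data.frame(
  self.time  = self_counts * interval,
  self.pct   = 100 * self_counts / length(stacks),
  total.time = total_counts * interval,
  total.pct  = 100 * total_counts / length(stacks)
)
```

In this toy data lm appears in every sample, so its total share is 100%, but it is at the front of the stack only once, so its self share is small.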

Here is what summaryRprof() reports in the “by.total” output.


$by.total
                      total.time total.pct self.time self.pct
"lm"                        7.41    100.00      0.30     4.05
"lm.fit"                    3.50     47.23      2.99    40.35
"model.frame.default"       2.24     30.23      0.12     1.62
"eval"                      2.24     30.23      0.00     0.00
"model.frame"               2.24     30.23      0.00     0.00
"na.omit"                   1.54     20.78      0.24     3.24
"na.omit.data.frame"        1.30     17.54      0.49     6.61
"lapply"                    1.04     14.04      0.00     0.00
"[.data.frame"              1.03     13.90      0.79    10.66
"["                         1.03     13.90      0.00     0.00
"as.list.data.frame"        0.82     11.07      0.82    11.07
"as.list"                   0.82     11.07      0.00     0.00
Because lm() is the function that I called from the command line, of course 100% of the
time is spent somewhere in that function. However, what this doesn’t show is that
if lm() immediately calls another function (like lm.fit(), which does most of the heavy
lifting), then in reality, most of the time is spent in that function, rather than in the top-
level lm() function.
The “by.self” output corrects for this discrepancy.

$by.self
                        self.time self.pct total.time total.pct
"lm.fit"                     2.99    40.35       3.50     47.23
"as.list.data.frame"         0.82    11.07       0.82     11.07
"[.data.frame"               0.79    10.66       1.03     13.90
"structure"                  0.73     9.85       0.73      9.85
"na.omit.data.frame"         0.49     6.61       1.30     17.54
"list"                       0.46     6.21       0.46      6.21
"lm"                         0.30     4.05       7.41    100.00
"model.matrix.default"       0.27     3.64       0.79     10.66
"na.omit"                    0.24     3.24       1.54     20.78
"as.character"               0.18     2.43       0.18      2.43
"model.frame.default"        0.12     1.62       2.24     30.23
"anyDuplicated.default"      0.02     0.27       0.02      0.27
Now you can see that only about 4% of the runtime is spent in the actual lm() function,
whereas over 40% of the time is spent in lm.fit(). In this case, this is no surprise since
the lm.fit() function is the function that actually fits the linear model.
You can see that a reasonable amount of time is spent in functions not necessarily
associated with linear modeling (e.g. as.list.data.frame, [.data.frame). This is because
the lm() function does a bit of pre-processing and checking before it actually fits the
model. This is common with modeling functions—the preprocessing and checking is
useful to see if there are any errors. But those two functions take up over 1.5 seconds of
runtime. What if you want to fit this model 10,000 times? You’re going to be spending a
lot of time in preprocessing and checking.
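One possible response to a profile like this, sketched below, is to call lm.fit() directly when the same kind of model must be fitted many times. This assumes the data are already clean (no missing values) and that you are willing to build the design matrix yourself; the simulated data are made up for illustration.

```r
# Simulated data, assumed only for this example.
set.seed(42)
n <- 1000
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n)

# Full interface: builds the model frame, handles missing values,
# and then hands the numeric work to lm.fit().
fit_full <- lm(y ~ x)

# Lower-level interface: we supply the design matrix ourselves,
# including the intercept column, so the formula processing is skipped.
X <- cbind(Intercept = 1, x = x)
fit_fast <- lm.fit(x = X, y = y)

coef(fit_full)
fit_fast$coefficients
```

The two sets of coefficients should agree. The trade-off is that lm.fit() returns a plain list, so conveniences such as summary() on an "lm" object are not available.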
The final bit of output that summaryRprof() provides is the sampling interval and the total
runtime.
$sample.interval
[1] 0.02

$sampling.time
[1] 7.41
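The pieces of the summary can also be pulled out individually. The sketch below profiles a hypothetical busy_wait() helper, written only for this example so that the code runs long enough to be sampled, and then reads each component of the list returned by summaryRprof().

```r
# Hypothetical workload: keep the CPU busy for roughly `seconds` seconds.
busy_wait <- function(seconds) {
  start <- proc.time()[["elapsed"]]
  while (proc.time()[["elapsed"]] - start < seconds) {
    sqrt(runif(1000))
  }
  invisible(NULL)
}

prof_file <- tempfile()
Rprof(prof_file, interval = 0.02)  # 0.02 seconds is also the default
busy_wait(0.5)
Rprof(NULL)

prof <- summaryRprof(prof_file)

names(prof)            # the components of the summary
head(prof$by.self)     # functions ranked by self time
prof$sample.interval   # seconds between samples
prof$sampling.time     # total time covered by the samples
```

Because the result is an ordinary list, the by.self and by.total tables are data frames and can be sorted or filtered like any other data frame.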
