
R - Programming

Course Code: 21CB484


Course Instructor: Praahas Amin
Department of CSBS
Canara Engineering College
Course Outline
Module 1 - Numeric, Arithmetic, Assignment, and Vectors
Module 2 - Matrices and Arrays
Module 3 - Lists and Data Frames
Module 4 - Functions
Module 5 - Pointers to Further Programming Techniques

Course Objectives
▪ Explore and understand the R and RStudio interactive environment.
▪ Learn and practice programming techniques using R.
▪ Read structured data into R from various sources.
▪ Understand the different data structures and data types in R.
▪ Develop small applications using R.

Course Outcomes
CO | Outcome | RBT Level
CO1 | Understand the fundamental syntax of R data types, expressions and the usage of the RStudio IDE | L2
CO2 | Apply critical programming language concepts of control structures in R for conditional branching and looping | L3
CO3 | Apply the List and Data Frame data structures of the R programming language and import data into R programs | L3
CO4 | Utilize functions in R programs and understand their scope in the R language | L3
CO5 | Use advanced R concepts of debugging and object-oriented concepts | L3

CO-PO-PSO Mapping

Course Outcomes (CO) | Program Outcomes (PO1 to PO12) | Program Specific Outcomes (PSO1, PSO2)

CO PO1 PO2 PO3 PO4 PO5 PO6 PO7 PO8 PO9 PO10 PO11 PO12 PSO1 PSO2

CO1 2 2 3 3 1 1 1 2
CO2 2 2 3 3 1 1 1 2
CO3 2 2 3 3 1 1 1 2
CO4 2 2 3 3 1 1 1 2
CO5 2 2 3 3 1 1 1 2

Text Books & References
1. Jones, O., Maillardet, R. and Robinson, A. (2014). Introduction to Scientific Programming and Simulation Using R. Chapman & Hall/CRC, The R Series.
2. Crawley, M. J. (2015). Statistics: An Introduction Using R, Second edition. Wiley.
3. Wickham, H. and Grolemund, G. (2018). R for Data Science. O'Reilly, New York. Available for free at http://r4ds.had.co.nz/

Assessment Details
Teaching Hours/Week(L:T:P:S) 0:2:0:0
Total Hours 24
Credits 01
CIE (1 hour) 3
Assignments 2
Quiz/GD/Seminar (1 Hour) 1
SEE (1 Hour) 1
CIE Test Marks 20 Marks
Assignment Marks 10 Marks
Quiz/GD/Seminar Marks 20 Marks
CIE Marks 50
SEE Marks 50
Total Marks 100
CIE Type MCQ
SEE Type MCQ
Min Passing Marks CIE 40% of Max (i.e. 20/50)
Min Passing Marks SEE 35% of Max (i.e. 18/50)
Total Min Passing Marks 40% of Total Max (i.e. 40/100)

Module 1
Numeric, Arithmetic, Assignment, and Vectors: R for Basic Math, Arithmetic, Variables, Functions, Vectors, Expressions and Assignments, and Logical Expressions.
Text Book 1: Chapter 2 (2.1 to 2.7)

Variables
▪ A placeholder to hold a value (like a folder)
▪ We can place a value in it, operate on it or modify it, but the name of the placeholder remains the same.
▪ Assigning values to variables
  ▪ x <- 2.5
  ▪ x = 2.5
▪ Variables are created when values are assigned to them.
▪ Naming of variables
  ▪ Any name made up of letters, numbers, and . or _
  ▪ A name should start with a letter, or with a . followed by a letter.
  ▪ Names are case-sensitive
  ▪ Use informative names for readability
▪ When assigning a value to a variable, the expression on the RHS is evaluated first and then the value is placed in the variable on the LHS
Note:
▪ To display the value of a variable we can use print(x) or show(x)
▪ To get the datatype of a variable we can use typeof(x)
▪ We can show the outcome of an assignment by surrounding it with parentheses
  ▪ x <- 200
  ▪ (y <- (1+1/x)^x)

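The points about assignment, display and typeof can be tried together in one short session. This is only a minimal sketch; the names kind and y are illustrative and not part of the course material.

```r
# Assign with <- (the = form behaves the same way at the top level)
x <- 2.5
print(x)                 # display the value
kind <- typeof(x)        # "double"

# The right-hand side is evaluated first, then stored under the left-hand name
x <- x + 1               # x is now 3.5

# Wrapping an assignment in parentheses also prints the result
x <- 200
(y <- (1 + 1/x)^x)       # close to exp(1)
```

Because names are case-sensitive, x and X would be two different variables here.
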
Data Types in R
▪ Integer
  ▪ x <- 2L
  ▪ x = 2L
▪ Double
  ▪ x <- 2.5
▪ Complex
  ▪ x <- 3+2i
▪ Character
  ▪ x <- "h"
▪ Logical
  ▪ x <- T
  ▪ x <- F
  ▪ x <- TRUE
  ▪ x <- FALSE
▪ typeof(x) shows the datatype of the variable

Arithmetic Operators
▪ Addition
  ▪ x1 <- 2L
  ▪ x2 <- 2.5
  ▪ x <- x1+x2 #x will have 4.5 as result
▪ Subtraction
  ▪ x <- 4-5 #This will store -1 in x
▪ Multiplication
  ▪ x <- 3*2 #This will store 6 in x
▪ Division
  ▪ x <- 3/2 #This will store 1.5 in x
▪ Exponentiation
  ▪ x <- 3^2 #This will store 9 in x
▪ Modulus
  ▪ x <- 3%%2 #This will store remainder 1 in x
▪ Integer Division
  ▪ x <- 17%/%5 #This will store quotient 3 in x
▪ The [1] that prefixes output indicates that this is item 1 in a vector of output.
▪ R by default displays only 7 significant digits.
▪ The display can be changed to show "x" digits by using options(digits=x)
▪ This however does not guarantee accuracy to x digits.

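The effect of options(digits=x) is easiest to see on a value with many decimals. A small sketch follows; the variable names are illustrative, and format() is used here only because it honours the same digits option as printing.

```r
old <- options(digits = 7)   # 7 is R's default; keep the old setting
short <- format(pi)          # "3.141593"
options(digits = 15)
long <- format(pi)           # "3.14159265358979"
options(old)                 # restore the previous setting

# Integer division and modulus split a number into quotient and remainder
q <- 17 %/% 5                # 3
r <- 17 %% 5                 # 2
```

Only the display changes; the stored value of pi is the same in both cases.
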
Functions
▪ Functions usually take one or more arguments (inputs) and produce one or more outputs (return values)
  ▪ Eg: seq(from=1, to=9, by=2) #O/p: [1] 1 3 5 7 9
  ▪ Eg: seq(from=9, to=1, by=-2) #O/p: [1] 9 7 5 3 1
▪ Some arguments are optional and have a predefined value if we omit them. (Here by=1 by default)
  ▪ Eg: seq(from=1, to=9) #O/p: [1] 1 2 3 4 5 6 7 8 9
▪ Functions can have no arguments at all.
▪ Arguments can be a constant, a variable, another function call or an algebraic combination of these
  ▪ Eg: seq(1, x, x/3)
▪ Order of arguments
  ▪ Every function has a default order for its arguments.
  ▪ If arguments are provided in the same order, then naming the arguments is not required
  ▪ If the arguments are not provided in the default order, then their names must be provided
▪ To find out about default values and alternative usages use help(function_name)
  ▪ Eg: help(fname)
  ▪ Or ?fname
▪ If we type just the function name without the parentheses for arguments, R prints the function object itself (its definition) instead of calling it
▪ To see a demonstration use demo(topic)
  ▪ Eg: demo(graphics)

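The rule about argument order can be checked directly with seq. A minimal sketch, with illustrative variable names:

```r
a <- seq(1, 9, 2)                        # positional: from, to, by
b <- seq(by = 2, to = 9, from = 1)       # named: any order is accepted
same <- identical(a, b)                  # TRUE

down <- seq(from = 9, to = 1, by = -2)   # 9 7 5 3 1
dflt <- seq(from = 1, to = 9)            # by is omitted, so it defaults to 1
```

A negative by only works when from is larger than to; otherwise seq stops with an error about the sign of by.
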
Vectors
▪ A vector is an indexed list of variables.
▪ It is a data structure that has a name, and within it there are different variables that are labelled sequentially
▪ Variables within a vector are labelled 1, 2, 3, 4, ...
▪ Observe that the first index is 1 and not 0
▪ Vectors are created the first time values are assigned to them, just like variables
▪ A simple variable is a vector of length 1 (also called atomic)
▪ To create vectors of length greater than 1, we use functions that produce vector-valued output.
  ▪ c(…) # Combine
  ▪ seq(from,to,by) # Sequence
  ▪ rep(x, times) # Repeat
▪ (x <- seq(1,20, by=2))
  ▪ [1] 1 3 5 7 9 11 13 15 17 19
▪ (y <- rep(3,4))
  ▪ [1] 3 3 3 3
▪ (z <- c(y,x))
  ▪ [1] 3 3 3 3 1 3 5 7 9 11 13 15 17 19
▪ from:to is shorthand for seq(from,to,by=1) or, when from is larger than to, seq(from,to,by=-1)
  ▪ (x <- 100:105) # 100 101 102 103 104 105
▪ To get a sequence from 1 to n+1 use 1:(n+1)
▪ : takes precedence over *, +, /, -
  ▪ n <- 5
  ▪ (x <- 1:n+1) # 2 3 4 5 6
  ▪ (y <- 1:(n+1)) # 1 2 3 4 5 6

Vectors
▪ Element "i" of vector "x" is referred to using x[i].
▪ If "i" is a vector of positive integers, then x[i] is the corresponding subvector of "x"
▪ If "i" is a vector of negative integers, then the corresponding values of "x" are omitted
  ▪ (x<-100:110) # 100 101 102 103 104 105 106 107 108 109 110
  ▪ i <- c(1, 3, 2)
  ▪ j <- c(-1, -2, -3)
  ▪ x[i] # [1] 100 102 101
  ▪ x[j] # [1] 103 104 105 106 107 108 109 110
▪ Square brackets can be used to get or set a value
  ▪ x[1] <- 3000 # x is now 3000 101 102 103 104 105 106 107 108 109 110
▪ length(x) gives the number of elements of x.
▪ It is possible to have a vector with no elements
  ▪ x<-c()
  ▪ length(x) # [1] 0

Vectors
▪ Algebraic operations on vectors act element-wise
  ▪ x <- c(1,2,3)
  ▪ y <- c(4,5,6)
  ▪ x*y # [1] 4 10 18
  ▪ x+y # [1] 5 7 9
  ▪ y^x # [1] 4 25 216
▪ When algebraic expressions are applied to two vectors of unequal lengths, R automatically repeats the shorter vector until it has something that has the same length as the longer vector.
  ▪ c(1,2,3,4) + c(1,2) # [1] 2 4 4 6
  ▪ (1:10)^c(1,2) # [1] 1 4 3 16 5 36 7 64 9 100
  ▪ 2+c(1,2,3) # [1] 3 4 5
  ▪ 2*c(1,2,3) # [1] 2 4 6
  ▪ (1:10)^2 # [1] 1 4 9 16 25 36 49 64 81 100
▪ R will still duplicate the shorter vector even if it cannot match the longer vector with a whole number of multiples, but will produce a warning
  ▪ c(1,2,3) + c(1,2) # [1] 2 4 4
    Warning message:
    In c(1,2,3) + c(1,2) :
    longer object length is not a multiple of shorter object length

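The automatic repetition of the shorter vector can be written out by hand to see what R does internally. This is a sketch using rep_len from base R; the variable names are illustrative.

```r
long  <- c(1, 2, 3, 4)
short <- c(1, 2)

# What R effectively does before adding: stretch the shorter vector
recycled <- rep_len(short, length(long))   # 1 2 1 2

auto   <- long + short      # recycling done by R
manual <- long + recycled   # recycling done explicitly
```

Both results are 2 4 4 6, which matches the first recycling example on this slide.
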
Vectors
▪ A useful set of functions that take vector arguments: sum(), prod(), max(), min(), sqrt(), sort(), mean(x), var(x)
▪ Note that functions applied to a vector may be defined to act element-wise or may act on the whole vector input and return a result
  ▪ sqrt(1:4) # [1] 1.000000 1.414214 1.732051 2.000000
  ▪ mean(1:6) # [1] 3.5
  ▪ sort(c(5,1,3,4,2)) # [1] 1 2 3 4 5
▪ Example: Mean and Variance
  ▪ x <- c(1.2, 0.9, 0.8, 1.0, 1.2)
  ▪ x.mean <- sum(x)/length(x)
  ▪ x.mean - mean(x) # [1] 0
  ▪ x.var <- sum((x-x.mean)^2)/(length(x)-1)
  ▪ x.var - var(x) # [1] 0
▪ Example: Numerical Integration (integral of cos(t) from 0 to pi/6, exact value 0.5)
  ▪ dt <- 0.005
  ▪ t <- seq(0, pi/6, by=dt)
  ▪ ft <- cos(t)
  ▪ (I <- sum(ft)*dt) # [1] 0.5015487
  ▪ t is a vector. ft is also a vector.
  ▪ plot(t,ft)
  ▪ Note: when using plot(x,y,type), x and y must be vectors of the same length
  ▪ type can be set to "p" (the default, i.e. points), "l" (lines), "o" (points over lines) etc.
▪ Example: Exponential Limit
  ▪ x <- seq(10,200,by=10)
  ▪ y <- (1+1/x)^x
  ▪ exp(1) - y
  ▪ plot(x,y)

Missing Data
▪ In real experiments certain observations may be missing for one reason or another
▪ Missing data can be ignored or imputed (invented) depending on the statistical analysis involved.
▪ Missing data is represented in R using NA
▪ NA can be thought of as a placeholder for a value that should have been there but is missing
▪ We can check for missing values using is.na
▪ NA is not the same as NULL
▪ NA is a placeholder for something that is missing. NULL is something that never existed at all
▪ a <- NA
▪ is.na(a) # [1] TRUE
▪ a <- c(11,NA,13)
▪ is.na(a) # [1] FALSE TRUE FALSE
▪ any(is.na(a)) # [1] TRUE
▪ mean(a) # [1] NA
▪ mean(a,na.rm=TRUE) # [1] 12 (NAs can be removed)

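The difference between NA and NULL shows up as soon as both are put into a vector. A short sketch, with illustrative names:

```r
with_na   <- c(11, NA, 13)     # NA keeps its slot
with_null <- c(11, NULL, 13)   # NULL contributes nothing

n_na   <- length(with_na)      # 3
n_null <- length(with_null)    # 2

missing_at <- which(is.na(with_na))     # position 2
clean      <- with_na[!is.na(with_na)]  # 11 13
```

Dropping the NA entries by hand, as in the last line, gives the same values that mean(a, na.rm=TRUE) averages over.
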
Expressions & Assignments

▪ An expression is a phrase of code that can be executed
  ▪ Eg: seq(10,20, by=3) #[1] 10 13 16 19
  ▪ Eg: 4 #[1] 4
  ▪ Eg: mean(c(1,2,3)) #[1] 2
  ▪ Eg: 1>2 #[1] FALSE
▪ If the evaluation of an expression is saved using the <- operator, then the combination is called an assignment
  ▪ Eg: x1 <- seq(10,20, by=3)
  ▪ Eg: x2 <- 4
  ▪ Eg: x3 <- mean(c(1,2,3))
  ▪ Eg: x4 <- 1>2

Logical Expressions
▪ A logical expression is formed using comparison operators and logical operators
▪ The value of a logical expression is always TRUE or FALSE.
▪ Integers 1 & 0 can represent TRUE & FALSE respectively (coercion)
▪ Comparison operators: <, >, <=, >=, ==, !=
▪ &, |, ! (element-wise logical and, or and not)
  ▪ These work on vectors on an element by element basis
  ▪ v <- c(3,1,TRUE,2+3i)
  ▪ t <- c(4,1,FALSE,2+3i)
  ▪ print(v&t) #[1] TRUE TRUE FALSE TRUE
  ▪ v <- c(3,0,TRUE,2+2i)
  ▪ t <- c(4,0,FALSE,2+3i)
  ▪ print(v|t) #[1] TRUE FALSE TRUE TRUE
  ▪ v <- c(3,0,TRUE,2+2i)
  ▪ print(!v) #[1] FALSE TRUE FALSE FALSE
  ▪ c(0,0,1,1) | c(0,1,0,1) #[1] FALSE TRUE TRUE TRUE
  ▪ xor(c(0,0,1,1),c(0,1,0,1)) #[1] FALSE TRUE TRUE FALSE
▪ &&, || work only on scalars (single values) and give a single TRUE or FALSE as output.
  ▪ Versions of R before 4.3.0 silently used only the first element of a longer vector; from R 4.3.0 onward a longer operand is an error, so select the element explicitly
  ▪ v <- c(3,0,TRUE,2+2i)
  ▪ t <- c(1,3,TRUE,2+3i)
  ▪ print(v[1] && t[1]) #[1] TRUE
  ▪ v <- c(0,0,TRUE,2+2i)
  ▪ t <- c(0,3,TRUE,2+3i)
  ▪ print(v[1] || t[1]) #[1] FALSE

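A practical way to keep the two families of operators apart is to use & and | when building a logical vector, and && and || when a single condition is needed. The sketch below is an illustration with made-up values:

```r
x <- c(-1, 5, 12)

# Element-wise: one TRUE/FALSE per element of x
in_range <- x > 0 & x < 10            # FALSE TRUE FALSE

# Scalar: a single TRUE/FALSE built from length-one operands
first    <- x[1]
first_ok <- first > 0 && first < 10   # FALSE

# any() and all() reduce a logical vector to a single value
any_ok <- any(in_range)               # TRUE
all_ok <- all(in_range)               # FALSE
```
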
Logical Expressions
▪ Logical expressions can be applied to vectors to produce vectors of TRUE/FALSE values. These can be used for selecting subvectors with the indexing operation.
▪ Eg: Find all integers between 1 & 20 that are divisible by 4.
  ▪ x <- 1:20
    # [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
    # [16] 16 17 18 19 20
  ▪ x %% 4 == 0
    # [1] FALSE FALSE FALSE TRUE FALSE FALSE FALSE
    # [8] TRUE FALSE FALSE FALSE TRUE FALSE FALSE
    # [15] FALSE TRUE FALSE FALSE FALSE TRUE
  ▪ (y <- x[x %% 4 == 0]) #[1] 4 8 12 16 20
▪ The result of x[subset] is the subvector of x for which the corresponding elements of subset are TRUE
▪ The subset function can also be used for selecting a subvector of x
▪ One difference between the subset function and the index operator is that subset ignores missing index values (NA), whereas x[subset] preserves the NA values
  ▪ x <- c(1,NA,3,4)
  ▪ x>2 #[1] FALSE NA TRUE TRUE
  ▪ x[x>2] #[1] NA 3 4
  ▪ subset(x,subset=x>2) #[1] 3 4
▪ Another difference between subset(x,subset=?) and x[?] is that the latter accepts expressions that resolve to integer or logical objects, whereas the former only works with logical objects.
▪ To find the positions of the TRUE elements of a logical vector x use which(x)
  ▪ x <- c(1,1,2,3,5,8,13)
  ▪ which(x%%2==0) #[1] 3 6

Rounding Error
▪ Only integers and fractions whose denominator is a power of 2 can be represented exactly in floating point representation. All other numbers are subject to rounding error.
▪ 2*2==4 #[1] TRUE
▪ sqrt(2)*sqrt(2)==2 #[1] FALSE
▪ sqrt(2) has a rounding error which gets amplified when we square it.
▪ all.equal(x,y) # Returns TRUE if the difference between x and y is smaller than some set tolerance based on R's operational level of accuracy.

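The two comparisons above, and the all.equal alternative, can be put side by side. A minimal sketch follows; wrapping all.equal in isTRUE is a common idiom rather than something stated on the slide, since all.equal returns a description of the difference instead of FALSE when the values differ.

```r
a <- sqrt(2) * sqrt(2)

exact <- (a == 2)                   # FALSE: rounding error in sqrt(2)
close <- isTRUE(all.equal(a, 2))    # TRUE: equal within tolerance

# Fractions with power-of-2 denominators are stored exactly
halves <- (0.5 + 0.25 == 0.75)      # TRUE
# Tenths are not
tenths <- (0.1 + 0.2 == 0.3)        # FALSE
```
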
Module 2
Matrices and Arrays and Conditions and Looping: Defining a Matrix, Sub-setting, Matrix Operations, if statements, looping with for, looping with while, vector based programming.
Text Book 1: Chapter 2 (2.8), Chapter 3 (3.2 to 3.5)

Matrices
▪ Matrix: it is created from a vector using the matrix function
▪ matrix(data,nrow=1,ncol=1,byrow=FALSE)
  ▪ data is a vector of length at most nrow*ncol
  ▪ nrow: No. of rows (default value 1)
  ▪ ncol: No. of columns (default value 1)
  ▪ byrow defines whether to fill the matrix with the elements of data row-by-row or column-by-column.
  ▪ byrow defaults to FALSE
  ▪ If length(data) is less than nrow*ncol, then data is re-used as many times as needed.
▪ (A <- matrix(1:6, nrow=2, ncol=3, byrow=TRUE))
       [,1] [,2] [,3]
  [1,]    1    2    3
  [2,]    4    5    6
▪ dim(A) returns the dimensions of a matrix
  ▪ dim(A) #[1] 2 3
▪ Creating a diagonal matrix
  ▪ Use diag(x)
▪ Joining matrices with rows of the same length (stacking vertically)
  ▪ Use rbind(…)
▪ Joining matrices with columns of the same length (stacking horizontally)
  ▪ Use cbind(…)

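The stacking functions and the re-use of data are easier to follow on concrete matrices. The following is a small sketch; B and the other names are made up for the illustration.

```r
A <- matrix(1:6,  nrow = 2, ncol = 3, byrow = TRUE)
B <- matrix(7:12, nrow = 2, ncol = 3, byrow = TRUE)

tall <- rbind(A, B)    # stack vertically:   4 rows, 3 columns
wide <- cbind(A, B)    # stack horizontally: 2 rows, 6 columns

D <- diag(c(1, 2, 3))  # 3 x 3 diagonal matrix

# Only 2 data values for 6 cells: the data is re-used, column by column
R <- matrix(1:2, nrow = 2, ncol = 3)   # every column is 1, 2
```
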
Matrices
▪ Elements of matrices are referenced using two indices
  ▪ A[1,3] <- 0 # Sets first row, third column value to 0
         [,1] [,2] [,3]
    [1,]    1    2    0
    [2,]    4    5    6
  ▪ A[, 2:3] # Reference all rows, and columns from 2 to 3
         [,1] [,2]
    [1,]    2    0
    [2,]    5    6
▪ (B <- diag(c(1,2,3)))
       [,1] [,2] [,3]
  [1,]    1    0    0
  [2,]    0    2    0
  [3,]    0    0    3
▪ Algebraic operations including * (multiply) act element-wise on matrices. To perform matrix multiplication we use %*%
▪ Functions for use with matrices
  ▪ nrow(x), ncol(x), det(x) (determinant), t(x) (transpose)
  ▪ solve(A,B) # returns x such that A%*%x==B
  ▪ solve(A) # if A is invertible, the matrix inverse of A is returned
▪ (A <- matrix(c(3,5,2,3),nrow=2,ncol=2))
▪ (B <- matrix(c(1,1,0,1),nrow=2,ncol=2))
▪ A%*%B
       [,1] [,2]
  [1,]    5    2
  [2,]    8    3
▪ A*B
       [,1] [,2]
  [1,]    3    0
  [2,]    5    3
▪ (A.inv <- solve(A))
       [,1] [,2]
  [1,]   -3    2
  [2,]    5   -3

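The two-argument form solve(A,B) is described on this slide but not demonstrated. A small sketch using the same matrix A; the right-hand side b is chosen for the illustration.

```r
A <- matrix(c(3, 5, 2, 3), nrow = 2, ncol = 2)
b <- c(7, 11)

x <- solve(A, b)      # solves A %*% x == b, giving 1 2

A_inv <- solve(A)     # inverse of A
I2    <- A %*% A_inv  # identity matrix, up to rounding error
```

Comparing I2 with diag(2) is best done with all.equal rather than ==, because the product is computed in floating point.
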
Matrices
▪ A <- matrix(1:9, nrow=3, ncol=3, byrow=TRUE)
       [,1] [,2] [,3]
  [1,]    1    2    3
  [2,]    4    5    6
  [3,]    7    8    9
▪ In R, a matrix is stored as a vector with an added dimension attribute, which gives the number of rows and columns. Matrix elements are stored column-wise in the vector.
▪ Here referencing using a single index gives: A[1] = 1, A[2] = 4, A[3] = 7, A[4] = 2, A[5] = 5, A[6] = 8, A[7] = 3, A[8] = 6, A[9] = 9
▪ R prints out a vector x as a row vector; however, in matrix operations it will treat x as either a row or a column vector in an attempt to make the components conformable.
▪ A <- matrix(c(3,5,2,3), nrow=2, ncol=2)
       [,1] [,2]
  [1,]    3    2
  [2,]    5    3
▪ (x <- c(1,2))
  [1] 1 2
▪ A%*%x
       [,1]
  [1,]    7
  [2,]   11
▪ t(x) treats x as a column vector by default and produces an array with the fixed dimension attributes of a row vector
       [,1] [,2]
  [1,]    1    2
▪ A%*%t(x) # Error in A %*% t(x) : non-conformable arguments
▪ To check if an object is a matrix or a vector you can use is.matrix(x) and is.vector(x)
▪ Mathematically speaking they're equivalent, but they're treated as different objects in R.

Matrices
▪ To create a matrix A with one column from a vector x, we use as.matrix(x)
  ▪ A <- as.matrix(x)
▪ To create a vector x from the columns of a matrix A, we use as.vector(A)
  ▪ x <- as.vector(A)
▪ This just strips the dimension attribute from A and leaves the elements as they are (stored column-wise)
▪ This process of changing object type is called "coercion"
▪ In many instances R will implicitly coerce the type of an object in order to apply specified operations or functions
▪ Sometimes it is convenient to arrange objects in arrays of more than two dimensions.
▪ This is done with arrays
  ▪ array(data,dim)
  ▪ data is a vector containing the elements of the array
  ▪ dim is a vector whose length is the number of dimensions and whose elements give the size of the array along each dimensional axis
▪ To fill the array you need length(data) equal to prod(dim)

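A three-dimensional array makes the roles of data and dim concrete. The sizes below are arbitrary choices for the illustration.

```r
# 24 values arranged as 2 rows x 3 columns x 4 layers; prod(dim) is 24
arr   <- array(1:24, dim = c(2, 3, 4))
shape <- dim(arr)

# Elements are filled with the first index varying fastest
first <- arr[1, 2, 1]    # 3
last  <- arr[2, 3, 4]    # 24

# Coercion between vector and one-column matrix
x       <- 1:3
col_mat <- as.matrix(x)        # 3 x 1 matrix
back    <- as.vector(col_mat)  # plain vector again
```
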
*The Workspace

▪ Objects created in R exist until they are explicitly deleted or the session is concluded.
▪ To list all currently defined objects: ls() or objects()
▪ To remove object x use rm(x)
▪ To remove all currently defined objects: rm(list=ls())
▪ To save all existing objects to a file fname: save.image(file="fname")
▪ To save objects x and y: save(x,y, file="fname")
▪ To load a set of saved objects: load(file="fname")
▪ When quitting R, if you save the data when prompted, then the objects will be stored in the file .RData in the current working directory
▪ R keeps a record of all commands you type. To save the history use savehistory(file="fname") and for loading use loadhistory(file="fname")
▪ If the workspace image is saved when quitting, then the current history is saved in .Rhistory in the current working directory

Branching with if
▪ Useful to choose the execution of one or another part of a program depending on a condition.
▪ if(logical_expression){
    expression_1
    ...
  }
▪ if(logical_expression){
    expression_1
    ...
  } else {
    expression_2
    ...
  }
▪ Braces { } are used to group together one or more expressions
▪ If there is only one expression, then the braces are optional
▪ During evaluation of an "if" expression, if the logical_expression evaluates to TRUE, then the first group of expressions is executed and the second group is not executed.
▪ If the logical_expression evaluates to FALSE, then only the second group of expressions is executed and the first group is not executed.
▪ If statements can be nested to create elaborate pathways through a program

Branching with if
▪ The else part is optional. If the "if" statement is already complete when R reaches an "else" written on a new line at the top level, then R treats else as the start of a new command. Since there is no command starting with else, it gives an error
▪ if(logical_expression){
    expression_1
    ...
  }
  else { #This will cause an error
    expression_2
    ...
  }
▪ To avoid this, keep else on the same line as the closing brace of the if part: } else {

Branching with if
▪ Example: Find the roots of a quadratic equation
  #find the zeros of a2*x^2 + a1*x + a0 = 0
  #clear the workspace
  rm(list=ls())
  #Input
  a2 <- 1
  a1 <- 4
  a0 <- 5
  #Calculate the discriminant
  discrim <- a1^2 - 4*a2*a0
  #Calculate the roots depending on the value of the discriminant
  if(discrim > 0){
    roots <- c((-a1+sqrt(a1^2-4*a2*a0))/(2*a2),
               (-a1-sqrt(a1^2-4*a2*a0))/(2*a2))
  } else {
    if(discrim == 0){
      roots <- -a1/(2*a2)
    } else {
      roots <- c()
    }
  }
  #Output
  show(roots)
▪ Exercise: modify the code to handle a2=0

Branching with if
▪ When an else part consists only of another if, the two can be written together as else if
▪ if(logical_expression_1){
    expression_1
    ...
  } else if(logical_expression_2){
    expression_2
    ...
  } else {
    expression_3
    ...
  }

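A three-way branch is the smallest case where else if is useful. The following sketch classifies a number by sign; the variable names and the value of x are illustrative.

```r
x <- -3

# Exactly one of the three groups is executed
if (x > 0) {
  label <- "positive"
} else if (x < 0) {
  label <- "negative"
} else {
  label <- "zero"
}

cat("x is", label, "\n")
```

Each else stays on the same line as the preceding closing brace, so R reads the whole chain as one command.
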
Looping with for
▪ The for command executes the group of expressions within braces { } once for each element of a vector.
▪ The grouped expressions can make use of x, which takes on each of the values of the elements of the vector as the loop repeats. The vector can also be a list
▪ for(x in vector){
    expression_1
    ...
  }
▪ cat (concatenate) allows us to combine text and variables together in one line of output, unlike print and show
▪ Example: Summing a vector
  ▪ (x_list <- seq(1,9, by=2)) #[1] 1 3 5 7 9
  ▪ sum_x <- 0
  ▪ for(x in x_list){
      sum_x <- sum_x + x
      cat("The current loop element is",x,"\n")
      cat("The cumulative total is",sum_x,"\n")
    }
▪ Built-in function for the same
  ▪ sum(x_list)

Looping with for
▪ Example: Calculate n factorial (n!)
▪ #clear the workspace
▪ rm(list=ls())
▪ #input
▪ n <- 6
▪ #Calculation
▪ n_factorial <- 1
▪ for(i in 1:n){
    n_factorial <- n_factorial*i
  }
▪ #Output
▪ show(n_factorial) #[1] 720
▪ Alternate methods:
  ▪ prod(1:n)
  ▪ factorial(n)

Looping with for
▪ Example – Pension Value – Forecast Pension growth under compound
interest
▪ #clear the workspace
▪ rm(list=ls())
▪ #input
▪ r <- 0.11 #Annual Rate of Interest
▪ term <- 10 #forecast duration
▪ period <- 1/12 #Time between payments in years
▪ payments <- 100 #Amount deposited each period
▪ #Calculations
▪ n <- floor(term/period) #Number of payments
▪ pension <- 0
▪ for(i in 1:n){
pension[i+1] <- pension[i] * (1+r*period)+payments
}
time <- (0:n)*period
▪ #Output
▪ plot(time,pension)

Looping with for
▪ Program 1
  n <- 1000000
  x <- rep(0, n)
  for(i in 1:n){
    x[i] <- i
  }
▪ Program 2
  n <- 1000000
  x <- 1
  for(i in 2:n){
    x[i] <- i
  }
▪ Program 1 is faster than Program 2 to achieve the same result
▪ Changing the size of a vector takes about as long as creating a new vector does.
▪ R needs to reconsider its allocation of memory to the object each time the size changes
▪ In Program 1, x is already allocated all the memory it needs (preallocation)
▪ In Program 2, the size of vector x is changed with every execution of x[i] <- i (redimensioning)

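The claim about speed can be measured with system.time. The sketch below uses a smaller n than the slide so that it finishes quickly; the size of the gap depends on the machine and the R version, so no timings are quoted here.

```r
n <- 100000

# Program 1 style: the full vector exists before the loop starts
time_prealloc <- system.time({
  x <- rep(0, n)
  for (i in 1:n) x[i] <- i
})

# Program 2 style: the vector grows by one element per iteration
time_grow <- system.time({
  y <- 1
  for (i in 2:n) y[i] <- i
})

same_result <- identical(x, y)   # both hold 1, 2, ..., n
print(time_prealloc["elapsed"])
print(time_grow["elapsed"])
```
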
Looping with while
▪ When we do not know beforehand how many times we need to go around a loop, we check some condition to see if we are done yet. A while loop is used for this
▪ while(logical_expression){
    expression_1
    ...
  }
▪ When a while command is executed, logical_expression is evaluated first.
▪ If it evaluates to TRUE, then the group of expressions in braces { } is executed.
▪ Control is then passed back to the start of the command. If logical_expression is still TRUE, the grouped expressions are executed again.
▪ For the loop to stop, the logical_expression must evaluate to FALSE, and this usually depends on a variable that is modified within the grouped expressions
▪ The while loop is more fundamental than the for loop, as we can always rewrite a for loop as a while loop

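The last point, that any for loop can be rewritten as a while loop, can be seen on a small sum. The names below are illustrative.

```r
# for version: the loop variable is managed by R
total_for <- 0
for (i in 1:5) {
  total_for <- total_for + i
}

# while version: we initialise, test and update the counter ourselves
total_while <- 0
i <- 1
while (i <= 5) {
  total_while <- total_while + i
  i <- i + 1      # without this update the loop would never stop
}
```

Both totals are 15. The while version needs the explicit update of i, which is the variable that eventually makes the condition FALSE.
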
Looping with while
▪ Example: Fibonacci Sequence
  #clear the workspace
  rm(list=ls())
  #initialize variables
  F <- c(1,1) #List of Fibonacci numbers
  n <- 2 #length of F
  #Iteratively calculate new Fibonacci numbers
  while(F[n] <= 100){
    #cat("n =",n, "F[n] =", F[n], "\n")
    n <- n+1
    F[n] <- F[n-1]+F[n-2]
  }
  #Output
  cat("The first Fibonacci number > 100 is F(",n, ") = ", F[n], "\n")
▪ Example: Compound Interest, duration of a loan under compound interest
  #clear the workspace
  rm(list=ls())
  #inputs
  r <- 0.11 #Annual rate of interest
  period <- 1/12 #Time between repayments in years
  debt_initial <- 1000
  repayments <- 12 #Amount repaid each period
  #Calculations
  time <- 0
  debt <- debt_initial
  while(debt > 0){
    time <- time+period
    debt <- debt*(1+r*period) - repayments
  }
  #Output
  cat("Loan will be repaid in ", time, " years\n")

Vector Based Programming
▪ Often it is necessary to perform operations on each element of a vector.
▪ R is designed so that such tasks can be accomplished using vector operations rather than looping
▪ Vector operations are more efficient and more concise.
▪ Example: Find the sum of the first n squares, i.e. n(n+1)(2n+1)/6
▪ Looping
  n <- 100
  S <- 0
  for(i in 1:n){
    S <- S+i^2
  }
  S #[1] 338350
▪ Vector operation
  sum((1:n)^2) #[1] 338350
▪ In the vector operation:
  ▪ R interprets 1:n as "integers from 1 up to n, inclusive", then squares those integers using the vectorized "^2" and then adds them up with sum

Vector Based Programming
▪ The ifelse function performs element-wise conditional evaluation upon a vector
▪ ifelse(test, A, B)
▪ It takes 3 vector arguments
  ▪ A logical expression test
  ▪ Two expressions A and B
▪ The function returns a vector that is a combination of the evaluated expressions A and B
  ▪ The elements of A that correspond to elements of test that are TRUE
  ▪ The elements of B that correspond to elements of test that are FALSE
▪ If the vectors have different lengths, R will repeat the shorter vector(s) to match the longer
▪ x <- c(-2,-1,1,2)
▪ ifelse(x>0, "Positive", "Negative")
▪ #[1] "Negative" "Negative" "Positive" "Positive"
  ▪ x>0 is the test logical expression that evaluates whether elements of vector x are greater than 0 or not.
  ▪ "Positive" is returned for each element whose value is greater than 0
  ▪ "Negative" is returned for each element whose value is not greater than 0
  ▪ The final result is a vector holding the result for every value of x
▪ pmin and pmax provide vectorized versions of the minimum and maximum
  ▪ pmin(c(1,2,3),c(3,2,1),c(2,2,2)) #[1] 1 2 1
  ▪ The function returns the minimum value at each position across the vectors
  ▪ pmax(c(1,2,3),c(3,2,1),c(2,2,2)) #[1] 3 2 3

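The difference between min and pmin, and a typical use of ifelse, can be seen on one small vector. The values below are made up for the sketch.

```r
x <- c(-2, 5, 12)

overall <- min(x, 10)     # one number: the smallest of all values, -2
capped  <- pmin(x, 10)    # element-wise: -2 5 10 (10 is recycled)
floored <- pmax(x, 0)     # element-wise: 0 5 12

# ifelse picks, per element, from the second or third argument
magnitude <- ifelse(x < 0, -x, x)   # 2 5 12
```
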
Module 3
Lists and Data Frames: Data Frames, Lists, Special Values, The apply family
Text Book 1: Chapter 6 (6.2 to 6.4)

Dataframes
▪ In the vector data structure in R, all components must be of the same mode: numeric, character or logical
▪ Real datasets require grouping of data of differing modes.
▪ Matrices cannot contain heterogeneous data (data of different modes)
▪ Lists and dataframes are able to store much more complicated data structures
▪ A dataframe is a list that is tailor-made to meet the practical needs of representing multivariate datasets
▪ It is a list of vectors restricted to be of equal lengths
▪ Each vector or column corresponds to a variable in an experiment and each row corresponds to a single observation.
▪ Each vector can be of any of the basic modes of object

Dataframes
▪ Large dataframes are usually read into R from a file.
▪ read.table(file,header=FALSE,sep="")
  ▪ It returns a dataframe
  ▪ file: the name of the file to be read, relative to the current working directory, absolute, or a URL.
  ▪ header: indicates whether the first line of the file is a line of text giving the variable names.
  ▪ sep: gives the character used to separate the values in each row. The default, sep="", means a variable amount of white space.
  ▪ ?read.table can be used for more details
▪ Commonly used variants:
  ▪ read.csv(file): comma separated data
  ▪ read.delim(file): tab-delimited data
▪ Equivalents
  ▪ read.table(file,header=TRUE,sep=",")
  ▪ read.table(file,header=TRUE,sep="\t")
▪ If a header is present, it is used to name the columns of the dataframe.
▪ The column names can be assigned after reading the file using the names function, or when reading it in using the col.names argument, which should be assigned a character vector whose length is the same as the number of columns.
▪ If there is no col.names argument and no header, then R uses the names "V1", "V2", etc.

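The naming rules can be tried without creating a file, because read.table and read.csv also accept the data as a string through their text argument. The sketch below uses a few rows in the style of the course's ufc dataset; the object names are illustrative.

```r
csv_text <- 'plot,tree,species,dbh.cm,height.m
2,1,"DF",39,20.5
2,2,"WL",48,33'

# Header line present: it supplies the column names
ufc_small <- read.csv(text = csv_text)
col_names <- names(ufc_small)

# No header and no col.names: R falls back to V1, V2, ...
no_header     <- read.table(text = "2 1 39\n2 2 48")
default_names <- names(no_header)

# Names supplied while reading
renamed <- read.table(text = "2 1 39\n2 2 48",
                      col.names = c("plot", "tree", "dbh.cm"))
```
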
Dataframes
Sample Dataset ufc.csv
▪ "plot","tree","species","dbh.cm","height.m"
▪ 2,1,"DF",39,20.5
▪ 2,2,"WL",48,33
▪ 3,2,"GF",52,30
▪ 3,5,"WC",36,20.7
▪ 3,8,"WC",38,22.5
▪ ufc <- read.csv("C:/Users/Praahas/OneDrive/Documents/Desktop/ufc.csv")
▪ ufc
▪ To examine the dataset, head(ufc) and tail(ufc) can be used

Dataframes
▪ Each column or variable in a dataframe has a unique name and can be extracted using the dataframe name, a dollar sign and the column name
  ▪ x <- ufc$height.m
  ▪ x[1:5] #[1] 20.5 33.0 30.0 20.7 22.5
  ▪ Note: Indexing starts from 1
▪ We can use [[ ]] notation to extract columns.
  ▪ ufc$height.m, ufc[[5]] and ufc[["height.m"]] are all equivalent
▪ Elements of the dataframe can be extracted directly using matrix indexing: ufc[1:5, 5]
  ▪ #[1] 20.5 33.0 30.0 20.7 22.5
▪ To select more than one of the variables in a dataframe we use [ ] notation. We can also use names.
  ▪ ufc[4:5] is the same as ufc[c("dbh.cm", "height.m")]
  ▪ diam.height <- ufc[4:5] # the "dbh.cm" and "height.m" columns will be stored in diam.height
  ▪ diam.height[1:4,] # Will display rows from 1 to 4
▪ Check if an object is a dataframe: is.data.frame(diam.height) #[1] TRUE

Dataframes
▪ The result of selecting columns using [ ] is another dataframe. This can sometimes cause confusion when you select only one variable
  ▪ x <- ufc[5]
      height.m
    1     20.5
    2     33.0
    3     30.0
    4     20.7
    5     22.5
  ▪ x[1:5] # Error in `[.data.frame`(x, 1:5) : undefined columns selected
▪ Variables can be extracted one at a time using [[ ]]
▪ Selecting a column using [[ ]] preserves the mode of the object that is being extracted
▪ Using [ ] preserves the mode of the object from which the extraction is being made.
  ▪ mode(ufc)
    [1] "list"
  ▪ mode(ufc[5])
    [1] "list"
  ▪ mode(ufc[[5]])
    [1] "numeric"

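The same behaviour can be reproduced on a small dataframe typed in by hand, so that no file is needed. The three rows below are taken from the sample dataset shown earlier; ufc_small and the other names are illustrative.

```r
ufc_small <- data.frame(species  = c("DF", "WL", "GF"),
                        dbh.cm   = c(39, 48, 52),
                        height.m = c(20.5, 33, 30))

as_df  <- ufc_small["height.m"]     # [ ]  keeps a dataframe with one column
as_vec <- ufc_small[["height.m"]]   # [[ ]] returns the column itself

df_check  <- is.data.frame(as_df)   # TRUE
vec_mode  <- mode(as_vec)           # "numeric"
first_two <- as_vec[1:2]            # ordinary vector indexing works: 20.5 33.0
```
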
Dataframes
▪ Dataframes can be constructed from a collection of vectors and/or existing dataframes using data.frame
▪ data.frame(col1=x1, col2=x2, ..., df1, df2, ...)
▪ col1, col2, ... are column names given as character strings without quotes.
▪ x1, x2, ... are vectors of equal length
▪ df1, df2, ... are dataframes whose columns must have the same length as the vectors x1, x2
▪ Column names may be omitted, in which case R will choose a name
▪ A new variable can also be created within a dataframe by naming it and assigning a value
▪ In the ufc example, let's add a new variable to the dataset: volume
▪ ufc$volume.m3 <- pi*(ufc$dbh.cm/200)^2 * ufc$height.m/2
▪ mean(ufc$volume.m3) #[1] 1.93294
▪ Equivalently we could assign to ufc[6], ufc["volume.m3"], ufc[[6]] or ufc[["volume.m3"]]
▪ For better readability you can use
▪ ufc$volume.m3 <- with(ufc, pi*(dbh.cm/200)^2*height.m/2)
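The construction rules on this slide can be tried on a small made-up dataset. This is only a sketch: the vectors, the dataframe names trees and trees2, and the id column are hypothetical and are not part of the ufc data.

```r
# Hypothetical measurements, not taken from the ufc dataset
dbh.cm   <- c(39, 48, 15)
height.m <- c(20.5, 33.0, 30.0)

# Build a dataframe from two vectors of equal length
trees <- data.frame(dbh.cm = dbh.cm, height.m = height.m)

# Create a new variable by naming it and assigning a value
trees$volume.m3 <- with(trees, pi * (dbh.cm / 200)^2 * height.m / 2)

# Combine a new vector with an existing dataframe
trees2 <- data.frame(id = c("a", "b", "c"), trees)

names(trees2)
```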

Dataframes
▪ names(df) returns the names of the dataframe df as a vector of character strings
▪ names(ufc)
▪ #[1] "plot" "tree" "species" "dbh.cm"
▪ [5] "height.m" "volume.m3"
▪ names(ufc) <- c("P","T","S","D","H","V")
▪ names(ufc)
▪ #[1] "P" "T" "S" "D" "H" "V"
▪ names can be used to set or get the object's names
▪ Technically, names is an attribute
▪ We must have exactly one name for each column and they must all be different
▪ dim (dimension) of a matrix is another example of an attribute
▪ As long as the total number of elements remains the same we can change the shape of a matrix by changing the dim attribute
▪ R will reassign values from the old matrix to the new one column by column
▪ If you delete a column, the remaining column names are unchanged
▪ dim(df) will return the number of rows and columns of a dataframe
▪ dim(df) <- c(x,y) will however generate an error
▪ dim is not an attribute of a dataframe; it has been extended to dataframes only for convenience
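A short sketch of the dim and names behaviour described here, using a made-up matrix m and dataframe df (both hypothetical):

```r
m <- matrix(1:6, nrow = 2)   # 2 x 3 matrix, filled column by column
dim(m)                       # 2 3
dim(m) <- c(3, 2)            # same 6 elements, now arranged as 3 x 2
m

df <- data.frame(a = 1:3, b = 4:6)
dim(df)                      # 3 2 (rows, columns)
names(df) <- c("A", "B")     # names can be set as well as read
names(df)
```

After the reshape the first column of m holds 1, 2, 3 and the second holds 4, 5, 6, because values are reassigned column by column.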

Dataframes
▪ A dataframe also has row names. By default they are named "1", "2", "3", etc. when the dataframe is created
▪ Both read.table and data.frame take an optional argument row.names, where row names can be specified
▪ row.names(df) will return the row names of df as a character vector
▪ row.names is an attribute of a dataframe and therefore row names can be set by making an assignment to row.names(df)
▪ If you delete a row, the remaining row names are unchanged
▪ The subset function is useful for selecting rows of a dataframe, especially combined with %in%
▪ For example, if we are interested only in DF and GF tree heights in the ufc dataset:
▪ fir.height <- subset(ufc, subset=species %in% c("DF","GF"), select=c(plot,tree,height.m))
▪ For vectors x & y of the same mode, the expression x %in% y returns a logical vector of the same length as x whose i-th element is TRUE if and only if x[i] is an element of y
▪ The %in% operator performs many-to-many matching
▪ The subset argument accepts a logical vector and determines which rows are selected
▪ The select argument determines which columns are kept. Note that the vector given to select is of columns, not column names.
▪ Note that expressions assigning values to subset and select can directly use the columns of the target dataframe, which is given as the first argument
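The following sketch shows %in% and subset on a small stand-in for ufc. The dataframe trees and all of its values are made up for illustration.

```r
# Hypothetical stand-in for ufc, with made-up values
trees <- data.frame(
  plot     = c(1, 1, 2, 2, 3),
  tree     = c(1, 2, 1, 2, 1),
  species  = c("DF", "WL", "GF", "WC", "DF"),
  height.m = c(20.5, 33.0, 30.0, 20.7, 22.5)
)

# %in% tests each element of the left vector for membership in the right one
keep <- trees$species %in% c("DF", "GF")   # TRUE FALSE TRUE FALSE TRUE

# Keep only the DF and GF rows, and only three of the columns
fir.height <- subset(trees,
                     subset = species %in% c("DF", "GF"),
                     select = c(plot, tree, height.m))

row.names(fir.height)   # "1" "3" "5": remaining row names are unchanged
```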

Dataframes
▪ To write a dataframe to a file:
▪ write.table(x, file = "", append=FALSE, sep = " ", row.names=TRUE, col.names=TRUE)
▪ For the complete list of arguments use ?write.table
▪ x – dataframe to be written
▪ file – name and address of the file to write to. The file is created if it doesn't exist. By default it writes to the screen
▪ append – indicates whether to append to the file or overwrite it
▪ sep – indicates the character used to separate values within a row. Rows are separated by new lines
▪ row.names – indicates whether or not to include the existing row names as the first column, or gives a character vector of row names to be written
▪ Complete rows (without missing values) can be identified in a 2-dimensional object such as a dataframe using the complete.cases command
▪ Rows with missing values can be removed using the na.omit function.
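A minimal sketch combining complete.cases, na.omit and write.table. The dataframe df is made up, and a temporary file is used so that nothing is written to a real path.

```r
# Hypothetical data with some missing values
df <- data.frame(x = c(1, NA, 3, 4), y = c("a", "b", NA, "d"))

ok    <- complete.cases(df)   # TRUE FALSE FALSE TRUE
clean <- na.omit(df)          # keeps rows 1 and 4

# Write to a temporary file, then read it back to check the round trip
path <- tempfile(fileext = ".txt")
write.table(clean, file = path, sep = "\t", row.names = FALSE, col.names = TRUE)
back <- read.table(path, header = TRUE, sep = "\t")
back
```

Setting row.names = FALSE keeps the row labels out of the file, which makes it simpler to read back with header = TRUE.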

Dataframes
▪ R allows you to attach a dataframe to the workspace. When attached, the variables in the dataframe can be referred to without being prefixed by the dataframe name.
▪ attach(ufc)
▪ max(height.m[species=="GF"]) #[1] 47
▪ To detach use detach(df)
▪ When a dataframe is attached, R makes a copy of each variable, which is deleted when the dataframe is detached. So changing an attached variable does not change the dataframe
▪ attach(ufc)
▪ max(height.m[species=="GF"]) #[1] 47
▪ height.m <- 0 #Changing value in attached df variable
▪ max(height.m) # [1] 0
▪ max(ufc$height.m) #[1] 47
▪ detach(ufc)
▪ max(ufc$height.m) #[1] 47
▪ attach is preferably avoided as it can be a source of potential errors
▪ The with and transform functions provide a safer alternative
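A sketch of the safer alternatives named on this slide. The dataframe trees, its values and the height.ft column are hypothetical.

```r
# Hypothetical stand-in for ufc
trees <- data.frame(species  = c("DF", "GF", "GF"),
                    dbh.cm   = c(39, 48, 15),
                    height.m = c(20.5, 47.0, 30.0))

# with() evaluates an expression using the columns, without attaching
max.gf <- with(trees, max(height.m[species == "GF"]))

# transform() returns a modified copy; the original is untouched
trees2 <- transform(trees, height.ft = height.m * 3.28084)

names(trees)    # still three columns
names(trees2)   # four columns, including height.ft
```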

Lists
▪ A generic container for other objects.
▪ Like a vector, a list is an indexed set of objects (and has length)
▪ But unlike a vector, the elements of a list can be of different types, including other lists
▪ Mode of a list is list
▪ It might contain an individual measurement, a vector of observations on a single response variable, a dataframe or a list of dataframes containing the results of several experiments
▪ A list is created using the list(…) command with comma-separated arguments.
▪ Single square brackets are used to select a sub-list
▪ Double square brackets are used to extract a single element
▪ my.list <- list("one", TRUE, 3, c("f","o","u","r"))
▪ my.list[[2]] #[1] TRUE
▪ mode(my.list[[2]]) #[1] "logical"
▪ my.list[[4]] #[1] "f" "o" "u" "r"
▪ my.list[[4]][1] #[1] "f"
▪ R uses double square brackets [[1]] to indicate list elements, then single square brackets [1] to indicate vector elements within the list
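The difference between the two bracket forms can be checked directly. This sketch reuses my.list from the slide; sub and elt are names introduced here for illustration.

```r
my.list <- list("one", TRUE, 3, c("f", "o", "u", "r"))

sub <- my.list[2]      # single brackets: a list of length 1
elt <- my.list[[2]]    # double brackets: the element itself

mode(sub)              # "list"
mode(elt)              # "logical"
length(my.list[3:4])   # 2, a sub-list holding the third and fourth elements
```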

Lists
▪ Elements of a list can be named when the list is created using arguments of the form name1=x1, name2=x2, etc.
▪ Elements of a list can be named later by assigning a value to the names attribute
▪ Unlike a dataframe, the elements of a list do not have to be named
▪ Names can be used (within quotes) when indexing with single or double square brackets.
▪ Or they can also be used (with or without quotes) after a dollar sign to extract a list element
▪ my.list <- list(first="one", second=TRUE, third=3, fourth=c("f","o","u","r"))
▪ > my.list
▪ $first
▪ [1] "one"
▪ $second
▪ [1] TRUE
▪ $third
▪ [1] 3
▪ $fourth
▪ [1] "f" "o" "u" "r"
▪ names(my.list) # [1] "first" "second" "third" "fourth"
▪ my.list$second # [1] TRUE
▪ names(my.list) <- c("First Element","Second Element","Third Element","Fourth Element") # changes the element names
▪ my.list$'Second Element' # [1] TRUE
▪ x <- 'Second Element'
▪ my.list[[x]] # [1] TRUE

Lists
▪ To flatten a list x, i.e. convert it into a vector, we use unlist(x)
▪ x <- list(1, c(2,3), c(4,5,6))
▪ unlist(x) # [1] 1 2 3 4 5 6
▪ If the list itself contains lists, then these lists are also flattened, unless the argument recursive = FALSE is set
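A small sketch of the effect of recursive = FALSE on a nested list. The list x below is made up for the example.

```r
x <- list(1, list(2, list(3, 4)))

flat <- unlist(x)                      # 1 2 3 4, fully flattened
y    <- unlist(x, recursive = FALSE)   # only the top level is flattened

is.list(y)   # TRUE
length(y)    # 3: the elements 1, 2 and the inner list(3, 4)
```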

Lists
▪ Linear Regression: the fitted model object returned by lm is a list
▪ lm.xy <- lm(y ~ x, data=data.frame(x=1:5, y=1:5))
▪ mode(lm.xy) #[1] "list"
▪ names(lm.xy)

The apply family
▪ R has functions that allow you to easily apply a function to all or selected elements of a list or dataframe
▪ apply() - takes a data frame or matrix as input and gives output as a vector, list or array. It is primarily used to avoid explicit loop constructs. It is the most basic member of the family and can be used over a matrix.
▪ lapply()
▪ sapply()
▪ tapply()
▪ mapply()

The apply family
▪ apply() - applies a function to the rows or columns of a matrix or data frame. It takes the matrix or data frame as an argument, along with the function and whether it has to be applied by row or by column, and returns the results
▪ apply(X,MARGIN,FUN)
▪ If MARGIN is 1, FUN is applied to each row
▪ If MARGIN is 2, FUN is applied to each column

▪ # create sample data
▪ sample_matrix <- matrix(1:10, nrow=3, ncol=10)
▪ print("sample matrix:")
▪ sample_matrix
▪ # Use apply() function across row to find sum
▪ print("sum across rows:")
▪ apply(sample_matrix, 1, sum)
▪ # use apply() function across column to find mean
▪ print("mean across columns:")
▪ apply(sample_matrix, 2, mean)

The apply family
▪ lapply() - applies a function to each element of a list and returns a list of the same length. It takes a list, vector, or data frame as input and gives output in the form of a list. Since it applies the operation to all the elements of the input, it doesn't need a MARGIN.
▪ lapply(X,FUN)

▪ # create sample data


▪ names <- c("tony stark", "steve rogers","stephen strange",
"peter parker","natasha romanoff")
▪ print( "original data:")
▪ names
▪ # apply lapply() function
▪ print("data after lapply():")
▪ lapply(names, toupper)

The apply family
▪ sapply() – applies a function to each element of a list, vector, or data frame and simplifies the result to a vector or matrix where possible (otherwise a list is returned). Since it applies the operation to all the elements of the object, it doesn't need a MARGIN.
▪ It is the same as lapply() with the only difference being the type of the returned object.
▪ sapply(X,FUN)

▪ # create sample data


▪ sample_data<- data.frame( x=c(1,2,3,4,5,6),y=c(3,2,4,2,34,5))
▪ print( "original data:")
▪ sample_data
▪ # apply sapply() function
▪ print("data after sapply():")
▪ sapply(sample_data, max)
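To make the difference in return type concrete, the sketch below runs lapply() and sapply() on the same sample_data. The names as_list, as_vec and rng are introduced here for illustration.

```r
sample_data <- data.frame(x = c(1, 2, 3, 4, 5, 6), y = c(3, 2, 4, 2, 34, 5))

as_list <- lapply(sample_data, max)   # list with elements x and y
as_vec  <- sapply(sample_data, max)   # named numeric vector

# When FUN returns two values per column, sapply() builds a matrix
rng <- sapply(sample_data, range)     # 2 x 2: row 1 = min, row 2 = max
rng
```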

The apply family
▪ tapply() – Vectorises the application of a function to subsets of data. It is useful for applying a function to each level of a factor variable: it splits a vector into subsets and then applies the function to each of the subsets
▪ tapply(X, INDEX, FUN, …)
▪ X – Target vector to which the function will be applied
▪ INDEX – A factor used to group the elements of X. It will be coerced to a factor if it is not one already. It has the same length as X
▪ FUN – Function to be applied. It is applied to the subvectors of X corresponding to each level of INDEX

▪ #install.packages("tidyverse")
▪ # load library tidyverse
▪ library(tidyverse)
▪ # print head of diamonds dataset
▪ print(" Head of data:")
▪ head(diamonds)
▪ # apply tapply function to get average price by cut
▪ print("Average price for each cut of diamond:")
▪ tapply(diamonds$price, diamonds$cut, mean)
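The same grouping idea can be tried without installing any package. The prices and grades below are made up and are not taken from the diamonds dataset.

```r
# Hypothetical prices and cut grades, made up for illustration
price <- c(300, 350, 500, 520, 480, 900)
grade <- factor(c("Fair", "Fair", "Good", "Good", "Good", "Ideal"))

# Mean price within each level of grade
avg <- tapply(price, grade, mean)
avg   # Fair 325, Good 500, Ideal 900
```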
The apply family
▪ mapply() – Stands for multivariate apply. It applies a function to the corresponding elements of multiple lists or vectors simultaneously.
▪ mapply(FUN, LIST1, LIST2 …)
▪ LIST1, LIST2… – the lists (or vectors) to work on
▪ FUN – function to be applied to the lists.

▪ # Creating a list
▪ A = list(c(1, 2, 3, 4))
▪ # Creating another list
▪ B = list(c(2, 5, 1, 6))
▪ # Applying mapply(): each list has one element, so sum() is called once with both vectors
▪ result = mapply(sum, A, B)
▪ print(result) #[1] 24
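Because the example above passes lists of length one, it calls sum only once. The sketch below, with made-up inputs, shows the element-by-element behaviour more directly.

```r
# Element-wise: FUN receives the i-th element of each argument
sums <- mapply(function(a, b) a + b, 1:3, 4:6)   # 5 7 9

# Results of different lengths cannot be simplified, so a list is returned
reps <- mapply(rep, 1:3, 3:1)   # list: c(1, 1, 1), c(2, 2), 3
reps
```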
Module 4
Functions: Calling functions, scoping, argument matching, writing functions: the function command, arguments, specialized functions.
Text Book 1: Chapter 5, Sections 5.1 to 5.6

Functions
▪ Building blocks for large programs and essential for structuring complex algorithms.
▪ Once loaded, a function can be reused without having to reload it.
▪ Break down a program into smaller logical units, each of which does a simple, well-defined task
▪ A function's general form:
▪ name <- function(arg_1, arg_2, …) {
    exp_1
    exp_2
    <some other exp>
    return(output)
  }
▪ arg_1, arg_2 etc. are names of variables
▪ exp_1, exp_2 and output are all regular R expressions
▪ name is the name of the function
▪ A function call is made using name(x1, x2)
▪ The value of this expression is the value of the expression output.
▪ The values of x1, x2 etc. are copied to arg_1, arg_2 etc.; the arguments then act as variables within the function
▪ The function next evaluates the grouped expressions contained within the braces { }
▪ The value of the expression output is returned as the value of the function
▪ A function may have more than one return statement, in which case it stops after executing the first one it reaches.
▪ If there is no return statement, then the value returned by the function is the value of the last expression in the braces. A function ALWAYS returns a value in R.
▪ NULL may be returned by the function
▪ Some functions have no arguments
▪ Braces are necessary only if the function comprises more than one expression
▪ When a function is called, if the returned value is not assigned to a variable then it is printed.
▪ The expression invisible(x) will return the same value as x, but the value is not printed.
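The return rules above can be seen in a few tiny functions. All three function names below are hypothetical and chosen only for this sketch.

```r
# No return(): the value of the last expression is returned
square <- function(x) x^2          # single expression, so braces are optional

# More than one return(): the first one reached ends the function
sign_label <- function(x) {
  if (x < 0) return("negative")
  if (x == 0) return("zero")
  "positive"
}

# invisible() returns the value without printing it at the prompt
quiet_square <- function(x) invisible(x^2)

square(3)              # printed, because the value is not assigned
y <- quiet_square(3)   # nothing printed; y holds the value
```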

Module 4 PAGE 69
Module 1 Module 2 Module 3 Module 5
Functions
▪ Roots of a quadratic equation
▪ quad <- function(a0, a1, a2){
    # Find the zeros of a2*x^2 + a1*x + a0 = 0
    if (a2 == 0 && a1 == 0 && a0 == 0){
      roots <- NA
    } else if (a2 == 0 && a1 == 0){
      roots <- NULL
    } else if (a2 == 0){
      roots <- -a0/a1
    } else {
      # calculate the discriminant
      discrim <- a1^2 - 4*a2*a0
      # calculate the roots depending on the value of the discriminant
      if (discrim > 0){
        roots <- (-a1 + c(1,-1)*sqrt(discrim))/(2*a2)
      } else if (discrim == 0){
        roots <- -a1/(2*a2)
      } else {
        roots <- NULL
      }
    }
    return(roots)
  }
▪ #Main
▪ rm(list=ls())
▪ source("C:/Users/Praahas/Projects/R/quad.r")
▪ quad(1,0,-1)
▪ quad(1,-2,1)
▪ quad(1,1,1)

Functions
▪ $nCr = \dfrac{n!}{r!\,(n-r)!}$
▪ n_factorial <- function(n){
    # Calculate n factorial
    # seq_len(n) is empty when n is 0, so that 0! = 1 (prod(1:0) would give 0)
    n_fact <- prod(seq_len(n))
    return(n_fact)
  }
▪ ncr <- function(n, r){
    # Calculate nCr
    n_ch_r <- n_factorial(n)/n_factorial(r)/n_factorial(n-r)
    return(n_ch_r)
  }
▪ #Main
▪ rm(list=ls())
▪ source("C:/Users/Praahas/Projects/R/ncr.r")
▪ ncr(4,2) #[1] 6
▪ ncr(6,4) #[1] 15

Functions
▪ Trimmed mean: discard the k smallest and k largest values and then calculate the mean. It is less affected by outliers than the untrimmed mean
▪ Winsorised mean: instead of discarding the k largest and k smallest values, we replace them by $x_{(n-k)}$ and $x_{(k+1)}$ respectively
▪ This can be used when a sample may contain occasional extraordinary values
▪ wmean <- function(x, k){
    x <- sort(x)
    n <- length(x)
    x[1:k] <- x[k+1]
    x[(n-k+1):n] <- x[n-k]
    return(mean(x))
  }
▪ #Main
▪ rm(list=ls())
▪ source("C:/Users/Praahas/Projects/R/wmean.r")
▪ x <- c(8.244,51.421,39.020,90.574,44.697,83.600,73.760,81.106,38.811,68.517)
▪ mean(x)
▪ wmean(x,2)

Functions
▪ When a function is executed, the computer sets aside space for the function variables, makes a copy of the function code and then transfers control to the function
▪ When the function finishes executing, the output is passed to the main program and the copy of the function variables and code is deleted
▪ Function to swap numbers
▪ swap <- function(x){
    # swap values of x[1] and x[2]
    y <- x[2]
    x[2] <- x[1]
    x[1] <- y
    return(x)
  }
▪ #Main
▪ x <- c(7,8,9)
▪ source("C:/Users/Praahas/Projects/R/swap.r")
▪ x[1:2] <- swap(x[1:2]) # x is now 8 7 9
▪ x[2:3] <- swap(x[2:3]) # x is now 8 9 7

Scope & its Consequences
▪ Arguments and variables that are defined within a function exist only within that function
▪ If variables with the same name exist inside and outside a function, then they are separate and do not interact at all
▪ If we execute the command rm(list=ls()) inside a function, then we only delete those objects that are defined inside the function
▪ The part of a program in which a variable is defined is called its scope
▪ Restricting the scope of variables ensures that a function call will not modify a variable outside the function, except by assigning the returned value.
▪ test <- function(x){
    y <- x+1
    return(y)
  }
▪ #main
▪ test(1) #[1] 2
▪ y # Error: object 'y' not found
▪ y <- 10
▪ test(1) #[1] 2
▪ y #[1] 10

Scope & its Consequences
▪ Scope of a variable is not symmetric
▪ Variables defined inside a function cannot be seen outside, but variables defined outside the function can be seen inside the function, provided there is no variable with the same name defined inside the function.
▪ test2 <- function(x){
    y <- x+z
    return(y)
  }
▪ z <- 1
▪ test2(1) #[1] 2
▪ z <- 2
▪ test2(1) #[1] 3
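The proviso about a variable with the same name defined inside the function can be sketched as follows. The functions uses_outer and uses_local are hypothetical names for this example.

```r
z <- 1

# z is not defined inside, so R looks it up outside the function
uses_outer <- function(x) x + z

# A local z hides the outer one and leaves it unchanged
uses_local <- function(x) {
  z <- 100
  x + z
}

a <- uses_outer(1)   # uses the outer z
b <- uses_local(1)   # uses the local z
z                    # the outer z is still 1
```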

Arguments
▪ Arguments used in a function are named when the function is created
▪ Some arguments may be assigned default values, which are used in case the argument is not provided in the function call.
▪ test3 <- function(x=1){
    return(x)
  }
▪ test3(2) #[1] 2
▪ test3() #[1] 1
▪ Sometimes arguments have to be defined so that they can only take a small number of different values, and the function should stop informatively if an inappropriate value is passed.
▪ This can be done with an if statement, but R provides a method for this: include a vector of permissible values for any such argument and check them using the match.arg function
▪ funk <- function(words=c("Apple", "Bat", "Cat", "Dog")){
    words <- match.arg(words)
    return(words)
  }
▪ funk() #[1] "Apple"
▪ funk("Bat") #[1] "Bat"
▪ funk("Dum") # Error in match.arg(words) :
▪ 'arg' should be one of "Apple", "Bat", "Cat", "Dog"
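A slightly more practical sketch of match.arg, where the chosen value selects the computation. The function centre and its type argument are hypothetical.

```r
# Hypothetical function: 'type' may only be "mean" or "median"
centre <- function(x, type = c("mean", "median")) {
  type <- match.arg(type)   # with no value supplied, the first choice is used
  if (type == "mean") mean(x) else median(x)
}

m1 <- centre(c(1, 2, 10))             # default type, i.e. the mean
m2 <- centre(c(1, 2, 10), "median")   # 2

# An invalid choice stops with an error; tryCatch() captures its message
msg <- tryCatch(centre(c(1, 2, 10), "mode"),
                error = function(e) conditionMessage(e))
msg
```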
Arguments
▪ R provides a means for passing arguments unaltered from the function that is being called to the functions that are called within it.
▪ These arguments do not have to be named explicitly in the outer function
▪ Three dots (…), an ellipsis, act as a placeholder for any extra arguments
▪ R assigns arguments to variables from the left, unless an argument is named
▪ Naming an argument in the function call is good practice for better readability
▪ test4 <- function(x, ...){
    return(sd(x, ...))
  }
▪ test4(1:3) #[1] 1
▪ test4(c(1:2,NA)) # [1] NA
▪ test4(c(1:2,NA), na.rm=TRUE) # [1] 0.7071068
▪ test4(c(1:2,NA), TRUE) # [1] 0.7071068
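The same forwarding works with any inner function. In this sketch the wrapper centre2 is hypothetical; na.rm and trim are genuine arguments of mean().

```r
# Hypothetical wrapper: extra arguments are forwarded to mean() unchanged
centre2 <- function(x, ...) {
  mean(x, ...)
}

v <- c(1, 2, 3, NA, 100)
r1 <- centre2(v)                              # NA
r2 <- centre2(v, na.rm = TRUE)                # 26.5
r3 <- centre2(v, na.rm = TRUE, trim = 0.25)   # 2.5
```

Naming na.rm and trim in the call keeps it readable and does not depend on the order of the arguments of mean().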

Arguments
▪ R provides a means for partial matching of arguments, where doing so is not ambiguous
▪ Argument names in the function call do not have to be complete
▪ This can make the code more fragile, and this style is therefore not encouraged
▪ test6 <- function(a=1, b.c.d=1){
    return(a+b.c.d)
  }
▪ test6() #[1] 2
▪ test6(b=5) #[1] 6
▪ *Note
▪ seq.int(0, 1, len = 11)
▪ seq.int(0, 1, length.out = 11)
▪ ls(all = TRUE)
▪ ls(all.names = TRUE)
▪ Partial matching exists to save you typing long argument names.
▪ The danger with it is that functions may gain additional arguments later on which conflict with your partial match.
▪ This means that it is only suitable for interactive use. If you are writing code that will stick around for a long time (to go in a package, for example) then you should always write the full argument name.
▪ The other problem is that by abbreviating an argument name, you can make your code less readable.

Vector Based Programming using Functions
▪ Many R functions are vectorized: for a given vector input, the function acts on each element separately and returns a vector output.
▪ This enables R to have compact, efficient and readable code
▪ Applying a function to a vector is much faster than iteratively looping and applying the function on each element
▪ apply, sapply, lapply, tapply, mapply
▪ Example: sapply(X,FUN)
▪ The use of the above expression is to apply the function FUN to every element of vector X.
▪ X can be a list or an atomic vector (a vector that contains atomic objects like logical, integer, numeric, complex, character and raw)
▪ sapply(X,FUN) returns a vector whose i-th element is the value of the expression FUN(X[i])
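A minimal sketch contrasting a vectorised call, the equivalent explicit loop, and sapply() for a function that is not itself vectorised. The helper is_even_label is hypothetical.

```r
x <- c(1, 4, 9, 16)

# Vectorised call: one call handles every element
roots_vec <- sqrt(x)

# Equivalent explicit loop
roots_loop <- numeric(length(x))
for (i in seq_along(x)) {
  roots_loop[i] <- sqrt(x[i])
}

# sapply() applies a function that only handles one value at a time
is_even_label <- function(n) if (n %% 2 == 0) "even" else "odd"
labels <- sapply(x, is_even_label)
labels
```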

Vector Based Programming using Functions
▪ Example for sapply(): Density of Primes
▪ Write a function prime that tests if a given integer is prime or not
▪ Use sapply() to apply the prime checker function to the vector 2:n so that we know all primes less than or equal to n
▪ ρ(n) -> number of primes less than or equal to n
▪ Legendre and Gauss' assertion: $\lim_{n \to \infty} \dfrac{\rho(n)\,\log(n)}{n} = 1$
▪ Result proved by Hadamard and de la Vallée Poussin
▪ Cumulative sum function of a vector X -> cumsum(X)
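As a small illustration of how cumsum turns a logical vector of primality tests into the counting function ρ, the sketch below uses a hand-written vector for 2 to 7 (the names is.p, m and rho are introduced here).

```r
is.p <- c(TRUE, TRUE, FALSE, TRUE, FALSE, TRUE)   # primality of 2, 3, 4, 5, 6, 7
m    <- 2:7

rho <- cumsum(is.p)   # 1 2 2 3 3 4: number of primes up to each m
rho / m               # proportion of primes among 1..m
```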

Vector Based Programming using Functions
▪ rm(list=ls())
▪ prime <- function(n){
    if (n == 1){
      is.prime <- FALSE
    } else if (n == 2){
      is.prime <- TRUE
    } else {
      is.prime <- TRUE
      for (m in 2:(n/2)){
        if (n %% m == 0) is.prime <- FALSE
      }
    }
    return(is.prime)
  }
▪ n <- 1000
▪ m.vec <- 2:n
▪ primes <- sapply(m.vec, prime)
▪ num.primes <- cumsum(primes)
▪ #print(num.primes)
▪ par(mfrow = c(1,2), las=1)
▪ plot(m.vec, num.primes/m.vec, type="l", main="prime density", xlab="n", ylab="")
▪ lines(m.vec, 1/log(m.vec), col="red")
▪ plot(m.vec, num.primes/m.vec*log(m.vec), type="l", main="prime density * log(n)", xlab="n", ylab="")

Vector Based Programming using Functions
▪ Optimised Code for Prime Density
▪ Check for factors only up to $\sqrt{n}$: if n = ab, at least one of a and b is less than or equal to $\sqrt{n}$
▪ Once we find one factor we don't need to keep checking
▪ prime <- function(n){
    if (n == 1){
      is.prime <- FALSE
    } else if (n == 2){
      is.prime <- TRUE
    } else {
      is.prime <- TRUE
      m <- 2
      m.max <- sqrt(n)
      while (is.prime && m <= m.max){
        if (n %% m == 0) is.prime <- FALSE
        m <- m+1
      }
    }
    return(is.prime)
  }
Recursive Programming
▪ A programming technique made possible by functions, where a function calls itself.
▪ Example: n factorial -> n! = n*((n-1)!)
▪ When a function is called, a new copy of the function is created with a new set of function variables in a new environment
▪ Recursion is therefore elegant but not efficient

▪ nfact2 <- function(n){
    if (n == 1){
      cat("called nfact2(1)\n")
      return(1)
    } else {
      cat("called nfact2(", n, ")\n", sep="")
      return(n*nfact2(n-1))
    }
  }
▪ nfact2(6)

Recursive Programming
▪ Example: Sieve of Eratosthenes, finding all of the primes less than or equal to a given number n
1. Start with the list 2, 3, …, n and largest known prime p = 2
2. Remove from the list all elements that are multiples of p (but keep p itself)
3. Increase p to the smallest element of the remaining list that is larger than the current p.
4. If p is larger than $\sqrt{n}$ then stop, otherwise go back to step 2
▪ primesieve <- function(sieved, unsieved){
    p <- unsieved[1]
    n <- unsieved[length(unsieved)]
    if (p^2 > n){
      return(c(sieved, unsieved))
    } else {
      unsieved <- unsieved[unsieved %% p != 0]
      sieved <- c(sieved, p)
      return(primesieve(sieved, unsieved))
    }
  }
▪ primesieve(c(), 2:200)

Sieve of Eratosthenes, 276 B.C.
Debugging Functions
▪ Unexpected inputs can lead to undesirable consequences and the user may not know why
▪ Functions can work, but may return plausible nonsense.
▪ Perform simple checks of the input to ensure it conforms to expectations
▪ The stop("Your message here.") function is useful for this. It ceases processing and prints the message to the user.
▪ The browser() function is useful to invoke inside your own functions. It temporarily stops the program and allows inspection of objects
▪ You can step through the code executing one instruction at a time.
▪ In the browser environment, R commands can be entered and evaluated normally, but some commands have specific new interpretations.
▪ n – evaluates the current step and prints the next step to be evaluated. The Return key has the same effect
▪ c – leaves the browser and continues evaluation from the next statement to the end of the current set of expressions, whether that be the end of the current loop or the end of the function. cont has the same effect
▪ Q – stops evaluation and exits the browser, returning the user to the top-level prompt.
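A minimal sketch of an input check with stop(). The function safe_sqrt is hypothetical, and the browser() call is left commented out so that the example runs without pausing.

```r
# Hypothetical function with simple input checks
safe_sqrt <- function(x) {
  if (!is.numeric(x)) stop("x must be numeric.")
  if (any(x < 0)) stop("x must not contain negative values.")
  # browser()   # uncomment to pause here and inspect x interactively
  sqrt(x)
}

ok <- safe_sqrt(c(4, 9))   # 2 3

# tryCatch() captures the message that stop() would print
msg <- tryCatch(safe_sqrt(-1), error = function(e) conditionMessage(e))
msg
```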
