
What is R?

• R is a programming language and software environment for statistical analysis and graphical representation.
• R is a dialect of the S language.
• R was created by Ross Ihaka and Robert
Gentleman at the University of Auckland, New
Zealand, and is currently developed by the R
Development Core Team.
History of R
• S is a language that was developed by John Chambers and others at the old Bell Telephone Laboratories, originally part of AT&T Corp., in 1976.
• In 2004, Insightful purchased the S language from Lucent for $2 million. Insightful sold S under the product name S-PLUS and built a number of additional features into it.
• In 1991, Ross Ihaka and Robert Gentleman developed R
as a free software environment for their teaching
classes when they were colleagues at the University of
Auckland in New Zealand.
• In addition, many other people have contributed new
code and bug fixes to the project.
• Early 1990s: The development of R began.
• August 1993: The software was announced on the
S-news mailing list.
• June 1995: The code was made available under
the Free Software Foundation’s GNU General
Public License (GPL), Version 2.
• February 2000: The first version of R, version 1.0.0,
was released.
• October 2004: Release of R version 2.0.0.
• April 2013: Release of R version 3.0.0.
• April 2015: Release of R version 3.2.0.
• December 2019: Release of R version 3.6.2.
Importance of R
• R is free, open-source code
• R runs anywhere
• R supports extensions
• R provides an engaged community
• R connects with other languages
• R runs code without a compiler (it is interpreted)
• R is a comprehensive statistical analysis kit
• R has strong charting and graphics capabilities
Limitations of R
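• Package quality varies: many packages are community-contributed, and not all are equally well documented or maintained.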
• R commands give little thought to memory
management, and so R can consume all
available memory.
• Memory management, speed, and efficiency
are probably the biggest challenges R faces.
• R isn't just for advanced programmers, but beginners without programming experience may find its learning curve steep.
R Resources
• https://cran.r-project.org
Videos
• RStudio Learning Resources -- the makers of RStudio offer a series of online multimedia materials (video, documents, code examples, etc.) to help learn R, from a beginner-level introduction to the language to more advanced applications of R.
• RStudio Primers -- a series of interactive tutorials (with video, written materials, code examples, etc.) covering a range of topics from R basics to using R for data analysis or for visualization.
• RStudio Webinars -- upcoming and archived (recorded) webinars on a range of R topics.
• RStudio Essentials -- shorter video tutorials on a bevy of core R topics, from debugging to parallel programming in R.
Books
• R Cookbook -- link is to the free HTML version of the 2nd edition. An alternate online version from O'Reilly books is available from Princeton University Library for free to those with an active NetID.
• R for Data Science -- a free online version of the popular O'Reilly book by Hadley Wickham.
• Advanced R -- another popular book by Hadley Wickham. Link is to a free online version of the second
edition.
• The R Inferno -- conversationally written introduction to R. Available both as a free online PDF and in
print format.
• An Introduction to R (2020) -- a comprehensive textbook for beginners and reference for more advanced
users. An online PDF book from the Comprehensive R Archive Network (CRAN) that is regularly updated.

Web pages / written online tutorials


• R resources for every level -- a meta resource with links to additional learning resources for R programming, IDEs, data analysis and visualization with R, etc.
• Impatient R -- a detailed no-nonsense tutorial for R beginners, from the author of "The R Inferno".
• R Tutorials -- a compendium of bite-sized tutorials on different R topics from the Association for
Computing Machinery (ACM). New posts appear (and are archived) fairly regularly.
• Data Analysis with R -- links to slides and exercise materials from a 2012 workshop on R run by the Oak
Ridge Leadership Computing Facility (OLCF)
R for Windows
• Download and install R:
• Go to the R website (https://www.r-project.org/).
• Click on the "Download R" link.
• Choose your operating system (e.g., Windows, Mac, Linux).
• Download the appropriate version and follow the installation instructions.
R console
RStudio
• RStudio is an integrated development
environment for R.
• It includes a console, syntax-highlighting
editor that supports direct code execution, as
well as tools for plotting, history, debugging
and workspace management.
Syntax of R Programming
R Command Prompt:
• After installing R, we can start the R command prompt by typing R at the command line (cmd on Windows).

R Script File:
• We write R code in a plain text file and save it with the .R extension.
Comments

• In R programming, comments are programmer-readable explanations in the source code of an R program.
• The purpose of adding these comments is to make the source code easier to understand.
• These comments are ignored by compilers and interpreters (in R's case, the interpreter).
Comments are generally used for the
following purposes:
• Code Readability
• Explanation of the code or Metadata of the
project
• Prevent execution of code
• To include resources
Types of Comments

• There are generally three types of comments supported by programming languages, namely:
• Single-line Comments- Comment that only
needs one line
• Multi-line Comments- Comment that requires
more than one line.
• Documentation Comments- Comments that
are drafted usually for a quick documentation
lookup
Comments in R
• A single-line comment starts with #; everything from # to the end of the line is ignored, as follows:
• #Test program in R.
• R does not support multi-line comments or documentation comments as C or Python do.
• Note: It only supports single-line comments, written with a '#' symbol.
Syntax:
# single line comment statement

• A multiple-line comment is written as a series of single-line comments:
# This is a multiple-line comment
# Each line starts with the '#' symbol
# The following code will be executed

# Trick for a multi-line comment: the string inside if(FALSE) is never run
if (FALSE) {
  "R is an interpreted computer programming language which was created by
  Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand"
}
R Variables

• Variables are containers for storing data values.
• R does not have a command for declaring a variable.
• A variable is created the moment you first assign a value to it.
• To assign a value to a variable, use the <- sign. To output (or print) the variable value, just type the variable name:
name <- "John"
name
Variables
• A variable provides us with named storage that our programs can manipulate.
• A variable in R can store a vector, a group of vectors or a combination of many R objects.
• A valid variable name consists of letters, numbers and the dot or underscore characters.
• The variable name starts with a letter, or with a dot not followed by a number.
Rules for Variable Names
• A variable can have a short name (like x and y) or
a more descriptive name (age, carname,
total_volume).
• A variable name must start with a letter and can
be a combination of letters, digits, period(.)
and underscore(_). If it starts with period(.), it
cannot be followed by a digit.
• A variable name cannot start with a number or
underscore (_)
• Variable names are case-sensitive (age, Age and
AGE are three different variables)
• Reserved words cannot be used as variables
(TRUE, FALSE, NULL, if...)
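To check these rules in practice, base R's make.names() can help: it returns a name unchanged when it is already syntactically valid and repairs it otherwise. The sketch below (example names chosen arbitrarily) uses that as a quick legality test and also shows case sensitivity:

```r
# make.names() leaves valid names unchanged; otherwise it prepends "X"
# (for names starting with a digit, an underscore, or a dot followed by
# a digit) and appends "." to reserved words.
candidates <- c("age", "total_volume", ".date", "1abc", "_x", ".2x", "if")
valid <- make.names(candidates)
print(valid)

# A name is legal exactly when make.names() leaves it unchanged
is_legal <- candidates == valid
print(is_legal)

# Names are case-sensitive: these are three separate variables
age <- 20
Age <- 30
AGE <- 40
print(c(age, Age, AGE))
```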
Variable Assignment
• R supports three ways of variable assignment:
• Using the equal operator (=): data is copied from right to left.
• Using the leftward operator (<-): data is copied from right to left.
• Using the rightward operator (->): data is copied from left to right.
• However, <- is preferred in most cases because = is not treated as assignment in some contexts in R; inside a function call, for example, = only matches an argument by name.
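One context where = and <- differ is inside a function call. The sketch below is illustrative only; square(), value and side are hypothetical names:

```r
square <- function(value) value^2   # hypothetical helper for this demo

# Inside a call, `=` only matches the argument `value` by name;
# no variable called `value` is created in the workspace.
square(value = 4)

# `<-` inside a call is a real assignment: `side` is created in the
# workspace and its value (4) is passed as the argument. R evaluates
# arguments lazily, but square() uses its argument, so the assignment runs.
square(side <- 4)
print(side)
```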
R Variables Syntax
Types of Variable Creation in R:
• Using equal to operators
variable_name = value
• using leftward operator
variable_name <- value
• using rightward operator
value -> variable_name
# R program to illustrate
# Initialization of variables

# using equal to operator
var1 = "hello"
print(var1)

# using leftward operator
var2 <- "hello"
print(var2)

# using rightward operator
"hello" -> var3
print(var3)
Program 1:
#WAP in R to print Hello world using print()
a<-"Hello World"
print(a)

Program 2:
#WAP in R to print name and age of student without using print()
name <- "John"
age <- 40
name # output "John"
age # output 40
Important Methods for R Variables

• class() function
• ls() function
• rm() function
• print()
• paste() and paste0()
• cat()
class() function
• This built-in function is used to determine the data type (class) of the variable provided to it. The R variable to be checked is passed to it as an argument, and the function returns its class.
Syntax:
class(variable)
Example:
var1 = "hello"
print(class(var1))
ls() function
• This built-in function is used to list all the variables present in the workspace. This is generally helpful when dealing with a large number of variables at once and helps prevent overwriting any of them.
Syntax
ls()

Example
# using equal to operator
var1 = "hello"
# using leftward operator
var2 <- "hello"
# using rightward operator
"hello" -> var3
print(ls())
rm() function
• This is again a built-in function, used to delete an unwanted variable from your workspace. This helps free the memory allocated to variables that are no longer in use, creating more space for others. The name of the variable to be deleted is passed as an argument to it.
Syntax :
rm(variable)
Example:
rm(var3)
print(var3) # throws an error: var3 no longer exists
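Because print(var3) fails once var3 has been removed, base R's exists() offers a way to check whether a variable is still present without triggering an error. A small sketch with illustrative variable names:

```r
var1 <- "hello"
var2 <- "world"

rm(var1)                 # remove a single variable
print(exists("var1"))    # FALSE: var1 is gone
print(exists("var2"))    # TRUE: var2 is untouched

# rm() also accepts a character vector of names through its `list`
# argument; rm(list = ls()) is a common idiom for clearing the workspace.
rm(list = c("var2"))
print(exists("var2"))    # FALSE
```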
Auto-Print / Output Variables

• If an R program is typed into the console line by line, the output is printed automatically; there is no need to call a print function.
• You can just type the name of the variable:
Example
name <- "John Doe"
name # auto-print the value of the name variable
print()
• print() - the print function is used for printing the output of any object in R.
• Syntax: print("any string") or, print(variable)
Example:
name <- "John Doe"
print(name)
Or
print(" John Doe ")

• There are times you must use the print() function to output code, for example when working with for loops.
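A small illustration of the for-loop case (the loop itself is arbitrary): values typed at the top level are auto-printed, but values computed inside a loop are not.

```r
for (i in 1:3) {
  i * 10           # evaluated, but nothing is shown inside a loop
  print(i * 10)    # explicitly printed: [1] 10, then [1] 20, then [1] 30
}
```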
Concatenate Elements
• You can also concatenate, or join, two or more elements by using the paste() and cat() functions.
• To combine text and a variable, separate them with a comma (,) inside the function call.
paste() and paste0() :
• These functions concatenate the input values into a single character string.
• R provides the paste() method to produce output that combines strings and variables. It can be used inside the print() function or on its own.
• paste() converts its arguments to character strings.
• The difference between paste and paste0 is that paste() takes a separator argument, sep (a single space by default), whereas paste0() joins its inputs with no separator.
Syntax:
paste(string1, string2, sep = " ")
print(paste(variable, "any string"))
paste0(string1, string2)
Example:
text <- "awesome"
paste("R is", text)

text1 <- "R is"
text2 <- "awesome"
paste(text1, text2, sep = "/")
paste0(text1, text2)
# using paste inside print()
print(paste(text, "language"))
string1 <- " R programming "
string2 <- "is domain specific language"

# Using paste() method
answer <- paste(string1, string2, sep = " ")
print(answer)
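The differences between paste() and paste0() can be checked directly. This sketch reuses the text1/text2 values from above and also shows paste()'s collapse argument, which joins a vector into one string; the ids vector is purely illustrative:

```r
text1 <- "R is"
text2 <- "awesome"

a <- paste(text1, text2)              # default sep = " "
b <- paste(text1, text2, sep = "/")   # custom separator
d <- paste0(text1, text2)             # no separator at all
print(c(a, b, d))

# Both functions are vectorised: the prefix is recycled over 1:3,
# and `collapse` joins the resulting vector into a single string.
ids <- paste0("id_", 1:3)
print(ids)
print(paste(ids, collapse = ", "))
```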
cat()
• The cat() function combines multiple items into a continuous print output.
• Another way to print output in R is to use the cat() function.
• cat() converts its arguments to character strings. This is useful for printing output in user-defined functions.
Syntax:
• cat("any string") or, cat("any string", variable)
Syntax:
cat(..., file = "", sep = " ", fill = FALSE, labels = NULL, append = FALSE)
Parameters:
...: the strings or variables to print (atomic vectors, names, NULL and other zero-length objects)
file: the file (or connection) to print to; "" (the default) prints to the console
sep: the separator placed between items
fill: if FALSE (the default), only newlines written explicitly (e.g. "\n") are printed; if TRUE, the output is broken into lines of the console width
labels: labels for the beginning of lines (ignored when fill is FALSE)
append: if TRUE, output is appended to file instead of overwriting it

Example:
x = "R language"
cat(x, "is best\n")
x <- 1:9
cat(x, sep = " + ")
cat("\n")
# print normal string
cat("This is R language")
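The file and append parameters can be seen in action by writing to a temporary file. In this sketch, tempfile() picks a throwaway path, and readLines() reads the lines back:

```r
# sep is placed between items; cat() adds no trailing newline by itself
cat(1:3, sep = " + ")
cat("\n")

# Write to a file instead of the console
out_file <- tempfile(fileext = ".txt")
cat("first line\n", file = out_file)
cat("second line\n", file = out_file, append = TRUE)   # add, don't overwrite
file_lines <- readLines(out_file)
print(file_lines)
```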
The Difference Between cat() and paste() in R
• The cat() function will output the concatenated string to the console, but it won't store the results in a variable.
• The paste() function returns the concatenated string, so the result can be stored in a character variable; when the result is not assigned, the console prints it automatically.
• EXAMPLE 1 :
#concatenate several strings together using cat()
results <- cat("hey", "there", "everyone")
#attempt to view concatenated string
results

NULL #output of program

Note: cat() prints hey there everyone to the console and returns NULL, so nothing useful is stored in results.

EXAMPLE 2:
results <- paste("hey", "there", "everyone")
#view concatenated string
results
[1] "hey there everyone" #output of program
Finding Variables
• To list all the variables currently available in the workspace, we use the ls() function.
• The ls() function can also use patterns to match the variable names.
• print(ls())
• print(ls(pattern = "var"))
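Since pattern is treated as a regular expression, an anchor such as ^ restricts matches to the start of a name. A sketch with illustrative variables:

```r
var1 <- 1
var2 <- 2
total <- 3

# "^var" matches only names that start with "var", so total is excluded
matches <- ls(pattern = "^var")
print(matches)
```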
Deleting Variables
• Variables can be deleted by using the rm() function.
• Below we delete the variable var3. Printing the variable afterwards throws an error.
• rm(var3)
• print(var3)
Multiple Variables
• R allows you to assign the same value to multiple variables in one line:
# Assign the same value to multiple variables in one line
var1 <- var2 <- var3 <- "Orange"

# Print variable values
var1
var2
var3
Scope of Variables in R programming
• The location where we can find a variable, and access it if required, is called the scope of the variable.
• There are mainly two types of variable scopes:
• Global Variables
• Local Variables
Global Variables
Global variables are those variables that exist throughout the execution of a program. They can be accessed from any part of the program and changed at the top level (an ordinary assignment inside a function creates a local variable instead).
• As the name suggests, Global Variables can be accessed from any part of the program.
• They are available throughout the lifetime of a program.
• They are declared anywhere in the program outside all of the functions or blocks.

Declaring global variables
• Global variables are usually declared outside of all of the functions and blocks. They can be accessed from any portion of the program.
# R program to illustrate usage of global variables

# global variable
global = 5

# global variable accessed from within a function
display = function(){
print(global)
}
display() # prints [1] 5

# changing value of global variable
global = 10
display() # prints [1] 10
Local Variables
• Local variables are those variables that exist
only within a certain part of a program like a
function and are released when the function
call ends. Local variables do not exist outside
the block in which they are declared, i.e. they
can not be accessed or used outside that
block.

Declaring local variables
• Local variables are declared inside a block.
# R program to illustrate
# usage of local variables

func = function(){
# this variable is local to the
# function func() and cannot be
# accessed outside this function
age = 18
print(age)
}

cat("Age is:\n")
func()
Difference between local and global variables in R
• Scope: A global variable is defined outside of any function and may be accessed from anywhere in the program, as opposed to a local variable.
• Lifetime: A local variable’s lifetime is constrained by the function in which it is defined. The local variable is destroyed once the function has finished running. A global variable, on the other hand, doesn’t leave memory until the program is finished running or the variable is explicitly deleted.
• Naming conflicts: Because a global variable can be accessed from anywhere in the program, naming conflicts may occur if the same variable name is used in different portions of the program. Local variables, by contrast, apply only to the function in which they are defined, reducing the likelihood of naming conflicts.
• Memory usage: Because global variables are kept in memory throughout program execution, they can use more memory than local variables. Local variables, on the other hand, are created and destroyed only when necessary, so they normally use less memory.
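The naming-conflict point can be seen in a short sketch (counter and try_change() are hypothetical names): assigning to a global variable's name inside a function creates a new local variable rather than changing the global one.

```r
counter <- 0   # global variable

try_change <- function() {
  counter <- 100   # creates a new LOCAL variable that shadows the global one
  counter          # the function sees its local copy
}

print(try_change())  # 100
print(counter)       # still 0: the global variable was not modified
```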
Data Type
• R Data types are used in computer programming to specify
the kind of data that can be stored in a variable. For
effective memory consumption and precise computation,
the right data type must be selected. Each R data type has
its own set of regulations and restrictions.
• In R, a variable is not declared with a data type; instead, it takes the data type of the R object assigned to it.
• So R is called a dynamically typed language, which means that the data type of the same variable can change again and again while a program runs.
• var_x <- "Hello"
• class(var_x) # "character"
• var_x <- 34.5 # class(var_x) is now "numeric"
• var_x <- 27L # class(var_x) is now "integer"
Basic Data Types
• Numeric- Decimal values are called numeric in R.
• Integer- Whole numbers stored as integers, written with an L suffix (e.g. 5L).
• Complex- A complex value in R is written using the pure imaginary unit i. A complex number will be in the form of a+bi.
• Logical- There are two logical values, TRUE and FALSE.
• Character- used to represent character (string) values in R.
1. Numeric: Decimal value is called numeric in R, and it is the
default computational data type.
• Set of all real numbers
Ex: numeric_value <- 3.14

2. Integer: In R, there are two ways to create an integer variable. The first is to invoke the as.integer() function.
• > int <- as.integer(3) #makes int an integer with value 3
• > class(int) # "integer"
• The second is to append L to a whole number. Here, L tells R to store the value as an integer.
• Set of all integers, 3L, 66L, 2346L
Ex: integer_value <- 42L
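A quick way to see the difference between the two types is to compare class(), typeof() and is.integer() on the same number with and without the L suffix. A minimal sketch:

```r
a <- 42                 # numeric (stored as a double)
b <- 42L                # integer, because of the L suffix
d <- as.integer(3.9)    # as.integer() drops the decimal part

print(class(a))    # "numeric"
print(typeof(a))   # "double"
print(class(b))    # "integer"
print(d)           # 3
print(is.integer(a))
print(is.integer(b))
```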

3. Logical: It is a special data type for data with only two possible
values which can be construed as true/false.
• TRUE and FALSE
Ex: logical_value <- TRUE
4. Complex: A complex value in R has a real and an imaginary part, written using the pure imaginary unit i.
• Set of complex numbers
Ex: complex_value <- 1 + 2i

5. Character: In R programming, a character value is used to represent string values. We convert objects into character values with the help of the as.character() function.
• "a", "b", "c", …, "@", "#", "$", …, "1", "2", … etc.
Ex: character_value <- "Hello"

6. Raw: A raw data type is used to hold raw bytes. To save and work with data at the byte level in R, use the raw data type.
• as.raw()
Ex: single_raw <- as.raw(255) #raw ff

# Create a raw vector
x <- as.raw(c(0x1, 0x2, 0x3, 0x4, 0x5))
print(x)
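Raw values can be converted back to integers, and base R's charToRaw() and rawToChar() convert between text and its bytes. A minimal sketch:

```r
x <- as.raw(c(1, 2, 255))
print(x)                   # 01 02 ff
print(as.integer(x))       # back to integers: 1 2 255

# charToRaw()/rawToChar() convert between text and its bytes
bytes <- charToRaw("R")
print(bytes)               # 52, the hexadecimal code for "R"
print(rawToChar(bytes))    # "R"
```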
• It’s difficult to generalize how much memory
is used by data types in R, but on most 64 bit
systems today, integers are 32 bits (4 bytes)
and double-precision floating point numbers
(numerics in R) are 64 bits (8 bytes).
Furthermore, character data are usually 1 byte
per character.
Determine Memory Usage of Data Objects in R

• object.size()
• memory.profile()

Ex:
• # numeric value
• a = 11
• print(object.size(a))
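To see the 4-byte versus 8-byte difference described above, one can compare the size of an integer vector and a numeric vector of the same length. This sketch avoids quoting exact byte counts, since the totals also include a small object header and may vary between builds:

```r
int_vec <- integer(1000)   # 1000 integers (4 bytes each)
dbl_vec <- numeric(1000)   # 1000 doubles (8 bytes each)

print(object.size(int_vec))
print(object.size(dbl_vec))

# format() with units gives a human-readable size
print(format(object.size(dbl_vec), units = "Kb"))
```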
Find data type of an object in R
• To find the data type of an object, use the class() function.
• Pass the object as an argument to class(), and it returns the object's class.
Syntax
• class(object)
Program 3:
#WAP to determine data type of R object using class(), typeof()
# Assign a decimal value to x
x = 5.6
# print the class name of variable
class(x)
# print the internal storage type of x
typeof(x)
# Declare an integer by appending an L suffix.
y = 5L
# print the class name of y
print(class(y))
# Assign a character value to char
char = "Gs"
# print the class name of char
print(class(char))

# complex
x <- 9i + 3
class(x)
# Sample values
x=4
y=3

# Comparing two values
z=x>y

# print the logical value
print(z)

# print the class name of z
class(z)
#WAP to add two numbers
a<-10
b<-20
c<-a+b
print(c) # for print output
paste("sum=",a,b,c, sep=',') #concatenate different inputs
cat("\n")
cat(a,b,c, sep='/')
Data Type Conversion
• We can convert one data type to another data
type as in any programming language.
• We can convert any basic data type to
numeric using the function as.numeric().
• Similarly as.integer() converts to integer,
as.character() converts to character,
as.logical() converts to logical and
as.complex() converts to complex data types.
Type Conversion
• The process of altering the data type of an object to
another type is referred to as Type casting or data type
conversion.
• Note: Not all conversions are possible; an impossible conversion returns an NA value (with a warning).

Syntax
as.data_type(object)
Example:
• as.numeric()
• as.integer()
• as.complex()
• as.character()
• as.logical()
# A simple R program
# convert data type of an object to another

# Logical
print(as.numeric(TRUE))

# Integer
print(as.complex(3L))

# Numeric
print(as.logical(10.5))

# Complex
print(as.character(1+2i))

# Not possible: returns NA with a warning
print(as.numeric("12-04-2020"))

#possible
print(as.numeric("12"))
x <- 1L # integer
y <- 2 # numeric
z <- "hello"
z <- as.numeric(z) # NA, with a warning: "hello" is not a number
# convert from integer to numeric:
a <- as.numeric(x)
# convert from numeric to integer:
b <- as.integer(y)
# print values of x, y and z
x
y
z
# print the class name of a and b
class(a)
class(b)
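The rules behind these conversions can be summarised in a few checks. A minimal sketch (the input values are arbitrary):

```r
print(as.numeric(TRUE))      # 1
print(as.logical(0))         # FALSE: only 0 converts to FALSE
print(as.logical(10.5))      # TRUE: any non-zero number is TRUE
print(as.integer(7.9))       # 7: the decimal part is dropped
print(as.numeric("12"))      # 12

# A failed conversion gives NA with a warning; suppressWarnings() hides it
bad <- suppressWarnings(as.numeric("hello"))
print(is.na(bad))            # TRUE
```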
Keywords/Reserved Words
• Reserved words in R programming are a set of
words that have a special meaning and cannot
be used as an identifier.
• The list of reserved words can be viewed by
typing ?reserved or help(reserved) at the R
command prompt.
Identifiers
• The unique name given to a variable, function or other object is known as an identifier.
• Following are the rules for naming an identifier.
1. Identifiers can be a combination of letters,
digits, period(.) and underscore.
2. It must start with a letter or a period. If it starts
with a period, it can not be followed by a digit.
3. Reserved word in R can not be used as identifier.
Ex. Total1,sum,.date.of.birth,Sum_of_two etc.
Constants
• Constants, or literals, are entities whose value cannot be altered. Basic types of constants are numeric constants and character constants.
• Numeric constants: all numbers fall under this category. They can be of type integer, double or complex.
• There are also built-in constants, such as pi, LETTERS, letters, month.name and month.abb. But it is not good to rely on these, as they are implemented as variables whose values can be changed.
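A short sketch of both kinds of constants, showing why relying on built-in constants is risky: assigning to pi creates a workspace variable that hides the built-in value until it is removed.

```r
print(typeof(5L))      # "integer"
print(typeof(5))       # "double"
print(typeof(2i))      # "complex"
print(month.abb[1:3])  # "Jan" "Feb" "Mar"

pi <- 3        # a workspace variable now masks the built-in pi
print(pi)      # 3
rm(pi)         # remove the workspace copy ...
print(pi)      # ... and the built-in value 3.141593 is visible again
```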
Taking Input from User in R Programming
• It's also possible to take input from the user. For doing so, there are two methods in R:
• Using readline() method
• Using scan() method
Using readline() method
• In R, the readline() method takes input in string format.
• To convert the inputted value to the desired data type, R provides functions such as:
• as.integer(n) —> convert to integer
• as.numeric(n) —> convert to numeric type (decimal numbers, stored as double)
• as.complex(n) —> convert to complex number (i.e. 3+2i)
• as.Date(n) —> convert to date …, etc
• Syntax:
var = readline(prompt = "message");
var = as.integer(var);
Note that one can use "<-" instead of "="
Parameters:
• prompt: the message shown in the console to tell the user what to input. The prompt is optional; readline() with no argument shows an empty prompt.

Syntax:
var1 = readline(prompt = "Enter any number : ");
or,
var1 = readline("Enter any number : ");
# R program to illustrate taking input from the user

# taking input with showing the message
a = readline(prompt = "Enter any number : ")
# convert the inputted value to a numeric
a = as.numeric(a)

# print the value
print(a)
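Because as.numeric() silently returns NA for input that is not a number, it can help to wrap the conversion in a check. The helper to_number() below is hypothetical, not a base R function; the readline() call is guarded with interactive() because readline() does not wait for input when a script runs non-interactively:

```r
# Hypothetical helper: convert typed text to a number, failing loudly
# instead of silently returning NA.
to_number <- function(text) {
  value <- suppressWarnings(as.numeric(text))
  if (is.na(value)) stop("not a number: '", text, "'")
  value
}

# Only prompt when a user is actually present at the console
if (interactive()) {
  a <- to_number(readline(prompt = "Enter any number : "))
  print(a)
}
```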
Taking multiple inputs in R
• Taking multiple inputs in R is the same as taking a single input: just call readline() once for each input. One can wrap the readline() calls in braces { } so that they run together.

Syntax:
{
var1 = readline("Enter 1st number : ");
var2 = readline("Enter 2nd number : ");
var3 = readline("Enter 3rd number : ");
var4 = readline("Enter 4th number : ");
}
# R program to illustrate taking multiple inputs from the user using braces
{
var1 = readline("Enter 1st number : ");
var2 = readline("Enter 2nd number : ");
var3 = readline("Enter 3rd number : ");
var4 = readline("Enter 4th number : ");
}

# converting each value
var1 = as.numeric(var1);
var2 = as.numeric(var2);
var3 = as.numeric(var3);
var4 = as.numeric(var4);

# print the sum of the 4 numbers
print(var1 + var2 + var3 + var4)
Taking String and Character input in R
Syntax:
string:
var1 = readline(prompt = "Enter your name : ");
character:
var1 = readline(prompt = "Enter any character : ");
var1 = as.character(var1)
# R program to illustrate taking string input from the user
var1 = readline(prompt = "Enter your name : ");

# character input
var2 = readline(prompt = "Enter any character : ");
# convert to character (readline() already returns a string, so this step is optional)
var2 = as.character(var2)

# printing values
print(var1)
print(var2)
Using scan() method
• This method takes input from the console and reads data in the form of a vector or list; it can also read input from a file. It is very handy when inputs need to be taken quickly for a mathematical calculation or for a dataset.
• Syntax:
x = scan()
scan() keeps taking input until it receives an empty line, so press the Enter key twice on the console to terminate the input process.
Example 1:
# taking input from the user

x = scan()
# print the inputted values
print(x)
Example 2:
# double input using scan()
d = scan(what = double())
# string input using 'scan()'
s = scan(what = " ")
# character input using 'scan()'
t = scan(what = character())

# print the inputted values
print(d) # double
print(s) # string
print(t) # character
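scan() can also read from a string through its text argument, which is convenient for trying it out without typing at the console. A minimal sketch; quiet = TRUE suppresses the "Read N items" message:

```r
# `text` makes scan() read from a string instead of the console
nums <- scan(text = "3 4 5.5", quiet = TRUE)
print(nums)          # 3.0 4.0 5.5
print(sum(nums))     # 12.5

words <- scan(text = "apple banana cherry", what = character(), quiet = TRUE)
print(words)

# sep changes the field separator (whitespace by default)
csv_nums <- scan(text = "1,2,3", sep = ",", quiet = TRUE)
print(csv_nums)
```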
R Operators
• Operators are the symbols directing the interpreter to perform various kinds of operations between the operands.
• Operators simulate the various mathematical, logical, and decision operations performed on a set of complex numbers, integers, and numerics as input operands.
Types of the operator in R language

• Arithmetic Operators
• Logical Operators
• Relational Operators
• Assignment Operators
• Miscellaneous Operator
Arithmetic Operations
• Every statistical analysis involves a lot of
calculations, and calculation is what R is
designed for — the work that R does best.
1. Basic arithmetic operators-
• These operators are used in just about every
programming language.
#Arithmetic operators
x <- 5
y <- 16
x+y # 21
x-y # -11
x*y # 80
y/x # 3.2
y^x # 1048576
y%%x # 1 (remainder of 16 divided by 5)
2. USING MATHEMATICAL FUNCTIONS
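The original slide content for this item is not recoverable here; as a stand-in, this sketch lists a few commonly used base R mathematical functions (the selection is illustrative, not the slide's original list):

```r
x <- 16.7
print(sqrt(16))              # 4
print(abs(-5))               # 5
print(round(x))              # 17
print(floor(x))              # 16
print(ceiling(x))            # 17
print(exp(1))                # 2.718282
print(log(100, base = 10))   # 2
print(log10(1000))           # 3
```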
3. Relational Operators
x <- 5
y <- 16
x<y # TRUE
x>y # FALSE
x <= 5 # TRUE
y >= 20 # FALSE
y == 16 # TRUE
x != 5 # FALSE
4. Logical Operators
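A minimal sketch of R's logical operators, using arbitrary example vectors: & and | work element by element, while && and || compare single values.

```r
a <- c(TRUE, FALSE, TRUE)
b <- c(TRUE, TRUE, FALSE)

print(a & b)   # element-wise AND: TRUE FALSE FALSE
print(a | b)   # element-wise OR:  TRUE TRUE TRUE
print(!a)      # NOT:              FALSE TRUE FALSE

# && and || compare single values and are typically used in if() conditions
x <- 5
print(x > 0 && x < 10)   # TRUE
print(x < 0 || x == 5)   # TRUE
```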
5. Assignment operator-

6. Vector operations-
• Vector operations are functions that make
calculations on a complete vector, like sum().
• Each result depends on more than one value of the
vector.
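A minimal sketch of vector operations on an example vector: summary functions such as sum() use every element, and arithmetic applies element by element.

```r
v <- c(10, 20, 30, 40, 50)

print(sum(v))      # 150
print(mean(v))     # 30
print(cumsum(v))   # 10 30 60 100 150
print(range(v))    # 10 50

# Arithmetic is element-wise, so no loop is needed
print(v * 2)                  # 20 40 60 80 100
print(v + c(1, 2, 3, 4, 5))   # 11 22 33 44 55
```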
7. Matrix operations-
• These functions are used for operations and
calculations on matrices.
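A minimal sketch of basic matrix operations on a small example matrix; note that * multiplies element by element while %*% performs matrix multiplication:

```r
m <- matrix(1:4, nrow = 2)   # filled column by column: rows (1, 3) and (2, 4)
print(m)
print(t(m))      # transpose
print(m * m)     # element-wise product
print(m %*% m)   # matrix product: rows (7, 15) and (10, 22)
print(dim(m))    # 2 2
```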
Objects
• Unlike in many other programming languages, in R variables are assigned to objects rather than declared with data types.
• Instead of declaring data types, as done in C++ and Java, in R the user assigns objects to variables. The most popular R objects are:
• Vectors
• Factors
• Lists
• Data Frames
• Matrices
• The data type of the object in R becomes the
data type of the variable by definition.
• R's basic data types are character, numeric,
integer, complex, and logical.
Data Structure
• A data structure is a particular way of
organizing data in a computer so that it can be
used effectively.
• The idea is to reduce the space and time
complexities of different tasks.
• Data structures in R programming are tools
for holding multiple values.
• R’s base data structures are often organized by
their dimensionality (1D, 2D, or nD)
• The most essential data structures used in R
include:
• Vectors
• Lists
• Dataframes
• Matrices
• Arrays
• Factors
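Each of these structures can be created with a base R function. A minimal sketch (all values are arbitrary):

```r
v  <- c(1, 2, 3)                              # vector (1D, one type)
l  <- list(id = 1, label = "a")               # list (mixed types)
df <- data.frame(x = 1:2, y = c("a", "b"))    # data frame (table)
m  <- matrix(1:6, nrow = 2)                   # matrix (2D)
a  <- array(1:24, dim = c(2, 3, 4))           # array (n-D)
f  <- factor(c("low", "high", "low"))         # factor (categories)

print(class(v))
print(class(l))
print(class(df))
print(dim(m))
print(dim(a))
print(levels(f))   # "high" "low" (sorted alphabetically by default)
```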
vector
• A vector is the simplest type of data structure
in R.
• Vectors are single-dimensional,
homogeneous data structures. To create a
vector, use the c() function.
• A vector is a sequence of data elements of the
same basic type.
• If you want to check the variable type,
use class().
• Vectors in R are the same as the arrays in C
language which are used to hold multiple data
values of the same type.
• One major key point is that in R the indexing
of the vector will start from ‘1’ and not from
‘0’. We can create numeric vectors and
character vectors as well.
Types of vectors
• Vectors are of different types which are used in R.
Following are some of the types of vectors:
• There are 5 data types of the simplest object -
vector:
1. Logical
2. Numeric
3. Character
4. Raw
5. Complex
Numeric vectors
Numeric vectors are those which contain numeric values such as integers and decimal (double) values.

# R program to create numeric Vectors
# creation of vectors using c() function.
v1 <- c(4, 5, 6, 7)
# display type of vector
typeof(v1) # "double"

# by using 'L' we can specify that we want integer values.
v2 <- c(1L, 4L, 2L, 5L)
# display type of vector
typeof(v2) # "integer"
Character vectors
Character vectors contain alphanumeric values and special characters.
# R program to create Character Vectors
# numeric values mixed with character values
# are converted into characters
v1 <- c('geeks', '2', 'hello', 57)
# Displaying type of vector
typeof(v1) # "character"
Logical vectors
Logical vectors contain boolean values such as TRUE and FALSE, plus NA for missing values.
# R program to create Logical Vectors
# Creating logical vector
# using c() function
v1 <- c(TRUE, FALSE, TRUE, NA)
# Displaying type of vector
typeof(v1) # "logical"
• numeric_vec <- c(1,2,3,4,5)
• integer_vec <- c(1L,2L,3L,4L,5L)
• logical_vec <- c(TRUE, TRUE, FALSE, FALSE,
FALSE)
• complex_vec <- c(12+2i, 3i, 4+1i, 5+12i, 6i)
• character_vec <- c("techvidvan", "this", "is",
"a", "character vector")
• > numeric_vec
• > integer_vec
• > logical_vec
• > complex_vec
• > character_vec
How to Create Vector in R?
• There are different ways of creating vectors.
1. Generally, we use ‘c’ to combine different elements together.
• The c() function is used for creating a vector in R. It returns a one-dimensional vector.
• For example:
• x <- c(1,2,3,4)
• There are several other ways of creating a vector:
2. Using the colon (:) operator
• x <- 1:5
• A decreasing sequence:
• y <- 5:-5
3. Create R vector using seq() function
• There are also two ways in this. The first way is to set the step size and the second method is by setting the length of the vector.
1) Setting step size with the ‘by’ parameter:
seq(2,4, by = 0.4)
• [1] 2.0 2.4 2.8 3.2 3.6 4.0
2) Specifying length of vector with the ‘length.out’ parameter:
• seq(1,4, length.out = 5)
• [1] 1.00 1.75 2.50 3.25 4.00
EXAMPLE :
# R program to create Vectors
# we can use the c function to combine the values as a vector.
# By default the type will be double
X <- c(61, 4, 21, 67, 89, 2)
cat('using c function', X, '\n')
# seq() function for creating
# a sequence of continuous values.
# length.out defines the length of vector.
Y <- seq(1, 10, length.out = 5)
cat('using seq() function', Y, '\n')
# use':' to create a vector
# of continuous values.
Z <- 2:7
cat('using colon', Z)
How to Access Elements of R Vectors?

• With the help of vector indexing, we can access the elements of vectors. Indexing denotes the position where the values in a vector are stored.
1. Indexing with Integer Vector
• Unlike many programming languages like Python,
C++, Java etc. where the indexing starts from 0,
the indexing of vectors in R starts with 1.
• X <- c(2, 5, 18, 1, 12)
• cat('Using Subscript operator', X[2:4],
'\n')
2. Indexing with Character Vector
• Character vector indexing can be done as
follows:
• x <- c("One" = 1, "Two" = 2, "Three" = 3)
• x["Two"]
3. Indexing with Logical Vector
• In logical indexing, the elements whose corresponding value in the logical vector is TRUE are returned.
• a <- c(1,2,3,4)
• a[c(TRUE, FALSE, TRUE, FALSE)]
# R program to access elements of a Vector
# accessing elements with an index number.
X <- c(2, 5, 18, 1, 12)
cat('Using Subscript operator', X[2], '\n')
# by passing a range of values inside the vector index.
Y <- c(4, 8, 2, 1, 17)
cat('Using combine() function', Y[c(4, 1)], '\n')
# using logical expressions
Z <- c(5, 2, 1, 4, 4, 3)
cat('Using Logical indexing', Z[Z>4])

Output:
Using Subscript operator 5
Using combine() function 1 4
Using Logical indexing 5
Modifying a vector
• Modification of a Vector is the process of
applying some operation on an individual
element of a vector to change its value in the
vector. There are different ways through which
we can modify a vector:
# R program to modify elements of a Vector
# Creating a vector
X <- c(2, 7, 9, 7, 8, 2)
# modify a specific element
X[3] <- 1
X[2] <-9
cat('subscript operator', X, '\n')
# Modify using different logics.
X[X>5] <- 0
cat('Logical indexing', X, '\n')
# Modify by specifying
# the position or elements.
X <- X[c(3, 2, 1)]
cat('combine() function', X)
Output
subscript operator 2 9 1 7 8 2
Logical indexing 2 0 1 0 0 2
combine() function 1 0 2
fruits <- c("banana", "apple", "orange", "mango", "lemon")
# Access the first and third item (banana and orange)
fruits[c(1, 3)]
# Access all items except for the first item
fruits[c(-1)]
Output
[1] "banana" "orange"
[1] "apple"  "orange" "mango"  "lemon"
Change an Item
To change the value of a specific item, refer to the index
number:
Example
fruits <- c("banana", "apple", "orange", "mango", "lemon")
# Change "banana" to "pear"
fruits[1] <- "pear"
# Print fruits
fruits
• Deleting a vector
Deleting a vector removes all of its elements. This can be done by assigning NULL to it; rm() goes further and removes the variable itself from memory.
• Delete by rm()
# R program to delete a vector
# Creating a vector
M <- c(8, 10, 2, 5)
# set the vector to NULL
M <- NULL
cat('Output vector', M, '\n')
print(M)
rm(M) # delete the variable from memory
M     # M no longer exists, so this raises an error
Output:
Output vector
NULL
Error: object 'M' not found
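FUNCTIONS OF VECTOR
• v <- c(10, 20, 30, 40, 50)
• min(v)    # smallest element: 10
• max(v)    # largest element: 50
• sum(v)    # sum of all elements: 150
• prod(v)   # product of all elements
• sort(v)   # elements in ascending order: 10 20 30 40 50
• length(v) # number of elements: 5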
Sorting elements of a Vector
The sort() function sorts the values of a vector in ascending or descending order.
# R program to sort elements of a Vector
# Creation of Vector
X <- c(8, 2, 7, 1, 11, 2)
# Sort in ascending order
A <- sort(X)
cat('ascending order', A, '\n')
# sort in descending order
# by setting decreasing as TRUE
B <- sort(X, decreasing = TRUE)
cat('descending order', B)

Output:
ascending order 1 2 2 7 8 11
descending order 11 8 7 2 2 1
order()
Sorting by order()
X <- c(10, 40, 50, 20, 30)
X[order(X)]  # increasing order: 10 20 30 40 50
X[order(-X)] # decreasing order: 50 40 30 20 10
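A short sketch of why order() works inside [ ]: it returns the positions that would sort the vector, not the sorted values. The `students` vector is a made-up label vector used only for illustration.

```r
X <- c(10, 40, 50, 20, 30)

# order() returns indices, not values
idx <- order(X)
print(idx)        # [1] 1 4 5 2 3

# Indexing with those positions gives the sorted vector
print(X[idx])     # [1] 10 20 30 40 50

# Because order() returns positions, it can sort one vector by another
students <- c("A", "B", "C", "D", "E")   # hypothetical labels, one per value of X
print(students[order(X, decreasing = TRUE)])  # [1] "C" "B" "E" "D" "A"
```

This is the main reason to prefer order() over sort(): the same positions can be applied to any vector of the same length.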
Reverse the order
x <- c(10, 20, 30, 40)
rv <- rev(x)
print(rv) # [1] 40 30 20 10
seq() or Generating Sequenced Vectors
One of the examples above showed how to create a vector with numerical values in a sequence with the : operator:
Example
numbers<- 1:10
numbers
1. To make bigger or smaller steps in a sequence, use
the seq() function:
Example
numbers<- seq(from = 0, to = 100, by = 20)
numbers
Note: In this call, seq() takes three parameters: from is where the sequence starts, to is where it stops, and by is the interval between values. The result is [1] 0 20 40 60 80 100.
• To create a vector with numerical values in a
sequence, use the : operator:
Example
• # Vector with numerical values in a sequence
numbers <- 1:10
numbers
• You can also create numerical values with decimals in
a sequence, but note that if the last element does not
belong to the sequence, it is not used:
Example
• # Vector with numerical decimals in a sequence
numbers1 <- 1.5:6.5
numbers1

# Vector with numerical decimals in a sequence where
# the last element is not used
numbers2 <- 1.5:6.3
numbers2
Result:
• [1] 1.5 2.5 3.5 4.5 5.5 6.5
[1] 1.5 2.5 3.5 4.5 5.5
Vector Length
To find out how many items a vector has, use
the length() function:
Example
fruits<- c("banana", "apple", "orange")
length(fruits)
Output:
3
rep() or Repeat Vectors
• To repeat vectors, use the rep() function:
Example
1. Repeat each value:
• repeat_each <- rep(c(2, 4, 2), each = 2)
repeat_each
• output:
• [1] 2 2 4 4 2 2
2. Repeat the sequence of the vector:
repeat_times<- rep(c(0, 0, 7), times = 4)
repeat_times
Output:
• [1] 0 0 7 0 0 7 0 0 7 0 0 7

3. Repeat each value independently:
• repeat_independent <- rep(c(0, 7), times = c(4,3))
repeat_independent
Output:
[1] 0 0 0 0 7 7 7
any() Function

• The any() function takes one or more logical vectors, usually produced by testing a vector against a condition (for example, x > 5).
• It returns TRUE if at least one element of the logical vector is TRUE.
• any(…, na.rm=FALSE)
all() Function

• The all() function takes one or more logical vectors, usually produced by testing a vector against a condition.
• It returns TRUE if all the elements of the logical vector are TRUE, and FALSE if at least one element is FALSE.
• all(…, na.rm=FALSE)
> x <- 1:10
> any(x > 5)
[1] TRUE
> any(x > 88)
[1] FALSE
> all(x > 88)
[1] FALSE
> all(x > 0)
[1] TRUE
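The any(…, na.rm=FALSE) and all(…, na.rm=FALSE) signatures above include na.rm. A minimal sketch of how a missing value changes the result; the vectors v, w and u are illustrative.

```r
v <- c(NA, FALSE, TRUE)
w <- c(NA, FALSE)
u <- c(NA, TRUE)

any(v)                # TRUE: one TRUE is enough, whatever the NA is
any(w)                # NA: the missing value might be TRUE
any(w, na.rm = TRUE)  # FALSE once the NA is dropped
all(u)                # NA: the missing value might be FALSE
all(u, na.rm = TRUE)  # TRUE once the NA is dropped
```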
is.vector() Function and is.element()

• The is.vector() function takes an object as input and returns TRUE if the object is a vector, and FALSE otherwise.
• is.element(x, table) returns TRUE if x is present in the vector table, and FALSE otherwise.
numbers <- 1:10
is.vector(numbers)
is.element(4, numbers)
Output:
[1] TRUE
[1] TRUE
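A small sketch of is.element() with several values at once: it checks each element of its first argument, and gives the same result as the %in% operator.

```r
numbers <- 1:10

# each element of the first argument is looked up in numbers
is.element(c(4, 11), numbers)   # [1]  TRUE FALSE

# the %in% operator performs the same membership test
c(4, 11) %in% numbers           # [1]  TRUE FALSE
```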
as.vector() Function

• The as.vector() function takes an object as an argument and converts it into a vector.
• mat_to_vec <- matrix(1:9, nrow = 3, ncol = 3)
mat_to_vec # matrix creation
• mat_to_vec <- as.vector(mat_to_vec)
mat_to_vec # convert (elements are read column by column)
Output :
[1] 1 2 3 4 5 6 7 8 9
The lapply() Function

• The lapply() function takes a vector, list or data frame and a function.
The lapply() function applies the function to every element of the vector, list or data frame and returns the results as a list. For example:
• names <- c("JOHN","RICK","RAHUL","ABDUL")
lapply(names, tolower)
The sapply() Function

• The sapply() function is very similar to the lapply() function. The difference is that sapply() simplifies the result into a vector (or matrix) when possible, instead of returning a list. For example:
• Code:
• sapply(names, tolower)
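To see the difference between lapply() and sapply() concretely, this sketch checks the class of each result. The input is stored as names_vec (a name chosen here) so that it does not hide the built-in names() function, which the code then uses.

```r
names_vec <- c("JOHN", "RICK", "RAHUL", "ABDUL")

res_list <- lapply(names_vec, tolower)
class(res_list)      # "list"

res_vec <- sapply(names_vec, tolower)
class(res_vec)       # "character"
# for a character input, sapply() uses the input strings as names
names(res_vec)       # "JOHN" "RICK" "RAHUL" "ABDUL"

# USE.NAMES = FALSE returns a plain, unnamed vector
sapply(names_vec, tolower, USE.NAMES = FALSE)
```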
Vector recycling rule
• x <- c(2, 1, 3, 8)
• y <- c(1, 4)
• x + y # 3 5 4 12
• # the shorter vector y is recycled as 1, 4, 1, 4 to match the length of x
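A minimal sketch of the recycling rule, including the case where the longer length is not a multiple of the shorter one: R still recycles, but emits a warning.

```r
x <- c(2, 1, 3, 8)
y <- c(1, 4)
x + y            # [1]  3  5  4 12   (y is reused as 1, 4, 1, 4)

# a length-1 vector is recycled against every element
x * 10           # [1] 20 10 30 80

# 3 is not a multiple of 2: the result is 11 22 13,
# with the warning "longer object length is not a multiple of shorter object length"
z <- c(1, 2, 3) + c(10, 20)
```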
Operations in R Vector
1. Combining Vector in R
• The c() function is used not only to create a vector but also to combine vectors. Combining one or more vectors forms a new vector that contains all the elements of each vector. Let us see how c() combines vectors.

Example:
p <- c(1,2,4,5,7,8)
q <- c("shubham","arpita","nishka","gunjan","vaishali","sumit")
r <- c(p,q)
r
Output
[1] "1" "2" "4" "5" "7" "8"
[7] "shubham" "arpita" "nishka" "gunjan" "vaishali" "sumit"
Because a vector holds a single type, the numbers in p are converted to character strings.
2. Arithmetic Operations

Arithmetic operations on vectors are performed element by element (member-by-member).
Suppose we have two vectors a and b:
a = c(1, 3)
b = c(1, 3)
For addition:
a+b    # 2 6
For subtraction:
a-b    # 0 0
For multiplication:
a*b    # 1 9
For division:
a/b    # 1 1
For remainder operation:
a %% b # 0 0
Logical operations
• Logical operations in R perform element-wise decision operations, based on the specified operator between the operands, and evaluate to either TRUE or FALSE.
• Any non-zero numeric value (integer, real or complex) is considered TRUE; zero is FALSE.
• The element-wise operators & and | compare every element, while && and || compare single values only (since R 4.3.0, giving them a vector longer than one is an error), so the example uses the first element of each vector for && and ||.
# R program to illustrate
# the use of Logical operators
vec1 <- c(0,2)
vec2 <- c(TRUE,FALSE)
# Performing operations on Operands
cat ("Element wise AND :", vec1 & vec2, "\n")
cat ("Element wise OR :", vec1 | vec2, "\n")
cat ("Logical AND :", vec1[1] && vec2[1], "\n")
cat ("Logical OR :", vec1[1] || vec2[1], "\n")
cat ("Negation :", !vec1)
• Output
• Element wise AND : FALSE FALSE
• Element wise OR : TRUE TRUE
• Logical AND : FALSE
• Logical OR : TRUE
• Negation : TRUE FALSE
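A short sketch contrasting the element-wise operators with && and ||: any() and all() reduce a vector to the single TRUE/FALSE that && and || expect, and the right-hand side of && or || is evaluated only when needed (short-circuiting). The stop() calls are there only to show that they never run.

```r
vec1 <- c(0, 2)
vec2 <- c(TRUE, FALSE)

# element-wise: one result per element
vec1 & vec2                         # [1] FALSE FALSE

# reduce each vector to a single value before using &&
any(vec1 > 1) && any(vec2)          # TRUE

# short-circuiting: the right-hand side is never evaluated here
FALSE && stop("not evaluated")      # FALSE
TRUE || stop("not evaluated")       # TRUE
```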
Relational Operations

• The relational operators in R compare the corresponding elements of the operands. Each comparison returns TRUE if the element of the first operand satisfies the relation with the element of the second, and FALSE otherwise. When logical values are compared, TRUE is treated as greater than FALSE.
# R program to illustrate
# the use of Relational operators
vec1 <- c(0, 2)
vec2 <- c(2, 3)
# Performing operations on Operands
cat ("Vector1 less than Vector2 :", vec1 < vec2, "\n")
cat ("Vector1 less than equal to Vector2 :", vec1 <= vec2, "\n")
cat ("Vector1 greater than Vector2 :", vec1 > vec2, "\n")
cat ("Vector1 greater than equal to Vector2 :", vec1 >= vec2, "\n")
cat ("Vector1 not equal to Vector2 :", vec1 != vec2, "\n")
• Output
• Vector1 less than Vector2 : TRUE TRUE
• Vector1 less than equal to Vector2 : TRUE TRUE
• Vector1 greater than Vector2 : FALSE FALSE
• Vector1 greater than equal to Vector2 : FALSE FALSE
• Vector1 not equal to Vector2 : TRUE TRUE
Assignment Operations

• Assignment operators in R are used to assign values to data objects in R. The objects may be integers, vectors, or functions. These values are then stored under the assigned variable names. There are two kinds of assignment operators: left and right.
• Left Assignment (<- or <<- or =)
• Right Assignment (-> or ->>)
# R program to illustrate
# the use of Assignment operators
vec1 <- c(2:5)
c(2:5) ->> vec2
vec3 <<- c(2:5)
vec4 = c(2:5)
c(2:5) -> vec5
# Performing operations on Operands
cat ("vector 1 :", vec1, "\n")
cat("vector 2 :", vec2, "\n")
cat ("vector 3 :", vec3, "\n")
cat("vector 4 :", vec4, "\n")
cat("vector 5 :", vec5)

Output:
vector 1 : 2 3 4 5
vector 2 : 2 3 4 5
vector 3 : 2 3 4 5
vector 4 : 2 3 4 5
vector 5 : 2 3 4 5
Miscellaneous Operations

These are miscellaneous operators in R, such as : for generating sequences and %in% for testing membership.
• %in% Operator
Checks whether an element belongs to a vector and returns TRUE if the value is present, else FALSE.
val <- 0.1
list1 <- c(TRUE, 0.1,"apple")
print (val %in% list1)
Output : [1] TRUE
The value 0.1 is found in list1, so TRUE is printed. Note that c() converts all three values to character strings, so 0.1 is matched as "0.1".
3. Logical Index Vector in R -
• A logical index vector has the same length as the original vector and is used to form a new vector from it.
• Members of the original vector are included in the slice when the corresponding entry of the index vector is TRUE, and left out when it is FALSE.
• a <- c("Shubham","Arpita","Nishka","Vaishali","Sumit","Gunjan")
• b <- c(TRUE,FALSE,TRUE,TRUE,FALSE,FALSE)
• b<-c(TRUE,FALSE,TRUE,TRUE,FALSE,FALSE)
• a[b]
• Output
• [1] "Shubham" "Nishka" "Vaishali"
4. Numeric Index
• For indexing a numerical value in R, we specify the index between square brackets [ ].
• If the index is negative, R returns all the values except the one at that index. An index beyond the length of the vector returns NA.
q <- c("shubham","arpita","nishka","gunjan","vaishali","sumit")
q[2]
q[-4]
q[15]
Output
[1] "arpita"
[1] "shubham" "arpita" "nishka" "vaishali" "sumit"
[1] NA
5. Duplicate Index
• The index vector allows duplicate values. Hence, the following retrieves a member twice in one operation.
q <- c("shubham","arpita","nishka","gunjan","vaishali","sumit")
q[c(2, 3, 3)]
Output
[1] "arpita" "nishka" "nishka"
6. Range Indexes
• To produce a vector slice between two indexes, we can use the colon operator ":". It is convenient for situations involving large vectors.
• s = c("aa", "bb", "cc", "dd", "ee")
• s[1:3]
• OUTPUT
[1] "aa" "bb" "cc"
7. Out-of-order Indexes
• The index vector can even be out of order. Here is a vector slice with the order of the first and second members reversed.
q <- c("shubham","arpita","nishka","gunjan","vaishali","sumit")
q[c(2,1,3,4,5,6)]
Output
[1] "arpita" "shubham" "nishka" "gunjan" "vaishali" "sumit"
8) Named vector members
We first create our vector of characters as:
z = c("TensorFlow","PyTorch")
z
Output
[1] "TensorFlow" "PyTorch"
Once our vector of characters is created, we name the first vector member "Start" and the second member "End":
names(z) = c("Start","End")
z
Output
       Start          End
"TensorFlow"    "PyTorch"
We retrieve the first member by its name as follows:
z["Start"]
Output
       Start
"TensorFlow"
• Delete Particular Element From vector:
• # R program to delete the 2nd element from a vector
• # Creating a vector
• M <- c(8, 10, 2, 5)
• M_new <- M[-2]   # 8 2 5
• OR
• # store the indices to remove in a vector
• indices <- c(2)
• result <- M[-indices]
• print(result)    # [1] 8 2 5
Applications of vectors
• Vectors are used in principal component analysis in machine learning: the eigenvalues and eigenvectors of the data are computed and used to decompose the data in vector spaces.
• The inputs provided to a deep learning model are vectors. These vectors consist of standardized data supplied to the input layer of the neural network.
• Vectors are used in the development of support vector machine algorithms.
• Vector operations are used in neural networks for tasks such as image recognition and text processing.
Factor
• R factor is used to store categorical data as
levels.
• It can store both character and integer types
of data.
• Factors are created with the factor() function, which takes a vector as input.
• R factors are variables. The factor is stored as integers.
• Factors in R Programming Language are data structures that are used to categorize data, or represent categorical data, and store it on multiple levels.
• They are stored as integers, with a corresponding label for every unique integer.
• Although R factors may look similar to character vectors, they are integers, and care must be taken when using them as strings.
• The R factor accepts only a restricted number of distinct values.
• For example, a data field such as gender may contain values only from female, male, or transgender.
• The factor is a data structure used for fields that take only a predefined, finite number of values. These are variables that take a limited number of different values. Factors are the data objects used to categorize the data and to store it on multiple levels. They can store both integer and string values, and are useful in columns that have a limited number of unique values.
• Factors have labels which are associated with the unique integers stored in them.
• A factor has a predefined set of values known as levels, and by default R sorts the levels in alphabetical order.
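A minimal sketch of what "stored as integers" means in practice: levels() gives the labels, as.integer() gives the underlying codes, and as.character() recovers the text. The last lines show a common pitfall with number-like labels; all vectors here are illustrative.

```r
gender <- factor(c("male", "female", "male", "female"))

levels(gender)        # "female" "male"  (sorted alphabetically)
as.integer(gender)    # 2 1 2 1  (codes pointing into levels)
nlevels(gender)       # 2
table(gender)         # counts per level: female 2, male 2

# convert back to text with as.character(), not as.integer()
as.character(gender)  # "male" "female" "male" "female"

# pitfall: for number-like labels, as.numeric() returns the codes
f <- factor(c("10", "5", "10"))
as.numeric(f)                 # 1 2 1   (codes, since levels are "10" "5")
as.numeric(as.character(f))   # 10 5 10 (the actual numbers)
```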
Attributes of a factor
The factor() function that creates a factor takes the following arguments:

• x
It is the input vector which is to be transformed into a factor.
• levels
It is an optional vector of the unique values that x may take.
• labels
It is an optional character vector of labels for the levels.
• exclude
It specifies values to be excluded from the levels.
• ordered
It is a logical argument which determines whether the levels are ordered.
• nmax
It is used to specify an upper bound on the number of levels.
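A sketch combining the levels, labels and ordered arguments listed above, using a made-up sizes vector. With ordered = TRUE, comparison operators such as < follow the order of the levels.

```r
sizes <- c("small", "large", "medium", "small")

# levels fixes the order, labels renames the levels, ordered = TRUE allows < and >
size_f <- factor(sizes,
                 levels  = c("small", "medium", "large"),
                 labels  = c("S", "M", "L"),
                 ordered = TRUE)
print(size_f)
# [1] S L M S
# Levels: S < M < L

size_f < "L"     # [1]  TRUE FALSE  TRUE  TRUE
```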
How to create a factor?

In R, it is quite simple to create a factor. A factor is created in two steps:
• In the first step, we create a vector.
• Next, we convert the vector into a factor.
R provides the factor() function to convert a vector into a factor. The syntax of factor() is:
Syntax : factor_data <- factor(vector)
# Creating a vector
x <-c("female", "male", "male", "female")
print(x)
# Converting the vector x into a factor
# named gender
gender <-factor(x)
print(gender)

Output
[1] "female" "male" "male" "female"
[1] female male male female
Levels: female male
# Creating a factor with levels defined by
# the programmer
gender <- factor(c("female", "male", "male", "female"),
                 levels = c("female", "transgender", "male"))

gender
Output
[1] female male male female
Levels: female transgender male
Checking for a Factor in R

• The function is.factor() is used to check whether the variable is a factor and returns TRUE if it is a factor.
gender <- factor(c("female", "male", "male", "female"))
print(is.factor(gender))
Output
[1] TRUE
The function class() can also be used to check whether the variable is a factor; for a factor it returns "factor".
gender <- factor(c("female", "male", "male", "female"))
class(gender)
Output
[1] "factor"
Accessing elements of a Factor in R
We access the elements of a factor the same way we access the elements of a vector. If gender is a factor, then gender[i] accesses its ith element.
Example
gender <- factor(c("female", "male", "male", "female"))
gender[3]
Output
[1] male
Levels: female male
• More than one element can be accessed at a time.
Example
gender <- factor(c("female", "male", "male", "female"))
gender[c(2, 4)]
Output
[1] male female
Levels: female male
A negative index excludes an element.
Example
gender <- factor(c("female", "male", "male", "female"))
gender[-3]
Output
[1] female male female
Levels: female male
How to Create a Factor

• directions <- c("North", "North", "West", "South")
• factor(directions)
• Output: [1] North North West South
Levels: North South West
• "East" does not appear in the data, so it is not a level. To add this missing level to our factor, we use the levels argument as follows:
• factor(directions, levels= c("North", "East", "South", "West"))
• To provide abbreviations or ‘labels’ for our levels, we use the labels argument as follows:
• factor(directions, levels= c("North", "East", "South", "West"), labels=c("N", "E", "S", "W"))
Modification of a Factor in R
After a factor is formed, its components can be modified, but the new values must be among the predefined levels.
Example
gender <- factor(c("female", "male", "male", "female"))
gender[2] <- "female"
gender
Output
[1] female female male female
Levels: female male
# Creating a vector as input.
data <- c("Shubham","Nishka","Arpita","Nishka","Shubham")
# Applying the factor function.
factor_data <- factor(data)
# Printing all elements of factor
print(factor_data)
# Change 4th element of factor to "Arpita"
factor_data[4] <- "Arpita"
print(factor_data)
# Change 4th element of factor to "Gunjan"
factor_data[4] <- "Gunjan" # not a level: gives NA and a warning
print(factor_data)
# Adding the value to the levels
levels(factor_data) <- c(levels(factor_data), "Gunjan") # Adding new level
factor_data[4] <- "Gunjan"
print(factor_data)
Output
[1] Shubham Nishka Arpita Nishka Shubham
Levels: Arpita Nishka Shubham
[1] Shubham Nishka Arpita Arpita Shubham
Levels: Arpita Nishka Shubham
Warning message:
In `[<-.factor`(`*tmp*`, 4, value = "Gunjan") :
  invalid factor level, NA generated
[1] Shubham Nishka Arpita <NA> Shubham
Levels: Arpita Nishka Shubham
[1] Shubham Nishka Arpita Gunjan Shubham
Levels: Arpita Nishka Shubham Gunjan
Changing order of the levels
• In R, we can change the order of the levels in the factor with the
help of the factor function.
Example
data <- c("Nishka","Gunjan","Shubham","Arpita","Arpita","Sumit","Gunjan","Shubham")
# Creating the factors
factor_data <- factor(data)
print(factor_data)
# Apply the factor function with the required order of the levels.
new_order_factor <- factor(factor_data, levels = c("Gunjan","Nishka","Arpita","Shubham","Sumit"))
print(new_order_factor)
• Output
• [1] Nishka Gunjan Shubham Arpita Arpita Sumit Gunjan Shubham
• Levels: Arpita Gunjan Nishka Shubham Sumit
• [1] Nishka Gunjan Shubham Arpita Arpita Sumit Gunjan Shubham
• Levels: Gunjan Nishka Arpita Shubham Sumit
Generating Factor Levels

• R provides the gl() function to generate factor levels. This function takes three arguments, i.e., n, k, and labels. Here, n and k are integers which indicate how many levels we want and how many times each level is repeated.
The syntax of the gl() function is as follows:
gl(n, k, labels)
n indicates the number of levels.
k indicates the number of replications.
labels is a vector of labels for the resulting factor levels.
Example
gen_factor <- gl(3, 5, labels=c("BCA","MCA","B.Tech"))
gen_factor
Output
[1] BCA BCA BCA BCA BCA MCA MCA MCA MCA MCA
[11] B.Tech B.Tech B.Tech B.Tech B.Tech
Levels: BCA MCA B.Tech
Cont’d..
• If you want to exclude any level from your factor, you can use the exclude argument.
• factor(directions, levels= c("North", "East", "South", "West"), exclude = "North")
• There are various ways to access the elements of a factor in R. Some of them are as follows:
• data <- factor(c("East", "West", "East", "North"))
• data[4]
• data[c(2,3)]
• data[-1]
• data[c(TRUE, FALSE, TRUE, TRUE)]
How to Modify an R Factor?

• To modify a factor, we can only assign values that belong to its predefined levels.
• print(data)
• data[2] <- "North"
• data[3] <- "South" # "South" is not a level of data, so this gives NA with a warning
Factor Functions in R
• is.factor() checks whether the input is a factor and returns a Boolean value (TRUE or FALSE).
• as.factor() takes the input (usually a vector) and converts it into a factor.
• is.ordered() checks whether the factor is ordered and returns TRUE or FALSE.
• as.ordered() takes an unordered factor and returns an ordered factor.
• f_directions <- factor(directions)
• is.factor(f_directions)
• as.factor(directions)
• is.ordered(f_directions)
• as.ordered(f_directions)
R Strings

• A string is a sequence of characters.
• In R, each string is stored as one element of a character vector.
• One or more characters enclosed in a pair of matching single or double quotes can be considered a string in R.
• Strings in R Programming represent textual content and can contain numbers, spaces, and special characters.
• An empty string is represented by "".
• R always stores and prints strings in double-quoted form, even when they are created with single quotes.
• A double-quoted string can contain single quotes within it. A single-quoted string cannot contain an unescaped single quote. Similarly, a double-quoted string cannot contain an unescaped double quote.
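Quotes of the same kind can still appear inside a string if they are escaped with a backslash. A minimal sketch (s1, s2 and empty are illustrative names):

```r
# a backslash lets a quote appear inside the same kind of quotes
s1 <- 'It\'s valid'
s2 <- "She said \"hi\""

print(s1)       # [1] "It's valid"
print(s2)       # [1] "She said \"hi\""   (print() shows the escapes)
cat(s2, "\n")   # She said "hi"           (cat() shows the raw text)

empty <- ""
nchar(empty)    # 0
```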
• Creation of String in R
• R Strings can be created by assigning character values to a variable. These strings can be further concatenated by using various functions and methods to form a big string.
Example
# R program for String Creation
# creating a string with double quotes
str1 <- "OK1"
cat ("String 1 is:", str1, "\n")
# creating a string with single quotes
str2 <- 'OK2'
cat ("String 2 is:", str2, "\n")
str3 <- "This is 'acceptable and 'allowed' in R"
cat ("String 3 is:", str3, "\n")
str4 <- 'Hi, Wondering "if this "works"'
cat ("String 4 is:", str4, "\n")
# the next line is invalid: the unescaped ' ends the string early
str5 <- 'hi, ' this is not allowed'
cat ("String 5 is:", str5)
Output
String 1 is: OK1
String 2 is: OK2
String 3 is: This is 'acceptable and 'allowed' in R
String 4 is: Hi, Wondering "if this "works"
Error: unexpected symbol in "str5 <- 'hi, ' this"
Execution halted
• Length of String
• The length of a string is the number of characters it contains. The function str_length() from the ‘stringr’ package, or R's built-in nchar() function, can be used to determine the length of strings in R.
• Using the str_length() function
# R program for finding length of string
# Importing package
library(stringr)
# Calculating length of string
str_length("hello")
Output
• [1] 5
• Using nchar() function
• # R program to find length of string
• # Using nchar() function
• nchar("hel'lo")
• Output
• [1] 6
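A sketch of the difference between length() and nchar(): length() counts the strings in a vector, while nchar() counts the characters in each string. The words vector is illustrative.

```r
words <- c("hello", "R", "")

length(words)          # 3: number of strings in the vector
nchar(words)           # [1] 5 1 0: characters in each string
nchar("hello world")   # 11: the space is counted too
```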
• Accessing portions of an R string
• The individual characters of a string can be extracted by using R's indexing functions for strings. R has two built-in functions to access both single characters and substrings of a string.
• The substr() or substring() function in R extracts a substring from a string, beginning at the start index and ending at the end index. It can also replace the specified substring with a new set of characters.
• Syntax
• substr(x, start, stop) or substring(text, first, last)
• Using substr() function
• # R program to access
• # characters in a string
• # Accessing characters
• # using substr() function
• substr("Learn Code Tech", 1, 1)
• Output
• [1] "L"
• If the starting index is equal to the ending index, the corresponding character of the string is accessed. In this case, the first character, ‘L’, is printed.
• Using substring() function
• # R program to access characters in string
• str <- "Learn Code"
• # counts the characters in the string
• len <- nchar(str)
• # Accessing character using
• # substring() function
• print (substring(str, len, len))
• # Accessing elements out of index
• print (substring(str, len+1, len+1))
• Output
• [1] "e"
• [1] ""
• The number of characters in the string is 10. The first print statement prints the last character of the string, "e", which is the 10th character. The second print statement asks for the 11th character, which doesn't exist; instead of throwing an error, the code prints "", an empty string.
• The following R code illustrates string slicing, wherein substrings of an R string are extracted:
• # R program to access characters in string
• str <- "Learn Code"
• # counts the number of characters of str = 10
• len <- nchar(str)
• print(substr(str, 1, 4))
• print(substr(str, len-2, len))
• Output
• [1] "Lear"
• [1] "ode"
Case Conversion
• The characters of an R string can be converted to upper or lower case with R's built-in functions: toupper(), which converts all the characters to upper case; tolower(), which converts all the characters to lower case; and casefold(…, upper=TRUE/FALSE), which converts according to the value of the upper argument. All these functions can also take multiple strings as arguments. The time complexity of all these operations is O(number of characters in the string).
• Example
• # R program to Convert case of a string
• str <- "Hi LeArn CodiNG"
• print(toupper(str))
• print(tolower(str))
• print(casefold(str, upper = TRUE))
• Output
• [1] "HI LEARN CODING"
[1] "hi learn coding"
[1] "HI LEARN CODING"
By default, the value of upper in casefold() function is set to FALSE. If we set it to
TRUE, the R string gets printed in upper case.
• Concatenation of R Strings
• Using R’s paste() function, you can concatenate strings. Here is a straightforward example that joins two strings together:
• # Create two strings
• string1 <- "Hello"
• string2 <- "World"
• # Concatenate the two strings (separated by a space by default)
• result <- paste(string1, string2)
• # Print the result
• print(result)
• Output
• [1] "Hello World"
• # Concatenate three strings
• result <- paste("Hello", "to", "the World")
• # Print the result
• print(result)
• # Output: [1] "Hello to the World"
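paste() separates its arguments with a single space by default. A minimal sketch of the sep and collapse arguments and of paste0(), which uses no separator; first is an illustrative vector.

```r
first <- c("a", "b", "c")

paste("x", first)              # [1] "x a" "x b" "x c"   (default sep = " ")
paste("x", first, sep = "-")   # [1] "x-a" "x-b" "x-c"
paste0("x", first)             # [1] "xa" "xb" "xc"      (same as sep = "")
paste(first, collapse = "+")   # [1] "a+b+c"             (joins into one string)
```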
• R String formatting
• String formatting in R is done with the sprintf() function. Here is an easy example that builds a string from variable values:
• # Create two variables with values
• x <- 42
• y <- 3.14159
• # Format a string with the two variable values
• result <- sprintf("The answer is %d, and pi is %.2f.", x, y)
• # Print the result
• print(result)
• # Output: [1] "The answer is 42, and pi is 3.14."
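sprintf() accepts vectors, producing one formatted string per element, and supports width, padding and precision in the format specifiers. A sketch with made-up items and prices:

```r
items  <- c("apple", "kiwi")
prices <- c(1.5, 12)

# %-6s: left-justify in 6 characters; %5.2f: 5 characters wide, 2 decimals
sprintf("%-6s costs %5.2f", items, prices)
# [1] "apple  costs  1.50" "kiwi   costs 12.00"

sprintf("%05d", 42L)                  # "00042": zero-padded integer
sprintf("%s is %d%%", "Score", 90L)   # "Score is 90%": %% prints a literal percent sign
```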
• Updating R strings
• The characters, as well as the substrings, of a string can be replaced with new values. In R, a substring can be updated in place with either of these forms:
• substr(x, start, stop) <- newstring
• substring(text, first, last) <- newstring
• Alternatively, gsub() replaces every occurrence of a pattern with a new string:
• # Create a string
• string <- "Hello, World!"
• # Replace "World" with "Universe"
• string <- gsub("World", "Universe", string)
• # Print the updated string
• print(string)
• Output
• [1] "Hello, Universe!"
• Multiple strings in a character vector can be updated at once; start must be less than or equal to stop.
• If the substring is longer than the new string, only the portion of the substring equal to the length of the new string is replaced.
• If the substring is shorter than the new string, only as many characters of the new string as fit in the substring are used.
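A minimal sketch of the substr() replacement rules listed above, on an illustrative string "abcdef".

```r
s <- "abcdef"
substr(s, 2, 3) <- "XY"     # same length: positions 2-3 are replaced
s                           # "aXYdef"

s <- "abcdef"
substr(s, 2, 4) <- "Z"      # new string shorter: only position 2 changes
s                           # "aZcdef"

s <- "abcdef"
substr(s, 2, 3) <- "WXYZ"   # new string longer: only "WX" is used
s                           # "aWXdef"

v <- c("cat", "dog")
substr(v, 1, 1) <- "B"      # applied to every string in the vector
v                           # "Bat" "Bog"
```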
String Function
• R provides various string functions to perform tasks such as extracting substrings from a string and searching for patterns. The following string functions are available in R:
1. substr(x, start=n1, stop=n2)
It is used to extract substrings in a character vector.
a <- "987654321"
substr(a, 3, 3)
Output [1] "7"
2. grep(pattern, x, ignore.case=FALSE, fixed=FALSE)
It searches for pattern in x and returns the indices of the matching elements.
st1 <- c('abcd','bdcd','abcdabcd')
pattern<- '^abc'
print(grep(pattern, st1))
Output[1] 1 3
• 3. sub(pattern, replacement, x, ignore.case=FALSE, fixed=FALSE)
• It finds the first match of pattern in x and replaces it with the replacement (new) text.
st1 <- "England is beautiful but not a part of EU"
sub("England", "UK", st1)
Output [1] "UK is beautiful but not a part of EU"
4. paste(..., sep=" ")
It concatenates strings, using the sep string to separate them.
paste('one',2,'three',4,'five')
Output [1] "one 2 three 4 five"
5. strsplit(x, split)
It splits the elements of character vector x at the split point.
a <- "Split all the character"
print(strsplit(a, " "))
Output [[1]] [1] "Split" "all" "the" "character"
6. tolower(x)
It is used to convert the string into lower case.
st1 <- "shuBHAm"
print(tolower(st1))
Output [1] "shubham"

7. toupper(x)
It is used to convert the string into upper case.
st1 <- "shuBHAm"
print(toupper(st1))
Output [1] "SHUBHAM"
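sub() replaces only the first match while gsub() replaces every match, and the ignore.case and fixed arguments from the signatures above change how the pattern is read. A short sketch with an illustrative sentence:

```r
txt <- "one fish two fish"

sub("fish", "cat", txt)     # "one cat two fish"   (first match only)
gsub("fish", "cat", txt)    # "one cat two cat"    (every match)

# ignore.case = TRUE matches regardless of case
grepl("FISH", txt, ignore.case = TRUE)   # TRUE

# fixed = TRUE treats the pattern as plain text instead of a regular expression
sub(".", "!", "a.b", fixed = TRUE)       # "a!b": "." taken literally
sub(".", "!", "a.b")                     # "!.b": "." matches any character
```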
Reading Strings
• We can read strings from the keyboard using the readline() function.
• It lets the user enter a one-line string at the terminal.
• value <- readline(prompt = "string")
• Ex. print(n <- readline(prompt = "enter the subject: "))
• enter the subject: R
• [1] "R"
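readline() only waits for input in an interactive session; in a script run non-interactively (for example with Rscript) it returns "" immediately. A sketch of a guard for that case; ask_subject() is a hypothetical helper written for this example, not a base R function.

```r
# ask_subject() is a hypothetical helper written for this example
ask_subject <- function(default = "R") {
  if (interactive()) {
    answer <- readline(prompt = "Enter the subject: ")
    # an empty answer falls through to the default below
    if (nzchar(answer)) return(answer)
  }
  # reached in non-interactive runs (e.g. Rscript) or after an empty answer
  default
}

subject <- ask_subject()
print(subject)
```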
List
• Lists are R objects which can contain elements of different types, such as numbers, vectors, strings and even another list.
• A list can also contain a function or a matrix as its elements.
• A list is a data structure which has components of mixed data types. We can say a list is a generic vector which contains other objects.
• A list in R is a generic object consisting of an
ordered collection of objects.
• Lists are one-dimensional, heterogeneous
data structures.
• The list can be a list of vectors, a list of
matrices, a list of characters and a list of
functions, and so on.
• A list is a vector but with heterogeneous data
elements.
Lists creation

• The process of creating a list is similar to creating a vector. In R, a vector is created with the c() function; likewise, the list() function is used to create a list.
• A list avoids a limitation of vectors: all the elements of a vector must have the same data type, while a list can hold elements of different data types.
Syntax : list()
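A minimal sketch of the limitation a list avoids: c() coerces mixed values to a single type, while list() keeps each element's own type.

```r
mixed_vec  <- c(1, "a", TRUE)
mixed_list <- list(1, "a", TRUE)

mixed_vec                    # "1" "a" "TRUE": c() turned everything into character
class(mixed_vec)             # "character"
sapply(mixed_list, class)    # "numeric" "character" "logical": types are kept
length(mixed_list)           # 3
```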
Creating a List
Example
vec <- c(3,4,5,6)
char_vec<-
c("shubham","nishka","gunjan","sumit")
logic_vec<-c(TRUE,FALSE,FALSE,TRUE)
out_list<-list(vec,char_vec,logic_vec)
out_list
Output:
[[1]] [1] 3 4 5 6
[[2]] [1] "shubham" "nishka" "gunjan" "sumit"
[[3]] [1] TRUE FALSE FALSE TRUE
Creating the list with different data type
list_data<-
list("Shubham","Arpita",c(1,2,3,4,5),TRUE,FALSE,
22.5,12L)
print(list_data)
Output:
[[1]] [1] "Shubham"
[[2]] [1] "Arpita"
[[3]] [1] 1 2 3 4 5
[[4]] [1] TRUE
[[5]] [1] FALSE
[[6]] [1] 22.5
[[7]] [1] 12
empId = c(1, 2, 3, 4)
empName = c("Debi", "Sandeep", "Subham", "Shiba")
numberOfEmp = 4
# We can combine all three different data types in one list
empList = list(empId, empName, numberOfEmp)
print(empList)

Output:
[[1]]
[1] 1 2 3 4

[[2]]
[1] "Debi" "Sandeep" "Subham" "Shiba"

[[3]]
[1] 4
Giving a name to list elements
• R provides an easy way to access elements: giving a name to each element of a list.
• By assigning names to the elements, we can access them easily. There are only three steps to print the list data corresponding to a name:
1. Creating a list.
2. Assign a name to the list elements with the help
of names() function.
3. Print the list data.
Example
# Creating a list containing a vector, a matrix and a list.
list_data <- list(c("Shubham","Nishka","Gunjan"),
                  matrix(c(40,80,60,70,90,80), nrow = 2),
                  list("BCA","MCA","B.tech"))
# Giving names to the elements in the list.
names(list_data) <- c("Students", "Marks", "Course")
# Show the list.
print(list_data)
Output:
$Students
[1] "Shubham" "Nishka" "Gunjan"
$Marks
     [,1] [,2] [,3]
[1,]   40   60   90
[2,]   80   70   80
$Course
$Course[[1]]
[1] "BCA"
$Course[[2]]
[1] "MCA"
$Course[[3]]
[1] "B.tech"
Naming List Elements

• The list elements can be given names, and they can be accessed using these names.
• list_data <- list(c("Jan","Feb","Mar"),
matrix(c(3,9,5,1,-2,8), nrow = 2))
• names(list_data) <- c("1st Quarter",
"A_Matrix")
• print(list_data)
Accessing List Elements
We can access components of a list in two ways.
• By name: when the components of a list are named, we can use those names to access them with the $ operator.
• By index: any component can be accessed by its position using double square brackets [[ ]].
empId = c(1, 2, 3, 4)
empName = c("Debi", "Sandeep", "Subham", "Shiba")
numberOfEmp = 4
empList = list(
"ID" = empId,
"Names" = empName,
"Total Staff" = numberOfEmp
)
print(empList)

# Accessing components by names
cat("Accessing name components using $ command\n")
print(empList$Names)
Example
# Creating a list by naming all its components
empId = c(1, 2, 3, 4)
empName = c("Debi", "Sandeep", "Subham", "Shiba")
numberOfEmp = 4
empList = list(
"ID" = empId,
"Names" = empName,
"Total Staff" = numberOfEmp
)
print(empList)

# Accessing a top-level component by index
cat("Accessing name components using indices\n")
print(empList[[2]])

# Accessing an inner-level component by index
cat("Accessing Sandeep from name using indices\n")
print(empList[[2]][2])

# Accessing another inner-level component by index
cat("Accessing 4 from ID using indices\n")
print(empList[[1]][4])
Output:
$ID
[1] 1 2 3 4
$Names
[1] "Debi" "Sandeep" "Subham" "Shiba"
$`Total Staff`
[1] 4
Accessing name components using indices
[1] "Debi" "Sandeep" "Subham" "Shiba"
Accessing Sandeep from name using indices
[1] "Sandeep"
Accessing 4 from ID using indices
[1] 4
# Create a list containing a vector, a matrix and a list.
list_data <- list(c("Jan","Feb","Mar"), matrix(c(3,9,5,1,-2,8), nrow = 2), list("green",12.3))
# Give names to the elements in the list.
names(list_data) <- c("1st Quarter", "A_Matrix", "A Inner list")
# Access the first element of the list.
print(list_data[1])
# Access the third element. As it is also a list, all its elements will be printed.
print(list_data[3])
# Access the list element using the name of the element.
print(list_data$A_Matrix)
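The example above uses single brackets (list_data[1]) and the $ operator side by side. A minimal sketch of the difference, reusing the same list; the variable names part and quarter are illustrative only:

```r
# Same list as above: a vector, a matrix and an inner list
list_data <- list(c("Jan", "Feb", "Mar"),
                  matrix(c(3, 9, 5, 1, -2, 8), nrow = 2),
                  list("green", 12.3))
names(list_data) <- c("1st Quarter", "A_Matrix", "A Inner list")

# Single brackets return a sub-list (still of class "list")
part <- list_data[1]
print(class(part))        # "list"

# Double brackets (or $) return the element itself
quarter <- list_data[[1]]
print(class(quarter))     # "character"

# A name containing spaces needs backticks when used with $
print(list_data$`1st Quarter`)

# [[ ]] also accepts the name as a string
print(list_data[["A_Matrix"]])
```

In short, [ ] keeps the list wrapper, while [[ ]] and $ extract the stored object.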
Manipulating List Elements
• We can add, delete and update list elements as shown below.
• Assigning to a new position adds an element, assigning NULL to an element removes it, and assigning a new value to an existing position updates it. The example works at the end of the list, but an element at any position can be updated or removed.
list_data <- list(c("Jan","Feb","Mar"), matrix(c(3,9,5,1,-2,8), nrow = 2), list("green",12.3))
names(list_data) <- c("1st Quarter", "A_Matrix", "A Inner list")

# Add element at the end of the list.
list_data[4] <- "New element"
print(list_data[4])

# Remove the last element.
list_data[4] <- NULL

# Update the 3rd element.
list_data[3] <- "updated element"
print(list_data[3])
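Removal is not limited to the last element: assigning NULL to any element, by position or by name, removes it. A short sketch on the same list; the element name "placeholder" is hypothetical:

```r
list_data <- list(c("Jan", "Feb", "Mar"),
                  matrix(c(3, 9, 5, 1, -2, 8), nrow = 2),
                  list("green", 12.3))
names(list_data) <- c("1st Quarter", "A_Matrix", "A Inner list")

# Assigning NULL removes the element wherever it is in the list
list_data[["A_Matrix"]] <- NULL
print(names(list_data))   # "1st Quarter" "A Inner list"

# To store an actual NULL value inside a list, assign list(NULL) with [ ]
list_data["placeholder"] <- list(NULL)
print(length(list_data))  # 3
```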
Example
empId = c(1, 2, 3, 4)
empName = c("Debi", "Sandeep", "Subham", "Shiba")
numberOfEmp = 4
empList = list(
"ID" = empId,
"Names" = empName,
"Total Staff" = numberOfEmp
)
cat("Before modifying the list\n")
print(empList)

# Modifying the top-level component
empList$`Total Staff` = 5

# Modifying inner-level components
empList[[1]][5] = 5
empList[[2]][5] = "Kamala"

cat("After modifying the list\n")
print(empList)
Output:
Before modifying the list
$ID [1] 1 2 3 4
$Names [1] "Debi" "Sandeep" "Subham" "Shiba"
$`Total Staff` [1] 4
After modifying the list
$ID [1] 1 2 3 4 5
$Names [1] "Debi" "Sandeep" "Subham" "Shiba" "Kamala"
$`Total Staff` [1] 5
• Concatenation of lists
• Two R lists can be concatenated using the c() function, the same function that combines vectors.
• Syntax:
list = c(list, list1)
where list is the original list and list1 is the list to be appended.
Merging Lists
• You can merge many lists into one list by placing all the lists inside the c() function.
list1 <- list(1,2,3)
list2 <- list("Sun","Mon","Tue")
# Merge the two lists.
merged.list <- c(list1,list2)
# Print the merged list.
print(merged.list)
Converting List to Vector
• A list can be converted to a vector so that the elements of the vector can be used for further manipulation.
• All the arithmetic operations on vectors can be applied after the list is converted into a vector.
• To do this conversion, we use the unlist() function.
• It takes the list as input and produces a vector.
# Create lists.
list1 <- list(1:5)
print(list1)
list2 <- list(10:14)
print(list2)
# Convert the lists to vectors.
v1 <- unlist(list1)
v2 <- unlist(list2)
print(v1)
print(v2)
# Now add the vectors
result <- v1+v2
print(result)
• List Length
• To find out how many items a list has, use the length() function:
• Example
thislist <- list("apple", "banana", "cherry")
length(thislist)
• Check if Item Exists
• To find out if a specified item is present in a list, use the %in% operator:
• Example: check if "apple" is present in the list:
thislist <- list("apple", "banana", "cherry")
"apple" %in% thislist
• Add List Items
• To add an item to the end of the list, use the append() function. append() returns a new list, so assign the result (for example thislist <- append(thislist, "orange")) to keep the change.
• Example: add "orange" to the list:
thislist <- list("apple", "banana", "cherry")
append(thislist, "orange")
• To add an item to the right of a specified index, add "after = index number" in the append() function:
• Example: add "orange" to the list after "banana" (index 2):
thislist <- list("apple", "banana", "cherry")
append(thislist, "orange", after = 2)
• Remove List Items
• You can also remove list items. The following example creates a new, updated list without the "apple" item:
• Example: remove "apple" from the list:
thislist <- list("apple", "banana", "cherry")
newlist <- thislist[-1]
# Print the new list
newlist
empId = c(1, 2, 3, 4)
empName = c("Debi", "Sandeep", "Subham", "Shiba")
numberOfEmp = 4
empList = list(
"ID" = empId,
"Names" = empName,
"Total Staff" = numberOfEmp
)
cat("Before deletion the list is\n")
print(empList)

# Deleting a top-level component
cat("After Deleting Total staff components\n")
print(empList[-3])

# Deleting an inner-level component
cat("After Deleting sandeep from name\n")
print(empList[[2]][-2])
Output:
Before deletion the list is
$ID [1] 1 2 3 4
$Names [1] "Debi" "Sandeep" "Subham" "Shiba"
$`Total Staff` [1] 4
After Deleting Total staff components
$ID [1] 1 2 3 4
$Names [1] "Debi" "Sandeep" "Subham" "Shiba"
After Deleting sandeep from name
[1] "Debi" "Subham" "Shiba"
• Range of Indexes
• You can specify a range of indexes by specifying where to start and where to end the range, using the : operator:
• Example: return the second, third, fourth and fifth items:
thislist <- list("apple", "banana", "cherry", "orange", "kiwi", "melon", "mango")
thislist[2:5]
Matrices
• Matrices are R objects in which the elements are arranged in a two-dimensional rectangular layout.
• They contain elements of the same atomic type.
• Though we can create a matrix containing only characters or only logical values, they are not of much use.
• Matrices containing numeric elements are the ones used in mathematical calculations.
• A matrix is a rectangular arrangement of numbers in rows and columns.
• In a matrix, rows run horizontally and columns run vertically.
• In R programming, matrices are two-dimensional, homogeneous data structures.
• A matrix is created using the matrix() function.
• Syntax:
matrix(data, nrow, ncol, byrow, dimnames)
• data is the input vector which becomes the data elements of the matrix.
• nrow is the number of rows to be created.
• ncol is the number of columns to be created.
• byrow is a logical value. If TRUE, the input vector elements are arranged by row; the default, FALSE, fills the matrix column by column.
• dimnames is a list of the names assigned to the rows and columns.
M <- matrix(c(3:14), nrow = 4, ncol = 3, byrow = TRUE)   # filled row by row
print(M)
N <- matrix(c(3:14), nrow = 4, byrow = FALSE)            # filled column by column
print(N)

rownames = c("row1", "row2", "row3", "row4")
colnames = c("col1", "col2", "col3")
P <- matrix(c(3:14), nrow = 4, byrow = TRUE, dimnames = list(rownames, colnames))
print(P)
Output of print(M):
     [,1] [,2] [,3]
[1,]    3    4    5
[2,]    6    7    8
[3,]    9   10   11
[4,]   12   13   14
Output of print(N):
     [,1] [,2] [,3]
[1,]    3    7   11
[2,]    4    8   12
[3,]    5    9   13
[4,]    6   10   14
Output of print(P):
     col1 col2 col3
row1    3    4    5
row2    6    7    8
row3    9   10   11
row4   12   13   14
Note: By default, matrices are in column-wise order.
# R program to create a matrix
A = matrix(
  # Taking sequence of elements
  c(1, 2, 3, 4, 5, 6, 7, 8, 9), nrow = 3, ncol = 3, byrow = TRUE)
# Naming rows
rownames(A) = c("a", "b", "c")
# Naming columns
colnames(A) = c("c", "d", "e")
cat("The 3x3 matrix:\n")
print(A)

Output:
The 3x3 matrix:
  c d e
a 1 2 3
b 4 5 6
c 7 8 9
Creating special matrices
• R allows the creation of various special types of matrices through the arguments passed to the matrix() function.
Matrix where all rows and columns are filled by a single constant 'k':
To create such a matrix, the syntax is given below:
Syntax:
matrix(k, m, n)

Parameters:
k: the constant
m: no of rows
n: no of columns

Example:
# R program to illustrate
# special matrices
# Matrix having 3 rows and 3 columns
# filled by a single constant 5
print(matrix(5, 3, 3))
Output:
     [,1] [,2] [,3]
[1,]    5    5    5
[2,]    5    5    5
[3,]    5    5    5
Diagonal matrix:
• A diagonal matrix is a matrix in which the entries
outside the main diagonal are all zero. To create
such a matrix the syntax is given below:

• Syntax:
• diag(k, m, n)

Parameters:
k: the constants/array
m: no of rows
n: no of columns
Example:
# R program to illustrate
# special matrices
# Diagonal matrix having 3 rows and 3 columns
# filled by array of elements (5, 3, 3)
print(diag(c(5, 3, 3), 3, 3))

Output:
     [,1] [,2] [,3]
[1,]    5    0    0
[2,]    0    3    0
[3,]    0    0    3
Identity matrix:

A square matrix in which all the elements of the
principal diagonal are ones and all other elements are
zeros. To create such a matrix the syntax is given
below:

Syntax:
• diag(k, m, n)

Parameters:
k: 1
m: no of rows
n: no of columns
Example:

# R program to illustrate
# special matrices
# Identity matrix having
# 3 rows and 3 columns
print(diag(1, 3, 3))

Output:
     [,1] [,2] [,3]
[1,]    1    0    0
[2,]    0    1    0
[3,]    0    0    1
Matrix metrics
Matrix metrics describe a matrix once it has been created:
• What is the dimension of the matrix?
• How many rows are there in the matrix?
• How many columns are there in the matrix?
• How many elements are there in the matrix?
Example:
# R program to illustrate matrix metrics
# Create a 3x3 matrix
A = matrix(c(1, 2, 3, 4, 5, 6, 7, 8, 9), nrow = 3, ncol = 3, byrow = TRUE)
cat("The 3x3 matrix:\n")
print(A)

cat("Dimension of the matrix:\n")
print(dim(A))

cat("Number of rows:\n")
print(nrow(A))

cat("Number of columns:\n")
print(ncol(A))

cat("Number of elements:\n")
print(length(A))
# OR
print(prod(dim(A)))
Output:
The 3x3 matrix:
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9
Dimension of the matrix:
[1] 3 3
Number of rows:
[1] 3
Number of columns:
[1] 3
Number of elements:
[1] 9
[1] 9
Matrix Concatenation
• Matrix concatenation refers to adding rows or columns to an existing matrix.
• Concatenation of a row: a row is added to a matrix using rbind().
• Concatenation of a column: a column is added to a matrix using cbind().
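# R program to illustrate
# concatenation of rows and columns in matrices

# Create a 3x3 matrix
A = matrix(c(1, 2, 3, 4, 5, 6, 7, 8, 9), nrow = 3, ncol = 3, byrow = TRUE)
cat("The 3x3 matrix:\n")
print(A)
# Creating another 1x3 matrix
B = matrix(c(10, 11, 12), nrow = 1, ncol = 3)
cat("The 1x3 matrix:\n")
print(B)

# Add a new row using rbind()
C = rbind(A, B)
cat("After concatenation of a row:\n")
print(C)

# Creating another 3x1 matrix
B = matrix(c(10, 11, 12), nrow = 3, ncol = 1, byrow = TRUE)
cat("The 3x1 matrix:\n")
print(B)

# Add a new column using cbind()
C = cbind(A, B)
cat("After concatenation of a column:\n")
print(C)
OUTPUT
The 3x3 matrix:
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9
The 1x3 matrix:
     [,1] [,2] [,3]
[1,]   10   11   12
After concatenation of a row:
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9
[4,]   10   11   12
The 3x1 matrix:
     [,1]
[1,]   10
[2,]   11
[3,]   12
After concatenation of a column:
     [,1] [,2] [,3] [,4]
[1,]    1    2    3   10
[2,]    4    5    6   11
[3,]    7    8    9   12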
Another way of creating a matrix
• By using cbind() and rbind(): cbind() takes each vector as a column, while rbind() takes each vector as a row.
M = cbind(c(1,2,3), c(4,5,6))
M
     [,1] [,2]
[1,]    1    4
[2,]    2    5
[3,]    3    6
M = rbind(c(1,2,3), c(4,5,6))
M
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
• By using the dim() function: we can also create a matrix from a vector by setting its dimensions using dim().
M = c(1,2,3,4,5,6)
dim(M) = c(2,3)
M
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
Accessing Matrix Elements
• Matrix elements can be accessed in 3 different ways:
1. Integer vector as index: the element at the mth row and nth column of a matrix P can be accessed by the expression P[m,n].
• We can use negative integers to specify rows or columns to be excluded.
• If any field inside the bracket is left blank, it selects all.
• For example, the entire mth row of matrix P can be extracted as P[m,] and the nth column as P[,n].
M = matrix(c(1:12), nrow = 4, byrow = TRUE)
M
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9
[4,]   10   11   12
1. M[2,3]
2. M[2, ]
3. M[ ,3]
4. M[ , ]
5. M[ ,c(1,3)]
6. M[c(3,2) , ]
7. M[c(1,2) ,c(2,3)]
8. M[-1, ]
Results:
1. [1] 6
2. [1] 4 5 6
3. [1]  3  6  9 12
4.      [,1] [,2] [,3]
   [1,]    1    2    3
   [2,]    4    5    6
   [3,]    7    8    9
   [4,]   10   11   12
5.      [,1] [,2]
   [1,]    1    3
   [2,]    4    6
   [3,]    7    9
   [4,]   10   12
6.      [,1] [,2] [,3]
   [1,]    7    8    9
   [2,]    4    5    6
7.      [,1] [,2]
   [1,]    2    3
   [2,]    5    6
8.      [,1] [,2] [,3]
   [1,]    4    5    6
   [2,]    7    8    9
   [3,]   10   11   12
# Create a 3x3 matrix
A = matrix(c(1, 2, 3, 4, 5, 6, 7, 8, 9), nrow = 3, ncol = 3, byrow = TRUE)
cat("The 3x3 matrix:\n")
print(A)
# Accessing first and second row
cat("Accessing first and second row\n")
print(A[1:2, ])
# Accessing first and second column
cat("Accessing first and second column\n")
print(A[, 1:2])
# Access the element at 1st row and 3rd column.
print(A[1,3])
# Access the element at 3rd row and 2nd column (A has only 3 rows, so A[4,2] would give "subscript out of bounds").
print(A[3,2])
# Access only the 2nd row.
print(A[2,])
# Access only the 3rd column.
print(A[,3])
# Accessing 2
print(A[1, 2])
# Accessing 6
print(A[2, 3])
2. Logical vector as index: two logical vectors can be used to index a matrix. In such a situation, the rows and columns where the value is TRUE are returned.
• These indexing vectors are recycled if necessary and can be mixed with integer vectors.
M = matrix(c(1:12), nrow = 4, byrow = TRUE)
M[c(TRUE,FALSE,TRUE), c(TRUE,TRUE,FALSE)]
# The row vector has length 3, so it is recycled to TRUE, FALSE, TRUE, TRUE (rows 1, 3 and 4).
     [,1] [,2]
[1,]    1    2
[2,]    7    8
[3,]   10   11
3. Character vector as index: if we assign names to the rows and columns of a matrix, then we can access the elements by name.
• This can be mixed with integer or logical indexing.
M <- matrix(c(1:12), nrow = 4, byrow = TRUE,
            dimnames = list(c("r1","r2","r3","r4"), c("c1","c2","c3")))
M["r2", "c3"]            # element at 2nd row, 3rd column
M[ , "c1"]               # elements of the column named c1
M[TRUE, c("c1","c2")]    # all rows and columns c1 & c2
M[2:3, c("c1","c3")]     # 2nd & 3rd row, columns c1 & c3
M:
   c1 c2 c3
r1  1  2  3
r2  4  5  6
r3  7  8  9
r4 10 11 12
Results:
[1] 6

r1 r2 r3 r4
 1  4  7 10

   c1 c2
r1  1  2
r2  4  5
r3  7  8
r4 10 11

   c1 c3
r2  4  6
r3  7  9
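Logical indexing also works with a condition on the values themselves. A minimal sketch using the same 4 x 3 matrix; the name mask is illustrative:

```r
M <- matrix(1:12, nrow = 4, byrow = TRUE)

# A comparison on the whole matrix gives a logical matrix of the same shape
mask <- M > 6
print(mask)

# Indexing with that logical matrix returns the matching elements as a vector,
# taken in column-major order
print(M[mask])            # 7 10 8 11 9 12

# A logical vector in the row position selects whole rows
print(M[M[, 1] > 5, ])    # rows 3 and 4
```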
Modifying elements of a Matrix
In R you can modify the elements of a matrix by direct assignment.
Example:
# R program to illustrate
# editing elements in matrices
# Create a 3x3 matrix
A = matrix(c(1, 2, 3, 4, 5, 6, 7, 8, 9), nrow = 3, ncol = 3, byrow = TRUE)
cat("The 3x3 matrix:\n")
print(A)
# Editing the element in the 3rd row and 3rd column
# from 9 to 30
# by direct assignment
A[3, 3] = 30
cat("After edited the matrix\n")
print(A)
Deleting rows and columns of a Matrix
To delete a row or a column, index the matrix with a negative sign before that row or column number. The negative index selects everything except that row or column, and assigning the result back deletes it.
Row deletion:
# R program to illustrate
# row deletion in matrices
# Create a 3x3 matrix
A = matrix(c(1, 2, 3, 4, 5, 6, 7, 8, 9), nrow = 3, ncol = 3, byrow = TRUE)
cat("Before deleting the 2nd row\n")
print(A)

# 2nd-row deletion
A = A[-2, ]

cat("After deleted the 2nd row\n")
print(A)
Output:
Before deleting the 2nd row
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9
After deleted the 2nd row
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    7    8    9
Column deletion:
# R program to illustrate
# column deletion in matrices
# Create a 3x3 matrix
A = matrix(c(1, 2, 3, 4, 5, 6, 7, 8, 9), nrow = 3, ncol = 3, byrow = TRUE)
cat("Before deleting the 2nd column\n")
print(A)

# 2nd-column deletion
A = A[, -2]
cat("After deleted the 2nd column\n")
print(A)
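One detail worth knowing when deleting rows: if only one row (or column) is left, R drops the result to a plain vector. A small sketch; drop = FALSE is a standard argument of the [ operator, and the names one_row and one_row_mat are illustrative:

```r
A <- matrix(1:9, nrow = 3, byrow = TRUE)

# Deleting rows until only one is left silently turns the result into a vector
one_row <- A[-c(1, 2), ]
print(is.matrix(one_row))     # FALSE: a plain vector 7 8 9

# drop = FALSE keeps the result as a 1 x 3 matrix
one_row_mat <- A[-c(1, 2), , drop = FALSE]
print(dim(one_row_mat))       # 1 3
```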
Matrix Arithmetic
• The dimensions (number of rows and columns) should be the same for the matrices involved in the operation. The operators +, -, * and / work element by element.
Matrix1 <- matrix(c(10,20,30,40,50,60), nrow=2)
Matrix2 <- matrix(c(1,2,3,4,5,6), nrow=2)
Sum <- Matrix1 + Matrix2
Difference <- Matrix1 - Matrix2
Product <- Matrix1 * Matrix2
Quotient <- Matrix1 / Matrix2
Matrix1:
     [,1] [,2] [,3]
[1,]   10   30   50
[2,]   20   40   60
Matrix2:
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
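To see the element-by-element behaviour, and what happens when the dimensions differ, here is a short sketch with the same two matrices; the name res is illustrative:

```r
Matrix1 <- matrix(c(10, 20, 30, 40, 50, 60), nrow = 2)
Matrix2 <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2)

# Each entry is combined with the entry in the same position
print(Matrix1 + Matrix2)   # rows: 11 33 55 / 22 44 66
print(Matrix1 * Matrix2)   # rows: 10 90 250 / 40 160 360
print(Matrix1 / Matrix2)   # every entry is 10

# Matrices of different dimensions raise an error instead of recycling
res <- tryCatch(Matrix1 + matrix(1:4, nrow = 2),
                error = function(e) conditionMessage(e))
print(res)                 # "non-conformable arrays"
```

Note that * is not matrix multiplication; that uses the %*% operator.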
Matrix Manipulation
• We can modify a single element, or the elements selected by a condition.
Matrix1 <- matrix(c(10,20,30,40,50,60), nrow=2)
Matrix1[2,2] <- 100
Matrix1[Matrix1 < 40] <- 0
• We can add a row or column using rbind() and cbind(). Similarly, it can be removed through reassignment.
cbind(Matrix1, c(1,2))                 # new column: one value per row
Matrix1 <- rbind(Matrix1, c(1,2,3))    # new row: one value per column
print(Matrix1 <- Matrix1[1:2, ])       # remove the added 3rd row by reassignment
Matrix Operations
1. Matrix Multiplication: two matrices A of order M x N and B of order P x Q are eligible for multiplication only if N is equal to P.
• The resultant matrix will be of order M x Q.
• Matrix multiplication is performed using the operator A %*% B, where A and B are matrices.
Matrix1 <- matrix(c(10,20,30,40,50,60), nrow=2)   # 2 x 3
Matrix2 <- matrix(c(1,2,3,4,5,6), nrow=3)         # 3 x 2
Product <- Matrix1 %*% Matrix2                     # 2 x 2

2. Transpose: the transpose of a matrix is an operation which flips a matrix over its diagonal, that is, it switches the row and column indices of the matrix.
Matrix1 <- matrix(c(10,20,30,40,50,60), nrow=2)
t(Matrix1)
t(Matrix1):
     [,1] [,2]
[1,]   10   20
[2,]   30   40
[3,]   50   60
Matrix1 itself:
     [,1] [,2] [,3]
[1,]   10   30   50
[2,]   20   40   60
3. Cross product: crossprod(A, B) computes the matrix product t(A) %*% B.
A <- matrix(c(10,20,30,40,50,60), nrow=2)
B <- matrix(c(1,2,3,4,5,6), nrow=2)
crossprod(A, B)

4. Diagonal Matrix:
A <- matrix(1:9, nrow = 3)
diag(A)              # returns the diagonal elements
diag(3)              # creates an identity matrix of order 3
diag(c(1,2,3), 3)    # creates a matrix of order 3 with diagonal elements 1, 2, 3
A:
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
diag(A):
[1] 1 5 9
diag(3):
     [,1] [,2] [,3]
[1,]    1    0    0
[2,]    0    1    0
[3,]    0    0    1
diag(c(1,2,3), 3):
     [,1] [,2] [,3]
[1,]    1    0    0
[2,]    0    2    0
[3,]    0    0    3
eigen()
• The eigen() function in R is used to calculate the eigenvalues and eigenvectors of a square matrix. An eigenvalue is the factor by which an eigenvector is scaled when it is multiplied by the matrix.
• Syntax: eigen(x)
• Parameters:
x: a square matrix
# R program to illustrate
# eigenvalues and eigenvectors of a matrix

# Create a 3x3 matrix
A = matrix(c(1:9), 3, 3)

cat("The 3x3 matrix:\n")
print(A)

# Calculating eigenvalues and eigenvectors
print(eigen(A))
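5. Row sum and column sum:
A <- matrix(c(10,20,30,40,50,60), nrow=2)
rowSums(A)    # [1]  90 120
colSums(A)    # [1]  30  70 110
6. Row means and column means:
rowMeans(A)   # [1] 30 40
colMeans(A)   # [1] 15 35 55
7. Eigenvalues and eigenvectors (square matrices only; A above is 2 x 3):
S <- matrix(c(4,2,7,6), nrow=2)
Y <- eigen(S)
8. Inverse (square, non-singular matrices only):
solve(S)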
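The operations listed above can be checked against their definitions. A minimal sketch, assuming a hypothetical symmetric matrix S (chosen so that its eigenvalues are real and it is invertible); the other names are illustrative:

```r
# Matrices from the cross product example: both are 2 x 3
A <- matrix(c(10, 20, 30, 40, 50, 60), nrow = 2)
B <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2)

# crossprod(A, B) is t(A) %*% B: (3 x 2) %*% (2 x 3) gives a 3 x 3 matrix
cp <- crossprod(A, B)
print(dim(cp))                               # 3 3
print(all.equal(cp, t(A) %*% B))             # TRUE

# Hypothetical symmetric 2 x 2 matrix
S <- matrix(c(2, 1, 1, 2), nrow = 2)
e <- eigen(S)
print(e$values)                              # 3 1

# Each column of e$vectors is an eigenvector: S %*% v equals lambda * v
v1 <- e$vectors[, 1]
print(all.equal(as.vector(S %*% v1), e$values[1] * v1))   # TRUE

# A matrix times its inverse gives the identity matrix (up to rounding)
S_inv <- solve(S)
print(all.equal(S %*% S_inv, diag(2)))       # TRUE

# solve(a, b) solves the linear system a %*% x = b directly
x <- solve(S, c(3, 3))
print(x)                                     # 1 1
```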
Arrays
• Arrays are R data objects which can store data in more than two dimensions.
• Arrays can store only a single data type.
• For example, if we create an array of dimension (2,4,5), then it creates 5 rectangular matrices, each with 2 rows and 4 columns.
• An array is created using the array() function.
• It takes vectors as input and uses the values in the dim parameter to create an array; the dim() function can also set or report the dimensions of an array.
• Compared to matrices, arrays can have more than two dimensions.
• R Array Syntax
The syntax of R arrays is:
array(data, dim = c(nrow, ncol, nmat), dimnames = names)
where
nrow: number of rows
ncol: number of columns
nmat: number of matrices of dimensions nrow * ncol
dimnames: default value = NULL.
• EXAMPLE
V1 = c(1,2,3)
V2 = c(10,20,30,40,50,60)
# 9 values fill a 3 x 3 x 2 array (18 cells), so the values are recycled
A <- array(c(V1,V2), dim = c(3,3,2))
print(A)
OUTPUT
, , 1

     [,1] [,2] [,3]
[1,]    1   10   40
[2,]    2   20   50
[3,]    3   30   60

, , 2

     [,1] [,2] [,3]
[1,]    1   10   40
[2,]    2   20   50
[3,]    3   30   60
TYPES OF ARRAY
• ONE DIMENSIONAL ARRAY
• MULTI DIMENSIONAL ARRAY
ONE/Uni-Dimensional Array
• A vector is a uni-dimensional array, which is specified by a single dimension, its length.
• A vector can be created using the c() function. A list of values is passed to the c() function to create a vector.
EXAMPLE
vec1 <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)
print(vec1)
# cat is used to concatenate strings and print them.
cat("Length of vector : ", length(vec1))
Output:
[1] 1 2 3 4 5 6 7 8 9
Length of vector : 9
Multi-Dimensional Array
• A two-dimensional matrix is an array specified by a fixed number of rows and columns, each containing the same data type. Arrays with more dimensions stack several such matrices.
• A multi-dimensional array is created by using the array() function, to which the values and the dimensions are passed.
EXAMPLE
# arranges data from 2 to 13
# in two matrices of dimensions 2x3
arr = array(2:13, dim = c(2, 3, 2))
print(arr)
OUTPUT
, , 1

     [,1] [,2] [,3]
[1,]    2    4    6
[2,]    3    5    7

, , 2

     [,1] [,2] [,3]
[1,]    8   10   12
[2,]    9   11   13
NOTE
• Vectors of different lengths can also be fed as input into the array() function.
• The elements are arranged in the order in which they are specified in the function. If the combined vectors hold fewer elements than the array needs, R recycles them from the start; if they hold more, the extra values are dropped.
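The recycling and truncation behaviour of array() can be checked directly. A small sketch; the arrays A and B are illustrative:

```r
# 9 values supplied, but a 3 x 3 x 2 array needs 18: the data are recycled
A <- array(c(1, 2, 3, 10, 20, 30, 40, 50, 60), dim = c(3, 3, 2))
print(identical(A[, , 1], A[, , 2]))   # TRUE

# More values than cells: array() keeps only the first prod(dim) values
B <- array(1:10, dim = c(2, 2))
print(B)                               # built from 1, 2, 3, 4
```

Unlike matrix(), array() does this without a warning, so it is worth checking length(data) against prod(dim) yourself.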
Naming rows and columns
• In R, we can give names to the rows, columns, and matrices of an array. This is done with the dimnames parameter of the array() function.
• Naming the rows and columns is optional; it only makes them easier to tell apart.
• dimnames takes a list with one component for each dimension.
• Each component is either NULL or a vector whose length equals the size of the corresponding dimension.
# Creating two vectors of different lengths
vec1 <- c(1,3,5)
vec2 <- c(10,11,12,13,14,15)

# Initializing names for rows, columns and matrices
col_names <- c("Col1","Col2","Col3")
row_names <- c("Row1","Row2","Row3")
matrix_names <- c("Matrix1","Matrix2")

# Taking the vectors as input to the array
res <- array(c(vec1,vec2), dim = c(3,3,2),
             dimnames = list(row_names, col_names, matrix_names))
print(res)
• We can give names to the rows, columns and matrices in the array by using the dimnames parameter.
V1 = c(1,2,3)
V2 = c(10,20,30,40,50,60)
column_names <- c("col1","col2","col3")
row_names <- c("row1","row2","row3")
matrix_names <- c("matrix1","matrix2")
A <- array(c(V1,V2), dim = c(3,3,2),
           dimnames = list(row_names, column_names, matrix_names))
Accessing Array Elements
• R arrays can be accessed by using indices for the different dimensions, separated by commas.
• Different components can be specified by any combination of elements' names or positions.
Accessing Uni-Dimensional Array
• The elements can be accessed by using the indexes of the corresponding elements.
vec <- c(1:10)
# accessing entire vector
cat("Vector is : ", vec, "\n")
# accessing elements
cat("Third element of vector is : ", vec[3], "\n")
Output:
Vector is :  1 2 3 4 5 6 7 8 9 10
Third element of vector is :  3
Accessing Multi dim. Array Elements
• We can use the index position to access the array elements. Using indexes, we can also alter each individual element present in the array.
• Syntax: array_name[row_position, col_position, matrix_level]
A <- array(1:24, dim = c(3,4,2))
A[1,2,1]     # 1st row, 2nd column in matrix 1
A[3,4,2]
A[3, , 1]    # only the 3rd row in matrix 1
A[ , 4, 2]   # 4th column in matrix 2
A[ , , 1]    # accessing matrix 1
A[ , , 2]    # accessing matrix 2
Output (in the same order):
[1] 4
[1] 24
[1]  3  6  9 12
[1] 22 23 24
     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12
     [,1] [,2] [,3] [,4]
[1,]   13   16   19   22
[2,]   14   17   20   23
[3,]   15   18   21   24
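Names and positions can be mixed when an array has dimnames. A sketch on the same 1:24 array, with hypothetical labels r1..r3, c1..c4 and m1, m2 added for illustration:

```r
A <- array(1:24, dim = c(3, 4, 2),
           dimnames = list(c("r1", "r2", "r3"),
                           c("c1", "c2", "c3", "c4"),
                           c("m1", "m2")))

# Same element as A[1, 2, 1], selected by name
print(A["r1", "c2", "m1"])    # 4

# Row 3 of matrix m1: a position and a name in one subscript
print(A[3, , "m1"])           # named c1..c4: 3 6 9 12
```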
Adding elements to array
• Elements can be appended at different positions in the array.
• The sequence of elements is retained in the order of their addition to the array.
• The time complexity required to add new elements is O(n), where n is the length of the array.
• The length of the array increases by the number of elements added. There are various built-in functions available in R to add new values:
• c(vector, values): the c() function appends values to the end of the array. Multiple values can also be added together.
• append(vector, values): this function allows the values to be inserted at any position in the vector. By default, it adds the elements at the end; append(vector, values, after = k) inserts the new values after position k.
• Using the length of the array: elements can be assigned at index length + x, where x > 0. If x > 1, the positions in between are filled with NA.
# creating a uni-dimensional array
x <- c(1, 2, 3, 4, 5)
# addition of element using c() function
x <- c(x, 6)
print("Array after 1st modification")
print(x)
# addition of element using append function
x <- append(x, 7)
print("Array after 2nd modification")
print(x)
# adding elements after computing the length
len <- length(x)
x[len + 1] <- 8
print("Array after 3rd modification")
print(x)
Output:
[1] "Array after 1st modification"
[1] 1 2 3 4 5 6
[1] "Array after 2nd modification"
[1] 1 2 3 4 5 6 7
[1] "Array after 3rd modification"
[1] 1 2 3 4 5 6 7 8
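Two cases the example above does not show are inserting in the middle with append(after = ) and assigning more than one position past the end. A minimal sketch on a small illustrative vector:

```r
x <- c(1, 2, 3)

# append() with after = 1 inserts the new values after the first element
x <- append(x, c(10, 11), after = 1)
print(x)          # 1 10 11 2 3

# Assigning two positions past the end pads the gap with NA
x[length(x) + 2] <- 99
print(x)          # 1 10 11 2 3 NA 99
```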
Removing Elements from Array
• Elements can be removed from arrays in R, either one at a time or several together.
• The array is indexed with a condition on its values: the elements that satisfy the condition are retained and the rest are removed.
• Multiple conditions can also be combined to remove a range of elements.
• Another way to remove elements is with the %in% operator: x %in% remove is TRUE for the values to be removed, so x[!x %in% remove] keeps only the others.
EXAMPLE
# Creating an array of length 9
m <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)
print("Original Array")
print(m)
# Remove a single value element (3) from the array
m <- m[m != 3]
print("After 1st modification")
print(m)
# Removing elements based on a condition (greater than 2 and less than or equal to 8)
m <- m[m > 2 & m <= 8]
print("After 2nd modification")
print(m)
# Remove a sequence of elements using another array
remove <- c(4, 6, 8)
# Check which elements satisfy the remove property
print(m %in% remove)
print("After 3rd modification")
print(m[!m %in% remove])
Output:
[1] "Original Array"
[1] 1 2 3 4 5 6 7 8 9
[1] "After 1st modification"
[1] 1 2 4 5 6 7 8 9
[1] "After 2nd modification"
[1] 4 5 6 7 8
[1]  TRUE FALSE  TRUE FALSE  TRUE
[1] "After 3rd modification"
[1] 5 7
Updating Existing Elements of Array
• The elements of the array can be updated with new values by assigning the modified value to the desired index of the array.
• The changes are retained in the original array. If the index to be updated is within the length of the array, the value is changed; otherwise, the new element is added at the specified index (and any gap is filled with NA).
• Multiple elements can also be updated at once, either with the same value or with multiple values given as a vector.
# creating an array of length 9
m <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)
print("Original Array")
print(m)
# updating single element
m[1] <- 0
print("After 1st modification")
print(m)
# updating sequence of elements
m[7:9] <- -1
print("After 2nd modification")
print(m)
Output:
[1] "Original Array"
[1] 1 2 3 4 5 6 7 8 9
[1] "After 1st modification"
[1] 0 2 3 4 5 6 7 8 9
[1] "After 2nd modification"
[1]  0  2  3  4  5  6 -1 -1 -1
Check if an Item Exists
• To find out if a specified item is present in an array, use the %in% operator:
• Example: check if the value 2 is present in the array:
thisarray <- c(1:24)
multiarray <- array(thisarray, dim = c(4, 3, 2))
2 %in% multiarray
• OUTPUT:
[1] TRUE
Loop Through an Array
thisarray <- c(1:24)
multiarray <- array(thisarray, dim = c(4, 3, 2))
for (x in multiarray) {
  print(x)
}
OUTPUT
The 24 values are printed one per line, in column-major order:
[1] 1
[1] 2
[1] 3
and so on, up to
[1] 24
Number of Rows and Columns
• Use the dim() function to find the number of rows and columns (and, for a 3-D array, the number of matrices) in an array:
thisarray <- c(1:24)
multiarray <- array(thisarray, dim = c(4, 3, 2))
dim(multiarray)
OUTPUT:
[1] 4 3 2
Array Length
• Use the length() function to find the total number of elements in an array:
• Example
thisarray <- c(1:24)
multiarray <- array(thisarray, dim = c(4, 3, 2))
length(multiarray)
• OUTPUT:
[1] 24
Calculations across the elements
• We can do calculations across the elements in an array using the apply() function.
• Syntax: apply(x, margin, func)
• x is an array, margin is the dimension (or dimensions) over which func is applied (1 = rows, 2 = columns, 3 = matrices), and func is the function to be applied.
V1 <- c(1,2,3)
V2 <- c(10,20,30,40,50,60)
A <- array(c(V1,V2), dim = c(3,3,2))
B <- apply(A, c(1), sum)    # sum of each row across both matrices
C <- apply(A, c(2), sum)    # sum of each column across both matrices
EXAMPLE
# Creating two vectors of different lengths
vec1 <- c(1,3,5)
vec2 <- c(10,11,12,13,14,15)

# Taking the vectors as input to the array1
res1 <- array(c(vec1,vec2), dim = c(3,3,2))
print(res1)

# using apply function
result <- apply(res1, c(1), sum)
print(result)
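The margin argument decides which dimensions survive in the result. A sketch with the 3 x 3 x 2 array built from V1 and V2 earlier, showing each margin side by side:

```r
# The same 3 x 3 x 2 array as in the apply() syntax example
A <- array(c(1, 2, 3, 10, 20, 30, 40, 50, 60), dim = c(3, 3, 2))

# MARGIN = 1: one value per row, summed over columns and both matrices
print(apply(A, 1, sum))          # 102 144 186

# MARGIN = 2: one value per column
print(apply(A, 2, sum))          # 12 120 300

# MARGIN = 3: one value per matrix
print(apply(A, 3, sum))          # 216 216

# MARGIN = c(1, 2): combine the matrices cell by cell, giving a 3 x 3 result
print(apply(A, c(1, 2), sum))
```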
Array Arithmetic
• To perform arithmetic between the matrices stored in one array, we first extract each matrix (a two-dimensional slice) from the array.
V1 <- c(1,2,3)
V2 <- c(10,20,30,40,50,60)
A <- array(c(V1,V2), dim = c(3,3,2))
mat.a <- A[ , , 1]
mat.b <- A[ , , 2]
mat.a + mat.b
mat.a - mat.b
mat.a * mat.b
mat.a / mat.b
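Arithmetic operators also apply element by element to a whole array, without slicing; slicing is needed only to combine matrices within the same array. A short sketch with the same A; the name B is illustrative:

```r
V1 <- c(1, 2, 3)
V2 <- c(10, 20, 30, 40, 50, 60)
A <- array(c(V1, V2), dim = c(3, 3, 2))

# Operating on the whole array keeps its 3 x 3 x 2 shape
B <- A * 2
print(dim(B))                                          # 3 3 2

# Combining the two matrices of A still needs slices
print(identical(A[, , 1] + A[, , 2], 2 * A[, , 1]))    # TRUE
```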
MATHS
Math Functions
• R provides various mathematical functions to perform mathematical calculations.
• These functions are useful for finding absolute values, square roots, logarithms and more.
• In R, the following functions are commonly used:
Example
1. abs(x): returns the absolute value of input x.
x <- -4
print(abs(x))
Output
[1] 4
2. sqrt(x): returns the square root of input x.
x <- 4
print(sqrt(x))
Output
[1] 2
3. ceiling(x): returns the smallest integer which is larger than or equal to x.
x <- 4.5
print(ceiling(x))
Output
[1] 5
4. floor(x): returns the largest integer which is smaller than or equal to x.
x <- 2.5
print(floor(x))
Output
[1] 2
5. trunc(x): returns the integer part of x, dropping the fractional part (truncating toward zero).
x <- c(1.2, 2.5, 8.1)
print(trunc(x))
Output
[1] 1 2 8
6. round(x, digits = n): returns x rounded to n decimal places.
x <- 3.14159
print(round(x, digits = 2))
Output
[1] 3.14
7. cos(x), sin(x), tan(x): return the cosine, sine and tangent of input x (in radians).
x <- 4
print(cos(x))
print(sin(x))
print(tan(x))
Output
[1] -0.6536436
[1] -0.7568025
[1] 1.157821
8. log(x): returns the natural logarithm of input x.
x <- 4
print(log(x))
Output
[1] 1.386294
9. log10(x): returns the common logarithm of input x.
x <- 4
print(log10(x))
Output
[1] 0.60206
10. exp(x): returns the exponential e^x of input x.
x <- 4
print(exp(x))
Output
[1] 54.59815
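The rounding functions above differ only for negative numbers, where floor() and trunc() no longer agree. A small side-by-side sketch on an illustrative vector x:

```r
x <- c(-2.7, -1.2, 1.2, 2.7)

print(floor(x))     # -3 -2  1  2   (toward -Inf)
print(ceiling(x))   # -2 -1  2  3   (toward +Inf)
print(trunc(x))     # -2 -1  1  2   (toward zero)
print(round(x))     # -3 -1  1  3   (nearest integer)

# round() with digits keeps decimal places
print(round(3.14159, digits = 2))   # 3.14
```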
Factors
• A factor is a data structure used for fields that take only a predefined, finite number of values (categorical data).
• They are used to categorize the data and store it as levels.
• They can store both strings and integers.
• For example, a data field such as marital status may contain only values from single, married, separated, divorced and widowed. In such a case, the possible values are predefined and distinct, and are called levels.
Creating factors
• Factors are created with the factor() function, which takes a vector as input.
• A factor contains a predefined set of values called levels. By default, R sorts the levels in alphabetical order.
directions <- c("North", "North", "West", "South")
factor(directions)
Output:
[1] North North West  South
Levels: North South West
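Internally each value of a factor is stored as an index into its levels, and the levels argument of factor() can fix their order or include values that do not occur. A sketch using the same directions vector; the names f and f2 and the extra level "East" are illustrative:

```r
directions <- c("North", "North", "West", "South")

f <- factor(directions)
print(levels(f))          # "North" "South" "West" (sorted alphabetically)
print(as.integer(f))      # 1 1 3 2 (each value stored as a level index)

# Supplying levels fixes their order and can include unseen values
f2 <- factor(directions, levels = c("North", "East", "South", "West"))
print(table(f2))          # East appears with a count of 0
```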
Accessing Factor
• There are various ways to access the elements of a factor in R. Some of them are as follows:
data <- factor(c("East", "West", "East", "North"))
data[4]
data[c(2,3)]
data[-1]
data[c(TRUE, FALSE, TRUE, TRUE)]
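Modifying Factor
• A factor element can only be changed to a value that is already one of its predefined levels.
data <- factor(c("East", "West", "East", "North"))
print(data)
data[2] <- "North"    # allowed: "North" is a level
data[3] <- "South"    # not a level: R warns "invalid factor level, NA generated" and stores NA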
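To store a value that is not yet a level, the level has to be added first. A minimal sketch using levels<-, reusing the same data factor:

```r
data <- factor(c("East", "West", "East", "North"))

# "South" is not yet a level, so append it to the existing levels first
levels(data) <- c(levels(data), "South")
data[3] <- "South"      # now accepted without a warning
print(data)
print(levels(data))     # "East" "North" "West" "South"
```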
Data Frames
• A data frame is used for storing data tables.
• It is a list of vectors of equal length.
• A data frame is a table, or a two-dimensional array-like structure, in which each column contains the values of one variable and each row contains one set of values from each column.
• Data frames in R are generic data objects used to store tabular data.
• Data frames can also be interpreted as matrices in which each column can be of a different data type.
• A data frame is made up of three principal components: the data, rows, and columns.
Characteristics of a data frame
1. The column names should be non-empty.
2. The row names should be unique.
3. The data stored in a data frame can be of
numeric, factor or character type.
4. Each column should contain the same number of
data items.
Creating Data Frames
• We can create data frames using the function
data.frame().
• The top line of the table, called the header,
contains the column names.
• Each horizontal line after it denotes a data
row, which begins with the name of the row
and is followed by the actual data.
• Each data member of a row is called a cell.
• We can get the column names using the
function names().
• The number of rows is given by nrow().
• The number of columns is given by ncol().
• The length() function returns the length of the
underlying list, which equals the number of columns.
• The structure of a data frame can be retrieved
using the str() function.
• The statistical summary and nature of the data
can be obtained by applying the summary()
function.
Example 1:
# Create the data frame.
emp.data <- data.frame(
emp_id = c(1:5),
emp_name = c("Rick", "Dan", "Michelle", "Ryan", "Gary"),
salary = c(623.3, 515.2, 611.0, 729.0, 843.25),
start_date = as.Date(c("2012-01-01", "2013-09-23",
"2014-11-15", "2014-05-11",
"2015-03-27")),
stringsAsFactors = FALSE
)
# Print the data frame.
print(emp.data)
OUTPUT
EXAMPLE 2:

# R program to create a data frame
# creating a data frame
friend.data<- data.frame(
friend_id = c(1:5),
friend_name = c("Sachin", "Sourav",
"Dravid", "Sehwag",
"Dhoni"),
stringsAsFactors = FALSE
)
# print the data frame
print(friend.data)
OUTPUT
EXAMPLE 3:

# Create a data frame
Data_Frame <- data.frame (
Training = c("Strength", "Stamina", "Other"),
Pulse = c(100, 150, 120),
Duration = c(60, 30, 45)
)
# Print the data frame
Data_Frame
OUTPUT
Get the Structure of the Data Frame
The str() function displays the internal structure of a data frame,
and it can display even the structure of large nested lists.
# Get the structure of the data frame.
str(emp.data)
# using str(); it prints the structure itself, so print() is not needed
str(friend.data)
OUTPUT
• X <- data.frame("roll"=1:2,"name"=c("jack","jill"),"age"=c(20,22))
• print(X)
• names(X)
• nrow(X)
• ncol(X)
• str(X)
• summary(X)
Summary of Data in Data Frame
The statistical summary and nature of the data
can be obtained by applying the summary() function.
# Print the summary.
print(summary(emp.data))
# using summary()
print(summary(friend.data))
OUTPUT
Extract Data from Data Frame
Extract specific columns from a data frame using the
column names.
# Extract specific columns.
result <- data.frame(emp.data$emp_name, emp.data$salary)
print(result)
Extract the first two rows and then all columns
# Extract first two rows.
result <- emp.data[1:2,]
print(result)
# Extract 3rd and 5th row with 2nd and 4th column.
result <- emp.data[c(3,5),c(2,4)]
print(result)
Accessing Data Frame Components
• Components of a data frame can be accessed like a
list or like a matrix.
(a) Accessing like a list – we can use either the [[ or $
operator to access columns of a data frame.
• Accessing with [[ and $ is similar.
• X <- data.frame("roll"=1:2,"name"=c("jack","jill"),"age"=c(20,22))
• X$name
• X[["name"]]
• X[[3]] # retrieves the third column (age) as a vector
(b) Accessing like a matrix – a data frame can be
accessed like a matrix by providing indices for
rows and columns.
• Indexing with [] returns a data frame, unlike [[ and $,
which reduce the result to a vector. (When the [row, col]
form selects a single column, the result is also simplified
to a vector unless drop = FALSE is given.)
• We can use the head() function to display the first
n rows.
• Negative indices are also
allowed in data frames.
• X <- data.frame("roll"=1:3,"name"=c("jack","jill","Tom"),"age"=c(20,22,23))
• X["name"]
• X[1:2,]
• X[, 2:3]
• X[c(1,2),c(2,3)]
• X[,-1]
• X[-1,]
• X[X$age>21,]
• head(X,2)
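The sketch below compares the result types of the access methods above on the same X. It assumes R 4.0 or later, where data.frame() keeps strings as character by default (stringsAsFactors = FALSE).

```r
X <- data.frame(roll = 1:3, name = c("jack", "jill", "Tom"), age = c(20, 22, 23))

by_dollar  <- X$name                 # character vector
by_double  <- X[["name"]]            # character vector (same as $)
by_single  <- X["name"]              # data frame with one column
by_matrix  <- X[, 2]                 # a single column is dropped to a vector
keep_frame <- X[, 2, drop = FALSE]   # drop = FALSE keeps a data frame

print(class(by_dollar))    # "character"
print(class(by_single))    # "data.frame"
print(class(by_matrix))    # "character"
print(class(keep_frame))   # "data.frame"
```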
Expand Data Frame
A data frame can be expanded by adding
columns and rows.
Add Column
• Add the column vector under a new
column name, for example with cbind().
Example 1:
Data_Frame <- data.frame (
Training = c("Strength", "Stamina", "Other"),
Pulse = c(100, 150, 120),
Duration = c(60, 30, 45)
)
# Add a new column
New_col_DF <- cbind(Data_Frame, Steps = c(1000, 6000, 2000))
# Print the new column
New_col_DF
OUTPUT
Add Row
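Example 2:
Data_Frame <- data.frame (
Training = c("Strength", "Stamina", "Other"),
Pulse = c(100, 150, 120),
Duration = c(60, 30, 45)
)
# Add a new row. Supplying it as a one-row data frame keeps
# Pulse and Duration numeric; binding c("Strength", 110, 110)
# would turn them into character columns.
New_row_DF <- rbind(Data_Frame, data.frame(Training = "Strength", Pulse = 110, Duration = 110))
# Print the new row
New_row_DF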
OUTPUT
Combining Data Frames
• Use the rbind() function to combine two or more data frames in R
vertically (they must have the same column names):
Example 3:
Data_Frame1 <- data.frame (
Training = c("Strength", "Stamina", "Other"),
Pulse = c(100, 150, 120),
Duration = c(60, 30, 45)
)
Data_Frame2 <- data.frame (
Training = c("Stamina", "Stamina", "Strength"),
Pulse = c(140, 150, 160),
Duration = c(30, 30, 20)
)
New_Data_Frame<- rbind(Data_Frame1, Data_Frame2)
New_Data_Frame
OUTPUT
• Use the cbind() function to combine two or more data frames
in R horizontally (they must have the same number of rows):
Example 4:
Data_Frame3 <- data.frame (
Training = c("Strength", "Stamina", "Other"),
Pulse = c(100, 150, 120),
Duration = c(60, 30, 45)
)
Data_Frame4 <- data.frame (
Steps = c(3000, 6000, 2000),
Calories = c(300, 400, 300)
)
New_Data_Frame1 <- cbind(Data_Frame3, Data_Frame4)
New_Data_Frame1
OUTPUT
Remove Rows and Columns
Use negative indices inside c() to remove rows and columns from
a data frame:
Data_Frame<- data.frame (
Training = c("Strength", "Stamina", "Other"),
Pulse = c(100, 150, 120),
Duration = c(60, 30, 45)
)
# Remove the first row and column
Data_Frame_New<- Data_Frame[-c(1), -c(1)]
# Print the new data frame
Data_Frame_New
OUTPUT
• Number of rows and columns, and data frame
length
• dim(Data_Frame)
• ncol(Data_Frame)
• nrow(Data_Frame)
• length(Data_Frame)
Modifying Data Frames
• Data frames can be modified like matrices,
through reassignment.
• X <- data.frame("roll"=1:3,"name"=c("jack","jill","Tom"),"age"=c(20,22,23))
• X[1,"age"] <- 25
• A data frame can be expanded by adding columns and rows.
• We can add a column vector using a new column name.
• Columns can also be added using the cbind() function.
• Similarly, rows can be added using the rbind() function.
• A data frame column can be deleted by assigning NULL to it.
• Similarly, rows can be deleted through reassignment.
• print(X$bloodgroup <- c("A+","B-","AB+"))
# adding new column using cbind()
• print(X <- cbind(X,city=c("delhi","mumbai","chennai")))
# adding new row using rbind(); a one-row data frame keeps the column types
• print(X <- rbind(X, data.frame(roll=4, name="Jack", age=24, bloodgroup="B+", city="Delhi")))
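A short sketch of the two deletion rules above, on a fresh copy of X: assigning NULL drops a column, and reassigning with a negative index drops a row.

```r
X <- data.frame(roll = 1:3, name = c("jack", "jill", "Tom"), age = c(20, 22, 23))
X$city <- c("delhi", "mumbai", "chennai")   # add a column under a new name

X$city <- NULL   # delete the column again by assigning NULL
X <- X[-2, ]     # delete the second row by reassigning without it
print(X)
```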
Aggregating Data
• It is relatively easy to collapse data in R using
one or more by variables and a defined
function.
• When using the aggregate() function, the by
variables must be in a list, even if there is only
one column.
• The function can be a built-in function such as
mean, max, min or sum, or a user-provided
function.
• X <- data.frame("roll"=1:11,
"name"=c("jack","jill","jeeva","smith","bob","smith","john",
"mathew","charle","zen","yug"),
"age"= c(20,20,30,21,19,21,19,18,22,25,21),
"marks" = c(100,98,99,75,80,90,88,43,87,43,89))
• print(X)
• aggdata <- aggregate(X$marks,list(m=X$age),mean)
• print(aggdata)
• aggdata <- aggregate(X$marks,list(m=X$age),max)
• print(aggdata)
• aggdata <- aggregate(X$marks,list(m=X$age),sum)
• print(aggdata)
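aggregate() also has a formula interface, which names the result columns after the variables. A minimal sketch with the same X, assuming only base R:

```r
X <- data.frame(
  roll  = 1:11,
  name  = c("jack", "jill", "jeeva", "smith", "bob", "smith",
            "john", "mathew", "charle", "zen", "yug"),
  age   = c(20, 20, 30, 21, 19, 21, 19, 18, 22, 25, 21),
  marks = c(100, 98, 99, 75, 80, 90, 88, 43, 87, 43, 89)
)

# marks grouped by age; the result has columns "age" and "marks"
by_age <- aggregate(marks ~ age, data = X, FUN = mean)
print(by_age)

# several grouping variables are combined with +
by_age_name <- aggregate(marks ~ age + name, data = X, FUN = max)
```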
Sorting Data
• To sort a data frame in R, use the order()
function.
• By default, sorting is ascending.
• We can sort a numeric variable in descending order
by putting a minus sign in front of it.
• X <- data.frame("roll"=1:11,"name"=c("jack","jill","jeeva","smith","bob","smith","john","mathew","charle","zen","yug"),
"age"= c(20,20,30,21,19,21,19,18,22,25,21),
"marks" = c(100,98,99,75,80,90,88,43,87,43,89))
# sort by name
• newdata <- X[order(X$name),]
# sort by age and within age by name
• newdata <- X[order(X$age,X$name),]
# sort by name ascending and within name by age descending
• newdata <- X[order(X$name,-X$age),]
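The minus-sign trick only works for numeric columns; -X$name would fail on a character vector. A small sketch of alternatives, using a few of the names above: decreasing = TRUE reverses the whole ordering, and xtfrm() (base R) converts a character column to numbers so it can be negated inside order().

```r
X <- data.frame(
  name = c("jack", "jill", "jeeva", "smith", "bob"),
  age  = c(20, 20, 30, 21, 19)
)

# Numeric column: a minus sign gives descending order (oldest first)
by_age_desc <- X[order(-X$age), ]

# Character column: use decreasing = TRUE instead of a minus sign
by_name_desc <- X[order(X$name, decreasing = TRUE), ]

# Age ascending, and within each age name descending:
# xtfrm() turns the names into sortable numbers that can be negated
by_age_then_name_desc <- X[order(X$age, -xtfrm(X$name)), ]
print(by_age_then_name_desc)
```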
Merging Data
• We can merge two data frames (datasets)
horizontally by using the merge() function.
• In most cases, we join two data frames by
one or more common key variables (i.e. an inner join).
• There are different types of join: inner join,
outer join, left outer join, right outer join and
cross join.
• The following points should be kept in mind while
performing join operations:
1. An inner join of two data frames df1 and df2 returns
only the rows in which the left table has matching
keys in the right table.
2. An outer join of df1 and df2 returns
all rows from both tables, joining the records from the left
that have matching keys in the right table and filling
the unmatched cells with NA.
3. A left outer join (or simply left join) of df1 and df2
returns all rows from the left table, and
any rows with matching keys from the right table.
4. A right outer join of df1 and df2
returns all rows from the right table, and any rows
with matching keys from the left table.
5. A cross join of df1 and df2 returns every combination
of their rows, so the result has as many rows as the first
table multiplied by the number of rows in the second table.
• df1 = data.frame(CustomerId = c(1:6), product = c(rep("toaster",3),rep("radio",3)))
• df2 = data.frame(CustomerId = c(2,4,6), state = c(rep("alabama",2),rep("ohio",1)))
• print(df1)
• print(df2)
• # inner join
• merge(df1,df2, by= "CustomerId")
• # outer join
• merge(x=df1,y=df2, by= "CustomerId",all=TRUE)
• # left outer join
• merge(x=df1,y=df2, by= "CustomerId",all.x=TRUE)
• # right outer join
• merge(x=df1,y=df2, by= "CustomerId",all.y=TRUE)
• # cross join
• merge(x=df1,y=df2, by= NULL)
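One way to see how the joins differ is to count the rows each produces for df1 and df2. A minimal sketch, base R only:

```r
df1 <- data.frame(CustomerId = 1:6, product = c(rep("toaster", 3), rep("radio", 3)))
df2 <- data.frame(CustomerId = c(2, 4, 6), state = c(rep("alabama", 2), "ohio"))

inner <- merge(df1, df2, by = "CustomerId")
outer <- merge(df1, df2, by = "CustomerId", all = TRUE)
left  <- merge(df1, df2, by = "CustomerId", all.x = TRUE)
right <- merge(df1, df2, by = "CustomerId", all.y = TRUE)
cross <- merge(df1, df2, by = NULL)

# Row counts show how each join behaves
print(c(inner = nrow(inner), outer = nrow(outer), left = nrow(left),
        right = nrow(right), cross = nrow(cross)))
# Customers 1, 3 and 5 have no state, so the left join fills NA
print(left)
```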
Reshaping Data
• R provides a variety of methods for reshaping
data prior to analysis.
• Two important functions for reshaping data are
the melt() and cast() functions.
• These functions are available in the reshape package.
• Before using these functions, make sure that the
package is installed (install.packages("reshape")) and
loaded with library(reshape).
• We can “melt” the data so that each row is a
unique id-variable combination. Then we can
“cast” the melted data into any shape we would
like.
• y <- data.frame("id"=c(1,2,1,2,1), "age"=c(20,20,21,21,19),
"marks1"=c(80,60,70,80,90),"marks2"=c(100,98,99,75,80))
• print(y)
• library(reshape)
• # melting data
• mdata = melt(y, id=c("id","age"))
• # cast(data, formula, function)
• # mean marks for each id
• markmeans <- cast(mdata,id~variable,mean)
• # mean marks for each age
• agemeans <- cast(mdata,age~variable,mean)
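If the reshape package is not available, the same idea can be sketched in base R. The code below builds the melted (long) table by hand and then uses aggregate() and tapply() in place of cast(); this is a substitute technique, not the reshape API.

```r
y <- data.frame(id = c(1, 2, 1, 2, 1), age = c(20, 20, 21, 21, 19),
                marks1 = c(80, 60, 70, 80, 90), marks2 = c(100, 98, 99, 75, 80))

# "Melt" by hand: one row per (id, age, variable) with its value
measure <- c("marks1", "marks2")
long <- data.frame(
  id       = rep(y$id, times = length(measure)),
  age      = rep(y$age, times = length(measure)),
  variable = rep(measure, each = nrow(y)),
  value    = unlist(y[measure], use.names = FALSE)
)

# "Cast" in long form: mean value for each id and variable
id_means <- aggregate(value ~ id + variable, data = long, FUN = mean)

# Wide form, similar in spirit to cast(mdata, id ~ variable, mean)
wide <- tapply(long$value, list(long$id, long$variable), mean)
print(wide)
```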
Subsetting Data
• The subset() function is the easiest way to
select variables and observations.
• In the following example, we select all rows that
have an age greater than or equal to 25
and less than 30.
• Similarly, we select all rows with
name "smith" or name "john".
• X <- data.frame("roll"=1:11,
"name"=c("jack","jill","jeeva","smith","bob","smith","john","mathew","charle","zen","yug"),
"age"= c(20,20,30,21,19,21,19,18,22,25,21))
• print(X)
• newdata <- subset(X,age>=25&age<30,select=c(roll,name,age))
• print(newdata)
• newdata <- subset(X,name=="smith"|name=="john",select=roll:age)
• print(newdata)
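The subset() calls above have equivalent bracket forms. The sketch below checks that they agree and uses %in% as a shorter way to match several names (the variable names by_subset, by_bracket and by_names are illustrative).

```r
X <- data.frame(roll = 1:11,
                name = c("jack", "jill", "jeeva", "smith", "bob", "smith",
                         "john", "mathew", "charle", "zen", "yug"),
                age = c(20, 20, 30, 21, 19, 21, 19, 18, 22, 25, 21))

# subset() form: rows with 25 <= age < 30, three columns
by_subset <- subset(X, age >= 25 & age < 30, select = c(roll, name, age))

# Equivalent bracket form
by_bracket <- X[X$age >= 25 & X$age < 30, c("roll", "name", "age")]

# %in% replaces a chain of == tests joined by |
by_names <- subset(X, name %in% c("smith", "john"), select = roll:age)
print(by_names)
```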
Unit 3
Conditions and loops
• Decision making structures are used by the
programmer to specify one or more
conditions to be evaluated or tested by the
program.
• A statement or statements need to be
executed if the condition is TRUE and
optionally other statements to be executed if
the condition is FALSE.
• Control statements are expressions used to
control the execution and flow of the program
based on the conditions provided in the
statements.
• These structures are used to make a decision
after assessing the variable.
• In R programming, there are 8 types of control
statements as follows:
• if condition
• if-else condition
• for loop
• nested loops
• while loop
• repeat and break statement
• return statement
• next statement

Decision Making
• R provides the following types of decision
making statements: the if
statement, the if..else statement, the nested if…else
statement, the ifelse() function and the switch
statement.
• if condition
• This control structure checks whether the expression
provided in parentheses is true. If it is true,
the statements in braces {} are executed.
• Syntax:
if(expression)
{ statements .... ....
}
x <- 100
if(x > 10){
print(paste(x, "is greater than 10"))
}
Output:
[1] "100 is greater than 10"
if Statement
• An if statement consists of a boolean
expression followed by one or more
statements. The syntax is:
• if (boolean_expression)
{
# statements will execute if the boolean
# expression is true.
}
• If the boolean_expression evaluates to TRUE,
then the block of code inside the if statement
will be executed.
• If boolean_expression evaluates to FALSE,
then the first statement after the end of the if
statement will be executed.
• Here the boolean expression can be a logical or
numeric value, but it must have length one: since
R 4.2.0 a condition of length greater than one is an error
(older versions used only the first element, with a warning).
• For a numeric value, zero is taken as
FALSE and any other number as TRUE.
• x <- 10
if (x > 0)
{
cat(x, "is a positive number\n")
}
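Because if() needs a single TRUE or FALSE, a vector test is usually reduced first with all() or any(). A minimal sketch (the vector v is arbitrary):

```r
v <- c(3, -1, 5)

# all() and any() turn a logical vector into one TRUE/FALSE
if (all(v > 0)) {
  msg <- "all values are positive"
} else if (any(v > 0)) {
  msg <- "some values are positive"
} else {
  msg <- "no value is positive"
}
print(msg)   # "some values are positive"
```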
• if-else condition
• It is similar to the if condition, but when the test
expression in the if condition fails, the
statements in the else block are executed.
Syntax:
if(expression)
{ statements .... ....
} else
{ statements .... ....
}
if….else Statement
• An if statement can be followed by an optional else
statement, which executes when the boolean expression is
FALSE.
• The syntax of if…else is:
if (boolean_expression) {
# if expression is true
} else {
# if expression is false
}
• If the boolean_expression evaluates to
TRUE, the if block of code will be executed;
otherwise the else block of code will be executed.
Example:

x <- 5
# Check value is less than or greater than 10
if(x > 10){
print(paste(x, "is greater than 10"))
}else{
print(paste(x, "is less than 10"))
}
Output:
[1] "5 is less than 10"
• x <- -5
if (x > 0) {
cat(x, "is a positive number\n")
} else {
cat(x, "is a negative number\n")
}
• We can write the if…else statement in a single
line if the if and else blocks each contain only one
statement, as follows.
• if (x > 0) cat(x, "is a positive no\n") else cat(x, "is a negative no\n")
Nested if…else Statement
• An if statement can be followed by an optional
else if…else statement, which is very useful to
test various conditions using a single if…else if
statement.
• We can chain as many if..else statements as we
want.
• Only one block will get executed,
depending upon the boolean expressions.
• if (boolean_expression_1) {
# execute when expression 1 is true.
} else if (boolean_expression_2) {
# execute when expression 2 is true.
} else if (boolean_expression_3) {
# execute when expression 3 is true.
} else {
# execute when none of the above conditions is
# true.
}
• x <- 19
if (x < 0) {
cat(x, "is a negative number")
} else if (x > 0) {
cat(x, "is a positive number")
} else {
print("zero")
}
ifelse() function
• Most functions in R take a vector as input and
output a resulting vector.
• Such vectorized code is much faster
than applying the same function to each element
of the vector individually.
• There is an easier way to use if..else
specifically for vectors in R.
• We can use the ifelse() function instead, which is the
vector equivalent form of the if..else statement.
• ifelse(boolean_expression, x, y)
• Here, boolean_expression must be a logical
vector.
• The return value is a vector with the same length
as boolean_expression.
• This returned vector has an element from x if the
corresponding value of boolean_expression is
TRUE, or from y if the corresponding value of
boolean_expression is FALSE.
• For example, the ith element of the result will be x[i]
if boolean_expression[i] is TRUE, else it will take
the value of y[i].
• The vectors x and y are recycled whenever
necessary.
• a = c(5,7,2,9)
ifelse(a %% 2 == 0, "even", "odd")
• o/p = ?
• In the above example, the boolean_expression
is a %% 2 == 0, which results in the
vector (FALSE, FALSE, TRUE, FALSE).
• Similarly, the other two arguments of the function
get recycled to ("even", "even",
"even", "even") and ("odd", "odd", "odd",
"odd") respectively.
• Hence the result is evaluated element by element.
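A sketch of the ifelse() example above, plus one where the "yes" argument is a single value that gets recycled and the "no" argument is the vector itself:

```r
a <- c(5, 7, 2, 9)
parity <- ifelse(a %% 2 == 0, "even", "odd")
print(parity)    # "odd" "odd" "even" "odd"

# 0 is recycled to the length of a; elements not above 6 are kept
clipped <- ifelse(a > 6, 0, a)
print(clipped)   # 5 0 2 0
```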
switch Statement
• A switch statement allows a variable to be tested
for equality against a list of values.
• Each value is called a case, and the variable being
switched on is checked for each case.
• switch(expression, case1, case2, case3, ...)
• If the value of expression is not a character string,
it is coerced to integer.
• We can have any number of case statements within a
switch.
• A named case is written as name = value.
• If the value of the integer is between 1 and
nargs()-1 (the number of cases), then the
corresponding case is
evaluated and the result is returned.
• If expression evaluates to a character string,
then the string is matched (exactly) to the
names of the elements.
• If there is more than one match, the first
matching element is returned.
• For a numeric expression there is no default case: if the number
is out of range, switch() returns NULL invisibly. For a character
expression, an unnamed final argument acts as the default.
• switch(2, "red", "green", "blue")
• switch("color", "color" = "red", "shape" = "square", "length" = 5)
• Output: [1] "green"
[1] "red"
• If the value evaluated is a number, that item of the list
is returned.
• In the above example, "red", "green", "blue" form a
three-item list. The switch() function returns the
item corresponding to the numeric value evaluated.
• In the above example, "green" is returned.
• The expression can also evaluate to a string.
• In this case, the matching named item's value is
returned.
• In the above example, "color" is the string that is
matched and its value "red" is returned.
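A sketch of the character form of switch() with a fall-through case and a default; shape_sides() is a hypothetical helper written for illustration.

```r
# Hypothetical helper: number of sides for a named shape
shape_sides <- function(shape) {
  switch(shape,
         triangle = 3,
         square = ,        # empty case falls through to the next value
         rectangle = 4,
         0)                # unnamed last argument is the default
}
print(shape_sides("triangle"))   # 3
print(shape_sides("square"))     # 4
print(shape_sides("circle"))     # 0

# A number out of range returns NULL (invisibly)
print(is.null(switch(5, "red", "green", "blue")))   # TRUE
```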
Question
• Write a program in R to check if a given year
is a leap year or not.
• Write a program in R to find the largest of
three numbers using if-else.
• Write a program in R to check if a given
character is a vowel or consonant.
• Write a program in R to check if a given
number is a prime number.
Loops
• In general, statements are executed
sequentially.
• Loops are used in programming to repeat a
specific block of code.
• R provides various looping structures like for
loop, while loop and repeat loop.
for loop
• A for loop is a repetition control structure that allows us
to efficiently write a loop that needs to execute a
specific number of times.
• A for loop is used to iterate over a vector in R
programming.
for (value in sequence)
{
statements
}
• Here sequence is a vector, and value takes on each of
its values during the loop.
• In each iteration, the statements are evaluated.
• for loop
• It is a type of loop in which a sequence of statements
is executed repeatedly, once for each element of a vector.
Syntax:
for(value in vector)
{ statements .... ....
}
x <- letters[4:10]

for(i in x){
print(i)
}
Output:
[1] "d"
[1] "e"
[1] "f"
[1] "g"
[1] "h"
[1] "i"
[1] "j"
• X <- c(2,5,3,9,8,11,6)
count <- 0
for(val in X)
{
if (val %% 2 == 0)
count = count+1
}
cat("no of even numbers in", X, "is", count, "\n")
• o/p = ?
• The for loop in R is flexible in that it is not
limited to integers in the input.
• We can pass character vectors, logical vectors,
lists or expressions.
• Example:
• V <- c("a", "e", "i", "o", "u")
for ( vowel in V)
{
print(vowel)
}
• o/p- ?
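When the position of each element is needed, a common pattern is to loop over seq_along(), which also behaves correctly for an empty vector (unlike 1:length(x)). A sketch that redoes the even-number count this way, with the vectorized equivalent for comparison:

```r
X <- c(2, 5, 3, 9, 8, 11, 6)
count <- 0
for (i in seq_along(X)) {   # i takes the values 1, 2, ..., length(X)
  if (X[i] %% 2 == 0) {
    count <- count + 1
  }
}
cat("no of even numbers is", count, "\n")

# Vectorized equivalent: TRUE counts as 1 in sum()
count_vec <- sum(X %% 2 == 0)
```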
• Nested loops
• Nested loops are loops placed inside other loops.
They are often used to work through the
elements of a matrix.
# Defining matrix
m <- matrix(2:15, 2)

for (r in seq(nrow(m))) {
for (c in seq(ncol(m))) {
print(m[r, c])
}
}
• Output:
• [1] 2 [1] 4 [1] 6 [1] 8 [1] 10 [1] 12 [1] 14 [1] 3
[1] 5 [1] 7 [1] 9 [1] 11 [1] 13 [1] 15
• while loop
• A while loop is another kind of loop; it repeats as
long as a condition remains TRUE. The test
expression is checked before each execution of
the body of the loop.
• Syntax:
while(expression)
{ statement .... ....
}
x=1

# Print 1 to 5
while(x <= 5){
print(x)
x=x+1
}
• Output:
• [1] 1 [1] 2 [1] 3 [1] 4 [1] 5
while loop
• A while loop is used to loop until a specific condition is
met.
• Syntax-
while ( test_expression)
{ statement
}
• Here, test expression is evaluated and the body of the
loop is entered if the result is TRUE.
• The statements inside the loop are executed and the
flow returns to evaluate the test_expression again.
• This is repeated until test_expression
evaluates to FALSE, at which point the loop exits.
num = 5
sum = 0
while (num > 0)
{ sum = sum + num
num = num - 1
}
cat("the sum is", sum, "\n")
• repeat loop and break statement
• repeat is a loop that can iterate any
number of times, but it has no built-in exit condition,
so a break statement is used to come out of the loop.
A break statement can be used in any type of loop
to exit from it.
Syntax:
repeat {
statements .... ....
if(expression) {
break
}
}
x=1

# Print 1 to 5
repeat{
print(x)
x=x+1
if(x > 5){
break
}
}
Output:
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
repeat loop
• A repeat loop is used to iterate over a block of
code multiple times.
• There is no condition check in a repeat loop to
exit the loop. We must ourselves put a
condition explicitly inside the body of the loop
and use the break statement to exit the loop.
• Otherwise it will result in an infinite loop.
repeat {
statements
if (condition)
{
break
}
}
Example
count <- 1
repeat {
print(count)
count <- count + 1
if (count > 5) {
break # Exit the loop when count is greater than 5
}
}
Output:
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
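The same sum can be written with while and with repeat; the sketch below contrasts where the exit test lives in each (the total_* names are illustrative).

```r
# while: the condition is tested before each pass
num <- 5
total_while <- 0
while (num > 0) {
  total_while <- total_while + num
  num <- num - 1
}

# repeat: the exit test has to be written inside the body
num <- 5
total_repeat <- 0
repeat {
  total_repeat <- total_repeat + num
  num <- num - 1
  if (num == 0) break
}
cat("the sum is", total_while, "and", total_repeat, "\n")   # both are 15
```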
Loop Control Statements
• Loop control statements are also known as
jump statements.
• Loop control statements change execution
from its normal sequence.
• Unlike C or C++, a loop in R does not create a new scope:
variables created inside the loop body are still
available after the loop ends.
• The loop control statements in R are break
statement and next statement.
break statement
• A break statement is used inside a loop
(repeat, for, while) to stop the iterations and
transfer control outside the loop.
• In a nested looping situation, where there is a
loop inside another loop, this statement exits
only the innermost loop that is being
evaluated.
• x<- 1:10
for( val in x) {
if (val == 3) {
break
}
print(val) }
• o/p = ?
• In the above example, we iterate over the vector
x, which has consecutive numbers from 1 to 10.
• Inside the for loop we have used an if condition
to break if the current value is equal to 3.
next statement
• A next statement is useful when we want to
skip the current iteration of a loop without
terminating it.
• On encountering next, R skips the rest of the
loop body and starts the next iteration of the
loop.
• This is equivalent to the continue statement in
C, Java and Python.
• X <- 1:10
for( val in X) {
if ( val == 3) {
next
}
print( val)
}
• We use the next statement inside a condition to
check if the value is equal to 3.
• If the value is equal to 3, the current iteration
stops (the value is not printed), but the loop continues
with the next iteration.
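A sketch combining both loop control statements: next skips multiples of 3, and break stops the loop once the value exceeds 8.

```r
kept <- c()
for (val in 1:10) {
  if (val %% 3 == 0) next   # skip multiples of 3
  if (val > 8) break        # leave the loop entirely
  kept <- c(kept, val)
}
print(kept)   # 1 2 4 5 7 8
```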
• return statement
• A return statement is used to return the result
of an executed function and to return control to
the calling function.
• Syntax:
• return(expression)
• Example:
# Checks whether a value is positive, negative or zero
func <- function(x){
if(x > 0){
return("Positive")
}else if(x < 0){
return("Negative")
}else{
return("Zero")
}
}

func(1)
func(0)
func(-1)
• Output:
• [1] "Positive"
• [1] "Zero"
• [1] "Negative"
Question
• Write a program in R to print the numbers
from 1 to 10 using a for loop.
• Write a program in R to find the sum of all
numbers from 1 to 100 using a for loop.
• Write a program in R to print the odd
numbers between 1 and 20 using a for loop.
• Write a program in R to calculate the factorial
of a given number using a for loop.
R Functions

• A set of statements organized
together to perform a specific task is known as
a function. R provides a series of built-in
functions, and it allows users to create
their own functions. Functions are used to
perform tasks in a modular way.
• Functions are used to avoid repeating the
same task and to reduce complexity.
To understand and maintain our code, we
logically break it into smaller parts using
functions. A function:
• is written to carry out a specified task.
• may or may not have arguments.
• contains a body in which our code is written.
• may or may not return one or more output
values.
• An R function is created by using the keyword
"function". The syntax of an R
function is:
• func_name <- function(arg_1, arg_2, ...) {
# function body
}
Components of Functions

• There are four components of a function, which are as follows:
• Function Name
The function name is the actual name of the function. In R, the function is stored as an
object with this name.
• Arguments
In R, an argument is a placeholder. Arguments are optional: a
function may or may not have arguments, and the arguments can have
default values. We pass a value to an argument when the function is invoked.
• Function Body
The function body contains a set of statements which defines what the function does.
• Return value
• The return value is the value of the last expression evaluated in the function body (unless return() is called earlier).
Function Types

• Similar to other languages, R has two
types of functions, i.e. built-in
functions and user-defined functions.
• In R, there are lots of built-in functions which
we can directly call in the program without
defining them. R also allows us to create our
own functions.
User-defined function

R allows us to create our own functions in a program. A user
defines a user-defined function to fulfil a specific requirement.
Once these functions are created, we can use them
like built-in functions.
# Creating a function without an argument.
new.function <- function() {
for(i in 1:5) {
print(i^2)
}
}

new.function()
Built-in function

• The functions which are already created or
defined in the programming framework are
known as built-in functions. Users don't need
to create these functions; they are built into
the language, and users can access them by simply
calling them. R has many built-in
functions such as seq(), mean(), max() and
sum().
• # Creating a sequence of numbers from 32 to 46.
• print(seq(32,46))
• # Finding the mean of numbers from 22 to 80.
• print(mean(22:80))
• # Finding the sum of numbers from 41 to 70.
• print(sum(41:70))
Functions
• Functions are used to logically break our code
into simpler parts which becomes easy to
maintain and understand.
• A function is a set of statements organized
together to perform a specific task.
• R has a large number of built-in functions, and
users can create their own functions.
• A function is an object, with or without
arguments.
Function Definition
• The reserved word function is used to declare a
function in R.
• func_name <- function(argument)
{
Statement
}
• Here, the reserved word function is used to declare a
function in R.
• This function object is given a name by assigning it to a
variable, func_name.
• The statements within the curly braces form the body
of the function. These braces are optional if the body
contains only a single expression.
• Following are the components of a function in R-
1. Function Name – This is the actual name of the
function. It is stored in R environment as an
object with this name.
2. Arguments – When a function is invoked, we can
pass values to the arguments. Arguments are
optional. A function may or may not contain
arguments. The arguments can also have default
values.
3. Function Body – The function body contains a
collection of statements that defines what the
function does.
Function Calling
• We can create user-defined functions in R. They are
specific to what a user wants, and once created they can
be used like built-in functions.
• power <- function(x,y)
{
result <- x^y
cat(x, "raised to the power", y, "is", result, "\n")
}
• power(2,3)
• Here, the arguments used in the function declaration, x
and y, are called formal arguments, and those used while
calling the function are called actual arguments.
Function without Arguments
• It is possible to create a function in R without
arguments.
• square <- function()
{
for( i in 1:5)
cat("square of", i, "is", (i*i), "\n")
}
• square()
Function with named Arguments
• When calling a function with named arguments, the order of the
actual arguments does not matter; we can pass the
arguments in any order.
• For example, all the function calls given below are
equivalent.
• power <- function(x,y)
{
result <- x^y
cat(x, "raised to the power", y, "is", result, "\n")
}
• power(2,3)
• power(x=2,y=3)
• power(y=3,x=2)
• Further, we can use named and unnamed
arguments in a single function call.
• In such a case, all the named arguments are
matched first and then the remaining
unnamed arguments are matched in
positional order.
• power(x=2,3)
• power(2,y=3)
Function with default Arguments
• We can assign default values to arguments in a
function in R. This is done by providing an
appropriate value to the formal argument in the
function declaration.
• The function named power is defined with a
default value for y in the following example
program. If no value is passed for y, then the
default value is taken.
• If a value is passed for y, then the default value
will be overridden.
• power <- function(x,y=2)
{
result <- x^y
cat(x, "raised to the power", y, "is", result, "\n")
}
• power(2)
• power(2,3)
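A sketch of the argument-matching rules above. This variant of power() returns x^y instead of printing it, a change made here only so the results can be checked.

```r
# Variant of power() that returns the value instead of printing it
power <- function(x, y = 2) {
  x^y   # the last expression is the return value
}
print(power(2))              # y uses its default: 4
print(power(2, 3))           # positional: 8
print(power(y = 3, x = 2))   # named, any order: 8
print(power(3, y = 1))       # named y first, then x by position: 3
```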
Built-in Functions
• There are several built-in functions available in
R. These functions can be directly used in user
written program.
• The built-in functions can be grouped into
mathematical functions, character functions,
statistical functions, probability functions,
date functions, time functions and other
useful functions.
Mathematical functions
1. abs()- this function computes the absolute value
of numeric data.
• The syntax is abs(x), where x is any numeric
value, array or vector.
• abs(-1)
• x <- c( -2,4,0,45,9,-4)
• abs(x)
• x <- matrix (c( -3,5,-7,1,-9,4), nrow= 3, ncol=2,
byrow=TRUE)
• abs(x[1, ])
• abs (x[, 1])
2. sin(), cos() and tan() – the function sin()
computes the sine value, cos() computes the
cosine value and tan() computes the tangent
value of numeric data in radians.
• The syntax is sin(x), cos(x), tan(x), where x is any
numeric value, array or vector.
• sin(10) , cos(90) , tan(50)
• x <- c( -2,4,0,45,9,-4)
• sin(x) , cos(x) , tan(x)
• x <- matrix (c( -3,5,-7,1,-9,4), nrow= 3, ncol=2,
byrow=TRUE)
• sin(x[1, ]) ,cos(x[,1 ]), tan(x[1,])
3. asin(), acos() and atan() – the asin() computes the
inverse sine value, acos() computes inverse cosine
value and atan() computes inverse tangent value
of numeric data in radians.
• asin(1), acos(1), atan(50)
4. exp(x) – this function computes the exponential
value e^x of a number or numeric vector.
• x=5 , exp(x)
5. ceiling() – This function returns the smallest integer
not less than its argument.
• x <- 2.5
• ceiling(x)
• 3
6. floor() – This function returns the largest integer
not greater than the given number.
• x <- 2.5
• floor(x)
7. round() – This function rounds a number to the
given number of decimal places.
• The syntax is round(x, digits=n), where x is a
numeric variable or vector and digits specifies
the number of decimal places to round to.
• x<- 2.587888
• round(x,3)
8. trunc() – This function returns the integer part
of a number, dropping its decimal part.
• x <- 2.99
• trunc(x)
9. signif(x, digits=n) – This function rounds the
values in its first argument to the specified
number of significant digits.
• x <- 2.587888
• signif(x,3)
• 2.59
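10. log(), log10(), log2(), log(x,b) – log() computes the natural logarithm of a number or vector;
log10() and log2() compute base-10 and base-2 logarithms, and log(x,b) computes the logarithm of x to base b.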
11. max() and min() – max() function computes the
max value of a vector and min() function
computes the minimum value of a vector.
• x <- c(10, 289, -100, 8000)
• max(x) , min(x)
12. beta() and lbeta() – the beta() function returns the beta
value and lbeta() returns the natural logarithm of
the beta function.
• beta(4,9)
• lbeta(4,9)
o/p - 0.0005050505, -7.590852
13. gamma() – this function returns the gamma
function Γ(x).
• x=5
• gamma(x)
• o/p – 24
14. factorial ()- this function computes factorial
of a number or a numeric vector.
• x=5
• factorial(x)
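The rounding functions above behave differently mainly for negative numbers and for values exactly halfway between two integers. The short sketch below compares them on an arbitrary illustrative vector x. Note that round() follows the IEC 60559 convention of rounding a trailing 5 to the even digit, so round(2.5) gives 2 rather than 3.

```r
# Compare R's built-in rounding functions on positive and negative values
x <- c(2.5, -2.5, 2.7, -2.7)

ceiling(x)          # smallest integer not less than each value:   3 -2  3 -2
floor(x)            # largest integer not greater than each value: 2 -3  2 -3
trunc(x)            # drop the decimal part (toward zero):          2 -2  2 -2
round(x)            # nearest integer, halves go to the even digit: 2 -2  3 -3
signif(123.456, 2)  # keep two significant digits:                  120
```

For positive numbers trunc() and floor() agree; they only differ once the input is negative.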
• apply(), lapply(), sapply() and tapply() Functions in
R with Examples
apply() Function in R:
The apply() function is used in R to apply a function over the margins (rows,
columns or both) of a matrix or array. It is an extremely helpful function for
carrying out operations on the data kept in these structures.
The following is the apply() function's syntax:
apply(X, MARGIN, FUN, ...)
Here:
X is the matrix or array being operated on (a data frame is first converted to a matrix).
The margin that the function should be applied to is specified by
the MARGIN parameter: 1 for rows, 2 for columns, or c(1, 2) for
both rows and columns.
FUN is the function to be applied.
... holds optional extra arguments that are passed on to FUN.
• Here are a few examples of apply():
• Example 1: Applying a Function to a Matrix
by Rows
• Let's say we have the matrix shown below:
• m <- matrix(1:12, nrow = 3)
• apply() can be used to determine the mean of
each row:
• apply(m, 1, mean)
• Example 2: Applying a Function to a Matrix
by Columns
• apply() can be used to determine the standard
deviation for each column of a matrix m:
• apply(m, 2, sd)
• Example 3: Applying a Function to a List
• Consider a list of vectors:
• lst <- list(a = 1:5, b = 6:10, c = 11:15)
• apply() cannot be used here, because a list has no
dimensions (apply(lst, 1, sum) stops with an error).
sapply() determines the sum of each vector instead:
• sapply(lst, sum)
• Example 4: Applying a User-Defined Function
to a Matrix by Rows
• Let's say we have the matrix shown below:
• m <- matrix(1:12, nrow = 3)
• apply() allows us to apply a user-defined
function to each row:
• f <- function(x) sum(x^2)
• apply(m, 1, f)
• The f() function computes the sum of the
squares for each row in this example.
• All things considered, apply() is a versatile
function that can be used in a number of
different ways to manipulate matrices, arrays
and data frames.
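As a sketch of the MARGIN and ... arguments described above, the following reuses the 3 x 4 matrix m from the examples. Passing collapse = "-" to paste() is just one illustrative choice of an extra argument forwarded to FUN.

```r
# A 3 x 4 matrix filled column by column with 1:12
m <- matrix(1:12, nrow = 3)

# MARGIN = 1: one result per row
row_means <- apply(m, 1, mean)

# Arguments after FUN are forwarded through "...":
# collapse = "-" is handed to paste() for every column (MARGIN = 2)
col_labels <- apply(m, 2, paste, collapse = "-")

# MARGIN = c(1, 2): FUN is called once for every single cell
squared <- apply(m, c(1, 2), function(v) v^2)

print(row_means)   # 5.5 6.5 7.5
print(col_labels)  # "1-2-3" "4-5-6" "7-8-9" "10-11-12"
print(squared)     # same values as m^2
```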
lapply() Function in R:

• A useful feature of the R programming language is the
lapply() function, which enables you to apply a specific
function to each element in a list or vector. A list with the
same length as the input is produced as the output, with
each entry representing the outcome of applying the
specified function to its corresponding input element.
• The lapply() function's underlying syntax is as follows:
• lapply(X, FUN, ...)
• In this case, FUN is the function that will be applied to each
member of X, and X is the input list or vector. Extra arguments
for FUN can be passed through the ... argument.
• Some Examples of lapply():
• Example 1:
• Let's look at an illustration of how to employ
R's lapply() function.
• Let's say we want to determine the square
root of each number in a list of numbers. The sqrt()
function can be applied to each element of the list
using the lapply() function. Here is
the code:
• Code:
• # Create a list of numbers
• my_list <- list(4, 9, 16, 25)
• # Apply the sqrt() function to each element of the list using lapply()
• result <- lapply(my_list, sqrt)
• # Print the result
• result
• As you can see, the sqrt() function was applied to
each element of the list by the lapply() function,
which then returned a list that had the same
length as the input and contained elements that
were the square roots of the corresponding input
elements.
• Example 2:
• The lapply() function can also be used with user-
defined functions. Let's make a function that adds
10 to a given integer, for instance, and then use
this function on each item in the list:
• Code:
• # Define a function that adds 10 to a given number
• add_10 <- function(x) {
• x + 10
• }
• # Apply the add_10() function to each element of the list using lapply()
• result <- lapply(my_list, add_10)
• # Print the result
• result
• In this instance, the add_10() function was applied
to each element of the list by the lapply()
function, which resulted in a list that had the
same length as the input and contained each
element as the result of adding 10 to its
corresponding input element.
• It's important to remember that, regardless of
the input, the lapply() function always returns a
list. For instance, the result of using lapply()
on a vector is still a list. Here is an
example to show this:
• Code:
• # Create a vector of numbers
• my_vector <- c(4, 9, 16, 25)
• # Apply the sqrt() function to each element of the vector using lapply()
• result <- lapply(my_vector, sqrt)
• # Print the result
• result
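A short sketch of two points made above: lapply() keeps the names of a named list, and extra arguments after FUN are forwarded through "...". The list values and the choice of round() with digits = 1 are illustrative assumptions. Because the result is always a list, unlist() is a common way to flatten it when a plain vector is needed.

```r
# A named list of numeric vectors (illustrative values)
values <- list(a = c(1.26, 2.71), b = 3.14159)

# digits = 1 is forwarded to round() for every element; names are kept
rounded <- lapply(values, round, digits = 1)
print(rounded)   # $a 1.3 2.7   $b 3.1

# lapply() on a vector still returns a list; unlist() flattens it
flat <- unlist(lapply(c(4, 9, 16, 25), sqrt))
print(flat)      # 2 3 4 5
```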
sapply() Function in R:

• A helpful feature of the R programming
language is the sapply() function, which may
be used to streamline the code for applying a
specified function to each element of a list or
vector. When possible, the output is simplified:
to a vector if every call returns a single value,
or to a matrix if every call returns a vector of the
same length; otherwise a list is returned.
• The sapply() function's syntax is as follows:
• sapply(X, FUN, ..., simplify = TRUE, USE.NAMES = TRUE)
• In this case, FUN is the function that will be
applied to each member of X, and X is the input list or
vector. By default, the simplify parameter is set
to TRUE, meaning that the result is simplified to a
vector or matrix whenever possible; with simplify = FALSE,
sapply() returns a list, like lapply(). To send more
arguments to the FUN function, use the ... argument.
• Let's say we want to determine the square root of each
number in a list of numbers. The sqrt() function can be
applied to each element of the list using the sapply()
function. Here is the code:
• Code:
• # Create a list of numbers
• my_list <- list(4, 9, 16, 25)
• # Apply the sqrt() function to each element of the list using sapply()
• result <- sapply(my_list, sqrt)
• # Print the result
• result
• The sapply() function can also be used with user-defined functions.
Let's make a function that adds 10 to a given integer, for instance,
and then use this function on each item in the list:
• Code:
• # Define a function that adds 10 to a given number
• add_10 <- function(x) {
• x + 10
• }
• # Apply the add_10() function to each element of the list using sapply()
• result <- sapply(my_list, add_10)
• # Print the result
• result
• It's crucial to remember that the sapply() function also
works with vectors. In this situation, a vector will still
be the result:
• Code:
• # Create a vector of numbers
• my_vector <- c(4, 9, 16, 25)
• # Apply the sqrt() function to each element of the vector using sapply()
• result <- sapply(my_vector, sqrt)
• # Print the result
• result
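The simplification rules of sapply() can be seen side by side in this small sketch; the functions applied to 1:3 are arbitrary examples chosen so that each call returns a different shape of result.

```r
# One value per element: simplified to a vector
v <- sapply(1:3, function(i) i^2)

# Equal-length vectors per element: simplified to a matrix
# (one column per input element, rows named after the vector names)
mat <- sapply(1:3, function(i) c(value = i, square = i^2))

# Different lengths per element: cannot be simplified, so a list is returned
l <- sapply(1:3, seq_len)

# simplify = FALSE always returns a list, like lapply()
l2 <- sapply(1:3, function(i) i^2, simplify = FALSE)

print(v)    # 1 4 9
print(mat)  # 2 x 3 matrix
print(l)    # list of 1, 1:2, 1:3
print(l2)   # list of 1, 4, 9
```

When a fixed result type matters, vapply() is a stricter alternative, because it checks every result against a template you supply.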
tapply() Function in R:

• For applying a specified function to subsets of a vector
or array based on the values of another variable, the R
language's tapply() function is a helpful tool.
Depending on the function used, the output is a vector,
array, or list.
• The tapply() function's fundamental syntax is as
follows:
• tapply(X, INDEX, FUN, ...)
• Here, FUN is the function to be applied to each subset,
X is the input vector or array, INDEX is a factor or list of
factors denoting the subsets, and ... holds optional
arguments that are passed on to FUN.
• Assume we have a vector of categories and a vector of numbers. The
average of the values for each category is what we are looking for.
• The mean() function can be applied to each subset of the vector based on
the category using the tapply() function. Here is the code:
• Code:
• # Create a vector of numbers
• numbers <- c(23, 18, 25, 32, 20, 19, 27, 31, 22, 24)
• # Create a vector of categories
• categories <- c("A", "B", "A", "B", "B", "A", "B", "A", "B", "A")
• # Apply the mean() function to each subset of the vector based on the category using tapply()
• result <- tapply(numbers, categories, mean)
• # Print the result
• result
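Working the example through by hand, category A contains 23, 25, 19, 31 and 24 (mean 24.4) and category B contains 18, 32, 20, 27 and 22 (mean 23.8). The sketch below checks this and also shows an extra argument (trim = 0.2, an illustrative choice) being forwarded to mean() through "...".

```r
numbers <- c(23, 18, 25, 32, 20, 19, 27, 31, 22, 24)
categories <- c("A", "B", "A", "B", "B", "A", "B", "A", "B", "A")

# Group means: A = 24.4, B = 23.8
group_means <- tapply(numbers, categories, mean)

# Any summary function works, for example the group sizes
group_counts <- tapply(numbers, categories, length)

# trim = 0.2 goes to mean(): one value is dropped from each end of every group
group_trimmed <- tapply(numbers, categories, mean, trim = 0.2)

print(group_means)
print(group_counts)   # A 5, B 5
print(group_trimmed)  # A 24, B 23
```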
Character Functions
• These functions are used for string handling
operations like extracting characters from a
string, extracting substrings from a string,
concatenation of strings, matching strings,
inserting strings, converting strings from one
case to another and so on.
1. agrep() – this function searches for
approximate matches to a pattern within each
element of the character vector x.
• agrep(pattern, x, ignore.case = FALSE, value =
FALSE, max.distance = 0.1, useBytes = FALSE)
• x <- c("R language", "and", "SAND")
• agrep("an", x)
• agrep("an", x, ignore.case = TRUE)
• agrep("uag", x, ignore.case = TRUE, max.distance = 1)
• agrep("uag", x, ignore.case = TRUE, max.distance = 2)

• [1] 1 2
• [1] 1 2 3
• [1] 1
• [1] 1 2 3
2. char.expand() – This function seeks a unique
match of its first argument among the elements
of its second.
• If successful, it returns this element; otherwise, it
performs an action specified by the third
argument. The syntax is as follows:
char.expand(input, target, nomatch = stop("no match"))
• Where input is the character string to be
expanded, target is the character vector with the
values to be matched against, and nomatch is an R
expression to be evaluated in case expansion was
not possible (for example, warning() prints a
warning message instead of stopping).
• Matching is done only at the beginning of each element of target.
• x <- c("sand", "and", "land")
• char.expand("an", x, warning("no expand"))
• char.expand("a", x, warning("no expand"))
3. charmatch() – This function finds matches
between two arguments and returns the index
position.
• charmatch(x, table, nomatch = NA_integer_)
• Where x gives the value to be matched, table
gives the values to be matched against and
nomatch gives the value to be returned at
non-matching positions.
• charmatch("an", c("and", "sand"))
• charmatch("an", "sand")
• [1] 1
• [1] NA
4. charToRaw() – This function converts a character
string to its "raw" bytes (shown in hexadecimal).
• charToRaw("a")
• charToRaw("AB")
• [1] 61
• [1] 41 42
5. chartr() – this function is used for character
substitutions.
• chartr(old, new, x)
• x <- "apples are red"
• chartr("a", "g", x)
• "gpples gre red"
6. dQuote() – this function is used for putting double
quotes around a text.
• x <- '2013-06-12'
• dQuote(x)
7. format() – numbers and strings can be formatted
to a specific style using the format() function.
• Ex- format(x, digits, nsmall, scientific, width,
justify = c("left", "right", "centre", "none"))
8. gsub() – this function replaces all matches of a
pattern in a string. If x is a character vector, it returns
a character vector of the same length and with the
same attributes.
• gsub(pattern, replacement, x, ignore.case = FALSE)
Ex- x <- "apples are red"
gsub("are", "were", x)
o/p- "apples were red"
9. nchar() & nzchar() – nchar() determines
the number of characters in each element of a character
vector. nzchar() tests whether elements of a
character vector are non-empty strings.
Syn- nchar(x, type = "chars", allowNA = FALSE)
syn- nzchar(x)
10. noquote()- This function prints out strings
without quotes. The syntax is noquote(x)
where x is a character vector.
Ex- letters
noquote(letters)
11. paste()- Strings in R are combined using the
paste() function. It can take any number of string
arguments to be combined together.
Syn- paste(..., sep = " ", collapse = NULL)
• Where ... represents any number of arguments
to be combined and sep is the separator placed
between the arguments (a single space by default). It is optional.
• collapse, when given, joins the elements of the result
into a single string, placing the collapse string between them.
• Ex- a <- "hello"
• b <- "everyone"
• print(paste(a, b))
• print(paste(a, b, sep = "-"))
• print(paste(a, b, sep = "", collapse = ""))
12. replace() – This function replaces the values in x
with indices given in list by those given in values.
If necessary, the values in 'values' are recycled.
syn- replace(x, list, values)
Ex- x <- c("green", "red", "yellow")
y <- replace(x, 1, "black")
13. sQuote() – This function is used for putting single
quotes around a text.
X <- "2013-06-12 19:18:05"
sQuote(X)
14. strsplit()- This function splits the elements of a
character vector x into substrings according to
the matches to substring split within them.
Syn- strsplit( x, split)
15. substr() – This function extracts or replaces
substrings in a character vector.
Syn- substr(x, start, stop)
substr(x, start, stop) <- value
Ex- substr("programming", 2, 3)
x = c("red", "blue", "green", "yellow")
substr(x, 2, 3) <- "gh"
16. tolower() – This function converts a string to
lower case.
Syn- tolower("R Programming")
17. toString() – This function produces a single
character string describing an R object.
Syn- toString(x)
toString(x, width = NULL)
18. toupper() – This function converts a string to
upper case.
Syn- toupper("r programming")
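Several of the string functions above are often combined. The sketch below splits a sentence with strsplit(), upper-cases the first letter of each word with substr() and toupper(), and rebuilds the sentence with paste(..., collapse = " "). The helper name capitalise_words is hypothetical, and it assumes a single input string whose words are separated by single spaces.

```r
# Hypothetical helper: capitalise the first letter of every word
capitalise_words <- function(s) {
  words <- strsplit(s, " ")[[1]]            # split the single string on spaces
  first <- toupper(substr(words, 1, 1))     # first letter of each word, upper case
  rest  <- substr(words, 2, nchar(words))   # remaining letters of each word
  paste(paste0(first, rest), collapse = " ")  # join the words back into one string
}

capitalise_words("r is free software")   # "R Is Free Software"
```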
Statistical Functions
1. mean() – The function mean() is used to
calculate the average (mean) in R.
Syn- mean(x, trim = 0, na.rm = FALSE)
trim is the fraction (0 to 0.5) of observations dropped from
each end of the sorted vector before the mean is computed, and
na.rm = TRUE removes missing values from the
input vector.
2. median() – the middle value of a sorted data
series is called the median. The median() function
is used in R to calculate this value.
Syn- median(x, na.rm= FALSE)
3. var() – returns the estimated variance of the
population from which the numbers in vector x are
sampled.
Syn- x <- c(10, 2, 30, 2, 5, 8)
var(x, na.rm = TRUE)
4. sd() – returns the estimated standard deviation of
the population from which the numbers in vector x are
sampled.
Syn- sd(x, na.rm = TRUE)
5. scale() – returns the standard scores (z-scores) for
the numbers in x. Applied to a matrix, it standardizes
each column.
Syn- x <- matrix(1:9, 3, 3)
scale(x)
6. sum() – adds up all elements of a vector.
Syn- sum(x)
sum(c(1:10))
7. diff(x, lag = 1) – returns suitably lagged and iterated
differences.
Syn- diff(x, lag, differences)
Where x is a numeric vector or matrix containing the
values to be differenced, lag is an integer indicating
which lag to use and differences is an integer indicating
the order of the difference.
• For example, if lag = 2, the differences between the third and
first values, between the fourth and second values, and so on
are calculated.
• Setting differences = 2 returns the differences of the
differences.
8. range()- returns a vector of the minimum and
maximum values.
Syn- x<- c(10,2,14,67,86,54)
range(x)
o/p- 2 86
9. rank() – This function returns the ranks of the
numbers (in increasing order) in vector x.
Syn- rank(x, na.last = TRUE)
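10. skewness() – measures how asymmetric a distribution is;
a symmetric distribution such as the normal has skewness 0.
skewness() is not part of base R; it is provided by add-on
packages such as e1071 and moments.
Syn- skewness(x)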
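The sketch below applies several of the functions above to the same vector used in the var() example, with one missing value added (an illustrative assumption) to show the effect of na.rm and trim. The diff() calls use the squares 1, 4, 9, 16, 25 so that the lagged and second-order differences are easy to check by hand.

```r
x <- c(10, 2, 30, 2, 5, 8, NA)

mean(x)                             # NA, because of the missing value
mean(x, na.rm = TRUE)               # 9.5
mean(x, trim = 0.2, na.rm = TRUE)   # drops one value from each end: 6.25
median(x, na.rm = TRUE)             # (5 + 8) / 2 = 6.5
var(x, na.rm = TRUE)                # 111.1
range(x, na.rm = TRUE)              # 2 30

# diff(): lag and differences
y <- c(1, 4, 9, 16, 25)
diff(y)                    # 3 5 7 9
diff(y, lag = 2)           # 8 12 16 (third minus first, fourth minus second, ...)
diff(y, differences = 2)   # 2 2 2   (differences of the differences)
```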
Date and Time Functions
• R provides several options for dealing with dates and
date/times.
• Three date/time classes commonly used in R are Date,
POSIXct and POSIXlt.
1. Date – the date() function returns the current date and time as a
character string. Sys.Date() returns the system's date (a Date object)
and Sys.time() returns the system's date and time.
Syn- date()
Sys.Date()
Sys.time()
• We can create a date as follows-
• Dt <- as.Date("2012-07-22")
• When the string is not in the standard "%Y-%m-%d" or "%Y/%m/%d"
form, the format must be specified.
• Dt2 <- as.Date("04/20/2011", format = "%m/%d/%Y")
• Dt3 <- as.Date("October 6, 2010", format = "%B %d, %Y")
2. POSIXct – If your data contain times, this is
usually the best class to use. In POSIXct, "ct"
stands for calendar time.
• We can create some POSIXct objects as follows.
Tm1 <- as.POSIXct("2013-07-24 23:55:26")
o/p – "2013-07-24 23:55:26 PDT" (the time zone shown depends on your system)
Tm2 <- as.POSIXct("25072012 08:32:07", format =
"%d%m%Y %H:%M:%S")
• We can specify the time zone as follows.
Tm3 <- as.POSIXct("2010-12-01 11:42:03",
tz = "GMT")
• Times can be compared as follows.
• Tm2 > Tm1
• We can add or subtract seconds as follows.
• Tm1 + 30
• Tm1 - 30
• Tm2 - Tm1
3. POSIXlt – This class enables easy extraction of
specific components of a time. In POSIXlt, "lt"
stands for local time.
• "lt" also helps one remember that POSIXlt objects
are lists.
• Tm1.lt <- as.POSIXlt("2013-07-24 23:55:26")
• o/p- "2013-07-24 23:55:26"
• We can extract the components in time as follows.
• unlist(Tm1.lt)
sec min hour mday mon year wday yday isdst
26 55 23 24 6 113 3 204 1
• mday, wday and yday stand for day of the month, day of
the week and day of the year respectively; mon counts months
from 0 and year counts years since 1900.
• A particular component of a time can be extracted as
follows.
• Tm1.lt$sec
• We can truncate or round off the times as given below.
• trunc(Tm1.lt, "days") o/p - "2013-07-24"
• trunc(Tm1.lt, "mins") o/p – "2013-07-24 23:55:00"
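A small sketch of date and time arithmetic with the classes above. The dates reuse values from the examples, and the time zone is fixed to UTC (an assumption made here) so that the printed results do not depend on the local system settings.

```r
# Dates: subtracting two Date objects gives a difference in days
d1 <- as.Date("2011-04-20")
d2 <- as.Date("2012-07-22")
gap <- d2 - d1
as.numeric(gap)                  # 459 (the period includes 29 February 2012)
format(d2, "%d/%m/%Y")           # "22/07/2012"

# Date-times: a fixed time zone makes the results reproducible
t1 <- as.POSIXct("2013-07-24 23:55:26", tz = "UTC")
t2 <- t1 + 10 * 60               # add 10 minutes (600 seconds)
format(t2, "%Y-%m-%d %H:%M:%S")  # "2013-07-25 00:05:26", the date rolls over
difftime(t2, t1, units = "mins") # Time difference of 10 mins
```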
Other Functions
1. rep(x, ntimes) – This function repeats x n
times.
Ex.- rep(1:3, 4)

2. cut(x, n) – divides a continuous variable into a factor
with n levels.
X <- c(1,2,3,1,2,3,1)
cut(X, 2)
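Both functions accept more detailed arguments than the forms shown above. In this sketch, rep() uses times and each, and cut() receives explicit break points and labels instead of a number of intervals; the breaks c(0, 1.5, 3) and the labels "low"/"high" are illustrative choices.

```r
rep(1:3, times = 2)   # 1 2 3 1 2 3
rep(1:3, each = 2)    # 1 1 2 2 3 3

X <- c(1, 2, 3, 1, 2, 3, 1)
# Intervals (0, 1.5] and (1.5, 3], labelled instead of shown as ranges
levels_x <- cut(X, breaks = c(0, 1.5, 3), labels = c("low", "high"))
levels_x
table(levels_x)       # low 3, high 4
```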
Recursive Function
• A function that calls itself is called a recursive function
and this technique is known as recursion.
• This special programming technique can be used to
solve problems by breaking them into smaller and
simpler sub- problems.
• Recursive functions call themselves. They break down
the problem into the smallest possible components.
• The function calls itself on each of the smaller
components, and the results are then combined to
solve the original problem.
recursive.factorial <- function(x)
{
  if (x == 0)
    return(1)  # base case: 0! = 1
  else
    return(x * recursive.factorial(x - 1))  # x! = x * (x - 1)!
}
recursive.factorial(5)
Convert decimal number to binary-
convert_to_binary <- function(n)
{
  # print the higher-order bits first by recursing on n / 2
  if (n > 1)
  {
    convert_to_binary(as.integer(n/2))
  }
  # then print the current (lowest) bit
  cat(n %% 2)
}
convert_to_binary(5)
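convert_to_binary() prints its digits with cat(), so its result cannot be stored or reused. A variant that returns the digits as a string follows the same recursion: the binary form of n is the binary form of n %/% 2 followed by the last bit n %% 2. The name to_binary_string is hypothetical.

```r
# Hypothetical variant of convert_to_binary() that returns a string
to_binary_string <- function(n) {
  if (n < 2) {
    return(as.character(n))   # base case: a single bit, 0 or 1
  }
  # recursive case: higher-order bits first, then the lowest bit
  paste0(to_binary_string(n %/% 2), n %% 2)
}

to_binary_string(5)    # "101"
to_binary_string(10)   # "1010"
```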
Object-Oriented Programming in R

Object-Oriented Programming (OOP) in R is an
approach to programming that revolves around
objects.
• In OOP, objects are instances of classes, which
encapsulate data (attributes) and functions
(methods) that operate on that data.
• This paradigm allows for more modular and
organized code, making it easier to manage and
reuse.
• Object-Oriented Programming (OOP) in R
allows you to create and work with objects
that bundle data and functions together.
• There are several systems for OOP in R,
including S3, S4, Reference Classes and R6 (from the R6 package).
• Each system has its own way of defining
classes, methods, and objects.
• OOP in R allows for a more modular and
organized approach to programming,
particularly useful for larger projects where
code organization, encapsulation, and
reusability are important.
• The choice of which OOP system to use (S3,
S4, or R6) depends on the project's
requirements and complexity.
• In R programming, OOP provides classes and
objects as its key tools to reduce and manage the
complexity of the program.
• R is a functional language that also supports OOP concepts.
We can think of a class as a sketch of a car. It contains
all the details about the model_name, model_no,
engine, etc. Based on these descriptions we build a
car.
• The car is the object. Each car object has its own
characteristics and features.
• An object is also called an instance of a class and the
process of creating this object is called instantiation. In
R, S3 and S4 are the two most important class systems
for object-oriented programming.
Concepts of OOP in R:
• Classes and Objects
• Class: A blueprint for creating objects. It
defines the properties (attributes) and
behaviors (methods) that objects of that class
will have.
• Object: An instance of a class, representing a
specific entity. It contains the data defined by
the class's slots.
• Everything in R is an object. An object is a data structure
having some attributes and methods which act on its
attributes.
• Class is a blueprint for the object. We can think of class like
a sketch (prototype) of a house. It contains all the details
about the floors, doors, windows etc. Based on these
descriptions we build the house.
• House is the object. As, many houses can be made from a
description, we can create many objects from a class. An
object is also called an instance of a class and the process
of creating this object is called instantiation.
• While most programming languages have a single class
system, R has three class systems: S3, S4 and, more
recently, Reference Classes.
• They have their own features and peculiarities and
choosing one over the other is a matter of preference.
• Encapsulation
• The bundling of data and methods that
operate on that data within a class. It helps in
hiding the internal state of objects and
restricting direct access to it.
• Inheritance
• The ability of a class to inherit properties and
behavior from another class (parent class). It
promotes code reuse and allows for creating
specialized classes.
• Polymorphism
• The ability of objects of different classes to
respond to the same method in different
ways. It allows for flexibility in method
implementation based on the class of the
object.
• Methods
• A function that is associated with a class.
Methods define the behaviors of objects of
that class.
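Polymorphism and methods can be illustrated with a few lines of S3 code: one generic function, area(), behaves differently for two classes. The classes circle and rectangle, their constructors and the area() generic are hypothetical names chosen for this sketch.

```r
# A generic function: UseMethod() dispatches on the class of 'shape'
area <- function(shape, ...) UseMethod("area")

# Methods for two hypothetical classes
area.circle    <- function(shape, ...) pi * shape$r^2
area.rectangle <- function(shape, ...) shape$w * shape$h
# Fallback used when no class-specific method exists
area.default   <- function(shape, ...) stop("area() is not defined for this class")

# Constructors that tag ordinary lists with a class attribute
circle    <- function(r) structure(list(r = r), class = "circle")
rectangle <- function(w, h) structure(list(w = w, h = h), class = "rectangle")

shapes <- list(circle(1), rectangle(2, 3))
sapply(shapes, area)   # same call, different behaviour: 3.141593 6.000000
```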
• OOP Systems in R
• S3 (Simple, Informal):
– Lightweight and flexible.
– Informal class definition.
– Methods named using the generic.class_name
convention.
• S4 (Formal, Strict):
– Formal class definition.
– Strong typing, encapsulation, and inheritance.
– Methods explicitly defined using setMethod().
• Reference Classes / R6 (Encapsulated OOP):
– Encapsulated object-oriented programming with mutable objects.
– Public and private members (private members are an R6 feature).
– More control over privacy and encapsulation.
• In Object-Oriented Programming, S3 and S4
are the two important systems.
• S3
• In OOP terms, S3 is used to overload functions: a single
generic function (such as print()) can behave differently
depending on the class of its input, because R dispatches
the call to a method written for that class.
• S4
• S4 is a more formal and rigorous system, with formal class
definitions, validity checking and multiple dispatch. The extra
formality makes it more verbose and can make it harder to
debug. Reference Classes are built on top of S4.
S3 Class
• In R, S3 (named after version 3 of the S language) is a
basic and informal system for object-oriented
programming (OOP). It allows you to define
simple classes and methods without the formal
structure of S4 or Reference Classes.
• S3 classes are quite flexible and easy to use.
• It's a lightweight and informal method for
defining classes and methods. S3 classes are easy
to use and often sufficient for many tasks.
• S3 classes are easy to work with and are
commonly used in R for many packages and
tasks. They are particularly useful for simpler
projects or when a more formal OOP structure
is not necessary.
• Creating S3 Class Objects
• To create an S3 class object, you typically follow these
steps:
• Create the Object: Create an object using existing R
data structures like vectors, lists, or data frames.
• Assign Class Attribute: Use the class() function to
assign a class to the object.
• Define Methods: Define functions that operate on this
class of objects. These methods are regular R
functions, but they will be dispatched based on the
class of the object.
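The three steps above can be sketched with a small constructor function. The class name person, the constructor new_person() and its input checks are hypothetical choices for illustration; format() is an existing S3 generic in base R, so defining format.person is enough for dispatch to work.

```r
# Hypothetical constructor for an S3 class "person"
new_person <- function(name, age) {
  stopifnot(is.character(name), is.numeric(age))  # basic input checks
  obj <- list(name = name, age = age)             # step 1: an ordinary list
  class(obj) <- "person"                          # step 2: set the class attribute
  obj
}

# Step 3: a method for the format() generic, used for objects of class "person"
format.person <- function(x, ...) {
  paste0(x$name, " (", x$age, ")")
}

p <- new_person("John", 30)
class(p)               # "person"
inherits(p, "person")  # TRUE
format(p)              # "John (30)"
```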
Characteristics
• Lightweight and Simple:
– S3 classes are lightweight and straightforward to use.
– They provide a simple and flexible way to define and work with classes and objects.

• Flexible Class Definition:


– Class definitions in S3 are informal and not rigidly structured.
– A class is simply a character string stored in an object's class attribute.
– For example, class(obj) <- "my_class" makes obj an object of class "my_class".

• Class Attribute:
– Classes are determined by the class() attribute of an object.
– You can assign a class to an object using class(object) <- "class_name".
– Checking the class of an object is done with class(object).

• Simple Object Creation:


– Objects of a class are created using standard R data structures.
– There's no formal constructor method, but a simple function with the class name can be used.
– For example, obj <- list(name = "John", age = 30) creates a list with two components, which becomes an object of a class once its class attribute is set.
• Generic Functions:
– S3 allows the definition of generic functions.
– Generic functions behave differently based on the class of the input.
– Their body is usually just a call to UseMethod(), which performs the dispatch.
– Examples include print(), summary(), and plot().
• Method Dispatch:
– Methods in S3 are typically named as generic_name.class_name.
– When you call a generic function, R dispatches the appropriate method based
on the class of the input.
– For instance, print.my_class is a method for the print() generic function
specific to the "my_class" class.
• Informal Inheritance:
– While not as strict as other class systems, S3 does support a basic form of
inheritance.
– This means you can create new classes that inherit behavior and attributes
from existing classes.
• Advantages:
– Easy to use and understand, especially for beginners.
– Offers flexibility and allows for quick prototyping.
– Lightweight and doesn't require formal definitions.
• Disadvantages:
– Lack of formal structure can lead to inconsistencies in
larger projects.
– Limited support for encapsulation and strict
inheritance compared to S4 or R6 classes.
– Error handling can be less robust compared to more
formal class systems.
• S3 differs from languages such as Java, C++ and C#,
which implement message-passing OO (methods belong
to objects). S3 instead uses generic-function OO,
which makes it easy to implement.
• In S3, the generic function decides which method
to call, based on the class of its argument.
• S3 is very casual and has no formal definition
of classes.
• S3 requires very little knowledge from the
programmer.
• It implements a form of function overloading.
• Classes are not formally defined, but rather
indicated by the class attribute.
• Generic functions are functions that can
behave differently based on the class of the
object they are applied to.
• You can define methods for generic functions
to specify behavior for specific classes.
• S3 has no formal inheritance declaration, but the class
attribute can hold several class names (child first), and
a method can pass control to the parent class's method
with NextMethod().
• Creating S3 Classes
• Class Definition:
– Classes are not formally defined.
– A class is simply a character string stored in the
object's class attribute.
– For example, the class name "my_class" can be kept in a variable:
• # Store the class name (this alone does not create an object)
• my_class <- "my_class"
• Creating Objects:
• Objects of a class are usually created by building
a list and setting its class attribute.
• There's no formal constructor, but often a
function with the class name is used to create
objects.
• For instance, to create an object of class
"my_class":
# Create an object of class "my_class"
obj <- list(name = "John", age = 30)
class(obj) <- "my_class"
• Class Attributes
• Classes are determined by the class() attribute
of an object.
• You can set the class of an object using
class(object) <- "class_name".
• To check the class of an object, use
class(object).
• Generic Functions and Methods
• Generic Functions:
– Functions that behave differently based on the class
of the input.
– Typically, the body of a generic function is a single
call to UseMethod(), which dispatches to the matching method.
– Example: print(), summary(), plot() are generic
functions.
• Methods:
– Functions that are defined to work with specific
classes.
– When you call a generic function, R dispatches the
appropriate method based on the class of the input.
– Methods are usually defined as
generic_name.class_name.
• Example:
# Define a method for the print generic function
print.my_class <- function(x, ...) {
cat("Name:", x$name, "\n")
cat("Age:", x$age, "\n")
invisible(x)  # print methods conventionally return x invisibly
}
• Example
• # List creation with its attributes name
• # and roll no.
• a <- list(name="Adam", Roll_No=15)

• # Defining a class "Student"
• class(a) <- "Student"

• # Creation of object
• a
• Example
Here's a simple example to tie everything together:
# Class name (just a string; S3 needs no formal class definition)
my_class <- "my_class"

# Create an object of class "my_class"
obj <- list(name = "John", age = 30)
class(obj) <- "my_class"

# Define a method for the print generic function
print.my_class <- function(x, ...) {
cat("Name:", x$name, "\n")
cat("Age:", x$age, "\n")
invisible(x)
}
# Call the print function (which dispatches the print.my_class method)
print(obj)

• Output:
• Name: John
• Age: 30
Inheritance in S3 Class

• Inheritance is an important concept in
OOP (object-oriented programming) which allows
one class to derive the features and
functionalities of another class. This feature
facilitates code reusability.
• S3 class in R programming language has no
formal and fixed definition. In an S3 object, a list
with its class attribute is set to a class name. S3
class objects inherit only methods from their
base class.
Example:

• In the following code, inheritance is done using S3
classes; first, an object of class student is created.

# student function with arguments
# name (n) and roll_no (r)
student <- function(n, r) {
value <- list(name = n, Roll = r)
attr(value, "class") <- "student"
value
}
• Then, the method is defined to print the details
of the student.

# 'print.student' method created
print.student <- function(obj, ...) {
# 'cat' function is used to concatenate strings
cat("Name:", obj$name, "\n")
cat("Roll:", obj$Roll, "\n")
}
• Now, inheritance is done while creating
another class by doing class(obj) <- c(child,
parent).

s <- list(name = "Kesha", Roll = 21,
country = "India")
# child class 'Student' inherits
# parent class 'student'
class(s) <- c("Student", "student")
s
• Output:
• Name: Kesha
• Roll: 21
The following code defines a print method for the child class
'Student', which overrides the method inherited from 'student'.

# Method for class 'Student'; it takes
# precedence over print.student
print.Student <- function(obj, ...) {
cat(obj$name, "is from", obj$country, "\n")
}
s
• Output:
• Kesha is from India
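Instead of replacing the parent's behaviour completely, a child method can extend it by calling NextMethod(), which runs the method of the next class in the class vector. The classes person and employee in this sketch are hypothetical and defined from scratch; the base generic format() is used so that the results can be checked as strings.

```r
# Parent method for the hypothetical class "person"
format.person <- function(x, ...) paste0("Name: ", x$name)

# Child method: NextMethod() calls format.person(), then the child adds its own part
format.employee <- function(x, ...) {
  paste0(NextMethod(), ", ID: ", x$id)
}

e <- structure(list(name = "Ann", id = "E1"),
               class = c("employee", "person"))  # child first, then parent
format(e)   # "Name: Ann, ID: E1"

# An object of the parent class only uses the parent method
format(structure(list(name = "Bob"), class = "person"))   # "Name: Bob"
```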
S4 CLASS
• In R, S4 classes are part of the object-oriented
programming (OOP) system that allows for
defining formal class structures with specific
slots (attributes) and methods (functions).
• S4 classes provide a more formal and strict
approach to OOP compared to S3 classes.
Here's an overview of working with S4 classes
in R:
• S4 classes have a formal definition. S4 provides
functions for defining generics and methods
(setGeneric(), setMethod()) and supports multiple
dispatch, where the method is chosen using the classes
of more than one argument.
• S4 classes are more formalized and rigid
compared to S3 classes.
• Creating S4 class and object
• The setClass() command is used to create an S4 class,
for example a class myclass with the slots
name and rollno.
• The new() function is used to create an object
of the S4 class. In this function, we will pass
the class name as well as the value for the
slots.
• Accessing Slots
• You can access the slots (attributes) of an S4
object using the @ symbol.
• Methods
• Methods are functions associated with a class.
You can define methods for S4 classes using
the setMethod() function.
• Inheritance
• S4 classes also support inheritance. You can
create a subclass that inherits from a parent class.
• Inheritance is supported with the contains
argument in setClass().

• Validity Checking
• S4 classes also allow you to define validity checks
to ensure that objects are created with valid data.
• Validity checks can be added with setValidity().
Characteristics
• Formal Structure: S4 classes have a formal definition that includes
slots (similar to fields in other programming languages) that hold
the data and methods (functions) that operate on that data.
• Formal Methods: Methods for S4 classes are explicitly defined and
associated with the class. This means that different classes can have
methods with the same name but different behaviors.
• Validity Checking: S4 classes can include built-in mechanisms for
checking the validity of objects when they are created (and whenever
validObject() is called). This helps maintain consistency and prevents errors.
• Inheritance: S4 classes support multiple inheritance, meaning you
can create new classes that inherit behavior and structure from
existing classes.
• Define an S4 Class
• To define an S4 class, you use the setClass
function. Here's a basic example:

# Define an S4 class "Person"
setClass("Person",
  slots = list(name = "character", age = "numeric")
)
• In this example:
• We define a class named Person.
• The slots argument defines the attributes (name
and age) of the class along with their types.
• Create an Object of the Class
• After defining a class, you can create an object
of that class using the new function:

# Create an object john of class Person
john <- new("Person", name = "John Doe", age = 30)
Accessing Slots
You can access the slots (attributes) of an S4
object using the @ symbol:

# Accessing slots
john@name # "John Doe"
john@age # 30
• Methods
• Methods are functions associated with a class.
You can define methods for S4 classes using
the setMethod function:
# Define a method to print information about a Person
setMethod("show", "Person",
  function(object) {
    cat("Name:", object@name, "\n")
    cat("Age:", object@age, "\n")
  }
)
# Now, printing a Person object will use this method
john
• Inheritance
• S4 classes also support inheritance. You can create a
subclass that inherits from a parent class:
# Define a subclass of Person called Employee
setClass("Employee", contains = "Person",
  slots = list(employee_id = "character"))
# Create an Employee object
jane <- new("Employee", name = "Jane Smith", age = 25,
  employee_id = "12345")
# Access slots, including the slots inherited from Person
jane@name         # "Jane Smith"
jane@age          # 25
jane@employee_id  # "12345"
# Printing an Employee object uses the inherited Person show method
jane
• Validity Checking
• S4 classes also allow you to define validity checks to
ensure that objects are created with valid data:
# Adding a validity check to the Person class
setValidity("Person", function(object) {
  if (object@age < 0) {
    "Age must be a non-negative number"
  } else {
    TRUE
  }
})
# Creating a new Person object with an invalid age
invalid_person <- new("Person", name = "Invalid", age = -5)
# This throws an error similar to:
# Error in validObject(.Object) :
#   invalid class "Person" object: Age must be a non-negative number
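One detail worth knowing: validity is checked by new(), but assigning a slot with @<- only checks the value's type. The sketch below uses a hypothetical class Patient (to avoid clashing with Person) and passes the check through the validity argument of setClass(), an alternative to setValidity().

```r
library(methods)

# Hypothetical class; the validity function is supplied directly to setClass()
setClass("Patient",
  slots = list(name = "character", age = "numeric"),
  validity = function(object) {
    if (length(object@age) == 1 && object@age < 0) {
      "Age must be non-negative"
    } else {
      TRUE
    }
  })

p <- new("Patient", name = "Ann", age = 40)  # valid, so new() succeeds

# @<- only checks that the value is numeric, so this is accepted ...
p@age <- -1

# ... until validity is checked explicitly with validObject()
msg <- tryCatch(validObject(p), error = function(e) conditionMessage(e))
print(msg)
```

Calling validObject() after modifying slots is a simple way to keep modified objects consistent.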
# Define an S4 class "Person" using representation()
# (an older style; setClass(..., slots = ...) is now recommended)
setClass("Person", representation(name = "character", age = "numeric"))
# Create an object of class "Person"
john_doe <- new("Person", name = "John Doe", age = 30)
# Access slots
john_doe@name  # "John Doe"
john_doe@age   # 30
Example 2:

# The setClass() function creates an S4 class
# containing a list of slots.
setClass("Student", slots = list(name = "character",
                                 Roll_No = "numeric"))
# The new() function creates an object of
# class 'Student'
a <- new("Student", name = "Adam", Roll_No = 20)

# Printing the object
a
• Output:
An object of class "Student"
Slot "name":
[1] "Adam"

Slot "Roll_No":
[1] 20
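• Example:

# setClass() also returns a generator function,
# which can be used in place of new()
stud <- setClass("Student",
                 slots = list(name = "character",
                              Roll_No = "numeric"))

# Printing the generator shows the class and its slots
stud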
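The generator returned by setClass() is itself a function, so objects can be created by calling it directly. A minimal sketch, reusing the Student class from the example:

```r
library(methods)

# setClass() returns a generator function for the class
stud <- setClass("Student",
                 slots = list(name = "character", Roll_No = "numeric"))

# Calling the generator is equivalent to new("Student", ...)
s <- stud(name = "Adam", Roll_No = 20)
is(s, "Student")  # TRUE
s@Roll_No         # 20
```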
Reference Class
• In R, the Reference Class system (also known
as Reference Classes or RC) is another form of
object-oriented programming (OOP) that
provides more flexibility and control
compared to S3 and S4 classes. Reference
Classes were introduced to R in version 2.12.0.
• Reference Classes in R provide a more flexible
and mutable approach to object-oriented
programming: objects have reference semantics
and can be modified in place.
• Reference Classes are useful when you need
more control over objects' mutability and
inheritance. However, they can be more complex and have
a bit more overhead compared to S3 and S4
classes.
Characteristics
• Mutable Objects: Objects created from Reference Classes are
mutable, meaning you can modify their fields directly. Because they
have reference semantics, assigning an object to a new variable does
not copy it; use the $copy() method when you need an independent copy.

• Inheritance: Reference Classes support inheritance through the
contains argument, and a method can call the parent class's version
of itself with callSuper().

• Fields and Methods: Reference Classes have fields (attributes)
and methods (functions) associated with the class.

• No Enforced Privacy: Reference Classes do not provide truly private
fields; every field can be read and changed from outside the object.

• Formal Constructor: The generator returned by setRefClass has a
$new() method for creating objects, and a class can define an
initialize method to customize construction.
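Reference semantics are the biggest practical difference from S3 and S4. The sketch below uses a hypothetical Counter class to show that assignment shares one object while $copy() makes an independent one; inside a method, fields are modified with <<-.

```r
library(methods)

# Hypothetical counter class; the method modifies its field with <<-
Counter <- setRefClass("Counter",
  fields = list(count = "numeric"),
  methods = list(
    increment = function() {
      count <<- count + 1
      invisible(.self)
    }
  )
)

a <- Counter$new(count = 0)
b <- a           # no copy: a and b refer to the same object
b$increment()
a$count          # 1, the change made through b is visible through a

d <- a$copy()    # copy() creates an independent object
d$increment()
a$count          # still 1
d$count          # 2
```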
• Define a Reference Class
• To define a Reference Class, you use the
setRefClass function.

• Create an Object of the Class
• You create objects of the Reference Class
using the $new method of the generator.
• Accessing Fields and Methods
• Fields (attributes) are accessed using the
$ operator, and methods are called similarly.
• Inheritance
• Reference Classes support inheritance, where
a child class inherits from a parent class.
• Inheritance is supported with the contains
argument in setRefClass.
• Private Fields
• Reference Classes do not enforce private fields:
every field can be accessed from outside the object.
• If you need truly private members, the R6 package
provides them through its private argument.
• Define a Reference Class
• To define a Reference Class, you use the
setRefClass function:
# Define a Reference Class "Person"
Person <- setRefClass("Person",
  fields = list(name = "character", age = "numeric"),
  methods = list(
    introduce = function() {
      # Inside a method, fields are used directly by name (or as .self$name)
      cat("Hello, my name is", name, "and I am",
          age, "years old.\n")
    }
  )
)
• Create an Object of the Class
• You create objects of the Reference Class
using the $new method:
# Create an object john of class Person
john <- Person$new(name = "John Doe", age = 30)
• Accessing Fields and Methods
• Fields (attributes) are accessed using the
$ operator, and methods are called similarly:
# Accessing fields
john$name  # "John Doe"
john$age   # 30
# Call the introduce method
john$introduce()
# Hello, my name is John Doe and I am 30 years old.
• Inheritance
• Reference Classes support inheritance, where a
child class inherits from a parent class:
# Define a child class of Person
Employee <- setRefClass("Employee", contains = "Person",
  fields = list(employee_id = "character"),
  methods = list(
    display_id = function() {
      cat("Employee ID:", employee_id, "\n")
    }
  )
)
# Create an Employee object
jane <- Employee$new(name = "Jane Smith", age = 25,
                     employee_id = "12345")
# Access fields and methods
jane$name         # "Jane Smith"
jane$age          # 25
jane$employee_id  # "12345"
# Call methods
jane$introduce()   # Hello, my name is Jane Smith and I am 25 years old.
jane$display_id()  # Employee ID: 12345
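A child class can also override a parent method and still reuse it. In Reference Classes this is done with callSuper(), as in this sketch, which redefines a small Person/Employee pair for self-containment (the second cat() line is an illustrative addition).

```r
library(methods)

Person <- setRefClass("Person",
  fields = list(name = "character", age = "numeric"),
  methods = list(
    introduce = function() {
      cat("Hello, my name is", name, "and I am", age, "years old.\n")
    }
  )
)

# The child class overrides introduce() and reuses the parent version
Employee <- setRefClass("Employee", contains = "Person",
  fields = list(employee_id = "character"),
  methods = list(
    introduce = function() {
      callSuper()                               # run Person's introduce() first
      cat("My employee ID is", employee_id, "\n")
    }
  )
)

jane <- Employee$new(name = "Jane Smith", age = 25, employee_id = "12345")
jane$introduce()
```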
• Private Fields
• Reference Classes do not enforce private fields. A field
meant only for internal use can still be read from outside:
# Define a Reference Class with a field meant for internal use
Secret <- setRefClass("Secret",
  fields = list(public_info = "character",
                private_info = "character"),
  methods = list(
    show_private = function() {
      cat("Private Info:", private_info, "\n")
    }
  )
)
# Create an object
secret_obj <- Secret$new(public_info = "Public",
                         private_info = "Private")
# Accessing the public field
secret_obj$public_info   # "Public"
# R does not block outside access to the "private" field
secret_obj$private_info  # "Private"
# Methods of the class can use the field as well
secret_obj$show_private()
# Private Info: Private
Comparison:

• S3:
– Lightweight, simple, and informal.
– Flexible for quick prototyping.
– Limited support for formal inheritance and encapsulation.
• S4:
– Formal and strict class definition.
– Strong typing, encapsulation, and inheritance.
– More complex and best suited for larger projects and packages.
• R6 (a package providing mutable, Reference Class-like objects):
– Encapsulated object-oriented programming.
– Strong encapsulation, inheritance, and control over privacy.
– Useful for creating reusable and well-structured code.
Debugging
• A syntactically correct program may still give incorrect
results because of logical errors. If such errors (i.e.
bugs) occur, we need to find out why and where they
occur so that we can fix them. The procedure to
identify and fix bugs is called "debugging".
• There are a number of R debug functions, such as:
• traceback()
• debug()
• browser()
• trace()
• recover()
• Debugging is the process of removing bugs from
program code so that it runs successfully.
• While writing code, some mistakes or
problems only appear when the code runs and are
harder to diagnose, especially when the error
surfaces after multiple levels of function calls.
Fixing them can take a lot of time.
• In R, problems are reported through warnings,
messages, and errors. Debugging in R usually means
debugging functions, and various debugging
functions are available.
Fundamental principles of Debugging

• In R, there are various principles of debugging
which help programmers spend their time writing
code rather than debugging it. These principles
are as follows:
1. The Essence of Debugging –
• The principle of confirmation: Fixing a buggy
program is a process of confirming, one by one,
that the many things you believe to be true about
the code are actually true. When we find that one of our
assumptions is not true, we have found a clue
to the location of a bug.

2. Start Small - Stick to small, simple test cases, at
least at the beginning of the R debug process.
Working with large data objects may make it
harder to think about the problem. Of course,
we should eventually test our code in large,
complicated cases, but start small.
3. Debug in a Modular, Top-Down Manner -
• Most professional software developers
agree that code should be written in a modular manner.
• Our first-level code should not be long, and much of it
should consist of function calls. Those functions should not
be too lengthy either, and should call other functions if necessary.
• This makes code easier to write, and also easier for others
to understand when the time comes for the code to be
extended.
• We should debug in a top-down manner.

4. Antibugging - If we have a section of code in which a
variable x should be positive, then we can insert this line:
stopifnot(x > 0)
• If there is a bug earlier in the code that makes x equal to,
say, -3, the call to stopifnot() will stop execution right there,
• with an error message like this: Error: x > 0 is not TRUE
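A minimal antibugging sketch: the hypothetical function safe_log() checks its assumption on entry, so a bad value fails immediately instead of producing NaN further down. The tryCatch() call is there only to capture the error message for display.

```r
# Hypothetical function that expects a positive number
safe_log <- function(x) {
  stopifnot(is.numeric(x), x > 0)   # antibugging: check assumptions early
  log(x)
}

safe_log(10)   # 2.302585

# Capture the error here only to show its message
msg <- tryCatch(safe_log(-3), error = function(e) conditionMessage(e))
msg            # "x > 0 is not TRUE"
```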
R Debug Functions
• 1. traceback() –
• If our code has already crashed and we want to
know where the offending line is, try traceback().
• This will (sometimes) show the location
of the problem in the code.
• When an R function fails, an error is printed on
the screen. Immediately after the error, we can
call traceback() to see in which function the
error occurred. The traceback() function prints
the list of functions that were called before
the error occurred. The functions are
printed in reverse order (the most recent call first).
2. debug() -
• The function debug() in R allows the user to
step through the execution of a function, line
by line.
• At any point, we can print out values of
variables or produce a graph of the results
within the function.
• While debugging, we can type "c" to
continue to the end of the current section of
code. traceback() does not tell us where in the
function the error occurred. In order to
know which line causes the error, we
have to step through the function using
debug().
3. browser() - The R debug function browser() stops the
execution of a function until the user allows it to
continue. This is useful if we don't want to step
through the complete code, line by line, but want it
to stop at a certain point so we can check what is
going on. Inserting a call to browser() in a function
pauses the execution of the function at the point
where browser() is called. This is similar to using debug(),
except that we control where execution gets paused.
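browser() is often placed inside a condition, so execution pauses only at the interesting point. In this sketch, running_total() is a hypothetical function, and the interactive() guard (an added assumption) keeps scripts run with Rscript from stopping.

```r
# Hypothetical function; browser() pauses only on the 3rd iteration
running_total <- function(x) {
  total <- 0
  for (i in seq_along(x)) {
    total <- total + x[i]
    # Pause here so total and i can be inspected
    if (interactive() && i == 3) browser()
  }
  total
}

running_total(c(5, 10, 15, 20))
# At the Browse[1]> prompt: type total or i to inspect values,
# n to run the next line, c to continue, Q to quit.
```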

4. trace() - Calling trace() on a function allows the user to
insert bits of code into a function. The syntax of the R
debug function trace() is a bit strange for first-time
users, who might be better off using debug().
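A small sketch of trace() on a hypothetical function scale_by_two(): the tracer expression runs each time the function is entered, without editing its source, and untrace() restores the original. print = FALSE suppresses the default "Tracing ..." line.

```r
# Hypothetical function to trace
scale_by_two <- function(x) {
  y <- x * 2
  y + 1
}

# Insert a cat() call at the start of the function
trace("scale_by_two",
      tracer = quote(cat("scale_by_two called with x =", x, "\n")),
      print = FALSE)
result <- scale_by_two(5)  # prints: scale_by_two called with x = 5
untrace("scale_by_two")    # restore the original function
result                     # 11
```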
5. recover() -
When we are debugging a
function, recover() allows us to check variables in
upper-level functions.
• By typing a number in the selection, we are
taken to that function on the call stack and
placed in its browser environment.
• We can use recover() as an error handler, set
using options() (e.g. options(error = recover)).
• When a function throws an error, execution is
halted at the point of failure. We can browse the
function calls and examine the environment to
find the source of the problem.
• In recover, we use the previous f(), g() and h()
functions for debugging.
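Below is a sketch of a call chain to practice traceback() and recover() on. The functions f(), g() and h() here are hypothetical stand-ins; the interactive steps are shown as comments, because traceback() and the recover() menu need a console session.

```r
# Hypothetical nested functions: f() calls g(), which calls h()
h <- function(x) {
  if (x < 0) stop("x must be non-negative")
  sqrt(x)
}
g <- function(x) h(x - 10)
f <- function(x) g(x * 2)

f(10)  # sqrt(10), about 3.162278, no error

# In an interactive session:
# f(2)                      # Error in h(x - 10) : x must be non-negative
# traceback()               # lists the calls, most recent first:
#                           #   stop(...), h(x - 10), g(x * 2), f(2)
# options(error = recover)  # from now on, errors open the recover() menu
# f(2)                      # choose a frame number to browse that call
# options(error = NULL)     # switch recover() off again
```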
Error Handling & Recovery in R

• Exception or error handling is the process of
responding to anomalous occurrences in the
code that disrupt its normal flow. In
general, the scope of an exception handler
begins with try and ends with catch. R
provides the try() and tryCatch() functions for
this purpose.
• Handling Conditions Programmatically
• In the R language, there are three different tools
for handling conditions, including errors,
programmatically.
• try(): it lets us continue with the execution of the
program even when an error occurs. The try() function is a
wrapper around tryCatch() that prints the error and
then continues.

• tryCatch(): it lets us handle conditions and control
what happens based on the condition that was signaled.
Unlike try(), tryCatch() gives you control over what
the error handler does, and can optionally continue
with the rest of the program.

• withCallingHandlers(): it is an alternative to tryCatch()
that establishes calling (local) handlers: the handler runs
without unwinding the call stack, so for messages and
warnings execution continues where the condition was signaled.
• try() gives you the ability to continue
execution even when an error occurs.
• When we put code inside try(), the code is
executed; if it succeeds, the result is the value of
the last expression evaluated, and if it fails, the
result is an object of class "try-error".
• Example:
success <- try(100 + 200)
failure <- try("100" + "200")

Output:
Error in "100" + "200" : non-numeric argument to binary
operator

Example:
# Class of success and failure
class(success)
class(failure)

• Output
• [1] "numeric"
• [1] "try-error"
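A common try() pattern is to suppress the printed error and test the result's class. The helper safe_add() below is hypothetical; attr(result, "condition") holds the original error object that try() attaches.

```r
# Hypothetical helper: add two values, falling back to NA on failure
safe_add <- function(a, b) {
  result <- try(a + b, silent = TRUE)   # silent = TRUE: don't print the error
  if (inherits(result, "try-error")) {
    # try() keeps the error object in the "condition" attribute
    cat("Failed:", conditionMessage(attr(result, "condition")), "\n")
    return(NA)
  }
  result
}

safe_add(100, 200)      # 300
safe_add("100", "200")  # prints "Failed: non-numeric argument to binary operator", returns NA
```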
• tryCatch() specifies handler functions that control what happens when a condition
is signaled. One can take different actions for warnings, messages, and interrupts.
• The tryCatch() function is used for error handling in R. It allows you to "try"
executing a block of code and "catch" any errors that occur, enabling you to handle
them gracefully.
• Syntax:
tryCatch(expr, error = function(e) {
  # Handle the error
})
• expr: The expression or block of code to be evaluated.
• error: A function that specifies how to handle errors.
• Example:
tryCatch(sqrt("hello"),
  error = function(e) {
    print("An error occurred!")
  })

• Error Messages:
• Handling Specific Errors: You can catch specific
types of conditions by naming each handler after a
condition class in tryCatch() (for example error =,
warning =, or a custom class), or by testing the
condition object inside the handler with inherits().
This allows you to handle different types of errors differently.
• Example:
tryCatch(sqrt("hello"), error = function(e) {
  if (inherits(e, "simpleError")) {
    print("Error: Invalid input for sqrt!")
  } else {
    print("Some other error occurred!")
  }
})
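Handlers can also be selected by a custom condition class. This sketch assumes R 3.6.0 or later for errorCondition(); parse_age(), handle() and the class name "validationError" are hypothetical. When several handlers in one tryCatch() match, the first one listed is used.

```r
# Hypothetical parser that signals a custom error class
parse_age <- function(x) {
  age <- suppressWarnings(as.numeric(x))
  if (is.na(age)) {
    # errorCondition() builds an error object with an extra class
    stop(errorCondition(paste("not a number:", x), class = "validationError"))
  }
  age
}

handle <- function(x) {
  tryCatch(parse_age(x),
    validationError = function(e) paste("Validation failed:", conditionMessage(e)),
    error = function(e) paste("Other error:", conditionMessage(e)))
}

handle("42")    # 42
handle("abc")   # "Validation failed: not a number: abc"
handle(list())  # caught by the general error handler
```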
• Custom Error Messages: When an error
occurs, you can provide custom error
messages to give more meaningful feedback
to the user.
• Example:
tryCatch(sqrt("hello"), error = function(e) {
  message("An error occurred: ", conditionMessage(e))
})
• Example:

# Using tryCatch()
display_condition <- function(inputcode) {
  tryCatch(inputcode,
    error = function(c) "Unexpected error occurred",
    warning = function(c) "warning message, but still need to look into code",
    message = function(c) "friendly message, but take precautions")
}

# Calling the function
display_condition(stop("!"))
display_condition(warning("?!"))
display_condition(message("?"))
display_condition(10000)
Output:
• For Input: display_condition(stop("!"))
• Output: [1] "Unexpected error occurred"
• For Input: display_condition(warning("?!"))
• Output: [1] "warning message, but still need to look into code"
• For Input: display_condition(message("?"))
• Output: [1] "friendly message, but take precautions"
• For Input: display_condition(10000)
• Output: [1] 10000
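• withCallingHandlers() is an alternative
to tryCatch(). tryCatch() establishes exiting handlers:
when a condition is caught, the rest of the expression is
abandoned and tryCatch() returns the handler's value.
withCallingHandlers() establishes calling handlers: the
handler runs and, for messages and warnings, execution
then continues where the condition was signaled. This makes
withCallingHandlers() the better choice for handling
messages, since tryCatch() stops the expression at the
first one, as the example below shows.
Example:
# Using tryCatch()
message_handler <- function(c) cat("Important message is caught!\n")
tryCatch(message = message_handler,
{
  message("1st value printed?")
  message("Second value too printed!")
})

Output:
Important message is caught!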
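For comparison, here is a sketch of the same messages handled with withCallingHandlers(). The handler runs for every message and the block keeps going; invokeRestart("muffleMessage") stops the message from also being printed to the console. The returned value "finished" is an added illustration.

```r
message_handler <- function(m) {
  # conditionMessage() of a message already ends with "\n"
  cat("Important message is caught:", conditionMessage(m))
  invokeRestart("muffleMessage")   # don't also print the message itself
}

result <- withCallingHandlers({
  message("1st value printed?")
  message("Second value too printed!")
  "finished"                       # the block keeps running after each message
}, message = message_handler)

result  # "finished"
```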
• try-catch-finally in R
• Unlike languages such as Java and C++, where try and
catch are statements, R provides this mechanism as a
function, tryCatch(), which also accepts a finally argument.
The two main conditions handled in
tryCatch() are "errors" and "warnings".
• Syntax:
check <- tryCatch({
  expression
}, warning = function(w) {
  # code that handles the warnings
}, error = function(e) {
  # code that handles the errors
}, finally = {
  # clean-up code (an expression, not a function)
})
# R program illustrating error handling
# Applying tryCatch
tryCatch(

# Specifying expression
expr = {
1+1
print("Everything was fine.")
},
# Specifying error message
error = function(e){
print("There was an error message.")
},

warning = function(w){
print("There was a warning message.")
},

finally = {
print("finally Executed")
}
)
• Output:
• [1] "Everything was fine."
• [1] "finally Executed"
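The example above takes the success path. This sketch shows the failure path and the order in which things run: the error handler first, then finally. The vector steps is a hypothetical recorder; <<- is needed inside the handler because it is a function, while expr and finally are evaluated in the calling environment.

```r
steps <- character()   # records what ran, in order

res <- tryCatch({
  steps <- c(steps, "start")
  stop("something went wrong")
  steps <- c(steps, "never reached")
}, error = function(e) {
  steps <<- c(steps, "error handler")
  "recovered"                      # value returned by tryCatch()
}, finally = {
  steps <- c(steps, "finally")     # runs whether or not an error occurred
})

steps  # "start" "error handler" "finally"
res    # "recovered"
```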
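• withCallingHandlers() in R
• In R, withCallingHandlers() is a variant
of tryCatch(). The difference is that tryCatch()
uses exiting handlers, while withCallingHandlers()
uses calling (local) handlers. withCallingHandlers()
has no finally argument, and after an error handler
returns, the error continues to propagate.
• Example:
# R program illustrating calling handlers
check <- function(expression) {
  withCallingHandlers(expression,
    warning = function(w) {
      message("warning: ", conditionMessage(w))
    },
    error = function(e) {
      message("error: ", conditionMessage(e))
    })
}

check({10/2})           # [1] 5
check({10/0})           # [1] Inf (division by zero is not a warning in R)
check(as.integer("x"))  # prints "warning: NAs introduced by coercion",
                        # the warning is still reported, and NA is returned
check({10/'noe'})       # prints "error: non-numeric argument to binary operator",
                        # then stops with the original error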
Unit-4
• Files in R Programming
• So far, the operations in our R programs have been
done at a prompt/terminal, and the results are not stored
anywhere. But in the software industry, most
programs are written to store the information
they produce. One such way is to store that
information in a file. The two most common
operations that can be performed on a file are:
• Importing/Reading Files in R
• Exporting/Writing Files in R
• Reading Files in R Programming Language
• When a program is terminated, all of its data is
lost. Storing data in a file preserves it even after
the program terminates. If we have to enter a
large amount of data, it takes a lot of time to
enter it all. However, if we have a file
containing all the data, we can easily access the
contents of the file using a few commands in R.
You can also easily move your data from one
computer to another without any changes. These
files can be stored in various formats: a .txt
(tab-separated value) file, a tabular .csv (comma-
separated value) file, or a file on the internet
or in the cloud. R provides easy methods to read
these files.
TEXT File reading in R
• One of the important formats to store a file is a text
file. R provides various methods to read data
from a text file.
• read.delim(): This method is used for reading "tab-separated
value" files (".txt"). By default, a point (".") is
used as the decimal point.
• Syntax: read.delim(file or file.choose(), header = TRUE,
sep = "\t", dec = ".", …)
• Parameters:
• file.choose(): In R it’s also possible to choose a file
interactively using the function file.choose(), and if
you’re a beginner in R programming then this method
is very useful for you.
• file: the path to the file containing the data to
be read into R.
• header: a logical value. If TRUE, read.delim()
assumes that your file has a header row, so
row 1 is the name of each column. If that’s not
the case, you can add the argument header =
FALSE.
• sep: the field separator character. “\t” is used
for a tab-delimited file.
• dec: the character used in the file for decimal
points.
# R program reading a text file

# Read a text file using read.delim()
myData = read.delim("Ks.txt", header = FALSE)
# OR choose the file interactively
myData = read.delim(file.choose(), header = FALSE)

print(myData)

• Output:
1 A computer science portal for Ks.
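Because Ks.txt itself is not shown, here is a self-contained sketch that first writes a small tab-separated file to a temporary location and then reads it back. The column names and values are made up for illustration.

```r
# Write a small tab-separated file to a temporary location
path <- tempfile(fileext = ".txt")
writeLines(c("name\tscore", "Amiya\t18.5", "Niru\t23"), path)

# header = TRUE (the default) uses the first line as column names
myData <- read.delim(path)
print(myData)
str(myData)   # name is text, score is numeric
```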
• read_tsv(): This method is also used to read
tab-separated ("\t") values, with the help
of the readr package.
• Syntax: read_tsv(file, col_names = TRUE)
• Parameters:
• file: the path to the file containing the data to be
read into R.
• col_names: Either TRUE, FALSE, or a character
vector specifying column names. If TRUE, the first
row of the input will be used as the column
names.
# R program to read a text file
# using the readr package

# Import the readr library
library(readr)

# Use read_tsv() to read the text file
myData = read_tsv("ks.txt", col_names = FALSE)
print(myData)

• Output:
# A tibble: 1 x 1
  X1
1 A computer science portal for ks.
• Reading one line at a time
• read_lines(): This method is used to read as many
lines as you choose, whether it's one, two,
or ten lines at a time. To use this method we have
to import the readr package.
• Syntax: read_lines(file, skip = 0, n_max = -1L)
• Parameters:
• file: file path
• skip: Number of lines to skip before reading data
• n_max: Number of lines to read. If n_max is -1, all
lines in the file will be read.
# R program to read one line at a time

# Import the readr library
library(readr)

# read_lines() to read one line
myData = read_lines("ks.txt", n_max = 1)
print(myData)

# read_lines() to read two lines
myData = read_lines("ks.txt", n_max = 2)
print(myData)
• Output:
[1] "A computer science portal for ks."

[1] "A computer science portal for ks."
[2] "ks is founded by Sandeep Jain Sir."
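If readr is not installed, base R's readLines() can read a chosen number of lines with its n argument. A self-contained sketch using a temporary file with made-up lines:

```r
path <- tempfile(fileext = ".txt")
writeLines(c("first line", "second line", "third line"), path)

readLines(path, n = 1)  # "first line"
readLines(path, n = 2)  # "first line" "second line"
readLines(path)         # all three lines (n = -1, the default, reads everything)
```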
• Reading the whole file
• read_file(): This method is used for reading the whole
file into a single string. To use this method we have to import
the readr package.
Syntax: read_file(file)
file: the file path

EXAMPLE

library(readr)
# read_file() to read the whole file
myData = read_file("ks.txt")
print(myData)
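# R program to read a file in table format
# Using read.table()
myData = read.table("basic.csv")
print(myData)
• Output:
1 Name,Age,Qualification,Address
2 Amiya,18,MCA,BBS
3 Niru,23,Msc,BLS
4 Debi,23,BCA,SBP
5 Biku,56,ISC,JJP
• Because read.table() splits fields on white space by default,
each line of this comma-separated file ends up in a single column.
Adding header = TRUE, sep = "," reads it as a proper table.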
• Reading a file in a table format
• Another popular format to store a file is in a tabular format.
R provides various methods that one can read data from a
tabular formatted data file.
• read.table(): read.table() is a general function that can be
used to read a file in table format. The data will be
imported as a data frame.
• Syntax: read.table(file, header = FALSE, sep = "", dec = ".")
• Parameters:
• file: the path to the file containing the data to be imported
into R.
• header: logical value. If TRUE, read.table() assumes that
your file has a header row, so row 1 is the name of each
column. If that’s not the case, you can add the argument
header = FALSE.
• sep: the field separator character
• dec: the character used in the file for decimal points.
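A self-contained sketch of the header and sep parameters, using a temporary copy of the first rows of basic.csv: with the defaults each line becomes a single field, and with header = TRUE, sep = "," the columns are split correctly.

```r
path <- tempfile(fileext = ".csv")
writeLines(c("Name,Age,Qualification,Address",
             "Amiya,18,MCA,BBS",
             "Niru,23,Msc,BLS"), path)

# Default: fields are split on white space, so each line is one field
ncol(read.table(path))   # 1

# Telling read.table() about the header row and the comma separator
myData <- read.table(path, header = TRUE, sep = ",")
print(myData)
```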
READ CSV FILE IN R
• read.csv(): read.csv() is used for reading “comma separated
value” files (“.csv”). In this also the data will be imported as
a data frame.
• Syntax: read.csv(file or file.choose(), header = TRUE, sep =
",", dec = ".", …)
• file.choose(): You can also
use file.choose() with read.csv() just like before.
• Parameters:
• file: the path to the file containing the data to be imported
into R.
• header: logical value. If TRUE, read.csv() assumes that your
file has a header row, so row 1 is the name of each column.
If that’s not the case, you can add the argument header =
FALSE.
• sep: the field separator character
• dec: the character used in the file for decimal points.
# R program to read a file in table format
# Using read.csv()
myData = read.csv("basic.csv")
print(myData)

• Output:
   Name Age Qualification Address
1 Amiya  18           MCA     BBS
2  Niru  23           Msc     BLS
3  Debi  23           BCA     SBP
4  Biku  56           ISC     JJP
• read.csv2(): read.csv2() is the variant used in
countries that use a comma "," as the decimal point and a
semicolon ";" as the field separator.
• Syntax: read.csv2(file, header = TRUE, sep = ";", dec =
",", …)
• Parameters:
• file: the path to the file containing the data to be
imported into R.
• header: logical value. If TRUE, read.csv2() assumes that
your file has a header row, so row 1 is the name of
each column. If that’s not the case, you can add the
argument header = FALSE.
• sep: the field separator character
• dec: the character used in the file for decimal points.
# R program to read a file in table format
# Using read.csv2() on a comma-separated file
myData = read.csv2("basic.csv")
print(myData)
• Output (each line stays in one column, because read.csv2() expects ";"):
  Name.Age.Qualification.Address
1 Amiya,18,MCA,BBS
2 Niru,23,Msc,BLS
3 Debi,23,BCA,SBP
4 Biku,56,ISC,JJP
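To see read.csv2() on the kind of file it is designed for, this sketch writes a semicolon-separated file with decimal commas to a temporary location. The Height values are hypothetical data.

```r
path <- tempfile(fileext = ".csv")
writeLines(c("Name;Height", "Amiya;1,72", "Niru;1,65"), path)

myData <- read.csv2(path)   # sep = ";" and dec = "," by default
print(myData)
myData$Height * 100         # 172 165, parsed as numbers
```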
read_csv() from the readr package
• read_csv(): This method is also used to read
comma (",") separated values, with the help
of the readr package.
• Syntax: read_csv(file, col_names = TRUE)
• Parameters:
• file: the path to the file containing the data to be
read into R.
• col_names: Either TRUE, FALSE, or a character
vector specifying column names. If TRUE, the first
row of the input will be used as the column
names.
# R program to read a file in table format
# using the readr package

# Import the readr library
library(readr)

# Using the read_csv() method
myData = read_csv("basic.csv", col_names = TRUE)
print(myData)
Reading a file from the internet

• It's possible to use the
functions read.delim(), read.csv() and read.table() to
import files from the web.
• Example:
# R program to read a file from the internet

# Using read.delim()
myData = read.delim("http://www.sthda.com/upload/boxplot_format.txt")
print(head(myData))
• Using read.table() Function
• This function lets us specify how the dataset is separated; in this
case we pass sep = "," as an argument (sep must be a single character).
• file.choose(): It opens a dialog to choose a CSV file from the
computer.
• header: It indicates whether the first row of the dataset
contains variable names. Use T/TRUE if the variable names
are present, otherwise F/FALSE.

• Example:
# import and store the dataset in data2
data2 <- read.table(file.choose(), header = TRUE, sep = ",")

# display data
data2
• Using R-Studio
• Here we are going to import data through R
studio with the following steps.
• Steps:
• From the Environment tab click on the Import
Dataset Menu.
• Select the file type from the menu options.
• In the third step, a pop-up box will appear,
either enter the file name or browse the
desktop.
• The selected file will be displayed on a new
window with its dimensions.
• In order to see the output on the console,
type the filename.
Working with Excel Files in R
Programming
• Excel files have the extension .xls or .xlsx, and Excel
also works with .csv (comma-separated values) files. To start
working with Excel files in the R Programming Language, we
first need to import them into RStudio or any
other R-supporting IDE (integrated development
environment).
• Reading Excel Files in R Programming Language
• First, install the readxl package in R to load Excel files.
Various methods, including their subparts, are
demonstrated below.
• Reading Files:
• The two Excel files Sample_data1.xlsx and
Sample_data2.xlsx are read from the working
directory.
# Working with Excel Files
# Installing the required package
install.packages("readxl")

# Loading the package
library(readxl)

# Importing the Excel files
Data1 <- read_excel("Sample_data1.xlsx")
Data2 <- read_excel("Sample_data2.xlsx")

# Printing the data
head(Data1)
head(Data2)
• The Excel files are loaded into the variables
Data1 and Data2 as data frames, and calling
head() on them prints the first rows of each dataset.
• Importing Excel File
• To read and import Excel files, the "xlsx" package
is required to use the read.xlsx() function. To
read ".xls" Excel files, the "gdata" package provides
the read.xls() function.
• Syntax:
• read.xlsx(filename, sheetIndex) OR
• read.xlsx(filename, sheetName)
Parameters:
sheetIndex specifies the number of the sheet
sheetName specifies the name of the sheet
• To know about all the arguments of read.xlsx(),
execute the command below in R:
• help("read.xlsx")
# Install the xlsx package
install.packages("xlsx")

library(xlsx)

# Check the current working directory
getwd()

# Get the content into a data frame
data <- read.xlsx("ExcelExample.xlsx",
                  sheetIndex = 1,
                  header = FALSE)

# Print the content of the Excel file
print(data)

# Print the class of data
print(class(data))
• Output:
[1] "C:/Users/xyz/Documents"
    X1  X2  X3
1 1000 ABC abc
2 2000 DEF def
3 3000 GHI ghi
4 4000 JKL jkl
5 5000 MNO mno
[1] "data.frame"
• Writing Files
• After performing all operations, Data1 and
Data2 are written into new files
using the write_xlsx() function from the writexl
package.
# Installing the package
install.packages("writexl")

# Loading the package
library(writexl)

# Writing Data1
write_xlsx(Data1, "New_Data1.xlsx")

# Writing Data2
write_xlsx(Data2, "New_Data2.xlsx")
Writing to Files in R Programming

• The R programming language is a very
powerful language used especially for data
analytics in various fields. Analyzing data
means reading and writing data in various
file types such as Excel, CSV and text files. This
section deals with various ways of writing
data to different types of files using R.
Writing Data to CSV files in R
Programming Language
• CSV stands for Comma Separated Values.
These files are commonly used to store large amounts
of statistical data. Following is the syntax to
write to a CSV file:
• Syntax:
• write.csv(my_data, file = "my_data.csv")
• write.csv2(my_data, file = "my_data.csv")
• Here write.csv() and write.csv2() are functions in R.
• write.csv() uses "." for the decimal point and a
comma (",") as the separator.
• write.csv2() uses a comma (",") for the
decimal point and a semicolon (";") as the
separator.
Writing Data to text files

• Text files are commonly used in almost every
application in our day-to-day life as a step toward
a "Paperless World". Writing to .txt
files is very similar to writing CSV files.
Following is the syntax to write to a tab-separated text file:
• Syntax:
• write.table(my_data, file = "my_data.txt", sep = "\t")
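A self-contained sketch that writes a small made-up data frame with write.csv() and write.table() to temporary files and inspects the raw lines. row.names = FALSE keeps R from writing row numbers as an extra column, and quote = FALSE (used here only for the text file) leaves strings unquoted.

```r
my_data <- data.frame(Name = c("Amiya", "Niru"), Age = c(18, 23))

csv_path <- tempfile(fileext = ".csv")
txt_path <- tempfile(fileext = ".txt")

write.csv(my_data, file = csv_path, row.names = FALSE)
write.table(my_data, file = txt_path, sep = "\t",
            row.names = FALSE, quote = FALSE)

readLines(csv_path)  # "\"Name\",\"Age\""  "\"Amiya\",18"  "\"Niru\",23"
readLines(txt_path)  # "Name\tAge"  "Amiya\t18"  "Niru\t23"

# Reading the CSV back recovers the same values
back <- read.csv(csv_path)
```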
Writing Data to Excel files

• To write data to Excel we need to install the
package known as the "xlsx" package; it is basically
a Java-based solution for reading, writing, and
committing changes to Excel files. It can be
installed as follows:
• install.packages("xlsx")
• It can then be loaded, and the general syntax for using
it is:
• Syntax:
• library("xlsx")
• write.xlsx(my_data, file = "result.xlsx",
sheetName = "my_data", append = FALSE)
Data Handling in R Programming

• The R Programming Language is used for statistics and data analytics.
Importing and exporting data is common in all these applications of R
programming.
The R language can read different types of files such as comma-
separated values (CSV) files, text files, Excel sheets and files, SPSS files,
SAS files, etc.
• R allows its users to work smoothly with the system's directories with the
help of some predefined functions that take the path of a directory as
an argument or return the path of the current directory that the user is
working in. Below are some directory functions in R:
• getwd(): This function is used to get the current working directory being
used by R.
• setwd(): This function is used to change the current working
directory; the path of the new directory is passed as an argument to the
function.
• Example:
• setwd("C:/RExamples/")
• setwd("C:\\RExamples\\")
• list.files(): This function lists all files and
folders present in the current working directory.
• Exporting files in R
• Below are some methods to export data to a
file in R:
• Using the console
The cat() function in R is used to output objects to the
console. It can also be used to redirect the
output to a particular file. Syntax:
• cat(..., file) Parameter:
file specifies the filename to which the output has to be
redirected
• To know about all the arguments of cat(), execute
the command below in R:
• help("cat")
str = "World"

# Redirect output to a file
cat("Hello,", str, file = "catExample.txt")
• Output:
The above code creates a new file and redirects
the output of cat() to it. The contents of the file are
shown below after executing the code:
• Hello, World
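By default cat(file = ...) overwrites the file; append = TRUE adds to it instead. A self-contained sketch using a temporary file (sep = "" avoids the space cat() would otherwise insert between arguments):

```r
path <- tempfile(fileext = ".txt")
name <- "World"

cat("Hello, ", name, "\n", sep = "", file = path)   # creates or overwrites the file
cat("Second line\n", file = path, append = TRUE)    # adds to the end instead

readLines(path)  # "Hello, World" "Second line"
```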
• Using the sink() function:
The sink() function is used to redirect all the
output from cat() and print() to the given
filename.
Syntax:
• sink(filename)  # begins redirecting output to the file
  . . .
  sink()          # stops redirecting
To know about all the arguments
of sink(), execute the command below in R:
• help("sink")
# Begin redirecting output
sink("SinkExample.txt")

x <- c(1, 3, 4, 5, 10)

print(mean(x))
print(class(x))
print(median(x))
sink()

• Output:
The above code creates a new file and redirects the output.
The contents of the file are shown below after executing
the code:
[1] 4.6
[1] "numeric"
[1] 4
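If code between sink(file) and sink() fails, output stays redirected. One way to guard against that, sketched here with a hypothetical helper write_summary(), is to register sink() with on.exit() inside a function so that it always runs.

```r
# Hypothetical helper: write printed output to a file
write_summary <- function(x, path) {
  sink(path)
  on.exit(sink(), add = TRUE)   # stop redirecting even if print() fails
  print(mean(x))
  print(median(x))
}

out_path <- tempfile(fileext = ".txt")
write_summary(c(1, 3, 4, 5, 10), out_path)
readLines(out_path)  # "[1] 4.6" "[1] 4"
```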
• Writing to CSV files
• A matrix or data frame can be written to a CSV file with the write.csv() function.
• Syntax: write.csv(x, file)
• Parameters:
x is the matrix or data frame to be written; file specifies the name of the file to write to.
• To see all the arguments of write.csv(), run the command below in R:
• help("write.csv")
• # Create vectors
• x <- c(1, 3, 4, 5, 10)
• y <- c(2, 4, 6, 8, 10)
• z <- c(10, 12, 14, 16, 18)

• # Create matrix
• data <- cbind(x, y, z)

• # Writing matrix to CSV File


• write.csv(data, file = "CSVWrite.csv", row.names =
FALSE)
• Output:
The code above creates a new file and writes the matrix to it. After the code runs, the file contains:
"x","y","z"
1,2,10
3,4,12
4,6,14
5,8,16
10,10,18
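The row.names argument decides whether an extra column of row labels is written. This sketch writes the same matrix twice, to temporary files, with and without row names. It then reads the file back with read.csv() to confirm the values survive the round trip.

```r
x <- c(1, 3, 4, 5, 10)
y <- c(2, 4, 6, 8, 10)
z <- c(10, 12, 14, 16, 18)
data <- cbind(x, y, z)

with_rn    <- tempfile(fileext = ".csv")
without_rn <- tempfile(fileext = ".csv")

# Default row.names = TRUE adds a first column of row labels ("1", "2", ...)
write.csv(data, file = with_rn)
# row.names = FALSE writes only the x, y and z columns
write.csv(data, file = without_rn, row.names = FALSE)

readLines(with_rn, n = 2)
readLines(without_rn, n = 2)

# Reading the file back gives a data frame with the same values
back <- read.csv(without_rn)
back
```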
• Exporting Data from Scripts in R Programming
• When an R program runs, its output appears on the prompt/terminal and is not stored anywhere. In the software industry, however, most programs are expected to keep the information they produce, and a common way to do that is to store it in a file. The two most common operations performed on a file are:
• Importing data to R scripts
• Exporting data from R scripts
• Exporting Data from R Scripts
• When a program terminates, all of its data is lost. Storing the data in a file preserves it after the program ends. Entering a large amount of data by hand takes a lot of time. If the data is already in a file, however, it can be accessed with a few commands in R, and the file can be moved from one computer to another without any changes. Such files can be stored in various formats: a .txt (tab-separated values) file, a tabular .csv (comma-separated values) file, or on the internet or in the cloud. R provides simple methods to export data to these files.
• Exporting data to a text file
• A text file is one of the most important formats for storing data. R provides various methods for exporting data to a text file.
• write.table():
• The base R function write.table() can be used to export a data frame or a matrix to a text file.
• Syntax: write.table(x, file, append = FALSE, sep = " ", dec = ".", row.names = TRUE, col.names = TRUE)
• Parameters:
x: a matrix or a data frame to be written.
file: a character string specifying the name of the result file.
sep: the field separator string, e.g., sep = "\t" (for tab-separated values).
dec: the string used as the decimal separator. Default is ".".
row.names: either a logical value indicating whether the row names of x are written along with x, or a character vector of row names to be written.
col.names: either a logical value indicating whether the column names of x are written along with x, or a character vector of column names to be written.
• # R program to illustrate
• # Exporting data from R

• # Creating a dataframe
• df = data.frame(
• "Name" = c("Amiya", "Raj", "Asish"),
• "Language" = c("R", "Python", "Java"),
• "Age" = c(22, 25, 45)
• )

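• # Export a data frame to a tab-separated text file using write.table()
• # col.names = NA writes an empty header cell above the row names,
• # so that the header lines up with the data columns
• write.table(df,
• file = "myDataFrame.txt",
• sep = "\t",
• row.names = TRUE,
• col.names = NA)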
Output:
• write_tsv():
• The write_tsv() function from the readr package exports data as tab-separated ("\t") values.
• Syntax: write_tsv(x, file)
• Parameters:
x: a data frame to be written
file: the path to the result file (older readr versions called this argument path)
• # R program to illustrate
• # Exporting data from R

• # Importing readr library


• library(readr)

• # Creating a dataframe
• df = data.frame(
• "Name" = c("Amiya", "Raj", "Asish"),
• "Language" = c("R", "Python", "Java"),
• "Age" = c(22, 25, 45)
• )

• # Export a data frame using write_tsv()


• write_tsv(df, file = "MyDataFrame.txt")
• Output:
• Name Language Age
Amiya R 22
Raj Python 25
Asish Java 45
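When readr is not installed, a similar tab-separated file can be written with base R. This sketch rebuilds the same df. It sets quote = FALSE and row.names = FALSE to get the header-plus-rows layout shown in the output above. This is not an exact replacement for write_tsv(), which may still quote fields in cases this sketch does not handle.

```r
df <- data.frame(
  Name = c("Amiya", "Raj", "Asish"),
  Language = c("R", "Python", "Java"),
  Age = c(22, 25, 45)
)

tsv_file <- file.path(tempdir(), "MyDataFrame.txt")

# Tab separator, no quotes around strings, no row-name column
write.table(df, file = tsv_file, sep = "\t", quote = FALSE, row.names = FALSE)

readLines(tsv_file)
```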
• Exporting data to a csv file
• Another popular format for storing data is csv (comma-separated values). R provides various methods for exporting data to a csv file.
• write.table():
• The base R function write.table() can also be used to export a data frame or a matrix to a csv file.
• Syntax and parameters: the same as for write.table() above (x, file, append, sep, dec, row.names, col.names). For a csv file, set sep = ",".
• # R program to illustrate
• # Exporting data from R

• # Creating a dataframe
• df = data.frame(
• "Name" = c("Amiya", "Raj", "Asish"),
• "Language" = c("R", "Python", "Java"),
• "Age" = c(22, 25, 45)
• )

• # Export a data frame to a csv file using write.table()
• write.table(df,
• file = "myDataFrame.csv",
• sep = ",",
• row.names = FALSE)
Output
• write.csv2():
• This function is very similar to write.csv(), but it uses a comma (",") as the decimal point and a semicolon (";") as the field separator.
• # R program to illustrate
• # Exporting data from R

• # Creating a dataframe
• df = data.frame(
• "Name" = c("Amiya", "Raj", "Asish"),
• "Language" = c("R", "Python", "Java"),
• "Age" = c(22, 25, 45)
• )

• # Export a data frame to a csv file using write.csv2()


• write.csv2(df, file = "my_data.csv")
output
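The example above has only whole-number ages, so the decimal comma never appears. This sketch uses hypothetical decimal scores to make both conventions visible. It then reads the file back with read.csv2(), which expects the same ";" separator and "," decimal mark. The file goes to tempdir().

```r
# Hypothetical data with decimal values so the decimal comma is visible
scores <- data.frame(Name = c("Amiya", "Raj"), Score = c(22.5, 25.75))
csv2_file <- file.path(tempdir(), "my_data.csv")

# write.csv2(): fields separated by ";" and decimals written with ","
write.csv2(scores, file = csv2_file, row.names = FALSE)
readLines(csv2_file)

# read.csv2() uses the same conventions, so the values round-trip
back <- read.csv2(csv2_file)
back$Score
```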
• write_csv():
• The write_csv() function from the readr package exports data as comma-separated (",") values.
• Syntax: write_csv(x, file)
• Parameters:
x: a data frame to be written
file: the path to the result file (older readr versions called this argument path)
• # R program to illustrate
• # Exporting data from R

• # Importing readr library


• library(readr)

• # Creating a dataframe
• df = data.frame(
• "Name" = c("Amiya", "Raj", "Asish"),
• "Language" = c("R", "Python", "Java"),
• "Age" = c(22, 25, 45)
• )

• # Export a data frame using write_csv()


• write_csv(df, file = "MyDataFrame.csv")
DATASET
• A data set is a collection of numbers or values that relate to a particular subject.
• A data set is a collection of data, often presented in a table.
• Organized collections of data are known as data sets. They are widely used in fields such as machine learning, business, and government to gain insights, make informed decisions, or train algorithms.
• The variables in a data set can be related so that they change in the same direction, related so that they change in opposite directions, or not related at all.
• The data can be numerical or categorical. A data set can contain two variables or many variables, with different relationships between them.
• The data is arranged in the table in a way that helps with understanding the information.
• For example, a data set can compare the time of day to the number of customers. It can also record an individual's income, expenses, and savings, and how the three affect each other.
• What is the purpose of a data set?
• The purpose of a data set is to organize the collected data so that it is easier to understand. A data set places the data into rows and columns for comparison.
• Types of Data Sets
• There are several types of data sets, and each one holds a different type of data. Below we discuss the different data sets and the types of operations that can be performed on the data.
• Numerical Data Set
• Numerical data is data represented with numbers rather than words. It is also called quantitative data, because a quantity is a numerical amount: when asked for a quantity, the data can be counted or measured. Numerical data is always displayed as numbers so that mathematical operations can be performed on it.
• Examples include temperature, humidity, and marks, as well as:
• Number of siblings
• Number of students playing different sports
• Number of miles run
• Bivariate Data Set
• Bivariate data ("bi" meaning two and "variate" meaning variable) is a data set with two variables that are related to each other.
• For example, height and weight in a data set are directly related to each other.
• The number of beachgoers compared to the temperature: the two variables are the beachgoers and the temperature.
• The money earned compared to the number of hours worked: the two variables are the money earned and the hours worked.
• Multivariate Data Set
• Multivariate data, as its name states, is a data set with multiple variables, meaning three or more. The variables in a multivariate data set interact with and depend on each other.
• For example, attendance and assignment grades are both related to a student's overall grade.
• The distance a car travels compared to its rate and time: the variables are distance, rate, and time, and each one affects the others.
• Categorical Data Set
• Categorical data represents qualities and characteristics of a person or item. The variables in a categorical data set are called qualitative variables, because they represent qualities rather than amounts.
• Examples include categories such as colour, gender, occupation, games, sports, and so on.
• Degree earned (bachelors, masters, doctoral)
• Favorite color (red, blue, green, etc.)
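In R, categorical variables are usually stored as factors. The sketch below uses the "degree earned" example above. The six observations are invented for illustration, and the levels argument fixes the order in which the categories are listed.

```r
# Six hypothetical observations of the "degree earned" variable
degree <- factor(
  c("masters", "bachelors", "bachelors", "doctoral", "masters", "bachelors"),
  levels = c("bachelors", "masters", "doctoral")
)

levels(degree)  # the possible categories
table(degree)   # number of observations in each category
```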

• Correlation Data Set
• Correlation data have some type of relationship between the variables. The relationship is one of three kinds:
• Positive relationships occur when the variables change in the same direction.
• Negative relationships occur when the variables change in opposite directions.
• No (zero) relationship occurs when the variables have no effect on each other.
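The direction of a relationship can be measured with cor(), which returns the Pearson correlation coefficient between -1 and 1. A negative value means a negative relationship, a positive value means a positive one, and a value near 0 means no linear relationship. This sketch uses the built-in mtcars data set introduced below. The two variable pairs were chosen for illustration.

```r
# Negative relationship: heavier cars tend to get fewer miles per gallon
r_wt_mpg <- cor(mtcars$wt, mtcars$mpg)

# Positive relationship: larger engine displacement tends to go with more horsepower
r_disp_hp <- cor(mtcars$disp, mtcars$hp)

round(c(wt_mpg = r_wt_mpg, disp_hp = r_disp_hp), 2)
```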

• Web Dataset: datasets created by calling APIs with HTTP requests and populating them with values for data analysis. These are mostly stored in JSON (JavaScript Object Notation) format.
• Time series Dataset: datasets recorded over a period of time, for example, changes in geographical terrain over time.
• Image Dataset: datasets consisting of images. These are often used to distinguish between types of diseases, heart conditions, and so on.
• Ordered Dataset: datasets whose data are ordered in ranks, for example, customer reviews, movie ratings, and so on.
• Partitioned Dataset: datasets whose data points are separated into different members or partitions.
• File-Based Datasets: datasets stored in files, such as .csv or Excel (.xlsx) files.
• Features of a Dataset
• The features of a dataset are its columns. They are the most critical aspect of the dataset. Models are built from the features of the existing data points, and those models are then used to predict outputs for any new data point added to the dataset.
• No single standard set of features applies to every dataset, because datasets differ completely in their purpose and data. Some possible features of a dataset are:
• Numerical Features: numerical values such as height, weight, and so on. These may be continuous over an interval, or discrete.
• Categorical Features: multiple classes/categories, such as gender, colour, and so on.
• Metadata: a general description of the dataset. For very large datasets in particular, a description that comes with the dataset when it is handed to a new developer saves a lot of time and improves efficiency.
• Size of the Data: the number of entries and features in the file containing the dataset.
• Formatting of Data: datasets available online come in several formats, such as JSON (JavaScript Object Notation), CSV (Comma Separated Values), XML (eXtensible Markup Language), DataFrame, and Excel files (xlsx or xlsm). Particularly large datasets, especially image datasets for disease detection, are often downloaded as zip files that must be extracted into their individual components.
• Target Variable: the feature whose values are predicted from the other features using machine learning techniques.
• Data Entries: the individual values of data present in the dataset. They play a huge role in data analysis.
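The distinction between numerical and categorical features, and the size of the data, can be checked directly on a built-in dataset. This sketch uses iris, one of the built-in datasets listed below. The labels "numerical" and "categorical" are this example's own naming and are not R classes.

```r
# Label each column (feature) of the built-in iris data set by its type
feature_type <- sapply(iris, function(col) {
  if (is.numeric(col)) "numerical" else "categorical"
})
feature_type

# Size of the data: number of entries (rows) and features (columns)
dim(iris)
```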
• Most Used built-in Datasets in R

• R ships with many datasets to try, but the most commonly used built-in datasets are:
• airquality - New York Air Quality Measurements
• AirPassengers - Monthly Airline Passenger Numbers 1949-1960
• mtcars - Motor Trend Car Road Tests
• iris - Edgar Anderson's Iris Data
• These are a few of the most used built-in data sets. To learn about the other built-in datasets, see the documentation of The R Datasets Package.
• A popular built-in data set in R is "mtcars" (Motor Trend Car Road Tests). Its data was extracted from the 1974 Motor Trend US magazine.
• In the examples below (and in the next chapters), we will use the mtcars data set for statistical purposes:
• Example
• # Print the mtcars data set
mtcars
• Result (first rows):
                   mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
Duster 360        14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
Merc 240D         24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
Merc 230          22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
Merc 280          19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
Merc 280C         17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
...
mtcars {datasets}    R Documentation
Motor Trend Car Road Tests
Description
The data was extracted from the 1974 Motor Trend US magazine, and comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973-74 models).
Usage
mtcars
Format
A data frame with 32 observations on 11 (numeric) variables.
[, 1]  mpg   Miles/(US) gallon
[, 2]  cyl   Number of cylinders
[, 3]  disp  Displacement (cu.in.)
[, 4]  hp    Gross horsepower
[, 5]  drat  Rear axle ratio
[, 6]  wt    Weight (1000 lbs)
[, 7]  qsec  1/4 mile time
[, 8]  vs    Engine (0 = V-shaped, 1 = straight)
[, 9]  am    Transmission (0 = automatic, 1 = manual)
[,10]  gear  Number of forward gears
[,11]  carb  Number of carburetors
Note
Henderson and Velleman (1981) comment in a footnote to Table 1: 'Hocking [original transcriber]'s noncrucial coding of the Mazda's rotary engine as a straight six-cylinder engine and the Porsche's flat engine as a V engine, as well as the inclusion of the diesel Mercedes 240D, have been retained to enable direct comparisons to be made with previous analyses.'
Source
Henderson and Velleman (1981), Building multiple regression models interactively. Biometrics, 37, 391-411.
Examples
summary(mtcars)
• From the examples above, we can see that the data set has 32 observations (Mazda RX4, Mazda RX4 Wag, Datsun 710, etc.) and 11 variables (mpg, cyl, disp, etc.).
• A variable is defined as something that can be measured or counted.
• Here is a brief explanation of the variables from the mtcars data set:
• Variable Name: Description
• mpg: Miles/(US) gallon
• cyl: Number of cylinders
• disp: Displacement (cu.in.)
• hp: Gross horsepower
• drat: Rear axle ratio
• wt: Weight (1000 lbs)
• qsec: 1/4 mile time
• vs: Engine (0 = V-shaped, 1 = straight)
• am: Transmission (0 = automatic, 1 = manual)
• gear: Number of forward gears
• carb: Number of carburetors
• Information About the Data Set
• You can use the question mark (?) to get information about the mtcars data set:
• Example
• # Use the question mark to get information about the data set
?mtcars
• Display R datasets
• To display a dataset, we simply write its name inside the print() function. For example,
• # display airquality dataset
print(airquality)
• The printed output begins with the column names: Ozone Solar.R Wind Temp Month Day
• Get Information About a Dataset
• In R, there are various functions we can use to get information about a dataset, such as its dimensions, its number of rows and columns, and the names of its variables. For example,
• # use dim() to get dimension of dataset
cat("Dimension:",dim(airquality))
• # use nrow() to get number of rows
cat("\nRow:",nrow(airquality))
• # use ncol() to get number of columns
cat("\nColumn:",ncol(airquality))
• # use names() to get name of variable of dataset
cat("\nName of Variables:",names(airquality))
• dim() - returns the dimensions of the dataset, i.e. 153 6
• nrow() - returns the number of rows (observations), i.e. 153
• ncol() - returns the number of columns (variables), i.e. 6
• names() - returns the names of all the variables
• Display Variables Value in R
• To display all the values of the specified
variable in R, we use the $ operator and the
name of the variable. For example,
• # display all values of Temp variable
print(airquality$Temp)
• Sort Variables Value in R
• In R, we use the sort() function to sort values
of variables in ascending order. For example,
• # sort values of Temp variable
sort(airquality$Temp)
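sort() also accepts decreasing = TRUE. To reorder a whole data frame by one variable, the usual tool is order(), which returns row positions instead of values. This sketch uses airquality, as above. Showing only the first few values with head() is just to keep the printout short.

```r
temps <- airquality$Temp
head(sort(temps))                     # ascending (the default)
head(sort(temps, decreasing = TRUE))  # descending

# order() returns row positions, so it can sort every column of the data
# frame by one variable
hottest <- airquality[order(airquality$Temp, decreasing = TRUE), ]
head(hottest, 3)
```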
• Statistical Summary of Data in R
• We use the summary() function to get statistical
information about the dataset.
• For a numeric variable, the summary() function returns six statistical summaries:
• Min
• First Quartile
• Median
• Mean
• Third Quartile
• Max
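A short sketch of summary() on single airquality variables. For a numeric vector without missing values, it returns the six statistics listed above. For a column with missing values, such as Ozone, it adds an extra NA's count.

```r
s <- summary(airquality$Temp)
s          # Min., 1st Qu., Median, Mean, 3rd Qu., Max.
names(s)

# Columns with missing values also get an "NA's" entry
summary(airquality$Ozone)
```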
• Get Information
• Use the dim() function to find the dimensions of the data set, and the names() function to view the names of the variables:
• Example
• # Create a variable holding the mtcars data set for better organization
Data_Cars <- mtcars
# Use dim() to find the dimensions of the data set
dim(Data_Cars)
# Use names() to find the names of the variables in the data set
names(Data_Cars)
• Use the rownames() function to get the name of each row, which in mtcars is the name of each car:
• Example
• Data_Cars <- mtcars
rownames(Data_Cars)
Handling large data sets in R
• The problem with large data sets in R:
• R reads the entire data set into RAM at once, while other programs can read sections of a file on demand.
• R objects live entirely in memory.
• R has no native 64-bit integer (int64) data type, which limits indexing. Before R 3.0.0 a vector could hold at most 2^31 - 1 (about 2 billion) elements. Even with the long vectors introduced in R 3.0.0, each dimension of a matrix or data frame is still limited to 2^31 - 1. In practice, files of around 2-4 GB often run into these limits.
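Because R objects live entirely in memory, object.size() gives a quick feel for how fast memory is used up. This sketch assumes ordinary double-precision columns at 8 bytes per value. The sizes are chosen only for illustration.

```r
# One million doubles: 8 bytes per value, so about 8 MB
v <- numeric(1e6)
print(object.size(v), units = "MB")

# 100,000 rows x 10 double columns is also 1 million values, so again about 8 MB
df_big <- as.data.frame(matrix(0, nrow = 1e5, ncol = 10))
print(object.size(df_big), units = "MB")
```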
• How big is a large data set?
• We can group large data sets in R into two broad categories:
• Medium-sized files that can be loaded in R (within the memory limit) but are cumbersome to process (typically in the 1-2 GB range).
• Large files that cannot be loaded in R because of the R/OS limitations discussed above. This group can be further split into two subgroups:
– Large files (typically 2-10 GB) that can still be processed locally using some workaround solutions.
– Very large files (> 10 GB) that need distributed, large-scale computing.
• Medium-sized datasets (< 2 GB)
1. Try to reduce the size of the file before loading it into R
• If you are loading xls files, you can select only the columns required for the analysis instead of the entire data set.
• Base read.csv() and read.table() still parse every line of a csv or text file, so you might want to pre-process the data on the command line with cut or awk and keep only the data required for the analysis.
2. Pre-allocate the number of rows and pre-define the column classes
• Read optimization example:
• read a few records of the input file, identify the class of each column, and pass those classes (colClasses) when reading the entire data set
• calculate the approximate row count of the data set from the size of the file and the number of fields per row (or using wc on the command line), and pass it as the nrows= parameter
• set comment.char = "" so that read.table() does not scan for comment characters
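The read-optimization steps above can be sketched in base R. To keep the example self-contained, it first writes a small stand-in CSV to a temporary file, whereas in practice the input would be your large file. The 100-line sample and the 10% safety margin on the row estimate are arbitrary choices. read.table() treats nrows as a maximum, so a mild over-estimate is harmless.

```r
# A small stand-in for a large input file, so the sketch is self-contained
big_file <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:1000,
                     value = (1:1000) / 4,
                     group = rep(c("a", "b", "c", "d"), 250)),
          big_file, row.names = FALSE)

# 1. Read a few records and record the class of each column
sample_rows <- read.csv(big_file, nrows = 5)
col_classes <- sapply(sample_rows, class)

# 2. Estimate the row count from the file size and the average length of the
#    first lines (+1 byte per newline), with a 10% safety margin
first_lines    <- readLines(big_file, n = 100)
bytes_per_line <- mean(nchar(first_lines, type = "bytes") + 1)
n_rows_est     <- ceiling(1.1 * file.size(big_file) / bytes_per_line)

# 3. Read the whole file with the classes and row count supplied up front;
#    comment.char = "" turns off scanning for comment characters
full <- read.table(big_file, header = TRUE, sep = ",",
                   colClasses = col_classes, nrows = n_rows_est,
                   comment.char = "")
str(full)
```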
• Alternatively, use the fread() function from the data.table package.
• Billed as a "fast and friendly file finagler", the fread() function of the popular data.table package is extremely useful and easy to use. It imports data from regular delimited files directly into R, without any detours or nonsense.
• One of the great things about this function is that controls expressed in arguments such as sep, colClasses, and nrows are detected automatically.
• bit64::integer64 types are also detected and read directly, without first reading them as character and then converting.
• ff - ff is another package for large data sets, similar to bigmemory. It also uses a pointer, but the pointer refers to a flat binary file stored on disk, and the data can be shared across different sessions.
• One advantage ff has over bigmemory is that it supports multiple data types in the same data set, which bigmemory does not.
• Parallel Processing - The parallel approach runs several computations at the same time, taking advantage of multiple cores or CPUs on a single system or across systems. The following R packages are used for parallel processing in R.
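The text does not name a specific package here. One option that ships with R itself is the base parallel package. This sketch uses its makeCluster() and parLapply() functions. The worker count of 2 and the slow_square() function are illustrative assumptions, with slow_square() standing in for real per-chunk work.

```r
library(parallel)

# Hypothetical stand-in for an expensive computation on one chunk of data
slow_square <- function(x) {
  Sys.sleep(0.01)
  x^2
}

cl  <- makeCluster(2)                    # start 2 worker R processes
res <- parLapply(cl, 1:20, slow_square)  # each worker handles part of 1:20
stopCluster(cl)                          # always release the workers

unlist(res)
```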
• bigmemory - bigmemory is part of the "big" family, which consists of several packages that perform analysis on large data sets. bigmemory provides several matrix objects, but we will focus only on big.matrix.
• big.matrix is an R object that uses a pointer to a C++ data structure. The C++ matrix can be kept in RAM or on disk, and the pointer to it can be saved and shared with other users in different sessions.
• By loading the pointer object, users can access the data set without reading the entire set into R.
• Very large datasets
• There are two options for processing very large data sets (> 10 GB) in R:
• Use integrated environment packages such as Rhipe to leverage the Hadoop MapReduce framework.
• Use RHadoop directly on a Hadoop distributed system.
• Storing large files in databases and connecting to them from R through DBI/ODBC is also an option worth considering.

Unit-5
• Regular Expressions - Regular expressions (regex) are a set of pattern-matching commands used to detect string sequences in large text data. These commands are designed to match a family of text (alphanumeric characters, digits, words), which makes them versatile enough to handle any text/string class.
• In short, regular expressions let you get more out of text data while writing shorter code.
• String Manipulation - In R, packages such as stringr and stringi provide a full set of string manipulation functions.
• In addition, base R includes several functions for string manipulation. These functions are designed to complement regular expressions.
• The practical differences between string manipulation functions and regular expressions are:
• We use string manipulation functions for simple tasks such as splitting a string or extracting the first three letters. We use regular expressions for more complicated tasks such as extracting email IDs or dates from a body of text.
• String manipulation functions are designed to behave in one fixed way and don't deviate from that behavior. Regular expressions, by contrast, can be customized in any way we want.
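The contrast above can be seen in a few lines of base R. The string and the two e-mail addresses are made up for illustration. The e-mail pattern is a deliberately simplified sketch and would not catch every valid address.

```r
x <- "Contact: anna@example.com or rudy.s@example.org"

# String manipulation functions: simple tasks with fixed behaviour
substr(x, 1, 3)         # the first three letters
strsplit(x, " ")[[1]]   # split the string on spaces
toupper(x)              # convert to upper case

# Regular expression: extract every e-mail address
# (a simplified pattern, not a full e-mail validator)
emails <- regmatches(x, gregexpr("[[:alnum:]._]+@[[:alnum:].]+", x))[[1]]
emails
```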
List of String Manipulation Functions
List of Regular Expression Commands

• In regex, there are multiple ways of doing a certain task. Therefore, while learning, it's essential to stick to one particular method to avoid confusion.
• Regular expressions in R can be divided into 5
categories:
• Meta characters
• Sequences
• Quantifiers
• Character Classes
• POSIX character classes
1. Meta characters - Metacharacters are a set of special operators that regex does not match literally. These characters include: . \ | ( ) [ ] { } ^ $ * + ?
• If any of these characters appear in a string, regex won't match them literally unless they are prefixed with a double backslash (\\) in R.
• From a given vector, we want to detect the string "percent%". We'll use the base grep() function, which detects strings matching a given pattern.
• dt <- c("percent%","percent")
grep(pattern = "percent\\%", x = dt, value = TRUE)
[1] "percent%"
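Escaping matters most for characters that really are metacharacters, such as the dot. This sketch uses a made-up version string. It compares an unescaped ".", an escaped "\\.", and gsub()'s fixed = TRUE option, which treats the whole pattern as plain text.

```r
ver <- "v.0.99.484"

# Unescaped, "." is a metacharacter that matches any single character,
# so every character is replaced
gsub(".", "-", ver)

# Escaped with a double backslash, it matches only a literal dot
gsub("\\.", "-", ver)               # "v-0-99-484"

# fixed = TRUE treats the pattern as plain text, so no escaping is needed
gsub(".", "-", ver, fixed = TRUE)   # "v-0-99-484"
```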
2. Quantifiers - Quantifiers are the shortest to type, but these tiny atoms are immensely powerful.
• Moving a single quantifier can change the entire output.
• Quantifiers are mainly used to determine the length of the resulting match.
• Always remember that a quantifier applies to the item immediately to its left.
• Commonly used quantifiers are ? (zero or one time), * (zero or more times), + (one or more times), {n} (exactly n times), {n,} (n or more times) and {n,m} (between n and m times). They are often combined with the wildcard ., which matches any single character.
• These quantifiers can be used with metacharacters, sequences, and character classes to build complex patterns. Their behaviour comes in two forms:
• Greedy quantifiers: by default a quantifier, such as the * in .*, is greedy. It matches as many repetitions as possible while still letting the whole pattern match.
• Non-greedy quantifiers: appending ? to a quantifier, as in .*?, makes it non-greedy (lazy). It matches as few repetitions as possible, stopping at the first place where the rest of the pattern matches.
• Let's look at an example of greedy vs. non-greedy quantifiers. In the given number, starting from the first digit, we want to extract everything up to the next digit '1'. The desired result is 101.
• number <- "101000000000100"
• #greedy
regmatches(number, gregexpr(pattern = "1.*1", text = number))
[[1]]
[1] "1010000000001"
• #non greedy
regmatches(number, gregexpr(pattern = "1.*?1", text = number))
[[1]]
[1] "101"
• It works like this: the greedy match starts from the first digit, moves ahead, and stumbles on the second '1'. Being greedy, it keeps searching for '1' and stumbles on the third '1' in the number. It continues to check further but finds no more. Hence, it returns "1010000000001". The non-greedy quantifier, on the other hand, stops at the first match and returns "101".
• Let's look at a few more examples of quantifiers:
• names <- c("anna","crissy","puerto","cristian","garcia","steven","alex","rudy")
• #doesn't matter if e is a match
grep(pattern = "e*", x = names, value = TRUE)
[1] "anna" "crissy" "puerto" "cristian" "garcia" "steven" "alex" "rudy"
• #must match t one or more times
grep(pattern = "t+", x = names, value = TRUE)
[1] "puerto" "cristian" "steven"
• #must match n two times
grep(pattern = "n{2}", x = names, value = TRUE)
[1] "anna"
3. Sequences - As the name suggests, sequences contain special characters used to describe a pattern in a given string. Commonly used sequences in R include \\d (digit), \\D (non-digit), \\s (whitespace), \\S (non-whitespace), \\w (word character) and \\W (non-word character):
• # substitute any digit with an underscore
• gsub(pattern = "\\d", "_", "I'm working in RStudio v.0.99.484")
• [1] "I'm working in RStudio v._.__.___"
• # substitute any non-digit with an underscore
• gsub(pattern = "\\D", "_", "I'm working in RStudio v.0.99.484")
• [1] "_________________________0_99_484"
• # substitute any whitespace with an underscore
• gsub(pattern = "\\s", "_", "I'm working in RStudio v.0.99.484")
• [1] "I'm_working_in_RStudio_v.0.99.484"
• # substitute any word character with an underscore
• gsub(pattern = "\\w", "_", "I'm working in RStudio v.0.99.484")
• [1] "_'_ _______ __ _______ _._.__.___"
• Let's look at some examples:
• string <- "I have been to Paris 20 times"
• #match one or more digits
gsub(pattern = "\\d+", replacement = "_", x = string)
regmatches(string, regexpr(pattern = "\\d+", text = string))
• #match one or more non-digits
gsub(pattern = "\\D+", replacement = "_", x = string)
regmatches(string, regexpr(pattern = "\\D+", text = string))
• #match whitespace - returns positions
gregexpr(pattern = "\\s+", text = string)
• #match a run of non-space characters
gsub(pattern = "\\S+", replacement = "app", x = string)
• #match a word character
gsub(pattern = "\\w", replacement = "k", x = string)
• #match a non-word character
gsub(pattern = "\\W", replacement = "k", x = string)
4. Character Classes - Character classes are sets of characters enclosed in square brackets [ ].
• These classes match only the characters enclosed in the brackets, and they can be combined with quantifiers.
• The caret (^) has a special role in character classes. Placed first inside the brackets, it negates the class, matching everything except the listed characters. Common character classes include [aeiou] (any vowel), [0-9] (any digit), [a-z] (any lowercase letter), [A-Z] (any uppercase letter) and [^0-9] (anything except a digit).
• Let's look at some examples using character classes:
• string <- "20 people got killed in the mob attack. 14 got severely injured"
• #extract numbers
regmatches(x = string, gregexpr("[0-9]+", text = string))
• #extract everything except digits
regmatches(x = string, gregexpr("[^0-9]+", text = string))
5. POSIX Character Classes - In R, these classes are written inside double square brackets ([[ ]]).
• They work like character classes, and a caret ahead of the class negates it (for example, [^[:alnum:]]). POSIX character classes available in R include [[:alpha:]], [[:digit:]], [[:alnum:]], [[:punct:]], [[:space:]], [[:blank:]], [[:cntrl:]], [[:graph:]], [[:print:]], [[:lower:]], [[:upper:]] and [[:xdigit:]].
• x <- "I like beer! #beer, @wheres_my_beer, I like R (v3.2.2) #rrrrrrr2015"
• # remove spaces and tabs
• gsub(pattern = "[[:blank:]]", replacement = "", x)
• [1] "Ilikebeer!#beer,@wheres_my_beer,IlikeR(v3.2.2)#rrrrrrr2015"
• # replace punctuation with whitespace
• gsub(pattern = "[[:punct:]]", replacement = " ", x)
• [1] "I like beer   beer   wheres my beer  I like R  v3 2 2   rrrrrrr2015"
• # remove alphanumeric characters
• gsub(pattern = "[[:alnum:]]", replacement = "", x)
• [1] "  ! #, @__,    (..) #"
• Let's look at some more examples of this regex class:
• string <- c("I sleep 16 hours\n, a day", "I sleep 8 hours\n a day.", "You sleep how many\t hours ?")
• #get digits
unlist(regmatches(string, gregexpr("[[:digit:]]+", text = string)))
• #remove punctuation
gsub(pattern = "[[:punct:]]+", replacement = "", x = string)
• #replace blanks (spaces and tabs) with hyphens
gsub(pattern = "[[:blank:]]", replacement = "-", x = string)
• #replace control characters (such as \n and \t) with a space
gsub(pattern = "[[:cntrl:]]+", replacement = " ", x = string)
• #remove non-graphical characters
gsub(pattern = "[^[:graph:]]+", replacement = "", x = string)
Bar Charts
• A bar chart represents data as rectangular bars whose length is proportional to the value of the variable.
• R uses the barplot() function to create bar charts. R can draw both vertical and horizontal bars in a bar chart.
• In a bar chart, each bar can be given a different color.
• The basic syntax to create a bar chart in R is:
• barplot(H, xlab, ylab, main, names.arg, col)
Following is the description of the parameters used:
• H is a vector or matrix containing numeric values used
in bar chart.
• xlab is the label for x axis.
• ylab is the label for y axis.
• main is the title of the bar chart.
• names.arg is a vector of names appearing under each
bar.
• col is used to give colors to the bars in the graph.
• Example- H <- c(7,12,28,3,41)
• barplot(H)
Bar Chart Labels, Title and Colors
• The features of the bar chart can be expanded by adding more parameters.
• The main parameter adds a title. The col parameter adds colors to the bars, and border sets the color of the bar outlines.
• names.arg is a vector with the same number of values as the input vector, describing the meaning of each bar.
• H <- c(7,12,28,3,41)
• M <- c("Mar","Apr","May","Jun","Jul")
• barplot(H, names.arg = M, xlab = "Month", ylab = "Revenue", col = "blue", main = "Revenue chart", border = "red")
Group Bar Chart and Stacked Bar Chart
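• We can create a bar chart with groups of bars, or with stacks in
each bar, by using a matrix as the input values.
• When the data has two dimensions (here, regions by month), the values
are stored as a matrix: each column becomes one bar, and the rows within a
column are stacked by default. Setting beside = TRUE draws them side by side as a group instead.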
• colors = c("green","orange","brown")
• months <- c("Mar","Apr","May","Jun","Jul")
• regions <- c("East","West","North")
• Values <- matrix(c(2,9,3,11,9,4,8,7,3,12,5,2,8,10,11), nrow =
3, ncol = 5, byrow = TRUE)
• barplot(Values, main = "total revenue", names.arg =
months, xlab = "month", ylab = "revenue", col = colors)
• legend("topleft", regions, cex = 1.3, fill = colors)
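The call above produces stacked bars. A sketch of the grouped version, reusing the same matrix with beside = TRUE, is below; adding dimnames and the ylim headroom are choices made here, not part of the original example.

```r
colors  <- c("green", "orange", "brown")
months  <- c("Mar", "Apr", "May", "Jun", "Jul")
regions <- c("East", "West", "North")

# rows = regions, columns = months
Values <- matrix(c(2, 9, 3, 11, 9, 4, 8, 7, 3, 12, 5, 2, 8, 10, 11),
                 nrow = 3, ncol = 5, byrow = TRUE,
                 dimnames = list(regions, months))

# beside = TRUE draws each column as a group of side-by-side bars;
# with the default beside = FALSE the rows of a column are stacked
barplot(Values, beside = TRUE, col = colors,
        main = "Revenue by region", xlab = "month", ylab = "revenue",
        ylim = c(0, max(Values) + 2))
legend("topleft", regions, fill = colors, cex = 0.8)

# In the stacked version, each bar's height is the column total
print(colSums(Values))
```

Because the matrix carries column names, barplot() uses them as the group labels, so names.arg is not needed here.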
Histogram
• A histogram represents the frequencies of
values of a variable bucketed into ranges.
A histogram is similar to a bar chart, but it
groups the values into continuous ranges.
The height of each bar represents the number
of values present in that range.
• R creates a histogram using the hist() function. This
function takes a vector as input and uses
some more parameters to plot histograms.
• The basic syntax for creating a histogram using R is:
• hist(v,main,xlab,xlim,ylim,breaks,col,border)
• v is a vector containing numeric values used in histogram.
• main indicates title of the chart.
• col is used to set color of the bars.
• border is used to set border color of each bar.
• xlab is used to give description of x-axis.
• xlim is used to specify the range of values on the x-axis.
• ylim is used to specify the range of values on the y-axis.
• breaks controls the bins: either a single number (a suggestion for how many bins to use) or a vector giving the break points between bins.
• A simple histogram is created using input vector, label, col
and border parameters.
• v <- c(9,13,21,8,36,22,12,41,31,33,19)
• hist(v,xlab = "Weight",col = "yellow",border = "blue")
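To see how breaks decides which values fall in each bar, hist() can compute the bins without drawing. This sketch assumes bins of width 10 from 0 to 50; that choice is an illustration, not part of the original example.

```r
v <- c(9, 13, 21, 8, 36, 22, 12, 41, 31, 33, 19)

# A vector of breaks fixes the bin edges exactly. Bins are right-closed:
# [0,10], (10,20], (20,30], (30,40], (40,50]
# plot = FALSE returns the computed bins without drawing anything
h <- hist(v, breaks = seq(0, 50, by = 10), plot = FALSE)
print(h$counts)   # 2 3 2 3 1

# The same breaks can be passed when drawing the chart
hist(v, breaks = seq(0, 50, by = 10), xlab = "Weight",
     col = "yellow", border = "blue")
```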
Box Plot
• A boxplot shows how the values in a data set are
distributed. The three quartiles divide the data into four
parts, and the graph shows the minimum, first quartile,
median, third quartile and maximum of the data set
(values unusually far from the box are drawn as separate points).
Boxplots are also useful for comparing the distribution of data
across data sets, by drawing a boxplot for each of them.
• Boxplots are created in R by using
the boxplot() function.
• The basic syntax to create a boxplot in R is:
• boxplot(x, data, notch, varwidth, names, main)
• Following is the description of the parameters used:
• x is a vector or a formula.
• data is the data frame.
• notch is a logical value. Set as TRUE to draw a notch.
• varwidth is a logical value. Set it to TRUE to draw the width of each box
proportional to the square root of the sample size.
• names are the group labels which will be printed under each
boxplot.
• main is used to give a title to the graph.
• We use the data set "mtcars" available in the R environment to
create a basic boxplot. Let's look at the columns "mpg" and "cyl" in
mtcars.
• input <- mtcars[,c('mpg','cyl')]
• boxplot(mpg ~ cyl, data = mtcars, xlab = "Number of Cylinders",
ylab = "Miles Per Gallon", main = "Mileage Data")
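The numbers a boxplot draws can be inspected directly. A minimal sketch, using a made-up vector with one extreme value, shows that the whiskers stop at the most extreme point within 1.5 box lengths of the box and that anything beyond is reported as an outlier:

```r
# A made-up sample with one unusually large value
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 100)

s <- boxplot.stats(x)
print(s$stats)  # lower whisker, lower hinge, median, upper hinge, upper whisker: 1 3 5.5 8 9
print(s$out)    # 100, drawn as a separate point

# For grouped data, boxplot(..., plot = FALSE) returns the same kind of
# statistics, one column per group (here, one per number of cylinders)
b <- boxplot(mpg ~ cyl, data = mtcars, plot = FALSE)
print(b$names)  # "4" "6" "8"
```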
• Boxplot with Notch –
• Drawing a boxplot with a notch helps compare the
medians of different data groups: if the notches of two boxes
do not overlap, this is strong evidence that their medians differ.
• The below script will create a boxplot graph with notch
for each of the data group.
• boxplot(mpg ~ cyl, data = mtcars,
xlab = "Number of Cylinders",
ylab = "Miles Per Gallon",
main = "Mileage Data",
notch = TRUE, varwidth = TRUE,
col = c("green","yellow","purple"),
names = c("High","Medium","Low") )
Scatter Plot
• Scatterplots show many points plotted in the Cartesian
plane. Each point represents the values of two variables.
One variable is chosen in the horizontal axis and another
in the vertical axis.
• The simple scatterplot is created using
the plot() function.
• Scatterplot Matrices - When we have more than two
variables and want to see how each variable relates to
the remaining ones, we use a scatterplot matrix. The
pairs() function creates a matrix of scatterplots, one
panel for each pair of variables.
• pairs(formula, data)
• pairs(~wt+mpg+disp+cyl,data = mtcars, main =
"Scatterplot Matrix")
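pairs() shows the relationships visually; to put a number on each panel, the Pearson correlation matrix from cor() can be printed alongside it. This sketch uses the same four mtcars columns as the formula above; pairing it with cor() is a suggestion, not part of the original slide.

```r
vars <- mtcars[, c("wt", "mpg", "disp", "cyl")]

# Pearson correlation for every pair of columns; each off-diagonal
# entry corresponds to one panel of the scatterplot matrix
r <- cor(vars)
print(round(r, 2))

# Same plot as pairs(~wt+mpg+disp+cyl, data = mtcars)
pairs(vars, main = "Scatterplot Matrix")
```

The matrix is symmetric with 1 on the diagonal; a negative value such as the one for wt and mpg means heavier cars tend to have lower mileage.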
• Syntax of Scatterplot
• plot(x, y, main, xlab, ylab, xlim, ylim, axes)
• x is the data set whose values are the horizontal
coordinates.
• y is the data set whose values are the vertical coordinates.
• main is the title of the graph.
• xlab is the label in the horizontal axis.
• ylab is the label in the vertical axis.
• xlim gives the limits of the values of x used for plotting.
• ylim gives the limits of the values of y used for plotting.
• axes indicates whether both axes should be drawn on the
plot.
• input <- mtcars[,c('wt','mpg')]
• plot(x = input$wt, y = input$mpg, xlab = "Weight", ylab =
"Mileage", xlim = c(2.5,5), ylim = c(15,30), main = "Weight vs
Mileage")
• 3D Scatterplot- You can create a 3D scatterplot with
the scatterplot3d package. Use the
function scatterplot3d(x, y, z).
• library(scatterplot3d)
• attach(mtcars)
• scatterplot3d(wt,disp,mpg, main="3D Scatterplot")
• # 3D Scatterplot with Coloring and Vertical Drop Lines
• library(scatterplot3d)
• attach(mtcars)
• scatterplot3d(wt,disp,mpg, pch=16, highlight.3d=TRUE,
type="h", main="3D Scatterplot")
Strip Chart
• Strip charts can be created using the stripchart() function in
R programming language.
• This function takes in a numeric vector or a list of numeric
vectors, drawing a strip chart for each vector.
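• Let us use the built-in data set airquality, which has "Daily air
quality measurements".
• x <- airquality$Ozone
• stripchart(x) draws a single strip of points for this one vector.
• Multiple Strip Charts - when x is a list of numeric vectors, one strip is drawn for each vector:
stripchart(x, main="Multiple stripchart for comparison",
xlab="Degree Fahrenheit",
ylab="Temperature",
method="jitter",
col=c("orange","red"), pch=16 )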
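The multiple strip chart needs x to be a list. One self-contained way to build such a list from airquality is sketched below; comparing May and September temperatures is an illustrative choice made here, not the data used in the original slide.

```r
# Two numeric vectors to compare: daily temperatures (degrees Fahrenheit)
# recorded in May and in September (an illustrative choice of groups)
may <- airquality$Temp[airquality$Month == 5]
sep <- airquality$Temp[airquality$Month == 9]
x <- list(May = may, September = sep)

# A list gives one strip per element; method = "jitter" adds a little
# random vertical noise so that tied values do not hide each other
stripchart(x, main = "Multiple stripchart for comparison",
           xlab = "Degree Fahrenheit", ylab = "Month",
           method = "jitter", col = c("orange", "red"), pch = 16)
```

The list names become the strip labels on the y axis, which is why the list elements are named here.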
Dot plot
• Create dotplots with the dotchart(x,
labels=) function, where x is a numeric vector
and labels is a vector of labels for each point. You
can add a groups= option to designate a factor
specifying how the elements of x are grouped. If
so, the option gcolor= controls the color of the
groups label. cex controls the size of the labels.
• dotchart(mtcars$mpg, labels = row.names(mtcars),
cex = .7, main = "Gas Mileage for Car Models",
xlab = "Miles Per Gallon")
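The groups= and gcolor= options mentioned above can be combined with per-point colors. A minimal sketch, grouping mtcars by number of cylinders (the three colors are arbitrary choices):

```r
# Sort by mpg so the dots rise within each group, and make cyl a factor,
# since groups= expects a factor
x <- mtcars[order(mtcars$mpg), ]
x$cyl <- factor(x$cyl)

# One color per cylinder group; indexing by a factor uses its integer codes
pt_col <- c("red", "blue", "darkgreen")[x$cyl]

dotchart(x$mpg, labels = row.names(x), groups = x$cyl,
         gcolor = "black", color = pt_col, cex = 0.7,
         main = "Gas Mileage for Car Models\ngrouped by cylinder",
         xlab = "Miles Per Gallon")
```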
Density Plots
• Kernel density plots are often a more
effective way than histograms to view the distribution of a
variable.
• It is created using plot(density(x)) where x is a
numeric vector.
• d <- density(mtcars$mpg)
• plot(d)
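The density object can also be shaded and compared against a different smoothing bandwidth. This is a sketch of one common way to do that; the adjust = 0.5 comparison curve is an arbitrary choice for illustration.

```r
d <- density(mtcars$mpg)   # Gaussian kernel, default bandwidth rule
print(d$bw)                # the bandwidth that was chosen

plot(d, main = "Kernel Density of Miles Per Gallon")
polygon(d, col = "lightblue", border = "blue")  # shade under the curve
rug(mtcars$mpg)                                  # mark the raw values

# A smaller bandwidth gives a bumpier estimate, a larger one a smoother one
lines(density(mtcars$mpg, adjust = 0.5), lty = 2)
```

By default density() evaluates the curve at 512 points, and the area under it is approximately 1.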
Line Graph
• A line chart is a graph that connects a series of
points by drawing line segments between
them. The points are ordered by one of their
coordinates (usually the x-coordinate).
Line charts are usually used to identify
trends in data.
• The plot() function in R is used to create the
line graph.
• plot(v, type, col, xlab, ylab, main)
• Following is the description of the parameters used:
• v is a vector containing the numeric values.
• type takes the value "p" to draw only the points, "l" to
draw only the lines and "o" to draw both points and
lines.
• xlab is the label for the x axis.
• ylab is the label for the y axis.
• main is the title of the chart.
• col is used to give colors to both the points and lines.
• v <- c(7,12,28,3,41)
• plot(v,type = "o")
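More than one line can be drawn on the same chart by calling lines() after plot(). In this sketch the second series w is made-up data added only for illustration.

```r
v <- c(7, 12, 28, 3, 41)
w <- c(14, 7, 6, 19, 3)   # a second, made-up series

# Set ylim wide enough for both series, because lines() only adds to
# the existing axes and never rescales them
plot(v, type = "o", col = "red", ylim = range(v, w),
     xlab = "Month", ylab = "Rain fall", main = "Rain fall chart")
lines(w, type = "o", col = "blue")
legend("topleft", c("v", "w"), col = c("red", "blue"), lty = 1, pch = 1)
```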
Pie Chart
• In R the pie chart is created using the pie() function which takes positive
numbers as a vector input. The additional parameters are used to control
labels, color, title etc.
• pie(x, labels, radius, main, col, clockwise)
• Following is the description of the parameters used:
• x is a vector containing the numeric values used in the pie chart.
• labels is used to give description to the slices.
• radius indicates the radius of the circle of the pie chart.(value between −1
and +1).
• main indicates the title of the chart.
• col indicates the color palette.
• clockwise is a logical value indicating whether the slices are drawn clockwise or
anticlockwise (the default is clockwise = FALSE).
• x <- c(21, 62, 10, 53)
• labels <- c("London", "New York", "Singapore", "Mumbai")
• pie(x,labels)
• Slice Percentages and Chart Legend –
• x <- c(21, 62, 10, 53)
• labels <- c("London", "New York", "Singapore", "Mumbai")
• piepercent <- round(100*x/sum(x), 1)
• png(file = "city_percentage_legends.png")
• pie(x, labels = piepercent, main = "City pie
chart", col = rainbow(length(x)))
legend("topright", c("London","New
York","Singapore","Mumbai"), cex = 0.8, fill =
rainbow(length(x)))
dev.off()
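Instead of a separate legend, the city name and its percentage can be combined into each slice label. A minimal sketch, drawing clockwise for illustration:

```r
x      <- c(21, 62, 10, 53)
labels <- c("London", "New York", "Singapore", "Mumbai")

# Share of each slice, rounded to one decimal place
piepercent <- round(100 * x / sum(x), 1)

# Put the city name and its share in one label, e.g. "London (14.4%)"
slice_labels <- paste0(labels, " (", piepercent, "%)")

pie(x, labels = slice_labels, main = "City pie chart",
    col = rainbow(length(x)), clockwise = TRUE)
```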
• 3D Pie Chart –
• A pie chart with 3 dimensions can be drawn
using additional packages. The
package plotrix has a function
called pie3D() that is used for this.
• library(plotrix)
• x <- c(21, 62, 10, 53)
• lbl <- c("London", "New York", "Singapore", "Mumbai")
• png(file = "3d_pie_chart.png")
• pie3D(x, labels = lbl, explode = 0.1, main = "Pie
Chart of Cities")
dev.off()
