intro to R


R Packages

An R package is a collection of R functions, sample data, and compiled code. In the
R environment, packages are stored in a directory called the "library." R installs a set
of packages during installation, and we can add more packages later when they are
needed for a specific purpose. When we start the R console, only the default packages
are available; other installed packages must be loaded explicitly before an R program
can use them.

The following commands are used to check, verify, and use R packages.

Check Available R Packages


To check the available R packages, we first need to find the library locations in which
they are stored. R provides the .libPaths() function to find these library locations.

.libPaths()

When the above code executes, it produces the following output, which may vary
depending on the local settings of our PCs and laptops.

[1] "C:/Users/ajeet/OneDrive/Documents/R/win-library/3.6"
[2] "C:/Program Files/R/R-3.6.1/library"

Getting the list of all the packages installed


R provides the library() function, which, when called without arguments, lists all the
installed packages.

library()

When we execute the above function, it produces the following result, which may
vary depending on the local settings of our PCs or laptops.

Packages in library 'C:/Program Files/R/R-3.6.1/library':

Similarly, R provides the search() function to list all the packages currently loaded
(attached) in the R environment.

search()
When we execute the above code, it will produce the following result, which may
vary depending on the local settings of our PCs and laptops:

[1] ".GlobalEnv"        "package:stats"     "package:graphics"
[4] "package:grDevices" "package:utils"     "package:datasets"
[7] "package:methods"   "Autoloads"         "package:base"
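The commands above can be combined into a small check before a package is used. This is a minimal sketch, not the only way to do it; it assumes the package of interest is stats, chosen only because it ships with every R installation. installed.packages() and library() with character.only = TRUE are standard base-R functions.

```r
# Name of the package we want to use (stats is assumed here because it is always installed)
pkg <- "stats"

# Library folders that R searches
print(.libPaths())

# installed.packages() returns a matrix whose row names are the installed package names
is_installed <- pkg %in% rownames(installed.packages())

if (is_installed) {
  # Attach the package; character.only = TRUE lets us pass the name as a string
  library(pkg, character.only = TRUE)
} else {
  message("Package ", pkg, " is not installed; try install.packages(\"", pkg, "\")")
}

# Once attached, the package appears in the search path
print(paste0("package:", pkg) %in% search())
```

Replacing "stats" with another package name works the same way, provided that package has been installed with install.packages().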

List of R packages
R is a language for data science with a vast repository of packages. These packages
serve the many fields that use R to work with data. CRAN hosts more than 10,000
packages, making it an ocean of high-quality statistical work. Of the many packages
available, we will discuss the most important ones.

Some of the most widely used and popular packages are as follows:
1) tidyr
The name tidyr comes from the word tidy, which means neat and orderly. The tidyr
package is used to make data "tidy." It works well with dplyr and is an evolution of
the reshape2 package.

2) ggplot2
R allows us to create graphics declaratively, and the ggplot2 package exists for this
purpose. The package is famous for its elegant, high-quality graphs, which set it apart
from other visualization packages.


3) ggraph
ggraph is an extension of ggplot2 for graph and network data. It removes ggplot2's
limitation of depending on tabular data.

4) dplyr
R allows us to perform data wrangling and data analysis with the dplyr package, which
provides several functions for working with data frames.

5) tidyquant
tidyquant is a financial package used for carrying out quantitative financial analysis.
It adds to the tidyverse as a package for importing, analyzing, and visualizing
financial data.

6) dygraphs
The dygraphs package provides an R interface to the dygraphs JavaScript charting
library. It is mainly used for plotting time-series data in R.

7) leaflet
For creating interactive maps, R provides the leaflet package, an interface to Leaflet,
an open-source JavaScript library. Popular websites such as The New York Times,
GitHub, and Flickr use Leaflet. The leaflet package makes it easy to build such
interactive maps from R.
8) ggmap
The ggmap package is used for spatial visualization. It is a mapping package that
includes various tools for geocoding and routing.

9) glue
R provides the glue package, which is useful in data wrangling for building strings. It
evaluates R expressions that are embedded within a string.

10) shiny
The shiny package lets us develop interactive, aesthetically pleasing web apps in R.
Shiny apps can be extended with HTML widgets, CSS, and JavaScript.

11) plotly
The plotly package creates interactive, high-quality online graphs. It builds on the
JavaScript library plotly.js.

12) tidytext
The tidytext package provides text-mining functions for processing words and carrying
out analysis with ggplot2, dplyr, and other tools.

13) stringr
The stringr package provides simple, consistent wrappers around the stringi package,
which implements common string operations.

14) reshape2
This package facilitates flexible reorganization and aggregation of data using the
melt() and dcast() functions.

15) dichromat
The R dichromat package removes red-green or green-blue contrasts from colors, which
simulates the effects of different types of color blindness.
16) digest
The digest package is used to create cryptographic hash digests of arbitrary R
objects.

17) MASS
The MASS package provides a large number of statistical functions. It also provides
the datasets that accompany the book "Modern Applied Statistics with S."

18) caret
R allows us to perform classification and regression tasks with the caret package.
caretEnsemble, a companion package built on caret, is used to combine different
models.

19) e1071
The e1071 package provides useful functions for data analysis, such as naive Bayes,
Fourier transforms, support vector machines (SVMs), clustering, and other
miscellaneous functions.

20) sentimentr
The sentimentr package provides functions for carrying out sentiment analysis. It is
used to calculate text polarity at the sentence level and to aggregate the results by
rows or grouping variables.

Data Types in R Programming


In programming languages, we use variables to store information. A variable is a
reserved memory location that holds a value: when we create a variable in our
program, some space is reserved in memory.
R has several data types, such as integer and character (string). The data type of a
variable determines how much memory is allocated for it and what can be stored in
that memory.

R programming uses the following basic data types:

Data type | Example | Description
Logical | TRUE, FALSE | A special data type for data with only two possible values, which can be construed as true/false.
Numeric | 12, 32, 112, 5432 | Decimal values are called numeric in R; numeric is the default computational data type.
Integer | 3L, 66L, 2346L | The L suffix tells R to store the value as an integer.
Complex | z = 1+2i, t = 7+3i | A complex value has a real part and an imaginary part; i denotes the pure imaginary unit.
Character | 'a', "good", "TRUE", '35.4' | A character value represents a string. We can convert objects into character values with the as.character() function.
Raw | charToRaw("abc") | A raw value holds raw bytes.

Let's see an example for better understanding of data types:


# Logical data type
variable_logical <- TRUE
cat(variable_logical, "\n")
cat("The data type of variable_logical is ", class(variable_logical), "\n\n")

# Numeric data type
variable_numeric <- 3532
cat(variable_numeric, "\n")
cat("The data type of variable_numeric is ", class(variable_numeric), "\n\n")

# Integer data type
variable_integer <- 133L
cat(variable_integer, "\n")
cat("The data type of variable_integer is ", class(variable_integer), "\n\n")

# Complex data type
variable_complex <- 3+2i
cat(variable_complex, "\n")
cat("The data type of variable_complex is ", class(variable_complex), "\n\n")

# Character data type
variable_char <- "Learning r programming"
cat(variable_char, "\n")
cat("The data type of variable_char is ", class(variable_char), "\n\n")

# Raw data type
variable_raw <- charToRaw("Learning r programming")
cat(variable_raw, "\n")
cat("The data type of variable_raw is ", class(variable_raw), "\n\n")

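When we execute the above program, it prints each value followed by its class: logical, numeric, integer, complex, character, and raw. The raw value is printed as a sequence of hexadecimal bytes.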
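The table mentions as.character(); base R has matching as.*() functions for the other types. A minimal sketch of a few common conversions, with the results shown in comments:

```r
s <- as.character(35.4)   # "35.4"
n <- as.numeric("12")     # 12
i <- as.integer(3.9)      # 3: the fractional part is dropped
l <- as.logical("TRUE")   # TRUE
print(c(class(s), class(n), class(i), class(l)))

# Text that does not look like a number becomes NA (R normally warns about this)
bad <- suppressWarnings(as.numeric("abc"))
print(is.na(bad))
```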
Data Structures in R Programming
Data structures are very important to understand. They are the objects that we
manipulate day to day in R, and dealing with conversions between objects is one of
the most common sources of despair for beginners. We can say that everything in R
is an object.

R has many data structures, which include:


1. Atomic vector
2. List
3. Array
4. Matrices
5. Data Frame
6. Factors

Vectors
A vector is the basic data structure in R; vectors are the most basic R data objects.
There are six types of atomic vectors: logical, integer, double, complex, character,
and raw. "A vector is a collection of elements which is most commonly of mode
character, integer, logical or numeric." A vector can be one of the following two
types:

1. Atomic vector
2. Lists
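As a quick check of the six atomic types listed above, this sketch builds one small vector of each and asks typeof() for its storage type. It also shows that c() coerces mixed values to one common type, which is why an atomic vector always has a single type.

```r
v_logical   <- c(TRUE, FALSE)
v_integer   <- c(1L, 2L)
v_double    <- c(1.5, 2.5)
v_complex   <- c(1+2i, 3-1i)
v_character <- c("a", "b")
v_raw       <- charToRaw("ab")

# typeof() reports the internal storage type of each vector
types <- sapply(list(v_logical, v_integer, v_double,
                     v_complex, v_character, v_raw), typeof)
print(types)

# c() coerces mixed values to the most flexible type, here character
mixed <- c(1, "two", TRUE)
print(typeof(mixed))
print(mixed)   # "1" "two" "TRUE"
```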
List
In R, the list is a container. Unlike an atomic vector, a list is not restricted to a
single mode: it can contain a mixture of data types. Lists are also known as generic
vectors because each element of a list can be any type of R object. "A list is a
special type of vector in which each element can be a different type."

We can create a list with list() or as.list(). To create an empty list of a required
length, we can use vector("list", length).
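A short sketch of the three ways of creating lists described above; the element names (name, version, scores, nested) are made up for illustration.

```r
# list(): elements of different types, including another list
info <- list(name = "R", version = 4L, scores = c(9.5, 8), nested = list(a = 1))
print(length(info))         # 4
print(class(info$scores))   # "numeric"

# as.list(): each element of the vector becomes one list element
l2 <- as.list(1:3)
print(l2[[2]])              # 2

# vector("list", n): a list of n empty (NULL) elements
empty <- vector("list", 3)
print(length(empty))        # 3
```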

Arrays
Arrays are data objects that can store data in more than two dimensions. "An array is
a collection of a similar data type with contiguous memory allocation." For example,
if we create an array of dimension (2, 3, 4), it creates four rectangular matrices, each
with two rows and three columns.

In R, an array is created with the array() function. This function takes a vector as
input and uses the value of the dim parameter to create the array.
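A minimal sketch of the (2, 3, 4) array described above, filled with the numbers 1 to 24. R fills the array column by column, so the first 2 x 3 matrix holds 1 to 6.

```r
arr <- array(1:24, dim = c(2, 3, 4))
print(dim(arr))       # 2 3 4
print(arr[, , 1])     # the first of the four 2 x 3 matrices
print(arr[2, 3, 4])   # 24: the last element
```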

Matrices
A matrix is an R object in which the elements are arranged in a two-dimensional
rectangular layout. All the elements of a matrix have the same atomic type. For
mathematical calculations, we can use a matrix containing numeric elements. A
matrix is created with the matrix() function in R.

Syntax

The basic syntax of creating a matrix is as follows:

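matrix(data, nrow, ncol, byrow, dimnames)

Here data is the input vector, nrow and ncol are the numbers of rows and columns,
byrow = TRUE fills the matrix row by row (the default, FALSE, fills it column by
column), and dimnames gives names to the rows and columns.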
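A small example of the matrix() arguments in use; the row and column names r1, r2, a, b, c are arbitrary labels chosen for illustration.

```r
m <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE,
            dimnames = list(c("r1", "r2"), c("a", "b", "c")))
print(m)
print(m["r2", "b"])               # 5: rows were filled first because byrow = TRUE

# With the default byrow = FALSE the same data fills columns first
print(matrix(1:6, nrow = 2)[1, ]) # 1 3 5
```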

Data Frames
A data frame is a two-dimensional, array-like structure; in other words, it is a table in
which each column contains the values of one variable and each row contains one set
of values, one from each column.

A data frame has the following characteristics:

1. The column names should be non-empty.
2. The row names should be unique.
3. The data stored in a data frame can be of numeric, factor, or character type.
4. Each column must contain the same number of data items.
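A minimal data frame that satisfies the characteristics above. The names and ages are hypothetical sample data.

```r
df <- data.frame(name = c("Asha", "Ben", "Chen"),
                 age  = c(28, 35, 41),
                 stringsAsFactors = FALSE)
print(df)
print(nrow(df))                   # 3
print(df$age[2])                  # 35
print(df[df$age > 30, "name"])    # "Ben" "Chen"
```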

Factors
Factors are data objects that are used to categorize data and store it as levels.
Factors can store both strings and integers. They are very useful for columns that
have a limited number of unique values, and in data analysis for statistical
modeling.

Factors are created with the factor() function, which takes a vector as an input
parameter.
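A sketch of a factor for a column with a few repeated values. The sizes are illustrative; passing levels = sets the order of the levels explicitly instead of the default alphabetical order.

```r
sizes <- factor(c("small", "large", "medium", "small"),
                levels = c("small", "medium", "large"))
print(levels(sizes))       # "small" "medium" "large"
print(table(sizes))        # how many values fall in each level
print(as.integer(sizes))   # 1 3 2 1: the level codes stored internally
```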

Variables in R Programming
Variables are used to store information to be manipulated and referenced in an R
program. An R variable can store an atomic vector, a group of atomic vectors, or a
combination of many R objects.

Languages like C++ are statically typed, but R is dynamically typed, which means it
checks the data type when a statement is run. A valid variable name contains letters,
numbers, and the dot or underscore characters. A variable name must start with a
letter, or with a dot that is not followed by a number.

Variable name | Validity | Reason
_var_name | Invalid | A variable name can't start with an underscore (_).
var_name, var.name | Valid | Letters, underscores, and dots are allowed. A name may also start with a dot, as long as the dot is not followed by a number.
var_name% | Invalid | Apart from the dot and the underscore, special characters can't be used in a variable name.
2var_name | Invalid | A variable name can't start with a digit.
.2var_name | Invalid | A variable name can't start with a dot followed by a digit.
var_name2 | Valid | The name contains letters, a number, and an underscore, and starts with a letter.
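The naming rules in the table can be checked with base R's make.names(), which converts a name into a syntactically valid one. This sketch assumes the simple criterion that a name is valid when make.names() leaves it unchanged; make.names() also alters reserved words, so those would be reported as invalid too.

```r
candidates <- c("_var_name", "var_name", "var.name", "var_name%",
                "2var_name", ".2var_name", "var_name2")

# A name is syntactically valid exactly when make.names() returns it unchanged
is_valid <- make.names(candidates) == candidates
print(data.frame(name = candidates, valid = is_valid))
```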

Assignment of variable
In R programming, there are three operators we can use to assign values to a
variable: the leftward (<-), rightward (->), and equal-to (=) operators.

Two functions are commonly used to print the value of a variable: print() and cat().
The cat() function combines multiple values into one continuous printed output.

# Assignment using equal operator.
variable.1 = 124

# Assignment using leftward operator.
variable.2 <- "Learn R Programming"

# Assignment using rightward operator.
133L -> variable.3

print(variable.1)
cat("variable.1 is ", variable.1, "\n")
cat("variable.2 is ", variable.2, "\n")
cat("variable.3 is ", variable.3, "\n")

When we execute the above code in our R command prompt, it will give us the
following output:

[1] 124
variable.1 is  124
variable.2 is  Learn R Programming
variable.3 is  133

Data types of variable


R is a dynamically typed language, which means that we can change the data type of
the same variable again and again in our program. Because of this dynamic nature, a
variable is not declared with a data type; it takes its data type from the R object that
is assigned to it.

We can check the data type of the variable with the help of the class() function. Let's
see an example:

variable_y <- 124
cat("The data type of variable_y is ", class(variable_y), "\n")

variable_y <- "Learn R Programming"
cat(" Now the data type of variable_y is ", class(variable_y), "\n")

variable_y <- 133L
cat(" Next the data type of variable_y becomes ", class(variable_y), "\n")

When we execute the above code in our R command prompt, the reported class
changes from numeric to character to integer as each new value is assigned.

Keywords in R Programming
In programming, a keyword is a word that is reserved by the language because it has
a special meaning. A keyword can be a command or a parameter. As in C, C++, and
Java, R has a set of keywords, and a keyword can't be used as a variable name.
Keywords are also called "reserved words."

The following keywords are listed by the ?reserved or help(reserved) command:

if          else          repeat
while       function      for
next        break         TRUE
FALSE       NULL          Inf
NaN         NA            NA_integer_
NA_real_    NA_complex_   NA_character_
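To see that reserved words cannot be used as variable names, the hypothetical helper can_assign() below tries to run name <- 5 in a throwaway environment and reports whether R accepted it. A word such as if already fails while parsing, whereas TRUE parses but fails when the assignment runs.

```r
# can_assign() is a hypothetical helper written for this illustration
can_assign <- function(name) {
  code <- paste(name, "<- 5")
  tryCatch({
    eval(parse(text = code), envir = new.env())  # parse, then run the assignment
    TRUE
  }, error = function(e) FALSE)                   # parse or assignment error
}

print(can_assign("if"))      # FALSE: syntax error
print(can_assign("TRUE"))    # FALSE: invalid left-hand side of assignment
print(can_assign("total"))   # TRUE: an ordinary name
```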

1) if
The if statement consists of a Boolean expression followed by one or more
statements. In R, the if statement is the simplest conditional statement; it decides
whether a block of statements will be executed or not.

Example

a <- 11
if(a < 15)
  print("I am less than 15")

Output:
[1] "I am less than 15"
2) else
The R else statement is associated with an if statement: the else block is executed
only when the if statement's condition is false. Let's see an example to make this
clear:

Example:

a <- 22
if(a < 20){
  cat("I am less than 20")
}else{
  cat("I am larger than 20")
}

Output:
I am larger than 20

3) repeat
The repeat keyword is used to execute a block of code repeatedly. In R, repeat
creates a loop with no built-in exit condition, so we use the break statement to exit
the loop.

Example:

x <- 1
repeat {
  cat(x, "")
  x = x+1
  if (x == 6){
    break
  }
}
Output:
1 2 3 4 5

4) while
The while keyword is used to create a loop. The while loop keeps executing as long
as the given condition is true. If the condition never becomes false (for example,
while (TRUE)), it makes an infinite loop.

Example:

a <- 20
while(a != 0){
  cat(a, "")
  a = a-2
}

Output:
20 18 16 14 12 10 8 6 4 2

5) function
A function is an object in R programming. The keyword function is used to create a
user-defined function in R. R also has predefined functions, such as seq(), mean(),
and sum().

Example:

new.function <- function(n) {
  for(i in 1:n) {
    a <- i^2
    print(a)
  }
}
new.function(6)
Output:
[1] 1
[1] 4
[1] 9
[1] 16
[1] 25
[1] 36

6) for
The for keyword is used for looping, that is, iterating over a sequence such as a
vector or a list.

With a for loop, we can execute a set of statements once for each item in the
sequence.

Example:

v <- LETTERS[1:4]
for (i in v) {
  print(i)
}

Output:
[1] "A"
[1] "B"
[1] "C"
[1] "D"

7) next
The next keyword skips the current iteration of a loop without terminating the loop.
When R encounters next, it skips the rest of the loop body and starts the next
iteration.

Example:

v <- LETTERS[1:6]
for (i in v) {
  if (i == "D") {
    next
  }
  print(i)
}

Output:
[1] "A"
[1] "B"
[1] "C"
[1] "E"
[1] "F"

8) break
The break keyword is used to terminate a loop, usually when some condition is true.
When break is executed, the loop stops immediately and control passes to the first
statement after the loop.

Example:

n <- 1
while(n < 10){
  if(n == 3)
    break
  n = n+1
  cat(n, "\n")
}
cat("End of the program")

Output:
2
3
End of the program
9) TRUE/FALSE
The TRUE and FALSE keywords represent the Boolean values true and false. A logical
expression evaluates to one of them; for example, 5 > 3 evaluates to TRUE and 5 < 3
evaluates to FALSE.

10) NULL
In R, NULL represents the null object. It is often returned by expressions and
functions whose value is undefined. NULL is different from NA: NA marks a missing
value inside a vector, while NULL means that there is no object at all.

Example:

as.null(list(a = 1, b = "c"))

Output:
NULL
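To make the difference between NULL and NA concrete, a short sketch: NULL vanishes when combined into a vector, while NA keeps its place as a missing value.

```r
x <- c(1, NULL, 2)    # NULL disappears when combined with other values
y <- c(1, NA, 2)      # NA keeps its position as a missing value
print(length(x))      # 2
print(length(y))      # 3
print(is.null(NULL))  # TRUE
print(length(NULL))   # 0
```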

11) Inf and NaN


The is.finite() and is.infinite() functions return a vector of the same length as
their argument, indicating which elements are finite or infinite.

Inf and -Inf are positive and negative infinity. NaN stands for "Not a Number." NaN
applies to numeric values and to the real and imaginary parts of complex values, but
not to the values of integer vectors.

Usage

is.finite(x)
is.infinite(x)
is.nan(x)

Inf
NaN
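A short sketch applying the three functions above to a vector that mixes finite, infinite, NaN, and NA values. Note that NA is neither finite, infinite, nor NaN, while NaN does count as missing for is.na().

```r
x <- c(1, Inf, -Inf, NaN, NA)
print(is.finite(x))    # TRUE FALSE FALSE FALSE FALSE
print(is.infinite(x))  # FALSE TRUE TRUE FALSE FALSE
print(is.nan(x))       # FALSE FALSE FALSE TRUE FALSE
print(1 / 0)           # Inf
print(0 / 0)           # NaN
print(is.na(NaN))      # TRUE
```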

12) NA
NA is a logical constant of length 1 that contains a missing value indicator. It can be
coerced to any other vector type except raw. There are also the constants
NA_integer_, NA_real_, NA_complex_, and NA_character_, which belong to the other
atomic vector types that support missing values.

Usage

NA
is.na(x)
anyNA(x, recursive = FALSE)

## S3 method for class 'data.frame'
is.na(x)

is.na(x) <- value
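A sketch of NA handling with the functions listed above, including the replacement form is.na(x) <- value; the vector x is illustrative.

```r
x <- c(4, NA, 7)
print(is.na(x))                # FALSE TRUE FALSE
print(anyNA(x))                # TRUE
print(mean(x))                 # NA: missing values propagate
print(mean(x, na.rm = TRUE))   # 5.5: NA removed before averaging
is.na(x) <- 1                  # the replacement form marks element 1 as missing
print(x)                       # NA NA 7
print(typeof(NA_integer_))     # "integer"
```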

Operators in R
In computer programming, an operator is a symbol that represents an action: it tells
the interpreter to perform specific logical or mathematical manipulations. R
programming is very rich in built-in operators.

In R programming, there are different types of operators, and each operator
performs a different task. For data manipulation, there are also some advanced
operators, such as the model formula and list indexing operators.
The following types of operators are used in R:

1. Arithmetic Operators
2. Relational Operators
3. Logical Operators
4. Assignment Operators
5. Miscellaneous Operators

Arithmetic Operators
Arithmetic operators are the symbols which are used to represent arithmetic math
operations. The operators act on each and every element of the vector. There are
various arithmetic operators which are supported by R.


The examples for the arithmetic operators use the following two vectors:

a <- c(2, 3.3, 4)
b <- c(11, 5, 3)

1. + adds two vectors element by element.
print(a + b)
It will give us the following output:
[1] 13.0  8.3  7.0

2. - subtracts the second vector from the first one.
print(a - b)
It will give us the following output:
[1] -9.0 -1.7  1.0

3. * multiplies two vectors element by element.
print(a * b)
It will give us the following output:
[1] 22.0 16.5 12.0

4. / divides the first vector by the second one.
print(a / b)
It will give us the following output:
[1] 0.1818182 0.6600000 1.3333333

5. %% gives the remainder of dividing the first vector by the second one.
print(a %% b)
It will give us the following output:
[1] 2.0 3.3 1.0

6. %/% gives the quotient of the integer division of the first vector by the second one.
print(a %/% b)
It will give us the following output:
[1] 0 0 1

7. ^ raises the first vector to the exponent given by the second vector.
print(a ^ b)
It will give us the following output:
[1] 2048.0000  391.3539   64.0000
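Two properties of the arithmetic operators can be checked with the same vectors: %/% and %% fit together so that quotient times divisor plus remainder gives back the original values, and when two vectors have different lengths the shorter one is recycled. This sketch uses all.equal() rather than == because 3.3 is not stored exactly in floating point.

```r
a <- c(2, 3.3, 4)
b <- c(11, 5, 3)

# quotient * divisor + remainder reconstructs the dividend
print(all.equal((a %/% b) * b + a %% b, a))   # TRUE

# The shorter vector c(10, 20) is recycled to length 4
print(c(1, 2, 3, 4) + c(10, 20))              # 11 22 13 24
```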
Relational Operators
A relational operator is a symbol which defines some kind of relation between two
entities. These include numerical equalities and inequalities. A relational operator
compares each element of the first vector with the corresponding element of the
second vector. The result of the comparison will be a Boolean value. There are the
following relational operators which are supported by R:

Each relational operator compares the two vectors element by element:

1. > returns TRUE for each element of the first vector that is greater than the
corresponding element of the second vector.
a <- c(1, 3, 5)
b <- c(2, 4, 6)
print(a > b)
It will give us the following output:
[1] FALSE FALSE FALSE

2. < returns TRUE for each element of the first vector that is less than the
corresponding element of the second vector.
a <- c(1, 9, 5)
b <- c(2, 4, 6)
print(a < b)
It will give us the following output:
[1]  TRUE FALSE  TRUE

3. <= returns TRUE for each element of the first vector that is less than or equal to
the corresponding element of the second vector.
a <- c(1, 3, 5)
b <- c(2, 3, 6)
print(a <= b)
It will give us the following output:
[1] TRUE TRUE TRUE

4. >= returns TRUE for each element of the first vector that is greater than or equal
to the corresponding element of the second vector.
a <- c(1, 3, 5)
b <- c(2, 3, 6)
print(a >= b)
It will give us the following output:
[1] FALSE  TRUE FALSE

5. == returns TRUE for each element of the first vector that is equal to the
corresponding element of the second vector.
a <- c(1, 3, 5)
b <- c(2, 3, 6)
print(a == b)
It will give us the following output:
[1] FALSE  TRUE FALSE

6. != returns TRUE for each element of the first vector that is not equal to the
corresponding element of the second vector.
a <- c(1, 3, 5)
b <- c(2, 3, 6)
print(a != b)
It will give us the following output:
[1]  TRUE FALSE  TRUE
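Two practical notes on relational operators, shown as a small sketch: decimal numbers may not compare equal with == because of floating-point rounding (all.equal() compares with a small tolerance), and the logical vector a comparison returns can be used to select elements. The vector x is illustrative.

```r
print(0.1 + 0.2 == 0.3)                   # FALSE
print(isTRUE(all.equal(0.1 + 0.2, 0.3)))  # TRUE

x <- c(4, 9, 1, 7)
print(x > 5)          # FALSE TRUE FALSE TRUE
print(x[x > 5])       # 9 7: keep the elements where the comparison is TRUE
print(which(x > 5))   # 2 4: positions of those elements
```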

Logical Operators
The logical operators allow a program to make a decision on the basis of multiple
conditions. In the program, each operand is considered as a condition which can be
evaluated to a false or true value. The value of the conditions is used to determine
the overall value of the op1 operator op2. Logical operators are applicable to those
vectors whose type is logical, numeric, or complex.

The element-wise operators &, |, and ! compare each element of the first vector with
the corresponding element of the second vector, while && and || work on single
values.

The following logical operators are supported by R:

The examples use the following two vectors. Non-zero numbers and complex values
count as TRUE, and zero counts as FALSE:

a <- c(3, 0, TRUE, 2+2i)
b <- c(2, 4, TRUE, 2+3i)

1. & is the logical AND operator. It compares the vectors element by element and
returns TRUE where both elements are TRUE.
print(a & b)
It will give us the following output:
[1]  TRUE FALSE  TRUE  TRUE

2. | is the logical OR operator. It compares the vectors element by element and
returns TRUE where at least one of the elements is TRUE.
print(a | b)
It will give us the following output:
[1] TRUE TRUE TRUE TRUE

3. ! is the logical NOT operator. It returns the opposite logical value of each element.
print(!a)
It will give us the following output:
[1] FALSE  TRUE FALSE FALSE

4. && works on single values and returns TRUE only if both are TRUE. Since R 4.3.0,
giving it a vector longer than one is an error, so we compare the first elements
explicitly.
print(a[1] && b[1])
It will give us the following output:
[1] TRUE

5. || works on single values and returns TRUE if at least one of them is TRUE.
print(a[1] || b[1])
It will give us the following output:
[1] TRUE
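A sketch of the practical difference between the two families: && and || short-circuit, so the right-hand side is not evaluated when the left-hand side already decides the result, whereas & and | always work element by element. all() and any() reduce an element-wise result to a single value.

```r
# && stops at the first FALSE, so stop() is never called here
result <- FALSE && stop("this is never evaluated")
print(result)        # FALSE

x <- c(TRUE, FALSE, TRUE)
y <- c(TRUE, TRUE, TRUE)
print(x & y)         # TRUE FALSE TRUE
print(all(x & y))    # FALSE: not every element is TRUE
print(any(x & y))    # TRUE: at least one element is TRUE
```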

Assignment Operators
An assignment operator is used to assign a new value to a variable. In R, these
operators are used to assign values to vectors. The following assignment operators
are supported by R:

1. <-, =, and <<- are the left assignment operators. <- and = assign in the current
environment; <<- assigns to the variable in an enclosing (parent) environment, which
at the top level is the global environment.
a <- c(3, 0, TRUE, 2+2i)
b <<- c(2, 4, TRUE, 2+3i)
d = c(1, 2, TRUE, 2+3i)
print(a)
print(b)
print(d)
It will give us the following output:
[1] 3+0i 0+0i 1+0i 2+2i
[1] 2+0i 4+0i 1+0i 2+3i
[1] 1+0i 2+0i 1+0i 2+3i

2. -> and ->> are the right assignment operators; ->> is the rightward form of <<-.
c(3, 0, TRUE, 2+2i) -> a
c(2, 4, TRUE, 2+3i) ->> b
print(a)
print(b)
It will give us the following output:
[1] 3+0i 0+0i 1+0i 2+2i
[1] 2+0i 4+0i 1+0i 2+3i
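The difference between <- and <<- shows up inside functions. In this minimal sketch (the functions and the counter variable are hypothetical), <- creates a local variable that is discarded when the function returns, while <<- updates the variable in the enclosing environment.

```r
counter <- 0

increment_local <- function() {
  counter <- counter + 1    # creates a local copy; the global counter is unchanged
  counter
}
increment_global <- function() {
  counter <<- counter + 1   # finds counter in the enclosing environment and updates it
  counter
}

increment_local()
print(counter)   # 0
increment_global()
print(counter)   # 1
```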

Miscellaneous Operators
Miscellaneous operators are used for special and specific purposes. These operators
are not used for general mathematical or logical computation. The following
miscellaneous operators are supported in R:

1. : (colon) creates a sequence of numbers for a vector.
v <- 1:8
print(v)
It will give us the following output:
[1] 1 2 3 4 5 6 7 8

2. %in% checks whether an element belongs to a vector.
a1 <- 8
a2 <- 12
d <- 1:10
print(a1 %in% d)
print(a2 %in% d)
It will give us the following output:
[1] TRUE
[1] FALSE

3. %*% performs matrix multiplication. Here a matrix is multiplied by its transpose,
which is computed with t().
M <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3, byrow = TRUE)
P <- M %*% t(M)
print(P)
It will give us the following output:
     [,1] [,2]
[1,]   14   32
[2,]   32   77

Conditional Statements in R
if Statement
The if statement consists of a Boolean expression followed by one or more
statements. It is the simplest decision-making statement, and it helps us make a
decision on the basis of a condition.

The block of code inside the if statement is executed only when the Boolean
expression evaluates to true. If the expression evaluates to false, the block is
skipped and the code after the if statement runs.

The syntax of the if statement in R is as follows:

if(boolean_expression) {
  # If the boolean expression is true, then statement(s) will be executed.
}


Let's see some examples to understand how if statements work and perform a certain
task in R.

Example 1

x <- 24L
y <- "shubham"
if(is.integer(x)) {
  print("x is an Integer")
}

Output:
[1] "x is an Integer"

Example 2

x <- 20
y <- 24
count = 0
if(x < y) {
  cat(x, "is a smaller number\n")
  count = 1
}
if(count == 1){
  cat("Block is successfully executed")
}

Output:
20 is a smaller number
Block is successfully executed

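Example 3

x <- 1
y <- 24
count = 0
while(x < y){
  cat(x, "is a smaller number\n")
  x = x+2
  if(x == 15)
    break
}

Output:
1 is a smaller number
3 is a smaller number
5 is a smaller number
7 is a smaller number
9 is a smaller number
11 is a smaller number
13 is a smaller number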

Example 4

x <- 24
if(x %% 2 == 0){
  cat(x, "is an even number")
}
if(x %% 2 != 0){
  cat(x, "is an odd number")
}

Output:
24 is an even number

Example 5

year1 = 2011
if(year1 %% 4 == 0) {
  if(year1 %% 100 == 0) {
    if(year1 %% 400 == 0) {
      cat(year1, "is a leap year")
    } else {
      cat(year1, "is not a leap year")
    }
  } else {
    cat(year1, "is a leap year")
  }
} else {
  cat(year1, "is not a leap year")
}

Output:
2011 is not a leap year

If-else statement
In an if-else statement, the else block is executed when the Boolean expression is
false. In simple words, if the Boolean expression is true, the if block is executed;
otherwise, the else block is executed.

When the condition is a number, R treats any non-zero value as true and zero as
false. The condition must be a single value: NULL (length zero) or NA causes an error.
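A small sketch of how numbers are treated as conditions, and of what happens with NULL and NA. The error messages in the comments are the ones R prints in an English locale.

```r
for (v in c(2, 0, -1.5)) {
  if (v) {
    cat(v, "is treated as TRUE\n")
  } else {
    cat(v, "is treated as FALSE\n")
  }
}

# NULL (length zero) and NA are not accepted as conditions
msg_null <- tryCatch(if (NULL) "yes", error = function(e) conditionMessage(e))
msg_na   <- tryCatch(if (NA) "yes",   error = function(e) conditionMessage(e))
print(msg_null)   # "argument is of length zero"
print(msg_na)     # "missing value where TRUE/FALSE needed"
```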

The basic syntax of the if-else statement is as follows:

if(boolean_expression) {
  # statement(s) will be executed if the boolean expression is true.
} else {
  # statement(s) will be executed if the boolean expression is false.
}


Example 1

# local variable definition
a <- 100
# checking boolean condition
if(a < 20){
  # if the condition is true then print the following
  cat("a is less than 20\n")
}else{
  # if the condition is false then print the following
  cat("a is not less than 20\n")
}
cat("The value of a is", a)

Output:
a is not less than 20
The value of a is 100
Example 2

x <- c("Hardwork","is","the","key","of","success")

if("key" %in% x) {
  print("key is found")
} else {
  print("key is not found")
}

Output:
[1] "key is found"

Example 3

a <- 100
# checking boolean condition
if(a < 20){
  cat("a is less than 20")
  if(a %% 2 == 0){
    cat(" and an even number\n")
  } else {
    cat(" but not an even number\n")
  }
} else {
  cat("a is greater than 20")
  if(a %% 2 == 0){
    cat(" and an even number\n")
  } else {
    cat(" but not an even number\n")
  }
}
Output:
a is greater than 20 and an even number

Example 4

a <- 'u'
if(a=='a'||a=='e'||a=='i'||a=='o'||a=='u'||a=='A'||a=='E'||a=='I'||a=='O'||a=='U'){
  cat("character is a vowel\n")
}else{
  cat("character is a consonant\n")
}
cat("character is =", a)

Output:
character is a vowel
character is = u

R else if statement
This statement is also known as the if-else-if ladder. An if statement can be followed
by an optional else if...else statement, which is used to test several conditions in a
single if...else if statement. Keep the following points in mind when using the
if...else if...else statement:

1. An if statement can have zero or one else statement, and it must come after any
else if statements.
2. An if statement can have many else if statements, and they must come before the
else statement.
3. Once an else if statement succeeds, none of the remaining else if or else
statements will be tested.
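The third point can be observed directly. The hypothetical helper check() below prints which condition is being tested before returning it; because the first condition is already true, the second condition is never tested.

```r
# check() is a hypothetical helper: it reports the condition, then returns its value
check <- function(label, result) {
  cat("testing", label, "\n")
  result
}

x <- 10
if (check("x > 5", x > 5)) {
  size <- "big"
} else if (check("x > 1", x > 1)) {
  size <- "medium"
} else {
  size <- "small"
}
print(size)   # only "testing x > 5" was printed before this line
```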

The basic syntax of the if...else if...else statement is as follows:

if(boolean_expression_1) {
  # This block executes when boolean expression 1 is true.
} else if(boolean_expression_2) {
  # This block executes when boolean expression 2 is true.
} else if(boolean_expression_3) {
  # This block executes when boolean expression 3 is true.
} else {
  # This block executes when none of the above conditions is true.
}


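Example 1

Because readline() waits for keyboard input, run this example in an interactive R
session.

age <- readline(prompt="Enter age: ")
age <- as.integer(age)
if(age < 18) {
  print("You are child")
} else if(age > 30) {
  print("You are old guy")
} else {
  print("You are adult")
}

Output (for example, when we enter 25):
Enter age: 25
[1] "You are adult"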
Example 2

marks = 83
if(marks > 75){
  print("First class")
}else if(marks > 65){
  print("Second class")
}else if(marks > 55){
  print("Third class")
}else{
  print("Fail")
}

Output:
[1] "First class"
Example 3

cat("1) For Addition\n")
cat("2) For Subtraction\n")
cat("3) For Division\n")
cat("4) For multiplication\n")
n1 <- readline(prompt="Enter first number:")
n2 <- readline(prompt="Enter second number:")
choice <- readline(prompt="Enter your choice:")
n1 <- as.integer(n1)
n2 <- as.integer(n2)
choice <- as.integer(choice)
if(choice == 1){
  sum <- (n1+n2)
  cat("sum=", sum)
}else if(choice == 2){
  sub <- (n1-n2)
  cat("sub=", sub)
}else if(choice == 3){
  div <- n1/n2
  cat("Division=", div)
}else if(choice == 4){
  mul <- n1*n2
  cat("mul=", mul)
}else{
  cat("wrong choice")
}

Output (in an interactive session, entering 10, 5, and 1):
1) For Addition
2) For Subtraction
3) For Division
4) For multiplication
Enter first number:10
Enter second number:5
Enter your choice:1
sum= 15
Example 4

x <- c("Hardwork", "is", "the", "key", "of", "success")
if ("Success" %in% x) {
  print("success is found in the first time")
} else if ("success" %in% x) {
  print("success is found in the second time")
} else {
  print("No success found")
}

Output:

Example 5

n1 <- 4
n2 <- 87
n3 <- 43
n4 <- 74
# Compare each candidate with all the remaining numbers, so that every
# combination of inputs assigns a value to largest
if (n1 >= n2 && n1 >= n3 && n1 >= n4) {
  largest <- n1
} else if (n2 >= n3 && n2 >= n4) {
  largest <- n2
} else if (n3 >= n4) {
  largest <- n3
} else {
  largest <- n4
}
cat("Largest number is =", largest)

Output:

R Switch Statement
A switch statement is a selection control mechanism that lets the value of an expression choose which part of the program is executed.

The switch statement is used in place of long if...else if chains that compare a variable against several values. It is a multi-way branch statement that provides an easy way to dispatch execution to different parts of the code, based on the value of the expression.

The switch statement tests a value for equality against a list of cases. It behaves a little differently from an if statement, so keep the following key points in mind:
o If the expression is a character string, it is matched against the names of the listed cases.
o If there is more than one match, the first matching element is used.
o There is no default keyword. Instead, if no case is matched, the value of an unnamed case (if there is one) is used; if there is no unnamed case, switch() returns NULL invisibly.
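The last point is how R expresses a default case. A minimal sketch, using a hypothetical day_type() helper: the trailing unnamed argument acts as the default, and a switch() without one simply gives NULL when nothing matches.

```r
# day_type() is a hypothetical helper; its last, unnamed argument is the default
day_type <- function(day) {
  switch(day,
         "Sat" = "Weekend",
         "Sun" = "Weekend",
         "Weekday")
}

day_type("Sun")  # "Weekend"
day_type("Mon")  # "Weekday": no name matched, so the unnamed value is used

# With no unnamed argument, a non-matching string gives NULL
res <- switch("z", a = 1, b = 2)
is.null(res)     # TRUE
```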

There are two ways in which one of the cases is selected:

1) Based on Index
If the cases are plain values with no names and the expression evaluates to a number, then that number is used as an index to select the case.

2) Based on Matching Value
When each case has both a case value and an output value, such as "case_1" = "value1", the expression's value is matched against the case values. If there is a match, the corresponding value is the output.
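The two selection modes can be compared side by side. This small sketch uses made-up color and fruit values; note that an index past the last case also yields NULL.

```r
# 1) Index: a numeric expression selects the case by position
switch(2, "red", "green", "blue")        # "green"
out_of_range <- switch(4, "red", "green", "blue")
is.null(out_of_range)                    # TRUE: there is no 4th case

# 2) Matching value: a character expression is matched against the case names
switch("b", a = "apple", b = "banana")   # "banana"
```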

The basic syntax of the switch statement is as follows:

switch(expression, case1, case2, case3, ...)

Flow Chart

Example 1

x <- switch(
  3,
  "Shubham",
  "Nishka",
  "Gunjan",
  "Sumit"
)
print(x)

Output:

Example 2

ax <- 1
bx <- 2
y <- switch(
  ax + bx,
  "Hello, Shubham",
  "Hello Arpita",
  "Hello Vaishali",
  "Hello Nishka"
)
print(y)

Output:

Example 3
y <- "18"
x <- switch(
  y,
  "9" = "Hello Arpita",
  "12" = "Hello Vaishali",
  "18" = "Hello Nishka",
  "21" = "Hello Shubham"
)

print(x)

Output:

Example 4

x <- "2"
y <- "1"
a <- switch(
  paste(x, y, sep = ""),
  "9" = "Hello Arpita",
  "12" = "Hello Vaishali",
  "18" = "Hello Nishka",
  "21" = "Hello Shubham"
)

print(a)

Output:
Example 5

y <- "18"
a <- 10
b <- 2
x <- switch(
  y,
  "9" = cat("Addition =", a + b),
  "12" = cat("Subtraction =", a - b),
  "18" = cat("Division =", a / b),
  "21" = cat("Multiplication =", a * b)
)
# cat() has already printed the result; it returns NULL, so print(x) would only add "NULL"

Output:

next Statement
The next statement skips the remaining statements in the current iteration of a loop and continues with the next iteration. In other words, next ends the current iteration without terminating the loop: when R encounters next, it skips the rest of the loop body and starts the next iteration.

This statement is mostly used with for and while loops, and it can also be used in repeat loops.

Note: The next statement can also be used in the else branch of an if...else statement.
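As a small sketch of the note above (the numbers are arbitrary): odd numbers take the else branch, where next skips the cat() call at the end of the loop body.

```r
evens <- c()
odd_seen <- 0
for (v in 1:10) {
  if (v %% 2 == 0) {
    evens <- c(evens, v)
  } else {
    odd_seen <- odd_seen + 1
    next                    # odd numbers skip the cat() below
  }
  cat(v, "is even\n")
}
print(evens)     # [1]  2  4  6  8 10
print(odd_seen)  # [1] 5
```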

Syntax
The syntax of the next statement in R is:

next

Flowchart

Example 1: next in repeat loop

a <- 1
repeat {
  if (a == 10)
    break
  if (a == 5) {
    # Increment before next; otherwise a stays at 5 and the loop never ends
    a <- a + 1
    next
  }
  print(a)
  a <- a + 1
}

Output:
Example 2: next in while loop

a <- 1
while (a < 10) {
  if (a == 5) {
    # Increment before next; otherwise a stays at 5 and the loop never ends
    a <- a + 1
    next
  }
  print(a)
  a <- a + 1
}

Output:

Example 3: next in for loop

x <- 1:10
for (val in x) {
  if (val == 3) {
    next
  }
  print(val)
}
Output:

Example 4

a1 <- c(10L, -11L, 12L, -13L, 14L, -15L, 16L, -17L, 18L)
sum <- 0
for (i in a1) {
  if (i < 0) {
    next
  }
  sum <- sum + i
}
cat("The sum of all positive numbers in array is=", sum)

Output:

Example 5

j <- 0
while (j < 10) {
  if (j == 7) {
    j <- j + 1
    next
  }
  cat("\nnumber is =", j)
  j <- j + 1
}

Output:

Break Statement
In R, the break statement stops the execution of a loop and exits it immediately. In nested loops, break exits only the innermost loop, and control passes to the enclosing loop.

It is useful for managing and controlling the flow of program execution, and it can be used in all of R's loops: for, while, and repeat.

The break statement works as follows:

1. When a break statement is executed inside a loop, the loop terminates immediately and program control resumes at the next statement after the loop.
2. Unlike C, R does not use break to end a case of a switch statement; switch() evaluates only the selected case.
Note: We can also use the break statement inside the else branch of an if...else statement.
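The difference between break and the next statement described earlier can be seen by running the same loop with each. A minimal sketch over the arbitrary values 1 to 6:

```r
kept_next <- c()
for (i in 1:6) {
  if (i == 3) next    # skips only i == 3
  kept_next <- c(kept_next, i)
}

kept_break <- c()
for (i in 1:6) {
  if (i == 3) break   # leaves the loop as soon as i == 3
  kept_break <- c(kept_break, i)
}

print(kept_next)   # [1] 1 2 4 5 6
print(kept_break)  # [1] 1 2
```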

Syntax
The syntax of the break statement in R is:

break

Flowchart

Example 1: Break in repeat loop

a <- 1
repeat {
  print("hello")
  if (a >= 5)
    break
  a <- a + 1
}
Output:

Example 2

v <- c("Hello", "loop")
count <- 2
repeat {
  print(v)
  count <- count + 1
  if (count > 5) {
    break
  }
}

Output:

Example 3: Break in while loop


a <- 1
while (a < 10) {
  print(a)
  if (a == 5)
    break
  a <- a + 1
}

Output:

Example 4: Break in for loop

for (i in c(2, 4, 6, 8)) {
  for (j in c(1, 3)) {
    if (i == 6)
      break   # exits only the inner loop; the outer loop continues with i = 8
    print(i)
  }
}

Output:
Example 5

num <- 7
flag <- 0
if (num > 1) {
  flag <- 1
  for (i in 2:(num - 1)) {
    if ((num %% i) == 0) {
      flag <- 0
      break
    }
  }
}
# For num = 2, 2:(num - 1) is c(2, 1) and wrongly finds a divisor, so correct it here
if (num == 2) flag <- 1
if (flag == 1) {
  print(paste(num, "is a prime number"))
} else {
  print(paste(num, "is not a prime number"))
}

Output:
For Loop
A for loop is one of the most popular control flow statements. It is used to iterate over the elements of a vector. It is similar to the while loop, but the two differ in how the number of iterations is controlled: a for loop runs its body once for each element of a given sequence, whereas a while loop keeps running as long as its condition remains TRUE. Both decide before each iteration whether the body runs.
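To make the difference concrete, here is the same task (summing 1 to 5, chosen arbitrarily) written both ways. The for loop gets its values from the sequence, while the while loop has to update its counter by hand.

```r
# for: iterate over the elements of 1:5
total_for <- 0
for (i in 1:5) {
  total_for <- total_for + i
}

# while: repeat while the condition holds, updating the counter by hand
total_while <- 0
i <- 1
while (i <= 5) {
  total_while <- total_while + i
  i <- i + 1
}

c(total_for, total_while)  # [1] 15 15
```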

The syntax of a for loop in C/C++ is:

for (initialization_Statement; test_Expression; update_Statement)
{
    // statements inside the body of the loop
}

How does a for loop work in C/C++?

A for loop in C and C++ is executed in the following way:

o The initialization statement is executed only once.
o After initialization, the test expression is evaluated. If it evaluates to false, the for loop terminates.
o If the test expression evaluates to true, the statements inside the body are executed, and then the update statement is executed.
o The test expression is evaluated again.
o This process continues until the test expression is false, at which point the loop terminates.

For loop in R Programming


In R, a for loop repeats a sequence of instructions once for each element of a vector or list. It lets us automate the parts of our code that need repetition. In simple words, a for loop is a repetition control structure that makes it easy to write a loop that executes a specific number of times.

In R, a for loop is defined as follows:

1. It starts with the keyword for, as in C or C++.
2. Instead of initializing and declaring a loop counter, we name a loop variable, followed by the keyword in and then the vector, list, or matrix to iterate over. R does not require the type of the loop variable to be declared.
3. In the loop body, we use the loop variable rather than an indexed array element.
4. The syntax of a for loop in R is:

for (value in vector) {
  statements
}
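When the position of each element is needed as well as its value, one common pattern is to loop over seq_along() instead of the values. This sketch also shows why seq_along() is safer than 1:length(x): for an empty vector it produces an empty sequence, so the body never runs.

```r
fruit <- c("Apple", "Orange", "Guava")
for (idx in seq_along(fruit)) {
  cat(idx, fruit[idx], "\n")   # position and value
}

empty <- character(0)
1:length(empty)               # [1] 1 0: this would run the loop body twice
n_iter <- 0
for (idx in seq_along(empty)) {
  n_iter <- n_iter + 1        # never runs: seq_along() of an empty vector is empty
}
n_iter                        # 0
```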

Flowchart

Example 1: Iterate over all the elements of a vector and print the current value.

# Create fruit vector
fruit <- c('Apple', 'Orange', "Guava", 'Pineapple', 'Banana', 'Grapes')
# Create the for statement
for (i in fruit) {
  print(i)
}

Output

Example 2: Compute the square of each integer from 1 to 5 and store the results in a vector.

# Creating an empty vector (named squares so that it does not mask the list() function)
squares <- c()
# Creating a for statement to populate the vector
for (i in seq(1, 5, by = 1)) {
  squares[i] <- i * i
}
print(squares)

Output

Example 3: For loop over a matrix

# Creating a matrix
mat <- matrix(data = seq(10, 21, by = 1), nrow = 6, ncol = 2)
# Creating the loop with r and c to iterate over the matrix
for (r in 1:nrow(mat)) {
  for (c in 1:ncol(mat)) {
    print(paste("mat[", r, ",", c, "]=", mat[r, c]))
  }
}
print(mat)

Output

Example 4: For loop over a list

# Create a list with three vectors
fruit <- list(Basket = c('Apple', 'Orange', "Guava", 'Pineapple', 'Banana', 'Grapes'),
              Money = c(10, 12, 15), purchase = TRUE)
for (p in fruit) {
  print(p)
}

Output
Example 5: Count the number of even numbers in a vector.

x <- c(2, 5, 3, 9, 8, 11, 6, 44, 43, 47, 67, 95, 33, 65, 12, 45, 12)
count <- 0
for (val in x) {
  if (val %% 2 == 0) count <- count + 1
}
print(count)

Output

R repeat loop
A repeat loop is used to iterate a block of code. It is a special type of loop that has no built-in condition for exiting. To exit, we include a break statement with a user-defined condition. This property makes it different from the other loops.

A repeat loop is constructed with the repeat keyword. Because the loop has no exit condition of its own, it is very easy to create an infinite loop by mistake.
The basic syntax of the repeat loop is as follows:

repeat {
  commands
  if (condition) {
    break
  }
}

Flowchart

1. First, we initialize our variables, and then control enters the repeat loop.
2. The loop executes the group of statements inside it.
3. Inside the loop, we use a condition that decides when to exit.
4. The condition is checked; if it is TRUE, a break statement is executed to exit the loop.
5. If the condition is FALSE, the statements inside the repeat loop are executed again.
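The steps above map directly onto a small loop. As a sketch with an arbitrarily chosen task, finding the smallest power of 2 greater than 1000: the variable is initialized first, the body runs, and the break condition is checked on every pass.

```r
p <- 1                # step 1: initialize before entering the loop
repeat {
  p <- p * 2          # step 2: the commands
  if (p > 1000) {     # steps 3 and 4: exit when the condition is TRUE
    break
  }
}                     # step 5: otherwise the body runs again
print(p)  # [1] 1024
```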

Example 1:

v <- c("Hello", "repeat", "loop")
cnt <- 2
repeat {
  print(v)
  cnt <- cnt + 1

  if (cnt > 5) {
    break
  }
}

Output

Example 2:

sum <- 0
n1 <- readline(prompt = "Enter any integer value below 20: ")
n1 <- as.integer(n1)
repeat {
  sum <- sum + n1
  n1 <- n1 + 1
  if (n1 > 20) {
    break
  }
}
cat("The sum of numbers from the repeat loop is: ", sum)

Output
Example 3: Infinite repeat loop

total <- 0
number <- as.integer(readline(prompt = "please enter any integer value: "))
# Warning: this loop has no break, so it never ends; press Esc or Ctrl+C to stop it
repeat {
  total <- total + number
  number <- number + 1
  cat("sum is =", total, "\n")
}

Output
Example 4: repeat loop with next

a <- 1
repeat {
  if (a == 10)
    break
  if (a == 7) {
    a <- a + 1
    next
  }
  print(a)
  a <- a + 1
}

Output
Example 5:

terms <- readline(prompt = "How many terms do you want? ")
terms <- as.integer(terms)
i <- 1
repeat {
  print(paste("The cube of number", i, "is =", i * i * i))
  # >= (rather than ==) also stops the loop if terms is less than 1
  if (i >= terms)
    break
  i <- i + 1
}

Output
while loop
A while loop is a control flow statement that executes a block of code repeatedly. The loop terminates when its Boolean condition evaluates to FALSE.

In a while loop, the condition is checked first, and the body executes only if the condition is TRUE. If the body executes n times, the condition is checked n + 1 times: once before each iteration, plus a final check that evaluates to FALSE and ends the loop.
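The n + 1 count can be verified by wrapping the condition in a helper that records each evaluation. The helper cond() below is hypothetical and exists only for this sketch; with a body that runs 3 times, the condition is checked 4 times.

```r
checks <- 0
# cond() is a hypothetical helper that counts how many times the condition is evaluated
cond <- function(i) {
  checks <<- checks + 1
  i <= 3
}

runs <- 0
i <- 1
while (cond(i)) {
  runs <- runs + 1
  i <- i + 1
}
c(runs = runs, checks = checks)  # runs = 3, checks = 4
```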

The basic syntax of the while loop is as follows:

while (test_expression) {
  statement
}

Flowchart


Example 1:
v <- c("Hello", "while loop", "example")
cnt <- 2
while (cnt < 7) {
  print(v)
  cnt <- cnt + 1
}

Output

Example 2: Program to find the sum of the digits of the number.

n <- readline(prompt = "please enter any integer value: ")
# For example, enter 12367906 at the prompt
n <- as.integer(n)
sum <- 0
while (n != 0) {
  sum <- sum + (n %% 10)
  n <- as.integer(n / 10)
}
cat("sum of the digits of the numbers is=", sum)

Output
Example 3: Program to check whether a number is a palindrome.

n <- readline(prompt = "Enter a four digit number please: ")
n <- as.integer(n)
num <- n
rev <- 0
while (n != 0) {
  rem <- n %% 10
  rev <- rem + (rev * 10)
  n <- as.integer(n / 10)
}
print(rev)
if (rev == num) {
  cat(num, "is a palindrome number")
} else {
  cat(num, "is not a palindrome number")
}

Output
Example 4: Program to check whether a number is an Armstrong number.

num <- as.integer(readline(prompt = "Enter a number: "))
sum <- 0
temp <- num
while (temp > 0) {
  digit <- temp %% 10
  # Cubing each digit matches the Armstrong definition for three-digit numbers
  sum <- sum + (digit ^ 3)
  temp <- floor(temp / 10)
}
if (num == sum) {
  print(paste(num, "is an Armstrong number"))
} else {
  print(paste(num, "is not an Armstrong number"))
}

Output
Example 5: Program to find the frequency of a digit in a number.

num <- as.integer(readline(prompt = "Enter a number: "))
digit <- as.integer(readline(prompt = "Enter digit: "))
n <- num
count <- 0
while (num > 0) {
  if (num %% 10 == digit) {
    count <- count + 1
  }
  num <- as.integer(num / 10)
}
print(paste("The frequency of", digit, "in", n, "is=", count))

Output
R Functions
A set of statements that are organized together to perform a specific task is known as a function. R provides many built-in functions, and it also allows users to create their own. Functions let us perform tasks in a modular way.

Functions help us avoid repeating the same code and reduce complexity. To make our code easier to understand and maintain, we break it logically into smaller parts using functions. A function:

1. is written to carry out a specific task.
2. may or may not have arguments.
3. contains a body in which our code is written.
4. may or may not return one or more output values.

An R function is created with the keyword function. The syntax of an R function is:

func_name <- function(arg_1, arg_2, ...) {
  # function body
}
Components of Functions
A function has four components:

Function Name

The function name is the name under which the function is stored. In R, a function is an object, and it is stored under its name.

Arguments

In R, an argument is a placeholder. Arguments are optional: a function may or may not have arguments, and arguments can also have default values. We pass a value to an argument when the function is invoked.

Function Body

The function body contains a set of statements that defines what the function does.

Return value

The return value of a function is the value of the last expression evaluated in its body, unless return() is called explicitly earlier.
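The four components can be pointed out in one small function. Both area() and safe_div() below are hypothetical functions written only for this sketch; the second shows an explicit return() ending the function early.

```r
# area() and safe_div() are hypothetical functions used only for illustration
area <- function(width, height = 1) {  # name: area; arguments: width, height (default 1)
  result <- width * height             # body
  result                               # return value: the last expression evaluated
}

area(3, 4)  # 12
area(5)     # 5: height falls back to its default value of 1

safe_div <- function(a, b) {
  if (b == 0) return(NA)  # an explicit return() ends the function early
  a / b
}

safe_div(6, 3)  # 2
safe_div(1, 0)  # NA
```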


Function Types
Similar to other languages, R has two types of functions: built-in functions and user-defined functions. R has many built-in functions that we can call directly in a program without defining them, and it also allows us to create our own functions.

Built-in function
Functions that are already created or defined in the programming framework are known as built-in functions. Users don't need to create these functions; they are built into the environment, and we can access them simply by calling them. R has many built-in functions, such as seq(), mean(), max(), and sum().

# Creating a sequence of numbers from 32 to 46.
print(seq(32, 46))

# Finding the mean of numbers from 22 to 80.
print(mean(22:80))

# Finding the sum of numbers from 41 to 70.
print(sum(41:70))

Output:
User-defined function
R allows us to create our own functions in our programs. A user writes a user-defined function to meet a specific requirement. Once created, these functions can be used just like built-in functions.

# Creating a function without an argument.
new.function <- function() {
  for (i in 1:5) {
    print(i^2)
  }
}

new.function()

Output:

Function calling with an argument


We can call a function by passing an appropriate argument to it. Let's see an example of how a function is called.

# Creating a function to print squares of numbers in sequence.
new.function <- function(a) {
  for (i in 1:a) {
    b <- i^2
    print(b)
  }
}

# Calling the function new.function supplying 10 as an argument.
new.function(10)

Output:

Function calling with no argument


In R, we can call a function without an argument in the following way:

# Creating a function to print squares of numbers in sequence.
new.function <- function() {
  for (i in 1:5) {
    a <- i^2
    print(a)
  }
}

# Calling the function new.function with no argument.
new.function()
Output:

Function calling with Argument Values


We can supply the arguments to a function call in the same order as they are defined in the function, or in a different order if we assign them by the names of the arguments.

# Creating a function with arguments.
new.function <- function(x, y, z) {
  result <- x * y + z
  print(result)
}

# Calling the function by position of arguments.
new.function(11, 13, 9)

# Calling the function by names of the arguments.
new.function(x = 2, y = 5, z = 3)

Output:
Function calling with default arguments
To get the default result, we assign default values to the arguments in the function definition and then call the function without supplying those arguments. If we pass an argument in the function call, the passed value replaces that argument's default value from the function definition.
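Defaults can also be overridden one at a time by naming the argument. This sketch uses a hypothetical product() function with the same defaults as the example that follows.

```r
# product() is a hypothetical function with the same defaults as the example below
product <- function(x = 11, y = 24) {
  x * y
}

product()          # 264: both defaults used
product(y = 2)     # 22: x keeps its default, y = 2 replaces 24
product(4, 6)      # 24: both defaults replaced
```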

# Creating a function with arguments.
new.function <- function(x = 11, y = 24) {
  result <- x * y
  print(result)
}

# Calling the function without giving any argument.
new.function()

# Calling the function with giving new values of the argument.
new.function(4, 6)

Output:

R Built-in Functions
Functions that are already created or defined in the programming framework are known as built-in functions. R has a rich set of functions that can be used to perform almost any task. These built-in functions are divided into the following categories based on their functionality.
Math Functions
R provides various mathematical functions for performing mathematical calculations, such as finding absolute values, square roots, and much more. R provides the following mathematical functions:

1. abs(x): Returns the absolute value of x.
x <- -4
print(abs(x))
Output: [1] 4

2. sqrt(x): Returns the square root of x.
x <- 4
print(sqrt(x))
Output: [1] 2

3. ceiling(x): Returns the smallest integer that is greater than or equal to x.
x <- 4.5
print(ceiling(x))
Output: [1] 5

4. floor(x): Returns the largest integer that is less than or equal to x.
x <- 2.5
print(floor(x))
Output: [1] 2

5. trunc(x): Returns x truncated toward zero (the decimal part is dropped).
x <- c(1.2, 2.5, 8.1)
print(trunc(x))
Output: [1] 1 2 8

6. round(x, digits = n): Returns x rounded to n decimal places.
x <- 4.34567
print(round(x, digits = 2))
Output: [1] 4.35

7. cos(x), sin(x), tan(x): Return the cosine, sine, and tangent of x (x in radians).
x <- 4
print(cos(x))
print(sin(x))
print(tan(x))
Output:
[1] -0.6536436
[1] -0.7568025
[1] 1.157821

8. log(x): Returns the natural logarithm of x.
x <- 4
print(log(x))
Output: [1] 1.386294

9. log10(x): Returns the common (base-10) logarithm of x.
x <- 4
print(log10(x))
Output: [1] 0.60206

10. exp(x): Returns e raised to the power x.
x <- 4
print(exp(x))
Output: [1] 54.59815
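The rounding functions above only differ noticeably on negative numbers and on halves. A short sketch with arbitrary values; note that round() follows the IEC 60559 "round half to even" rule, so 2.5 becomes 2 rather than 3.

```r
x <- c(-2.5, -1.2, 1.2, 2.5)
ceiling(x)  # -2 -1  2  3
floor(x)    # -3 -2  1  2
trunc(x)    # -2 -1  1  2  (always toward zero)
round(x)    # -2 -1  1  2  (halves go to the even neighbor)
```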

String Functions
R provides various string functions, which allow us to extract substrings, search for patterns, and more. R provides the following string functions:

1. substr(x, start = n1, stop = n2): Extracts substrings from a character vector.
a <- "987654321"
substr(a, 3, 3)
Output: [1] "7"

2. grep(pattern, x, ignore.case = FALSE, fixed = FALSE): Searches for pattern in x and returns the indices of the elements that match.
st1 <- c('abcd','bdcd','abc
pattern <- '^abc'
print(grep(pattern, st1))
Output: [1] 1 3

3. sub(pattern, replacement, x, ignore.case = FALSE, fixed = FALSE): Finds the first match of pattern in x and replaces it with the replacement (new) text.
st1 <- "England is beautif
the part of EU"
sub("England", "UK", st1)
Output:
[1] "UK is beautiful but
of EU"

4. paste(..., sep = " "): Concatenates its arguments into strings, using the sep string to separate them.
paste('one', 2, 'three', 4, 'five')
Output: [1] "one 2 three 4 five"

5. strsplit(x, split): Splits the elements of character vector x at each occurrence of split. With split = "" each string is split into single characters; with split = " " it is split into words.
a <- "Split all the character
print(strsplit(a, " "))
Output:
[[1]]
[1] "split" "all"
"character"

6. tolower(x): Converts the string to lower case.
st1 <- "shuBHAm"
print(tolower(st1))
Output: [1] "shubham"

7. toupper(x): Converts the string to upper case.
st1 <- "shuBHAm"
print(toupper(st1))
Output: [1] "SHUBHAM"
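Two details of these functions are worth seeing directly: sub() replaces only the first match, while its companion gsub() replaces every match, and substr() can extract more than one character. The strings here are arbitrary.

```r
s <- "one two one"
sub("one", "1", s)                # "1 two one": only the first match is replaced
gsub("one", "1", s)               # "1 two 1": every match is replaced
substr("987654321", 3, 5)         # "765": characters 3 through 5
toupper(substr("shubham", 1, 1))  # "S"
```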

Statistical Probability Functions


R provides various statistical probability functions, which are helpful for finding normal densities, normal quantiles, and many other quantities. R provides the following functions:

1. dnorm(x, mean = 0, sd = 1, log = FALSE): Gives the height of the normal probability density at each point, for a given mean and standard deviation.
a <- seq(
b <- dnor
png(file=
plot(x,y)
dev.off()

2. pnorm(q, mean = 0, sd = 1, lower.tail = TRUE, log.p = FALSE): Gives the probability that a normally distributed random number is less than or equal to a given value.
a <- seq(
b <- dnor
png(file=
plot(x,y)
dev.off()

3. qnorm(p, mean = 0, sd = 1): Gives the number whose cumulative probability matches the given probability value (the inverse of pnorm()).
a <- seq(
b <- qnor
png(file=
plot(x,y)
dev.off()

4. rnorm(n, mean = 0, sd = 1): Generates random numbers from a normal distribution.
y <- rnor
png(file=
hist(y, m
dev.off()

5. dbinom(x, size, prob): Gives the binomial probability at each point.
a<-seq(0,
b<- dbino
png(file=
plot(x,y)
dev.off()

6. pbinom(q, size, prob): Gives the cumulative probability (a single value representing the probability) of an event.
a <- pbin
print(a)
Output: [1] 0.95

7. qbinom(p, size, prob): Gives the number whose cumulative probability matches the given probability value.
a <- qbin
print(a)
Output: [1] 18

8. rbinom(n, size, prob): Generates the required number of random values from a binomial distribution with the given size and probability.
a <- rbin
print(a)
Output: [1] 55

9. dpois(x, lambda): Gives the probability of x successes in a period when the expected number of events is lambda (λ).
dpois(a=2
lambda=3)
Output: [1] 0.61

10. ppois(q, lambda): Gives the cumulative probability of less than or equal to q successes.
ppois(q=4
ppois(q=1
Output: [1] 0.64

11. rpois(n, lambda): Generates random numbers from the Poisson distribution.
rpois(10,
[1] 6 1

12. dunif(x, min = 0, max = 1): Gives the density of the uniform distribution on the interval from min to max.
dunif(x,

13. punif(q, min = 0, max = 1): Gives the distribution function (cumulative probability).
punif(q,
log.p=FAL

14. qunif(p, min = 0, max = 1): Gives the quantile function.
qunif(p,
log.p=FAL

15. runif(n, min = 0, max = 1): Generates random deviates.
runif(x,
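The functions in this table follow a naming convention: the d prefix gives the density (or probability mass), p the cumulative probability, q the quantile, and r random draws. The sketch below checks a few of these relationships; the specific arguments are chosen only for illustration.

```r
# p* is the running total of d* for a discrete distribution
p_direct <- pbinom(3, size = 10, prob = 0.5)
p_summed <- sum(dbinom(0:3, size = 10, prob = 0.5))
all.equal(p_direct, p_summed)    # TRUE

# q* undoes p*
pnorm(0)                         # 0.5: half of a standard normal lies below its mean
qnorm(pnorm(1.5))                # 1.5

# Poisson probability of exactly 2 events when lambda = 3
dpois(2, lambda = 3)             # 0.2240418

# r* draws are random; set.seed() makes them reproducible
set.seed(1)
draws <- rnorm(5, mean = 10, sd = 2)
length(draws)                    # 5
```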

Other Statistical Functions

Apart from the functions mentioned above, there are some other useful functions for statistical purposes:

1. mean(x, trim = 0, na.rm = FALSE): Returns the mean of x.
a <- c(0:10, 40)
xm <- mean(a)
print(xm)
Output: [1] 7.916667

2. sd(x): Returns the standard deviation of x.
a <- c(0:10, 40)
xm <- sd(a)
print(xm)
Output: [1] 10.58694

3. median(x): Returns the median.
a <- c(0:10, 40)
xm <- median(a)
print(xm)
Output: [1] 5.5

4. quantile(x, probs): Returns quantiles, where x is the numeric vector whose quantiles are desired and probs is a numeric vector of probabilities in [0, 1].

5. range(x): Returns the range (the minimum and maximum).
a <- c(0:10, 40)
xm <- range(a)
print(xm)
Output: [1]  0 40

6. sum(x): Returns the sum.
a <- c(0:10, 40)
xm <- sum(a)
print(xm)
Output: [1] 95

7. diff(x, lag = 1): Returns lagged differences, with lag indicating which lag to use.
a <- c(0:10, 40)
xm <- diff(a)
print(xm)
Output: [1]  1  1  1  1  1  1  1  1  1  1 30

8. min(x): Returns the minimum value.
a <- c(0:10, 40)
xm <- min(a)
print(xm)
Output: [1] 0

9. max(x): Returns the maximum value.
a <- c(0:10, 40)
xm <- max(a)
print(xm)
Output: [1] 40

10. scale(x, center = TRUE, scale = TRUE): Centers and/or scales the columns of a numeric matrix (a vector is treated as a one-column matrix).
a <- c(0:10, 40)
scale(a)
Output:
[,1]
[1,] -0.747776547
[2,] -0.653320562
[3,] -0.558864577
[4,] -0.464408592
[5,] -0.369952608
[6,] -0.275496623
[7,] -0.181040638
[8,] -0.086584653
[9,] 0.007871332
[10,] 0.102327317
[11,] 0.196783302
[12,] 3.030462849
attr(,"scaled:center")
[1] 7.916667
attr(,"scaled:scale")
[1] 10.58694
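The quantile() entry has no example. A minimal one on the same data might look like this; the values assume R's default quantile method (type 7).

```r
a <- c(0:10, 40)
quantile(a, probs = c(0.25, 0.5, 0.75))
#  25%  50%  75% 
# 2.75 5.50 8.25 
```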

R Vector
A vector is a basic data structure that plays an important role in R programming.

In R, a sequence of elements that share the same data type is known as a vector. A vector can hold logical, integer, double, character, complex, or raw data. The elements contained in a vector are known as its components. We can check the type of a vector with the typeof() function.
The length is an important property of a vector. A vector's length is the number of elements in it, and it is calculated with the length() function.

Vectors are classified into two kinds: atomic vectors and lists. They have three common properties: type (typeof()), length (length()), and attributes (attributes()).



The main difference between atomic vectors and lists is that in an atomic vector all the elements are of the same type, whereas the elements of a list can be of different data types. In this section, we will discuss only atomic vectors. We will discuss lists briefly in the next topic.
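The three common properties, and the difference between an atomic vector and a list, can be inspected directly. A small sketch with arbitrary values:

```r
v <- c(1.5, 2.5, 3.5)
typeof(v)              # "double"
length(v)              # 3
attributes(v)          # NULL: a plain vector has no attributes

l <- list(1L, "a", TRUE)
typeof(l)              # "list"
sapply(l, typeof)      # "integer" "character" "logical": each element keeps its own type
```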

How to create a vector in R?


In R, we use the c() function to create a vector. This function returns a one-dimensional array, or simply a vector. The c() function is a generic function that combines its arguments. All arguments are coerced to a common type, which is the type of the returned value. There are various other ways to create a vector in R, which are as follows:
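The coercion rule for c() can be seen with mixed inputs. In this sketch, logical values become numbers when mixed with numbers, and a single string turns every element into a string.

```r
x <- c(1, TRUE, FALSE)   # logical values are coerced to double
typeof(x)                # "double"
x                        # [1] 1 1 0

y <- c(1, "a", TRUE)     # one string forces every element to character
typeof(y)                # "character"
y                        # [1] "1"    "a"    "TRUE"
```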

1) Using the colon(:) operator


We can create a vector with the colon operator. The syntax of the colon operator is:

z <- x:y

This operator creates a vector with elements from x to y, in steps of 1 (or -1 when x is greater than y), and assigns it to z.

Example:

a <- 4:-10
a

Output

[1] 4 3 2 1 0 -1 -2 -3 -4 -5 -6 -7 -8 -9 -10

2) Using the seq() function


In R, we can create a vector with the seq() function, which creates a sequence of elements as a vector. The seq() function can be used in two ways: by setting the step size with the 'by' parameter, or by specifying the length of the vector with the 'length.out' parameter.

Example:

seq_vec <- seq(1, 4, by = 0.5)
seq_vec
class(seq_vec)
Output

[1] 1.0 1.5 2.0 2.5 3.0 3.5 4.0
[1] "numeric"

Example:

seq_vec <- seq(1, 4, length.out = 6)
seq_vec
class(seq_vec)

Output

[1] 1.0 1.6 2.2 2.8 3.4 4.0
[1] "numeric"

Atomic vectors in R
In R, there are six types of atomic vectors, four of which (numeric, integer, character, and logical) are commonly used. Atomic vectors play an important role in data science. Atomic vectors are created with the c() function. The commonly used atomic vector types are as follows:
Numeric vector
Decimal values are known as numeric data in R. If we assign a decimal value to a variable d, then d becomes numeric. A vector that contains numeric elements is known as a numeric vector.

Example:

d <- 45.5
num_vec <- c(10.1, 10.2, 33.2)
d
num_vec
class(d)
class(num_vec)

Output

[1] 45.5
[1] 10.1 10.2 33.2
[1] "numeric"
[1] "numeric"

Integer vector
A numeric value without a fractional part can be stored as integer data. In R, integers are stored as 32-bit (4-byte) values. There are two ways to assign an integer value to a variable: by using the as.integer() function, or by appending L to the value.

A vector which contains integer elements is known as an integer vector.
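A plain number such as 5 is stored as a double unless it is written as 5L or converted. The sketch below also shows the 32-bit limit: adding 1L to the largest integer overflows to NA (R normally warns about this; the warning is suppressed here).

```r
typeof(5)              # "double": a plain number is stored as a double
typeof(5L)             # "integer"
.Machine$integer.max   # 2147483647, the largest 32-bit signed integer
big <- suppressWarnings(.Machine$integer.max + 1L)
is.na(big)             # TRUE: integer overflow gives NA
```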

Example:

d <- as.integer(5)
e <- 5L
int_vec <- c(1, 2, 3, 4, 5)
int_vec <- as.integer(int_vec)
int_vec1 <- c(1L, 2L, 3L, 4L, 5L)
class(d)
class(e)
class(int_vec)
class(int_vec1)

Output

[1] "integer"
[1] "integer"
[1] "integer"
[1] "integer"

Character vector
In R, there are two different ways to create a character value: by using the as.character() function, or by typing a string between double quotes ("") or single quotes ('').

A vector which contains character elements is known as a character vector.

Example:

d <- 'shubham'
e <- "Arpita"
f <- 65
f <- as.character(f)
d
e
f
char_vec <- c(1, 2, 3, 4, 5)
char_vec <- as.character(char_vec)
char_vec1 <- c("shubham", "arpita", "nishka", "vaishali")
char_vec
char_vec1
class(d)
class(e)
class(f)
class(char_vec)
class(char_vec1)

Output

[1] "shubham"
[1] "Arpita"
[1] "65"
[1] "1" "2" "3" "4" "5"
[1] "shubham" "arpita" "nishka" "vaishali"
[1] "character"
[1] "character"
[1] "character"
[1] "character"
[1] "character"

Logical vector
The logical data type has only two values, TRUE and FALSE (plus NA for a missing value). These values usually result from checking whether a condition is satisfied. A vector which contains logical values is known as a logical vector.
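Logical vectors are most often produced by comparisons, and because TRUE counts as 1 and FALSE as 0 in arithmetic, they are easy to summarize. A sketch with made-up marks:

```r
marks <- c(40, 82, 67, 91)
passed <- marks > 50    # [1] FALSE  TRUE  TRUE  TRUE
sum(passed)             # 3: TRUE counts as 1 and FALSE as 0
any(passed)             # TRUE
all(passed)             # FALSE
```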

Example:

d <- as.integer(5)
e <- as.integer(6)
f <- as.integer(7)
g <- d > e
h <- e < f
g
h
log_vec <- c(d < e, d < f, e < d, e < f, f < d, f < e)
log_vec
class(g)
class(h)
class(log_vec)

Output

[1] FALSE
[1] TRUE
[1] TRUE TRUE FALSE TRUE FALSE FALSE
[1] "logical"
[1] "logical"
[1] "logical"

Accessing elements of vectors


We can access the elements of a vector with vector indexing. An index denotes the position at which a value is stored in a vector. Indexing can be performed with integer, character, or logical vectors.
1) Indexing with integer vector
Integer indexing works much as it does in C, C++, and Java, with one difference: in C, C++, and Java indexing starts from 0, but in R it starts from 1. As in other programming languages, we index by writing an integer value in square brackets [] after the vector name.
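An index can also be a vector of positions, which selects several elements at once. A sketch with arbitrary values:

```r
v <- c(10, 20, 30, 40, 50)
v[1]           # 10: the first element is at index 1
v[c(2, 4)]     # 20 40
v[2:4]         # 20 30 40
v[length(v)]   # 50: the last element
```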

Example:

seq_vec <- seq(1, 4, length.out = 6)
seq_vec
seq_vec[2]

Output

[1] 1.0 1.6 2.2 2.8 3.4 4.0
[1] 1.6

2) Indexing with a character vector


In character vector indexing, we assign a unique name (key) to each element of the vector. Each element can then be accessed easily by its name. Let's see an example to understand how it is performed.
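Names can also be read back with names(), and several elements can be picked by name at once. Double brackets return the value without its name. A sketch reusing the ages from the example:

```r
ages <- c(shubham = 22, arpita = 23, vaishali = 25)
names(ages)                     # "shubham" "arpita" "vaishali"
ages[c("vaishali", "shubham")]  # several elements by name, in the requested order
ages[["arpita"]]                # 23, without its name
```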

Example:
char_vec <- c("shubham" = 22, "arpita" = 23, "vaishali" = 25)
char_vec
char_vec["arpita"]

Output

 shubham   arpita vaishali
      22       23       25
arpita
    23

3) Indexing with a logical vector


In logical indexing, R returns the values at the positions where the logical index vector is TRUE. Let's see an example to understand how it is performed on vectors.
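In practice the logical index is usually built from a condition on the vector itself. If the logical index is shorter than the vector, it is recycled. A sketch on the same values:

```r
a <- c(1, 2, 3, 4, 5, 6)
a[a > 3]            # 4 5 6
a[a %% 2 == 0]      # 2 4 6
a[c(TRUE, FALSE)]   # 1 3 5: the shorter logical index is recycled
```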

Example:

a <- c(1, 2, 3, 4, 5, 6)
a[c(TRUE, FALSE, TRUE, TRUE, FALSE, TRUE)]

Output

[1] 1 3 4 6

Vector Operation
In R, various operations can be performed on vectors. We can add, subtract, multiply, or divide two or more vectors. R plays an important role in data science, where such operations are required for data manipulation. The following types of operations can be performed on vectors.

1) Combining vectors
The c() function is used not only to create a vector but also to combine vectors. Combining one or more vectors forms a new vector that contains all the elements of each of them. Let's see an example of how the c() function combines vectors.

Example:

p<-c(1,2,4,5,7,8)
q<-c("shubham","arpita","nishka","gunjan","vaishali","sumit")
r<-c(p,q)
r

Output

[1] "1" "2" "4" "5" "7" "8"
[7] "shubham" "arpita" "nishka" "gunjan" "vaishali" "sumit"
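Because an atomic vector holds values of a single type, c() converts mixed inputs to a common type, which is why the numbers in r are printed in quotes. A short sketch that checks this with class() (the final line is an extra illustration not in the original):

```r
p <- c(1, 2, 4, 5, 7, 8)
q <- c("shubham", "arpita", "nishka", "gunjan", "vaishali", "sumit")
r <- c(p, q)

class(p)   # "numeric"
class(r)   # "character": the numbers were converted to strings
r[1]       # "1"

# Logical values combined with numbers become 1 (TRUE) and 0 (FALSE)
c(1.5, TRUE, FALSE)   # 1.5 1.0 0.0
```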

2) Arithmetic operations
We can perform all the arithmetic operations on vectors. They are applied member by member: the first element of one vector is combined with the first element of the other, the second with the second, and so on. We can add, subtract, multiply, or divide two vectors. Let's see an example to understand how arithmetic operations are performed on vectors.

Example:
a<-c(1,3,5,7)
b<-c(2,4,6,8)
a+b
a-b
a*b
a/b
a%%b

Output

[1] 3 7 11 15
[1] -1 -1 -1 -1
[1] 2 12 30 56
[1] 0.5000000 0.7500000 0.8333333 0.8750000
[1] 1 3 5 7
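Member-by-member arithmetic is simplest when both vectors have the same length. When they do not, R recycles the shorter vector, and it warns if the longer length is not a multiple of the shorter one. A small sketch with illustrative values:

```r
a <- c(1, 3, 5, 7)

a * 2               # the single value 2 is recycled: 2 6 10 14
a + c(10, 20)       # c(10, 20) is reused twice: 11 23 15 27
# a + c(10, 20, 30) would still run, but with a warning about unequal lengths
```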

3) Logical Index vector


With a logical index vector, we can form a new vector from a given vector. The logical vector has the same length as the original vector. Its members are TRUE where the corresponding members of the original vector are to be included in the slice, and FALSE otherwise. Let's see an example to understand how a new vector is formed with the help of a logical index vector.

Example:

a<-c("Shubham","Arpita","Nishka","Vaishali","Sumit","Gunjan")
b<-c(TRUE,FALSE,TRUE,TRUE,FALSE,FALSE)
a[b]

Output

[1] "Shubham" "Nishka" "Vaishali"

4) Numeric Index
In R, we specify the index between square brackets [ ] to select a value by its position. If the index is negative, R returns all the values except the one at that position. For example, specifying [-4] returns every element except the fourth. An index larger than the length of the vector returns NA.

Example:

q<-c("shubham","arpita","nishka","gunjan","vaishali","sumit")
q[2]
q[-4]
q[15]

Output

[1] "arpita"
[1] "shubham" "arpita" "nishka" "vaishali" "sumit"
[1] NA
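A negative index can also be a vector, which drops several positions at once. A small sketch reusing q from the example above:

```r
q <- c("shubham", "arpita", "nishka", "gunjan", "vaishali", "sumit")

q[-c(1, 2)]     # drop the first two elements
q[-length(q)]   # drop the last element
# Mixing positive and negative indices, e.g. q[c(-1, 2)], is an error
```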

5) Duplicate Index
An index vector may contain duplicate values, which means we can access the same element more than once in a single operation. Let's see an example to understand how a duplicate index works.

Example:

q<-c("shubham","arpita","nishka","gunjan","vaishali","sumit")
q[c(2,4,4,3)]

Output

[1] "arpita" "gunjan" "gunjan" "nishka"

6) Range Indexes
A range index slices a vector to form a new vector. For slicing, we use the colon (:) operator, which generates the sequence of positions to select. Range indexes are very helpful when we need a contiguous block of elements from a large vector. Let's see an example to understand how slicing is done with the colon operator to form a new vector.

Example:

q<-c("shubham","arpita","nishka","gunjan","vaishali","sumit")
b<-q[2:5]
b

Output

[1] "arpita" "nishka" "gunjan" "vaishali"

7) Out-of-order Indexes
In R, the index vector can be out of order. Below is an example in which a vector is sliced with the order of its first and second values reversed.
Example:

q<-c("shubham","arpita","nishka","gunjan","vaishali","sumit")
q[c(2,1,3,4,5,6)]

Output

[1] "arpita" "shubham" "nishka" "gunjan" "vaishali" "sumit"

8) Named vectors members


We first create our vector of characters as:

z=c("TensorFlow","PyTorch")
z

Output

[1] "TensorFlow" "PyTorch"

Once our vector of characters is created, we name the first member "Start" and the second member "End":

names(z)=c("Start","End")
z

Output

Start End
"TensorFlow" "PyTorch"

We retrieve the first member by its name as follows:

z["Start"]

Output

Start
"TensorFlow"

We can reverse the order with the help of a character string index vector.

z[c("End","Start")]

Output
End Start
"PyTorch" "TensorFlow"

Applications of vectors
1. Vectors are used in principal component analysis in machine learning. They are extended to eigenvalues and eigenvectors, which are then used to perform decomposition in vector spaces.
2. The inputs provided to a deep learning model are in the form of vectors. These vectors consist of standardized data that is supplied to the input layer of the neural network.
3. Vectors are used in the development of support vector machine algorithms.
4. Vector operations are used in neural networks for tasks such as image recognition and text processing.

R Lists
In R, lists are the second type of vector. Whereas an atomic vector holds elements of a single type, a list can contain elements of different types, such as numbers, vectors, strings, and even other lists. It can also contain a function or a matrix as an element. A list is therefore a data structure whose components can have mixed data types; we can say that a list is a generic vector which contains other objects.

Example

vec <- c(3,4,5,6)
char_vec<-c("shubham","nishka","gunjan","sumit")
logic_vec<-c(TRUE,FALSE,FALSE,TRUE)
out_list<-list(vec,char_vec,logic_vec)
out_list

Output:

[[1]]
[1] 3 4 5 6
[[2]]
[1] "shubham" "nishka" "gunjan" "sumit"
[[3]]
[1] TRUE FALSE FALSE TRUE
Lists creation
Creating a list is similar to creating a vector. In R, a vector is created with the c() function; in the same way, the list() function is used to create a list. A list avoids the main limitation of a vector, which is that all of its elements must share one data type: we can add elements of different data types to a list.

Syntax


list(...)

Example 1: Creating list with same data type

list_1<-list(1,2,3)
list_2<-list("Shubham","Arpita","Vaishali")
list_3<-list(c(1,2,3))
list_4<-list(TRUE,FALSE,TRUE)
list_1
list_2
list_3
list_4

Output:

[[1]]
[1] 1
[[2]]
[1] 2
[[3]]
[1] 3

[[1]]
[1] "Shubham"
[[2]]
[1] "Arpita"
[[3]]
[1] "Vaishali"

[[1]]
[1] 1 2 3

[[1]]
[1] TRUE
[[2]]
[1] FALSE
[[3]]
[1] TRUE

Example 2: Creating the list with different data type

list_data<-list("Shubham","Arpita",c(1,2,3,4,5),TRUE,FALSE,22.5,12L)
print(list_data)

In the above example, the list() function creates a list with character, numeric vector, logical, double, and integer elements. It gives the following output:

Output:

[[1]]
[1] "Shubham"
[[2]]
[1] "Arpita"
[[3]]
[1] 1 2 3 4 5
[[4]]
[1] TRUE
[[5]]
[1] FALSE
[[6]]
[1] 22.5
[[7]]
[1] 12

Giving a name to list elements


R provides an easy way to access list elements: giving a name to each element. Once names are assigned, an element can be accessed by its name. There are only three steps to print the list data by name:

1. Creating a list.
2. Assign a name to the list elements with the help of names() function.
3. Print the list data.

Let's see an example to understand how we can give names to the list elements.

Example

# Creating a list containing a vector, a matrix and a list.
list_data <- list(c("Shubham","Nishka","Gunjan"), matrix(c(40,80,60,70,90,80), nrow = 2),
  list("BCA","MCA","B.tech"))

# Giving names to the elements in the list.
names(list_data) <- c("Students", "Marks", "Course")

# Show the list.
print(list_data)

Output:

$Students
[1] "Shubham" "Nishka" "Gunjan"

$Marks
[,1] [,2] [,3]
[1,] 40 60 90
[2,] 80 70 80

$Course
$Course[[1]]
[1] "BCA"

$Course[[2]]
[1] "MCA"

$Course[[3]]
[1] "B.tech"
Accessing List Elements
R provides two ways to access the elements of a list. The first is indexing, performed in the same way as for a vector. The second is accessing the elements by name, which is possible only for a named list; we cannot access the elements of an unnamed list by name.

Let's see an example of both methods to understand how they are used to access list elements.

Example 1: Accessing elements using index

# Creating a list containing a vector, a matrix and a list.
list_data <- list(c("Shubham","Arpita","Nishka"), matrix(c(40,80,60,70,90,80), nrow = 2),
  list("BCA","MCA","B.tech"))
# Accessing the first element of the list.
print(list_data[1])

# Accessing the third element. The third element is also a list, so all its elements will be printed.
print(list_data[3])

Output:

[[1]]
[1] "Shubham" "Arpita" "Nishka"

[[1]]
[[1]][[1]]
[1] "BCA"

[[1]][[2]]
[1] "MCA"

[[1]][[3]]
[1] "B.tech"

Example 2: Accessing elements using names

# Creating a list containing a vector, a matrix and a list.
list_data <- list(c("Shubham","Arpita","Nishka"), matrix(c(40,80,60,70,90,80), nrow = 2),
  list("BCA","MCA","B.tech"))
# Giving names to the elements in the list.
names(list_data) <- c("Student", "Marks", "Course")
# Accessing elements by name.
print(list_data["Student"])
print(list_data$Marks)
print(list_data)

Output:

$Student
[1] "Shubham" "Arpita" "Nishka"

[,1] [,2] [,3]
[1,] 40 60 90
[2,] 80 70 80

$Student
[1] "Shubham" "Arpita" "Nishka"

$Marks
[,1] [,2] [,3]
[1,] 40 60 90
[2,] 80 70 80

$Course
$Course[[1]]
[1] "BCA"
$Course[[2]]
[1] "MCA"
$Course[[3]]
[1] "B.tech"
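Single brackets always return a list, which is why the output of list_data["Student"] starts with a $Student header. To get the element itself, use double brackets or the $ operator. A sketch that rebuilds the named list_data from the example above:

```r
list_data <- list(c("Shubham", "Arpita", "Nishka"),
                  matrix(c(40, 80, 60, 70, 90, 80), nrow = 2),
                  list("BCA", "MCA", "B.tech"))
names(list_data) <- c("Student", "Marks", "Course")

class(list_data["Student"])     # "list": a sub-list holding one element
class(list_data[["Student"]])   # "character": the vector itself
list_data[["Student"]][2]       # "Arpita"
list_data$Course[[3]]           # "B.tech"
```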

Manipulation of list elements


R allows us to add, delete, or update elements in a list. New elements can be added at the end of the list, an element at any position can be updated by overwriting it with a new value, and an element at a specified index can be removed by assigning NULL to it. Let's see an example to understand how we can add, delete, or update the elements in a list.

Example

# Creating a list containing a vector, a matrix and a list.
list_data <- list(c("Shubham","Arpita","Nishka"), matrix(c(40,80,60,70,90,80), nrow = 2),
  list("BCA","MCA","B.tech"))

# Giving names to the elements in the list.
names(list_data) <- c("Student", "Marks", "Course")

# Adding element at the end of the list.
list_data[4] <- "Moradabad"
print(list_data[4])

# Removing the last element.
list_data[4] <- NULL

# Printing the 4th element (it no longer exists, so R prints $<NA> NULL).
print(list_data[4])

# Updating the 3rd element.
list_data[3] <- "Masters of computer applications"
print(list_data[3])

Output:

[[1]]
[1] "Moradabad"

$<NA>
NULL

$Course
[1] "Masters of computer applications"
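Elements can also be added and removed by name, and assigning NULL removes an element from any position, not only the last one. A minimal sketch with a smaller, illustrative list:

```r
list_data <- list(Student = c("Shubham", "Arpita", "Nishka"),
                  Course = "BCA")

# Assigning to a new name appends a named element at the end
list_data$City <- "Moradabad"
names(list_data)              # "Student" "Course" "City"

# Assigning NULL removes an element from the middle of the list
list_data$Course <- NULL
names(list_data)              # "Student" "City"
```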

Converting list to vector


Lists have a drawback: arithmetic operations cannot be performed directly on list elements. To overcome this, R provides the unlist() function, which converts a list into a vector. In some cases, converting a list into a vector is required so that its elements can be used for further manipulation.

The unlist() function takes a list as its parameter and returns a vector. Let's see an example to understand how the unlist() function is used in R.

Example

# Creating lists.
list1 <- list(1:5)
print(list1)

list2 <- list(10:14)
print(list2)

# Converting the lists to vectors.
v1 <- unlist(list1)
v2 <- unlist(list2)

print(v1)
print(v2)

# Adding the vectors
result <- v1+v2
print(result)

Output:

[[1]]
[1] 1 2 3 4 5

[[1]]
[1] 10 11 12 13 14

[1] 1 2 3 4 5
[1] 10 11 12 13 14
[1] 11 13 15 17 19
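unlist() also keeps element names and, like c(), converts mixed types to a common type. A sketch with an illustrative list named mixed:

```r
# A named list mixing numbers and a string (illustrative values)
mixed <- list(a = 1:2, b = 3, c = "x")

v <- unlist(mixed)
v          # named character vector: a1 "1", a2 "2", b "3", c "x"
class(v)   # "character": the common type of all elements

# Numeric-only lists stay numeric, so arithmetic works afterwards
sum(unlist(list(1:5, 10:14)))   # 75
```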
Merging Lists
R allows us to merge one or more lists into one list. This can be done with the list() function: we pass all the lists to list() as parameters, and it returns a new list whose components are the original lists, each kept as a nested list. Let's see an example to understand how the merging process is done.

Example

# Creating two lists.
Even_list <- list(2,4,6,8,10)
Odd_list <- list(1,3,5,7,9)

# Merging the two lists.
merged.list <- list(Even_list,Odd_list)

# Printing the merged list.
print(merged.list)

Output:

[[1]]
[[1]][[1]]
[1] 2

[[1]][[2]]
[1] 4

[[1]][[3]]
[1] 6

[[1]][[4]]
[1] 8

[[1]][[5]]
[1] 10

[[2]]
[[2]][[1]]
[1] 1

[[2]][[2]]
[1] 3

[[2]][[3]]
[1] 5

[[2]][[4]]
[1] 7

[[2]][[5]]
[1] 9
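list() nests the input lists, which is why the output is indexed as [[1]][[1]], [[1]][[2]], and so on. If a single flat list is wanted instead, c() concatenates the lists. A short sketch comparing the two, reusing Even_list and Odd_list:

```r
Even_list <- list(2, 4, 6, 8, 10)
Odd_list  <- list(1, 3, 5, 7, 9)

nested <- list(Even_list, Odd_list)  # 2 components, each a list of 5
flat   <- c(Even_list, Odd_list)     # 10 components at the top level

length(nested)   # 2
length(flat)     # 10
flat[[6]]        # 1, the first element of Odd_list
```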

R Arrays
In R, arrays are data objects that can store data in more than two dimensions. An array is created with the array() function, which takes a vector as input and arranges its values according to the dimensions given in the dim parameter.

For example, an array of dimension (2, 3, 4) consists of 4 rectangular matrices, each with 2 rows and 3 columns.

R Array Syntax
There is the following syntax of R arrays:

array_name <- array(data, dim = c(row_size, column_size, matrices), dimnames = dim_names)

data

The data is the first argument in the array() function. It is an input vector which is given to the array.

matrices

This value sets the number of matrices in the array; a three-dimensional R array is a stack of matrices.

row_size

This parameter defines the number of rows in each matrix.

column_size

This parameter defines the number of columns in each matrix.

dim_names

This parameter is a list of names for the rows, columns, and matrices, used to replace the default names.
How to create?
In R, array creation is quite simple. We can create an array using vectors and the array() function. In an array, data is stored in the form of matrices. There are only two steps to create an array, which are as follows:

1. In the first step, we create two vectors of different lengths.
2. Once our vectors are created, we take these vectors as inputs to the array.

Let's see an example to understand how we can create an array with the help of vectors and the array() function.

Example

#Creating two vectors of different lengths
vec1 <-c(1,3,5)
vec2 <-c(10,11,12,13,14,15)

#Taking these vectors as input to the array.
#The 9 values are recycled to fill all 3 x 3 x 2 = 18 cells, so both matrices are identical.
res <- array(c(vec1,vec2),dim=c(3,3,2))
print(res)

Output

, , 1
[,1] [,2] [,3]
[1,] 1 10 13
[2,] 3 11 14
[3,] 5 12 15
, , 2
[,1] [,2] [,3]
[1,] 1 10 13
[2,] 3 11 14
[3,] 5 12 15
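The dimensions of an array can be checked with dim(), and the (2, 3, 4) example from the start of this section can be built the same way. A sketch that fills such an array with the numbers 1 to 24, an arbitrary choice for illustration:

```r
arr <- array(1:24, dim = c(2, 3, 4))

dim(arr)        # 2 3 4: 2 rows, 3 columns, 4 matrices
arr[, , 1]      # the first 2 x 3 matrix, filled column by column with 1..6
arr[2, 3, 4]    # 24, the last element
```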

Naming rows and columns


In R, we can give names to the rows, columns, and matrices of an array with the dimnames parameter of the array() function.

Naming the rows and columns is optional; it only makes them easier to tell apart.

Below is an example in which we create an array and give names to its rows, columns, and matrices.

Example

#Creating two vectors of different lengths
vec1 <-c(1,3,5)
vec2 <-c(10,11,12,13,14,15)

#Initializing names for rows, columns and matrices
col_names <- c("Col1","Col2","Col3")
row_names <- c("Row1","Row2","Row3")
matrix_names <- c("Matrix1","Matrix2")

#Taking the vectors as input to the array
res <- array(c(vec1,vec2),dim=c(3,3,2),dimnames=list(row_names,col_names,matrix_names))
print(res)

Output

, , Matrix1

Col1 Col2 Col3
Row1 1 10 13
Row2 3 11 14
Row3 5 12 15

, , Matrix2

Col1 Col2 Col3
Row1 1 10 13
Row2 3 11 14
Row3 5 12 15

Accessing array elements


As in C or C++, we access the elements of an array by index. For a three-dimensional array, an element is addressed as [row, column, matrix], and leaving a position empty selects everything along that dimension. Let's see an example to understand how we can access
the elements of the array using the indexing method.
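In the original listing, this example shows only printed results, not the code that produced them. A sketch that should reproduce those results, assuming the named array from the previous example is rebuilt as res (the three index expressions are inferred from the printed values):

```r
vec1 <- c(1, 3, 5)
vec2 <- c(10, 11, 12, 13, 14, 15)
res <- array(c(vec1, vec2), dim = c(3, 3, 2),
             dimnames = list(c("Row1", "Row2", "Row3"),
                             c("Col1", "Col2", "Col3"),
                             c("Matrix1", "Matrix2")))
print(res)

print(res[3, , 2])   # third row of the second matrix: 5 12 15
print(res[1, 3, 1])  # row 1, column 3 of the first matrix: 13
print(res[, , 2])    # the whole second matrix
```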

Output

, , Matrix1
Col1 Col2 Col3
Row1 1 10 13
Row2 3 11 14
Row3 5 12 15

, , Matrix2
Col1 Col2 Col3
Row1 1 10 13
Row2 3 11 14
Row3 5 12 15

Col1 Col2 Col3
5 12 15

[1] 13

Col1 Col2 Col3
Row1 1 10 13
Row2 3 11 14
Row3 5 12 15

Manipulation of elements
An array is made up of matrices in multiple dimensions, so operations on the elements of an array are carried out by accessing the elements of those matrices.

Example

#Creating two vectors of different lengths
vec1 <-c(1,3,5)
vec2 <-c(10,11,12,13,14,15)

#Taking the vectors as input to the array1
res1 <- array(c(vec1,vec2),dim=c(3,3,2))
print(res1)

#Creating two vectors of different lengths
vec1 <-c(8,4,7)
vec2 <-c(16,73,48,46,36,73)

#Taking the vectors as input to the array2
res2 <- array(c(vec1,vec2),dim=c(3,3,2))
print(res2)

#Creating matrices from these arrays
mat1 <- res1[,,2]
mat2 <- res2[,,2]
res3 <- mat1+mat2
print(res3)

Output

, , 1
[,1] [,2] [,3]
[1,] 1 10 13
[2,] 3 11 14
[3,] 5 12 15

, , 2
[,1] [,2] [,3]
[1,] 1 10 13
[2,] 3 11 14
[3,] 5 12 15

, , 1
[,1] [,2] [,3]
[1,] 8 16 46
[2,] 4 73 36
[3,] 7 48 73

, , 2
[,1] [,2] [,3]
[1,] 8 16 46
[2,] 4 73 36
[3,] 7 48 73
[,1] [,2] [,3]
[1,] 9 26 59
[2,] 7 84 50
[3,] 12 60 88

Calculations across array elements


For calculations across array elements, R provides the apply() function. It takes three parameters: X, MARGIN, and FUN.

The basic syntax of the apply() function is as follows:

apply(X, MARGIN, FUN)

Here, X is the array on which the calculation is performed, MARGIN specifies the dimension(s) over which the function is applied (1 for rows, 2 for columns), and FUN is the function to be applied to the elements of the array.

Example

#Creating two vectors of different lengths
vec1 <-c(1,3,5)
vec2 <-c(10,11,12,13,14,15)

#Taking the vectors as input to the array1
res1 <- array(c(vec1,vec2),dim=c(3,3,2))
print(res1)

#using apply function: MARGIN 1 sums each row across both matrices
result <- apply(res1,c(1),sum)
print(result)

Output

, , 1
[,1] [,2] [,3]
[1,] 1 10 13
[2,] 3 11 14
[3,] 5 12 15

, , 2
[,1] [,2] [,3]
[1,] 1 10 13
[2,] 3 11 14
[3,] 5 12 15

[1] 48 56 64
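MARGIN selects which dimension is kept in the result: 1 for rows, 2 for columns, 3 for the matrices of a three-dimensional array, and c(1, 2) for each row-column cell across the matrices. A sketch reusing res1 from the example above:

```r
vec1 <- c(1, 3, 5)
vec2 <- c(10, 11, 12, 13, 14, 15)
res1 <- array(c(vec1, vec2), dim = c(3, 3, 2))

apply(res1, 1, sum)        # row totals across both matrices: 48 56 64
apply(res1, 2, sum)        # column totals: 18 66 84
apply(res1, 3, sum)        # total of each matrix: 84 84
apply(res1, c(1, 2), sum)  # 3 x 3 matrix of cell-wise sums over the matrices
```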
R Matrix
In R, a two-dimensional rectangular data set is known as a matrix. A matrix is created by passing a vector to the matrix() function. We can perform addition, subtraction, multiplication, and division on R matrices.

In an R matrix, elements are arranged in a fixed number of rows and columns. All the elements of a matrix must share a common basic type, such as numeric, character, or logical.

Example

matrix1<-matrix(c(11, 13, 15, 12, 14, 16),nrow =2, ncol =3, byrow = TRUE)
matrix1

Output

[,1] [,2] [,3]
[1,] 11 13 15
[2,] 12 14 16

History of matrices in R
The word "matrix" is the Latin word for womb, meaning a place where something is formed or produced. Two authors of historical importance have used the word "matrix" in unusual ways. They used it in an axiom intended to reduce any function to one of a lower type, so that at the "bottom" (order 0) the function is identical to its extension.


Any possible function other than a matrix can be derived from a matrix through the process of generalization: by considering the proposition which asserts the function in question to be true for all values, or for some value, of one of the arguments, while the other argument remains undetermined.

How to create a matrix in R?


Like vectors and lists, matrices have their own creation function: R provides the matrix() function to create a matrix. This function plays an important role in data analysis. The syntax of matrix() is as follows:

matrix(data, nrow, ncol, byrow, dimnames)

data

The first argument of matrix() is data, the input vector whose values become the elements of the matrix.

nrow

The second argument is the number of rows which we want to create in the matrix.

ncol

The third argument is the number of columns which we want to create in the matrix.

byrow

The byrow parameter is a logical value. If it is TRUE, the input vector elements are arranged by row; if it is FALSE (the default), they are arranged by column.

dimnames

The dimnames parameter is a list of the names assigned to the rows and columns.

Let's see an example to understand how the matrix() function is used to create a matrix and arrange the elements sequentially by row or column.

Example

#Arranging elements sequentially by row.
P <- matrix(c(5:16), nrow = 4, byrow = TRUE)
print(P)

# Arranging elements sequentially by column.
Q <- matrix(c(3:14), nrow = 4, byrow = FALSE)
print(Q)

# Defining the column and row names.
row_names = c("row1", "row2", "row3", "row4")
col_names = c("col1", "col2", "col3")

R <- matrix(c(3:14), nrow = 4, byrow = TRUE, dimnames = list(row_names, col_names))
print(R)

Output

[,1] [,2] [,3]
[1,] 5 6 7
[2,] 8 9 10
[3,] 11 12 13
[4,] 14 15 16

[,1] [,2] [,3]
[1,] 3 7 11
[2,] 4 8 12
[3,] 5 9 13
[4,] 6 10 14

col1 col2 col3
row1 3 4 5
row2 6 7 8
row3 9 10 11
row4 12 13 14

Accessing matrix elements in R


As in C and C++, we can access the elements of a matrix by their indices, written as [row, column]. There are three ways to access elements of a matrix:

1. We can access the element present in the nth row and mth column.
2. We can access all the elements present in the nth row.
3. We can access all the elements present in the mth column.

Let's see an example to understand how elements are accessed by nth row and mth column, by nth row, or by mth column.
Example

# Defining the column and row names.
row_names = c("row1", "row2", "row3", "row4")
col_names = c("col1", "col2", "col3")
#Creating matrix
R <- matrix(c(5:16), nrow = 4, byrow = TRUE, dimnames = list(row_names, col_names))
print(R)

#Accessing element present on 3rd row and 2nd column
print(R[3,2])

#Accessing elements present in 3rd row
print(R[3,])

#Accessing elements present in 2nd column
print(R[,2])

Output

col1 col2 col3
row1 5 6 7
row2 8 9 10
row3 11 12 13
row4 14 15 16

[1] 12

col1 col2 col3
11 12 13

row1 row2 row3 row4
6 9 12 15
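Rows and columns can also be selected by name or by several indices at once, and drop = FALSE keeps a single row as a matrix instead of simplifying it to a vector. A sketch that rebuilds R from the example above:

```r
R <- matrix(5:16, nrow = 4, byrow = TRUE,
            dimnames = list(c("row1", "row2", "row3", "row4"),
                            c("col1", "col2", "col3")))

R["row3", "col2"]       # 12, the same element as R[3, 2]
R[c(1, 4), ]            # rows 1 and 4, all columns
R[2, , drop = FALSE]    # row 2 kept as a 1 x 3 matrix
```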

Modification of the matrix


R allows us to modify a matrix. There are several ways to do so, which are as follows:
Assign a single element
The first method is to assign a new value to a single element at a particular position; the old value at that position is replaced by the new one. This is the simplest way to modify a matrix. The basic syntax is as follows:

matrix[n, m]<-y

Here, n and m are the row and column of the element, respectively, and y is the new value assigned to that position.

Let's see an example to understand how the modification is done:

Example

# Defining the column and row names.
row_names = c("row1", "row2", "row3", "row4")
col_names = c("col1", "col2", "col3")

R <- matrix(c(5:16), nrow = 4, byrow = TRUE, dimnames = list(row_names, col_names))
print(R)

#Assigning value 20 to the element at 3rd row and 2nd column
R[3,2]<-20
print(R)

Output

col1 col2 col3
row1 5 6 7
row2 8 9 10
row3 11 12 13
row4 14 15 16

col1 col2 col3
row1 5 6 7
row2 8 9 10
row3 11 20 13
row4 14 15 16

Use of Relational Operator


R provides another way to modify a matrix: selecting elements with relational operators such as >, <, and ==, and assigning new values to them. Like the first method, this method is quite simple to use. Let's see an example to understand how this method modifies the matrix.

Example 1

# Defining the column and row names.
row_names = c("row1", "row2", "row3", "row4")
col_names = c("col1", "col2", "col3")

R <- matrix(c(5:16), nrow = 4, byrow = TRUE, dimnames = list(row_names, col_names))
print(R)

#Replacing elements that are equal to 12
R[R==12]<-0
print(R)

Output

col1 col2 col3
row1 5 6 7
row2 8 9 10
row3 11 12 13
row4 14 15 16

col1 col2 col3
row1 5 6 7
row2 8 9 10
row3 11 0 13
row4 14 15 16

Example 2

# Defining the column and row names.
row_names = c("row1", "row2", "row3", "row4")
col_names = c("col1", "col2", "col3")

R <- matrix(c(5:16), nrow = 4, byrow = TRUE, dimnames = list(row_names, col_names))
print(R)

#Replacing elements whose values are greater than 12
R[R>12]<-0
print(R)

Output

col1 col2 col3
row1 5 6 7
row2 8 9 10
row3 11 12 13
row4 14 15 16

col1 col2 col3
row1 5 6 7
row2 8 9 10
row3 11 12 0
row4 0 0 0

Addition of Rows and Columns


The third method of matrix modification is to add rows and columns using the rbind() and cbind() functions, which add a row and a column respectively. Both return a new matrix; R itself is unchanged unless the result is assigned back to it. Let's see an example to understand how the rbind() and cbind() functions work.

Example 1

# Defining the column and row names.
row_names = c("row1", "row2", "row3", "row4")
col_names = c("col1", "col2", "col3")

R <- matrix(c(5:16), nrow = 4, byrow = TRUE, dimnames = list(row_names, col_names))
print(R)

#Adding row
rbind(R,c(17,18,19))

#Adding column
cbind(R,c(17,18,19,20))

#transpose of the matrix using the t() function:
t(R)

#Modifying the dimension of the matrix using the dim() function
dim(R)<-c(1,12)
print(R)

Output

col1 col2 col3
row1 5 6 7
row2 8 9 10
row3 11 12 13
row4 14 15 16

col1 col2 col3
row1 5 6 7
row2 8 9 10
row3 11 12 13
row4 14 15 16
17 18 19

col1 col2 col3
row1 5 6 7 17
row2 8 9 10 18
row3 11 12 13 19
row4 14 15 16 20

row1 row2 row3 row4
col1 5 8 11 14
col2 6 9 12 15
col3 7 10 13 16

[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
[1,] 5 8 11 14 6 9 12 15 7 10 13 16
Matrix operations
In R, we can perform mathematical operations on matrices, such as addition, subtraction, multiplication, and division. These operators work element by element, so both matrices must have the same dimensions.

Let's see an example to understand how mathematical operations are performed on matrices.

Example 1

R <- matrix(c(5:16), nrow = 4,ncol=3)
S <- matrix(c(1:12), nrow = 4,ncol=3)

#Addition
sum<-R+S
print(sum)

#Subtraction
sub<-R-S
print(sub)

#Multiplication
mul<-R*S
print(mul)

#Multiplication by constant
mul1<-R*12
print(mul1)

#Division
div<-R/S
print(div)

Output

[,1] [,2] [,3]
[1,] 6 14 22
[2,] 8 16 24
[3,] 10 18 26
[4,] 12 20 28

[,1] [,2] [,3]
[1,] 4 4 4
[2,] 4 4 4
[3,] 4 4 4
[4,] 4 4 4

[,1] [,2] [,3]
[1,] 5 45 117
[2,] 12 60 140
[3,] 21 77 165
[4,] 32 96 192

[,1] [,2] [,3]
[1,] 60 108 156
[2,] 72 120 168
[3,] 84 132 180
[4,] 96 144 192

[,1] [,2] [,3]
[1,] 5.000000 1.800000 1.444444
[2,] 3.000000 1.666667 1.400000
[3,] 2.333333 1.571429 1.363636
[4,] 2.000000 1.500000 1.333333
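The * operator above multiplies element by element. For the matrix product of linear algebra, R uses the %*% operator, which instead requires the number of columns of the first matrix to equal the number of rows of the second. A small sketch with illustrative values:

```r
A <- matrix(1:4, nrow = 2)   # columns (1, 2) and (3, 4)

A * A       # element-wise product: 1 9 / 4 16
A %*% A     # matrix product:       7 15 / 10 22

# %*% needs conformable shapes: a 4 x 3 matrix times a 3 x 4 matrix is 4 x 4
R <- matrix(5:16, nrow = 4, ncol = 3)
dim(R %*% t(R))              # 4 4
```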

Applications of matrix
1. In geology, matrices are used to process surveys and plot graphs; they are also used in statistics and in many other fields of study.
2. A matrix is a representation method that helps in plotting common survey data.
3. In robotics and automation, matrices are fundamental to describing robot movements.
4. In economics, matrices are used in calculating gross domestic product, and they also help in calculating production capability for goods and products.
5. In computer-based applications, matrices play a crucial role in creating realistic-looking motion.

R Data Frame
A data frame is a two-dimensional, table-like structure in which each column contains the values of one variable and each row contains one set of values, one from each column. A data frame is a special case of a list in which every component has the same length.

In other words, a data frame stores a data table as a list of equal-length vectors, one per column. A matrix can contain only one type of data, but the columns of a data frame can have different data types, such as numeric, character, or factor.

A data frame has the following characteristics:


o The column names should be non-empty.
o The row names should be unique.
o The data stored in a data frame can be of factor, numeric, or character type.
o Each column contains the same number of data items.
How to create Data Frame
In R, data frames are created with the data.frame() function. Its arguments are vectors of any type, such as numeric, character, or integer, and each vector becomes a column. In the example below, we create a data frame that contains employee id (integer vector), employee name (character vector), salary (numeric vector), and starting date (Date vector).

Example

# Creating the data frame.
emp.data<- data.frame(
  employee_id = c(1:5),
  employee_name = c("Shubham","Arpita","Nishka","Gunjan","Sumit"),
  sal = c(623.3,915.2,611.0,729.0,843.25),
  starting_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11",
    "2015-03-27")),
  stringsAsFactors = FALSE
)
# Printing the data frame.
print(emp.data)

Output

employee_id employee_name sal starting_date
1 1 Shubham 623.30 2012-01-01
2 2 Arpita 915.20 2013-09-23
3 3 Nishka 611.00 2014-11-15
4 4 Gunjan 729.00 2014-05-11
5 5 Sumit 843.25 2015-03-27

Getting the structure of R Data Frame


In R, we can inspect the structure of a data frame with the built-in str() function, which shows each column together with its type and its first few values. In the example below, we create a data frame from vectors of different data types and display its structure.

Example

# Creating the data frame.
emp.data<- data.frame(
  employee_id = c(1:5),
  employee_name = c("Shubham","Arpita","Nishka","Gunjan","Sumit"),
  sal = c(623.3,515.2,611.0,729.0,843.25),
  starting_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11",
    "2015-03-27")),
  stringsAsFactors = FALSE
)
# Printing the structure of data frame.
str(emp.data)

Output

'data.frame': 5 obs. of 4 variables:
 $ employee_id  : int 1 2 3 4 5
 $ employee_name: chr "Shubham" "Arpita" "Nishka" "Gunjan" ...
 $ sal          : num 623 515 611 729 843
 $ starting_date: Date, format: "2012-01-01" "2013-09-23" ...
Extracting data from Data Frame
To manipulate the data in a data frame, we first need to extract it. We can extract the data in three ways, which are as follows:

1. We can extract specific columns from a data frame using the column names.
2. We can extract specific rows from a data frame.
3. We can extract specific rows corresponding to specific columns.

Let's see an example of each one to understand how data is extracted from the data frame in these ways.

Extracting the specific columns from a data frame


Example

# Creating the data frame.
emp.data<- data.frame(
  employee_id = c(1:5),
  employee_name= c("Shubham","Arpita","Nishka","Gunjan","Sumit"),
  sal = c(623.3,515.2,611.0,729.0,843.25),
  starting_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11",
    "2015-03-27")),
  stringsAsFactors = FALSE
)
# Extracting specific columns from a data frame
final <- data.frame(emp.data$employee_id,emp.data$sal)
print(final)

Output

emp.data.employee_id emp.data.sal
1 1 623.30
2 2 515.20
3 3 611.00
4 4 729.00
5 5 843.25
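Wrapping emp.data$employee_id in data.frame() produces the awkward column names seen in the output. Indexing the data frame with a vector of column names keeps the original names. A sketch using a trimmed copy of emp.data (the starting_date column is omitted only for brevity):

```r
emp.data <- data.frame(
  employee_id   = 1:5,
  employee_name = c("Shubham", "Arpita", "Nishka", "Gunjan", "Sumit"),
  sal           = c(623.3, 515.2, 611.0, 729.0, 843.25),
  stringsAsFactors = FALSE
)

# Selecting by name keeps the original column names
final <- emp.data[, c("employee_id", "sal")]
names(final)          # "employee_id" "sal"

emp.data$sal          # a single column as a plain vector
emp.data[["sal"]]     # same result, with the name given as a string
```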

Extracting the specific rows from a data frame


Example

# Creating the data frame.
emp.data<- data.frame(
  employee_id = c(1:5),
  employee_name = c("Shubham","Arpita","Nishka","Gunjan","Sumit"),
  sal = c(623.3,515.2,611.0,729.0,843.25),
  starting_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11",
    "2015-03-27")),
  stringsAsFactors = FALSE
)
# Extracting first row from a data frame
final <- emp.data[1,]
print(final)

# Extracting last two rows from a data frame
final <- emp.data[4:5,]
print(final)

employee_id employee_name sal starting_date
1 1 Shubham 623.3 2012-01-01

employee_id employee_name sal starting_date
4 4 Gunjan 729.00 2014-05-11
5 5 Sumit 843.25 2015-03-27
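Rows can also be selected with a logical condition on a column, and tail() returns the last rows without hard-coding their positions. A sketch using a trimmed copy of emp.data (starting_date omitted only for brevity; the threshold of 700 is arbitrary):

```r
emp.data <- data.frame(
  employee_id   = 1:5,
  employee_name = c("Shubham", "Arpita", "Nishka", "Gunjan", "Sumit"),
  sal           = c(623.3, 515.2, 611.0, 729.0, 843.25),
  stringsAsFactors = FALSE
)

# Rows whose salary exceeds 700
high <- emp.data[emp.data$sal > 700, ]
high$employee_name    # "Gunjan" "Sumit"

# The last two rows, equivalent to emp.data[4:5, ] here
tail(emp.data, 2)
```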

Extracting specific rows corresponding to specific columns
Example

# Creating the data frame.
emp.data<- data.frame(
  employee_id = c(1:5),
  employee_name = c("Shubham","Arpita","Nishka","Gunjan","Sumit"),
  sal = c(623.3,515.2,611.0,729.0,843.25),
  starting_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11",
    "2015-03-27")),
  stringsAsFactors = FALSE
)
# Extracting 2nd and 3rd row corresponding to the 1st and 4th column
final <- emp.data[c(2,3),c(1,4)]
print(final)

Output

  employee_id starting_date
2           2    2013-09-23
3           3    2014-11-15

Modification in Data Frame


R allows us to modify a data frame. As with matrices, we modify a data frame
through re-assignment. Not only can we add rows and columns, we can also delete
them. Adding rows or columns expands the data frame.

We can

1. Add a column by binding a column vector under a new column name with the
cbind() function.
2. Add rows with the same structure as the existing data frame using the rbind()
function.
3. Delete columns by assigning NULL to them.
4. Delete rows by re-assigning the data frame without them (for example, with
negative row indices).

Let's see an example to understand how the rbind() and cbind() functions work and
how the modification is done in our data frame.

Example: Adding rows and columns

# Creating the data frame.
emp.data <- data.frame(
  employee_id = 1:5,
  employee_name = c("Shubham","Arpita","Nishka","Gunjan","Sumit"),
  sal = c(623.3,515.2,611.0,729.0,843.25),
  starting_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11",
                            "2015-03-27")),
  stringsAsFactors = FALSE
)
print(emp.data)

# Adding a row to the data frame (values in the same order as the columns)
x <- list(6, "Vaishali", 547, as.Date("2015-09-01"))
rbind(emp.data, x)

# Adding a column to the data frame
y <- c("Moradabad","Lucknow","Etah","Sambhal","Khurja")
cbind(emp.data, Address = y)

Output

  employee_id employee_name    sal starting_date
1           1       Shubham 623.30    2012-01-01
2           2        Arpita 515.20    2013-09-23
3           3        Nishka 611.00    2014-11-15
4           4        Gunjan 729.00    2014-05-11
5           5         Sumit 843.25    2015-03-27
  employee_id employee_name    sal starting_date
1           1       Shubham 623.30    2012-01-01
2           2        Arpita 515.20    2013-09-23
3           3        Nishka 611.00    2014-11-15
4           4        Gunjan 729.00    2014-05-11
5           5         Sumit 843.25    2015-03-27
6           6      Vaishali 547.00    2015-09-01
  employee_id employee_name    sal starting_date   Address
1           1       Shubham 623.30    2012-01-01 Moradabad
2           2        Arpita 515.20    2013-09-23   Lucknow
3           3        Nishka 611.00    2014-11-15      Etah
4           4        Gunjan 729.00    2014-05-11   Sambhal
5           5         Sumit 843.25    2015-03-27    Khurja
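In the example above, rbind() and cbind() only print their results; emp.data itself is unchanged, which is why the Address output has no Vaishali row. A minimal sketch of keeping the changes by assigning the results back; the bonus column is a hypothetical addition used only to show the $ form of adding a column:

```r
emp.data <- data.frame(
  employee_id = 1:5,
  employee_name = c("Shubham", "Arpita", "Nishka", "Gunjan", "Sumit"),
  sal = c(623.3, 515.2, 611.0, 729.0, 843.25),
  stringsAsFactors = FALSE
)

# rbind() returns a new data frame; assign it back to keep the new row
new_row <- data.frame(employee_id = 6, employee_name = "Vaishali", sal = 547,
                      stringsAsFactors = FALSE)
emp.data <- rbind(emp.data, new_row)

# A column can also be added by assigning to a new name with $
# (hypothetical "bonus" column: 10% of the salary)
emp.data$bonus <- emp.data$sal * 0.10

print(emp.data)
```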

Example: Delete rows and columns

# Creating the data frame.
emp.data <- data.frame(
  employee_id = 1:5,
  employee_name = c("Shubham","Arpita","Nishka","Gunjan","Sumit"),
  sal = c(623.3,515.2,611.0,729.0,843.25),
  starting_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11",
                            "2015-03-27")),
  stringsAsFactors = FALSE
)
print(emp.data)

# Deleting the first row from the data frame
emp.data <- emp.data[-1,]
print(emp.data)

# Deleting a column from the data frame
emp.data$starting_date <- NULL
print(emp.data)

Output

  employee_id employee_name    sal starting_date
1           1       Shubham 623.30    2012-01-01
2           2        Arpita 515.20    2013-09-23
3           3        Nishka 611.00    2014-11-15
4           4        Gunjan 729.00    2014-05-11
5           5         Sumit 843.25    2015-03-27
  employee_id employee_name    sal starting_date
2           2        Arpita 515.20    2013-09-23
3           3        Nishka 611.00    2014-11-15
4           4        Gunjan 729.00    2014-05-11
5           5         Sumit 843.25    2015-03-27
  employee_id employee_name    sal
2           2        Arpita 515.20
3           3        Nishka 611.00
4           4        Gunjan 729.00
5           5         Sumit 843.25
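Rows can also be deleted by a condition, and several columns can be removed in one assignment. A minimal sketch, assuming the same employee data and an illustrative rule of keeping only salaries of at least 600:

```r
emp.data <- data.frame(
  employee_id = 1:5,
  employee_name = c("Shubham", "Arpita", "Nishka", "Gunjan", "Sumit"),
  sal = c(623.3, 515.2, 611.0, 729.0, 843.25),
  starting_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15",
                            "2014-05-11", "2015-03-27")),
  stringsAsFactors = FALSE
)

# Drop rows by a condition: keep only the employees with sal >= 600
emp.data <- emp.data[emp.data$sal >= 600, ]

# Drop several columns at once by assigning NULL to a vector of names
emp.data[c("employee_id", "starting_date")] <- NULL

print(emp.data)
```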

Summary of data in Data Frames


In some cases, we need a statistical summary of the data in a data frame. R provides
the summary() function for this purpose. It takes the data frame as a parameter and
returns summary statistics for each column: the minimum, quartiles, mean, and
maximum for numeric and date columns, and the length, class, and mode for
character columns. Let's see an example to understand how this function is used in R:
Example

# Creating the data frame.
emp.data <- data.frame(
  employee_id = 1:5,
  employee_name = c("Shubham","Arpita","Nishka","Gunjan","Sumit"),
  sal = c(623.3,515.2,611.0,729.0,843.25),
  starting_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11",
                            "2015-03-27")),
  stringsAsFactors = FALSE
)
print(emp.data)

# Printing the summary
print(summary(emp.data))

Output

  employee_id employee_name    sal starting_date
1           1       Shubham 623.30    2012-01-01
2           2        Arpita 515.20    2013-09-23
3           3        Nishka 611.00    2014-11-15
4           4        Gunjan 729.00    2014-05-11
5           5         Sumit 843.25    2015-03-27

  employee_id employee_name           sal         starting_date
 Min.   :1    Length:5           Min.   :515.2   Min.   :2012-01-01
 1st Qu.:2    Class :character   1st Qu.:611.0   1st Qu.:2013-09-23
 Median :3    Mode  :character   Median :623.3   Median :2014-05-11
 Mean   :3                       Mean   :664.4   Mean   :2014-01-14
 3rd Qu.:4                       3rd Qu.:729.0   3rd Qu.:2014-11-15
 Max.   :5                       Max.   :843.2   Max.   :2015-03-27
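summary() also works on a single column, and str() gives a compact overview of every column's type. A minimal sketch on the same employee data; sal_summary is just an illustrative variable name:

```r
emp.data <- data.frame(
  employee_id = 1:5,
  employee_name = c("Shubham", "Arpita", "Nishka", "Gunjan", "Sumit"),
  sal = c(623.3, 515.2, 611.0, 729.0, 843.25),
  starting_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15",
                            "2014-05-11", "2015-03-27")),
  stringsAsFactors = FALSE
)

# Summary of one numeric column: a named vector (Min., 1st Qu., Median, Mean, ...)
sal_summary <- summary(emp.data$sal)
print(sal_summary)

# Individual statistics can be read by name
print(sal_summary[["Median"]])

# str() lists each column with its type and first few values
str(emp.data)
```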

R factors
A factor is a data structure used for fields that take only a predefined, finite
number of values, i.e., categorical variables. Factors are data objects that
categorize data and store it as levels. They can store both integer and string values,
and are useful for columns that have a limited number of unique values.
Factors have labels that are associated with the unique integers stored in them. The
predefined set of values is known as the levels, and by default R sorts the levels in
alphabetical order.

Arguments of the factor() function
The factor() function takes the following arguments:
1. x
The input vector which is to be transformed into a factor.
2. levels
An optional vector of the unique values that x can take. By default, these are the
sorted unique values of x.
3. labels
An optional character vector of labels for the levels, in the same order as levels.
4. exclude
The values to be excluded from the levels.
5. ordered
A logical value that determines whether the levels are ordered.
6. nmax
An upper bound on the number of levels.
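A minimal sketch of the levels, labels, ordered, and exclude arguments described above; the sizes vector and its labels are made-up example values:

```r
sizes <- c("small", "large", "medium", "small", "large")

# levels fixes the allowed values and their order, labels renames them,
# and ordered = TRUE marks the levels as ordered (S < M < L)
size_factor <- factor(sizes,
                      levels = c("small", "medium", "large"),
                      labels = c("S", "M", "L"),
                      ordered = TRUE)
print(size_factor)

# exclude removes a value from the levels; its elements become NA
no_small <- factor(sizes, exclude = "small")
print(no_small)
```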

How to create a factor?


In R, creating a factor is simple. A factor is created in two steps:

1. First, we create a vector.
2. Next, we convert the vector into a factor.

R provides the factor() function to convert a vector into a factor. The syntax of the
factor() function is:

factor_data <- factor(vector)

Let's see an example to understand how the factor() function is used.

Example

# Creating a vector as input.
data <- c("Shubham","Nishka","Arpita","Nishka","Shubham","Sumit",
          "Nishka","Shubham","Sumit","Arpita","Sumit")

print(data)
print(is.factor(data))

# Applying the factor function.
factor_data <- factor(data)

print(factor_data)
print(is.factor(factor_data))

Output

 [1] "Shubham" "Nishka"  "Arpita"  "Nishka"  "Shubham" "Sumit"   "Nishka"
 [8] "Shubham" "Sumit"   "Arpita"  "Sumit"
[1] FALSE
 [1] Shubham Nishka  Arpita  Nishka  Shubham Sumit   Nishka  Shubham Sumit
[10] Arpita  Sumit
Levels: Arpita Nishka Shubham Sumit
[1] TRUE
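To see the "labels associated with unique integers" mentioned earlier, a minimal sketch using a shortened version of the same names: levels() and nlevels() describe the levels, as.integer() exposes the integer code stored for each element, and table() counts the elements per level.

```r
data <- c("Shubham", "Nishka", "Arpita", "Nishka", "Shubham")
factor_data <- factor(data)

# Levels are sorted alphabetically by default
print(levels(factor_data))   # "Arpita" "Nishka" "Shubham"
print(nlevels(factor_data))  # 3

# Each element is stored as an integer code that points into the levels
codes <- as.integer(factor_data)
print(codes)                 # 3 2 1 2 3

# Number of elements in each level
print(table(factor_data))
```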

Accessing components of factor


Like vectors, we can access the components of factors, and the process is very
similar. We can access elements with the indexing method or with logical vectors.
Let's see an example that shows the different ways of accessing the components.

Example

# Creating a vector as input.
data <- c("Shubham","Nishka","Arpita","Nishka","Shubham","Sumit",
          "Nishka","Shubham","Sumit","Arpita","Sumit")

# Applying the factor function.
factor_data <- factor(data)

# Printing all elements of the factor
print(factor_data)

# Accessing the 4th element of the factor
print(factor_data[4])

# Accessing the 5th and 7th elements
print(factor_data[c(5,7)])

# Accessing all elements except the 4th one
print(factor_data[-4])

# Accessing elements using a logical vector
print(factor_data[c(TRUE,FALSE,FALSE,FALSE,TRUE,TRUE,TRUE,FALSE,FALSE,FALSE,TRUE)])

Output

 [1] Shubham Nishka  Arpita  Nishka  Shubham Sumit   Nishka  Shubham Sumit
[10] Arpita  Sumit
Levels: Arpita Nishka Shubham Sumit

[1] Nishka
Levels: Arpita Nishka Shubham Sumit

[1] Shubham Nishka
Levels: Arpita Nishka Shubham Sumit

 [1] Shubham Nishka  Arpita  Shubham Sumit   Nishka  Shubham Sumit   Arpita
[10] Sumit
Levels: Arpita Nishka Shubham Sumit

[1] Shubham Shubham Sumit   Nishka  Sumit
Levels: Arpita Nishka Shubham Sumit

Modification of factor
Like data frames, factors can be modified. We can modify the value of a factor
element by simply re-assigning it. However, we cannot assign a value outside the
factor's predefined levels: if the value is not already a level, R inserts NA instead.
To add such a value, we first have to create a level for it, and then we can add it to
our factor.

Let's see an example to understand how the modification is done in factors.

Example

# Creating a vector as input.
data <- c("Shubham","Nishka","Arpita","Nishka","Shubham")

# Applying the factor function.
factor_data <- factor(data)

# Printing all elements of the factor
print(factor_data)

# Changing the 4th element of the factor to "Arpita" (an existing level)
factor_data[4] <- "Arpita"
print(factor_data)

# Changing the 4th element of the factor to "Gunjan"
factor_data[4] <- "Gunjan" # cannot assign values outside the levels: NA is inserted
print(factor_data)

# Adding the value to the levels
levels(factor_data) <- c(levels(factor_data), "Gunjan") # adding a new level
factor_data[4] <- "Gunjan"
print(factor_data)
21. print(factor_data)

Output

[1] Shubham Nishka  Arpita  Nishka  Shubham
Levels: Arpita Nishka Shubham
[1] Shubham Nishka  Arpita  Arpita  Shubham
Levels: Arpita Nishka Shubham
Warning message:
In `[<-.factor`(`*tmp*`, 4, value = "Gunjan") :
  invalid factor level, NA generated
[1] Shubham Nishka  Arpita  <NA>    Shubham
Levels: Arpita Nishka Shubham
[1] Shubham Nishka  Arpita  Gunjan  Shubham
Levels: Arpita Nishka Shubham Gunjan
Factor in Data Frame
When we create a data frame with a column of text data, R can treat this text column
as categorical data and store it as a factor. In R versions before 4.0.0 this happened
by default; since R 4.0.0 the default is stringsAsFactors = FALSE, so the example
passes stringsAsFactors = TRUE to get the same result in every version.

Example

# Creating the vectors for the data frame.
height <- c(132,162,152,166,139,147,122)
weight <- c(40,49,48,40,67,52,53)
gender <- c("male","male","female","female","male","female","male")

# Creating the data frame; stringsAsFactors = TRUE turns the text column into a factor
input_data <- data.frame(height, weight, gender, stringsAsFactors = TRUE)
print(input_data)

# Testing if the gender column is a factor.
print(is.factor(input_data$gender))

# Printing the gender column to see the levels.
print(input_data$gender)

Output

  height weight gender
1    132     40   male
2    162     49   male
3    152     48 female
4    166     40 female
5    139     67   male
6    147     52 female
7    122     53   male
[1] TRUE
[1] male   male   female female male   female male
Levels: female male
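Whether a text column becomes a factor automatically depends on the R version (stringsAsFactors defaulted to TRUE before R 4.0.0 and to FALSE since). A minimal sketch of the version-independent approach, converting the column explicitly with factor(); df_chr is an illustrative name:

```r
height <- c(132, 162, 152, 166, 139, 147, 122)
gender <- c("male", "male", "female", "female", "male", "female", "male")

# With the R >= 4.0.0 default, the gender column stays character here
df_chr <- data.frame(height, gender)
print(class(df_chr$gender))

# Converting the column explicitly behaves the same in every R version
df_chr$gender <- factor(df_chr$gender)
print(levels(df_chr$gender))

# Counts per category
print(table(df_chr$gender))
```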

Changing order of the levels


In R, we can change the order of the levels in a factor by calling the factor()
function again with the levels argument set to the required order.

Example
data <- c("Nishka","Gunjan","Shubham","Arpita","Arpita","Sumit","Gunjan","Shubham")
# Creating the factors
factor_data <- factor(data)
print(factor_data)

# Apply the factor function with the required order of the levels.
new_order_factor <- factor(factor_data, levels = c("Gunjan","Nishka","Arpita","Shubham","Sumit"))
print(new_order_factor)

Output

[1] Nishka  Gunjan  Shubham Arpita  Arpita  Sumit   Gunjan  Shubham
Levels: Arpita Gunjan Nishka Shubham Sumit
[1] Nishka  Gunjan  Shubham Arpita  Arpita  Sumit   Gunjan  Shubham
Levels: Gunjan Nishka Arpita Shubham Sumit
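Reordering levels changes how a factor prints and sorts, but comparisons such as > only work when the factor is ordered. A minimal sketch with made-up size values:

```r
sizes <- c("medium", "small", "large", "small")

# ordered = TRUE makes the level order meaningful: small < medium < large
size_ord <- factor(sizes, levels = c("small", "medium", "large"), ordered = TRUE)
print(size_ord)

# Relational operators and max() use the level order
print(size_ord > "small")
print(max(size_ord))
```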

Generating Factor Levels


R provides the gl() function to generate factor levels. This function takes the
arguments n, k, and labels. Here, n and k are integers that indicate how many levels
we want and how many times each level is repeated.

The syntax of the gl() function is as follows:

gl(n, k, labels)

o n indicates the number of levels.
o k indicates the number of replications.
o labels is a vector of labels for the resulting factor levels.

Example

gen_factor <- gl(3, 5, labels = c("BCA","MCA","B.Tech"))
gen_factor

Output

 [1] BCA    BCA    BCA    BCA    BCA    MCA    MCA    MCA    MCA    MCA
[11] B.Tech B.Tech B.Tech B.Tech B.Tech
Levels: BCA MCA B.Tech
Data Reshaping in R
In R, data reshaping is about changing how the data is organized into rows and
columns. Most data processing in R takes a data frame as input. Extracting data from
the rows and columns of a data frame is easy, but sometimes we need the data
frame in a format different from the one in which we received it. R provides
functions to merge, split, and change rows to columns (and vice versa) in a data
frame.

Transpose a Matrix
R provides the t() function to calculate the transpose of a matrix or a data frame. The
t() function takes a matrix or data frame as input and returns its transpose (for a
data frame, the result is a matrix). The syntax of the t() function is as follows:

t(matrix_or_data_frame)

Let's see an example to understand how this function is used.

Example


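a <- matrix(4:12, nrow = 3, byrow = TRUE)
a
# cat() interprets "\n" as a newline; print() would show it literally
cat("Matrix after transpose\n")
b <- t(a)
b

Output:

     [,1] [,2] [,3]
[1,]    4    5    6
[2,]    7    8    9
[3,]   10   11   12
Matrix after transpose
     [,1] [,2] [,3]
[1,]    4    7   10
[2,]    5    8   11
[3,]    6    9   12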
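Because t() first converts a data frame to a matrix, the result is always a matrix, and a data frame with mixed column types becomes a character matrix. A minimal sketch with small made-up data frames:

```r
df <- data.frame(id = 1:3, name = c("a", "b", "c"), stringsAsFactors = FALSE)

# Mixed column types: the transposed result is a character matrix
td <- t(df)
print(td)
print(is.matrix(td))
print(typeof(td))

# An all-numeric data frame transposes to a numeric matrix
tn <- t(data.frame(x = 1:2, y = 3:4))
print(tn)
```

If a data frame is needed afterwards, the result can be wrapped in as.data.frame().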

Joining rows and columns in Data Frame


R allows us to join multiple vectors column-wise with the cbind() function. When all
the inputs are plain vectors, cbind() returns a matrix, so we convert the result with
as.data.frame() when we need a data frame. R also provides the rbind() function,
which combines data frames row-wise. In some situations, we need to combine data
frames to access information that depends on both of them. The syntax of the
cbind() and rbind() functions is as follows:

cbind(vector1, vector2, ..., vectorN)
rbind(dataframe1, dataframe2, ..., dataframeN)

Let's see an example to understand how the cbind() and rbind() functions are used.

Example

# Creating vector objects
Name <- c("Shubham Rastogi","Nishka Jain","Gunjan Garg","Sumit Chaudhary")
Address <- c("Moradabad","Etah","Sambhal","Khurja")
Marks <- c(255,355,455,655)

# Combining the vectors column-wise; cbind() returns a character matrix,
# so we convert it into a data frame
info <- as.data.frame(cbind(Name,Address,Marks), stringsAsFactors = FALSE)

# Printing the data frame
print(info)

# Creating another data frame with the same columns
new.stuinfo <- data.frame(
  Name = c("Deepmala","Arun"),
  Address = c("Khurja","Moradabad"),
  Marks = c("755","855"),
  stringsAsFactors = FALSE
)

# Printing a header.
cat("# # # The Second data frame\n")

# Printing the data frame.
print(new.stuinfo)

# Combining rows from both the data frames.
all.info <- rbind(info,new.stuinfo)

# Printing a header.
cat("# # # The combined data frame\n")

# Printing the result.
print(all.info)

Output:
Merging Data Frame
R provides the merge() function to merge two data frames. By default, merge() joins
on the columns whose names appear in both data frames; the by argument, or by.x
and by.y, names the key columns explicitly, which also works when their names differ.
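A minimal sketch of merge() on two small, made-up data frames whose key columns have different names (id and student_id), showing an inner join and a left join with all.x = TRUE:

```r
students <- data.frame(id = c(1, 2, 3),
                       name = c("Shubham", "Nishka", "Gunjan"),
                       stringsAsFactors = FALSE)
marks <- data.frame(student_id = c(1, 3, 4), marks = c(255, 455, 655))

# Inner join: only ids present in both data frames are kept
inner <- merge(students, marks, by.x = "id", by.y = "student_id")
print(inner)

# Left join: all.x = TRUE keeps every student; missing marks become NA
left <- merge(students, marks, by.x = "id", by.y = "student_id", all.x = TRUE)
print(left)
```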

Let's take an example using the dataset about diabetes in Pima Indian women, which is
available in the MASS package. We will merge two datasets based on the values of
blood pressure ("bp") and body mass index ("bmi"). With these two columns as keys,
the records where the values of both variables match in the two datasets are
combined to form a single data frame.

Example

library(MASS)
merging_pima <- merge(x = Pima.te, y = Pima.tr,
                      by.x = c("bp", "bmi"),
                      by.y = c("bp", "bmi")
)
print(merging_pima)
nrow(merging_pima)

Output:
Melting and Casting
In R, a useful technique is changing the shape of the data in multiple steps to get
the desired shape. For this purpose, the reshape2 package provides the melt() and
dcast() functions (the older reshape package provided cast()). To understand the
process, consider the dataset called ships, which is available in the MASS package.

Example

library(MASS)
print(ships)

Output:
Melt the Data
Now we will organize the above data by melting it. Melting means converting
columns into multiple rows. We will convert all the columns of the above dataset
except type and year into rows.
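Before applying it to ships, a minimal sketch of the melt/dcast round trip on a tiny made-up table, assuming the reshape2 package is installed: melt() turns every non-id column into (variable, value) rows, and dcast() spreads them back into columns.

```r
library(reshape2)

# A small wide table: one row per id, one column per measurement
wide <- data.frame(id = c(1, 2), height = c(150, 160), weight = c(50, 60))

# melt(): each non-id column becomes rows of (variable, value)
long <- melt(wide, id.vars = "id")
print(long)

# dcast(): spread the variable column back out into separate columns
back <- dcast(long, id ~ variable)
print(back)
```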

Example

library(MASS)
library(reshape2)
molten_ships <- melt(ships, id = c("type","year"))
print(molten_ships)

Output:
Casting of Molten Data
After melting the data, we can cast it into a new form in which the values are
aggregated for each type of ship in each year. For this purpose, the reshape2
package provides the dcast() function, which returns a data frame.

Let's start casting our molten data.

Example

library(MASS)
library(reshape2)
# Melting the data
molten.ships <- melt(ships, id = c("type","year"))
print("Molten Data")
print(molten.ships)
# Casting the data: sum of each variable for every type and year
recasted.ship <- dcast(molten.ships, type+year~variable, sum)
print("Cast Data")
print(recasted.ship)
Output:

What is Object-Oriented Programming in R?
Object-Oriented Programming (OOP) is one of the most popular programming
paradigms. With OOP concepts, we can construct modular pieces of code that are
used as building blocks for large systems. R is primarily a functional language, but it
also supports programming in an object-oriented style, which is a great tool for
managing the complexity of larger programs.
In R, S3 and S4 are the two most important object-oriented systems.

S3

S3 lets us overload functions: we call a function by the same (generic) name, and the
method that actually runs depends on the class of the first argument.
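A minimal sketch of this kind of overloading; the generic describe() and its methods are hypothetical names chosen for illustration. UseMethod() picks the method from the class of the first argument and falls back to describe.default() when no specific method exists:

```r
# Generic function: UseMethod() dispatches on the class of the first argument
describe <- function(x, ...) {
  UseMethod("describe")
}

# Method used when x is numeric
describe.numeric <- function(x, ...) {
  paste("numeric vector of length", length(x))
}

# Method used when x is character
describe.character <- function(x, ...) {
  paste("character vector, first element:", x[1])
}

# Fallback for any other class
describe.default <- function(x, ...) {
  "some other object"
}

print(describe(c(1.5, 2.5)))
print(describe("hello"))
print(describe(TRUE))
```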

S4

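S4 is a more formal system than S3: classes have formal definitions, and methods can
be selected based on more than one argument. The trade-off is that S4 code is more
verbose and can be harder to debug. R also provides Reference Classes, which are
built on top of S4.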

Objects and Classes in R


In R, everything is an object, so R programmers work with OOP concepts whenever
they write code. An object is a data structure with attributes and methods that act
upon those attributes.

In R, a class is the outline or design for an object. Classes encapsulate data
members along with functions. In R, the two most important class systems are S3
and S4, which play an important role in OOP.

Let's discuss both the classes one by one with their examples for better
understanding.

1) S3 Class
The S3 class system implements generic-function OO: a generic function decides
which method to call. S3 dispatches on the class of the first argument only. This
differs from languages such as Java, C++, and C#, which implement message-passing
OO, where methods belong to objects. S3 is easy to implement: it is very informal
and has no formal definition of classes.

S3 requires very little knowledge from the programmer.

Creating an S3 class

In R, we typically define a function that creates an object of the class and returns it:
a list is made with the relevant members, its class attribute is set, and the list is
returned. The basic syntax to create an S3 object is:

variable_name <- list(member1, member2, member3, ..., memberN)
class(variable_name) <- "class_name"

Example

s <- list(name = "Ram", age = 29, GPA = 4.0)
class(s) <- "Faculty"
s

Output

$name
[1] "Ram"

$age
[1] 29

$GPA
[1] 4

attr(,"class")
[1] "Faculty"

The generic function print() is defined in base R. Typing its name at the console
shows its definition:

print

When we run the above code, it shows the following definition (followed by bytecode
and environment lines, which may vary):

function (x, ...)
UseMethod("print")

Like print, we will make a generic function GPA that displays the GPA member of our
object. We create the generic function GPA in the following way:

GPA <- function(obj1) {
  UseMethod("GPA")
}

Once our generic function GPA is created, we implement a default method for it:

GPA.default <- function(obj) {
  cat("We are entering the default method\n")
}

After that, we define a method of our GPA function for the Faculty class. S3 class
names are case-sensitive, so the method name must match the class "Faculty"
exactly:

GPA.Faculty <- function(obj1) {
  cat("Final GPA is ", obj1$GPA, "\n")
}

Finally, we call the generic function GPA on our object:

GPA(s)

Output

Inheritance in S3

Inheritance means passing the features of one class on to another class. In the S3
system, inheritance is achieved by setting the class attribute to a character vector of
class names, with the child class first.

For inheritance, we first create a constructor function that creates a new object of
class Faculty in the following way:

faculty <- function(n, a, g) {
  value <- list(name = n, age = a, GPA = g)
  attr(value, "class") <- "Faculty"
  value
}

After that, we define a method of the generic function print() for the Faculty class:

print.Faculty <- function(obj1, ...) {
  cat(obj1$name, "\n")
  cat(obj1$age, "years old\n")
  cat("GPA:", obj1$GPA, "\n")
  invisible(obj1)
}

Now, we will create an object of class InternationalFaculty, which inherits from the
Faculty class. This is done by assigning a character vector of class names, child first:

class(object) <- c("child", "parent")

So:

# create a list
fac <- list(name="Shubham", age=22, GPA=3.5, country="India")
# make it of the class InternationalFaculty, which is derived from the class Faculty
class(fac) <- c("InternationalFaculty","Faculty")
# print it out
fac

When we run the above code which we have discussed, it will generate the following
output:

We have not defined a method of the form print.InternationalFaculty(), so R moves to
the next class in the class vector and calls print.Faculty(). This method of class
Faculty was inherited.

So our next step is to define print.InternationalFaculty() in the following way:

print.InternationalFaculty <- function(obj1, ...) {
  cat(obj1$name, "is from", obj1$country, "\n")
  invisible(obj1)
}

This method now overrides the method inherited from class Faculty, so printing the
object calls it:

fac
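A child method does not have to replace the parent method completely: NextMethod() passes control to the next class in the class vector. A minimal sketch that redefines both print methods from this section for illustration, and uses inherits() to test class membership:

```r
# Parent-class method
print.Faculty <- function(x, ...) {
  cat(x$name, "\n")
  cat("GPA:", x$GPA, "\n")
  invisible(x)
}

# Child-class method: print the extra member, then call the parent method
print.InternationalFaculty <- function(x, ...) {
  cat("Country:", x$country, "\n")
  NextMethod()
}

fac <- list(name = "Shubham", age = 22, GPA = 3.5, country = "India")
class(fac) <- c("InternationalFaculty", "Faculty")

print(fac)
print(inherits(fac, "Faculty"))  # TRUE: "Faculty" is in the class vector
```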
getS3method() and getAnywhere() functions

Two commonly used functions for inspecting S3 methods in R are getS3method() and
getAnywhere().

S3 finds the appropriate method associated with a class, and it is often useful to see
how a method is implemented. Sometimes methods are not visible because they are
hidden in a package namespace. We use getS3method() or getAnywhere() to solve this
problem.

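getS3method(f, class) returns the method for the generic function f and the given
class. getAnywhere(x) searches all loaded namespaces, including non-exported
objects, for objects named x. For example, the internal function simpleLoess() of the
stats package can be found with:

getAnywhere("simpleLoess")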
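A minimal sketch of both functions; print.factor (a base R method) and simpleLoess (an internal function of the stats package) are used only as convenient examples of a registered method and a non-exported object:

```r
# getS3method() returns the method for a generic and a class
m <- getS3method("print", "factor")
print(is.function(m))

# optional = TRUE returns NULL instead of an error when no method exists
print(is.null(getS3method("print", "no_such_class", optional = TRUE)))

# getAnywhere() also finds non-exported objects; $where lists their locations
found <- getAnywhere("simpleLoess")
print(found$where)
```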

2) S4 Class
The S4 class system is similar to S3 but more formal. It differs from S3 in two ways.
First, S4 has formal class definitions, which describe the representation of each
class. Second, it has special helper functions for defining generics and methods. S4
also offers multiple dispatch: a generic function can choose its method based on the
classes of several arguments, not just the first.
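A minimal sketch of multiple dispatch; the classes Circle and Square and the generic combine() are hypothetical names. setGeneric() creates the generic, and each setMethod() call registers a method for a specific pair of argument classes:

```r
library(methods)  # S4 tools (attached by default in interactive R)

# Two illustrative classes
setClass("Circle", slots = c(r = "numeric"))
setClass("Square", slots = c(side = "numeric"))

# A generic function with two arguments to dispatch on
setGeneric("combine", function(a, b) standardGeneric("combine"))

# The method is chosen by the classes of BOTH arguments
setMethod("combine", signature("Circle", "Square"),
          function(a, b) "circle + square")
setMethod("combine", signature("Square", "Circle"),
          function(a, b) "square + circle")

c1 <- new("Circle", r = 1)
s1 <- new("Square", side = 2)
print(combine(c1, s1))
print(combine(s1, c1))
```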

Creating an S4 class

In R, we use the setClass() command to create an S4 class. In an S4 class, we can also
specify a function for verifying data consistency and default values for the slots. In
S4, member variables are called slots.

To create an S4 class, we have to define the class and its slots. The steps to create an
S4 class are as follows:

Step 1:
In the first step, we will create a new class called faculty with three slots: name, age,
and GPA.

setClass("faculty", slots=list(name="character", age="numeric", GPA="numeric"))

There are many other optional arguments of the setClass() function, which we can
explore with the ?setClass command.
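Two of those optional arguments are prototype (default slot values) and validity (the data-consistency check mentioned above). A minimal sketch; the rule that GPA must lie between 0 and 4 is an assumption made up for illustration:

```r
setClass("faculty",
         slots = list(name = "character", age = "numeric", GPA = "numeric"),
         # Default slot values used when new() does not supply them
         prototype = list(name = NA_character_, age = NA_real_, GPA = 0),
         # Validity check: return TRUE, or a string describing the problem
         validity = function(object) {
           if (length(object@GPA) == 1 && (object@GPA < 0 || object@GPA > 4)) {
             return("GPA must be between 0 and 4")
           }
           TRUE
         })

ok <- new("faculty", name = "Shubham", age = 22, GPA = 3.5)
print(ok@GPA)

# An invalid value makes new() stop with the validity message
res <- tryCatch(new("faculty", name = "Arun", age = 23, GPA = 5),
                error = function(e) conditionMessage(e))
print(res)
```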

Step 2:

In the next step, we will create an object of the S4 class. R provides the new()
function for this. We pass the class name and the values for the slots to new() in the
following way:

setClass("faculty", slots=list(name="character", age="numeric", GPA="numeric"))
# creating an object using new()
# providing the class name and value for slots
s <- new("faculty", name="Shubham", age=22, GPA=3.5)
s

It will generate the following output:

An object of class "faculty"
Slot "name":
[1] "Shubham"

Slot "age":
[1] 22

Slot "GPA":
[1] 3.5

Creating S4 objects using a generator function

The setClass() function returns a generator function. This generator function helps in
creating new objects and acts as a constructor.

A <- setClass("faculty", slots=list(name="character", age="numeric", GPA="numeric"))
A

Printing A displays a short description of the generator for the "faculty" class.


Now we can use the above constructor function to create new objects. The
constructor in turn calls new() to create objects; it is just a wrapper around new().
Let's see an example to understand how an S4 object is created with the help of a
generator function.

Example

faculty <- setClass("faculty", slots=list(name="character", age="numeric", GPA="numeric"))
# creating an object using the generator function
# providing the values for the slots
faculty(name="Shubham", age=22, GPA=3.5)

Output

Inheritance in S4 class

Like the S3 system, S4 supports inheritance. The derived class inherits both the slots
and the methods of the parent class. The steps to perform inheritance in S4 are as
follows:

Step 1:

In the first step, we define the class with appropriate slots in the following way:

setClass("faculty",
  slots=list(name="character", age="numeric", GPA="numeric")
)

Step 2:

After defining the class, our next step is to define a method of the show() generic
function for the class. This is done in the following manner:

setMethod("show",
  "faculty",
  function(object) {
    cat(object@name, "\n")
    cat(object@age, "years old\n")
    cat("GPA:", object@GPA, "\n")
  }
)

Step 3:

In the next step, we define the derived class with the contains argument. The derived
class is defined in the following way:

setClass("Internationalfaculty",
  slots=list(country="character"),
  contains="faculty"
)

In our derived class, we have defined only one slot, country. The other slots are
inherited from the parent class.

s <- new("Internationalfaculty", name="John", age=21, GPA=3.5, country="India")
show(s)

When we call show(s), the method defined for class faculty is called. As in the S3
system, we can also define methods for the derived class.
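A minimal sketch of such a derived-class method, reusing the classes from the steps above: the method prints the extra country slot and then calls callNextMethod(), which runs the parent (faculty) method with the same arguments.

```r
setClass("faculty", slots = list(name = "character", age = "numeric", GPA = "numeric"))
setMethod("show", "faculty", function(object) {
  cat(object@name, "\n")
  cat(object@age, "years old\n")
  cat("GPA:", object@GPA, "\n")
})
setClass("Internationalfaculty", slots = list(country = "character"),
         contains = "faculty")

# Method for the derived class: show the extra slot, then reuse the parent method
setMethod("show", "Internationalfaculty", function(object) {
  cat("Country:", object@country, "\n")
  callNextMethod()
})

s <- new("Internationalfaculty", name = "John", age = 21, GPA = 3.5, country = "India")
show(s)
print(is(s, "faculty"))  # TRUE: the derived class is also a faculty
```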

What is R Debug?
In computer programming, debugging is a multi-step process that involves identifying
a problem, isolating the source of the problem, and then fixing the problem or
determining a way to work around it. The final step of debugging is to test the fix or
workaround and make sure that it works.

A syntactically correct program may still give incorrect results because of logical
errors, which are known as "bugs." If such errors occur, we need to find out why and
where they occurred so that we can fix them. The procedure of identifying and fixing
bugs is called "debugging."

Fundamental principles of Debugging


R programmers often find that they spend more time debugging a program than
writing it, which makes debugging skills very valuable. The following principles of
debugging help programmers spend more of their time writing code and less of it
debugging:
1. The essence of debugging
Fixing a bug is a process of confirmation: we gradually confirm that the many things
we believe to be true about the code actually are true. When we find that one such
assumption is false, we have found a clue to the bug's location.


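For example, consider the following fragment (b, z, and q are assumed to be defined
earlier):

a <- b^2 + 3*c(z, 2)
x <- 28
if (x + q > 0) {
  t <- 1
} else {
  u <- -10
}

Debugging it means confirming our beliefs one by one: that a has the value we
expect, that x really is 28, and that x + q > 0 is TRUE exactly when we expect t,
rather than u, to be set.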

2. Start Small
Stick to small, simple test cases, at least at the beginning of the debugging process.
Working with big data objects can make it harder to think about the problem. Of
course, we should eventually test our code on large, complex cases, but start small.

3. Debug in a Modular Fashion
Most professional software developers agree that code should be written in a modular
manner. Top-level code should be short and consist mostly of function calls; those
functions should also be short and should call other functions when necessary. This
makes the code easier to write and easier for others to understand when the time
comes to extend it.

We should also debug in a top-down manner. Suppose we are debugging function f()
with debug(f), and it contains the following line.

For example

y <- g(x, 8)

For now, do not call debug(g). Execute the line and check whether g() returns the
value we expect. If it does, we have avoided the time-consuming process of
single-stepping through g(). If g() returns an incorrect value, now is the time to call
debug(g).

4. Antibugging
If there is a section of code in which a variable z should be positive, we can insert the
following line to catch violations as early as possible:

stopifnot(z > 0)

If there is a bug in the code and, for example, z equals -3, stopifnot() stops execution
right there with an error message such as:

Error: z > 0 is not TRUE
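A minimal sketch of antibugging inside a function; safe_log() is a hypothetical helper that requires positive input. The error is caught with tryCatch() here only so that the message can be inspected:

```r
# Hypothetical function whose input must be positive
safe_log <- function(z) {
  stopifnot(is.numeric(z), all(z > 0))  # fail fast on bad input
  log(z)
}

print(safe_log(c(1, 10)))

# A negative input stops immediately with an informative error
msg <- tryCatch(safe_log(-3), error = function(e) conditionMessage(e))
print(msg)
```

The exact wording of the message depends on the R version, but it always names the condition that failed.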


Functions
In R, many functions are available for debugging. These functions play an important
role in removing bugs from our code. R provides the following debugging functions:

1) traceback()
If our code has already crashed and we want to know where the offending line is, we
can try traceback(). When an R function fails, an error is printed on the screen.
Immediately after the error, we can call traceback() to see in which function the error
occurred. traceback() prints the sequence of function calls that led to the error, with
the most recent call first.

Let's see an example to understand how we can use the traceback() function

Example

f <- function(a){
  x <- a - ql(a)
  x
}
ql <- function(b){
  r <- b * mn(b)
  r
}
mn <- function(p){
  r <- log(p)
  if (r < 10)
    r^2
  else
    r^3
}
f(-2)

When we run the above code, log(-2) returns NaN (with a warning), so the test
if (r < 10) inside mn() stops with the error "missing value where TRUE/FALSE needed".

After getting this error, we call the traceback() function, which lists the calls that led
to it (mn(b), ql(a), and f(-2), most recent first):

traceback()

2) debug()
In R, the debug() function lets the user step through the execution of a function one
line at a time. At any point, we can print the values of variables or draw a graph of
the results within the function. While debugging, we can type "c" to continue to the
end of the current block of code. traceback() tells us in which function an error
occurred, but not which line caused it. To find the line that causes the error, we step
through the function using debug().

Let's see an example to understand how the debug function is used in R.

Example

func <- function(a, value){
  subt <- value - a
  squar <- subt^2
  collect <- sum(squar)
  collect
}
set.seed(100)
value <- rnorm(100)
func(1, value)   # normal call
debug(func)      # flag func for debugging
func(1, value)   # now steps through func line by line

Output

3) browser()
The browser() function halts the execution of a function until the user allows it to
continue. This is useful if we don't want to step through the complete code line by
line, but want to stop at a certain point to check what's going on.

Inserting a call to browser() in a function pauses the function's execution at the point
where browser() is called. It is similar to using debug(), except that we control where
execution pauses.

Let's see an example to understand how the browser() function is used in R.

Example
a <- function(b) {
  browser() ## a break point inserted here
  c <- log(b)
  if (c < 10)
    c^2
  else
    c^3
}
a(-1)

Output

4) trace()
The trace() function lets the user insert bits of code into a function. The syntax of
trace() is a bit awkward for first-time users, so it may be better to start with debug().

Let's see an example to understand how the trace() function is used in R.

Example
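f <- function(a){
  x <- a - ql(a)
  x
}
ql <- function(b){
  r <- b * mn(b)
  r
}
mn <- function(p){
  r <- log(p)
  if (r < 10)
    r^2
  else
    r^3
}
# List the steps of mn()'s body; step 3 is the if statement
as.list(body(mn))
# Insert a browser() call before step 3, triggered only when r is NaN
trace("mn", quote(if (is.nan(r)) {browser()}), at = 3, print = FALSE)
f(1)    # runs normally
f(-1)   # log(-1) is NaN, so the browser opens inside mn()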

Output
5) recover()
When we debug a function, recover() allows us to examine variables in higher-level
functions on the call stack.

recover() lists the active calls; by typing a number in the selection, we are taken to
that function on the call stack and placed in a browser environment.

recover() is usually used as an error handler, set with options(error = recover).

When a function then throws an error, execution stops at the point of failure, and we
can browse the function calls and examine their environments to find the source of
the problem.

Example

f <- function(a) {
  x <- a - ql(a)
  x
}
ql <- function(b) {
  r <- b * mn(b)
  r
}
mn <- function(p) {
  r <- log(p)
  if (r < 10)
    r^2
  else
    r^3
}
as.list(body(mn))
trace("mn", quote(if (is.nan(r)) { recover() }), at = 3, print = FALSE)
f(-1)  # recover() lets us pick a frame on the call stack (f, ql or mn)

Output
Debugging Installed Packages
An error may also originate in an installed R package. There are several ways to solve
the problem:

o Set options(error = recover) and then step through the code line by line using n.
o In more complex situations, we should work with a copy of the function code. In R,
entering the function's name (without parentheses) prints the function code, which
can be copied into a text editor. We can edit this copy, load it into the global
workspace, and then debug it.
o If the problem is still not solved, we have to download the package's source code.
The devtools package and its install() and load_all() functions can make this
procedure quicker.

Error Handling and Recovery


Exception or error handling is the process of responding to unusual events in code
that interrupt its normal flow. In general, the scope of an exception handler begins
with a try and ends with a catch. R provides the try() and tryCatch() functions for
this purpose.

The try() function is a wrapper around tryCatch() that prints the error and then
continues. On the other hand, tryCatch() gives us control over how the error is
handled and, optionally, lets the function continue.
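The difference can be sketched as follows. The helper safe_log() is a hypothetical example, not from the original text: it uses tryCatch() with separate warning and error handlers plus a finally clause, and then try() is shown for comparison.

```r
# Hypothetical helper: take a logarithm, recovering from warnings and errors.
safe_log <- function(x) {
  tryCatch(
    {
      if (!is.numeric(x)) stop("x must be numeric")
      log(x)
    },
    warning = function(w) {
      message("Warning caught: ", conditionMessage(w))
      NA_real_
    },
    error = function(e) {
      message("Error caught: ", conditionMessage(e))
      NA_real_
    },
    finally = message("safe_log() finished")   # runs in every case
  )
}

safe_log(100)   # returns log(100)
safe_log(-1)    # log(-1) warns "NaNs produced"; the handler returns NA
safe_log("a")   # stop() raises an error; the handler returns NA

# try() is the simpler wrapper: it returns an object of class "try-error".
res <- try(log("a"), silent = TRUE)
class(res)
```

Returning NA from both handlers is only one design choice; a handler could just as well log the condition and re-raise it with stop(e).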

R Data Visualization
In R, we can create visually appealing data visualizations by writing a few lines of
code. For this purpose, we use the diverse functionalities of R. Data visualization is
an efficient technique for gaining insight into data through a visual medium. With the
help of visualization techniques, a human can easily obtain information about hidden
patterns in data that might otherwise be overlooked.

By using data visualization techniques, we can work with large datasets and
efficiently obtain key insights about them.
R Visualization Packages
R provides a series of packages for data visualization. These packages are as follows:

1) plotly

The plotly package provides interactive, high-quality online graphs. This package
builds on the JavaScript library plotly.js.

2) ggplot2

R allows us to create graphics declaratively. R provides the ggplot2 package for this
purpose. This package is famous for its elegant, high-quality graphs, which set it
apart from other visualization packages.

3) tidyquant

The tidyquant package is used for carrying out quantitative financial analysis. It
extends the tidyverse with financial functionality and is used for importing,
analyzing, and visualizing financial data.

4) taucharts

taucharts is a data-focused charting library. It provides a declarative interface for
rapid mapping of data fields to visual properties.

5) ggiraph

It is a tool that allows us to create dynamic ggplot graphs. This package allows us to
add tooltips, JavaScript actions, and animations to the graphics.

6) geofacets

This package provides geofaceting functionality for 'ggplot2'. Geofaceting arranges a
sequence of plots for different geographical entities into a grid that preserves some
of the geographical orientation.

7) googleVis

googleVis provides an interface between R and the Google Charts tools. With the help
of this package, we can create web pages with interactive charts based on R data
frames.

8) RColorBrewer

This package provides color schemes for maps and other graphics, which are
designed by Cynthia Brewer.

9) dygraphs

The dygraphs package is an R interface to the dygraphs JavaScript charting library. It
provides rich features for charting time-series data in R.

10) shiny

R allows us to develop interactive and aesthetically pleasing web apps through the
shiny package. This package can be extended with HTML widgets, CSS, and JavaScript.

R Graphics
Graphics play an important role in bringing out the important features of the data.
Graphics are used to examine marginal distributions, relationships between variables,
and summaries of very large datasets. They are a very important complement to many
statistical and computational techniques.
Standard Graphics
R's standard graphics are available through the graphics package, which includes
several functions that produce statistical plots, such as:

o Scatterplots
o Pie charts
o Boxplots
o Barplots etc.

Each of these plots is typically produced by a single function call.
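As a minimal sketch, each plot type in the list can be drawn with one call to a base graphics function. A temporary PDF file is used as the output device so that the sketch runs without a screen; the mtcars columns chosen here are only illustrative.

```r
out_file <- tempfile(fileext = ".pdf")
pdf(out_file)                        # open a PDF file device (one page per plot)

plot(mtcars$wt, mtcars$mpg)          # scatterplot
pie(table(mtcars$cyl))               # pie chart of cars per cylinder count
boxplot(mpg ~ cyl, data = mtcars)    # boxplots of mpg per cylinder group
barplot(table(mtcars$gear))          # bar plot of cars per gear count

dev.off()                            # close the device, which finishes the file
```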

Graphics Devices
A graphics device is something where we can make a plot appear. It can be a window
on your computer (screen device), a PDF file (file device), a Scalable Vector Graphics
(SVG) file (file device), or a PNG or JPEG file (file device).

The following points are essential to understand:

o Plotting functions produce their output on the currently active graphics device.
o The screen is the default and most frequently used device.
o File devices, such as the PDF device and the JPEG device, write the plot to a file
instead.
o We just need to open the graphics device we want; R then takes care of producing
the type of output that device requires.
o To produce a plot on the screen or in an image file, the R code is exactly the same.
We only need to open the target output device first.
o Several devices can be open at the same time, but only one of them is active.
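The rules above can be seen with the grDevices functions dev.cur(), dev.list() and dev.set(). In this sketch two temporary PDF files stand in for any pair of devices; the file names are arbitrary.

```r
f1 <- tempfile(fileext = ".pdf")
f2 <- tempfile(fileext = ".pdf")

pdf(f1)                 # open the first file device; it becomes active
d1 <- dev.cur()
pdf(f2)                 # open a second device; now this one is active
d2 <- dev.cur()

dev.list()              # both devices are open at the same time
dev.set(d1)             # make the first device active again
plot(1:10)              # this plot is written to f1, not f2
active <- dev.cur()

dev.off(d2)             # close each device explicitly
dev.off(d1)
```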

The basics of the grammar of graphics


There are some key elements of a statistical graphic. These elements are the basics of
the grammar of graphics. Let's discuss each of the elements one by one to gain the
basic knowledge of graphics.

1) Data

Data is the most crucial element; it is what gets processed to generate the output.
2) Aesthetic Mappings

Aesthetic mappings are one of the most important elements of a statistical graphic.
They control the relation between graphics variables and data variables. For
example, in a scatter plot an aesthetic mapping can map the temperature variable of a
data set to the x variable, or map the species of a plant to the color of the dots.

3) Geometric Objects

Geometric objects determine how each observation is drawn using the aesthetic
mappings, for example as a point. A point object places each observation at the
position given by the two data variables mapped to the x and y variables of the plot.

4) Statistical Transformations

Statistical transformations compute statistical summaries of the data shown in the
plot. For example, a statistical transformation can approximate the data with a
regression line in x,y coordinates, or count the occurrences of certain values.

5) Scales

Scales map data values to values in the coordinate system of the graphics device.

6) Coordinate system

The coordinate system plays an important role in the plotting of the data. Common
coordinate systems include:

o Cartesian
o Polar

7) Faceting

Faceting is used to split the data into subgroups and draw sub-graphs for each
group.
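The seven elements above map fairly directly onto the ggplot2 package. The following is a minimal sketch, assuming ggplot2 is installed; the mtcars columns, the "Cylinders" legend title and the temporary PDF output file are illustrative choices only. Each comment names the grammar element that the line supplies.

```r
library(ggplot2)

p <- ggplot(mtcars,                                  # 1) data
            aes(x = wt, y = mpg,                     # 2) aesthetic mappings
                colour = factor(cyl))) +
  geom_point() +                                     # 3) geometric object
  stat_smooth(method = "lm", formula = y ~ x,        # 4) statistical transformation
              se = FALSE) +
  scale_colour_discrete(name = "Cylinders") +        # 5) scale
  coord_cartesian() +                                # 6) coordinate system (the default)
  facet_wrap(~ am)                                   # 7) faceting: one panel per value of am

out_file <- tempfile(fileext = ".pdf")
ggsave(out_file, plot = p, width = 8, height = 4)
```

Because each element is a separate layer or component, swapping coord_cartesian() for coord_polar() changes only the coordinate system and leaves the other elements untouched.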

Advantages of Data Visualization in R


1. Understanding

Visualizations make business data more attractive to look at, and information is
easier to understand through graphics and charts than through a written document
full of text and numbers. Thus, visualizations can reach a wider range of audiences.
They also promote the widespread use of business insights to make better decisions.

2. Efficiency

Visualization allows us to display a lot of information in a small space. Although the
decision-making process in business is inherently complex and multifunctional,
displaying evaluation findings in a graph can help companies organize a lot of
interrelated information in useful ways.

3. Location

Visualizations that use features such as geographic maps and GIS can be particularly
relevant to a wider business when location is an important factor. We can use maps
to show business insights from various locations, and also to show the seriousness of
issues, the reasons behind them, and the working groups addressing them.

Disadvantages of Data Visualization in R


1. Cost

Developing data visualization applications in R can cost a good amount of money.
Small companies in particular may not be able to spend many resources on them. To
generate reports, many companies employ professionals to create charts, which
increases costs. Small enterprises often operate in resource-limited settings, yet
receiving timely evaluation results can be of high importance to them.

2. Distraction

At times, data visualization apps create highly complex and fancy graphics-rich
reports and charts, which may entice users to focus more on form than on function.
If visual appeal comes first, the overall value of the graphic representation will be
minimal. In resource-limited settings, it is necessary to understand how resources
can best be used, and to avoid following graphics trends without a clear purpose.

R Pie Charts
R programming language has several libraries for creating charts and graphs. A pie
chart is a representation of values as slices of a circle with different colors.
Slices are labeled with a description, and the numbers corresponding to each slice
are also shown in the chart. However, pie charts are not recommended in the R
documentation, and their features are limited. The authors recommend a bar or dot
plot over a pie chart because people are able to judge length more accurately than
area.

Pie charts are created with the pie() function, which takes a vector of positive
numbers as input. Additional parameters are used to control labels, colors, titles,
etc.

The syntax of the pie() function is:

pie(x, labels, radius, main, col, clockwise)

Here,

1. x is a vector that contains the numeric values used in the pie chart.
2. labels is used to give a description to the slices.
3. radius describes the radius of the pie chart.
4. main describes the title of the chart.
5. col defines the color palette.
6. clockwise is a logical value that indicates whether the slices are drawn clockwise
or anti-clockwise.

Example

# Creating data for the graph.
x <- c(20, 65, 15, 50)
labels <- c("India", "America", "Shri Lanka", "Nepal")
# Giving the chart file a name (png() always writes PNG data).
png(file = "Country.png")
# Plotting the chart.
pie(x, labels)
# Saving the file.
dev.off()

Output:
Title and color
A pie chart has several more features that we can use by adding more parameters to
the pie() function. We can give our pie chart a title by passing the main parameter.
Apart from this, we can draw the chart with a rainbow color palette by passing the col
parameter.

Note: The length of the palette should be the same as the number of values in the
chart, so we pass length(x) to rainbow().

Let's see an example to understand how these methods work in creating an attractive
pie chart with title and color.

Example

# Creating data for the graph.
x <- c(20, 65, 15, 50)
labels <- c("India", "America", "Shri Lanka", "Nepal")
# Giving the chart file a name.
png(file = "title_color.png")
# Plotting the chart with a title and one rainbow color per slice.
pie(x, labels, main = "Country Pie chart", col = rainbow(length(x)))
# Saving the file.
dev.off()

Output:
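Two arguments from the pie() syntax, radius and clockwise, are not used in these examples. A minimal sketch with the same data, writing to a temporary PDF file instead of a PNG file:

```r
x <- c(20, 65, 15, 50)
labels <- c("India", "America", "Shri Lanka", "Nepal")

out_file <- tempfile(fileext = ".pdf")
pdf(out_file)
# radius = 0.7 draws a slightly smaller circle than the default 0.8;
# clockwise = TRUE draws the slices clockwise starting from 12 o'clock
# instead of counter-clockwise from 3 o'clock.
pie(x, labels = labels, radius = 0.7, main = "Country Pie chart",
    col = rainbow(length(x)), clockwise = TRUE)
dev.off()
```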

Slice Percentage & Chart Legend


There are two additional properties of the pie chart, i.e., slice percentage and chart
legend. We can show the data in the form of percentages, and we can add a legend to
a plot in R by using the legend() function. The syntax of the legend() function is:

legend(x, y = NULL, legend, fill, col, bg)
Here,

o x and y are the coordinates used to position the legend.
o legend is the text of the legend.
o fill is the color used for filling the boxes beside the legend text.
o col defines the color of the lines and points beside the legend text.
o bg is the background color for the legend box.

Example

# Creating data for the graph.
x <- c(20, 65, 15, 50)
labels <- c("India", "America", "Shri Lanka", "Nepal")
pie_percent <- round(100 * x / sum(x), 1)
# Giving the chart file a name.
png(file = "per_pie.png")
# Plotting the chart with percentages as slice labels.
pie(x, labels = pie_percent, main = "Country Pie Chart", col = rainbow(length(x)))
legend("topright", c("India", "America", "Shri Lanka", "Nepal"), cex = 0.8,
       fill = rainbow(length(x)))
# Saving the file.
dev.off()

Output:
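As a variation on the example above, the country name and its percentage can be combined into a single slice label, which makes the legend unnecessary. This sketch uses sprintf() so that every share gets exactly one decimal place, and writes to a temporary PDF file:

```r
x <- c(20, 65, 15, 50)
countries <- c("India", "America", "Shri Lanka", "Nepal")
# "%.1f%%" formats each share with one decimal place followed by a literal %.
slice_labels <- sprintf("%s (%.1f%%)", countries, 100 * x / sum(x))
print(slice_labels)

out_file <- tempfile(fileext = ".pdf")
pdf(out_file)
pie(x, labels = slice_labels, main = "Country Pie Chart", col = rainbow(length(x)))
dev.off()
```

The printed labels are "India (13.3%)", "America (43.3%)", "Shri Lanka (10.0%)" and "Nepal (33.3%)".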
3 Dimensional Pie Chart
In R, we can also create a three-dimensional pie chart. For this purpose, R provides
the plotrix package, whose pie3D() function is used to create an attractive 3D pie
chart. The parameters of the pie3D() function are largely the same as those of the
pie() function. Let's see an example to understand how a 3D pie chart is created with
the help of this function.

Example

# Getting the library.
library(plotrix)
# Creating data for the graph.
x <- c(20, 65, 15, 50, 45)
labels <- c("India", "America", "Shri Lanka", "Nepal", "Bhutan")
# Giving the chart file a name.
png(file = "3d_pie_chart1.png")
# Plotting the chart.
pie3D(x, labels = labels, explode = 0.1, main = "Country Pie Chart")
# Saving the file.
dev.off()

Output:

Example

# Getting the library.
library(plotrix)
# Creating data for the graph.
x <- c(20, 65, 15, 50, 45)
labels <- c("India", "America", "Shri Lanka", "Nepal", "Bhutan")
pie_percent <- round(100 * x / sum(x), 1)
# Giving the chart file a name.
png(file = "three_D_pie.png")
# Plotting the chart.
pie3D(x, labels = pie_percent, main = "Country Pie Chart", col = rainbow(length(x)))
legend("topright", c("India", "America", "Shri Lanka", "Nepal", "Bhutan"), cex = 0.8,
       fill = rainbow(length(x)))
# Saving the file.
dev.off()

Output:

R Bar Charts
A bar chart is a pictorial representation in which numerical values of variables are
represented by the length or height of lines or rectangles of equal width. A bar chart
is used for summarizing a set of categorical data. In a bar chart, the data is shown
through rectangular bars whose length is proportional to the value of the variable.

In R, we can create a bar chart to visualize the data in an efficient manner. For this
purpose, R provides the barplot() function, which has the following syntax:

barplot(H, xlab, ylab, main, names.arg, col)

S.No Parameter Description

1. H A vector or matrix which contains numeric values used in the bar chart.

2. xlab A label for the x-axis.

3. ylab A label for the y-axis.

4. main A title of the bar chart.

5. names.arg A vector of names that appear under each bar.

6. col It is used to give colors to the bars in the graph.

Example

# Creating the data for the bar chart
H <- c(12, 35, 54, 3, 41)
# Giving the chart file a name
png(file = "bar_chart.png")
# Plotting the bar chart
barplot(H)
# Saving the file
dev.off()

Output:
Labels, Title & Colors
Like pie charts, we can add more functionality to a bar chart by passing more
arguments to the barplot() function. We can add a title to our bar chart or add colors
to the bars by adding the main and col parameters, respectively. We can also add the
names.arg parameter, a vector with the same number of values as the input vector,
which describes the meaning of each bar.

Let's see an example to understand how labels, titles, and colors are added in our bar
chart.

Example

# Creating the data for the bar chart
H <- c(12, 35, 54, 3, 41)
M <- c("Feb", "Mar", "Apr", "May", "Jun")

# Giving the chart file a name
png(file = "bar_properties.png")

# Plotting the bar chart
barplot(H, names.arg = M, xlab = "Month", ylab = "Revenue", col = "green",
        main = "Revenue Bar chart", border = "red")
# Saving the file
dev.off()

Output:

Group Bar Chart & Stacked Bar Chart


We can create bar charts with groups of bars and stacks using matrices as input
values in each bar. One or more variables are represented as a matrix that is used to
construct group bar charts and stacked bar charts.
Let's see an example to understand how these charts are created.

Example

months <- c("Jan", "Feb", "Mar", "Apr", "May")
regions <- c("West", "North", "South")
# Creating the matrix of the values (one row per region, one column per month).
Values <- matrix(c(21, 32, 33, 14, 95, 46, 67, 78, 39, 11, 22, 23, 94, 15, 16),
                 nrow = 3, ncol = 5, byrow = TRUE)
# Giving the chart file a name
png(file = "stacked_chart.png")
# Creating the bar chart
barplot(Values, main = "Total Revenue", names.arg = months, xlab = "Month",
        ylab = "Revenue", col = c("cadetblue3", "deeppink2", "goldenrod1"))
# Adding the legend to the chart
legend("topleft", regions, cex = 1.3, fill = c("cadetblue3", "deeppink2", "goldenrod1"))

# Saving the file
dev.off()

Output:
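The example above stacks the three regions in each bar. As a sketch of the grouped alternative, passing beside = TRUE to barplot() draws each month's bars next to each other instead. The data are copied from the stacked example, and a temporary PDF file replaces the PNG file:

```r
months  <- c("Jan", "Feb", "Mar", "Apr", "May")
regions <- c("West", "North", "South")
Values  <- matrix(c(21, 32, 33, 14, 95, 46, 67, 78, 39, 11, 22, 23, 94, 15, 16),
                  nrow = 3, ncol = 5, byrow = TRUE)
colours <- c("cadetblue3", "deeppink2", "goldenrod1")

out_file <- tempfile(fileext = ".pdf")
pdf(out_file)
# With beside = TRUE, barplot() returns a matrix of bar midpoints
# (one row per region, one column per month).
mids <- barplot(Values, beside = TRUE, names.arg = months, col = colours,
                main = "Revenue by Region", xlab = "Month", ylab = "Revenue")
legend("topright", regions, fill = colours)
dev.off()
```

The returned midpoints are handy for placing text() labels above individual bars.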
R Boxplot
Boxplots are a measure of how well data is distributed across a data set. A boxplot
uses the three quartiles to divide the data set into four parts. The graph represents
the minimum and maximum (excluding outliers), the median, the first quartile, and the
third quartile of the data set. Boxplots are also useful for comparing the distribution
of data across data sets by drawing a boxplot for each of them.
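The numbers a boxplot draws can be inspected directly. In this sketch, fivenum() and boxplot.stats() from base R are applied to the mpg values of the 4-cylinder cars in mtcars; the choice of subset is illustrative.

```r
mpg_4cyl <- mtcars$mpg[mtcars$cyl == 4]

fivenum(mpg_4cyl)            # minimum, lower hinge, median, upper hinge, maximum
stats <- boxplot.stats(mpg_4cyl)
stats$stats                  # the five values the box and whiskers are drawn at
stats$out                    # points beyond the whiskers (drawn as outliers)
mean(mpg_4cyl)               # the mean is not part of a boxplot
```

For these 11 cars the five values are 21.4, 22.8, 26.0, 30.4 and 33.9, and stats$out is empty because no point lies more than 1.5 times the box length beyond the hinges.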

R provides the boxplot() function to create a boxplot. The syntax of the boxplot()
function is:

boxplot(x, data, notch, varwidth, names, main)

Here,
S.No Parameter Description

1. x It is a vector or a formula.

2. data It is the data frame.

3. notch It is a logical value set as true to draw a notch.

4. varwidth It is also a logical value; set it to TRUE to draw the width of each box proportional to the square root of the sample size.

5. names It is the group of labels that will be printed under each boxplot.

6. main It is used to give a title to the graph.

Let's see an example to understand how we can create a boxplot in R. In the example
below, we will use the "mtcars" dataset present in the R environment. We will use only
two of its columns, i.e., "mpg" and "cyl". The example creates a boxplot graph for the
relation between mpg and cyl, i.e., miles per gallon and number of cylinders,
respectively.

Example

# Giving a name to the chart file.
png(file = "boxplot.png")
# Plotting the chart.
boxplot(mpg ~ cyl, data = mtcars, xlab = "Quantity of Cylinders",
        ylab = "Miles Per Gallon", main = "R Boxplot Example")

# Saving the file.
dev.off()

Output:
Boxplot using notch
In R, we can draw a boxplot with notches. Notches help us find out whether the
medians of different data groups differ from each other. Let's see an example to
understand how a boxplot graph is created with a notch for each of the groups.

In the example below, we will use the same dataset, "mtcars".

Example

# Giving a name to our chart.
png(file = "boxplot_using_notch.png")
# Plotting the chart.
boxplot(mpg ~ cyl, data = mtcars,
        xlab = "Quantity of Cylinders",
        ylab = "Miles Per Gallon",
        main = "Boxplot Example",
        notch = TRUE,
        varwidth = TRUE,
        col = c("green", "yellow", "red"),
        names = c("High", "Medium", "Low")
)
# Saving the file.
dev.off()

Output:

Violin Plots
R provides an additional plotting scheme which is created with the combination of
a boxplot and a kernel density plot. The violin plots are created with the help of
vioplot() function present in the vioplot package.
Let's see an example to understand the creation of the violin plot.

Example

# Loading the vioplot package
library(vioplot)
# Giving a name to our chart.
png(file = "vioplot.png")
# Creating data for the vioplot function
x1 <- mtcars$mpg[mtcars$cyl == 4]
x2 <- mtcars$mpg[mtcars$cyl == 6]
x3 <- mtcars$mpg[mtcars$cyl == 8]
# Creating the violin plot
vioplot(x1, x2, x3, names = c("4 cyl", "6 cyl", "8 cyl"),
        col = "green")
# Setting the title
title("Violin plot example")
# Saving the file.
dev.off()

Output:
Bagplot- 2-Dimensional Boxplot Extension
The bagplot(x, y) function in the aplpack package provides a bivariate version of the
univariate boxplot. The bag contains 50% of all points. The bivariate median is
approximated. The fence separates inside points from outside points, and the outliers
are displayed.

Let's see an example to understand how we can create a two-dimensional boxplot
extension in R.

Example

# Loading the aplpack package
library(aplpack)
# Giving a name to our chart.
png(file = "bagplot.png")
# Creating the bagplot (columns are referenced directly instead of using attach())
bagplot(mtcars$wt, mtcars$mpg, xlab = "Car Weight", ylab = "Miles Per Gallon",
        main = "2D Boxplot Extension")
# Saving the file.
dev.off()

Output:

R Histogram
A histogram is a type of bar chart that shows how many values fall into each of a set
of value ranges. A histogram is used to show a distribution, whereas a bar chart is
used for comparing different entities. In a histogram, the height of each bar
represents the number of values present in the given range.

For creating a histogram, R provides hist() function, which takes a vector as an input
and uses more parameters to add more functionality. There is the following syntax of
hist() function:

hist(v, main, xlab, ylab, xlim, ylim, breaks, col, border)

Here,

S.No Parameter Description

1. v It is a vector that contains numeric values.

2. main It indicates the title of the chart.

3. col It is used to set the color of the bars.

4. border It is used to set the border color of each bar.

5. xlab It is used to describe the x-axis.

6. ylab It is used to describe the y-axis.

7. xlim It is used to specify the range of values on the x-axis.

8. ylim It is used to specify the range of values on the y-axis.

9. breaks It is used to specify the break points between bars (or the number of bars), which determines the width of each bar.

Let's see an example in which we create a simple histogram with the help of
parameters like v, main, col, etc.

Example

# Creating data for the graph.
v <- c(12, 24, 16, 38, 21, 13, 55, 17, 39, 10, 60)

# Giving a name to the chart file.
png(file = "histogram_chart.png")

# Creating the histogram.
hist(v, xlab = "Weight", ylab = "Frequency", col = "green", border = "red")

# Saving the file.
dev.off()

Output:

Let's see some more examples in which we use different parameters of the hist()
function to add more functionality or to create a more attractive chart.

Example: Use of xlim & ylim parameters


# Creating data for the graph.
v <- c(12, 24, 16, 38, 21, 13, 55, 17, 39, 10, 60)

# Giving a name to the chart file.
png(file = "histogram_chart_lim.png")

# Creating the histogram.
hist(v, xlab = "Weight", ylab = "Frequency", col = "green", border = "red",
     xlim = c(0, 40), ylim = c(0, 3), breaks = 5)

# Saving the file.
dev.off()

Output:
Example: Finding return value of hist()

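# Creating data for the graph.
v <- c(12, 24, 16, 38, 21, 13, 55, 17, 39, 10, 60)

# Giving a name to the chart file.
png(file = "histogram_value.png")
# Creating the histogram and storing its return value.
m <- hist(v)
# Saving the file.
dev.off()
# Printing the return value (breaks, counts, density, mids, ...).
m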

Output:
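The same return value can be computed without drawing anything by passing plot = FALSE. In this sketch the break points are fixed explicitly with seq() (an illustrative choice) so that the bins are easy to check by hand:

```r
v <- c(12, 24, 16, 38, 21, 13, 55, 17, 39, 10, 60)

# Bins are [0,20], (20,40], (40,60]: right-closed, lowest break included.
m <- hist(v, breaks = seq(0, 60, by = 20), plot = FALSE)

m$breaks   # 0 20 40 60
m$counts   # 5 4 2
m$mids     # 10 30 50
```

Calling plot(m) later draws the stored histogram, so the computation and the drawing can be separated.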

Example: Using histogram return values for labels using text()

# Creating data for the graph.
v <- c(12, 24, 16, 38, 21, 13, 55, 17, 39, 10, 60, 120, 40, 70, 90)
# Giving a name to the chart file.
png(file = "histogram_return.png")

# Creating the histogram.
m <- hist(v, xlab = "Weight", ylab = "Frequency", col = "darkmagenta", border = "pink",
          breaks = 5)
# Setting labels: print each bar's count just above its midpoint.
text(m$mids, m$counts, labels = m$counts, adj = c(0.5, -0.5))
# Saving the file.
dev.off()

Output:

Example: Histogram using non-uniform width

# Creating data for the graph.
v <- c(12, 24, 16, 38, 21, 13, 55, 17, 39, 10, 60, 120, 40, 70, 90)
# Giving a name to the chart file.
png(file = "histogram_non_uniform.png")
# Creating the histogram; with unequal breaks, hist() plots densities by default.
hist(v, xlab = "Weight", ylab = "Density", xlim = c(50, 100), col = "darkmagenta",
     border = "pink", breaks = c(10, 55, 60, 70, 75, 80, 100, 120))
# Saving the file.
dev.off()

Output:

R Line Graphs
A line graph is a pictorial representation of information which changes continuously
over time. A line graph can also be referred to as a line chart. In a line graph, line
segments connect the data points to show the continuous change. The lines in a line
graph move up and down based on the data. We can use a line graph to compare
different events, information, and situations.
A line chart connects a series of points by drawing line segments between them. Line
charts are used to identify trends in data. For line graph construction, R provides the
plot() function, which has the following syntax:

plot(v, type, col, xlab, ylab, main)

Here,

S.No Parameter Description

1. v It is a vector which contains the numeric values.

2. type This parameter takes the value "l" to draw only the lines, "p" to draw only the points,
and "o" to draw both lines and points.

3. xlab It is the label for the x-axis.

4. ylab It is the label for the y-axis.

5. main It is the title of the chart.

6. col It is used to give the color for both the points and lines
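To compare the three type values from the table side by side, this sketch uses par(mfrow = ...) to put three panels on one page of a temporary PDF file; the panel layout and colors are illustrative choices:

```r
v <- c(13, 22, 28, 7, 31)

out_file <- tempfile(fileext = ".pdf")
pdf(out_file, width = 9, height = 3)
op <- par(mfrow = c(1, 3))               # three panels side by side
for (t in c("l", "p", "o")) {
  plot(v, type = t, col = "blue", xlab = "Index", ylab = "Value",
       main = paste0('type = "', t, '"'))
}
par(op)                                  # restore the previous layout
dev.off()
```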

Let's see a basic example to understand how the plot() function is used to create a
line graph:

Example

# Creating the data for the chart.
v <- c(13, 22, 28, 7, 31)
# Giving a name to the chart file.
png(file = "line_graph.png")
# Plotting the line chart.
plot(v, type = "o")
# Saving the file.
dev.off()
Output:

Line Chart Title, Color, and Labels


Like other graphs and charts, we can add more features to a line chart by adding
more parameters. We can add colors to the lines and points, add labels to the axes,
and give the chart a title. Let's see an example to understand how these parameters
are used in the plot() function to create an attractive line graph.

Example

# Creating the data for the chart.
v <- c(13, 22, 28, 7, 31)
# Giving a name to the chart file.
png(file = "line_graph_feature.png")
# Plotting the line chart with color and axis labels.
plot(v, type = "o", col = "green", xlab = "Month", ylab = "Temperature")
# Saving the file.
dev.off()

Output:

Line Charts Containing Multiple Lines


In our previous examples, we created line graphs containing only one line. R also
allows us to create a line graph containing multiple lines: the lines() function adds a
line to an existing graph.

The lines() function takes an additional input vector for creating a line. Let's see an
example to understand how this function is used:

Example
# Creating the data for the chart.
v <- c(13, 22, 28, 7, 31)
w <- c(11, 13, 32, 6, 35)
x <- c(12, 22, 15, 34, 35)
# Giving a name to the chart file.
png(file = "multi_line_graph.png")
# Plotting the first line; ylim covers all three series so no line is clipped.
plot(v, type = "o", col = "green", xlab = "Month", ylab = "Temperature",
     ylim = range(v, w, x))
# Adding two more lines to the same graph.
lines(w, type = "o", col = "red")
lines(x, type = "o", col = "blue")
# Saving the file.
dev.off()

Output:
Line Graph using ggplot2
In R, there is another way to create a line graph, i.e., by using the ggplot2 package.
The ggplot2 package provides the geom_line(), geom_step() and geom_path()
functions to create line graphs. To use these functions, we first have to install the
ggplot2 package and then load it into the current session.

Let's see an example to understand how ggplot2 is used to create a line graph. In the
example below, we create a small data frame of dose and tooth-length values in the
style of the predefined ToothGrowth dataset, which describes the effect of vitamin C
on tooth growth in guinea pigs.

Example

library(ggplot2)
# Creating data for the graph
data_frame <- data.frame(dose = c("D0.5", "D1", "D2"),
                         len = c(4.2, 10, 29.5))
head(data_frame)
# %d in the file name writes each plot to its own numbered file
png(file = "multi_line_graph%d.png")
# Basic line plot with points
print(ggplot(data = data_frame, aes(x = dose, y = len, group = 1)) +
        geom_line() + geom_point())
# Change the line type
print(ggplot(data = data_frame, aes(x = dose, y = len, group = 1)) +
        geom_line(linetype = "dashed") + geom_point())
# Change the color
print(ggplot(data = data_frame, aes(x = dose, y = len, group = 1)) +
        geom_line(color = "red") + geom_point())
dev.off()

Output:
R Scatterplots
Scatter plots are used to compare variables. A comparison between variables is
required when we need to define how much one variable is affected by another. In a
scatterplot, the data is represented as a collection of points. Each point on the
scatterplot defines the values of the two variables. One variable is selected for the
vertical axis and the other for the horizontal axis. In R, there are two ways of
creating a scatterplot, i.e., using the plot() function and using the ggplot2 package's
functions.

The syntax for creating a scatterplot in R is:

plot(x, y, main, xlab, ylab, xlim, ylim, axes)

Here,

S.No Parameters Description

1. x It is the dataset whose values are the horizontal coordinates.

2. y It is the dataset whose values are the vertical coordinates.

3. main It is the title of the graph.

4. xlab It is the label on the horizontal axis.

5. ylab It is the label on the vertical axis.

6. xlim It is the limits of the x values which is used for plotting.

7. ylim It is the limits of the values of y, which is used for plotting.

8. axes It indicates whether both axes should be drawn on the plot.

Let's see an example to understand how we can construct a scatterplot using the plot
function. In our example, we will use the dataset "mtcars", which is the predefined
dataset available in the R environment.

Example

# Fetching two columns from mtcars
data <- mtcars[, c('wt', 'mpg')]
# Giving a name to the chart file.
png(file = "scatterplot.png")
# Plotting the chart for cars with weight between 2.5 and 5 and mileage between 15 and 30.
plot(x = data$wt, y = data$mpg, xlab = "Weight", ylab = "Mileage", xlim = c(2.5, 5),
     ylim = c(15, 30), main = "Weight vs Mileage")
# Saving the file.
dev.off()

Output:
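In base graphics, a fitted regression line can be added to the same kind of scatterplot with lm() and abline(). This is a minimal sketch on the full mtcars data (no axis limits), written to a temporary PDF file:

```r
fit <- lm(mpg ~ wt, data = mtcars)       # least-squares line: mpg = a + b * wt

out_file <- tempfile(fileext = ".pdf")
pdf(out_file)
plot(mpg ~ wt, data = mtcars, xlab = "Weight", ylab = "Mileage",
     main = "Weight vs Mileage", pch = 19)
abline(fit, col = "red", lwd = 2)        # abline() accepts an lm fit directly
dev.off()

coef(fit)                                # intercept and slope of the fitted line
```

The negative slope confirms what the scatterplot suggests: heavier cars tend to have lower mileage.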
Scatterplot using ggplot2
In R, there is another way of creating a scatterplot, i.e., with the help of the ggplot2
package.

The ggplot2 package provides the ggplot() and geom_point() functions for creating a
scatterplot. The ggplot() function takes a series of inputs. The first parameter is the
input data frame, and the second is the aes() function, in which we map variables to
the x-axis and y-axis.

Let's start understanding how the ggplot2 package is used with the help of an
example where we have used the familiar dataset "mtcars".

Example

# Loading the ggplot2 package
library(ggplot2)
# Giving a name to the chart file.
png(file = "scatterplot_ggplot.png")
# Plotting the chart using ggplot() and geom_point();
# print() makes sure the plot is drawn even when the script is sourced.
print(ggplot(mtcars, aes(x = drat, y = mpg)) + geom_point())
# Saving the file.
dev.off()

Output:

We can also add more features to make the scatter plots more attractive. Below are
some examples in which different parameters are added.

Example 1: Scatterplot with groups

# Loading the ggplot2 package
library(ggplot2)
# Giving a name to the chart file.
png(file = "scatterplot1.png")
# Plotting the chart using ggplot() and geom_point().
# The aes() inside geom_point() maps the gear groups to point colors.
print(ggplot(mtcars, aes(x = drat, y = mpg)) +
        geom_point(aes(color = factor(gear))))
# Saving the file.
dev.off()

Output:

Example 2: Changes in axis

# Loading the ggplot2 package
library(ggplot2)
# Giving a name to the chart file.
png(file = "scatterplot2.png")
# Plotting the chart on log-transformed axes.
# The aes() inside geom_point() maps the gear groups to point colors.
print(ggplot(mtcars, aes(x = log(mpg), y = log(drat))) +
        geom_point(aes(color = factor(gear))))
# Saving the file.
dev.off()

Output:

Example 3: Scatterplot with fitted values

# Loading the ggplot2 package
library(ggplot2)
# Giving a name to the chart file.
png(file = "scatterplot3.png")
# Creating a scatterplot with fitted values.
# stat_smooth(method = "lm") adds a linear regression line; se = FALSE hides
# the standard-error band.
print(ggplot(mtcars, aes(x = log(mpg), y = log(drat))) +
        geom_point(aes(color = factor(gear))) +
        stat_smooth(method = "lm", col = "#C42126", se = FALSE, size = 1))
# Saving the file.
dev.off()

Output:

Adding information to the graph


Example 4: Adding title
# Loading the ggplot2 package
library(ggplot2)
# Giving a name to the chart file.
png(file = "scatterplot4.png")
# Creating a scatterplot with fitted values.
# stat_smooth(method = "lm") adds a linear regression line; se = FALSE hides the
# standard-error band.
new_graph <- ggplot(mtcars, aes(x = log(mpg), y = log(drat))) +
  geom_point(aes(color = factor(gear))) +
  stat_smooth(method = "lm", col = "#C42126", se = FALSE, linewidth = 1)
# Adding a title with labs()
print(new_graph +
  labs(
    title = "Scatterplot with more information"
  ))
# Saving the file.
dev.off()

Output:
Example 5: Adding title with dynamic name

# Loading the ggplot2 package
library(ggplot2)
# Giving a name to the chart file.
png(file = "scatterplot5.png")
# Creating a scatterplot with fitted values.
# stat_smooth(method = "lm") adds a linear regression line; se = FALSE hides the
# standard-error band.
new_graph <- ggplot(mtcars, aes(x = log(mpg), y = log(drat))) +
  geom_point(aes(color = factor(gear))) +
  stat_smooth(method = "lm", col = "#C42126", se = FALSE, linewidth = 1)
# Finding the mean of mpg
mean_mpg <- mean(mtcars$mpg)
# Adding a title whose text is built from the data
print(new_graph + labs(
  title = paste("Adding additional information. Average mpg is", round(mean_mpg, 2))
))
# Saving the file.
dev.off()

Output:

Example 6: Adding a sub-title

# Loading the ggplot2 package
library(ggplot2)
# Giving a name to the chart file.
png(file = "scatterplot6.png")
# Creating a scatterplot with fitted values.
# stat_smooth(method = "lm") adds a linear regression line; se = FALSE hides the
# standard-error band.
new_graph <- ggplot(mtcars, aes(x = log(mpg), y = log(drat))) +
  geom_point(aes(color = factor(gear))) +
  stat_smooth(method = "lm", col = "#C42126", se = FALSE, linewidth = 1)
# Adding a title, a subtitle and a caption
print(new_graph + labs(
  title = "Relation between miles per gallon and drat",
  subtitle = "Relationship broken down by gear class",
  caption = "Author's own computation"
))
# Saving the file.
dev.off()

Output:
Example 7: Changing name of x-axis and y-axis

# Loading the ggplot2 package
library(ggplot2)
# Giving a name to the chart file.
png(file = "scatterplot7.png")
# Creating a scatterplot with fitted values.
# stat_smooth(method = "lm") adds a linear regression line; se = FALSE hides the
# standard-error band.
new_graph <- ggplot(mtcars, aes(x = log(mpg), y = log(drat))) +
  geom_point(aes(color = factor(gear))) +
  stat_smooth(method = "lm", col = "#C42126", se = FALSE, linewidth = 1)
# Renaming the axes and the legend; x is log(mpg) and y is log(drat)
print(new_graph + labs(
  x = "Miles per gallon, in log",
  y = "Rear axle ratio (drat), in log",
  color = "Gear",
  title = "Relation between miles per gallon and drat",
  subtitle = "Relationship broken down by gear class",
  caption = "Author's own computation"
))
# Saving the file.
dev.off()

Output:

Example 8: Adding theme

# Loading the ggplot2 package
library(ggplot2)
# Giving a name to the chart file.
png(file = "scatterplot8.png")
# Creating a scatterplot with fitted values.
# stat_smooth(method = "lm") adds a linear regression line; se = FALSE hides the
# standard-error band.
new_graph <- ggplot(mtcars, aes(x = log(mpg), y = log(drat))) +
  geom_point(aes(color = factor(gear))) +
  stat_smooth(method = "lm", col = "#C42126", se = FALSE, linewidth = 1)
# Applying a dark theme and renaming the axes; x is log(mpg) and y is log(drat)
print(new_graph +
  theme_dark() +
  labs(
    x = "Miles per gallon, in log",
    y = "Rear axle ratio (drat), in log",
    color = "Gear",
    title = "Relation between miles per gallon and drat",
    subtitle = "Relationship broken down by gear class",
    caption = "Author's own computation"
  ))
# Saving the file.
dev.off()

Output:
Linear Regression
Linear regression is used to predict the value of an outcome variable y on the basis
of one or more input predictor variables x. In other words, linear regression is used to
establish a linear relationship between the predictor and response variables.

In linear regression, predictor and response variables are related through an equation
in which the exponent of both these variables is 1. Mathematically, a linear
relationship denotes a straight line, when plotted as a graph.

The general mathematical equation for linear regression is:

y = ax + b

Here,

o y is the response variable.
o x is the predictor variable.
o a and b are constants called the coefficients (a is the slope and b is the intercept).
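As a cross-check on what lm() estimates: for a single predictor, the least-squares coefficients of y = ax + b have a closed form, a = cov(x, y) / var(x) and b = mean(y) - a * mean(x). The sketch below assumes the same height (x) and weight (y) vectors used in the examples of this section; the names a, b and fit are illustrative. It only checks the formula, since lm() itself solves the least-squares problem through a QR decomposition.

```r
# Sample data: heights (x) and weights (y), as in this section's examples
x <- c(141, 134, 178, 156, 108, 116, 119, 143, 162, 130)
y <- c(62, 85, 56, 21, 47, 17, 76, 92, 62, 58)

# Closed-form least-squares estimates for y = a*x + b
a <- cov(x, y) / var(x)     # slope
b <- mean(y) - a * mean(x)  # intercept

# The same coefficients fitted by lm(); coef() returns (Intercept) first, then x
fit <- lm(y ~ x)
print(c(intercept = b, slope = a))
print(coef(fit))
```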

Steps for establishing the Regression


Predicting a person's weight from their known height is a simple example of
regression. To predict the weight, we need a relationship between the height and
weight of a person.

The relationship is created in the following steps:

1. In the first step, we carry out the experiment of gathering a sample of observed
values of height and weight.
2. After that, we create a relationship model using the lm() function of R.
3. Next, we find the coefficients with the help of the model and create the
mathematical equation using these coefficients.
4. We get the summary of the relationship model to understand the prediction errors,
known as residuals.
5. At last, we use the predict() function to predict the weight of new persons.

The syntax of the lm() function is:

lm(formula, data)

Here,

S.No Parameter Description

1. formula It is a symbolic description of the relationship between y and x, such as y ~ x.

2. data It is the data frame containing the variables used in the formula (optional when the variables already exist in the environment).


Creating Relationship Model and Getting the Coefficients
Let's start performing the second and third steps, i.e., creating a relationship model
and getting the coefficients. We will use the lm() function and pass the x and y input
vectors and store the result in a variable named relationship_model.

Example

# Creating the input vectors for the lm() function
x <- c(141, 134, 178, 156, 108, 116, 119, 143, 162, 130)
y <- c(62, 85, 56, 21, 47, 17, 76, 92, 62, 58)
# Applying the lm() function.
relationship_model <- lm(y ~ x)
# Printing the coefficients
print(relationship_model)

Output:

Call:
lm(formula = y ~ x)

Coefficients:
(Intercept) x
47.50833 0.07276

Getting Summary of Relationship Model


We will use the summary() function to get a summary of the relationship model. Let's
see an example to understand the use of the summary() function.

Example

# Creating the input vectors for the lm() function
x <- c(141, 134, 178, 156, 108, 116, 119, 143, 162, 130)
y <- c(62, 85, 56, 21, 47, 17, 76, 92, 62, 58)

# Applying the lm() function.
relationship_model <- lm(y ~ x)

# Printing the model summary
print(summary(relationship_model))
Output:

Call:
lm(formula = y ~ x)

Residuals:
Min 1Q Median 3Q Max
-38.948 -7.390 1.869 15.933 34.087

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 47.50833 55.18118 0.861 0.414
x 0.07276 0.39342 0.185 0.858

Residual standard error: 25.96 on 8 degrees of freedom
Multiple R-squared: 0.004257, Adjusted R-squared: -0.1202
F-statistic: 0.0342 on 1 and 8 DF, p-value: 0.8579

The predict() Function


Now, we will predict the weight of new persons with the help of the predict()
function. The syntax of the predict() function is:

predict(object, newdata)

Here,

S.No Parameter Description

1. object It is the model that we have already created using the lm() function.

2. newdata It is a data frame containing the new values of the predictor variable; its column names must match the predictor names used in the model.

Example

# Creating the input vectors for the lm() function
x <- c(141, 134, 178, 156, 108, 116, 119, 143, 162, 130)
y <- c(62, 85, 56, 21, 47, 17, 76, 92, 62, 58)

# Applying the lm() function.
relationship_model <- lm(y ~ x)

# Finding the weight of a person with height 160.
z <- data.frame(x = 160)
predict_result <- predict(relationship_model, z)
print(predict_result)

Output:

1
59.14977
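predict() also accepts several new values at once and, for lm() models, an interval argument. The sketch below assumes three hypothetical new heights (150, 160 and 170 cm). Setting interval = "confidence" adds lower and upper bounds (the lwr and upr columns) for the mean predicted weight at each height.

```r
x <- c(141, 134, 178, 156, 108, 116, 119, 143, 162, 130)
y <- c(62, 85, 56, 21, 47, 17, 76, 92, 62, 58)
relationship_model <- lm(y ~ x)

# newdata must be a data frame whose column name matches the predictor (x)
new_heights <- data.frame(x = c(150, 160, 170))

# interval = "confidence" returns a matrix with fit, lwr and upr columns
pred <- predict(relationship_model, newdata = new_heights, interval = "confidence")
print(pred)
```

Using interval = "prediction" instead gives wider bounds that describe where an individual new observation is likely to fall, rather than the mean response.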

Plotting Regression
Now, we plot the data and the fitted regression line with the help of the plot() and
abline() functions. plot() takes the x and y vectors as input, along with many optional
arguments such as the color, title, point size, point symbol and axis labels.

Example

# Creating the input vectors for the lm() function
x <- c(141, 134, 178, 156, 108, 116, 119, 143, 162, 130)
y <- c(62, 85, 56, 21, 47, 17, 76, 92, 62, 58)
relationship_model <- lm(y ~ x)
# Giving a name to the chart file.
png(file = "linear_regression.png")
# Plotting the data points (height on the x-axis, weight on the y-axis).
plot(x, y, col = "red", main = "Height and Weight Regression", cex = 1.3,
     pch = 16, xlab = "Height in cm", ylab = "Weight in Kg")
# Adding the fitted regression line of the model.
abline(relationship_model)
# Saving the file.
dev.off()

Output:
R-Multiple Linear Regression
Multiple linear regression is an extension of simple linear regression that is used to
predict an outcome variable (y) based on multiple distinct predictor variables (x).
With three predictor variables (x1, x2, x3), the prediction of y is expressed by the
following equation:

y=b0+b1*x1+b2*x2+b3*x3

The "b" values represent the regression weights. They measure the association
between the outcome and each predictor variable.

In other words, multiple linear regression extends linear regression to the relationship
between more than two variables. In simple linear regression, we have one predictor
and one response variable, whereas in multiple regression, we have more than one
predictor variable and one response variable.

The general mathematical equation for multiple regression is:

y=b0+b1*x1+b2*x2+b3*x3+⋯+bn*xn

Here,

o y is a response variable.
o b0, b1, b2...bn are the coefficients.
o x1, x2, ...xn are the predictor variables.

In R, we create the regression model with the help of the lm() function. The model
will determine the value of the coefficients with the help of the input data. We can
predict the value of the response variable for the set of predictor variables using
these coefficients.

The syntax of the lm() function in multiple regression is:

lm(y ~ x1 + x2 + x3 + ..., data)

Before proceeding further, we first create our data for multiple regression. We will
use the "mtcars" dataset available in the R environment. The model relates "mpg" as
the response variable to "wt", "disp" and "hp" as the predictor variables.

For this purpose, we will create a subset of these variables from the "mtcars" dataset.

input <- mtcars[, c("mpg", "wt", "disp", "hp")]
print(head(input))

Output:
Creating Relationship Model and Finding Coefficients
Now, we will use the data which we have created before to create the Relationship
Model. We will use the lm() function, which takes two parameters i.e., formula and
data. Let's start understanding how the lm() function is used to create the
Relationship Model.

Example

# Creating the input data.
input <- mtcars[, c("mpg", "wt", "disp", "hp")]
# Creating the relationship model.
Model <- lm(mpg ~ wt + disp + hp, data = input)
# Showing the model.
print(Model)

Output:

From the above output, it is clear that our model has been set up successfully. Our
next step is to find the coefficients with the help of the model.

b0<- coef(Model)[1]
print(b0)
x_wt<- coef(Model)[2]
x_disp<- coef(Model)[3]
x_hp<- coef(Model)[4]
print(x_wt)
print(x_disp)
print(x_hp)

Output:

The equation for the Regression Model


Now, we have coefficient values and intercept. Let's start creating a mathematical
equation that we will apply for predicting new values. First, we will create an
equation, and then we use the equation to predict the mileage when a new set of
values for weight, displacement, and horsepower is provided.

Let's see an example in which we predict the mileage for a car with weight=2.51,
disp=211 and hp=82.

Example

# The regression equation is: mpg = b0 + x_wt*wt + x_disp*disp + x_hp*hp
# Applying the equation to predict the mileage of a new car
y <- b0 + x_wt * 2.51 + x_disp * 211 + x_hp * 82
print(y)

Output:
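The same prediction can also be obtained from predict(), which avoids extracting the coefficients by hand. The sketch below refits the model on the same subset of mtcars and checks that the hand-written equation and predict() agree for the car with wt = 2.51, disp = 211 and hp = 82. The names cf, manual, new_car and auto are illustrative.

```r
# Same subset and model as in the multiple regression example
input <- mtcars[, c("mpg", "wt", "disp", "hp")]
Model <- lm(mpg ~ wt + disp + hp, data = input)

# Prediction from the equation, using the named coefficients
cf <- coef(Model)
manual <- cf[["(Intercept)"]] + cf[["wt"]] * 2.51 + cf[["disp"]] * 211 + cf[["hp"]] * 82

# Prediction from predict(); newdata needs columns named like the predictors
new_car <- data.frame(wt = 2.51, disp = 211, hp = 82)
auto <- predict(Model, newdata = new_car)

print(c(equation = manual, predict = unname(auto)))
```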
R-Logistic Regression
In logistic regression, a regression curve, y = f(x), is fitted, where y is a categorical
variable. This regression model is used to predict y given a set of predictors x. The
predictors can be categorical, continuous, or a mixture of both.

Logistic regression is a classification algorithm. It is a generalized linear model: the
predicted probability is a nonlinear function of the predictors, but its log-odds are
linear in them. This model is used to predict a binary outcome (1/0, yes/no,
true/false) from a set of independent variables, with the binary outcome coded as a
dummy (0/1) variable.

In logistic regression, the response variable takes categorical values such as
true/false or 0/1, and the model estimates the probability of the binary response.

The mathematical equation for logistic regression is:

y = 1/(1 + e^-(b0 + b1*x1 + b2*x2 + ⋯))

In the above equation, y is the response variable, x1, x2, ... are the predictor
variables, and b0, b1, b2, ... are the coefficients, which are numeric constants. We use
the glm() function to create the regression model.

The syntax of the glm() function is:

glm(formula, data, family)

Here,

S.No Parameter Description

1. formula It is a symbolic description of the relationship between the variables.

2. data It is the dataset giving the values of the variables.

3. family It is an R object that specifies the error distribution and link function of the model; for logistic regression, its value is binomial.

Building Logistic Regression


The built-in dataset "mtcars" describes various car models with their different
engine specifications. In "mtcars", the transmission mode is described by the column
"am", which is a binary value (0 = automatic, 1 = manual). We can construct a logistic
regression model between the column "am" and three other columns: hp, wt, and cyl.
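A minimal sketch of the mtcars model just described, assuming the formula am ~ hp + wt + cyl with family = binomial. It also checks that predict(type = "response") is simply the logistic function 1/(1 + e^-eta) applied to the linear predictor eta = b0 + b1*hp + b2*wt + b3*cyl. The object names are illustrative.

```r
# Logistic regression of transmission type (am: 0 = automatic, 1 = manual)
# on horsepower, weight and number of cylinders
model_am <- glm(am ~ hp + wt + cyl, data = mtcars, family = binomial)
print(summary(model_am))

# Linear predictor eta = b0 + b1*hp + b2*wt + b3*cyl
eta <- predict(model_am, type = "link")
# Fitted probabilities that am = 1 (manual transmission)
prob <- predict(model_am, type = "response")

# The fitted probabilities equal the logistic function of eta
print(head(cbind(eta, prob, logistic = 1 / (1 + exp(-eta)))))
```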

Let's see an example to understand how the glm() function is used to create a logistic
regression model and how the summary() function is used to analyze it.

In our example, we will use the "BreastCancer" dataset from the mlbench package.
To follow along, we first need to install the "mlbench" and "caret" packages, for
example with install.packages(c("mlbench", "caret")).

Example

# Loading the library
library(mlbench)
# Using the BreastCancer dataset
data(BreastCancer, package = "mlbench")
# Keeping only the rows without missing values
breast_canc <- BreastCancer[complete.cases(BreastCancer), ]
# Displaying the structure of the dataset with the str() function.
str(breast_canc)

Output:
We now divide our data into a training set containing 70% of the data and a test set
containing the remaining 30%.

# Dividing the dataset into training and test datasets.
# createDataPartition() comes from the caret package.
library(caret)
set.seed(100)
# Creating the partition; sampling is done within each level of Class.
Training_Ratio <- createDataPartition(breast_canc$Class, p = 0.7, list = FALSE)
# Creating the training data.
Training_Data <- breast_canc[Training_Ratio, ]
str(Training_Data)
# Creating the test data.
Test_Data <- breast_canc[-Training_Ratio, ]
str(Test_Data)

Output:
Now, we construct the logistic regression model with the help of the glm() function.
We pass the formula Class ~ Cell.shape as the first parameter, set the family attribute
to "binomial", and use Training_Data as the data.

Example

# Creating the regression model
glm(Class ~ Cell.shape, family = "binomial", data = Training_Data)

Output:
Now, we use the summary() function for the analysis.

# Creating the regression model
model <- glm(Class ~ Cell.shape, family = "binomial", data = Training_Data)
# Using the summary function
print(summary(model))

Output:

R Poisson Regression
The Poisson Regression model is used for modeling events whose outcomes are
counts. Count data are discrete data with non-negative integer values that count
things, such as the number of people in line at the grocery store, or the number of
times an event occurs during a given timeframe.

Count data can also be expressed as rate data: the number of times an event occurs
within a timeframe can be expressed either as a raw count or as a rate. Poisson
regression allows us to determine which explanatory variables (x values) influence a
given response variable (y value, count, or rate).

For example, a grocery store can use Poisson regression to better understand and
predict the number of people in line.

The general mathematical equation for Poisson regression is:

log(y) = a + b1*x1 + b2*x2 + ⋯ + bn*xn

Here,

S.No Parameter Description

1. y It is the response variable (the expected count).

2. a and b1, ..., bn These are the numeric coefficients.

3. x1, ..., xn These are the predictor variables.
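Because the Poisson model works on the log scale, fitted counts are recovered by exponentiating the linear predictor, and exp() of a coefficient is the multiplicative change in the expected count. Below is a short sketch using the built-in warpbreaks dataset (counts of warp breaks per loom), assuming the formula breaks ~ wool + tension. The object names are illustrative.

```r
# Poisson regression of warp-break counts on wool type and tension
pois_model <- glm(breaks ~ wool + tension, data = warpbreaks, family = poisson)

# Linear predictor on the log scale: a + b1*x1 + ... + bn*xn
eta <- predict(pois_model, type = "link")
# Fitted (expected) counts on the original scale
mu <- predict(pois_model, type = "response")

# exp(coefficient): multiplicative effect on the expected count,
# relative to the baseline group (wool A, tension L)
print(round(exp(coef(pois_model)), 3))
print(head(cbind(eta, mu, exp_eta = exp(eta))))
```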

The poisson regression model is created with the help of the familiar function glm().

Let's see an example in which we create a Poisson regression model using the glm()
function. In this example, we consider the built-in dataset "warpbreaks", which
describes the effect of wool type (A or B) and tension (low, medium, or high) on the
number of warp breaks per loom. We take "wool" and "tension" as the predictor
variables, and "breaks" as the response variable.

Example

# Creating data for the Poisson regression
reg_data <- warpbreaks
print(head(reg_data))

Output:
Now, we will create the regression model with the help of the glm() function as:

# Creating the Poisson regression model using the glm() function
output_result <- glm(formula = breaks ~ wool + tension, data = warpbreaks, family = poisson)
output_result

Output:

Now, let's use the summary() function to get a summary of the model for data
analysis.

# Using the summary function
print(summary(output_result))
Output:

R Normal Distribution
In random collections of data from independent sources, the data are commonly found
to be normally distributed. This means that if we plot a graph with the values of the
variable on the horizontal axis and the count of those values on the vertical axis, we
get a bell-shaped curve. The center of the curve represents the mean of the dataset.
Fifty percent of the values lie to the left of the mean, and the other fifty percent lie to
its right. This is referred to as the normal distribution.

R allows us to work with the normal distribution through the following functions:

dnorm(x, mean, sd)
pnorm(x, mean, sd)
qnorm(p, mean, sd)
rnorm(n, mean, sd)

These functions can have the following parameters:

S.No Parameter Description

1. x It is a vector of numbers.

2. p It is a vector of probabilities.

3. n It is the number of observations.

4. mean It is the mean of the distribution; its default value is 0.

5. sd It is the standard deviation; its default value is 1.

Let's start understanding how these functions are used with the help of the examples.

dnorm(): Density
The dnorm() function of R calculates the height of the probability density at each
point for a given mean and standard deviation. The probability density of the normal
distribution is:

f(x) = 1/(σ√(2π)) · e^(-(x - μ)²/(2σ²))

where μ is the mean and σ is the standard deviation.
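To see that dnorm() evaluates this normal density, the sketch below writes the formula 1/(sd*sqrt(2*pi)) * exp(-(x - mean)^2/(2*sd^2)) as a small hypothetical helper, normal_density(), and compares it with dnorm() on a few points. It uses mean 2.0 and sd 0.5, as in the example that follows.

```r
# Hypothetical helper: the normal density written out from the formula
normal_density <- function(x, mean = 0, sd = 1) {
  1 / (sd * sqrt(2 * pi)) * exp(-(x - mean)^2 / (2 * sd^2))
}

x <- seq(-1, 5, by = 0.5)
manual <- normal_density(x, mean = 2.0, sd = 0.5)
builtin <- dnorm(x, mean = 2.0, sd = 0.5)

# The two columns agree up to floating-point rounding
print(cbind(x, manual, builtin))
```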

Example

# Creating a sequence of numbers between -1 and 20 incrementing by 0.2.
x <- seq(-1, 20, by = .2)
# Choosing the mean as 2.0 and standard deviation as 0.5.
y <- dnorm(x, mean = 2.0, sd = 0.5)
# Giving a name to the chart file.
png(file = "dnorm.png")
# Plotting the graph
plot(x, y)
# Saving the file.
dev.off()

Output:
pnorm(): Direct Look-Up
The pnorm() function is also known as the "Cumulative Distribution Function". It
calculates the probability that a normally distributed random number is less than or
equal to a given value. The cumulative distribution function is:

F(x) = P(X ≤ x)

Example

# Creating a sequence of numbers between -1 and 20 incrementing by 0.1.
x <- seq(-1, 20, by = .1)
# Choosing the mean as 2.0 and standard deviation as 0.5.
y <- pnorm(x, mean = 2.0, sd = 0.5)
# Giving a name to the chart file.
png(file = "pnorm.png")
# Plotting the graph
plot(x, y)
# Saving the file.
dev.off()

Output:
qnorm(): Inverse Look-Up
The qnorm() function takes a probability value as input and calculates the number
whose cumulative probability matches that value. The cumulative distribution
function F and the inverse cumulative distribution function F⁻¹ are related by:

p = F(x)
x = F⁻¹(p)
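Since qnorm() is the inverse of pnorm(), applying one after the other returns the starting values. A quick sketch with a few arbitrary x values, assuming mean 2.0 and sd 0.5:

```r
x <- c(1.0, 1.5, 2.0, 2.5, 3.0)
# p = F(x): cumulative probabilities for mean 2.0 and sd 0.5
p <- pnorm(x, mean = 2.0, sd = 0.5)
# x = F^-1(p): qnorm() maps the probabilities back to the original values
x_back <- qnorm(p, mean = 2.0, sd = 0.5)
print(rbind(x, p, x_back))
```

The value at the mean (x = 2.0) has cumulative probability 0.5, which reflects the symmetry of the normal curve.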

Example

# Creating a sequence of probabilities between 0 and 1 incrementing by 0.01.
x <- seq(0, 1, by = .01)
# Choosing the mean as 2.0 and standard deviation as 0.5.
y <- qnorm(x, mean = 2.0, sd = 0.5)
# Giving a name to the chart file.
png(file = "qnorm.png")
# Plotting the graph
plot(y, x)
# Saving the file.
dev.off()

Output:

rnorm(): Random Variates
The rnorm() function is used to generate normally distributed random numbers. It
takes the sample size as input and returns that many random numbers. Let's see an
example in which we draw a histogram showing the distribution of the generated
numbers.

Example

# Generating 1500 normally distributed random numbers with mean 80 and sd 15.
x <- rnorm(1500, mean = 80, sd = 15)
# Giving a name to the chart file.
png(file = "rnorm.png")
# Creating a histogram on the density scale (probability = TRUE)
hist(x, probability = TRUE, col = "red", border = "black")
# Saving the file.
dev.off()

Output:
Binomial Distribution
The binomial distribution is a discrete probability distribution that is used to find the
probability of success of an event which has only two possible outcomes in a series
of experiments. Tossing a coin is the best example of the binomial distribution: each
toss gives either a head or a tail. The probability of getting exactly three heads when
tossing a coin ten times is given by the binomial distribution.
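For the coin example, the probability of exactly k heads in n tosses is choose(n, k) * p^k * (1 - p)^(n - k). A minimal sketch computing the three-heads-in-ten-tosses case by hand and with dbinom(), assuming a fair coin (p = 0.5):

```r
n <- 10   # number of tosses
k <- 3    # number of heads
p <- 0.5  # probability of a head on each toss (fair coin)

# Binomial formula: choose(n, k) * p^k * (1 - p)^(n - k)
manual <- choose(n, k) * p^k * (1 - p)^(n - k)
builtin <- dbinom(k, size = n, prob = p)

print(c(manual = manual, dbinom = builtin))  # both equal 120/1024 = 0.1171875
```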

R allows us to work with the binomial distribution through the following functions:

dbinom(x, size, prob)
pbinom(x, size, prob)
qbinom(p, size, prob)
rbinom(n, size, prob)

These functions can have the following parameters:

S.No Parameter Description

1. x It is a vector of numbers.

2. p It is a vector of probabilities.

3. n It is the number of observations.

4. size It is the number of trials.

5. prob It is the probability of the success of each trial.

Let's see how these functions are used with the help of examples.
dbinom(): Direct Look-Up, Points
The dbinom() function of R calculates the probability at each point. In simple words,
it calculates the probability mass function of the particular binomial distribution.

Example

# Creating a sequence of numbers from 0 to 100 incremented by 1.
x <- seq(0, 100, by = 1)
# Creating the binomial distribution with 50 trials and success probability 0.5.
y <- dbinom(x, 50, 0.5)
# Giving a name to the chart file.
png(file = "dbinom.png")
# Plotting the graph.
plot(x, y)
# Saving the file.
dev.off()

Output:
pbinom(): Direct Look-Up, Intervals
The pbinom() function of R calculates the cumulative probability (a single value
representing the probability) of an event. In simple words, it calculates the
cumulative distribution function of the particular binomial distribution.

Example

# Probability of getting 20 or fewer heads from 48 tosses of a coin.
x <- pbinom(20, 48, 0.5)
# Showing the output
print(x)

Output:
qbinom(): Inverse Look-Up
The qbinom() function of R takes a probability value and returns the number whose
cumulative probability matches that value; more precisely, it returns the smallest
number x for which pbinom(x) is at least the given probability. In simple words, it
calculates the inverse cumulative distribution function of the binomial distribution.
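Because the binomial distribution is discrete, the result q of qbinom() satisfies pbinom(q) >= p while pbinom(q - 1) < p. A small sketch with 10 tosses of a fair coin and p = 0.3, values chosen only for illustration:

```r
size <- 10   # number of tosses
prob <- 0.5  # probability of a head
p <- 0.3     # target cumulative probability

q <- qbinom(p, size, prob)
print(q)  # 4

# q is the smallest count whose cumulative probability reaches p
print(pbinom(c(q - 1, q), size, prob))  # 176/1024 and 386/1024
```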

Let's find the number of heads that corresponds to a cumulative probability of 0.45
when a coin is tossed 48 times.

Example

# Finding the number of heads with the help of the qbinom() function
x <- qbinom(0.45, 48, 0.5)
# Showing the output
print(x)

Output:

rbinom()
The rbinom() function of R generates the required number of random values from a
binomial distribution with a given number of trials and probability of success.

Let's see an example in which we generate nine random values, each being the
number of successes in 160 trials with a success probability of 0.5.

Example

# Generating nine random values (successes in 160 trials with probability 0.5)
x <- rbinom(9, 160, 0.5)
# Showing the output
print(x)
Output:

T-Test in R
In statistics, the T-test is one of the most common tests, which is used to determine
whether the means of two groups are equal to each other. The test assumes that both
groups are sampled from normal distributions with equal variances. The null
hypothesis is that the two means are the same, and the alternative is that they are
not. Under the null hypothesis, we can compute a t-statistic that follows a
t-distribution with n1 + n2 - 2 degrees of freedom.

In R, there are various types of T-tests, such as the one-sample T-test and Welch's
T-test. R provides the t.test() function, which performs a variety of T-tests.

The t.test() function has the following forms for the different T-tests:

Independent 2-group T-test

t.test(y ~ x)

Here, y is numeric, and x is a binary factor.

Independent 2-group T-test

t.test(y1, y2)

Here, y1 and y2 are numeric.

Paired T-test

t.test(y1, y2, paired = TRUE)

Here, y1 and y2 are numeric.

One-sample T-test

t.test(y, mu = 3)

Here, H0: mu = 3.

How to perform T-tests in R


To assume equal variances and use a pooled variance estimate, we set
var.equal = TRUE. We can also use alternative = "less" or alternative = "greater" to
specify a one-tailed test.

Let's see how one-sample, paired-sample, and independent-sample T-tests are
performed.

One-Sample T-test
The One-Sample T-test compares the mean of a vector against a theoretical mean.
The following formula is used to compute the t-statistic:

t = (M - μ) / (s / √n)

Here,

1. M is the sample mean.
2. μ is the theoretical mean.
3. s is the sample standard deviation.
4. n is the number of observations.
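The formula can be checked against t.test(). The sketch below reuses the simulated wood-shipment data from the example later in this section (70 shipments, mean 35000, sd 2000). It computes t and its two-sided p-value by hand with pt() on n - 1 degrees of freedom, then compares them with the values t.test() reports. The names M, s, n, t_stat and p_value are illustrative.

```r
set.seed(0)
# Simulated volumes of 70 wood shipments
ship_vol <- rnorm(70, mean = 35000, sd = 2000)
mu0 <- 35000  # theoretical mean under the null hypothesis

M <- mean(ship_vol)
s <- sd(ship_vol)
n <- length(ship_vol)

# t = (M - mu) / (s / sqrt(n))
t_stat <- (M - mu0) / (s / sqrt(n))
# Two-sided p-value from the t-distribution with n - 1 degrees of freedom
p_value <- 2 * pt(-abs(t_stat), df = n - 1)

res <- t.test(ship_vol, mu = mu0)
print(c(manual_t = t_stat, t_test_t = unname(res$statistic)))
print(c(manual_p = p_value, t_test_p = res$p.value))
```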

To evaluate the statistical significance of the t-test, we need to compute the p-value.
The p-value ranges from 0 to 1 and is interpreted as follows:

o If the p-value is lower than 0.05, we can reject the null hypothesis at the 5%
significance level and accept the alternative hypothesis.
o If the p-value is higher than 0.05, we don't have enough evidence to reject the null
hypothesis.

The p-value is obtained from the t-distribution (with n - 1 degrees of freedom) using
the absolute value of the t-statistic.

In R, we use the following syntax of the t.test() function to perform a one-sample
T-test:

t.test(x, mu = 0)

Here,

1. x is the name of our variable of interest.
2. mu is the mean specified by the null hypothesis (0 by default).

Example

Let's see an example of a One-Sample T-test in which we test whether the mean
volume of a shipment of wood differs from the usual volume (μ0 = 35000).

set.seed(0)
# Simulated volumes of 70 shipments
ship_vol <- rnorm(70, mean = 35000, sd = 2000)
t.test(ship_vol, mu = 35000)

Output:
Paired-Sample T-test
To perform a paired-sample test, we need two data vectors, y1 and y2. Then, we run
the code using the syntax t.test(y1, y2, paired = TRUE).

Example:

Suppose we work in a large health clinic and are testing a new drug, Procardia, which
aims to reduce high blood pressure. We find 2,000 individuals with high systolic blood
pressure (mean = 150 mmHg, SD = 10 mmHg), give them Procardia for a month, and
then measure their blood pressure again. We find that the average systolic blood
pressure has decreased to 144 mmHg, with a standard deviation of 9 mmHg. The
code below simulates these two sets of measurements.

set.seed(2800)
# Simulated systolic blood pressure before and after treatment
pre_treatment <- rnorm(2000, mean = 150, sd = 10)
post_treatment <- rnorm(2000, mean = 144, sd = 9)
t.test(pre_treatment, post_treatment, paired = TRUE)

Output:
Independent-Sample T-test
Depending on the structure of our data and the equality of their variances, the
independent-sample T-test can take one of the following three forms:

1. Independent-Samples T-test where y1 and y2 are numeric.
2. Independent-Samples T-test where y1 is numeric and y2 is binary.
3. Independent-Samples T-test with equal variances not assumed.

The general form of the t.test() function for the independent-sample T-test is:

t.test(y1, y2, paired = FALSE)

By default, R assumes that the variances of y1 and y2 are unequal, and therefore
defaults to Welch's test. To switch to the pooled-variance test, we set the flag
var.equal = TRUE.
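The two versions differ in how the standard error is computed. The pooled (Student) test combines both sample variances into one estimate and uses n1 + n2 - 2 degrees of freedom. Welch's test keeps the two variances separate and uses approximate (Welch-Satterthwaite) degrees of freedom. The sketch below uses data simulated like the examples that follow; the names y1, y2, sp2, t_pooled and t_welch are illustrative. With equal group sizes, the two t-statistics happen to coincide, and only the degrees of freedom (and hence the p-value) differ.

```r
set.seed(0)
# Simulated monthly spending for two groups of 50
y1 <- rnorm(50, mean = 300, sd = 70)
y2 <- rnorm(50, mean = 350, sd = 70)
n1 <- length(y1)
n2 <- length(y2)

# Pooled (Student) t-statistic: one combined variance estimate
sp2 <- ((n1 - 1) * var(y1) + (n2 - 1) * var(y2)) / (n1 + n2 - 2)
t_pooled <- (mean(y1) - mean(y2)) / sqrt(sp2 * (1 / n1 + 1 / n2))

# Welch t-statistic: each group keeps its own variance
t_welch <- (mean(y1) - mean(y2)) / sqrt(var(y1) / n1 + var(y2) / n2)

student <- t.test(y1, y2, var.equal = TRUE)
welch <- t.test(y1, y2)  # var.equal = FALSE is the default

print(c(pooled_t = t_pooled, welch_t = t_welch))
print(c(student_df = unname(student$parameter), welch_df = unname(welch$parameter)))
```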

Let's see some examples in which we test the hypothesis that Clevelanders and New
Yorkers spend different amounts on eating out each month.

Example 1: Independent-Sample T-test where y1 and y2 are numeric

set.seed(0)
Spenders.Cleve <- rnorm(50, mean = 300, sd = 70)
Spenders.NY <- rnorm(50, mean = 350, sd = 70)
t.test(Spenders.Cleve, Spenders.NY, var.equal = TRUE)

Output:
Example 2: Where y1 is numeric and y2 is binary

set.seed(0)
Spenders.Cleve <- rnorm(50, mean = 300, sd = 70)
Spenders.NY <- rnorm(50, mean = 350, sd = 70)
# Combining both groups into one numeric vector with a two-level grouping variable
Amount.Spent <- c(Spenders.Cleve, Spenders.NY)
city.name <- c(rep("Cleveland", 50), rep("New York", 50))
t.test(Amount.Spent ~ city.name, var.equal = TRUE)

Output:

Example 3: With equal variance not assumed

set.seed(0)
Spenders.Cleve <- rnorm(50, mean = 300, sd = 70)
Spenders.NY <- rnorm(50, mean = 350, sd = 70)
t.test(Spenders.Cleve, Spenders.NY, var.equal = FALSE)

Output:
Chi-Square Test
The Chi-Square Test is used to analyze a frequency table (i.e., a contingency table)
formed by two categorical variables. It evaluates whether there is a significant
association between the categories of the two variables.

The variables should come from the same population and should be categorical, such
as Yes/No, Red/Green, Male/Female, etc.

R provides the chisq.test() function to perform the chi-square test. This function
takes as input a table containing the count values of the variables in the
observations. The syntax of the chisq.test() function is:

chisq.test(data)
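Under the hypothesis of independence, the expected count in each cell is (row total × column total) / grand total. The statistic is the sum of (observed - expected)² / expected over all cells, with (rows - 1) × (columns - 1) degrees of freedom. The sketch below checks this against chisq.test() using a table of cylinders by transmission type from the built-in mtcars data, chosen only for illustration. For tables larger than 2 × 2, chisq.test() applies no continuity correction, so the two results should match.

```r
# Contingency table: number of cylinders (rows) by transmission type (columns)
tab <- table(mtcars$cyl, mtcars$am)

# Expected counts under independence: row total * column total / grand total
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)

# Chi-square statistic, degrees of freedom and upper-tail p-value
x2 <- sum((tab - expected)^2 / expected)
dof <- (nrow(tab) - 1) * (ncol(tab) - 1)
p_value <- pchisq(x2, df = dof, lower.tail = FALSE)

# chisq.test() warns here because some expected counts are below 5,
# so the warning is suppressed for this comparison
res <- suppressWarnings(chisq.test(tab))
print(c(manual = x2, chisq_test = unname(res$statistic)))
print(c(manual = p_value, chisq_test = res$p.value))
```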

Let's see an example in which we take the Cars93 data from the "MASS" package. This
dataset contains information on 93 car models on sale in the USA in 1993.

Data:

library("MASS")
str(Cars93)

Output:

Example:

# Loading the MASS library.
library("MASS")
# Creating a contingency table of airbag type by car type.
car_data <- table(Cars93$AirBags, Cars93$Type)
print(car_data)
# Performing the Chi-Square test.
# R may warn that the approximation may be incorrect when some expected counts are small.
print(chisq.test(car_data))

Output:

R Classification
The idea of the classification algorithm is very simple. We predict the target class by
analyzing the training dataset. We use training datasets to obtain better boundary
conditions that can be used to determine each target class. Once the boundary
condition is determined, the next task is to predict the target class. The entire
process is known as classification.

Some important terms used in classification are:

o Classifier
It is an algorithm that maps the input data to a specific category.
o Classification model
A classification model tries to draw conclusions from the input values given for
training. These conclusions are used to predict the class labels/categories of new data.
o Feature
It is an individual measurable property of an event being observed.
o Binary classification
It is a classification task that has two possible outcomes, e.g., gender classification,
which has only two possible outcomes, i.e., male and female.
o Multi-class classification
It is a classification task with more than two classes, in which each sample belongs to
exactly one class. For example, an animal can be a dog or a cat, but not both at the
same time.
o Multi-label classification
It is a classification task in which each sample is mapped to a set of target labels. For
example, a news article can be about a person, a location, and sports at the same
time.

Types of Classification Algorithms


In R, classification algorithms can be broadly divided into the following types:

o Linear classifier
In machine learning, the main task of statistical classification is to use an object's characteristics to identify the class it belongs to. A linear classifier makes this decision based on the value of a linear combination of the characteristics. Commonly used linear classification algorithms in R include:
1. Logistic Regression
2. Naive Bayes classifier
3. Fisher's linear discriminant
o Support vector machines
A support vector machine (SVM) is a supervised learning algorithm used for classification and regression analysis. In an SVM, each data item is plotted as a point in n-dimensional space (where n is the number of attributes), with the value of each attribute being the value of a particular coordinate. The algorithm then finds the hyperplane that separates the classes with the largest margin.
Least squares support vector machines (LS-SVM) are a commonly used variant for classification.
o Quadratic classifiers
Quadratic classification algorithms are based on Bayes' theorem, and their approach to classification differs from that of logistic regression. In logistic regression, the probability that an observation belongs to a class (Y = k), given a particular observation (X = x), is modeled directly. A quadratic classifier instead works in the following two steps:
1. In the first step, we model the distribution of the input X separately for each of the groups or classes.
2. After that, we flip these distributions with the help of Bayes' theorem to calculate the probability P(Y = k | X = x).
When each class is modeled as a Gaussian with its own covariance matrix, the resulting decision boundaries are quadratic, which gives these classifiers their name.
o Kernel estimation
Kernel estimation is a non-parametric way of estimating the Probability Density Function (PDF) of a continuous random variable. It is non-parametric because it assumes no underlying distribution for the variable. Essentially, a kernel function is placed on each datum, with the datum at its center, so that the kernel is symmetric about the datum. The PDF is then estimated by adding all these kernel functions and dividing the sum by the number of data points. This ensures that the estimate satisfies the two properties of a PDF:
1. Every value of the PDF is non-negative.
2. The definite integral of the PDF over its support is equal to 1.

In R, the k-nearest neighbors (k-NN) algorithm is a commonly used kernel estimation method for classification.

o Decision Trees
A decision tree is a supervised learning algorithm that is used for classification and regression tasks. In R, decision tree classifiers can be trained with the help of the caret machine learning package. Random forest, an ensemble of many decision trees, is one of the most widely used tree-based algorithms in R.
o Neural networks
A neural network is another classification algorithm. It is inspired by the human brain and is trained to perform a particular task or function. Neural networks are often applied to tasks such as image classification. To implement neural network algorithms in R, we can install the neuralnet package.
o Learning vector quantization
Learning vector quantization (LVQ) is a classification algorithm that is used for binary and multi-class problems. By learning from the training dataset, the LVQ model creates codebook vectors that represent class regions. During training, the codebook vector that best matches (is closest to) each training instance is updated. If it belongs to the same class as the instance, it is moved closer to the instance; if it does not, it is moved away.
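For the linear classifier type described above, here is a minimal logistic regression sketch. It assumes the built-in mtcars data, where am is a binary target (0 = automatic, 1 = manual) and car weight wt is the single feature. glm() with family = binomial is base R's logistic regression. The 0.5 threshold is an illustrative choice.

```r
# Binary target am (0 = automatic, 1 = manual) from the built-in mtcars data
fit <- glm(am ~ wt, data = mtcars, family = binomial)

# The decision depends on the linear combination b0 + b1 * wt;
# type = "response" maps it through the logistic function to a probability
prob <- predict(fit, type = "response")

# Classify with a 0.5 probability threshold
pred <- ifelse(prob > 0.5, 1, 0)
print(coef(fit))
print(mean(pred == mtcars$am))
```

Because the decision depends only on b0 + b1 * wt, the class boundary is the single weight value -b0 / b1.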
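For the support vector machine type, here is a minimal base-R sketch of a linear soft-margin SVM trained by stochastic subgradient descent on the hinge loss. Note that this is a standard linear SVM, not the least squares variant. The toy data, the helper name train_linear_svm, and the values of lambda, eta and epochs are all hypothetical choices for illustration.

```r
set.seed(1)
# Hypothetical toy data: two well-separated 2-D clusters labelled +1 and -1
n <- 50
X <- rbind(matrix(rnorm(2 * n, mean = 2, sd = 0.5), ncol = 2),
           matrix(rnorm(2 * n, mean = -2, sd = 0.5), ncol = 2))
y <- c(rep(1, n), rep(-1, n))

# Hypothetical helper: minimises lambda/2 * ||w||^2 + mean(max(0, 1 - y * (X w + b)))
# by stochastic subgradient descent
train_linear_svm <- function(X, y, lambda = 0.01, eta = 0.01, epochs = 100) {
  w <- rep(0, ncol(X))
  b <- 0
  for (epoch in seq_len(epochs)) {
    for (i in sample(nrow(X))) {
      margin <- y[i] * (sum(w * X[i, ]) + b)
      if (margin < 1) {
        # Point is inside the margin or misclassified: hinge term is active
        w <- w - eta * (lambda * w - y[i] * X[i, ])
        b <- b + eta * y[i]
      } else {
        # Only the regularisation term contributes
        w <- w - eta * lambda * w
      }
    }
  }
  list(w = w, b = b)
}

model <- train_linear_svm(X, y)
# Predicted class is the sign of the decision function w . x + b
pred <- ifelse(X %*% model$w + model$b >= 0, 1, -1)
print(mean(pred == y))
```

In practice, an SVM package would normally be used instead. This sketch only makes the margin-based update rule visible.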
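For the quadratic classifier type, the two steps can be written out by hand, assuming each class follows a multivariate Gaussian with its own mean and covariance (the usual QDA assumption). The sketch uses the built-in iris data. The helper names dmvn and posterior are hypothetical, and priors are taken as class proportions.

```r
# Step 1: model the distribution of X within each class as a multivariate
# Gaussian with its own mean vector and covariance matrix
X <- as.matrix(iris[, 1:4])
classes <- levels(iris$Species)
params <- lapply(classes, function(k) {
  Xk <- X[iris$Species == k, , drop = FALSE]
  list(mu = colMeans(Xk), S = cov(Xk), prior = nrow(Xk) / nrow(X))
})

# Hypothetical helper: multivariate normal density written out by hand
dmvn <- function(x, mu, S) {
  d <- length(mu)
  diff <- x - mu
  q <- sum(diff * solve(S, diff))
  exp(-0.5 * q) / sqrt((2 * pi)^d * det(S))
}

# Step 2: flip with Bayes' theorem,
# P(Y = k | X = x) is proportional to P(X = x | Y = k) * P(Y = k)
posterior <- function(x) {
  num <- sapply(params, function(p) dmvn(x, p$mu, p$S) * p$prior)
  num / sum(num)
}

post <- t(apply(X, 1, posterior))  # one row of class probabilities per flower
pred <- factor(classes[max.col(post)], levels = classes)
print(mean(pred == iris$Species))
```

MASS::qda() packages the same idea. The manual version shows where the per-class covariance, and hence the quadratic boundary, comes in.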
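For the kernel estimation type, here is a minimal Gaussian kernel density estimate that checks the two PDF properties listed above. The sample and the bandwidth h are assumptions for illustration. With a bandwidth, the sum of kernels is divided by n * h rather than just n, so that each scaled kernel integrates to 1.

```r
set.seed(1)
x <- rnorm(50)   # hypothetical sample from an unknown continuous distribution
h <- 0.5         # bandwidth (assumed value; controls the width of each kernel)

# Gaussian kernel density estimate: one kernel centred on each datum,
# summed and divided by n * h
kde <- function(t) {
  sapply(t, function(ti) sum(dnorm((ti - x) / h)) / (length(x) * h))
}

grid <- seq(min(x) - 3, max(x) + 3, length.out = 200)
density_values <- kde(grid)

# Property 1: the estimate is non-negative everywhere on the grid
print(all(density_values >= 0))
# Property 2: it integrates to (approximately) 1
total <- integrate(kde, lower = min(x) - 10 * h, upper = max(x) + 10 * h)$value
print(total)
```

Base R's density() function computes the same kind of estimate with a Gaussian kernel by default.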
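For the decision tree type, here is a short sketch on the built-in iris data. Note that it swaps in the rpart package, which ships with standard R installations, instead of the caret workflow mentioned above.

```r
library(rpart)  # recursive partitioning trees; a recommended package bundled with R

# Fit a classification tree: each internal node splits on one feature threshold
fit <- rpart(Species ~ ., data = iris, method = "class")
print(fit)

# Predict class labels for the training data
pred <- predict(fit, newdata = iris, type = "class")
print(table(predicted = pred, actual = iris$Species))
print(mean(pred == iris$Species))
```

Accuracy measured on the training data is optimistic. A held-out test set gives a fairer estimate.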
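For learning vector quantization, here is a minimal base-R sketch of the LVQ1 update rule described above, with one codebook vector per class. The toy data, the helper names train_lvq1 and predict_lvq, the starting learning rate, and the linear decay schedule are hypothetical choices for illustration.

```r
set.seed(2)
# Hypothetical toy data: two well-separated 2-D clusters
n <- 50
X <- rbind(matrix(rnorm(2 * n, mean = 2, sd = 0.5), ncol = 2),
           matrix(rnorm(2 * n, mean = -2, sd = 0.5), ncol = 2))
labels <- factor(c(rep("A", n), rep("B", n)))

# Hypothetical LVQ1 trainer: one codebook vector per class, initialised
# from the first training point of that class
train_lvq1 <- function(X, labels, lr = 0.3, epochs = 20) {
  classes <- levels(labels)
  codebook <- t(sapply(classes, function(k) X[which(labels == k)[1], ]))
  for (epoch in seq_len(epochs)) {
    rate <- lr * (1 - (epoch - 1) / epochs)  # learning rate decays linearly
    for (i in sample(nrow(X))) {
      # Best matching unit: the codebook vector closest to the instance
      d <- colSums((t(codebook) - X[i, ])^2)
      bmu <- which.min(d)
      step <- rate * (X[i, ] - codebook[bmu, ])
      if (classes[bmu] == labels[i]) {
        codebook[bmu, ] <- codebook[bmu, ] + step  # same class: move closer
      } else {
        codebook[bmu, ] <- codebook[bmu, ] - step  # different class: move away
      }
    }
  }
  list(codebook = codebook, classes = classes)
}

# Classify each row by the class of its nearest codebook vector
predict_lvq <- function(model, X) {
  idx <- apply(X, 1, function(x) which.min(colSums((t(model$codebook) - x)^2)))
  factor(model$classes[idx], levels = model$classes)
}

model <- train_lvq1(X, labels)
print(mean(predict_lvq(model, X) == labels))
```

Real LVQ models usually place several codebook vectors per class, so that class regions with more complex shapes can be represented.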
