intro to R
R packages are collections of R functions, sample data, and compiled code. In the
R environment, installed packages are stored in a directory called "library". R installs
a standard set of packages during installation, and we can add more later when they
are needed for a specific purpose. Only the default packages are available when
we start the R console; other installed packages must be loaded explicitly before an
R program can use them.
The following commands are used to check, verify, and use R packages.
.libPaths()
When the above code executes, it produces the following output, which may vary
depending on the local settings of our PC or laptop.
[1] "C:/Users/ajeet/OneDrive/Documents/R/win-library/3.6"
[2] "C:/Program Files/R/R-3.6.1/library"
library()
When we execute the above function, it lists all installed packages; the result
varies depending on the local settings of our PC or laptop.
Like the library() function, R provides the search() function to list all packages
currently loaded in the R environment.
search()
When we execute the above code, it produces a result that varies depending on
the local settings of our PC or laptop:
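The three commands can be tried together. The following is a minimal sketch that assumes only a standard R installation; the variable names lib_dirs and attached are illustrative, and the tools package is used only because it ships with R without being attached at startup.

```r
# Directories that R searches for installed packages (varies by machine).
lib_dirs <- .libPaths()
print(lib_dirs)

# The tools package is installed with R but is not attached at startup,
# so it has to be loaded explicitly.
library(tools)

# search() lists what is currently attached; the new entry should be present.
attached <- search()
print("package:tools" %in% attached)
```

The last line should print TRUE, because library() has attached the package.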
List of R packages
R is a language for data science with a vast repository of packages. These
packages serve the many different fields that use R for their data work. CRAN hosts
more than 10,000 packages, making it a rich source of statistical software. We will
discuss only the most important ones.
Some of the most widely used packages are as follows:
1) tidyr
The name tidyr comes from the word tidy, which means clean and orderly. The tidyr
package is used to make data "tidy". It works well with dplyr and is an evolution of
the reshape2 package.
2) ggplot2
R allows us to create graphics declaratively with the ggplot2 package. It is known
for its elegant, high-quality graphs, which set it apart from other visualization
packages.
3) ggraph
The ggraph package extends ggplot2 to graphs and networks. It removes ggplot2's
dependency on tabular data.
4) dplyr
The dplyr package is used for data wrangling and data analysis. It provides a set of
functions for manipulating data frames in R.
5) tidyquant
The tidyquant package is used for quantitative financial analysis. It brings
financial tooling to the tidyverse for importing, analyzing, and visualizing
financial data.
6) dygraphs
The dygraphs package provides an R interface to the dygraphs JavaScript charting
library. It is mainly used for plotting time-series data in R.
7) leaflet
For creating interactive maps, R provides the leaflet package, an interface to the
open-source Leaflet JavaScript library. Popular websites such as the New York
Times, GitHub, and Flickr use Leaflet. The leaflet package makes it easy to build
such maps from R.
8) ggmap
The ggmap package is used for spatial visualization. It is a mapping package that
includes tools for geolocating and routing.
9) glue
The glue package is used to build strings. It evaluates R expressions embedded
within a string and inserts their results into the string.
10) shiny
The shiny package allows us to develop interactive and aesthetically pleasing web
apps. Shiny apps can be extended with HTML widgets, CSS, and JavaScript.
11) plotly
The plotly package provides interactive, high-quality graphs for the web. It builds
on the JavaScript library plotly.js.
12) tidytext
The tidytext package provides text-mining functions for processing words and
carrying out analysis with ggplot2, dplyr, and other tidy tools.
13) stringr
The stringr package provides simple, consistent wrappers around the stringi
package for common string operations.
14) reshape2
This package facilitates flexible reorganization and aggregation of data using the
melt() and dcast() functions.
15) dichromat
The dichromat package collapses red-green or green-blue color distinctions to
simulate the effects of color blindness, which helps in checking that a palette
remains readable.
16) digest
The digest package creates cryptographic hash digests of arbitrary R objects.
17) MASS
The MASS package provides a large number of statistical functions and datasets
that accompany the book "Modern Applied Statistics with S".
18) caret
The caret package supports classification and regression tasks. caretEnsemble is a
companion package that combines several caret models.
19) e1071
The e1071 package provides useful functions for data analysis, such as Naive
Bayes, Fourier transforms, SVMs, clustering, and other miscellaneous functions.
20) sentimentr
The sentimentr package provides functions for sentiment analysis. It calculates
text polarity at the sentence level and can aggregate by rows or grouping variables.
The following data types are used in R programming:
Data type | Example | Description
Logical | TRUE, FALSE | A data type with only two possible values, interpreted as true or false.
Numeric | 12, 32, 112, 5432 | Decimal values are called numeric in R; this is the default computational data type.
Integer | 3L, 66L, 2346L | The L suffix tells R to store the value as an integer.
Complex | z = 1+2i, t = 7+3i | A complex value has a real part and an imaginary part; the imaginary part is written with the suffix i.
Character | 'a', "good", "TRUE", '35.4' | A character value represents a string. We convert objects into character values with the as.character() function.
When we execute the following program, it will give us the following output:
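The original listing is not preserved at this point. As a minimal sketch, the data types above can be inspected with class(); all variable names here are illustrative.

```r
flag  <- TRUE       # logical
price <- 35.4       # numeric (stored as double)
count <- 3L         # integer; the L suffix requests integer storage
z     <- 1 + 2i     # complex
word  <- "good"     # character

# class() reports the data type of each value:
# logical, numeric, integer, complex, character
types <- c(class(flag), class(price), class(count), class(z), class(word))
print(types)

# as.character() converts other values to character strings.
print(as.character(price))   # "35.4"
```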
Data Structures in R Programming
Data structures are very important to understand. They are the objects we
manipulate day to day in R, and handling conversions between them is one of the
most common sources of frustration for beginners. We can say that everything in R
is an object.
Vectors
A vector is the most basic data object in R. There are six types of atomic vectors:
logical, integer, double, complex, character, and raw. "A vector is a collection of
elements which is most commonly of mode character, integer, logical or
numeric." A vector can be one of the following two types:
1. Atomic vector
2. List
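A minimal sketch of atomic vectors and their types follows; the variable names are illustrative. It also shows that an atomic vector holds a single type, so mixed input is coerced.

```r
v_dbl <- c(1.5, 2, 3)        # double
v_int <- 1:3                 # integer
v_chr <- c("a", "b")         # character
v_lgl <- c(TRUE, FALSE)      # logical

print(typeof(v_dbl))         # "double"
print(typeof(v_int))         # "integer"

# Mixing types in c() coerces everything to one common type (character here).
mixed <- c(1, "a", TRUE)
print(mixed)                 # "1" "a" "TRUE"

# A list keeps each element's own type instead of coercing.
nested <- list(1, "a", TRUE)
print(is.atomic(v_dbl))      # TRUE
print(is.list(nested))       # TRUE
```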
List
In R, a list is a container. Unlike an atomic vector, a list is not restricted to a
single mode; it can contain a mixture of data types. Lists are also known as generic
vectors because each element can be any type of R object. "A list is a
special type of vector in which each element can be a different type."
We can create a list with list() or as.list(). We can use vector("list", n) to create
an empty list of a required length.
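The three ways of creating a list can be sketched as follows. The list contents and names (person, empty, from_vector) are hypothetical examples.

```r
# list() builds a list whose elements may have different types.
person <- list(name = "Asha", age = 30L, scores = c(90, 85.5))
print(person$name)               # "Asha"
print(person[["scores"]][2])     # 85.5

# vector("list", n) creates an empty list of length n (every element is NULL).
empty <- vector("list", 3)
print(length(empty))             # 3

# as.list() converts a vector into a list with one element per value.
from_vector <- as.list(1:3)
print(class(from_vector))        # "list"
```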
Arrays
Arrays are data objects that can store data in more than two dimensions. "An
array is a collection of a similar data type with contiguous memory
allocation." For example, if we create an array of dimension (2, 3, 4), it contains
four rectangular matrices, each with two rows and three columns.
In R, an array is created with the array() function. This function takes a vector
as input and uses the values in the dim parameter to create the array.
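The (2, 3, 4) case described above can be sketched like this; the input values 1:24 are an arbitrary choice for illustration.

```r
# 24 values arranged as four 2 x 3 matrices; values fill column by column.
arr <- array(1:24, dim = c(2, 3, 4))

print(dim(arr))        # 2 3 4
print(arr[, , 1])      # the first 2 x 3 matrix
print(arr[1, 1, 2])    # 7: first cell of the second matrix
print(arr[2, 3, 4])    # 24: last cell of the last matrix
```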
Matrices
A matrix is an R object in which the elements are arranged in a two-dimensional
rectangular layout. All elements of a matrix are of the same atomic type. For
mathematical calculations, we use a matrix containing numeric elements. A
matrix is created with the matrix() function in R.
Syntax
matrix(data, nrow, ncol, byrow, dimnames)
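A short sketch of matrix() in use follows, assuming small integer input chosen for illustration. It contrasts the default column-wise fill with byrow = TRUE.

```r
# By default the data fills the matrix column by column.
m <- matrix(1:6, nrow = 2, ncol = 3)
print(m)
print(m[1, 2])         # 3

# byrow = TRUE fills the matrix row by row instead.
m_byrow <- matrix(1:6, nrow = 2, byrow = TRUE)
print(m_byrow)
print(m_byrow[1, 2])   # 2

# Arithmetic applies to every element.
print(m * 2)
```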
Data Frames
A data frame is a two-dimensional, table-like structure in which each column
contains the values of one variable and each row contains one set of values, one
from each column.
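As a minimal sketch, a data frame can be built with data.frame(); the column names and values below are made up for illustration.

```r
# Each argument becomes a column; all columns must have the same length.
students <- data.frame(
  name  = c("Asha", "Ravi", "Meera"),
  marks = c(83, 67, 54),
  stringsAsFactors = FALSE
)

print(nrow(students))      # 3 rows
print(ncol(students))      # 2 columns
print(students$marks)      # one column, returned as a vector

# Rows can be filtered with a logical condition on a column.
passed <- students[students$marks > 60, ]
print(passed)
```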
Factors
Factors are data objects used to categorize data and store it as levels. They can be
built from both strings and integers. Factors are useful for columns that have a
limited number of unique values, and they matter in data analysis for statistical
modeling.
Factors are created with the factor() function, which takes a vector as input.
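The following sketch creates a factor from a character vector. The data and the explicit level order are assumptions chosen for illustration.

```r
sizes <- c("small", "large", "medium", "small", "large")

# levels fixes the set of categories and their order.
f <- factor(sizes, levels = c("small", "medium", "large"))

print(levels(f))        # "small" "medium" "large"
print(nlevels(f))       # 3
print(as.integer(f))    # 1 3 2 1 3: each value is stored as a level index
print(table(f))         # counts per level
```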
Variables in R Programming
Variables are used to store information to be manipulated and referenced in an
R program. An R variable can store an atomic vector, a group of atomic vectors, or a
combination of many R objects.
Languages like C++ are statically typed, but R is dynamically typed, which means it
checks the type of a value when the statement runs. A valid variable name consists
of letters, numbers, dots, and underscores. It must start with a letter, or with a dot
that is not followed by a number.
Variable name | Validity | Reason
var_name, var.name | Valid | The name contains letters with an underscore or a dot and starts with a letter. A name may also start with a dot, but not with a dot followed by a number.
var_name% | Invalid | No special character other than the dot and the underscore is allowed in a variable name.
.2var_name | Invalid | A variable name cannot start with a dot that is followed by a digit.
var_name2 | Valid | The name contains letters, a number, and an underscore, and starts with a letter.
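The naming rules can be checked programmatically. The helper is_valid_name() below is a hypothetical function written for this sketch; it relies on base R's make.names(), which rewrites any string that is not a syntactically valid name.

```r
# A name is valid if make.names() leaves it unchanged.
is_valid_name <- function(name) {
  name == make.names(name)
}

candidates <- c("var_name", "var.name", "var_name%", ".2var_name", "var_name2")
validity <- is_valid_name(candidates)
names(validity) <- candidates
print(validity)
# var_name and var.name are valid, var_name% and .2var_name are not,
# var_name2 is valid.
```

Because make.names() also rewrites reserved words, the same check rejects names such as if or NULL.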
Assignment of variable
In R programming, there are three operators that assign values to a variable: the
leftward operator (<-), the rightward operator (->), and the equals operator (=).
Two functions print the value of a variable: print() and cat(). The cat() function
combines multiple values into one continuous print output.
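A minimal sketch of the three assignment forms and the two printing functions follows; the variable names and values are illustrative.

```r
x <- 10      # leftward assignment
20 -> y      # rightward assignment
z = 30       # equals-sign assignment

# print() shows one object, with an index prefix such as [1].
print(x)

# cat() joins several values into a single line of output.
cat("x, y and z are", x, y, z, "\n")
```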
When we execute the above code in our R command prompt, it will give us the
following output:
We can check the data type of a variable with the class() function. Let's
see an example:
variable_y <- 124
cat("The data type of variable_y is", class(variable_y), "\n")

variable_y <- "Learn R Programming"
cat("Now the data type of variable_y is", class(variable_y), "\n")

variable_y <- 133L
cat("Next the data type of variable_y becomes", class(variable_y), "\n")
When we execute the above code in our R command prompt, it will give us the
following output:
The data type of variable_y is numeric
Now the data type of variable_y is character
Next the data type of variable_y becomes integer
Keywords in R Programming
In programming, a keyword is a word reserved by the language because it has
a special meaning. A keyword can be a command or a parameter. As in C, C++, and
Java, R has a set of keywords. A keyword can't be used as a variable name.
Keywords are also called "reserved names."
if else repeat
while function for
next break TRUE
FALSE NULL Inf
NaN NA NA_integer_
NA_real_ NA_complex_ NA_character_
1) if
The if statement consists of a Boolean expression followed by one or more
statements. In R, the if statement is the simplest conditional statement; it decides
whether a block of statements is executed or not.
Example
a <- 11
if (a < 15)
  print("I am less than 15")
Output:
[1] "I am less than 15"
2) else
The else statement is associated with the if statement. The else block is executed
only when the condition of the if statement is false. Let's see an example to make
it clear:
Example:
a <- 22
if (a < 20) {
  cat("I am less than 20")
} else {
  cat("I am larger than 20")
}
Output:
I am larger than 20
3) repeat
The repeat keyword is used to run a block of code repeatedly. In R, repeat is a
loop that has no condition for exiting; to exit the loop, we use the break
statement.
Example:
x <- 1
repeat {
  cat(x)
  x <- x + 1
  if (x == 6) {
    break
  }
}
Output:
12345
4) while
The while keyword creates a loop that runs as long as the given condition is
true. It can also be used to make an infinite loop.
Example:
a <- 20
while (a != 0) {
  cat(a)
  a <- a - 2
}
Output:
2018161412108642
5) function
A function is an object in R programming. The keyword function is used to create a
user-defined function. R also has predefined functions, such as seq, mean, and
sum.
Example:
new.function <- function(n) {
  for (i in 1:n) {
    a <- i^2
    print(a)
  }
}
new.function(6)
Output:
[1] 1
[1] 4
[1] 9
[1] 16
[1] 25
[1] 36
6) for
The for keyword is used for looping over a sequence, such as a vector or a list.
With a for loop, we can execute a set of statements once for each item in the
sequence.
Example:
v <- LETTERS[1:4]
for (i in v) {
  print(i)
}
Output:
[1] "A"
[1] "B"
[1] "C"
[1] "D"
7) next
The next keyword skips the current iteration of a loop without terminating it. When
R encounters next, it skips further evaluation and starts the next iteration of the
loop.
Example:
v <- LETTERS[1:6]
for (i in v) {
  if (i == "D") {
    next
  }
  print(i)
}
Output:
[1] "A"
[1] "B"
[1] "C"
[1] "E"
[1] "F"
8) break
The break keyword is used to terminate a loop when a condition is true. Control
of the program then passes to the first statement after the loop.
Example:
n <- 1
while (n < 10) {
  if (n == 3)
    break
  n <- n + 1
  cat(n, "\n")
}
cat("End of the program")
Output:
2
3
End of the program
9) TRUE/FALSE
The TRUE and FALSE keywords represent the Boolean values true and false. If a
condition holds, the interpreter returns TRUE; otherwise it returns FALSE.
10) NULL
In R, NULL represents the null object. It is often returned by expressions and
functions whose value is undefined. NULL has length zero and is different from NA,
which marks a missing value inside a vector.
Example:
as.null(list(a = 1, b = "c"))
Output:
NULL
11) Inf and NaN
Inf and -Inf are positive and negative infinity. NaN stands for 'Not a Number'. NaN
applies to numeric values and to the real and imaginary parts of complex values, but
not to the values of integer vectors.
Usage
is.finite(x)
is.infinite(x)
is.nan(x)

Inf
NaN
12) NA
NA is a logical constant of length 1 that contains a missing value indicator. It can be
coerced to any other vector type except raw. There are also the constants
NA_integer_, NA_real_, NA_complex_, and NA_character_ for the other atomic vector
types that support missing values.
Usage
NA
is.na(x)
anyNA(x, recursive = FALSE)

## S3 method for class 'data.frame'
is.na(x)

is.na(x) <- value
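The special values NULL, Inf, NaN, and NA are easy to confuse. This minimal sketch applies the test functions from the usage sections to one illustrative vector.

```r
x <- c(1, NA, 3, NaN, Inf, -Inf)

print(is.na(x))        # TRUE for NA and also for NaN
print(is.nan(x))       # TRUE only for NaN
print(is.finite(x))    # TRUE only for ordinary numbers (1 and 3)
print(is.infinite(x))  # TRUE only for Inf and -Inf
print(anyNA(x))        # TRUE

# Keep only the finite values before computing a statistic.
finite_mean <- mean(x[is.finite(x)])
print(finite_mean)     # 2

# NULL is an empty object, not a missing value inside a vector.
print(length(NULL))    # 0
print(is.null(NULL))   # TRUE
```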
Operators in R
In computer programming, an operator is a symbol which represents an action: it
tells the interpreter to perform a specific logical or mathematical manipulation.
R is rich in built-in operators, which fall into the following groups:
1. Arithmetic Operators
2. Relational Operators
3. Logical Operators
4. Assignment Operators
5. Miscellaneous Operators
Arithmetic Operators
Arithmetic operators are symbols that represent arithmetic operations. They act on
each element of a vector. R supports the following arithmetic operators. All
examples use a <- c(2, 3.3, 4) and b <- c(11, 5, 3).
1. +  Adds two vectors.
   print(a+b)
   Output: [1] 13.0  8.3  7.0
2. -  Subtracts the second vector from the first.
   print(a-b)
   Output: [1] -9.0 -1.7  1.0
3. *  Multiplies two vectors element by element.
   print(a*b)
   Output: [1] 22.0 16.5 12.0
4. /  Divides the first vector by the second.
   print(a/b)
   Output: [1] 0.1818182 0.6600000 1.3333333
5. %%  Gives the remainder of dividing the first vector by the second.
   print(a%%b)
   Output: [1] 2.0 3.3 1.0
6. %/%  Gives the integer quotient of dividing the first vector by the second.
   print(a%/%b)
   Output: [1] 0 0 1
7. ^  Raises the first vector to the power of the second vector.
   print(a^b)
   Output: [1] 2048.0000  391.3539   64.0000
Relational Operators
A relational operator is a symbol that defines a relation between two entities, such
as numerical equality or inequality. A relational operator compares each element of
the first vector with the corresponding element of the second vector and returns a
logical vector with one TRUE or FALSE per element. R supports the following
relational operators:
1. >  TRUE where the element of the first vector is greater than the corresponding element of the second vector.
   a <- c(1, 3, 5)
   b <- c(2, 4, 6)
   print(a>b)
   Output: [1] FALSE FALSE FALSE
2. <  TRUE where the element of the first vector is less than the corresponding element of the second vector.
   a <- c(1, 9, 5)
   b <- c(2, 4, 6)
   print(a<b)
   Output: [1]  TRUE FALSE  TRUE
3. <=  TRUE where the element of the first vector is less than or equal to the corresponding element of the second vector.
   a <- c(1, 3, 5)
   b <- c(2, 3, 6)
   print(a<=b)
   Output: [1] TRUE TRUE TRUE
4. >=  TRUE where the element of the first vector is greater than or equal to the corresponding element of the second vector.
   a <- c(1, 3, 5)
   b <- c(2, 3, 6)
   print(a>=b)
   Output: [1] FALSE  TRUE FALSE
5. ==  TRUE where the element of the first vector is equal to the corresponding element of the second vector.
   a <- c(1, 3, 5)
   b <- c(2, 3, 6)
   print(a==b)
   Output: [1] FALSE  TRUE FALSE
6. !=  TRUE where the element of the first vector is not equal to the corresponding element of the second vector.
   a <- c(1, 3, 5)
   b <- c(2, 3, 6)
   print(a!=b)
   Output: [1]  TRUE FALSE  TRUE
Logical Operators
Logical operators allow a program to make a decision on the basis of multiple
conditions. Each operand is treated as a condition that evaluates to a true or false
value, and the values of the conditions determine the overall value of
op1 operator op2. Logical operators are applicable to vectors whose type is
logical, numeric, or complex; any non-zero value counts as TRUE.
The element-wise operators compare each element of the first vector with the
corresponding element of the second vector. All examples use
a <- c(3, 0, TRUE, 2+2i) and b <- c(2, 4, TRUE, 2+3i).
1. &  Element-wise logical AND. The result is TRUE where both corresponding elements are TRUE.
   print(a&b)
   Output: [1]  TRUE FALSE  TRUE  TRUE
2. |  Element-wise logical OR. The result is TRUE where at least one of the corresponding elements is TRUE.
   print(a|b)
   Output: [1] TRUE TRUE TRUE TRUE
3. !  Logical NOT. It negates each element of the vector.
   print(!a)
   Output: [1] FALSE  TRUE FALSE FALSE
4. &&  Logical AND on single values. It gives TRUE only if both operands are TRUE. Since R 4.3.0 both operands must have length one, so we select the first elements explicitly.
   print(a[1] && b[1])
   Output: [1] TRUE
5. ||  Logical OR on single values. It gives TRUE if at least one operand is TRUE, with the same length-one requirement.
   print(a[1] || b[1])
   Output: [1] TRUE
Assignment Operators
An assignment operator is used to assign a new value to a variable. In R, these
operators are used to assign values to vectors. There are the following types of
assignment operators:
1. <- or = or <<-  These operators are known as left assignment operators.
   a <- c(3, 0, TRUE, 2+2i)
   b <<- c(2, 4, TRUE, 2+3i)
   d = c(1, 2, TRUE, 2+3i)
   print(a)
   print(b)
   print(d)
2. -> or ->>  These operators are known as right assignment operators.
   c(3, 0, TRUE, 2+2i) -> a
   c(2, 4, TRUE, 2+3i) ->> b
   print(a)
   print(b)
Miscellaneous Operators
Miscellaneous operators are used for special, specific purposes rather than for
general mathematical or logical computation. R supports several such operators,
including : (sequence), %in% (membership), and %*% (matrix multiplication).
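The table of miscellaneous operators is not preserved here. As a hedged sketch, three operators commonly placed in this group are the colon operator, %in%, and %*%; the values below are illustrative.

```r
# : creates a sequence of consecutive numbers.
s <- 2:6
print(s)                  # 2 3 4 5 6

# %in% tests whether a value is an element of a vector.
print(3 %in% s)           # TRUE
print(10 %in% s)          # FALSE

# %*% performs matrix multiplication (not element-wise multiplication).
m <- matrix(1:4, nrow = 2)
product <- m %*% m
print(product)            # rows: 7 15 and 10 22
```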
Conditional Statements
if Statement
The if statement consists of a Boolean expression followed by one or more
statements. It is the simplest decision-making statement and lets us take a decision
on the basis of a condition.
The block of code inside the if statement is executed only when the Boolean
expression evaluates to true. If the expression evaluates to false, the block is
skipped and execution continues with the code after the if statement.
if (boolean_expression) {
  # If the boolean expression is true, then statement(s) will be executed.
}
Let's see some examples to understand how if statements work and perform a
certain task in R.
Example 1
x <- 24L
y <- "shubham"
if (is.integer(x)) {
  print("x is an Integer")
}
Output:
[1] "x is an Integer"
Example 2
x <- 20
y <- 24
count <- 0
if (x < y) {
  cat(x, "is a smaller number\n")
  count <- 1
}
if (count == 1) {
  cat("Block is successfully executed")
}
Output:
20 is a smaller number
Block is successfully executed
Example 3
x <- 1
y <- 24
while (x < y) {
  cat(x, "is a smaller number\n")
  x <- x + 2
  if (x == 15)
    break
}
Output:
1 is a smaller number
3 is a smaller number
5 is a smaller number
7 is a smaller number
9 is a smaller number
11 is a smaller number
13 is a smaller number
Example 4
x <- 24
if (x %% 2 == 0) {
  cat(x, "is an even number")
}
if (x %% 2 != 0) {
  cat(x, "is an odd number")
}
Output:
24 is an even number
Example 5
year1 <- 2011
if (year1 %% 4 == 0) {
  if (year1 %% 100 == 0) {
    if (year1 %% 400 == 0) {
      cat(year1, "is a leap year")
    } else {
      cat(year1, "is not a leap year")
    }
  } else {
    cat(year1, "is a leap year")
  }
} else {
  cat(year1, "is not a leap year")
}
Output:
2011 is not a leap year
If-else statement
In an if-else statement, the else block is executed when the Boolean expression
is false. In simple words, if the Boolean expression is true, the if block is
executed; otherwise, the else block is executed.
R treats any non-zero numeric condition as true and zero as false. A NULL or NA
condition is not treated as false; it raises an error.
if (boolean_expression) {
  # statement(s) will be executed if the boolean expression is true.
} else {
  # statement(s) will be executed if the boolean expression is false.
}
Example 1
Output:
Example 2
x <- c("Hardwork", "is", "the", "key", "of", "success")

if ("key" %in% x) {
  print("key is found")
} else {
  print("key is not found")
}
Output:
[1] "key is found"
Example 3
a <- 100
# checking boolean condition
if (a < 20) {
  cat("a is less than 20")
  if (a %% 2 == 0) {
    cat(" and an even number\n")
  } else {
    cat(" but not an even number\n")
  }
} else {
  cat("a is greater than 20")
  if (a %% 2 == 0) {
    cat(" and an even number\n")
  } else {
    cat(" but not an even number\n")
  }
}
Output:
a is greater than 20 and an even number
Example 4
a <- 'u'
if (a=='a'||a=='e'||a=='i'||a=='o'||a=='u'||a=='A'||a=='E'||a=='I'||a=='O'||a=='U') {
  cat("character is a vowel\n")
} else {
  cat("character is a consonant\n")
}
cat("character is =", a)
Output:
character is a vowel
character is = u
R else if statement
This statement is also known as a nested if-else statement. The if statement is
followed by optional else if branches and an optional else branch. It is used to test
several conditions in a single if...else if statement. Keep the following key points
in mind when using the if...else if...else statement:
1. An if statement can have zero or one else branch, and it must come after all
else if branches.
2. An if statement can have many else if branches, and they must come before the
else branch.
3. Once an else if branch succeeds, none of the remaining else if or else branches
is tested.
if (boolean_expression_1) {
  # This block executes when boolean expression 1 is true.
} else if (boolean_expression_2) {
  # This block executes when boolean expression 2 is true.
} else if (boolean_expression_3) {
  # This block executes when boolean expression 3 is true.
} else {
  # This block executes when none of the above conditions is true.
}
Example 1
Output:
Example 2
marks <- 83
if (marks > 75) {
  print("First class")
} else if (marks > 65) {
  print("Second class")
} else if (marks > 55) {
  print("Third class")
} else {
  print("Fail")
}
Output:
[1] "First class"
Example 3
Output:
Example 4
x <- c("Hardwork", "is", "the", "key", "of", "success")
if ("Success" %in% x) {
  print("success is found in the first time")
} else if ("success" %in% x) {
  print("success is found in the second time")
} else {
  print("No success found")
}
Output:
[1] "success is found in the second time"
Example 5
n1 <- 4
n2 <- 87
n3 <- 43
n4 <- 74
# Each branch checks a candidate against all values not yet ruled out,
# so largest is always assigned.
if (n1 >= n2 && n1 >= n3 && n1 >= n4) {
  largest <- n1
} else if (n2 >= n3 && n2 >= n4) {
  largest <- n2
} else if (n3 >= n4) {
  largest <- n3
} else {
  largest <- n4
}
cat("Largest number is =", largest)
Output:
Largest number is = 87
R Switch Statement
A switch statement is a selection control mechanism that lets the value of an
expression change the control flow of program execution.
The switch statement is used in place of long if statements that compare a variable
with several values. It is a multi-way branch statement that provides an easy way
to dispatch execution to different parts of the code based on the value of the
expression.
This statement allows a variable to be tested for equality against a list of values. A
switch statement is a little bit complicated. To understand it, keep the following
key points in mind:
o If the expression is a character string, the string is matched against the names of the listed cases.
o If there is more than one match, the first matching element is used.
o There is no default keyword.
o If no case is matched, an unnamed case, if present, is used as the default.
There are basically two ways in which one of the cases is selected:
1) Based on Index
If the cases are plain values, such as the elements of a character vector, and the
expression evaluates to a number, then the expression's result is used as an index
to select the case.
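The key points above can be sketched with base R's switch(). The function describe() and the case labels are hypothetical names introduced only for this illustration.

```r
# Selection by index: a numeric expression picks the case by position.
pick_by_index <- switch(2, "first", "second", "third")
print(pick_by_index)        # "second"

# Selection by name: a character expression is matched against case names.
describe <- function(code) {
  switch(code,
    "a" = "apple",
    "b" = "banana",
    "unknown"               # unnamed last case acts as the default
  )
}
print(describe("b"))        # "banana"
print(describe("z"))        # "unknown"

# An index outside the range of cases yields NULL.
out_of_range <- switch(5, "first", "second")
print(is.null(out_of_range))   # TRUE
```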
Example 1
x <- switch(
  3,
  "Shubham",
  "Nishka",
  "Gunjan",
  "Sumit"
)
print(x)
Output:
[1] "Gunjan"
Example 2
ax <- 1
bx <- 2
y <- switch(
  ax + bx,
  "Hello, Shubham",
  "Hello Arpita",
  "Hello Vaishali",
  "Hello Nishka"
)
print(y)
Output:
[1] "Hello Vaishali"
Example 3
y <- "18"
x <- switch(
  y,
  "9" = "Hello Arpita",
  "12" = "Hello Vaishali",
  "18" = "Hello Nishka",
  "21" = "Hello Shubham"
)

print(x)
Output:
[1] "Hello Nishka"
Example 4
x <- "2"
y <- "1"
a <- switch(
  paste(x, y, sep = ""),
  "9" = "Hello Arpita",
  "12" = "Hello Vaishali",
  "18" = "Hello Nishka",
  "21" = "Hello Shubham"
)

print(a)
Output:
[1] "Hello Shubham"
Example 5
y <- "18"
a <- 10
b <- 2
# cat() prints its arguments and returns NULL, so x ends up NULL.
x <- switch(
  y,
  "9" = cat("Addition =", a + b, "\n"),
  "12" = cat("Subtraction =", a - b, "\n"),
  "18" = cat("Division =", a / b, "\n"),
  "21" = cat("Multiplication =", a * b, "\n")
)

print(x)
Output:
Division = 5
NULL
next Statement
The next statement skips any remaining statements in the current iteration of a
loop and continues with the next iteration, without terminating the loop. When R
encounters the next statement, it skips further evaluation and starts the next
iteration.
This statement is mostly used with for and while loops.
Note: The next statement can also be used in the else branch of an if-else
statement inside a loop.
Syntax
The next statement is written as follows:
next
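Example 1: next in repeat loop
a <- 1
repeat {
  if (a == 10)
    break
  if (a == 5) {
    a <- a + 1   # advance before skipping; otherwise the loop never ends
    next
  }
  print(a)
  a <- a + 1
}
Output:
[1] 1
[1] 2
[1] 3
[1] 4
[1] 6
[1] 7
[1] 8
[1] 9
Example 2: next in while loop
a <- 1
while (a < 10) {
  if (a == 5) {
    a <- a + 1   # advance before skipping; otherwise the loop never ends
    next
  }
  print(a)
  a <- a + 1
}
Output:
[1] 1
[1] 2
[1] 3
[1] 4
[1] 6
[1] 7
[1] 8
[1] 9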
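Example 3: next in for loop
x <- 1:10
for (val in x) {
  if (val == 3) {
    next
  }
  print(val)
}
Output:
[1] 1
[1] 2
[1] 4
[1] 5
[1] 6
[1] 7
[1] 8
[1] 9
[1] 10
Example 4
a1 <- c(10L, -11L, 12L, -13L, 14L, -15L, 16L, -17L, 18L)
total <- 0
for (i in a1) {
  if (i < 0) {
    next
  }
  total <- total + i
}
cat("The sum of all positive numbers in the vector is =", total)
Output:
The sum of all positive numbers in the vector is = 70
Example 5
j <- 0
while (j < 10) {
  if (j == 7) {
    j <- j + 1
    next
  }
  cat("number is =", j, "\n")
  j <- j + 1
}
Output:
number is = 0
number is = 1
number is = 2
number is = 3
number is = 4
number is = 5
number is = 6
number is = 8
number is = 9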
Break Statement
In the R language, the break statement stops the execution of a loop and exits it
immediately. In nested loops, break exits only the innermost loop, and control
transfers to the outer loop.
It is useful for managing the flow of program execution. We can use it in for,
while, and repeat loops.
When a break statement is reached inside a loop, the loop terminates immediately
and program control resumes at the next statement after the loop. Unlike in C,
break is not needed to end a case in R's switch(); each case returns its value on
its own.
Note: The break statement can also be placed inside the else branch of an
if...else statement within a loop.
Syntax
The break statement is written as follows:
break
Example 1
a <- 1
repeat {
  print("hello")
  if (a >= 5)
    break
  a <- a + 1
}
Output:
[1] "hello"
[1] "hello"
[1] "hello"
[1] "hello"
[1] "hello"
Example 2
v <- c("Hello", "loop")
count <- 2
repeat {
  print(v)
  count <- count + 1
  if (count > 5) {
    break
  }
}
Output:
[1] "Hello" "loop"
[1] "Hello" "loop"
[1] "Hello" "loop"
[1] "Hello" "loop"
Example 4
for (i in c(2, 4, 6, 8)) {
  for (j in c(1, 3)) {
    if (i == 6)
      break   # leaves only the inner loop; the outer loop continues with 8
    print(i)
  }
}
Output:
[1] 2
[1] 2
[1] 4
[1] 4
[1] 8
[1] 8
Example 5
num <- 7
flag <- 0
if (num > 1) {
  flag <- 1
  for (i in 2:(num - 1)) {
    if ((num %% i) == 0) {
      flag <- 0
      break
    }
  }
}
if (num == 2) flag <- 1
if (flag == 1) {
  print(paste(num, "is a prime number"))
} else {
  print(paste(num, "is not a prime number"))
}
Output:
[1] "7 is a prime number"
For Loop
A for loop is the most popular control flow statement. It is used to iterate over
the elements of a vector. It differs from the while loop in how it repeats: a while
loop re-checks a condition before each execution of the body, whereas a for loop
runs the body once for each element of a sequence and needs no explicit condition.
Example 1: We iterate all the elements of a vector and print the current value.
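The listing for this example is not preserved. A minimal sketch of iterating over a vector, using hypothetical sample data, could look like this:

```r
fruits <- c("apple", "banana", "cherry")   # hypothetical sample data
seen <- character(0)

# The loop variable takes the value of each element in turn.
for (fruit in fruits) {
  print(fruit)
  seen <- c(seen, fruit)
}

# seq_along() gives the positions when the index is needed as well.
for (i in seq_along(fruits)) {
  cat(i, fruits[i], "\n")
}
```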
Output
Output
# Creating a matrix
mat <- matrix(data = seq(10, 21, by = 1), nrow = 6, ncol = 2)
# Looping over rows (r) and columns (col) to visit every cell of the matrix
for (r in seq_len(nrow(mat))) {
  for (col in seq_len(ncol(mat))) {
    print(paste("mat[", r, ",", col, "]=", mat[r, col]))
  }
}
print(mat)
Output
Output
Example 5: Count the number of even numbers in a vector.
x <- c(2, 5, 3, 9, 8, 11, 6, 44, 43, 47, 67, 95, 33, 65, 12, 45, 12)
count <- 0
for (val in x) {
  if (val %% 2 == 0) count <- count + 1
}
print(count)
Output
[1] 6
R repeat loop
A repeat loop is used to iterate over a block of code. It is a special type of loop
that has no condition for exiting; to exit, we include a break statement with a
user-defined condition. This property makes it different from the other loops.
A repeat loop is constructed with the repeat keyword. Because it has no exit
condition of its own, it is very easy to create an infinite loop with it.
The basic syntax of the repeat loop is as follows:
repeat {
  commands
  if (condition) {
    break
  }
}
Flowchart
1. First, we initialize our variables; then control enters the repeat loop.
2. The loop executes the group of statements inside it.
3. Inside the loop, an expression tests whether to exit.
4. If the condition is true, a break statement exits the loop.
5. If the condition is false, the statements inside the repeat loop are executed again.
Example 1:
v <- c("Hello", "repeat", "loop")
cnt <- 2
repeat {
  print(v)
  cnt <- cnt + 1

  if (cnt > 5) {
    break
  }
}
Output
[1] "Hello"  "repeat" "loop"
[1] "Hello"  "repeat" "loop"
[1] "Hello"  "repeat" "loop"
[1] "Hello"  "repeat" "loop"
Example 2:
total <- 0
n1 <- readline(prompt = "Enter any integer value below 20: ")
n1 <- as.integer(n1)
repeat {
  total <- total + n1
  n1 <- n1 + 1
  if (n1 > 20) {
    break
  }
}
cat("The sum of numbers from the repeat loop is:", total)
Output
Example 3: Infinite repeat loop
total <- 0
number <- as.integer(readline(prompt = "please enter any integer value: "))
# There is no break, so this loop runs until it is interrupted.
repeat {
  total <- total + number
  number <- number + 1
  cat("sum is =", total, "\n")
}
Output
Example 4: repeat loop with next
a <- 1
repeat {
  if (a == 10)
    break
  if (a == 7) {
    a <- a + 1
    next
  }
  print(a)
  a <- a + 1
}
Output
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
[1] 6
[1] 8
[1] 9
Example 5:
Output
while loop
A while loop is a control flow statement which is used to iterate a block of
code several times. The while loop terminates when the value of the Boolean
expression becomes false.
In a while loop, the condition is checked first, and the body executes only if it
is true. As a result, when the body runs n times, the condition is checked n+1
times: the final check is the one that fails and ends the loop.
while (test_expression) {
  statement
}
Flowchart
Example 1:
v <- c("Hello","while loop","example")
cnt <- 2
while (cnt < 7) {
  print(v)
  cnt <- cnt + 1
}
Output
Output
Example 3: Program to check a number is palindrome or not.
Output
Example 4: Program to check a number is Armstrong or not.
Output
Example 5: program to find the frequency of a digit in the number.
Output
R Functions
A set of statements which are organized together to perform a specific task is known
as a function. R provides a series of in-built functions, and it allows the user to create
their own functions. Functions are used to perform tasks in the modular approach.
Functions are used to avoid repeating the same task and to reduce complexity. To
understand and maintain our code, we logically break it into smaller parts using the
function. A function should be
"An R function is created by using the keyword function." There is the following
syntax of R function:
Function Name
The function name is the actual name of the function. In R, the function is stored as
an object with its name.
Arguments
Function Body
The function body contains a set of statements which defines what the function does.
Return value
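The four components above can be seen together in a minimal sketch. The name rect_area and its arguments are hypothetical, chosen only for illustration:

```r
# Function name: rect_area (a hypothetical name for illustration).
# Arguments: width and height.
rect_area <- function(width, height) {
  # Function body: the statements that do the work.
  area <- width * height
  # Return value: handed back to the caller, made explicit with return().
  return(area)
}

result <- rect_area(3, 4)
print(result)
```

If return() is omitted, an R function returns the value of the last expression it evaluates.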
Built-in function
Functions which are already created or defined in the programming framework
are known as built-in functions. The user does not need to create these functions
and can use them simply by calling them. R has many built-in functions, such as
seq(), mean(), max(), and sum().
Output:
User-defined function
R allows us to create our own functions in a program. A user-defined function is
written by the user to fulfill a specific requirement. Once these functions are
created, we can use them like built-in functions.
Output:
Output:
Output:
Function calling with default arguments
To get the default result, we assign values to the arguments in the function
definition and then call the function without supplying those arguments. If we pass
an argument in the function call, the passed value replaces the default value given
in the function definition.
Output:
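A minimal sketch of default arguments follows. The function name power and its default values are assumptions chosen for illustration:

```r
# power() is a hypothetical function; base = 2 and exponent = 3 are its defaults.
power <- function(base = 2, exponent = 3) {
  base ^ exponent
}

default_result <- power()             # uses both defaults: 2 ^ 3
override_result <- power(5)           # base is replaced, exponent keeps its default
named_result <- power(exponent = 2)   # only exponent is replaced: 2 ^ 2

print(c(default_result, override_result, named_result))
```

Naming the argument in the call, as in the third call, lets us override a later default without supplying the earlier ones.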
R Built-in Functions
The functions which are already created or defined in the programming framework
are known as a built-in function. R has a rich set of functions that can be used to
perform almost every task for the user. These built-in functions are divided into the
following categories based on their functionality.
Math Functions
R provides the various mathematical functions to perform the mathematical
calculation. These mathematical functions are very helpful to find absolute value,
square value and much more calculations. In R, there are the following functions
which are used:
3. ceiling(x): It returns the smallest integer which is larger than or equal to x.
x <- 4.5
print(ceiling(x))
Output
[1] 5
4. floor(x): It returns the largest integer which is smaller than or equal to x.
x <- 2.5
print(floor(x))
Output
[1] 2
String Function
R provides various string functions to perform tasks. These string functions allow us
to extract sub string from string, search pattern etc. There are the following string
functions in R:
Statistical Probability Functions
1. dnorm(x, mean=0, sd=1, log=FALSE): It is used to find the height of the probability distribution at each point for a given mean and standard deviation.
a <- seq(
b <- dnor
png(file=
plot(x,y)
dev.off()
2. pnorm(q, mean=0, sd=1, lower.tail=TRUE, log.p=FALSE): It is used to find the probability that a normally distributed random number is less than the value of a given number.
a <- seq(
b <- dnor
png(file=
plot(x,y)
dev.off()
3. qnorm(p, mean=0, sd=1): It is used to find a number whose cumulative value matches the probability value.
a <- seq(
b <- qnor
png(file=
plot(x,y)
dev.off()
4. rnorm(n, mean=0, sd=1): It is used to generate random numbers whose distribution is normal.
y <- rnor
png(file=
hist(y, m
dev.off()
5. dbinom(x, size, prob): It is used to find the probability density distribution at each point.
a<-seq(0,
b<- dbino
png(file=
plot(x,y)
dev.off()
6. pbinom(q, size, prob): It is used to find the cumulative probability (a single value representing the probability) of an event.
a <- pbin
print(a)
Output
[1] 0.95
7. qbinom(p, size, prob): It is used to find a number whose cumulative value matches the probability value.
a <- qbin
print(a)
Output
[1] 18
8. rbinom(n, size, prob): It is used to generate the required number of random values of a given probability from a given sample.
a <- rbin
print(a)
Output
[1] 55
11. rpois(n, lambda): It is used to generate random numbers from the Poisson distribution.
rpois(10,
[1] 6 1
12. dunif(x, min=0, max=1): This function provides information about the uniform distribution on the interval from min to max. It gives the density.
dunif(x,
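The d/p/q/r naming pattern in the entries above can be sketched for the normal distribution. The input values and the seed are assumptions chosen for illustration:

```r
# Density: height of the standard normal curve at x = 0.
height <- dnorm(0, mean = 0, sd = 1)

# Cumulative probability: P(X <= 1.96) for a standard normal variable.
prob <- pnorm(1.96, mean = 0, sd = 1)

# Quantile: the inverse of pnorm(), recovering 1.96 from its probability.
quant <- qnorm(prob, mean = 0, sd = 1)

# Random generation: five draws; set.seed() makes the draws repeatable.
set.seed(42)
draws <- rnorm(5, mean = 0, sd = 1)

print(height)
print(prob)
print(quant)
print(draws)
```

The same four prefixes apply to the other distributions listed, for example dbinom(), pbinom(), qbinom(), and rbinom() for the binomial distribution.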
1. mean(x, trim=0, na.rm=FALSE): It is used to find the mean of the object x.
a <- c(0:10, 40)
xm <- mean(a)
print(xm)
Output
[1] 7.916667
R Vector
A vector is a basic data structure which plays an important role in R programming.
In R, a sequence of elements which share the same data type is known as a vector. A
vector supports the logical, integer, double, character, complex, and raw data types.
The elements contained in a vector are known as the components of the vector. We can
check the type of a vector with the help of the typeof() function.
The length is an important property of a vector. The length of a vector is the
number of elements in it, and it is obtained with the length() function.
Vectors are classified into two kinds, i.e., atomic vectors and lists. They have three
common properties: type (reported by typeof()), length (reported by length()), and
attributes (reported by attributes()).
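The three common properties can be inspected as in the sketch below. The variable names are assumptions chosen for illustration:

```r
# An atomic vector: every element has the same type.
atomic_vec <- c(1.5, 2.5, 3.5)
# A list: elements may have different types.
mixed_list <- list(1L, "a", TRUE)
# Names are stored as an attribute of the vector.
named_vec <- c(first = 10, second = 20)

print(typeof(atomic_vec))
print(typeof(mixed_list))
print(length(atomic_vec))
print(attributes(named_vec))
```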
z <- x:y
Example:
a <- 4:-10
a
Output
[1] 4 3 2 1 0 -1 -2 -3 -4 -5 -6 -7 -8 -9 -10
Example:
seq_vec <- seq(1, 4, by=0.5)
seq_vec
class(seq_vec)
Output
Example:
seq_vec <- seq(1, 4, length.out=6)
seq_vec
class(seq_vec)
Output
Atomic vectors in R
This section covers the four most commonly used types of atomic vectors. Atomic
vectors play an important role in data science. They are created with the help of
the c() function. These atomic vectors are as follows:
Numeric vector
The decimal values are known as numeric data types in R. If we assign a decimal
value to any variable d, then this d variable will become a numeric type. A vector
which contains numeric elements is known as a numeric vector.
Example:
d <- 45.5
num_vec <- c(10.1, 10.2, 33.2)
d
num_vec
class(d)
class(num_vec)
Output
[1] 45.5
[1] 10.1 10.2 33.2
[1] "numeric"
[1] "numeric"
Integer vector
A numeric value without a fractional part is known as integer data. In R, an
integer is stored as a 32-bit (4-byte) value. There are two ways to assign an integer
value to a variable, i.e., by using the as.integer() function or by appending L to
the value.
Example:
d <- as.integer(5)
e <- 5L
int_vec <- c(1,2,3,4,5)
int_vec <- as.integer(int_vec)
int_vec1 <- c(1L,2L,3L,4L,5L)
class(d)
class(e)
class(int_vec)
class(int_vec1)
Output
[1] "integer"
[1] "integer"
[1] "integer"
[1] "integer"
Character vector
Character data holds text strings. In R, there are two ways to create a character
value, i.e., by using the as.character() function or by typing a string between
double quotes ("") or single quotes ('').
Example:
d <- 'shubham'
e <- "Arpita"
f <- 65
f <- as.character(f)
d
e
f
char_vec <- c(1,2,3,4,5)
char_vec <- as.character(char_vec)
char_vec1 <- c("shubham","arpita","nishka","vaishali")
char_vec
char_vec1
class(d)
class(e)
class(f)
class(char_vec)
class(char_vec1)
Output
[1] "shubham"
[1] "Arpita"
[1] "65"
[1] "1" "2" "3" "4" "5"
[1] "shubham" "arpita" "nishka" "vaishali"
[1] "character"
[1] "character"
[1] "character"
[1] "character"
[1] "character"
Logical vector
The logical data type has only two values, i.e., TRUE and FALSE. These values
depend on whether a condition is satisfied. A vector which contains Boolean values
is known as a logical vector.
Example:
d <- as.integer(5)
e <- as.integer(6)
f <- as.integer(7)
g <- d > e
h <- e < f
g
h
log_vec <- c(d<e, d<f, e<d, e<f, f<d, f<e)
log_vec
class(g)
class(h)
class(log_vec)
Output
[1] FALSE
[1] TRUE
[1] TRUE TRUE FALSE TRUE FALSE FALSE
[1] "logical"
[1] "logical"
[1] "logical"
Example:
seq_vec <- seq(1, 4, length.out=6)
seq_vec
seq_vec[2]
Output
Example:
char_vec <- c("shubham"=22, "arpita"=23, "vaishali"=25)
char_vec
char_vec["arpita"]
Output
 shubham   arpita vaishali
      22       23       25
arpita
    23
Example:
a <- c(1,2,3,4,5,6)
a[c(TRUE,FALSE,TRUE,TRUE,FALSE,TRUE)]
Output
[1] 1 3 4 6
Vector Operation
In R, various operations can be performed on vectors. We can add, subtract,
multiply, or divide two or more vectors. These operations are essential for data
manipulation in data science. The following types of operations are performed on
vectors.
1) Combining vectors
The c() function is used not only to create a vector but also to combine vectors.
Combining two or more vectors forms a new vector which contains all the elements of
each vector. When the vectors have different types, the elements are coerced to a
common type; combining numbers with strings, for example, yields a character
vector. Let's see an example of how the c() function combines vectors.
Example:
p <- c(1,2,4,5,7,8)
q <- c("shubham","arpita","nishka","gunjan","vaishali","sumit")
r <- c(p,q)
r
Output
2) Arithmetic operations
We can perform all the arithmetic operations on vectors. The arithmetic operations
are performed member-by-member. We can add, subtract, multiply, or divide two
vectors. Let's see an example of how arithmetic operations are performed on
vectors.
Example:
a <- c(1,3,5,7)
b <- c(2,4,6,8)
a + b
a - b
a * b
a / b
a %% b
Output
[1] 3 7 11 15
[1] -1 -1 -1 -1
[1] 2 12 30 56
[1] 0.5000000 0.7500000 0.8333333 0.8750000
[1] 1 3 5 7
Example:
a <- c("Shubham","Arpita","Nishka","Vaishali","Sumit","Gunjan")
b <- c(TRUE,FALSE,TRUE,TRUE,FALSE,FALSE)
a[b]
Output
4) Numeric Index
In R, we specify the index between square brackets [ ] to access an element by
position. If the index is negative, R returns all the values except the one at that
position. For example, specifying [-3] drops the third element and returns the
rest. An index beyond the length of the vector returns NA.
Example:
q <- c("shubham","arpita","nishka","gunjan","vaishali","sumit")
q[2]
q[-4]
q[15]
Output
[1] "arpita"
[1] "shubham" "arpita" "nishka" "vaishali" "sumit"
[1] NA
5) Duplicate Index
An index vector allows duplicate values, which means we can access one element
twice in one operation. Let's see an example of how a duplicate index works.
Example:
q <- c("shubham","arpita","nishka","gunjan","vaishali","sumit")
q[c(2,4,4,3)]
Output
6) Range Indexes
A range index is used to slice a vector to form a new vector. For slicing, we use
the colon (:) operator. Range indexes are very helpful in situations involving a
large vector. Let's see an example of how slicing is done with the colon operator
to form a new vector.
Example:
q <- c("shubham","arpita","nishka","gunjan","vaishali","sumit")
b <- q[2:5]
b
Output
7) Out-of-order Indexes
In R, the index vector can be out of order. Below is an example of a vector slice
in which the order of the first and second values is reversed.
Example:
q <- c("shubham","arpita","nishka","gunjan","vaishali","sumit")
q[c(2,1,3,4,5,6)]
Output
z <- c("TensorFlow","PyTorch")
z
Output
Once our vector of characters is created, we name the first vector member "Start"
and the second member "End":
names(z) <- c("Start","End")
z
Output
Start End
"TensorFlow" "PyTorch"
z["Start"]
Output
Start
"TensorFlow"
We can reverse the order with the help of a character string index vector.
z[c("End","Start")]
Output
         End        Start
   "PyTorch" "TensorFlow"
Applications of vectors
1. In machine learning for principal component analysis vectors are used. They are
extended to eigenvalues and eigenvector and then used for performing
decomposition in vector spaces.
2. The inputs which are provided to the deep learning model are in the form of vectors.
These vectors consist of standardized data which is supplied to the input layer of the
neural network.
3. In the development of support vector machine algorithms, vectors are used.
4. Vector operations are utilized in neural networks for various operations like image
recognition and text processing.
R Lists
In R, lists are the second type of vector. Lists are the objects of R which contain
elements of different types such as number, vectors, string and another list inside it.
It can also contain a function or a matrix as its elements. A list is a data structure
which has components of mixed data types. We can say, a list is a generic vector
which contains other objects.
Example
Output:
[[1]]
[1] 3 4 5 6
[[2]]
[1] "shubham" "nishka" "gunjan" "sumit"
[[3]]
[1] TRUE FALSE FALSE TRUE
Lists creation
Creating a list is similar to creating a vector. In R, a vector is created with the
c() function; a list is created with the list() function. A list avoids the main
limitation of a vector, namely that all its elements must share one data type: we
can add elements of different data types to a list.
Syntax
list()
Example
list_1 <- list(1,2,3)
list_2 <- list("Shubham","Arpita","Vaishali")
list_3 <- list(c(1,2,3))
list_4 <- list(TRUE,FALSE,TRUE)
list_1
list_2
list_3
list_4
Output:
[[1]]
[1] 1
[[2]]
[1] 2
[[3]]
[1] 3
[[1]]
[1] "Shubham"
[[2]]
[1] "Arpita"
[[3]]
[1] "Vaishali"
[[1]]
[1] 1 2 3
[[1]]
[1] TRUE
[[2]]
[1] FALSE
[[3]]
[1] TRUE
list_data <- list("Shubham","Arpita",c(1,2,3,4,5),TRUE,FALSE,22.5,12L)
print(list_data)
In the above example, the list() function creates a list with character, logical,
numeric, integer, and vector elements. It gives the following output.
Output:
[[1]]
[1] "Shubham"
[[2]]
[1] "Arpita"
[[3]]
[1] 1 2 3 4 5
[[4]]
[1] TRUE
[[5]]
[1] FALSE
[[6]]
[1] 22.5
[[7]]
[1] 12
1. Creating a list.
2. Assign a name to the list elements with the help of names() function.
3. Print the list data.
Let see an example to understand how we can give the names to the list elements.
Example
Output:
$Students
[1] "Shubham" "Nishka" "Gunjan"
$Marks
[,1] [,2] [,3]
[1,] 40 60 90
[2,] 80 70 80
$Course
$Course[[1]]
[1] "BCA"
$Course[[2]]
[1] "MCA"
$Course[[3]]
[1] "B. tech."
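The output above is consistent with a sketch like the following, which walks through the three steps. The element values are taken from the printed output; the variable name list_data is an assumption:

```r
# Step 1: create a list holding a vector, a matrix, and a nested list.
list_data <- list(
  c("Shubham", "Nishka", "Gunjan"),
  matrix(c(40, 80, 60, 70, 90, 80), nrow = 2),
  list("BCA", "MCA", "B. tech.")
)

# Step 2: assign names to the list elements.
names(list_data) <- c("Students", "Marks", "Course")

# Step 3: print the list data.
print(list_data)
```

The matrix is filled column by column, which is why the values 40 and 80 end up in the first column.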
Accessing List Elements
R provides two ways to access the elements of a list. The first is the indexing
method, performed in the same way as for a vector. The second is access by name,
which is possible only with a named list; we cannot access the elements by name if
the list is unnamed.
Let's see an example of both methods to understand how they are used to access list
elements.
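# Creating a list containing a vector, a matrix, and a list.
list_data <- list(c("Shubham","Arpita","Nishka"), matrix(c(40,80,60,70,90,80), nrow = 2),
                  list("BCA","MCA","B.tech"))
# Accessing the first element of the list.
print(list_data[1])

# Accessing the third element. The third element is also a list, so all its elements will be printed.
print(list_data[3])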
Output:
[[1]]
[1] "Shubham" "Arpita" "Nishka"
[[1]]
[[1]][[1]]
[1] "BCA"
[[1]][[2]]
[1] "MCA"
[[1]][[3]]
[1] "B.tech"
Output:
$Student
[1] "Shubham" "Arpita" "Nishka"
$Student
[1] "Shubham" "Arpita" "Nishka"
$Marks
[,1] [,2] [,3]
[1,] 40 60 90
[2,] 80 70 80
$Course
$Course[[1]]
[1] "BCA"
$Course[[2]]
[1] "MCA"
$Course[[3]]
[1] "B. tech."
Example
# Creating a list containing a vector, a matrix, and a list.
list_data <- list(c("Shubham","Arpita","Nishka"), matrix(c(40,80,60,70,90,80), nrow = 2),
                  list("BCA","MCA","B.tech"))

# Giving names to the elements in the list.
names(list_data) <- c("Student", "Marks", "Course")

# Adding an element at the end of the list.
list_data[4] <- "Moradabad"
print(list_data[4])

# Removing the last element.
list_data[4] <- NULL

# Printing the 4th element.
print(list_data[4])

# Updating the 3rd element.
list_data[3] <- "Masters of computer applications"
print(list_data[3])
Output:
[[1]]
[1] "Moradabad"
$<NA>
NULL
$Course
[1] "Masters of computer applications"
The unlist() function takes a list as a parameter and converts it into a vector.
Let's see an example of how the unlist() function is used in R.
Example
# Creating lists.
list1 <- list(1:5)
print(list1)

list2 <- list(10:14)
print(list2)

# Converting the lists to vectors.
v1 <- unlist(list1)
v2 <- unlist(list2)

print(v1)
print(v2)

# Adding the vectors.
result <- v1 + v2
print(result)
Output:
[[1]]
[1] 1 2 3 4 5
[[1]]
[1] 10 11 12 13 14
[1] 1 2 3 4 5
[1] 10 11 12 13 14
[1] 11 13 15 17 19
Merging Lists
R allows us to combine two or more lists into one list. This can be done with the
list() function: we pass all the lists to list() as parameters, and it returns a
list whose elements are the lists that were passed in. Let's see an example of how
the merging process is done.
Example
Output:
[[1]]
[[1]][[1]]
[1] 2
[[1]][[2]]
[1] 4
[[1]][[3]]
[1] 6
[[1]][[4]]
[1] 8
[[1]][[5]]
[1] 10
[[2]]
[[2]][[1]]
[1] 1
[[2]][[2]]
[1] 3
[[2]][[3]]
[1] 5
[[2]][[4]]
[1] 7
[[2]][[5]]
[1] 9
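The nested output above is consistent with a sketch like the following, where the names even_list and odd_list are assumptions. It also contrasts list(), which keeps each input as a nested element, with c(), which produces one flat list:

```r
even_list <- list(2, 4, 6, 8, 10)
odd_list <- list(1, 3, 5, 7, 9)

# list() nests the two lists: the result has 2 elements, each a list of 5.
nested <- list(even_list, odd_list)

# c() concatenates them: the result is a single flat list of 10 elements.
flat <- c(even_list, odd_list)

print(length(nested))
print(length(flat))
```

Which form is preferable depends on whether the original grouping needs to be preserved.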
R Arrays
In R, arrays are data objects which allow us to store data in more than two
dimensions. An array is created with the array() function, which takes a vector as
input and uses the values in the dim parameter to shape it.
For example, if we create an array of dimension (2, 3, 4), it will contain 4
rectangular matrices, each with 2 rows and 3 columns.
R Array Syntax
There is the following syntax of R arrays:
The data is the first argument in the array() function. It is an input vector which is
given to the array.
matrices
row_size
This parameter defines the number of row elements which an array can store.
column_size
This parameter defines the number of columns elements which an array can store.
dim_names
This parameter is used to change the default names of rows and columns.
How to create?
In R, array creation is quite simple. We can easily create an array using vector and
array() function. In array, data is stored in the form of the matrix. There are only two
steps to create a matrix which are as follows
Let see an example to understand how we can implement an array with the help of
the vectors and array() function.
Example
Output
, , 1
[,1] [,2] [,3]
[1,] 1 10 13
[2,] 3 11 14
[3,] 5 12 15
, , 2
[,1] [,2] [,3]
[1,] 1 10 13
[2,] 3 11 14
[3,] 5 12 15
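The output above is consistent with a sketch like the following. The names vec1, vec2, and res are assumptions; the values are taken from the printed matrices:

```r
# Two input vectors supplying 9 values in total.
vec1 <- c(1, 3, 5)
vec2 <- c(10, 11, 12, 13, 14, 15)

# dim = c(3, 3, 2) asks for 2 matrices of 3 rows and 3 columns.
# The 9 input values are recycled to fill all 18 cells, so both matrices match.
res <- array(c(vec1, vec2), dim = c(3, 3, 2))
print(res)
```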
It is not necessary to give names to the rows and columns. Names are only used to
tell the rows and columns apart for better understanding.
Below is an example in which we create an array and give names to the rows,
columns, and matrices.
Example
Output
, , Matrix1
, , Matrix2
Example
, , Matrix1
     Col1 Col2 Col3
Row1    1   10   13
Row2    3   11   14
Row3    5   12   15

, , Matrix2
     Col1 Col2 Col3
Row1    1   10   13
Row2    3   11   14
Row3    5   12   15

Col1 Col2 Col3
   5   12   15

[1] 13

     Col1 Col2 Col3
Row1    1   10   13
Row2    3   11   14
Row3    5   12   15
Manipulation of elements
An array is made up of matrices in multiple dimensions, so operations on the
elements of an array are carried out by accessing the elements of its matrices.
Example
Output
, , 1
[,1] [,2] [,3]
[1,] 1 10 13
[2,] 3 11 14
[3,] 5 12 15
, , 2
[,1] [,2] [,3]
[1,] 1 10 13
[2,] 3 11 14
[3,] 5 12 15
, , 1
[,1] [,2] [,3]
[1,] 8 16 46
[2,] 4 73 36
[3,] 7 48 73
, , 2
[,1] [,2] [,3]
[1,] 8 16 46
[2,] 4 73 36
[3,] 7 48 73
[,1] [,2] [,3]
[1,] 9 26 59
[2,] 7 84 50
[3,] 12 60 88
The apply() function takes the array on which we have to perform the calculations.
The basic syntax of the apply() function is as follows:
apply(x, margin, fun)
Here, x is an array, margin specifies the dimension over which the function is
applied (1 for rows, 2 for columns), and fun is the function which is to be applied
to the elements of the array.
Example
Output
, , 1
[,1] [,2] [,3]
[1,] 1 10 13
[2,] 3 11 14
[3,] 5 12 15
, , 2
[,1] [,2] [,3]
[1,] 1 10 13
[2,] 3 11 14
[3,] 5 12 15
[1] 48 56 64
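The final line of the output above matches row-wise sums over both matrices. A minimal sketch, assuming the same hypothetical vec1, vec2, and res names used for the array:

```r
vec1 <- c(1, 3, 5)
vec2 <- c(10, 11, 12, 13, 14, 15)
res <- array(c(vec1, vec2), dim = c(3, 3, 2))

# margin = 1: sum every value that shares a row index, across both matrices.
row_sums <- apply(res, 1, sum)
# margin = 2: sum every value that shares a column index.
col_sums <- apply(res, 2, sum)

print(row_sums)
print(col_sums)
```

For the first row, the sum is (1 + 10 + 13) taken once per matrix, which gives 48.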
R Matrix
In R, a two-dimensional rectangular data set is known as a matrix. A matrix is created
with the help of the vector input to the matrix function. On R matrices, we can
perform addition, subtraction, multiplication, and division operation.
In the R matrix, elements are arranged in a fixed number of rows and columns. The
matrix elements are the real numbers. In R, we use matrix function, which can easily
reproduce the memory representation of the matrix. In the R matrix, all the elements
must share a common basic type.
Example
matrix1 <- matrix(c(11, 13, 15, 12, 14, 16), nrow = 2, ncol = 3, byrow = TRUE)
matrix1
Output
History of matrices in R
The word "Matrix" is the Latin word for womb which means a place where something
is formed or produced. Two authors of historical importance have used the word
"Matrix" for unusual ways. They proposed this axiom as a means to reduce any
function to one of the lower types so that at the "bottom" (0order) the function is
identical to its extension.
Any possible function other than a matrix from the matrix holds true with the help of
the process of generalization. It will be true only when the proposition (which asserts
function in question) is true. It will hold true for all or one of the value of argument
only when the other argument is undetermined.
data
The first argument in matrix function is data. It is the input vector which is the data
elements of the matrix.
nrow
The second argument is the number of rows which we want to create in the matrix.
ncol
The third argument is the number of columns which we want to create in the matrix.
byrow
The byrow parameter is a logical value. If it is TRUE, the input vector elements
are arranged by row; if it is FALSE (the default), they are arranged by column.
dimnames
The dimnames parameter is a list holding the names assigned to the rows and columns.
Let's see an example to understand how matrix function is used to create a matrix
and arrange the elements sequentially by row or column.
Example
Output
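A minimal sketch of the matrix() arguments described above. The input values 5:16 and the row and column names are assumptions chosen for illustration:

```r
# Fill by row: values run left to right across each row.
by_row <- matrix(5:16, nrow = 4, ncol = 3, byrow = TRUE)

# Fill by column (the default): values run top to bottom down each column.
by_col <- matrix(5:16, nrow = 4, ncol = 3, byrow = FALSE)

# dimnames takes a list: row names first, then column names.
named <- matrix(5:16, nrow = 4, ncol = 3, byrow = TRUE,
                dimnames = list(c("row1", "row2", "row3", "row4"),
                                c("col1", "col2", "col3")))

print(by_row)
print(by_col)
print(named)
```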
1. We can access the element which presents on nth row and mth column.
2. We can access all the elements of the matrix which are present on the nth row.
3. We can also access all the elements of the matrix which are present on the mth
column.
Let see an example to understand how elements are accessed from the matrix
present on nth row mth column, nth row, or mth column.
Example
Output
[1] 12
matrix[n, m] <- y
Here, n and m are the row and column of the element, respectively, and y is the
value which we assign to modify our matrix.
Example
Output
Example 1
Output
Example 2
Output
Example 1
Output
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
[1,] 5 8 11 14 6 9 12 15 7 10 13 16
Matrix operations
In R, we can perform mathematical operations on matrices, such as addition,
subtraction, multiplication, etc. For these element-wise operations, both matrices
must have the same dimensions.
Example 1
Output
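A minimal sketch of these operations on two small matrices. The names m1 and m2 and their values are assumptions chosen for illustration:

```r
m1 <- matrix(c(1, 2, 3, 4), nrow = 2)
m2 <- matrix(c(5, 6, 7, 8), nrow = 2)

# These operators work element by element and need equal dimensions.
sum_m <- m1 + m2
diff_m <- m1 - m2
prod_m <- m1 * m2   # element-wise product, not the matrix product
quot_m <- m1 / m2

# The matrix product uses the %*% operator instead.
mat_prod <- m1 %*% m2

print(sum_m)
print(prod_m)
print(mat_prod)
```

Note that the * operator and the %*% operator give different results, so it is worth checking which one a calculation needs.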
Applications of matrix
1. In geology, matrices are used for taking surveys, plotting graphs, and compiling
statistics, and they are used for study in different fields.
2. A matrix is a representation method which helps in plotting common survey data.
3. In robotics and automation, matrices are the basic elements for describing robot
movements.
4. Matrices are used in economics to calculate gross domestic product, and they
also help in calculating the efficiency of producing goods and products.
5. In computer-based applications, matrices play a crucial role in creating
realistic-seeming motion.
R Data Frame
A data frame is a two-dimensional array-like structure or a table in which each
column contains values of one variable, and each row contains one set of values
from each column. A data frame is a special case of a list in which each component
has equal length.
A data frame is used to store a data table; the vectors it contains, held in the
form of a list, are of equal length.
Put simply, it is a list of equal-length vectors. A matrix can contain only one
type of data, but a data frame can contain different data types such as numeric,
character, factor, etc.
Example
Output
  employee_id employee_name    sal starting_date
1           1       Shubham 623.30    2012-01-01
2           2        Arpita 515.20    2013-09-23
3           3        Nishka 611.00    2014-11-15
4           4        Gunjan 729.00    2014-05-11
5           5         Sumit 843.25    2015-03-27
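A table like the one above can be produced with a sketch along these lines. The values are taken from the printed output; the name emp.data follows the column prefixes that appear in later output, and stringsAsFactors = FALSE is an assumption that keeps the names as character data:

```r
emp.data <- data.frame(
  employee_id = 1:5,
  employee_name = c("Shubham", "Arpita", "Nishka", "Gunjan", "Sumit"),
  sal = c(623.30, 515.20, 611.00, 729.00, 843.25),
  starting_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15",
                            "2014-05-11", "2015-03-27")),
  stringsAsFactors = FALSE
)

print(emp.data)
str(emp.data)              # structure: column names, types, and first values
print(summary(emp.data))   # per-column summary statistics
```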
Example
Output
1. We can extract the specific columns from a data frame using the column name.
2. We can extract the specific rows also from a data frame.
3. We can extract the specific rows corresponding to specific columns.
Let's see an example of each to understand how data is extracted from a data
frame in these ways.
Output
  emp.data.employee_id emp.data.sal
1                    1       623.30
2                    2       515.20
3                    3       611.00
4                    4       729.00
5                    5       843.25
Output
  employee_id employee_name   sal starting_date
1           1       Shubham 623.3    2012-01-01
  employee_id employee_name    sal starting_date
4           4        Gunjan 729.00    2014-05-11
5           5         Sumit 843.25    2015-03-27
Output
  employee_id starting_date
2           2    2013-09-23
3           3    2014-11-15
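The three kinds of extraction shown in the outputs above can be sketched as follows. The data frame is rebuilt here so the block runs on its own; the variable names on the left-hand side are assumptions:

```r
emp.data <- data.frame(
  employee_id = 1:5,
  employee_name = c("Shubham", "Arpita", "Nishka", "Gunjan", "Sumit"),
  sal = c(623.30, 515.20, 611.00, 729.00, 843.25),
  starting_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15",
                            "2014-05-11", "2015-03-27")),
  stringsAsFactors = FALSE
)

# 1. Specific columns, selected by name.
cols <- data.frame(emp.data$employee_id, emp.data$sal)

# 2. Specific rows: the first row, then rows 4 and 5.
first_row <- emp.data[1, ]
last_rows <- emp.data[4:5, ]

# 3. Specific rows of specific columns: rows 2 and 3 of columns 1 and 4.
subset_rc <- emp.data[c(2, 3), c(1, 4)]

print(cols)
print(first_row)
print(last_rows)
print(subset_rc)
```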
We can modify a data frame in the following ways:
1. Add a column by supplying a column vector under a new column name, or by using
the cbind() function.
2. Add rows by creating new rows in the same structure as the existing data frame
and using the rbind() function.
3. Delete columns by assigning a NULL value to them.
4. Delete rows by re-assigning the data frame without them.
Let's see an example to understand how the rbind() function works and how the
modification is done in our data frame.
Output
  employee_id employee_name    sal starting_date
1           1       Shubham 623.30    2012-01-01
2           2        Arpita 515.20    2013-09-23
3           3        Nishka 611.00    2014-11-15
4           4        Gunjan 729.00    2014-05-11
5           5         Sumit 843.25    2015-03-27
  employee_id employee_name    sal starting_date
1           1       Shubham 623.30    2012-01-01
2           2        Arpita 515.20    2013-09-23
3           3        Nishka 611.00    2014-11-15
4           4        Gunjan 729.00    2014-05-11
5           5         Sumit 843.25    2015-03-27
6           6      Vaishali 547.00    2015-09-01
  employee_id employee_name    sal starting_date   Address
1           1       Shubham 623.30    2012-01-01 Moradabad
2           2        Arpita 515.20    2013-09-23   Lucknow
3           3        Nishka 611.00    2014-11-15      Etah
4           4        Gunjan 729.00    2014-05-11   Sambhal
5           5         Sumit 843.25    2015-03-27    Khurja
Output
  employee_id employee_name    sal starting_date
1           1       Shubham 623.30    2012-01-01
2           2        Arpita 515.20    2013-09-23
3           3        Nishka 611.00    2014-11-15
4           4        Gunjan 729.00    2014-05-11
5           5         Sumit 843.25    2015-03-27
  employee_id employee_name    sal starting_date
2           2        Arpita 515.20    2013-09-23
3           3        Nishka 611.00    2014-11-15
4           4        Gunjan 729.00    2014-05-11
5           5         Sumit 843.25    2015-03-27
  employee_id employee_name    sal
1           1       Shubham 623.30
2           2        Arpita 515.20
3           3        Nishka 611.00
4           4        Gunjan 729.00
5           5         Sumit 843.25
Output
  employee_id employee_name    sal starting_date
1           1       Shubham 623.30    2012-01-01
2           2        Arpita 515.20    2013-09-23
3           3        Nishka 611.00    2014-11-15
4           4        Gunjan 729.00    2014-05-11
5           5         Sumit 843.25    2015-03-27
 employee_id employee_name          sal        starting_date
 Min.   :1   Length:5           Min.   :515.2   Min.   :2012-01-01
 1st Qu.:2   Class :character   1st Qu.:611.0   1st Qu.:2013-09-23
 Median :3   Mode  :character   Median :623.3   Median :2014-05-11
 Mean   :3                      Mean   :664.4   Mean   :2014-01-14
 3rd Qu.:4                      3rd Qu.:729.0   3rd Qu.:2014-11-15
 Max.   :5                      Max.   :843.2   Max.   :2015-03-27
R factors
A factor is a data structure used for fields which take only a predefined, finite
number of values. Factors are data objects used to categorize data and store it as
levels. They can store both integer and string values, and they are useful in
columns that have a limited number of unique values.
A factor has labels associated with the unique integers stored in it. It contains a
predefined set of values known as levels, and by default R sorts the levels in
alphabetical order.
Attributes of a factor
There are the following attributes of a factor in R:
a. x
It is the input vector which is to be transformed into a factor.
b. levels
It is an input vector that represents the set of unique values which are taken by x.
c. labels
It is a character vector which corresponds to the number of labels.
d. exclude
It is used to specify the value which we want to be excluded.
e. ordered
It is a logical attribute which determines if the levels are ordered.
f. nmax
It is used to specify the upper bound for the maximum number of levels.
factor_data <- factor(vector)
Example
Output
Example
Output
[1] Nishka
Levels: Arpita Nishka Shubham Sumit
Modification of factor
Like data frames, R allows us to modify a factor. We can modify the value of a
factor by simply re-assigning it. We cannot assign a value outside the factor's
predefined levels; that is, we cannot insert a value whose level is not present. To
do so, we first have to create a level for that value, and then we can add it to
our factor.
Example
Output
Example
Output
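The level rule described above can be sketched as follows. The factor students and its values are assumptions chosen for illustration:

```r
students <- factor(c("Nishka", "Gunjan", "Shubham", "Arpita"))

# Re-assigning to an existing level works.
students[2] <- "Arpita"

# Assigning a value outside the levels produces NA with a warning,
# which is suppressed here to keep the output clean.
suppressWarnings(students[3] <- "Sumit")
invalid_is_na <- is.na(students[3])

# Add the level first, then the assignment succeeds.
levels(students) <- c(levels(students), "Sumit")
students[3] <- "Sumit"

print(invalid_is_na)
print(students)
```

Appending to levels() keeps the existing levels in place, so values already stored in the factor are unaffected.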
Example
data <- c("Nishka","Gunjan","Shubham","Arpita","Arpita","Sumit","Gunjan","Shubham")
# Creating the factors
factor_data <- factor(data)
print(factor_data)

# Apply the factor function with the required order of the levels.
new_order_factor <- factor(factor_data, levels = c("Gunjan","Nishka","Arpita","Shubham","Sumit"))
print(new_order_factor)
Output
gl(n, k, labels)
1. n indicates the number of levels.
2. k indicates the number of replications.
3. labels is a vector of labels for the resulting factor levels.
Example
gen_factor <- gl(3, 5, labels=c("BCA","MCA","B.Tech"))
gen_factor
Output
Transpose a Matrix
R allows us to calculate the transpose of a matrix or a data frame with the t()
function. The t() function takes a matrix or data frame as input and returns its
transpose. The syntax of the t() function is as follows:
t(Matrix/data frame)
Example
a <- matrix(c(4:12), nrow=3, byrow=TRUE)
a
# cat() interprets the newline; print() would show \n literally.
cat("Matrix after transpose\n")
b <- t(a)
b
Output:
cbind(vector1, vector2, ..., vectorN)
rbind(dataframe1, dataframe2, ..., dataframeN)
Let's see an example to understand how the cbind() and rbind() functions are used.
Example
Output:
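A minimal sketch of both functions. The vectors and data frames are assumptions chosen for illustration:

```r
id <- c(1, 2, 3)
name <- c("Shubham", "Nishka", "Gunjan")

# cbind() joins vectors side by side as columns of a matrix.
# Mixing numbers and strings coerces everything to character.
by_cols <- cbind(id, name)

df1 <- data.frame(id = 1:2, name = c("Shubham", "Nishka"))
df2 <- data.frame(id = 3:4, name = c("Gunjan", "Sumit"))

# rbind() stacks data frames with the same columns on top of each other.
by_rows <- rbind(df1, df2)

print(by_cols)
print(by_rows)
```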
Merging Data Frame
R provides the merge() function to merge two data frames. The merge is performed
on key columns, so both data frames must contain the columns used as keys (by
default, the columns with the same names in both).
Let's take an example using the datasets about diabetes in Pima Indian women, which
are available in the MASS library. We will merge two datasets on the basis of the
values of blood pressure and body mass index. When these two columns are selected
for merging, the records where the values of both variables match in the two
datasets are combined to form a single data frame.
Example
library(MASS)
merging_pima <- merge(x = Pima.te, y = Pima.tr,
                      by.x = c("bp", "bmi"),
                      by.y = c("bp", "bmi"))
print(merging_pima)
nrow(merging_pima)
Output:
Melting and Casting
In R, an important and interesting topic is changing the shape of the data in
multiple steps to get the desired shape. For this purpose, the reshape2 package
provides the melt() and dcast() functions. To understand the process, consider a
dataset called ships which is present in the MASS library.
Example
library(MASS)
print(ships)
Output:
Melt the Data
Now we will organize the above data by melting it. Melting means converting
columns into multiple rows. We will convert all the columns of the above dataset
except type and year into multiple rows.
Example
library(MASS)
library(reshape2)
molten_ships <- melt(ships, id = c("type","year"))
print(molten_ships)
Output:
Casting of Molten Data
After melting the data, we can cast it into a new form where the aggregate of each
type of ship for each year is created. For this purpose, the reshape2 package
provides the dcast() function.
Example
library(MASS)
library(reshape2)
# Melting the data
molten.ships <- melt(ships, id = c("type", "year"))
print("Molten Data")
print(molten.ships)
# Casting of data: sum the values for each type and year
recasted.ship <- dcast(molten.ships, type + year ~ variable, sum)
print("Cast Data")
print(recasted.ship)
Output:
What is Object-Oriented Programming in R?
Object-Oriented Programming (OOP) is one of the most popular programming paradigms.
With the help of OOP concepts, we can construct modular pieces of code that serve as
building blocks for large systems. R is primarily a functional language, but we can
also program in an OOP style. In R, OOP is a great tool to manage the complexity of
larger programs.
R has two important object-oriented systems: S3 and S4.
S3
S3 is used to overload functions. We call a generic function under a single name,
and the method that actually runs depends on the class of the input parameter.
S4
In R, a class is the outline or design for an object. Classes encapsulate the data
members, along with the functions. R has two important class systems, i.e., S3 and
S4, which play an important role in applying OOP concepts.
Let's discuss both the classes one by one with their examples for better
understanding.
1) S3 Class
With the help of the S3 class, we can take advantage of generic functions. S3
dispatches on the class of the first argument only. S3 differs from traditional
programming languages such as Java, C++, and C#, which implement OO by message
passing. This makes S3 easy to implement. In the S3 system, the generic function
calls the method. S3 is very casual and has no formal definition of classes.
Creating an S3 class
In R, we define a function which will create a class and return an object of the
created class. A list is made with the relevant members, the class of the list is
set, and the list is returned. The following syntax creates a class:
Example
Output
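A minimal sketch of this pattern is shown below. The class name Faculty, the member names, and the sample values are assumptions chosen for illustration; the original listing is not reproduced here.

```r
# Constructor: builds a list with the relevant members,
# sets its class attribute, and returns it
Faculty <- function(n, a, g) {
  value <- list(name = n, age = a, GPA = g)
  class(value) <- "Faculty"
  value
}

# print() is an S3 generic, so defining print.Faculty()
# is enough to customize how Faculty objects are printed
print.Faculty <- function(x, ...) {
  cat("Name:", x$name, "\n")
  cat("Age:", x$age, "\n")
  cat("GPA:", x$GPA, "\n")
  invisible(x)
}

fac <- Faculty("Shubham", 22, 3.5)
print(fac)
class(fac)
```

Because S3 has no formal class definition, nothing checks that the list really has these members; the constructor function is only a convention.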
The generic function print is defined in the following way:
print
function (x, ...)
UseMethod("print")
When we execute or run the above code, it will give us the following output:
Like the print function, we will make a generic function GPA to assign a new value
to our GPA member. We make the generic function GPA in the following way.
Once our generic function GPA is created, we implement a default function for it.
After that, we make a new method for our GPA function in the following way.
Output
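The three steps described above (generic, default method, class-specific method) can be sketched as follows. The argument names obj and new_value and the message text are assumptions made for this illustration.

```r
# Step 1: the generic only dispatches on the class of its first argument
GPA <- function(obj, new_value) {
  UseMethod("GPA")
}

# Step 2: the default method runs when no class-specific method exists
GPA.default <- function(obj, new_value) {
  cat("GPA() is not defined for this class\n")
  invisible(obj)
}

# Step 3: the method for class "Faculty" assigns the new value
GPA.Faculty <- function(obj, new_value) {
  obj$GPA <- new_value
  obj
}

# A hypothetical object of class "Faculty"
fac <- structure(list(name = "Shubham", age = 22, GPA = 3.5), class = "Faculty")
fac <- GPA(fac, 3.9)
fac$GPA
```

R copies objects on modification, so the method has to return the changed object and the caller has to reassign it.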
Inheritance in S3
Inheritance means reusing the features of one class in another class. In the S3
system of R, inheritance is achieved by setting the class attribute to a character
vector of class names.
For inheritance, we first create a function which creates a new object of class
Faculty in the following way:
faculty <- function(n, a, g) {
  value <- list(name = n, age = a, GPA = g)
  # class name matches the "Faculty" used in the inheritance example
  attr(value, "class") <- "Faculty"
  value
}
Now, we will create an object of class InternationalFaculty which will inherit from
the Faculty class. This is done by assigning a character vector of class names,
with the derived class first:
# create a list
fac <- list(name = "Shubham", age = 22, GPA = 3.5, country = "India")
# make it of the class InternationalFaculty which is derived from the class Faculty
class(fac) <- c("InternationalFaculty", "Faculty")
# print it out
fac
When we run the above code which we have discussed, it will generate the following
output:
We can see above that, although we have not defined any method of the form
print.InternationalFaculty(), the method print.Faculty() was called. This method of
class Faculty was inherited.
print.InternationalFaculty <- function(obj1) {
  cat(obj1$name, "is from", obj1$country, "\n")
}
The above function overrides the method defined for class Faculty, as we can see by
printing the object again:
fac
getS3method and getAnywhere function
Two functions are commonly used in R to look up S3 methods. The first is
getS3method() and the second is getAnywhere().
S3 finds the appropriate method associated with a class, and it is useful to see how
a method is implemented. Sometimes the methods are not visible because they are
hidden in a namespace. We use getS3method() or getAnywhere() to solve this problem.
getS3method
getAnywhere function
getAnywhere("simpleLoess")
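As a small illustration of both functions, the sketch below looks up the default method of t.test(), which lives in the stats namespace and is not exported. The choice of t.test as the example is ours, not the original tutorial's.

```r
# getS3method() takes the generic name and the class name separately
m <- getS3method("t.test", "default")
is.function(m)

# getAnywhere() takes the full method name and searches everywhere,
# including namespaces that do not export the object
found <- getAnywhere("t.test.default")
found$where
```

getS3method() returns the function itself, while getAnywhere() returns an object that also records where the function was found.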
2) S4 Class
The S4 class is similar to S3 but is more formal. It differs from S3 in two ways.
First, S4 has formal class definitions which provide a description and
representation of classes. In addition, it has special auxiliary functions for
defining methods and generics. Second, S4 offers multiple dispatch. This means that
generic functions can select methods based on the classes of more than one
argument.
Creating an S4 class
To create an S4 class, we have to define the class and its slots. The steps to
create an S4 class are as follows:
Step 1:
In the first step, we will create a new class called faculty with three slots name, age,
and GPA.
There are many other optional arguments of setClass() function which we can explore
by using ?setClass command.
Step 2:
In the next step, we will create the object of S4 class. R provides new() function to
create an object of S4 class. In this new function we pass the class name and the
values for the slots in the following way:
The setClass() function returns a generator function. This generator function helps in
creating new objects. And it acts as a constructor.
Example
Output
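The two steps can be sketched as below, assuming the class faculty with the slots name, age, and GPA described above. The sample values are hypothetical.

```r
library(methods)  # attached by default in recent R versions; loaded here for safety

# Step 1: define the class and its slots; setClass() returns a generator
faculty <- setClass("faculty",
                    slots = list(name = "character",
                                 age = "numeric",
                                 GPA = "numeric"))

# Step 2: create objects, either with new() or with the generator
f1 <- new("faculty", name = "Shubham", age = 22, GPA = 3.5)
f2 <- faculty(name = "Arpita", age = 21, GPA = 3.8)

# Slots are accessed with @ instead of $
f1@name
f2@GPA
isS4(f1)
```

Unlike S3, the slot types are checked: passing a character string for age would raise an error when the object is created.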
Inheritance in S4 class
As with S3 classes, we can perform inheritance with S4 classes. The derived class
inherits both the attributes and the methods of the parent class. Inheritance in an
S4 class is performed in the following steps:
Step 1:
In the first step, we will create or define class with appropriate slots in the following
way:
setClass("faculty",
         slots = list(name = "character", age = "numeric", GPA = "numeric")
)
Step 2:
After defining the class, our next step is to define a class method for the show()
generic function. This is done in the following manner:
setMethod("show",
          "faculty",
          # the argument is named "object" to match the show() generic
          function(object) {
            cat(object@name, "\n")
            cat(object@age, "years old\n")
            cat("GPA:", object@GPA, "\n")
          }
)
Step 3:
In the next step, we will define the derived class with the argument contains. The
derived class is defined in the following way
setClass("Internationalfaculty",
         slots = list(country = "character"),
         contains = "faculty"
)
In our derived class we have defined only one attribute i.e. country. Other attributes
will be inherited from its parent class.
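Putting the three steps together, the following sketch creates an object of the derived class and shows that it inherits both the slots and the show() method of faculty. The sample values (name, age, GPA, country) are hypothetical.

```r
library(methods)

# Step 1: parent class
setClass("faculty",
         slots = list(name = "character", age = "numeric", GPA = "numeric"))

# Step 2: show() method for the parent class
setMethod("show", "faculty", function(object) {
  cat(object@name, "\n")
  cat(object@age, "years old\n")
  cat("GPA:", object@GPA, "\n")
})

# Step 3: derived class that adds one slot
setClass("Internationalfaculty",
         slots = list(country = "character"),
         contains = "faculty")

s <- new("Internationalfaculty",
         name = "John", age = 21, GPA = 3.5, country = "France")

s                 # printed with the show() method inherited from faculty
is(s, "faculty")  # the object is also a faculty
s@country
```

To print the country as well, we could define a separate show() method for Internationalfaculty and call callNextMethod() inside it to reuse the parent's output.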
What is R Debug?
In computer programming, debugging is a multi-step process which involves
identifying a problem, isolating the source of the problem, and then fixing the
problem or determining a way to work around it. The final step of debugging is to
test an improvement or workaround and ensure that it works.
A syntactically correct program may still give us incorrect results due to logical
errors, which are known as "bugs." If such errors occur, we need to find out why and
where they have occurred so that we can fix them. The procedure to identify and fix
bugs is called "debugging."
For example
2. Start Small
Stick to small, simple test cases, at least at the beginning of the R debug process.
Working with big data objects can make it difficult to think about the problem. Of
course, we should eventually test our code in large, complex cases, but start small.
3. Debug in a Modular, Top-Down Manner
Most professional software developers agree that code should be written in a modular
manner. Our first-level code should not be long and should consist mostly of function
calls. Those functions should not be too long either and should call other functions
if necessary. This makes the code easier to write and helps others understand it when
the time comes to extend the code.
We should debug in a top-down manner. Suppose we are debugging our function f() and
it contains the line below.
For example
Y <- g(x, 8)
For now, do not call debug(g). Execute the line and see if g() returns the value that
we expect. If it does, we have avoided the time-consuming process of single-stepping
through g(). If g() returns an incorrect value, now is the time to call debug(g).
4. Antibugging
If there is a section of code in which a variable z should be positive, then we can
insert the following line as a safeguard:
stopifnot(z > 0)
When there is a bug in the code, for example the value of z is equal to -3, the
stopifnot() call stops execution right there with an error message.
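A short sketch of this technique follows. The function name safe_log is hypothetical; tryCatch() is used here only so that the error message can be captured and displayed instead of stopping the script.

```r
# Hypothetical function that assumes its input is positive
safe_log <- function(z) {
  stopifnot(z > 0)  # stop immediately if the assumption is violated
  log(z)
}

safe_log(10)

# Capture the error message produced by the failed check
msg <- tryCatch(safe_log(-3), error = function(e) conditionMessage(e))
msg
```

The message names the exact condition that failed, which points directly at the violated assumption.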
1) traceback()
If our code has already crashed and we want to know where the offending line is, try
traceback(). This will (sometimes) show the location of the problem in the code.
When an R function fails, an error is printed on the screen. Immediately after the
error, we can call traceback() to see in which function the error occurred. The
traceback() function prints the list of functions which were called before the error
occurred. The functions are printed in reverse order, with the most recent call
first.
Let's see an example to understand how we can use the traceback() function.
Example
f <- function(a) {
  x <- a - ql(a)
  x
}
ql <- function(b) {
  r <- b * mn(b)
  r
}
mn <- function(p) {
  r <- log(p)
  if (r < 10)
    r^2
  else
    r^3
}
f(-2)
When we run the above code, it will generate the following output:
After finding the following error we call our traceback() function and when we run, it
will show the following output:
traceback()
2) debug()
In R, the debug() function allows the user to step through the execution of a
function. At any point, we can print the values of the variables or draw a graph of
the results within the function. While debugging, we can type "n" to execute the
next line or "c" to continue to the end of the current block of code. traceback()
tells us which function failed, but not which line caused the error. To know which
line is causing the error, we have to step through the function using debug().
Example
func <- function(a, value) {
  subt <- value - a
  squar <- subt^2
  collect <- sum(squar)
  collect
}
set.seed(100)
value <- rnorm(100)
func(1, value)
debug(func)
func(1, value)
Output
3) browser()
The browser() function halts the execution of a function until the user allows it to
continue. This is useful if we don't want to step through the complete code
line by line, but wish to stop it at a certain point so we can check what's going on.
Inserting a call to browser() in a function will pause the function's execution at
the point where browser() is called. It is the same as using debug(), except that we
can control where the execution gets paused.
Example
a <- function(b) {
  browser() ## a break point inserted here
  c <- log(b)
  if (c < 10)
    c^2
  else
    c^3
}
a(-1)
Output
4) trace()
The trace() function call allows the user to insert bits of code into the function. The
syntax for the R debug function trace () is a bit awkward for first-time users. It may be
better to use debug ().
Example
f <- function(a) {
  x <- a - ql(a)
  x
}
ql <- function(b) {
  r <- b * mn(b)
  r
}
mn <- function(p) {
  r <- log(p)
  if (r < 10)
    r^2
  else
    r^3
}
as.list(body(mn))
trace("mn", quote(if (is.nan(r)) { browser() }), at = 3, print = FALSE)
f(1)
f(-1)
Output
5) recover()
When we debug a function, recover() allows us to examine variables in an upper-level
function.
By typing a number at the selection prompt, we are taken to the corresponding
function on the call stack and placed in a browser environment.
The recover() function can be used as an error handler, set using options() (e.g.
options(error = recover)).
When a function throws an error, execution is stopped at the point of failure. We can
browse the function calls and examine the environment to find the source of the
problem.
Example
f <- function(a) {
  x <- a - ql(a)
  x
}
ql <- function(b) {
  r <- b * mn(b)
  r
}
mn <- function(p) {
  r <- log(p)
  if (r < 10)
    r^2
  else
    r^3
}
as.list(body(mn))
trace("mn", quote(if (is.nan(r)) { recover() }), at = 3, print = FALSE)
f(-1)
Output
Debugging Installed Packages
An error may also stem from an installed R package. There are several ways in which
we can approach the problem:
o Set options(error = recover) and then step through the code line by line
using n.
o In complex situations, we should work on a copy of the function code. In R, entering
the function name prints the function code, which can be copied into a text
editor. We can edit it, load it into the global workspace, and then debug it.
o If our problem is still not solved, then we have to download the source code. We can
also use the devtools package and its install() and load_all() functions to make our
procedure quicker.
The try() function is a wrapper function for tryCatch() that prints the error and
then continues. On the other hand, tryCatch() gives us control of the error-handling
function and, optionally, also continues the process of the function.
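The difference between the two functions can be sketched as follows. The function risky() and the handler's return value are hypothetical and chosen only to make the behavior visible.

```r
# Hypothetical function that fails for negative input
risky <- function(x) {
  if (x < 0) stop("negative input")
  sqrt(x)
}

# try(): the error is reported (or silenced) and the script continues;
# the result is an object of class "try-error"
r1 <- try(risky(-1), silent = TRUE)
inherits(r1, "try-error")

# tryCatch(): we decide what happens when an error occurs
r2 <- tryCatch(
  risky(-1),
  error = function(e) {
    message("Caught: ", conditionMessage(e))
    NA_real_  # value returned in place of the failed result
  },
  finally = message("Done")
)
r2
```

With try() we have to test the result ourselves, whereas tryCatch() lets the handler return a replacement value directly.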
R Data Visualization
In R, we can create visually appealing data visualizations by writing a few lines of
code. For this purpose, we use the diverse functionalities of R. Data visualization
is an efficient technique for gaining insight about data through a visual medium.
With the help of visualization techniques, a human can easily obtain information
about hidden patterns in data that might otherwise be overlooked.
By using data visualization techniques, we can work with large datasets to
efficiently obtain key insights about them.
R Visualization Packages
R provides a series of packages for data visualization. These packages are as follows:
1) plotly
The plotly package provides online interactive and quality graphs. This package
extends upon the JavaScript library plotly.js.
2) ggplot2
R allows us to create graphics declaratively. R provides the ggplot2 package for this
purpose. This package is famous for its elegant and quality graphs, which set it
apart from other visualization packages.
3) tidyquant
tidyquant is a financial package that is used for carrying out quantitative
financial analysis. It belongs to the tidyverse universe and is used for importing,
analyzing, and visualizing financial data.
4) taucharts
Data plays an important role in taucharts. The library provides a declarative interface
for rapid mapping of data fields to visual properties.
5) ggiraph
It is a tool that allows us to create dynamic ggplot graphs. This package allows us to
add tooltips, JavaScript actions, and animations to the graphics.
6) geofacets
7) googleVis
googleVis provides an interface between R and Google's charts tools. With the help
of this package, we can create web pages with interactive charts based on R data
frames.
8) RColorBrewer
This package provides color schemes for maps and other graphics, which are
designed by Cynthia Brewer.
9) dygraphs
10) shiny
R Graphics
Graphics play an important role in bringing out the important features of the data.
Graphics are used to examine marginal distributions, relationships between variables,
and summaries of very large data. They are a very important complement to many
statistical and computational techniques.
Standard Graphics
R standard graphics are available through the graphics package, which includes
several functions that provide statistical plots, like:
o Scatterplots
o Piecharts
o Boxplots
o Barplots etc.
Each of the above graphs is typically produced by a single function call.
Graphics Devices
A graphics device is a place where a plot can appear. It may be a window on the
computer (screen device), a PDF file (file device), a Scalable Vector Graphics
(SVG) file (file device), or a PNG or JPEG file (file device).
The following points are essential to understand:
o The graphics functions produce output which depends on the active graphics
device.
o A screen is the default and most frequently used device.
o File devices such as the PDF device and the JPEG device are used to save plots to files.
o We just need to open the graphics output device which we want. R then takes
care of producing the type of output which is required by the device.
o For producing a certain plot on the screen or in a graphics file, the R code
is exactly the same. We only need to open the target output device first.
o Several devices can be open at the same time, but there will be only one active
device.
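The points above can be sketched with a file device. The example assumes a PDF device and a temporary file, chosen because the PDF device is available in every R installation; the plotted values are arbitrary.

```r
# Open a file device; it becomes the active device
out_file <- tempfile(fileext = ".pdf")
pdf(out_file)

# dev.cur() reports the active device as a named integer
dev_name <- names(dev.cur())
dev_name

# The plotting code is the same as it would be for a screen device
plot(1:10, (1:10)^2, main = "Squares", xlab = "x", ylab = "x squared")

# Close the device so that the file is written
dev.off()
file.exists(out_file)
```

Replacing pdf() with png() or svg() changes the output format without touching the plotting code.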
1) Data
Data is the most crucial element; it is what gets processed to generate the output.
2) Aesthetic Mappings
Aesthetic mappings are one of the most important elements of a statistical graphic.
They control the relation between graphics variables and data variables. In a scatter
plot, for example, an aesthetic mapping can map the temperature variable of a data
set to the X variable, or map the species of a plant to the color of the dots.
3) Geometric Objects
Geometric objects express each observation as a shape, such as a point, using the
aesthetic mappings. In a scatter plot, two variables in the data set are mapped to
the x and y variables of the plot.
4) Statistical Transformations
5) Scales
Scales are used to map the data values into values present in the coordinate system
of the graphics device.
6) Coordinate system
The coordinate system plays an important role in the plotting of the data.
o Cartesian
o Polar
7) Faceting
Faceting is used to split the data into subgroups and draw sub-graphs for each
group.
2. Efficiency
Data visualization applications allow us to display a lot of information in a small
space. Although the decision-making process in business is inherently complex and
multifunctional, displaying evaluation findings in a graph can allow companies to
organize a lot of interrelated information in useful ways.
3. Location
Applications that use features such as geographic maps and GIS can be particularly
relevant to a wider business when location is an important factor. We can use maps
to show business insights from various locations, and also to consider the
seriousness of the issues, the reasons behind them, and the working groups to
address them.
2. Distraction
However, at times, data visualization apps create highly complex and fancy
graphics-rich reports and charts, which may entice users to focus more on the form
than the function. If we put visual appeal first, then the overall value of the
graphic representation will be minimal. In resource-constrained settings, it is
necessary to understand how resources can best be used, and not to get caught up in
the graphics trend without a clear purpose.
R Pie Charts
R programming language has several libraries for creating charts and graphs. A pie
chart is a representation of values in the form of slices of a circle with different
colors. Slices are labeled with a description, and the numbers corresponding to each
slice are also shown in the chart. However, pie charts are not recommended in the R
documentation, and their features are limited. The authors recommend a bar chart
or dot plot over a pie chart because people are able to judge length more
accurately than volume.
Pie charts are created with the help of the pie() function, which takes a vector of
positive numbers as input. Additional parameters are used to control labels, colors,
titles, etc.
Here,
1. x is a vector that contains the numeric values used in the pie chart.
2. labels is used to give the description to the slices.
3. radius describes the radius of the pie chart.
4. main describes the title of the chart.
5. col defines the color palette.
6. clockwise is a logical value that indicates the clockwise or anti-clockwise direction in
which slices are drawn.
Example
Output:
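A minimal sketch of a pie chart with a title, a rainbow palette, and a legend is shown below. The values and labels are hypothetical, and the chart is written to a temporary PDF file (an assumption made so that the example runs without a screen device).

```r
# Hypothetical values and labels, invented for illustration
x <- c(20, 65, 15, 50)
labels <- c("India", "America", "Sri Lanka", "Nepal")

# One color per slice: the palette length matches the number of values
slice_colors <- rainbow(length(x))

pie_file <- tempfile(fileext = ".pdf")
pdf(pie_file)
pie(x, labels = labels, main = "Country Pie Chart",
    col = slice_colors, clockwise = TRUE)
# The legend repeats the labels next to their fill colors
legend("topright", legend = labels, fill = slice_colors, cex = 0.8)
dev.off()
```

Passing the same color vector to both pie() and legend() keeps the slices and the legend entries in agreement.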
Title and color
A pie chart has several more features that we can use by adding more parameters to
the pie() function. We can give a title to our pie chart by passing the main
parameter. Apart from this, we can use a rainbow color palette while drawing the
chart by passing the col parameter.
Note: The length of the palette must be the same as the number of values that we
have for the chart. For that, we will use the length() function.
Let's see an example to understand how these methods work in creating an attractive
pie chart with title and color.
Example
Output:
1. legend(x,y=NULL,legend,fill,col,bg)
Here,
Example
Output:
3 Dimensional Pie Chart
In R, we can also create a three-dimensional pie chart. For this purpose, R provides a
plotrix package whose pie3D() function is used to create an attractive 3D pie chart.
The parameters of pie3D() function remain same as pie() function. Let's see an
example to understand how a 3D pie chart is created with the help of this function.
Example
Output:
Example
Output:
R Bar Charts
A bar chart is a pictorial representation in which numerical values of variables are
represented by length or height of lines or rectangles of equal width. A bar chart is
used for summarizing a set of categorical data. In bar chart, the data is shown
through rectangular bars having the length of the bar proportional to the value of
the variable.
In R, we can create a bar chart to visualize the data in an efficient manner. For this
purpose, R provides the barplot() function, which has the following syntax:
barplot(H, xlab, ylab, main, names.arg, col)
1. H  A vector or matrix which contains numeric values used in the bar chart.
Example
Output:
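A minimal sketch of the barplot() call is given below. The revenue values and month labels are hypothetical, and the chart is written to a temporary PDF file so that the example runs without a screen device.

```r
# Hypothetical revenue values and their labels
H <- c(12, 35, 54, 3, 41)
M <- c("Feb", "Mar", "Apr", "May", "Jun")

bar_file <- tempfile(fileext = ".pdf")
pdf(bar_file)
# barplot() returns the x positions of the bar midpoints,
# which are useful for adding text or points above the bars
mids <- barplot(H, names.arg = M, xlab = "Month", ylab = "Revenue",
                col = "steelblue", main = "Revenue Bar Chart")
dev.off()
mids
```

When H is a matrix instead of a vector, each column becomes one stacked bar.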
Labels, Title & Colors
Like pie charts, we can also add more functionality to the bar chart by passing more
arguments to the barplot() function. We can add a title to our bar chart or add
colors to the bars by adding the main and col parameters, respectively. We can add
another parameter, i.e., names.arg, which is a vector with the same number of
values as the input vector and describes the meaning of each bar.
Let's see an example to understand how labels, titles, and colors are added in our bar
chart.
Example
Output:
Example
library(RColorBrewer)
months <- c("Jan", "Feb", "Mar", "Apr", "May")
regions <- c("West", "North", "South")
# Creating the matrix of the values.
Values <- matrix(c(21,32,33,14,95,46,67,78,39,11,22,23,94,15,16), nrow = 3, ncol = 5, byrow = TRUE)
# Giving the chart file a name
png(file = "stacked_chart.png")
# Creating the bar chart
barplot(Values, main = "Total Revenue", names.arg = months, xlab = "Month",
        ylab = "Revenue", col = c("cadetblue3", "deeppink2", "goldenrod1"))
# Adding the legend to the chart
legend("topleft", regions, cex = 1.3, fill = c("cadetblue3", "deeppink2", "goldenrod1"))

# Saving the file
dev.off()
Output:
R Boxplot
A boxplot shows how the values in a data set are distributed. It divides the data
set into quartiles. The graph represents the minimum, maximum, median, first
quartile, and third quartile of the data set. Boxplots are also useful for comparing
the distribution of data across groups by drawing a boxplot for each of them.
Here,
S.No  Parameter  Description
1.  x  It is a vector or a formula.
4.  varwidth  It is a logical value; set it to TRUE to draw the width of each box in proportion to the square root of its sample size.
5.  names  It is the group of labels that will be printed under each boxplot.
Let's see an example to understand how we can create a boxplot in R. In the below
example, we will use the "mtcars" dataset present in the R environment. We will use
its two columns only, i.e., "mpg" and "cyl". The below example will create a boxplot
graph for the relation between mpg and cyl, i.e., miles per gallon and number of
cylinders, respectively.
Example
Output:
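A minimal sketch of the boxplot described above, using the mpg and cyl columns of mtcars, could look like this. Writing to a temporary PDF file is an assumption made so that the example runs without a screen device.

```r
box_file <- tempfile(fileext = ".pdf")
pdf(box_file)
# The formula mpg ~ cyl draws one box of mpg values per cylinder count;
# varwidth = TRUE makes larger groups get wider boxes
stats <- boxplot(mpg ~ cyl, data = mtcars,
                 xlab = "Number of Cylinders",
                 ylab = "Miles Per Gallon",
                 main = "Mileage Data",
                 varwidth = TRUE)
dev.off()

# boxplot() also returns the numbers behind the picture
stats$names       # group labels
stats$n           # observations per group
stats$stats[3, ]  # the median of each group
```

Each column of stats$stats holds the lower whisker, first quartile, median, third quartile, and upper whisker of one group.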
Boxplot using notch
In R, we can draw a boxplot using a notch. It helps us to find out how the medians of
different data groups match with each other. Let's see an example to understand
how a boxplot graph is created using notch for each of the groups.
Example
Output:
Violin Plots
R provides an additional plotting scheme which is created with the combination of
a boxplot and a kernel density plot. The violin plots are created with the help of
vioplot() function present in the vioplot package.
Let's see an example to understand the creation of the violin plot.
Example
Output:
Bagplot- 2-Dimensional Boxplot Extension
The bagplot(x, y) function in the aplpack package provides a bivariate version of the
univariate boxplot. The bag contains 50% of all points. The bivariate median is
approximated. The fence separates the points inside it from the points outside,
and the outliers are displayed.
Example
Output:
R Histogram
A histogram is a type of bar chart which shows how many values fall into each of a
set of value ranges. A histogram is used to show the distribution of a variable,
whereas a bar chart is used for comparing different entities. In a histogram, the
height of each bar represents the number of values present in the given range.
For creating a histogram, R provides the hist() function, which takes a vector as
input and accepts more parameters to add more functionality. The syntax of the
hist() function is as follows:
1. hist(v,main,xlab,ylab,xlim,ylim,breaks,col,border)
Here,
Let's see an example in which we create a simple histogram with the help of required
parameters like v, main, col, etc.
Example
Output:
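A minimal sketch of a histogram, including the value that hist() returns, is given below. The input vector is hypothetical, and the chart is written to a temporary PDF file so that the example runs without a screen device.

```r
# Hypothetical input vector
v <- c(12, 24, 16, 38, 21, 13, 55, 17, 39, 10, 60)

hist_file <- tempfile(fileext = ".pdf")
pdf(hist_file)
# breaks sets the bin edges explicitly: 0-10, 10-20, ..., 50-60
h <- hist(v, main = "Histogram", xlab = "Weight", ylab = "Frequency",
          col = "green", border = "red", breaks = seq(0, 60, by = 10))
dev.off()

# hist() returns an object describing the bins
h$breaks  # the bin edges
h$counts  # how many values fall into each bin
```

By default each bin includes its right edge, so the value 10 is counted in the first bin and 60 in the last.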
Let's see some more examples in which we have used different parameters of the hist()
function to add more functionality or to create a more attractive chart.
Output:
Example: Finding return value of hist()
Output:
Output:
Output:
R Line Graphs
A line graph is a pictorial representation of information which changes continuously
over time. A line graph can also be referred to as a line chart. Within a line graph,
there are points connecting the data to show the continuous change. The lines in a
line graph can move up and down based on the data. We can use a line graph to
compare different events, information, and situations.
A line chart is used to connect a series of points by drawing line segments between
them. Line charts are used in identifying the trends in data. For line graph
construction, R provides plot() function, which has the following syntax:
1. plot(v,type,col,xlab,ylab)
Here,
2. type  This parameter takes the value "l" to draw only the lines, "p" to draw only the points,
and "o" to draw both lines and points.
6. col  It is used to give the color for both the points and lines.
Let's see a basic example to understand how the plot() function is used to create the line
graph:
Example
Output:
The lines() function takes an additional input vector for creating a line. Let's see an
example to understand how this function is used:
Example
# Creating the data for the chart.
v <- c(13, 22, 28, 7, 31)
w <- c(11, 13, 32, 6, 35)
x <- c(12, 22, 15, 34, 35)
# Giving a name to the chart file (the extension matches the png device).
png(file = "multi_line_graph.png")
# Plotting the line chart; ylim covers all three series so that no point is cut off.
plot(v, type = "o", col = "green", xlab = "Month", ylab = "Temperature",
     ylim = range(v, w, x))
lines(w, type = "o", col = "red")
lines(x, type = "o", col = "blue")
# Saving the file.
dev.off()
Output:
Line Graph using ggplot2
In R, there is another way to create a line graph, i.e., the use of the ggplot2
package. The ggplot2 package provides the geom_line(), geom_step(), and geom_path()
functions to create line graphs. To use these functions, we first have to install
the ggplot2 package and then load it into the current session.
Let's see an example to understand how ggplot2 is used to create a line graph. In
the below example, we will use a small data frame modeled on the predefined
ToothGrowth dataset, which describes the effect of vitamin C on tooth growth in
guinea pigs.
Example
library(ggplot2)
# Creating data for the graph
data_frame <- data.frame(dose = c("D0.5", "D1", "D2"),
                         len = c(4.2, 10, 29.5))
head(data_frame)
# "%d" in the file name gives each plot its own numbered PNG file
png(file = "multi_line_graph2_%d.png")
# Basic line plot with points
ggplot(data = data_frame, aes(x = dose, y = len, group = 1)) + geom_line() + geom_point()
# Change the line type
ggplot(data = data_frame, aes(x = dose, y = len, group = 1)) + geom_line(linetype = "dashed") +
  geom_point()
# Change the color
ggplot(data = data_frame, aes(x = dose, y = len, group = 1)) + geom_line(color = "red") +
  geom_point()
dev.off()
Output:
R Scatterplots
Scatter plots are used to compare variables. A comparison between variables is
required when we need to see how much one variable is affected by another
variable. In a scatterplot, the data is represented as a collection of points. Each
point on the scatterplot defines the values of the two variables. One variable is
selected for the vertical axis and the other for the horizontal axis. In R, there are
two ways of creating a scatterplot, i.e., using the plot() function and using the
ggplot2 package's functions.
Here,
Let's see an example to understand how we can construct a scatterplot using the plot
function. In our example, we will use the dataset "mtcars", which is the predefined
dataset available in the R environment.
Example
Output:
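A minimal sketch of such a scatterplot with plot() follows. The choice of the wt and mpg columns is an assumption made for illustration, and the chart is written to a temporary PDF file so that the example runs without a screen device.

```r
# Two columns of the built-in mtcars dataset: weight and mileage
input <- mtcars[, c("wt", "mpg")]

scatter_file <- tempfile(fileext = ".pdf")
pdf(scatter_file)
# Each point is one car: weight on the x-axis, mileage on the y-axis
plot(x = input$wt, y = input$mpg,
     xlab = "Weight", ylab = "Mileage",
     main = "Weight vs Mileage")
dev.off()

# The correlation puts a number on the trend visible in the plot
r <- cor(input$wt, input$mpg)
r
```

The negative correlation matches the downward trend of the points: heavier cars tend to have lower mileage.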
Scatterplot using ggplot2
In R, there is another way of creating a scatterplot, i.e., with the help of the
ggplot2 package.
The ggplot2 package provides the ggplot() and geom_point() functions for creating a
scatterplot. The ggplot() function takes a series of input items. The first parameter
is the input data frame, and the second is the aes() function in which we specify
the variables for the x-axis and y-axis.
Let's start understanding how the ggplot2 package is used with the help of an
example where we have used the familiar dataset "mtcars".
Example
Output:
We can add more features to make scatter plots more attractive. Below are
some examples in which different parameters are added.
Output:
Output:
Output:
Output:
Example 5: Adding title with dynamic name
Output:
Output:
Example 7: Changing name of x-axis and y-axis
Output:
Output:
Linear Regression
Linear regression is used to predict the value of an outcome variable y on the basis
of one or more input predictor variables x. In other words, linear regression is used to
establish a linear relationship between the predictor and response variables.
In linear regression, predictor and response variables are related through an equation
in which the exponent of both these variables is 1. Mathematically, a linear
relationship denotes a straight line, when plotted as a graph.
y = ax + b
Here,
o y is a response variable.
o x is a predictor variable.
o a and b are constants that are called the coefficients.
1. In the first step, we carry out the experiment of gathering a sample of observed
values of height and weight.
2. After that, we create a relationship model using the lm() function of R.
3. Next, we will find the coefficient with the help of the model and create the
mathematical equation using this coefficient.
4. We will get the summary of the relationship model to understand the average error
in prediction, known as residuals.
5. At last, we use the predict() function to predict the weight of the new person.
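The steps above can be sketched with the built-in women dataset, which records the height and weight of 15 women. Using this dataset (instead of hand-collected observations) and the new height of 66 inches are assumptions made for this illustration.

```r
# Step 1: the observed values of height and weight
head(women)

# Step 2: create the relationship model
model <- lm(weight ~ height, data = women)

# Step 3: read the coefficients to write down the equation
coefs <- coef(model)
coefs

# Step 4: the summary includes the residuals and their spread
summary(model)

# Step 5: predict the weight of a new person from her height;
# newdata must be a data frame with the predictor's column name
new_person <- data.frame(height = 66)
pred <- predict(model, newdata = new_person)
pred
```

The fitted equation has the form weight = a * height + b, where a is the height coefficient and b is the intercept.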
1. lm(formula,data)
Here,
Example
Output:
Call:
lm(formula = y ~ x)
Coefficients:
(Intercept) x
47.50833 0.07276
Example
Call:
lm(formula = y ~ x)
Residuals:
Min 1Q Median 3Q Max
-38.948 -7.390 1.869 15.933 34.087
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 47.50833 55.18118 0.861 0.414
x 0.07276 0.39342 0.185 0.858
1. predict(object, newdata)
Here,
1. object  It is the model that we have already created using the lm() function.
2. newdata  It is the data frame that contains the new value for the predictor variable.
Example
Output:
1
59.14977
Plotting Regression
Now, we plot our prediction results with the help of the plot() function. This
function takes the parameters x and y as input vectors, along with many more
arguments.
Example
Output:
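A plot of this kind might be produced as in the sketch below. The data are the same illustrative height and weight values assumed earlier, and the title, axis labels, and colors are arbitrary choices.

```r
# Illustrative sample (an assumption): heights in cm and weights in kg
height <- c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131)
weight <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48)
model <- lm(weight ~ height)

# Scatter plot of the observed values
plot(height, weight,
     col = "blue", pch = 16,
     main = "Height and weight regression",
     xlab = "Height in cm", ylab = "Weight in kg")

# Overlay the fitted regression line taken from the model
abline(model, col = "red")
```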
R-Multiple Linear Regression
Multiple linear regression is the extension of the simple linear regression, which is
used to predict the outcome variable (y) based on multiple distinct predictor
variables (x). With the help of three predictor variables (x1, x2, x3), the prediction of y
is expressed using the following equation:
y=b0+b1*x1+b2*x2+b3*x3
The "b" values represent the regression weights. They measure the association
between the outcome and the predictor variables.
Or
Multiple linear regression extends linear regression to the relationship among
more than two variables. In simple linear regression, we have one predictor
and one response variable. But in multiple regression, we have more than one
predictor variable and one response variable.
y=b0+b1*x1+b2*x2+b3*x3+...+bn*xn
Here,
o y is a response variable.
o b0, b1, b2...bn are the coefficients.
o x1, x2, ...xn are the predictor variables.
In R, we create the regression model with the help of the lm() function. The model
will determine the value of the coefficients with the help of the input data. We can
predict the value of the response variable for the set of predictor variables using
these coefficients.
Before proceeding further, we first create our data for multiple regression. We will
use the "mtcars" dataset present in the R environment. The main task of the model is
to create the relationship between the "mpg" as a response variable with "wt", "disp"
and "hp" as predictor variables.
For this purpose, we will create a subset of these variables from the "mtcars" dataset.
data <- mtcars[, c("mpg", "wt", "disp", "hp")]
print(head(data))
Output:
Creating Relationship Model and finding Coefficient
Now, we will use the data which we have created before to create the Relationship
Model. We will use the lm() function, which takes two parameters i.e., formula and
data. Let's start understanding how the lm() function is used to create the
Relationship Model.
Example
Output:
From the above output, it is clear that our model is successfully set up. Now, our next
step is to find the coefficients with the help of the model.
b0<- coef(Model)[1]
print(b0)
x_wt<- coef(Model)[2]
x_disp<- coef(Model)[3]
x_hp<- coef(Model)[4]
print(x_wt)
print(x_disp)
print(x_hp)
Output:
Let's see an example in which we predict the mileage for a car with weight=2.51,
disp=211 and hp=82.
Example
Output:
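The multiple regression workflow described in this section could look roughly like the sketch below. It reuses the names data and Model from the text; the name new_car and the manual check at the end are additions made here for illustration.

```r
# Subset of mtcars with the response (mpg) and the three predictors
data <- mtcars[, c("mpg", "wt", "disp", "hp")]

# Create the relationship model
Model <- lm(mpg ~ wt + disp + hp, data = data)
print(Model)

# Coefficients b0 (intercept), b1, b2, b3
coefs <- coef(Model)
print(coefs)

# Predict the mileage for a car with weight = 2.51, disp = 211 and hp = 82
new_car <- data.frame(wt = 2.51, disp = 211, hp = 82)
predicted <- predict(Model, newdata = new_car)
print(predicted)

# The same value computed by hand from y = b0 + b1*x1 + b2*x2 + b3*x3
manual <- coefs[["(Intercept)"]] + coefs[["wt"]] * 2.51 +
  coefs[["disp"]] * 211 + coefs[["hp"]] * 82
print(manual)
```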
R-Logistic Regression
In logistic regression, a regression curve, y = f(x), is fitted. In the regression curve
equation, y is a categorical variable. This regression model is used for predicting
y given a set of predictors x. The predictors can be categorical, continuous,
or a mixture of both.
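The general mathematical equation of logistic regression is:
y = 1/(1 + e^-(b0 + b1*x1 + b2*x2 + ... + bn*xn))
In the above equation, y is the response variable, x1, x2, ..., xn are the predictor
variables, and b0, b1, b2, ..., bn are the coefficients, which are numeric constants.
We use the glm() function to create the regression model.
glm(formula, data, family)
Here,
1. formula It is a symbol presenting the relationship between the variables.
2. data It is the dataset giving the values of these variables.
3. family An R object which specifies the details of the model, and its value is binomial for logistic
regression.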
Let's see an example to understand how the glm function is used to create logistic
regression and how we can use the summary function to find a summary for the
analysis.
Example
# Loading library
library(mlbench)
# Using the BreastCancer dataset
data(BreastCancer, package = "mlbench")
breast_canc <- BreastCancer[complete.cases(BreastCancer), ]
# Displaying the structure of the dataset with the str() function
str(breast_canc)
Output:
We now divide our data into training and test sets, with the training set containing 70%
of the data and the test set containing the remaining 30%.
Output:
Now, we construct the logistic regression model with the help of the glm() function.
We pass the formula Class~Cell.shape as the first parameter, specify the
attribute family as "binomial", and use Training_data as the third parameter.
Example
Output:
Now, use the summary function for analysis.
Output:
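The split, fit, and summary steps can be sketched as below. To keep the sketch free of extra packages, it swaps in the built-in mtcars data (am as the binary response, wt as the predictor) instead of BreastCancer; the names car_data, train_idx, Test_data, and logit_model are assumptions made here. The same pattern applies to Class ~ Cell.shape with breast_canc.

```r
set.seed(42)

# Stand-in data (an assumption): am is 0/1, wt is the car weight
car_data <- mtcars[, c("am", "wt")]

# 70% of the rows for training, the remaining 30% for testing
n <- nrow(car_data)
train_idx <- sample(seq_len(n), size = floor(0.7 * n))
Training_data <- car_data[train_idx, ]
Test_data <- car_data[-train_idx, ]

# Logistic regression: the binomial family selects the logit link
logit_model <- glm(am ~ wt, data = Training_data, family = "binomial")
print(summary(logit_model))

# Predicted probabilities on the test set, turned into class labels
probs <- predict(logit_model, newdata = Test_data, type = "response")
predicted_class <- ifelse(probs > 0.5, 1, 0)
print(table(predicted = predicted_class, actual = Test_data$am))
```

With a training set this small, glm() may warn that fitted probabilities are numerically 0 or 1; the model is still returned.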
R Poisson Regression
The Poisson Regression model is used for modeling events where the outcomes are
counts. Count data is discrete data with non-negative integer values that count
things, such as the number of people in line at the grocery store, or the number of
times an event occurs during a given timeframe.
Count data can also be expressed as rate data, since the number of times an event
occurs within a timeframe can be stated either as a raw count or as a rate.
Poisson regression allows us to determine which explanatory variables (x values)
influence a given response variable (y value, a count or a rate).
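The general mathematical equation of Poisson regression is:
log(y)=b0+b1*x1+b2*x2+...+bn*xn
Here,
o y is the response variable.
o b0, b1, b2...bn are the numeric coefficients.
o x1, x2, ...xn are the predictor variables.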
The Poisson regression model is created with the help of the familiar function glm().
Let's see an example in which we create the Poisson regression model using the glm()
function. In this example, we consider the built-in dataset "warpbreaks", which
describes the effect of wool type (A or B) and tension (low, medium, or high) on
the number of warp breaks per loom. We take the wool type ("wool") and "tension" as
the predictor variables, and "breaks" as the response variable.
Example
Output:
Now, we will create the regression model with the help of the glm() function as:
Output:
Now, let's use summary() function to find the summary of the model for data
analysis.
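A minimal sketch of this example is given below. The dataset and its columns (breaks, wool, tension) are built into R; the names input_data and output_result are chosen here for illustration.

```r
# Built-in dataset: breaks (count), wool (A/B), tension (L/M/H)
input_data <- warpbreaks
print(head(input_data))

# Poisson regression: the poisson family uses the log link by default
output_result <- glm(breaks ~ wool + tension,
                     data = input_data, family = poisson)

# Summary of the model for analysis
print(summary(output_result))
```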
R Normal Distribution
In random collections of data from independent sources, it is commonly seen that
the distribution of data is normal. It means that if we plot a graph with the value of
the variable on the horizontal axis and the count of the values on the vertical axis, then we
get a bell-shaped curve. The center of the curve represents the mean of the data set. In the
graph, fifty percent of the values lie to the left of the mean and the other fifty
percent to the right of the mean. This is referred to as the normal distribution.
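R provides four built-in functions to work with the normal distribution:
dnorm(x, mean, sd)
pnorm(x, mean, sd)
qnorm(p, mean, sd)
rnorm(n, mean, sd)
Here,
1. x It is a vector of numbers.
2. p It is a vector of probabilities.
3. n It is the number of observations (the sample size).
4. mean It is the mean value of the sample data, whose default value is zero.
5. sd It is the standard deviation, whose default value is 1.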
Let's start understanding how these functions are used with the help of the examples.
dnorm():Density
The dnorm() function of R calculates the height of the probability density at each
point for a given mean and standard deviation. The probability density of the normal
distribution is:
f(x) = 1/(σ√(2π)) * e^(-(x-μ)^2/(2σ^2))
Here, μ is the mean and σ is the standard deviation.
Example
Output:
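As a sketch of how dnorm() might be used, the code below evaluates the density on a grid and plots the bell curve. The mean of 2.5 and standard deviation of 0.5 are arbitrary values chosen for illustration.

```r
# Grid of points at which to evaluate the density
x <- seq(-10, 10, by = 0.1)

# Density of a normal distribution with mean 2.5 and sd 0.5 (assumed values)
y <- dnorm(x, mean = 2.5, sd = 0.5)

# The curve peaks at the mean, where the height is 1 / (sd * sqrt(2 * pi))
peak <- max(y)
print(peak)

plot(x, y, type = "l", main = "Normal density", xlab = "x", ylab = "Density")
```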
pnorm():Direct Look-Up
The pnorm() function is also known as the "Cumulative Distribution Function". This
function calculates the probability that a normally distributed random number
is less than or equal to a given number. The cumulative distribution is as follows:
f(x)=P(X≤x)
Example
Output:
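A few cumulative probabilities can be looked up as in this sketch. The variable names and the mean and standard deviation in the last call are assumptions for illustration.

```r
# P(X <= 0) for the standard normal distribution: half the area lies below the mean
p_below_mean <- pnorm(0)
print(p_below_mean)      # 0.5

# Probability of falling within one standard deviation of the mean
p_within_one_sd <- pnorm(1) - pnorm(-1)
print(p_within_one_sd)   # about 0.683

# P(X <= 3) for a normal distribution with mean 2.5 and sd 2 (assumed values)
p_custom <- pnorm(3, mean = 2.5, sd = 2)
print(p_custom)
```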
qnorm():Inverse Look-Up
The qnorm() function takes the probability value as an input and calculates a number
whose cumulative value matches with the probability value. The cumulative
distribution function and the inverse cumulative distribution function are related by
p = f(x)
x = f^-1(p)
Example
Output:
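The inverse relationship between pnorm() and qnorm() can be checked with a short sketch like the one below. The probabilities and the mean and standard deviation are arbitrary values assumed here.

```r
# The value below which 50% of the standard normal distribution lies
q_median <- qnorm(0.5)
print(q_median)   # 0

# The value below which 97.5% of the standard normal distribution lies
q_975 <- qnorm(0.975)
print(q_975)      # about 1.96

# Round trip: p -> x = f^-1(p) -> f(x) returns the original p
x <- qnorm(0.3, mean = 2, sd = 1)
round_trip <- pnorm(x, mean = 2, sd = 1)
print(round_trip)
```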
rnorm():Random variates
The rnorm() function is used for generating normally distributed random numbers.
This function generates random numbers by taking the sample size as an input. Let's
see an example in which we draw a histogram for showing the distribution of the
generated numbers.
Example
Output:
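Such a histogram might be produced as follows. The seed, sample size, mean, and standard deviation are assumptions chosen for illustration.

```r
# Fix the seed so the random numbers are reproducible
set.seed(123)

# 1500 random numbers from a normal distribution with mean 10 and sd 2
y <- rnorm(1500, mean = 10, sd = 2)

# The histogram should show the familiar bell shape centered near 10
hist(y, main = "Normal distribution", xlab = "Value")
```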
Binomial Distribution
The binomial distribution is a discrete probability distribution, which
is used to find the probability of a given number of successes in a series of experiments,
each of which has only two possible outcomes. Tossing a coin is the best example of
the binomial distribution. When a coin is tossed, it gives either a head or a tail. The
probability of finding exactly three heads when a coin is tossed ten times is
estimated using the binomial distribution.
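R provides four built-in functions to work with the binomial distribution:
dbinom(x, size, prob)
pbinom(x, size, prob)
qbinom(p, size, prob)
rbinom(n, size, prob)
Here,
1. x It is a vector of numbers.
2. p It is a vector of probabilities.
3. n It is the number of observations.
4. size It is the number of trials.
5. prob It is the probability of success of each trial.
Let's start understanding how these functions are used with the help of the examples.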
dbinom(): Direct Look-Up, Points
The dbinom() function of R calculates the probability of each point, that is, of
each possible number of successes. In simple words, it calculates the probability
mass function of the particular binomial distribution.
Example
Output:
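As a sketch, the coin example from the introduction (exactly three heads in ten tosses) can be computed with dbinom(). The variable names are chosen here for illustration.

```r
# Probability of exactly 3 heads in 10 tosses of a fair coin
p_three_heads <- dbinom(3, size = 10, prob = 0.5)
print(p_three_heads)   # 0.1171875, i.e. choose(10, 3) / 2^10

# Probabilities of 0, 1, ..., 10 heads; together they sum to 1
all_probs <- dbinom(0:10, size = 10, prob = 0.5)
print(sum(all_probs))

plot(0:10, all_probs, type = "h", xlab = "Number of heads", ylab = "Probability")
```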
pbinom():Direct Look-Up, Intervals
The pbinom() function of R calculates the cumulative probability (a single value
representing the probability) of an event. In simple words, it calculates the
cumulative distribution function of the particular binomial distribution.
Example
Output:
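A cumulative probability could be looked up as in the sketch below, which continues the assumed example of ten tosses of a fair coin.

```r
# Probability of at most 3 heads in 10 tosses of a fair coin
p_at_most_three <- pbinom(3, size = 10, prob = 0.5)
print(p_at_most_three)   # 0.171875

# The same value as a sum of the point probabilities for 0, 1, 2 and 3 heads
p_by_sum <- sum(dbinom(0:3, size = 10, prob = 0.5))
print(p_by_sum)
```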
qbinom(): Inverse Look-Up
The qbinom() function of R takes the probability value and generates a number
whose cumulative value matches with the probability value. In simple words, it
calculates the inverse cumulative distribution function of the binomial distribution.
Let's find the number of heads whose cumulative probability is 0.45 when a coin is
tossed 51 times.
Example
Output:
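This lookup might be written as in the sketch below, assuming a fair coin (prob = 0.5). The check afterwards shows that qbinom() returns the smallest number of heads whose cumulative probability reaches 0.45.

```r
# Smallest number of heads x such that P(X <= x) >= 0.45 for 51 tosses
x <- qbinom(0.45, size = 51, prob = 0.5)
print(x)

# The cumulative probability at x reaches 0.45, but at x - 1 it does not
print(pbinom(x, size = 51, prob = 0.5))
print(pbinom(x - 1, size = 51, prob = 0.5))
```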
rbinom()
The rbinom() function of R is used to generate the required number of random values for
a given probability and a given sample size.
Let's see an example in which we generate nine random values from a sample of 160 with
a probability of 0.5.
Example
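One way to write this example is sketched below. The seed is an assumption added here so that the result is reproducible.

```r
# Fix the seed so the random values are reproducible
set.seed(10)

# Nine random values, each the number of successes in 160 trials with probability 0.5
x <- rbinom(9, size = 160, prob = 0.5)
print(x)
```

Each value is a count between 0 and 160 and will typically lie close to 160 * 0.5 = 80.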
T-Test in R
In statistics, the T-test is one of the most common tests, which is used to determine
whether the means of two groups are equal to each other. The assumption for the
test is that both groups are sampled from normal distributions with equal
variances. The null hypothesis is that the two means are the same, and the
alternative is that they are not. Under the null hypothesis,
we can compute a t-statistic that follows a t-distribution with n1 + n2 - 2 degrees
of freedom.
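The t-statistic and its n1 + n2 - 2 degrees of freedom can be worked out by hand, as in the sketch below. The two groups are simulated here purely for illustration, and the hand computation is compared with the equal-variance form of t.test().

```r
set.seed(1)

# Two simulated groups (assumed data) with equal standard deviation
group1 <- rnorm(20, mean = 10, sd = 2)
group2 <- rnorm(25, mean = 11, sd = 2)
n1 <- length(group1)
n2 <- length(group2)

# Pooled variance, then the t-statistic for the difference of the means
pooled_var <- ((n1 - 1) * var(group1) + (n2 - 1) * var(group2)) / (n1 + n2 - 2)
t_stat <- (mean(group1) - mean(group2)) / sqrt(pooled_var * (1 / n1 + 1 / n2))

# Degrees of freedom and two-sided p-value from the t-distribution
df <- n1 + n2 - 2
p_value <- 2 * pt(-abs(t_stat), df)
print(c(t = t_stat, df = df, p = p_value))

# The equal-variance t.test() reports the same quantities
result <- t.test(group1, group2, var.equal = TRUE)
print(result)
```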
In R, there are various types of T-test, such as the one-sample T-test and the Welch T-test.
R provides the t.test() function, which performs a variety of T-tests.
The t.test() function has the following syntaxes for the different T-tests:
Independent two-group T-test
t.test(y ~ x)
Here, y is numeric, and x is a binary factor.
Independent two-group T-test
t.test(y1, y2)
Here, y1 and y2 are numeric.
Paired T-test
t.test(y1, y2, paired = TRUE)
One-sample T-test
t.test(y, mu = 3)
Let's see how the one-sample, paired-sample, and independent-samples T-tests are
performed.
One-Sample T-test
The One-Sample T-test compares the mean of a vector against a
theoretical mean. The following formula is used to compute the t-statistic:
t = (M - μ) / (s / √n)
Here,
1. M is the mean.
2. μ is the theoretical mean.
3. s is the standard deviation.
4. n is the number of observations.
For evaluating the statistical significance of the t-test, we need to compute the
p-value. The p-value ranges from 0 to 1 and is interpreted as follows:
o If the p-value is lower than 0.05, we can confidently reject the null
hypothesis in favor of the alternative hypothesis.
o If the p-value is higher than 0.05, we don't have enough
evidence to reject the null hypothesis.
We obtain the p-value from the absolute value of the t-statistic.
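The one-sample statistic and its p-value can be computed by hand and compared with t.test(), as in the sketch below. The sample x and the theoretical mean mu0 are assumed values chosen for illustration.

```r
set.seed(0)

# Simulated sample (assumed data) and the theoretical mean to test against
x <- rnorm(30, mean = 5.2, sd = 1)
mu0 <- 5

# t = (M - mu) / (s / sqrt(n))
n <- length(x)
t_stat <- (mean(x) - mu0) / (sd(x) / sqrt(n))

# Two-sided p-value from the absolute value of t, with n - 1 degrees of freedom
p_value <- 2 * pt(-abs(t_stat), df = n - 1)
print(c(t = t_stat, p = p_value))

# t.test() reports the same statistic and p-value
result <- t.test(x, mu = mu0)
print(result)
```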
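t.test(x, mu = 0)
Here,
1. x It is the numeric vector of sample values.
2. mu It is the theoretical mean against which the sample mean is compared; its default value is 0.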
Example
Let's see an example of a One-Sample T-test in which we test whether the mean volume of a
shipment of wood differs from the usual volume (μ0 = 35000).
set.seed(0)
ship_vol <- rnorm(70, mean = 35000, sd = 2000)
t.test(ship_vol, mu = 35000)
Output:
Paired-Sample T-test
To perform a paired-sample test, we need two data vectors, y1 and y2. Then, we
run the code using the syntax t.test(y1, y2, paired = TRUE).
Example:
Suppose we work in a large health clinic and are testing a new drug, Procardia,
which aims to reduce high blood pressure. We find 13000 individuals with high
systolic blood pressure (mean = 150 mmHg, SD = 10 mmHg), provide them
with Procardia for a month, and then measure their blood pressure again. We find
that the average systolic blood pressure decreased to 144 mmHg with a standard
deviation of 9 mmHg.
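set.seed(2800)
# Simulated readings using the sample size, means, and standard deviations stated above
pre_treatment <- rnorm(13000, mean = 150, sd = 10)
post_treatment <- rnorm(13000, mean = 144, sd = 9)
# The same variable names must be used in the test as in the assignments
t.test(pre_treatment, post_treatment, paired = TRUE)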
Output:
Independent-Sample T-test
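Depending on the structure of our data and the equality of their variance, the
independent-sample T-test can take one of the three forms, which are as follows:
1. Independent-samples T-test where y1 and y2 are numeric.
2. Independent-samples T-test where y1 is numeric and y2 is binary.
3. Independent-samples T-test with equal variances not assumed (Welch's test).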
The general form of the t.test() function for the independent-sample t-test is:
t.test(y1, y2, paired = FALSE)
By default, R assumes that the variances of y1 and y2 are unequal, thus defaulting to
Welch's test. To change this, we set the flag var.equal = TRUE.
Let's see some examples in which we test the hypothesis that Clevelanders and
New Yorkers spend different amounts on eating out each month.
Example 1: Where y1 and y2 are numeric
set.seed(0)
Spenders.Cleve <- rnorm(50, mean = 300, sd = 70)
Spenders.NY <- rnorm(50, mean = 350, sd = 70)
t.test(Spenders.Cleve, Spenders.NY, var.equal = TRUE)
Output:
Example 2: Where y1 is numeric and y2 is binary
set.seed(0)
Spenders.Cleve <- rnorm(50, mean = 300, sd = 70)
Spenders.NY <- rnorm(50, mean = 350, sd = 70)
Amount.Spent <- c(Spenders.Cleve, Spenders.NY)
city.name <- c(rep("Cleveland", 50), rep("New York", 50))
t.test(Amount.Spent ~ city.name, var.equal = TRUE)
Output:
Example 3: With equal variances not assumed
set.seed(0)
Spenders.Cleve <- rnorm(50, mean = 300, sd = 70)
Spenders.NY <- rnorm(50, mean = 350, sd = 70)
t.test(Spenders.Cleve, Spenders.NY, var.equal = FALSE)
Output:
Chi-Square Test
The Chi-Square Test is used to analyze the frequency table (i.e., contingency table),
which is formed by two categorical variables. The chi-square test evaluates whether
there is a significant relationship between the categories of the two variables.
The Chi-Square Test is a statistical method which is used to determine whether two
categorical variables are significantly associated. These variables
should come from the same population and should be categorical, like Yes/No,
Red/Green, Male/Female, etc.
R provides the chisq.test() function to perform the chi-square test. This function takes
data in table form as input, containing the counts of the variables in
the observations.
In R, the chisq.test() function has the following syntax:
chisq.test(data)
Let's see an example in which we take the Cars93 data present in the "MASS"
library. This data represents the sales of different models of cars in the year 1993.
Data:
library("MASS")
str(Cars93)
Output:
Example:
Output:
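A test on this data could be run as in the sketch below. The choice of the AirBags and Type columns is an assumption made here for illustration; any two categorical columns of Cars93 can be used in the same way.

```r
library(MASS)

# Contingency table of two categorical variables (column choice is an assumption)
car_data <- table(Cars93$AirBags, Cars93$Type)
print(car_data)

# Chi-square test of independence on the table of counts
result <- chisq.test(car_data)
print(result)

# A p-value below 0.05 would indicate a significant association
print(result$p.value)
```

R may warn that the chi-squared approximation could be inaccurate, because some cells of this table contain small counts.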
R Classification
The idea of the classification algorithm is very simple. We predict the target class by
analyzing the training dataset. We use training datasets to obtain better boundary
conditions that can be used to determine each target class. Once the boundary
condition is determined, the next task is to predict the target class. The entire
process is known as classification.
o Linear classifier
In machine learning, the main task of statistical classification is to use an object's
characteristics for finding to which class it belongs. This task is achieved by making a
classification decision based on the value of a linear combination of the
characteristics. In R, there are three linear classification algorithms which are as
follows:
1. Logistic Regression
2. Naive Bayes classifier
3. Fisher's linear discriminant
o Support vector machines
A support vector machine is a supervised learning algorithm that analyzes data
for classification and regression analysis. In SVM, each data item is plotted
as a point in n-dimensional space, where the value of each attribute is the value of
a particular coordinate.
Least squares support vector machines are the most used SVM classification algorithm in R.
o Quadratic classifiers
Quadratic classification algorithms are based on Bayes' theorem. These classifier
algorithms differ from logistic regression in their approach to classification.
In logistic regression, it is possible to derive the probability of
a class (Y = k) directly for a particular observation (X = x). But in quadratic
classifiers, this is done in the following two steps:
1. In the first step, we identify the distribution for input X for each of the groups
or classes.
2. After that, we flip the distribution with the help of Bayes theorem to calculate
the probability.
o Kernel estimation
Kernel estimation is a non-parametric way of estimating the Probability Density
Function (PDF) of a continuous random variable. It is non-parametric because it
assumes no underlying distribution for the variable. Essentially, on each datum, a kernel
function is created with the datum at its center. This ensures that the kernel is
symmetric about the datum. The PDF is then estimated by adding all these kernel
functions and dividing the sum by the number of data points, which ensures that it
satisfies the two properties of a PDF:
1. Every possible value of the PDF should be non-negative.
2. The definite integral of the PDF over its support set should be equal to 1.
In R, the k-nearest neighbor is the most used kernel estimation algorithm for
classification.
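The kernel sum described above can be sketched directly. The sample, the Gaussian kernel, and the bandwidth h are assumptions chosen for illustration; R's built-in density() function performs the same kind of estimate.

```r
set.seed(1)

# Sample data and bandwidth (both assumed for illustration)
x <- rnorm(200)
h <- 0.4

# At each grid point, average the Gaussian kernels centered on every datum
grid <- seq(-6, 6, length.out = 1201)
kde_manual <- sapply(grid, function(g) mean(dnorm(g, mean = x, sd = h)))

# Property 1: the estimate is non-negative everywhere
print(all(kde_manual >= 0))

# Property 2: the estimate integrates to (approximately) 1
step <- grid[2] - grid[1]
area <- sum(kde_manual) * step
print(area)

# Built-in estimate with the same kernel and bandwidth, for comparison
kde_builtin <- density(x, bw = h, kernel = "gaussian")
plot(kde_builtin, main = "Kernel density estimate")
lines(grid, kde_manual, col = "red", lty = 2)
```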
o Decision Trees
A decision tree is a supervised learning algorithm that is used for classification and
regression tasks. In R, the decision tree classifier can be implemented with the help of the
caret machine learning package. The random forest algorithm is the most used
decision tree algorithm in R.
o Neural Networks
The neural network is another classifier algorithm, inspired by the way the human brain
performs a particular task or function. These algorithms are mostly used for
image classification in R. To implement neural network algorithms, we have to install
the neuralnet package.
o Learning vector quantization
Learning vector quantization (LVQ) is a classification algorithm that is used for binary and
multi-class problems. By learning from the training dataset, the LVQ model creates
codebook vectors that represent class regions. They contain elements which are
placed around the respective class according to their matching level. If an element
matches, it moves closer to the target class; if it does not match, it moves away.