R PROGRAMMING LAB MANUAL
PEO2: To exhibit technical skills to analyze, design and develop solutions for
engineering issues by using innovative methods, cutting edge tools and techniques.
R PROGRAMMING
D. Y. PATIL COLLEGE OF ENGINEERING & TECHNOLOGY
Department of Computer Science & Engineering (Data Science)
Academic Year 2021-22
Lab Manual
Experiment 1
Aim: To Install R and R Packages
To Install RStudio
To Install R Packages
The capabilities of R are extended through user-created packages, which provide specialized
statistical techniques, graphical devices, import/export capabilities, reporting tools (knitr,
Sweave), etc. These packages are developed primarily in R, and sometimes in Java, C, C++ and
Fortran. The R packaging system is also used by researchers to create compendia that organise
research data, code and report files in a systematic way for sharing and public archiving.
A core set of packages is included with the installation of R, with more than 12,500 additional
packages (as of May 2018) available at the Comprehensive R Archive Network (CRAN).
Packages are collections of R functions, data, and compiled code in a well-defined format. The
directory where packages are stored is called the library. R comes with a standard set of
packages. Others are available for download and installation. Once installed, they have to be
loaded into the session to be used.
.libPaths()   # get library location
library()     # see all packages installed
search()      # see packages currently loaded
Adding R Packages
You can expand the types of analyses you do by adding other packages. A complete list of
contributed packages is available from CRAN.
1. Download and install a package (you only need to do this once).
2. To use the package, invoke the library(package) command to load it into the current session.
(You need to do this once in each session, unless you customize your environment to load it
automatically each time.)
The ability to estimate ordered logistic or probit regression models is included in the
MASS package.
To install this package, run the following command:
> install.packages("MASS")
You will be asked to pick a CRAN mirror from which to download (generally, the closer the
faster), and R will install the package into your library. Installing alone does not make the
package's functions available: to actually use the new package, you have to load it each time
you start an R session, like so:
> library(MASS)
Packages are frequently updated; depending on the developer, this can happen very often.
To keep your packages up to date, run this every once in a while:
> update.packages()
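A minimal sketch of the install-once, load-every-session pattern described above. It assumes an internet connection only in case MASS is missing (MASS normally ships with R as a recommended package). The model at the end is the ordered logistic regression example from the MASS documentation, fitted to the housing data bundled with the package.

```r
# Install MASS only if it cannot be found in any library listed by .libPaths()
if (!requireNamespace("MASS", quietly = TRUE)) {
  install.packages("MASS")
}

# Load (attach) the package for this session
library(MASS)

# The package now appears on the search path
"package:MASS" %in% search()

# Ordered logistic regression with polr() on the housing data from MASS;
# method = "probit" would fit an ordered probit model instead
house.plr <- polr(Sat ~ Infl + Type + Cont, weights = Freq, data = housing,
                  Hess = TRUE)
summary(house.plr)
```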
The Workspace
The workspace is your current R working environment and includes any user-defined objects
(vectors, matrices, data frames, lists, functions). At the end of an R session, the user can save an
image of the current workspace that is automatically reloaded the next time R is started.
Commands are entered interactively at the R user prompt. Up and down arrow keys scroll
through your command history.
You will probably want to keep different projects in different physical directories. Here are
some standard commands for managing your workspace.
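A short sketch of common workspace commands. The file name mywork.RData is hypothetical, and the sketch switches to a temporary directory so nothing is written into a project folder.

```r
old_wd <- setwd(tempdir())   # change the working directory; setwd() returns the previous one
getwd()                      # print the current working directory

x <- 1:5                     # create two objects in the workspace
y <- "hello"
ls()                         # list the objects in the workspace

save.image("mywork.RData")   # save an image of the whole workspace to a file
rm(x, y)                     # remove objects from the workspace
exists("y")                  # FALSE: y is gone
load("mywork.RData")         # reload the saved objects
exists("y")                  # TRUE again

setwd(old_wd)                # return to the original working directory
```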
Experiment 2
Aim:
Demonstrate declaring R variables, objects, expressions and vectors and assigning values, and
write programs for reading data into R and writing data out of R.
Objective:
1) Work with variables, objects and expressions
2) Work with vectors and their operations
3) Read data into R and write data out of R
DATA TYPES
You may want to store information of various data types, such as character, integer, double
(floating point), logical (Boolean), complex and raw. Based on the data type of a variable, R
allocates memory and decides what can be stored in the reserved memory.
In R, variables are assigned R objects, and the data type of the R object becomes the data type of
the variable. There are many types of R objects. The frequently used ones are vectors, lists,
matrices, arrays, factors and data frames.
R provides many functions to examine features of vectors and other objects, for example typeof(), class() and length().
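A small sketch that creates a few of the commonly used R objects and inspects them; the object names are arbitrary.

```r
num <- c(1, 2, 3, 4, 5)          # numeric vector (stored as double)
int <- 1:5                       # integer vector created with the colon operator
chr <- c("a", "b", "c")          # character vector
lgl <- c(TRUE, FALSE)            # logical vector
lst <- list(num, chr)            # list holding two vectors
mat <- matrix(1:6, nrow = 2)     # 2 x 3 matrix

typeof(num)    # "double"
typeof(int)    # "integer"
class(num)     # "numeric"
class(mat)     # "matrix" "array" (R 4.0.0 and later)
length(chr)    # 3
length(mat)    # 6: a matrix is a vector with a dim attribute
str(lst)       # compact summary of the list structure
```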
III. Calculate the base-10 logarithm of 100, and multiply the result by the
cosine of π. Hint: see ?log and ?pi.
Sol:
> log10(100) * cos(pi)
[1] -2
> typeof(x)
[1] "double"
> length(x)
[1] 5
> x
> typeof(x)
[1] "character"
[1] 1 2 3 4 5 6 7
[1] 2 1 0 -1 -2
More complex sequences can be created using the seq() function, for example by defining the number
of points in an interval or the step size.
[1] 1.0 1.2 1.4 1.6 1.8 2.0 2.2 2.4 2.6 2.8 3.0
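A short sketch comparing the colon operator with the two seq() styles mentioned above; both seq() calls return the eleven values 1.0, 1.2, ..., 3.0.

```r
1:7                          # colon operator: integers 1 to 7
2:-2                         # the colon operator also counts down: 2 1 0 -1 -2
seq(1, 3, by = 0.2)          # fixed step size of 0.2
seq(1, 3, length.out = 11)   # 11 equally spaced points in [1, 3]
```

Use by when the step matters and length.out when the number of points matters.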
VECTORS EXERCISE - I
3. If x=c(1:12)
What is the value of: dim(x)
What is the value of: length(x)
4. If a=c(12:5)
What is the value of: is.numeric(a)
5. If x = c('blue', 'red', 'green', 'yellow'), what is the value of: is.character(x)?
Read the 'before' and 'after' values into two different vectors called before and after. Use
R to evaluate the amount of weight lost by each participant. What is the average amount
of weight lost?
16. The numbers below are the first ten days of rainfall amounts in 1996. Read them into a
vector using the c() function: 0.1, 0.6, 33.8, 1.9, 9.6, 4.3, 33.7, 0.3, 0.0, 0.1
Conclusion: The student has demonstrated declaring R variables, objects, expressions and vectors,
assigning values, and writing programs for reading data into R and writing data out of R.
Experiment 3
Theory:
A function is a set of statements organized together to perform a specific task. R has a large number of built-in
functions, and users can create their own functions.
In R, a function is an object, so the R interpreter is able to pass control to the function, along with any arguments
the function needs to accomplish its actions.
The function in turn performs its task and returns control to the interpreter, along with any result, which may be
stored in other objects.
Function Definition
An R function is created by using the keyword function. The basic syntax of an R function definition is as
follows:
function_name <- function(arg_1, arg_2, ...) {
  # function body
}
Function Components
The different parts of a function are:
• Function Name: This is the actual name of the function. It is stored in the R environment as an object
with this name.
• Arguments: An argument is a placeholder. When a function is invoked, you pass a value to the
argument. Arguments are optional; that is, a function may contain no arguments. Arguments can also
have default values.
• Function Body: The function body contains a collection of statements that defines what the function
does.
• Return Value: The return value of a function is the last expression in the function body to be evaluated,
unless return() is called explicitly earlier.
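A minimal sketch of the components listed above: a default argument, an implicit return value, an early return(), and a function with no arguments. The function names power, safe_sqrt and greet are hypothetical.

```r
# One required argument and one argument with a default value
power <- function(base, exponent = 2) {
  base^exponent          # the last evaluated expression is the return value
}

power(3)                 # uses the default exponent: 9
power(2, exponent = 10)  # 1024

# return() ends the function early with an explicit value
safe_sqrt <- function(v) {
  if (v < 0) return(NA)
  sqrt(v)
}
safe_sqrt(-4)            # NA
safe_sqrt(16)            # 4

# A function with no arguments
greet <- function() "Hello from R"
greet()
```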
R has many built-in functions which can be called directly in a program without defining them first. We can
also create and use our own functions, referred to as user-defined functions.
User-defined Function
We can create user-defined functions in R. They are specific to what a user wants and once created they can be
used like the built-in functions. Below is an example of how a function is created and used.
# Create a function to print squares of numbers in sequence.
new.function <- function(a) {
  for (i in 1:a) {
    b <- i^2
    print(b)
  }
}

# Call the function new.function supplying 6 as an argument.
new.function(6)
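The loop above only prints each square. A variant that returns the squares lets the result be stored and reused; this sketch uses R's vectorised arithmetic instead of a loop, and the name squares is hypothetical.

```r
# Return the squares of 1..a; seq_len(0) is empty, so a = 0 safely gives numeric(0)
squares <- function(a) {
  seq_len(a)^2           # ^ is applied to every element at once
}

result <- squares(6)
print(result)            # 1 4 9 16 25 36
sum(result)              # 91
```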
Exercise:
Experiment 4
Aim: Perform various matrix operations and implement higher-dimensional arrays in R.
Theory:
Matrices: A matrix is a two-dimensional rectangular data set. It can be created using a vector input to the
matrix function.
Creating Matrices
To create matrices we will use the matrix() function. The matrix() function takes the following
arguments:
• data: an R object (this could be a vector).
• nrow: the desired number of rows.
• ncol: the desired number of columns.
• byrow: a logical value; if TRUE the matrix is filled by rows, if FALSE (the default) it is filled by columns.
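A minimal sketch of matrix() with the arguments listed above, showing how byrow changes the fill order. The dimnames argument used at the end is an optional extra not covered by the list.

```r
m_col <- matrix(1:6, nrow = 2, ncol = 3)                # filled column by column (default)
m_row <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)  # filled row by row
m_col        # rows: 1 3 5 / 2 4 6
m_row        # rows: 1 2 3 / 4 5 6

# Optional row and column names
m_named <- matrix(1:4, nrow = 2, dimnames = list(c("r1", "r2"), c("c1", "c2")))
m_named["r2", "c1"]   # 2
```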
Creation of matrix
> v1
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9
d) v2<- matrix(1:8, ncol = 2)
Sol:
> v2<- matrix(1:8, ncol = 2)
> v2
[,1] [,2]
[1,] 1 5
[2,] 2 6
[3,] 3 7
[4,] 4 8
Manipulation of Matrix
f) matrix1
Sol:
> matrix1
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9
g) matrix1[1, 3]
Sol:
> matrix1[1, 3]
[1] 7
h) matrix1[2, ]
Sol:
> matrix1[2, ]
[1] 2 5 8
i) matrix1[, -2]
Sol:
> matrix1[, -2]
[,1] [,2]
[1,] 1 7
[2,] 2 8
[3,] 3 9
j) matrix1[1, 1] = 15
Sol:
> matrix1[1, 1] = 15
> matrix1
[,1] [,2] [,3]
[1,] 15 4 7
[2,] 2 5 8
[3,] 3 6 9
k) matrix1[, 2] = 1
Sol:
> matrix1[, 2] = 1
> matrix1
[,1] [,2] [,3]
[1,] 15 1 7
[2,] 2 1 8
[3,] 3 1 9
l) matrix1[, 2:3] = 2
Sol:
> matrix1[, 2:3] = 2
> matrix1
[,1] [,2] [,3]
[1,] 15 2 2
[2,] 2 2 2
[3,] 3 2 2
Mathematical Operations
R can do matrix arithmetic. Below is a list of some basic operations we can do.
• + - * / standard scalar or element-wise operations
• %*% matrix multiplication
• t() transpose
• solve() inverse
• det() determinant
• chol() Cholesky decomposition
• eigen() eigenvalues and eigenvectors
• crossprod() matrix cross product t(A) %*% B (not the vector cross product)
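A worked sketch of the operations listed above. A is an arbitrarily chosen symmetric positive-definite matrix (chol() requires one), and B is an arbitrary second matrix.

```r
A <- matrix(c(2, 1, 1, 3), nrow = 2)   # symmetric 2 x 2 matrix
B <- matrix(1:4, nrow = 2)

A * B              # element-wise product
A %*% B            # matrix product
t(B)               # transpose
det(A)             # determinant: 2*3 - 1*1 = 5
A_inv <- solve(A)  # inverse
A %*% A_inv        # identity matrix (up to rounding)
chol(A)            # upper-triangular R with t(R) %*% R equal to A
eigen(A)$values    # eigenvalues; they sum to the trace, 5
crossprod(A, B)    # same as t(A) %*% B
3 * A              # scalar multiplication
```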
Exercise
b) Calculate Transpose.
c) Calculate Inverse.
d) Calculate Multiplication of the matrix.
e) Construct a matrix with 10 columns and 10 rows, all filled with random numbers
between 0 and 100.
f) Calculate the row means of this matrix (Hint: use rowMeans). Also calculate the
standard deviation across the row means (now also use sd()).
g) Now remake the above matrix with 100 columns and 10 rows. Then calculate the
column means (using, of course, colMeans).
3) Scalar multiplication. Find the solution for aA where a = 3 and A is the same as in the previous question.
6) Find the eigenvalues and eigenvectors of A'A. Hint: use crossprod() to compute A'A.
Arrays
Arrays are R data objects that can store data in more than two dimensions. Arrays are n-dimensional
data structures. For example, if we create an array of dimensions (2, 3, 3), it creates 3 rectangular
matrices, each with 2 rows and 3 columns. Arrays are homogeneous data structures: all elements have the same type.
Now, let's see how to create arrays in R. To create an array, use the function array().
Its arguments are a vector containing the elements and a vector containing the
dimensions of the array.
Array_NAME <- array(data, dim = c(row_Size, column_Size, matrices), dimnames = names)
where,
data – the input vector given to the array.
row_Size – the number of rows in each matrix.
column_Size – the number of columns in each matrix.
matrices – the number of matrices (the size of the third dimension).
dimnames – an optional list used to change the default names of the rows, columns and matrices according to the user's preference.
Example:
Output:
, , 1
, , 2
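A minimal sketch of array() with the dim and dimnames arguments. The values 1:18 and the names r1, c1, m1, etc. are arbitrary choices for illustration. Without dimnames, the two matrices would print under the headings , , 1 and , , 2.

```r
# 18 values filled into 2 matrices of 3 rows x 3 columns (column by column)
arr <- array(1:18, dim = c(3, 3, 2),
             dimnames = list(c("r1", "r2", "r3"),
                             c("c1", "c2", "c3"),
                             c("m1", "m2")))
arr                  # printed as ", , m1" followed by ", , m2"
dim(arr)             # 3 3 2
arr[2, 3, 1]         # row 2, column 3 of the first matrix: 8
arr[2, , 2]          # second row of the second matrix: 11 14 17
arr[, , "m2"]        # the whole second matrix, selected by name
apply(arr, 3, sum)   # sum of each matrix: 45 and 126
```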
1. Write an R program to create an array of two 3x3 matrices, each with 3 rows and 3 columns, from two
given vectors.
2. Write an R program to create an array of two 3x3 matrices, each with 3 rows and 3 columns, from two
given vectors. Print the second row of the second matrix of the array and the element in the 3rd
row and 3rd column of the 1st matrix.
3. Write an R program to create a two-dimensional 5x3 array of a sequence of even integers greater than 50.
4. Write an R program to add and multiply two 3D arrays.
Experiment 5
Aim: Create a list in R and perform various list operations to access list elements.
Theory: LISTS
Creating a list
Here, we create a list x of three components with data types double, logical and integer vector,
respectively.
Its structure can be examined with the str() function.
> str(x)
We can create the same list without the tags as follows. In such a scenario, numeric indices are used by
default.
> x
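A minimal sketch of the tagged and untagged lists described above; the values 2.5, TRUE and 1:3 are illustrative.

```r
# Tagged list: a double, a logical and an integer vector
x <- list("a" = 2.5, "b" = TRUE, "c" = 1:3)
str(x)       # one line per component, showing num, logi and int types

# The same list without tags: components are referred to by position
y <- list(2.5, TRUE, 1:3)
names(y)     # NULL
y[[3]]       # 1 2 3
```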
Access Lists
You can access the list items by referring to their index number, inside brackets. The first item has index 1, the
second item has index 2, and so on.
List Length
To find out how many items a list has, use the length() function:
Check if Item Exists
To find out if a specified item is present in a list, use the %in% operator:
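A short sketch of list access, length() and %in%. It also shows the difference between single and double brackets and access by name with $. The list contents are hypothetical.

```r
fruits <- list("apple", "banana", "cherry")

fruits[2]              # single brackets return a sub-list of length 1
fruits[[2]]            # double brackets return the element itself: "banana"
length(fruits)         # 3
"apple" %in% fruits    # TRUE

# Named elements can also be reached with $
person <- list(name = "Asha", age = 21)
person$name                  # "Asha"
person$age <- 22             # modify an element
person[["city"]] <- "Pune"   # add a new element
length(person)               # 3
```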
Exercise:
1) If p <- c(2, 7, 8), q <- c("A", "B", "C") and x <- list(p, q), then what is the value of x[2]?
2) If w <- c(2, 7, 8), v <- c("A", "B", "C") and x <- list(w, v), which R statement will replace "A" in x
with "K"?
3) If a <- list("x" = 5, "y" = 10, "z" = 15), which R statement will give the sum of all elements in a?
4) If Newlist <- list(a = 1:10, b = "Good morning", c = "Hi"), write an R statement that will add 1 to each
element of the first vector in Newlist.
5) If b <- list(a = 1:10, c = "Hello", d = "AA"), write an R expression that will give all elements, except the
second, of the first vector of b.
6) Let x <- list(a = 5:10, c = "Hello", d = "AA"). Write an R statement to add a new item z = "NewItem" to the
list x.
7) Write an R statement that will assign new names "one", "two" and "three" to the elements of y.
8) Write an R statement that will give the length of vector r of x.
x <- list(y = 1:10, t = "Hello", f = "TT", r = 5:20)
9) Let string <- "Grand Opening". Write an R statement to split this string into two and return the output:
10) Let y <- list("a", "b", "c") and q <- list("A", "B", "C", "a", "b", "c"). Write an R statement that will
return all elements of q that are not in y.
Experiment 6
Aim: Create a data frame in R, perform various operations on it, and demonstrate common
functions on factors and tables.
Theory:
Data Frames:
A data frame is a table or a two-dimensional array-like structure in which each column contains values
of one variable and each row contains one set of values from each column.
Following are the characteristics of a data frame.
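A minimal sketch of a data frame and some common operations on it. Every column must have the same length, and each column holds values of one type, although different columns may hold different types. The student data are hypothetical.

```r
df <- data.frame(
  name   = c("Ravi", "Meera", "John"),
  age    = c(21, 23, 22),
  passed = c(TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE        # keep text as character (the default since R 4.0.0)
)

str(df)                  # one line per column, each with its own type
dim(df)                  # 3 rows, 3 columns
df$age                   # a column as a vector
df[2, ]                  # the second row
df[df$passed, "name"]    # names of students who passed
df$score <- c(78, 45, 88)        # add a new column
summary(df$score)
```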
Question 1
Create the following data frame, then invert gender (use factors) for all individuals.
Question 2
Create this data frame (make sure you import the variable working as
character and not factor).
Question 4
Create a data frame from a matrix of your choice, change the row names so every row says id_i
(where i is the row number) and change the column names to variable_i (where i is the column
number). I.e., for column 1 it will say variable_1, and for row 2 will say id_2 and so on.
Question 5
(a) Create a small data frame representing a database of films. It should contain the fields
title, director, year and country, and at least three films.
(b) Create a second data frame of the same format as above, but containing just one new film.
(c) Merge the two data frames using rbind().
(d) Try sorting the titles using sort(): what happens?
Factors:
A factor is a data structure used for fields that take only a predefined, finite number of values
(categorical data).
For example, a data field such as marital status may contain only values from single,
married, separated, divorced or widowed.
In such a case, we know the possible values beforehand, and these predefined, distinct values
are called levels. Following is an example of a factor in R.
> x
Here, we can see that factor x has four elements and two levels. We can check whether a variable is
a factor using the class() function.
Similarly, the levels of a factor can be checked using the levels() function.
> class(x)
[1] "factor"
> levels(x)
Creating a factor in R
We can create a factor using the function factor(). Levels of a factor are inferred from the
data if not provided.
>x
>x
We can see from the above example that levels may be predefined even if not used.
Factors are closely related to vectors. In fact, factors are stored as integer vectors, as is
clearly seen from their structure.
> str(x)
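A minimal sketch of the factor functions discussed above, plus table() for counting levels. The marital-status values are illustrative.

```r
status <- c("single", "married", "married", "single")
x <- factor(status)
class(x)        # "factor"
levels(x)       # "married" "single" (sorted alphabetically by default)
nlevels(x)      # 2
as.integer(x)   # 2 1 1 2: the integer codes that are actually stored
str(x)

# Predefined levels, including one that does not occur in the data
y <- factor(status, levels = c("single", "married", "divorced"))
table(y)        # counts per level; "divorced" has a count of 0
```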
Experiment 7
Aim: Demonstration of plots in R: box plots, pie charts, bar charts, line charts and histograms.
R Bar Plot
Bar plots can be created in R using the barplot() function.
We can supply a vector or matrix to this function. If we supply a vector, the plot will have bars
with their heights equal to the elements in the vector.
Suppose we have a vector of maximum temperatures (in degrees Celsius) for seven
days as follows.
barplot(max.temp)
This function can take a lot of arguments to control the way our data is plotted. You can read
about them in the help section ?barplot.
Some of the frequently used ones are main to give the title, xlab and ylab to
provide labels for the axes, names.arg for naming each bar, col to define the color, etc.
We can also plot bars horizontally by providing the argument horiz = TRUE.
barplot(max.temp,
ylab = "Day",
col = "darkred",
horiz = TRUE)
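A self-contained sketch of a horizontal bar plot using the arguments listed above. The seven temperature values and day labels are hypothetical.

```r
# Hypothetical maximum temperatures (degrees Celsius) for one week
max.temp <- c(22, 27, 26, 24, 23, 26, 28)
days <- c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat")

# barplot() invisibly returns the bar midpoints
mids <- barplot(max.temp,
                main = "Maximum Temperatures in a Week",
                xlab = "Degree Celsius",
                ylab = "Day",
                names.arg = days,
                col = "darkred",
                horiz = TRUE)
```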
Sometimes we have to plot the count of each item as bar plots from categorical data. For
example, here is a vector of the ages of 10 college freshmen.
Simply doing barplot(age) will not give us the required plot. It will plot 10 bars with heights
equal to the students' ages. But we want to know the number of students in each age category.
This count can be quickly found using the table() function, as shown below.
> table(age)
age
16 17 18 19
 1  2  6  1
Now plotting this data will give our required bar plot. Note below that we
define the argument density to shade the bars.
barplot(table(age),
        xlab = "Age",
        ylab = "Count",
        border = "red",
        col = "blue",
        density = 10)
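A self-contained sketch of counting categories with table() before plotting. The ten ages are hypothetical values chosen to reproduce the counts 1, 2, 6, 1 shown above.

```r
# Hypothetical ages of 10 freshmen
age <- c(17, 18, 18, 17, 18, 19, 18, 16, 18, 18)
counts <- table(age)   # number of students in each age category
counts

barplot(counts,
        main = "Age Count of 10 Students",
        xlab = "Age",
        ylab = "Count",
        border = "red",
        col = "blue",
        density = 10)
```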
R Pie Chart
A pie chart is drawn using the pie() function in R. This function takes in a vector of
non-negative numbers.
Let us assume that a vector expenditure represents the monthly expenditure breakdown of an
individual.
Now let us draw a simple pie chart out of this data using the pie() function.
pie(expenditure)
We can see above that a pie chart was plotted with 5 slices. The chart was drawn in the
anti-clockwise direction using pastel colors.
pie(expenditure,
    labels = as.character(expenditure),
    col = c("red", "orange", "yellow", "blue", "green"),
    border = "brown",
    clockwise = TRUE)
As seen in the above figure, we have used the actual amounts as labels. Also, the chart is drawn in
clockwise fashion.
Since the human eye is relatively bad at judging angles, other types of charts, such as bar charts or dot
charts, are usually more appropriate than pie charts.
This is also stated in the R documentation (?pie): pie charts are a very bad way of displaying
information.
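A self-contained sketch that draws the same data as a pie chart with percentage labels and as a bar chart, so the two can be compared. The category names and amounts are hypothetical.

```r
# Hypothetical monthly expenditure breakdown
expenditure <- c(Housing = 600, Food = 300, Clothes = 150,
                 Entertainment = 100, Other = 200)

# Pie chart with percentage labels, drawn clockwise
pct <- round(100 * expenditure / sum(expenditure))
pie(expenditure,
    labels = paste0(names(expenditure), " ", pct, "%"),
    col = c("red", "orange", "yellow", "blue", "green"),
    border = "brown",
    clockwise = TRUE)

# The same data as a bar chart, which is usually easier to read
barplot(expenditure, col = "steelblue", ylab = "Amount")
```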
The most used plotting function in R is the plot() function. It
is a generic function, meaning it has many methods, which are called according to the type of
object passed to plot().
In the simplest case, we can pass in a vector and we will get a scatter plot of magnitude vs index.
But generally, we pass in two vectors, and a scatter plot of these points is drawn.
For example, the command plot(c(1,2),c(3,5)) would plot the points (1,3) and (2,5).
Here is a more concrete example where we plot a sine function in the range
-pi to pi.
x <- seq(-pi,pi,0.1)
plot(x, sin(x))
We can add a title to our plot with the parameter main. Similarly, xlab and ylab can be used to
label the x-axis and y-axis respectively.
plot(x, sin(x),
ylab="sin(x)")
We can see above that the plot consists of circular points in black, which is the default color.
We can change the plot type with the argument type. It accepts the following strings and has
the given effect.
"p" - points
"l" - lines
plot(x, sin(x),
ylab="sin(x)",
type="l",
col="blue")
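A sketch of a line chart that combines the title and axis-label arguments with type = "l", then overlays a second curve with lines() and adds a legend. Plotting cos(x) is an illustrative addition.

```r
x <- seq(-pi, pi, 0.1)

# Line chart of sin(x) with a title and axis labels
plot(x, sin(x),
     main = "The Sine Function",
     xlab = "x",
     ylab = "sin(x)",
     type = "l",
     col = "blue")

# Overlay cos(x) drawn with both points and lines ("b"), then add a legend
lines(x, cos(x), type = "b", col = "red", pch = 20)
legend("topleft", legend = c("sin(x)", "cos(x)"),
       col = c("blue", "red"), lty = 1)
```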
R 3D PLOTS
There are many functions in R for creating 3D plots. In this section, we will discuss
the persp() function, which can be used to create 3D surfaces in perspective view.
This function mainly takes in three variables, x, y and z, where x and y are vectors defining the
locations along the x- and y-axes. The height of the surface (z-axis) is given by the matrix z.
As an example, let's plot a cone. A simple right circular cone can be obtained with the following function.
persp(x, y, z)
We can add a title to our plot with the parameter main. Similarly, xlab, ylab and
zlab can be used to label the three axes.
Rotational angles
We can define the viewing direction using the parameters theta and phi.
By default, theta (the azimuthal direction) is 0 and phi (the colatitude direction) is 15.
Colouring and Shading Plot
Colouring of the plot is done with the parameter col. Similarly, we can add
shading with the parameter shade.
persp(x, y, z,
      zlab = "Height")
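A self-contained sketch that builds a cone surface with outer() and applies the title, axis label, viewing-angle, colour and shading parameters described above. The cone function, grid size and parameter values are illustrative choices.

```r
# Surface of a right circular cone: height equals the distance from the origin
cone <- function(x, y) {
  sqrt(x^2 + y^2)
}

x <- y <- seq(-1, 1, length.out = 20)   # increasing grid along both axes
z <- outer(x, y, cone)                  # 20 x 20 matrix of heights

# persp() invisibly returns the 4 x 4 viewing transformation matrix
pmat <- persp(x, y, z,
              main = "Perspective Plot of a Cone",
              zlab = "Height",
              theta = 30, phi = 15,     # azimuthal and colatitude viewing angles
              col = "springgreen",
              shade = 0.5)
```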
Experiment 8
Aim: Study of Simple Linear Regression and Multiple Regression in R.
Multiple Linear Regression is an extension of Simple Linear regression as it takes more than one
predictor variable to predict the response variable. We can define it as:
Multiple Linear Regression is one of the important regression algorithms which models the linear
relationship between a single dependent continuous variable and more than one independent variable.
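In Multiple Linear Regression, the target variable (Y) is a linear combination of multiple predictor
variables x1, x2, x3, ..., xn. Since it is an extension of Simple Linear Regression, the same form is applied,
and the multiple linear regression equation becomes:
$$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n + \varepsilon$$
where $\beta_0$ is the intercept, $\beta_1, \dots, \beta_n$ are the regression coefficients and $\varepsilon$ is the random error term.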
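A minimal sketch of simple and multiple linear regression with lm(). It uses R's built-in mtcars data (fuel efficiency of 32 cars) as an illustrative choice of dataset. The multiple model estimates one intercept and one slope per predictor, matching the equation above with n = 2.

```r
# Simple linear regression: mpg explained by weight (wt, in 1000 lb) only
fit_simple <- lm(mpg ~ wt, data = mtcars)

# Multiple linear regression: mpg explained by weight and horsepower
fit_multi <- lm(mpg ~ wt + hp, data = mtcars)

coef(fit_multi)               # intercept and one slope per predictor
summary(fit_multi)            # coefficients, standard errors, R-squared

# Compare the two models
summary(fit_simple)$r.squared
summary(fit_multi)$r.squared
anova(fit_simple, fit_multi)  # F-test: does adding hp improve the fit?

# Predicted mpg for a hypothetical car with wt = 3 and hp = 120
predict(fit_multi, newdata = data.frame(wt = 3, hp = 120))
```

Adding a predictor can never lower R-squared on the training data, so the F-test (or adjusted R-squared) is the fairer comparison.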
Experiment 9
Aim: Write an R script to find subsets of a dataset by using the subset() and aggregate() functions on the iris dataset
Theory:
Steps:
1) Download iris dataset from https://www.kaggle.com/datasets/arshid/iris-flower-dataset
2) subset()
Subsetting allows the user to access elements from an object. The subset() function takes out a portion of the object
based on the condition provided.
3) aggregate()
The aggregate() function is used to get summary statistics of the data by group. The statistics include
mean, min, sum, max, etc.
aggregate(dataframe$aggregate_column, list(dataframe$group_column), FUN)
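A minimal sketch of subset() and aggregate(). It uses the copy of iris that ships with R rather than the downloaded file. A CSV from the Kaggle page would be read with read.csv() and may use different column names (for example, lower-case names), which is an assumption to check before adapting this code.

```r
data(iris)

# Setosa flowers with sepal length above 5, keeping two columns
setosa_long <- subset(iris, Species == "setosa" & Sepal.Length > 5,
                      select = c(Sepal.Length, Species))
head(setosa_long)

# Mean sepal length per species, using the syntax shown above
agg <- aggregate(iris$Sepal.Length, list(iris$Species), FUN = mean)
agg                               # columns Group.1 (species) and x (mean)

# Formula interface: mean of every numeric column per species
aggregate(. ~ Species, data = iris, FUN = mean)
```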
Experiment 10
Aim: Import data from web storage and name the dataset. Then perform logistic regression to find out the
relationship between the variables that affect the admission of a student to an institute, based on his or her
GRE score, GPA and rank. Also check whether the model fits the data.
Theory:
1) Download the dataset from https://www.kaggle.com/datasets/mohansacharya/graduate-admissions
2) The dataset contains several parameters which are considered important during the application for
Masters programs.
The parameters included are:
1. GRE Scores (out of 340)
2. TOEFL Scores (out of 120)
3. University Rating (out of 5)
4. Statement of Purpose and Letter of Recommendation Strength (out of 5)
5. Undergraduate GPA (out of 10)
6. Research Experience (either 0 or 1)
7. Chance of Admit (ranging from 0 to 1)
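The file listed above records Chance of Admit as a continuous value between 0 and 1 rather than a 0/1 admission outcome. The sketch below therefore uses a synthetic data set to illustrate the logistic regression workflow. Its columns admit, gre, gpa and rank are hypothetical, and the assumed coefficients are made up. The fit check is a likelihood-ratio test of the model against an intercept-only model. With real data, read.csv() would replace the simulation step.

```r
set.seed(42)                                     # reproducible synthetic data
n <- 400
gre  <- round(rnorm(n, mean = 580, sd = 115))    # hypothetical GRE scores
gpa  <- round(runif(n, min = 2.3, max = 4.0), 2) # hypothetical GPAs
rank <- factor(sample(1:4, n, replace = TRUE))   # institution rank, 1 = highest

# Simulate admission (0/1) from an assumed logistic relationship
eta   <- -4 + 0.003 * gre + 0.8 * gpa - 0.5 * as.integer(rank)
admit <- rbinom(n, size = 1, prob = plogis(eta))
admissions <- data.frame(admit, gre, gpa, rank)

# Logistic regression: binomial family with the default logit link
fit <- glm(admit ~ gre + gpa + rank, data = admissions, family = binomial)
summary(fit)
exp(coef(fit))                       # odds ratios

# Overall fit: likelihood-ratio test against the intercept-only model
lr_stat <- fit$null.deviance - fit$deviance
lr_df   <- fit$df.null - fit$df.residual
p_value <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)
p_value                              # a small value means the predictors improve the fit

# Predicted admission probability for a hypothetical applicant
predict(fit,
        newdata = data.frame(gre = 700, gpa = 3.8, rank = factor(1, levels = 1:4)),
        type = "response")
```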