
R PROGRAMMING LAB

MANUAL

Academic Year : 2021-22


Course Code : 201DSP215
Class : SY
Semester : IV

Department of Computer Science and Engineering(Data Science)


D. Y. Patil College of Engineering and Technology, Kolhapur.
R PROGRAMMING

Department Of Computer Science & Engineering

➢ Department Vision and Mission:


➢ Vision:
To create technocrats with a flair of advanced technology in Computer Science and
Engineering so as to satisfy the Industrial and Societal needs.
Mission:
M1: To facilitate students with latest hardware, software technologies and technical
expertise.
M2: To inspire and nurture creativity amongst the students.
M3: To make the students industry ready and inculcate entrepreneurship skills.
M4: To enrich students’ technical skills for finding innovative solutions to societal
needs.

➢ Program Educational Objectives (PEOs):

PEO1: To apply the knowledge of mathematics and computer science and
engineering to provide realistic solutions to the problems in their domain.

PEO2: To exhibit technical skills to analyze, design and develop solutions for
engineering issues by using innovative methods, cutting edge tools and techniques.

PEO3: To apply professional practices, ethical principles and improve
communication skills to enhance the career avenues.

PEO4: To display thirst for emerging technologies and inquisitiveness to tackle
societal and environmental needs.

D. Y. PATIL COLLEGE OF ENGINEERING & TECHNOLOGY
Department of Computer Science & Engineering (Data Science)
Academic Year 2021-22

Lab Manual

Name of the course: R Programming Laboratory Class – SY


Course Code: 201DSP215 Sem-VI
Faculty: Mrs. S.M.Surve

Teaching Scheme Examination Scheme


Theory: -2 Hrs./Week ISE : 25 Marks
Practical: 2 Hrs./Week ESE-POE:25 Marks
List of Assignments

Sr. No | Title of Assignment | CO Mapped
01 | Installation of R and RStudio | CO1
02 | Demonstration of declaring R variables, objects, expressions, vectors and assigning values & Perform program for reading data from R and writing data into R. | CO1
03 | Implementation of package in R & create a program for calling functions in R. | CO2
04 | Perform various matrix operations & implement the higher dimensional array in R. | CO1
05 | Create list in R and perform various list operations to access list elements in R. | CO1
06 | Create Data Frame in R and perform various operations on data frame & demonstrate the common functions on factors and tables in R. | CO1
07 | Demonstration of plots in R as Box Plots, Pie Charts, Bar charts, Line Chart and histogram. | CO3
08 | Study of Simple Linear Regression and Multiple Regression in R. | CO3, CO4
09 | Write an R script to find subset of dataset by using subset(), aggregate() functions on iris dataset. | CO1, CO2
10 | Import data from web storage. Name the dataset and then do Logistic Regression to find out the relation between variables that affect the admission of a student to an institute based on his or her GRE score, GPA obtained and rank of the student. Also check whether the model fits or not. | CO2, CO4
11 | Case study on how to calculate the correlation between two variables and how to make scatter plots. Use the scatter plot to investigate the relationship between two variables. | CO4
12 | Case study on generating and visualizing discrete and continuous distributions using the statistical environment. Demonstration of CDF and PDF for uniform, normal, binomial and Poisson distributions. | CO4

Prepared by: Checked by: Verified by: Approved by:


Course Coordinator Module Coordinator Program Coordinator HOD


Experiment -I
Aim: To Install R and R Packages

Objective: To understand how to install R, RStudio and its packages

1. Open an internet browser and go to www.r-project.org.
2. Click the "download R" link in the middle of the page under "Getting Started."
3. Select a CRAN location (a mirror site) and click the corresponding link.
4. Click on the "Download R for Windows" link at the top of the page.
5. Click on the file containing the latest version of R under "Files."
6. Save the installer file (an .exe file on Windows, a .pkg file on macOS), double-click it to open,
and follow the installation instructions.
7. Now that R is installed, you need to download and install RStudio.

To Install RStudio

1. Go to www.rstudio.com and click on the "Download RStudio" button.
2. Click on "Download RStudio Desktop."
3. Click on the version recommended for your system and save the installer on your computer.
On Windows, double-click the .exe file and follow the installation instructions. On macOS,
double-click the .dmg file to open it, and then drag and drop RStudio to your Applications folder.

To Install R Packages

The capabilities of R are extended through user-created packages, which allow specialized
statistical techniques, graphical devices, import/export capabilities, reporting tools (knitr,
Sweave), etc. These packages are developed primarily in R, and sometimes in Java, C, C++, and
Fortran. The R packaging system is also used by researchers to create compendia to organise
research data, code and report files in a systematic way for sharing and public archiving.
A core set of packages is included with the installation of R, with more than 12,500 additional
packages (as of May 2018) available at the Comprehensive R Archive Network (CRAN).

Packages are collections of R functions, data, and compiled code in a well-defined format. The
directory where packages are stored is called the library. R comes with a standard set of
packages. Others are available for download and installation. Once installed, they have to be
loaded into the session to be used.
.libPaths() # get library location
library()   # see all packages installed
search()    # see packages currently loaded

Adding R Packages

You can expand the types of analyses you do by adding other packages. A complete list of
contributed packages is available from CRAN.

Follow these steps:



1. Download and install a package (you only need to do this once).
2. To use the package, invoke the library(package) command to load it into the current session.
(You need to do this once in each session, unless you customize your environment to
automatically load it each time.)

Installing and Loading Packages

The ability to estimate ordered logistic or probit regression is included in the
MASS package.
To install this package, run the following command:

> install.packages("MASS")

You will be asked to pick a CRAN mirror from which to download (generally the closer
the faster) and R will install the package to your library. Installing a package does not make
its functions available yet: you have to load the package each time you start an R session,
like so:

> library("MASS")

R now knows all the functions that are included in the MASS package. To see what functions
are implemented in the MASS package, type:

> library(help = "MASS")

Maintaining your Library

Packages are frequently updated. Depending on the developer this could happen very often.
To keep your packages updated enter this every once in a while:

> update.packages()
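
The install-once, load-every-session pattern described above can be wrapped in a small helper. This is only a minimal sketch and not part of the original manual: the function name ensure_package is hypothetical, and it relies solely on the base functions requireNamespace(), install.packages() and library().

```r
# Hypothetical helper: install a package only if it is missing, then load it.
ensure_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)               # needs an internet connection
  }
  library(pkg, character.only = TRUE)   # attach the package for this session
  invisible(TRUE)
}

# "stats" ships with every R installation, so this call never downloads anything.
# ensure_package("MASS") would work the same way for the MASS package.
ensure_package("stats")
```

Checking first with requireNamespace() avoids re-downloading a package that is already in the library, which is why the install step only needs to happen once.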

The Workspace
The workspace is your current R working environment and includes any user-defined objects
(vectors, matrices, data frames, lists, functions). At the end of an R session, the user can save an
image of the current workspace that is automatically reloaded the next time R is started.
Commands are entered interactively at the R user prompt. Up and down arrow keys scroll
through your command history.

You will probably want to keep different projects in different physical directories. Here are
some standard commands for managing your workspace.

getwd()                 # print the current working directory
ls()                    # list the objects in the current workspace
setwd(mydirectory)      # change to mydirectory
setwd("c:/docs/mydir")  # note / instead of \ in windows
# view and set options for the session
help(options)           # learn about available options
options()               # view current option settings

Conclusion: Students are able to install R and its packages.


Experiment 2

Aim:
Demonstration of declaring R variables, objects, expressions, vectors and assigning values & Perform
program for reading data from R and writing data into R.

Objective:
1) Student is able to work on variables, objects and expressions
2) Work on vectors and their operations
3) Read and write data to R

DATA TYPES
You may like to store information of various data types like character, wide character, integer,
floating point, double floating point, Boolean etc. Based on the data type of a variable, the
operating system allocates memory and decides what can be stored in the reserved memory.

The variables are assigned with R-Objects and the data type of the R-object becomes the data type of
the variable. There are many types of R-objects. The frequently used ones are −

• Vectors: A basic data structure of R containing the same type of data.
• Matrices: A matrix is a two-dimensional rectangular data set. It can be created using a
vector input to the matrix function.
• Factors: Factors are the R-objects which are created using a vector. A factor stores the vector
along with the distinct values of the elements in the vector as labels. The labels are
always character irrespective of whether the input vector is numeric, character, Boolean etc.
They are useful in statistical modelling.
• Data Frames: Data frames are tabular data objects. Unlike a matrix, in a data frame each
column can contain different modes of data. The first column can be numeric while the
second column can be character and the third column can be logical. It is a list of vectors of
equal length.
• Lists: A list is an R-object which can contain many different types of elements inside it
like vectors, functions and even another list inside it.
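
As an illustration of the object types listed above, the following sketch creates one object of each kind. The variable names and values are made up for this example and are not taken from the manual.

```r
v  <- c(10, 20, 30)                        # vector: all elements are numeric
m  <- matrix(1:6, nrow = 2)                # matrix: 2 rows, 3 columns
f  <- factor(c("low", "high", "low"))      # factor: distinct values kept as levels
df <- data.frame(id = 1:3,                 # data frame: columns of different modes
                 name = c("a", "b", "c"),
                 passed = c(TRUE, FALSE, TRUE))
l  <- list(v, m, "text", sum)              # list: mixed elements, even a function

# Inspect what kind of object each one is.
class(v)
class(m)
class(f)
class(df)
class(l)
```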
Modes
All objects have a certain mode. Some objects can only deal with one mode at a time, others can
store elements of multiple modes. R distinguishes the following modes:
1. integer: integers (e.g. 1, 2 or -69)
2. numeric: real numbers (e.g. 2.336, -0.35)
3. complex: complex or imaginary numbers
4. character: elements made up of text-strings (e.g. "text", "Hello World!", or "123")
5. logical: data containing logical constants (i.e. TRUE and FALSE)
R vectors are atomic. By atomic, we mean the vector only holds data of a single type:
• character: "a", "swc"
• numeric: 2, 15.5
• integer: 2L (the L tells R to store this as an integer)
• logical: TRUE, FALSE
• complex: 1+4i (complex numbers with real and imaginary parts)

R provides many functions to examine features of vectors and other objects, for example

• class() - what kind of object is it (high-level)?
• typeof() - what is the object's data type (low-level)?
• length() - how long is it? What about two dimensional objects?
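
The three inspection functions above can be tried on small objects of each atomic type. This is a short sketch with illustrative values; it also answers the question about two-dimensional objects by comparing length() with dim().

```r
x <- c(1.5, 2, 3)            # double vector
n <- 2L                      # integer
s <- c("a", "swc")           # character vector
b <- TRUE                    # logical
z <- 1+4i                    # complex
m <- matrix(1:6, nrow = 2)   # two-dimensional object

typeof(x)   # "double"
class(x)    # "numeric"
typeof(n)   # "integer"
typeof(s)   # "character"
typeof(b)   # "logical"
typeof(z)   # "complex"
length(s)   # 2
length(m)   # 6: length() counts every element of the matrix
dim(m)      # 2 3: dim() gives the number of rows and columns
```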

1. Use R to calculate the following:
I. 31 * 78
Sol: > 31*78
[1] 2418
II. 697 / 41
Sol: > 697/41
[1] 17

2. Assign the value of 39 to x
Sol: > x <- 39
> x
[1] 39

3. Make z the value of x - y (y is taken to be 22 here, consistent with the output in step 4)
Sol: > y <- 22
> z <- x - y

4. Display the value of z in the console
Sol: > z
[1] 17

5. Calculate the square root of 2345, and perform a log2 transformation on the result.
Sol: > log2(sqrt(2345))
[1] 5.597686

6. Calculate the following quantities:
I. The sum of 100.1, 234.9 and 12.01.
Sol: > 100.1+234.9+12.01
[1] 347.01
II. The square root of 256.
Sol: > sqrt(256)
[1] 16
III. Calculate the 10-based logarithm of 100, and multiply the result with the
cosine of π. Hint: see ?log and ?pi.
Sol: > log10(100)*cos(pi)
[1] -2

Vectors: A basic data structure of R containing the same type of data.


Creating Vector
Vectors are generally created using the c() function.
Since a vector must have elements of the same type, this function will try to coerce elements
to the same type if they are different.
Coercion goes from lower to higher types: from logical to integer to double to character.

> x <- c(1, 5, 4, 9, 0)
> typeof(x)
[1] "double"
> length(x)
[1] 5

> x <- c(1, 5.4, TRUE, "hello")
> x
[1] "1"     "5.4"   "TRUE"  "hello"
> typeof(x)
[1] "character"

If we want to create a vector of consecutive numbers, the : operator is very helpful.


Example 1: Creating a vector using : operator

> x <- 1:7; x
[1] 1 2 3 4 5 6 7

> y <- 2:-2; y
[1]  2  1  0 -1 -2

More complex sequences can be created using the seq() function, like defining number of
points in an interval, or the step size.

Example 2: Creating a vector using seq() function

seq(1, 3, by=0.2) # specify step size

[1] 1.0 1.2 1.4 1.6 1.8 2.0 2.2 2.4 2.6 2.8 3.0

seq(1, 5, length.out=4) # specify length of the vector

[1] 1.000000 2.333333 3.666667 5.000000

VECTORS EXERCISE - I

1. Consider two vectors, x, y
x=c(4,6,5,7,10,9,4,15)
y=c(0,10,1,8,2,3,4,1)
What is the value of: x*y and x+y

2. Consider two vectors, a, b
a=c(1,5,4,3,6)
b=c(3,5,2,1,9)
What is the value of: a<=b

3. If x=c(1:12)
What is the value of: dim(x)
What is the value of: length(x)

4. If a=c(12:5)
What is the value of: is.numeric(a)

5. If x=c ('blue', 'red', 'green', 'yellow') what is the value of: is.character(x).

If x=c('blue',10,'green',20) What is the value of: is.character(x).

6. Consider two vectors, a, b


a=c(10,2,4,15)
b=c(3,12,4,11)
What is the value of: rbind(a,b) and cbind(a,b )
7. The numbers below are the first ten days of rainfall amounts in 1996. Read them in to a
vector using the c() function 0.1, 0.6, 33.8, 1.9, 9.6, 4.3, 33.7, 0.3, 0.0, 0.1

8. Using the rainfall vector from the previous question, answer the following questions:
i. What was the mean rainfall, and what was the standard deviation?
ii. Calculate the cumulative rainfall ('running total') over these ten days.
Confirm that the last value of the vector is equal to the total sum of the
rainfall.
iii. Which day saw the highest rainfall?
10. Write an R program to perform the following operations on two vectors:
addition, subtraction, division, multiplication.
11. Write an R program to find the mean, mode, median and sum of a vector.
12. Sort the vector in ascending and descending order.
13. Create a vector x=c(20,30,20,40,40,50). Count the number of times 20 occurs.
14. Find the largest and second largest number in a vector.
15. The weights of five people before and after a diet programme are given in the table.

Read the `before' and `after' values into two different vectors called before and after. Use
R to evaluate the amount of weight lost for each participant. What is the average amount
of weight lost?


Conclusion: Students have demonstrated declaring R variables, objects, expressions and vectors, assigning
values, and writing programs for reading data from R and writing data into R.


Experiment 3

Aim: Implementation of functions in R & create a program for calling functions in R.

Theory:
A function is a set of statements organized together to perform a specific task. R has a large number of in-built
functions and the user can create their own functions.
In R, a function is an object so the R interpreter is able to pass control to the function, along with arguments
that may be necessary for the function to accomplish the actions.
The function in turn performs its task and returns control to the interpreter as well as any result which may be
stored in other objects.
Function Definition
An R function is created by using the keyword function. The basic syntax of an R function definition is as
follows −
function_name <- function(arg_1, arg_2, ...) {
Function body
}
Function Components
The different parts of a function are −
• Function Name − This is the actual name of the function. It is stored in R environment as an object
with this name.
• Arguments − An argument is a placeholder. When a function is invoked, you pass a value to the
argument. Arguments are optional; that is, a function may contain no arguments. Also arguments can
have default values.
• Function Body − The function body contains a collection of statements that defines what the function
does.
• Return Value − The return value of a function is the last expression in the function body to be evaluated.
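
The components above can be seen together in a small sketch. The function names rect_area and safe_divide are hypothetical and chosen only for this illustration; the first relies on the last evaluated expression as its return value, the second shows that return() can also be used to leave a function early.

```r
# Name: rect_area; arguments: width and height (height has a default value).
rect_area <- function(width, height = 1) {
  width * height          # last expression evaluated, so this is the return value
}

# return() leaves the function early with the given value.
safe_divide <- function(x, y) {
  if (y == 0) {
    return(NA)
  }
  x / y
}

rect_area(3, 4)      # 12
rect_area(3)         # 3, height falls back to its default
safe_divide(10, 4)   # 2.5
safe_divide(1, 0)    # NA
```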
R has many in-built functions which can be directly called in the program without defining them first. We can
also create and use our own functions, referred to as user-defined functions.
User-defined Function
We can create user-defined functions in R. They are specific to what a user wants and once created they can be
used like the built-in functions. Below is an example of how a function is created and used.
# Create a function to print squares of numbers in sequence.
new.function <- function(a) {
  for(i in 1:a) {
    b <- i^2
    print(b)
  }
}
# Call the function new.function supplying 6 as an argument.
new.function(6)


Calling a Function with Argument Values (by position and by name)


The arguments to a function call can be supplied in the same sequence as defined in the function or they can be
supplied in a different sequence but assigned to the names of the arguments.
# Create a function with arguments.
new.function <- function(a,b,c) {
  result <- a * b + c
  print(result)
}

# Call the function by position of arguments.
new.function(5,3,11)

# Call the function by names of the arguments.
new.function(a = 11, b = 5, c = 3)
Calling a Function with Default Argument
We can define the value of the arguments in the function definition and call the function without supplying any
argument to get the default result. We can also call such functions by supplying new values for the arguments
to get a non-default result.
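
A minimal sketch of default arguments follows. The function name power_of and its default value are assumptions made for this illustration, not part of the manual.

```r
# exponent has a default value of 2, so it may be omitted in the call.
power_of <- function(x, exponent = 2) {
  x ^ exponent
}

power_of(5)                     # 25, uses the default exponent
power_of(5, 3)                  # 125, default overridden by position
power_of(exponent = 4, x = 2)   # 16, default overridden by name
```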
Lazy Evaluation of Function
Arguments to functions are evaluated lazily, which means they are evaluated only when needed by the
function body.
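
Lazy evaluation can be observed with a function whose body never uses one of its arguments. The sketch below uses two hypothetical functions, lazy_demo and needs_b; tryCatch() is used only so that the error from the second call can be shown without stopping the script.

```r
# b is declared but never used, so it is never evaluated.
lazy_demo <- function(a, b) {
  a * 2
}
lazy_demo(5)    # 10, no error although b was not supplied

# Here b is needed by the body, so the missing argument causes an error
# at the moment a + b is evaluated.
needs_b <- function(a, b) {
  a + b
}
result <- tryCatch(needs_b(5), error = function(e) "error: b is missing")
result
```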

Exercise:

1. Create a function that will return the sum of 2 integers.
2. Create a function that will return TRUE if a given integer x is inside a vector v. Use a function with
arguments v and x.
3. Create a function that, given a vector v and an integer x, will return how many times the integer appears
inside the vector.
4. Create a function that, given a vector, will print to the screen the mean and the standard deviation, and
optionally also the median. Use default arguments.
5. Write recursive functions for calculating the factorial of a number and the Fibonacci series.

Conclusion: Students are able to implement functions.


Experiment 4

Aim: Perform various matrix operations & implement the higher dimensional array in R.

Theory:

Matrices: A matrix is a two-dimensional rectangular data set. It can be created using a vector input to the
matrix function.

Creating Matrices
To create matrices we will use the matrix() function. The matrix() function takes the following
arguments:
• data: an R object (this could be a vector).
• nrow: the desired number of rows.
• ncol: the desired number of columns.
• byrow: a logical value; if TRUE the matrix is filled by row, otherwise (the default) by column.
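
The effect of the byrow argument is easiest to see by building the same data both ways. This is a small sketch; the names by_col and by_row are chosen only for the illustration.

```r
by_col <- matrix(1:6, nrow = 2)                 # default: filled column by column
by_row <- matrix(1:6, nrow = 2, byrow = TRUE)   # filled row by row

by_col[1, ]   # 1 3 5
by_row[1, ]   # 1 2 3
```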
Creation of matrix

a) matrix1 <- matrix(data = 1, nrow = 3, ncol = 3)
Sol:
> matrix1 <- matrix(data = 1, nrow = 3, ncol = 3)
> matrix1
     [,1] [,2] [,3]
[1,]    1    1    1
[2,]    1    1    1
[3,]    1    1    1
b) vector8 <- 1:12
matrix3 <- matrix(data = vector8, nrow = 4)
Sol:
> vector8 <- c(1:12)
> vector8
 [1]  1  2  3  4  5  6  7  8  9 10 11 12
> matrix3 <- matrix(data = vector8, nrow = 4)
> matrix3
     [,1] [,2] [,3]
[1,]    1    5    9
[2,]    2    6   10
[3,]    3    7   11
[4,]    4    8   12
c) v1 <- matrix(c(1, 2, 3, 4, 5, 6, 7, 8, 9), nrow = 3)
Sol:
> v1 <- matrix(c(1, 2, 3, 4, 5, 6, 7, 8, 9), nrow = 3)


> v1
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
d) v2 <- matrix(1:8, ncol = 2)
Sol:
> v2 <- matrix(1:8, ncol = 2)
> v2
     [,1] [,2]
[1,]    1    5
[2,]    2    6
[3,]    3    7
[4,]    4    8

e) matrix1 = matrix(1:9, nrow = 3)
matrix1 + 2
Sol:
> matrix1 = matrix(1:9, nrow = 3)
> matrix1
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
> matrix1 + 2
     [,1] [,2] [,3]
[1,]    3    6    9
[2,]    4    7   10
[3,]    5    8   11

Manipulation of Matrix
f) matrix1
Sol:
> matrix1
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
g) matrix1[1, 3]
Sol:
> matrix1[1, 3]
[1] 7
h) matrix1[2, ]
Sol:
> matrix1[2, ]
[1] 2 5 8
i) matrix1[, -2]
Sol:
> matrix1[, -2]
     [,1] [,2]

[1,]    1    7
[2,]    2    8
[3,]    3    9

j) matrix1[1, 1] = 15
Sol:
> matrix1[1, 1] = 15
> matrix1
     [,1] [,2] [,3]
[1,]   15    4    7
[2,]    2    5    8
[3,]    3    6    9
k) matrix1[, 2] = 1
Sol:
> matrix1[, 2] = 1
> matrix1
     [,1] [,2] [,3]
[1,]   15    1    7
[2,]    2    1    8
[3,]    3    1    9
l) matrix1[, 2:3] = 2
Sol:
> matrix1[, 2:3] = 2
> matrix1
     [,1] [,2] [,3]
[1,]   15    2    2
[2,]    2    2    2
[3,]    3    2    2

Mathematical Operations
R can do matrix arithmetic. Below is a list of some basic operations we can do.
• + - * / standard scalar or element-by-element operations
• %*% matrix multiplication
• t() transpose
• solve() inverse
• det() determinant
• chol() Cholesky decomposition
• eigen() eigenvalues and eigenvectors
• crossprod() cross product
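
The operations listed above can be tried on a small example. The sketch below uses an illustrative 2x2 matrix A and vector b that are not taken from the manual's exercises.

```r
A <- matrix(c(2, 1, 1, 3), nrow = 2)   # columns are (2, 1) and (1, 3)
b <- c(1, 2)

A * A                  # element-by-element product
A %*% A                # matrix product
t(A)                   # transpose (A is symmetric, so t(A) equals A)
det(A)                 # 5
A_inv <- solve(A)      # inverse of A
A_inv %*% A            # identity matrix, up to rounding error
x <- solve(A, b)       # solves A %*% x = b without forming the inverse
crossprod(A)           # same result as t(A) %*% A
eigen(A)$values        # eigenvalues of A
```

Note the difference between * and %*%: the first multiplies matching elements, the second performs the row-by-column product from linear algebra.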

Exercise


b) Calculate Transpose.
c) Calculate Inverse.
d) Calculate Multiplication of the matrix.
e) Construct a matrix with 10 columns and 10 rows, all filled with random numbers
between 0 and 100.
f) Calculate the row means of this matrix (Hint: use rowMeans()). Also calculate the
standard deviation across the row means (now also use sd()).
g) Now remake the above matrix with 100 columns and 10 rows. Then calculate the
column means (using, of course, colMeans()).

3) Scalar multiplication. Find the solution for aA where a=3 and A is the same as in the previous question.

4) Find the value of x in Ax=b.

5) Using the function eigen(), find the eigenvalues of A.

6) Find the eigenvalues and eigenvectors of A’A. Hint: Use crossprod() to compute A’A.

Arrays
Arrays are the R data objects which can store data in more than two dimensions. Arrays are n-dimensional
data structures. For example, if we create an array of dimensions (2, 3, 3) then it creates 3 rectangular
matrices each with 2 rows and 3 columns. They are homogeneous data structures.
Now, let’s see how to create arrays in R. To create an array in R you need to use the function called array().
The arguments to array() are the set of elements, given as a vector, and a vector containing the
dimensions of the array.
Array_NAME <- array(data, dim = c(row_Size, column_Size, matrices), dimnames = dimnames)
where,
data – An input vector given to the array.
row_Size – Number of rows in each matrix of the array.
column_Size – Number of columns in each matrix of the array.
matrices – Number of matrices in the array (the size of the third dimension).
dimnames – An optional list used to change the default names of rows and columns according to the user’s preference.
Example:

# Create two vectors of different lengths
vector1 <- c(1, 2, 3)
vector2 <- c(10, 15, 3, 11, 16, 12)

# Take these vectors as input; the 9 values are recycled to fill both 3x3 matrices
result <- array(c(vector1, vector2), dim = c(3, 3, 2))
print(result)

Output:
, , 1

     [,1] [,2] [,3]
[1,]    1   10   11
[2,]    2   15   16
[3,]    3    3   12

, , 2

     [,1] [,2] [,3]
[1,]    1   10   11
[2,]    2   15   16
[3,]    3    3   12
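
Elements of an array are accessed with one index per dimension, and the dimnames argument allows access by name. The following is a sketch with illustrative data (the values 1 to 18 and the names r1, c1, m1 and so on are assumptions for this example).

```r
arr <- array(1:18, dim = c(3, 3, 2),
             dimnames = list(c("r1", "r2", "r3"),
                             c("c1", "c2", "c3"),
                             c("m1", "m2")))

arr[2, , 2]             # second row of the second matrix: 11 14 17
arr[3, 3, 1]            # row 3, column 3 of the first matrix: 9
arr["r1", "c2", "m2"]   # indexing by name: 13
dim(arr)                # 3 3 2
```

Leaving an index position empty, as in arr[2, , 2], selects everything along that dimension.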

1. Write an R program to create an array of two 3x3 matrices from two given vectors.
2. Write an R program to create an array of two 3x3 matrices from two given vectors. Print the
second row of the second matrix of the array and the element in the 3rd
row and 3rd column of the 1st matrix.
3. Write an R program to create a two-dimensional 5x3 array of a sequence of even integers greater than 50.
4. Write an R program to add and multiply two 3D arrays.


Experiment 5

Aim: Create list in R and perform various list operations to access list elements in R.

Theory: LISTS

List is a data structure having components of mixed data types.


A vector having all elements of the same type is called atomic vector but a vector having elements of different
type is called list.
We can check if it’s a list with typeof() function and find its length using length().

Creating a list

List can be created using the list() function.

> x <- list("a" = 2.5, "b" = TRUE, "c" = 1:3)

Here, we create a list x, of three components with data types double, logical and integer vector
respectively.
Its structure can be examined with the str() function.

> str(x)

We can create the same list without the tags as follows. In such scenario, numeric indices are used by
default.

> x <- list(2.5,TRUE,1:3)

> x

Access Lists
You can access the list items by referring to its index number, inside brackets. The first item has index 1, the
second item has index 2, and so on
List Length
To find out how many items a list has, use the length() function:
Check if Item Exists
To find out if a specified item is present in a list, use the %in% operator:
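
The three operations above (access by index, length, and membership) can be sketched as follows. The lists fruits and named hold made-up values used only for this illustration.

```r
fruits <- list("apple", "banana", "cherry")
named  <- list(a = 2.5, b = TRUE, c = 1:3)

fruits[[1]]           # "apple": double brackets return the element itself
fruits[1]             # single brackets return a list of length 1
named$c               # 1 2 3: named components can also be reached with $
named[["c"]][2]       # 2: second value of component c
length(fruits)        # 3
"apple" %in% fruits   # TRUE
"mango" %in% fruits   # FALSE
```

The distinction between [ ] and [[ ]] matters in practice: fruits[1] is still a list, while fruits[[1]] is the character string stored inside it.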


Exercise:
1) If: p <- c(2,7,8), q <- c("A", "B", "C") and x <- list(p, q), then what is the value of x[2]?
2) If: w <- c(2, 7, 8), v <- c("A", "B", "C") and x <- list(w, v), then which R statement will replace "A" in x
with "K"?
3) If a <- list("x"=5, "y"=10, "z"=15), which R statement will give the sum of all elements in a?
4) If Newlist <- list(a=1:10, b="Good morning", c="Hi"), write an R statement that will add 1 to each
element of the first vector in Newlist.
5) If b <- list(a=1:10, c="Hello", d="AA"), write an R expression that will give all elements, except the
second, of the first vector of b.
6) Let x <- list(a=5:10, c="Hello", d="AA"), write an R statement to add a new item z = "NewItem" to the
list x.
7) Write an R statement that will assign new names "one", "two" and "three" to the elements of y.
8) If x <- list(y=1:10, t="Hello", f="TT", r=5:20), write an R statement that will give the length of
vector r of x.
9) Let string <- "Grand Opening", write an R statement to split this string into two and return the output:
10) Let: y <- list("a", "b", "c") and q <- list("A", "B", "C", "a", "b", "c"). Write an R statement that will
return all elements of q that are not in y.

Conclusion: Students are able to perform various operations on List


Experiment 6

Aim: Create Data Frame in R and perform various operations on data frame & demonstrate the common
functions on factors and tables

Theory:
Data Frames:
A data frame is a table or a two-dimensional array-like structure in which each column contains values
of one variable and each row contains one set of values from each column.
Following are the characteristics of a data frame.

• The column names should be non-empty.
• The row names should be unique.
• The data stored in a data frame can be of numeric, factor or character type.
• Each column should contain the same number of data items.
# Create the data frame.
emp.data <- data.frame(
  emp_id = c(1:5),
  emp_name = c("Rick","Dan","Michelle","Ryan","Gary"),
  salary = c(623.3,515.2,611.0,729.0,843.25),
  start_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11",
    "2015-03-27")),
  stringsAsFactors = FALSE
)
# Print the data frame.
print(emp.data)

Some functions for data frames are:

• The structure of the data frame can be seen by using the str() function.
• The statistical summary and nature of the data can be obtained by applying the summary() function.
• A specific column can be extracted from a data frame using the column name.
• A data frame can be expanded by adding columns and rows using cbind() and rbind().
Question 1
Create the following data frame, afterwards invert gender(use factors) for all individuals.


Question 2
Create this data frame (make sure you import the variable working as
character and not factor).

Add this data frame column-wise to the previous one.


a) How many rows and columns does the new data frame have?
b) What class of data is in each column?
Question 3
Create a simple data frame from 3 vectors. Order the entire data frame by the first column.

Question 4
Create a data frame from a matrix of your choice, change the row names so every row says id_i
(where i is the row number) and change the column names to variable_i (where i is the column
number). I.e., for column 1 it will say variable_1, and for row 2 will say id_2 and so on.
Question 5

(a) Create a small data frame representing a database of films. It should contain the fields
title, director, year, country, and at least three films.

(b) Create a second data frame of the same format as above, but containing just one new film.
(c) Merge the two data frames using rbind().
(d) Try sorting the titles using sort(): what happens?

Factors:
A factor is a data structure used for fields that take only a predefined, finite number of values
(categorical data).
For example, a data field such as marital status may contain only values from single,
married, separated, divorced, or widowed.
In such a case, we know the possible values beforehand and these predefined, distinct values
are called levels. Following is an example of a factor in R.

> x
[1] single  married married single
Levels: married single



Here, we can see that factor x has four elements and two levels. We can check if a variable is
a factor or not using class() function.
Similarly, levels of a factor can be checked using the levels() function.

> class(x)

[1] "factor"

> levels(x)

[1] "married" "single"

Creating a factor in R

We can create a factor using the function factor(). Levels of a factor are inferred from the
data if not provided.

> x <- factor(c("single", "married", "married", "single"));


> x
[1] single  married married single
Levels: married single

> x <- factor(c("single", "married", "married", "single"), levels = c("single", "married", "divorced"));
> x
[1] single  married married single
Levels: single married divorced

We can see from the above example that levels may be predefined even if not used.
Factors are closely related with vectors. In fact, factors are stored as integer vectors. This is
clearly seen from its structure.

> x <- factor(c("single","married","married","single"))

> str(x)

Factor w/ 2 levels "married","single": 2 1 1 2
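
The integer storage of a factor, and the use of table() to count its levels, can be sketched as follows. The example reuses the marital status values from above; the variable y is introduced only for this illustration.

```r
x <- factor(c("single", "married", "married", "single"))

as.integer(x)   # 2 1 1 2: the integer codes behind the labels
levels(x)       # "married" "single"
nlevels(x)      # 2
table(x)        # counts per level: married 2, single 2

# A predefined level that never occurs still appears in the table with count 0.
y <- factor(c("single", "married", "married", "single"),
            levels = c("single", "married", "divorced"))
table(y)
```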


1. If x = c(1, 2, 3, 3, 5, 3, 2, 4, NA), what are the levels of factor(x)?
2. Let x <- c(11, 22, 47, 47, 11, 47, 11). If the R expression factor(x, levels=c(11, 22, 47),
ordered=TRUE) is executed, what will be the 4th element in the output?
3. If z <- factor(c("p", "q", "p", "r", "q")) and the levels of z are "p", "q", "r", write an R expression that will
change the level "p" to "w" so that z is equal to: "w", "q", "w", "r", "q" (use levels()).
4. If: s1 <- factor(sample(letters, size=5, replace=TRUE)) and s2 <- factor(sample(letters, size=5,
replace=TRUE)), write an R expression that will concatenate s1 and s2 into a single factor with 10
elements.


Experiment 7

Aim: Demonstration of plots in R as Box Plots, Pie Charts, Bar charts, Line Chart and histogram.
R Bar Plot
Bar plots can be created in R using the barplot() function.
We can supply a vector or matrix to this function. If we supply a vector, the plot will have bars
with their heights equal to the elements in the vector.
Let us suppose, we have a vector of maximum temperatures (in degree Celsius) for seven
days as follows.

max.temp <- c(22, 27, 26, 24, 23, 26, 28)

Now we can make a bar plot out of this data.

barplot(max.temp)

This function can take a lot of arguments to control the way our data is plotted. You can read
about them in the help section ?barplot.
Some of the frequently used ones are main to give the title, xlab and ylab to
provide labels for the axes, names.arg for naming each bar, col to define the color, etc.

We can also plot bars horizontally by providing the argument horiz = TRUE.


barplot(max.temp,
        main = "Maximum Temperatures in a Week",
        xlab = "Degree Celsius",
        ylab = "Day",
        names.arg = c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"),
        col = "darkred",
        horiz = TRUE)
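
The text mentions that barplot() also accepts a matrix. A minimal sketch of that case follows; the two cities and their temperatures are hypothetical values made up for illustration.

```r
# Hypothetical data: maximum temperatures for two cities over three days
temps <- matrix(c(22, 27, 26,
                  25, 24, 29),
                nrow = 2, byrow = TRUE,
                dimnames = list(c("City A", "City B"),
                                c("Sun", "Mon", "Tue")))

# Each column becomes one group of bars; beside = TRUE places the rows
# side by side instead of stacking them
barplot(temps,
        beside = TRUE,
        main = "Maximum Temperatures by City",
        xlab = "Day",
        ylab = "Degree Celsius",
        col = c("darkred", "steelblue"),
        legend.text = rownames(temps))
```

Leaving out beside = TRUE would draw one stacked bar per column instead.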


Plotting Categorical Data

Sometimes we have to plot the count of each item as bar plots from categorical data. For
example, here is a vector of age of 10 college freshmen.

age <- c(17,18,18,17,18,19,18,16,18,18)

Simply calling barplot(age) will not give us the required plot. It will plot 10 bars with heights
equal to the students' ages. But we want to know the number of students in each age category.
This count can be quickly found using the table() function, as shown below.

> table(age)
age
16 17 18 19
 1  2  6  1

Plotting this table gives the required bar plot. Note that the density argument is used
to shade the bars.

barplot(table(age),
        main = "Age Count of 10 Students",
        xlab = "Age",
        ylab = "Count",
        border = "red",
        col = "blue",
        density = 10)

R Pie Chart

A pie chart is drawn using the pie() function in R. This function takes a vector of
non-negative numbers.

expenditure

Housing Food Cloths Entertainment

100

Let us assume that the above data represents the monthly expenditure breakdown of an
individual.
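
The pie() calls that follow need a named numeric vector called expenditure. A minimal sketch of building one is given here. The amounts, and the fifth category "Other", are assumptions chosen for illustration and are not taken from the manual's data.

```r
# Illustrative amounts only: these values are assumptions, not the manual's data
expenditure <- c(Housing = 600, Food = 300, Cloths = 150,
                 Entertainment = 100, Other = 200)

expenditure        # printing a named vector shows the names above the values
sum(expenditure)   # total monthly spend under these assumed amounts

# Share of each category in percent, rounded to one decimal place
round(100 * expenditure / sum(expenditure), 1)
```

The names of the vector are what pie() uses as slice labels when no labels argument is given.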


Example: Simple pie chart using pie()

Now let us draw a simple pie chart out of this data using the pie() function.

pie(expenditure)

We can see above that a pie chart was plotted with 5 slices. The chart was drawn in anti-
clockwise direction using pastel colors.

Example 2: Pie chart with additional parameters

pie(expenditure,
    labels = as.character(expenditure),
    main = "Monthly Expenditure Breakdown",
    col = c("red", "orange", "yellow", "blue", "green"),
    border = "brown",
    clockwise = TRUE)


As seen in the above figure, we have used the actual amounts as labels. Also, the chart is drawn in
clockwise fashion.
Since the human eye is relatively bad at judging angles, other types of charts are often more
appropriate than pie charts.
This is also stated in the R documentation: "Pie charts are a very bad way of displaying
information."
The most commonly used plotting function in R is plot(). It
is a generic function, meaning that it has many methods which are called according to the type of
object passed to plot().
In the simplest case, we can pass in a vector and we will get a scatter plot of magnitude vs index.
More generally, we pass in two vectors and a scatter plot of these points is plotted.
For example, the command plot(c(1,2),c(3,5)) would plot the points (1,3) and (2,5).
Here is a more concrete example where we plot the sine function over the range
-pi to pi.

x <- seq(-pi,pi,0.1)

plot(x, sin(x))


Adding Titles and Labelling Axes

We can add a title to our plot with the parameter main. Similarly, xlab and ylab can be used to
label the x-axis and y-axis respectively.

plot(x, sin(x),
     main = "The Sine Function",
     ylab = "sin(x)")

Changing Color and Plot Type

We can see above that the plot uses circular points drawn in black. These are the defaults.
We can change the plot type with the argument type. It accepts the following strings, with
the given effects.


"p" - points

"l" - lines

"b" - both points and lines

"c" - empty points joined by lines

"o" - overplotted points and lines

"s" and "S" - stair steps

"h" - histogram-like vertical lines

"n" - does not produce any points or lines

Similarly, we can define the color using col.

plot(x, sin(x),
     main = "The Sine Function",
     ylab = "sin(x)",
     type = "l",
     col = "blue")
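
The aim of this experiment also lists box plots and histograms. A minimal sketch of both follows, reusing the max.temp and age vectors from earlier in this experiment. The titles, colours and bin edges are assumptions chosen for illustration.

```r
max.temp <- c(22, 27, 26, 24, 23, 26, 28)
age <- c(17, 18, 18, 17, 18, 19, 18, 16, 18, 18)

# Box plot: shows the median, quartiles and range of the temperatures
boxplot(max.temp,
        main = "Maximum Temperatures in a Week",
        ylab = "Degree Celsius",
        col = "orange")

# Histogram: counts how many ages fall into each bin
hist(age,
     breaks = seq(15.5, 19.5, by = 1),  # one bin centred on each whole age
     main = "Age Distribution of 10 Students",
     xlab = "Age",
     col = "lightblue")
```

With these bin edges the histogram bars have the same heights as the counts returned by table(age).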


R 3D PLOTS
There are many functions in R for creating 3D plots. In this section, we discuss
the persp() function, which can be used to create 3D surfaces in perspective view.
This function mainly takes three arguments, x, y and z, where x and y are vectors defining the
locations along the x- and y-axes. The height of the surface (z-axis) is given by the matrix z.
As an example, let's plot a cone. A simple right circular cone can be obtained with the following function.

cone <- function(x, y){
  sqrt(x^2 + y^2)
}

Now let's prepare our variables.

x <- y <- seq(-1, 1, length.out = 20)
z <- outer(x, y, cone)

We used the function seq() to generate a vector of equally spaced numbers.
Then, we used the outer() function to apply the function cone at every combination of x and
y.
Finally, plot the 3D surface as follows.

persp(x, y, z)


Adding Titles and Labelling Axes to Plot

We can add a title to our plot with the parameter main. Similarly, xlab, ylab and
zlab can be used to label the three axes.

Rotational angles

We can define the viewing direction using parameters theta and phi.
By default, theta (the azimuthal direction) is 0 and phi (the colatitude direction) is 15.

Colouring and Shading Plot

Colouring of the plot is done with the parameter col. Similarly, we can add
shading with the parameter shade.

persp(x, y, z,
      main = "Perspective Plot of a Cone",
      zlab = "Height",
      theta = 30, phi = 15,
      col = "springgreen", shade = 0.5)


Experiment 8
Aim: Study of Simple Linear Regression and Multiple Linear Regression in R.

Simple Linear Regression:


Regression models describe the relationship between variables by fitting a line to the observed data.
Linear regression models use a straight line, while logistic and nonlinear regression models use a
curved line. Regression allows you to estimate how a dependent variable changes as the independent
variable(s) change.
Simple linear regression is used to estimate the relationship between two quantitative variables. You
can use simple linear regression when you want to know:
1. How strong the relationship is between two variables (e.g. the relationship between rainfall and
soil erosion).
2. The value of the dependent variable at a certain value of the independent variable (e.g. the amount of
soil erosion at a certain level of rainfall).
The formula for a simple linear regression is:

$$y = B_0 + B_1 x + e$$

where
y is the predicted value of the dependent variable (y) for any given value of the independent variable
(x).
B0 is the intercept, the predicted value of y when x is 0.
B1 is the regression coefficient, i.e. how much we expect y to change as x increases by one unit.
x is the independent variable (the variable we expect is influencing y).
e is the error of the estimate, or how much variation there is in our estimate of the regression
coefficient.
Linear regression finds the line of best fit through your data by searching for the regression
coefficient (B1) that minimizes the total error (e) of the model.
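
A minimal sketch of fitting such a model with lm() is shown here. It assumes the built-in cars data set (speed and stopping distance) as a stand-in, since the manual does not name a data set for this experiment.

```r
# cars is a built-in data set: speed (mph) and stopping distance dist (ft)
fit <- lm(dist ~ speed, data = cars)

coef(fit)      # estimated intercept (B0) and slope (B1)
summary(fit)   # standard errors, R-squared and p-values

# Predicted stopping distance at a speed of 15 mph
predict(fit, newdata = data.frame(speed = 15))

# Scatter plot of the data with the fitted line drawn on top
plot(cars$speed, cars$dist, xlab = "Speed", ylab = "Stopping distance")
abline(fit, col = "blue")
```

In the formula dist ~ speed, the variable on the left of ~ is the dependent variable and the one on the right is the independent variable.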

Multiple linear regression is an extension of simple linear regression: it uses more than one
predictor variable to predict the response variable. It models the linear relationship between
a single continuous dependent variable and more than one independent variable.

In multiple linear regression, the target variable (Y) is a linear combination of multiple predictor
variables x1, x2, x3, ..., xn. Since it extends simple linear regression, the same form applies
and the equation becomes:

$$Y = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3 + \dots + b_n x_n \qquad (a)$$

Where,
Y = Output/Response variable
b0, b1, b2, b3, ..., bn = Coefficients of the model
x1, x2, x3, ..., xn = Various independent/feature variables
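
Equation (a) can be fitted with the same lm() function by listing several predictors in the formula. The sketch below assumes the built-in mtcars data set and an arbitrary choice of three predictors; neither comes from the manual.

```r
# mtcars is a built-in data set; mpg is modelled from weight (wt),
# horsepower (hp) and displacement (disp)
fit_multi <- lm(mpg ~ wt + hp + disp, data = mtcars)

coef(fit_multi)      # b0 (intercept) followed by b1, b2, b3
summary(fit_multi)   # coefficient table, R-squared and overall F-test
```

Each estimated coefficient is the expected change in Y for a one-unit increase in that predictor while the other predictors are held fixed.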

Conclusion: Students studied simple linear regression and multiple linear regression.


Experiment 9

Aim: Write an R script to find subsets of a dataset by using the subset() and aggregate() functions on the iris dataset.

Theory:
Steps:
1) Download iris dataset from https://www.kaggle.com/datasets/arshid/iris-flower-dataset

2) subset()
Subsetting allows the user to access elements of an object. It extracts a portion of the object
based on the condition provided.
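
A minimal sketch of the subset() function itself is given here. It assumes the iris data frame that ships with R; a copy downloaded from Kaggle may use different column names, so the names below would need adjusting in that case.

```r
# Rows where Species is setosa and Sepal.Length exceeds 5,
# keeping only two of the columns
setosa_long <- subset(iris,
                      Species == "setosa" & Sepal.Length > 5,
                      select = c(Sepal.Length, Species))

head(setosa_long)   # first few matching rows
nrow(setosa_long)   # how many rows satisfied the condition
```

The second argument is the row condition and select chooses the columns to keep; both can refer to column names directly, without the iris$ prefix.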

Method 1: Subsetting in R Using [ ] Operator


Using the '[ ]' operator, elements of vectors and observations from data frames can be accessed.
To exclude some indices, a negative index ('-') is used; it returns all other elements of the vector or data frame.
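
A short sketch of the [ ] operator on a vector and on the built-in iris data frame follows. The vector v is a made-up example.

```r
v <- c(10, 20, 30, 40, 50)
v[2]         # second element
v[c(1, 3)]   # first and third elements
v[-1]        # everything except the first element

iris[1:3, ]                               # first three observations, all columns
iris[iris$Species == "virginica", 1:2]    # virginica rows, first two columns
head(iris[, -5])                          # all columns except the fifth (Species)
```

For a data frame the index before the comma selects rows and the index after it selects columns.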

Method 2: Subsetting in R Using [[ ]] Operator


The [[ ]] operator is used for subsetting list objects. It works like the [ ] operator, except that
[[ ]] selects exactly one element, whereas [ ] can select more than one element in a
single command.
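
The difference between the two operators can be seen on a small list. The list student below is hypothetical and used only for illustration.

```r
# A hypothetical list used only for illustration
student <- list(name = "Asha", marks = c(78, 85, 91))

student["marks"]        # single brackets return a list containing the element
student[["marks"]]      # double brackets return the element itself (a numeric vector)
student[["marks"]][2]   # second mark

iris[["Sepal.Length"]][1:5]   # a data frame column extracted as a plain vector
```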

Method 3: Subsetting in R Using $ Operator


The $ operator can be used for lists and data frames in R. Unlike the [ ] operator, it selects only a single
element at a time. It can be used to access an element in a named list or a column in a data frame. The $
operator is only applicable to recursive (list-like) objects.
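
A brief sketch of the $ operator on a named list and on the built-in iris data frame follows. The list student is again a hypothetical example.

```r
student <- list(name = "Asha", marks = c(78, 85, 91))
student$name               # element of a named list

mean(iris$Petal.Length)    # a data frame column, passed straight to mean()
levels(iris$Species)       # the distinct species in the data
```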

3) aggregate()
The aggregate() function is used to compute summary statistics of the data by group. The statistics include
mean, min, sum, max, etc.
Syntax: aggregate(dataframe$aggregate_column, list(dataframe$group_column), FUN)
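
A minimal sketch of aggregate() on the built-in iris data frame is given here, assuming we want the mean sepal length of each species.

```r
# Mean sepal length per species, using the aggregate(column, list(group), FUN) pattern
aggregate(iris$Sepal.Length, list(iris$Species), mean)

# The formula interface gives the same numbers with readable column names
by_species <- aggregate(Sepal.Length ~ Species, data = iris, FUN = mean)
by_species
```

Replacing mean with min, max or sum gives the other group summaries mentioned in this step.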

Conclusion: Students studied how to subset and aggregate a given dataset.


Experiment 10

Aim: Import a dataset from web storage and name it. Then perform logistic regression to find the
relation between the variables that affect the admission of a student to an institute, based on his or her
GRE score, GPA obtained and rank of the student. Also check whether the model fits well.

Theory:
1)Download dataset from https://www.kaggle.com/datasets/mohansacharya/graduate-admissions

2) The dataset contains several parameters which are considered important during the application for
Masters Programs.
The parameters included are :
1. GRE Scores ( out of 340 )
2. TOEFL Scores ( out of 120 )
3. University Rating ( out of 5 )
4. Statement of Purpose and Letter of Recommendation Strength ( out of 5 )
5. Undergraduate GPA ( out of 10 )
6. Research Experience ( either 0 or 1 )
7. Chance of Admit ( ranging from 0 to 1 )

3) Predict admission from the important parameters.

4) Perform exploratory analysis using different plots.

5) Implementation of Logistic Regression in R programming


In R language, logistic regression model is created using glm() function.
Syntax: glm(formula, family = binomial)
where
formula: represents an equation on the basis of which model has to be fitted.
family: represents the type of function to be used i.e., binomial for logistic regression
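
A minimal sketch of the glm() call is shown below. It does not use the Kaggle file: the data are simulated, and the column names gre, gpa, rank and admit as well as the effect sizes are assumptions made for illustration. With the real dataset, a binary 0/1 outcome column would be needed in place of admit.

```r
set.seed(42)  # make the simulated data reproducible

# Simulated stand-in for the admissions data; names and effects are assumptions
n <- 200
admissions <- data.frame(
  gre  = round(rnorm(n, mean = 315, sd = 10)),
  gpa  = round(rnorm(n, mean = 8.5, sd = 0.5), 2),
  rank = factor(sample(1:4, n, replace = TRUE))
)

# Binary outcome generated so that higher gre and gpa raise the admission odds
log_odds <- -1 + 0.05 * (admissions$gre - 315) +
  1.5 * (admissions$gpa - 8.5) -
  0.4 * (as.integer(admissions$rank) - 1)
admissions$admit <- rbinom(n, size = 1, prob = plogis(log_odds))

# Fit the logistic regression model
model <- glm(admit ~ gre + gpa + rank, data = admissions, family = binomial)
summary(model)   # coefficients are on the log-odds scale

# One rough check of fit: deviance reduction as each term is added
anova(model, test = "Chisq")

# Predicted admission probabilities for the first few students
head(predict(model, type = "response"))
```

Comparing the residual deviance with the null deviance in the summary gives a first indication of whether the predictors improve on an intercept-only model.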

Conclusion: Students are able to perform logistic regression in R.
