INTRODUCTION

Name: Kishan Kumar Goud    Roll no. 432
SYBSC.IT
R PROGRAMMING
R Data types
In R, variables are not declared with a data type. Instead, each variable is assigned an R object, and the data type of that object becomes the data type of the variable.
The main types of R objects are:

Vectors
List
Matrices
Arrays
Factors
Data Frames
Computer Oriented Statistical Techniques

PRACTICAL NO. 01
AIM: Using R, execute the basic commands and work with arrays, lists, data frames and matrices.
The simplest R object is the vector, and there are six data types of atomic vectors.
The other R objects are built upon these atomic vectors.
Types of R Vectors
Logical
Numeric
Integer
Complex
Character
Raw
Class
 class() returns the class of an object; for an atomic vector this is its data type. R possesses a simple generic function
mechanism which can be used for an object-oriented style of programming.
 Method dispatch takes place based on the class of the first argument to the
generic function.
Six data-types of atomic vectors are:
1) logical: Create or test for objects of type "logical", and the basic logical
constants.
2) numeric: Creates or coerces objects of type "numeric"; is.numeric is a more general test of an object being interpretable as numbers.
3) integer: Creates or tests for objects of type "integer".
4) complex: Basic functions which support complex arithmetic in R, in addition to the arithmetic operators +, -, *, /, and ^.
5) character: Create or test for objects of type "character".
6) raw: Creates or tests for objects of type "raw".
charToRaw()
 Converts a character string to a raw vector (conversion and manipulation of objects of type "raw").
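As a quick check of the six types above, here is a minimal sketch with illustrative values (not taken from the original practical) that builds one atomic vector of each type and prints its class with class():

```r
# One atomic vector of each of the six types (illustrative values)
lgl   <- TRUE
num   <- 12.5
int   <- 5L                 # the L suffix makes an integer
cpx   <- 3 + 2i
chr   <- "R"
raw_v <- charToRaw("Hi")    # character string converted to raw bytes

print(class(lgl))    # "logical"
print(class(num))    # "numeric"
print(class(int))    # "integer"
print(class(cpx))    # "complex"
print(class(chr))    # "character"
print(class(raw_v))  # "raw"
print(raw_v)         # 48 69, the hex codes of "H" and "i"
```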
Sequence Generation
seq()
Generate regular sequences.
seq is a standard generic with a default method.
seq.int is a primitive which can be much faster but has a few restrictions.
seq_along and seq_len are very fast primitives for two common cases.
The c() function
c()
This is a generic function which combines its arguments.
The default method combines its arguments to form a vector.
All arguments are coerced to a common type which is the type of the returned
value, and all attributes except names are removed.
Creating a sequence with the colon operator
Colon (:)
 from:to creates a regular sequence from "from" to "to" in steps of 1 (or -1 when from is larger than to).
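A small sketch of the sequence functions described above; the values are illustrative only. It also shows c() coercing mixed arguments to a common type:

```r
# Regular sequences (illustrative values)
s1 <- 1:10                          # colon operator, step 1
s2 <- seq(2, 20, by = 2)            # seq() with a step size
s3 <- seq(0, 1, length.out = 5)     # seq() with a fixed number of values
s4 <- seq_len(4)                    # fast primitive: 1, 2, 3, 4
s5 <- seq_along(c("a", "b", "c"))   # fast primitive: indices of a vector

# c() combines values; all are coerced to one common type
v <- c(1, 2, "three")

print(s2)        # 2 4 6 8 10 12 14 16 18 20
print(s3)        # 0.00 0.25 0.50 0.75 1.00
print(class(v))  # "character": the numbers were coerced to strings
```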
Accessing the Vector Elements

Elements of a vector are accessed by their index (position).
Square brackets [] are used for indexing.
Indexing starts at 1, the first position.
A negative index drops the element at that position.
TRUE and FALSE can also be used for indexing.
Accessing element at specific position:
Accessing element at different position:
To drop elements of specific index:
weekdays [c(FALSE,TRUE,FALSE,FALSE,FALSE,TRUE,TRUE)]
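The indexing rules above can be sketched as follows. The contents of weekdays are an assumption (seven day names) chosen so that the logical index shown above selects three elements:

```r
# Hypothetical 7-element vector standing in for the one used above
weekdays <- c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat")

print(weekdays[3])              # element at a specific position: "Tue"
print(weekdays[c(2, 4, 6)])     # elements at different positions
print(weekdays[-c(1, 7)])       # negative indices drop elements 1 and 7
print(weekdays[c(FALSE, TRUE, FALSE, FALSE, FALSE, TRUE, TRUE)])  # logical indexing
```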
Vector element re-cycling
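Recycling means that when two vectors of different lengths are used in an arithmetic operation, R repeats the shorter vector until it matches the longer one. A minimal sketch with illustrative values; if the longer length is not a multiple of the shorter one, R still recycles but gives a warning:

```r
v1 <- c(1, 2, 3, 4, 5, 6)
v2 <- c(10, 20)
# v2 is recycled to c(10, 20, 10, 20, 10, 20) to match the length of v1
res <- v1 + v2
print(res)   # 11 22 13 24 15 26
```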

Sorting
sort()
Description
 Sort (or order) a vector or factor (partially) into ascending or descending
order.
 For ordering along more than one variable, e.g., for sorting data frames,
see order().
Sorting numbers
 In Ascending Order
 In Descending Order
Sorting characters
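A sketch of sort() on illustrative numbers and strings, together with order(), which returns the positions that would sort the vector (the function used for ordering data frames):

```r
nums  <- c(45, 12, 78, 3, 56)
chars <- c("mango", "apple", "cherry")

asc   <- sort(nums)                      # ascending: 3 12 45 56 78
desc  <- sort(nums, decreasing = TRUE)   # descending: 78 56 45 12 3
alpha <- sort(chars)                     # alphabetical: "apple" "cherry" "mango"
idx   <- order(nums)                     # positions in sorted order: 4 2 1 5 3

print(asc); print(desc); print(alpha); print(idx)
```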
R-List
A list is an R object that can contain elements of different types, such as numbers, strings, vectors and even another list.
A list can also contain a matrix or a function as its element.
A list is created using the list() function.
List
Description
 Functions to construct, coerce and check for both kinds of R lists.
Matrix
Description
 matrix() creates a matrix from the given set of values.
 as.matrix() attempts to turn its argument into a matrix.
list1=list(c("jan","feb","march"),matrix(c(2,4,5,6),nrow=2),list("green",15.5))
Accessing list elements by index.

Naming the list
names()
 Function to get or set the names of an object.
Accessing list elements by name
 Access a named element of the list by using "$".
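A minimal sketch of indexing, naming and $ access on list1 from the text above. The element names months, mat and inner_list are hypothetical, chosen only for illustration. Note that list1[1] returns a sub-list, while list1[[1]] returns the element itself:

```r
# list1 as created in the text above
list1 <- list(c("jan", "feb", "march"), matrix(c(2, 4, 5, 6), nrow = 2), list("green", 15.5))

# Access elements by index
first <- list1[[1]]          # the character vector
inner <- list1[[3]][[1]]     # "green", from the nested list

# Name the elements (hypothetical names), then access them with $
names(list1) <- c("months", "mat", "inner_list")
print(list1$months)          # "jan" "feb" "march"
print(list1$mat[2, 1])       # 4 (the matrix is filled column by column)
```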
Creating lists l1 and l2 to merge them
Merging lists l1 and l2
Unlist a list
Given a list structure x, unlist simplifies it to produce a vector which contains all the atomic components which occur in x.
Create a list l3:
Unlist l3:
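The merging and unlisting steps can be sketched as below; l1, l2 and l3 hold hypothetical values since the original lists are not shown. Combining lists with c() is one common way to merge them:

```r
# Hypothetical lists to merge
l1 <- list(1, 2, 3)
l2 <- list("a", "b")
merged <- c(l1, l2)          # c() on lists gives one list of 5 elements
print(length(merged))        # 5

l3 <- list(1:3, 4:6)
u <- unlist(l3)              # flattens the list into a single vector
print(u)                     # 1 2 3 4 5 6
print(sum(u))                # arithmetic now works on the vector: 21
```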
Arrays
Arrays are R data objects that can store data in more than two dimensions.
An array is created using the array() function.
It takes a vector as input and uses the values in the dim parameter to create the array, where dim specifies the dimensions of the array.
An array can store data of only one type.
Syntax: array(data, dim = c(no. of rows, no. of columns, no. of matrices), dimnames = list(row names, column names, matrix names))
The dimnames argument must be given as a list.
Array
Description
 Creates or tests for arrays.
Arguments
data: a vector (including a list or expression vector) giving data to fill the array.
Non-atomic classed objects are coerced by as.vector.
dim: the dim attribute for the array to be created, that is an integer vector of
length one or more giving the maximal indices in each dimension.
Usage
array(data = NA, dim = length(data), dimnames = NULL)
Creating an array

Naming the rows, columns and matrices
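A sketch of creating a named array with array(); the 3 x 3 x 2 shape and the row, column and matrix names are hypothetical. Values fill the array column by column:

```r
# Hypothetical names for the three dimensions
row_names <- c("R1", "R2", "R3")
col_names <- c("C1", "C2", "C3")
mat_names <- c("Matrix1", "Matrix2")

# Two 3x3 matrices stored in one array; dimnames must be a list
arr <- array(1:18, dim = c(3, 3, 2),
             dimnames = list(row_names, col_names, mat_names))
print(arr)
print(arr["R2", "C3", "Matrix1"])   # 8
print(arr[, , "Matrix2"])           # the whole second matrix (values 10 to 18)
```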
Function factor is used to encode a vector as a factor

factor()
Description
 The function factor is used to encode a vector as a factor (the terms
‘category’ and ‘enumerated type’ are also used for factors).
 If argument ordered is TRUE, the factor levels are assumed to be ordered.
For compatibility with S there is also a function ordered.
> data=c("north","south","east","north","south","east","east","east","north","south","north","south","north")
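Continuing with the data vector above, a minimal sketch of encoding it as a factor. By default, levels are sorted alphabetically:

```r
data <- c("north","south","east","north","south","east","east","east",
          "north","south","north","south","north")
fdata <- factor(data)
print(fdata)
print(levels(fdata))    # "east" "north" "south"
print(nlevels(fdata))   # 3
print(table(fdata))     # count of each level: east 4, north 5, south 4
```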

Data Frames
A data frame is a table, a two-dimensional array-like structure in which each column contains the values of one variable and each row contains one set of values from each column.
Characteristics of data frames:
 The column names should not be empty.
 The row names should be unique.
 The data stored in a data frame can be of numeric, factor or character type.
 Each column should contain the same number of data items.
 A data frame is created with the data.frame() function.
Creating a Data Frame/Table

> empdata=data.frame(empID=c(101:105),
    empName=c("Taehyung","Jungkook","Jimin","Suga","Jin"),
    Salary=c(18000,14000,20000,23000,25000),
    startDate=as.Date(c("2018-03-25","2019-02-24","2020-5-21","2018-03-25","2018-03-2")),
    stringsAsFactors=FALSE)
str()
Description
 Compactly display the internal structure of an R object, a diagnostic
function and an alternative to summary (and to some extent, dput).
 Ideally, only one line for each ‘basic’ structure is displayed.
 It is especially well suited to compactly display the (abbreviated) contents
of (possibly nested) lists.
 The idea is to give reasonable output for any R object.
 It calls args for (non-primitive) function objects.
summary()
Description
 summary is a generic function used to produce result summaries of the
results of various model fitting functions.
 The function invokes particular methods which depend on the class of the
first argument.
Extracting data from Data Table (Data Frame)

data.frame()
Description
 The function data.frame() creates data frames, tightly coupled collections
of variables which share many of the properties of matrices and of lists,
used as the fundamental data structure by most of R's modeling software.
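A sketch of inspecting and extracting data from empdata, rebuilt from the data.frame() call above. The row and column selections are illustrative choices:

```r
empdata <- data.frame(empID = 101:105,
                      empName = c("Taehyung", "Jungkook", "Jimin", "Suga", "Jin"),
                      Salary = c(18000, 14000, 20000, 23000, 25000),
                      startDate = as.Date(c("2018-03-25", "2019-02-24", "2020-5-21",
                                            "2018-03-25", "2018-03-2")),
                      stringsAsFactors = FALSE)
str(empdata)                                     # 5 obs. of 4 variables
print(summary(empdata))                          # per-column summaries

names_sal <- empdata[, c("empName", "Salary")]   # extract two columns
first_two <- empdata[1:2, ]                      # extract the first two rows
high_paid <- empdata[empdata$Salary > 20000, ]   # rows matching a condition
print(high_paid$empName)                         # "Suga" "Jin"
```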

Expanding data frames

 Adding columns to the data frame
>empdata$dept=c("IT","Finance","Testing","Development","HR")
 Creating data frame and combining it

> empnewdata=data.frame(empID=c(106:108),empName=c("Jennie","Lisa","Rose"),
    Salary=c(12000,19000,24000),startDate=as.Date(c("2020-01-5","2018-01-2","2019-5-21")),
    dept=c("HR","IT","Development"),stringsAsFactors=FALSE)

rbind()
Description
 rbind() takes a sequence of vector, matrix or data-frame arguments and combines them by rows (its counterpart cbind() combines them by columns). These are generic functions with methods for other R classes.
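A minimal sketch of expanding a data frame by adding a column and then appending rows with rbind(). For brevity it leaves out the startDate column used above; rbind() only needs both data frames to have the same column names:

```r
empdata <- data.frame(empID = 101:105,
                      empName = c("Taehyung", "Jungkook", "Jimin", "Suga", "Jin"),
                      Salary = c(18000, 14000, 20000, 23000, 25000),
                      stringsAsFactors = FALSE)
# Add a column: the vector length must equal the number of rows
empdata$dept <- c("IT", "Finance", "Testing", "Development", "HR")

empnewdata <- data.frame(empID = 106:108,
                         empName = c("Jennie", "Lisa", "Rose"),
                         Salary = c(12000, 19000, 24000),
                         dept = c("HR", "IT", "Development"),
                         stringsAsFactors = FALSE)
# rbind() stacks the rows of the two data frames
allemp <- rbind(empdata, empnewdata)
print(allemp)
print(nrow(allemp))   # 8
```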

PRACTICAL NO. 02
AIM: Create a matrix using R and perform the operations addition,
multiplication, inverse and transpose.
A matrix is an R object in which the elements are arranged in a two-dimensional rectangular layout; all its elements are of the same atomic type.
 Creating a matrix
 Matrix addition
 Matrix subtraction
 Matrix Multiplication
 x %*% y
Description
Multiplies two matrices, if they are conformable. If one argument is a vector, it will be promoted to either a row or column matrix to make the two arguments conformable. If both are vectors of the same length, it will return the inner product (as a matrix).
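A sketch of the basic matrix operations on two small illustrative 2 x 2 matrices. It contrasts element-wise * with matrix multiplication %*%:

```r
m1 <- matrix(c(1, 2, 3, 4), nrow = 2)   # columns (1, 2) and (3, 4)
m2 <- matrix(c(5, 6, 7, 8), nrow = 2)

addm <- m1 + m2      # element-wise addition
subm <- m1 - m2      # element-wise subtraction
elem <- m1 * m2      # element-wise product, NOT matrix multiplication
mult <- m1 %*% m2    # matrix multiplication: rows of m1 times columns of m2

print(addm); print(subm); print(elem); print(mult)
```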
 Matrix Inverse
det
Description
 det calculates the determinant of a matrix. determinant is a generic
function that returns separately the modulus of the determinant,
optionally on the logarithm scale, and the sign of the determinant.

 Inverse Matrix m1
 Inverse Matrix m2
 Matrix Transpose
Description
 Given a matrix or data.frame x, t returns the transpose of x.
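Since the original screenshots are not available, here is an assumed approach: a sketch using det() to confirm the matrix is invertible, solve() for the inverse, and t() for the transpose, on an illustrative 2 x 2 matrix:

```r
m1 <- matrix(c(4, 2, 7, 6), nrow = 2)   # rows: (4, 7) and (2, 6)

d   <- det(m1)      # 4*6 - 7*2 = 10; non-zero, so m1 is invertible
inv <- solve(m1)    # inverse of m1
tr  <- t(m1)        # transpose: rows become columns

print(d); print(inv); print(tr)
# A matrix times its inverse gives the identity matrix (up to rounding)
print(round(m1 %*% inv, 10))
```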

PRACTICAL NO. 03
AIM: Execute the statistical functions mean, median, mode, quartiles, range,
interquartile-range and histogram using R.
Mean
 mean()
Description
 Generic function for the (trimmed) arithmetic mean.
 na.rm
a logical value indicating whether NA values should be stripped before the
computation proceeds.
Median
 median()
Description
 Compute the sample median.

Quartile
 quantile()
Description
 The generic function quantile produces sample quantiles corresponding to
the given probabilities.
 The smallest observation corresponds to a probability of 0 and the largest to
a probability of 1.
Range
 range()
Description
 range returns a vector containing the minimum and maximum of all the
given arguments.
Histogram
 hist()
Description
 The generic function hist computes a histogram of the given data values.
 If plot = TRUE, the resulting object of class "histogram" is plotted by
plot.histogram, before it is returned.

Mode
1) User Defined
2) Using table
User Defined
> getmode=function(x)
{ # Create mode function
unique_x=unique(x) # distinct values of x
tabulate_x=tabulate(match(x,unique_x)) # how often each distinct value occurs
unique_x[tabulate_x == max(tabulate_x)] # value(s) with the highest count
}

 Create a vector v3 and find its length and mode
 Using Table
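A sketch of both ways of finding the mode for an illustrative vector v3 (its values are hypothetical). Note that R's built-in mode() returns the storage mode of an object, not the statistical mode, which is why a user-defined function or table() is needed:

```r
v3 <- c(2, 5, 3, 5, 7, 5, 3, 2)     # hypothetical vector
print(length(v3))                  # 8

# Method 1: the user-defined getmode() from above
getmode <- function(x) {
  unique_x <- unique(x)
  tabulate_x <- tabulate(match(x, unique_x))
  unique_x[tabulate_x == max(tabulate_x)]
}
print(getmode(v3))                 # 5

# Method 2: count the values with table() and keep the most frequent one(s)
counts <- table(v3)
mode_tab <- as.numeric(names(counts)[counts == max(counts)])
print(mode_tab)                    # 5
```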
 data()
Description
 Loads specified data sets, or list the available data sets.
> data()
> faithful
> eruptions=faithful$eruptions
>eruptions

 Calculate Mean, Median, Quartile, Range, IQR and Histogram of eruptions.
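For the eruptions data loaded above (from the built-in faithful data set), the requested statistics can be computed with a sketch like this:

```r
eruptions <- faithful$eruptions     # built-in data set, 272 eruption durations
print(mean(eruptions))
print(median(eruptions))
print(quantile(eruptions))          # 0%, 25%, 50%, 75%, 100%
print(range(eruptions))             # minimum and maximum
print(IQR(eruptions))               # 75% quartile minus 25% quartile
hist(eruptions, main = "Old Faithful eruption durations", xlab = "Minutes")
```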

PRACTICAL NO. 04
AIM: Using R import data from Excel/.csv file and perform statistical
functions: mean, median, mode, quartiles, range, interquartile-range and
histogram.
Using .csv
Set path to current working directory
 getwd(),setwd()
Description
 getwd returns an absolute filepath representing the current working directory
of the R process.
 setwd(dir) is used to set the working directory to dir.
 read.csv()
Description
 Reads a file in table format and creates a data frame from it, with cases
corresponding to lines and variables to fields in the file.
 Create a Notepad file “Student” and save it as “Student.csv” in the current working directory.
 Import “Student.csv” into R.
 To get marks
marks=studentData$Marks
marks
 Calculate Mean, Median, Quartile, Range, IQR, Mode and Histogram of marks.
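The Student.csv contents are not shown, so this sketch writes a small hypothetical file with a Marks column to a temporary location, reads it with read.csv(), and computes the statistics. In the practical itself the file would be read from the working directory with read.csv("Student.csv"):

```r
# Hypothetical stand-in for Student.csv, written to a temporary file
csv_path <- tempfile(fileext = ".csv")
writeLines(c("Roll,Marks",
             "S1,72",
             "S2,85",
             "S3,64",
             "S4,85",
             "S5,90"), csv_path)

studentData <- read.csv(csv_path)
marks <- studentData$Marks
print(mean(marks)); print(median(marks)); print(quantile(marks))
print(range(marks)); print(IQR(marks))
counts <- table(marks)
print(names(counts)[counts == max(counts)])   # mode: "85"
hist(marks)
```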
Using Excel

 To import a file from Excel we need to install a few packages.

1) xlsx
 The xlsx package gives programmatic control of Excel files using R.
2) xlsxjars
 The xlsxjars package collects all the external jars required for the xlsx package.
3) rJava
 Low-level interface to Java VM very much like .C/.Call and friends.
 Allows creation of objects, calling methods and accessing fields.
To install the packages:

> install.packages("xlsx")
package ‘xlsx’ successfully unpacked and MD5 sums checked
> install.packages("xlsxjars")
package ‘xlsxjars’ successfully unpacked and MD5 sums checked
> install.packages("rJava")
package ‘rJava’ successfully unpacked and MD5 sums checked
 Create an Employee.xlsx file.
 Import Employee.xlsx using R.

Load the xlsx package.
> library(xlsx)
> setwd("D:/")
Importing .xlsx file
> emp=read.xlsx("Employee.xlsx",sheetIndex=1)
> emp

 To get salary
> salary=emp$salary
> salary
 Calculate Mean, Median, Quartile, Range, IQR, Mode and Histogram of salary.

PRACTICAL NO. 05
AIM: Using R import data from Excel/.csv file and calculate standard
deviation, variance and co-variance.
Q) The length of service (in hours) of the employees of a company is given below. Find the standard deviation and variance of the length of service.
Length of service (grouped)   Mid-value   Frequency
10-15                         12.5        12
15-20                         17.5        127
20-25                         22.5        18
25-30                         27.5        7
30-35                         32.5        65
35-40                         37.5        2
 Create a Company.xlsx file.
 To install the packages:

 Import Company.xlsx using R.

> library(xlsx)
> setwd("D:/")
> empservices=read.xlsx("Company.xlsx",sheetIndex=1)
> empservices
 Replicate Elements
rep()
Description
 rep replicates the values in x. It is a generic function, and the (internal)
default method is described here.
> totalservices=rep(empservices$Length_of_services,empservices$hrs)
> totalservices
 Standard Deviation:
sd()
Description
 This function computes the standard deviation of the values in x.
 If na.rm is TRUE then missing values are removed before computation
proceeds.
 Variance:
var()
Description
 var, cov and cor compute the variance of x and the covariance or
correlation of x and y if these are vectors.
 If x and y are matrices then the covariances (or correlations) between the
columns of x and the columns of y are computed.
Calculate standard deviation, variance and mean of totalservices.

> stddev=sd(totalservices)
> stddev
> var(totalservices)
> mean(totalservices)
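Without Company.xlsx at hand, the grouped data can be typed in directly. This sketch uses the mid-values and frequencies from the table above, with column names copied from the read.xlsx() code; the mid-value of the 30-35 class is 32.5. sd() and var() use the sample (n - 1) denominator:

```r
# Grouped data: class mid-values and their frequencies
empservices <- data.frame(
  Length_of_services = c(12.5, 17.5, 22.5, 27.5, 32.5, 37.5),  # mid-value of each class
  hrs = c(12, 127, 18, 7, 65, 2)                                # frequency of each class
)
# Repeat each mid-value as many times as its frequency
totalservices <- rep(empservices$Length_of_services, empservices$hrs)
print(length(totalservices))   # 231, the total frequency

stddev   <- sd(totalservices)
variance <- var(totalservices)
print(stddev); print(variance); print(mean(totalservices))
```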
 Co-variance:
cov()

 Co-variance is a measure of the joint variability of two random variables.
Description
 var, cov and cor compute the variance of x and the covariance or
correlation of x and y if these are vectors.
 If x and y are matrices then the covariances (or correlations) between the
columns of x and the columns of y are computed.
Q) The marks of six students in English and Mathematics are given below. Find the co-variance between the English and Mathematics marks.
English Marks   Mathematics Marks
68              88
85              89
78              79
69              85
67              77
82              60
 Create a marks.xlsx file.
 Load the xlsx package.

> library(xlsx)

 Set path to current working directory

> setwd("D:/")
 Importing .xlsx file
> marks=read.xlsx("marks.xlsx",sheetIndex=1)
> marks
 Co-variance
> cov(marks$English_Marks,marks$Mathematics_Marks)
 When the marks are changed so that the English and Mathematics marks are almost equal, the co-variance becomes positive.
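The co-variance can also be checked without the Excel file by entering the six pairs of marks from the table above. The sketch compares cov() with the definition, the sum of products of deviations divided by n - 1:

```r
marks <- data.frame(
  English_Marks     = c(68, 85, 78, 69, 67, 82),
  Mathematics_Marks = c(88, 89, 79, 85, 77, 60)
)
cv <- cov(marks$English_Marks, marks$Mathematics_Marks)
print(cv)   # about -23.07: negative, so high English marks go with lower Maths marks here

# The same value from the definition
x <- marks$English_Marks
y <- marks$Mathematics_Marks
manual <- sum((x - mean(x)) * (y - mean(y))) / (length(x) - 1)
print(manual)
```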
PRACTICAL NO. 06
AIM: Using R import data from Excel/.csv file and calculate its skewness and
kurtosis value.

We have to install the moments package to find skewness and kurtosis.

> install.packages("moments")
package ‘moments’ successfully unpacked and MD5 sums checked
Q) The length of service (in hours) of the employees of a company is given below. Find the skewness and kurtosis of the length of service.
Length of service (grouped)   Mid-value   Frequency
10-15                         12.5        12
15-20                         17.5        127
20-25                         22.5        18
25-30                         27.5        7
30-35                         32.5        65
35-40                         37.5        2
 Create a Company.xlsx file.
 Import Company.xlsx using R.

> library(xlsx)
> library(moments)
> setwd("D:/")
> empservices=read.xlsx("Company.xlsx",sheetIndex=1)
> empservices
> totalservices=rep(empservices$Length_of_services,empservices$hrs)
> totalservices
 Skewness
skewness()
Description
 This function computes skewness of given data.
 Kurtosis
kurtosis()
Description
 This function computes the estimator of Pearson's measure of kurtosis.
> skewness(totalservices)
> kurtosis(totalservices)
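If the moments package cannot be installed, the same moment-based measures can be written in base R. This sketch assumes that moments::skewness() and moments::kurtosis() use these formulas (skewness = m3 / m2^(3/2), kurtosis = m4 / m2^2, where mk is the k-th central moment); on that assumption the results should match. The grouped data are the values from the table above:

```r
# Moment-based skewness: m3 / m2^(3/2)
skew_base <- function(x) {
  m <- mean(x); n <- length(x)
  (sum((x - m)^3) / n) / (sum((x - m)^2) / n)^(3/2)
}
# Pearson's kurtosis: m4 / m2^2 (equals 3 for a normal distribution)
kurt_base <- function(x) {
  m <- mean(x); n <- length(x)
  n * sum((x - m)^4) / (sum((x - m)^2))^2
}

totalservices <- rep(c(12.5, 17.5, 22.5, 27.5, 32.5, 37.5), c(12, 127, 18, 7, 65, 2))
print(skew_base(totalservices))
print(kurt_base(totalservices))
print(skew_base(c(1, 2, 3, 4, 5)))   # 0 for symmetric data
print(kurt_base(c(1, 2, 3, 4, 5)))   # 1.7
```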

PRACTICAL NO. 07
AIM: Using R import data from Excel/.csv file and perform hypothesis testing
in R.
Definition:
Statistical Hypothesis
Statistical Hypothesis is either
1) A statement about the value of a population parameter.
2) A statement about the kind of probability distribution that a certain
variable obeys.
Example 1: The mean age of all the university students is 23.4 years.
Example 2: The proportion of university students who are women is 76%.
Example 3: The proportion of books in a public library whose height exceeds 30 cm is less than or equal to 0.13 (p ≤ 0.13).
Types of Hypothesis:
Null Hypothesis (H0)
 It is a statistical hypothesis stating that the observations are due to chance factors alone.
 It is denoted by H0.
Alternative Hypothesis (H1 or Ha)
 It states that the observations are the result of a real effect.
 It is denoted by H1 or Ha.
Types of Errors:
 Type I Error: rejecting H0 when it is actually true.
 Type II Error: failing to reject H0 when it is actually false.
Question 1)

The data set atkins-diet.csv contains the weight loss experienced by dieters using the Atkins diet. The Atkins hypothesis says that people who use the diet lose on average at least 20 pounds in 6 months. Can we reject the claim made by Atkins?

 Import AtkinsDiet.csv into R
> library(xlsx)
> setwd("D:/")
> getwd()

> AtkinsDiet=read.csv("AtkinsDiet.csv")
> AtkinsDiet
 To get the data of Loss.at.6.Months

> LossAt6Months=AtkinsDiet$Loss.at.6.Months
> LossAt6Months
Method 1:
Method 2:
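The original Method 1 and Method 2 are not shown. As an assumed approach, this sketch runs the one-sided test with t.test() and then computes the same t statistic and p-value by hand. The weight-loss values are hypothetical stand-ins for Loss.at.6.Months:

```r
# Hypothetical weight losses (pounds), not the real AtkinsDiet data
LossAt6Months <- c(12.5, 18.0, 22.1, 9.8, 15.3, 25.0, 11.2, 17.6, 14.9, 20.4)

# H0: mean >= 20  vs  H1: mean < 20 (one-sided test)
res <- t.test(LossAt6Months, mu = 20, alternative = "less")
print(res)

# The same t statistic and lower-tail p-value from the definition
n <- length(LossAt6Months)
t_manual <- (mean(LossAt6Months) - 20) / (sd(LossAt6Months) / sqrt(n))
p_manual <- pt(t_manual, df = n - 1)
print(c(t_manual, p_manual))
# Reject H0 at the 5% level if the p-value is below 0.05
```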

Conclusion: The test indicates that the true mean weight loss is less than 20 pounds, so the null hypothesis H0 can be rejected.
Therefore, we can reject the claim made by Atkins.
Question 2)
The built-in R data set iris gives the measurements of sepal length, sepal width, petal length and petal width for 50 flowers from each of three species of iris: setosa, versicolor and virginica.
A) Test the hypothesis that the mean sepal length of the virginica species is 6.15.
Load the iris data in R
> data()
> iris
> virginica=iris[iris$Species=="virginica",]
> virginica

 Test whether the mean sepal length of the virginica species = 6.15
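A sketch of this test with t.test() on the built-in iris data; the two-sided alternative matches the hypothesis mean = 6.15:

```r
virginica <- iris[iris$Species == "virginica", ]

# H0: mean sepal length = 6.15  vs  H1: mean != 6.15 (two-sided)
res <- t.test(virginica$Sepal.Length, mu = 6.15)
print(res)
print(mean(virginica$Sepal.Length))   # sample mean, 6.588
# Reject H0 at the 5% level when res$p.value < 0.05
```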
Conclusion: The test indicates that the true mean is not equal to 6.15, so the null hypothesis can be rejected. Therefore, the mean sepal length of the virginica species is not equal to 6.15.
library(xlsx) # write.xlsx() is provided by the xlsx package
write.xlsx(Mdf,"D:/Mdx.xlsx")

PRACTICAL NO. 08
AIM: To perform Chi-Square test.
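The expected count of each cell is row total x column total / grand total, and the statistic sums (observed - expected)^2 / expected over all cells. This sketch checks that by hand against chisq.test() for the 2 x 2 matrix M used in the "using matrix" method of Example 1. For 2 x 2 tables chisq.test() applies Yates' continuity correction by default, so correct = FALSE is set here for the comparison:

```r
M <- matrix(c(71, 42, 49, 78), nrow = 2)

# Expected counts under independence
expected <- outer(rowSums(M), colSums(M)) / sum(M)
chi_manual <- sum((M - expected)^2 / expected)

res <- chisq.test(M, correct = FALSE)   # no continuity correction
print(expected)
print(c(manual = chi_manual, chisq.test = unname(res$statistic)))
print(res$p.value)
print(chisq.test(M)$statistic)          # default, with Yates' correction: smaller
```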

 Import clinical.csv into R
> library(xlsx)
> setwd("D:/")
> getwd()
Example 1
 Create a clinical.xlsx file.
Method 1: import data using csv file

Code:
setwd("D:/")
cdata=read.csv("clinical.csv")
cdata
chidata=data.frame(cdata$p,cdata$q) # the two count columns p and q
chidata
chtest=chisq.test(chidata)
chtest
Output:

Method 2: using matrix

Code:
M=matrix(c(71,42,49,78),nrow=2)
M
Mdata=data.frame(M)
Mdata
Mchi=chisq.test(Mdata)
Mchi
Output:
Example 2:
Check the association between AirBags and Type in the Cars93 data set (from the MASS package) of R
Code:
library("MASS")
str(Cars93)
car.data=data.frame(Cars93$AirBags,Cars93$Type)
car.data
car.data=table(Cars93$AirBags,Cars93$Type) # contingency table of counts
Carschi=chisq.test(car.data)
Carschi
Output:

PRACTICAL NO. 09
AIM: To perform Linear Regression.

 Import regression.csv into R
> library(xlsx)
> setwd("D:/")
> getwd()
Example 1
x: 1, 2, 3, 4, 5, 6
y: 25, 65, 75, 85, 62, 105
Perform Linear Regression in R
Method 1: import data using csv file

<regression.csv file includes data as given in the question>
 Create a regression.xlsx file.
Code:
setwd("D:/")
getwd()
rdata=read.csv("regression.csv")
rdata
x=rdata$year
y=rdata$sales
reg=lm(y~x) # fit the least-squares line y = a + b*x
plot(x,y)
plot(x,y,col="blue",pch=16,cex=2.2,xlab="Year",ylab="Sales",main="Performance")
abline(reg,col="red") # draw the fitted line on the current plot
p=data.frame(x=7)
est=predict(reg,p) # predicted sales for x = 7
est
Output:

Method 2: using matrix
Code:
setwd("D:/")
getwd()
M=matrix(c(1,2,3,4,5,6,25,65,75,85,62,105),ncol=2) # column 1 = x, column 2 = y
M
Mdata=data.frame(M)
Mdata
x=Mdata$X1
y=Mdata$X2
reg=lm(y~x)
reg
plot(x,y)
abline(reg) # add the regression line to the plot
p=data.frame(x=7)
est=predict(reg,p)
est
Output:

plot(x,y,col="green")
abline(reg)
plot(x,y,col="green",pch=16)
abline(reg)
plot(x,y,col="green",pch=16)
abline(reg,col="red")
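The fitted line can be checked against the least-squares formulas, slope b = Sxy / Sxx and intercept a = mean(y) - b * mean(x). A sketch with the x and y values from Example 1:

```r
x <- c(1, 2, 3, 4, 5, 6)
y <- c(25, 65, 75, 85, 62, 105)
reg <- lm(y ~ x)

# Least-squares slope and intercept from their formulas
b <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)   # 200.5 / 17.5
a <- mean(y) - b * mean(x)
print(c(a, b))        # 29.4 and about 11.457
print(coef(reg))      # the same values from lm()

est <- predict(reg, data.frame(x = 7))
print(est)            # 109.6 = 29.4 + 11.457... * 7
```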

PRACTICAL NO. 10
AIM: To perform Binomial and Normal Distribution.
Binomial Distribution:
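The binomial distribution model gives the probability of getting exactly x successes in n independent trials of a random experiment, where every trial has the same probability of success.
Example 1:
To find the probability at each point (R calls dbinom() the density function)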
R built-in function used:
dbinom(x,size,prob)
where x is a vector of numbers
size is the number of trials
prob is the probability of success of each trial
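A sketch confirming that dbinom() matches the binomial formula C(n, x) p^x (1 - p)^(n - x) for the same 50 trials with p = 0.5:

```r
x <- 0:50
y <- dbinom(x, size = 50, prob = 0.5)

# Same probabilities from the binomial formula
y_formula <- choose(50, x) * 0.5^x * (1 - 0.5)^(50 - x)
print(all.equal(y, y_formula))   # TRUE
print(sum(y))                    # probabilities over 0..50 add up to 1
print(x[which.max(y)])           # most likely number of successes: 25
```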
Code:
x=seq(0,50,by=1);
x;
y=dbinom(x,50,0.5);
y;
plot(x,y);
Output:

Example 2:
To find the cumulative probability of an event
(for example, the probability of getting 26 or fewer heads in 51 tosses of a coin)
R built-in function used:
pbinom(x,size,prob)
where x is the number of successes (a vector of quantiles)
size is the number of trials
prob is the probability of success of each trial
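A sketch showing that pbinom() is the running sum of dbinom() values, using the same 51 coin tosses. Because a fair coin tossed 51 times gives a distribution symmetric about 25.5, P(X <= 25) is exactly 0.5:

```r
p26 <- pbinom(26, size = 51, prob = 0.5)

# Cumulative probability = sum of the probabilities of 0, 1, ..., 26 heads
p26_sum <- sum(dbinom(0:26, size = 51, prob = 0.5))
print(c(p26, p26_sum))

# Symmetry: P(X <= 25) = 0.5, so P(X <= 26) = 0.5 + P(X = 26)
print(pbinom(25, 51, 0.5))   # 0.5
```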
Code:
x=pbinom(26,51,0.5);
x;
Output:
Normal Distribution:
The normal distribution curve is a bell-shaped curve, symmetric about its mean; the histogram of a sufficiently large sample from a normal population approaches this shape.
Example 3:
To find the height of the probability density curve at each point for a given mean and sd
R built-in function used:
dnorm(x,mean,sd)
where x is a vector of numbers
mean is the mean of the distribution
sd is the standard deviation of the distribution
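A sketch checking dnorm() against the normal density formula 1 / (sd * sqrt(2 * pi)) * exp(-(x - mean)^2 / (2 * sd^2)) for the same mean and sd, and using pnorm() for a cumulative probability:

```r
x <- seq(-10, 10, by = 0.1)
y <- dnorm(x, mean = 2.5, sd = 0.5)

# Same heights from the normal density formula
y_formula <- 1 / (0.5 * sqrt(2 * pi)) * exp(-(x - 2.5)^2 / (2 * 0.5^2))
print(all.equal(y, y_formula))          # TRUE
print(x[which.max(y)])                  # the peak is at the mean, 2.5
print(pnorm(3, mean = 2.5, sd = 0.5))   # P(X <= 3) = P(Z <= 1), about 0.841
```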
Code:
x=seq(-10,10,by=0.1);
x;
y=dnorm(x,mean=2.5,sd=0.5);
y;
plot(x,y);
Output:
