INTRODUCTION
432
SYBSC.IT
R PROGRAMMING
R Data types
In R, variables are not declared with a data type.
A variable is assigned an R object, and the data type of that object becomes
the data type of the variable.
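A minimal sketch of this behaviour: the same variable name can hold objects of different types, and class() and typeof() report the type of whichever object is currently assigned. The variable name `v` is only illustrative.

```r
# No type declaration: the type comes from the object that is assigned
v = 42            # a double (numeric) object
print(class(v))   # "numeric"
v = "hello"       # the same name now refers to a character object
print(class(v))   # "character"
v = TRUE          # and now to a logical object
print(typeof(v))  # "logical"
```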
PRACTICAL NO. 01
AIM: Using R, execute the basic commands, arrays, lists, data frames and matrices.
The simplest R objects are vectors, and there are six data types of these atomic
vectors.
Other R objects are built upon these atomic vectors.
Types of R Vectors
Logical
Numeric
Integer
Complex
Character
Raw
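A short sketch creating one vector of each of the six atomic types and printing its class. The values are arbitrary examples.

```r
# One example vector for each of the six atomic types
lgl   = c(TRUE, FALSE)        # logical
num   = c(12.5, 3)            # numeric (stored as double)
int   = c(2L, 7L)             # integer: the L suffix makes an integer literal
cpx   = c(3 + 2i)             # complex
chr   = c("apple", "mango")   # character
raw_v = as.raw(c(1, 255))     # raw bytes, printed in hexadecimal (01 ff)
for (v in list(lgl, num, int, cpx, chr, raw_v)) {
  print(class(v))
}
```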
Class
class() gives the class of an object; for a plain atomic vector it reflects the data type. R possesses a simple generic function
mechanism which can be used for an object-oriented style of programming.
Method dispatch takes place based on the class of the first argument to the
generic function.
Six data-types of atomic vectors are:
1) logical: Create or test for objects of type "logical", and the basic logical
constants.
charToRaw()
Conversion and manipulation of objects of type "raw".
6) raw: Creates or tests for objects of type "raw".
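A small sketch of the raw type using charToRaw(), which turns a string into its bytes, and rawToChar(), which reverses the conversion.

```r
# charToRaw() converts a character string into its raw bytes
r = charToRaw("Hi")
print(r)             # 48 69: hexadecimal byte values of "H" and "i"
print(typeof(r))     # "raw"
print(is.raw(r))     # TRUE
print(rawToChar(r))  # "Hi": converts the bytes back to a string
```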
Sequence Generation
seq()
Generate regular sequences.
seq is a standard generic with a default method.
seq.int is a primitive which can be much faster but has a few restrictions.
Computer Oriented Statistical
Techniques
Name. Kishan Kumar Goud Roll no. 432
SYBSC.IT
seq_along and seq_len are very fast primitives for two common cases.
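A few calls illustrating the sequence functions described above: seq() with a step (by) or a fixed number of points (length.out), and the fast primitives seq_len() and seq_along().

```r
print(seq(1, 10, by = 2))           # 1 3 5 7 9
print(seq(0, 1, length.out = 5))    # 0.00 0.25 0.50 0.75 1.00
print(seq_len(4))                   # 1 2 3 4
print(seq_along(c("a", "b", "c")))  # 1 2 3: one index per element
```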
Combine Values
c()
This is a generic function which combines its arguments.
The default method combines its arguments to form a vector.
All arguments are coerced to a common type which is the type of the returned
value, and all attributes except names are removed.
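A sketch of the coercion rule: when c() receives mixed types, every element is converted to the most flexible type present (logical, then integer, then numeric, then character). Names are the one attribute that survives.

```r
x = c(1, 2.5, 3)           # all numeric: result is numeric
y = c(1, "a", TRUE)        # mixed: everything is coerced to character
z = c(TRUE, 5L)            # logical + integer: coerced to integer (TRUE becomes 1)
print(class(x)); print(y); print(class(z))
n = c(a = 1, b = 2)        # names are kept by c()
print(names(c(n, c = 3)))  # "a" "b" "c"
```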
To create a sequence
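Colon operator (:)
It is used to create a regular sequence with a step of 1 (or -1 when counting down).
Logical indexing (keeps the elements at the TRUE positions):
weekdays[c(FALSE,TRUE,FALSE,FALSE,FALSE,TRUE,TRUE)]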
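A sketch of the colon operator and of the logical indexing shown above. The contents of `weekdays` are not given in the source, so the day names below are an assumption made for illustration.

```r
print(1:7)   # 1 2 3 4 5 6 7
print(5:1)   # 5 4 3 2 1: counts down when the start is larger
# Hypothetical contents for the weekdays vector
weekdays = c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat")
# A logical index keeps only the elements at the TRUE positions
print(weekdays[c(FALSE, TRUE, FALSE, FALSE, FALSE, TRUE, TRUE)])  # "Mon" "Fri" "Sat"
```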
Sorting
sort()
Description
Sort (or order) a vector or factor (partially) into ascending or descending
order.
For ordering along more than one variable, e.g., for sorting data frames,
see order().
Sorting Numbers
In Ascending Order
In Descending Order
Sorting characters
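A sketch of the three sorting cases listed above, plus order(), which returns the positions that would sort a vector. The marks and fruit names are arbitrary examples.

```r
marks = c(45, 12, 78, 33)
print(sort(marks))                     # 12 33 45 78 (ascending)
print(sort(marks, decreasing = TRUE))  # 78 45 33 12 (descending)
fruits = c("mango", "apple", "cherry")
print(sort(fruits))                    # "apple" "cherry" "mango"
print(order(marks))                    # 2 4 1 3: positions that would sort marks
```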
R-List
A list is an R object that can contain elements of different types, such as numbers,
strings, vectors and another list inside it.
A list can also contain a matrix or a function as its element.
A list is created using the list() function.
List
Description
Functions to construct, coerce and check for both kinds of R lists.
Matrix
Description
matrix creates a matrix from the given set of values.
as.matrix attempts to turn its argument into a matrix.
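A sketch of both functions: matrix() fills values column by column unless byrow = TRUE is given, and as.matrix() turns a plain vector into a one-column matrix.

```r
m = matrix(c(2, 4, 5, 6), nrow = 2)   # filled column by column
print(m)
#      [,1] [,2]
# [1,]    2    5
# [2,]    4    6
mr = matrix(c(2, 4, 5, 6), nrow = 2, byrow = TRUE)  # filled row by row instead
print(mr[1, ])                        # 2 4
v = as.matrix(1:3)                    # a vector becomes a one-column matrix
print(dim(v))                         # 3 1
```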
list1=list(c("jan","feb","march"),matrix(c(2,4,5,6),nrow=2),list("green",15.5))
Unlist a list
Unlist l3:
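The list `l3` is not shown in the source, so this sketch accesses and unlists `list1` from the line above instead. unlist() flattens every element (including the matrix and the nested list) into one vector; because a string is present, all values are coerced to character.

```r
list1 = list(c("jan", "feb", "march"), matrix(c(2, 4, 5, 6), nrow = 2), list("green", 15.5))
print(list1[[1]])       # first element: the character vector
print(list1[[3]][[1]])  # "green": an element inside the nested list
u = unlist(list1)       # flatten every element into one vector
print(u)                # "jan" "feb" "march" "2" "4" "5" "6" "green" "15.5"
```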
Arrays
Arrays are R data objects that can store data in more than two
dimensions.
An array is created using the array() function.
It takes a vector as input and uses the values in the dim parameter to create an array,
where the dim parameter specifies the dimensions of the array.
An array can store data of only one type.
Syntax: array(data, dim = c(no. of rows, no. of columns, no. of 2D
matrices), dimnames).
The dimnames argument must be a list.
Array
Description
Creates or tests for arrays.
Arguments
data: a vector (including a list or expression vector) giving data to fill the array.
Non-atomic classed objects are coerced by as.vector.
dim: the dim attribute for the array to be created, that is an integer vector of
length one or more giving the maximal indices in each dimension.
Usage
array(data = NA, dim = length(data), dimnames = NULL)
Creating an array
> data=c("north","south","east","north","south","east","east","east","north","south","north","south","north")
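A sketch of array() with the dim and dimnames arguments described above. The dimension names R1, C1, M1 and so on are made up for illustration.

```r
# 2 rows x 3 columns x 2 matrices, with hypothetical dimension names
arr = array(1:12, dim = c(2, 3, 2),
            dimnames = list(c("R1", "R2"), c("C1", "C2", "C3"), c("M1", "M2")))
print(arr)
print(arr["R2", "C3", "M1"])  # 6: values fill each column top to bottom, then the next column, then the next matrix
print(arr[, , "M2"])          # the second 2 x 3 matrix
print(dim(arr))               # 2 3 2
```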
Data Frames
A data frame is a table, or two-dimensional array-like structure, in which each column contains
the values of one variable and each row contains one set of values from each column.
Characteristics of data frames are:
Column names should not be empty.
Row names should be unique.
The data stored in a data frame can be of numeric, factor or character
type.
Each column should contain the same number of data items.
A data frame is created with the data.frame() function.
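A sketch of data.frame() with three columns of different types. The student names and marks are hypothetical.

```r
# Hypothetical student data: each column is one variable
students = data.frame(
  name  = c("Asha", "Ravi", "Meena"),
  marks = c(78, 65, 90),
  pass  = c(TRUE, TRUE, TRUE)
)
print(students)
print(nrow(students)); print(ncol(students))  # 3 rows, 3 columns
print(students$marks)                         # one column, as a vector
print(students[2, ])                          # one row: all values for Ravi
```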
str()
Description
Compactly display the internal structure of an R object, a diagnostic
function and an alternative to summary (and to some extent, dput).
Ideally, only one line for each ‘basic’ structure is displayed.
It is especially well suited to compactly display the (abbreviated) contents
of (possibly nested) lists.
The idea is to give reasonable output for any R object.
It calls args for (non-primitive) function objects.
summary()
Description
Summary is a generic function used to produce result summaries of the
results of various model fitting functions.
The function invokes particular methods which depend on the class of the
first argument.
rbind()
Description
Take a sequence of vector, matrix or data-frame arguments and combine
by columns or rows, respectively. These are generic functions with
methods for other R classes.
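A sketch of str(), summary() and combining data frames: rbind() appends rows and cbind() appends columns. The small data frame is hypothetical.

```r
df = data.frame(x = c(1, 2), y = c("a", "b"))  # hypothetical data
str(df)                                        # one line per column: type and first values
print(summary(df$x))                           # Min, 1st Qu., Median, Mean, 3rd Qu., Max
df2 = rbind(df, data.frame(x = 3, y = "c"))    # add a row
df3 = cbind(df2, z = c(TRUE, FALSE, TRUE))     # add a column
print(df3)
print(dim(df3))                                # 3 3
```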
PRACTICAL NO. 02
AIM: Create a matrix using R and perform the operations addition,
multiplication, inverse and transpose.
Creating a matrix
Matrix addition
Matrix subtraction
Matrix Multiplication
x %*% y
Description
Multiplies two matrices, if they are conformable.
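A sketch of the matrix operations named in the aim, using two small example matrices. In the comments, [a b; c d] lists the rows separated by semicolons. Note that `*` multiplies element by element, while `%*%` is true matrix multiplication.

```r
m1 = matrix(c(1, 2, 3, 4), nrow = 2)  # [1 3; 2 4]
m2 = matrix(c(5, 6, 7, 8), nrow = 2)  # [5 7; 6 8]
print(m1 + m2)    # element-wise addition:    [6 10; 8 12]
print(m1 - m2)    # element-wise subtraction: [-4 -4; -4 -4]
print(m1 * m2)    # element-wise product:     [5 21; 12 32]
print(m1 %*% m2)  # matrix multiplication:    [23 31; 34 46]
```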
Matrix Inverse
det
Description
det calculates the determinant of a matrix. determinant is a generic
function that returns separately the modulus of the determinant,
optionally on the logarithm scale, and the sign of the determinant.
Inverse Matrix m1
Inverse Matrix m2
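The section lists det() but not the inverse function itself. In base R, solve(a) called with a single matrix returns its inverse, which exists only when the determinant is non-zero. A minimal sketch with an example matrix:

```r
m1 = matrix(c(1, 2, 3, 4), nrow = 2)
d = det(m1)              # 1*4 - 3*2 = -2
print(d)
if (abs(d) > 1e-12) {    # a matrix is invertible only if its determinant is non-zero
  inv = solve(m1)        # solve(a) with no b returns the inverse of a
  print(inv)             # [-2 1.5; 1 -0.5]
  print(m1 %*% inv)      # identity matrix (up to rounding)
}
```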
Matrix Transpose
Description
Given a matrix or data.frame x, t returns the transpose of x.
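A sketch of t(): the rows of a matrix become its columns, so a 2 x 3 matrix becomes 3 x 2.

```r
a = matrix(1:6, nrow = 2)  # 2 x 3
ta = t(a)                  # 3 x 2: rows become columns
print(a); print(ta)
print(dim(ta))             # 3 2
```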
PRACTICAL NO. 03
AIM: Execute the statistical functions mean, median, mode, quartiles, range,
interquartile-range and histogram using R.
Mean
mean()
Description
Generic function for the (trimmed) arithmetic mean.
na.rm
a logical value indicating whether NA values should be stripped before the
computation proceeds.
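A sketch of the na.rm argument described above and of the trim argument of the trimmed mean. The vectors are arbitrary examples.

```r
x = c(2, 4, NA, 6)
print(mean(x))                # NA: a missing value makes the result missing
print(mean(x, na.rm = TRUE))  # 4: NA is stripped first
y = c(1, 2, 3, 100)
print(mean(y))                # 26.5, pulled up by the outlier
print(mean(y, trim = 0.25))   # 2.5: drops 25% of the values from each end
```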
Median
median()
Description
Compute the sample median.
Quartile
quantile()
Description
The generic function quantile produces sample quantiles corresponding to
the given probabilities.
The smallest observation corresponds to a probability of 0 and the largest to
a probability of 1.
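A sketch of median(), quantile() and IQR(), the interquartile range named in the aim. quantile() uses its default method (type 7) here.

```r
x = 1:10
print(median(x))  # 5.5: average of the two middle values
q = quantile(x)   # 0%, 25%, 50%, 75%, 100% by default
print(q)          # 1.00 3.25 5.50 7.75 10.00
print(IQR(x))     # 4.5 = 75% quantile - 25% quantile
```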
Range
range()
Description
range returns a vector containing the minimum and maximum of all the
given arguments.
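A sketch of range(): it returns the minimum and maximum, and diff() turns that pair into the range as a single number.

```r
x = c(7, 2, 9, 4)
r = range(x)
print(r)        # 2 9: minimum and maximum
print(diff(r))  # 7: range as a single number (max - min)
print(range(c(3, NA, 8), na.rm = TRUE))  # 3 8
```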
Histogram
hist()
Description
The generic function hist computes a histogram of the given data values.
If plot = TRUE, the resulting object of class "histogram" is plotted by
plot.histogram, before it is returned.
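A sketch of hist() with plot = FALSE, which returns the "histogram" object without drawing it, so the bin counts can be inspected. By default each bin includes its right end point.

```r
x = c(1, 2, 2, 3, 3, 3)
h = hist(x, breaks = c(0, 1, 2, 3), plot = FALSE)  # compute without drawing
print(h$counts)  # 1 2 3: counts for [0,1], (1,2], (2,3]
print(class(h))  # "histogram"
# hist(x, breaks = c(0, 1, 2, 3))  # draws the same histogram
```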
Mode
1) User Defined
2) Using table
User Defined
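> getmode=function(x)
{ # Create mode function: returns every value that occurs most often
unique_x=unique(x)                        # distinct values
tabulate_x=tabulate(match(x,unique_x))    # how often each distinct value occurs
unique_x[tabulate_x == max(tabulate_x)]   # value(s) with the highest count
}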
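A usage sketch for getmode(): match() maps every value to its position in unique_x, tabulate() counts those positions, and every value that reaches the maximum count is returned, so a tie gives more than one mode.

```r
getmode = function(x) {
  unique_x = unique(x)
  tabulate_x = tabulate(match(x, unique_x))
  unique_x[tabulate_x == max(tabulate_x)]
}
print(getmode(c(1, 2, 2, 3, 3, 3)))  # 3
print(getmode(c(4, 4, 7, 7, 9)))     # 4 7: both values occur twice
print(getmode(c("a", "b", "b")))     # "b": also works for characters
```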
Using Table
data()
Description
Loads specified data sets, or list the available data sets.
> data()
> faithful
> eruptions=faithful$eruptions
> eruptions
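The table-based steps are not shown in the source. A minimal sketch of one way to get the mode with table(): count each distinct value, then keep the name(s) with the highest count and convert them back to numbers.

```r
eruptions = faithful$eruptions  # built-in datasets package
freq = table(eruptions)         # frequency of each distinct value
mode_values = as.numeric(names(freq)[freq == max(freq)])
print(mode_values)              # most frequent eruption duration(s)
print(max(freq))                # how many times it occurs
```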
PRACTICAL NO. 04
AIM: Using R import data from Excel/.csv file and perform statistical
functions: mean, median, mode, quartiles, range, interquartile-range and
histogram.
Using .csv
Set path to current working directory
getwd(),setwd()
Description
getwd returns an absolute filepath representing the current working directory
of the R process.
setwd(dir) is used to set the working directory to dir.
read.csv()
Description
Reads a file in table format and creates a data frame from it, with cases
corresponding to lines and variables to fields in the file.
Import “Student.csv” in R.
To get marks
marks=studentData$Marks
marks
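A self-contained sketch of the .csv workflow. Student.csv is not reproduced in the source, so this writes hypothetical Name and Marks columns to a temporary file, reads it back with read.csv() and applies the statistical functions.

```r
# Hypothetical Student data written to a temporary file so the sketch runs anywhere
csv_path = tempfile(fileext = ".csv")
write.csv(data.frame(Name = c("A", "B", "C", "D", "E"),
                     Marks = c(45, 67, 89, 72, 67)),
          csv_path, row.names = FALSE)
studentData = read.csv(csv_path)  # one row per student, one column per field
marks = studentData$Marks
print(mean(marks))    # 68
print(median(marks))  # 67
print(range(marks))   # 45 89
print(IQR(marks))     # 5
```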
Using Excel
> install.packages("xlsxjars")
package ‘xlsxjars’ successfully unpacked and MD5 sums checked
> install.packages("rJava")
package ‘rJava’ successfully unpacked and MD5 sums checked
To get salary
> salary=emp$salary
> salary
PRACTICAL NO. 05
AIM: Using R import data from Excel/.csv file and calculate standard
deviation, variance and co-variance.
> install.packages("xlsxjars")
package ‘xlsxjars’ successfully unpacked and MD5 sums checked
> install.packages("rJava")
package ‘rJava’ successfully unpacked and MD5 sums checked
> empservices=read.xlsx("Company.xlsx",sheetIndex=1)
> empservices
Replicate Elements
rep()
Description
rep replicates the values in x. It is a generic function, and the (internal)
default method is described here.
> totalservices=rep(empservices$Length_of_services,empservices$hrs)
> totalservices
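A sketch of rep(). When times is a vector, each value is repeated its own number of times; this is what expands a frequency table into individual observations, as the totalservices line does. The values and frequencies below are hypothetical.

```r
print(rep(5, times = 3))            # 5 5 5
print(rep(c(1, 2), times = 2))      # 1 2 1 2
print(rep(c(1, 2), each = 2))       # 1 1 2 2
values = c(10, 20, 30)              # hypothetical values
freq   = c(2, 1, 3)                 # hypothetical frequencies
print(rep(values, freq))            # 10 10 20 30 30 30
```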
Standard Deviation:
sd()
Description
This function computes the standard deviation of the values in x.
If na.rm is TRUE then missing values are removed before computation
proceeds.
Variance:
var()
Description
var, cov and cor compute the variance of x and the covariance or
correlation of x and y if these are vectors.
If x and y are matrices then the covariances (or correlations) between the
columns of x and the columns of y are computed.
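A sketch showing that var() and sd() use the sample formulas with an n - 1 denominator. The population standard deviation (denominator n) is computed by hand for comparison.

```r
x = c(2, 4, 4, 4, 5, 5, 7, 9)
print(var(x))  # 4.571429 = 32 / 7: sum of squared deviations / (n - 1)
print(sd(x))   # 2.13809 = sqrt(var(x))
# Population version: divide by n instead of n - 1
n = length(x)
print(sqrt(sum((x - mean(x))^2) / n))  # 2
```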
Co-variance:
cov()
Q) The marks of six students in English and Mathematics are given below. Find the
covariance between the English and Mathematics marks.
English Marks    Mathematics Marks
68               88
85               89
78               79
69               85
67               77
82               60
> marks=read.xlsx("marks.xlsx",sheetIndex=1)
> marks
Co-variance
> cov(marks$English_Marks,marks$Mathematics_Marks)
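The same covariance can be computed directly from the marks in the table above, without the Excel file, and checked against the definition (sum of products of deviations divided by n - 1).

```r
# The marks from the table above
english = c(68, 85, 78, 69, 67, 82)
maths   = c(88, 89, 79, 85, 77, 60)
cv = cov(english, maths)
print(cv)  # about -23.07: the marks tend to move in opposite directions
n = length(english)
print(sum((english - mean(english)) * (maths - mean(maths))) / (n - 1))  # same value
print(cor(english, maths))  # scale-free version of the same relationship, between -1 and 1
```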
When the marks are changed so that each student's English and Mathematics marks are almost
equal, the two sets of marks rise and fall together and the covariance becomes positive.
PRACTICAL NO. 06
AIM: Using R import data from Excel/.csv file and calculate its skewness and
kurtosis values.
> install.packages("xlsxjars")
package ‘xlsxjars’ successfully unpacked and MD5 sums checked
> install.packages("rJava")
package ‘rJava’ successfully unpacked and MD5 sums checked
> empservices=read.xlsx("Company.xlsx",sheetIndex=1)
> empservices
> totalservices=rep(empservices$Length_of_services,empservices$hrs)
> totalservices
Skewness
skewness()
Description
This function computes skewness of given data.
Kurtosis
kurtosis()
Description
This function computes the estimator of Pearson's measure of kurtosis.
> skewness(totalservices)
> kurtosis(totalservices)
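The source does not show which package provides skewness() and kurtosis(); the descriptions match the moments package, which is an assumption here. A base-R sketch of the moment-based formulas that package is assumed to use, so the values can be checked without it:

```r
# Moment-based definitions (assumed to match moments::skewness and moments::kurtosis)
skew_manual = function(x) {
  d = x - mean(x)
  mean(d^3) / mean(d^2)^(3/2)
}
kurt_manual = function(x) {
  d = x - mean(x)
  mean(d^4) / mean(d^2)^2  # Pearson's kurtosis: about 3 for normal data
}
x = c(1, 2, 3, 4, 5)
print(skew_manual(x))                  # 0: symmetric data
print(kurt_manual(x))                  # 1.7: flatter than a normal curve
print(skew_manual(c(1, 1, 1, 2, 10)))  # positive: long right tail
```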
PRACTICAL NO. 07
AIM: Using R import data from Excel/.csv file and perform hypothesis testing
in R.
Definition:
Statistical Hypothesis
Statistical Hypothesis is either
1) A statement about the value of a population parameter.
2) A statement about the kind of probability distribution that a certain
variable obeys.
Example 1: The mean age of all university students is 23.4 years.
Example 2: The proportion of university students who are women is 76%.
Example 3: The proportion of books in a public library whose height exceeds
30 cm is at most 0.13 (p ≤ 0.13).
Types of Hypothesis:
Types of Errors:
Type I Error
Type II Error
Question 1)
The data set atkins-diet.csv contains the weight loss experienced by dieters using the
Atkins diet. Test the Atkins hypothesis, which says that people who use the method
lose on average at least 20 pounds in 6 months. Can we reject the claim of the
Atkins diet?
> install.packages("xlsxjars")
package ‘xlsxjars’ successfully unpacked and MD5 sums checked
> install.packages("rJava")
package ‘rJava’ successfully unpacked and MD5 sums checked
Import AtkinsDiet.csv to R
Load the xlsx package.
> library(xlsx)
Set path to current working directory
> setwd("D:/")
> getwd()
Method 1:
Method 2:
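Method 1 and Method 2 are not reproduced in the source. A minimal sketch of one common approach, assuming a one-sample t-test is appropriate: the claim "at least 20 pounds" becomes H0: mean >= 20 against H1: mean < 20, so alternative = "less". The values in `loss` are hypothetical placeholders, not the AtkinsDiet.csv data.

```r
# Hypothetical weight-loss values (pounds in 6 months)
loss = c(18, 22, 15, 19, 24, 17, 16, 21, 14, 20)
# H0: mean loss >= 20 (the Atkins claim)   H1: mean loss < 20
res = t.test(loss, mu = 20, alternative = "less")
print(res)
# The t statistic by hand: (sample mean - 20) / (s / sqrt(n))
t_manual = (mean(loss) - 20) / (sd(loss) / sqrt(length(loss)))
print(t_manual)
if (res$p.value < 0.05) {
  cat("Reject H0: the data contradict the claim of at least 20 pounds\n")
} else {
  cat("Do not reject H0 at the 5% level\n")
}
```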
Question 2)
The iris data set, which is built into R, gives the measurements of sepal length, sepal
width, petal length and petal width for 50 flowers from each of three species of iris:
setosa, versicolor and virginica.
A) Test the hypothesis that the mean sepal length of the virginica species is 6.15.
Load data of iris in R
> data()
> iris
> virginica=iris[iris$Species=="virginica",]
> virginica
Conclusion: The p-value of the t-test is less than 0.05, so the null hypothesis (true mean = 6.15)
is rejected. Therefore, the mean sepal length of the virginica species is not equal to 6.15.
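A minimal sketch of the two-sided one-sample t-test behind this conclusion, using the built-in iris data:

```r
virginica = iris[iris$Species == "virginica", ]
res = t.test(virginica$Sepal.Length, mu = 6.15)  # two-sided: H1 is mean != 6.15
print(res)
print(res$estimate)         # sample mean, 6.588
print(res$p.value < 0.05)   # TRUE, so H0 is rejected at the 5% level
```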
library(xlsx)   # write.xlsx() is provided by the xlsx package, not WriteXLS
write.xlsx(Mdf,"D:/Mdx.xlsx")
PRACTICAL NO. 08
AIM: To perform Chi-Square test.
> install.packages("xlsxjars")
package ‘xlsxjars’ successfully unpacked and MD5 sums checked
> install.packages("rJava")
package ‘rJava’ successfully unpacked and MD5 sums checked
Import clinical.csv to R
Load the xlsx package.
> library(xlsx)
Set path to current working directory
> setwd("D:/")
> getwd()
Example 1
Code:
M=matrix(c(71,42,49,78),nrow=2)
M
Mdata=data.frame(M)
Mdata
Mchi=chisq.test(Mdata)
Mchi
Output:
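How chisq.test() arrives at its statistic can be checked by hand for the Example 1 table: each expected count is row total x column total / grand total. For a 2 x 2 table chisq.test() applies Yates' continuity correction by default (correct = TRUE), so the sketch turns it off to match the plain Pearson formula.

```r
M = matrix(c(71, 42, 49, 78), nrow = 2)  # the table from Example 1
res = chisq.test(M, correct = FALSE)     # Pearson statistic without Yates' correction
# Expected count for each cell = row total * column total / grand total
expected = outer(rowSums(M), colSums(M)) / sum(M)
print(expected)                          # 56.5 63.5 / 56.5 63.5
stat = sum((M - expected)^2 / expected)
print(stat)                              # about 14.06
print(res$statistic)                     # same value
print(chisq.test(M)$statistic)           # default for 2 x 2: with Yates' correction (about 13.11)
```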
Example 2:
Check the association between AirBags and Type in the Cars93 data set (from the MASS package).
Code:
library(MASS)
str(Cars93)
car.data=data.frame(Cars93$AirBags,Cars93$Type)
car.data
car.data=table(Cars93$AirBags,Cars93$Type)
Carschi=chisq.test(car.data)
Carschi
Output:
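For this table chisq.test() warns "Chi-squared approximation may be incorrect" because several expected counts are below 5. A sketch that inspects the expected counts and, as one option, computes a Monte Carlo p-value with simulate.p.value = TRUE instead of relying on the approximation:

```r
library(MASS)  # Cars93 ships with the MASS package
car.data = table(Cars93$AirBags, Cars93$Type)
res = suppressWarnings(chisq.test(car.data))
print(res$expected)            # several expected counts are below 5
print(sum(res$expected < 5))
# A simulated p-value avoids relying on the chi-square approximation
set.seed(1)
print(chisq.test(car.data, simulate.p.value = TRUE, B = 2000)$p.value)
```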
PRACTICAL NO. 09
AIM: To perform Linear Regression.
> install.packages("xlsxjars")
package ‘xlsxjars’ successfully unpacked and MD5 sums checked
> install.packages("rJava")
package ‘rJava’ successfully unpacked and MD5 sums checked
Import regression.csv to R
Load the xlsx package.
> library(xlsx)
Set path to current working directory
> setwd("D:/")
> getwd()
Example 1
x: 1, 2, 3, 4, 5, 6
y: 25, 65, 75, 85, 62, 105
Perform Linear Regression in R
Code:
setwd("D:/")
getwd()
rdata=read.csv("regression.csv")
rdata
x=rdata$year
y=rdata$sales
reg=lm(y~x)                          # fit sales = intercept + slope * year
plot(x,y)
plot(x,y,col="blue",pch=16,cex=2.2,xlab="Year",ylab="Sales",main="Performance")
abline(reg,col="red")                # add the fitted line to the current plot
p=data.frame(x=7)
est=predict(reg,p)                   # predicted sales for year 7
est
Output:
Code:
setwd("D:/")
getwd()
M=matrix(c(1,2,3,4,5,6,25,65,75,85,62,105),ncol=2)   # column 1: x, column 2: y
M
Mdata=data.frame(M)                                    # columns are named X1 and X2
Mdata
x=Mdata$X1
y=Mdata$X2
reg=lm(y~x)
reg
plot(x,y)
abline(reg)                                            # draw the regression line on the plot
p=data.frame(x=7)
est=predict(reg,p)
est
Output:
plot(x,y); abline(reg)
plot(x,y,col="green"); abline(reg)
plot(x,y,col="green",pch=16); abline(reg)
plot(x,y,col="green",pch=16); abline(reg,col="red")
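The least-squares line that lm() fits to the Example 1 data (x: 1 to 6, y: 25, 65, 75, 85, 62, 105) can be checked by hand: slope = sum of products of deviations / sum of squared x deviations, and the intercept makes the line pass through the point of means.

```r
x = c(1, 2, 3, 4, 5, 6)
y = c(25, 65, 75, 85, 62, 105)
reg = lm(y ~ x)
# Least-squares estimates by hand
slope = sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)  # 200.5 / 17.5
intercept = mean(y) - slope * mean(x)                               # 29.4
print(coef(reg))          # (Intercept) 29.4, x about 11.457
print(c(intercept, slope))
est = predict(reg, data.frame(x = 7))
print(est)                # 109.6 = 29.4 + 11.457 * 7
```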
PRACTICAL NO. 10
AIM: To perform Binomial and Normal Distribution.
Binomial Distribution:
The binomial distribution model deals with finding the probability of a given number of successes
in a fixed number of independent trials of a random experiment.
Example 1:
To find the probability of each possible number of successes (the probability mass function)
R built-in function used:
dbinom(x,size,prob)
where x is a vector of numbers
size is the number of trials
prob is the probability of success of each trial
Code:
x=seq(0,50,by=1);
x;
y=dbinom(x,50,0.5);
y;
plot(x,y);
Output:
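A sketch checking the dbinom() values from Example 1: the probabilities over all outcomes sum to 1, the peak is at 25 for 50 fair trials, and dbinom() agrees with the formula choose(n, k) * p^k * (1 - p)^(n - k).

```r
x = 0:50
y = dbinom(x, size = 50, prob = 0.5)
print(sum(y))            # 1: the probabilities of all outcomes add up to 1
print(x[which.max(y)])   # 25: most likely number of successes
print(dbinom(3, 5, 0.5)); print(choose(5, 3) * 0.5^5)  # both 0.3125
```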
Example 2:
Code:
x=pbinom(26,51,0.5);
x;
Output:
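pbinom() in Example 2 is the cumulative probability P(X <= 26) for 51 trials with success probability 0.5. A sketch relating it to dbinom() and to the upper tail:

```r
p = pbinom(26, size = 51, prob = 0.5)  # P(X <= 26)
print(p)
print(sum(dbinom(0:26, 51, 0.5)))      # the same cumulative sum
# With p = 0.5 and an odd n the distribution is symmetric, so P(X <= 25) = 0.5
print(pbinom(25, 51, 0.5))             # 0.5
print(1 - p)                           # P(X > 26), also pbinom(26, 51, 0.5, lower.tail = FALSE)
```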
Normal Distribution:
Example 3:
To find the height of the normal density curve at each point for a given mean and
standard deviation (sd)
R built-in function used:
dnorm(x,mean,sd)
where x is a vector of numbers
mean is the mean of distribution
sd is the standard deviation of distribution
Code:
x=seq(-10,10,by=0.1);
x;
y=dnorm(x,mean=2.5,sd=0.5);
y;
plot(x,y);
Output:
plot(x,y);
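A sketch checking the dnorm() curve from Example 3: it peaks at the mean with height 1 / (sd * sqrt(2 * pi)), the area under it is about 1, and pnorm() gives the cumulative probability.

```r
x = seq(-10, 10, by = 0.1)
y = dnorm(x, mean = 2.5, sd = 0.5)
print(x[which.max(y)])                  # 2.5: the curve peaks at the mean
print(max(y))                           # 0.7978846 = 1 / (0.5 * sqrt(2 * pi))
print(sum(y) * 0.1)                     # about 1: the area under the density
print(pnorm(3, mean = 2.5, sd = 0.5))   # P(X <= 3), about 0.8413 (one sd above the mean)
```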