
Name. Kishan Kumar Goud Roll no. 432
SYBSC.IT

R PROGRAMMING

R Data types
In the R programming language, variables are not declared with a data type.
A variable is assigned an R object, and the data type of that R object becomes
the data type of the variable.

There are many types of R Objects.


Vectors
List
Matrices
Arrays
Factors
Data Frames

Computer Oriented Statistical Techniques

PRACTICAL NO .01
AIM: Using R, execute the basic commands and work with arrays, lists, data frames and matrices.
The simplest R object is the vector, and there are six data types of these atomic
vectors.
The other R objects are built upon these atomic vectors.

Types of R Vectors

Logical
Numeric
Integer
Complex
Character
Raw

Class
 class() gives the data type of the vector. R possesses a simple generic function
mechanism which can be used for an object-oriented style of programming.
 Method dispatch takes place based on the class of the first argument to the
generic function.
The six data types of atomic vectors are:
1) logical: Creates or tests for objects of type "logical", and the basic logical
constants.

2) numeric: Creates or coerces objects of type "numeric". is.numeric is a more
general test of an object being interpretable as numbers.

3) integer: Creates or tests for objects of type "integer".

4) complex: Basic functions which support complex arithmetic in R, in addition
to the arithmetic operators +, -, *, /, and ^.

5) character: Creates or tests for objects of type "character".

6) raw: Creates or tests for objects of type "raw".

charToRaw()
 Conversion and manipulation of objects of type "raw".
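
As a quick check of the six types, class() can be called on one value of each type. This is a minimal sketch; the variable names and values are illustrative only.

```r
# One illustrative value for each atomic vector type.
v_logical   <- TRUE
v_numeric   <- 23.5
v_integer   <- 2L                     # the L suffix makes an integer
v_complex   <- 2+5i
v_character <- "hello"
v_raw       <- charToRaw("hello")     # the bytes of the string

class(v_logical)     # "logical"
class(v_numeric)     # "numeric"
class(v_integer)     # "integer"
class(v_complex)     # "complex"
class(v_character)   # "character"
class(v_raw)         # "raw"
```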

Sequence Generation
seq()
Generate regular sequences.
seq is a standard generic with a default method.
seq.int is a primitive which can be much faster but has a few restrictions.
seq_along and seq_len are very fast primitives for two common cases.
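
The sequence functions above can be tried as follows. This is a small sketch with made-up arguments; the last line uses the colon operator, which gives the same kind of regular sequence in steps of 1.

```r
seq(2, 10, by = 2)            # 2 4 6 8 10
seq(0, 1, length.out = 5)     # 0.00 0.25 0.50 0.75 1.00
seq_len(4)                    # 1 2 3 4
seq_along(c("a", "b", "c"))   # 1 2 3, one index per element
3:7                           # 3 4 5 6 7
```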

C function
c()
This is a generic function which combines its arguments.
The default method combines its arguments to form a vector.
All arguments are coerced to a common type which is the type of the returned
value, and all attributes except names are removed.
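
The coercion to a common type can be seen in a short sketch (the values are made up for illustration):

```r
nums  <- c(1, 2, 3)          # all numeric, stays numeric
mixed <- c(1, "a", TRUE)     # a character value forces everything to character
flags <- c(1, TRUE, FALSE)   # logical values are coerced to numeric

class(nums)    # "numeric"
mixed          # "1" "a" "TRUE"
flags          # 1 1 0
```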

To create a sequence
Colon:
 It is used to create a regular sequence.

Accessing the Vector Elements


Elements of a vector are accessed by indexing.
Square brackets [] are used for indexing.
Indexing starts at position 1.
A negative index drops the element at that position.
TRUE and FALSE can also be used for indexing.

Accessing the element at a specific position:

Accessing elements at several positions:

Dropping the elements at specific indexes:

Indexing with TRUE and FALSE:
weekdays[c(FALSE,TRUE,FALSE,FALSE,FALSE,TRUE,TRUE)]
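
The indexing cases listed here can be sketched as below. The contents of weekdays are an assumption, since the journal does not show how the vector was created.

```r
# Hypothetical vector; the journal does not show its definition.
weekdays <- c("sun", "mon", "tue", "wed", "thu", "fri", "sat")

weekdays[2]          # one position: "mon"
weekdays[c(2, 5)]    # several positions: "mon" "thu"
weekdays[-1]         # negative index drops "sun"
weekdays[c(FALSE, TRUE, FALSE, FALSE, FALSE, TRUE, TRUE)]   # "mon" "fri" "sat"
```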

Vector element re-cycling
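
When two vectors of unequal length are used in one operation, R repeats (recycles) the shorter vector until it matches the longer one. A minimal sketch with made-up values:

```r
v1 <- c(1, 2, 3, 4, 5, 6)
v2 <- c(10, 20)    # shorter vector, used as 10 20 10 20 10 20
v1 + v2            # 11 22 13 24 15 26
```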

Sorting
sort()
Description
 Sort (or order) a vector or factor (partially) into ascending or descending
order.
 For ordering along more than one variable, e.g., for sorting data frames,
see order.

Sorting Number
 In Ascending Order

 In Descending Order

Sorting characters
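
The three sorting cases can be sketched with sort() as follows (the values are made up for illustration):

```r
nums <- c(14, 3, 27, 8)
sort(nums)                      # ascending: 3 8 14 27
sort(nums, decreasing = TRUE)   # descending: 27 14 8 3

fruits <- c("mango", "apple", "cherry")
sort(fruits)                    # "apple" "cherry" "mango"
```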

R-List
A list is an R object which can contain elements of different types, such as numbers, strings,
vectors and another list inside it.
A list can also contain a matrix or a function as its element.
A list is created using the list() function.

List
Description
 Functions to construct, coerce and check for both kinds of R lists.

Matrix
Description
 matrix creates a matrix from the given set of values.
 as.matrix attempts to turn its argument into a matrix.

list1=list(c("jan","feb","march"),matrix(c(2,4,5,6),nrow=2),list("green",15.5))

Accessing list elements by index.
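
Index access on list1 can be sketched as below. Single brackets return a smaller list, while double brackets return the element itself.

```r
list1 <- list(c("jan", "feb", "march"),
              matrix(c(2, 4, 5, 6), nrow = 2),
              list("green", 15.5))

list1[1]           # a list holding the first element
list1[[1]]         # the character vector itself
list1[[1]][2]      # "feb"
list1[[2]][1, 2]   # row 1, column 2 of the matrix: 5
list1[[3]][[2]]    # 15.5
```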

Name the list


 Function to get or set the names of an object.

Accessing list elements
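
Naming a list with names() and then reading its elements by name can be sketched as follows. The names chosen here are illustrative and are not taken from the journal.

```r
list1 <- list(c("jan", "feb", "march"),
              matrix(c(2, 4, 5, 6), nrow = 2),
              list("green", 15.5))

# Illustrative names, one per element.
names(list1) <- c("months", "mat", "inner")
names(list1)       # "months" "mat" "inner"
list1$months       # "jan" "feb" "march"
list1$inner[[1]]   # "green"
```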


 Access name of the list by using “$”

Creating l1 and l2 to perform Merging


Merging List l1 and l2
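
A minimal sketch of merging with c(). The contents of l1 and l2 are hypothetical, because the journal does not list them.

```r
# Hypothetical contents for l1 and l2.
l1 <- list(1, 2, 3)
l2 <- list("sun", "mon", "tue")

merged <- c(l1, l2)   # c() joins the two lists end to end
length(merged)        # 6
merged[[4]]           # "sun"
```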

Unlist a list

Given a list structure x, unlist simplifies it to produce a vector which contains
all the atomic components which occur in x.
Create a list l3:

Unlist l3:
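
A short sketch of unlist() on a hypothetical l3 (the journal does not show its contents):

```r
l3 <- list(1:3, c(10, 20))   # hypothetical list of two vectors
v  <- unlist(l3)             # 1 2 3 10 20
is.list(v)                   # FALSE, the result is a plain vector
class(v)                     # "numeric"
```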

Arrays
Arrays are R data objects which can store data in more than two
dimensions.
An array is created using the array() function.
It takes a vector as input and uses the values in the dim parameter to create an array,
where the dim parameter specifies the dimensions of the array.
An array can store data of only one type.
Syntax: array(data, dim = c(no. of rows, no. of columns, no. of 2D matrices), dimnames).
dimnames requires a list, created with the list() function.

Array
Description
 Creates or tests for arrays.
Arguments

data: a vector (including a list or expression vector) giving data to fill the array.
Non-atomic classed objects are coerced by as.vector.

dim: the dim attribute for the array to be created, that is an integer vector of
length one or more giving the maximal indices in each dimension.

Usage
array(data = NA, dim = length(data), dimnames = NULL)

Creating an array
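
A minimal sketch of creating an array, assuming the values 1 to 12 as data:

```r
# 2 rows, 3 columns, 2 matrices; filled column by column with 1..12.
a <- array(1:12, dim = c(2, 3, 2))
dim(a)       # 2 3 2
a[2, 3, 1]   # 6, last cell of the first matrix
a[1, 1, 2]   # 7, first cell of the second matrix
```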

Naming the rows, column and matrix
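
Names are supplied through dimnames as a list with one vector per dimension. The names used below are illustrative.

```r
a <- array(1:12, dim = c(2, 3, 2),
           dimnames = list(c("r1", "r2"),          # row names
                           c("c1", "c2", "c3"),    # column names
                           c("m1", "m2")))         # matrix names
a["r2", "c3", "m1"]   # 6
a[, , "m2"]           # the second matrix, with its row and column names
```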

Factors


factor()
Description
 The function factor is used to encode a vector as a factor (the terms
‘category’ and ‘enumerated type’ are also used for factors).
 If argument ordered is TRUE, the factor levels are assumed to be ordered.
For compatibility with S there is also a function ordered.

> data=c("north","south","east","north","south","east","east","east","north","south","north","south","north")
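
Encoding this vector as a factor and inspecting it can be sketched as:

```r
data <- c("north", "south", "east", "north", "south", "east", "east", "east",
          "north", "south", "north", "south", "north")
f <- factor(data)
levels(f)    # "east" "north" "south", sorted alphabetically
nlevels(f)   # 3
table(f)     # counts per level: east 4, north 5, south 4
```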

Data Frames
A data frame is a table or 2D array-like structure in which each column contains
the values of one variable and each row contains one set of values from each column.
Characteristics of data frames are:
 Column names should not be empty.
 Row names should be unique.
 The data stored in a data frame can be of numeric, factor or character type.
 Each column should contain the same number of data items.
 A data frame is created with the data.frame() function.

Creating a Data Frame/Table


> empdata=data.frame(empID=c(101:105),
empName=c("Taehyung","Jungkook","Jimin","Suga","Jin"),
Salary=c(18000,14000,20000,23000,25000),
startDate=as.Date(c("2018-03-25","2019-02-24","2020-5-21","2018-03-25","2018-03-2")),
stringsAsFactors=FALSE)

str()
Description
 Compactly display the internal structure of an R object, a diagnostic
function and an alternative to summary (and to some extent, dput).
 Ideally, only one line for each ‘basic’ structure is displayed.
 It is especially well suited to compactly display the (abbreviated) contents
of (possibly nested) lists.
 The idea is to give reasonable output for any R object.
 It calls args for (non-primitive) function objects.

summary()
Description
 Summary is a generic function used to produce result summaries of the
results of various model fitting functions.
 The function invokes particular methods which depend on the class of the
first argument.
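
Both functions can be tried on a data frame. The sketch below uses a shortened copy of empdata (three rows, no dates) so that it runs on its own.

```r
# Shortened copy of the empdata frame from this practical.
empdata <- data.frame(empID = 101:103,
                      empName = c("Taehyung", "Jungkook", "Jimin"),
                      Salary = c(18000, 14000, 20000),
                      stringsAsFactors = FALSE)

str(empdata)       # one line per column: type and first values
summary(empdata)   # min, quartiles, mean and max for the numeric columns
```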

Extracting data from Data Table (Data Frame)


data.frame()
Description
 The function data.frame() creates data frames, tightly coupled collections
of variables which share many of the properties of matrices and of lists,
used as the fundamental data structure by most of R's modeling software.
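
Extracting columns, rows and cells from a data frame can be sketched as follows, again on a copy of empdata without the date column:

```r
empdata <- data.frame(empID = 101:105,
                      empName = c("Taehyung", "Jungkook", "Jimin", "Suga", "Jin"),
                      Salary = c(18000, 14000, 20000, 23000, 25000),
                      stringsAsFactors = FALSE)

empdata$Salary                              # one column as a vector
empdata[1:2, ]                              # first two rows, all columns
empdata[c(3, 5), c("empName", "Salary")]    # chosen rows and columns
empdata[empdata$Salary > 20000, ]           # rows matching a condition
```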

Expanding data frames


 Adding columns to the data frame
>empdata$dept=c("IT","Finance","Testing","Development","HR")

 Creating data frame and combining it


> empnewdata=data.frame(empID=c(106:108),
empName=c("Jennie","Lisa","Rose"),
Salary=c(12000,19000,24000),
startDate=as.Date(c("2020-01-5","2018-01-2","2019-5-21")),
dept=c("HR","IT","Development"),stringsAsFactors=FALSE)

rbind()
Description
 Take a sequence of vector, matrix or data-frame arguments and combine
by columns or rows, respectively. These are generic functions with
methods for other R classes.
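
Row-binding two data frames can be sketched as below. The two small frames are simplified stand-ins for empdata and empnewdata; their column names must match for rbind() to work.

```r
# Simplified stand-ins for the two employee data frames.
old_emp <- data.frame(empID = 101:102, dept = c("IT", "Finance"),
                      stringsAsFactors = FALSE)
new_emp <- data.frame(empID = 106:107, dept = c("HR", "IT"),
                      stringsAsFactors = FALSE)

all_emp <- rbind(old_emp, new_emp)   # rows of new_emp are appended
nrow(all_emp)                        # 4
all_emp$empID                        # 101 102 106 107
```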

PRACTICAL NO .02
AIM: Create a matrix using R and perform the operations addition,
multiplication, inverse and transpose.

A matrix is an R object in which elements are arranged in a 2D rectangular layout. It
contains elements of the same atomic type.

 Creating a matrix

 Matrix addition

 Matrix subtraction
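
Creating two matrices and adding or subtracting them can be sketched as below. The values of m1 and m2 are hypothetical, since the journal does not list them.

```r
# Hypothetical 2 x 2 matrices, filled column by column.
m1 <- matrix(c(4, 2, 7, 6), nrow = 2)
m2 <- matrix(c(1, 3, 5, 2), nrow = 2)

m1 + m2   # element-wise sum
m1 - m2   # element-wise difference
```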

 Matrix Multiplication
 x %*% y

Description

Multiplies two matrices, if they are conformable. If one argument is a vector, it
will be promoted to either a row or column matrix to make the two arguments
conformable. If both are vectors of the same length, it will return the inner
product (as a matrix).
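
The difference between %*% and the element-wise * can be seen in a short sketch (m1 and m2 are hypothetical values):

```r
m1 <- matrix(c(4, 2, 7, 6), nrow = 2)
m2 <- matrix(c(1, 3, 5, 2), nrow = 2)

m1 %*% m2             # matrix product: rows of m1 times columns of m2
m1 * m2               # element-wise product, a different operation
c(1, 2) %*% c(3, 4)   # inner product of two vectors, as a 1 x 1 matrix: 11
```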

 Matrix Inverse
det
Description
 det calculates the determinant of a matrix. determinant is a generic
function that returns separately the modulus of the determinant,
optionally on the logarithm scale, and the sign of the determinant.

 Inverse Matrix m1

 Inverse Matrix m2
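
The journal does not show which function was used for the inverse. A common base R choice is solve(), which returns the inverse when called with a single square matrix; det() is checked first because a zero determinant means no inverse exists. A sketch with a hypothetical m1:

```r
m1 <- matrix(c(4, 2, 7, 6), nrow = 2)   # hypothetical values

det(m1)               # 4*6 - 7*2 = 10, non-zero so the inverse exists
m1_inv <- solve(m1)   # inverse of m1
m1_inv %*% m1         # identity matrix, up to rounding
```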

 Matrix Transpose
Description
 Given a matrix or data.frame x, t returns the transpose of x.

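
A minimal sketch of the transpose with t(), using a made-up 2 x 3 matrix:

```r
m <- matrix(1:6, nrow = 2)   # 2 rows, 3 columns
t(m)                         # rows and columns swapped
dim(t(m))                    # 3 2
```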

PRACTICAL NO .03
AIM: Execute the statistical functions mean, median, mode, quartiles, range,
interquartile-range and histogram using R.

Mean
 mean()

Description
 Generic function for the (trimmed) arithmetic mean.

 na.rm
a logical value indicating whether NA values should be stripped before the
computation proceeds.
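
The effect of na.rm and of trimming can be sketched with made-up values:

```r
x <- c(12, 7, 3, 4.2, 18, 2, 54, NA)   # made-up values with one NA

mean(x)                  # NA, because of the missing value
mean(x, na.rm = TRUE)    # NA is removed before averaging

mean(c(1, 2, 3, 100), trim = 0.25)   # drops the lowest and highest 25%: 2.5
```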

Median
 median()
Description
 Compute the sample median.

Quartile
 quantile()
Description
 The generic function quantile produces sample quantiles corresponding to
the given probabilities.
 The smallest observation corresponds to a probability of 0 and the largest to
a probability of 1.
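
A short sketch of median() and quantile() on made-up values. The 50% quantile is the median, and the 25% and 75% quantiles are the first and third quartiles.

```r
x <- c(2, 4, 4, 5, 7, 9, 10, 12)

median(x)                            # 6, the average of 5 and 7
quantile(x)                          # 0%, 25%, 50%, 75% and 100% points
quantile(x, probs = c(0.25, 0.75))   # first and third quartile: 4 and 9.25
```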

Range
 range()
Description
 range returns a vector containing the minimum and maximum of all the
given arguments.
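
The range and the interquartile range of the same kind of made-up data can be sketched as:

```r
x <- c(2, 4, 4, 5, 7, 9, 10, 12)

range(x)         # minimum and maximum: 2 12
diff(range(x))   # width of the range: 10
IQR(x)           # third quartile minus first quartile: 5.25
```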

Histogram
 hist()
Description
 The generic function hist computes a histogram of the given data values.
 If plot = TRUE, the resulting object of class "histogram" is plotted by
plot.histogram, before it is returned.
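
The returned "histogram" object can be inspected without drawing anything by passing plot = FALSE. A sketch with made-up values:

```r
x <- c(1, 2, 2, 3, 3, 3, 4, 4, 5)   # made-up values

h <- hist(x, plot = FALSE)   # compute the bins only
class(h)                     # "histogram"
h$breaks                     # bin boundaries chosen by hist
h$counts                     # number of observations in each bin
# hist(x) with the default plot = TRUE also draws the chart.
```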

Mode
1) User Defined
2) Using table

User Defined
> getmode=function(x)
{ #Create mode function
unique_x=unique(x)
tabulate_x=tabulate(match(x,unique_x))
unique_x[tabulate_x == max(tabulate_x)]
}

 Create a vector v3 and find its length and mode

 Using Table
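
Finding the mode with table() can be sketched as below. The values of v3 are hypothetical; table() counts each distinct value and the names of the largest counts are the modes.

```r
v3 <- c(2, 1, 2, 3, 1, 2, 3, 4, 1, 5, 5, 3, 2, 3)   # hypothetical values
length(v3)                                # 14

tab <- table(v3)                          # frequency of each distinct value
tab
names(tab)[as.vector(tab) == max(tab)]    # "2" "3", returned as character strings
```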

 data()
Description
 Loads specified data sets, or list the available data sets.

> data()
> faithful
> eruptions=faithful$eruptions
> eruptions

 Calculate Mean, Median, Quartile, Range, IQR and Histogram of
eruptions.
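
These calculations can be sketched on the built-in faithful data set as follows. The histogram is computed with plot = FALSE here; calling hist(eruptions) draws it.

```r
eruptions <- faithful$eruptions   # built-in data set, 272 observations

mean(eruptions)
median(eruptions)
quantile(eruptions)                  # quartiles
range(eruptions)                     # 1.6 5.1
IQR(eruptions)
h <- hist(eruptions, plot = FALSE)   # use hist(eruptions) to draw the chart
```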

PRACTICAL NO .04
AIM: Using R import data from Excel/.csv file and perform statistical
functions: mean, median, mode, quartiles, range, interquartile-range and
histogram.

Using .csv
Set path to current working directory
 getwd(),setwd()

Description
 getwd returns an absolute filepath representing the current working directory
of the R process.
 setwd(dir) is used to set the working directory to dir.

 read.csv()
Description
 Reads a file in table format and creates a data frame from it, with cases
corresponding to lines and variables to fields in the file.

 Create a Notepad file “Student” and save it as “Student.csv” in the current
working directory.

 Import “Student.csv” in R.

 To get marks
marks=studentData$Marks
marks

 Calculate Mean, Median, Quartile, Range, IQR, Mode and Histogram
of marks.
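
The whole .csv workflow can be sketched in one block. The file contents (names and marks) are hypothetical, and the file is written to a temporary folder so that the example runs without a real Student.csv.

```r
# Hypothetical contents for Student.csv, written to a temporary folder.
csv_path <- file.path(tempdir(), "Student.csv")
writeLines(c("Name,Marks", "Asha,72", "Ravi,65", "Meena,88", "John,65", "Sara,91"),
           csv_path)

studentData <- read.csv(csv_path)   # first line becomes the column names
marks <- studentData$Marks

mean(marks)                      # 76.2
median(marks)                    # 72
quantile(marks)
range(marks)                     # 65 91
IQR(marks)
names(which.max(table(marks)))   # mode via table: "65"
h <- hist(marks, plot = FALSE)   # hist(marks) draws the histogram
```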

Using Excel

 To import a file from Excel, we need to install a few packages.


1) xlsx
 The xlsx package gives programmatic control of Excel files using R.
2) xlsxjars
 The xlsxjars package collects all the external jars required for the xlsx
package.
3) rJava
 Low-level interface to the Java VM, very much like .C/.Call and friends.
 Allows creation of objects, calling methods and accessing fields.

To install the packages:


> install.packages("xlsx")
package ‘xlsx’ successfully unpacked and MD5 sums checked

> install.packages("xlsxjars")
package ‘xlsxjars’ successfully unpacked and MD5 sums checked

> install.packages("rJava")
package ‘rJava’ successfully unpacked and MD5 sums checked

 Create an Employee.xlsx file.

 Import Employee.xlsx using R.


Load the xlsx package.
> library(xlsx)
Set path to current working directory
> setwd("D:/")
Importing .xlsx file
> emp=read.xlsx("Employee.xlsx",sheetIndex=1)
> emp

 To get salary
> salary=emp$salary
> salary

 Calculate Mean, Median, Quartile, Range, IQR, Mode and Histogram of
salary.

PRACTICAL NO .05
AIM: Using R import data from Excel/.csv file and calculate standard
deviation, variance and co-variance.

Q) The length of service of the employees of a company is given in hours. Find the
standard deviation and variance of the length of service.

Length of the services    Length of services    Frequency
(Group Data)
10-15                     12.5                  12
15-20                     17.5                  127
20-25                     22.5                  18
25-30                     27.5                  7
30-35                     32.5                  65
35-40                     37.5                  2

 Create a Company.xlsx file.

 To install the packages:


> install.packages("xlsx")

package ‘xlsx’ successfully unpacked and MD5 sums checked

> install.packages("xlsxjars")
package ‘xlsxjars’ successfully unpacked and MD5 sums checked

> install.packages("rJava")
package ‘rJava’ successfully unpacked and MD5 sums checked

 Import Company.xlsx using R.


Load the xlsx package.
> library(xlsx)
Set path to current working directory
> setwd("D:/")
Importing .xlsx file

> empservices=read.xlsx("Company.xlsx",sheetIndex=1)
> empservices

 Replicate Elements
rep()
Description
 rep replicates the values in x. It is a generic function, and the (internal)
default method is described here.
> totalservices=rep(empservices$Length_of_services,empservices$hrs)
> totalservices

 Standard Deviation:
sd()
Description
 This function computes the standard deviation of the values in x.
 If na.rm is TRUE then missing values are removed before computation
proceeds.

 Variance:
var()
Description
 var, cov and cor compute the variance of x and the covariance or
correlation of x and y if these are vectors.
 If x and y are matrices then the covariances (or correlations) between the
columns of x and the columns of y are computed.

Calculate standard deviation, variance and mean of totalservices.


> stddev=sd(totalservices)
> stddev
> var(totalservices)
> mean(totalservices)
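
The grouped-data calculation can be sketched without Excel. Here the class limits and frequencies from the question are typed in directly (an assumption made so that the example is self-contained), the class midpoints are computed as (lower + upper) / 2, and rep() expands them into one value per employee.

```r
# Class limits and frequencies from the question, typed in directly
# instead of being read from Company.xlsx.
lower    <- seq(10, 35, by = 5)
upper    <- lower + 5
midpoint <- (lower + upper) / 2        # 12.5 17.5 22.5 27.5 32.5 37.5
freq     <- c(12, 127, 18, 7, 65, 2)

totalservices <- rep(midpoint, freq)   # each midpoint repeated freq times
length(totalservices)                  # 231, the total frequency
mean(totalservices)
var(totalservices)                     # sample variance, divisor n - 1
sd(totalservices)                      # square root of the variance
```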

 Co-variance:
cov()

 Co-variance is a measure of the joint variability of two random variables.

Description
 var, cov and cor compute the variance of x and the covariance or
correlation of x and y if these are vectors.
 If x and y are matrices then the covariances (or correlations) between the
columns of x and the columns of y are computed.

Q) The marks of six students in English and Mathematics are given below. Find
the co-variance between the marks in English and Mathematics.

English Marks    Mathematics Marks
68               88
85               89
78               79
69               85
67               77
82               60
 Create a marks.xlsx file.

 Load the xlsx package.


> library(xlsx)

 Set path to current working directory


> setwd("D:/")
 Importing .xlsx file

> marks=read.xlsx("marks.xlsx",sheetIndex=1)
> marks

 Co-variance
> cov(marks$English_Marks,marks$Mathematics_Marks)
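
The same co-variance can be checked without the Excel file. In this sketch the marks from the question are typed in as two vectors (the vector names are illustrative) and cov() is compared with the sample formula, the sum of the products of deviations divided by n - 1.

```r
english <- c(68, 85, 78, 69, 67, 82)
maths   <- c(88, 89, 79, 85, 77, 60)
n <- length(english)

# Sample co-variance written out by hand.
manual <- sum((english - mean(english)) * (maths - mean(maths))) / (n - 1)
manual                # about -23.07
cov(english, maths)   # same value
```

The negative sign means that, in this data, higher English marks tend to go with lower Mathematics marks.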

 When we change the marks so that the two sets of marks are almost equal, we get a
positive co-variance.

PRACTICAL NO .06

AIM: Using R import data from Excel/.csv file and calculate its skewness and
kurtosis value.

 To install the packages:


> install.packages("xlsx")
package ‘xlsx’ successfully unpacked and MD5 sums checked

> install.packages("xlsxjars")
package ‘xlsxjars’ successfully unpacked and MD5 sums checked

> install.packages("rJava")
package ‘rJava’ successfully unpacked and MD5 sums checked

We have to install moments package to find skewness and kurtosis


 To install the packages:
> install.packages("moments")
package ‘moments’ successfully unpacked and MD5 sums checked

Q) The length of service of the employees of a company is given in hours. Find the
skewness and kurtosis of the length of service.
Length of the services    Length of services    Frequency
(Group Data)
10-15                     12.5                  12
15-20                     17.5                  127
20-25                     22.5                  18
25-30                     27.5                  7
30-35                     32.5                  65
35-40                     37.5                  2
 Create a Company.xlsx file.

 Import Company.xlsx using R.


Load the xlsx package.
> library(xlsx)
> library(moments)
Set path to current working directory
> setwd("D:/")
Importing .xlsx file

> empservices=read.xlsx("Company.xlsx",sheetIndex=1)
> empservices

> totalservices=rep(empservices$Length_of_services,empservices$hrs)

> totalservices

 Skewness
skewness()
Description
 This function computes skewness of given data.

 Kurtosis
kurtosis()
Description
 This function computes the estimator of Pearson's measure of kurtosis.

> skewness(totalservices)
> kurtosis(totalservices)
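
To see what these two measures compute, the usual moment-based definitions can be written in base R, without the moments package. This is a sketch under that assumption: skewness is taken as m3 / m2^(3/2) and kurtosis as m4 / m2^2, where mk is the k-th central moment. The helper names are hypothetical, and whether the results agree exactly with the package should be checked against skewness() and kurtosis() themselves.

```r
# k-th central moment: the mean of (x - mean)^k.
central_moment <- function(x, k) mean((x - mean(x))^k)

# Hypothetical helpers built on the moment definitions.
skew_manual <- function(x) central_moment(x, 3) / central_moment(x, 2)^1.5
kurt_manual <- function(x) central_moment(x, 4) / central_moment(x, 2)^2

x <- c(1, 2, 3, 4, 5)   # symmetric toy data
skew_manual(x)          # 0, no skew for symmetric data
kurt_manual(x)          # 1.7
```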

PRACTICAL NO .07
AIM: Using R import data from Excel/.csv file and perform hypothesis testing
in R.

Definition:
Statistical Hypothesis
Statistical Hypothesis is either
1) A statement about the value of a population parameter.
2) A statement about the kind of probability distribution that a certain
variable obeys.
Example 1: The mean age of all the university students is 23.4 years.
Example 2: The proportion of university students who are women is 76%.
Example 3: The proportion of books in a public library whose height exceeds
30 cm is less than or equal to 0.13 (p ≤ 0.13).

Types of Hypothesis:

Null Hypothesis: Denoted as H0.

 It is a statistical hypothesis that the observation is due to a chance factor.

Alternative Hypothesis: Denoted as H1 or Ha.

 It shows that the observations are the result of a real effect.

Types of Errors:

 Type I Error
 Type II Error

Question 1)

The data set AtkinsDiet.csv contains the weight loss experienced by dieters using the
Atkins diet. The Atkins hypothesis says that people who use their method
lose, on average, at least 20 pounds in 6 months. Can we reject the claim made by
the Atkins diet?

 To install the packages:


> install.packages("xlsx")
package ‘xlsx’ successfully unpacked and MD5 sums checked

> install.packages("xlsxjars")
package ‘xlsxjars’ successfully unpacked and MD5 sums checked

> install.packages("rJava")
package ‘rJava’ successfully unpacked and MD5 sums checked

 Import AtkinsDiet.csv to R
Load the xlsx package.
> library(xlsx)
Set path to current working directory
> setwd("D:/")
> getwd()

Importing .csv file


> AtkinsDiet=read.csv("AtkinsDiet.csv")
> AtkinsDiet

 To get the data of Loss.at.6.Months


> LossAt6Months=AtkinsDiet$Loss.at.6.Months
> LossAt6Months

Method 1:

Method 2:

Conclusion: As the true mean is less than 20, the null hypothesis H0 can be rejected.
Therefore, we can reject the claim made by the Atkins diet.
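
The outputs of the two methods are not reproduced in this journal. One common way to test such a claim is a one-sided, one-sample t-test with t.test(); the sketch below shows its shape. The weight-loss values are made up for illustration and are not the values in AtkinsDiet.csv.

```r
# Hypothetical weight losses in pounds; not the values from AtkinsDiet.csv.
loss <- c(15, 12, 18, 10, 14, 16, 11, 13, 17, 9)

# H0: mean loss >= 20    H1: mean loss < 20
res <- t.test(loss, mu = 20, alternative = "less")
res$estimate   # sample mean: 13.5
res$p.value    # a p-value below 0.05 leads to rejecting H0
```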

Question 2)
The data set iris that is built into R gives the measurements of sepal length, sepal
width, petal length and petal width for 50 flowers from each of three species of iris
flower, which are setosa, versicolor and virginica.

A) Test the hypothesis that the mean sepal length of the virginica species = 6.15.
Load data of iris in R
> data()
> iris

> virginica=iris[iris$Species=="virginica",]
> virginica

 Mean of the sepal length of Virginica species = 6.15

Conclusion: As the true mean is not equal to 6.15, the null hypothesis can be
rejected. Therefore, the mean sepal length of the Virginica species is not equal to 6.15.
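
A two-sided, one-sample t-test is one way to carry out this test. The sketch below assumes that t.test() on the Sepal.Length column is what was intended, since the journal does not show the command.

```r
virginica <- iris[iris$Species == "virginica", ]

# H0: mean sepal length = 6.15    H1: mean sepal length != 6.15
res <- t.test(virginica$Sepal.Length, mu = 6.15)   # two-sided by default
res$estimate   # sample mean: 6.588
res$p.value    # below 0.05, so H0 is rejected
```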

library(xlsx)
write.xlsx(Mdf,"D:/Mdx.xlsx")

PRACTICAL NO .08
AIM: To perform Chi-Square test.

 To install the packages:


> install.packages("xlsx")
package ‘xlsx’ successfully unpacked and MD5 sums checked

> install.packages("xlsxjars")
package ‘xlsxjars’ successfully unpacked and MD5 sums checked

> install.packages("rJava")
package ‘rJava’ successfully unpacked and MD5 sums checked

 Import clinical.csv to R
Load the xlsx package.
> library(xlsx)
Set path to current working directory
> setwd("D:/")
> getwd()

Example 1

 Create a clinical.xlsx file.

Method 1: import data using csv file


Code:
setwd("D:/")
cdata=read.csv("clinical.csv")
cdata
chidata=data.frame(cdata$p,cdata$q)
chidata
chtest=chisq.test(chidata)
chtest

Output:

Method 2: using matrix

Code:
M=matrix(c(71,42,49,78),nrow=2)
M
Mdata=data.frame(M)
Mdata
Mchi=chisq.test(Mdata)
Mchi
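
The parts of the test result can be inspected one by one. This sketch passes the matrix straight to chisq.test(); for a 2 x 2 table R applies Yates' continuity correction by default.

```r
M <- matrix(c(71, 42, 49, 78), nrow = 2)

Mchi <- chisq.test(M)   # 2 x 2 table, continuity correction applied by default
Mchi$expected           # counts expected if rows and columns were independent
Mchi$statistic          # chi-squared test statistic
Mchi$p.value            # below 0.05 here, so independence is rejected
```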

Output:

Example 2:
Check association between AirBags and Type of the Cars93 database of R

Code:
library("MASS")
str(Cars93)
car.data=data.frame(Cars93$AirBags,Cars93$Type)
car.data
car.data=table(Cars93$AirBags,Cars93$Type)
Carschi=chisq.test(car.data)
Carschi

Output:

PRACTICAL NO .09
AIM: To perform Linear Regression.

 To install the packages:


> install.packages("xlsx")
package ‘xlsx’ successfully unpacked and MD5 sums checked

> install.packages("xlsxjars")
package ‘xlsxjars’ successfully unpacked and MD5 sums checked

> install.packages("rJava")
package ‘rJava’ successfully unpacked and MD5 sums checked

 Import regression.csv to R
Load the xlsx package.
> library(xlsx)
Set path to current working directory
> setwd("D:/")
> getwd()

Example 1
x: 1, 2, 3, 4, 5, 6
y: 25, 65, 75, 85, 62, 105
Perform Linear Regression in R

Method 1: import data using csv file


<regression.csv file includes data as given in the question>

 Create a regression.xlsx file.

Code:
setwd("D:/");
getwd();
rdata=read.csv("regression.csv");
rdata;
x=rdata$year;
y=rdata$sales;
reg=lm(y~x);
plot(x,y);
plot(x,y,col="blue",pch=16,cex=2.2,xlab="Year",ylab="Sales",main="Performance");
abline(reg,col="red");
p=data.frame(x=7);
est=predict(reg,p);
est;

Output:

Method 2: using matrix

Code:

setwd("D:/");
getwd();
M=matrix(c(1,2,3,4,5,6,25,65,75,85,62,105),ncol=2);
M;
Mdata=data.frame(M);
Mdata;
x=Mdata$X1;
y=Mdata$X2;
reg=lm(y~x);
reg;
plot(x,y);
abline(reg);
p=data.frame(x=7);
est=predict(reg,p);
est;
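
The fitted line and the prediction can be checked without plotting. This sketch types the x and y values of Example 1 in directly and reads the coefficients with coef().

```r
x <- c(1, 2, 3, 4, 5, 6)
y <- c(25, 65, 75, 85, 62, 105)

reg <- lm(y ~ x)   # least-squares line y = a + b*x
coef(reg)          # intercept 29.4, slope about 11.457

est <- predict(reg, data.frame(x = 7))   # value on the line at x = 7
est                                      # 109.6
```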

Output:

plot(x,y,col="green",pch=16);
abline(reg,col="red");

PRACTICAL NO .10
AIM: To perform Binomial and Normal Distribution.

Binomial Distribution:
The binomial distribution model deals with finding the probability of a given number of
successes in a fixed number of independent trials of a random experiment.
Example 1:
To find probability density function at each point
R built-in function used:
dbinom(x,size,prob)
where x is a vector of numbers
size is the number of trials
prob is the probability of success of each trial

Code:
x=seq(0,50,by=1);
x;
y=dbinom(x,50,0.5);
y;
plot(x,y);
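
Two quick checks on these values, without plotting: the probabilities over all possible outcomes add up to 1, and the most likely number of successes is size * prob.

```r
x <- 0:50
y <- dbinom(x, size = 50, prob = 0.5)

sum(y)            # 1, all outcomes together
x[which.max(y)]   # 25, the most likely number of successes
```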

Output:

Example 2:

To find the cumulative probability of an event
(for example: to find the probability of getting 26 or fewer heads from 51 tosses of a
coin)
R built-in function used:
pbinom(x,size,prob)
where x is a vector of numbers
size is the number of trials
prob is the probability of success of each trial

Code:
x=pbinom(26,51,0.5);
x;
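
The cumulative probability is the sum of the individual probabilities from 0 up to 26, which can be confirmed with dbinom():

```r
p_cum <- pbinom(26, size = 51, prob = 0.5)
p_sum <- sum(dbinom(0:26, size = 51, prob = 0.5))

p_cum                     # about 0.61
all.equal(p_cum, p_sum)   # TRUE
```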

Output:

Normal Distribution:
A normal distribution curve is a bell-shaped curve for a sufficiently large sample
size.

Example 3:
To find height of probability distribution at each point for given mean and given
sd
R built-in function used :
dnorm(x,mean,sd)
where x is a vector of numbers
mean is the mean of distribution
sd is the standard deviation of distribution

Code:
x=seq(-10,10,by=0.1);
x;
y=dnorm(x,mean=2.5,sd=0.5);
y;
plot(x,y);
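
A few properties of this curve can be checked without plotting: it peaks at the mean, the peak height is 1 / (sd * sqrt(2 * pi)), and half of the area lies below the mean.

```r
x <- seq(-10, 10, by = 0.1)
y <- dnorm(x, mean = 2.5, sd = 0.5)

x[which.max(y)]                    # 2.5, the curve peaks at the mean
dnorm(2.5, mean = 2.5, sd = 0.5)   # peak height, about 0.798
pnorm(2.5, mean = 2.5, sd = 0.5)   # 0.5, area to the left of the mean
```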

Output:
