R-Training For Print
org
Choose "Download / CRAN"
R Objects
R is an object-oriented programming (OOP) language with many built-in object types. Some common
R objects used to handle data are:
Vectors
Lists
Matrices
Factors
Data Frames
Arrays
Vectors
A vector is an ordered collection of elements, all of the same data type. It is the simplest type of R object.
The variables created so far are also vectors, each containing a single element (member).
Creating Vectors
Using 'c( )' function
> ab <- c(23, 35, 56)
> ab
> ac <- c("Nepal", "India", "China")
> ac
> ad <- c(TRUE, TRUE, 2 < 3, 0 == 0)
> ad
> ae <- c(3+4i, 7+2i)
> ae
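As a minimal sketch of the "same data type" rule, the lines below check the type of each vector created above using 'class()' and 'length()'. The vector 'mixed' is a hypothetical addition, not taken from the text, included only to show that 'c()' converts mixed inputs to one common type.

```r
ab <- c(23, 35, 56)
ac <- c("Nepal", "India", "China")
ad <- c(TRUE, TRUE, 2 < 3, 0 == 0)
ae <- c(3+4i, 7+2i)

class(ab)   # "numeric"
class(ac)   # "character"
class(ad)   # "logical" (2 < 3 and 0 == 0 both evaluate to TRUE)
class(ae)   # "complex"
length(ab)  # 3

# Hypothetical example: mixing types forces one common type
mixed <- c(1, "a", TRUE)
class(mixed)  # "character"
```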
Factors
Introduction
A factor is an R data type that stores categorical variables. Factors are used extensively in
statistical modeling.
A variable is said to be categorical if its values are not all different, but each takes one of
two or more fixed categories.
For example, a variable recording gender may take only two values: male and female.
A variable recording blood group may take any one of four values: A, B, AB and O.
A variable recording GPA grade may take any one of the values A, B, C, D, E and F.
Here the gender and blood group variables have no intrinsic ordering, whereas the GPA grade has an
intrinsic ordering. A categorical variable with no intrinsic ordering is called a nominal
variable, and one with an intrinsic ordering is called an ordinal variable.
Creating Factors of Nominal Category
Suppose the genders of 6 consecutive customers entering a restaurant are observed to be "male, male,
female, male, female, male".
To store these values as a factor data type:
> factor(c("Male", "Male", "Female", "Male", "Female", "Male"))
Or,
> gen_fact = factor(c("Male", "Male", "Female", "Male", "Female", "Male"))
Or,
> gen = c("Male", "Male", "Female", "Male", "Female", "Male")
> gen_fact = factor(gen)
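To see what the factor actually stores, the following short sketch inspects the levels and the internal integer codes of 'gen_fact'. It uses only base R functions ('levels()', 'as.integer()', 'table()', 'is.ordered()').

```r
gen <- c("Male", "Male", "Female", "Male", "Female", "Male")
gen_fact <- factor(gen)

levels(gen_fact)      # "Female" "Male" (alphabetical order)
as.integer(gen_fact)  # 2 2 1 2 1 2 (internal integer codes)
table(gen_fact)       # counts per level: Female 2, Male 4
is.ordered(gen_fact)  # FALSE, a nominal factor has no ordering
```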
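Now consider an ordinal variable such as T-shirt size, stored in a character vector 'tshirt_size' with
the values "Small", "Medium" and "Large". If a factor is created from it with the default settings and
its structure is viewed using the 'str()' function, then the levels follow alphabetical order:
"Large" is given value 1, "Medium" is given value 2 and "Small" is given value 3.
To give the values 1, 2 and 3 to "Small", "Medium" and "Large", one can use the 'levels' argument of the
'factor()' function, as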
> ts_fact = factor(tshirt_size, ordered = TRUE, levels = c("Small","Medium", "Large"))
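The effect of the 'levels' argument can be sketched as follows. The contents of 'tshirt_size' are assumed here purely for illustration, since only the three category names are given in the text.

```r
# Hypothetical data: the values of tshirt_size are assumed for illustration
tshirt_size <- c("Medium", "Small", "Large", "Small", "Medium")

# Default behaviour: levels are sorted alphabetically
ts_default <- factor(tshirt_size)
levels(ts_default)       # "Large" "Medium" "Small"
as.integer(ts_default)   # 2 3 1 3 2

# Ordered factor with explicitly chosen level order
ts_fact <- factor(tshirt_size, ordered = TRUE,
                  levels = c("Small", "Medium", "Large"))
levels(ts_fact)          # "Small" "Medium" "Large"
as.integer(ts_fact)      # 2 1 3 1 2
ts_fact[2] < ts_fact[3]  # TRUE: "Small" ranks below "Large"
```

Comparisons such as '<' are meaningful only for ordered factors, which is one reason to set 'ordered = TRUE' for ordinal data.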
Categorical variables that are defined over certain ranges are called interval variables. For example:
(a) age groups (0-10, 10-20, 20-30, etc.) (b) income groups ($100-500, $600-1000, etc.)
The description of interval and ratio variables is left for later.
Accessing Elements in Factors
To access a specific element of the factor created, one can use 'factor_name[position]'. E.g.
> ts_fact[2]
> ts_fact[c(1,3,4)]
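A short sketch of what these subsetting calls return is given below. As before, the contents of 'tshirt_size' are hypothetical. Note that a subset of a factor keeps all the original levels unless 'droplevels()' is applied.

```r
# Hypothetical data: the values of tshirt_size are assumed for illustration
tshirt_size <- c("Medium", "Small", "Large", "Small", "Medium")
ts_fact <- factor(tshirt_size, ordered = TRUE,
                  levels = c("Small", "Medium", "Large"))

second <- ts_fact[2]
as.character(second)   # "Small"
levels(second)         # still "Small" "Medium" "Large"

picked <- ts_fact[c(1, 3, 4)]
as.character(picked)   # "Medium" "Large" "Small"

# Unused levels can be removed explicitly
levels(droplevels(ts_fact[c(2, 4)]))  # "Small"
```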
Matrix
A matrix is a two-dimensional rectangular data set in which values of the same type are arranged in rows
and columns.
Creating Matrices
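Creating a matrix of integers, filled column by column (the default)
> m2 = matrix(c(12, 43, 43, 23, 34, 26), nrow = 2, ncol = 3)
To fill the same values row by row instead, set 'byrow = TRUE'
> m2 = matrix(c(12, 43, 43, 23, 34, 26), nrow = 2, ncol = 3, byrow = TRUE)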
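The difference between the two fill orders can be checked with the sketch below. The names 'vals', 'm_col' and 'm_row' are introduced here only so that both matrices can be kept side by side.

```r
vals <- c(12, 43, 43, 23, 34, 26)

m_col <- matrix(vals, nrow = 2, ncol = 3)                # column by column
m_row <- matrix(vals, nrow = 2, ncol = 3, byrow = TRUE)  # row by row

dim(m_col)   # 2 3
m_col[1, ]   # 12 43 34
m_col[2, ]   # 43 23 26
m_row[1, ]   # 12 43 43
m_row[2, ]   # 23 34 26

# Element in row 2, column 1 differs between the two layouts
m_col[2, 1]  # 43
m_row[2, 1]  # 23
```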
Data Frame
Introduction
R is a statistical programming language, and in statistics we work with datasets. Such datasets typically
consist of observations. Each observation consists of several variables, which may be of different types.
For example - .........
In a dataset, different observations are stored in different rows. Each observation has specific
attributes, e.g. name, age, gender, score, etc. Since there are usually many observations, each
attribute is placed in its own column of the dataset.
So a dataset is similar to a matrix, since it is a two-dimensional structure consisting of rows and columns.
However, a matrix can contain data of only one type, whereas a dataset needs each observation to hold
data of several different types.
In fact, a list can represent a single observation (row) of a dataset, and a dataset can also be created
as a list of lists.
However, R provides a special object for storing datasets: the data frame ('data.frame').
A data frame is the fundamental data structure for storing datasets.
In a data frame, each column contains elements of the same data type and represents one attribute of the
observations, while different columns may have different types. Data for a common attribute of different
observations are placed in one column of the data frame. In the same way, each row contains the elements
belonging to one particular observation.
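The contrast between a list and a matrix described above can be sketched as follows. The values are those of the first observation used later in this section, and the names 'obs' and 'm' are chosen here for illustration.

```r
# One observation stored as a list: each attribute keeps its own type
obs <- list(name = "Roni", age = 34, male = TRUE)
sapply(obs, class)  # name "character", age "numeric", male "logical"

# The same values forced into a matrix are all converted to character
m <- matrix(c("Roni", 34, TRUE), nrow = 1)
m[1, 2]             # "34"
class(m[1, 2])      # "character"
```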
Creating data frame
A data frame is created by using the 'data.frame()' function.
Let us create a data frame containing three columns (or vectors) named name, age and male, each
containing five observations.
> name = c("Roni", "Rabi", "Sunita", "Arjun", "Mani")
> age = c(34, 65, 45, 23, 34)
> male = c(TRUE, TRUE,FALSE,TRUE,TRUE)
> df = data.frame(name, age, male)
> df
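As a quick check that each column keeps its own type, the sketch below rebuilds the data frame and inspects it. The argument 'stringsAsFactors = FALSE' is added here as an assumption so that the name column stays a character vector on older R versions as well (it is the default from R 4.0.0 onward).

```r
name <- c("Roni", "Rabi", "Sunita", "Arjun", "Mani")
age  <- c(34, 65, 45, 23, 34)
male <- c(TRUE, TRUE, FALSE, TRUE, TRUE)

df <- data.frame(name, age, male, stringsAsFactors = FALSE)

dim(df)            # 5 3 (five observations, three variables)
names(df)          # "name" "age" "male"
sapply(df, class)  # name "character", age "numeric", male "logical"
```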
Labeling Variables of Data Frame
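To provide clear descriptive labels to the variables, i.e., columns, the 'names()' function is used as
> names(df) = c("Name-of-Student", "Age", "Male")
An alternative method is to give the labels when the data frame is created. Because a hyphen is not
allowed in an unquoted R name, such a label must be written in quotes, and 'check.names = FALSE' is
needed to stop R from changing it to "Name.of.Student":
> df = data.frame("Name-of-Student" = name, Age = age, Male = male, check.names = FALSE)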
In the same way, different rows of observations can also be named. (Later)
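The behaviour of column labels that are not valid R names can be sketched as follows. The data frame names 'df_default' and 'df_kept' are introduced here only for comparison.

```r
name <- c("Roni", "Rabi", "Sunita", "Arjun", "Mani")
age  <- c(34, 65, 45, 23, 34)
male <- c(TRUE, TRUE, FALSE, TRUE, TRUE)

# Labels assigned afterwards with names() are kept exactly as written
df <- data.frame(name, age, male)
names(df) <- c("Name-of-Student", "Age", "Male")
names(df)          # "Name-of-Student" "Age" "Male"

# By default data.frame() converts labels into syntactically valid names
df_default <- data.frame("Name-of-Student" = name, Age = age, Male = male)
names(df_default)  # "Name.of.Student" "Age" "Male"

# check.names = FALSE keeps the label unchanged
df_kept <- data.frame("Name-of-Student" = name, Age = age, Male = male,
                      check.names = FALSE)
names(df_kept)     # "Name-of-Student" "Age" "Male"
```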
To View Structure of Data Frame
To view the structure of the data frame, 'str()' function is used.
E.g.
> str(df)
To Access Elements of Data Frame
a) By Treating Data Frame as Matrix
To access the age of the third person (age is in the second column of the data frame)
> df[3, 2]
> df[3, "Age"]   # 'Age' is the label of the variable 'age'
To display all records of third person
> df[ 3 , ]
To display names of all students, i.e., first column-
> df [ , 1]
> df[ , "Name-of-Student"]
To display the data in third and fifth row
> df[ c(3,5), ]
To display the data from second to fourth row
> df[ 2:4, ]
To display all observations in the first and third columns, i.e., the name and male columns
> df[ , c(1,3)]
To display all observations in the second to third columns
> df[ , 2:3]
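The access patterns above can be tried together in the following sketch, which rebuilds the labelled data frame first. The '$' form at the end is a common base R alternative that is not used in the text above.

```r
name <- c("Roni", "Rabi", "Sunita", "Arjun", "Mani")
age  <- c(34, 65, 45, 23, 34)
male <- c(TRUE, TRUE, FALSE, TRUE, TRUE)
df <- data.frame(name, age, male, stringsAsFactors = FALSE)
names(df) <- c("Name-of-Student", "Age", "Male")

df[3, 2]                 # 45, by row and column position
df[3, "Age"]             # 45, by row position and column label
df[3, ]                  # all values of the third person
df[, "Name-of-Student"]  # all names, returned as a vector
df[c(3, 5), ]            # third and fifth rows
df[2:4, ]                # second to fourth rows
df[, c(1, 3)]            # first and third columns
df[, 2:3]                # second to third columns

# Alternative not shown in the text: access a column with $
df$Age                   # 34 65 45 23 34
```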
Array
While matrices are confined to two dimensions, arrays can have any number of dimensions. In fact,
vectors, lists and factors can be thought of as one-dimensional arrays. Similarly, matrices are
two-dimensional arrays.
Creating arrays
Marks of 4 students in 3 subjects recorded for two terminal examinations can be presented in the form
of a 3-dimensional array made up of two 4 x 3 matrices, as follows:
a) > ar1 = array(c(24,65,76,54,34,56,67,67,78,78,76,56,47,84,57,63,35,45,67,89,87,56,34,23),
dim = c(4,3,2))
b) > term1 = matrix(c(24,65,76,54,34,56,67,67,78,78,76,56), nrow = 4, ncol = 3)
> term2 = matrix(c(47,84,57,63,35,45,67,89,87,56,34,23), nrow = 4, ncol = 3)
> ar2 = array(c(term1, term2), dim = c(4,3,2))
c) > sub11 = c(24,65,76,54)
> sub21 = c(34,56,67,67)
> sub31 = c(78,78,76,56)
> sub12 = c(47,84,57,63)
> sub22 = c(35,45,67,89)
> sub32 = c(87,56,34,23)
> ar3 = array(c(sub11, sub21, sub31, sub12, sub22, sub32), dim = c(4,3,2))
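As a small check, the sketch below builds the array by methods a) and b) and confirms that both give the same result, with each term stored as one 4 x 3 layer.

```r
marks <- c(24,65,76,54,34,56,67,67,78,78,76,56,
           47,84,57,63,35,45,67,89,87,56,34,23)

# Method a): one vector of 24 values
ar1 <- array(marks, dim = c(4, 3, 2))

# Method b): one matrix per term, then combined
term1 <- matrix(c(24,65,76,54,34,56,67,67,78,78,76,56), nrow = 4, ncol = 3)
term2 <- matrix(c(47,84,57,63,35,45,67,89,87,56,34,23), nrow = 4, ncol = 3)
ar2 <- array(c(term1, term2), dim = c(4, 3, 2))

dim(ar1)                      # 4 3 2
identical(ar1, ar2)           # TRUE
identical(ar1[, , 2], term2)  # TRUE: the second layer is the term 2 matrix
```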
Manipulating Arrays
To give names to the rows, columns and matrices (layers) of the above array, the 'dimnames' argument is used:
a)
> ar4 = array(c(24,65,76,54,34,56,67,67,78,78,76,56,47,84,57,63,35,45,67,89,87,56,34,23), dim=c(4,3,2),
dimnames=list(c("Stud1", "Stud2", "Stud3","Stud4"),c("Sub1", "Sub2", "Sub3"),c("Term1", "Term2")))
b)
c)
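The names can also be attached after the array has been created. The following is a sketch of one such alternative, using the base R function 'dimnames()'. Once names are set, elements can be selected by name as well as by position.

```r
ar4 <- array(c(24,65,76,54,34,56,67,67,78,78,76,56,
               47,84,57,63,35,45,67,89,87,56,34,23),
             dim = c(4, 3, 2))

# Attach names to an existing array
dimnames(ar4) <- list(c("Stud1", "Stud2", "Stud3", "Stud4"),
                      c("Sub1", "Sub2", "Sub3"),
                      c("Term1", "Term2"))

dimnames(ar4)[[3]]             # "Term1" "Term2"
ar4["Stud2", "Sub3", "Term1"]  # 78, same element as ar4[2, 3, 1]
ar4["Stud2", , "Term1"]        # Sub1 65, Sub2 56, Sub3 78
```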
To display marks of "Stud2" in "Sub3" in "Term1"
ar4[2, 3, 1]
To display marks of "Stud2" in all subjects in the first term
ar4[2, , 1]
To display average mark of "Stud2" (of all subjects) in first term
mean(ar4[2, , 1])
To display marks of all students in "Sub3" in "Term2"
ar4[ , 3, 2]
To display average mark of all students in "Sub3" in "Term2"
mean(ar4[ , 3, 2])
To display all marks of "Term1"
ar4[ , , 1]
To display sum of all marks in "Term1"
sum(ar4[ , , 1])
To display the average mark of each student (over all subjects and both terms)
apply(ar4, c(1), mean)
To display the average mark in each subject (over all students and both terms)
apply(ar4, c(2), mean)
To display the average mark in each term (over all students and all subjects)
apply(ar4, c(3), mean)
To display the grand average of all marks (all students, all subjects, both terms)
mean(ar4)
To display the average mark of each student over all subjects in "Term1"
apply(ar4[ , , 1], c(1), mean)
To display the average mark in each subject over all students in "Term1"
apply(ar4[ , , 1], c(2), mean)
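The second argument of 'apply()' selects the dimension that is kept: 1 for students, 2 for subjects and 3 for terms. The sketch below shows the length of each result and checks the per-term averages against the term totals. The result names 'per_student', 'per_subject', 'per_term' and 'grand' are chosen here for illustration.

```r
ar4 <- array(c(24,65,76,54,34,56,67,67,78,78,76,56,
               47,84,57,63,35,45,67,89,87,56,34,23),
             dim = c(4, 3, 2),
             dimnames = list(c("Stud1", "Stud2", "Stud3", "Stud4"),
                             c("Sub1", "Sub2", "Sub3"),
                             c("Term1", "Term2")))

per_student <- apply(ar4, 1, mean)  # 4 values, one per student
per_subject <- apply(ar4, 2, mean)  # 3 values, one per subject
per_term    <- apply(ar4, 3, mean)  # 2 values, one per term
grand       <- mean(ar4)            # single value over all 24 marks

names(per_term)       # "Term1" "Term2"
sum(ar4[, , 1])       # 731, so the Term1 average is 731 / 12
sum(ar4[, , 2])       # 687, so the Term2 average is 687 / 12

# Within one term, margin 1 gives the same result as rowMeans()
all.equal(apply(ar4[, , 1], 1, mean), rowMeans(ar4[, , 1]))  # TRUE
```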