
Website for downloading R: www.r-project.org
Choose "Download / CRAN"

Working with Scalars


Basic Data Types
Numeric
Character
Logical
Complex
Displaying Scalar Values
Displaying numeric values
print(47)
print(47.5)
print(35 + 56)

Displaying textual values


print ("rabi")
print(rabi)   # error unless an object named rabi exists; text must be quoted
print("Rabi is working")

Displaying logical values


print(TRUE)
print(FALSE)
print(3>2)
print(47.1==47.2)
print(47.1=47.2)   # not allowed: '=' is not a comparison, use '=='

Displaying Complex Values


print(3+2i)
print ( (3 + 2i) + (6 + 7i))
Using Memory with Scalar Values
Storing Scalar Values in Memory as Variables
aa=7
bb<-56.6
cc<-"My Nepal"
dd<-TRUE
ee<-3 + 2i

Displaying values in variables


> print(aa)
> print(bb)
> print(cc)
> print(dd)
> print(ee)
Note: The value in a variable can also be displayed simply by typing its name, such as
> aa
> bb
> cc
> dd
> ee
To display the type of a variable, use the 'class()' function or the 'mode()' function
> class(aa)
> class(bb) ....
To display a combination of text and variable values, use the 'cat()' function
E.g. > cat("The value in variable 'aa' is",aa,"\n")
> cat("The type of variable dd is",class(dd),"\n")
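A short sketch of how 'class()' and 'mode()' report the four basic scalar types listed earlier. The variable names here are illustrative only and are not part of the notes.

```r
# Illustrative scalars, one per basic data type
num_val  <- 56.6
chr_val  <- "My Nepal"
lgl_val  <- TRUE
cplx_val <- 3 + 2i

# Collect what class() and mode() report for each value
classes <- c(class(num_val), class(chr_val), class(lgl_val), class(cplx_val))
modes   <- c(mode(num_val), mode(chr_val), mode(lgl_val), mode(cplx_val))
print(classes)
print(modes)

# cat() joins text and values; "\n" ends the output line
cat("The type of variable cplx_val is", class(cplx_val), "\n")
```

For these four types the two functions give the same answer; they can differ for other objects (for example, whole numbers created with 1:5 have class "integer" but mode "numeric").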

R Objects
R supports object-oriented programming (OOP) and has many built-in types of objects. Some common
R objects used to handle data are
Vectors
List
Matrices
Factors
Data Frames
Arrays

Vectors
A vector is an ordered collection of one or more values, all of the same data type. It is the simplest type of R object.
All variables created so far are vectors containing one element (member).
Creating Vectors
Using 'c( )' function
> ab <- c(23,35,56)
> ab
> ac <- c("Nepal","India","China")
> ac
> ad <- c(TRUE,TRUE,2<3,0==0)
> ad
> ae <- c(3+4i,7+2i)
> ae

Using 'assign()' function


> assign("a", 7)
> assign("b", c(1,2,3,4))
Creating arithmetic sequences
a) Using ' : ' Operator
> 1:30
> ba <- 1:30
> baa <- 5.6:12.6
> bab <- 5.6:12.7

b) Using 'seq()' function


Syntax: seq(start_value, end_value, increment)
> bb <- seq(3,54)     # same as bb <- seq(3,54,1)
> bc<-seq(3,54,2)
> bd<-seq(1,10,0.5)
> be <-seq(50,0,-5)
> bf<-seq(10,100, by = 10)
> bg<-seq(length = 40, from = 4.6, by = 0.2)
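As a rough check of the sequence tools above, the sketch below builds the same sequence in three ways and looks at what ':' does with non-integer end points. The names s1, s2, s3 and n_colon are illustrative. Note that 'length.out' is the full name of the argument abbreviated as 'length' above.

```r
# Three equivalent ways to build 10, 20, ..., 100
s1 <- seq(10, 100, by = 10)
s2 <- seq(from = 10, by = 10, length.out = 10)
s3 <- (1:10) * 10
print(s1)

# ':' always steps by 1 from the start value and stops
# at the last value that does not exceed the end value
n_colon <- length(5.6:12.7)   # 5.6, 6.6, ..., 12.6
print(n_colon)
```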
Accessing Vector Elements
a) Using position of element
> rb = c("Violet","Indigo","Blue","Green","Yellow","Orange","Red")
> rb[2]
> rb[c(2, 5)]
> rb[2:5]
> rc = rb[3]
> rc
> rd = rb[c(3,4,7)]
> rd

b) Using logical indexing


> re = rb[c(TRUE, TRUE, FALSE,FALSE,TRUE,TRUE,FALSE)]
c) Using negative indexing
> rf = rb[c(-3,-4,-6)]
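The logical index in (b) does not have to be typed by hand. A minimal sketch, assuming we want the colours whose names are longer than four characters ('nchar()' counts characters; the name 'long_names' is illustrative):

```r
rb <- c("Violet", "Indigo", "Blue", "Green", "Yellow", "Orange", "Red")

# A condition on the vector produces the TRUE/FALSE index automatically
is_long <- nchar(rb) > 4
long_names <- rb[is_long]
print(long_names)

# Negative positions drop elements instead of selecting them
rf <- rb[c(-3, -4, -6)]
print(rf)
```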
Carrying Mathematical Operations on Vectors
Carrying mathematical operations on single vector
ma = c(34,65,76,21,23)
maa = ma + 2
mab = ma - 2
mac = ma * 2
mad = ma/2
mada = 1/ma
mae = ma^2
maf = ma^(1/2)
mag = ma^(1/3)
Carrying mathematical operations on two vectors
mb = c(12,54,23, 45,32)
mc = ma + mb
md = ma - mb
me = ma * mb
mf = ma/mb
mg = c(1,2,3,2,1)
mh = mb ^ mg
Note: If the two vectors differ in length, R recycles the values of the shorter vector while carrying
out the operation. If the length of the longer vector is an integer multiple of the length of the
shorter one, this happens silently; otherwise R still computes a result but issues a warning. E.g.
> m1 = c(2, 6, 7)
> m2 = c(3, 6, 8, 7, 4, 5)
> m3 = m1 + m2
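The recycling in this example can be checked directly. The sketch below also shows the case where the lengths are not multiples of each other; the handler that records the warning and the names 'm4' and 'warned' are illustrative additions.

```r
m1 <- c(2, 6, 7)
m2 <- c(3, 6, 8, 7, 4, 5)

# m1 is reused as 2, 6, 7, 2, 6, 7 to match the length of m2
m3 <- m1 + m2
print(m3)

# Lengths 3 and 2: R still recycles, but signals a warning
warned <- FALSE
m4 <- withCallingHandlers(
  c(1, 2, 3) + c(10, 20),
  warning = function(w) {
    warned <<- TRUE                 # remember that a warning occurred
    invokeRestart("muffleWarning")  # keep the console output clean
  }
)
print(m4)
```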
Displaying Statistical Values of Elements in Vector
mb = c(12,54,23, 45,32)
> mi = mean(mb)
> mi
> mj = var(mb)
> mj
> mk = sum(mb)
> mk
> ml = prod(mb)
> ml
> mm = sqrt(mb)
> mm
> mn = length(mb)
> mn
> mo = min(mb)
> mp = max(mb)
> mq = sort(mb)
Working on logical and relational operators with vectors
> a = c(1:5)
> b = a > 3
> b     # Output: FALSE FALSE FALSE TRUE TRUE
> a == 3
> a != 3
> TRUE & TRUE
> TRUE & FALSE
> FALSE & TRUE
> FALSE & FALSE
> TRUE | TRUE
> TRUE | FALSE
> FALSE | TRUE
> FALSE | FALSE
> ! FALSE
> ! TRUE
> ! (TRUE & FALSE)
Mode and Length Attribute of Vectors
To display the type of elements in vector use 'mode(vector_name)'
To display the number of elements in vector use 'length(vector_name)'
Lists
Introduction
A vector contains elements of the same type. A list is similar to a vector, but it may contain elements of
different types.
Creating List
E.g.
> a1 = list("Rabi", 23, 54.5)
> a1
A list may contain vectors, e.g.
> aa = list(c(2, 3, 4), 21.4, TRUE)
> aa
A list may also contain the result of a function call. E.g.
> bb = list( c(2, 3, 4), 21.4, sqrt(c(23, 45, 32)))
> bb
A list may also contain another list. E.g.
> cc = list(c("Rara", "Phewa", "Begnas"), aa)
> cc
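The notes do not show how to read values back out of a list, so the following is only a brief sketch of the usual approach: double brackets '[[ ]]' return the element itself, while single brackets '[ ]' return a shorter list. The name 'nested' is illustrative.

```r
a1 <- list("Rabi", 23, 54.5)

second_value <- a1[[2]]   # the number 23 itself
second_piece <- a1[2]     # a list of length one that holds 23

# A list that contains a vector and another list
nested <- list(c("Rara", "Phewa", "Begnas"), list(c(2, 3, 4), 21.4, TRUE))
lake  <- nested[[1]][2]         # second element of the character vector
inner <- nested[[2]][[1]][3]    # third element of the vector inside the inner list
print(lake)
print(inner)
```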

Factors
Introduction

A factor is an R data type that stores categorical variables. Such data types are widely used in
statistical modeling.
A variable is said to be categorical if its values are not all different, but each is one of two or more
fixed types.
For example, a variable recording gender in these notes takes one of two values: male and female.
A variable recording blood group takes one of four values: A, B, AB and O.
A variable recording GPA grade takes one of the values A, B, C, D, E and F.
Here the gender and blood group variables have no intrinsic ordering, whereas the GPA grade has an
intrinsic ordering. A categorical variable that has no intrinsic ordering is said to be a nominal
variable.
Creating Factors of Nominal Category
Suppose the genders of 6 consecutive customers entering a restaurant are observed to be "male, male,
female, male, female, male"
To store these values as a factor
> factor(c("Male", "Male", "Female", "Male", "Female", "Male"))
Or,
gen_fact = factor(c("Male", "Male", "Female", "Male", "Female", "Male"))
Or,
gen = c("Male", "Male", "Female", "Male", "Female", "Male")
gen_fact = factor(gen)

Interpreting Values in Factors


The distinct values that occur in the data used to create a factor are called 'levels'. The names of these
levels are displayed when the factor is printed.
By default, the levels are sorted alphabetically.
For example, the blood groups of 11 patients admitted to a hospital on one day are recorded and
converted into a factor below
> bg = factor(c("A","B","A","AB","A","O","O","A","AB","B","B"))
> bg
R stores a factor as a vector of integers, one for each element, together with the names of the levels.
By default the integers are assigned to the levels in alphabetical order.
To display the integers corresponding to the elements of a factor, the structure function
'str()' is used. E.g.
> str(bg)
R uses these integers internally in place of the text of each element.
By default, the integers are assigned according to the alphabetical ordering of the levels.
However, we can specify our own order of the levels, and hence our own integer values, by using the
'levels' parameter inside the 'factor()' function.
E.g.
> bg = factor(c("A","B","A","AB","A","O","O","A","AB","B","B"), levels = c("O","A","B","AB"))
> str(bg)
Here, O group is given value 1, A group 2, B group 3 and AB group 4.
The number of levels in a factor can be accessed with 'nlevels' function. E.g.
print(nlevels(gen_fact))
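To see the levels and the underlying integer codes without reading them off the 'str()' output, one option is the sketch below. It uses the base functions 'levels()', 'as.integer()' and 'table()'; the variable names are illustrative.

```r
bg_values <- c("A", "B", "A", "AB", "A", "O", "O", "A", "AB", "B", "B")

bg_default <- factor(bg_values)                                  # alphabetical levels
bg_custom  <- factor(bg_values, levels = c("O", "A", "B", "AB")) # our own order

print(levels(bg_default))      # level names in alphabetical order
print(as.integer(bg_default))  # integer code of each element
print(as.integer(bg_custom))   # codes change with the level order

# table() counts how many elements fall in each level
counts <- table(bg_default)
print(counts)
```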
Types of Categorical Variables
The categorical variables we have encountered so far do not have any intrinsic ordering. They are
called nominal variables.
In some categorical variables the levels have a specific ordering. For example, the
economic status of citizens can be categorized as low, medium, high. Here the levels have a
natural ordering. Such categorical variables are said to be 'ordinal'.
In fact, there are four types of categorical variable. They are:
a) Nominal Variable
b) Ordinal Variable
c) Interval Variable
d) Ratio Variable
Ordinal Variables
To create an ordinal variable, the 'ordered' argument is set to 'TRUE' while creating the factor. E.g.
> tshirt_size = c("Large","Small","Large","Large","Medium","Small","Large")
> ts_fact = factor(tshirt_size, ordered = TRUE)

If one views the structure of this factor by using the 'str()' function, then according to alphabetical order
"Large" is given value 1, "Medium" is given value 2 and "Small" is given value 3.
To give values 1, 2 and 3 to "Small", "Medium" and "Large", one can use the 'levels' argument of the
'factor()' function, as
> ts_fact = factor(tshirt_size, ordered = TRUE, levels = c("Small","Medium", "Large"))
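One practical consequence of an ordered factor is that its elements can be compared. A minimal sketch, assuming the level order Small < Medium < Large given above (the names 'bigger', 'largest' and 'at_least_medium' are illustrative):

```r
tshirt_size <- c("Large", "Small", "Large", "Large", "Medium", "Small", "Large")
ts_fact <- factor(tshirt_size, ordered = TRUE,
                  levels = c("Small", "Medium", "Large"))

# Comparisons follow the level order, not the alphabet
bigger <- ts_fact[1] > ts_fact[2]           # "Large" versus "Small"
largest <- max(ts_fact)                     # highest level present
at_least_medium <- sum(ts_fact >= "Medium") # how many are Medium or Large
print(bigger)
print(largest)
print(at_least_medium)
```

The same comparisons on an unordered factor are not meaningful; R returns NA with a warning.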
Categorical variables that are defined over ranges of values are called interval variables. For example:
(a) age groups (0-10, 10-20, 20-30, etc.) (b) income groups ($100-500, $600-1000, etc.)
Interval and ratio variables are not described further here.
Accessing Elements in Factors
To access a specific element of a factor, one can use 'factor_name[position]'. E.g.
> ts_fact[2]
> ts_fact[c(1,3,4)]

Matrix
A matrix is a two dimensional rectangular data set in which data values are arranged into rows and
columns.
Creating Matrices
Creating matrix of integers
> m2 = matrix(c(12, 43, 43, 23,34, 26), nrow = 2, ncol= 3)     # filled column by column
> m2 = matrix(c(12, 43, 43, 23,34, 26), nrow = 2, ncol= 3, byrow=TRUE)     # filled row by row

Creating matrix of texts


> m3 =matrix(c("Kavre","Kathmandu","Nuwakot","Sunsari","Morang","Jhapa"), nrow=3,ncol=2)

Creating matrix of sequence of integers


> m = matrix(1:10, nrow=2, ncol=5)
> m1 = matrix(1 : 10, nrow = 2, ncol = 5, byrow = TRUE)

Creating matrix of same element repeatedly


> m4 = matrix(0, nrow=2, ncol =3)
> m5 = matrix(c(2, 5), nrow = 2, ncol=3)

Specifying Columns and Row Headings


Row and column headings are provided to a matrix by using 'dimnames' attribute in 'matrix' function by
specifying them as a list of two vectors. E.g.
Marks of two students "Rajan" and "Hari" in three subjects "Math", "Science" and "Computer" are
stored in a matrix and headings are provided below-
> mm1 = matrix(c(34, 57,54,76,57,87), nrow=2, ncol = 3, dimnames =
list(c("Rajan","Hari"),c("Math","Science","Computer")))
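Once headings are attached, elements can be selected by name as well as by position. The sketch below rebuilds the same marks matrix and reads a few values from it; 'rowSums()' and 'colMeans()' are base R helpers that are not otherwise covered in these notes.

```r
mm1 <- matrix(c(34, 57, 54, 76, 57, 87), nrow = 2, ncol = 3,
              dimnames = list(c("Rajan", "Hari"),
                              c("Math", "Science", "Computer")))

# Values fill column by column, so the first column is Math for Rajan, then Hari
hari_science <- mm1["Hari", "Science"]
totals   <- rowSums(mm1)    # total marks of each student
averages <- colMeans(mm1)   # average mark in each subject
print(mm1)
print(totals)
print(averages)
```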
Accessing Elements in Matrices
Accessing particular element
> m2 = matrix(c(12, 43, 43, 23,34, 26), nrow = 2, ncol= 3)
> m2[2, 3]
> n1 = m2[2, 3]

Accessing elements in a row(s), column(s)


> m2[1, ]
> m2[2, ]
> m2[ , 2]

Accessing diagonal elements


> m3 = matrix(1 : 25, nrow = 5, ncol = 5)
> diag(m3)
Matrix Manipulation
> a = matrix(1:10, nrow=2, ncol=5)
> a
> b = matrix(-5:4, nrow = 2, ncol = 5)
> b
> a + b     # addition of matrices
> c = a - b     # subtraction of matrices
> 5 * a     # multiplication by a scalar
> a * b     # product of corresponding elements
> a / b     # element-by-element division
> 1 / a     # reciprocal of each element
Real matrix manipulation
> a = matrix(1:10, nrow=2, ncol=5)
> b = matrix(-5: 4, nrow=5, ncol=2)
> c = a %*% b     # real matrix multiplication; 'c' is a 2 x 2 square matrix
> eigen(c)     # eigenvalues and eigenvectors of matrix 'c'
> eigen(c)$values
> eigen(c)$vectors
> det(c)     # displays determinant of matrix 'c'
> solve(c)     # displays inverse of matrix 'c'
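Note that 'eigen()', 'det()' and 'solve()' need a square matrix. A small sketch, assuming the two matrices defined above, confirms that their product is square and that the inverse behaves as expected. The names 'p', 'p_inv' and 'check' are illustrative.

```r
a <- matrix(1:10, nrow = 2, ncol = 5)
b <- matrix(-5:4, nrow = 5, ncol = 2)

p <- a %*% b          # (2 x 5) times (5 x 2) gives a 2 x 2 matrix
print(p)
print(det(p))         # a non-zero determinant means p can be inverted

p_inv <- solve(p)     # inverse of p
check <- p_inv %*% p  # should be the 2 x 2 identity, up to rounding error
print(round(check, 10))
```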
Statistical manipulation of elements in matrix
> sqrt(a)
> sum(a)
> mean(a)
> sum(a[1, ])
> sum(a[2, ])
> mean(a[2, ])
> c = matrix(1 : 25, nrow = 5, ncol = 5)
> sum(diag(c))
> mean(diag(c))
> p1 = diag(c)
> mean(p1)

Data Frame

Introduction
R is a statistical programming language, and in statistics we work with datasets. A dataset typically
comprises observations. Each observation consists of several variables, which may be of different types.
For example - .........
In a dataset, different observations are stored in different rows. Each of these observations
has specific attributes, e.g. name, age, gender, score, etc. Since there will be many observations, each
attribute is placed in its own column of the dataset.
So, a dataset is similar to a matrix, since it is a two-dimensional arrangement of rows and columns.
However, a matrix can contain data of only one type, whereas a dataset needs each observation to contain
data of one or more different types.
In fact, a list can represent a single observation (row) of a dataset, and a dataset could also be created
as a list of lists.
However, R provides a special object for storing datasets: the data frame.
A data frame is the fundamental data structure that stores datasets.
In a data frame, each column contains elements of the same data type and represents one attribute
of the observations. Data for the same attribute of different observations are placed in the same
column of the data frame. In the same way, each row contains the elements belonging to one particular
observation.
Creating data frame
A data frame is created by using the 'data.frame()' function.
Let us create a data frame containing three columns (or vectors) named name, age, and male, each
containing five observations.
> name = c("Roni", "Rabi", "Sunita", "Arjun", "Mani")
> age = c(34, 65, 45, 23, 34)
> male = c(TRUE, TRUE,FALSE,TRUE,TRUE)
> df = data.frame(name, age, male)
> df
Labeling Variables of Data Frame
To provide clear descriptive labels to the variables, i.e., columns, the 'names()' function is used as
> names(df) = c("Name-of-Student", "Age", "Male")
An alternative method is to give the labels while creating the data frame. A label that contains hyphens
must be quoted, and 'check.names = FALSE' is needed to stop R from changing it to "Name.of.Student".
> df = data.frame("Name-of-Student" = name, Age = age, Male = male, check.names = FALSE)
In the same way, different rows of observations can also be named. (Later)
To View Structure of Data Frame
To view the structure of the data frame, 'str()' function is used.
E.g.
> str(df)
To Access Elements of Data Frame
a) By Treating Data Frame as Matrix
To access the age of the third person (age is in the second column of the data frame)
> df[3, 2]
> df[3, "Age"]     # 'Age' is the label of the variable 'age'
To display all records of third person
> df[ 3 , ]
To display names of all students, i.e., first column-
> df [ , 1]
> df[ , "Name-of-Student"]
To display the data in third and fifth row
> df[ c(3,5), ]
To display the data from second to fourth row
> df[ 2:4, ]
To display the first and third columns, i.e., the name and male columns, for all observations
> df[ , c(1,3)]
To display the second and third columns for all observations
> df[ , 2:3]

b) By Treating Data Frame as List of Vectors

All of the above commands treat the data frame as a matrix.
Alternatively, a data frame can also be treated as a list of vectors, where each vector corresponds to a
particular variable or column.
In this method the columns of a data frame can be accessed as follows
To display all data in 'age' columns
> df $ Age
> df $ Male
> df [["Age"]]
> df[[ 2]]
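A small sketch showing that these access forms return the same column vector. The data frame is rebuilt here with the labels Name, Age and Male so that the example is self-contained; the label 'Name' is a shortened, illustrative stand-in.

```r
df <- data.frame(Name = c("Roni", "Rabi", "Sunita", "Arjun", "Mani"),
                 Age  = c(34, 65, 45, 23, 34),
                 Male = c(TRUE, TRUE, FALSE, TRUE, TRUE))

by_dollar   <- df$Age        # column selected with '$'
by_name     <- df[["Age"]]   # column selected by label
by_position <- df[[2]]       # column selected by position
print(by_dollar)

# Seen as a list, the data frame has one element per column
n_columns <- length(df)
print(n_columns)
```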
To Add New Row(s) and Column(s) to Data Frame
To add a new column named 'height' with values 132, 143, 214, 245, 243
> ht = c(132, 143, 214, 245, 243)     # creating a vector of heights
> df $ height = ht
Or,
> df[["height"]] = ht
Another way is to use the 'cbind()' function as follows
> wt = c(35, 56, 54, 46, 54)
> df = cbind(df, wt)
To add a new row to the data frame, we create another data frame containing the rows to be added, with
the same column labels, and combine the two with the 'rbind()' function.
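A minimal sketch of adding rows with 'rbind()'. The two new observations ("Gita" and "Bikash") are invented purely for illustration, and the column labels Name, Age and Male are assumed; both data frames must use the same labels.

```r
df <- data.frame(Name = c("Roni", "Rabi", "Sunita", "Arjun", "Mani"),
                 Age  = c(34, 65, 45, 23, 34),
                 Male = c(TRUE, TRUE, FALSE, TRUE, TRUE))

# Hypothetical rows to be added, with matching column labels
new_rows <- data.frame(Name = c("Gita", "Bikash"),
                       Age  = c(29, 41),
                       Male = c(FALSE, TRUE))

# rbind() stacks the rows of the second data frame under the first
df2 <- rbind(df, new_rows)
print(df2)
```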
Sorting and Ordering in Data Frame
Sorting and Ordering observations in data frame
To sort ages in ascending order
> sort(df $ Age)
To get the positions (row numbers) that would arrange the ages in ascending order
> order(df $ Age)
Or,
> rnk = order(df $ Age)
The first value in 'rnk' is the position of the smallest age, the second is the position of the next
smallest age, and so on.
To display the entire data in the order given by 'rnk'
> df[rnk, ]
To order in descending order and display the entire observations
> df[ order(df $ Age, decreasing = TRUE), ]
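The difference between 'sort()', 'order()' and the separate base function 'rank()' is easy to mix up, so here is a short sketch using the ages from the example. The column labels Name and Age are assumed, and 'rank()' is shown only for comparison.

```r
df <- data.frame(Name = c("Roni", "Rabi", "Sunita", "Arjun", "Mani"),
                 Age  = c(34, 65, 45, 23, 34))

sorted_age <- sort(df$Age)    # the ages themselves, smallest first
positions  <- order(df$Age)   # row numbers that put the ages in ascending order
ranks      <- rank(df$Age)    # rank of each age; tied ages share an average rank

# Re-ordering the rows with order() keeps each name next to its age
oldest_first <- df[order(df$Age, decreasing = TRUE), ]
print(positions)
print(ranks)
print(oldest_first)
```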

Data from External Files


In practice, the data required for statistical analysis are not entered into R directly, as we have done
with data frames so far, but are usually imported from other sources, such as text files, spreadsheets
(e.g. Excel), databases (e.g. SQL, Access) or statistical packages (e.g. SPSS).
Reading Data From Text Files/ Spreadsheets
There are two common ways of creating data files in a text editor: first, by separating the values in a
row with a comma ",", and second, by separating them with "Tab". In both cases, different rows or
observations in the data file are separated by pressing the "Enter" key.
Data files created by using commas are commonly called comma-separated files (or .CSV files) and those
using "Tab" are called tab-delimited text files (or .TXT files).
Opening '.csv' file -
> read.csv(file = ..........)
Another way is to use 'file.choose()' function to display list of files to open as
> read.csv( file.choose())
To open '.txt' file -
> read.delim(file.choose())
Or,
> read.delim(file = .......... )
The general way to open a data file is to use 'read.table()' function as -
> read.table(file=......., header = TRUE, sep = ",")
> read.table(file=......., header = TRUE, sep = "\t")
Or,
> read.table( file.choose(), header = TRUE, sep = "," )
> read.table( file.choose(), header = TRUE, sep = "\t" )
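Because 'file.choose()' needs a person to pick a file, the following sketch instead writes a tiny CSV file to a temporary location and reads it back, to show that 'read.csv()' and 'read.table()' with 'sep = ","' give the same data frame. The file contents and the names 'marks' and 'csv_path' are invented for illustration.

```r
# A small illustrative dataset written out as a comma-separated file
marks <- data.frame(Name = c("Rajan", "Hari"),
                    Math = c(34, 57),
                    Science = c(54, 76))
csv_path <- tempfile(fileext = ".csv")
write.csv(marks, csv_path, row.names = FALSE)

# Two ways of reading the same file
from_csv   <- read.csv(csv_path)
from_table <- read.table(csv_path, header = TRUE, sep = ",")
print(from_csv)

unlink(csv_path)   # remove the temporary file
```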
Manipulating Data in Data Frames
To work with imported data table
> bb = read.table( file.choose(), header = TRUE, sep = "," )
> mean(bb $ Math)
> sum(bb $ Math)
> sqrt(bb $ Math)
> mean(bb) will not work, since 'mean()' expects a numeric vector and 'bb' is a whole data frame.
To refer to the columns of 'bb' directly by their names, one can use
> attach(bb)     # adds the data frame to R's search path
Once the 'attach()' function has been used on a data frame, it is not necessary to use the '$' operator
to refer to any variable in it. E.g.
> mean(Math)
> dim(bb)     # displays the number of rows and columns in the data frame
> bb[c(1,3), ]     # displays the first and the third observations
> bb[2:3, ]
> bb[-(2:3), ]     # all except second and third row
> bb[ , 4]
> bb[ , "Math"]
> bb[ , c("Math","Science")]
> bb [ , c(4,5)]
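As a hedged alternative to 'attach()', the base function 'with()' evaluates an expression using the columns of a data frame without changing the search path, and 'sapply()' can apply 'mean()' to several numeric columns at once. The small data frame below stands in for an imported file and its values are invented.

```r
# Stand-in for an imported data table (illustrative values)
bb <- data.frame(Name = c("Rajan", "Hari", "Sita"),
                 Math = c(34, 57, 80),
                 Science = c(54, 76, 65))

# Column names are visible inside with(), so no '$' is needed
math_mean <- with(bb, mean(Math))

# Mean of each numeric column, one call per column
col_means <- sapply(bb[ , c("Math", "Science")], mean)
print(math_mean)
print(col_means)
```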
To display a summary of the data frame
> summary(bb)
For numeric data it displays the minimum, quartiles, median, mean and maximum. If there are categorical
data stored as factors, then it displays the counts of the different categories.
Filtering Data in Data Frames
One common way of filtering data in data frame is to split data frame.
To split a data frame into two or more data frames according to some categorical data, e.g. Gender
> maledata = bb[bb$Gender=="male", ]
> femdata = bb[bb$Gender=="female", ]
> maledata
> femdata
> dim(maledata)
> summary(maledata)
> femdata[1:3,]
> mean(bb$Age[bb$Gender=="female"])     # displays the mean age of females only
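The same filtering can be tried on a small made-up data frame. The sketch also shows two base R helpers not used above: 'split()', which produces all the per-category data frames in one step, and 'tapply()', which computes a statistic per category. The values are invented for illustration.

```r
bb <- data.frame(Gender = c("male", "female", "female", "male", "female"),
                 Age    = c(34, 28, 45, 23, 38))

# Filtering by a condition, as in the notes
femdata <- bb[bb$Gender == "female", ]
fem_mean <- mean(femdata$Age)

# split() returns a list with one data frame per category
by_gender <- split(bb, bb$Gender)

# tapply() applies mean() to Age within each Gender
group_means <- tapply(bb$Age, bb$Gender, mean)
print(fem_mean)
print(group_means)
```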

Array
While matrices are confined to two dimensions, arrays can have any number of dimensions. In this sense,
vectors, lists and factors can be regarded as one-dimensional, and matrices are two-dimensional arrays.
Creating arrays
Marks of 4 students in 3 subjects recorded for two terminal examinations can be presented in the form
of a 3-dimensional array, i.e. as two 4 x 3 matrices, in any of the following ways:
a) > ar1 = array(c(24,65,76,54,34,56,67,67,78,78,76,56,47,84,57,63,35,45,67,89,87,56,34,23),
dim=c(4,3,2))
b) > term1 = matrix(c(24,65,76,54,34,56,67,67,78,78,76,56), nrow=4, ncol=3)
> term2 = matrix(c(47,84,57,63,35,45,67,89,87,56,34,23), nrow = 4, ncol = 3)
> ar2 = array(c(term1, term2), dim = c(4,3,2))
c) > sub11 = c(24,65,76,54)
> sub21 = c(34,56,67,67)
> sub31 = c(78,78,76,56)
> sub12 = c(47,84,57,63)
> sub22 = c(35,45,67,89)
> sub32 = c(87,56,34,23)
> ar3 = array(c(sub11, sub21, sub31, sub12, sub22, sub32), dim=c(4,3,2))
Manipulating Arrays
To provide names for the rows, the columns and the matrices (third dimension) of the above array:
> ar4 = array(c(24,65,76,54,34,56,67,67,78,78,76,56,47,84,57,63,35,45,67,89,87,56,34,23), dim=c(4,3,2),
dimnames=list(c("Stud1", "Stud2", "Stud3","Stud4"),c("Sub1", "Sub2", "Sub3"),c("Term1", "Term2")))
To display marks of "Stud2" in "Sub3" in "Term1"
ar4[2, 3, 1]
To display marks of "Stud2" in all subjects in the first term
ar4[2, , 1]
To display average mark of "Stud2" (of all subjects) in first term
mean(ar4[2, , 1])
To display marks of all students in "Sub3" in "Term2"
ar4[ , 3, 2]
To display average mark of all students in "Sub3" in "Term2"
mean(ar4[ , 3, 2])
To display all marks of "Term1"
ar4[ , , 1]
To display sum of all marks in "Term1"
sum(ar4[ , , 1])
To display the average mark of each student over all subjects and both terms
apply(ar4, c(1), mean)
To display the average mark in each subject over all students and both terms
apply(ar4, c(2), mean)
To display the average mark in each term over all students and all subjects
apply(ar4, c(3), mean)
To display the average mark of each student over all subjects in "Term1"
apply(ar4[ , , 1], c(1), mean)
To display the average mark in each subject over all students in "Term1"
apply(ar4[ , , 1], c(2), mean)
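The second argument of 'apply()' names the dimension (or dimensions) to keep: 1 for students, 2 for subjects, 3 for terms. The sketch below rebuilds the array and, as an extra illustration not in the list above, keeps two dimensions at once to get each student's average in each term.

```r
ar4 <- array(c(24,65,76,54,34,56,67,67,78,78,76,56,
               47,84,57,63,35,45,67,89,87,56,34,23),
             dim = c(4, 3, 2),
             dimnames = list(c("Stud1", "Stud2", "Stud3", "Stud4"),
                             c("Sub1", "Sub2", "Sub3"),
                             c("Term1", "Term2")))

per_student <- apply(ar4, 1, mean)   # one average per student
per_term    <- apply(ar4, 3, mean)   # one average per term

# Keeping dimensions 1 and 3 gives a students x terms table of averages
per_student_term <- apply(ar4, c(1, 3), mean)

grand_mean <- mean(ar4)              # single average over all 24 marks
print(per_student)
print(per_term)
print(per_student_term)
```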
