Mmsac FDP Tutorial
Two-week FDP on
“Mathematical Modelling and Statistical Analysis in Computation”
4th June to 13th June 2015
R Tutorial and Hands-on Session
1. What is R Programming
R is one of the most popular platforms for data analysis and visualization. R is free and
open source software.
It has versions for Windows, Mac OS X and Linux.
R is a free programming language and software environment for statistical
computation and graphics. (Ref.: Wikipedia)
It is widely used by statisticians and data miners for data analytics and mining.
The R language is an implementation of the S programming language, which was created by John Chambers at
Bell Labs.
R was created by Ross Ihaka and Robert Gentleman at the University of Auckland, New
Zealand.
The R software environment is written in C, Fortran and R.
R provides a wide variety of statistical and graphical techniques, including linear and nonlinear
modelling, classical statistical tests, time series analysis, classification and clustering.
Conducted by: Rekha Sugandhi and Sumitra Pundlik, MIT College of Engineering, Pune Page 1
MITCOE, Department of Computer Engineering
2. setwd("mydirectory") : Sets the working directory for the current session.
> setwd("/home/sumitra/rexamples")
Input:
source("filename")
This function runs a script file in the current session.
Example : source("myscript.R")
Script files are saved with the extension .R.
Output:
sink("filename")
This function redirects output to the named file. If the file already exists, its
contents are overwritten.
Two options are provided for sink():
1. append=TRUE
This option appends the output to an existing file instead of overwriting it.
2. split=TRUE
This option sends output to both the screen and the file.
Graphics Output:
sink() has no effect on graphics output. For graphics output, use the following functions.
1. pdf("filename.pdf") : To save output in PDF format.
2. win.metafile("filename.wmf") : To save output as a Windows metafile (available on Windows only).
3. png("filename.png") : To save output in PNG format.
4. jpeg("filename.jpeg") : To save output in JPEG format.
5. bmp("filename.bmp") : To save output in BMP format.
6. postscript("filename.ps") : To save output in PostScript format.
##Example
source("script1.R")
sink("myoutput", append=TRUE, split=TRUE)
pdf("mygraph.pdf")
source("script2.R")
dev.off() # closes the graphics device so that mygraph.pdf is written
sink() # returns text output to the terminal
6. Packages in R
At the time of writing, 6695 packages are available for R. Information on all packages is available at
http://cran.r-project.org/web/packages/available_packages_by_date.html
Installing Packages
install.packages("name of package") : This command is used to install a package.
update.packages() : This command is used to update packages that are
already installed.
installed.packages() : This command lists the packages that are already installed.
library(package_name) : This command is used to load a package for use in the current session.
help(package="package_name") : This command is used to learn about a particular package.
An important feature of R is that it will do different things on different types of objects. For
example, type:
> 4+6
The result should be
[1] 10
So, R does scalar arithmetic, returning the scalar value 10. (In actual fact, R returns a vector of
length 1, hence the [1] denoting the first element of the vector.)
We can assign values to objects for subsequent use. For example:
> x<-6
> y<-4
> z<-x+y
would do the same calculation as above, storing the result in an object called z. We can look at
the contents of the object by simply typing its name:
> z
[1] 10
Functions are called by name with their arguments in parentheses. For example,
> sqrt(16)
[1] 4
calculates the square root of 16.
Objects can be removed from the current workspace with the rm function: For example
> rm(x,y)
There are many standard functions available in R, and it is also possible to create new ones.
Objects
R has five basic or “atomic" classes of objects:
character
numeric (real numbers)
integer
complex
logical (TRUE/FALSE)
The most basic object is a vector.
A vector can only contain objects of the same class.
BUT: The one exception is a list, which is represented as a vector but can contain objects of
different classes (indeed, that's usually why we use them).
Empty vectors can be created with the vector() function.
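As a minimal sketch of the points above, the following creates an empty vector with vector() and checks the class of one value of each atomic type. The variable name v and the sample values are chosen only for illustration.

```r
# Create an empty numeric vector of length 3 (filled with zeros by default)
v <- vector("numeric", length = 3)
print(v)

# class() reports the class of each kind of atomic value
print(class("hello"))  # character
print(class(1.5))      # numeric
print(class(1L))       # integer
print(class(2+3i))     # complex
print(class(TRUE))     # logical
```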
------------------------------------------------------------------------------------------------------------
Numbers
Numbers in R are generally treated as numeric objects (i.e. double precision real numbers).
If you explicitly want an integer, you need to specify the L suffix.
Ex: Entering 1 gives you a numeric object; entering 1L explicitly gives you an integer.
There is also a special number Inf which represents infinity, e.g. 1 / 0. Inf can be used in ordinary
calculations, e.g. 1 / Inf is 0.
The value NaN represents an undefined value ("not a number"), e.g. 0 / 0. NaN can also be thought
of as a missing value (more on that later).
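The statements about the L suffix, Inf and NaN can be checked directly at the prompt. The sketch below uses illustrative variable names (x_num, x_int) that are not part of the original handout.

```r
x_num <- 1    # stored as numeric (double precision)
x_int <- 1L   # stored as integer because of the L suffix
print(class(x_num))
print(class(x_int))

print(1 / 0)          # Inf
print(1 / Inf)        # 0
print(0 / 0)          # NaN
print(is.nan(0 / 0))  # TRUE
```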
Entering Inputs
At the R prompt we type expressions. The <- symbol is the assignment operator.
> x<-1
> print (x)
[1] 1
>x
[1] 1
> msg<-"hello"
> msg
[1] "hello"
When a vector is read from the keyboard with scan(), R shows the prompt 1:. Enter the values to be stored in the vector, separated by spaces.
After entering the last value, press Enter twice.
1: 3 4 5 6 7 8
7:
Read 6 items
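Interactive keyboard entry cannot be replayed from a script, so the sketch below passes the same six values to scan() through its text argument instead. The variable name x is an assumption made for illustration.

```r
# Non-interactive equivalent of typing "3 4 5 6 7 8" at the scan() prompt
x <- scan(text = "3 4 5 6 7 8")
print(x)
print(length(x))
```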
Expression Evaluation
The grammar of the language determines whether an expression is complete or not.
> x <- ## Incomplete expression
The # character indicates a comment. Anything to the right of the # (including the # itself) is ignored.
When a complete expression is entered at the prompt, it is evaluated and the result of the evaluated
expression is returned. The result may be auto-printed.
Vectors
Vectors can be created in R in a number of ways. The c() function can be used to create vectors of objects.
> z<-c(1, 2, 3, 4, 7, 9)
Note the use of the function c to concatenate or `glue together' individual elements. This function can
be used much more widely, for example
> x<-c(1,2,3)
>x
[1] 1 2 3
> y<-c(4,7,9)
>y
[1] 4 7 9
> z<-c(x,y)
>z
[1] 1 2 3 4 7 9
would lead to the same result by gluing together two vectors to create a single vector.
When different objects are mixed in a vector, coercion occurs so that every element in the vector is of
the same class.
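A short sketch of implicit coercion, using illustrative vectors y1, y2 and y3 that are not in the original handout. In each case R converts every element to the most general class present.

```r
y1 <- c(1.7, "a")    # number and string: everything becomes character
y2 <- c(TRUE, 2)     # logical and number: TRUE becomes 1
y3 <- c("a", TRUE)   # string and logical: TRUE becomes "TRUE"

print(class(y1)); print(y1)
print(class(y2)); print(y2)
print(class(y3)); print(y3)
```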
Explicit coercion
Objects can be explicitly coerced from one class to another using the as.* functions.
##Examples
> x <- c("a", "b", "c")
> as.numeric(x)
[1] NA NA NA
Warning message:
NAs introduced by coercion
> as.logical(x)
[1] NA NA NA
------------------------------------------------------------------------------------------------------------
Lists
Lists can be created in R in a number of ways. Lists are a special type of vector that can contain elements of
different classes and are a very important data type in R.
##Example
> x<-list(1, "a", TRUE, 1 + 4i)
>x
[[1]]
[1] 1
[[2]]
[1] "a"
[[3]]
[1] TRUE
[[4]]
[1] 1+4i
------------------------------------------------------------------------------------------------------------
Factors
Factors are used to represent categorical data. Factors can be unordered or ordered. One can think of a
factor as an integer vector where each integer has a label.
Factors are treated specially by modelling functions like lm() and glm().
Using factors with labels is better than using integers because factors are self-describing;
having a variable that has values "Male" and "Female" is better than a variable that has values
1 and 2.
##Example
>x
[1] yes yes no yes no
Levels: no yes
> table(x)
x
no yes
2 3
> unclass(x)
[1] 2 2 1 2 1
attr(,"levels")
[1] "no" "yes"
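The output above is consistent with a factor built from five yes/no answers. A minimal sketch, assuming x was created with factor() and default (alphabetical) levels:

```r
# Build the factor; levels default to sorted unique values: "no", "yes"
x <- factor(c("yes", "yes", "no", "yes", "no"))
print(x)
print(levels(x))      # the labels
print(as.integer(x))  # the underlying integer codes
print(table(x))       # counts per level
```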
The order of the levels can be set using the levels argument to factor(). This can be important in linear
modelling because the first level is used as the baseline level.
##Example
> x <- factor(c("yes", "yes", "no", "yes", "no"), levels = c("yes", "no"))
>x
[1] yes yes no yes no
Levels: yes no
------------------------------------------------------------------------------------------------------------
Missing Values
Missing values are denoted by NA, or by NaN (Not a Number) for undefined mathematical operations.
is.na() is used to test whether objects are NA.
is.nan() is used to test for NaN.
NA values have a class also, so there are integer NA, character NA, etc.
A NaN value is also NA, but the converse is not true.
##Example
> is.na(x)
[1] FALSE FALSE TRUE FALSE FALSE
> is.nan(x)
[1] FALSE FALSE FALSE FALSE FALSE
> is.na(x)
[1] FALSE FALSE TRUE TRUE FALSE
> is.nan(x)
[1] FALSE FALSE FALSE TRUE FALSE
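The two pairs of results above can be reproduced with vectors like the following. The exact values of x1 and x2 are assumptions; only the positions of NA and NaN are taken from the printed output.

```r
x1 <- c(1, 2, NA, 10, 3)    # one NA in position 3
print(is.na(x1))
print(is.nan(x1))

x2 <- c(1, 2, NA, NaN, 4)   # NA in position 3, NaN in position 4
print(is.na(x2))            # NaN also counts as NA
print(is.nan(x2))           # NA does not count as NaN
```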
------------------------------------------------------------------------------------------------------------
Data Frames
Data frames are used to store tabular data.
They are represented as a special type of list where every element of the list has to have the same
length.
Each element of the list can be thought of as a column, and the length of each element of the list
is the number of rows.
Unlike matrices, data frames can store different classes of objects in each column (just like lists);
matrices must have every element be the same class.
Data frames also have a special attribute called row.names.
Data frames are usually created by calling read.table() or read.csv().
A data frame can be converted to a matrix by calling data.matrix().
##Example
> dFrame<-data.frame(srNo= 1:6, code=c('M', 'F', 'F', 'F', 'M', 'F'))
> dFrame
srNo code
1 1 M
2 2 F
3 3 F
4 4 F
5 5 M
6 6 F
> nrow(dFrame)
[1] 6
> ncol(dFrame)
[1] 2
------------------------------------------------------------------------------------------------------------
Names
R objects can also have names, which is very useful for writing readable code and self-describing
objects.
##Example
> x<-1:3
> names(x)
NULL
> names(x)<-c("one", "two", "three")
>x
one two three
1 2 3
##Example
> x<-list(a = 1, b = 2, c = 3)
>x
$a
[1] 1
$b
[1] 2
$c
[1] 3
##Example
> mat
[,1] [,2]
[1,] 1 9
[2,] 3 5
[3,] 0 -1
> mat
col1 col2
row1 1 9
row2 3 5
row3 0 -1
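The two printouts of mat differ only in their row and column names. A sketch of how such names could be attached with dimnames(), assuming mat was built column-wise from the values shown:

```r
# 3 x 2 matrix filled column-wise: first column 1, 3, 0; second column 9, 5, -1
mat <- matrix(c(1, 3, 0, 9, 5, -1), nrow = 3, ncol = 2)
print(mat)

# dimnames takes a list: row names first, then column names
dimnames(mat) <- list(c("row1", "row2", "row3"), c("col1", "col2"))
print(mat)
print(mat["row2", "col2"])  # elements can now be selected by name
```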
9. Vectorized operations
Many operations in R are vectorized making code more efficient, concise, and easier to read.
> x<- 1:4; y<- 6:9
>x+y
[1] 7 9 11 13
>x>2
[1] FALSE FALSE TRUE TRUE
> y == 8
[1] FALSE FALSE TRUE FALSE
>x*y
[1] 6 14 24 36
>x/y
[1] 0.1666667 0.2857143 0.3750000 0.4444444
As explained above, R will often adapt to the objects it is asked to work on.
##Example:
> x<-c(7, 4, -6)
> y<-c(23, 7, 55)
> x + y
[1] 30 11 49
> x * y
[1] 161 28 -330
Above example shows that R uses component-wise arithmetic on vectors. R will also try to make sense
if objects are mixed. For example,
##Example
> (x + y)-2
[1] 28 9 47
Though care should be taken to make sure that R is doing what you would like it to in these
circumstances.
Two particularly useful functions worth remembering are length() which returns the length of a vector
(i.e. the number of elements it contains) and sum() which calculates the sum of the elements of a
vector.
------------------------------------------------------------------------------------------------------------
Matrices
Matrices are vectors with a dimension attribute. The dimension attribute is itself an integer vector of
length 2 (nrow, ncol)
Matrices are constructed column-wise. The basic parameters passed are matrix elements, and
dimensions in the form of number of rows and columns.
>m
[,1] [,2] [,3]
[1,] 1 3 5
[2,] 2 4 6
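The matrix m printed above can be produced with the matrix() function. A minimal sketch, assuming the elements 1 to 6 and the dimensions shown:

```r
# Elements are placed column by column: (1, 2), then (3, 4), then (5, 6)
m <- matrix(1:6, nrow = 2, ncol = 3)
print(m)
print(dim(m))         # the dimension attribute: rows, then columns
print(attributes(m))
```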
Matrices can be created in R in a variety of ways other than the matrix() function. Perhaps the simplest
is to create the columns and then glue them together with the command cbind().
##The matrix can also be created using row-wise binding using the command rbind()
> mat1<-rbind(x,y)
> mat1
[,1] [,2] [,3]
x 5 7 9
y 6 3 4
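The vectors x and y used above are not defined in the text. Judging from the printed rows, they may have been as follows; the sketch shows both column-wise and row-wise binding.

```r
x <- c(5, 7, 9)   # assumed from the row labelled x
y <- c(6, 3, 4)   # assumed from the row labelled y

mat_c <- cbind(x, y)   # x and y become the columns (3 x 2)
print(mat_c)

mat1 <- rbind(x, y)    # x and y become the rows (2 x 3)
print(mat1)
```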
The functions cbind and rbind can also be applied to matrices themselves (provided the dimensions
match) to form larger matrices.
##Example – Basic method for matrix creation using other matrices
> bigMat<-rbind(mat1, mat1)
> bigMat
[,1] [,2] [,3]
x 5 7 9
y 6 3 4
x 5 7 9
y 6 3 4
As an alternative we could have specified the number of columns with the argument ncol=2
(obviously, it is unnecessary to give both).
Notice that the matrix is 'filled up' column-wise. If instead you wish to fill up row-wise, add the option
byrow=T.
##Example – Basic method for matrix creation, default setting of column binding over-ridden by
parameter for row-binding called byrow
Notice that the argument nrow has been abbreviated to nr. Such abbreviations are always possible for
function arguments provided it induces no ambiguity - if in doubt always use the full argument name.
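A sketch of the options discussed in this passage, using a hypothetical vector vals of six values: the same matrix can be specified by nrow or by ncol, byrow = TRUE switches to row-wise filling, and nr is accepted as an abbreviation of nrow.

```r
vals <- c(5, 7, 9, 6, 3, 4)   # hypothetical data

by_col  <- matrix(vals, nrow = 2)   # filled column-wise (the default)
by_col2 <- matrix(vals, ncol = 3)   # same matrix, specified by columns
by_row  <- matrix(vals, nr = 2, byrow = TRUE)   # nr abbreviates nrow

print(by_col)
print(by_row)
```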
Matrices can also be created directly from vectors by adding a dimension attribute. The dimension of a
matrix can be checked with the dim command.
##Example
> m<- 1:10
> dim(m)
NULL
> dim(m)<-c(2, 5)
>m
[,1] [,2] [,3] [,4] [,5]
[1,] 1 3 5 7 9
[2,] 2 4 6 8 10
> dim(m)
[1] 2 5
## That is, two rows and five columns
Sequences
Sequences are vectors of values, generally at uniform intervals. For example, the sequence of
integers from 1 to 10 can be stored in a variable x as:
> x<-1:10
> x
[1] 1 2 3 4 5 6 7 8 9 10
The seq() function generates sequences with a given step length; for example, starting at 5 with a step of 3:
[1] 5 8 11 14 17 20 23
and
##Example:
Generates a sequence over the given range with the
required length
> seq(4, 79, length = 6)
[1] 4 19 34 49 64 79
These examples illustrate that many functions in R have optional arguments, in this case, either the
step length or the total length of the sequence (it doesn't make sense to use both). If you leave out both
of these options, R will make its own default choice, in this case assuming a step length of 1.
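The three ways of calling seq() described here can be compared side by side. This is a sketch only; the end points in the second call are assumptions chosen for illustration.

```r
s_default <- seq(1, 5)                    # neither option given: step of 1
s_by      <- seq(5, 23, by = 3)           # fixed step length
s_length  <- seq(4, 79, length.out = 6)   # fixed total length

print(s_default)
print(s_by)
print(s_length)
```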
At this point it's worth mentioning the help facility. If you don't know how to use a function, or don't
know what the options or default values are, type help(function_name), where function_name is the
name of the function you are interested in. This will usually help and will often include examples to
make things even clearer.
Another useful function for building vectors is the rep command for repeating things.
For example, to generate a vector containing twenty-five 2s, we run,
##Example:
> rep(2, 25)
[1] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
>x
[,1] [,2]
[1,] 1 3
[2,] 2 4
>y
[,1] [,2]
[1,] 10 10
[2,] 10 10
>x/y
[,1] [,2]
[1,] 0.1 0.3
[2,] 0.2 0.4
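The matrices x and y printed above can be built as follows (an assumption based on their printed values). The sketch also contrasts element-wise operators with true matrix multiplication, which uses %*%.

```r
x <- matrix(1:4, 2, 2)          # 1, 2 in column 1; 3, 4 in column 2
y <- matrix(rep(10, 4), 2, 2)   # every element is 10

print(x * y)     # element-wise product
print(x / y)     # element-wise division
print(x %*% y)   # matrix product
```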
> sqMat
[,1] [,2] [,3]
[1,] 1 7 2
[2,] 5 0 8
[3,] 4 9 5
> solve(sqMat)
[,1] [,2] [,3]
[1,] -1.0746269 -0.25373134 0.83582090
[2,] 0.1044776 -0.04477612 0.02985075
[3,] 0.6716418 0.28358209 -0.52238806
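solve() with a single square matrix argument returns its inverse. A sketch that rebuilds sqMat from the printed values (assumed to be filled column-wise) and checks that the product with its inverse is the identity matrix:

```r
# Columns are (1, 5, 4), (7, 0, 9) and (2, 8, 5)
sqMat <- matrix(c(1, 5, 4, 7, 0, 9, 2, 8, 5), nrow = 3)
inv <- solve(sqMat)
print(inv)

# Multiplying a matrix by its inverse gives the identity (up to rounding)
print(round(sqMat %*% inv, 10))
```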
>z
[,1] [,2]
[1,] 1 4
[2,] 2 5
[3,] 3 6
> z[1,1]
[1] 1
> z[c(2,3), 2]
[1] 5 6
> z[, 2]
[1] 4 5 6
> z[1:2,]
[,1] [,2]
[1,] 1 4
[2,] 2 5
So, in particular, it is necessary to specify which rows and columns are required, whilst omitting the
integer for either dimension implies that every element in that dimension is selected.
##Tutorials##
1. Define
> x<-c(4,2,6)
> y<-c(1,0,-1)
Decide what the result will be of the following:
(a) length(x)
(b) sum(x)
(c) sum(x^2)
(d) x+y
(e) x*y
(f) x-2
(g) x^2
Use R to check your answers.
2. Decide what the following sequences are and use R to check your answers:
(a) 7:11
(b) seq(2,9)
(c) seq(4,10,by=2)
(d) seq(3,30,length=10)
(e) seq(6,-4,by=-2)
3. Determine what the result will be of the following R expressions, and then use R to check if you are
right:
(a) rep(2,4)
(b) rep(c(1,2),4)
Suggest an alternative for above function:
(c) rep(c(1,2),c(4,4))
(d) rep(1:4,4)
(e) rep(1:4,rep(3,4))
##Tutorials on matrices##
Exercises
1. Create in R the matrices
x = | 3 2 |    and    y = | 1 4 0 |
    | 1 1 |               | 0 1 1 |
2. With x and y as above, calculate the effect of the following subscript operations and check
your answers in R.
(a) x [1, ]
(b) x [2, ]
(c) Extract second column elements of y:
(d) y [1, 2]
(e) y [ ,2:3]
10. Subsetting
There are a number of operators that can be used to extract subsets of R objects.
[ always returns an object of the same class as the original; can be used to select more than one
element
[[ is used to extract elements of a list or a data frame; it can only be used to extract a single
element and the class of the returned object will not necessarily be a list or data frame
$ is used to extract elements of a list or data frame by name; semantics are similar to that of [[.
> x[1]
[1] "a"
> x[3:5]
[1] "c" "c" "d"
> x[res]
[1] "c" "c" "d"
> mat[3, 2]
[1] 6
> mat[, 2]
[1] 4 5 6
> mat[2,]
[1] 2 5
By default, when a single element of a matrix is retrieved, it is returned as a vector of length 1 rather
than a 1 x 1 matrix; likewise, a single row or column is returned as a vector. This behavior can be turned off by setting drop = FALSE.
> mat
[,1] [,2]
[1,] 1 4
[2,] 2 5
[3,] 3 6
> mat[2,]
[1] 2 5
> mat[2, , drop = FALSE]
[,1] [,2]
[1,] 2 5
$bar
[1] 0.6
> x[1]
$foo
[1] 1 2 3 4 5 6
> x[1][1]
$foo
[1] 1 2 3 4 5 6
> x$foo
[1] 1 2 3 4 5 6
> x$bar
[1] 0.6
> x[2]
$bar
[1] 0.6
> x[2][1]
$bar
[1] 0.6
> x[1][2]
$<NA>
NULL
> x["bar"]
$bar
[1] 0.6
> x[["bar"]][1]
[1] 0.6
> x[[2]][[2]]
Error in x[[2]][[2]] : subscript out of bounds
$third
[1] "testing"
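The outputs above are consistent with a list holding a vector named foo and a number named bar. A sketch, assuming that definition, showing how the three operators differ in what they return:

```r
x <- list(foo = 1:6, bar = 0.6)   # assumed definition

sub_list <- x[1]          # single bracket: still a list, of length 1
foo_vec  <- x[[1]]        # double bracket: the element itself
bar_val  <- x$bar         # extraction by literal name
bar_val2 <- x[["bar"]]    # extraction by name given as a string

print(class(sub_list))
print(class(foo_vec))
print(bar_val)
```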
The [[ operator can be used with computed indices; $ can only be used with literal names.
##Example- Extracting list elements using computed indices (named indices)
> name<-"foo"
> x[[name]]
[1] 1 2 3 4
> x$foo
[1] 1 2 3 4
> x[[2]]
[[1]]
[1] 15.5
[[2]]
[1] 16.6
> x[[2]][[1]]
[1] 15.5
> x[[2]][[2]]
Error in x[[2]][[2]] : subscript out of bounds
> x[[1]][[3]]
[1] 14
> x[[a]]
Error: object 'a' not found
> x$a
[1] 0 1 2 3 4 5
> x[["a"]]
NULL
> x[["a", exact = FALSE]]
[1] 0 1 2 3 4 5
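The last three results illustrate partial matching of names. In the sketch below the element name aardvark is hypothetical; any name beginning with "a" would behave the same way.

```r
x <- list(aardvark = 0:5)   # hypothetical element name

by_dollar  <- x$a                      # $ matches partial names
by_exact   <- x[["a"]]                 # [[ requires an exact match by default
by_partial <- x[["a", exact = FALSE]]  # partial matching switched on

print(by_dollar)
print(by_exact)
print(by_partial)
```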
> x[good]
[1] 2 4
> y[good]
[1] "b" "d"
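The vectors x and y and the logical index good are not defined in the text. One set of definitions consistent with the output above is sketched here, using complete.cases() to keep only the positions where both vectors are present; the values are assumptions.

```r
x <- c(NA, 2, NA, 4, NA)         # assumed values
y <- c("a", "b", NA, "d", NA)    # assumed values

# TRUE only where neither x nor y is missing
good <- complete.cases(x, y)
print(good)
print(x[good])
print(y[good])
```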
##Example 2
> airquality[1:6,]
Ozone Solar.R Wind Temp Month Day
1 41 190 7.4 67 5 1
2 36 118 8.0 72 5 2
3 12 149 12.6 74 5 3
4 18 313 11.5 62 5 4
5 NA NA 14.3 56 5 5
6 28 NA 14.9 66 5 6
> good<- complete.cases(airquality)
> airquality[good,][1:6,]
Ozone Solar.R Wind Temp Month Day
1 41 190 7.4 67 5 1
2 36 118 8.0 72 5 2
3 12 149 12.6 74 5 3
4 18 313 11.5 62 5 4
7 23 299 8.6 65 5 7
8 19 99 13.8 59 5 8
The data structures are as follows (we have already studied the various data structures in earlier
sessions):
Vectors
Lists
Factors
Matrices
Data Frames
Data can be imported into R from the following sources:
a. Keyboard
b. Statistical packages (namely SAS, SPSS, Stata)
c. Text files (namely ASCII, XML, web scraping)
d. Other (namely Excel, netCDF, HDF5)
e. Database management systems (namely SQL, MySQL, MongoDB, Oracle, Access)
## Example
In the following example, you’ll create a data frame named mydata with three variables:
age (numeric) , gender (character) , and weight (numeric) .
You’ll then invoke the text editor, add your data, and save the results.
> mydata
age gender weight
1 1 F 20
Syntax:
mydataframe <- read.table(file, header=logical_value, sep="delimiter", row.names="name")
where
file is a delimited ASCII file,
header is a logical value indicating whether the first row contains variable names (TRUE or FALSE),
sep specifies the delimiter separating data values,
row.names is an optional parameter specifying one or more variables to represent row identifiers.
## Example
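A minimal sketch of the syntax above. To stay self-contained it first writes a small comma-delimited file to a temporary location; the file contents and the column names id, age and gender are invented for illustration.

```r
# Write a small delimited file to a temporary path (illustrative data)
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,age,gender",
             "p1,25,female",
             "p2,31,male"), csv_path)

# Read it back: first row holds names, comma-delimited, id column as row names
mydataframe <- read.table(csv_path, header = TRUE, sep = ",",
                          row.names = "id", stringsAsFactors = FALSE)
print(mydataframe)
print(mydataframe["p2", "age"])
```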
R includes a number of datasets that it is convenient to use for examples. You can get a description
of what's available by typing
> data()
To access any of these datasets, you then type data(dataset) where dataset is the name of the dataset
you wish to access.
##Example
> data(trees)
Typing
> trees[1:5,]
Girth Height Volume
1 8.3 70 10.3
2 8.6 65 10.3
3 8.8 63 10.2
4 10.5 72 16.4
5 10.7 81 18.8
gives us the first 5 rows of these data, and we can now see that the columns represent measurements of
girth, height and volume of trees (actually cherry trees: see help(trees)) respectively.
Now, if we want to work on the columns of these data, we can use the subscripting technique
explained above: for example, trees [, 2] gives all of the heights. This is a bit tedious however, and it
would be easier if we could refer to the heights more explicitly.
We can achieve this by attaching to the trees dataset:
> x<-attach(trees)
Effectively, this makes the contents of trees a directory, and if we type the name of an object, R will
look inside this directory to find it. Since Height is the name of one of the columns of trees, R now
recognises this object when we type the name. Hence, for example,
> mean(Height)
[1] 76
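1. head(x,n): The 'head()' function is an easy way to extract the first few elements of an R object. By default, the first six are returned.
##Example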
> head(trees, 3)
Girth Height Volume
1 8.3 70 10.3
2 8.6 65 10.3
3 8.8 63 10.2
##Example
> head(trees)
Girth Height Volume
1 8.3 70 10.3
2 8.6 65 10.3
3 8.8 63 10.2
4 10.5 72 16.4
5 10.7 81 18.8
6 10.8 83 19.7
2. names(x) : You can get the column names of a data frame with the 'names()' function.
> names(trees)
[1] "Girth" "Height" "Volume"
3. tail(x,n): The 'tail()' function is an easy way to extract the last few elements of an R object.
> tail(trees, 3)
Girth Height Volume
29 18.0 80 51.5
30 18.0 80 51.0
31 20.6 87 77.0
4. nrow(x) : You can use the 'nrow()' function to compute the number of rows in a data frame.
> nrow(trees)
[1] 31
5. ncol(x) :You can use the 'ncol()' function to compute the number of columns in a data frame.
> ncol(trees)
[1] 3
6. Subsetting operations
Where
x= A matrix/data frame/vector
n= The first n rows (head) /last n rows (tail)
2. Extract the first 3 rows of the data frame and print them to the console. What does the output
look like?
3. What is the value of Ozone in the 51st row?
4. Identify the class of hw object
5. How many missing values are in the Ozone column of this data frame?
Control structures in R allow you to control the flow of execution of the program, depending on
runtime conditions.
Run the command ?Control to get information on all control structures in R.
>?Control
1. If, else
if (condition) {
# do something
} else {
# do something else
}
Without else
if(condition)
{
}
##Example
Open RStudio, choose File -> New File -> R Script, then type the following code in the code region and run it:
> x<-1
> if (x>3) {
+ y<-10
+ } else {
+ y<-5
+ }
> y
[1] 5
This is the same as
> y <- if (x>3) {
+ 10
+ } else {
+ 5
+ }
> y
[1] 5
##Example 2
OR
OR
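A for loop takes an iterator variable and assigns it successive values from a vector or sequence. As a sketch with a hypothetical character vector x, the three loops below are equivalent ways of visiting every element; the results are collected so they can be compared.

```r
x <- c("a", "b", "c", "d")   # hypothetical data

# 1. Loop over explicit index values
out1 <- character(0)
for (i in 1:4) {
  out1 <- c(out1, x[i])
}

# 2. Loop over indices generated from the vector itself
out2 <- character(0)
for (i in seq_along(x)) {
  out2 <- c(out2, x[i])
}

# 3. Loop over the elements directly
out3 <- character(0)
for (letter in x) {
  out3 <- c(out3, letter)
}

print(out1)
```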
Nested loops
##Example
> m <- matrix(1:10, 2)
> for (i in seq( nrow(m))) {
+ for (j in seq( ncol(m))) {
+ print(m[i, j])
+ }
+}
[1] 1
[1] 3
[1] 5
[1] 7
[1] 9
[1] 2
[1] 4
[1] 6
[1] 8
[1] 10
3. While
##Example
> i <- 1
> while (i < 10) {
+ print(i)
+ i <- i + 1 }
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
[1] 6
[1] 7
[1] 8
[1] 9
Be sure there is a way to exit a while loop. If the condition never becomes false, the loop runs forever.
4. Repeat and break
##Example
> sum <- 1
> repeat
+{
+ sum <- sum + 2;
+ print(sum);
+ if(sum>11)
+ break;
+}
[1] 3
[1] 5
[1] 7
[1] 9
[1] 11
[1] 13
5. Next
##Example
> for (i in 1:20) {
+ if (i%%2 == 1) {
+ next
+ } else {
+ print(i)
+ }
+}
[1] 2
[1] 4
[1] 6
[1] 8
[1] 10
[1] 12
[1] 14
[1] 16
[1] 18
[1] 20
## Example
Write a script agecount.R that checks that "age" is not less than 10; otherwise it prints the message
"Do not consider it for calculation".
Read the "homicides.txt" data file using readLines().
Extract the ages of victims; ignore records where no age is given.
Return an integer containing the count of homicides for that age.
age <- 57
homicides <- readLines("homicides.txt")
if (age < 10) {
  print("Do not consider it for calculation")
} else {
  # Match records of the form "<age> years old", ignoring case
  pattern <- sprintf("\\s+%d\\s+years\\s+old", age)
  res <- grep(pattern, homicides, ignore.case = TRUE)
  length(res)  # number of matching records
}
6. Looping functions in R
· lapply : Loop over a list and evaluate a function on each element
· sapply : Same as lapply but try to simplify the result
· apply : Apply a function over the margins of an array
· tapply : Apply a function over subsets of a vector
· mapply : Multivariate version of lapply
An auxiliary function split is also useful, particularly in conjunction with lapply.
1. lapply
lapply takes three arguments:
1. a list X ;
2. a function (or the name of a function) FUN ;
3. other arguments via its ... argument.
If X is not a list, it will be coerced to a list using as.list .
lapply always returns a list, regardless of the class of the input.
##Example
> x <- list(a = matrix(1:4, 2, 2), b = matrix(1:6, 3, 2))
>x
$a
[,1] [,2]
[1,] 1 3
[2,] 2 4
$b
[,1] [,2]
[1,] 1 4
[2,] 2 5
[3,] 3 6
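With the list x of two matrices defined above, lapply can apply the same function to each element. The sketch below uses an anonymous function to extract the first column of each matrix; the choice of function is an illustration, not necessarily the one used in the session.

```r
x <- list(a = matrix(1:4, 2, 2), b = matrix(1:6, 3, 2))

# Apply an anonymous function to every element; the result is always a list
first_cols <- lapply(x, function(elt) elt[, 1])
print(first_cols)
```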
2. sapply
sapply will try to simplify the result of lapply if possible.
· If the result is a list where every element is length 1, then a vector is returned
· If the result is a list where every element is a vector of the same length (> 1), a matrix is returned.
· If it can’t figure things out, a list is returned
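The three simplification rules can be tried out on a small list. This is only a sketch; the list x and the result names are made up for illustration.

```r
x <- list(a = 1:4, b = c(2.5, 3.5), c = 10)

# lapply always returns a list
res_list <- lapply(x, mean)

# Every result has length 1, so sapply returns a named vector
res_vec <- sapply(x, mean)

# Every result has length 2, so sapply returns a matrix (one column per element)
res_mat <- sapply(x, range)

# Results of different lengths cannot be simplified, so a list is returned
res_mixed <- sapply(x, function(v) v[v > 2])

print(res_vec)
print(res_mat)
```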
3. apply
apply is used to evaluate a function (often an anonymous one) over the margins of an array.
· It is most often used to apply a function to the rows or columns of a matrix
· It can be used with general arrays, e.g. taking the average of an array of matrices
· It is not really faster than writing a loop, but it works in one line!
Syntax :
> str(apply)
function (X, MARGIN, FUN, ...)
· X is an array
· MARGIN is an integer vector indicating which margins should be “retained”.
· FUN is a function to be applied
· ... is for other arguments to be passed to FUN
## Example
Margin 1 refers to rows, whereas margin 2 refers to columns.
> x <- matrix(1:24, nrow = 4)
> x
> apply(x, 1, sum) ## sum of each row
> apply(x, 2, sum) ## sum of each column
4. mapply
mapply is a multivariate apply of sorts which applies a function in parallel over a set of arguments.
> str(mapply)
function (FUN, ..., MoreArgs = NULL, SIMPLIFY = TRUE,USE.NAMES = TRUE)
· FUN is a function to apply
· ... contains arguments to apply over
· MoreArgs is a list of other arguments to FUN .
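A short sketch of mapply in use, assuming nothing beyond base R; the variable names reps and labels are illustrative.

```r
# Calls rep(1, 4), rep(2, 3), rep(3, 2), rep(4, 1) and collects the results.
# The results have different lengths, so a list is returned.
reps <- mapply(rep, 1:4, 4:1)

# MoreArgs holds arguments that stay the same for every call (here sep)
labels <- mapply(function(x, y, sep) paste(x, y, sep = sep),
                 c("a", "b"), c("1", "2"),
                 MoreArgs = list(sep = "-"))

print(reps)
print(labels)
```

The first call replaces the longer form list(rep(1, 4), rep(2, 3), rep(3, 2), rep(4, 1)).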
5. tapply
tapply is used to apply a function over subsets of a vector. t stands for table.
> str(tapply)
function (X, INDEX, FUN = NULL, ..., simplify = TRUE)
· X is a vector
· INDEX is a factor or a list of factors (or else they are coerced to factors)
· FUN is a function to be applied
· ... contains other arguments to be passed to FUN
· simplify indicates whether the result should be simplified
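As an illustration, tapply can compute a statistic per group. The vectors marks and group below are invented for this sketch; airquality is the built-in dataset.

```r
marks <- c(55, 72, 64, 81, 90, 47)
group <- c("A", "B", "A", "B", "B", "A")

# Mean of marks within each level of group
group_means <- tapply(marks, group, mean)

# Extra arguments (here na.rm) are passed on to FUN
ozone_by_month <- tapply(airquality$Ozone, airquality$Month, mean, na.rm = TRUE)

print(group_means)
print(ozone_by_month)
```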
6. split
split takes a vector or other object and splits it into groups determined by a factor or list of factors.
> str(split)
function (x, f, drop = FALSE, ...)
· x is a vector (or list) or data frame
· f is a factor (or coerced to one) or a list of factors
· drop indicates whether empty factor levels should be dropped
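A sketch of split used together with sapply and lapply, here on the built-in airquality dataset; the result names are illustrative.

```r
# Split the temperature vector into one group per month
temp_by_month <- split(airquality$Temp, airquality$Month)

# Apply a function to every group
monthly_max <- sapply(temp_by_month, max)

# A data frame can be split in the same way; each piece is a data frame
pieces <- split(airquality, airquality$Month)
monthly_means <- lapply(pieces, function(d) {
  colMeans(d[, c("Ozone", "Wind")], na.rm = TRUE)
})

print(monthly_max)
```

For a single vector, split followed by sapply gives much the same result as one tapply call; split is more flexible when each group needs several columns.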
##Tutorials on Control Structure ##
1. For a given number, identify whether it is odd or even.
2. Define a list of months and print it using a for loop.
3. Define a list of marks for a student in five subjects and calculate the sum of the marks.
> head(airquality)
Ozone Solar.R Wind Temp Month Day
1 41 190 7.4 67 5 1
2 36 118 8.0 72 5 2
3 12 149 12.6 74 5 3
4 18 313 11.5 62 5 4
5 NA NA 14.3 56 5 5
6 28 NA 14.9 66 5 6
> str(airquality)
'data.frame': 153 obs. of 6 variables:
$ Ozone : int 41 36 12 18 NA 28 23 19 8 NA ...
$ Solar.R: int 190 118 149 313 NA NA 299 99 19 194 ...
$ Wind : num 7.4 8 12.6 11.5 14.3 14.9 8.6 13.8 20.1 8.6 ...
$ Temp : int 67 72 74 62 56 66 65 59 61 69 ...
$ Month : int 5 5 5 5 5 5 5 5 5 5 ...
$ Day : int 1 2 3 4 5 6 7 8 9 10 ...
##Example 2
> str(str)
function (object, ...)
##Example 3
> str(ls)
function (name, pos = -1L, envir = as.environment(pos), all.names = FALSE,
pattern)
##Example 4
> x<-c(1, 2, NA, 4, NA, 6)
> summary(x)
Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
1.00 1.75 3.00 3.25 4.50 6.00 2
> str(x)
num [1:6] 1 2 NA 4 NA 6
##Example of Execution
>cube<-make.power(3)
>square<-make.power(2)
>cube(4)
[1] 64
> square(5)
[1] 25
> ls(environment(cube))
[1] "n" "pow"
> get("n", environment(cube))
[1] 3
> ls(environment(square))
[1] "n" "pow"
> get("n", environment(square))
[1] 2
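The definition of make.power does not appear next to this transcript. One definition that is consistent with the output shown (an environment containing "n" and "pow") would be the following; treat it as an assumed reconstruction rather than the original listing.

```r
make.power <- function(n) {
  # pow is created inside make.power, so its environment contains n
  pow <- function(x) {
    x^n
  }
  pow
}

cube <- make.power(3)
square <- make.power(2)

print(cube(4))
print(square(5))
print(ls(environment(cube)))
```

Each call to make.power creates a new environment, which is why cube and square keep different values of n.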
g<-function(x){
x*y
}
array, the function var() will assume that the columns of the object are different variables and return
the variance-covariance matrix.
2. Median: The median of a sequence is the middle score in a set of values that have been ranked in
numeric order. With an even number of values, it is the mean of the two middle scores. The median can be calculated as:
##Example:
> median(x)
[1] 7.25
3. Mode: The mode is the most frequently occurring score in the data set. There is no direct function in
R to calculate the mode of a dataset.
##Example:
> xt<-table(x); xt
x
1.2 3.1 5.6 6.2 6.5 7 7.5 8.2 9.3 14.5
1 1 1 1 1 1 1 2 2 1
> which(xt==max(xt)) ->m;
> mode<-xt[m]
> mode
x ##bi-modal
8.2 9.3
2 2
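The same steps can be wrapped in a small helper. This is a sketch: the function name stat_mode is hypothetical, and the vector below contains the twelve values from the table above, with the order of the first six assumed.

```r
# Values taken from the frequency table; the order of the first six is assumed
x <- c(3.1, 5.6, 7.5, 8.2, 8.2, 9.3, 6.5, 7.0, 9.3, 1.2, 14.5, 6.2)

# Return every value that occurs with the highest frequency
stat_mode <- function(v) {
  counts <- table(v)
  as.numeric(names(counts)[counts == max(counts)])
}

print(stat_mode(x))
```

Because the helper returns all values tied for the highest count, a bi-modal dataset such as this one yields two numbers.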
4. Other functions:
##Example:
> length(x); min(x); max(x);
[1] 12
[1] 1.2
[1] 14.5
Measures of Variability
These measures are used to determine the degree of variation within a population or sample. These
measures include the range, variance and standard deviation.
1. Range: This value is simply the difference between the highest and lowest values in the data set. In R, range() returns the lowest and highest values themselves; diff(range(x)) gives their difference.
##Example:
> range(x)
[1] 1.2 14.5
2. Variance: A more informative measure of variability is the variance. This measure represents the degree
to which the data scores tend to vary from their mean. It is more informative than the range because it
takes into account every score in the dataset, rather than only the min and max values as in range().
The population variance is the average of the squared deviations from the mean. The steps to calculate the variance
for a set of data scores are:
a. Find the mean score.
b. Find the deviation of each raw score from the mean. For this, subtract the mean from each
raw score.
c. Square the deviation scores. (Reason: negative differences are made positive and
extreme scores are given more weight.)
d. Find the sum of the squared deviation scores.
e. Divide the sum by the number of scores n to give the population variance. The R function var() divides by n - 1 instead, which gives the sample variance.
##Example:
> var(x)
[1] 11.00879
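The steps can be followed by hand and compared with var(). This is a sketch: the vector holds the twelve values of x used in this section, with the order of the first six assumed, and the intermediate names are illustrative.

```r
# Twelve data values from this section; the order of the first six is assumed
x <- c(3.1, 5.6, 7.5, 8.2, 8.2, 9.3, 6.5, 7.0, 9.3, 1.2, 14.5, 6.2)

n <- length(x)
m <- mean(x)              # step a: mean score
dev <- x - m              # step b: deviation of each score from the mean
sq_dev <- dev^2           # step c: squared deviations
ss <- sum(sq_dev)         # step d: sum of squared deviations

pop_var <- ss / n         # step e: divide by n (population variance)
samp_var <- ss / (n - 1)  # what var() computes (sample variance)

print(c(population = pop_var, sample = samp_var, var = var(x)))
```

The value divided by n - 1 agrees with the output of var(x) above, while dividing by n gives a slightly smaller number.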
3. Standard Deviation: Simply the square root of the variance. This value is important because it is
expressed in the same units as the raw data values.
> sd(x)
[1] 3.317949
Finally,
> summary(x)
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.00 2.75 5.50 8.75 8.25 50.00
and
> x[7:12]
[1] 6.5 7.0 9.3 1.2 14.5 6.2
> summary(x[7:12])
Min. 1st Qu. Median Mean 3rd Qu. Max.
1.200 6.275 6.750 7.450 8.725 14.500
##Tutorial
1. The data y<-c(33,44,29,16,25,45,33,19,54,22,21,49,11,24,56) contain sales of milk
in litres for 5 days in three different shops (the first 3 values are for shops 1,2 and 3 on
Monday, etc.) Produce a statistical summary of the sales for each day of the week and also for each
shop.
Attaching to objects
##Example:
> data(trees)
> attach(trees)
> head(trees)
Girth Height Volume
1 8.3 70 10.3
2 8.6 65 10.3
3 8.8 63 10.2
4 10.5 72 16.4
5 10.7 81 18.8
6 10.8 83 19.7
> length(trees)
[1] 3
> str(trees)
'data.frame': 31 obs. of 3 variables:
$ Girth : num 8.3 8.6 8.8 10.5 10.7 10.8 11 11 11.1 11.2 ...
$ Height: num 70 65 63 72 81 83 66 75 80 75 ...
$ Volume: num 10.3 10.3 10.2 16.4 18.8 19.7 15.6 18.2 22.6 19.9 ...
> mean(trees[, 1])
[1] 13.24839
> mean(trees[, 2])
[1] 76
OR
> mean(trees$Height) ##works even if the dataset is not attached
[1] 76
> mean(trees[, 3])
[1] 30.17097
In actual fact, trees is an object called a data-frame, essentially a matrix with named columns (though a
data-frame, unlike a matrix, may also include non-numerical variables, such as character names).
Because of this, there is another equivalent syntax to extract, for example, the vector of heights:
> trees$Height
which can also be used without having first attached to the dataset.
##Tutorial
1. Attach to the dataset quakes and produce a statistical summary of the variables depth and
mag.
2. Attach to the dataset mtcars and find the mean weight and mean fuel consumption for vehicles in the
dataset (type help(mtcars) for a description of the variables available).
The command
> apply(trees, 2, mean)
has the effect of calculating the mean of each column (dimension 2) of trees. We'd have used a 1
instead of a 2 if we wanted the mean of every row.
Any function can be applied in this way, though if optional arguments to the function are required
these need to be specified as well - see help(apply) for further details.
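As a sketch of passing an optional argument through apply, the call below forwards na.rm = TRUE to mean. The choice of the airquality columns and the result names are illustrative.

```r
# Column means of trees (margin 2) and row means (margin 1)
col_means <- apply(trees, 2, mean)
row_means <- apply(trees, 1, mean)

# Ozone contains NA values, so mean() needs na.rm = TRUE;
# the extra argument is written after FUN and handed on to it
ozone_wind <- apply(airquality[, c("Ozone", "Wind")], 2, mean, na.rm = TRUE)

print(col_means)
print(ozone_wind)
```

Without na.rm = TRUE the Ozone mean would be NA, because mean() returns NA as soon as any value is missing.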
##Tutorial
1. Repeat the analyses of the datasets quakes and mtcars using the function apply to simplify the
calculations.
2. If y is the matrix
    1 4 0
    0 1 1
what is the result of apply(y[,2:3],1,mean)? Check your answer in R.
Arguments
x, q vector of quantiles.
p vector of probabilities.
n number of observations. If length(n) > 1, the length is taken to be the number required.
mean vector of means.
sd vector of standard deviations.
log, log.p logical; if TRUE, probabilities p are given as log(p).
lower.tail logical; if TRUE (default), probabilities are P[X ≤ x] otherwise, P[X > x].
Details
If mean or sd are not specified they assume the default values of 0 and 1, respectively.
The normal distribution has density
f(x) = 1/(√(2 π) σ) e^-((x - μ)^2/(2 σ^2))
where μ is the mean of the distribution and σ the standard deviation.
Value
dnorm gives the density, pnorm gives the distribution function, qnorm gives the quantile function,
and rnorm generates random deviates
> dnorm(5, 3, 2)
[1] 0.1209854
evaluates the density of the N(3, 4) distribution at x = 5.
As a further example
> y<-seq(-5, 10, 0.1)
> dnorm(y, 3, 2)
calculates the density function of the same distribution at intervals of 0.1 over the range [-5, 10].
The functions pnorm and qnorm work in an identical way - use help for further information.
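A brief sketch of pnorm and qnorm for the same N(3, 4) distribution; the variable names are illustrative.

```r
# Distribution function: P[X <= 5] for X ~ N(3, 4), i.e. sd = 2
p <- pnorm(5, mean = 3, sd = 2)

# Quantile function: the inverse of pnorm, so this gives back 5
q <- qnorm(p, mean = 3, sd = 2)

# Upper tail: P[X > 5]
upper <- pnorm(5, 3, 2, lower.tail = FALSE)

# Probability of an interval: P[1 <= X <= 5]
between <- pnorm(5, 3, 2) - pnorm(1, 3, 2)

print(c(p = p, q = q, upper = upper, between = between))
```

Here 5 lies one standard deviation above the mean, so p is about 0.841 and the interval from 1 to 5 covers about 68% of the distribution.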
Similar functions exist for other distributions. For example, dt, pt and qt for the t-distribution, though
in this case it is necessary to give the degrees of freedom rather than the mean and standard deviation.
Other distributions available include the binomial, exponential, Poisson and gamma, though care is
needed interpreting the functions for discrete variables.
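To illustrate the care needed with discrete variables, the sketch below uses a Binomial(10, 0.5) distribution chosen purely as an example. For a discrete variable the d-function is a probability rather than a density, and the upper tail excludes the point itself.

```r
# dbinom gives a probability: P[X = 3] = choose(10, 3) / 2^10
p_eq <- dbinom(3, size = 10, prob = 0.5)

# pbinom gives P[X <= 3], which includes the value 3
p_le <- pbinom(3, size = 10, prob = 0.5)

# lower.tail = FALSE gives P[X > 3], not P[X >= 3]
p_gt <- pbinom(3, size = 10, prob = 0.5, lower.tail = FALSE)

# P[X >= 3] therefore needs the previous integer
p_ge <- pbinom(2, size = 10, prob = 0.5, lower.tail = FALSE)

print(c(p_eq = p_eq, p_le = p_le, p_gt = p_gt, p_ge = p_ge))
```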
One further important technique for many statistical applications is the simulation of data from
specified probability distributions. R enables simulation from a wide range of distributions, using a
syntax similar to the above.
For example, to simulate 100 observations from the N(3, 4) distribution we write
> rnorm(100,3,2)
Similarly, rt, rpois for simulation from the t and Poisson distributions, etc.
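A small sketch of a simulation run. The seed value is arbitrary and is set only so that the run can be repeated; sim and t_vals are illustrative names.

```r
set.seed(42)  # arbitrary seed, so that the same numbers are drawn each time

# 100 observations from N(3, 4), i.e. mean 3 and standard deviation 2
sim <- rnorm(100, mean = 3, sd = 2)

# Sample mean and standard deviation should be close to 3 and 2
print(c(mean = mean(sim), sd = sd(sim)))

# Five observations from a t-distribution with 5 degrees of freedom
t_vals <- rt(5, df = 5)
print(t_vals)
```

The sample values will not equal the population values exactly; the agreement improves as the number of simulated observations grows.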
Exercises
1. Suppose X ~ N(2, 0.25). Denote by f and F the density and distribution functions of X respectively.
Use R to calculate
(a) f(0.5)
(b) F(2.5)
(c) F^-1(0.95) (recall that F^-1 is the quantile function)
(d) Pr(1 ≤ X ≤ 3)
2. Repeat question 1 in the case that X has a t-distribution with 5 degrees of freedom.
3. Use the function rpois to simulate 100 values from a Poisson distribution with a parameter
of your own choice. Produce a statistical summary of the result and check that the mean and
variance are in reasonable agreement with the true population values.
4. Repeat the previous question replacing rpois with rexp.
18. Graphics
R has many facilities for producing high-quality graphics. A useful facility before beginning is to
divide a page into smaller pieces so that more than one figure can be displayed.
There are basically two types of graphics functions in R:
1. High-level functions: create a new graph.
2. Low-level functions: add elements to an existing graph.
Various graphics functions are listed in the following table.
Sr.No  Function      Graph Type
1      plot()        Scatter plot (vectors of x and y values)
2      hist()        Histogram
3      boxplot()     Box-and-whisker plot
4      stripchart()  Strip chart
5      barplot()     Bar graph
6      stem()        Stem-and-leaf display
Some additional arguments of the graphics functions:
Sr.No  Argument  Description
1      main      Title
2      xlab      x-axis label
3      ylab      y-axis label
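The sketch below puts these pieces together: par(mfrow = ...) divides the page, high-level calls start each figure, and low-level calls add to it. The titles and labels are illustrative, and the first line is an assumption that only matters when the code is run as a script rather than typed at the console.

```r
# Assumption: when run non-interactively, draw on a null device
# so that no Rplots.pdf file is written
if (!interactive()) pdf(NULL)

# Divide the page into 1 row and 2 columns; keep the old settings
old_par <- par(mfrow = c(1, 2))

# High-level function: starts a new graph (left panel)
hist(mtcars$mpg, main = "Fuel consumption",
     xlab = "Miles per gallon", ylab = "Frequency")
# Low-level function: adds a vertical line to the existing graph
abline(v = mean(mtcars$mpg), lty = 2)

# High-level function: starts the next graph (right panel)
plot(mtcars$wt, mtcars$mpg, main = "mpg against weight",
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")
# Low-level function: marks the point of means
points(mean(mtcars$wt), mean(mtcars$mpg), pch = 19, col = "red")

# Restore the previous page layout
par(old_par)
```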
> View(mtcars)
> help(mtcars)
> plot(mtcars$mpg,mtcars$cyl)
X<-seq(0,2,by=0.2)
y<-X
y1<-y^2
y2<-y^3
plot(X,y,"o",lty=1,xlab="Xaxis",ylab="Yaxis",ylim=range(0,max(y2)),cex=0.7,lwd=2)
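The code above computes y1 and y2 and sizes the y-axis to fit y2, but only plots y. A possible continuation, offered as a guess at the intent rather than the original listing, adds the other two curves with low-level calls; the colours, line types and legend text are illustrative.

```r
# Assumption: when run non-interactively, draw on a null device
if (!interactive()) pdf(NULL)

X <- seq(0, 2, by = 0.2)
y <- X
y1 <- y^2
y2 <- y^3

# High-level call: draws y and sets the axes (type "o" = points joined by lines)
plot(X, y, type = "o", lty = 1, xlab = "Xaxis", ylab = "Yaxis",
     ylim = range(0, max(y2)), cex = 0.7, lwd = 2)

# Low-level calls: add the remaining curves to the same graph
lines(X, y1, type = "o", lty = 2, col = "red")
lines(X, y2, type = "o", lty = 3, col = "blue")

# Identify the three curves
legend("topleft", legend = c("y = X", "y = X^2", "y = X^3"),
       lty = 1:3, col = c("black", "red", "blue"))
```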
Some
Try :
plot(mtcars$mpg,mtcars$cyl,type="l")
points(mtcars$mpg,mtcars$cyl)
lines(mtcars$mpg,mtcars$cyl , col="red")
points(mtcars$mpg,mtcars$cyl , col="red")
Plot Barplot
barplot(table(mtcars$cyl))
Plot Histogram
hist(mtcars$mpg)
Plot BoxPlot
boxplot(mtcars$cyl,mtcars$mpg)
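The call above draws one box for cyl and one for mpg side by side. A common alternative, shown here as a sketch, is the formula interface, which draws one box of mpg for each value of cyl; the title and axis labels are illustrative.

```r
# Assumption: when run non-interactively, draw on a null device
if (!interactive()) pdf(NULL)

# One box of mpg per number of cylinders
bp <- boxplot(mpg ~ cyl, data = mtcars,
              main = "Fuel consumption by cylinders",
              xlab = "Number of cylinders", ylab = "Miles per gallon")

# boxplot() invisibly returns the statistics it plotted
print(bp$names)  # group labels
print(bp$n)      # number of observations in each group
```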
pie(1:8,col=1:8)
We can also plot one variable against another using the function plot:
> plot(Height, Volume)
R can also produce a scatterplot matrix (a matrix of scatterplots for each pair of variables)
using the function pairs:
> pairs(trees)
Like many other functions, plot is object-specific: its behaviour depends on the object to which it is
applied. For example, if the object is a data frame, plot is identical to pairs: try plot(trees).
For some other possibilities try:
> data(nhtemp)
> str(nhtemp)
Time-Series [1:60] from 1912 to 1971: 49.9 52.3 49.4 51.1 49.4 47.9 49.8 50.9
49.3 51.9 ...
> head(nhtemp)
[1] 49.9 52.3 49.4 51.1 49.4 47.9
> length(nhtemp)
[1] 60
> plot(nhtemp)
> data(faithful)
> head(faithful)
eruptions waiting
1 3.600 79
2 1.800 54
3 3.333 74
4 2.283 62
5 4.533 85
6 2.883 55
> str(faithful)
'data.frame': 272 obs. of 2 variables:
$ eruptions: num 3.6 1.8 3.33 2.28 4.53 ...
$ waiting : num 79 54 74 62 85 55 88 85 51 85 ...
> plot(faithful)
> data(HairEyeColor)
> plot(HairEyeColor)
There are also many optional arguments in most plotting functions that can be used to control colours,
plotting characters, axis labels, titles etc. The functions points and lines are useful for adding points
and lines respectively to a current graph. The function abline is useful for adding a line with specified
intercept and slope.
To print a graph, point the cursor over the graphics window and press the right button on the mouse.
This should open up a menu which includes `print' as an option. You also have the option to save the
figure in various formats, for example as a postscript file, for storage and later use.
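A figure can also be saved from code by opening a file device, plotting, and closing the device. The sketch below writes to a temporary directory; the file name and the plot labels are hypothetical choices for this example.

```r
# Hypothetical output location inside R's temporary directory
out_file <- file.path(tempdir(), "nhtemp.pdf")

# Open a PDF device; everything plotted now goes to the file
pdf(out_file, width = 7, height = 5)
plot(nhtemp, main = "Mean annual temperature, New Haven",
     ylab = "Temperature (degrees F)")

# Close the device so that the file is written completely
invisible(dev.off())

print(file.exists(out_file))
```

Other file devices such as png() and postscript() follow the same open, plot, close pattern.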
##Tutorial
1. Use
> x<-rnorm(100)
or something similar, to generate some data. Produce a figure showing a histogram and
boxplot of the data. Modify the axis names and title of the plot in an appropriate way.
2. Type the following
> x<- (-10):10
> n<-length(x)
> y<-rnorm(n,x,4)
> plot(x,y)
> abline(0,1)
Try to understand the effect of each command and the graph that is produced.
3. Type the following:
> data(nhtemp)
> plot(nhtemp)
This produces a time series plot of measurements of annual mean temperatures in New
Hampshire, U.S.A.
and lines, use type='b' instead.
References: