
2. Basic Data Types


Contents
Variable Types
Tables
We look at some of the ways that R can store and organize data. This is a basic
introduction to a small subset of the different data types recognized by R and is not
comprehensive in any sense. The main goal is to demonstrate the different kinds of
information R can handle. It is assumed that you know how to enter data or read data files
which is covered in the first chapter.
2.1. Variable Types
2.1.1. Numbers
The way to work with real numbers has already been covered in the first chapter and is
briefly discussed here. The most basic way to store a number is to make an assignment of
a single number
> a <- 3
>
The "<-" tells R to take the number to the right of the symbol and store it in a variable
whose name is given on the left. You can also use the "=" symbol. When you make an
assignment R does not print out any information. If you want to see what value a variable
has just type the name of the variable on a line and press the enter key
> a
[1] 3
This allows you to do all sorts of basic operations and save the numbers
> b <- sqrt(a*a+3)
> b
[1] 3.464102
If you want to get a list of the variables that you have defined in a particular session you
can list them all using the ls command
> ls()
[1] "a" "b"
You are not limited to just saving a single number. You can create a list (also called a
"vector") using the c command
> a <- c(1,2,3,4,5)
> a
[1] 1 2 3 4 5
> a+1
[1] 2 3 4 5 6
> mean(a)
[1] 3
> var(a)
[1] 2.5
You can get access to particular entries in the vector in the following manner
> a <- c(1,2,3,4,5)
> a[1]
[1] 1
> a[2]
[1] 2
> a[0]
numeric(0)
> a[5]
[1] 5
> a[6]
[1] NA
Note that asking for the zero entry returns an empty vector, which indicates how the data
is stored. The first entry in the vector is the first number, and if you try to get a number
past the last number you get "NA."
Examples of the sort of operations you can do on vectors are given in a later chapter.
To initialize a list of numbers the numeric command can be used. For example, to create
a list of 10 numbers, initialized to zero, use the following command
> a <- numeric(10)
> a
[1] 0 0 0 0 0 0 0 0 0 0
If you wish to determine the data type used for a variable use the typeof command
> typeof(a)
[1] "double"
2.1.2. Strings
You are not limited to just storing numbers. You can also store strings. A string is
specified by using quotes. Both single and double quotes will work
> a <- "hello"
> a
[1] "hello"
> b <- c("hello","there")
> b
[1] "hello" "there"
> b[1]
[1] "hello"
The name of the type given to strings is character,
> typeof(a)
[1] "character"
> a = character(20)
> a
[1] "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""
2.1.3. Factors
Another important way R can store data is as a factor. Often an experiment includes
trials for different levels of some explanatory variable. For example, when looking at the
impact of carbon dioxide on the growth rate of a tree you might try to observe how
different trees grow when exposed to different preset concentrations of carbon dioxide.
The different levels are also called factors.
Assuming you know how to read in a file, we will look at the data file given in the first
chapter. Several of the variables in the file are factors
> summary(tree$CHBR)
 A1  A2  A3  A4  A5  A6  A7  B1  B2  B3  B4  B5  B6  B7  C1  C2  C3  C4
  3   1   1   3   1   3   1   1   3   3   3   3   3   3   1   3   1   3
 C5  C6  C7 CL6 CL7  D1  D2  D3  D4  D5  D6  D7
  1   1   1   1   1   1   1   3   1   1   1   1
Because the set of options given in the data file corresponding to the "CHBR" column are
not all numbers R automatically assumes that it is a factor. When you use summary on a
factor it does not print out the five point summary, rather it prints out the possible values
and the frequency that they occur.
In this data set several of the columns are factors, but the researchers used numbers to
indicate the different levels. For example, the first column, labeled "C," is a factor. Each
tree was grown in an environment with one of four different possible levels of carbon
dioxide. The researchers quite sensibly labeled these four environments as 1, 2, 3, and 4.
Unfortunately, R cannot determine that these are factors and must assume that they are
regular numbers.
This is a common problem and there is a way to tell R to treat the "C" column as a set of
factors. You specify that a variable is a factor using the factor command. In the following
example we convert tree$C into a factor
> tree$C
 [1] 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3
[39] 3 3 3 4 4 4 4 4 4 4 4 4 4 4 4 4
> summary(tree$C)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  1.000   2.000   2.000   2.519   3.000   4.000
> tree$C <- factor(tree$C)
> tree$C
 [1] 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3
[39] 3 3 3 4 4 4 4 4 4 4 4 4 4 4 4 4
Levels: 1 2 3 4
> summary(tree$C)
 1  2  3  4
 8 23 10 13
> levels(tree$C)
[1] "1" "2" "3" "4"
Once a vector is converted into a set of factors then R treats it in a different manner than
when it is a set of numbers. A set of factors has a discrete set of possible values, and it
does not make sense to try to find averages or other numerical descriptions. One thing
that is important is the number of times that each factor appears, called their
"frequencies," which is printed using the summary command.
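The conversion described above can be tried without the tree data. The following is a minimal sketch that uses a made-up vector (co2 is a hypothetical stand-in for a column such as tree$C). It also shows one common way to get the original numbers back from a factor. Note that since R 4.0.0 read.csv no longer turns text columns into factors by default, so a column like CHBR may need stringsAsFactors=TRUE or an explicit call to factor.

```r
# Hypothetical treatment codes standing in for a column such as tree$C
co2 <- c(1, 1, 2, 2, 2, 3, 4, 4)

# Treat the codes as levels of a factor rather than as numbers
co2_factor <- factor(co2)

# summary() on a factor gives the frequency of each level
freq <- summary(co2_factor)

# The levels are stored as character strings
lev <- levels(co2_factor)

# Going back to numbers: convert to character first, then to numeric
back <- as.numeric(as.character(co2_factor))
```

Calling as.numeric directly on a factor returns the internal level codes, which is why the sketch converts through as.character first.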
2.1.4. Data Frames
Another way that information is stored is in data frames. This is a way to take many
vectors of different types and store them in the same variable. The vectors can be of all
different types. For example, a data frame may contain many lists, and each list might be
a list of factors, strings, or numbers.
There are different ways to create and manipulate data frames. Most are beyond the scope
of this introduction. They are only mentioned here to offer a more complete description.
Please see the first chapter for more information on data frames.
One example of how to create a data frame is given below
> a <- c(1,2,3,4)
> b <- c(2,4,6,8)
> levels <- factor(c("A","B","A","B"))
> bubba <- data.frame(first=a,
                      second=b,
                      f=levels)
> bubba
  first second f
1     1      2 A
2     2      4 B
3     3      6 A
4     4      8 B
> summary(bubba)
     first          second    f
 Min.   :1.00   Min.   :2.0   A:2
 1st Qu.:1.75   1st Qu.:3.5   B:2
 Median :2.50   Median :5.0
 Mean   :2.50   Mean   :5.0
 3rd Qu.:3.25   3rd Qu.:6.5
 Max.   :4.00   Max.   :8.0
> bubba$first
[1] 1 2 3 4
> bubba$second
[1] 2 4 6 8
> bubba$f
[1] A B A B
Levels: A B
2.1.5. Logical
Another important data type is the logical type. There are two predefined values,
TRUE and FALSE
> a = TRUE
> typeof(a)
[1] "logical"
> b = FALSE
> typeof(b)
[1] "logical"
The standard logical operators can be used
<         less than
>         greater than
<=        less than or equal
>=        greater than or equal
==        equal to
!=        not equal to
|         entry wise or
||        or
!         not
&         entry wise and
&&        and
xor(a,b)  exclusive or
Note that there is a difference between operators that act on each entry within a vector
and operators that return a single value
> a = c(TRUE,FALSE)
> b = c(FALSE,FALSE)
> a|b
[1] TRUE FALSE
> a||b
[1] TRUE
> xor(a,b)
[1] TRUE FALSE
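A short sketch of the entry wise operators next to the single-value ones may help. The vectors are the same two used above. One caution: in R 4.3.0 and later, || and && signal an error when either side has more than one entry, so the sketch applies them to single entries and uses any and all to reduce a whole vector to one value.

```r
a <- c(TRUE, FALSE)
b <- c(FALSE, FALSE)

# Entry wise operators return one result per position
elementwise_or  <- a | b      # TRUE FALSE
elementwise_and <- a & b      # FALSE FALSE

# any() and all() reduce a logical vector to a single value
any_true <- any(a | b)        # TRUE
all_true <- all(a | b)        # FALSE

# || and && are meant for single values, e.g. inside an if statement
first_or <- a[1] || b[1]      # TRUE
```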
There are a large number of functions that test to determine the type of a variable. For
example the is.numeric function can determine if a variable is numeric
> a = c(1,2,3)
> is.numeric(a)
[1] TRUE
> is.factor(a)
[1] FALSE
2.2. Tables
Another common way to store information is in a table. Here we look at how to define
both one way and two way tables. We only look at how to create and define tables; the
functions used in the analysis of proportions are examined in another chapter.
2.2.1. One Way Tables
The first example is for a one way table. One way tables are not the most interesting
example, but they are a good place to start. One way to create a table is using the table
command. The argument it takes is a vector of factors, and it calculates the frequency
that each factor occurs. Here is an example of how to create a one way table
> a <- factor(c("A","A","B","A","B","B","C","A","C"))
> results <- table(a)
> results
a
A B C
4 3 2
> attributes(results)
$dim
[1] 3
$dimnames
$dimnames$a
[1] "A" "B" "C"
$class
[1] "table"
> summary(results)
Number of cases in table: 9
Number of factors: 1
If you know the number of occurrences for each factor then it is possible to create the
table directly, but the process is, unfortunately, a bit more convoluted. There is an easier
way to define one-way tables (a table with one row), but it does not extend easily to
two-way tables (tables with more than one row). You must first create a matrix of numbers. A
matrix is like a vector in that it is a list of numbers, but it is different in that you can have
both rows and columns of numbers. For example, in our example above the number of
occurrences of "A" is 4, the number of occurrences of "B" is 3, and the number of
occurrences of "C" is 2. We will create one row of numbers. The first column contains a
4, the second column contains a 3, and the third column contains a 2
> occur <- matrix(c(4,3,2),ncol=3,byrow=TRUE)
> occur
     [,1] [,2] [,3]
[1,]    4    3    2
At this point the variable "occur" is a matrix with one row and three columns of numbers.
To dress it up and use it as a table we would like to give it labels for each column just
like in the previous example. Once that is done we convert the matrix to a table using the
as.table command
> colnames(occur) <- c("A","B","C")
> occur
     A B C
[1,] 4 3 2
> occur <- as.table(occur)
> occur
  A B C
A 4 3 2
> attributes(occur)
$dim
[1] 1 3
$dimnames
$dimnames[[1]]
[1] "A"
$dimnames[[2]]
[1] "A" "B" "C"
$class
[1] "table"
2.2.2. Two Way Tables
If you want to add rows to your table just add another vector to the argument of the table
command. In the example below we have two questions. In the first question the
responses are labeled "Never," "Sometimes," or "Always." In the second question the
responses are labeled "Yes," "No," or "Maybe." The vectors "a" and "b" contain
the response for each measurement. The third item in "a" is how the third person
responded to the first question, and the third item in "b" is how the third person
responded to the second question.
> a <- c("Sometimes","Sometimes","Never","Always","Always","Sometimes","Sometimes","Never")
> b <- c("Maybe","Maybe","Yes","Maybe","Maybe","No","Yes","No")
> results <- table(a,b)
> results
           b
a           Maybe No Yes
  Always        2  0   0
  Never         0  1   1
  Sometimes     2  1   1
The table command allows us to do a very quick calculation, and we can immediately see
that two people who said "Sometimes" to the first question also said "Maybe" to the
second question.
Just as in the case with one-way tables it is possible to manually enter two way tables.
The procedure is exactly the same as above except that we now have more than one row.
We give a brief example below to demonstrate how to enter a two-way table that includes
a breakdown of a group of people by both their gender and whether or not they smoke. You
enter all of the data as one long list but tell R to break it up into some number of columns
> sexsmoke<-matrix(c(70,120,65,140),ncol=2,byrow=TRUE)
> rownames(sexsmoke)<-c("male","female")
> colnames(sexsmoke)<-c("smoke","nosmoke")
> sexsmoke <- as.table(sexsmoke)
> sexsmoke
       smoke nosmoke
male      70     120
female    65     140
The matrix command creates a two by two matrix. The byrow=TRUE option indicates
that the numbers are filled in across the rows first, and the ncol=2 option indicates that there
are two columns.
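To see what the byrow option changes, it can help to fill the same numbers both ways. This is a small sketch with made-up values (the numbers 1 through 4 are only placeholders); the variable names are chosen for illustration.

```r
values <- 1:4

# Fill across the rows first: the first row is 1 2, the second row is 3 4
by_row <- matrix(values, ncol = 2, byrow = TRUE)

# Default behavior fills down the columns first: the first column is 1 2
by_col <- matrix(values, ncol = 2, byrow = FALSE)

# The two results are transposes of each other
same_after_transpose <- identical(by_col, t(by_row))
```

If the counts in a hand-entered table look like they are in the wrong cells, the byrow setting is the first thing to check.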
3. Basic Operations and Numerical Descriptions
Contents
Basic Operations
Basic Numerical Descriptions
Operations on Vectors
We look at some of the basic operations that you can perform on lists of numbers. It is
assumed that you know how to enter data or read data files which is covered in the first
chapter, and you know about the basic data types.
3.1. Basic Operations
Once you have a vector (or a list of numbers) in memory most basic operations are
available. Most of the basic operations will act on a whole vector and can be used to
quickly perform a large number of calculations with a single command. There is one
thing to note, if you perform an operation on more than one vector it is often necessary
that the vectors all contain the same number of entries.
Here we first define a vector which we will call "a" and will look at how to add and
subtract constant numbers from all of the numbers in the vector. First, the vector will
contain the numbers 1, 2, 3, and 4. We then see how to add 5 to each of the numbers,
subtract 10 from each of the numbers, multiply each number by 4, and divide each
number by 5.
> a <- c(1,2,3,4)
> a
[1] 1 2 3 4
> a + 5
[1] 6 7 8 9
> a - 10
[1] -9 -8 -7 -6
> a*4
[1]  4  8 12 16
> a/5
[1] 0.2 0.4 0.6 0.8
We can save the results in another vector called b
> b <- a - 10
> b
[1] -9 -8 -7 -6
If you want to take the square root, find e raised to each number, the logarithm, etc., then
the usual commands can be used
> sqrt(a)
[1] 1.000000 1.414214 1.732051 2.000000
> exp(a)
[1]  2.718282  7.389056 20.085537 54.598150
> log(a)
[1] 0.0000000 0.6931472 1.0986123 1.3862944
> exp(log(a))
[1] 1 2 3 4
By combining operations and using parentheses you can make more complicated
expressions
> c <- (a + sqrt(a))/(exp(2)+1)
> c
[1] 0.2384059 0.4069842 0.5640743 0.7152175
Note that you can do the same operations with vector arguments. For example to add the
elements in vector a to the elements in vector b use the following command
> a + b
[1] -8 -6 -4 -2
The operation is performed on an element by element basis. Note this is true for almost
all of the basic functions. So you can bring together all kinds of complicated expressions
> a*b
[1]  -9 -16 -21 -24
> a/b
[1] -0.1111111 -0.2500000 -0.4285714 -0.6666667
> (a+3)/(sqrt(1-b)*2-1)
[1] 0.7512364 1.0000000 1.2884234 1.6311303
You need to be careful of one thing. When you do operations on vectors they are
performed on an element by element basis. One ramification of this is that all of the
vectors in an expression should be the same length. If the lengths of the vectors differ then
you may get an error message, or worse, a warning message and unexpected results
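What R actually does when the lengths differ is reuse the entries of the shorter vector from the start. The sketch below uses made-up vectors (the names short, long, and odd are illustrative only). When the longer length is an exact multiple of the shorter one there is no warning at all, which is why mismatched lengths can go unnoticed.

```r
short <- c(1, 2)
long  <- c(10, 20, 30, 40)

# The short vector is reused as 1 2 1 2, with no warning
recycled <- short + long       # 11 22 31 42

# Three entries against four: R still returns a result but warns
odd <- c(1, 2, 3)
outcome <- tryCatch(odd + long,
                    warning = function(w) "warning raised")

# A simple guard to run before combining two vectors
lengths_match <- length(odd) == length(long)
```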
> a <- c(1,2,3)
> b <- c(10,11,12,13)
> a+b
[1] 11 13 15 14
Warning message:
longer object length
        is not a multiple of shorter object length in: a + b
As you work in R and create new vectors it can be easy to lose track of what variables
you have defined. To get a list of all of the variables that have been defined use the ls()
command
> ls()
[1] "a"            "b"            "bubba"        "c"            "last.warning"
[6] "tree"         "trees"
Finally, you should keep in mind that the basic operations almost always work on an
element by element basis. There are rare exceptions to this general rule. For example, if
you look at the minimum of two vectors using the min command you will get the
minimum of all of the numbers. There is a special command, called pmin, that may be the
command you want in some circumstances
> a <- c(1,-2,3,-4)
> b <- c(-1,2,-3,4)
> min(a,b)
[1] -4
> pmin(a,b)
[1] -1 -2 -3 -4
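The same distinction holds for the maximum. As a brief sketch using the two vectors above, pmax is the entry by entry counterpart of max:

```r
a <- c(1, -2, 3, -4)
b <- c(-1, 2, -3, 4)

overall_min  <- min(a, b)    # one number: the smallest of all eight values
pairwise_min <- pmin(a, b)   # one number per position
overall_max  <- max(a, b)    # one number: the largest of all eight values
pairwise_max <- pmax(a, b)   # one number per position
```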
3.2. Basic Numerical Descriptions
Given a vector of numbers there are some basic commands to make it easier to get some
of the basic numerical descriptions of a set of numbers. Here we assume that you can
read in the tree data that was discussed in a previous chapter. It is assumed that it is
stored in a variable called tree
> tree <- read.csv(file="trees91.csv",header=TRUE,sep=",")
> names(tree)
 [1] "C"      "N"      "CHBR"   "REP"    "LFBM"   "STBM"   "RTBM"   "LFNCC"
 [9] "STNCC"  "RTNCC"  "LFBCC"  "STBCC"  "RTBCC"  "LFCACC" "STCACC" "RTCACC"
[17] "LFKCC"  "STKCC"  "RTKCC"  "LFMGCC" "STMGCC" "RTMGCC" "LFPCC"  "STPCC"
[25] "RTPCC"  "LFSCC"  "STSCC"  "RTSCC"
Each column in the data frame can be accessed as a vector. For example the numbers
associated with the leaf biomass (LFBM) can be found using tree$LFBM
> tree$LFBM
[1] 0.430 0.400 0.450 0.920 0.520 1.320 0.400 1.190 0.490 0.210 0.210
0.310
[13] 0.650 0.190 0.520 0.300 0.590 0.490 0.590 0.590 0.410 0.490 1.160
1.210
[25] 1.190 0.930 1.220 0.110 1.020 0.130 0.690 0.610 0.100 0.920 0.160
0.110
[31] 1.640 1.490 0.140 1.240 1.120 0.150 0.340 0.910 0.410 0.560 0.550
0.610
[44] 1.260 0.465 0.940 0.410 1.010 1.220
The following commands can be used to get the mean, median, quantiles, minimum,
maximum, variance, and standard deviation of a set of numbers
> mean(tree$LFBM)
[1] 0.7649074
> median(tree$LFBM)
[1] 0.72
> quantile(tree$LFBM)
    0%    25%    50%    75%   100%
0.1300 0.4800 0.7200 1.0075 1.7600
> min(tree$LFBM)
[1] 0.13
> max(tree$LFBM)
[1] 1.76
> var(tree$LFBM)
[1] 0.1429382
> sd(tree$LFBM)
[1] 0.3780717
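These commands are related to one another, which gives a quick way to check results. The sketch below uses a small made-up vector (x is hypothetical, not the tree data): the standard deviation is the square root of the variance, and the 0%, 50%, and 100% quantiles are the minimum, median, and maximum.

```r
# Hypothetical measurements, used only for illustration
x <- c(2, 4, 4, 5, 7, 9)

# The standard deviation is the square root of the variance
sd_matches_var <- isTRUE(all.equal(sd(x), sqrt(var(x))))

# quantile() returns a named vector; unname() drops the percent labels
q <- unname(quantile(x))

middle_is_median <- isTRUE(all.equal(q[3], median(x)))
ends_are_min_max <- q[1] == min(x) && q[5] == max(x)
```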
Finally, the summary command will print out the min, max, mean, median, and quantiles
> summary(tree$LFBM)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 0.1300  0.4800  0.7200  0.7649  1.0080  1.7600
The summary command is especially nice because if you give it a data frame it will print
out the summary for every vector in the data frame
> summary(tree)
- # -./0 0<H 2>/5
5"n. 81.000 5"n. 81.000 $1 8 3 5"n. 8 1.00 5"n. 8
0.1300
1st 6!.82.000 1st 6!.81.000 $4 8 3 1st 6!.8 4.00 1st
6!.80.4900
5e)"an 82.000 5e)"an 82.000 $6 8 3 5e)"an 814.00 5e)"an 8
0.1200
5ean 82.514 5ean 81.426 /2 8 3 5ean 813.05 5ean 8
0.1644
3r) 6!.83.000 3r) 6!.83.000 /3 8 3 3r) 6!.820.00 3r)
6!.81.0015
5a7. 84.000 5a7. 83.000 /4 8 3 5a7. 820.00 5a7. 8
1.1600
(Lt*er)836 #$Ms 811.00
?:/5 0:/5 2>#-- ?:#--
5"n. 80.0300 5"n. 80.1200 5"n. 80.990 5"n. 80.3100
1st 6!.80.1400 1st 6!.80.2925 1st 6!.81.312 1st 6!.80.6400
5e)"an 80.2450 5e)"an 80.4450 5e)"an 81.550 5e)"an 80.1950
5ean 80.2993 5ean 80.4662 5ean 81.560 5ean 80.1912
3r) 6!.80.3900 3r) 6!.80.5500 3r) 6!.81.199 3r) 6!.80.4350
5a7. 80.1200 5a7. 81.5100 5a7. 82.160 5a7. 81.2400
0:#-- 2>/-- ?:/-- 0:/--
5"n. 80.4100 5"n. 825.00 5"n. 814.00 5"n. 815.00
1st 6!.80.6000 1st 6!.834.00 1st 6!.811.00 1st 6!.814.00
5e)"an 80.1500 5e)"an 831.00 5e)"an 819.00 5e)"an 820.00
5ean 80.1344 5ean 836.46 5ean 819.90 5ean 821.43
3r) 6!.80.9100 3r) 6!.841.00 3r) 6!.820.00 3r) 6!.823.00
5a7. 81.5500 5a7. 849.00 5a7. 821.00 5a7. 841.00
2>-$-- ?:-$-- 0:-$-- 2>I--
5"n. 80.2100 5"n. 80.1300 5"n. 80.1100 5"n. 80.6500
1st 6!.80.2600 1st 6!.80.1600 1st 6!.80.1600 1st 6!.80.9100
5e)"an 80.2400 5e)"an 80.1100 5e)"an 80.1650 5e)"an 80.4000
5ean 80.2964 5ean 80.1114 5ean 80.1654 5ean 80.4053
3r) 6!.80.3100 3r) 6!.80.1915 3r) 6!.80.1100 3r) 6!.80.4400
5a7. 80.3600 5a7. 80.2400 5a7. 80.2400 5a7. 81.1900
#$Ms 81.0000
?:I-- 0:I-- 2>5J-- ?:5J--
5"n. 80.910 5"n. 80.330 5"n. 80.0100 5"n. 80.100
1st 6!.80.440 1st 6!.80.400 1st 6!.80.1000 1st 6!.80.110
5e)"an 81.055 5e)"an 80.415 5e)"an 80.1200 5e)"an 80.130
5ean 81.105 5ean 80.413 5ean 80.1104 5ean 80.135
3r) 6!.81.210 3r) 6!.80.520 3r) 6!.80.1300 3r) 6!.80.150
5a7. 81.520 5a7. 80.640 5a7. 80.1400 5a7. 80.140
0:5J-- 2>H-- ?:H-- 0:H--
5"n. 80.04000 5"n. 80.1500 5"n. 80.1500 5"n. 80.1000
1st 6!.80.06000 1st 6!.80.2000 1st 6!.80.2200 1st 6!.80.1300
5e)"an 80.01000 5e)"an 80.2400 5e)"an 80.2900 5e)"an 80.1450
5ean 80.06649 5ean 80.2391 5ean 80.2101 5ean 80.1465
3r) 6!.80.01000 3r) 6!.80.2100 3r) 6!.80.3115 3r) 6!.80.1600
5a7. 80.04000 5a7. 80.3100 5a7. 80.4100 5a7. 80.2100
2>?-- ?:?-- 0:?--
5"n. 80.0400 5"n. 80.1400 5"n. 80.0400
1st 6!.80.1325 1st 6!.80.1600 1st 6!.80.1200
5e)"an 80.1600 5e)"an 80.1900 5e)"an 80.1300
5ean 80.1661 5ean 80.1911 5ean 80.1249
3r) 6!.80.1915 3r) 6!.80.2000 3r) 6!.80.1415
5a7. 80.2600 5a7. 80.2900 5a7. 80.1100
3.3. Operations on Vectors
Here we look at some commonly used commands that perform operations on lists. The
commands include the sort, min, max, and sum commands. First, the sort command can
sort the given vector in either ascending or descending order
> a = c(2,4,6,3,1,5)
> b = sort(a)
> c = sort(a,decreasing = TRUE)
> a
[1] 2 4 6 3 1 5
> b
[1] 1 2 3 4 5 6
> c
[1] 6 5 4 3 2 1
The min and the max commands find the minimum and the maximum numbers in the
vector
> min(a)
[1] 1
> max(a)
[1] 6
Finally, the sum command adds up the numbers in the vector
> sum(a)
[1] 21
4. Basic Probability Distributions
Contents
The Normal Distribution
The t Distribution
The Binomial Distribution
The Chi-Squared Distribution
We look at some of the basic operations associated with probability distributions. There
are a large number of probability distributions available, but we only look at a few. If you
would like to know what distributions are available you can do a search using the
command help.search("distribution").
Here we give details about the commands associated with the normal distribution and
briefly mention the commands for other distributions. The functions for different
distributions are very similar, and the differences are noted below.
For this chapter it is assumed that you know how to enter data which is covered in the
previous chapters.
To get a full list of the distributions available in R you can use the following command
help(Distributions)
For every distribution there are four commands. The commands for each distribution are
prepended with a letter to indicate the functionality
"d" returns the height of the probability density function
"p" returns the cumulative distribution function
"q" returns the inverse cumulative distribution function (quantiles)
"r" returns randomly generated numbers
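As a quick sketch of how the four prefixes fit together, here they are applied to the standard normal distribution. The value 1.5 and the seed are arbitrary choices made for illustration.

```r
# "d": height of the density at 0, which for the standard normal is 1/sqrt(2*pi)
density_at_zero <- dnorm(0)

# "p": probability of a value less than or equal to 1.5
p <- pnorm(1.5)

# "q": the inverse of "p", so this recovers 1.5
z <- qnorm(p)

# "r": random draws; setting a seed makes them repeatable
set.seed(42)
draws <- rnorm(5)
```

The same pattern applies to the other distributions in this chapter, with extra arguments such as df or the number of trials.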
4.1. The Normal Distribution
There are four functions that can be used to generate the values associated with the
normal distribution. You can get a full list of them and their options using the help
command
> help(Normal)
The first function we look at is dnorm. Given a set of values it returns the height of the
probability distribution at each point. If you only give the points it assumes you want to
use a mean of zero and standard deviation of one. There are options to use different
values for the mean and standard deviation, though
> dnorm(0)
[1] 0.3989423
> dnorm(0)*sqrt(2*pi)
[1] 1
> dnorm(0,mean=4)
[1] 0.0001338302
> dnorm(0,mean=4,sd=10)
[1] 0.03682701
> v <- c(0,1,2)
> dnorm(v)
[1] 0.39894228 0.24197072 0.05399097
> x <- seq(-20,20,by=.1)
> y <- dnorm(x)
> plot(x,y)
> y <- dnorm(x,mean=2.5,sd=0.1)
> plot(x,y)
The second function we examine is pnorm. Given a number or a list it computes the
probability that a normally distributed random number will be less than that number. This
function also goes by the rather ominous title of the "Cumulative Distribution Function."
It accepts the same options as dnorm
> pnorm(0)
[1] 0.5
> pnorm(1)
[1] 0.8413447
> pnorm(0,mean=2)
[1] 0.02275013
> pnorm(0,mean=2,sd=3)
[1] 0.2524925
> v <- c(0,1,2)
> pnorm(v)
[1] 0.5000000 0.8413447 0.9772499
> x <- seq(-20,20,by=.1)
> y <- pnorm(x)
> plot(x,y)
> y <- pnorm(x,mean=3,sd=4)
> plot(x,y)
If you wish to find the probability that a number is larger than the given number you can
use the lower.tail option
> pnorm(0,lower.tail=FALSE)
[1] 0.5
> pnorm(1,lower.tail=FALSE)
[1] 0.1586553
> pnorm(0,mean=2,lower.tail=FALSE)
[1] 0.9772499
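The upper tail is simply one minus the lower tail, and the difference of two pnorm values gives the probability of landing in an interval. A brief sketch for the standard normal (the variable names are illustrative):

```r
# Upper tail computed two ways
upper_direct     <- pnorm(1, lower.tail = FALSE)
upper_complement <- 1 - pnorm(1)

# Probability of falling within one standard deviation of the mean
within_one_sd <- pnorm(1) - pnorm(-1)   # about 0.68
```

For values far out in the tail, lower.tail=FALSE is usually the safer choice because subtracting from one can lose precision.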
The next function we look at is qnorm which is the inverse of pnorm. The idea behind
qnorm is that you give it a probability, and it returns the number whose cumulative
distribution matches the probability. For example, if you have a normally distributed
random variable with mean zero and standard deviation one, then if you give the function
a probability it returns the associated Z-score
> qnorm(0.5)
[1] 0
> qnorm(0.5,mean=1)
[1] 1
> qnorm(0.5,mean=1,sd=2)
[1] 1
> qnorm(0.5,mean=2,sd=2)
[1] 2
> qnorm(0.5,mean=2,sd=4)
[1] 2
> qnorm(0.25,mean=2,sd=2)
[1] 0.6510205
> qnorm(0.333)
[1] -0.4316442
> qnorm(0.333,sd=3)
[1] -1.294933
> qnorm(0.75,mean=5,sd=2)
[1] 6.34898
> v = c(0.1,0.3,0.75)
> qnorm(v)
[1] -1.2815516 -0.5244005  0.6744898
> x <- seq(0,1,by=.05)
> y <- qnorm(x)
> plot(x,y)
> y <- qnorm(x,mean=3,sd=2)
> plot(x,y)
> y <- qnorm(x,mean=3,sd=0.1)
> plot(x,y)
The last function we examine is the rnorm function which can generate random numbers
whose distribution is normal. The argument that you give it is the number of random
numbers that you want, and it has optional arguments to specify the mean and standard
deviation
> rnorm(4)
[1] 1.2391211 -0.2323254 -1.2003091 -1.6119493
> rnorm(4,mean=3)
[1] 2.633090 3.611496 2.039961 2.601433
> rnorm(4,mean=3,sd=3)
[1] 4.590556 2.414403 4.156041 6.345944
> rnorm(4,mean=3,sd=3)
[1] 3.000952 3.114190 10.032021 3.245661
> y <- rnorm(200)
> hist(y)
> y <- rnorm(200,mean=-2)
> hist(y)
> y <- rnorm(200,mean=-2,sd=4)
> hist(y)
> qqnorm(y)
> qqline(y)
4.2. The t Distribution
There are four functions that can be used to generate the values associated with the t
distribution. You can get a full list of them and their options using the help command
> help(TDist)
These commands work just like the commands for the normal distribution. One
difference is that the commands assume that the values are normalized to mean zero and
standard deviation one, so you have to use a little algebra to use these functions in
practice. The other difference is that you have to specify the number of degrees of
freedom. The commands follow the same kind of naming convention, and the names of
the commands are dt, pt, qt, and rt.
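One common form of the "little algebra" is the usual one-sample t statistic, which subtracts a reference mean and divides by the standard error. The sketch below is an illustration under assumptions, not the only way to standardize: the data vector and the reference mean mu0 are made up, and the built-in t.test command is used only to confirm the hand calculation.

```r
# Hypothetical sample and reference mean
x   <- c(-3, -4, -2, -1)
mu0 <- 2
n   <- length(x)

# Standardize: (sample mean - reference mean) / standard error
t_stat <- (mean(x) - mu0) / (sd(x) / sqrt(n))

# Lower tail probability with n - 1 degrees of freedom
p_lower <- pt(t_stat, df = n - 1)

# The same probability from R's built-in one-sided test
p_check <- t.test(x, mu = mu0, alternative = "less")$p.value
```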
A few examples are given below to show how to use the different commands. First we
have the distribution function, dt
> x <- seq(-20,20,by=.5)
> y <- dt(x,df=10)
> plot(x,y)
> y <- dt(x,df=50)
> plot(x,y)
Next we have the cumulative probability distribution function
> pt(-3,df=10)
[1] 0.006671828
> pt(3,df=10)
[1] 0.9933282
> 1-pt(3,df=10)
[1] 0.006671828
> pt(3,df=20)
[1] 0.996462
> x = c(-3,-4,-2,-1)
> pt((mean(x)-2)/sd(x),df=20)
[1] 0.001165548
> pt((mean(x)-2)/sd(x),df=40)
[1] 0.000603064
Next we have the inverse cumulative probability distribution function
> qt(0.05,df=10)
[1] -1.812461
> qt(0.95,df=10)
[1] 1.812461
> qt(0.05,df=20)
[1] -1.724718
> qt(0.95,df=20)
[1] 1.724718
> v <- c(0.005,.025,.05)
> qt(v,df=253)
[1] -2.595401 -1.969385 -1.650899
> qt(v,df=25)
[1] -2.787436 -2.059539 -1.708141
Finally random numbers can be generated according to the t distribution
> rt(3,df=10)
[1] 0.4440430 2.1134365 0.6195262
> rt(3,df=20)
[1] 0.1043300 -1.4692149 0.0115013
> rt(3,df=20)
[1] 0.9023932 -0.4154190 -1.0546125
4.3. The Binomial Distribution
There are four functions that can be used to generate the values associated with the
binomial distribution. You can get a full list of them and their options using the help
command
> help(Binomial)
These commands work just like the commands for the normal distribution. The binomial
distribution requires two extra parameters, the number of trials and the probability of
success for a single trial. The commands follow the same kind of naming convention, and
the names of the commands are dbinom, pbinom, qbinom, and rbinom.
A few examples are given below to show how to use the different commands. First we
have the distribution function, dbinom
> x <- seq(0,50,by=1)
> y <- dbinom(x,50,0.2)
> plot(x,y)
> y <- dbinom(x,50,0.6)
> plot(x,y)
> x <- seq(0,100,by=1)
> y <- dbinom(x,100,0.6)
> plot(x,y)
Next we have the cumulative probability distribution function
> pbinom(24,50,0.5)
[1] 0.4438624
> pbinom(25,50,0.5)
[1] 0.5561376
> pbinom(25,51,0.5)
[1] 0.5
> pbinom(26,51,0.5)
[1] 0.610116
> pbinom(25,50,0.5)
[1] 0.5561376
> pbinom(25,50,0.25)
[1] 0.999962
> pbinom(25,500,0.25)
[1] 4.955658e-33
Next we have the inverse cumulative probability distribution function
> qbinom(0.5,51,1/2)
[1] 25
> qbinom(0.25,51,1/2)
[1] 23
> pbinom(23,51,1/2)
[1] 0.2879247
> pbinom(22,51,1/2)
[1] 0.200531
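Because the binomial distribution is discrete, the cumulative probability is a sum of individual probabilities, and the quantile is the smallest count whose cumulative probability reaches the requested level. A short sketch with 51 trials and success probability 1/2 (the variable names are illustrative):

```r
n <- 51
p <- 0.5

# Adding up dbinom for 0 through 25 successes gives pbinom(25, ...)
cumulative <- sum(dbinom(0:25, n, p))

# qbinom returns the smallest count whose cumulative probability is at least 0.25
q <- qbinom(0.25, n, p)
below_q <- pbinom(q - 1, n, p)   # still under 0.25
at_q    <- pbinom(q, n, p)       # first value at or above 0.25
```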
Finally random numbers can be generated according to the binomial distribution
> rbinom(5,100,.2)
[1] 30 23 21 14 19
> rbinom(5,100,.7)
[1] 66 66 59 69 63
4.4. The Chi-Squared Distribution
There are four functions that can be used to generate the values associated with the
Chi-Squared distribution. You can get a full list of them and their options using the help
command
> help(Chisquare)
These commands work just like the commands for the normal distribution. The first
difference is that it is assumed that you have normalized the value so no mean can be
specified. The other difference is that you have to specify the number of degrees of
freedom. The commands follow the same kind of naming convention, and the names of
the commands are dchisq, pchisq, qchisq, and rchisq.
A few examples are given below to show how to use the different commands. First we
have the distribution function, dchisq
> x <- seq(-20,20,by=.5)
> y <- dchisq(x,df=10)
> plot(x,y)
> y <- dchisq(x,df=12)
> plot(x,y)
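One thing worth noticing in these plots is that the left half is flat: a Chi-Squared random variable cannot be negative, so the density is zero there. The following sketch checks this and locates the peak on a grid of positive values. The grid x_pos is an assumption made for illustration; for more than two degrees of freedom the peak sits at the degrees of freedom minus two.

```r
x <- seq(-20, 20, by = 0.5)
y <- dchisq(x, df = 10)

# The density is exactly zero for every negative x
all_zero_left <- all(y[x < 0] == 0)

# Look for the peak on a grid of non-negative values
x_pos <- seq(0, 40, by = 0.5)
peak  <- x_pos[which.max(dchisq(x_pos, df = 10))]   # 10 - 2 = 8
```

Plotting over x_pos instead of x uses the whole width of the plot for the part of the curve that carries information.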
Next we have the cumulative probability distribution function
> pchisq(2,df=10)
[1] 0.003659847
> pchisq(3,df=10)
[1] 0.01857594
> 1-pchisq(3,df=10)
[1] 0.981424
> pchisq(3,df=20)
[1] 4.097501e-06
> x = c(2,4,5,6)
> pchisq(x,df=20)
[1] 1.114255e-07 4.649808e-05 2.773521e-04 1.102488e-03
Next we have the inverse cumulative probability distribution function
> qchisq(0.05,df=10)
[1] 3.940299
> qchisq(0.95,df=10)
[1] 18.30704
> qchisq(0.05,df=20)
[1] 10.85081
> qchisq(0.95,df=20)
[1] 31.41043
> v <- c(0.005,.025,.05)
> qchisq(v,df=253)
[1] 198.8161 210.8355 217.1713
> qchisq(v,df=25)
[1] 10.51965 13.11972 14.61141
Finally random numbers can be generated according to the Chi-Squared distribution
> rchisq(3,df=10)
[1] 16.90015 20.29412 12.34044
> rchisq(3,df=20)
[1] 11.939919 9.541436 11.496312
> rchisq(3,df=20)
[1] 11.14214 23.96401 24.91251
5. Basic Plots
Contents
Strip Charts
Histograms
Boxplots
Scatter Plots
Normal QQ Plots
We look at some of the ways R can display information graphically. This is a basic
introduction to some of the basic plotting commands. It is assumed that you know how to
enter data or read data files which is covered in the first chapter, and it is assumed that
you are familiar with the different data types.
In each of the topics that follow it is assumed that two different data sets, w1.dat and
trees91.csv, have been read and defined using the same variables as in the first chapter.
Both of these data sets come from the study discussed on the web site given in the first
chapter. We assume that they are read using "read.csv" into variables w1 and tree
> w1 <- read.csv(file="w1.dat",sep=",",head=TRUE)
> names(w1)
[1] "vals"
> tree <- read.csv(file="trees91.csv",sep=",",head=TRUE)
> names(tree)
 [1] "C"      "N"      "CHBR"   "REP"    "LFBM"   "STBM"   "RTBM"   "LFNCC"
 [9] "STNCC"  "RTNCC"  "LFBCC"  "STBCC"  "RTBCC"  "LFCACC" "STCACC" "RTCACC"
[17] "LFKCC"  "STKCC"  "RTKCC"  "LFMGCC" "STMGCC" "RTMGCC" "LFPCC"  "STPCC"
[25] "RTPCC"  "LFSCC"  "STSCC"  "RTSCC"
5.1. Strip Charts
A strip chart is the most basic type of plot available. It plots the data in order along a line
with each data point represented as a box. Here we provide examples using the w1 data
frame mentioned at the top of this page, and the one column of the data is w1$vals.
To create a strip chart of this data use the stripchart command
> help(stripchart)
> stripchart(w1$vals)
Strip Chart
This is the most basic possible strip chart. The stripchart() command takes many of the
standard plot() options for labeling and annotations.
As you can see this is about as bare bones as you can get. There is no title and there are
no axis labels. It only shows how the data looks if you were to put it all along one line and
mark out a box at each point. If you would prefer to see which points are repeated you can
specify that repeated points be stacked
> stripchart(w1$vals,method="stack")
A variation on this is to have the boxes moved up and down so that there is more
separation between them
> stripchart(w1$vals,method="jitter")
If you do not want the boxes plotted in the horizontal direction you can plot them in the
vertical direction
> stripchart(w1$vals,vertical=TRUE)
> stripchart(w1$vals,vertical=TRUE,method="jitter")
Since you should always annotate your plots there are many different ways to add titles
and labels. One way is within the stripchart command itself
> stripchart(w1$vals,method="stack",
             main='Leaf BioMass in High CO2 Environment',
             xlab='BioMass of Leaves')
If you have a plot already and want to add a title, you can use the title command
> title('Leaf BioMass in High CO2 Environment',xlab='BioMass of Leaves')
Note that this simply adds the title and labels and will write over the top of any titles or
labels you already have.
5.2. Histograms
A histogram is a very common plot. It plots the frequencies with which data appears within
certain ranges. Here we provide examples using the w1 data frame mentioned at the top of
this page, and the one column of data is w1$vals.
To plot a histogram of the data use the "hist" command
> hist(w1$vals)
> hist(w1$vals,main="Distribution of w1",xlab="w1")
Histogram Options
Many of the basic plot commands accept the same options. The help(hist) command will
give you the options specific to the hist command. Many further options are shared with
the plot command and are listed under help(plot).
Experiment with different options to see what you can do.
As you can see R will automatically calculate the intervals to use. There are many options
to determine how to break up the intervals. Here we look at just one way, varying the
domain size and number of breaks. If you would like to know more about the other
options check out the help page
> help(hist)
You can specify the number of breaks to use using the breaks option. Here we look at the
histogram for various numbers of breaks
> hist(w1$vals,breaks=2)
> hist(w1$vals,breaks=4)
> hist(w1$vals,breaks=6)
> hist(w1$vals,breaks=8)
> hist(w1$vals,breaks=12)
>
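When breaks is a single number R treats it as a suggestion and picks "pretty" boundaries, so the number of bars drawn may differ from the number requested. The following is a minimal sketch of how to inspect what R actually chose. The vector vals and its values are made up here as a stand-in for w1$vals.

```r
# Hypothetical sample standing in for w1$vals
vals <- c(0.43, 0.40, 0.45, 0.82, 0.52, 1.32, 0.90, 1.18, 0.48, 0.21,
          0.27, 0.31, 0.65, 0.18, 0.52, 0.30, 0.58, 0.68, 0.68, 0.79)

# plot = FALSE returns the histogram object instead of drawing it
h <- hist(vals, breaks = 4, plot = FALSE)

h$breaks   # bin boundaries actually chosen by R
h$counts   # number of observations in each bin

# To force exact boundaries, pass a vector of break points instead
h2 <- hist(vals, breaks = seq(0, 1.4, by = 0.35), plot = FALSE)
h2$counts  # exactly four bins
```

Passing a vector of break points is the more predictable choice when several histograms need to share the same bins.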
You can also vary the size of the domain using the xlim option. This option takes a vector
with two entries in it, the left value and the right value
> hist(w1$vals,breaks=12,xlim=c(0,10))
> hist(w1$vals,breaks=12,xlim=c(-1,2))
> hist(w1$vals,breaks=12,xlim=c(0,2))
> hist(w1$vals,breaks=12,xlim=c(1,1.3))
> hist(w1$vals,breaks=12,xlim=c(0.9,1.3))
>
The options for adding titles and labels are exactly the same as for strip charts. You
should always annotate your plots and there are many different ways to add titles and
labels. One way is within the hist command itself
> hist(w1$vals,
       main='Leaf BioMass in High CO2 Environment',
       xlab='BioMass of Leaves')
If you have a plot already and want to change or add a title, you can use the title
command
> title('Leaf BioMass in High CO2 Environment',xlab='BioMass of Leaves')
Note that this simply adds the title and labels and will write over the top of any titles or
labels you already have.
It is not uncommon to add other kinds of plots to a histogram. For example, one of the
options to the stripchart command is to add it to a plot that has already been drawn, so
you might want to have a histogram with the strip chart drawn across the top.
The addition of the strip chart might give you a better idea of the density of the data
> hist(w1$vals,main='Leaf BioMass in High CO2 Environment',
       xlab='BioMass of Leaves',ylim=c(0,16))
> stripchart(w1$vals,add=TRUE,at=15.5)
5.3. Boxplots
A boxplot provides a graphical view of the median, quartiles, maximum, and minimum of
a data set. Here we provide examples using two different data sets. The first is the w1
data frame mentioned at the top of this page, and the one column of data is w1$vals. The
second is the tree data frame from the trees91.csv data file which is also mentioned at the
top of the page.
We first use the w1 data set and look at the boxplot of this data set
> boxplot(w1$vals)
Again, this is a very plain graph, and the title and labels can be specified in exactly the
same way as in the stripchart and hist commands
> boxplot(w1$vals,
          main='Leaf BioMass in High CO2 Environment',
          ylab='BioMass of Leaves')
Note that the default orientation is to plot the boxplot vertically. Because of this we used
the ylab option to specify the axis label. There are a large number of options for this
command. To see more of the options see the help page
> help(boxplot)
As an example you can specify that the boxplot be plotted horizontally by specifying the
horizontal option
> boxplot(w1$vals,
          main='Leaf BioMass in High CO2 Environment',
          xlab='BioMass of Leaves',
          horizontal=TRUE)
The option to plot the box plot horizontally can be put to good use to display a box plot
on the same image as a histogram. You need to specify the add option, specify where to
put the box plot using the at option, and turn off the addition of axes using the axes
option
> hist(w1$vals,main='Leaf BioMass in High CO2 Environment',
       xlab='BioMass of Leaves',ylim=c(0,16))
> boxplot(w1$vals,horizontal=TRUE,at=15.5,add=TRUE,axes=FALSE)
If you are feeling really crazy you can take a histogram and add a box plot and a strip
chart
> hist(w1$vals,main='Leaf BioMass in High CO2 Environment',
       xlab='BioMass of Leaves',ylim=c(0,16))
> boxplot(w1$vals,horizontal=TRUE,at=16,add=TRUE,axes=FALSE)
> stripchart(w1$vals,add=TRUE,at=15)
Some people shell out good money to have this much fun.
For the second part on boxplots we will look at the second data frame, "tree," which
comes from the trees91.csv file. To reiterate the discussion at the top of this page and the
discussion in the data types chapter, we need to specify which columns are factors
> tree <- read.csv(file="trees91.csv",sep=",",head=TRUE)
> tree$C <- factor(tree$C)
> tree$N <- factor(tree$N)
We can look at the boxplot of just the data for the stem biomass
> boxplot(tree$STBM,
          main='Stem BioMass in Different CO2 Environments',
          ylab='BioMass of Stems')
That plot does not tell the whole story. It is for all of the trees, but the trees were grown in
different kinds of environments. The boxplot command can be used to plot a separate box
plot for each level. In this case the data is held in "tree$STBM," and the different levels
are stored as factors in "tree$C." The command to create different boxplots is the
following
> boxplot(tree$STBM~tree$C)
Note that for the level called "2" there are four outliers which are plotted as little circles.
There are many options to annotate your plot including different labels for each level.
Please use the help(boxplot) command for more information.
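The numbers behind a grouped boxplot can be inspected without drawing anything. The sketch below assumes a tiny made-up data frame (the values are hypothetical and are not taken from trees91.csv) and uses the formula interface with the data argument, which avoids repeating the data frame name.

```r
# Hypothetical miniature version of the tree data frame
tree <- data.frame(
  C    = factor(c(1, 1, 1, 1, 2, 2, 2, 2, 2)),
  STBM = c(0.40, 0.45, 0.52, 0.48, 0.60, 0.62, 0.65, 0.70, 1.90)
)

# plot = FALSE returns the statistics that would be drawn
b <- boxplot(STBM ~ C, data = tree, plot = FALSE)

b$names   # one box per factor level
b$stats   # one column per level; rows are lower whisker, lower hinge,
          # median, upper hinge, upper whisker
b$out     # observations flagged as outliers (drawn as circles)
b$group   # the box each outlier belongs to
```

In this made-up sample the value 1.90 lies far above the upper hinge of level "2" and is reported as an outlier.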
5.4. Scatter Plots
A scatter plot provides a graphical view of the relationship between two sets of numbers.
Here we provide examples using the tree data frame from the trees91.csv data file which
is mentioned at the top of the page. In particular we look at the relationship between the
stem biomass ("tree$STBM") and the leaf biomass ("tree$LFBM").
The command to plot each pair of points as an x-coordinate and a y-coordinate is "plot"
> plot(tree$STBM,tree$LFBM)
It appears that there is a strong positive association between the biomass in the stems of a
tree and the leaves of the tree. It appears to be a linear relationship. In fact, the correlation
between these two sets of observations is quite high
> cor(tree$STBM,tree$LFBM)
[1] 0.911595
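The number reported by cor is the Pearson correlation coefficient, that is, the covariance of the two variables divided by the product of their standard deviations. As a rough check of what the command computes, the sketch below evaluates the definition by hand. The vectors stem and leaf are made-up values, not the contents of the tree data frame.

```r
# Hypothetical measurements standing in for tree$STBM and tree$LFBM
stem <- c(0.40, 0.45, 0.52, 0.60, 0.62, 0.70, 0.85, 0.90)
leaf <- c(0.43, 0.40, 0.55, 0.58, 0.70, 0.66, 0.95, 0.88)

# Pearson correlation written out from its definition
r_manual <- sum((stem - mean(stem)) * (leaf - mean(leaf))) /
  sqrt(sum((stem - mean(stem))^2) * sum((leaf - mean(leaf))^2))

r_builtin <- cor(stem, leaf)

all.equal(r_manual, r_builtin)   # TRUE
```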
Getting back to the plot, you should always annotate your graphs. The title and labels can
be specified in exactly the same way as with the other plotting commands
> plot(tree$STBM,tree$LFBM,
       main="Relationship Between Stem and Leaf Biomass",
       xlab="Stem Biomass",
       ylab="Leaf Biomass")
5.5. Normal QQ Plots
The final type of plot that we look at is the normal quantile plot. This plot is used to
determine if your data is close to being normally distributed. The plot cannot prove that
the data is normally distributed, but it can show clearly when it is not. Here we
provide examples using the w1 data frame mentioned at the top of this page, and the one
column of data is w1$vals.
The command to generate a normal quantile plot is qqnorm. You can give it one
argument, the univariate data set of interest
> qqnorm(w1$vals)
You can annotate the plot in exactly the same way as all of the other plotting commands
given here
> qqnorm(w1$vals,
         main="Normal Q-Q Plot of the Leaf Biomass",
         xlab="Theoretical Quantiles of the Leaf Biomass",
         ylab="Sample Quantiles of the Leaf Biomass")
After you create the normal quantile plot you can also add the theoretical line that the data
should fall on if they were normally distributed
> qqline(w1$vals)
In this example you should see that the data is not quite normally distributed. There are a
few outliers, and it does not match up at the tails of the distribution.
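To see what qqnorm actually plots, the coordinates can be requested without drawing. The sketch below uses a made-up, deliberately non-normal sample (the name vals and the seed are arbitrary choices) and checks that the horizontal coordinates are normal quantiles at evenly spread probabilities while the vertical coordinates are the data themselves.

```r
set.seed(1)                      # arbitrary seed so the sketch is repeatable
vals <- rexp(15)                 # hypothetical, deliberately skewed sample

# plot.it = FALSE returns the plotted coordinates without drawing
q <- qqnorm(vals, plot.it = FALSE)

# Horizontal axis: normal quantiles at probabilities given by ppoints()
theoretical <- qnorm(ppoints(length(vals)))

all.equal(sort(q$x), theoretical)   # TRUE
all.equal(sort(q$y), sort(vals))    # TRUE
```

Points that bend away from a straight line, as they tend to for a skewed sample like this one, are the visual sign that the data is not normally distributed.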
6. Intermediate Plotting
Contents
Continuous Data
Discrete Data
Miscellaneous Options
We look at some more options for plotting, and we assume that you are familiar with the
basic plotting commands (Basic Plots). A variety of different subjects ranging from
plotting options to the formatting of plots is given.
In many of the examples below we use some of R's commands to generate random
numbers according to various distributions. The chapter is divided into three sections. The
focus of the first section is on graphing continuous data. The focus of the second section
is on graphing discrete data. The third section offers some miscellaneous options that are
useful in a variety of contexts.
6.1. Continuous Data
Contents
Multiple Data Sets on One Plot
Error Bars
Adding Noise (jitter)
Multiple Graphs on One Image
Density Plots
Pairwise Relationships
Shaded Regions
Plotting a Surface
In the examples below a data set is defined using R's normally distributed random
number generator.
> x <- rnorm(10,sd=5,mean=20)
> y <- 2.5*x - 1.0 + rnorm(10,sd=9,mean=0)
> cor(x,y)
[1] 0.7400576
6.1.1. Multiple Data Sets on One Plot
One common task is to plot multiple data sets on the same plot. In many situations the
way to do this is to create the initial plot and then add additional information to the plot.
For example, to plot bivariate data the plot command is used to initialize and create the
plot. The points command can then be used to add additional data sets to the plot.
First define a set of normally distributed random numbers and then plot them. (This same
data set is used throughout the examples below.)
> x <- rnorm(10,sd=5,mean=20)
> y <- 2.5*x - 1.0 + rnorm(10,sd=9,mean=0)
> cor(x,y)
[1] 0.7400576
> plot(x,y,xlab="Independent",ylab="Dependent",main="Random Stuff")
> x1 <- runif(8,15,25)
> y1 <- 2.5*x1 - 1.0 + runif(8,-6,6)
> points(x1,y1,col=2)
Note that in the previous example, the colour for the second set of data points is set using
the col option. You can try different numbers to see what colours are available. For most
installations there are at least eight options from 1 to 8. Also note that in the example
above the points are plotted as circles. The symbol that is used can be changed using the
pch option.
> x2 <- runif(8,15,25)
> y2 <- 2.5*x2 - 1.0 + runif(8,-6,6)
> points(x2,y2,col=3,pch=2)
Again, try different numbers to see the various options. Another helpful option is to add a
legend. This can be done with the legend command. The options for the command, in
order, are the x and y coordinates on the plot to place the legend followed by a list of
labels to use. There are a large number of other options so use help(legend) to see more
options. For example a list of colors can be given with the col option, and a list of
symbols can be given with the pch option.
> plot(x,y,xlab="Independent",ylab="Dependent",main="Random Stuff")
> points(x1,y1,col=2,pch=3)
> points(x2,y2,col=4,pch=5)
> legend(14,70,c("Original","one","two"),col=c(1,2,4),pch=c(1,3,5))
Figure 1.
The three data sets displayed on the same graph.
Another common task is to change the limits of the axes to change the size of the plotting
area. This is achieved using the xlim and ylim options in the plot command. Both options
take a vector of length two that holds the minimum and maximum values.
> plot(x,y,xlab="Independent",ylab="Dependent",main="Random Stuff",
       xlim=c(0,30),ylim=c(0,100))
> points(x1,y1,col=2,pch=3)
> points(x2,y2,col=4,pch=5)
> legend(14,70,c("Original","one","two"),col=c(1,2,4),pch=c(1,3,5))
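Fixed limits such as c(0,30) have to be chosen by hand. One possible alternative, sketched below, is to compute the limits from all of the data sets with the range command so that no added points fall outside the plotting area. The seed and the pdf(NULL) call are conveniences added for this sketch; the latter sends the drawing to a null device so the code also runs without a display, and it can be omitted in an interactive session.

```r
set.seed(2)                        # arbitrary seed so the sketch is repeatable
x  <- rnorm(10, sd = 5, mean = 20)
y  <- 2.5 * x - 1.0 + rnorm(10, sd = 9, mean = 0)
x1 <- runif(8, 15, 25)
y1 <- 2.5 * x1 - 1.0 + runif(8, -6, 6)

# range() over every data set gives limits that contain all of the points
xl <- range(x, x1)
yl <- range(y, y1)

pdf(NULL)   # null device; omit when working interactively
plot(x, y, xlim = xl, ylim = yl, xlab = "Independent", ylab = "Dependent")
points(x1, y1, col = 2, pch = 3)
dev.off()
```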
6.1.2. Error Bars
Another common task is to add error bars to a set of data points. This can be
accomplished using the arrows command. The arrows command takes two pairs of
coordinates, that is two pairs of x and y values. The command then draws a line between
each pair and adds an "arrow head" with a given length and angle.
> plot(x,y,xlab="Independent",ylab="Dependent",main="Random Stuff")
> xHigh <- x
> yHigh <- y + abs(rnorm(10,sd=3.5))
> xLow <- x
> yLow <- y - abs(rnorm(10,sd=3.1))
> arrows(xHigh,yHigh,xLow,yLow,col=2,angle=90,length=0.1,code=3)
Figure 2.
A data set with error bars added.
Note that the option code is used to specify where the bars are drawn. Its value can be 1,
2, or 3. If code is 1 the bars are drawn at pairs given in the first argument. If code is 2 the
bars are drawn at the pairs given in the second argument. If code is 3 the bars are drawn
at both.
6.1.3. Adding Noise (jitter)
In the previous example a little bit of "noise" was added to the pairs to produce an
artificial offset. This is a common thing to do for making plots. A simpler way to
accomplish this is to use the jitter command.
> numberWhite <- rhyper(400,4,5,3)
> numberChipped <- rhyper(400,2,7,3)
> par(mfrow=c(1,2))
> plot(numberWhite,numberChipped,xlab="Number White Marbles Drawn",
       ylab="Number Chipped Marbles Drawn",main="Pulling Marbles")
> plot(jitter(numberWhite),jitter(numberChipped),
       xlab="Number White Marbles Drawn",
       ylab="Number Chipped Marbles Drawn",
       main="Pulling Marbles With Jitter")
Figure 3.
Points with noise added using the jitter command.
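The effect of jitter is easiest to see numerically. In the sketch below (the seed is an arbitrary choice) the 400 draws take only a handful of distinct values, so many points sit exactly on top of each other. After jittering nearly every value is distinct, while each one has moved only a small distance from where it started.

```r
set.seed(3)                               # arbitrary seed
numberWhite <- rhyper(400, 4, 5, 3)       # integer counts between 0 and 3

jittered <- jitter(numberWhite)

# Many points share identical coordinates before jittering
length(unique(numberWhite))
length(unique(jittered))

# With the default settings the shift is small relative to the spacing
max(abs(jittered - numberWhite))

# A larger factor spreads the points further apart
max(abs(jitter(numberWhite, factor = 2) - numberWhite))
```

See help(jitter) for how the size of the noise is derived from the spacing of the data and for the amount option, which sets it explicitly.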
6.1.4. Multiple Graphs on One Image
Note that a new command was used in the previous example. The par command can be
used to set different parameters. In the example above the mfrow parameter was set. The
plots are arranged in an array where the default number of rows and columns is one. The
mfrow parameter is a vector with two entries. The first entry is the number of rows of
images. The second entry is the number of columns. In the example above the plots were
arranged in one row with two plots across.
> par(mfrow=c(2,3))
> boxplot(numberWhite,main="first plot")
> boxplot(numberChipped,main="second plot")
> plot(jitter(numberWhite),jitter(numberChipped),
       xlab="Number White Marbles Drawn",
       ylab="Number Chipped Marbles Drawn",
       main="Pulling Marbles With Jitter")
> hist(numberWhite,main="fourth plot")
> hist(numberChipped,main="fifth plot")
> mosaicplot(table(numberWhite,numberChipped),main="sixth plot")
Figure 4.
An array of plots using the par command.
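A setting made with par stays in effect for every later plot on the same device. One common pattern, sketched here as a suggestion rather than as part of the original examples, is to keep the value that par returns and use it to restore the previous layout afterwards. The pdf(NULL) call only provides a null device so the sketch runs without a display; omit it in an interactive session.

```r
pdf(NULL)   # null device; omit when working interactively

# par() returns the previous values of the settings it changes
old <- par(mfrow = c(1, 2))

hist(rnorm(50), main = "left panel")
boxplot(rnorm(50), main = "right panel")

# Restore the saved settings so later plots fill the whole device again
par(old)
layout_after <- par("mfrow")
layout_after

dev.off()
```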
6.1.5. Density Plots
There are times when you do not want to plot specific points but wish to plot a density.
This can be done using the smoothScatter command.
> numberWhite <- rhyper(30,4,5,3)
> numberChipped <- rhyper(30,2,7,3)
> smoothScatter(numberWhite,numberChipped,
                xlab="White Marbles",ylab="Chipped Marbles",
                main="Drawing Marbles")
Figure 5.
The smoothScatter command can be used to plot densities.
Note that the previous example may benefit from superimposing a grid to help delimit the
points of interest. This can be done using the grid command.
> numberWhite <- rhyper(30,4,5,3)
> numberChipped <- rhyper(30,2,7,3)
> smoothScatter(numberWhite,numberChipped,
                xlab="White Marbles",ylab="Chipped Marbles",
                main="Drawing Marbles")
> grid(4,3)
6.1.6. Pairwise Relationships
There are times that you want to explore a large number of relationships. A number of
relationships can be plotted at one time using the pairs command. The idea is that you
give it a matrix or a data frame, and the command will create a scatter plot of all
combinations of the data.
> uData <- rnorm(20)
> vData <- rnorm(20,mean=5)
> wData <- uData + 2*vData + rnorm(20,sd=0.5)
> xData <- -2*uData+rnorm(20,sd=0.1)
> yData <- 3*vData+rnorm(20,sd=2.5)
> d <- data.frame(u=uData,v=vData,w=wData,x=xData,y=yData)
> pairs(d)
Figure 5.
Using pairs to produce all permutations of a set of relationships on one graph.
6.1.7. Shaded Regions
A shaded region can be plotted using the polygon command. The polygon command takes
a pair of vectors, x and y, and shades the region enclosed by the coordinate pairs. In the
example below a blue square is drawn. The vertices are defined starting from the lower
left. Five pairs of points are given because the starting point and the ending point are the
same.
> x = c(-1,1,1,-1,-1)
> y = c(-1,-1,1,1,-1)
> plot(x,y)
> polygon(x,y,col='blue')
>
A more complicated example is given below. In this example the rejection region for a
right sided hypothesis test is plotted, and it is shaded in red. A set of custom axes is
constructed, and symbols are plotted using the expression command.
> stdDev <- 0.75;
> x <- seq(-5,5,by=0.01)
> y <- dnorm(x,sd=stdDev)
> right <- qnorm(0.95,sd=stdDev)
> plot(x,y,type="l",xaxt="n",ylab="p",
       xlab=expression(paste('Assumed Distribution of ',bar(x))),
       axes=FALSE,ylim=c(0,max(y)*1.05),xlim=c(min(x),max(x)),
       frame.plot=FALSE)
> axis(1,at=c(-5,right,0,5),
       pos = c(0,0),
       labels=c(expression(' '),expression(bar(x)[cr]),
                expression(mu[0]),expression(' ')))
> axis(2)
> xReject <- seq(right,5,by=0.01)
> yReject <- dnorm(xReject,sd=stdDev)
> polygon(c(xReject,xReject[length(xReject)],xReject[1]),
          c(yReject,0, 0), col='red')
Figure 6.
Using polygon to produce a shaded region.
The axes are drawn separately. This is done by first suppressing the plotting of the axes
in the plot command, and the horizontal axis is drawn separately. Also note that the
expression command is used to plot a Greek character and also produce subscripts.
6.1.8. Plotting a Surface
Finally, a brief example of how to plot a surface is given. The persp command will plot a
surface with a specified perspective. In the example, a grid is defined by multiplying a
row and column vector to give the x and then the y values for a grid. Once that is done a
sine function is specified on the grid, and the persp command is used to plot it.
> x <- seq(0,2*pi,by=pi/100)
> y <- x
> xg <- (x*0+1) %*% t(y)
> yg <- (x) %*% t(y*0+1)
> f <- sin(xg+yg)
> persp(x,y,f,theta=-10,phi=40)
>
The %*% notation is used to perform matrix multiplication.
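The two matrix products above only serve to spread the x and y values over a grid. As a possible shortcut, the outer command evaluates a function at every pair of grid points directly. The sketch below rebuilds the grids as in the example and checks that both routes give the same surface; the name f_outer is introduced here for illustration.

```r
x <- seq(0, 2 * pi, by = pi / 100)
y <- x

# Grids built with matrix products: a column of ones times a row, and vice versa
xg <- (x * 0 + 1) %*% t(y)
yg <- (x) %*% t(y * 0 + 1)

# outer() applies the function to every pair (x[i], y[j])
f_outer <- outer(x, y, function(a, b) sin(a + b))

all.equal(sin(xg + yg), f_outer)   # TRUE
dim(f_outer)                       # one row per x value, one column per y value
```

The result can be handed to persp in the same way, for example persp(x,y,f_outer,theta=-10,phi=40).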
6.2. Discrete Data
Contents
Barplot
Mosaic Plot
In the examples below a data set is defined using R's hypergeometric random number
generator.
> numberWhite <- rhyper(30,4,5,3)
> numberChipped <- rhyper(30,2,7,3)
6.2.1. Barplot
The plot command will try to produce the appropriate plots based on the data type. The
data that is defined above, though, is numeric data. You need to convert the data to
factors to make sure that the plot command treats it in an appropriate way. The as.factor
command is used to cast the data as factors and ensures that R treats it as discrete data.
> numberWhite <- rhyper(30,4,5,3)
> numberWhite <- as.factor(numberWhite)
> plot(numberWhite)
>
In this case R will produce a barplot. The barplot command can also be used to create a
barplot. The barplot command requires a vector of heights, though, and you cannot
simply give it the raw data. The frequencies for the barplot command can be easily
calculated using the table command.
> numberWhite <- rhyper(30,4,5,3)
> totals <- table(numberWhite)
> totals
numberWhite
 0  1  2  3
 4 13 11  2
> barplot(totals,main="Number Draws",ylab="Frequency",xlab="Draws")
>
In the previous example the barplot command is used to set the title for the plot and the
labels for the axes. The labels on the ticks for the horizontal axis are automatically
generated using the labels on the table. You can change the labels by setting the row
names of the table.
> totals <- table(numberWhite)
> rownames(totals) <- c("none","one","two","three")
> totals
numberWhite
 none   one   two three
    4    13    11     2
> barplot(totals,main="Number Draws",ylab="Frequency",xlab="Draws")
>
The order of the frequencies is the same as the order in the table. If you change the order
in the table it will change the way it appears in the barplot. For example, if you wish to
arrange the frequencies in descending order you can use the sort command with the
decreasing option set to TRUE.
> barplot(sort(totals,decreasing=TRUE),main="Number Draws",
          ylab="Frequency",xlab="Draws")
The indexing features of R can be used to change the order of the frequencies manually.
> totals
numberWhite
 none   one   two three
    4    13    11     2
> sort(totals,decreasing=TRUE)
numberWhite
  one   two  none three
   13    11     4     2
> totals[c(3,1,4,2)]
numberWhite
  two  none three   one
   11     4     2    13
> barplot(totals[c(3,1,4,2)])
>
The barplot command returns the horizontal locations of the bars. Using the locations and
putting together the previous ideas a Pareto Chart can be constructed.
> xLoc = barplot(sort(totals,decreasing=TRUE),main="Number Draws",
                 ylab="Frequency",xlab="Draws",ylim=c(0,sum(totals)+2))
> points(xLoc,cumsum(sort(totals,decreasing=TRUE)),type='p',col=2)
> points(xLoc,cumsum(sort(totals,decreasing=TRUE)),type='l')
>
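The line on a Pareto chart is the running total of the sorted frequencies, and it is often read as a cumulative percentage. The sketch below works through that arithmetic using the counts from the table shown above, entered directly as a named vector. The names ordered, cumulative, and percent are introduced here for illustration, and pdf(NULL) only supplies a null device so the sketch runs without a display.

```r
# Counts taken from the table shown above
totals <- c(none = 4, one = 13, two = 11, three = 2)

ordered    <- sort(totals, decreasing = TRUE)
cumulative <- cumsum(ordered)
percent    <- 100 * cumulative / sum(totals)

round(percent, 1)   # share of all draws covered by each bar and those before it

pdf(NULL)   # null device; omit when working interactively
xLoc <- barplot(ordered, ylim = c(0, sum(totals) + 2), ylab = "Frequency")
lines(xLoc, cumulative, type = "b", col = 2)   # points and line in one call
dev.off()
```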
6.2.2. Mosaic Plot
Mosaic plots are used to display proportions for tables that are divided into two or more
conditional distributions. Here we focus on two way tables to keep things simpler. It is
assumed that you are familiar with using tables in R (see the section on two way tables
for more information: Two Way Tables).
Here we will use a made up data set primarily to make it easier to figure out what R is
doing. The fictitious data set is defined below. The idea is that sixteen children of age
eight are interviewed. They are asked two questions. The first question is, "do you
believe in Santa Claus." If they say that they do then the term "belief" is recorded,
otherwise the term "no belief" is recorded. The second question is whether or not they
have an older brother, older sister, or no older sibling. (We are keeping it simple here!)
The answers that are recorded are "older brother," "older sister," or "no older sibling."
> santa <- data.frame(belief=c('no belief','no belief','no belief','no belief',
                               'belief','belief','belief','belief',
                               'belief','belief','no belief','no belief',
                               'belief','belief','no belief','no belief'),
                      sibling=c('older brother','older brother',
                                'older brother','older sister',
                                'no older sibling','no older sibling',
                                'no older sibling','older sister',
                                'older brother','older sister',
                                'older brother','older sister',
                                'no older sibling','older sister',
                                'older brother','no older sibling'),
                      stringsAsFactors=TRUE)
> santa
      belief          sibling
1  no belief    older brother
2  no belief    older brother
3  no belief    older brother
4  no belief     older sister
5     belief no older sibling
6     belief no older sibling
7     belief no older sibling
8     belief     older sister
9     belief    older brother
10    belief     older sister
11 no belief    older brother
12 no belief     older sister
13    belief no older sibling
14    belief     older sister
15 no belief    older brother
16 no belief no older sibling
> summary(santa)
       belief              sibling
 belief   :8   no older sibling:5
 no belief:8   older brother   :6
               older sister    :5
The data is given as strings and is stored as categorical data, so the data types are factors.
(In R 4.0.0 and later data.frame keeps strings as character vectors unless
stringsAsFactors=TRUE is given.) If you plot the individual data sets, the plot command
will default to producing barplots.
> plot(santa$belief)
> plot(santa$sibling)
>
If you provide both data sets it will automatically produce a mosaic plot which
demonstrates the relative frequencies in terms of the resulting areas.
> plot(santa$sibling,santa$belief)
> plot(santa$belief,santa$sibling)
The mosaicplot command can be called directly
> totals = table(santa$belief,santa$sibling)
> totals

            no older sibling older brother older sister
  belief                   4             1            3
  no belief                1             5            2
> mosaicplot(totals,main="Older Brothers are Jerks",
             xlab="Belief in Santa Claus",ylab="Older Sibling")
The colours of the plot can be specified by setting the col argument. The argument is a
vector of colours used for the rows. See Figure 7 for an
example.
> mosaicplot(totals,main="Older Brothers are Jerks",
             xlab="Belief in Santa Claus",ylab="Older Sibling",
             col=c(2,3,4))
Figure 7.
Example of a mosaic plot with colours.
The labels and the order that they appear in the plot can be changed in exactly the same
way as given in the examples for barplot above.
> rownames(totals)
[1] "belief"    "no belief"
> colnames(totals)
[1] "no older sibling" "older brother"    "older sister"
> rownames(totals) <- c("Believes","Does not Believe")
> colnames(totals) <- c("No Older","Older Brother","Older Sister")
> totals

                   No Older Older Brother Older Sister
  Believes                4             1            3
  Does not Believe        1             5            2
> mosaicplot(totals,main="Older Brothers are Jerks",
             xlab="Belief in Santa Claus",ylab="Older Sibling")
When changing the order keep in mind that the table is a two dimensional array. The
indices must include both rows and columns, and the transpose command (t) can be used
to switch how it is plotted with respect to the vertical and horizontal axes.
> totals

                   No Older Older Brother Older Sister
  Believes                4             1            3
  Does not Believe        1             5            2
> totals[c(2,1),c(2,3,1)]

                   Older Brother Older Sister No Older
  Does not Believe             5            2        1
  Believes                     1            3        4
> mosaicplot(totals[c(2,1),c(2,3,1)],main="Older Brothers are Jerks",
             xlab="Belief in Santa Claus",ylab="Older Sibling",col=c(2,3,4))
> mosaicplot(t(totals),main="Older Brothers are Jerks",
             ylab="Belief in Santa Claus",xlab="Older Sibling",col=c(2,3))
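The areas in a mosaic plot reflect conditional proportions, and these can be computed directly with the prop.table command. The sketch below enters the two way table from the example by hand, with the original labels, and computes the proportions within each row and within each column. The names byBelief and bySibling are introduced here for illustration.

```r
# The two way table from the example, entered directly
totals <- matrix(c(4, 1, 3,
                   1, 5, 2),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(belief  = c("belief", "no belief"),
                                 sibling = c("no older sibling",
                                             "older brother",
                                             "older sister")))
totals <- as.table(totals)

# Proportions within each row: distribution of siblings given belief
byBelief <- prop.table(totals, margin = 1)
round(byBelief, 3)

# Proportions within each column: distribution of belief given sibling
bySibling <- prop.table(totals, margin = 2)
round(bySibling, 3)
```

For instance, half of the children who believe have no older sibling (4 of 8), while five of the six children with an older brother do not believe.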
6.3. Miscellaneous Options
Contents
Multiple Representations On One Plot
Multiple Windows
Print To A File
Annotation and Formatting
The previous examples only provide a slight hint at what is possible. Here we give some
examples that provide a demonstration of the way the different commands can be
combined and the options that allow them to be used together.
6.3.1. Multiple Representations On One Plot
First, an example of a histogram with an approximation of the density function is given.
In addition to the density function a horizontal boxplot is added to the plot with a rug
representation of the data on the horizontal axis. The vertical bounds on the histogram
will be specified. The boxplot must be added to the histogram, and it will be raised above
the histogram.
> x = rexp(20,rate=4)
> hist(x,ylim=c(0,18),main="This Are An Histogram",xlab="X")
> boxplot(x,at=16,horizontal=TRUE,add=TRUE)
> rug(x,side=1)
> d = density(x)
> points(d,type='l',col=3)
>
6.3.2. Multiple Windows
The dev commands allow you to create and manipulate multiple graphics windows. You
can create new windows using the dev.new() command, and you can choose which one to
make active using the dev.set() command. The dev.list(), dev.next(), and dev.prev()
commands can be used to list the graphical devices that are available.
In the following example three devices are created. They are listed, and different plots are
created on the different devices.
> dev.new()
> dev.new()
> dev.new()
> dev.list()
X11cairo X11cairo X11cairo
       2        3        4
> dev.set(3)
X11cairo
       3
> x = rnorm(20)
> hist(x)
> dev.set(2)
X11cairo
       2
> boxplot(x)
> dev.set(4)
X11cairo
       4
> qqnorm(x)
> qqline(x)
> dev.next()
X11cairo
       2
> dev.set(dev.next())
X11cairo
       2
> plot(density(x))
>
6.3.3. Print To A File
There are a couple of ways to print a plot to a file. It is important to be able to work with
graphics devices as shown in the previous subsection (Multiple Windows). The first way
explored is to use the dev.print command. This command will print a copy of the
currently active device, and the format is defined by the device argument.
In the example below, the current window is printed to a png file called "hist.png" that is
200 pixels wide.
> x = rnorm(100)
> hist(x)
> dev.print(device=png,width=200,"hist.png")
>
To find out what devices are available on your system use the help command.
> help(device)
Another way to print to a file is to create a device in the same way as the graphical
devices were created in the previous section. Once the device is created, the various plot
commands are given, and then the device is turned off to write the results to a file.
> png(file="hist.png")
> hist(x)
> rug(x,side=1)
> dev.off()
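The same open, draw, close pattern works for the other file devices. The sketch below swaps in the pdf device, which is an assumption of this sketch rather than part of the example above, and writes to a temporary file so that it does not clutter the working directory. The name outFile is hypothetical.

```r
# tempfile() gives a scratch path; replace it with a real file name as needed
outFile <- tempfile(fileext = ".pdf")

x <- rnorm(100)

pdf(file = outFile, width = 6, height = 4)   # size is given in inches
hist(x)
rug(x, side = 1)
dev.off()                                    # closing the device finishes the file

file.exists(outFile)
file.size(outFile) > 0
```

Forgetting the call to dev.off is a common source of empty or unreadable output files, since the file is only completed when the device is closed.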
6.3.4. Annotation and Formatting
Basic annotation can be performed in the regular plotting commands. For example,
there are options to specify labels on axes as well as titles. More options are available
using the axis command.
Most of the primary plotting commands have an option to turn off the generation of the
axes using the axes=FALSE option. The axes can be then added using the axis command
which allows for a greater number of options.
In the example below a bivariate set of random numbers is generated and plotted as a
scatter plot. The axes are added, but the horizontal axis is located in the center of the data
rather than at the bottom of the figure. Note that the horizontal and vertical axes are
added separately, and are specified using the first argument to the command. (Use
help(axis) for a full list of options.)
> x <- rnorm(10,mean=0,sd=4)
> y <- 3*x-1+rnorm(10,mean=0,sd=2)
> summary(x)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
-6.1550 -1.9280  1.2000 -0.1425  2.4780  3.1630
> summary(y)
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
-17.9800  -9.0060   0.7057  -1.2060   8.2600  10.9200
> plot(x,y,axes=FALSE,col=2)
> axis(1,pos=c(0,0),at=seq(-7,5,by=1))
> axis(2,pos=c(0,0),at=seq(-18,11,by=2))
>
In the previous example the at option is used to specify the tick marks.
When using the plot command the default behavior is to draw an axis as well as draw a
box around the plotting area. The drawing of the box can be suppressed using the bty
option. The value can be "o," "l," "7," "c," "u", "]," or "n." (The lines drawn roughly
look like the letter given except for "n" which draws no lines.) The box can be drawn
later using the box command as well.
> x <- rnorm(10,mean=0,sd=4)
> y <- 3*x-1+rnorm(10,mean=0,sd=2)
> plot(x,y,bty="7")
> plot(x,y,bty="n")
> box(lty=3)
>
The par command can be used to set the default values for various parameters. A couple
are given below. In the example below the default background is set to grey, only the left
and bottom sides of the box will be drawn around the plot, and the margins for the axes
will be twice the normal size.
> par(bty="l")
> par(bg="gray")
> par(mex=2)
> x <- rnorm(10,mean=0,sd=4)
> y <- 3*x-1+rnorm(10,mean=0,sd=2)
> plot(x,y)
>
Another common task is to place a text string on the plot. The text command takes a
coordinate and a label, and it places the label at the given coordinate. The text command
has options for setting the offset, size, font, and other options. In the example below the
label "numbers!" is placed on the plot. Use help(text) to see more options.
> x <- rnorm(10,mean=0,sd=4)
> y <- 3*x-1+rnorm(10,mean=0,sd=2)
> plot(x,y)
> text(-1,-2,"numbers!")
>
The default text command will cut off any characters outside of the plot area. This
behavior can be overridden using the xpd option.
> x <- rnorm(10,mean=0,sd=4)
> y <- 3*x-1+rnorm(10,mean=0,sd=2)
> plot(x,y)
> text(-7,-2,"outside the area",xpd=TRUE)
7. Indexing Into Vectors
Contents
Indexing With Logicals
Not Available or Missing Values
Indices With Logical Expression
Given a vector of data one common task is to isolate particular entries or censor items
that meet some criteria. Here we show how to use R's indexing notation to pick out
specific items within a vector.
7.1. Indexing With Logicals
We first give an example of how to select specific items in a vector. The first step is to
define a vector of data, and the second step is to define a vector made up of logical
values. When the vector of logical values is used for the index into the vector of data
values only the items whose corresponding logical value is TRUE are returned
> a <- c(1,2,3,4,5)
> b <- c(TRUE,FALSE,FALSE,TRUE,FALSE)
> a[b]
[1] 1 4
> max(a[b])
[1] 4
> sum(a[b])
[1] 5
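A logical vector can also be examined on its own. The short sketch below, which reuses the vectors from the example, shows a few related operations that are often useful alongside logical indexing.

```r
a <- c(1, 2, 3, 4, 5)
b <- c(TRUE, FALSE, FALSE, TRUE, FALSE)

which(b)    # positions of the TRUE entries: 1 4
sum(b)      # TRUE counts as 1, so this is the number selected: 2
a[!b]       # the complement of the selection: 2 3 5

# The same selection written with positions instead of logicals
identical(a[b], a[which(b)])   # TRUE
```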
7.2. Not Available or Missing Values
One common problem is data entries that are marked NA or not available. There is a
predefined variable called NA that can be used to indicate missing information. The
problem with this is that some functions return NA or stop with an error if one of the
entries in the data is NA. Some functions allow you to ignore the missing values through
special options
> a <- c(1,2,3,4,NA)
> a
[1]  1  2  3  4 NA
> sum(a)
[1] NA
> sum(a,na.rm=TRUE)
[1] 10
There are other times, though, when this option is not available, or you simply want to
censor them. The is.na function can be used to determine which items are not available.
The logical "not" operator in R is the ! symbol. When used with the indexing notation the
items within a vector that are NA can be easily removed
> a <- c(1,2,3,4,NA)
> is.na(a)
[1] FALSE FALSE FALSE FALSE  TRUE
> !is.na(a)
[1]  TRUE  TRUE  TRUE  TRUE FALSE
> a[!is.na(a)]
[1] 1 2 3 4
> b <- a[!is.na(a)]
> b
[1] 1 2 3 4
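Before removing missing values it is often worth checking how many there are and where they sit. The sketch below reuses the vector from the example and also shows why is.na is needed: a direct comparison with NA does not return TRUE or FALSE.

```r
a <- c(1, 2, 3, 4, NA)

anyNA(a)          # TRUE: at least one entry is missing
sum(is.na(a))     # 1: number of missing entries
which(is.na(a))   # 5: position of the missing entry

mean(a)                  # NA, because one entry is unknown
mean(a, na.rm = TRUE)    # 2.5

# NA cannot be found with ==, since any comparison with NA yields NA
a == NA
```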
7.3. Indices With Logical Expression
Any logical expression can be used as an index which opens a wide range of possibilities.
For example, you can remove or focus on entries that match specific criteria, such as
removing all entries that are at or above a certain value
> a = c(6,2,5,3,8,2)
> a
[1] 6 2 5 3 8 2
> b = a[a<6]
> b
[1] 2 5 3 2
For another example, suppose you want to join together the values that match two
different factors in another vector
> d = data.frame(one=as.factor(c('a','a','b','b','c','c')),
                 two=c(1,2,3,4,5,6))
> d
  one two
1   a   1
2   a   2
3   b   3
4   b   4
5   c   5
6   c   6
> both = d$two[(d$one=='a') | (d$one=='b')]
> both
[1] 1 2 3 4
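Note that a single "|" was used in the previous example. There is a difference between "||"
and "|". A single bar will perform a vector operation, term by term, while a double bar
will evaluate to a single TRUE or FALSE result. The same distinction holds between "&"
and "&&". In R 4.3.0 and later the double forms stop with an error when either operand
has more than one element, so the "||" and "&&" lines below only run on older versions
> (c(TRUE,TRUE))|(c(FALSE,TRUE))
[1] TRUE TRUE
> (c(TRUE,TRUE))||(c(FALSE,TRUE))
[1] TRUE
> (c(TRUE,TRUE))&(c(FALSE,TRUE))
[1] FALSE  TRUE
> (c(TRUE,TRUE))&&(c(FALSE,TRUE))
[1] FALSE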
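When the goal is to match any of several values, chaining comparisons with the single bar becomes unwieldy. One possible alternative, sketched below with the data frame from the example, is the %in% operator, which tests membership in a set. The names viaOr, viaIn, and viaSubset are introduced here for illustration.

```r
d <- data.frame(one = as.factor(c("a", "a", "b", "b", "c", "c")),
                two = c(1, 2, 3, 4, 5, 6))

# Element-wise "or" over two comparisons
viaOr <- d$two[(d$one == "a") | (d$one == "b")]

# %in% tests membership in a set, which scales to many values
viaIn <- d$two[d$one %in% c("a", "b")]

# subset() evaluates the condition inside the data frame
viaSubset <- subset(d, one %in% c("a", "b"))$two

identical(viaOr, viaIn)       # TRUE
identical(viaOr, viaSubset)   # TRUE
```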
