This document introduces basic data types in R including numbers, strings, factors, data frames, and logical values. It discusses how to create and manipulate variables of each type, including using assignment operators, vectors, lists, and functions like factor() and data.frame(). Factors allow categorical variables to be represented and tables can organize frequency counts of factors in one-way and two-way layouts.
Contents
    Variable Types
    Tables

We look at some of the ways that R can store and organize data. This is a basic introduction to a small subset of the different data types recognized by R and is not comprehensive in any sense. The main goal is to demonstrate the different kinds of information R can handle. It is assumed that you know how to enter data or read data files, which is covered in the first chapter.

2.1. Variable Types

2.1.1. Numbers

The way to work with real numbers has already been covered in the first chapter and is briefly discussed here. The most basic way to store a number is to make an assignment of a single number

> a <- 3
>

The "<-" tells R to take the number to the right of the symbol and store it in a variable whose name is given on the left. You can also use the "=" symbol. When you make an assignment R does not print out any information. If you want to see what value a variable has, just type the name of the variable on a line and press the enter key

> a
[1] 3

This allows you to do all sorts of basic operations and save the numbers

> b <- sqrt(a*a+3)
> b
[1] 3.464102

If you want to get a list of the variables that you have defined in a particular session you can list them all using the ls command

> ls()
[1] "a" "b"

You are not limited to just saving a single number. You can create a list (also called a "vector") using the c command

> a <- c(1,2,3,4,5)
> a
[1] 1 2 3 4 5
> a+1
[1] 2 3 4 5 6
> mean(a)
[1] 3
> var(a)
[1] 2.5

You can get access to particular entries in the vector in the following manner

> a <- c(1,2,3,4,5)
> a[1]
[1] 1
> a[2]
[1] 2
> a[0]
numeric(0)
> a[5]
[1] 5
> a[6]
[1] NA

Note that the zero entry does not hold a number; asking for it returns an empty vector that indicates how the data is stored. The first entry in the vector is the first number, and if you try to get a number past the last number you get "NA." Examples of the sort of operations you can do on vectors are given in the next chapter. To initialize a list of numbers the numeric command can be used.
For example, to create a list of 10 numbers, initialized to zero, use the following command

> a <- numeric(10)
> a
 [1] 0 0 0 0 0 0 0 0 0 0

If you wish to determine the data type used for a variable, use the typeof command

> typeof(a)
[1] "double"

2.1.2. Strings

You are not limited to just storing numbers. You can also store strings. A string is specified by using quotes. Both single and double quotes will work

> a <- "hello"
> a
[1] "hello"
> b <- c("hello","there")
> b
[1] "hello" "there"
> b[1]
[1] "hello"

The name of the type given to strings is character,

> typeof(a)
[1] "character"
> a = character(20)
> a
 [1] "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""

2.1.3. Factors

Another important way R can store data is as a factor. Often an experiment includes trials for different levels of some explanatory variable. For example, when looking at the impact of carbon dioxide on the growth rate of a tree you might try to observe how different trees grow when exposed to different preset concentrations of carbon dioxide. The different levels are also called factors. Assuming you know how to read in a file, we will look at the data file given in the first chapter. Several of the variables in the file are factors

> summary(tree$CHBR)
 A1  A2  A3  A4  A5  A6  A7  B1  B2  B3  B4  B5  B6  B7  C1  C2  C3  C4  C5  C6
  3   1   1   3   1   3   1   1   3   3   3   3   3   3   1   3   1   3   1   1
 C7 CL6 CL7  D1  D2  D3  D4  D5  D6  D7
  1   1   1   1   1   3   1   1   1   1

Because the set of options given in the data file corresponding to the "CHBR" column are not all numbers R automatically assumes that it is a factor. (This is the behavior of R versions before 4.0.0. From R 4.0.0 on, read.csv keeps such a column as character strings unless you pass stringsAsFactors=TRUE.) When you use summary on a factor it does not print out the five point summary, rather it prints out the possible values and the frequency that they occur. In this data set several of the columns are factors, but the researchers used numbers to indicate the different levels. For example, the first column, labeled "C," is a factor. Each tree was grown in an environment with one of four different possible levels of carbon dioxide.
The researchers quite sensibly labeled these four environments as 1, 2, 3, and 4. Unfortunately, R cannot determine that these are factors and must assume that they are regular numbers. This is a common problem and there is a way to tell R to treat the "C" column as a set of factors. You specify that a variable is a factor using the factor command. In the following example we convert tree$C into a factor

> tree$C
 [1] 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3
[39] 3 3 3 4 4 4 4 4 4 4 4 4 4 4 4 4
> summary(tree$C)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  1.000   2.000   2.000   2.519   3.000   4.000
> tree$C <- factor(tree$C)
> tree$C
 [1] 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3
[39] 3 3 3 4 4 4 4 4 4 4 4 4 4 4 4 4
Levels: 1 2 3 4
> summary(tree$C)
 1  2  3  4
 8 23 10 13
> levels(tree$C)
[1] "1" "2" "3" "4"

Once a vector is converted into a set of factors then R treats it in a different manner than when it is a set of numbers. A set of factors has a discrete set of possible values, and it does not make sense to try to find averages or other numerical descriptions. One thing that is important is the number of times that each factor appears, called their "frequencies," which is printed using the summary command.

2.1.4. Data Frames

Another way that information is stored is in data frames. This is a way to take many vectors of different types and store them in the same variable. The vectors can be of all different types. For example, a data frame may contain many lists, and each list might be a list of factors, strings, or numbers. There are different ways to create and manipulate data frames. Most are beyond the scope of this introduction. They are only mentioned here to offer a more complete description. Please see the first chapter for more information on data frames.
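As a minimal sketch of the idea that one data frame can hold vectors of different types, the following builds a small frame from made-up values and inspects the class of each column. The names plots, id, label, and level are hypothetical and are not part of the tree data.

```r
# Hypothetical example data; none of these names come from the tree data set.
plots <- data.frame(
  id    = c(1, 2, 3, 4),                            # numbers
  label = c("w", "x", "y", "z"),                    # strings
  level = factor(c("low", "high", "low", "high")),  # a factor
  stringsAsFactors = FALSE                          # keep strings as character
)

# One class name per column
col_types <- sapply(plots, class)
print(col_types)

# Each column is still an ordinary vector of its own type
level_names <- levels(plots$level)
print(level_names)
```

Passing stringsAsFactors explicitly keeps the sketch independent of the default, which differs between older and newer versions of R.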
One example of how to create a data frame is given below

> a <- c(1,2,3,4)
> b <- c(2,4,6,8)
> levels <- factor(c("A","B","A","B"))
> bubba <- data.frame(first=a, second=b, f=levels)
> bubba
  first second f
1     1      2 A
2     2      4 B
3     3      6 A
4     4      8 B
> summary(bubba)
     first          second    f
 Min.   :1.00   Min.   :2.0   A:2
 1st Qu.:1.75   1st Qu.:3.5   B:2
 Median :2.50   Median :5.0
 Mean   :2.50   Mean   :5.0
 3rd Qu.:3.25   3rd Qu.:6.5
 Max.   :4.00   Max.   :8.0
> bubba$first
[1] 1 2 3 4
> bubba$second
[1] 2 4 6 8
> bubba$f
[1] A B A B
Levels: A B

2.1.5. Logical

Another important data type is the logical type. There are two predefined values, TRUE and FALSE

> a = TRUE
> typeof(a)
[1] "logical"
> b = FALSE
> typeof(b)
[1] "logical"

The standard logical operators can be used

    <          less than
    >          greater than
    <=         less than or equal
    >=         greater than or equal
    ==         equal to
    !=         not equal to
    |          entry wise or
    ||         or
    !          not
    &          entry wise and
    &&         and
    xor(a,b)   exclusive or

Note that there is a difference between operators that act on entries within a vector and the whole vector

> a = c(TRUE,FALSE)
> b = c(FALSE,FALSE)
> a|b
[1]  TRUE FALSE
> a||b
[1] TRUE
> xor(a,b)
[1]  TRUE FALSE

The result shown for a||b comes from an older version of R, in which || and && look only at the first entry of each vector. Since R 4.3.0, applying || or && to a vector with more than one entry is an error.

There are a large number of functions that test to determine the type of a variable. For example the is.numeric function can determine if a variable is numeric

> a = c(1,2,3)
> is.numeric(a)
[1] TRUE
> is.factor(a)
[1] FALSE

2.2. Tables

Another common way to store information is in a table. Here we look at how to define both one way and two way tables. We only look at how to create and define tables; the functions used in the analysis of proportions are examined in another chapter.

2.2.1. One Way Tables

The first example is for a one way table. One way tables are not the most interesting example, but they are a good place to start. One way to create a table is using the table command. The argument it takes is a vector of factors, and it calculates the frequency with which each factor occurs.
Here is an example of how to create a one way table

> a <- factor(c("A","A","B","A","B","B","C","A","C"))
> results <- table(a)
> results
a
A B C
4 3 2
> attributes(results)
$dim
[1] 3

$dimnames
$dimnames$a
[1] "A" "B" "C"


$class
[1] "table"

> summary(results)
Number of cases in table: 9
Number of factors: 1

If you know the number of occurrences for each factor then it is possible to create the table directly, but the process is, unfortunately, a bit more convoluted. There is an easier way to define one-way tables (a table with one row), but it does not extend easily to two-way tables (tables with more than one row). You must first create a matrix of numbers. A matrix is like a vector in that it is a list of numbers, but it is different in that you can have both rows and columns of numbers. For example, in our example above the number of occurrences of "A" is 4, the number of occurrences of "B" is 3, and the number of occurrences of "C" is 2. We will create one row of numbers. The first column contains a 4, the second column contains a 3, and the third column contains a 2

> occur <- matrix(c(4,3,2),ncol=3,byrow=TRUE)
> occur
     [,1] [,2] [,3]
[1,]    4    3    2

At this point the variable "occur" is a matrix with one row and three columns of numbers. To dress it up and use it as a table we would like to give it labels for each column just like in the previous example. Once that is done we convert the matrix to a table using the as.table command

> colnames(occur) <- c("A","B","C")
> occur
     A B C
[1,] 4 3 2
> occur <- as.table(occur)
> occur
  A B C
A 4 3 2
> attributes(occur)
$dim
[1] 1 3

$dimnames
$dimnames[[1]]
[1] "A"

$dimnames[[2]]
[1] "A" "B" "C"


$class
[1] "table"

2.2.2. Two Way Tables

If you want to add rows to your table just add another vector to the argument of the table command. In the example below we have two questions.
In the first question the responses are labeled "Never," "Sometimes," or "Always." In the second question the responses are labeled "Yes," "No," or "Maybe." The vectors "a" and "b" contain the response for each measurement. The third item in "a" is how the third person responded to the first question, and the third item in "b" is how the third person responded to the second question.

> a <- c("Sometimes","Sometimes","Never","Always","Always","Sometimes","Sometimes","Never")
> b <- c("Maybe","Maybe","Yes","Maybe","Maybe","No","Yes","No")
> results <- table(a,b)
> results
           b
a           Maybe No Yes
  Always        2  0   0
  Never         0  1   1
  Sometimes     2  1   1

The table command allows us to do a very quick calculation, and we can immediately see that two people who said "Sometimes" to the first question also said "Maybe" to the second question.

Just as in the case with one-way tables it is possible to manually enter two way tables. The procedure is exactly the same as above except that we now have more than one row. We give a brief example below to demonstrate how to enter a two-way table that includes a breakdown of a group of people by both their gender and whether or not they smoke. You enter all of the data as one long list but tell R to break it up into some number of columns

> sexsmoke<-matrix(c(70,120,65,140),ncol=2,byrow=TRUE)
> rownames(sexsmoke)<-c("male","female")
> colnames(sexsmoke)<-c("smoke","nosmoke")
> sexsmoke <- as.table(sexsmoke)
> sexsmoke
       smoke nosmoke
male      70     120
female    65     140

The matrix command creates a two by two matrix. The byrow=TRUE option indicates that the numbers are filled in across the rows first, and the ncol=2 option indicates that there are two columns.

3. Basic Operations and Numerical Descriptions

Contents
    Basic Operations
    Basic Numerical Descriptions
    Operations on Vectors

We look at some of the basic operations that you can perform on lists of numbers.
It is assumed that you know how to enter data or read data files, which is covered in the first chapter, and that you know about the basic data types.

3.1. Basic Operations

Once you have a vector (or a list of numbers) in memory most basic operations are available. Most of the basic operations will act on a whole vector and can be used to quickly perform a large number of calculations with a single command. There is one thing to note: if you perform an operation on more than one vector it is often necessary that the vectors all contain the same number of entries.

Here we first define a vector which we will call "a" and will look at how to add and subtract constant numbers from all of the numbers in the vector. First, the vector will contain the numbers 1, 2, 3, and 4. We then see how to add 5 to each of the numbers, subtract 10 from each of the numbers, multiply each number by 4, and divide each number by 5.

> a <- c(1,2,3,4)
> a
[1] 1 2 3 4
> a + 5
[1] 6 7 8 9
> a - 10
[1] -9 -8 -7 -6
> a*4
[1]  4  8 12 16
> a/5
[1] 0.2 0.4 0.6 0.8

We can save the results in another vector called b

> b <- a - 10
> b
[1] -9 -8 -7 -6

If you want to take the square root, find e raised to each number, the logarithm, etc., then the usual commands can be used

> sqrt(a)
[1] 1.000000 1.414214 1.732051 2.000000
> exp(a)
[1]  2.718282  7.389056 20.085537 54.598150
> log(a)
[1] 0.0000000 0.6931472 1.0986123 1.3862944
> exp(log(a))
[1] 1 2 3 4

By combining operations and using parentheses you can make more complicated expressions

> c <- (a + sqrt(a))/(exp(2)+1)
> c
[1] 0.2384058 0.4069842 0.5640743 0.7152175

Note that you can do the same operations with vector arguments. For example to add the elements in vector a to the elements in vector b use the following command

> a + b
[1] -8 -6 -4 -2

The operation is performed on an element by element basis. Note this is true for almost all of the basic functions.
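To make "element by element" concrete, here is a small sketch that compares the vectorized sum with the same sum written as an explicit loop. The variable names elementwise_sum and loop_sum are illustrative only.

```r
a <- c(1, 2, 3, 4)
b <- a - 10

# Arithmetic between two vectors of equal length pairs up the entries
elementwise_sum <- a + b

# The same result computed one entry at a time, for comparison
loop_sum <- numeric(length(a))
for (i in seq_along(a)) {
  loop_sum[i] <- a[i] + b[i]
}

print(elementwise_sum)
print(identical(elementwise_sum, loop_sum))
```

The vectorized form is the idiomatic one in R; the loop is shown only to spell out what the single expression a + b does.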
So you can bring together all kinds of complicated expressions

> a*b
[1]  -9 -16 -21 -24
> a/b
[1] -0.1111111 -0.2500000 -0.4285714 -0.6666667
> (a+3)/(sqrt(1-b)*2-1)
[1] 0.7512364 1.0000000 1.2884234 1.6311303

You need to be careful of one thing. When you do operations on vectors they are performed on an element by element basis. One ramification of this is that all of the vectors in an expression should be the same length. If the lengths of the vectors differ, R reuses the entries of the shorter vector from the start, and at most you get a warning message along with results you probably did not intend

> a <- c(1,2,3)
> b <- c(10,11,12,13)
> a+b
[1] 11 13 15 14
Warning message:
longer object length
        is not a multiple of shorter object length in: a + b

As you work in R and create new vectors it can be easy to lose track of what variables you have defined. To get a list of all of the variables that have been defined use the ls() command

> ls()
[1] "a"            "b"            "bubba"        "c"            "last.warning"
[6] "tree"         "trees"

Finally, you should keep in mind that the basic operations almost always work on an element by element basis. There are rare exceptions to this general rule. For example, if you look at the minimum of two vectors using the min command you will get the minimum of all of the numbers. There is a special command, called pmin, that may be the command you want in some circumstances

> a <- c(1,-2,3,-4)
> b <- c(-1,2,-3,4)
> min(a,b)
[1] -4
> pmin(a,b)
[1] -1 -2 -3 -4

3.2. Basic Numerical Descriptions

Given a vector of numbers there are some basic commands to make it easier to get some of the basic numerical descriptions of a set of numbers. Here we assume that you can read in the tree data that was discussed in a previous chapter.
It is assumed that it is stored in a variable called tree

> tree <- read.csv(file="trees91.csv",header=TRUE,sep=",");
> names(tree)
 [1] "C"      "N"      "CHBR"   "REP"    "LFBM"   "STBM"   "RTBM"   "LFNCC"
 [9] "STNCC"  "RTNCC"  "LFBCC"  "STBCC"  "RTBCC"  "LFCACC" "STCACC" "RTCACC"
[17] "LFKCC"  "STKCC"  "RTKCC"  "LFMGCC" "STMGCC" "RTMGCC" "LFPCC"  "STPCC"
[25] "RTPCC"  "LFSCC"  "STSCC"  "RTSCC"

Each column in the data frame can be accessed as a vector. For example the numbers associated with the leaf biomass (LFBM) can be found using tree$LFBM

> tree$LFBM
 [1] 0.430 0.400 0.450 0.820 0.520 1.320 0.900 1.180 0.480 0.210 0.270 0.310
[13] 0.650 0.180 0.520 0.300 0.580 0.480 0.580 0.580 0.410 0.480 1.760 1.210
[25] 1.180 0.830 1.220 0.770 1.020 0.130 0.680 0.610 0.700 0.820 0.760 0.770
[37] 1.690 1.480 0.740 1.240 1.120 0.750 0.390 0.870 0.410 0.560 0.550 0.670
[49] 1.260 0.965 0.840 0.970 1.070 1.220

The following commands can be used to get the mean, median, quantiles, minimum, maximum, variance, and standard deviation of a set of numbers

> mean(tree$LFBM)
[1] 0.7649074
> median(tree$LFBM)
[1] 0.72
> quantile(tree$LFBM)
    0%    25%    50%    75%   100%
0.1300 0.4800 0.7200 1.0075 1.7600
> min(tree$LFBM)
[1] 0.13
> max(tree$LFBM)
[1] 1.76
> var(tree$LFBM)
[1] 0.1429382
> sd(tree$LFBM)
[1] 0.3780717

Finally, the summary command will print out the min, max, mean, median, and quantiles

> summary(tree$LFBM)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 0.1300  0.4800  0.7200  0.7649  1.0080  1.7600

The summary command is especially nice because if you give it a data frame it will print out the summary for every vector in the data frame

> summary(tree)
       C               N              CHBR         REP             LFBM
 Min.   :1.000   Min.   :1.000   A1     : 3   Min.   : 1.00   Min.   :0.1300
 1st Qu.:2.000   1st Qu.:1.000   A4     : 3   1st Qu.: 9.00   1st Qu.:0.4800
 Median :2.000   Median :2.000   A6     : 3   Median :14.00   Median :0.7200
 Mean   :2.519   Mean   :1.926   B2     : 3   Mean   :13.05   Mean   :0.7649
 3rd Qu.:3.000   3rd Qu.:3.000   B3     : 3   3rd Qu.:20.00   3rd Qu.:1.0075
 Max.   :4.000   Max.   :3.000   B4     : 3   Max.   :20.00   Max.   :1.7600
                                 (Other):36   NA's   :11.00
      STBM             RTBM            LFNCC           STNCC
 Min.   :0.0300   Min.   :0.1200   Min.   :0.880   Min.   :0.3100
 1st Qu.:0.1400   1st Qu.:0.2825   1st Qu.:1.312   1st Qu.:0.6400
 Median :0.2450   Median :0.4450   Median :1.550   Median :0.7850
 Mean   :0.2883   Mean   :0.4662   Mean   :1.560   Mean   :0.7812
 3rd Qu.:0.3800   3rd Qu.:0.5500   3rd Qu.:1.788   3rd Qu.:0.9350
 Max.   :0.7200   Max.   :1.5100   Max.   :2.160   Max.   :1.2400
     RTNCC            LFBCC           STBCC           RTBCC
 Min.   :0.4100   Min.   :25.00   Min.   :14.00   Min.   :15.00
 1st Qu.:0.6000   1st Qu.:34.00   1st Qu.:17.00   1st Qu.:19.00
 Median :0.7500   Median :37.00   Median :18.00   Median :20.00
 Mean   :0.7344   Mean   :36.46   Mean   :18.80   Mean   :21.43
 3rd Qu.:0.8100   3rd Qu.:41.00   3rd Qu.:20.00   3rd Qu.:23.00
 Max.   :1.5500   Max.   :48.00   Max.   :21.00   Max.   :41.00
     LFCACC           STCACC           RTCACC           LFKCC
 Min.   :0.2100   Min.   :0.1300   Min.   :0.1100   Min.   :0.6500
 1st Qu.:0.2600   1st Qu.:0.1600   1st Qu.:0.1600   1st Qu.:0.8100
 Median :0.2900   Median :0.1700   Median :0.1650   Median :0.9000
 Mean   :0.2864   Mean   :0.1714   Mean   :0.1654   Mean   :0.9053
 3rd Qu.:0.3100   3rd Qu.:0.1875   3rd Qu.:0.1700   3rd Qu.:0.9400
 Max.   :0.3600   Max.   :0.2400   Max.   :0.2400   Max.   :1.1800
 NA's   :1.0000
     STKCC           RTKCC           LFMGCC           STMGCC
 Min.   :0.810   Min.   :0.330   Min.   :0.0100   Min.   :0.100
 1st Qu.:0.940   1st Qu.:0.400   1st Qu.:0.1000   1st Qu.:0.110
 Median :1.055   Median :0.415   Median :0.1200   Median :0.130
 Mean   :1.105   Mean   :0.413   Mean   :0.1104   Mean   :0.135
 3rd Qu.:1.210   3rd Qu.:0.520   3rd Qu.:0.1300   3rd Qu.:0.150
 Max.   :1.520   Max.   :0.640   Max.   :0.1400   Max.   :0.190
     RTMGCC            LFPCC            STPCC            RTPCC
 Min.   :0.04000   Min.   :0.1500   Min.   :0.1500   Min.   :0.1000
 1st Qu.:0.06000   1st Qu.:0.2000   1st Qu.:0.2200   1st Qu.:0.1300
 Median :0.07000   Median :0.2400   Median :0.2800   Median :0.1450
 Mean   :0.06648   Mean   :0.2381   Mean   :0.2701   Mean   :0.1465
 3rd Qu.:0.07000   3rd Qu.:0.2700   3rd Qu.:0.3175   3rd Qu.:0.1600
 Max.   :0.09000   Max.   :0.3100   Max.   :0.4100   Max.   :0.2100
     LFSCC            STSCC            RTSCC
 Min.   :0.0400   Min.   :0.1400   Min.   :0.0400
 1st Qu.:0.1325   1st Qu.:0.1600   1st Qu.:0.1200
 Median :0.1600   Median :0.1800   Median :0.1300
 Mean   :0.1661   Mean   :0.1811   Mean   :0.1248
 3rd Qu.:0.1875   3rd Qu.:0.2000   3rd Qu.:0.1475
 Max.   :0.2600   Max.   :0.2800   Max.   :0.1700

3.3. Operations on Vectors

Here we look at some commonly used commands that perform operations on lists. The commands include the sort, min, max, and sum commands. First, the sort command can sort the given vector in either ascending or descending order

> a = c(2,4,6,3,1,5)
> b = sort(a)
> c = sort(a,decreasing = TRUE)
> a
[1] 2 4 6 3 1 5
> b
[1] 1 2 3 4 5 6
> c
[1] 6 5 4 3 2 1

The min and the max commands find the minimum and the maximum numbers in the vector

> min(a)
[1] 1
> max(a)
[1] 6

Finally, the sum command adds up the numbers in the vector

> sum(a)
[1] 21

4. Basic Probability Distributions

Contents
    The Normal Distribution
    The t Distribution
    The Binomial Distribution
    The Chi-Squared Distribution

We look at some of the basic operations associated with probability distributions. There are a large number of probability distributions available, but we only look at a few. If you would like to know what distributions are available you can do a search using the command help.search("distribution").

Here we give details about the commands associated with the normal distribution and briefly mention the commands for other distributions. The functions for different distributions are very similar, and the differences are noted below.

For this chapter it is assumed that you know how to enter data, which is covered in the previous chapters.

To get a full list of the distributions available in R you can use the following command

help(Distributions)

For every distribution there are four commands.
The commands for each distribution are prepended with a letter to indicate the functionality

    "d"   returns the height of the probability density function
    "p"   returns the cumulative distribution function
    "q"   returns the inverse cumulative distribution function (quantiles)
    "r"   returns randomly generated numbers

4.1. The Normal Distribution

There are four functions that can be used to generate the values associated with the normal distribution. You can get a full list of them and their options using the help command

> help(Normal)

The first function we look at is dnorm. Given a set of values it returns the height of the probability distribution at each point. If you only give the points it assumes you want to use a mean of zero and standard deviation of one. There are options to use different values for the mean and standard deviation, though

> dnorm(0)
[1] 0.3989423
> dnorm(0)*sqrt(2*pi)
[1] 1
> dnorm(0,mean=4)
[1] 0.0001338302
> dnorm(0,mean=4,sd=10)
[1] 0.03682701
> v <- c(0,1,2)
> dnorm(v)
[1] 0.39894228 0.24197072 0.05399097
> x <- seq(-20,20,by=.1)
> y <- dnorm(x)
> plot(x,y)
> y <- dnorm(x,mean=2.5,sd=0.1)
> plot(x,y)

The second function we examine is pnorm. Given a number or a list it computes the probability that a normally distributed random number will be less than that number.
This function also goes by the rather ominous title of the "Cumulative Distribution Function." It accepts the same options as dnorm

> pnorm(0)
[1] 0.5
> pnorm(1)
[1] 0.8413447
> pnorm(0,mean=2)
[1] 0.02275013
> pnorm(0,mean=2,sd=3)
[1] 0.2524925
> v <- c(0,1,2)
> pnorm(v)
[1] 0.5000000 0.8413447 0.9772499
> x <- seq(-20,20,by=.1)
> y <- pnorm(x)
> plot(x,y)
> y <- pnorm(x,mean=3,sd=4)
> plot(x,y)

If you wish to find the probability that a number is larger than the given number you can use the lower.tail option

> pnorm(0,lower.tail=FALSE)
[1] 0.5
> pnorm(1,lower.tail=FALSE)
[1] 0.1586553
> pnorm(0,mean=2,lower.tail=FALSE)
[1] 0.9772499

The next function we look at is qnorm, which is the inverse of pnorm. The idea behind qnorm is that you give it a probability, and it returns the number whose cumulative distribution matches the probability. For example, if you have a normally distributed random variable with mean zero and standard deviation one, then if you give the function a probability it returns the associated Z-score

> qnorm(0.5)
[1] 0
> qnorm(0.5,mean=1)
[1] 1
> qnorm(0.5,mean=1,sd=2)
[1] 1
> qnorm(0.5,mean=2,sd=2)
[1] 2
> qnorm(0.5,mean=2,sd=4)
[1] 2
> qnorm(0.25,mean=2,sd=2)
[1] 0.6510205
> qnorm(0.333)
[1] -0.4316442
> qnorm(0.333,sd=3)
[1] -1.294933
> qnorm(0.75,mean=5,sd=2)
[1] 6.34898
> v = c(0.1,0.3,0.75)
> qnorm(v)
[1] -1.2815516 -0.5244005  0.6744898
> x <- seq(0,1,by=.05)
> y <- qnorm(x)
> plot(x,y)
> y <- qnorm(x,mean=3,sd=2)
> plot(x,y)
> y <- qnorm(x,mean=3,sd=0.1)
> plot(x,y)

The last function we examine is the rnorm function, which can generate random numbers whose distribution is normal.
The argument that you give it is the number of random numbers that you want, and it has optional arguments to specify the mean and standard deviation

> rnorm(4)
[1]  1.2381211 -0.2323254 -1.2003081 -1.6118483
> rnorm(4,mean=3)
[1] 2.633080 3.611486 2.038861 2.601433
> rnorm(4,mean=3,sd=3)
[1] 4.580556 2.414403 4.156041 6.345844
> rnorm(4,mean=3,sd=3)
[1]  3.000852  3.114180 10.032021  3.245661
> y <- rnorm(200)
> hist(y)
> y <- rnorm(200,mean=-2)
> hist(y)
> y <- rnorm(200,mean=-2,sd=4)
> hist(y)
> qqnorm(y)
> qqline(y)

4.2. The t Distribution

There are four functions that can be used to generate the values associated with the t distribution. You can get a full list of them and their options using the help command

> help(TDist)

These commands work just like the commands for the normal distribution. One difference is that the commands assume that the values are normalized to mean zero and standard deviation one, so you have to use a little algebra to use these functions in practice. The other difference is that you have to specify the number of degrees of freedom. The commands follow the same kind of naming convention, and the names of the commands are dt, pt, qt, and rt.

A few examples are given below to show how to use the different commands.
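The algebra mentioned above amounts to standardizing a sample before calling pt. The text does not spell out which standardization it intends, so the following is only a sketch under one common assumption: the usual one-sample t statistic with n - 1 degrees of freedom. The hypothesized mean mu0 and the variable names are illustrative.

```r
x <- c(-3, -4, -2, -1)   # sample values
mu0 <- 2                 # hypothesized mean (illustrative assumption)
n <- length(x)

# Standardize: distance of the sample mean from mu0 in standard-error units
t_stat <- (mean(x) - mu0) / (sd(x) / sqrt(n))
deg_free <- n - 1

# Probability of a t value at least this small
p_lower <- pt(t_stat, df = deg_free)

# Cross-check against the built-in one-sided t test
p_check <- t.test(x, mu = mu0, alternative = "less")$p.value

print(c(t_stat = t_stat, p_lower = p_lower))
print(all.equal(p_lower, p_check))
```

Comparing against t.test is a convenient sanity check that the hand-written standardization matches what R does internally for this case.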
First we have the distribution function, dt

> x <- seq(-20,20,by=.5)
> y <- dt(x,df=10)
> plot(x,y)
> y <- dt(x,df=50)
> plot(x,y)

Next we have the cumulative probability distribution function

> pt(-3,df=10)
[1] 0.006671828
> pt(3,df=10)
[1] 0.9933282
> 1-pt(3,df=10)
[1] 0.006671828
> pt(3,df=20)
[1] 0.996462
> x = c(-3,-4,-2,-1)
> pt((mean(x)-2)/sd(x),df=20)
[1] 0.001165548
> pt((mean(x)-2)/sd(x),df=40)
[1] 0.000603064

Next we have the inverse cumulative probability distribution function

> qt(0.05,df=10)
[1] -1.812461
> qt(0.95,df=10)
[1] 1.812461
> qt(0.05,df=20)
[1] -1.724718
> qt(0.95,df=20)
[1] 1.724718
> v <- c(0.005,.025,.05)
> qt(v,df=253)
[1] -2.595401 -1.969385 -1.650899
> qt(v,df=25)
[1] -2.787436 -2.059539 -1.708141

Finally random numbers can be generated according to the t distribution

> rt(3,df=10)
[1] 0.4440430 2.1134365 0.6185262
> rt(3,df=20)
[1]  0.1043300 -1.4682148  0.0115013
> rt(3,df=20)
[1]  0.8023832 -0.4154180 -1.0546125

4.3. The Binomial Distribution

There are four functions that can be used to generate the values associated with the binomial distribution. You can get a full list of them and their options using the help command

> help(Binomial)

These commands work just like the commands for the normal distribution. The binomial distribution requires two extra parameters, the number of trials and the probability of success for a single trial. The commands follow the same kind of naming convention, and the names of the commands are dbinom, pbinom, qbinom, and rbinom.

A few examples are given below to show how to use the different commands.
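The two extra parameters described above are passed as the size and prob arguments. As a minimal sketch of how the four commands relate to one another (the values of n_trials and p_success are illustrative):

```r
n_trials <- 50     # number of trials (illustrative value)
p_success <- 0.2   # probability of success on one trial (illustrative value)

k <- 0:n_trials
probs <- dbinom(k, size = n_trials, prob = p_success)

# The probabilities over all possible counts add up to one
total <- sum(probs)

# pbinom is the running total of dbinom
cumulative <- pbinom(10, size = n_trials, prob = p_success)
running <- sum(probs[k <= 10])

# qbinom returns the smallest count whose cumulative probability reaches p
q_half <- qbinom(0.5, size = n_trials, prob = p_success)

print(total)
print(c(cumulative = cumulative, running = running))
print(q_half)
```

Because the binomial distribution is discrete, qbinom and pbinom are not exact inverses; qbinom returns the smallest count at which pbinom reaches the requested probability.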
First we have the distribution function, dbinom

> x <- seq(0,50,by=1)
> y <- dbinom(x,50,0.2)
> plot(x,y)
> y <- dbinom(x,50,0.6)
> plot(x,y)
> x <- seq(0,100,by=1)
> y <- dbinom(x,100,0.6)
> plot(x,y)

Next we have the cumulative probability distribution function

> pbinom(24,50,0.5)
[1] 0.4438624
> pbinom(25,50,0.5)
[1] 0.5561376
> pbinom(25,51,0.5)
[1] 0.5
> pbinom(26,51,0.5)
[1] 0.610116
> pbinom(25,50,0.5)
[1] 0.5561376
> pbinom(25,50,0.25)
[1] 0.999962
> pbinom(25,500,0.25)
[1] 4.955658e-33

Next we have the inverse cumulative probability distribution function

> qbinom(0.5,51,1/2)
[1] 25
> qbinom(0.25,51,1/2)
[1] 23
> pbinom(23,51,1/2)
[1] 0.2879247
> pbinom(22,51,1/2)
[1] 0.200531

Finally random numbers can be generated according to the binomial distribution

> rbinom(5,100,.2)
[1] 30 23 21 19 18
> rbinom(5,100,.7)
[1] 66 66 58 68 63

4.4. The Chi-Squared Distribution

There are four functions that can be used to generate the values associated with the Chi-Squared distribution. You can get a full list of them and their options using the help command

> help(Chisquare)

These commands work just like the commands for the normal distribution. The first difference is that it is assumed that you have normalized the value so no mean can be specified. The other difference is that you have to specify the number of degrees of freedom. The commands follow the same kind of naming convention, and the names of the commands are dchisq, pchisq, qchisq, and rchisq.

A few examples are given below to show how to use the different commands.
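As a brief sketch of how these commands fit together, the following checks that qchisq undoes pchisq and that the lower.tail option gives the upper-tail probability directly. The degrees of freedom and the probabilities are illustrative values, and the variable names are not from the text.

```r
deg_free <- 10                 # degrees of freedom (illustrative value)
probs <- c(0.05, 0.5, 0.95)    # illustrative probabilities

# qchisq maps probabilities to cutoffs; pchisq maps them back
cutoffs <- qchisq(probs, df = deg_free)
round_trip <- pchisq(cutoffs, df = deg_free)

# Upper-tail probability without subtracting from one by hand
upper <- pchisq(3, df = deg_free, lower.tail = FALSE)

# The density is zero for negative values
at_negative <- dchisq(-1, df = deg_free)

print(cutoffs)
print(round_trip)
print(upper)
print(at_negative)
```

Since the density is zero to the left of the origin, only the non-negative part of any plotting range carries information for this distribution.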
First we have the distribution function, dchisq

> x <- seq(-20,20,by=.5)
> y <- dchisq(x,df=10)
> plot(x,y)
> y <- dchisq(x,df=12)
> plot(x,y)

Next we have the cumulative probability distribution function

> pchisq(2,df=10)
[1] 0.003659847
> pchisq(3,df=10)
[1] 0.01857594
> 1-pchisq(3,df=10)
[1] 0.981424
> pchisq(3,df=20)
[1] 4.097501e-06
> x = c(2,4,5,6)
> pchisq(x,df=20)
[1] 1.114255e-07 4.649808e-05 2.773521e-04 1.102488e-03

Next we have the inverse cumulative probability distribution function

> qchisq(0.05,df=10)
[1] 3.940299
> qchisq(0.95,df=10)
[1] 18.30704
> qchisq(0.05,df=20)
[1] 10.85081
> qchisq(0.95,df=20)
[1] 31.41043
> v <- c(0.005,.025,.05)
> qchisq(v,df=253)
[1] 198.8161 210.8355 217.1713
> qchisq(v,df=25)
[1] 10.51965 13.11972 14.61141

Finally random numbers can be generated according to the Chi-Squared distribution

> rchisq(3,df=10)
[1] 16.80015 20.28412 12.34044
> rchisq(3,df=20)
[1] 11.838818  8.541436 11.486312
> rchisq(3,df=20)
[1] 11.14214 23.86401 24.81251

5. Basic Plots

Contents
    Strip Charts
    Histograms
    Boxplots
    Scatter Plots
    Normal QQ Plots

We look at some of the ways R can display information graphically. This is a basic introduction to some of the basic plotting commands. It is assumed that you know how to enter data or read data files, which is covered in the first chapter, and it is assumed that you are familiar with the different data types.

In each of the topics that follow it is assumed that two different data sets, w1.dat and trees91.csv, have been read and defined using the same variables as in the first chapter. Both of these data sets come from the study discussed on the web site given in the first chapter.
We assume that they are read using "read.csv" into variables w1 and tree

> w1 <- read.csv(file="w1.dat",sep=",",head=TRUE)
> names(w1)
[1] "vals"
> tree <- read.csv(file="trees91.csv",sep=",",head=TRUE)
> names(tree)
 [1] "C"      "N"      "CHBR"   "REP"    "LFBM"   "STBM"   "RTBM"   "LFNCC"
 [9] "STNCC"  "RTNCC"  "LFBCC"  "STBCC"  "RTBCC"  "LFCACC" "STCACC" "RTCACC"
[17] "LFKCC"  "STKCC"  "RTKCC"  "LFMGCC" "STMGCC" "RTMGCC" "LFPCC"  "STPCC"
[25] "RTPCC"  "LFSCC"  "STSCC"  "RTSCC"

5.1. Strip Charts

A strip chart is the most basic type of plot available. It plots the data in order along a line with each data point represented as a box. Here we provide examples using the w1 data frame mentioned at the top of this page, and the one column of the data is w1$vals.

To create a strip chart of this data use the stripchart command

> help(stripchart)
> stripchart(w1$vals)

Strip Chart: This is the most basic possible strip chart. The stripchart() command takes many of the standard plot() options for labeling and annotations.

As you can see this is about as bare bones as you can get. There is no title nor axes labels. It only shows how the data looks if you were to put it all along one line and mark out a box at each point. If you would prefer to see which points are repeated you can specify that repeated points be stacked

> stripchart(w1$vals,method="stack")

A variation on this is to have the boxes moved up and down so that there is more separation between them

> stripchart(w1$vals,method="jitter")

If you do not want the boxes plotted in the horizontal direction you can plot them in the vertical direction

> stripchart(w1$vals,vertical=TRUE)
> stripchart(w1$vals,vertical=TRUE,method="jitter")

Since you should always annotate your plots there are many different ways to add titles and labels.
One way is within the stripchart command itself:

> stripchart(w1$vals,method="stack",
             main='Leaf BioMass in High CO2 Environment',
             xlab='BioMass of Leaves')

If you have a plot already and want to add a title, you can use the title command:

> title('Leaf BioMass in High CO2 Environment',xlab='BioMass of Leaves')

Note that this simply adds the title and labels and will write over the top of any titles or labels you already have.

5.2. Histograms

A histogram is a very common plot. It plots the frequencies with which data appears within certain ranges. Here we provide examples using the w1 data frame mentioned at the top of this page, and the one column of data is w1$vals.

To plot a histogram of the data use the hist command:

> hist(w1$vals)
> hist(w1$vals,main="Distribution of w1",xlab="w1")

Figure: Histogram Options

Many of the basic plot commands accept the same options. The help(hist) command lists the options that are specific to the hist command, and help(plot) lists further options that are shared by most plotting commands. Experiment with different options to see what you can do.

As you can see R will automatically calculate the intervals to use. There are many options to determine how to break up the intervals. Here we look at just one way, varying the domain size and number of breaks. If you would like to know more about the other options check out the help page:

> help(hist)

You can specify the number of breaks to use with the breaks option. Here we look at the histogram for various numbers of breaks:

> hist(w1$vals,breaks=2)
> hist(w1$vals,breaks=4)
> hist(w1$vals,breaks=6)
> hist(w1$vals,breaks=8)
> hist(w1$vals,breaks=12)

You can also vary the size of the domain using the xlim option.
This option takes a vector with two entries in it, the left value and the right value:

> hist(w1$vals,breaks=12,xlim=c(0,10))
> hist(w1$vals,breaks=12,xlim=c(-1,2))
> hist(w1$vals,breaks=12,xlim=c(0,2))
> hist(w1$vals,breaks=12,xlim=c(1,1.3))
> hist(w1$vals,breaks=12,xlim=c(0.9,1.3))

The options for adding titles and labels are exactly the same as for strip charts. You should always annotate your plots and there are many different ways to add titles and labels. One way is within the hist command itself:

> hist(w1$vals,
       main='Leaf BioMass in High CO2 Environment',
       xlab='BioMass of Leaves')

If you have a plot already and want to change or add a title, you can use the title command:

> title('Leaf BioMass in High CO2 Environment',xlab='BioMass of Leaves')

Note that this simply adds the title and labels and will write over the top of any titles or labels you already have.

It is not uncommon to add other kinds of plots to a histogram. For example, one of the options to the stripchart command is to add it to a plot that has already been drawn, so you might want to have a histogram with the strip chart drawn across the top. The addition of the strip chart might give you a better idea of the density of the data:

> hist(w1$vals,main='Leaf BioMass in High CO2 Environment',xlab='BioMass of Leaves',ylim=c(0,16))
> stripchart(w1$vals,add=TRUE,at=15.5)

5.3. Boxplots

A boxplot provides a graphical view of the median, quartiles, maximum, and minimum of a data set. Here we provide examples using two different data sets. The first is the w1 data frame mentioned at the top of this page, and the one column of data is w1$vals. The second is the tree data frame from the trees91.csv data file which is also mentioned at the top of the page.
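The quantities that a boxplot draws can also be computed directly, which helps when reading the plot. The sketch below is only an illustration: it uses a small made-up vector (called vals here, not the w1 data) and the base R functions fivenum and boxplot.stats to show the five-number summary and the points that would be flagged as outliers.

```r
# Hypothetical data standing in for a single numeric column.
vals <- c(2.1, 2.4, 2.4, 2.9, 3.0, 3.3, 3.8, 4.1, 9.5)

# Tukey's five-number summary:
# minimum, lower hinge, median, upper hinge, maximum.
five <- fivenum(vals)
print(five)

# boxplot.stats() computes what boxplot() would draw without plotting:
# $stats holds the whisker ends, hinges, and median,
# $out holds the points that lie beyond the whiskers.
stats <- boxplot.stats(vals)
print(stats$stats)
print(stats$out)
```

Comparing five with stats$stats shows that the upper whisker stops at the largest value that is not flagged as an outlier, rather than at the maximum of the data.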
We first use the w1 data set and look at the boxplot of this data set:

> boxplot(w1$vals)

Again, this is a very plain graph, and the title and labels can be specified in exactly the same way as in the stripchart and hist commands:

> boxplot(w1$vals,
          main='Leaf BioMass in High CO2 Environment',
          ylab='BioMass of Leaves')

Note that the default orientation is to plot the boxplot vertically. Because of this we used the ylab option to specify the axis label. There are a large number of options for this command. To see more of the options see the help page:

> help(boxplot)

As an example you can specify that the boxplot be plotted horizontally by specifying the horizontal option:

> boxplot(w1$vals,
          main='Leaf BioMass in High CO2 Environment',
          xlab='BioMass of Leaves',
          horizontal=TRUE)

The option to plot the boxplot horizontally can be put to good use to display a boxplot on the same image as a histogram. You need to specify the add option, specify where to put the boxplot using the at option, and turn off the addition of axes using the axes option:

> hist(w1$vals,main='Leaf BioMass in High CO2 Environment',xlab='BioMass of Leaves',ylim=c(0,16))
> boxplot(w1$vals,horizontal=TRUE,at=15.5,add=TRUE,axes=FALSE)

If you are feeling really crazy you can take a histogram and add a boxplot and a strip chart:

> hist(w1$vals,main='Leaf BioMass in High CO2 Environment',xlab='BioMass of Leaves',ylim=c(0,16))
> boxplot(w1$vals,horizontal=TRUE,at=16,add=TRUE,axes=FALSE)
> stripchart(w1$vals,add=TRUE,at=15)

Some people shell out good money to have this much fun.

For the second part on boxplots we will look at the second data frame, tree, which comes from the trees91.csv file.
To reiterate the discussion at the top of this page and the discussion in the data types chapter, we need to specify which columns are factors:

> tree <- read.csv(file="trees91.csv",sep=",",head=TRUE)
> tree$C <- factor(tree$C)
> tree$N <- factor(tree$N)

We can look at the boxplot of just the data for the stem biomass:

> boxplot(tree$STBM,
          main='Stem BioMass in Different CO2 Environments',
          ylab='BioMass of Stems')

That plot does not tell the whole story. It is for all of the trees, but the trees were grown in different kinds of environments. The boxplot command can be used to plot a separate boxplot for each level. In this case the data is held in tree$STBM, and the different levels are stored as factors in tree$C. The command to create different boxplots is the following:

> boxplot(tree$STBM~tree$C)

Note that for the level called "2" there are four outliers which are plotted as little circles. There are many options to annotate your plot including different labels for each level. Please use the help(boxplot) command for more information.

5.4. Scatter Plots

A scatter plot provides a graphical view of the relationship between two sets of numbers. Here we provide examples using the tree data frame from the trees91.csv data file which is mentioned at the top of the page. In particular we look at the relationship between the stem biomass (tree$STBM) and the leaf biomass (tree$LFBM).

The command to plot each pair of points as an x-coordinate and a y-coordinate is plot:

> plot(tree$STBM,tree$LFBM)

It appears that there is a strong positive association between the biomass in the stems of a tree and the leaves of the tree, and the relationship appears to be linear. In fact, the correlation between these two sets of observations is quite high:

> cor(tree$STBM,tree$LFBM)
[1] 0.911595

Getting back to the plot, you should always annotate your graphs.
The title and labels can be specified in exactly the same way as with the other plotting commands:

> plot(tree$STBM,tree$LFBM,
       main="Relationship Between Stem and Leaf Biomass",
       xlab="Stem Biomass",
       ylab="Leaf Biomass")

5.5. Normal QQ Plots

The final type of plot that we look at is the normal quantile plot. This plot is used to determine if your data is close to being normally distributed. The plot cannot prove that the data is normally distributed, but it can reveal clear departures from normality. Here we provide examples using the w1 data frame mentioned at the top of this page, and the one column of data is w1$vals.

The command to generate a normal quantile plot is qqnorm. You can give it one argument, the univariate data set of interest:

> qqnorm(w1$vals)

You can annotate the plot in exactly the same way as all of the other plotting commands given here:

> qqnorm(w1$vals,
         main="Normal Q-Q Plot of the Leaf Biomass",
         xlab="Theoretical Quantiles of the Leaf Biomass",
         ylab="Sample Quantiles of the Leaf Biomass")

After you create the normal quantile plot you can also add the theoretical line that the data should fall on if they were normally distributed:

> qqline(w1$vals)

In this example you should see that the data is not quite normally distributed. There are a few outliers, and it does not match up at the tails of the distribution.

6. Intermediate Plotting

Contents: Continuous Data, Discrete Data, Miscellaneous Options

We look at some more options for plotting, and we assume that you are familiar with the basic plotting commands (Basic Plots). A variety of different subjects ranging from plotting options to the formatting of plots is given.

In many of the examples below we use some of R's commands to generate random numbers according to various distributions. The chapter is divided into three sections. The focus of the first section is on graphing continuous data. The focus of the second section is on graphing discrete data.
The third section offers some miscellaneous options that are useful in a variety of contexts.

6.1. Continuous Data

Contents: Multiple Data Sets on One Plot, Error Bars, Adding Noise (jitter), Multiple Graphs on One Image, Density Plots, Pairwise Relationships, Shaded Regions, Plotting a Surface

In the examples below a data set is defined using R's normally distributed random number generator.

> x <- rnorm(10,sd=5,mean=20)
> y <- 2.5*x - 1.0 + rnorm(10,sd=9,mean=0)
> cor(x,y)
[1] 0.7400576

6.1.1. Multiple Data Sets on One Plot

One common task is to plot multiple data sets on the same plot. In many situations the way to do this is to create the initial plot and then add additional information to the plot. For example, to plot bivariate data the plot command is used to initialize and create the plot. The points command can then be used to add additional data sets to the plot.

First define a set of normally distributed random numbers and then plot them. (This same data set is used throughout the examples below.)

> x <- rnorm(10,sd=5,mean=20)
> y <- 2.5*x - 1.0 + rnorm(10,sd=9,mean=0)
> cor(x,y)
[1] 0.7400576
> plot(x,y,xlab="Independent",ylab="Dependent",main="Random Stuff")
> x1 <- runif(8,15,25)
> y1 <- 2.5*x1 - 1.0 + runif(8,-6,6)
> points(x1,y1,col=2)

Note that in the previous example, the colour for the second set of data points is set using the col option. You can try different numbers to see what colours are available. For most installations there are at least eight options from 1 to 8. Also note that in the example above the points are plotted as circles. The symbol that is used can be changed using the pch option.

> x2 <- runif(8,15,25)
> y2 <- 2.5*x2 - 1.0 + runif(8,-6,6)
> points(x2,y2,col=3,pch=2)

Again, try different numbers to see the various options. Another helpful option is to add a legend. This can be done with the legend command.
The options for the command, in order, are the x and y coordinates on the plot to place the legend followed by a list of labels to use. There are a large number of other options so use help(legend) to see more options. For example a list of colors can be given with the col option, and a list of symbols can be given with the pch option.

> plot(x,y,xlab="Independent",ylab="Dependent",main="Random Stuff")
> points(x1,y1,col=2,pch=3)
> points(x2,y2,col=4,pch=5)
> legend(14,70,c("Original","one","two"),col=c(1,2,4),pch=c(1,3,5))

Figure 1. The three data sets displayed on the same graph.

Another common task is to change the limits of the axes to change the size of the plotting area. This is achieved using the xlim and ylim options in the plot command. Both options take a vector of length two holding the minimum and maximum values.

> plot(x,y,xlab="Independent",ylab="Dependent",main="Random Stuff",xlim=c(0,30),ylim=c(0,100))
> points(x1,y1,col=2,pch=3)
> points(x2,y2,col=4,pch=5)
> legend(14,70,c("Original","one","two"),col=c(1,2,4),pch=c(1,3,5))

6.1.2. Error Bars

Another common task is to add error bars to a set of data points. This can be accomplished using the arrows command. The arrows command takes two pairs of coordinates, that is two pairs of x and y values. The command then draws a line between each pair and adds an "arrow head" with a given length and angle.

> plot(x,y,xlab="Independent",ylab="Dependent",main="Random Stuff")
> xHigh <- x
> yHigh <- y + abs(rnorm(10,sd=3.5))
> xLow <- x
> yLow <- y - abs(rnorm(10,sd=3.1))
> arrows(xHigh,yHigh,xLow,yLow,col=2,angle=90,length=0.1,code=3)

Figure 2. A data set with error bars added.

Note that the option code is used to specify where the bars are drawn. Its value can be 1, 2, or 3. If code is 1 the bars are drawn at the pairs given in the first argument. If code is 2 the bars are drawn at the pairs given in the second argument. If code is 3 the bars are drawn at both.

6.1.3. Adding Noise (Jitter)

In the previous example a little bit of "noise" was added to the pairs to produce an artificial offset. This is a common thing to do when making plots. A simpler way to accomplish this is to use the jitter command.

> numberWhite <- rhyper(400,4,5,3)
> numberChipped <- rhyper(400,2,7,3)
> par(mfrow=c(1,2))
> plot(numberWhite,numberChipped,xlab="Number White Marbles Drawn",
       ylab="Number Chipped Marbles Drawn",main="Pulling Marbles")
> plot(jitter(numberWhite),jitter(numberChipped),xlab="Number White Marbles Drawn",
       ylab="Number Chipped Marbles Drawn",main="Pulling Marbles With Jitter")

Figure 3. Points with noise added using the jitter command.

6.1.4. Multiple Graphs on One Image

Note that a new command was used in the previous example. The par command can be used to set different parameters. In the example above the mfrow parameter was set. The plots are arranged in an array where the default number of rows and columns is one. The mfrow parameter is a vector with two entries. The first entry is the number of rows of images. The second entry is the number of columns. In the example above the plots were arranged in one row with two plots across.

> par(mfrow=c(2,3))
> boxplot(numberWhite,main="first plot")
> boxplot(numberChipped,main="second plot")
> plot(jitter(numberWhite),jitter(numberChipped),xlab="Number White Marbles Drawn",
       ylab="Number Chipped Marbles Drawn",main="Pulling Marbles With Jitter")
> hist(numberWhite,main="fourth plot")
> hist(numberChipped,main="fifth plot")
> mosaicplot(table(numberWhite,numberChipped),main="sixth plot")

Figure 4. An array of plots using the par command.

6.1.5. Density Plots

There are times when you do not want to plot specific points but wish to plot a density. This can be done using the smoothScatter command.

> numberWhite <- rhyper(30,4,5,3)
> numberChipped <- rhyper(30,2,7,3)
> smoothScatter(numberWhite,numberChipped,
                xlab="White Marbles",ylab="Chipped Marbles",main="Drawing Marbles")

Figure 5. The smoothScatter command can be used to plot densities.

Note that the previous example may benefit from superimposing a grid to help delimit the points of interest. This can be done using the grid command.

> numberWhite <- rhyper(30,4,5,3)
> numberChipped <- rhyper(30,2,7,3)
> smoothScatter(numberWhite,numberChipped,
                xlab="White Marbles",ylab="Chipped Marbles",main="Drawing Marbles")
> grid(4,3)

6.1.6. Pairwise Relationships

There are times that you want to explore a large number of relationships. A number of relationships can be plotted at one time using the pairs command. The idea is that you give it a matrix or a data frame, and the command will create a scatter plot of all combinations of the data.

> uData <- rnorm(20)
> vData <- rnorm(20,mean=5)
> wData <- uData + 2*vData + rnorm(20,sd=0.5)
> xData <- -2*uData+rnorm(20,sd=0.1)
> yData <- 3*vData+rnorm(20,sd=2.5)
> d <- data.frame(u=uData,v=vData,w=wData,x=xData,y=yData)
> pairs(d)

Figure 6. Using pairs to produce all permutations of a set of relationships on one graph.

6.1.7. Shaded Regions

A shaded region can be plotted using the polygon command. The polygon command takes a pair of vectors, x and y, and shades the region enclosed by the coordinate pairs. In the example below a blue square is drawn. The vertices are defined starting from the lower left. Five pairs of points are given because the starting point and the ending point are the same.

> x = c(-1,1,1,-1,-1)
> y = c(-1,-1,1,1,-1)
> plot(x,y)
> polygon(x,y,col='blue')

A more complicated example is given below. In this example the rejection region for a right sided hypothesis test is plotted, and it is shaded in red. A set of custom axes is constructed, and symbols are plotted using the expression command.
> stdDev <- 0.75
> x <- seq(-5,5,by=0.01)
> y <- dnorm(x,sd=stdDev)
> right <- qnorm(0.95,sd=stdDev)
> plot(x,y,type="l",xaxt="n",ylab="p",
       xlab=expression(paste('Assumed Distribution of ',bar(x))),
       axes=FALSE,ylim=c(0,max(y)*1.05),xlim=c(min(x),max(x)),
       frame.plot=FALSE)
> axis(1,at=c(-5,right,0,5),
       pos=0,
       labels=c(expression(' '),expression(bar(x)[cr]),expression(mu[0]),expression(' ')))
> axis(2)
> xReject <- seq(right,5,by=0.01)
> yReject <- dnorm(xReject,sd=stdDev)
> polygon(c(xReject,xReject[length(xReject)],xReject[1]),
          c(yReject,0,0), col='red')

Figure 7. Using polygon to produce a shaded region.

The axes are drawn separately. This is done by first suppressing the plotting of the axes in the plot command and then drawing the horizontal and vertical axes with separate calls to axis. Also note that the expression command is used to plot a Greek character and also produce subscripts.

6.1.8. Plotting a Surface

Finally, a brief example of how to plot a surface is given. The persp command will plot a surface with a specified perspective. In the example, a grid is defined by multiplying a row and column vector to give the x and then the y values for a grid. Once that is done a sine function is specified on the grid, and the persp command is used to plot it.

> x <- seq(0,2*pi,by=pi/100)
> y <- x
> xg <- (x*0+1) %*% t(y)
> yg <- (x) %*% t(y*0+1)
> f <- sin(xg+yg)
> persp(x,y,f,theta=-10,phi=40)

The %*% notation is used to perform matrix multiplication.

6.2. Discrete Data

Contents: Barplot, Mosaic Plot

In the examples below a data set is defined using R's hypergeometric random number generator.

> numberWhite <- rhyper(30,4,5,3)
> numberChipped <- rhyper(30,2,7,3)

6.2.1. Barplot

The plot command will try to produce the appropriate plots based on the data type. The data that is defined above, though, is numeric data. You need to convert the data to factors to make sure that the plot command treats it in an appropriate way.
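A quick way to confirm what the plot command will see is to check the class of the data before and after the conversion. The sketch below is illustrative only; the vector draws is a made-up stand-in for numberWhite so that the result does not depend on random numbers.

```r
# Hypothetical counts standing in for the random draws used in the text.
draws <- c(0, 1, 1, 2, 3, 1, 2, 2, 0, 1)

# As numeric data, plot(draws) would plot each value against its index.
print(class(draws))

# As a factor, plot(drawsFactor) would give a barplot of the counts.
drawsFactor <- as.factor(draws)
print(class(drawsFactor))

# The distinct values become the category labels,
# and table() gives the frequencies a barplot would display.
print(levels(drawsFactor))
print(table(drawsFactor))
```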
The as.factor command is used to cast the data as factors and ensures that R treats it as discrete data.

> numberWhite <- rhyper(30,4,5,3)
> numberWhite <- as.factor(numberWhite)
> plot(numberWhite)

In this case R will produce a barplot. The barplot command can also be used to create a barplot. The barplot command requires a vector of heights, though, and you cannot simply give it the raw data. The frequencies for the barplot command can be easily calculated using the table command.

> numberWhite <- rhyper(30,4,5,3)
> totals <- table(numberWhite)
> totals
numberWhite
 0  1  2  3
 4 13 11  2
> barplot(totals,main="Number Draws",ylab="Frequency",xlab="Draws")

In the previous example the options to the barplot command are used to set the title for the plot and the labels for the axes. The labels on the ticks for the horizontal axis are automatically generated using the labels on the table. You can change the labels by setting the row names of the table.

> totals <- table(numberWhite)
> rownames(totals) <- c("none","one","two","three")
> totals
numberWhite
 none   one   two three
    4    13    11     2
> barplot(totals,main="Number Draws",ylab="Frequency",xlab="Draws")

The order of the frequencies is the same as the order in the table. If you change the order in the table it will change the way it appears in the barplot. For example, if you wish to arrange the frequencies in descending order you can use the sort command with the decreasing option set to TRUE.

> barplot(sort(totals,decreasing=TRUE),main="Number Draws",ylab="Frequency",xlab="Draws")

The indexing features of R can be used to change the order of the frequencies manually.

> totals
numberWhite
 none   one   two three
    4    13    11     2
> sort(totals,decreasing=TRUE)
numberWhite
  one   two  none three
   13    11     4     2
> totals[c(3,1,4,2)]
numberWhite
  two  none three   one
   11     4     2    13
> barplot(totals[c(3,1,4,2)])

The barplot command returns the horizontal locations of the bars.
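To see what these returned locations look like, the sketch below stores the value of barplot for a small set of counts. The counts and the name barLocations are made up for illustration, and pdf(NULL) is used only so that the example runs without opening a window or writing a file.

```r
# Send graphics output to a null PDF device so nothing is displayed or saved.
pdf(NULL)

# Hypothetical frequencies for four categories.
counts <- c(none = 4, one = 13, two = 11, three = 2)

# barplot() draws the bars and returns the x coordinate of each bar's midpoint.
barLocations <- barplot(counts, main = "Number Draws")
print(barLocations)

# The locations can be reused to add information to the same plot,
# for example a point at the top of each bar.
points(barLocations, counts, pch = 16, col = 2)

# Close the device.
dev.off()
```

With the default bar width and spacing the midpoints are not the integers 1 to 4, which is why the returned locations should be used instead of guessing the positions.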
Using the locations and putting together the previous ideas a Pareto Chart can be constructed.

> xLoc = barplot(sort(totals,decreasing=TRUE),main="Number Draws",
                 ylab="Frequency",xlab="Draws",ylim=c(0,sum(totals)+2))
> points(xLoc,cumsum(sort(totals,decreasing=TRUE)),type='p',col=2)
> points(xLoc,cumsum(sort(totals,decreasing=TRUE)),type='l')

6.2.2. Mosaic Plot

Mosaic plots are used to display proportions for tables that are divided into two or more conditional distributions. Here we focus on two way tables to keep things simpler. It is assumed that you are familiar with using tables in R (see the section on two way tables for more information: Two Way Tables).

Here we will use a made up data set primarily to make it easier to figure out what R is doing. The fictitious data set is defined below. The idea is that sixteen children of age eight are interviewed. They are asked two questions. The first question is, "do you believe in Santa Claus." If they say that they do then the term "belief" is recorded, otherwise the term "no belief" is recorded. The second question is whether or not they have an older brother, older sister, or no older sibling. (We are keeping it simple here!)
The answers that are recorded are "older brother," "older sister," or "no older sibling."

> santa <- data.frame(belief=c('no belief','no belief','no belief','no belief',
                               'belief','belief','belief','belief',
                               'belief','belief','no belief','no belief',
                               'belief','belief','no belief','no belief'),
                      sibling=c('older brother','older brother','older brother','older sister',
                                'no older sibling','no older sibling','no older sibling','older sister',
                                'older brother','older sister','older brother','older sister',
                                'no older sibling','older sister','older brother','no older sibling'),
                      stringsAsFactors=TRUE)
> santa
      belief          sibling
1  no belief    older brother
2  no belief    older brother
3  no belief    older brother
4  no belief     older sister
5     belief no older sibling
6     belief no older sibling
7     belief no older sibling
8     belief     older sister
9     belief    older brother
10    belief     older sister
11 no belief    older brother
12 no belief     older sister
13    belief no older sibling
14    belief     older sister
15 no belief    older brother
16 no belief no older sibling
> summary(santa)
       belief              sibling
 belief   :8   no older sibling:5
 no belief:8   older brother   :6
               older sister    :5

The data is given as strings, and the stringsAsFactors=TRUE option tells R to treat them as categorical data, so the data types are factors. (Versions of R before 4.0.0 made this conversion by default; later versions keep the strings as character data unless the option is given.) If you plot the individual data sets, the plot command will default to producing barplots.

> plot(santa$belief)
> plot(santa$sibling)

If you provide both data sets it will automatically produce a mosaic plot which demonstrates the relative frequencies in terms of the resulting areas.

> plot(santa$sibling,santa$belief)
> plot(santa$belief,santa$sibling)

The mosaicplot command can be called directly:

> totals = table(santa$belief,santa$sibling)
> totals
            no older sibling older brother older sister
  belief                   4             1            3
  no belief                1             5            2
> mosaicplot(totals,main="Older Brothers are Jerks",
             xlab="Belief in Santa Claus",ylab="Older Sibling")

The colours of the plot can be specified by setting the col argument. The argument is a vector of colours used for the rows.
See Figure 8 for an example.

> mosaicplot(totals,main="Older Brothers are Jerks",
             xlab="Belief in Santa Claus",ylab="Older Sibling",
             col=c(2,3,4))

Figure 8. Example of a mosaic plot with colours.

The labels and the order that they appear in the plot can be changed in exactly the same way as given in the examples for barplot above.

> rownames(totals)
[1] "belief"    "no belief"
> colnames(totals)
[1] "no older sibling" "older brother"    "older sister"
> rownames(totals) <- c("Believes","Does not Believe")
> colnames(totals) <- c("No Older","Older Brother","Older Sister")
> totals
                   No Older Older Brother Older Sister
  Believes                4             1            3
  Does not Believe        1             5            2
> mosaicplot(totals,main="Older Brothers are Jerks",
             xlab="Belief in Santa Claus",ylab="Older Sibling")

When changing the order keep in mind that the table is a two dimensional array. The indices must include both rows and columns, and the transpose command (t) can be used to switch how it is plotted with respect to the vertical and horizontal axes.

> totals
                   No Older Older Brother Older Sister
  Believes                4             1            3
  Does not Believe        1             5            2
> totals[c(2,1),c(2,3,1)]
                   Older Brother Older Sister No Older
  Does not Believe             5            2        1
  Believes                     1            3        4
> mosaicplot(totals[c(2,1),c(2,3,1)],main="Older Brothers are Jerks",
             xlab="Belief in Santa Claus",ylab="Older Sibling",col=c(2,3,4))
> mosaicplot(t(totals),main="Older Brothers are Jerks",
             ylab="Belief in Santa Claus",xlab="Older Sibling",col=c(2,3))

6.3. Miscellaneous Options

Contents: Multiple Representations On One Plot, Multiple Windows, Print To A File, Annotation and Formatting

The previous examples only provide a slight hint at what is possible. Here we give some examples that demonstrate how the different commands can be combined and the options that allow them to be used together.

6.3.1. Multiple Representations On One Plot

First, an example of a histogram with an approximation of the density function is given.
In addition to the density function a horizontal boxplot is added to the plot with a rug representation of the data on the horizontal axis. The vertical bounds on the histogram are specified with ylim to leave room above the bars. The boxplot must be added to the histogram, and it will be raised above the histogram.

> x = rexp(20,rate=4)
> hist(x,ylim=c(0,18),main="This Are An Histogram",xlab="X")
> boxplot(x,at=16,horizontal=TRUE,add=TRUE)
> rug(x,side=1)
> d = density(x)
> points(d,type='l',col=3)

6.3.2. Multiple Windows

The dev commands allow you to create and manipulate multiple graphics windows. You can create new windows using the dev.new() command, and you can choose which one to make active using the dev.set() command. The dev.list() command lists the graphical devices that are open, and the dev.next() and dev.prev() commands return the device that follows or precedes the active one.

In the following example three devices are created. They are listed, and different plots are created on the different devices.

> dev.new()
> dev.new()
> dev.new()
> dev.list()
X11cairo X11cairo X11cairo
       2        3        4
> dev.set(3)
X11cairo
       3
> x = rnorm(20)
> hist(x)
> dev.set(2)
X11cairo
       2
> boxplot(x)
> dev.set(4)
X11cairo
       4
> qqnorm(x)
> qqline(x)
> dev.next()
X11cairo
       2
> dev.set(dev.next())
X11cairo
       2
> plot(density(x))

6.3.3. Print To A File

There are a couple of ways to print a plot to a file. It is important to be able to work with graphics devices as shown in the previous subsection (Multiple Windows). The first way explored is to use the dev.print command. This command will print a copy of the currently active device, and the format is defined by the device argument.

In the example below, the current window is printed to a png file called "hist.png" that is 200 pixels wide.

> x = rnorm(100)
> hist(x)
> dev.print(device=png,width=200,"hist.png")

To find out what devices are available on your system use the help command.

> help(device)

Another way to print to a file is to create a device in the same way as the graphical devices were created in the previous section.
Once the device is created, the various plot commands are given, and then the device is turned off to write the results to a file.

> png(file="hist.png")
> hist(x)
> rug(x,side=1)
> dev.off()

6.3.4. Annotation and Formatting

Basic annotation can be performed in the regular plotting commands. For example, there are options to specify labels on axes as well as titles. More options are available using the axis command.

Most of the primary plotting commands have an option to turn off the generation of the axes using the axes=FALSE option. The axes can then be added using the axis command, which allows for a greater number of options.

In the example below a bivariate set of random numbers is generated and plotted as a scatter plot. The axes are added, but the horizontal axis is located in the center of the data rather than at the bottom of the figure. Note that the horizontal and vertical axes are added separately, and the axis to draw is specified using the first argument to the command. (Use help(axis) for a full list of options.)

> x <- rnorm(10,mean=0,sd=4)
> y <- 3*x-1+rnorm(10,mean=0,sd=2)
> summary(x)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
-6.1550 -1.9280  1.2000 -0.1425  2.4780  3.1630
> summary(y)
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
-17.9800  -9.0060   0.7057  -1.2060   8.2600  10.9200
> plot(x,y,axes=FALSE,col=2)
> axis(1,pos=0,at=seq(-7,5,by=1))
> axis(2,pos=0,at=seq(-18,11,by=2))

In the previous example the at option is used to specify the tick marks.

When using the plot command the default behavior is to draw an axis as well as draw a box around the plotting area. The drawing of the box can be controlled using the bty option. The value can be "o," "l," "7," "c," "u," "]," or "n." (The lines drawn roughly look like the character given except for "n" which draws no lines.) The box can be drawn later using the box command as well.
> x <- rnorm(10,mean=0,sd=4)
> y <- 3*x-1+rnorm(10,mean=0,sd=2)
> plot(x,y,bty="7")
> plot(x,y,bty="n")
> box(lty=3)

The par command can be used to set the default values for various parameters. A couple are given below. In the example below the default background is set to grey, only the left and bottom sides of the box will be drawn around the plot, and the margins for the axes will be twice the normal size.

> par(bty="l")
> par(bg="gray")
> par(mex=2)
> x <- rnorm(10,mean=0,sd=4)
> y <- 3*x-1+rnorm(10,mean=0,sd=2)
> plot(x,y)

Another common task is to place a text string on the plot. The text command takes a coordinate and a label, and it places the label at the given coordinate. The text command has options for setting the offset, size, font, and other properties. In the example below the label "numbers!" is placed on the plot. Use help(text) to see more options.

> x <- rnorm(10,mean=0,sd=4)
> y <- 3*x-1+rnorm(10,mean=0,sd=2)
> plot(x,y)
> text(-1,-2,"numbers!")

The default text command will cut off any characters outside of the plot area. This behavior can be overridden using the xpd option.

> x <- rnorm(10,mean=0,sd=4)
> y <- 3*x-1+rnorm(10,mean=0,sd=2)
> plot(x,y)
> text(-7,-2,"outside the area",xpd=TRUE)

7. Indexing Into Vectors

Contents: Indexing With Logicals, Not Available or Missing Values, Indices With Logical Expression

Given a vector of data one common task is to isolate particular entries or censor items that meet some criteria. Here we show how to use R's indexing notation to pick out specific items within a vector.

7.1. Indexing With Logicals

We first give an example of how to select specific items in a vector. The first step is to define a vector of data, and the second step is to define a vector made up of logical values. When the vector of logical values is used as the index into the vector of data values, only the items whose corresponding logical value is TRUE are returned:

> a <- c(1,2,3,4,5)
> b <- c(TRUE,FALSE,FALSE,TRUE,FALSE)
> a[b]
[1] 1 4
> max(a[b])
[1] 4
> sum(a[b])
[1] 5

7.2. Not Available or Missing Values

One common problem is data entries that are marked NA or not available. There is a predefined constant called NA that can be used to indicate missing information. The problem with this is that some functions return NA or throw an error if one of the entries in the data is NA. Some functions allow you to ignore the missing values through special options:

> a <- c(1,2,3,4,NA)
> a
[1]  1  2  3  4 NA
> sum(a)
[1] NA
> sum(a,na.rm=TRUE)
[1] 10

There are other times, though, when this option is not available, or you simply want to censor the missing values. The is.na function can be used to determine which items are not available. The logical "not" operator in R is the ! symbol. When used with the indexing notation the items within a vector that are NA can be easily removed:

> a <- c(1,2,3,4,NA)
> is.na(a)
[1] FALSE FALSE FALSE FALSE  TRUE
> !is.na(a)
[1]  TRUE  TRUE  TRUE  TRUE FALSE
> a[!is.na(a)]
[1] 1 2 3 4
> b <- a[!is.na(a)]
> b
[1] 1 2 3 4

7.3. Indices With Logical Expression

Any logical expression can be used as an index, which opens a wide range of possibilities. For example, you can remove or focus on entries that match specific criteria. For example, you might want to keep only the entries that are below a certain value:

> a = c(6,2,5,3,8,2)
> a
[1] 6 2 5 3 8 2
> b = a[a<6]
> b
[1] 2 5 3 2

For another example, suppose you want to collect the values in one column whose entries in another column match either of two factor levels:

> d = data.frame(one=as.factor(c('a','a','b','b','c','c')),
                 two=c(1,2,3,4,5,6))
> d
  one two
1   a   1
2   a   2
3   b   3
4   b   4
5   c   5
6   c   6
> both = d$two[(d$one=='a') | (d$one=='b')]
> both
[1] 1 2 3 4

Note that a single "|" was used in the previous example. There is a difference between "||" and "|".
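A single bar will perform a vector operation, term by term, while a double bar will evaluate to a single TRUE or FALSE result. The same distinction holds between "&" and "&&". The output below comes from an older version of R, in which the double forms silently used only the first element of each vector. Since R 4.3.0 the double forms raise an error when either argument has more than one element.

> (c(TRUE,TRUE))|(c(FALSE,TRUE))
[1] TRUE TRUE
> (c(TRUE,TRUE))||(c(FALSE,TRUE))
[1] TRUE
> (c(TRUE,TRUE))&(c(FALSE,TRUE))
[1] FALSE  TRUE
> (c(TRUE,TRUE))&&(c(FALSE,TRUE))
[1] FALSE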
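Because the double forms are meant for single logical values, a condition built from a whole vector is usually reduced to one value first with any or all. The sketch below shows one way to do this with made-up vectors p and q; it is an illustration and not part of the original examples.

```r
# Hypothetical logical vectors.
p <- c(TRUE, TRUE)
q <- c(FALSE, TRUE)

# The single forms work element by element and return one result per position.
elementOr  <- p | q
elementAnd <- p & q
print(elementOr)
print(elementAnd)

# Reduce each vector to a single value before using the double forms.
anyEither <- any(p) || any(q)   # is there a TRUE anywhere in p or in q?
allBoth   <- all(p) && all(q)   # is every entry of both p and q TRUE?
print(anyEither)
print(allBoth)
```

The single forms are the ones to use inside an index such as a[...], while the double forms belong in places that need exactly one TRUE or FALSE, such as the condition of an if statement.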