R Data Type

Data Types

In the last lesson, we learned about two data types: vectors and data frames. We also learned about two different classes of vectors: numeric and factor. There are many other data types in R. Each has a special use, and to be productive in R, you need to be familiar with the major types and the operations on these types.

Primitive Types

Each R object has an underlying "type", which determines the set of possible values for that object. You can find the type of an object using the typeof function. The main types include the following:

* logical: a logical value.

    TRUE
    [1] TRUE
    FALSE
    [1] FALSE
    TRUE | FALSE   # logical 'or'
    [1] TRUE
    TRUE & FALSE   # logical 'and'
    [1] FALSE
    !TRUE          # logical 'not'
    [1] FALSE

* integer: an integer (positive or negative). Many R programmers do not use this type, since every integer value can also be represented as a double.

    7L         # suffix integers with an L to distinguish them from doubles
    [1] 7
    1L:10L     # range of values
     [1]  1  2  3  4  5  6  7  8  9 10
    1:10       # (L suffix is optional)
     [1]  1  2  3  4  5  6  7  8  9 10
    7L %% 2L   # modulo (remainder)
    [1] 1
    7L %/% 2L  # integer division
    [1] 3

* double: a real number stored in "double-precision floating point format."

    1
    [1] 1
    3.14
    [1] 3.14
    (3 + 8/2) * -7   # arithmetic operations
    [1] -49
    2^10             # exponentiation
    [1] 1024

  A double type can store the special values Inf, -Inf, and NaN, which represent "positive infinity," "negative infinity," and "not a number":

    1/0
    [1] Inf
    -1/0
    [1] -Inf
    0/0
    [1] NaN

* complex: a complex number

    1i            # i suffix to denote 'imaginary'
    [1] 0+1i
    1i^2
    [1] -1+0i
    sqrt(-1+0i)
    [1] 0+1i

* character: a sequence of characters, called a "string" in other programming languages

    "Hello, World!"   # denote a string with double quotes...
    [1] "Hello, World!"
    'abracadabra'     # ...or with single quotes (both forms are equivalent)
    [1] "abracadabra"

* list: a list of named values (discussed in detail in the Lists section below)

    list(a = 10, b = 11, z = "hello")
    $a
    [1] 10

    $b
    [1] 11

    $z
    [1] "hello"

* builtin, closure, special: a function or operator (for most purposes, the distinctions between these are not important)

    typeof(sqrt)
    [1] "builtin"
    typeof(read.csv)
    [1] "closure"
    typeof(`<-`)
    [1] "special"

* NULL: a special type with only one possible value, known as NULL

    typeof(NULL)
    [1] "NULL"

This is not an exhaustive list, but the other types are exotic, and you are unlikely to encounter them.

Missing Values

One unique feature of R is its support for "Not Applicable" or "Missing" values. The logical, integer, double, complex, and character types can all represent missing values, using the special constant NA.

Conversions

Often, you don't need to worry too much about types, because R will implicitly convert between types for you. For example, consider the following sequence of commands:

    x <- 1:10
    x[[2]] <- 3.14

When the first line is executed, x is created as an integer vector. In the second line, R converts x to a double vector so that it can store the value 3.14.

Lists

A "list" is a primitive type that stores a sequence of values, along with optional names for these values. The power of the list type is that it allows you to represent complicated objects. We construct lists using the list function:

    abe <- list(first.name = "Abraham", last.name = "Lincoln",
                weight.lb = 180, height.in = 76.8)

In this example, abe is a list with four elements, with names first.name, last.name, weight.lb, and height.in. We access the elements of a list using double square brackets.
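To make the point about "complicated objects" concrete, here is a small sketch. The list abe_nested and its fields are hypothetical, chosen only for illustration: a single list can mix element types, and an element can itself be a list, which you reach by chaining double square brackets.

```r
# Hypothetical record, for illustration only: a list can mix element
# types, and an element can itself be a list.
abe_nested <- list(
  name = list(first = "Abraham", last = "Lincoln"),  # a nested list
  weight.lb = 180,                                    # double
  height.in = 76.8,                                   # double
  president = TRUE                                    # logical
)

# The type of the whole object versus the type of each element
typeof(abe_nested)          # "list"
sapply(abe_nested, typeof)  # name: "list", weight.lb: "double", ...

# Chain double square brackets to reach inside the nested list
abe_nested[["name"]][["last"]]  # "Lincoln"

# str() prints a compact overview of the whole structure
str(abe_nested)
```

Here sapply applies typeof to every element and returns a named character vector, which is a quick way to see what a list holds.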

Inside the double brackets, we can specify either the index of the element:

    abe[[1]]
    [1] "Abraham"
    abe[[2]]
    [1] "Lincoln"

or its name:

    abe[["first.name"]]
    [1] "Abraham"
    abe[["last.name"]]
    [1] "Lincoln"

Another way to access an element by name is to use the $ operator:

    abe$height
    [1] 76.8
    abe$weight
    [1] 180

Both forms (abe[["first.name"]] and abe$first.name) return the same element, and the $ form is more common. One difference: $ also accepts an unambiguous abbreviation of a name, which is why abe$height returns the height.in element, whereas [[ ]] matches names exactly by default.

As with vectors, we get the number of elements with the length function:

    length(abe)
    [1] 4

We slice lists with single square brackets:

    abe[1:2]
    $first.name
    [1] "Abraham"

    $last.name
    [1] "Lincoln"

    abe[1]
    $first.name
    [1] "Abraham"

For a vector, the slice [1] is logically equivalent to the element [[1]], but for a list these are distinct: abe[1] is a list of length one containing the first element, whereas abe[[1]] is the element itself.

We can delete a particular element of a list by assigning it the value NULL:

    abe[["last.name"]] <- NULL

This removes the element and shifts the indexes of subsequent elements:

    abe[[2]]
    [1] 180
    abe[[3]]
    [1] 76.8

Classes

Two types we saw in the previous lesson are not primitive: data frames and factors. In fact, a data frame is a special type of list, and a factor is a special type of integer vector. These special types are known as "classes". Every R object is a member of one or more classes. To find these classes, use the class function:

    class(TRUE)
    [1] "logical"
    class(1L)
    [1] "integer"
    class(3.14)
    [1] "numeric"

(Confusingly, the class for double objects is not called double; it is called numeric.)

A data frame is a list whose elements are vectors, each with the same length. A factor is an integer vector taking values in the range 1..n, with each integer corresponding to a certain level. R distinguishes between these types and their underlying representations by assigning them to different classes.

    # stringsAsFactors = TRUE keeps text columns as factors (the default changed to FALSE in R 4.0.0)
    bikedata <- read.csv("bikedata.csv", stringsAsFactors = TRUE)
    typeof(bikedata)
    [1] "list"
    class(bikedata)
    [1] "data.frame"
    typeof(bikedata$colour)
    [1] "integer"
    class(bikedata$colour)
    [1] "factor"

The power of classes is that they allow you to change how certain functions behave. Compare the following two outputs:

    summary(bikedata$colour)
    Black  Blue Green  Grey Other   Red White  NA's 
      262   636   149   531    52   378   333    14 
    summary(unclass(bikedata$colour))
       Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
       1.00    2.00    4.00    3.83    6.00    7.00      14 

Here, unclass is a function that converts an object to its underlying primitive type. When we summarize an object with class factor, summary reports the count for each level; when we summarize an object with class integer, it reports quartiles and other statistics. Advanced R programmers create new kinds of classes, along with specialized functions to act on these classes.
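Since bikedata.csv is not included with this lesson, the sketch below repeats the same factor comparison on a small, made-up factor (the values in colour are hypothetical). It then illustrates the last point with a hypothetical class named temperature and a specialized function summary.temperature. The mechanism shown is R's S3 method dispatch: calling summary(x) on an object whose class is "temperature" runs summary.temperature(x) if such a function exists.

```r
# Hypothetical stand-in for bikedata$colour
colour <- factor(c("Red", "Blue", "Red", NA, "Black", "Blue", "Red"))

typeof(colour)   # "integer": the underlying primitive type
class(colour)    # "factor"
levels(colour)   # "Black" "Blue" "Red" (levels are sorted by default)

summary(colour)           # counts per level: Black 1, Blue 2, Red 3, NA's 1
summary(unclass(colour))  # quartiles and mean of the integer codes

# A hypothetical new class built on top of a double vector
temperature <- function(x) structure(x, class = "temperature")

# summary() is generic: for an object of class "temperature" it calls
# summary.temperature(), because that function is defined here
summary.temperature <- function(object, ...) {
  values <- unclass(object)
  cat(length(values), "temperature readings, mean", mean(values), "\n")
  invisible(object)
}

temps <- temperature(c(18.5, 21.0, 19.5))
typeof(temps)            # "double"
class(temps)             # "temperature"
summary(temps)           # dispatches to summary.temperature()
summary(unclass(temps))  # default numeric summary of the raw values
```

The method here only prints a one-line description; a fuller design would return a summary object with its own print method, but the dispatch step is the same.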

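Returning to the Missing Values and Conversions sections earlier in this lesson, the sketch below shows NA in each atomic type and the order R follows when it must combine values of different types: logical, then integer, then double, then complex, then character. NA_integer_, NA_real_, NA_complex_, and NA_character_ are R's typed missing-value constants; the variable names x and y are only for illustration.

```r
# Every atomic type has its own NA; a plain NA is logical
typeof(NA)             # "logical"
typeof(NA_integer_)    # "integer"
typeof(NA_real_)       # "double"
typeof(NA_complex_)    # "complex"
typeof(NA_character_)  # "character"

# Missing values propagate unless you remove them explicitly
x <- c(1.5, NA, 3)
is.na(x)               # FALSE TRUE FALSE
sum(x)                 # NA
sum(x, na.rm = TRUE)   # 4.5

# Implicit conversion: c() converts every element to the most general
# type present (logical -> integer -> double -> complex -> character)
typeof(c(TRUE, 2L))    # "integer"
typeof(c(2L, 3.14))    # "double"
typeof(c(3.14, 1i))    # "complex"
typeof(c(1, "a"))      # "character"

# The Conversions example from the text
y <- 1:10
typeof(y)              # "integer"
y[[2]] <- 3.14
typeof(y)              # "double"

# Explicit conversion with the as.* functions
as.integer("42")                     # 42
suppressWarnings(as.numeric("abc"))  # NA (R normally warns here)
```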