Data Types
In the last lesson, we learned about two data types: vectors and data frames. We also
learned about two different classes of vectors: numeric and factor. There are many
other data types in R. Each has a special use, and to be productive in R, you need to
be familiar with the major types and the operations on these types.
Primitive Types
Each R object has an underlying “type”, which determines the set of possible values
for that object. You can find the type of an object using the typeof function.
The main types include the following:
* logical: a logical value.
TRUE
[1] TRUE
FALSE
[1] FALSE
TRUE | FALSE # logical 'or'
[1] TRUE
TRUE & FALSE # logical 'and'
[1] FALSE
!TRUE # logical 'not'
[1] FALSE
* integer: an integer (positive or negative). Many R programmers do not use this
type since every integer value can be represented as a double.
1L # suffix integers with an L to distinguish them from doubles
[1] 1
-7L
[1] -7
1L:10L # range of values
[1] 1 2 3 4 5 6 7 8 9 10
1:10 # (L suffix is optional)
[1] 1 2 3 4 5 6 7 8 9 10
7L %% 2L # modulo (remainder)
[1] 1
7L %/% 2L # integer division
[1] 3
* double: a real number stored in “double-precision floating point format.”
1
[1] 1
3.14
[1] 3.14
(3 + 8/2) * -7 # arithmetic operations
[1] -49
2^10 # exponentiation
[1] 1024
A double type can store the special values Inf, -Inf, and NaN, which represent
“positive infinity,” “negative infinity,” and “not a number”:
1/0
[1] Inf
-1/0
[1] -Inf
0/0
[1] NaN
* complex: a complex number
1i # 'i' suffix to denote 'imaginary'
[1] 0+1i
sqrt(-1+0i)
[1] 0+1i
* character: a sequence of characters, called a “string” in other programming
languages
"Hello, World!" # denote a string with double quotes...
[1] "Hello, World!"
'abracadabra' # ...or with single quotes (both forms are equivalent).
[1] "abracadabra"
* list: a list of named values (discussed in detail in the Lists section below)
list(a = 10, b = 11, z = "hello")
$a
[1] 10
$b
[1] 11
$z
[1] "hello"
* builtin, closure, special: a function or operator (for most purposes, the
distinctions between these are not important)
typeof(sqrt)
[1] "builtin"
typeof(read.csv)
[1] "closure"
typeof(`<-`) # backticks refer to the operator itself
[1] "special"
* NULL: a special type with only one possible value, known as NULL
typeof(NULL)
[1] "NULL"
This is not an exhaustive list, but the other types are exotic and you probably won't
ever encounter them.
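As a quick check of the list above, the following sketch calls typeof on one value of each type. The variable names examples and types are made up for illustration and are not part of the lesson.

```r
# One example value for each primitive type discussed above.
# `examples` and `types` are illustrative names only.
examples <- list(
  TRUE,          # logical
  1L,            # integer
  3.14,          # double
  1i,            # complex
  "hello",       # character
  list(a = 1),   # list
  sqrt,          # a builtin function
  NULL           # the single NULL value
)

# Apply typeof to every element and collect the results in a character vector.
types <- sapply(examples, typeof)
print(types)
```

Collecting the values in a list (rather than a vector) matters here: a vector would force all of the values into a single common type.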
Missing Values
One distinctive feature of R is its support for “Not Available” or “Missing” values. The
logical, integer, double, complex, and character types can all represent missing values,
using the special constant NA.
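A minimal sketch of how missing values behave in practice. The heights vector holds hypothetical measurements invented for this example.

```r
# Hypothetical measurements; the second one is missing.
heights <- c(150.5, NA, 172.0)

typeof(heights)               # still a double vector
is.na(heights)                # flags which elements are missing

# The mean is unknown while a value is missing, so the result is NA.
mean_with_na <- mean(heights)

# na.rm = TRUE drops the missing values before averaging.
mean_without_na <- mean(heights, na.rm = TRUE)

# A bare NA is logical, but typed variants exist for the other types.
typeof(NA)
typeof(NA_integer_)
typeof(NA_character_)
```

Note that is.na is the reliable way to test for missingness; comparing with == NA itself yields NA.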
Conversions
Often, you don’t need to worry too much about the types, because R will implicitly
convert between types for you. For example, consider the following sequence of
commands:
x <- 1:10
x[[2]] <- 3.14
When the first line gets executed, x gets created as an integer vector. In the second
line, R converts x to a double vector so that it can store the value 3.14.
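The conversion can be observed directly with typeof. This sketch assumes x starts out as the integer vector 1:10; the names type_before and type_after are illustrative. It also shows a few explicit conversion functions for the cases where you want to control the conversion yourself.

```r
x <- 1:10
type_before <- typeof(x)   # "integer"

x[[2]] <- 3.14             # storing a double forces the whole vector to double
type_after <- typeof(x)    # "double"

# Explicit conversions use the as.* family of functions.
as.integer(3.14)           # truncates toward zero, giving 3L
as.character(1L)           # gives "1"

# Mixing types in c() converts everything to the most general type present.
mixed <- c(1L, 2.5, "a")
typeof(mixed)              # "character"
```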
Lists
A “list” is a primitive type that stores a sequence of values, along with optional names
for these values. The power of the list type is that it allows you to represent
complicated objects.
We construct lists using the list function:
abe <- list(first.name = "Abraham", last.name = "Lincoln",
            weight.lb = 180, height.in = 76.8)
In this example, abe is a list with four elements, with names first.name, last.name,
weight.lb, and height.in.
We access the elements of a list using double square brackets. We can either
specify the index of the element:
abe[[1]]
[1] "Abraham"
abe[[2]]
[1] "Lincoln"
or we can specify the name:
abe[["first.name"]]
[1] "Abraham"
abe[["last.name"]]
[1] "Lincoln"
Another way to access an element by name is to use the $ operator:
abe$height # $ accepts an unambiguous prefix of the name (here, height.in)
[1] 76.8
abe$weight
[1] 180
Both forms (abe[["first.name"]] and abe$first.name) return the same element, but the
$ form is more common.
As with vectors, we get the number of elements with the length function:
length(abe)
[1] 4
We slice lists with single square brackets:
abe[1:2]
$first.name
[1] "Abraham"
$last.name
[1] "Lincoln"
abe[1]
$first.name
[1] "Abraham"
For a vector, the slice [1] is logically equivalent to the element [[1]], but for a list,
these entities are distinct: abe[1] is a list of length one, whereas abe[[1]] is the
value stored inside it.
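The difference between slicing and element access can be checked with typeof. This sketch rebuilds abe so that it runs on its own; the names slice, element, and v are illustrative.

```r
abe <- list(first.name = "Abraham", last.name = "Lincoln",
            weight.lb = 180, height.in = 76.8)

slice   <- abe[1]     # single brackets: a list containing one element
element <- abe[[1]]   # double brackets: the element itself

typeof(slice)         # "list"
typeof(element)       # "character"
length(slice)         # 1

# For an unnamed vector the two forms give the same result.
v <- c(10, 20, 30)
identical(v[1], v[[1]])
```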
We can delete a particular element of a list by assigning it the value NULL:
abe[["last.name"]] <- NULL
This removes the element, and shifts the indexes of subsequent elements:
abe[[2]]
[1] 180
abe[[3]]
[1] 76.8
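One way to confirm what the deletion did is to compare the names and the length before and after. This sketch starts from a fresh copy of abe; names_before and names_after are illustrative names.

```r
abe <- list(first.name = "Abraham", last.name = "Lincoln",
            weight.lb = 180, height.in = 76.8)

names_before <- names(abe)     # four names, including "last.name"

abe[["last.name"]] <- NULL     # delete the element

names_after <- names(abe)      # "last.name" is gone; the rest keep their order
length(abe)                    # one element fewer than before
```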
Classes
Two types we saw in the previous lesson are not primitive: data frames and factors.
In fact, a data frame is a special type of list, and a factor is a special type of integer
vector. These special types are known as “classes”.
Every R object is a member of one or more classes. To find these classes, use the
class function:
class(TRUE)
[1] "logical"
class(1L)
[1] "integer"
class(3.14)
[1] "numeric"
(Confusingly, the class for double objects is not called double; it is called numeric.)
A data frame is a list whose elements are vectors, each with the same length. A
factor is an integer vector taking values in the range 1..n, with each integer
corresponding to a certain level. R distinguishes between these types and their
underlying representations by assigning them to different classes.
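These two representations can be inspected on a small example. The data below are hypothetical and are not taken from the bikedata file; colour, codes, and bikes are illustrative names.

```r
# A factor built from hypothetical colour labels.
colour <- factor(c("Red", "Blue", "Red", "Grey"))

typeof(colour)                # "integer": the underlying representation
levels(colour)                # the labels, sorted alphabetically by default
codes <- as.integer(colour)   # the integer code for each observation

# A data frame built from two equal-length vectors.
bikes <- data.frame(colour = colour, price = c(120, 85.5, 300, 150))

typeof(bikes)                 # "list": one list element per column
length(bikes)                 # number of columns
sapply(bikes, length)         # every column has the same length
```

Each integer code indexes into levels(colour), which is how the factor maps 1..n back to its labels.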
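# stringsAsFactors = TRUE is needed in R 4.0.0 and later, where read.csv
# no longer converts strings to factors by default
bikedata <- read.csv("bikedata.csv", stringsAsFactors = TRUE)
typeof(bikedata)
[1] "list"
class(bikedata)
[1] "data.frame"
typeof(bikedata$colour)
[1] "integer"
class(bikedata$colour)
[1] "factor"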
The power of classes is that they allow you to change how certain functions behave.
Compare the following two outputs:
summary(bikedata$colour)
Black  Blue Green  Grey Other   Red White  NA's
  262   636   149   531    52   378   333    14
summary(unclass(bikedata$colour))
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
   1.00    2.00    4.00    3.83    6.00    7.00      14
Here, unclass is a function that converts to the underlying primitive type. When we
summarize an object with class factor, R reports counts for the levels; when we
summarize an object with class integer, R reports quartiles and other statistics.
Advanced R programmers create new kinds of classes, along with specialized
functions to act on these classes.
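As a rough sketch of what this looks like, the following defines a made-up class called person on top of a list, together with a summary method for it. The class name, the function summary.person, and the variable result are assumptions for illustration only; this uses R's S3 naming convention, in which a function named generic.class is called when the generic is applied to an object of that class.

```r
# Attach the hypothetical class "person" to an ordinary list.
abe <- structure(
  list(first.name = "Abraham", last.name = "Lincoln"),
  class = "person"
)

# A summary method for the new class: summary() dispatches to this
# function whenever its argument has class "person".
summary.person <- function(object, ...) {
  paste(object$first.name, object$last.name)
}

typeof(abe)              # "list": the underlying type is unchanged
class(abe)               # "person"
result <- summary(abe)   # calls summary.person
print(result)
```

The underlying data are still a plain list, so unclass(abe) recovers the original representation, just as it does for a factor.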