
ST104: Statistical Laboratory WARWICK

Lecture 2
Samuel Touchard
Basic Data Types

The most common data types in R are:

- Numeric: Decimal values are called numerics in R. Numeric is the default computational
  data type.
- Character: A piece of text, represented as a sequence of characters (letters, numbers,
  and symbols).
- Logical: TRUE and FALSE values.
- Factor: A categorical value, also called a level of a categorical variable.

Numeric
If we assign a decimal value to a variable x as follows, x will be of numeric type.

> x = 10.5 # assign a decimal value
> x # print the value of x
[1] 10.5
> class(x) # print the class name of x
[1] "numeric"

Even if we assign a whole number to a variable k, it is still stored as a numeric value.

> k = 1
> k # print the value of k
[1] 1
> class(k) # print the class name of k
[1] "numeric"

Integer
Integers are values that can be written without a fractional component.
In order to create an integer variable in R, we call the as.integer() function.

> y = as.integer(3)
> y # print the value of y
[1] 3
> class(y) # print the class name of y
[1] "integer"
> is.integer(y) # is y an integer?
[1] TRUE

We can also coerce a numeric value into an integer with the as.integer() function, which
discards the fractional part.

> as.integer(3.14) # coerce a numeric value
[1] 3
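
As a small sketch of the two points above (the variable names k and m are illustrative, not from the lecture): the L suffix creates an integer literal directly, and as.integer() truncates toward zero rather than rounding.

```r
k <- 1               # stored as a double, so class(k) is "numeric"
m <- 1L              # the L suffix makes an integer literal
class(k)             # "numeric"
class(m)             # "integer"
as.integer(3.99)     # 3: the fractional part is dropped
as.integer(-3.99)    # -3: truncation is toward zero, not downward
round(3.99)          # 4: use round() when rounding is what you want
```

If a whole number must be an integer from the start, the L suffix avoids the extra call to as.integer().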

Complex
A complex number is a value that can be expressed in the form a + bi, where a and b are
numeric values.
A complex value in R is defined via the imaginary value i.
> z = 1 + 2i # create a complex number
> z # print the value of z
[1] 1+2i
> class(z) # print the class name of z
[1] "complex"
The following returns NaN with a warning (not an error) because −1 is stored as a numeric
value, not as a complex number.
> sqrt(-1) # square root of -1
[1] NaN
Warning message:
In sqrt(-1) : NaNs produced
Note: Try sqrt(as.complex(-1)).
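
A minimal sketch of what the note suggests (the names z and w are illustrative): once −1 is held as a complex value, sqrt() returns the imaginary unit.

```r
z <- sqrt(as.complex(-1))   # convert -1 to complex before taking the root
z                           # 0+1i
class(z)                    # "complex"
Re(z)                       # real part: 0
Im(z)                       # imaginary part: 1
w <- sqrt(-1 + 0i)          # writing -1 as a complex literal works as well
```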

Logical
A logical value is often created via comparison between variables.
> x = 1; y = 2 # sample values
> z = x > y # is x larger than y?
> z # print the logical value
[1] FALSE
> class(z) # print the class name of z
[1] "logical"

> u = TRUE; v = FALSE
> u & v # u AND v
[1] FALSE
> u | v # u OR v
[1] TRUE
> !u # negation of u
[1] FALSE

Logical Operators

Operator    Description
<           less than
<=          less than or equal to
>           greater than
>=          greater than or equal to
==          exactly equal to
!=          not equal to
!x          not x
x | y       x or y
x & y       x and y
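
These operators work element by element on vectors. A short sketch with made-up data (the vector v is an assumption for illustration only):

```r
v <- c(3, 8, 1, 6)          # illustrative data
big <- v > 2                # TRUE TRUE FALSE TRUE
mid <- (v > 2) & (v < 7)    # TRUE FALSE FALSE TRUE
small <- !(v > 2)           # FALSE FALSE TRUE FALSE
keep <- v[v != 8]           # 3 1 6: a logical vector used as an index
n <- sum(v >= 3)            # 3: TRUE counts as 1 in arithmetic
```

Indexing with a logical vector keeps exactly the elements where the condition is TRUE.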

Exercise

Task
Consider typing the following example in R:

> x <- c(1, 2, 3, 4, 5)
> x[(x > 3) | (x < 2)]
> y <- c(8, 4, 10, 2, 10, 1, 7, 5, 10, 5)
> y[(y > 4) & (y < 8)]

What will R return?

Character

A character object is used to represent string values in R.

> x <- "Department of Statistics"
> x
[1] "Department of Statistics"
> x <- "Samuel"
> y <- "Touchard"
> paste(x,y)
[1] "Samuel Touchard"
> x+y
Error in x + y : non-numeric argument to binary operator

> sprintf("We are here for %s. This is part number %s of week %s. Samuel is a %s lecturer.",
+         "ST104", 2, 1, "good")
[1] "We are here for ST104. This is part number 2 of week 1. Samuel is a good lecturer."
> sub("good", "bad", "Samuel is a good lecturer.")
[1] "Samuel is a bad lecturer."
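
A few related string functions, as a rough sketch (the variable names and example strings are illustrative): paste() accepts a custom separator, paste0() uses none, and sub() replaces only the first match while gsub() replaces all of them.

```r
first <- "Samuel"; last <- "Touchard"
a <- paste(first, last, sep = "_")           # "Samuel_Touchard"
b <- paste0(first, last)                     # "SamuelTouchard"
n <- nchar(first)                            # 6 characters
u <- toupper(last)                           # "TOUCHARD"
s <- sprintf("%d items cost %.2f", 3L, 4.5)  # "3 items cost 4.50"
one <- sub("a", "A", "banana")               # "bAnana": first match only
all_of_them <- gsub("a", "A", "banana")      # "bAnAnA": every match
```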

Coercion

> as.numeric(TRUE)
[1] 1
> as.numeric(FALSE)
[1] 0
> as.character(3.14)
[1] "3.14"
> as.numeric("4.5")
[1] 4.5
> as.integer("5.27")
[1] 5
> as.numeric("Hello")
[1] NA
Warning message:
NAs introduced by coercion
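
When a conversion fails, R returns NA rather than stopping. A minimal sketch of how one might detect and skip such values (the input strings are made up for illustration):

```r
# suppressWarnings() hides the "NAs introduced by coercion" warning
x <- suppressWarnings(as.numeric(c("4.5", "Hello", "7")))
x                              # 4.5 NA 7.0
bad <- is.na(x)                # FALSE TRUE FALSE
total <- sum(x, na.rm = TRUE)  # 11.5: the NA is ignored
two <- TRUE + TRUE             # 2: logicals are coerced to numbers in arithmetic
```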

Factor

- Factors represent categorical variables.
- They may look like integers or character values.
- It is more useful to discuss the different levels of a categorical variable than a
  single value.
- We will therefore wait to discuss factors until we have learnt about vectors.

Vectors
A vector is a sequence of data elements of the same basic type.
> c(2, 3, 5) # numeric values
[1] 2 3 5

> c(TRUE, FALSE, TRUE, FALSE, FALSE) # logical values
[1]  TRUE FALSE  TRUE FALSE FALSE

> c("aa", "bb", "cc", "dd", "ee") # character strings
[1] "aa" "bb" "cc" "dd" "ee"

> x <- c(1, 2, 3, 4, 5); x[2] <- "hat"
> x
[1] "1" "hat" "3" "4" "5"
> class(x)
[1] "character"
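
The last example works because a vector can hold only one basic type, so R silently converts every element to the most general type present. A brief sketch of that ordering (logical, then integer, then numeric, then character); the variable names are illustrative:

```r
c1 <- class(c(TRUE, 2L))     # "integer": logical is promoted to integer
c2 <- class(c(2L, 3.5))      # "numeric": integer is promoted to numeric
c3 <- class(c(3.5, "a"))     # "character": numeric is promoted to character
mixed <- c(TRUE, 2, "a")     # "TRUE" "2" "a": everything becomes a string
```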

Sequences
You can get some simple sequences as a vector using:

> -2:5
[1] -2 -1 0 1 2 3 4 5
> 12:4
[1] 12 11 10 9 8 7 6 5 4

seq() gives general arithmetic sequences:

> seq(from=2,to=4,by=0.5)
[1] 2.0 2.5 3.0 3.5 4.0
> seq(to=4, by=0.5, length.out=5)
[1] 2.0 2.5 3.0 3.5 4.0
> seq(from=2,to=4,length.out=5)
[1] 2.0 2.5 3.0 3.5 4.0

You can specify any three of from, to, by, length.out.
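
The one combination not shown above is from, by and length.out. A short sketch (variable names are illustrative) checking that it gives the same sequence, together with a decreasing sequence:

```r
a <- seq(from = 2, to = 4, by = 0.5)
b <- seq(from = 2, by = 0.5, length.out = 5)
same <- isTRUE(all.equal(a, b))   # TRUE: both are 2.0 2.5 3.0 3.5 4.0
down <- seq(10, 1, by = -3)       # 10 7 4 1: a negative step counts downwards
idx <- seq_len(4)                 # 1 2 3 4: integers from 1 up to a given length
```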

Replicating
Besides seq(), another useful command is rep():

> rep(5, 3)
[1] 5 5 5
> rep(1:4, 3)
[1] 1 2 3 4 1 2 3 4 1 2 3 4
> rep(1:4, each=3)
[1] 1 1 1 2 2 2 3 3 3 4 4 4
> x <- rep(1:4, times=4:1)
> x
[1] 1 1 1 1 2 2 2 3 3 4
> length(x)
[1] 10

Note: You can use length(x) to get the total number of elements.
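
The arguments of rep() can also be combined, and length.out cuts the result to a fixed length. A small sketch with an illustrative character vector:

```r
r1 <- rep(c("a", "b"), times = 2)            # "a" "b" "a" "b"
r2 <- rep(c("a", "b"), each = 2)             # "a" "a" "b" "b"
r3 <- rep(c("a", "b"), each = 2, times = 2)  # the each pattern, repeated twice
r4 <- rep(1:3, length.out = 7)               # 1 2 3 1 2 3 1
```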

Exercise

Task
Which commands would you type to get the following outputs:
[1] 8 5 2 -1 -4

[1] 1 1 2 2 3 3 4 4 5 5

[1] 1 2 3 4 5 1 2 3 4 5

[1] -1 3 -5 7 -9 11 -13 15

Functions
seq() is an example of a function. Everything in R is done with functions, even things that
don’t look like functions (e.g. :, ?). If you type a function’s name you can see its code.

> seq
function (...)
UseMethod("seq")
<bytecode: 0x7f8d45793060>
<environment: namespace:base>

seq is a generic function, so it dispatches to a different method depending on the class of the
object you give it as input. Try typing seq.default to see what seq() does in most cases. Many
basic functions in R use lower-level code for speed. To find out what a function does, use ?.

> ?seq
> ?rep
> ?length
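
To illustrate the claim that operators are functions too, here is a minimal sketch (variable names are illustrative): wrapping an operator in backticks lets you call it like any other function.

```r
a <- `:`(1, 5)                   # same as 1:5
b <- `+`(2, 3)                   # same as 2 + 3
f <- is.function(`:`)            # TRUE: the colon operator is a function
g <- is.function(seq)            # TRUE
# For plain numeric input, seq() ends up calling seq.default()
d <- seq.default(1, 9, by = 4)   # 1 5 9
```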

Help for seq

Some Other Useful Mathematical Functions
A non-exhaustive list:
- exp, log, log2, log10
- sqrt, abs, min, max
- sin, cos, tan, asin, acos, atan
- sinh, cosh, tanh, asinh, acosh, atanh
- sum, prod, cumsum

Most of these are vectorised. For example:

> x <- c(0, pi/2, pi)
> sin(x)
[1] 0.000000e+00 1.000000e+00 1.224647e-16
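
The last value above, 1.224647e-16, is floating-point rounding error rather than a genuinely non-zero sine. A brief sketch (the vector v is illustrative) of comparing with a tolerance, alongside some of the summary functions from the list:

```r
near_zero <- isTRUE(all.equal(sin(pi), 0))  # TRUE: equal up to a small tolerance
exact <- sin(pi) == 0                       # FALSE: exact comparison fails
v <- c(1, 2, 3, 4)
s <- sum(v)                                 # 10
p <- prod(v)                                # 24
cs <- cumsum(v)                             # 1 3 6 10: running totals
r <- sqrt(v)                                # element-wise square roots
```

For this reason all.equal() is usually a safer way to compare decimal results than ==.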

Random Numbers
There are many functions for generating independent random numbers. For example, by default
runif() gives uniforms on [0, 1] and rnorm() gives standard normals.

> runif(10)
[1] 0.2875775 0.7883051 0.4089769 0.8830174 0.9404673
[6] 0.0455565 0.5281055 0.8924190 0.5514350 0.4566147
> rnorm(8)
[1] 1.7150650 0.4609162 -1.2650612 -0.6868529 -0.4456620
[6] 1.2240818 0.3598138 0.4007715

Computers generate pseudo-random numbers by applying a complicated function to some seed
quantity, usually derived from the current time. For most purposes this is fine. You can ‘fix’
the random seed for replication:

> set.seed(10529) # any integer will do
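
A minimal sketch of what fixing the seed buys you (variable names are illustrative): resetting the same seed reproduces the same draws. It also shows the optional arguments of runif() and rnorm() for other ranges and parameters.

```r
set.seed(10529)
a <- runif(3)
set.seed(10529)                    # reset to the same seed
b <- runif(3)
same <- identical(a, b)            # TRUE: the draws are reproduced exactly
u <- runif(5, min = 2, max = 4)    # uniforms on [2, 4]
z <- rnorm(5, mean = 10, sd = 2)   # normals with mean 10 and standard deviation 2
```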

Sampling

The function sample takes a sample of a specified size from the various elements of a vector.
It can be done with or without replacement.

> x <- 1:10
> sample(x, 4)
[1] 1 9 10 5
> sample(x,6,replace=TRUE)
[1] 5 2 3 8 3 1
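
A few further uses of sample(), as a hedged sketch (the seed, names and probabilities are arbitrary choices for illustration): with no size it returns a random permutation, and with replace = TRUE the sample may be larger than the vector.

```r
set.seed(1)                            # arbitrary seed, only for reproducibility
x <- 1:10
p <- sample(x)                         # a random permutation of x
s <- sample(x, 12, replace = TRUE)     # allowed only because replace = TRUE
# sample(x, 12) without replacement would stop with an error
coin <- sample(c("H", "T"), 5, replace = TRUE, prob = c(0.7, 0.3))  # unequal weights
```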

Exercise

Task
Write an R script which will:
1. Generate a vector which will contain 10 independent Normal variables with mean 0 and
standard deviation 1.
2. Compute the mean and variance of the vector.
Note: After you have computed the mean and variance above, compare your answers with R’s
mean() and var() functions.
