1 What Are Apply Functions?
?apply
X is an array or matrix (this is the data that you will be performing the function on)
MARGIN specifies whether the function is applied over rows (1) or columns (2)
FUN is the function you want to use
2.1 apply examples
my.matrx is a matrix with 1-10 in column 1, 11-20 in column 2, and 21-30 in column 3. my.matrx
will be used to show some of the basic uses for the apply function.
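The code that builds my.matrx is not shown. A minimal sketch that matches the description, assuming the matrix was created with matrix(), which fills column by column:

```r
# Assumed construction: matrix() fills column by column by default,
# so 1:30 lands as 1-10, 11-20, and 21-30 in columns 1, 2, and 3.
my.matrx <- matrix(1:30, nrow = 10, ncol = 3)

my.matrx[1:3, ]  # first three rows
dim(my.matrx)    # 10 rows, 3 columns
```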
apply(my.matrx, 1, sum)
## [1] 33 36 39 42 45 48 51 54 57 60
The apply function returned a vector containing the sums for each row.
apply(my.matrx, 2, length)
## [1] 10 10 10
What if instead, I wanted to find n-1 for each column? There isn’t a function in R to do this
automatically, so I can create my own function. If the function is simple, you can create it right
inside the arguments for apply. In the arguments I created a function that returns length - 1.
As you can see, the function correctly returned a vector of n-1 for each column.
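The call described above is not shown, so here is a sketch of what it might look like. The construction of my.matrx is an assumption based on its description, and the anonymous function is written directly in the FUN argument:

```r
my.matrx <- matrix(1:30, nrow = 10, ncol = 3)  # assumed construction

# The anonymous function receives one column at a time as x.
n.minus.1 <- apply(my.matrx, 2, function(x) length(x) - 1)
n.minus.1
## [1] 9 9 9
```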
2.1.5 Example 5: Vectors?
The previous examples showed several ways to use the apply function on a matrix. But what if I
wanted to loop through a vector instead? Will the apply function work?
If you run this function it will return the error: Error in apply(v, 1, sum) : dim(X) must have a
positive length. As you can see, this didn’t work because apply was expecting the data to have at
least two dimensions. If your data is a vector you need to use lapply, sapply, or vapply instead.
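The failing call can be reproduced without halting a script by wrapping it in tryCatch. This is a sketch: the contents of v are assumed, since its definition is not shown, and the squaring function is only an illustration.

```r
v <- 1:10  # assumed contents; the original definition of v is not shown

# apply() relies on dim(X), which a plain vector does not have, so this fails.
msg <- tryCatch(apply(v, 1, sum), error = function(e) conditionMessage(e))
msg

# sapply() loops over the elements of a vector and needs no dimensions.
squares <- sapply(v, function(x) x^2)
squares
```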
3 lapply, sapply, and vapply
lapply, sapply, and vapply all apply a function to each element of a list or vector.
First, look up lapply in the help pages, which describe all three functions.
?lapply
lapply(X, FUN, ...)
sapply(X, FUN, ..., simplify = TRUE, USE.NAMES = TRUE)
vapply(X, FUN, FUN.VALUE, ..., USE.NAMES = TRUE)
In this case, X is a vector or list, and FUN is the function you want to use. sapply and vapply
have extra arguments, but most of them have default values, so you don’t need to worry about
them. However, vapply requires another argument called FUN.VALUE, which we will look at
later.
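The examples that follow use a vector named vec whose definition is not shown. Judging from the output, it holds the integers 1 to 10, so an assumed definition is:

```r
vec <- 1:10  # assumed from the output of the examples that use vec

# For comparison, sum() on the whole vector adds all the elements together.
sum(vec)
## [1] 55
```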
lapply(vec, sum)
## [[1]]
## [1] 1
##
## [[2]]
## [1] 2
##
## [[3]]
## [1] 3
##
## [[4]]
## [1] 4
##
## [[5]]
## [1] 5
##
## [[6]]
## [1] 6
##
## [[7]]
## [1] 7
##
## [[8]]
## [1] 8
##
## [[9]]
## [1] 9
##
## [[10]]
## [1] 10
This call didn’t add up the values as we might have expected. This is because lapply
treats the vector like a list and applies the function to each element separately.
Let’s try using a list instead:
A<-c(1:9)
B<-c(1:12)
C<-c(1:15)
my.lst<-list(A,B,C)
lapply(my.lst, sum)
## [[1]]
## [1] 45
##
## [[2]]
## [1] 78
##
## [[3]]
## [1] 120
This time, the lapply function seemed to work better. The function summed each vector in the list
and returned a list of the 3 sums.
3.0.2 Example 2: sapply
sapply works just like lapply, but it simplifies the output when possible. Instead of
returning a list, it returns a vector (or a matrix) if the results can be simplified.
sapply(vec, sum)
## [1] 1 2 3 4 5 6 7 8 9 10
sapply(my.lst, sum)
## [1] 45 78 120
See how these two examples gave the same answers, but returned a vector instead?
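What sapply returns depends on what the function gives back for each element. The sketch below rebuilds my.lst so it runs on its own. The use of range and the even-number filter are illustrative choices, not part of the original examples.

```r
my.lst <- list(1:9, 1:12, 1:15)  # same contents as my.lst above

# range() returns two values per element, so sapply() simplifies the
# result to a matrix with one column per list element.
rng <- sapply(my.lst, range)
rng
##      [,1] [,2] [,3]
## [1,]    1    1    1
## [2,]    9   12   15

# Results of different lengths cannot be simplified, so a list comes back.
evens <- sapply(my.lst, function(x) x[x %% 2 == 0])
class(evens)
## [1] "list"
```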
3.0.3 Example 3: vapply
vapply is similar to sapply, but it requires you to specify what type of data you are expecting.
The arguments for vapply are vapply(X, FUN, FUN.VALUE). FUN.VALUE is where you specify the
type and length of each result. I am expecting each item in the list to return a single numeric
value, so FUN.VALUE = numeric(1).
If your function were to return more than one value per item, FUN.VALUE = numeric(1) would cause
vapply to stop with an error. This is useful when you expect exactly one result per
subject.
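The vapply calls themselves are not shown, so the following is a sketch of how they might look. The definition of vec is an assumption, and range is used only to illustrate a function that returns more than one value.

```r
vec <- 1:10                      # assumed, as in the lapply example
my.lst <- list(1:9, 1:12, 1:15)  # same contents as my.lst above

# Each call to sum() returns a single number, which matches numeric(1).
vec.sums <- vapply(vec, sum, FUN.VALUE = numeric(1))
lst.sums <- vapply(my.lst, sum, FUN.VALUE = numeric(1))
lst.sums
## [1]  45  78 120

# range() returns two numbers per element, so vapply() stops with an error.
res <- tryCatch(
  vapply(my.lst, range, FUN.VALUE = numeric(1)),
  error = function(e) "vapply stopped: result was not a single number"
)
res
```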
4 tapply
Sometimes you may want to apply a function to some data, but separately for each level of a
factor. In that case, you should use tapply. Let’s take a look at the help page for tapply.
?tapply
The arguments for tapply are tapply(X, INDEX, FUN). The only new argument is INDEX, which is
the factor you want to use to separate the data.
Now let’s use column 1 as the index and find the mean of column 2.
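The data used for this step is not shown, so the sketch below uses a small hypothetical data frame. The name my.data, its column names, and its values are all made up for illustration.

```r
# Hypothetical data for illustration; the original data set is not shown.
my.data <- data.frame(
  group = c("a", "a", "b", "b", "b", "c"),
  score = c(10, 20, 5, 10, 15, 40)
)

# Column 1 is the index (grouping factor); column 2 holds the values.
group.means <- tapply(my.data[, 2], my.data[, 1], mean)
group.means  # one mean per group: a = 15, b = 10, c = 40
```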
?mapply
The arguments for mapply are mapply(FUN, ..., MoreArgs = NULL, SIMPLIFY = TRUE,
USE.NAMES = TRUE). First you list the function, followed by the vectors you are using. The rest
of the arguments have default values, so they don’t need to be changed for now. When you have
a function that takes 2 arguments, the elements of the first vector go into the first argument
and the elements of the second vector go into the second argument.
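A short sketch of how the two vectors are paired up element by element. The vectors and functions here are illustrative choices.

```r
# The first vector feeds x and the second feeds y: 1+4, 2+5, 3+6.
pair.sums <- mapply(function(x, y) x + y, 1:3, 4:6)
pair.sums
## [1] 5 7 9

# Equivalent to rep(1, 3), rep(2, 2), rep(3, 1). The results differ in
# length, so mapply() cannot simplify them and returns a list.
reps <- mapply(rep, 1:3, 3:1)
reps
```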
#install.packages("MASS")
library(MASS)
Let’s look at the data we will be using: the state.x77 dataset. It ships with R’s built-in
datasets package, so it is available even without loading MASS.
head(state.x77)
##            Population Income Illiteracy Life Exp Murder HS Grad Frost
## Alabama          3615   3624        2.1    69.05   15.1    41.3    20
## Alaska            365   6315        1.5    69.31   11.3    66.7   152
## Arizona          2212   4530        1.8    70.55    7.8    58.1    15
## Arkansas         2110   3378        1.9    70.66   10.1    39.9    65
## California      21198   5114        1.1    71.71   10.3    62.6    20
## Colorado         2541   4884        0.7    72.06    6.8    63.9   166
##              Area
## Alabama     50708
## Alaska     566432
## Arizona    113417
## Arkansas    51945
## California 156361
## Colorado   103766
str(state.x77)
## num [1:50, 1:8] 3615 365 2212 2110 21198 ...
## - attr(*, "dimnames")=List of 2
## ..$ : chr [1:50] "Alabama" "Alaska" "Arizona" "Arkansas" ...
## ..$ : chr [1:8] "Population" "Income" "Illiteracy" "Life Exp" ...
state.x77 is a numeric matrix, which matters because the functions used inside apply below
(mean, median, and sd) all require numeric data.
apply(state.x77, 2, mean)
## Population     Income Illiteracy   Life Exp     Murder    HS Grad
##  4246.4200  4435.8000     1.1700    70.8786     7.3780    53.1080
##      Frost       Area
##   104.4600 70735.8800
apply(state.x77, 2, median)
## Population     Income Illiteracy   Life Exp     Murder    HS Grad
##   2838.500   4519.000      0.950     70.675      6.850     53.250
##      Frost       Area
##    114.500  54277.000
apply(state.x77, 2, sd)
##   Population       Income   Illiteracy     Life Exp       Murder
## 4.464491e+03 6.144699e+02 6.095331e-01 1.342394e+00 3.691540e+00
##      HS Grad        Frost         Area
## 8.076998e+00 5.198085e+01 8.532730e+04
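The three calls above can also be combined into one. This is a sketch, not part of the original walkthrough: when the function passed to apply returns a named vector, apply stacks the results into a matrix with one column per variable. The name col.stats is made up for illustration.

```r
# state.x77 comes from R's datasets package, which is attached by default.
col.stats <- apply(state.x77, 2, function(x) {
  c(mean = mean(x), median = median(x), sd = sd(x))
})

dim(col.stats)                   # 3 statistics by 8 variables
round(col.stats[, "Income"], 1)  # mean, median, and sd of Income
```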