
Missing data

In the statistical analysis examples I provide, I typically deal with best-case
scenarios in which the data sets are in good shape and have all the data they’re
supposed to have.
In the real world, however, things don’t always go so smoothly. Oftentimes, you
encounter data sets that have values missing for one reason or another. R denotes
a missing value as NA (for Not Available).
For example, here is some data (from a much larger data set) on the luggage
capacity, in cubic feet, of nine vehicles:
capacity <- c(14,13,14,13,16,NA,NA,20,NA)
Three of the vehicles are vans, and the term luggage capacity doesn’t apply to
them, hence the three instances of NA. Here’s what happens when you try to
find the average of this group:
> mean(capacity)
[1] NA
To find the mean, you have to remove the NAs before you calculate:
> mean(capacity, na.rm=TRUE)
[1] 15
So the rm in na.rm means “remove” and =TRUE means “get it done.”
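The na.rm argument isn't specific to mean(). Other base R summary functions, such as sum() and median(), accept it too. Here's a minimal sketch using the same capacity vector; the variable names total_with_na, total, and middle are made up for illustration:

```r
capacity <- c(14, 13, 14, 13, 16, NA, NA, 20, NA)

# Without na.rm, a single NA makes the whole result NA
total_with_na <- sum(capacity)

# With na.rm = TRUE, the NAs are dropped before the calculation
total  <- sum(capacity, na.rm = TRUE)      # sum of the six known values
middle <- median(capacity, na.rm = TRUE)   # median of the six known values

print(total_with_na)
print(total)
print(middle)
```

Note that na.rm defaults to FALSE in these functions, so you have to ask for the removal each time.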
Just in case you ever have to check a set of scores for missing data, the is.na()
function does that for you:
> is.na(capacity)
[1] FALSE FALSE FALSE FALSE FALSE TRUE TRUE FALSE TRUE
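Because is.na() returns a logical vector, you can build on it to count the missing values, find where they are, or keep only the values that are present. The following is a sketch of one way to do that, not the only one; the names n_missing, missing_at, and observed are made up for illustration:

```r
capacity <- c(14, 13, 14, 13, 16, NA, NA, 20, NA)

# TRUE counts as 1, so summing the logical vector counts the NAs
n_missing <- sum(is.na(capacity))

# which() gives the positions of the TRUE entries
missing_at <- which(is.na(capacity))

# Negate with ! to keep only the non-missing values
observed <- capacity[!is.na(capacity)]

print(n_missing)
print(missing_at)
print(mean(observed))
```

Taking the mean of observed gives the same result as mean(capacity, na.rm=TRUE).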
