In the statistical analysis examples I provide, I typically deal with best-case
scenarios in which the data sets are in good shape and have all the data they’re supposed to have. In the real world, however, things don’t always go so smoothly. Oftentimes, you encounter data sets that have values missing for one reason or another. R denotes a missing value as NA (for Not Available).

For example, here is some data (from a much larger data set) on the luggage capacity, in cubic feet, of nine vehicles:

    capacity <- c(14,13,14,13,16,NA,NA,20,NA)

Three of the vehicles are vans, and the term luggage capacity doesn’t apply to them; hence, the three instances of NA. Here’s what happens when you try to find the average of this group:

    > mean(capacity)
    [1] NA

To find the mean, you have to remove the NAs before you calculate:

    > mean(capacity, na.rm=TRUE)
    [1] 15

So the rm in na.rm means “remove” and =TRUE means “get it done.”

Just in case you ever have to check a set of scores for missing data, the is.na() function does that for you:

    > is.na(capacity)
    [1] FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE FALSE  TRUE
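Because is.na() returns a logical vector, you can also use it to count the missing values, find where they sit, or drop them yourself. Here is a minimal sketch that reuses the luggage-capacity data; the names n_missing, missing_at, and observed are my own and are there only for illustration.

```r
# Luggage capacity (cubic feet) of nine vehicles; NA marks the three vans
capacity <- c(14, 13, 14, 13, 16, NA, NA, 20, NA)

# TRUE counts as 1 when summed, so this is the number of missing values
n_missing <- sum(is.na(capacity))        # 3

# which() returns the positions of the TRUE entries
missing_at <- which(is.na(capacity))     # 6 7 9

# Keep only the non-missing values by negating the logical vector
observed <- capacity[!is.na(capacity)]

# Same result as mean(capacity, na.rm = TRUE)
mean(observed)                           # 15
```

Dropping the NAs by hand this way is handy when a function you want to use has no na.rm argument.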