Summarizing Objects: SEC-2 Ch-4 Data: Descriptive Statistics and Tabulation

You might also like

Download as pdf or txt
Download as pdf or txt
You are on page 1of 2

SEC-2

Ch-4 Data: Descriptive Statistics and Tabulation

1. Summarizing objects
1. The summary() command is a general command that provides a sum-
mary of an object.
2. If you have numerical data, then you get a numerical summary (for
example, mean, max, min) but if the data are text, you get a note of
how many different items you have.
3. The summary() command works for both matrix and data frame objects
by summarizing the columns rather than the rows.
4. Using the summary() command on the result of many analytical routines
produces a special summary suited to the kind of analysis performed.

2. Summarizing samples
1. Numerical samples can be summarized by many commands. Following
commands that produce a single value as a summary statistic:

Command Explanation
max(x, na.rm = F ALSE) Shows the maximum value. By default NA values
are not removed. A value of NA is considered the
largest unless na.rm = TRUE is used.
min(x, na.rm = F ALSE) Shows the minimum value in a vector. If there are
NA values, this returns a value of NA unless
na.rm = TRUE is used.
length(x) Gives the length of the vector and includes any NA
values. The na.rm = instruction does not work with
this command.
sum(x, na.rm = F ALSE) Shows the sum of the vector elements.
mean(x, na.rm = F ALSE) Shows the arithmetic mean.
median(x, na.rm = F ALSE) Shows the median value of the vector.
sd(x, na.rm = F ALSE) Shows the standard deviation.
var(x, na.rm = F ALSE) Shows the variance.
mad(x, na.rm = F ALSE) Shows the median absolute deviation.

2. The quantile() command, for example, produces five values as its result
(the five basic quartiles).
3. You can use the na.omit() command to strip out NA items. Essentially,
you use this to temporarily remove NA items like so:
length(na.omit(object.name))
4. Summary commands that can be applied to data frames:
2

Command Explanation
max(f rame) The largest value in the entire data frame
min(f rame) The smallest value in the entire data frame
sum(f rame) The sum of the entire data frame
f ivenum(f rame)The Tukey’s five number summary for the entire data frame:
minimum, first(or lower)quartile, median
(or second quartile), upper(or third) quartile, maximum
length(f rame) The number of columns in the data frame
summary(f rame) Gives summary for each column

3. Cumulative statistics
1. Following commands produce cumulative values:

Command Explanation
cumsum(x) The cumulative sum of a vector
cummax(x) The cumulative maximum value
cummin(x) The cumulative minimum value
cumprod(x) The cumulative product

2. The seq along() command creates a simple index. For more details, read
page- 117 of referred book.
3. These commands can be combined and used to create a range of cumu-
lative statistics. For example the running mean:
cumsum(my.data)/seq(along = my.data)

4. Summarizing rows and columns


1. The rows and columns of two-dimensional data objects can be summa-
rized in various ways.
colSums() and rowSums(): Determines the sum of rows or columns for
a data frame, matrix, or table object.
colMeans() and rowMeans(): Determines the mean of rows or columns
for a data frame, matrix, or table object.
2. The apply() command is more flexible in that any function can be ap-
plied to the columns (default) or rows of a data frame or matrix. The
lapply() and sapply() commands are similar but are designed to work
with list objects.

You might also like