Professional Documents
Culture Documents
02 Descriptive Statistics
02 Descriptive Statistics
Numerical Measures
1.Measures of Location
2.Measures of Variability
3.The Weighted Mean
4.Working with Grouped Data
Measures of Location
Numerical Measures
If the measures are computed for data from a sample,
they are called sample statistics.
If the measures are computed for data from a
population, they are called population parameters.
In statistical inference, a sample statistic is referred to as
the point estimator of the corresponding population
parameter
Mean
Perhaps the most important measure of location is the mean.
The mean provides a measure of data central location .
The mean of a data set is the average of all the data values.
The sample mean is the point estimator of the population
mean µ.
where: where:
Sxi = sum of the values of n observations Sxi = sum of the values of the N observations
n = number of observations in the sample N = number of observations in the population
Mean
Consider the following class size data for a sample of five college
classes:
46 54 42 46 32
Suppose we also compute the median starting salary for the 12 business
college graduates in previous Table. We first arrange the data in ascending order.
Because n=12 is even, we identify the middle two values: 3890 and 3920. The
median is the average of these values.
Mode
The mode of a data set is the value that occurs with greatest
frequency.
The greatest frequency can occur at two or more different values.
If the data have exactly two modes, the data are bimodal.
If the data have more than two modes, the data are multimodal.
The 85th percentile is the data value in the 11th position, or 4130.
Quartiles
It is often desirable to divide data into four parts, with each part
containing approximately one-fourth, or 25% of the observations. The
division points are referred to as the quartiles and are defined as
Let us refer to the data on starting salaries for business school graduates
below. The largest starting salary is 4325 and the smallest is 3710.
The range is 4325-3710 = 615.
Interquartile Range
A measure of variability that overcomes the dependency on
extreme values is the interquartile range (IQR).
The interquartile range is the range for the middle 50% of the
data.
For the data on monthly starting salaries above, the quartiles are Q3 = 4000
and Q1 = 3865. Thus, the interquartile range is 4000-3865 = 135.
Variance & Standard Deviation
The variance is a measure of variability that utilizes all the data.
It is based on the difference between the value of each observation
(xi) and the mean ( for a sample, m for a population).
The variance is useful in comparing the variability of two or more
variables.
SAMPLE VARIANCE POPULATION VARIANCE
s = 2 mm s = 2 mm
= 15 mm = 70 mm
Cov = 13,33% Cov = 2,86%
The Weighted Mean
Weighted Mean
In the formulas for the sample mean and population mean, each x i is
given equal importance or weight. For instance, the formula for the
sample mean can be written as follows:
GPA=3,29
Working with Grouped Data
Grouped Data
In most cases, measures of location and variability are computed by
using the individual data values. Sometimes, however, data are
available only in a grouped or frequency distribution form.
Grouped Data
Grouped Data
Note:
The class midpoints, Mi, halfway between the class limits, the first class of 10 –14 has
a midpoint at (14-10)/2=12.