
BUS 110 - Applied Statistics

Chapter 3
Summary statistics: Measures of
Central Tendency and Variance
Question

There are 13 patients in the hospital. The average temperature is 36.6℃.

Do you think that the patients in the hospital are overall doing well?
Numerical Descriptive Measures

Learning Objectives

In this chapter, you learn:

➢ To describe the properties of central tendency, variation, and shape in numerical data
➢ To construct and interpret a boxplot
➢ To compute descriptive summary measures for a population
➢ To compute the coefficient of correlation
Numerical Descriptive Measures
(Summary Statistics)

Describing Data Numerically

Central Tendency: Arithmetic Mean, Median, Mode

Quartiles

Variation: Range, Interquartile Range, Variance, Standard Deviation, Coefficient of Variation
Central Tendency: The sample mean

The central tendency is the extent to which the data values group around a
typical or central value.

The sample mean: The arithmetic mean is the most common measure of
central tendency.

\[ \bar{X} = \frac{X_1 + X_2 + \cdots + X_n}{n} = \frac{\sum_{i=1}^{n} X_i}{n} \]

Because all the values play an equal role, a mean is greatly affected by any
value that is greatly different from the others. When you have such extreme
values, you should avoid using the mean as a measure of central tendency.
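
The effect of a single extreme value on the mean can be sketched in a few lines of Python. The temperature readings below are hypothetical and chosen only for illustration; they are not data from this chapter.

```python
from statistics import mean

# Hypothetical body temperatures (degrees Celsius) for five patients.
typical = [36.5, 36.6, 36.7, 36.6, 36.6]

# The same group, except that the last patient now has a high fever.
with_extreme = [36.5, 36.6, 36.7, 36.6, 40.6]

mean_typical = mean(typical)
mean_extreme = mean(with_extreme)

print(f"Mean without the extreme value: {mean_typical:.2f}")
print(f"Mean with the extreme value:    {mean_extreme:.2f}")
```

One fever reading out of five moves the mean by almost a full degree, even though four of the five patients are unchanged.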
Central Tendency: The median

The median is the middle value in an ordered array of data that has been
ranked from smallest to largest. Half the values are smaller than or equal to
the median, and half the values are larger than or equal to the median.
The location of the median in the ranked data is position (n+1)/2, where n is the number of values:

➢ If the number of values is odd, the median is the middle number

➢ If the number of values is even, the median is the average of the two middle numbers

Note that (n+1)/2 is not the value of the median, only the position of the median in the ranked data.
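
A minimal Python sketch of the odd/even rule, based on the median position (n+1)/2. The function name `median_by_position` and the sample values are hypothetical, introduced only for this illustration.

```python
def median_by_position(values):
    """Return the median using the (n + 1) / 2 position rule."""
    ranked = sorted(values)
    n = len(ranked)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    if n % 2 == 1:
        # Odd n: position (n + 1) / 2 is a whole number (1-based rank).
        return ranked[(n + 1) // 2 - 1]
    # Even n: the position falls halfway between ranks n/2 and n/2 + 1,
    # so the median is the average of those two middle values.
    return (ranked[n // 2 - 1] + ranked[n // 2]) / 2


odd_example = median_by_position([7, 1, 5])      # ranked: 1, 5, 7
even_example = median_by_position([7, 1, 5, 3])  # ranked: 1, 3, 5, 7
print(odd_example, even_example)
```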
Central Tendency: The Mode

The mode is the value in a set of data that appears most frequently. Like the
median and unlike the mean, extreme values do not affect the mode. Often,
there is no mode or there are several modes in a set of data.
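
As a small illustration, Python's standard `statistics.multimode` returns every value tied for the highest frequency. The data sets below are hypothetical. When every value appears equally often, the function returns all of them, which corresponds to the "no mode" case described above.

```python
from statistics import multimode

one_mode = multimode([1, 2, 2, 3])          # 2 appears most often
several_modes = multimode([1, 1, 2, 2, 3])  # 1 and 2 are tied
all_tied = multimode([1, 2, 3])             # every value appears once

# Adding an extreme value does not change the mode.
with_extreme = multimode([1, 2, 2, 3, 1000])

print(one_mode, several_modes, all_tied, with_extreme)
```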
Quartiles

Quartiles split the ranked data into 4 segments with an equal number of
values per segment.

[Figure: the ranked data divided into four segments of 25% each by Q1, Q2, and Q3]

The first quartile, Q1, is the value for which 25% of the observations are
smaller and 75% are larger.
The second quartile, Q2, is the same as the median (50% are smaller, 50% are larger).
The third quartile, Q3, is the value for which 75% of the observations are
smaller and only 25% are larger.
Quartiles

Find a quartile by determining the value in the appropriate position in the ranked data, where:

First quartile position: Q1 = (n+1)/4

Second quartile position: Q2 = (n+1)/2 (the median position)

Third quartile position: Q3 = 3(n+1)/4

where n is the number of observed values


Quartiles

Rule 1 If the ranked position is a whole number, the quartile is equal to the
measurement at that position.

Rule 2 If the ranked position is a fractional half (2.5, 4.5, etc.), the quartile is equal
to the average of the two measurements on either side of that position (for
example, the 2nd and 3rd ranked values for position 2.5).

Rule 3 If the ranked position is neither a whole number nor a fractional half, you
round the result to the nearest integer and select the measurement
at that ranked position.
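
The position formulas and the three rules can be sketched in Python as follows. The function name `quartile` and both data sets are hypothetical, chosen so that each rule is exercised at least once. Because the position k(n+1)/4 can only end in .0, .25, .5, or .75, the sketch works with the integer remainder instead of comparing floating-point numbers.

```python
def quartile(values, k):
    """Return the k-th quartile (k = 1, 2, 3) using the three rules."""
    if k not in (1, 2, 3):
        raise ValueError("k must be 1, 2, or 3")
    ranked = sorted(values)
    n = len(ranked)
    if n < 2:
        raise ValueError("need at least two values")
    # Ranked position is k(n + 1)/4, split into whole part and remainder.
    whole, remainder = divmod(k * (n + 1), 4)
    if remainder == 0:
        # Rule 1: whole-number position.
        return ranked[whole - 1]
    if remainder == 2:
        # Rule 2: fractional half, average the two neighbouring values.
        return (ranked[whole - 1] + ranked[whole]) / 2
    if remainder == 1:
        # Rule 3: a fraction of .25 rounds down to the nearest position.
        return ranked[whole - 1]
    # Rule 3: a fraction of .75 rounds up to the nearest position.
    return ranked[whole]


nine = [11, 12, 13, 16, 16, 17, 18, 21, 22]        # n = 9
ten = [29, 31, 35, 39, 39, 40, 43, 44, 44, 52]     # n = 10

print([quartile(nine, k) for k in (1, 2, 3)])  # positions 2.5, 5, 7.5
print([quartile(ten, k) for k in (1, 2, 3)])   # positions 2.75, 5.5, 8.25
```

Statistical software often interpolates between ranks instead of rounding, so library results can differ slightly from these rules.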
Measures of Variation

Variation measures the spread, or dispersion, of values in a data set.


The Range. The range is equal to the largest value minus the smallest value.
\[ R = X_{\max} - X_{\min} \]
The range measures the total spread in the set of data.

Disadvantages of the Range


- Ignores the way in which data are distributed
- Sensitive to outliers
Interquartile Range (IQR)

IQR can be used to eliminate some outlier problems.

➢ Eliminate some high- and low-valued observations and calculate the range from the remaining values

➢ Interquartile range = 3rd quartile – 1st quartile

\[ \mathrm{IQR} = Q_3 - Q_1 \]
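
The following Python sketch compares how the range and the IQR react to one outlier. The data are hypothetical. It uses the standard-library `statistics.quantiles` with `method="exclusive"`, which places quartiles at positions k(n+1)/4 but interpolates linearly when a position is fractional. That is an assumption that differs from rounding rules; with n = 7 the positions are whole numbers (2 and 6), so both approaches agree here.

```python
from statistics import quantiles


def value_range(values):
    """Largest value minus smallest value."""
    return max(values) - min(values)


def iqr(values):
    """Third quartile minus first quartile."""
    q1, _, q3 = quantiles(values, n=4, method="exclusive")
    return q3 - q1


data = [12, 15, 18, 20, 22, 25, 30]                 # hypothetical, n = 7
data_with_outlier = [12, 15, 18, 20, 22, 25, 300]   # largest value replaced

print(value_range(data), iqr(data))
print(value_range(data_with_outlier), iqr(data_with_outlier))
```

The range jumps from 18 to 288, while the IQR stays at 10 because the outlier lies outside the middle 50% of the data.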
The Variance and the Standard Deviation

Two commonly used measures of variation that take into account how all the
data values are distributed are the variance and the standard deviation.

The sample variance is the sum of the squared differences around the mean
divided by the sample size minus 1.

\[ s^2 = \frac{(X_1 - \bar{X})^2 + (X_2 - \bar{X})^2 + \cdots + (X_n - \bar{X})^2}{n-1} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1} \]

The sample standard deviation is the square root of the sum of the squared
differences around the mean divided by the sample size minus 1.

\[ s = \sqrt{s^2} = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}} \]
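
A minimal Python sketch of the two definitions, checked against the standard library's `statistics.variance` and `statistics.stdev`, which also divide by n − 1. The function name `sample_variance` and the data are hypothetical.

```python
from math import sqrt
from statistics import stdev, variance


def sample_variance(values):
    """Sum of squared differences around the mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical sample, mean = 5

s_squared = sample_variance(data)  # 32 / 7
s = sqrt(s_squared)

print(f"s^2 = {s_squared:.4f}, s = {s:.4f}")
print(f"library: s^2 = {variance(data):.4f}, s = {stdev(data):.4f}")
```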
The Coefficient of Variation

The coefficient of variation is equal to the standard deviation divided by the mean, multiplied by 100%.

\[ CV = \frac{s}{\bar{X}} \cdot 100\% \]

Measures relative variation

Shows variation relative to mean

Can be used to compare two or more sets of data measured in different units
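
As an illustration of comparing data measured in different units, here is a short Python sketch. The heights and weights are hypothetical, and `coefficient_of_variation` is a helper name introduced only for this example.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean, as a percentage."""
    x_bar = mean(values)
    if x_bar == 0:
        raise ValueError("the CV is undefined when the mean is 0")
    return stdev(values) / x_bar * 100


heights_cm = [160, 170, 180]   # hypothetical: mean 170, s = 10
weights_kg = [60, 70, 80]      # hypothetical: mean 70,  s = 10

cv_height = coefficient_of_variation(heights_cm)
cv_weight = coefficient_of_variation(weights_kg)

print(f"CV of heights: {cv_height:.2f}%")
print(f"CV of weights: {cv_weight:.2f}%")
```

Both samples have the same standard deviation (10), but relative to their means the weights vary more than the heights.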
Z-scores

An extreme value or outlier is a value located far away from the mean. The Z
score, which is the difference between the value and the mean, divided by
the standard deviation, is useful in identifying outliers.

\[ Z = \frac{X - \bar{X}}{s} \]

Values located far away from the mean will have either very small (negative) Z scores or very large (positive) Z scores.

As a general rule, a value is considered an outlier if its Z score is less than -3 or greater than +3.
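
The outlier rule can be sketched in Python as follows. The data set is hypothetical (20 values, one of them far from the rest), and `z_scores` is a helper name introduced for this example.

```python
from statistics import mean, stdev


def z_scores(values):
    """Return (X - mean) / s for every value, using the sample s."""
    x_bar = mean(values)
    s = stdev(values)
    if s == 0:
        raise ValueError("Z scores are undefined when all values are equal")
    return [(x - x_bar) / s for x in values]


# Hypothetical data: 19 values close to 50 and one value of 90.
data = [48, 52] * 9 + [50, 90]

scores = z_scores(data)

# Flag values whose Z score is below -3 or above +3.
outliers = [x for x, z in zip(data, scores) if z < -3 or z > 3]

print(f"Z score of 90: {scores[-1]:.2f}")
print("Outliers:", outliers)
```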
Shape of a Distribution

Shape is the pattern of the distribution of data values throughout the entire
range of all the values. A distribution is either symmetrical or skewed.
In a symmetrical distribution, the values below the mean are distributed in
exactly the same way as the values above the mean. In this case, the low and
high values balance each other out. In a skewed distribution, the values are not
symmetrical around the mean.
Shape also can influence the relationship of the mean to the median.

Mean < median: negative, or left-skewed
Mean = median: symmetric, or zero skewness
Mean > median: positive, or right-skewed
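
The comparison of mean and median can be written as a small Python sketch. The function name, the `tol` parameter, and the data are all hypothetical. The tolerance is an added assumption: with real data the mean and median are rarely exactly equal, so a small tolerance may be more practical than strict equality.

```python
from statistics import mean, median


def shape_from_mean_median(values, tol=0.0):
    """Classify shape by comparing the mean with the median."""
    m = mean(values)
    med = median(values)
    if abs(m - med) <= tol:
        return "symmetric"
    return "left-skewed" if m < med else "right-skewed"


symmetric = shape_from_mean_median([1, 2, 3, 4, 5])     # mean 3,  median 3
right = shape_from_mean_median([1, 2, 3, 4, 20])        # mean 6,  median 3
left = shape_from_mean_median([1, 17, 18, 19, 20])      # mean 15, median 18

print(symmetric, right, left)
```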
[Figure: Shape of a Distribution. Mean: 1.4, median: 1]
[Figure: Shape of a Distribution: which sport is this? Mean: 107.38, median: 107]
[Figure: Shape of a Distribution. Mean: 71, median: 75. What is the shape of the distribution?]
The Five-Number Summary

A five-number summary, which consists of the following, provides a way to determine the shape of a distribution:

Xmin, Q1, Median (Q2), Q3, Xmax

Box-and-Whisker Plot (each of the four segments holds 25% of the values):

Xmin   Q1   Median (Q2)   Q3   Xmax
17     32   47            55   68

IQR = 55 – 32 = 23
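
A minimal Python sketch of computing a five-number summary. The 11 data values are hypothetical, chosen so that the result matches the box-and-whisker example above (17, 32, 47, 55, 68). As an assumption, quartiles come from the standard-library `statistics.quantiles` with `method="exclusive"`, which uses positions k(n+1)/4 and interpolates between ranks; with n = 11 the positions are whole numbers (3, 6, 9).

```python
from statistics import median, quantiles


def five_number_summary(values):
    """Return (Xmin, Q1, median, Q3, Xmax)."""
    q1, _, q3 = quantiles(values, n=4, method="exclusive")
    return min(values), q1, median(values), q3, max(values)


# Hypothetical data, n = 11.
data = [17, 21, 32, 35, 40, 47, 50, 52, 55, 60, 68]

summary = five_number_summary(data)
x_min, q1, med, q3, x_max = summary

print(summary)
print("IQR =", q3 - q1)
```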
Relationships Among the Five-Number Summary
and the Type of Distribution
Numerical Measures for a Population

Population summary measures are called parameters


The population mean is the sum of the values in the population divided by the
population size, 𝑁

\[ \mu = \frac{X_1 + X_2 + \cdots + X_N}{N} = \frac{\sum_{i=1}^{N} X_i}{N} \]

Where:
𝜇 = population mean
𝑁 = population size
𝑋𝑖 = 𝑖 th value of the variable 𝑋
Population Variance and Standard Deviation

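The average of the squared deviations of the values from the population mean is called the population variance.

Population variance:

\[ \sigma^2 = \frac{(X_1 - \mu)^2 + (X_2 - \mu)^2 + \cdots + (X_N - \mu)^2}{N} = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N} \]

The population standard deviation:

\[ \sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}} \]
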
Where 𝜇 = population mean
𝑁 = population size
𝑋𝑖 = 𝑖 th value of the variable 𝑋
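
The difference between dividing by N (population) and by n − 1 (sample) can be seen in a short Python sketch. The five values are hypothetical and are treated here as an entire population. `statistics.pvariance` and `statistics.pstdev` are the standard-library population versions.

```python
from math import sqrt
from statistics import pstdev, pvariance, variance

population = [1, 2, 3, 4, 5]   # hypothetical population, N = 5

N = len(population)
mu = sum(population) / N
sigma_squared = sum((x - mu) ** 2 for x in population) / N
sigma = sqrt(sigma_squared)

print(f"mu = {mu}, sigma^2 = {sigma_squared}, sigma = {sigma:.4f}")

# Treating the same five values as a sample divides by n - 1 instead.
print(f"sample variance of the same values: {variance(population)}")
```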
The Empirical Rule

If the data distribution is approximately bell-shaped, then the interval:


𝜇 ± 1𝜎 contains about 68% of the values in the population or the sample

𝜇 ± 2𝜎 contains about 95% of the values in the population or the sample

𝜇 ± 3𝜎 contains about 99.7% of the values in the population or the sample

[Figure: bell-shaped curves marking the intervals 𝜇 ± 1𝜎 (68%), 𝜇 ± 2𝜎 (95%), and 𝜇 ± 3𝜎 (99.7%)]
The Empirical Rule: How smart are you?
[Figure: distribution with mean 531 and SD 114]
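
Applying the empirical rule to the mean (531) and standard deviation (114) shown above gives three intervals. The sketch below assumes the distribution is approximately bell-shaped; `empirical_intervals` is a hypothetical helper name.

```python
def empirical_intervals(mu, sigma):
    """Return {k: (lower, upper, approximate % of values)} for k = 1, 2, 3."""
    coverage = {1: 68.0, 2: 95.0, 3: 99.7}
    return {
        k: (mu - k * sigma, mu + k * sigma, pct)
        for k, pct in coverage.items()
    }


intervals = empirical_intervals(531, 114)

for k, (lower, upper, pct) in intervals.items():
    print(f"within {k} SD: {lower} to {upper} (about {pct}% of values)")
```

Under this assumption, about 68% of the values lie between 417 and 645, about 95% between 303 and 759, and about 99.7% between 189 and 873.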
The Chebyshev Rule

The Chebyshev rule states that for any data set, regardless of shape, the
percentage of values that are found within distances of k standard deviations
from the mean must be at least

\[ \left(1 - \frac{1}{k^2}\right) \cdot 100\% \]

The Chebyshev rule is very general and applies to any distribution. The rule
indicates at least what percentage of the values fall within a given distance
from the mean. However, if the data set is approximately bell-shaped, the
empirical rule will more accurately reflect the greater concentration of data
close to the mean.

% of Values Found in Intervals Around the Mean

Interval            Chebyshev             Empirical Rule
                    (any distribution)    (bell-shaped distribution)
(𝜇 − 𝜎, 𝜇 + 𝜎)      At least 0%           Approximately 68%
(𝜇 − 2𝜎, 𝜇 + 2𝜎)    At least 75%          Approximately 95%
(𝜇 − 3𝜎, 𝜇 + 3𝜎)    At least 88.89%       Approximately 99.7%
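
The Chebyshev column of the table can be reproduced with a short Python sketch. The function name `chebyshev_min_percent` is hypothetical. Flooring the result at 0% for k below 1 is an added assumption, since the formula would otherwise give a negative percentage there.

```python
def chebyshev_min_percent(k):
    """Minimum % of values within k standard deviations of the mean."""
    if k <= 0:
        raise ValueError("k must be positive")
    # (1 - 1/k^2) * 100%, floored at 0 because a percentage cannot be negative.
    return max(0.0, (1 - 1 / k**2) * 100)


for k in (1, 2, 3):
    print(f"k = {k}: at least {chebyshev_min_percent(k):.2f}%")
```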
