
DESCRIPTIVE STATISTICS

Numerical Descriptive Measures


Learning Objectives

In this module, you will recall:


▪ To describe the properties of central tendency,
variation and shape in numerical data
▪ To calculate descriptive summary measures for
a sample and a population
Summary Definitions
▪ The central tendency is the extent to which all
the data values group around a typical or
central value.
▪ The variation is the amount of dispersion, or
scattering, of values
▪ The shape is the pattern of the distribution of
values from the lowest value to the highest
value.
Measures of Central Tendency
The Arithmetic Mean
▪ The arithmetic mean (mean) is the most common
measure of central tendency.

For a sample of size n:

$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{X_1 + X_2 + \cdots + X_n}{n}$$
Where n = sample size
X_i = observed values
Measures of Central Tendency
The Arithmetic Mean
▪ The most common measure of central tendency
▪ Mean = sum of values divided by the number of values
▪ Affected by extreme values (outliers)

[Figure: two dot plots on a 0 to 10 axis, each with Mean = 4]
Data 2, 3, 4, 5, 6: Mean = (2 + 3 + 4 + 5 + 6) / 5 = 20 / 5 = 4
Data 1, 2, 3, 4, 10: Mean = (1 + 2 + 3 + 4 + 10) / 5 = 20 / 5 = 4
Properties of the Mean
• The mean of a set of numerical data is unique.
• It is the only measure of central tendency where the
sum of the deviation of each value from the mean will
always be zero.
• It includes precise information from every score, hence,
it is affected by a change in any score.
• It is affected by extreme values.
• The means of separate distributions can be combined to
get the mean of the total distribution.
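Two of the properties above can be checked numerically. The sketch below is only an illustration: it reuses the two five-value data sets from the preceding slide, and the variable names are made up for this example. It assumes that "combining" means weighting each group mean by its group size.

```python
from statistics import mean

# Two small data sets, both with mean 4 (values from the dot-plot slide).
group_a = [2, 3, 4, 5, 6]
group_b = [1, 2, 3, 4, 10]

mean_a = mean(group_a)
mean_b = mean(group_b)

# Property: the deviations of the values from their mean sum to zero.
deviation_sum_b = sum(x - mean_b for x in group_b)

# Property: group means can be combined into the overall mean,
# weighting each group mean by the number of values in that group.
combined_mean = (len(group_a) * mean_a + len(group_b) * mean_b) / (
    len(group_a) + len(group_b)
)

# Cross-check against the mean of all ten values pooled together.
pooled_mean = mean(group_a + group_b)

print(mean_a, mean_b, deviation_sum_b, combined_mean, pooled_mean)
```

The weighting matters only when the groups differ in size; with equal group sizes the combined mean is simply the average of the group means.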
Measures of Central Tendency
The Median
▪ In an ordered array, the median is the “middle” number (50% above,
50% below)

[Figure: two dot plots on a 0 to 10 axis; Median = 5 (left), Median = 5.5 (right)]

▪ Not affected by extreme values


Measures of Central Tendency
Locating the Median
▪ The median of an ordered set of data is located at the
(n + 1)/2 ranked value.
▪ If the number of values is odd, the median is the
middle number.
▪ If the number of values is even, the median is the
average of the two middle numbers.
▪ Note that (n + 1)/2 is NOT the value of the median,
only the position of the median in the ranked data.
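The position rule above can be sketched in a few lines. The function name `median_by_position` is hypothetical and the input values are illustrative; the result is compared with Python's built-in `statistics.median` as a sanity check.

```python
from statistics import median


def median_by_position(values):
    """Locate the median using the (n + 1) / 2 ranked-position rule."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")

    position = (n + 1) / 2  # 1-based rank; a position, not the median itself

    if n % 2 == 1:
        # Odd n: the position is a whole number, so take that ranked value.
        return ordered[int(position) - 1]

    # Even n: the position falls halfway between two ranks,
    # so average the two middle values.
    lower = ordered[n // 2 - 1]
    upper = ordered[n // 2]
    return (lower + upper) / 2


odd_case = median_by_position([1, 2, 3, 4, 10])
even_case = median_by_position([1, 2, 3, 4, 5, 10])
print(odd_case, even_case)
print(median([1, 2, 3, 4, 10]), median([1, 2, 3, 4, 5, 10]))
```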
Properties of the Median
• The median of a set of data is unique.
• It is not affected by extremely large or extremely low
values.
• It is affected by the number of observations in the
distribution.
• Not all values in the distribution contribute to the value of
the median.
• It can be computed for ordinal, interval and ratio level data.
Measures of Central Tendency
The Mode
▪ Value that occurs most often
▪ Not affected by extreme values
▪ Used for categorical data
▪ Used for numerical data primarily when grouped
▪ There may be no mode
▪ There may be several modes

[Figure: dot plot of a data set with no mode]
Measures of Central Tendency
Review Example
House Prices:
PhP5,500,000
4,500,000
4,100,000
3,800,000
3,600,000
2,500,000
▪ Mean = PhP24,000,000 / 6 = PhP4,000,000
▪ Median = middle value of ranked data
= (4,100,000 + 3,800,000) / 2 = PhP3,950,000
▪ Mode = most frequent value
= none
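The review example can be reproduced with Python's standard library. This is a minimal sketch: the variable names are illustrative, and reporting an empty list when no value repeats is a convention chosen here to match the slide's "no mode".

```python
from collections import Counter
from statistics import mean, median

# House prices from the review example, in pesos.
prices = [5_500_000, 4_500_000, 4_100_000, 3_800_000, 3_600_000, 2_500_000]

price_mean = mean(prices)
price_median = median(prices)  # even n: average of the two middle values

# Treat the data as having no mode when every value occurs exactly once.
counts = Counter(prices)
highest_count = max(counts.values())
modes = [value for value, count in counts.items() if count == highest_count]
if highest_count == 1:
    modes = []

print(price_mean, price_median, modes)
```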
Measures of Central Tendency
Which Measure to Choose?
▪ The mean is generally used, unless extreme values
(outliers) exist.
▪ The median is often used, since the median is not sensitive
to extreme values. For example, median home prices may
be reported for a region; it is less sensitive to outliers.
Measures of Position
• A measure of position, or quantile, is a general
descriptive measurement used to separate
quantitative data into distinct groups. To compute
quantiles of ungrouped data, the values must first be
arranged either in ascending or descending order.
• Quartiles divide the values into four groups of equal
size, each comprising 25% of observations. If n = 50,
25% of the values are less than or equal to Q1.
Measures of Position
• Deciles divide the values into ten groups of equal
size, each comprising 10% of observations.
If n = 50, 30% of the values are less than or equal to
D3.
• Percentiles divide the values into 100 groups of
equal size, each comprising 1% of observations.
If n = 200, 65% of the values are less than or equal
to P65.
Compute the required quantile.
Below are the average grades of 15 randomly selected
freshmen applicants:
85.3 90.7 75.6 82.3 95.6
88.3 93.2 88.1 77.0 79.3
85.9 91.7 93.2 89.8 79.5

Find: 1) Q1 2) Q3 3) D8
4) D4 5) P75 6) P65
From the sample of 15 applicants,
• If the school decides to accept applicants who belong to
the top 60%, how many will be accepted?
• If the school decides to give a refresher course to the
lowest 30%, how many will be required to enroll?
• If the school decides to provide scholarships to the top
20%, how many will benefit?
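Quantiles of ungrouped data can be computed in more than one way, and the slides do not say which convention to use. The sketch below assumes the "exclusive" method of Python's `statistics.quantiles`, which places the p-th quantile at ranked position p(n + 1) and interpolates linearly between neighboring values. Other conventions, including those used by spreadsheet software, can give slightly different answers.

```python
from statistics import quantiles

# Average grades of the 15 applicants from the exercise.
grades = [85.3, 90.7, 75.6, 82.3, 95.6,
          88.3, 93.2, 88.1, 77.0, 79.3,
          85.9, 91.7, 93.2, 89.8, 79.5]

# quantiles() sorts the data and returns the n - 1 cut points.
quartile_cuts = quantiles(grades, n=4, method="exclusive")
decile_cuts = quantiles(grades, n=10, method="exclusive")
percentile_cuts = quantiles(grades, n=100, method="exclusive")

# Cut points are 0-indexed, so Qk, Dk and Pk sit at index k - 1.
q1, q3 = quartile_cuts[0], quartile_cuts[2]
d4, d8 = decile_cuts[3], decile_cuts[7]
p65, p75 = percentile_cuts[64], percentile_cuts[74]

print(q1, q3, d4, d8, p65, p75)
```

Under any convention, Q3 and P75 describe the same cut point, which gives a quick consistency check on hand computations.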
Measures of Variation
▪ Variation measures the spread, or dispersion, of
values in a data set.
▪ Range
▪ Quartile Deviation
▪ Variance
▪ Standard Deviation
▪ Coefficient of Variation
Measures of Variability

• It measures the difference of each value around the
mean.
• It functions as a measure of risk or uncertainty in the field
of finance.
• It provides a measure of volatility in considering
alternatives for pricing commodities.
• It may be used as a measure of error in the field of
forecasting.
Measures of Variability: Range and
Quartile Deviation
▪ The range of a set of data with n observations is
defined as the difference between the highest and the
lowest values.
▪ The quartile deviation, QD, is the amount of spread
with the middle half of the items arranged in an
ordered array. It is also called semi-interquartile
range. It is used for ordinal data.
$$QD = \frac{Q_3 - Q_1}{2}$$
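As a rough illustration of the two definitions, the sketch below computes the range and the quartile deviation for a small illustrative data set. The quartiles depend on the convention used; this example assumes the "exclusive" method of `statistics.quantiles` (ranked position p(n + 1) with linear interpolation), which the slides do not prescribe.

```python
from statistics import quantiles

# Illustrative data set of eight values.
data = [10, 12, 14, 15, 17, 18, 18, 24]

# Range: highest value minus lowest value.
data_range = max(data) - min(data)

# Quartile deviation: half the distance between Q3 and Q1.
q1, _q2, q3 = quantiles(data, n=4, method="exclusive")
quartile_deviation = (q3 - q1) / 2

print(data_range, q1, q3, quartile_deviation)
```

Unlike the range, the quartile deviation ignores the lowest and highest quarters of the data, so a single extreme value has little effect on it.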
Measures of Variability:
Variance

▪ The variance is the average (approximately) of
squared deviations of values from the mean.

$$\text{Sample variance: } s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$$

Where $\bar{X}$ = arithmetic mean
n = sample size
X_i = ith value of the variable X
Measures of Variability:
Standard Deviation
Sample
Data (X_i): 10 12 14 15 17 18 18 24
n = 8, Mean = $\bar{X}$ = 16

$$s = \sqrt{\frac{(10 - \bar{X})^2 + (12 - \bar{X})^2 + (14 - \bar{X})^2 + \cdots + (24 - \bar{X})^2}{n - 1}}$$

$$= \sqrt{\frac{(10 - 16)^2 + (12 - 16)^2 + (14 - 16)^2 + \cdots + (24 - 16)^2}{8 - 1}}$$

$$= \sqrt{\frac{130}{7}} = 4.3095$$

A measure of the “average” scatter around the mean
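The calculation can be traced step by step in code. The sketch below follows the sample formulas with the n − 1 divisor and then compares the result with `statistics.variance` and `statistics.stdev`, which use the same divisor. Variable names are illustrative.

```python
from math import sqrt
from statistics import stdev, variance

data = [10, 12, 14, 15, 17, 18, 18, 24]

n = len(data)
x_bar = sum(data) / n  # arithmetic mean

# Sum of squared deviations of each value from the mean.
sum_squared_deviations = sum((x - x_bar) ** 2 for x in data)

# Sample variance divides by n - 1, not n.
sample_variance = sum_squared_deviations / (n - 1)
sample_std_dev = sqrt(sample_variance)

print(x_bar, sum_squared_deviations, sample_variance, sample_std_dev)

# The standard library functions should agree with the manual steps.
print(variance(data), stdev(data))
```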
Measures of Variability:
Comparing Standard Deviation
[Figure: three dot plots on an 11 to 21 axis with the same mean but different spread]
Data A: Mean = 15.5, S = 3.338
Data B: Mean = 15.5, S = 0.926
Data C: Mean = 15.5, S = 4.570
Compute the required measure.
Below are the average times for workers to finish a task:
35 60 45 48 52
72 42 32 55 30
49 68 36 62 42

Find: 1) range
2) QD
3) sample variance and sample standard deviation.
Measures of Variability:
Coefficient of Variation
▪ The coefficient of variation is the standard deviation
divided by the mean, multiplied by 100.
▪ It is always expressed as a percentage, %.
▪ It shows variation relative to the mean.
▪ The CV can be used to compare two or more sets of data
measured in different units (e.g. weight in kgs and height
in meters)

$$CV = \frac{S}{\bar{X}} \cdot 100\%$$
Coefficient of Variation

Stock A:
Average price last year = 60
Standard deviation = 6
$$CV_A = \frac{S}{\bar{X}} \cdot 100\% = \frac{6}{60} \cdot 100\% = 10\%$$

Stock B:
Average price last year = $100
Standard deviation = $5
$$CV_B = \frac{S}{\bar{X}} \cdot 100\% = \frac{\$5}{\$100} \cdot 100\% = 5\%$$

Stock B is less variable relative to its price (CV of 5% versus 10%).
Sample problem:

The operations manager of a package delivery service is
deciding on whether to purchase a new fleet of trucks. When
packages are in the trucks in preparation for delivery, you need
to consider two constraints: the weight (in pounds) and the
volume (in cubic feet) for each item.
The operations manager samples 200 packages, and finds
that the mean weight is 26.0 pounds, with a standard deviation
of 3.9 pounds, and the mean volume is 8.8 cubic feet, with a
standard deviation of 2.2 cubic feet. How can the operations
manager compare the variation of the weight and the volume?
What is more variable relative to the mean?
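Because weight and volume are measured in different units, their standard deviations cannot be compared directly, but their coefficients of variation can. A minimal sketch using the figures from the problem follows; the helper name `coefficient_of_variation` is made up for this illustration.

```python
def coefficient_of_variation(std_dev, mean_value):
    """Return the standard deviation as a percentage of the mean."""
    if mean_value == 0:
        raise ValueError("the CV is undefined when the mean is zero")
    return std_dev / mean_value * 100


# Summary figures from the sample of 200 packages.
cv_weight = coefficient_of_variation(std_dev=3.9, mean_value=26.0)  # pounds
cv_volume = coefficient_of_variation(std_dev=2.2, mean_value=8.8)   # cubic feet

more_variable = "volume" if cv_volume > cv_weight else "weight"
print(f"CV weight = {cv_weight:.1f}%, CV volume = {cv_volume:.1f}%")
print(f"More variable relative to its mean: {more_variable}")
```

The CV is unit-free, which is what makes the comparison between pounds and cubic feet meaningful.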
Numerical Descriptive Measures for
a Population
▪ Descriptive statistics discussed previously described a
sample, not the population.
▪ Summary measures describing a population, called
parameters, are denoted with Greek letters.
▪ Important population parameters are the population mean,
variance, and standard deviation.
Population Mean

▪ The population mean is the sum of the values in
the population divided by the population size, N.

$$\mu = \frac{\sum_{i=1}^{N} X_i}{N} = \frac{X_1 + X_2 + \cdots + X_N}{N}$$

Where μ = population mean
N = population size
X_i = ith value of the variable X
Population Variance
The population variance is the average of squared
deviations of values from the mean.

$$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$$

Where μ = population mean
N = population size
X_i = ith value of the variable X
Population Standard Deviation
The population standard deviation is the most
commonly used measure of variation.
It has the same units as the original data.

$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$$

Where μ = population mean
N = population size
X_i = ith value of the variable X
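The only computational difference between the population and sample formulas is the divisor: N for a population, n − 1 for a sample. The sketch below applies both to the same illustrative values using Python's `statistics` module, where `pvariance` and `pstdev` divide by N and `variance` and `stdev` divide by n − 1. Whether a given data set should be treated as a population or a sample depends on the study, not on the numbers.

```python
from statistics import pstdev, pvariance, stdev, variance

# Illustrative values; treated first as a whole population, then as a sample.
data = [10, 12, 14, 15, 17, 18, 18, 24]

population_variance = pvariance(data)  # divides by N
population_std_dev = pstdev(data)

sample_variance = variance(data)       # divides by n - 1
sample_std_dev = stdev(data)

print(population_variance, population_std_dev)
print(sample_variance, sample_std_dev)
```

For the same data, the sample measures are always at least as large as the population measures, and the gap shrinks as the number of values grows.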
Sample statistics versus population
parameters

Measure              Population Parameter   Sample Statistic
Mean                 μ                      X̄
Variance             σ²                     s²
Standard Deviation   σ                      s
Measures of Variation
Comparing Standard Deviation
[Figure: two distributions, one with a small standard deviation and one with a large standard deviation]
Measures of Variation
Summary Characteristics
▪ The more the data are spread out, the greater the range,
quartile deviation, variance, and standard deviation.
▪ The more the data are concentrated, the smaller the
range, quartile deviation, variance, and standard
deviation.
▪ If the values are all the same (no variation), all these
measures will be zero.
▪ None of these measures are ever negative.
Shape of a Distribution
▪ Describes how data are distributed
▪ Measures of shape
▪ Symmetric or skewed

Left-Skewed: Mean < Median
Symmetric: Mean = Median
Right-Skewed: Median < Mean
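The mean-versus-median comparison can be turned into a quick check. The sketch below is a rule of thumb only, under the assumption that comparing the two measures is enough to suggest a direction: equal mean and median do not guarantee a symmetric distribution, and the function name `skew_direction` is hypothetical.

```python
from math import isclose
from statistics import mean, median


def skew_direction(values):
    """Suggest a skew direction by comparing the mean with the median."""
    data_mean = mean(values)
    data_median = median(values)
    if isclose(data_mean, data_median):
        return "symmetric"
    if data_mean < data_median:
        return "left-skewed"
    return "right-skewed"


print(skew_direction([2, 3, 4, 5, 6]))    # mean and median coincide
print(skew_direction([1, 2, 3, 4, 10]))   # a high outlier pulls the mean up
print(skew_direction([1, 7, 8, 9, 10]))   # a low outlier pulls the mean down
```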
Coefficient of Skewness (CS)
• The coefficient of skewness is a value that determines the
degree of skewness in a distribution.
• It may be used to determine if data is normally distributed.
Assignment
The table below shows the daily income (in pesos) that a
certain salesman gets from selling newspapers and
magazines.

Type        Sun   Mon   Tues   Wed   Thurs   Fri   Sat
Newspaper   420   320   255    220   305     200   375
Magazines   310   285   290    215   275     240   385

Compute the mean, median, mode, range, QD, sample
variance, sample standard deviation and coefficient of
variation for each type.
In Microsoft Excel Use this tab!
In Microsoft Excel Use these functions!

Using JASP version 0.14.1.0:
Using gretl 2021a:
Source:
Statistics for Managers Using Microsoft Excel, 5e © 2008
Pearson Prentice-Hall, Inc.
