Measures of Central Tendency, Dispersion and Shape
Introduction
The role of statistics in research is to function as a tool for designing research, analysing data and
drawing conclusions therefrom. Most research studies result in a large volume of raw data which
must be suitably reduced so that it can be easily read and used for further analysis. The science of
statistics cannot be ignored by any research worker even though he or she may not use detailed
statistical methods.
Scale Types
Measurement is the assignment of numbers to objects or events in a systematic fashion. Four levels
of measurement scales are commonly distinguished: nominal, ordinal, interval, and ratio. Each
possesses different properties of measurement systems.
Nominal Scales
Nominal scales are measurement systems that possess none of the three properties of order,
distance, and fixed zero. A nominal scale classifies data into mutually exclusive, all-inclusive
categories on which no order or ranking can be imposed.
Examples:
Ordinal Scales
Ordinal scales are measurement systems that possess the property of order, but not the property of
distance. An ordinal scale classifies data into categories that can be ranked, but the differences
between the ranks are not measurable.
Arithmetic operations are not applicable, but relational operations (such as greater than or less than) are.
Examples:
Agreement (Strongly Disagree, Disagree, Neither Agree nor Disagree, Agree, Strongly Agree)
Interval Scales
Interval scales are measurement systems that possess the properties of order and distance, but not
the property of fixed zero. An interval scale classifies data that can be ranked and for which
differences are meaningful. However, there is no meaningful zero, so ratios are meaningless.
Examples:
Temperature (in degrees Celsius or Fahrenheit)
Time (calendar or clock time)
Test scores
Ratio Scales
Ratio scales are measurement systems that possess all three properties: order, distance, and fixed
zero. The added power of a fixed zero allows ratios of numbers to be meaningfully interpreted; for
example, the ratio of Bekele's height to Martha's height is 1.32, whereas such a statement is not
possible with interval scales. A ratio scale classifies data that can be ranked, for which differences
are meaningful, and for which there is a true zero. True ratios exist between the different units of measure.
Examples:
Weight
Height
Length
N= population size
n= sample Size
X̅ = Sample mean
µ= population mean
Example 2:
$\sum_{i=1}^{5} y_i = 1+2+3+4+5 = 15$, where $y_i = i$
Example 3:
$\sum_{y=1}^{4} y^2 = 1^2+2^2+3^2+4^2 = 1+4+9+16 = 30$
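The summation notation in Examples 2 and 3 can be checked with a short Python sketch. The variable names are illustrative, and Example 2 is assumed to use y_i = i, as its expansion suggests.

```python
# Example 2: sum of y_i for i = 1..5, assuming y_i = i
total_linear = sum(i for i in range(1, 6))

# Example 3: sum of y^2 for y = 1..4
total_squares = sum(y ** 2 for y in range(1, 5))

print(total_linear)   # 15
print(total_squares)  # 30
```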
Example 4:
Usually, when two or more different data sets are to be compared, it is necessary to condense the
data, but for comparison the condensation of a data set into a frequency distribution and visual
presentation are not enough. It is then necessary to summarize the data set in a single value. Such a
value usually lies somewhere in the centre and represents the entire data set, and hence it is called
a measure of central tendency, or an average. Since a measure of central tendency indicates the
location or general position of the distribution on the X-axis, it is also known as a measure of
location or position.
a) Mean
The mean is the value obtained by dividing the sum of all the observations by the number of
observations. The mean is used to summarize scores that reflect an interval or ratio scale of
measurement, in situations where the distribution is symmetrical and unimodal. The formula for
the mean is given by:
$\bar{X} = \frac{\sum x}{n}$ (sample mean)
$\mu = \frac{\sum x}{N}$ (population mean)
Example:
A sample of 10 executives received the following bonuses (in thousands) last year: 10, 14, 15, 17,
16, 16, 20, 21, 25, 26. Determine the mean of the bonuses.
Mean = $\frac{10+14+15+17+16+16+20+21+25+26}{10} = \frac{180}{10} = 18$
Mean = 18
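As a check on the arithmetic, the mean of the bonuses can be computed with a minimal Python sketch. The function name `mean` is illustrative and not part of the original text.

```python
def mean(values):
    """Return the arithmetic mean: the sum of the observations divided by their count."""
    if not values:
        raise ValueError("mean requires at least one observation")
    return sum(values) / len(values)

bonuses = [10, 14, 15, 17, 16, 16, 20, 21, 25, 26]  # in thousands
print(mean(bonuses))  # 18.0
```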
b) Median
When the observations are arranged in ascending or descending order, the value that divides the
distribution into two equal parts is called the median. The median is commonly used to describe
scores that reflect an ordinal scale of measurement. The median is given by the following formula:
If n is odd: Median = the $\left(\frac{n+1}{2}\right)$th observation
If n is even: Median = $\frac{\left(\frac{n}{2}\right)\text{th observation} + \left(\frac{n}{2}+1\right)\text{th observation}}{2}$
Example 1
45 32 37 46 39 36 41 48 36
Solution:
32 36 36 37 39 41 45 46 48
Since n = 9 is odd, the median is the $\left(\frac{n+1}{2}\right)$th observation:
$\frac{9+1}{2} = \frac{10}{2} = 5$, so the median is the 5th observation.
Median = 39
Example 2
Solution:
32 36 36 37 39 41 45 46 48 50
Since n = 10 is even, the median is the average of the $\left(\frac{n}{2}\right)$th and $\left(\frac{n}{2}+1\right)$th observations, that is, the 5th and 6th observations:
Median = $\frac{39+41}{2} = \frac{80}{2} = 40$
Median = 40
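The odd and even cases of the median rule can be sketched in Python as follows. The function name `median` is illustrative. Python lists are indexed from 0, so the 5th observation is at index 4.

```python
def median(values):
    """Return the median using the odd/even position rule on the sorted data."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median requires at least one observation")
    mid = n // 2
    if n % 2 == 1:
        # Odd n: the ((n + 1) / 2)-th observation, which is index n // 2 when counting from 0
        return ordered[mid]
    # Even n: average of the (n / 2)-th and (n / 2 + 1)-th observations
    return (ordered[mid - 1] + ordered[mid]) / 2

odd_data = [45, 32, 37, 46, 39, 36, 41, 48, 36]
even_data = [32, 36, 36, 37, 39, 41, 45, 46, 48, 50]
print(median(odd_data))   # 39
print(median(even_data))  # 40.0
```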
c) Mode
The mode is the most frequently occurring score. It is typically useful in describing central
tendency when the scores reflect a nominal scale of measurement. However, the mode is limited
in the information it gives about a distribution. If two or more values occur the same number of
times, and more frequently than the other values, then there is more than one mode.
Example 1:
The exam scores for ten students are: 81, 93, 84, 75, 68, 87, 81, 75, 81, 87.
Solution:
The score of 81 occurs the most often. It is the Mode. This data set is unimodal.
Example 2:
Solution:
We find that both the observations 9 and 15 have the same frequency of 2. So, 9 and 15 are the
modes of the data. This data set is bimodal.
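Finding the mode amounts to counting frequencies. The sketch below uses the exam scores from Example 1. The function name `modes` is illustrative; it returns a list so that a bimodal data set yields two values.

```python
from collections import Counter

def modes(values):
    """Return every value that occurs with the highest frequency, in sorted order."""
    if not values:
        raise ValueError("modes requires at least one observation")
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(value for value, count in counts.items() if count == highest)

scores = [81, 93, 84, 75, 68, 87, 81, 75, 81, 87]
print(modes(scores))  # [81]
```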
Measures of Dispersion
The scatter or spread of items of a distribution is known as dispersion or variation. In other words,
the degree to which numerical data tend to spread about an average value is called the dispersion or
variation of the data. Measures of dispersion are statistical measures which provide ways of
measuring the extent to which data are dispersed or spread out.
Various measures of dispersion are in use. The most commonly used measures of dispersion are:
1) Range
2) Standard deviation
3) Variance
4) Coefficient of variation.
a) The Range
The range is obtained by subtracting the smallest score from the largest score in the distribution. It is
a quick method for measuring variability. Because the range is greatly affected by extreme scores,
it may give a distorted picture of the scores. For this reason, among others, the range is not the most
important measure of variability. The formula for the range is given by:
Range = L - S
where L is the largest score and S is the smallest score.
Example:
32 35 36 36 37 38 40 42 42 43 43 45
Solution:
Range=L-S
Range= 45-32= 13
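The range calculation can be reproduced with a short Python sketch. The function name `value_range` is illustrative.

```python
def value_range(values):
    """Return the range: the largest score minus the smallest score."""
    if not values:
        raise ValueError("range requires at least one observation")
    return max(values) - min(values)

data = [32, 35, 36, 36, 37, 38, 40, 42, 42, 43, 43, 45]
print(value_range(data))  # 13
```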
b) Standard Deviation
Standard deviation is another measure of dispersion in statistics. It shows how much the data are
spread out around the mean. It is the most widely used measure of dispersion because, unlike the
range, it takes into account every observation in the dataset. When the values in a dataset are
tightly bunched together, the standard deviation is small; when the values are spread apart, the
standard deviation is relatively large. The standard deviation is usually presented in conjunction
with the mean and is measured in the same units. The formula for standard deviation is given by:
S.D = $\sqrt{\frac{\sum (x_i - \bar{x})^2}{n}}$
Where:
$x_i$ = each observation
$\bar{x}$ = mean
n = sample size
Example:
Suppose we have five climatic stations and have recorded rainfall in mm as follows:
60, 47, 17, 43, 30. Calculate the standard deviation.
Solution
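Mean = $\frac{\sum x}{n} = \frac{60+47+17+43+30}{5} = \frac{197}{5} = 39.4$
Xi X̅ (Xi - X̅) (Xi - X̅)²
60 39.4 20.6 424.36
47 39.4 7.6 57.76
17 39.4 -22.4 501.76
43 39.4 3.6 12.96
30 39.4 -9.4 88.36
∑ = 1085.2
S.D = $\sqrt{\frac{\sum (x_i - \bar{x})^2}{n}} = \sqrt{\frac{1085.2}{5}} = \sqrt{217.04}$
S.D = 14.73 mm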
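The rainfall example can be verified with a minimal Python sketch that follows the formula above and divides by n. The function name `std_dev` is illustrative. Python's standard library function `statistics.pstdev` uses the same divisor.

```python
import math

def std_dev(values):
    """Return the standard deviation, dividing the summed squared deviations by n."""
    n = len(values)
    if n == 0:
        raise ValueError("std_dev requires at least one observation")
    mean = sum(values) / n
    squared_deviations = sum((x - mean) ** 2 for x in values)
    return math.sqrt(squared_deviations / n)

rainfall = [60, 47, 17, 43, 30]  # in mm
print(round(std_dev(rainfall), 2))  # 14.73
```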
c) Variance
Variance in statistics is a measurement of the spread between numbers in a data set. It measures
how far each number in the set is from the mean, and is defined as the average of the squared
differences from the mean. Because it is an average of squared values, variance cannot be
negative. A zero value means that all of the values within a data set are identical. If the variance
is low, the data cluster near the average; if the variance is high, the data are spread far from the
average. Variance is given by the following formula:
Var = $\frac{\sum (x_i - \bar{x})^2}{n}$
Where:
n = sample size
Example:
The heights (in cm) of the students in a class are 163, 158, 167, 174, 148. Determine the
variance.
Solution:
Mean = $\frac{\sum x}{n} = \frac{163+158+167+174+148}{5} = \frac{810}{5} = 162$
Xi X̅ (Xi - X̅) (Xi - X̅)²
163 162 1 1
158 162 -4 16
167 162 5 25
174 162 12 144
148 162 -14 196
∑ = 382
Var = $\frac{382}{5} = 76.4$
Var = 76.4
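The variance of the heights can be checked with a short Python sketch that follows the formula above and divides by n. The function name `variance` is illustrative.

```python
def variance(values):
    """Return the variance: the average of the squared differences from the mean."""
    n = len(values)
    if n == 0:
        raise ValueError("variance requires at least one observation")
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n

heights = [163, 158, 167, 174, 148]  # in cm
print(variance(heights))  # 76.4
```

The result is a sum of squares divided by a positive count, so it can never be negative.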
d) Coefficient of Variation
Coefficient of variation (COV) = $\frac{\text{Standard Deviation}}{\text{Mean}} \times 100$
Example:
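An analysis of the monthly wages paid to workers in two firms A and B belonging to the same
industry gives the following results:
Firm A: mean wage = 52.5, standard deviation = 10
Firm B: mean wage = 47.5, standard deviation = 12
COV Firm A = $\frac{10}{52.5} \times 100 = 19.05\%$
COV Firm B = $\frac{12}{47.5} \times 100 = 25.26\%$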
This implies that there is greater variability in wages in Firm B compared to Firm A.
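The comparison of the two firms can be reproduced with a minimal Python sketch. The function name `coefficient_of_variation` is illustrative. The inputs are the standard deviations (10 and 12) and means (52.5 and 47.5) used in the calculation above.

```python
def coefficient_of_variation(std_dev, mean):
    """Return the coefficient of variation as a percentage of the mean."""
    if mean == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return std_dev / mean * 100

cov_a = coefficient_of_variation(10, 52.5)
cov_b = coefficient_of_variation(12, 47.5)
print(round(cov_a, 2))  # 19.05
print(round(cov_b, 2))  # 25.26
print(cov_b > cov_a)    # True: wages vary more in Firm B
```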
a) Skewness
Measures of central tendency and measures of dispersion can describe a distribution, but they
are not sufficient to describe its shape. For this purpose, we use two other statistical measures
that compare the shape to the normal curve, called skewness and kurtosis. Skewness and kurtosis
are the two important characteristics of a distribution that are studied in descriptive statistics.
Kurtosis is a statistical measure that tells us whether a distribution is more peaked or flatter than a
normal distribution. Measured as excess kurtosis (kurtosis minus 3), a distribution similar to the
normal distribution has a value of 0. If the excess kurtosis is greater than 0, the distribution has a
higher peak than the normal distribution. If it is less than 0, the distribution is flatter than a
normal distribution.
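The text gives no formula for these shape measures, so the sketch below assumes the common moment-based definitions: skewness as the third central moment divided by the variance raised to the power 1.5, and excess kurtosis as the fourth central moment divided by the squared variance, minus 3. The function names and the sample data are illustrative only.

```python
def central_moment(values, k):
    """Return the k-th central moment: the average of (x - mean) ** k."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** k for x in values) / n

def skewness(values):
    """Moment-based skewness (assumed definition): m3 / m2 ** 1.5."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def excess_kurtosis(values):
    """Moment-based excess kurtosis (assumed definition): m4 / m2 ** 2 - 3."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3

sample = [1, 2, 3, 4, 5]  # illustrative, evenly spread data
print(skewness(sample))         # 0.0 for symmetric data
print(excess_kurtosis(sample))  # negative, i.e. flatter than a normal distribution
```

Both functions divide by the variance, so they are undefined when all values are identical.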