
Measures of Dispersion, Skewness and Kurtosis

DISPERSION

Averages or the measures of central tendency give us an idea of the concentration of the observations about the central part of the distribution. If we know the average alone, we cannot form a complete idea about the distribution, as will be clear from the following example.

Consider the series (i) 4, 5, 6, 7, 8 (ii) 2, 3, 6, 9, 10 (iii) -3, -1, 7, 12, 15. In all these cases we see that n, the number of observations, is 5 and the mean x̄ is 6.
If we are given that the mean of 5 observations is 6, we cannot form an idea as to
whether it is the average of first series or second series or third series or of any
other series of five observations whose sum is 30. Thus we see that the measures
of central tendency are inadequate to give us a complete idea of the distribution.
They must be supported and supplemented by some other measures. One such
measure is dispersion.
We study dispersion to have an idea about the homogeneity or
heterogeneity of the distribution. In the above case we say that series (i) is more
homogeneous (less dispersed) than the series (ii) or we say that series (iii) is more
heterogeneous (more scattered) than the series (i) or (ii).
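
As a quick illustration (not part of the original notes), the following Python sketch computes the mean and the population standard deviation of the three series; all three means are 6, but the spread grows from series (i) to series (iii).

# Minimal sketch: same mean, very different dispersion.
from statistics import mean, pstdev

series = {
    "(i)": [4, 5, 6, 7, 8],
    "(ii)": [2, 3, 6, 9, 10],
    "(iii)": [-3, -1, 7, 12, 15],
}

for name, values in series.items():
    # pstdev is the population standard deviation (divisor n)
    print(name, "mean =", mean(values), "std. dev. =", round(pstdev(values), 2))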

MEASURES OF DISPERSION

The literal meaning of dispersion is scatteredness or variability. The measurement of the scatter of the values of a data set among themselves is called a measure of dispersion or variation.

Purposes of measures of dispersion: The measure of dispersion, in conjunction with an average, gives us a description of the structure of the distribution and the role of the individual values in it. A measure of dispersion serves two purposes:

(i) It provides one of the most important characteristics of a frequency distribution.
(ii) It helps us to compare two or more frequency distributions.

Characteristics of an ideal measure of dispersion: The characteristics of an ideal measure of dispersion are the same as those of an ideal measure of central tendency, viz.:

(i) It should be rigidly defined


(ii) It should be easy to calculate and easy to understand.
(iii) It should be based on all observations.


(iv) It should be amenable to further mathematical treatment.


(v) It should be affected as little as possible by fluctuations of sampling.

Types of measures of dispersion: There are two types of measures of dispersion.

(i) Absolute measures of dispersion


(ii) Relative measures of dispersion.

Absolute Measures of Dispersion

A measure of dispersion having the same unit as the original variable is termed an absolute measure of dispersion.

The absolute measures of dispersion are:


(i) Range,
(ii) Quartile deviation or semi-interquartile range,
(iii) Mean deviation,
(iv) Variance,
(v) Standard deviation,
(vi) Standard error.

Range

The range is the difference between the two extreme observations of the distribution. It is usually denoted by R. If A and B are the greatest and the smallest observations respectively in a distribution, then its range is

R = A - B.

Suitability of range: Range is the simplest but a crude measure of dispersion. Since it is based on two extreme observations, it is not at all a reliable measure of dispersion.
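
A minimal Python sketch of the range, using a small hypothetical data set:

# The range is simply the largest observation minus the smallest one.
data = [2, 3, 6, 9, 10]

A = max(data)   # greatest observation
B = min(data)   # smallest observation
R = A - B       # range
print("Range R =", R)   # 10 - 2 = 8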

Quartile Deviation

Quartile deviation or semi-interquartile range is defined as half of the difference between the third quartile and the first quartile, i.e.,

$$Q.D. = \frac{1}{2}(Q_3 - Q_1)$$


Where, Q1 and Q3 are the first and third quartiles respectively of the distribution.

Suitability of quartile deviation: It is definitely a better measure than the range as it makes use of 50% of the data. But since it ignores the other 50% of the data, it cannot be regarded as a reliable measure.
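
A short Python sketch of the quartile deviation follows. The data are hypothetical, and note that several conventions exist for computing quartiles; statistics.quantiles with method="inclusive" is only one of them, so a hand computation based on a different rule may give a slightly different answer.

# Quartile deviation (semi-interquartile range) under one quartile convention.
from statistics import quantiles

data = [2, 3, 6, 9, 10, 12, 15, 18, 20]

q1, _, q3 = quantiles(data, n=4, method="inclusive")
qd = (q3 - q1) / 2          # semi-interquartile range
print("Q1 =", q1, "Q3 =", q3, "Q.D. =", qd)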

Mean Deviation
The arithmetic mean of the absolute deviations of the individual values of a
variable from their average A (usually mean, median or mode) is called the mean
deviation.

Case I. If $x_i$ $(i = 1, 2, \ldots, n)$ is the $i$th value of a variable, then

$$M.D. = \frac{1}{n}\sum_{i=1}^{n} \lvert x_i - A \rvert$$

Case II. If $x_i$ $(i = 1, 2, \ldots, k)$ is the value of the $i$th class with corresponding frequency $f_i$ such that $\sum_{i=1}^{k} f_i = n$, then

$$M.D. = \frac{1}{n}\sum_{i=1}^{k} f_i \lvert x_i - A \rvert$$

Case III. If $x_i$ $(i = 1, 2, \ldots, k)$ is the mid-value of the $i$th class with corresponding frequency $f_i$ such that $\sum_{i=1}^{k} f_i = n$, then

$$M.D. = \frac{1}{n}\sum_{i=1}^{k} f_i \lvert x_i - A \rvert$$

Suitability of mean deviation: Since mean deviation is based on all the observations, it is a better measure of dispersion than the range or quartile deviation. But the step of ignoring the signs of the deviations $(x_i - A)$ creates artificiality and makes it useless for further mathematical treatment.

Important result: Mean deviation is least when taken from the median, i.e., when A = Md.
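
The following Python sketch, using hypothetical data, computes the mean deviation about the mean and about the median, and illustrates the result just stated: the mean deviation about the median is never larger.

# Mean deviation about an arbitrary reference value A.
from statistics import mean, median

def mean_deviation(values, A):
    # arithmetic mean of the absolute deviations from A
    return sum(abs(x - A) for x in values) / len(values)

data = [2, 3, 6, 9, 20]

md_about_mean = mean_deviation(data, mean(data))
md_about_median = mean_deviation(data, median(data))
print("M.D. about mean   =", md_about_mean)
print("M.D. about median =", md_about_median)   # never larger than the line above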


Standard Deviation and Variance

Standard Deviation: The positive square root of the arithmetic mean of the squared deviations of observations taken from their mean is known as the standard deviation.

Standard deviation of population values is denoted by $\sigma$ and is defined as

$$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(X_i - \mu)^2} \quad \text{[for ungrouped data]}$$

$$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{k} f_i (X_i - \mu)^2} \quad \text{[for grouped data]}$$

where N is the population size and $\mu$ is the population mean.

Standard deviation of sample values is denoted by s and is defined as

$$s = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2} \quad \text{[for ungrouped data]}$$

$$s = \sqrt{\frac{1}{n}\sum_{i=1}^{k} f_i (x_i - \bar{x})^2} \quad \text{[for grouped data]}$$

To get an unbiased estimate of the population variance from a sample of small size, the following formula is often used:

$$s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2} \quad \text{[for ungrouped data]}$$

$$s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{k} f_i (x_i - \bar{x})^2} \quad \text{[for grouped data]}$$


Variance: The square of the standard deviation is called the variance i.e., the
arithmetic mean of the squared deviations of observations taken from their mean
is known as the variance.
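
A short Python sketch of these definitions follows, using hypothetical data. The standard library's statistics module uses the same two conventions: pvariance and pstdev divide by n, while variance and stdev divide by n - 1.

# Population-style (divisor n) versus sample (divisor n - 1) variance and SD.
from statistics import pstdev, pvariance, stdev, variance

data = [2, 3, 6, 9, 10]

print("variance with divisor n        :", pvariance(data))
print("std. deviation with divisor n  :", pstdev(data))
print("variance with divisor n - 1    :", variance(data))
print("std. deviation with divisor n-1:", stdev(data))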
Sum of Squares: The quantity $\sum_{i=1}^{n}(x_i - \bar{x})^2$ is often referred to as the corrected sum of squares, or simply the sum of squares (S.S.), of the observed values $x_1, x_2, \ldots, x_n$. It is called the corrected sum of squares as it can be expressed as the raw sum of squares minus the correction term. We can write:

$$\sum_{i=1}^{n}(x_i - \bar{x})^2 = \sum_{i=1}^{n}(x_i^2 - 2\bar{x}x_i + \bar{x}^2) = \sum_{i=1}^{n}x_i^2 - 2\bar{x}\sum_{i=1}^{n}x_i + n\bar{x}^2 = \sum_{i=1}^{n}x_i^2 - n\bar{x}^2 = \sum_{i=1}^{n}x_i^2 - \frac{\left(\sum_{i=1}^{n}x_i\right)^2}{n}$$

The terms $\sum_{i=1}^{n}x_i^2$ and $\left(\sum_{i=1}^{n}x_i\right)^2\!/\,n$ are usually called the raw sum of squares (RSS) and the correction term respectively.
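
The identity above is easy to verify numerically; the following sketch (hypothetical data) computes the corrected sum of squares both directly and as the raw sum of squares minus the correction term.

# Corrected sum of squares = raw sum of squares - correction term.
data = [2, 3, 6, 9, 10]
n = len(data)
xbar = sum(data) / n

corrected_ss = sum((x - xbar) ** 2 for x in data)
raw_ss = sum(x ** 2 for x in data)
correction_term = sum(data) ** 2 / n

print(corrected_ss)                  # 50.0
print(raw_ss - correction_term)      # also 50.0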

Similarly, for frequency or grouped data,

$$\sum_{i=1}^{k} f_i (x_i - \bar{x})^2 = \sum_{i=1}^{k} f_i x_i^2 - \frac{\left(\sum_{i=1}^{k} f_i x_i\right)^2}{n}$$

Hence, for a large sample,

$$s^2 = \frac{1}{n}\sum_{i=1}^{k} f_i (x_i - \bar{x})^2 = \frac{1}{n}\left[\sum_{i=1}^{k} f_i x_i^2 - \frac{\left(\sum_{i=1}^{k} f_i x_i\right)^2}{n}\right] = \frac{1}{n}\left[\sum_{i=1}^{k} f_i x_i^2 - n\bar{x}^2\right] = \frac{1}{n}\sum_{i=1}^{k} f_i x_i^2 - \bar{x}^2$$


For a small sample,

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{k} f_i (x_i - \bar{x})^2 = \frac{1}{n-1}\left[\sum_{i=1}^{k} f_i x_i^2 - \frac{\left(\sum_{i=1}^{k} f_i x_i\right)^2}{n}\right] = \frac{1}{n-1}\left[\sum_{i=1}^{k} f_i x_i^2 - n\bar{x}^2\right]$$

Suitability of standard deviation: The standard deviation is by far the
most widely encountered measure of dispersion.

Merits:

(1) It is rigidly defined.


(2) It is based on all observations and is readily understood.
(3) It is amenable to algebraic treatment.
(4) It is the most important and most reliable among all the measures of
absolute dispersion. The standard deviation possesses a majority of
the properties which are desirable in a measure of dispersion.
(5) It is easy to use mathematically. Many statistical theorems are built
around it.

Demerits:
(1) It is affected markedly by extreme values.
(2) It is more difficult to compute than other measures of dispersion.

Some important properties of standard deviation:

(i) It is independent of origin but not of scale of measurement, i.e., if a variable x is transformed to another variable u by $u_i = \dfrac{x_i - A}{C}$, then $s_x = C\,s_u$.

(ii) The variance is the minimum of all mean squared deviations (MSD) and the standard deviation is the minimum of all root mean squared deviations (RMSD),

i.e., $$\frac{1}{n}\sum_{i=1}^{k} f_i (x_i - \bar{x})^2 \le \frac{1}{n}\sum_{i=1}^{k} f_i (x_i - A)^2$$

and $$\sqrt{\frac{1}{n}\sum_{i=1}^{k} f_i (x_i - \bar{x})^2} \le \sqrt{\frac{1}{n}\sum_{i=1}^{k} f_i (x_i - A)^2}$$


where A is any quantity other than the mean.

(iii) Variance of the combined series: If $n_1$, $n_2$ are the sizes, $\bar{x}_1$, $\bar{x}_2$ the means, and $s_1^2$, $s_2^2$ the variances of two sets of data $(x_{11}, x_{12}, \ldots, x_{1n_1})$ and $(x_{21}, x_{22}, \ldots, x_{2n_2})$, then the variance $s^2$ of the combined series is given by

$$s^2 = \frac{n_1 s_1^2 + n_2 s_2^2 + n_1 d_1^2 + n_2 d_2^2}{n_1 + n_2}$$

where $d_1 = \bar{x}_1 - \bar{x}$, $d_2 = \bar{x}_2 - \bar{x}$ and $\bar{x} = \dfrac{n_1 \bar{x}_1 + n_2 \bar{x}_2}{n_1 + n_2}$. A numerical check of this formula is sketched below.
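
A small Python sketch (hypothetical data) of the combined-series formula, checked against a direct computation on the pooled observations. Note that the formula uses the divisor n for each variance, i.e., pvariance.

# Combined variance of two series versus variance of the pooled data.
from statistics import mean, pvariance

x1 = [4, 5, 6, 7, 8]
x2 = [2, 3, 6, 9, 10]

n1, n2 = len(x1), len(x2)
m1, m2 = mean(x1), mean(x2)
s1_sq, s2_sq = pvariance(x1), pvariance(x2)

m = (n1 * m1 + n2 * m2) / (n1 + n2)          # combined mean
d1, d2 = m1 - m, m2 - m
s_sq = (n1 * s1_sq + n2 * s2_sq + n1 * d1 ** 2 + n2 * d2 ** 2) / (n1 + n2)

print(s_sq, pvariance(x1 + x2))              # the two values agree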

(iv) Standard error: The standard deviation of any statistic is termed as the
standard error of that statistic.

Standard error of the sample mean: The standard error of the sample mean x̄, computed from a sample of size n, is σ/√n. The standard deviation can be viewed as a parameter which provides a lot of information when combined with other techniques. It is particularly useful when the population has a special type of frequency distribution, called the normal distribution. It is then possible to find the percentage of observations falling within a distance of one, two or three σ's from the mean. About 68.27 percent, 95.45 percent and 99.73 percent of the observations will lie within the regions (μ ± σ), (μ ± 2σ) and (μ ± 3σ) respectively, where μ and σ are the mean and standard deviation of the normal distribution. Thus, in a normal curve, 3σ on either side of the mean covers practically the whole range of the values in the distribution.
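
The percentages quoted above can be checked by simulation. The sketch below is only an illustration: it draws values from a normal distribution with an assumed mean of 50 and standard deviation of 10, then counts how many fall within 1, 2 and 3 sigma of the mean.

# Empirical check of the 68.27 / 95.45 / 99.73 percent rule.
import random

random.seed(0)
mu, sigma, n = 50.0, 10.0, 100_000
sample = [random.gauss(mu, sigma) for _ in range(n)]

for k in (1, 2, 3):
    inside = sum(1 for x in sample if abs(x - mu) <= k * sigma)
    print(f"within {k} sigma: {100 * inside / n:.2f}%")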


Relative Measures of Dispersion

A measure of dispersion having no unit of measurement, unlike the original variable, is termed a relative measure of dispersion. Whenever we want to compare the variability of two series which differ widely in their averages, or which are measured in different units, we do not merely calculate the absolute measures of dispersion but the relative measures of dispersion. Relative measures of dispersion are usually termed the co-efficients of dispersion, which are pure numbers.

The co-efficients of dispersion (C.D.) based on the different absolute measures of dispersion are as follows:

(i) Based upon range:

$$C.D. = \frac{A - B}{A + B}$$

where A and B are the greatest and smallest items respectively in the series.

(ii) Based upon quartile deviation:

$$C.D. = \frac{(Q_3 - Q_1)/2}{(Q_3 + Q_1)/2} = \frac{Q_3 - Q_1}{Q_3 + Q_1}$$

(iii) Based upon mean deviation:

$$C.D. = \frac{\text{Mean deviation}}{\text{Average from which it is calculated}}$$

For example, when $M_0$ is used to calculate the mean deviation, then

$$C.D. = \frac{\text{Mean deviation about } M_0}{M_0}$$

(iv) Based upon standard deviation:

$$C.D. = \frac{\text{Standard deviation}}{\text{Mean}} = \frac{\sigma}{\bar{x}}$$

Co-efficient of Variation

The co-efficient of dispersion based upon standard deviation, when multiplied by 100, is called the co-efficient of variation (C.V.), i.e.,

$$C.V. = \frac{\sigma}{\bar{x}} \times 100$$

According to Professor Karl Pearson, who suggested this measure, C.V. is the percentage variation in the mean, the standard deviation being considered as the total variation in the mean.

Suitability of C.V.: For comparing the variability of two series, we calculate the co-efficient of variation for each series. The series having the greater C.V. is said to have more variability than the other, and the series having the lesser C.V. is said to have more consistency than the other.
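
For example, the following Python sketch compares two hypothetical series measured in different units by their coefficients of variation:

# Comparing variability of series on different scales via the C.V.
from statistics import mean, pstdev

def coefficient_of_variation(values):
    # C.V. = (standard deviation / mean) * 100
    return pstdev(values) / mean(values) * 100

marks_series_a = [55, 60, 65, 70, 75]         # hypothetical marks
heights_series_b = [150, 152, 155, 158, 160]  # hypothetical heights in cm

print(f"C.V. of series A = {coefficient_of_variation(marks_series_a):.2f}%")
print(f"C.V. of series B = {coefficient_of_variation(heights_series_b):.2f}%")
# The series with the larger C.V. is the more variable (less consistent) one.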

SKEWNESS

Literally, skewness means 'lack of symmetry'. We study skewness to have an idea about the shape of the curve which we can draw with the help of the given data. A distribution is said to be skewed if

(i) Mean, median and mode fall at different points, i.e., Mean ≠ Median ≠ Mode,
(ii) Quartiles are not equidistant from the median, and
(iii) The curve drawn with the help of the given data is not symmetrical but is stretched more to one side than to the other.

Measures of Skewness

Various Measures of skewness are

(1) Sk = M - Md (2) Sk = M – M0

where, M is the mean, Md, the median and M0, the mode of the distribution.

(3) Sk = (Q3 – Md) – (Md – Q1)

These are the absolute measures of skewness. As in dispersion, for comparing two series we do not calculate these absolute measures, but we
calculate the relative measures, called the co-efficients of skewness, which are
pure numbers independent of units of measurement. The following are the co-
efficients of skewness.

I. Prof. Karl Pearson's Co-efficient of Skewness: It is based on the averages (mean, median and mode) and is defined as

$$S_k = \frac{M - M_0}{\sigma}$$

where $\sigma$ is the standard deviation of the distribution.

If the mode is ill defined, then using the relation $M - M_0 = 3(M - M_d)$, which holds for a moderately asymmetrical distribution, we get

$$S_k = \frac{3(M - M_d)}{\sigma}$$

Skewness is positive if M > M0 or M > Md, and negative if M < M0 or M < Md.
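
A short Python sketch of Pearson's coefficient, using hypothetical right-skewed data and the median-based form (which avoids estimating the mode):

# Pearson's coefficient of skewness, Sk = 3(M - Md) / sigma.
from statistics import mean, median, pstdev

data = [2, 3, 3, 4, 5, 6, 9, 12, 18]     # hypothetical, right-skewed data

M, Md, sigma = mean(data), median(data), pstdev(data)
sk = 3 * (M - Md) / sigma
print(f"Sk = {sk:.3f}")                  # positive here, since M > Md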

II. Prof. Bowley's Co-efficient of Skewness: It is based on quartiles and is defined as

$$S_k = \frac{(Q_3 - M_d) - (M_d - Q_1)}{(Q_3 - M_d) + (M_d - Q_1)} = \frac{Q_3 + Q_1 - 2M_d}{Q_3 - Q_1}$$
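
The same hypothetical data can be used to compute Bowley's coefficient; as before, the numerical quartiles depend on the convention chosen.

# Bowley's quartile-based coefficient of skewness.
from statistics import median, quantiles

data = [2, 3, 3, 4, 5, 6, 9, 12, 18]

q1, _, q3 = quantiles(data, n=4, method="inclusive")
md = median(data)
sk_bowley = (q3 + q1 - 2 * md) / (q3 - q1)
print(f"Bowley's Sk = {sk_bowley:.3f}")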

III. Co-efficient of Skewness Based on Moments: From a theoretical point of view, the most important measure of skewness is based upon the corrected moments. A measure of skewness may be obtained by using the third corrected moment $\mu_3$. But $\mu_3$ is a measure of absolute skewness. The measure of relative skewness is given by

$$\beta_1 = \frac{\mu_3^2}{\mu_2^3}.$$

Skewness is also sometimes measured by

$$\gamma_1 = \pm\sqrt{\beta_1} = \frac{\mu_3}{\mu_2^{3/2}}.$$

When
(i) $\gamma_1 = 0$, then the distribution is symmetrical.

(ii) $\gamma_1 > 0$, then the distribution is positively skewed.
(iii) $\gamma_1 < 0$, then the distribution is negatively skewed.

It is worth mentioning that the rth corrected moment of a distribution, denoted by $\mu_r$, is defined as

$$\mu_r = \frac{1}{n}\sum_{i=1}^{k} f_i (x_i - \bar{x})^r, \quad \text{i.e.,} \quad \mu_2 = \frac{1}{n}\sum_{i=1}^{k} f_i (x_i - \bar{x})^2 \quad \text{and} \quad \mu_3 = \frac{1}{n}\sum_{i=1}^{k} f_i (x_i - \bar{x})^3$$

KURTOSIS

If we know the measures of central tendency, dispersion and skewness, we still cannot form a complete idea about the distribution. In addition to these measures we should know one more measure, which Prof. Karl Pearson calls the 'convexity of a curve' or kurtosis.

Kurtosis enables us to have an idea about the flatness or peakedness of the curve. It is measured by the co-efficient $\beta_2$, given by

$$\beta_2 = \frac{\mu_4}{\mu_2^2},$$

and also by $\gamma_2$, which is derived from $\beta_2$ and given by

$$\gamma_2 = \beta_2 - 3.$$

When
(i) $\gamma_2 = 0$, then the curve is mesokurtic.
(ii) $\gamma_2 > 0$, then the curve is leptokurtic.
(iii) $\gamma_2 < 0$, then the curve is platykurtic.
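
The moment-based measures of skewness and kurtosis can be computed directly from the definition of the corrected moments given earlier; the sketch below uses hypothetical data.

# Central ("corrected") moments and the shape measures gamma_1 and gamma_2.
from statistics import mean

def central_moment(values, r):
    xbar = mean(values)
    return sum((x - xbar) ** r for x in values) / len(values)

data = [2, 3, 3, 4, 5, 6, 9, 12, 18]

mu2 = central_moment(data, 2)
mu3 = central_moment(data, 3)
mu4 = central_moment(data, 4)

gamma1 = mu3 / mu2 ** 1.5        # skewness: > 0 means positively skewed
gamma2 = mu4 / mu2 ** 2 - 3      # excess kurtosis: > 0 means leptokurtic
print(f"gamma_1 = {gamma1:.3f}, gamma_2 = {gamma2:.3f}")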
