Lecture 2b Brief Lecture Notes On Measures of Dispersion (Variability)

Lecture 2b

Brief lecture notes on measures of dispersion (variability)

Definitions
Dispersion is a measure of the variation among the items in a data set.
Dispersion measures the extent to which the individual items vary.
The characteristics of an ideal measure of dispersion are the same as those of an
ideal average. In brief:
It should be simple to understand and easy to compute
It should be rigidly defined
It should be based on all the observations
It should be suitable for further algebraic treatments
It should not be affected by extreme values.
Importance/ Purpose of Measuring Variation
To test the reliability of an average
To serve as a basis for control of variability
To compare two or more series with regard to their variability
To provide a basis for further statistical analysis
A measure of dispersion conveys information regarding the amount of
variability present in a set of data. If all the values are the same, there is no
dispersion; if they are not all the same, dispersion is present in the data. The
amount of dispersion is small when the values, though different, are close
together, and large when they are widely scattered.

Example:
Let us consider a simple example to show why a measure of dispersion is so
important.
Consider two groups each of 6 students with their scores in a particular
examination:
Group-I: 48 50 52 51 49 50
Group-II: 1 2 100 99 98 0
The arithmetic mean for each group is 50. It is apparent from the data that
the first group consists of students who all score at or near the average,
while the second group is made up of very high and very low scorers.
So, it is necessary to know how the values of the variate are dispersed about the
central value. Dispersion is the deviation of the values of a variate from its
central value. It measures the degree of variability of the values of the variate
among themselves.
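
The example above can be checked with a minimal Python sketch. The variable names are illustrative and not part of the notes; the point is only that both groups share the same mean while their spread differs sharply.

```python
from statistics import mean

# Scores of the two groups from the example above.
group_1 = [48, 50, 52, 51, 49, 50]
group_2 = [1, 2, 100, 99, 98, 0]

for name, scores in (("Group-I", group_1), ("Group-II", group_2)):
    spread = max(scores) - min(scores)  # largest minus smallest score
    print(f"{name}: mean = {mean(scores)}, largest - smallest = {spread}")
```

The means are equal, so the mean alone cannot distinguish the two groups; a measure of dispersion is needed.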

Different Measures of Dispersion
There are two types of measures of dispersion:
(a) Absolute measures
(b) Relative measures
(a) Absolute measures are of four types
Range
Quartile deviation (or semi-inter-quartile range)
Mean deviation
Standard deviation.
(b) Relative measures are of three types:
Coefficient of Quartile deviation
Coefficient of Mean deviation
Coefficient of variation.

Range

Finding Range for Ungrouped Data

Range = Largest value – Smallest Value

Table-1 gives the total areas, in square miles, of the four West South Central
states of the United States.
Find the range for this data set.

Table-1

State        Total Area (square miles)
Arkansas      53,182
Louisiana     49,651
Oklahoma      69,903
Texas        267,277

Solution:
Range = Largest value – Smallest Value
= 267,277 – 49,651
= 217,626 square miles

Thus, the total areas of these four states are spread over a range of 217,626
square miles.

Disadvantages
The range, like the mean, has the disadvantage of being influenced by
outliers.
Its calculation is based on two values only: the largest and the smallest.
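
As a minimal sketch of the outlier problem, the Python snippet below recomputes the range of the Table-1 areas with and without Texas. The helper name `value_range` is hypothetical, chosen only for this illustration.

```python
# Total areas (square miles) from Table-1.
areas = {
    "Arkansas": 53_182,
    "Louisiana": 49_651,
    "Oklahoma": 69_903,
    "Texas": 267_277,
}

def value_range(values):
    """Range = largest value - smallest value."""
    values = list(values)
    return max(values) - min(values)

full_range = value_range(areas.values())
range_without_texas = value_range(
    area for state, area in areas.items() if state != "Texas"
)

print(full_range)           # range of all four states
print(range_without_texas)  # range after dropping the largest state
```

Dropping the single largest value shrinks the range from 217,626 to 20,252 square miles, which shows how strongly one extreme observation drives this measure.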

Quartile

Definition
Quartiles are three summary measures that divide a ranked data set into four
equal parts. The second quartile is the same as the median of a data set. The first
quartile is the value of the middle term among the observations that are less than
the median, and the third quartile is the value of the middle term among the
observations that are greater than the median.

Figure-1: Quartiles. Q1, Q2 and Q3 divide a data set arranged in increasing
order into four portions, each containing 25% of the observations.

Calculating Inter-quartile Range

The difference between the third and first quartiles gives the inter-quartile
range; that is,
IQR = Inter-quartile range = Q3 – Q1

Example

The following are the ages of nine employees of an insurance company:


47 28 39 51 33 37 59 24 33

(a) Find the values of the three quartiles. Where does the age of 28 fall in
relation to the ages of the employees?
(b) Find the inter-quartile range.

Solution

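(a) The ages arranged in increasing order are:

24 28 33 33 37 39 47 51 59

Values less than the median: 24 28 33 33
Values greater than the median: 39 47 51 59

$$Q_1 = \frac{28 + 33}{2} = 30.5, \qquad Q_2 = 37, \qquad Q_3 = \frac{47 + 51}{2} = 49$$

The age of 28 falls in the lowest 25% of the ages.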

(b) IQR = Inter-quartile range = Q3 – Q1


= 49 – 30.5
= 18.5 years
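
The quartile method defined above (median of the lower half, median, median of the upper half) can be sketched in Python as follows. The function name `quartiles` is illustrative. Other quartile conventions exist and can give slightly different values on other data sets, so this is a sketch of the method used in these notes only.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, Q2, Q3) using the median-of-halves method of these notes."""
    ranked = sorted(data)
    n = len(ranked)
    half = n // 2
    lower = ranked[:half]  # observations below the median
    # For an odd count the median itself belongs to neither half.
    upper = ranked[half + 1:] if n % 2 else ranked[half:]
    return median(lower), median(ranked), median(upper)

ages = [47, 28, 39, 51, 33, 37, 59, 24, 33]
q1, q2, q3 = quartiles(ages)
iqr = q3 - q1
print(q1, q2, q3, iqr)
```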

Variance and Standard Deviation

The standard deviation is the most widely used measure of dispersion.
The value of the standard deviation tells how closely the values of a data
set are clustered around the mean.
In general, a lower value of the standard deviation for a data set indicates
that the values of that data set are spread over a relatively smaller range
around the mean.
In contrast, a larger value of the standard deviation for a data set indicates
that the values of that data set are spread over a relatively larger range
around the mean.

The variance calculated for population data is denoted by σ² (read as sigma
squared), and the variance calculated for sample data is denoted by s².
The standard deviation calculated for population data is denoted by σ, and the
standard deviation calculated for sample data is denoted by s.

Short-cut Formulas for the Variance and Standard Deviation for Ungrouped Data

$$\sigma^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{N}}{N} \qquad \text{and} \qquad s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}$$

where σ² is the population variance and s² is the sample variance. N and n
represent the population size and the sample size, respectively.

The standard deviation is obtained by taking the positive square root of the
variance.

Population standard deviation: $\sigma = \sqrt{\sigma^2}$

Sample standard deviation: $s = \sqrt{s^2}$
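
The short-cut form can be related to the definitional formula, which these notes do not state. Assuming the usual definition $\sigma^2 = \sum (x - \mu)^2 / N$ with $\mu = \sum x / N$, a short derivation of the numerator is:

$$\sum (x - \mu)^2 = \sum x^2 - 2\mu \sum x + N\mu^2 = \sum x^2 - \frac{2(\sum x)^2}{N} + \frac{(\sum x)^2}{N} = \sum x^2 - \frac{(\sum x)^2}{N}$$

Dividing by N gives σ². The same steps with n and the sample mean in place of N and μ, followed by division by n − 1, give s².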

Example: The following table gives the 2002 total payroll (in millions of
dollars) of five MLB teams. Find the variance and standard deviation of these data.

Table

MLB Team                 2002 Total Payroll (millions of dollars)
Anaheim Angels            62
Atlanta Braves            93
New York Yankees         126
St. Louis Cardinals       75
Tampa Bay Devil Rays      34

Solution

x x²
62 3844
93 8649
126 15,876
75 5625
34 1156
∑x = 390 ∑x² = 35,150
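
$$s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1} = \frac{35{,}150 - \frac{(390)^2}{5}}{5 - 1} = \frac{35{,}150 - 30{,}420}{4} = 1182.50$$

$$s = \sqrt{1182.50} = 34.387498 \text{ (millions of dollars)} = \$34{,}387{,}498$$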

Thus, the standard deviation of the 2002 payrolls of these five MLB teams is
$34,387,498.
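
The arithmetic of this solution can be reproduced with a minimal Python sketch of the short-cut formula. The function name `shortcut_variance` and its `sample` flag are assumptions made for this illustration, not names from the notes.

```python
import math

def shortcut_variance(values, sample=True):
    """Variance via the short-cut formula: (sum x^2 - (sum x)^2 / n) / divisor."""
    n = len(values)
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    numerator = sum_x2 - sum_x ** 2 / n
    # Sample data divide by n - 1, population data divide by n.
    return numerator / (n - 1 if sample else n)

payrolls = [62, 93, 126, 75, 34]  # millions of dollars
s2 = shortcut_variance(payrolls)
s = math.sqrt(s2)
print(s2, round(s, 6))
```

Because the payrolls are recorded in millions of dollars, the standard deviation is also in millions of dollars, while the variance is in squared millions of dollars.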

Observations

The values of the variance and the standard deviation are never negative.
The measurement units of variance are always the square of the
measurement units of the original data.

Example
The following data are the 2002 earnings (in thousands of dollars) before taxes
for all six employees of a small company.
48.50 38.40 65.50 22.60 79.80 54.60
Calculate the variance and standard deviation for these data.

Solution

x x²
48.50 2352.25
38.40 1474.56
65.50 4290.25
22.60 510.76
79.80 6368.04
54.60 2981.16
∑x = 309.40 ∑x² = 17,977.02

$$\sigma^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{N}}{N} = \frac{17{,}977.02 - \frac{(309.40)^2}{6}}{6} = 337.0489$$

$$\sigma = \sqrt{337.0489} = \$18.359 \text{ thousand} = \$18{,}359$$

Thus, the standard deviation of the 2002 earnings of all six employees of this
company is $18,359.
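
As a cross-check, Python's standard library provides population versions of both measures. The sketch below uses `statistics.pvariance` and `statistics.pstdev`, which are appropriate here because the data cover all six employees.

```python
from statistics import pstdev, pvariance

# 2002 earnings before taxes, in thousands of dollars, for all six employees.
earnings = [48.50, 38.40, 65.50, 22.60, 79.80, 54.60]

sigma2 = pvariance(earnings)  # population variance (divides by N)
sigma = pstdev(earnings)      # population standard deviation

print(round(sigma2, 4), round(sigma, 3))
```

The standard deviation comes out in thousands of dollars, so a value of about 18.359 corresponds to $18,359.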

Comments
A numerical measure such as the mean, median, mode, range, variance, or
standard deviation calculated for a population data set is called a
population parameter, or simply a parameter.
A summary measure calculated for a sample data set is called a sample
statistic, or simply a statistic.

Variance and Standard Deviation for Grouped Data

Short-Cut Formulas for the Variance and Standard Deviation for Grouped Data

$$\sigma^2 = \frac{\sum x^2 f - \frac{(\sum xf)^2}{N}}{N} \qquad \text{and} \qquad s^2 = \frac{\sum x^2 f - \frac{(\sum xf)^2}{n}}{n - 1}$$

where σ² is the population variance, s² is the sample variance, x is the midpoint
of a class and f is the frequency of that class. N and n are the population size
and the sample size, respectively.

Monthly Income (Tk.)   Midpoint (x)   No. of families (f)   xf           x²f
0-75                   37.5           69                    37.5×69      (37.5)²×69
75-150                 -              167                   -            -
150-225                -              207                   -            -
225-300                -              65                    -            -
300-375                -              58                    -            -
375-450                -              24                    -            -
450-525                487.5          10                    487.5×10     (487.5)²×10
Total                                 n = ∑f                ∑xf =        ∑x²f =
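
The columns of this table can be filled in mechanically. The following Python sketch builds the midpoints, the xf and x²f sums, and then applies the grouped short-cut formula. The function name `grouped_variance` and the `sample` flag are illustrative assumptions, and whether the 600 families should be treated as a sample or a population is left to the reader.

```python
import math

# Class limits (lower, upper) and frequencies from the monthly income table.
classes = [(0, 75), (75, 150), (150, 225), (225, 300),
           (300, 375), (375, 450), (450, 525)]
frequencies = [69, 167, 207, 65, 58, 24, 10]

def grouped_variance(classes, frequencies, sample=True):
    """Grouped-data variance via the short-cut formula with class midpoints."""
    midpoints = [(lower + upper) / 2 for lower, upper in classes]
    n = sum(frequencies)
    sum_xf = sum(x * f for x, f in zip(midpoints, frequencies))
    sum_x2f = sum(x * x * f for x, f in zip(midpoints, frequencies))
    numerator = sum_x2f - sum_xf ** 2 / n
    return numerator / (n - 1 if sample else n)

s2 = grouped_variance(classes, frequencies)                    # divides by n - 1
sigma2 = grouped_variance(classes, frequencies, sample=False)  # divides by N
print(math.sqrt(s2), math.sqrt(sigma2))
```

With 600 observations the two divisors differ very little, so the sample and population standard deviations are close.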

Quartile deviation for grouped data

N i
− c. f
Q i = Li + 4 i
f

c.f= cumulative frequency of the class just preceding the quartile class
f=frequency of the quartile class
N is the total no. of observations
L is the lower limit of the quartile class, and i=the length of the quartile class.
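
A minimal Python sketch of this interpolation is given below, applied to the monthly income distribution used in these notes. The names `grouped_quartile`, `k` and `width` are hypothetical; in particular `width` stands for the class length so that it is not confused with the quartile number. The quartile class is taken to be the first class whose cumulative frequency reaches kN/4, which is an assumption about how ties at a class boundary are resolved.

```python
classes = [(0, 75), (75, 150), (150, 225), (225, 300),
           (300, 375), (375, 450), (450, 525)]
frequencies = [69, 167, 207, 65, 58, 24, 10]

def grouped_quartile(classes, frequencies, k):
    """Return the k-th quartile (k = 1, 2, 3) of a grouped distribution."""
    if k not in (1, 2, 3):
        raise ValueError("k must be 1, 2 or 3")
    target = k * sum(frequencies) / 4  # position kN/4
    cumulative = 0                     # c.f. of the classes before this one
    for (lower, upper), f in zip(classes, frequencies):
        if f > 0 and cumulative + f >= target:
            width = upper - lower      # length of the quartile class
            return lower + (target - cumulative) / f * width
        cumulative += f
    raise ValueError("frequencies must contain at least one positive count")

q1 = grouped_quartile(classes, frequencies, 1)
q3 = grouped_quartile(classes, frequencies, 3)
quartile_deviation = (q3 - q1) / 2  # semi-inter-quartile range
print(q1, q3, quartile_deviation)
```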

Coefficient of Variation
The coefficient of variation expresses the standard deviation as a percentage of
the mean.

The population coefficient of variation is

$$CV = \frac{\sigma}{\mu} \times 100\% \quad \text{if } \mu > 0$$

The sample coefficient of variation is

$$CV = \frac{s}{\bar{x}} \times 100\% \quad \text{if } \bar{x} > 0$$

If the standard deviations in sales for large and small stores selling similar
goods are compared, the standard deviation for large stores will almost always
be greater. A simple explanation is that a large store could be modeled as a
number of small stores. Comparing variation using the standard deviation would
be misleading. The coefficient of variation overcomes this problem by adjusting
for the scale of units in the population.

Example
The owners are considering purchasing shares of stock A or shares of stock B,
both listed on the New York Stock Exchange. From the closing prices of
both stocks over the last several months, the means and standard deviations were
found to be considerably different, with x̄_A = $4.00, x̄_B = $80.00,
s_A = $2.00 and s_B = $8.00. Should stock A be purchased, since the
standard deviation of stock B is larger?
Solution
We might think that stock B is more volatile than stock A. The mean closing
prices for the two stocks are x̄_A = $4.00 and x̄_B = $80.00. Next, the coefficients of
variation are computed to measure and compare the risk of these competing
investment opportunities:

$$CV_A = \frac{\$2.00}{\$4.00} \times 100\% = 50\% \qquad \text{and} \qquad CV_B = \frac{\$8.00}{\$80.00} \times 100\% = 10\%$$

Notice that, relative to its mean, the market value of stock A fluctuates more
from period to period than does that of stock B.
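
The comparison can be expressed as a small Python sketch. The function name `coefficient_of_variation` is illustrative, and the check for a positive mean mirrors the condition attached to the definition above.

```python
def coefficient_of_variation(std_dev, mean):
    """Standard deviation as a percentage of the mean (mean must be positive)."""
    if mean <= 0:
        raise ValueError("the coefficient of variation requires a positive mean")
    return 100.0 * std_dev / mean

cv_a = coefficient_of_variation(2.00, 4.00)   # stock A
cv_b = coefficient_of_variation(8.00, 80.00)  # stock B
print(f"CV_A = {cv_a:.0f}%, CV_B = {cv_b:.0f}%")
```

Although stock B has the larger standard deviation in dollars, stock A has the larger coefficient of variation, so it is the more variable of the two in relative terms.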

Assignment on lecture 3
1. The following table gives the distribution of monthly income of 600 middle class families
in a city-
Monthly Income (Tk.) No. of families
0-75 69
75-150 167
150-225 207
225-300 65
300-375 58
375-450 24
450-525 10
Calculate the standard deviation and quartile deviation.

2. Calculate Quartile Deviation for the data given below:


Weekly wages 35-36 36-37 37-38 38-39 39-40 40-41 41-42
No. of wages earners 14 20 42 54 45 21 8

3. Find the lower and upper quartiles, of the frequency distribution given below.
Marks in Statistics No. of Students
Below 10 8
10-20 12
20-30 20
30-40 32
40-50 30
50-60 28
60-70 12
Above 70 4

4. The table gives the number of finished articles turned out per day by different
numbers of workers in a factory. Find the mean value and standard deviation of the
daily output of finished articles.
No. of articles 18 19 20 21 22 23 24 25 26 27
No. of workers 3 7 11 14 18 17 13 8 5 4

5. The following are the ages of nine employees of an insurance company:


47 28 39 51 33 37 59 24 33
Find the values of the three quartiles and also the inter-quartile range.

6. The following table gives the 2002 total payroll (in millions of dollars) of five
MLB teams. Find the variance and standard deviation of these data.
MLB Team 2002 Total Payroll (millions of dollars)
Anaheim Angels 62
Atlanta Braves 93
New York Yankees 126
St. Louis Cardinals 75
Tampa Bay Devil Rays 34

7. The following data are the 2002 earnings (in thousands of dollars) before taxes for
all six employees of a small company.
48.50 38.40 65.50 22.60 79.80 54.60
Calculate the variance and standard deviation for these data.

8. Calculate standard deviation for the following distribution giving 300 telephone
calls according to their duration in seconds.
duration (in seconds) No. of calls
0-30 9
30-60 17
60-90 43
90-120 82
120-150 81
150-180 44
180-210 24

