
MODULE 5

MEASURES OF VARIABILITY

Introduction

When we summarize and describe a set of data or a frequency distribution, we
are also interested in reporting how variable the data are, that is, how much
the scores spread out from high to low. For example, two groups of samples,
both with a median age of 25 years, would represent quite different personal
and psychological profiles if one group ranged in age from 18 to 30 years while
the other ranged from 15 to 40 years. As another example, two groups of
children may both have a mean score of 75 on an achievement test, but one group
has scores ranging from 50 to 93 while the other has scores ranging from 60 to
82. A measure of this spread, or dispersion, is an important statistic for
describing a group.

Note: This module presents how to compute the measures of variability using
manual (longhand) calculations, for the purpose of introducing them
conceptually. In the succeeding modules, the use of SPSS is emphasized.

RANGE

A very simple measure of variability is the difference between the highest and
the lowest score; this measure is called the range of the distribution. If in a
reading test, for example, the highest score is 95 and the lowest is 45, the
range is 95 - 45 = 50. However, the range depends only upon the two extreme
scores in the total group. This makes it a very unreliable measure, because it
can change considerably with the inclusion or omission of a single extreme
case. The example below illustrates how a single extreme score affects the
range: the range for Group 1 is 95 - 45 = 50, while the range for Group 2 is
95 - 25 = 70.

Group 1 45 50 76 77 80 81 90 95
Group 2 25 50 76 77 80 81 90 95
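
As a quick check of the two groups above, the range can be computed in a few lines of Python. This is only an illustrative sketch; the function name score_range is an assumption made for this example.

```python
def score_range(scores):
    """Range = highest score minus lowest score."""
    return max(scores) - min(scores)

group_1 = [45, 50, 76, 77, 80, 81, 90, 95]
group_2 = [25, 50, 76, 77, 80, 81, 90, 95]

print(score_range(group_1))  # 50
print(score_range(group_2))  # 70
```

Only the lowest score differs between the two lists, yet the range changes by 20 points.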

Think about this!

Can we use the range to compare two groups' relative performance? Does a
higher value of the range indicate lower performance? Why or why not?

SEMI-INTERQUARTILE RANGE OR QUARTILE DEVIATION

Another measure of variability is the range of scores that includes a
specified part of the total group, usually the middle fifty percent. The middle
fifty percent of the group are the scores lying between the 25th and 75th
percentiles. The 25th (Q1) and 75th (Q3) percentiles are called quartiles,
since they cut off the bottom quarter and the top quarter of the group
respectively. The score distance between them is called the interquartile
range. The statistic that is often reported as a measure of variability is the
semi-interquartile range (Q), which is half of the interquartile range.

So, $Q = \frac{Q_3 - Q_1}{2}$

Finding First Quartile (Q1 ) and Third Quartile (Q3)

For Ungrouped scores

Example

56 57 63 75 78 79 80 82 87 89 90 92

Position of Q1 = (n + 1)/4, where n is the number of scores
               = (12 + 1)/4
               = 3.25

Q1 = 3rd score from the lowest + .25 (4th score - 3rd score)
   = 63 + .25 (75 - 63)
   = 66

Similarly,

Position of Q3 = 3(n + 1)/4
               = 3(12 + 1)/4
               = 9.75

Q3 = 9th score from the lowest + .75 (10th score - 9th score)
   = 87 + .75 (89 - 87)
   = 88.5

Therefore: $Q = \frac{Q_3 - Q_1}{2} = \frac{88.5 - 66}{2} = 11.25$
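
The position-and-interpolation procedure for ungrouped scores can be sketched in Python. This is a minimal illustration under the (n + 1)/4 position rule used in this module; other textbooks and software may use different position rules. The function name quantile_by_position is an assumption made for this example.

```python
def quantile_by_position(scores, fraction):
    """Value at position fraction * (n + 1), interpolating between neighbours."""
    ordered = sorted(scores)
    n = len(ordered)
    position = fraction * (n + 1)      # 1-based position, e.g. 3.25
    whole = int(position)              # e.g. 3  -> the 3rd score
    part = position - whole            # e.g. .25 of the way to the 4th score
    if whole < 1:                      # position falls before the first score
        return ordered[0]
    if whole >= n:                     # position falls at or after the last score
        return ordered[-1]
    lower, upper = ordered[whole - 1], ordered[whole]
    return lower + part * (upper - lower)

scores = [56, 57, 63, 75, 78, 79, 80, 82, 87, 89, 90, 92]
q1 = quantile_by_position(scores, 0.25)
q3 = quantile_by_position(scores, 0.75)
print(q1)             # 66.0
print(q3)             # 88.5
print((q3 - q1) / 2)  # 11.25
```
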
For Grouped Scores

Here is a set of scores from a class of 50 students. The scores have already
been summarized into a frequency distribution.

X        f    cf
95-99    3    50
90-94    4    47
85-89    5    43
80-84    8    38   <- class containing Q3
75-79    6    30
70-74   10    24
65-69    4    14   <- class containing Q1
60-64    4    10
55-59    2     6
50-54    0     4
45-49    1     4
40-44    3     3

$Q_1 = L_{Q1} + i\left[\frac{N/4 - cf}{f_{Q1}}\right]$

Where : Q1 = the first quartile (25th percentile)
LQ1 = exact lower limit of the class containing the first quartile
N = total number of frequencies in the distribution
cf = cumulative frequency below the class containing the first quartile
fQ1 = frequency of the class containing the first quartile
i = size of the class interval

Note: Just like in finding the median of a set of grouped scores, the first step is to
get the cumulative frequency. In the formula, N is divided by 4 (N/4) because we are
looking for the score at the middle of the lower half of the distribution. The values of
cf, f and the lower limit are found by the same procedure as for the median.

So, Q1 = 64.5 + 5 [(50/4 - 10)/4]
       = 64.5 + 5 [(12.5 - 10)/4]
       = 64.5 + 5 (2.5/4)
       = 64.5 + 5 (.63)
       = 64.5 + 3.15
       = 67.65

Similarly, we use the same formula in finding Q3. The only difference is to get ¾
of N, because we are looking for the score that lies at the middle of the top half of the
distribution.

$Q_3 = L_{Q3} + i\left[\frac{3N/4 - cf}{f_{Q3}}\right]$

Where : Q3 = the third quartile (75th percentile)
LQ3 = exact lower limit of the class containing the third quartile
N = total number of frequencies in the distribution
cf = cumulative frequency below the class containing the third quartile
fQ3 = frequency of the class containing the third quartile
i = size of the class interval

So, Q3 = 79.5 + 5 [(3(50)/4 - 30)/8]
       = 79.5 + 5 [(37.5 - 30)/8]
       = 79.5 + 5 (7.5/8)
       = 79.5 + 5 (.94)
       = 79.5 + 4.7
       = 84.2

Therefore: Semi-interquartile range $Q = \frac{Q_3 - Q_1}{2} = \frac{84.2 - 67.65}{2} = 8.28$
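
The grouped-score quartile formula can be checked with a short Python sketch. The function name grouped_quantile and its parameter names are assumptions made for this example; the inputs are read off the frequency distribution above. Because the code does not round intermediate steps, its results differ slightly from the hand calculation, which rounds 2.5/4 to .63 and 7.5/8 to .94.

```python
def grouped_quantile(lower_limit, width, n, cf_below, freq, fraction):
    """L + i * ((fraction * N - cf) / f) for the class containing the quantile."""
    return lower_limit + width * ((fraction * n - cf_below) / freq)

# Q1 lies in class 65-69: lower limit 64.5, cf below = 10, f = 4
q1 = grouped_quantile(64.5, 5, 50, 10, 4, 0.25)
# Q3 lies in class 80-84: lower limit 79.5, cf below = 30, f = 8
q3 = grouped_quantile(79.5, 5, 50, 30, 8, 0.75)
semi_iqr = (q3 - q1) / 2

print(q1)        # 67.625
print(q3)        # 84.1875
print(semi_iqr)  # 8.28125
```
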
DECILES

Deciles are the scores that divide the distribution into ten equal parts. Just
like Q1 and Q3, a decile is computed by first determining the number of cases
that must fall below it. For example, for the 1st decile (D1) we take N/10,
for D2 we take 2N/10, for D3 we take 3N/10, and so on.

For Ungrouped Data

Example. Find the 3rd Decile (D3) and 8th Decile (D8)

56 57 63 75 78 79 80 82 87 89 90 92 95 96 97

Position of D3 = 3n/10
               = 3(15)/10
               = 4.5

If the answer is not a whole number, round up to the next whole number, and
that is the position of the decile. In this example, 4.5 rounds up to 5. Thus,
the 3rd decile (D3) is the 5th score from the lowest = 78.

Position of D8 = 8n/10
               = 8(15)/10
               = 12

If the answer is a whole number, get the average of the score at that position
in your data set and the score that directly follows it. In this example, the
8th decile is the average of the 12th and the 13th scores. That is,
(92 + 95)/2 = 93.5
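
The two rules for ungrouped scores (round up when the position is fractional, average two neighbours when it is whole) can be sketched as follows. The function name and its parameters are assumptions made for this example; integer arithmetic is used so that a whole-number position is detected exactly. The sketch assumes 0 < k < parts.

```python
def quantile_round_up_rule(scores, k, parts):
    """k-th of `parts` equal divisions (parts=10 for deciles, 100 for percentiles)."""
    ordered = sorted(scores)
    # position = k * n / parts, split into whole part and remainder
    whole, remainder = divmod(k * len(ordered), parts)
    if remainder == 0:
        # whole-number position: average that score and the one after it
        return (ordered[whole - 1] + ordered[whole]) / 2
    # fractional position: round up to position whole + 1 (list index `whole`)
    return ordered[whole]

scores = [56, 57, 63, 75, 78, 79, 80, 82, 87, 89, 90, 92, 95, 96, 97]
print(quantile_round_up_rule(scores, 3, 10))  # 78
print(quantile_round_up_rule(scores, 8, 10))  # 93.5
```
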
For Grouped Scores

Just like in finding the quartiles of a set of grouped scores, the first step is to get the
cumulative frequency. The same procedure is applied in looking for the values of cf, f
and the lower limit.

Example: Look for D2

X        f    cf
75-77    3    50
72-74    5    47
69-71    2    42
66-68    4    40
63-65    3    36
60-62    0    33
57-59    2    33
54-56    4    31
51-53    6    27
48-50    3    21
45-47    4    18
42-44    6    14   <- class containing D2
39-41    5     8
36-38    1     3
33-35    2     2
N=50

We are looking for the 2nd decile, the score below which 20 percent of the
cases fall. Then

$D_2 = L_{D2} + i\left[\frac{2N/10 - cf}{f_{D2}}\right]$

     = 41.5 + 3 [((.20 x 50) - 8)/6]
     = 41.5 + 3 [(10 - 8)/6]
     = 41.5 + 1.00
     = 42.50

PERCENTILES

The same procedure may be used to find the score below which any percentage
of the group falls. These values are called percentiles. The median is the 50th
percentile, i.e., the score below which 50 percent of individuals fall. If we
want to find the 40th percentile, we must find the score below which 40 percent
of the cases fall. Any other percentile can be found in the same way.
Percentiles have many uses, especially in connection with test norms and the
interpretation of scores.

For Ungrouped Data

Example. Find the 45th Percentile (P45) and 60th Percentile (P60)

56 57 63 75 78 79 80 82 87 89 90 92 95 96 97

Position of P45 = 45n/100
                = 45(15)/100
                = 6.75

The procedure for finding the deciles of ungrouped scores also applies to
percentiles of ungrouped scores. That is, if the answer is not a whole number,
round up to the next whole number, and that is the position of the percentile.
In this example, 6.75 rounds up to 7. Thus, the 45th percentile (P45) is the
7th score = 80.

Position of P60 = 60n/100
                = 60(15)/100
                = 9

Just like the deciles, if the answer is a whole number, get the average of the
score at that position in your data set and the score that directly follows it.
In this example, the 60th percentile is the average of the 9th and the 10th
scores. That is, (87 + 89)/2 = 88.

For Grouped Scores

Just like in finding the quartiles and deciles of a set of grouped scores, the first
step is to get the cumulative frequency. The same procedure is applied in looking for
the values of cf, f and the lower limit.

Example: Find P20

X        f    cf
75-77    3    50
72-74    5    47
69-71    2    42
66-68    4    40
63-65    3    36
60-62    0    33
57-59    2    33
54-56    4    31
51-53    6    27
48-50    3    21
45-47    4    18
42-44    6    14   <- class containing P20
39-41    5     8
36-38    1     3
33-35    2     2
N=50

We are looking for the 20th percentile, the score below which 20 percent of
the cases fall. Then

$P_{20} = L_{P20} + i\left[\frac{20\%N - cf}{f_{P20}}\right]$

        = 41.5 + 3 [((.20 x 50) - 8)/6]
        = 41.5 + 3 [(10 - 8)/6]
        = 41.5 + 1.00
        = 42.50

Note: The values of P20 and D2 are the same because in either measure we are
looking for the score below which 20 percent of the cases fall.
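
The whole grouped procedure (accumulate frequencies, locate the class, then interpolate) can be sketched in Python. The function name grouped_percentile is an assumption made for this example. The sketch also assumes integer score limits, so that the exact lower limit of a class is its lowest score minus 0.5 and the interval size is highest - lowest + 1.

```python
def grouped_percentile(classes, percent):
    """classes: list of (lowest score, highest score, frequency), lowest class first."""
    n = sum(f for _, _, f in classes)
    target = percent * n / 100           # number of cases that must fall below
    cf_below = 0                         # cumulative frequency below current class
    for low, high, f in classes:
        if f > 0 and cf_below + f >= target:
            lower_limit = low - 0.5      # exact lower limit, e.g. 41.5
            width = high - low + 1       # interval size, e.g. 3
            return lower_limit + width * (target - cf_below) / f
        cf_below += f
    raise ValueError("percent must be between 0 and 100")

classes = [
    (33, 35, 2), (36, 38, 1), (39, 41, 5), (42, 44, 6), (45, 47, 4),
    (48, 50, 3), (51, 53, 6), (54, 56, 4), (57, 59, 2), (60, 62, 0),
    (63, 65, 3), (66, 68, 4), (69, 71, 2), (72, 74, 5), (75, 77, 3),
]
p20 = grouped_percentile(classes, 20)    # also D2, since 2N/10 = 20% of N
print(p20)  # 42.5
```
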
The Variance (Ungrouped Data)

The variance is a measure of variability based on all the scores in the
distribution, rather than on the extreme scores or only a proportion of the
scores. It considers each observation relative to the mean of the set of
scores. It is obtained by dividing the sum of the squared deviations from the
mean by n - 1 (for the sample variance) or by N (for the population variance).

Sample variance: $s^2 = \frac{\sum d^2}{n - 1}$

where:
$d$ = deviation from the mean (score minus mean)
$d^2$ = squared deviation
$\sum d^2$ = sum of the squared deviations

Example 1. Compute the variance of the following Algebra scores of ten
students:
92 75 85 83 90 73 79 80 88 85

Score     d     d²
92       +9     81
75       -8     64
85       +2      4
83        0      0
90       +7     49
73      -10    100
79       -4     16
80       -3      9
88       +5     25
85       +2      4
N = 10          ∑d² = 352
Mean = 83

Steps:
1. Find the Mean (Mean = 83)
2. Subtract the Mean from each score to get d (i.e. 92 - 83 = 9; 75 - 83 = -8, etc.)
3. Square each deviation (i.e. 9² = 81, (-8)² = 64, etc.)
4. Find the sum of the squared deviations (∑d² = 352)
5. Divide the sum of the squared deviations by n - 1 = 9

$s^2 = \frac{\sum d^2}{n - 1} = \frac{352}{9} = 39.11$

Example 2. Compute the variance of the following Geometry scores of the ten
students: 92 95 75 63 45 87 99 90 98 86

Score     d     d²
92       +9     81
95      +12    144
75       -8     64
63      -20    400
45      -38  1,444
87       +4     16
99      +16    256
90       +7     49
98      +15    225
86       +3      9
N = 10          ∑d² = 2,688
Mean = 83

Sample variance: $s^2 = \frac{\sum d^2}{n - 1} = \frac{2{,}688}{9} = 298.67$

Take note that the two examples have the same value for the Mean but with different
values for the variance. The variance for the second example (Geometry scores) is
larger because the scores are more dispersed/scattered from the mean, thus higher
variability.
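
The five steps can be mirrored in Python and cross-checked against the standard library. This is a sketch for verification only; the function name sample_variance is an assumption made for this example, while statistics.variance is the standard library's sample variance (divisor n - 1).

```python
import statistics

def sample_variance(scores):
    """Sum of squared deviations from the mean, divided by n - 1."""
    mean = sum(scores) / len(scores)                       # step 1
    squared_deviations = [(x - mean) ** 2 for x in scores] # steps 2 and 3
    return sum(squared_deviations) / (len(scores) - 1)     # steps 4 and 5

algebra = [92, 75, 85, 83, 90, 73, 79, 80, 88, 85]
geometry = [92, 95, 75, 63, 45, 87, 99, 90, 98, 86]

print(round(sample_variance(algebra), 2))       # 39.11
print(round(sample_variance(geometry), 2))      # 298.67
print(round(statistics.variance(algebra), 2))   # 39.11
```
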
The Standard Deviation (Ungrouped Data)

The standard deviation gives a better idea of how the data entries differ from the
mean, because it is expressed in the same units as the scores. It is computed by
extracting the square root of the variance. The formula for the sample standard
deviation is:

$s = \sqrt{\frac{\sum d^2}{n - 1}}$ or $s = \sqrt{s^2}$

Thus, in Example 1 (Algebra scores), the sample standard deviation is
$\sqrt{39.11} = 6.25$. In Example 2 (Geometry scores), the sample standard
deviation is $\sqrt{298.67} = 17.28$.

So how do we interpret the standard deviations of 6.25 and 17.28? For the
Algebra scores, it means that the scores typically lie about 6.25 points away from the
mean. For the Geometry scores, the typical distance of the scores from the mean is
about 17.28 points. Standard deviation and variance both describe how scattered the
scores are around a central point (the mean). In layman's terms, the higher the value of
the standard deviation or variance, the more the scores scatter from the mean, that is,
the larger the distances of the scores from the mean. In the two given examples, the
average scores are the same (Mean = 83), and the number of scores is also the same
(n = 10), but the Geometry scores are farther away from each other and from the mean
than the Algebra scores.

An alternative way of computing the standard deviation is to use the sum of
all the scores and the sum of their squares. The formula is:

$s = \sqrt{\frac{n\sum X_i^2 - \left(\sum X_i\right)^2}{n(n - 1)}}$

where: Xi = the ith observed value for the given variable X
n = sample size

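Using Example 1

Xi (scores)    Xi²
92             8,464
75             5,625
85             7,225
83             6,889
90             8,100
73             5,329
79             6,241
80             6,400
88             7,744
85             7,225
∑Xi = 830      ∑Xi² = 69,242

Steps:
1. Get the sum of the scores.
2. Square all the scores and get the sum of the squares.
3. Substitute these sums into the formula:

$s = \sqrt{\frac{10(69{,}242) - (830)^2}{10(9)}} = \sqrt{\frac{692{,}420 - 688{,}900}{90}} = \sqrt{\frac{3{,}520}{90}} = \sqrt{39.11} = 6.25$

In this illustration, the variance s² is 39.11, the value under the square
root (squaring the rounded 6.25 gives only 39.06).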


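Using Example 2

Xi (scores)    Xi²
92             8,464
95             9,025
75             5,625
63             3,969
45             2,025
87             7,569
99             9,801
90             8,100
98             9,604
86             7,396
∑Xi = 830      ∑Xi² = 71,578

$s = \sqrt{\frac{10(71{,}578) - (830)^2}{10(9)}} = \sqrt{\frac{715{,}780 - 688{,}900}{90}} = \sqrt{\frac{26{,}880}{90}} = \sqrt{298.67} = 17.28$

In this illustration, the variance s² is 298.67, the value under the square
root (squaring the rounded 17.28 gives only 298.60).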
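
The sums-based computation can be sketched in Python. The sketch assumes the usual computational form, in which n times the sum of squares minus the square of the sum is divided by n(n - 1); this is algebraically equivalent to the deviation formula. The function name stdev_from_sums is an assumption made for this example.

```python
import math

def stdev_from_sums(scores):
    """Sample SD from the sum of scores and the sum of squared scores."""
    n = len(scores)
    sum_x = sum(scores)                   # sum of the scores
    sum_x2 = sum(x * x for x in scores)   # sum of the squared scores
    return math.sqrt((n * sum_x2 - sum_x ** 2) / (n * (n - 1)))

algebra = [92, 75, 85, 83, 90, 73, 79, 80, 88, 85]
geometry = [92, 95, 75, 63, 45, 87, 99, 90, 98, 86]

print(round(stdev_from_sums(algebra), 2))   # 6.25
print(round(stdev_from_sums(geometry), 2))  # 17.28
```

Only two running totals are needed, which is why this form is convenient for longhand work.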

Finding the Standard Deviation and Variance of Grouped Data

The standard deviation and variance of grouped data are calculated either
from the class marks of the step intervals or from coded deviations.

1. Finding SD using the Class Marks, the formula is:

$s = \sqrt{\frac{n\sum f(CM)^2 - \left(\sum fCM\right)^2}{n(n - 1)}}$

where: n = number of samples (total frequency)
f = frequency (number of observations in each class interval)
CM = class mark (the midpoint of each class interval)
$\sum f(CM)^2$ = summation of frequency x class mark squared
$\sum fCM$ = summation of frequency x class mark

Illustration:

X        f    Class Mark (CM)    f x CM          f x (CM)²
75-77    3    76 = (75+77)/2     228 = 3 x 76    17,328 = 3 x 76²
72-74    4    73                 292             21,316
69-71    6    70                 420             29,400
66-68    5    67                 335             22,445
63-65    8    64                 512             32,768
60-62    9    61                 549             33,489
57-59    5    58                 290             16,820
54-56    8    55                 440             24,200
51-53    3    52                 156              8,112
48-50    2    49                  98              4,802
45-47    2    46                  92              4,232
N=55                             ∑fCM = 3,412    ∑f(CM)² = 214,912

$s = \sqrt{\frac{55(214{,}912) - (3{,}412)^2}{55(54)}} = \sqrt{\frac{11{,}820{,}160 - 11{,}641{,}744}{2{,}970}} = \sqrt{60.07} = 7.75$

The variance (s²) in this data set is 60.07, the value under the square root.
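
The class-mark computation can be reproduced with a short Python sketch. It assumes the n(n - 1) divisor form of the formula, which reproduces the result of 7.75; the variable names are chosen only for this example.

```python
import math

# (lowest score, highest score, frequency) for each class interval
classes = [
    (75, 77, 3), (72, 74, 4), (69, 71, 6), (66, 68, 5), (63, 65, 8),
    (60, 62, 9), (57, 59, 5), (54, 56, 8), (51, 53, 3), (48, 50, 2),
    (45, 47, 2),
]

n = sum(f for _, _, f in classes)
# class mark = midpoint of the interval, e.g. (75 + 77) / 2 = 76
sum_f_cm = sum(f * (low + high) / 2 for low, high, f in classes)
sum_f_cm2 = sum(f * ((low + high) / 2) ** 2 for low, high, f in classes)

variance = (n * sum_f_cm2 - sum_f_cm ** 2) / (n * (n - 1))
sd = math.sqrt(variance)

print(n, sum_f_cm, sum_f_cm2)  # 55 3412.0 214912.0
print(round(variance, 2))      # 60.07
print(round(sd, 2))            # 7.75
```
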

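2. Finding the SD using the deviations, the formula is:

$s = i\sqrt{\frac{n\sum fd^2 - \left(\sum fd\right)^2}{n(n - 1)}}$

where: i = interval size
n = number of samples (total frequency)
$\sum fd$ = summation of frequency x deviation
$\sum fd^2$ = summation of frequency x squared deviation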
Steps:
1. Choose any step interval for the assumed mean as the arbitrary starting point
or “origin”. In the example given, the interval 60-62 has been chosen. Call this
interval zero deviation, and the next higher interval +1, the lower interval -1, etc.
These are shown in the column labeled d. (Note: Any interval can be chosen,
and the final result will be the same)
2. Multiply frequency (f) by the number of deviations (d) and the resulting
product is shown in column labeled fd. Get the sum of fd by taking into account
the plus and minus signs.
3. To get fd2, multiply d by the fd. Then get the sum of fd2
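Illustration:

X        f     d     fd    fd²
75-77    3    +5    +15    75
72-74    4    +4    +16    64
69-71    6    +3    +18    54
66-68    5    +2    +10    20
63-65    8    +1     +8     8    (sum of positive fd = +67)
60-62    9     0      0     0
57-59    5    -1     -5     5
54-56    8    -2    -16    32
51-53    3    -3     -9    27
48-50    2    -4     -8    32
45-47    2    -5    -10    50    (sum of negative fd = -48)
N=55          ∑fd = +19    ∑fd² = 367

With i = 3:

$s = 3\sqrt{\frac{55(367) - (19)^2}{55(54)}} = 3\sqrt{\frac{20{,}185 - 361}{2{,}970}} = 3\sqrt{6.6747} = 3(2.5835) = 7.75$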
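
The deviation method, and the claim in Step 1 that any interval may serve as the origin, can be checked with a short Python sketch. The function name sd_coded and its parameters are assumptions made for this example, and the n(n - 1) divisor form is assumed because it reproduces 7.75.

```python
import math

def sd_coded(frequencies, width, origin_index):
    """SD from coded deviations; frequencies are listed lowest interval first."""
    n = sum(frequencies)
    # d = number of intervals above (+) or below (-) the chosen origin
    d = [k - origin_index for k in range(len(frequencies))]
    sum_fd = sum(f * dk for f, dk in zip(frequencies, d))
    sum_fd2 = sum(f * dk * dk for f, dk in zip(frequencies, d))
    return width * math.sqrt((n * sum_fd2 - sum_fd ** 2) / (n * (n - 1)))

# Frequencies from the lowest interval (45-47) up to the highest (75-77)
frequencies = [2, 2, 3, 8, 5, 9, 8, 5, 6, 4, 3]

print(round(sd_coded(frequencies, 3, 5), 2))  # origin 60-62 -> 7.75
print(round(sd_coded(frequencies, 3, 0), 2))  # origin 45-47 -> 7.75
```
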
The Coefficient of Variation

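The coefficient of variation (CV) expresses the standard deviation as a
percentage of the mean, which makes it possible to compare the variability of
two sets of data. The formula is:

$CV = \frac{\text{Standard deviation}}{\text{Mean}} \times 100\%$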

Using Example 1 (Algebra scores), the Standard deviation (s) = 6.25 and Mean = 83

$CV = \frac{6.25}{83} \times 100\% = 7.53\%$

The computed CV of 7.53% indicates that the variability, or the degree of
differences, of the Algebra scores is relatively low (the scores are close to
each other).

Using Example 2 (Geometry scores), the Standard deviation (s) = 17.28 and Mean = 83

$CV = \frac{17.28}{83} \times 100\% = 20.82\%$

The computed CV of 20.82% indicates that the variability of scores is
relatively higher compared to the data set in Example 1. This means that the
Geometry scores fluctuate more than the Algebra scores, or that the Geometry
scores are more variable than the Algebra scores.
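
Both coefficients can be recomputed from the raw scores in Python. This sketch uses the standard library's statistics.stdev (sample standard deviation) and statistics.mean; the function name coefficient_of_variation is an assumption made for this example.

```python
import statistics

def coefficient_of_variation(scores):
    """Sample standard deviation as a percentage of the mean."""
    return statistics.stdev(scores) / statistics.mean(scores) * 100

algebra = [92, 75, 85, 83, 90, 73, 79, 80, 88, 85]
geometry = [92, 95, 75, 63, 45, 87, 99, 90, 98, 86]

print(round(coefficient_of_variation(algebra), 2))   # 7.53
print(round(coefficient_of_variation(geometry), 2))  # 20.82
```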
