(Ibe and Onuoha 2020) Definition of the Sturges Method
Central Tendency
By
ABSTRACT
This work was carried out to verify whether Sturges rule gives better estimates of the measures
of central tendency than frequency distribution tables constructed with fewer or more classes
than Sturges rule prescribes, and to test whether there is a significant difference between the
mean values of ungrouped and grouped data. Five sets of data collected from different textbooks
were used to obtain the mean, median and mode in three ways: without grouping the data,
grouping the data with the number of classes given by Sturges rule, and grouping the data with
fewer and more classes than Sturges rule gives. In addition, tests of the difference between
two means were conducted to check whether there is any significant difference between the means
from ungrouped data and those from grouped data constructed with the number of classes obtained
using Sturges rule and otherwise. The analysis shows that data grouped according to Sturges rule
do not always give the best estimates when compared with data grouped without using Sturges
rule. However, there is no significant difference between the means obtained from ungrouped
data and from data grouped with the number of classes obtained using Sturges rule.
It is recommended that students, lecturers and researchers adopt whichever method is most
convenient, as there are no significant differences in the results obtained using the various
methods.
INTRODUCTION
Descriptive statistics is concerned with the collection, processing, summarizing, presentation
and interpretation of qualitative and quantitative data. The reliability of the data collected
is important because the accuracy of the results of the analysis depends on it; hence, efforts
are made to eliminate bias during data collection and when the data are being analyzed to unveil
the information contained in them. The analysis of data can be done with or without grouping
the observations into classes, depending on the size of the data. This study focuses on the
verification of Sturges rule, a rule for determining the desirable number of groups into which
a distribution of observations should be classified; the number of groups or classes is
$1 + 3.322 \log_{10} n$, where $n$ is the number of observations. The rule is used to obtain the
approximate number of classes and the magnitude of the class interval for constructing a
frequency distribution of a set of data. Sturges (1926), in his paper “The Choice of a Class
Interval”, designed a rule for obtaining the approximate number $k$ of classes and the magnitude
$c$ of the class interval as $k = 1 + 3.322 \log_{10} n$ and $c = R / k$, where $R$ is the range
of the data.
This study was undertaken because researchers often organize data, especially large data sets,
into a frequency distribution table without considering whether the differences between
estimates from the grouped data and those from the ungrouped (raw) data are significant. The
study will help researchers to know which method (rule) of constructing a frequency distribution
gives results closest to those from the raw data; it will also make clear whether there is any
significant difference between the estimates from raw data and from grouped data.
Many researchers have carried out research on related topics. In determining the number of
class intervals, many authors recommend using judgment and common sense, two ambiguous terms
that lead, in turn, to a large number of plausible solutions. Among these authors are Anderson,
Sweeney, and Williams (2004), Aron and Aron (2003), Fox, Levin, and Harkins (1993), Glasnapp
and Poggio (1985), Hoaglin et al. (1983), Jaeger (1990), Moore (2000) and Ravid (1994). Spiegel
and Stephens (1998) defined raw data or ungrouped data as observed or collected values in their
original form.
Shavelson et al. (2000) defined a grouped data frequency distribution as a table listing scores
grouped into non-overlapping class intervals of equal size or equal width (rather than
individual scores) along with the frequency of scores falling into each class interval.
Harris (1998) described the grouped data frequency distribution as used almost exclusively when
the measurement is an interval-level or ratio-level variable. Berenson and Levine (1998)
concluded that whenever a set of data contains about 20 or more observations, the best way to
examine such data is to present it in summary form by constructing appropriate tables and
charts.
Notwithstanding the advantages mentioned above, there are also a number of disadvantages
to the use of a grouped data frequency table. A major disadvantage is loss of precision.
Bluman (2004) stated that grouped data frequency distributions can reveal little about the
actual distribution, skew, and kurtosis of data; can be easily manipulated to yield misleading
results; and also de-emphasize ranges and extreme values, particularly when open classes are
used.
MacDonald (1982) conceded that “since the original data are lost in the grouping process,
exact calculations of the mean and standard deviation are impossible”; he added that “precise
determination of the median, mode, quartiles, deciles, and percentiles ranks are likewise
impossible, as these statistical measures require the original data in ordered form”.
Based on the review of the related literature, it was observed that no one has compared the
accuracy of Sturges rule with that of other iterative processes; hence, this work focuses on
verifying the accuracy of Sturges rule against the iterative process.
METHODOLOGY
Sources of Data
The data used in this research work are purely secondary data collected from different
textbooks (An Introduction to Applied Statistical Methods by Cyprian A. Oyeka, Schaum’s
Outlines of Theory and Problems of Statistics, A Multisectorial Handbook of Statistics by Joy
C. Nwabueze, Modern Basic Statistics by Vitalis Nwachukwu, and Basic Statistics with
Applications Volume One by B. A. Uchendu and E. H. Etuk).
A frequency distribution table is a simple table in which the data are grouped into classes on
the basis of common characteristics and the number of cases which fall in each class is
recorded. In constructing a frequency table, the following points should be kept in mind:
1) The classes should be clearly defined and should not lead to any ambiguity.
2) The classes should be mutually exclusive and non-overlapping.
3) The classes should be exhaustive, that is, each of the given values should be included
in one of the classes.
4) The classes should be of equal width.
5) Indeterminate classes, for example open-ended classes such as “less than” or “greater than”,
should be avoided as far as possible since they create difficulty in analysis and
interpretation.
6) The number of classes should neither be too large nor too small. It should preferably
lie between 5 and 15. However, the number of classes may be more than 15
depending upon the total frequency and the details required but it is desirable that it is
not less than 5 since in that case the classification may not reveal the essential
characteristics of the population.
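As an illustration of points 1) to 4), the following Python sketch groups a list of integer observations into k equal-width, non-overlapping and exhaustive classes. The function name `frequency_table`, the choice to start the first class at the smallest observation, and the class boundaries placed 0.5 below and above the class limits are assumptions made for this example; the tables in this paper may have been constructed with different starting points.

```python
import math

def frequency_table(data, k):
    """Group integer observations into k equal-width, non-overlapping classes.

    Assumption for this sketch: the first class starts at min(data).
    """
    lo, hi = min(data), max(data)
    # Smallest integer width for which k classes cover every observation.
    width = math.ceil((hi - lo + 1) / k)
    table = []
    for i in range(k):
        lower = lo + i * width
        upper = lower + width - 1              # inclusive upper class limit
        freq = sum(lower <= x <= upper for x in data)
        table.append({
            "limits": (lower, upper),
            "boundaries": (lower - 0.5, upper + 0.5),
            "midpoint": (lower + upper) / 2,
            "freq": freq,
        })
    return table

# Illustrative (made-up) observations, not one of the data sets in this paper.
sample = [4, 7, 9, 12, 15, 18, 21, 23]
for row in frequency_table(sample, 4):
    print(row["limits"], row["midpoint"], row["freq"])
```

Because each class starts one unit above the previous upper limit, every observation falls in exactly one class, so the frequencies always sum to the number of observations.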
A measure of central tendency is a summary statistic that represents the centre point or typical
value of a dataset. According to Professor Bowley, averages are “statistical constants which
enable us to comprehend in a single effort the significance of the whole”. They give us an idea
about the concentration of the values in the central part of the distribution. The three most
common measures of central tendency are the arithmetic mean, median and mode.
The arithmetic mean: the arithmetic mean of a set of $n$ observations $x_1, x_2, \ldots, x_n$ is
their sum divided by the number of observations, and is given by
Mean, $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$ . . . (1)
In the case of a grouped frequency distribution, $x_i$ is taken as the class midpoint or class
mark of the corresponding class and $f_i$ is the frequency of the variable $x_i$.
$\bar{x} = \frac{\sum f_i x_i}{\sum f_i}$ . . . (2)
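The grouped mean in equation (2) can be sketched in Python as follows. The helper name `grouped_mean` is hypothetical, and the two-class input is only an illustrative fragment rather than a complete table. The last line applies the same formula to column totals of the kind reported in the frequency tables of this paper (a sum of fx of 4942 over 100 observations gives 49.42).

```python
def grouped_mean(midpoints, freqs):
    """Mean of grouped data: sum(f * x) / sum(f), with x the class midpoints."""
    total_freq = sum(freqs)
    if total_freq == 0:
        raise ValueError("frequency table is empty")
    return sum(f * x for x, f in zip(midpoints, freqs)) / total_freq

# Two classes with midpoints 80.5 and 92.5 and frequencies 5 and 4
# (an illustrative fragment, not a complete frequency table).
print(grouped_mean([80.5, 92.5], [5, 4]))

# The same formula applied to column totals: sum(fx) / sum(f).
print(4942 / 100)
```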
Median of a distribution: this is the value of the variable which divides the distribution into
two equal parts. In the case of ungrouped data, if the number of observations is odd, the median
is the middle value after the values have been arranged in ascending or descending order of
magnitude. If the number of observations is even, there are two middle values, and the median is
the average of the two. In the case of a continuous frequency distribution, the class
corresponding to the cumulative frequency just greater than $N/2$ is called the median class and
the value of the median is obtained by the following formula:
Median $= L + \frac{h}{f}\left(\frac{N}{2} - C\right)$ . . . (3)
where $L$ is the lower boundary of the median class, $h$ its width, $f$ its frequency, and $C$
the cumulative frequency of the class preceding it.
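A minimal Python sketch of the grouped median follows, assuming the usual interpolation within the median class: lower boundary plus (N/2 minus the cumulative frequency before the class), divided by the class frequency, times the class width. The function name `grouped_median` and the three-class table are made up for illustration.

```python
def grouped_median(boundaries, freqs):
    """Median of grouped data by linear interpolation in the median class.

    boundaries: list of (lower, upper) class boundaries in ascending order.
    freqs: class frequencies in the same order.
    """
    n = sum(freqs)
    if n == 0:
        raise ValueError("frequency table is empty")
    cumulative = 0                      # cumulative frequency before the class
    for (lower, upper), f in zip(boundaries, freqs):
        # Median class: first class whose cumulative frequency exceeds N/2.
        if cumulative + f > n / 2:
            return lower + (n / 2 - cumulative) / f * (upper - lower)
        cumulative += f

# Illustrative table with class boundaries 0.5-10.5, 10.5-20.5, 20.5-30.5.
bounds = [(0.5, 10.5), (10.5, 20.5), (20.5, 30.5)]
print(grouped_median(bounds, [2, 5, 3]))
```

Here N/2 = 5, the cumulative frequency first exceeds 5 in the second class, and the interpolation gives 10.5 + (5 - 2)/5 × 10 = 16.5.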
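Mode: this is the value which occurs most frequently in a set of observations and around
which the other items of the set cluster densely. In other words, the mode is the value of the
variable which is predominant in the series. In the case of a discrete frequency distribution,
the mode is the value of $x$ corresponding to the maximum frequency. In the case of a continuous
frequency distribution, the mode is given by the formula:
Mode $= L + \frac{h (f_1 - f_0)}{2 f_1 - f_0 - f_2}$ . . . (4)
where $L$ is the lower boundary of the modal class, $h$ its width, $f_1$ its frequency, and
$f_0$ and $f_2$ the frequencies of the classes preceding and following it.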
The above methods will be applied using Sturges rule and two iterations (with the number of
classes one below and one above that obtained using Sturges rule). By Sturges rule, the number
of classes is given as:
Number of classes, $k = 1 + 3.322 \log_{10} N$ . . . (5)
where $N$ is the total number of observations.
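The three class counts compared in this study (Sturges rule, one class below, one class above) can be sketched as follows, assuming the common form of the rule, k = 1 + 3.322 log10(N). Rounding k to the nearest integer is an assumption of this sketch, since the rounding convention is not stated here; the function names are hypothetical.

```python
import math

def sturges_classes(n):
    """Unrounded number of classes from Sturges rule, k = 1 + 3.322 * log10(n)."""
    return 1 + 3.322 * math.log10(n)

def candidate_class_counts(n):
    """Class counts for the first iteration, Sturges rule and second iteration.

    Assumption: the Sturges value is rounded to the nearest integer.
    """
    k = round(sturges_classes(n))
    return (k - 1, k, k + 1)

def class_width(data, k):
    """Approximate class interval: range of the data divided by k."""
    return (max(data) - min(data)) / k

print(sturges_classes(100))          # about 7.644
print(candidate_class_counts(100))   # one below, Sturges, one above
```

For N = 100 the rule gives about 7.644, so under the rounding assumption the tables would be built with 7, 8 and 9 classes.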
. . . (8)
we have . . . (9)
degrees of freedom
The data collected for this research work are presented below:
Data set 1
44 30 45 45 45 48 37 39 44 40
55 40 55 33 33 48 31 51 47
44 43 40 28 28 40 37 47 30
56 46 41 25 44 36 40 57 39
48 43 37 39 26 31 44 48 36
Source: An Introduction to Applied Statistical Methods by Cyprian A. Oyeka (2009)
Data set 2
68 84 75 82 68 62 88 76 93
73 79 88 73 60 71 59 85 75
61 65 75 87 74 95 78 63 72
66 78 82 75 94 69 74 68 60
96 78 89 61 75 60 79 83 71
79 62 67 97 85 76 65 71 75
65 80 73 57 78 62 76 53 74
86 67 73 81 63 76 75 85 77
Source: Schaum’s Outlines of Theory and Problems of Statistics by Murray R. Spiegel and
Larry J. Stephens (1998)
Data set 3
320 380 340 410 380 340 360 350 320 370
350 340 350 360 370 350 380 370 300 420
370 390 390 440 330 390 330 360 400 370
320 350 360 340 340 350 350 390 380 340
400 360 350 390 400 350 360 340 370 420
420 400 350 370 330 320 390 380 400 370
390 330 360 380 350 330 360 300 360 360
360 390 350 370 370 350 390 370 370 340
370 400 360 350 380 380 360 340 330 370
340 360 390 400 410 410 360 400 340 360
Source: A Multisectorial Handbook of Statistics by Joy C. Nwabueze (2009)
Data set 4
83 84 86 88 84 87 102 75 77 79
109 78 32 49 78 84 85 66 101 57
78 90 72 62 76 72 68 62 86 106
88 104 66 70 43 98 78 87 68 95
95 92 92 66 105 56 98 73 63 103
94 102 98 88 92 69 102 56 97 36
86 67 60 63 95 78 83 103 77 47
68 54 75 97 76 61 94 82 58 37
Source: Modern Basic Statistics by Vitalis Obioma Nwachukwu, et al (2014)
Data set 5
23 14 52 72 48 52 53 38 81 7
17 27 13 91 46 63 44 44 90 55
19 92 71 15 72 4 48 41 18 22
64 70 80 28 63 54 50 61 4 11
58 44 17 35 60 74 70 51 54 17
58 43 09 81 52 72 21 45 62 79
61 62 12 56 51 64 74 68 47 67
57 80 37 63 49 54 63 37 59 38
64 52 93 64 38 56 57 74 53 39
66 47 43 51 32 24 52 25 56 23
Source: Basic Statistics with Applications by Uchendu, B. A. and Etuk, E. H. (2008)
Analysis of Data
Using equations (2), (3) and (4), we compute the mean, median and mode for grouped data sets
1, 2, 3, 4 and 5.
DATA SET 1:
Table 6: Mean, Median and Mode from the Frequency Distribution Table Formed
Using (Sturges Rule)
Table 7: Mean, Median and Mode from the Frequency Distribution Table Formed
Using (First iteration)
Table 8: Mean, Median and Mode from a Frequency Distribution Table Formed Using
(Second iteration)
DATA SET 2:
Table 9: Mean, Median and Mode from the Frequency Distribution Table Formed
Using (Sturges Rule)
Total 80 6030
Table 10: Mean, Median and Mode from the Frequency Distribution Table Formed
Using (First iteration)
Table 11: Mean, Median and Mode from a Frequency Distribution Table Formed Using
(Second iteration)
DATA SET 3
Table 12: Mean, Median and Mode from the Frequency Distribution Table Formed
Using (Sturges Rule)
Table 13: Mean, Median and Mode from the Frequency Distribution Table Formed
Using (First iteration)
Table 14: Mean, Median and Mode from a Frequency Distribution Table Formed Using
(Second iteration)
DATA SET 4
Table 15: Mean, Median and Mode from the Frequency Distribution Table Formed
Using (Sturges Rule)
Class interval   f    x      fx     Class boundaries   Cumulative frequency
...
99 – 109         9    104    936    98.5 – 109.5       80
Total            80          6318
Table 16: Mean, Median and Mode from the Frequency Distribution Table Formed
Using (First iteration)
Table 17: Mean, Median and Mode from a Frequency Distribution Table Formed Using
(Second iteration)
DATA SET 5
Table 18: Mean, Median and Mode from a Frequency Distribution Table Formed Using
(Sturges Rule)
Class interval   f     x      fx      Class boundaries   Cumulative frequency
...
75 – 86          5     80.5   402.5   74.5 – 86.5        96
87 – 98          4     92.5   370     86.5 – 98.5        100
Total            100          4942
Table 19: Mean, Median and Mode from the Frequency Distribution Table Formed
Using (First iteration)
Table 20: Mean, Median and Mode from a Frequency Distribution Table Formed Using
(Second iteration)
Summary of Results
Table 21: Means, Median and Mode of Ungrouped Data and Grouped Data (Using
Sturges Rule and Iterations)
Data set   Measures of   Ungrouped   Sturges rule   First iteration     Second iteration
           tendency                                 (one class below)   (one class above)
1          Mean          42.43       42.63          42.44               42.23
           Median        44.00       46.17          43.80               43.88
           Mode          51.00       43.67          45.97               43.70
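To make the comparison for data set 1 concrete, the short Python sketch below takes the values in the rows above and reports, for each measure, which grouping method lies closest to the ungrouped value. The dictionary layout and the function name `closest_method` are illustrative choices, not part of the original analysis.

```python
# Values for data set 1, copied from the rows of Table 21 above.
ungrouped = {"mean": 42.43, "median": 44.00, "mode": 51.00}
grouped = {
    "Sturges rule":     {"mean": 42.63, "median": 46.17, "mode": 43.67},
    "First iteration":  {"mean": 42.44, "median": 43.80, "mode": 45.97},
    "Second iteration": {"mean": 42.23, "median": 43.88, "mode": 43.70},
}

def closest_method(measure):
    """Grouping method whose estimate is nearest the ungrouped value."""
    return min(grouped,
               key=lambda m: abs(grouped[m][measure] - ungrouped[measure]))

for measure in ungrouped:
    print(measure, "->", closest_method(measure))
```

For this data set the absolute deviations of the mean are 0.20, 0.01 and 0.20, so the first iteration is closest; the second iteration is closest for the median (0.12) and the first iteration for the mode (5.03).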
TABLE 22: Means of Ungrouped and Grouped Data (Using Sturges Rule and Iterations)
Ungrouped :
Sturges : =
First iteration: =
Second iteration: =
Testing for difference between variances:
For (4, 4) degrees of freedom and $\alpha = 0.05$, the critical value of $F$ is 6.39. The
computed $F$ is therefore not significant; hence, the variances are taken to be equal.
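The variance comparison can be sketched in Python as a ratio of two sample variances set against the critical value of 6.39 quoted above. Placing the larger variance in the numerator is an assumption of this sketch, the function name `variance_ratio` is hypothetical, and the two samples of five values are made up purely for illustration; they are not the means analysed in this paper.

```python
from statistics import variance

F_CRITICAL = 6.39   # critical value quoted in the text for (4, 4) degrees of freedom

def variance_ratio(a, b):
    """F statistic as the larger sample variance over the smaller one."""
    va, vb = variance(a), variance(b)
    if min(va, vb) == 0:
        raise ValueError("a sample has zero variance")
    return max(va, vb) / min(va, vb)

# Hypothetical samples of five values each (4 degrees of freedom per sample).
sample_a = [1, 2, 3, 4, 5]
sample_b = [2, 4, 6, 8, 10]

f_stat = variance_ratio(sample_a, sample_b)
print(f_stat, "significant" if f_stat > F_CRITICAL else "not significant")
```

For these made-up samples the variances are 2.5 and 10, so the ratio is 4.0, which is below 6.39 and would be read as equal variances.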
Test for difference between Means of Ungrouped Data and Grouped Data (Using
Sturges Rule)
$H_0$: There is no significant difference between the mean of ungrouped data and that of
grouped data obtained using Sturges rule.
$H_1$: There is a significant difference between the mean of ungrouped data and that of grouped
data obtained using Sturges rule.
Test for difference between Means of Ungrouped Data and Grouped Data (Using First
Iteration)
$H_0$: There is no significant difference between the mean of ungrouped data and that of
grouped data obtained using the first iteration.
$H_1$: There is a significant difference between the mean of ungrouped data and that of grouped
data obtained using the first iteration.
Test for difference between Means of Ungrouped Data and Grouped Data (Using
Second Iteration)
$H_0$: There is no significant difference between the mean of ungrouped data and that of
grouped data obtained using the second iteration.
$H_1$: There is a significant difference between the mean of ungrouped data and that of grouped
data obtained using the second iteration.
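A test of this kind can be sketched as a two-sample t test with pooled variance, which is the usual choice once the variances have been judged equal. This is an assumption about the form of the test, not a reproduction of the paper's computation. The function name `pooled_t`, the two samples of five values, and the 5% two-tailed critical value of 2.306 for 8 degrees of freedom (a standard table value) are supplied for illustration only.

```python
import math
from statistics import mean, variance

def pooled_t(a, b):
    """Two-sample t statistic with pooled variance; returns (t, degrees of freedom)."""
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    # Pooled variance: weighted average of the two sample variances.
    sp2 = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, df

# Hypothetical means from five data sets under two methods (made-up numbers).
ungrouped_means = [1, 2, 3, 4, 5]
grouped_means = [2, 3, 4, 5, 6]

t_stat, df = pooled_t(ungrouped_means, grouped_means)
T_CRITICAL = 2.306   # two-tailed 5% point of t with 8 degrees of freedom
print(t_stat, df, "reject H0" if abs(t_stat) > T_CRITICAL else "do not reject H0")
```

With these made-up samples the statistic is -1.0 on 8 degrees of freedom, so the null hypothesis of equal means would not be rejected.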
CONCLUSION AND RECOMMENDATION
This study examined the differences among the values of the three most common measures of
central tendency (mean, median and mode) obtained from ungrouped data and grouped data (using
Sturges rule and iterations to obtain the number of classes); furthermore, it examined whether
there is any significant difference between the means obtained using ungrouped and grouped
data. The analyses of the data, which were obtained from various basic statistics textbooks,
show that the means obtained from the grouped data using Sturges rule are usually higher than
those obtained from ungrouped data and the other two iterations; the modes obtained using
Sturges rule are usually lower than those obtained from ungrouped data, the first iteration
(one class below the number of classes given by Sturges rule) and the second iteration (one
class above the number of classes given by Sturges rule); while the median is usually lower
than that of the second iteration but higher than that of the first iteration.
Generally, the means, medians, and modes obtained using the iterations are closer to those
obtained from ungrouped data. The tests of significance indicate that the means obtained
using Sturges rule and the other two iterations are not significantly different from those
obtained from ungrouped data.
Since there is no significant difference in the mean values obtained using Sturges rule, the
iterations (with number of classes one below and one above that of Sturges rule) and ungrouped
data, we therefore conclude that any of the methods can be used to obtain the measures of
central tendency.
Based on the result of the analysis, the researcher recommends that students, lecturers and
researchers adopt whichever method is most convenient, as there are no significant differences
in the results obtained using the various methods.
References
Anderson, D., Sweeney, D., and Williams, T. (2004). Essentials of Modern Business
Statistics with Microsoft Excel (2nd ed.). Mason, OH: South-Western.
Aron, A. and Aron, E. N. (2003). Statistics for Psychology. Pearson Education, Inc.
Berenson, M. L. and Levine, D. M. (1998). Basic Business Statistics (7th ed.). Pearson
College Division.
Fox, J. A., Levin, J. and Harkins, S. (1993). Elementary Statistics in Behavioral Research.
New York: Harper Collins.
Glasnapp, D. R. and Poggio, J. P. (1985). Essentials of Statistical Analysis for the Behavioral
Sciences. Columbus, OH: Charles E. Merrill.
Harris, M. B. (1998). Basic Statistics for Behavioral Science Research (2nd ed.). Boston,
MA: Allyn and Bacon.
Jaeger, R. M. (1990). Statistics: A Spectator Sport (2nd ed.). Newbury Park, CA: Sage.
Moore, D. S. (2000). The Basic Practice of Statistics (2nd ed.). New York: W. H. Freeman
and Company.
Ravid, R. (1994). Practical Statistics for Educators. Lanham, MD: University Press of
America.
Statistical Reasoning for the Behavioral Sciences (2nd ed.). Needham Heights,MA:
Allyn and Bacon.
Spiegel, M. R. and Stephens, L. J. (1998). Theory and Problems of Statistics. (3rd ed.).
Schaum’s Outline Series. New York: McGraw-Hill.
Sturges, H. (1926). The Choice of a Class Interval. Journal of the American Statistical
Association, 21(153), pp. 65–66.
Uchendu, B. A. and Etuk E. H. (2008). Basic Statistics with Applications. Vol 1. Olu Prints,
pp 21.