2.3.1 Frequency Distribution: Disadvantages
Disadvantages
In spite of various advantages of converting a set of raw data into an ordered array, an array is
a cumbersome form of presentation which is tiresome to construct. It neither summarizes nor
organizes the data to present them in a more meaningful way. It also fails to highlight the salient
characteristics of the data which may be crucial in terms of their relevance to decision making.
This task cannot be accomplished unless the observations are appropriately condensed. The
best way to do so is to arrange them into a convenient number of groupings, with the number of
observations falling in each group indicated against it. Such a tabular summary,
showing the number (frequency) of observations in each of several non-overlapping classes or
groups, is known as a frequency distribution (also referred to as grouped data).
The frequency distribution of the number of hours of overtime given in Table 2.2 is shown
in Table 2.4.
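The condensing step described above can be sketched in a few lines of Python. This is only an illustration: the overtime values, the starting limit of 80, the width of 5 and the function name `frequency_distribution` are all hypothetical and are not taken from Table 2.2 or Table 2.4. The sketch also assumes that the upper limit of each class is excluded from that class.

```python
def frequency_distribution(data, lower, width, k):
    """Count observations in k non-overlapping classes of equal width.

    Class i covers lower + i*width up to (but not including)
    lower + (i + 1)*width. This is an assumed convention.
    """
    counts = [0] * k
    for x in data:
        i = int((x - lower) // width)  # index of the class containing x
        if not 0 <= i < k:
            raise ValueError(f"observation {x} falls outside the classes")
        counts[i] += 1
    return [(lower + i * width, lower + (i + 1) * width, counts[i])
            for i in range(k)]


# Hypothetical overtime hours, NOT the data of Table 2.2.
hours = [84, 91, 88, 95, 102, 87, 93, 99, 90, 86]

table = frequency_distribution(hours, lower=80, width=5, k=5)
for lo, hi, freq in table:
    print(f"{lo:>3} to under {hi:<3}  {freq}")
```

Each row of the result gives the class limits followed by the frequency, and the frequencies add up to the number of observations.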
Constructing a Frequency Distribution As the number of observations obtained gets larger, the
method discussed above to condense the data becomes quite difficult and time consuming. Thus
to further condense the data into frequency distribution tables, the following steps should be
taken:
(i) Select an appropriate number of non-overlapping class intervals.
(ii) Determine the width of the class intervals.
(iii) Determine class limits (or boundaries) for each class interval to avoid overlapping.
1. Decide the Number of Class Intervals The decision on the number of class groupings depends
largely on the judgment of the individual investigator and/or the range that will be used to
group the data, although there are certain guidelines that can be used. As a general rule, a
frequency distribution should have at least five class intervals (groups), but not more than 15.
The following two rules are often used to decide the approximate number of classes in a frequency
distribution:
(i) If k represents the number of classes and N is the total number of observations, then the
value of k will be the smallest exponent of the number 2 such that $2^k \geq N$.
In Table 2.3, we have N = 30 observations. If we apply this rule, then we shall have
$2^3 = 8\ (< 30)$; $2^4 = 16\ (< 30)$; $2^5 = 32\ (> 30)$
Thus, we may choose k = 5 as the number of classes.
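The search for the smallest exponent in rule (i) can be written as a short loop. The sketch below is only illustrative, and the function name `classes_by_power_of_two` is a hypothetical one chosen here.

```python
def classes_by_power_of_two(n):
    """Return the smallest k for which 2**k is at least n."""
    if n < 1:
        raise ValueError("n must be a positive number of observations")
    k = 0
    while 2 ** k < n:  # keep raising the exponent until 2**k reaches n
        k += 1
    return k


print(classes_by_power_of_two(30))  # 5, since 2**4 = 16 < 30 <= 32 = 2**5
```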
(ii) According to Sturges' rule, the number of classes can be determined by the formula
$k = 1 + 3.322 \log_{10} N$
where k is the number of classes and $\log_{10} N$ is the common (base-10) logarithm of the total number of
observations.
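As a rough illustration, Sturges' rule can be evaluated for the N = 30 observations used in rule (i). The sketch below uses the standard form of the rule, k = 1 + 3.322 log10 N. Rounding the result to the nearest whole number is an assumption made here, since the rule itself yields a fractional value, and the function name `sturges_classes` is hypothetical.

```python
import math


def sturges_classes(n):
    """Return (exact, rounded) class counts from k = 1 + 3.322 * log10(n)."""
    if n < 1:
        raise ValueError("n must be a positive number of observations")
    k = 1 + 3.322 * math.log10(n)
    return k, round(k)  # rounding to the nearest integer is an assumption


exact, rounded = sturges_classes(30)
print(f"k = {exact:.3f}, taken as {rounded} classes")
```

For N = 30 the formula gives a value of about 5.9, so this rule suggests six classes where rule (i) suggested five. Both rules are only guides to an approximate number of classes.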
Mid-point of Class Intervals The main advantage of using the above summary table is that the
major data characteristics become clear to the decision maker. However, it is difficult to know how
the individual values are distributed within a particular class interval without access to the original
data. The class mid-point is the point halfway between the boundaries (both upper and lower class
limits) of each class and is representative of all the observations contained in that class.
Arriving at the correct class mid-points is important, for these are used as representative of
all the observations contained in their respective class while computing many important statistical
measures. A mid-point is obtained by dividing the sum of the upper and lower class limits by
two. Problems in computing mid-points arise when the class limits are ambiguous and not clearly
defined.
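The mid-point calculation described above is simple arithmetic, and a minimal sketch makes it concrete. The class limits below are hypothetical, as is the function name `class_midpoint`.

```python
def class_midpoint(lower, upper):
    """Mid-point of a class: half the sum of its lower and upper limits."""
    return (lower + upper) / 2


# Hypothetical classes with clearly defined limits.
classes = [(0, 10), (10, 20), (20, 30)]
midpoints = [class_midpoint(lo, hi) for lo, hi in classes]
print(midpoints)  # [5.0, 15.0, 25.0]
```

If the same data were instead grouped as 0 to 9, 10 to 19 and so on, the first mid-point would become 4.5 rather than 5, which is why the class limits must be stated unambiguously before mid-points are computed.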
The width of the class interval should, as far as possible, be equal for all the classes. If this is
not possible to maintain, the interpretation of the distribution becomes difficult. For example, it
will be difficult to say whether the difference between the frequencies of the two classes is due to
the difference in the concentration of observations in the two classes or due to the width of the class
intervals being different.
The width of the class intervals should preferably be not only the same throughout, but
should also be a convenient number such as 5, 10 or 15. A width given by integers 7, 13 or 19
should be avoided.
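One way to arrive at an equal and convenient width is sketched below. It assumes the common approach of dividing the range of the data (largest value minus smallest value) by the chosen number of classes and then rounding up to the next multiple of 5. The passage above does not state that formula, so it is an assumption here. The function name `convenient_width` and the sample figures are hypothetical.

```python
import math


def convenient_width(smallest, largest, k, step=5):
    """Suggest a class width: range / k, rounded up to a multiple of `step`.

    The range / k starting point is an assumed rule of thumb.
    """
    raw = (largest - smallest) / k
    return max(step, math.ceil(raw / step) * step)


print(convenient_width(0, 47, k=5))    # raw width 9.4 is rounded up to 10
print(convenient_width(84, 102, k=5))  # raw width 3.6 is rounded up to 5
```

Rounding up rather than down keeps the k classes wide enough to cover the largest observation.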
Such classification ensures continuity of data because the upper limit of one class is the lower
limit of the succeeding class. As shown in Table 2.5, five companies declared dividends ranging from
0 to 10 per cent. This means that a company which declared exactly 10 per cent dividend would not
be included in the class 0–10 but would be included in the next class 10–20. Since this point is
not always clear, data are displayed in a slightly different manner,
as given in Table 2.6, to avoid confusion.
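The treatment of a boundary value such as exactly 10 per cent can be checked with a small sketch. It assumes classes of width 10 starting at 0, as in the dividend example. The number of classes and the function name `exclusive_class` are hypothetical, since Table 2.5 is not reproduced here.

```python
def exclusive_class(value, lower=0, width=10, k=5):
    """Return the (lower, upper) limits of the class containing `value`.

    The upper limit is excluded: lower <= value < upper.
    """
    i = int((value - lower) // width)
    if not 0 <= i < k:
        raise ValueError(f"{value} falls outside the classes")
    lo = lower + i * width
    return lo, lo + width


print(exclusive_class(10))    # (10, 20): exactly 10 goes to the next class
print(exclusive_class(9.99))  # (0, 10)
```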
Inclusive Method When the data are classified in such a way that both lower and upper limits
of a class interval are included in the interval itself, then it is said to be the inclusive method of
classifying data. This method is shown in Table 2.7.
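For comparison, a minimal sketch of inclusive classification follows. The class limits are hypothetical and are not those of Table 2.7, and `inclusive_class` is a name chosen here for illustration.

```python
def inclusive_class(value, classes):
    """Return the (lower, upper) class containing `value`, or None.

    Both limits are included: lower <= value <= upper.
    """
    for lo, hi in classes:
        if lo <= value <= hi:
            return lo, hi
    return None  # the value falls in a gap between classes


# Hypothetical inclusive classes.
classes = [(0, 9), (10, 19), (20, 29)]

print(inclusive_class(9, classes))    # (0, 9)
print(inclusive_class(10, classes))   # (10, 19)
print(inclusive_class(9.5, classes))  # None
```

The last call shows that, with limits written this way, a fractional value such as 9.5 belongs to no class.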