Mt271 Lecture Notes 1
DESCRIPTIVE STATISTICS
1.1. Introduction
Statistics is the science of collecting, organizing, summarizing,
presenting and analyzing data, as well as drawing valid
conclusions and making reasonable decisions on the basis of that
analysis.
1.2. Frequency distribution
A frequency distribution is the arrangement of data in tabular form
according to frequencies. Data in frequency distributions may be
ungrouped or grouped.
Example 1.1
The following data were obtained when a die was rolled 30 times.
1 2 4 2 2 6 3 5 6 3 3 1 3 1 3
4 5 3 5 3 5 1 6 3 1 2 4 2 4 4
Solution
The frequency table is constructed by tallying repeated
observations, in order to find the number of times each
observation appears in the data set.
The result is shown in the following table.
Outcome   Frequency   Relative frequency   Percentage (%)
1         5           5/30 = 0.1667        16.67
2         5           5/30 = 0.1667        16.67
3         8           8/30 = 0.2667        26.67
4         5           5/30 = 0.1667        16.67
5         4           4/30 = 0.1333        13.33
6         3           3/30 = 0.1000        10.00
TOTAL     30          1.0                  100.00
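The tallying in Example 1.1 can also be carried out by computer. The following is a minimal Python sketch, not part of the original notes; the function name frequency_table is chosen here only for illustration.

```python
from collections import Counter

# Outcomes of the 30 die rolls from Example 1.1.
rolls = [1, 2, 4, 2, 2, 6, 3, 5, 6, 3, 3, 1, 3, 1, 3,
         4, 5, 3, 5, 3, 5, 1, 6, 3, 1, 2, 4, 2, 4, 4]

def frequency_table(data):
    """Return rows of (value, frequency, relative frequency, percentage)."""
    counts = Counter(data)   # tally how often each value occurs
    n = len(data)
    return [(v, counts[v], counts[v] / n, 100 * counts[v] / n)
            for v in sorted(counts)]

table = frequency_table(rolls)
for value, freq, rel, pct in table:
    print(f"{value:>7} {freq:>9} {rel:>9.4f} {pct:>8.2f}")
```

The relative frequencies add up to 1 and the percentages to 100, apart from rounding in the printed figures.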
When there is a huge mass of data with many of the values being
distinct, it is convenient to form a grouped frequency distribution
rather than an ungrouped one.
In this case the values are grouped into classes, and the values in
each class are tallied to obtain the class frequency. Grouped frequency
distributions with equal class sizes are reasonable only when the data
do not contain extreme values (values that are very far from the others
in the same data set).
For example, consider the following two sets of data.
Set 1: 6, 8, 5, 8, 9, 4, 7 and 9. There are no extreme values in this
set.
Set 2: 12, 15, 9, 16, 90 and 600. In this case 90 and 600 are the
extreme values.
To construct a grouped frequency distribution, proceed as follows:
1. Identify the smallest and largest values of the data set and hence
compute the range.
2. Decide on the number of classes you want in the distribution
and hence compute the class size h using the relation
$$h = \frac{\text{range}}{\text{number of classes}}$$
If there are N individuals in the data set, one can approximate
the number of classes using the formula
$$\text{number of classes} \approx 1 + 3.3 \log_{10} N$$
3. Write your first class of size h, with its lower limit (first value) 1
to 3 units below the smallest value. Make sure that all data are
included in the distribution.
4. Tally the frequency of each class and hence obtain a grouped
frequency distribution.
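The first two steps can be sketched in Python as follows. This is only an illustration under stated assumptions: the logarithm is taken to base 10, and both the number of classes and the class size are rounded up to whole numbers. The function names are hypothetical.

```python
import math

def number_of_classes(n):
    """Approximate class count from 1 + 3.3 log10(N), rounded up (assumption)."""
    return math.ceil(1 + 3.3 * math.log10(n))

def class_size(data, k):
    """Class size h = range / number of classes, rounded up (assumption)."""
    data_range = max(data) - min(data)
    return math.ceil(data_range / k)

k = number_of_classes(40)      # 1 + 3.3 * log10(40) is about 6.29, so k = 7
h = class_size([18, 48], k)    # 30 / 7 is about 4.29, so h = 5
print(k, h)
```

Rounding up rather than to the nearest whole number guarantees that the classes cover the whole range of the data.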
Example 1.2
The following data give the amount (in dollars) spent on groceries
by a family in each of the past forty weeks:
32 22 19 18 43 42 40 43 18 21
31 26 22 25 47 40 26 32 22 34
28 35 47 26 35 38 35 28 19 38
35 38 36 25 22 45 48 26 34 41
Solution
The minimum value is 18 and the maximum value is 48.
Range = 48 – 18 = 30. Number of classes = 7 (consistent with
1 + 3.3 log 40 ≈ 6.3, rounded up).
Then class size h = 30/7 = 4.29 ≈ 5 (rounded up to a whole number).
The classes are 15 – 19, 20 – 24, 25 – 29, 30 – 34, 35 – 39, 40 – 44 and
45 – 49, which include all values from 18 to 48.
Table 1.2: Frequency table of grouped data
Class (dollars)   Frequency
15 – 19           4
20 – 24           5
25 – 29           8
30 – 34           5
35 – 39           8
40 – 44           6
45 – 49           4
TOTAL             40
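The tallying of the classes in Example 1.2 can be checked with a short Python sketch. It assumes whole-number data, so that a class of size h starting at lower limit a has upper limit a + h – 1. The function name grouped_frequencies is hypothetical.

```python
# Weekly grocery amounts (in dollars) from Example 1.2.
groceries = [32, 22, 19, 18, 43, 42, 40, 43, 18, 21,
             31, 26, 22, 25, 47, 40, 26, 32, 22, 34,
             28, 35, 47, 26, 35, 38, 35, 28, 19, 38,
             35, 38, 36, 25, 22, 45, 48, 26, 34, 41]

def grouped_frequencies(data, first_lower, h, k):
    """Tally data into k classes of size h; the first class starts at first_lower."""
    classes = [(first_lower + i * h, first_lower + (i + 1) * h - 1)
               for i in range(k)]
    table = {c: 0 for c in classes}
    for x in data:
        for lower, upper in classes:
            if lower <= x <= upper:
                table[(lower, upper)] += 1
                break
        else:
            # No class contains x, so the classes do not cover all the data.
            raise ValueError(f"{x} is not covered by any class")
    return table

table_1_2 = grouped_frequencies(groceries, first_lower=15, h=5, k=7)
for (lower, upper), freq in table_1_2.items():
    print(f"{lower} - {upper}: {freq}")
```

The error raised for an uncovered value corresponds to the requirement in step 3 that all data be included in the distribution.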
Example 1.3
Prices of thirty stocks (in thousands of Shillings) on a given day
were recorded as follows:
11.2 8.9 20.0 9.5 35.0 41.0 14.6 100.0 9.0 10.5
79.0 32.5 46.7 22.9 13.5 17.3 41.8 30.4 93.0 33.7
14.4 20.9 34.5 10.8 45.7 104.0 42.6 10.1 41.0 53.8
Solution
Although the data range from 8.9 to 104, most of the values are
concentrated between 8 and 55, and only a few observations fall
between 55 and 104.
Since we need only five classes, we shall form four classes of equal
size from the closely spaced values and one class for the extreme values.
For the first four classes we proceed as follows:
Minimum value = 8.9, maximum value = 53.8, range = 44.9
Number of classes = 4. Hence h = 44.9/4 = 11.225 ≈ 12. So the first
four classes are 8 – 19, 20 – 31, 32 – 43 and 44 – 55, and the fifth class
is 56 – 104, whose width is different from 12.
The frequency distribution table is given below:
Class (thousands of Shillings)   Frequency
8 – 19                           11
20 – 31                          4
32 – 43                          8
44 – 55                          3
56 – 104                         4
TOTAL                            30
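The reasoning of Example 1.3 can be sketched in Python as follows. The cutoff of 55 for extreme values is taken from the solution. Two choices are assumptions of this sketch: the first lower limit is the smallest value rounded down, and the last upper limit is the largest value rounded up.

```python
import math

# Stock prices (in thousands of Shillings) from Example 1.3.
prices = [11.2, 8.9, 20.0, 9.5, 35.0, 41.0, 14.6, 100.0, 9.0, 10.5,
          79.0, 32.5, 46.7, 22.9, 13.5, 17.3, 41.8, 30.4, 93.0, 33.7,
          14.4, 20.9, 34.5, 10.8, 45.7, 104.0, 42.6, 10.1, 41.0, 53.8]

cutoff = 55                                  # values above this are treated as extreme
main = [p for p in prices if p <= cutoff]    # the closely spaced values
h = math.ceil((max(main) - min(main)) / 4)   # 44.9 / 4 = 11.225, rounded up to 12

start = math.floor(min(main))                # first lower limit: 8
classes = [(start + i * h, start + (i + 1) * h - 1) for i in range(4)]
# One wider class collects the extreme values.
classes.append((classes[-1][1] + 1, math.ceil(max(prices))))

# Count the prices falling within the limits of each class. A price lying
# strictly between two classes (e.g. 19.5) would not be counted; there is
# no such price in this data set.
freq = {c: sum(1 for p in prices if c[0] <= p <= c[1]) for c in classes}
for (lower, upper), f in freq.items():
    print(f"{lower} - {upper}: {f}")
```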
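Class limits are the lower and upper values of a class. Thus each
class has a lower limit and an upper limit.
Class boundaries lie halfway between the upper limit of one class and
the lower limit of the next class, and the class width is the difference
between the upper and lower boundaries of a class.
A class mark is the middle value between the lower and upper class
boundaries (or limits).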
Example 1.4
Find the class limits, boundaries, class marks and class width of the
following classes 15 – 19 and 20 – 29.
Solution
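The required statistics are summarized in the following table.
Class     Class limits   Class boundaries   Class mark   Class width
15 – 19   15 and 19      14.5 – 19.5        17           5
20 – 29   20 and 29      19.5 – 29.5        24.5         10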
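The quantities asked for in Example 1.4 can be computed directly from the class limits. The sketch below assumes that consecutive classes are separated by a gap of one unit (as in 15 – 19 followed by 20 – 29), so that each boundary lies half a unit beyond the corresponding limit. The function name class_summary and its parameter unit are hypothetical.

```python
def class_summary(lower, upper, unit=1):
    """Boundaries, class mark and width of a class with the given limits.

    `unit` is the gap between the upper limit of one class and the lower
    limit of the next (1 for whole-number limits); this is an assumption.
    """
    lower_boundary = lower - unit / 2
    upper_boundary = upper + unit / 2
    return {
        "limits": (lower, upper),
        "boundaries": (lower_boundary, upper_boundary),
        "class_mark": (lower + upper) / 2,
        "width": upper_boundary - lower_boundary,
    }

for limits in [(15, 19), (20, 29)]:
    print(class_summary(*limits))
```

For the class 15 – 19 this gives boundaries 14.5 and 19.5, class mark 17 and width 5. For 20 – 29 it gives boundaries 19.5 and 29.5, class mark 24.5 and width 10.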
1.4. Histograms
A histogram is a graphical representation of a frequency distribution,
with vertical rectangles erected on the horizontal axis. Adjacent
rectangles meet at the class boundaries, and the height of each
rectangle on the vertical axis is the frequency of its class.
Example 1.5
Draw the histogram for the grocery data given in Example 1.2.
Solution
In this case we first create a frequency table with class boundaries, as
shown below:
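Table 1.5: Class boundaries and frequency
Class boundaries   Frequency
14.5 – 19.5        4
19.5 – 24.5        5
24.5 – 29.5        8
29.5 – 34.5        5
34.5 – 39.5        8
39.5 – 44.5        6
44.5 – 49.5        4
[Figure: histogram of the grocery data. Horizontal axis: class boundaries, 14.5 to 49.5; vertical axis: frequency, 0 to 9.]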
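A rough picture of the histogram for Example 1.5 can be produced without plotting software. The Python sketch below prints the bars sideways, one row per class, using one symbol per observation. It is an illustration only, and the function name text_histogram is hypothetical.

```python
# Class boundaries and frequencies of the grocery data (Examples 1.2 and 1.5).
boundaries = [14.5, 19.5, 24.5, 29.5, 34.5, 39.5, 44.5, 49.5]
frequencies = [4, 5, 8, 5, 8, 6, 4]

def text_histogram(boundaries, frequencies, symbol="#"):
    """Return one text line per class; the bar length equals the frequency."""
    lines = []
    # Consecutive boundaries form the lower and upper boundary of each class.
    for low, high, f in zip(boundaries, boundaries[1:], frequencies):
        lines.append(f"{low:>5} - {high:<5} | {symbol * f} {f}")
    return lines

histogram_lines = text_histogram(boundaries, frequencies)
print("\n".join(histogram_lines))
```

Because the classes share their boundaries, consecutive bars touch, which is the feature that distinguishes a histogram from an ordinary bar chart.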
A frequency polygon is a polygon whose vertices are the points
(class mark, frequency) of the classes.
To create a frequency polygon, one has to extend the distribution
by introducing one class before the lowest class and one class
after the highest class, both with zero frequency.
Example 1.6
Draw a frequency polygon for the data in Example 1.2.
Solution
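The frequency distribution with class marks is shown below:
Class     Class mark   Frequency
10 – 14   12           0
15 – 19   17           4
20 – 24   22           5
25 – 29   27           8
30 – 34   32           5
35 – 39   37           8
40 – 44   42           6
45 – 49   47           4
50 – 54   52           0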
The frequency polygon is given below, where the class marks are
amounts in dollars.
[Figure: the frequency polygon. Horizontal axis: class marks, $12 to $52 in steps of $5; vertical axis: frequency, 0 to 9.]
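The vertices of the frequency polygon in Example 1.6 can be generated as follows. This is a minimal Python sketch assuming classes of equal size; the function name polygon_vertices is hypothetical.

```python
# Classes and frequencies of the grocery data (Example 1.2).
classes = [(15, 19), (20, 24), (25, 29), (30, 34), (35, 39), (40, 44), (45, 49)]
frequencies = [4, 5, 8, 5, 8, 6, 4]

def polygon_vertices(classes, frequencies):
    """Return (class mark, frequency) points, with a zero-frequency class at each end."""
    h = classes[1][0] - classes[0][0]              # common class size (assumed equal)
    marks = [(lower + upper) / 2 for lower, upper in classes]
    # Extend by one class before the lowest and one after the highest class.
    marks = [marks[0] - h] + marks + [marks[-1] + h]
    freqs = [0] + list(frequencies) + [0]
    return list(zip(marks, freqs))

vertices = polygon_vertices(classes, frequencies)
print(vertices)
```

Joining consecutive vertices by straight lines gives the polygon, and the two added classes bring it down to the horizontal axis at 12 and 52.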
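Class     Class boundaries   Frequency   Upper boundary     Cumulative frequency
15 – 19   14.5 – 19.5        4           Less than 19.5     4
20 – 24   19.5 – 24.5        5           Less than 24.5     9
25 – 29   24.5 – 29.5        8           Less than 29.5     17
30 – 34   29.5 – 34.5        5           Less than 34.5     22
35 – 39   34.5 – 39.5        8           Less than 39.5     30
40 – 44   39.5 – 44.5        6           Less than 44.5     36
45 – 49   44.5 – 49.5        4           Less than 49.5     40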
[Figure: cumulative frequency plotted against the class boundaries. Horizontal axis: boundaries, "Less than 14.5" to "Less than 49.5"; vertical axis: cumulative frequency, 0 to 45.]
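The "less than" cumulative frequencies are running totals of the class frequencies. A minimal Python sketch for the grocery data follows; the starting point (14.5, 0) is included on the assumption that the plot begins at the lowest class boundary.

```python
from itertools import accumulate

# Upper class boundaries and frequencies of the grocery data (Example 1.2).
upper_boundaries = [19.5, 24.5, 29.5, 34.5, 39.5, 44.5, 49.5]
frequencies = [4, 5, 8, 5, 8, 6, 4]

# Running totals: number of observations below each upper boundary.
cumulative = list(accumulate(frequencies))

# Points to plot; no observation lies below the lowest boundary 14.5.
points = [(14.5, 0)] + list(zip(upper_boundaries, cumulative))
for boundary, total in points:
    print(f"Less than {boundary}: {total}")
```

The last cumulative frequency always equals the total number of observations, here 40.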