Mt271 Lecture Notes 1


CHAPTER ONE

DESCRIPTIVE STATISTICS

1.1. Introduction
Statistics is the science of collecting, organizing, summarizing,
presenting and analyzing data, as well as drawing valid
conclusions and making reasonable decisions on the basis of that
analysis.

Statistical investigation and analysis of data fall into two broad
categories: descriptive statistics and inductive (inferential)
statistics.

Descriptive statistics deals with processing data without attempting
to draw any inferences from them. It covers the presentation of
data in the form of tables and charts/graphs, and the computation of
characteristics of the data such as averages and measures of dispersion.

Inductive statistics is a scientific discipline concerned with
developing and using mathematical tools to make forecasts and
inferences. The term inference means the act or process of deriving
a conclusion based solely on what one already knows.

This chapter introduces techniques for presenting data that are
commonly used in statistics. Inductive statistics will be discussed
later, after a review of probability and probability distributions.

1.2. Frequency distribution
A frequency distribution is an arrangement of data in tabular form
showing how often each value (or group of values) occurs. Data in
frequency distributions may be ungrouped or grouped.

1.2.1. Ungrouped data


In this case each individual value is assigned its own frequency
when forming the frequency distribution.

Example 1.1
The following data were obtained when a die was rolled 30 times.
1 2 4 2 2 6 3 5 6 3 3 1 3 1 3
4 5 3 5 3 5 1 6 3 1 2 4 2 4 4

Use the above data to construct a frequency table.

Solution
The frequency table is constructed by tallying repeated
observations in order to count the number of times each
observation appears in the data set.
The result is shown in the following table.

Table 1.1: Frequency table of ungrouped data

Number   Frequency   Relative frequency   Percentage frequency (%)
1        5           5/30 = 0.1667        16.67
2        5           5/30 = 0.1667        16.67
3        8           8/30 = 0.2667        26.67
4        5           5/30 = 0.1667        16.67
5        4           4/30 = 0.1333        13.33
6        3           3/30 = 0.1000        10.00
TOTAL    30          1.0000               100.00
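The tallying in Example 1.1 can be reproduced with a short Python sketch. It uses `collections.Counter` from the standard library to count each die face; the function name `frequency_table` is hypothetical and only illustrates the arithmetic behind Table 1.1 (relative frequency = frequency / 30, percentage = 100 × relative frequency).

```python
from collections import Counter

# Die outcomes from Example 1.1 (30 rolls)
rolls = [1, 2, 4, 2, 2, 6, 3, 5, 6, 3, 3, 1, 3, 1, 3,
         4, 5, 3, 5, 3, 5, 1, 6, 3, 1, 2, 4, 2, 4, 4]

def frequency_table(data):
    """Return rows (value, frequency, relative frequency, percentage) for ungrouped data."""
    counts = Counter(data)          # tally how often each distinct value occurs
    n = len(data)
    rows = []
    for value in sorted(counts):
        f = counts[value]
        rows.append((value, f, round(f / n, 4), round(100 * f / n, 2)))
    return rows

for value, f, rel, pct in frequency_table(rolls):
    print(f"{value}  {f:2d}  {rel:.4f}  {pct:6.2f}")
```

Rounding happens only for display, so the rounded relative frequencies can add up to slightly more or less than 1.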

1.2.2. Grouped data with classes of equal width

When there is a large mass of data with many distinct values, it is
convenient to form a grouped frequency distribution rather than an
ungrouped one.
In this case the values are grouped into classes and tallied to obtain
class frequencies. Grouped frequency distributions with classes of
equal width are reasonable only when the data do not contain extreme
values (values that lie very far from the others in the same data set).
Consider two sets of data:
Set 1: 6, 8, 5, 8, 9, 4, 7 and 9. There are no extreme values in this
set.
Set 2: 12, 15, 9, 16, 90 and 600. In this case 90 and 600 are the
extreme values.

There are no specific rules for forming this kind of frequency
distribution; it depends on the number of classes you want. It is
advisable to use between 6 and 12 classes inclusive, depending on
the size of the data set.

The following steps may, however, be helpful in forming such a
distribution.

1. Identify the smallest and largest values of the data set and hence
compute the range.
2. Decide on the number of classes you want in the distribution
and hence compute the class size $h$ using the relation
$$h = \frac{\text{range}}{\text{number of classes}},$$
rounding $h$ up to a convenient value.
If there are $N$ individuals in the data set, one can approximate
the number of classes using the formula
$$\text{number of classes} \approx 1 + 3.3\log_{10} N.$$
3. Write your first class, of size $h$, with its lower limit (first value) 1
to 3 units below the smallest value. Make sure that all data are
included in the distribution.
4. Tally the frequency of each class and hence obtain a grouped
frequency distribution.
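Steps 1 to 3 can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the data are whole numbers (so a class with lower limit L and width h ends at L + h − 1), the estimated number of classes is rounded to the nearest integer, and h is rounded up with `math.ceil`, which matches the rounding used in Examples 1.2 and 1.3. The function names are hypothetical.

```python
import math

def number_of_classes(n):
    """Approximate class count from step 2: 1 + 3.3*log10(N), rounded to the nearest integer."""
    return round(1 + 3.3 * math.log10(n))

def class_width(smallest, largest, k):
    """Class size h = range / k, rounded up so that every value fits."""
    return math.ceil((largest - smallest) / k)

def make_classes(start, h, k):
    """k classes of width h for whole-number data, the first starting at `start` (step 3)."""
    return [(start + i * h, start + i * h + h - 1) for i in range(k)]

# Numbers from Example 1.2: 40 values between 18 and 48, seven classes, first class at 15
print(number_of_classes(40))        # the rule of thumb for N = 40
print(class_width(18, 48, 7))       # 30/7 rounded up
print(make_classes(15, 5, 7))
```

For N = 40 the rule of thumb suggests about 6 classes; Example 1.2 chooses 7, which is still within the advised range of 6 to 12.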

Example 1.2
The following data give the amount (in dollars) spent on groceries
by a family during the past forty weeks:

32 22 19 18 43 42 40 43 18 21
31 26 22 25 47 40 26 32 22 34
28 35 47 26 35 38 35 28 19 38
35 38 36 25 22 45 48 26 34 41

Construct a frequency distribution using seven classes.

Solution
The minimum value is 18 and the maximum value is 48.
Range = 48 – 18 = 30. Number of classes = 7.
Then the class size is h = 30/7 ≈ 4.29, which we round up to 5.
Starting the first class 3 units below the smallest value, the classes
are 15 – 19, 20 – 24, 25 – 29, 30 – 34, 35 – 39, 40 – 44 and
45 – 49, which together include all values from 18 to 48.

The frequency distribution of the grocery expenditure is given below:
Table 1.2: Frequency table of grouped data

Class ($)   Frequency
15 – 19     4
20 – 24     5
25 – 29     8
30 – 34     5
35 – 39     8
40 – 44     6
45 – 49     4
TOTAL       40
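The class frequencies in Table 1.2 can be checked with a few lines of Python. This sketch counts the values whose amount lies between the lower and upper class limits inclusive, which is valid here because the grocery amounts are whole dollars; `grouped_frequencies` is a hypothetical helper name.

```python
# Weekly grocery amounts in dollars (Example 1.2)
amounts = [32, 22, 19, 18, 43, 42, 40, 43, 18, 21,
           31, 26, 22, 25, 47, 40, 26, 32, 22, 34,
           28, 35, 47, 26, 35, 38, 35, 28, 19, 38,
           35, 38, 36, 25, 22, 45, 48, 26, 34, 41]

classes = [(15, 19), (20, 24), (25, 29), (30, 34), (35, 39), (40, 44), (45, 49)]

def grouped_frequencies(data, classes):
    """Count how many values fall within each (lower, upper) class, limits inclusive."""
    return [sum(1 for x in data if lower <= x <= upper) for lower, upper in classes]

freqs = grouped_frequencies(amounts, classes)
for (lower, upper), f in zip(classes, freqs):
    print(f"{lower} - {upper}: {f}")
print("TOTAL:", sum(freqs))
```

A total equal to the number of observations (40) confirms that every value was placed in exactly one class.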

1.2.3 Classes of unequal width/size

If the data contain some extreme values, the previous technique is
generally not applicable. In this case the values that lie close to
each other are grouped first, and the extreme values are grouped
together into a single class.

Example 1.3
Prices of thirty stocks (in thousands of Shillings) on a given day
were recorded as follows:

11.2 8.9 20.0 9.5 35.0 41.0 14.6 100.0 9.0 10.5
79.0 32.5 46.7 22.9 13.5 17.3 41.8 30.4 93.0 33.7
14.4 20.9 34.5 10.8 45.7 104.0 42.6 10.1 41.0 53.8

Form a grouped frequency distribution with only five classes.

Solution
Although the data range from 8.9 to 104, most of the values are
concentrated between 8 and 55, and only a few observations fall
between 55 and 104.
Since we need only five classes, we form four classes from the
closely spaced values and one class for the extreme values.
For the first four classes we proceed as follows:
Among the closely spaced values, minimum value = 8.9, maximum value = 53.8
and range = 44.9. Number of classes = 4. Hence h = 44.9/4 = 11.225, which
we round up to 12. So the first four classes are 8 – 19, 20 – 31, 32 – 43
and 44 – 55, and the fifth class is 56 – 104, whose width differs from 12.
The frequency distribution table is given below:

Table 1.3: Frequency table of grouped data with unequal width

Price (thousands of shillings)   Frequency
8 – 19                           11
20 – 31                          4
32 – 43                          8
44 – 55                          3
56 – 104                         4
TOTAL                            30
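Because the stock prices have decimals, a value such as 19.7 would fall between the limits 19 and 20. A minimal Python sketch therefore classifies each price by class boundaries (half a unit outside the limits, as defined in Section 1.3) using the standard-library `bisect` module. The boundary list is an assumption derived from the limits in Table 1.3, and `count_by_boundaries` is a hypothetical helper name; note that the classes need not have equal width.

```python
from bisect import bisect_right

# Stock prices in thousands of shillings (Example 1.3)
prices = [11.2, 8.9, 20.0, 9.5, 35.0, 41.0, 14.6, 100.0, 9.0, 10.5,
          79.0, 32.5, 46.7, 22.9, 13.5, 17.3, 41.8, 30.4, 93.0, 33.7,
          14.4, 20.9, 34.5, 10.8, 45.7, 104.0, 42.6, 10.1, 41.0, 53.8]

# Assumed boundaries for the classes 8-19, 20-31, 32-43, 44-55 and the wide class 56-104
boundaries = [7.5, 19.5, 31.5, 43.5, 55.5, 104.5]

def count_by_boundaries(data, boundaries):
    """Frequency of each class [b_i, b_(i+1)); class widths may differ."""
    freqs = [0] * (len(boundaries) - 1)
    for x in data:
        i = bisect_right(boundaries, x) - 1   # index of the class whose range contains x
        if not 0 <= i < len(freqs):
            raise ValueError(f"{x} lies outside the classes")
        freqs[i] += 1
    return freqs

print(count_by_boundaries(prices, boundaries))
```

The widths of the first four classes are all 12, while the last class spans 104.5 − 55.5 = 49 units.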

1.3 Class Limits, Class Boundaries, Class Marks and Class Intervals

Class limits are the lower and upper values of a class. Thus each
class has lower and upper limits.

A class boundary is the value midway between the upper limit of the
preceding class and the lower limit of the current class. This gives
the lower class boundary of the current class, which is also the upper
class boundary of the preceding class.

A class mark is the value midway between the lower and upper class
boundaries (or, equivalently, between the lower and upper class limits).

A class interval/size/width/length is the difference between the upper
boundary and the lower boundary of a class.
Example 1.4
Find the class limits, boundaries, class marks and class width of the
following classes: 15 – 19 and 20 – 29.
Solution
The required statistics are summarized in the following table.

Table 1.4: Computed limits

Class     Lower limit   Upper limit   Lower boundary   Upper boundary   Class mark   Class size
15 – 19   15            19            14.5             19.5             17           5
20 – 29   20            29            19.5             29.5             24.5         10
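The definitions above translate directly into arithmetic. The following Python sketch assumes, as in Table 1.4, that consecutive classes are separated by a gap of 1 (an upper limit of 19 followed by a lower limit of 20), so each boundary lies half a unit outside the limits. The `gap` parameter and the function name `class_summary` are hypothetical.

```python
def class_summary(lower_limit, upper_limit, gap=1):
    """Return (lower boundary, upper boundary, class mark, class size) for one class.

    `gap` is the distance between the upper limit of one class and the lower
    limit of the next (1 for whole-number data), so boundaries sit gap/2 outside the limits.
    """
    lower_boundary = lower_limit - gap / 2
    upper_boundary = upper_limit + gap / 2
    class_mark = (lower_limit + upper_limit) / 2
    class_size = upper_boundary - lower_boundary
    return lower_boundary, upper_boundary, class_mark, class_size

for lower, upper in [(15, 19), (20, 29)]:
    print(f"{lower} - {upper}:", class_summary(lower, upper))
```

The class mark is computed from the limits here; computing it from the boundaries gives the same value because both pairs are symmetric about the midpoint.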

1.4 Histograms
A histogram is a graphical representation of a frequency distribution
consisting of vertical rectangles erected on the horizontal axis.
Adjacent rectangles meet at the class boundaries, and the height of
each rectangle on the vertical axis is the class frequency.

Example 1.5
Draw the histogram for the grocery data given in Example 1.2.

Solution
We first create a frequency table with class boundaries, as
shown below:
Table 1.5: Class boundaries and frequency

Amount ($)   Class boundaries   Frequency
15 – 19      14.5 – 19.5        4
20 – 24      19.5 – 24.5        5
25 – 29      24.5 – 29.5        8
30 – 34      29.5 – 34.5        5
35 – 39      34.5 – 39.5        8
40 – 44      39.5 – 44.5        6
45 – 49      44.5 – 49.5        4

The histogram is given below:

[Figure: The Histogram for Amount Spent in Grocery. Horizontal axis: class boundaries (14.5 – 19.5 to 44.5 – 49.5); vertical axis: Frequency.]
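Each rectangle of the histogram is determined by its class boundaries and its frequency. The Python sketch below builds one (left edge, width, height) triple per class from Table 1.5 and prints a rough text rendering; `histogram_bars` is a hypothetical helper. If matplotlib is available, these triples could be drawn with `matplotlib.pyplot.bar(x, height, width, align="edge")`, which aligns each bar's left edge with its lower boundary, but the sketch itself uses only plain Python.

```python
# Class boundaries and frequencies from Table 1.5
boundaries = [14.5, 19.5, 24.5, 29.5, 34.5, 39.5, 44.5, 49.5]
freqs = [4, 5, 8, 5, 8, 6, 4]

def histogram_bars(boundaries, freqs):
    """One (left edge, width, height) rectangle per class; adjacent bars share a boundary."""
    return [(boundaries[i], boundaries[i + 1] - boundaries[i], f)
            for i, f in enumerate(freqs)]

# Rough text rendering: one '#' per unit of frequency
for left, width, height in histogram_bars(boundaries, freqs):
    print(f"{left:5.1f} - {left + width:5.1f} | " + "#" * height)
```

Because every class has width 5, the bar heights can be read directly as frequencies.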

1.5 Frequency Polygon

A frequency polygon is a line graph whose vertices are the points
(class mark, frequency) of the classes, joined by straight lines.
To create a frequency polygon, one has to extend the distribution
by introducing one class before the lowest class and one class
after the highest class, both with zero frequency, so that the
polygon starts and ends on the horizontal axis.
Example 1.6
Draw a frequency polygon for the data in Example 1.2.

Solution
The frequency distribution with class marks is shown below:

Amount ($)   Class mark   Frequency
10 – 14      12           0
15 – 19      17           4
20 – 24      22           5
25 – 29      27           8
30 – 34      32           5
35 – 39      37           8
40 – 44      42           6
45 – 49      47           4
50 – 54      52           0

The frequency polygon is given below, where the class marks are
amounts in dollars.
[Figure: The Frequency Polygon. Horizontal axis: Class marks ($12 to $52); vertical axis: Frequency.]
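The vertices of the frequency polygon can be generated in Python by padding the distribution with an empty class at each end and taking class marks. This sketch assumes equal class widths and whole-number class limits, as in Example 1.2; `polygon_vertices` is a hypothetical helper name.

```python
# Classes and frequencies from Example 1.2
classes = [(15, 19), (20, 24), (25, 29), (30, 34), (35, 39), (40, 44), (45, 49)]
freqs = [4, 5, 8, 5, 8, 6, 4]

def polygon_vertices(classes, freqs):
    """(class mark, frequency) points, padded with a zero-frequency class at each end."""
    width = classes[0][1] - classes[0][0] + 1   # assumes equal widths and integer limits
    first, last = classes[0], classes[-1]
    padded = ([(first[0] - width, first[1] - width)] + classes
              + [(last[0] + width, last[1] + width)])
    padded_freqs = [0] + freqs + [0]
    return [((lower + upper) / 2, f) for (lower, upper), f in zip(padded, padded_freqs)]

for mark, f in polygon_vertices(classes, freqs):
    print(f"${mark:.0f}: {f}")
```

Joining these points in order with straight lines gives the polygon; the two zero-frequency points close it on the horizontal axis.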

1.6 Cumulative Frequency Polygon (OGIVE)
This is a line graph obtained by plotting the upper class
boundaries along the horizontal axis and the corresponding
cumulative frequencies along the vertical axis.
Example 1.7
Draw a cumulative frequency polygon for the data given in Table
1.5.
Solution
The table is extended by adding a column of upper class
boundaries and a column of cumulative frequencies:

Amount ($)   Class boundaries   Frequency   Upper boundary   Cum. frequency
                                            Less than 14.5   0
15 – 19      14.5 – 19.5        4           Less than 19.5   4
20 – 24      19.5 – 24.5        5           Less than 24.5   9
25 – 29      24.5 – 29.5        8           Less than 29.5   17
30 – 34      29.5 – 34.5        5           Less than 34.5   22
35 – 39      34.5 – 39.5        8           Less than 39.5   30
40 – 44      39.5 – 44.5        6           Less than 44.5   36
45 – 49      44.5 – 49.5        4           Less than 49.5   40

The cumulative frequency polygon (OGIVE) is shown below.

[Figure: Cumulative Frequency Polygon. Horizontal axis: Boundaries (less than 14.5 to less than 49.5); vertical axis: Cumulative Frequency.]
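The "less than" cumulative frequencies plotted in the ogive are running totals of the class frequencies. A minimal Python sketch uses `itertools.accumulate` with `initial=0` (available from Python 3.8) so that the first point, "less than 14.5", has cumulative frequency 0.

```python
from itertools import accumulate

# Upper class boundaries (starting with 14.5 for the empty class) and frequencies from Table 1.5
upper_boundaries = [14.5, 19.5, 24.5, 29.5, 34.5, 39.5, 44.5, 49.5]
freqs = [4, 5, 8, 5, 8, 6, 4]

# Running totals; the leading 0 corresponds to "less than 14.5"
cumulative = list(accumulate(freqs, initial=0))

# Each (upper boundary, cumulative frequency) pair is one vertex of the ogive
for b, cf in zip(upper_boundaries, cumulative):
    print(f"less than {b}: {cf}")
```

The last cumulative frequency equals the total number of observations, which is a quick check on the table.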
