
7. REPRESENTATION OF DATA
Objective:
After studying this unit, the reader should be able to:
• Classify data
• Know about the stem and leaf diagram
• Draw the stem and leaf diagram
• Know about frequency distribution
• Construct a frequency distribution
• Know relative and cumulative frequencies
• Construct relative and cumulative frequencies
• Know different graphs
• Draw the graphs for the given data

Structure
7.1 Introduction
7.2 Classification of data
7.3 Construction of frequency table
7.4 Relative frequency distribution
7.5 Cumulative frequency distribution
7.6 Stem and leaf diagram
7.7 Diagrammatic representation of data
7.8 Key words
7.9 Suggested readings
7.10 Review exercise

7.1 INTRODUCTION
Statistics is a science of aggregates: it is the science of groups of numbers and deals with sets
of data. The data as collected are called raw data, which are simply the facts and figures
gathered from the population or a sample. These data are then arranged, displayed, summarized,
compiled and analyzed. Compilation of data can be tabular or graphical.
In tabular compilation we represent the data in the form of a table. It may be a simple
classification according to some class intervals. Graphs and charts are used to display the data
diagrammatically.

7.2. CLASSIFICATION OF DATA

The collected data are classified using tally marks. Tally marks are small vertical lines used
as symbols to represent a count. The raw data that are collected are classified using a
frequency table.

Classification gives a summary of the data. It facilitates comparison, if any, between the
attributes of the variable under consideration. Classification highlights the characteristics of
the data.

The basis for classification may be area, time, quality or quantity.
Area-wise classification separates the data according to geographical areas, for example the
sales of different brands of colour televisions in different parts of the country. This type of
data may give us information about the pattern of customer choices in different parts of the
country. Accordingly, a colour television company can develop its advertising strategy for
different parts of the country and for different advertising media.

Time-wise classification arranges the data chronologically. Time series data are data classified
with respect to time. The basis for a time-based classification may be a day, a month, a year, a
decade and so on. The sensitive index of a stock market is recorded with a day as its base, whereas
the turnover of a company may have a year as its base.

Qualitative classification of the data is made according to some attribute or characteristic that
cannot be measured. The classification may be dichotomous, i.e. into two categories: having the
attribute and not having it. The classification may also have several classes. For example, the
students in an MBA class may be classified by graduation degree as Science, Commerce, Arts,
Engineering etc.

Quantitative classification is used when the characteristic under consideration is measurable.
Examples are the salary of employees, the length of iron rods manufactured by a machine, and the
quantity of soft drink dispensed by a machine.

7.3 Construction of frequency table


The data are represented as a frequency table. First check whether the data are discrete or
continuous. Continuous data are represented using class intervals.

Case I: Discrete data


When the data are discrete, we classify them as follows.

Consider the data giving the number of employees working in small scale industries in a
particular MIDC area.

21 22 26 24 28 21 23 25 26 25 25
26 28 21 24 26 23 25 21 25 26 23
24 28 26 28 28 25 26 25 21 24 29
28 23
To classify these data we use tally marks. For each entry in the data above we put a vertical line
as a tally mark against its value, to find how many 21’s there are, how many 22’s there are, and so
on. Four marks are vertical lines and the fifth is a slanting line, which makes a bundle of five.

Number of employees    Tally marks    Frequency
21                     ||||/          5
22                     |              1
23                     ||||           4
24                     ||||           4
25                     ||||/ ||       7
26                     ||||/ ||       7
27                     -              0
28                     ||||/ |        6
29                     |              1
Total                                 35

The frequencies are obtained by counting the tally marks against each value.
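The tallying procedure for discrete data can be sketched in Python. This is only an illustrative sketch: the helper name `tally` and the text rendering of a bundle of five as `||||/` are assumptions made here, not part of the original unit.

```python
from collections import Counter

# Number of employees in each small scale industry (data from the example above).
employees = [
    21, 22, 26, 24, 28, 21, 23, 25, 26, 25, 25,
    26, 28, 21, 24, 26, 23, 25, 21, 25, 26, 23,
    24, 28, 26, 28, 28, 25, 26, 25, 21, 24, 29,
    28, 23,
]

def tally(count):
    """Render a count as tally marks: bundles of five ('||||/') plus leftovers."""
    bundles, rest = divmod(count, 5)
    return " ".join(["||||/"] * bundles + (["|" * rest] if rest else []))

counts = Counter(employees)

# List every value from the minimum to the maximum, so that a value
# that never occurs (27 here) still appears with frequency 0.
frequency_table = [
    (value, counts.get(value, 0))
    for value in range(min(employees), max(employees) + 1)
]

for value, freq in frequency_table:
    print(f"{value:>5}  {tally(freq):<10} {freq:>3}")
print(f"Total  {'':<10} {sum(freq for _, freq in frequency_table):>3}")
```

Listing the full range of values, rather than only the values that occur, is what makes the empty row for 27 visible in the table.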

Case II: Frequency table for continuous data

For continuous data we form non-overlapping sub-classes called class intervals. The
minimum and the maximum observations determine the range covered by the class intervals.
The number of classes should be neither too large nor too small; usually between 8 and 12
classes are used, but there is no hard and fast rule.
Each class indicates a part of the whole range over which the observations are
scattered. A class is identified by its limits, which are called class limits or class
boundaries. The class limits restrict the numbers in the given range. The lower end of
the class is called the lower limit and the upper end of the class is called the upper limit.

Example: If the classes are 0-10, 10-20, 20-30, …, 70-80, the lower limit of the class 20-30 is
20 and its upper limit is 30.

If the upper limit of a class is the same as the lower limit of the next class, the class
interval is called an exclusive class interval. In that case an observation exactly equal to
the upper limit is included in the next class, where that number is the lower limit.

We may get classes where the upper limit of a class is not the same as the lower limit of
the succeeding class. Then we need to apply the continuity correction, because for a
continuous variable there are always observations possible between any two numbers.
Unequal upper and lower limits of successive classes indicate a discontinuity, which would
prevent the variable from taking values in that part of its range.
Continuity correction is the process of making the class intervals continuous, without
breaks.
It is applied as follows. The part of the range that is not included in any class is
distributed evenly between the adjacent classes by extending their limits.
The correction factor is half of the difference between the lower limit of the second class
and the upper limit of the first class.

Illustration: if the class intervals are


0-9
10-19
20-29
30-39
and so on. Here the random variable can take all possible values, integers as well as
fractions. The values in 9-10, 19-20 and 29-30 are not included in any class interval, so
we need to redefine the class intervals so that there is no discontinuity and all the
classes still have the same width.
The excluded portions are 9-10 of width 1, 19-20 again of width 1, and so on. We divide each
excluded portion into two halves, add one half to the upper limit of the class below it and
subtract the other half from the lower limit of the class above it. The classes become
-0.5-9.5, 9.5-19.5, 19.5-29.5 and so on.
So after applying the continuity correction we have class intervals of width 10 that take
into consideration all the values the variable can take.
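The continuity correction can be sketched in Python as follows. The function name `continuity_correction` is hypothetical, and the sketch assumes that the gap between successive classes is the same throughout, as it is in the illustration.

```python
def continuity_correction(classes):
    """Return corrected (lower, upper) limits for classes given as (lower, upper).

    The correction factor is half the gap between the upper limit of the
    first class and the lower limit of the second class. The sketch assumes
    this gap is the same between every pair of successive classes.
    """
    gap = classes[1][0] - classes[0][1]
    factor = gap / 2
    return [(lower - factor, upper + factor) for lower, upper in classes]

classes = [(0, 9), (10, 19), (20, 29), (30, 39)]
corrected = continuity_correction(classes)

for (lower, upper), (new_lower, new_upper) in zip(classes, corrected):
    print(f"{lower}-{upper}  ->  {new_lower} to {new_upper}")
```

After the correction, each upper limit coincides with the lower limit of the next class, so no value of the variable is left out.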

Activity:

Make the following classes continuous by applying the continuity correction.

1. Class interval: 0-9.5  10-19.5  20-29.5  30-39.5  40-49.5

After applying the continuity correction we get the classes as

Class interval: ………  ………  ………  ………  ………

2. Class interval: 1-2.99  3-4.99  5-6.99  7-8.99

After applying the continuity correction we get the classes as

Class interval: ………  ………  ………  ………

3. Class interval: 101-150  150-200  201-250  251-300  300-350

After applying the continuity correction we get the classes as

Class interval: ………  ………  ………  ………  ………

The difference between the upper and the lower limit is called the class width. The
class width indicates the span or size of the class, and should be neither very small nor
very large. If the class is 20-30, the class width is 30 - 20 = 10. Generally the
classes are formed such that the width is 5, 10 or another multiple of 5.

Class intervals may also be open ended.

For example: below 10, 10-20, 20-30, 30-40, 40-50, 50-60, 60 and above.
The first and the last class intervals are open ended. Open-ended classes are
always at the extremes of the distribution; they cannot be in the middle of the distribution.

The midpoint of a class is called the class mark. It is calculated as the sum of the upper
and lower limits divided by two:

Class mark = (upper class limit + lower class limit) / 2

With continuous data we use the class mark very often.

Activity:

1. Write the data in the form of a frequency table.

The data relate to the ages of workers in a manufacturing department.

23 32 25 26 28 29 35 36 34 38 36
25 54 52 53 56 58 59 50 45 44 43
52 32 35 39 48 39 31 35 60 25 45
45 56 52 23 56 31 32 35 39 48 52
32 35 59 42 25 28

Class interval Tally mark Frequency

2. The following data give the salaries of workers in an organization.
2456 2555 2600 2700 2850 3010 2566 2940 3650 2640 3120
2479 2680 2150 3200 4500 2790 2460 2510 2340 2860 2700
2170 2190 2350 2640 2890 2645 3500 2150 2350 2650
2450 2489 2310 2650 2480 2660 2330
Class interval Tally marks Frequency

7.4 Relative frequency distribution.

The relative frequency gives the fraction of the total number of observations contained in
a class. The relative frequency distribution shows the proportion of the observations lying
in each of the non-overlapping class intervals. For data with n observations, the relative
frequency is calculated as

Relative frequency of a class = (frequency of the class) / (total number of observations)

A relative frequency distribution is the tabular summary of the relative frequencies for all
the classes.

Illustration

According to Beverage Digest, Coke Classic, Diet Coke, Dr. Pepper, Pepsi Cola and
Sprite are the five top-selling soft drinks. The data below show the drinks selected in 50
soft drink purchases.
Frequency distribution of soft drink purchases
Soft drink       Frequency
Coke Classic     19
Diet Coke        8
Dr. Pepper       5
Pepsi Cola       13
Sprite           5

The relative frequency distribution is

Soft drink       Frequency    Relative frequency
Coke Classic     19           0.38
Diet Coke        8            0.16
Dr. Pepper       5            0.10
Pepsi Cola       13           0.26
Sprite           5            0.10

Total            50           1.00
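The relative frequencies in this illustration can be reproduced with a few lines of Python. This is a minimal sketch; the variable names are chosen here for illustration only.

```python
# Frequencies of the 50 soft drink purchases from the illustration.
frequencies = {
    "Coke Classic": 19,
    "Diet Coke": 8,
    "Dr. Pepper": 5,
    "Pepsi Cola": 13,
    "Sprite": 5,
}

total = sum(frequencies.values())

# Relative frequency = frequency of the class / total number of observations.
relative = {drink: freq / total for drink, freq in frequencies.items()}

for drink, freq in frequencies.items():
    print(f"{drink:<14} {freq:>3}   {relative[drink]:.2f}")
print(f"{'Total':<14} {total:>3}   {sum(relative.values()):.2f}")
```

The relative frequencies always add up to 1, which is a quick check on the arithmetic.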

Exercise

The following data give the time in days required to complete year-end audits for a sample
of 20 clients of an accounting firm. Classify the data and find the relative frequency
distribution.

Year-end audit times in days


12 14 19 18 15 15 18 17 20 27 22
23 22 21 33 28 14 18 16 13

We classify it in the following manner:

Audit time (days)    Frequency    Relative frequency
10-14                4            0.20
15-19                8            0.40
20-24                5            0.25
25-29                2            0.10
30-34                1            0.05

Total                20           1.00

Exercise:
The staff of a doctor’s office studied the waiting times for patients who arrive at the office
with a request for emergency service. The following data were collected over a one-month
period. Waiting times are in minutes.
2 5 10 12 4 4 5 7 11 8 9
8 12 21 6 8 7 13 18 3
Use classes of 0-4, 5-9, etc. Show the frequency distribution.
Show the relative frequency distribution.
What proportion of patients needing emergency service have a waiting time of 10-14 minutes?

7.5 Cumulative frequency distribution

A variation of the frequency distribution is the cumulative frequency distribution. Cumulative
frequency shows the number of data items with values less than or equal to the upper
class limit of each class, or the number of data items with values greater than or equal to
the lower class limit of each class. That means cumulative frequencies are of two types:
the "less than or equal to" type and the "greater than or equal to" type.

Consider the following frequency distribution.

Class interval    Frequency    Cumulative frequency
                               Less than or equal to    Greater than or equal to
0 – 10            5            5                        95 + 5 = 100
10 – 20           12           5 + 12 = 17              83 + 12 = 95
20 – 30           17           17 + 17 = 34             66 + 17 = 83
30 – 40           26           34 + 26 = 60             40 + 26 = 66
40 – 50           20           60 + 20 = 80             20 + 20 = 40
50 – 60           17           80 + 17 = 97             3 + 17 = 20
60 – 70           3            97 + 3 = 100             3

In the "less than or equal to" type cumulative frequency, we count the observations that are
less than or equal to the upper limit of each class interval. How many observations are less
than or equal to 10? They are 5. How many observations are less than or equal to 20? They are
5 + 12 = 17, and so on. How many are less than or equal to 70? All 100.

In the "greater than or equal to" type cumulative frequency, we count the observations that
are greater than or equal to the lower limit of each class interval. Consider the last class
interval, 60 – 70. How many observations are greater than or equal to 60? They are 3. How
many are greater than or equal to 50? They are 3 + 17 = 20. Continuing in the same way, all
100 are greater than or equal to 0.
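Both types of cumulative frequency are running totals, one taken from the first class downwards and one from the last class upwards. A minimal Python sketch using the frequencies of the table above (variable names chosen here for illustration):

```python
from itertools import accumulate

classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50), (50, 60), (60, 70)]
frequencies = [5, 12, 17, 26, 20, 17, 3]

# "Less than or equal to" type: running total from the first class onwards.
less_than = list(accumulate(frequencies))

# "Greater than or equal to" type: running total from the last class
# backwards, then put back into the original class order.
greater_than = list(accumulate(reversed(frequencies)))[::-1]

for (lower, upper), f, lt, gt in zip(classes, frequencies, less_than, greater_than):
    print(f"{lower:>2} - {upper:<2}  {f:>3}  {lt:>4}  {gt:>4}")
```

The last "less than or equal to" value and the first "greater than or equal to" value both equal the total number of observations.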

Exercise

For the following distribution form the cumulative frequency distribution.

Audit time (days)    Frequency
10-14                4
15-19                8
20-24                5
25-29                2
30-34                1

Total                20

7.6 Stem Leaf Diagram


Another way of classifying the data is the stem and leaf diagram. In it the data are
represented as the stem of a tree and its leaves. The data are grouped into the tens,
twenties, thirties and so on, so that identifying the frequencies becomes easier.
The process of drawing a stem and leaf diagram is explained with the following example.

Consider the data


23 32 25 26 28 29 35 36 34 38 36
25 54 52 53 56 58 59 50 45 44 43
52 32 35 39 48 39 31 35 50 25 45
45 56 52 23 56 31 32 35 39 48 52
32 35 59 42 25 28
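The data consist of two-digit numbers. Consider the tens digits; they are two, three, four
and five only. Write these digits (the stems) to the left of a vertical line, and write the
units digit of each observation (the leaf) to the right of its stem. The last column gives
the number of leaves on each stem.

Stem | Leaves                                | Frequency
2    | 3 5 6 8 9 5 5 3 5 8                   | 10
3    | 2 5 6 4 8 6 2 5 9 9 1 5 1 2 5 9 2 5   | 18
4    | 5 4 3 8 5 5 8 2                       | 8
5    | 4 2 3 6 8 9 0 2 0 6 2 6 2 9           | 14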
Rotating the stem and leaf diagram, so that the stems 2, 3, 4 and 5 run along the horizontal
axis, gives the form of a histogram.

[Figure: the stem and leaf diagram rotated, with the leaves stacked in columns over the stems 2, 3, 4 and 5]

The frequency distribution of the above data is as follows

Class interval Frequency


20-29 10
30-39 18
40-49 8
50-59 14

Class intervals can be made continuous as discussed above.

For three-digit numbers, the stem and leaf diagram can be drawn using either two digits as
the stem and one digit as the leaf, or one digit as the stem and two digits as the leaf.
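The construction of a stem and leaf diagram for two-digit data can be sketched in Python. The function name `stem_and_leaf` is hypothetical, and the sketch handles only the one-digit-leaf case used in the example of this section.

```python
from collections import defaultdict

# Age data from the example in this section.
ages = [
    23, 32, 25, 26, 28, 29, 35, 36, 34, 38, 36,
    25, 54, 52, 53, 56, 58, 59, 50, 45, 44, 43,
    52, 32, 35, 39, 48, 39, 31, 35, 50, 25, 45,
    45, 56, 52, 23, 56, 31, 32, 35, 39, 48, 52,
    32, 35, 59, 42, 25, 28,
]

def stem_and_leaf(values):
    """Group values by tens digit (stem); the units digit is the leaf.

    Leaves are kept in the order in which the observations appear.
    """
    stems = defaultdict(list)
    for value in values:
        stem, leaf = divmod(value, 10)
        stems[stem].append(leaf)
    return dict(sorted(stems.items()))

diagram = stem_and_leaf(ages)

for stem, leaves in diagram.items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}   ({len(leaves)})")
```

The number of leaves on each stem is the frequency of the corresponding class, so the frequency distribution can be read directly from the diagram.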

Exercise

In a study of job satisfaction, a series of tests was administered to 50 subjects. The
following data were obtained; higher scores represent greater dissatisfaction.
87 76 67 58 92 59 41 50 90 75 80
81 70 73 69 61 88 46 85 97 50 47
81 87 75 60 65 92 77 71 70 74
53 43 61 89 84 83 70 46 84 76 78
64 69 76 78 67 74 64
Construct a stem and leaf display for the data.

7.7 Diagrammatic presentation of data

Data are presented using various diagrams for a better understanding of the patterns in the
data. The graph or diagram used depends on the purpose and the need.
The most common are histograms, line diagrams, bar diagrams, frequency polygons,
cumulative frequency curves (ogives) and pie diagrams.

Line diagrams: A line diagram presents two-variable data. One variable is
plotted on the X axis and the second variable is plotted on the Y axis. The points
are joined using lines. A line diagram shows the increase or decrease in the data.
Line diagrams are used for time series data, where the year is plotted on the X axis
and the value of the other variable is plotted on the Y axis, which shows the general
tendency of the data as a whole.

The following data give the wholesale price index for a certain period.

Year         Wholesale price index
1994-95      12.5
1995-96      8.1
1996-97      4.6
1997-98      4.4
1998-99      5.9
1999-2000    3.3

[Figure: line diagram of the wholesale price index (Y axis, 0 to 14) against the year (X axis, 1994-95 to 2000-2001)]

Activity: Draw a line diagram for the following data.

The monthly sales of a pharmaceutical company for a period of one year are given below.

Month        Sales (lakhs of Rupees)
Jan          384
Feb          356
March        389
April        401
May          410
June         412
July         415
August       423
September    425
October      420
November     418
December     405

Bar diagram: The data are presented using rectangular blocks (bars), either horizontal or
vertical. Bar diagrams are used for presenting the facts in the data. Articles in
newspapers, magazines, journals etc. use bar diagrams to present the behaviour of a
characteristic or attribute over time or space.
A vertical bar diagram presents the characteristic or attribute on the X axis and the
corresponding values on the Y axis.
A horizontal bar diagram uses the axes in the reverse order.
We illustrate how to draw bar diagrams using the following example.
Example: The data below show the production of shirts in a manufacturing company.

Year                   1990  1991  1992  1993  1994  1995  1996
No. of shirts (‘00)    52    55    56    60    57    58    56

Draw the vertical bar diagram.

[Figure: vertical bar diagram of the number of shirts (Y axis, 0 to 70) for the years 1990 to 1996 (X axis)]

Bar diagrams are very commonly used to compare observations. The height of each bar is
directly proportional to the quantity it represents. Bar diagrams can also be multiple bar
diagrams or divided bar diagrams.

In multiple bar diagrams two or more sets of interrelated data are represented. The
method remains the same as that of the simple bar diagram. Sometimes the bars are shaded or
given different colours to show the different items.

Divided bar diagram: The parts of a total are represented as segments of one large bar.
The magnitude of each segment in a bar is directly proportional to the quantity it shows.

Illustration

The expenditures of two families on different commodities are given below.
The incomes of the two families are 1000 and 1200 respectively.

Commodity             Expenditure
                      Family A    Family B
Food                  300         400
Clothing              250         200
Education             50          360
Others                380         300
Savings or deficit    +20         -60

[Figure: percentage divided bar diagram for the two families (bars 1 and 2); Y axis from -20% to 100%]
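The segment sizes for a percentage divided bar diagram can be worked out as below. This is a sketch under the assumption that each segment is expressed as a percentage of the family's income; the function name `percentage_shares` is hypothetical.

```python
# Expenditure of each family by commodity: (Family A, Family B).
expenditure = {
    "Food": (300, 400),
    "Clothing": (250, 200),
    "Education": (50, 360),
    "Others": (380, 300),
    "Savings or deficit": (20, -60),
}
income = (1000, 1200)  # Family A, Family B

def percentage_shares(family):
    """Percentage of a family's income taken by each item (0 = A, 1 = B)."""
    return {
        item: 100 * amounts[family] / income[family]
        for item, amounts in expenditure.items()
    }

shares_a = percentage_shares(0)
shares_b = percentage_shares(1)

for item in expenditure:
    print(f"{item:<20} {shares_a[item]:>7.2f}%  {shares_b[item]:>7.2f}%")
```

A deficit gives a negative percentage, which is why the vertical axis of such a diagram has to extend below 0%.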

Exercise

The following data give the estimated oilseed crop production for a season.

Oilseed       Production in lakh tonnes
              Area A    Area B
Groundnut     2.00      1.70
Soya bean     1.25      1.25
Sesame        0.75      0.20

Total         4.00      3.15

Pie diagram:
A circle of 360 degrees is divided into sectors according to the share of each component.
The percentage share of each component is converted into the corresponding angle in degrees,
and the sector is shaded or shown in a different colour.

Illustration:

The Management class in an institute is made up of students with graduation degrees as
follows. Represent these data as a pie diagram.

Graduation degree    Number of students
Commerce             30
Science              15
Engineering          32
Pharmacy             10
Others               3

[Figure: pie diagram titled "Composition of the class", with sectors 1 to 5 for the five graduation degrees]
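The sector angles for this pie diagram follow from each component's share of the total. A minimal Python sketch of the conversion (variable names are chosen here for illustration):

```python
# Number of students by graduation degree, from the illustration.
students = {
    "Commerce": 30,
    "Science": 15,
    "Engineering": 32,
    "Pharmacy": 10,
    "Others": 3,
}

total = sum(students.values())

# Each sector angle is the component's share of the total, times 360 degrees.
angles = {degree: 360 * count / total for degree, count in students.items()}
percentages = {degree: 100 * count / total for degree, count in students.items()}

for degree, count in students.items():
    print(f"{degree:<12} {count:>3}  {percentages[degree]:>6.2f}%  {angles[degree]:>6.1f} degrees")
```

The angles must add up to 360 degrees, which serves as a check before drawing the sectors.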

Exercise

Plot the pie chart for the data below:

Crop     Area in million hectares
Wheat    16.10
Rice     18.23
Jawar    3.50
Bajra    3.64
Maize    1.60
Total    43.07

Frequency curve:
The frequency curve is a smooth curve representing the frequency distribution. On X axis
plot the class marks and on the Y axis plot the frequencies.

Marks              10-15  15-20  20-25  25-30  30-35  35-40  40-45
No. of students    10     12     13     22     20     7      4

[Figure: curve for the marks data; the extracted axis labels read "Cumulative frequency" (Y axis, 0 to 25) and "upper class limits" (X axis)]
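The points to be plotted for the frequency curve are the class marks paired with the frequencies. A minimal Python sketch for the marks data above (variable names chosen here for illustration):

```python
# Classes (lower limit, upper limit) and frequencies for the marks data.
classes = [(10, 15), (15, 20), (20, 25), (25, 30), (30, 35), (35, 40), (40, 45)]
frequencies = [10, 12, 13, 22, 20, 7, 4]

# Class mark = (upper class limit + lower class limit) / 2.
class_marks = [(lower + upper) / 2 for lower, upper in classes]

# Points of the frequency curve: class mark on the X axis, frequency on the Y axis.
points = list(zip(class_marks, frequencies))

for mark, freq in points:
    print(f"class mark {mark:>5}: frequency {freq}")
```

A smooth curve drawn through these points gives the frequency curve.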

Exercise

Draw the frequency curve for the following data.

Marks    No. of students
10-20    5
20-30    15
30-50    24
50-70    20
70-80    7

Ogives:

The cumulative frequency curves are called ogives. Smooth curves can be drawn for the
"less than or equal to" type or the "greater than or equal to" type. An ogive shows how
many observations lie below or above certain values in the distribution, rather than
recording the numbers within each interval.
The general form of the ogives is as follows: on the X axis we plot the class limits and
along the Y axis we plot the cumulative frequencies.

[Figure: general form of the ogives, showing the "less than or equal to" type and the "greater than or equal to" type curves plotted against the class limits]

Illustration:

Draw the cumulative frequency curve for the following data:


Height in Cm 150-154 154-158 158-162 162-166 166-170
Number of children 10 12 20 10 8

[Figure: Less than frequency curve. X axis: upper class limits; Y axis: "less than or equal to" cumulative frequency, 0 to 70]

[Figure: Greater than frequency curve. X axis: lower class limits (150, 154, 158, 162, 166); Y axis: "greater than or equal to" cumulative frequency, 0 to 70]
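The points plotted on the two ogives for the height data can be computed as in the following Python sketch. The variable names are chosen here for illustration, and only the plotting coordinates are computed, not the drawing itself.

```python
from itertools import accumulate

# Height classes in cm (lower limit, upper limit) and number of children.
classes = [(150, 154), (154, 158), (158, 162), (162, 166), (166, 170)]
frequencies = [10, 12, 20, 10, 8]

upper_limits = [upper for _, upper in classes]
lower_limits = [lower for lower, _ in classes]

# "Less than or equal to" ogive: cumulative frequency against upper class limits.
less_than_points = list(zip(upper_limits, accumulate(frequencies)))

# "Greater than or equal to" ogive: cumulative frequency from the top,
# plotted against lower class limits.
greater_than = list(accumulate(reversed(frequencies)))[::-1]
greater_than_points = list(zip(lower_limits, greater_than))

print("Less than ogive:   ", less_than_points)
print("Greater than ogive:", greater_than_points)
```

The "less than" curve rises to the total of 60 children at the last upper limit, while the "greater than" curve starts from 60 at the first lower limit and falls.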

Exercise:

The following data relate to factory size according to employment. Draw a less than curve
and a more than curve for the data.

Employment size (number)    No. of factories (in 1000)
0-50                        31
50-100                      29
100-200                     70
200-500                     63
500-1000                    119
1000-2000                   126
2000-5000                   85

The frequency distribution of the weekly wages of 100 workers in a factory is given below:

Weekly wages    No. of workers
120-124         3
125-129         5
130-134         12
135-139         23
140-144         31
145-149         10
150-154         8
155-159         5
160-164         3

Draw the ogive for the distribution and use it to determine the median wage of a worker
and verify the result by the formula.

7.8 Key words

Tally marks: Tally marks are small vertical lines used as symbols to represent a count.

Class intervals: The non-overlapping sub-classes into which continuous data are grouped are
called class intervals.

Class limits or class boundaries: A class is identified by its limits, which are called
class limits or class boundaries.

Class mark: The midpoint of a class is called the class mark.

Relative frequency: The relative frequency gives the fraction of the total number of
observations contained in a class.

Cumulative frequency: Cumulative frequency shows the number of data items with
values less than or equal to the upper class limit of each class, or the number of data
items with values greater than or equal to the lower class limit of each class.

7.9 Suggested readings

Anderson et al., Statistics for Business and Economics, eighth edition, 2002, Thomson Asia
Pvt. Ltd., Singapore.

R. Levin and D. Rubin, Statistics for Management, seventh edition, 1997, Prentice Hall of
India, New Delhi.

Frank and Althoen, Statistics: Concepts and Applications, 1994, Cambridge University Press,
Cambridge.

A. D. Aczel and J. Sounderpandian, Complete Business Statistics, 2002, Tata McGraw Hill,
New Delhi, India.

W. J. Stevenson, Business Statistics: Concepts and Applications, 1978, Harper and Row
Publishers, New York, USA.

7.10 Review exercise

1. The following data give the income distribution of workers in two factories.
Construct the relative frequency distributions and the cumulative frequency
distributions.

Income (in 1000 Rs.)    10-12  12-14  14-16  16-18  18-20  20-22  22-24
Number of workers in
Factory 1               10     15     65     73     70     17     10
Factory 2               25     34     40     50     30     30     10

2. The following data give the income distribution of workers in two factories.
Which distribution shows more variability?

Income (in 1000 Rs.)    10-12  12-14  14-16  16-18  18-20  20-22  22-24
Factory 1               10     15     65     73     70     17     10
Factory 2               25     34     40     50     30     30     10

3. The number of apartments per complex in a city is given below.

91 79 66 98 127 139 154 147 192
88 97 92 87 142 127 184 145 162
95 89 86 98 145 129 149 158 241

Construct a frequency distribution using intervals 66-87, 88-109, ……, 220.
Also construct the relative frequency distribution as well as the cumulative
frequency distribution.

4. For the distribution of marks below, construct the frequency distribution and the relative
frequency distribution.
Obtain the proportion of students scoring marks less than 40.
What is the proportion of students scoring marks more than 60?

Marks less than    No. of students
20                 10
40                 35
60                 56
80                 90
100                100

5. Draw the ogive for the data in problem no. 4.

6. For the following data related to the age of policy holders, draw the histogram.

Age in years             20-25  25-30  30-35  35-40  40-45  45-50
No. of policy holders    8      12     24     16     15     5

7. The lives of a model of refrigerator recorded in a recent survey are given below.

Life (years)               0-2  2-4  4-6  6-8  8-10  10-12
Number of refrigerators    5    16   13   7    5     4

Draw the ogives.

8. The table below shows the annual sales ($ millions) of Speedcall mobile
phones for a random sample of 150 outlets.

Annual sales of Speedcall mobile phones ($ million)    Number of outlets
5-9                                                    18
10-14                                                  35
15-19                                                  41
20-24                                                  21
25-29                                                  15
30-34                                                  13
35-39                                                  7

a) What proportion of outlets has annual sales of Speedcall mobile phones of at least
$30 million?
b) What proportion of outlets has annual sales between $20 million and $24 million?
c) What proportion of outlets has annual sales of Speedcall mobile phones of at
most $29 million?
