
Chapter 3:-Organisation of data

Definition of organisation:- "Organisation of data refers to the systematic
arrangement of data so that comparison and future analysis are made
possible".
Definition of classification:- "Classification is the process of arranging data
into homogeneous groups or classes according to their common
characteristics".
Classification of data:- Raw data can be classified in various ways depending
upon the requirements:
1. Chronological classification.
2. Spatial classification.
3. Quantitative classification.
4. Qualitative classification.
Chronological classification:- When data is classified or arranged by the time
of its occurrence, such as years, months, weeks or days, it is called
chronological classification.
Ex:- Population of India (in crore)
Year    Population
1951    35.7
1961    43.8
The above example shows the population of India as a time series, with a
different value for each year.
Spatial classification:- When data is classified by geographical region or
location, such as countries, states, cities or districts, it is called
spatial classification.
Quantitative classification:- When data is classified by characteristics that
can be expressed in numerical terms, it is called quantitative classification.
Ex:- income, weight, height, age, production, marks etc.
Qualitative classification:- When data is classified according to some quality
or attribute, it is called qualitative classification.
Ex:- nationality, literacy, religion, gender, marital status.
Variables:- "When an entity or a value changes from person to person, place to
place and time to time, it is called a variable".
Variables are of two types:-
Discrete variables
Continuous variables
Discrete variables:- "Its value changes only in jumps of finite numbers".
The value jumps from one finite number to the next and does not take any
intermediate value, so fractional and decimal values do not occur.
Ex:- population, number of books in the library.
Continuous variables:- "A continuous variable can take or assume any
numerical value, such as an integer, a decimal or a fractional value".
Ex:- height of a student, distance, time, length etc.
Frequency distribution:-
"A frequency distribution is a comprehensive way to classify the raw data of
a quantitative variable, showing how its values are distributed across classes
along with their corresponding class frequencies".
Concepts under frequency distribution:-
1. Frequency:- "Frequency refers to the number of values in a particular class,
or the number of observations in a particular class interval".
2. Class limit:- "It refers to the two ends of a class interval".
a) The lowest value is called the lower class limit.
b) The highest value is called the upper class limit.
3. Class interval/class width:-
"The difference between the upper class limit and the lower class limit is
called the class interval or class width".
4. Range:- "It is the difference between the largest and the smallest value of a
variable".
Range = largest item - smallest item
5. Class midpoint:- "The class midpoint is the middle value of a class".
"It lies halfway between the lower class limit and the upper class limit of a
class".
Class midpoint = (upper class limit + lower class limit) / 2
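The three formulas above (class width, range and class midpoint) can be checked with a small sketch. The function names, the class 10-20 and the sample values are illustrative assumptions, not part of the original notes.

```python
# Hypothetical helpers; the class 10-20 and the sample values are illustrative only.

def class_width(lower, upper):
    """Class interval (width) = upper class limit - lower class limit."""
    return upper - lower

def class_midpoint(lower, upper):
    """Class midpoint = (upper class limit + lower class limit) / 2."""
    return (upper + lower) / 2

def value_range(values):
    """Range = largest item - smallest item."""
    return max(values) - min(values)

width = class_width(10, 20)               # 20 - 10
midpoint = class_midpoint(10, 20)         # halfway between 10 and 20
spread = value_range([12, 7, 29, 1, 20])  # 29 - 1

print(width, midpoint, spread)
```

Note that the midpoint is the sum of the two limits divided by two, so it always lies halfway between them.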

How to prepare a frequency distribution:-
Preparing a frequency distribution involves answering four important questions:-
a) How many classes should we have?
b) What should be the size of each class?
c) How should we determine the class limit?
d) How should we get the frequency for each class?
1. How many classes should we have?
# Before determining the number of classes, we find out the extent of the
variable in hand. This can be obtained from the range.
# A large range indicates that the values of the variable are widely spread, and
a small range indicates that they are narrowly spread.
# After obtaining the value of the range, it becomes easier to determine the
number of classes.
# When small class intervals are selected, the number of classes could become
large and therefore difficult to handle.
# If we choose a class interval that is too large, the number of classes would
become too small, resulting in loss of information.
# There is no hard and fast rule to determine the number of classes. The
general rule of thumb is to keep the number of classes between 5 and 15.
2. What should be the size of each class?
# We can decide the class interval once we determine the number of classes, and
vice versa. The two decisions are therefore interlinked.
# Class intervals are generally chosen to be of equal magnitude. However, we
could choose class intervals that are not of equal magnitude, in which case
the classes would be of unequal width.
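The link between the range, the number of classes and the class width can be illustrated with a rough sketch. The rule used here (width = range / number of classes, rounded up) is a common working assumption rather than something stated in these notes, and the function name is hypothetical. The data are the marks of 25 students from the numerical later in this chapter.

```python
import math

# Marks of 25 students, taken from the numerical later in this chapter.
marks = [20, 17, 15, 22, 29,
         21, 23, 27, 18, 12,
         7, 2, 9, 4, 1,
         8, 3, 10, 5, 20,
         16, 12, 8, 4, 28]

def suggest_class_width(values, num_classes):
    """Assumed rule: width = range / number of classes, rounded up."""
    data_range = max(values) - min(values)
    return math.ceil(data_range / num_classes)

data_range = max(marks) - min(marks)  # largest item - smallest item

# More classes means a smaller class width, and vice versa.
for num_classes in (5, 10, 15):
    width = suggest_class_width(marks, num_classes)
    print(num_classes, "classes -> class width of about", width)
```

The sketch only suggests a starting point. In practice the width is usually rounded to a convenient figure such as 5 or 10.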
3. How should we determine the class limits?
The two methods for determining the class limits are:-
Exclusive method
Inclusive method
Exclusive method:- "The upper class limit of a class is excluded, but the lower
class limit of a class is included in the interval".
In this method the classes are formed in such a way that the upper
class limit of one class interval coincides with the lower class limit of the next
class interval.
This method of classification is more suitable for continuous variables.
Ex:- 10-20; 20-30; 30-40.
In the above example, a value of 20 is included in the class interval 20-30,
where it is the lower limit, and not in the class interval 10-20, where it is
the upper limit.
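The exclusive rule can be written as a simple membership test: a value belongs to a class when lower limit <= value < upper limit. The sketch below is a minimal illustration using the classes from the example. The function name is a hypothetical choice.

```python
# Classes from the example above, as (lower limit, upper limit) pairs.
classes = [(10, 20), (20, 30), (30, 40)]

def find_class_exclusive(value, class_list):
    """Return the class containing value: lower limit included, upper limit excluded."""
    for lower, upper in class_list:
        if lower <= value < upper:
            return (lower, upper)
    return None  # value lies outside every class

print(find_class_exclusive(20, classes))    # 20 goes to 20-30, not 10-20
print(find_class_exclusive(19.9, classes))  # still in 10-20
print(find_class_exclusive(40, classes))    # 40 is excluded from 30-40
```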
Inclusive method:- "In the inclusive method both the upper and the lower class
limits are included in the same class interval".
Ex:- 0-9; 10-19; 20-29.
In the above example, both 0 and 9 are included in the class interval 0-9, and
both 10 and 19 are included in the class interval 10-19.
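For comparison, the inclusive rule tests lower limit <= value <= upper limit. The sketch below uses the classes from the example, with a hypothetical function name. It also shows that a value falling between two classes, such as 9.5, has no class at all.

```python
# Classes from the example above, as (lower limit, upper limit) pairs.
classes = [(0, 9), (10, 19), (20, 29)]

def find_class_inclusive(value, class_list):
    """Return the class containing value: both class limits are included."""
    for lower, upper in class_list:
        if lower <= value <= upper:
            return (lower, upper)
    return None  # value falls in a gap or outside every class

print(find_class_inclusive(9, classes))    # upper limit 9 stays in 0-9
print(find_class_inclusive(10, classes))   # lower limit 10 stays in 10-19
print(find_class_inclusive(9.5, classes))  # no class covers the gap between 9 and 10
```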

Adjustment in class interval:-
As an example, consider the frequency distribution of the income of the
employees of a company.

Income (Rs)    No. of employees
800-899        50
900-999        100
1000-1099      200
A close look at the example shows that there is a gap of one between 899, the
upper limit of the first class, and 900, the lower limit of the second class.
We can ensure continuity by making an adjustment in the class interval.
The adjustment is done in the following way:-
1. Find the difference between the lower limit of the second class and the upper
limit of the first class.
2. Divide the difference obtained by two.
3. Subtract the value obtained from the lower limit of all classes.
4. Add the same value to the upper limit of all classes.
The modified table would look as follows.
Income (Rs)     No. of employees
799.5-899.5     50
899.5-999.5     100
999.5-1099.5    200
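The four adjustment steps can be sketched in a few lines. The function name is a hypothetical choice, and the sketch assumes the classes are given in ascending order with the same gap between every pair of neighbouring classes.

```python
# Inclusive income classes from the example, as (lower limit, upper limit) pairs.
income_classes = [(800, 899), (900, 999), (1000, 1099)]

def adjust_classes(class_list):
    """Turn inclusive classes into continuous ones by closing the gap between them."""
    # Step 1: lower limit of the second class minus upper limit of the first class.
    gap = class_list[1][0] - class_list[0][1]
    # Step 2: divide the difference by two.
    correction = gap / 2
    # Steps 3 and 4: subtract from every lower limit, add to every upper limit.
    return [(lower - correction, upper + correction) for lower, upper in class_list]

adjusted = adjust_classes(income_classes)
for lower, upper in adjusted:
    print(f"{lower}-{upper}")
```

After the adjustment, the upper limit of each class coincides with the lower limit of the next class.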
4. How should we get the frequency for each class?
The frequency of an observation means how many times the observation occurs
in the raw data.
The counting of frequency is done by placing tally marks against a particular
class.

Finding class frequency by tally marking:-
Numericals:- Prepare a tally marking chart for the marks obtained in
economics by 25 students in an examination, using the exclusive method.
20, 17, 15, 22, 29
21, 23, 27, 18, 12
7, 2, 9, 4, 1
8, 3, 10, 5, 20
16, 12, 8, 4, 28
Class    Observations                      Tally mark     Frequency   Class midpoint
0-10     7, 8, 3, 2, 9, 8, 4, 5, 4, 1      ||||/ ||||/    10          5
10-20    16, 12, 17, 15, 10, 18, 12        ||||/ ||       7           15
20-30    20, 21, 23, 27, 22, 29, 20, 28    ||||/ |||      8           25
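The chart above can be reproduced with a short sketch that sorts each mark into its class using the exclusive method and counts the members. The text rendering of a tally group ("||||/" for five, meaning four bars and a cross stroke) and the function name are assumptions made for this illustration.

```python
# Marks of the 25 students from the numerical above.
marks = [20, 17, 15, 22, 29,
         21, 23, 27, 18, 12,
         7, 2, 9, 4, 1,
         8, 3, 10, 5, 20,
         16, 12, 8, 4, 28]
classes = [(0, 10), (10, 20), (20, 30)]

def tally(count):
    """Render a count as groups of five ('||||/') followed by the leftover bars."""
    fives, rest = divmod(count, 5)
    groups = ["||||/"] * fives
    if rest:
        groups.append("|" * rest)
    return " ".join(groups)

rows = []
for lower, upper in classes:
    # Exclusive method: lower limit included, upper limit excluded.
    members = [m for m in marks if lower <= m < upper]
    rows.append((f"{lower}-{upper}", tally(len(members)), len(members), (lower + upper) / 2))

for label, tally_text, frequency, midpoint in rows:
    print(f"{label:<8}{tally_text:<14}{frequency:<6}{midpoint}")
```

The frequencies add up to 25, the total number of students, which is a quick check that no observation was missed or counted twice.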

Loss of information:- The classification of data as a frequency distribution
has an inherent shortcoming. While it summarises raw data, making it concise
and comprehensible, it does not show the details that are found in the raw data.
The loss of information occurs because, once data are grouped into classes,
individual observations have no significance in further statistical
calculations.
All values in a class are assumed to be equal to the mid value of the
class interval, because statistical calculations are based on the class mid
value and not on the values of the observations in the class.
Thus the use of the class midpoint instead of the actual values of the
observations involves considerable loss of information.
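One way to see this loss is to compare the mean of the raw marks from the tally numerical with the mean computed from class midpoints alone. The comparison below is an added illustration, not part of the original notes, and the variable names are arbitrary.

```python
# Raw marks of the 25 students from the tally numerical.
marks = [20, 17, 15, 22, 29,
         21, 23, 27, 18, 12,
         7, 2, 9, 4, 1,
         8, 3, 10, 5, 20,
         16, 12, 8, 4, 28]

# The same data after grouping: ((lower limit, upper limit), frequency).
frequency_table = [((0, 10), 10), ((10, 20), 7), ((20, 30), 8)]

# Mean from the actual observations.
raw_mean = sum(marks) / len(marks)

# Mean from the grouped data: every value is treated as its class midpoint.
grouped_total = sum(freq * (lower + upper) / 2 for (lower, upper), freq in frequency_table)
grouped_mean = grouped_total / sum(freq for _, freq in frequency_table)

print("Mean from raw data:     ", raw_mean)
print("Mean from class midpoint:", grouped_mean)
```

The two means differ because the grouped calculation no longer knows where each observation lies inside its class.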
Frequency distribution with unequal classes:- In some frequency
distributions we notice that most of the observations are concentrated in
certain class intervals, while the rest of the class intervals are scarcely
populated with observations.
To overcome this problem, classes are formed in such a way that each
class midpoint coincides with a value around which the observations of that
class tend to concentrate. In such cases unequal class intervals are more
appropriate.
For example, thinly populated ranges may keep classes such as 40-50 and 50-60,
with a class interval of 10, while thickly populated ranges may be split into
narrower classes such as 20-25 and 25-30, with a class interval of 5.
Thus we notice that the new class mark values are more representative of
the data in these classes than the old values.
Frequency array:- "The classification of data for a discrete variable is called
a frequency array".
Discrete variables take whole-number values and not intermediate or
fractional values, so we have a frequency that corresponds to each of its
integral values. Since there are no classes in a frequency array, there are no
class intervals.
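A frequency array can be sketched by counting how often each whole-number value occurs. The household sizes below are hypothetical data invented for this illustration. `collections.Counter` is part of the Python standard library.

```python
from collections import Counter

# Hypothetical data: number of members in each of 10 households.
household_sizes = [3, 4, 2, 4, 5, 3, 4, 2, 3, 4]

# One frequency per distinct value; no classes or class intervals are needed.
frequency_array = sorted(Counter(household_sizes).items())

print("Size  Frequency")
for size, frequency in frequency_array:
    print(f"{size:<6}{frequency}")
```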
Univariate and bivariate frequency distribution:-
Univariate frequency distribution:- "A frequency distribution of a single
variable is called a univariate frequency distribution".
Bivariate frequency distribution:- "A bivariate frequency distribution is the
frequency distribution of two variables".
Ex:- A frequency distribution of two variables, namely sales and advertisement
expenditure, can be shown with sales in the columns and advertisement
expenditure in the rows.
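A bivariate table of this kind can be sketched by classifying each observation on both variables at once and counting the pairs. The firms, their figures and the class limits below are hypothetical, invented only to show the layout. Classes follow the exclusive method.

```python
from collections import Counter

# Hypothetical (sales, advertisement expenditure) pairs for 8 firms, in made-up units.
firms = [(120, 62), (135, 65), (150, 70), (155, 72),
         (170, 75), (125, 68), (160, 78), (145, 64)]

sales_classes = [(100, 140), (140, 180)]  # shown in the columns
ad_classes = [(60, 70), (70, 80)]         # shown in the rows

def find_class(value, class_list):
    """Exclusive method: lower limit included, upper limit excluded."""
    for lower, upper in class_list:
        if lower <= value < upper:
            return (lower, upper)
    raise ValueError(f"{value} does not fall in any class")

# Count how many firms fall in each (advertisement class, sales class) cell.
cell_counts = Counter(
    (find_class(ad, ad_classes), find_class(sales, sales_classes))
    for sales, ad in firms
)

# Rows are advertisement classes, columns are sales classes.
table = [[cell_counts[(ad_class, sales_class)] for sales_class in sales_classes]
         for ad_class in ad_classes]

for ad_class, row in zip(ad_classes, table):
    print(ad_class, row)
```

Each cell holds the number of firms that fall in that particular combination of sales class and advertisement class.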

Lecturer in Economics
Mrs Mini Saji.
