Data Distribution-1

DATA DISTRIBUTION
12, 15, 20, 22, 25

Minimum (Min) = 12
Q1 = 13.5
Median = 20
Q3 = 23.5
Maximum (Max) = 25
Median position = (n + 1)/2
This formula does not give the median itself; it gives the position in the ordered data at which the median is found.
n = number of data points
• 15, 15, 18, 18, 18, 18, 20, 25, 25, 26, 27, 30, 30, 32, 32, 35, 35, 35, 50, 51, 80, 85, 95, 99, 150.
• The data must first be listed in order.
• Listing them in order gives an idea of how the data are distributed.
Min = 15
Max = 150
Median (2nd quartile) = 30
1st quartile (Q1) = 19
3rd quartile (Q3) = 50.5
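The quartiles in both examples above can be checked with a short Python sketch. It assumes one quartile convention, chosen because it reproduces the values above: the median sits at position (n + 1)/2, and when n is odd the median is left out of both halves, so Q1 and Q3 are the medians of the lower and upper halves. Textbooks and statistical packages differ on this rule, so other tools may report slightly different Q1 and Q3. The function names are illustrative only.

```python
from statistics import median


def median_position(n):
    """Position of the median in an ordered list of n values: (n + 1) / 2."""
    return (n + 1) / 2


def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) for at least two values.

    Assumed convention: the data are sorted, and when n is odd the median is
    left out of both halves; Q1 and Q3 are the medians of the two halves.
    """
    x = sorted(data)                              # the data must first be in order
    n = len(x)
    half = n // 2
    lower = x[:half]                              # values before the median position
    upper = x[half + 1:] if n % 2 else x[half:]   # values after it
    return x[0], median(lower), median(x), median(upper), x[-1]


small = [12, 15, 20, 22, 25]
large = [15, 15, 18, 18, 18, 18, 20, 25, 25, 26, 27, 30, 30, 32, 32,
         35, 35, 35, 50, 51, 80, 85, 95, 99, 150]

print(median_position(len(large)))   # 13.0 -> the 13th ordered value
print(five_number_summary(small))    # (12, 13.5, 20, 23.5, 25)
print(five_number_summary(large))    # (15, 19.0, 30, 50.5, 150)
```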
PRESENTATION OF DATA
• The data collected in a survey is called raw data.
• In most cases, useful information is not immediately evident from the mass of
unsorted data.
• Collected data need to be organized in a way that condenses the information they contain and shows patterns of variation clearly.
• Precise methods of analysis can be decided upon only when the characteristics of the data are understood.

METHODS OF DATA PRESENTATION
• Ordered array
• Frequency distribution
• Graphic presentation, in the form of charts or diagrams:
  - Line graphs
  - Bar graph
  - Histogram
  - Pie chart
  - Frequency polygon
  - Frequency curve
  - Cumulative frequency diagram
  - Scatter or dot diagram
  - Pictures and special curves

The Ordered Array:
• A first step in organizing data is the preparation of an ordered array. An ordered array is a listing of the values in order of magnitude, from the smallest to the largest value.
• This shows the range over which the items are spread.
• It also gives an idea of their general distribution.
• An ordered array is an appropriate way of presentation when the data set is small (usually fewer than 20 values).

Example:
• The following values are the ages of subjects who participated in a study on smoking cessation: 55 46 58 54 52 69 40 65 53 58. The ordered array is: 40 46 52 53 54 55 58 58 65 69
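Preparing an ordered array amounts to sorting the values. A minimal Python sketch, using the ages above (the variable names are illustrative):

```python
# Ages of the subjects in the smoking-cessation example above
ages = [55, 46, 58, 54, 52, 69, 40, 65, 53, 58]

ordered = sorted(ages)                 # ordered array: smallest to largest
smallest, largest = ordered[0], ordered[-1]

print("Ordered array:", *ordered)      # Ordered array: 40 46 52 53 54 55 58 58 65 69
print(f"Values spread from {smallest} to {largest}")
```

`sorted` returns a new list, so the raw data in `ages` stay in their original order.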

FREQUENCY DISTRIBUTION
• It allows data to be appreciated more easily and quick comparisons to be drawn.
• Data in this form are often arranged as a table.
• It facilitates the analysis of voluminous data by putting them into compact tables.
• A frequency distribution is therefore a classification showing the different values of a variable and their respective frequencies side by side.
• FREQUENCY: the number of times each value occurs in a data set.

• A frequency distribution can be classified as either:
  - Ungrouped, or
  - Grouped.
Data presented in raw (uncategorized) form are referred to as ungrouped data.
Example 1: 15, 15, 18, 18, 18, 18, 20, 25, 25, 26, 27, 30, 30, 32, 32, 35, 35, 35, 50, 51, 80, 85, 95, 99, 150.
Example 2:
When the raw data are organized into a frequency distribution, they are said to be grouped data.
Example:
Marks obtained in a Biology exam | Number of students obtaining the marks
10-19 | 1
20-29 | 2
30-39 | 3
40-49 | 4
50-59 | 6
60-69 | 5
In a study, 400 persons were asked how many full-length movies they had seen on television during the preceding week. The following table gives the distribution of the data collected.
Number of movies | Number of persons
0 | 72
1 | 106
2 | 153
3 | 40
4 | 18
5 | 7
6 | 3
7 | 0
8 | 1
Total | 400
In the above distribution:
• Number of movies represents the variable (variate) under consideration.
• Number of persons represents the frequency.
• The whole distribution is called a frequency distribution.
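Building an ungrouped frequency distribution means counting how often each distinct value occurs. The sketch below shows one way to do this in Python, assuming the raw values from Example 1. It uses the standard-library `collections.Counter` and also checks that the frequencies in the movie table add up to the 400 persons surveyed.

```python
from collections import Counter

# Raw (ungrouped) data from Example 1 above
raw = [15, 15, 18, 18, 18, 18, 20, 25, 25, 26, 27, 30, 30, 32, 32,
       35, 35, 35, 50, 51, 80, 85, 95, 99, 150]

freq = Counter(raw)                    # value -> number of times it occurs

print("Value  Frequency")
for value in sorted(freq):             # list values in order of magnitude
    print(f"{value:>5}  {freq[value]:>9}")
print(f"Total  {sum(freq.values()):>9}")

# The movie table is already a frequency distribution: variate -> frequency
movies = {0: 72, 1: 106, 2: 153, 3: 40, 4: 18, 5: 7, 6: 3, 7: 0, 8: 1}
assert sum(movies.values()) == 400     # frequencies add up to the persons surveyed
```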

GROUPED FREQUENCIES
• This involves grouping or condensing a set of observations into continuous, non-overlapping intervals such that each value in the set of observations can be placed in one, and only one, of the intervals. These intervals are called "class intervals".
• The two end values of a class interval are called class limits.
• The smaller value is the lower class limit and the larger value is the upper class limit.
• The number of items or values belonging to each class interval is called the class frequency.
• A frequency distribution arranged in grouped form is referred to as a grouped frequency distribution.
The construction of a grouped frequency distribution consists essentially of four steps:
1. Choosing the classes.
2. Sorting (or tallying) the data into these classes.
3. Counting the number of items in each class.
4. Displaying the results in the form of a chart or table.

Example:
The following data give the hemoglobin level (g/dl) of a sample of 50 men:
17.0, 17.7, 15.9, 15.2, 16.2, 17.1, 15.7, 17.3, 13.5, 16.3,
14.6, 15.8, 15.3, 16.4, 13.7, 16.2, 16.4, 16.1, 17.0, 15.9,
14.0, 16.2, 16.4, 14.9, 17.8, 16.1, 15.5, 18.3, 15.8, 16.7,
15.9, 15.3, 13.9, 16.8, 15.9, 16.3, 17.4, 15.0, 17.5, 16.1,
14.2, 16.1, 15.7, 15.1, 17.4, 16.5, 14.4, 16.3, 17.3, 15.8.
• We wish to summarize these data using the following class intervals:
13.0 – 13.9 , 14.0 – 14.9 , 15.0 – 15.9 ,
16.0 – 16.9 , 17.0 – 17.9 , 18.0 – 18.9
Solution:
• Variable = X = hemoglobin level (continuous, quantitative)
• Sample size = n = 50
• Max= 18.3
• Min= 13.5
• Notes:
1. Minimum value is found in the first interval.
2. Maximum value is found in the last interval.
3. The intervals do not overlap.
4. Each value belongs to one, and only one, interval.
5. Total of the frequencies = the sample size = n

Class interval | Tally | Frequency
13.0-13.9 | /// | 3
14.0-14.9 | ///// | 5
15.0-15.9 | ///// ///// ///// | 15
16.0-16.9 | ///// ///// ///// / | 16
17.0-17.9 | ///// ///// | 10
18.0-18.9 | / | 1
The grouped frequency distribution for the hemoglobin level of the 50 men is:
Class interval (hemoglobin level) | Frequency (no. of men)
13.0-13.9 | 3
14.0-14.9 | 5
15.0-15.9 | 15
16.0-16.9 | 16
17.0-17.9 | 10
18.0-18.9 | 1
Total | n = 50
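The four construction steps can be sketched in Python for the hemoglobin data. This is an illustration, not a prescribed method. It assumes each value is placed in the class whose true limits (lower limit - 0.05 to upper limit + 0.05) contain it, and the names `classes`, `half_gap` and `class_of` are hypothetical.

```python
# Hemoglobin levels (g/dl) of the 50 men in the example above
hb = [17.0, 17.7, 15.9, 15.2, 16.2, 17.1, 15.7, 17.3, 13.5, 16.3,
      14.6, 15.8, 15.3, 16.4, 13.7, 16.2, 16.4, 16.1, 17.0, 15.9,
      14.0, 16.2, 16.4, 14.9, 17.8, 16.1, 15.5, 18.3, 15.8, 16.7,
      15.9, 15.3, 13.9, 16.8, 15.9, 16.3, 17.4, 15.0, 17.5, 16.1,
      14.2, 16.1, 15.7, 15.1, 17.4, 16.5, 14.4, 16.3, 17.3, 15.8]

# Step 1: choose the classes as (lower limit, upper limit)
classes = [(13.0, 13.9), (14.0, 14.9), (15.0, 15.9),
           (16.0, 16.9), (17.0, 17.9), (18.0, 18.9)]
half_gap = 0.05   # half of the 0.1 gap between consecutive classes


def class_of(x):
    """Step 2: sort (tally) a value into the one class whose true limits hold it."""
    for lower, upper in classes:
        if lower - half_gap <= x < upper + half_gap:
            return (lower, upper)
    raise ValueError(f"{x} lies outside every class")


# Step 3: count the number of items in each class
counts = {c: 0 for c in classes}
for x in hb:
    counts[class_of(x)] += 1

# Step 4: display the result as a table
for (lower, upper), f in counts.items():
    print(f"{lower:.1f}-{upper:.1f}  {f:>3}")
print(f"Total      {sum(counts.values()):>3}")
```

Because no value can fall in two classes and every value falls in one, the counts always total n, matching note 5 above.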
• Cumulative frequencies: when the frequencies of two or more classes are added up, the totals are called cumulative frequencies. These frequencies help us find the total number of items whose values are less than or greater than some value.
• Relative frequencies: relative frequencies express the frequency of each value or class as a proportion of the total frequency.
• Multiplied by 100, the relative frequency gives the percentage of the total that a given frequency count represents.
RF = class frequency/total frequency

• Note: In the construction of a cumulative frequency distribution, if we cumulate from the lowest value of the variable to the highest, the resulting distribution is called a "less than" cumulative frequency distribution; if we cumulate from the highest value to the lowest, it is called a "more than" cumulative frequency distribution. The "less than" cumulative frequency distribution is the most common.
• Relative frequency = frequency/n
• Percentage frequency = Relative frequency × 100%
Class interval (hemoglobin level) | Frequency (no. of men) | Cumulative frequency | Relative frequency | Cumulative relative frequency | Percentage frequency | Cumulative percentage frequency
13.0-13.9 | 3 | 3 | 0.06 | 0.06 | 6% | 6%
14.0-14.9 | 5 | 8 | 0.10 | 0.16 | 10% | 16%
15.0-15.9 | 15 | 23 | 0.30 | 0.46 | 30% | 46%
16.0-16.9 | 16 | 39 | 0.32 | 0.78 | 32% | 78%
17.0-17.9 | 10 | 49 | 0.20 | 0.98 | 20% | 98%
18.0-18.9 | 1 | 50 | 0.02 | 1.00 | 2% | 100%
Total | n = 50
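The cumulative, relative and percentage columns follow directly from the class frequencies. A minimal Python sketch, assuming the frequencies of the table above; it also computes the "more than" cumulative frequencies described in the note, which the table does not show.

```python
from itertools import accumulate

labels = ["13.0-13.9", "14.0-14.9", "15.0-15.9",
          "16.0-16.9", "17.0-17.9", "18.0-18.9"]
freqs = [3, 5, 15, 16, 10, 1]          # class frequencies from the table above
n = sum(freqs)

less_than = list(accumulate(freqs))    # "less than" cumulative frequencies
# "more than" cumulative: number of values in this class or any higher class
more_than = [n - c + f for c, f in zip(less_than, freqs)]

rel = [f / n for f in freqs]           # relative frequency = frequency / n
cum_rel = [c / n for c in less_than]   # cumulative relative frequency

for label, f, c, r, cr in zip(labels, freqs, less_than, rel, cum_rel):
    # the % format multiplies by 100: percentage frequency = RF x 100%
    print(f"{label}  {f:>2}  {c:>2}  {r:.2f}  {cr:.2f}  {r:>4.0%}  {cr:>4.0%}")
```

Cumulative relative frequencies are computed as cumulative frequency / n rather than by adding rounded relative frequencies, so the last one is exactly 1.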
• From frequencies:
The no. of people whose hemoglobin levels are between 17.0 and 17.9 = 10
• From cumulative frequencies:
The no. of people whose hemoglobin levels are less than or equal to 15.9 = 23
The no. of people whose hemoglobin levels are less than or equal to 17.9 = 49
• From percentage frequencies:
The percentage of people whose hemoglobin levels are between 17.0 and 17.9 = 20%
• From cumulative percentage frequencies:
The percentage of people whose hemoglobin levels are less than or equal to 14.9 = 16%
The percentage of people whose hemoglobin levels are less than or equal to 16.9 = 78%
Frequency density: frequency density is used when the classes of a frequency distribution are not of equal width, because the frequencies of classes of different widths cannot be compared directly. In such circumstances, the frequency densities of the classes are used instead,
i.e. frequency density of a class = frequency of the class / width of the class.
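Every class in the hemoglobin table has the same width (1 g/dl), so its densities simply equal its frequencies. To show where the idea matters, the Python sketch below hypothetically merges the last two classes into one class of width 2 (this regrouping is for illustration only and is not part of the example). That class's frequency (11) is larger than the 10 of the class 17.0-17.9, but its density (5.5 per g/dl) shows that values are actually less concentrated there.

```python
def frequency_density(lower, upper, freq):
    """Frequency per unit of class width, using true class limits."""
    width = round(upper - lower, 10)   # rounding removes floating-point noise
    return freq / width


# (true lower limit, true upper limit, frequency). The last class is a
# hypothetical merge of 17.0-17.9 and 18.0-18.9, made only for illustration.
classes = [(12.95, 13.95, 3), (13.95, 14.95, 5), (14.95, 15.95, 15),
           (15.95, 16.95, 16), (16.95, 18.95, 11)]

densities = [frequency_density(*c) for c in classes]
print(densities)   # [3.0, 5.0, 15.0, 16.0, 5.5]
```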
Class interval | True class interval (boundary points) | Midpoint | Frequency
13.0-13.9 | 12.95-13.95 | 13.45 | 3
14.0-14.9 | 13.95-14.95 | 14.45 | 5
15.0-15.9 | 14.95-15.95 | 15.45 | 15
16.0-16.9 | 15.95-16.95 | 16.45 | 16
17.0-17.9 | 16.95-17.95 | 17.45 | 10
18.0-18.9 | 17.95-18.95 | 18.45 | 1

Mid-Points of Class Intervals:
• Mid-point = (Upper limit + Lower limit)/2
E.g.:
• Mid-point of the 1st interval = (13.0 + 13.9)/2 = 13.45
• Mid-point of the last interval = (18.0 + 18.9)/2 = 18.45

Note:
1. The mid-point of a class interval is considered a typical (approximate) value for all values in that class interval, i.e. it is used as the representative value of the class interval.
For example, approximately we may say that:
• there are 3 observations with the value 13.45
• there are 5 observations with the value 14.45
• there is 1 observation with the value 18.45
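Computing the mid-points and using them as representative values can be sketched as follows in Python. The class limits and frequencies are those of the hemoglobin table; rounding to two decimals is only there to hide floating-point noise.

```python
classes = [(13.0, 13.9), (14.0, 14.9), (15.0, 15.9),
           (16.0, 16.9), (17.0, 17.9), (18.0, 18.9)]
freqs = [3, 5, 15, 16, 10, 1]

# Mid-point = (upper limit + lower limit) / 2, rounded to remove float noise
midpoints = [round((lo + hi) / 2, 2) for lo, hi in classes]

for m, f in zip(midpoints, freqs):
    # every value in the class is approximated by the class mid-point
    print(f"{f:>2} observation(s) taken as {m}")
```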

True Class Intervals (Boundary Points):
• d = gap between class intervals
• d = lower limit – upper limit of the preceding class interval
• true upper limit = upper limit +d/2
• true lower limit = lower limit - d/2
Note:
• There are no gaps between true class intervals. The end-point (true upper limit) of each true class interval equals the start-point (true lower limit) of the following true class interval.
True limits (or class boundaries)
• They are points that demarcate the true upper limit of one class and the true lower limit of the next.
• The true limits are what the tabulated limits would correspond to if one could measure exactly.
• E.g. the class boundary between the classes (13.0-13.9) and (14.0-14.9) is 13.95.
• It is the upper boundary for the former and the lower boundary for the latter.
• Class boundaries may replace class limits during statistical manipulations.
• The width of a class is found from the true class limits by subtracting the true lower limit from the true upper limit of that class.
E.g. using the above frequency distribution table, the class width for the first class is 1, i.e. (13.95 - 12.95) = 1.
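The conversion from stated limits to true limits can be written as a small function. This is a sketch assuming that consecutive classes are separated by the same gap d and that there are at least two classes; `true_limits` is a hypothetical helper name.

```python
def true_limits(classes):
    """Convert stated class limits into true class limits (class boundaries).

    Assumes equal gaps between consecutive classes and at least two classes.
    """
    # d = lower limit of a class - upper limit of the preceding class
    d = round(classes[1][0] - classes[0][1], 10)
    return [(round(lo - d / 2, 10), round(hi + d / 2, 10)) for lo, hi in classes]


hb_classes = [(13.0, 13.9), (14.0, 14.9), (15.0, 15.9),
              (16.0, 16.9), (17.0, 17.9), (18.0, 18.9)]
bounds = true_limits(hb_classes)
widths = [round(upper - lower, 10) for lower, upper in bounds]

print(bounds[0], widths[0])   # (12.95, 13.95) 1.0
# No gaps: each true upper limit equals the next class's true lower limit
assert all(bounds[i][1] == bounds[i + 1][0] for i in range(len(bounds) - 1))
```

The same function works for whole-number limits such as age groups: `true_limits([(15, 19), (20, 24)])` gives `[(14.5, 19.5), (19.5, 24.5)]`.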
GRAPHICAL PRESENTATION OF DATA
• The frequencies of a characteristic can be presented graphically as either charts or diagrams. These may be drawn using:
  - lines,
  - dots, or
  - figures.
• The drawings are meant for non-statistically minded people who want to study the relative values or frequencies of persons or events.
• For statistically minded people, they allow quick reading by eye.
Importance of Diagrammatic Representation
• They have greater attraction than mere figures. They give delight to the eye
and add a spark of interest.
• They help in deriving the required information in less time and without any
mental strain.
• They facilitate comparison.
• They may reveal unsuspected patterns in a complex set of data and may suggest directions in which changes are occurring. This can prompt timely action.
• They have greater memorising value than mere figures, because the impression left by a diagram is lasting.
Limitations of Diagrammatic Representation
• This technique is used only for purposes of comparison. It should not be used when comparison is either not possible or not necessary.
• It only strengthens the textual exposition of a subject and cannot serve as a complete substitute for statistical data.
• It gives only an approximate idea; where greater accuracy is needed, diagrams are not suitable.
• They fail to bring to light small differences.

• The choice of a particular form of graph depends solely on one's preference and/or the type of data.
• The most common graphs used depend on whether the data in question are quantitative continuous, or qualitative or quantitative discrete (counted) data.

• The common graphs used in the presentation of quantitative, continuous or
measured data are:
Histogram
Frequency polygon
Frequency curve
Line chart or graph
Cumulative frequency diagram
Scatter or dot diagram.

• Qualitative or quantitative discrete (counted) data are presented through diagrams. The common diagrams in use are:
  - Bar diagram
  - Pie or sector diagram
  - Pictogram or picture diagram
  - Map diagram or spot map

Construction of graphs
• There are, however, general rules that are commonly accepted when constructing graphs:
  - Every graph should be self-explanatory and as simple as possible.
  - Titles are usually placed below the graph and should answer the questions: What? Where? When? How classified?
  - Legends or keys should be used to differentiate variables if more than one is shown.
  - Axis labels should be placed so that they read from the left side and from the bottom.
  - The units into which the scale is divided should be clearly indicated.
  - The numerical scale representing frequency must start at zero, or a break in the line should be shown.
• HISTOGRAM
  - It is a graphical presentation of a frequency distribution.
  - It is the most commonly used diagram for depicting a grouped frequency distribution.
  - It consists of a series of continuous vertical rectangles (i.e. a set of compact vertical bars drawn side by side).
  - Each bar is drawn so that its width represents the size of the class interval and its height represents the frequency in that class interval.
  - The values of the variable for the different groups are marked on the horizontal line (x-axis), called the abscissa, while the frequency, i.e. the number of observations, is marked on the vertical line (y-axis), called the ordinate.
  - A histogram is an area diagram: the area of each rectangle is proportional to the frequency of the corresponding class interval. With frequency as the height this holds when all classes have equal width; with unequal widths, the heights should represent frequency density.
  - In constructing a histogram, the class boundaries are very important; they are placed on the horizontal side (abscissa or x-axis).
  - The width of each rectangle is determined by the width of its class interval.
Age group (years) | No. of women
15-19 | 8
20-24 | 16
25-29 | 32
30-34 | 28
35-39 | 12
40-44 | 4
Note: Whenever you present information like this in a histogram, it is important to convert the class limits into class boundaries (true class intervals).
Class interval (age) | True class interval (class boundaries) | Frequency (no. of women) | Cumulative frequency | Midpoint
15-19 | 14.5-19.5 | 8 | 8 | 17
20-24 | 19.5-24.5 | 16 | 24 | 22
25-29 | 24.5-29.5 | 32 | 56 | 27
30-34 | 29.5-34.5 | 28 | 84 | 32
35-39 | 34.5-39.5 | 12 | 96 | 37
40-44 | 39.5-44.5 | 4 | 100 | 42
Total | | n = 100 | |
Width of the interval:
W = true upper limit - true lower limit = 19.5 - 14.5 = 5
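The preparation steps for the histogram (true class intervals, mid-points, cumulative frequencies and class width) can be sketched in Python for the age data. Drawing the actual chart would normally need a plotting library. To stay self-contained, this sketch only prints a rough text histogram, assuming one '#' per two women.

```python
from itertools import accumulate

# Age groups (stated class limits) and numbers of women, from the table above
groups = [(15, 19), (20, 24), (25, 29), (30, 34), (35, 39), (40, 44)]
women = [8, 16, 32, 28, 12, 4]

d = groups[1][0] - groups[0][1]                       # gap between classes: 1 year
boundaries = [(lo - d / 2, hi + d / 2) for lo, hi in groups]
midpoints = [(lo + hi) / 2 for lo, hi in groups]
cumulative = list(accumulate(women))
width = boundaries[0][1] - boundaries[0][0]           # W = 19.5 - 14.5 = 5

# Text histogram lying on its side: one '#' per 2 women. Bar length is
# proportional to frequency here because every class has the same width.
for (lower, upper), f in zip(boundaries, women):
    print(f"{lower:>4}-{upper:<4} {'#' * (f // 2)} {f}")
```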
• FREQUENCY POLYGON
It is another area diagram of a frequency distribution, developed over a histogram: the mid-points of the class intervals, plotted at the heights of their frequencies, are joined by straight lines. This gives a polygon, i.e. a figure with many angles. In other words, it is obtained by joining the mid-points of the tops of the rectangles in a histogram with straight lines.
[Figures: frequency polygon (open) and frequency polygon (closed)]
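The figures themselves are not reproduced here. As an illustration only, the Python sketch below computes the vertices of both polygons for the age distribution used in the histogram example (the original figures may be based on other data). For the closed polygon it assumes the common convention of adding an empty class at each end, so the line returns to the x-axis at mid-points 12 and 47.

```python
# Mid-points and frequencies of the age distribution (women, 15-44 years)
midpoints = [17, 22, 27, 32, 37, 42]
women = [8, 16, 32, 28, 12, 4]
width = 5                                   # common class width in years

# Open polygon: the (mid-point, frequency) points joined by straight lines
open_polygon = list(zip(midpoints, women))

# Closed polygon: add an empty class on each side so the line meets the x-axis
closed_polygon = ([(midpoints[0] - width, 0)] + open_polygon
                  + [(midpoints[-1] + width, 0)])

print(closed_polygon)
# [(12, 0), (17, 8), (22, 16), (27, 32), (32, 28), (37, 12), (42, 4), (47, 0)]
```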
Note: It is not essential to draw a histogram in order to obtain a frequency polygon. It can be drawn without erecting the rectangles of a histogram, by plotting each class frequency above the mid-point of its class and joining the points with straight lines.
It is especially useful when two or more sets of data are to be illustrated on the same diagram, such as birth and death rates, births of diabetics and non-diabetics, etc.
• Frequency polygons may be of different shapes:
  - bell-shaped
  - bimodal (i.e. having two peaks)
  - rectangular
  - positively skewed
  - negatively skewed
Cumulative Frequency Curve (Ogive)
• When the cumulative frequencies of a distribution are graphed, the resulting curve is called an ogive or cumulative frequency curve.
• To draw it, an ordinary frequency distribution table of quantitative data has to be converted into a cumulative (or relative cumulative) frequency table.
• The cumulative frequencies are then plotted against the upper limits (upper class boundaries) of the classes.
• The points corresponding to the cumulative frequency at each upper limit are joined by a free-hand curve.
• Note: One can find the median, quartiles and percentiles using the ogive.
Height group (cm) | Frequency | Cumulative frequency
160-162 | 10 | 10
162-164 | 15 | 25
164-166 | 17 | 42
166-168 | 19 | 61
168-170 | 20 | 81
170-172 | 26 | 107
172-174 | 29 | 136
174-176 | 30 | 166
176-178 | 22 | 188
178-180 | 12 | 200
Total | 200 |
[Figure: cumulative frequency diagram showing the height values of the median (Q2), first or lower quartile (Q1), third or upper quartile (Q3) and tenth percentile (P10)]
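Reading the median, quartiles and P10 off the ogive can be approximated numerically. The Python sketch below assumes the cumulative frequencies at the class boundaries are joined by straight lines (linear interpolation) rather than the free-hand curve described above, so values read from a hand-drawn ogive may differ slightly. The function name `read_ogive` is hypothetical.

```python
from itertools import accumulate

# Class boundaries (cm) and frequencies from the height table above
boundaries = [160, 162, 164, 166, 168, 170, 172, 174, 176, 178, 180]
freqs = [10, 15, 17, 19, 20, 26, 29, 30, 22, 12]
cum = list(accumulate(freqs))          # cumulative frequency at each upper limit
N = cum[-1]                            # total = 200

# Ogive points: (upper limit, cumulative frequency), starting at (160, 0)
ogive = list(zip(boundaries, [0] + cum))


def read_ogive(p):
    """Height below which a fraction p (0 < p <= 1) of the N values lie.

    Joins neighbouring ogive points with straight lines, an approximation
    to the free-hand curve.
    """
    target = p * N
    for (x0, c0), (x1, c1) in zip(ogive, ogive[1:]):
        if c0 < target <= c1:
            return x0 + (target - c0) / (c1 - c0) * (x1 - x0)
    raise ValueError("p must satisfy 0 < p <= 1")


for name, p in [("P10", 0.10), ("Q1", 0.25), ("Median (Q2)", 0.50), ("Q3", 0.75)]:
    print(f"{name}: {read_ogive(p):.2f} cm")
```

With these data the sketch prints P10 of about 163.33 cm, Q1 of about 166.84 cm, a median of about 171.46 cm and Q3 of about 174.93 cm.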
