CH 1, 2 & 3for MIS
INTRODUCTION TO STATISTICS
1.1 Definition and classification of Statistics
The word statistics is defined in different ways depending on its use in the plural and singular
sense.
In the plural sense: - statistics is defined as the collection of numerical facts or figures (or the raw
data themselves).
Eg. 1. Vital statistics (numerical data on marriage, births, deaths, etc).
2. The statement "the average mark of students in the statistics course is 70%" is a statistic,
whereas "Abebe got 90% in the statistics course" is not, because it is a single isolated figure.
Remark: statistics are aggregate of facts. Single and isolated figures are not statistics as they
cannot be compared and are unrelated.
In its singular sense:- the word Statistics is the subject that deals with the methods of collecting,
organizing, presenting, analyzing and interpreting statistical data.
Classification of Statistics
Statistics is broadly divided into two categories based on how the collected data are used.
Descriptive Statistics: - deals with describing the collected data without drawing any further conclusions.
Example 1.1: Suppose that the mark of 6 students in Statistics course for Mathematics is given as
40, 45, 50, 60, 70 and 80. The average mark of the 6 students is 57.5 and it is considered as
descriptive statistics.
Inferential Statistics:- It deals with making inferences or conclusions about a population based
on data obtained from a sample of observations. It consists of performing hypothesis testing,
determining relationships among variables and making predictions.
Example 1.2: In the above example, if we say that the average mark in Statistics course for
Mathematics students is 57.5, then we talk about inferential statistics (draw conclusion based on
the sample observation).
Data can be collected in a variety of ways; one of the most common methods is through the use of
sample or census survey. Survey can also be done in different methods, three of the most common
methods are:
Telephone survey
Mailed questionnaire
Personal interview.
Organization of data: - Data collected from published sources are generally in organized form.
However if an investigator has collected data through a survey, it is necessary to edit these data in
order to correct any apparent inconsistencies, ambiguities, and recording errors.
This phase also includes correcting the data for errors, grouping data into classes and tabulating.
Presentation of data:- After the data have been collected and organized they can be presented in
the form of tables, charts, diagrams and graphs. This presentation in an orderly manner facilitates
the understanding as well as analysis of data.
Analysis of data: - the basic purpose of data analysis is to dig out useful information for decision
making. This analysis may simply be a critical observation of data to draw some meaningful
conclusions about it or it may involve highly complex and sophisticated mathematical techniques.
Interpretation of data: - Interpretation means drawing conclusions from the data collected and
analyzed. Correct interpretation will lead to a valid conclusion of the study & thus can aid in
decision making.
1.3 Definition of some statistical terms
Population: - It is the totality of objects that are being studied
Examples:
All clients of Telephone Company
All students of Mettu University (MeU)
Population of families, etc.
The population could be finite or infinite (an imaginary collection of units).
Sample: - is part or subset of population under study
Sampling frame: - is the list of all units of the population from which the sample can be drawn.
Eg. List of all students of MeU, List of all residential houses in Mettu town, etc
Survey: - is an investigation of a certain population to assess its characteristics. It may be census
or sample.
Census survey: a complete enumeration of the population under study.
Sample survey: the process of collecting data covering a representative part or portion of a
population.
Parameter: - is a statistical measure of a population, or summary value calculated from a
population. Examples: Average, Range, proportion, variance, etc
Statistic: - is a descriptive measure of a sample, or it is a summary value calculated from a sample.
Sampling: - The process or method of sample selection from the population.
Sample size: - The number of elements or observation to be included in the sample.
An element: - is a member of sample or population. It is specific subject or object (for example a
person, firm, item, etc.) about which the information is collected.
Variable: - It is an item of interest that can take numerical or non-numerical values for different
elements. It may be qualitative or quantitative. Example: age, weight, sex, marital status, etc.
Observation (measurement):- is the value of a variable for an element.
Qualitative variables:- are variables that assume non-numerical values. They can be categorized
and they are usually called attributes. Example: - Sex, marital status, ID number, etc.
Quantitative variables: - are variables which assume numerical values. eg. Age, weight, etc.
1.4 Applications of Statistics in Economics:
Statistics can be applied in any field of study which seeks quantitative evidence. For instance,
engineering, economics, natural science, etc.
In Economics: Statistics are widely used in economics study and research.
To measure and forecast Gross National Product (GNP)
Statistical analyses of population growth, inflation rate, poverty, unemployment figures,
rural or urban population shifts and so on influence much of the economic policy making.
Financial statistics are necessary in the fields of money and banking including consumer
savings and credit availability.
1.5 Levels of Measurement
Proper knowledge about the nature and type of data to be dealt with is essential in order to specify
and apply the proper statistical method for their analysis and inferences.
Scale Types
Measurement is the assignment of values to objects or events in a systematic fashion. Four levels
of measurement scales are commonly distinguished: nominal, ordinal, interval, and ratio, and each
possesses different properties of measurement systems. The first two are qualitative while the last
two are quantitative.
Nominal scale: The values of a nominal attribute are just different names, i.e., nominal attributes
provide only enough information to distinguish one object from another. Qualities with no ranking
or ordering; no numerical or quantitative value. These types of data consist of names, labels
and categories. This is a scale for grouping individuals into different categories.
Example 1.3: Eye color: brown, black, etc, sex: male, female.
In this scale, one is different from the other
Arithmetic operations (+, -, *, ÷) are not applicable and order comparisons (<, >) are impossible; values can only be checked for equality (=, ≠).
Ordinal scale: - defined as nominal data that can be ordered or ranked.
Can be arranged in some order, but the differences between the data values are
meaningless.
Data consisting of an ordering of ranking of measurements are said to be on an ordinal
scale of measurements. That is, the values of an ordinal scale provide enough information
to order objects.
One is different from and greater /better/ less than the other
Arithmetic operations (+, -, *, ÷) are impossible, comparison (<, >, ≠, etc) is possible.
Example 1.4: Letter grading (A, B, C, D, F), -Rating scales (excellent, very good, good, fair,
poor), military status (general, colonel, lieutenant, etc).
Interval Level: data are defined as ordinal data in which the differences between data values are
meaningful. However, there is no true zero, or starting point, so the ratios of data values are
meaningless. For example, IQ tests do not measure people who have no intelligence. For
temperature, 0°F does not mean no heat at all.
In this measurement scale:-
One is different, better/greater and by a certain amount of difference than another.
Possible to add and subtract. For example; 80°C – 50°C = 30°C, 70°C – 40°C = 30°C.
Multiplication and division are not possible. For example; 60°C = 3(20°C). But this does
not imply that an object which is 60°C is three times as hot as an object which is 20°C.
Most common examples are: IQ, temperature.
Ratio scale: Similar to interval, except there is a true zero (absolute absence), or starting point,
and the ratios of data values have meaning.
Arithmetic operations (+, -, *, ÷) are applicable. For ratio variables, both differences and
ratios are meaningful.
One is different/larger /taller/ better/ less by a certain amount of difference and so much
times than the other.
This measurement scale provides better information than interval scale of measurement.
Example 1.5: weight, age, number of students.
CHAPTER TWO
2. Methods of data collection and presentation
2.1 Methods of Data Collection
Data: - is the raw material of statistics. It can be obtained either by measurement or counting.
Sources of data
There are two types of source of data:
1. Primary source
2. Secondary source
The statistical data may be classified under two categories depending up on the sources.
1. Primary data: - Data collected by the investigator himself for the purpose of a specific inquiry
or study. Such data are original in character & are mostly generated by surveys conducted by
individuals or research institutions.
It is more reliable & accurate since the investigator can extract the correct information by
removing doubts, if any, in the minds of the respondents regarding certain questions.
2. Secondary data: - When an investigator uses data, which have already been collected by
others, such data are called secondary data. Such data are primary data for the agency that
collected them, and become secondary for someone else who uses these data for his own
purposes. Example of secondary data: books, reports, magazines, etc.
When our source is secondary data check that:
The type and objective of the situations.
The purpose for which the data are collected and compatible with the present problem.
The nature and classification of data is appropriate to our problem.
There are no biases and misreporting in the published data.
Note: Data which are primary for one may be secondary for the other.
2.2 Methods of Data Presentation
Having collected and edited the data, the next important step is to organize it, that is, to present it
in a readily comprehensible, condensed form that helps in drawing inferences from it. It is also
necessary that like items be separated from unlike ones.
The presentation of data is broadly classified in to the following two categories:
Tabular presentation/ Frequency distribution
Diagrammatic and Graphic presentation.
The process of arranging data in to classes or categories according to similarities technically is
called classification. It eliminates inconsistency and also brings out the points of similarity or
dissimilarity of collected items/data.
Classification is necessary because it would not be possible to draw inferences and conclusions if
we have a large set of collected [raw] data.
2.2.1 Frequency distribution
Frequency: - is the number of times a certain value or class of values occurs.
Frequency distribution (FD):- is the organization of raw data in table form using classes and
frequencies.
There are three types of FD and there are specific procedures for constructing each type.
I. Categorical FD
II. Ungrouped FD and
III. Grouped FD
1. Categorical FD: Used for data that can be placed in specific categories; such as nominal,
ordinal level of data. Each category of the variable represents a single class and the number of
times each category repeats represents the frequency of that class (category)
Example 2.1: Twenty five patients were given a blood test to determine their blood type. The
data is as shown below: A B B AB O A O O B AB B B B O A O O O AB AB A O O B A.
Solution: since the data are categorical by taking the four blood types as classes we can
construct a FD as shown below.
Step 1: Make a table as shown below
Step 2: Tally data and place the result under the column Tally
Step 3: Count the tallies and place the result under the column Frequency.
Step 4: Find the percentage of values in each class using the formula % = (f/n) × 100%, where
f is the class frequency and n is the total number of observations.
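The four steps above can be sketched in a few lines of Python. This is only an illustrative sketch: the variable names (`blood_types`, `frequency`, `percent`) are chosen here for the example and are not part of the original notes.

```python
from collections import Counter

# Blood types of the 25 patients in Example 2.1.
blood_types = ("A B B AB O A O O B AB B B B O A O O O "
               "AB AB A O O B A").split()

n = len(blood_types)                 # total number of observations
frequency = Counter(blood_types)     # Steps 2-3: tally and count each class

# Step 4: percentage of each class, % = f / n * 100.
percent = {blood: 100 * f / n for blood, f in frequency.items()}

for blood in ("A", "B", "O", "AB"):
    print(f"{blood:<3} {frequency[blood]:>2} {percent[blood]:>5.1f}%")
```

The frequencies add up to n and the percentages add up to 100, which is a quick check on any categorical frequency distribution.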
80 76 90 85 80
70 60 62 70 85
65 60 63 74 75
76 70 70 80 85
Construct a frequency distribution, which is ungrouped.
Solution:
Step 1: Find the range, Range=Max-Min=90-60=30.
Step 2: Make a table as shown
Step 3: Tally the data.
Step 4: Compute the frequency.
Mark       60  62  63  65  70    74  75  76  80   85   90
Tally      //  /   /   /   ////  /   /   //  ///  ///  /
Frequency  2   1   1   1   4     1   1   2   3    3    1
Each individual value is presented separately, that is why it is named ungrouped frequency
distribution.
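As a cross-check, the range and the ungrouped frequencies can be recomputed directly from the 20 marks listed above. The sketch below uses Python's standard `collections.Counter`; the names `marks`, `data_range` and `frequency` are illustrative.

```python
from collections import Counter

# The 20 marks listed in the example.
marks = [80, 76, 90, 85, 80,
         70, 60, 62, 70, 85,
         65, 60, 63, 74, 75,
         76, 70, 70, 80, 85]

data_range = max(marks) - min(marks)          # Step 1: Range = Max - Min

# Steps 3-4: count how often each distinct mark occurs, in ascending order.
frequency = dict(sorted(Counter(marks).items()))

for mark, f in frequency.items():
    print(mark, "/" * f, f)                   # mark, tally, frequency
```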
3. Grouped Frequency Distribution (GFD): FD of numerical data in which several values of a
variable are grouped into one class. When the range of the data is large the data must be
grouped in to classes that are more than one unit in width. The number of observations
belonging to the class is the frequency of the class.
Definition of some basic terms
Grouped frequency distribution: is a FD when several numbers are grouped into one class.
Class limits (CL): It separates one class from another. The limits could actually appear in
the data and have gaps between the upper limits of one class and the lower limit of the next
class.
Unit of measure (U): This is the possible smallest difference between successive values.
E.g. 1, 0.1, 0.01, 0.001……
Class boundaries: Separate one class in a grouped frequency distribution from the other.
The boundary has one more decimal place than the raw data. There is no gap between the
upper boundaries of one class and the lower boundaries of the succeeding class. Lower class
boundary is found by subtracting half of the unit of measure from the lower class limit and
upper class boundary is found by adding half unit measure to the upper class limit.
Class width (W): The difference between the upper and lower boundaries of any class.
The class width is also the difference between the lower limits (or the upper limits) of two
consecutive classes.
Class mark (Mid point): It is found by adding the lower and upper class limit (Boundaries)
and divided the sum by two.
Cumulative frequency (CF): It is the number of observation less than the upper class
boundary or greater than the lower class boundary of class.
CF (Less than type): it is the number of values less than the upper class boundary of a given
class.
CF (Greater than type): it is the number of values greater than the lower class boundary of
a given class.
Relative frequency (Rf): The frequency divided by the total frequency. This gives the
proportion of values falling in that class (multiply by 100 to obtain the percent).
Rf_i = f_i / n = f_i / ∑f_i
Relative cumulative frequency (RCf): The running total of the relative frequencies or the
cumulative frequency divided by the total frequency gives the percent of the values which
are less than the upper class boundary or the reverse.
11. Find the frequencies
12. Find the cumulative frequencies. Depending on what you are trying to accomplish, it may
be necessary to find the cumulative frequency.
13. If necessary find RF and RCF.
When grouping data the following rules are important:
The groups must not overlap, otherwise there is confusion concerning in which group a
measurement belongs.
There must be continuity from one group to the next, which means that there must be no gaps.
Otherwise some measurements may not fit in a group.
The groups must range from the lowest measurement to the highest measurement so that all
of the measurements have a group to which they can be assigned.
The groups should normally be of an equal width, so that the counts in different groups can
easily be compared.
Example 2.3: Construct FD for the following data.
11 29 33 22 27 19 22 21 18 17 22 26 39 27 6 34 13 20
Solution:-
1) Highest value = 39, Lowest value = 6
2) Range = 39 – 6 = 33
3) K = 1 + 3.322 log 20 = 1 + 3.322(1.301) = 5.32, rounded up to 6
4) W = R / K = 33/6 = 5.5 ≈ 6
5) U = 1
6) LCL1= 6
7) Find the upper class limits.
8) Find class boundaries
9) Find class mark
10) Tally the data
Class     Class          Class   Tally      Frequency   CF(<)   CF(>)   RF             RCF(>)
limit     boundary       mark
6 – 11    5.5 – 11.5     8.5     //         2           2       20      2/20 = 0.10    1.00
12 – 17   11.5 – 17.5    14.5    //         2           4       18      2/20 = 0.10    0.90
18 – 23   17.5 – 23.5    20.5    ///// //   7           11      16      7/20 = 0.35    0.80
24 – 29   23.5 – 29.5    26.5    ////       4           15      9       4/20 = 0.20    0.45
30 – 35   29.5 – 35.5    32.5    ///        3           18      5       3/20 = 0.15    0.25
36 – 41   35.5 – 41.5    38.5    //         2           20      2       2/20 = 0.10    0.10
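The construction steps of Example 2.3 can be sketched in Python as follows. This is a hedged illustration, not a general-purpose routine: the class frequencies are taken from the table above, n = 20 is the sample size used in the solution, and both K and W are rounded up as in the solution. All variable names are chosen for this sketch.

```python
import math

n = 20                       # number of observations used in the solution
low, high = 6, 39            # lowest and highest values
unit = 1                     # unit of measure U

k = math.ceil(1 + 3.322 * math.log10(n))    # number of classes, rounded up
width = math.ceil((high - low) / k)         # class width W = R / K, rounded up

frequencies = [2, 2, 7, 4, 3, 2]            # class frequencies from the table

rows = []
cf_less = 0
for i, f in enumerate(frequencies):
    lcl = low + i * width                   # lower class limit
    ucl = lcl + width - unit                # upper class limit
    lcb, ucb = lcl - unit / 2, ucl + unit / 2   # class boundaries
    mark = (lcl + ucl) / 2                  # class mark
    cf_more = n - cf_less                   # "greater than" cumulative frequency
    cf_less += f                            # "less than" cumulative frequency
    rows.append((lcl, ucl, lcb, ucb, mark, f, cf_less, cf_more, f / n))

for row in rows:
    print(row)
```

Each tuple holds the class limits, class boundaries, class mark, frequency, CF(<), CF(>) and relative frequency of one class.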
A pie chart is a circle that is divided in to sections or wedges according to the percentage of
frequencies in each category of the distribution. The angle of the sector is obtained using:
Angle of a sector = (value of the part / total of all parts) × 360°
Example 2.4: Draw a suitable diagram to represent the following population in a town.
Step 3: Using a protractor and compass, graph each section and write its name with corresponding
percentage.
[Pie chart: Boys 15%, Men 25%, Girls 40%, Women 20%]
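Since a full circle is 360°, each percentage share converts to a sector angle by multiplying by 360/100. The short sketch below applies this to the four shares of Example 2.4; the pairing of labels with percentages (Boys 15%, Men 25%, Girls 40%, Women 20%) is read off the chart labels and should be treated as an assumption.

```python
# Percentage share of each group (label pairing assumed from the chart).
shares = {"Boys": 15, "Men": 25, "Girls": 40, "Women": 20}

# Sector angle in degrees: percentage * 360 / 100.
angles = {group: pct * 360 / 100 for group, pct in shares.items()}

for group, angle in angles.items():
    print(f"{group:<6} {shares[group]:>3}%  {angle:>6.1f} degrees")
```

The angles must add up to 360°, which is a useful check before drawing the chart with a protractor.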
B) Bar Charts
Used to represent & compare the frequency distribution of discrete variables and attributes
or categorical series.
Bars can be drawn either vertically or horizontally.
All bars must have equal width and the distance between bars must be equal.
The height or length of each bar indicates the size (frequency) of the figure represented.
There are different types of bar charts. The most common being:
I. Simple bar chart
Are used to display data on one variable.
They are thick lines (narrow rectangles) having the same breadth. The magnitude of a
quantity is represented by the height /length of the bar.
Example 2.5: Number of students in the four department of Science College given as follows:
Solution:
[Figure: Simple bar chart of frequency (vertical axis, 0 to 800) by department (Phys, Maths, Chem, Bio)]
Example 2.6: Draw a component (sub-divided) bar chart of the number of students by department
given in Example 2.5.
Solution:
[Figure: Sub-divided bar chart of frequency (vertical axis, 0 to 800) by department (Phys, Maths, Chem, Bio), with each bar divided into Male and Female components]
Example 2.7: The following data represent sales by product, 1957- 1959 of a given company for
three products A, B, C.
Solution:
C) Pictograph
In this diagram, we represent data by means of some picture symbols. We decide about a suitable
picture to represent a definite number of units in which the variable is measured.
The histogram, frequency polygon and cumulative frequency graph (ogive) are the most commonly
applied graphical representations for continuous data.
Solution:
[Figure: Histogram of frequency against class boundaries]
Frequency polygon
If we join the mid-points of the tops of the adjacent rectangles of the histogram with line segments
a frequency polygon is obtained. When the polygon is continued to the x-axis just outside the range
of the lengths the total area under the polygon will be equal to the total area under the histogram.
Example 2.9: Construct a frequency polygon to represent the previous data in example 2.8.
Solution:
Class limits   Frequency   Class marks   Class boundaries   R.F.   % R.F. (percent)   Less than C.F.   More than C.F.
Adding two class marks with f_i = 0, we have 9.5 at the beginning and 89.5 at the end, and the
following frequency polygon is plotted:
[Figure: Frequency polygon of frequency against class mark, with class marks 9.5, 19.5, 29.5, 39.5, 49.5, 59.5, 69.5, 79.5 and 89.5 on the horizontal axis]
An Ogive (pronounced as “oh-jive”) is a line that depicts cumulative frequencies, just as the
cumulative frequency distribution lists cumulative frequencies. Note that the Ogive uses class
boundaries along the horizontal scale, and graph begins with the lower boundary of the first class
and ends with the upper boundary of the last class. Ogive is useful for determining the number of
values below or above some particular value. There are two types of Ogive, namely the less than Ogive
and the more than Ogive. The difference is that the less than Ogive uses the less than cumulative
frequency and the more than Ogive uses the more than cumulative frequency on the y axis.
Example 2.10: Draw both types of ogives for the F.D. of Example 2.8.
Solutions:
[Figure: Less than ogive and more than ogive, each plotting cumulative frequency against class boundaries 14.5, 24.5, 34.5, 44.5, 54.5, 64.5, 74.5 and 84.5]
Note: For both ogives, one class with frequency zero is added for similar reason with the
frequency polygon.
CHAPTER THREE
3. Measures of Central Tendency
Objectives
• To facilitate comparison.
• It should be unique.
• It should always exist.
Measures of Central Tendency:- give us information about the location of the center of the
distribution of data values. A single value that describes the characteristics of the entire mass of
data is called a measure of central tendency.
The following are types of Central Tendency which are suitable for a particular type of data. These
are
Arithmetic Mean
Geometric Mean
Harmonic Mean
Median
Mode or modal value
3.3.1 Arithmetic Mean:- Arithmetic mean is defined as the sum of the measurements of the
items divided by the total number of items. It is usually denoted by 𝑥̅ .
Arithmetic Mean for individual series
Suppose 𝑥1 , 𝑥2 , … , 𝑥𝑛 are observed values in a sample of size n from a population of size N, n<N
then the arithmetic mean of the sample, denoted by 𝑥̅, is given by
\[ \bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n} \]
The arithmetic mean for sample value is 39.
ii. The sample values are: 10.5 2.4 3.6 5.9 8.7
\[ \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{10.5 + 2.4 + 3.6 + 5.9 + 8.7}{5} = \frac{31.1}{5} = 6.22 \]
When the numbers x_1, x_2, ..., x_k occur with frequencies f_1, f_2, ..., f_k, respectively, then the mean
can be expressed in a more compact form as:
\[ \bar{x} = \frac{x_1 f_1 + x_2 f_2 + \dots + x_k f_k}{f_1 + f_2 + \dots + f_k} = \frac{\sum_{i=1}^{k} x_i f_i}{\sum_{i=1}^{k} f_i} \]
Example 3.3: Calculate the arithmetic mean of the sample of numbers of students in 10 classes:
50 42 48 60 58 54 50 42 50 42
\[ \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{50+42+48+60+58+54+50+42+50+42}{10} = \frac{496}{10} = 49.6 \approx 50 \]
In this case there are three 42’s, one 48, three 50’s, one 54, one 58 and one 60. The number of
times each number occurs is called its frequency and the frequency is usually denoted by f. The
information in the sentence above can be written in a table, as follows.
Value, xi       42  48  50  54  58  60
Frequency, fi   3   1   3   1   1   1
The formula for the arithmetic mean for data of this type is
\[ \bar{x} = \frac{x_1 f_1 + x_2 f_2 + \dots + x_k f_k}{f_1 + f_2 + \dots + f_k} = \frac{\sum_{i=1}^{k} x_i f_i}{\sum_{i=1}^{k} f_i} \]
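Applying the frequency formula to the table of Example 3.3 gives the same mean as the raw data. The following sketch checks this numerically; the variable names are illustrative only.

```python
# Frequency table of Example 3.3.
values = [42, 48, 50, 54, 58, 60]
freqs = [3, 1, 3, 1, 1, 1]

n = sum(freqs)                                        # total frequency
weighted_sum = sum(x * f for x, f in zip(values, freqs))
mean = weighted_sum / n                               # sum(x_i * f_i) / sum(f_i)

# The same mean computed from the ten raw observations.
raw = [50, 42, 48, 60, 58, 54, 50, 42, 50, 42]
raw_mean = sum(raw) / len(raw)

print(weighted_sum, n, mean, raw_mean)
```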
If data are given in the form of continuous frequency distribution, the sample mean can be
computed as
\[ \bar{x} = \frac{\sum_{i=1}^{k} x_i f_i}{\sum_{i=1}^{k} f_i} = \frac{x_1 f_1 + x_2 f_2 + \dots + x_k f_k}{f_1 + f_2 + \dots + f_k} \]
where x_i is the class mark of the ith class, i = 1, 2, ..., k, and f_i is the frequency of the ith class.
Solution:
The formula to be used for the mean is as follows:
\[ \bar{x} = \frac{\sum_{i=1}^{k} x_i f_i}{\sum_{i=1}^{k} f_i} \]
Let us calculate these values and make a table for these values for the sake of convenience.
Class Interval (CI)   60-62  62-64  64-66  66-68  68-70  70-72  Total
Mid-Point (x_i)       61     63     65     67     69     71
\[ \sum_{i=1}^{n} (x_i - \bar{x}) = 0 \]
• The sum of squares of deviations from the mean is the least. That is, \( \sum_{i=1}^{n} (x_i - A)^2 \) is minimum
when \( A = \bar{x} \).
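Geometric mean for individual series: The geometric mean, G.M. of an individual series of
positive numbers x_1, x_2, ..., x_n is defined as the nth root of their product. When the values
x_1, x_2, ..., x_m occur with frequencies f_1, f_2, ..., f_m and n = ∑f_i, this becomes
\[ G.M. = \sqrt[n]{x_1^{f_1} \cdot x_2^{f_2} \cdots x_m^{f_m}} = \operatorname{antilog}\left( \frac{1}{n} \sum f_i \log x_i \right) \]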
Example 3.8: Compute the geometric mean of the following values: 3, 3, 4, 4, 4, 5, 6 and 6.
Solution
Values     3  4  5  6
Frequency  2  3  1  2
\[ G.M. = \sqrt[8]{3^2 \times 4^3 \times 5^1 \times 6^2} = 4.236 \]
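Both forms of the geometric mean, the nth root of the product and the antilog of the mean logarithm, can be checked numerically for Example 3.8. The sketch below uses base-10 logarithms; the names are illustrative.

```python
import math

values = [3, 4, 5, 6]
freqs = [2, 3, 1, 2]
n = sum(freqs)                        # 8 observations in total

# Form 1: nth root of the product of x_i ** f_i.
product = math.prod(x ** f for x, f in zip(values, freqs))
gm_root = product ** (1 / n)

# Form 2: antilog of (1/n) * sum(f_i * log10(x_i)).
mean_log = sum(f * math.log10(x) for x, f in zip(values, freqs)) / n
gm_log = 10 ** mean_log

print(round(gm_root, 3), round(gm_log, 3))
```

The logarithmic form is usually preferred for long series because the raw product grows very quickly.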
Geometric mean for continuous grouped FD:- The above formula can also be used whenever
the frequency distribution is grouped continuous, class marks of the class intervals are considered
as xi.
Example 3.9 A car travels 25 miles at 25 mph, 25 miles at 50 mph, and 25 miles at 75 mph. Find
the harmonic mean of the three velocities.
Solution
\[ H.M. = \frac{n}{\frac{1}{x_1} + \frac{1}{x_2} + \dots + \frac{1}{x_n}} = \frac{3}{\frac{1}{25} + \frac{1}{50} + \frac{1}{75}} = 40.9 \]
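The harmonic mean of Example 3.9 can be verified in Python. The sketch computes it from the definition and compares it with `statistics.harmonic_mean` from the standard library.

```python
import statistics

speeds = [25, 50, 75]                 # mph over three equal 25-mile stretches

# H.M. = n / (1/x_1 + 1/x_2 + ... + 1/x_n)
hm = len(speeds) / sum(1 / x for x in speeds)

# Cross-check with the standard library implementation.
hm_lib = statistics.harmonic_mean(speeds)

# Interpretation: total distance divided by total travel time.
average_speed = 75 / sum(25 / x for x in speeds)

print(round(hm, 1), round(hm_lib, 1), round(average_speed, 1))
```

Because the three distances are equal, the harmonic mean coincides with the overall average speed for the trip, which is why it is the appropriate average here.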
Harmonic mean for discrete data arranged in FD: If the data is arranged in the form of
frequency distribution
\[ H.M. = \frac{n}{\frac{f_1}{x_1} + \frac{f_2}{x_2} + \dots + \frac{f_m}{x_m}}, \quad \text{where } n = \sum_{k=1}^{m} f_k \]
Harmonic mean for continuous grouped FD: Whenever the frequency distribution is grouped
continuous, class marks of the class intervals are considered as x_i and the above formula can be
used as
\[ H.M. = \frac{n}{\sum_{i=1}^{m} \frac{f_i}{x_i}}, \quad \text{where } n = \sum_{k=1}^{m} f_k \]
and x_i is the class mark of the ith class.
3.3.4 Median
The median is, as its name indicates, the middlemost value in the arrangement, which divides the
data into two equal parts. It is obtained by arranging the data in increasing or decreasing order
of magnitude and is denoted by 𝑥̃.
Median for individual series
We arrange the sample in ascending order of the variable of interest. Then the median is the middle
value (if the sample size n is odd) or the average of the two middle values (if the sample size n is
even).
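For individual series the median is obtained by
a/ \( \tilde{x} = \left( \frac{n+1}{2} \right)^{th} \) value if n is odd, and
b/ \( \tilde{x} = \frac{ \left( \frac{n}{2} \right)^{th} \text{ value} + \left( \frac{n}{2} + 1 \right)^{th} \text{ value} }{2} \) if n is even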
Solution;
i. The data in ascending order is given by:
-5 0 1 2 4 5 6 8 10 15
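n = 10, so n is even. The two middle values are the 5th and 6th observations. So the median
is,
\[ \tilde{x} = \frac{ \left( \frac{10}{2} \right)^{th} + \left( \frac{10}{2} + 1 \right)^{th} }{2} \text{ value} = \frac{5^{th} + 6^{th}}{2} = \frac{4 + 5}{2} = 4.5 \]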
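The odd/even rule for the median of an individual series can be written as a small function. This is a sketch of the rule stated above; the function name `median` is chosen here, and the result is compared with `statistics.median` from the standard library.

```python
import statistics

def median(values):
    """Median of an individual series using the (n+1)/2 rule."""
    ordered = sorted(values)                 # arrange in ascending order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                  # ((n+1)/2)th value, n odd
    # Average of the (n/2)th and (n/2 + 1)th values, n even.
    return (ordered[mid - 1] + ordered[mid]) / 2

data = [-5, 0, 1, 2, 4, 5, 6, 8, 10, 15]
print(median(data), statistics.median(data))
```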
Note: The median is easy to calculate for small samples and is not affected by an "outlier".
Median for Discrete data arranged in a frequency distribution:- In this case also, the median
is obtained by the above formula. After arranging the values in an increasing order find the smallest
CF greater than or equal to that value obtained by a & b above formula and the corresponding
value is the median.
Median for grouped continuous data:- For continuous data, the median is obtained by the following
formula.
\[ \text{Median} = \tilde{x} = L + \frac{w}{f_{med}} \left( \frac{n}{2} - CF \right) \]
Where: L = the lower class boundary of the median class; w = the class width of the median
class; f_med = the frequency of the median class; and CF = the cumulative frequency corresponding
to the class preceding the median class, that is, the sum of the frequencies of all classes lower than
the median class. The median class is the class which contains the (n/2)th observation, whether n is
odd or even, since the items have already lost their originality once they are grouped in to
continuous classes.
Example 3.11: Calculate the median for the following frequency distribution.
C.I           1 - 5   6 - 10   11 - 15   16 - 20   21 - 25   26 - 30   31 - 35   Total
Freq.         4       8        12        6         3         4         3         40
Cuml. Freq.   4       12       24        30        33        37        40
Since n = 40, 40/2 = 20, and the smallest CF greater than or equal to 20 is 24; thus, the median class
is the third class. And for this class, L = 10.5, w = 5, f_med = 12, CF = 12. Then applying the formula,
we get:
\[ \tilde{x} = 10.5 + \frac{5}{12}(20 - 12) = 13.8 \]
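The grouped-median calculation of Example 3.11 can be sketched as a short function. The function name `grouped_median` and its parameters are illustrative; it assumes equal class widths and lower class limits given in ascending order.

```python
def grouped_median(lower_limits, freqs, width, unit=1):
    """Median of a grouped frequency distribution.

    Uses  L + (w / f_med) * (n/2 - CF),  where L is the lower class
    boundary of the median class and CF is the cumulative frequency
    of the classes before it.
    """
    n = sum(freqs)
    target = n / 2
    cf = 0                                   # cumulative frequency so far
    for lcl, f in zip(lower_limits, freqs):
        if cf + f >= target:                 # first class whose CF reaches n/2
            lower_boundary = lcl - unit / 2
            return lower_boundary + (width / f) * (target - cf)
        cf += f
    raise ValueError("frequencies must not be empty")

limits = [1, 6, 11, 16, 21, 26, 31]
freqs = [4, 8, 12, 6, 3, 4, 3]
result = grouped_median(limits, freqs, width=5)
print(round(result, 2))
```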
3.3.5 The Mode or modal value
The mode or the modal value is the value with the highest frequency and denoted by 𝑥̂. A data set
may not have a mode or may have more than one mode. A distribution is called a bimodal
distribution if it has two data values that appear with the greatest frequency. If a distribution has
more than two modes, then the distribution is multimodal. If no value occurs more often than the
others, then the distribution has no mode.
Mode of individual series:- The mode or the modal value of individual series (raw data) is simply
obtained by locating the observation with the maximum frequency.
Mode for discrete data arranged in a frequency distribution:-In the case of discrete grouped data,
the mode is determined just by looking to that value (s) having the highest frequency.
In such cases, one can only determine the modal class easily: the class with the highest frequency.
\[ \text{Mode} = \hat{x} = L + \frac{\Delta_1}{\Delta_1 + \Delta_2} \, w \]
where L = the lower class boundary of the modal class; w = the class width; Δ1 = f_mod − f_1;
Δ2 = f_mod − f_2; f_1 = frequency of the class immediately preceding the modal class;
f_2 = frequency of the class immediately succeeding the modal class;
and f_mod = frequency of the modal class.
Example 3.13: Calculate the mode for the frequency distribution of data of example 3.11.
Solution: By inspection, the mode lies in the third class, where L = 10.5, f_mod = 12, f_1 = 8, f_2 = 6, w = 5
\[ \text{Mode} = L + \frac{\Delta_1}{\Delta_1 + \Delta_2} \, w = 10.5 + \frac{(12 - 8) \times 5}{(12 - 8) + (12 - 6)} = 12.5 \]
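The mode of the grouped distribution of Example 3.11 can be checked with a short function. This is a sketch under two assumptions made here: a missing neighbouring class is treated as having frequency 0, and ties for the highest frequency are resolved by taking the first such class. The names are illustrative.

```python
def grouped_mode(lower_boundaries, freqs, width):
    """Mode of a grouped FD: L + d1 / (d1 + d2) * w."""
    i = freqs.index(max(freqs))                      # modal class (first maximum)
    f_mod = freqs[i]
    f1 = freqs[i - 1] if i > 0 else 0                # class before the modal class
    f2 = freqs[i + 1] if i < len(freqs) - 1 else 0   # class after the modal class
    d1, d2 = f_mod - f1, f_mod - f2
    return lower_boundaries[i] + d1 / (d1 + d2) * width

boundaries = [0.5, 5.5, 10.5, 15.5, 20.5, 25.5, 30.5]
freqs = [4, 8, 12, 6, 3, 4, 3]
mode = grouped_mode(boundaries, freqs, width=5)
print(mode)
```

Here d1 = 12 − 8 = 4 and d2 = 12 − 6 = 6, so the mode lies 4/10 of a class width above the lower boundary 10.5.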
Let x_1, x_2, ..., x_n be n ordered observations. The ith quartile Q_i is the value of the item
corresponding with the [i(n+1)/4]th position, i = 1, 2, 3.
That is, after arranging the data in ascending order, Q1, Q2, & Q3 are, obtained by:
• Quartiles in continuous data:- For continuous data, use the following formula:
\[ Q_i = L + \frac{w}{f_{Q_i}} \left( \frac{i n}{4} - CF \right) \]
Where i = 1, 2, 3, and L, w, f_{Q_i} and CF are defined in the same way as for the median.
i.e. \[ Q_1 = L + \frac{w}{f_{Q_1}} \left( \frac{n}{4} - CF \right), \quad Q_2 = L + \frac{w}{f_{Q_2}} \left( \frac{2n}{4} - CF \right) \quad \text{and} \quad Q_3 = L + \frac{w}{f_{Q_3}} \left( \frac{3n}{4} - CF \right) \]
The class under question is the one including the (i×n/4)th value. That is, the class with the minimum
cumulative frequency greater than or equal to i×n/4 is the class of the ith quartile.
Deciles: are values dividing the data approximately in to ten equal parts, denoted by 𝐷1 , 𝐷2,…, 𝐷9 .
• Deciles for Individual Series:
Let x_1, x_2, ..., x_n be n ordered observations. The ith decile (D_i) is the value of the item
corresponding with the [i(n+1)/10]th position, i = 1, 2, ..., 9.
That is, after arranging the data in ascending order, D1, D2, . . . & D9 are, obtained by:
• Deciles for continuous data: Apply the following formula and follow the procedures of quartile
for continuous data.
\[ D_i = L + \frac{w}{f_{D_i}} \left( \frac{i n}{10} - CF \right), \quad i = 1, 2, \dots, 9 \]
Define the symbols in similar ways as we did in the case of quartiles for continuous data.
Percentiles: are values which divide the data approximately in to one hundred equal parts, and
denoted by 𝑃1 , 𝑃2,…, 𝑃99 .
• Percentiles for Individual Series:
Let x_1, x_2, ..., x_n be n ordered observations. The ith percentile (P_i) is the value of the item
corresponding with the [i(n+1)/100]th position, i = 1, 2, ..., 99.
That is, after arranging the data in ascending order, P1, P2, . . . & P99 are, obtained by:
• Percentiles for continuous data: Apply the following formula
\[ P_i = L + \frac{w}{f_{P_i}} \left( \frac{i n}{100} - CF \right), \quad i = 1, 2, \dots, 99 \]
Define the symbols in similar ways as we did in the case of quartiles or deciles for continuous data.
Interpretations
1. 𝑄𝑖 is the value below which ( i × 25) percent of the observations in the series are found
(where i = 1, 2,3). For instance 𝑄3 means the value below which 75 percent of observations in
the given series are found.
2. 𝐷𝑖 is the value below which ( i ×10) percent of the observations in the series are found (where
i = 1, 2,...,9 ). For instance 𝐷4 is the value below which 40 percent of the values are found in the
series.
3. 𝑃𝑖 is the value below which i percent of the total observations are found (where i = 1, 2,3,...,99
). For example 60 percent of the observations in a given series are below 𝑃60 .
Example 3.15: Calculate 𝑄1 , 𝑄2 , 𝑄3, 𝐷4, 𝐷9, 𝑃40 & 𝑃90 for the following data given on the table
below.
X 10 11 12 13 14 15 16 17 18
f 2 8 25 48 65 40 20 9 2
Solution: The data is arranged in an increasing order. So we need to construct only the
cumulative frequency table before calculating the required values.
x    10  11  12  13  14   15   16   17   18
f    2   8   25  48  65   40   20   9    2
CF   2   10  35  83  148  188  208  217  219
The total number of observations is 219 which is odd. Clearly then the median is 14. i.e.
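\[ \tilde{x} = \left( \frac{n+1}{2} \right)^{th} = \left( \frac{219+1}{2} \right)^{th} \text{ value} = 110^{th} \text{ value} = 14 \]
\[ Q_1 = \left( \frac{1(n+1)}{4} \right)^{th} \text{ value} = \left( \frac{1(219+1)}{4} \right)^{th} \text{ value} = 55^{th} \text{ value} = 13 \]
\[ Q_2 = \left( \frac{2(n+1)}{4} \right)^{th} \text{ value} = \left( \frac{2(219+1)}{4} \right)^{th} \text{ value} = 110^{th} \text{ value} = 14 = \tilde{x} \]
\[ Q_3 = \left( \frac{3(n+1)}{4} \right)^{th} \text{ value} = \left( \frac{3(219+1)}{4} \right)^{th} \text{ value} = 165^{th} \text{ value} = 15 \]
\[ D_4 = \left( \frac{4(n+1)}{10} \right)^{th} \text{ value} = \left( \frac{4(219+1)}{10} \right)^{th} \text{ value} = 88^{th} \text{ value} = 14 \]
\[ D_9 = \left( \frac{9(n+1)}{10} \right)^{th} \text{ value} = \left( \frac{9(219+1)}{10} \right)^{th} \text{ value} = 198^{th} \text{ value} = 16 \]
\[ P_{40} = \left( \frac{40(n+1)}{100} \right)^{th} \text{ value} = \left( \frac{40(219+1)}{100} \right)^{th} \text{ value} = 88^{th} \text{ value} = 14 \]
\[ P_{90} = \left( \frac{90(n+1)}{100} \right)^{th} \text{ value} = \left( \frac{90(219+1)}{100} \right)^{th} \text{ value} = 198^{th} \text{ value} = 16 \]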
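The positions used in Example 3.15 can be located mechanically: build the cumulative frequencies, compute the position i(n+1)/parts, and pick the value whose cumulative frequency is the smallest one greater than or equal to that position. The sketch below follows this rule; the helper name `quantile` is an assumption of this example, and positions beyond n are clamped to the last value.

```python
import bisect
from itertools import accumulate

x = [10, 11, 12, 13, 14, 15, 16, 17, 18]
f = [2, 8, 25, 48, 65, 40, 20, 9, 2]

cum = list(accumulate(f))        # cumulative frequencies
n = cum[-1]                      # total number of observations

def quantile(i, parts):
    """Value at the [i(n+1)/parts]th position (parts = 4, 10 or 100)."""
    position = i * (n + 1) / parts
    # Index of the smallest cumulative frequency >= position.
    index = min(bisect.bisect_left(cum, position), len(x) - 1)
    return x[index]

print([quantile(i, 4) for i in (1, 2, 3)])                  # Q1, Q2, Q3
print(quantile(4, 10), quantile(9, 10))                     # D4, D9
print(quantile(40, 100), quantile(90, 100))                 # P40, P90
```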
Example 3.16: Marks of 50 students (out of 85) are given below. Based on the data find Q1,
D4 and P7.
Marks 46-50 51-55 56-60 61-65 66-70 71-75 76-80
fi 4 8 15 5 9 5 4
Solution:- first find the class boundaries and cumulative frequency distributions.
Marks            46-50  51-55  56-60  61-65  66-70  71-75  76-80
fi               4      8      15     5      9      5      4
Cum. frequency   4      12     27     32     41     46     50
Q1: Measure of the (n/4)th value = 12.5th value, which lies in group 55.5 – 60.5
\[ Q_1 = L + \frac{w}{f_{Q_1}} \left( \frac{n}{4} - CF \right) = 55.5 + \frac{5}{15}(12.5 - 12) = 55.7 \]
D4: Measure of the (4n/10)th value = 20th value, which lies in group 55.5 – 60.5.
\[ D_4 = L + \frac{w}{f_{D_4}} \left( \frac{4n}{10} - CF \right) = 55.5 + \frac{5}{15}(20 - 12) = 58.2 \]
P7: Measure of the (7n/100)th value = 3.5th value, which lies in group 45.5 – 50.5
\[ P_7 = L + \frac{w}{f_{P_7}} \left( \frac{7n}{100} - CF \right) = 45.5 + \frac{5}{4}(3.5 - 0) = 49.875 \]
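Quartiles, deciles and percentiles of grouped data all use the same formula with a different divisor (4, 10 or 100), so one function covers the three cases of Example 3.16. The sketch below assumes equal class widths and a unit of measure of 1; the function name `grouped_quantile` is chosen here for illustration.

```python
def grouped_quantile(i, parts, lower_limits, freqs, width, unit=1):
    """i-th quantile of grouped data: L + (w / f) * (i*n/parts - CF)."""
    n = sum(freqs)
    target = i * n / parts                   # position of the required value
    cf = 0                                   # cumulative frequency so far
    for lcl, f in zip(lower_limits, freqs):
        if cf + f >= target:                 # class containing the target
            lower_boundary = lcl - unit / 2
            return lower_boundary + (width / f) * (target - cf)
        cf += f
    raise ValueError("frequencies must not be empty")

limits = [46, 51, 56, 61, 66, 71, 76]
freqs = [4, 8, 15, 5, 9, 5, 4]

q1 = grouped_quantile(1, 4, limits, freqs, width=5)
d4 = grouped_quantile(4, 10, limits, freqs, width=5)
p7 = grouped_quantile(7, 100, limits, freqs, width=5)
print(round(q1, 1), round(d4, 1), p7)
```

Calling the function with i = 2 and parts = 4 gives the grouped median, since Q2 and the median use the same position n/2.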