Statistics Notes Angelux
iv. Descriptive statistics are single results you get when you analyse a set of data —
for example, the sample mean, median, standard deviation, correlation, regression
line, margin of error, and test statistic. Descriptive statistics aim to summarize,
and as such can be distinguished from inferential statistics, which are more
predictive in nature.
v. Statistical inference refers to using your data (and its descriptive statistics) to make
conclusions about the population.
TYPES OF STATISTICS
Two types of statistical methods are used in analysing data:
- Descriptive statistics and
- Inferential statistics.
DESCRIPTIVE STATISTICS
Descriptive statistics mostly focus on the central tendency, variability, and distribution of
sample data.
Central tendency means the estimate of the characteristics, a typical element of a
sample or population, and includes descriptive statistics such as mean, median, and
mode.
Variability refers to a set of statistics that show how much difference there is among
the elements of a sample or population along the characteristics measured, and includes
metrics such as range, variance, and standard deviation.
The distribution refers to the overall “shape” of the data, which can be depicted on a
chart such as a histogram or dot plot, and includes properties such as the probability
distribution function, skewness, and kurtosis.
Descriptive statistics can also describe differences between observed characteristics of the
elements of a data set. Descriptive statistics help us understand the collective properties of the
elements of a data sample and form the basis for testing hypotheses and making predictions
using inferential statistics.
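As a minimal sketch of the measures described above, the following Python snippet uses the standard-library `statistics` module to compute central tendency and variability for a small sample. The sample values are made up for illustration and do not come from these notes.

```python
import statistics

# Hypothetical sample, chosen only for illustration.
sample = [2, 4, 4, 4, 5, 5, 7, 9]

# Central tendency: a "typical" value of the sample.
mean = statistics.mean(sample)
median = statistics.median(sample)
mode = statistics.mode(sample)

# Variability: how much the elements differ from one another.
data_range = max(sample) - min(sample)
variance = statistics.pvariance(sample)  # population variance
std_dev = statistics.pstdev(sample)      # population standard deviation

print("mean:", mean, "median:", median, "mode:", mode)
print("range:", data_range, "variance:", variance, "std dev:", std_dev)
```

The population versions (`pvariance`, `pstdev`) are used here; `statistics.variance` and `statistics.stdev` give the sample versions, which divide by n - 1 instead of n.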
INFERENTIAL STATISTICS
Inferential statistics are tools that statisticians use to draw conclusions about the
characteristics of a population, drawn from the characteristics of a sample, and to decide how
certain they can be of the reliability of those conclusions.
Inferential statistics are used to make generalizations about large groups, such as estimating
average demand for a product by surveying a sample of consumers’ buying habits or to attempt
to predict future events, such as projecting the future return of a security or asset class based
on returns in a sample period.
Regression analysis is a widely used technique of statistical inference used to determine the
strength and nature of the relationship (i.e., the correlation) between a dependent variable and
one or more explanatory (independent) variables.
Importance of Statistics
(1) Statistics helps in providing a better understanding and accurate description of nature’s
phenomena.
(2) Statistics helps in the proper and efficient planning of a statistical inquiry in any field
of study.
(4) Statistics helps in presenting complex data in a suitable tabular, diagrammatic and
graphic form for an easy and clear comprehension of the data.
(5) Statistics helps in understanding the nature and pattern of variability of a phenomenon
through quantitative observations.
(6) Statistics helps in drawing valid inferences, along with a measure of their reliability
about the population parameters from the sample data.
- Statistics is a guide for learning from data and for avoiding common problems that can lead
you to incorrect conclusions.
- It helps you critically assess the quality of analyses that others present to you. Statistics
offers critical guidance in producing trustworthy analyses and predictions. Along the
way, statisticians can help investigators avoid a wide variety of analytical traps.
Qualitative data refers to information about qualities, or information that cannot be measured.
It’s usually descriptive and textual. Examples include someone’s eye colour or the type of car
they drive. In surveys, it’s often used to categorise ‘yes’ or ‘no’ answers.
Quantitative data is numerical. It’s used to define information that can be counted. Some
examples of quantitative data include distance, speed, height, length and weight. It’s easy to
remember the difference between qualitative and quantitative data, as one refers to qualities,
and the other refers to quantities.
A. Discrete data
Discrete data is a whole number that can’t be divided or broken into individual parts, fractions
or decimals. Classic examples are the number of people in a classroom, number of brothers in
a family, etc. You can’t have 30.5 people in the class and you can’t have 1.5 brothers. Other
examples of discrete data include the number of pets someone has – one can have two dogs but
not two-and-a-half dogs. The number of wins someone’s favourite team gets is also a form of
discrete data because a team can’t have a half win – it’s either a win, a loss, or a draw.
B. Continuous data
Continuous data describes values that can be broken down into different parts, units, fractions
and decimals. Continuous data points, such as height and weight, can be measured. Time can
also be broken down – by half a second or half an hour. Temperature is another example of
continuous data.
Continuous data can be further categorized into a couple of types: interval and ratio.
Discrete versus continuous
There’s an easy way to remember the difference between the two types of quantitative data:
data is considered discrete if it can be counted and is continuous if it can be measured.
Someone can count students, tickets purchased and books, while one measures height, distance
and temperature.
B. Magnitude: Magnitude means that the values have an ordered relationship to one
another, so there is a specific order to the variables.
C. Equal intervals: Equal intervals mean that data points along the scale are equal, so
the difference between data points one and two will be the same as the difference
between data points five and six.
D. A minimum value of zero: A minimum value of zero means the scale has a true
zero point. Degrees, for example, can fall below zero and still have meaning. But if
you weigh nothing, you don’t exist.
The nominal scale of measurement defines the identity property of data. This scale has certain
characteristics, but doesn’t have any form of numerical meaning. Nominal scales are used for
labelling variables, without any quantitative value. “Nominal” scales could simply be called
“labels.”
Here are some examples, below.
Notice that all of these scales are mutually exclusive (no overlap) and none of them have any
numerical significance. A good way to remember all of this is that “nominal” sounds a lot like
“name” and nominal scales are kind of like “names” or labels.
Nominal data can be broken down again into three categories:
Nominal with order: Some nominal data can be sub-categorised in order, such as “cold,
warm, hot and very hot”.
Nominal without order: Nominal data can also be sub-categorised as nominal without
order.
Dichotomous: Dichotomous data is defined by having only two categories or levels,
such as “yes” and “no”, or male/female.
2. Ordinal scale
With ordinal scales, the order of the values is what’s important and significant, but the
differences between the values are not really known. These values can’t be added to or subtracted
from one another.
Take a look at the example below.
In each case, we know that a #4 is better than a #3 or #2, but we don’t know–and cannot
quantify–how much better it is. For example, is the difference between “OK” and “Unhappy”
the same as the difference between “Very Happy” and “Happy?” We can’t say.
Ordinal scales are typically measures of non-numeric concepts like satisfaction, happiness,
discomfort, etc.
“Ordinal” is easy to remember because it sounds like “order” and that’s the key to remember
with “ordinal scales”–it is the order that matters, but that’s all you really get from these.
Where someone finished in a race also describes ordinal data. While first place, second place
or third place shows what order the runners finished in, it doesn’t specify how far the first-
place finisher was in front of the second-place finisher.
Interval scales are numeric scales in which we know both the order and the exact differences
between the values.
The classic example of an interval scale is Celsius temperature because the difference between
each value is the same. For example, the difference between 60 and 50 degrees is a measurable
10 degrees, as is the difference between 80 and 70 degrees.
Interval scales are nice because the realm of statistical analysis on these data sets opens up.
For example, central tendency can be measured by mode, median, or mean; standard deviation
can also be calculated.
Like the others, you can remember the key points of an “interval scale” pretty easily. “Interval”
itself means “space in between,” which is the important thing to remember–interval scales not
only tell us about order, but also about the value between each item.
Here’s the problem with interval scales: they don’t have a “true zero.” For example, there
is no such thing as “no temperature,” at least not with Celsius. In the case of interval scales,
zero doesn’t mean the absence of value, but is actually another number used on the scale, like
0 degrees Celsius. Negative numbers also have meaning.
Without a true zero, it is impossible to compute ratios. With interval data, we can add and
subtract, but cannot multiply or divide.
The interval scale contains properties of nominal and ordered data, but the difference between
data points can be quantified. This type of data shows both the order of the variables and the
exact differences between the variables. They can be added to or subtracted from each other,
but not multiplied or divided. For example, 40 degrees is not 20 degrees multiplied by two.
This scale is also characterised by the fact that the number zero is an existing variable. In the
ordinal scale, zero means that the data does not exist. In the interval scale, zero has meaning –
for example, if you measure degrees, zero has a temperature.
Data points on the interval scale have the same difference between them. The difference on the
scale between 10 and 20 degrees is the same between 20 and 30 degrees. This scale is used to
quantify the difference between variables, whereas the other two scales are used to describe
qualitative values only. Other examples of interval scales include the year a car was made or
the months of the year.
Ratio scales of measurement include properties from all four scales of measurement. The data
is nominal and defined by an identity, can be classified in order, contains intervals and can be
broken down into exact values. Weight, height, distance and duration are all examples of ratio
variables. Data in the ratio scale can be added, subtracted, divided and multiplied.
Ratio scales also differ from interval scales in that the scale has a ‘true zero’. The number zero
means that the data has no value point. An example of this is height or weight, as someone
cannot be zero centimetres tall or weigh zero kilos – or be negative centimetres or negative
kilos.
Ratio scales provide a wealth of possibilities when it comes to statistical analysis. These
variables can be meaningfully added, subtracted, multiplied, divided (ratios). Central tendency
can be measured by mode, median, or mean; measures of dispersion, such as standard deviation
and coefficient of variation can also be calculated from ratio scales.
Exercise
Please answer all the questions in as much detail as possible, showing your working (5
Points). Indicate the level of measurement (nominal, ordinal, interval, ratio) for each of the following:
1) Movie ratings (G, PG, PG-13, etc)
2) Number of siblings a person has
3) Number of a person’s siblings in the following categories: 0-1, 2-3, 4-6, 7+
4) Gender (male, female)
5) Length of pencils (in inches)
6) Number of bus routes that pass in front of a person’s apartment
7) Year in school (freshman, sophomore, junior, senior)
8) Distance from a person’s house to the public library
9) Professions (doctor, lawyer, baker, butcher, etc)
10) Money won from the lottery
11) Temperature in Antarctica today
12) Dolphin’s IQ
STATISTICAL GRAPHS
A statistical graph or chart is defined as the pictorial representation of statistical data in
graphical form. The statistical graphs are used to represent a set of data to make it easier to
understand and interpret statistical information. The different types of graphs that are
commonly used in statistics are given below.
Types of Graphs in Statistics
The four basic graphs used in statistics include;
- Bar graphs
- Line graphs
- Histograms and
- pie charts.
BAR GRAPH
Bar graphs are the pictorial representation of grouped data in vertical or horizontal rectangular
bars, where the length of bars is proportional to the measure of data.
They are also known as bar charts. The chart’s horizontal axis represents categorical data,
whereas the chart’s vertical axis defines discrete data.
The bars drawn are of uniform width, and the variable quantity is represented on one of the
axes. Also, the measure of the variable is depicted on the other axes. The heights or the lengths
of the bars denote the value of the variable, and these graphs are also used to compare certain
quantities. The frequency distribution tables can be easily represented using bar charts which
simplify the calculations and understanding of data.
Types of Bar Charts
- Vertical bar chart
- Horizontal bar chart
Even though the graph can be plotted horizontally or vertically, the most usual type of
bar graph is the vertical bar graph. The orientation of the x-axis and y-axis changes
depending on whether the chart is vertical or horizontal. Apart from the vertical and
horizontal bar graph, the two other types of bar charts are:
- Grouped Bar Graph
- Stacked Bar Graph
Solution:
The given data can be represented as follows
Example 2:
A cosmetic company manufactures 4 different shades of lipstick. The sale for 6 months is
shown in the table. Represent it using bar charts.
Month       Sales (in units)
            Shade 1   Shade 2   Shade 3   Shade 4
January     4500      1600      4400      3245
February    2870      5645      5675      6754
March       3985      8900      9768      7786
April       6855      8976      9008      8965
May         3200      5678      5643      7865
June        3456      4555      2233      6547
Example 3:
The variation of temperature in a region during a year is given as follows. Depict it through the
graph (bar).
Solution:
As the temperature in the given table has negative values, it is more convenient to represent
such data through a horizontal bar graph.
LINE GRAPH
A line graph is a graph that utilizes points and lines to represent change over time. In other words, it is a
chart that shows a line joining several points or a line that shows the relation between the points.
The diagram depicts quantitative data between two changing variables with a straight line or
curve that joins a series of successive data points. Linear charts compare these two variables
on a vertical and horizontal axis.
X – axis: This is also known as the base or horizontal axis. It is used principally to show the
values of the independent variable, such as dates or places.
Y – axis: This is also known as the vertical axis. It is used to show the values of the dependent
variable, such as the output of crops, minerals, etc.
Example
YEAR PRODUCTION
1990 100
1991 250
1992 300
1993 150
1994 500
1995 400
Procedure
(a) Variable’s identification
Fig.
Scales:
VS: 1 cm to 50 tons
HS: 1cm to 1 year
A cumulative line graph is a form of line graph designed to show the accumulated total values at various dates or
possibly places for a single item. This graphical method has no alternative graphical bar method
as it can be compared to other linear graphical
Construction of the cumulative line graph
Consider the given hypothetical data below showing maize production for country X.
YEAR PRODUCTION
1990 50
1991 40
1992 90
1993 100
1994 90
1995 130
Procedures
(a) Identification of Variables
· Dependent variable = Production values
· Independent variable = Date (Years)
Y - axis ………. Production values
X – axis ………...Years
Scale: -
VS: 1cm represents 50 tons
HS: 1cm represents 1 year
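The accumulated totals that a cumulative line graph plots can be worked out as a running sum. The sketch below uses the maize production table above; the variable names are illustrative.

```python
from itertools import accumulate

years = [1990, 1991, 1992, 1993, 1994, 1995]
production = [50, 40, 90, 100, 90, 130]  # maize production from the table above

# Each cumulative value is the sum of all production values up to that year.
cumulative = list(accumulate(production))

for year, value, total in zip(years, production, cumulative):
    print(year, value, total)
```

The points (year, cumulative total) are the ones joined by the line, so the largest value the vertical scale must accommodate is the final total rather than the largest single year.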
A divergent line graph is a form of line graph designed to illustrate the increase and decrease of the distribution
values in relation to the mean.
The graph is designed to have upper and lower sections showing positive and negative
values respectively. The two portions are separated by the steady line
graduated with zero value along the vertical line. The steady line also shows the average of all
values.
Construction of the divergent line graph
Consider the following tabled data which show export values of coffee for country X in
millions of dollars.
YEAR    EXPORT VALUES (000,000 dollars)
1952    345
1953    256.5
1954    283
1955    300
1956    335
1957    330.5
Deviation of each value from the mean (approximately 308):
1952 345 – 308 = 37
1953 256.5 – 308 = -51.5
1954 283 – 308 = -25
1955 300 – 308 = -8
1956 335 – 308 = 27
1957 330.5 – 308 = 22.5
(c) Estimation of the vertical scale.
Scales: -
- Vertical scale 1cm represents 15 tons
- Horizontal scale 1cm represents 1 year
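The values plotted on a divergent line graph are the deviations of each observation from the mean. A minimal sketch of that calculation follows; it assumes the 1955 export value is 300, the figure used in the deviation working, and rounds the mean to a whole number as the notes do.

```python
years = [1952, 1953, 1954, 1955, 1956, 1957]
# Export values in millions of dollars; 1955 is assumed to be 300,
# the figure used in the deviation working.
exports = [345, 256.5, 283, 300, 335, 330.5]

mean = sum(exports) / len(exports)   # about 308.33
rounded_mean = round(mean)           # the working uses 308

# Positive deviations are plotted above the zero line, negative ones below it.
deviations = [value - rounded_mean for value in exports]

for year, deviation in zip(years, deviations):
    print(year, deviation)
```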
A group line graph is a form of statistical line graph that has more than one line, each of a
different texture, to illustrate the values of more than one item. It is alternatively known as a
composite, comparative, or multiple line graph.
Consider the given data below showing values of export crops from Kenya (Ksh Million).
(a) Identification of Variables
Scales: -
Vertical scale: 1cm to 5,000 export values
A compound line graph is a line graph that has more than one line, compounded on one another and
separated by varied shade textures, to show the cumulative values of more than one item.
Construction of the compound line graph
Consider the given data below showing cocoa production for the Ghana provinces in 000
tons.
YEAR/PROV   TV Togoland   E. province   W. province   Ashanti
1947/48     40            40            30            35
1948/49     50            60            45            100
1949/50     45            46            89            110
1950/51     45            47            44            124
1951/52     47            23            50            100
1952/53     51            14            57            118
Procedure
(a) Identification of Variables
Dependent variable …… export values
Independent variable … Date (Years)
Y – axis … export values
X – axis … Years
(b) Cumulative values determination for the dates.
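Step (b) can be sketched as a running sum across the provinces for each date, so that each line of the compound graph sits on top of the previous one. The snippet below uses the cocoa production table above; the order in which provinces are stacked is an assumption taken from the column order.

```python
from itertools import accumulate

provinces = ["TV Togoland", "E. province", "W. province", "Ashanti"]

# Cocoa production in thousands of tons, one row per season, in column order.
production = {
    "1947/48": [40, 40, 30, 35],
    "1948/49": [50, 60, 45, 100],
    "1949/50": [45, 46, 89, 110],
    "1950/51": [45, 47, 44, 124],
    "1951/52": [47, 23, 50, 100],
    "1952/53": [51, 14, 57, 118],
}

# For each season, accumulate across provinces: the last value is the season total.
cumulative = {season: list(accumulate(values))
              for season, values in production.items()}

for season, totals in cumulative.items():
    print(season, dict(zip(provinces, totals)))
```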
HISTOGRAM
The first step in the construction of a histogram is to take the observations and split them into
a logical series of intervals called classes or bins. The x-axis indicates the independent variable,
i.e., the classes, while the y-axis represents the dependent variable, i.e., the occurrences.
Rectangular blocks, i.e., bars, are drawn on the x-axis, one per class, with an area that depends on
the frequency of the class. See the figure given below:
Notice that the horizontal axis of Figure 1 consists of binned times: the first bin includes visits
from 0 up to and including ten minutes, the second bin from 10 up to and including 20 minutes,
and so on.
Characteristics of a histogram
1. Histograms are used to show distributions of variables.
2. Histograms plot binned quantitative data.
3. Bars cannot be reordered in histograms.
4. There are no spaces between the bars of a histogram since there are no gaps between the
bins. An exception would occur if there were no values in a given bin but in that case
the value is zero rather than a space.
5. The widths of the bars in a histogram need not be the same.
1.2 Uses of a Histogram
1.2.1 Identifying the most common process outcome
A quick look at a histogram can immediately reveal the most common outcome of a
process with varying outcomes. Any special trends will quickly become apparent.
1.2.2 Identifying data symmetry
Sometimes, you will spot trends that lean in two directions simultaneously. A histogram can
make it very easy to identify those occurrences and know when your processes are prone to
producing symmetrical results in some circumstances. On the other hand, it can also help you
identify possible issues, as sometimes symmetry is not what you expect to see in your results.
Marks                0 – 10   10 – 20   20 – 30   30 – 40   40 – 50   50 – 60   60 – 70   70 – 80
Number of students   5        10        15        20        25        12        8         5
Solution:
The class intervals are all equal with length of 10 marks. Let us denote these class intervals
along the X-axis. Denote the number of students along the Y-axis, with appropriate scale.
The histogram is given below.
Scale:
X – axis = 1 cm = 10 marks
Y – axis = 1 cm = 5 students
In the above diagram, the bars are drawn continuously. The rectangles are of lengths (heights)
proportional to the respective frequencies. Since the class intervals are equal, the areas of the
bars are proportional to the respective frequencies.
Example 2:
In a study of diabetic patients in a village, the following observations were noted. Represent
the data by a frequency polygon using a histogram.
Ages                 10 – 20   20 – 30   30 – 40   40 – 50   50 – 60   60 – 70
Number of patients   3         5         13        20        10        5
Scale
X – axis = 1 cm = 10 years of age
Y – axis = 1 cm = 2 patients
When drawing histograms, it is possible that the intervals will not have the same width.
Consider the data given in the table below.
The way the data have been presented makes it impossible to draw a histogram with equal class
intervals.
In order to keep the histogram fair, the area of the bars, rather than the height, must be
proportional to the frequency. So, on the vertical scale we plot frequency density instead of
frequency, where:
Frequency density = Frequency / Class width
Rewriting the table with an extra column for frequency density, gives
And you can draw the histogram with frequency density on the vertical axis.
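As a minimal sketch of the frequency density calculation, the snippet below divides each frequency by its class width. The classes and frequencies are hypothetical, chosen only to show unequal widths, and are not the table from these notes.

```python
# Hypothetical grouped data with unequal class widths (not from the notes).
classes = [(0, 10), (10, 15), (15, 20), (20, 40)]
frequencies = [20, 15, 25, 40]

# Frequency density = frequency / class width.
densities = [f / (upper - lower)
             for (lower, upper), f in zip(classes, frequencies)]

# Check: bar area (density x width) recovers the original frequency.
areas = [d * (upper - lower)
         for (lower, upper), d in zip(classes, densities)]

for (lower, upper), f, d in zip(classes, frequencies, densities):
    print(f"{lower}-{upper}: frequency {f}, width {upper - lower}, density {d}")
```

Note that the widest class (20-40) has the largest frequency but not the tallest bar, which is exactly the distortion that plotting density instead of frequency avoids.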
Note
You can see that, it is the area that is proportional to the frequency – in fact, a frequency of 1
is represented by 10 little squares.
PIE CHART
A pie chart is a type of graph that represents the data in the circular graph. The slices of pie
show the relative size of the data, and it is a type of pictorial representation of data. A pie chart
requires a list of categorical variables and numerical variables. Here, the term “pie” represents
the whole, and the “slices” represent the parts of the whole.
Formula
The pie chart is an important type of data representation. It contains different segments and
sectors in which each segment and sector of a pie chart forms a specific portion of the
total(percentage). The sum of all the data is equal to 360°.
The total value of the pie is always 100%.
To work out with the percentage for a pie chart, follow the steps given below:
- Categorize the data
- Calculate the total
- Divide the categories
- Convert into percentages
- Finally, calculate the degrees
The data above can be represented by a pie chart as follows, using the circle graph
formula, i.e., the pie chart formula given below. It makes the size of each portion easy to
understand.
Step 1: First, Enter the data into the table.
Step 2: Add all the values in the table to get the total.
- Total students are 40 in this case.
Step 3: Next, divide each value by the total and multiply by 100 to get a per cent:
(10/40) × 100 = 25%
(5/40) × 100 = 12.5%
(5/40) × 100 = 12.5%
(10/40) × 100 = 25%
(10/40) × 100 = 25%
Step 4: Next, to know how many degrees each “pie sector” needs, we take a full
circle of 360° and follow the calculation below:
The central angle of each component = (Value of the component / Total value) × 360°
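Steps 2 to 4 can be sketched in Python as follows. The five values are the ones used in the worked percentages above; the variable names are illustrative.

```python
values = [10, 5, 5, 10, 10]  # the five category counts used in Step 3

# Step 2: total of all values.
total = sum(values)

# Step 3: share of each category as a percentage of the whole.
percentages = [value / total * 100 for value in values]

# Step 4: central angle of each sector out of a full circle of 360 degrees.
angles = [value / total * 360 for value in values]

for value, pct, angle in zip(values, percentages, angles):
    print(f"value {value}: {pct}% -> {angle} degrees")
```

A quick sanity check on any pie chart working is that the percentages add up to 100 and the angles add up to 360.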
Question: The percentages of various crops cultivated in a village of a particular district are
given in the following table.
NATURE OF CLASS
The following are some basic technical terms when a continuous frequency distribution is
formed/ data are classified according to class intervals.
CLASS LIMITS/CLASS BOUNDARY
Refers to the lowest and highest values that can be included in the class. For example, in 10-20,
the lowest value of the class is 10 and the highest value of the class is 20. The two boundaries of a
class are known as the lower limit (on the left side) and the upper limit of the class (on the right side).
CLASS INTERVAL
Refers to the numerical width of any class in a particular distribution. It’s defined as the
difference between the upper-class limit and the lower-class limit (range of each grouping data)
For example, 0-2,2-4, or 0-9, 10-19…
CLASS SIZE/ CLASS WIDTH
The difference between the upper limit and lower limit of a class. It is denoted by the symbol ‘C’.
Class width = (Upper Limit – Lower Limit) / Number of classes OR Range / Number of classes
CLASS MARK
The middle value of the selected class. It can be calculated as follows:
Class mark = (Upper Class Limit + Lower Class Limit) / 2
A frequency distribution is a statistical way of condensing and summarizing a large amount of data
in a useful format that helps in the analysis and interpretation of the data.
• It describes the characteristics of a population
• It allows comparison of data sets across intervals
• It facilitates graphic presentation of data.
How to create a frequency distribution table
76,84,76,103,92,47,98,54,80,91,69,86,83,75,93,89,96,65,94,85
Create a frequency table using 6 classes
Step 1: Calculate the class width.
Step 2: Create the table by starting with the lowest value, then add the class width to obtain
the limits of each successive class.
Step 3: Insert the frequency of each class.
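The three steps can be sketched in Python as follows. This assumes the class width is rounded up to a whole number (the convention used elsewhere in these notes, where 7.2 becomes 8) and that classes have inclusive integer limits such as 47-56.

```python
import math

data = [76, 84, 76, 103, 92, 47, 98, 54, 80, 91,
        69, 86, 83, 75, 93, 89, 96, 65, 94, 85]
num_classes = 6

# Step 1: class width = range / number of classes, rounded up (56 / 6 -> 10).
lowest, highest = min(data), max(data)
class_width = math.ceil((highest - lowest) / num_classes)

# Step 2: class limits, starting from the lowest value.
classes = [(lowest + i * class_width, lowest + (i + 1) * class_width - 1)
           for i in range(num_classes)]

# Step 3: count the observations that fall inside each class.
frequencies = [sum(lower <= x <= upper for x in data)
               for lower, upper in classes]

for (lower, upper), f in zip(classes, frequencies):
    print(f"{lower}-{upper}: {f}")
```

The frequencies should always add up to the number of observations (20 here), which is a useful check on the finished table.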
CLASS BOUNDARY
Class boundaries are the numbers used to separate classes. The boundaries have one more
decimal place than the raw data and therefore do not appear in the data. The lower class
boundary is found by subtracting 0.5 unit from the lower class limit and the upper class
boundary is found by adding 0.5 units to upper class limit.
• It is used in drawing histogram.
Creating a class boundary
• Subtract the first upper class limit from the second lower class limit.
• Divide the difference by 2
• Subtract this value from all of the lower-class limits and add the value to all of the
upper-class limits.
Example
Class limits   Class boundaries
20 – 29        19.5 – 29.5
30 – 39        29.5 – 39.5
40 – 49        39.5 – 49.5
50 – 59        49.5 – 59.5
60 – 69        59.5 – 69.5
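The three steps for creating class boundaries can be sketched as follows, using the class limits from the example above. The variable names are illustrative.

```python
limits = [(20, 29), (30, 39), (40, 49), (50, 59), (60, 69)]

# Subtract the first upper class limit from the second lower class limit.
gap = limits[1][0] - limits[0][1]   # 30 - 29 = 1

# Divide the difference by 2.
half_gap = gap / 2                  # 0.5

# Subtract from every lower limit and add to every upper limit.
boundaries = [(lower - half_gap, upper + half_gap) for lower, upper in limits]

for (lower, upper), (lb, ub) in zip(limits, boundaries):
    print(f"{lower}-{upper}: {lb}-{ub}")
```

Because each upper boundary equals the next lower boundary, the histogram bars drawn from these boundaries touch with no gaps.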
Number of patients: 90 50 60 80 50 30
So, the cumulative frequency table for the above data is given below:
Class    Frequency   Cumulative frequency
10-20    90          90
20-30    50          140
30-40    60          200
40-50    80          280
50-60    50          330
60-70    30          360
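The cumulative column above is a running total of the frequencies, which can be sketched as follows (class labels and frequencies are taken from the table; the variable names are illustrative).

```python
from itertools import accumulate

classes = ["10-20", "20-30", "30-40", "40-50", "50-60", "60-70"]
frequencies = [90, 50, 60, 80, 50, 30]

# Each cumulative frequency is the sum of this class and all classes before it.
cumulative = list(accumulate(frequencies))

for label, f, cf in zip(classes, frequencies, cumulative):
    print(label, f, cf)
```

The last cumulative frequency always equals the total number of observations.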
The monthly wages (in rupees) of 28 labourers working in a factory are given below:
220 268 258 242 210 267 272 242
311 290 300 320 319 304 302 292
254 278 318 306 210 240 280 316
306 215 256 328
Form a cumulative frequency table with class intervals of length 20.
Solution
Example
Create a cumulative frequency table with 10 classes
147, 167, 136, 178, 175, 116, 155, 121
115, 156, 176, 141, 189, 167, 177, 208
212, 143, 203, 210, 188, 178, 212, 118
197, 145, 134, 133, 196, 185.
Class      Frequency   Cumulative frequency
115-124    4           4
125-134    2           6
135-144    3           9
145-154    2           11
155-164    2           13
165-174    2           15
175-184    5           20
185-194    3           23
195-204    3           26
205-214    4           30
TOTAL      30
Question
You are provided with the following data;
Age Number of Students
0-5 35
5-10 45
10-15 50
15-20 50
Calculate;
1. The lower limit of the first class interval = 0
2. The class limits of the third class
Answer. The lower-class limit = 10
The upper-class limit = 15
3. Class mark for the interval 5 – 10
Answer: Class mark = (Upper Class Limit + Lower Class Limit) / 2
= (10 + 5) / 2
Class mark = 7.5
4. The class size
Class size (C) = Upper Limit – Lower Limit
= 5 – 0
C = 5
Solution
Arrange them in ascending order:
24, 34, 43, 50, 67, 78
Take the two middle numbers, add them and divide the sum by 2:
(43 + 50) / 2 = 46.5
The median is 46.5
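The procedure above (sort, then take the middle value or the average of the two middle values) can be sketched as a small function. The function name `median_of` is illustrative; the standard-library `statistics.median` does the same job and is used here only as a cross-check.

```python
import statistics

def median_of(values):
    """Median of ungrouped data: middle value, or mean of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # odd count: single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: average the two

values = [78, 24, 50, 43, 67, 34]   # the six values above, deliberately unsorted
result = median_of(values)
print(result, statistics.median(values))
```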
MODE OF UNGROUPED DATA
The value which appears most often in the given data, i.e. the observation with the highest
frequency, is called the mode of the data.
For ungrouped data, we just need to identify the observation which occurs the maximum number of times.
• Mode = Observation with maximum frequency
For example, in the data: 6, 8, 9, 3, 4, 6, 7, 6, 3
The value 6 appears the most times, thus mode = 6
An easy way to remember mode is: Most Often Data Entered.
A data set can be unimodal, bimodal or trimodal, depending on whether it has one, two or three modes.
More examples
The following are the marks scored by 20 students in the class.
90, 70, 50, 30, 40, 86, 65, 73, 68, 90, 90, 10, 73, 25, 35, 88, 67, 80, 74, 46
Find the mode.
Solution:
Since the mark 90 occurs the maximum number of times (three times, more than any other
mark), the mode is 90.
Example
The sugar levels of 10 patients checked by a doctor are given below. Find the mode value of the sugar
levels. 80, 112, 110, 115, 124, 130, 100, 90, 150, 180
Solution:
Since each value occurs only once, there is no mode.
Example
Compute mode value for the following observations.
2, 7, 10, 12, 10, 19, 2, 11, 3, 12
Solution:
Here, the observations 2, 10 and 12 each occur twice in the data set, so the modes are 2, 10 and 12.
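The three cases worked above (one mode, no mode, several modes) can be reproduced with a short function. This is a sketch that follows the convention used in these notes, where data in which every value occurs once has no mode; the function name `modes` is illustrative.

```python
from collections import Counter

def modes(data):
    """Return all values with the highest frequency, or [] if no value repeats."""
    if not data:
        return []
    counts = Counter(data)
    highest = max(counts.values())
    if highest == 1:
        return []   # every value occurs once: no mode
    return sorted(value for value, count in counts.items() if count == highest)

marks = [90, 70, 50, 30, 40, 86, 65, 73, 68, 90,
         90, 10, 73, 25, 35, 88, 67, 80, 74, 46]
sugar = [80, 112, 110, 115, 124, 130, 100, 90, 150, 180]
observations = [2, 7, 10, 12, 10, 19, 2, 11, 3, 12]

print(modes(marks))          # one mode
print(modes(sugar))          # no mode
print(modes(observations))   # three modes
```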
For discrete frequency distribution, mode is the value of the variable corresponding to the
maximum frequency.
Example 5.24
Calculate the mode from the following data
Solution:
Here, 7 is the maximum frequency, hence the value of x corresponding to 7 is 8.
Therefore 8 is the mode
Sample examples
A. The monthly salary of 10 employees in a factory are given below:
5000, 7000, 5000, 7000, 8000, 7000, 7000, 8000, 7000, 5000
Find the mean, median, and mode
B. Find the mode of the given data: 3.1, 3.2, 3.3, 2.1, 1.3, 3.3, 3.1
C. For the data 11, 15, 17, x+1, 19, x-2, 3, if the mean is 14, find the value of x. Also find the
mode of the data.
ANSWERS
A. Mean 6600
Median 7000
Mode 7000
B. 3.1, 3.3
C. x = 17
Mode = 15
Example: Calculate the mean of the following data after drawing a frequency distribution table
with 5 classes.
118, 123, 124, 125, 127, 129, 130, 130, 133, 133, 136, 138, 141, 142, 149, 150, 154
Solution
Class width = Range / Number of classes
Class width = (154 – 118) / 5 = 7.2, rounded up to 8
Class width = 8
Class interval   F    C.F   X       FX
118 - 125        4    4     121.5   486
126 - 133        6    10    129.5   777
134 - 141        3    13    137.5   412.5
142 - 149        2    15    145.5   291
150 - 157        2    17    153.5   307
∑f = 17   ∑FX = 2273.5
Mean = ∑FX / ∑f = 2273.5 / 17 ≈ 133.74
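The table working can be checked with a few lines of Python: compute each class mark X, multiply by the frequency F, and divide the sum of FX by the sum of F. The variable names are illustrative.

```python
classes = [(118, 125), (126, 133), (134, 141), (142, 149), (150, 157)]
frequencies = [4, 6, 3, 2, 2]

# Class mark X = (lower limit + upper limit) / 2.
class_marks = [(lower + upper) / 2 for lower, upper in classes]

# FX for each class, then mean = sum(FX) / sum(F).
fx = [f * x for f, x in zip(frequencies, class_marks)]
mean = sum(fx) / sum(frequencies)

print(class_marks)
print(fx, sum(fx))
print(round(mean, 2))
```

The grouped mean is an approximation, because every observation in a class is treated as if it sat at the class mark.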
MEDIAN
Median for grouped data is calculated using the following formula:
Median = L + ((N/2 – C) / f) × i
NOTE: Certain sources use the letter m in place of C, and c in place of i, but their
definitions remain the same. In such a case the formula looks like this:
Median = L + ((N/2 – m) / f) × c
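A minimal sketch of the grouped median formula follows. The symbol meanings are the usual ones and are an assumption here, since the notes do not list them: L is the lower limit of the median class, N the total frequency, C the cumulative frequency before the median class, f the frequency of the median class and i the class width. The data are the diabetic patients' ages from the histogram section, reused for illustration.

```python
def grouped_median(classes, frequencies):
    """Median = L + ((N/2 - C) / f) * i for continuous classes (lower, upper)."""
    n = sum(frequencies)
    cumulative = 0  # C: cumulative frequency before the current class
    for (lower, upper), f in zip(classes, frequencies):
        if cumulative + f >= n / 2:
            # This is the median class: interpolate inside it.
            return lower + (n / 2 - cumulative) / f * (upper - lower)
        cumulative += f
    raise ValueError("frequency table is empty")

ages = [(10, 20), (20, 30), (30, 40), (40, 50), (50, 60), (60, 70)]
patients = [3, 5, 13, 20, 10, 5]

median_age = grouped_median(ages, patients)
print(median_age)
```

Here N = 56, so N/2 = 28 falls in the 40-50 class (cumulative frequency 21 before it), giving 40 + ((28 - 21) / 20) × 10.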
The following data were obtained from a garden record over a certain period. Calculate the median
weight of the apples.
Solution:
Example 2
The following table shows age distribution of persons in a particular region:
Find the median age.
Solution:
We are given upper limits and less-than cumulative frequencies. First find the class intervals
and the frequencies. Since the values increase by 10, the width of each class interval
is equal to 10.
Example 3
The following are the marks obtained by 140 students in a college. Find the median marks.
Solution:
Merits of Median
· It is easy to compute. It can be calculated by mere inspection and by the graphical
method
· It is not affected by extreme values.
· It can be easily located even if the class intervals in the series are unequal
Limitations of Median
· It is not amenable to further algebraic treatment
· It is a positional average and is based on the middle item
· It does not take into account the actual values of the items in the series
MODE
Mode for grouped data is calculated using the following formula
Example
The following data relates to the daily income of families in an urban area. Find the modal
income of the families.
Solution:
v. Leave the 1st frequency and combine the remaining frequencies three by three and
write in column V
vi. Leave the 1st and 2nd frequencies and combine the remaining frequencies three by
three and write in column VI
Mark the highest frequency in each column. Then form an analysis table to find the modal
class. After finding the modal class use the formula to calculate the modal value.
Example 5.26
Calculate mode for the following frequency distribution:
Solution:
Analysis Table:
The maximum occurred corresponding to 20-25, and hence it is the modal class.
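Once the modal class is known, the modal value is found by interpolating inside it. The sketch below uses a commonly taught form of the grouped mode formula, Mode = L + ((f1 – f0) / (2f1 – f0 – f2)) × i; this form, its symbols and the frequencies used are assumptions for illustration, since the formula and table did not survive in these notes.

```python
def grouped_mode(lower, width, f1, f0, f2):
    """Mode = L + ((f1 - f0) / (2*f1 - f0 - f2)) * i.

    lower: lower limit L of the modal class
    width: class width i
    f1: frequency of the modal class
    f0: frequency of the class before it
    f2: frequency of the class after it
    """
    return lower + (f1 - f0) / (2 * f1 - f0 - f2) * width

# Hypothetical frequencies around a modal class of 20-25 (not from the notes).
mode_value = grouped_mode(lower=20, width=5, f1=28, f0=20, f2=16)
print(mode_value)
```

The result always lies inside the modal class, pulled towards whichever neighbouring class has the higher frequency.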
Merits of Mode:
· It is comparatively easy to understand.
· It can be found graphically.
· It is easy to locate in some cases by inspection.
· It is not affected by extreme values.
· It is the simplest descriptive measure of average.
Demerits of Mode:
· It is not suitable for further mathematical treatment.
· It is an unstable measure as it is affected more by sampling fluctuations.
· Mode for the series with unequal class intervals cannot be calculated.
· In a bimodal distribution, there are two modal classes and it is difficult to determine
the values of the mode
SEMI-INTERQUARTILE RANGE
The semi-interquartile range (SIR) (also called the quartile deviation) is a measure of spread.
It tells you something about how data is dispersed around a central point (usually the mean).
The SIR is half of the interquartile range.
How to Calculate the Semi Interquartile Range / Quartile Deviation
As the S.I.R is half of the Interquartile Range, all you need to do is find the IQR and then divide
your answer by 2.
Another way is to use the quartile deviation formula:
QD = (Q3 – Q1) / 2
Note: You might see the formula QD = ½ (Q3 – Q1). Algebraically they are the same.
Example 01
Question: Find the Quartile Deviation for the following set of data:
{490, 540, 590, 600, 620, 650, 680, 770, 830, 840, 890, 900}
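A sketch of the calculation for this data set follows. It assumes the quartiles are found as the medians of the lower and upper halves of the sorted data; other quartile conventions exist and can give slightly different values. The function name `quartile_deviation` is illustrative.

```python
import statistics

def quartile_deviation(data):
    """QD = (Q3 - Q1) / 2, with Q1 and Q3 as medians of the lower and upper halves."""
    ordered = sorted(data)
    half = len(ordered) // 2
    lower_half = ordered[:half]
    upper_half = ordered[-half:]   # leaves out the middle value when n is odd
    q1 = statistics.median(lower_half)
    q3 = statistics.median(upper_half)
    return (q3 - q1) / 2

data = [490, 540, 590, 600, 620, 650, 680, 770, 830, 840, 890, 900]
qd = quartile_deviation(data)
print(qd)
```

With this convention Q1 = (590 + 600) / 2 = 595 and Q3 = (830 + 840) / 2 = 835, so the interquartile range is 240 and the quartile deviation is half of that.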
Percentile
1. MEASURES OF VARIABILITY
Range and QD
Standard deviations