
1. INTRODUCTION TO STATISTICS IN PSYCHOLOGY


Statistics is the scientific and systematic discipline that concerns the collection, organization,
analysis, interpretation, and presentation of data in a precise manner.

DEFINITIONS OF KEY TERMS, NEED AND IMPORTANCE


Some key terms in statistics are population, sample, parameter, and statistic:
i. A population is the entire group of individuals you want to study, and a sample is
a subset of that group.

ii. A parameter is a quantitative characteristic of the population that you’re interested
in estimating or testing (such as a population mean or proportion). If I had the ages
of all humans on Earth and calculated the mean age, this value would be a
parameter.

iii. A statistic is a quantitative characteristic of a sample that often helps estimate or
test the population parameter (such as a sample mean or proportion).

iv. Descriptive statistics are single results you get when you analyse a set of data —
for example, the sample mean, median, standard deviation, correlation, regression
line, margin of error, and test statistic. Descriptive statistics aim to summarize,
and as such can be distinguished from inferential statistics, which are more
predictive in nature.

v. Statistical inference refers to using your data (and its descriptive statistics) to make
conclusions about the population.
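
The difference between a parameter and a statistic can be illustrated with a short Python sketch. The population of ages below is made up purely for illustration, and the names `population` and `sample` are assumptions of this example, not real data.

```python
import random
import statistics

# Hypothetical population: ages of 1,000 people (made-up values).
rng = random.Random(42)
population = [rng.randint(18, 80) for _ in range(1000)]

# Parameter: a characteristic computed on the whole population.
population_mean = statistics.mean(population)

# Sample: a randomly drawn subset of the population.
sample = rng.sample(population, 50)

# Statistic: the same characteristic computed on the sample only.
sample_mean = statistics.mean(sample)

print(f"Parameter (population mean age): {population_mean:.2f}")
print(f"Statistic (sample mean age):     {sample_mean:.2f}")
```

The sample mean will usually be close to, but not exactly equal to, the population mean; statistical inference is about quantifying that gap.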

Major types of inference include:
- Regression,
- Confidence intervals, and
- Hypothesis tests.

TYPES OF STATISTICS
Two types of statistical methods are used in analysing data:
- Descriptive statistics and
- Inferential statistics.

2021 Class Fireworks


A Compilation by Angelux F. M|

DESCRIPTIVE STATISTICS
Descriptive statistics mostly focus on the central tendency, variability, and distribution of
sample data.
- Central tendency is an estimate of the typical value of a characteristic in a sample or
population, and includes descriptive statistics such as the mean, median, and mode.

- Variability refers to a set of statistics that show how much difference there is among
the elements of a sample or population along the characteristics measured, and includes
metrics such as range, variance, and standard deviation.

- The distribution refers to the overall “shape” of the data, which can be depicted on a
chart such as a histogram or dot plot, and includes properties such as the probability
distribution function, skewness, and kurtosis.

Descriptive statistics can also describe differences between observed characteristics of the
elements of a data set. Descriptive statistics help us understand the collective properties of the
elements of a data sample and form the basis for testing hypotheses and making predictions
using inferential statistics.
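
As a minimal sketch, the measures of central tendency and variability named above can be computed with Python's standard `statistics` module. The list `scores` is a hypothetical data set invented for this example.

```python
import statistics

# Hypothetical sample of seven test scores (made-up values).
scores = [4, 7, 7, 8, 10, 12, 15]

# Central tendency
mean = statistics.mean(scores)      # arithmetic average
median = statistics.median(scores)  # middle value of the sorted data
mode = statistics.mode(scores)      # most frequent value

# Variability
data_range = max(scores) - min(scores)
variance = statistics.variance(scores)  # sample variance (divides by n - 1)
std_dev = statistics.stdev(scores)      # square root of the sample variance

print("mean:", mean, "median:", median, "mode:", mode)
print("range:", data_range)
print(f"variance: {variance:.2f}  standard deviation: {std_dev:.2f}")
```

Note that `statistics.variance` and `statistics.stdev` treat the data as a sample; `pvariance` and `pstdev` are the population versions.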

INFERENTIAL STATISTICS
Inferential statistics are tools that statisticians use to draw conclusions about the
characteristics of a population, drawn from the characteristics of a sample, and to decide how
certain they can be of the reliability of those conclusions.
Inferential statistics are used to make generalizations about large groups, such as estimating
average demand for a product by surveying a sample of consumers’ buying habits or to attempt
to predict future events, such as projecting the future return of a security or asset class based
on returns in a sample period.
Regression analysis is a widely used technique of statistical inference used to determine the
strength and nature of the relationship between a dependent variable and one or more
explanatory (independent) variables.

What Is the Difference Between Descriptive and Inferential Statistics?


Descriptive statistics are used to describe or summarize the characteristics of a sample or data
set, such as a variable’s mean, standard deviation, or frequency. Inferential statistics, in
contrast, employ a range of techniques to draw conclusions about a population from a sample,
often by relating variables in a data set to one another, for example using correlation or
regression analysis. These can then be used to make forecasts or, with an appropriate study
design, to infer causality.


Importance of Statistics
(1) Statistics helps in providing a better understanding and accurate description of nature’s
phenomena.

(2) Statistics helps in the proper and efficient planning of a statistical inquiry in any field
of study.

(3) Statistics helps in collecting appropriate quantitative data.

(4) Statistics helps in presenting complex data in a suitable tabular, diagrammatic and
graphic form for an easy and clear comprehension of the data.

(5) Statistics helps in understanding the nature and pattern of variability of a phenomenon
through quantitative observations.

(6) Statistics helps in drawing valid inferences, along with a measure of their reliability
about the population parameters from the sample data.

WHY WE NEED STATISTICS

- Statistical knowledge helps you use the proper methods to collect the data, employ the
correct analyses, and effectively present the results.

- Statistics is a crucial process behind how we make discoveries in science, make
decisions based on data, and make predictions. Statistics allows you to understand a
subject much more deeply.

- Statistics guides you in learning from data and in navigating common problems that can
lead you to incorrect conclusions.

- It helps you critically assess the quality of analyses that others present to you. Statistics
offers critical guidance in producing trustworthy analyses and predictions. Along the
way, statisticians can help investigators avoid a wide variety of analytical traps.


SCALES OF MEASUREMENT, GRAPHS

TYPES OF DATA & MEASUREMENT SCALES

TYPES OF STATISTICAL DATA


What is data?
Data are individual pieces of factual information recorded and used for the purpose of analysis.
Statistical data can be classified in several ways. Here we provide an overview of the major
types of data in statistics.

QUANTITATIVE VS. QUALITATIVE DATA

Qualitative data refers to information about qualities, or information that cannot be measured.
It’s usually descriptive and textual. Examples include someone’s eye colour or the type of car
they drive. In surveys, it’s often used to categorise ‘yes’ or ‘no’ answers.
Quantitative data is numerical. It’s used to define information that can be counted. Some
examples of quantitative data include distance, speed, height, length and weight. It’s easy to
remember the difference between qualitative and quantitative data, as one refers to qualities,
and the other refers to quantities.


QUANTITATIVE (NUMERICAL) DATA


Quantitative, or numerical, data can be broken down into two types:
- Discrete data and
- Continuous data

A. Discrete data

Discrete data is a whole number that can’t be divided or broken into individual parts, fractions
or decimals. Classic examples are the number of people in a classroom, number of brothers in
a family, etc. You can’t have 30.5 people in the class and you can’t have 1.5 brothers. Other
examples of discrete data include the number of pets someone has – one can have two dogs but
not two-and-a-half dogs. The number of wins someone’s favourite team gets is also a form of
discrete data because a team can’t have a half win – it’s either a win, a loss, or a draw.

B. Continuous data

Continuous data describes values that can be broken down into different parts, units, fractions
and decimals. Continuous data points, such as height and weight, can be measured. Time can
also be broken down – by half a second or half an hour. Temperature is another example of
continuous data.
Continuous data can be further categorized into a couple of types: interval and ratio.
Discrete versus continuous
There’s an easy way to remember the difference between the two types of quantitative data:
data is considered discrete if it can be counted and is continuous if it can be measured.
Someone can count students, tickets purchased and books, while one measures height, distance
and temperature.


TYPES OF DATA MEASUREMENT SCALES


Properties and scales of measurement
Scales of measurement describe how variables are defined and categorised. Psychologist Stanley
Stevens developed the four common scales of measurement: nominal, ordinal, interval and
ratio.
Each scale of measurement has properties that determine how to properly analyse the data. The
properties evaluated are identity, magnitude, equal intervals and a minimum value of zero.
Properties of Measurement
A. Identity: Identity refers to each value having a unique meaning.

B. Magnitude: Magnitude means that the values have an ordered relationship to one
another, so there is a specific order to the variables.

C. Equal intervals: Equal intervals mean that adjacent points along the scale are equally
spaced, so the difference between data points one and two will be the same as the
difference between data points five and six.

D. A minimum value of zero: A minimum value of zero means the scale has a true
zero point, below which no values exist. Temperature in degrees, for example, can fall
below zero and still have meaning, so it has no true zero. Weight does: a weight of zero
means there is nothing to weigh, and negative weights do not exist.


In statistics, there are four data measurement scales:
- Nominal,
- Ordinal,
- Interval and
- Ratio.
These are simply ways to sub-categorize different types of data. These four data measurement
scales (nominal, ordinal, interval, and ratio) are best understood with examples, as you’ll see
below.
1. Nominal scale of measurement

The nominal scale of measurement defines the identity property of data. This scale has certain
characteristics, but doesn’t have any form of numerical meaning. Nominal scales are used for
labelling variables, without any quantitative value. “Nominal” scales could simply be called
“labels.”
Here are some examples, below.

Notice that all of these scales are mutually exclusive (no overlap) and none of them have any
numerical significance. A good way to remember all of this is that “nominal” sounds a lot like
“name” and nominal scales are kind of like “names” or labels.
Nominal data can be broken down again into three categories:
- Nominal with order: Some nominal data can be sub-categorised in order, such as “cold,
warm, hot and very hot”.
- Nominal without order: Nominal data can also have no order at all, such as eye colour.
- Dichotomous: Dichotomous data is defined by having only two categories or levels,
such as “yes” and “no”, or male/female.

2. Ordinal scale

With ordinal scales, the order of the values is what’s important and significant, but the
differences between them are not really known. These values can’t be meaningfully added or
subtracted.
Take a look at the example below.


In each case, we know that a #4 is better than a #3 or #2, but we don’t know–and cannot
quantify–how much better it is. For example, is the difference between “OK” and “Unhappy”
the same as the difference between “Very Happy” and “Happy?” We can’t say.
Ordinal scales are typically measures of non-numeric concepts like satisfaction, happiness,
discomfort, etc.
“Ordinal” is easy to remember because it sounds like “order” and that’s the key to remember
with “ordinal scales”–it is the order that matters, but that’s all you really get from these.
Where someone finished in a race also describes ordinal data. While first place, second place
or third place shows what order the runners finished in, it doesn’t specify how far the first-
place finisher was in front of the second-place finisher.

3. Interval scale of measurement

Interval scales are numeric scales in which we know both the order and the exact differences
between the values.
The classic example of an interval scale is Celsius temperature because the difference between
each value is the same. For example, the difference between 60 and 50 degrees is a measurable
10 degrees, as is the difference between 80 and 70 degrees.
Interval scales are nice because the realm of statistical analysis on these data sets opens up.
For example, central tendency can be measured by mode, median, or mean; standard deviation
can also be calculated.
Like the others, you can remember the key points of an “interval scale” pretty easily. “Interval”
itself means “space in between,” which is the important thing to remember–interval scales not
only tell us about order, but also about the value between each item.
Here’s the problem with interval scales: they don’t have a “true zero.” For example, there
is no such thing as “no temperature,” at least not with Celsius. In the case of interval scales,
zero doesn’t mean the absence of value, but is actually another number used on the scale, like
0 degrees Celsius. Negative numbers also have meaning.
Without a true zero, it is impossible to compute ratios. With interval data, we can add and
subtract, but cannot multiply or divide.
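
A small Python sketch can make the point about ratios concrete. The comparison with the Kelvin scale, which does have a true zero, is an illustration added here and is not part of the original notes; the function name `celsius_to_kelvin` is likewise just a label chosen for this example.

```python
def celsius_to_kelvin(celsius: float) -> float:
    """Convert Celsius (interval scale) to Kelvin (ratio scale, true zero)."""
    return celsius + 273.15


t1_c, t2_c = 20.0, 40.0

# Differences are meaningful on an interval scale.
difference = t2_c - t1_c  # 20 degrees

# Dividing the Celsius readings gives 2.0, but this "ratio" is not meaningful:
# it depends on where the arbitrary zero of the scale happens to sit.
naive_ratio = t2_c / t1_c

# On a scale with a true zero the ratio is meaningful, and it is far from 2.
true_ratio = celsius_to_kelvin(t2_c) / celsius_to_kelvin(t1_c)

print(f"difference: {difference} degrees")
print(f"naive Celsius ratio: {naive_ratio}")
print(f"ratio on the Kelvin scale: {true_ratio:.3f}")
```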


The interval scale contains properties of nominal and ordered data, but the difference between
data points can be quantified. This type of data shows both the order of the variables and the
exact differences between the variables. They can be added to or subtracted from each other,
but not multiplied or divided. For example, 40 degrees is not twice as hot as 20 degrees.
This scale is also characterised by the fact that the number zero is an ordinary value on the
scale. In the ratio scale, zero means the absence of the quantity being measured. In the interval
scale, zero has meaning: for example, if you measure temperature in degrees, zero degrees is
itself a temperature.
Data points on the interval scale have the same difference between them. The difference on the
scale between 10 and 20 degrees is the same as between 20 and 30 degrees. This scale is used to
quantify the difference between variables, whereas the nominal and ordinal scales are used to
describe qualitative values only. Another example of an interval scale is the year a car was
made.

Example of Interval Scale

4. Ratio scale of measurement

Ratio scales of measurement include the properties of all the other scales of measurement.
Good examples of ratio variables include height, weight, distance and duration. The data is
defined by an identity, can be classified in order, contains equal intervals and can be broken
down into exact values. Data in the ratio scale can be added, subtracted, divided and
multiplied.
Ratio scales also differ from interval scales in that the scale has a ‘true zero’. The number zero
means the absence of the quantity being measured. An example of this is height or weight:
values cannot fall below zero, so someone cannot be negative centimetres tall or weigh
negative kilos.
Ratio scales provide a wealth of possibilities when it comes to statistical analysis. These
variables can be meaningfully added, subtracted, multiplied, divided (ratios). Central tendency
can be measured by mode, median, or mean; measures of dispersion, such as standard deviation
and coefficient of variation can also be calculated from ratio scales.
Exercise


Please answer all the questions in as much detail as possible, showing your reasoning (5
Points). Indicate the level of measurement (nominal, ordinal, interval, ratio) for each of the
following:
1) Movie ratings (G, PG, PG-13, etc)
2) Number of siblings a person has
3) Number of a person’s siblings in the following categories: 0-1, 2-3, 4-6, 7+
4) Gender (male, female)
5) Length of pencils (in inches)
6) Number of bus routes that pass in front of a person’s apartment
7) Year in school (freshman, sophomore, junior, senior)
8) Distance from a person’s house to the public library
9) Professions (doctor, lawyer, baker, butcher, etc)
10) Money won from the lottery
11) Temperature in Antarctica today
12) Dolphin’s IQ

TOPIC: STATISTICAL GRAPHS

STATISTICAL GRAPHS
A statistical graph or chart is defined as the pictorial representation of statistical data in
graphical form. The statistical graphs are used to represent a set of data to make it easier to
understand and interpret statistical information. The different types of graphs that are
commonly used in statistics are given below.
Types of Graphs in Statistics
The four basic graphs used in statistics include:
- Bar graphs
- Line graphs
- Histograms and
- Pie charts.
BAR GRAPH
Bar graphs are the pictorial representation of grouped data in vertical or horizontal rectangular
bars, where the length of bars is proportional to the measure of data.
They are also known as bar charts. The chart’s horizontal axis represents categorical data,
whereas the chart’s vertical axis defines discrete data.


The bars drawn are of uniform width. The categories are represented on one of the axes, and
the measure of the variable is depicted on the other axis. The heights or the lengths
of the bars denote the value of the variable, and these graphs are also used to compare certain
quantities. Frequency distribution tables can be easily represented using bar charts, which
simplify the calculations and understanding of data.
Types of Bar Charts
- Vertical bar chart
- Horizontal bar chart
Even though the graph can be plotted either horizontally or vertically, the most usual type of
bar graph is the vertical bar graph. The roles of the x-axis and y-axis are swapped
depending on whether the chart is vertical or horizontal. Apart from the vertical and
horizontal bar graphs, there are two further types of bar chart:
- Grouped Bar Graph
- Stacked Bar Graph

FOUR DIFFERENT TYPES OF BAR GRAPHS.


Vertical Bar Graphs
When the grouped data are represented vertically in a graph or chart with the help of bars,
where the bars denote the measure of data, such graphs are called vertical bar graphs. The data
is represented along the y-axis of the graph, and the height of the bars shows the values.
Horizontal Bar Graphs
When the grouped data are represented horizontally in a chart with the help of bars, such
graphs are called horizontal bar graphs, where the bars show the measure of data. The data is
depicted here along the x-axis of the graph, and the length of the bars denotes the values.
Grouped Bar Graph
The grouped bar graph is also called the clustered bar graph, which is used to represent the
discrete value for more than one object that shares the same category. In other words, a grouped
bar graph is a type of bar graph in which different sets of data items are compared. Here, a
single colour is used to represent the specific series across the set. The grouped bar graph can
be represented using both vertical and horizontal bar charts.
Stacked Bar Graph
The stacked bar graph is also called the composite bar chart, which divides the aggregate into
different parts. In this type of bar graph, each part can be represented using different colours,
which helps to easily identify the different categories. The stacked bar chart requires specific
labelling to show the different parts of the bar. In a stacked bar graph, each bar represents the
whole and each segment represents the different parts of the whole.


Properties of Bar Graph


Some of the important properties of a bar graph are as follows:
- All the bars should have a common base.
- Each column in the bar graph should have equal width.
- The height of the bar should correspond to the data value.
- The distance between each bar should be the same.

Advantages and Disadvantages of Bar Chart


Advantages:
- Bar graph summarises the large set of data in simple visual form.
- It displays each category of data in the frequency distribution.
- It clarifies the trend of data better than the table.
- It helps in estimating the key values at a glance.
Disadvantages:
- Sometimes, the bar graph fails to reveal the patterns, cause, effects, etc.
- It can be easily manipulated to yield fake information.
How to Draw a Bar Graph?
Step 1: First, decide the title of the bar graph.
Step 2: Draw the horizontal axis and vertical axis.
Step 3: Now, label the horizontal axis. (For example, Types of Pets)
Step 4: Write the names on the horizontal axis, such as Cat, Dog, Rabbit, Hamster.
Step 5: Now, label the vertical axis. (For example, Number of Pets)
Step 6: Finalise the scale range for the given data.
Step 7: Finally, draw the bars, one for each category of pet, with heights matching
their respective numbers.

Bar Graph Examples


Example 1:
In a firm of 400 employees, the percentage of monthly salary saved by each employee is given
in the following table. Represent it through a bar graph.


Savings (in percentage)    Number of Employees (Frequency)
20                         105
30                         199
40                         29
50                         73
Total                      400

Solution:
The given data can be represented as follows

This can also be represented using a horizontal bar graph as follows:
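
A rough text version of such a horizontal bar graph can be sketched in Python. The helper `text_bar_chart` and the choice of one `#` per five employees are assumptions of this sketch, not part of the original example. The sketch also adds up the frequencies, which is a useful check: as listed in the table they sum to 406 rather than the stated total of 400.

```python
# Savings (in percentage) and number of employees, as listed in the table.
savings_table = [(20, 105), (30, 199), (40, 29), (50, 73)]


def text_bar_chart(rows, units_per_symbol=5):
    """Return one text line per category; bar length is proportional to frequency."""
    lines = []
    for savings, employees in rows:
        bar = "#" * round(employees / units_per_symbol)
        lines.append(f"{savings:>3}% | {bar} {employees}")
    return lines


chart = text_bar_chart(savings_table)
print("Savings (in percentage) vs number of employees")
print("\n".join(chart))

# Check the frequencies against the stated total.
listed_total = sum(employees for _, employees in savings_table)
print("Sum of listed frequencies:", listed_total)
```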

Example 2:


A cosmetic company manufactures 4 different shades of lipstick. The sale for 6 months is
shown in the table. Represent it using bar charts.
Sales (in units)
Month       Shade 1   Shade 2   Shade 3   Shade 4
January     4500      1600      4400      3245
February    2870      5645      5675      6754
March       3985      8900      9768      7786
April       6855      8976      9008      8965
May         3200      5678      5643      7865
June        3456      4555      2233      6547
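
The same table can feed either a grouped or a stacked bar graph. As a hedged sketch, the Python below computes what a stacked bar graph would need: the total height of each month's bar and the level at which each shade's segment ends. The variable names are illustrative assumptions.

```python
from itertools import accumulate

# Sales in units for Shade 1 to Shade 4, per month, copied from the table.
sales = {
    "January": [4500, 1600, 4400, 3245],
    "February": [2870, 5645, 5675, 6754],
    "March": [3985, 8900, 9768, 7786],
    "April": [6855, 8976, 9008, 8965],
    "May": [3200, 5678, 5643, 7865],
    "June": [3456, 4555, 2233, 6547],
}

# Grouped bar graph: four bars per month, one per shade (the raw values).
# Stacked bar graph: one bar per month, split into four segments.
monthly_totals = {month: sum(values) for month, values in sales.items()}

# Running totals give the top edge of each segment within the stacked bar.
segment_tops = {month: list(accumulate(values)) for month, values in sales.items()}

for month in sales:
    print(f"{month:<9} total={monthly_totals[month]:>6}  segment tops={segment_tops[month]}")
```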

The graph given below depicts the following data

Example 3:
The variation of temperature in a region during a year is given as follows. Depict it through the
graph (bar).

Month        Temperature
January      -6°C
February     -3.5°C
March        -2.7°C
April        4°C
May          6°C
June         12°C
July         15°C
August       8°C
September    7.9°C
October      6.4°C
November     3.1°C
December     -2.5°C


Solution:
As the temperature in the given table has negative values, it is more convenient to represent
such data through a horizontal bar graph.

Another example of stacked bar graph is shown below

LINE GRAPH
A line graph is a graph that uses points and lines to represent change over time. In other
words, it is a chart that shows a line joining several points, or a line that shows the relation
between the points. The diagram depicts quantitative data between two changing variables with
a straight line or curve that joins a series of successive data points. Line graphs compare these
two variables on a vertical and a horizontal axis.


Any line graph has the following two axes: -

X – axis: This is also known as the base or horizontal axis. It is used principally to show the
values of the independent variable, such as dates or places.
Y – axis: This is also known as the vertical axis. It is used to show the values of the dependent
variable, such as the output of crops, minerals, etc.
Example: a pair of axes, with the Y axis for the dependent variable and the X axis for the
independent variable.

TYPES OF LINE GRAPHS


Line graphs vary widely in design to serve different functions. The main types are:
· Simple line graph
· Cumulative line graph
· Divergent line graph
· Group line graph
· Compound line graph

SIMPLE LINE GRAPH


It is a form of line graph designed to have one line that illustrates the values of one item,
plotting the dependent variable against the independent variable. For example, it can show the
values of one item at various dates or places.
CONSTRUCTION OF THE SIMPLE LINE GRAPH
Consider the hypothetical data below showing maize production for country X in thousands of
metric tons (1990 – 1995).


YEAR PRODUCTION
1990 100
1991 250
1992 300
1993 150
1994 500
1995 400

Procedure
(a) Identification of variables

Dependent variable = Production values
Independent variable = Date (Years)
Y – axis …… Production values
X – axis ……. Years

(b) Vertical and horizontal scales estimation

Vertical scale = highest value ÷ height of the vertical axis
= 500,000 tons ÷ 10 cm
Hence; VS is 1 cm to 50,000 tons.
The horizontal scale is chosen freely;
hence, 1 cm represents 1 year.
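
The scale estimation can be sketched in Python as follows. This assumes, as in the working above, that the highest value should fill a 10 cm vertical axis; the function name `vertical_scale` is hypothetical, and the values are kept in the table's own units.

```python
# Maize production for country X, in the units used in the table.
production = {1990: 100, 1991: 250, 1992: 300, 1993: 150, 1994: 500, 1995: 400}


def vertical_scale(values, axis_height_cm):
    """Units represented by 1 cm when the highest value fills the whole axis."""
    return max(values) / axis_height_cm


units_per_cm = vertical_scale(production.values(), axis_height_cm=10)

# Height at which each year's point is plotted, in centimetres.
heights_cm = {year: value / units_per_cm for year, value in production.items()}

print(f"Vertical scale: 1 cm represents {units_per_cm:g} table units")
for year, height in heights_cm.items():
    print(f"{year}: plot at {height:g} cm")
```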

Fig.


MAIZE PRODUCTION FOR COUNTRY X IN ('000) Metric tons

Scales:
VS: 1 cm to 50 thousand tons
HS: 1 cm to 1 year

Advantage of the simple line graph


- It is much easier to prepare, as it does not involve complicated mathematical
work and a single line makes up the graph
- The absolute values can be read directly from the graph
- It is comparatively easier to read and interpret the values
- It can be replaced directly by a simple bar graph

Disadvantage of the simple line graph


- It is a limited graphical method, as it is only suited to representing the values of one item.
- Sometimes it becomes difficult to choose the vertical scale if the gap between the
highest and lowest values is very wide

CUMULATIVE LINE GRAPH

It is a form of line graph designed to show the accumulated total values at various dates or
possibly places for a single item. Unlike the other line graph methods, it has no alternative
bar graph method.
Construction of the cumulative line graph

Consider the given hypothetical data below showing maize production for country X.


YEAR PRODUCTION
1990 50
1991 40
1992 90
1993 100
1994 90
1995 130

Procedures
(a) Identification of Variables
· Dependent variable = Production values
· Independent variable = Date (Years)
Y - axis ………. Production values
X – axis ………...Years

(b) Vertical and horizontal scales estimation

Hence: VS; 1cm represents 50 tons

(c) Determination of the cumulative values

YEAR PRODUCTION CUM VALUES


1990 50 50
1991 40 90
1992 90 180
1993 100 280
1994 90 370
1995 130 500
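
The cumulative column is simply a running total of the production column. A minimal Python sketch, using `itertools.accumulate` from the standard library, reproduces it:

```python
from itertools import accumulate

years = [1990, 1991, 1992, 1993, 1994, 1995]
production = [50, 40, 90, 100, 90, 130]

# Each cumulative value is the sum of all production values up to that year.
cumulative = list(accumulate(production))

print("YEAR  PRODUCTION  CUM VALUES")
for year, value, running_total in zip(years, production, cumulative):
    print(f"{year}  {value:>10}  {running_total:>10}")
```

The last cumulative value (500) is the total production over the whole period, which is why it also sets the top of the vertical scale.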


CUMULATIVE LINE GRAPH: MAIZE PRODUCTION FOR COUNTRY X

Scale: -
VS: 1cm represents 50 tons
HS: 1cm represents 1 year

Advantage of the cumulative line graph


- The graphical method shows cumulative values
- From the graph the values can be revealed and quantitatively analyzed

Disadvantage of the cumulative line graph


- The graphical method is not suited to showing cumulative values for more than one item;
it is limited to showing the values of a single item.
- It needs high skill to read off the actual values of the item represented
- It has no alternative bar graph method.

DIVERGENT LINE GRAPH

It is a form of line graph designed to illustrate how far the values rise above or fall below
the mean.
The graph is designed to have upper and lower sections showing positive and negative
deviations respectively. The two sections are separated by a horizontal baseline
marked zero on the vertical axis. This baseline represents the average of all
values.
Construction of the divergent line graph


Consider the following data, which show export values of coffee for country X in
millions of dollars.
YEAR    EXPORT VALUES (000,000 dollars)
1952    345
1953    256.5
1954    283
1955    300
1956    335
1957    330.5

(a) Identification of Variables

· Dependent variable ……Export values


· Independent variable ---- Date (Years)
Y - axis ………. Export values
X – axis ………...Years

(b) Computation of the arithmetic mean

345 + 256.5 + 283 + 300 + 335 + 330.5 = 1850

Then; mean = 1850 ÷ 6 ≈ 308


Computation of the deviation values

1952 345 – 308 = 37
1953 256.5 – 308 = -51.5
1954 283 – 308 = -25
1955 300 – 308 = -8
1956 335 – 308 = 27
1957 330.5 – 308 = 22.5
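
The mean and the deviations can be checked with a short Python sketch. It uses the 1955 value of 300 that appears in the mean and deviation calculations, and it rounds the mean to the nearest whole number as the worked example does; the variable names are illustrative.

```python
# Coffee export values for country X, in millions of dollars.
exports = {1952: 345, 1953: 256.5, 1954: 283, 1955: 300, 1956: 335, 1957: 330.5}

total = sum(exports.values())
mean = round(total / len(exports))  # 1850 / 6 = 308.33..., rounded to 308

# Deviation of each year's value from the mean: positive values are plotted
# above the zero line of the divergent graph, negative values below it.
deviations = {year: value - mean for year, value in exports.items()}

print("total:", total, "mean:", mean)
for year, deviation in deviations.items():
    print(f"{year}: {deviation:+g}")
```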
(c) Estimation of the vertical scale.

Thus: the vertical scale


1cm represents 15 or -15 million dollars

(d) The graph is then drawn as follows: -


Scales: -
- Vertical scale 1cm represents 15 million dollars
- Horizontal scale 1cm represents 1 year

Advantage of the divergent line graph


- The graphical method is useful for showing increases and decreases in the values.
- The graphical method shows the average of all values
- It can be replaced directly by a divergent bar graph

Disadvantage of the divergent line graph


- The graphical method is not suited to showing increases and decreases for more
than one item. It is thus limited to a single item.
- It needs high skill to read off the actual values of the item represented.
- It is a time-consuming graphical method, as its preparation involves a lot of
mathematical work.
- It requires high skill to construct the divergent line graph.

GROUP LINE GRAPH

It is a form of statistical line graph designed to have more than one line, each with a
different texture, to illustrate the values of more than one item. The group line graph is
alternatively known as the composite, comparative, or multiple line graph.


Construction of the group line graph

Consider the given data below showing values of export crops from Kenya (Ksh Million).

Crop/Year       1997     1998     1999     2000     2001
Tea             24,126   32,971   33,065   35,150   34,448
Coffee          16,856   12,817   12,029   11,707   7,460
Horticulture    13,752   14,938   17,641   21,216   19,846
Tobacco         1,725    1,607    1,554    2,167    2,887

(a) Identification of Variables

Dependent variable …… export values
Independent variable …. Date (years)
Y – axis ………. export values
X – axis ………...Years

Hence; VS 1cm represents Ksh 5,000 million


Thus; the group line graph appears as follows: -
KENYA: CROPS EXPORT VALUES


Scales: -
Vertical scale: 1cm to Ksh 5,000 million

Advantage of group line graph


i. It is much easier to prepare, as it involves no complicated mathematical work
ii. It is a useful graphical method for showing the values of more than one item
iii. The absolute values can be read from the graph, as they are shown directly
iv. It is comparatively easier to read and interpret the values.
v. It can be replaced directly by a group bar graph.

Disadvantage of the group line graph


i. Sometimes it becomes difficult to choose the vertical scale if the gap between
the highest and lowest values is very wide
ii. Crossing of the lines on the graph may confuse the interpreter.
iii. It may be difficult to choose sufficiently distinct line textures.

COMPOUND LINE GRAPH

It is a line graph designed to have more than one line, stacked on one another and
distinguished by different shading, to show the cumulative values of more than one item.
Construction of the compound line graph
Consider the given data below showing cocoa production for the Ghana provinces in '000
tons.
YEAR/PROV   TV Togoland   E. province   W. province   Ashanti
1947/48     40            40            30            35
1948/49     50            60            45            100
1949/50     45            46            89            110
1950/51     45            47            44            124
1951/52     47            23            50            100
1952/53     51            14            57            118


Procedure
(a) Identification of Variables
Dependent variable…… production values
Independent variable … Date (Years)
Y – axis… production values
X – axis… Years
(b) Cumulative values determination for the dates.

1947/48 40+40+30+35 = 145
1948/49 50+60+45+100 = 255
1949/50 45+46+89+110 = 290
1950/51 45+47+44+124 = 260
1951/52 47+23+50+100 = 220
1952/53 51+14+57+118 = 240
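
The yearly totals can be recomputed from the table with a short Python sketch. It also produces the running totals across provinces, on the assumption (made here, not stated in the notes) that each line of the compound graph is plotted at the running total of the provinces beneath it. The province order follows the table.

```python
from itertools import accumulate

# Cocoa production in '000 tons: TV Togoland, E. province, W. province, Ashanti.
cocoa = {
    "1947/48": [40, 40, 30, 35],
    "1948/49": [50, 60, 45, 100],
    "1949/50": [45, 46, 89, 110],
    "1950/51": [45, 47, 44, 124],
    "1951/52": [47, 23, 50, 100],
    "1952/53": [51, 14, 57, 118],
}

# Running totals across provinces: one plotted level per province per year.
running = {year: list(accumulate(values)) for year, values in cocoa.items()}

# The last running total is the cumulative value for the year.
totals = {year: levels[-1] for year, levels in running.items()}

for year in cocoa:
    print(f"{year}: levels={running[year]}  total={totals[year]}")
```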

(c) Vertical and horizontal scales determination

Hence; the vertical scale, 1cm represents 50 thousand tons

Thus the graph appears as follows: -


Advantage of compound line graph


I. It is a useful graphical method for showing the cumulative values of more than one
item
II. Depending on the skill of the interpreter, the absolute values can be read from
the graph.
III. It can be replaced directly by a compound bar graph
IV. It is comparatively easier to choose the vertical scale to be used.

Disadvantage of compound line graph


i. It needs high skill to interpret the graph
ii. It needs high skill to construct the graph

HISTOGRAM

1.1 Definition of Histogram


In statistics, a histogram is a type of bar chart that uses bars to show the frequency
distribution of continuous data. Each bar indicates the number of observations that lie within
a range of values, known as a class or bin.


The first step in constructing a histogram is to take the observations and split them into a
logical series of intervals called classes or bins. The x-axis shows the independent variable,
i.e., the classes, while the y-axis shows the dependent variable, i.e., the occurrences.
Rectangular blocks, i.e., bars, are drawn on the x-axis, with an area that depends on the
class. See the figure given below:

Notice that the horizontal axis of Figure 1 consists of binned times: the first bin includes visits
from 0 up to and including ten minutes, the second bin from 10 up to and including 20 minutes,
and so on.
Characteristics of a histogram
1. Histograms are used to show distributions of variables.
2. Histograms plot binned quantitative data.
3. Bars cannot be reordered in a histogram.
4. There are no spaces between the bars of a histogram, since there are no gaps between the
bins. An exception would occur if there were no values in a given bin, but in that case
the bar has a height of zero rather than being a space.
5. The widths of the bars in a histogram need not be the same.
1.2 Uses of a Histogram
1.2.1 Identifying the most common process outcome
A quick look at a histogram immediately reveals the most common outcome of a process with
varying outcomes, and any special trends quickly become apparent.
1.2.2 Identifying data symmetry
Sometimes, you will spot trends that lean in two directions simultaneously. A histogram can
make it very easy to identify those occurrences and know when your processes are prone to
producing symmetrical results in some circumstances. On the other hand, it can also help you
identify possible issues, as sometimes symmetry is not what you expect to see in your results.


1.2.3 Spotting deviations


Likewise, a histogram can make it quite obvious when your results are deviating from the
expected values. You can rely on a histogram to tell you when the results are not moving in the
right direction.
1.2.4 Verifying equal distribution
In some cases, symmetry is exactly what you’re looking for, especially in a process prone to
random deviations. If you’re looking to make sure that you have a maximized coverage of your
outputs, the histogram can be a very simple tool to go about that. If one of the data points is
below its standard norms, this will become apparent very quickly, and you can take appropriate
measures to correct the situation.
1.2.5 Spotting areas that require little effort
Last but definitely not least, a histogram can be helpful in determining when you’re wasting
too much effort or resources on a specific task.

1.3 PRESENTATION OF A HISTOGRAM


Steps followed in drawing a histogram for grouped data.
N.B. If the data given are ungrouped, it is advisable to group them first.
Step 1: Represent the data in the continuous (exclusive) form if it is in the discontinuous
(inclusive) form.
Step 2: Mark the class intervals along the X-axis on a uniform scale.
Step 3: Mark the frequencies along the Y-axis on a uniform scale.
Step 4: Construct rectangles with class intervals as bases and corresponding frequencies as
heights.

The method of drawing a histogram is explained in the following example.


Example 1:
Draw a histogram for the following table, which represents the marks obtained by 100 students
in an examination:

Marks                 0–10   10–20   20–30   30–40   40–50   50–60   60–70   70–80
Number of students    5      10      15      20      25      12      8       5


Solution:
The class intervals are all equal with length of 10 marks. Let us denote these class intervals
along the X-axis. Denote the number of students along the Y-axis, with appropriate scale.
The histogram is given below.
Scale:
X – axis = 1 cm = 10 marks
Y – axis = 1 cm = 5 students

In the above diagram, the bars are drawn continuously. The rectangles are of lengths (heights)
proportional to the respective frequencies. Since the class intervals are equal, the areas of the
bars are proportional to the respective frequencies.
Example 2:
In a study of diabetic patients in a village, the following observations were noted. Represent
the data by a frequency polygon using a histogram.

Ages                  10–20   20–30   30–40   40–50   50–60   60–70
Number of patients    3       5       13      20      10      5

Scale
X – axis = 1 cm = 10 years
Y – axis = 1 cm = 2 patients


When drawing histograms, it is possible that the intervals will not have the same width.
Consider the data given in the table below.

The way the data have been presented makes it impossible to draw a histogram with equal class
intervals.
In order to keep the histogram fair, the area of the bars, rather than the height, must be
proportional to the frequency. So, on the vertical scale we plot frequency density instead of
frequency, where;

Frequency Density = Frequency/Class Width
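
As an illustration of the frequency density rule, the following Python sketch divides each frequency by its class width. The class limits and frequencies are hypothetical values chosen for the example (they are not the table from these notes), and `frequency_density` is an illustrative name.

```python
def frequency_density(classes):
    """classes: list of (lower, upper, frequency); widths may be unequal.

    Returns (lower, upper, frequency, density) with density = frequency / width.
    """
    return [(lo, hi, f, f / (hi - lo)) for lo, hi, f in classes]

# Hypothetical classes with unequal widths, for illustration only.
classes = [(0, 10, 5), (10, 15, 10), (15, 20, 12), (20, 40, 8)]

for lo, hi, f, density in frequency_density(classes):
    print(f"{lo}-{hi}: frequency={f}, width={hi - lo}, density={density:.2f}")
```

Plotting the density on the vertical axis means that bar area (width × density) recovers the frequency, which is what keeps a histogram with unequal classes fair.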


Rewriting the table with an extra column for frequency density, gives

And you can draw the histogram with frequency density on the vertical axis.

Note
You can see that, it is the area that is proportional to the frequency – in fact, a frequency of 1
is represented by 10 little squares.

PIE CHART
A pie chart is a type of graph that represents data in a circular graph. The slices of the pie
show the relative sizes of the data, which makes it a pictorial representation of data. A pie
chart requires a list of categories and a numerical value for each. Here, the term “pie”
represents the whole, and the “slices” represent the parts of the whole.
Formula


The pie chart is an important type of data representation. It is divided into segments
(sectors), each of which forms a specific portion (percentage) of the total. The sum of all
the data corresponds to the full circle of 360°.
The total value of the pie is always 100%.
To work out the percentages for a pie chart, follow the steps given below:
- Categorize the data
- Calculate the total
- Divide each category's value by the total
- Convert into percentages
- Finally, calculate the degrees

Therefore, the pie chart formula is given as


(Given Data/Total value of Data) × 360°
Note: It is not mandatory to convert the given data into percentages unless it is specified.
We can directly calculate the degrees for the given data values and draw the pie chart
accordingly.

How to Create a Pie Chart?


Imagine a teacher surveys her class about the students' favourite sports:

Football    Hockey    Cricket    Basketball    Badminton
10          5         5          10            10

The data above can be represented by a pie chart by using the circle graph formula, i.e.,
the pie chart formula. It makes the size of each portion easy to understand.
Step 1: First, enter the data into the table.
Step 2: Add all the values in the table to get the total.
- Total students are 40 in this case.
Step 3: Next, divide each value by the total and multiply by 100 to get a per cent:

Football          Hockey           Cricket          Basketball        Badminton
(10/40) × 100     (5/40) × 100     (5/40) × 100     (10/40) × 100     (10/40) × 100
= 25%             = 12.5%          = 12.5%          = 25%             = 25%


Step 4: Next, to know how many degrees each “pie sector” needs, we take a full circle of
360° and follow the calculation below:
The central angle of each component =

(Value of each component / Sum of values of all the components) × 360°

Football          Hockey           Cricket          Basketball        Badminton
(10/40) × 360°    (5/40) × 360°    (5/40) × 360°    (10/40) × 360°    (10/40) × 360°
= 90°             = 45°            = 45°            = 90°             = 90°

Now you can draw a pie chart.


Step 5: Draw a circle and use the protractor to measure the degree of each sector.
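
Steps 2 to 4 can be sketched in Python as a check on the arithmetic. The function name `pie_angles` is illustrative; the counts are those of the favourite-sports survey.

```python
def pie_angles(counts):
    """Map each category to (percentage of total, central angle in degrees)."""
    total = sum(counts.values())
    return {
        name: (value / total * 100, value / total * 360)
        for name, value in counts.items()
    }

sports = {"Football": 10, "Hockey": 5, "Cricket": 5,
          "Basketball": 10, "Badminton": 10}

for name, (percent, angle) in pie_angles(sports).items():
    print(f"{name}: {percent:.1f}% -> {angle:.0f} degrees")
```

The angles always add up to 360°, which is a quick way to catch a slip before measuring the sectors with a protractor.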

A PIE CHART TO SHOW FAVOURITE SPORTS OF STUDENTS:

Question: The percentages of various crops cultivated in a village of a particular district
are given in the following table.

Items                  Wheat    Pulses    Jowar    Groundnuts    Vegetables    Total
Percentage of crops    125/3    125/6     25/2     50/3          25/3          100

Represent this information using a pie-chart.


Advantages of a pie chart

- The picture is simple and easy to understand.
- Data can be represented visually as a fractional part of a whole.
- It is an effective communication tool, even for an uninformed audience.
- It gives the audience a data comparison at a glance, for immediate analysis or quick
understanding of information.
- Readers do not need to examine or measure the underlying numbers themselves.
- You can manipulate pieces of data in the pie chart to emphasize a few points you want
to make.

Disadvantages of a pie chart

- It becomes less effective if there are too many pieces of data.
- With too many pieces of data, even adding data labels and numbers may not help,
because the labels themselves become crowded and hard to read.
- As the chart represents only one data set, you need a series of charts to compare
multiple sets.
- This may make it more difficult for readers to analyse and assimilate information
quickly.

TOPIC: FREQUENCY DISTRIBUTION


Grouped and ungrouped/raw data


Grouped data refers to data that have been organized into groups (classes).
Histograms and frequency tables can easily describe this type of data.
Ungrouped (raw) data are the individual observations as collected, before any grouping.

NATURE OF CLASS
The following are some basic technical terms used when a continuous frequency distribution is
formed, i.e., when data are classified according to class intervals.
CLASS LIMITS/CLASS BOUNDARY
Refers to the lowest and highest values that can be included in the class. For example, in the
class 10-20, the lowest value is 10 and the highest value is 20. The two boundaries of a class
are known as the lower limit (on the left side) and the upper limit (on the right side).
CLASS INTERVAL
Refers to the numerical width of any class in a particular distribution. It is defined as the
difference between the upper class limit and the lower class limit (the range of each grouping
of the data). For example, 0-2, 2-4, or 0-9, 10-19…
CLASS SIZE/CLASS WIDTH
The difference between the upper limit and the lower limit of a class. It is denoted by the
symbol ‘C’. When the classes are still to be formed, it is found from the limits of the whole
data set:
$$\text{Class width} = \frac{\text{Upper limit} - \text{Lower limit}}{\text{Number of classes}} \quad\text{or}\quad \frac{\text{Range}}{\text{Number of classes}}$$
CLASS MARK
The middle value of a class. It can be calculated as follows:
$$\text{Class mark} = \frac{\text{Upper limit} + \text{Lower limit}}{2}$$
RANGE
The difference between the largest and smallest values of the observations. It is denoted by
‘R’, i.e.,
R = Largest value − Smallest value
FREQUENCY
The number of observations falling within a particular class interval.
NUMBER OF CLASS INTERVALS
This refers to the number of classes into which the data are to be divided; it is given to
facilitate the creation of the class intervals.
Frequency distribution table


Refers to a statistical way of condensing and summarizing a large amount of data in a useful
format that helps in analysing the data and interpreting the information.
• It describes the characteristics of a population.
• It allows comparison of data sets, interval by interval.
• It facilitates graphic presentation of data.
How to create a frequency distribution table
Consider the data:
76, 84, 76, 103, 92, 47, 98, 54, 80, 91, 69, 86, 83, 75, 93, 89, 96, 65, 94, 85
Create a frequency table using 6 classes.
Step 1: Calculate the class width (range divided by the number of classes, rounded up).
Step 2: Create the table by starting with the lowest value as the first lower limit, then add
the class width to obtain each successive limit.
Step 3: Insert the frequency of each class.
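
The three steps can be sketched in Python as follows. This is a minimal illustration, assuming whole-number data and inclusive class limits; the class width is taken as the range divided by the number of classes, rounded up to the next whole number. The function name `frequency_table` is made up for the example.

```python
def frequency_table(data, num_classes):
    """Build inclusive classes of equal width for whole-number data."""
    low, high = min(data), max(data)
    # Step 1: class width = range / number of classes, rounded up to the next
    # whole number (assumption: +1 keeps the largest value inside the table).
    width = (high - low) // num_classes + 1
    table = []
    for k in range(num_classes):
        # Step 2: each lower limit is the previous lower limit plus the width.
        lower = low + k * width
        upper = lower + width - 1          # inclusive upper limit
        # Step 3: count the observations that fall in the class.
        freq = sum(lower <= x <= upper for x in data)
        table.append((lower, upper, freq))
    return width, table

scores = [76, 84, 76, 103, 92, 47, 98, 54, 80, 91,
          69, 86, 83, 75, 93, 89, 96, 65, 94, 85]

width, table = frequency_table(scores, 6)
print("class width:", width)
for lower, upper, freq in table:
    print(f"{lower}-{upper}: {freq}")
```

For this data set the range is 103 − 47 = 56, so the width works out to 10 and the first class is 47-56.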

Types of frequency distribution table


i. Relative frequency distribution
ii. Cumulative frequency distribution

CLASS BOUNDARY
Class boundaries are the numbers used to separate classes. The boundaries have one more
decimal place than the raw data and therefore do not appear in the data. The lower class
boundary is found by subtracting 0.5 unit from the lower class limit and the upper class
boundary is found by adding 0.5 units to upper class limit.
• It is used in drawing histogram.
Creating a class boundary
• Subtract the first upper class limit from the second lower class limit.
• Divide the difference by 2
• Subtract this value from all of the lower-class limits and add the value to all of the
upper-class limits.

Example

Grades     Class boundaries
20 – 29    19.5 – 29.5
30 – 39    29.5 – 39.5
40 – 49    39.5 – 49.5
50 – 59    49.5 – 59.5
60 – 69    59.5 – 69.5
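
The procedure for creating class boundaries can be sketched in Python. The function name `class_boundaries` is illustrative, and the sketch assumes the classes are listed in ascending order with the same gap between every pair of neighbouring classes.

```python
def class_boundaries(limits):
    """limits: list of (lower, upper) inclusive class limits, ascending."""
    # Gap between the first upper limit and the second lower limit.
    gap = limits[1][0] - limits[0][1]
    half = gap / 2
    # Subtract half the gap from every lower limit, add it to every upper limit.
    return [(lower - half, upper + half) for lower, upper in limits]

grades = [(20, 29), (30, 39), (40, 49), (50, 59), (60, 69)]

for (lower, upper), (lo_b, hi_b) in zip(grades, class_boundaries(grades)):
    print(f"{lower} - {upper}   ->   {lo_b} - {hi_b}")
```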

Cumulative frequency of a class


• The sum of the frequency of a particular class and the frequencies of all the classes before
it is called the cumulative frequency of that class.
Cumulative frequency Table
A table which shows the cumulative frequencies over various classes is called a cumulative
frequency distribution table.
Ex. The following are the ages (in years) of 360 patients getting medical treatment in a
hospital.

Age (in years)        10-20   20-30   30-40   40-50   50-60   60-70
Number of patients    90      50      60      80      50      30

So, the cumulative frequency table for the above data is given below:

Class interval    Frequency    Cumulative frequency
10-20             90           90
20-30             50           140
30-40             60           200
40-50             80           280
50-60             50           330
60-70             30           360
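
The cumulative column is simply a running total of the frequency column. A short Python check, using the standard library's `itertools.accumulate`:

```python
from itertools import accumulate

classes = ["10-20", "20-30", "30-40", "40-50", "50-60", "60-70"]
frequencies = [90, 50, 60, 80, 50, 30]

# Each cumulative frequency is the class frequency plus all earlier ones.
cumulative = list(accumulate(frequencies))

for label, f, cf in zip(classes, frequencies, cumulative):
    print(f"{label}: frequency={f}, cumulative={cf}")
```

The last cumulative frequency always equals the total number of observations, here 360.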


The monthly wages (in rupees) of 28 labourers working in a factory are given below:
220 268 258 242 210 267 272 242
311 290 300 320 319 304 302 292
254 278 318 306 210 240 280 316
306 215 256 328
Form a cumulative frequency table with class intervals of length 20.
Solution
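
One way to work this out is sketched below in Python. It rests on two assumptions that the question does not state: the first class starts at the smallest wage (210), and the classes are exclusive, so 210-230 includes 210 but not 230. The function name `cumulative_table` is illustrative.

```python
from itertools import accumulate

wages = [220, 268, 258, 242, 210, 267, 272, 242,
         311, 290, 300, 320, 319, 304, 302, 292,
         254, 278, 318, 306, 210, 240, 280, 316,
         306, 215, 256, 328]

def cumulative_table(data, start, width):
    """Rows of (lower, upper, frequency, cumulative frequency)."""
    rows = []
    lower = start
    while lower <= max(data):
        upper = lower + width
        # Exclusive classes: lower limit included, upper limit excluded.
        freq = sum(lower <= x < upper for x in data)
        rows.append((lower, upper, freq))
        lower = upper
    running = accumulate(freq for _, _, freq in rows)
    return [(lo, hi, f, cf) for (lo, hi, f), cf in zip(rows, running)]

table = cumulative_table(wages, start=min(wages), width=20)
for lo, hi, f, cf in table:
    print(f"{lo}-{hi}: frequency={f}, cumulative={cf}")
```

Under these assumptions there are six classes, from 210-230 to 310-330, and the final cumulative frequency is 28.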

Example
Create a cumulative frequency table with 10 classes
147, 167, 136, 178, 175, 116, 155, 121
115, 156, 176, 141, 189, 167, 177, 208
212, 143, 203, 210, 188, 178, 212, 118
197, 145, 134, 133, 196, 185.

Bills (in Rs)    Frequency    Cumulative frequency
115-124          4            4
125-134          2            6
135-144          3            9
145-154          2            11
155-164          2            13
165-174          2            15
175-184          5            20
185-194          3            23
195-204          3            26
205-214          4            30
TOTAL            30

Question
You are provided with the following data:

Age      Number of students
0-5      35
5-10     45
10-15    50
15-20    50

Calculate:
1. The lower limit of the first class interval
Answer: Lower limit = 0
2. The class limits of the third class
Answer: The lower class limit = 10
The upper class limit = 15
3. The class mark for the interval 5 – 10
Answer: Class mark = (Upper class limit + Lower class limit) / 2
= (10 + 5) / 2
Class mark = 7.5
4. The class size
Answer: Class size (C) = Upper limit – Lower limit
= 5 – 0
C = 5


MEASURES OF CENTRAL TENDENCY


• Mean- The mean is the sum of all the elements of a set divided by the number of elements.
It is a mathematical average of two or more numbers. The mean can be calculated in
two ways: the arithmetic mean is calculated by summing the numbers and dividing by
their count, while the geometric mean is more complicated: it multiplies the n numbers
together and takes the n-th root of the product. The importance of the mean lies in its
ability to represent the whole dataset with a single value.
• Median- refers to the middle value of the data arranged in order
• Mode- refers to the value that appears most often

Median of Ungrouped data


MEDIAN
The value of the middlemost observation, obtained after arranging the data in ascending order,
is called median of the data.
Example: consider the data 4, 4, 6, 3, 2.
Solution
Arrange the data in ascending order:
2, 3, 4, 4, 6
There are 5 observations, so the median is the middle (third) value.
Median is 4
Example
Consider the numbers 1,3,4,8,9,9,10
In this case, the middle value is 8.
Hence the median is 8.
If there is an even number of observations in a group, there is no single middle
value. In such cases, the median is calculated by finding the mean of the two middle
values of the ordered data.
For example, consider the numbers 3, 9, 7, 5, 7, 5.
Arranged in ascending order they are 3, 5, 5, 7, 7, 9, so the median is the mean of the
middle two numbers, i.e., (5 + 7)/2 = 6.
Example
Let’s consider the data:
50, 67, 34, 78, 43, 24


Solution
Arrange them in ascending order:
24, 34, 43, 50, 67, 78
Take the two middle numbers, add them, and divide the sum by 2:
(43 + 50) / 2 = 46.5
The median is 46.5
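
The rule for odd and even numbers of observations can be written as a small Python function. The name `median` is used for illustration only; Python's standard `statistics.median` follows the same rule.

```python
def median(values):
    """Middle value of the sorted data; mean of the two middle values if even."""
    ordered = sorted(values)          # always arrange in ascending order first
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([4, 4, 6, 3, 2]))
print(median([3, 9, 7, 5, 7, 5]))
print(median([50, 67, 34, 78, 43, 24]))
```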
MODE OF UNGROUPED DATA
The value which appears most often in the given data, i.e., the observation with the highest
frequency, is called the mode of the data.
For ungrouped data, we just need to identify the observation which occurs the maximum number
of times.
• Mode = Observation with maximum frequency
For example, in the data: 6, 8, 9, 3, 4, 6, 7, 6, 3
The value 6 appears the most times, thus mode = 6.
An easy way to remember mode is: Most Often Data Entered.
A data set can be unimodal (one mode), bimodal (two modes), or trimodal (three modes).
More examples
The following are the marks scored by 20 students in the class.
90, 70, 50, 30, 40, 86, 65, 73, 68, 90, 90, 10, 73, 25, 35, 88, 67, 80, 74, 46
Find the mode.
Solution:
Since the mark 90 occurs the maximum number of times (three times), the mode is 90.
Example
The sugar levels of 10 patients checked by a doctor are given below. Find the mode value of
the sugar levels. 80, 112, 110, 115, 124, 130, 100, 90, 150, 180
Solution:
Since each value occurs only once, there is no mode.
Example
Compute mode value for the following observations.
2, 7, 10, 12, 10, 19, 2, 11, 3, 12


Solution:
Here, the observations 2, 10 and 12 each occur twice in the data set, so the modes are 2, 10
and 12.
For a discrete frequency distribution, the mode is the value of the variable corresponding to
the maximum frequency.
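
The mode examples can be checked with a short Python sketch built on `collections.Counter`. The function name `modes` is illustrative. Following the convention used in these notes, it returns an empty list (no mode) when every value occurs only once.

```python
from collections import Counter

def modes(values):
    """All values that share the highest frequency, in ascending order."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []                      # every value occurs once: no mode
    return sorted(v for v, c in counts.items() if c == highest)

print(modes([6, 8, 9, 3, 4, 6, 7, 6, 3]))
print(modes([80, 112, 110, 115, 124, 130, 100, 90, 150, 180]))
print(modes([2, 7, 10, 12, 10, 19, 2, 11, 3, 12]))
```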
Example 5.24
Calculate the mode from the following data

Solution:
Here, 7 is the maximum frequency, hence the value of x corresponding to 7 is 8.
Therefore 8 is the mode
Sample examples
A. The monthly salaries of 10 employees in a factory are given below:
5000, 7000, 5000, 7000, 8000, 7000, 7000, 8000, 7000, 5000
Find the mean, median, and mode.
B. Find the mode of the given data: 3.1, 3.2, 3.3, 2.1, 1.3, 3.3, 3.1
C. For the data 11, 15, 17, x+1, 19, x-2, 3, if the mean is 14, find the value of x. Also find
the mode of the data.
ANSWERS
A. Mean 6600
Median 7000
Mode 7000
B. 3.1, 3.3
C. x = 17
Mode = 15

MEAN, MEDIAN AND MODE FOR GROUPED DATA


MEAN
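Mean for grouped data is calculated using the formula
$$\text{Mean} = \frac{\sum fx}{\sum f}$$
where x is the class mark (midpoint) of each class and f is its frequency.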


Example: Calculate the mean from the following data after drawing a distribution table using
5 classes.
118, 123, 124, 125, 127, 129, 130, 130, 133, 133, 136, 138, 141, 142, 149, 150, 154
Solution
Class width = Range / Number of classes
Class width = (154 − 118) / 5 = 7.2, rounded up to 8
Class width = 8

Class interval    f     c.f    x        fx
118 - 125         4     4      121.5    486
126 - 133         6     10     129.5    777
134 - 141         3     13     137.5    412.5
142 - 149         2     15     145.5    291
150 - 157         2     17     153.5    307
                  ∑f = 17               ∑fx = 2273.5

Mean = ∑fx / ∑f = 2273.5 / 17 = 133.7
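
The grouped mean can be verified with a few lines of Python. The function name `grouped_mean` is illustrative; each class is written as (lower limit, upper limit, frequency) and the class mark is taken as the midpoint of the two limits.

```python
def grouped_mean(classes):
    """classes: list of (lower, upper, frequency). Returns sum(f*x) / sum(f)."""
    total_f = sum(f for _, _, f in classes)
    # x is the class mark: the midpoint of the class limits.
    total_fx = sum(f * (lower + upper) / 2 for lower, upper, f in classes)
    return total_fx / total_f

classes = [(118, 125, 4), (126, 133, 6), (134, 141, 3),
           (142, 149, 2), (150, 157, 2)]

print(round(grouped_mean(classes), 1))
```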

MEDIAN
Median for grouped data is calculated using the following formula

$$\text{Median} = L + \left(\frac{\frac{N}{2} - C}{f}\right) i$$

Where;
L = Lower class limit of the median class
N = Total frequency
C = Cumulative frequency of the class preceding the median class (the one above it in the
table)
i = Class interval (width) of the median class
f = Frequency of the median class

NOTE: Certain sources use the letter m in place of C, and c in place of i, but their
definitions remain the same; in such a case the formula looks like this:

$$\text{Median} = L + \left(\frac{\frac{N}{2} - m}{f}\right) c$$
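
As a sketch of how the formula is applied, the Python function below locates the median class as the first class whose cumulative frequency reaches N/2 and then substitutes into the formula. The function name `grouped_median` is illustrative, and the data reused here are the ages of the 360 hospital patients from the cumulative frequency example, which is a choice made for this illustration.

```python
def grouped_median(classes):
    """classes: list of (lower, upper, frequency), continuous and ascending."""
    n = sum(f for _, _, f in classes)             # N, the total frequency
    cumulative = 0                                # C, cumulative frequency so far
    for lower, upper, f in classes:
        if cumulative + f >= n / 2:               # this is the median class
            width = upper - lower                 # i, the class interval
            return lower + (n / 2 - cumulative) / f * width
        cumulative += f

patients = [(10, 20, 90), (20, 30, 50), (30, 40, 60),
            (40, 50, 80), (50, 60, 50), (60, 70, 30)]

print(round(grouped_median(patients), 2))
```

Here N/2 = 180, the median class is 30-40 (cumulative frequency 200), C = 140, f = 60 and i = 10, so the median is 30 + (40/60) × 10, about 36.67 years.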

Solved examples for Median

The following data were obtained from a garden's records over a certain period. Calculate the
median weight of the apples.

Solution:


Example 2
The following table shows age distribution of persons in a particular region:
Find the median age.

Solution:
We are given upper limits and “less than” cumulative frequencies. First find the class
intervals and the frequencies. Since the values increase by 10, the width of each class
interval is 10.


Example 3
The following are the marks obtained by 140 students in a college. Find the median mark.

Solution:


Merits of Median
· It is easy to compute. It can be found by mere inspection and by the graphical
method.
· It is not affected by extreme values.
· It can be easily located even if the class intervals in the series are unequal.

Limitations of Median
· It is not amenable to further algebraic treatment.
· It is a positional average and is based on the middle item.
· It does not take into account the actual values of the items in the series.

MODE
Mode for grouped data is calculated using the following formula
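
A commonly taught form of the grouped-data mode formula, stated here as an assumption about the notation, is Mode = L + ((f1 − f0) / (2f1 − f0 − f2)) × i, where L is the lower limit of the modal class, f1 its frequency, f0 and f2 the frequencies of the classes just before and after it, and i the class width. The Python sketch below applies it; the function name `grouped_mode` is illustrative, and the data are borrowed from the third frequency distribution in the class exercises later in these notes.

```python
def grouped_mode(classes):
    """classes: list of (lower, upper, frequency), equal widths, ascending.

    Assumes a single class has the highest frequency.
    """
    freqs = [f for _, _, f in classes]
    k = freqs.index(max(freqs))                       # modal class position
    lower, upper, f1 = classes[k]
    f0 = freqs[k - 1] if k > 0 else 0                 # class before (0 if none)
    f2 = freqs[k + 1] if k < len(freqs) - 1 else 0    # class after (0 if none)
    return lower + (f1 - f0) / (2 * f1 - f0 - f2) * (upper - lower)

classes = [(5, 15, 6), (15, 25, 10), (25, 35, 16), (35, 45, 15),
           (45, 55, 24), (55, 65, 8), (65, 75, 7)]

print(round(grouped_mode(classes), 1))
```

For this data the modal class is 45-55, so the mode is 45 + (9/25) × 10 = 48.6.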


Example
The following data relates to the daily income of families in an urban area. Find the modal
income of the families.

Solution:


Determination of Modal class:


For a frequency distribution, the modal class corresponds to the class with the maximum
frequency. In any one of the following cases, however, it cannot easily be identified by
inspection, and the modal class is determined by the method of grouping:

i. If the maximum frequency is repeated.
ii. If the maximum frequency occurs at the beginning or at the end of the distribution.
iii. If there are irregularities in the distribution.

Steps for preparing Analysis table:


We prepare a grouping table with 6 columns.
i. In column I, we write down the given frequencies.
ii. Column II is obtained by combining the frequencies two by two.
iii. Leave the 1st frequency and combine the remaining frequencies two by two and
write in column III.
iv. Column IV is obtained by combining the frequencies three by three.
v. Leave the 1st frequency and combine the remaining frequencies three by three and
write in column V.
vi. Leave the 1st and 2nd frequencies and combine the remaining frequencies three by
three and write in column VI.
Mark the highest frequency in each column. Then form an analysis table to find the modal
class. After finding the modal class, use the formula to calculate the modal value.
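
The grouping and analysis tables can be sketched in Python as below. This is one reading of the method, offered as an illustration: for each of the six columns the largest group total is marked, every class inside that group gets one tally, and the class with the most tallies is taken as the modal class. The function name `grouping_tally` and the frequencies are hypothetical.

```python
def grouping_tally(freqs):
    """Count how often each class takes part in a column maximum."""
    n = len(freqs)
    tally = [0] * n
    # (group size, number of leading frequencies left out) for columns I to VI
    columns = [(1, 0), (2, 0), (2, 1), (3, 0), (3, 1), (3, 2)]
    for size, skip in columns:
        # Totals of complete groups only; leftover frequencies are ignored.
        groups = [(sum(freqs[s:s + size]), s)
                  for s in range(skip, n - size + 1, size)]
        if not groups:
            continue
        best = max(total for total, _ in groups)
        for total, start in groups:
            if total == best:
                for j in range(start, start + size):
                    tally[j] += 1
    return tally

# Hypothetical frequencies for seven consecutive classes.
freqs = [5, 7, 12, 14, 13, 8, 3]
tally = grouping_tally(freqs)
modal_index = tally.index(max(tally))
print(tally, "-> modal class is class number", modal_index + 1)
```

The tally list plays the role of the analysis table: its largest entry points to the modal class, which is then substituted into the mode formula.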

Example 5.26
Calculate mode for the following frequency distribution:

Solution:

Analysis Table:

The maximum occurred corresponding to 20-25, and hence it is the modal class.


Merits of Mode:
· It is comparatively easy to understand.
· It can be found graphically.
· It is easy to locate in some cases by inspection.
· It is not affected by extreme values.
· It is the simplest descriptive measure of average.

Demerits of Mode:
· It is not suitable for further mathematical treatment.
· It is an unstable measure as it is affected more by sampling fluctuations.
· Mode for the series with unequal class intervals cannot be calculated.
· In a bimodal distribution, there are two modal classes and it is difficult to determine
the values of the mode

SOME CLASS EXERCISES


1. The mineral industry hardness scale for mineral particles includes the following
samples:
Diamond, 10; Tanzanite, 5; Uranium, 1; Gold, 7; Quartz, 5
What is the range of the fuel minerals listed? (Ans. 9)
2. Sukuma wiki received statistics quiz grades of 67%, 81%, 93%, 96% and 100% within the
first 5 weeks of the semester.
What is the range of Sukuma wiki’s statistics quiz grades? (Ans. 33%)


3. The following sample data set lists the number of minutes 50 internet subscribers spent on
the internet during their most recent session. Construct a frequency distribution that has
seven classes.
50 45 41 17 11 7 22 4 28 21 19 27 37 51 54 42 86 41 78 58 22 56 11 7 69 30 80
56 29 33 46 31 39 26 18 29 34 59 73 77 36 39 30 62 54 67 39 31 53 44
4. Given below are data on the number of hours 20 students spend on their work each week.
Determine the class width if there are to be 5 classes, and construct a frequency
distribution table.
3, 5, 5, 5.5, 7, 8, 12, 12, 14, 14, 20.5, 21.5, 21.5, 21.5, 22, 25, 29, 33, 39, 39
5. India has an estimated population of 1,420,062,022 people, Mexico’s population is
132,328,035, China has a population of 1,368,737,531, and the United Kingdom has a
population of 329,093,110.
What is the range of the data set? (Ans. 1,287,733,987)
6. Find the mean and the median class of each of the following:

X    8-10    11-13    14-16    17-19    20-22    23-25
f    2       4        6        4        3        1

X    0-15    15-30    30-45    45-60    60-75
f    5       20       40       50       25

X    5-15    15-25    25-35    35-45    45-55    55-65    65-75
f    6       10       16       15       24       8        7

MEASURES OF RELATIVE POSITION


QUARTILE AND DECILE


INTERQUARTILE RANGE
The interquartile range (I.Q.R) is a measure of variability, based on dividing a data set into
quartiles.
Quartiles divide a rank-ordered data set into four equal parts. The values that separate the
parts are called the first, second, and third quartiles; they are denoted by Q1, Q2, and Q3,
respectively.
Q1 is the “middle” value in the first half of the rank-ordered data set.
Q2 is the median value in the set.
Q3 is the “middle” value in the second half of the rank-ordered data set.
The interquartile range is equal to Q3 - Q1.
For example, consider the following numbers: 1, 3, 4, 5, 5, 6, 7, 11.
- Q1 is the middle value in the first half of the data set (1, 3, 4, 5). Since there is an
even number of data points in the first half, the middle value is the average of the two
middle values; that is,
Q1 = (3 + 4)/2 or Q1 = 3.5.
- Q3 is the middle value in the second half of the data set (5, 6, 7, 11). Again, since the
second half has an even number of observations, the middle value is the average of
the two middle values; that is,
Q3 = (6 + 7)/2 or Q3 = 6.5.
The interquartile range is Q3 - Q1, so
I.Q.R = 6.5 – 3.5 = 3.

SEMI-INTERQUARTILE RANGE
The semi-interquartile range (SIR) (also called the quartile deviation) is a measure of spread.
It tells you something about how data is dispersed around a central point (usually the mean).
The SIR is half of the interquartile range.
How to Calculate the Semi Interquartile Range / Quartile Deviation
As the S.I.R is half of the Interquartile Range, all you need to do is find the IQR and then divide
your answer by 2.
Another way is to use the quartile deviation formula:
QD = (Q3 – Q1) / 2


Note: You might see the formula QD = ½ (Q3 – Q1). Algebraically they are the same.

Example 01
Question: Find the Quartile Deviation for the following set of data:
{490, 540, 590, 600, 620, 650, 680, 770, 830, 840, 890, 900}

Step 1: Find the first quartile, Q1.


This is the median of the lower half of the set {490, 540, 590, 600, 620, 650}.
Q1 = (590 + 600) / 2 = 595.
Step 2: Find the third quartile, Q3.
This is the median of the upper half of the set {680, 770, 830, 840, 890, 900}.
Q3 = (830 + 840) / 2 = 835.
Step 3: Subtract Step 1 from Step 2.
835 – 595 = 240.
Step 4: Divide by 2.
240 / 2 = 120
Therefore, the quartile deviation for this set of data is 120.
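
The quartile, interquartile range and quartile deviation calculations can be checked with the Python sketch below. It uses the "median of each half" rule from the worked examples; for an odd number of observations it leaves the middle value out of both halves, which is an assumption, since the examples here only cover even counts. The function names are illustrative.

```python
def median_of_sorted(ordered):
    """Median of a list that is already in ascending order."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves."""
    ordered = sorted(values)
    n = len(ordered)
    lower_half = ordered[: n // 2]
    upper_half = ordered[(n + 1) // 2:]   # skips the middle value when n is odd
    return median_of_sorted(lower_half), median_of_sorted(upper_half)

def quartile_deviation(values):
    q1, q3 = quartiles(values)
    return (q3 - q1) / 2                  # half of the interquartile range

small = [1, 3, 4, 5, 5, 6, 7, 11]
q1, q3 = quartiles(small)
print("Q1 =", q1, "Q3 =", q3, "IQR =", q3 - q1)

data = [490, 540, 590, 600, 620, 650, 680, 770, 830, 840, 890, 900]
print("QD =", quartile_deviation(data))
```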

The fireworks TO BE CONTINUED…!

- Percentile

1. MEASURES OF VARIABILITY

- Range and QD
- Average deviation/mean deviation
- Standard deviations
