


Mathematics as a Tool
(Data Management)

Pajo, Melher and Villanueva, Ellen
Table of Contents
Gathering and Organizing Data
Four Levels of Measurement
Textual Form
Tabular Method
Graphical Presentation
    Frequency Polygon
    Box Plot
    Pareto Chart
    Bar Graph
    Time Series Graph
    Pie Chart
Measures of Central Tendency







At the end of the lesson, students are expected to:
1. Identify the type of data and classify it according to its level of measurement;
2. Represent data using different types of graphs and charts and interpret
the data as a whole; and
3. Describe a set of scores using the measures of central tendency.

Mathematics as a Tool
Mathematics is an important tool for global knowledge and communication.
Students can use it to make sense of the world and to solve complicated, real-world
problems. Viewing mathematics from a wider perspective gives students a fresh
outlook on traditional subjects, making math more relevant and meaningful to them.
For students to function in a global environment, math material must help them
achieve global competency, which includes understanding diverse viewpoints and
world conditions, recognizing that issues are interrelated globally, and communicating
and responding appropriately. In math, this means rethinking familiar topics and
showing learners that the world is made up of situations, events, and phenomena
that can be sorted out using the appropriate mathematical tools.
This study resource shows how mathematics can serve as a powerful tool. It will
extend your knowledge of mathematics as it pertains to data management. You will
use techniques for presenting, organizing, measuring, and analyzing sets of data.
Data management encompasses all disciplines concerned with managing data as a
valuable resource. It is the process of acquiring and processing information in order
to ensure that the data are accessible and dependable for their users. We will
investigate and use data management tools as we lay the groundwork for a framework
that allows us to use data meaningfully and intelligently to make decisions.
Statistics is a vital tool in processing and managing such data. It is used in almost
every field of human endeavor, including education, research, business, and
agriculture, as well as in everyday life.
What is Data?
Data are a collection of facts, such as numbers, words, measurements, observations, or
even simple descriptions of objects; data may or may not be numerical. A data set is a
collection of numbers or values relating to a particular subject. Examples of data sets
are the test scores of each student in a specific class or the quantity of fish eaten by
each dolphin in an aquarium.

Data can be collected about almost anything. The characteristic about which you are
gathering data is called a 'variable' (since the observed value can vary). For
instance, the variable under consideration may be height, which is numerical, or
'quantitative.' Alternatively, the variable may be hair color, which is 'qualitative'
rather than numerical. Data can therefore be classified into two types: qualitative and
quantitative.
Qualitative Data - data that can only be described in words, not numbers; it deals with
categories or attributes.
Quantitative Data - data that can be expressed in numbers; it can be discrete or
continuous. Discrete data are obtained by counting, such as the number of books on
a table or the number of toys in a box, while continuous data are obtained by
measuring, such as height and weight.

Examples of quantitative data:
• the age of a person
• the weight of a person
• the population of a city
• the time it takes to travel to work

Examples of qualitative data:
• whether someone is left or right handed
• a person’s favorite color
• the type of car someone wants to buy
• first language spoken

Why is Level of Measurement important?

First, knowing the level of measurement of a variable helps you decide how to
interpret its data. When you know a measure is nominal, you know that the numeric
values are merely short codes for longer names. Second, knowing the level of
measurement tells you which statistical analyses are appropriate for the values that
were assigned. If a measure is nominal, you would never average the data values or
run a t-test on them.
Four levels of measurement are typically defined:
• Nominal
• Ordinal
• Interval
• Ratio

• The NOMINAL LEVEL OF MEASUREMENT is the lowest of the four ways to
characterize data. Nominal data deal with names, categories, or labels. At the
nominal level, data are qualitative. Eye color, yes or no replies to a survey, and
favorite breakfast cereal are examples of the nominal level of measurement.
• The ORDINAL LEVEL OF MEASUREMENT is used to rank qualitative data.
Examples of ordinal data include the placings of pageant winners and the
academic ranks of teachers.
• The INTERVAL LEVEL OF MEASUREMENT deals with data that can be ordered
and for which differences between values are meaningful. At this level, data do
not have a true starting point (a true zero). Temperatures in Fahrenheit and
Celsius are both examples of data at the interval level of measurement.
• The RATIO LEVEL OF MEASUREMENT is the fourth and highest level. Data at the
ratio level have all of the characteristics of data at the interval level, plus a true
zero value. Weight, the time it takes to solve a problem, and the number of
absences in a class are all examples.
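As a rough illustration of why the level of measurement matters, the short Python sketch below records which measures of central tendency are usually considered meaningful at each level. The names LEVELS, MINIMUM_LEVEL, and allowed_measures are made up for this illustration, and the lookup table is an assumption based on the common textbook convention (mode from the nominal level up, median from the ordinal level up, mean from the interval level up).

```python
# Levels of measurement in increasing order of information.
LEVELS = ["nominal", "ordinal", "interval", "ratio"]

# Assumed lookup table: the lowest level at which each measure becomes meaningful.
MINIMUM_LEVEL = {"mode": "nominal", "median": "ordinal", "mean": "interval"}


def allowed_measures(level):
    """Return the measures of central tendency that make sense at `level`."""
    rank = LEVELS.index(level)
    return [m for m, lowest in MINIMUM_LEVEL.items() if LEVELS.index(lowest) <= rank]


print(allowed_measures("nominal"))   # ['mode']
print(allowed_measures("ordinal"))   # ['mode', 'median']
print(allowed_measures("ratio"))     # ['mode', 'median', 'mean']
```

This matches the remark above that nominal values should never be averaged.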
Data is at the heart of statistics. It takes skill to present facts effectively and
efficiently. While writing, you may have come across complex findings that require
lengthy explanations. This is where data presentation becomes important. You must
present your results in such a way that readers can scan them quickly and understand
every point that you wish to highlight. As research grew more sophisticated, people
came to recognize the value of data presentation in making sense of findings. Data
presentation is the act of graphically representing the relationship between two or
more data sets using various graphical forms so that an informed decision can be
made based on them.

Three methods of presenting information from a data set:
• Textual form
• Tabular method
• Graphical presentation

Textual Form
The textual form is the most basic way of presenting data among the several options.
You simply write your results in a logical order, and your task is done. The
disadvantage of this approach is that the reader must go through the entire text to get
a clear picture. An introduction, summary, and conclusion can help condense the
information. Textual presentation summarizes the data by stating some of the most
significant characteristics of the data set, such as the highest, lowest, and average
values. If there are only a few observations, say fewer than ten, the values may be
listed if necessary. For example, data on the incidence rates of delirium after
anesthesia in 2016–2017 might be given using a few numbers: "The incidence rate of
delirium following anesthesia was 11% in 2016 and 15% in 2017; no significant change
in incidence rates was detected between the two years." If this information were
displayed in a graph or table, it would take up an unnecessary amount of space on the
page without improving the readers' understanding of the material. If more data are
to be shown, or other information, such as data trends, is to be conveyed, a table or a
graph is preferable.
Tabular Method
The tabular method uses tables to show data and avoids the complexity of textual
presentation. Data are arranged in rows and columns, much like a cricket scorecard
that shows who scored how many runs. Each row and column stands for one attribute
(such as name, year, sex, or age), and each value is written in the cell where its
attributes meet. This is useful for large data sets. A frequency is the number of times
a data value occurs. A frequency distribution is the organization of raw data in table
form, using classes and frequencies.
A frequency distribution table has at least three columns: one listing the categories
on the scale of measurement (X), one for the tally of values, and one for the
frequency (f). In the X column, values are listed in order (from highest to lowest or
from lowest to highest) without skipping any. For the frequency column, tallies are
counted for each value (how often each X value occurs in the data set); these counts
are the frequencies for each X value. The sum of the frequencies should equal N, the
total number of observations. A fourth column can be used for the proportion (p) of
each category (relative frequency): p = f/N. A relative frequency is the ratio (fraction
or proportion) of the number of occurrences of a data value to the total number of
observations. To obtain the relative frequencies, divide each frequency by N. Relative
frequencies can be expressed as fractions, percentages, or decimals. Cumulative
relative frequency is the accumulation of the previous relative frequencies. To get
the cumulative relative frequency for a row, add all of the previous relative
frequencies to the relative frequency for the current row.

Jane is fond of playing games with dice. She throws a die and notes the outcome
each time. These are her observations: 4, 6, 1, 2, 2, 5, 6, 6, 5, 4, 2, 3. To know
exactly how many times she got each number (1, 2, 3, 4, 5, 6), she classifies the
outcomes into categories. An easy way to do this is to draw a frequency distribution
table with tally marks.
Outcome   Tally   Frequency   Percentage
1         I       1           8
2         III     3           25
3         I       1           8
4         II      2           17
5         II      2           17
6         III     3           25
                  n=12        100%

Table 1. Frequency Distribution of the Outcomes of Jane’s Observation

The table above is a frequency distribution table. All of the information gathered
has been arranged into four columns. A frequency distribution table, then, is a table
that summarizes the values and their frequencies. In other words, it is a tool for
organizing data that makes the collected data easy to understand.
In statistics, the frequency distribution table helps us condense data into a simpler
form that lets us observe its properties at a glance.
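The tallying in Jane's example can also be sketched in a few lines of Python. This is only an illustration: the function name frequency_table is made up here, and the sketch adds the relative and cumulative relative frequencies described earlier alongside the counts.

```python
from collections import Counter

# Jane's twelve recorded outcomes, copied from the example above.
rolls = [4, 6, 1, 2, 2, 5, 6, 6, 5, 4, 2, 3]


def frequency_table(values):
    """Return rows of (value, frequency, relative frequency, cumulative relative frequency)."""
    counts = Counter(values)
    n = len(values)
    rows = []
    cumulative = 0
    for value in sorted(counts):
        f = counts[value]
        cumulative += f
        # p = f / N; the cumulative column divides the running count by N.
        rows.append((value, f, f / n, cumulative / n))
    return rows


table = frequency_table(rolls)
for value, f, p, cum_p in table:
    print(f"{value}  {'I' * f:<4} {f}  {p:6.1%}  {cum_p:6.1%}")
```

The frequencies should add up to n = 12, and the last cumulative relative frequency should be 100%.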
When the data set has a wide range, grouped frequency distributions are used.
The lower and upper class limits are the smallest and largest data values that can
belong to a class. Class boundaries separate the classes. To find a class boundary,
take the average of the upper class limit of one class and the lower class limit of the
next class. The class width is found by subtracting successive lower class limits (or
successive upper class limits, or boundaries). The class midpoint Xm is found by
averaging the lower and upper class limits (or boundaries) of a class.
Rules for Classes in Grouped Frequency Distributions
There should be 5-20 classes. The class width should preferably be an odd number.
The classes must be mutually exclusive, continuous, exhaustive, and equal in width
(except in open-ended distributions).

Let's look at an example. Ms. Jennifer is a teacher. She wants to look at the marks
obtained by the students of her class in the last exam. She does not have the time to go
through each test paper individually to see the marks. Thus, she asks Mr. Thomas to
organize the data in a table so that it is easier for her to look at everyone's marks
together. Ms. Jennifer suggests using a frequency distribution table to organize the
data, so as to get a better picture of the data rather than using a simple list.
Marks obtained   Number of students (frequency)
9                1
11               4
13               1
18               1
20               1
21               2
22               1
23               3
25               1
26               3
29               1
30               1

The frequency distribution table drawn above is called an ungrouped frequency
distribution table. It represents ungrouped data and is typically used when the data
set is small. When the data set is large or covers a wide range of values, listing every
value separately becomes unwieldy. In such cases, we form class intervals and tally
the frequency of the data that belong to each class interval.
Table 2. Grouped Frequency Distribution of Recorded Marks Obtained in a Test
Class limits   Class boundaries   Frequency   Cumulative frequency
0-5            0.5-5.5            3           60
5-10           4.5-10.5           11          57
10-15          9.5-15.5           12          46
15-20          14.5-20.5          19          34
20-25          19.5-25.5          7           15
25-30          24.5-30.5          8           7
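To make the ideas of class limits, boundaries, midpoints, and width concrete, here is a minimal Python sketch that groups the 20 marks from Ms. Jennifer's ungrouped table. The class limits (6-10, 11-15, and so on) are hypothetical choices made for this illustration and do not come from Table 2; the function name grouped_table is likewise made up.

```python
# The 20 marks from the ungrouped table above, expanded from (mark: frequency) pairs.
ungrouped = {9: 1, 11: 4, 13: 1, 18: 1, 20: 1, 21: 2, 22: 1, 23: 3, 25: 1, 26: 3, 29: 1, 30: 1}
marks = [mark for mark, f in ungrouped.items() for _ in range(f)]

# Hypothetical class limits (not from the module): five classes of width 5.
class_limits = [(6, 10), (11, 15), (16, 20), (21, 25), (26, 30)]


def grouped_table(values, limits):
    """Return one dict per class with limits, boundaries, midpoint and frequency."""
    rows = []
    for lower, upper in limits:
        rows.append({
            "limits": (lower, upper),
            # Boundaries sit halfway between adjacent class limits (the gap is 1 here).
            "boundaries": (lower - 0.5, upper + 0.5),
            "midpoint": (lower + upper) / 2,
            "frequency": sum(lower <= v <= upper for v in values),
        })
    return rows


rows = grouped_table(marks, class_limits)
class_width = class_limits[1][0] - class_limits[0][0]  # successive lower limits
for row in rows:
    print(row)
```

Because the classes do not overlap and cover every mark, each mark is counted exactly once and the frequencies add up to 20.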


Graphical Presentation
Graphical presentation is a highly developed set of techniques for describing,
understanding, and analyzing numerical data using points, lines, areas, and other
geometric forms and symbols. The goal of presenting data graphically is to use the
power of visual presentation to communicate information effectively while avoiding
distortion or confusion. This matters both in how we convey our results to others
and in how we understand and analyze the data ourselves.
A histogram is used to summarize discrete or continuous data. It gives a visual
interpretation of numerical data by showing the number of data points that fall
within each specified range of values (called a “bin”). It is similar to a vertical bar
graph, but unlike a bar graph, a histogram has no gaps between the bars. Creating a
histogram lets you see visually how the data are distributed. Histograms can display
a large quantity of data together with the frequency of the data values. A histogram
can be used to estimate the median and to see the shape of the distribution. It can
also reveal outliers or gaps in the data.
Example of a Histogram
Jeff is the branch manager at a local bank. Recently, he has been receiving customer
feedback saying that clients wait too long to be served by a customer service
representative. Jeff decides to observe and write down the time each customer
spends waiting. Here are his findings for 20 customers:
The corresponding histogram with 5-second
bins (5-second intervals) would look as follows:
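The binning step behind a histogram like this can be sketched in Python. Jeff's recorded values are not reproduced here, so the wait times below are made-up numbers used purely for illustration; the function name bin_counts is also an assumption of this sketch.

```python
from collections import Counter

# Made-up wait times in seconds for 20 customers (illustrative only, not Jeff's data).
wait_times = [43, 35, 51, 47, 38, 44, 56, 49, 41, 45,
              52, 39, 46, 48, 42, 58, 36, 44, 50, 47]
BIN_WIDTH = 5  # 5-second bins, as in the example


def bin_counts(values, width):
    """Count how many values fall in each bin [start, start + width)."""
    counts = Counter((v // width) * width for v in values)
    return dict(sorted(counts.items()))


histogram = bin_counts(wait_times, BIN_WIDTH)
for start, count in histogram.items():
    # Each '#' stands for one customer; bars are printed sideways.
    print(f"{start:>3}-{start + BIN_WIDTH - 1:<3} {'#' * count}")
```

Each value is assigned to exactly one bin, so the bar heights add up to the number of customers observed.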

A frequency polygon is very similar to a histogram. It is used to compare data sets
or to display a cumulative frequency distribution, and it uses a line graph to depict
quantitative data. Statistics is concerned with collecting facts and information for a
particular purpose. In cricket, for example, the statistics of a game are compiled by
tabulating the runs scored on each delivery. Tables, graphs, pie charts, bar graphs,
histograms, frequency polygons, and other visual representations are used to present
statistical data. Frequency polygons are a visually appealing way of displaying
quantitative data and frequency distributions. Let's look at how to draw a frequency
polygon.
In a batch of 400 students, the heights of the students are given in the following table.
Represent the data with a frequency polygon.

Solution: Follow these steps to construct a histogram from the given data, then
draw the frequency polygon over it:
• Represent the heights on the horizontal axis on a suitable scale.
• Represent the number of students on the vertical axis on a suitable scale.
• Draw rectangular bars whose widths equal the class size and whose lengths
correspond to the frequency of each class interval.
• Join the midpoints of the tops of adjacent bars with line segments to form the
frequency polygon.

A box and whisker plot, often known as a box plot, shows a five-number summary of
a collection of data. The five-number summary consists of the minimum, first
quartile, median, third quartile, and maximum. A box plot is created by drawing a box
from the first quartile to the third quartile. A vertical line runs through the box at the
median. The whiskers extend from each quartile to the minimum or maximum.

Minimum: The minimum value in the given data set.
First Quartile (Q1): The first quartile is the median of the lower half of the data set.
Median: The median is the middle value of the data set, which divides the data set into two
equal parts. The median is also the second quartile.
Third Quartile (Q3): The third quartile is the median of the upper half of the data set.
Maximum: The maximum value in the given data set.
Apart from these five terms, the other terms used in the box plot are:
Interquartile Range (IQR): The difference between the third quartile and the first quartile,
i.e., IQR = Q3 - Q1.
Outlier: A value that falls at the far left or far right of the ordered data is tested as a possible
outlier. Generally, outliers lie more than a specified distance from the first and third quartiles:
outliers are greater than Q3 + (1.5 × IQR) or less than Q1 - (1.5 × IQR).

The following data are the heights (in inches) of 40 students in a statistics class.

59; 60; 61; 62; 62; 63; 63; 64; 64; 64; 65; 65; 65; 65; 65; 65; 65; 65; 65; 66; 66; 67;
67; 68; 68; 69; 70; 70; 70; 70; 70; 71; 71; 72; 72; 73; 74; 74; 75; 77
Construct a box plot with the following properties:
Minimum value = 59
Maximum value = 77
Q1: First quartile = 64.5
Q2: Second quartile or median= 66
Q3: Third quartile = 70

a) Each quarter has approximately 25% of the data. The spreads of the four
quarters are 64.5 - 59 = 5.5 (first quarter), 66 - 64.5 = 1.5 (second quarter),
70 - 66 = 4 (third quarter), and 77 - 70 = 7 (fourth quarter). So, the second quarter
has the smallest spread and the fourth quarter has the largest spread.
b) Range = maximum value - minimum value = 77 - 59 = 18
c) Interquartile Range: IQR = Q3 - Q1 = 70 - 64.5 = 5.5
d) The interval 59–65 has more than 25% of the data, so it has more data in it than
the interval 66 through 70, which has 25% of the data.
e) The middle 50% (middle half) of the data has a range of 5.5 inches.
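The five-number summary above can be checked with a short Python sketch. It assumes the "median of each half" convention used in the definitions of Q1 and Q3; other quartile conventions found in software can give slightly different values. The function name five_number_summary is made up for this illustration.

```python
from statistics import median

# Heights (in inches) of the 40 students, copied from the example above.
heights = [59, 60, 61, 62, 62, 63, 63, 64, 64, 64, 65, 65, 65, 65, 65, 65, 65, 65, 65, 66,
           66, 67, 67, 68, 68, 69, 70, 70, 70, 70, 70, 71, 71, 72, 72, 73, 74, 74, 75, 77]


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max), taking quartiles as medians of the halves."""
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]    # lower half (excludes the middle value when n is odd)
    upper = data[-half:]   # upper half (excludes the middle value when n is odd)
    return data[0], median(lower), median(data), median(upper), data[-1]


minimum, q1, q2, q3, maximum = five_number_summary(heights)
iqr = q3 - q1
low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [h for h in heights if h < low_fence or h > high_fence]
print(minimum, q1, q2, q3, maximum, iqr, outliers)
```

With the 1.5 × IQR rule, the fences for this data set are 56.25 and 78.25, so none of the 40 heights is flagged as an outlier.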

A Pareto chart is a type of bar graph. The lengths of the bars represent frequency or
cost (time or money), and the bars are arranged with the longest on the left and the
shortest on the right. In this way, the chart shows visually which situations are more
significant. This cause analysis tool is one of the seven basic quality tools.
A Pareto chart is useful in the following situations:
• When analyzing data about the frequency of problems or causes in a process.
• When there are many problems or causes and you want to focus on the most
significant.
• When analyzing broad causes by looking at their specific components.
• When communicating with others about your data.

• Figure 1 shows how many customer complaints were received in each of five
categories.
• Figure 2 takes the largest category, "documents," from Figure 1, breaks it down
into six categories of document-related complaints, and shows cumulative
values.
• If all complaints cause equal distress to the customer, working on eliminating
document-related complaints would have the most impact, and of those, working
on quality certificates should be most fruitful.

Figure 1: Pareto Chart, Customer Complaints

Figure 2: Pareto Chart, Document Complaints
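The ordering and cumulative values that a Pareto chart displays can be sketched in Python. The complaint counts below are hypothetical numbers invented for this illustration; they are not the data behind Figures 1 and 2. The function name pareto_rows is also made up.

```python
# Hypothetical complaint counts by category (illustrative; not the figures' data).
complaints = {"Documents": 32, "Product quality": 18, "Packaging": 14,
              "Delivery": 10, "Other": 6}


def pareto_rows(counts):
    """Sort categories by descending count and attach cumulative percentages."""
    total = sum(counts.values())
    rows, running = [], 0
    for category, count in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        running += count
        rows.append((category, count, round(100 * running / total, 1)))
    return rows


rows = pareto_rows(complaints)
for category, count, cum_pct in rows:
    print(f"{category:<16} {count:>3} {cum_pct:>6}%")
```

Reading down the cumulative column shows how few categories account for most of the complaints, which is the point of drawing the bars from longest to shortest.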


A bar graph is a chart or graphical depiction of facts, quantities, or numbers that
uses bars or strips. Bar graphs are used to compare quantities, frequencies, or other
measures across different categories of data.


A time series graph is a line graph that depicts repeated measurements taken at
regular intervals of time. The horizontal axis is always used to represent time. On
time series graphs, data points are drawn at regular intervals and connected by
straight lines.

A pie chart is a circular statistical graphic that is divided into slices to illustrate
numerical proportion. In a pie chart, the arc length of each slice is proportional to
the quantity it represents.



Measures of Central Tendency

A measure of central tendency is a single number that describes a collection of
data by identifying the central position within that set of data. For this reason,
measures of central tendency are often called measures of central position. They
are also classed as summary statistics. The mean (also known as the average) is
probably the most familiar measure of central tendency, but there are others, such
as the median and the mode.
The mean, median, and mode are all valid measures of central tendency, but
depending on the circumstances, some are more appropriate to use than others.
The arithmetic mean of a data set is the sum of all values divided by the total number of
values. It’s the most commonly used measure of central tendency because all values
are used in the calculation.

Example: Finding the mean

Reaction time (milliseconds) 287 345 365 298 380

• First, add up all the values:

⅀x = 287 + 345 + 365 + 298 + 380 = 1675

• Then calculate the mean using the formula ⅀x/n. There are 5 values in the
data set, so n = 5.

Mean (x̄) = 1675/5 = 335

Mean: 335 milliseconds
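The same calculation can be written as a minimal Python sketch. The function name mean is defined here only for illustration; it adds up the values and divides by how many there are.

```python
# Reaction times in milliseconds, copied from the example above.
reaction_times = [287, 345, 365, 298, 380]


def mean(values):
    """Arithmetic mean: the sum of all values divided by how many there are."""
    return sum(values) / len(values)


total = sum(reaction_times)
print(total, mean(reaction_times))
```

The printed sum and mean should agree with the hand calculation: 1675 and 335.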
A data set is a collection of values from a sample or a population. A population is the
whole group that you want to study, while a sample is a subset of that population.
Data from a sample can be used to make educated guesses about a population, but
only data from the whole population can provide a complete picture.
In statistics, the notation for a sample mean differs from the notation for a population
mean. The method for computing the two means, however, is the same.

The sample mean is written as M or x̄ (pronounced x-bar). For calculating the mean of a
sample, use this formula:

x̄ = ⅀x/n
• x̄: sample mean
• ⅀x: sum of all values in the sample data set
• n: number of values in the sample data set

The population mean is written as μ (the Greek letter mu). For calculating the mean of a
population, use this formula:

μ = ⅀X/N
• μ: population mean
• ⅀X: sum of all values in the population data set
• N: number of values in the population data set


The median of a data set is the value that’s exactly in the middle when it is ordered from
low to high.

Example: Finding the median

You measure the reaction times of 7 participants on a computer task and categorize
them into 3 groups: slow, medium or fast.

Participant   1        2      3      4      5        6      7
Speed         Medium   Slow   Fast   Fast   Medium   Fast   Slow

To find the median, you first order all values from low to high. Then, you find the value
in the middle of the ordered data set – in this case, the value in the 4th position.

Ordered data set Slow Slow Medium Medium Fast Fast Fast

Median: Medium


For a data set with an odd number of values, find the value at the (n+1)/2 position,
where n is the number of values in the data set.
Example: You measure the reaction times in milliseconds of 5 participants and order
the data set.

Reaction time (milliseconds) 287 298 345 365 380

The middle position is calculated using (n+1)/2, where n = 5.

(5+1)/2 = 3

That means the median is the 3rd value in your ordered data set.

Median: 345 milliseconds


For a data set with an even number of values, find the two values in the middle of the
data set: the values at the n/2 and (n/2) + 1 positions. Then, find their mean.

You measure the reaction times of 6 participants and order the data set.

Reaction time (milliseconds) 287 298 345 357 365 380

The middle positions are calculated using n/2 and (n/2) + 1, where n = 6.

6/2 = 3

(6/2) + 1 = 4

That means the middle values are the 3rd value, which is 345, and the 4th value, which
is 357.

To get the median, take the mean of the 2 middle values by adding them together and
dividing by two.

(345 + 357)/2 = 351

Median: 351 milliseconds
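Both position rules can be combined in one short Python sketch. The function name median is defined here for illustration only; note that Python lists are indexed from 0, so the value at position k in the text is at index k - 1 in the code.

```python
def median(values):
    """Median via the position rules: (n+1)/2 for odd n, n/2 and n/2 + 1 for even n."""
    data = sorted(values)
    n = len(data)
    if n % 2 == 1:
        return data[(n + 1) // 2 - 1]      # subtract 1: Python lists start at index 0
    lower_middle = data[n // 2 - 1]        # value at position n/2
    upper_middle = data[n // 2]            # value at position n/2 + 1
    return (lower_middle + upper_middle) / 2


print(median([287, 345, 365, 298, 380]))        # five values
print(median([287, 298, 345, 357, 365, 380]))   # six values
```

For the two reaction-time data sets above, this gives 345 and 351 milliseconds, matching the worked examples.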

The mode is the most frequently occurring value in the data set. It’s possible to have no
mode, one mode, or more than one mode.

To find the mode, sort your data set numerically or categorically and select the
response that occurs most frequently.

Example: Finding the mode

• In a survey, you ask 9 participants whether they identify as conservative,
moderate, or liberal.
• To find the mode, sort your data by category and find which response was
chosen most frequently.
• To make it easier, you can create a frequency table to count up the values for
each category.

Political ideology Frequency

Conservative 2

Moderate 3

Liberal 4

Mode: Liberal
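Counting the responses and picking the most frequent one can be sketched in Python as follows. The function name modes is made up for this illustration, and the handling of the "no mode" case (every value occurring equally often) is one common convention assumed here, not a rule stated in this module.

```python
from collections import Counter

# Survey responses rebuilt from the frequency table above (order is arbitrary).
responses = ["Conservative"] * 2 + ["Moderate"] * 3 + ["Liberal"] * 4


def modes(values):
    """Return every value that shares the highest frequency (empty if all values tie)."""
    counts = Counter(values)
    highest = max(counts.values())
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []  # convention assumed here: no value stands out, so no mode
    return [value for value, c in counts.items() if c == highest]


print(modes(responses))
```

Returning a list allows for data sets with more than one mode, such as 1, 1, 2, 2, 3, where both 1 and 2 occur most often.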


Instruction: Determine whether each of the following variables is qualitative or
quantitative. If quantitative, specify whether it is discrete or continuous.

1. Eye color
2. Number of canned sodas in a grocery store
3. Height of a child
4. Number of unemployed members of a family
5. Number of white roses

Instruction: Identify the level of measurement described by each of the following
statements.

1. It ranks the qualitative data.
2. It is the lowest among the four levels of measurement.
3. At this level, data possess all the features of the interval level.
4. Data at this level have no starting point.
5. Examples of this level of measurement are eye color and yes or no survey replies.

Instruction: Construct a frequency distribution for each of the following data sets.

1. 36, 26, 27, 23, 25, 26, 28, 15, 26, 35, 25, 27, 34,

2. 1, 3, 8, 3, 9, 9, 6, 10, 4, 2, 6, 8, 9, 4, 2, 7, 5, 4, 8, 5


Test I.
1. Qualitative
2. Quantitative - Discrete
3. Quantitative - Continuous
4. Quantitative - Discrete
5. Quantitative - Discrete

Test II.
1. Ordinal level
2. Nominal level
3. Ratio level
4. Interval level
5. Nominal level

Test III.
Answers may vary.


