Module - Data Management (Part 1)

DATA MANAGEMENT (STATISTICS)

Objectives
1. Recognize the basic terms of statistics.
2. Determine and apply the measures of central tendency, variability, and
position.
3. Apply the measures of central tendency and variability to the normal
distribution.
4. Determine the linear regression and correlation of a set of data.
Lesson Proper

LESSON 1. Statistics Introduction and Definition

The science of conducting studies to collect, organize, summarize, analyze, and draw
conclusions from data is called statistics. It is used in almost all fields of human
endeavor, such as sports, education, health, and research, among others. Statistical
analyses are used to manipulate, summarize, and investigate data so that they yield
information useful for decision making.

Sir Ronald Aylmer Fisher (February 17, 1890 – July 29, 1962) was a British statistician
and geneticist who pioneered the application of statistical procedures to the design
of scientific experiments. He is considered the Father of Modern Statistics.

In 1909, he was awarded a scholarship to study mathematics at the University of
Cambridge. In 1912, he graduated with a B.A. in astronomy, and he continued to study
astronomy and physics at the university, as well as the theory of errors, which
connected him to statistics.

Source: https://www.adelaide.edu.au/library/special/mss/fisher/

From 1914 to 1919, he taught high school mathematics and physics while continuing his
research in statistics and genetics. In 1918, he published an important paper in which
he used powerful statistical tools to reconcile inconsistencies between Charles
Darwin's ideas of natural selection and the rediscovered experiments of the Austrian
botanist Gregor Mendel.

In 1919, he became statistician at the Rothamsted Experimental Station, where he did
statistical work associated with plant-breeding experiments that led to theories
about gene dominance and fitness. From 1943 until 1957, he was Balfour Professor
of Genetics at Cambridge. He investigated the linkage of genes for different traits and
developed methods of multivariate analysis to deal with such questions. To avoid bias
(inaccurate and misleading results) in the selection of experimental materials, he
introduced the principle of randomization, in which random selection is used to
diminish the effects of variability in experimental materials.

Mathematics in the Modern World – Data Management (Part 1) – Madrazo, A. (2020), anthonymadrazo5@gmail.com | 1

One of Fisher's most important achievements is the analysis of variance (ANOVA).

Types of Statistics
1. Descriptive Statistics
Consists of methods for collecting, organizing, summarizing, and presenting
data or information.

Example: constructing graphs, charts, and tables, and calculating various
descriptive measures such as averages, measures of variation, and percentiles

2. Inferential Statistics
Consists of methods for drawing and measuring the reliability of conclusions
about a population based on information obtained from a sample of the
population.

After the data have been collected, organized, summarized, and presented (descriptive
statistics), inferential statistics is used to determine the findings and draw
conclusions.

This means that descriptive statistics and inferential statistics are interrelated.
Descriptive statistics is used to organize and summarize the information obtained
from a sample before an inferential analysis is carried out, and it leads us to the
appropriate inferential method.
Population and Sample
Population
The collection of all individuals or items under consideration in a statistical study.

Sample
That part of the population from which information is obtained.
For example, suppose a study is conducted at a university with 6,589 students. The
6,589 students are the population. If the researcher randomly selects Class A, which
has 44 students, then those 44 students are the sample. A sample should be
representative of the population.

Before we go through the discussions, let us first define some basic operational terms
in statistics:

Variable – a characteristic or attribute that can assume different values; any
characteristic, number, or quantity that can be measured or counted. A variable is
also called a data item. The information collected for the variables describes the
situation under study.

Examples: age, sex, business income and expenses, country of birth, capital
expenditure, class grades, and eye color, among others.

Types of Variables
1. Numeric Variables/ Quantitative Variables
Have values that describe a measurable quantity as a number, answering 'how many'
or 'how much'. These are quantifiable variables. Data collected on a numeric
variable are called quantitative data.

a. Continuous Variable
Observations can take any value between a certain set of real numbers.
The value given to an observation for a continuous variable can include
values as small as the instrument of measurement allows.
Examples: height, time, age, and temperature

Height can be 1.62 m, time can be 3.5 hours (3 hours and 30 minutes), age can
be 16 3/4 years old (16 years and 9 months), and temperature can be
36 2/5 ℃ or 36.40 ℃.

b. Discrete Variable
Observations can take a value based on a count from a set of distinct
whole values. A discrete variable cannot take the value of a fraction
between one value and the next closest value.
Examples: number of registered cars, number of business locations,
and number of children in a family, all of which are measured
as whole units (e.g. 1, 2, 3 cars)

2. Categorical Variables/ Qualitative Variables
Have values that describe a 'quality' or 'characteristic' of a data unit, like 'what
type' or 'which category'. Categorical variables fall into mutually exclusive (in
one category or in another) and exhaustive (include all possible options)
categories. Therefore, categorical variables are qualitative variables and tend
to be represented by non-numeric values. Data collected on them are called
qualitative data.

a. Ordinal Variable
Observations take values that can be logically ordered or ranked.
The categories associated with ordinal variables can be ranked higher
or lower than one another, but they do not necessarily establish a numeric
difference between each category.
Examples: academic grades (e.g. A, B, C), clothing size (e.g. small,
medium, large, extra-large), and attitudes (e.g. strongly agree,
agree, disagree, and strongly disagree)

b. Nominal Variable
Observations take values that cannot be organized in a logical
sequence.
Examples: sex, business type, eye color, religion, and brand

Source: Australian Bureau of Statistics (2013)

Data

Data – values (measurements or observations) that the variables can assume.
Variables whose values are determined by chance are called random
variables.
Data Set – a collection of data
Data Value or Datum – each value in the data set
Quantitative data – data from numeric/ quantitative variables; quantifiable data
Qualitative data – data from categorical/ qualitative variables; non-numeric data
Discrete data – data from discrete variables; whole-number (count) data

Continuous data – data from continuous variables; values from the set of real numbers.

Figure: Classification of data. Data are either quantitative or qualitative, and
quantitative data are either discrete or continuous.

For example, the grades of 5 students in Statistics are 94, 75, 82.5, 74.9, and 89.

In this example, the students' grade is the variable. It is a numeric variable,
classified as continuous since it can take decimal or fractional values. The values
94, 75, 82.5, 74.9, and 89 form the data set, and each value is a data value or datum
(e.g. 94 is a datum). These are continuous data since they come from the set of real
numbers.

Moreover, besides the distinction between qualitative and quantitative data, variables
can also be classified by their measurement scale, or level of measurement.

Level of Measurement
1. Nominal level of measurement
Classifies data into mutually exclusive (non-overlapping) categories in which no
order or ranking can be imposed on the data. Nominal data are countable.

Example: gender; zip codes; political party; religion; nationality

2. Ordinal level of measurement
Classifies data into categories that can be ranked; however, precise differences
between the ranks do not exist. It contains more information than the nominal level
and consists of distinct categories in which order is implied: values in one
category are larger or smaller than values in other categories (e.g. ratings of
excellent, good, fair, poor).

Example: evaluation (superior, average, poor); ranking (first, second, etc.);
letter grades (A, B, C, D, E, F)

3. Interval level of measurement
Ranks data, and precise differences between units of measure do exist;
however, there is no meaningful zero. It is a set of numerical measurements in
which the distance between numbers is of a known, constant size.

Example: IQ level; temperature

There is a meaningful difference of 1 point between an IQ of 109 and an IQ of
110. Temperature is another example of interval measurement, since there is
a meaningful difference of 1°F between each unit, such as 72°F and 73°F.

One property is lacking in the interval scale: there is no true zero. For
example, IQ tests do not measure people who have no intelligence. For
temperature, 0°F does not mean no heat at all.

4. Ratio level of measurement
Possesses all the characteristics of interval measurement, and there exists a
true zero, or non-arbitrary zero point. In addition, true ratios exist when the
same variable is measured on two different members of the population. It consists
of numerical measurements where the distance between numbers is of a known,
constant size.

Example: height; weight; area; number of phone calls

Because there is a true zero, or non-arbitrary zero point, a value of zero weight,
height, area, or phone calls is meaningful: it implies that the quantity is absent.

For example, if one person can lift 200 pounds and another can lift 100 pounds,
then the ratio between them is 2 to 1. Put another way, the first person can lift
twice as much as the second person.

There is not complete agreement among statisticians about the classification of data
into one of the four categories. For example, some researchers classify IQ data as ratio
data rather than interval. Also, data can be altered so that they fit into a different
category. For instance, if the incomes of all professors of a college are classified into
the three categories of low, average, and high, then a ratio variable becomes an
ordinal variable.

LESSON 2. Data Collection and Sampling Techniques

Sampling techniques have been developed to determine mathematically the most
effective way to acquire a sample that accurately reflects the population of the study.

The most common formula for determining the sample size from the population size is
Slovin's formula, introduced by Slovin in 1960. To this day, it is still unknown who
Slovin really was; the formula has been associated with the names Mark Slovin,
Michael Slovin, and Kulkol Slovin.

Slovin’s Formula
$$ n = \frac{N}{1 + Ne^{2}} $$

where:
𝑛 is the sample size
𝑁 is the population size
𝑒 is the margin of error (e.g. 0.01, 0.05, 0.1, etc)

Use Slovin's formula if you have no idea about the population's behavior. Slovin's
formula determines the sample size in proportion to the population size. It is
applicable only when estimating a population proportion and when the confidence
coefficient is 95%. There are other sample-size formulas that can be used to
determine samples according to the characteristics of the variables.

In most educational and scientific research, a margin of error of 0.05 is used most
of the time.

The margin of error tells how many percentage points your results may differ from the
real population value. For example, a margin of error of 0.05 (5%) means that the
sample result is expected to fall within 5 percentage points of the real population
value, at the 95% confidence level assumed above.

Example: Assume that a study is to be conducted in a certain community with 6,518
residents. Determine the number of respondents of the study with a 5% margin
of error using Slovin's formula.

Solution: From the assumption, 6,518 is the population size, and 0.05 (5%) is the
margin of error. Therefore:

$$ n = \frac{N}{1 + Ne^{2}} $$
$$ n = \frac{6{,}518}{1 + 6{,}518(0.05)^{2}} $$
$$ n = \frac{6{,}518}{1 + 6{,}518(0.0025)} $$
$$ n = \frac{6{,}518}{1 + 16.295} $$
$$ n = \frac{6{,}518}{17.295} $$
$$ n = 376.87 \approx 377 $$
This implies that, using Slovin's formula, the sample size is 377 respondents.
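Slovin's formula is simple enough to check by machine. The Python sketch below is a minimal illustration, not part of the original module; the function name `slovin_sample_size` is hypothetical, and rounding the result up with `math.ceil` is an assumption (a common convention for sample sizes, since a fraction of a respondent cannot be surveyed). For the worked example it agrees with the 377 obtained above.

```python
import math


def slovin_sample_size(population_size: int, margin_of_error: float) -> int:
    """Sample size from Slovin's formula, n = N / (1 + N * e^2).

    The result is rounded up (an assumed convention) so that the sample is
    never smaller than the formula requires.
    """
    if population_size <= 0:
        raise ValueError("population_size must be positive")
    if not 0 < margin_of_error < 1:
        raise ValueError("margin_of_error must be strictly between 0 and 1")
    n = population_size / (1 + population_size * margin_of_error ** 2)
    return math.ceil(n)


# Worked example from the text: N = 6,518 residents, e = 0.05
print(slovin_sample_size(6518, 0.05))  # 377
```

The same function can be reused for other margins of error, for example `slovin_sample_size(6518, 0.01)` for a stricter 1% margin.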

Sampling Techniques
Sampling techniques are methods of identifying who will be the respondents of the
study (the sample). For instance, in the previous example, how do we identify the 377
respondents? This is where sampling techniques come in.

Types of Sampling Techniques


1. Probability/ Random Sampling Techniques
All members of the population have an equal chance of being selected to be
part of the sample.

a. Simple Random Sampling Technique (e.g. fishbowl or lottery method, table of
random numbers, or computer)
In this method, names are placed inside a bowl or box, and then the target
respondents are picked one by one until the target number of respondents is
obtained.

Example: If we are going to select 5 of 10 using simple random sampling, the
names or codes of the 10 members of the population are placed inside a
bowl or box. Then, 5 names or codes are picked one by one, and the
selected 5 are the respondents.

i. Simple Random Sampling Technique with Replacement
Here, a member can be selected more than once because each name is put
back into the bowl after it is picked.

ii. Simple Random Sampling Technique without Replacement
Here, once a name or code is picked, it is not placed back in the bowl or box.
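The fishbowl draw can be imitated with Python's standard `random` module: `sample` draws without replacement (a picked code never returns to the bowl), while `choices` draws with replacement (the same code may come up again). This sketch uses the codes 1 to 10 from the example; the fixed seed is only there to make a run repeatable and is not part of the method.

```python
import random

population = list(range(1, 11))  # codes for the 10 members placed in the bowl
rng = random.Random(2020)        # seeded only so the draw can be repeated

# Without replacement: a drawn code is not put back, so all 5 codes differ.
without_replacement = rng.sample(population, k=5)

# With replacement: each code is put back, so a member may be drawn twice.
with_replacement = rng.choices(population, k=5)

print("without replacement:", without_replacement)
print("with replacement:   ", with_replacement)
```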

b. Systematic Random Sampling Technique
Obtained by selecting every kth member of the population, where k is a
counting number.

Example: For the sake of illustration, let us limit the population size.
Suppose the population size is 10 and the sample size is 5. How
can we obtain the 5 samples?

Solution: Step 1. Divide the population size by the sample size.
$$ \frac{\text{Population size}}{\text{Sample size}} = \frac{10}{5} = 2 $$
This implies that every 2nd member will be selected.
Step 2. Arrange the population in order, and randomly select the
first sample.

Assume that, by simple random selection (fishbowl or lottery), we have
chosen 4. So member 4 is the first sample.

1 2 3 4 5 6 7 8 9 10

Starting from 4, every 2nd member is selected until the target of 5 samples is
obtained. Counting from 4: 5 is 1st, 6 is 2nd (selected), 7 is 1st, 8 is 2nd
(selected), 9 is 1st, and 10 is 2nd (selected).

Our samples are 4, 6, 8, and 10, but that is only 4 samples; the target of 5 has
not been reached, so we need another sample. Continue counting in a cycle, going
back to the beginning of the list: 1 is 1st and 2 is 2nd (selected).

The target of 5 samples is now obtained: members 4, 6, 8, 10, and 2.
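The wrap-around counting above is sometimes called circular systematic sampling. Below is a minimal Python sketch, assuming the population is already listed in order and that k is the whole-number quotient of population size by sample size (the text's 10 / 5 = 2). The function name is hypothetical. In practice the starting position would itself be drawn at random, for example with `random.randint(1, len(population))`.

```python
def circular_systematic_sample(population, sample_size, start):
    """Select every k-th member, beginning at position `start` (1-based) and
    wrapping around to the front of the list when the end is reached."""
    N = len(population)
    if not 1 <= sample_size <= N:
        raise ValueError("sample_size must be between 1 and the population size")
    if not 1 <= start <= N:
        raise ValueError("start must be a position in the population")
    k = N // sample_size              # sampling interval: population size / sample size
    first = start - 1                 # convert the 1-based position to a 0-based index
    return [population[(first + j * k) % N] for j in range(sample_size)]


members = list(range(1, 11))          # the ordered population 1, 2, ..., 10
print(circular_systematic_sample(members, 5, start=4))  # [4, 6, 8, 10, 2]
```

Because k × (sample size) never exceeds N, the wrap-around cannot select the same member twice.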

c. Stratified Random Sampling Technique
Obtained by dividing the population into subgroups or strata according
to some characteristic relevant to the study (there can be several
subgroups). Then subjects are selected at random from each subgroup.

Example: The town has 250 homeowners of which 25, 175, and 50 are
upper income, middle income, and low income, respectively.
Explain how we can obtain a sample of 20 homeowners,
using stratified sampling with proportional allocation,
stratifying by income group.

Solution:
Step 1. Divide the population into subpopulations (strata).
Stratum 1: upper income (25)
Stratum 2: middle income (175)
Stratum 3: lower income (50)

Step 2. Allocate the sample size to each stratum in proportion to the stratum size:
$$ (\text{sample size})\left(\frac{\text{stratum size}}{\text{population size}}\right) $$
Stratum 1: upper income (25): $20 \cdot \frac{25}{250} = 2$
Stratum 2: middle income (175): $20 \cdot \frac{175}{250} = 14$
Stratum 3: lower income (50): $20 \cdot \frac{50}{250} = 4$

Step 3. Use all members obtained in Step 2 as the sample.

This implies that the upper-income stratum contributes 2 homeowners, the
middle-income stratum 14, and the lower-income stratum 4.

Samples from each stratum can be obtained by either simple random sampling or
systematic random sampling.

Interpretation: This stratified sampling procedure ensures that no income group is
missed. It also improves the precision of the statistical estimates (because the
homeowners within each income group tend to be homogeneous) and makes it possible
to estimate the separate opinions of each of the three strata (income groups).
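Proportional allocation followed by a random draw inside each stratum can be sketched in Python as below. This is an illustration under assumptions: the homeowner IDs are hypothetical, and the allocations are rounded with Python's `round`, which works exactly here (2, 14, 4) but in other problems can produce sizes that do not add up to the intended total, in which case one stratum's size has to be adjusted by hand.

```python
import random


def proportional_allocation(strata_sizes, sample_size):
    """Allocate sample_size to each stratum as sample size * stratum size / population size."""
    population_size = sum(strata_sizes.values())
    return {name: round(sample_size * size / population_size)
            for name, size in strata_sizes.items()}


strata_sizes = {"upper": 25, "middle": 175, "lower": 50}
allocation = proportional_allocation(strata_sizes, 20)
print(allocation)  # {'upper': 2, 'middle': 14, 'lower': 4}

# Hypothetical homeowner IDs for each income group, e.g. "upper-1" ... "upper-25".
homeowners = {name: [f"{name}-{i}" for i in range(1, size + 1)]
              for name, size in strata_sizes.items()}

# Simple random sampling (without replacement) inside each stratum.
rng = random.Random(1)
sample = {name: rng.sample(homeowners[name], k=allocation[name])
          for name in strata_sizes}
print({name: len(chosen) for name, chosen in sample.items()})
```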

d. Cluster Random Sampling Technique
Obtained by dividing the population into sections or clusters, selecting one or
more clusters at random, and using all members in the selected cluster(s) as the
members of the sample. Clusters could be geographic areas or schools in a large
district. Cluster sampling is used when the population is large or when it
involves subjects residing in a large geographic area.

Example: To save time, a city planner decided to use cluster sampling.
The residential portion of the city was divided into 947
blocks, each containing 20 homes. Explain how the planner
used cluster sampling to obtain a sample of 300 homes.
Solution:
Step 1. Divide the population into groups (clusters).

The planner used the 947 blocks as the clusters, thus dividing
the population (residential portion of the city) into 947 groups.

Step 2. Obtain a simple random sample of the clusters.

The planner numbered the blocks (clusters) from 1 to 947 and then used a
table of random numbers to obtain a simple random sample of 15 of the 947
blocks.

Step 3. Use all the members of the clusters obtained in Step 2 as the sample.

The sample consisted of the 300 homes comprising the 15 sampled blocks:

15 blocks × 20 homes per block = 300 homes.

Interpretation: The planner used cluster sampling to obtain a sample of
300 homes: 15 blocks of 20 homes per block. Each of the
three interviewers was then assigned 5 of these 15
blocks. This method gave each interviewer 100 homes
to visit (5 blocks of 20 homes per block) but saved much
travel time because an interviewer could complete the
interviews on an entire block before driving to another
neighborhood. The report was finished on time.
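The planner's three steps translate directly into code. In this minimal Python sketch, the seeded random generator stands in for the table of random numbers, homes are labeled by hypothetical (block, home) pairs, and splitting the 15 blocks into three consecutive groups of 5 is just one way to make the interviewer assignments described above.

```python
import random

NUM_BLOCKS = 947        # clusters: residential blocks numbered 1 to 947
HOMES_PER_BLOCK = 20    # every block contains 20 homes
BLOCKS_TO_DRAW = 15

rng = random.Random(7)  # stands in for the table of random numbers
chosen_blocks = sorted(rng.sample(range(1, NUM_BLOCKS + 1), k=BLOCKS_TO_DRAW))

# Every home in a chosen block is part of the sample.
sample_homes = [(block, home)
                for block in chosen_blocks
                for home in range(1, HOMES_PER_BLOCK + 1)]
print(len(sample_homes))  # 300

# Three interviewers, 5 blocks (100 homes) each.
assignments = [chosen_blocks[i:i + 5] for i in range(0, BLOCKS_TO_DRAW, 5)]
for number, blocks in enumerate(assignments, start=1):
    print(f"Interviewer {number}: blocks {blocks}")
```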

e. Multi-Stage Sampling Technique
Most large-scale surveys combine one or more of simple random sampling,
systematic random sampling, cluster sampling, and stratified sampling. This
approach is frequently used by pollsters and government agencies.

2. Nonprobability/ Nonrandom Sampling Techniques
In these techniques, members of the population do not all have an equal chance of
being selected to be part of the sample.

a. Convenience or Accidental Sampling Technique
Uses the most convenient way of selecting the sample.

For instance, in a survey about Facebook users, a researcher using the
convenience sampling technique could send private messages to online
Facebook friends. Not all Facebook friends have an equal chance to be part
of the sample: a friend who is offline has no chance to be among the
respondents.

b. Quota Sampling Technique
Ensures equal or proportionate representation of the subjects, depending on
which trait is considered as the basis of the quota. The usual bases of quota
are age, gender, education, race, religion, and socio-economic status.

Example: Suppose the basis of the quota is college year level, the research
needs equal representation, and the sample size is 100. With four year
levels, the researcher must select 25 from each year level.
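What distinguishes quota sampling from stratified random sampling is that the quotas are filled with whoever is reached first rather than by a random draw. The Python sketch below illustrates that under stated assumptions: four year levels (implied by 100 / 25), a hypothetical stream of students met on campus, and a hypothetical helper `fill_quotas`.

```python
def fill_quotas(arrivals, quotas):
    """Accept people in the order they are met until every group's quota is full.

    No random draw is involved: whoever is reached first fills the quota.
    """
    remaining = dict(quotas)
    sample = []
    for person, group in arrivals:
        if remaining.get(group, 0) > 0:
            sample.append(person)
            remaining[group] -= 1
        if all(count == 0 for count in remaining.values()):
            break  # every quota is full
    return sample, remaining


year_levels = ["1st year", "2nd year", "3rd year", "4th year"]
quotas = {level: 100 // len(year_levels) for level in year_levels}  # 25 each

# Hypothetical students met on campus, as (student ID, year level) pairs.
arrivals = [(f"S{i:03d}", year_levels[i % 4]) for i in range(200)]
sample, remaining = fill_quotas(arrivals, quotas)
print(quotas["1st year"], len(sample))  # 25 100
```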

c. Volunteer or Self-Selected Sampling Technique
Individuals decide for themselves to be part of the sample.

d. Purposive/ Purposeful or Judgmental/ Judgement or Selective or
Deliberate Sampling Technique
The researcher selects subjects who meet specific inclusion criteria, based on
the researcher's knowledge of the population.

For example, in a study about experiences of post-disaster depression
among people living in earthquake-affected areas, the respondents are
people who are earthquake victims and are suffering from post-disaster
depression.

e. Snowball/ Networking Sampling Technique
Used to identify potential subjects in studies where subjects are hard
to locate. It works like a chain referral, so it is also known as the chain
referral sampling technique.

After observing the initial subject, the researcher asks the subject for help
in identifying people with a similar trait of interest, much like asking
subjects to nominate others with the same trait. The same process is
repeated until a sufficient number of subjects is obtained.

f. Expert Sampling Technique
Subjects are chosen for their expertise.

For example, for a study about volcanoes, you would consult volcanologists.

LESSON 3. Measures of Central Tendency

Measures of central tendency are descriptive measures that indicate where the center,
or the most typical value, of a data set lies. They are often called averages. The
three most important measures of central tendency are the mean, median, and mode. The
mean and median apply only to quantitative data, whereas the mode can be used with
either quantitative or qualitative data.
Statistic – a characteristic or measure obtained by using the data values from a sample.
Parameter – a characteristic or measure obtained by using all the data values from a
specific population.

Data Classification
a. Ungrouped/ Small Data – if there are 30 or fewer data values.
b. Grouped/ Large Data – if there are more than 30 data values.

a) Ungrouped/ Small Data

Suppose Carmella's scores in seven 100-item tests are 78, 96, 85, 91, 70, 79, and 96.
Determine the mean, median, and mode.

1. Mean
It is the sum of the observations divided by the number of observations.
Among the three measures, this is the most reliable. It is also called the average.

𝑥̄ – mean of a sample, read as "x-bar".
𝜇 – mean of a population, denoted by the Greek letter mu.

$$ \bar{x} = \frac{\sum x}{n} = \frac{x_1 + x_2 + x_3 + \cdots + x_{n-1} + x_n}{n} $$
$$ \mu = \frac{\sum x}{N} = \frac{x_1 + x_2 + x_3 + \cdots + x_{N-1} + x_N}{N} $$
Where:
𝑥 is the individual datum,
𝑛 is the sample size,
𝑁 is the population size.

$$ \bar{x} = \frac{\sum x}{n} = \frac{78 + 96 + 85 + 91 + 70 + 79 + 96}{7} = \frac{595}{7} = 85 $$

The mean described above is the arithmetic mean. Besides this, there are other
types of mean, such as the weighted mean and the combined (compound) mean.
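The arithmetic mean of Carmella's scores can be checked with a few lines of Python. The sketch computes the formula directly (sum of the observations divided by their number) and, for comparison, with the standard library's `statistics.mean`; the same calculation gives μ when the list holds a whole population.

```python
from statistics import mean

scores = [78, 96, 85, 91, 70, 79, 96]  # Carmella's seven test scores

total = sum(scores)                    # sum of the observations
x_bar = total / len(scores)            # divided by the number of observations
print(total, x_bar)                    # 595 85.0

print(mean(scores))                    # same value from the standard library
```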
2. Median
When the data are arranged in increasing or decreasing order, the median is the
middlemost value.
• If the number of observations is odd, then the median is the
observation exactly in the middle of the ordered list.
• If the number of observations is even, then the median is the mean of
the two middle observations in the ordered list.
In both cases, if we let n denote the number of observations, then the median
is at position (n + 1)/2 in the ordered list. The median is denoted by 𝑥̃ (read as
"x-tilde").
Let us consider the data given above and arrange them in increasing order. Since
the number of data values, 7, is odd, the first condition applies.
70 78 79 85 91 96 96
1st 2nd 3rd 4th 5th 6th 7th
The middlemost number is 85. Its position is
$$ \frac{n + 1}{2} = \frac{7 + 1}{2} = \frac{8}{2} = 4, $$
so 85 is the 4th term.
𝑥̃ = 85
To illustrate the second condition, where the number of data values is even, let us
add another number to the same data, say 68.
68 70 78 79 85 91 96 96
1st 2nd 3rd 4th 5th 6th 7th 8th
The median is the average of the two numbers at the center, 79 and 85.
$$ \tilde{x} = \frac{79 + 85}{2} = \frac{164}{2} = 82 $$
The position is
$$ \frac{n + 1}{2} = \frac{8 + 1}{2} = \frac{9}{2} = 4.5, $$
so the median, 82, is at the 4.5th position. This means that 82 lies halfway between
the 4th and the 5th terms.
The median is also a stable measure because it is resistant to outliers (extreme
values). Outliers are data values that are either extremely high or extremely low.
Let us consider the same example again, but this time we change the highest value,
the lowest value, or both.
70 78 79 85 91 96 96
From the given data, 85 is the median.
1 78 79 85 91 96 96
We changed the lowest number from 70 to 1, but the median is still 85.
70 78 79 85 91 96 500
We changed the highest number from 96 to 500, and the median is still 85.
1 78 79 85 91 96 500
We changed both the lowest and the highest, and the median is still 85.
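Both median rules, and the median's resistance to outliers, can be checked with a short Python sketch. `median_by_position` is a hypothetical helper that follows the odd/even rule stated above; the mean is printed alongside to show how much it moves when the extremes change while the median does not.

```python
from statistics import mean


def median_by_position(data):
    """Median by the rule above: middle term if n is odd, mean of the two middle terms if even."""
    ordered = sorted(data)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # the term at position (n + 1) / 2
    return (ordered[mid - 1] + ordered[mid]) / 2  # halfway between the two middle terms


scores = [78, 96, 85, 91, 70, 79, 96]
print(median_by_position(scores))         # 85
print(median_by_position(scores + [68]))  # 82.0

# Replacing the extremes moves the mean but not the median.
for data in ([1, 78, 79, 85, 91, 96, 96],
             [70, 78, 79, 85, 91, 96, 500],
             [1, 78, 79, 85, 91, 96, 500]):
    print(median_by_position(data), round(mean(data), 2))
# 85 75.14
# 85 142.71
# 85 132.86
```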
3. Mode
• The most frequent data value.
• If no value occurs more than once, then the data set has no mode.
• Otherwise, any value that occurs with the greatest frequency is a mode
of the data set.
• Denoted by 𝑥̂ (read as "x-hat").

The data given above are:
70 78 79 85 91 96 96
There are two 96s, and every other value appears only once. Therefore, the mode is
96 (unimodal).
𝑥̂ = 96
What if the data set is:
70 79 79 85 91 96 96
Both 79 and 96 appear twice and the other values appear once; therefore, the
modes are 79 and 96 (bimodal).
𝑥̂ = 79 & 96
Types of Modes
• Unimodal - one mode
• Bimodal - two modes
• Trimodal - three modes
• Multimodal - four or more modes
For Carmella's seven scores, the mean, median, and mode are 85, 85, and 96,
respectively.
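The mode rules above, including the "no mode" case, can be sketched in Python with `collections.Counter`. The helper `modes` is hypothetical; it returns an empty list when no value repeats. That differs from the standard library's `statistics.multimode`, which in that situation returns every value, so the sketch counts frequencies itself to match the convention used in this module.

```python
from collections import Counter


def modes(data):
    """All values with the greatest frequency; [] if no value occurs more than once."""
    counts = Counter(data)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        return []  # no value repeats, so the data set has no mode
    return sorted(value for value, count in counts.items() if count == top)


print(modes([70, 78, 79, 85, 91, 96, 96]))  # [96]      (unimodal)
print(modes([70, 79, 79, 85, 91, 96, 96]))  # [79, 96]  (bimodal)
print(modes([70, 78, 79, 85, 91, 96]))      # []        (no mode)
```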

b) Grouped/ Large Data
The ages of the first 50 persons who entered the mall were tallied, as shown below.
Determine the mean, median, and mode of their ages.

Age        Frequency
10 – 19    5
20 – 29    20
30 – 39    10
40 – 49    7
50 – 59    8
Total      n = 50

In the table above, the age intervals are the classes.
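1. Mean
$$ \bar{x} = \frac{\sum fx}{n} = \frac{f_1 x_1 + f_2 x_2 + f_3 x_3 + \cdots + f_{k-1} x_{k-1} + f_k x_k}{n} $$
$$ \mu = \frac{\sum fx}{N} = \frac{f_1 x_1 + f_2 x_2 + f_3 x_3 + \cdots + f_{k-1} x_{k-1} + f_k x_k}{N} $$
Where:
𝑥̄ is the sample mean
𝜇 is the population mean
𝑛 is the sample size
𝑁 is the population size
𝑓 is the class frequency
𝑥 is the class mark
𝑘 is the number of classes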

To start, let us complete the table below. In each class, for instance, class
10 – 19, the smaller value is the lower limit (10 in this class), and the larger
value is the upper limit (19 in this class). The class mark is the average of the
lower limit and the upper limit of the class. For the lowest class (the class with
the lowest values), the class mark is:
$$ \frac{10 + 19}{2} = \frac{29}{2} = 14.5 $$
You could do the same for the other classes, but there is an alternative way to
continue the process using the class interval.

Class (Age)   Frequency (f)   Class mark (x)   fx
10 – 19       5               14.5
20 – 29       20
30 – 39       10
40 – 49       7
50 – 59       8
Total         n = 50

The class interval is the difference between successive lower limits, or between
successive upper limits. For example, 10 and 20 are the lower limits of two
successive classes, and 19 and 29 are the upper limits of the same two classes.
Their difference is the class interval:
$$ 20 - 10 = 29 - 19 = 10 $$
The same is true for the other classes.
To continue, just add the class interval to the previous class mark.

Class (Age)   Frequency (f)   Class mark (x)     fx
10 – 19       5               14.5               5 ∙ 14.5 = 72.5
20 – 29       20              14.5 + 10 = 24.5   490
30 – 39       10              34.5               345
40 – 49       7               44.5               311.5
50 – 59       8               54.5               436
Total         n = 50                             ∑fx = 1,655

The fx column is the product of the frequency (f) and the class mark (x). Add all
the products to get ∑fx. Hence, the mean is:
$$ \bar{x} = \frac{\sum fx}{n} = \frac{1{,}655}{50} = 33.1 $$
This implies that the average age of the people who came to the mall is about 33
years old.
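The table-based calculation of the grouped mean can be reproduced in Python. In this sketch, each class is a (lower limit, upper limit, frequency) tuple copied from the ages table, and class marks are computed as the average of the two limits, exactly as in the text.

```python
# (lower limit, upper limit, frequency) for each age class in the table
classes = [(10, 19, 5), (20, 29, 20), (30, 39, 10), (40, 49, 7), (50, 59, 8)]

n = sum(f for _, _, f in classes)                                   # 50
class_marks = [(lower + upper) / 2 for lower, upper, _ in classes]  # 14.5, 24.5, ...
sum_fx = sum(f * x for (_, _, f), x in zip(classes, class_marks))   # sum of f times x

print(n, sum_fx, sum_fx / n)  # 50 1655.0 33.1
```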
2. Median
$$ \tilde{x} = LB_{me} + \left(\frac{\frac{n}{2} - cf_b}{f_{me}}\right) i $$
Where:
𝑥̃ is the median
𝐿𝐵𝑚𝑒 is the lower boundary of the median class
𝑛 is the sample size
𝑐𝑓𝑏 is the sum of the frequencies of the classes before (lower than) the median
class; 𝑐𝑓 stands for cumulative frequency
𝑓𝑚𝑒 is the frequency of the median class
𝑖 is the class interval
Let us use the previous results and add another column for the cumulative frequency
(cf). If you are only going to find the median, you can disregard the 3rd column
(class mark) and the 4th column (fx).

Classes   f    x      fx      cf
10 – 19   5    14.5   72.5    5
20 – 29   20   24.5   490     5 + 20 = 25
30 – 39   10   34.5   345     25 + 10 = 35
40 – 49   7    44.5   311.5   35 + 7 = 42
50 – 59   8    54.5   436     42 + 8 = 50
Total     n = 50      ∑fx = 1,655

First, divide the sample size by 2:
$$ \frac{n}{2} = \frac{50}{2} = 25 \quad (\text{25th term}) $$
Observe the last column: class 10 – 19 has the 1st to 5th terms, class 20 – 29
has the 6th to 25th terms, class 30 – 39 has the 26th to 35th terms, and so
on. Since the 25th term belongs to class 20 – 29, the median class is class 20 – 29.

From the table, the median class is 20 – 29, so 𝑓𝑚𝑒 = 20 (its frequency) and
𝑐𝑓𝑏 = 5 (the cumulative frequency of the class before it).

The last value still needed is 𝐿𝐵𝑚𝑒. It is the average of the lower limit of the
median class, which is 20 in this case, and the upper limit of the class immediately
before the median class, which is 19 in this case. So:
$$ LB_{me} = \frac{19 + 20}{2} = \frac{39}{2} = 19.5 $$
Then, compute the median. The class interval (𝑖) is 10, the same value used earlier
to determine the mean.
$$ \tilde{x} = LB_{me} + \left(\frac{\frac{n}{2} - cf_b}{f_{me}}\right) i $$
$$ \tilde{x} = 19.5 + \left(\frac{\frac{50}{2} - 5}{20}\right) 10 $$
$$ \tilde{x} = 19.5 + \left(\frac{25 - 5}{20}\right) 10 $$
$$ \tilde{x} = 19.5 + \left(\frac{20}{20}\right) 10 $$
$$ \tilde{x} = 19.5 + (1)(10) = 19.5 + 10 = 29.5 $$
Therefore, the median is 29.5. The middlemost age is roughly 30 years.
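The grouped-median formula can be wrapped in a small Python function. This is a sketch under assumptions: the classes are equal-width and listed in increasing order, and (as in this table) each lower boundary lies halfway between one class's upper limit and the next class's lower limit. The function name is hypothetical. Using the class interval i = 10 stated above, it returns 29.5 for the ages table.

```python
def grouped_median(classes):
    """Median of grouped data: LB_me + ((n/2 - cf_b) / f_me) * i.

    `classes` is a list of (lower limit, upper limit, frequency) tuples in
    increasing order with equal class widths (at least two classes).
    """
    n = sum(f for _, _, f in classes)
    if n == 0:
        raise ValueError("no observations")
    i = classes[1][0] - classes[0][0]        # class interval: difference of successive lower limits
    gap = classes[1][0] - classes[0][1]      # gap between an upper limit and the next lower limit (1 here)
    half = n / 2
    cf_b = 0                                 # cumulative frequency before the current class
    for lower, _, f_me in classes:
        if cf_b + f_me >= half:              # first class whose cumulative frequency reaches n/2
            lb_me = lower - gap / 2          # lower boundary, e.g. (19 + 20) / 2 = 19.5
            return lb_me + (half - cf_b) * i / f_me
        cf_b += f_me
    raise ValueError("could not locate the median class")


ages = [(10, 19, 5), (20, 29, 20), (30, 39, 10), (40, 49, 7), (50, 59, 8)]
print(grouped_median(ages))  # 29.5
```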

3. Mode
$$ \hat{x} = LB_{mo} + \left(\frac{f_{mo} - f_b}{2f_{mo} - f_b - f_a}\right) i $$
Where:
𝑥̂ is the mode
𝐿𝐵𝑚𝑜 is the lower boundary of the modal class
𝑓𝑚𝑜 is the frequency of the modal class
𝑓𝑏 is the frequency of the class immediately before (lower than) the modal class
𝑓𝑎 is the frequency of the class immediately after (higher than) the modal class

Class 20 – 29 has the highest frequency, so it is the modal class. If two or more
classes share the highest frequency, then each of them is a modal class.

In the table above, the modal class is 20 – 29 (𝑓𝑚𝑜 = 20), the class before it is
10 – 19 (𝑓𝑏 = 5), and the class after it is 30 – 39 (𝑓𝑎 = 10).

The frequency of the modal class is 20. The frequency of the class immediately before
the modal class is 5, and the frequency of the class immediately after the modal class
is 10. The class interval is also 10 (as in the mean and median). The lower boundary
of the modal class is found the same way as the lower boundary of the median class:
it is the average of the lower limit of the modal class and the upper limit of the
class immediately before it.
$$ LB_{mo} = \frac{20 + 19}{2} = \frac{39}{2} = 19.5 $$
Then, compute the mode.
$$ \hat{x} = LB_{mo} + \left(\frac{f_{mo} - f_b}{2f_{mo} - f_b - f_a}\right) i $$
$$ \hat{x} = 19.5 + \left(\frac{20 - 5}{2(20) - 5 - 10}\right) 10 $$
$$ \hat{x} = 19.5 + \left(\frac{15}{40 - 15}\right) 10 $$
$$ \hat{x} = 19.5 + \left(\frac{15}{25}\right) 10 $$
$$ \hat{x} = 19.5 + \left(\frac{3}{5}\right) 10 = 19.5 + \frac{30}{5} $$
$$ \hat{x} = 19.5 + 6 = 25.5 $$

The mode is 25.5: the most typical age of the people who entered the mall is about
25 to 26 years old.
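Likewise, the grouped-mode formula can be sketched in Python under the same assumptions as the grouped median (equal-width classes in increasing order). Two further choices are assumptions rather than rules from this module: when several classes tie for the highest frequency, only the first is used, and a missing neighboring class is treated as having frequency 0. With the class interval i = 10, it returns 25.5 for the ages table.

```python
def grouped_mode(classes):
    """Mode of grouped data: LB_mo + ((f_mo - f_b) / (2 f_mo - f_b - f_a)) * i.

    `classes` is a list of (lower limit, upper limit, frequency) tuples in
    increasing order with equal class widths (at least two classes).
    """
    freqs = [f for _, _, f in classes]
    k = freqs.index(max(freqs))                       # modal class (first one if tied)
    f_mo = freqs[k]
    f_b = freqs[k - 1] if k > 0 else 0                # class immediately before
    f_a = freqs[k + 1] if k + 1 < len(freqs) else 0   # class immediately after
    i = classes[1][0] - classes[0][0]                 # class interval
    gap = classes[1][0] - classes[0][1]
    lb_mo = classes[k][0] - gap / 2                   # lower boundary of the modal class
    denominator = 2 * f_mo - f_b - f_a
    if denominator == 0:
        raise ValueError("formula undefined: 2*f_mo - f_b - f_a is zero")
    return lb_mo + (f_mo - f_b) * i / denominator


ages = [(10, 19, 5), (20, 29, 20), (30, 39, 10), (40, 49, 7), (50, 59, 8)]
print(grouped_mode(ages))  # 25.5
```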

References
Almukkahal, R., et al. (2016). CK-12 Advanced Probability and Statistics Concepts.
Flexbook: next generation textbook.
Australian Bureau of Statistics (2013). What is Variable? Retrieved 04 June 2020 from
https://www.abs.gov.au/websitedbs/a3121120.nsf/home/statistical+language+-+what+are+variables#:~:text=A%20variable%20is%20any%20characteristics,type%20are%20examples%20of%20variables.
Bluman, A. G. (2018). Elementary Statistics: A Step by Step Approach, Tenth Edition,
ISBN 978-1-259-75533. McGraw-Hill Education, New York City, USA.
Retrieved 03 June 2020 from https://b-ok.asia/book/5009088/f236d3
Dataceutics, Inc. (2018). Sir Ronald Aylmer Fisher – The Father of Modern Statistics.
Retrieved 06 June 2020 from
https://www.dataceutics.com/blog/2018/7/24/sir-ronald-aylmer-fisher-the-father-of-modern-statistics
Encyclopaedia Britannica, Inc. (2020). Sir Ronald Aylmer Fisher. Retrieved 06 June
2020 from https://www.britannica.com/science/physical-anthropology
Gupta, S. (2014). Sampling Methods. Retrieved 06 June 2020 from
https://www.slideshare.net/shubhanshug1/seminar-sampling-methods?qid=d1f11eda-cdd5-44b8-81de-f0cd88637e6e&v=&b=&from_search=1
Ratner, B. (2009). The correlation coefficient: Its values range between +1/−1, or do
they? Springer Nature Switzerland. Retrieved 17 June 2020 from
https://doi.org/10.1057/jt.2009.5
Tejada, J. J. & Punzalan, R. B. (2012). On the Misuse of Slovin's Formula. The Philippine
Statistician, Vol. 61, No. 1, pp. 129–136. Retrieved 06 May 2020 from
https://www.psai.ph/docs/publications/tps/tps_2012_61_1_9.pdf
Weiss, N. A. (2012). Elementary Statistics, 8th Edition, ISBN 978-0-321-69123-1.
Pearson Education, Inc., Boston, USA. Retrieved 03 June 2020 from
https://b-ok.asia/book/1236722/d339a2
http://onlinestatbook.com/2/calculators/normal_dist.html

Appendix A

http://onlinestatbook.com/2/calculators/normal_dist.html

Appendix B.

http://onlinestatbook.com/2/calculators/normal_dist.html
