
RIZAL TECHNOLOGICAL UNIVERSITY

Cities of Mandaluyong and Pasig

Section 2: Mathematics as a Tool


Part 1

MODULE 4: Data Management

A. Gathering, Organizing, Representing, Analyzing and Interpreting Data


B. Measures of Central Tendency
C. Measures of Dispersion
D. Measures of Relative Position
E. Probabilities and Normal Distribution
F. Correlation and Regression

Overview

Data management is the whole process of dealing with data, from the very
beginning of a study through data analysis, its final part. It is divided into
three phases. Phase 1 is the preparation for data entry, which includes the
review of questionnaire forms, coding, the preparation of master sheets or
spreadsheets, dummy tables and quality control. Phase 2 is data entry. Phase 3,
the last phase, is data analysis.

Data analysis can be either descriptive or analytic. Descriptive analysis can
be done by three methods: tabular, graphic or numeric. Analytic analysis uses
principles of statistics to test a hypothesis.

In short, data management is all about statistics, both descriptive and
inferential.

According to Florence Nightingale, statistics is “the most important science
in the whole world: for upon it depends the practical application of every other
science and of every art: the one science essential to all political and social
administration, all education, all organization based on experience, for it only
gives results of our experience.”

Prepared by Lhalili
GE04 1

*slide taken from the 2nd generation training on MMW, Mapua Institute of
Technology (2017)

Study Guide

1. Download or read and understand the module for Chapter 4.


2. Watch corresponding video lectures.
3. Attend synchronous classes.
4. Answer the assignment.

Learning Outcomes

LO1: Use a variety of statistical tools to process and manage numerical data.
LO2: Identify and define basic concepts of probability.

LO3: Determine the range of probability values and find the probability of an event.

LO4: Calculate areas under the normal curve.

LO5: Obtain specific percentiles of the normal distribution.

LO6: Use the methods of linear regression and correlations to predict the value of a variable
given certain conditions.

LO7: Advocate the use of statistical data in making important decisions.


PART 1 of Module 4

GATHERING, ORGANIZING, REPRESENTING, ANALYZING AND INTERPRETING DATA

Statistics has both a plural and singular sense. In its plural sense, statistics
refers to numerical facts that are systematically collected and analyzed. In its singular
sense, statistics refers to the scientific discipline consisting of theory and methods for
processing numerical information that one can use when making decisions in the face
of uncertainty. The recognition of uncertainty and the importance of statistical
activities are likely to be as old as civilization itself. Even before the art of counting
was perfected, there is evidence to suggest that herdsmen were putting notches on
trees to keep track of their cattle. In its plural and singular sense, the term Statistics
refers to quantities computed from numerical information (Philippine Statistical
Association, 2008).
As such, statisticians are involved with methods of data collection, data
summarization and data analysis, as well as with communicating the results of
their analyses.

Statistics is a science that deals with the collection, presentation, analysis and
interpretation of data.

Areas of Statistics
1. Descriptive Statistics are methods concerned with collecting, describing, and
analyzing a set of data without drawing conclusions (or inferences) about a
large group.
2. Inferential Statistics are methods concerned with generalizing results beyond
the data collected, that is, drawing conclusions (or inferences) about a large
group (population) from a part of it (sample).

Statistical methods have two broad aims: (a) to describe, and (b) to infer. In
the first case, the main task is that of data organization and presentation (without
drawing conclusions or inferences beyond the data). These tools are called
descriptive statistical methods. In the second case, the task is to generalize results
beyond the data collected, provided that the data collected are a part (sample) of a
larger set of items (population). The tools required in this case are called inferential
statistical methods.


Definition of Terms

The universe/physical population is the collection of things or observational units
under consideration.

A variable is a characteristic observed or measured on every unit of the universe.

The statistical population is the set of all possible values of the variable.

Measurement is the process of determining the value or label of the variable based
on what has been observed.

An observation or variate is the realized value of the variable.

Data is the collection of all observations.


Parameters are numerical measures that describe the population or universe of
interest. They are usually denoted by Greek letters: μ (mu), σ (sigma), ρ (rho),
λ (lambda), τ (tau), θ (theta), α (alpha) and β (beta).

Statistics are numerical measures of a sample.

Measure               Statistic    Parameter
Mean                  𝑥̅            𝜇
Standard Deviation    𝑠            𝜎
Variance              𝑠²           𝜎²
Proportion            𝑝            𝜌
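
The distinction in the table can be illustrated with Python's standard
statistics module. This is only a minimal sketch: the eight-value population and
the four-value sample below are hypothetical numbers chosen for illustration and
do not come from the module.

```python
import statistics

# Hypothetical population of eight values (illustration only).
population = [2, 4, 4, 4, 5, 5, 7, 9]
# Hypothetical sample: four units taken from that population.
sample = [2, 4, 5, 9]

# Parameters describe the whole population.
mu = statistics.mean(population)           # population mean
sigma2 = statistics.pvariance(population)  # population variance (divides by N)
sigma = statistics.pstdev(population)      # population standard deviation

# Statistics describe the sample only.
x_bar = statistics.mean(sample)            # sample mean
s2 = statistics.variance(sample)           # sample variance (divides by n - 1)
s = statistics.stdev(sample)               # sample standard deviation

print(f"parameters: mu={mu}, sigma^2={sigma2}, sigma={sigma}")
print(f"statistics: x_bar={x_bar}, s^2={s2:.2f}, s={s:.2f}")
```

Note that the sample variance divides by n - 1 while the population variance
divides by N, which is why the two sets of functions are kept separate.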

Kinds of Variables and Data

The building blocks of statistical science are data. They come in a diverse
range of formats, and each type gives us a unique type of information. Data
represent the measured values of variables.

When an observation unit (e.g., a person, a family or a firm) has a characteristic
that may vary from unit to unit, that characteristic is called a variable. Variables are
the characteristics or features of the things or people we are interested in. In
Psychology, for example, people are the subjects of studies, so variables include levels
of stress, anxiety and physical health. We use statistics to understand if and how
variables are related. There is a need to understand the nature of data: what they
represent and where they come from.

Variables in General

• Qualitative variables are those that express a qualitative attribute. They
  describe the quality or character of something.
• Quantitative variables are those that are measured in terms of numbers. They
  describe the amount or number of something and can be classified as either
  discrete or continuous.

Examples:

Discrete: number of apples in a box, number of students who are absent, COVID-19
cases in the Philippines

Continuous: time to respond to a stimulus, height of first-year college students,
distance traveled


A. DATA COLLECTION

Data Collection: Population and Samples


Example #1: You were hired by the Commission on Elections to examine how
Filipinos feel about the voting procedures in the Philippines. Who will you ask?

The population of a study is all of the individuals, items or units relevant to the
study. It comprises individuals, groups, organizations, documents, campaigns,
incidents and so on. It is also called the “universe”. Samples are subsets of the
population selected to represent the population.

Inferences from statistics are based on the assumption that the sample is
representative of the population. Otherwise, sampling bias may occur, in which case
the conclusions apply only to the sample and are not generalizable to the
population.

Example #2: A substitute teacher wants to know how students in the class did on
their last test. The teacher asks the 10 students sitting in the front row to state their
latest test score. He concludes from their report that the class did extremely well.
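
Example #2 describes a convenience sample: the front row may not represent the
whole class. One common alternative is a simple random sample, in which every
student has the same chance of being chosen. The sketch below assumes a
hypothetical class of 40 students identified by seat number, with seats 1 to 10
forming the front row; these numbers are not from the module.

```python
import random

# Hypothetical class roster: 40 students identified by seat number.
roster = list(range(1, 41))

# Convenience sample from Example #2: the 10 students in the front row
# (assumed here to be seats 1 to 10).
front_row = roster[:10]

# Simple random sample: every student is equally likely to be chosen.
rng = random.Random(2024)  # fixed seed so the draw can be repeated
random_sample = sorted(rng.sample(roster, k=10))

print("front row    :", front_row)
print("random sample:", random_sample)
```

A random draw does not guarantee a representative sample, but it removes the
systematic link between where a student sits and whether the student is asked.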

Levels of Measurement

The measurement process is an integral part of data collection. If the unit of analysis
is an individual person, many characteristics of that person, some visible and others
invisible, can be measured. Visible characteristics include sex, skin color, age,
height, weight, eye color and hair color. Invisible characteristics include intelligence,
prejudice, authoritarianism, alienation, paranoia, love and hate. Measurement is the
assignment of numbers to objects or events according to a predetermined set of
rules. To measure a property means to assign numbers to units as a way of
representing that property.

The kind of analysis that one can perform on the available data critically depends on
its scale of measurement or level of measurement.

Ratio Measurement level is an interval scale with the additional property that its
zero position indicates the absence of the quantity being measured. It also tells us
that one unit has so many times as much of the property as does another unit.

Properties
1. Numbers in the system are used to classify a person or an object into
distinct, non-overlapping and exhaustive categories.
2. Arrangement of the categories in the system according to magnitude.
3. There is a fixed unit of measurement representing a set size throughout the
scale.
4. There is an absolute zero (integral zero) in the system.

Example: a 5 GB data allowance compared to a 10 GB data allowance (10 GB is twice
as much as 5 GB)


Interval Measurement level is a numerical scale in which intervals have the same
interpretation throughout. It also tells us that one unit differs by a certain amount of
the property from another unit.

Properties
1. Numbers in the system are used to classify a person or an object into distinct,
non-overlapping and exhaustive categories.
2. Arrangement of the categories in the system according to magnitude.
3. There is a fixed unit of measurement representing a set size throughout the scale.

Example: IQ

Ordinal Measurement level (for ranked data) allows comparisons of the degree to
which two subjects possess the dependent variable. It also tells when one unit has
more of the property than does another unit.

Properties
1. Numbers in the system are used to classify a person or an object into distinct,
non-overlapping and exhaustive categories.
2. Arrangement of the categories in the system according to magnitude.

Examples: T-shirt sizes, employee ratings, salary grade, score in a
personality/beauty test

Nominal Measurement level (for categorical data) is the level at which one simply
names or categorizes responses. The nominal scale is the simplest scale of measurement
for variables, where a value or unit of data is assigned to one of at least two distinct
and exhaustive categories.

Property
1. Numbers in the system are used to classify a person or an object into
distinct, non-overlapping and exhaustive categories.

Examples: sex, employment status, race, marital status, religious affiliation,
language spoken at home

The scale of measurement depends mainly on the method of measurement, not on
the property measured. The weight of a tray of eggs measured in grams has an
interval/ratio scale, but if the trays are labeled as small, medium or large, the
weight is then measured on an ordinal scale.

In summary:

Type of Scale   Characteristics of Scale                      Basic Empirical Operation
Ratio           Has order, distance and unique origin         Determination of equality of ratios
Interval        Has order and distance but no unique origin   Determination of equality of intervals or differences
Ordinal         Has order but no distance or unique origin    Determination of greater or lesser values
Nominal         No order, distance or origin                  Determination of equality
Methods of Collecting Data


1. Interview Method
a. Direct Method
b. Indirect Method
2. Questionnaire Method
3. Observation Method
4. Test Method
5. Registration Method
6. Experimentation Method
7. Mechanical Devices
8. Others

B. DATA PRESENTATION

Large masses of data, by themselves, hardly give information that helps in making
decisions fast. The management of a modern business or any organization demands fast
and accurate decisions, for the organization can be ruined by competitors who are
knowledgeable in summarizing and interpreting large masses of data.

Definition of Terms
Raw data are data in their original form just as they were collected.
Constants are quantities that do not change under the same condition.
Variables are quantities that change over time and in some location.
Variates are the actual values of the variable.

Methods of Presenting Data


1. Textual
2. Tabular
3. Graphical

Tabular Presentation

Frequency Distribution is an arrangement of data which shows the frequency of
different values or groups of values of a variable.

Components of a Frequency Distribution Table

Class interval is made up of a lower limit and upper limit.


Class Frequency (f) or count is the number of observations belonging to a class
interval.
Class Mark (X) is the midpoint of the class interval obtained by averaging the lower
and upper limits.
Class Boundaries or exact limits are the true limits of the class interval obtained by
subtracting 0.5 from the lower limit and adding 0.5 to the upper limit (for data recorded
as whole numbers).
Cumulative Frequency (cf) is obtained by successively adding the class frequencies.
Relative Frequency (rf) is the ratio of the class frequency (f) to the total number of
cases (N).
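
As a quick check of these definitions, the sketch below applies them to the class
interval 10 – 18 with frequency 6 out of N = 35, the first row of the GE04 example
later in this part. The variable names are illustrative.

```python
# First class interval of the GE04 example: 10 - 18, with f = 6 and N = 35.
lower_limit, upper_limit = 10, 18
f, N = 6, 35

class_mark = (lower_limit + upper_limit) / 2   # midpoint of the interval
lower_boundary = lower_limit - 0.5             # exact lower limit
upper_boundary = upper_limit + 0.5             # exact upper limit
rf = f / N                                     # relative frequency

print(class_mark)                      # 14.0
print(lower_boundary, upper_boundary)  # 9.5 18.5
print(round(rf, 4))                    # 0.1714
```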

Array is an arrangement of raw data which may be ascending or descending.

Stem and Leaf Display or a Stem and Leaf Plot is a method of graphically presenting
quantitative data likened to a histogram that helps in the visualization of the shape
of the distribution.

Steps in the Construction of a Frequency Distribution Table


1. Determine the range.
   Range = Highest Value – Lowest Value
2. Compute n, the number of groupings, where N is the total number of observations
   and log is the base-10 logarithm.
   $n = 1 + 3.322 \log N$
3. Determine the class size, i, where R is the range.
   $i = \frac{R}{n}$
4. Use the lowest score as the lower limit of the lowest class interval and obtain
   the upper limit by adding $i - 1$ to it.
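
The four steps can be sketched as a short function. This is one possible
implementation under the module's rules, assuming a base-10 logarithm and rounding
n and i to the nearest whole number; the function name first_class_interval is an
assumption. The scores are those of the GE04 example that follows.

```python
import math


def first_class_interval(data):
    """Apply steps 1 to 4 and return (range, n, i, lower limit, upper limit)."""
    value_range = max(data) - min(data)           # step 1: R = highest - lowest
    n = round(1 + 3.322 * math.log10(len(data)))  # step 2: number of groupings
    i = round(value_range / n)                    # step 3: class size
    lower = min(data)                             # step 4: lowest score
    upper = lower + (i - 1)
    return value_range, n, i, lower, upper


scores = [10, 20, 50, 23, 21, 12, 13, 15, 24, 25, 26, 23,
          24, 25, 28, 24, 56, 20, 10, 32, 30, 31, 13, 25,
          65, 45, 51, 42, 35, 65, 36, 28, 35, 40, 60]

print(first_class_interval(scores))  # (55, 6, 9, 10, 18)
```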

Example. Scores of students in a GE04 test

10 20 50 23 21 12 13 15 24 25 26 23
24 25 28 24 56 20 10 32 30 31 13 25
65 45 51 42 35 65 36 28 35 40 60

Solution:

Array of scores
10 10 12 13 13 15 20 20 21 23 23 24
24 24 25 25 25 26 28 28 30 31 32 35
35 36 40 42 45 50 51 56 60 65 65

Stem and Leaf Display
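
The display itself is not reproduced in this text. A minimal sketch of how one
could be generated from the scores is given below, assuming the tens digit is used
as the stem and the units digit as the leaf; the variable names are illustrative.

```python
from collections import defaultdict

scores = [10, 20, 50, 23, 21, 12, 13, 15, 24, 25, 26, 23,
          24, 25, 28, 24, 56, 20, 10, 32, 30, 31, 13, 25,
          65, 45, 51, 42, 35, 65, 36, 28, 35, 40, 60]

# Stem = tens digit, leaf = units digit; sorting first keeps leaves in order.
stems = defaultdict(list)
for score in sorted(scores):
    stems[score // 10].append(score % 10)

for stem in sorted(stems):
    leaves = " ".join(str(leaf) for leaf in stems[stem])
    print(f"{stem} | {leaves}")
```

Each printed row plays the role of a histogram bar: the longer the row of leaves,
the more scores fall in that group of ten.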


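1. Range = 65 – 10 = 55
2. $n = 1 + 3.322 \log N = 1 + 3.322 \log 35 = 6.13 \approx 6$
3. $i = \frac{R}{n} = \frac{55}{6} = 9.17 \approx 9$
4. lower limit of the lowest class interval: 10 (lowest score)
   upper limit of the lowest class interval: $10 + (9 - 1) = 18$
   Six intervals of size 9 end at 63, so a seventh interval (64 – 72) is needed to
   include the highest score, 65.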

Frequency distribution table of the scores of students in a GE04 test

Class Interval   f    X    Class Boundaries   <cf   >cf   rf       rp (%)
10 – 18          6    14   9.5 – 18.5         6     35    0.1714   17.14
19 – 27          12   23   18.5 – 27.5        18    29    0.3429   34.29
28 – 36          8    32   27.5 – 36.5        26    17    0.2286   22.86
37 – 45          3    41   36.5 – 45.5        29    9     0.0857   8.57
46 – 54          2    50   45.5 – 54.5        31    6     0.0571   5.71
55 – 63          2    59   54.5 – 63.5        33    4     0.0571   5.71
64 – 72          2    68   63.5 – 72.5        35    2     0.0571   5.71
                 N = 35
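
The columns of the table can be recomputed from the raw scores. The sketch below is
one way to do so, taking the class size i = 9 and the lowest score 10 from the
solution; the dictionary keys are illustrative names.

```python
scores = [10, 20, 50, 23, 21, 12, 13, 15, 24, 25, 26, 23,
          24, 25, 28, 24, 56, 20, 10, 32, 30, 31, 13, 25,
          65, 45, 51, 42, 35, 65, 36, 28, 35, 40, 60]

N = len(scores)
class_size = 9        # i, from step 3 of the solution
lower = min(scores)   # lower limit of the lowest class interval

rows = []
less_cf = 0
while lower <= max(scores):
    upper = lower + class_size - 1
    f = sum(lower <= s <= upper for s in scores)  # class frequency
    less_cf += f                                  # running total from the first class
    greater_cf = N - (less_cf - f)                # this class and all classes above it
    rows.append({
        "interval": (lower, upper),
        "f": f,
        "X": (lower + upper) / 2,
        "boundaries": (lower - 0.5, upper + 0.5),
        "less_cf": less_cf,
        "greater_cf": greater_cf,
        "rf": round(f / N, 4),
    })
    lower = upper + 1

for row in rows:
    print(row)
```

The loop keeps adding intervals until the highest score is covered, which is why it
produces seven rows even though the formula in step 2 gives n = 6.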

Graphical Presentation
1. Line Graph
2. Bar Graph
3. Circle Graph or Pie Graph
4. Pictograph or Picture Graph
5. Scatter Plot or Scattergram
6. Statistical maps

Graphical Presentations of Frequency Distribution


1. Frequency Histogram is a special kind of bar graph where the bars are placed
adjacent to each other.
x – axis: lower boundaries
y – axis: f
2. Frequency Polygon is a special kind of line graph obtained by plotting the
frequencies against the midpoints; the line starts and ends on the x – axis.
x – axis: class marks or midpoints
y – axis: f


3. Ogive is the graph of the cumulative frequency distribution.
x – axis: lower boundaries
y – axis: cf
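
The points to plot for each of the three graphs can be derived from the frequency
distribution table. The sketch below uses the class limits and frequencies of the
GE04 table and only computes coordinates; drawing is left to graphing paper or any
plotting tool. Two choices here are assumptions of the sketch: the polygon is
closed by adding one empty class mark on each side, and the ogive pairs the lower
boundaries with the "greater than" cumulative frequencies (>cf). A "less than"
ogive is commonly drawn against the upper boundaries instead.

```python
# Class limits and frequencies from the GE04 frequency distribution table.
intervals = [(10, 18), (19, 27), (28, 36), (37, 45), (46, 54), (55, 63), (64, 72)]
freqs = [6, 12, 8, 3, 2, 2, 2]
class_size = 9

class_marks = [(lo + hi) / 2 for lo, hi in intervals]
lower_boundaries = [lo - 0.5 for lo, _ in intervals]

# Histogram: one bar per class, starting at its lower boundary, height f.
histogram_bars = list(zip(lower_boundaries, freqs))

# Frequency polygon: class marks against f, closed on the x-axis by adding
# one empty class mark before the first class and one after the last.
polygon_points = (
    [(class_marks[0] - class_size, 0)]
    + list(zip(class_marks, freqs))
    + [(class_marks[-1] + class_size, 0)]
)

# "Greater than" ogive: lower boundaries against >cf.
greater_cf = [sum(freqs[k:]) for k in range(len(freqs))]
ogive_points = list(zip(lower_boundaries, greater_cf))

print(histogram_bars)
print(polygon_points)
print(ogive_points)
```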

Examples


Example

The following scores represent the final examination grade for an elementary
statistics course:

22 60 49 32 57 74 52 70 82 36
80 77 81 95 41 65 92 85 55 76
52 10 64 75 78 25 80 98 81 67
41 71 83 54 64 72 88 62 74 43
60 78 89 76 84 48 84 90 15 79
34 67 17 82 69 74 63 80 85 61

1. Construct a frequency distribution table to include columns for f, X, class
boundaries and cf (< and >).
2. Construct the histogram, frequency polygon and ogive using the data.

Solution:
1. FDT
a. Range = 98 – 10 = 88
b. $n = 1 + 3.322 \log 60 = 6.9 \approx 7$
c. $i = \frac{88}{7} = 12.57 \approx 13$
d. Upper limit of the lowest class interval: $10 + (13 - 1) = 22$

You can make an array of the scores using Excel.
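
As an alternative to a spreadsheet, the array and the class frequencies can be
computed with a few lines of Python. This sketch takes the class size i = 13 and the
lowest score 10 from the solution above; the resulting counts are computed by the
code and are not copied from the module, whose table is not reproduced in this text.

```python
from collections import Counter

grades = [22, 60, 49, 32, 57, 74, 52, 70, 82, 36,
          80, 77, 81, 95, 41, 65, 92, 85, 55, 76,
          52, 10, 64, 75, 78, 25, 80, 98, 81, 67,
          41, 71, 83, 54, 64, 72, 88, 62, 74, 43,
          60, 78, 89, 76, 84, 48, 84, 90, 15, 79,
          34, 67, 17, 82, 69, 74, 63, 80, 85, 61]

array = sorted(grades)  # ascending array of scores
lowest, class_size = array[0], 13

# Index of the class each grade falls in: 0 for 10-22, 1 for 23-35, and so on.
counts = Counter((g - lowest) // class_size for g in grades)

less_cf = 0
for k in sorted(counts):
    lower = lowest + k * class_size
    upper = lower + class_size - 1
    less_cf += counts[k]
    mark = (lower + upper) / 2
    print(f"{lower:3d} - {upper:3d}  f={counts[k]:2d}  X={mark}  cumulative={less_cf}")
```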


2. Histogram, frequency polygon and ogive (figures not reproduced)


Assessment

ASSIGNMENT: The following data give the hours worked last week by employees
of a company.
42 48 42 45 53 34 40 23 21 38
51 40 35 42 31 47 48 34 40 40
16 27 36 39 39 51 41 43 40 36
25 27 57 28 52 45 25 26 39 41
60 52 54 35 27 46 10 22 36 30
25 22 20 52 41 33 58 60 25 26

a) Construct a frequency distribution table. Include in the table f, class marks
or midpoints (X), cf (< & >).
b) Draw the histogram, frequency polygon and ogive using the given data
(you may use graphing paper for the 3 graphs – one graph each).


References
E. M. Adina & R. T. Earnhart. Mathematics in the Modern World Second Generation
Training. Mapua Institute of Technology. 2017

Training Manual on Teaching Basic Statistics. Philippine Statistical Association
(PSA) & Statistical Research & Training Center (SRTC). 2007.

Training on Teaching Basic Statistics for Tertiary Level Teachers Summer 2008
Elementary Statistics: A Handbook of Slide Presentation prepared by Z.V.J. Albacea,
C. E. Reano, R. V. Collado, L. N. Cornia and N. A. Tandang. Institute of Statistics, CAS,
UP Los Baños, Laguna. 2005

Foster, Garett C.; Lane, David; Scott, David; Hebl, Mikki; Guerra, Rudy; Osherson,
Dan; and Zimmer, Heidi, "An Introduction to Psychological Statistics" (2018). Open
Educational Resources Collection.
