
UNIVERSITY OF MINDANAO

College of Arts and Sciences Education


General Education - Mathematics

Physically Distanced but Academically Engaged

Self-Instructional Manual (SIM) for Self-Directed Learning (SDL)

Course/Subject: GE 4 – Mathematics in the Modern World


(Week 4 – 5)

Name of Teacher: Louie Resti S. Rellon

SIM Prepared by: Prof. Ronnie O. Alejan

THIS SIM/SDL MANUAL IS A DRAFT VERSION ONLY.


THIS IS INTENDED ONLY FOR THE USE OF THE
STUDENTS WHO ARE OFFICIALLY ENROLLED
IN THE COURSE/SUBJECT.

THIS IS NOT FOR REPRODUCTION, COMMERCIAL USE, OR
DISTRIBUTION OUTSIDE OF ITS INTENDED USE.
EXPECT REVISIONS OF THE MANUAL.
College of Arts and Sciences Education
General Education - Mathematics
2nd Floor, DPT Building, Matina Campus, Davao City
Phone No.: (082)300-5456/305-0647 Local 134

Week 4-5: Unit Learning Outcomes (ULO):


At the end of the unit, you are expected to
a. Summarize and interpret data graphically and numerically; and
b. Apply the concepts and procedure of correlation and regression
analyses.

Big Picture in Focus


ULO-a. Summarize and interpret data graphically and numerically.

Metalanguage

In this section, the essential terms relevant to the study of data management
and to demonstrating ULO-a are operationally defined to establish a common frame
of reference. You will encounter these terms as we go through the study of data
management. Please refer to these definitions whenever you have difficulty
understanding a concept.

1. Statistics provides the tools through which data are collected,
analyzed, and presented to arrive at rich and interesting information.
These tools, which are derived from mathematics, are useful in processing
and managing numerical data to describe a phenomenon and predict
values.

2. Descriptive statistics is a division of statistics in which a researcher uses
data gathered from a group to describe or reach conclusions about that
same group.

3. Inferential statistics is a division of statistics in which a researcher gathers
data from a sample and uses the statistics generated to reach conclusions
about the population from which the sample was drawn.

4. Population generally consists of the totality of the observations,
individuals, or objects in which the investigator is interested. One should
not start collecting data without carefully defining the population to be
considered in the study.

5. Sample is a portion of a population. It is a small but representative cross
section of the population, used to make inferences about the population
from which it was drawn.


Essential Knowledge

To achieve the unit learning outcomes for the fourth and fifth weeks of the
course, you need to fully understand the essential knowledge laid down in the
succeeding pages. Please note that you are not limited to these resources.
You are expected to use other books, research articles, and other resources
available in the university’s library, e.g., ebrary, search.proquest.com, etc.

1. Data is a set of values collected from the variable from each of the subjects
that belong to the sample. It refers to a collection of natural phenomena
descriptors such as results from experiences, observations or experiments, or
a set of premises. It may consist of numbers, words, or images. A collection
of data values forms a data set. Each value in the data set is called a data
value or a datum.
Data can be classified according to the type of variable for which it was
drawn. There are two general types of data according to how the data vary
across cases:
1.1 Quantitative data – these are data that are usually expressed in
numerical values or obtained by counting or measuring. It can be classified as
discrete data and continuous data.
Discrete data are count data or data obtained from counting.
Examples are the number of children in a family, the number of bicycles sold,
the number of sentences in a paragraph, and number of crimes recorded in a
police station.
Continuous data are also called measurement data because they are
obtained through direct or indirect measuring. Examples are blood pressure
of a person, total land area, weight of an object, and scores in an intelligence
test. Note that not all numeric data are quantitative. Some numbers are
mere labels or names, such as ID numbers and SSS numbers. These are
numeric but are considered qualitative data.
1.2 Qualitative data – also called categorical data or classificatory data.
These are not expressed in numerical values but rather are classified
according to kind or characteristic by which they differ. These data are merely
labeled and classified into categories of statistical analysis. Examples are
gender, nationality, religious affiliation, occupation, and program.

2. Levels of Data Measurement


Millions of statistical data are gathered every day. These data should
not be analyzed the same way statistically because the entities represented by
the numbers are different. For this reason, statisticians and researchers need
to know the level of data measurement represented by the numbers being
analyzed.
There are four common levels of data measurement:
2.1 Nominal level – is the lowest level of data measurement. The numbers
representing nominal data are used only to identify or classify. These numbers
serve as labels and have no meaning attached to their magnitude.


Examples are the ID number of a student, the numbers on the uniform jerseys of
basketball players, and the plate numbers of vehicles.

2.2 Ordinal level – is higher than the nominal level. The numbers are not
only used to classify items but also reflect some rank or order of the individuals,
items or objects. It indicates that objects in one category are not only different
from those in the other categories of the variable but they may also be ranked
as either higher or lower, bigger or smaller, better or worse than those in the
other categories. Examples are ranks given to the winners in a singing contest,
hotel classifications, and military ranks.

2.3 Interval level – is the second-highest level of data measurement. The
measurements have all the properties of ordinal data, but in addition the
distances between consecutive numbers have meaning. The measurement
units are equal, which allows us to determine how far apart two persons or
things are.
In addition, the zero point value on this level is arbitrary. That is, zero is
just another point on the scale and does not mean the absence of the
phenomenon. Examples are temperature reading in Celsius scale, scores in
intelligence tests, and scholastic grade of a student.

2.4 Ratio level – is the highest level of data measurement. It has the same
properties as interval level but the zero-point value of this level is absolute; that
is, the zero value represents the absence of the characteristic being studied.
Examples are height, weight, time, and volume.

Nominal data are the most limited data in terms of the types of statistical
analysis that can be used with them. Ordinal data allow the researcher to
perform any analysis that can be done with nominal data and some additional
analyses. With ratio data, a statistician can make ratio comparisons and
appropriately do any analysis that can be performed on nominal, ordinal, or
interval data. Some statistical techniques require ratio data and cannot be
used to analyze other levels of data.

3. Methods of Data Collection


Though there are several techniques of collecting data, there is no
generally best method that can be used to obtain the desired information from
the subjects under investigation. The choice of what method to use depends
on the following factors: nature of the problem, the population under
investigation, the time, and the resources.
The following are the methods of data gathering that you can choose
from. You can also combine any of these methods to obtain the needed
accurate information at minimum cost and in the least possible time.

3.1 Survey is one of the most familiar methods of collecting data. An
important aspect of surveys is the response rate. The response rate is the
proportion of all people selected for the survey who actually complete it. A
survey can be done in different ways. Three of the most common methods are
the telephone survey, the mailed questionnaire, and the personal interview.


3.2 Direct observation is the simplest method of obtaining data. In this
method, data are gathered regarding the behavior, attitudes, values, or cultural
patterns of the individuals or organizations under investigation.

3.3 Experiment method is a more expensive but better way to produce data.
It is used to gather data when the objective of the investigator is to determine
the cause-and-effect relationship of certain phenomena or variables under
controlled conditions.

3.4 Registration method produces what is also called secondary data. In this
method, the respondents give information in compliance with or as enforced by
certain laws, policies, rules, regulations, decrees, or standard practices. The
data are kept systematized and made available to all because of the
requirements of the law.

4. Methods of Data Presentation


Data that are collected must be organized and presented effectively for
analysis and interpretation. It can be presented in different forms as follows:

4.1 Textual presentation presents data in paragraph form, combining
text and figures. This is often the case with news items in business, finance,
economics, or the industries which are ordinarily published in the business,
trade or finance sections of local periodicals. The writer can emphasize the
importance of some figures or can call attention to specific data such as
comparisons, contrasts, syntheses, generalizations or findings. This method,
when employed alone, can bore the reader. It is not recommended for showing
quantitative comparisons or relations among numerical data.

4.2 Tabular presentation presents data in tables. Tabulation is a process
of condensing classified data and arranging them in a table where data can
readily be understood and comparisons can be done more easily. This method
is more effective in showing relationships or comparisons of numerical data. It
gives a more precise, systematic and orderly presentation of data in rows and
columns. It makes comparison of figures easy and comprehensible. In
general, tabular presentation is briefer than the textual method. It also
facilitates analysis of relationships between and among collected data since
these data are systematically arranged. This systematic arrangement is called
a statistical table.

4.3 Graphical presentation is the most effective method of presenting
statistical results and can present clear pictures of numerical data.
Presentation of facts is made attractive and meaningful when pictures are
used, making it easy for readers to grasp important information.

There are several kinds of graphs and these are as follows:


a) Bar Graph consists of bars or heavy lines of equal widths, either all
vertical or all horizontal. Bar graphs are constructed for comparative


purposes. The lengths of the bars represent the magnitudes of the
quantities being compared. Special cases of bar graphs are the
compound bar graph and the component bar chart.

b) Line Graph is another tool for the graphical presentation of data. It
shows the relationship between two or more sets of quantities. It
may show the relationship between two variables and it is best used
if you want to establish trends.

c) Pie Chart is used to represent quantities that make up a whole. It
is a circular diagram cut into subdivisions. The size of each section
indicates the proportion of each component part of the whole. The
pie chart can be constructed using percent or the actual figures. The
slices of the pie must be drawn in proportion to the different values
of each item. The proportion is then converted to degrees using the
relationship that the 360º of a circle represent the total of the
items (100%), so that one percent is represented by 3.6º on the
chart.


d) Pictogram is sometimes known as a picture graph. It uses picture
symbols to represent values. The picture drawn should fit the data
being presented. For example, to represent population
statistics, pictures of persons are drawn; to represent numerical
data on house construction, pictures of houses are drawn.
However, this type of graph has a disadvantage: readers have some
difficulty in estimating partial figures. To correct this, write the
corresponding numerical value together with the picture.

e) Map Graph or Cartogram is used to present geographical data. A
map is drawn and divided into the desired regions. Each region is
distinguished from other regions by using varied lines, colors, or
other symbols like pins. A legend always accompanies a map graph
which tells the meaning of the lines, colors, or other symbols.
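
The conversion from proportions to degrees described for the pie chart can be sketched in Python. This is only an illustration: the function name `pie_angles` and the budget figures are hypothetical and do not come from the manual.

```python
def pie_angles(values):
    """Convert raw values to the central angles (in degrees) of a pie chart."""
    total = sum(values)
    # Each item receives the same share of 360 degrees as its share of the total.
    return [v * 360 / total for v in values]

# Hypothetical data: three budget items expressed as percentages of the whole.
budget = [50, 30, 20]
angles = pie_angles(budget)
print(angles)  # each one percent corresponds to 3.6 degrees
```

Because the function divides by the total, it works equally with percentages or with the actual figures, as the text notes.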

5. Measures of Center
One type of measure used to describe a set of data is the
measure of central tendency, which yields information about the center
of a group of numbers. It is a single value that stands for or represents
a group of values in the data set. The most common measures are the mean,
median, and mode.


5.1 Mean (denoted by $\bar{x}$) or arithmetic mean is synonymous with the average
of a group of numbers. It is the sum of all given values in a distribution
divided by the number of values that were summed.
It is written mathematically as
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$
where $x_i$ = individual value
n = total number of values

Example.
The following are the scores in a quiz by ten students in Algebra. Find the
mean score of the data set.
5 12 20 16 15 23 10 18 7 11

Solution.
From the given data set, n = 10.
Solve for the mean.
$$\bar{x} = \frac{5 + 12 + 20 + 16 + 15 + 23 + 10 + 18 + 7 + 11}{10} = \frac{137}{10}$$
$$\bar{x} = 13.7$$
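
The same computation can be checked with a short Python sketch. It uses the quiz scores from the example; the variable names are illustrative only.

```python
from statistics import mean

# Quiz scores of the ten Algebra students from the example.
scores = [5, 12, 20, 16, 15, 23, 10, 18, 7, 11]

n = len(scores)                   # total number of values
mean_manual = sum(scores) / n     # sum of all values divided by n
mean_library = mean(scores)       # same result from the standard library

print(mean_manual, mean_library)
```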

Weighted Mean
Sometimes, in the computation of the mean of a data set, each value in
the data set is associated with a certain weight or degree of importance. In
such cases, the weighted mean is computed.
The weighted mean of a set of values can be computed by multiplying
each value by its corresponding weight, taking the sum of the products,
and then dividing by the sum of the weights. Mathematically, it is written as
$$\bar{x}_w = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}$$
where $x_i$ = individual value
$w_i$ = weight of each value

Example.
The final grades of a student in six courses were taken and are shown below.
Compute the student’s weighted mean grade.

Course        No. of Units   Final Grade
Math 112      3              2.5
English 101   6              2.0
PS 25         3              1.5
Fil 1         3              1.4
Chem 1        5              2.4
PE 1          2              1.1


Solution.
Solve for the weighted grade (wx) of each course.

Course        No. of Units (w)   Final Grade (x)   wx
Math 112      3                  2.5               7.50
English 101   6                  2.0               12.00
PS 25         3                  1.5               4.50
Fil 1         3                  1.4               4.20
Chem 1        5                  2.4               12.00
PE 1          2                  1.1               2.20
Total         Σw = 22                              Σ(wx) = 42.40

Thus, the weighted mean is
$$\bar{x}_w = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i} = \frac{42.40}{22}$$
$$\bar{x}_w = 1.93$$
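
As a cross-check, the weighted mean of the six courses can be sketched in Python. The tuple layout (course, units, grade) is simply one convenient way to hold the table and is an assumption of this sketch.

```python
# (course, number of units w, final grade x) taken from the example table.
grades = [
    ("Math 112", 3, 2.5),
    ("English 101", 6, 2.0),
    ("PS 25", 3, 1.5),
    ("Fil 1", 3, 1.4),
    ("Chem 1", 5, 2.4),
    ("PE 1", 2, 1.1),
]

total_weight = sum(w for _, w, _ in grades)      # sum of the weights
weighted_sum = sum(w * x for _, w, x in grades)  # sum of the products wx
weighted_mean = weighted_sum / total_weight

print(total_weight, round(weighted_sum, 2), round(weighted_mean, 2))
```

Note that the divisor is the sum of the units (22), not the number of courses (6).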

5.2 Median (denoted by $\tilde{x}$) is the middlemost value in the data set. It divides
the given distribution into two equal parts.

Example.
Find the median of the following set of measurements.

25 41 56 34 28 67 49 37 52
Solution.
Arrange the data in ascending order.

25 28 34 37 41 49 52 56 67

Locate the middlemost value. With nine values, the middlemost is the 5th
value, which is the median.

$$\tilde{x} = 41$$

Example.
Find the median of the given data set.
4.5 2.8 5.6 9.2 3.5 6.7 3.9 8.4

Solution.
Arrange the data in ascending order.

2.8 3.5 3.9 4.5 5.6 6.7 8.4 9.2

Locate the middlemost value.

In this case, there are two middle values in the distribution. Obtain the
average of the middle values; this average is the median of the distribution.
$$\tilde{x} = \frac{4.5 + 5.6}{2}$$
$$\tilde{x} = 5.05$$
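
The two cases (odd and even number of values) can be sketched in Python as follows. The helper name `median_manual` is hypothetical; the standard library's `statistics.median` follows the same rule and is shown for comparison.

```python
from statistics import median

def median_manual(values):
    """Middlemost value; average of the two middle values when n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

odd_set = [25, 41, 56, 34, 28, 67, 49, 37, 52]
even_set = [4.5, 2.8, 5.6, 9.2, 3.5, 6.7, 3.9, 8.4]

print(median_manual(odd_set), median(odd_set))
print(round(median_manual(even_set), 2), round(median(even_set), 2))
```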


5.3 Mode (denoted by $\hat{x}$) is the value in a frequency distribution which occurs
most frequently or has the highest frequency. It is the value that occurs most
often.

Example.
Find the mode of the following data set.
a. 12 15 13 12 14 17 16 12 13 19
b. 3.4 2.2 3.5 3.4 2.2 2.6 2.1 3.9 2.2 3.4
c. 105 200 159 110 225 170 115 250 285 190

Solution.
a. In the first data set, 12 has the highest frequency in the distribution;
therefore, the mode is
$$\hat{x} = 12$$

b. In the second data set, two values share the highest frequency; therefore,
there are two modes and the distribution is called bimodal. The modes
are
$$\hat{x}_1 = 3.4 \text{ and } \hat{x}_2 = 2.2$$

c. In the third data set, every value occurs only once, so no value occurs
most often; therefore, there is NO mode in the distribution.

$\hat{x}$ does not exist.
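
A minimal Python sketch of the three cases above is shown below. The helper name `modes` is hypothetical, and treating "all values equally frequent" as "no mode" is the convention used in this manual's example (c), not a universal rule.

```python
from collections import Counter

def modes(values):
    """Return the list of modes; an empty list means there is no mode."""
    counts = Counter(values)
    # If several distinct values all occur equally often, none stands out.
    if len(counts) > 1 and len(set(counts.values())) == 1:
        return []
    highest = max(counts.values())
    return sorted(v for v, c in counts.items() if c == highest)

set_a = [12, 15, 13, 12, 14, 17, 16, 12, 13, 19]
set_b = [3.4, 2.2, 3.5, 3.4, 2.2, 2.6, 2.1, 3.9, 2.2, 3.4]
set_c = [105, 200, 159, 110, 225, 170, 115, 250, 285, 190]

print(modes(set_a))  # one mode
print(modes(set_b))  # two modes (bimodal)
print(modes(set_c))  # no mode
```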

Example.
Compare the mean, the median, and the mode for the salaries of 5 employees
of a small grocery store. Which averages could best represent the salaries of
the employees?

Salaries: P25,000 P10,000 P5,000 P3,000 P3,000

Solution.
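Computing the mean, median and mode of the salaries of the employees, we get
Mean = P9,200
Median = P5,000
Mode = P3,000

The median of P5,000 better represents the average of the salaries than does
either the mean or the mode. The mean is pulled upward by the single salary of
P25,000, while the mode is the lowest salary in the set.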
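
The comparison can be reproduced with Python's `statistics` module. The last line is an extra illustration, not part of the manual's solution: it drops the largest salary to show how strongly one extreme value affects the mean.

```python
from statistics import mean, median, mode

salaries = [25000, 10000, 5000, 3000, 3000]

salary_mean = mean(salaries)
salary_median = median(salaries)
salary_mode = mode(salaries)

# Illustration only: the mean of the four lowest salaries.
mean_without_top = mean(sorted(salaries)[:-1])

print(salary_mean, salary_median, salary_mode, mean_without_top)
```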

6. Measures of Dispersion
The measures of central tendency give information about the center of a
data set. Such descriptions, however, do not adequately describe the
characteristics of the distribution. To do this, we need to compute the degree
of dispersion of the values from the average. These measures are called the
measures of dispersion or variability. They describe how spread out the
individual values are from the average. Among these measures are the range,
variance and standard deviation.


6.1 Range is the simplest and the easiest to compute among the measures
of dispersion, but it is also the most unstable and the most unreliable measure
because it is easily affected by extreme values. It is the difference
between the highest value (HV) and the lowest value (LV) in the distribution.
R = HV – LV
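
A short Python sketch of the range follows, reusing the ten quiz scores from the mean example. The added score of 60 is hypothetical and is included only to show how a single extreme value changes the range.

```python
scores = [5, 12, 20, 16, 15, 23, 10, 18, 7, 11]

# R = HV - LV
data_range = max(scores) - min(scores)

# Hypothetical extreme value appended to the same data set.
with_outlier = scores + [60]
range_with_outlier = max(with_outlier) - min(with_outlier)

print(data_range, range_with_outlier)
```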

6.2 Variance and standard deviation


Variance is the average of the squared deviations of the values from the
arithmetic mean. Values below the mean produce negative deviations and
values above it produce positive ones, so the deviations always sum to zero.
Squaring each deviation before averaging overcomes this zero-sum
property of deviations from the mean.
The population variance is denoted by $\sigma^2$ and can be obtained using the
formula
$$\sigma^2 = \frac{\sum (x - \mu)^2}{N}$$
where x = individual value
μ = population mean
N = population size

Standard deviation is the square root of the variance. It is the most popular
and most reliable measure of variability and is expressed in the same units as
the raw data, unlike the variance, which is expressed in squared units. The
population standard deviation is denoted by σ and can be computed as follows
$$\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum (x - \mu)^2}{N}}$$

Example.
A sample of six street vendors along San Pedro St. was surveyed, and
their average daily incomes were obtained as follows.

₱560 ₱320 ₱440 ₱650 ₱200 ₱490

Compute the variance and standard deviation of their income.

Solution.
Arrange the data in a column.

Income (x)    $x - \bar{x}$    $(x - \bar{x})^2$
200           -243.33          59,209.49
320           -123.33          15,210.29
440           -3.33            11.09
490           46.67            2,178.09
560           116.67           13,611.89
650           206.67           42,712.49
Σx = 2660                      $\sum (x - \bar{x})^2$ = 132,933.34


Compute for the mean.
$$\bar{x} = \frac{\sum x}{n} = \frac{2660}{6} = 443.33$$

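Compute for the variance. Because the six vendors are a sample, the sum of
squared deviations is divided by n − 1 rather than by the population size N.
$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{132{,}933.34}{6 - 1}$$
$$s^2 = 26{,}586.67$$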

Compute for the standard deviation.
$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{26{,}586.67} = 163.05$$
Therefore, the sample variance is 26,586.67 (in squared pesos) and the sample
standard deviation is ₱163.05.
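
The column-by-column solution above can be sketched in Python. The variable names are illustrative; `statistics.variance` and `statistics.stdev` are the standard library's sample (n − 1) versions and are used here only as a cross-check.

```python
from math import sqrt
from statistics import mean, stdev, variance

incomes = [560, 320, 440, 650, 200, 490]  # daily incomes in pesos

n = len(incomes)
x_bar = mean(incomes)                                # sample mean
squared_devs = [(x - x_bar) ** 2 for x in incomes]   # (x - x_bar)^2 column
s2 = sum(squared_devs) / (n - 1)                     # sample variance
s = sqrt(s2)                                         # sample standard deviation

print(round(x_bar, 2), round(s2, 2), round(s, 2))
print(round(variance(incomes), 2), round(stdev(incomes), 2))
```

The sum of squared deviations computed without intermediate rounding is 132,933.33; the table's 132,933.34 differs only because each deviation was rounded to two decimals first.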

7. Measures of Relative Position


In addition to measures of central tendency and measures of
dispersion, there are measures of position, which are used to locate the relative
position of a value in the data set. Some of these measures are percentiles,
quartiles and standard scores.

7.1 Percentiles (denoted by $P_k$) are measures of relative position that divide
the distribution into 100 equal parts. The kth percentile is the value such that
about k percent of the data are below that value and (100 – k) percent are
above that value.
Percentiles are also used to compare an individual’s test score with
some norm. For example, tests such as the National Secondary Achievement
Test (NSAT) are taken by high school students. A student’s scores are
compared with those of other students locally and nationally using percentile
ranks.
Percentiles are not the same as percentages. If a student gets 75
correct answers out of 100 items in an examination in his class, then he obtains
a percentage score of 75. But this will not tell his position with respect to the
rest of his class. His score could be the lowest, the highest, or somewhere in
between. But if his score of 75 corresponds to the 70th percentile, then he did
better than 70% of the students in his class.
To approximate the percentile rank of a value x in the distribution, we
have
$$\text{Percentile} = \frac{(\text{number of values below } x) + 0.5}{\text{total number of values}} \times 100$$

Example.
A 30-point quiz was given to 10 students and the scores are shown below.
What is the percentile rank of 24?

23 25 19 21 28 15 20 24 22 27


Solution.
Arrange the data in ascending order.
15 19 20 21 22 23 24 25 27 28
There are 6 values below 24.
Determine the percentile using the formula.
$$\text{Percentile} = \frac{6 + 0.5}{10} \times 100 = 65$$
The score of 24 is at the 65th percentile.
This means that a student with a score of 24 did better than 65% of the class.
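
The percentile-rank approximation can be written as a small Python function. The name `percentile_rank` is hypothetical; the formula is the one given in the text.

```python
def percentile_rank(values, x):
    """Approximate percentile rank: (values below x + 0.5) / total * 100."""
    below = sum(1 for v in values if v < x)
    return (below + 0.5) * 100 / len(values)

scores = [23, 25, 19, 21, 28, 15, 20, 24, 22, 27]
rank_of_24 = percentile_rank(scores, 24)

print(rank_of_24)
```

The data do not need to be sorted first, since the function only counts how many values fall below x.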

7.2 Quartiles (denoted by $Q_q$) are positional measures that divide the
distribution into four equal parts. The dividing points are the first quartile
(Q1), second quartile (Q2) and third quartile (Q3). The first quartile separates
the lowest one-fourth of the distribution from the upper three-fourths and is
equal to the 25th percentile; the second quartile separates the lower half of
the distribution from the upper half and is equal to the 50th percentile and
also to the median of the distribution; the third quartile separates the lower
three-fourths of the distribution from the upper one-fourth and is equal to the
75th percentile.
Quartiles can be obtained by first arranging the data set in ascending
order. Next, determine the median of the distribution; that median is the
value of Q2. Then determine the median of the values in the lower half of the
distribution to get Q1. Finally, determine the median of the values in the
upper half of the distribution to get Q3.

Example.
Find the value of Q1, Q2, and Q3 of the following scores of students in a class.
20 15 10 29 30 19 12 26 24 18

Solution.
Arrange the data in ascending order.
10 12 15 18 19 20 24 26 29 30

Determine Q2 which is the median of the distribution. With ten values, the
median is the average of the 5th and 6th values, 19 and 20.
$$Q_2 = \frac{19 + 20}{2}$$
$$Q_2 = 19.5$$
This means that 50% of the students in the class got a score of 19.5 or less.

Determine Q1 which is the median of the lower half of the distribution
(10 12 15 18 19).
$$Q_1 = 15$$
This means that 25% of the students obtained a score of 15 or below.

Determine Q3 which is the median of the upper half of the distribution
(20 24 26 29 30).
$$Q_3 = 26$$
This indicates that 75% of the students got a score of 26 or below.
Equivalently, this means that 25% of the class got a score higher than 26.
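
The median-of-halves procedure can be sketched in Python as below. The function name `quartiles` is hypothetical. The manual does not say how to split an odd number of values; leaving the median out of both halves is an assumption of this sketch. Library routines such as `statistics.quantiles` use interpolation rules and may give slightly different values.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median of the lower and upper halves."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # Assumption: for odd n, the middle value is excluded from both halves.
    upper = ordered[half:] if n % 2 == 0 else ordered[half + 1:]
    return median(lower), median(ordered), median(upper)

scores = [20, 15, 10, 29, 30, 19, 12, 26, 24, 18]
q1, q2, q3 = quartiles(scores)

print(q1, q2, q3)
```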


7.3 Standard score or z-score is the number of standard deviations that a
value is above or below the mean of the data set. Observed values above the
mean have positive z-scores while values below the mean have negative
z-scores. The standard score or z-score can be computed using the following
formulas.

Population:
$$z = \frac{x - \mu}{\sigma}$$
where x = observed value
μ = population mean
σ = population standard deviation

Sample:
$$z = \frac{x - \bar{x}}{s}$$
where x = observed value
$\bar{x}$ = sample mean
s = sample standard deviation

Example.
Johnny scored 72 in a quiz in Algebra for which the average score of the class
was 65 with a standard deviation of 8. He also took a quiz in Statistics and
scored 60 for which the average score of the class was 45 and the standard
deviation was 12. Relative to other students in the class, did Johnny do better
in Algebra or Statistics?
Solution.
Compute the z-scores of Johnny’s scores for each quiz.

For Algebra,
$$z_{72} = \frac{72 - 65}{8} = 0.875$$
For Statistics,
$$z_{60} = \frac{60 - 45}{12} = 1.25$$

In Algebra, Johnny scored 0.875 standard deviations above the mean. In
Statistics, he scored 1.25 standard deviations above the mean. Since
1.25 > 0.875, relative to his classmates Johnny did better in Statistics than
in Algebra, even though his raw score in Algebra was higher.
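
The comparison can be sketched in Python. The helper name `z_score` is hypothetical; it implements the formula given above.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

z_algebra = z_score(72, 65, 8)
z_statistics = z_score(60, 45, 12)

better_subject = "Statistics" if z_statistics > z_algebra else "Algebra"
print(z_algebra, z_statistics, better_subject)
```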

8. Normal Distribution
A normal distribution is a very important statistical data distribution
pattern occurring in many natural phenomena, such as height, blood pressure,
lengths of objects produced by machines, etc. Certain data, when graphed as
a histogram (data values on the horizontal axis, frequency on the vertical
axis), create a bell-shaped curve known as a normal curve, or normal
distribution.


Characteristics of a Normal Distribution


1. The normal curve is bell-shaped and has a single peak at the exact center
of the distribution.
2. The sum of the area under the normal curve is 1.
3. The mean, median, and mode of the distribution are equal and located at
the peak.
4. Half the area under the curve is above and half is below this center point
(peak).
5. The normal probability distribution is symmetrical about its mean.
6. It is asymptotic - the curve gets closer and closer to the x-axis but never
actually touches it.

NOTE!
- You can also have normal distributions with the same means but different
standard deviations.
- You can also have normal distributions with the same standard deviation
but with different means.
- You can also have normal distributions with different means and different
standard deviations.

Empirical Rule

Using the empirical rule of a normal distribution, approximately
- 68% of the data lie within 1 standard deviation of the mean.
- 95% of the data lie within 2 standard deviations of the mean.
- 99.7% of the data lie within 3 standard deviations of the mean.

Example.
The daily water usage per person in Davao City is normally distributed with a
mean of 20 gallons and a standard deviation of 5 gallons. Find and interpret
the intervals representing one, two, and three standard deviations of the mean.


Solution.
The intervals are 20 ± 5, 20 ± 2(5), and 20 ± 3(5) gallons.
For one standard deviation of the mean, approximately 68% of the people in
Davao City consumed between 15 and 25 gallons of water daily. For two
standard deviations of the mean, approximately 95% of the people consumed
between 10 and 30 gallons daily. And for three standard deviations of
the mean, nearly all of the people (99.7%) consumed between 5 and
35 gallons daily.
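
The intervals can be generated with a short Python sketch. It also uses `statistics.NormalDist` (available in Python 3.8 and later) to compute the share of a normal distribution inside each interval, which shows where the rounded 68%, 95%, and 99.7% figures come from.

```python
from statistics import NormalDist

mu, sigma = 20, 5  # mean and standard deviation of daily usage, in gallons
usage = NormalDist(mu, sigma)

intervals = {}
coverage = {}
for k in (1, 2, 3):
    low, high = mu - k * sigma, mu + k * sigma
    intervals[k] = (low, high)
    # Area under the normal curve between low and high.
    coverage[k] = usage.cdf(high) - usage.cdf(low)
    print(f"within {k} SD: {low} to {high} gallons, about {coverage[k]:.1%}")
```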

9. Standard Normal Distribution


If the data set follows a normal distribution, the corresponding
distribution of z-scores is also a normal distribution, known as the
standard normal distribution. The mean of the transformed z-scores is
equal to 0 and the standard deviation is 1.

The z-value is computed as
$$z = \frac{X - \mu}{\sigma}$$
where
X = a selected value (X − µ is its distance from the mean)
µ = the population mean
σ = the population standard deviation

Example 1. The monthly incomes of teachers in public schools are normally
distributed with a mean of Php 20,000 and a standard deviation of Php 2000.
What is the z-value for an income X of (a) Php 22,000? (b) Php 17,500?
Solution.
a) For X = Php 22,000 with µ = Php 20,000 and σ = Php 2000,
solving for z, we have
$$z = \frac{X - \mu}{\sigma} = \frac{22{,}000 - 20{,}000}{2000} = 1$$
A z-value of 1 indicates that the income of Php 22,000 is 1 standard
deviation above the mean income of Php 20,000.

b) For X = Php 17,500 with µ = Php 20,000 and σ = Php 2000, solving for z,
we have
$$z = \frac{X - \mu}{\sigma} = \frac{17{,}500 - 20{,}000}{2000} = -1.25$$
A z-value of –1.25 indicates that the income of Php 17,500 is 1.25
standard deviations below the mean income of Php 20,000.

Area Under the Normal Curve


Typically the probability distribution does not follow the standard
normal distribution, but does follow a general normal distribution. When this is
the case, we compute the z-score first to convert it into a standard normal


distribution. Then we can use the table for Areas Under the Normal Curve.
You can visit this site to have a copy of a table:
https://www.westgard.com/normalareas.htm

Example. The daily water usage per person in Davao City is normally distributed
with a mean of 20 gallons and a standard deviation of 5 gallons. Let X be the
daily water usage. What percent of people use less than 24 gallons?
Solution.
We graph the problem in a normal distribution graph and see that the shaded
region we are looking for is the area before X = 24.

The z-value associated with the shaded region with X = 24 is
$$z = \frac{X - \mu}{\sigma} = \frac{24 - 20}{5} = 0.8$$
To find the probability for this z-value, we refer to the normal distribution
table, which is commonly called the z-table. To locate the entry for z = 0.8
(that is, 0.80), we find the row for the ones and tenths digits (0.8) in the
first column of the z-table and intersect it with the column for the hundredths
digit (0.00) of the computed z-value. Each entry in this table is the area
between the mean (z = 0) and the given z-value.

z      0.00     0.01     ...   0.09
0.0    0.0000   0.0040   ...   0.0359
0.1    0.0398   0.0438   ...   0.0753
...    ...      ...      ...   ...
0.8    0.2881   0.2910   ...   0.3133
...    ...      ...      ...   ...
3.4    0.4997   0.4997   ...   0.4998

Thus, P(X < 24) = P(z < 0.8) = 0.2881 + 0.5 = 0.7881 or 78.81%. This means
that the probability that a person uses less than 24 gallons of water daily is
78.81%.
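
The table lookup can be cross-checked in Python with `statistics.NormalDist` (Python 3.8 and later). This is only a sketch: `cdf` returns the whole area to the left of a value, so the table entry (area from the mean to z) is recovered by subtracting 0.5.

```python
from statistics import NormalDist

mu, sigma = 20, 5                 # daily water usage, in gallons
z = (24 - mu) / sigma             # z-value for X = 24

p_less_than_24 = NormalDist().cdf(z)   # P(z < 0.8) on the standard normal curve
table_area = p_less_than_24 - 0.5      # area between z = 0 and z = 0.8

# The same probability computed directly on the original scale.
p_direct = NormalDist(mu, sigma).cdf(24)

print(z, round(table_area, 4), round(p_less_than_24, 4))
```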
To help you understand more of this concept, please see the following videos:
https://www.youtube.com/watch?v=mtbJbDwqWLE

https://www.youtube.com/watch?v=2tuBREK_mgE


You can refer to the source below to help you further understand the lesson:

Ondaro et al. (2018). Mathematics in the modern world, e-book. Mutya Publishing
House, Inc.

Chapter 2 – Introduction
http://124.105.95.237/index.php/s/AY5PS7tCmWCET24

k. Chapter 2 Lesson 1 - Data Management


http://124.105.95.237/index.php/s/MAfNoiTiG7MgxgC


Activity 1. Now that you know the most essential concepts in the study of
data management, let us check your understanding of
these concepts. You are directed to answer at least three (3)
exercises from

MMW Practice Set 4 – A

on pages 30 to 31.

Activity 1. Beyond getting acquainted with the essential concepts in problem solving,
what also matters is that you are able to apply the
mathematical concepts in solving problems. You are expected to
answer at least two (2) exercises each from

MMW Practice Set 4 – B, C, D, & E

on pages 32 to 40.


Activity 1. Based on the most essential concepts in data management and the
learning exercises that you have done, please feel free to write your
arguments or lessons learned below.

1.

2.

3.


Do you have any questions for clarification?

Questions / Issues Answers

1.

2.

3.

4.

5.

Descriptive Statistics      Inferential Statistics      Data set
Mathematical conventions    Mathematical translations   Quantitative data
Interval                    Ordinal                     Ratio
Mean, Median, Mode          Variance                    Standard Deviation
Percentile                  Quartile                    z-score


Big Picture in Focus
ULO-b. Apply the concepts and procedure of correlation and
regression analyses.

Metalanguage

In this section, the essential terms relevant to the study of correlation and
regression analysis and to demonstrate ULO-b will be operationally defined to
establish a common frame of reference as to how the texts work. You will encounter
these terms as we go through this topic. Please refer to these definitions in case you
encounter difficulty in understanding some concepts.

1. Correlation analysis is a method of statistical evaluation used to study the
strength of a relationship between two numerically measured, continuous
variables.

2. Regression analysis is a powerful statistical method that allows you to
examine the relationship between two or more variables of interest.

3. Scatter Plot is a graph in which the values of two variables are plotted along
two axes, the pattern of the resulting points revealing any correlation present.

4. Simple Relationship refers to analysis involving two variables: an
independent variable (also called an explanatory variable or a predictor
variable) and a dependent variable (also called a response variable). A simple
relationship analysis is called simple regression, where there is one
independent variable that is used to predict the dependent variable.

5. Multiple Regression is an extension of simple linear regression. It is used
when we want to predict the value of a variable based on the values of two or
more other variables.

6. Correlation Coefficient is a statistical measure of the strength of the
relationship between the relative movements of two variables.

7. Line of Best Fit is a straight line that is the best approximation of the given
set of data. It is used to study the nature of the relation between two variables.

8. Coefficient of Determination is a statistical measurement that examines
how differences in one variable can be explained by the difference in a second
variable when predicting the outcome of a given event. In other words, this
coefficient, which is more commonly known as R-squared (or R²), assesses
how strong the linear relationship is between two variables, and is heavily
relied on by researchers when conducting trend analysis.

Essential Knowledge

To achieve the unit learning outcomes (the big picture) for the fourth
and fifth weeks of the course, you need to fully understand the essential
knowledge laid down in the succeeding pages. Please note that you are
not limited to these resources. You are expected to utilize
other books, research articles, and other resources that are available in the
university’s library, e.g., ebrary, search.proquest.com, etc.

1. Scatterplot. The scatter plot is a visual way to describe the nature of the
relationship between the variables. It is a graph of the ordered pairs (x, y) of
numbers consisting of the independent variable x and the dependent variable
y.

The independent variable is scaled along the x-axis and the
dependent variable is scaled along the y-axis. Graphing the data on a scatter
plot gives preliminary information about the shape and spread of the data.
Example.
Construct the scatter plot of the data shown for the advertising cost (in thousands)
and sales (in thousands) of several companies, and determine whether there
seems to be a linear relationship between the two variables.

Advertising Cost    12    8    10    5    12    14    11    8    6
Sales               20    12   15    10   18    20    18    10   11
Solution.
Step 1. Draw and label the x and y axes.
Step 2. Plot each point on the graph as shown.


Based on the plot above, there could be a positive linear relationship
between the advertising cost and the sales of the companies.
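
If graphing paper is not at hand, the two steps can also be imitated on a computer.
The sketch below is only an illustration, assuming Python; the helper name
`text_scatter` is hypothetical, and the function works only for whole-number data
such as the values in this example.

```python
# Data from the advertising example (both in thousands).
advertising = [12, 8, 10, 5, 12, 14, 11, 8, 6]
sales = [20, 12, 15, 10, 18, 20, 18, 10, 11]


def text_scatter(xs, ys):
    """Build a text scatter plot with x left to right and y bottom to top."""
    points = set(zip(xs, ys))
    x_values = range(min(xs), max(xs) + 1)
    lines = []
    # One text row per whole-number y value, largest y on top.
    for y in range(max(ys), min(ys) - 1, -1):
        cells = "".join("  *" if (x, y) in points else "  ." for x in x_values)
        lines.append(f"{y:>3} |{cells}")
    # Horizontal axis and its tick labels.
    lines.append("    +" + "---" * len(x_values))
    lines.append("     " + "".join(f"{x:>3}" for x in x_values))
    return "\n".join(lines)


print(text_scatter(advertising, sales))
```

Each `*` marks one ordered pair (x, y). The marks rising from the lower left to the
upper right are what suggest a positive linear relationship.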

2. Correlation analysis is a statistical method used to determine whether a
linear relationship or association between variables exists. The measure of
the degree of correlation is known as the correlation coefficient. It is
computed from the sample data to measure the strength and direction of the
linear relationship between the two variables. The symbol for the sample correlation
coefficient is r, while the symbol for the population correlation coefficient is
ρ (rho). The range of values of the correlation coefficient is from –1 to +1.
As an inequality, the values of r can be expressed as

–1 ≤ r ≤ +1.

If there is a perfect positive linear relationship between the
variables, the value of r is equal to +1. For a perfect negative linear
relationship between the variables, the value of r is equal to –1. When
no linear relationship exists between the variables, the value of r is equal
to 0. A positive correlation is present when high values of one variable are
associated with high values of the other variable. On the other hand, when
high values of one variable are associated with low values of the other
variable, a negative correlation is present.


There are several ways to compute the value of the correlation
coefficient. One is known as the Pearson product moment correlation
coefficient (PPMC) or simply the Pearson r, named after statistician Karl
Pearson, who pioneered the research in this area. The formula is

$$r = \frac{n(\sum xy) - (\sum x)(\sum y)}{\sqrt{\left[n(\sum x^2) - (\sum x)^2\right]\left[n(\sum y^2) - (\sum y)^2\right]}}$$

where n = number of data pairs
x = observed data for the independent variable
y = observed data for the dependent variable

The value of r is usually computed from data obtained from samples.
Therefore, there is a possibility that the correlation coefficient of the population
from which the sample was taken is actually zero; that is, the computed value of r
is due to chance only. Hence, a test for the significance of the correlation
coefficient must be performed.

In hypothesis testing, the sample correlation coefficient r can then be
used as an estimator of the population correlation coefficient (ρ) if the variables
are linearly related, random, and normally distributed. One of these
hypotheses is true:

H0: ρ = 0 - This means that there is no correlation between the
variables in the population.
H1: ρ ≠ 0 - This means that there is a significant correlation between
the variables in the population.

Using the t–test,

$$t = r\sqrt{\frac{n - 2}{1 - r^2}}$$

where the degrees of freedom = n – 2, the null hypothesis is rejected
at a specific level of significance if there is a significant difference between
the value of r and 0. When the null hypothesis is not rejected, it means that the
value of r is not significantly different from 0 (zero) and is probably due to
chance only.

Example.
The average normal daily temperature (in degrees Celsius) and the
corresponding average monthly precipitation (in inches) for the month of June are
shown here for seven randomly selected cities. Determine if there is a relationship
between the two variables.

Temperature (x) 30 27 28 32 27 23 18
Precipitation (y) 3.4 1.8 3.5 3.6 3.7 1.5 0.2

Solution.
Arrange the data in a table as shown.

City    x     y      xy        x²         y²
A       30    3.4    102.00    900.00     11.56
B       27    1.8    48.60     729.00     3.24
C       28    3.5    98.00     784.00     12.25
D       32    3.6    115.20    1024.00    12.96
E       27    3.7    99.90     729.00     13.69
F       23    1.5    34.50     529.00     2.25
G       18    0.2    3.60      324.00     0.04
Σx = 185    Σy = 17.70    Σxy = 501.80    Σx² = 5019.00    Σy² = 55.99

Substitute the corresponding values into the formula for r.

$$r = \frac{n(\sum xy) - (\sum x)(\sum y)}{\sqrt{\left[n(\sum x^2) - (\sum x)^2\right]\left[n(\sum y^2) - (\sum y)^2\right]}}$$

$$r = \frac{7(501.80) - (185)(17.70)}{\sqrt{\left[7(5019.00) - (185)^2\right]\left[7(55.99) - (17.70)^2\right]}}$$

$$r = 0.891$$

The correlation coefficient suggests a very strong positive relationship
between the average normal daily temperature and the corresponding
average monthly precipitation.
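
The substitution into the Pearson r formula can be checked with a short program.
This is a minimal sketch, assuming Python; the function name `pearson_r` and the
variable names are illustrative only.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson r computed from the raw sums used in the worked table."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


# Temperature (x) and precipitation (y) for the seven cities.
temperature = [30, 27, 28, 32, 27, 23, 18]
precipitation = [3.4, 1.8, 3.5, 3.6, 3.7, 1.5, 0.2]

r = pearson_r(temperature, precipitation)
print(f"r = {r:.3f}")
```

Rounded to three decimal places, the result should match the r = 0.891 found by hand.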

Test the significance of the correlation coefficient found in the example above.
Use α = 0.05, n = 7 and r = 0.891.

Following the five-step process in hypothesis testing, we have

1. Formulate the null and alternative hypotheses.
H0 : ρ = 0
H1 : ρ ≠ 0
2. Specify the level of significance. α = 0.05
3. Critical value of t-test.
Type of Test: Two-Tailed
df = n – 2 = 7 – 2 = 5
tt = 2.571 (refer to t distribution table)

4. Computation:

$$t_c = r\sqrt{\frac{n - 2}{1 - r^2}}$$

$$t_c = 0.891\sqrt{\frac{7 - 2}{1 - (0.891)^2}}$$

$$t_c = 4.388$$

5. Decision.
Reject H0 because | tc | > | tt | and conclude that there is a significant
relationship between the average normal daily temperature and the
corresponding average monthly precipitation.
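
The computation and decision steps can be sketched in code as well. This is only
an illustration, assuming Python; the function name `correlation_t` is hypothetical,
and the critical value 2.571 is copied from the t distribution table rather than
computed. Evaluating the formula with r = 0.891 and n = 7 gives a computed t of
about 4.39, which is larger than 2.571.

```python
from math import sqrt


def correlation_t(r, n):
    """t statistic for testing H0: rho = 0, with n - 2 degrees of freedom."""
    return r * sqrt((n - 2) / (1 - r ** 2))


r = 0.891
n = 7
t_critical = 2.571  # two-tailed, alpha = 0.05, df = 5 (from the t table)

t_computed = correlation_t(r, n)
reject_h0 = abs(t_computed) > abs(t_critical)

print(f"computed t = {t_computed:.3f}")
print("Reject H0" if reject_h0 else "Do not reject H0")
```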

Correlation and Causation

If the two variables have a significant relationship, then any of the
following possible relationships exists between them:
• there is a direct cause-and-effect relationship between the variables; that
is, x causes y.
• there is a reverse cause-and-effect relationship between the variables; that
is, y causes x.
• the relationship between the variables may be caused by a third variable.
• there may be a complexity of interrelationships among many variables.
• the relationship may be coincidental.
Note that when the null hypothesis is rejected, the researcher must
consider all possibilities and select the appropriate one as determined by the
study. According to Bluman (2012), correlation does not necessarily imply
causation.

3. Regression analysis is the process of formulating a mathematical model that
can be used to predict or determine one variable from one or more other variables.
In simple regression analysis, only a straight-line relationship between two
variables is examined: one independent variable and one dependent variable.
Given a scatter plot, you must be able to draw the line of best fit. Best
fit means that the sum of the squares of the vertical distances from each point
to the line is at a minimum. The reason you need a line of best fit is that the
values of y will be predicted from the values of x; hence, the closer the points
are to the line, the better the fit and the prediction will be.


Simple regression analysis considers a straight-line relationship
between two variables. This linear relationship can be expressed in an
equation of the form

$$\hat{Y} = a + bx$$

where Ŷ = predicted value of the dependent variable
a = the y-intercept
b = the slope of the line

For the slope of the line,

$$b = \frac{n(\sum xy) - (\sum x)(\sum y)}{n(\sum x^2) - (\sum x)^2}$$

For the y-intercept,

$$a = \frac{\sum y}{n} - b\,\frac{\sum x}{n} \quad\Rightarrow\quad a = \bar{y} - b\bar{x}$$

Example.
A law enforcement officer obtained data on the performance rating of police
offices and the crime solution efficiency in their respective areas of responsibility
for the last 6 months. Use the equation of the regression line to predict the crime
solution efficiency of a city whose police office has a performance rating of 82.

Performance Rating (x)           85    89    91    93    84    89
Crime Solution Efficiency (y)    89    90    92    92    88    90

Solution.
Arrange the data in a table as shown.

Month    x     y     xy      x²
1        85    89    7565    7225
2        89    90    8010    7921
3        91    92    8372    8281
4        93    92    8556    8649
5        84    88    7392    7056
6        89    90    8010    7921
n = 6    Σx = 531    Σy = 541    Σxy = 47,905    Σx² = 47,053


Solve for the slope of the regression line.

$$b = \frac{n(\sum xy) - (\sum x)(\sum y)}{n(\sum x^2) - (\sum x)^2}$$

$$b = \frac{6(47{,}905) - (531)(541)}{6(47{,}053) - (531)^2}$$

$$b = 0.4454$$

Solve for the y-intercept.

$$a = \frac{541}{6} - 0.4454\left(\frac{531}{6}\right)$$

$$a = 50.75$$

Determine the regression equation.

$$\hat{Y} = a + bx$$

$$\hat{Y} = 50.75 + 0.4454x$$

Solve for the crime solution efficiency if the police office performance rating is 82.

$$\hat{Y} = 50.75 + 0.4454(82)$$

$$\hat{Y} = 87.27$$
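
The slope, the y-intercept, and the prediction can be verified with a few lines of
code. This is a minimal sketch, assuming Python; the function name `fit_line` and
the variable names are illustrative only.

```python
def fit_line(xs, ys):
    """Return (a, b) of the least-squares line Y-hat = a + b*x."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    # Slope first, then the intercept from the two means.
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = sum_y / n - b * (sum_x / n)
    return a, b


# Performance rating (x) and crime solution efficiency (y) for six months.
rating = [85, 89, 91, 93, 84, 89]
efficiency = [89, 90, 92, 92, 88, 90]

a, b = fit_line(rating, efficiency)
predicted = a + b * 82

print(f"b = {b:.4f}")
print(f"a = {a:.2f}")
print(f"predicted efficiency at a rating of 82 = {predicted:.2f}")
```

Because the program keeps all decimal places of a and b, its prediction may differ
in the last digit from a hand computation that uses rounded values.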

The coefficient of determination, denoted by r², is a number that
expresses the proportion of the total variation in the values of the dependent
variable that can be explained by the linear relationship with the values of the
independent variable. If the coefficient of determination is 100%, then there is
no unexplained variation between the two variables. The coefficient of
determination can be obtained by squaring the correlation coefficient.

coefficient of determination, r² = (r)² × 100%

Using the data in the example above, determine how much of the
variation in the crime solution efficiency is due to the variation in the
performance rating of the police office. Upon computing the correlation
coefficient, we get r = 0.959.
Solve for the coefficient of determination.

$$r^2 = (r)^2 \times 100\% = (0.959)^2 \times 100\%$$

$$r^2 = 91.97\%$$

This result means that 91.97% of the variation in the crime solution
efficiency is accounted for by the variations in the performance rating of the
police office in the area. The rest of the variation, 0.0803 or 8.03%, is
unexplained and is called the coefficient of alienation.
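
The explained and unexplained percentages can be reproduced from the raw data.
This is a minimal sketch, assuming Python; the variable names are illustrative only.

```python
from math import sqrt

# Performance rating (x) and crime solution efficiency (y) for six months.
rating = [85, 89, 91, 93, 84, 89]
efficiency = [89, 90, 92, 92, 88, 90]

n = len(rating)
sum_x = sum(rating)
sum_y = sum(efficiency)
sum_xy = sum(x * y for x, y in zip(rating, efficiency))
sum_x2 = sum(x * x for x in rating)
sum_y2 = sum(y * y for y in efficiency)

# Pearson r from the raw sums.
r = (n * sum_xy - sum_x * sum_y) / sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)

r_squared = r ** 2           # proportion of variation explained
unexplained = 1 - r_squared  # proportion of variation left unexplained

print(f"r = {r:.3f}")
print(f"explained variation = {r_squared * 100:.2f}%")
print(f"unexplained variation = {unexplained * 100:.2f}%")
```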

You can refer to the source below to help you further understand the lesson:

Ondaro et al. (2018). Mathematics in the modern world, e-book. Mutya Publishing
House, Inc.

Chapter 2 – Introduction
http://124.105.95.237/index.php/s/AY5PS7tCmWCET24

k. Chapter 2 Lesson 1 - Data Management


http://124.105.95.237/index.php/s/MAfNoiTiG7MgxgC


Activity 1. Now that you know the most essential concepts in the study of
data management, let us check your understanding of these
concepts. You are directed to answer exercises number 1 and 2 from

MMW Practice Set 4 – F

on page 41.

Activity 1. Beyond getting acquainted with the essential concepts in data management,
what also matters is that you are able to apply these concepts
in solving problems. You are directed to answer exercises number 3
and 4 from

MMW Practice Set 4 – F

on page 42.


Activity 1. Based on the most essential concepts in data management and the
learning exercises that you have done, please feel free to write your
arguments or lessons learned below.

1.

2.

3.


Do you have any questions for clarification?

Questions / Issues Answers

1.

2.

3.

4.

5.

Correlation                   Regression                      Scatterplot
Simple linear relationship    Line of best fit                Pearson r
Coefficient of correlation    Perfect positive correlation    No correlation
