
Study Guide in Mathematics in the Modern World FM-AA-CIA-15 Rev. 0 10-July-2020

GE7 Mathematics in the Modern World Module 4: Data Management

MODULE 4

DATA MANAGEMENT

MODULE OVERVIEW

This module consists of five lessons: Measures of Central Tendency, Measures of Dispersion,
Measures of Relative Position, Normal Distribution, and Regression and Correlation. Each lesson
is designed as a self-teaching guide, with definitions of terms and worked examples incorporated.
Answering the problems in "Your turn" will check your progress. You may compare your answers with
the solutions provided in the later part of this module; in that way you will be able to measure your
achievement as well as the effectiveness of the module. Exercises are provided as assignments
to measure your understanding of the topics.

MODULE LEARNING OBJECTIVES

At the end of the module, you should be able to:


• Use a variety of statistical tools to process and manage numerical data
• Use the methods of linear regression and correlations to predict the value of a variable given
certain conditions
• Advocate the use of statistical data in making important decisions

LEARNING CONTENTS (MEASURES OF CENTRAL TENDENCY)

Introduction
Numerical data is everywhere, and more data is being generated every day. It is important for
us to have a working knowledge of basic statistical concepts and tools so that we can use this data
correctly and optimally. A lot of data is raw, that is, it has not yet been processed for use.

Discussion
Statistics involves the collection, organization, summarization, presentation, and interpretation of
data. The branch of mathematics that involves the collection, organization, summarization, and
presentation of data is called descriptive statistics. The branch that interprets and draws conclusions
from the data is called inferential statistics.

Lesson 1: Measures of Central Tendency

A measure of central tendency is a summary measure that attempts to describe a whole set
of data with a single value that represents the middle or center of the data set. The most commonly
used measures of central tendency, or types of averages, are the arithmetic mean, the median, and the mode.

PANGASINAN STATE UNIVERSITY 1



Arithmetic Mean
The arithmetic mean, or simply the mean, is the sum of the values of the observations in a data
set divided by the number of observations. The traditional symbol used to indicate a summation is
the Greek letter sigma, Σ. Thus the notation Σ𝑥, called summation notation, denotes the sum of all
the numbers in a given set.
The definition is the same for both a sample (a portion of the whole population) and a population
(the collection of all possible observations under a particular study), although we use a different
symbol for each. The symbol for the sample mean is 𝑥 bar (𝑥̅), and the symbol for the population
mean is the Greek letter mu (µ).

Mean

The mean of 𝑛 numbers is the sum of the numbers divided by 𝑛.

$$\text{Mean} = \frac{\sum x}{n}$$

The mean of a sample, 𝑥̅, or any other measure based on sample data, is called a
statistic. Any measurable characteristic of a population is called a parameter. The mean of a
population, 𝜇, is a parameter.

Example 1
Six friends in a Mathematics in the Modern World class of 25 students
received test grades of 92, 84, 65, 76, 88, and 90.
a. Find the mean of these test scores.
b. Is the mean computed a statistic or a parameter? Why?

Solution
a. The six friends are a sample of the population of 25 students. Use 𝑥̅ instead of 𝜇 to represent the
mean.

$$\bar{x} = \frac{\sum x}{n} = \frac{92 + 84 + 65 + 76 + 88 + 90}{6} = \frac{495}{6} = 82.5$$

The mean of the test scores is 82.5.

b. It is a statistic, because it is computed from a sample.

Your turn 1 The daily wages of 10 employees of Home Depot are: ₱500, ₱750, ₱430,
₱630, ₱450, ₱440, ₱700, ₱350, ₱580, ₱630.

a. Find the mean of the daily wages of the employees.
b. Is the mean computed a statistic or a parameter? Why?

Median
The median is the middle number, or the mean of the two middle numbers, in a list of numbers
that have been arranged in numerical order from smallest to largest or largest to smallest. Any list of
numbers arranged in numerical order from smallest to largest or largest to smallest is a ranked list.


Median
The median of a ranked list of 𝑛 numbers is :
▪ The middle number if 𝑛 is odd
▪ The mean of two middle numbers if 𝑛 is even

Example 2 Find the median of the data in the following lists.

a. 4, 8, 1, 14, 9, 21, 12 b. 46, 23, 92, 89, 77, 108

Solution
a. The list 4, 8, 1, 14, 9, 21, 12 contains 7 numbers. The median of a list with an odd number of entries is
found by ranking the numbers and finding the middle number.
Ranking the numbers from smallest to largest gives

1, 4, 8, 9, 12, 14, 21
The middle number is 9. Thus 9 is the median.

b. The list 46, 23, 92, 89,77, 108 contains 6 numbers. The median of the list of data with an even
number of entries is found by ranking the numbers and computing the mean of the two middle
numbers. Ranking the numbers from smallest to largest gives

23, 46, 77, 89, 92, 108


The two middle numbers are 77 and 89. The mean of 77 and 89 is 83. Thus 83 is the median of the
data.
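
The ranking procedure used in this example can be sketched in a few lines of Python. This is only an illustration and not part of the original module; the function name `median_of` is a hypothetical helper chosen for this sketch.

```python
def median_of(values):
    """Return the median of a non-empty list of numbers."""
    if not values:
        raise ValueError("the list must contain at least one number")
    ranked = sorted(values)      # rank from smallest to largest
    n = len(ranked)
    middle = n // 2
    if n % 2 == 1:
        # odd number of entries: the single middle number
        return ranked[middle]
    # even number of entries: mean of the two middle numbers
    return (ranked[middle - 1] + ranked[middle]) / 2


# List b from Example 2 (six entries, so two middle numbers are averaged)
print(median_of([46, 23, 92, 89, 77, 108]))
```

Sorting first is what makes the "middle" meaningful; applying the same index arithmetic to an unranked list would generally give a wrong answer.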

Your turn 2 Find the median of the data in the following:

a. A sample of senior citizens in Lingayen, Pangasinan receiving Social Security payments revealed
these monthly benefits: ₱3400, ₱2000, ₱4000, ₱4300, ₱2500, ₱3600, ₱3500, ₱5000.

b. The scores in a quiz of nine students in an MMW class are: 2, 4, 10, 7, 8, 0, 5, 8, and 2.

Mode
The mode is another type of average.

Mode
The mode of a list of numbers is the value of the observation that appears most frequently.


Some lists of numbers do not have a mode. For instance, in the list 1, 6, 8, 10, 32, 15, 49, each number
occurs exactly once. Because no number occurs more often than the other numbers, there is no mode.
A list of numerical data can have more than one mode. For instance, in the list 4, 2, 6, 2, 7, 9,
2, 4, 9, 8, 9, 7, the numbers 2 and 9 each occur three times. Thus 2 and 9 are both modes of the data.

Example 3 Find the mode of the data in the following lists.

a. 18, 15, 21, 16, 15, 14, 15, 21 b. 2,5, 8, 9, 11, 4, 7, 23

Solution
a. In the list 18, 15, 21, 16, 15, 14, 15, 21, the number 15 occurs more often than the other numbers.
Thus 15 is the mode.

b. Each number in the list 2, 5, 8, 9, 11, 4, 7, 23 occurs only once. Because no number occurs
more often than the others, there is no mode.

Your turn 3 Find the mode of the data in the following lists.

a. 3, 3, 3, 4, 4, 4, 5, 5, 5, 8 b. 12, 34, 12, 71, 48, 93, 71, 12

The mean, median, and mode are all averages; however, they are generally not equal. The
mean of a set of data is the most sensitive of the averages. A change in any of the numbers changes
the mean, and the mean can be changed drastically by changing an extreme value.
In contrast, the median and the mode of a set of data are usually not changed by changing an
extreme value.
When a data set has one or more extreme values that are very different from the majority of
values, the mean will not necessarily be a good indicator of an average value. In the following
example, we compare the mean, median, and mode for the salaries of five employees of a small
company.
Salaries: ₱370,000 ₱60,000 ₱36,000 ₱20,000 ₱20,000

The mean is

$$\frac{370{,}000 + 60{,}000 + 36{,}000 + 20{,}000 + 20{,}000}{5} = \frac{506{,}000}{5} = 101{,}200$$

The median is ₱36,000 and the mode is ₱20,000. The data contain one extreme value that is much
larger than the others. This extreme value makes the mean considerably larger than the median. So,
you would probably agree that ₱36,000 better represents the average of the salaries than does either
the mean or the mode.
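
The effect of the extreme salary can be checked with Python's standard `statistics` module. This is a minimal sketch, assuming the first salary is ₱370,000 (the value consistent with the stated total of 506,000); the variable names and the "adjusted" list are illustrative only and do not come from the module.

```python
import statistics

# Salaries of the five employees, in pesos
salaries = [370_000, 60_000, 36_000, 20_000, 20_000]

mean = statistics.mean(salaries)      # pulled upward by the extreme value
median = statistics.median(salaries)  # middle value of the ranked list
mode = statistics.mode(salaries)      # most frequent value
print(mean, median, mode)

# Illustration only: replace the extreme salary with a more typical one.
adjusted = [60_000 if s == 370_000 else s for s in salaries]
print(statistics.mean(adjusted), statistics.median(adjusted))
```

Comparing the two printed lines shows the point of the paragraph above: changing one extreme value moves the mean a great deal while the median stays where it was.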

Computer Solution
We can use a spreadsheet program to find the mean, median, and mode of a data set.
Consider the following satisfaction level ratings of 35 people.


9 12 10 8 9 12 12
11 14 12 10 8 10 9
12 8 14 13 7 9 10
12 8 12 14 9 8 13
10 9 9 11 10 11 10

With the 35 ratings occupying cells A2 to A36, the spreadsheet's built-in statistical functions
calculate the mean, median, and mode. The formulas are:

Mean: =AVERAGE(A2:A36)
Median: =MEDIAN(A2:A36)
Mode: =MODE(A2:A36)
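
For readers without a spreadsheet, the same three averages can be obtained with Python's standard `statistics` module. This is a sketch rather than part of the original module; it assumes Python 3.8 or later (for `statistics.multimode`), and the variable name `ratings` is our own.

```python
import statistics

# The 35 satisfaction ratings, row by row as listed above
ratings = [
    9, 12, 10, 8, 9, 12, 12,
    11, 14, 12, 10, 8, 10, 9,
    12, 8, 14, 13, 7, 9, 10,
    12, 8, 12, 14, 9, 8, 13,
    10, 9, 9, 11, 10, 11, 10,
]

print(round(statistics.mean(ratings), 2))  # counterpart of AVERAGE
print(statistics.median(ratings))          # counterpart of MEDIAN
print(statistics.multimode(ratings))       # every most-frequent rating
```

`multimode` is used instead of a single-valued mode because more than one rating is tied for most frequent in this data set, whereas a spreadsheet's MODE function reports only one value.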

The Weighted Mean

The weighted mean is an average in which each data value is multiplied by a weight that reflects
its relative importance. It is often used when some data values are more important than others.

The Weighted Mean

The weighted mean of the 𝑛 numbers 𝑥1, 𝑥2, 𝑥3, …, 𝑥𝑛 with respective assigned weights
𝑤1, 𝑤2, 𝑤3, …, 𝑤𝑛 is

$$\text{Weighted mean} = \frac{\sum (x \cdot w)}{\sum w}$$

where:
𝑤 = weight of each item
𝑥 = value of each item


Example 4 Table 1.1 shows Janet's first semester course grades. Use the weighted mean
formula to find Janet's GPA for the first semester.

Table 1.1 Janet's Grades, First Semester

Course        Course Grade    Course Units
Physics       1.75            4
Statistics    2.25            3
Psychology    2.75            3
P.E.          1.5             2

Solution

$$\text{Weighted mean} = \frac{\sum (x \cdot w)}{\sum w} = \frac{(1.75 \times 4) + (2.25 \times 3) + (2.75 \times 3) + (1.5 \times 2)}{4 + 3 + 3 + 2} = \frac{25}{12} \approx 2.08$$

Janet's GPA for the first semester is 2.08.
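
The weighted mean formula translates directly into code. The following Python sketch is an illustration only; the helper name `weighted_mean` is hypothetical, and the grades and units are taken from Table 1.1.

```python
def weighted_mean(values, weights):
    """Return sum(x * w) / sum(w) for paired values and weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("the weights must not sum to zero")
    return sum(x * w for x, w in zip(values, weights)) / total_weight


grades = [1.75, 2.25, 2.75, 1.5]  # course grades from Table 1.1
units = [4, 3, 3, 2]              # course units, used as the weights

gpa = weighted_mean(grades, units)
print(round(gpa, 2))
```

Using the course units as weights means a 4-unit course influences the GPA twice as much as a 2-unit course.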

Your turn 4 A man bought 10 liters of premium gasoline at P11.50 per liter, 12 liters at
P12.01 per liter and 18 liters at P11.78 per liter from three different gasoline
stations. Find the mean price per liter.

LEARNING POINTS
A measure of central tendency is a summary measure that attempts to describe a whole set of data
with a single value that represents the middle or center of the data set. The most commonly used
measures of central tendency, or types of averages, are the arithmetic mean, the median, and the mode.

LEARNING ACTIVITY 1

In exercises 1 to 5, find the mean, the median, and the mode(s), if any, for the given data.
Round noninteger means to the nearest tenth.

Mean Median Mode

1. 2, 7, 5, 7, 14


2. 8, 3, 3, 17, 9, 22, 19

3. 11, 8, 2, 5, 17, 39, 52, 42

4. 101, 88, 74, 60, 12, 94, 74, 85


5. 2.1, 4.6, 8.2, 3.4, 5.6, 8.0, 9.4, 12.2, 56.1, 78.2

6. The final grades of a student in six courses were taken and are shown below. Compute the
student’s weighted mean grade.
Courses No. of Units Final Grade
Math 112 3 2.5
English 101 6 2.0
PS 25 3 1.5
Fil 1 3 1.4
Chem 1 5 2.4
PE 1 2 1.1

7. A professor grades students on 4 tests, a term paper, and a final examination. Each test counts as
15% of the course grade. The term paper counts as 20% of the course grade. The final examination
counts as 20% of the course grade. Alan has test scores of 80, 78, 92, and 84. Alan received an 84 on
his term paper. His final examination score was 88. Use the weighted mean formula to find Alan's
average for the course. Hint: The sum of all the weights is 100% = 1.

8. After 6 math tests, Zia has a mean score of 88. What score does Zia need on the next test to
raise the average (mean) to 90?

9. After 4 algebra tests, Alisa has a mean score of 82. One more 100-point test is to be given in this
class. All of the test scores are of equal importance. Is it possible for Alisa to raise her average
(mean) to 90? Explain.
10. Pick six numbers and compute the mean and the median of the numbers.
a. Now add 12 to each of your original numbers and compute the mean and the median for this new
set of numbers.
b. How does the mean of the new set of data compare with the mean of the original set of data?
c. How does the median of the new set of data compare with the median of the original set of data?


LEARNING CONTENTS (MEASURES OF DISPERSION)

Lesson 2 : Measures of Dispersion


While measures of central tendency are used to estimate "normal" values of a dataset,
measures of dispersion are important for describing the spread of the data, or its variation around a
central value. Two distinct samples may have the same mean or median, but completely different
levels of variability, or vice versa.

A measure of dispersion or variability tells us how much the observations spread out from
the mean. The higher the variability, the more dispersed are the observations; the lower it is, the
more consistent are the observations.

For instance, consider a soft-drink dispensing machine that should dispense 8 oz of your
selection into a cup. Table 2.1 shows data for two of these machines. The mean data value for each
machine is 8 oz.

Table 2.1 Soda Dispensed (ounces)

Machine 1    Machine 2
9.52         8.01
6.41         7.99
10.07        7.95
5.85         8.03
8.15         8.02
𝑥̅ = 8.0      𝑥̅ = 8.0

However, look at the variation in data values for Machine 1. The quantity of soda dispensed
is very inconsistent—in some cases the soda overflows the cup, and in other cases too little soda is
dispensed. The machine obviously needs adjustment. Machine 2, on the other hand, is working just
fine. The quantity dispensed is very consistent, with little variation.
This example shows that average values do not reflect the spread or dispersion of data. To
measure the spread or dispersion of data, we must introduce statistical values known as the range,
mean deviation, standard deviation, and the variance.

The Range
The simplest measure of dispersion is the range. It is the difference between the largest and
the smallest values in a data set.

Range
Range = Largest value – Smallest value


Mean Deviation
A defect of the range is that it is based on only two values, the highest and the lowest; it does
not take into consideration all of the values. The mean deviation does. It measures the mean amount
by which the values in a population, or sample, vary from their mean. In terms of a definition:
Mean Deviation is the arithmetic mean of the absolute values of the deviations from the arithmetic
mean.

Mean Deviation (MD)

$$MD = \frac{\sum |x - \bar{x}|}{n}$$

where
𝑥 is the value of each observation
𝑥̅ is the arithmetic mean of the values
𝑛 is the number of observations
|𝑎| denotes the absolute value of 𝑎

The mean deviation has two advantages. First, it uses all values in the computation, while the range
uses only the highest and lowest values. Second, it is easy to understand: it is the average amount by
which values deviate from the mean.

Example 1 The weights of some containers being shipped to China are (in thousands of
pounds):
95, 103, 105, 110, 104, 105, 112, 90

a. What is the range of the weights?
b. Compute the arithmetic mean weight.
c. Compute the mean deviation of the weights.

Solution
a. Range = Highest value – Lowest value
= 112 − 90
= 22 thousand pounds

b. $$\bar{x} = \frac{95 + 103 + 105 + 110 + 104 + 105 + 112 + 90}{8} = \frac{824}{8} = 103 \text{ thousand pounds}$$

c. $$MD = \frac{|95-103| + |103-103| + |105-103| + |110-103| + |104-103| + |105-103| + |112-103| + |90-103|}{8} = \frac{42}{8} = 5.25 \text{ thousand pounds}$$
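
The three computations in this example can be reproduced with a short Python sketch. It is offered only as an illustration; the variable names are our own and are not part of the module.

```python
# Container weights, in thousands of pounds
weights = [95, 103, 105, 110, 104, 105, 112, 90]

# a. Range: largest value minus smallest value
data_range = max(weights) - min(weights)

# b. Arithmetic mean
mean = sum(weights) / len(weights)

# c. Mean deviation: mean of the absolute deviations from the mean
mean_deviation = sum(abs(x - mean) for x in weights) / len(weights)

print(data_range, mean, mean_deviation)
```

Note that the range looks at only two of the eight weights, while the mean deviation uses every weight in the list.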


Your turn 1 Find the following for the number of ounces dispensed by Machine 1 and by
Machine 2 in Table 2.1:

a. range
b. mean deviation of the amount (in ounces) dispensed by each machine.

The Standard Deviation


The standard deviation of a set of numerical data makes use of the individual amount that
each data value deviates from the mean. These deviations, represented by (𝑥 − 𝑥̅ ), are positive when
the data value x is greater than the mean 𝑥̅ and are negative when x is less than the mean 𝑥̅ . The sum
of all the deviations (𝑥 − 𝑥̅ ), is 0 for all sets of data. This is shown in Table 2.2 for the Machine 2
data of Table 2.1.

Table 2.2 Machine 2 :Deviations from the Mean


𝒙 (𝒙 − 𝒙̅)
8.01 8.01 − 8 = 0.01
7.99 7.99 − 8 = −0.01
7.95 7.95 − 8 = −0.05
8.03 8.03 − 8 = 0.03
8.02 8.02 − 8 = 0.02
∑(𝑥 − 𝑥̅ ) = 0

Because the sum of all the deviations of the data values from the mean is always 0, we cannot use the
sum of the deviations as a measure of dispersion for a set of data. Instead, the standard deviation uses
the sum of the squares of the deviations.

Standard Deviation for Populations and Samples

If 𝑥1, 𝑥2, 𝑥3, …, 𝑥𝑛 is a population of 𝑛 numbers with a mean of 𝜇, then the standard
deviation of the population is

$$\sigma = \sqrt{\frac{\sum (x - \mu)^2}{n}}$$

If 𝑥1, 𝑥2, 𝑥3, …, 𝑥𝑛 is a sample of 𝑛 numbers with a mean of 𝑥̅, then the standard deviation
of the sample is

$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$

You may question why a denominator of 𝑛 − 1 is used instead of 𝑛 when we compute a
sample standard deviation. The reason is that a sample standard deviation is often used to estimate
the population standard deviation, and it can be shown mathematically that the use of 𝑛 − 1 tends to
yield better estimates.


Procedures for Computing a Standard Deviation

1. Determine the mean of the n numbers.


2. For each number, calculate the deviation (difference) between the number and the mean
of the numbers.
3. Calculate the square of each of the deviations and find the sum of these squared
deviations.
4. If the data is a population, then divide the sum by 𝑛. If the data is a sample, then divide
the sum by 𝑛 − 1.
5. Find the square root of the quotient in Step 4.

Example 2 The following numbers were obtained by sampling a population: 2, 4, 7,
12, 15. Find the standard deviation of the sample.

Solution:
Step 1: Determine the mean.

$$\bar{x} = \frac{2 + 4 + 7 + 12 + 15}{5} = \frac{40}{5} = 8$$

Step 2: For each number, calculate the deviation between the number and the mean.

𝑥      𝑥 − 𝑥̅
2      2 − 8 = −6
4      4 − 8 = −4
7      7 − 8 = −1
12     12 − 8 = 4
15     15 − 8 = 7
𝑛 = 5

Step 3: Calculate the square of each of the deviations in Step 2, and find the sum of these
squared deviations.

𝑥      𝑥 − 𝑥̅           (𝑥 − 𝑥̅)²
2      2 − 8 = −6      (−6)² = 36
4      4 − 8 = −4      (−4)² = 16
7      7 − 8 = −1      (−1)² = 1
12     12 − 8 = 4      4² = 16
15     15 − 8 = 7      7² = 49
𝑛 = 5                  Σ(𝑥 − 𝑥̅)² = 118 (sum of the squared deviations)

Step 4: Because we have a sample of 𝑛 = 5 values, divide the sum 118 by 𝑛 − 1, which is 4.

$$\frac{118}{4} = 29.5$$


Step 5: The standard deviation of the sample is 𝑠 = √29.5. Thus the standard deviation is
𝑠 ≈ 5.43.
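
The five-step procedure can be written out as a short Python function. This is a minimal sketch for study purposes; the function name `standard_deviation` and its `sample` flag are our own choices and not notation from the module.

```python
import math


def standard_deviation(values, sample=True):
    """Follow the five-step procedure for a sample or a population."""
    n = len(values)
    if n == 0 or (sample and n < 2):
        raise ValueError("not enough data values")
    mean = sum(values) / n                           # Step 1: the mean
    deviations = [x - mean for x in values]          # Step 2: deviations
    sum_of_squares = sum(d * d for d in deviations)  # Step 3: squared, summed
    divisor = n - 1 if sample else n                 # Step 4: n - 1 or n
    return math.sqrt(sum_of_squares / divisor)       # Step 5: square root


s = standard_deviation([2, 4, 7, 12, 15])  # the sample from Example 2
print(round(s, 2))
```

Passing `sample=False` switches the divisor from 𝑛 − 1 to 𝑛, which is the only difference between the sample and population formulas.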

Your turn 2 A student has the following quiz scores: 5, 8, 16, 17, 18, 20. Find the standard
deviation for this population of quiz scores.

Variance
A statistic known as the variance is also used as a measure of dispersion. The variance for a
given set of data is the square of the standard deviation of the data.

Variances
If 𝑥1, 𝑥2, 𝑥3, …, 𝑥𝑛 is a population of 𝑛 numbers with a mean of 𝜇, then the variance of
the population is

$$\sigma^2 = \frac{\sum (x - \mu)^2}{n}$$

If 𝑥1, 𝑥2, 𝑥3, …, 𝑥𝑛 is a sample of 𝑛 numbers with a mean of 𝑥̅, then the variance of the
sample is

$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$

Example 3 Find the variance for the sample given in Example 2.

Solution

In Example 2, we found 𝑠 = √29.5. The variance is the square of the standard deviation. Thus
the variance is 𝑠² = (√29.5)² = 29.5.

Your turn 3 Find the variance for the population given in Your turn 2.


Computer Solution
We can use a spreadsheet program to find the range, mean deviation, standard deviation, and
variance of a data set.
Let us use the same list of data as in Example 2. The data are: 2, 4, 7, 12, 15 (occupying cells A2 to A6).

Range: =MAX(A2:A6)-MIN(A2:A6)
Mean deviation: =AVEDEV(A2:A6)
Sample standard deviation: =STDEV(A2:A6)
Population standard deviation: =STDEVP(A2:A6)
Sample variance: =VAR(A2:A6)
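
The spreadsheet functions used here (MAX, MIN, AVEDEV, STDEV, STDEVP, and VAR) have close counterparts in Python's standard `statistics` module. The sketch below is an illustration under that assumption; the mean deviation has no built-in function in the module, so it is computed by hand.

```python
import statistics

data = [2, 4, 7, 12, 15]  # the sample from Example 2

data_range = max(data) - min(data)               # MAX(...) - MIN(...)
mean = statistics.mean(data)
mean_deviation = sum(abs(x - mean) for x in data) / len(data)  # AVEDEV
sample_sd = statistics.stdev(data)               # STDEV (divides by n - 1)
population_sd = statistics.pstdev(data)          # STDEVP (divides by n)
sample_variance = statistics.variance(data)      # VAR (divides by n - 1)

print(data_range, mean_deviation, sample_variance)
print(round(sample_sd, 2), round(population_sd, 2))
```

The sample and population standard deviations differ for the same five numbers because of the divisor, so it matters which function is chosen.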

LEARNING POINTS
Measures of dispersion are important for describing the spread of the data, or its variation around a
central value. To measure the spread or dispersion of data, we compute statistical values
known as the range, mean deviation, standard deviation, and variance.

LEARNING ACTIVITY 2

In exercises 1 to 6, compute the (a) range, (b) mean deviation, (c) standard deviation, and (d) variance
for the following samples.

1. 6, 8, 3, 5, 6, 2, 7
2. 2, 8, 4, 2, 5, 8, 10, 1, 8, 12
3. 3, 4, 4, 6, 7, 9, 10, 13, 15, 16, 18, 21
4. 5.2, 11.7, 19.1, 3.7, 8.2, 16.3
5. 93, 67, 49, 55, 92, 87, 77, 66, 73, 96, 54
6. 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4

7. The study described in the In the News article presented data on flu-related deaths in several
age categories. Here is the complete set of data for one category.

Number of Annual Flu-Related Deaths in People Aged 0 to 19, United States, 1976/1977–2006/2007


Source: Centers for Disease Control and Prevention

Find the range, the mean, and the population standard deviation of the data.

In exercises 8 to 10, the following tables list the ages of female and male actors when they starred
in their Oscar-winning Best Actor performances.

Ages of Best Female Actor Award Recipients, Academy Awards, 1975–2010

Ages of Best Male Actor Award Recipients, Academy Awards, 1975–2010

8. Find the mean and the sample standard deviation of the ages of the female recipients. Round each
result to the nearest tenth.

9. Find the mean and the sample standard deviation of the ages of the male recipients. Round each
result to the nearest tenth.

10. Which of the two data sets has the larger mean? Which of the two data sets has the larger
standard deviation?

LEARNING CONTENTS (MEASURES OF RELATIVE POSITION)

Lesson 3: Measures of Relative Position

A measure of relative position shows where a given value stands in relation to the other
values in the same set of data. The most common measures of relative position are quartiles,
deciles, percentiles, and standard scores.


Quartiles divide a set of observations into four equal parts. To explain further, think of any
set of values arranged from smallest to largest. The first quartile, usually labelled 𝑄1, is the value
below which 25 percent of the observations occur, and the third quartile, usually labelled 𝑄3, is the
value below which 75 percent of the observations occur. Logically, 𝑄2 is the median.

In a similar fashion, deciles divide a set of observations into 10 equal parts. So if you found
that your GPA was at the 8th decile, or 𝐷8, in your class, you could conclude that 80 percent of your
classmates had a GPA lower than yours and 20 percent had a higher GPA.

Finally, percentiles divide a set of observations into 100 equal parts. For instance, a
GPA at the 33rd percentile means that 33 percent of the students have a lower GPA and 67 percent have
a higher GPA.

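Quartiles, Deciles, Percentiles

Quartile: $Q_i = \frac{i(n+1)}{4}$     Decile: $D_i = \frac{i(n+1)}{10}$     Percentile: $P_i = \frac{i(n+1)}{100}$

where
𝑛 is the number of observations
𝑖 is the desired location (for example, 𝑖 = 3 for the third quartile)

Each formula gives the position of the measure in the ranked data, not the value itself.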

Example 1 Find the 3rd quartile for the following data.

5, 7, 11, 1, 17, 23, 19, 3, 9, 21, 15, and 13

Solution:

The first thing to do is arrange the data in ascending order.

1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, and 23

For the 3rd quartile:

$$Q_3 = \frac{i(n+1)}{4} = \frac{3(12+1)}{4} = 9.75\text{th position}$$

9.75th position → value at the 9th position + 0.75 × (value at the 10th position − value at the 9th position)

= 17 + 0.75 × (19 − 17)

= 18.5

After arranging the data in ascending order, count to the 9.75th position. Because this position
falls between two observations, we interpolate: take the value at the 9th position and add 0.75 of the
difference between the values at the 10th and 9th positions. The value of the third quartile is 18.5.
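
The position formula and the interpolation step can be sketched in Python as follows. This is an illustration only: the helper names `value_at_position` and `quantile` are hypothetical, and the code assumes the 𝑖(𝑛 + 1) position rule used in this lesson (statistical software often uses other rules and may give slightly different values).

```python
def value_at_position(ranked, position):
    """Interpolate the value at a 1-based, possibly fractional, position."""
    if position < 1 or position > len(ranked):
        raise ValueError("position falls outside the ranked data")
    whole = int(position)        # e.g. 9 for position 9.75
    fraction = position - whole  # e.g. 0.75 for position 9.75
    lower = ranked[whole - 1]    # value at the whole-numbered position
    if fraction == 0:
        return lower
    upper = ranked[whole]        # value at the next position
    return lower + fraction * (upper - lower)


def quantile(values, i, parts):
    """Return the i-th of `parts` divisions using position i(n + 1)/parts."""
    ranked = sorted(values)      # arrange in ascending order first
    position = i * (len(ranked) + 1) / parts
    return value_at_position(ranked, position)


data = [5, 7, 11, 1, 17, 23, 19, 3, 9, 21, 15, 13]
print(quantile(data, 3, 4))      # third quartile: i = 3, parts = 4
```

The same function covers deciles (`parts=10`) and percentiles (`parts=100`), since only the divisor in the position formula changes.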

Your turn 1 Using the same data as in Example 1, find the 4th decile.

Standard Scores (or the z-score)

The z-score for a given data value 𝑥 is the number of standard deviations that 𝑥 is
above or below the mean of the data.

z-score

The following formulas show how to calculate the z-score for a data value 𝑥 in a
population and in a sample.

Population: $z_x = \frac{x - \mu}{\sigma}$     Sample: $z_x = \frac{x - \bar{x}}{s}$

A negative z-score represents a value less than the mean. A positive z-score represents
a value greater than the mean. When 𝑧 = 0, the data value is equal to the mean.

A z-score equal to 1 represents a value that is 1 standard deviation above the mean; a
z-score equal to −1 represents a value that is 1 standard deviation below the mean. If the
data set is large and approximately bell-shaped (normally distributed), about 68% of the values
have z-scores between −1 and 1, about 95% have z-scores between −2 and 2, and about 99.7% have
z-scores between −3 and 3.

Example 2 Andrew gets a score of 64 in a Mathematics test where the class mean is 50
with a standard deviation of 8. Belle gets a score of 74 in a Physics test where
the mean is 58 and the standard deviation is 10. Find out who actually performed better.

Solution
Find the z-score for each test.

Andrew: $z_{64} = \frac{64 - 50}{8} = 1.75$     Belle: $z_{74} = \frac{74 - 58}{10} = 1.6$

So although Belle's score is higher, Andrew's score is farther above the mean in standard deviation
units, and we may say that it is Andrew who performed better.
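
The comparison in this example can be checked with a small Python sketch. The helper name `z_score` is hypothetical and the code is only an illustration of the formula.

```python
def z_score(x, mean, standard_deviation):
    """Return how many standard deviations x lies above or below the mean."""
    if standard_deviation <= 0:
        raise ValueError("the standard deviation must be positive")
    return (x - mean) / standard_deviation


andrew = z_score(64, 50, 8)   # Mathematics test
belle = z_score(74, 58, 10)   # Physics test

print(andrew, belle)
print("Andrew" if andrew > belle else "Belle", "has the higher relative score")
```

Because each score is measured against its own test's mean and standard deviation, the two z-scores can be compared even though the tests are different.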

Your turn 2 Cheryl has taken two quizzes in her history class. She scored 15 on the first
quiz, for which the mean of all scores was 12 and the standard deviation was 2.4. Her score on the
second quiz, for which the mean of all scores was 11 and the standard deviation was 2.0, was 14. In
comparison to her classmates, did Cheryl do better on the first quiz or the second quiz?

Example 3 A consumer group tested a sample of 100 light bulbs. It found that the mean
life expectancy of the bulbs was 842 h, with a standard deviation of 90 h. One
particular light bulb from the DuraBright Company had a z-score of 1.2. What was the life span of
this light bulb?

Solution

Substitute the given values into the z-score equation and solve for 𝑥.

$$z_x = \frac{x - \bar{x}}{s}$$

$$1.2 = \frac{x - 842}{90}$$

$$108 = x - 842$$

$$950 = x$$

The light bulb had a life span of 950 h.
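
Solving the z-score equation for 𝑥 gives 𝑥 = 𝑥̅ + 𝑧 · 𝑠, which can be sketched in Python. The function name `value_from_z` is a hypothetical helper used only for this illustration.

```python
def value_from_z(z, mean, standard_deviation):
    """Recover the data value x from its z-score: x = mean + z * s."""
    return mean + z * standard_deviation


# Light bulb example: mean 842 h, standard deviation 90 h, z-score 1.2
life_span = value_from_z(1.2, 842, 90)
print(round(life_span, 1))
```

The result is rounded when printed because decimal fractions such as 1.2 are stored only approximately in floating-point arithmetic.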

Your turn 3 Roland received a score of 70 on a test for which the mean score was 65.5.
Roland has learned that the z-score for his test is 0.6. What is the standard
deviation for this set of test scores?

LEARNING POINTS
A measure of relative position shows where a given value stands in relation to the other values
in the same set of data. The most common measures of relative position are quartiles, deciles,
percentiles, and standard scores.

LEARNING ACTIVITY 3

In exercises 1 to 4, a data set has a mean of 𝑥̅ = 212 and a standard deviation of 40. Find the
z-score for each of the following.


1. 𝑥 = 200
2. 𝑥 = 224
3. 𝑥 = 300
4. 𝑥 = 100
A data set has a mean of 𝑥̅ = 4010 and a standard deviation of 115. Find the z-score for each of the
following.
5. 𝑥 = 3840
6. 𝑥 = 4200
7. 𝑥 = 4300
8. 𝑥 = 4030

In exercises 9 to 10, a random sample of 1000 oranges showed that the mean amount of juice per
orange was 7.4 fluid ounces, with a standard deviation of 1.1 fluid ounces.

9. Determine the z-score, to the nearest hundredth, of an orange that produced 6.6 fluid ounces of
juice.
10. The z-score for one orange was 3.15. How much juice was produced by this orange? Round to
the nearest tenth of a fluid ounce.

11. Which of the following fitness scores is the highest relative score?
a. A score of 42 on a test with a mean of 31 and a standard deviation of 6.5
b. A score of 1140 on a test with a mean of 1080 and a standard deviation of 68.2
c. A score of 4710 on a test with a mean of 3960 and a standard deviation of 560.4

In exercises 12 to 14, the following scores were received by 20 accounting students in a short quiz:
10, 9, 15, 20, 13, 15, 18, 11, 7, 12, 15, 13, 18, 19, 12, 8, 10, 13, 17, and 15. Find the following:
12. the third quartile,
13. the eighth decile, and
14. the fortieth percentile.

15. Rene scored at the 84th percentile on a test given to 12,600 students. How many students scored
higher than Rene?


LEARNING CONTENTS (NORMAL DISTRIBUTION)

Lesson 4: Normal Distribution


Frequency Distributions and Histograms
A frequency distribution displays a data set by dividing the data into intervals, or classes,
and listing the number of data values that fall into each interval.

Large sets of data are often displayed using a grouped frequency distribution or a
histogram. For instance, consider the following situation. An Internet service provider (ISP) has
installed new computers. To estimate the new download times its subscribers will experience, the
ISP surveyed 1000 of its subscribers to determine the time required for each subscriber to download
a particular file from an Internet site. The results of that survey are summarized in Table 4.1.

Table 4.1 A Grouped Frequency Distribution with 12 Classes

Download time (in seconds)    Number of subscribers
0 − 5                         6
5 − 10                        17
10 − 15                       43
15 − 20                       92
20 − 25                       151
25 − 30                       192
30 − 35                       190
35 − 40                       149
40 − 45                       90
45 − 50                       45
50 − 55                       15
55 − 60                       10

Figure 4.1 A histogram for the frequency distribution in Table 4.1

Table 4.1 is called a grouped frequency distribution. It shows how often (frequently) certain
events occurred. Each interval, 0– 5, 5– 10, and so on, is called a class. This distribution has 12 classes.
For the 10–15 class, 10 is the lower class boundary and 15 is the upper class boundary. Any data
value that lies on a common boundary is assigned to the higher class. The graph of a frequency
distribution is called a histogram. A histogram provides a pictorial view of how the data are
distributed. In Figure 4.1, the height of each bar of the histogram indicates how many subscribers
experienced the download times shown by the class on the base of the bar.
Examine the distribution in Table 4.2 below. It shows the percent of subscribers that are in
each class, as opposed to the frequency distribution in Table 4.1 , which shows the number of
customers in each class. The type of frequency distribution that lists the percent of data in each class
is called a relative frequency distribution.

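To convert a frequency distribution to a relative frequency distribution, divide each class
frequency by the total number of observations and multiply the result by 100. For instance, the
0–5 class contains 6 of the 1000 subscribers, so its relative frequency is (6/1000)(100) = 0.6%.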

The relative frequency histogram in Figure 4.2 was drawn by using the data in the relative
frequency distribution. It shows the percent of subscribers along its vertical axis.
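
The conversion described above can be sketched in a few lines of Python. This is only an illustrative sketch and not part of the original module: the function name `relative_frequencies` is an assumption, and the class frequencies are copied from Table 4.1.

```python
# Class boundaries and frequencies copied from Table 4.1.
classes = [(0, 5), (5, 10), (10, 15), (15, 20), (20, 25), (25, 30),
           (30, 35), (35, 40), (40, 45), (45, 50), (50, 55), (55, 60)]
frequencies = [6, 17, 43, 92, 151, 192, 190, 149, 90, 45, 15, 10]


def relative_frequencies(freqs):
    """Divide each frequency by the total and multiply by 100 (hypothetical helper)."""
    total = sum(freqs)
    return [round(100 * f / total, 1) for f in freqs]


percents = relative_frequencies(frequencies)

# Print one row per class: boundaries, then percent of subscribers.
for (low, high), pct in zip(classes, percents):
    print(f"{low:>2} - {high:<2}  {pct:>5.1f}%")
```

Each printed percent should match the corresponding row of the relative frequency distribution, and the percents add up to 100.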

Table 4.2 A Relative Frequency Distribution

Download time     Percent of
(in seconds)      subscribers
0 − 5             0.6
5 − 10            1.7
10 − 15           4.3
15 − 20           9.2
20 − 25           15.1
25 − 30           19.2
30 − 35           19.0
35 − 40           14.9
40 − 45           9.0
45 − 50           4.5
50 − 55           1.5
55 − 60           1.0

Figure 4.2 A relative frequency histogram

One advantage of using a relative frequency distribution instead of a grouped frequency
distribution is that there is a direct correspondence between the percent values of the relative frequency
distribution and probabilities. For instance, in the relative frequency distribution in Table 4.2, the
percent of the data that lies between 35 and 40 s is 14.9%. Thus, if a subscriber is chosen at random,
the probability that the subscriber will require at least 35 s but less than 40 s to download the
file is 0.149.

Example 1 Use the relative frequency distribution in Table 4.2 to determine the

a. percent of subscribers who required at least 25 s to download the file.
b. probability that a subscriber chosen at random will require at least 5 but less than 20 s to
download the file.

Solution
a. The percent of subscribers who required at least 25 s to download the file is 69.1%.

Table 4.3

Download time     Percent of
(in seconds)      subscribers
0 − 5             0.6
5 − 10            1.7
10 − 15           4.3
15 − 20           9.2
20 − 25           15.1
25 − 30           19.2
30 − 35           19.0
35 − 40           14.9
40 − 45           9.0
45 − 50           4.5
50 − 55           1.5
55 − 60           1.0

For the classes with a lower boundary of at least 5 s and an upper boundary of 20 s or less
(5 − 10, 10 − 15, and 15 − 20), the sum of the percents is 1.7 + 4.3 + 9.2 = 15.2%.
For the classes with a lower boundary of 25 s or more (25 − 30 through 55 − 60), the sum of the
percents is 19.2 + 19.0 + 14.9 + 9.0 + 4.5 + 1.5 + 1.0 = 69.1%.

b. The percent of subscribers who required at least 5 but less than 20 s to download the file is
15.2%. The probability that a subscriber chosen at random will require at least 5 but less than 20 s to
download the file is 0.152.
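
The sums used in Example 1 can be checked with a short Python sketch. The helper name `percent_between` is an assumption made for illustration, and the percents are copied from Table 4.2. A class is counted only when it lies entirely inside the requested range, which mirrors the way the example adds whole classes.

```python
# Percent of subscribers in each class, copied from Table 4.2.
percent_by_class = {
    (0, 5): 0.6, (5, 10): 1.7, (10, 15): 4.3, (15, 20): 9.2,
    (20, 25): 15.1, (25, 30): 19.2, (30, 35): 19.0, (35, 40): 14.9,
    (40, 45): 9.0, (45, 50): 4.5, (50, 55): 1.5, (55, 60): 1.0,
}


def percent_between(table, low, high=float("inf")):
    """Sum the percents of all classes whose boundaries lie inside [low, high]."""
    total = sum(p for (lo, hi), p in table.items() if lo >= low and hi <= high)
    return round(total, 1)


at_least_25 = percent_between(percent_by_class, 25)          # part a
between_5_and_20 = percent_between(percent_by_class, 5, 20)  # part b

print(f"At least 25 s: {at_least_25}%")
print(f"Probability of at least 5 s but less than 20 s: {between_5_and_20 / 100:.3f}")
```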

Your turn 1 Use the relative frequency distribution in Table 4.2 to determine the

a. percent of subscribers who required less than 25 s to download the file.
b. probability that a subscriber chosen at random will require at least 10 s but less than 30 s to
download the file.

Normal Distributions and the Empirical Rule


One of the most important statistical distributions of data is known as a normal distribution.
This distribution occurs in a variety of applications. Types of data that may demonstrate a normal
distribution include the lengths of leaves on a tree, the weights of newborns in a hospital, the lengths
of time of a student’s trip from home to school over a period of months, the SAT scores of a large
group of students, and the life spans of light bulbs.

A normal distribution forms a bell-shaped curve that is symmetric about a vertical line through
the mean of the data. A graph of a normal distribution with a mean of 5 is shown below.

Properties of a Normal Distribution

Every normal distribution has the following properties.


▪ The graph is symmetric about a vertical line through the mean of the distribution.
▪ The mean, median, and mode are equal.
▪ The y-value of each point on the curve is the percent (expressed as a decimal) of the
data at the corresponding x-value.
▪ Areas under the curve that are symmetric about the mean are equal.
▪ The total area under the curve is 1.

In the normal distribution shown below, the area of the shaded region is 0.159 square unit. This region
represents the fact that 15.9% of the data are greater than or equal to 10. Because the total area under the
curve is 1, the unshaded region under the curve has area 1 − 0.159, or 0.841, representing the fact
that 84.1% of the data are less than 10.

The following rule, called the Empirical Rule, describes the percent of data that lie within 1, 2,
and 3 standard deviations of the mean in a normal distribution.
Empirical Rule for a Normal Distribution

In a normal distribution, approximately

▪ 68% of the data lie within 1 standard deviation of the mean.
▪ 95% of the data lie within 2 standard deviations of the mean.
▪ 99.7% of the data lie within 3 standard deviations of the mean.
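
The percentages in the Empirical Rule are rounded values of areas under the normal curve. As a hedged illustration (not part of the original module), the sketch below uses the standard identity that the area within k standard deviations of the mean equals erf(k/√2); the function name `area_within` is an assumption.

```python
import math


def area_within(k):
    """Area under any normal curve within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {100 * area_within(k):.1f}%")
```

The computed areas are about 68.3%, 95.4%, and 99.7%, which the Empirical Rule states as approximately 68%, 95%, and 99.7%.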

Example 2 Use the Empirical Rule to Solve an Application


A survey of 1000 U.S. gas stations found that the price charged for a gallon of
regular gas could be closely approximated by a normal distribution with a mean of $3.10 and a
standard deviation of $0.18. How many of the stations charge

a. between $2.74 and $3.46 for a gallon of regular gas?


b. less than $3.28 for a gallon of regular gas?
c. more than $3.46 for a gallon of regular gas?

Solution
a. Converting $2.74 into a z-score gives

\[ z_{2.74} = \frac{2.74 - 3.10}{0.18} = -2 \]

which means that the $2.74 per gallon price is 2 standard deviations below the mean. For the $3.46 price,

\[ z_{3.46} = \frac{3.46 - 3.10}{0.18} = 2 \]

so the $3.46 price is 2 standard deviations above the mean. In a normal distribution, 95% of all data lie within 2
standard deviations of the mean. See Figure 4.3. Therefore, approximately
(95%)(1000) = (0.95)(1000) = 950 of the stations charge between $2.74 and $3.46 for a
gallon of regular gas.

Figure 4.3

b. Converting the $3.28 price into a z-score gives

\[ z_{3.28} = \frac{3.28 - 3.10}{0.18} = 1 \]

so the $3.28 price is 1 standard deviation above the mean. See Figure 4.4. In a normal distribution,
34% of all data lie between the mean and 1 standard deviation above the mean. Thus, approximately
(34%)(1000) = (0.34)(1000) = 340 of the stations charge between $3.10 and $3.28 for a
gallon of regular gasoline. Half of the 1000 stations, or 500 stations, charge less than the mean.
Therefore, about
340 + 500 = 840 of the stations charge less than $3.28 for a gallon of regular gas.

Figure 4.4

c. Converting the $3.46 price into a z-score gives

\[ z_{3.46} = \frac{3.46 - 3.10}{0.18} = 2 \]

so the $3.46 price is 2 standard deviations above the mean. In a normal distribution, 95% of all data
are within 2 standard deviations of the mean. This means that the other 5% of the data lie either more than
2 standard deviations above the mean or more than 2 standard deviations below the mean. We are interested
only in the data that are more than 2 standard deviations above the mean, which is 1/2 of 5%, or 2.5%, of the
data. See Figure 4.5. Thus about (2.5%)(1000) = (0.025)(1000) = 25 of the stations charge more than $3.46
for a gallon of regular gas.

Figure 4.5
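
The three parts of Example 2 follow the same pattern: convert each price to a z-score, then combine Empirical Rule percentages. The Python sketch below restates that arithmetic under the assumptions of the example (mean $3.10, standard deviation $0.18, 1000 stations). The names `z_score` and `WITHIN` are illustrative, not from the module.

```python
mean, sd, stations = 3.10, 0.18, 1000

# Empirical Rule: fraction of data within k standard deviations of the mean.
WITHIN = {1: 0.68, 2: 0.95, 3: 0.997}


def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


# a. $2.74 and $3.46 are at z = -2 and z = 2.
z_low = round(z_score(2.74, mean, sd), 2)
z_high = round(z_score(3.46, mean, sd), 2)
between = round(WITHIN[2] * stations)

# b. $3.28 is at z = 1: half the data below the mean plus half of the 68%.
below = round((0.50 + WITHIN[1] / 2) * stations)

# c. More than $3.46 (z = 2): half of the data outside 2 standard deviations.
above = round((1 - WITHIN[2]) / 2 * stations)

print(z_low, z_high, between, below, above)
```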

Your turn 2 A vegetable distributor knows that during the month of August, the weights of
its tomatoes are normally distributed with a mean of 0.61 lb and a standard
deviation of 0.15 lb.
a. What percent of the tomatoes weigh less than 0.76 lb?
b. In a shipment of 6000 tomatoes, how many tomatoes can be expected to weigh more than 0.31 lb?
c. In a shipment of 4500 tomatoes, how many tomatoes can be expected to weigh from 0.31 lb to
0.91 lb?

The Standard Normal Distribution


It is often helpful to convert data values x to z-scores, as we did in the previous section by
using the z-score formulas:

\[ z_x = \frac{x - \mu}{\sigma} \quad \text{or} \quad z_x = \frac{x - \bar{x}}{s} \]

If the original distribution of 𝑥 values is a normal distribution, then the corresponding distribution of
z-scores will also be a normal distribution. This normal distribution of z-scores is called the standard
normal distribution. See Figure 4.6. It has a mean of 0 and a standard deviation of 1.
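
As a small, hedged illustration of why z-scores have a mean of 0 and a standard deviation of 1, the sketch below standardizes a short list of values using the population formula. The function name `z_scores` is an assumption, and the six values are used only as sample data (they also appear in the Lesson 2 answer key).

```python
import statistics


def z_scores(data):
    """Convert each value x to (x - mu) / sigma using population formulas."""
    mu = statistics.mean(data)
    sigma = statistics.pstdev(data)
    return [(x - mu) / sigma for x in data]


values = [5, 8, 16, 17, 18, 20]  # sample data for illustration only
zs = z_scores(values)

# After standardizing, the mean is 0 and the standard deviation is 1
# (up to floating-point rounding).
mean_of_z = statistics.mean(zs)
sd_of_z = statistics.pstdev(zs)
print([round(z, 2) for z in zs])
```

Standardizing any data set produces this result, whether or not the data are normal. What normality adds is that the z-scores themselves follow the standard normal curve.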

Figure 4.6

The Standard Normal Distribution


The standard normal distribution is the normal distribution that has a mean of 0 and a
standard deviation of 1.

Tables and calculators are often used to determine the area under a portion of the standard normal
curve. We will refer to this type of area as an area of the standard normal distribution. Table 4.4
gives the approximate areas of the standard normal distribution between the mean 0 and z standard
deviations from the mean. See Figure 4.7. Table 4.4 indicates that the area A of the standard normal
distribution from the mean 0 up to z = 1.34 is 0.410 square unit.

Figure 4.7

TABLE 4.4
Area Under the Standard Normal Curve
z A z A z A z A z A z A
0.00 0.000 0.56 0.212 1.12 0.369 1.68 0.454 2.24 0.487 2.80 0.497
0.01 0.004 0.57 0.216 1.13 0.371 1.69 0.454 2.25 0.488 2.81 0.498
0.02 0.008 0.58 0.219 1.14 0.373 1.70 0.455 2.26 0.488 2.82 0.498
0.03 0.012 0.59 0.222 1.15 0.375 1.71 0.456 2.27 0.488 2.83 0.498
0.04 0.016 0.60 0.226 1.16 0.377 1.72 0.457 2.28 0.489 2.84 0.498
0.05 0.020 0.61 0.229 1.17 0.379 1.73 0.458 2.29 0.489 2.85 0.498
0.06 0.024 0.62 0.232 1.18 0.381 1.74 0.459 2.30 0.489 2.86 0.498
0.07 0.028 0.63 0.236 1.19 0.383 1.75 0.460 2.31 0.490 2.87 0.498
0.08 0.032 0.64 0.239 1.20 0.385 1.76 0.461 2.32 0.490 2.88 0.498
0.09 0.036 0.65 0.242 1.21 0.387 1.77 0.462 2.33 0.490 2.89 0.498
0.10 0.040 0.66 0.245 1.22 0.389 1.78 0.462 2.34 0.490 2.90 0.498
0.11 0.044 0.67 0.249 1.23 0.391 1.79 0.463 2.35 0.491 2.91 0.498
0.12 0.048 0.68 0.252 1.24 0.393 1.80 0.464 2.36 0.491 2.92 0.498
0.13 0.052 0.69 0.255 1.25 0.394 1.81 0.465 2.37 0.491 2.93 0.498
0.14 0.056 0.70 0.258 1.26 0.396 1.82 0.466 2.38 0.491 2.94 0.498
0.15 0.060 0.71 0.261 1.27 0.398 1.83 0.466 2.39 0.492 2.95 0.498
0.16 0.064 0.72 0.264 1.28 0.400 1.84 0.467 2.40 0.492 2.96 0.498
0.17 0.067 0.73 0.267 1.29 0.401 1.85 0.468 2.41 0.492 2.97 0.499
0.18 0.071 0.74 0.270 1.30 0.403 1.86 0.469 2.42 0.492 2.98 0.499
0.19 0.075 0.75 0.273 1.31 0.405 1.87 0.469 2.43 0.492 2.99 0.499
0.20 0.079 0.76 0.276 1.32 0.407 1.88 0.470 2.44 0.493 3.00 0.499
0.21 0.083 0.77 0.279 1.33 0.408 1.89 0.471 2.45 0.493 3.01 0.499
0.22 0.087 0.78 0.282 1.34 0.410 1.90 0.471 2.46 0.493 3.02 0.499
0.23 0.091 0.79 0.285 1.35 0.411 1.91 0.472 2.47 0.493 3.03 0.499
0.24 0.095 0.80 0.288 1.36 0.413 1.92 0.473 2.48 0.493 3.04 0.499
0.25 0.099 0.81 0.291 1.37 0.415 1.93 0.473 2.49 0.494 3.05 0.499
0.26 0.103 0.82 0.294 1.38 0.416 1.94 0.474 2.50 0.494 3.06 0.499
0.27 0.106 0.83 0.297 1.39 0.418 1.95 0.474 2.51 0.494 3.07 0.499
0.28 0.110 0.84 0.300 1.40 0.419 1.96 0.475 2.52 0.494 3.08 0.499
0.29 0.114 0.85 0.302 1.41 0.421 1.97 0.476 2.53 0.494 3.09 0.499
0.30 0.118 0.86 0.305 1.42 0.422 1.98 0.476 2.54 0.494 3.10 0.499
0.31 0.122 0.87 0.308 1.43 0.424 1.99 0.477 2.55 0.495 3.11 0.499
0.32 0.126 0.88 0.311 1.44 0.425 2.00 0.477 2.56 0.495 3.12 0.499
0.33 0.129 0.89 0.313 1.45 0.426 2.01 0.478 2.57 0.495 3.13 0.499
0.34 0.133 0.90 0.316 1.46 0.428 2.02 0.478 2.58 0.495 3.14 0.499
0.35 0.137 0.91 0.319 1.47 0.429 2.03 0.479 2.59 0.495 3.15 0.499
0.36 0.141 0.92 0.321 1.48 0.431 2.04 0.479 2.60 0.495 3.16 0.499
0.37 0.144 0.93 0.324 1.49 0.432 2.05 0.480 2.61 0.495 3.17 0.499
0.38 0.148 0.94 0.326 1.50 0.433 2.06 0.480 2.62 0.496 3.18 0.499
0.39 0.152 0.95 0.329 1.51 0.434 2.07 0.481 2.63 0.496 3.19 0.499
0.40 0.155 0.96 0.331 1.52 0.436 2.08 0.481 2.64 0.496 3.20 0.499
0.41 0.159 0.97 0.334 1.53 0.437 2.09 0.482 2.65 0.496 3.21 0.499
0.42 0.163 0.98 0.336 1.54 0.438 2.10 0.482 2.66 0.496 3.22 0.499
0.43 0.166 0.99 0.339 1.55 0.439 2.11 0.483 2.67 0.496 3.23 0.499
0.44 0.170 1.00 0.341 1.56 0.441 2.12 0.483 2.68 0.496 3.24 0.499
0.45 0.174 1.01 0.344 1.57 0.442 2.13 0.483 2.69 0.496 3.25 0.499
0.46 0.177 1.02 0.346 1.58 0.443 2.14 0.484 2.70 0.497 3.26 0.499
0.47 0.181 1.03 0.348 1.59 0.444 2.15 0.484 2.71 0.497 3.27 0.499
0.48 0.184 1.04 0.351 1.60 0.445 2.16 0.485 2.72 0.497 3.28 0.499
0.49 0.188 1.05 0.353 1.61 0.446 2.17 0.485 2.73 0.497 3.29 0.499
0.50 0.191 1.06 0.355 1.62 0.447 2.18 0.485 2.74 0.497 3.30 0.500
0.51 0.195 1.07 0.358 1.63 0.448 2.19 0.486 2.75 0.497 3.31 0.500
0.52 0.198 1.08 0.360 1.64 0.449 2.20 0.486 2.76 0.497 3.32 0.500
0.53 0.202 1.09 0.362 1.65 0.451 2.21 0.486 2.77 0.497 3.33 0.500

0.54 0.205 1.10 0.364 1.66 0.452 2.22 0.487 2.78 0.497
0.55 0.209 1.11 0.367 1.67 0.453 2.23 0.487 2.79 0.497

Because the standard normal distribution is symmetrical about the mean of 0, we can also use Table
4.4 to find the area of a region that is located to the left of the mean.

Example 3 Find the area of the standard normal distribution between 𝑧 = −1.44 and
𝑧 = 0.

Solution
Because the standard normal distribution is symmetrical about the center line 𝑧 = 0, the area of the
standard normal distribution between 𝑧 = −1.44 and 𝑧 = 0 is equal to the area between 𝑧 = 0 and
𝑧 = 1.44. The entry in Table 4.4 associated with 𝑧 = 1.44 is 0.425. Thus the area of
the standard normal distribution between 𝑧 = −1.44 and 𝑧 = 0 is 0.425 square unit. See Figure 4.8.

Figure 4.8 Symmetrical region

Your turn 3 Find the area of the standard normal distribution between 𝑧 = −0.67 and 𝑧 = 0.

The region of the standard normal distribution to the right of 𝑧 = 0.82 (shown in the figure for
Example 4) is called a tail region. A tail region is a region of the standard normal distribution
to the right of a positive 𝑧-value or to the left of a negative 𝑧-value.
To find the area of a tail region, we subtract the entry in Table 4.4 from 0.500. This procedure is
illustrated in the next example.

Example 4 Find the area of the standard normal distribution to the right of 𝑧 = 0.82.

Solution
Table 4.4 indicates that the area from 𝑧 = 0 to 𝑧 = 0.82 is 0.294 square unit. The area to the right
of 𝑧 = 0 is 0.500 square unit. Thus the area to the right of 𝑧 = 0.82 is 0.500 − 0.294 = 0.206
square unit. See the figure below.

Figure 4.8 Area of a tail region
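
The entries of Table 4.4 can also be reproduced numerically. The sketch below is an illustration under stated assumptions rather than the module's own method: it uses the standard identity that the area between 0 and z equals 0.5·erf(|z|/√2), and the function names `area_from_mean` and `tail_area` are hypothetical.

```python
import math


def area_from_mean(z):
    """Area of the standard normal distribution between 0 and z (as in Table 4.4)."""
    return 0.5 * math.erf(abs(z) / math.sqrt(2))


def tail_area(z):
    """Area beyond z, away from the mean: 0.500 minus the table entry."""
    return 0.5 - area_from_mean(z)


example_3 = round(area_from_mean(-1.44), 3)  # between z = -1.44 and z = 0
example_4 = round(tail_area(0.82), 3)        # to the right of z = 0.82
print(example_3, example_4)
```

Taking the absolute value of z inside `area_from_mean` encodes the symmetry argument used in Example 3.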

Your turn 4 Find the area of the standard normal distribution to the left of 𝑧 = −1.47.

The Standard Normal Distribution, Areas, Percentages, and Probabilities


In the standard normal distribution, the area of the distribution from 𝑧 = 𝑎 to 𝑧 = 𝑏
represents
▪ the percentage of 𝑧 −values that lie in the interval from a to b.
▪ the probability that 𝑧 lies in the interval from a to b.

Because the area of a portion of the standard normal distribution can be interpreted as a
percentage of the data or as a probability that the variable lies in an interval, we can use the standard
normal distribution to solve many application problems.

Example 5 A soda machine dispenses soda into 12-ounce cups. Tests show that the
actual amount of soda dispensed is normally distributed, with a mean of
11.5 oz and a standard deviation of 0.2 oz.

a. What percent of cups will receive less than 11.25 oz of soda?


b. What percent of cups will receive between 11.2 oz and 11.55 oz of soda?
c. If a cup is chosen at random, what is the probability that the machine will overflow the cup?

Solution
a. Recall that the formula for the 𝑧-score for a data value 𝑥 is

\[ z_x = \frac{x - \bar{x}}{s} \]

The 𝑧-score for 11.25 oz is

\[ z_{11.25} = \frac{11.25 - 11.5}{0.2} = -1.25 \]

Table 4.4 indicates that 0.394 (39.4%) of the data in a normal distribution are between 𝑧 = 0 and
𝑧 = 1.25. Because the distribution is symmetric, 39.4% of the data are also between 𝑧 = 0 and
𝑧 = −1.25. The percent of data to the left of 𝑧 = −1.25 is 50% − 39.4% = 10.6%. See Figure 4.9.
Thus 10.6% of the cups filled by the soda machine will receive less than 11.25 oz of soda.

Figure 4.9 Portion of data to the left of 𝑧 = −1.25

b. The 𝑧-score for 11.55 oz is

\[ z_{11.55} = \frac{11.55 - 11.5}{0.2} = 0.25 \]

Table 4.4 indicates that 0.099 (9.9%) of the data in a normal distribution are between 𝑧 = 0 and
𝑧 = 0.25.
The 𝑧-score for 11.2 oz is

\[ z_{11.2} = \frac{11.2 - 11.5}{0.2} = -1.5 \]

Table 4.4 indicates that 0.433 (43.3%) of the data in a normal distribution are between 𝑧 = 0 and
𝑧 = 1.5. Because the distribution is symmetric, 43.3% of the data are also between 𝑧 = 0 and
𝑧 = −1.5. See Figure 4.10. Thus the percent of the cups that the vending machine will fill with
between 11.2 oz and 11.55 oz of soda is 43.3% + 9.9% = 53.2%.

Figure 4.10 Portion of data between two 𝑧 −scores

c. A cup will overflow if it receives more than 12 oz of soda. The 𝑧-score for 12 oz is

\[ z_{12} = \frac{12 - 11.5}{0.2} = 2.5 \]

Table 4.4 indicates that 0.494 (49.4%) of the data in the standard normal distribution are between
𝑧 = 0 and 𝑧 = 2.5. The percent of data to the right of 𝑧 = 2.5 is determined by subtracting 49.4%
from 50%, which gives 0.6%. See Figure 4.11. Thus 0.6% of the time the machine produces an overflow, and the
probability that a cup chosen at random will overflow is 0.006.

Figure 4.11 Portion of data to the right of 𝑧 = 2.5
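
The three answers in Example 5 can be cross-checked without a table. The sketch below is a hedged illustration, not the module's procedure: it assumes the stated mean of 11.5 oz and standard deviation of 0.2 oz, and builds a cumulative area function `normal_cdf` (a hypothetical name) from Python's `math.erf`.

```python
import math


def normal_cdf(x, mean, sd):
    """Fraction of a normal distribution that lies below x."""
    return 0.5 * (1 + math.erf((x - mean) / (sd * math.sqrt(2))))


mean, sd = 11.5, 0.2

# a. Less than 11.25 oz (z = -1.25).
less_than_11_25 = normal_cdf(11.25, mean, sd)

# b. Between 11.2 oz (z = -1.5) and 11.55 oz (z = 0.25).
between = normal_cdf(11.55, mean, sd) - normal_cdf(11.2, mean, sd)

# c. More than 12 oz (z = 2.5), the overflow case.
overflow = 1 - normal_cdf(12, mean, sd)

print(round(less_than_11_25, 3), round(between, 3), round(overflow, 3))
```

Rounded to three decimal places, the results agree with the table-based answers of 10.6%, 53.2%, and 0.6%.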

Your turn 5 A study of the careers of professional football players shows that the
lengths of their careers are nearly normally distributed, with a mean of
6.1 years and a standard deviation of 1.8 years.

a. What percent of professional football players have a career of more than 9 years?
b. If a professional football player is chosen at random, what is the probability that the player will
have a career of between 3 and 4 years?

LEARNING POINTS
A frequency distribution displays a data set by dividing the data into intervals, or classes, and
listing the number of data values that fall into each interval. Large sets of data are often
displayed using a grouped frequency distribution or a histogram.
A normal distribution forms a bell-shaped curve that is symmetric about a vertical line through
the mean of the data.
Empirical Rule for a Normal Distribution

In a normal distribution, approximately

68% of the data lie within 1 standard deviation of the mean.
95% of the data lie within 2 standard deviations of the mean.
99.7% of the data lie within 3 standard deviations of the mean.

LEARNING ACTIVITY 4

In Exercises 1 to 3, use the Empirical Rule to answer each question. In a normal distribution, what
percent of the data lie
1. within 3 standard deviations of the mean?
2. below 2 standard deviations of the mean?
3. between 2 standard deviations below the mean and 3 standard deviations above the mean?

In Exercises 4 and 5, use the Empirical Rule to answer each question. A baseball franchise finds that
the attendance at its home games is normally distributed, with a mean of 16,000 and a standard
deviation of 4000.
4. What percent of the home games have an attendance between 12,000 and 20,000 people?
5. What percent of the home games have an attendance of fewer than 8000 people?

In Exercises 6 to 9, find the area, to the nearest thousandth, of the indicated region of the
standard normal distribution.
6. The region where 𝑧 = 1.92
7. The region where 𝑧 < −0.38
8. The region where 𝑧 < 1.82
9. The region where 𝑧 < 1.92

In Exercises 10 to 13, find the area, to the nearest thousandth, of the standard normal distribution
between the given z-scores.
10. 𝑧 = 0 and 𝑧 = 1.9
11. 𝑧 = 0 and 𝑧 = −2.3
12. 𝑧 = 0.7 and 𝑧 = 1.92
13. 𝑧 = −0.44 and 𝑧 = 1.82

In Exercises 14 and 15, a psychologist finds that the intelligence quotients of a group of patients are
normally distributed, with a mean of 102 and a standard deviation of 16. Find
the percent of the patients with IQs

14. above 114.
15. between 90 and 118.

LEARNING CONTENTS ( CORRELATION AND REGRESSION)

Lesson 5 : Regression and Correlation


Correlation is a degree of relationship between variables, which seeks to determine how well
a linear or other equation describes or explains the relationship between variables. It also implies
“association” between two variables.

PEARSON PRODUCT MOMENT CORRELATION COEFFICIENT


The Pearson product-moment correlation coefficient (or Pearson r for short) is a
measure of the strength of a linear association between two variables measured on an interval
or ratio scale.

\[ r = \frac{N\sum xy - \sum x \sum y}{\sqrt{\left[N\sum x^2 - \left(\sum x\right)^2\right]\left[N\sum y^2 - \left(\sum y\right)^2\right]}} \]

where:
Σx = sum of the values of x
Σy = sum of the values of y
Σx² = sum of the squares of the values of x
Σy² = sum of the squares of the values of y
Σxy = sum of the products of x and y
N = total number of pairs

If the linear correlation coefficient 𝒓 is positive, the relationship between the variables has a positive
correlation. In this case, if one variable increases, the other variable also tends to increase. If 𝒓 is
negative, the linear relationship between the variables has a negative correlation. In this case, if one
variable increases, the other variable tends to decrease.

Figure 5.1 shows some scatter diagrams along with the type of linear correlation that exists between
the 𝒙 and 𝒚 variables. The closer |𝒓| is to 1, the stronger the linear relationship between the
variables.

Figure 5.1 Linear Correlation

The arbitrary scale for the interpretation of 𝒓 is given below.

Range of computed r Interpretation


± 1.0 Perfect Relationship
± 0.70 to 0.99 Strong/ High Relationship
± 0.40 to 0.69 Moderate Relationship
± 0.10 to 0.39 Slight/ Low Relationship
0 No Correlation

Example 1 Below are the scores of 12 college students in Mathematics and Physics tests
of 80 items each.

Table 5.1
Mathematics (𝑥) 65 63 67 64 68 62 70 66 68 67 69 71
Physics (𝑦) 68 66 68 65 69 66 68 65 71 67 68 70

a. Draw a scatter diagram


b. Find the correlation coefficient of Mathematics and Physics scores and interpret.

Solution

Step 1: Draw a scatter plot. If the scatter plot does not show any (linear) trend, stop the analysis
and conclude "no relationship". Otherwise, proceed to Step 2.

[Scatter plot of Physics scores (vertical axis, 64 to 72) against Mathematics scores (horizontal axis, 60 to 72)]

The scatter plot indicates an upward linear trend between Mathematics and Physics proficiency.
Thus, “there is a reason to believe that they are related.”
Step 2: Compute Pearson 𝑟 by arranging the given data in columns.

Table 5.2

Number   Mathematics (x)   Physics (y)   x²      y²      xy
1        65                68            4225    4624    4420
2        63                66            3969    4356    4158
3        67                68            4489    4624    4556
4        64                65            4096    4225    4160
5        68                69            4624    4761    4692
6        62                66            3844    4356    4092
7        70                68            4900    4624    4760
8        66                65            4356    4225    4290
9        68                71            4624    5041    4828
10       67                67            4489    4489    4489
11       69                68            4761    4624    4692
12       71                70            5041    4900    4970
N = 12   Σx = 800          Σy = 811      Σx² = 53418   Σy² = 54849   Σxy = 54107

\[ r = \frac{(12)(54107) - (800)(811)}{\sqrt{\left[(12)(53418) - (800)^2\right]\left[(12)(54849) - (811)^2\right]}} = \frac{484}{\sqrt{(1016)(467)}} \approx 0.70 \]

Referring to the arbitrary scale for the interpretation of 𝒓 = 𝟎. 𝟕𝟎, it states that there is a strong/
high positive relationship between the scores of the students in Mathematics and Physics.
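
The column sums and the value of r above can be verified with a short Python sketch. The function name `pearson_r` is an assumption for illustration. The scores are copied from Table 5.1, and the formula is the computational form of Pearson r given earlier in this lesson.

```python
import math

# Scores copied from Table 5.1.
math_scores = [65, 63, 67, 64, 68, 62, 70, 66, 68, 67, 69, 71]
physics_scores = [68, 66, 68, 65, 69, 66, 68, 65, 71, 67, 68, 70]


def pearson_r(xs, ys):
    """Pearson r from the sums N, Σx, Σy, Σx², Σy², and Σxy."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


r = pearson_r(math_scores, physics_scores)
print(round(r, 2))
```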

Your turn1 Find the linear correlation coefficient for stride length versus speed of a camel
as given in Table 5.3 and interpret the result. Round your result to the nearest
hundredth.

Table 5.3
Stride length(m) 2.5 3.0 3.2 3.4 3.5 3.8 4.0 4.2
Speed (m/s) 2.3 3.9 4.4 5.0 5.5 6.2 7.1 7.6

LINEAR REGRESSION
Regression is a term used to describe the process of estimating the relationship between two
variables. The relationship is estimated by fitting a straight line through the given data. The method
of least squares permits us to find a line of best fit called regression line which keeps the errors of
prediction to a minimum.

The equation for a fitted line is:

Y = a + bx
where
Y = predicted value
a = y-intercept
b = slope of the regression line
x = the given value of the independent variable

To find the slope b:

\[ b = \frac{N\sum xy - \sum x \sum y}{N\sum x^2 - \left(\sum x\right)^2} \]

where:
Σx = sum of the values of x
Σy = sum of the values of y
Σx² = sum of the squares of the values of x
Σxy = sum of the products of x and y
N = total number of pairs

To find the value of a:

\[ a = \bar{y} - b\bar{x} \]

where:
ȳ = mean of the values of y
x̄ = mean of the values of x

Example 2 Find the regression line equation of Table 5.2 and predict the score in
Physics (𝑦) if the score in Mathematics (𝑥) of the student is 75.
Solution
Formulate the regression line equation by solving first the value of the variables b and a.
Solving for 𝒃

\[ b = \frac{(12)(54107) - (800)(811)}{(12)(53418) - (800)^2} = \frac{484}{1016} \approx 0.48 \]

Solving for 𝒂

a = 67.58 − (0.48)(66.67) a = 35.59

Substitute the computed values of b and a into the regression line equation.

Y = a + bx
y = 35.59 + 0.48x (regression line equation)

We can now estimate scores in Physics (𝑦) using the regression line equation by substituting
a value or score in Mathematics (𝑥). For instance, if x is equal to 75, then solving for y gives
71.59.

y = 35.59 + 0.48(75)
y = 71.59
Therefore, the estimated score in Physics is 71.59 or approximately equivalent to 72 if the
score in Mathematics is 75. The regression line equation may be used now in estimating scores for y
by substituting a value of 𝑥.
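
The least-squares computation can be sketched in Python as follows. This is an illustrative sketch and the function name `least_squares_line` is an assumption. Unlike the hand computation, it keeps full precision for b when computing a, so it returns an intercept near 35.82 (the value reported in the Computer Solution) and a prediction of about 71.55, which still rounds to 72.

```python
# Scores copied from Table 5.1.
math_scores = [65, 63, 67, 64, 68, 62, 70, 66, 68, 67, 69, 71]
physics_scores = [68, 66, 68, 65, 69, 66, 68, 65, 71, 67, 68, 70]


def least_squares_line(xs, ys):
    """Return (a, b) for the fitted line y = a + b*x."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)  # slope
    a = sum_y / n - b * (sum_x / n)                               # y-intercept
    return a, b


a, b = least_squares_line(math_scores, physics_scores)
predicted = a + b * 75  # estimated Physics score for a Mathematics score of 75

print(round(a, 2), round(b, 2), round(predicted, 2))
```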

Your turn 2 Find the regression line equation of Table 5.3 and predict the speed of a camel
(𝑦) if the stride length (𝑥) of the camel is 5.0.

Computer Solution
Using the data on the scores of 12 college students in Mathematics and Physics tests of 80
items (Table 5.1), with the 12 paired values occupying cells A2 to A13 and cells B2 to B13, the
values of 𝑟, 𝑎, and 𝑏 can be calculated with the spreadsheet's built-in PEARSON(), INTERCEPT(),
and SLOPE() functions.

The formulas are:

=PEARSON(A2:A13, B2:B13) gives the value of 𝑟
=INTERCEPT(B2:B13, A2:A13) gives the value of 𝑎
=SLOPE(B2:B13, A2:A13) gives the value of 𝑏

The linear regression equation is

𝑦 = 35.82 + 0.48𝑥

Note here that the value of 𝑎 = 35.82 is slightly different from the value of 𝑎 = 35.59 in Example 2
because of rounding: the hand computation used values already rounded to two decimal places,
while the spreadsheet carries full precision.

LEARNING POINTS

Correlation is a degree of relationship between variables, which seeks to determine how well a
linear or other equation describes or explains the relationship between variables. It also implies
“association” between two variables
Regression is a term used to describe the process of estimating the relationship between two
variables. The relationship is estimated by fitting a straight line through the given data. The method
of least squares permits us to find a line of best fit called regression line which keeps the errors of
prediction to a minimum.

LEARNING ACTIVITY 5

1. Given the bivariate data:

𝒙 𝟑 𝟒 𝟓 𝟔 𝟕
𝒚 𝟐 𝟑 𝟑 𝟓 𝟓

a. Draw a scatter diagram for the data.
b. Find n, Σx, Σy, Σx², (Σx)², and Σxy.
c. Find b, the slope of the least-squares line, and a, the y-intercept of the least-squares line.
d. Use the equation of the least-squares line to predict the value of y when 𝑥 = 7.3.
e. Find, to the nearest hundredth, the linear correlation coefficient.

2. Test scores of nine students are shown below :

Trigonometry 43 41 50 47 35 33 50 33 54
Geometry 48 45 47 43 33 28 48 31 57

a. Draw the scatter diagram.
b. Find the correlation coefficient of 𝑥 and 𝑦 and interpret your answer.
c. Find the regression line equation.
d. What is the predicted value of 𝑦 if 𝑥 is 55? If 𝑥 is 60?

3. The number of hours spent per week viewing television (𝑦) and the number of years of education
(𝑥) were recorded for ten randomly selected individuals. The results are given below;
𝑥 12 14 11 16 16 18 12 20 10 12
𝑦 10 9 15 8 5 4 20 4 16 15

a. Draw the scatter diagram.
b. Find the correlation coefficient of x and y and interpret your answer.
c. Find the regression line equation.
d. What is the predicted value of y if 𝑥 is 15, 17, and 19?

REFERENCES

• Blay et al., Mathematical Trips in the Modern World: Outcomes-Based Approach
• Nocon et al., Essential Mathematics for the Modern World
• Baltazar et al., Mathematics in the Modern World
• Aufmann, Richard et al., Mathematics in the Modern World
• Mathematics in the World book from RBSI
• Paguio et al., Statistics with Computer Based Discussion

Photo credits:
Population vs sample, keydifference.com
Figure 4.1 A histogram for the frequency distribution, Aufmann, Richard et al., Mathematics in the Modern World

ANSWERS TO YOUR TURN EXERCISES

Answers to Your Turn (Lesson 1)


1. a. The six friends are a sample of the population of 20 students. Use 𝑥̅ instead of 𝜇 to represent the
mean.

\[ \bar{x} = \frac{\sum x}{n} = \frac{500 + 750 + 430 + 630 + 450 + 440 + 700 + 350 + 580 + 630}{10} = 546 \]

The mean of the test scores is ₱546.

b. Parameter, because it was computed using all the population values.

2. a. The amount of benefits received : ₱3400 , ₱2000 , ₱4000 , ₱4300 , ₱2500 , ₱3600 ,
₱3500, ₱5000 contains 8 numbers. The median of the list of data with an even number of entries is
found by ranking the numbers and computing the mean of the two middle numbers. Ranking the
numbers from smallest to largest gives

2000, 2500, 3400, 3500, 3600, 4000, 4300, 5000


The two middle numbers are 3500 and 3600. The mean of 3500 and 3600 is 3,550. Thus ₱3550 is
the median of the data.
b. The scores : 2, 4, 10, 7, 8, 0,5, 8, 2 contains 9 numbers. The median of a list with an odd
number of entries is found by ranking the numbers and finding the middle number.
Ranking the numbers from smallest to largest gives

0, 2, 2, 4, 5, 7, 8, 8, 10
The middle number is 5. Thus 5 is the median.

3. a. In the list 3, 3, 3, 4, 4, 4, 5, 5, 5, 8, the numbers 3, 4, and 5 each occur three times, more
often than any other number. Thus 3, 4, and 5 are the modes.

b. In the list 12, 34, 12, 71, 48, 93, 71, the numbers 12 and 71 each occur twice, more often than
the others. Thus 12 and 71 are the modes.
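
Because both lists in item 3 have more than one mode, a sketch in Python can use `statistics.multimode` (available in Python 3.8 and later), which returns every value tied for the highest frequency. The list names are illustrative.

```python
from statistics import multimode

list_a = [3, 3, 3, 4, 4, 4, 5, 5, 5, 8]
list_b = [12, 34, 12, 71, 48, 93, 71]

# multimode returns all most-frequent values, in order of first appearance.
modes_a = multimode(list_a)
modes_b = multimode(list_b)

print(modes_a, modes_b)
```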

4. $$\text{Weighted mean} = \frac{(11.50 \times 10) + (12.01 \times 12) + (11.78 \times 18)}{10 + 12 + 18} = \frac{471.16}{40} \approx 11.78$$
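
The weighted mean in item 4 is the sum of each value times its weight, divided by the sum of the weights. A minimal Python sketch of that computation follows; the names `values` and `weights` are assumptions made for illustration.

```python
values = [11.50, 12.01, 11.78]   # the three values being averaged
weights = [10, 12, 18]           # the weight attached to each value

# Weighted mean = sum(value * weight) / sum(weights)
weighted_sum = sum(v * w for v, w in zip(values, weights))
weighted_mean = weighted_sum / sum(weights)

print(round(weighted_mean, 2))
```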

Answers to Your Turn (Lesson 2)


1.
                                             Machine 1                        Machine 2
a. Range = Highest Value − Lowest Value      Range = 10.07 − 5.85 = 4.22 oz   Range = 8.03 − 7.95 = 0.08 oz
b. Mean Deviation = Σ|x − x̄| / n             MD = 7.48 / 5 = 1.496            MD = 0.12 / 5 = 0.024
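
The range and mean deviation formulas above can be sketched as two small Python functions. The individual machine measurements are not reproduced in this answer key, so the example applies the functions to the six values from item 2 of this lesson (5, 8, 16, 17, 18, 20) instead; the function names are hypothetical.

```python
from statistics import mean

def value_range(data):
    """Range = highest value - lowest value."""
    return max(data) - min(data)

def mean_deviation(data):
    """Mean deviation = sum of |x - mean| divided by the number of values."""
    center = mean(data)
    return sum(abs(x - center) for x in data) / len(data)

# Illustration only: these are the values from item 2, not the machine data.
sample = [5, 8, 16, 17, 18, 20]
print(value_range(sample), mean_deviation(sample))
```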

2. $$\mu = \frac{5 + 8 + 16 + 17 + 18 + 20}{6} = \frac{84}{6} = 14$$

x          x − μ              (x − μ)²
5          5 − 14 = −9        (−9)² = 81
8          8 − 14 = −6        (−6)² = 36
16         16 − 14 = 2        2² = 4
17         17 − 14 = 3        3² = 9
18         18 − 14 = 4        4² = 16
20         20 − 14 = 6        6² = 36
n = 6                         Σ(x − μ)² = 182

$$\sigma = \sqrt{\frac{\sum (x - \mu)^2}{n}} = \sqrt{\frac{182}{6}} \approx \sqrt{30.33} \approx 5.51$$

The standard deviation for this population is approximately 5.51.

3. In Your Turn 2, we found $\sigma \approx \sqrt{30.33}$. Variance is the square of the standard deviation. Thus the
variance is $\sigma^2 \approx (\sqrt{30.33})^2 = 30.33$.
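
The population variance and standard deviation from items 2 and 3 can be verified with the standard library functions `pvariance` and `pstdev`, which divide by n (the population formulas used above). This is a sketch with an illustrative variable name.

```python
from statistics import pstdev, pvariance

population = [5, 8, 16, 17, 18, 20]

# Population variance: sum of squared deviations from the mean, divided by n.
variance = pvariance(population)
# Population standard deviation: square root of the population variance.
std_dev = pstdev(population)

print(round(variance, 2), round(std_dev, 2))
```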

Answers to Your Turn (Lesson 3)


1. First, arrange the data in ascending order:

1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23

For the 4th decile:

$$D_4 = \frac{i(n + 1)}{10} = \frac{4(12 + 1)}{10} = 5.2\text{th position} \longrightarrow 5\text{th position} + 0.2\,(6\text{th position} - 5\text{th position})$$

$$= 9 + 0.2(11 - 9) = 9.4$$

After arranging the data in ascending order, find the value at the 5.2th position. Because this
position falls between two data values, interpolate: take the value in the 5th position (9) and add
0.2 times the difference between the values in the 6th and 5th positions (11 − 9). The value of the
fourth decile is 9.4.
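
The position-and-interpolation procedure for the decile can be sketched in Python as follows. The function name `decile` is hypothetical, and the sketch assumes the i(n + 1)/10 position rule used in this answer; other textbooks and software packages use slightly different rules.

```python
def decile(data, i):
    """Return the i-th decile using the i(n + 1)/10 position rule
    with linear interpolation between neighboring ordered values."""
    ordered = sorted(data)
    n = len(ordered)
    position = i * (n + 1) / 10       # 1-based position, may be fractional
    lower = int(position)             # whole part of the position
    fraction = position - lower       # fractional part of the position
    if lower < 1:                     # position falls before the first value
        return ordered[0]
    if lower >= n:                    # position falls at or past the last value
        return ordered[-1]
    low_value = ordered[lower - 1]    # value in the lower-th position
    high_value = ordered[lower]       # value in the next position
    return low_value + fraction * (high_value - low_value)

data = [1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23]
d4 = decile(data, 4)
print(round(d4, 1))
```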
2. $$z_{15} = \frac{15 - 12}{2.4} = 1.25 \qquad z_{14} = \frac{14 - 11}{2} = 1.5$$
These z-scores indicate that, in comparison to her classmates, Cheryl did better on the second
quiz than she did on the first quiz.
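
The comparison in item 2 rests on the z-score formula z = (x − mean) / standard deviation. A minimal Python sketch, with a hypothetical helper name `z_score`, reproduces both values:

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / std_dev

z_first_quiz = z_score(15, mean=12, std_dev=2.4)
z_second_quiz = z_score(14, mean=11, std_dev=2)

# The larger z-score marks the better performance relative to the class.
print(round(z_first_quiz, 2), round(z_second_quiz, 2))
```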
3. $$z_x = \frac{x - \mu}{\sigma}$$

$$0.6 = \frac{70 - 65.5}{\sigma}$$

$$\sigma = \frac{4.5}{0.6} = 7.5$$

The standard deviation for this set of test scores is 7.5.
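
Item 3 rearranges the z-score formula to solve for the standard deviation. A short Python sketch of that rearrangement, using a hypothetical helper name and assuming z is not zero:

```python
def sigma_from_z(x, mu, z):
    """Rearrange z = (x - mu) / sigma into sigma = (x - mu) / z (z must be nonzero)."""
    return (x - mu) / z

sigma = sigma_from_z(x=70, mu=65.5, z=0.6)

# Substituting sigma back into the z-score formula should return 0.6.
check_z = (70 - 65.5) / sigma
print(round(sigma, 2), round(check_z, 2))
```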

Answers to Your Turn (Lesson 4)


1. a. The percent of data in all classes with an upper bound of 25 s or less is the sum of the percents
for the first five classes in Table 4.2. Thus the percent of subscribers who required less than 25 s to
download the file is 30.9%.
b. The percent of data in all the classes with a lower bound of at least 10 s and an upper bound of
30 s or less is the sum of the percents in the third through sixth classes in Table 4.2. Thus the
percent of subscribers who required from 10 to 30 s to download the file is 47.8%. The probability
that a subscriber chosen at random will require from 10 to 30 s to download the file is 0.478.

2. a. 0.76 lb is 1 standard deviation above the mean of 0.61 lb. In a normal distribution, 34% of all
data lie between the mean and 1 standard deviation above the mean, and 50% of all data lie below
the mean. Thus 34% + 50% = 84% of the tomatoes weigh less than 0.76 lb.

b. 0.31 lb is 2 standard deviations below the mean of 0.61 lb. In a normal distribution, 47.5% of all
data lie between the mean and 2 standard deviations below the mean, and 50% of all data lie above
the mean. This gives a total of 47.5% + 50% = 97.5% of the tomatoes that weigh more than 0.31 lb.
Therefore (97.5%)(6000) = (0.975)(6000) = 5850 of the tomatoes can be expected to weigh more than
0.31 lb.

c. 0.31 lb is 2 standard deviations below the mean of 0.61 lb and 0.91 lb is 2 standard deviations
above the mean of 0.61 lb. In a normal distribution, 95% of all data lie within 2 standard deviations
of the mean.
Therefore (95%)(4500) = (0.95)(4500) = 4275 of the tomatoes can be expected to weigh from 0.31 lb
to 0.91 lb.
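
The percentages in item 2 come from the rounded empirical rule (34%, 47.5%, 95%). As a rough cross-check, the sketch below uses Python's `statistics.NormalDist` (Python 3.8 and later) to compute the same areas without rounding. The standard deviation of 0.15 lb is inferred from the statement that 0.76 lb is 1 standard deviation above the mean of 0.61 lb; the variable names are illustrative.

```python
from statistics import NormalDist

# Mean 0.61 lb; sigma 0.15 lb is inferred from 0.76 - 0.61 (one standard deviation).
weights = NormalDist(mu=0.61, sigma=0.15)

p_less_076 = weights.cdf(0.76)                      # empirical rule: about 84%
p_more_031 = 1 - weights.cdf(0.31)                  # empirical rule: about 97.5%
p_between = weights.cdf(0.91) - weights.cdf(0.31)   # empirical rule: about 95%

# Expected counts for the shipments of 6000 and 4500 tomatoes.
expected_more_031 = 6000 * p_more_031
expected_between = 4500 * p_between

print(round(p_less_076, 4), round(p_more_031, 4), round(p_between, 4))
print(round(expected_more_031), round(expected_between))
```

The unrounded areas (about 84.13%, 97.72%, and 95.45%) are slightly larger than the empirical-rule figures, so the expected counts from this sketch come out a little higher than 5850 and 4275.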

3. The area of the standard normal distribution between 𝑧 = −0.67 and 𝑧 = 0 is equal to the area
between 𝑧 = 0 and 𝑧 = 0.67. The entry in Table 4.4 associated with 𝑧 = 0.67 is 0.249. Thus the area
of the standard normal distribution between 𝑧 = −0.67 and 𝑧 = 0 is 0.249 square unit.

4. Table 4.4 indicates that the area from 𝑧 = 0 to 𝑧 = −1.47 is 0.429 square unit. The area to the left
of 𝑧 = 0 is 0.500 square unit. Thus the area to the left of 𝑧 = −1.47 is 0.500 − 0.429 = 0.071
square unit.
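
The table lookups in items 3 and 4 can be cross-checked against the standard normal cumulative distribution function. The sketch below assumes Python 3.8 or later for `statistics.NormalDist`; the variable names are illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

# Item 3: by symmetry, the area from z = -0.67 to 0 equals the area from 0 to 0.67.
area_0_to_067 = std_normal.cdf(0.67) - std_normal.cdf(0)
area_neg067_to_0 = std_normal.cdf(0) - std_normal.cdf(-0.67)

# Item 4: area to the left of z = -1.47.
area_left_of_neg147 = std_normal.cdf(-1.47)

print(round(area_0_to_067, 3), round(area_neg067_to_0, 3), round(area_left_of_neg147, 3))
```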

5. Round z-scores to the nearest hundredth so you can use Table 4.4 .
a. $$z_9 = \frac{9 - 6.1}{1.8} \approx 1.61$$
Table 4.4 indicates that 0.446 (44.6%) of the data in the standard normal distribution are between
𝑧 = 0 and 𝑧 = 1.61. The percent of the data to the right of 𝑧 = 1.61 is 50% −44.6% = 5.4%.
Approximately 5.4% of professional football players have careers of more than 9 years.

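b. $$z_3 = \frac{3 - 6.1}{1.8} \approx -1.72 \qquad z_4 = \frac{4 - 6.1}{1.8} \approx -1.17$$

By symmetry, the area between a negative z-score and 𝑧 = 0 equals the area between 𝑧 = 0 and the
corresponding positive z-score. From Table 4.4:

$$A_{1.72} = 0.457 \qquad A_{1.17} = 0.379$$

$$0.457 - 0.379 = 0.078$$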

The probability that a professional football player chosen at random will have a career of between 3
and 4 years is about 0.078.
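
Both parts of item 5 can be cross-checked without rounding the z-scores, using the mean of 6.1 years and standard deviation of 1.8 years that appear in the z-score computations. This is a sketch assuming Python 3.8 or later; the variable names are illustrative.

```python
from statistics import NormalDist

# Career length in years: mean 6.1, standard deviation 1.8 (from item 5).
careers = NormalDist(mu=6.1, sigma=1.8)

# Part a: proportion of careers longer than 9 years.
p_more_than_9 = 1 - careers.cdf(9)

# Part b: probability that a career lasts between 3 and 4 years.
p_between_3_and_4 = careers.cdf(4) - careers.cdf(3)

print(round(p_more_than_9, 3), round(p_between_3_and_4, 3))
```

Because the table method rounds each z-score to the nearest hundredth, the results of this sketch may differ from 5.4% and 0.078 in the third decimal place.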

Answers to Your Turn (Lesson 5)


1.
Number      x             y             x²             y²              xy
1           2.5           2.3           6.25           5.29            5.75
2           3.0           3.9           9.00           15.21           11.70
3           3.2           4.4           10.24          19.36           14.08
4           3.4           5.0           11.56          25.00           17.00
5           3.5           5.5           12.25          30.25           19.25
6           3.8           6.2           14.44          38.44           23.56
7           4.0           7.1           16.00          50.41           28.40
8           4.2           7.6           17.64          57.76           31.92
N = 8       Σx = 27.6     Σy = 42.0     Σx² = 97.38    Σy² = 241.72    Σxy = 151.66

$$r = \frac{N \sum xy - \sum x \sum y}{\sqrt{\left[N \sum x^2 - \left(\sum x\right)^2\right]\left[N \sum y^2 - \left(\sum y\right)^2\right]}}$$

$$r = \frac{8(151.66) - (27.6)(42.0)}{\sqrt{\left[8(97.38) - (27.6)^2\right]\left[8(241.72) - (42.0)^2\right]}}$$

$$r \approx 0.998498$$
The linear correlation coefficient, rounded to the nearest hundredth, is 1.00. On the arbitrary
scale for the interpretation of 𝑟, a value of 1.00 corresponds to a perfect relationship. Since the
unrounded value, 0.998498, is slightly less than 1, there is an almost perfect positive linear
relationship between the stride length and speed of a camel.
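
The column sums and the value of r can be recomputed directly from the eight (x, y) pairs in the table. The Python sketch below applies the same computational formula for r; the variable names are illustrative.

```python
from math import sqrt

x = [2.5, 3.0, 3.2, 3.4, 3.5, 3.8, 4.0, 4.2]   # stride length
y = [2.3, 3.9, 4.4, 5.0, 5.5, 6.2, 7.1, 7.6]   # speed

n = len(x)
sum_x, sum_y = sum(x), sum(y)
sum_x2 = sum(v * v for v in x)
sum_y2 = sum(v * v for v in y)
sum_xy = sum(a * b for a, b in zip(x, y))

# r = (N*Sxy - Sx*Sy) / sqrt([N*Sx2 - Sx^2] * [N*Sy2 - Sy^2])
numerator = n * sum_xy - sum_x * sum_y
denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator

print(round(r, 6))
```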

2. Formulate the regression line equation by first solving for the values of 𝑏 and 𝑎.
Solving for 𝒃

$$b = \frac{(8)(195.86) - (28.8)(52.1)}{(8)(106.72) - (28.8)^2} \approx 2.7303$$

Solving for 𝒂, where 6.5125 = 52.1/8 is the mean of the 𝑦 values and 3.6 = 28.8/8 is the mean of
the 𝑥 values

$$a \approx 6.5125 - (2.7303)(3.6) = -3.31658$$

Substitute the computed values of b and a into the regression line equation

$$y = a + bx$$

$$y \approx -3.3 + 2.7x \quad \text{(regression line equation)}$$

We can now estimate the speed of a camel (𝑦) by substituting its stride length (𝑥) into the
regression line equation. For instance, if 𝑥 is equal to 5.0, then solving for 𝑦 gives 10.2.

y = −3.3 + 2.7(5.0)
y = 10.2
Therefore, the estimated speed of a camel is 10.2 if its stride length is 5.0. The regression
line equation may now be used to estimate values of 𝑦 by substituting a value of 𝑥.
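
The slope, intercept, and prediction in item 2 can be recomputed from the sums that appear in the solution (N = 8, Σx = 28.8, Σy = 52.1, Σx² = 106.72, Σxy = 195.86). The Python sketch below is illustrative only; the function name `predict` and its default arguments are assumptions, chosen to match the rounded equation y ≈ −3.3 + 2.7x.

```python
# Sums taken from the solution to item 2.
n = 8
sum_x, sum_y = 28.8, 52.1
sum_x2, sum_xy = 106.72, 195.86

# Slope: b = (N*Sxy - Sx*Sy) / (N*Sx2 - Sx^2)
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
# Intercept: a = mean of y - b * mean of x
a = sum_y / n - b * (sum_x / n)

def predict(x, intercept=-3.3, slope=2.7):
    """Estimate y from x using the rounded regression line y = -3.3 + 2.7x."""
    return intercept + slope * x

print(round(b, 4), round(a, 4), round(predict(5.0), 1))
```

The intercept from this sketch differs from −3.31658 in the fourth decimal place because the solution above rounds b to 2.7303 before computing a.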
