LESSON 4 Data Management Schedule 2

GE 112 - MATHEMATICS IN THE MODERN WORLD
DATA MANAGEMENT SCHEDULE-STATISTICS
MODULE 4
JONDEL S. IHALAS
PART-TIME INSTRUCTOR
1. Use a variety of statistical tools to process and manage numerical data.
2. Use the methods of linear regression and correlation to predict the value of a variable given certain conditions.
3. Advocate the use of statistical data in making important decisions.
LESSON 4 DATA MANAGEMENT SCHEDULE-STATISTICS

TOPICS COVERED
● Data Gathering and Organizing Data; Representing Data Using Graphs and Charts; Interpreting Organized Data
● Measures of Central Tendency: Mean, Median, Mode, Weighted Mean
● Measures of Dispersion: Range, Standard Deviation, and Variance
● Measures of Relative Position: z-score, Percentile, Quartile, and Box-and-Whisker Plots
● Probabilities and Normal Distributions
● Linear Regression and Correlation: Least-Squares Line, Linear Correlation Coefficient

READY OR NOT?
• A teacher who handles mathematics subjects evaluated the readiness of the 15 students in the subject. He gave a diagnostic test containing 100 items and recorded the scores as follows:
45 55 55 55
65 65 65 65
65 65 65 75
75 75 85
• How can you represent data in a more organized manner?

• When do we say that the scores are normally distributed?
PRESENTATION OF DATA
Stem-and-leaf plot (stem = tens digit, leaf = ones digit):
STEM | LEAF
4 | 5
5 | 5 5 5
6 | 5 5 5 5 5 5 5
7 | 5 5 5
8 | 5
[Histogram of the scores: horizontal axis marked from 35 to 95, vertical axis (frequency) marked from 1 to 7]
From the stem-and-leaf plot or histogram, we can observe that the mean, median, and mode are the same (all equal to 65).
The graph is symmetrical about the mean.
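As a quick check of the observation above, the following minimal sketch recomputes the three measures of central tendency for the 15 recorded scores and rebuilds the stem-and-leaf rows. The variable names (`scores`, `center`, `stems`) are illustrative choices, not part of the lesson.

```python
from statistics import mean, median, mode

# The 15 diagnostic-test scores from the "Ready or Not?" activity.
scores = [45, 55, 55, 55, 65, 65, 65, 65, 65, 65, 65, 75, 75, 75, 85]

center = {"mean": mean(scores), "median": median(scores), "mode": mode(scores)}

# Stem-and-leaf rows: stem = tens digit, leaves = ones digits.
stems = {}
for s in sorted(scores):
    stems.setdefault(s // 10, []).append(s % 10)

for stem, leaves in stems.items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
print(center)
```

Equal values for the mean, median, and mode are consistent with a symmetrical distribution, although this alone does not prove that the scores are normally distributed.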

NORMAL DISTRIBUTION
Also called the normal curve, it is a distribution of data in which the mean, median, and mode are equal, the data cluster at the center, and the graph is a symmetrical, bell-shaped curve.
PROPERTIES OF THE NORMAL
PROBABILITY DISTRIBUTION
1. The distribution curve is bell-shaped.

2. The curve is symmetrical about its center.
3. The mean, median, and the mode coincide at the
center.
4. The width of the curve is determined by the
standard deviation of the distribution.
5. The tails of the curve flatten out indefinitely along
the horizontal axis, always approaching the axis but
never touching it. That is, the curve is asymptotic
to the base line.
6. The area under the curve is 1. Thus, it represents
the probability or proportion or the percentage
associated with specific sets of measurement
values.
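EMPIRICAL VALUE
An approximation of the probability (area under the normal curve) within a given number of standard deviations of the mean: about 68% of the data lie within 1 standard deviation, about 95% within 2, and about 99.7% within 3.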
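The empirical values are usually quoted as roughly 68%, 95%, and 99.7% of the data within 1, 2, and 3 standard deviations of the mean. The sketch below checks those figures with `NormalDist` from Python's standard `statistics` module; the helper name `area_within` is an assumption made for illustration.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

def area_within(k):
    """Area under the standard normal curve between z = -k and z = k."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {area_within(k):.4f}")
```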

THE STANDARD NORMAL CURVE
A standard normal curve is a probability distribution that has a mean 𝜇 = 0 and a standard deviation 𝜎 = 1.
THE Z-SCORE
The areas under the normal curve are given in terms of 𝑧-values or 𝑧-scores.
\[ z = \frac{X - \mu}{\sigma} \]
Where: 𝑋 = given measurement
𝜇 = population mean
𝜎 = population standard deviation
Note: The formula converts any X-value of a normal distribution into a value in the standard normal 𝑧-distribution.
EXAMPLE 1: GIVEN THE MEAN 𝜇 = 50 AND THE
STANDARD DEVIATION, 𝜎 = 4 OF A POPULATION
OF READING SCORES. FIND THE 𝑧-VALUE THAT
CORRESPONDS TO A SCORE X = 58.
Solution:
\[ z = \frac{X - \mu}{\sigma} = \frac{58 - 50}{4} = 2 \]
Figure 1: Normal curve showing z-scores and raw scores (raw-score axis marked at 38, 42, 46, 50, 54, 58, 62)
EXAMPLE 2: FIND THE 𝑧-VALUE FOR EACH OF THE FOLLOWING SETS OF DATA. TELL WHETHER THE SCORE IS ABOVE OR BELOW THE MEAN.
1. 𝜇 = 45, 𝜎 = 6, X = 39: z = −1 (below the mean)
2. 𝜇 = 40, 𝜎 = 8, X = 52: z = 1.5 (above the mean)
3. 𝜇 = 75, 𝜎 = 15, X = 82: z ≈ 0.47 (above the mean)
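The z-score conversion used in Examples 1 and 2 can be sketched in a few lines. The function names `z_score` and `position` are hypothetical helpers introduced only for this illustration.

```python
def z_score(x, mu, sigma):
    """Convert a raw score x to a z-score: z = (x - mu) / sigma."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma

def position(z):
    """Describe where the score lies relative to the mean."""
    if z > 0:
        return "above"
    if z < 0:
        return "below"
    return "at the mean"

# (mu, sigma, x) triples taken from Examples 1 and 2.
cases = [(50, 4, 58), (45, 6, 39), (40, 8, 52), (75, 15, 82)]
for mu, sigma, x in cases:
    z = z_score(x, mu, sigma)
    print(f"mu={mu}, sigma={sigma}, X={x}: z={z:.2f} ({position(z)})")
```

A positive z-score places the raw score above the mean and a negative one places it below, which is the rule applied in Example 2.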
EXAMPLE 3: FIND THE AREA UNDER THE STANDARD NORMAL CURVE OF THE FOLLOWING 𝑧-SCORES. (USE THE TABLE OF AREAS UNDER THE NORMAL CURVE)
a. 𝑧 = 0.96 or P(𝑧 < 0.96)
Step 1: Express the z-value as three digits: z = 0.96.
Step 2: In the table, find the first two digits (0.9) in the row headings.
Step 3: Match the third digit (0.06) with the appropriate column heading.
Step 4: Read the area (or probability) at the intersection of the row and the column:
P(𝑧 < 0.96) = 0.8315 or 83.15%
FIND THE AREA UNDER THE STANDARD NORMAL CURVE BETWEEN 𝑧 = 0 AND THE FOLLOWING 𝑧-SCORES. (USE THE TABLE OF AREAS UNDER THE NORMAL CURVE)
b. 𝑧 = 1.45 or P(0 < 𝑧 < 1.45)
P(0 < 𝑧 < 1.45) = P(𝑧 < 1.45) − P(𝑧 < 0)
= 0.9265 − 0.5
= 0.4265 or 42.65%
FIND THE AREA UNDER THE STANDARD NORMAL CURVE BOUNDED BY THE FOLLOWING PAIRS OF 𝑧-SCORES. (USE THE TABLE OF AREAS UNDER THE NORMAL CURVE)
c. 𝑧 = −1.91 and 𝑧 = 3 or P(−1.91 < 𝑧 < 3)
P(−1.91 < 𝑧 < 3) = P(𝑧 < 3) − P(𝑧 < −1.91)
= 0.9987 − 0.0281
= 0.9706 or 97.06%
EXAMPLE 5: FIND THE AREA OR PROPORTION (PROBABILITY) INDICATED BY EACH ITEM. (USE THE TABLE OF AREAS UNDER THE NORMAL CURVE)
a. Above 𝑧 = −1 or P(𝑧 > −1)
P(𝑧 > −1) = 1 − P(𝑧 < −1)
= 1 − 0.1587
= 0.8413 or 84.13%
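The table look-ups in the examples above can be cross-checked in software. This is a minimal sketch that uses the cumulative distribution function of `NormalDist` from Python's standard library in place of the printed table of areas; the name `phi` is only a local shorthand.

```python
from statistics import NormalDist

# phi(z) is the cumulative area to the left of z under the standard normal curve.
phi = NormalDist().cdf

areas = {
    "P(z < 0.96)": phi(0.96),
    "P(0 < z < 1.45)": phi(1.45) - phi(0),
    "P(-1.91 < z < 3)": phi(3) - phi(-1.91),
    "P(z > -1)": 1 - phi(-1),
}

for label, area in areas.items():
    print(f"{label} = {area:.4f}")
```

Each entry follows the same pattern as the hand solution: an area between two z-scores is the difference of two left-tail areas, and an area above a z-score is 1 minus the left-tail area.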
Example 6. A coffee vending machine is set to dispense amounts of coffee per cup that follow a normal distribution with a mean of 200 mL and a standard deviation of 10 mL. Let the random variable X be the amount (in mL) per cup.
• What is the probability that a randomly selected cup will contain more than 225 mL of coffee as dispensed by the machine?
• What proportion of the cups will contain anywhere between 198 mL and 225 mL of coffee?
• Above what value do we get the largest-filled 9.01% of the cups of coffee?
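One possible way to work through the three questions of Example 6 is sketched below. It assumes the stated model (mean 200 mL, standard deviation 10 mL) and reads the last question as asking for the cut-off above which the fullest 9.01% of cups lie. The variable names are illustrative.

```python
from statistics import NormalDist

coffee = NormalDist(mu=200, sigma=10)  # amount per cup in mL

# Probability that a cup holds more than 225 mL (z = 2.5).
p_more_than_225 = 1 - coffee.cdf(225)

# Proportion of cups holding between 198 mL and 225 mL (z from -0.2 to 2.5).
p_198_to_225 = coffee.cdf(225) - coffee.cdf(198)

# Amount above which the fullest 9.01% of cups fall (left-tail area 0.9099).
cutoff_top_9_01 = coffee.inv_cdf(1 - 0.0901)

print(f"P(X > 225) = {p_more_than_225:.4f}")
print(f"P(198 < X < 225) = {p_198_to_225:.4f}")
print(f"cut-off for the top 9.01% = {cutoff_top_9_01:.1f} mL")
```

With the table of areas, the same steps would be done by converting 225 and 198 to z-scores first, and by looking up the z-score whose left-tail area is closest to 0.9099 for the last question.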

Example 7: Fifty job applicants took an IQ test, and their scores are normally distributed with a mean of 100.
• How many applicants obtained a score between 74 and 126 if the standard deviation is 20?
• The management decided not to hire the lowest 20% of the applicants. What score must an applicant obtain to get hired if the standard deviation is 20?
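A hedged sketch of how Example 7 might be approached: the share of scores between 74 and 126 is multiplied by the number of applicants, and the hiring cut-off is taken as the score with 20% of the area to its left. The names `iq`, `expected_count`, and `cutoff` are assumptions for illustration.

```python
from statistics import NormalDist

n_applicants = 50
iq = NormalDist(mu=100, sigma=20)

# Share of scores between 74 and 126 (z from -1.3 to 1.3), scaled to 50 applicants.
share_between = iq.cdf(126) - iq.cdf(74)
expected_count = n_applicants * share_between

# Score below which the lowest 20% of applicants fall.
cutoff = iq.inv_cdf(0.20)

print(f"expected applicants between 74 and 126: {expected_count:.2f}")
print(f"score separating the lowest 20%: {cutoff:.2f}")
```

Under these assumptions, an applicant needs a score above the printed cut-off to be outside the lowest 20%.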

EXAMPLE 8: DG company has 100 branches nationwide. The annual profit of DG company is normally distributed with a mean of Php 73 million a year and a standard deviation of Php 3.25 million. How many branches have a profit of Php 73 million to Php 80 million?
Given:
n = 100
μ = Php 73 million
X = Php 80 million
σ = Php 3.25 million
Solution:
\[ z = \frac{X - \mu}{\sigma} = \frac{80 - 73}{3.25} \approx 2.15 \]
Number of branches
= (area between 𝑧 = 0 and 𝑧 = 2.15)(n)
= (0.4842)(100)
= 48.42
≈ 48 branches
Therefore, about 48 branches have a profit of Php 73 million to Php 80 million.
Quartiles
A quartile is a type of quantile that divides the ordered data points into four parts, or quarters, of more-or-less equal size. The data must be ordered from smallest to largest to compute quartiles; as such, quartiles are a form of order statistic.
The three main quartiles are as follows:
• The first quartile (Q1) is defined as the middle number between the smallest number (minimum) and the median of the data set. It is also known as the lower or 25th empirical quartile, as 25% of the data lie below this point.
• The second quartile (Q2) is the median of a data set; thus 50% of the data lie below this point.
• The third quartile (Q3) is the middle value between the median and the highest value (maximum) of the data set. It is known as the upper or 75th empirical quartile, as 75% of the data lie below this point.
The median procedure for finding quartiles
1. Rank the data.
2. Find the median of the data. This is the second quartile, 𝑄2.
3. The first quartile, 𝑄1, is the median of the data values less than 𝑄2. The third quartile, 𝑄3, is the median of the data values greater than 𝑄2.

Example: The following table lists the calories per 100
milliliters of 25 popular sodas. Find the quartiles for the data.
43 37 42 40 53 62 36 32 50 49
26 53 73 48 45 39 45 48 50 56
41 36 39 58 42
Step 1: Rank the data.
1)26 2)32 3)36 4)36 5)37 6)39 7)39 8)40 9)41 10)42 11)42 12)43 13)45
14)45 15)48 16)48 17)49 18)50 19)50 20)53 21)53 22)56 23)58 24)62 25)73
Step 2: Identify the median. With 25 ranked values, the median is the 13th value, so 𝑄2 = 45.
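The median procedure can be sketched as follows, applied to the 25 values exactly as they appear in the example table. One assumption is made loudly here: the halves are split by rank, so when the number of values is odd the middle ranked value is left out of both halves. The function name `quartiles_by_median` is hypothetical.

```python
from statistics import median

# Calories per 100 mL for the 25 sodas, as listed in the example table.
calories = [43, 37, 42, 40, 53, 62, 36, 32, 50, 49,
            26, 53, 73, 48, 45, 39, 45, 48, 50, 56,
            41, 36, 39, 58, 42]

def quartiles_by_median(data):
    """Return (Q1, Q2, Q3) using the median procedure.

    Assumption: the halves are split by rank, so for an odd number of
    values the middle ranked value belongs to neither half.
    """
    if len(data) < 2:
        raise ValueError("need at least two data values")
    ranked = sorted(data)      # Step 1: rank the data
    half = len(ranked) // 2
    q2 = median(ranked)        # Step 2: the median is Q2
    lower = ranked[:half]      # values ranked below the median
    upper = ranked[-half:]     # values ranked above the median
    return median(lower), q2, median(upper)

q1, q2, q3 = quartiles_by_median(calories)
print(f"Q1 = {q1}, Q2 = {q2}, Q3 = {q3}")
```

Other quartile conventions (for example, interpolation-based methods used by some software) can give slightly different values for the same data.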

What is a Box-and-Whisker Plot?
A box plot is a diagram that gives a visual representation of the distribution of the data, highlighting where most values lie and which values differ greatly from the norm (called outliers). The box plot is also referred to as a box-and-whisker plot or box-and-whisker diagram.

Elements of the box plot
• The bottom side of the box represents the first quartile, and the top side, the third quartile. Therefore, the vertical height of the central box represents the interquartile range.
• The horizontal line inside the box is the median.
• The vertical lines protruding from the box extend to the
minimum and the maximum values of the data set, a

Construction of a Box-and-Whisker Plot
1. Draw a horizontal scale that extends from the minimum
data value to maximum value.
2. Above the scale, draw a rectangle (box) with its left side at 𝑄1 and its right side at 𝑄3.
3. Draw a vertical line segment across the rectangle at the
median, 𝑄2 .
4. Draw a horizontal line segment, called a whisker, that
extends from 𝑄1 to the minimum and another whisker that
extends from 𝑄3 to the maximum.

Construction of a Box-and-Whisker Plot
Example: Construct a box-and-whisker plot for the data in the previous example.
1)26 2)32 3)36 4)36 5)37 6)39 7)39 8)40 9)41 10)42 11)42 12)43 13)45
14)45 15)48 16)48 17)49 18)50 19)50 20)53 21)53 22)56 23)58 24)62 25)73

Pearson Correlation and Linear Regression
A correlation or simple linear regression analysis can
determine if two numeric variables are significantly linearly
related.
A correlation analysis provides information on
the strength and direction of the linear relationship between
two variables, while a simple linear regression analysis
estimates parameters in a linear equation that can be used
to predict values of one variable based on the other.

THE PEARSON PRODUCT-MOMENT CORRELATION
▪ is a quantitative way devised to measure the association between two variables
▪ the strength of the correlation is indicated by the coefficient of correlation (𝑟)
▪ named in honor of Karl Pearson, the statistician who did extensive research in the discipline
• To compute 𝑟, we use the formula:
\[ r = \frac{n\sum XY - \sum X \cdot \sum Y}{\sqrt{\left[n\sum X^2 - \left(\sum X\right)^2\right]\left[n\sum Y^2 - \left(\sum Y\right)^2\right]}} \]
where:
𝑛 = sample size; 𝑋 = value of the independent variable; 𝑌 = value of the dependent variable

ASSUMPTIONS FOR PEARSON’S CORRELATION:
1. Variables should be measured at the interval or ratio level (they are continuous). Interval measurements have no absolute zero, like motivation, stress, and mathematical achievement. Ratio measurements have an absolute zero, like the height of children, the weight of vegetables, and the brightness of light.
2. There is a linear relationship between the variables. A scatter plot is used to identify a linear relationship.
Pearson r             Qualitative Description
±1                    Perfect
±0.75 to < ±1         Very high
±0.50 to < ±0.75      Moderately high
±0.25 to < ±0.50      Moderately low
> 0 to < ±0.25        Very low
0                     No correlation
• Computing 𝑟 by hand is tedious when the sample size is large. Thus, statistical software like SPSS may be used.
Example 1: A large industrial plant has seven divisions that do the same type of work. For their safety report, a safety inspector visits each division of 20 workers quarterly.
a. Identify the independent and dependent variables.
b. Test if there is a relationship between the variables.
c. Determine the equation of the line that best fits the data.
Division    Number of work-hours devoted to safety training    Number of work-hours lost due to industry-related accidents
1           10.0                                                80
2           19.5                                                65
3           30.0                                                68
4           45.0                                                55
5           50.0                                                35
6           65.0                                                10
7           80.0                                                12

Independent Variable (X): Number of work-hours devoted to safety training
Dependent Variable (Y): Number of work-hours lost due to industry-related accidents
Division    X       Y     X²        Y²      XY
1           10.0    80    100       6400    800
2           19.5    65    380.25    4225    1267.5
3           30.0    68    900       4624    2040
4           45.0    55    2025      3025    2475
5           50.0    35    2500      1225    1750
6           65.0    10    4225      100     650
7           80.0    12    6400      144     960
Σ𝑋 = 299.5, Σ𝑌 = 325, Σ𝑋² = 16530.25, Σ𝑌² = 19743, Σ𝑋𝑌 = 9942.5, n = 7

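\[
\begin{aligned}
r &= \frac{n\sum XY - \sum X \cdot \sum Y}{\sqrt{\left[n\sum X^2 - \left(\sum X\right)^2\right]\left[n\sum Y^2 - \left(\sum Y\right)^2\right]}} \\
  &= \frac{(7)(9942.5) - (299.5)(325)}{\sqrt{\left[7(16530.25) - (299.5)^2\right]\left[7(19743) - (325)^2\right]}} \\
  &= \frac{69597.5 - 97337.5}{\sqrt{(115711.75 - 89700.25)(138201 - 105625)}} \\
  &= \frac{-27740}{\sqrt{(26011.5)(32576)}} = \frac{-27740}{\sqrt{847350624}} = \frac{-27740}{29109.28759} \\
  &\approx -0.9530
\end{aligned}
\]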
The computed r is −0.9530. The result indicates a very high negative association between the independent and dependent variables: the number of work-hours devoted to safety training and the number of work-hours lost due to industry-related accidents have a very strong negative correlation. Thus, as one variable increases, the other tends to decrease.

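Σ𝑋 = 299.5, Σ𝑌 = 325, Σ𝑋² = 16530.25, Σ𝑌² = 19743, Σ𝑋𝑌 = 9942.5, n = 7, r = −0.9530
\[
\begin{aligned}
a &= \frac{\sum Y \sum X^2 - \sum X \sum XY}{n\sum X^2 - \left(\sum X\right)^2}
   = \frac{(325)(16530.25) - (299.5)(9942.5)}{7(16530.25) - (299.5)^2} \\
  &= \frac{5372331.25 - 2977778.75}{115711.75 - 89700.25}
   = \frac{2394552.5}{26011.5} \approx 92.057
\end{aligned}
\]
\[
\begin{aligned}
b &= \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - \left(\sum X\right)^2}
   = \frac{7(9942.5) - (299.5)(325)}{7(16530.25) - (299.5)^2} \\
  &= \frac{69597.5 - 97337.5}{115711.75 - 89700.25}
   = \frac{-27740}{26011.5} \approx -1.066
\end{aligned}
\]
The best-fitting line is 𝑌 = −1.066𝑋 + 92.057.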
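The hand computation of 𝑟, 𝑎, and 𝑏 can be mirrored step by step in code. This is a minimal sketch of the same sum-based formulas, applied to the safety-training data; the function name `pearson_and_line` and the list names are illustrative and not part of the lesson.

```python
from math import sqrt

def pearson_and_line(xs, ys):
    """Return (r, a, b), where the least-squares line is Y = b*X + a."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))

    sxx = n * sum_x2 - sum_x ** 2      # n*Sum(X^2) - (Sum X)^2
    syy = n * sum_y2 - sum_y ** 2      # n*Sum(Y^2) - (Sum Y)^2
    sxy = n * sum_xy - sum_x * sum_y   # n*Sum(XY) - Sum X * Sum Y

    r = sxy / sqrt(sxx * syy)
    a = (sum_y * sum_x2 - sum_x * sum_xy) / sxx  # intercept
    b = sxy / sxx                                # slope
    return r, a, b

training_hours = [10.0, 19.5, 30.0, 45.0, 50.0, 65.0, 80.0]
hours_lost = [80, 65, 68, 55, 35, 10, 12]

r, a, b = pearson_and_line(training_hours, hours_lost)
print(f"r = {r:.4f}, a = {a:.3f}, b = {b:.3f}")
```

The three intermediate quantities `sxx`, `syy`, and `sxy` correspond to the bracketed terms in the formulas, which is why the slope and the correlation share the same numerator.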

• Example 2: Based on the data below, determine if there is a relationship between the scores in Physics and Statistics. Also identify the equation of the line that best fits the data.
Physics (x)       3    9    10    12    7
Statistics (y)    5    8    10    9     8
X     Y     X²     Y²     XY
3     5     9      25     15
9     8     81     64     72
10    10    100    100    100
12    9     144    81     108
7     8     49     64     56
Σ𝑋 = 41, Σ𝑌 = 40, Σ𝑋² = 383, Σ𝑌² = 334, Σ𝑋𝑌 = 351, 𝑛 = 5

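\[
\begin{aligned}
r &= \frac{n\sum XY - \sum X \cdot \sum Y}{\sqrt{\left[n\sum X^2 - \left(\sum X\right)^2\right]\left[n\sum Y^2 - \left(\sum Y\right)^2\right]}} \\
  &= \frac{(5)(351) - (41)(40)}{\sqrt{\left[5(383) - (41)^2\right]\left[5(334) - (40)^2\right]}} \\
  &= \frac{1755 - 1640}{\sqrt{(1915 - 1681)(1670 - 1600)}} \\
  &= \frac{115}{\sqrt{(234)(70)}} = \frac{115}{\sqrt{16380}} = \frac{115}{127.984374} \\
  &\approx 0.8985 \approx 0.90
\end{aligned}
\]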
The computed r is 0.8985, or about 0.90. The result shows that there is a very high (very strong) positive association between the scores in Physics and Statistics. This means that a student with a high score in one subject is expected to have a high score in the other subject, and a student with a low score in one subject is expected to have a low score in the other.

\[
a = \frac{\sum Y \sum X^2 - \sum X \sum XY}{n\sum X^2 - \left(\sum X\right)^2}
  = \frac{(40)(383) - (41)(351)}{5(383) - (41)^2}
  = \frac{15320 - 14391}{1915 - 1681}
  = \frac{929}{234} \approx 3.97
\]
\[
b = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - \left(\sum X\right)^2}
  = \frac{(5)(351) - (41)(40)}{(5)(383) - (41)^2}
  = \frac{1755 - 1640}{1915 - 1681}
  = \frac{115}{234} \approx 0.491
\]
The association between Physics (x) and Statistics (y) can be modelled by the regression line 𝑌 = 0.491𝑋 + 3.97.
REFERENCES:
Priscilla S. Altares, A. R. (2012). Elementary Statistics with Computer
Application. Quezon City: Rex Printing Company, Inc.
https://www.questionpro.com/blog/interval-scale/
http://lsc.cornell.edu/wp-content/uploads/2016/01/Intro-to-measurement-and-statistics.pdf
