LESSON 4 Data Management Schedule 2


GE 112 - MATHEMATICS IN THE MODERN WORLD

DATA MANAGEMENT SCHEDULE-STATISTICS
MODULE 4
JONDEL S. IHALAS
PART-TIME INSTRUCTOR

1. Use a variety of statistical tools to process and manage numerical data.
2. Use the methods of linear regression and correlation to predict the value of a variable given certain conditions.
3. Advocate the use of statistical data in making important decisions.

LESSON 4 DATA MANAGEMENT SCHEDULE-STATISTICS


TOPICS COVERED
● Data Gathering and Organizing Data; Representing Data using graphs and
charts; interpreting organized data
● Measures of Central Tendency: Mean, Median, Mode, Weighted Mean
● Measures of Dispersion: Range, Standard Deviation, and Variance
● Measures of Relative Position: z-score, Percentile, Quartile, and Box-and-Whisker Plots
● Probabilities and Normal Distributions
● Linear Regression and Correlation: Least-Squares Line, Linear Correlation Coefficient

READY OR NOT?
• A teacher who handles mathematics subjects evaluated the readiness of his 15 students in the subject. He gave a 100-item diagnostic test and recorded the scores as follows:

45 55 55 55
65 65 65 65
65 65 65 75
75 75 85

• How can you represent data in a more organized manner?


• When do we say that the scores are normally distributed?
PRESENTATION OF DATA

STEM | LEAF
4 | 5
5 | 5 5 5
6 | 5 5 5 5 5 5 5
7 | 5 5 5
8 | 5

[Histogram of the scores: horizontal axis from 35 to 95 in steps of 10, frequency axis from 1 to 7]

From the stem-and-leaf plot or histogram, we can observe that the
mean, median and mode are the same
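
The observation above can be checked numerically. The following is a minimal Python sketch, assuming the 15 diagnostic scores listed earlier; the variable name `scores` is illustrative and not part of the lesson.

```python
from statistics import mean, median, mode

# The 15 diagnostic-test scores recorded by the teacher
scores = [45, 55, 55, 55, 65, 65, 65, 65, 65, 65, 65, 75, 75, 75, 85]

print("mean  :", mean(scores))    # sum of the scores divided by 15
print("median:", median(scores))  # the 8th score once the data are ranked
print("mode  :", mode(scores))    # the most frequent score
```

All three measures come out as 65, which is what the stem-and-leaf plot suggests visually.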
The graph is symmetrical about the mean.


NORMAL DISTRIBUTION

A normal distribution, also called a normal curve, is a distribution of data in which the mean, median, and mode are equal, the data are clustered at the center, and the graph is a symmetrical, bell-shaped curve.

PROPERTIES OF THE NORMAL
PROBABILITY DISTRIBUTION

1. The distribution curve is bell-shaped.


2. The curve is symmetrical about its center.
3. The mean, median, and the mode coincide at the
center.
4. The width of the curve is determined by the
standard deviation of the distribution.
5. The tails of the curve flatten out indefinitely along
the horizontal axis, always approaching the axis but
never touching it. That is, the curve is asymptotic
to the base line.
6. The area under the curve is 1. Thus, it represents
the probability or proportion or the percentage
associated with specific sets of measurement
values.
EMPIRICAL VALUE
An approximation of the probability (area under the normal curve) within a given number of standard deviations of the mean.
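
As an illustration, the areas within one, two, and three standard deviations of the mean can be computed directly. This is a minimal Python sketch; the helper name `area_within` is hypothetical, and it relies on the standard identity that the area within k standard deviations equals erf(k/√2).

```python
from math import erf, sqrt

def area_within(k: float) -> float:
    """Area under a normal curve within k standard deviations of the mean."""
    return erf(k / sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {area_within(k):.4f}")
```

The printed areas are roughly 68%, 95%, and 99.7% of the data.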



THE STANDARD NORMAL CURVE

A standard normal
curve is a probability
distribution that has a
mean 𝜇 = 0 and a
standard deviation 𝜎 = 1.

THE Z-SCORE
The areas under the normal curve are given in terms of 𝑧-values or 𝑧-scores. The 𝑧-score of a measurement is

$$z = \frac{X - \mu}{\sigma}$$

where: 𝑋 = given measurement
𝜇 = population mean
𝜎 = population standard deviation
Note: The formula converts any 𝑋-value of a normal distribution into a value in the standard normal 𝑧-distribution.

EXAMPLE 1: GIVEN THE MEAN 𝜇 = 50 AND THE
STANDARD DEVIATION, 𝜎 = 4 OF A POPULATION
OF READING SCORES. FIND THE 𝑧-VALUE THAT
CORRESPONDS TO A SCORE X = 58.

Solution:

$$z = \frac{X - \mu}{\sigma} = \frac{58 - 50}{4} = 2$$

Figure 1: Normal Curve showing z-scores and Raw Scores (raw-score axis: 38, 42, 46, 50, 54, 58, 62)


EXAMPLE 2: FIND THE 𝑧-VALUE OF THE
FOLLOWING SET OF DATA. TELL WHETHER THE
SCORE IS ABOVE OR BELOW THE MEAN.

1.𝜇 = 45, 𝜎 = 6, X = 39 z = -1 Below

2.𝜇 = 40, 𝜎 = 8, X = 52 z = 1.5 Above

3.𝜇 = 75, 𝜎 = 15, X = 82 z = 0.47 Above
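
The three answers above can be reproduced with a short Python sketch. The function name `z_score` and the tuple layout are illustrative choices, not part of the lesson.

```python
def z_score(x: float, mu: float, sigma: float) -> float:
    """Convert a raw score x to a z-score, given the mean and standard deviation."""
    return (x - mu) / sigma

# (mean, standard deviation, raw score) for the three items of Example 2
cases = [(45, 6, 39), (40, 8, 52), (75, 15, 82)]

for mu, sigma, x in cases:
    z = z_score(x, mu, sigma)
    position = "above" if z > 0 else "below" if z < 0 else "at"
    print(f"mu={mu}, sigma={sigma}, X={x}: z = {z:.2f} ({position} the mean)")
```

A negative z-score means the raw score lies below the mean; a positive one means it lies above.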

EXAMPLE 3: FIND THE AREA UNDER THE STANDARD NORMAL CURVE OF THE FOLLOWING 𝑧-SCORES. (USE THE TABLE OF AREAS UNDER THE NORMAL CURVE)

a. 𝑧 = 0.96 or P(𝑧 < 0.96)

Step 1: Express the z-value as three digits: z = 0.96
Step 2: In the table, find the first two digits in the row headings: z = 0.9
Step 3: Match the third digit with the appropriate column heading: 0.06
Step 4: Read the area (or probability) at the intersection of the row and the column.

P(𝑧 < 0.96) = 0.8315 or 83.15%

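
The table lookup can be cross-checked by computing the cumulative area directly. This is a minimal Python sketch, assuming the table gives the area to the left of z (as the answer above implies); `area_below` is a hypothetical helper name.

```python
from math import erf, sqrt

def area_below(z: float) -> float:
    """Area under the standard normal curve to the left of z, i.e. P(Z < z)."""
    return 0.5 * (1 + erf(z / sqrt(2)))

print(round(area_below(0.96), 4))  # 0.8315
```

Rounded to four decimal places, the computed value matches the table entry at row 0.9, column 0.06.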
EXAMPLE 3: FIND THE AREA UNDER THE
STANDARD NORMAL CURVE BETWEEN 𝑧 = 0 AND
THE FOLLOWING 𝑧-SCORES. (USE THE TABLE OF
AREAS UNDER THE NORMAL CURVE)

b. 𝑧 = 1.45 or P(0 < 𝑧 < 1.45)

P(0 < 𝑧 < 1.45) = P(𝑧 < 1.45) − P(𝑧 < 0)
= 0.9265 − 0.5
= 0.4265 or 42.65%

EXAMPLE 4: FIND THE AREA UNDER THE
STANDARD NORMAL CURVE BOUNDED BY THE
FOLLOWING PAIRS OF 𝑧-SCORES. (USE THE
TABLE OF AREAS UNDER THE NORMAL
CURVE)

c. 𝑧 = −1.91 and 𝑧 = 3 or P(−1.91 < 𝑧 < 3)

P(−1.91 < 𝑧 < 3) = P(𝑧 < 3) − P(𝑧 < −1.91)
= 0.9987 − 0.0281
= 0.9706 or 97.06%
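
An area bounded by two z-scores is the difference of two cumulative areas. The sketch below uses `statistics.NormalDist` from the Python standard library in place of the printed table; the helper name `area_between` is an illustrative assumption.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

def area_between(z_low: float, z_high: float) -> float:
    """Area under the standard normal curve between two z-scores."""
    return Z.cdf(z_high) - Z.cdf(z_low)

print(round(area_between(0, 1.45), 4))   # 0.4265
print(round(area_between(-1.91, 3), 4))  # 0.9706
```

Both values agree with the table-based answers for P(0 < 𝑧 < 1.45) and P(−1.91 < 𝑧 < 3).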

EXAMPLE 5: FIND THE AREA OR PROPORTION
(PROBABILITY) INDICATED BY EACH ITEM.
(USE THE TABLE OF AREAS UNDER THE
NORMAL CURVE)

a. Above 𝑧 = −1 or P(𝑧 > −1)

P(𝑧 > −1) = 1 − P(𝑧 < −1)
= 1 − 0.1587
= 0.8413 or 84.13%

Example 6. A coffee vending machine is set to dispense amounts of coffee per cup that follow a normal distribution with a mean of 200 mL and a standard deviation of 10 mL. Let the random variable 𝑋 be the amount (in mL) per cup.

• What is the probability that a randomly selected cup will contain more than 225 mL of coffee as dispensed by the machine?
• What proportion of the cups will contain between 198 mL and 225 mL of coffee?

• About what amount marks the start of the largest-filled 9.01% of the cups of coffee?
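
One way to work through the three questions of Example 6 is sketched below in Python. It assumes the stated mean of 200 mL and standard deviation of 10 mL, and it uses `statistics.NormalDist` rather than the printed table, so the last digit may differ slightly from a table-based answer. The variable names are illustrative.

```python
from statistics import NormalDist

coffee = NormalDist(mu=200, sigma=10)  # amount of coffee per cup, in mL

# Probability that a cup contains more than 225 mL (upper-tail area)
p_more_than_225 = 1 - coffee.cdf(225)

# Proportion of cups containing between 198 mL and 225 mL
p_198_to_225 = coffee.cdf(225) - coffee.cdf(198)

# Amount above which the largest-filled 9.01% of cups lie:
# the area to the left of this amount is 1 - 0.0901
cutoff_top_9_01 = coffee.inv_cdf(1 - 0.0901)

print(f"P(X > 225)       = {p_more_than_225:.4f}")
print(f"P(198 < X < 225) = {p_198_to_225:.4f}")
print(f"cutoff           = {cutoff_top_9_01:.1f} mL")
```

The first two answers correspond to z = 2.5 and z = −0.2; the cutoff corresponds to a z-score of about 1.34, that is, roughly 213.4 mL.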

Example 7: Fifty job applicants took an IQ test, and their scores are normally distributed with a mean of 100.

• How many applicants obtained a score between 74 and 126 if the standard deviation is 20?

• The management decided not to hire the lowest 20% of the applicants. What score must an applicant obtain to get hired if the standard deviation is 20?
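
A possible way to check both parts of Example 7 is sketched below. It assumes the figures given in the problem (50 applicants, mean 100, standard deviation 20) and uses `statistics.NormalDist` instead of the printed table; the variable names are illustrative.

```python
from statistics import NormalDist

n_applicants = 50
iq = NormalDist(mu=100, sigma=20)

# Part 1: expected number of applicants scoring between 74 and 126 (z = -1.3 to 1.3)
share_between = iq.cdf(126) - iq.cdf(74)
count_between = n_applicants * share_between

# Part 2: score that separates the lowest 20% from the rest
cutoff_lowest_20 = iq.inv_cdf(0.20)

print(f"share between 74 and 126: {share_between:.4f}")
print(f"expected applicants     : {count_between:.2f}")
print(f"hiring cutoff score     : {cutoff_lowest_20:.2f}")
```

About 40 of the 50 applicants fall between 74 and 126, and an applicant needs a score of roughly 83 or higher to be outside the lowest 20%.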

EXAMPLE 8: DG company has 100 branches nationwide. The annual profit of DG company's branches is normally distributed with a mean of Php 73 million and a standard deviation of Php 3.25 million. How many branches have a profit of Php 73 million to Php 80 million?

Given:
n = 100
μ = Php 73 M
X = Php 80 M
σ = Php 3.25 M

Solution:

$$z = \frac{X - \mu}{\sigma} = \frac{80 - 73}{3.25} \approx 2.15$$

Number of branches
= (Area between 𝑧 = 0 and 𝑧 = 2.15)(n)
= (0.4842)(100)
= 48.42
≈ 48 branches

Therefore, 48 branches have a profit of Php 73 M to Php 80 M.
Quartiles
A quartile is a type of quantile that divides the ordered data points into four parts, or quarters, of more-or-less equal size. The data must be ordered from smallest to largest to compute quartiles; as such, quartiles are a form of order statistic.

The three main quartiles are as follows:
• The first quartile (Q1) is defined as the middle number between the smallest number (minimum) and the median of the data set. It is also known as the lower or 25th empirical quartile, as 25% of the data lies below this point.
• The second quartile (Q2) is the median of a data set; thus 50% of the data lies below this point.
• The third quartile (Q3) is the middle value between the median and the highest value (maximum) of the data set. It is known as the upper or 75th empirical quartile, as 75% of the data lies below this point.

The median procedure for finding quartiles
1. Rank the data.
2. Find the median of the data. This is the second quartile, 𝑄2.
3. The first quartile, 𝑄1, is the median of the data values less than 𝑄2. The third quartile, 𝑄3, is the median of the data values greater than 𝑄2.



Example: The following table lists the calories per 100
milliliters of 25 popular sodas. Find the quartiles for the data.
43 37 42 40 53 62 36 32 50 49
26 53 73 48 45 39 45 48 50 56
41 36 39 58 42
Step 1: Rank
1)26 2)32 3)36 4)36 5)37 6)39 7)39 8)40 9)40 10)41 11)42 12)42 13)43

14)45 15)45 16)48 17)48 18)49 19)50 20)50 21)53 22)56 23)58 24)62 25)73
Step 2: Identify the median. With 25 ranked values, the median is the 13th value in the ranked list: 𝑄2 = 43.
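
The median procedure can be sketched in Python as follows. The list `ranked` is copied from Step 1 as printed; note that it contains 40 twice and 53 once, while the raw table shows 40 once and 53 twice, so results computed from the raw table may differ. The function name and the choice to split the data by position (lower half and upper half, leaving out the middle value when the count is odd) are assumptions of this sketch.

```python
from statistics import median

# Ranked calorie values exactly as listed in Step 1
ranked = [26, 32, 36, 36, 37, 39, 39, 40, 40, 41, 42, 42, 43,
          45, 45, 48, 48, 49, 50, 50, 53, 56, 58, 62, 73]

def quartiles_by_median_procedure(values):
    """Return (Q1, Q2, Q3) using the median procedure; needs at least 2 values."""
    data = sorted(values)            # Step 1: rank the data
    half = len(data) // 2
    q2 = median(data)                # Step 2: the median is Q2
    q1 = median(data[:half])         # Step 3: median of the values below Q2
    q3 = median(data[-half:])        #         median of the values above Q2
    return q1, q2, q3

q1, q2, q3 = quartiles_by_median_procedure(ranked)
print(f"Q1 = {q1}, Q2 = {q2}, Q3 = {q3}")
```

For the ranked list above this gives Q1 = 39, Q2 = 43, and Q3 = 50.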

What is a Box-and-Whisker Plot?
A box plot is a diagram that gives a visual representation of the distribution of the data, highlighting where most values lie and the values that greatly differ from the norm, called outliers. The box plot is also referred to as a box-and-whisker plot or box-and-whisker diagram.

Elements of the box plot
• The bottom side of the box represents the first quartile, and the top side, the third quartile. Therefore, the vertical width of the central box represents the interquartile range (𝑄3 − 𝑄1).
• The horizontal line inside the box is the median.
• The vertical lines protruding from the box extend to the
minimum and the maximum values of the data set, a

Construction of a Box-and-Whisker Plot
1. Draw a horizontal scale that extends from the minimum data value to the maximum data value.
2. Above the scale, draw a rectangle (box) with its left side at 𝑄1 and its right side at 𝑄3.
3. Draw a vertical line segment across the rectangle at the
median, 𝑄2 .
4. Draw a horizontal line segment, called a whisker, that
extends from 𝑄1 to the minimum and another whisker that
extends from 𝑄3 to the maximum.

Example: Construct a box-and-whisker plot for the data in the previous example.

1)26 2)32 3)36 4)36 5)37 6)39 7)39 8)40 9)40 10)41 11)42 12)42 13)43

14)45 15)45 16)48 17)48 18)49 19)50 20)50 21)53 22)56 23)58 24)62 25)73
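
The construction steps can be illustrated with a rough text rendering. This is only a sketch: it takes the ranked list above as printed, finds the quartiles with the median procedure, and assumes the five summary values are whole numbers so that each character can stand for one unit of the scale. The function name `draw_box_plot` is hypothetical.

```python
from statistics import median

ranked = [26, 32, 36, 36, 37, 39, 39, 40, 40, 41, 42, 42, 43,
          45, 45, 48, 48, 49, 50, 50, 53, 56, 58, 62, 73]

half = len(ranked) // 2
minimum, maximum = ranked[0], ranked[-1]
q1, q2, q3 = median(ranked[:half]), median(ranked), median(ranked[-half:])

def draw_box_plot(minimum, q1, q2, q3, maximum):
    """Return a one-line text box plot, one character per unit of the scale."""
    cells = []
    for x in range(minimum, maximum + 1):
        if x in (minimum, maximum, q2):
            cells.append("|")   # whisker ends and the median line
        elif x == q1:
            cells.append("[")   # left side of the box at Q1
        elif x == q3:
            cells.append("]")   # right side of the box at Q3
        elif q1 < x < q3:
            cells.append("=")   # inside the box
        else:
            cells.append("-")   # whiskers
    return "".join(cells)

plot = draw_box_plot(minimum, int(q1), int(q2), int(q3), maximum)
print(plot)
print(f"min={minimum}  Q1={q1}  Q2={q2}  Q3={q3}  max={maximum}")
```

The long right whisker shows that the largest value, 73, lies much farther from the box than the smallest value, 26.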



Pearson Correlation and Linear Regression
A correlation or simple linear regression analysis can
determine if two numeric variables are significantly linearly
related.
A correlation analysis provides information on
the strength and direction of the linear relationship between
two variables, while a simple linear regression analysis
estimates parameters in a linear equation that can be used
to predict values of one variable based on the other.

THE PEARSON PRODUCT-MOMENT CORRELATION
▪ is a quantitative way to measure the association between two variables
▪ the strength of the correlation is indicated by the coefficient of correlation (𝑟)
▪ named in honor of Karl Pearson, the statistician who did extensive research on the subject
• To compute 𝑟, we use the formula:

$$r = \frac{n\sum XY - \sum X \cdot \sum Y}{\sqrt{\left[n\sum X^2 - \left(\sum X\right)^2\right]\left[n\sum Y^2 - \left(\sum Y\right)^2\right]}}$$

where:
𝑛 = sample size; 𝑋 = value of the independent variable; 𝑌 = value of the dependent variable

ASSUMPTIONS FOR PEARSON’S CORRELATION:
• 1. Variables should be measured at the interval or ratio level (they are continuous). Interval measurements have no absolute zero, like motivation, stress, and mathematical achievement. Ratio measurements have an absolute zero, like the height of children, the weight of vegetables, and the brightness of light.
• 2. There is a linear relationship between your variables. A scatter plot is used to identify a linear relationship.

Pearson r | Qualitative Description
±1 | Perfect
±0.75 to < ±1 | Very high
±0.50 to < ±0.75 | Moderately high
±0.25 to < ±0.50 | Moderately low
> 0 to < ±0.25 | Very low
0 | No correlation

• A large sample size makes the manual process tedious. Thus, statistical software like SPSS may be used.
Example 1: A large industrial plant has seven divisions that do the same type of work. For their safety report, a safety inspector visits each division of 20 workers quarterly.
a. Identify the independent and dependent variables.
b. Test if there is a relationship between the variables.
c. Determine the equation of the line that best fits the data.

Division | Number of work-hours devoted to safety training | Number of work-hours lost due to industry-related accidents
1 | 10.0 | 80
2 | 19.5 | 65
3 | 30.0 | 68
4 | 45.0 | 55
5 | 50.0 | 35
6 | 65.0 | 10
7 | 80.0 | 12

Independent Variable: Number of work-hours devoted to safety training
Dependent Variable: Number of work-hours lost due to industry-related accidents

Division | X | Y | X² | Y² | XY
1 | 10.0 | 80 | 100 | 6400 | 800
2 | 19.5 | 65 | 380.25 | 4225 | 1267.5
3 | 30.0 | 68 | 900 | 4624 | 2040
4 | 45.0 | 55 | 2025 | 3025 | 2475
5 | 50.0 | 35 | 2500 | 1225 | 1750
6 | 65.0 | 10 | 4225 | 100 | 650
7 | 80.0 | 12 | 6400 | 144 | 960

ΣX = 299.5, ΣY = 325, ΣX² = 16530.25, ΣY² = 19743, ΣXY = 9942.5, n = 7

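$$
\begin{aligned}
r &= \frac{n\sum XY - \sum X \cdot \sum Y}{\sqrt{\left[n\sum X^2 - \left(\sum X\right)^2\right]\left[n\sum Y^2 - \left(\sum Y\right)^2\right]}} \\
&= \frac{(7)(9942.5) - (299.5)(325)}{\sqrt{\left[7(16530.25) - (299.5)^2\right]\left[7(19743) - (325)^2\right]}} \\
&= \frac{69597.5 - 97337.5}{\sqrt{(115711.75 - 89700.25)(138201 - 105625)}} \\
&= \frac{-27740}{\sqrt{(26011.5)(32576)}} = \frac{-27740}{\sqrt{847350624}} = \frac{-27740}{29109.28759} \\
&\approx -0.9530
\end{aligned}
$$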

The computed r is −0.9530. The result indicates a very high negative association between the independent and dependent variables. That is, the number of work-hours devoted to safety training and the number of work-hours lost due to industry-related accidents have a very strong negative correlation: as one variable increases, the other tends to decrease.
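
The hand computation can be verified with a short Python sketch that applies the same raw-sum formula for 𝑟. The function name `pearson_r` and the variable names are illustrative; the data are the seven divisions of Example 1.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson r from raw sums: n, sum X, sum Y, sum XY, sum X^2, sum Y^2."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

training_hours = [10.0, 19.5, 30.0, 45.0, 50.0, 65.0, 80.0]  # X
hours_lost = [80, 65, 68, 55, 35, 10, 12]                    # Y

r = pearson_r(training_hours, hours_lost)
print(round(r, 4))  # -0.953
```

The result agrees with the manual value of −0.9530, which falls in the "very high" band of the qualitative table.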

ΣX = 299.5, ΣY = 325, ΣX² = 16530.25, ΣY² = 19743, ΣXY = 9942.5, n = 7, r = −0.9530

$$
\begin{aligned}
a &= \frac{\sum Y \sum X^2 - \sum X \sum XY}{n\sum X^2 - \left(\sum X\right)^2} = \frac{(325)(16530.25) - (299.5)(9942.5)}{7(16530.25) - (299.5)^2} \\
&= \frac{5372331.25 - 2977778.75}{115711.75 - 89700.25} = \frac{2394552.5}{26011.5} \approx 92.057
\end{aligned}
$$

$$
\begin{aligned}
b &= \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - \left(\sum X\right)^2} = \frac{7(9942.5) - (299.5)(325)}{7(16530.25) - (299.5)^2} \\
&= \frac{69597.5 - 97337.5}{115711.75 - 89700.25} = \frac{-27740}{26011.5} \approx -1.066
\end{aligned}
$$

The best-fitting line is 𝑌 = −1.066𝑋 + 92.057.
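
The intercept 𝑎 and slope 𝑏 can be computed with the same raw-sum formulas, and the fitted line can then be used for prediction. In the sketch below, the function name `least_squares_line` and the value of 40 training hours are illustrative assumptions, not part of the example.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the least-squares line Y = bX + a, using raw sums."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    denominator = n * sum_x2 - sum_x ** 2
    a = (sum_y * sum_x2 - sum_x * sum_xy) / denominator  # intercept
    b = (n * sum_xy - sum_x * sum_y) / denominator       # slope
    return a, b

training_hours = [10.0, 19.5, 30.0, 45.0, 50.0, 65.0, 80.0]  # X
hours_lost = [80, 65, 68, 55, 35, 10, 12]                    # Y

a, b = least_squares_line(training_hours, hours_lost)
print(f"Y = {b:.3f}X + {a:.3f}")

# Hypothetical use: predicted work-hours lost for 40 hours of safety training
predicted = b * 40 + a
print(f"predicted Y at X = 40: {predicted:.1f}")
```

The negative slope means each additional work-hour of safety training is associated with about 1.07 fewer work-hours lost.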

• Example 2: Based on the data below, determine if there is a relationship between the scores in Physics and Statistics. Also identify the equation of the line that best fits the data.

Physics (x) | 3 | 9 | 10 | 12 | 7
Statistics (y) | 5 | 8 | 10 | 9 | 8

X | Y | X² | Y² | XY
3 | 5 | 9 | 25 | 15
9 | 8 | 81 | 64 | 72
10 | 10 | 100 | 100 | 100
12 | 9 | 144 | 81 | 108
7 | 8 | 49 | 64 | 56

ΣX = 41, ΣY = 40, ΣX² = 383, ΣY² = 334, ΣXY = 351, n = 5

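$$
\begin{aligned}
r &= \frac{n\sum XY - \sum X \cdot \sum Y}{\sqrt{\left[n\sum X^2 - \left(\sum X\right)^2\right]\left[n\sum Y^2 - \left(\sum Y\right)^2\right]}} = \frac{(5)(351) - (41)(40)}{\sqrt{\left[5(383) - (41)^2\right]\left[5(334) - (40)^2\right]}} \\
&= \frac{1755 - 1640}{\sqrt{(1915 - 1681)(1670 - 1600)}} = \frac{115}{\sqrt{(234)(70)}} = \frac{115}{\sqrt{16380}} = \frac{115}{127.984374} \\
&\approx 0.8985 \approx 0.90
\end{aligned}
$$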

The computed r is 0.8985 or 0.90. The result shows that there is a very high, or very strong, positive association between the scores in Physics and Statistics. This means that if a student has a high score in one subject, the student is expected to have a high score in the other subject as well. Likewise, a low score in one subject tends to go with a low score in the other subject.

$$
a = \frac{\sum Y \sum X^2 - \sum X \sum XY}{n\sum X^2 - \left(\sum X\right)^2} = \frac{(40)(383) - (41)(351)}{5(383) - (41)^2} = \frac{15320 - 14391}{1915 - 1681} = \frac{929}{234} \approx 3.97
$$

$$
b = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - \left(\sum X\right)^2} = \frac{(5)(351) - (41)(40)}{(5)(383) - (41)^2} = \frac{1755 - 1640}{1915 - 1681} = \frac{115}{234} \approx 0.491
$$

The association between Physics (x) and Statistics (y) can be modelled by the regression line 𝑌 = 0.491𝑋 + 3.97.
