
Statistics - Mathematics in the Modern World

BS accountancy (University of Cebu)


Mathematics in The Modern World GEC 14 Teachers

CHAPTER 4

Statistics

Topics:
4.1 Measures of Central Tendency/Location
4.2 Measures of Dispersion/Variation
4.3 Linear Correlation and Simple Linear Regression

The word Statistics has two major definitions, a singular form and a plural form. Statistics, in a
plural sense, refers to the data itself or to some numerical computations derived from a set of data that are
systematically collected and analyzed. In a singular sense, Statistics refers to the scientific discipline
consisting of the theory and methods for processing collections of quantitative and qualitative data useful
when making decisions in the face of uncertainty.

Below are the objectives and some key definitions to keep in mind as you go through this
module.

Objectives:

(1) Calculate the mean, median and mode of a set of data and identify the conditions under which
each is most appropriate;
(2) Calculate the range, variance, and standard deviation;
(3) Plot a scatter diagram, measure and interpret the relationship between the two variables; and
(4) Predict or estimate values of dependent variable from known values of independent variables.

Key Definitions:

Population is a collection of all units from which data are to be collected.
Sample is a subset of the population.
Variables are the characteristics or properties measured on every unit of the population,
whether the units are objects, persons or things.
Outlier is an observation that does not fit the rest of the data. It is sometimes
called an extreme value.

First Semester 1 CMU Mathematics Department




4.1 MEASURES OF CENTRAL TENDENCY/LOCATION
A measure of central tendency or location is a numerical value that summarizes a set of
observations into a single value that may be used to represent the entire set. There
are three measures of central tendency, namely: the arithmetic mean, the median and the mode.

4.1.1. Mean
The mean (often called the average) is the most popular measure of central tendency. It is
the sum of a set of observations divided by the number of observations in the set. This measure
is appropriate for data in interval or ratio scale. The computing formulas of the mean are as
follows:

➢ Population Mean

\[ \mu = \frac{1}{N}\sum_{i=1}^{N} x_i \]

➢ Sample Mean

\[ \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i \]

➢ Weighted Mean

\[ \bar{X} = \frac{\sum_{i=1}^{k} w_i x_i}{\sum_{i=1}^{k} w_i} \]

where
$\mu$ – population mean
$\bar{x}$ – sample mean
$\bar{X}$ – weighted mean
$N$ – population size or total number of observations
$n$ – sample size or total number of observations
$x_i$ – set of data or observations
$w_i$ – the weights of each of the $k$ distinct observations

Example 1. The number of hours spent by 12 students in studying their Statistics lesson
before exam were recorded as follows: 9, 11, 16, 11, 15, 12, 10, 16, 13, 11, 11, 17. Find
the arithmetic mean.

Solution: Since it was not mentioned that the data are a random sample, we assume,
for the purpose of illustration, that these are population data. Thus

\[ \mu = \frac{1}{12}\sum_{i=1}^{12} x_i = \frac{1}{12}(x_1 + x_2 + \dots + x_{12}) \]
\[ = \frac{1}{12}(9 + 11 + 16 + 11 + 15 + 12 + 10 + 16 + 13 + 11 + 11 + 17) \]
\[ = \frac{1}{12}(152) = \frac{152}{12} \approx 12.67 \]

This result shows that on the average, the 12 students spent 12.67 hours in
studying their Statistics lesson.
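
The computation in Example 1 can also be checked with a few lines of Python. This is only a minimal sketch: the function name `arithmetic_mean` and the variable names are illustrative choices, not part of the module.

```python
# Hours spent studying by the 12 students in Example 1.
hours = [9, 11, 16, 11, 15, 12, 10, 16, 13, 11, 11, 17]


def arithmetic_mean(values):
    """Return the sum of the observations divided by their count."""
    if not values:
        raise ValueError("at least one observation is required")
    return sum(values) / len(values)


mu = arithmetic_mean(hours)
print(sum(hours))    # 152
print(round(mu, 2))  # 12.67
```

The same function applies to a sample; only the symbol used for the result changes.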



Example 2. The CMUCAT scores of a sample of 5 students who joined the university
during the first semester of SY 2020-2021 were found to be 78, 90, 89, 95, and 88. Compute
the mean CMUCAT score.

Solution: These are sample data, hence

\[ \bar{x} = \frac{1}{5}\sum_{i=1}^{5} x_i = \frac{1}{5}(x_1 + x_2 + x_3 + x_4 + x_5) = \frac{1}{5}(78 + 90 + 89 + 95 + 88) \]
\[ = \frac{1}{5}(440) = \frac{440}{5} = 88 \]

This result shows that the 5 students have an average CMUCAT score of 88.

Example 3. The student’s final grades in Math 51, Math 43, GEE 12, GEC 19, PE31 and
NSTP 1 are 2.5, 2.75, 1.25, 1.75, 1.25 and 1.75, respectively. If the respective credits for
these subjects are 3, 4, 3, 3, 2, and 3 units, determine the student’s GPA or weighted
average grade.

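Solution:

\[ \bar{X} = \frac{\sum_{i=1}^{6} w_i x_i}{\sum_{i=1}^{6} w_i} = \frac{w_1 x_1 + w_2 x_2 + w_3 x_3 + w_4 x_4 + w_5 x_5 + w_6 x_6}{w_1 + w_2 + w_3 + w_4 + w_5 + w_6} \]
\[ = \frac{3(2.5) + 4(2.75) + 3(1.25) + 3(1.75) + 2(1.25) + 3(1.75)}{3 + 4 + 3 + 3 + 2 + 3} \]
\[ = \frac{35.25}{18} \approx 1.96 \]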
This result shows that the GPA of this student is 1.96.
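
As a cross-check of Example 3, the weighted mean can be sketched in Python as below. The function name `weighted_mean` is an illustrative choice, and the credit units play the role of the weights.

```python
def weighted_mean(values, weights):
    """Return sum(w_i * x_i) / sum(w_i) for paired values and weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("the weights must not sum to zero")
    return sum(w * x for w, x in zip(weights, values)) / total_weight


# Final grades and credit units from Example 3.
grades = [2.5, 2.75, 1.25, 1.75, 1.25, 1.75]
units = [3, 4, 3, 3, 2, 3]

gpa = weighted_mean(grades, units)
print(round(gpa, 2))  # 1.96
```

When every weight is equal, this reduces to the ordinary arithmetic mean.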

4.1.2. Median
The median is the middle value of a set of observations arranged in an increasing or
decreasing order of magnitude, denoted by 𝑥̃. It is a positional value and unlike the arithmetic
mean, it is not affected by the presence of extreme values. When abnormal values or outliers
are present, it is preferable to use the median rather than the mean as a measure of central
location. It is an appropriate measure for data which are at least in the ordinal scale.
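❖ Population Median
➢ If N is odd, then the median is computed using

\[ \tilde{X} = x_{\left(\frac{N+1}{2}\right)} \]

➢ If N is even, then the median is computed using

\[ \tilde{X} = \frac{x_{\left(\frac{N}{2}\right)} + x_{\left(\frac{N}{2}+1\right)}}{2} \]

❖ Sample Median
➢ If n is odd, then the median is computed using

\[ \tilde{x} = x_{\left(\frac{n+1}{2}\right)} \]

➢ If n is even, then the median is computed using

\[ \tilde{x} = \frac{x_{\left(\frac{n}{2}\right)} + x_{\left(\frac{n}{2}+1\right)}}{2} \]

Here $x_{(j)}$ denotes the $j$-th observation of the ordered data.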

Example 4. The ages of 8 CMU students enrolled in GEC 14 subject are: 18, 17, 23, 20,
19, 18, 21, and 22. Find the median of ages.

Solution: Arrange the ages in ascending order: 17, 18, 18, 19, 20, 21, 22, 23. This
means that $x_{(1)} = 17$, $x_{(2)} = 18$, $x_{(3)} = 18$, $x_{(4)} = 19$, $x_{(5)} = 20$,
$x_{(6)} = 21$, $x_{(7)} = 22$, $x_{(8)} = 23$.
Since it was not mentioned that the data are a random sample, we assume,
for the purpose of illustration, that these are population data. Since N = 8, which is an
even number, the median is

\[ \tilde{X} = \frac{x_{\left(\frac{N}{2}\right)} + x_{\left(\frac{N}{2}+1\right)}}{2} = \frac{x_{\left(\frac{8}{2}\right)} + x_{\left(\frac{8}{2}+1\right)}}{2} = \frac{x_{(4)} + x_{(5)}}{2} \]

Now, $x_{(4)} = 19$ and $x_{(5)} = 20$, so

\[ \tilde{X} = \frac{19 + 20}{2} = \frac{39}{2} = 19.5. \]

Thus, the median age of the 8 CMU students enrolled in the GEC 14 subject is 19.5.

Example 5. The CMUCAT scores of a sample of 5 students who joined the university
during the first semester of SY 2020-2021 were found to be 78, 90, 89, 95, and 88.
Determine the median CMUCAT score.

Solution: Arrange the CMUCAT scores in ascending order: 78, 88, 89, 90, 95. This
means that 𝑥(1) = 78, 𝑥(2) = 88, 𝑥(3) = 89, 𝑥(4) = 90, 𝑥(5) = 95.
Since n = 5, which is an odd number, the median is

\[ \tilde{x} = x_{\left(\frac{n+1}{2}\right)} = x_{\left(\frac{5+1}{2}\right)} = x_{\left(\frac{6}{2}\right)} = x_{(3)} = 89. \]

Thus, the median is 89, which is the 3rd observation of the ordered data.
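
The odd and even cases of the median formula can be sketched in Python as follows. The function name `median_value` is illustrative; note that Python lists are indexed from 0, so the 1-based position (n + 1)/2 in the formula corresponds to index n // 2.

```python
def median_value(values):
    """Return the middle value of the ordered data.

    For odd n this is x_((n+1)/2); for even n it is the average of
    x_(n/2) and x_(n/2 + 1), using 1-based positions as in the text.
    """
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("at least one observation is required")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # average of the two middle values


ages = [18, 17, 23, 20, 19, 18, 21, 22]  # Example 4 (even count)
scores = [78, 90, 89, 95, 88]            # Example 5 (odd count)

print(median_value(ages))    # 19.5
print(median_value(scores))  # 89
```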

4.1.3. Mode
Mode is defined as the value which occurs the greatest number of times, that is, the value with
the greatest frequency. It is an appropriate measure for a nominal or categorical type of data.
Note: If all observations occur with equal frequency, then there is no modal value for the data
set.

Example 6. The CMUCAT scores of a sample of 5 students who joined the university
during the first semester of SY 2020-2021 were found to be 78, 90, 89, 95, and 88. Find
the mode CMUCAT score.

Solution: Since each of the five observations occurs exactly once, all observations have equal
frequency and there is no modal value for the data set.

Example 7. The number of hours spent by 12 students in studying their Statistics lesson
before exam were recorded as follows: 9, 11, 16, 11, 15, 12, 10, 16, 13, 11, 11, 17. Find
the mode.

Solution: The mode is 11 hours since it occurs four times while the other observations occur
only once or twice.
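
A minimal Python sketch of the mode, following the convention in the Note above that a data set has no mode when all of its distinct values occur equally often. The function name `modes` is illustrative; it returns a list because a data set may have more than one most frequent value.

```python
from collections import Counter


def modes(values):
    """Return the most frequent value(s), or [] when there is no mode."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    # Several distinct values, all with the same frequency: no modal value.
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == highest)


scores = [78, 90, 89, 95, 88]                            # Example 6
hours = [9, 11, 16, 11, 15, 12, 10, 16, 13, 11, 11, 17]  # Example 7

print(modes(scores))  # []
print(modes(hours))   # [11]
```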



4.2. MEASURES OF DISPERSION/VARIATION
A measure of dispersion is a numerical value, computed from the given observations, that
measures how the data spread out from the central location. It is often used in comparing two sets
of data. The smaller the measure, the closer the observations are to the central value.
The common measures of dispersion/variation are the range, variance and standard deviation.

4.2.1. Range
Range is the difference between the highest value (HV) and the lowest value (LV):
𝑅 = 𝐻𝑉 − 𝐿𝑉

Example 8. The CMUCAT scores of a sample of 5 students who joined the university
during the first semester of SY 2020-2021 were found to be 78, 90, 89, 95, and 88. Find
the range of the CMUCAT score.

Solution: The highest CMUCAT score is 95 and the lowest CMUCAT score is 78; hence
the range is 17, that is,
𝑅 = 95 − 78 = 17.

Example 9. The number of hours spent by 12 students in studying their Statistics lesson
before exam were recorded as follows: 9, 11, 16, 11, 15, 12, 10, 16, 13, 11, 11, 17. Find
the range of the number of hours spent by 12 students in studying their Statistics lesson
before exam.

Solution: The highest value is 17 and the lowest value is 9; hence the range is 8, that is,
𝑅 = 17 − 9 = 8
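
For completeness, the range in Examples 8 and 9 can be reproduced with a short Python sketch; the function name `value_range` is an illustrative choice.

```python
def value_range(values):
    """Return the highest value minus the lowest value (R = HV - LV)."""
    return max(values) - min(values)


scores = [78, 90, 89, 95, 88]                            # Example 8
hours = [9, 11, 16, 11, 15, 12, 10, 16, 13, 11, 11, 17]  # Example 9

print(value_range(scores))  # 17
print(value_range(hours))   # 8
```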

4.2.2. Variance
Variance is another measure of variation which can be used instead of the range. The
variance considers the deviation of each observation from the mean. The computing formulas
are defined below.
➢ Population Variance

\[ \sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N} \quad \text{or} \quad \sigma^2 = \frac{\sum_{i=1}^{N} x_i^2 - N\mu^2}{N} \]

where
$\sigma^2$ – population variance
$\mu$ – population mean
$N$ – population size or total number of observations
$x_i$ – set of data or observations



➢ Sample Variance

\[ s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} \quad \text{or} \quad s^2 = \frac{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}{n(n - 1)} \]

where
$s^2$ – sample variance
$\bar{x}$ – sample mean
$n$ – sample size or total number of observations
$x_i$ – set of data or observations

Example 10. Refer to Example 1 and compute the variance.


Solution: The computed mean ($\mu$) was 12.67 and the number of observations is $N = 12$.
Since the data in Example 1 were treated as population data, we use the formula of the
population variance, that is,

\[ \sigma^2 = \frac{\sum_{i=1}^{12} (x_i - \mu)^2}{12} = \frac{(x_1 - \mu)^2 + (x_2 - \mu)^2 + (x_3 - \mu)^2 + \dots + (x_{12} - \mu)^2}{12} \]
\[ = \frac{(9 - 12.67)^2 + (11 - 12.67)^2 + (16 - 12.67)^2 + \dots + (17 - 12.67)^2}{12} \]
\[ = \frac{(-3.67)^2 + (-1.67)^2 + (3.33)^2 + \dots + (4.33)^2}{12} \approx 6.56 \]

The population variance (𝜎 2 ) is 6.56.

Example 11. Refer to Example 2 and compute the sample variance.


Solution: From Example 2, $\bar{x} = 88$ and $n = 5$; hence the sample variance is

\[ s^2 = \frac{\sum_{i=1}^{5} (x_i - \bar{x})^2}{5 - 1} = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + (x_3 - \bar{x})^2 + (x_4 - \bar{x})^2 + (x_5 - \bar{x})^2}{5 - 1} \]
\[ = \frac{(78 - 88)^2 + (90 - 88)^2 + (89 - 88)^2 + (95 - 88)^2 + (88 - 88)^2}{5 - 1} \]
\[ = \frac{(-10)^2 + (2)^2 + (1)^2 + (7)^2 + (0)^2}{4} = \frac{100 + 4 + 1 + 49 + 0}{4} = \frac{154}{4} = 38.5 \]
The sample variance (𝑠 2 ) is 38.5.
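
The two variance formulas differ only in the divisor (N for a population, n − 1 for a sample). The sketch below, with illustrative function names, implements both definitions plus the sample shortcut formula, and reproduces Examples 10 and 11.

```python
def population_variance(values):
    """Average squared deviation from the mean, dividing by N."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n


def sample_variance(values):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("a sample variance needs at least two observations")
    xbar = sum(values) / n
    return sum((x - xbar) ** 2 for x in values) / (n - 1)


def sample_variance_shortcut(values):
    """Shortcut form: [n * sum(x^2) - (sum x)^2] / [n (n - 1)]."""
    n = len(values)
    if n < 2:
        raise ValueError("a sample variance needs at least two observations")
    return (n * sum(x * x for x in values) - sum(values) ** 2) / (n * (n - 1))


hours = [9, 11, 16, 11, 15, 12, 10, 16, 13, 11, 11, 17]  # Example 10
scores = [78, 90, 89, 95, 88]                            # Example 11

print(round(population_variance(hours), 2))  # 6.56
print(sample_variance(scores))               # 38.5
print(sample_variance_shortcut(scores))      # 38.5
```

The shortcut form avoids computing the mean first, which is convenient for hand calculation.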



4.2.3. Standard Deviation
The standard deviation, 𝜎 for a population or 𝑠 for a sample, is the positive square root of the
variance.
➢ Population Standard Deviation 𝜎 = +√𝜎 2
➢ Sample Standard Deviation 𝑠 = +√𝑠 2
Note: A smaller standard deviation indicates that the data tend to lie closer to the mean.

Example 12. Refer to Example 10 and compute the standard deviation.

Solution: Given that 𝜎 2 = 6.56, hence 𝜎 = √6.56 = 2.56.

The population standard deviation (𝜎) is 2.56.

Example 13. Refer to Example 11 and compute the standard deviation.

Solution: Given that 𝑠 2 = 38.5, hence 𝑠 = √38.5 = 6.2.

The sample standard deviation (𝑠) is 6.2.
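
Python's standard `statistics` module provides `pstdev` (population) and `stdev` (sample), which can be used to check Examples 12 and 13. This is a sketch for verification only; the variable names are illustrative.

```python
import math
from statistics import pstdev, stdev

hours = [9, 11, 16, 11, 15, 12, 10, 16, 13, 11, 11, 17]  # population data (Example 12)
scores = [78, 90, 89, 95, 88]                            # sample data (Example 13)

sigma = pstdev(hours)  # square root of the population variance (divisor N)
s = stdev(scores)      # square root of the sample variance (divisor n - 1)

print(round(sigma, 2))  # 2.56
print(round(s, 2))      # 6.2

# The sample value is the positive square root of the variance 38.5 from Example 11.
print(math.isclose(s, math.sqrt(38.5)))  # True
```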

4.3. LINEAR CORRELATION AND SIMPLE LINEAR REGRESSION

4.3.1. Linear Correlation


Correlation analysis attempts to measure the strength of the relationship between two random
variables by means of a single number called the correlation coefficient. It is concerned only with the
strength of the relationship, and no causal effect is implied. The estimated sample correlation coefficient,
denoted by r, is given by:

\[ r = \frac{n\sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{\sqrt{\left[n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2\right]\left[n\sum_{i=1}^{n} y_i^2 - \left(\sum_{i=1}^{n} y_i\right)^2\right]}} \]

where n is the sample size.

The Sample Pearson Correlation Coefficient can be interpreted in the following manner:
1. The value of r ranges from -1 to +1. If r = +1 or r = -1, there is a perfect linear relationship and all
points lie on a straight line.
2. An r close to +1 indicates a high positive linear relationship between the two variables X and Y,
that is, as the value of X increases, the value of Y also increases.
3. An r close to -1 indicates a high negative linear relationship between the two variables, that is, as
the value of X increases, the value of Y decreases.
4. An r near 0 means that there is a lack of linearity between the two variables, or there is no linear
relationship between them. This doesn’t mean they are not associated at all, because the relationship
may be nonlinear.

A scatter diagram is a graphical presentation of the independent variable (plotted on the horizontal
axis) and the dependent variable (plotted on the vertical axis). This diagram is the easiest
way to see whether a relationship exists between the two variables.



The figures below are scatter diagrams showing the different types of linear relationships.

Figure 1. Direct Linear Relationship
Figure 2. Inverse Linear Relationship

Note: The correlation coefficient is high in magnitude (r ≈ ±1) when the points cluster closely around a
straight line (Figure 1 and Figure 2).

Figure 3. No Linear Relationship
Figure 4. No Linear Relationship

Note:
• In Figure 3, the coefficient r becomes smaller as the points cluster less closely
around a line, and it becomes virtually zero when the points are scattered at random.
• Figure 4 shows a neat curvilinear relationship between the variables, and it can be verified that its
linear correlation coefficient will be low or near 0.

The Sample Coefficient of Determination, $r^2$, is a number that gives the proportion of the total variation
in the values of variable Y that can be accounted for or explained by the linear relationship with the values
of the variable X. It is usually expressed as a percentage. For example, if the correlation coefficient, r, is 0.60,
then $r^2 = (0.60)^2 = 0.36 = 36\%$. This means that 36% of the total variation of Y can be explained by its
linear relationship with X.

4.3.2. Simple Linear Regression


Regression analysis is a statistical method which makes use of the relationship between two or
more quantitative variables so that one variable, called the dependent variable or response variable, can
be predicted with the knowledge of the values of the other variable, called the independent variable or
explanatory variable. A mathematical equation that allows us to predict values of one dependent variable
from known values of one or more independent variables is called a regression equation:

\[ Y = a + bX \]

Regression analysis deals with finding estimates of the constants a and b so that, once
estimates of the constants are found, a value $\hat{Y}$ can be predicted from a known value of X through the
regression equation

\[ \hat{Y} = \hat{a} + \hat{b}X \]



where $\hat{Y}$ – is the predicted value of the dependent variable;
$X$ – is the independent variable;
$\hat{a}$ – is the least squares estimate of the parameter $a$; and
$\hat{b}$ – is the least squares estimate of the parameter $b$.

Assumptions on Regression Analysis


i. The values of the independent variable X may be “fixed”, that is, X values may be selected in
advance by the researcher, or they may be obtained without the imposition of any restriction, in
which case, X is not a random variable.
ii. The values of X are measured without error.
iii. The dependent variable Y , given different values of the independent variable 𝑋 is normally
distributed.
iv. The variances of the dependent variable Y, given different values of the independent variable X are
equal.
Note: The equal-variance condition in iv is known as homoscedasticity.

Estimation of Parameters
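Given the sample $\{(x_i, y_i),\ i = 1, 2, 3, \dots, n\}$, the least squares estimates of the parameters in the
regression line are:

\[ \hat{b} = \frac{n\sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2} \qquad \text{and} \qquad \hat{a} = \bar{y} - \hat{b}\bar{x} \]

where $b$ is the regression coefficient or the slope of the regression line and $a$ is the constant of regression
or the y-intercept of the regression line. Moreover,

\[ \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i \qquad \text{and} \qquad \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i \]

are the means of the sample values of $Y$ and $X$, respectively.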

Example 14. A person’s muscle mass is expected to decrease with age. To explore this relationship, a
researcher randomly selected 10 persons aged 40 to 79 years and measured their muscle mass (in units).
The results are as follows:

X (age) 71 64 43 67 56 73 68 56 76 65
Y (muscle mass) 82 91 100 68 87 73 78 80 65 84
Based on the given data, do the following:
a. Plot the scatter diagram of the given data.
b. Find the sample coefficient of determination, 𝑟 2 and interpret the result.
c. Obtain the regression line equation.
d. Estimate the muscle mass when age of the person is 60 years old.



Solution:
a. The scatter diagram of the given data.

[Figure: scatter diagram of Muscle Mass (vertical axis, 60 to 110) against Age of a Person (horizontal axis, 40 to 80).]

A downward trend is observed, indicating a negative relationship between X and Y.

b. To solve for $r^2$, we have the following given and computations:

$n = 10$;
$x_1 = 71, x_2 = 64, x_3 = 43, x_4 = 67, x_5 = 56, x_6 = 73, x_7 = 68, x_8 = 56, x_9 = 76, x_{10} = 65$;
$y_1 = 82, y_2 = 91, y_3 = 100, y_4 = 68, y_5 = 87, y_6 = 73, y_7 = 78, y_8 = 80, y_9 = 65, y_{10} = 84$;

\[ \sum_{i=1}^{10} x_i = x_1 + x_2 + \dots + x_{10} = 71 + 64 + \dots + 65 = 639; \]
\[ \sum_{i=1}^{10} y_i = y_1 + y_2 + \dots + y_{10} = 82 + 91 + \dots + 84 = 808; \]
\[ \sum_{i=1}^{10} x_i y_i = x_1 y_1 + x_2 y_2 + \dots + x_{10} y_{10} = 71(82) + 64(91) + \dots + 65(84) = 50887; \]
\[ \sum_{i=1}^{10} x_i^2 = x_1^2 + x_2^2 + \dots + x_{10}^2 = 71^2 + 64^2 + \dots + 65^2 = 41701; \]
\[ \sum_{i=1}^{10} y_i^2 = y_1^2 + y_2^2 + \dots + y_{10}^2 = 82^2 + 91^2 + \dots + 84^2 = 66292. \]

\[ r = \frac{10(50887) - (639)(808)}{\sqrt{[10(41701) - 639^2][10(66292) - 808^2]}} = -0.7961449318 \approx -0.796, \]

indicating a negative linear relationship between X (age of the person) and Y (muscle mass).

The sample coefficient of determination $r^2$ is computed as

\[ r^2 = (-0.796)^2 \times 100\% = 63.36\% \]

which means that about 63% of the total variation of the muscle mass is explained or accounted
for by the age of the person.
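
The correlation computation in part (b) can be sketched in Python using the same computing formula. The function name `pearson_r` and the variable names are illustrative, not part of the module.

```python
import math


def pearson_r(xs, ys):
    """Sample correlation coefficient from the sums in the computing formula."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


# Example 14 data: age (X) and muscle mass (Y).
age = [71, 64, 43, 67, 56, 73, 68, 56, 76, 65]
muscle_mass = [82, 91, 100, 68, 87, 73, 78, 80, 65, 84]

r = pearson_r(age, muscle_mass)
print(round(r, 3))      # -0.796
print(round(r * r, 4))  # 0.6338
```

Squaring the unrounded r gives about 63.38%, slightly different from the 63.36% obtained by squaring the rounded value −0.796; both round to 63%.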



c. To solve for the estimates $\hat{b}$ and $\hat{a}$, we have the following given and computations:

$n = 10$;
$x_1 = 71, x_2 = 64, x_3 = 43, x_4 = 67, x_5 = 56, x_6 = 73, x_7 = 68, x_8 = 56, x_9 = 76, x_{10} = 65$;
$y_1 = 82, y_2 = 91, y_3 = 100, y_4 = 68, y_5 = 87, y_6 = 73, y_7 = 78, y_8 = 80, y_9 = 65, y_{10} = 84$;

\[ \sum_{i=1}^{10} x_i = 639; \quad \sum_{i=1}^{10} y_i = 808; \quad \sum_{i=1}^{10} x_i y_i = 50887; \quad \sum_{i=1}^{10} x_i^2 = 41701; \quad \sum_{i=1}^{10} y_i^2 = 66292. \]

\[ \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i = \frac{1}{10}(808) = 80.8 \qquad \text{and} \qquad \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{1}{10}(639) = 63.9. \]

\[ \hat{b} = \frac{10(50887) - (639)(808)}{10(41701) - 639^2} = \frac{508870 - 516312}{417010 - 408321} = \frac{-7442}{8689} = -0.8564852112 \approx -0.8565. \]

\[ \hat{a} = \bar{y} - \hat{b}\bar{x} = 80.8 - (-0.8564852112)(63.9) = 135.529405 \approx 135.5294 \]

Therefore, the estimated regression line is

\[ \hat{Y} = \hat{a} + \hat{b}X = 135.5294 + (-0.8565)X = 135.5294 - 0.8565X. \]

The negative slope indicates that as the person gets older, the muscle mass decreases.

d. The predicted muscle mass of a person who is 60 years old is

\[ \hat{Y} = 135.5294 - 0.8565(60) = 84.1394 \approx 84 \text{ units} \]
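
Parts (c) and (d) can be reproduced with a short Python sketch of the least squares estimates. The function name `least_squares_line` and the variable names are illustrative assumptions, not part of the module.

```python
def least_squares_line(xs, ys):
    """Return (a_hat, b_hat) for the fitted line Y_hat = a_hat + b_hat * X."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b_hat = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)  # slope
    a_hat = sum_y / n - b_hat * (sum_x / n)                           # intercept
    return a_hat, b_hat


# Example 14 data: age (X) and muscle mass (Y).
age = [71, 64, 43, 67, 56, 73, 68, 56, 76, 65]
muscle_mass = [82, 91, 100, 68, 87, 73, 78, 80, 65, 84]

a_hat, b_hat = least_squares_line(age, muscle_mass)
print(round(a_hat, 4), round(b_hat, 4))  # 135.5294 -0.8565

predicted = a_hat + b_hat * 60           # predicted muscle mass at age 60
print(round(predicted, 2))               # 84.14
```

Working with the unrounded estimates gives about 84.14, which agrees with the hand computation of roughly 84 units.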



PRACTICE EXERCISE

1. The following are the IQ scores of a random sample of 20 Senior High School Students enrolled
at CMU:
110 100 87 101 95 107 100 100 102 90
101 98 104 105 97 96 102 99 98 103

Calculate the following:


a. Mean
b. Median
c. Mode
d. Variance
e. Standard Deviation
f. Range

2. Consider the data below, where X is the number of hours spent in studying and Y is the exam
score
X 3 5 4 10 9 8 7 6 5 4 12 3
Y 30 54 40 90 85 82 78 68 60 48 96 35

Do the following:


a. Plot the scatter diagram of the given data.
b. Find the sample coefficient of determination, 𝑟 2 and interpret the result.
c. Obtain the regression line equation.
d. Estimate the exam score when the number of hours spent in studying is 20 hours.



SOLUTIONS OF PRACTICE EXERCISE

1. The following are the IQ scores of a random sample of 20 Senior High School
Students enrolled at CMU:
110 100 87 101 95 107 100 100 102 90
101 98 104 105 97 96 102 99 98 103

Calculate the following:


a. Mean
b. Median
c. Mode
d. Variance
e. Standard Deviation
f. Range

Solutions:
Arrange the data in ascending order:
87, 90, 95, 96, 97, 98, 98, 99, 100, 100, 100, 101, 101, 102, 102, 103, 104, 105, 107, 110
and $n = 20$.

a. Mean

\[ \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{1}{20}(87 + 90 + 95 + \dots + 110) = \frac{1995}{20} = 99.75. \]

Hence, the average IQ score of the 20 Senior High School Students enrolled at CMU is 99.75.

b. Median
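• Since n is even, the median is

\[ \tilde{x} = \frac{x_{\left(\frac{n}{2}\right)} + x_{\left(\frac{n}{2}+1\right)}}{2} = \frac{x_{\left(\frac{20}{2}\right)} + x_{\left(\frac{20}{2}+1\right)}}{2} = \frac{x_{(10)} + x_{(11)}}{2} \]

Since $x_{(10)} = 100$ and $x_{(11)} = 100$, then

\[ \tilde{x} = \frac{100 + 100}{2} = \frac{200}{2} = 100. \]

Thus, the median of the IQ scores of the 20 SHS students is 100.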

c. Mode
The value with the greatest frequency is 100 because it occurs three times.



d. Variance

\[ s^2 = \frac{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}{n(n - 1)} = \frac{20(87^2 + 90^2 + 95^2 + \dots + 110^2) - (87 + 90 + 95 + \dots + 110)^2}{20(20 - 1)} \approx 28.20. \]

Hence, the variance is 28.20.


e. Standard Deviation
𝑠 = √𝑠 2 = √28.20 = 5.31.
Hence, the standard deviation is 5.31.

f. Range
𝑅 = 𝐻𝑉 − 𝐿𝑉 = 110 − 87 = 23
Hence, the range is 23.

2. Consider the data below, where X is the number of hours spent in studying and Y is the exam
score
X 3 5 4 10 9 8 7 6 5 4 12 3
Y 30 54 40 90 85 82 78 68 60 48 96 35
Do the following:
a. Plot the scatter diagram of the given data.
b. Find the sample coefficient of determination, 𝑟 2 and interpret the result.
c. Obtain the regression line equation.
d. Estimate the exam score when the number of hours spent in studying is 20 hours.
Solution:
a. The scatter diagram of the given data.

[Figure: scatter diagram of Exam Score (vertical axis, 0 to 120) against Number of Hours Spent in Studying (horizontal axis, 0 to 14).]

An upward trend is observed, indicating a positive relationship between X and Y.




b. To solve for $r^2$, we have the following given and computations:

$n = 12$;
$x_1 = 3, x_2 = 5, x_3 = 4, x_4 = 10, x_5 = 9, x_6 = 8, x_7 = 7, x_8 = 6, x_9 = 5, x_{10} = 4, x_{11} = 12, x_{12} = 3$;
$y_1 = 30, y_2 = 54, y_3 = 40, y_4 = 90, y_5 = 85, y_6 = 82, y_7 = 78, y_8 = 68, y_9 = 60, y_{10} = 48, y_{11} = 96, y_{12} = 35$;

\[ \sum_{i=1}^{12} x_i = x_1 + x_2 + \dots + x_{12} = 3 + 5 + \dots + 3 = 76; \]
\[ \sum_{i=1}^{12} y_i = y_1 + y_2 + \dots + y_{12} = 30 + 54 + \dots + 35 = 766; \]
\[ \sum_{i=1}^{12} x_i y_i = x_1 y_1 + x_2 y_2 + \dots + x_{12} y_{12} = 3(30) + 5(54) + \dots + 3(35) = 5544; \]
\[ \sum_{i=1}^{12} x_i^2 = x_1^2 + x_2^2 + \dots + x_{12}^2 = 3^2 + 5^2 + \dots + 3^2 = 574; \]
\[ \sum_{i=1}^{12} y_i^2 = y_1^2 + y_2^2 + \dots + y_{12}^2 = 30^2 + 54^2 + \dots + 35^2 = 54518. \]

\[ r = \frac{12(5544) - (76)(766)}{\sqrt{[12(574) - 76^2][12(54518) - 766^2]}} = 0.9596877969 \approx 0.9597, \]

indicating a positive linear relationship between X (number of hours spent studying) and Y (exam score).

The sample coefficient of determination $r^2$ is computed as

\[ r^2 = (0.9597)^2 \times 100\% = 92.10\% \]

which means that about 92% of the total variation of the exam score (Y) can be explained by its
linear relationship with the number of hours spent studying (X).

c. To solve for the estimates $\hat{b}$ and $\hat{a}$, we have the following given and computations:

$n = 12$;
$x_1 = 3, x_2 = 5, x_3 = 4, x_4 = 10, x_5 = 9, x_6 = 8, x_7 = 7, x_8 = 6, x_9 = 5, x_{10} = 4, x_{11} = 12, x_{12} = 3$;
$y_1 = 30, y_2 = 54, y_3 = 40, y_4 = 90, y_5 = 85, y_6 = 82, y_7 = 78, y_8 = 68, y_9 = 60, y_{10} = 48, y_{11} = 96, y_{12} = 35$;

\[ \sum_{i=1}^{12} x_i = x_1 + x_2 + \dots + x_{12} = 3 + 5 + \dots + 3 = 76; \]
\[ \sum_{i=1}^{12} y_i = y_1 + y_2 + \dots + y_{12} = 30 + 54 + \dots + 35 = 766; \]
\[ \sum_{i=1}^{12} x_i y_i = x_1 y_1 + x_2 y_2 + \dots + x_{12} y_{12} = 3(30) + 5(54) + \dots + 3(35) = 5544; \]
\[ \sum_{i=1}^{12} x_i^2 = x_1^2 + x_2^2 + \dots + x_{12}^2 = 3^2 + 5^2 + \dots + 3^2 = 574; \]
𝑖=1


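\[ \sum_{i=1}^{12} y_i^2 = y_1^2 + y_2^2 + \dots + y_{12}^2 = 30^2 + 54^2 + \dots + 35^2 = 54518. \]

\[ \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i = \frac{1}{12}(766) = 63.83333333 \qquad \text{and} \qquad \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{1}{12}(76) = 6.333333333. \]

\[ \hat{b} = \frac{12(5544) - (76)(766)}{12(574) - 76^2} = \frac{66528 - 58216}{6888 - 5776} = \frac{8312}{1112} = 7.474820144 \approx 7.4748 \]

\[ \hat{a} = \bar{y} - \hat{b}\bar{x} = 63.83333333 - (7.474820144)(6.333333333) = 16.4928 \]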

Therefore, the estimated regression line is

\[ \hat{Y} = \hat{a} + \hat{b}X = 16.4928 + 7.4748X. \]

The positive slope indicates that as the number of hours spent studying increases, the exam score
increases.

d. The predicted exam score when the number of hours spent studying is 20 is

\[ \hat{Y} = 16.4928 + 7.4748(20) = 165.9888 \approx 166 \]

Reference: Supe, A., et al. (2013). Elementary Statistics. Central Book Supply Inc.
