
GEMMW

Mathematics in the Modern World

MODULE 3: DATA MANAGEMENT

Collection of data
Organization of data – tally
Presentation of data – graphs and tables
Analysis of the data
Interpretation of the data

Respondents | f | %
Boys | 15 | 30
Girls | 35 | 70

Learning Outcomes:

At the end of this module, the students should be able to:

1. Solve and interpret the measures of central tendency for ungrouped data.
2. Solve and interpret the range, variance, standard deviation, coefficient of variation
and skewness.
3. Apply the correlation to determine the relationship between two variables.
4. Use linear regression to predict the value of a variable given certain conditions.
5. Use a variety of statistical tools to process and manage numerical data.

MEASURES OF CENTRAL TENDENCY

*Central Tendency – a value or values that represent the whole set of data.


MEAN ($\bar{x}$)

- computational average
- the sum of all n values divided by the total frequency

● Arithmetic Mean

$$\bar{x} = \frac{\sum x}{n}$$

Where: x represents the value of an observation
       n represents the total number of observations

● Weighted Mean

$$\bar{x}_w = \frac{\sum wx}{\sum w}$$

Where: x represents each of the item values
       w represents the weight of each item value

$$\bar{x}_w = \frac{\sum fx}{n}$$

Where: f represents the frequency
       n represents the sample size

● Properties of the Mean:


1. Always a unique value in any set of data.
2. Associated with the interval or ratio data.
3. Strongly influenced by the extreme values in a set of data.
4. Most reliable measure of central tendency.

MEDIAN ($\tilde{x}$)

- Positional average
- the centermost or middlemost observation or value (when n is odd), or the average of the two middle values (when n is even), when the data are arranged in either ascending or descending order
- divides the set of data into two equal parts (half of the observations belong to the higher 50%, while the other half belong to the lower 50% of the group)

● Properties of the Median:


1. Always a unique value in any set of data.
2. Associated with ordinal data.
3. Is not affected by extreme values.
4. A positional measure.

MODE ($\hat{x}$)

- Nominal average
- the most frequently occurring score in a distribution
- the observation or value that appears most often in the set of values

● Properties of the Mode:


1. Not affected by extreme values.
2. It may not exist.
3. If the mode exists, it may not always be unique.
4. In finding the mode, we do not consider all the values in the distribution.
5. Associated with nominal data.
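The contrast between property 3 of the mean (strongly influenced by extreme values) and property 3 of the median (not affected by extreme values) can be seen numerically. The sketch below uses Python's standard `statistics` module; the seven scores are the values from Example 1 of this module, and the added value 340 is a hypothetical extreme value chosen only for illustration.

```python
from statistics import mean, median

scores = [17, 19, 24, 25, 25, 27, 34]
with_outlier = scores + [340]   # hypothetical extreme value, for illustration only

# How far each measure moves once the extreme value is included.
shift_in_mean = mean(with_outlier) - mean(scores)
shift_in_median = median(with_outlier) - median(scores)

print("mean shifts by", round(shift_in_mean, 2))
print("median shifts by", shift_in_median)
```

With this particular data set, the mean moves by a large amount while the median stays where it was, which is why the median is often preferred when a set of data contains extreme values.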

Examples:

Find the mean, median and mode of the following set of data.

1. 17 25 34 25 27 19 24

$$\bar{x} = \frac{17+25+34+25+27+19+24}{7} = \frac{171}{7} \approx 24.43$$

● To get the median, first arrange the data (in either ascending or descending order), then take the middlemost value (if n is odd) or the average of the two middle values (if n is even).

$\tilde{x}$ ⇒ 17, 19, 24, 25, 25, 27, 34
$\tilde{x} = 25$

$\hat{x} = 25$

2. 40 52 50 48 56 60 37 65 40 50 65

$$\bar{x} = \frac{40(2)+52+50(2)+48+56+60+37+65(2)}{11} = \frac{563}{11} \approx 51.18$$

$\tilde{x}$ ⇒ 37, 40, 40, 48, 50, 50, 52, 56, 60, 65, 65
$\tilde{x} = 50$

$\hat{x}$ = 40, 50 and 65

3. 87 94 36 56 54 76 87 54 87 36

$$\bar{x} = \frac{667}{10} = 66.7$$

$\tilde{x}$ ⇒ 36, 36, 54, 54, 56, 76, 87, 87, 87, 94
$$\tilde{x} = \frac{56+76}{2} = \frac{132}{2} = 66$$

$\hat{x} = 87$

4. 21 23 16 15 26 27 19 24

$$\bar{x} = \frac{171}{8} = 21.375 \approx 21.38$$

$\tilde{x}$ ⇒ 15, 16, 19, 21, 23, 24, 26, 27
$$\tilde{x} = \frac{21+23}{2} = \frac{44}{2} = 22$$

$\hat{x}$ = no mode
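The four worked examples above can be checked with a short script. This is a minimal sketch using Python's standard `statistics` module; the helper name `summarize` is our own, and treating "every value occurs exactly once" as "no mode" is an assumption made to match the convention used in Example 4.

```python
from statistics import mean, median, multimode

def summarize(data):
    """Return (mean rounded to 2 places, median, list of modes)."""
    modes = sorted(multimode(data))
    # multimode() returns every value when each occurs exactly once;
    # report that case as an empty list, i.e. "no mode".
    if len(modes) == len(data):
        modes = []
    return round(mean(data), 2), median(data), modes

examples = [
    [17, 25, 34, 25, 27, 19, 24],
    [40, 52, 50, 48, 56, 60, 37, 65, 40, 50, 65],
    [87, 94, 36, 56, 54, 76, 87, 54, 87, 36],
    [21, 23, 16, 15, 26, 27, 19, 24],
]
results = [summarize(data) for data in examples]
for number, result in enumerate(results, start=1):
    print(number, result)
```

Note that `median` sorts the data internally, so the values do not have to be arranged beforehand.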

⮚ Weighted Mean

1. Suppose we are interested in computing the weighted mean grade of a BS Math student in a certain university who is enrolled in 6 subjects with different unit loads, as follows:

Subject | No. of units (w) | Grade (x) | wx
1 | 5 | 2.25 | 11.25
2 | 3 | 2.75 | 8.25
3 | 4 | 3.00 | 12.00
4 | 3 | 1.25 | 3.75
5 | 1 | 2.00 | 2.00
6 | 2 | 2.00 | 4.00
Total | $\sum w = 18$ | | $\sum wx = 41.25$

$$\bar{x}_w = \frac{\sum wx}{\sum w} = \frac{41.25}{18} \approx 2.29$$

2. If 8 000 Algebra books were sold at ₱320 each, 1 500 Business Mathematics books at ₱380 each, 1 000 Mathematics of Investment books at ₱300 each and 3 500 Statistics books at ₱340 each, find the weighted mean sales for the four books.

Book Title | No. of books (w) | Price (x) | wx
Algebra | 8 000 | ₱320 | 2 560 000
Business Mathematics | 1 500 | ₱380 | 570 000
Mathematics of Investment | 1 000 | ₱300 | 300 000
Statistics | 3 500 | ₱340 | 1 190 000
Total | $\sum w = 14\,000$ | | $\sum wx = 4\,620\,000$

$$\bar{x}_w = \frac{\sum wx}{\sum w} = \frac{4\,620\,000}{14\,000} = 330$$

The weighted mean is ₱330.00.

3. Miss Z has 21 students in a specific subject. These students were asked how often Miss Z gives assignments. Of these students, 18 answered (4) very often, 2 answered (3) often, 1 answered (2) seldom and nobody answered (1) never.

$$\bar{x}_w = \frac{\sum wx}{\sum w} = \frac{18(4)+2(3)+1(2)+0(1)}{21} = \frac{80}{21} \approx 3.81 \quad \text{(very often)}$$
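The three weighted mean examples above all follow the same pattern: multiply each value by its weight, add the products, and divide by the sum of the weights. A minimal sketch in Python follows; the function name `weighted_mean` is our own choice, not a library function.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum(w * x) / sum(w)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight

# Example 1: grades (x) weighted by number of units (w)
grade = weighted_mean([2.25, 2.75, 3.00, 1.25, 2.00, 2.00], [5, 3, 4, 3, 1, 2])
# Example 2: book prices (x) weighted by number of books sold (w)
price = weighted_mean([320, 380, 300, 340], [8000, 1500, 1000, 3500])
# Example 3: ratings (x) weighted by number of respondents (w)
rating = weighted_mean([4, 3, 2, 1], [18, 2, 1, 0])

print(round(grade, 2), round(price, 2), round(rating, 2))
```

In every case the divisor is the sum of the weights, not the sum of the values.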

Name:______________________________________Score:_________________
Section:_____________________________________Date:__________________

Activity 1
Measures of Central Tendency

Find the mean, median and mode of the following data.

a. 21 10 36 42 39 52 30 25 26

$\bar{x}$ = _________   $\tilde{x}$ = _________   $\hat{x}$ = _________

b. 21 55 25 30 26 36 42 39 36 25

$\bar{x}$ = _________   $\tilde{x}$ = _________   $\hat{x}$ = _________

c. 108 120 154 118 125 164 135

$\bar{x}$ = _________   $\tilde{x}$ = _________   $\hat{x}$ = _________

d. 31 21 16 15 21 27 19 18

$\bar{x}$ = _________   $\tilde{x}$ = _________   $\hat{x}$ = _________

e. 87 94 36 56 54 76 87 85 68 56 78 88

$\bar{x}$ = _________   $\tilde{x}$ = _________   $\hat{x}$ = _________

f. A student gets the following grades in his five subjects: 87 for Calculus, 82 for Physics, 79 for Chemistry, 81 for English and 83 for History. Compute his mean grade if the weights for the five subjects are 5.0, 4.0, 4.0, 3.0 and 3.0, respectively. $\bar{x}$ = _________

g. It was recorded that 5 brands of ballpen with tag prices of ₱7.50, ₱8.00, ₱9.00, ₱10.00 and ₱12.50 were bought by 16, 5, 4, 12 and 6 students, respectively. Find the mean sale. $\bar{x}$ = _________

h. Jessie Salvador, an Engineering student, got 88%, 85%, 91% and 93% in four of his subjects. What grade must he get in his fifth subject in order to obtain an average of 90%? x = _________
i. The table below shows the number of respondents who answered 5, 4, 3, 2 and 1 on three questions. Compute the weighted mean and give its interpretation using the scale below:

Mean | Interpretation
1.00 – 1.79 | To a Very Slight Extent (VSE)
1.80 – 2.59 | To a Slight Extent (SE)
2.60 – 3.39 | To a Moderate Extent (ME)
3.40 – 4.19 | To a Great Extent (GE)
4.20 – 5.00 | To a Very Great Extent (VGE)

Question | 5 | 4 | 3 | 2 | 1 | $\bar{x}_w$ | Interpretation
To what extent do you think Statistics will help you in your chosen career? | 15 | 20 | 5 | 0 | 0 | |
To what extent do you think Statistics will help you in doing research? | 10 | 25 | 3 | 2 | 0 | |
To what extent do you think Statistics will help you in real-life situations? | 11 | 16 | 8 | 5 | 0 | |

MEASURES OF VARIABILITY OR DISPERSION

The measures of variability indicate the degree or extent to which numerical values are dispersed or spread out about the average value (mean) in a distribution. The most commonly used measures of variation are the range, variance and standard deviation.

RANGE (R)
The range, which is the simplest to compute, is the difference between the highest value (HV) and the lowest value (LV) in the set of numerical data: R = HV – LV. This is a poor and unstable measure of variation, particularly if we consider a large number of values, because it depends only on the two extreme values. It is the least reliable measure and should be used only when someone wants to obtain a quick measure of variation.

THE VARIANCE ($s^2$) AND THE STANDARD DEVIATION ($s$)

The variance is the average of the squared deviations of the values from the distribution's mean (for a sample, the sum of the squared deviations divided by n – 1). The standard deviation, which is the positive square root of the variance, measures the spread or dispersion of the values about the mean of the distribution. It is the most widely used measure of spread because taking the square root undoes the squaring in the variance and expresses deviations in the original unit of the data, and because it is closely related to the normal distribution. It is the most important measure of dispersion since it enables us to determine with a great deal of accuracy where the values of the distribution are located in relation to the mean.

The variance and the standard deviation are generally accepted measures of dispersion, especially in discussions and presentations of reports containing basic statistics. The standard deviation is more popularly used than the variance since its value is expressed in the same unit as the observations and the mean.

Take note: The higher the standard deviation, the more spread out or dispersed the data are. The smaller the standard deviation, the less spread out and less dispersed, and therefore the more homogeneous, consistent or uniform, the data are.
$$s^2 = \frac{\sum (x-\bar{x})^2}{n-1} \quad \text{or} \quad s^2 = \frac{n\sum x^2 - (\sum x)^2}{n(n-1)}$$

$$s = \sqrt{\frac{\sum (x-\bar{x})^2}{n-1}} \quad \text{or} \quad s = \sqrt{\frac{n\sum x^2 - (\sum x)^2}{n(n-1)}}$$

Examples:

1. Find the range, variance and standard deviation of the set of data: 17, 25, 24, 18, 20

Arranged: 17, 18, 20, 24, 25     Mean: $\bar{x} = \frac{104}{5} = 20.8$
R = HV – LV = 25 – 17 = 8

x | $x - \bar{x}$ | $(x - \bar{x})^2$ | $x^2$
17 | 17 – 20.8 = –3.8 | (–3.8)² = 14.44 | 289
18 | 18 – 20.8 = –2.8 | (–2.8)² = 7.84 | 324
20 | 20 – 20.8 = –0.8 | (–0.8)² = 0.64 | 400
24 | 24 – 20.8 = 3.2 | (3.2)² = 10.24 | 576
25 | 25 – 20.8 = 4.2 | (4.2)² = 17.64 | 625
$\sum x = 104$ | | $\sum (x - \bar{x})^2 = 50.8$ | $\sum x^2 = 2214$

$$s^2 = \frac{\sum (x-\bar{x})^2}{n-1} = \frac{50.8}{5-1} = \frac{50.8}{4} = 12.7$$

or

$$s^2 = \frac{n\sum x^2 - (\sum x)^2}{n(n-1)} = \frac{5(2214) - (104)^2}{5(5-1)} = \frac{254}{20} = 12.7$$

$$s = \sqrt{12.7} \approx 3.56$$
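Example 1 obtains the same variance from two different formulas. The following sketch implements both in Python so that the agreement can be checked on any data set; the function names are our own, and both functions assume sample data (divisor n – 1), as in the formulas of this module.

```python
from math import isclose, sqrt

def sample_variance_definition(data):
    """s^2 = sum((x - mean)^2) / (n - 1)."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / (n - 1)

def sample_variance_shortcut(data):
    """s^2 = (n * sum(x^2) - (sum(x))^2) / (n * (n - 1))."""
    n = len(data)
    return (n * sum(x * x for x in data) - sum(data) ** 2) / (n * (n - 1))

data = [17, 25, 24, 18, 20]
data_range = max(data) - min(data)      # R = HV - LV
var_definition = sample_variance_definition(data)
var_shortcut = sample_variance_shortcut(data)
std_dev = sqrt(var_shortcut)

# The two forms are algebraically equal; compare with a tolerance
# because floating-point rounding can differ between them.
agree = isclose(var_definition, var_shortcut)
print(data_range, round(var_shortcut, 2), round(std_dev, 2), agree)
```

The shortcut form needs only the totals ∑x and ∑x², which is why it is convenient for hand computation with a table.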

2. Suppose two applicants, A and B, for a secretarial position were given an examination to test and compare their typing speed (assume all other factors are equal). Each was given nine trials (in minutes) and the results were as follows:
A: 14 16 18 20 22 24 26 28 30
B: 18 18 20 22 24 24 24 24 24

$R_A$ = 30 – 14 = 16     $R_B$ = 24 – 18 = 6

Secretary A: x | Secretary A: x² | Secretary B: x | Secretary B: x²
14 | 196 | 18 | 324
16 | 256 | 18 | 324
18 | 324 | 20 | 400
20 | 400 | 22 | 484
22 | 484 | 24 | 576
24 | 576 | 24 | 576
26 | 676 | 24 | 576
28 | 784 | 24 | 576
30 | 900 | 24 | 576
∑x = 198 | ∑x² = 4 596 | ∑x = 198 | ∑x² = 4 412

Secretary A:
$$s^2 = \frac{n\sum x^2 - (\sum x)^2}{n(n-1)} = \frac{9(4\,596) - (198)^2}{9(9-1)} = \frac{2\,160}{72} = 30 \qquad s = \sqrt{30} \approx 5.48$$

Secretary B:
$$s^2 = \frac{n\sum x^2 - (\sum x)^2}{n(n-1)} = \frac{9(4\,412) - (198)^2}{9(9-1)} = \frac{504}{72} = 7 \qquad s = \sqrt{7} \approx 2.65$$

● Secretary B, who has the smaller standard deviation, is more consistent than Secretary A in terms of performance in the typing test.
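The comparison of the two applicants can also be reproduced with Python's standard `statistics` module, whose `variance` and `stdev` functions use the sample (n – 1) divisor. The dictionary layout and the variable names below are our own; this is only a sketch of the comparison, not a prescribed procedure.

```python
from statistics import stdev, variance

trials = {
    "A": [14, 16, 18, 20, 22, 24, 26, 28, 30],
    "B": [18, 18, 20, 22, 24, 24, 24, 24, 24],
}

# variance() and stdev() both divide by n - 1 (sample statistics).
spread = {name: (variance(x), stdev(x)) for name, x in trials.items()}

# The applicant with the smaller standard deviation is the more consistent one.
most_consistent = min(spread, key=lambda name: spread[name][1])

for name, (var, sd) in spread.items():
    print(name, var, round(sd, 2))
print("More consistent:", most_consistent)
```

Both applicants have the same total (198) and therefore the same mean, so the standard deviations can be compared directly here; when the means differ, the coefficient of variation is the safer basis for comparison.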

Name:______________________________________Score:_________________
Section:_____________________________________Date:__________________

Activity 2
Measures of Variability or Dispersion

a. The monthly number of cars sold by a car dealer from January to October for a
particular year are: 20 24 12 10 18 4 15 6 11 19.

Find the range, variance and standard deviation.

b. Sample annual salaries, in thousands of pesos, for Manila and Makati are listed.
Manila: 34 25 17 17 27 25 29 33 26
Makati: 26 23 27 28 25 26 18 26 31

*Compute the range, variance and standard deviation, and interpret the result.
*In which area is the salary more consistent?
COEFFICIENT OF VARIATION

When measures of absolute variability are expressed relative to another measure, such as the mean, the resulting measures are termed measures of relative dispersion. These measures express the amount of variation relative to the mean.

Relative dispersion may also be used to compare the variability of sets of numerical data whose units of measurement are different. For instance, you may compare the variability of the ages of 9 children, whose mean age is 10 years with a standard deviation of 2 years, with that of their weights, whose mean is 45 pounds with a standard deviation of 5 pounds, by calculating their measures of relative dispersion. While it is not logical to compare the values of their standard deviations, inasmuch as they are expressed in different units of measure, it is nevertheless reasonable to determine measures that indicate the amounts of their variation relative to their means.

COEFFICIENT OF VARIATION (CV)

- Expresses the standard deviation as a percentage of the mean.

$$CV = \frac{s}{\bar{x}} \times 100\%$$

Where: s = standard deviation and $\bar{x}$ = mean

Examples:

1. A dealer sells two classes of quality lamps, A and B. Lamp A has a mean life
span of 2000 hours with a standard deviation of 200 hours, while Lamp B has a
mean life span of 2500 hours with a standard deviation of 300 hours. Compare
the dispersion.

Lamp A:
$$CV = \frac{s}{\bar{x}} \times 100\% = \frac{200}{2000} \times 100\% = 10\%$$

Lamp B:
$$CV = \frac{s}{\bar{x}} \times 100\% = \frac{300}{2500} \times 100\% = 12\%$$

Interpretation:
● Lamp B (CV = 12%) has greater relative dispersion or is more variable; more dispersed than Lamp A (CV = 10%).
● Lamp A has lesser relative dispersion or is more consistent; more uniform; more homogeneous; better than Lamp B.

2. An investor is considering the purchase of 1 of 2 stocks. The yield of Company A has averaged Php105 per share over the past ten years, with a standard deviation of Php15 per share. Company B has yielded an average of Php333 per share during the same period, with a standard deviation of Php40. Which company is more consistent?

Company A:
$$CV = \frac{s}{\bar{x}} \times 100\% = \frac{15}{105} \times 100\% \approx 14.29\%$$

Company B:
$$CV = \frac{s}{\bar{x}} \times 100\% = \frac{40}{333} \times 100\% \approx 12.01\%$$

Interpretation:
● Company B is more consistent than Company A.
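Both examples above apply the same one-line formula, so they are easy to check by script. The sketch below is a minimal Python version; the function name `coefficient_of_variation` is our own.

```python
def coefficient_of_variation(std_dev, mean):
    """CV = (s / mean) * 100, expressed as a percentage."""
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return std_dev / mean * 100

lamp_a = coefficient_of_variation(200, 2000)
lamp_b = coefficient_of_variation(300, 2500)
company_a = coefficient_of_variation(15, 105)
company_b = coefficient_of_variation(40, 333)

# The smaller CV marks the more consistent item in each pair.
print(round(lamp_a, 2), round(lamp_b, 2))
print(round(company_a, 2), round(company_b, 2))
```

Because the CV has no unit, the same function can compare quantities measured in different units, such as ages in years and weights in pounds.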
Name:______________________________________Score:_________________
Section:_____________________________________Date:__________________

Activity 3
Coefficient of Variation

1. A random sample of 10 students in a Statistics class got a mean score of 78% with a standard deviation of 7%, and had a mean weight of 105 pounds with a standard deviation of 10 pounds. Determine the coefficient of variation.

● The weight (CV = 9.52%) has greater relative dispersion or is more variable; more dispersed than the score (CV = 8.97%).
● The score has lesser relative dispersion or is more consistent; more uniform; more homogeneous; better than the weight.

2. In a barangay health center with no more than a hundred patients, a distribution of two different units is given to compare the dispersion of weights with the dispersion of heights. The mean height is 5.7 feet with a standard deviation of 0.9 feet and the mean weight is 72.5 kilograms with a standard deviation of 8.1 kilograms.

● The height (CV = 15.79%) has greater relative dispersion or is more variable; more dispersed than the weight (CV = 11.17%).
● The weight has lesser relative dispersion or is more consistent; more uniform; more homogeneous; better than the height.

3. Two employees A and B are to compare their daily routine of work. A can finish
his job with an average of 1.5 hours with a standard deviation of 0.025 hour,
whereas B can finish the job with an average of 4 hours and a standard deviation
of 0.01 hour. Who is more consistent?

● Employee B (CV = 0.01 ÷ 4 × 100% = 0.25%) is more consistent than Employee A (CV = 0.025 ÷ 1.5 × 100% ≈ 1.67%).

4. A dealer of electronic adaptors sells two classes of adaptor, A and B. Adaptor A has a mean life span of 2 100 hours with a standard deviation of 150 hours, while Adaptor B has a mean life span of 2 600 hours with a standard deviation of 200 hours. Which adaptor has the greater relative dispersion? Which is more consistent?

● Adaptor B (CV = 7.69%) has greater relative dispersion or is more variable; more dispersed than Adaptor A (CV = 7.14%).
● Adaptor A has lesser relative dispersion or is more consistent; more uniform; more homogeneous; better than Adaptor B.
SKEWNESS

Besides central tendency (average) and dispersion (variation), another statistical measure is skewness (symmetry). Skewness (sk) is the degree of symmetry, or departure from symmetry, of a set of data. A skewed distribution is similar in shape to a normal distribution except that it is not symmetrical: the left half of the polygon is not a mirror image of the right half.

$$sk = \frac{3(\bar{x} - \tilde{x})}{s}$$

Where: $\bar{x}$ = mean, $\tilde{x}$ = median and s = standard deviation

Shapes commonly observed:

1. Normal Distribution or Symmetrical


- bell–shaped curve
- the mean is equal to the median and mode
- sk = 0

2. Positively Skewed
- skewed to the right (longer right tail)
- the mean is greater than the median and mode
- sk > 0

3. Negatively Skewed
- skewed to the left (longer left tail)
- the mean is less than the median and mode
- sk < 0

Examples:

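1. Determine the coefficient of skewness for each of the following:

i. $\bar{x} = 40$, $\tilde{x} = 38$, $s = 4$

$$sk = \frac{3(\bar{x} - \tilde{x})}{s} = \frac{3(40-38)}{4} = 1.5 \quad \text{(positively skewed)}$$

ii. $\bar{x} = 320$, $\tilde{x} = 350$, $s = 40$

$$sk = \frac{3(\bar{x} - \tilde{x})}{s} = \frac{3(320-350)}{40} = -2.25 \quad \text{(negatively skewed)}$$

iii. $\bar{x} = 70$, $\tilde{x} = 70$, $s = 10$

$$sk = \frac{3(\bar{x} - \tilde{x})}{s} = \frac{3(70-70)}{10} = 0 \quad \text{(symmetrical)}$$
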
2. A physician conducted medical research on the spread of cancer using a group of patients. The results reveal that the mean is 70 days with a standard deviation of 44 days and a median of 65 days. What is the coefficient of skewness?

$$sk = \frac{3(\bar{x} - \tilde{x})}{s} = \frac{3(70-65)}{44} \approx 0.34 \quad \text{(positively skewed)}$$
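The skewness examples above can be verified with a few lines of Python. This is a minimal sketch of the formula sk = 3(mean – median)/s used in this module; the function names `pearson_skewness` and `describe_skewness` are our own.

```python
def pearson_skewness(mean, median, std_dev):
    """Coefficient of skewness: sk = 3 * (mean - median) / s."""
    if std_dev <= 0:
        raise ValueError("standard deviation must be positive")
    return 3 * (mean - median) / std_dev

def describe_skewness(sk):
    """Label the shape from the sign of sk."""
    if sk > 0:
        return "positively skewed"
    if sk < 0:
        return "negatively skewed"
    return "symmetrical"

# (mean, median, standard deviation) for Examples 1.i, 1.ii, 1.iii and 2
cases = [(40, 38, 4), (320, 350, 40), (70, 70, 10), (70, 65, 44)]
results = []
for mean, median, s in cases:
    sk = pearson_skewness(mean, median, s)
    results.append((round(sk, 2), describe_skewness(sk)))

for result in results:
    print(result)
```

Only the sign of sk decides the label, which matches the rule that the mean lies above the median in a positively skewed distribution and below it in a negatively skewed one.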
Name:______________________________________Score:_________________
Section:_____________________________________Date:__________________

Activity 4
Skewness

1. Determine the coefficient of skewness for each of the following sets of data and describe the result.
a. $\bar{x} = 50$, $\tilde{x} = 40$, s = 4.5
b. $\bar{x} = 100$, $\tilde{x} = 120$, s = 11.5
c. $\bar{x} = 75$, $\tilde{x} = 85$, s = 6.2
d. $\bar{x} = 295$, $\tilde{x} = 250$, s = 35

2. At Saint Mary’s Academy, the mean age of the students is 19.2 years, with a
standard deviation of 1.2 years. The median age is 18.6 years. Compute the
coefficient of skewness. Describe the skewness.
CORRELATION

In everyday discourse, almost all statements about the mutual relation between variables are accepted without question. Examples include age and physical capacity, income and educational attainment, intelligence and academic performance, cigarette smoking and lung disease, unemployment and the condition of the economy, and so on. In almost every field, we find that one variable is somehow related to another variable, or that a relationship exists between variables. It should be noted, however, that relationship does not mean causality. That is, a relationship does not necessarily imply that one variable is the cause of the other variable.

The investigation of two or more variables requires not only procedures for
defining and measuring the variables under study, but also for describing the nature
of relations between them. A procedure that may be used to determine the
relationship between variables is the correlation.

Correlation is a statistical tool for measuring the association of two or more quantitative variables. It is a measure of the degree of relationship between two sets of variables, X and Y. The statistic used to describe the degree or magnitude of the relationship between variables is called a correlation coefficient (r), which conveys both direction and magnitude.

The types of correlation may be classified in terms of magnitude and direction. The degree or magnitude may be described as perfect, high, moderate or low. The direction may be classified as positive correlation, negative correlation or zero correlation. A positive correlation means that there is a direct relationship between variables. It exists when high values in one variable are associated with high values in the other variable, and low values in one variable are associated with low values in the other variable. For instance, if a student tops test X, he is likely to lead in test Y; and if he is low in test X, he is also likely to be low in test Y. A negative correlation, on the other hand, exists when high values in one variable are associated with low values in the second variable, and vice versa. For instance, a student who gets a high score in test X is low in test Y, and one who is lowest in test X is highest in test Y. When values in one variable are associated with values that are neither systematically high nor systematically low in the other variable, there is zero correlation.
Here is the correlation scale and the corresponding interpretation of r.

Value of r | Interpretation
±1 | Perfect Correlation
±0.80 – ±0.99 | High Correlation
±0.60 – ±0.79 | Moderately High Correlation
±0.40 – ±0.59 | Moderate Correlation
±0.20 – ±0.39 | Low Correlation
±0.01 – ±0.19 | Negligible Correlation
0 | No Correlation
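The scale above can be turned into a small lookup function. The sketch below is one possible reading of the table: the function name `interpret_r` is our own, and placing values that fall between the printed bands (for example 0.795) in the lower band is an assumption, since the table lists values to two decimal places only.

```python
def interpret_r(r):
    """Return the verbal interpretation of a correlation coefficient r."""
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and 1")
    magnitude = abs(r)
    if magnitude == 0:
        return "No Correlation"
    if magnitude == 1:
        label = "Perfect"
    elif magnitude >= 0.80:
        label = "High"
    elif magnitude >= 0.60:
        label = "Moderately High"
    elif magnitude >= 0.40:
        label = "Moderate"
    elif magnitude >= 0.20:
        label = "Low"
    else:
        label = "Negligible"
    # The sign of r gives the direction of the relationship.
    direction = "Positive" if r > 0 else "Negative"
    return f"{label} {direction} Correlation"

print(interpret_r(0.64))
print(interpret_r(-0.35))
```

The magnitude is read from the absolute value of r and the direction from its sign, so the two parts of the interpretation are determined separately.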

Pearson Product Moment Correlation Coefficient

The most widely used measure of correlation is the Pearson Product Moment Correlation Coefficient or Pearson r, which was developed by Karl Pearson. This statistic is used for interval and ratio types of data. If two variables, X and Y, are under investigation, the correlation coefficient is determined by:

$$r = \frac{n\sum XY - (\sum X)(\sum Y)}{\sqrt{[n\sum X^2 - (\sum X)^2][n\sum Y^2 - (\sum Y)^2]}}$$

Example:

Determine the degree of relationship between the midterm and final grade of
10 students at a certain university.

Student Midterm Grade Final Grade


A 84 85
B 88 89
C 78 86
D 79 83
E 91 88
F 84 87
G 77 81
H 83 86
I 85 82
J 86 85
Solution:

Student | Midterm Grade (X) | Final Grade (Y) | XY | X² | Y²
A | 84 | 85 | 7 140 | 7 056 | 7 225
B | 88 | 89 | 7 832 | 7 744 | 7 921
C | 78 | 86 | 6 708 | 6 084 | 7 396
D | 79 | 83 | 6 557 | 6 241 | 6 889
E | 91 | 88 | 8 008 | 8 281 | 7 744
F | 84 | 87 | 7 308 | 7 056 | 7 569
G | 77 | 81 | 6 237 | 5 929 | 6 561
H | 83 | 86 | 7 138 | 6 889 | 7 396
I | 85 | 82 | 6 970 | 7 225 | 6 724
J | 86 | 85 | 7 310 | 7 396 | 7 225
Total | ∑X = 835 | ∑Y = 852 | ∑XY = 71 208 | ∑X² = 69 901 | ∑Y² = 72 650

$$r = \frac{n\sum XY - (\sum X)(\sum Y)}{\sqrt{[n\sum X^2 - (\sum X)^2][n\sum Y^2 - (\sum Y)^2]}}$$

$$r = \frac{10(71\,208) - (835)(852)}{\sqrt{[10(69\,901) - (835)^2][10(72\,650) - (852)^2]}} = \frac{660}{\sqrt{(1\,785)(596)}} \approx 0.64$$

Interpretation: There is a moderately high positive correlation between the midterm and final grades of the 10 students.
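The computation above can be reproduced directly from the raw grades. The following Python sketch builds the same five totals (∑X, ∑Y, ∑XY, ∑X², ∑Y²) and substitutes them into the formula for r; the function name `pearson_r` is our own.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson r from the computational (raw-score) formula."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

midterm = [84, 88, 78, 79, 91, 84, 77, 83, 85, 86]
final = [85, 89, 86, 83, 88, 87, 81, 86, 82, 85]

r = pearson_r(midterm, final)
print(round(r, 2))
```

The denominator is zero when either variable has no variation at all, in which case r is undefined and this sketch raises a `ZeroDivisionError`.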

Spearman Rank–Order Correlation Coefficient

The Spearman Rank–Order Correlation Coefficient or Spearman rho (ρ) is another statistic for determining the correlation coefficient. This statistic is used to find out if there is a significant relationship between two variables of ordinal type. In some cases, values from an interval type of data, such as test scores and grade point averages, may be transformed into ranks. To obtain the value of Spearman rho, consider this formula:

$$\rho = 1 - \frac{6\sum D^2}{n(n^2-1)}$$

Where: D is the difference between ranks
       n is the number of paired observations

Example:

Compute the value of Spearman rho and determine the degree of relationship between the capital and profit of 10 dried fish businessmen.

Businessman | Capital (X) | Profit (Y) | $R_X$ | $R_Y$ | D | D²
1 | 20 000 | 5 000 | 6 | 7 | –1 | 1
2 | 50 000 | 15 000 | 3 | 3.5 | –0.5 | 0.25
3 | 10 000 | 3 000 | 9 | 9.5 | –0.5 | 0.25
4 | 100 000 | 30 000 | 2 | 2 | 0 | 0
5 | 18 000 | 4 000 | 7 | 8 | –1 | 1
6 | 25 000 | 9 000 | 5 | 5 | 0 | 0
7 | 11 000 | 6 000 | 8 | 6 | 2 | 4
8 | 150 000 | 70 000 | 1 | 1 | 0 | 0
9 | 5 000 | 3 000 | 10 | 9.5 | 0.5 | 0.25
10 | 40 000 | 15 000 | 4 | 3.5 | 0.5 | 0.25
Total | | | | | | ∑D² = 7

$$\rho = 1 - \frac{6\sum D^2}{n(n^2-1)} = 1 - \frac{6(7)}{10(10^2-1)} = 1 - \frac{42}{990} \approx 0.96$$

Interpretation: There is a high positive correlation between the capital and profit of the 10 businessmen.
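The ranking step is the part of this example that is easiest to get wrong by hand, especially where values are tied. The sketch below ranks from highest (rank 1) to lowest and gives tied values the average of the positions they occupy, as in the table above; the function names are our own. Note that the formula 1 – 6∑D²/[n(n² – 1)] is exact only when there are no ties, so with tied ranks the result is an approximation.

```python
def average_ranks(values):
    """Rank from highest (1) to lowest; tied values share the mean rank."""
    ordered = sorted(values, reverse=True)
    rank_of = {}
    for value in set(values):
        first = ordered.index(value) + 1           # first position held by this value
        count = ordered.count(value)               # number of tied entries
        rank_of[value] = first + (count - 1) / 2   # mean of the tied positions
    return [rank_of[value] for value in values]

def spearman_rho(xs, ys):
    """Return (rho, sum of squared rank differences)."""
    n = len(xs)
    ranks_x, ranks_y = average_ranks(xs), average_ranks(ys)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks_x, ranks_y))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1)), d_squared

capital = [20000, 50000, 10000, 100000, 18000, 25000, 11000, 150000, 5000, 40000]
profit = [5000, 15000, 3000, 30000, 4000, 9000, 6000, 70000, 3000, 15000]

rho, d_squared = spearman_rho(capital, profit)
print(average_ranks(profit))
print(d_squared, round(rho, 2))
```

The two profits of 15 000 occupy positions 3 and 4 and therefore both receive rank 3.5; the two profits of 3 000 occupy positions 9 and 10 and both receive rank 9.5.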
Name:______________________________________Score:_________________
Section:_____________________________________Date:__________________

Activity 5
Correlation

1. The heights and weights of 10 basketball players in the PBA are randomly
selected from different teams. Calculate the value of Pearson r and interpret the
result.

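Player | Height (X) | Weight (Y) | XY | X² | Y²
A | 68 | 180 | | |
B | 72 | 200 | | |
C | 76 | 175 | | |
D | 70 | 190 | | |
E | 74 | 180 | | |
F | 69 | 195 | | |
G | 70 | 145 | | |
H | 70 | 172 | | |
I | 73 | 190 | | |
J | 68 | 160 | | |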

2. Compute for the value of Spearman rho and determine the degree of relationship
between weight and height of bottle–fed infants using the same brand of milk.

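Infant | Weight (X) | Height (Y) | $R_X$ | $R_Y$ | D | D²
1 | 27 | 0.70 | | | |
2 | 25 | 0.64 | | | |
3 | 28 | 0.77 | | | |
4 | 23 | 0.62 | | | |
5 | 21 | 0.60 | | | |
6 | 20 | 0.62 | | | |
7 | 29 | 0.77 | | | |
8 | 24 | 0.64 | | | |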
LINEAR REGRESSION

Linear regression is used to make predictions about a single value. Simple linear regression involves finding the equation of the line that most nearly fits the given data. The linear equation is then used to predict values for the data. Simple linear regression aims to find a linear relationship between a response variable and a possible predictor variable by the least squares method.

A regression equation is a mathematical equation that is used to predict the values of one dependent variable from known values of one or more independent variables. The variable being predicted or explained is called the dependent variable, while the variable that is used to predict or explain the dependent variable is called the independent variable.

The least squares regression equation can be formed from a set of sample data using the formula:

y = a + bx

Where: y = predicted or dependent variable
       x = predictor or independent variable
       a = y–intercept (value of y at the point where x = 0)

$$a = \frac{\sum Y(\sum X^2) - \sum X(\sum XY)}{n\sum X^2 - (\sum X)^2} \quad \text{or} \quad a = \bar{y} - b\bar{x}$$

Where: $\bar{y}$ is the mean of y and $\bar{x}$ is the mean of x
       b = slope of the line that represents the equation

$$b = \frac{n(\sum XY) - (\sum X)(\sum Y)}{n(\sum X^2) - (\sum X)^2} \quad \text{or} \quad b = \frac{\sum XY - n\bar{x}\bar{y}}{\sum X^2 - n\bar{x}^2}$$

Note: The constants a and b in the regression equation are called the regression coefficients.

Example:
The number of hours 13 students spent studying for a test and their scores on that test are shown below. What would be the estimated score if a student studies for 6.5 hours?

Hours spent studying, X | 0 | 1 | 2 | 4 | 4 | 5 | 5 | 5 | 6 | 6 | 7 | 7 | 8
Test score, Y | 40 | 41 | 51 | 48 | 64 | 69 | 73 | 75 | 68 | 93 | 84 | 90 | 95

Solution:

From the data above: $\sum X = 60$; $\sum Y = 891$; $\sum XY = 4\,620$ and $\sum X^2 = 346$.

First, solve for b.

$$b = \frac{n(\sum XY) - (\sum X)(\sum Y)}{n(\sum X^2) - (\sum X)^2} = \frac{13(4\,620) - (60)(891)}{13(346) - (60)^2} = \frac{6\,600}{898} \approx 7.35$$

Then, compute a.

$$a = \frac{\sum Y(\sum X^2) - \sum X(\sum XY)}{n\sum X^2 - (\sum X)^2} = \frac{891(346) - (60)(4\,620)}{13(346) - (60)^2} = \frac{31\,086}{898} \approx 34.62$$

Therefore, the regression equation is y = a + bx = 34.62 + 7.35x. For x = 6.5:

y = 34.62 + (7.35)(6.5)
y = 34.62 + 47.78
y = 82.40
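The same least squares computation can be sketched in Python using the summation formulas for a and b given in this module. The function name `least_squares_line` is our own, and the script is an illustration rather than a replacement for the hand computation.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the least squares line y = a + b*x."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    denominator = n * sum_x2 - sum_x ** 2
    if denominator == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    b = (n * sum_xy - sum_x * sum_y) / denominator        # slope
    a = (sum_y * sum_x2 - sum_x * sum_xy) / denominator   # y-intercept
    return a, b

hours = [0, 1, 2, 4, 4, 5, 5, 5, 6, 6, 7, 7, 8]
scores = [40, 41, 51, 48, 64, 69, 73, 75, 68, 93, 84, 90, 95]

a, b = least_squares_line(hours, scores)
predicted = a + b * 6.5   # estimated score after 6.5 hours of study

print(round(a, 2), round(b, 2), round(predicted, 2))
```

Because the script does not round a and b before substituting x = 6.5, its prediction (about 82.39) differs slightly from the hand-computed 82.40, which uses the rounded coefficients 34.62 and 7.35.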
Name:______________________________________Score:_________________
Section:_____________________________________Date:__________________

Activity 6
Linear Regression

1. The table below shows the monthly income (X) and the monthly expenses (Y) of
7 families in a certain barangay in Makati. Estimate the monthly expenditures of a
family whose income is ₱ 8 250.

Family No. | Monthly Income (X) | Monthly Expenses (Y) | XY | X²
1 | 6 600 | 4 980 | |
2 | 5 875 | 4 680 | |
3 | 7 250 | 5 650 | |
4 | 4 925 | 3 700 | |
5 | 5 678 | 5 668 | |
6 | 5 975 | 4 260 | |
7 | 6 950 | 6 380 | |
