Target: Learning Guide Module

Learning Guide Module

Subject Code (Stat1): Statistics 1


Module Code: 11 (Linear Regression)
Lesson Code: 11.1 - Sample Coefficient of Determination, part 1 (r²)
Time Limit: 30 mins.

This module is a sequel to the Week 2 module, which introduced explanatory and response variables
and the relationship that can be derived between them. We learned how to form a regression line between
the explanatory and response variables and how to interpret the corresponding measure (Pearson r).
With the two lessons (Lesson 11.1 and 11.2) in this module, we shall explore another measure
of the linear relationship between an explanatory (independent) variable and a response (dependent)
variable, which is the sample coefficient of determination (r²), and for Lesson 3.2.3 we shall examine residuals.

TARGET

At the end of Lesson 11.1, learners are expected to:

• determine the relationship between the Pearson Correlation Coefficient (r) and the sample coefficient
of determination (r²);
• compute r² manually; and
• interpret the value of r².

HOOK
You want to study the relationship between two quantitative variables. Given the data, you create the
scatterplot shown in Figure 1. Based on this graph, what can you say about the relationship between these
two variables? Based on the previous lesson, there seems to be a negative relationship between these two
variables, since the scatterplot shows a linear pattern inclined toward the left. In addition, the
relationship is not very strong, which indicates that the correlation coefficient is negative but not
close to -1. However, there are still some features of this relationship that need to be studied further.
For example, we know that the points do not all lie on one line. Are there reasons why the points are not
exactly on the line? Is there a measure that will quantify this?

Figure 1. Sample Scatter Plot

Statistics 1 Page 1 of 6
IGNITE

Indeed, there is a need to determine the correlation coefficient (r) to quantify the strength of the relationship
between the explanatory and response variables. Recall that the value of r tells us the strength and direction
of the linear relationship between these two variables. Now, let us look at the graph below:

Figure 2. Scatter Plot with Regression Line

Notice in Figure 2 that many data points do not lie on the regression line. Thus, some of the variation
in the values of the response variable cannot be explained by a linear relationship with the values of the
explanatory variable. Hence, there is a need to measure how much of the variation in the response variable
the linear relationship does explain.

Definition 11.1 The Sample Coefficient of Determination (r²) measures the proportion of the
total variation in the values of the response variable (Y) that can be explained by the linear
relationship with the values of the explanatory variable (X).
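
Definition 11.1 can also be written as a ratio of sums of squares. The notation below is not used in this module and is introduced here only as an assumption for illustration: ŷ_i is the value of Y predicted by the least-squares regression line for the i-th observation, ȳ is the mean of the observed Y values, SSE is the unexplained (residual) variation, and SST is the total variation.

```latex
r^2 \;=\; 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
    \;=\; 1 - \frac{\mathrm{SSE}}{\mathrm{SST}}
```

Read this way, r² is the fraction of the total variation that remains after removing the part the line fails to explain. The identity with the square of Pearson r holds for a least-squares line with an intercept, which is the case considered in this lesson.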

In detail, the sample coefficient of determination (r²) can be a measure of the:

1.) degree of closeness of the fit of the regression equation to the sample data. The closer the value of r²
to 1, the better the fit of the computed regression line; and
2.) linearity of the data points. If the regression line fits the data well, the data points in the scatter
diagram lie close to a straight line.

How should the coefficient of determination be interpreted? For example, if r² = 0.90, then we say
that 90% of the variation in the response variable can be explained by its linear relationship with the
explanatory variable. The other 10% can be attributed to error or to some other factor not shown by the
available data. Is this in any way related to the correlation coefficient, r, or is it pure coincidence that
the same letter was used? Of course they are related!

Take note that the value of r² is just the square of the Pearson Correlation Coefficient (r). You may
compute the correlation coefficient as you learned previously, and then simply square it. Also,
you learned how to use the scatterplot function in Excel 2016 in a previous lesson. You can easily get
the value of r² from the other options available in the scatterplot function. Let us demonstrate both
approaches using the following dataset.

Example 1. The following shows the scores on a clerical aptitude test (X) and grades in a clerical skills course
(Y) for 10 business students. Compute and interpret r².
X   90  70  65  72  75  82  84  75  60  95
Y   96  72  76  78  80  82  90  86  68  93

First, we use the =CORREL() function. Enter the data in Excel (in rows or in columns), supply the X values
and the Y values as the two arguments of the function (you may include the labels), and then square the result.
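
For learners who want to check the spreadsheet result by hand, the same computation can be sketched in Python. This is only an illustrative sketch and not part of the module's Excel procedure; the helper name `pearson_r` is hypothetical, and it uses the standard deviation-based formula for Pearson r before squaring it.

```python
from math import sqrt

# Data from Example 1
x = [90, 70, 65, 72, 75, 82, 84, 75, 60, 95]  # clerical aptitude test scores (X)
y = [96, 72, 76, 78, 80, 82, 90, 86, 68, 93]  # clerical skills course grades (Y)


def pearson_r(xs, ys):
    """Pearson correlation coefficient from sums of deviations (hypothetical helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squared deviations from the means
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    s_xx = sum((a - mean_x) ** 2 for a in xs)
    s_yy = sum((b - mean_y) ** 2 for b in ys)
    return s_xy / sqrt(s_xx * s_yy)


r = pearson_r(x, y)
r_squared = r ** 2  # coefficient of determination is the square of r

print(f"r   = {r:.4f}")
print(f"r^2 = {r_squared:.4f}")
```

The printed r² should round to about 0.85, the value interpreted at the end of this example.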

Similarly, you may use the scatterplot function in Excel 2016 to compute the coefficient of
determination.

First, create the scatterplot using the method taught previously. Then click the scatterplot, click Chart
Design, open the Add Chart Element drop-down menu, choose Trendline, and then click More Trendline Options.

After doing so, a new panel should appear on the right side of the sheet. Click Linear to add the least-squares
line. Then scroll down and check ‘Display Equation on chart’ and ‘Display R-squared value on chart’; these
two should then automatically appear on the scatterplot.

This value of r² means that approximately 85% of the variation in the grades in the clerical skills course
is explained by the scores on the clerical aptitude test, among these 10 business students. The remaining 15%
could be accounted for by other factors or by a relationship that is not linear.
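
The trendline procedure can also be mirrored outside Excel. The sketch below is an illustrative assumption and not the module's method: it fits the least-squares line to the Example 1 data and computes r² as one minus the ratio of unexplained variation (labelled `sse` here) to total variation (labelled `sst` here). These labels do not appear in the module.

```python
# Data from Example 1
x = [90, 70, 65, 72, 75, 82, 84, 75, 60, 95]  # clerical aptitude test scores (X)
y = [96, 72, 76, 78, 80, 82, 90, 86, 68, 93]  # clerical skills course grades (Y)

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Least-squares slope and intercept of the line y = intercept + slope * x
slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
         / sum((a - mean_x) ** 2 for a in x))
intercept = mean_y - slope * mean_x

# Values of Y predicted by the fitted line
predicted = [intercept + slope * a for a in x]

# Unexplained variation (around the line) and total variation (around the mean)
sse = sum((b - p) ** 2 for b, p in zip(y, predicted))
sst = sum((b - mean_y) ** 2 for b in y)

r_squared = 1 - sse / sst

print(f"y = {slope:.4f}x + {intercept:.4f}")
print(f"R^2 = {r_squared:.4f}")
```

Computing r² this way makes the interpretation concrete: about 85% of the total variation in the grades is accounted for by the fitted line, and the rest is left in the residuals.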

NAVIGATE

Let us now apply what we have learned by doing the given exercise below. Your answers should be
posted in the discussion forum of our Moodle Class. (Note for the teacher: this should be included in the
formative assessment) *NOT-graded

Exercise 1. The data in the table below comprise results on a study made to determine the relationship
between advertising costs and sales. In this study, advertising cost was considered as the explanatory
variable (X), while sales was considered as the response variable (Y).

X    30   15   24   37   42   45   48   40   20   25   20   35   50   45   20
Y   400  320  350  490  500  500  530  385  450  390  365  470  485  520  408

Do a regression analysis of the given data. Each answer should be supported with a corresponding solution.
In your analysis, please be guided by the following questions:
(a) What is the value of the Pearson Correlation Coefficient, r?
(b) How do you describe the relationship between advertising cost and sales through the value of r you computed?
(c) What is the value of the coefficient of determination, r²?
(d) Interpret the value of r² in relation to the variables involved, advertising cost and sales.
KNOT
The Sample Coefficient of Determination (r²) measures the proportion of the total variation in the
values of the response variable (Y) that can be explained by the linear relationship with the values
of the explanatory variable (X). It can be computed by squaring the Pearson correlation coefficient.

In detail, the sample coefficient of determination (r²) can be a measure of the:

1.) degree of closeness of the fit of the regression equation to the sample data. The closer the value of r²
to 1, the better the fit of the computed regression line; and
2.) linearity of the data points. If the regression line fits the data well, the data points in the scatter
diagram lie close to a straight line.

To learn more about r², please watch this video at

https://www.khanacademy.org/math/ap-statistics/bivariate-data-ap/correlation-coefficient-r/v/calculating-correlation-coefficient-r

Prepared by: Arlene Cahoy-Agosto
Position: SST
Campus: CVisC

Reviewed by: Myrna B. Libutaque
Position: SST
Campus: WVC

© 2020 Philippine Science High School System. All rights reserved. This document may contain proprietary information and may only be
released to third parties with approval of management. Document is uncontrolled unless otherwise marked; uncontrolled documents
are not subject to update notification.
