Correlation and Regression Analysis: Chapter 5


OBJECTIVES:
1. Differentiate correlation and regression analysis
2. Draw a scatter plot for the set of ordered pairs
3. Identify and graph the equation of the regression line
4. Calculate correlations and regressions using MS Excel

NO PART OF THIS eBOOK MAY BE REPRODUCED IN ANY FORM OR BY ANY MEANS WITHOUT THE WRITTEN
PERMISSION FROM THE AUTHORS

CORRELATION AND REGRESSION ANALYSIS

Correlation measures the degree of relationship between variables. Correlation analysis
seeks to determine how well a linear or other equation describes or explains that
relationship. It also implies “association” between two variables.

PEARSON PRODUCT MOMENT CORRELATION COEFFICIENT

The Pearson product-moment correlation coefficient (or Pearson r for short) is a measure
of the strength of a linear association between two variables measured on an interval or
ratio scale.

$$r = \frac{N\sum xy - \sum x \sum y}{\sqrt{\left[N\sum x^2 - \left(\sum x\right)^2\right]\left[N\sum y^2 - \left(\sum y\right)^2\right]}}$$

where:
$\sum x$ = sum of the values of x
$\sum y$ = sum of the values of y
$\sum x^2$ = sum of the squares of the values of x
$\sum y^2$ = sum of the squares of the values of y
$\sum xy$ = sum of the products of x and y
$N$ = total number of pairs

The Pearson correlation coefficient, r, can take values from -1 to +1. A value of 0
indicates that there is no linear association between the two variables. This is shown
in Figure 7.

Figure 7: Scatterplot Diagram

The arbitrary scale for the interpretation of r is given below.

Range of computed r      Interpretation
±1.00                    Perfect Relationship
±0.70 to ±0.99           Strong/High Relationship
±0.40 to ±0.69           Moderate Relationship
±0.01 to ±0.39           Slight/Low Relationship
0                        No Correlation
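
The formula for r and the interpretation scale above can be sketched in Python as follows. This is only an illustrative sketch: the function names `pearson_r` and `interpret_r` are hypothetical, and rounding r to two decimal places before reading the scale is an assumption made here because the scale is stated to two decimals.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson r from the raw-score formula (sums of x, y, x^2, y^2 and xy)."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must contain the same number of values")
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    numerator = n * sum_xy - sum_x * sum_y
    spread = (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    if spread == 0:
        raise ValueError("r is undefined when x or y does not vary")
    return numerator / sqrt(spread)


def interpret_r(r):
    """Map r to the arbitrary scale; rounding to 2 decimals is an assumption."""
    size = round(abs(r), 2)
    if size == 0:
        return "No Correlation"
    if size >= 1.0:
        return "Perfect Relationship"
    if size >= 0.70:
        return "Strong/High Relationship"
    if size >= 0.40:
        return "Moderate Relationship"
    return "Slight/Low Relationship"
```

The sign of r is interpreted separately: a positive value means the variables tend to increase together, and a negative value means one tends to decrease as the other increases.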

LINEAR REGRESSION

Regression is a term used to describe the process of estimating the relationship between
two variables. The relationship is estimated by fitting a straight line through the
given data. The method of least squares permits us to find a line of best fit, called the
regression line, which keeps the errors of prediction to a minimum (specifically, it
minimizes the sum of the squared errors).

The equation for a fitted line is:

$$Y = a + bx$$

where:
Y = predicted value
a = y-intercept
b = slope of the regression line
x = the value of x for which Y is to be predicted

To find the value of a:

$$a = \bar{y} - b\bar{x}$$

where:
$\bar{y}$ = mean value of y
$\bar{x}$ = mean value of x

To find the slope b:

$$b = \frac{N\sum xy - \sum x \sum y}{N\sum x^2 - \left(\sum x\right)^2}$$

where:
$\sum x$ = sum of the values of x
$\sum y$ = sum of the values of y
$\sum x^2$ = sum of the squares of the values of x
$\sum xy$ = sum of the products of x and y
$N$ = total number of pairs
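
The formulas for b and a can be turned into a short routine. The sketch below is one possible way to do it; the names `fit_line` and `predict` are hypothetical, and the small data set in the usage lines is made up purely to show the calls.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line Y = a + b*x."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must contain the same number of values")
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    denominator = n * sum_x2 - sum_x ** 2
    if denominator == 0:
        raise ValueError("the slope is undefined when all x values are equal")
    b = (n * sum_xy - sum_x * sum_y) / denominator  # slope
    a = sum_y / n - b * (sum_x / n)                 # intercept: mean(y) - b * mean(x)
    return a, b


def predict(a, b, x):
    """Predicted value Y for a given x on the fitted line."""
    return a + b * x


# Hypothetical data lying exactly on the line y = 1 + 2x.
a, b = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
estimate = predict(a, b, 10)
```

The slope is computed first because the intercept formula depends on it.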

Example

Below are the scores of 12 college students in Mathematics and Physics tests of 80 items
each.

Mathematics (x)   65  63  67  64  68  62  70  66  68  67  69  71
Physics (y)       68  66  68  65  69  66  68  65  71  67  68  70

a. Draw a scatter diagram.
b. Find the correlation coefficient of the Mathematics and Physics scores and interpret it.
c. Find the regression line equation.
d. Predict the score in Physics (y) if the student's score in Mathematics (x) is 75.

Solution

Step 1: Draw a scatter plot. If the scatter plot does not show any (linear) trend, stop
the analysis and conclude “no relationship”. Otherwise, proceed to Step 2.

[Scatter plot: Mathematics scores (x) on the horizontal axis, from 60 to 72, against
Physics scores (y) on the vertical axis, from 64 to 72]

The scatter plot indicates an upward linear trend between Mathematics and Physics
proficiency. Thus, “there is a reason to believe that they are related.”

Step 2: Compute Pearson r by arranging the given data in columns.

Number   Mathematics (x)   Physics (y)   x²      y²      xy
1        65                68            4225    4624    4420
2        63                66            3969    4356    4158
3        67                68            4489    4624    4556
4        64                65            4096    4225    4160
5        68                69            4624    4761    4692
6        62                66            3844    4356    4092
7        70                68            4900    4624    4760
8        66                65            4356    4225    4290
9        68                71            4624    5041    4828
10       67                67            4489    4489    4489
11       69                68            4761    4624    4692
12       71                70            5041    4900    4970

$\sum x = 800$, $\sum y = 811$, $\sum x^2 = 53418$, $\sum y^2 = 54849$, $\sum xy = 54107$
N = 12

$$r = \frac{12(54107) - (800)(811)}{\sqrt{\left[12(53418) - (800)^2\right]\left[12(54849) - (811)^2\right]}} = \frac{484}{\sqrt{(1016)(467)}}$$

$$r \approx 0.70$$

According to the arbitrary scale for the interpretation of r, r = 0.70 indicates a
strong/high positive relationship between the students' scores in Mathematics and
Physics.

Step 3: Formulate the regression line equation by first solving for the values of b
and a.

Solving for b

$$b = \frac{12(54107) - (800)(811)}{12(53418) - (800)^2} = \frac{484}{1016} \approx 0.48$$

Solving for a

$$a = 67.58 - 0.48(66.67) = 35.58$$


Substitute the computed values of b and a into the regression line equation:

$$Y = a + bx$$

$$y = 35.58 + 0.48x \quad \text{(regression line equation)}$$

We can now estimate scores in Physics (y) using the regression line equation by
substituting a score in Mathematics (x). For instance, if x is equal to 75, then
solving for y gives 71.58.

$$y = 35.58 + 0.48(75)$$

$$y = 71.58$$

Therefore, the estimated score in Physics is 71.58, or approximately 72, if the score in
Mathematics is 75. The regression line equation may now be used to estimate y for any
given value of x.
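
As a check on the arithmetic in this example, the sums from the table in Step 2 can be plugged directly into the formulas. The sketch below is illustrative only and its variable names are hypothetical. It follows the text in rounding b and the two means to two decimal places before computing a, and it also shows what the unrounded values would be, since rounding early shifts the intercept slightly.

```python
from math import sqrt

# Sums taken from the table in Step 2 of the worked example.
N = 12
SUM_X, SUM_Y = 800, 811
SUM_X2, SUM_Y2 = 53418, 54849
SUM_XY = 54107

numerator = N * SUM_XY - SUM_X * SUM_Y            # 484
x_spread = N * SUM_X2 - SUM_X ** 2                # 1016
y_spread = N * SUM_Y2 - SUM_Y ** 2                # 467

r = numerator / sqrt(x_spread * y_spread)
b = numerator / x_spread
mean_x = SUM_X / N
mean_y = SUM_Y / N

# The text's approach: round b and the means to two decimals first.
b_rounded = round(b, 2)
a_rounded = round(round(mean_y, 2) - b_rounded * round(mean_x, 2), 2)
y_rounded = round(a_rounded + b_rounded * 75, 2)

# Carrying full precision through instead gives slightly different values.
a_unrounded = mean_y - b * mean_x
y_unrounded = a_unrounded + b * 75

print(f"r = {r:.2f}, b = {b_rounded:.2f}, a = {a_rounded:.2f}")  # r = 0.70, b = 0.48, a = 35.58
print(f"predicted y at x = 75 (rounded inputs): {y_rounded:.2f}")  # 71.58
print(f"a = {a_unrounded:.2f}, y = {y_unrounded:.2f} without early rounding")  # 35.82, 71.55
```

Both predictions round to about 72, so the conclusion of the example is unchanged. The difference only matters when more decimal places are reported.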

Problems:

1. Test scores of nine (9) students are shown below. What can you say about the
strength of the correlation between these sets of scores in Trigonometry and
Geometry?

Trigonometry 43 41 50 47 35 33 50 33 54
Geometry 48 45 47 43 33 28 48 31 57

2. Calculate the degree of linear relationship between the number of minutes spent
studying and the score in the examination.

Number of minutes      27  50  57  15  18  48  52  55  28  32
Score in examination   40  53  52  24  21  35  40  39  47  36

3. The number of hours spent per week viewing television (y) and the number of years
of education (x) were recorded for ten randomly selected individuals. The results are
given below:

x   12  14  11  16  16  18  12  20  10  12
y   10   9  15   8   5   4  20   4  16  15

a. Draw the scatter diagram.
b. Find the correlation coefficient of x and y and interpret your answer.
c. Find the regression line equation.
d. What are the predicted values of y if x is 15, 17, and 19?

4. An experiment was completed to study the relationship between concentrations of
estrone in saliva and in free plasma. The following data were obtained:

Subject                      1     2     3     4     5     6     7     8     9     10
Estrone in saliva (x)        7.4   7.5   8.5   9.0   9.0   11.0  13.0  14.0  14.5  16.0
Estrone in free plasma (y)   30.0  25.0  31.5  27.5  39.5  38.0  43.2  49.0  55.0  48.5

a. Compute and interpret the correlation coefficient for estrone in saliva and
estrone in free plasma.

b. Estimate the line of regression of estrone saliva on estrone in free plasma.

c. If the estrone level in saliva is 12.1, predict the level of estrone in free plasma.

5. Compute the correlation ratio between test scores and teaching method.

Teaching Method 54 61 75 63 82 52 63 50
Test scores 76 80 89 80 88 83 79 82

6. A researcher believes that a person who works in the academe receives a yearly
increment in salary as his years of experience increase. The researcher gathered data
and sought to create a linear regression equation to represent this allegation. The
gathered data are shown below. Each pair of columns gives the years of experience
(Yrs.) followed by the monthly salary in thousands (Salary).

Yrs.  Salary    Yrs.  Salary    Yrs.  Salary    Yrs.  Salary
7     18        2     14        19    30        35    24
11    19        9     18        10    26        6     17
33    28        12    20        9     24        5     17
24    25        13    25        12    20        7     18
5     19        7     18        13    25        11    16
18    23        10    18        7     18        33    25
35    30        3     12        10    18        20    25
19    21        4     15        3     12        13    25
10    16        18    25        4     15        7     18
8     20        35    28        18    23        10    18

a. Is there a basis for the researcher’s allegation?
b. Define the regression line equation.
c. If one of the male respondents has 22 years of experience, predict his monthly
salary.

7. Calculate the degree of linear relationship between the following numbers of years of
experience and monthly salaries.

No. of years of experience   7   11  33  24  5   18  35  19  10
Monthly salary               18  19  27  26  16  22  28  23  21

8. A study was conducted to examine the association between adult immunity and
juvenile mortality in southern fur seals. Therefore, researchers determined the
percentage of adult southern fur seals on different island populations that contained
a certain antibody in their blood and they also determined the mortality rate for seal
pups on those same islands. Is there a significant relationship between adult
southern seal immunity and seal pup mortality on these islands?

Antibody presence   35   58  69   43  94  26   7    9    12   45   11   66  51
Pup mortality       115  98  109  63  24  226  357  339  112  145  111  36  54

Multiple Regression

We now consider the problem of estimating or predicting the value of a dependent
variable Y on the basis of a set of measurements taken on several independent variables
$x_1, x_2, \ldots, x_r$.

General linear model

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_r x_r$$

where:

$y$ = dependent or criterion variable
$\beta_0, \beta_1, \beta_2, \ldots, \beta_r$ = parameters to be estimated from the data
$x_1, x_2, \ldots, x_r$ = independent variables

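For two independent variables, the least squares estimates of $\beta_0, \beta_1, \beta_2$
are obtained by solving the simultaneous linear equations below, where n is the number
of observations:

$$n\beta_0 + \beta_1\sum x_1 + \beta_2\sum x_2 = \sum y$$

$$\beta_0\sum x_1 + \beta_1\sum x_1^2 + \beta_2\sum x_1 x_2 = \sum x_1 y$$

$$\beta_0\sum x_2 + \beta_1\sum x_1 x_2 + \beta_2\sum x_2^2 = \sum x_2 y$$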

Example Problem:

1. The data below give the Business Statistics grade, the test score, and the number of
class periods missed for 12 students taking the Business Statistics subject. The data
are recorded in the following table:

Student   Business Statistics Grade (y)   Test Score (x1)   Classes Missed (x2)
1         85                              65                1
2         74                              50                7
3         76                              55                5
4         90                              65                2
5         85                              55                6
6         87                              70                3
7         94                              65                2
8         98                              70                5
9         81                              55                4
10        91                              70                3
11        76                              50                1
12        74                              55                4

a. Fit a regression equation of the form $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2$.

b. Estimate the grade if the student’s test score is 75 and the student has missed 3
classes.

2. To develop an equation from which we can predict the gasoline mileage of an
automobile based on its weight and the temperature at the time of operation, these data
were gathered:

Car Number               1     2     3     4     5     6     7     8     9     10
Miles per gallon (y)     17.9  16.5  16.4  16.8  18.8  15.5  17.5  16.4  15.9  18.3
Weight in tons (x1)      1.35  1.90  1.70  1.80  1.30  2.05  1.60  1.80  1.85  1.40
Temperature in °F (x2)   90    30    80    40    35    45    50    60    65    30

a. Fit a regression equation of the form $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2$.

b. Estimate the miles per gallon of an automobile that weighs 2.50 tons and operates at
a temperature of 85 °F.
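
The simultaneous equations given earlier for two independent variables can be solved mechanically once the required sums are computed. The sketch below is one possible approach, not a prescribed method: it solves the three equations with Cramer's rule (a choice made here for brevity), the names `det3` and `fit_two_predictors` are hypothetical, and the data in the usage lines are made up so that the exact answer is known in advance.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of three rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_two_predictors(x1, x2, y):
    """Return (b0, b1, b2) for y = b0 + b1*x1 + b2*x2 via the normal equations."""
    n = len(y)
    if len(x1) != n or len(x2) != n:
        raise ValueError("x1, x2 and y must contain the same number of values")
    s1, s2, sy = sum(x1), sum(x2), sum(y)
    s11 = sum(a * a for a in x1)
    s22 = sum(b * b for b in x2)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s1y = sum(a * c for a, c in zip(x1, y))
    s2y = sum(b * c for b, c in zip(x2, y))

    # Coefficient matrix and right-hand side of the three normal equations.
    matrix = [[n, s1, s2],
              [s1, s11, s12],
              [s2, s12, s22]]
    rhs = [sy, s1y, s2y]

    d = det3(matrix)
    if d == 0:
        raise ValueError("the normal equations have no unique solution")

    # Cramer's rule: replace one column at a time with the right-hand side.
    betas = []
    for col in range(3):
        replaced = [row[:] for row in matrix]
        for i in range(3):
            replaced[i][col] = rhs[i]
        betas.append(det3(replaced) / d)
    return tuple(betas)


# Hypothetical data generated exactly from y = 1 + 2*x1 + 3*x2.
x1_demo = [1, 2, 3, 4, 5]
x2_demo = [2, 1, 4, 3, 6]
y_demo = [9, 8, 19, 18, 29]
b0, b1, b2 = fit_two_predictors(x1_demo, x2_demo, y_demo)
estimate = b0 + b1 * 6 + b2 * 2   # prediction at x1 = 6, x2 = 2
```

For more than two independent variables the same idea applies with a larger system, where a general linear-equation solver is more practical than Cramer's rule.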
