Course Pack Correlation
This chapter presents the most commonly used techniques for investigating the relationship between two
quantitative variables. Correlation measures the strength of the linear relationship between a pair of
variables, while regression expresses the relationship between a dependent variable and an independent variable as
an equation. Both tools are widely used in economics, business, and other fields of study.
Lesson 1 Correlation
Learning Competencies:
In the previous lesson, you learned about different statistical tools for testing hypotheses:
the z-test, t-test, F-test (ANOVA), and H-test.
In this lesson, you will learn about the strength of the linear relationship between two
quantitative variables.
The linear correlation coefficient, denoted by r, measures the strength and the direction of
a linear relationship between two variables. This coefficient is sometimes called the Pearson
product-moment correlation coefficient, since it was developed by the English mathematician and
biostatistician Karl Pearson.
To compute the value of r, the formula is:
$$r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}}$$
where n is the number of pairs of values of the variables;
x are the values of the independent variable; and
y are the values of the dependent variable.
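As a minimal sketch, the sums-based form of the Pearson formula, r = (nΣxy - ΣxΣy) / sqrt([nΣx² - (Σx)²][nΣy² - (Σy)²]), can be written in Python. The function name `pearson_r` and the sample values are illustrative assumptions and do not come from the lesson.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson r from paired values, using the sums-based formula."""
    if len(x) != len(y):
        raise ValueError("x and y must contain the same number of values")
    n = len(x)                                  # number of (x, y) pairs
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))   # sum of the products xy
    sum_x2 = sum(a * a for a in x)              # sum of the squares of x
    sum_y2 = sum(b * b for b in y)              # sum of the squares of y
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Hypothetical data: hours spent reviewing (x) and examination score (y).
hours = [1, 2, 3, 4, 5]
scores = [52, 60, 61, 70, 75]
r = pearson_r(hours, scores)
print(round(r, 4))
```

The function divides by zero if all x values (or all y values) are identical, because r is undefined in that case.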
The following examples identify the independent and dependent variables in the given situations:
1. The time spent by a student in reviewing a lesson can increase his score in an examination.
dependent variable (y) : score in an examination
independent variable (x) : time spent in reviewing a lesson
A positive linear correlation means that as the values of x increase, the values of y also increase. Likewise, as x
decreases, y also decreases. The variables x and y have a strong positive linear correlation if the value of r is
close to 1. Thus, r = 1 indicates a perfect positive correlation.
[Figure: scatter diagram panels: perfect negative correlation (r = -1 and ρ = -1); low negative correlation]
A negative linear correlation means that as the values of x increase, the values of y decrease. The variables x
and y have a strong negative linear correlation if the value of r is close to -1. Thus, r = -1 indicates a perfect
negative correlation.
[Figure: scatter diagram panel: no correlation (r = 0 and ρ = 0)]
The variables x and y have a weak positive or negative linear correlation if the value of r is close to 0. Likewise,
r = 0 implies that x and y have no linear correlation.
How can we determine the strength of association based on the Pearson correlation coefficient?
The stronger the association of the two variables, the closer the Pearson correlation coefficient, r, will be to
either +1 or -1 depending on whether the relationship is positive or negative, respectively. Achieving a value of
+1 or -1 means that all your data points are included on the line of best fit – there are no data points that show
any variation away from this line. Values for r between +1 and -1 (for example, r = 0.8 or -0.4) indicate that there
is variation around the line of best fit. The closer the value of r is to 0, the greater the variation around the line of
best fit. Different relationships and their correlation coefficients are shown in the diagram above.
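The link between scatter and the value of r can be illustrated with a small sketch. The data sets below are made up for illustration, and `pearson_r` is a hypothetical helper that uses the deviations-from-the-mean form of the Pearson formula, which is algebraically equivalent to the sums-based form.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson r via deviations from the means (hypothetical helper)."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
on_rising_line = [2 * v + 1 for v in x]     # every point lies on y = 2x + 1
on_falling_line = [10 - 3 * v for v in x]   # every point lies on y = 10 - 3x
scattered = [3, 4, 8, 8, 12]                # rising trend with some scatter

for label, y in [("rising line", on_rising_line),
                 ("falling line", on_falling_line),
                 ("scattered", scattered)]:
    print(f"{label}: r = {pearson_r(x, y):.4f}")
```

Points that lie exactly on a rising or falling line give r = 1 or r = -1, while the scattered set gives a value strictly between 0 and 1.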
Illustrative example 2.
Find the value of the correlation coefficient from the following table:
Step 1: Make a chart. Use the given data, and add three more columns: xy, x², and y².
Step 2: Multiply x and y together to fill the xy column. For example, row 1 would be 43 × 99 = 4,257.
Step 3: Take the square of the numbers in the x column, and put the result in the x² column.
Step 4: Take the square of the numbers in the y column, and put the result in the y² column.
Step 5: Add up all of the numbers in the columns and put the result at the bottom of the column. The Greek letter sigma
(Σ) is a short way of saying “sum of.”
Σx = 247
Σy = 486
Σxy = 20,485
Σx² = 11,409
Σy² = 40,022
n is the sample size; in our case, n = 6
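Substituting these values into the formula gives the correlation coefficient:
$$r = \frac{6(20{,}485) - (247)(486)}{\sqrt{\left[6(11{,}409) - (247)^2\right]\left[6(40{,}022) - (486)^2\right]}} = \frac{2{,}868}{\sqrt{(7{,}445)(3{,}936)}} \approx 0.5298$$
The correlation coefficient always lies between -1 and 1. Our result, r = 0.5298, means
the variables have a moderate positive correlation.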
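The arithmetic in this example can be checked with a short Python sketch that plugs the column totals from Step 5 into the sums-based Pearson formula. The variable names are illustrative assumptions.

```python
from math import sqrt

# Column totals from Step 5 of the example.
n = 6
sum_x, sum_y = 247, 486
sum_xy = 20485
sum_x2, sum_y2 = 11409, 40022

numerator = n * sum_xy - sum_x * sum_y           # 122,910 - 120,042 = 2,868
x_term = n * sum_x2 - sum_x ** 2                 # 68,454 - 61,009 = 7,445
y_term = n * sum_y2 - sum_y ** 2                 # 240,132 - 236,196 = 3,936
r = numerator / sqrt(x_term * y_term)

print(round(r, 4))  # 0.5298
```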
Illustrative example 3.
The owner of a chain of fruit shake stores would like to study the correlation between atmospheric temperature
and sales during the summer season. A random sample of 12 days is selected with the results given as follows:
Plot the data on a scatter diagram. Does it appear there is a relationship between atmospheric temperature and sales?
Compute the coefficient of correlation. Determine at the 0.05 significance level whether the correlation in the
population is greater than zero.
Solution:
• Ho: ρ = 0. There is no correlation between atmospheric temperature and total sales of fruit shake.
• Ha: ρ ≠ 0. There is a correlation between atmospheric temperature and total sales of fruit shake.
df = n – 2 = 12 – 2 = 10, so the critical values are t = ±2.228.
The coefficient of correlation, r = 0.93, between atmospheric temperature and total sales
indicates a very high positive correlation (a very dependable relationship); that is, an increase in
atmospheric temperature is highly associated with an increase in total sales of fruit shake.
To decide whether the relationship is significant, we need to determine the value of t.
Step 7: Conclusion
Since the null hypothesis has been rejected, we can conclude that there is evidence of a
significant association between the atmospheric temperature and the total sales of fruit shake.
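The decision in this example can be checked with a short Python sketch. It assumes the standard test statistic for a correlation coefficient, t = r·sqrt(n - 2) / sqrt(1 - r²), and takes r = 0.93, n = 12, and the critical value 2.228 from the example. The variable names are illustrative.

```python
from math import sqrt

r = 0.93             # sample correlation coefficient from the example
n = 12               # number of sampled days
t_critical = 2.228   # two-tailed critical value at 0.05 with df = 10 (from the example)

df = n - 2
# Assumed standard test statistic for testing that there is no correlation.
t = r * sqrt(df) / sqrt(1 - r ** 2)
reject_null = abs(t) > t_critical

print(f"df = {df}, t = {t:.2f}, reject Ho: {reject_null}")
```

With these inputs the computed t is about 8.00, well beyond ±2.228, which agrees with the decision to reject the null hypothesis.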
We can use the CORREL function or the Analysis ToolPak add-in in Excel to find the correlation coefficient between
two variables.
6. Click OK.
Result.
Conclusion: Variables A and C are positively correlated (0.91). Variables A and B are not correlated (0.19). Variables B and
C are also not correlated (0.11). You can verify these conclusions by looking at the graph.
Activity 1.
Directions: Read carefully. These questions pertain to correlations and correlation coefficients.
1. r = 0.50
A B C D E F
2. r = 0
A B C D E F
3. r = -0.85
A B C D E F
4. r = 0.92
A B C D E F
5. r = 1
A B C D E F
6. r = -0.48
A B C D E F
7. What does it mean to say that data has a strong negative correlation?
Choose: There is no relationship at all between the variables.
More than half of the variables have a negative value.
There is a negative cause and effect relationship.
A linear model with a negative slope is appropriate.
8. Which of the following correlation coefficients represents the strongest linear relationship?
Choose: 0.79 0.36 -0.12 -0.87
11. Which value of r represents data with a strong positive linear correlation between two variables?
Choose: 0.91 0.42 1.03
12. The table below shows the average target training heart rates, by age, according to the American Heart
Association. Which value represents the linear correlation coefficient between a person's age, in years, and that
person's average target training heart rate, in beats per minute (bpm)? (Round to four decimal places with Age on
x-axis and Heart Rate on y-axis.)
Age (years)    Average Target Heart Rate (bpm)
20             135
30             129
40             122
50             115
60             108
70             102
Choose: -0.6652 1.3231 -0.9996 0.9993
Activity 2.
The National Housing Authority (NHA) wants to investigate the relationship between the size of houses and the
rents paid by tenants in Marikina City. The NHA collected the following information on the sizes (in hundreds of square
feet) of eight houses and the monthly rents (in thousands of pesos) paid by the tenants.
Size of House    35  40  50  60  28  34  45  25
Monthly Rent     11  17  18  20   6  10  19   5
Construct a scatter diagram for these data. Determine whether a relationship exists between the sizes of houses and
the monthly rents at the 0.05 significance level.
Activity 3.