
CORRELATION ANALYSIS

Correlation analysis is applied to quantify the association between two continuous
variables, for example between a dependent and an independent variable, or between two
independent variables. The word "correlation" is formed by joining the words "co" and
"relation". The word "co" means together; thus, correlation means the relationship
between any set of data when considered together.

Correlation is a statistic that measures the relationship between two variables. In
other words, it is a measure of the association between variables.

Simple correlation: The study of a relationship that involves only two variables.
When more than two variables are involved, we speak of multiple correlation.

Scatter Diagram: A useful method for investigating whether there is any correlation
between two variables.

Linear correlation: Occurs when all the points on a scatter diagram seem to lie
near a straight line. If Y tends to increase as X increases, the correlation is positive,
and if Y tends to decrease as X increases, the correlation is negative. If the points
seem to lie near some curve, the correlation is non-linear.

If all the points lie on a straight line we have perfect linearity, while if all the points
lie on a curve we have perfect non-linearity between the two variables.

Examples: One variable might be the number of hunters in a region and the other
variable could be the deer population. Perhaps as the number of hunters increases, the
deer population decreases. This is an example of a negative correlation: as one
variable increases, the other decreases. A positive correlation is where the two
variables move in the same direction, increasing or decreasing together. For example,
an increase in the amount of rain tends to be accompanied by an increase in the sales
of umbrellas.

Correlation Coefficient Definition

The correlation coefficient is used to measure the strength of the relationship between
two variables.

The value of the correlation coefficient ranges from -1.0 to +1.0.

This means that any value beyond this range will be the result of an error in correlation
measurement.

A correlation of -1.0 means a perfect negative correlation, while a correlation of
+1.0 means a perfect positive correlation.

A correlation of 0.0 means no linear relationship between the movement of the two
variables.

The following correlation graphs show examples of different ranges of values for the
correlation coefficient:

[Figure: three scatter plots labelled Positive correlation, Negative correlation, and
No correlation]

There are several types of correlation coefficients, Pearson's correlation (r) being
the most common among all.

It measures the strength and direction of the linear relationship between the two
variables and cannot capture nonlinear relationships between two variables.

It cannot differentiate between dependent and independent variables.
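
The limitation to linear relationships can be illustrated with a small sketch. The data
below are made up for illustration, and the helper name pearson_r is hypothetical; it
uses the deviation-from-the-mean form of Pearson's r, which is one of the equivalent
formulae. A perfectly linear relationship gives r = 1, while a perfectly quadratic
relationship on symmetric x values gives r = 0 even though Y is completely determined
by X.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r computed from deviations about the means (illustrative helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up x values, symmetric about zero.
xs = [-3, -2, -1, 0, 1, 2, 3]
linear = [2 * x + 1 for x in xs]   # all points on a straight line
curved = [x ** 2 for x in xs]      # all points on a parabola

r_linear = pearson_r(xs, linear)
r_curved = pearson_r(xs, curved)
print(r_linear, r_curved)
```

Swapping the two arguments of pearson_r gives the same value, which reflects the point
that r does not distinguish a dependent variable from an independent one.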

Types of Correlation Coefficient Formulas

a) Pearson's product-moment correlation
b) Spearman's rank correlation
c) Linear correlation coefficient
d) Sample correlation coefficient
e) Population correlation coefficient

Pearson's product-moment correlation (r)

You can tell whether there is a correlation by how closely the data resemble a line. If
the points are widely scattered, there may be no correlation. If the points closely
fit a quadratic or exponential curve, for example, they have a nonlinear correlation.

How can you tell the type of correlation by inspection?

If the points on the graph follow a line with positive slope, there is a positive
correlation (y increases as x increases). If the slope of the line is negative, there is
a negative correlation (y decreases as x increases).

An important aspect of a correlation is how strong it is. The strength of a correlation
is measured by the correlation coefficient r. Another name for r is the Pearson
product-moment correlation coefficient, in honor of Karl Pearson, who developed it
around 1900. There are at least three different formulae in common use for calculating
this number, and they represent somewhat different approaches to the problem. However,
the same value of r is obtained from any of them. First we give the raw score formula,
in which n has its usual meaning: the number of ordered pairs in the sample.
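
Judging from the quantities defined in the list that follows and from the substitution
used in Example 01, the raw score formula appears to be the standard form, sketched
here in LaTeX:

```latex
r = \frac{n\sum xy - \sum x \sum y}
         {\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]
                \left[n\sum y^2 - \left(\sum y\right)^2\right]}}
```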

Where:

n = Quantity of information (number of ordered pairs in the sample)

Σx = Total of all values of the first variable

Σy = Total of all values of the second variable

Σxy = Sum of the products of the first and second values

Σx² = Sum of squares of the first values

Σy² = Sum of squares of the second values

Example 01

The local ice cream shop keeps track of how much ice cream they sell versus the noon
temperature on that day. Here are their figures for the last 12 days:

a) Sketch the scatter plot
b) Find the correlation coefficient (r)

Solution

a) Sketch the scatter plot

b) Find the correlation coefficient (r)

From the data, with X = Temperature (°C) and Y = Ice cream sales ($):

X      Y     X²       Y²       XY
14.2   215   201.64   46225    3053
16.4   325   268.96   105625   5330
11.9   185   141.61   34225    2201.5
15.2   332   231.04   110224   5046.4
18.5   406   342.25   164836   7511
22.1   522   488.41   272484   11536.2
19.4   412   376.36   169744   7992.8
25.1   614   630.01   376996   15411.4
23.4   544   547.56   295936   12729.6
18.1   421   327.61   177241   7620.1
22.6   445   510.76   198025   10057
17.2   408   295.84   166464   7017.6

Totals: ΣX = 224.1, ΣY = 4829, ΣX² = 4362.05, ΣY² = 2118025, ΣXY = 95506.6

From the equation above:

$$ r = \frac{12(95506.6) - (224.1)(4829)}{\sqrt{[12(4362.05) - (224.1)^2][12(2118025) - (4829)^2]}} \approx 0.9575 $$

There is a very strong positive correlation between the temperature (°C) and ice cream
sales ($).
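
The hand calculation can be checked with a short Python sketch. It recomputes the
column sums from the twelve data pairs in the table and substitutes them into the raw
score formula. The function name raw_score_r is a hypothetical helper, not part of any
library. Because the sums are not rounded along the way, the result may differ from a
hand calculation in the fourth decimal place; it should come out at roughly 0.96.

```python
import math

# Data copied from the table in Example 01.
temperature = [14.2, 16.4, 11.9, 15.2, 18.5, 22.1, 19.4, 25.1, 23.4, 18.1, 22.6, 17.2]
sales = [215, 325, 185, 332, 406, 522, 412, 614, 544, 421, 445, 408]

def raw_score_r(xs, ys):
    """Pearson's r from the raw sums (hypothetical helper for illustration)."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

r = raw_score_r(temperature, sales)
print(f"r = {r:.4f}")
```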

Table of summary of correlation coefficient (r) and its interpretation

Range             Strength of association (correlation)

±1                Perfect correlation
±0.75 to ±1       Very strong correlation
±0.50 to ±0.75    Moderate correlation
±0.25 to ±0.50    Weak correlation
0 to ±0.25        Negligible correlation
0                 No correlation
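
The summary table can be turned into a small lookup, sketched below. The function name
interpret_r is hypothetical. The table does not say which category a boundary value
such as 0.75 belongs to, so this sketch assumes it goes to the stronger category.

```python
def interpret_r(r):
    """Return the verbal label from the summary table for a coefficient r.

    Assumption: a value exactly on a boundary (0.25, 0.5, 0.75) is placed
    in the stronger of the two adjacent categories.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    size = abs(r)  # strength depends only on the magnitude, not the sign
    if size == 1.0:
        return "Perfect correlation"
    if size >= 0.75:
        return "Very strong correlation"
    if size >= 0.5:
        return "Moderate correlation"
    if size >= 0.25:
        return "Weak correlation"
    if size > 0:
        return "Negligible correlation"
    return "No correlation"

print(interpret_r(0.96))
print(interpret_r(-0.6))
```

The sign of r is ignored when choosing the label because the table describes strength
only; the direction (positive or negative) is read separately from the sign.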

QUESTIONS

1. The following data refer to the proportion of households owning a television set
and the social class index in ten different towns.

% with T.V's (X)         57   54   49   42   38   32   30   24   20   18
Social class index (Y)   113  111  107  103  100  96   94   84   74   76

Calculate the coefficient of correlation between X and Y.

2. Values of two variables X and Y are as follows:

X 1 2 3 4 5 6 7 8

Y 1.4 1.8 2.9 4.5 5.2 5.3 7.2 7.9

a) Plot a scatter diagram for this information and comment on its features.
b) Compute the correlation coefficient between X and Y using the Pearson correlation
coefficient.

Spearman’s rank correlation

Spearman's rank correlation is a measure of the relationship between two ordinal
(ranked) variables. It can be used when the variables are related but not linearly.

Spearman’s Rank correlation formula

Spearman's rank correlation coefficient quantifies the degree and direction of
association between two ranked variables. It measures the monotonicity of the
relationship between two variables, that is, how well a monotonic function can represent
the relationship between them.

Spearman's rank correlation coefficient can be calculated using:

$$ \rho = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)} $$

Where:

d_i = Difference between the two ranks of each observation

n = Number of observations

ρ = Spearman's rank correlation coefficient

The Spearman rank correlation coefficient can take any value between -1 and +1, that
is, −1 ≤ ρ ≤ +1.
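
The two ends of this range can be checked with a small sketch. The ranks are made up
for illustration and the function name spearman_from_ranks is hypothetical. Identical
rankings give every d_i = 0 and so a coefficient of +1, while completely reversed
rankings give -1.

```python
def spearman_from_ranks(rank_x, rank_y):
    """Spearman's coefficient from two lists of ranks (assumes no tied ranks)."""
    n = len(rank_x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

ranks = [1, 2, 3, 4, 5]  # made-up ranks for five observations

same_order = spearman_from_ranks(ranks, ranks)           # identical rankings
reverse_order = spearman_from_ranks(ranks, ranks[::-1])  # completely reversed rankings
print(same_order, reverse_order)
```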

Example 01

The scores for 9 students in math and physics are as follows:

Physics 35 23 47 17 10 43 9 6 28

Mathematics 30 33 45 23 8 49 12 4 31

Calculate the students' ranks in the two subjects and compute the Spearman rank
correlation.

Solution

First, rank the scores in each subject separately. Assign rank 1 to the highest score,
2 to the next highest, and so on. Thus we have:

Physics   Rank   Maths   Rank   Difference between ranks (d)   d²

35        3      30      5      -2                             4
23        5      33      3       2                             4
47        1      45      2      -1                             1
17        6      23      6       0                             0
10        7      8       8      -1                             1
43        2      49      1       1                             1
9         8      12      7       1                             1
6         9      4       9       0                             0
28        4      31      4       0                             0

Σd² = 12

From

$$ \rho = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)} $$

$$ \rho = 1 - \frac{6(12)}{9(9^2 - 1)} = 1 - \frac{72}{720} $$

ρ = 0.9

The Spearman's rank correlation for this set of data is 0.9, which implies a very strong
positive correlation.
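
The ranking and substitution steps can be sketched in Python as follows. The helper
name rank_descending is hypothetical, and it assumes there are no tied scores, which
holds for this data set. With Σd² = 12 and n = 9 the formula evaluates to
1 − 72/720 = 0.9.

```python
# Scores copied from the example: 9 students, two subjects.
physics = [35, 23, 47, 17, 10, 43, 9, 6, 28]
maths = [30, 33, 45, 23, 8, 49, 12, 4, 31]

def rank_descending(scores):
    """Give rank 1 to the highest score; assumes no tied scores."""
    ordered = sorted(scores, reverse=True)
    return [ordered.index(s) + 1 for s in scores]

physics_rank = rank_descending(physics)
maths_rank = rank_descending(maths)

# Squared difference between the two ranks of each student.
d_squared = [(p - m) ** 2 for p, m in zip(physics_rank, maths_rank)]

n = len(physics)
rho = 1 - 6 * sum(d_squared) / (n * (n ** 2 - 1))
print(physics_rank, maths_rank, sum(d_squared), rho)
```

If two students shared a score, rank_descending would give both the same rank without
averaging, so tied data would need a different ranking rule.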

QUESTION

Calculate the Spearman's rank correlation coefficient for the data in the table given
below.

X 10 8 12 15 8 10

Y 7 4 6 7 9 8
