

There are variables in nature that are related in such a way that if we know one of
them, the others can be estimated. For example, bright parents will most likely have
bright children, so if we know the IQ of the parents, we can make an educated guess of
their children's IQ. The farther you travel in a vehicle, the more gasoline you consume.
The higher the sun is above the horizon, the shorter the shadows that objects cast.


A correlation is a relationship or association between two variables.

A direct or positive relationship between two variables implies that an
increase in the value of one of the variables corresponds to an increase in
the value of the other variable.

An inverse or negative relationship between two variables means that an
increase in the value of one variable corresponds to a decrease in the value
of the other variable.

A zero relationship exists between two variables if an increase in one is
accompanied by neither an increase nor a decrease in the other.

In the language of statistics, the relationship between two variables is
termed the correlation between the two variables. Thus, we have, correspondingly,
positive correlation, negative correlation and zero correlation. We say that there
is a positive correlation between achievement in English and Mathematics, a negative
correlation between pressure and volume at constant temperature, and zero (or no)
correlation between IQ (mental ability) and weight.

These conclusions are descriptive, and they may not be sufficient to
understand the meaning of correlation. There is a need to be more precise
in expressing relationships between two variables, and to be more precise
means to be able to express the relationship in numerical terms.

A correlation coefficient is a numerical measure of the
linear relationship between two variables.
Based on the formula derived by Karl Pearson, the correlation
coefficient has a range extending from -1 to +1.

-1 -------- -0.5 -------- 0 -------- +0.5 -------- +1

Consider the number line. Correlation coefficients between +0.5 and
+1 are considered highly positive, while those between -1 and -0.5 are
considered highly negative. Positive correlations lower than +0.5 are considered
mildly positive, while negative correlations higher than -0.5 are considered mildly
negative. Finally, correlations close to zero imply that no correlation exists
between the two variables. A more precise meaning attached to each
coefficient is dealt with in inferential statistics.
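As a rough illustration, the verbal bands on the number line can be written as a small Python function. The name `describe_correlation` and the `near_zero` cutoff of 0.1 are hypothetical: the text only says "close to zero" and gives no number, so treat that threshold as an assumption to adjust. The boundaries at +0.5 and -0.5 follow the text.

```python
def describe_correlation(r: float, near_zero: float = 0.1) -> str:
    """Label r using the number-line bands described above.

    near_zero is an assumed cutoff for "close to zero"; the bands at
    +/-0.5 follow the text.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie between -1 and +1")
    if abs(r) < near_zero:
        return "no correlation"
    if r >= 0.5:
        return "highly positive"
    if r > 0:
        return "mildly positive"
    if r <= -0.5:
        return "highly negative"
    return "mildly negative"


for r in (0.85, 0.3, 0.02, -0.2, -0.7):
    print(f"{r:+.2f}: {describe_correlation(r)}")
```

The labels are descriptive only; as noted above, a precise interpretation of a given coefficient belongs to inferential statistics.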

Each correlation coefficient is computed with its own derived formula,
and the one to use depends on the type of data one is dealing with. Recall
that there are 4 types of data: nominal (here, dichotomous), ordinal,
interval and ratio.
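The pairing of data types with coefficients used in this section can be summarized as a lookup. This is a hypothetical helper: the type labels are this sketch's own strings, "dichotomous" stands for a real nominal dichotomous variable, and it assumes ratio data is acceptable wherever interval data is (ratio data meets every interval-scale requirement).

```python
# Hypothetical helper; the type labels are this sketch's own strings.
# "dichotomous" stands for a real nominal dichotomous variable.
METRIC = {"interval", "ratio"}


def choose_coefficient(x_type: str, y_type: str) -> str:
    """Return the coefficient this section pairs with two data types."""
    pair = {x_type, y_type}
    if pair <= METRIC:  # both interval or ratio
        return "Pearson product-moment r"
    if pair == {"ordinal"}:  # both ordinal
        return "Spearman rank r_s"
    if pair == {"dichotomous"}:  # both nominal dichotomous
        return "phi"
    if "dichotomous" in pair and pair - {"dichotomous"} <= METRIC:
        return "point-biserial r_pb"
    raise ValueError(f"combination not covered here: {x_type!r}, {y_type!r}")


print(choose_coefficient("ordinal", "ordinal"))
print(choose_coefficient("dichotomous", "interval"))
```

Combinations the section does not discuss, such as ordinal with interval, raise an error rather than guess.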


By assuming a linear relationship between two quantities x and y, the
famous British statistician Karl Pearson derived a formula that expresses the
correlation between x and y as a number. The formula is named in his honor:
the Pearson Product-Moment Correlation Coefficient.

The Pearson Product-Moment Correlation Coefficient $r_{xy}$ is a measure
of the linear correlation of two variables that are measured on an interval
or ratio scale.

$$
r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{(n - 1)\, s_x s_y}
       = \frac{\sum_{i=1}^{n} z_{x_i} z_{y_i}}{n - 1}
$$

where $x_i$ = any x value
$y_i$ = any y value
$\bar{x}$ = mean of x
$\bar{y}$ = mean of y
$s_x$ = standard deviation of x
$s_y$ = standard deviation of y
$n$ = number of pairs of x and y
$z_{x_i} = (x_i - \bar{x})/s_x$ = standard score for x
$z_{y_i} = (y_i - \bar{y})/s_y$ = standard score for y
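A minimal Python sketch of the two equivalent forms above, using the standard-library `statistics` module (`stdev` divides by n - 1, matching the sample formula). The function names and the kilometre/gasoline figures are hypothetical, made up only to echo the gasoline example at the start of this section.

```python
from statistics import mean, stdev  # stdev divides by n - 1 (sample)


def pearson_r(xs, ys):
    """Deviation form: sum of cross-products over (n - 1) s_x s_y."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 pairs")
    x_bar, y_bar = mean(xs), mean(ys)
    cross = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return cross / ((n - 1) * stdev(xs) * stdev(ys))


def pearson_r_from_z(xs, ys):
    """Standard-score form: sum of z_x * z_y over n - 1."""
    n = len(xs)
    x_bar, y_bar, s_x, s_y = mean(xs), mean(ys), stdev(xs), stdev(ys)
    z_x = [(x - x_bar) / s_x for x in xs]
    z_y = [(y - y_bar) / s_y for y in ys]
    return sum(a * b for a, b in zip(z_x, z_y)) / (n - 1)


# Hypothetical data: kilometres travelled and litres of gasoline used.
km = [10, 25, 40, 55, 70]
litres = [1.2, 2.1, 3.3, 4.0, 5.4]
print(round(pearson_r(km, litres), 4), round(pearson_r_from_z(km, litres), 4))
```

Both functions print the same value. Either one fails with a division by zero if a variable is constant, since $r_{xy}$ is undefined when a standard deviation is zero.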

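The term n – 1 is used for samples, while n is used when dealing with
populations. Whichever is chosen must also be used when computing $s_x$ and $s_y$;
when it is, the factor cancels, so $r_{xy}$ has the same value under either
convention and does not depend on which divisor is used.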
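The cancellation can be checked numerically with the standard-library `statistics` module, where `stdev` divides by n - 1 and `pstdev` divides by n. The helper `r_with` and the data are hypothetical, for illustration only.

```python
from statistics import mean, pstdev, stdev


def r_with(xs, ys, divisor, sd):
    """Cross-product sum over divisor * sd(x) * sd(y)."""
    x_bar, y_bar = mean(xs), mean(ys)
    cross = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return cross / (divisor * sd(xs) * sd(ys))


# Hypothetical data.
xs = [2, 4, 5, 7, 9]
ys = [1, 3, 4, 4, 8]
n = len(xs)

r_sample = r_with(xs, ys, n - 1, stdev)      # n - 1 with sample SDs
r_population = r_with(xs, ys, n, pstdev)     # n with population SDs
r_mixed = r_with(xs, ys, n, stdev)           # inconsistent: n with sample SDs

print(r_sample, r_population, r_mixed)
```

The first two values agree; only the mixed version differs, shrunk by the factor (n - 1)/n. That mismatch, not the choice of n or n - 1 itself, is what to avoid.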

A more convenient computational form is obtained by expanding the numerator
and writing out $s_x$ and $s_y$ in full:

$$
r_{xy} = \frac{n \sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}
{\sqrt{\left[ n \sum_{i=1}^{n} x_i^2 - \left( \sum_{i=1}^{n} x_i \right)^2 \right]
\left[ n \sum_{i=1}^{n} y_i^2 - \left( \sum_{i=1}^{n} y_i \right)^2 \right]}}
$$
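A minimal sketch of this computational form, built only from the five running sums it needs. The function name and the small data set are hypothetical.

```python
from math import sqrt


def pearson_r_raw(xs, ys):
    """Computational form of r_xy using raw sums only."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


# Hypothetical data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(round(pearson_r_raw(x, y), 4))  # 0.7746
```

Here the numerator is 5(66) - 15(20) = 30 and the denominator is the square root of (275 - 225)(430 - 400) = 1500. This form is convenient by hand, but with large floating-point data the subtractions can lose precision, so the deviation form is usually preferred in software.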


When the two variables to be correlated are both measured on the
ordinal scale, Spearman's Rank Correlation Coefficient is used. For
example, 10 candidates for a managerial position were ranked on their
presentation of a business plan.
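The British psychologist Charles Spearman (1863-1945) derived a
formula for the rank correlation $r_s$:

$$
r_s = 1 - \frac{6 \sum_{i=1}^{n} (x_i - y_i)^2}{n (n^2 - 1)}
$$

where $x_i$ and $y_i$ are the two ranks given to the i-th individual and n is the
number of individuals ranked. This form is exact when there are no tied ranks.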
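A minimal Python sketch of the rank formula, assuming the data are already ranks with no ties (raw ordinal scores would first have to be converted to ranks). The two judges and their rankings are hypothetical, invented to flesh out the candidate example.

```python
def spearman_rs(x_ranks, y_ranks):
    """r_s = 1 - 6 * sum(d^2) / (n (n^2 - 1)); assumes no tied ranks."""
    n = len(x_ranks)
    if n != len(y_ranks) or n < 2:
        raise ValueError("need two equal-length rank lists with at least 2 items")
    if len(set(x_ranks)) != n or len(set(y_ranks)) != n:
        raise ValueError("this sketch assumes no tied ranks")
    d_squared = sum((x - y) ** 2 for x, y in zip(x_ranks, y_ranks))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Hypothetical ranks of 10 candidates given by two judges.
judge_a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
judge_b = [2, 1, 4, 3, 5, 7, 6, 9, 8, 10]
print(round(spearman_rs(judge_a, judge_b), 4))  # 0.9515
```

Eight candidates differ by one place, so the sum of squared differences is 8 and $r_s = 1 - 48/990$. Identical rankings give +1 and exactly reversed rankings give -1.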


When both variables are nominal dichotomous, the most appropriate correlation
coefficient to use is the Phi Coefficient. Refer to the Table of Data for the Phi
Coefficient:

                      Variable x
                      1          2          Total
Variable y    1       a          b          a + b
              2       c          d          c + d
          Total       a + c      b + d

The Phi Coefficient $r_\varphi$ is

$$
r_\varphi = \frac{ad - bc}{\sqrt{(a + b)(c + d)(a + c)(b + d)}}
$$

This formula was first derived by Karl Pearson in 1901.
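A minimal sketch of the Phi formula, taking the four cell counts laid out as in the table. The function name and the counts are hypothetical.

```python
from math import sqrt


def phi(a, b, c, d):
    """Phi coefficient from the four cells of a 2x2 table (layout as above)."""
    denominator = sqrt((a + b) * (c + d) * (a + c) * (b + d))
    if denominator == 0:
        raise ValueError("a row or column total is zero, so phi is undefined")
    return (a * d - b * c) / denominator


# Hypothetical counts for the four cells.
print(round(phi(a=20, b=5, c=10, d=15), 4))  # 0.4082
```

With these counts, ad - bc = 250 and the product of the margins is 375000, so $r_\varphi = 1/\sqrt{6}$. Coding both variables numerically and applying Pearson's r to the individual pairs gives the same value, which is why Phi is also described as a special case of Pearson's r.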

The Phi coefficient $r_\varphi$ is a measure of the correlation between
two real nominal dichotomous variables.

There is another correlation that is a special case of the Pearson product-moment
correlation. It is called the Point-Biserial Correlation $r_{pb}$, and it correlates a real
dichotomous variable with an interval variable. For example, the score x on a test can be
correlated with gender y, coded as male (1) or female (0).

The formula is derived from the Pearson r:

$$
r_{pb} = \frac{\bar{x}_1 - \bar{x}_0}{s_x} \sqrt{\frac{n_1 n_0}{n(n - 1)}}
$$

where $\bar{x}_1$ = the mean of the x values labeled 1 in the real dichotomous variable y

$\bar{x}_0$ = the mean of the x values labeled 0 in the real dichotomous variable y

$n_1$ = the number of samples labeled 1 in y

$n_0$ = the number of samples labeled 0 in y

$n$ = the total number of samples, $n = n_0 + n_1$

$s_x$ = the standard deviation of all the x values (computed with n – 1 in the denominator)
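A minimal Python sketch of the point-biserial formula, with an ordinary Pearson r computed on the 0/1 codes as a cross-check that it really is a special case. The function names, test scores and gender codes are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def point_biserial(xs, labels):
    """r_pb from interval scores xs and 0/1 labels of a real dichotomous y."""
    group1 = [x for x, g in zip(xs, labels) if g == 1]
    group0 = [x for x, g in zip(xs, labels) if g == 0]
    n1, n0 = len(group1), len(group0)
    n = n1 + n0
    if n != len(xs) or n != len(labels) or n1 == 0 or n0 == 0:
        raise ValueError("labels must be 0/1, match xs in length, and use both codes")
    s_x = stdev(xs)  # n - 1 divisor, as in the definitions above
    return (mean(group1) - mean(group0)) / s_x * sqrt(n1 * n0 / (n * (n - 1)))


def pearson_r(xs, ys):
    """Ordinary Pearson r, used here only as a cross-check."""
    x_bar, y_bar = mean(xs), mean(ys)
    cross = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return cross / ((len(xs) - 1) * stdev(xs) * stdev(ys))


# Hypothetical test scores x and gender y coded male (1) / female (0).
scores = [78, 85, 62, 90, 70, 88, 65, 74]
gender = [1, 1, 0, 1, 0, 1, 0, 0]
print(round(point_biserial(scores, gender), 4), round(pearson_r(scores, gender), 4))
```

The two printed values agree. Note that $s_x$ must use n – 1 here; with the population standard deviation the square-root factor would change to $\sqrt{n_1 n_0}/n$.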

The point-biserial correlation thus measures the correlation
between a real dichotomous variable and an interval variable.
