
Correlational Analysis: Pearson r and Spearman’s Rank

What is Correlation and Regression?

Regression and correlation both involve assessing the relationship between two variables. The prefix co- means "together", hence correlation is a term used to describe the strength and direction of the relationship between two quantitative variables. Direction is indicated by the sign of the r value (− or +). Strength is indicated by the numeric value. There are two types of correlation: positive and negative.

Positive correlation means that the two variables move in the same direction: as one variable increases, the other tends to increase, and vice versa. Negative correlation, on the other hand, occurs when one variable tends to decrease as the other increases, and vice versa. Besides these two, there is also “no correlation”, in which one variable does not tend to either increase or decrease as the other changes.

[Figure: three scatterplots of y against x, showing negative correlation, no correlation, and positive correlation]

Pearson r Correlation Coefficient

Pearson r correlation measures the linear relationship between two paired variables. A relationship is said to be linear when a change in one variable is associated with a proportional change in the other variable.

-1 ≤ r ≤ 1

Furthermore:

• Positive values denote positive linear correlation;

• Negative values denote negative linear correlation;

• A value of 0 denotes no linear correlation;

• The closer the value is to 1 or -1, the stronger the linear correlation.

For example, Pearson r correlation might be used to evaluate whether increases in temperature at your production facility are associated with decreasing thickness of your chocolate coating. Pearson r correlation measures the degree of linear relationship between the two. The following formula is used to calculate the Pearson r correlation:

$$r_{xy} = \frac{n\sum x_i y_i - \sum x_i \sum y_i}{\sqrt{n\sum x_i^2 - \left(\sum x_i\right)^2}\,\sqrt{n\sum y_i^2 - \left(\sum y_i\right)^2}}$$

rxy= Pearson r correlation coefficient between x and y

n= number of observations

xi= value of x (for ith observation)

yi= value of y (for ith observation)
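The sum formula for Pearson r can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `pearson_r` and the temperature and thickness values are hypothetical, chosen only to echo the chocolate-coating example.

```python
import math

def pearson_r(x, y):
    """Pearson r computed from the sum formula for paired lists x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired, with at least two observations")
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = (math.sqrt(n * sum_x2 - sum_x ** 2)
                   * math.sqrt(n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        raise ValueError("r is undefined when either variable is constant")
    return numerator / denominator

# Hypothetical data: coating thickness falls as temperature rises.
temperature = [20, 22, 24, 26, 28]
thickness = [3.1, 3.0, 2.7, 2.6, 2.2]
r = pearson_r(temperature, thickness)
print(round(r, 2))  # a strong negative value, close to -1
```

The guard on the denominator reflects that r is not defined when one of the variables never changes.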

Correlation is an effect size and so we can verbally describe the strength of the correlation using the guide that

Evans (1996) suggests for the absolute value of r.

 .00-.19 “very weak”


 .20-.39 “weak”
 .40-.59 “moderate”
 .60-.79 “strong”
 .80-1.0 “very strong”

Types of research questions a Pearson r correlation can examine:

• Is there a statistically significant relationship between age, as measured in years, and height, measured in inches?

• Is there a relationship between temperature, measured in degrees Fahrenheit, and ice cream sales, measured by income?

• Is there a relationship between job satisfaction, as measured by the JSS, and income, measured in dollars?

Assumptions

The calculation of Pearson’s correlation coefficient and subsequent testing of it require the following data assumptions to hold:

• Interval or ratio level;

• Linearly related;

• Bivariate normally distributed.

REMEMBER: If your data do not meet the above assumptions, use Spearman’s rank correlation.

Spearman’s correlation

Monotonic Function

Before understanding Spearman’s correlation, it is important to know first about the “monotonic function”. A monotonic function is one that either never increases or never decreases as its independent variable increases.

The following graphs illustrate monotonic functions:

[Figure: three plots of y against x, showing a monotonically increasing function, a monotonically decreasing function, and a function that is not monotonic]

• Monotonically increasing – as the x variable increases the y variable never decreases;

• Monotonically decreasing – as the x variable increases the y variable never increases;

• Not monotonic – as the x variable increases the y variable sometimes decreases and sometimes increases.
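These three cases can be checked mechanically for a list of y values that is already ordered by increasing x. The following is a rough sketch; the function name `classify_monotonic` and the sample lists are hypothetical.

```python
def classify_monotonic(y_values):
    """Classify y values that are listed in order of increasing x."""
    steps = list(zip(y_values, y_values[1:]))
    never_decreases = all(later >= earlier for earlier, later in steps)
    never_increases = all(later <= earlier for earlier, later in steps)
    if never_decreases and never_increases:
        return "constant"  # satisfies both definitions at once
    if never_decreases:
        return "monotonically increasing"
    if never_increases:
        return "monotonically decreasing"
    return "not monotonic"

print(classify_monotonic([1, 2, 2, 5]))  # monotonically increasing
print(classify_monotonic([5, 3, 3, 1]))  # monotonically decreasing
print(classify_monotonic([1, 3, 2]))     # not monotonic
```

Flat stretches are allowed in both monotonic cases, because the definitions only require that y never moves in the opposite direction.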

Spearman Rank Correlation Coefficient

The Spearman rank correlation coefficient evaluates the monotonic relationship between two continuous or ordinal variables. It is denoted by rs and is by design constrained as follows:

-1 ≤ rs ≤ 1

The Spearman rank correlation is computed from the ranked values of each variable rather than the raw data. Its interpretation is similar to that of Pearson’s r: the closer rs is to ±1, the stronger the monotonic relationship.

For example, Spearman’s correlation might be used to evaluate whether the order in which employees complete a test exercise is related to the number of months they have been employed. The following formula is used to calculate the Spearman rank correlation:

$$\rho = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)}$$

ρ= Spearman rank correlation coefficient (denoted rs above)

di= the difference between the two ranks of the ith observation

n= number of observations
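As a sketch of how the formula is applied, the Python below ranks each variable and then sums the squared rank differences. The function names and the employee data are hypothetical, and the code assumes there are no tied values: with ties, average ranks are normally assigned and this shortcut formula is no longer exact.

```python
def simple_ranks(values):
    """Rank values from 1 (smallest) to n. Assumes no ties."""
    if len(set(values)) != len(values):
        raise ValueError("this sketch does not handle tied values")
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

def spearman_rho(x, y):
    """Spearman rank correlation from the sum of squared rank differences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired, with at least two observations")
    n = len(x)
    sum_d2 = sum((rx - ry) ** 2
                 for rx, ry in zip(simple_ranks(x), simple_ranks(y)))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

# Hypothetical data: months employed and order of finishing a test exercise.
months_employed = [2, 10, 5, 24, 12]
finishing_order = [5, 2, 4, 1, 3]
rho = spearman_rho(months_employed, finishing_order)
print(round(rho, 1))  # -0.9: longer-serving employees tended to finish earlier
```

Here the rank differences are -4, 1, -2, 4 and 1, so the sum of squares is 38 and the coefficient is 1 − 228/120 = −0.9.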

Correlation is an effect size and so we can verbally describe the strength of the correlation using the guide that

Evans (1996) suggests for the absolute value of rs:

 .00-.19 “very weak”

 .20-.39 “weak”
 .40-.59 “moderate”

 .60-.79 “strong”

 .80-1.0 “very strong”

Types of research questions a Spearman rank correlation can examine:

• Is there a statistically significant relationship between participants’ level of education (high school, bachelor’s, or graduate degree) and their starting salary?

• Is there a statistically significant relationship between a horse’s finishing position in a race and the horse’s age?

Assumptions

The calculation of Spearman’s correlation coefficient and subsequent testing of it require the following data assumptions to hold:

• Interval, ratio, or ordinal level;

• Monotonically related.

NOTE: Unlike Pearson r correlation, there is no requirement of normality, and hence it is a nonparametric statistic.

REMEMBER: It is always necessary to study the relationship between variables with a scatterplot. Correlation coefficients only measure linear (Pearson) or monotonic (Spearman) relationships.
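A short sketch shows why the scatterplot matters. In the hypothetical data below, y is determined exactly by x (y = x²), yet Pearson r comes out as zero because the relationship is neither linear nor monotonic. The helper `pearson_r` is an illustrative name, not a library function.

```python
import math

def pearson_r(x, y):
    """Pearson r from the sum formula (assumes neither variable is constant)."""
    n = len(x)
    numerator = n * sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y)
    denominator = (math.sqrt(n * sum(a * a for a in x) - sum(x) ** 2)
                   * math.sqrt(n * sum(b * b for b in y) - sum(y) ** 2))
    return numerator / denominator

# Hypothetical data: a perfect U-shaped relationship.
x = [-3, -2, -1, 0, 1, 2, 3]
y = [value ** 2 for value in x]
r = pearson_r(x, y)
print(r)  # 0.0, even though y depends entirely on x
```

A coefficient near zero therefore rules out a linear trend only; a plot is needed to see whether some other pattern is present.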

Comparison of Pearson r and Spearman Rank

The Pearson and Spearman correlation coefficients can range in value from −1 to +1. For the Pearson correlation coefficient to be +1, when one variable increases, the other variable must increase by a consistent amount. This relationship forms a perfect line. The Spearman correlation coefficient is also +1 in this case.
Pearson = +1, Spearman = +1

If one variable increases when the other increases, but the amount is not consistent, the Pearson correlation coefficient is positive but less than +1. The Spearman coefficient still equals +1 in this case.

Pearson = +0.851, Spearman = +1

When a relationship is random or non-existent, then both correlation coefficients are nearly zero.

Pearson = −0.093, Spearman = −0.093


If the relationship is a perfect line for a decreasing relationship, then both correlation coefficients are −1.

Pearson = −1, Spearman = −1

If one variable decreases when the other increases, but the amount is not consistent, then the Pearson correlation coefficient is negative but greater than −1. The Spearman coefficient still equals −1 in this case.

Pearson = −0.799, Spearman = −1

Correlation values of −1 or 1 imply an exact linear relationship, like that between a circle's radius and circumference. However, the real value of correlation coefficients is in quantifying less-than-perfect relationships. Finding that two variables are correlated often informs a regression analysis, which tries to describe this type of relationship in more detail.
Courtesy: Minitab Express Support
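The contrast between the two coefficients can be reproduced with a small sketch. The data are hypothetical (y = x³ for x from 1 to 10), the helper names are illustrative, and the rank helper assumes no tied values.

```python
import math

def pearson_r(x, y):
    """Pearson r from the sum formula (assumes neither variable is constant)."""
    n = len(x)
    numerator = n * sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y)
    denominator = (math.sqrt(n * sum(a * a for a in x) - sum(x) ** 2)
                   * math.sqrt(n * sum(b * b for b in y) - sum(y) ** 2))
    return numerator / denominator

def simple_ranks(values):
    """Rank values from 1 (smallest) to n. Assumes no ties."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

def spearman_rho(x, y):
    """Spearman rank correlation from squared rank differences (no ties)."""
    n = len(x)
    sum_d2 = sum((rx - ry) ** 2
                 for rx, ry in zip(simple_ranks(x), simple_ranks(y)))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

# Hypothetical data: y always rises with x, but not by a consistent amount.
x = list(range(1, 11))
y = [value ** 3 for value in x]

r = pearson_r(x, y)       # positive but below +1 (not a straight line)
rho = spearman_rho(x, y)  # exactly +1 (the ranks agree perfectly)

# Flipping the sign of y gives the decreasing case.
y_decreasing = [-value for value in y]
rho_decreasing = spearman_rho(x, y_decreasing)  # exactly -1
```

Because Spearman's coefficient only looks at ranks, any strictly increasing transformation of y leaves it at +1, while Pearson's coefficient drops as the curve departs from a straight line.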

Exercises: Pearson r and Spearman rank Correlation

Directions: Answer the following questions.

1. The correlation between temperature and number of ice cream cones bought is the same whether the temperature is

measured in Celsius or Fahrenheit.

a. True

b. False

2. The correlation between two sets of numbers is the same as the correlation between the log of those two sets of

numbers.

a. True

b. False

3. Which of the following is not a possible value of Pearson’s correlation?

a. -1.5

b. -1

c. 0.99

4. Which is higher, the correlation between height and weight or the correlation between weight and height?

a. Weight and height

b. They are about the same

c. They are exactly the same

d. Height and weight

5. The scatter plot below represents

[Figure: scatter plot of Y against X]

a. Positive correlation
b. Negative correlation
c. No correlation
6. The data shows the relation between a company’s production and its employees’ salaries in 5 years.

PRODUCTION   1,000   2,000   2,500   4,000   2,300
SALARIES       150     200     250     700     180

QUESTION: Find the Spearman’s correlation coefficient between production and salaries.

7. What type of data is Spearman’s rank correlation coefficient appropriate for?

a. Continuous and discrete data

b. Discrete data

c. Continuous data

8. Find the Spearman’s correlation coefficient between sales and advertising from the given data.

ADVERTISING   1,000     800   1,000   1,500
SALES         5,000   4,500   4,500   6,500
9. Which of the following is the formula of Spearman’s rank correlation coefficient?

a. $\rho = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)}$

b. $\rho = 1 - \frac{6\sum d_i^2}{n(n^2 - 2)}$

c. $\rho = 1 - \frac{6\sum d_i^3}{n(n^2 - 1)}$

10. The following table represents the relation between sale and profit for six models of televisions.

TELEVISION SALE     500   600   550   100   480   400
TELEVISION PROFIT   300   400   400    90   250   200
QUESTION: Find Spearman’s correlation coefficient between television sale and profit. Round your answer to three decimal places.

11. Find Spearman’s correlation coefficient between x and y. Round your answer to three decimal places.

X   4   7   8   5   8   12
Y   7   6   6   4   6   10

12. In a study of the relation between students’ grades in mathematics and science, the following results

were found for six students.

MATHEMATICS   D   B   A   B   D   D
SCIENCE       C   C   B   A   C   F
QUESTION: Find the Spearman’s correlation coefficient. Round your answer to three decimal places.
13. The following table represents the relation between the results of employees’ appraisal this year and last

year.

LAST YEAR   Meets expectations     Needs improvement    Exceptional   Meets expectations   Exceeds expectations
THIS YEAR   Exceeds expectations   Meets expectations   Exceptional   Needs improvement    Exceeds expectations
QUESTION: Find the Spearman’s correlation coefficient between last and current year’s results.

14. In a study to discover the relationship between the age of a mother and the number of her children, the

following data were found.

MOTHER’S AGE         19   22   24   28   29   32   34   35
NUMBER OF CHILDREN    2    1    1    2    3    4    3    5
QUESTION: Find Spearman’s correlation coefficient. Round your answer to three decimal places.

15. The table shows the relation between two variables, L and M. Find Spearman’s correlation coefficient between them, rounded to the nearest thousandth.

L    50    90   75   150    63   35   75    90   50
M   130   130   64    80   100   55   80   100   80
16. Using the information from the table, find the Spearman’s rank correlation coefficient and determine the

type of correlation between the age of a mother and number of children. Give the numerical part of your

answer to four decimal places.

AGE OF MOTHERS (YEARS)   24   27   35   24   35   21   35   33
NO. OF CHILDREN (n)       1    4    3    5    3    1    2    4

a. 0.2113, direct correlation

b. 0.1845, inverse correlation

c. -0.2113, inverse correlation

d. 0.1845, direct correlation

17. Using the information given in the table, find the Spearman’s rank correlation between the variables x

and y. Give your answer to four decimal places.

X   Good   Excellent   Good   Excellent   Excellent   Excellent
Y   Poor   Good        Poor   Excellent   Very good   Good

18. Using the information from the table, find the Spearman’s rank correlation coefficient and determine the

correlation between the variables x and y. Give the numerical part of your answer to four decimal places.

X   14    9   10   13    6    9
Y   21   20   23   16   20   16
a. 0.2714, inverse correlation

b. 0.3, inverse correlation

c. 0.2714, direct correlation

d. 0.3, direct correlation

19. Which of the following values cannot represent a correlation coefficient?

a. r= 1.08

b. r= 0.95
c. r= 0

d. r= 1.0

20. Compute the value of Pearson r correlation coefficient for the data below:

X   -2   -5    3
Y    7   -1    2
a. r= 0.002

b. r=0.235

c. r= -0.002

d. r= -0.235
