
Choosing the test

Use the decision table below to choose the test; further details follow.

How many dichotomous+ (binary) variables?

0  Both variables interval or ratio?
     Y  Measures are linear? (No = monotonic*)
          Y  Pearson correlation
          N  Spearman correlation
     N  Both variables are ordinal?
          Y  Kendall correlation
          N  Both variables can be ranked?
               Y  Kendall correlation
               N  Convert to frequency data and use Chi-square test for
                  independence

1  Biserial Correlation Coefficient

2  2 x 2 table?
     Y  Phi
     N  Cramer's V
   Data has frequency values for each category?
     Y  Chi-square test for independence

+ dichotomous = 'can have only two values' (e.g. yes/no or 0/1).
* monotonic = always increasing or always decreasing.
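The chi-square test for independence at the bottom of the table runs directly on a frequency (contingency) table, and for a 2 x 2 table the Phi coefficient can be derived from it. A minimal sketch using SciPy's `chi2_contingency`; the counts here are invented purely for illustration:

```python
from math import sqrt
from scipy.stats import chi2_contingency

# Hypothetical 2 x 2 frequency table (illustrative numbers only)
table = [[30, 10],
         [20, 40]]

# correction=False gives the plain chi-square statistic,
# from which Phi follows for a 2 x 2 table
chi2, p, dof, expected = chi2_contingency(table, correction=False)
n = sum(sum(row) for row in table)
phi = sqrt(chi2 / n)  # Phi coefficient for a 2 x 2 table

print(f"chi2={chi2:.2f}, p={p:.4f}, phi={phi:.2f}")
```

A small p value indicates the two categorical variables are unlikely to be independent.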

Discussion
A correlation coefficient ranges between -1 and 1.
A positive coefficient indicates that as one variable increases, the other also increases. A negative
coefficient indicates that as one variable increases, the other decreases.
0 indicates no relationship between the two variables.
1 or -1 indicates a perfect linear relationship, such that if one variable is known, the other can be
accurately predicted.
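These interpretations can be checked numerically. A brief sketch with NumPy, using invented data, where perfect positive and negative linear relationships give coefficients of 1 and -1:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# Perfect positive linear relation: y increases exactly as x increases
r_pos = np.corrcoef(x, 2 * x + 1)[0, 1]

# Perfect negative linear relation: y decreases exactly as x increases
r_neg = np.corrcoef(x, -3 * x + 10)[0, 1]

print(r_pos, r_neg)  # 1.0 and -1.0 (up to floating-point rounding)
```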

Correlation
Definition
Correlation of two variables is a measure of the degree to which they vary together.
More accurately, correlation is the covariation of standardized variables.
In positive correlation, as one variable increases, so also does the other.
In negative correlation, as one variable increases, the other variable decreases.

Correlation can be visually displayed in a Scatter Diagram.


Correlation is a descriptive statistic, as it simply describes data, telling you something about it.
This is in contrast to inferential statistics.

Coefficients
A correlation coefficient is a calculated number that indicates the degree of correlation
between two variables:
 Perfect positive correlation usually is calculated as a value of 1 (or 100%).
 Perfect negative correlation usually is calculated as a value of -1.
 A value of zero shows no correlation at all.

A simple form of correlation is to calculate regression coefficients, m and c, so a line can be
drawn on a scatter diagram with the equation y = mx + c. These coefficients are often
calculated with the method of least squares.
The problem with simple regression coefficients is that they are tied to the units in which the
variables are measured, which does not make them very portable. Correlation coefficients
compensate for this by standardizing both measurement scales.
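A minimal sketch of both ideas with NumPy, on invented data: `np.polyfit` gives the unit-dependent regression coefficients m and c, while standardizing both variables first yields the unit-free correlation coefficient:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Least-squares regression line y = mx + c (m is in units of y per unit of x)
m, c = np.polyfit(x, y, 1)

# Correlation as the covariation of standardized variables (unit-free)
zx = (x - x.mean()) / x.std()
zy = (y - y.mean()) / y.std()
r = np.mean(zx * zy)

print(f"m={m:.2f}, c={c:.2f}, r={r:.3f}")
```

Rescaling x or y (say, from metres to centimetres) changes m and c but leaves r unchanged, which is exactly the portability the text describes.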
Three common types of correlation are Pearson, Spearman (for ranked data) and Kendall (for
uneven or multiple rankings); the appropriate one can be selected using the table below.

Parametric? (interval data, with normal distribution and a linear relationship between x and y)
Y  Pearson correlation
N  Equidistant positions on variables measured?
     Y  Spearman correlation
     N  Kendall correlation
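All three coefficients are available in SciPy. A quick sketch on invented data that is monotonic but not linear, where the rank-based Spearman and Kendall coefficients report a perfect relationship while Pearson does not:

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr, kendalltau

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = x ** 3  # monotonic but not linear

r_pearson, _ = pearsonr(x, y)    # assumes a linear relationship
r_spearman, _ = spearmanr(x, y)  # rank-based: sees the monotonic relation
tau, _ = kendalltau(x, y)        # rank-based: sees the monotonic relation

print(f"Pearson={r_pearson:.3f}, Spearman={r_spearman:.3f}, Kendall={tau:.3f}")
```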

The causality trap


It is a very common trap to assume that correlation shows that changing one
variable causes the other to change. In practice, when there is no direct causal link, this can be
coincidence, but usually it is because both variables have a common cause. For example, sales of
ice-cream correlate with drownings in the sea -- because both increase with fine weather.
