
Testing if a relationship occurs between two variables using correlation
How does one variable respond to changes in another variable?
• Lichen is sensitive to SO2
• e.g. growth of lichen vs. air pollution
• Growth determined by max length
• Pollution indicated by the distance from a town center (0-10 km)
[Figure: scatter plot of max length (mm) against distance from the town center (km)]

Evernia prunastri

• Decreasing thallus size as the town center is approached
• A gap in the data between 4 and 6 km
• Any outlier(s)?
• A statistical technique termed CORRELATION enables us to quantify the relationship between two variables
• Calculation of a correlation coefficient r (range from –1 to +1)
• r → +1 : +ve correlation
• r → 0 : no correlation
• r → –1 : -ve correlation
[Figure: scatter plot of max length (mm) against distance from the town center (km)]
[Figure: four scatter-plot panels (A, B, C, D), each plotting X2 against X1]
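Covariance
$$\mathrm{Covar}(x_1, x_2) = \frac{\sum (x_1 - \bar{x}_1)(x_2 - \bar{x}_2)}{n - 1}$$
Example with three points, using their deviations from the mean point $(\bar{X}_1, \bar{X}_2)$:
$$\mathrm{Covar}(x_1, x_2) = \frac{(-6)(4) + (5)(3) + (-2)(-10)}{3 - 1} = 5.5$$
[Figure: three points on X1 and X2 axes, with their deviations from the mean point marked]
Pearson's product-moment correlation coefficient (r) is obtained by dividing the covariance by a pooled standard deviation (S_X1, S_X2)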
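The covariance arithmetic above can be checked with a short sketch. The deviation pairs are the three from the slide; the function name `covariance` and the small extra data set in the usage line are illustrative assumptions, not part of the original material.

```python
def covariance(x1, x2):
    """Sample covariance: sum of cross-products of deviations / (n - 1)."""
    n = len(x1)
    mean1 = sum(x1) / n
    mean2 = sum(x2) / n
    cross_products = sum((a - mean1) * (b - mean2) for a, b in zip(x1, x2))
    return cross_products / (n - 1)


# Deviation pairs taken directly from the slide's three-point example.
deviation_pairs = [(-6, 4), (5, 3), (-2, -10)]
slide_cov = sum(d1 * d2 for d1, d2 in deviation_pairs) / (len(deviation_pairs) - 1)
print(slide_cov)

# Hypothetical raw data, only to show the general function in use.
print(covariance([1, 2, 3], [2, 4, 6]))
```

The slide example works from deviations that are already given, so it skips the mean-subtraction step that `covariance` performs on raw values.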
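Correlation coefficient r
= covariance / pooled standard deviation
$$r = \frac{\sum x_1 x_2 - \dfrac{\sum x_1 \sum x_2}{n}}{\sqrt{\left[\sum x_1^2 - \dfrac{(\sum x_1)^2}{n}\right]\left[\sum x_2^2 - \dfrac{(\sum x_2)^2}{n}\right]}}$$
(convenient form)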

• The coefficient is also called Pearson's r
• It measures the strength of association between a pair of variables and
• tests whether the association is greater than can be expected by chance
• But it is dangerous to read too much into a significant correlation. Why?
Significant negative correlation between number of trees
and number of sick people within individual regions
(r = -0.981, p < 0.001).

Is this conclusion right???


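[Newspaper clipping, translated from Chinese]
Medical Frontline
Higher education linked to higher breast cancer risk

Since 1994, breast cancer has overtaken lung cancer as the most common cancer among women in Hong Kong. The number of new cases has also risen twofold compared with twenty years ago. Besides patients tending to be younger, on average one in every twenty-four women is a breast cancer patient. Women should no longer underestimate their chance of developing breast cancer....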
       X1             X2
       Distance (km)  Length (mm)  X1^2     X2^2      X1X2
       3.16           11.60        9.99     134.56    36.66
       6.10           20.10        37.21    404.01    122.61
       7.15           20.80        51.12    432.64    148.72
       1.76           3.80         3.10     14.44     6.69
       3.61           18.00        13.03    324.00    64.98
       8.18           23.20        66.91    538.24    189.78
       9.31           27.30        86.68    745.29    254.16
       6.64           16.15        44.09    260.82    107.24
       1.05           3.80         1.10     14.44     3.99
       6.90           11.10        47.61    123.21    76.59
Sum    53.86          155.85       360.84   2991.65   1011.41
n = 10

[Figure: scatter plot of max length (mm) against distance from the town center (km)]

Ho: r = 0
$$r = \frac{1011.41 - \dfrac{53.86 \times 155.85}{10}}{\sqrt{\left[360.84 - \dfrac{(53.86)^2}{10}\right]\left[2991.65 - \dfrac{(155.85)^2}{10}\right]}} = 0.862$$
DF = n – 2 = 8
Critical r 0.05(2), 8 = 0.632 (Table B17); the test is usually 2-tailed
P < 0.01
In conclusion, maximum thallus size of the lichen increases as the distance from the town center increases (r = 0.862, p < 0.01, n = 10). That is, there is a significant positive association between these two variables.
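The worked example can be reproduced from the raw table with a minimal sketch of the convenient form of r. The data are the ten distance and length pairs from the slide; the function name `pearson_r` is an illustrative assumption.

```python
import math

# Distance from the town center (km) and max thallus length (mm), from the slide.
distance = [3.16, 6.10, 7.15, 1.76, 3.61, 8.18, 9.31, 6.64, 1.05, 6.90]
length = [11.60, 20.10, 20.80, 3.80, 18.00, 23.20, 27.30, 16.15, 3.80, 11.10]


def pearson_r(x1, x2):
    """Pearson's r using the sums-of-squares ('convenient') form."""
    n = len(x1)
    sum1, sum2 = sum(x1), sum(x2)
    sum_sq1 = sum(a * a for a in x1)
    sum_sq2 = sum(b * b for b in x2)
    sum_prod = sum(a * b for a, b in zip(x1, x2))
    numerator = sum_prod - sum1 * sum2 / n
    denominator = math.sqrt((sum_sq1 - sum1 ** 2 / n) * (sum_sq2 - sum2 ** 2 / n))
    return numerator / denominator


r = pearson_r(distance, length)
critical_r = 0.632  # critical r 0.05(2), DF = 8, as quoted on the slide
print(round(r, 3), abs(r) > critical_r)
```

Comparing |r| with the tabulated critical value mirrors the slide's decision rule; it does not compute an exact p value.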
Assumptions for Pearson's product-moment correlation analysis
• As a parametric test, it is based on a bivariate normal distribution
  – i.e. both variables should be normally distributed
• Measurements must be taken at the interval or ratio scale level
• If these criteria are not fulfilled,
  – Transform the data to normality OR
  – Use a non-parametric test: Spearman's rank correlation analysis
Spearman's rank correlation
• Non-parametric test
• Ranking is involved
• The method is modified where there are tied observations
• Ho: no association between the two variables (i.e. rs = 0), i.e. rs differs from zero only by chance
Example: Spearman's test
• In the UK, a qualitative scale is available for the assessment of SO2 pollution using the presence of lichens.
• The scale ranges from 0 to 10: 0 = lichens absent, 5 = moderately intolerant spp., 10 = very sensitive spp.
Test whether there is a correlation between smoke level (μg/m3) and lichen scale
Smoke level   Lichen scale   Smoke rank   Lichen rank   Rank difference (d)   d^2
89            0              8            1             7                     49
29            9              3            8             -5                    25
43            3              6            3             3                     9
102           4              9            4             5                     25
32            6              4            5             -1                    1
85            1              7            2             5                     25
22            7              2            6             -4                    16
33            8              5            7             -2                    4
20            10             1            9             -8                    64
n = 9                                                   Sum =                 218

Spearman's rs
$$r_s = 1 - \frac{6 \sum d^2}{n^3 - n} = 1 - \frac{6(218)}{9^3 - 9} = -0.817$$
Critical rs 0.05(2), 9 = 0.7 (Table B20), p = 0.001

In conclusion, there is a significant negative association between the lichen scale and smoke level (rs = -0.817, p < 0.01, n = 9)
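The ranking and the rs calculation can be sketched in a few lines. The data are the nine smoke and lichen pairs from the slide; the helper names are illustrative assumptions, and the ranking helper is only valid when there are no ties, which holds for this data set.

```python
# Smoke level and lichen scale for the nine sites, from the slide.
smoke = [89, 29, 43, 102, 32, 85, 22, 33, 20]
lichen = [0, 9, 3, 4, 6, 1, 7, 8, 10]


def ranks_no_ties(values):
    """Rank from 1 (smallest) to n; assumes all values are distinct."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


def spearman_rs(x1, x2):
    """Spearman's rs without tie correction; also returns the sum of d^2."""
    n = len(x1)
    rank1, rank2 = ranks_no_ties(x1), ranks_no_ties(x2)
    sum_d_sq = sum((a - b) ** 2 for a, b in zip(rank1, rank2))
    return 1 - 6 * sum_d_sq / (n ** 3 - n), sum_d_sq


rs, sum_d_sq = spearman_rs(smoke, lichen)
print(sum_d_sq, round(rs, 3))
```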
Modifications for tied values (p. 396)

Rank X1   Rank X2   Rank diff (d)   d^2
7.5       1         6.5             42.25
3         8         -5              25
6         3         3               9
9         4         5               25
4         6         -2              4
7.5       2         5.5             30.25
2         6         -4              16
5         6         -1              1
1         9         -8              64
n = 9               Sum =           216.5

$$r_s = \frac{\dfrac{n^3 - n}{6} - \sum d^2 - \sum T_{x1} - \sum T_{x2}}{\sqrt{AB}}$$
$$A = \frac{n^3 - n}{6} - 2\sum T_{x1}, \qquad B = \frac{n^3 - n}{6} - 2\sum T_{x2}$$
$$\sum T_{x1} = \sum \frac{t^3 - t}{12} = \frac{2^3 - 2}{12} = 0.5, \qquad \sum T_{x2} = \frac{3^3 - 3}{12} = 2$$
rs = -0.843 (p < 0.01)
Reject Ho
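A minimal sketch of the tie-corrected formula follows, working directly from the ranks in the table (the raw observations are not given on the slide). The function names `tie_term` and `spearman_rs_ties` are illustrative assumptions.

```python
import math
from collections import Counter

# Ranks as tabulated on the slide (two values tied in X1, three tied in X2).
rank_x1 = [7.5, 3, 6, 9, 4, 7.5, 2, 5, 1]
rank_x2 = [1, 8, 3, 4, 6, 2, 6, 6, 9]


def tie_term(ranks):
    """Sum of (t^3 - t)/12 over every group of t tied ranks."""
    return sum((t ** 3 - t) / 12 for t in Counter(ranks).values() if t > 1)


def spearman_rs_ties(rank1, rank2):
    """Spearman's rs with the correction for tied ranks."""
    n = len(rank1)
    base = (n ** 3 - n) / 6
    sum_d_sq = sum((a - b) ** 2 for a, b in zip(rank1, rank2))
    t1, t2 = tie_term(rank1), tie_term(rank2)
    a_term = base - 2 * t1
    b_term = base - 2 * t2
    return (base - sum_d_sq - t1 - t2) / math.sqrt(a_term * b_term)


rs_corrected = spearman_rs_ties(rank_x1, rank_x2)
print(round(rs_corrected, 3))
```

When there are no ties both tie terms are zero and the expression reduces to the uncorrected rs = 1 - 6Σd²/(n³ - n).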
Problems of multiple tests

Correlation (r) and p value for each pair of characters
Character         1        2        3        4
2    r            0.708
     p value      0.010
3    r            0.750    0.398
     p value      0.005    0.200
4    r            0.216    0.823    0.795
     p value      0.500    0.001    0.002
5    r            0.497    0.398    0.658    0.658
     p value      0.100    0.200    0.020    0.020

Total 10 tests (i.e. k = 10)

• Multiple tests increase the chance of wrongly rejecting Ho (Type I error)
• For k = 20 tests at α = 0.05, about 1 test is expected to turn out significant by chance alone
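The size of the problem can be illustrated with a short sketch. It assumes the k tests are independent and that every Ho is true, which is a simplifying assumption not stated on the slide; the function names are illustrative.

```python
def familywise_error(alpha, k):
    """Probability of at least one false rejection in k independent tests."""
    return 1 - (1 - alpha) ** k


def expected_false_positives(alpha, k):
    """Expected number of tests significant by chance when every Ho is true."""
    return alpha * k


for k in (1, 10, 20):
    print(k, round(familywise_error(0.05, k), 3), expected_false_positives(0.05, k))
```

With α = 0.05 the expected count reaches one false positive at k = 20, which is the point made in the last bullet.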
Problems of multiple tests: Sequential Bonferroni Correction

The 10 p values from the character correlation matrix (n = 12, df = 10, k = 10 tests), ranked from smallest to largest:

i     p value for r   alpha    1+k-i   Sequential Bonferroni corrected critical p value, alpha/(1+k-i)
1     0.001           0.050    10      0.005
2     0.002           0.050    9       0.006
3     0.005           0.050    8       0.006
4     0.010           0.050    7       0.007
5     0.020           0.050    6       0.008
6     0.020           0.050    5       0.010
7     0.100           0.050    4       0.013
8     0.200           0.050    3       0.017
9     0.200           0.050    2       0.025
10    0.500           0.050    1       0.050

• Rank P values from smallest to largest
• Calculate critical Pi ≡ α / (1 + k – i)
• Compare the sample P against the new Pi
• Can also be applied to other multiple tests
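The three steps can be sketched as a small function. The p values are the ten from the correlation matrix, in matrix order. Stopping at the first non-significant comparison is the usual convention for the sequential (Holm-type) procedure and is an assumption here, since the slide does not state a stopping rule; the function name is also illustrative.

```python
def sequential_bonferroni(p_values, alpha=0.05):
    """Return a True/False flag per test, in the original order of p_values."""
    k = len(p_values)
    # Step 1: rank the p values from smallest to largest.
    order = sorted(range(k), key=lambda idx: p_values[idx])
    significant = [False] * k
    for i, idx in enumerate(order, start=1):
        # Step 2: critical value for the i-th smallest p value.
        critical = alpha / (1 + k - i)
        # Step 3: compare; stop at the first failure (assumed convention).
        if p_values[idx] > critical:
            break
        significant[idx] = True
    return significant


# p values from the character correlation matrix on the slide.
p_values = [0.010, 0.005, 0.200, 0.500, 0.001, 0.002, 0.100, 0.200, 0.020, 0.020]
flags = sequential_bonferroni(p_values)
print([p for p, flag in zip(p_values, flags) if flag])
```

Under this rule only the three smallest p values remain significant, whereas seven of the ten are below an uncorrected 0.05.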
Concordance correlation
(p. 407-10, Zar 1999)

          X          Y
Sample    Method A   Method B
1         0.22       0.21
2         0.26       0.23
3         0.30       0.27
4         0.33       0.27
5         0.36       0.31
6         0.39       0.33
7         0.41       0.37
8         0.44       0.38
9         0.47       0.40
10        0.51       0.43
11        0.55       0.47

[Figure: scatter plot of [Pb] (ug/g) by Method B against [Pb] (ug/g) by Method A, both axes 0 to 0.6]
          X          Y
Sample    Method A   Method B   XY      X^2     Y^2
1         0.22       0.21       0.05    0.05    0.04
2         0.26       0.23       0.06    0.07    0.05
3         0.30       0.27       0.08    0.09    0.07
4         0.33       0.27       0.09    0.11    0.07
5         0.36       0.31       0.11    0.13    0.10
6         0.39       0.33       0.13    0.15    0.11
7         0.41       0.37       0.15    0.17    0.14
8         0.44       0.38       0.17    0.19    0.14
9         0.47       0.40       0.19    0.22    0.16
10        0.51       0.43       0.22    0.26    0.18
11        0.55       0.47       0.26    0.30    0.22

Sum       4.24       3.67       1.50    1.74    1.29
Mean      0.39       0.33

Corrected sums: sum xy = 0.086, sum x^2 = 0.1075, sum y^2 = 0.0705

(n-1)(mean X - mean Y)^2 = 0.026851

rc = 2(sum xy)/(sum x^2 + sum y^2 + 0.02685)
rc = 0.844638
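The concordance calculation can be reproduced from the two columns of measurements. This is a minimal sketch using the formula as written on the slide; the function name `concordance_rc` is an illustrative assumption.

```python
# [Pb] (ug/g) measured on the same 11 samples by two methods, from the slide.
method_a = [0.22, 0.26, 0.30, 0.33, 0.36, 0.39, 0.41, 0.44, 0.47, 0.51, 0.55]
method_b = [0.21, 0.23, 0.27, 0.27, 0.31, 0.33, 0.37, 0.38, 0.40, 0.43, 0.47]


def concordance_rc(x, y):
    """Concordance correlation: 2*Sxy / (Sxx + Syy + (n-1)*(mean_x - mean_y)^2)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return 2 * sxy / (sxx + syy + (n - 1) * (mean_x - mean_y) ** 2)


rc = concordance_rc(method_a, method_b)
print(round(rc, 4))
```

The mean-difference term in the denominator penalises a systematic offset between the methods, so rc can fall well below 1 even when the points lie close to a straight line.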
Key notes (1)
• Correlation is a technique used to clarify the relationships between two variables
• For a Pearson's product-moment correlation, the variables are assumed to follow the bivariate normal distribution
• The correlation coefficient r is a numerical measure of the strength of the relationship between two variables
Key notes (2)
• All correlation coefficients range from +1 to –1
• The Spearman rank correlation coefficient is used for non-normally distributed ordinal or interval/ratio measurements
• Always draw the scatter graph before obtaining a correlation coefficient (r)
Hospital for Aquatic Organisms
Ecotoxicologist
Individual responses to stresses

What is a biomarker?
Biological responses to a stress or stresses that give a measure of exposure and sometimes, also, of toxic effect
Biomarker of Aging: Telomeres
• Physical ends of linear eukaryotic chromosomes.
• Important functions: protection, replication, and stabilization of the chromosome ends.
• Contain lengthy repeated simple DNA sequences: all vertebrates have the same simple sequence, (TTAGGG)n.
• With each cell division telomeres become shorter.
• Shortened telomeres appear to lead to cell senescence and apoptosis.
Example: Analysis of telomere lengths in sheep
• Telomere length was measured as the mean size of the terminal restriction fragment (TRF) obtained by cutting with a restriction enzyme.
• Mean TRF decreased in control animals with increasing age, at a mean rate of 0.59 kilobases per year (t = 3.29; P < 0.01).
• Mean TRF sizes were smaller in all three cloned animals than in age-matched controls.

Nature 399:316-317 (1999)


Why t not r?
If critical values of r are not available, you can use the relationship below, which transforms your sample r to Student's t:

$$t = r\sqrt{\frac{n - 2}{1 - r^2}}$$

Suppose n = 65 and sample r = -0.216:

$$t = -0.216\sqrt{\frac{65 - 2}{1 - (0.216)^2}} = -1.756$$

Critical t 0.05(2), 63 = 1.999; since |t| < 1.999, do not reject Ho
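The transformation is easy to check numerically. This is a minimal sketch; the function name `r_to_t` is an illustrative assumption, and the critical value is the one quoted on the slide rather than computed.

```python
import math


def r_to_t(r, n):
    """Transform a sample correlation r (n pairs) to Student's t with n - 2 DF."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))


t = r_to_t(-0.216, 65)
critical_t = 1.999  # critical t 0.05(2), DF = 63, as quoted on the slide
print(round(t, 3), abs(t) > critical_t)
```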
