R4 Lecture 3
TWO SAMPLES*                MORE THAN TWO SAMPLES*
t-TEST (P; S; I)            ANOVA (P)
MANN-WHITNEY (NP; I)        KRUSKAL-WALLIS TEST (NP; I)
PAIRED t-TEST (P; D)        FRIEDMAN TEST (NP; D)
WILCOXON TEST (NP; D)

RELATIONSHIPS: PEARSON CORRELATION (P); SPEARMAN'S RANK CORRELATION (NP); REGRESSION ANALYSIS

* = OR POPULATIONS; OR TREATMENTS
P = PARAMETRIC; NP = NONPARAMETRIC; I = INDEPENDENT; D = DEPENDENT OR RELATED
S = SMALL SAMPLE; L = LARGE SAMPLE
CORRELATION BETWEEN TWO VARIABLES
Consider the following x and y variables:
X-VARIABLE Y-VARIABLE1 Y-VARIABLE2 Y-VARIABLE3
1 1 3 10
2 2 9 9
3 3 1 8
4 4 5 7
5 5 1 6
6 6 10 5
7 7 6 4
8 8 1 3
9 9 4 2
10 10 10 1
[Figure: three scatterplots (y-axis 0-12, x-axis 0-15). Y-variable1 against X shows POSITIVE CORRELATION; Y-variable2 against X shows NO CORRELATION; Y-variable3 against X shows NEGATIVE CORRELATION.]
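As a quick check on the table above, here is a minimal Python sketch (my addition, not from the lecture) that computes Pearson's r for each y-variable using scipy:

```python
# Pearson's r for the three y-variables in the table above.
# scipy.stats.pearsonr returns (r, two-sided p-value).
from scipy.stats import pearsonr

x  = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]   # positive correlation (r = +1)
y2 = [3, 9, 1, 5, 1, 10, 6, 1, 4, 10]  # little correlation (r about +0.2)
y3 = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]   # negative correlation (r = -1)

for name, y in [("Y1", y1), ("Y2", y2), ("Y3", y3)]:
    r, p = pearsonr(x, y)
    print(f"{name}: r = {r:+.3f}, p = {p:.4f}")
```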
CORRELATION BETWEEN TWO VARIABLES
[Figure: scatterplot of the DEPENDENT VARIABLE (PULSE RATE, y-axis 0-12) against the INDEPENDENT VARIABLE (CAFFEINE CONSUMPTION, x-axis 1-10).]
REGRESSION VS. CORRELATION
[Figure: two scatterplots side by side. Left (CORRELATION): Y-variable1 against X-variable. Right (REGRESSION): PULSE RATE (dependent) against CAFFEINE CONSUMPTION (independent).]
RESEARCH QUESTION: FUNCTIONAL RELATIONSHIP
PEARSON CORRELATION (P)
SPEARMAN'S RANK CORRELATION (NP)
REGRESSION ANALYSIS
CORRELATION BETWEEN TWO VARIABLES
[Figure: the three scatterplots of Y-variable1, Y-variable2, and Y-variable3 against X again (y-axis 0-12, x-axis 0-15).]
VALUE OF r (+ OR -) MEANING
0.00 to 0.19 A very weak correlation
0.20 to 0.39 A weak correlation
0.40 to 0.69 A modest correlation
0.70 to 0.89 A strong correlation
0.90 to 1.00 A very strong correlation
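A hypothetical helper (my own encoding, not the lecture's) that maps r to the verbal scale above; the sign of r gives the direction:

```python
# Map |r| to the verbal scale in the table above.
def interpret_r(r: float) -> str:
    labels = [(0.20, "very weak"), (0.40, "weak"), (0.70, "modest"),
              (0.90, "strong"), (1.001, "very strong")]
    for cutoff, label in labels:
        if abs(r) < cutoff:
            return f"{label} {'positive' if r >= 0 else 'negative'} correlation"
    raise ValueError("r must lie in [-1, 1]")

print(interpret_r(0.83))   # strong positive correlation
print(interpret_r(-0.15))  # very weak negative correlation
```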
REQUIREMENTS OF PEARSON CORRELATION ANALYSIS
HYPOTHESIS TESTING IN PEARSON CORRELATION ANALYSIS
1. Hypotheses:
Ho: ρ = 0
H1: ρ ≠ 0
RQ: (Generic) Is the correlation between the two variables (x and y) statistically significant? (Example) Is there a significant correlation between otolith length and mass of fish in the Indian River Lagoon under investigation?
2. Test-statistic: t = r√(n − 2) ÷ √(1 − r²), with n − 2 degrees of freedom
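A minimal sketch of this test statistic in Python; the values of r and n below are made-up placeholders, not data from the lecture:

```python
# Pearson test statistic: t = r * sqrt(n - 2) / sqrt(1 - r^2),
# compared against Student's t with n - 2 df.
from math import sqrt
from scipy.stats import t as t_dist

r, n = 0.85, 12                            # hypothetical sample r and size
t_stat = r * sqrt(n - 2) / sqrt(1 - r**2)
p = 2 * t_dist.sf(abs(t_stat), df=n - 2)   # two-sided p-value
print(f"t = {t_stat:.3f}, df = {n - 2}, p = {p:.4f}")  # reject Ho if p < 0.05
```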
HYPOTHESIS TESTING IN SPEARMAN'S RANK CORRELATION
1. Hypotheses:
Ho: rs = 0
H1: rs ≠ 0
RQ: (Generic) Is the correlation between the two variables (x and y) statistically significant? (Example) Is there a significant correlation between water hardness and number of Plecoptera nymphs?
2. Test-statistic: rs, compared against critical values of Spearman's rank correlation
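A minimal sketch using scipy's spearmanr for the water-hardness question; the hardness and nymph counts below are made-up placeholders, not real data:

```python
# Spearman's rank correlation with a two-sided p-value.
from scipy.stats import spearmanr

hardness = [42, 55, 61, 70, 82, 95, 110, 130]  # hypothetical water hardness
nymphs   = [30, 28, 25, 22, 18, 14, 9, 5]      # hypothetical nymph counts

rs, p = spearmanr(hardness, nymphs)
print(f"rs = {rs:+.3f}, p = {p:.4f}")  # reject Ho: rs = 0 if p < 0.05
```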
FUNCTIONAL RELATIONSHIP: PEARSON CORRELATION (P); SPEARMAN'S RANK CORRELATION (NP); REGRESSION ANALYSIS
REGRESSION BETWEEN TWO VARIABLES
RECALL THE FOLLOWING FEATURES OF REGRESSION ANALYSIS:
[Figure: scatterplot of the DEPENDENT VARIABLE (y-axis 0-10) against the INDEPENDENT VARIABLE (x-axis 1-10).]
SOME USES OF REGRESSION ANALYSIS
1. To construct a line through the points of a scattergram (called the REGRESSION [or least-squares] LINE).
2. To approximate an equation that describes the linear relationship between the 2 variables (called the REGRESSION EQUATION).
3. To predict values of the dependent variable (y) at various values of the independent variable (x).
4. To estimate the extent to which the dependent variable (y) is under the control of the independent variable (x) (called the COEFFICIENT OF DETERMINATION).
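A short Python sketch (with assumed caffeine/pulse values, not the lecture's data) showing all four uses via scipy.stats.linregress:

```python
# Fit the least-squares line, report the regression equation,
# predict y at a new x, and give the coefficient of determination r^2.
from scipy.stats import linregress

caffeine = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]   # hypothetical x (independent)
pulse    = [2, 3, 3, 5, 6, 6, 8, 8, 10, 11]  # hypothetical y (dependent)

fit = linregress(caffeine, pulse)
print(f"y = {fit.intercept:.3f} + {fit.slope:.3f}x")           # regression equation
print(f"predicted y at x = 7.5: {fit.intercept + fit.slope * 7.5:.2f}")
print(f"coefficient of determination r^2 = {fit.rvalue**2:.3f}")
```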
REGRESSION BETWEEN TWO VARIABLES
[Figure: scatterplot of PULSE RATE (dependent, y-axis 0-12) against CAFFEINE CONSUMPTION (independent, x-axis 1-10) with a fitted line; the line's rise over run = SLOPE = b.]
REGRESSION BETWEEN TWO VARIABLES
The fitted line is described by y = a + bx, where:
a = intercept = the value of y when x is zero
b = slope = the regression coefficient
REGRESSION BETWEEN TWO VARIABLES

Coefficients(a)
                Unstandardized Coefficients   Standardized Coefficients
Model           B          Std. Error         Beta         t        Sig.
1 (Constant)    2.139      1.917                           1.116    .301
  TEMP          1.775      .170               .969         10.421   .000
a. Dependent Variable: HRT_RATE

Y = 2.139 + 1.775(X)
intercept = 2.139 (the Constant's B); slope = 1.775 (the B for TEMP)
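Reading the fitted equation off the output above, a one-line prediction sketch (the TEMP value 8.0 is an arbitrary illustration):

```python
# Predict HRT_RATE from TEMP using Y = 2.139 + 1.775(X),
# with a and b taken from the Coefficients table above.
a, b = 2.139, 1.775
temp = 8.0
hrt_rate = a + b * temp
print(f"predicted heart rate at temp {temp}: {hrt_rate:.2f}")  # 16.34
```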
HYPOTHESIS TESTING IN CORRELATION ANALYSIS
1. Hypotheses:
Ho: ρ = 0
H1: ρ ≠ 0
RQ: (Generic) Is there a statistically significant linear relationship between the two variables (x and y)? (Example) Is there a significant (cause-effect) relationship between heart rate and temperature?
2. Test-statistic:
calculated t = 10.421 (the t for the TEMP slope in the output above; in simple regression this same t tests the correlation)
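A sketch of the slope t-test behind the Sig. column: t = b ÷ SE(b), with a two-sided p-value from Student's t. The output does not show n, so df = 8 here is an assumption for illustration:

```python
# t = b / SE(b) from the Coefficients table above; p from Student's t.
from scipy.stats import t as t_dist

b, se_b, df = 1.775, 0.170, 8           # df is an assumed value
t_stat = b / se_b                       # 10.44, matching SPSS's 10.421 up to rounding
p = 2 * t_dist.sf(abs(t_stat), df=df)   # the "Sig." column, here ~ .000
print(f"t = {t_stat:.3f}, p = {p:.6f}")
```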
HYPOTHESIS TESTING IN REGRESSION ANALYSIS
Coefficients(a)
                Unstandardized Coefficients   Standardized Coefficients
Model           B          Std. Error         Beta         t        Sig.
1 (Constant)    -298.878   117.943                         -2.534   .039
  Length        55.451     13.982             .832         3.966    .005
a. Dependent Variable: Mass

ANOVA(b)
Model           Sum of Squares   df   Mean Square   F        Sig.
1 Regression    37211.793        1    37211.793     15.729   .005(a)
  Residual      16560.430        7    2365.776
  Total         53772.222        8
a. Predictors: (Constant), Length
b. Dependent Variable: Mass
HYPOTHESIS TESTING IN REGRESSION ANALYSIS USING ANOVA
(THE ONLY WAY TO GO FOR MODEL 2 (REDUCED MAJOR AXIS) REGRESSION)
1. Hypotheses:
Ho: β = 0
H1: β ≠ 0
RQ: Is there a significant linear relationship between mass of fish and otolith length?
2. Test-statistic: F (as in ANOVA)
F = Regression MS (Variance) ÷ Residual MS (Variance)
calculated F = 37211.793 ÷ 2365.776 = 15.729
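A minimal sketch reproducing the F-ratio and its p-value; the mean squares and the (1, 7) df come straight from the ANOVA table above:

```python
# F = Regression MS / Residual MS, with p from the F distribution.
from scipy.stats import f as f_dist

regression_ms, residual_ms = 37211.793, 2365.776
df1, df2 = 1, 7                    # df from the ANOVA table
F = regression_ms / residual_ms    # 15.729
p = f_dist.sf(F, df1, df2)         # ~ .005, the "Sig." column
print(f"F = {F:.3f}, p = {p:.4f}")
```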
HOW TO CHOOSE THE APPROPRIATE HYPOTHESIS TEST CONCERNING LINEAR RELATIONSHIPS BETWEEN TWO VARIABLES
1A. Observation is discrete, derived, or ordinal scale → 2
1B. Observation is continuous, interval or ratio scale → 3A or 3B
2. Transform data → 3A or 3B
3A. To test for association between variables x and y, use CORRELATION ANALYSIS → 4A or 4B
3B. To test for "cause-effect" or functional relationship between variables x and y, use REGRESSION ANALYSIS → 5A or 5B
4A. Variables are normally distributed → USE PEARSON CORRELATION
4B. Variables are not normally distributed → USE SPEARMAN CORRELATION
5A. Independent variable (x) is FIXED by the investigator and the population of the dependent variable (y) is normally distributed for any value of x (variances of the residuals homogeneous across all values of x) → USE MODEL 1 REGRESSION (Least-Squares Regression)
5B. Both variables x and y are random variables (i.e., independent variable (x) is NOT FIXED by the investigator) → USE MODEL 2 REGRESSION (Reduced Major Axis Regression)
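A hypothetical helper (my own encoding, not part of the lecture) that walks the correlation branch (3A-4B) and the regression branch (3B-5B) of the key above:

```python
# Encode steps 3A-5B of the decision key.
def choose_test(goal: str, normal: bool = True, x_fixed: bool = True) -> str:
    if goal == "association":                        # step 3A
        return ("PEARSON CORRELATION" if normal      # step 4A
                else "SPEARMAN CORRELATION")         # step 4B
    if goal == "cause-effect":                       # step 3B
        return ("MODEL 1 (LEAST-SQUARES) REGRESSION" if x_fixed   # step 5A
                else "MODEL 2 (REDUCED MAJOR AXIS) REGRESSION")   # step 5B
    raise ValueError("goal must be 'association' or 'cause-effect'")

print(choose_test("association", normal=False))    # SPEARMAN CORRELATION
print(choose_test("cause-effect", x_fixed=False))  # MODEL 2 (REDUCED MAJOR AXIS) REGRESSION
```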
MODEL 2 REGRESSION OR
REDUCED MAJOR AXIS (RMA)
[Figure: scatterplot of the fish mass vs. otolith length data (x-axis 50-300).]
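The lecture does not show the RMA arithmetic, so here is a minimal sketch under made-up data: the RMA slope is sign(r) × (sy ÷ sx) and the intercept is ȳ − b·x̄ (the length/mass values below are placeholders, not the fish data from the tables):

```python
# Reduced major axis (Model 2) regression: slope = sign(r) * (s_y / s_x),
# intercept = ybar - slope * xbar.
import numpy as np

x = np.array([55.0, 72.0, 98.0, 130.0, 160.0, 210.0, 255.0])   # hypothetical lengths
y = np.array([40.0, 95.0, 150.0, 210.0, 260.0, 390.0, 430.0])  # hypothetical masses

r = np.corrcoef(x, y)[0, 1]
b_rma = np.sign(r) * y.std(ddof=1) / x.std(ddof=1)  # RMA slope
a_rma = y.mean() - b_rma * x.mean()                 # RMA intercept
print(f"RMA line: y = {a_rma:.2f} + {b_rma:.3f}x (r = {r:.3f})")
```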
[Figure: scatterplot of Yield of Grass (g/m2) against x (both axes 0-300) with fitted regression line y = 51.93 + 0.8114x.]
DEALING WITH CURVED RELATIONSHIPS

x (Weeks)   y
2           541
4           116
6           58
8           27
10          6
12          3

[Figure: scatterplot of y (y-axis 0-600) against Weeks (x-axis 0-14) showing a curved relationship.]
Transforming y to log(y) straightens the curve; regressing log(y) on x gives:
y' = 3.088 + (-0.2210)x

x    y     log(y)
2    541   2.7332
4    116   2.0645
6    58    1.7634
8    27    1.4314
10   6     0.7782
12   3     0.4771
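The transformed fit above can be reproduced with numpy by regressing log10(y) on x:

```python
# Fit a straight line to (x, log10(y)); polyfit returns (slope, intercept)
# for degree 1, matching y' = 3.088 + (-0.2210)x up to rounding.
import numpy as np

weeks = np.array([2, 4, 6, 8, 10, 12])
y     = np.array([541, 116, 58, 27, 6, 3])

slope, intercept = np.polyfit(weeks, np.log10(y), 1)
print(f"y' = {intercept:.3f} + ({slope:.4f})x")  # y' = 3.088 + (-0.2210)x
```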
TRANSFORMATION OF BOTH y AND x