
RESEARCH QUESTION

FUNCTIONAL RELATIONSHIP
- ASSOCIATION → PEARSON CORRELATION (P); SPEARMAN’S RANK CORRELATION (NP)
- CAUSE AND EFFECT → REGRESSION ANALYSIS

DIFFERENCES
- BETWEEN DISTRIBUTIONS → CHI-SQUARE TESTS
- BETWEEN VARIANCES → LEVENE TEST
- BETWEEN AVERAGES
  - TWO SAMPLES*: t-TEST (P; S; I); MANN-WHITNEY (NP; I); PAIRED t-TEST (P; D); WILCOXON TEST (NP; D)
  - MORE THAN TWO SAMPLES*: ANOVA (P); KRUSKAL-WALLIS TEST (NP; I); FRIEDMAN TEST (NP; D)

* = OR POPULATIONS, OR TREATMENTS
P = PARAMETRIC; NP = NONPARAMETRIC; I = INDEPENDENT; D = DEPENDENT OR RELATED
S = SMALL SAMPLE; L = LARGE SAMPLE
CORRELATION BETWEEN TWO VARIABLES
Consider the following x and y variables:
X-VARIABLE Y-VARIABLE1 Y-VARIABLE2 Y-VARIABLE3
1 1 3 10
2 2 9 9
3 3 1 8
4 4 5 7
5 5 1 6
6 6 10 5
7 7 6 4
8 8 1 3
9 9 4 2
10 10 10 1

What can we say about the relationship between x and y?
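To make the three patterns concrete, here is a quick Python sketch (not part of the original slides) that computes Pearson's r for each y-variable against x:

```python
import numpy as np

# Data from the table above
x  = np.arange(1, 11)                            # 1, 2, ..., 10
y1 = np.arange(1, 11)                            # rises consistently with x
y2 = np.array([3, 9, 1, 5, 1, 10, 6, 1, 4, 10])  # no consistent pattern
y3 = np.arange(10, 0, -1)                        # falls consistently with x

for name, y in [("Y-variable1", y1), ("Y-variable2", y2), ("Y-variable3", y3)]:
    r = np.corrcoef(x, y)[0, 1]                  # sample Pearson correlation
    print(f"{name}: r = {r:+.2f}")
```

Y-variable1 gives r = +1.00 and Y-variable3 gives r = -1.00 (perfect positive and negative association); Y-variable2 gives a weak r of about +0.20.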


CORRELATION BETWEEN TWO VARIABLES

[Three scatter plots of Y-variable1, Y-variable2, and Y-variable3 against X-variable, illustrating POSITIVE CORRELATION, NO CORRELATION, and NEGATIVE CORRELATION, respectively.]
CORRELATION BETWEEN TWO VARIABLES

CORRELATION IS USED TO DETERMINE:

1. If an association between two variables exists; and
2. If so, how strong such an association is.

By association, we mean that if one variable changes, the other variable changes in some consistent way.

Note that in correlation, there is no assumption of a “cause-and-effect” relationship between the two variables.
REGRESSION BETWEEN TWO VARIABLES

In Regression Analysis, a “cause-and-effect” relationship between the two variables is determined. In such an analysis, a substantial proportion of the variation in one variable (called the DEPENDENT variable) can be explained by or attributed to the other variable (called the INDEPENDENT variable).

For example, we may want to investigate if caffeine consumption causes the pulse rate of FT Tech students to change.
REGRESSION BETWEEN TWO VARIABLES

[Scatter plot with a fitted line: INDEPENDENT VARIABLE (CAFFEINE CONSUMPTION) on the x-axis, DEPENDENT VARIABLE (PULSE RATE) on the y-axis.]
REGRESSION VS. CORRELATION

1. Correlation analysis determines an association between two variables x and y; regression analysis determines the “cause-and-effect” relationship between two variables x and y.
2. The graph in correlation analysis is a simple scatter plot of variables x and y; the graph in regression analysis includes a “line of best fit” through the variables x and y.
3. In regression analysis, the investigator must identify a dependent variable (y) and an independent variable (x); this is not necessary for correlation analysis.
REGRESSION VS. CORRELATION

[Left (CORRELATION): scatter plot of Y-variable1 against X-variable. Right (REGRESSION): scatter plot with a fitted line of PULSE RATE (dependent variable) against CAFFEINE CONSUMPTION (independent variable).]
RESEARCH QUESTION

FUNCTIONAL RELATIONSHIP
- ASSOCIATION → PEARSON CORRELATION (P); SPEARMAN’S RANK CORRELATION (NP)
- CAUSE AND EFFECT → REGRESSION ANALYSIS

P = PARAMETRIC; NP = NONPARAMETRIC
CORRELATION BETWEEN TWO VARIABLES

CORRELATION ANALYSIS ADDRESSES TWO QUESTIONS:

1. Are the two variables related in some consistent and linear way?
2. If so, what is the strength of that relationship?
→ The strength of the relationship between two variables is measured using the CORRELATION COEFFICIENT.

Product Moment or Pearson Correlation Coefficient:
→ used when both variables come from normally distributed populations
PEARSON CORRELATION COEFFICIENT

[Three scatter plots of Y-variable1, Y-variable2, and Y-variable3 against X-variable: POSITIVE CORRELATION (r = +1.0), NO CORRELATION (r = 0), NEGATIVE CORRELATION (r = -1.0).]

ρ (rho) = population Pearson correlation coefficient
r = sample Pearson correlation coefficient
PEARSON CORRELATION COEFFICIENT

VALUE OF r (+ OR -) MEANING
0.00 to 0.19 A very weak correlation
0.20 to 0.39 A weak correlation
0.40 to 0.69 A modest correlation
0.70 to 0.89 A strong correlation
0.90 to 1.00 A very strong correlation
REQUIREMENTS OF PEARSON CORRELATION ANALYSIS

1. The sample is a random sample from the population of interest.
2. Both variables are approximately normally distributed.
3. Measurement of both variables is on an interval or ratio scale.
4. The relationship between the two variables, if it exists, is linear.
CASE STUDY: CORRELATION BETWEEN TWO VARIABLES (FISH OTOLITHS)

A sample of 10 fish is randomly selected from a normally distributed population of Indian River Lagoon fishes. Each fish is weighed and measured and then dissected to remove the otoliths (or ear stones), which are also measured. The distribution of otolith length measurements is approximately normal.
HOW TO CHOOSE THE APPROPRIATE HYPOTHESIS TEST CONCERNING LINEAR RELATIONSHIPS BETWEEN TWO VARIABLES
1A. Observation is discrete, derived, or on an ordinal scale → 2
1B. Observation is continuous, on an interval or ratio scale → 3A or 3B
2. Transform data → 3A or 3B
3A. To test for association between variables x and y, use CORRELATION ANALYSIS → 4A or 4B
3B. To test for a “cause-effect” or functional relationship between variables x and y, use REGRESSION ANALYSIS → 5A or 5B
4A. Variables are normally distributed → USE PEARSON CORRELATION
4B. Variables are not normally distributed → USE SPEARMAN CORRELATION
5A. Independent variable (x) is FIXED by the investigator and the population of the dependent variable (y) is normally distributed for any value of x, with homogeneous variances of the residuals across all values of x → USE MODEL I REGRESSION (Least-Squares Regression)
5B. Both variables x and y are random variables (i.e., independent variable (x) is NOT FIXED by the investigator) → USE MODEL 2 REGRESSION (Reduced Major Axis Regression)
How to report a correlation in graphical form?

[Scatter plot: Fish Mass (g), 50-300, on the x-axis; Otolith Length (mm), 6-11, on the y-axis.]

Figure Z. A scatter plot showing the correlation between mass (g) and otolith length (mm) of fish in the Indian River Lagoon.
CORRELATION BETWEEN TWO VARIABLES
Otolith length in mm (x) and fish mass in g (y) measurements

x      y      x²       y²       xy
6.6    86     43.56    7396     567.6
6.9    92     47.61    8464     634.8
7.3    71     53.29    5041     518.3
8.2    185    67.24    34225    1517
8.3    85     68.89    7225     705.5
9.1    201    82.81    40401    1829.1
9.2    283    84.64    80089    2603.6
9.4    255    88.36    65025    2397
10.2   222    104.04   49284    2264.4
Σ      82.7   1554     696.69   302626   13592.3
HYPOTHESIS TESTING IN CORRELATION ANALYSIS

1. Hypotheses:
Ho: ρ = 0
H1: ρ ≠ 0
RQ: (Generic) Is the correlation between the two
variables (x and y) statistically significant? (Example)
Is there a significant correlation between otolith length
and mass of fish in the Indian River Lagoon under
investigation?
2. Test-statistic:

calculated r = 0.8386 (A strong correlation)


HYPOTHESIS TESTING IN CORRELATION ANALYSIS

3. Null Hypothesis Rejection Region: Refer to Table A.8


α = 0.05; df = n − 2, where n = # of paired observations
Critical r (α = 0.05; df = 8) = 0.632
If calculated r ≥ critical r (Table A.8), then we reject Ho; if calculated r < critical r (Table A.8), then we fail to reject Ho.
4. Conclusion:
STATS: Since calculated r = 0.8386 (step 2) > critical r =
0.632 (step 3), we reject Ho.
BIO: We conclude that there is a significant, strong
correlation between otolith length and mass of fish in the
Indian River Lagoon (r = 0.8386; df = 8; P < 0.05).
How to report a correlation in graphical form?

[Scatter plot of Otolith Length (mm) against Fish Mass (g), annotated with r = 0.8386 (P < 0.05).]

Figure Z. A scatter plot showing the correlation between mass (g) and otolith length (mm) of fish in the Indian River Lagoon.
PRESENTING THE RESULT OF A PEARSON CORRELATION ANALYSIS

[Scatter plot of Otolith Length (mm) against Fish Mass (g), annotated with r = 0.8386 (P < 0.05).]

Figure Z. A scatter plot showing the correlation between mass (g) and otolith length (mm) of fish in the Indian River Lagoon.
CORRELATION BETWEEN TWO VARIABLES

A biologist investigates the usefulness of Plecoptera (stonefly) nymphs as indicators of environmental factors in streams. Samples from 13 streams are obtained by displacing nymphs from a stream bed into a net. Values of water hardness (i.e., calcium concentration) are obtained from the local water authority. Is there a significant correlation between water hardness and number of Plecoptera nymphs?

Note: The observations and their ranks, together with d and d², are shown in the following table.
CORRELATION BETWEEN TWO VARIABLES

Figure XY. Scatter plot showing the relationship between water hardness (in CaCO3 units) and number of Plecoptera nymphs.
CORRELATION BETWEEN TWO VARIABLES
x Rank of x y Rank of y d d2
17 1 42 13 -12 144
20 2 40 12 -10 100 Variable x =
22 3 30 11 -8 64 water hardness
28 4 7 6 -2 4 (CaCO3 units)
42 5 12 10 -5 25
Variable y =
55 6.5 10 9 -2.5 6.25 number of
55 6.5 8 8 -1.5 2.25 Plecoptera
75 8 7 6 2 4 nymphs
80 9 3 2 7 49
90 10 7 6 4 16
145 11.5 5 4 7.5 56.25
145 11.5 2 1 10.5 110.25
170 13 4 3 10 100
d2 = 681
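From Σd², the Spearman coefficient follows from the classic formula rs = 1 − 6Σd² / [n(n² − 1)]. The Python sketch below ignores the ties correction, so it gives −0.87 rather than the ties-corrected −0.886 that SPSS reports:

```python
n = 13          # number of paired observations (streams)
sum_d2 = 681    # Σd² from the table above

# Spearman's rank correlation coefficient (no correction for ties)
rs = 1 - (6 * sum_d2) / (n * (n ** 2 - 1))
print(round(rs, 2))   # -0.87
```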
SPEARMAN RANK CORRELATION ANALYSIS
OUTPUT IN SPSS
HYPOTHESIS TESTING IN SPEARMAN RANK
CORRELATION ANALYSIS

1. Hypotheses:
Ho: ρs = 0
H1: ρs ≠ 0
RQ: (Generic) Is the correlation between the two
variables (x and y) statistically significant? (Example)
Is there a significant correlation between water
hardness and number of Plecoptera nymphs?
2. Test-statistic:

calculated rs = - 0.87 (A strong negative correlation.)


HYPOTHESIS TESTING IN SPEARMAN RANK CORRELATION ANALYSIS

3. Null Hypothesis Rejection Region:


At α = 0.05 with n = 13 (# of paired observations): if Sig (P) < 0.05, we reject the Ho that there is no significant
correlation between water hardness and number of
Plecoptera nymphs in the streams under investigation;
if Sig (P) ≥ 0.05, we fail to reject that Ho.
4. Conclusion:
STATS: Since Sig (P) < 0.05, we reject the Ho that ...
BIO: We conclude that there is a significant negative
correlation between water hardness and number of
Plecoptera nymphs (rs = - 0.87; n = 13; P < 0.05).
How to report a Spearman Rank correlation in graphical form?

rs = -0.886 (P < 0.05)

Figure XY. Scatter plot showing the relationship between water hardness (in CaCO3 units) and number of Plecoptera nymphs.
RESEARCH QUESTION

FUNCTIONAL RELATIONSHIP
- ASSOCIATION → PEARSON CORRELATION (P); SPEARMAN’S RANK CORRELATION (NP)
- CAUSE AND EFFECT → REGRESSION ANALYSIS

P = PARAMETRIC; NP = NONPARAMETRIC
REGRESSION BETWEEN TWO VARIABLES
RECALL THE FOLLOWING FEATURES OF REGRESSION ANALYSIS:

In Regression Analysis, a “cause-and-effect” relationship between the two variables is determined. In such an analysis, a substantial proportion of the variation in one variable (called the DEPENDENT variable) can be explained by or attributed to the other variable (called the INDEPENDENT variable).

[Scatter plot with a fitted line: INDEPENDENT VARIABLE on the x-axis, DEPENDENT VARIABLE on the y-axis.]
SOME USES OF REGRESSION ANALYSIS
1. To construct a line through the points of a
scattergram (called the REGRESSION [or least
squares] LINE).
2. To approximate an equation that describes the linear
relationship between the 2 variables (called the
REGRESSION EQUATION).
3. To predict values of the dependent variable (y) at
various values of the independent variable (x).
4. To estimate the extent to which the dependent
variable (y) is under the control of the independent
variable (x) (called the COEFFICIENT OF
DETERMINATION).
REGRESSION BETWEEN TWO VARIABLES

[Scatter plot with a fitted line: CAFFEINE CONSUMPTION (independent variable) on the x-axis, PULSE RATE (dependent variable) on the y-axis; the slope of the fitted line = b.]
REGRESSION BETWEEN TWO VARIABLES

The equation of a straight line (rectilinear equation):

y = a + bx

a = the intercept (the value of y when x is zero)
b = the slope (the regression coefficient)
REGRESSION BETWEEN TWO VARIABLES

Calculating the estimated Regression Equation: y = a + bx


REGRESSION BETWEEN TWO VARIABLES
How do you draw the line through a scattergram?
→ THE METHOD OF LEAST SQUARES

Fitting a regression line to a scattergram involves placing it through the points so that the sum of the squared vertical distances (deviations) of all the points from the line is minimized.
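The least-squares criterion has a closed-form solution: b = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and a = ȳ − b·x̄. A minimal Python sketch (the example data are hypothetical, not from the slides):

```python
def least_squares(x, y):
    """Return intercept a and slope b of the least-squares line y = a + bx."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # b = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² minimizes the summed squared deviations
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

# Hypothetical data lying exactly on y = 1 + 2x
a, b = least_squares([1, 2, 3, 4], [3, 5, 7, 9])
print(a, b)   # 1.0 2.0
```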
REQUIREMENTS OF LEAST-SQUARES REGRESSION
1. There is a linear relationship between a dependent y-variable and an independent x-variable which is implied to be functional or causal.
2. The x-variable is not a random variable but is under the control of the observer (RECALL MODEL I ANOVA: FIXED EFFECTS MODEL).
3. For any single defined observation of the x-variable, there is a theoretical population of y-values that is normally distributed.
4. The variances of the different populations of y-values corresponding to different individual x-values are similar.
Model Summary

Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .969a   .939       .931                2.639
a. Predictors: (Constant), TEMP

Within the “Model Summary” of your output, SPSS generates the R-squared value. Converted to a percentage, this value indicates how much of the variation in the dependent variable “y” is explained by the variation in the independent variable “x” in the model. In this case study, the regression model indicates that about 94% of the variation in heart rate of the snakes under investigation is explained by the variation in temperature. In other words, we are confident that temperature affected heart rate in this case study.
An important aspect of regression is its power to predict the value of one variable based on the relationship between the two variables. We use the equation for a straight line and plug in known values for one variable to get the unknown value for the other variable.

(Equation for a straight line) y = a + bx, where a = intercept and b = slope
Coefficientsa

                Unstandardized Coefficients   Standardized Coefficients
Model           B         Std. Error          Beta     t        Sig.
1  (Constant)   2.139     1.917                        1.116    .301
   TEMP         1.775     .170                .969     10.421   .000
a. Dependent Variable: HRT_RATE

Y = 2.139 + 1.775(X)
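Prediction is then simple substitution into the fitted equation; in this sketch the temperature of 30 is a hypothetical input, not a value from the case study:

```python
# Fitted model from the Coefficients table: Y = 2.139 + 1.775(X)
a, b = 2.139, 1.775   # intercept (Constant B) and slope (TEMP B)

def predict_heart_rate(temp):
    """Predicted heart rate at a given temperature, from the fitted line."""
    return a + b * temp

print(round(predict_heart_rate(30), 3))   # 55.389
```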
In the Coefficients table, the unstandardized B for (Constant), 2.139, is the intercept, and the unstandardized B for TEMP, 1.775, is the slope.
HYPOTHESIS TESTING IN REGRESSION ANALYSIS

1. Hypotheses:
Ho: β = 0
H1: β ≠ 0
RQ: (Generic) Is there a statistically significant linear relationship between the two variables (x and y)? (Example) Is there a significant (cause-effect) relationship between heart rate and temperature?
2. Test-statistic:
calculated t = 10.421
HYPOTHESIS TESTING IN REGRESSION ANALYSIS

3. Null Hypothesis Rejection Region:
At α = 0.05 and df = n − 2: if Sig (P) < 0.05, we reject the null hypothesis that the slope β = 0; if Sig (P) > 0.05, we fail to reject the null hypothesis.
4. Conclusion:
STATS: Since Sig (P) < 0.05, we reject Ho that the slope of the line that relates temperature with heart rate is zero.
BIO: We conclude that there is a significant linear relationship between heart rate and temperature (t = 10.421; df = 7; P < 0.05).
REDUCED MAJOR AXIS (RMA) OR MODEL 2 REGRESSION

1. This is used instead of Regression by Least Squares when both the x and y variables are random, i.e., the independent variable (x) is not under the control of the investigator. This is the case in Model 2 Regression (recall Model 2 ANOVA!).
2. There is no requirement that the sampling units should be obtained randomly. It is preferable to select items which span the available range for measurement.
3. There is no clear dependent variable. The x and y variables are therefore assigned arbitrarily.
RMA OR MODEL 2 REGRESSION
A sample of 9 fish is randomly selected from a
normally distributed fish population. Each fish is
weighed and measured and then dissected to remove the
otoliths (or ear stones), which are also measured. Is
there a significant linear relationship between otolith
length and weight of fish?
Table A. Otolith length in mm (x) and fish mass in g (y) measurements.
SPSS REGRESSION ANALYSIS OUTPUT

Coefficientsa

                Unstandardized Coefficients   Standardized Coefficients
Model           B          Std. Error         Beta     t        Sig.
1  (Constant)   -298.878   117.943                     -2.534   .039
   Length       55.451     13.982             .832     3.966    .005
a. Dependent Variable: Mass

Model Summary

Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .832a   .692       .648                48.639
a. Predictors: (Constant), Length

ANOVAb

Model          Sum of Squares   df   Mean Square   F        Sig.
1  Regression  37211.793        1    37211.793     15.729   .005a
   Residual    16560.430        7    2365.776
   Total       53772.222        8
a. Predictors: (Constant), Length
b. Dependent Variable: Mass
HYPOTHESIS TESTING IN REGRESSION ANALYSIS USING ANOVA
(THE ONLY OPTION FOR MODEL 2 (REDUCED MAJOR AXIS) REGRESSION)
1. Hypotheses:
Ho: β = 0
H1: β ≠ 0
RQ: Is there a significant linear relationship between mass of fish and otolith length?
2. Test-statistic: F (as in ANOVA)
F = Regression MS (Variance) ÷ Residual MS (Variance)
calculated F = 37211.793 ÷ 2365.776 = 15.729
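The F-ratio is just the quotient of the two mean squares from the ANOVA table (Python sketch):

```python
regression_ms = 37211.793   # Regression Mean Square (ANOVA table)
residual_ms = 2365.776      # Residual Mean Square (ANOVA table)

F = regression_ms / residual_ms   # F = Regression MS ÷ Residual MS
print(round(F, 3))                # 15.729
```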
HYPOTHESIS TESTING IN REGRESSION ANALYSIS USING ANOVA
(THE ONLY OPTION FOR MODEL 2 (RMA) REGRESSION)

3. Null Hypothesis Rejection Region:
At α = 0.05, if P (Sig) < 0.05, we reject the Ho that β = 0; if P (Sig) > 0.05, we fail to reject this Ho.
4. Conclusion:
STATS: Since P (Sig) < 0.05, we reject the Ho that β = 0.
BIO: We conclude that there is a significant linear relationship between otolith length and mass of fish (F = 15.729; df = 1, 7; P < 0.05).
[Scatter plot: Fish Mass (g), 50-300, on the x-axis; Otolith Length (mm), 6-11, on the y-axis.]

Figure Z. A scatter plot showing the relationship between mass (g) and otolith length (mm) of fish in the Indian River Lagoon.
[The same scatter plot with the fitted line, annotated: y = -298.88 + 55.45x; P < 0.05; r² = 69.2%.]

Figure Z. A scatter plot showing the relationship between mass (g) and otolith length (mm) of fish in the Indian River Lagoon.
RMA (MODEL 2) SLOPE: b’ = ± (standard deviation of y ÷ standard deviation of x), with the sign matching the direction of the association between x and y
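A minimal sketch of the RMA slope formula; the data are hypothetical, and taking the sign from the direction of the association (the sign of the x-y covariance) is an assumption consistent with standard RMA practice:

```python
import statistics as st

def rma_slope(x, y):
    """Reduced Major Axis (Model 2) slope: b' = ±(sd of y ÷ sd of x)."""
    mx, my = st.mean(x), st.mean(y)
    # the sign of b' follows the sign of the covariance between x and y
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = st.stdev(y) / st.stdev(x)
    return b if sxy >= 0 else -b

# Hypothetical data: y = 2x exactly, so b' is 2; reversing y flips the sign
print(rma_slope([1, 2, 3, 4], [2, 4, 6, 8]))
print(rma_slope([1, 2, 3, 4], [8, 6, 4, 2]))
```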
The Coefficient of Determination (r²)

The Coefficient of Determination, r², quantifies the proportion of the variance in the dependent (y) variable that is explained by its dependence on the independent (x) variable.

100% − r² = the residual or “unexplained” variance, i.e., the variance in y that is not explained by its relationship with x.

The Coefficient of Determination may also be computed for a correlation analysis; in that case it is the square of the Correlation Coefficient r, and r² quantifies the variation in y that is associated with the variation in x, and vice versa.
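For instance, squaring the Pearson coefficient from the otolith case study (r = 0.8386) splits the variation into explained and residual parts (Python sketch):

```python
r = 0.8386    # Pearson r from the otolith length vs. fish mass case study
r2 = r ** 2   # coefficient of determination

print(round(100 * r2, 1))         # 70.3 -> % of variation in y associated with x
print(round(100 * (1 - r2), 1))   # 29.7 -> residual ("unexplained") variation
```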
Which of these graphs is statistically incorrect?

[Two plots of Yield of Grass (g/m²), 50-300, against Mass of Fertilizer (g/m²), 0-300, both labeled with the fitted equation y = 51.93 + 0.8114x.]
DEALING WITH CURVED RELATIONSHIPS

In a study of amphibian mortality, a mass of developing


frog spawn is released into a suitable environment. At
two-week intervals, predetermined by the observer, the
numbers of surviving tadpoles are counted. Is there a
significant change in survivorship of tadpoles during
the 12-week survey?
DEALING WITH CURVED RELATIONSHIPS

x (weeks)   y (number of tadpoles surviving)
2           541
4           116
6           58
8           27
10          6
12          3

[Scatter plot of number of tadpoles surviving (0-600) against weeks (0-14), showing a curved, declining relationship.]
DEALING WITH CURVED RELATIONSHIPS

Regressing log(y) on x gives: y’ = 3.088 − 0.2210x, where y’ = log10(y)

x    y     log(y)
2    541   2.7332
4    116   2.0645
6    58    1.7634
8    27    1.4314
10   6     0.7782
12   3     0.4771
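The transformed fit can be reproduced with numpy (a sketch; np.polyfit returns the slope first for a degree-1 fit):

```python
import numpy as np

weeks = np.array([2, 4, 6, 8, 10, 12])
tadpoles = np.array([541, 116, 58, 27, 6, 3])

log_y = np.log10(tadpoles)          # log-transform linearizes the decline
b, a = np.polyfit(weeks, log_y, 1)  # slope b and intercept a of y' = a + bx
print(round(a, 3), round(b, 4))     # 3.088 -0.221
```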
TRANSFORMATION OF BOTH y AND x

Island biogeographic theory tells us that the


number of species on islands increases with the
size of the island, but not in rectilinear fashion. Is
there a significant effect of island size on the
number of species?
TRANSFORMATION OF BOTH y AND x
Island Area x (ha) ln area (x') No. species y ln (No. species y')
A 1850 7.523 164 5.1
B 269 5.59 122 4.8
C 72 4.28 67 4.2
D 21 3.04 70 4.25
E 9.5 2.25 43 3.76
F 7 1.95 37 3.61
G 2 0.693 23 3.14
H 0.375 -0.981 18 2.89
1. y on x: r2 = 69.8 → r2 = Coefficient of Determination
2. ln(y) on x: r2 = 45.8
3. y on ln(x): r2 = 91.2
4. ln(y) on ln(x): r2 = 96.2
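The four r² values above can be reproduced directly from the raw area and species columns (numpy sketch; expect small rounding differences from the slide's figures):

```python
import numpy as np

area = np.array([1850, 269, 72, 21, 9.5, 7, 2, 0.375])  # island area (ha)
species = np.array([164, 122, 67, 70, 43, 37, 23, 18])  # number of species

def r_squared(x, y):
    """Coefficient of determination, as a percentage."""
    r = np.corrcoef(x, y)[0, 1]
    return 100 * r ** 2

print(round(r_squared(area, species), 1))                  # 1. y on x
print(round(r_squared(area, np.log(species)), 1))          # 2. ln(y) on x
print(round(r_squared(np.log(area), species), 1))          # 3. y on ln(x)
print(round(r_squared(np.log(area), np.log(species)), 1))  # 4. ln(y) on ln(x)
```

The double-log model gives the highest r² (about 96%), matching the slide's conclusion that species number does not increase with island area in rectilinear fashion.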
