
CORRELATION

Research Problem:

What is the relationship between two variables?

Relationship between hours studying (X) and grades on a midterm (Y)?

Relationship between stressful life events (X) and number of illness symptoms (Y)?

Correlation = Direction and strength of (linear) relationship between two variables

I. THE SCATTERPLOT

What is the relationship between hours studying (X) and scores on a quiz (Y)?

STUDENT   HOURS   SCORE
A         1       1
B         1       3
C         3       4
D         4       5
E         6       4
F         7       6

[Scatterplot of the six students: X axis = Hours Studying (0 to 8), Y axis = Score (0 to 7)]
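The six (hours, score) pairs from the table can be drawn as a rough text scatterplot. This is only a minimal sketch: the function name `text_scatterplot` and the axis limits are assumptions chosen for illustration, not part of the original slides.

```python
# Quiz data from the table: student -> (hours studying, score)
students = {
    "A": (1, 1), "B": (1, 3), "C": (3, 4),
    "D": (4, 5), "E": (6, 4), "F": (7, 6),
}

def text_scatterplot(points, x_max=8, y_max=7):
    """Return a text grid with '*' at each (x, y) point and '.' elsewhere."""
    rows = []
    for y in range(y_max, -1, -1):  # top row is the largest Y value
        cells = ["*" if (x, y) in points else "." for x in range(x_max + 1)]
        rows.append(f"{y} " + " ".join(cells))
    # X-axis labels along the bottom
    rows.append("  " + " ".join(str(x) for x in range(x_max + 1)))
    return "\n".join(rows)

plot = text_scatterplot(set(students.values()))
print(plot)
```

The points drift from lower left to upper right, which is the visual signature of a positive relationship.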

II. Pearson Correlation Coefficient

Symbol: r

r can range from -1.0 to +1.0

Sign (+/-) indicates “direction”

Value indicates “strength”

Measures a “linear” relationship only

(a) Direction of relationship between X, Y

Positive (+r) = As X goes up, Y goes up

Negative (-r) = As X goes up, Y goes down

(b) Strength of a relationship between X, Y

Closer to ±1.0, stronger

Closer to 0, weaker

When r = 0, the X, Y relationship cannot be described by a straight line
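A small sketch of direction, strength, and the "linear only" caveat, using made-up numbers. The helper `pearson_r` is an illustrative name; it divides the sum of products of deviations by the square root of the product of the two sums of squared deviations.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson r from deviations around the two means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp / sqrt(ss_x * ss_y)

x = [-2, -1, 0, 1, 2]
rising = [2 * v + 1 for v in x]   # Y goes up as X goes up
falling = [-v for v in x]         # Y goes down as X goes up
u_shape = [v * v for v in x]      # Y depends on X, but not along a line

r_rising = pearson_r(x, rising)
r_falling = pearson_r(x, falling)
r_u_shape = pearson_r(x, u_shape)
print(r_rising, r_falling, r_u_shape)
```

The U-shaped data give r = 0 even though Y is completely determined by X, which is why r = 0 should be read as "no linear relationship" rather than "no relationship".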

Figure 16-3 (p. 524)
Examples of positive and negative relationships. (a) Beer sales are positively related to
temperature. (b) Coffee sales are negatively related to temperature.

Pearson Correlation Coefficient

-1.0             0               +1.0
Perfect          No Linear       Perfect
Negative         Relationship    Positive
Relationship                     Relationship

• Closer to 0 = weaker

• Closer to ±1.0 = stronger

• r close to ±1.0 is very rare in social science

• |r| ≥ .30 is often considered important

• r ≈ 0, no linear relationship between X & Y

Figure 16-5 (p. 525)
Examples of different values for linear correlations: (a) shows a strong positive relationship, approx +.90;
(b) shows a relatively weak negative correlation, approx –.40; (c) shows a perfect negative correlation,
correlation = –1.0; (d) shows no linear trend, correlation = 0.0.

What does r represent?

\[ r = \frac{\text{degree to which X and Y vary together}}{\text{degree to which X and Y vary separately}} \]

\[ r = \frac{\text{covariability of X and Y}}{\text{variability of X and Y separately}} \]

Definitional Formula for Pearson r:

\[ r = \frac{SP}{\sqrt{SS_X \, SS_Y}} \]

SP = “Sum of Products”

SS = Sum of Squared Deviations

\[ SP = \sum (X - \bar{X})(Y - \bar{Y}) \]

\[ SS_X = \sum (X - \bar{X})^2 \]

\[ SS_Y = \sum (Y - \bar{Y})^2 \]

VARIANCE INTERPRETATION OF r :

r² = % of variance in Y explained by its linear relationship with X (and vice versa)

r² = “Coefficient of determination”

% of shared variance between X & Y

% of variance in Y predicted by X
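One way to check the variance interpretation is to fit a least-squares line to the quiz data from section I and compare r² with the share of variability in Y that the line accounts for. The sketch below assumes a simple least-squares line (slope = SP / SS for X); that regression step is an added illustration, not something shown on the slides.

```python
from math import sqrt

hours = [1, 1, 3, 4, 6, 7]   # X
scores = [1, 3, 4, 5, 4, 6]  # Y

n = len(hours)
mean_x, mean_y = sum(hours) / n, sum(scores) / n
sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(hours, scores))
ss_x = sum((x - mean_x) ** 2 for x in hours)
ss_y = sum((y - mean_y) ** 2 for y in scores)

r = sp / sqrt(ss_x * ss_y)
r_squared = r ** 2

# Least-squares line predicting Y from X
slope = sp / ss_x
intercept = mean_y - slope * mean_x
ss_residual = sum((y - (intercept + slope * x)) ** 2
                  for x, y in zip(hours, scores))

# Proportion of the variability in Y accounted for by the line
explained = 1 - ss_residual / ss_y
print(f"r^2 = {r_squared:.2f}, explained proportion = {explained:.2f}")
```

Both quantities are about .67 here: roughly two thirds of the variability in quiz scores is shared with hours studied.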

III. Factors that affect the size of r
• r ≈ 0 could mean many things:
  - No relationship at all between X & Y
  - Non-linear relationship between X & Y
  - Restricted range on X and/or Y
  - An outlier may be distorting the correlation

• Non-linear relationships: curvilinear relationship

• Restricted range: low variability on X and/or Y

• Outliers: extreme value on X and/or Y

Examples of how restricted range can distort a correlation

(a) In this example, the full range of X and Y values shows a strong, positive correlation, but the restricted
range of scores produces a correlation near zero.

(b) An example in which the full range of X and Y values shows a correlation near zero, but the scores in the
restricted range produce a strong, positive correlation.
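Case (a) can be reproduced with a small made-up data set. The numbers below are hypothetical and were picked so that the middle of the X range carries no linear trend; `pearson_r` is an illustrative helper name.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson r = SP / sqrt(SSx * SSy)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp / sqrt(ss_x * ss_y)

# Hypothetical scores covering the full range of X
full_x = [1, 2, 3, 4, 5, 6, 7, 8, 9]
full_y = [1, 2, 3, 5, 4, 5, 7, 8, 9]

# Keep only the cases whose X value falls in the narrow band 4 to 6
restricted = [(x, y) for x, y in zip(full_x, full_y) if 4 <= x <= 6]
restricted_x = [x for x, _ in restricted]
restricted_y = [y for _, y in restricted]

r_full = pearson_r(full_x, full_y)
r_restricted = pearson_r(restricted_x, restricted_y)
print(f"full range r = {r_full:.2f}, restricted range r = {r_restricted:.2f}")
```

The full range gives r of about +.98, while the three cases in the restricted band give r = 0.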

Example of how an outlier can distort a correlation

A demonstration of how one extreme data point (an outlier) can influence the value of a correlation.
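The same effect can be sketched with hypothetical numbers: five points with no linear trend, then the same five points plus one extreme case. The data and the helper name `pearson_r` are assumptions for illustration only.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson r = SP / sqrt(SSx * SSy)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp / sqrt(ss_x * ss_y)

# Five hypothetical points arranged so that X and Y show no linear trend
x = [1, 2, 3, 2, 2]
y = [2, 1, 2, 3, 2]
r_without = pearson_r(x, y)

# Add a single extreme point far from the rest of the data
r_with = pearson_r(x + [10], y + [10])

print(f"without outlier r = {r_without:.2f}, with outlier r = {r_with:.2f}")
```

One added point moves r from 0 to about +.96, so a scatterplot is worth checking before a large r is taken at face value.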

IV. CORRELATION VS. CAUSALITY:

• Correlation tells you two variables are related

• Does NOT tell you why!!

• Do not draw causal inferences from a correlation

X → Y

Y → X

Z → X and Z → Y (third variable problem)

example:

r = -.30 between number of friends and depression

r = +.40 between hours studying and grades

• Causal inferences require an “experiment”
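The third variable problem can be illustrated with a toy data set in which X and Y are each built from Z plus a little unrelated noise, and neither is built from the other. All values here are made up, and the noise patterns are fixed rather than random so the result is repeatable.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson r = SP / sqrt(SSx * SSy)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp / sqrt(ss_x * ss_y)

# Hypothetical third variable Z
z = [1, 2, 3, 4, 5, 6, 7, 8]

# Two fixed noise patterns that are unrelated to each other
noise_x = [0.5, -0.5, 0.5, -0.5, 0.5, -0.5, 0.5, -0.5]
noise_y = [0.5, 0.5, -0.5, -0.5, 0.5, 0.5, -0.5, -0.5]

# X and Y each depend on Z only; X never enters Y and Y never enters X
x = [zi + e for zi, e in zip(z, noise_x)]
y = [zi + e for zi, e in zip(z, noise_y)]

r_xy = pearson_r(x, y)
print(f"r between X and Y = {r_xy:.2f}")
```

X and Y correlate strongly (about +.95) even though, by construction, neither one influences the other. The correlation alone cannot distinguish this situation from X → Y or Y → X.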

V. OTHER CORRELATION COEFFICIENTS

Pearson r used when X & Y are at least interval level

Many types of correlation coefficients for other data

Spearman: ordinal (rank) data

Point-biserial: nominal X, interval/ratio Y

Phi: nominal X & Y
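The Spearman coefficient can be obtained by converting each variable to ranks and then applying the Pearson formula to the ranks. The sketch below assumes the common convention of giving tied scores the average of the ranks they occupy; the helper names `average_ranks` and `pearson_r` are illustrative, and the data are the quiz scores from section I.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson r = SP / sqrt(SSx * SSy)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp / sqrt(ss_x * ss_y)

def average_ranks(values):
    """Rank values from 1 (smallest); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

hours = [1, 1, 3, 4, 6, 7]
scores = [1, 3, 4, 5, 4, 6]

hour_ranks = average_ranks(hours)
score_ranks = average_ranks(scores)
r_spearman = pearson_r(hour_ranks, score_ranks)
print(hour_ranks, score_ranks, round(r_spearman, 2))
```

For these data the rank-based coefficient is about +.88. The point-biserial and phi coefficients follow the same pattern: code the nominal variable(s) as 0 and 1, then apply the Pearson formula.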
