

Statistics One
Lecture 5

Three segments
•  Overview
•  Calculation of r
•  Assumptions

Lecture 5 ~ Segment 1
Correlation: Overview

Correlation: Overview
•  Important concepts & topics
–  What is a correlation?
–  What are they used for?
–  Scatterplots
–  Types of correlations



Correlation: Overview
•  Correlation
–  A statistical procedure used to measure and describe the relationship between two variables
–  Correlations can range between +1 and -1
•  +1 is a perfect positive correlation
•  0 is no correlation (no linear relationship)
•  -1 is a perfect negative correlation

Correlation: Overview
•  When two variables, let’s call them X and Y, are correlated, then one variable can be used to predict the other variable
–  More precisely, a person’s score on X can be used to predict his or her score on Y

Correlation: Overview
•  Example:
–  Working memory capacity is strongly correlated with intelligence, or IQ, in healthy young adults
–  So if we know a person’s IQ then we can predict how they will do on a test of working memory



Correlation: Overview
–  Correlation does not imply causation

Correlation: Overview
–  The magnitude of a correlation depends upon many factors, including:
•  Sampling (random and representative?)

Correlation: Overview
•  CAUTION!
–  The magnitude of a correlation is also influenced by:
•  Measurement of X & Y (See Lecture 6)
•  Several other assumptions (See Segment 3)

Correlation: Overview
•  For now, consider just one assumption:
–  Random and representative sampling
–  There is a strong correlation between IQ and working memory among all healthy young adults
•  What is the correlation between IQ and working memory among college graduates?
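One way to read the question about college graduates is as a question about a selective sample. The sketch below is only an illustration under that assumption: the scores are invented (they are not real IQ or working memory data), `pearson_r` is a helper written for this example using the raw score formula from Segment 2, and keeping only the upper half of X stands in for sampling a selective group.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson r via the raw score formula: SP / sqrt(SSx * SSy)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ssx = sum((x - mx) ** 2 for x in xs)
    ssy = sum((y - my) ** 2 for y in ys)
    return sp / sqrt(ssx * ssy)

# Hypothetical scores, invented for illustration only.
x_all = [1, 2, 3, 4, 5, 6, 7, 8]
y_all = [1, 3, 2, 4, 6, 5, 8, 7]

# Keep only the upper half of X, a stand-in for a selective sample.
pairs_top = [(x, y) for x, y in zip(x_all, y_all) if x >= 5]
x_top = [x for x, _ in pairs_top]
y_top = [y for _, y in pairs_top]

r_all = pearson_r(x_all, y_all)   # full range of X
r_top = pearson_r(x_top, y_top)   # restricted range of X

print(round(r_all, 2), round(r_top, 2))  # prints: 0.93 0.6
```

In this made-up data set the correlation is weaker in the restricted sample than in the full sample, which is why a sample that is not representative can misstate the population correlation.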



Correlation: Overview

•  Finally & perhaps most important:
–  The correlation coefficient is a sample
statistic, just like the mean
•  It may not be representative of ALL individuals
–  For example, in school I scored very high on Math and
Science but below average on Language and History


Correlation: Overview
•  Note: there are several types of correlation coefficients, for different variable types
–  Pearson product-moment correlation coefficient (r)
•  When both variables, X & Y, are continuous
–  Point bi-serial correlation
•  When 1 variable is continuous and 1 is dichotomous



Correlation: Overview
•  Note: there are several types of correlation coefficients
–  Phi coefficient
•  When both variables are dichotomous
–  Spearman rank correlation
•  When both variables are ordinal (ranked data)

Segment summary
•  Important concepts/topics
–  What is a correlation?
–  What are they used for?
–  Scatterplots
–  CAUTION!
–  Types of correlations

END SEGMENT

Lecture 5 ~ Segment 2
Calculation of r



Calculation of r
•  Important topics
–  r
•  Pearson product-moment correlation coefficient
–  Raw score formula
–  Z-score formula
–  Sum of cross products (SP) & Covariance

Calculation of r
•  r = the degree to which X and Y vary together, relative to the degree to which X and Y vary independently
•  r = (Covariance of X & Y) / (Variance of X & Y)

Calculation of r
•  Two ways to calculate r
–  Raw score formula
–  Z-score formula

Calculation of r
•  Let’s quickly review calculations from Lecture 4 on summary statistics
•  Variance = SD² = MS = (SS/N)



Linsanity!

Jeremy Lin (10 games)

Points per game        (X - M)         (X - M)²
28                       5.3             28.09
26                       3.3             10.89
10                     -12.7            161.29
27                       4.3             18.49
20                      -2.7              7.29
38                      15.3            234.09
23                       0.3              0.09
28                       5.3             28.09
25                       2.3              5.29
2                      -20.7            428.49
M = 227/10 = 22.7      M = 0/10 = 0    M = 922.1/10 = 92.21
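The summary row of the table can be reproduced in a few lines. This is a minimal sketch: the points-per-game values are copied from the table, and the variable names are chosen for this example only.

```python
from math import sqrt

# Points per game from the table above (10 games).
points = [28, 26, 10, 27, 20, 38, 23, 28, 25, 2]

n = len(points)
mean = sum(points) / n                     # M = 227 / 10
deviations = [x - mean for x in points]    # (X - M) for each row
ss = sum(d ** 2 for d in deviations)       # SS = sum of squared deviations
variance = ss / n                          # SD^2 = MS = SS / N
sd = sqrt(variance)                        # SD = square root of the variance

print(round(mean, 1), round(ss, 1), round(variance, 2), round(sd, 1))
# prints: 22.7 922.1 92.21 9.6
```

The deviation scores sum to zero (up to floating-point rounding), which matches the middle column of the summary row.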

Results
•  M = Mean = 22.7
•  SD² = Variance = MS = SS/N = 92.21
•  SD = Standard Deviation = 9.6

Just one new concept!
•  SP = Sum of cross Products



Just one new concept!
•  Review: To calculate SS
–  For each row, calculate the deviation score
•  (X - Mx)
–  Square the deviation scores
•  (X - Mx)²
–  Sum the squared deviation scores
•  SSx = Σ[(X - Mx)²] = Σ[(X - Mx) × (X - Mx)]

Just one new concept!
•  To calculate SP
–  For each row, calculate the deviation score on X
•  (X - Mx)
–  For each row, calculate the deviation score on Y
•  (Y - My)

Just one new concept!
•  To calculate SP
–  Then, for each row, multiply the deviation score on X by the deviation score on Y
•  (X - Mx) × (Y - My)
–  Then, sum the “cross products”
•  SP = Σ[(X - Mx) × (Y - My)]

Calculation of r
Raw score formula:
r = SPxy / SQRT(SSx × SSy)
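The SP steps and the raw score formula can be sketched as follows. The slides give no paired data at this point, so the X and Y values below are hypothetical and chosen only to keep the arithmetic easy to check by hand (Mx = 3, My = 4).

```python
from math import sqrt

# Hypothetical paired scores, invented for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
mx = sum(xs) / n
my = sum(ys) / n

# Deviation scores on X and on Y, row by row.
dev_x = [x - mx for x in xs]
dev_y = [y - my for y in ys]

# Cross products per row, then their sum (SP).
cross_products = [dx * dy for dx, dy in zip(dev_x, dev_y)]
sp = sum(cross_products)

# Sums of squares for X and Y.
ssx = sum(dx ** 2 for dx in dev_x)
ssy = sum(dy ** 2 for dy in dev_y)

# Raw score formula: r = SPxy / SQRT(SSx x SSy)
r = sp / sqrt(ssx * ssy)

print(sp, ssx, ssy, round(r, 3))  # prints: 6.0 10.0 6.0 0.775
```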



Calculation of r
r = SPxy / SQRT(SSx × SSy)
r = Σ[(X - Mx) × (Y - My)] / SQRT(Σ(X - Mx)² × Σ(Y - My)²)

Formulae to calculate r
SPxy = Σ[(X - Mx) × (Y - My)]
SSx = Σ(X - Mx)² = Σ[(X - Mx) × (X - Mx)]
SSy = Σ(Y - My)² = Σ[(Y - My) × (Y - My)]


Formulae to calculate r
Z-score formula:
r = Σ(Zx × Zy) / N

Formulae to calculate r
Zx = (X - Mx) / SDx
Zy = (Y - My) / SDy
SDx = SQRT(Σ(X - Mx)² / N)
SDy = SQRT(Σ(Y - My)² / N)
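The Z-score formula can be sketched in the same way. The data are hypothetical (invented for this example), and the standard deviations divide by N, as in the SDx and SDy definitions above.

```python
from math import sqrt

# Hypothetical paired scores, invented for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
mx = sum(xs) / n
my = sum(ys) / n

# Standard deviations with N in the denominator.
sdx = sqrt(sum((x - mx) ** 2 for x in xs) / n)
sdy = sqrt(sum((y - my) ** 2 for y in ys) / n)

# Convert every raw score to a Z-score.
zx = [(x - mx) / sdx for x in xs]
zy = [(y - my) / sdy for y in ys]

# Z-score formula: r = sum(Zx x Zy) / N
r_z = sum(a * b for a, b in zip(zx, zy)) / n

print(round(r_z, 3))  # prints: 0.775
```

For these values the raw score formula gives SP / SQRT(SSx × SSy) = 6 / SQRT(60), the same number.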



Formulae to calculate r
Proof of equivalence:
Zx = (X - Mx) / SQRT(Σ(X - Mx)² / N)
Zy = (Y - My) / SQRT(Σ(Y - My)² / N)

Formulae to calculate r
r = Σ { [(X - Mx) / SQRT(Σ(X - Mx)² / N)] × [(Y - My) / SQRT(Σ(Y - My)² / N)] } / N


Formulae to calculate r
r = Σ { [(X - Mx) / SQRT(Σ(X - Mx)² / N)] × [(Y - My) / SQRT(Σ(Y - My)² / N)] } / N
Since SQRT(SSx / N) × SQRT(SSy / N) = SQRT(SSx × SSy) / N, the N's cancel:
r = Σ[(X - Mx) × (Y - My)] / SQRT(Σ(X - Mx)² × Σ(Y - My)²)
r = SPxy / SQRT(SSx × SSy) ← The raw score formula!

Variance and covariance
•  Variance = MS = SS / N
•  Covariance = COV = SP / N
•  Correlation is standardized COV
–  Standardized so the value is in the range -1 to +1
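To illustrate "correlation is standardized COV", the sketch below divides the covariance by the product of the two standard deviations and then rescales X. The data and the factor of 10 are hypothetical, chosen only for illustration; the helper names `covariance` and `sd` are written for this example.

```python
from math import sqrt

def covariance(xs, ys):
    """COV = SP / N (descriptive form, dividing by N)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n

def sd(xs):
    """SD = SQRT(SS / N)."""
    n = len(xs)
    m = sum(xs) / n
    return sqrt(sum((x - m) ** 2 for x in xs) / n)

# Hypothetical paired scores, invented for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

cov_xy = covariance(xs, ys)
r = cov_xy / (sd(xs) * sd(ys))

# Rescale X (for example, a change of measurement units).
xs_scaled = [10 * x for x in xs]
cov_scaled = covariance(xs_scaled, ys)
r_scaled = cov_scaled / (sd(xs_scaled) * sd(ys))

print(round(cov_xy, 2), round(cov_scaled, 2))  # covariance grows with the scale of X
print(round(r, 3), round(r_scaled, 3))         # r is unchanged
```

The covariance depends on the units of X and Y, while r does not, which is what makes r comparable across variables.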



Note on the denominators
•  Correlation for descriptive statistics
–  Divide by N
•  Correlation for inferential statistics
–  Divide by N - 1

Segment summary
•  Important topics
–  r
•  Pearson product-moment correlation coefficient
–  Raw score formula
–  Z-score formula
–  Sum of cross Products (SP) & Covariance
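On the note about denominators: the choice between N and N - 1 changes the covariance and the standard deviations, but it does not change r as long as the same denominator is used throughout, because it cancels. A minimal sketch with hypothetical data, using `pstdev` (divides by N) and `stdev` (divides by N - 1) from Python's standard `statistics` module:

```python
from statistics import pstdev, stdev

# Hypothetical paired scores, invented for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))

# Descriptive version: divide by N everywhere.
cov_n = sp / n
r_n = cov_n / (pstdev(xs) * pstdev(ys))

# Inferential version: divide by N - 1 everywhere.
cov_n1 = sp / (n - 1)
r_n1 = cov_n1 / (stdev(xs) * stdev(ys))

print(round(cov_n, 2), round(cov_n1, 2))  # the covariances differ
print(round(r_n, 3), round(r_n1, 3))      # the correlations agree
```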

END SEGMENT

Lecture 5 ~ Segment 3



Assumptions
•  Assumptions when interpreting r
–  Normal distributions for X and Y
–  Linear relationship between X and Y
–  Homoscedasticity

Assumptions
•  Assumptions when interpreting r
–  Reliability of X and Y
–  Validity of X and Y
–  Random and representative sampling

Assumptions
•  Assumptions when interpreting r
–  Normal distributions for X and Y
•  How to detect violations?
–  Plot histograms and examine summary statistics

Assumptions
•  Assumptions when interpreting r
–  Linear relationship between X and Y
•  How to detect violation?
–  Examine scatterplots (see following examples)



Assumptions
•  Assumptions when interpreting r
–  Homoscedasticity
•  How to detect violation?
–  Examine scatterplots (see following examples)

Homoscedasticity
•  In a scatterplot the vertical distance between a dot and the regression line reflects the amount of prediction error (known as the “residual”)
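The residuals described above can be computed directly. This sketch assumes the regression line is the ordinary least-squares line (slope = SP / SSx, intercept = My - slope × Mx), which these slides do not define; the data are hypothetical.

```python
# Hypothetical paired scores, invented for illustration only.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1, 3, 2, 4, 6, 5, 8, 7]

n = len(xs)
mx = sum(xs) / n
my = sum(ys) / n

sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
ssx = sum((x - mx) ** 2 for x in xs)

# Ordinary least-squares line (an assumption of this sketch).
slope = sp / ssx
intercept = my - slope * mx

# Residual = observed Y minus the Y predicted by the line,
# i.e. the vertical distance between the dot and the line.
predicted = [intercept + slope * x for x in xs]
residuals = [y - p for y, p in zip(ys, predicted)]

for x, e in zip(xs, residuals):
    print(x, round(e, 2))
```

Listing (or plotting) the residuals against X is one way to check whether their spread looks roughly the same across the range of X.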

Homoscedasticity
•  Homoscedasticity means that the distances (the residuals) are not related to the variable plotted on the X axis (they are not a function of X)
•  This is best illustrated with scatterplots

Anscombe’s quartet
•  In 1973, statistician Dr. Frank Anscombe developed a classic example to illustrate several of the assumptions underlying correlation and regression
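The point of the quartet can be checked numerically. The data values below do not appear in these slides; they are the four data sets as commonly reproduced from Anscombe's 1973 paper, and `pearson_r` is a helper written for this sketch using the raw score formula.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson r via the raw score formula: SP / sqrt(SSx * SSy)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ssx = sum((x - mx) ** 2 for x in xs)
    ssy = sum((y - my) ** 2 for y in ys)
    return sp / sqrt(ssx * ssy)

# Data sets I to III share the same X values.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]

quartet = {
    "I":   (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II":  (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV":  (x4,   [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

# Compute r separately for each of the four data sets.
results = {name: pearson_r(xs, ys) for name, (xs, ys) in quartet.items()}

for name, r in results.items():
    print(name, round(r, 2))
```

All four data sets give nearly the same r (about 0.82) even though their scatterplots look very different, which is why examining the scatterplot matters before interpreting r.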



Anscombe’s quartet

Segment summary
•  Assumptions when interpreting r
–  Normal distributions for X and Y
–  Linear relationship between X and Y
–  Homoscedasticity

Segment summary
•  Assumptions when interpreting r
–  Reliability of X and Y
–  Validity of X and Y
–  Random and representative sampling
