
Correlation and Regression

• Correlation describes the strength of a linear relationship between two variables.
• Linear means "straight line".
• Regression tells us how to draw the straight line described by the correlation.
Regression Analyses
Regression: technique concerned with predicting some variables by knowing others.

The process of predicting variable Y using variable X.
Regression
• Uses a variable (x) to predict some outcome variable (y)
• Tells you how values in y change as a function of changes in values of x
Linear Equations
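ŷ = a + bX

b = slope = (change in Y) / (change in X)
a = Y-intercept

[Figure: a straight line plotted with X on the horizontal axis and Y on the vertical axis, marking the intercept a and the slope b]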
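As a small illustration of the slope and intercept defined above, the sketch below recovers a and b from two points on a line and then evaluates ŷ = a + bX at a new X. The function names `line_through` and `predict` and the two sample points are hypothetical, chosen only for illustration.

```python
def line_through(p1, p2):
    """Return (a, b) for the line y = a + b*x passing through two points."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("a vertical line has no finite slope")
    b = (y2 - y1) / (x2 - x1)  # slope: change in Y divided by change in X
    a = y1 - b * x1            # Y-intercept: value of y when x = 0
    return a, b


def predict(a, b, x):
    """Evaluate the line y-hat = a + b*x at a given x."""
    return a + b * x


# Hypothetical points (1, 3) and (3, 7), used only as an example.
a, b = line_through((1, 3), (3, 7))
print(a, b, predict(a, b, 5))  # 1.0 2.0 11.0
```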
Vocabulary
• Rank-correlation test -- nonparametric procedure used to test claims regarding association between two variables.
• Spearman's rank-correlation coefficient -- test statistic, rs

$$r_s = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)}$$

where di is the difference between the two ranks of pair i and n is the number of pairs.
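The formula can be evaluated directly once the rank differences are known. The following is a minimal sketch; the function name `spearman_from_differences` and the sample differences are hypothetical.

```python
def spearman_from_differences(d):
    """Spearman's rs from a list of rank differences d_i (one per pair)."""
    n = len(d)
    if n < 2:
        raise ValueError("need at least two pairs")
    sum_d2 = sum(di ** 2 for di in d)  # sum of squared rank differences
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Hypothetical rank differences for n = 4 pairs: sum of d^2 = 2,
# so rs = 1 - 12/60.
print(round(spearman_from_differences([1, -1, 0, 0]), 3))  # 0.8
```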
Definition
• Nonparametric (rank-based) methods are used when we have no idea about the population distribution from which the data are sampled.
• Used for small sample sizes.
• Used when the data are measured on an ordinal scale and only their ranks are meaningful.

Spearman Rank Correlation Coefficient (rs)

It is a non-parametric measure of correlation.

This procedure makes use of the two sets of ranks that may be assigned to the sample values of X and Y.
The Spearman rank correlation coefficient can be computed in the following cases:
• Both variables are quantitative.
• Both variables are qualitative ordinal.
• One variable is quantitative and the other is qualitative ordinal.
Procedure:
1. Rank the values of X from 1 to n, where n is the number of pairs of values of X and Y in the sample.
2. Rank the values of Y from 1 to n.
3. Compute the value of di for each pair of observations by subtracting the rank of Yi from the rank of Xi.
4. Square each di and compute ∑di², which is the sum of the squared values.
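The four steps can be sketched in code as follows. This is an illustration under stated assumptions, not a definitive implementation: tied values are given the mean of the ranks they occupy (the convention used in the worked examples of this deck), and the names `average_ranks` and `sum_squared_rank_differences` and the sample data are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1 to n (smallest = 1); tied values share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j + 2) / 2  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def sum_squared_rank_differences(x, y):
    rank_x = average_ranks(x)                        # step 1
    rank_y = average_ranks(y)                        # step 2
    d = [rx - ry for rx, ry in zip(rank_x, rank_y)]  # step 3
    return sum(di ** 2 for di in d)                  # step 4


# Hypothetical sample with one tie in x.
x = [10, 20, 20, 40]
y = [1, 3, 2, 4]
print(average_ranks(x))                    # [1.0, 2.5, 2.5, 4.0]
print(sum_squared_rank_differences(x, y))  # 0.5
```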
From Pearson to Spearman
• Pearson's
  - Measures only the degree of linear association
  - Based on the assumption of bivariate normality of two variables
• Spearman's
  - Takes into account only the ranks
  - Measures the degree of monotone association
  - Inferences on the rank correlation coefficients are distribution-free
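To make the linear versus monotone distinction concrete, the sketch below uses hypothetical data where y = x³. The relationship is perfectly monotone but not linear, so the rank-based coefficient equals 1 while Pearson's r falls below 1 (about 0.94 here). The helper names `pearson`, `ranks_no_ties` and `spearman` are assumptions made for this illustration, and the ranking helper only handles data without ties.

```python
from math import sqrt


def pearson(x, y):
    """Pearson's r: covariance scaled by the two standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def ranks_no_ties(values):
    """Ranks 1..n (smallest = 1); assumes all values are distinct."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


def spearman(x, y):
    """Spearman's rs via 1 - 6*sum(d^2) / (n(n^2 - 1)), for data without ties."""
    n = len(x)
    d = [rx - ry for rx, ry in zip(ranks_no_ties(x), ranks_no_ties(y))]
    return 1 - 6 * sum(di ** 2 for di in d) / (n * (n ** 2 - 1))


# Hypothetical monotone but non-linear data.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
print(pearson(x, y) < 1)  # True: the points do not lie on a straight line
print(spearman(x, y))     # 1.0: the ranks agree exactly
```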
Example 1

Calculations:
S     D     S-Rank   D-Rank   d      d²
100   257   2.5      1         1.5   2.25
102   264   5        4         1     1
103   274   6        6         0     0
101   266   4        5        -1     1
105   277   7.5      8        -0.5   0.25
100   263   2.5      3        -0.5   0.25
99    258   1        2        -1     1
105   275   7.5      7         0.5   0.25

Ave: S = 102, D = 267      Sum of d² = 6

(d = S-Rank - D-Rank)


Apply the following formula:

$$r_s = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)} = 1 - \frac{6(6)}{8(64 - 1)} = 1 - \frac{36}{8(63)} = 0.929$$

The value of rs denotes the magnitude and nature of association, giving the same interpretation as simple r.
Example 2
Table below: Wine Consumption and Heart Disease Deaths

Table below: Ranks of Wine Consumption and Heart Disease Deaths

Solution
• ∑d² = 2,081.5
• $r_s = 1 - \frac{6(2{,}081.5)}{19(360)} = -0.826$
• There is a strong negative rank correlation between wine consumption and heart disease deaths.
Example
In a study of the relationship between level of education and income, the following data were obtained. Find the relationship between them and comment.

Sample   Level of education (X)   Income (Y)
A        Pre-secondary            25
B        Primary                  10
C        University               8
D        Secondary                10
E        Secondary                15
F        Illiterate               50
G        University               60
Answer:
Sample   Level of education (X)   Income (Y)   Rank X   Rank Y   di     di²
A        Pre-secondary            25           5        3         2     4
B        Primary                  10           6        5.5       0.5   0.25
C        University               8            1.5      7        -5.5   30.25
D        Secondary                10           3.5      5.5      -2     4
E        Secondary                15           3.5      4        -0.5   0.25
F        Illiterate               50           7        2         5     25
G        University               60           1.5      1         0.5   0.25

∑di² = 64

$$r_s = 1 - \frac{6 \times 64}{7(48)} = 1 - \frac{384}{336} = -0.14 \approx -0.1$$

Comment:
There is an indirect weak correlation
between level of education and income.
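The ranking of an ordinal variable can be sketched by first mapping each category to a numeric score and then ranking the scores. In the sketch below, the `LEVEL_SCORE` coding and the helper name `descending_ranks` are assumptions introduced for illustration. Rank 1 goes to the highest level or income, matching the ranks in the answer table, and tied values share the mean rank.

```python
# Hypothetical ordinal coding (an assumption): larger score = more education.
LEVEL_SCORE = {
    "illiterate": 0,
    "primary": 1,
    "pre-secondary": 2,
    "secondary": 3,
    "university": 4,
}


def descending_ranks(values):
    """Rank values so the largest gets rank 1; tied values share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j + 2) / 2  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


# Data for samples A to G from the example above.
education = ["pre-secondary", "primary", "university", "secondary",
             "secondary", "illiterate", "university"]
income = [25, 10, 8, 10, 15, 50, 60]

rank_x = descending_ranks([LEVEL_SCORE[level] for level in education])
rank_y = descending_ranks(income)
sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
n = len(income)
rs = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))
print(sum_d2, round(rs, 2))  # 64.0 -0.14
```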
Assignment 1
The following table shows how 10 students, arranged in alphabetical order, were ranked according to their achievements in both course work and final examination of QM. Find the coefficient of rank correlation.

COURSE WORK    8   3   9   2   7  10   4   6   1   5
EXAMINATION    9   5  10   1   8   7   3   4   2   6
Assignment 2

The following table shows the respective heights X and Y of a sample of 12 fathers and their oldest sons. Calculate Spearman's coefficient of rank correlation.

Height X of father (inches)   65  63  67  64  68  62  70  66  68  67  69  71
Height Y of son (inches)      68  66  68  65  69  66  68  65  71  67  68  70
Solution for Assignment 2
Arranged in ascending order of magnitude, the fathers' heights are
62, 63, 64, 65, 66, 67, 67, 68, 68, 69, 70, 71.
Since the 6th and 7th places in this array represent the same height (67 inches), we assign a mean rank of 6.5 to these places. Similarly, the 8th and 9th places are assigned the rank 8.5. It follows that the fathers' heights are assigned the ranks
1, 2, 3, 4, 5, 6.5, 6.5, 8.5, 8.5, 10, 11, 12.
The sons' heights arranged in ascending order of magnitude are
65, 65, 66, 66, 67, 68, 68, 68, 68, 69, 70, 71.
Since the 6th, 7th, 8th and 9th places represent the same height (68 inches), we assign the mean rank 7.5 [= (6+7+8+9)/4] to these places. In the same way, the two 65s share the mean rank 1.5 and the two 66s share the mean rank 3.5. Thus the sons' heights are assigned the ranks
1.5, 1.5, 3.5, 3.5, 5, 7.5, 7.5, 7.5, 7.5, 10, 11, 12.
Using the correspondences above, the ranking becomes:
Rank of father       4     2   6.5     3   8.5     1    11     5   8.5   6.5    10    12
Rank of son        7.5   3.5   7.5   1.5    10   3.5   7.5   1.5    12     5   7.5    11
D                 -3.5  -1.5    -1   1.5  -1.5  -2.5   3.5   3.5  -3.5   1.5   2.5     1
D²               12.25  2.25     1  2.25  2.25  6.25 12.25 12.25 12.25  2.25  6.25     1

∑D² = 72.50
Solutions
• Assignment 1:

$$R_{rank} = 1 - \frac{6\sum D^2}{N(N^2 - 1)} = 1 - \frac{6(24)}{10(10^2 - 1)} = 0.8545$$

• Assignment 2:

$$R_{rank} = 1 - \frac{6(72.50)}{12(12^2 - 1)} = 0.7465$$
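Both results can be checked numerically. The sketch below ranks each series (giving tied values the mean rank, as in the solution for Assignment 2) and applies the same formula. The function names `average_ranks` and `spearman_shortcut` are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1 to n (smallest = 1); tied values share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j + 2) / 2  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_shortcut(x, y):
    """rs = 1 - 6*sum(D^2) / (N(N^2 - 1)) computed on mean ranks."""
    n = len(x)
    d = [rx - ry for rx, ry in zip(average_ranks(x), average_ranks(y))]
    return 1 - 6 * sum(di ** 2 for di in d) / (n * (n ** 2 - 1))


# Assignment 1: the data are already ranks, so re-ranking leaves them unchanged.
course_work = [8, 3, 9, 2, 7, 10, 4, 6, 1, 5]
examination = [9, 5, 10, 1, 8, 7, 3, 4, 2, 6]

# Assignment 2: heights in inches.
fathers = [65, 63, 67, 64, 68, 62, 70, 66, 68, 67, 69, 71]
sons = [68, 66, 68, 65, 69, 66, 68, 65, 71, 67, 68, 70]

rs_1 = spearman_shortcut(course_work, examination)
rs_2 = spearman_shortcut(fathers, sons)
print(round(rs_1, 4), round(rs_2, 4))  # 0.8545 0.7465
```

When ranks are tied, as in Assignment 2, this shortcut formula is generally treated as an approximation; computing Pearson's r on the mean ranks can give a slightly different value.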
