
Correlation Analysis

1. What is correlation co-efficient?


Ans. The correlation coefficient, denoted by 'r', is a numerical measure of the degree of linear
relationship between two variables. Its value ranges from +1.0 to –1.0, and it indicates the
strength and direction of the relationship.
In general, r > 0 indicates a positive relationship,
r < 0 indicates a negative relationship,
while r = 0 indicates no linear relationship (the variables may still be related in a non-linear
way, so r = 0 alone does not prove that they are independent).
Here r = +1.0 describes a perfect positive correlation and r = -1.0 describes a perfect
negative correlation.
The closer the coefficient is to +1.0 or –1.0, the greater the strength of the relationship
between the variables.
2. What is probable error?
Ans. The probable error of ‘r’ is a statistical measure of the reliability and
dependability of the value of ‘r’.
3. Difference between correlation and causation.
Ans. Correlation deals with the association between two or more variables and helps to
determine the degree of relationship between them. But it does not indicate a cause and
effect relationship between the variables; it only explains co-variation.
Causation deals with the cause and effect relationship between two variables.
4. What do you mean by coefficient of determination?
It was introduced by the famous statistician Tuttle. It is the square of ‘r’. It is used to measure
the percentage of variation in the dependent variable which is accounted for by the
independent variable.
$$ r^2 = \frac{\text{Explained variance}}{\text{Total variance}} $$
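As a rough illustration of the ratio above, the following Python sketch checks that the square of 'r' equals the explained share of variance for a straight line fitted by least squares. The helper names `pearson_r` and `explained_share` are made up for this example, and the data are the X and Y values of Problem No.1 in these notes.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r computed from deviations about the actual means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def explained_share(xs, ys):
    """Explained variation / total variation for the least-squares line of Y on X."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    fitted = [intercept + slope * x for x in xs]
    explained = sum((f - mean_y) ** 2 for f in fitted)  # variation captured by the line
    total = sum((y - mean_y) ** 2 for y in ys)          # total variation in Y
    return explained / total

X = [7, 8, 9, 6, 5]
Y = [8, 6, 7, 9, 10]
print(round(pearson_r(X, Y) ** 2, 4), round(explained_share(X, Y), 4))
```

Both printed values agree, which is the sense in which r squared measures the share of variation in Y accounted for by X.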

5. What is scatter diagram? Distinguish between positive and negative correlation with the
help of scatter diagrams?
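A scatter diagram is a graphical method of studying the correlation between two variables.
One variable is shown on the X axis and the other on the Y axis. Each pair of values is
plotted on the graph by means of a dot. If the dots rise from the lower left to the upper
right, the correlation is positive; if they fall from the upper left to the lower right, the
correlation is negative.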

6. Define Karl Pearson’s co-efficient of correlation and explain the various formulae through
which Pearson’s co-efficient of correlation can be obtained.
Karl Pearson’s coefficient of correlation:
It gives the numerical expression for the measure of correlation. It is denoted by ‘r’.
The value of ‘r’ gives the magnitude of correlation and its sign denotes the direction.

Method No.1
Direct method based on deviations
Deviations of the items are taken either from the actual means or from assumed means.
The actual mean method is conveniently used where the values of the variables are large and their
deviations from their respective means are whole numbers.

The assumed mean method is advisable when the arithmetic means of both the variables are not
whole or round numbers, i.e. if the actual means are fractions. It is then better to take the
deviations from assumed means.

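Formula
Actual mean method:
$$ r = \frac{\sum dx\,dy}{\sqrt{\sum dx^2 \cdot \sum dy^2}} $$
where $dx = X - \bar{X}$ and $dy = Y - \bar{Y}$, so that $\sum dx = 0$ and $\sum dy = 0$.

Assumed mean method:
$$ r = \frac{n\sum dx\,dy - \sum dx \sum dy}{\sqrt{n\sum dx^2 - (\sum dx)^2}\;\sqrt{n\sum dy^2 - (\sum dy)^2}} $$
where $dx = X - A_x$ and $dy = Y - A_y$ are the deviations from the assumed means $A_x$ and $A_y$, and n is the number of pairs.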
Steps (actual mean method)
1. Find the arithmetic mean of both the variables.
2. Find the deviations of the values of both the variables from their respective means and
present them as X and Y (or dx and dy) respectively.
3. Square the deviations of each variable and present them as X² and Y², or dx² and dy².
4. Find the product of each pair of deviations and total them as ∑XY or ∑dxdy.
5. Find the total of the squares of the deviations of each variable, ∑X² and ∑Y², or
∑dx² and ∑dy².
6. Put the respective values in the relevant formula.
7. Interpret the result, which lies between +1 and -1.

Steps (assumed mean method)
a) Find the deviations of both the X and Y series from the assumed mean. They are
denoted by dx and dy and their totals by ∑dx and ∑dy respectively.
b) Square the deviations under the headings dx² and dy² and total them as ∑dx² and ∑dy².
c) Multiply dx and dy and total the products as ∑dxdy.
d) Substitute the values in the formula.
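The two sets of steps can be sketched in Python as follows. This is only an illustration: the function names `r_actual_mean` and `r_assumed_mean` are invented here, the assumed means 8 and 7 are arbitrary choices, and the data are the X and Y values of Problem No.1. Both routes should give the same 'r'.

```python
from math import sqrt

def r_actual_mean(xs, ys):
    """r = sum(dx*dy) / sqrt(sum(dx^2) * sum(dy^2)), deviations from actual means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    sum_dxdy = sum(a * b for a, b in zip(dx, dy))
    sum_dx2 = sum(a * a for a in dx)
    sum_dy2 = sum(b * b for b in dy)
    return sum_dxdy / sqrt(sum_dx2 * sum_dy2)

def r_assumed_mean(xs, ys, assumed_x, assumed_y):
    """Same r, but with deviations taken from assumed (arbitrary) means."""
    n = len(xs)
    dx = [x - assumed_x for x in xs]
    dy = [y - assumed_y for y in ys]
    sum_dx, sum_dy = sum(dx), sum(dy)
    sum_dxdy = sum(a * b for a, b in zip(dx, dy))
    sum_dx2 = sum(a * a for a in dx)
    sum_dy2 = sum(b * b for b in dy)
    numerator = n * sum_dxdy - sum_dx * sum_dy
    denominator = sqrt(n * sum_dx2 - sum_dx ** 2) * sqrt(n * sum_dy2 - sum_dy ** 2)
    return numerator / denominator

X = [7, 8, 9, 6, 5]
Y = [8, 6, 7, 9, 10]
print(round(r_actual_mean(X, Y), 4))         # -0.9
print(round(r_assumed_mean(X, Y, 8, 7), 4))  # -0.9
```

Here the actual means are 7 and 8, so ∑dxdy = -9 and ∑dx² = ∑dy² = 10, giving r = -0.9, a high degree of negative correlation.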

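Product moment correlation coefficient

$$ r = \frac{\sum (X-\bar{X})(Y-\bar{Y})}{n\,\sigma_X\,\sigma_Y} $$

where $\sigma_X$ and $\sigma_Y$ are the standard deviations (SD) of X and Y:

$$ \sigma_X = \sqrt{\frac{\sum (X-\bar{X})^2}{n}}, \qquad \sigma_Y = \sqrt{\frac{\sum (Y-\bar{Y})^2}{n}} $$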
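Since ∑(X-X̄)(Y-Ȳ)/n is the covariance of X and Y, the product moment formula amounts to dividing the covariance by the product of the two standard deviations. A minimal Python sketch, using the figures given in Problem No.3 (the function name `r_from_covariance` is hypothetical):

```python
def r_from_covariance(cov_xy, sd_x, sd_y):
    """Product moment r = covariance / (SD of X * SD of Y)."""
    return cov_xy / (sd_x * sd_y)

# Figures from Problem No.3: covariance 8.13, SDs 3.01 and 3.03.
r = r_from_covariance(8.13, 3.01, 3.03)
print(round(r, 4))  # 0.8914
```

The number of pairs is not needed once the covariance is known, because the division by n is already contained in it.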
7. What are the properties of the Karl Pearson’s co-efficient of correlation?
a) The correlation coefficient has a well-defined formula.
b) r is a pure number and is independent of the units of measurement.
c) It lies between -1 and +1.
d) r does not change with a change of origin or a change of scale (provided the scale factors have the same sign).
e) r between X and Y is the same as that between Y and X.
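Properties (d) and (e) can be checked numerically. The sketch below is an illustration only: `pearson_r` is a helper written for this example, the shift and scale constants are arbitrary, and the data are taken from Problem No.1.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r from deviations about the actual means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

X = [7, 8, 9, 6, 5]
Y = [8, 6, 7, 9, 10]

base = pearson_r(X, Y)
shifted = pearson_r([x - 7 for x in X], [y + 100 for y in Y])  # change of origin
scaled = pearson_r([10 * x for x in X], [0.5 * y for y in Y])  # change of scale, positive factors
swapped = pearson_r(Y, X)                                      # roles of X and Y exchanged

print(round(base, 4), round(shifted, 4), round(scaled, 4), round(swapped, 4))
```

All four values agree. Multiplying only one of the series by a negative factor would reverse the sign of 'r' while leaving its magnitude unchanged.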
8. Explain the Assumptions on which Karl Pearson’s ‘r’ is based.
Ans. Prof. Pearson’s co-efficient is based on the following assumptions:
1) Linear relationship
In devising the formula, Prof. Pearson assumed that there is a linear relationship between
the variables, which means that if the values of the two variables are plotted on a scatter
diagram, the points will lie along a straight line.
2) Cause and effect relationship
There is a cause and effect relationship between the variables, which means that a change in the
value of one variable causes a change in the value of the other variable.
3) Normality of distribution
It is assumed that the population from which the data are collected is normally distributed.
4) Multiplicity of causes
Each of the variables under study is affected by a multiplicity of causes.
5) Probable error of measurement
He assumed that there is a probability of some error creeping into the measurement
of ‘r’.

9. What do you mean by probable error ? what are the uses of it?
Probable error of ‘r’ is a statistical measure of the reliability and
dependability of the value of ‘r’.
The magnitude of the probable error is obtained by the following formula:
$$ PE(r) = 0.6745 \times \frac{1-r^2}{\sqrt{n}} $$
where r = coefficient of correlation, n = number of pairs of the two variables, and 0.6745 is a constant.
The standard error of ‘r’ is
$$ SE(r) = \frac{1-r^2}{\sqrt{n}} $$
so that PE(r) = 0.6745 × SE(r).
Usually ‘r’ is calculated from samples. For different samples drawn from the same population,
the value of ‘r’ may vary, but the size of such variations is expected to be less than the
PE.
Probable error is thus the difference (error) that arises from working with a sample
rather than with the whole population.
Interpretation of ‘r’ on the basis of PE (significance of ‘r’)
r < PE(r) (less than): Not at all significant; correlation is taken to be almost absent.
r > 6 times PE(r) (more than): It is significant.
r > PE(r) but < 6 times PE(r): Correlation is taken to be moderate.

By adding and subtracting the value of PE from the coefficient of correlation we get the
upper limit and lower limit within which the ‘r’ in the population can be expected to lie.
Symbolically, P = r ± PE(r), where P denotes the correlation in the population.
Uses of PE
a) It is used to determine the limits within which the population correlation coefficient
may be expected to lie.
b) It measures the reliability and dependability of the value of ‘r’.
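The probable error, the limits r ± PE and the rule of thumb for significance can be sketched in Python as below. The function names are hypothetical, and comparing the absolute value of 'r' with PE (so that negative coefficients are handled as well) is an assumption of this sketch rather than something stated in the notes. The inputs are those of Problems No 6 and No.7.

```python
from math import sqrt

def standard_error(r, n):
    """SE(r) = (1 - r^2) / sqrt(n)."""
    return (1 - r ** 2) / sqrt(n)

def probable_error(r, n):
    """PE(r) = 0.6745 * SE(r)."""
    return 0.6745 * standard_error(r, n)

def population_limits(r, n):
    """Lower and upper limits r - PE and r + PE."""
    pe = probable_error(r, n)
    return r - pe, r + pe

def interpret(r, n):
    """Rule of thumb from the notes; abs(r) is used here as an assumption."""
    pe = probable_error(r, n)
    if abs(r) < pe:
        return "not significant"
    if abs(r) > 6 * pe:
        return "significant"
    return "moderate"

# Problem No 6: r = 0.7, n = 25
low, high = population_limits(0.7, 25)
print(round(probable_error(0.7, 25), 4), round(low, 4), round(high, 4))  # 0.0688 0.6312 0.7688
print(interpret(0.7, 25))  # significant
# Problem No.7: r = +0.4, n = 10
print(interpret(0.4, 10))  # moderate
```

For r = 0.7 and n = 25, six times the probable error is about 0.41, which is below 0.7, so the coefficient is treated as significant.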
10. Explain the methods for calculating Spearman’s rank correlation co-efficient.
This method is a development over Karl Pearson’s method of correlation coefficient.
There are many occasions where the values of certain variables cannot be measured in
quantitative terms. If we want to study the association between two attributes, say intelligence
and beauty, we cannot assign definite values to them. To study the correlation between such
attributes, a method was developed by the British psychologist Charles Edward Spearman in 1904.
This method is known as Rank Correlation.
$$ R = 1 - \frac{6\sum D^2}{N^3 - N} \quad \text{or} \quad R = 1 - \frac{6\sum D^2}{n(n^2-1)} $$
where ‘D’ is the difference of ranks and ‘N’ (or n) is the number of pairs.
When the actual ranks are given
Steps:
a) Take the difference of the two ranks, i.e. R1 - R2.
b) Square these differences (D²) and total them, i.e. ∑D².
c) Apply the formula.

When the actual ranks are not given
Steps:
a) Assign the ranks in either ascending or descending order.
b) In case of ascending order, the smallest value is assigned the first rank.
c) In case of descending order, the largest value is assigned the first rank.

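When equal ranks appear (repeated ranks, or ties in rank)

In some cases two or more items get the same rank; in that case the average rank is assigned to each of them.
$$ R = 1 - \frac{6\left[\sum D^2 + \frac{1}{12}(m^3 - m) + \dots\right]}{N^3 - N} $$
‘m’ stands for the number of times each value repeats; one correction term $\frac{1}{12}(m^3 - m)$ is added for each group of tied values.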
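A minimal Python sketch of the rank correlation calculation is given below, covering both the case where ranks are supplied and the case where ranks must be assigned (with average ranks and the m³ - m correction for ties). The function names are made up for this illustration, and ranking in descending order is one of the two permitted choices. The data are those of Problems No 9 and No 11.

```python
from collections import Counter

def average_ranks(values):
    """Rank in descending order (largest value = rank 1); tied values share the average rank."""
    ordered = sorted(values, reverse=True)
    rank_of = {}
    for v in set(values):
        first = ordered.index(v) + 1         # first rank position occupied by v
        count = ordered.count(v)             # number of times v repeats (m)
        rank_of[v] = first + (count - 1) / 2
    return [rank_of[v] for v in values]

def tie_correction(values):
    """Sum of (m^3 - m) / 12 over every group of repeated values."""
    return sum((m ** 3 - m) / 12 for m in Counter(values).values())

def spearman_from_ranks(ranks_1, ranks_2, correction=0.0):
    """R = 1 - 6 * (sum D^2 + correction) / (N^3 - N)."""
    n = len(ranks_1)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks_1, ranks_2))
    return 1 - 6 * (sum_d2 + correction) / (n ** 3 - n)

def spearman(xs, ys):
    """Rank correlation when only the raw values are given."""
    correction = tie_correction(xs) + tie_correction(ys)
    return spearman_from_ranks(average_ranks(xs), average_ranks(ys), correction)

# Problem No 9: ranks are given.
before = [1, 6, 3, 9, 5, 2, 7, 10, 8, 4]
after = [6, 8, 3, 2, 7, 10, 5, 9, 4, 1]
print(round(spearman_from_ranks(before, after), 4))  # -0.0667

# Problem No 11: ranks are not given and some values repeat.
X = [68, 64, 75, 50, 64, 80, 75, 40, 55, 64]
Y = [62, 58, 68, 45, 81, 60, 68, 48, 50, 70]
print(round(spearman(X, Y), 4))  # 0.5455
```

In the second case ∑D² is 72 and the three tie groups (75 twice and 64 three times in X, 68 twice in Y) add 0.5 + 2 + 0.5 = 3 to it before the formula is applied.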
11. What do you mean by concurrent deviation method?
Under this method only the directions of the deviations are taken into account; the magnitudes
of the values are ignored. The nature of correlation is known from the direction of change in the
values of the variables. If the deviations of the two variables are concurrent, they move in the same
direction.
$$ r_c = \pm\sqrt{\pm\frac{2c-n}{n}} $$
‘c’ is the number of concurrent deviations and ‘n’ is the number of pairs of deviations compared, n = N - 1.
12. Explain the steps under the concurrent deviation method.
1. Find the direction of change of X and of Y by comparing each value with the value before it,
starting with the 2nd value against the 1st, for each variable.
2. An increase in value is denoted by a +ve sign and a decrease in value by a –ve sign under
Dx and Dy.
3. Multiply Dx and Dy to determine the value of ‘c’, i.e. the number of positive signs in the
product column, which is the number of concurrent deviations.
4. r is +ve when 2c > n.
5. r is –ve when 2c < n.
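The steps can be sketched in Python as follows. The function names are hypothetical, the data are made-up illustrative values (not taken from the problems), and treating a pair in which either variable shows no change as non-concurrent is an assumption of this sketch.

```python
from math import sqrt

def sign(value):
    """Return +1 for an increase, -1 for a decrease, 0 for no change."""
    return (value > 0) - (value < 0)

def concurrent_deviation(xs, ys):
    """r = +/- sqrt(+/- (2c - n) / n), with n = N - 1 pairs of deviations."""
    dx = [sign(b - a) for a, b in zip(xs, xs[1:])]  # direction of change in X
    dy = [sign(b - a) for a, b in zip(ys, ys[1:])]  # direction of change in Y
    n = len(dx)
    # A deviation is concurrent when the product of the signs is positive.
    c = sum(1 for a, b in zip(dx, dy) if a * b > 0)
    inner = (2 * c - n) / n
    # The same sign is used inside and outside the root, so the root is always real.
    return sqrt(inner) if inner >= 0 else -sqrt(-inner)

# Illustrative values only.
X = [1, 2, 3, 2, 5]
Y = [2, 3, 5, 4, 3]
print(round(concurrent_deviation(X, Y), 4))  # 0.7071
```

Here the signs of change are +, +, -, + for X and +, +, -, - for Y, so c = 3 out of n = 4, 2c - n = 2, and r = +√(2/4).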
Problems
No.1
X 7 8 9 6 5
Y 8 6 7 9 10

Find the Karl Pearson’s co-efficient of correlation


No.2
X: 5 10 5 11 12 4 3 2 7 6
Y: 1 6 2 8 5 1 4 6 5 2
Find the Karl Pearson’s co-efficient of correlation.
No.3
X Y
No.of Pairs of observation 15 15
Standard deviation 3.01 3.03
Covariance between X and Y is 8.13
Calculate Karl Pearson’s co-efficient of correlation

No.4
Find the co-efficient of correlation between age and playing habit of the following students.
Age:              14.5-15.5  15.5-16.5  16.5-17.5  17.5-18.5  18.5-19.5  19.5-20.5
No. of students:  250        200        150        120        100        80
Regular players:  200        150        90         48         30         12
No.5
Find the co-efficient of correlation between the density of population and the death rate.
Cities A B C D E F
Areas in sq.miles 150 180 100 60 120 80
Population in’000 30 90 40 42 72 24
No. of death 300 1440 560 840 1224 312

No 6
A student calculates the values of ‘r’ as 0.7 when the number of items (n) is 25. Find the limits
within which ‘r’ lies for another sample from the same universe.

No.7

Test the significance of correlation for the following values, based on the number of observations:
i) n = 10 and ‘r’ = +0.4
No 8

The coefficient of rank correlation of the marks obtained by 10 students in statistics and
accountancy was 0.2. It was later discovered that the difference in ranks in the two subjects
of one of the students was wrongly entered as 7 instead of 9. Find the correct correlation
coefficient.
No 9

The ranking of 10 individuals at the start and at the finish of a course of a training are as follows.
Individuals A B C D E F G H I J
Rank before 1 6 3 9 5 2 7 10 8 4
Rank after 6 8 3 2 7 10 5 9 4 1

No 10

Find out the spearman’s rank correlation coefficient


A 60 34 40 50 45 41 22 43 42 66 64 46
B 75 32 34 40 45 33 12 30 36 72 41 57

No 11

Find out the spearman’s rank correlation coefficient


X 68 64 75 50 64 80 75 40 55 64
Y 62 58 68 45 81 60 68 48 50 70

No 12

Calculate the coefficient of concurrent deviation from the following data


X 55 50 45 51 25 65 35 30 75 75 70
Y 60 35 30 70 58 75 30 15 75 55 45

No.13
X Y
No.of Pairs of observation 20 20
Sum of squares of deviations 136 138
from mean
Summation of product of deviations of X and Y from their means =122
Calculate product moment correlation coefficient
Additional problems
1. Calculate Karl Pearson’s ‘r’. The arithmetic means of X and Y are 6 and 8 respectively.
X 6 2 10 - 8
Y 9 11 - 8 7
2.                                                      X series   Y series
Sum of deviations from assumed mean                     -14        18
Sum of squares of deviations from assumed mean          4304       6308
Sum of products of deviations from their respective assumed means = 1510
No. of pairs of observations = 12
Calculate Karl Pearson’s ‘r’ between X and Y.

3. No. of pairs of observations of X and Y series = 1000


σx = 4.5, σy = 3.6
∑(x – x̄)(y – ȳ) = 4800
Calculate ‘r’ between X and Y series.

4. Covariance between X and Y is 488; the variances of X and Y are 824 and 325 respectively. Find
out ‘r’.

5. ‘r’ between X and Y= -0.75


Covariance is -15
S.D of Y series is 5
What will be the S.D. of X series?

6. While calculating ‘r’ from 30 pairs of X and Y, the following results were obtained:
∑x = 120, ∑x² = 600, ∑Y = 90, ∑Y² = 250, ∑xy = 356.
It was however discovered that two pairs of observations were wrongly copied:

Wrong values      Correct values
X    Y            X    Y
8    10           8    12
12   7            10   8
Calculate the correct correlation coefficient.

7. The correlation co-efficient (on the basis of rank) of the marks awarded to 10 students in
commerce and economics was 0.2. Later, it was found that the difference in ranks in the two
subjects of one of the students was wrongly entered as 7 instead of 9. Calculate the correct
rank correlation co-efficient.

8. From the following data relating to the marks secured by a batch of Candidates ascertain the
Rank Correlation coefficient and interpret result.
Candidates :A B C D E F G H I J
Marks in statistics : 55 40 50 35 37 18 30 22 15 5
Maths : 58 60 48 50 30 32 45 37 42 52
Economics :70 68 75 40 80 50 30 85 25 90

9 Calculate ‘r’ from the following results


N=10, X=140, Y=150, (X-10)2=180, (Y-15)2 =215
(X-10)(Y-15)=60
10. Calculate ‘r’ from the following data:
                                                          X series   Y series
Number of pairs of observations                           15         15
Arithmetic Mean                                           25         18
Standard deviation                                        3.01       3.03
Sum of squares of deviations from the Arithmetic Mean     136        138
Summation of product of deviations of X and Y series from their respective Arithmetic
Means = 122
11. Co-efficient of correlation between 2 variates X and Y is 0.48. Their co-variance is 36. The
variance of X is 16. Find the standard deviation of Y series.
12. The co-efficient of rank correlation of the marks obtained by 10 students in statistics and
accountancy was found to be 0.2. It was later discovered that the difference in ranks in the two
subjects obtained by one of the students was wrongly taken as 9 instead of 7. Find the correct
value of co-efficient of rank correlation.
13.If co-variance between X and Y is 10, variance of X and Y is 16 and 9. Find correlation co-
efficient.
14. Given r = 0.8, ∑xy = 60, standard deviation of Y = 2.5, ∑x² = 90. Find the number of items.
15.Determine the value of n, where r is 0.80, and its PE(r) is 0.025.
16.A computer while calculating the correlation co-efficient between two variables
X and Y from 25 pairs of observations obtained the following results:-
N = 25, ∑X = 125, ∑Y = 100, ∑X² = 650, ∑Y² = 460, ∑XY = 508
It was, however, discovered at the time of checking that two pairs of observations
were not correctly copied. They were taken as (6, 14) and (8, 6) while the correct values
were (8,12) and (6,8). Find out the correct value of correlation co-efficient.
17.Find out the relationship between age and blindness.
Age No. of persons Blind
(in thousands)
0 – 10 100 55
10 – 20 60 40
20 – 30 40 40
30 – 40 36 40
40 – 50 24 36
50 – 60 11 22
60 – 70 6 18
70 – 80 3 15
************
