A Positive Relationship
Correlation: Correlation between two variables X and Y indicates whether they are related to each
other and, if so, to what extent. There are two types of correlation: positive and negative.
If the two variables move in the same direction, i.e. both increase or both decrease together,
then there is a positive correlation between them. If they move in opposite directions, i.e. one
increases while the other decreases, then there is a negative correlation between them. There are
different ways of expressing the correlation between two variables.
(i) Scatter Diagrams: By plotting the values of two variables X and Y, we can understand the
relationship between them.
[Figures: three scatter diagrams showing a positive relationship, a negative relationship and no apparent relationship]
For a given change in X, if the change in Y is the same at every point and in the same
direction, then all the points lie on a straight line whose slope is the same throughout. This is
called perfect positive correlation between X and Y.
Similarly, for a given change in X, if the change in Y is the same at every point but in
the opposite direction, then there is a perfect negative correlation between them.
(ii) Karl Pearson’s coefficient of correlation ‘r’: The value of the correlation coefficient lies
between -1 and 1. This tells us to what extent the variables are related.
If r is between 0.7 and 1, it shows high positive correlation, if it is between -0.7 and -1, it shows
high negative correlation. If r is close to 0.5 or -0.5, it shows moderate positive or negative
correlation. If r is close to 0, it shows low positive or negative correlation.
Karl Pearson’s correlation coefficient can be calculated as
r = Cov(X, Y) / (σx σy)
where Cov(X, Y) is the covariance between X and Y, σx is the standard deviation of variable X
and σy is the standard deviation of variable Y.
Cov(X, Y) = ∑((X - X bar)(Y - Y bar)) / n, where n is the number of observations.
σx = sqrt(∑(X - X bar)^2 / n), σy = sqrt(∑(Y - Y bar)^2 / n)
Substituting the expressions for Cov(X, Y), σx and σy in the formula for r, we get
r = ∑xy / sqrt(∑x^2 * ∑y^2)
where x = X - X bar, y = Y - Y bar
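The substitution step can be written out in full. The following is a sketch in LaTeX notation, taking the covariance as the mean of the products of the deviations x = X - X bar and y = Y - Y bar; the factors of 1/n cancel between numerator and denominator.

```latex
\begin{aligned}
r &= \frac{\operatorname{Cov}(X,Y)}{\sigma_x\,\sigma_y}
   = \frac{\frac{1}{n}\sum xy}
          {\sqrt{\frac{1}{n}\sum x^2}\,\sqrt{\frac{1}{n}\sum y^2}} \\[4pt]
  &= \frac{\frac{1}{n}\sum xy}{\frac{1}{n}\sqrt{\sum x^2 \sum y^2}}
   = \frac{\sum xy}{\sqrt{\sum x^2 \,\sum y^2}}
\end{aligned}
```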
Q1. The following data relates to Sales (Crores of rupees) and Profits (in lakhs of rupees):
Sales X : 5 7 8 10 15
Profits Y : 12 15 17 20 21
Find the correlation coefficient between sales and profits.
Solution:
X    Y    x = X - X bar    y = Y - Y bar    xy    x^2    y^2
5 12 -4 -5 20 16 25
7 15 -2 -2 4 4 4
8 17 -1 0 0 1 0
10 20 1 3 3 1 9
15 21 6 4 24 36 16
X bar = 45/5 = 9, Y bar = 85/5 = 17
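The column totals of this table give the coefficient directly. As a minimal sketch (the function name `pearson_r` is illustrative, not part of the original notes), the deviation formula r = ∑xy / sqrt(∑x^2 * ∑y^2) can be applied to the sales and profits data as follows:

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r from deviations about the means: sum(xy) / sqrt(sum(x^2) * sum(y^2))."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    dx = [x - x_bar for x in xs]          # x = X - X bar
    dy = [y - y_bar for y in ys]          # y = Y - Y bar
    sum_xy = sum(a * b for a, b in zip(dx, dy))
    sum_x2 = sum(a * a for a in dx)
    sum_y2 = sum(b * b for b in dy)
    return sum_xy / sqrt(sum_x2 * sum_y2)

sales = [5, 7, 8, 10, 15]        # X, crores of rupees
profits = [12, 15, 17, 20, 21]   # Y, lakhs of rupees

r = pearson_r(sales, profits)
print(round(r, 3))  # 0.911
```

Here ∑xy = 51, ∑x^2 = 58 and ∑y^2 = 54, so r = 51 / sqrt(58 * 54), which indicates a high positive correlation between sales and profits.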
Q2. The following data relates to advertising expenditure (in lakhs of rupees) and sales (in crores
of rupees):
Advertising expenditure X : 10 12 15 23 20 22
Sales Y : 14 17 23 25 21 22
X    Y    x = X - X bar    y = Y - Y bar    xy    x^2    y^2
This means that there is a high positive correlation between X and Y. So both X and Y move in
the same direction and the extent of relationship between them is high.
(iii) Spearman’s coefficient of rank correlation ‘rs’: If the data is available in the form of
rankings, then we can find the rank correlation coefficient to find the extent of similarity or
dissimilarity between them. The value of rs lies between -1 and +1. We can calculate rs as
follows:
rs = 1 - (6∑d^2 / (n(n^2 - 1))), where n is the number of observations and d is the difference
between the ranks.
Suppose we ask two employees to give their rankings to various factors affecting job
satisfaction. Then we can find out to what extent the opinion of these two employees is similar or
dissimilar. If rs is positive then the opinion is similar and if it is negative, then the opinion is
dissimilar.
Factor                  R1    R2    d^2
Salary                  1     3     4
Job conditions          2     2     0
Growth opportunities    3     1     4
∑d^2 = 8, n = 4
So there is a low positive correlation between the opinions of the two employees. This means
that the opinions of the two employees are similar, but only weakly so.
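The value behind this conclusion follows from the stated totals ∑d^2 = 8 and n = 4. A minimal sketch (the helper name `spearman_from_d2` is an illustrative assumption) applying the rank correlation formula:

```python
def spearman_from_d2(sum_d2, n):
    """Spearman's rank correlation for untied ranks: 1 - 6*sum(d^2) / (n*(n^2 - 1))."""
    return 1 - 6 * sum_d2 / (n * (n**2 - 1))

rs = spearman_from_d2(sum_d2=8, n=4)
print(round(rs, 2))  # 0.2
```

The result, 1 - 48/60 = 0.2, is positive but close to 0, which matches the description of a low positive correlation.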
Now, we take an example where marks of students in two subjects are given. We can convert
these marks into ranks and find the rank correlation coefficient.
So there is a low negative correlation between the marks in the two tests. This means that the
marks in the two tests move in opposite directions, but the relationship is very weak.
If some values repeat for one or both of the variables, then we give an average rank to these
values and introduce a correction factor in the formula, which can be written as
Correction factor = 1/12 * [(m1^3 - m1) + (m2^3 - m2) + …], where m1, m2, … are the number of
times each repeated value occurs.
Q1. For the following data, find the rank correlation coefficient after making adjustment for tied
ranks.
X: 48 33 40 9 16 16 65 24 16 57
Y: 13 13 24 6 15 4 20 9 6 19
Solution:
X     Y     R1    R2     d^2
48    13    3     5.5    6.25
33    13    5     5.5    0.25
40    24    4     1      9
9     6     10    8.5    2.25
16    15    8     4      16
16    4     8     10     4
65    20    1     2      1
24    9     6     7      1
16    6     8     8.5    0.25
57    19    2     3      1
∑d^2 = 41, n = 10
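The remaining steps can be sketched in code. This assumes the common textbook form of the adjustment, in which the correction factor is added to ∑d^2, i.e. rs = 1 - 6(∑d^2 + CF) / (n(n^2 - 1)); the notes do not spell this form out, and the helper names are illustrative.

```python
from collections import Counter

def average_ranks(values):
    """Rank in descending order (largest value = rank 1); tied values share the average rank."""
    positions = {}
    for pos, v in enumerate(sorted(values, reverse=True), start=1):
        positions.setdefault(v, []).append(pos)
    return [sum(positions[v]) / len(positions[v]) for v in values]

def tie_correction(values):
    """(1/12) * sum of (m^3 - m) over each value, where m is how often it occurs."""
    return sum(m**3 - m for m in Counter(values).values()) / 12

X = [48, 33, 40, 9, 16, 16, 65, 24, 16, 57]
Y = [13, 13, 24, 6, 15, 4, 20, 9, 6, 19]

r1, r2 = average_ranks(X), average_ranks(Y)
sum_d2 = sum((a - b) ** 2 for a, b in zip(r1, r2))
cf = tie_correction(X) + tie_correction(Y)   # ties: 16 (three times), 13 and 6 (twice each)
n = len(X)

# Assumed adjusted formula: the correction factor is added to sum(d^2).
rs = 1 - 6 * (sum_d2 + cf) / (n * (n**2 - 1))
print(sum_d2, cf, round(rs, 3))  # 41.0 3.0 0.733
```

The computed ranks reproduce the R1 and R2 columns of the table, and the correction factor is (24 + 6 + 6) / 12 = 3.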
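Regression: In the case of regression, we express the relation between two variables in the form of
a cause-effect relationship, which can be written as a linear equation Y = a1 + byx X, where a1 is
the intercept and byx is the slope (the regression coefficient of Y on X).
Example: Y = 1 + 2X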
For some given values of X and Y, we can draw many lines through them, but there will
be only one line which is the closest to these points, and this is called the best fit line. The
values of a1 and byx can be found by using the method of least squares. In this method, we try to
minimise the value of ∑e^2, where e is the difference between the Y coordinate of the plotted
point and that of the point on the straight line at the same X.
These values of a1 and byx can be substituted in the equation Y = a1 + byx X, and this equation can
be used to forecast the value of Y for a given value of X.
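The least squares step can be written out as follows. This is a sketch in LaTeX notation of the standard derivation, with x = X - X bar and y = Y - Y bar; setting the partial derivatives of the sum of squared errors to zero gives two normal equations, whose solution is the slope and intercept used in the worked problems.

```latex
\begin{aligned}
\sum e^2 &= \sum \left(Y - a_1 - b_{yx} X\right)^2 \\[4pt]
\frac{\partial \sum e^2}{\partial a_1} = 0
  &\;\Rightarrow\; \sum Y = n\,a_1 + b_{yx} \sum X \\[4pt]
\frac{\partial \sum e^2}{\partial b_{yx}} = 0
  &\;\Rightarrow\; \sum XY = a_1 \sum X + b_{yx} \sum X^2 \\[4pt]
b_{yx} &= \frac{\sum xy}{\sum x^2}, \qquad a_1 = \bar{Y} - b_{yx}\,\bar{X}
\end{aligned}
```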
Similarly, the regression equation of X on Y is X = a2 + bxy Y.
byx × bxy = r^2, where r^2 is called the coefficient of determination. It signifies the proportion
of the variation in the dependent variable that is explained by the independent variable.
r = ± √(byx × bxy)
Both byx and bxy should be of the same sign, otherwise r would be imaginary. If byx and bxy are
positive, then r is positive, and if byx and bxy are negative, then r is also
negative.
Q1. For the following data, find the simple linear regression equation of Y on X and forecast Y
when X = 20.
X    Y    x = X - X bar    y = Y - Y bar    xy    x^2    y^2
5 12 -4 -5 20 16 25
7 15 -2 -2 4 4 4
8 17 -1 0 0 1 0
10 20 1 3 3 1 9
15 21 6 4 24 36 16
X bar = 45/5 = 9, Y bar = 85/5 = 17
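The regression line and the forecast follow from the table totals. A minimal sketch, assuming the least squares formulas byx = ∑xy / ∑x^2 and a1 = Y bar - byx X bar used elsewhere in these notes (the function name `fit_line` is illustrative):

```python
def fit_line(xs, ys):
    """Least squares line of Y on X; returns (intercept a1, slope byx)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sum_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sum_x2 = sum((x - x_bar) ** 2 for x in xs)
    slope = sum_xy / sum_x2
    intercept = y_bar - slope * x_bar
    return intercept, slope

X = [5, 7, 8, 10, 15]
Y = [12, 15, 17, 20, 21]

intercept, slope = fit_line(X, Y)
forecast = intercept + slope * 20   # forecast of Y when X = 20
print(round(slope, 3), round(intercept, 3), round(forecast, 2))  # 0.879 9.086 26.67
```

With ∑xy = 51 and ∑x^2 = 58, the fitted line is approximately Y = 9.086 + 0.879 X, giving a forecast of about 26.67 at X = 20.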
Q2. The coefficient of correlation between the variables x and y is 0.64, their covariance is 16.
The variance of x is 9. Find the standard deviation of y.
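This problem can be worked by rearranging r = Cov(X, Y) / (σx σy) for σy. A short sketch in LaTeX notation, using only the three values given in the question:

```latex
\begin{aligned}
\sigma_x &= \sqrt{9} = 3 \\[4pt]
r = \frac{\operatorname{Cov}(x,y)}{\sigma_x\,\sigma_y}
  &\;\Rightarrow\;
  \sigma_y = \frac{\operatorname{Cov}(x,y)}{r\,\sigma_x}
           = \frac{16}{0.64 \times 3}
           = \frac{16}{1.92} \approx 8.33
\end{aligned}
```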
Q3. Calculate the regression of Y on X and the regression equation of X on Y for the following
data. Also find the coefficient of determination.
X : 10 12 15 23 20
Y : 14 17 23 25 21
x = X –X bar: -6 -4 -1 7 4
y = Y –Y bar: -6 -3 3 5 1
xy : 36 12 -3 35 4
x^2 : 36 16 1 49 16
y^2 : 36 9 9 25 1
X bar = 80/5 = 16, Y bar = 100/5 = 20, ∑xy = 84, ∑x^2 = 118, ∑y^2 = 80
byx = ∑xy / ∑x^2 = 84 / 118 = 0.71, a1 = Y bar - byx X bar = 20 - (0.71)(16) = 8.64
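The regression of X on Y and the coefficient of determination can be obtained from the same deviations. A minimal sketch, assuming bxy = ∑xy / ∑y^2 and a2 = X bar - bxy Y bar by symmetry with the Y-on-X formulas; the variable names are illustrative. The intercept a1 printed here differs slightly from 8.64 because the slope is not rounded to 0.71 before it is used.

```python
X = [10, 12, 15, 23, 20]
Y = [14, 17, 23, 25, 21]

n = len(X)
x_bar, y_bar = sum(X) / n, sum(Y) / n
x = [v - x_bar for v in X]                  # deviations of X
y = [v - y_bar for v in Y]                  # deviations of Y

sum_xy = sum(a * b for a, b in zip(x, y))   # 84
sum_x2 = sum(a * a for a in x)              # 118
sum_y2 = sum(b * b for b in y)              # 80

b_yx = sum_xy / sum_x2                      # slope of Y on X
a1 = y_bar - b_yx * x_bar
b_xy = sum_xy / sum_y2                      # slope of X on Y
a2 = x_bar - b_xy * y_bar
r_squared = b_yx * b_xy                     # coefficient of determination

print(round(b_yx, 3), round(a1, 2))   # 0.712 8.61
print(round(b_xy, 2), round(a2, 2))   # 1.05 -5.0
print(round(r_squared, 3))            # 0.747
```

So the regression equation of X on Y is approximately X = -5 + 1.05 Y, and about 75% of the variation in Y is explained by X.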
Q4. Find Karl Pearson’s coefficient of correlation and the equation of the best fit simple linear
regression line for the following data.
X 9 8 7 6 5 4 3 2 1
Y 15 16 14 13 11 12 10 8 9
Solution:
X    Y    x = X - X bar    y = Y - Y bar    xy    x^2    y^2
9 15 4 3 12 16 9
8 16 3 4 12 9 16
7 14 2 2 4 4 4
6 13 1 1 1 1 1
5 11 0 -1 0 0 1
4 12 -1 0 0 1 0
3 10 -2 -2 4 4 4
2 8 -3 -4 12 9 16
1 9 -4 -3 12 16 9
X bar = 45/9 = 5, Y bar = 108/9= 12
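The result follows from the column totals of the table above. A worked sketch in LaTeX notation, using the same deviation formulas as the other problems:

```latex
\begin{aligned}
\sum xy &= 57, \qquad \sum x^2 = 60, \qquad \sum y^2 = 60 \\[4pt]
r &= \frac{\sum xy}{\sqrt{\sum x^2 \sum y^2}} = \frac{57}{\sqrt{60 \times 60}} = 0.95 \\[4pt]
b_{yx} &= \frac{\sum xy}{\sum x^2} = \frac{57}{60} = 0.95, \qquad
a_1 = \bar{Y} - b_{yx}\,\bar{X} = 12 - (0.95)(5) = 7.25 \\[4pt]
Y &= 7.25 + 0.95\,X
\end{aligned}
```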
Q5. Following are the average prices of a particular stock and the values of Stock Exchange
index for 6 years:
Solution:
X    Y    x = X - X bar    y = Y - Y bar    xy    x^2    y^2
245 307 -118 -24 2832 13924 576
255 322 -108 -9 972 11664 81
240 337 -123 6 -738 15129 36
390 310 27 -21 -567 729 441
655 350 292 19 5548 85264 361
393 360 30 29 870 900 841
X bar =2178/6 = 363, Y bar = 1986/6 = 331
byx = ∑xy / ∑x^2 = 8917 / 127610 = 0.07, a1 = Y bar - byx X bar = 331 - (0.07)(363) = 305.59
r = ∑xy / sqrt(∑x^2 * ∑y^2) = 8917 / √(127610 × 2336) = 8917 / (357.22 × 48.33) = 0.516
Q6. Find Karl Pearson’s coefficient of correlation and the equation of the best fit simple linear
regression line for the following data.
Age X 56 42 36 47 49 42 60 68
Blood pressure Y 147 125 118 128 145 140 155 162
Solution:
X    Y    x = X - X bar    y = Y - Y bar    xy    x^2    y^2
56 147 6 7 42 36 49
42 125 -8 -15 120 64 225
36 118 -14 -22 308 196 484
47 128 -3 -12 36 9 144
49 145 -1 5 -5 1 25
42 140 -8 0 0 64 0
60 155 10 15 150 100 225
68 162 18 22 396 324 484
X bar = 400/8 = 50, Y bar = 1120/8 = 140
byx = ∑xy / ∑x^2 = 1047 / 794 = 1.319, a1 = Y bar - byx X bar = 140 - (1.319)(50) = 74.05
r = ∑xy / sqrt(∑x^2 * ∑y^2) = 1047 / √(794 × 1636) = 1047 / (28.2 × 40.45) = 0.918
Q7. A research project was undertaken to determine if there is a relationship between the years
of experience on the job (X) and efficiency rating of employees (Y). The objective of the study
was to predict the efficiency rating of the employee. The sample results are as follows:
Solution:
X    Y    x = X - X bar    y = Y - Y bar    xy    x^2    y^2
1 6 -6 2 -12 36 4
20 5 13 1 13 169 1
6 3 -1 -1 1 1 1
8 5 1 1 1 1 1
2 2 -5 -2 10 25 4
1 2 -6 -2 12 36 4
14 4 7 0 0 49 0
8 5 1 1 1 1 1
4 4 -3 0 0 9 0
6 4 -1 0 0 1 0
byx = ∑xy / ∑x^2 = 26 / 328 = 0.079, a1 = Y bar - byx X bar = 4 - (0.079)(7) = 3.447
Q8. Quinine may be determined by measuring the fluorescence intensity in 1 M sulphuric acid.
Standard solutions of quinine gave the following fluorescence values. Calculate the correlation
coefficient.
If the intensity was observed to be 14.85, what is the concentration of quinine Y likely to be in
the solution?
Solution:
X    Y    x = X - X bar    y = Y - Y bar    xy    x^2    y^2
0 0 -9 -0.2 1.8 81 0.04
r = ∑xy / sqrt(∑x^2 * ∑y^2) = 4.25 / √(182.66 × 0.1) = 4.25 / √18.266 = 4.25 / 4.27 = 0.995
This means that there is a very high positive correlation between X and Y.
Q9. The manufacturers of a particular brand of chocolate were interested in examining the
relationship between the sales of chocolates and shelf space allocated to that brand of chocolate
by various stores. Data from 10 stores are as follows:
Sales (Rs in thousands) Y : 25 15 28 30 17 16 12 21 19 27
Shelf space (sq ft) X : 5 3.2 5.4 6.1 4.3 3.1 2.6 6.4 4.9 6
Determine the regression to predict sales using shelf space as the independent variable. Also
find the Karl Pearson’s correlation coefficient between X and Y.
Solution:
X    Y    x = X - X bar    y = Y - Y bar    xy    x^2    y^2
5 25 0.3 4 1.2 0.09 16
3.2 15 -1.5 -6 9 2.25 36
5.4 28 0.7 7 4.9 0.49 49
6.1 30 1.4 9 12.6 1.96 81
4.3 17 -0.4 -4 1.6 0.16 16
3.1 16 -1.6 -5 8 2.56 25
2.6 12 -2.1 -9 18.9 4.41 81
6.4 21 1.7 0 0 2.89 0
4.9 19 0.2 -2 -0.4 0.04 4
6 27 1.3 6 7.8 1.69 36
Y bar = 210/10 = 21, X bar = 47/10 = 4.7
r = ∑xy / sqrt(∑x^2 * ∑y^2) = 63.6 / √(344 × 16.54) = 63.6 / √5689.76 = 63.6 / 75.43 = 0.843
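The regression of sales on shelf space asked for in the question can be sketched from the same table. This assumes the least squares formulas byx = ∑xy / ∑x^2 and a1 = Y bar - byx X bar; the helper `predict_sales` is a hypothetical name added for illustration.

```python
shelf_space = [5, 3.2, 5.4, 6.1, 4.3, 3.1, 2.6, 6.4, 4.9, 6]   # X, sq ft
sales = [25, 15, 28, 30, 17, 16, 12, 21, 19, 27]                # Y, Rs in thousands

n = len(shelf_space)
x_bar = sum(shelf_space) / n
y_bar = sum(sales) / n

sum_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(shelf_space, sales))
sum_x2 = sum((x - x_bar) ** 2 for x in shelf_space)

b_yx = sum_xy / sum_x2            # slope: change in sales per extra sq ft
a1 = y_bar - b_yx * x_bar         # intercept

def predict_sales(space_sq_ft):
    """Hypothetical helper: predicted sales (Rs in thousands) for a given shelf space."""
    return a1 + b_yx * space_sq_ft

print(round(b_yx, 3), round(a1, 3))  # 3.845 2.927
```

With ∑xy = 63.6 and ∑x^2 = 16.54, the fitted line is approximately Y = 2.927 + 3.845 X.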
The data below show the profit (in Rs. ’000), sales (in Rs. lakhs) and advertising expenditure (in
Rs. ’00). Find the multiple regression equation of profit on sales and advertising expenditure.
Sales (X1)    Advertising expenditure (X2)    Profit (Y)    X1^2    X1X2    X2^2    X1Y    X2Y
24 16 10 576 384 256 240 160
35 17 11 1225 595 289 385 187
38 18 12 1444 684 324 456 216
41 19 13 1681 779 361 533 247
42 20 14 1764 840 400 588 280
∑X1 = 180, ∑X2 = 90, ∑Y = 60, ∑X1X2 = 3282, ∑X2^2 = 1630, ∑X1^2 = 6690, ∑X1Y = 2202,
∑X2Y = 1090, n = 5
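The normal equations ∑Y = na + b1∑X1 + b2∑X2, ∑X1Y = a∑X1 + b1∑X1^2 + b2∑X1X2 and
∑X2Y = a∑X2 + b1∑X1X2 + b2∑X2^2 become
60 = 5a + 180 b1 + 90 b2
2202 = 180a + 6690 b1 + 3282 b2
1090 = 90a + 3282 b1 + 1630 b2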
Solving the above equations, we will get the values of a, b1 and b2 which we substitute in the
equation
Y = a + b1X1 + b2X2
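Y = -6 + 0 X1 + 1 X2, i.e. Y = X2 - 6
In this data every profit value is exactly 6 less than the advertising expenditure, so the fitted
equation reproduces Y exactly and sales (X1) receives a zero coefficient.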
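The coefficients can be checked by building the three normal equations from the data and solving them. The following is a minimal sketch using exact fractions and Gauss-Jordan elimination; the function name `solve_linear_system` and the variable names are illustrative assumptions, not part of the original notes.

```python
from fractions import Fraction

def solve_linear_system(A, b):
    """Solve A x = b by Gauss-Jordan elimination with exact fractions."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        # Find a row with a non-zero entry in this column and move it into place.
        pivot_row = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot_row] = M[pivot_row], M[col]
        pivot = M[col][col]
        M[col] = [v / pivot for v in M[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    return [row[-1] for row in M]

X1 = [24, 35, 38, 41, 42]   # sales
X2 = [16, 17, 18, 19, 20]   # advertising expenditure
Y = [10, 11, 12, 13, 14]    # profit
n = len(Y)

def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

# Normal equations for Y = a + b1*X1 + b2*X2.
A = [
    [n,       sum(X1),     sum(X2)],
    [sum(X1), dot(X1, X1), dot(X1, X2)],
    [sum(X2), dot(X1, X2), dot(X2, X2)],
]
rhs = [sum(Y), dot(X1, Y), dot(X2, Y)]

a, b1, b2 = solve_linear_system(A, rhs)
print(a, b1, b2)
```

Exact fractions are used so that rounding does not obscure the solution; substituting the printed values back into Y = a + b1X1 + b2X2 for each row of the table is a quick way to confirm the fit.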