Professional Documents
Culture Documents
Regression Analysis
Regression Analysis
Regression Analysis
10.1 INTRODUCTION
In Chapter 9 we introduced the concept of statistical relationship between two variables such as: level
of sales and amount of advertising; yield of a crop and the amount of fertilizer used; price of a
product and its supply, and so on. The relationship between such variables indicate the degree and
direction of their association, but fail to answer following question:
• Is there any functional (or algebraic) relationship between two variables? If yes, can it be used
to estimate the most likely value of one variable, given the value of other variable?
The statistical technique that expresses the relationship between two or more variables in the
form of an equation to estimate the value of a variable, based on the given value of another variable,
is called regression analysis. The variable whose value is estimated using the algebraic equation is
called dependent (or response) variable and the variable whose value is used to estimate this value is
called independent (regressor or predictor) variable. The linear algebraic equation used for expressing a
dependent variable in terms of independent variable is called linear regression equation.
The basic differences between correlation and regression analysis are summarized as follows:
1. Developing an algebraic equation between two variables from sample data and predicting the
value of one variable, given the value of the other variable is referred to as regression analysis,
while measuring the strength (or degree) of the relationship between two variables is referred as
REGRESSION ANALYSIS 337
correlation analysis. The sign of correlation coefficient indicates the nature (direct or inverse) of
relationship between two variables, while the absolute value of correlation coefficient indicates
the extent of relationship.
2. Correlation analysis determines an association between two variables x and y but not that they
have a cause-and-effect relationship. Regression analysis, in contrast to correlation, determines
the cause-and-effect relationship between x and y, that is, a change in the value of independent
variable x causes a corresponding change (effect) in the value of dependent variable y if all other
factors that affect y remain unchanged.
3. In linear regression analysis one variable is considered as dependent variable and other as
independent variable, while in correlation analysis both variables are considered to be
independent.
4. The coefficient of determination r2 indicates the proportion of total variance in the dependent variable that
is explained or accounted for by the variation in the independent variable. Since value of r2 is determined
from a sample, its value is subject to sampling error. Even if the value of r2 is high, the assumption
of a linear regression may be incorrect because it may represent a portion of the relationship that
actually is in the form of a curve.
Remarks
1. The relationship between the dependent variable y and independent variable x exists and is
linear. The average relationship between x and y can be described by a simple linear regression
equation y = a + bx + e, where e is the deviation of a particular value of y from its expected value
for a given value of independent variable x.
2. For every value of the independent variable x, there is an expected (or mean) value of the dependent
variable y and these values are normally distributed. The mean of these normally distributed
values fall on the line of regression.
3. The dependent variable y is a continuous random variable, whereas values of the independent
variable x are fixed values and are not random.
4. The sampling error associated with the expected value of the dependent variable y is assumed to
be an independent random variable distributed normally with mean zero and constant standard
deviation. The errors are not related with each other in successive observations.
5. The standard deviation and variance of expected values of the dependent variable y about the
regression line are constant for all values of the independent variable x within the range of the
sample data.
338 BUSINESS S T A T I S T I C S
6. The value of the dependent variable cannot be estimated for a value of an independent variable
lying outside the range of values in the sample data.
Remarks
1. When variables x and y are correlated perfectly (either positive or negative) these lines coincide, that is, we
have only one line.
2. Higher the degree of correlation, nearer the two regression lines are to each other.
3. Lesser the degree of correlation, more the two regression lines are away from each other. That is, when r
= 0, the two lines are at right angle to each other.
4. Two linear regression lines intersect each other at the point of the average value of variables x and y.
To determine the value of ŷ for a given value of x, this equation requires the determination of two
unknown constants a (intercept) and b (also called regression coefficient). Once these constants are
calculated, the regression line can be used to compute an estimated value of the dependent variable
y for a given value of independent variable x.
REGRESSION ANALYSIS 339
The particular values of a and b define a specific linear relationship between x and y based on
sample data. The coefficient ‘a’ represents the level of fitted line (i.e., the distance of the line above or
below the origin) when x equals zero, whereas coefficient ‘b’ represents the slope of the line (a measure
of the change in the estimated value of y for a one-unit change in x).
The regression coefficient ‘b’ is also denoted as:
• byx (regression coefficient of y on x) in the regression line, y = a + bx
• bxy (regression coefficient of x on y) in the regression line, x = c + dy
1. The correlation coefficient is the geometric mean of two regression coefficients, that is, r = byx × bxy .
2. If one regression coefficient is greater than one, then other regression coefficient must be less than one,
because the value of correlation coefficient r cannot exceed one. However, both the regression coefficients
may be less than one.
3. Both regression coefficients must have the same sign (either positive or negative). This property rules out
the case of opposite sign of two regression coefficients.
4. The correlation coefficient will have the same sign (either positive or negative) as that of the two regression
coefficients. For example, if by x = – 0.664 and bxy = – 0.234, then r = – 0.664 × 0.234 = – 0.394.
5. The arithmetic mean of regression coefficients bxy and byx is more than or equal to the correlation coefficient
r, that is, (by x + bx y ) / 2 ≥ r. For example, if byx = – 0.664 and bx y = – 0.234, then the arithmetic mean of
these two values is (– 0.664 – 0.234)/2 = – 0.449, and this value is more than the value of r = – 0.394.
6. Regression coefficients are independent of origin but not of scale.
where n is the total number of pairs of values of x and y in a sample data. The equations (10.1) are
called normal equations with respect to the regression line of y on x. After solving these equations for
a and b, the values of a and b are substituted in the regression equation, y = a + bx.
Similarly if we have a least squares line x̂ = c + dy of x on y, where x̂ is the estimated mean value
of dependent variable x, then the normal equations will be
Σ x = nc + d Σ y
Σ xy = n Σ y + d Σ y2
These equations are solved in the same manner as described above for constants c and d. The values
of these constants are substituted to the regression equation x = c + dy.
2
n 2 2
n 1 n
Sxx = ∑ ( xi − x ) = ∑ xi − ∑ xi
i =1 i =1 n i =1
2
S yx n 2 n 1 n
Syy = ∑ ( yi − y ) = ∑ yi −
2
and d = , where ∑ y
S yy i =1 i =1 n i =1
Since the regression line passes through the point ( x , y ), the mean values of x and y and the
regression equations can be used to find the value of constants a and c as follows:
a = y − bx for regression equation of y on x
c = x – d y for regression equation of x on y
The calculated values of a, b and c, d are substituted in the regression line y = a + bx and
x = c + dy respectively to determine the exact relationship.
Example 10.1: Use least squares regression line to estimate the increase in sales revenue expected
from an increase of 7.5 per cent in advertising expenditure.
Solution: Assume sales revenue ( y) is dependent on advertising expenditure (x). Calculations for
regression line using following normal equations are shown in Table 10.1
Σ y = na + bΣ x and Σ x y = a Σ x + bΣ x2
Σ xΣ y 40 × 56
where Sxy = Σ x y – = 373 – = 93
n 8
(Σ x)2 (56)2
Sxx = Σ x2 – = 524 – = 132
n 8
The intercept ‘a’ on the y-axis is calculated as:
40 56
a = y − bx = – 0.704 × = 5 – 0.704×7 = 0.072
8 8
Substituting the values of a = 0.072 and b = 0.704 in the regression equation, we get
y = a + bx = 0.072 + 0.704 x
For x = 0.075, we have y = 0.072 + 0.704 (0.075) = 0.1248 or 12.48%.
342 BUSINESS S T A T I S T I C S
Example 10.2: The owner of a small garment shop is hopeful that his sales are rising significantly
week by week. Treating the sales for the previous six weeks as a typical example of this rising trend,
he recorded them in Rs. 1000’s and analysed the results.
Week : 1 2 3 4 5 6
Sales : 2.69 2.62 2.80 2.70 2.75 2.81
Fit a linear regression equation to suggest to him the weekly rate at which his sales are rising and
use this equation to estimate expected sales for the 7th week.
Solution: Assume sales ( y) is dependent on weeks (x). Then the normal equations for regression
equation: y = a + bx are written as:
Σ y = na + bΣ x and Σxy = aΣ x + bΣx2
Calculations for sales during various weeks are shown in Table 10.2.
(∑ x)2 (21)2
Sxx = ∑ x 2 − = 91 – = 17.5
n 6
The intercept ‘a’ on the y-axis is calculated as
16.37 21
a = y − bx = − 0.025 ×
6 6
= 2.728 – 0.025×3.5 = 2.64
Substituting the values a = 2.64 and b = 0.025 in the regression equation, we have
y = a + bx = 2.64 + 0.025x
For x = 7, we have y = 2.64 + 0.025 (7) = 2.815
Hence the expected sales during the 7th week is likely to be Rs. 2.815 (in Rs. 1000’s).
(a) Deviations Taken from Actual Mean Values of x and y If deviations of actual values of variables x and
y are taken from their mean values x and y , then the regression equations can be written as:
• Regression equation of y on x • Regression equation of x on y
y – y = byx (x – x ) x – x = bx y (y – y )
where by x = regression coefficient of where bx y = regression coefficient
= y on x. where bxy = of x on y.
The value of byx can be calculated using The value of bx y can be calculated
the using the formula formula
Σ (x − x ) ( y − y ) Σ (x − x ) ( y − y )
by x = bx y =
Σ ( x − x )2 Σ ( y − y )2
(b) Deviations Taken from Assumed Mean Values for x and y If mean value of either x or y or both are in
fractions, then we must prefer to take deviations of actual values of variables x and y from their
assumed means.
• Regression equation of y on x • Regression equation of x on y
y – y = byx (x – x ) x – x = bxy (y – y )
n Σ dx dy − (Σ dx ) (Σ d y ) n Σ dx dy − (Σ dx )(Σ dy )
where by x = where bx y =
nΣ dx2 2
− (Σ dx ) n Σ dy2 − (Σ dy )2
n = number of observations n = number of observations
dx = x – A; A is assumed mean of x dx = x – A; A is assumed mean
dx = = of x
dy = y – B; B is assumed mean of y dy = y – B; B is assumed mean
of y
(c) Regression Coefficients in Terms of Correlation Coefficient If deviations are taken from actual mean
values, then the values of regression coefficients can be alternatively calculated as follows:
Σ (x − x ) ( y − y ) Σ (x − x ) ( y − y )
byx = 2
bxy =
Σ (x − x ) Σ ( y − y )2
Covariance(x, y) σy Covariance(x, y) σx
= = r⋅ = = r⋅
σ2x σx σ2y σy
Example 10.3: Compute the two regression coefficients using the value of actual mean value of X and
Y from the data given below and then work out the value of r.
X : 7 4 8 6 5
Y : 6 5 9 8 2 [GJU (Hisar), BBA, 2004]
Solution: Calculations for two regression coefficients are given below:
X Y x=X – X y=Y– Y xy x2 y2
7 6 +1 0 0 1 0
4 5 –2 –1 2 4 1
8 9 +2 +3 6 4 9
6 8 0 +2 0 0 4
5 2 –1 –4 4 1 16
ΣX 30 ΣY ΣY
X = = =6 and Y = = =6
n 5 n 5
Σ xy 12
Regression coefficient of X on Y is: bxy = = = 0.4
Σ y2 30
Σ xy 12
Regression coefficient of Y on X: byx = = = 1.2
Σ x2 10
Example 10.4: Use following data to find out the two lines of regression and compute the Karl
Pearson’s coefficient of correlation.
Σ x = 250, Σ y = 300, Σ x y = 7900, Σ x2 = 6500, Σ y2 = 10000, n = 10
Solution: Calculate x , y , bxy and byx with the help of given information as follows:
Σx 250 Σy 300
x = = = 25; y = = = 30
n 10 n 10
nΣ x y − Σ xΣ y 10(7900) − (250) (300)
bxy = 2 2
= 2
= 0.4
n Σ y − (Σ y) 10(10000) − (300)
Let the regression line of x on y be:
x − x = bxy ( y − y )
x – 25 = 0.4( y – 30)
x = 0.4y – 12 + 25 = 0.4y + 13
n Σ x y − Σ xΣ y 10(7900) − (250) (300)
Also byx = 2 2 = = 1.6
n Σ x − (Σ x) 10(6500) − (250)2
Let the regression line of y on x be
y − y = byx ( x − x )
y – 30 = 1.6(x – 25)
y = 1.6x – 40 + 30 = 1.6x – 10
Hence, the coefficient of correlation is: r = bxy × byx = 0.4 × 1.6 = 0.8.
Example 10.5: A departmental store gives training to its salesman which is followed by a test. The
store is considering whether it should terminate the service of any salesman who does not do well in
the test. The following data gives the scores and sales made by nine salesmen during a certain period:
Test Scores : 14 19 24 21 26 22 15 20 19
Sales (‘00 Rs.) : 31 36 48 37 50 45 33 41 39
Calculate the correlation coefficient between test scores and the sales. Does it indicate that the
termination of services of low test scores is justified? If the firm wants a minimum sales volume of Rs.
3000, what is the minimum test score that will ensure continuation of service? Also estimate the most
probable sales volume of a sales making a score of 28. [Delhi Univ., B.Com(Hons), 1998, 2002]
Solution: Let test scores be x and sales be y. Calculation required to determine co-relation coeffi-
cient are shown below:
REGRESSION ANALYSIS 345
X dx = X – 20 dx2 Y dy = Y – 40 dy2 dx dy
14 –6 36 31 –9 81 54
19 –1 1 36 –4 16 4
24 4 16 48 8 64 32
21 1 1 37 –3 9 –3
26 6 36 50 10 100 60
22 2 4 45 5 25 10
15 –5 25 33 –7 49 35
20 0 0 41 1 1 0
19 –1 1 39 –1 1 1
180 120 360 346 193
Σ x 180 Σ y 360
x = = = 20 and y = = = 40
n 9 n 9
Σxy 193
r= = = 0.9476
2
Σx × Σy2 120 × 346
The value of r shown that there is a high degree of correlation between test scores (x) and sales (y).
This implies that the persons having low test scores will not be able to make good sales hence termi-
nation is justified.
Regression equation of X on Y is:
Σ dx dy − (Σ dx ) (Σdy ) 193
X – X = bxy ( Y − Y ); where bxy = = = 0.557
Σd2y − (Σdy )2 346
x – 20 = 0.557 (y – 40)
x = 0.557 y – 22.28 + 20 = 0.557 Y – 2.28
When sales is Rs. 3000, i.e. y = 30, then x = 0.557 (30) – 2.28 = 14.43
Hence, test score = 14.43 when sales volume is Rs. 3000.
Regression equation of Y on X:
Σxy 193
y − y = byx ( x − x ) ; where byx = = =1.6
Σx 2 120
∴ y – 40 = 1.6 (x – 20)
y = 1.6x – 32 + 40 = 1.6x + 8
When test score is 28, i.e. x = 28, then y= 1.6(28) + 8 = 52.8
Thus sales volume = 52.8 × 100 = Rs. 5280.
Example 10.6: The following data relate to the scores obtained by 9 salesmen of a company in an
intelligence test and their weekly sales (in Rs. 1000’s)
Salesmen : A B C D E F G H I
Test scores : 50 60 50 60 80 50 80 40 70
Weekly sales : 30 60 40 50 60 30 70 50 60
(a) Obtain the regression equation of sales on intelligence test scores of the salesmen.
(b) If the intelligence test score of a salesman is 65, what would be his expected weekly sales.
[HP Univ., MCom, 1996]
Solution: Assume weekly sales (y) as dependent variable and test scores (x) as independent variable.
Calculations for the following regression equation are shown in Table 10.3.
y – y = by x (x – x )
346 BUSINESS S T A T I S T I C S
Σx 540 Σy 450
(a) x = = = 60; y = = = 50
n 9 n 9
Σ dx dy − (Σ dx )(Σ d y ) 1200
byx = = = 0.75
Σ dx2 − (Σ dx )2
1600
Job : A B C D E F G H I
Points : 5 25 7 19 10 12 15 28 16
Pay (Rs.) : 3.0 5.0 3.25 6.5 5.5 5.6 6.0 7.2 6.1
(a) Find the least squares regression line for linking pay scales to points.
(b) Estimate the monthly pay for a job graded by 20 points.
Solution: Assume monthly pay (y) as the dependent variable and job grade points (x) as the independent
variable. Calculations for the following regression equation are shown in Table 10.4.
y – y = byx (x – x )
REGRESSION ANALYSIS 347
Σx 137 Σy 48.15
(a) x = = = 15.22; y = = = 5.35
n 9 n 9
Since mean values x and y are non-integer value, therefore deviations are taken from assumed mean
as shown in Table 10.4.
n Σ dx d y − (Σ dx ) (Σ d y ) 9 × 65.40 − 2 × 3.15 582.3
byx = = = = 0.133
n Σ dx2 − (Σ dx ) 2
9 × 484 − (2)2 4352
Substituting values in the regression equation, we have
y – y = byx (x – x ) or y – 5.35 = 0.133 (x – 15.22) = 3.326 + 0.133x
(b) For job grade point x=20, the estimated average pay scale is given by
y = 3.326 + 0.133x = 3.326 + 0.133 (20) = 5.986
Hence, likely monthly pay for a job with grade points 20 is Rs. 5986.
Example 10.8: Find two regression equations from the data given below:
x : 57 58 59 59 60 61 62 64
y : 77 78 75 78 82 82 79 81 [GJU (Hisar), BBA, 2005]
Solution: Calculations of two regression equations as shown below:
x y dx = x – 59 dY = y – 82 dx dy dx2 dy2
57 77 –2 –5 +10 4 25
58 78 –1 –4 +4 1 16
59 75 0 –7 0 0 49
59 78 0 –4 0 0 16
60 82 +1 0 0 1 0
61 82 +2 0 0 4 0
62 79 +3 –3 –9 9 9
64 81 +5 –1 –5 25 1
Σx 480 Σy 632
x = = = 60 and y = = = 79
n 8 n 8
x – 60 = 0.55( y − y )
x = 0.55y – 43.45 + 60 = 0.55y + 16.55
Regression equation of y on x: y − y = byx ( x − x )
y – 79 = 0.67 (x – 60)
y = 0.67 x – 40.2 + 79 = 0.67 x + 38.8.
Example 10.9: The following data give the ages and blood pressure of 10 women:
Age : 156 142 136 147 149 142 160 172 163 155
Blood pressure : 147 125 118 128 145 140 155 160 149 150
(a) Find the correlation coefficient between age and blood pressure.
(b) Determine the least squares regression equation of blood pressure on age.
(c) Estimate the blood pressure of a woman whose age is 45 years.
Solution: Assume blood pressure (y) as the dependent variable and age (x) as the independent variable.
Calculations for regression equation of blood pressure on age are shown in Table 10.5.
Table 10.5 Calculations for Regression Equation
We may conclude that there is a high degree of positive correlation between age and blood pressure.
(b) The regression equation of blood pressure on age is given by
y – y = byx (x – x )
Σx 522 Σy 1417
x = = = 52.2; y = = = 141.7
n 10 n 10
n Σ dx dy − Σ dx Σ dy 10(1115) − 32(− 33) 12206
and byx = = = = 1.11
n Σ dx2 2
− (Σ dx ) 10(1202) − (32)2 10996
Example 10.10: The General Sales Manager of Kiran Enterprises—an enterprise dealing in the sale
of readymade men’s wear—is toying with the idea of increasing his sales to
Rs. 80,000. On checking the records of sales during the last 10 years, it was found that the annual
sale proceeds and advertisement expenditure were highly correlated to the extent of 0.8. It was
further noted that the annual average sale has been Rs. 45,000 and annual average advertisement
expenditure Rs. 30,000, with a variance of Rs. 1600 and Rs. 625 in advertisement expenditure
respectively.
In view of the above, how much expenditure on advertisement would you suggest the General
Sales Manager of the enterprise to incur to meet his target of sales?
Solution: Assume advertisement expenditure (y) as the dependent variable and sales (x) as the
independent variable. Then the regression equation advertisement expenditure on sales is given by
σy
(y – y ) = r (x − x )
σx
Given r = 0.8, σx = 40, σy = 25, x = 45,000, y = 30,000. Substituting these value in the above
equation, we have
25
(y – 30,000) = 0.8 (x – 45,000) = 0.5 (x – 45,000)
40
y = 30,000 + 0.5x – 22,500 = 7500 + 0.5x
When a sales target is fixed at x = 80,000, the estimated amount likely to the spent on advertisement
would be
y = 7500 + 0.5×80,000 = 7500 + 40,000 = Rs. 47,500
350 BUSINESS S T A T I S T I C S
Example 10.11: You are given the following information about advertising expenditure and sales:
Arithmetic mean, x 10 90
Standard deviation, σ 3 12
Example 10.12: For 100 students of a class, the regression equation of marks in statistics (x) on marks
in economics (y) is: 3y – 5x + 180 = 0. If marks in economics is 50 and variance of marks in statistics
is (4/9) of variance of marks in economics, find mean marks in statistics and the coefficient of correlation
between them. [Delhi Univ., BCom (Hons), 2005]
Also it is given that y = 50. To find x , put y = 50 in the regression equation we get
3(50) – 5x + 180 = 0 or x = 66, i.e. x = 66
4 2 2 σx 2
It is given that σ2x = σ y i.e. σ2x = σy or =
9 3 σy 3
3 σx 3
Since, bxy = or r . =
5 σy 5
r ( 23 ) = 53 or r =
3 3
×
5 2
=
9
10
= 0.9
Solution: Given, regression equation of y on x : y = 20 + 0.4x. This implies that byx = 0.4.
Also, x = 30, put x = 30 in the regression equation of y on x, we get
y = 20 + 0.4 (30) = 20 + 12 = 32
Thus y = 32
We know that, r2 = bxy × byx
(0.8)2 = bxy × 0.4 [since r = 0.8, byx = 0.4]
0.64
bxy = = 1.6
0.4
Regression equation of x on y is:
x − x = bxy ( y − y )
x – 30 = 1.6 ( y – 32) = 1.6 y – 51.2 + 30 = 1.6 y – 21.2.
Mean 18 100
Standard deviation 14 20
(0.8)(20)
Y – 100 = ( X − 18) = 1.14 (X – 18)
14
Y = 1.14 X – 20.52 + 100 = 1.14 X + 79.48
If X = 70, then Y = 1.14 (70) + 79.48 = 159.28.
Example 10.16: In a partially destroyed laboratory record of an analysis of regression data, the
following results only are legible:
Variance of x = 9
Regression equations : 8x – 10y + 66 = 0 and 40x – 18y = 214
Find on the basis of the above information:
(a) The mean values of x and y,
(b) Coefficient of correlation between x and y, and
(c) Standard deviation of y.
Solution: (a) Since two regression lines always intersect at a point ( x , y ) representing mean values of the
variables involved, solving given regression equations to get the mean values x and y as shown below:
8x – 10y = – 66
40x – 18y = 214
Multiplying the first equation by 5 and subtracting from the second, we have
32y = 544 or y = 17, i.e. y = 17
Substituting the value of y in the first equation, we get
8x – 10(17) = – 66 or x = 13, that is, x = 13
(b) To find correlation coefficient r between x and y, we need to determine the regression coefficients
bxy and byx.
Rewriting the given regression equations in such a way that the coefficient of dependent variable
is less than one at least in one equation.
REGRESSION ANALYSIS 353
66 8
8x – 10 y = – 66 or 10 y = 66 + 8x or y= + x
10 10
That is, byx = 8/10 = 0.80
214 18
40x – 18y = 214 or 40x = 214 + 18y or x = + y
40 40
That is, bxy = 18/40 = 0.45
Hence coefficient of correlation r between x and y is given by
Example 10.17: There are two series of index numbers, P for price index and S for stock of a
commodity. The mean and standard deviation of P are 100 and 8 and of S are 103 and 4 respectively.
The correlation coefficient between the two series is 0.4. With these data, work out a linear equation
to read off values of P for various values of S. Can the same equation be used to read off values of S for
various values of P?
Solution: The regression equation to read off values of P for various values S is given by
σp
P = a + bS or (P – P ) = r (S − S)
σs
Given P = 100, S = 103, σp = 8, σs = 4, r = 0.4. Substituting these values in the above equation,
we have
8
P – 100 = 0.4 (S − 103) or P = 17.6 + 0.8 S
4
This equation cannot be used to read off values of S for various values of P. Thus to read off values of
S for various values of P we use another regression equation of the form:
σs
S = c + dP or S−S = (P − P)
σp
Substituting given values in this equation, we have
4
S – 103 = 0.4 (P – 100) or S = 83 + 0.2P
8
Example 10.18: A panel of Judges A and B graded seven debators and independently awarded the
following marks:
Debator : 1 2 3 4 5 6 7
Marks of A : 40 34 28 30 44 38 31
Marks of B : 32 39 26 30 38 34 28
An eighth debator was awarded 36 marks by Judge A while Judge B was not present. If Judge B
was also present, how many marks would you expect him to award to eighth debator assuming same
degree of relationship exists in judgement? [Delhi Univ., BCom (Hons) 1993]
Solution: Let marks of A be denoted by x and that of B by y. A = 30 and B = 30 be assumed as mean
value of x-series and y-series, respectively. The calculation required for regression equations are
shown below:
354 BUSINESS S T A T I S T I C S
40 10 100 32 2 4 20
34 4 16 39 9 81 36
28 –2 4 26 –4 16 8
30 ← A 0 0 30 ← B 0 0 0
44 14 196 38 8 64 112
38 8 64 34 4 16 32
31 1 1 28 –2 4 2
245 35 381 227 17 185 206
Σx 245 Σy 227
x = = = 35 and y= = = 32.43
n 7 n 7
Regression coefficient of y on x:
n Σ dx d y − Σ dx Σ d y 7(206) − (35)(17) 1442 − 595 847
byx = = = = = 0.587
2
n Σ dx − (Σ dx )
2
7(381) − (35)2 2667 − 1225 1442
−8
r = byx × bxy = × (−2) =1.06
14
REGRESSION ANALYSIS 355
Since value of r cannot exceed one, consider another set of regression equations:
Regression equation of x on y is
208 14
14y – 208 = –8x or x= −
8 8
−14
Thus bxy =
8
Regression equation of y on x is:
145 5
5x – 145 = –10y 10y = 145 – 5x or y = − x
10 10
−5 1
Thus byx = =−
10 2
1 −14
Now r = byx × bxy = − × = 0.875 = ± 0.93
2 8
Since bxy and byx both are negative, r should also be negative. Hence r = – 0.93.
rσ y 1 − 0.93 σ y
We know that, byx = or − = , where σx = 2
σx 2 2
2
or σy = = 1.075
0.93 × 2
σx 6 σy 768
We know that bx y = r = and byx = r = , therefore
σy 5 σx 1000
356 BUSINESS S T A T I S T I C S
6 768
bx y byx = r2 = × = 0.9216
5 1000
Hence r = 0.9216 = 0.96.
Since both b xy and b yx are positive, the correlation coefficient is positive and hence
r = 0.96.
1 − r2 1 − (0.96)2
Probable error of r = 0.6745 = 0.6745
n 60
0.0528
= = 0.0068
7.7459
Solving the given regression equations for x and y, we get x = 6 and y = 1 because regression
lines passed through the point ( x , y ).
σx 6 σ 6 σ 6 5
Since r = or 0.96 x = or x = =
σy 5 σy 5 σy 5 × 0.96 4
σx / x y σx 1 5 5
Also the ratio of the coefficient of variability = = ⋅ = × = .
σy / y x σy 6 4 24
S e l f-P r a c t i c e P r o b l e m s 10A
10.1 The following calculations have been made for 10.3 The following data give the experience of
prices of twelve stocks (x) at the Calcutta Stock machine operators and their performance
Exchange on a certain day along with the ratings given by the number of good parts
volume of sales in thousands of shares (y). From turned out per 100 pieces:
these calculations find the regression equation Operator : 11 12 13 4 5 16 7 18
of price of stocks on the volume of sales of experience (x) : 16 12 18 4 3 10 5 12
shares.
Performance
Σ x = 580, Σ y = 370, Σ xy = 11494,
ratings (y) : 87 88 89 68 78 80 75 83
Σ x2 = 41658, Σ y2 = 17206.
Calculate the regression lines of performance
10.2 A survey was conducted to study the ratings on experience and estimate the probable
relationship between expenditure (in Rs.) on performance if an operator has 7 years
accommodation (x) and expenditure on food experience.
and entertainment (y) and the following results 10.4 A study of prices of a certain commodity at Delhi
were obtained: and Mumbai yield the following data:
Mean Standard
Delhi Mumbai
Deviation
• Expenditure on 173..1 63.15 • Average price per kilo (Rs.) 2.463 2.797
• accommodation • Standard deviation 0.326 0.207
• Expenditure on food 47.8 22.98 • Correlation coefficient
• and entertainment between prices at Delhi
• Coefficient of correlation r = 0.57 and Mumbai r = 0.774
Write down the regression equation and
Estimate from the above data the most likely
estimate the expenditure on food and
price (a) at Delhi corresponding to the price of Rs.
entertainment if the expenditure on
2.334 per kilo at Mumbai (b) at Mumbai
accommodation is Rs. 200.
corresponding to the price of 3.052 per kilo at
Delhi
REGRESSION ANALYSIS 357
10.5 The following table gives the aptitude test do this he selects a random sample of 10
scores and productivity indices of 10 workers applicants. They are given the test and later
selected at random: assigned a production rating. The results are
Aptitude as follows:
scores (x) : 60 62 65 70 72 48 53 73 65 82
Worker : A B C D E F G H I J
Productivity
index (y) : 68 60 62 80 85 40 52 62 60 81 Test score : 53 36 88 84 86 64 45 48 39 69
Production
Calculate the two regression equations and rating : 45 43 89 79 84 66 49 48 43 76
estimate (a) the productivity index of a worker
whose test score is 92, (b) the test score of a Fit a linear least squares regression equation of
worker whose productivity index is 75. production rating on test score.
10.6 A company wants to assess the impact of R&D 10.9 Find the regression equation showing the
expenditure (Rs. in 1000s) on its annual profit capacity utilization on production from the
(Rs. in 1000’s). The following table presents following data:
the information for the last eight years:
Average Standard
Year R & D expenditure Annual profit Deviation
1991 9 45 • Production 35.6 10.5
1992 7 42 (in lakh units) :
1993 5 41 • Capacity utilization
1994 10 60 (in percentage) : 84.8 8.5
1995 4 30 • Correlation coefficient r = 0.62
1996 5 34 Estimate the production when the capacity
1997 3 25 utilization is 70 per cent.
1998 2 20 10.10 Suppose that you are interested in using past
expenditure on R&D by a firm to predict current
Estimate the regression equation and predict expenditures on R&D. You got the following
the annual profit for the year 2002 for an allo- data by taking a random sample of firms, where
cated sum of Rs. 1,00,000 as R&D expendi- x is the amount spent on R&D (in lakh of rupees)
ture. 5 years ago and y is the amount spent on R&D
10.7 Obtain the two regression equations from the (in lakh of rupees) in the current year:
following bivariate frequency distribution: x : 30 50 20 180 10 20 20 40
y : 50 80 30 110 20 20 40 50
Sales Revenue Advertising Expenditure (Rs. in thousand) (a) Find the regression equation of y on x.
(Rs. in lakh) 5–15 15–25 25–35 35–45 (b) If a firm is chosen randomly and x = 10,
75–125 3 4 4 8 can you use the regression to predict the
value of y? Discuss.
125–175 8 6 5 7
175–225 2 2 3 4 10.11 The following data relates to the scores
obtained by a salesmen of a company in an
225–275 3 3 2 2
intelligence test and their weekly sales (in Rs.
1000’s ):
Estimate (a) the sales corresponding to
advertising expenditure of Rs. 50,000, (b) the Salesman
advertising expenditure for a sales revenue of intelligence : A B C D E F G H I
Rs. 300 lakh, (c) the coefficient of correlation.
Test score : 50 60 50 60 80 50 80 40 70
10.8 The personnel manager of an electronic Weekly sales: 30 60 40 50 60 30 70 50 60
manufacturing company devises a manual test
for job applicants to predict their production (a) Obtain the regression equation of sales on
rating in the assembly department. In order to intelligence test scores of the salesmen.
358 BUSINESS S T A T I S T I C S
(b) If the intelligence test score of a salesman is 10.14 In trying to evaluate the effectiveness in its ad-
65, what would be his expected weekly sales? vertising compaign, a firm compiled the
[HP Univ., M.Com., 1996] following information:
Calculate the regression equation of sales on
10.12 Two random variables have the regression
advertising expenditure. Estimate the probable
equations:
sales when advertisement expenditure is Rs.
3x + 2y – 26 = 0 and 6x + y – 31 = 0 60 thousand.
(a) Find the mean values of x and y and coeffi-
cient of correlation between x and y. Year Adv. expenditure Sales
(b) If the varaince of x is 25, then find the stand- (Rs. 1000’s) (in lakhs Rs.)
ard deviation of y from the data. 1996 12 5.0
10.13 For a given set of bivariate data, the following 1997 15 5.6
results were obtained 1998 17 5.8
x = 53.2, y = 27.9, 1999 23 7.0
Regression coefficient of y on x = – 1.5, and 2000 24 7.2
Regression coefficient of x and y = – 0.2. 2001 38 8.8
2002 42 9.2
Find the most probable value of y when x = 60.
2003 48 9.5
σx 3x + 2y = 26 or y = 13 – (3/2)x,
x – x =r (y − y)
σy So byx = – 3/2
6x + y = 31 or x = 31/6 – (1/6)y,
10.5
x – 35.6 = 0.62 (y – 84.8) So bxy = – 1/6
8.5
x = – 29.3483 + 0.7659y Correlation coefficient,
Conceptual Questions
1. (a) Explain the concept of regression and point 10. (a) Distinguish between correlation and
out its usefulness in dealing with business regression analysis.
problems. (b) The coefficient of correlation and coefficient
(b) Distinguish between correlation and of determination are available as measures of
regression. Also point out the properties of association in correlation analysis. Describe
regression coefficients. the different uses of these two measures of
2. Explain the concept of regression and point out association.
its importance in business forecasting. 11. What are regression coefficients? State some of
3. Under what conditions can there be one the important properties of regression
regression line? Explain. coefficients.
4. Why should a residual analysis always be done as 12. What is regression? How is this concept useful to
part of the development of a regression model? business forecasting?
13. What is the difference between a prediction
5. What are the assumptions of simple linear
interval and a confidence interval in regression
regression analysis and how can they be
analysis?
evaluated?
14. Explain what is required to establish evidence of
6. What is the meaning of the standard error of
a cause-and-effect relationship between y and x
estimate?
with regression analysis.
7. What is the interpretation of y-intercept and the 15. What technique is used initially to identify the kind
slope in a regression model? of regression model that may be appropriate.
8. What are regression lines? With the help of an 16. (a) What are regression lines? Why is it necessary
example illustrate how they help in business to consider two lines of regression?
decision-making.
(b) In case the two regression lines are identical,
9. Point out the role of regression analysis in business prove that the correlation coefficient is either + 1
decision-making. What are the important or – 1. If two variables are independent, show
properties of regression coefficients? that the two regression lines cut at right angles.
Formulae Used
1. Simple linear regression model 3. Regression coefficient in sample regression
y = β0 + β1x + e equation b = ŷ
2. Simple linear regression equation based on
sample data y = a + bx a = y − bx
362 BUSINESS S T A T I S T I C S
True or False
1. A statistical relationship between two variables 8. Correlation coefficient is the geometric mean of
does not indicate a perfect relationship. regression coefficients.
2. A dependent variable in a regression equation is a 9. If the sign of two regression coefficients is
continuous random variable. negative, then sign of the correlation coefficient is
positive.
3. The residual value is required to estimate the
10. Correlation coefficient and regression coefficient
amount of variation in the dependent variable
are independent.
with respect to the fitted regression line.
11. The point of intersection of two regression lines
4. Standard error of estimate is the conditional represents average value of two variables.
standard deviation of the dependent variable. 12. The two regression lines are at right angle when
5. Standard error of estimate is a measure of scatter the correlation coefficient is zero.
of the observations about the regression line. 13. When value of correlation coefficient is one, the
6. If one of the regression coefficients is greater than two regression lines coincide.
one the other must also be greater than one. 14. The product of regression coefficients is always
more than one.
7. The signs of the regression coefficients are always
15. The regression coefficients are independent of
same.
the change of origin but not of scale.
1. T 2. T 3. T 4. T 5. T 6. F 7. T 8. T 9. F
10. F 11. T 12. T 13. T 14. F 15. T
R e v i e w S e l f-P r a c t i c e P r o b l e m s
10.15 Given the following bivariate data: 10.17 The coefficient of correlation between the ages
x: – 1 5 3 2 1 1 7 3 of husbands and wives in a community was
found to be + 0.8, the average of husbands
y: – 6 1 0 0 1 2 1 5 age was 25 years and that of wives age 22 years.
(a) Fit a regression line of y on x and predict y Their standard deviations were 4 and 5 years
if x = 10. respectively. Find with the help of regression
equations:
(b) Fit a regression line of x on y and predict x
if y = 2.5. (a) the expected age of husband when wife’s
age is 16 years, and
10.16 Find the most likely production corresponding (b) the expected age of wife when husband’s
to a rainfall of 40 inches from the following data: age is 33 years.
10.18 You are given below the following information
Rainfall Production about advertisement expenditure and sales:
(in inches) (in quintals)
Adv. Exp. (x) Sales (y)
Average 30 50
(Rs. in crore) (Rs. in crore)
Standard deviation 5 10
Mean 20 120
Coefficient of correlation r = 0.8. Standard deviation 5 25
REGRESSION ANALYSIS 363