
Regression

Syllabus
• Regression equations and estimation;
• Properties of regression coefficients;
• Relationship between Correlation and Regression coefficients;
• Standard Error of Estimate
Regression
• A statistical technique that relates a dependent variable to one or
more independent (explanatory) variables.
• The dictionary meaning of the word Regression is ‘Stepping back’ or
‘Going back’.
• Regression is the study of the nature of the relationship between
variables, so that one may be able to predict the unknown value of
one variable for a known value of another variable.
What is regression?

[Figure: diagrams, including one titled "Simple Linear Regression", showing independent variable(s) linked to a dependent variable.]
Regression
• Regression analysis is a very powerful tool in the field of statistical
analysis in predicting the value of one variable, given the value of
another variable, when those variables are related to each other.
• It does this by essentially fitting a best-fit line and seeing how the
data is dispersed around this line.
METHODS OF OBTAINING REGRESSION LINES

➢ There are two methods of obtaining regression lines:

1. Free hand curve method:
• Also called the scatter diagram method.
• The independent variable is taken on the
horizontal axis and the dependent variable is
taken on the vertical axis. Values of the related
variables are plotted on a graph.
• A straight line is drawn free hand,
passing through the plotted points.
Example 1.
From the following data relating to X and Y, draw a regression line of Y on X.

X   110  140  180  200  240
Y    70   92   82  140  165

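Solution:
[Figure: scatter diagram of the five (X, Y) points with a free-hand line drawn through them; horizontal axis from 0 to 300, vertical axis from 0 to 180.]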
2. Method of least squares
Under this method, a regression line is fitted through the points in
such a way that the sum of squares of the deviations of the observed values
from the fitted line is the least. The line drawn by this method is called
the line of best fit.
• Also called the least squares regression line: the line that minimizes the sum of squares
of the vertical distances of the data points from the line.
• Regression equation of Y on X:
Y = a + bX
Regression Equation / Line & Method of Least Squares
• Regression equation of Y on X
Y = a + bX

To obtain the values of ‘a’ and ‘b’, solve the following normal
equations:
∑Y = na + b∑X
∑XY = a∑X + b∑X²
Example 1.
X   2  6  4  3  2  2  8  4
Y   7  2  1  1  2  3  2  6

a) Fit the regression line of Y on X and hence predict Y if X is 20.
Solution:

X        Y        X²        XY
2        7         4        14
6        2        36        12
4        1        16         4
3        1         9         3
2        2         4         4
2        3         4         6
8        2        64        16
4        6        16        24
∑X = 31  ∑Y = 24  ∑X² = 153  ∑XY = 83
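
The remaining step, substituting the column totals into the two normal equations and solving for a and b, can be sketched in Python as follows. This is a minimal illustration, not part of the original slides; the function name fit_y_on_x is hypothetical, and the 2x2 system is solved by Cramer's rule.

```python
def fit_y_on_x(xs, ys):
    """Return (a, b) for Y = a + bX by solving the two normal equations."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    # Normal equations:  sum_y  = n*a     + b*sum_x
    #                    sum_xy = a*sum_x + b*sum_x2
    det = n * sum_x2 - sum_x * sum_x  # determinant of the 2x2 system
    a = (sum_y * sum_x2 - sum_x * sum_xy) / det
    b = (n * sum_xy - sum_x * sum_y) / det
    return a, b


# Data of the example above
X = [2, 6, 4, 3, 2, 2, 8, 4]
Y = [7, 2, 1, 1, 2, 3, 2, 6]

a, b = fit_y_on_x(X, Y)
y_at_20 = a + b * 20  # prediction requested in part (a)
print(f"Y = {a:.3f} + ({b:.3f})X; predicted Y at X = 20: {y_at_20:.3f}")
```

Note that X = 20 lies well outside the observed range of X (2 to 8), so the predicted value is an extrapolation of the fitted line.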


Regression Equation / Line & Method of Least Squares
There are as many regression lines as there are variables.
Suppose we take two variables, say X and Y; then there will be two regression lines:
• Regression line of Y on X: This gives the most probable values of Y from the
given values of X.
• Regression line of X on Y: This gives the most probable values of X from the
given values of Y.

• Regression equation of X on Y
X = c + dY
To obtain the values of ‘c’ and ‘d’, solve the following normal equations:
∑X = nc + d∑Y
∑XY = c∑Y + d∑Y²
Example 2 (using normal equations)
The following data relate to advertising expenditure and sales:

Advertisement exp (Rs lakhs)    1   2   3   4   5
Sales                          10  20  30  50  40

a. Find the two regression equations.
b. Estimate the likely sales when advertisement expenditure is Rs 7
lakhs.
c. What should the advertising expenditure be if the firm wants to
attain a sales target of Rs 80 lakhs?
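
One way to work this example is to build both pairs of normal equations from the column totals and solve each 2x2 system. The sketch below assumes X is advertisement expenditure and Y is sales; the helper solve_2x2 and the variable names are illustrative, not from the slides.

```python
def solve_2x2(a11, a12, b1, a21, a22, b2):
    """Solve a11*p + a12*q = b1 and a21*p + a22*q = b2 by Cramer's rule."""
    det = a11 * a22 - a12 * a21
    p = (b1 * a22 - a12 * b2) / det
    q = (a11 * b2 - a21 * b1) / det
    return p, q


adv = [1, 2, 3, 4, 5]         # X: advertisement expenditure (Rs lakhs)
sales = [10, 20, 30, 50, 40]  # Y: sales

n = len(adv)
sum_x, sum_y = sum(adv), sum(sales)
sum_x2 = sum(x * x for x in adv)
sum_y2 = sum(y * y for y in sales)
sum_xy = sum(x * y for x, y in zip(adv, sales))

# Y on X:  sum_y = n*a + b*sum_x ;  sum_xy = a*sum_x + b*sum_x2
a, b = solve_2x2(n, sum_x, sum_y, sum_x, sum_x2, sum_xy)

# X on Y:  sum_x = n*c + d*sum_y ;  sum_xy = c*sum_y + d*sum_y2
c, d = solve_2x2(n, sum_y, sum_x, sum_y, sum_y2, sum_xy)

sales_at_7 = a + b * 7    # part (b): uses the line of Y on X
adv_for_80 = c + d * 80   # part (c): uses the line of X on Y

print(f"Y on X: Y = {a:.2f} + {b:.2f}X")
print(f"X on Y: X = {c:.2f} + {d:.2f}Y")
print(f"Sales at X = 7: {sales_at_7:.2f}; expenditure for Y = 80: {adv_for_80:.2f}")
```

Part (b) uses the line of Y on X and part (c) the line of X on Y, because each line is meant for predicting its own left-hand variable.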
b) Using Direct Method
The calculations by the least squares method are quite cumbersome when the values
of X and Y are large, so the work can be simplified by using this method.
The formulas for the regression equations by this method are:

Regression equation of X on Y:  X - X̄ = bxy (Y - Ȳ)

Regression equation of Y on X:  Y - Ȳ = byx (X - X̄)

bxy and byx are the regression coefficients.


Regression Coefficients

Regression coefficient of X on Y (bxy):

bxy = (N∑XY - ∑X∑Y) / (N∑Y² - (∑Y)²)

Regression coefficient of Y on X (byx):

byx = (N∑XY - ∑X∑Y) / (N∑X² - (∑X)²)
Example 2 (using direct method)
The following data relate to advertising expenditure and sales:

Advertisement exp (Rs lakhs)    1   2   3   4   5
Sales                          10  20  30  50  40

a. Find the two regression equations.
b. Estimate the likely sales when advertisement expenditure is Rs 7
lakhs.
c. What should the advertising expenditure be if the firm wants to
attain a sales target of Rs 80 lakhs?
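
With the direct method, the two regression coefficients are computed from the raw totals and the lines are written in deviation form around the means. A minimal Python sketch, assuming X is advertisement expenditure and Y is sales (the function name regression_coefficients is hypothetical):

```python
def regression_coefficients(xs, ys):
    """Return (byx, bxy, x_mean, y_mean) using the direct (raw totals) formulas."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    numerator = n * sum_xy - sum_x * sum_y          # shared by both coefficients
    byx = numerator / (n * sum_x2 - sum_x ** 2)     # coefficient of Y on X
    bxy = numerator / (n * sum_y2 - sum_y ** 2)     # coefficient of X on Y
    return byx, bxy, sum_x / n, sum_y / n


adv = [1, 2, 3, 4, 5]         # X: advertisement expenditure (Rs lakhs)
sales = [10, 20, 30, 50, 40]  # Y: sales

byx, bxy, x_mean, y_mean = regression_coefficients(adv, sales)

# Y - y_mean = byx (X - x_mean), evaluated at X = 7
sales_at_7 = y_mean + byx * (7 - x_mean)
# X - x_mean = bxy (Y - y_mean), evaluated at Y = 80
adv_for_80 = x_mean + bxy * (80 - y_mean)

print(f"byx = {byx:.2f}, bxy = {bxy:.2f}")
print(f"Sales at X = 7: {sales_at_7:.2f}; expenditure for Y = 80: {adv_for_80:.2f}")
```

The estimates agree with those from the normal equations, since both methods describe the same least squares lines.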
To obtain the regression equations from the coefficient of correlation (r), the standard
deviations (σx, σy) and the arithmetic means of X and Y:

bxy = r (σx/σy)        byx = r (σy/σx)

EXAMPLE: 5

Find
I. Obtain two Regression Equations
II. Estimate Y when X = 9
III. Estimate X when Y= 12
Solution: (i) Regression Equations

(a) Regression equation of X on Y

X - X̄ = bxy (Y - Ȳ), where bxy = r (σx/σy)

X - 5 = 0.7 × (σx/σy) × (Y - 12)

X - 5 = 0.51 (Y - 12)
X - 5 = 0.51Y - 6.12
X = 0.51Y - 1.12
(b) Regression equation of Y on X

Y - Ȳ = byx (X - X̄), where byx = r (σy/σx)

Y - 12 = 0.7 × (σy/σx) × (X - 5)

Y - 12 = 0.97 (X - 5)
Y - 12 = 0.97X - 4.85
Y = 0.97X - 4.85 + 12
Y = 0.97X + 7.15

(ii) When X = 9: Y = 0.97 × 9 + 7.15 = 15.88
(iii) When Y = 12: X = 0.51 × 12 - 1.12 = 5
Example
8. The following table shows the sales and advertisement expenditure of a firm

Coefficient of correlation r = 0.9. Estimate the likely sales for a proposed advertisement
expenditure of Rs. 10 crores.
Cont:

When advertisement expenditure is Rs. 10 crores, i.e., Y = 10, then
sales X = 6(10) + 4 = 64, which implies sales is 64.
EXAMPLE 9

There are two series of index numbers, P for price index
and S for stock of the commodity. The mean and standard
deviation of P are 100 and 8, and of S are 103 and 4,
respectively. The correlation coefficient between the two
series is 0.4. With these data obtain the regression lines
of P on S and S on P.
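SOLUTION
Regression line of P on S: P - 100 = 0.4 × (8/4) × (S - 103), so P = 0.8S + 17.6
Regression line of S on P: S - 103 = 0.4 × (4/8) × (P - 100), so S = 0.2P + 83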
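
When only the means, standard deviations and r are known, each regression line follows from slope = r × (sd of dependent / sd of independent) and the fact that the line passes through the point of means. A short Python sketch for the P and S series of this example (the function name line_from_summary is hypothetical):

```python
def line_from_summary(mean_dep, sd_dep, mean_ind, sd_ind, r):
    """Return (intercept, slope) of the regression of the dependent variable
    on the independent variable, given means, standard deviations and r."""
    slope = r * sd_dep / sd_ind
    intercept = mean_dep - slope * mean_ind  # line passes through the means
    return intercept, slope


mean_p, sd_p = 100, 8   # price index P
mean_s, sd_s = 103, 4   # stock index S
r = 0.4

p_intercept, p_slope = line_from_summary(mean_p, sd_p, mean_s, sd_s, r)  # P on S
s_intercept, s_slope = line_from_summary(mean_s, sd_s, mean_p, sd_p, r)  # S on P

print(f"P on S: P = {p_slope:.2f}S + {p_intercept:.2f}")
print(f"S on P: S = {s_slope:.2f}P + {s_intercept:.2f}")
```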
EXAMPLE 10
• For 5 pairs of observations the following results are obtained:
∑X = 15, ∑Y = 25, ∑X² = 55, ∑Y² = 135, ∑XY = 83. Find the
equations of the lines of regression and estimate the value of X
on the first line when Y = 12 and the value of Y on the second line
if X = 8.
• SOLUTION
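
A possible working of this example, using only the given totals, is sketched below in Python. It assumes the "first line" is the regression of X on Y and the "second line" is the regression of Y on X; the function name lines_from_sums is hypothetical.

```python
def lines_from_sums(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Return (x_mean, y_mean, byx, bxy) from the raw totals."""
    numerator = n * sum_xy - sum_x * sum_y
    byx = numerator / (n * sum_x2 - sum_x ** 2)  # coefficient of Y on X
    bxy = numerator / (n * sum_y2 - sum_y ** 2)  # coefficient of X on Y
    return sum_x / n, sum_y / n, byx, bxy


x_mean, y_mean, byx, bxy = lines_from_sums(5, 15, 25, 55, 135, 83)

# X on Y:  X - x_mean = bxy (Y - y_mean), evaluated at Y = 12
x_when_y_12 = x_mean + bxy * (12 - y_mean)
# Y on X:  Y - y_mean = byx (X - x_mean), evaluated at X = 8
y_when_x_8 = y_mean + byx * (8 - x_mean)

print(f"X on Y: X = {bxy:.2f}Y + {x_mean - bxy * y_mean:.2f}")
print(f"Y on X: Y = {byx:.2f}X + {y_mean - byx * x_mean:.2f}")
print(f"X at Y = 12: {x_when_y_12:.2f}; Y at X = 8: {y_when_x_8:.2f}")
```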
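b) Using Deviations taken from Actual Mean

Another formula for the calculation of the regression coefficients is as follows:

bxy = ∑xy / ∑y²

byx = ∑xy / ∑x²

where x = X - X̄ and y = Y - Ȳ are the deviations from the actual means, and bxy and byx are the regression coefficients.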
Example 3. Calculate the regression equations from the
following data:

X   2  4  6   8  10  12
Y   4  2  5  10   3   6
Solution:
X̄ = 42/6 = 7, Ȳ = 30/6 = 5

X         Y         x = X - X̄    x²         y = Y - Ȳ    y²         xy
2         4         -5           25         -1           1          5
4         2         -3           9          -3           9          9
6         5         -1           1          0            0          0
8         10        1            1          5            25         5
10        3         3            9          -2           4          -6
12        6         5            25         1            1          5
∑X = 42   ∑Y = 30   ∑x = 0       ∑x² = 70   ∑y = 0       ∑y² = 40   ∑xy = 18
N = 6
Solution:
byx = ∑xy / ∑x² = 18/70 = 0.257
bxy = ∑xy / ∑y² = 18/40 = 0.45

Regression equation of Y on X:
Y - Ȳ = byx (X - X̄)
Y - 5 = 0.257 (X - 7)
Y - 5 = 0.257X - 1.799
Y = 0.257X + 3.201

Regression equation of X on Y:
X - X̄ = bxy (Y - Ȳ)
X - 7 = 0.45 (Y - 5)
X = 0.45Y - 2.25 + 7
X = 0.45Y + 4.75
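
The deviation table and both coefficients for Example 3 can be reproduced with a few lines of Python. This is an illustrative sketch; the variable names are not from the slides.

```python
X = [2, 4, 6, 8, 10, 12]
Y = [4, 2, 5, 10, 3, 6]

n = len(X)
x_mean = sum(X) / n
y_mean = sum(Y) / n

# Deviations from the actual means (the x and y columns of the table)
dev_x = [x - x_mean for x in X]
dev_y = [y - y_mean for y in Y]

sum_x2 = sum(d * d for d in dev_x)
sum_y2 = sum(d * d for d in dev_y)
sum_xy = sum(dx * dy for dx, dy in zip(dev_x, dev_y))

byx = sum_xy / sum_x2  # regression coefficient of Y on X
bxy = sum_xy / sum_y2  # regression coefficient of X on Y

# Intercepts follow from the lines passing through (x_mean, y_mean)
intercept_yx = y_mean - byx * x_mean
intercept_xy = x_mean - bxy * y_mean

print(f"Y = {byx:.3f}X + {intercept_yx:.3f}")
print(f"X = {bxy:.3f}Y + {intercept_xy:.3f}")
```

The code keeps byx unrounded, so its intercept for Y on X comes out as 3.2 rather than the 3.201 obtained when byx is first rounded to 0.257.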
c) Using Deviations taken from Assumed Mean
When the actual means of the X and Y variables are fractions, the calculations can be simplified
by taking the deviations from assumed means. With dx = X - A (assumed mean of X) and
dy = Y - A (assumed mean of Y), the regression coefficients can be calculated as follows:

byx = (N∑dxdy - ∑dx∑dy) / (N∑dx² - (∑dx)²)

bxy = (N∑dxdy - ∑dx∑dy) / (N∑dy² - (∑dy)²)
Example 4.
Calculate the two regression equations from the following data:
X  43  44  46  40  44  42  45  42  38  40  52  57
Y  29  31  19  18  19  27  27  29  41  30  26  10

Find the value of Y when the value of X is 50 and the value of X when the value of Y is 49.
Solution:
Assumed means: A = 42 for X and A = 27 for Y.

X          dx = X - 42   dx²          Y          dy = Y - 27   dy²          dxdy
43         1             1            29         2             4            2
44         2             4            31         4             16           8
46         4             16           19         -8            64           -32
40         -2            4            18         -9            81           18
44         2             4            19         -8            64           -16
42         0             0            27         0             0            0
45         3             9            27         0             0            0
42         0             0            29         2             4            0
38         -4            16           41         14            196          -56
40         -2            4            30         3             9            -6
52         10            100          26         -1            1            -10
57         15            225          10         -17           289          -255
∑X = 533   ∑dx = 29      ∑dx² = 383   ∑Y = 306   ∑dy = -18     ∑dy² = 728   ∑dxdy = -347
N = 12

X̄ = ∑X / N = 533 / 12 = 44.42
Ȳ = ∑Y / N = 306 / 12 = 25.5

• byx = (N∑dxdy - ∑dx∑dy) / (N∑dx² - (∑dx)²)
= (12 × (-347) - (29)(-18)) / (12 × 383 - (29)²)
= (-4164 + 522) / (4596 - 841)
= -3642 / 3755
= -0.97

• bxy = (N∑dxdy - ∑dx∑dy) / (N∑dy² - (∑dy)²)
= (12 × (-347) - (29)(-18)) / (12 × 728 - (-18)²)
= -3642 / 8412
= -0.43
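
The assumed-mean calculation, together with the two estimates the example asks for, can be sketched in Python as follows. The assumed means 42 and 27 are those used in the table; the variable names are illustrative.

```python
X = [43, 44, 46, 40, 44, 42, 45, 42, 38, 40, 52, 57]
Y = [29, 31, 19, 18, 19, 27, 27, 29, 41, 30, 26, 10]
A_X, A_Y = 42, 27  # assumed means used in the table

n = len(X)
dx = [x - A_X for x in X]
dy = [y - A_Y for y in Y]

sum_dx, sum_dy = sum(dx), sum(dy)
sum_dx2 = sum(d * d for d in dx)
sum_dy2 = sum(d * d for d in dy)
sum_dxdy = sum(a * b for a, b in zip(dx, dy))

numerator = n * sum_dxdy - sum_dx * sum_dy
byx = numerator / (n * sum_dx2 - sum_dx ** 2)  # coefficient of Y on X
bxy = numerator / (n * sum_dy2 - sum_dy ** 2)  # coefficient of X on Y

# Actual means recovered from the assumed means
x_mean = A_X + sum_dx / n
y_mean = A_Y + sum_dy / n

y_at_50 = y_mean + byx * (50 - x_mean)  # Y when X = 50 (line of Y on X)
x_at_49 = x_mean + bxy * (49 - y_mean)  # X when Y = 49 (line of X on Y)

print(f"byx = {byx:.2f}, bxy = {bxy:.2f}")
print(f"Y at X = 50: {y_at_50:.2f}; X at Y = 49: {x_at_49:.2f}")
```

Because the code keeps the coefficients unrounded, its estimates can differ in the second decimal place from a hand calculation that uses -0.97 and -0.43.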
Properties of the Regression Coefficients
❖ The coefficient of correlation is the geometric mean of the two regression coefficients:
r = ±√(bxy × byx)
❖ If byx is positive then bxy is also positive, and vice versa.
❖ If one regression coefficient is greater than one, the other must be less than
one.
❖ The coefficient of correlation has the same sign as the regression
coefficients.
❖ Regression coefficients are independent of origin but not of scale.
Find the means of X and Y variables and the coefficient of
correlation between them from the following two regression
equations:
3X + 2Y = 26
6X + Y = 31

Solution: Both regression lines pass through the point of means, so solving the two
equations simultaneously gives
Mean of X = 4
Mean of Y = 7
For calculating the correlation coefficient, let us assume that equation 1 is Y on X:
2Y = -3X + 26
Y = (-3/2)X + 13, so byx = -3/2
and that equation 2 is X on Y:
6X = -Y + 31
X = (-1/6)Y + 31/6, so bxy = -1/6
Since byx × bxy = 1/4 does not exceed 1, our assumption is correct.

r = -√(1/4) = -0.5 (negative because both regression coefficients are negative)
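
The same steps (solve the two lines for the means, read off the coefficients under an assumed assignment, and check that their product does not exceed 1) can be sketched in Python. Each equation is written as a·X + b·Y = c; the function name analyse_regression_pair is hypothetical.

```python
import math


def analyse_regression_pair(eq_y_on_x, eq_x_on_y):
    """Each equation is a tuple (a, b, c) meaning a*X + b*Y = c.
    eq_y_on_x is ASSUMED to be the line of Y on X, eq_x_on_y the line of X on Y.
    Returns (x_mean, y_mean, byx, bxy, r)."""
    a1, b1, c1 = eq_y_on_x
    a2, b2, c2 = eq_x_on_y
    # Both lines pass through the means, so solve the 2x2 system (Cramer's rule)
    det = a1 * b2 - a2 * b1
    x_mean = (c1 * b2 - c2 * b1) / det
    y_mean = (a1 * c2 - a2 * c1) / det
    byx = -a1 / b1  # from Y = (c1 - a1*X) / b1
    bxy = -b2 / a2  # from X = (c2 - b2*Y) / a2
    product = byx * bxy
    if not 0 <= product <= 1:
        raise ValueError("assumption violated: swap the roles of the two equations")
    r = math.copysign(math.sqrt(product), byx)  # r takes the sign of the coefficients
    return x_mean, y_mean, byx, bxy, r


x_mean, y_mean, byx, bxy, r = analyse_regression_pair((3, 2, 26), (6, 1, 31))
print(x_mean, y_mean, byx, round(bxy, 4), round(r, 4))
```

If the product of the two coefficients exceeds 1, the assumed assignment is wrong and the two equations should be passed in the opposite order.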
Example 6
Find the means of X and Y variables and the coefficient
of correlation between them from the following two
regression equations:
2y–x–50 = 0
3y–2x–10 = 0.
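SOLUTION
We are given
2Y - X - 50 = 0 ... (1)
3Y - 2X - 10 = 0 ... (2)
Solving equations (1) and (2), we get Y = 90 and X = 130,
so the mean of X is 130 and the mean of Y is 90.
Calculating the correlation coefficient:
Let us assume equation (1) is the regression equation of Y on X and equation (2) is the
regression equation of X on Y.
From (1): Y = (1/2)X + 25, so byx = 1/2
From (2): X = (3/2)Y - 5, so bxy = 3/2
r = √(byx × bxy) = √(0.75) = 0.866

Note that in the above problem one of the regression coefficients is
greater than 1 and the other is less than 1, and their product does not exceed 1.
Therefore our assumption about the given equations is correct.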
Example 12
Let us understand the following property
• Regression coefficients are independent of origin but not of scale.
Explanation:
Change of origin: If X and Y are transformed as follows
(here r and s are constants, not the correlation coefficient):
u = X - r    v = Y - s
the regression coefficients of the new variables remain the same, that is,
byx = bvu and bxy = buv

Change of scale: If X and Y are transformed as follows:
u = X/r    v = Y/s
then
byx = (s/r) bvu
bxy = (r/s) buv
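
The property can be checked numerically. The sketch below reuses the data of Example 3 and picks arbitrary constants (shift by 7 and 5, scale by 2 and 4); these constants and the function name slope_y_on_x are assumptions made for illustration only.

```python
def slope_y_on_x(xs, ys):
    """Regression coefficient of ys on xs: sum(xy) / sum(x^2) in deviation form."""
    n = len(xs)
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    sum_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sum_x2 = sum((x - x_mean) ** 2 for x in xs)
    return sum_xy / sum_x2


X = [2, 4, 6, 8, 10, 12]
Y = [4, 2, 5, 10, 3, 6]
byx = slope_y_on_x(X, Y)

# Change of origin: u = X - 7, v = Y - 5 (constants chosen arbitrarily)
u_shift = [x - 7 for x in X]
v_shift = [y - 5 for y in Y]
bvu_shift = slope_y_on_x(u_shift, v_shift)

# Change of scale: u = X / r_const, v = Y / s_const
r_const, s_const = 2, 4
u_scale = [x / r_const for x in X]
v_scale = [y / s_const for y in Y]
bvu_scale = slope_y_on_x(u_scale, v_scale)

print(f"byx = {byx:.4f}")
print(f"after change of origin: bvu = {bvu_shift:.4f}")
print(f"after change of scale:  bvu = {bvu_scale:.4f}, (s/r) * bvu = {s_const / r_const * bvu_scale:.4f}")
```

The shifted coefficient equals byx, while the scaled coefficient must be multiplied by s/r to recover byx.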
a. byx = (3/2) bvu
bvu = (2/3) × 1.2 = 0.8

b. r = cov(x, y) / (σx σy) = 900 / (15 × 80) = 0.75
byx = r σy/σx = 0.75 × 80/15 = 4
byx = (8/5) bTS, so bTS = 5/2
r = √(bTS × bST)
0.75 = √((5/2) × bST)
bST = 0.225
Standard Error of Estimate
The standard error of the estimate gives us an idea of how well a regression model fits a
dataset. In particular: the smaller the value, the better the fit; the larger the value, the worse
the fit.
For a regression model that has a small standard error of the estimate, the data points will be
closely packed around the estimated regression line (1). Conversely, for a regression model
that has a large standard error of the estimate, the data points will be more loosely
scattered around the regression line (2).

[Figure: graphical representation, panels (1) and (2).]
Standard Error of Estimate
❖ Standard error of estimate is the measure of variation around the computed regression
line.
❖ Standard error of estimate (SE) of y measures the variability of the observed values of y
around the regression line.
❖ Standard error of estimate gives us a measure of the scatter of the observations about the
line of regression.

❖ Standard error of estimate of Y on X and of X on Y:

SEyx = √( ∑(y - yc)² / n )        SExy = √( ∑(x - xc)² / n )

In each case, SE = √( unexplained variation / n )

x and y = observed values
xc and yc = estimated values from the estimated regression equation
n = number of observations in the sample.
Alternative formulas for standard error of estimate

syx = σy √(1 - r²)

sxy = σx √(1 - r²)
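
To see that the residual-based definition and the alternative formula agree, the sketch below computes the standard error of estimate of Y on X both ways, reusing the data of Example 3. It assumes divisor n throughout (so σy is the population standard deviation) and r = ∑xy / √(∑x² ∑y²) in deviation form; the variable names are illustrative.

```python
import math

X = [2, 4, 6, 8, 10, 12]
Y = [4, 2, 5, 10, 3, 6]

n = len(X)
x_mean, y_mean = sum(X) / n, sum(Y) / n
sum_x2 = sum((x - x_mean) ** 2 for x in X)
sum_y2 = sum((y - y_mean) ** 2 for y in Y)
sum_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(X, Y))

# Fitted line of Y on X and the estimated values yc
byx = sum_xy / sum_x2
a = y_mean - byx * x_mean
y_est = [a + byx * x for x in X]

# Definition: square root of (unexplained variation / n)
se_direct = math.sqrt(sum((y - yc) ** 2 for y, yc in zip(Y, y_est)) / n)

# Alternative formula: sigma_y * sqrt(1 - r^2)
r = sum_xy / math.sqrt(sum_x2 * sum_y2)
sigma_y = math.sqrt(sum_y2 / n)
se_alternative = sigma_y * math.sqrt(1 - r ** 2)

print(f"SEyx from residuals: {se_direct:.4f}")
print(f"SEyx from sigma_y * sqrt(1 - r^2): {se_alternative:.4f}")
```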
CORRELATION VS. REGRESSION

BASIS OF COMPARISON

1. Meaning
   Correlation: Correlation is a statistical measure which determines the co-relationship or association of two variables.
   Regression: Regression describes how an independent variable is numerically related to the dependent variable.
2. Usage
   Correlation: Correlation is used to represent the linear relationship between two variables.
   Regression: Regression is used to fit a best line and estimate one variable on the basis of another variable.
3. Indicates
   Correlation: The correlation coefficient indicates the extent to which two variables move together.
   Regression: Regression indicates the impact of a unit change in the known variable (X) on the estimated variable (Y).
4. Objective
   Correlation: To find a numerical value expressing the relationship between variables.
   Regression: To estimate values of a random variable on the basis of the values of fixed variables.