Regression (Autosaved) (Autosaved)
Syllabus
• Regression equations and estimation;
• Properties of regression coefficients;
• Relationship between Correlation and Regression coefficients;
• Standard Error of Estimate
Regression
• A statistical technique that relates a dependent variable to one or
more independent (explanatory) variables.
• The dictionary meaning of the word Regression is ‘Stepping back’ or
‘Going back’.
• Regression is the study of the nature of relationship between the
variables so that one may be able to predict the unknown value of
one variable for a known value of another variable.
What is regression?
[Figure: diagram of the variables involved in a regression; surviving labels: "Independent Variable", "Independent variable"]
Regression
• Regression analysis is a very powerful tool in the field of statistical
analysis in predicting the value of one variable, given the value of
another variable, when those variables are related to each other.
• It does this by essentially fitting a best-fit line and seeing how the
data is dispersed around this line.
METHODS OF REGRESSION LINE
Solution :
[Figure: chart with horizontal axis from 0 to 300 and vertical axis from 0 to 180]
2. Method of Least Squares
Under this method, a regression line is fitted through the points in
such a way that the sum of squares of the deviations of the observed values
from the fitted line is the least possible. The line drawn by this method is called
the line of best fit.
• Also called the Least Squares Regression Line: the line that minimizes the sum of squared
vertical distances of the data points from the line.
• Regression equation of Y on X:
Y = a + bX
Regression Equation / Line & Method of Least Squares
• Regression equation of Y on X
Y = a + bX
In order to obtain the values of ‘a’ & ‘b’, solve the following normal
equations
∑Y = na + b∑X
∑XY = a∑X + b∑X²
Example 1.
X   2   6   4   3   2   2   8   4
Y   7   2   1   1   2   3   2   6
a) Fit the regression line of Y on X and hence predict Y if X is 20.
Solution :
X      Y      X²     XY
2      7      4      14
6      2      36     12
4      1      16     4
3      1      9      3
2      2      4      4
2      3      4      6
8      2      64     16
4      6      16     24
∑X = 31   ∑Y = 24   ∑X² = 153   ∑XY = 83   (n = 8)
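The normal equations for Y on X can be solved directly once the column totals are known. The sketch below does this for the Example 1 data; the function name `fit_y_on_x` is only illustrative, and the closed-form expressions come from eliminating a between the two normal equations.

```python
# Example 1 data as given in the text.
X = [2, 6, 4, 3, 2, 2, 8, 4]
Y = [7, 2, 1, 1, 2, 3, 2, 6]


def fit_y_on_x(xs, ys):
    """Solve the normal equations for Y = a + bX and return (a, b)."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    # Eliminating a from the two normal equations gives b directly.
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    # Back-substitute into  sum(Y) = n*a + b*sum(X).
    a = (sum_y - b * sum_x) / n
    return a, b


a, b = fit_y_on_x(X, Y)
y_at_20 = a + b * 20
print(f"Y = {a:.3f} + ({b:.3f})X")
print(f"Predicted Y at X = 20: {y_at_20:.3f}")
```

Note that X = 20 lies well outside the observed range of X (2 to 8), so the prediction is an extrapolation and comes out negative for this data.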
• Regression equation of X on Y
X = c + dY
In order to obtain the values of ‘c’ & ‘d’, solve the following normal equations
∑X = nc + d∑Y
∑XY = c∑Y + d∑Y²
Example 2 (using normal equations)
The following data relates to advertising expenditure and sales
Advertisement exp (Rs lakhs)   1    2    3    4    5
Sales                          10   20   30   50   40
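Both sets of normal equations have the same form, so one routine can fit either line by swapping the roles of the two variables. The sketch below applies this to the Example 2 data; the helper name `solve_normal_equations` and the variable names are assumptions made for illustration.

```python
adv = [1, 2, 3, 4, 5]          # advertisement expenditure (Rs lakhs)
sales = [10, 20, 30, 50, 40]   # sales, in the units of the table


def solve_normal_equations(xs, ys):
    """Return (intercept, slope) of the least-squares line of ys on xs."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    slope = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    intercept = (sum_y - slope * sum_x) / n
    return intercept, slope


# Y on X: sales explained by advertisement expenditure.
a, b = solve_normal_equations(adv, sales)
# X on Y: the same routine with the roles of the variables swapped.
c, d = solve_normal_equations(sales, adv)

print(f"Sales = {a:.2f} + {b:.2f} * Adv")
print(f"Adv   = {c:.2f} + {d:.2f} * Sales")
```

For this data the two slopes are 9 and 0.09; their product, 0.81, is the square of the correlation coefficient.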
Regression Equation of X on Y: X – X̄ = bxy (Y – Ȳ)
Regression Equation of Y on X: Y – Ȳ = byx (X – X̄)
bxy = r · σx/σy        byx = r · σy/σx
EXAMPLE: 5
Find
I. Obtain two Regression Equations
II. Estimate Y when X = 9
III. Estimate X when Y = 12
Solution : (i) Regression Equations
(a) Regression Equation of X on Y
bxy = r · σx/σy
X – X̄ = r · (σx/σy) (Y – Ȳ)
X – 5 = 0.7 × (σx/σy) × (Y – 12)
X – 5 = 0.51 (Y – 12)
X – 5 = 0.51Y – 6.12
X = 0.51Y – 1.12
(b) Regression Equation of Y on X
byx = r · σy/σx
Y – Ȳ = r · (σy/σx) (X – X̄)
Y – 12 = 0.7 × (σy/σx) × (X – 5)
Y – 12 = 0.97 (X – 5)
Y – 12 = 0.97X – 4.85
Y = 0.97X – 4.85 + 12
Y = 0.97X + 7.15
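Parts II and III can be answered by substituting into the mean-deviation form of each line. The sketch below assumes the values that appear in the worked solution (means 5 and 12, coefficients 0.51 and 0.97), since the original data table is not shown; the function names are illustrative only.

```python
# Values read from the worked solution (assumed; the data table is not shown).
mean_x, mean_y = 5, 12
b_xy = 0.51  # regression coefficient of X on Y
b_yx = 0.97  # regression coefficient of Y on X


def estimate_y(x):
    """Y on X line in mean-deviation form: Y - mean_y = b_yx * (X - mean_x)."""
    return mean_y + b_yx * (x - mean_x)


def estimate_x(y):
    """X on Y line in mean-deviation form: X - mean_x = b_xy * (Y - mean_y)."""
    return mean_x + b_xy * (y - mean_y)


y_hat = estimate_y(9)    # 12 + 0.97 * (9 - 5)
x_hat = estimate_x(12)   # 5 + 0.51 * (12 - 12)
print(f"Estimated Y when X = 9:  {y_hat:.2f}")
print(f"Estimated X when Y = 12: {x_hat:.2f}")
```

Because Y = 12 is the mean of Y, the X on Y line returns the mean of X, which is a quick check that both lines pass through the point of means.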
Example 8. The following table shows the sales and advertisement expenditure of a firm.
Coefficient of correlation r = 0.9. Estimate the likely sales for a proposed advertisement
expenditure of Rs. 10 crores.
b) Using Deviations taken from Actual Mean
With x = X – X̄ and y = Y – Ȳ:
bxy = ∑xy / ∑y²
byx = ∑xy / ∑x²
X   2   4   6   8   10   12
Y   4   2   5   10  3    6
Solution :
X      Y      x = X – X̄   x²     y = Y – Ȳ   y²     xy
              (X̄ = 7)            (Ȳ = 5)
2      4      -5          25     -1          1      5
4      2      -3          9      -3          9      9
6      5      -1          1      0           0      0
8      10     1           1      5           25     5
10     3      3           9      -2          4      -6
12     6      5           25     1           1      5
N = 6   ∑X = 42   ∑Y = 30   ∑x = 0   ∑x² = 70   ∑y = 0   ∑y² = 40   ∑xy = 18
Regression Equation of Y on X :
byx = ∑xy / ∑x² = 18 / 70 = 0.257
Y – Ȳ = byx ( X – X̄ )
Y – 5 = 0.257 ( X – 7 )
Y – 5 = 0.257 X – 1.799
Y = 0.257 X + 3.201
Regression Equation of X on Y :
bxy = ∑xy / ∑y² = 18 / 40 = 0.45
X – X̄ = bxy ( Y – Ȳ )
X – 7 = 0.45 (Y – 5)
X = 0.45 Y – 2.25 + 7
X = 0.45 Y + 4.75
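The deviation method above can be reproduced in a few lines. This is a minimal sketch using the six data points of the example; the variable names are illustrative, and the intercepts are computed without intermediate rounding, so they may differ slightly from hand-rounded figures.

```python
X = [2, 4, 6, 8, 10, 12]
Y = [4, 2, 5, 10, 3, 6]

n = len(X)
mean_x = sum(X) / n   # 7
mean_y = sum(Y) / n   # 5

# Deviations from the actual means.
x_dev = [x - mean_x for x in X]
y_dev = [y - mean_y for y in Y]

sum_x2 = sum(x * x for x in x_dev)                  # sum of x^2
sum_y2 = sum(y * y for y in y_dev)                  # sum of y^2
sum_xy = sum(x * y for x, y in zip(x_dev, y_dev))   # sum of xy

b_yx = sum_xy / sum_x2   # regression coefficient of Y on X
b_xy = sum_xy / sum_y2   # regression coefficient of X on Y

# Intercepts follow from the fact that both lines pass through the means.
a_yx = mean_y - b_yx * mean_x
a_xy = mean_x - b_xy * mean_y

print(f"Y = {b_yx:.3f} X + {a_yx:.3f}")
print(f"X = {b_xy:.3f} Y + {a_xy:.3f}")
```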
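c) Using Deviations taken from Assumed Mean
When the actual means of the X and Y variables are fractional, the calculations can be simplified
by taking the deviations dx and dy from an assumed mean A of each variable. The regression
coefficients can be calculated as follows:
byx = (N∑dxdy – ∑dx ∑dy) / (N∑dx² – (∑dx)²)
bxy = (N∑dxdy – ∑dx ∑dy) / (N∑dy² – (∑dy)²)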
X   43  44  46  40  44  42  45  42  38  40  52  57
Y   29  31  19  18  19  27  27  29  41  30  26  10
Find the value of Y when the value of X is 50 and the value of X when the value of Y is 49.
Solution :
X      dx = X – 42   dx²     Y      dy = Y – 27   dy²     dxdy
       (A = 42)                     (A = 27)
43     1             1       29     2             4       2
44     2             4       31     4             16      8
46     4             16      19     -8            64      -32
40     -2            4       18     -9            81      18
44     2             4       19     -8            64      -16
42     0             0       27     0             0       0
45     3             9       27     0             0       0
42     0             0       29     2             4       0
38     -4            16      41     14            196     -56
40     -2            4       30     3             9       -6
52     10            100     26     -1            1       -10
57     15            225     10     -17           289     -255
N = 12   ∑X = 533   ∑dx = 29   ∑dx² = 383   ∑Y = 306   ∑dy = -18   ∑dy² = 728   ∑dxdy = -347
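X̄ = ∑X / N = 533 / 12 = 44.42
Ȳ = ∑Y / N = 306 / 12 = 25.5
byx = (N∑dxdy – ∑dx ∑dy) / (N∑dx² – (∑dx)²)
    = (12 × (–347) – (29)(–18)) / (12 × 383 – (29)²)
    = (–4164 + 522) / (4596 – 841)
    = –3642 / 3755 = –0.97
bxy = (N∑dxdy – ∑dx ∑dy) / (N∑dy² – (∑dy)²)
    = (–4164 + 522) / (8736 – 324)
    = –3642 / 8412 = –0.43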
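The assumed-mean calculation can be checked with a short script. The sketch below rebuilds the deviation columns from the tabulated X and Y values with assumed means 42 and 27, then uses the resulting coefficients to answer the two estimation questions; the variable names are illustrative, and no intermediate rounding is applied.

```python
X = [43, 44, 46, 40, 44, 42, 45, 42, 38, 40, 52, 57]
Y = [29, 31, 19, 18, 19, 27, 27, 29, 41, 30, 26, 10]
A_X, A_Y = 42, 27   # assumed means used in the table

n = len(X)
dx = [x - A_X for x in X]
dy = [y - A_Y for y in Y]

sum_dx, sum_dy = sum(dx), sum(dy)
sum_dx2 = sum(d * d for d in dx)
sum_dy2 = sum(d * d for d in dy)
sum_dxdy = sum(p * q for p, q in zip(dx, dy))

# Shared numerator of both regression coefficients.
numerator = n * sum_dxdy - sum_dx * sum_dy
b_yx = numerator / (n * sum_dx2 - sum_dx ** 2)
b_xy = numerator / (n * sum_dy2 - sum_dy ** 2)

# Actual means recovered from the assumed means.
mean_x = A_X + sum_dx / n
mean_y = A_Y + sum_dy / n

y_at_50 = mean_y + b_yx * (50 - mean_x)   # Y on X line evaluated at X = 50
x_at_49 = mean_x + b_xy * (49 - mean_y)   # X on Y line evaluated at Y = 49

print(f"byx = {b_yx:.2f}, bxy = {b_xy:.2f}")
print(f"Y when X = 50: {y_at_50:.2f}")
print(f"X when Y = 49: {x_at_49:.2f}")
```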
Properties of the Regression Coefficients
❖ The coefficient of correlation is the geometric mean of the two regression coefficients:
r = ±√(bxy × byx)
❖ If byx is positive, then bxy must also be positive, and vice versa.
❖ If one regression coefficient is greater than one in magnitude, the other must be less than
one in magnitude, because their product equals r² and cannot exceed one.
❖ The coefficient of correlation has the same sign as the regression
coefficients.
❖ Regression coefficients are independent of origin but not of scale.
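The first few properties can be checked numerically. The sketch below uses the six-point data set from the actual-mean example (X = 2, 4, ..., 12) and compares r computed from its definition with the signed geometric mean of the two regression coefficients; the function name is an assumption for illustration.

```python
import math

X = [2, 4, 6, 8, 10, 12]
Y = [4, 2, 5, 10, 3, 6]


def coefficients(xs, ys):
    """Return (b_yx, b_xy, r) computed from deviations about the means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx, sxy / syy, sxy / math.sqrt(sxx * syy)


b_yx, b_xy, r = coefficients(X, Y)

# Geometric mean of the two coefficients, carrying their common sign.
r_from_b = math.copysign(math.sqrt(b_yx * b_xy), b_yx)

print(f"byx = {b_yx:.3f}, bxy = {b_xy:.3f}")
print(f"r = {r:.3f}, signed sqrt(byx * bxy) = {r_from_b:.3f}")
```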
Find the means of X and Y variables and the coefficient of
correlation between them from the following two regression
equations:
3X +2Y=26
6X+Y=31
Answer: X̄ = 4, Ȳ = 7, r = –0.5
Example 6
Find the means of X and Y variables and the coefficient
of correlation between them from the following two
regression equations:
2y–x–50 = 0
3y–2x–10 = 0.
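SOLUTION
We are given
2Y – X – 50 = 0 ... (1)
3Y – 2X – 10 = 0 ... (2)
Solving equations (1) and (2), we get Y = 90 and X = 130. Since both regression lines pass
through the point of means, X̄ = 130 and Ȳ = 90.
Calculating the correlation coefficient
Let us assume equation (1) is the regression equation of Y on X:
Y = 0.5X + 25, so byx = 0.5
Equation (2) is then the regression equation of X on Y:
X = 1.5Y – 5, so bxy = 1.5
r = √(byx × bxy) = √(0.5 × 1.5) = √0.75 = 0.866
It may be noted that in the above problem one of the regression coefficients is
greater than 1 and the other is less than 1, and their product, 0.75, does not exceed 1.
Therefore our assumption about the given equations is correct.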
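The procedure used in these problems (intersect the two lines to get the means, then choose the assignment of lines for which the product of the coefficients lies between 0 and 1) can be sketched as follows. Each equation is written as pX + qY = c; the function name `means_and_r` and this tuple convention are assumptions made for the example.

```python
import math


def means_and_r(eq1, eq2):
    """Each equation is a tuple (p, q, c) meaning p*X + q*Y = c.

    Returns (mean_x, mean_y, b_yx, b_xy, r).
    """
    (p1, q1, c1), (p2, q2, c2) = eq1, eq2
    # The two regression lines intersect at the point of means (Cramer's rule).
    det = p1 * q2 - p2 * q1
    mean_x = (c1 * q2 - c2 * q1) / det
    mean_y = (p1 * c2 - p2 * c1) / det
    # Try both assignments; keep the one whose product is a valid r squared.
    for y_on_x, x_on_y in ((eq1, eq2), (eq2, eq1)):
        b_yx = -y_on_x[0] / y_on_x[1]   # slope when solved for Y
        b_xy = -x_on_y[1] / x_on_y[0]   # slope when solved for X
        product = b_yx * b_xy
        if 0 <= product <= 1:
            r = math.copysign(math.sqrt(product), b_yx)
            return mean_x, mean_y, b_yx, b_xy, r
    raise ValueError("no assignment gives a product between 0 and 1")


# 2Y - X - 50 = 0 and 3Y - 2X - 10 = 0, rewritten as pX + qY = c.
ex6 = means_and_r((-1, 2, 50), (-2, 3, 10))
# 3X + 2Y = 26 and 6X + Y = 31.
ex_other = means_and_r((3, 2, 26), (6, 1, 31))

print(ex6)
print(ex_other)
```

The first call reproduces the means 130 and 90 with a positive r, and the second gives means 4 and 7 with r = –0.5, the sign being taken from the regression coefficients.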
Let us understand the following property
• Regression coefficients are independent of origin but not of scale.
Explanation:
Change of origin: If X and Y are transformed as follows:
u = X – r    v = Y – s
the regression coefficients of the new variables remain the same, that is
byx = bvu
bxy = buv
Change of scale: If instead u = X/r and v = Y/s for non-zero constants r and s, then
byx = (s/r) bvu
bxy = (r/s) buv
a. byx = (3/2) bvu, so bvu = (2/3) × 1.2 = 0.8
b. r = cov(x, y) / (σx σy) = 900 / (15 × 80) = 0.75
   byx = r σy / σx = 0.75 × 80/15 = 4
   byx = (8/5) bTS, so bTS = 5/2
   r = √(bTS × bST)
   0.75 = √((5/2) × bST)
   bST = 0.225
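The origin and scale property can be illustrated numerically. The sketch below reuses the six-point data set from the actual-mean example; the shift constants (7 and 5) and the scale factors (2 for X, 3 for Y) are arbitrary values chosen for illustration, not taken from the worked problems above.

```python
def b_yx(xs, ys):
    """Regression coefficient of Y on X from deviations about the means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


X = [2, 4, 6, 8, 10, 12]
Y = [4, 2, 5, 10, 3, 6]

original = b_yx(X, Y)

# Change of origin: subtracting constants leaves the coefficient unchanged.
shifted = b_yx([x - 7 for x in X], [y - 5 for y in Y])

# Change of scale: u = X / scale_x, v = Y / scale_y (illustrative factors).
scale_x, scale_y = 2, 3
scaled = b_yx([x / scale_x for x in X], [y / scale_y for y in Y])

print(f"byx = {original:.4f}")
print(f"after change of origin: {shifted:.4f}")
print(f"after change of scale:  {scaled:.4f}")
print(f"(scale_y / scale_x) * bvu = {(scale_y / scale_x) * scaled:.4f}")
```

With these factors byx = (3/2) bvu, which is the same kind of relation used in part a above.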
Standard Error of Estimate
The standard error of the estimate gives us an idea of how well a regression model fits a
dataset. In particular: the smaller the value, the better the fit; the larger the value, the worse
the fit.
For a regression model that has a small standard error of the estimate, the data points will be
closely packed around the estimated regression line (1). Conversely, for a regression model
that has a large standard error of the estimate, the data points will be more loosely
scattered (2) around the regression line.
[Figure: graphical representation of case (1), points closely packed around the line, and case (2), points loosely scattered around it]
❖ The standard error of estimate is a measure of variation around the computed regression
line.
❖ The standard error of estimate (SE) of y measures the variability of the observed values of y
around the regression line.
❖ The standard error of estimate gives us a measure of the scatter of the observations about the
line of regression.
Standard error of estimate = √(unexplained variation / n)
syx = σy √(1 – r²)
sxy = σx √(1 – r²)
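The two expressions for the standard error of estimate of Y on X can be compared directly. The sketch below reuses the six-point data set from the actual-mean example, takes the unexplained variation to be the sum of squared residuals about the fitted Y on X line, and divides by n as in the formula above; the variable names are illustrative.

```python
import math

X = [2, 4, 6, 8, 10, 12]
Y = [4, 2, 5, 10, 3, 6]

n = len(X)
mean_x, mean_y = sum(X) / n, sum(Y) / n
sxx = sum((x - mean_x) ** 2 for x in X)
syy = sum((y - mean_y) ** 2 for y in Y)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(X, Y))

# Fitted line of Y on X.
b_yx = sxy / sxx
a_yx = mean_y - b_yx * mean_x

# Route 1: square root of (unexplained variation / n).
unexplained = sum((y - (a_yx + b_yx * x)) ** 2 for x, y in zip(X, Y))
s_yx_direct = math.sqrt(unexplained / n)

# Route 2: sigma_y * sqrt(1 - r^2), with sigma_y computed using divisor n.
sigma_y = math.sqrt(syy / n)
r = sxy / math.sqrt(sxx * syy)
s_yx_formula = sigma_y * math.sqrt(1 - r ** 2)

print(f"from residuals:        {s_yx_direct:.4f}")
print(f"from sigma_y and r:    {s_yx_formula:.4f}")
```

The two routes agree because the unexplained variation equals ∑y² (1 – r²) when the line is fitted by least squares.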
CORRELATION VS. REGRESSION