
REGRESSION

Literally, the word regression means 'return to the origin'. In statistics, the word is used in a different
sense. If two variables are correlated, the unknown value of one of the variables can be estimated by
using the known value of the other variable. The estimated value may not be equal to the actually
observed value, but it will be close to it. Regression Analysis, in the general sense, means the
estimation or prediction of the unknown value of one variable from the known value of the other
variable.

Regression Analysis confined to the study of only two variables at a time is termed Simple
Regression. But quite often the values of a particular phenomenon may be affected by a multiplicity of
causes. Regression Analysis for studying more than two variables at a time is known as Multiple
Regression.

In Regression Analysis there are two types of variables. The variable whose value is influenced or is to be
predicted is called the dependent variable. The variable which influences the values or is used for prediction is
called the independent variable. In Regression Analysis the independent variable is also known as the regressor,
predictor or explanatory variable, while the dependent variable is also known as the regressed or explained variable.

LINEAR & NON-LINEAR REGRESSION

If the given bivariate data are plotted on a graph, the points so obtained will more or
less concentrate around a curve, called the "Curve of Regression". The mathematical equation of the
regression curve is called the Regression Equation. If the regression curve is a straight line, we say that
there is linear regression between the variables under study. If the curve of regression is not a straight
line, the regression is termed curved or non-linear regression.

The tendency of the actual values to lie close to the estimated values is called regression.
In a wider sense, regression is the theory of estimating the unknown value of a variable with the help of
the known values of other variables. Regression theory was first introduced and developed by Sir Francis
Galton in the field of Genetics.

Here, firstly, a mathematical relation between the two variables is framed. This relation, called the
regression equation, is obtained by the method of least squares. It may be linear or non-linear.

For bivariate data on x and y, the regression equation obtained under the assumption that x depends
on y is called the regression of x on y:

(x − x̄) = bxy (y − ȳ)

The regression equation obtained under the assumption that y depends on x is called the regression of y
on x:

(y − ȳ) = byx (x − x̄)

Here x̄ and ȳ denote the arithmetic means of x and y respectively.

The regression coefficients bxy and byx can be expressed in the following equivalent forms:

bxy = Cov(x, y) / σy² = r · σx / σy = (nΣxy − Σx·Σy) / (nΣy² − (Σy)²) = Σdx·dy / Σdy²

byx = Cov(x, y) / σx² = r · σy / σx = (nΣxy − Σx·Σy) / (nΣx² − (Σx)²) = Σdx·dy / Σdx²

where dx = x − x̄ and dy = y − ȳ.

The regression of x on y is used for the estimation of x values and the regression of y on x is used for the
estimation of y values. The graphs of the regression equations are the regression lines.
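
As an illustration of these formulas, here is a minimal Python sketch that computes both regression coefficients from a small sample; the data values in it are made up purely for the example.

# A minimal sketch (not from the original text): computing b_xy and b_yx
# from raw data with the deviation formulas above. The sample values are
# hypothetical, chosen only for illustration.

x = [2, 4, 6, 8, 10]
y = [3, 7, 5, 10, 11]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

dx = [xi - mean_x for xi in x]                  # deviations of x from its mean
dy = [yi - mean_y for yi in y]                  # deviations of y from its mean

sum_dxdy = sum(a * b for a, b in zip(dx, dy))
sum_dx2 = sum(a * a for a in dx)
sum_dy2 = sum(b * b for b in dy)

b_xy = sum_dxdy / sum_dy2                       # regression coefficient of x on y
b_yx = sum_dxdy / sum_dx2                       # regression coefficient of y on x

r = (b_xy * b_yx) ** 0.5                        # |r| is the geometric mean of the coefficients
if sum_dxdy < 0:
    r = -r                                      # r takes the common sign of the coefficients

print(f"b_xy = {b_xy:.4f}, b_yx = {b_yx:.4f}, r = {r:.4f}")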

PROPERTIES OF REGRESSION

Regression coefficients are the coefficients of the independent variables in the regression equations.

1. The regression coefficient bxy is the change occurring in x for a unit change in y. The regression
coefficient byx is the change occurring in y for a unit change in x.
2. The regression coefficients are independent of the origin of measurement of the variables, but
they depend on the scale.
3. The geometric mean of the regression coefficients is numerically equal to the coefficient of
correlation.
4. The regression coefficients cannot be of opposite signs. If r is positive, both regression
coefficients will be positive. If r is negative, both will be negative. If r is zero, both regression
coefficients will be zero.
5. Since the coefficient of correlation cannot numerically be greater than 1, the product of the
regression coefficients cannot be greater than 1.

PROPERTIES OF REGRESSION LINES

There are two regression lines.

1. The regression lines intersect at (x̄, ȳ).
2. The regression lines have positive slopes if the variables are positively correlated, and
negative slopes if the variables are negatively correlated.
3. If there is perfect correlation (r = ±1), the regression lines coincide; there is only one
regression line.

LINES OF REGRESSION

A line of regression is the line which gives the best estimate of one variable for any given value of the
other variable. In the case of two variables, say x and y, we have two regression equations: x on y and
y on x.

Line of regression of y on x is the line which gives the best estimate for the value of y for any specified
value of x.

Line of regression of x on y is the line which gives the best estimate for the value of x for any specified
value of y.

LINE OF REGRESSION OF y on x

(y − ȳ) = r (σy / σx) (x − x̄)

LINE OF REGRESSION OF x on y

(x − x̄) = r (σx / σy) (y − ȳ)

REMEMBER

a. When r = 0, i.e., when x and y are uncorrelated, the lines of regression of y on x and of x on y reduce
to y − ȳ = 0 and x − x̄ = 0. The lines are perpendicular to each other.
b. When r = ±1, the two lines coincide.
c. If the value of r is significant, we can use the lines of regression for estimation and prediction.
d. If r is not significant, the linear model is not a good fit and hence the lines of regression should
not be used for prediction.
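
To make the two lines concrete, here is a small Python sketch that forms both regression lines from the means, standard deviations and r; all numeric values in it are assumed for illustration.

# A minimal sketch (assumed values): building the two regression lines
# from summary statistics.

mean_x, mean_y = 50.0, 30.0     # assumed means of x and y
sd_x, sd_y = 8.0, 5.0           # assumed standard deviations
r = 0.6                         # assumed correlation coefficient

b_yx = r * sd_y / sd_x          # coefficient of the line of regression of y on x
b_xy = r * sd_x / sd_y          # coefficient of the line of regression of x on y

def estimate_y(x):
    """Best estimate of y for a given x: (y - ȳ) = b_yx (x - x̄)."""
    return mean_y + b_yx * (x - mean_x)

def estimate_x(y):
    """Best estimate of x for a given y: (x - x̄) = b_xy (y - ȳ)."""
    return mean_x + b_xy * (y - mean_y)

print(estimate_y(60))           # predicted y when x = 60
print(estimate_x(25))           # predicted x when y = 25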

COEFFICIENTS OF REGRESSION

a. bxy is the Coefficient of regression of x on y.


b. byx is the Coefficient of regression of y on x.

THEOREMS ON REGRESSION COEFFICIENTS

a. The correlation coefficient is the geometric mean of the regression coefficients, i.e., r² = bxy · byx.
b. The sign to be taken before the square root is the same as that of the regression coefficients.
c. If one of the regression coefficients is greater than one, the other must be less than one.
d. The arithmetic mean of the absolute values of the regression coefficients is greater than or equal to
the absolute value of the correlation coefficient.
e. Regression coefficients are independent of change of origin but not of scale.
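
These theorems can be checked numerically; the short sketch below uses an arbitrary assumed pair of regression coefficients purely to verify the relations.

# A quick numerical check of the theorems, with assumed coefficients.

b_xy, b_yx = 0.4, 1.6                           # assumed regression coefficients (same sign)

r = (b_xy * b_yx) ** 0.5                        # r^2 = b_xy * b_yx, sign taken from the coefficients
am = (abs(b_xy) + abs(b_yx)) / 2                # arithmetic mean of |b_xy| and |b_yx|

assert r <= 1.0                                 # the product of the coefficients cannot exceed 1
assert am >= abs(r)                             # AM of the |coefficients| >= |r|
assert not (abs(b_xy) > 1 and abs(b_yx) > 1)    # if one exceeds 1, the other must not

print(f"r = {r}, AM of |b_xy| and |b_yx| = {am}")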

Problem 1:

Obtain the two regression equations for the following data on X and Y.

  X      Y     dx = X − X̄   dy = Y − Ȳ     dx²     dy²    dx·dy
  91     71         1            1            1       1        1
  97     75         7            5           49      25       35
 108     69        18           −1          324       1      −18
 121     97        31           27          961     729      837
  67     70       −23            0          529       0        0
 124     91        34           21         1156     441      714
  51     39       −39          −31         1521     961     1209
  73     61       −17           −9          289      81      153
 111     80        21           10          441     100      210
  57     47       −33          −23         1089     529      759
 900    700         0            0         6360    2868     3900

X̄ = 900/10 = 90;  Ȳ = 700/10 = 70

bxy = Σdx·dy / Σdy² = 3900/2868 = 1.360
byx = Σdx·dy / Σdx² = 3900/6360 = 0.6132

Regression of x on y: (x − x̄) = bxy (y − ȳ)
(x − 90) = 1.360 (y − 70)
x = 1.360y − 5.2

Regression of y on x: (y − ȳ) = byx (x − x̄)
(y − 70) = 0.6132 (x − 90)
y = 0.6132x + 14.812
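
The arithmetic of Problem 1 can be reproduced with a short Python script using the X and Y columns of the table above.

# Reproducing Problem 1 with the deviation-from-mean formulas.

X = [91, 97, 108, 121, 67, 124, 51, 73, 111, 57]
Y = [71, 75, 69, 97, 70, 91, 39, 61, 80, 47]

n = len(X)
mx, my = sum(X) / n, sum(Y) / n                     # 90 and 70

dx = [x - mx for x in X]
dy = [y - my for y in Y]

s_dxdy = sum(a * b for a, b in zip(dx, dy))         # 3900
s_dx2 = sum(a * a for a in dx)                      # 6360
s_dy2 = sum(b * b for b in dy)                      # 2868

b_xy = s_dxdy / s_dy2                               # ≈ 1.360
b_yx = s_dxdy / s_dx2                               # ≈ 0.6132

print(f"x = {b_xy:.3f}y + ({mx - b_xy * my:.2f})")  # regression of x on y
print(f"y = {b_yx:.4f}x + ({my - b_yx * mx:.3f})")  # regression of y on x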

Problem 2:

The data about the sales & advertisement expenditure of a firm is given below:

                        Sales     Advertisement Expenditure

Mean                     40                  6
Standard Deviation       10                  1.5

Coefficient of correlation r = 0.9

o Estimate the likely sales for a proposed advertisement expenditure of Rs. 10 crores.
o What should be the advertisement expenditure if the firm proposes a sales target of 60 crores of
rupees?

Answer:
Let x denote sales and y denote advertisement expenditure (in Rs. crores).

bxy = r σx / σy = 0.9 × 10 / 1.5 = 6
byx = r σy / σx = 0.9 × 1.5 / 10 = 0.135

Regression of x on y: (x − 40) = 6 (y − 6), i.e., x = 6y + 4
Regression of y on x: (y − 6) = 0.135 (x − 40), i.e., y = 0.135x + 0.6

For an advertisement expenditure of y = 10: x = 6 × 10 + 4 = 64, so the likely sales are Rs. 64 crores.
For a sales target of x = 60: y = 0.135 × 60 + 0.6 = 8.7, so the required advertisement expenditure is Rs. 8.7 crores.
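
The same calculation in a short Python sketch, using the summary figures given in the problem:

# Reproducing Problem 2: x = sales, y = advertisement expenditure (Rs. crores).

mean_x, mean_y = 40.0, 6.0
sd_x, sd_y = 10.0, 1.5
r = 0.9

b_xy = r * sd_x / sd_y                              # 6.0
b_yx = r * sd_y / sd_x                              # 0.135

sales_for_adex_10 = mean_x + b_xy * (10 - mean_y)   # 64.0
adex_for_sales_60 = mean_y + b_yx * (60 - mean_x)   # 8.7

print(sales_for_adex_10, adex_for_sales_60)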

Problem 3:

Point out the consistency, if any, in the following statement: “The Regression Equation of y on x is
2y+3x=4 and the correlation coefficient between x & y is 0.8”

Answer:

Refer to the properties of regression coefficients. From 2y + 3x = 4 we get y = −(3/2)x + 2, so
byx = −3/2, which is negative, whereas r = 0.8 is positive. Since r and the regression coefficients must
have the same sign, the statement is inconsistent.

Problem 4:

By using the following data, find out the two lines of regression and from them compute the Karl
Pearson's coefficient of correlation: ΣX = 250; ΣY = 300; ΣXY = 7900; ΣX² = 6500; ΣY² = 10000; n = 10

Answer:
X̄ = ΣX/n = 250/10 = 25;  Ȳ = ΣY/n = 300/10 = 30

bxy = (nΣXY − ΣX·ΣY) / (nΣY² − (ΣY)²) = (10 × 7900 − 250 × 300) / (10 × 10000 − 300²) = 4000/10000 = 0.4

byx = (nΣXY − ΣX·ΣY) / (nΣX² − (ΣX)²) = (10 × 7900 − 250 × 300) / (10 × 6500 − 250²) = 4000/2500 = 1.6

Regression of x on y: (x − 25) = 0.4 (y − 30), i.e., x = 0.4y + 13
Regression of y on x: (y − 30) = 1.6 (x − 25), i.e., y = 1.6x − 10

r² = bxy × byx = 0.4 × 1.6 = 0.64, so r = 0.8 (positive, since both regression coefficients are positive).
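
A brief Python check of Problem 4 from the given sums:

# Reproducing Problem 4 from the given totals.

n = 10
sX, sY, sXY, sX2, sY2 = 250, 300, 7900, 6500, 10000

num = n * sXY - sX * sY                 # 4000
b_xy = num / (n * sY2 - sY ** 2)        # 4000 / 10000 = 0.4
b_yx = num / (n * sX2 - sX ** 2)        # 4000 / 2500  = 1.6

r = (b_xy * b_yx) ** 0.5                # 0.8, positive since both coefficients are positive
print(b_xy, b_yx, r)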

Problem 5:

Find the two regression coefficients and hence r. n = 5; X̄ = 10; Ȳ = 20; Σ(X − 4)² = 100;
Σ(Y − 10)² = 160; Σ(X − 4)(Y − 10) = 80

Answer:

Let U = X − 4 and V = Y − 10. Then Ū = X̄ − 4 = 6, so ΣU = nŪ = 30. Similarly V̄ = Ȳ − 10 = 10, so
ΣV = nV̄ = 50.

byx = (nΣUV − ΣU·ΣV) / (nΣU² − (ΣU)²) = (5 × 80 − 30 × 50) / (5 × 100 − 30²) = −1100 / −400 = 11/4

bxy = (nΣUV − ΣU·ΣV) / (nΣV² − (ΣV)²) = (5 × 80 − 30 × 50) / (5 × 160 − 50²) = −1100 / −1700 = 11/17

(Since regression coefficients are independent of change of origin, these are also the coefficients for X
and Y.)

r = √((11/4)(11/17)) = 1.33, which is greater than 1 and therefore impossible; the given data are
inconsistent.
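
A short Python check of Problem 5, showing the same inconsistency:

# Reproducing Problem 5 with the change of origin U = X - 4, V = Y - 10.

n = 5
sum_U2, sum_V2, sum_UV = 100, 160, 80   # Σ(X-4)², Σ(Y-10)², Σ(X-4)(Y-10)
sum_U = n * (10 - 4)                    # ΣU = n·Ū = 30
sum_V = n * (20 - 10)                   # ΣV = n·V̄ = 50

num = n * sum_UV - sum_U * sum_V        # -1100
b_yx = num / (n * sum_U2 - sum_U ** 2)  # -1100 / -400  = 11/4
b_xy = num / (n * sum_V2 - sum_V ** 2)  # -1100 / -1700 = 11/17

r = (b_xy * b_yx) ** 0.5                # ≈ 1.33 > 1, so the data cannot be consistent
print(b_yx, b_xy, r)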
