
(3.2) Regression Analysis

B.Com (Hons.) Ist Year


Unit No: 1
Paper No: IV
3.2 Regression Analysis

Fellow
Dr.Kawal Gill, Associate Professor
Department/College: Shri Guru Gobind Singh College of Commerce

Author
Dr. Madhu Gupta, Associate Professor
College/Department: Janki Devi Memorial College of Commerce,
University of Delhi

Reviewer
Dr. Bindra Prasad, Associate Professor
College/Department: Department of Commerce, Shaheed Bhagat Singh
College, University of Delhi

Institute of Lifelong Learning, University of Delhi


Table of Contents
• (3.2) Regression Analysis
o 2.1 Introduction
o 2.2 Regression Lines
o 2.3 Regression Equations And Prediction
o 2.4 Standard Error Of Estimate
o 2.5 Difference between Correlation Analysis and Regression analysis
o Summary
o Exercise
o References
o Glossary

2.1 Introduction
Correlation analysis, discussed in the previous chapter, studies the degree and direction of
covariation between two or more variables. However, it does not give the functional relationship
that would enable us to predict the value of one variable from another. If we can establish a
functional relationship between these variables from past observations, we can estimate, with some
accuracy, the unknown value of the dependent variable from known values of the independent
variable(s) (see web link 2.1). Regression analysis establishes this functional relationship between
two or more variables.

2.1.1 Meaning
Regression analysis is a statistical technique that identifies the functional relationship between
two or more variables. It helps us understand how the typical value of the dependent variable
changes when any one of the independent variables is varied while the other independent variables
are held fixed (see web link 2.2). The technique is used to find the equation that represents the
average relationship between the variables, so that we can predict unknown values of one variable
(the dependent variable) from known values of the other variable (the independent variable).

2.1.2 Utility Of Regression Analysis


Regression analysis establishes a functional relationship between two or more variables. With it,
we can predict the unknown value of one variable from known values of the other variables, and it
is therefore used extensively in almost every field of study.

Since most problems of business and economic analysis are based on cause and effect relationships,
regression analysis is a highly valuable tool in economic and business research. It is widely used
to estimate demand and supply curves, cost functions, production and consumption functions,
advertising expenditure and sales functions, etc. In fact, economists have propounded many types of
production functions by fitting regression lines to input and output data. For efficient planning
of an economy, prediction or estimation of future prices, sales, production, investment, profits,
etc. is of paramount importance, and it is done with the help of regression analysis. In finance, it
is used in the analysis of securities markets and in the risk-return analysis that underlies
portfolio theory. Pharmaceutical firms use it to study the effect of new drugs on patients, and
governments use it to project population statistics such as birth rates, death rates and literacy
rates.

It is an important statistical tool used in almost all sciences (natural, social and physical) to
establish an average relationship between two or more variables that are in a causal relationship.
It also forms a basis for further statistical analysis, such as time series analysis and covariance
analysis. It expresses in clear, measurable terms the important characteristics hidden in large
masses of data.
2.1.3 Types Of Regression Analysis
The different types of regression analysis are:

Simple and Multiple Regression

When regression analysis is confined to the study of only two variables at a time, it is termed
simple regression analysis. In such cases, the value of one variable (the dependent variable) is
estimated from the value of another variable (the independent variable). For example, if the
functional relationship between advertising expenditure and sales is studied, it is called simple
regression analysis, as it involves only two variables. In this relationship, advertising
expenditure is the independent variable (X) on which the value of the dependent variable, sales (Y),
depends, and the function is expressed as
Y = f(X), where Y = sales and X = advertising expenditure.

A multiple regression analysis, on the other hand, is one in which the functional relationship among
more than two related variables is studied. In such an analysis, one variable is the dependent
variable (Y) and the other variables are taken as independent variables (say X, Z and P). For
example, if sales of an article (Y) depend on advertising expenditure (X), income of the consumers
(Z) and price of the article (P), then the study of the functional relationship among these four
variables is called multiple regression analysis, and the function is expressed as
Y = f(X, Z, P), where Y = sales, X = advertising expenditure, Z = income of the consumers and P
= price of the article.

Figure 2.9: Simple and Multiple Regressions

Total and Partial Regression

A total regression analysis is one in which the value of a variable is affected by multiple causes
and the effect of all these causes on that variable is studied together. For example, if the sales
of an article (Y) are affected by many variables such as advertising expenditure (X), income of the
consumers (Z) and price of the article (P), and we study the effect of all these variables on sales
at the same time, we call it total regression analysis. Thus, Y = f(X, Z, P).

On the other hand, partial regression analysis is one in which, although all the variables affecting
the dependent variable are recognised, the effect of only one is studied while the other variables
are kept constant. Thus, in our example, a partial regression analysis will take the form of

Y = f(X, with Z and P held constant), or
Y = f(Z, with X and P held constant), or
Y = f(P, with X and Z held constant).

Linear and Non-linear Regression

When two variables are so related that a change of one unit in the independent variable causes a
constant absolute change in the value of the dependent variable, the study of such a relationship is
called linear regression analysis. When plotted on a graph, the bivariate data form a linear or
straight-line path, and the equation of that path is the equation of a straight line. On the other
hand, if a change of one unit in the independent variable does not cause a constant absolute change
in the value of the dependent variable, the study of such a relationship is called non-linear
regression analysis. When plotted on a graph, such bivariate data do not form a straight line.

Figure 2.10: Linear and Non-Linear Regression

2.2 Regression Lines


The main objective of regression analysis is to establish a functional relationship between two or
more variables, so that we can predict the value of one variable corresponding to given values of
the other variables.

Our study of regression analysis will be limited to simple linear regression, i.e. linear
regression between two variables.

In case of two variables X and Y, we shall have two regression lines – regression line of Y on X
and regression line of X on Y. The regression line of Y on X is used to estimate the most probable
value of Y, for a given value of X. Here Y is the dependent variable and X is the independent
variable. The regression line of X on Y, on the other hand, is used to estimate the most probable
value of X for a given value of Y. In this case, X is the dependent variable and Y is the
independent variable.

2.2.1 Least Squares Method

Figure 2.11: Regression Line of Y on X

A regression line is fitted on the principle of least squares, and such a line is called ‘the line
of best fit’. This line minimizes the error between the estimated points on the line and the actual
observed points used to fit it. In other words, the line of best fit is drawn in such a way that the
sum of the squares of the deviations of the individual observations from the line is the least.

The deviations that are minimized may be vertical or horizontal. The line that minimizes the sum of
the squares of the vertical deviations of the observations from it is called the regression line of
Y on X, whereas the line that minimizes the sum of the squares of the horizontal deviations of the
observed points from it is called the regression line of X on Y.

For the regression line of Y on X (Figure 2.11), ∑(Y − Yc)² is minimum.

Figure 2.12: Regression Line of X on Y

For the regression line of X on Y (Figure 2.12), ∑(X − Xc)² is minimum.


The two regression lines intersect at the point (X̄, Ȳ): the point of intersection gives the mean
values of X and Y, read on the X and Y axes respectively.
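As a rough check of these statements, the following Python sketch (an illustration added to these notes, with hypothetical helper names) fits both least-squares lines to the machine-operator data used in the illustration in 2.3.2. It confirms that moving the Y-on-X line away from its least-squares position only increases ∑(Y − Yc)², and it solves the two lines for their point of intersection so that it can be compared with the means.

```python
def least_squares_line(u, v):
    """Return (a, b) of the line v = a + b*u that minimises sum((v - a - b*u)**2)."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    s_uv = sum((ui - mean_u) * (vi - mean_v) for ui, vi in zip(u, v))
    s_uu = sum((ui - mean_u) ** 2 for ui in u)
    b = s_uv / s_uu
    return mean_v - b * mean_u, b


def sum_sq_dev(u, v, a, b):
    """Sum of squared deviations of v from the line a + b*u."""
    return sum((vi - (a + b * ui)) ** 2 for ui, vi in zip(u, v))


# Machine-operator data: experience in years (X) and performance rating (Y)
X = [1, 2, 3, 4, 5, 6, 7, 8, 9]
Y = [9, 8, 10, 12, 11, 13, 14, 16, 15]

a_yx, b_yx = least_squares_line(X, Y)  # Y on X: minimises vertical deviations
a_xy, b_xy = least_squares_line(Y, X)  # X on Y: minimises horizontal deviations

# Nudging the intercept or slope of the Y-on-X line increases the vertical SSE
best = sum_sq_dev(X, Y, a_yx, b_yx)
for da, db in [(0.1, 0), (-0.1, 0), (0, 0.05), (0, -0.05)]:
    assert sum_sq_dev(X, Y, a_yx + da, b_yx + db) > best

# Intersection of Yc = a_yx + b_yx*X and Xc = a_xy + b_xy*Y
x_cross = (a_xy + b_xy * a_yx) / (1 - b_xy * b_yx)
y_cross = a_yx + b_yx * x_cross
print(f"lines meet at ({x_cross:.2f}, {y_cross:.2f}); "
      f"means are ({sum(X) / len(X):.2f}, {sum(Y) / len(Y):.2f})")
```

The intersection formula comes from substituting the Y-on-X equation into the X-on-Y equation and solving for X.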

Figure 2.13: Regression Lines and the Mean Values
2.2.2 Regression Lines And Correlation


When there is perfect positive or perfect negative correlation between the two variables (i.e.,
when r = +1 or r = −1), the two regression lines coincide, i.e., there is only one line. Such a
regression line slopes upward or downward depending on whether the correlation is positive or
negative. At the other extreme, if the variables are uncorrelated (i.e., r = 0), the two lines lie at
right angles to each other. The larger the angle between the two regression lines, the lower the
degree of correlation; the smaller the angle between them, the higher the degree of correlation.
Both lines have either a positive slope or a negative slope, depending on whether the correlation
between the variables is positive or negative.
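The link between r and the angle between the two lines can be explored numerically. The Python function below is a hypothetical helper, not part of the notes. It assumes, for simplicity, equal standard deviations (σx = σy = 1), builds a direction vector for each regression line in the (X, Y) plane from byx = r σy/σx and bxy = r σx/σy, and returns the acute angle between the lines.

```python
import math


def angle_between_regression_lines(r, sd_x=1.0, sd_y=1.0):
    """Acute angle, in degrees, between the Y-on-X and X-on-Y regression lines."""
    b_yx = r * sd_y / sd_x          # change in Y per unit change in X (Y on X)
    b_xy = r * sd_x / sd_y          # change in X per unit change in Y (X on Y)
    v1 = (1.0, b_yx)                # direction of the Y-on-X line
    v2 = (b_xy, 1.0)                # direction of the X-on-Y line
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    cos_theta = dot / (math.hypot(*v1) * math.hypot(*v2))
    cos_theta = max(-1.0, min(1.0, cos_theta))   # guard against rounding
    theta = math.degrees(math.acos(cos_theta))
    return min(theta, 180.0 - theta)  # two lines meet at theta and 180 - theta


for r in (1.0, 0.9, 0.5, 0.0, -0.5, -1.0):
    print(f"r = {r:+.1f}: angle = {angle_between_regression_lines(r):5.1f} degrees")
```

With r = ±1 the angle is (up to rounding) zero, so the lines coincide, and with r = 0 it is 90 degrees. In between, the angle shrinks as |r| grows.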

Figure 2.14: Regression Lines and Correlation

2.3 Regression Equations And Prediction

2.3.1 Regression Equations


Regression equations are algebraic expressions of the regression lines. Since there are two
regression lines, we have two regression equations. These are:

1. Regression equation of Y on X

Yc = a + bX

2. Regression equation of X on Y

Xc = a + bY
Regression Equation of Y on X
The regression equation of Y on X defines the regression line of Y on X. It is expressed as
follows:
Yc = a + bX
Here Y is the dependent variable and X is the independent variable. ‘a’ is the Y intercept and gives
the value of Y when X = 0. ‘b’ is called the regression coefficient of Y on X and gives the slope of
the regression line; it shows the change in Y associated with a unit change in X. It is also written
as byx. To find the values of ‘a’ and ‘b’, the following equations, called normal equations, are
solved simultaneously (N is the number of pairs of observations):
∑Y = Na + b∑X and
∑XY = a∑X + b∑X²

Regression Equation of X on Y
The regression equation of X on Y defines the regression line of X on Y and is expressed as follows:
Xc = a + bY
In this equation X is the dependent variable, whose value depends on Y, the independent variable.
‘a’ is the X intercept, i.e., the value of X when Y = 0, and ‘b’ is the regression coefficient of X
on Y. It is also written as bxy; it gives the slope of the line and indicates the change in X when Y
changes by one unit.
To determine the values of the constants ‘a’ and ‘b’, the two normal equations to be solved
simultaneously are:
∑X = Na + b∑Y and
∑XY = a∑Y + b∑Y²
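Both pairs of normal equations have the same form, so a single helper can solve them. The sketch below (the function name is hypothetical) applies Cramer's rule to ∑V = Na + b∑U and ∑UV = a∑U + b∑U². Passing the sums for X as U and for Y as V gives the equation of Y on X; swapping their roles gives the equation of X on Y. The column totals in the usage lines are those of the machine-operator illustration later in this section.

```python
def solve_normal_equations(n, sum_u, sum_v, sum_uu, sum_uv):
    """Solve  sum_v  = n*a + b*sum_u
              sum_uv = a*sum_u + b*sum_uu   for (a, b) by Cramer's rule."""
    det = n * sum_uu - sum_u ** 2
    if det == 0:
        raise ValueError("the independent variable shows no variation")
    a = (sum_v * sum_uu - sum_u * sum_uv) / det
    b = (n * sum_uv - sum_u * sum_v) / det
    return a, b


# Y on X: pass N, sum X, sum Y, sum X^2, sum XY
a_yx, b_yx = solve_normal_equations(9, 45, 108, 285, 597)
# X on Y: pass N, sum Y, sum X, sum Y^2, sum XY
a_xy, b_xy = solve_normal_equations(9, 108, 45, 1356, 597)

print(f"Yc = {a_yx:.2f} + {b_yx:.2f}X")
print(f"Xc = {a_xy:.2f} + {b_xy:.2f}Y")
```

Cramer's rule is used here only because a 2 × 2 system is small; eliminating one unknown by hand, as the notes do, gives the same constants.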

2.3.2 Prediction Using Regression Equations


The most important use of regression is to make predictions. Once we have a fitted regression
equation that shows the average relationship between the two variables X and Y, we can substitute
any value of the independent variable to obtain a prediction for the dependent variable. For example,

If Yc = 19.7 + 0.56X,
then for X = 6,
Estimated value of Y (i.e. Yc) = 19.7 + (0.56 × 6) = 23.06.

Illustration: The following data give the experience of machine operators and their performance
ratings, given by the number of good parts turned out per 20 pieces. Obtain the equations of the
two regression lines from the data and plot the data and the two lines on a graph. Also, estimate
the probable performance rating of an operator with 10 years' experience, and the most probable
experience of an operator who gets 18 points in performance rating.

Figure 2.16: Machine Operators

Experience in years (X) | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Performance ratings (Y) | 9 | 8 | 10 | 12 | 11 | 13 | 14 | 16 | 15

Solution: Determination of two regression equations

Experience in years (X) | Performance rating (Y) | XY | X² | Y² | Yc = 7.25 + 0.95X | Xc = −6.4 + 0.95Y
1 | 9 | 9 | 1 | 81 | 8.2 | 2.15
2 | 8 | 16 | 4 | 64 | 9.15 | 1.2
3 | 10 | 30 | 9 | 100 | 10.1 | 3.1
4 | 12 | 48 | 16 | 144 | 11.05 | 5
5 | 11 | 55 | 25 | 121 | 12 | 4.05
6 | 13 | 78 | 36 | 169 | 12.95 | 5.95
7 | 14 | 98 | 49 | 196 | 13.9 | 6.9
8 | 16 | 128 | 64 | 256 | 14.85 | 8.8
9 | 15 | 135 | 81 | 225 | 15.8 | 7.85
∑X = 45 | ∑Y = 108 | ∑XY = 597 | ∑X² = 285 | ∑Y² = 1356 | |

1. Regression equation of Y on X is Yc = a + bX
The two normal equations are
∑Y = Na + b∑X and
∑XY = a∑X + b∑X²
Putting values, we get
108 = 9a + 45b .......(i)
597 = 45a + 285b ........(ii)
Multiplying equation (i) by 5 and subtracting it from equation (ii), we get
597 = 45a + 285b
−540 = −45a − 225b
57 = 60b

or b = 57/60 = 0.95
Putting this value in equation (i), we get
108 = 9a + (45 × 0.95)
or 9a = 108 − 42.75 = 65.25

or a = 65.25/9 = 7.25

The required equation is

Yc = 7.25 + 0.95X
When X = 10,
Yc = 7.25 + (0.95 × 10)
= 7.25 + 9.5 = 16.75
Therefore, when X = 10 years, the most probable performance rating is 16.75.
2. Regression equation of X on Y is Xc = a + bY
The two normal equations are
∑X = Na + b∑Y and
∑XY = a∑Y + b∑Y²
Putting values, we get
45 = 9a + 108b ...........(i)
597 = 108a + 1356b ...........(ii)
Multiplying equation (i) by 12 and subtracting it from equation (ii), we get
597 = 108a + 1356b
−540 = −108a − 1296b
57 = 60b

or b = 57/60 = 0.95

Putting this value in equation (i), we get

45 = 9a + (108 × 0.95)
9a = 45 − 102.6 = −57.6

and a = −57.6/9 = −6.4
Therefore, the required equation is

Xc = −6.4 + 0.95Y

Figure 2.17: Actual Data and Regression Lines

When Y = 18,
Xc = -6.4 + 0.95Y = -6.4 + (0.95 x 18)
= -6.4 + 17.1 = 10.7
Therefore, when performance rating is 18, the most probable experience of the operator is 10.7
years.
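The whole illustration can be reproduced in a few lines of Python as a check on the arithmetic. This sketch is not part of the prescribed method. It computes the worksheet totals from the raw data, finds b and then obtains a from the first normal equation, and makes the two predictions.

```python
# Machine-operator data from the illustration
X = [1, 2, 3, 4, 5, 6, 7, 8, 9]          # experience in years
Y = [9, 8, 10, 12, 11, 13, 14, 16, 15]   # performance rating
N = len(X)

# Column totals of the worksheet
sum_x, sum_y = sum(X), sum(Y)
sum_xy = sum(x * y for x, y in zip(X, Y))
sum_x2 = sum(x * x for x in X)
sum_y2 = sum(y * y for y in Y)


def fit(n, s_u, s_v, s_uu, s_uv):
    """Constants (a, b) of V = a + bU from the two normal equations."""
    b = (n * s_uv - s_u * s_v) / (n * s_uu - s_u ** 2)
    a = (s_v - b * s_u) / n          # first normal equation: sum V = n*a + b*sum U
    return a, b


a_yx, b_yx = fit(N, sum_x, sum_y, sum_x2, sum_xy)   # Yc = a + bX
a_xy, b_xy = fit(N, sum_y, sum_x, sum_y2, sum_xy)   # Xc = a + bY

rating_at_10_years = a_yx + b_yx * 10
experience_at_18_points = a_xy + b_xy * 18
print(f"Yc = {a_yx:.2f} + {b_yx:.2f}X, Xc = {a_xy:.2f} + {b_xy:.2f}Y")
print(f"Yc(X=10) = {rating_at_10_years:.2f}, Xc(Y=18) = {experience_at_18_points:.2f}")
```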

2.3.3 Alternative Methods Of Finding Regression Equations

Instead of finding the regression equations by solving the normal equations, we can find the two
regression equations as follows:

The regression equation of Y on X is given by
Y − Ȳ = byx (X − X̄), where byx = r σy/σx,
and the regression equation of X on Y by
X − X̄ = bxy (Y − Ȳ), where bxy = r σx/σy.

The values of the regression coefficients byx and bxy may be calculated

1. by taking deviations from the arithmetic means,
2. by taking deviations from assumed means, or
3. directly from the original values, without taking deviations.

1) Regression coefficients when deviations are taken from arithmetic means

When deviations are taken from the arithmetic means, the regression coefficients are obtained as
follows:
byx = ∑xy/∑x² and bxy = ∑xy/∑y², where x = X − X̄ and y = Y − Ȳ.

Illustration: Obtain the regression equations by taking deviations from the means of X and Y, using
the data given below:

Price of belt in Rs | 25 | 28 | 35 | 32 | 31 | 36 | 29 | 38 | 34 | 32
Demand in units | 43 | 46 | 49 | 41 | 36 | 32 | 31 | 30 | 33 | 39

Figure 2.18: Belts

Solution: Let price be denoted by X and demand by Y.


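Determination of two regression equations

X | Y | x = X − 32 | y = Y − 38 | x² | y² | xy
25 | 43 | −7 | 5 | 49 | 25 | −35
28 | 46 | −4 | 8 | 16 | 64 | −32
35 | 49 | 3 | 11 | 9 | 121 | 33
32 | 41 | 0 | 3 | 0 | 9 | 0
31 | 36 | −1 | −2 | 1 | 4 | 2
36 | 32 | 4 | −6 | 16 | 36 | −24
29 | 31 | −3 | −7 | 9 | 49 | 21
38 | 30 | 6 | −8 | 36 | 64 | −48
34 | 33 | 2 | −5 | 4 | 25 | −10
32 | 39 | 0 | 1 | 0 | 1 | 0
∑X = 320 | ∑Y = 380 | ∑x = 0 | ∑y = 0 | ∑x² = 140 | ∑y² = 398 | ∑xy = −93

X̄ = ∑X/N = 320/10 = 32 and Ȳ = ∑Y/N = 380/10 = 38

byx = ∑xy/∑x² = −93/140 = −0.6643
Regression equation of Y on X is Y − Ȳ = byx (X − X̄)
or Y − 38 = −0.6643 (X − 32)
or Y − 38 = −0.6643X + 21.26
or Y = 59.26 − 0.66X

bxy = ∑xy/∑y² = −93/398 = −0.2337
Regression equation of X on Y is X − X̄ = bxy (Y − Ȳ)
or X − 32 = −0.2337 (Y − 38)
or X − 32 = −0.2337Y + 8.88
or X = 40.88 − 0.23Y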
Thus, the two regression equations are
Y = 59.26 – 0.66X and
X = 40.88 – 0.23Y
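A short Python sketch of the same deviation-from-means method, offered as a check on the worksheet. It computes x and y about the actual means, forms byx = ∑xy/∑x² and bxy = ∑xy/∑y², and rearranges Y − Ȳ = byx(X − X̄) and X − X̄ = bxy(Y − Ȳ) into intercept-slope form.

```python
price = [25, 28, 35, 32, 31, 36, 29, 38, 34, 32]    # X, price of belt in Rs
demand = [43, 46, 49, 41, 36, 32, 31, 30, 33, 39]   # Y, demand in units
n = len(price)
mean_x = sum(price) / n
mean_y = sum(demand) / n

# Deviations from the actual means
x = [xi - mean_x for xi in price]
y = [yi - mean_y for yi in demand]
sum_xy = sum(dx * dy for dx, dy in zip(x, y))
sum_x2 = sum(dx * dx for dx in x)
sum_y2 = sum(dy * dy for dy in y)

b_yx = sum_xy / sum_x2          # regression coefficient of Y on X
b_xy = sum_xy / sum_y2          # regression coefficient of X on Y

# Y - mean_y = b_yx (X - mean_x), rearranged to Y = intercept + b_yx X
intercept_yx = mean_y - b_yx * mean_x
intercept_xy = mean_x - b_xy * mean_y
print(f"Y = {intercept_yx:.2f} + ({b_yx:.4f})X")
print(f"X = {intercept_xy:.2f} + ({b_xy:.4f})Y")
```

The intercepts are computed from the unrounded coefficients, which is why they agree with 59.26 and 40.88 above.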

2) Regression coefficients when deviations are taken from assumed means


When the actual means of X and Y are fractional, deviations can be taken from assumed means to
simplify the calculations. In that case, the regression coefficients are
byx = [N∑dxdy − ∑dx∑dy] / [N∑dx² − (∑dx)²]
bxy = [N∑dxdy − ∑dx∑dy] / [N∑dy² − (∑dy)²]
where dx = X − Ax and dy = Y − Ay; Ax and Ay are the assumed means of the X-series and Y-series
respectively.
If a common factor is taken out of each deviation (step deviations), then
byx = [N∑dxdy − ∑dx∑dy] / [N∑dx² − (∑dx)²] × (iy/ix)
bxy = [N∑dxdy − ∑dx∑dy] / [N∑dy² − (∑dy)²] × (ix/iy)
where ix = common factor of the X-series and iy = common factor of the Y-series.
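The assumed-mean formulas translate directly into code. In the sketch below (a hypothetical helper, not from the notes), the arguments are the totals of the deviation columns and the optional common factors ix and iy. The usage lines apply it to the summary figures of the illustration that follows. Reading that illustration's two garbled mean values as X̄ = 5 and Ȳ = 21.25 is an assumption. Because ∑dx = N(X̄ − Ax) and ∑dy = N(Ȳ − Ay), the deviation totals follow from the means.

```python
def coefficients_from_assumed_means(n, sum_dx, sum_dy, sum_dx2, sum_dy2,
                                    sum_dxdy, ix=1.0, iy=1.0):
    """Regression coefficients from (step) deviations dx = (X - Ax)/ix, dy = (Y - Ay)/iy."""
    numerator = n * sum_dxdy - sum_dx * sum_dy
    b_yx = numerator / (n * sum_dx2 - sum_dx ** 2) * (iy / ix)
    b_xy = numerator / (n * sum_dy2 - sum_dy ** 2) * (ix / iy)
    return b_yx, b_xy


# Illustration data (means read as X-bar = 5 and Y-bar = 21.25: an assumption)
N, Ax, Ay = 4, 5, 20
mean_x, mean_y = 5, 21.25
sum_dx = N * (mean_x - Ax)        # zero, because the assumed mean equals the actual mean
sum_dy = N * (mean_y - Ay)
b_yx, b_xy = coefficients_from_assumed_means(N, sum_dx, sum_dy, 20, 225, 65)

a_yx = mean_y - b_yx * mean_x     # Y = a_yx + b_yx X
a_xy = mean_x - b_xy * mean_y     # X = a_xy + b_xy Y
print(f"Y = {a_yx:.2f} + {b_yx:.2f}X")
print(f"X = {a_xy:.2f} + {b_xy:.3f}Y")
```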

Illustration: You are given the following data regarding a distribution:

N = 4; X̄ = 5, Ȳ = 21.25; ∑(X − 5)² = 20, ∑(Y − 20)² = 225; ∑(X − 5)(Y − 20) = 65
Find the two regression equations.

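Taking dx = X − 5 and dy = Y − 20: since X̄ = 5, ∑dx = N(X̄ − 5) = 0, and ∑dy = N(Ȳ − 20) = 4 × 1.25 = 5.
byx = [N∑dxdy − ∑dx∑dy] / [N∑dx² − (∑dx)²] = [(4 × 65) − (0 × 5)] / [(4 × 20) − 0²] = 260/80 = 3.25
Regression equation of Y on X is
Y − Ȳ = byx (X − X̄)
or Y − 21.25 = 3.25 (X − 5)
or Y = 5 + 3.25X
bxy = [N∑dxdy − ∑dx∑dy] / [N∑dy² − (∑dy)²] = [(4 × 65) − (0 × 5)] / [(4 × 225) − 5²] = 260/875 = 0.297
Regression equation of X on Y is
X − X̄ = bxy (Y − Ȳ)
or X − 5 = 0.297 (Y − 21.25)
or X = −1.31 + 0.297Y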

3) Regression coefficients from original values


From the original values, the regression coefficients are obtained as follows:
byx = [N∑XY − ∑X∑Y] / [N∑X² − (∑X)²]
bxy = [N∑XY − ∑X∑Y] / [N∑Y² − (∑Y)²]

Illustration: Calculate the two regression equations from the following data of a pizza corner shop:

Number of delivery boys (X) | 2 | 4 | 6 | 8 | 10 | 12 | 14
Number of pizzas sold (Y) | 50 | 64 | 85 | 100 | 94 | 105 | 125

Figure 2.19: Pizza Corner

Solution: Determination of two regression equations

Number of delivery boys (X) | Number of pizzas sold (Y) | X² | Y² | XY
2 | 50 | 4 | 2500 | 100
4 | 64 | 16 | 4096 | 256
6 | 85 | 36 | 7225 | 510
8 | 100 | 64 | 10000 | 800
10 | 94 | 100 | 8836 | 940
12 | 105 | 144 | 11025 | 1260
14 | 125 | 196 | 15625 | 1750
∑X = 56 | ∑Y = 623 | ∑X² = 560 | ∑Y² = 59307 | ∑XY = 5616

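X̄ = ∑X/N = 56/7 = 8 and Ȳ = ∑Y/N = 623/7 = 89

byx = [N∑XY − ∑X∑Y] / [N∑X² − (∑X)²] = [(7 × 5616) − (56 × 623)] / [(7 × 560) − 56²] = 4424/784 = 5.64

bxy = [N∑XY − ∑X∑Y] / [N∑Y² − (∑Y)²] = 4424 / [(7 × 59307) − 623²] = 4424/27020 = 0.16

The two regression equations are

Y − Ȳ = byx (X − X̄) and

X − X̄ = bxy (Y − Ȳ)

Putting values, we get

Y − 89 = 5.64 (X − 8)

Y = 43.88 + 5.64X and

X − 8 = 0.16 (Y − 89)

X = −6.24 + 0.16Y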
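The same computation from original values can be sketched in Python as a check. Two points are worth noting. First, the sketch takes the number of pizzas sold with 10 delivery boys as 94, the value consistent with the worksheet's Y², XY and ∑Y entries (8836, 940 and 623). Second, it keeps the coefficients unrounded, so its intercepts (about 43.86 and −6.57) differ slightly from the ones above, which use byx and bxy rounded to two decimals.

```python
boys = [2, 4, 6, 8, 10, 12, 14]                # X, number of delivery boys
pizzas = [50, 64, 85, 100, 94, 105, 125]       # Y, number of pizzas sold
n = len(boys)

sum_x, sum_y = sum(boys), sum(pizzas)
sum_xy = sum(x * y for x, y in zip(boys, pizzas))
sum_x2 = sum(x * x for x in boys)
sum_y2 = sum(y * y for y in pizzas)

numerator = n * sum_xy - sum_x * sum_y
b_yx = numerator / (n * sum_x2 - sum_x ** 2)
b_xy = numerator / (n * sum_y2 - sum_y ** 2)
mean_x, mean_y = sum_x / n, sum_y / n

print(f"Y - {mean_y:.0f} = {b_yx:.4f} (X - {mean_x:.0f})")
print(f"X - {mean_x:.0f} = {b_xy:.4f} (Y - {mean_y:.0f})")
print(f"Y = {mean_y - b_yx * mean_x:.2f} + {b_yx:.2f}X")
print(f"X = {mean_x - b_xy * mean_y:.2f} + {b_xy:.2f}Y")
```

Rounding bxy to 0.16 before multiplying by Ȳ = 89 is what moves the X-on-Y intercept from about −6.57 to −6.24, so it is safer to round only the final answer.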

2.3.4 Regression Coefficients In Case Of Bivariate Frequency Distribution

In the case of a bivariate frequency distribution also, any of the methods of finding regression
coefficients may be used, but the most convenient is to take deviations (or step deviations) from
assumed means. Taking step deviations from assumed means,
byx = [N∑fdxdy − ∑fdx∑fdy] / [N∑fdx² − (∑fdx)²] × (iy/ix)
bxy = [N∑fdxdy − ∑fdx∑fdy] / [N∑fdy² − (∑fdy)²] × (ix/iy)
where f is the frequency of a cell, N = ∑f, dx and dy are the step deviations of the class
mid-points of X and Y from their assumed means, and ix and iy are the class intervals.

Illustration: From the following data relating to the prices of two shares X and Y, calculate the
regression equations.

Y \ X | 130-140 | 140-150 | 150-160 | 160-170 | 170-180 | Total
120-130 | 2 | 5 | 3 | - | - | 10
130-140 | 1 | 8 | 12 | 6 | - | 27
140-150 | - | 5 | 22 | 14 | 1 | 42
150-160 | - | 2 | 16 | 9 | 2 | 29
160-170 | - | 1 | 8 | 6 | 1 | 16
170-180 | - | - | 2 | 4 | 2 | 8
Total | 3 | 21 | 63 | 39 | 6 | 132

Figure 2.20: Display of Share Prices

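Solution: Calculation of two regression equations

Taking assumed means Ax = 155 and Ay = 145 and a class interval of 10 for both series,
dx = (mid-point of X − 155)/10 and dy = (mid-point of Y − 145)/10.

Column totals (X):
X class | 130-140 | 140-150 | 150-160 | 160-170 | 170-180 | Total
dx | −2 | −1 | 0 | 1 | 2 |
f | 3 | 21 | 63 | 39 | 6 | 132
fdx | −6 | −21 | 0 | 39 | 12 | 24
fdx² | 12 | 21 | 0 | 39 | 24 | 96
fdxdy | 10 | 14 | 0 | 27 | 20 | 71

Row totals (Y):
Y class | dy | f | fdy | fdy² | fdxdy
120-130 | −2 | 10 | −20 | 40 | 18
130-140 | −1 | 27 | −27 | 27 | 4
140-150 | 0 | 42 | 0 | 0 | 0
150-160 | 1 | 29 | 29 | 29 | 11
160-170 | 2 | 16 | 32 | 64 | 14
170-180 | 3 | 8 | 24 | 72 | 24
Total | | 132 | 38 | 232 | 71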
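The remaining arithmetic can be sketched in Python: finding the means and the two regression equations from these totals. The sketch rebuilds the totals from the frequency table, which also cross-checks the worksheet, and then applies byx = [N∑fdxdy − ∑fdx∑fdy]/[N∑fdx² − (∑fdx)²] × iy/ix and the matching formula for bxy. It assumes, as the worksheet indicates, that columns are classes of X and rows are classes of Y, that the assumed means are 155 and 145, and that the common factor is 10 for both. Class mid-points stand in for the classes.

```python
# Frequencies: rows are classes of Y (120-130 ... 170-180),
# columns are classes of X (130-140 ... 170-180)
freq = [
    [2, 5, 3, 0, 0],
    [1, 8, 12, 6, 0],
    [0, 5, 22, 14, 1],
    [0, 2, 16, 9, 2],
    [0, 1, 8, 6, 1],
    [0, 0, 2, 4, 2],
]
x_mid = [135, 145, 155, 165, 175]
y_mid = [125, 135, 145, 155, 165, 175]
Ax, Ay = 155, 145          # assumed means
ix = iy = 10               # common factor (class width)

dx = [(m - Ax) // ix for m in x_mid]   # step deviations of X
dy = [(m - Ay) // iy for m in y_mid]   # step deviations of Y

N = s_fdx = s_fdy = s_fdx2 = s_fdy2 = s_fdxdy = 0
for r, row in enumerate(freq):
    for c, f in enumerate(row):
        N += f
        s_fdx += f * dx[c]
        s_fdy += f * dy[r]
        s_fdx2 += f * dx[c] ** 2
        s_fdy2 += f * dy[r] ** 2
        s_fdxdy += f * dx[c] * dy[r]

numerator = N * s_fdxdy - s_fdx * s_fdy
b_yx = numerator / (N * s_fdx2 - s_fdx ** 2) * (iy / ix)
b_xy = numerator / (N * s_fdy2 - s_fdy ** 2) * (ix / iy)
mean_x = Ax + ix * s_fdx / N
mean_y = Ay + iy * s_fdy / N

print(f"Totals: N={N}, fdx={s_fdx}, fdy={s_fdy}, fdx2={s_fdx2}, "
      f"fdy2={s_fdy2}, fdxdy={s_fdxdy}")
print(f"Y = {mean_y - b_yx * mean_x:.2f} + {b_yx:.2f}X")
print(f"X = {mean_x - b_xy * mean_y:.2f} + {b_xy:.2f}Y")
```

Run as written, it reproduces the worksheet totals and gives lines of roughly Y = 38.20 + 0.70X and X = 113.94 + 0.29Y, with X̄ ≈ 156.82 and Ȳ ≈ 147.88.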

2.3.5 Properties Of Regression Coefficients


1. The coefficient of correlation between two variables is the geometric mean of the two regression
coefficients, i.e.

r = ±√(bxy × byx)
The sign of r is the same as the sign of the regression coefficients.
2. Both regression coefficients must have the same algebraic sign, i.e. both are positive or both
are negative. Each regression coefficient is built from three components, r, σx and σy
(byx = r σy/σx and bxy = r σx/σy). Standard deviations can never be negative; only r can be
positive or negative. Thus the sign of each regression coefficient is determined by the sign of r,
and the two regression coefficients must bear the same sign as r.
3. The two regression coefficients cannot both exceed one. If both exceeded one, their product would
exceed one, which is impossible since bxy × byx = r², and r² is always less than or equal to one.
4. Regression coefficients are independent of a change of origin but not of scale. Thus, if a
constant is added to or subtracted from the values of either variable, the regression coefficients
are not affected; but if the values of X are multiplied by a constant h and the values of Y by a
constant k, byx is multiplied by k/h and bxy by h/k.
5. In absolute value, the arithmetic mean of the regression coefficients is equal to or greater than
the correlation coefficient, i.e.

|(bxy + byx)/2| ≥ |r|
When both coefficients are positive, this is simply (bxy + byx)/2 ≥ r.
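Properties 1, 4 and 5 can be checked numerically. The sketch below uses the belt price and demand data from the earlier illustration. The helper name and the constants (a shift of 30 and 40, and scale factors h = 2 and k = 5) are arbitrary choices made for the demonstration.

```python
import math


def regression_coefficients(xs, ys):
    """Return (b_yx, b_xy, r) computed from deviations about the means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sxx, sxy / syy, sxy / math.sqrt(sxx * syy)


price = [25, 28, 35, 32, 31, 36, 29, 38, 34, 32]
demand = [43, 46, 49, 41, 36, 32, 31, 30, 33, 39]
b_yx, b_xy, r = regression_coefficients(price, demand)

# Property 1: r is the geometric mean of b_yx and b_xy, with their common sign
r_from_b = math.copysign(math.sqrt(b_yx * b_xy), b_yx)

# Property 4: a change of origin leaves both coefficients unchanged ...
shifted = regression_coefficients([x - 30 for x in price], [y - 40 for y in demand])
# ... but multiplying X by h and Y by k rescales b_yx by k/h and b_xy by h/k
h, k = 2.0, 5.0
scaled = regression_coefficients([h * x for x in price], [k * y for y in demand])

# Property 5 (in absolute value): |(b_yx + b_xy) / 2| >= |r|
am = (b_yx + b_xy) / 2

print(f"b_yx = {b_yx:.4f}, b_xy = {b_xy:.4f}, r = {r:.4f}, from b's = {r_from_b:.4f}")
print(f"after shift: {shifted[0]:.4f}, {shifted[1]:.4f}")
print(f"after scaling: {scaled[0]:.4f} (k/h rule {b_yx * k / h:.4f}), "
      f"{scaled[1]:.4f} (h/k rule {b_xy * h / k:.4f})")
print(f"|AM| = {abs(am):.4f} >= |r| = {abs(r):.4f}")
```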

Illustration: Y = (1/2)X + 50 and X = kY − 10 are the lines of regression of Y on X and X on Y

respectively. If k is positive, prove that it cannot exceed 2. If k = 3/2, find the means of the two
variables and the correlation coefficient between them.

Solution: bxy = k and byx = 1/2; r² = bxy × byx = k/2

The maximum value of r² is 1.

Therefore, k/2 ≤ 1, or k ≤ 2.

If k = 3/2, the two regression equations are

Y = (1/2)X + 50 and X = (3/2)Y − 10

Solving these equations simultaneously, we get X = 260 and Y = 180.

Therefore, the mean value of X = 260 and the mean value of Y = 180.

r = +√(bxy × byx) = +√(3/2 × 1/2) = +√0.75 = +0.866 (positive, since both regression coefficients are positive)
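The algebra of this illustration can be checked with exact fractions. The sketch assumes that the two fractions lost from the printed equations are byx = 1/2 (the coefficient of X in the Y-on-X equation) and k = 3/2. These are the only values consistent with the bound k ≤ 2 and with r = 0.866. The sketch finds the upper limit of k, the point of intersection of the two lines (which gives the means), and r.

```python
import math
from fractions import Fraction

b_yx = Fraction(1, 2)      # coefficient of X in Y = (1/2)X + 50 (reconstructed)
k = Fraction(3, 2)         # value of k in X = kY - 10 (reconstructed)
b_xy = k

# r^2 = b_xy * b_yx = k/2 must not exceed 1, so k cannot exceed 1 / b_yx
k_max = 1 / b_yx

# Means = point of intersection of the two regression lines:
# X = k(b_yx X + 50) - 10  =>  X (1 - k b_yx) = 50k - 10
mean_x = (50 * k - 10) / (1 - k * b_yx)
mean_y = b_yx * mean_x + 50

# Both coefficients are positive, so r takes the positive root
r = math.sqrt(float(b_xy * b_yx))
print(f"k_max = {k_max}, mean X = {mean_x}, mean Y = {mean_y}, r = {r:.3f}")
```

Substituting the means back into X = (3/2)Y − 10 confirms that the point lies on both lines.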

2.4 Standard Error Of Estimate

2.4.1 Meaning
Lines of regression are lines of average relationship between two variables. They are helpful in
predicting the value of one variable from given values of another variable. The estimated values
based on regression lines generally differ from the observed values, because the variation in the
dependent variable may not be due solely to variation in the independent variable; other factors
also cause variation in the dependent variable. It is therefore useful to measure the likely error
in the estimated values.

A measure of such error in estimating the values of the dependent variable from a regression line
is called the standard error of estimate. It shows the average deviation of the actual values of a
variable from its estimated values; in other words, it measures the variation, or scatter, about the
line of regression. It is thus analogous to the standard deviation: the standard deviation measures
dispersion around the arithmetic mean, while the standard error of estimate measures dispersion
around the regression line.

Figure 2.24: Regressions differing in Accuracy of Prediction


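There are two types of standard error of estimate:

1. Standard error of estimate of Y on X, denoted by Syx
2. Standard error of estimate of X on Y, denoted by Sxy

There are three methods of finding either standard error:

(i) From the deviations of actual values from estimated values:
Syx = √[∑(Y − Yc)²/N] and Sxy = √[∑(X − Xc)²/N]
(ii) From the sums and the constants a and b of the relevant regression equation:
Syx = √[(∑Y² − a∑Y − b∑XY)/N] and Sxy = √[(∑X² − a∑X − b∑XY)/N]
(iii) From the standard deviation and the correlation coefficient:
Syx = σy √(1 − r²) and Sxy = σx √(1 − r²)
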
Syx and Sxy are expressed in the same units as the variables Y and X respectively. Since the
standard error of estimate measures the deviation of the actual data from the regression line, it
indicates how well the regression line describes the data. A low standard error of estimate means
that the actual and estimated values are close to each other and the regression line describes the
given data well, whereas a high standard error of estimate indicates that the regression line does
not describe the data well, because the deviations between actual and estimated values are large. A
zero standard error of estimate means a perfect relationship: every actual value coincides with its
estimated value.

Normal equations to determine the regression equation of Y on X are

∑Y = Na + b∑X and
∑XY = a∑X + b∑X².
Putting values, we get
9 = 5a + 15b and
31 = 15a + 55b
Solving them simultaneously, we get
Yc = 0.6 + 0.4X
Similarly, normal equations to determine the regression equation of X on Y are
∑X = Na + b∑Y and
∑XY = a∑Y + b∑Y².
Putting values, we get
15 = 5a + 9b and
31 = 9a + 19b
Solving them simultaneously, we get
Xc = 0.43 + 1.43Y
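The standard errors for this example can be sketched from the sums alone. Reading the coefficients off the normal equations above gives N = 5, ∑X = 15, ∑Y = 9, ∑XY = 31, ∑X² = 55 and ∑Y² = 19. Individual observations are not available here, so the sketch uses two methods that need only sums: Syx = √[(∑Y² − a∑Y − b∑XY)/N] with the fitted constants, and Syx = σy√(1 − r²) with r² = byx × bxy. It checks that the two agree, and does the same for Sxy.

```python
import math

N = 5
sum_x, sum_y = 15, 9
sum_x2, sum_y2, sum_xy = 55, 19, 31


def fit(n, s_u, s_v, s_uu, s_uv):
    """Constants (a, b) of V = a + bU from the two normal equations."""
    b = (n * s_uv - s_u * s_v) / (n * s_uu - s_u ** 2)
    return (s_v - b * s_u) / n, b


a_yx, b_yx = fit(N, sum_x, sum_y, sum_x2, sum_xy)   # Yc = a + bX
a_xy, b_xy = fit(N, sum_y, sum_x, sum_y2, sum_xy)   # Xc = a + bY

# Standard errors from the sums and the fitted constants
s_yx = math.sqrt((sum_y2 - a_yx * sum_y - b_yx * sum_xy) / N)
s_xy = math.sqrt((sum_x2 - a_xy * sum_x - b_xy * sum_xy) / N)

# Same quantities from sigma * sqrt(1 - r^2), with r^2 = b_yx * b_xy
r2 = b_yx * b_xy
sigma_x = math.sqrt(sum_x2 / N - (sum_x / N) ** 2)
sigma_y = math.sqrt(sum_y2 / N - (sum_y / N) ** 2)

print(f"S_yx = {s_yx:.4f} (check {sigma_y * math.sqrt(1 - r2):.4f})")
print(f"S_xy = {s_xy:.4f} (check {sigma_x * math.sqrt(1 - r2):.4f})")
```

The sum-based form works because the normal equations force the residuals to sum to zero and to be uncorrelated with the independent variable.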

2.4.2 Explained And Unexplained Variation


The total variation in the value of a variable can be split into two parts:

1. The explained variation, i.e., the variation which is explained by the regression line
2. The unexplained variation, i.e., the variation which is not explained by the regression line.

Thus, Total variation = Unexplained variation + Explained variation
For Y, using the regression line of Y on X:
∑(Y − Ȳ)² = ∑(Y − Yc)² + ∑(Yc − Ȳ)²

Figure 2.26: Explained and Unexplained Variation in Y

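In the same way, the variation in X is split using the regression line of X on Y:
∑(X − X̄)² = ∑(X − Xc)² + ∑(Xc − X̄)²
In either case,
r² = Explained variation / Total variation = 1 − (Unexplained variation / Total variation)
and r² is called the coefficient of determination.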
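The decomposition can be verified numerically. This Python sketch, an illustration added to these notes rather than a prescribed procedure, fits the regression line of Y on X to the machine-operator data from the illustration in 2.3.2. It then splits the total variation of Y into its unexplained and explained parts and computes r² as the explained share.

```python
X = [1, 2, 3, 4, 5, 6, 7, 8, 9]
Y = [9, 8, 10, 12, 11, 13, 14, 16, 15]
n = len(X)
mean_x, mean_y = sum(X) / n, sum(Y) / n

# Least-squares line of Y on X
b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(X, Y))
     / sum((x - mean_x) ** 2 for x in X))
a = mean_y - b * mean_x
Yc = [a + b * x for x in X]

total = sum((y - mean_y) ** 2 for y in Y)                  # sum (Y - Ybar)^2
unexplained = sum((y - yc) ** 2 for y, yc in zip(Y, Yc))   # sum (Y - Yc)^2
explained = sum((yc - mean_y) ** 2 for yc in Yc)           # sum (Yc - Ybar)^2
r2 = explained / total                                     # coefficient of determination

print(f"total = {total:.2f}, unexplained = {unexplained:.2f}, explained = {explained:.2f}")
print(f"r^2 = {r2:.4f}")
```

The split holds exactly only for the least-squares line; for any other line, the cross term between the two parts does not vanish.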

Figure 2.27: Explained and Unexplained Variation in X

Illustration: Given that N = 10, total variation = ∑(Y − Ȳ)² = 39 and unexplained variation =
∑(Y − Yc)² = 19.7, find

1. Coefficient of determination
2. Standard deviation of Y, and
3. Standard error of estimate of Y on X.

Solution:
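A short Python sketch of the computation uses three formulas: r² = 1 − unexplained variation/total variation, σy = √[∑(Y − Ȳ)²/N] and Syx = √[∑(Y − Yc)²/N]. Dividing the standard error by N, consistent with σy, is an assumption of this sketch. The last line cross-checks Syx against σy√(1 − r²).

```python
import math

N = 10
total_variation = 39.0          # sum (Y - Ybar)^2
unexplained_variation = 19.7    # sum (Y - Yc)^2

r2 = 1 - unexplained_variation / total_variation   # coefficient of determination
sigma_y = math.sqrt(total_variation / N)            # standard deviation of Y
s_yx = math.sqrt(unexplained_variation / N)         # standard error of estimate of Y on X

print(f"r^2 = {r2:.3f}")
print(f"sigma_y = {sigma_y:.3f}")
print(f"S_yx = {s_yx:.3f} (check: {sigma_y * math.sqrt(1 - r2):.3f})")
```

This gives r² ≈ 0.495, σy ≈ 1.975 and Syx ≈ 1.404.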

2.5 Difference between Correlation Analysis and Regression Analysis
1. Correlation analysis studies the degree and direction (i.e. positive or negative) of the
relationship between variables, while regression analysis studies the nature of that relationship
by expressing it in functional form.

2. Correlation analysis cannot predict the value of one variable from known values of the other
variables, whereas regression analysis can, because it establishes a functional relationship between
the variables.

3. While correlation need not imply a cause and effect relationship between the variables under
study, regression analysis presumes a cause and effect relationship.

4. As correlation analysis does not establish any cause and effect relationship, neither variable is
regarded as dependent or independent, whereas in regression analysis the cause is the independent
variable and the effect is the dependent variable. The value of the independent variable is used to
predict the value of the dependent variable.

5. Correlation analysis has only one coefficient, r, whereas regression analysis has two
coefficients, bxy and byx. Thus rxy = ryx, but in general bxy ≠ byx.

6. The coefficient of correlation cannot exceed unity, but either of the regression coefficients can
exceed unity. However, the two regression coefficients cannot both exceed unity.

7. The correlation coefficient is independent of a change of both origin and scale, whereas the
regression coefficients are independent of a change of origin but not of scale.

8. The correlation coefficient is a pure number without units, while a regression coefficient is
expressed in units of the dependent variable per unit of the independent variable (for example, byx
is in units of Y per unit of X).

9. Correlation analysis is confined to the study of linear relationships between variables, while
regression analysis studies both linear and non-linear relationships.

Summary
• The main objective of regression analysis is to establish a functional relationship between two or
more variables; this is done with the help of regression lines, also known as estimating lines.
• In the case of two variables X and Y, there are two regression lines: the regression line of Y on
X and the regression line of X on Y.
• A regression line is fitted on the principle of least squares, and such a line is called the ‘line
of best fit’. It is drawn in such a way that the sum of the squares of the deviations of the
individual observations from the line is the least.
• A line that minimizes the sum of the squares of the vertical deviations of the observations from
it is called the regression line of Y on X, whereas the line that minimizes the sum of the squares
of the horizontal deviations of the observed points from it is called the regression line of X on Y.
• When there is perfect positive or perfect negative correlation between the two variables, the two
regression lines coincide. At the other extreme, if the variables are uncorrelated, the two lines
cut each other at right angles. The larger the angle between the two regression lines, the lower the
degree of correlation. Both lines have either a positive slope or a negative slope, depending on
whether the correlation between the variables is positive or negative.
• Regression equations are algebraic expressions of the regression lines. Since simple regression
has two regression lines, there are two regression equations:
Regression equation of Y on X: Yc = a + bX
Regression equation of X on Y: Xc = a + bY
• The coefficient of correlation between two variables is the geometric mean of the two regression
coefficients, bxy and byx.
• A measure of the error in estimating the value of the dependent variable from a regression line is
called the standard error of estimate. It shows the average deviation of the actual values of a
variable from its estimated values.
• The total variation in the value of a variable can be split into two parts: the explained
variation, i.e. the variation that is explained by the regression line, and the unexplained
variation, i.e. the variation that is not explained by the regression line.

Exercise
1.1 What is time series? What are its important components? Give an example of each
component.
1.2 What is meant by analysis of time series? Discuss its importance in business and
economics.
1.3 Explain cyclical variations in a time series. How do seasonal variations differ from them?
1.4 What is secular trend? How does it differ from other short term variations in a time series
data?
1.5 Explain briefly the additive and multiplicative models of time series. What are their
underlying assumptions? Which of these models is more popular in practice and why?

References

• Berenson and Levine, "Basic Business Statistics: Concepts and Applications", Prentice Hall.
• Chou, Ya-lun, "Statistical Analysis", Holt, Rinehart and Winston, New York.
• Croxton and Cowden, "Applied General Statistics", Prentice Hall, London.
• David P. Doane and Lori E. Seward, "Applied Statistics for Business and Economics", Tata McGraw
Hill Publishing Co. Ltd.
• Dhingra, I.C., and M.P. Gupta, "Lectures in Business Statistics", Sultan Chand.
• Douglas A. Lind, William G. Marchal and Samuel A. Wathen, "Statistical Techniques in Business and
Economics", Tata McGraw Hill Publishing Co. Ltd.
• Frank, Harry and Steven C. Althoen, "Statistics: Concepts and Applications", Cambridge Low-priced
Editions, 1995.
• Gupta, S.C., "Fundamentals of Statistics", Himalaya Publishing House.
• Gupta, S.P., and Archana Gupta, "Statistical Methods", Sultan Chand and Sons, New Delhi.
• Kakkar, N.K. and N.D. Vohra, "Statistics: An Introductory Analysis", Jnanada Prakashan.
• Levin, Richard and David S. Rubin, "Statistics for Management", 7th Edition, Prentice Hall of
India.
• Sharma, J.K., "Business Statistics", 2nd Edition, Pearson Education.
• Srivastava, T.N. and Shailja Rego, "Statistics for Management", Tata McGraw Hill Publishing Co. Ltd.
• Siegel, Andrew F., "Practical Business Statistics", International Edition (4th Ed.), Irwin McGraw
Hill.
• Spiegel, M.R., "Theory and Problems of Statistics", Schaum's Outline Series, McGraw Hill Publishing
Co.
• Yule and Kendall, "An Introduction to the Theory of Statistics", Charles Griffin & Co., London.

Web Links

1.1 http://en.wikipedia.org/wiki/Time_series
1.2 http://www.itl.nist.gov/div898/handbook/pmc/section4/pmc41.htm
1.3 http://mathematics.nayland.school.nz/Year_13_Stats/3.1_timeseries/Time_Series_home.htm#top
1.4 http://mathematics.nayland.school.nz/Year_13_Stats/3.1_timeseries/3_seasonal_variation.htm

Glossary
Additive model: A model for the decomposition of time series which assumes that the four
components of time series interact in additive fashion in order to produce the observed values.
Analysis of time series: Analysis of figures comprising a time series for evaluation and
forecasting.
Cyclical variations: Regular but not uniformly periodic short term fluctuations in a time series
caused by business cycles.
Irregular variations: Short term variations in a time series which are purely random and are
the result of unforeseen and unpredictable forces.
Multiplicative model: A model for the decomposition of time series which assumes that the
four components of a time series interact in multiplicative fashion to produce the observed
values.
Seasonal variations: Short term fluctuations in a time series which occur regularly and
periodically within a period of less than one year.
Time series: Data collected over a period of time at regular intervals, and arranged in
chronological sequence.
Trend: General tendency of the data to increase or to decrease or to remain constant over a
long period of time.
