
Regression Analysis

Based on Chapters 2 and 3 of the book "Econometrics" by Damodar Gujarati
Definition:
 Regression analysis is concerned with the study of the dependence of one variable (the dependent variable) on one or more other variables (the explanatory variables), with a view to estimating and/or predicting the (population) mean or average value of the former in terms of the known or fixed (in repeated sampling) values of the latter.
Conditional Mean and Regression

 Geometrically, a regression curve is simply the locus of the conditional means of the dependent variable for the fixed values of the explanatory variable(s). More simply, it is the curve connecting the means of the subpopulations of Y corresponding to the given values of the regressor X.
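As a concrete sketch of this idea, the conditional means can be computed by grouping the Y values by the level of X; the regression curve is the locus of these means. The subpopulation numbers below are made up for illustration, not taken from the book's income/expenditure table:

```python
import numpy as np

# Hypothetical subpopulations: weekly income X -> observed expenditures Y
data = {
    80: [55, 60, 65, 70, 75],
    100: [65, 70, 74, 80, 85, 88],
    120: [79, 84, 90, 94, 98],
    140: [80, 93, 95, 103, 108, 113, 115],
}

# The regression curve connects the conditional means E(Y | X).
# For these numbers the means come out to 65, 77, 89, 101.
for income, expenditures in data.items():
    print(income, np.mean(expenditures))
```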
Population Regression Function (PRF)

• E(Y | Xi ): this is the mean weekly expenditure (Y) for all families with a particular income level (Xi ).
• PRF: Yi = E(Y | Xi )
• So what we are actually predicting is the mean for all families with a particular income.
• The deviation of an individual Yi around its expected value is: ui = Yi − E(Y | Xi )
• So we can write the PRF as: Yi = E(Y | Xi ) + ui
  Here Yi is the expenditure of the i-th family, E(Y | Xi ) is the mean of Y for all families with that particular income, and ui measures how much the individual family differs from the group mean of all families who have as much income as it has.
Population Regression Function (PRF)
 If E(Y | Xi ) is assumed to be linear in Xi , we get the PRF as:

Yi = β1 + β2 Xi + ui    (1)

 The error term can be interpreted as follows: it clearly shows that there are other variables besides income that affect consumption expenditure, and that an individual family's consumption expenditure cannot be fully explained by the variable(s) included in the regression model alone.

 So the disturbance term ui is a surrogate for all those variables that are omitted from the model but that collectively affect Y.

 Then we may wonder: why not develop a multiple regression model with as many variables as possible? There are several reasons for not doing so.
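A minimal simulation can illustrate the decomposition Yi = E(Y | Xi ) + ui. All numbers below (β1 = 17, β2 = 0.6, the income range, and the error spread) are illustrative assumptions, not values from the text:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed population parameters (hypothetical, for illustration only)
beta1, beta2 = 17.0, 0.6

x = rng.uniform(80, 260, size=1000)   # incomes
u = rng.normal(0, 10, size=1000)      # disturbance: all omitted influences on Y
y = beta1 + beta2 * x + u             # PRF plus stochastic error

# E(Y | X) is the systematic part; u_i is the individual deviation from it.
conditional_mean = beta1 + beta2 * x
residual = y - conditional_mean       # recovers u_i exactly here
print(residual.mean())                # close to zero by construction
```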
The Sample Regression Function (SRF)
In most practical situations what we have is but a sample of Y values corresponding to some fixed X's. Therefore, our task now is to estimate the PRF on the basis of the sample information.

The sample counterpart of Eq. (1) may be written as:

Yi = β̂1 + β̂2 Xi + ûi

where:
Ŷi = β̂1 + β̂2 Xi
Ŷi = estimator of E(Y | Xi )
β̂1 = estimator of β1
β̂2 = estimator of β2
ûi = sample residual term
To sum up, then, our primary objective in regression analysis is to estimate the PRF:

Yi = β1 + β2 Xi + ui

on the basis of the SRF:

Yi = β̂1 + β̂2 Xi + ûi

Granted that the SRF is but an approximation of the PRF, how should the SRF be constructed so that β̂1 is as "close" as possible to the true β1 and β̂2 is as "close" as possible to the true β2, even though we will never know the true β1 and β2?
 The task is to estimate the population regression function (PRF) on the basis of the sample regression function (SRF) as accurately as possible.
 Two generally used methods of estimation: (1) ordinary least squares (OLS) and (2) maximum likelihood (ML).
 The ûi (the residuals) are simply the differences between the actual and the estimated Y values.
 One idea is to choose the SRF in such a way that the sum of the residuals, Σûi = Σ(Yi − Ŷi ), is as small as possible. Here, however, all the residuals receive equal importance no matter how close or how widely scattered the individual observations are from the SRF (in the previous slide, see the residuals û1, û4 and û2, û3).
 We can avoid this problem if we adopt the least-squares criterion, which states that the SRF can be fixed in such a way that

Σûi² = Σ(Yi − Ŷi )² = Σ(Yi − β̂1 − β̂2 Xi )²

is as small as possible. By squaring, this method gives greater weight to larger residuals.


 The process of differentiation yields the following normal equations for estimating β1 and β2:

ΣYi = n β̂1 + β̂2 ΣXi
ΣYi Xi = β̂1 ΣXi + β̂2 ΣXi²

 Solving the normal equations simultaneously, we obtain:

β̂2 = Σ(Xi − X̄)(Yi − Ȳ) / Σ(Xi − X̄)²
β̂1 = Ȳ − β̂2 X̄
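These estimators can be sketched directly in Python; the toy income/expenditure numbers below are hypothetical:

```python
import numpy as np

def ols(x, y):
    """OLS estimates from the normal equations, in deviation form:
    beta2_hat = sum of (Xi - Xbar)(Yi - Ybar) over sum of (Xi - Xbar)^2
    beta1_hat = Ybar - beta2_hat * Xbar
    """
    xd = x - x.mean()
    yd = y - y.mean()
    b2 = (xd * yd).sum() / (xd ** 2).sum()
    b1 = y.mean() - b2 * x.mean()
    return b1, b2

# Toy sample generated from y = 17 + 0.6 x exactly, so OLS recovers
# beta1_hat = 17 and beta2_hat = 0.6.
x = np.array([80, 100, 120, 140, 160], dtype=float)
y = np.array([65, 77, 89, 101, 113], dtype=float)
b1, b2 = ols(x, y)
print(b1, b2)
```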

Properties of the regression line:
 It passes through the sample means of Y and X.
 The mean value of the estimated Y (Ŷi ) is equal to the mean value of the actual Y.
 The mean value of the residuals ûi is zero.
 The residuals ûi are uncorrelated with the predicted Ŷi .
 The residuals ûi are uncorrelated with Xi .
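These properties can be checked numerically on simulated data (all numbers below are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 200)
y = 2.0 + 3.0 * x + rng.normal(0, 1, 200)

# OLS fit in deviation form
xd = x - x.mean()
b2 = (xd * (y - y.mean())).sum() / (xd ** 2).sum()
b1 = y.mean() - b2 * x.mean()
yhat = b1 + b2 * x
resid = y - yhat

print(np.isclose(yhat.mean(), y.mean()))      # mean of Yhat equals mean of Y
print(np.isclose(resid.mean(), 0.0))          # residuals average to zero
print(np.isclose((resid * x).sum(), 0.0))     # residuals uncorrelated with X
print(np.isclose((resid * yhat).sum(), 0.0))  # residuals uncorrelated with Yhat
```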


Assumptions
 The regression model is linear in the parameters, though it may or may not be linear in the variables.
 All explanatory variables are uncorrelated with the error term.
 The mean value of ui (the error term) conditional upon the given Xi is zero.
 The variance of the error, or disturbance, term is the same regardless of the value of X (homoscedasticity).
 Observations of the error term are uncorrelated with each other (no autocorrelation).
 The number of observations n must be greater than the number of parameters to be estimated.

Assumptions
 The X values in a given sample must not all be the same. Technically, var(X) must be a positive number. Furthermore, there can be no outliers in the values of the X variable.
The Coefficient of Determination r²: A Measure of "Goodness of Fit"

• r² measures the proportion or percentage of the total variation in Y explained by the regression model.
• Two properties of r² may be noted:
  It is a nonnegative quantity.
  Its limits are 0 ≤ r² ≤ 1. r² = 0 implies there is no relationship between X and Y; in this situation the regression line is horizontal, i.e. parallel to the X axis.
• It is the ratio of the variation explained by the model to the total variation present in Y.
• It indicates the extent to which the variation in Y is explained by the variation in X.
 TSS: the total variation of the actual Y values about their sample mean, called the total sum of squares.
 ESS: the explained sum of squares (or regression sum of squares), i.e. the variation of the estimated Y values about their mean.
 RSS: the residual sum of squares, i.e. the unexplained variation of the Y values about the regression line.

 TSS = ESS + RSS, so r² = ESS/TSS = 1 − RSS/TSS.
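A small numerical sketch of the decomposition, using hypothetical data:

```python
import numpy as np

x = np.array([80, 100, 120, 140, 160, 180], dtype=float)
y = np.array([70, 65, 90, 95, 110, 115], dtype=float)

# OLS fit
xd = x - x.mean()
b2 = (xd * (y - y.mean())).sum() / (xd ** 2).sum()
b1 = y.mean() - b2 * x.mean()
yhat = b1 + b2 * x

tss = ((y - y.mean()) ** 2).sum()     # total sum of squares
ess = ((yhat - y.mean()) ** 2).sum()  # explained sum of squares
rss = ((y - yhat) ** 2).sum()         # residual sum of squares

r2 = ess / tss
print(np.isclose(tss, ess + rss))     # the decomposition TSS = ESS + RSS
print(r2)                             # equals 1 - RSS/TSS, between 0 and 1
```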