
Regression Analysis

Based on Chapters 2 and 3 of the book "Econometrics" by Damodar Gujarati
Definition:
 Regression analysis is concerned with the study of the dependence of one variable (the dependent variable) on one or more other variables (the explanatory variables), with a view to estimating and/or predicting the (population) mean or average value of the former in terms of the known or fixed (in repeated sampling) values of the latter.
Conditional Mean and Regression

 Geometrically, a regression curve is simply the locus of the conditional means of the dependent variable for the fixed values of the explanatory variable(s). More simply, it is the curve connecting the means of the subpopulations of Y corresponding to the given values of the regressor X.
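As a concrete sketch of this idea, the conditional means can be computed by grouping the Y values by the level of X; the regression curve is the locus of these means. The subpopulation numbers below are made up for illustration, not taken from the book's income/expenditure table:

```python
import numpy as np

# Hypothetical subpopulations: weekly income X -> observed expenditures Y
data = {
    80: [55, 60, 65, 70, 75],
    100: [65, 70, 74, 80, 85, 88],
    120: [79, 84, 90, 94, 98],
    140: [80, 93, 95, 103, 108, 113, 115],
}

# The regression curve connects the conditional means E(Y | X).
# For these numbers the means come out to 65, 77, 89, 101.
for income, expenditures in data.items():
    print(income, np.mean(expenditures))
```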
Population Regression Function (PRF)

• E(Y | Xi ): this is the mean weekly expenditure (Y) for all families with a particular income level (Xi ).
• PRF: Yi = E(Y | Xi )
• So what we are actually predicting is the mean for all families with a particular income.
• The deviation of an individual Yi around its expected value is: ui = Yi − E(Y | Xi )
• So we can write the PRF as: Yi = E(Y | Xi ) + ui
  Here Yi is the expenditure of the i-th family, E(Y | Xi ) is the mean of Y for all families with that particular income, and ui measures how much the individual family differs from the group mean of all families who have as much income as it has.
Population Regression Function (PRF)
 If E(Y | Xi ) is assumed to be linear in Xi , we get the PRF as:

Yi = β1 + β2 Xi + ui    (1)

 The error term can be interpreted as follows: it clearly shows that there are other variables besides income that affect consumption expenditure, and that an individual family's consumption expenditure cannot be fully explained by the variable(s) included in the regression model alone.

 So the disturbance term ui is a surrogate for all those variables that are omitted from the model but that collectively affect Y.

 Then we may wonder: why not develop a multiple regression model with as many variables as possible? There are several reasons for not doing so.
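A minimal simulation can illustrate the decomposition Yi = E(Y | Xi ) + ui. All numbers below (β1 = 17, β2 = 0.6, the income range, and the error spread) are illustrative assumptions, not values from the text:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed population parameters (hypothetical, for illustration only)
beta1, beta2 = 17.0, 0.6

x = rng.uniform(80, 260, size=1000)   # incomes
u = rng.normal(0, 10, size=1000)      # disturbance: all omitted influences on Y
y = beta1 + beta2 * x + u             # PRF plus stochastic error

# E(Y | X) is the systematic part; u_i is the individual deviation from it.
conditional_mean = beta1 + beta2 * x
residual = y - conditional_mean       # recovers u_i exactly here
print(residual.mean())                # close to zero by construction
```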
The Sample Regression Function (SRF)
In most practical situations what we have is but a sample of Y values corresponding to some fixed X's. Therefore, our task now is to estimate the PRF on the basis of the sample information.

The sample counterpart of Eq. (1) may be written as:

Yi = β̂1 + β̂2 Xi + ûi

where:
Ŷi = β̂1 + β̂2 Xi
Ŷi = estimator of E(Y | Xi )
β̂1 = estimator of β1
β̂2 = estimator of β2
ûi = sample residual term
To sum up, then, our primary objective in regression analysis is to estimate the PRF:

Yi = β1 + β2 Xi + ui

on the basis of the SRF:

Yi = β̂1 + β̂2 Xi + ûi

Granted that the SRF is but an approximation of the PRF, how should the SRF be constructed so that β̂1 is as "close" as possible to the true β1 and β̂2 is as "close" as possible to the true β2, even though we will never know the true β1 and β2?
 The task is to estimate the population regression function (PRF) on the basis of the sample regression function (SRF) as accurately as possible.
 Two generally used methods of estimation: (1) ordinary least squares (OLS) and (2) maximum likelihood (ML).
 The ûi (the residuals) are simply the differences between the actual and the estimated Y values.
 One idea is to choose the SRF in such a way that the sum of the residuals, Σûi = Σ(Yi − Ŷi ), is as small as possible. Here, however, all the residuals receive equal importance no matter how close or how widely scattered the individual observations are from the SRF (in the previous slide, see the residuals û1, û4 and û2, û3).
 We can avoid this problem if we adopt the least-squares criterion, which states that the SRF can be fixed in such a way that

Σûi² = Σ(Yi − Ŷi )² = Σ(Yi − β̂1 − β̂2 Xi )²

is as small as possible. By squaring, this method gives greater weight to larger residuals.


 The process of differentiation yields the following normal equations for estimating β1 and β2:

ΣYi = n β̂1 + β̂2 ΣXi
ΣYi Xi = β̂1 ΣXi + β̂2 ΣXi²

 Solving the normal equations simultaneously, we obtain:

β̂2 = Σ(Xi − X̄)(Yi − Ȳ) / Σ(Xi − X̄)²
β̂1 = Ȳ − β̂2 X̄
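These estimators can be sketched directly in Python; the toy income/expenditure numbers below are hypothetical:

```python
import numpy as np

def ols(x, y):
    """OLS estimates from the normal equations, in deviation form:
    beta2_hat = sum of (Xi - Xbar)(Yi - Ybar) over sum of (Xi - Xbar)^2
    beta1_hat = Ybar - beta2_hat * Xbar
    """
    xd = x - x.mean()
    yd = y - y.mean()
    b2 = (xd * yd).sum() / (xd ** 2).sum()
    b1 = y.mean() - b2 * x.mean()
    return b1, b2

# Toy sample generated from y = 17 + 0.6 x exactly, so OLS recovers
# beta1_hat = 17 and beta2_hat = 0.6.
x = np.array([80, 100, 120, 140, 160], dtype=float)
y = np.array([65, 77, 89, 101, 113], dtype=float)
b1, b2 = ols(x, y)
print(b1, b2)
```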

Properties of the regression line:
 It passes through the sample means of Y and X.
 The mean value of the estimated Y (Ŷi ) is equal to the mean value of the actual Y.
 The mean value of the residuals ûi is zero.
 The residuals ûi are uncorrelated with the predicted Ŷi .
 The residuals ûi are uncorrelated with Xi .
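These properties can be checked numerically on simulated data (all numbers below are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 200)
y = 2.0 + 3.0 * x + rng.normal(0, 1, 200)

# OLS fit in deviation form
xd = x - x.mean()
b2 = (xd * (y - y.mean())).sum() / (xd ** 2).sum()
b1 = y.mean() - b2 * x.mean()
yhat = b1 + b2 * x
resid = y - yhat

print(np.isclose(yhat.mean(), y.mean()))      # mean of Yhat equals mean of Y
print(np.isclose(resid.mean(), 0.0))          # residuals average to zero
print(np.isclose((resid * x).sum(), 0.0))     # residuals uncorrelated with X
print(np.isclose((resid * yhat).sum(), 0.0))  # residuals uncorrelated with Yhat
```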


Assumptions
 The regression model is linear in the parameters, though it may or may not be linear in the variables.
 All explanatory variables are uncorrelated with the error term.
 The mean value of ui (the error term) conditional upon the given Xi is zero.
 The variance of the error, or disturbance, term is the same regardless of the value of X (homoscedasticity).
 Observations of the error term are uncorrelated with each other (no autocorrelation).
 The number of observations n must be greater than the number of parameters to be estimated.

Assumptions
 The X values in a given sample must not all be the same. Technically, var(X) must be a positive number. Furthermore, there can be no outliers in the values of the X variable.
The Coefficient of Determination r²: A Measure of "Goodness of Fit"

• r² measures the proportion or percentage of the total variation in Y explained by the regression model.
• Two properties of r² may be noted:
  It is a nonnegative quantity.
  Its limits are 0 ≤ r² ≤ 1. r² = 0 implies there is no relationship between X and Y; in this situation the regression line is horizontal, i.e. parallel to the X axis.
• It is the ratio of the variation explained by the model to the total variation present in Y.
• It indicates the extent to which the variation in Y is explained by the variation in X.
 TSS: the total variation of the actual Y values about their sample mean, called the total sum of squares.
 ESS: the explained sum of squares (or regression sum of squares), i.e. the variation of the estimated Y values about their mean.
 RSS: the residual sum of squares, i.e. the unexplained variation of the Y values about the regression line.

 TSS = ESS + RSS, so r² = ESS/TSS = 1 − RSS/TSS.
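A small numerical sketch of the decomposition, using hypothetical data:

```python
import numpy as np

x = np.array([80, 100, 120, 140, 160, 180], dtype=float)
y = np.array([70, 65, 90, 95, 110, 115], dtype=float)

# OLS fit
xd = x - x.mean()
b2 = (xd * (y - y.mean())).sum() / (xd ** 2).sum()
b1 = y.mean() - b2 * x.mean()
yhat = b1 + b2 * x

tss = ((y - y.mean()) ** 2).sum()     # total sum of squares
ess = ((yhat - y.mean()) ** 2).sum()  # explained sum of squares
rss = ((y - yhat) ** 2).sum()         # residual sum of squares

r2 = ess / tss
print(np.isclose(tss, ess + rss))     # the decomposition TSS = ESS + RSS
print(r2)                             # equals 1 - RSS/TSS, between 0 and 1
```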