Regression Analysis
Based on Chapters 2 and 3 of the
book “Econometrics” by
Damodar Gujarati
Definition:
Regression analysis is concerned with the study of
the dependence of one variable (the dependent
variable) on one or more other variables (the
explanatory variables), with a view to estimating
and/or predicting the (population) mean or average
value of the former in terms of the known or fixed
(in repeated sampling) values of the latter.
Conditional Mean and Regression
• E(Y | Xi): the mean weekly expenditure (Y) of all families
with a particular income level (Xi).
• PRF (population regression function): E(Y | Xi) = β1 + β2 Xi
• So what we are actually predicting is the mean expenditure of all families with a
particular income.
• The deviation of an individual Yi from its expected value is the disturbance ui:
ui = Yi - E(Y | Xi)
Yi = E(Y | Xi) + ui = β1 + β2 Xi + ui
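The conditional mean and the disturbance can be illustrated with a minimal Python sketch. The income and expenditure figures are made up for illustration, and the small list is treated as if it were the whole population; the function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical (income X, weekly expenditure Y) pairs, treated here as the
# entire population. The numbers are illustrative, not taken from the book.
families = [
    (80, 55), (80, 60), (80, 65),
    (100, 65), (100, 70), (100, 75),
    (120, 79), (120, 84), (120, 89),
]


def conditional_means(pairs):
    """Return {x: mean of all Y observed at that x}, i.e. E(Y | X = x)."""
    groups = defaultdict(list)
    for x, y in pairs:
        groups[x].append(y)
    return {x: sum(ys) / len(ys) for x, ys in groups.items()}


def disturbances(pairs):
    """Return u_i = Y_i - E(Y | X_i) for every observation."""
    means = conditional_means(pairs)
    return [y - means[x] for x, y in pairs]


cond_mean = conditional_means(families)
u = disturbances(families)

for x in sorted(cond_mean):
    print(f"E(Y | X = {x}) = {cond_mean[x]}")
print("disturbances:", u)
```

Within each income level the disturbances average to zero, which is what it means for E(Y | Xi) to be the mean of Y at that Xi.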
The error term ui reflects the fact that variables other than income also affect consumption
expenditure, so an individual family’s consumption expenditure cannot be fully explained by the
variable(s) included in the regression model alone.
The disturbance term ui is therefore a surrogate for all the variables that are omitted from the
model but that collectively affect Y.
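One may then ask: why not develop a multiple regression model with as many variables as
possible? There are several reasons, including incomplete theory, unavailable data, poor proxy
variables, intrinsic randomness in human behavior, and the principle of parsimony.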
The Sample Regression Function (SRF)
In most practical situations what we have is but a sample of Y values corresponding to some fixed X’s.
Therefore, our task now is to estimate the PRF on the basis of the sample information.
The SRF is written as:
Ŷi = β̂1 + β̂2 Xi
where:
Ŷi = estimator of E(Y | Xi)
β̂1 = estimator of β1
β̂2 = estimator of β2
ûi = Yi - Ŷi = sample residual term
To sum up, our primary objective in
regression analysis is to estimate the PRF:
Yi = β1 + β2 Xi + ui
on the basis of the SRF:
Yi = β̂1 + β̂2 Xi + ûi
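As a rough illustration of how β̂1 and β̂2 might be obtained from a sample, the sketch below uses ordinary least squares. That choice is an assumption here, since this section does not name an estimator. The data and the function name `fit_srf` are hypothetical.

```python
def fit_srf(xs, ys):
    """Estimate the SRF  Y_hat = b1 + b2 * X  by ordinary least squares.

    b2 = sum((X - X_bar) * (Y - Y_bar)) / sum((X - X_bar)^2)
    b1 = Y_bar - b2 * X_bar
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b2 = sxy / sxx
    b1 = y_bar - b2 * x_bar
    return b1, b2


# Hypothetical sample: weekly income (X) and weekly expenditure (Y).
income = [80, 100, 120, 140, 160]
expenditure = [70, 65, 90, 95, 110]

b1, b2 = fit_srf(income, expenditure)
fitted = [b1 + b2 * x for x in income]                     # Y_hat_i
residuals = [y - f for y, f in zip(expenditure, fitted)]   # u_hat_i

print(f"b1 = {b1:.2f}, b2 = {b2:.2f}")
print("residuals:", [round(r, 2) for r in residuals])
```

Each observed Yi equals its fitted value plus its residual, which is the sample counterpart of Yi = β1 + β2 Xi + ui.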
Assumptions
The X values in a given sample must not all be the same.
Technically, var (X) must be a positive number.
Furthermore, there can be no outliers in the values of the
X variable, that is, values that are very large in relation to
the rest of the observations.
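One way to see why var (X) must be positive, assuming the usual least-squares slope formula (which this section does not state): the slope estimate divides by the sum of squared deviations of X about its mean, and that sum is zero when every X is the same. The helper names below are hypothetical.

```python
def sum_sq_dev(xs):
    """Sum of squared deviations of X about its sample mean."""
    x_bar = sum(xs) / len(xs)
    return sum((x - x_bar) ** 2 for x in xs)


def has_variation(xs):
    """True when the X values are not all the same, so var(X) > 0."""
    return len(xs) >= 2 and min(xs) < max(xs)


varied = [80, 100, 120, 140]      # hypothetical incomes that differ
constant = [100, 100, 100, 100]   # every family has the same income

for name, xs in (("varied", varied), ("constant", constant)):
    print(name, "has variation:", has_variation(xs),
          "| sum of squared deviations:", sum_sq_dev(xs))
```

With the constant sample the denominator of the slope formula is zero, so no slope can be estimated from it.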
The Coefficient of Determination r^2: A Measure of “Goodness of Fit”
• r^2 measures the proportion or percentage of the total variation in Y explained by the
regression model.
• Two properties of r^2 may be noted:
It is a nonnegative quantity.
Its limits are 0 ≤ r^2 ≤ 1. r^2 = 1 means a perfect fit; r^2 = 0 implies there is no linear
relationship between X and Y. In the latter case the fitted regression line is horizontal, that is,
parallel to the X axis.
• It is the ratio of the variation explained by the model to the total variation in Y: r^2 = ESS / TSS.
• It indicates the extent to which the variation in Y is explained by the variation in X.
TSS: total sum of squares, i.e., the total variation of the actual Y values about their sample mean.
ESS: explained sum of squares (or regression sum of squares), i.e., the variation of the estimated Y
values about their mean.
RSS: residual sum of squares, i.e., the unexplained variation of the Y values about the regression line.
The three are linked by TSS = ESS + RSS.
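The three sums of squares and r^2 can be computed directly from a fitted line. The sketch below is a minimal illustration that assumes a least-squares fit with an intercept, in which case the mean of the fitted values equals the mean of Y. The data and the function name `sums_of_squares` are hypothetical.

```python
def sums_of_squares(xs, ys):
    """Fit Y_hat = b1 + b2 * X by least squares and return (TSS, ESS, RSS, r2)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    b2 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
          / sum((x - x_bar) ** 2 for x in xs))
    b1 = y_bar - b2 * x_bar
    fitted = [b1 + b2 * x for x in xs]

    tss = sum((y - y_bar) ** 2 for y in ys)               # total variation in Y
    ess = sum((f - y_bar) ** 2 for f in fitted)           # explained variation
    rss = sum((y - f) ** 2 for y, f in zip(ys, fitted))   # unexplained variation
    return tss, ess, rss, ess / tss


# Hypothetical sample: weekly income (X) and weekly expenditure (Y).
income = [80, 100, 120, 140, 160]
expenditure = [70, 65, 90, 95, 110]

tss, ess, rss, r2 = sums_of_squares(income, expenditure)
print(f"TSS = {tss:.1f}, ESS = {ess:.1f}, RSS = {rss:.1f}, r^2 = {r2:.4f}")
```

Because TSS = ESS + RSS, the same value is obtained from 1 - RSS / TSS.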