Eco 3


Classical Linear Regression Model (CLRM)
• Regression analysis is concerned with
the study of the dependence of one
variable, the dependent variable, on one
or more other variables, the explanatory
variables, with a view to estimating
and/or predicting the (population) mean
or average value of the dependent
variable in terms of the known or fixed
(in repeated sampling) values of the
latter.
Regression vs. Causation
• Regression analysis deals with the
dependence of one variable on other
variables but it does not necessarily
imply causation.
– Rainfall vs. crop yield
• Statistical relationship in itself cannot
logically imply causation. To ascribe
causality, one must appeal to a priori or
theoretical considerations.
Regression vs. correlation
• In correlation analysis, the primary
objective is to measure the strength or
degree of linear association between two
variables.
• The coefficient measures this strength of
(linear) association.
– Smoking vs. lung cancer
– Statistics score vs. mathematics score
– Advertisement expenditure vs. sales volume

• In correlation analysis, there is no distinction between the dependent and explanatory variables.
• Regression analysis tries to estimate or predict the average value of one variable (the dependent variable) on the basis of the fixed values of other variables.
Simple vs. Multiple LRM
• Simple regression model
Y= β0 + β1X + u

Multiple regression model


Y= β0 + β1X1 + β2X2 + u
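As a hedged illustration (the data below are synthetic and the coefficient values are arbitrary assumptions, not from the slides), both models can be estimated by least squares; a minimal numpy sketch:

```python
import numpy as np

# Synthetic data, generated purely for illustration
rng = np.random.default_rng(0)
n = 50
x1 = rng.uniform(0, 10, n)
x2 = rng.uniform(0, 5, n)
u = rng.normal(0, 1, n)
y = 2.0 + 0.5 * x1 + 1.5 * x2 + u   # arbitrarily chosen "true" parameters

# Simple regression: Y = b0 + b1*X1 + u
b_simple, *_ = np.linalg.lstsq(np.column_stack([np.ones(n), x1]), y, rcond=None)

# Multiple regression: Y = b0 + b1*X1 + b2*X2 + u
b_multi, *_ = np.linalg.lstsq(np.column_stack([np.ones(n), x1, x2]), y, rcond=None)

print("simple  :", b_simple)   # [b0_hat, b1_hat]
print("multiple:", b_multi)    # [b0_hat, b1_hat, b2_hat]
```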
The Meaning of Linearity
• Linear both in the parameters and the variables:
Y = β0 + β1X
• Linear in the parameters but non-linear in the variables:
Y = β0 + β1X²
• Non-linear in the parameters but linear in the variables:
Y = β0 + β1²X
• Non-linear both in the parameters and the variables:
Y = β0 + β1²X²
Linear regression means a regression that is linear in the parameters.
Model linear in parameters?   Model linear in variables?
                              Yes        No
Yes                           LRM        LRM
No                            NLRM       NLRM
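A model that is non-linear in the variables but linear in the parameters, such as Y = β0 + β1X², can still be estimated by OLS after transforming the regressor. A minimal sketch with synthetic data (the numbers are assumptions for illustration only):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 100)
y = 3.0 + 0.8 * x**2 + rng.normal(0, 2, 100)   # illustrative "true" model, non-linear in x

# Regress Y on X^2: the model is still linear in beta0 and beta1, so OLS applies
Z = np.column_stack([np.ones_like(x), x**2])
beta_hat, *_ = np.linalg.lstsq(Z, y, rcond=None)
print(beta_hat)   # estimates of beta0 and beta1
```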
The Simple Regression Model

y = β0 + β1x + u    (3.1)
• For two variables y and x that represent some population, the model explains how y varies with changes in x.
• y is the DEPENDENT or EXPLAINED
variable
• x is the INDEPENDENT or EXPLANATORY
variable
• y is a function of x.
• u is the ERROR TERM or
DISTURBANCE variable
u accounts for all “unobserved”
impacts on y
u takes into account all factors
other than x that affect y
• Why not introduce these variables into the model explicitly?
– Vagueness of theory
– Unavailability of data
– Core variables versus peripheral variables
– Intrinsic randomness in human behavior
– Poor proxy variables
– Principle of parsimony
– Wrong functional form
• In regression analysis our interest is in
estimating the values of the unknowns
β0 and β1 on the basis of observations
on y and x.
• β1 is the SLOPE PARAMETER. It
captures ceteris paribus effect of x on
y:
Δy = β1Δx  if  Δu = 0
• For example, if β1 =3, a 2 unit increase in x
would cause a 6 unit change in y (2 x 3 = 6)
• if x and y are positively (negatively)
correlated, β1 will be positive (negative)
• β0 is the INTERCEPT PARAMETER or CONSTANT TERM
– not always useful in analysis
• β0 + β1 x is the systematic (explained or
deterministic) part of y
• u is the unsystematic (unexplained) part of
y
• Note that this equation implies CONSTANT
returns
-the first x has the same impact on y as
the 100th x
-to avoid this we can include powers or
change functional forms
SRM is also called the two-variable linear regression
model or bivariate linear regression model
PRF vs. SRF
• The population regression function (PRF) is
an idealized concept. In practice one rarely
has access to the entire population of
interest.
• The primary objective in regression analysis
is to estimate the PRF on the basis of the
sample regression function (SRF) as
accurately as possible.
• How should the SRF be constructed so that the estimated β̂0 is as close as possible to the true β0 and the estimated β̂1 is as close as possible to the true β1, even though we will never know the true β0 and β1?
The Method of Ordinary Least Squares (OLS)
• It is one of the most powerful and
popular methods of regression analysis

• PRF: Yi = β0 + β1Xi + ui
• SRF: Ŷi = β̂0 + β̂1Xi, so that Yi = β̂0 + β̂1Xi + ûi
• The objective is to determine the SRF in
such a manner that it is as close as
possible to the actual Y.
• This is done by making the sum of the squared residuals of the SRF as small as possible; hence the name least squares.
• The sum of the squared residuals is
also some function of the estimators of
SRF.
The principle or method of least squares chooses β̂0 and β̂1 in such a manner that, for a given sample or set of data, the sum of squared residuals is as small as possible. That is, Σûi² = Σ(Yi − β̂0 − β̂1Xi)² is minimized.

The difference between the actual Y values and the estimated Ŷ values is the ESTIMATED error, or the residuals ûi.
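A minimal Python sketch of this minimization principle, using synthetic data chosen purely for illustration (the numbers below are assumptions, not from the slides): the closed-form OLS solution yields a smaller sum of squared residuals than nearby candidate lines.

```python
import numpy as np

def ssr(b0, b1, X, Y):
    """Sum of squared residuals for a candidate intercept b0 and slope b1."""
    return np.sum((Y - (b0 + b1 * X)) ** 2)

# Illustrative (synthetic) data
rng = np.random.default_rng(5)
X = rng.uniform(0, 10, 30)
Y = 1.0 + 2.0 * X + rng.normal(0, 1, 30)

# Closed-form OLS estimators for the two-variable model
b1_hat = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
b0_hat = Y.mean() - b1_hat * X.mean()

print(ssr(b0_hat, b1_hat, X, Y))        # the minimum
print(ssr(b0_hat + 0.5, b1_hat, X, Y))  # larger
print(ssr(b0_hat, b1_hat + 0.1, X, Y))  # larger
```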
Illustration Part I
Y X
70 80
65 100
90 120
95 140
110 160
115 180
120 200
140 220
155 240
150 260
• Based on the data given, let us estimate the parameters β0 and β1 of the model Yi = β0 + β1Xi + ui.
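A minimal Python sketch of this estimation, applying the usual closed-form OLS formulas, β̂1 = Σ(Xi − X̄)(Yi − Ȳ)/Σ(Xi − X̄)² and β̂0 = Ȳ − β̂1X̄, to the data above:

```python
import numpy as np

# Data from Illustration Part I
Y = np.array([70, 65, 90, 95, 110, 115, 120, 140, 155, 150], dtype=float)
X = np.array([80, 100, 120, 140, 160, 180, 200, 220, 240, 260], dtype=float)

x_bar, y_bar = X.mean(), Y.mean()

# Closed-form OLS estimators for the two-variable model
b1_hat = np.sum((X - x_bar) * (Y - y_bar)) / np.sum((X - x_bar) ** 2)
b0_hat = y_bar - b1_hat * x_bar

print("beta1_hat =", b1_hat)
print("beta0_hat =", b0_hat)
```

With these data the estimates work out to roughly β̂1 ≈ 0.509 and β̂0 ≈ 24.45, so the fitted line is Ŷ ≈ 24.45 + 0.509X.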
Numerical properties of the estimators
1. The OLS estimators are expressed
solely in terms of the sample data
2. They are point estimators; that is,
given the sample, each estimator will
provide only a single (point) value of
the relevant population parameter.
3. Properties of the regression line (each can be verified numerically, as in the sketch after this list) include:
a) The mean value of the estimated Y is
equal to the mean value of the actual Y
b) The mean value of the residuals is zero.
c) The residuals ûi are uncorrelated with the predicted Ŷi: cov(ûi, Ŷi) = 0.
d) The residuals ûi are uncorrelated with Xi: cov(ûi, Xi) = 0.
e) The line passes through the sample
means of Y and X
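These properties can be checked numerically for the data of Illustration Part I; the sketch below reuses the closed-form estimates from the earlier Python snippet (an illustration, not part of the original slides):

```python
import numpy as np

Y = np.array([70, 65, 90, 95, 110, 115, 120, 140, 155, 150], dtype=float)
X = np.array([80, 100, 120, 140, 160, 180, 200, 220, 240, 260], dtype=float)

b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
b0 = Y.mean() - b1 * X.mean()

Y_hat = b0 + b1 * X   # fitted values
u_hat = Y - Y_hat     # residuals

print(np.isclose(Y_hat.mean(), Y.mean()))        # (a) mean of fitted Y equals mean of actual Y
print(np.isclose(u_hat.mean(), 0.0))             # (b) residuals average to zero
print(np.isclose(np.sum(u_hat * Y_hat), 0.0))    # (c) residuals uncorrelated with fitted Y
print(np.isclose(np.sum(u_hat * X), 0.0))        # (d) residuals uncorrelated with X
print(np.isclose(b0 + b1 * X.mean(), Y.mean()))  # (e) the line passes through (X_bar, Y_bar)
```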
• The OLS line ensures the residuals that are
equal in magnitude are given equal
weight. Consider the two residuals –4 and 4.
In both of these observations, the estimated
y-value is equal distance from the observed
y-value, 4 units. It just happens you
overestimated y in the first case and
underestimated y in the second case. By
squaring the residuals, both values are 16 in
the function. Therefore, both residuals
contribute (have equal weight) the same to
the sum of the squared residuals.
• The OLS line penalizes large errors more than
small errors. This trait arises because of the
objective of OLS to minimize the sum of squared
residuals. Consider the following three residuals, 2,
4, and 8. Each residual is twice as large as the
preceding residual. That is, 4 is twice as large as 2
and 8 is twice as large as 4. When the residual is
squared, the penalties (squared values) are 4, 16,
and 64 in the OLS objective function. This trait
places a larger weight on the function when the
estimated y-value is far away from the actual value
than when the estimated y-value is close to the
actual value.
Assumptions of CLRM
• In order to achieve a ceteris paribus analysis of x's effect on y and for valid interpretation of the regression estimates, we need assumptions about the Xi variable(s) and the error term.

• All the assumptions listed below pertain to the PRF only and not to the SRF.
Assumption 1: Linear regression model.
• The regression model is linear in the
parameters, as shown in:
Yi = β0 + β1Xi + ui

Assumption 2: X values are fixed in repeated sampling of Y.
• Values taken by the regressor X are
considered fixed in repeated samples.
More technically, X is assumed to be
nonstochastic.
• The assumption implies that our
regression analysis is conditional
regression analysis, that is, conditional
on the given values of the regressor(s) X.
Assumption 3: Zero mean value of
disturbance ui.
• Given the value of X, the mean, or
expected, value of the random disturbance
term ui is zero. Technically, the conditional
mean value of ui is zero. Symbolically, we
have:
E(ui |Xi) = 0
• It means that the factors not explicitly included in
the model, and therefore subsumed in ui, do not
systematically affect the mean value of Y;
Assumption 4: Homoscedasticity or
equal variance of ui.
• Given the value of X, the variance of ui is the
same for all observations. That is, the
conditional variances of ui are identical.
• Symbolically, we have
var (ui |Xi) = σ2
• It neither increases nor decreases as X varies.
• All Y values corresponding to the
various X’s are equally important.
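As an illustration (not part of the original slides), the sketch below simulates one homoscedastic and one heteroscedastic disturbance term; the data-generating processes are assumptions chosen purely for demonstration.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 1000
X = rng.uniform(1, 10, n)

# Homoscedastic disturbances: var(u | X) = sigma^2 for every X (Assumption 4 holds)
u_homo = rng.normal(0, 2, n)

# Heteroscedastic disturbances: the spread grows with X (Assumption 4 violated)
u_hetero = rng.normal(0, 0.5 * X, n)

# Compare the disturbance variance for small and large X
small, large = X < 5, X >= 5
print("homoscedastic  :", u_homo[small].var(), u_homo[large].var())     # roughly equal
print("heteroscedastic:", u_hetero[small].var(), u_hetero[large].var()) # clearly different
```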
Assumption 5: No autocorrelation
between the disturbances.
• Given any two X values, Xi and Xj (i ≠ j), the correlation between any two disturbances ui and uj (i ≠ j) is zero.
• Symbolically,
cov(ui, uj | Xi, Xj) = 0
• If the disturbances are correlated, Yt depends not only on Xt but also on ut−1, since ut−1 to some extent determines ut.
Assumption 6: Zero covariance
between ui and Xi.
• Formally, cov (ui, Xi) =0
• It means the disturbance u and
explanatory variable X are uncorrelated.
Otherwise, it is difficult to isolate the
influence of X and u on Y.
• If xi is correlated with u for any reason,
then xi is said to be an endogenous
explanatory variable.
• Assumption 7: The number of observations n must be greater than the number of parameters to be estimated (n > p).
Alternatively, the number of observations n must be greater than the number of explanatory variables.
Assumption 8: Variability in X values.
• The X values in a given sample must
not all be the same.
• Technically, var (X) must be a finite
positive number.
• If X does not vary, it is impossible to estimate the slope β1 and therefore the intercept β0.
Assumption 9: The regression model
is correctly specified.
• There is no specification bias or error
[choosing the wrong functional form] in
the model used in empirical analysis.
Model specification question
(1) What variables should be included in
the model?
(2) What is the functional form of the
model? Is it linear in the parameters, the
variables, or both?
(3) What are the probabilistic assumptions
made about the Yi , the Xi, and the ui
entering the model?
• Suppose we choose the following two models to
depict the underlying relationship between the
rate of change of money wages and the
unemployment rate:
Model 1: Yi = β0 + β1Xi + ui
Model 2: Yi = β0 + β1(1/Xi) + ui
where Yi = the rate of change of money wages, and Xi = the unemployment rate.
If the second model is the “correct” or the “true”
model, fitting the first model to the scatter points
shown in the following Figure will give us wrong
predictions.
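A hedged sketch of this point (synthetic data and arbitrary coefficients, purely for illustration): data are generated from the reciprocal model and then fitted with both specifications; the straight-line fit leaves a much larger sum of squared residuals.

```python
import numpy as np

rng = np.random.default_rng(7)
X = rng.uniform(1, 10, 200)                     # illustrative unemployment rates
Y = -1.0 + 8.0 / X + rng.normal(0, 0.3, 200)    # "true" model: Y = b0 + b1*(1/X) + u

def fit_and_ssr(Z, y):
    """OLS fit for a regressor matrix Z (intercept column included) and the resulting SSR."""
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return beta, np.sum((y - Z @ beta) ** 2)

ones = np.ones_like(X)
beta_lin, ssr_lin = fit_and_ssr(np.column_stack([ones, X]), Y)        # mis-specified: Y = b0 + b1*X
beta_rec, ssr_rec = fit_and_ssr(np.column_stack([ones, 1.0 / X]), Y)  # correct form: Y = b0 + b1*(1/X)

print("straight-line fit SSR:", ssr_lin)
print("reciprocal fit    SSR:", ssr_rec)   # substantially smaller
```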
• Unfortunately, in practice one rarely knows the correct variables to include in the model, the correct functional form of the model, or the correct probabilistic assumptions about the variables entering the model, because the theory underlying the particular investigation may not be strong or robust enough to answer all these questions.
Assumption 10: There is no perfect
multicollinearity
• There are no perfect linear relationships
among the explanatory variables.
• This relates to multiple regression
models to be discussed in the next
chapters
Precision or Standard Errors of OLS
Estimates
• Regression estimates are functions of the sample data, and since the data are likely to change from sample to sample, the estimates themselves will change.
• Therefore, what is needed is some measure of the “reliability” or precision of the estimators β̂0 and β̂1.
• In statistics the precision of an estimate
is measured by its standard error (se).
• The standard error is nothing but the
standard deviation of the sampling
distribution of the estimator obtained
from all possible samples of the same
size from the population.
• For the two-variable model they are determined as follows:
var(β̂1) = σ²/Σxi²  and  var(β̂0) = σ²ΣXi²/(nΣxi²), where xi = Xi − X̄,
and σ² is the constant or homoscedastic variance of the population ui of Assumption 4.

Note that the true error terms (i.e., the ui) are by definition not observed. We do, however, have estimates of them, the residuals ûi, and the population variance is estimated from the residuals as follows:
σ̂² = Σûi²/(n − 2)
The denominator is the degrees of freedom, which equals the total number of observations minus the number of parameters estimated, including the intercept.
• The positive square root of the
variance is the standard error of
estimate or the standard error of the
regression (se).
• It is simply the standard deviation of the
Y values about the estimated regression
line
• It is often used as a summary measure
of the “goodness of fit” of the estimated
regression line.
Illustration Part II
Determine the standard error of the estimates
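A minimal Python sketch of this computation for the data of Illustration Part I, using the textbook formulas σ̂² = Σûi²/(n − 2), var(β̂1) = σ̂²/Σxi², and var(β̂0) = σ̂²ΣXi²/(nΣxi²), where xi = Xi − X̄:

```python
import numpy as np

Y = np.array([70, 65, 90, 95, 110, 115, 120, 140, 155, 150], dtype=float)
X = np.array([80, 100, 120, 140, 160, 180, 200, 220, 240, 260], dtype=float)
n = len(Y)

x = X - X.mean()                                  # deviations of X from its mean
b1 = np.sum(x * (Y - Y.mean())) / np.sum(x ** 2)  # slope estimate
b0 = Y.mean() - b1 * X.mean()                     # intercept estimate

u_hat = Y - (b0 + b1 * X)                         # residuals
sigma2_hat = np.sum(u_hat ** 2) / (n - 2)         # estimated error variance (df = n - 2)

var_b1 = sigma2_hat / np.sum(x ** 2)
var_b0 = sigma2_hat * np.sum(X ** 2) / (n * np.sum(x ** 2))

print("se(beta1_hat) =", np.sqrt(var_b1))
print("se(beta0_hat) =", np.sqrt(var_b0))
print("standard error of the regression =", np.sqrt(sigma2_hat))
```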
The basic features of the variances (and therefore the standard errors) of β̂0 and β̂1:
• Given σ², the larger the variation in the X values, the smaller the variance of β̂1 and hence the greater the precision with which β1 can be estimated.
• As n increases, the precision with which β1 can be estimated also increases.
• The variance of β̂0 is directly proportional to σ² and ΣXi² but inversely proportional to Σxi² and the sample size n.
Properties of Least-squares Estimators:
The Gauss–Markov Theorem
• Unbiased Estimator: An estimator whose
expected value equals the true population
value.
• Minimum Variance. An estimator is said to be a minimum-variance estimator of a population parameter if its variance is smaller than, or at most equal to, the variance of any other estimator.
• Efficient estimator : An unbiased estimator
with the lowest variance is the best unbiased
or efficient estimator.
• Linearity. An estimator is said to be a linear estimator of a population parameter if it is a linear function of the sample observations.

• Best Linear Unbiased Estimator (BLUE). If an estimator is linear, unbiased, and has minimum variance in the class of all linear unbiased estimators of a population parameter, then it is called a best linear unbiased estimator, or BLUE for short. “Best” refers to the smallest variance of the estimated coefficients.
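A small Monte Carlo sketch (an illustration, not a proof) of unbiasedness: OLS is applied to many samples drawn from the same hypothetical population, with X held fixed in repeated samples, and the slope estimates average out to the assumed true value.

```python
import numpy as np

rng = np.random.default_rng(3)
true_b0, true_b1 = 24.0, 0.5       # hypothetical "true" population parameters
X = np.linspace(80, 260, 10)       # X fixed in repeated samples (Assumption 2)

estimates = []
for _ in range(5000):
    u = rng.normal(0, 5, X.size)   # fresh disturbances for each sample
    Y = true_b0 + true_b1 * X + u
    b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
    estimates.append(b1)

print("average of the slope estimates:", np.mean(estimates))   # close to 0.5
```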
The Coefficient of Determination
The Coefficient of Determination R2: A
Measure of “Goodness of Fit”
• The coefficient of determination R2 is
a summary measure that tells how
well the sample regression line fits
the data.
• How do we determine it?
• From the idea of fitted and residual components,
we can calculate the TOTAL SUM OF SQUARES
(SST), the EXPLAINED SUM OF SQUARES (SSE)
and the RESIDUAL SUM OF SQUARES (SSR)
SST = Σ(Yi − Ȳ)²
SSE = Σ(Ŷi − Ȳ)²
SSR = Σ(Yi − Ŷi)² = Σûi²
(all sums running over i = 1, …, n)
• SST measures the sample variation in Y.
• SSE measures the sample variation in Ŷ (the fitted component).
• SSR measures the sample variation in the residual component (û).

These relate to each other as follows:
SST = SSE + SSR

• R2 is the ratio of the explained variation compared to the total variation:
R2 = SSE/SST = 1 − SSR/SST
“the fraction of the sample variation in Y that is explained by X”
• R2 always lies between zero and
1
• if R2=1, all actual points lie on
the regression line (usually an
error)
• if R2≈0, the regression explains
very little; OLS is a “poor fit”
• A low R2 is not uncommon in the social
sciences, especially in cross-sectional
analysis
• Econometric regressions should not be judged harshly solely on the basis of a low R2.
• For example, if R2 = 0.12, then 12% of the variation in Y is explained by the model, which is better than the 0% explained before the regression.
Illustration Part III

Determine the R2
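A minimal Python sketch for the same data as in Parts I and II, computing SST, SSE, SSR and R2 = SSE/SST = 1 − SSR/SST:

```python
import numpy as np

Y = np.array([70, 65, 90, 95, 110, 115, 120, 140, 155, 150], dtype=float)
X = np.array([80, 100, 120, 140, 160, 180, 200, 220, 240, 260], dtype=float)

b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
b0 = Y.mean() - b1 * X.mean()
Y_hat = b0 + b1 * X

SST = np.sum((Y - Y.mean()) ** 2)       # total variation in Y
SSE = np.sum((Y_hat - Y.mean()) ** 2)   # explained (fitted) variation
SSR = np.sum((Y - Y_hat) ** 2)          # residual variation

print("R2 =", SSE / SST, "=", 1 - SSR / SST)
```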
1. In regression, we treat the dependent
variable (Y) and the independent
variable (X) very differently. The Y
variable is assumed to be random or
“stochastic” in some way, i.e. to have a
probability distribution. The X variable is,
however, assumed to have fixed
(“non-stochastic”) values in repeated
samples.
