Eco 3
Classical Linear Regression Model (CLRM)
• Regression analysis is concerned with
the study of the dependence of one
variable, the dependent variable, on one
or more other variables, the explanatory
variables, with a view to estimating
and/or predicting the (population) mean
or average value of the dependent
variable in terms of the known or fixed
(in repeated sampling) values of the
latter.
Regression vs. Causation
• Regression analysis deals with the
dependence of one variable on other
variables but it does not necessarily
imply causation.
Rainfall vs. Crop yield
• Statistical relationship in itself cannot
logically imply causation. To ascribe
causality, one must appeal to a priori or
theoretical considerations.
Regression vs. correlation
• In correlation analysis, the primary
objective is to measure the strength or
degree of linear association between two
variables.
• The correlation coefficient measures this strength of
(linear) association.
Smoking vs. lung cancer,
statistics score vs. mathematics score
Advertisement expenditure vs. sales volume
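As a minimal sketch of what the correlation coefficient measures, the following computes the sample (Pearson) correlation directly from its definition. The scores are made-up illustrative numbers, not data from any study, and the function name `pearson_r` is hypothetical.

```python
import math

def pearson_r(x, y):
    """Sample correlation: co-variation of x and y scaled by their own variation."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    return s_xy / math.sqrt(s_xx * s_yy)

# Hypothetical scores for five students (illustrative only).
stats_score = [55, 60, 65, 70, 80]
maths_score = [50, 62, 63, 75, 78]

r = pearson_r(stats_score, maths_score)
# Correlation treats the two variables symmetrically.
r_swapped = pearson_r(maths_score, stats_score)
```

Swapping the two variables leaves the coefficient unchanged, which is one way to see that correlation measures association only and does not single out a dependent variable.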
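Linear regression models (LRM) vs. nonlinear regression models (NLRM)
                                 Model linear in variables?
Model linear in parameters?      Yes       No
Yes                              LRM       LRM
No                               NLRM      NLRM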
The Simple Regression Model
y = β0 + β1 x + u    (3.1)
• For two variables y and x that represent
some population, the model explains how y
varies with changes in x.
• y is the DEPENDENT or EXPLAINED
variable
• x is the INDEPENDENT or EXPLANATORY
variable
• y is a function of x.
• u is the ERROR TERM or
DISTURBANCE variable
-u accounts for all “unobserved” impacts on y
-u takes into account all factors other than x that affect y
• Why not introduce these variables into
the model explicitly?
-Vagueness of theory
-Unavailability of data
-Core variables versus peripheral variables
-Intrinsic randomness in human behavior
-Poor proxy variables
-Principle of parsimony
-Wrong functional form
• In regression analysis our interest is in
estimating the values of the unknowns
β0 and β1 on the basis of observations
on y and x.
• β1 is the SLOPE PARAMETER. It
captures ceteris paribus effect of x on
y:
Δy = β1 Δx if Δu = 0
• For example, if β1 = 3, a 2-unit increase in x
would cause a 6-unit change in y (2 × 3 = 6), holding u fixed
• if x and y are positively (negatively)
correlated, β1 will be positive (negative)
• β0 is the INTERCEPT PARAMETER or
CONSTANT TERM
-not always useful in analysis
• β0 + β1 x is the systematic (explained or
deterministic) part of y
• u is the unsystematic (unexplained) part of
y
• Note that this equation implies CONSTANT
returns
-the first x has the same impact on y as
the 100th x
-to avoid this we can include powers or
change functional forms
SRM is also called the two-variable linear regression
model or bivariate linear regression model
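The slope interpretation and the constant-returns property can be sketched numerically. The intercept value, the normal distribution for u, and the function name `y_value` are assumptions made for illustration; only β1 = 3 comes from the example in the text.

```python
import random

BETA0 = 1.0   # hypothetical intercept, chosen for illustration
BETA1 = 3.0   # slope, matching the example in the text

def y_value(x, u):
    """Equation (3.1): systematic part plus error term."""
    return BETA0 + BETA1 * x + u

rng = random.Random(0)
u = rng.gauss(0.0, 1.0)   # one draw of the unobserved error (normality is an assumption)

# Ceteris paribus: raise x by 2 units while holding u fixed.
delta_y = y_value(7.0, u) - y_value(5.0, u)

# Constant returns: the same 2-unit step has the same effect at a much higher x.
delta_y_high = y_value(102.0, u) - y_value(100.0, u)
```

Both differences equal β1 times the change in x (up to floating-point rounding), whatever value u happens to take, because u cancels when it is held fixed.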
PRF vs. SRF
• The population regression function (PRF) is
an idealized concept. In practice one rarely
has access to the entire population of
interest.
• The primary objective in regression analysis
is to estimate the PRF on the basis of the
sample regression function (SRF) as
accurately as possible.
• How should the SRF be constructed so that
the estimate β̂0 is as close as possible to the true β0 and
β̂1 is as close as possible to the true β1, even
though we will never know the true β0 and β1?
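A small simulation may help separate the two concepts. It is a sketch under stated assumptions: the PRF parameters, the fixed X values, the normal errors, and the helper `fit_line` are all hypothetical, and the SRF is computed with the standard least-squares formulas for the two-variable model.

```python
import random

TRUE_B0, TRUE_B1 = 2.0, 0.5   # hypothetical PRF parameters (unknown in practice)
X = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]   # X values held fixed across samples

def fit_line(x, y):
    """Least-squares intercept and slope for a two-variable regression."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx
    return y_bar - b1 * x_bar, b1

rng = random.Random(42)
slopes = []
for _ in range(2000):
    # Each sample shares the same X but receives fresh error draws.
    y = [TRUE_B0 + TRUE_B1 * xi + rng.gauss(0.0, 1.0) for xi in X]
    _, b1_hat = fit_line(X, y)
    slopes.append(b1_hat)

mean_slope = sum(slopes) / len(slopes)   # average of the SRF slopes
spread = max(slopes) - min(slopes)       # how much the SRF slope varies
```

Each sample produces a different SRF slope, so no single SRF coincides with the PRF, yet in this setup the slopes average out close to the true value.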
The Method of Ordinary Least Squares (OLS)
• It is one of the most powerful and
popular methods of regression analysis
• PRF: Yi = β0 + β1 Xi + ui
• SRF: Yi = β̂0 + β̂1 Xi + ûi
• The objective is to determine the SRF in
such a manner that it is as close as
possible to the actual Y.
• This is done by making the sum of
squared residuals of the SRF as small as
possible, hence the name ordinary least squares (OLS).
• The sum of the squared residuals is
also some function of the estimators of
SRF.
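To illustrate that the sum of squared residuals is a function of the chosen estimates, the sketch below evaluates it over a coarse grid of candidate intercepts and slopes and keeps the smallest. The five-point sample and the names `sum_squared_residuals`, `b0`, and `b1` are hypothetical, and a grid search is used only for illustration; it is not how OLS is computed in practice.

```python
def sum_squared_residuals(b0, b1, x, y):
    """Sum of squared residuals for the candidate line b0 + b1*x."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

# Hypothetical sample (illustrative numbers only).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

# Candidate intercepts 0.0..5.0 and slopes 0.0..2.0, in steps of 0.1.
candidates = [(i / 10, j / 10) for i in range(0, 51) for j in range(0, 21)]

# Keep the pair with the smallest sum of squared residuals.
best_b0, best_b1 = min(
    candidates, key=lambda b: sum_squared_residuals(b[0], b[1], x, y)
)
best_ssr = sum_squared_residuals(best_b0, best_b1, x, y)
```

For this sample the search settles on an intercept of 2.2 and a slope of 0.6, with a sum of squared residuals of about 2.4; every other candidate line on the grid fits worse by this criterion.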
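The principle or the method of least
squares chooses β̂0 and β̂1 in such a manner that,
for a given sample or set of data, Σûi² is as
small as possible. That is,
$\min_{\hat\beta_0,\,\hat\beta_1} \sum_{i=1}^{n} \hat{u}_i^2, \quad \text{where } \hat{u}_i = Y_i - \hat\beta_0 - \hat\beta_1 X_i$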
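For reference, here is a sketch of the standard derivation for the two-variable model. Hats denote estimators, and the sample means X̄ and Ȳ and the shorthand S for the objective are symbols introduced here for the derivation.

```latex
\begin{aligned}
S(\hat\beta_0,\hat\beta_1)
  &= \sum_{i=1}^{n}\hat u_i^{2}
   = \sum_{i=1}^{n}\bigl(Y_i-\hat\beta_0-\hat\beta_1 X_i\bigr)^{2} \\
\frac{\partial S}{\partial\hat\beta_0}
  &= -2\sum_{i=1}^{n}\bigl(Y_i-\hat\beta_0-\hat\beta_1 X_i\bigr)=0 \\
\frac{\partial S}{\partial\hat\beta_1}
  &= -2\sum_{i=1}^{n}X_i\bigl(Y_i-\hat\beta_0-\hat\beta_1 X_i\bigr)=0 \\
\hat\beta_1
  &= \frac{\sum_{i=1}^{n}(X_i-\bar X)(Y_i-\bar Y)}{\sum_{i=1}^{n}(X_i-\bar X)^{2}},
  \qquad
  \hat\beta_0=\bar Y-\hat\beta_1\bar X
\end{aligned}
```

The two first-order conditions are the normal equations; solving them jointly gives the closed-form estimators in the last line, which require some variation in X so that the denominator is not zero.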
Note that the true error terms (i.e., ui) are by definition not
observed. We do, however, have estimates of the true error
terms, the residuals (ûi), and the population error variance (σ²) is
estimated from the residuals as follows:
$\hat{\sigma}^2 = \frac{\sum_{i=1}^{n} \hat{u}_i^2}{n - 2}$
$SST = \sum_{i=1}^{n} (Y_i - \bar{Y})^2$
$SSE = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2$
$SSR = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 = \sum_{i=1}^{n} \hat{u}_i^2$
• SST measures the sample variation in Y.
• SSE measures the sample variation in Ŷ (the
fitted component).
• SSR measures the sample variation in the
residual component (û).
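The three sums of squares can be checked on a small example. This is a sketch on a hypothetical five-point sample; `fit_line` is an illustrative helper using the standard least-squares formulas, and the variance estimate divides SSR by n - 2, the usual degrees-of-freedom correction when two parameters are estimated.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for a two-variable regression."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx
    return y_bar - b1 * x_bar, b1

# Hypothetical sample (illustrative numbers only).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(y)

b0_hat, b1_hat = fit_line(x, y)
y_hat = [b0_hat + b1_hat * xi for xi in x]   # fitted values
y_bar = sum(y) / n

sst = sum((yi - y_bar) ** 2 for yi in y)                # total variation in Y
sse = sum((fi - y_bar) ** 2 for fi in y_hat)            # variation in the fitted part
ssr = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))   # variation in the residuals

sigma2_hat = ssr / (n - 2)   # estimated error variance
```

With an intercept in the model, the least-squares fit splits the total variation exactly: SST = SSE + SSR (here 6 = 3.6 + 2.4, up to floating-point rounding).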
Determine the R²
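A minimal sketch of the calculation, assuming the usual definition R² = 1 - SSR/SST, which equals SSE/SST for a least-squares fit that includes an intercept. The observations and fitted values below are hypothetical, and `r_squared` is an illustrative name.

```python
def r_squared(y, y_hat):
    """R^2 = 1 - SSR/SST: share of the sample variation in Y captured by the fit."""
    y_bar = sum(y) / len(y)
    sst = sum((yi - y_bar) ** 2 for yi in y)
    ssr = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    return 1.0 - ssr / sst

# Hypothetical observations and the fitted values from the line 2.2 + 0.6*x, x = 1..5.
y = [2.0, 4.0, 5.0, 4.0, 5.0]
y_hat = [2.8, 3.4, 4.0, 4.6, 5.2]

r2 = r_squared(y, y_hat)
```

Here R² comes out at about 0.6, meaning the fitted line accounts for roughly 60 percent of the sample variation in Y; a perfect fit would give exactly 1.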
1. In regression, we treat the dependent
variable (Y) and the independent
variable (X) very differently. The Y
variable is assumed to be random or
“stochastic” in some way, i.e. to have a
probability distribution. The X variable is,
however, assumed to have fixed
(“non-stochastic”) values in repeated
samples.