05 16 Simple Regression 2
Ekki Syamsulhakim
Undergraduate Program
Department of Economics
Universitas Padjadjaran
Last Week
• Chapter 2 – Wooldridge
– Simple regression model
• Definition
• Zero Conditional Mean Assumption
• Derivation of OLS estimates
Today
• Chapter 2 – Wooldridge
– Simple regression model (continued)
• Goodness of fit
• Interpretation of simple regression parameters
• Chapter 3 – Wooldridge
– Multiple regression model
– Omitted Variable Bias and Multiple Regression Model
– Gauss-Markov Assumptions
OLS estimates of β̂₀ and β̂₁
β̂₁ = Σᵢ (xᵢ − x̄)(yᵢ − ȳ) / Σᵢ (xᵢ − x̄)²,   β̂₀ = ȳ − β̂₁x̄   [2.23]
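These formulas can be checked directly in Stata. A minimal sketch using the bundled auto data (the choice of price and mpg is an illustrative assumption, not from the slides):

* A minimal sketch: computing the OLS estimates "by hand" (auto data)
sysuse auto, clear
quietly correlate price mpg, covariance
scalar b1 = r(cov_12) / r(Var_2)   // sample cov(y,x) / sample var(x)
quietly summarize price
scalar ybar = r(mean)
quietly summarize mpg
scalar xbar = r(mean)
scalar b0 = ybar - b1*xbar         // intercept = ybar - b1*xbar
display "b1 = " b1 "    b0 = " b0
reg price mpg                      // regress should reproduce the same numbers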
Fitted Values and Residuals
• We assume that the intercept and slope estimates, β̂₀ and β̂₁, have been obtained for the given sample of data.
• Given β̂₀ and β̂₁, we can obtain the fitted value ŷᵢ = β̂₀ + β̂₁xᵢ for each observation i.
Fitted Values and Residuals
• By definition, each fitted value ŷᵢ is on the OLS regression line.
• The OLS residual associated with observation i, ûᵢ, is the difference between yᵢ and its fitted value: ûᵢ = yᵢ − ŷᵢ.
– If ûᵢ is positive, the line underpredicts yᵢ;
– if ûᵢ is negative, the line overpredicts yᵢ.
• Stata example (sysuse auto)
Fitted Values and Residuals
clear
sysuse auto
reg price mpg
predict pricehat               // fitted values (regress default: xb)
predict resid, residuals       // OLS residuals
browse price pricehat resid    // inspect y, yhat, and uhat side by side
scatter price mpg || line pricehat mpg, sort   // data and fitted line
Algebraic Properties of OLS Statistics
1. The sum, and therefore the sample average, of the OLS residuals is zero.
Mathematically: Σᵢ ûᵢ = 0,
because the first-order condition for the OLS estimates is
Σᵢ (yᵢ − β̂₀ − β̂₁xᵢ) = 0,
then, since ûᵢ = yᵢ − β̂₀ − β̂₁xᵢ, Σᵢ ûᵢ = 0 (why? the residuals are exactly the terms in this condition),
making ū = (1/n) Σᵢ ûᵢ = 0.
Algebraic Properties of OLS Statistics
2. The sample covariance between the regressor and the OLS residuals is zero: Σᵢ xᵢûᵢ = 0 (the second first-order condition).
• Each observation can be written as a fitted value plus a residual:
yᵢ = ŷᵢ + ûᵢ (2.32)
• From property (1), the average of the residuals is zero; equivalently, the sample average of the fitted values ŷᵢ is the same as the sample average of the yᵢ, or the mean of ŷᵢ equals ȳ.
Algebraic Properties of OLS Statistics
• Further, properties (1) and (2) can be used to show that the sample covariance between the fitted values ŷᵢ and the residuals ûᵢ is zero:
Σᵢ ŷᵢûᵢ = β̂₀ Σᵢ ûᵢ + β̂₁ Σᵢ xᵢûᵢ = 0
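All three properties are easy to verify numerically. A minimal sketch in Stata, again using the auto data (an assumed example; any OLS regression would do):

* A sketch verifying the algebraic properties of OLS (auto data)
sysuse auto, clear
quietly reg price mpg
predict yhat                  // fitted values
predict uhat, residuals       // OLS residuals
summarize uhat                // property 1: mean of residuals is ~0
correlate mpg uhat            // property 2: regressor and residuals uncorrelated
correlate yhat uhat           // property 3: fitted values and residuals uncorrelated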
Decomposition by OLS
• Define the total sum of squares (SST), the explained sum of squares (SSE), and the residual sum of squares (SSR) (also known as the sum of squared residuals), as follows:
SST ≡ Σᵢ (yᵢ − ȳ)² (2.33)
SSE ≡ Σᵢ (ŷᵢ − ȳ)² (2.34)
SSR ≡ Σᵢ ûᵢ² (2.35)
and
SST = SSE + SSR (2.36)
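The decomposition can be checked numerically. A minimal sketch in Stata with the auto data (the dataset choice is an assumption):

* A sketch checking SST = SSE + SSR (auto data)
sysuse auto, clear
quietly reg price mpg
predict yhat
predict uhat, residuals
quietly summarize price
scalar SST = r(Var)*(r(N)-1)   // total sum of squares
quietly summarize yhat
scalar SSE = r(Var)*(r(N)-1)   // explained SS (mean of yhat equals ybar)
quietly summarize uhat
scalar SSR = r(Var)*(r(N)-1)   // residual SS (mean of uhat is zero)
display "SST = " SST "    SSE + SSR = " SSE + SSR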
Fitted Values and Residuals
• We want to know whether the model can explain the variation in the dependent variable through the variation in the explanatory variable
• We want to measure what proportion of the variation in the dependent variable is accounted for by the explanatory variable, by comparing SSE (explained) and SSR (unexplained) with SST (total)
Important Notes
• Some words of caution about SST, SSE, and
SSR are in order. There is no uniform
agreement on the names or abbreviations for
the three quantities defined in equations (2.33)-(2.35).
• The total sum of squares is called either SST or
TSS, so there is little confusion here.
Important Notes
• Unfortunately, the explained sum of squares is
sometimes called the “regression sum of
squares.”
– If this term is given its natural abbreviation, it can
easily be confused with the term “residual sum of
squares.”
– Some regression packages refer to the explained
sum of squares as the “model sum of squares.”
Important Notes
• To make matters even worse, the residual sum
of squares is often called the “error sum of
squares.”
• This is especially unfortunate because, as we
will see in Section 2.5, the errors and the
residuals are different quantities.
• We prefer to use the abbreviation SSR to
denote the sum of squared residuals, because
it is more common in econometric packages.
Model’s Goodness of Fit
• Do we have a good or bad model?
– It is often useful to compute a number that summarizes how well the OLS regression line fits the data.
• The coefficient of determination, R²
• Good for comparing models that use the same dependent variable
R²
• R² = SSE/SST = 1 − SSR/SST: the fraction of the sample variation in the dependent variable that is explained by the explanatory variable
• Example (see the sketch below)
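A sketch comparing R² across two models with the same dependent variable, using the auto data (an assumption):

* Comparing R-squared across models with the same dependent variable
sysuse auto, clear
quietly reg price mpg
display "R2, price on mpg:             " e(r2)
quietly reg price mpg weight
display "R2, price on mpg and weight:  " e(r2)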
Interpretation of β̂₀ (intercept)
• β̂₀: the model predicts [dependent variable] to be [β̂₀] [units of dependent variable] when [independent variable] is zero [units of independent variable]
• Should we keep β̂₀?
• Regression through the origin? (see the sketch below)
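A sketch of regression through the origin in Stata; the noconstant option of regress suppresses the intercept (auto data used as an illustration):

* Regression with and without an intercept (auto data)
sysuse auto, clear
reg price mpg                // usual OLS, intercept included
reg price mpg, noconstant    // forces the fitted line through the origin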
Interpretation of β̂₁ (slope)
• If x is continuous
• If the model is linear in x
• Example (see the sketch below)
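A minimal sketch of the slope interpretation, again with the auto data (an assumption, since the slide names no dataset):

* Reading the simple regression slope (auto data)
sysuse auto, clear
quietly reg price mpg
* _b[mpg] is the predicted change in price (US dollars) from a one-mpg increase
display "predicted change in price per extra mpg: " _b[mpg]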
OMITTED VARIABLE BIAS &
MULTIPLE REGRESSION
Multiple Regression Model: Introduction
• y = β₀ + β₁x₁ + β₂x₂ + u
• β̂₁ measures the effect of x₁ on y holding x₂ fixed; similarly for β̂₂
Bias of β̃₁ when x₂ is omitted
• Suppose the true model includes x₂, but we regress y on x₁ alone, obtaining slope β̃₁. Then E(β̃₁) = β₁ + β₂δ̃₁, where δ̃₁ is the slope from regressing x₂ on x₁.
• Direction of bias:
– β₂ > 0 and Corr(x₁, x₂) > 0: positive bias
– β₂ > 0 and Corr(x₁, x₂) < 0: negative bias
– β₂ < 0 and Corr(x₁, x₂) > 0: negative bias
– β₂ < 0 and Corr(x₁, x₂) < 0: positive bias
• Example (see the sketch below)
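A sketch of the omitted-variable algebra in Stata, treating the two-regressor model as the "true" one and the auto data as a stand-in (both assumptions); in sample the identity short = long + β̂₂δ̃₁ holds exactly:

* Omitted variable mechanics: short = long + b2*delta (auto data)
sysuse auto, clear
quietly reg price mpg weight   // "long" regression (x2 = weight included)
scalar b1 = _b[mpg]
scalar b2 = _b[weight]
quietly reg weight mpg         // delta: regress the omitted x2 on x1
scalar d1 = _b[mpg]
reg price mpg                  // "short" regression: slope on mpg is biased
display "short slope: " _b[mpg] "    long + b2*delta: " b1 + b2*d1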
Interpretation of estimated coefficients
• β̂ⱼ: a one [unit of independent variable] increase in [independent variable] [increases/decreases] [dependent variable] by [β̂ⱼ] [units of dependent variable], holding the other variables constant
• If xⱼ is continuous
• If the model is linear in xⱼ
• Example (see the sketch below)
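Filling in the template with the same illustrative auto regression (an assumption):

* A ceteris paribus reading of a multiple regression coefficient
sysuse auto, clear
quietly reg price mpg weight
* a one-mpg increase changes predicted price by _b[mpg] dollars, holding weight constant
display _b[mpg]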
Assumptions of S/MLR
1. LR 1: Linear in parameters
2. LR 2: Random sampling
3. LR 3: Zero conditional mean
4. LR 4: Sample variation in x (no perfect collinearity in the multiple regression case)
5. LR 5: Homoskedasticity
The primary drawback in using simple regression analysis for empirical
work is that it is very difficult to draw ceteris paribus conclusions
about how x affects y: the key assumption, SLR.3—that all other
factors affecting y are uncorrelated with x—is often unrealistic.
Example
Remember…LR1
• We are doing a LINEAR regression model
– SLR 1: linear in parameters
– We cannot estimate a model whose parameters enter it non-linearly by OLS, because the βs are not linearly related to y
– We must use a non-linear regression model in that case
– A quadratic model, by contrast, is still linear in the parameters:
ŷ = β̂₀ + β̂₁x + β̂₂x², so
Δŷ = (β̂₁ + 2β̂₂x)Δx, i.e. Δŷ/Δx ≈ β̂₁ + 2β̂₂x
– Example
Quadratic Models: Example
• Now let’s go back to our discussion on income and age
• According to theory, there should be an inverted-U shaped relationship between the two variables
• Hence we need to specify a quadratic model, e.g. income = β₀ + β₁age + β₂age² + u
Quadratic Models: Example
• Let’s use Wooldridge data “SMOKE” in GRETL
– Number of cigarettes smoked each day depends
on age
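The slide uses GRETL; a rough Stata equivalent is sketched below, assuming the SMOKE data can be loaded with the user-written bcuse command and that the Wooldridge variables are named cigs and age (both assumptions):

* Quadratic model: cigarettes smoked per day as a function of age
* (assumes: ssc install bcuse, and variable names cigs, age)
bcuse smoke, clear
gen age2 = age^2
reg cigs age age2
* inverted U: expect b[age] > 0 and b[age2] < 0; turning point at -b1/(2*b2)
display -_b[age] / (2*_b[age2])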
Logarithmic Models
• In a log model, “log” means the natural log (ln), even though it is written “log”
• Interpretation of β̂₁
Accuracy: semi-elasticity
• In the log-linear model log(y) = β₀ + β₁x + u, 100·β̂₁ approximates the percentage change in y from a one-unit increase in x (the semi-elasticity)
• The approximation is accurate only when β̂₁ is small; the exact percentage change is 100·[exp(β̂₁) − 1]
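A sketch of the approximate versus exact calculation, with the auto data standing in (an assumption):

* Approximate vs exact semi-elasticity in a log-linear model (auto data)
sysuse auto, clear
gen lprice = ln(price)
reg lprice mpg
display "approx % change in price per mpg: " 100*_b[mpg]
display "exact  % change in price per mpg: " 100*(exp(_b[mpg]) - 1)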