
INTRODUCTION TO ECONOMETRICS
(ECON. 352)

HASSEN A. (M.Sc.)
JIMMA UNIVERSITY
2008/09

CHAPTER ONE
INTRODUCTION
1.1 The Econometric Approach
1.2 Models, Economic Models & Econometric Models
1.3 Types of Data for Econometric Analysis

1.1 The Econometric Approach

WHAT IS ECONOMETRICS?
• Econometrics literally means "economic measurement".
• In simple terms, econometrics deals with the application of statistical methods to economics.
• More fully: the application of mathematical and statistical techniques to data in order to provide evidence on questions of interest to economics.

• Unlike economic statistics, which mainly collects and summarizes statistical data, econometrics combines economic theory, mathematical economics, economic statistics and mathematical statistics:
  - economic theory: provides the theory, i.e., imposes a logical structure on the form of the question (e.g., when price goes up, quantity demanded goes down);
  - mathematical economics: expresses economic theory using mathematics (in mathematical form);
  - economic statistics: data presentation and description;
  - mathematical statistics: estimation and testing techniques.

Goals/uses of econometrics:
• Estimation/measurement of economic parameters or relationships, which may be needed for policy- or decision-making;
• Testing (and possibly refining) economic theory;
• Forecasting/prediction of future values of economic magnitudes; and
• Evaluation of policies/programs.

1.2 Models, Economic Models & Econometric Models

• Model: a simplified representation of real-world phenomena.
• Econometric model: combines the economic model with assumptions about the random nature of the data.
• (Diagram: nested boxes, MODEL ⊃ ECONOMIC MODEL ⊃ ECONOMETRIC MODEL.)

The stages of an econometric study (flowchart):
1. Economic theory or model
2. Econometric model: a statement of the economic theory in an empirically testable form
3. Data
4. Some a priori information
5. Estimation of the model
6. Tests of any hypothesis suggested by the economic model
7. Interpreting results and using the model for prediction and policy

1. Statement of theory or hypothesis:
   e.g. Theory: people increase consumption as income increases, but not by as much as the increase in their income.
2. Specification of the mathematical model:
   C = α + βY;  0 < β < 1,
   where C = consumption, Y = income, β = slope = MPC = ΔC/ΔY, and α = intercept.

3. Specification of the econometric (statistical) model:
   C = α + βY + ε;  0 < β < 1,
   where α = intercept = autonomous consumption, and ε = the error/stochastic/disturbance term. It captures several factors:
   • omitted variables,
   • measurement error in the dependent variable and/or wrong functional form,
   • randomness of human behavior.

4. Obtain data.
5. Estimate the parameters of the model. How? Three methods (to be discussed)!
   Suppose Ĉi = 184.08 + 0.8Yi.
6. Hypothesis testing: is 0.8 statistically less than 1?
7. Interpret the results and use the model for policy or forecasting:
   • A 1 Br. increase in income induces an 80 cent rise in consumption, on average.
   • If Y = 0, then average C = 184.08.

   • Predict the level of C for a given Y.
   • Pick the value of the control variable (Y) to get a desired value of the target variable (C).

1.3 Types of Data for Econometric Analysis

• Time series data: a set of observations on the values that a variable takes at different times, e.g. the money supply or the unemployment rate over a period of years.
• Cross-sectional data: data on one or more variables collected at the same point in time.
• Pooled data: cross-sectional observations collected over time, where the units don't have to be the same.
• Longitudinal/panel data: a special type of pooled data in which the same cross-sectional unit (say, a family or a firm) is surveyed over time.

CHAPTER TWO
SIMPLE LINEAR REGRESSION
2.1 The Concept of Regression Analysis
2.2 The Simple Linear Regression Model
2.3 The Method of Least Squares
2.4 Properties of Least-Squares Estimators and the Gauss-Markov Theorem
2.5 Residuals and Goodness of Fit
2.6 Confidence Intervals and Hypothesis Testing in Regression Analysis
2.7 Prediction with the Simple Linear Regression

2.1 The Concept of Regression Analysis

• Origin of the word "regression"!
• Our objective in regression analysis is to find out how the average value of the dependent variable (the regressand) varies with the given values of the explanatory variable(s) (the regressor/s).
• Compare regression and correlation (dependence vs. association).
• The key concept underlying regression analysis is the conditional expectation function (CEF), or population regression function (PRF):
  E[Y | Xi] = f(Xi)

• For empirical purposes, it is the stochastic PRF that matters:
  Yi = E[Y | Xi] + εi
• The stochastic disturbance term εi plays a critical role in estimating the PRF.
• The PRF is an idealized concept, since in practice one rarely has access to the entire population; usually one has just a sample of observations.
• Hence, we use the stochastic sample regression function (SRF), Ŷi = f(Xi), to estimate the PRF, i.e., we use Yi = f(Ŷi, ei) to estimate Yi = f(E[Y | Xi], εi).

2.2 The Simple Linear Regression Model

• We assume linear PRFs, i.e., regressions that are linear in the parameters (α and β). They may or may not be linear in the variables (Y or X).
  E[Y | Xi] = α + βXi  ⇒  Yi = α + βXi + εi
• "Simple" because we have only one regressor (X).
• Accordingly, we use Ŷi = α̂ + β̂Xi to estimate E[Y | Xi] = α + βXi.
  ⇒ α̂, β̂ and ei from a sample are estimates of α, β and εi, respectively.

• Using the theoretical relationship between X and Y, Yi can be decomposed into its non-stochastic component α + βXi and its random component εi.
• This is a theoretical decomposition because we do not know the values of α and β, or the values of ε.
• An operational decomposition of Y (used for practical purposes) is with reference to the fitted line: the actual value of Y equals the fitted value Ŷi = α̂ + β̂Xi plus the residual ei.
• The residuals ei serve a similar purpose as the stochastic term εi, but the two are not identical.

• From the PRF:
  Yi = E[Yi | Xi] + εi  ⇒  εi = Yi − E[Yi | Xi];
  but E[Yi | Xi] = α + βXi, so εi = Yi − α − βXi.
• From the SRF:
  Yi = Ŷi + ei  ⇒  ei = Yi − Ŷi;
  but Ŷi = α̂ + β̂Xi, so ei = Yi − α̂ − β̂Xi.

(Figure: the PRF E[Y|Xi] = α + βXi drawn in the (X, Y) plane with intercept α; observations O1-O4 at X1-X4 lie off the line, and the vertical distances ε1-ε4 from each observation to the PRF are the disturbances.)

(Figure: the SRF Ŷi = α̂ + β̂Xi (intercept α̂) plotted together with the PRF Yi = α + βXi (intercept α); for each observation the residual ei, measured from the SRF, and the disturbance εi, measured from the PRF, generally differ, e.g. ε1 < e1, ε2 = e2, ε3 < e3, ε4 > e4, so εi and ei are not identical.)

2.3 The Method of Least Squares

• Remember that our sample is only one of a large number of possibilities.
• Implication: the SRF line in the figure above is just one of many possible such lines, and each SRF line has its own unique α̂ and β̂ values.
• Then, which of these lines should we choose?
• Generally, we look for the SRF which is as close as possible to the PRF.
• But how can we devise a rule that makes the SRF as close as possible to the PRF? Equivalently, how can we choose the best technique to estimate the parameters of interest (α and β)?

• Generally speaking, there are three methods of estimation:
  - the method of least squares,
  - the method of moments, and
  - maximum likelihood estimation.
• The most common method for fitting a regression line is the method of least squares. We will use least-squares estimation, specifically Ordinary Least Squares (OLS), in Chapters 2 and 3.
• What does OLS do?

• A line gives a good fit to a set of data if the points (actual observations) are close to it. That is, the predicted values obtained by using the line should be close to the values that were actually observed; in other words, the residuals should be small. Therefore, when assessing the fit of a line, the vertical distances of the points to the line are the only distances that matter, because errors are measured as vertical distances.
• The OLS method calculates the best-fitting line for the observed data by minimizing the sum of the squares of the vertical deviations from each data point to the line (the residual sum of squares, RSS).

• Minimize RSS = Σ ei² (the sum over i = 1, ..., n of the squared residuals).
• We could think of minimizing the RSS by successively choosing pairs of values for α̂ and β̂ until the RSS is made as small as possible.
• But we will use differential calculus instead (which turns out to be a lot easier).
• Why the squares of the residuals? Why not just minimize the sum of the residuals?
• To prevent negative residuals from cancelling positive ones: because the deviations are first squared and then summed, there are no cancellations between positive and negative values.

• If we used Σ ei instead, all the error terms ei would receive equal importance no matter how closely or how widely the individual observations are scattered about the SRF.
• A consequence of this is that the algebraic sum of the ei can be small (even zero) although the ei are widely scattered about the SRF.
• Besides, the OLS estimates possess desirable properties of estimators under some assumptions.
• OLS technique: choose α̂ and β̂ to
  minimize Σ ei² = Σ (Yi − Ŷi)² = Σ (Yi − α̂ − β̂Xi)².

First-order condition (1): differentiate with respect to α̂ and set equal to zero.
  ∂(Σ ei²)/∂α̂ = 0  ⇒  ∂[Σ (Yi − α̂ − β̂Xi)²]/∂α̂ = 0
  ⇒ 2·[Σ (Yi − α̂ − β̂Xi)]·(−1) = 0
  ⇒ Σ (Yi − α̂ − β̂Xi) = 0
  ⇒ ΣYi − nα̂ − β̂ΣXi = 0
  ⇒ Ȳ − α̂ − β̂X̄ = 0  ⇒  α̂ = Ȳ − β̂X̄

First-order condition (2): differentiate with respect to β̂ and set equal to zero.
  ∂(Σ ei²)/∂β̂ = 0  ⇒  ∂[Σ (Yi − α̂ − β̂Xi)²]/∂β̂ = 0
  ⇒ 2·[Σ (Yi − α̂ − β̂Xi)]·(−Xi) = 0
  ⇒ Σ [(Yi − α̂ − β̂Xi)·Xi] = 0
  ⇒ ΣYiXi − α̂ΣXi − β̂ΣXi² = 0
  ⇒ ΣYiXi = α̂ΣXi + β̂ΣXi²

Solve α̂ = Ȳ − β̂X̄ and ΣYiXi = α̂ΣXi + β̂ΣXi² (called the normal equations) simultaneously:
  ΣYiXi = (Ȳ − β̂X̄)(ΣXi) + β̂ΣXi²
  ⇒ ΣYiXi = ȲΣXi − β̂X̄ΣXi + β̂ΣXi²
  ⇒ ΣYiXi − ȲΣXi = β̂ΣXi² − β̂X̄ΣXi
  ⇒ ΣYiXi − ȲΣXi = β̂(ΣXi² − X̄ΣXi)
  ⇒ ΣYiXi − nX̄Ȳ = β̂(ΣXi² − nX̄²),
  because X̄ = ΣXi/n ⇔ ΣXi = nX̄.

Thus:
1. β̂ = (ΣYiXi − nX̄Ȳ) / (ΣXi² − nX̄²)
Alternative expressions for β̂ (easier to recall):
2. β̂ = Σ(Xi − X̄)(Yi − Ȳ) / Σ(Xi − X̄)² = Σxy / Σx²,  where xi = Xi − X̄ and yi = Yi − Ȳ.
3. β̂ = Cov(X, Y) / Var(X)
4. β̂ = [nΣYiXi − (ΣXi)(ΣYi)] / [nΣXi² − (ΣXi)²]

For α̂ just use α̂ = Ȳ − β̂X̄. Or, if you wish, substitute the expression for β̂:
  α̂ = Ȳ − X̄·[(ΣYiXi − nX̄Ȳ) / (ΣXi² − nX̄²)]
  ⇒ α̂ = {[ΣXi² − nX̄²]Ȳ − [X̄ΣYiXi − nX̄²Ȳ]} / (ΣXi² − nX̄²)
  ⇒ α̂ = (ȲΣXi² − nX̄²Ȳ − X̄ΣYiXi + nX̄²Ȳ) / (ΣXi² − nX̄²)
  ⇒ α̂ = (ȲΣXi² − X̄ΣYiXi) / (ΣXi² − nX̄²) = [(ΣYi)(ΣXi²) − (ΣXi)(ΣYiXi)] / [n(ΣXi² − nX̄²)]

Previously, we came across the following two normal equations:
1. Σ (Yi − α̂ − β̂Xi) = 0, which is equivalent to Σ ei = 0;
2. Σ [(Yi − α̂ − β̂Xi)·Xi] = 0, which is equivalent to Σ eiXi = 0.
Note also the following property: the mean of the fitted values Ŷ equals Ȳ.
  Yi = Ŷi + ei  ⇒  ΣYi/n = ΣŶi/n + Σei/n  ⇒  Ȳ = mean(Ŷ),  since ē = 0 ⇔ Σei = 0.

• The facts that Ŷ and Y have the same average, and that this average value is achieved at the average value of X (i.e., mean(Ŷ) = Ȳ and Ȳ = α̂ + β̂X̄), together imply that the sample regression line passes through the sample mean values of X and Y.
• (Figure: the fitted line Ŷi = α̂ + β̂Xi passing through the point (X̄, Ȳ).)

Assumptions Underlying the Method of Least Squares
• To obtain the estimates of α and β, it suffices to assume that our model is correctly specified and that the systematic and the stochastic components in the equation are independent.
• But the objective in regression analysis is not only to obtain α̂ and β̂ but also to draw inferences about the true α and β. For example, we'd like to know how close α̂ and β̂ are to α and β, or how close Ŷi is to E[Y | Xi].
• To that end, we must not only specify the functional form of the model, but also make certain assumptions about the manner in which the Yi are generated.

• The PRF Yi = α + βXi + εi shows that Yi depends on both Xi and εi.
• Therefore, unless we are specific about how Xi and εi are created or generated, there is no way we can make any statistical inference about the Yi, or about α and β.
• Thus, the assumptions made about the X variable and the error term are extremely critical to the valid interpretation of the regression estimates.

THE ASSUMPTIONS:
1. Zero mean value of the disturbance εi: E(εi|Xi) = 0, or equivalently E[Yi|Xi] = α + βXi.
2. Homoscedasticity, or equal variance of εi: given the value of X, the variance of εi is the same (a finite positive constant σ²) for all observations. That is,
   var(εi|Xi) = E[εi − E(εi|Xi)]² = E(εi)² = σ².
   By implication, var(Yi|Xi) = σ²:
   var(Yi|Xi) = E{α + βXi + εi − (α + βXi)}² = E(εi)² = σ² for all i.

3. No autocorrelation between the disturbance terms: each random error term εi has zero covariance with, or is uncorrelated with, each and every other random error term εs (for s ≠ i).
   cov(εi, εs|Xi, Xs) = E{[εi − E(εi)]|Xi}{[εs − E(εs)]|Xs} = E(εi|Xi)(εs|Xs) = 0.
   Equivalently, cov(Yi, Ys|Xi, Xs) = 0 for all s ≠ i.
4. The disturbance ε and the explanatory variable X are uncorrelated: cov(εi, Xi) = 0.
   cov(εi, Xi) = E[εi − E(εi)][Xi − E(Xi)] = E[εi(Xi − E(Xi))] = E(εiXi) − E(Xi)E(εi) = E(εiXi) = 0.

5. The error terms are normally and independently distributed: εi ~ NID(0, σ²).
   Assumptions 1 to 3 together imply that εi ~ IID(0, σ²).
   The normality assumption enables us to derive the sampling distributions of the OLS estimators (α̂ and β̂). This simplifies the task of establishing confidence intervals and testing hypotheses.
6. X is assumed to be non-stochastic, and must take at least two different values.
7. The number of observations n must be greater than the number of parameters to be estimated: n > 2 in this case.

Numerical example: explaining sales as a function of advertising, Sales = f(Advertising). Sales are in thousands of Birr and advertising expenses are in hundreds of Birr.

Firm (i) | Sales (Yi) | Advertising Expense (Xi)
    1    |    11      |    10
    2    |    10      |     7
    3    |    12      |    10
    4    |     6      |     5
    5    |    10      |     8
    6    |     7      |     8
    7    |     9      |     6
    8    |    10      |     7
    9    |    11      |     9
   10    |    10      |    10

Computation of deviations (Ȳ = ΣYi/n = 96/10 = 9.6;  X̄ = ΣXi/n = 80/10 = 8):

  i | Yi | Xi | yi = Yi − Ȳ | xi = Xi − X̄ | xi·yi
  1 | 11 | 10 |     1.4     |      2      |  2.8
  2 | 10 |  7 |     0.4     |     −1      | −0.4
  3 | 12 | 10 |     2.4     |      2      |  4.8
  4 |  6 |  5 |    −3.6     |     −3      | 10.8
  5 | 10 |  8 |     0.4     |      0      |  0
  6 |  7 |  8 |    −2.6     |      0      |  0
  7 |  9 |  6 |    −0.6     |     −2      |  1.2
  8 | 10 |  7 |     0.4     |     −1      | −0.4
  9 | 11 |  9 |     1.4     |      1      |  1.4
 10 | 10 | 10 |     0.4     |      2      |  0.8
  Σ | 96 | 80 |      0      |      0      | 21

  i |  yi  |  yi²  | xi | xi²
  1 |  1.4 |  1.96 |  2 |  4
  2 |  0.4 |  0.16 | −1 |  1
  3 |  2.4 |  5.76 |  2 |  4
  4 | −3.6 | 12.96 | −3 |  9
  5 |  0.4 |  0.16 |  0 |  0
  6 | −2.6 |  6.76 |  0 |  0
  7 | −0.6 |  0.36 | −2 |  4
  8 |  0.4 |  0.16 | −1 |  1
  9 |  1.4 |  1.96 |  1 |  1
 10 |  0.4 |  0.16 |  2 |  4
  Σ |  0   | 30.4  |  0 | 28

β̂ = Σxiyi / Σxi² = 21/28 = 0.75
α̂ = Ȳ − β̂X̄ = 9.6 − 0.75(8) = 3.6

Fitted values and residuals (Ŷi = 3.6 + 0.75Xi;  ei = Yi − Ŷi):

  i |  Ŷi   |   ei  |  ei²
  1 | 11.10 | −0.10 | 0.01
  2 |  8.85 |  1.15 | 1.3225
  3 | 11.10 |  0.90 | 0.81
  4 |  7.35 | −1.35 | 1.8225
  5 |  9.60 |  0.40 | 0.16
  6 |  9.60 | −2.60 | 6.76
  7 |  8.10 |  0.90 | 0.81
  8 |  8.85 |  1.15 | 1.3225
  9 | 10.35 |  0.65 | 0.4225
 10 | 11.10 | −1.10 | 1.21
  Σ | 96    |  0    | 14.65

Thus Σei² = 14.65, Σŷi² = 15.75, Σyi² = 30.4, and Σyi = Σxi = Σŷi = Σei = 0.
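The computations in the tables above can be reproduced with a minimal Python/numpy sketch (an illustration only; the data are exactly the ten observations in the table):

```python
import numpy as np

# Sales (Y, in thousands of Birr) and advertising expense (X, in hundreds of Birr)
Y = np.array([11, 10, 12, 6, 10, 7, 9, 10, 11, 10], dtype=float)
X = np.array([10, 7, 10, 5, 8, 8, 6, 7, 9, 10], dtype=float)

x = X - X.mean()                              # deviations from the mean
y = Y - Y.mean()

beta_hat = (x * y).sum() / (x ** 2).sum()     # = 21/28 = 0.75
alpha_hat = Y.mean() - beta_hat * X.mean()    # = 9.6 - 0.75*8 = 3.6

Y_fit = alpha_hat + beta_hat * X              # fitted values
e = Y - Y_fit                                 # residuals

print(beta_hat, alpha_hat)                    # 0.75  3.6
print((e ** 2).sum())                         # RSS = 14.65
print(((Y_fit - Y.mean()) ** 2).sum())        # ESS = 15.75
```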

2.4 Properties of OLS Estimators and the Gauss-Markov Theorem

• Given the assumptions of the classical linear regression model, the least-squares estimators possess some ideal or optimum properties.
• These statistical properties are extremely important because they provide criteria for choosing among alternative estimators.
• These properties are contained in the well-known Gauss-Markov Theorem.

• Gauss-Markov Theorem: under the above assumptions of the linear regression model, the estimators α̂ and β̂ have the smallest variance of all linear and unbiased estimators of α and β. That is, the OLS estimators are the Best Linear Unbiased Estimators (BLUE) of α and β.
• The Gauss-Markov Theorem does not depend on the assumption of normality (of the error terms).
• Let us prove that β̂ is the BLUE of β!

Linearity of β̂ (in the stochastic variable Yi, or εi):
  β̂ = Σxiyi / Σxi² = Σxi(Yi − Ȳ) / Σxi²
    = ΣxiYi / Σxi² − ȲΣxi / Σxi²
    = ΣxiYi / Σxi²   (since Σxi = 0)
  ⇒ β̂ = ΣkiYi,  where ki = xi / Σxi²
  ⇒ β̂ = k1Y1 + k2Y2 + ... + knYn,
i.e., β̂ is a linear combination of the Yi.

Note that:
(1) Σxi² is a constant;
(2) because xi is non-stochastic, ki is also non-stochastic;
(3) Σki = Σ(xi / Σxi²) = Σxi / Σxi² = 0;
(4) Σkixi = Σ(xi / Σxi²)·xi = Σxi² / Σxi² = 1;
(5) Σki² = Σ(xi / Σxi²)² = Σxi² / (Σxi²)² = 1 / Σxi²;
(6) ΣkiXi = Σ(xi / Σxi²)·Xi = Σ(xi / Σxi²)·(xi + X̄) = Σxi²/Σxi² + X̄·Σxi/Σxi² = 1.
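A short numerical check of properties (3)-(6), and of β̂ = ΣkiYi, on the example data (illustrative Python sketch):

```python
import numpy as np

Y = np.array([11, 10, 12, 6, 10, 7, 9, 10, 11, 10], dtype=float)
X = np.array([10, 7, 10, 5, 8, 8, 6, 7, 9, 10], dtype=float)

x = X - X.mean()
k = x / (x ** 2).sum()                                   # the OLS weights k_i = x_i / sum(x_i^2)

print(np.isclose(k.sum(), 0))                            # (3) sum k_i = 0
print(np.isclose((k * x).sum(), 1))                      # (4) sum k_i x_i = 1
print(np.isclose((k ** 2).sum(), 1 / (x ** 2).sum()))    # (5) sum k_i^2 = 1/sum x_i^2
print(np.isclose((k * X).sum(), 1))                      # (6) sum k_i X_i = 1
print((k * Y).sum())                                     # beta_hat = sum k_i Y_i = 0.75
```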

Unbiasedness of β̂:
  β̂ = ΣkiYi = Σki(α + βXi + εi) = αΣki + βΣkiXi + Σkiεi
  ⇒ β̂ = β + Σkiεi   [because Σki = 0 and ΣkiXi = 1]
  E(β̂) = E(β) + E(k1ε1 + k2ε2 + ... + knεn)
       = β + (Σki)·E(εi) = β + (Σki)·(0)
  ⇒ E(β̂) = β

Efficiency of β̂:
Suppose β̃ is another unbiased linear estimator of β. Then var(β̂) ≤ var(β̃).
Proof:
  var(β̂) = var(ΣkiYi) = var(k1Y1 + k2Y2 + ... + knYn)
         = var(k1Y1) + var(k2Y2) + ... + var(knYn)
           {since the covariance between Yi and Ys (for all i ≠ s) is 0}
         = k1²var(Y1) + k2²var(Y2) + ... + kn²var(Yn)
         = k1²σ² + k2²σ² + ... + kn²σ²

  var(β̂) = σ²Σki²,  or  var(β̂) = σ² / Σxi².
Now suppose β̃ = ΣwiYi, where the wi are (non-stochastic) coefficients. Then
  β̃ = Σwi(α + βXi + εi) = αΣwi + βΣwiXi + Σwiεi
  E(β̃) = (Σwi)·α + (ΣwiXi)·β + (Σwi)·E(εi) = (Σwi)·α + (ΣwiXi)·β.
For β̃ to be an unbiased estimator of β, we need Σwi = 0 and ΣwiXi = 1.

  var(β̃) = var(ΣwiYi) = var(w1Y1 + w2Y2 + ... + wnYn)
         = var(w1Y1) + var(w2Y2) + ... + var(wnYn)
           {since the covariance between Yi and Ys (for all i ≠ s) is 0}
         = w1²var(Y1) + w2²var(Y2) + ... + wn²var(Yn)
         = w1²σ² + w2²σ² + ... + wn²σ²
  ⇒ var(β̃) = σ²Σwi²

• Let us now compare var(β̃) and var(β̂).
• Suppose wi ≠ ki, and let the relationship between them be given by di = wi − ki.
• Because both Σwi and Σki equal zero:  Σdi = Σwi − Σki = 0.
• Because both Σwixi and Σkixi equal one:  Σdixi = Σwixi − Σkixi = 1 − 1 = 0.
• (wi)² = (ki + di)²  ⇒  wi² = ki² + di² + 2kidi
  ⇒ Σwi² = Σki² + Σdi² + 2Σdi·(xi / Σxi²)

  ⇒ Σwi² = Σki² + Σdi² + 2·(1/Σxi²)·(Σdixi)
  ⇒ Σwi² = Σki² + Σdi² + 2·(1/Σxi²)·(0)
  ⇒ Σwi² = Σki² + Σdi²
  ⇒ Σwi² > Σki²   (given wi ≠ ki, not all the di are zero, and thus Σdi² > 0)
  ⇒ σ²Σwi² > σ²Σki²
  ⇒ var(β̃) > var(β̂),
with var(β̃) = var(β̂) if and only if all the di are zero (and thus Σdi² = 0).

Linearity of α̂:  α̂ = Ȳ − β̂X̄
  ⇒ α̂ = Ȳ − X̄·(ΣkiYi) = Ȳ − X̄·(k1Y1 + k2Y2 + ... + knYn)
  ⇒ α̂ = (1/n)(Y1 + Y2 + ... + Yn) − (X̄k1Y1 + X̄k2Y2 + ... + X̄knYn)
  ⇒ α̂ = (1/n − X̄k1)Y1 + (1/n − X̄k2)Y2 + ... + (1/n − X̄kn)Yn
  ⇒ α̂ = f1Y1 + f2Y2 + ... + fnYn,  where fi = 1/n − X̄ki,
i.e., α̂ is also a linear combination of the Yi.

Unbiasedness of α̂:  α̂ = Ȳ − β̂X̄, and Ȳ = α + βX̄ + ε̄
  ⇒ α̂ = (α + βX̄ + ε̄) − X̄·{Σki(α + βXi + εi)}
  ⇒ α̂ = (α + βX̄ + ε̄) − X̄·{αΣki + βΣkiXi + Σkiεi}
  ⇒ α̂ = (α + βX̄ + ε̄) − X̄·{β + Σkiεi}
  ⇒ α̂ = α + ε̄ − X̄Σkiεi
  ⇒ E(α̂) = E(α) + E(ε̄) − X̄(Σki)·E(εi)
  ⇒ E(α̂) = α

Efficiency of α̂:
Suppose α̃ is another unbiased linear estimator of α. Then var(α̂) ≤ var(α̃).
Proof:
  var(α̂) = var(ΣfiYi) = var(f1Y1 + f2Y2 + ... + fnYn)
         = var(f1Y1) + var(f2Y2) + ... + var(fnYn)   {since cov(Yi, Ys) = 0 for all i ≠ s}
         = f1²var(Y1) + f2²var(Y2) + ... + fn²var(Yn)
         = f1²σ² + f2²σ² + ... + fn²σ² = σ²Σfi²

  var(α̂) = σ²Σfi² = σ²Σ(1/n − X̄ki)²
         = σ²Σ(1/n² + X̄²ki² − 2X̄ki/n)
         = σ²{1/n + X̄²Σki² − (2X̄/n)Σki}
         = σ²{1/n + X̄²Σki²}            (since Σki = 0)
         = σ²(1/n + X̄²/Σxi²),
which can also be written as var(α̂) = σ²ΣXi² / (nΣxi²).
Note also that Σfi = Σ(1/n − X̄ki) = 1 − X̄Σki = 1, and Σfi² = 1/n + X̄²/Σxi².

Now suppose α̃ = ΣziYi, where the zi are coefficients. Then
  α̃ = Σzi(α + βXi + εi) = αΣzi + βΣziXi + Σziεi
  E(α̃) = (Σzi)·α + (ΣziXi)·β + (Σzi)·E(εi) = (Σzi)·α + (ΣziXi)·β.
For α̃ to be an unbiased estimator of α, we need Σzi = 1 and ΣziXi = 0.
Then:
  var(α̃) = var(ΣziYi) = var(z1Y1 + z2Y2 + ... + znYn)

  var(α̃) = var(z1Y1) + var(z2Y2) + ... + var(znYn)   {since cov(Yi, Ys) = 0 for all i ≠ s}
         = z1²var(Y1) + z2²var(Y2) + ... + zn²var(Yn)
         = z1²σ² + z2²σ² + ... + zn²σ²
  ⇒ var(α̃) = σ²Σzi²
• Let us now compare var(α̂) and var(α̃).
• Suppose zi ≠ fi, and let the relationship between them be given by di = zi − fi.

• Because ΣziXi = 0 and Σzi = 1:
  Σzixi = Σzi(Xi − X̄) = ΣziXi − X̄Σzi = 0 − X̄(1) = −X̄.
• From di = zi − fi, with fi = 1/n − X̄·(xi/Σxi²):
  Σdi² = Σzi² + Σfi² − 2Σzifi
       = Σzi² + Σfi² − 2Σ[zi·(1/n − X̄xi/Σxi²)]
       = Σzi² + Σfi² − 2{(1/n)Σzi − (X̄/Σxi²)(Σzixi)}
       = Σzi² + Σfi² − 2{1/n − (X̄/Σxi²)(−X̄)}

  ⇒ Σdi² = Σzi² + Σfi² − 2{1/n + X̄²/Σxi²}
  ⇒ Σdi² = Σzi² + Σfi² − 2Σfi²          (since Σfi² = 1/n + X̄²/Σxi²)
  ⇒ Σdi² = Σzi² − Σfi²
  ⇒ Σzi² = Σdi² + Σfi²
  ⇒ Σzi² > Σfi²
  ⇒ σ²Σzi² > σ²Σfi²
  ⇒ var(α̃) > var(α̂),
with var(α̃) = var(α̂) if and only if all the di (and hence Σdi²) are zero.

2.5 Residuals and Goodness of Fit

Decomposing the variation in Y:
(Figure: decomposition of the variation in Y.)

• One measure of the variation in Y is the sum of its squared deviations around its sample mean, often described as the Total Sum of Squares, TSS.
• TSS, the total sum of squares of Y, can be decomposed into ESS, the 'explained' sum of squares, and RSS, the residual ('unexplained') sum of squares:
  TSS = ESS + RSS
  Σ(Yi − Ȳ)² = Σ(Ŷi − Ȳ)² + Σei²

Derivation:
  Yi = Ŷi + ei  ⇒  Yi − Ȳ = (Ŷi − Ȳ) + ei
  (Yi − Ȳ)² = (Ŷi − Ȳ + ei)²
  Σ(Yi − Ȳ)² = Σ(Ŷi − Ȳ + ei)²
  Σyi² = Σ(ŷi + ei)² = Σŷi² + Σei² + 2Σŷiei
The last term equals zero:
  Σŷiei = Σ(Ŷi − Ȳ)ei = ΣŶiei − ȲΣei = Σ(α̂ + β̂Xi)ei − ȲΣei

  ⇒ Σŷiei = α̂Σei + β̂ΣXiei − ȲΣei = 0   (since Σei = 0 and ΣXiei = 0)
Hence:
  Σyi² = Σŷi² + Σei²
  TSS = ESS + RSS
  30.4 = 15.75 + 14.65   (in our numerical example)
Coefficient of Determination (R²): the proportion of the variation in the dependent variable that is explained by the model.

1. R² = ESS/TSS = Σŷi² / Σyi²
2. R² = ESS/TSS = Σ(β̂xi)² / Σyi² = β̂²Σxi² / Σyi²
• The OLS regression coefficients are chosen in such a way as to minimize the sum of the squares of the residuals. Thus it automatically follows that they maximize R².
• Since TSS = ESS + RSS,  1 = ESS/TSS + RSS/TSS,  so ESS/TSS = 1 − RSS/TSS, and hence
3. R² = 1 − Σei² / Σyi²

Coefficient of Determination (R²), continued:
4. R² = ESS/TSS = β̂·Σxy / Σyi²   [since ESS = β̂²Σxi² = β̂·Σxy];
   in our example, R² = 15.75/30.4 = 0.5181.
5. R² = (Σxy)² / (Σx²·Σy²)
6. R² = [cov(X, Y)]² / [var(X)·var(Y)]
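A brief check that these expressions for R² agree on the example data (illustrative Python sketch; the residuals are computed in deviation form):

```python
import numpy as np

Y = np.array([11, 10, 12, 6, 10, 7, 9, 10, 11, 10], dtype=float)
X = np.array([10, 7, 10, 5, 8, 8, 6, 7, 9, 10], dtype=float)
x, y = X - X.mean(), Y - Y.mean()

beta_hat = (x * y).sum() / (x ** 2).sum()
e = y - beta_hat * x                                             # residuals in deviation form

R2_ess = (beta_hat ** 2) * (x ** 2).sum() / (y ** 2).sum()       # ESS/TSS
R2_rss = 1 - (e ** 2).sum() / (y ** 2).sum()                     # 1 - RSS/TSS
R2_cor = (x * y).sum() ** 2 / ((x ** 2).sum() * (y ** 2).sum())  # squared correlation

print(R2_ess, R2_rss, R2_cor)                                    # all approximately 0.5181
```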

• A natural criterion of goodness of fit is the correlation between the actual and fitted values of Y. The least-squares principle also maximizes this.
• In fact, R² = (r_ŷ,y)² = (r_x,y)²,
  where r_ŷ,y and r_x,y are the coefficients of correlation between Ŷ & Y and between X & Y, defined as
  r_ŷ,y = cov(Ŷ, Y)/(σ_Ŷ·σ_Y)  and  r_x,y = cov(X, Y)/(σ_X·σ_Y), respectively.
• Note: RSS = (1 − R²)Σy².

To sum up:
• Use Ŷi = α̂ + β̂Xi to estimate E[Y | Xi] = α + βXi.
• OLS: choose α̂ and β̂ to minimize Σei² = Σ(Yi − Ŷi)² = Σ(Yi − α̂ − β̂Xi)².
• β̂ = Σxy / Σx²  and  α̂ = Ȳ − β̂X̄.
• Given the assumptions of the linear regression model, the estimators α̂ and β̂ have the smallest variance of all linear and unbiased estimators of α and β.
• var(β̂) = σ² / Σxi²  and  var(α̂) = σ²(1/n + X̄²/Σxi²) = σ²ΣXi² / (nΣxi²).

To sum up (continued), with the numerical example:
• Σyi² = Σŷi² + Σei², i.e., TSS = ESS + RSS.
• R² = ESS/TSS = Σŷi² / Σyi², with Σŷi² = β̂Σxy = β̂²Σx², and RSS = (1 − R²)Σy².
• var(β̂) = σ²/Σxi² = σ²/28 ≈ 0.0357σ².
• var(α̂) = σ²(1/n + X̄²/Σxi²) = σ²(1/10 + 64/28) ≈ 2.3857σ².
• But what is σ²?

An unbiased estimator for σ²
• It can be shown that E(RSS) = E(Σei²) = (n − 2)σ².
• Thus, if we define σ̂² = Σei² / (n − 2), then
  E(σ̂²) = [1/(n − 2)]·E(Σei²) = [1/(n − 2)]·(n − 2)σ² = σ²,
  so σ̂² = Σei² / (n − 2) is an unbiased estimator of σ².

2.6 Confidence Intervals and Hypothesis Testing in Regression Analysis

Why is the Error Normality Assumption Important?
• The normality assumption permits us to derive the functional form of the sampling distributions of α̂, β̂ and σ̂².
• Knowing the form of the sampling distributions enables us to derive feasible test statistics for the OLS coefficient estimators.
• These feasible test statistics enable us to conduct statistical inference, i.e.,
  1) to construct confidence intervals for α, β and σ²;
  2) to test hypotheses about the values of α, β and σ².

Sampling distributions:
• εi ~ N(0, σ²)  ⇒  Yi ~ N(α + βXi, σ²).
• β̂ ~ N(β, σ²/Σxi²)  and  α̂ ~ N(α, σ²ΣXi²/(nΣxi²)).
• Standardizing: (β̂ − β)/(σ/√Σxi²) ~ N(0, 1).
• Replacing σ by its estimate σ̂, where σ̂² = Σei²/(n − 2), gives t-distributed statistics:
  (β̂ − β)/sê(β̂) ~ t(n−2),  with sê(β̂) = σ̂/√Σxi²;
  (α̂ − α)/sê(α̂) ~ t(n−2),  with sê(α̂) = σ̂·√[ΣXi²/(nΣxi²)].

Confidence intervals for α and β:
  P{−t(α/2, n−2) ≤ (α̂ − α)/sê(α̂) ≤ t(α/2, n−2)} = 1 − α
  ⇒ the 100(1 − α)% two-sided CI for α is:  α̂ ± t(α/2, n−2)·sê(α̂).
Similarly, the 100(1 − α)% two-sided CI for β is:  β̂ ± t(α/2, n−2)·sê(β̂).

Confidence interval for σ²:
• (n − 2)σ̂²/σ² ~ χ²(n−2).
• P{χ²(1−α/2, df) ≤ χ² ≤ χ²(α/2, df)} = 1 − α
  ⇒ P{χ²(1−α/2, n−2) ≤ (n − 2)σ̂²/σ² ≤ χ²(α/2, n−2)} = 1 − α
  ⇒ P{1/χ²(1−α/2, n−2) ≥ σ²/[(n − 2)σ̂²] ≥ 1/χ²(α/2, n−2)} = 1 − α
  ⇒ P{1/χ²(α/2, n−2) ≤ σ²/[(n − 2)σ̂²] ≤ 1/χ²(1−α/2, n−2)} = 1 − α

CI for σ² (continued):
  ⇒ P{(n − 2)σ̂²/χ²(α/2, n−2) ≤ σ² ≤ (n − 2)σ̂²/χ²(1−α/2, n−2)} = 1 − α
  ⇒ the 100(1 − α)% two-sided CI for σ² is
    [(n − 2)σ̂²/χ²(α/2, n−2),  (n − 2)σ̂²/χ²(1−α/2, n−2)],
    or equivalently [RSS/χ²(α/2, n−2),  RSS/χ²(1−α/2, n−2)].

Let us continue with our earlier example. We have n = 10, α̂ = 3.6, β̂ = 0.75, R² = 0.5181,
var(α̂) ≈ 2.3857σ², var(β̂) ≈ 0.0357σ², and Σei² = 14.65.
• σ² is estimated by σ̂² = Σei²/(n − 2) = 14.65/8 = 1.83125  ⇒  σ̂ = √1.83125 ≈ 1.3532.
• Thus vâr(α̂) ≈ 2.3857(1.83125) ≈ 4.3688  ⇒  sê(α̂) ≈ √4.3688 ≈ 2.09.
• vâr(β̂) ≈ 0.0357(1.83125) ≈ 0.0654  ⇒  sê(β̂) ≈ √0.0654 ≈ 0.256.
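These estimates can be reproduced as follows (illustrative Python sketch on the example data):

```python
import numpy as np

Y = np.array([11, 10, 12, 6, 10, 7, 9, 10, 11, 10], dtype=float)
X = np.array([10, 7, 10, 5, 8, 8, 6, 7, 9, 10], dtype=float)
n = len(Y)
x = X - X.mean()

beta_hat = (x * (Y - Y.mean())).sum() / (x ** 2).sum()
alpha_hat = Y.mean() - beta_hat * X.mean()
e = Y - (alpha_hat + beta_hat * X)

sigma2_hat = (e ** 2).sum() / (n - 2)                                   # 14.65/8 = 1.83125
se_beta = np.sqrt(sigma2_hat / (x ** 2).sum())                          # about 0.256
se_alpha = np.sqrt(sigma2_hat * (X ** 2).sum() / (n * (x ** 2).sum()))  # about 2.09

print(sigma2_hat, se_alpha, se_beta)
```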

• 95% CIs for α and β:  1 − α = 0.95 ⇒ α = 0.05 ⇒ α/2 = 0.025, and t(0.025, 8) = 2.306.
• 95% CI for α:  3.6 ± (2.306)(2.09) = 3.6 ± 4.8195  ⇒  [−1.2195, 8.4195].
• 95% CI for β:  0.75 ± (2.306)(0.256) = 0.75 ± 0.5903  ⇒  [0.1597, 1.3403].
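The same intervals via scipy's t quantiles (illustrative sketch; small differences come only from rounding the standard errors):

```python
from scipy import stats

alpha_hat, se_alpha = 3.6, 2.09
beta_hat, se_beta = 0.75, 0.256
df = 8                                      # n - 2 = 10 - 2

t_crit = stats.t.ppf(0.975, df)             # about 2.306
ci_alpha = (alpha_hat - t_crit * se_alpha, alpha_hat + t_crit * se_alpha)
ci_beta = (beta_hat - t_crit * se_beta, beta_hat + t_crit * se_beta)

print(ci_alpha)                             # about (-1.22, 8.42)
print(ci_beta)                              # about (0.16, 1.34)
```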

• 95% CI for σ²:  [(n − 2)σ̂²/χ²(α/2, n−2),  (n − 2)σ̂²/χ²(1−α/2, n−2)], with σ̂² = 1.83125.
  χ²(0.025, 8) = 17.5  and  χ²(0.975, 8) = 2.18.
  ⇒ 95% CI for σ²:  [14.65/17.5, 14.65/2.18] = [0.84, 6.72].
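The same interval via scipy's chi-square quantiles (illustrative sketch; the table values 17.5 and 2.18 are rounded):

```python
from scipy import stats

rss, df = 14.65, 8
chi2_hi = stats.chi2.ppf(0.975, df)      # about 17.53
chi2_lo = stats.chi2.ppf(0.025, df)      # about 2.18

print(rss / chi2_hi, rss / chi2_lo)      # about 0.84 and 6.72
```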

• The confidence intervals we have constructed for α, β and σ² are two-sided intervals.
• Sometimes we want either the upper or the lower limit only, in which case we construct one-sided intervals.
• For instance, let us construct a one-sided (upper-limit) 95% confidence interval for β.
• From the t-table, t(0.05, 8) = 1.86. Hence
  β̂ + t(0.05, 8)·sê(β̂) = 0.75 + 1.86(0.256) = 0.75 + 0.48 = 1.23,
  and the confidence interval is (−∞, 1.23].

• Similarly, for the lower limit:
  β̂ − t(0.05, 8)·sê(β̂) = 0.75 − 1.86(0.256) = 0.75 − 0.48 = 0.27.
• Hence, the one-sided 95% CI is [0.27, ∞).

Hypothesis Testing:
• Use our example to test the following hypotheses.
• Result:  Ŷi = 3.6 + 0.75Xi, with standard errors (2.09) and (0.256) in parentheses.
1. Test the claim that sales do not depend on advertising expense (at the 5% level of significance).

• H0: β = 0 against Ha: β ≠ 0.
• Test statistic:  tc = (β̂ − β)/sê(β̂)  ⇒  tc = (0.75 − 0)/0.256 = 2.93.
• Critical value (tt = t-tabulated): α = 0.05 ⇒ α/2 = 0.025, so tt = t(0.025, 8) = 2.306.
• Since |tc| > tt, we reject the null (the alternative is supported). That is, the slope coefficient is statistically significantly different from zero: advertising has a significant influence on sales.
2. Test whether the intercept is greater than 3.5.

• H0: α = 3.5 against Ha: α > 3.5.
• Test statistic:  tc = (α̂ − α)/sê(α̂)  ⇒  tc = (3.6 − 3.5)/2.09 = 0.1/2.09 = 0.05.
• Critical value (tt = t-tabulated): at the 5% level of significance (α = 0.05), tt = t(0.05, 8) = 1.86.
• Since tc < tt, we do not reject the null (the null is supported). That is, the intercept is not statistically significantly greater than 3.5.

3. Can you reject the claim that a unit increase in advertising expense raises sales by one unit? If so, at what level of significance?
• H0: β = 1 against Ha: β ≠ 1.
• Test statistic:  tc = (β̂ − β)/sê(β̂) = (0.75 − 1)/0.256 = −0.25/0.256 = −0.98.
• At α = 0.05, t(0.025, 8) = 2.306, and thus H0 can't be rejected.
• Similarly, at α = 0.10, t(0.05, 8) = 1.86: H0 can't be rejected.
• At α = 0.20, t(0.10, 8) = 1.397, and thus H0 can't be rejected.
• At α = 0.50, t(0.25, 8) = 0.706: H0 is rejected.

• For what level of significance (probability) is the tabulated t value for 8 df as extreme as tc = 0.98?
  i.e., find P{|t| > 0.98} = P{t > 0.98 or t < −0.98}.
• From the table, P{t > 0.706} = 0.25 and P{t > 1.397} = 0.10.
• 0.98 lies between the two numbers (0.706 and 1.397), so P{t > 0.98} is somewhere between 0.25 and 0.10.
• Interpolating: 1.397 − 0.706 = 0.691, and 0.98 is 0.98 − 0.706 = 0.274 units above 0.706. Thus the P-value for 0.98, P{t > 0.98}, is about (0.274/0.691)·(0.25 − 0.10) units below 0.25.

• That is, the P-value for 0.98 is about 0.06 units below 0.25: P{t > 0.98} ≈ 0.25 − 0.06 ≈ 0.19.
• Hence P{|t| > 0.98} = 2·P{t > 0.98} ≈ 0.38.
• For our H0 to be rejected, the minimum level of significance (the probability of Type I error) would have to be as high as 38%. To conclude, H0 is retained!
• The p-value associated with the calculated sample value of the test statistic is defined as the lowest significance level at which H0 can be rejected.
• Small p-values constitute strong evidence against H0.
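An illustrative sketch of tests 1 and 3 with exact p-values from scipy (the exact two-sided p-value for tc = −0.98 is about 0.36, close to the 0.38 obtained above by linear interpolation in the t-table):

```python
from scipy import stats

beta_hat, se_beta, df = 0.75, 0.256, 8

# Test 1: H0: beta = 0 vs Ha: beta != 0
t1 = (beta_hat - 0) / se_beta                 # about 2.93
p1 = 2 * stats.t.sf(abs(t1), df)              # two-sided p-value, about 0.019 < 0.05 -> reject H0

# Test 3: H0: beta = 1 vs Ha: beta != 1
t3 = (beta_hat - 1) / se_beta                 # about -0.98
p3 = 2 * stats.t.sf(abs(t3), df)              # about 0.36 -> retain H0

print(t1, p1)
print(t3, p3)
```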

• There is a correspondence between the confidence intervals derived earlier and tests of hypotheses.
• For instance, the 95% CI we derived earlier for β is (0.16 < β < 1.34).
• Any hypothesis that says β = c, where c is in this interval, will not be rejected at the 5% level in a two-sided test.
• For instance, the hypothesis β = 1 was not rejected, but the hypothesis β = 0 was.
• For one-sided tests we consider one-sided confidence intervals.

2.7 Prediction with the Simple Linear Regression

• The estimated regression equation Ŷi = α̂ + β̂Xi is used for predicting the value (or the average value) of Y for given values of X.
• Let X0 be the given value of X. Then we predict the corresponding value YP of Y by: ŶP = α̂ + β̂X0.
• The true value YP is given by: YP = α + βX0 + εP.
• Hence the prediction error is:
  ŶP − YP = (α̂ − α) + (β̂ − β)X0 − εP,
  and E(ŶP − YP) = E(α̂ − α) + E(β̂ − β)X0 − E(εP) = 0.
• ŶP = α̂ + β̂X0 is therefore an unbiased predictor of YP (indeed, the Best Linear Unbiased Predictor, BLUP!).

• The variance of the prediction error is:
  var(ŶP − YP) = var(α̂ − α) + X0²·var(β̂ − β) + 2X0·cov(α̂ − α, β̂ − β) + var(εP)
              = σ²ΣXi²/(nΣxi²) + σ²X0²/Σxi² − 2X0σ²X̄/Σxi² + σ²
              = σ²[1 + 1/n + (X0 − X̄)²/Σxi²]
• Thus, the variance increases the farther away the value of X0 is from X̄, the mean of the observations on the basis of which α̂ and β̂ have been computed.

• That is, prediction is more precise for values nearer to the mean (as compared to extreme values).
• Within-sample prediction (interpolation): X0 lies within the range of the sample observations on X.
• Out-of-sample prediction (extrapolation): X0 lies outside the range of the sample observations. Not recommended!
• Sometimes we would be interested in predicting the mean of Y, given X0. We use ŶP = α̂ + β̂X0 to predict E[Y | X0] = α + βX0 (the same predictor as before!).
• The prediction error is now: ŶP − E[Y | X0] = (α̂ − α) + (β̂ − β)X0.

• The variance of this prediction error is:
  var(ŶP − E[Y | X0]) = var(α̂ − α) + X0²·var(β̂ − β) + 2X0·cov(α̂ − α, β̂ − β)
  ⇒ var(ŶP − E[Y | X0]) = σ²[1/n + (X0 − X̄)²/Σxi²]
• Again, the variance increases the farther away the value of X0 is from X̄.
• The variance (and the standard error) of the prediction error is smaller in this case (predicting the average value of Y given X) than when predicting an individual value of Y given X.

Example: predict (a) the value of sales, and (b) the average value of sales, for a firm with an advertising expense of six hundred Birr.
a. From Ŷi = 3.6 + 0.75Xi, at Xi = 6: Ŷi = 3.6 + 0.75(6) = 8.1.
   Point prediction: [sales value | advertising of 600 Birr] = 8,100 Birr.
   Interval prediction (95% CI, t(0.025, 8) = 2.306):
   sê(ŶP) = √{σ̂²[1 + 1/n + (X0 − X̄)²/Σxi²]} = 1.3532·√(1 + 1/10 + (6 − 8)²/28) = 1.3532(1.115) ≈ 1.508

   Hence the 95% prediction interval is 8.1 ± (2.306)(1.508) = [4.62, 11.58].
b. From Ŷi = 3.6 + 0.75Xi, at Xi = 6: Ŷi = 3.6 + 0.75(6) = 8.1.
   Point prediction: [average sales | advertising of 600 Birr] = 8,100 Birr.
   Interval prediction (95% CI):
   sê(ŶP*) = √{σ̂²[1/n + (X0 − X̄)²/Σxi²]} = 1.3532·√(1/10 + (6 − 8)²/28) = 1.3532(0.493) ≈ 0.667
   Hence the 95% CI is 8.1 ± (2.306)(0.667) = [6.56, 9.64].
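Both prediction intervals can be reproduced as follows (illustrative Python sketch on the example data):

```python
import numpy as np
from scipy import stats

Y = np.array([11, 10, 12, 6, 10, 7, 9, 10, 11, 10], dtype=float)
X = np.array([10, 7, 10, 5, 8, 8, 6, 7, 9, 10], dtype=float)
n = len(Y)
x = X - X.mean()

beta_hat = (x * (Y - Y.mean())).sum() / (x ** 2).sum()
alpha_hat = Y.mean() - beta_hat * X.mean()
e = Y - (alpha_hat + beta_hat * X)
sigma_hat = np.sqrt((e ** 2).sum() / (n - 2))

X0 = 6.0
Y0_hat = alpha_hat + beta_hat * X0                                          # point prediction = 8.1
t_crit = stats.t.ppf(0.975, n - 2)

# (a) predicting an individual value of Y at X0
se_ind = sigma_hat * np.sqrt(1 + 1/n + (X0 - X.mean())**2 / (x**2).sum())   # about 1.508
# (b) predicting the mean of Y at X0
se_mean = sigma_hat * np.sqrt(1/n + (X0 - X.mean())**2 / (x**2).sum())      # about 0.667

print(Y0_hat - t_crit * se_ind, Y0_hat + t_crit * se_ind)                   # about [4.62, 11.58]
print(Y0_hat - t_crit * se_mean, Y0_hat + t_crit * se_mean)                 # about [6.56, 9.64]
```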

Notes on interpreting the coefficient of X in simple linear regression

1. Y = α + βX + ε  ⇒  dY = β·dX  ⇒  β = dY/dX = slope.
   β is the (average) change in Y resulting from a unit change in X.
2. Y = e^(α + βX + ε)  ⇒  ln Y = α + βX + ε
   ⇒ d(ln Y) = β·dX  ⇒  (1/Y)·dY = β·dX  ⇒  β = (dY/Y)/dX = relative ∆ in Y / absolute ∆ in X
   ⇒ β(×100) = (%∆ in Y)/dX, i.e., β(×100) is the (average) percentage change in Y resulting from a unit change in X, and %∆ in Y = β·dX·(×100).

3. e^Y = A·X^β·E  ⇒  Y = α + β·ln X + ε, with α = ln(A) and ε = ln(E).
   ⇒ β = dY/d(ln X) = dY/[(1/X)dX] = absolute ∆ in Y / relative ∆ in X
   ⇒ dY = (β/100)·[(dX/X)·100] = (0.01β)·(%∆ in X),
   i.e., β(×0.01) is the (average) change in Y resulting from a one-percent change in X.
4. Y = A·X^β·e^ε  ⇒  ln Y = α + β·ln X + ε, with α = ln A.
   ⇒ β = d(ln Y)/d(ln X) = (dY/Y)/(dX/X) = %∆ in Y / %∆ in X = elasticity,
   i.e., β is the (average) percentage change in Y resulting from a one-percent change in X.

STATA SESSION

CHAPTER THREE

THE MULTIPLE LINEAR REGRESSION
3.1 Introduction: The Multiple Linear Regression
3.2 Assumptions of the Multiple Linear Regression
3.3 Estimation: The Method of OLS
3.4 Properties of OLS Estimators
3.5 Partial Correlations and Coefficients of Multiple Determination
3.6 Statistical Inferences in Multiple Linear Regression
3.7 Prediction with Multiple Linear Regression
3.1 Introduction: The Multiple Linear Regression

• A multiple linear regression expresses the relationship between a dependent variable and two or more independent variables as a linear function.
• Population regression (Y-intercept β0, population slopes β1, ..., βK, and random error εi):
  Yi = β0 + β1X1i + β2X2i + ··· + βKXKi + εi
• Sample regression (with residual ei), where Y is the dependent (response) variable and the Xj are the independent (explanatory) variables:
  Yi = β̂0 + β̂1X1i + β̂2X2i + ··· + β̂KXKi + ei
What changes as we move from simple to multiple regression?
1. Potentially more explanatory power with more variables.
2. The ability to control for other variables (and the interaction of the various explanatory variables: correlations and multicollinearity).
3. It is harder to visualize drawing a line through three or more (n)-dimensional space.
4. The R² is no longer simply the square of the correlation coefficient between Y and X.

• Slope (βj): ceteris paribus, Y changes by βj for every 1-unit change in Xj, on average.
• Y-intercept (β0): the average value of Y when all the Xj are zero (may not be meaningful all the time).
• A multiple linear regression model is defined to be linear in the regression parameters rather than in the explanatory variables.
• Thus, the definition of multiple linear regression includes polynomial regression, e.g.
  Yi = β0 + β1X1i + β2X2i + β3X1i² + β4X1iX2i + εi
3.2 Assumptions of the Multiple Linear Regression

• Assumptions 1-7 carry over from Chapter Two:
1. E(εi | Xji) = 0 (for all i = 1, 2, ..., n; j = 1, ..., K).
2. var(εi | Xji) = σ² (homoscedastic errors).
3. cov(εi, εs | Xji, Xjs) = 0 for i ≠ s (no autocorrelation).
4. cov(εi, Xji) = 0: errors are orthogonal to the Xs.
5. Xj is non-stochastic, and must assume different values.
6. n > K + 1 (number of observations > number of parameters to be estimated). The number of parameters is K + 1 in this case (β0, β1, ..., βK).
7. εi ~ N(0, σ²): normally distributed errors.
• Additional assumption:
8. No perfect multicollinearity: no exact linear relation exists between any subset of the explanatory variables.
• In the presence of a perfect (deterministic) linear relationship between/among any set of the Xj, the impact of a single variable (βj) cannot be identified.
• More on multicollinearity in a later chapter!

3.3 Estimation: The Method of OLS

The Case of Two Regressors (X1 and X2)
• Fitted model: Ŷi = β̂0 + β̂1X1i + β̂2X2i, or in deviation form ŷi = β̂1x1i + β̂2x2i.
• Residual: ei = Yi − Ŷi = (Yi − Ȳ) − (Ŷi − Ȳ) = yi − ŷi.
• RSS = Σei² = Σ(yi − β̂1x1i − β̂2x2i)².
• Minimize the RSS with respect to β̂1 and β̂2:
  ∂(RSS)/∂β̂j = −2Σ(yi − β̂1x1i − β̂2x2i)xji = 0,  j = 1, 2  ⇒  Σ ei xji = 0.
• This gives the two normal equations:
  1. Σ(yi − β̂1x1i − β̂2x2i)x1i = 0  ⇒  Σyix1i = β̂1Σx1i² + β̂2Σx1ix2i
  2. Σ(yi − β̂1x1i − β̂2x2i)x2i = 0  ⇒  Σyix2i = β̂1Σx1ix2i + β̂2Σx2i²

• In matrix form the normal equations are F = A·β̂:
  [ Σyix1i ]   [ Σx1i²    Σx1ix2i ] [ β̂1 ]
  [ Σyix2i ] = [ Σx1ix2i  Σx2i²   ] [ β̂2 ]
• Solve for the coefficients by Cramer's rule. The determinant is
  |A| = Σx1i²·Σx2i² − (Σx1ix2i)².
• To find β̂1, replace the first column of A by the elements of F, compute |A1|, and take |A1|/|A|:
  β̂1 = [(Σyix1i)(Σx2i²) − (Σx1ix2i)(Σyix2i)] / [(Σx1i²)(Σx2i²) − (Σx1ix2i)²]
• Similarly, to find β̂2, replace the second column of A by the elements of F, compute |A2|, and take |A2|/|A|:
  β̂2 = [(Σyix2i)(Σx1i²) − (Σx1ix2i)(Σyix1i)] / [(Σx1i²)(Σx2i²) − (Σx1ix2i)²]
• Finally, β̂0 = Ȳ − β̂1X̄1 − β̂2X̄2.
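An illustrative Python sketch of these formulas on made-up data (the Y, X1, X2 values below are hypothetical, not from the slides), cross-checked against the general matrix formula derived further below:

```python
import numpy as np

# Hypothetical data for illustration only
Y = np.array([10., 12., 9., 15., 13., 11., 14., 16.])
X1 = np.array([2., 3., 1., 5., 4., 2., 4., 6.])
X2 = np.array([8., 7., 9., 4., 5., 7., 5., 3.])

y, x1, x2 = Y - Y.mean(), X1 - X1.mean(), X2 - X2.mean()

# Cramer's-rule solution of the two normal equations
det_A = (x1**2).sum() * (x2**2).sum() - ((x1 * x2).sum())**2
b1 = ((y*x1).sum() * (x2**2).sum() - (x1*x2).sum() * (y*x2).sum()) / det_A
b2 = ((y*x2).sum() * (x1**2).sum() - (x1*x2).sum() * (y*x1).sum()) / det_A
b0 = Y.mean() - b1 * X1.mean() - b2 * X2.mean()

# Cross-check against the matrix formula beta_hat = (X'X)^(-1) X'Y
X = np.column_stack([np.ones_like(Y), X1, X2])
print(np.array([b0, b1, b2]))
print(np.linalg.solve(X.T @ X, X.T @ Y))   # should agree
```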

The Case of K Explanatory Variables
• The number of parameters to be estimated is K + 1 (β0, β1, β2, ..., βK).
• Writing out the fitted equation for each observation:
  Y1 = β̂0 + β̂1X11 + β̂2X21 + ... + β̂KXK1 + e1
  Y2 = β̂0 + β̂1X12 + β̂2X22 + ... + β̂KXK2 + e2
  Y3 = β̂0 + β̂1X13 + β̂2X23 + ... + β̂KXK3 + e3
  ...
  Yn = β̂0 + β̂1X1n + β̂2X2n + ... + β̂KXKn + en
⎡ Y1 ⎤ ⎡1 X 11 X 21 X 31 … X K1 ⎤ ⎡ βˆ 0 ⎤ ⎡e 1 ⎤
⎢Y ⎥ ⎢1 ⎥ ⎢ˆ ⎥ ⎢ ⎥
⎢ 2⎥ ⎢ X12 X 22 X 32 … X K2 ⎥ ⎢ β 1 ⎥ ⎢e 2 ⎥
⎢Y3 ⎥ = ⎢1 X13 X 23 X 33 … X K3 ⎥ • ⎢ βˆ 2 ⎥ + ⎢e 3 ⎥
⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥
⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥
⎢⎣Yn ⎥⎦ ⎢⎣1 X1n X 2n X 3n … X Kn ⎥⎦ ⎢⎣βˆ K ⎥⎦ ⎢⎣e n ⎥⎦
.
n ×1

Y = Xβ + e
ˆ
n × ( K + 1) (K +1) ×1 n ×1
JIMMA UNIVERSITY HASSEN A. CHAPTER 3 - 12
2008/09
3.3
3.3 Estimation:
Estimation: The
The Method
Method of
of OLS
OLS
.
⎡e 1 ⎤ ⎡ Y1 ⎤ ⎡1 X 11 X 21 X 31 … X K1 ⎤ ⎡ βˆ 0 ⎤
⎢e ⎥ ⎢Y ⎥ ⎢1 ⎥ ⎢ˆ ⎥
⎢ 2⎥ ⎢ 2⎥ ⎢ X 12 X 22 X 32 … X K2 ⎥ ⎢ β 1 ⎥
⎢e 3 ⎥ = ⎢Y3 ⎥ − ⎢1 X 13 X 23 X 33 … X K3 ⎥ * ⎢ βˆ 2 ⎥
⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥
⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥
⎢⎣e n ⎥⎦ ⎢⎣Yn ⎥⎦ ⎢⎣1 X 1n X 2n X 3n … X Kn ⎥⎦ ⎢⎣βˆ K ⎥⎦
. e = Y − Xβ̂
3.3
3.3 Estimation: The
Estimation: The Method
Method of
of OLS
JIMMA UNIVERSITY HASSEN A. CHAPTER 3 - 13
2008/09
OLS
• RSS = Σei² = e1² + e2² + ... + en² = e'e.
• RSS = (Y − Xβ̂)'(Y − Xβ̂) = Y'Y − Y'Xβ̂ − β̂'X'Y + β̂'X'Xβ̂.
  Since Y'Xβ̂ is a scalar, Y'Xβ̂ = (Y'Xβ̂)' = β̂'X'Y, so
  RSS = Y'Y − 2β̂'X'Y + β̂'(X'X)β̂.
• F.O.C.: ∂(RSS)/∂β̂ = 0  ⇒  −2X'Y + 2X'Xβ̂ = 0  ⇒  −2X'(Y − Xβ̂) = 0.

• The condition X'e = 0 (X' times the residual vector equals the zero vector) implies:
  1. Σei = 0,   2. ΣeiXji = 0 for j = 1, 2, ..., K.
• X'e = X'(Y − Xβ̂) = 0  ⇒  X'Xβ̂ = X'Y, so
  β̂ = (X'X)⁻¹X'Y

• Written out, with X'X a (K+1)×(K+1) matrix and X'Y a (K+1)×1 vector:
  X'X = [ n     ΣX1     ...   ΣXK
          ΣX1   ΣX1²    ...   ΣX1XK
          ...
          ΣXK   ΣXKX1   ...   ΣXK²  ]
  X'Y = [ ΣY,  ΣYX1,  ...,  ΣYXK ]'
• Hence β̂ = (X'X)⁻¹(X'Y):
  [ β̂0 ]   [ n     ΣX1     ΣX2     ...   ΣXK   ]⁻¹ [ ΣY   ]
  [ β̂1 ]   [ ΣX1   ΣX1²    ΣX1X2   ...   ΣX1XK ]   [ ΣYX1 ]
  [ β̂2 ] = [ ΣX2   ΣX2X1   ΣX2²    ...   ΣX2XK ]   [ ΣYX2 ]
  [ ...]   [ ...                                ]   [ ...  ]
  [ β̂K ]   [ ΣXK   ΣXKX1   ΣXKX2   ...   ΣXK²  ]   [ ΣYXK ]
  where β̂ is (K+1)×1, (X'X) is (K+1)×(K+1), and (X'Y) is (K+1)×1.
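A minimal numpy sketch of β̂ = (X'X)⁻¹X'Y on simulated data (the sample size, regressors and true coefficients below are assumptions made only for this demonstration):

```python
import numpy as np

rng = np.random.default_rng(0)
n, K = 50, 3                                    # hypothetical sample size and number of regressors
X_vars = rng.normal(size=(n, K))
eps = rng.normal(scale=0.5, size=n)
beta_true = np.array([1.0, 2.0, -1.5, 0.7])     # [beta0, beta1, ..., betaK], assumed for the demo
X = np.column_stack([np.ones(n), X_vars])       # first column of ones for the intercept
Y = X @ beta_true + eps

beta_hat = np.linalg.inv(X.T @ X) @ (X.T @ Y)   # beta_hat = (X'X)^(-1) X'Y
e = Y - X @ beta_hat                            # residuals

print(beta_hat)
print(np.allclose(X.T @ e, 0))                  # the normal equations X'e = 0 hold
```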

3.4 Properties of OLS Estimators

• Given the assumptions of the classical linear regression model (in Section 3.2), the OLS estimators of the partial regression coefficients are BLUE: linear, unbiased, and of minimum variance in the class of all linear unbiased estimators (the Gauss-Markov Theorem).
• In cases where the small-sample desirable properties (BLUE) may not be available, we look for asymptotic (or large-sample) properties, like consistency and asymptotic normality (via the CLT).
• The OLS estimators are consistent:
  plim (β̂ − β) = 0 and lim var(β̂) = 0 as n → ∞.

3.5 Partial Correlations and Coefficients of Determination

• In the multiple regression equation with two regressors (X1 and X2), Yi = β̂0 + β̂1X1i + β̂2X2i + ei, we can talk of:
  - the joint effect of X1 and X2 on Y, and
  - the partial effect of X1 or X2 on Y.
• The partial effect of X1 is measured by β̂1 and the partial effect of X2 is measured by β̂2.
• Partial effect: holding the other variable constant, or after eliminating the effect of the other variable.
• Thus, β̂1 is interpreted as measuring the effect of X1 on Y after eliminating the effect of X2 on X1.

• Similarly, β̂2 measures the effect of X2 on Y after eliminating the effect of X1 on X2.
• Thus, we can derive the estimator β̂1 of β1 in two steps (by estimating two separate regressions):
• Step 1: Regress X1 on X2 (an auxiliary regression to eliminate the effect of X2 from X1). Let the regression equation be X1 = a + b12X2 + e12, or, in deviation form, x1 = b12x2 + e12. Then
  b12 = Σx1x2 / Σx2².
• e12 is the part of X1 which is free from the influence of X2.

• Step 2: Regress Y on e12 (the residualized X1). Let the regression equation be y = b_ye·e12 + v in deviation form. Then
  b_ye = Σy·e12 / Σe12².
• b_ye is the same as β̂1 in the multiple regression y = β̂1x1 + β̂2x2 + e, i.e., b_ye = β̂1.
• Proof (you may skip the proof!): substituting e12 = x1 − b12x2,
  b_ye = Σy(x1 − b12x2) / Σ(x1 − b12x2)²
       = (Σyx1 − b12Σyx2) / (Σx1² + b12²Σx2² − 2b12Σx1x2),
  where b12 = Σx1x2 / Σx2².

• Substituting b12 = Σx1x2 / Σx2² and simplifying:
  b_ye = [Σyx1 − (Σx1x2/Σx2²)·Σyx2] / [Σx1² + (Σx1x2)²/Σx2² − 2(Σx1x2)²/Σx2²]
       = [Σx2²·Σyx1 − Σx1x2·Σyx2] / [Σx1²·Σx2² − (Σx1x2)²]
• This is exactly the expression obtained earlier for β̂1, so b_ye = β̂1.

• Alternatively, we can derive the estimator β̂1 of β1 as follows:
• Step 1: regress Y on X2, and save the residuals ey2:
  1. y = by2x2 + ey2   [ey2 = residualized Y]
• Step 2: regress X1 on X2, and save the residuals e12:
  2. x1 = b12x2 + e12   [e12 = residualized X1]
• Step 3: regress ey2 (the part of Y cleared of the influence of X2) on e12 (the part of X1 cleared of the influence of X2):
  3. ey2 = α12·e12 + u12
• Then α12 in regression (3) equals β̂1 in y = β̂1x1 + β̂2x2 + e! (See the sketch below.)
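An illustrative sketch of this three-step (residualizing) procedure on simulated data (all numbers below are hypothetical), confirming that the coefficient from step 3 equals β̂1 from the full regression:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
X2 = rng.normal(size=n)
X1 = 0.6 * X2 + rng.normal(size=n)             # X1 correlated with X2
Y = 1.0 + 2.0 * X1 - 1.0 * X2 + rng.normal(size=n)

x1, x2, y = X1 - X1.mean(), X2 - X2.mean(), Y - Y.mean()

# Coefficient on X1 from the full multiple regression (matrix formula)
X = np.column_stack([np.ones(n), X1, X2])
beta_full = np.linalg.solve(X.T @ X, X.T @ Y)[1]

# Step 1: residualize Y on X2;  Step 2: residualize X1 on X2
ey2 = y - ((y * x2).sum() / (x2**2).sum()) * x2
e12 = x1 - ((x1 * x2).sum() / (x2**2).sum()) * x2

# Step 3: regress the residualized Y on the residualized X1
alpha12 = (ey2 * e12).sum() / (e12**2).sum()

print(beta_full, alpha12, np.isclose(beta_full, alpha12))   # equal
```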

• Suppose we have a dependent variable Y and two regressors X1 and X2.
• Suppose also that r²y1 and r²y2 are the squares of the simple correlation coefficients between Y & X1 and Y & X2, respectively.
• Then:
  r²y1 = the proportion of the TSS that X1 alone explains;
  r²y2 = the proportion of the TSS that X2 alone explains.
• On the other hand, R²y·12 is the proportion of the variation in Y that X1 and X2 jointly explain.
• We would also like to measure something else.

For instance:
a) How much does X2 explain after X1 is already
included in the regression equation? Or,
b) How much does X1 explain after X2 is included?
) These are measured by the coefficients of partial
determination: ry 2•1 and r y21• 2 , respectively.
2

) Partial correlation coefficients of the first order:

. ry1•2 & ry 2•1.


) Order = number of X's already in the model.

ry1•2 =
ry1 − ry2r12
(1− r )(1− r )
2
y2
2
12
ry2•1 =
ry2 − ry1r12
(1− r )(1− r )
2
y1
2
12
On Simple and Partial Correlation Coefficients
1. Even if ry1 = 0, ry1·2 will not be zero unless ry2 or r12 or both are zero.
2. If ry1 = 0, and ry2 ≠ 0 & r12 ≠ 0 are of the same sign, then ry1·2 < 0, whereas if they are of opposite signs, ry1·2 > 0.
Example: Let Y = crop yield, X1 = rainfall, X2 = temperature. Assume: ry1 = 0 (no association between crop yield and rainfall); ry2 > 0 & r12 < 0. Then, ry1·2 > 0, i.e., holding temperature constant, there is a positive association between yield and rainfall.
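Illustration only (not in the slides): the first-order partial correlation formula above can be evaluated directly; the zero-order correlations plugged in below are hypothetical values chosen to mimic this crop-yield example (ry1 = 0, ry2 > 0, r12 < 0).

  import numpy as np

  def partial_corr(r_y1, r_y2, r_12):
      # First-order partial correlation of Y and X1, controlling for X2
      return (r_y1 - r_y2 * r_12) / np.sqrt((1 - r_y2**2) * (1 - r_12**2))

  print(partial_corr(0.0, 0.6, -0.5))   # positive, as the example suggests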
3. Since temperature affects both yield & rainfall, in order to find out the net relationship between crop yield and rainfall, we need to remove the influence of temperature. Thus, the simple coefficient of correlation (CC) is misleading.
4. ry1·2 & ry1 need not have the same sign.
5. Interrelationship among the 3 zero-order CCs:
   0 ≤ r²y1 + r²y2 + r²12 − 2·ry1·ry2·r12 ≤ 1
6. ry2 = r12 = 0 does not mean that ry1 = 0.
   That Y & X2 and X1 & X2 are uncorrelated does not mean that Y and X1 are uncorrelated.
The partial r², r²y2·1, measures the (square of the) mutual relationship between Y and X2 after the influence of X1 is eliminated from both Y and X2.
Partial correlations are important in deciding whether or not to include more regressors.
e.g. Suppose we have: two regressors (X1 & X2); r²y2 = 0.95; and r²y2·1 = 0.01.
To explain Y, X2 alone can do a good job (high simple correlation coefficient between Y & X2).
But after X1 is already included, X2 does not add much – X1 has done the job of X2 (very low partial correlation coefficient between Y & X2).
If we regress Y on X1 alone, then we would have: RSS_SIMP = (1 − R²y·1)·∑y²
i.e., of the total variation in Y, an amount = (1 − R²y·1)·∑y²i remains unexplained (by X1 alone).
If we regress Y on X1 and X2, the variation in Y (TSS) that would be left unexplained is: RSS_MULT = (1 − R²y·12)·∑y²
Adding X2 to the model reduces the RSS by:
RSS_SIMP − RSS_MULT = (1 − R²y·1)·∑y² − (1 − R²y·12)·∑y² = (R²y·12 − R²y·1)·∑y²
If we now regress that part of Y freed from the effect of X1 (residualized Y) on the part of X2 freed from the effect of X1 (residualized X2), we will be able to explain the following proportion of the RSS_SIMP:
(R²y·12 − R²y·1)·∑y²i / [(1 − R²y·1)·∑y²i] = (R²y·12 − R²y·1) / (1 − R²y·1) = r²y2·1
This is the Coefficient of Partial Determination (square of the coefficient of partial correlation).
We include X2 if the reduction in RSS (or the increase in ESS) is significant.
But, when exactly? We will see later!
The amount (R²y·12 − R²y·1)·∑y²i represents the incremental contribution of X2 in explaining the TSS.
(R²y·12 − R²y·1) = (1 − R²y·1)·r²y2·1
In this identity: R²y·12 is the proportion of ∑y²i explained by X1 & X2 jointly; R²y·1 is the proportion of ∑y²i explained by X1 alone; (1 − R²y·1) is the proportion of ∑y²i that X1 leaves unexplained; and r²y2·1 is the proportion of that unexplained part accounted for by the incremental contribution of X2.
Coefficient of Determination (in Simple Linear Regression):
R² = β̂²·∑x² / ∑y²   or   R² = β̂·∑xy / ∑y²
Coefficient of Multiple Determination:
R²y·12 = (β̂1·∑x1y + β̂2·∑x2y) / ∑y²; in general, R² = R²y·12...K = [∑_{j=1..K} β̂j·(∑_{i=1..n} x_ji·y_i)] / ∑y²i
Coefficients of Partial Determination:
r²y2·1 = (R²y·12 − R²y·1) / (1 − R²y·1)
r²y1·2 = (R²y·12 − R²y·2) / (1 − R²y·2)
The coefficient of multiple determination (R²) measures the proportion of the variation in the dependent variable explained by (the set of all the regressors in) the model.
However, the R² can be used to compare the goodness-of-fit of alternative regression equations only if the regression models satisfy two conditions.
1) The models must have the same dependent variable.
Reason: TSS, ESS, and RSS depend on the units in which the regressand Yi is measured.
For instance, the TSS for Y is not the same as the TSS for log(Y).
2) The models must have the same number of regressors and parameters (the same value of K).
Reason: Adding a variable to a model will never raise the RSS (or, will never lower the ESS or R²) even if the new variable is not very relevant.
The adjusted R-squared, R̄², attaches a penalty to adding more variables.
It is modified to account for changes/differences in degrees of freedom (df): due to differences in the number of regressors (K) and/or sample size (n).
If adding a variable raises R̄² for a regression, then this is a better indication that it has improved the model than if it merely raises R².
R² = ∑ŷ² / ∑y² = 1 − ∑e² / ∑y²
R̄² = 1 − [∑e² / (n − (K + 1))] / [∑y² / (n − 1)]   (dividing the RSS and TSS by their df)
K + 1 represents the number of parameters to be estimated.
R̄² = 1 − [∑e² / ∑y²]·[(n − 1) / (n − K − 1)]
R̄² = 1 − (1 − R²)·[(n − 1) / (n − K − 1)]   ⇔   1 − R̄² = (1 − R²)·[(n − 1) / (n − K − 1)]
As long as K ≥ 1:  1 − R̄² > 1 − R²  ⇒  R̄² < R². In general, R̄² ≤ R².
As n grows larger (relative to K), R̄² → R².
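Illustration only (not in the slides): a one-function Python version of the R̄² formula above; the numbers plugged in come from the salary example worked out later in this chapter.

  def adjusted_r2(r2, n, k):
      # R-bar-squared = 1 - (1 - R^2)(n - 1)/(n - K - 1)
      return 1 - (1 - r2) * (n - 1) / (n - k - 1)

  # Salary example used later: R^2 = 270.5/272, n = 5, K = 2
  print(adjusted_r2(270.5 / 272, n=5, k=2))   # approximately 0.9890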
1. While R² is always non-negative, R̄² can be positive or negative.
2. R̄² can be used to compare the goodness-of-fit of two regression models only if the models have the same regressand.
3. Including more regressors reduces both the RSS and the df; R̄² rises only if the former effect dominates.
4. R̄² should never be the sole criterion for choosing between/among models:
   Consider expected signs & values of coefficients,
   Look for results consistent with economic theory or reasoning (possible explanations), ...
Numerical Example:
  Y (Salary in '000 Dollars)   X1 (Years of post high school Education)   X2 (Years of Experience)
  30                           4                                           10
  20                           3                                           8
  36                           6                                           11
  24                           4                                           9
  40                           8                                           12
  ƩY = 150                     ƩX1 = 25                                    ƩX2 = 50
Working sums (n = 5):
  ƩX1Y = 812, ƩX2Y = 1552, ƩX1² = 141, ƩX2² = 510, ƩX1X2 = 262, ƩY² = 4772
β̂ = (X'X)⁻¹X'Y, where
X'X = [[n, ƩX1, ƩX2], [ƩX1, ƩX1², ƩX1X2], [ƩX2, ƩX1X2, ƩX2²]]  and  X'Y = [ƩY, ƩYX1, ƩYX2]'
Here: X'X = [[5, 25, 50], [25, 141, 262], [50, 262, 510]],  X'Y = [150, 812, 1552]'
(X'X)⁻¹ = [[40.825, 4.375, −6.25], [4.375, 0.625, −0.75], [−6.25, −0.75, 1]]
β̂ = (X'X)⁻¹X'Y = [β̂0, β̂1, β̂2]' = [−23.75, −0.25, 5.5]'
⇒ Ŷ = −23.75 − 0.25·X1 + 5.5·X2
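Illustration only (not part of the slides): a minimal numpy check of the matrix computation above, using the data of the numerical example.

  import numpy as np

  # Data from the numerical example
  Y  = np.array([30, 20, 36, 24, 40], dtype=float)
  X1 = np.array([ 4,  3,  6,  4,  8], dtype=float)
  X2 = np.array([10,  8, 11,  9, 12], dtype=float)

  X = np.column_stack([np.ones_like(Y), X1, X2])   # columns: constant, X1, X2
  beta_hat = np.linalg.inv(X.T @ X) @ (X.T @ Y)
  print(beta_hat)                                  # approximately [-23.75, -0.25, 5.5]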




One more year of experience, after controlling for years of education, results in a $5,500 rise in salary, on average.
Or, if we consider two persons with the same level of education, the one with one more year of experience is expected to have a higher salary of $5,500.
Similarly, for two people with the same level of experience, the one with an education of one more year is expected to have a lower annual salary of $250.
Experience looks far more important than education (which has a negative sign).
The constant term −23.75 is the salary one would get with no experience and no education.
But, a negative salary is impossible.
Then, what is wrong?
1. The sample must have been drawn from a subgroup. We have persons with experience ranging from 8 to 12 years (and post high school education ranging from 3 to 8 years). So we cannot extrapolate the results too far out of this sample range.
2. Model specification: is our model correctly specified (variables, functional form); does our data set meet the underlying assumptions?
1. TSS = ∑y² = ∑Y² − n·Ȳ²
   TSS = 4772 − 5(30)²  ⇒  TSS = 272
2. ESS = ∑ŷ² = ∑(β̂1x1 + β̂2x2)² = β̂1²∑x1² + β̂2²∑x2² + 2β̂1β̂2∑x1x2
   ESS = β̂1²(∑X1² − nX̄1²) + β̂2²(∑X2² − nX̄2²) + 2β̂1β̂2(∑X1X2 − nX̄1X̄2)
   ESS = (−0.25)²[141 − 5(5)²] + (5.5)²[510 − 5(10)²] + 2(−0.25)(5.5)[262 − 5(5)(10)]
   ⇒ ESS = 270.5
   OR: ESS = β̂1∑yx1 + β̂2∑yx2 = β̂1(∑YX1 − nX̄1Ȳ) + β̂2(∑YX2 − nX̄2Ȳ)
   ⇒ ESS = −0.25(62) + 5.5(52) = 270.5
3. RSS = TSS − ESS  ⇒  RSS = 272 − 270.5  ⇒  RSS = 1.5
4. R² = ESS/TSS = 270.5/272  ⇒  R² = 0.9945
   Our model (education and experience together) explains about 99.45% of the wage differential.
5. R̄² = 1 − [RSS/(n − K − 1)] / [TSS/(n − 1)] = 1 − (1.5/2)/(272/4)  ⇒  R̄² = 0.9890
Regressing Y on X1:
β̂y1 = ∑yx1 / ∑x1² = (∑YX1 − nX̄1Ȳ) / (∑X1² − nX̄1²) = 62/16 = 3.875
6. R²y·1 = ESS_SIMP / TSS = β̂y·1·∑yx1 / ∑y² = (3.875 × 62) / 272 = 0.8833
   RSS_SIMP = (1 − 0.8833)(272) = 0.1167(272) = 31.75
   X1 (education) alone explains about 88.33% of the differences in wages, and leaves about 11.67% (= 31.75) unexplained.
7. R²y·12 − R²y·1 = 0.9945 − 0.8833 = 0.1112
   (R²y·12 − R²y·1)·∑y² = 0.1112(272) = 30.25
   X2 (experience) enters the wage equation with an extra (marginal) contribution of explaining about 11.12% (= 30.25) of the total variation in wages.
   Note that this is the contribution of the part of X2 which is not related to (free from the influence of) X1.
8. r²y2·1 = (R²y·12 − R²y·1) / (1 − R²y·1) = (0.9945 − 0.8833) / (1 − 0.8833) = 0.9528
   Or, X2 (experience) explains about 95.28% (= 30.25) of the wage differential that X1 has left unexplained (= 31.75).
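Illustration only (not in the slides): the following Python sketch reproduces R²y·12, R²y·1 and the partial r²y2·1 reported above from the raw data.

  import numpy as np

  Y  = np.array([30, 20, 36, 24, 40], dtype=float)
  X1 = np.array([ 4,  3,  6,  4,  8], dtype=float)
  X2 = np.array([10,  8, 11,  9, 12], dtype=float)

  def r_squared(y, *regressors):
      # R-squared of an OLS regression of y on a constant plus the given regressors
      X = np.column_stack([np.ones_like(y)] + list(regressors))
      resid = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
      return 1 - resid @ resid / np.sum((y - y.mean()) ** 2)

  R2_full, R2_x1 = r_squared(Y, X1, X2), r_squared(Y, X1)
  print(R2_full, R2_x1)                      # about 0.9945 and 0.8833
  print((R2_full - R2_x1) / (1 - R2_x1))     # partial r^2 (y2.1), about 0.9528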
3.6 Statistical Inferences in Multiple Linear Regression
The case of two regressors (X1 & X2):
εi ~ N(0, σ²)
β̂1 ~ N(β1, var(β̂1));  var(β̂1) = σ² / [∑x1i²·(1 − r12²)]
β̂2 ~ N(β2, var(β̂2));  var(β̂2) = σ² / [∑x2i²·(1 − r12²)]
β̂0 ~ N(β0, var(β̂0));  var(β̂0) = σ²/n + X̄1²·var(β̂1) + X̄2²·var(β̂2) + 2X̄1X̄2·cov(β̂1, β̂2)
cov(β̂1, β̂2) = −r12²·σ² / [∑x1ix2i·(1 − r12²)],  where r12² = (∑x1ix2i)² / (∑x1i²·∑x2i²)
∑x1i²·(1 − r12²) is the RSS from regressing X1 on X2.
∑x2i²·(1 − r12²) is the RSS from regressing X2 on X1.
σ̂² = RSS/(n − 3) is an unbiased estimator of σ².
In general:
var–cov(β̂) = σ²(X'X)⁻¹, where X'X = [[n, ∑X1, …, ∑XK], [∑X1, ∑X1², …, ∑X1XK], …, [∑XK, ∑XKX1, …, ∑XK²]]
The estimated variance–covariance matrix is: vâr–côv(β̂) = σ̂²(X'X)⁻¹
Note that:
(a) (X'X)⁻¹ is the same matrix we use to derive the OLS estimates, and
(b) σ̂² = RSS/(n − 3) in the case of two regressors.
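Illustration only (not in the slides): σ̂² and the estimated variance–covariance matrix for the salary example can be reproduced with numpy as follows.

  import numpy as np

  Y = np.array([30, 20, 36, 24, 40], dtype=float)
  X = np.column_stack([np.ones(5), [4, 3, 6, 4, 8], [10, 8, 11, 9, 12]])

  beta_hat = np.linalg.lstsq(X, Y, rcond=None)[0]
  resid = Y - X @ beta_hat
  sigma2_hat = resid @ resid / (5 - 2 - 1)          # RSS/(n-3), about 0.75
  vcov = sigma2_hat * np.linalg.inv(X.T @ X)        # estimated var-cov matrix
  print(sigma2_hat)
  print(vcov)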

In the general case of K explanatory variables, σ̂² = RSS/(n − K − 1) is an unbiased estimator of σ².
Note:
Ceteris paribus, the higher the correlation coefficient between X1 & X2 (r12), the less precise will the estimates β̂1 & β̂2 be, i.e., the CIs for the parameters β1 & β2 will be wider.
Ceteris paribus, the higher the degree of variation of the Xj's (the more the Xj's vary in our sample), the more precise will the estimates be – narrow CIs for the population parameters.
The above two points are contained in:
β̂j ~ N(βj, σ²/RSSj);  ∀ j = 1, 2, …, K,
where RSSj is the RSS from an auxiliary regression of Xj on all other (K–1) X's and a constant.
We use the t test to test hypotheses about single parameters and single linear functions of parameters.
To test hypotheses about & construct intervals for individual βj use:
(β̂j − βj*) / sê(β̂j) ~ t(n−K−1);  ∀ j = 0, 1, …, K.
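Illustration only (not in the slides): using the numbers from the salary example (β̂2 = 5.5, vâr(β̂2) = 0.75, n − K − 1 = 2), the t statistic and its p-value can be computed with scipy.

  import numpy as np
  from scipy import stats

  beta2_hat, var_beta2, df = 5.5, 0.75, 2
  t_stat = beta2_hat / np.sqrt(var_beta2)            # about 6.35
  p_value = 2 * stats.t.sf(abs(t_stat), df)
  print(t_stat, p_value, stats.t.ppf(0.975, df))     # critical value about 4.30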
Tests about and interval estimation of the error variance σ² are based on:
RSS/σ² = (n − K − 1)·σ̂²/σ² ~ χ²(n−K−1)
Tests of several parameters and several linear functions of parameters are F-tests.
Procedures for Conducting F-tests:
1. Compute the RSS from regressing Y on all Xj's (URSS = Unrestricted Residual Sum of Squares).
2. Compute the RSS from the regression with the hypothesized/specified values of the parameters (β's) (RRSS = Restricted RSS).
3. Under H0 (if the restriction is correct):
   [(RRSS − URSS)/J] / [URSS/(n − K − 1)] ~ F(J, n−K−1),  equivalently  [(R²U − R²R)/J] / [(1 − R²U)/(n − K − 1)] ~ F(J, n−K−1),
   where J is the number of restrictions imposed.
If the F-calculated is greater than the F-tabulated, then the RRSS is (significantly) greater than the URSS, and thus we reject the null.
A special F-test of common interest is to test the null that none of the X's influence Y (i.e., that our regression is useless!):
Test H0: β1 = β2 = … = βK = 0 vs. H1: H0 is not true.
URSS = (1 − R²)·∑y²i = ∑y²i − ∑_{j=1..K} β̂j·(∑_{i=1..n} x_ji·y_i);  RRSS = ∑y²i.
⇒ [(RRSS − URSS)/K] / [URSS/(n − K − 1)] = [R²/K] / [(1 − R²)/(n − K − 1)] ~ F(K, n−K−1)
With reference to our example on wages, test the following at the 5% level of significance:
a) β1 = 0;  b) β2 = 0;  c) β0 = 0;
d) the overall significance of the model; and
e) β1 = β2.
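Illustration only (not in the slides): the overall-significance F statistic for the salary example, computed from R², with scipy supplying the critical value.

  from scipy import stats

  R2, K, n = 270.5 / 272, 2, 5
  F = (R2 / K) / ((1 - R2) / (n - K - 1))      # about 180 (the slides report 180.82 using R^2 rounded to 0.9945)
  print(F, stats.f.ppf(0.95, K, n - K - 1))    # critical value F(2,2) = 19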
var–côv(β̂) = σ̂²(X'X)⁻¹
(X'X)⁻¹ = [[40.825, 4.375, −6.25], [4.375, 0.625, −0.75], [−6.25, −0.75, 1]]
σ² is estimated by: σ̂² = RSS/(n − K − 1) = 1.5/2 = 0.75
var–côv(β̂) = 0.75·(X'X)⁻¹ = [[30.61875, 3.28125, −4.6875], [3.28125, 0.46875, −0.5625], [−4.6875, −0.5625, 0.75]]
The diagonal elements are vâr(β̂0), vâr(β̂1) and vâr(β̂2); the off-diagonal elements are the corresponding côv's.
a) tc = (β̂1 − 0)/sê(β̂1) = −0.25/√0.46875 ≈ −0.37;  ttab = t0.025(2) ≈ 4.30
   |tcal| ≤ ttab  ⇒  we do not reject the null.
b) tc = (β̂2 − 0)/sê(β̂2) = 5.5/√0.75 ≈ 6.35;  |tcal| > ttab  ⇒  reject the null.
c) tc = (β̂0 − 0)/sê(β̂0) = −23.75/√30.61875 ≈ −4.29;  |tcal| ≤ ttab  ⇒  we do not reject the null!!!
d) Fc = [R²/K] / [(1 − R²)/(n − K − 1)] = (0.9945/2)/(0.0055/2) ≈ 180.82
   Ftab = F0.05(2, 2) ≈ 19;  Fcal > Ftab  ⇒  reject the null.
e) From Ŷi = β̂0 + β̂1X1i + β̂2X2i:  URSS = 1.5
   Now impose β1 = β2 and run Ŷi = β̂0 + β̂(X1i + X2i)  ⇒  RRSS = 12.08
   Fc = [(RRSS − URSS)/J] / [URSS/(n − K − 1)] = [(12.08 − 1.5)/1] / (1.5/2) ≈ 14.11
   Ftab = F0.05(1, 2) ≈ 18.51;  Fcal ≤ Ftab  ⇒  we do not reject the null.
Note that we can also use a t-test to test the single restriction that β1 = β2 (equivalently, β1 − β2 = 0):
(β̂1 − β̂2 − 0) / sê(β̂1 − β̂2) = (β̂1 − β̂2) / √[vâr(β̂1) + vâr(β̂2) − 2côv(β̂1, β̂2)] ~ t(n−K−1)
tc = −5.75 / √[0.46875 + 0.75 − 2(−0.5625)] ≈ −3.76;  ttab = t0.025(2) ≈ 4.30
|tcal| < ttab  ⇒  do not reject the null.
The same result as the F-test, but the F-test is easier to handle.
To sum up:
Assuming that our model is correctly specified and all the assumptions are satisfied,
Education (after controlling for experience) doesn't have a significant influence on wages.
In contrast, experience (after controlling for education) is a significant determinant of wages.
The intercept parameter is also insignificant (though at the margin). Less important!
Overall, the model explains a significant portion of the observed wage pattern.
We cannot reject the claim that the coefficients of the two regressors are equal.
3.7 Prediction with Multiple Linear Regression
In Chapter 2, we used the estimated simple linear regression model for prediction: (i) mean prediction (i.e., predicting the point on the population regression function (PRF)), and (ii) individual prediction (i.e., predicting an individual value of Y), given the value of the regressor X (say, X = X0).
The formulas for prediction are also similar to those in the case of simple regression except that, to compute the standard error of the predicted value, we need the variances and covariances of all the regression coefficients.
Note:
Even if the R² for the SRF is very high, it does not necessarily mean that our forecasts are good.
The accuracy of our prediction depends on the stability of the coefficients between the period used for estimation and the period used for prediction.
More care must be taken when the values of the regressors (X's) themselves are forecasts.
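Illustration only: the slides do not restate the prediction formulas, so the sketch below uses the standard textbook expression var(Ŷ0) = σ̂²·x0'(X'X)⁻¹x0 for a mean prediction, evaluated at a hypothetical point x0 for the salary example.

  import numpy as np

  Y = np.array([30, 20, 36, 24, 40], dtype=float)
  X = np.column_stack([np.ones(5), [4, 3, 6, 4, 8], [10, 8, 11, 9, 12]])
  beta_hat = np.linalg.lstsq(X, Y, rcond=None)[0]
  sigma2_hat = np.sum((Y - X @ beta_hat) ** 2) / (5 - 2 - 1)

  x0 = np.array([1.0, 5.0, 10.0])                    # hypothetical point: X1 = 5, X2 = 10
  y0_hat = x0 @ beta_hat                             # point (mean) prediction
  se_mean = np.sqrt(sigma2_hat * x0 @ np.linalg.inv(X.T @ X) @ x0)
  print(y0_hat, se_mean)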
CHAPTER FOUR
VIOLATING THE ASSUMPTIONS OF THE CLASSICAL LINEAR REGRESSION MODEL (CLRM)

4.1 Introduction

) The estimates derived using OLS techniques


and the inferences based on those estimates are
valid only under certain conditions.
) In general, these conditions amount to the
regression model being "well-specified".
) A regression model is statistically well-specified

. for an estimator (say, OLS) if all of the


assumptions required for the optimality of the
estimator are satisfied.
) The model will be statistically misspecified if
one/more of the assumptions are not satisfied.
)Before we proceed to testing for violations of (or


relaxing) the assumptions of the CLRM
sequentially, let us recall: (i) the basic steps in a
scientific enquiry & (ii) the assumptions made.
I. The Major Steps Followed in a Scientific Study:
Study
1. Specifying a statistical model consistent with

. theory (or a model representing the theoretical


relationship between a set of variables).
)This involves at least two choices to be made:
A.The choice of variables to be included into
the model, and
B.The choice of the functional form of the link


(linear in variables, linear in logarithms of
the variables, polynomial in regressors, etc.)
2. Selecting an estimator with certain desirable
properties (provided that the regression model
in question satisfies a given set of conditions).

. 3. Estimating the model. When can one estimate a


model? (sample size? perfect multicollinearity?)
4. Testing for the validity of assumptions made.
5. a) If there is no evidence of misspecification, go
on to conducting statistical inferences.
5. b) If the tests show evidence of misspecification


in one or more relevant forms, then there are
two possible courses of action implied:
)If the precise form of model misspecification
can be established, then it may be possible to
find an alternative estimator that is optimal

. under the particular sort of misspecification.


)Regard statistical misspecification as an
indication of a defective model. Then, search
an alternative, well-specified regression
model, and start over (return to Step 1).
II. The Assumptions of the CLRM:
A1: n > K+1. Otherwise, estimation is not possible.
A2: No perfect multicollinearity among the X's.
    Implication: any X must have some variation.
A3: ɛi|Xji ~ IID(0, σ²), i.e., E(εsεt|Xj) = σ² for s = t and 0 for s ≠ t.
    A3.1: var(ɛi|Xj) = σ² (0 < σ² < ∞).
    A3.2: cov(ɛi, ɛs|Xj) = 0, for all i ≠ s; s = 1, …, n.
A4: ɛi's are normally distributed: ɛi|Xj ~ N(0, σ²).
A5: E(ɛi|Xj) = E(ɛi) = 0; i = 1, …, n & j = 1, …, K.
    A5.1: E(ɛi) = 0 and X's are non-stochastic, or
    A5.2: E(ɛiXji) = 0 or E(ɛi|Xj) = E(ɛi) with stochastic X's.
    Implication: ɛ is independent of Xj & thus cov(ɛ, Xj) = 0.
)Generally speaking, the several tests for the


violations of the assumptions of the CLRM are
tests of model misspecification.
)The values of the test statistics for testing
particular H0's tend to reject these H0's when
the model is misspecified in some way.

. e.g., tests for heteroskedasticity or autocorrelation


are sensitive to omission of relevant variables.
)A significant test statistic may indicate hetero-
skedastic (or autocorrelated) errors, but it may
also reflect omission of relevant variables.
Outline:
1. Small Samples (A1?)
2. Multicollinearity (A2?)
3. Non-Normal Errors (A4?)
4. Non-IID Errors (A3?):
A. Heteroskedasticity (A3.1?)
B. Autocorrelation (A3.2?)
5. Endogeneity (A5?):

. A. Stochastic Regressors and Measurement Error


B. Model Specification Errors:
a. Omission of Relevant Variables
b. Wrong Functional Form
c. Inclusion of Irrelevant Variables (?XXX)
d. Stability of Parameters
C. Simultaneity (or Reverse Causality)
4.2 Sample Size: Problems with Few Data Points
Requirement for estimation: n > K+1.
If the number of data points (n) is small, it may be difficult to detect violations of assumptions.
With small n, it is hard to detect heteroskedasticity or nonnormality of the ɛi's even when present.
Though none of the assumptions is violated, a linear regression with small n may not have sufficient power to reject βj = 0, even if βj ≠ 0.
If [(K+1)/n] > 0.4, it will often be difficult to fit a reliable model.
Rule of thumb: aim to have n at least 6 times (and ideally at least 10 times) the number of regressors.
4.3 Multicollinearity

) Many social research studies use a large


number of predictors.
) Problems arise when the various predictors are
highly and linearly related (highly collinear).
) Recall that, in a multiple regression, only the
independent variation in a regressor (an X) is

. used in estimating the coefficient of that X.


) If two X's (X1 & X2) are highly correlated with
each other, then the coefficients of X1 & X2 will
be determined by the minority of cases where
they don’t vary together (or overlap).
) Perfect multicollinearity: occurs when one (or


more) of the regressors in a model (e.g., XK) is a
linear function of other/s (Xi, i = 1, 2, …, K-1).
) For instance, if X2 = 2X1, then there is a perfect
(an exact) multicollinearity between X1 & X2.
Suppose, PRF: Y = β0 + β1X1 + β2X2, & X2 = 2X1.
The OLS technique yields 3 normal equations:
∑Yi = nβ̂0 + β̂1∑X1i + β̂2∑X2i
∑YiX1i = β̂0∑X1i + β̂1∑X1i² + β̂2∑X1iX2i
∑YiX2i = β̂0∑X2i + β̂1∑X1iX2i + β̂2∑X2i²
But, substituting 2X1 for X2 in the 3rd equation yields the 2nd equation.
That is, one of the normal equations is in fact redundant.
Thus, we have only 2 independent equations (1 & 2 or 1 & 3) but 3 unknowns (β's) to estimate.
As a result, the normal equations will reduce to:
∑Yi = nβ̂0 + [β̂1 + 2β̂2]∑X1i
∑YiX1i = β̂0∑X1i + [β̂1 + 2β̂2]∑X1i²
In matrix form:
(∑Yi, ∑YiX1i)' = [[n, ∑X1i], [∑X1i, ∑X1i²]]·(β̂0, β̂1 + 2β̂2)'
The number of β's to be estimated is greater than the number of independent equations.
So, if two or more X's are perfectly correlated, it is not possible to find the estimates for all β's.
i.e., we cannot find β̂1 & β̂2 separately, but only β̂1 + 2β̂2:
α̂ = β̂1 + 2β̂2 = (∑YiX1i − nX̄1Ȳ) / (∑X1i² − nX̄1²)   &   β̂0 = Ȳ − [β̂1 + 2β̂2]·X̄1
) High, but not perfect, multicollinearity: two or


more regressors in a model are highly (but
imperfectly) correlated. e.g. X1 = 3 – 5XK + ui.
) This makes it difficult to isolate the effect of
each of the highly collinear X's on Y.
) If there is inexact but strong multicollinearity:

. * The collinear regressors (X's) explain the


same variation in the regressand (Y).
* Estimated coefficients change dramatically,
depending on the inclusion/exclusion of
other predictor/s into (or out of) the model.
* .β̂' s tend to be very shaky from one sample to


another.
* Standard errors of β̂' s will be inflated.
* As a result, t-tests will be insignificant & CIs
wide (rejecting H0: βj = 0 becomes very rare).
* We get low t-ratios but high R2 (or F): there

. is not enough individual variation in the X's,


but a lot of common variation.
Yet, the OLS estimators are BLUE.
BLUE – a property of repeated-sampling – says nothing about estimates from a single sample.
) But, multicollinearity is not a problem if the


principal aim is prediction, given that the same
pattern of multicollinearity persists into the
forecast period.
Sources of Multicollinearity:
) Improper use of dummy variables. (Later!)

.) Including the same (or almost the same)


variable twice (e.g. different operationaliaztions
of a single concept used together).
) Method of data collection used (e.g. sampling
over a limited range of X values).
)Including a variable computed from other
variables in the model (e.g. using family income,
mother’s income & father’s income together).
)Adding many polynomial terms to a model,
especially if the range of the X variable is small.
)Or, it may just happen that variables are highly
correlated (without any fault of the researcher).

. Detecting Multicollinearity:
)The classic case of multicollinearity occurs
when R2 is high (& significant), but none of X's
is significant (some of the X's may even have
wrong sign).
) Detecting the presence of multicollinearity is


more difficult in the less clear-cut cases.
) Sometimes, simple or partial coefficients of
correlation among regressors are used.
) However, serious multicollinearity may exist
even if these correlation coefficients are low.

A statistic commonly used for detecting multicollinearity is the VIF (Variance Inflation Factor).
From a simple linear regression of Y on Xj we have: var(β̂j) = σ² / ∑x²ji
From the multiple linear regression of Y on all the X's:
var(β̂j) = [σ² / ∑x²ji]·[1/(1 − R²j)] = VIFj·σ² / ∑x²ji,
where R²j is the R² from regressing Xj on all other X's.
The difference between the variance of β̂j in the two cases arises from the correlation between Xj and the other X's, and is captured by: VIFj = 1/(1 − R²j)
If Xj is not correlated with the other X's, R²j = 0, VIFj = 1 and the two variances will be identical.
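Illustration only (not in the slides): VIFs can be computed by hand from the auxiliary regressions described above; the data below are hypothetical.

  import numpy as np

  def vif(X):
      # X: (n, k) array of regressors (no constant column)
      # VIF_j = 1/(1 - R^2_j), with R^2_j from regressing X_j on the other X's plus a constant
      n, k = X.shape
      out = []
      for j in range(k):
          others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
          fitted = others @ np.linalg.lstsq(others, X[:, j], rcond=None)[0]
          r2_j = 1 - np.sum((X[:, j] - fitted) ** 2) / np.sum((X[:, j] - X[:, j].mean()) ** 2)
          out.append(1 / (1 - r2_j))
      return out

  rng = np.random.default_rng(1)
  x1 = rng.normal(size=100)
  x2 = 0.95 * x1 + 0.1 * rng.normal(size=100)    # nearly collinear with x1
  x3 = rng.normal(size=100)
  print(vif(np.column_stack([x1, x2, x3])))      # large VIFs for x1 and x2, about 1 for x3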
) As Rj2 increases, VIFj rises.
) If Xj is perfectly correlated with the other X's,
VIFj = ∞. Implication for precision (or CIs)???
) Thus, a large VIF is a sign of serious/severe (or
“intolerable”) multicollinearity.
) There is no cutoff point on VIF (or any other

. measure) beyond which multicollinearity is


taken as intolerable.
) A rule of thumb: VIF > 10 is a sign of severe
multicollinearity.
# In stata (after regression): vif
Solutions to Multicollinearity:
Solutions depend on the sources of the problem.
The formula below is indicative of some solutions:
vâr(β̂j) = σ̂² / [∑x²ji·(1 − R²j)] = ∑ei² / [(n − K − 1)·∑x²ji·(1 − R²j)], where σ̂² = ∑ei²/(n − K − 1)
More precision is attained with lower variances of coefficients. This may result from:
a) Smaller RSS (or variance of error term) –
less “noise”, ceteris paribus (cp);
b) Larger sample size (n) relative to the
number of parameters (K+1), cp;
c) Greater variation in values of each Xj, cp;
d) Less correlation between regressors, cp.
)Thus, serious multicollinearity may be solved by
using one/more of the following:
1. “Increasing sample size” (if possible). ???
2. Utilizing a priori information on parameters

. (from theory or prior research).


3. Transforming variables or functional form:
a) Using differences (ΔX) instead of levels (X)
in time series data where the cause may be
X's moving in the same direction over time.
b) In polynomial regressions, using deviations


of regressors from their means ((Xj–X̅j)
instead of Xj) tends to reduce collinearity.
c) Usually, logs are less collinear than levels.
4. Pooling cross-sectional and time-series data.
5. Dropping one of the collinear predictors. ???

. However, this may lead to the omitted variable


bias (misspecification) if theory tells us that the
dropped variable should be incorporated.
6. To be aware of its existence and employing
cautious interpretation of results.
4.4 Non-normality of the Error Term
) Normality is not required to get BLUE of β's.


) The CLRM merely requires errors to be IID.
) Normality of errors is required only for valid
hypothesis testing, i.e., validity of t- and F-tests.
) In small samples, if the errors are not normally
distributed, the estimated parameters will not

. follow normal distribution, which complicates


inference.
) NB: there is no obligation on X's to be normally
distributed.
# In stata (after regression): kdensity residual, normal
)A formal test of normality is the Shapiro-Wilk


test [H0: errors are normally distributed].
)Large p-value shows that H0 cannot be rejected.
#In stata: swilk residual
)If H0 is rejected, transforming the regressand or
re-specifying (the functional form of) the model

.may help.
)With large samples, thanks to the central limit
theorem, hypothesis testing may proceed even if
distribution of errors deviates from normality.
)Tests are generally asymptotically valid.
4.5 Non-IID Errors
)The assumption of IID errors is violated if a


(simple) random sampling cannot be assumed.
)More specifically, the assumption of IID errors
fails if the errors:
1) are not identically distributed, i.e., if var(εi|Xji)
varies with observations – heteroskedasticity.

. 2) are not independently distributed, i.e., if errors


are correlated to each other – serial correlation.
3) are both heteroskedastic & autocorrelated.
This is common in panel & time series data.
4.5.1 Heteroskedasticity
) One of the assumptions of the CLRM is homo-


skedasticity, i. e., var(εi|X) = var(εi) = σ2.
) This will be true if the observations of the error
term are drawn from identical distributions.
) Heteroskedasticity is present if var(εi)=σi2≠σ2:
different variances for different segments of the
population (segments by the values of the X's).

.e.g.: Variability of consumption rises with rise in


income, i.e., people with higher incomes display
greater variability in consumption.
) Heteroskedasticity is more likely in cross-
sectional than time-series data.
)With a correctly specified model (in any other


aspect), but heteroskedastic errors, the OLS
coefficient estimators are unbiased & consistent
but inefficient.
)Reason: OLS estimator for σ2 (and thus for the
standard errors of the coefficients) are biased.

.)Hence, confidence intervals based on biased


standard errors will be wrong, and the t & F
tests will be misleading/invalid.
NB: Heteroskedasticity could be a symptom of
other problems (e.g. omitted variables).
) If heteroskedasticity is a result (or a reflection)


of specification error (say, omitted variables),
OLS estimators will be biased & inconsistent.
) In the presence of heteroskedasticity, OLS is
not optimal as it gives equal weight to all
observations, when, in fact, observations with

. larger error variances (σi2) contain less


information than those with smaller σi2 .
) To correct, give less weight to data points with
greater σi2 and more weight to those with
smaller σi2. [i.e., use GLS (WLS or FGLS)].
Detecting Heteroskedasticity:
A. Graphical Method
) Run OLS and plot squared residuals versus
fitted value of Y (Ŷ) or against each X.
# In stata (after regression): rvfplot
) The graph may show some relationship (linear,

. quadratic, …), which provides clues as to the


nature of the problem and a possible remedy.
e.g. let, the plot of ũ2 (from Y = α + βX + u) against
X signifies that var(ui) increases proportional
to X2; (var(ui)=σi2 =cXi2). What is the Solution?
Now, transform the model by dividing Y, α, X and u by X:
Y/X = α·(1/X) + β·(X/X) + u/X  ⇒  y* = α·x* + β + u*
Now, u* is homoskedastic: var(ui*) = c; i.e., using WLS solves heteroskedasticity!
WLS yields BLUE for the transformed model.
If the pattern of heteroskedasticity is unknown, log transformation of both sides (compressing the scale of measurement of variables) usually solves heteroskedasticity.
This cannot be used with 0 or negative values.
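Illustration only (not in the slides): a minimal sketch of the transformation just described, assuming var(ui) = c·Xi² and using hypothetical data.

  import numpy as np

  rng = np.random.default_rng(2)
  X = rng.uniform(1, 10, size=200)
  u = rng.normal(scale=0.5 * X)                  # heteroskedastic errors, sd proportional to X
  Y = 2.0 + 3.0 * X + u

  # WLS via the transformed model: Y/X = alpha*(1/X) + beta + u/X
  y_star = Y / X
  Z = np.column_stack([1 / X, np.ones_like(X)])  # columns: x* = 1/X, constant
  alpha_hat, beta_hat = np.linalg.lstsq(Z, y_star, rcond=None)[0]
  print(alpha_hat, beta_hat)                     # close to 2 and 3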
B. A Formal Test:
) The most-often used test for heteroskedasticity
is the Breusch-Pagan (BP) test.
H0: homoskedasticity vs. Ha: heteroskedasticity
) Regress ũ2 on Ŷ or ũ2 on the original X's, X2's
and, if enough data, cross-products of the X's.

.) H0 will be rejected for high values of the test


statistic [n*R2~χ2q] or for low p-values.
) n & R2 are obtained from the auxiliary
regression of ũ2 on q (number of) predictors.
# In stata (after regression): hettest or hettest, rhs
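Illustration only (not in the slides): a hand-coded version of the auxiliary regression behind the B-P test, on hypothetical heteroskedastic data; in practice the Stata commands above (or an equivalent canned routine) would be used.

  import numpy as np
  from scipy import stats

  rng = np.random.default_rng(3)
  n = 200
  x = rng.uniform(1, 10, size=n)
  y = 1.0 + 2.0 * x + rng.normal(scale=x)                 # heteroskedastic errors

  X = np.column_stack([np.ones(n), x])
  resid = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]

  # Auxiliary regression of squared residuals on the regressors
  u2 = resid ** 2
  fitted = X @ np.linalg.lstsq(X, u2, rcond=None)[0]
  R2_aux = 1 - np.sum((u2 - fitted) ** 2) / np.sum((u2 - u2.mean()) ** 2)

  LM = n * R2_aux                                         # compare with chi-square, q = 1 here
  print(LM, stats.chi2.sf(LM, df=1))                      # small p-value -> heteroskedasticity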
The B-P test as specified above:
- uses the regression of ũ² on Ŷ or on the X's;
- and thus consumes fewer degrees of freedom;
- but tests for linear heteroskedasticity only;
- and has problems when the errors are not normally distributed.
# Alternatively, use: hettest, iid or hettest, rhs iid
This doesn't need the assumption of normality.
) If you want to include squares & cross products
of X's, generate these variables first and use:
# hettest varlist or hettest varlist, iid
) The hettest varlist, iid version of B-P test is the
same as White’s test for heteroskedasticity:
# In stata (after regression): imtest, white


Solutions to (or Estimation with) Heteroskedasticity
) If heteroskedasticity is detected, first check for
some other specification error in the model
(omitted variables, wrong functional form, …).
) If it persists even after correcting for other

. specification errors, use one of the following:


1. Use better method of estimation (WLS/FGLS);
2. Stick to OLS but use robust (heteroskedasticity
consistent) standard errors.
# In stata: reg Y X1 … XK, robust
This is OK even with homoskedastic errors.
4.5.2 Autocorrelation
) Error terms are autocorrelated if error terms


from different (usually adjacent) time periods
(cross-sectional units) are correlated, E(εiεj)≠0.
) Autocorrelation in cross-sectional data is called
spatial autocorrelation (in space, not over time).
) However, spatial autocorrelation is uncommon

. since cross-sectional data do not usually have


some ordering logic, or economic interest.
) Serial correlation occurs in time-series studies
when the errors associated with a given time
period carry over into future time periods.
) et are correlated with lagged values: et-1, et-2, …


) Effects of autocorrelation are similar to those
of heteroskedasticity:
) OLS coefficients are unbiased and consistent,
but inefficient; the estimate of σ2 is biased, and
thus inferences are invalid.
Detecting Autocorrelation

Whenever you work with time series data, set up


your data as a time-series (i.e., identify the
variable that represents time or the sequential
order of observations).
# In stata: tsset varname
) Then, plotting OLS residuals against the time


variable, or a formal test could be used to check
for autocorrelation.
# In stata (after regression and predicting residuals):
scatter residual time
The Breusch-Godfrey Test
) Commonly-used general test of autocorrelation.

.) It tests for autocorrelation of first or higher


order, and works with stochastic regressors.
Steps:
1. Regress OLS residuals on X's and lagged
residuals: et = f(X1t,...,XKt, et-1,…,et-j)
2. Test the joint hypothesis that all the estimated


coefficients on lagged residuals are zero. Use the
test statistic: jFcal ~ χ2j ;
3. Alternatively, test the overall significance of the
auxiliary regression using nR2 ~ χ2(k+j).
4. Reject H0: no serial correlation for high values

. of the test statistic or for small p-values.


# In stata (after regression): bgodfrey, lags(#)
Eg. bgodfrey, lags(2) tests for 2nd order auto in error
terms (et's up to 2 periods apart) like et, et-1, et-2;
while bgodfrey, lags(1/4) tests for 1st, 2nd, 3rd & 4th
order autocorrelations.
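Illustration only (not in the slides): the steps above coded by hand for a first-order test on hypothetical AR(1) errors; in practice bgodfrey (or an equivalent routine) would be used.

  import numpy as np
  from scipy import stats

  rng = np.random.default_rng(4)
  n = 300
  x = rng.normal(size=n)
  e = np.zeros(n)
  for t in range(1, n):
      e[t] = 0.6 * e[t - 1] + rng.normal()        # AR(1) errors
  y = 1.0 + 2.0 * x + e

  X = np.column_stack([np.ones(n), x])
  resid = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]

  # Auxiliary regression: residual_t on X_t and residual_{t-1}
  Z = np.column_stack([X[1:], resid[:-1]])
  fitted = Z @ np.linalg.lstsq(Z, resid[1:], rcond=None)[0]
  R2_aux = 1 - np.sum((resid[1:] - fitted) ** 2) / np.sum((resid[1:] - resid[1:].mean()) ** 2)

  LM = (n - 1) * R2_aux                           # roughly n*R^2, chi-square with 1 df under H0
  print(LM, stats.chi2.sf(LM, df=1))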
Estimation in the Presence of Serial Correlation:


)Solutions to autocorrelation depend on the
sources of the problem.
)Autocorrelation may result from:
)Model misspecification (e.g. Omitted
variables, a wrong functional form, …)

. )Misspecified dynamics (e.g. static model


estimated when dependence is dynamic), …
)If autocorrelation is significant, check for model
specification errors, & consider re-specification.
) If the revised model passes other specification


tests, but still fails tests of autocorrelation, the
following are the key solutions:
1. FGLS: Prais-Winston regression, ….
# In stata: prais Y X1 … XK

.2. OLS with robust standard errors:


# In stata: newey Y X1 … XK, lags(#)
4.6 Endogenous Regressors: E(ɛi|Xj) ≠ 0
A key assumption maintained in the previous lessons is that the model, E(Y|X) = Xβ or E(Y|X) = β0 + ∑_{i=1..K} βiXi, was correctly specified.
The model Y = Xβ + ε is correctly specified if:
1. ε is orthogonal to the X's, enters the model additively (a separable effect on Y), and this effect equals zero on average; and,
2. E(Y|X) is linear in stable parameters (β's).
If the assumption E(εi|Xj) = 0 is violated, the OLS estimators will be biased & inconsistent.
)Assuming exogenous regressors (orthogonal


errors & X's) is unrealistic in many situations.
)The possible sources of endogeneity are:
1. stochastic regressors & measurement error;
2. specification errors: omission of relevant
variables or using a wrong functional form;
3. nonlinearity in & instability of parameters; and

. 4. bidirectional link between the X's and Y


(simultaneity or reverse causality);
)Recall two versions of exogeneity assumption:
1. E(ɛi) = 0 and X’s are fixed (non-stochastic),
2. E(ɛiXj) = 0 or E(ɛi|Xj) = 0 with stochastic X’s.
) The assumption E(εi) = 0 amounts to: “We do


not systematically over- or under-estimate the
PRF,” or the overall impact of all the excluded
variables is random/unpredictable.
) This assumption cannot be tested as residuals
will always have zero mean if the model has an
intercept.
. ) If there is no intercept, some information can
be obtained by plotting the residuals.
) If E(ɛi) = μ (a constant ≠ 0) & X's are fixed, the
estimators of all β's, except β0, will be OK!
) But, can we assume non-stochastic regressors?
4.6.1 Stochastic Regressors and Measurement Error
A. Stochastic Regressors
) Many economic variables are stochastic, and it
is only for ease that we assumed fixed X's.
) For instance, the set of regressors may include:
* a lagged dependent variable (Yt-1), or
* an X characterized by a measurement error.

. ) In both of these cases, it is not reasonable to


assume fixed regressors.
) As long as no other assumption is violated, OLS
retains its desirable properties even if X's are
stochastic.
) In general, stochastic regressors may or may
not be correlated with the model error term.
1. If X & ɛ are independently distributed, E(ɛ|X)
= 0, OLS retains all its desirable properties.
2. If X & ɛ are not independent but are either
contemporaneously uncorrelated, [E(ɛi|Xi±s) ≠
0 for s = 1, 2, … but E(ɛi|Xi) = 0], or ɛ & X are

. asymptotically uncorrelated, OLS retains its


large sample properties: estimators are biased,
but consistent and asymptotically efficient.
) The basis for valid statistical inference remains
but inferences must be based on large samples.
3. If X & ɛ are not independent and are
correlated even asymptotically, then OLS
estimators are biased and inconsistent.
) SOLUTION: IV/2SLS REGRESSION!
) Thus, it is not the stochastic (or fixed) nature of
regressors by itself that matters, but the nature
of the correlation between X's & ɛ.

. B. Measurement Error
) Measurement error in the regressand (Y) only
does not cause bias in OLS estimators as long
as the measurement error is not systematically
related to one or more of the regressors.
) If the measurement error in Y is uncorrelated


with X's, OLS is perfectly applicable (though
with less precision or higher variances).
) If there is a measurement error in a regressor
and this error is correlated with the measured
variable, then OLS estimators will be biased

. and inconsistent.
) SOLUTION: IV/2SLS REGRESSION!
4.6.2 Specification Errors
) Model misspecification may result from:


) omission of relevant variable/s,
) using a wrong functional form, or
) inclusion of irrelevant variable/s.
1. Omission of relevant variables: when one/more
relevant variables are omitted from a model.
) Omitted-variable bias: bias in parameter

. estimates when the assumed specification is


incorrect in that it omits a regressor that must
be in the model.
) e.g. estimating Y=β0+β1X1+β2X2+u when the
correct model is Y=β0+β1X1+β2X2+β3Z+u.
) Wrongly omitting a variable (Z) is equivalent


to imposing β3 = 0 when in fact β3 ≠ 0.
) If a relevant regressor (Z) is missing from a
model, OLS estimators of β's (β0, β1 & β2) will
be biased, except if cov(Z,X1) = cov(Z,X2) = 0.
) Even if cov(Z,X1) = cov(Z,X2) = 0, the estimate
for β0 is biased.
. ) The OLS estimators for σ2 and for the
standard errors of the β̂'s are also biased.
) Consequently, t- and F-tests will not be valid.
) In general, OLS estimators will be biased,
inconsistent and the inferences will be invalid.
) These consequences of wrongly excluding


variables are clearly very serious and thus,
attempt should be made to include all the
relevant regressors.
) The decision to include/exclude variables
should be guided by economic theory and

. reasoning.
2. Error in the algebraic form of the relationship:


a model that includes all the appropriate
regressors may still be misspecified due to
error in the functional form relating Y to X's.
) e.g. using a linear functional form when the
true relationship is logarithmic (log-log) or
semi-logarithmic (lin-log or log-lin).

.) The effects of functional form misspecification


are the same as those of omitting of relevant
variables, plus misleading inferences.
) Again, rely on economic theory, and not just on
statistical tests.
Testing for Omitted Variables and Functional


Form Misspecification
1. Examination of Residuals
) Most often, we use the plot of residuals versus
fitted values to have a quick glance at problems
like nonlinearity.

. ) Ideally, we would like to see residuals rather


randomly scattered around zero.
# In stata (after regression): rvfplot, yline(0)
) If in fact there are such errors as omitted
variables or incorrect functional form, a plot of
the residuals will exhibit distinct patterns.
2. Ramsey’s Regression Equation Specification


Error Test (RESET)
) It tests for misspecification due to omitted
variables or a wrong functional form.
) Steps:
1. Regress Y on X's, and get Ŷ & ũ.

.2. Regress: a) Y on X's Ŷ2 & Ŷ3, or


b) ũ on X's, Ŷ2 & Ŷ3, or
c) ũ on X's, X2's, Xi*Xj's (i ≠ j).
3. If the new regressors (Ŷ2 & Ŷ3 or X2's, Xi*Xj's)
are significant (as judged by F test), then reject
H0, and conclude that there is misspecification.
# In stata (after regression): ovtest or ovtest, rhs


) If the original model is misspecified, then try
another model: look for some variables which
are left out and/or try a different functional
form like log-linear (but based on some theory).
) The test (by rejecting the null) does not suggest

. an alternative specification.
3. Inclusion of irrelevant variables: when one/more
irrelevant variables are wrongly included in the
model. e.g. estimating Y=β0+β1X1+β2X2+β3X3+u
when the correct model is Y=β0+β1X1+β2X2+u.
) The consequence is that the OLS estimators will


remain unbiased and consistent but inefficient
(compared to OLS applied to the right model).
) σ2 is correctly estimated, and the conventional
hypothesis-testing methods are still valid.
) The only penalty we pay for the inclusion of the

. superfluous variable/s is that the estimated


variances of the coefficients are larger.
) As a result, our probability inferences about the
parameters are less precise, i.e., precision is lost
if the correct restriction β3 = 0 is not imposed.
)To test for the presence of irrelevant variables,


use F-tests (based on RRSS & URSS) if you
have some ‘correct’ model in your mind.
)Do not eliminate variables from a model based
on insignificance implied by t-tests.
)In particular, do not drop a variable with |t| > 1.
)Do not drop two or more variables at once (on

. the basis of t-tests) even if each has |t| < 1.


)The t statistic corresponding to an X (Xj) may
radically change once another (Xi) is dropped.
)A useful tool in judging the extra contribution
of regressors is the added variable plot.
) The added variable plot shows the (marginal)


effect of adding a variable to the model after all
other variables have been included.
) In a multiple regression, the added variable plot
for a predictor, say Xj, is the plot showing the
residuals of Y on all predictors except Xj

. against the residuals of Xj on all other X's.


# In stata (after regression): avplots or avplot varname
) In general, model misspecification due to the
inclusion of irrelevant variables is less serious
than that due to omission of relevant variable/s.
) Taking bias as a more undesirable outcome


than inefficiency, if one is in doubt about which
variables to include in a regression model, it is
better to err by including irrelevant variables.
) This is one reason behind the advocacy of
Hendry’s “general-to-specific” methodology.

.) This preference is reinforced by the fact that


standard errors are incorrect if variables are
wrongly excluded, but not if variables are
wrongly included.
) In general, the specification problem is less


serious when the research task/aim is model
comparison (to see which has a better fit to the
data) as opposed to when the task is to justify
(and use) a single model and assess the relative
importance of the independent variables.

4.6.3 Stability of Parameters and the Dummy Variables Regression (DVR)
) So far we assumed that the intercept and all the
slope coefficients (βj's) are the same/stable for
the whole set of observations. Y = Xβ + e
) But, structural shifts and/or group differences
are common in the real world. May be:
) the intercept differs/changes, or

. ) the (partial) slope differs/changes, or


) both the intercept and slope differ/change
across categories or time period.
) Two methods for testing parameter stability:
(i) Using Chow tests, or (ii) Using DVR.
A. The Chow Tests
) Using an F-test to determine whether a single
regression is more efficient than two (or more)
separate regressions on sub-samples.
) The stages in running the Chow test are:
1. Run two separate regressions on the data (say,

. before and after war or policy reform, …) and


save the RSS's: RSS1 & RSS2.
) RSS1 has n1–(K+1) df & RSS2 has n2–(K+1) df.
) The sum RSS1 + RSS2 gives the URSS with
n1+n2–2(K+1) df.
2. Estimate the pooled/combined model (under
H0: no significant change/difference in β's).
) The RSS from this model is the RRSS with
n–(K+1) df; where n = n1+n2.
3. Then, under H0, the test statistic will be:
   Fcal = [(RRSS − URSS)/(K + 1)] / [URSS/(n − 2(K + 1))]
4. Find the critical value: FK+1,n-2(K+1) from table.
5. Reject the null of stable parameters (and favor
Ha: that there is structural break) if Fcal > Ftab.
Example: Suppose we have the following results
from the OLS Estimation of real consumption
on real disposable income:
i. For the period 1974-1991: consi = α1+β1*inci+ui
Consumption = 153.95 + 0.75*Income
p-value: (0.000) (0.000)

. RSS = 4340.26114; R2 = 0.9982


ii. For the period 1992-2005: consi = α2+ β2*inci+ui
Consumption = 1.95 + 0.806*Income
p-value: (0.975) (0.000)
RSS = 10706.2127; R2 = 0.9949
iii. For the period 1974-2005: consi = α+ β*inci+ui
Consumption = 77.64 + 0.79*Income
t-ratio: (4.96) (155.56)
RSS = 22064.6663; R2 = 0.9987
1. URSS = RSS1 + RSS2 = 15046.474
2. RRSS = 22064.6663
   K = 1 and K + 1 = 2; n1 = 18, n2 = 15, n = 33.
3. Thus, Fcal = [(22064.6663 − 15046.474)/2] / [15046.474/29] = 6.7632981
4. p-value = Prob(F > 6.7632981) = 0.003883
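Illustration only (not in the slides): the Chow statistic above can be reproduced directly from the three reported RSS values.

  from scipy import stats

  RSS1, RSS2, RRSS = 4340.26114, 10706.2127, 22064.6663
  n, K = 33, 1                        # n1 + n2, one regressor (income)
  URSS = RSS1 + RSS2
  F = ((RRSS - URSS) / (K + 1)) / (URSS / (n - 2 * (K + 1)))
  print(F, stats.f.sf(F, K + 1, n - 2 * (K + 1)))    # about 6.763 and 0.0039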


5. So, reject the null that there is no structural
break at 1% level of significance.
) The pooled consumption model is inadequate
specification and thus we should run separate
regressions for the two periods.
) The above method of calculating the Chow test

. breaks down if either n1 < K+1 or n2 < K+1.


) Solution: use Chow’s second (predictive) test!
) If, for instance, n2 < K+1, then the F-statistic
will be altered as follows.
) Replace URSS by RSS1 and use the statistic:
Fcal = [(RRSS − RSS1)/n2] / [RSS1/(n1 − (K + 1))]
* The Chow test tells if the parameters differ on
average, but not which parameters differ.
* The Chow test requires that all groups have the
same error variance.

. ) This assumption is questionable: if parameters


can be different, then so can the variances be.
) One method of correcting for unequal error
variances is to use the dummy variable
approach with White's Robust Standard Errors.
B. The Dummy Variables Regression
I. Introduction:
Not all information can easily be quantified. So, we need to incorporate qualitative information.
e.g. 1. Effect of belonging to a certain group:
   - Gender, location, status, occupation
   - Beneficiary of a program/policy
2. Ordinal variables:
   - Answers to yes/no (or scaled) questions ...
Effect of some quantitative variable may differ between groups/categories:
   - Returns to education may differ between sexes or between ethnic groups …
Interest in determinants of belonging to a group:
   - Determinants of being poor … (dummy dependent variable: logit, probit, …)
Dummy Variable: a variable devised to use qualitative information in regression analysis.
A dummy variable takes 2 values: usually 0/1.
e.g. Yi = β0 + β1·D + u;  D = 1 for i ϵ group 1, and D = 0 for i ∉ group 1.
If D = 0: E(Y) = E(Y|D = 0) = β0
If D = 1: E(Y) = E(Y|D = 1) = β0 + β1
Thus, the difference between the two groups (in mean values of Y) is: E(Y|D=1) – E(Y|D=0) = β1.
 So, the significance of the difference between the groups is tested by a t-test of β1 = 0.
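As a quick illustration (toy data with hypothetical variable names; the wage example from the notes follows next), the t-statistic on β1 from the dummy regression is exactly the pooled-variance two-sample t-statistic comparing the two group means:

    import numpy as np
    import statsmodels.api as sm
    from scipy import stats

    rng = np.random.default_rng(0)
    y0 = rng.normal(10, 3, 60)                       # outcomes for the D = 0 group
    y1 = rng.normal(12, 3, 50)                       # outcomes for the D = 1 group
    y = np.concatenate([y0, y1])
    D = np.concatenate([np.zeros(60), np.ones(50)])

    res = sm.OLS(y, sm.add_constant(D)).fit()
    t_dummy = res.tvalues[1]                         # t-statistic on beta1
    t_groups = stats.ttest_ind(y1, y0, equal_var=True).statistic
    print(t_dummy, t_groups)                         # identical up to rounding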
e.g.: Wage differential between male and female workers.
 Two possible ways: a male dummy or a female dummy.
1. Define a male dummy (male = 1 & female = 0).
   In Stata: reg wage male
   Result:  Yi = 9.45 + 172.84*D + ûi
   p-value:     (0.000)  (0.000)
 Interpretation: the monthly wage of a male worker is, on average, $172.84 higher than that of a female worker.
 This difference is significant at the 1% level.
2. Define a female dummy (female = 1 & male = 0).
   In Stata: reg wage female
   Result:  Yi = 182.29 – 172.84*D + ûi
   p-value:      (0.000)   (0.000)
 Interpretation: the monthly wage of a female worker is, on average, $172.84 lower than that of a male worker.
 This difference is significant at the 1% level.
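A hedged sketch of the two codings (simulated wage data whose group means roughly mimic the figures above; the column names are mine): fitting both regressions shows the slope merely changes sign while the implied group means are identical.

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(0)
    male = rng.integers(0, 2, 300)
    wage = 9.45 + 172.84 * male + rng.normal(0, 20, 300)   # toy data, not the course dataset
    df = pd.DataFrame({"wage": wage, "male": male, "female": 1 - male})

    m = smf.ols("wage ~ male", data=df).fit()        # intercept: mean wage when male = 0
    f = smf.ols("wage ~ female", data=df).fit()      # intercept: mean wage when female = 0
    print(m.params)
    print(f.params)                                  # slope equals the male-dummy slope with opposite sign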
II. Using the DVR to Test for Structural Break:
 Recall the example of the consumption function:
   period 1: consi = α1 + β1*inci + ui  vs.
   period 2: consi = α2 + β2*inci + ui
 Let’s define a dummy variable D1, where:
   D1 = 1 for the period 1974-1991, and
   D1 = 0 for the period 1992-2005.
 Then: consi = α0 + α1*D1 + β0*inci + β1(D1*inci) + ui
   For period 1: consi = (α0 + α1) + (β0 + β1)*inci + ui
     Intercept = α0 + α1; Slope (= MPC) = β0 + β1.
   For period 2 (base category): consi = α0 + β0*inci + ui
     Intercept = α0; Slope (= MPC) = β0.
 Regressing cons on inc, D1 and (D1*inc) gives:
   cons = 1.95 + 152*D1 + 0.806*inc – 0.056(D1*inc)
   p-value: (0.968) (0.010)  (0.000)     (0.002)
 Substituting D1 = 1 for i ∈ period 1 and D1 = 0 for i ∈ period 2:
   period 1 (1974-1991): cons = 153.95 + 0.75*inc
   period 2 (1992-2005): cons = 1.95 + 0.806*inc
 The Chow test is equivalent to testing α1 = β1 = 0 in:
   cons = 1.95 + 152*D1 + 0.806*inc – 0.056(D1*inc)
 In Stata (after the regression), test that the coefficients on D1 and (D1*inc) are jointly zero.
 This gives F(2, 29) = 6.76; p-value = 0.0039.
 Then, reject H0! There is a structural break!
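A minimal sketch of this DVR test in Python (simulated series built to resemble the two sub-period equations; it will not reproduce the exact F of 6.76 because the original data are not available here):

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(1)
    year = np.arange(1974, 2006)
    inc = 100 + 10 * (year - 1974) + rng.normal(0, 5, year.size)
    D1 = (year <= 1991).astype(float)                # 1 for 1974-1991, 0 for 1992-2005
    cons = np.where(D1 == 1, 154 + 0.75 * inc, 2 + 0.806 * inc) + rng.normal(0, 4, year.size)
    df = pd.DataFrame({"cons": cons, "inc": inc, "D1": D1, "D1_inc": D1 * inc})

    res = smf.ols("cons ~ inc + D1 + D1_inc", data=df).fit()
    print(res.params)
    print(res.f_test("D1 = 0, D1_inc = 0"))          # joint test of alpha1 = beta1 = 0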
 Comparing the two methods, it is preferable to use the dummy variables regression (DVR).
 This is because with the DVR:
   1. we run only one regression; and
   2. we can test whether the change is in the intercept only, in the slope only, or in both.
 In our example, the change is in both. Why? Because the coefficients on D1 (p = 0.010) and on (D1*inc) (p = 0.002) are each individually significant.
 For a total of m categories, use m – 1 dummies!
 Including m dummies (one for each group) results in perfect multicollinearity (the dummy variable trap).
   e.g.: 2 groups & 2 dummies. With columns [constant, X, D1, D2]:

         ⎡ 1  X11  1  0 ⎤
     X = ⎢ 1  X12  1  0 ⎥
         ⎣ 1  X13  0  1 ⎦

    constant = D1 + D2: an exact linear relationship among the regressors!
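A small numeric check of the trap (illustrative Python): with a constant plus one dummy per group, the regressor matrix loses full column rank because constant = D1 + D2.

    import numpy as np

    n = 10
    x = np.linspace(1.0, 10.0, n)                    # some ordinary regressor
    D1 = (np.arange(n) < 6).astype(float)            # group-1 dummy
    D2 = 1.0 - D1                                     # group-2 dummy
    const = np.ones(n)

    X = np.column_stack([const, x, D1, D2])
    print(np.linalg.matrix_rank(X))                  # 3, not 4: perfect multicollinearity
    print(np.allclose(const, D1 + D2))               # True: the exact linear dependence

Dropping either D1 or D2 (the base category) restores full column rank.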
4.6.4 Simultaneity Bias
 Simultaneity occurs when an equation is part of a simultaneous equations system, such that causation runs from Y to X as well as X to Y.
 In such a case, cov(X, ε) ≠ 0 and the OLS estimators are biased and inconsistent.
 Such situations are pervasive in economic models, so simultaneity bias is a vital issue.
e.g. The Simple Keynesian Consumption Function
 Structural form model: consists of the national accounts identity and a basic consumption function, i.e., a pair of simultaneous equations:
   Yt = Ct + It
   Ct = α + βYt + Ut
 Yt & Ct are endogenous (simultaneously determined) and It is exogenous.
 Reduced form: expresses each endogenous variable as a function of exogenous variables (and/or predetermined variables – lagged endogenous variables, if present) and random error term(s).
 The reduced form is:
   Yt = [1/(1 − β)][α + It + Ut]
   Ct = [1/(1 − β)][α + βIt + Ut]
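For reference, the Yt equation is obtained by substituting the consumption function into the identity and solving for Yt:

   Yt = Ct + It = (α + βYt + Ut) + It
   ⟹ (1 − β)Yt = α + It + Ut
   ⟹ Yt = [1/(1 − β)](α + It + Ut)

Substituting this Yt back into Ct = α + βYt + Ut gives the reduced form for Ct.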
 The reduced form equation for Yt shows that:
   cov(Yt, Ut) = cov[(1/(1 − β))(α + It + Ut), Ut]
               = (1/(1 − β))[cov(α, Ut) + cov(It, Ut) + cov(Ut, Ut)]
               = (1/(1 − β))var(Ut) = σU²/(1 − β) ≠ 0,
   since α is a constant and It is exogenous, so cov(α, Ut) = cov(It, Ut) = 0.
 Thus Yt, in Ct = α + βYt + Ut, is correlated with Ut.
 The OLS estimators for β (MPC) & α (autonomous consumption) are biased and inconsistent.
 Solution: IV/2SLS (instrumental variables / two-stage least squares).
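A minimal Monte Carlo sketch of the problem and the fix (illustrative parameter values only): OLS on the structural consumption function overstates β, while 2SLS using It as the instrument for Yt recovers it.

    import numpy as np

    rng = np.random.default_rng(0)
    alpha, beta = 10.0, 0.8
    n, reps = 200, 2000
    b_ols, b_2sls = [], []

    for _ in range(reps):
        I = rng.normal(50, 10, n)                    # exogenous investment
        U = rng.normal(0, 5, n)
        Y = (alpha + I + U) / (1 - beta)             # reduced form for income
        C = alpha + beta * Y + U                     # structural consumption function

        X = np.column_stack([np.ones(n), Y])
        b_ols.append(np.linalg.lstsq(X, C, rcond=None)[0][1])

        # 2SLS: first stage Y on (1, I); second stage C on (1, Y_hat)
        Z = np.column_stack([np.ones(n), I])
        Y_hat = Z @ np.linalg.lstsq(Z, Y, rcond=None)[0]
        X2 = np.column_stack([np.ones(n), Y_hat])
        b_2sls.append(np.linalg.lstsq(X2, C, rcond=None)[0][1])

    print(np.mean(b_ols), np.mean(b_2sls))           # OLS mean exceeds 0.8; 2SLS mean is close to 0.8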
… THE END …
GOOD LUCK!