01B Linear Regression With Time Series Data

university of copenhagen department of economics
Econometrics II
Linear Regression with Time Series Data
Morten Nyboe Tabor
Learning Goals
• Give an interpretation of the linear regression model for stationary time

series and explain the model’s identifying assumptions.
• Derive the method of moments (MM) estimator and state the assumptions
used to derive the estimator. Estimate and interpret the parameters.
• Explain the sufficient conditions for consistency, unbiasedness, and

asymptotic normality of the method of moments estimator in the linear
regression model.
• Construct misspecification tests and analyze to what extent a statistical

model is congruent with the data.
• Explain the consequences of autocorrelated, heteroskedastic, and

non-normal residuals for the properties of the MM estimator.
Econometrics II — Linear Regression with Time Series Data — Slide 2/47

Outline
1 The Linear Regression Model
Definition, Interpretation, and Identification
2 How do we identify and interpret parameters of the model?
Method of Moments (MM) Estimation
3 Properties of the Estimator
Consistency
Unbiasedness
Example: Bias in AR(1) Model
Asymptotic Distribution
4 Dynamic Completeness and Autocorrelation
A Dynamically Complete Model
Autocorrelation of the Error Term
Consequences of Autocorrelation
5 Model Formulation and Misspecification Testing
Model Formulation
Misspecification Testing
Model Formulation and Misspecification Testing in Practice
6 The Frisch-Waugh-Lovell Theorem
7 Recap: Linear Regression Model with Time Series Data
1. The Linear Regression Model
Time Series Regression Models
• Consider the linear regression model
yt = xt0 β + t = x1t β1 + x2t β2 + ... + xkt βk + t ,
for t = 1, 2, ..., T . Note that yt and t are 1 × 1, while xt and β are k × 1.
• The interpretation of the model depends on the variables in xt .

1 If xt contains contemporaneously dated variables, it is denoted a static
regression:
yt = xt0 β + t .
2 A simple model for yt given the past is an autoregressive model:
yt = θyt−1 + t .
3 More complicated dynamics in the autoregressive distributed lag (ADL)
model:
0
yt = θ1 yt−1 + x̃t0 ϕ0 + x̃t−1 ϕ1 + t .

2. How do we identify and interpret
parameters of the model?
Interpretation of Regression Models
• Consider again the regression model
yt = xt0 β + t , t = 1, 2, ..., T . (∗)
As it stands, the equation is a tautology: not informative on β!

Why?... For any β, we can find a residual t so that (∗) holds.
• We have to impose restrictions on t to ensure a unique solution to (∗).
This is called identification in econometrics.
Assume that (∗) represents the conditional expectation, E (yt | xt ) = xt0 β,
so that
E (t | xt ) = 0. (∗∗)
This condition states a zero-conditional-mean. We say that xt is
predetermined.
• Under assumption (∗∗) the coefficients are the partial (ceteris paribus)
effects
∂E (yt | xt )
= βj .
∂xjt

Identification
• Predeterminedness implies the so-called moment condition:
E (xt t ) = 0, (∗ ∗ ∗)
stating that xt and t are uncorrelated.

• Now insert the model definition, t = yt − xt0 β, in (∗ ∗ ∗) to obtain
E (xt (yt − xt0 β)) = 0

E (xt yt ) − E (xt xt0 )β = 0.
This is a system of k equations in the k unknown parameters, β, and if

E (xt xt0 ) is non-singular we can find the so-called population estimator
β = E (xt xt0 )−1 E (xt yt ),
which is unique.
• The parameters in β are identified by (∗ ∗ ∗) and the non-singularity
condition.
The latter is the well-known condition for no perfect multicollinearity.

Method of Moments (MM) Estimation

• From a given sample (yt , xt0 )0 , t = 1, 2, ..., T , we cannot compute
expectations. In practice we replace with sample averages and obtain the
MM estimator
T
!−1 T
!
−1
X X
βb = T xt xt0 T −1
xt yt .
t=1 t=1
Note that the MM estimator coincides with the OLS estimator.

b → β, we need a law of large numbers (LLN) to
• For MM to work, i.e., β
apply, i.e.,
T T
X X
T −1 xt yt → E (xt yt ) and T −1 xt xt0 → E (xt xt0 ).
t=1 t=1
• Note the two distinct conditions for OLS to converge to the true value:
1 The moment condition (∗ ∗ ∗) should be satisfied.
2 A law of large numbers should apply.
A central part of econometric analysis is to ensure these conditions.

Main Assumption
• We impose assumptions to ensure that a LLN applies to the sample

averages.
Main Assumption
Consider a time series yt and the k × 1 vector time series xt . We assume:
0
1 The process zt = (yt , xt0 ) has a joint stationary distribution.
2 The process zt is weakly dependent, so that zt and zt+h becomes
approximately independent for h → ∞.
• Interpretation:
Think of (1) as replacing identical distributions for IID data.
Think of (2) as replacing independent observations for IID data.
• Under the main assumption, most of the results for linear regression on
random samples carry over to the time series case.

3. Properties of the Estimator
Consistency
• Consistency is the first requirement for an estimator:

βb should converge to β.
Result 1: Consistency
Let yt and xt obey the main assumption. If the regressors obey the moment
condition,
E (xt t ) = 0,
then the OLS estimator is consistent, i.e., βb → β as T → ∞.
• The OLS estimator is consistent if the regression represents the

conditional expectation of yt | xt . In this case E (t |xt ) = 0, and βi has a
”ceteris paribus” interpretation.
• The conditions in Result 1 are sufficient but not necessary.

We will see examples of estimators that are consistent even if the
conditions are not satisfied (related to unit roots and cointegration).

Illustration of Consistency
• Consider the regression model with a single explanatory variable, k = 1,
yt = xt β + t , t = 1, 2, ..., T , with E (xt t ) = 0 and E (xt ) = 0.
• Write the OLS estimator as

PT PT PT
T −1 yx
t=1 t t
T −1 t=1 (xt β + t )xt T −1 t=1 t xt
βb = PT 2 = PT 2 = β + PT 2 ,
T −1 xt −1 T xt −1 T xt
t=1 t=1 t=1
and look at the terms as T → ∞:

T
X
plim T −1 xt2 = σx2 , 0 < σx2 < ∞ (O)
T →∞
t=1
T
X
plim T −1 t xt = E (t xt ) = 0 (OO)
T →∞
t=1
• Where LLN applies under Assumption 1; and

(O) holds for a stationary process (σx2 is the limiting variance of xt ).
(OO) follows from predeterminedness.

Unbiasedness
• A stronger requirement for an estimator is unbiasedness: E (β
b) = β.
Result 2: Unbiasedness
Let yt and xt obey the main assumption. If the regressors are strictly
exogenous,
E (t | x1 , x2 , ..., xt , ..., xT ) = 0,
then the OLS estimator is unbiased, i.e., E (βb | x1 , x2 , ..., xT ) = β.
• Unbiasedness requires strict exogeneity, which is not fulfilled in a dynamic

regression.
Consider the first order autoregressive model
yt = θyt−1 + t .
Here yt is function of t , so t cannot be uncorrelated with yt , yt+1 , ..., yT .
Result 3: Estimation bias in dynamic models

In general, the OLS estimator is not unbiased in regressions with lagged
dependent variables.
Example: Finite Sample Bias in an AR(1)
• Consider a first-order autoregressive, AR(1), model,
yt = θyt−1 + t , t = 1, 2, ..., T , (∗)
where θ = 0.9. The time series yt satisfy Assumption 1 of stationarity and

weak dependence (we will return to the properties of yt later).
The OLS estimator is,
T
!−1 T
!
X 2
X
θb = yt−1 yt−1 yt .
t=1 t=1
We are often interested in E (θb) to check for bias for a given T . This is
typically difficult to derive analytically.
• But if we could draw realizations of θ

b, then we could estimate E (θb).

Finite Sample Bias in an AR(1)

MC simulation:
1 Construct (randomly) M artificial data sets from the AR(1) model (∗).
b(m) , for
2 Estimate the model (∗) for each data set and get the estimates, θ
m = 1, 2, ..., M.
3 Compute the Monte Carlo mean, the Monte Carlo standard deviation, and
an estimate of the bias:
M
X
MEAN(θb) = M −1 θb(m)
m=1
v
u M
X 2
u
MCSD(θb) = tM −1 θb(m) − MEAN(θb)
m=1
Bias(θb) = MEAN(θb) − θ.
Note that by the LLN (for independent observations, since the θb(m) ’s are
independent):
M
X
MEAN(θb) = M −1 θb(m) → E (θb) for M → ∞.
m=1

Finite Sample Bias in an AR(1): Monte Carlo Simulations in

PcNaive
We consider Monte Carlo simulations to illustrate potential bias of the OLS

estimator for θ in the AR(1) model, (∗). We consider the parameter values:
θ ∈ {0.0, 0.5, 0.9}.
This can be done quite easily in the module PcNaive in OxMetrics.

1 In OxMetrics, click the “Model” menu and select “Model”.
2 In the window that pops up, select the “PcGive” module. In “Category”
select “Monte Carlo” and in “Model class” select “AR(1) Experiment
using PcNaive”.
3 Click “Formulate”.
4 In the “Formulate” window you must specify the DGP, the estimating
model, the number of replications, the sample length(s), and some output
settings. See next slide.

Finite Sample Bias in an AR(1): PcNaive

Finite Sample Bias in an AR(1): PcNaive Output for θ = 0
Ya_1
5.0
2.5
-0.25 -0.20 -0.15 -0.10 -0.05 0.00 0.05 0.10 0.15 0.20 0.25
0.5 Ya_1 × 2MCSD
0.0
-0.5
20 40 60 80 100 120 140 160 180 200

Finite Sample Bias in an AR(1): PcNaive Output for θ = 0.5
Ya_1
5.0
2.5
0.25 0.30 0.35 0.40 0.45 0.50 0.55 0.60 0.65 0.70
1.0
Ya_1 × 2MCSD
0.5
0.0
20 40 60 80 100 120 140 160 180 200

Finite Sample Bias in an AR(1): PcNaive Output for θ = 0.9
Ya_1
30
20
10
0.84 0.85 0.86 0.87 0.88 0.89 0.90 0.91 0.92 0.93 0.94
1.25
Ya_1 × 2MCSD
1.00
0.75
0.50
0 100 200 300 400 500 600 700 800 900 1000


• In a Monte Carlo simulation, we take an AR(1) as the DGP and
• In a MC simulation we take an AR(1) as the DGP and estimation model:
estimation model:
 = 09 · −1 +   ∼ (0 1)
yt = 0.9 · yt−1 + t , t ∼ N(0, 1).
Mean of OLS estimate in AR(1) model

1.2
1.1
1.0
True value
0.9
0.8 MEAN
0.7
0.6
0.5 MEAN ± 2·MCSD
0.4
10 20 30 40 50 60 70 80 90 100
11 of 21

Asymptotic Distribution
• To derive the asymptotic distribution we need a CLT; additional
restrictions on t .
Result 4: Asymptotic distribution and OLS variance

Let yt and xt obey the main assumption and let xt and t be uncorrelated,
E (t xt ) = 0. Furthermore, assume (conditional) homoskedasticity and no
serial correlation, i.e.,
E 2t | xt σ2

=
E (t s | xt , xs ) = 0 for all t 6= s.
Then as T → ∞, the OLS estimator is asymptotically normal with usual

variance: √
D
T βb − β → N(0, σ 2 E (xt xt0 )−1 ).
• Inserting estimators, we can test hypotheses using

T
!−1 !
a
X
βb ∼ N β, σ
b 2
xt xt0 .
t=1
• Recall that E (t xt ) = 0 is implied by E [t |xt ] = 0.

4. Dynamic Completeness and
Autocorrelation
A Dynamically Complete Model
• Result 4 (Asymptotic distribution of the OLS estimator) required no serial

correlation in the error terms.
The precise condition for no serial correlation looks strange. Often we
disregard conditioning and consider whether t and s are uncorrelated.
• We say that a model is dynamically complete if
E (yt | xt , yt−1 , xt−1 , yt−2 , xt−2 , ..., y1 , x1 ) = E (yt | xt ) = xt0 β.
xt contains all relevant information in the available information set.

No-serial-correlation is practically the same as dynamic completeness.
• All systematic information in the past of yt and xt is used in the

regression model.
This is often taken as an important design criteria for a dynamic
regression model.
We should always test for no-autocorrelation in time series models.

Autocorrelation of the Error Term
• If Cov(t , s ) 6= 0 for some t 6= s, we have autocorrelation of the error

term.
• This is detected from the estimated residuals and Cov(b s ) 6= 0 is

t , b
referred to as residual autocorrelation. Often used synonymously.
• Residual autocorrelation does not imply that the DGP has autocorrelated
errors. Typically, autocorrelation is taken as a signal of misspecification.
Different possibilities:
1 Autoregressive errors in the DGP.
2 Dynamic misspecification.
3 Omitted variables and non-modelled structural shifts.
4 Misspecified functional form
The solution to the problem depends on the interpretation.

Consequences of Autocorrelation
• Autocorrelation will not violate the assumptions for Result 1

(consistency) in general.
• But E (xt t ) = 0 is violated if the model includes a lagged dependent
variable. Look at an AR(1) model with error autocorrelation, i.e., the two
equations
yt = θyt−1 + t
t = ρt−1 + vt , vt ∼ IID(0, σv2 ).
Both yt−1 and t depend on t−1 , so E (yt−1 t ) 6= 0.
Result 5: Inconsistency of OLS

In a regression model including the lagged dependent variable, the OLS
estimator is not consistent in the presence of autocorrelation of the error term.
• Even if OLS is consistent, the standard formula for the variance in Result
4 is no longer valid. It is possible to derive the variance, the so-called
heteroskedasticity-and-autocorrelation-consistent (HAC) standard errors.

(1) Autoregressive Errors in the DGP
• Consider the case where the errors are truly autoregressive:
yt = xt0 β + t
t = ρt−1 + vt , vt ∼ IID(0, σv2 ).
• If ρ is known we can write
xt0 − ρxt−1
0

(yt − ρyt−1 ) = β + (t − ρt−1 )
yt = ρyt−1 + xt0 β − xt−1
0
ρβ + vt .
The transformation is analog to GLS transformation in the case of

heteroskedasticity.
• The GLS model is subject to a so-called common factor restriction: three

regressors but only two parameters, ρ and β. Estimation is non-linear and
can be carried out by maximum likelihood.

• Consistent estimation of the parameters in the GLS model requires
E ((xt − ρxt−1 ) (t − ρt−1 )) = 0.
But then t−1 should be uncorrelated with xt , i.e., E (t xt+1 ) = 0.

Consistency of GLS requires stronger assumptions than consistency of
OLS.
• The GLS transformation is rarely used in modern econometrics.

1 Residual autocorrelation does not imply that the error term is
autoregressive.
There is no a priori reason to believe that the transformation is correct.
2 The requirement for consistency of GLS is strong.

(2) Dynamic Misspecification
• Residual autocorrelation indicates that the model is not dynamically

complete, so
E (yt | xt ) 6= E (yt | xt , yt−1 , xt−1 , yt−2 , xt−2 , ..., y1 , x1 ).
The (dynamic) model is misspecified and should be reformulated.

Natural remedy is to extend the list of lags of xt and yt .
• If autocorrelation seems of order one, then a starting point is the GLS

transformation.
The AR(1) structure is only indicative and we look at the unrestricted
ADL model
yt = α0 yt−1 + xt0 α1 + xt−1
0
α2 + ηt .
Finding of first-order autocorrelation is only used to extend the list of
regressors. The common factor restriction is removed.

(3) Omitted Variables
• Omitted variables in general can also produce autocorrelation.

Let the DGP be
yt = x1t · β1 + x2t · β2 + t , (♦)
and consider the estimation model
yt = x1t · β1 + ut . (♦♦)
Then the error term is ut = x2t · β2 + t , which is autocorrelated if x2t is

persistent.
• An example is if the DGP exhibits a level shift, e.g., (♦) includes the
dummy variable
0 for t < T0
x2t = .
1 for t ≥ T0
If x2t is not included in (♦♦) then the residual will be systematic.
• Again the solution is to extend the list of regressors, xt .

(4) Misspecified Functional Form
• If the true relationship

yt = g(xt ) + t
is non-linear, the residuals from a linear regression will typically be
autocorrelated.
• The solution is to try to reformulate the functional form of the regression

line.

5. Model Formulation and Misspecification
Testing
Model Formulation and Properties of the Estimator
• We consider the linear regression model for time series,
yt = xt0 β + εt . (*)
If there is no perfect multicolleniarity in xt we can always compute the

OLS estimator βb and consider the estimated model,
yt = xt0 βb + εbt , for the sample t = 1, 2, ...T . (**)
• Under specific assumptions can we derive properties of the estimator β

b:
1 b → β as T → ∞.
Consistency: β
2 b|x1 , x2 , ..., xT ] = β.
Unbiasedness: E [β
b ∼a N(β, b
PT
3 Asymptotic distribution: β σ 2 ( t=1 xt xt0 )−1 ).
• However, if (∗∗) does not satisfy the specific assumptions we cannot use
the results in (1)-(3) to interpret the estimated coefficients and to do
statistical inference (e.g. test the hypothesis H0 : βi = 0).

Misspecification Testing
• We can never show that the model is well specified;

but we can certainly find out if it is not in a specific direction!
Misspecification testing.
• We consider the following standard tests on the estimated errors, ε

bt :
1 Test for no-autocorrelation.
2 Test for no-heteroskedasticity.
3 Test for normality of the error term.
• If all tests are passed, we may think of the model as representing the main
features of the data and we can use the results in (1) − (3), e.g. for
testing hypotheses on estimated parameters.

(1) Tests for No-Autocorrelation

• Let b
t (t = 1, 2, ..., T ) be the estimated residuals from the original
regression model
yt = xt0 β + t .
A test for no-autocorrelation (of first order) is based on the hypothesis
γ = 0 in
the auxiliary regression
t = xt0 δ + γb
b t−1 + ut ,
where xt is included because it may be correlated with t−1 .

• A valid test is t−ratio for γ = 0. Alternatively there is the
Breusch-Godfrey LM test
LM = T · R 2 ∼ χ2 (1).
Note that xt and t are orthogonal, and any explanatory power is due to
t−1 .
b
• The Durbin Watson (DW) test is derived for finite samples.
Based on strict exogeneity. Not valid in many models.

(2) Test for No-Heteroskedasticity
• Consider the auxiliary regression
2t = γ0 + x1t γ1 + ... + xkt γk + x1t

b 2
δ1 + ... + xkt2 δk + ut .
• A test for unconditional homoskedasticity is based on the null hypothesis:
γ1 = ... = γk = δ1 = ... = δk = 0.
The alternative is that the variance of t depends on xit or the squares xit2
for some i = 1, 2, ...k.
• The LM test,
LM = T · R 2 ∼ χ2 (2k).

(3) Test for Normality of the Error Term
• Skewness (S) and kurtosis (K ) are the estimated third and fourth central
moments of the standardized estimated residuals ut = (εbt − ε̄)/σ
b.:
T
X T
X
S = T −1 ut3 and K = T −1 ut4 .
t=1 t=1
• The normal distribution is symmetric and has S = 0, while K = 3 (K − 3

is often referred to as excess kurtosis). If K > 3, the distribution has “fat
tails”.
• Under the assumption of normality,
T 2
ξS = S → χ2 (1)
6
T
ξK = (K − 3)2 → χ2 (1).
24
• It turns out that ξS and ξK are independent: Jarque-Bera joint test of
normality:
ξJB = ξS + ξK → χ2 (2).

Notes on Normality of the Error Term
• Often normality of the error terms is rejected due to the presence of a few
large outliers that the model cannot account for (i.e. they are captured by
the error term).
• Start with a plot and a histogram of the (standardized) estimated

residuals. Any large outliers of, say, more than three standard deviations?
• A big residual at time T0 may be accounted for by including a dummy

variable (xt = 1 for t = T0 , xt = 0 for all other t).
• Note that the results for the linear regression model hold without
assuming normality of εt . However, the normal distribution is a natural
benchmark and given normality:
• β
b converges faster to the asymptotic normal distribution.
• The OLS estimator coincides with the maximum likelihood (ML) estimator.

Model Formulation and Misspecification Testing in Practice
• Finding a well-specified model for the data of interest can be hard in

practice! Potential reasons include:
• Which variables to include in xt ? Economic theory can often guide us.
• How many lags to include to obtain a dynamically complete model?
• Are the model parameters constant over the sample or do we need to
include structural breaks?
• Do we need to include dummy variables to capture large outliers?
Think about which model and which formulation is useful to represent the
characteristics of the data!
• In practice we use an iterative procedure:

1 Estimate the model for a specific formulation.
2 Do misspecification tests on the estimated errors.
3 If the misspecification tests are rejected, re-formulate the model.
Repeat (1) and (2) until the misspecification tests are not rejected.
• General-to-Specific: Start with a larger model and simplify it by removing

insignificant variables.

6. The Frisch-Waugh-Lovell Theorem
Motivation
Consider the linear regression model,
yt = µ + β1 xt + εt , t = 1, 2, ..., T , (1)
yt = yt − ȳ and
and the linear regression model for the de-meaned variables e
xt = xt − x̄ , given by
e
yt = b1 e
e xt + ut , t = 1, 2, ..., T . (2)
Question: Would you expect β̂1 to equal b̂1 ?

Motivation: The Frisch-Waugh-Lovell Theorem

Consider the linear regression model
yt = µ + β1 xt + εt , t = 1, 2, ..., T , (3)
yt = yt − ȳ and
and the linear regression model for the de-meaned variables e
xt = xt − x̄ , given by
e
yt = b1 e
e xt + ut , t = 1, 2, ..., T . (4)
Question: Would you expect β̂1 to equal b̂1 ?
(a) Scatterplot of y t (vertical axis) and xt (horizontal axis) y t = y t −ȳ (vertical axis) and ~xt = xt − ¯x (horizontal axis)
(b) Scatterplot of ~
y t ×xt y t ×~
~ xt
20
5
15
0
10
-5
0 2 4 6 8 -4 -2 0 2 4
The Frisch-Waugh-Lovell theorem tells us that β̂1 = b̂1 .

Model
Consider the more general model

0 0
yt = x1t β1 + x2t β2 + εt , t = 1, ..., T ,
with x1t K1 × 1 and x2t K2 × 1.
The Frisch-Waugh-Lovell theorem tells us that we can obtain the OLS

estimate of β1 in two ways:
0 0 0
1 (The usual way) Regress yt on (x1t , x2t ) and obtain, β̂1 .
2 (A three step procedure) Regress yt on x2t and obtain the residuals, yt? .
?
Next, regress x1t on x2t and obtain the residuals, x1t . Lastly, regress yt? on
?
x1t and obtain β̂1 .
The the two estimates are numerically identical.
A formal proof will be posted on Absalon.

Why is it a useful theorem?
In some cases, it will allow to greatly simplify the model:

• Demeaning variables: x2t = 1.
• Detrending variables: See PS #1.
• Removing fixed effects in a panel data model (Econometrics I).

7. Recap: Linear Regression Model with
Time Series Data
Recap: yt = xt0 β + εt
Main assumption: Cross-section vs. Time series
Cross-Section: independent and identically distributed
Time Series: weak dependence and stationarity
Technical requirements for the application of a LLN and a CLT.
OLS: Assumptions required for implementation and properties

Computation Consistency Unbiasedness Asympt. Distr.
β = (X 0 X )−1 X 0 y
b plim β = β
b E (β ) = β
b ba
β ∼ Normal
(1) No perfect collinearity
in X variables X X X X
(2) Strict exogeneity
E (t | x1 , x2 , . . . , xT ) = 0 (∗) X (∗)
(3) Contemporaneous exogeneity
(predeterminedness)
E (t | xt ) = 0 (∗) (∗)
(4) Moment conditions
E (xt t ) = 0 X X
(5) Conditional homoskedasticity
E (2t | xt ) = σ 2 X
(6) No serial correlation
E (t s | xt , xt ) = 0 ∀t 6= s X
Note: Since (2) : E (t | x1 , x2 , . . . , xT ) = 0 ⇒ (3) : E (t | xt ) = 0 ⇒ (4) : E (t xt ) = 0, but the
reverse is not true, conditions (∗) are stronger and therefore not necessary, as long as one of the less strong required
assumptions mentioned in the table is fulfilled. X: In order obtain the ”usual” OLS variance.

01B Linear Regression With Time Series Data

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

01B Linear Regression With Time Series Data

Uploaded by

Copyright:

Available Formats

university of copenhagen department of economics

• Give an interpretation of the linear regression model for stationary time

• Explain the sufficient conditions for consistency, unbiasedness, and

• Construct misspecification tests and analyze to what extent a statistical

• Explain the consequences of autocorrelated, heteroskedastic, and

Econometrics II — Linear Regression with Time Series Data — Slide 2/47

Time Series Regression Models

• Consider the linear regression model

yt = xt0 β + t = x1t β1 + x2t β2 + ... + xkt βk + t ,

for t = 1, 2, ..., T . Note that yt and t are 1 × 1, while xt and β are k × 1.

• The interpretation of the model depends on the variables in xt .

Econometrics II — Linear Regression with Time Series Data — Slide 5/47

Interpretation of Regression Models

• Consider again the regression model

yt = xt0 β + t , t = 1, 2, ..., T . (∗)

As it stands, the equation is a tautology: not informative on β!

Econometrics II — Linear Regression with Time Series Data — Slide 7/47

• Predeterminedness implies the so-called moment condition:

stating that xt and t are uncorrelated.

E (xt (yt − xt0 β)) = 0

This is a system of k equations in the k unknown parameters, β, and if

β = E (xt xt0 )−1 E (xt yt ),

Econometrics II — Linear Regression with Time Series Data — Slide 8/47

Method of Moments (MM) Estimation

Note that the MM estimator coincides with the OLS estimator.

Econometrics II — Linear Regression with Time Series Data — Slide 9/47

• We impose assumptions to ensure that a LLN applies to the sample

Econometrics II — Linear Regression with Time Series Data — Slide 10/47

• Consistency is the first requirement for an estimator:

• The OLS estimator is consistent if the regression represents the

• The conditions in Result 1 are sufficient but not necessary.

Econometrics II — Linear Regression with Time Series Data — Slide 12/47

yt = xt β + t , t = 1, 2, ..., T , with E (xt t ) = 0 and E (xt ) = 0.

• Write the OLS estimator as

and look at the terms as T → ∞:

• Where LLN applies under Assumption 1; and

Econometrics II — Linear Regression with Time Series Data — Slide 13/47

• Unbiasedness requires strict exogeneity, which is not fulfilled in a dynamic

Here yt is function of t , so t cannot be uncorrelated with yt , yt+1 , ..., yT .

Result 3: Estimation bias in dynamic models

Example: Finite Sample Bias in an AR(1)

• Consider a first-order autoregressive, AR(1), model,

yt = θyt−1 + t , t = 1, 2, ..., T , (∗)

where θ = 0.9. The time series yt satisfy Assumption 1 of stationarity and

• But if we could draw realizations of θ

Econometrics II — Linear Regression with Time Series Data — Slide 15/47

Finite Sample Bias in an AR(1)

Econometrics II — Linear Regression with Time Series Data — Slide 16/47

Finite Sample Bias in an AR(1): Monte Carlo Simulations in

We consider Monte Carlo simulations to illustrate potential bias of the OLS

θ ∈ {0.0, 0.5, 0.9}.

This can be done quite easily in the module PcNaive in OxMetrics.

Econometrics II — Linear Regression with Time Series Data — Slide 17/47

Finite Sample Bias in an AR(1): PcNaive

Econometrics II — Linear Regression with Time Series Data — Slide 18/47

Finite Sample Bias in an AR(1): PcNaive Output for θ = 0

0.5 Ya_1 × 2MCSD

Econometrics II — Linear Regression with Time Series Data — Slide 19/47

Finite Sample Bias in an AR(1): PcNaive Output for θ = 0.5

20 40 60 80 100 120 140 160 180 200

Econometrics II — Linear Regression with Time Series Data — Slide 20/47

Finite Sample Bias in an AR(1): PcNaive Output for θ = 0.9

Econometrics II — Linear Regression with Time Series Data — Slide 21/47

yt = xt0 β + t = x1t β1 + x2t β2 + ... + xkt βk + t ,

for t = 1, 2, ..., T . Note that yt and t are 1 × 1, while xt and β are k × 1.

yt = xt0 β + t , t = 1, 2, ..., T . (∗)

stating that xt and t are uncorrelated.

yt = xt β + t , t = 1, 2, ..., T , with E (xt t ) = 0 and E (xt ) = 0.

Here yt is function of t , so t cannot be uncorrelated with yt , yt+1 , ..., yT .

yt = θyt−1 + t , t = 1, 2, ..., T , (∗)

• Recall that E (t xt ) = 0 is implied by E [t |xt ] = 0.

• If Cov(t , s ) 6= 0 for some t 6= s, we have autocorrelation of the error

• This is detected from the estimated residuals and Cov(b s ) 6= 0 is

Both yt−1 and t depend on t−1 , so E (yt−1 t ) 6= 0.

E ((xt − ρxt−1 ) (t − ρt−1 )) = 0.

But then t−1 should be uncorrelated with xt , i.e., E (t xt+1 ) = 0.

Then the error term is ut = x2t · β2 + t , which is autocorrelated if x2t is

where xt is included because it may be correlated with t−1 .

2t = γ0 + x1t γ1 + ... + xkt γk + x1t