
Applications of Econometrics

Time Series Basics


Wooldridge (2012) Chapters 10 & 11

Semester 2, 2023/24

Applications of Econometrics Ch. 10 & 11. Time Series Basics Semester 2, 2023/24 1 / 75
Course overview

Start recording!
Me: Dr. Yuejun Zhao
Textbook - Wooldridge, Introductory Econometrics: A Modern Approach, 7e
Part I - Topics in Time Series and Intro to Panel Data
Time series basics (chapters 10 & 11)
Serial correlation (chapter 12)
Stationary time series (chapter 18.1 and 18.5)
Nonstationary time series (chapter 18.2–18.4)
Diff-in-diff and first difference (chapter 13)

Applications of Econometrics Ch. 10 & 11. Time Series Basics Semester 2, 2023/24 2 / 75
In this lecture

1 Types of Time Series Models


Static models
Finite distributed lag (FDL) models
Models with lagged dependent variables

2 Finite-Sample Analysis of OLS for Time Series Data

3 Large-Sample Analysis of OLS for TS Data

4 Trends and Seasonality

Applications of Econometrics Ch. 10 & 11. Time Series Basics Semester 2, 2023/24 3 / 75
Types of Time Series Models

Types of Time Series Models

Applications of Econometrics Ch. 10 & 11. Time Series Basics Semester 2, 2023/24 4 / 75
Types of Time Series Models Static models

What are static models?

Let us begin with a class of models called static models.


The simplest time series model is a static model, which relates
the outcome (or dependent variable) at time t, yt , to one or more
explanatory variables (or independent variables, regressors) dated at the
same time.
With just one explanatory variable zt , we have

yt = β0 + β1 zt + ut .

In cross-sectional analysis, we used subscript i to denote the unit (e.g., individual, school). Here, we use subscript t to denote time (e.g., day, quarter).

Applications of Econometrics Ch. 10 & 11. Time Series Basics Semester 2, 2023/24 5 / 75
Types of Time Series Models Static models

An example of a static model

Example: Static Phillips curve


The Phillips curve is an economic model that hypothesises that inflation and
unemployment share an inverse relationship.
One way to write a static Phillips curve is

inft = β0 + β1 unemt + ut ,
where inft is, say, the annual rate of inflation during year t, and unemt is the annual
unemployment rate during year t. β1 is supposed to measure the tradeoff
between inflation and unemployment.

Applications of Econometrics Ch. 10 & 11. Time Series Basics Semester 2, 2023/24 6 / 75
Types of Time Series Models Static models

When are static models used?

Static models are generally used when we are interested in a contemporaneous relationship (i.e., a relationship at the same period). But they cannot capture effects that take place with a lag.
Static models are not good for forecasting. For one, they ignore the fact that past outcomes on y usually help predict future values of y. Second, to forecast yt+1 at time t we would have to know or forecast zt+1 at time t. There
are more direct ways to forecast.

Applications of Econometrics Ch. 10 & 11. Time Series Basics Semester 2, 2023/24 7 / 75
Types of Time Series Models FDL models

Finite distributed lag (FDL) model with two lags

Another class of models is the finite distributed lag (FDL) models.


Suppose that we think that a change in a variable, z, today, can affect y up to
two periods into the future. This calls for a finite distributed lag model with
two lags (in addition to the contemporaneous variable):

yt = α0 + δ0 zt + δ1 zt−1 + δ2 zt−2 + ut

The number of lags is finite: here, 2.
Such models are good for estimating lagged effects of, say, policy.

Applications of Econometrics Ch. 10 & 11. Time Series Basics Semester 2, 2023/24 8 / 75
Types of Time Series Models FDL models

An example of a finite distributed lag model

Example: Effects of personal exemption on fertility


Personal exemption is the amount taxpayers can claim as a tax deduction
against personal income in the US.
For purely biological reasons, making it monetarily more attractive
to have children – by increasing the value of the personal exemption, pe – is
unlikely to have a purely contemporaneous effect. Plus, people often react
with a lag to policy changes.
Allowing for a two-year effect on the general fertility rate (gfr ):

gfrt = α0 + δ0 pet + δ1 pet−1 + δ2 pet−2 + ut

Applications of Econometrics Ch. 10 & 11. Time Series Basics Semester 2, 2023/24 9 / 75
Types of Time Series Models FDL models

A similar example

Example: Effects of minimum wage on employment


Suppose we have monthly data on employment and the minimum wage. Will a
change in the minimum wage have its total effect on this month’s employment, or
might the effect take several months to pass through?

Applications of Econometrics Ch. 10 & 11. Time Series Basics Semester 2, 2023/24 10 / 75
Types of Time Series Models FDL models

Finite distributed lag model with q lags

We can generalise the FDL model.


An FDL model of order q is

yt = α0 + δ0 zt + δ1 zt−1 + δ2 zt−2 + . . . + δq zt−q + ut

As a practical matter, the choice of q can be hard. It is often dictated by the frequency of the data.
With annual data, q is usually small.
With monthly data, q is often chosen as 12, 24, or even higher, depending on
how many months of data we have.
As we will see, under some assumptions we can use an F test to see if
additional lags are jointly significant.
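
As a concrete illustration, here is a minimal Stata sketch of estimating an FDL and testing extra lags; the dataset, the time variable year, and the names y and z are assumptions, not part of the slides.

* Sketch only: assumes an annual dataset with variables year, y and z in memory.
tsset year                    // declare the time dimension
reg y z L.z L2.z L3.z L4.z    // FDL with four lags of z
test L3.z L4.z                // F test: are the third and fourth lags jointly zero?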

Applications of Econometrics Ch. 10 & 11. Time Series Basics Semester 2, 2023/24 11 / 75
Types of Time Series Models FDL models

Lag distribution

With an FDL model, we are often interested in the shape of the lag
distribution, which is just the values of the δj . Of course, we will have to
eventually estimate the δj .
Unfortunately, the estimated lag distribution is often very jagged because the
lag coefficients are imprecisely estimated.

Applications of Econometrics Ch. 10 & 11. Time Series Basics Semester 2, 2023/24 12 / 75
Types of Time Series Models FDL models

Lag distribution illustrated

Here, δ0 = .3, δ1 = .4, δ2 = .1, and δj = 0 for j ≥ 3. So it represents an FDL of order 2, with the maximum effect after one period.
Applications of Econometrics Ch. 10 & 11. Time Series Basics Semester 2, 2023/24 13 / 75
Types of Time Series Models FDL models

Impact propensity

With an FDL model, we can calculate something called the impact propensity.

yt = α0 + δ0 zt + δ1 zt−1 + δ2 zt−2 + . . . + δq zt−q + ut

The coefficient on the contemporaneous z, δ0 , is the impact propensity (IP):

Impact Propensity = δ0

It tells us the immediate change in y when z increases by one unit.


If both variables are in logarithmic form, the IP is sometimes called the short
run (instantaneous) elasticity.
In some examples, δ0 might be zero. We can impose that by dropping zt .

Applications of Econometrics Ch. 10 & 11. Time Series Basics Semester 2, 2023/24 14 / 75
Types of Time Series Models FDL models

Long run propensity (LRP)

With an FDL model, we can also compute something called the long run
propensity.
yt = α0 + δ0 zt + δ1 zt−1 + δ2 zt−2 + . . . + δq zt−q + ut

The sum of all the lag coefficients – keeping track of their signs –
is the long run propensity (LRP).

Long Run Propensity = δ0 + δ1 + . . . + δq

The LRP gives us the answer to the following thought experiment. Suppose
the level of z increases permanently today. For example, the minimum wage
increases by $1.00 per hour and stays there. The LRP is the (ceteris paribus)
change in y after the change in z has passed through all q time periods.
If y and z are both in logs, the LRP is called the long run elasticity.
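
A hedged Stata sketch of how the LRP (and its standard error) could be obtained after estimating the fertility FDL from a few slides back; the variable names gfr, pe and year are assumptions about the dataset.

* Sketch: FDL(2) of the fertility rate on the personal exemption and its lags.
tsset year
reg gfr pe L.pe L2.pe
* The LRP is the sum of the lag coefficients; lincom reports the estimate and its SE.
lincom pe + L.pe + L2.pe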

Applications of Econometrics Ch. 10 & 11. Time Series Basics Semester 2, 2023/24 15 / 75
Types of Time Series Models FDL models

What if the change is not permanent?

Notice that if z increases by one unit today, but then falls back to its original
level in the next period, the lag distribution tells us how y changes in each
future period. Eventually y falls back to its original level with a temporary
change in z.
With a permanent change in z, y changes to a new level, and the change from
the old to the new level is the LRP.

Applications of Econometrics Ch. 10 & 11. Time Series Basics Semester 2, 2023/24 16 / 75
Types of Time Series Models FDL models

An example of an LRP
We can plot the cumulative effect of changing z permanently (other factors
fixed) using the lag distribution plotted earlier. LRP = .3 + .4 + .1 = .8.

The LRP of 0.8 is reached once we sum all the non-zero lags.


Applications of Econometrics Ch. 10 & 11. Time Series Basics Semester 2, 2023/24 17 / 75
Types of Time Series Models FDL models

Finite distributed lag models with multiple explanatory variables

We can, of course, have more than one variable appear with multiple lags.

Example: Federal funds rate


A simple equation to explain how the Federal Reserve Bank in the U.S. changes
the Federal Funds Rate is

ffratet = α0 + δ0 inf t + δ1 inf t−1 + δ2 inf t−2 + γ0 gdpgapt + γ1 gdpgapt−1 + γ2 gdpgapt−2 + ut ,

where inf t is the inflation rate and gdpgapt is the GDP gap (actual GDP minus potential
GDP, measured as a percent). Since the available data are usually quarterly, we
would probably try at least four lags.

Applications of Econometrics Ch. 10 & 11. Time Series Basics Semester 2, 2023/24 18 / 75
Types of Time Series Models FDL models

Are FDLs good for forecasting?

FDLs are often more realistic than static models, and they can do better in
forecasting because they account for some dynamic behaviour.
But they are not usually the most preferred for forecasting because they do
not allow lagged outcomes on y to directly affect current outcomes.
Which brings us to...

Applications of Econometrics Ch. 10 & 11. Time Series Basics Semester 2, 2023/24 19 / 75
Types of Time Series Models Models with lagged dependent variables

Autoregressive model of order 1

The third class of models is models with lagged dependent variables.


With time series data, there is the possibility of allowing past outcomes on y to
affect current y. The simplest model is

yt = β0 + β1 yt−1 + ut ,

which is a simple regression model for time series data where the explanatory
variable at time t is yt−1 .
Called an autoregressive model of order 1, or AR(1).

Applications of Econometrics Ch. 10 & 11. Time Series Basics Semester 2, 2023/24 20 / 75
Types of Time Series Models Models with lagged dependent variables

Autoregressive models with higher orders

Does the AR(1) carry an economic interpretation? This simple model typically does
not have much economic or policy interest because we are just using lagged y
to explain current y .
We can add even more lags of y to explain yt . (Each time we add a lag, we
lose an observation for estimating the parameters.)
Autoregressive models can be remarkably good at forecasting, even
compared with complicated economic models.

Applications of Econometrics Ch. 10 & 11. Time Series Basics Semester 2, 2023/24 21 / 75
Types of Time Series Models Models with lagged dependent variables

Combining AR models with static models

It is easy to add other explanatory variables along with a lag. For example,

yt = β0 + β1 yt−1 + β2 zt + ut

β2 measures the effect of changing zt on yt , holding fixed yt−1 . It is a kind of short-run effect of z on y .
Controlling for yt−1 can help when estimating the causal effect of zt on yt : it recognises that the policy variable (zt ) may be correlated with yt−1 .

Applications of Econometrics Ch. 10 & 11. Time Series Basics Semester 2, 2023/24 22 / 75
Types of Time Series Models Models with lagged dependent variables

An example of an AR(1) with a static regressor

Example: Monthly employment growth


Consider
gempt = β0 + β1 gempt−1 + β2 gminwaget + ut
where gempt is, say, monthly employment growth (as a percentage) in an
economy, or a sector of the economy, and gminwaget is the percentage growth
in the real value of the minimum wage.
β2 measures the effect of changing minimum wage growth on employment
growth this period. By controlling for gempt−1 , we allow the possibility that
gminwaget reacts to past employment.
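
A minimal sketch of how this regression could be run in Stata; the monthly date variable month and the variable names are assumptions.

* Sketch: the lagged dependent variable enters through the L. operator after tsset.
tsset month
reg gemp L.gemp gminwage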

Applications of Econometrics Ch. 10 & 11. Time Series Basics Semester 2, 2023/24 23 / 75
Types of Time Series Models Models with lagged dependent variables

Other extensions

Other extensions can be very useful for forecasting. For example, to forecast
inflation one period out, we include only lags of variables:

inft = β0 + β1 inft−1 + β2 unemt−1 + ut

This model implies we forecast next period’s inflation, inft+1 , as a linear function of this period’s inflation and unemployment:

β0 + β1 inft + β2 unemt

(What about the error term? We cannot forecast the error term next period,
t + 1, so it is set to its mean value, zero.)
We have to estimate the βj first to make this operational.
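
A sketch of making the forecast operational in Stata; the annual date variable year and the names inf and unem are assumptions.

* Sketch: estimate the model, then build the one-step-ahead forecast from the
* most recent observed values of inflation and unemployment.
tsset year
reg inf L.inf L.unem
scalar fcast = _b[_cons] + _b[L.inf]*inf[_N] + _b[L.unem]*unem[_N]
display "forecast of next period's inflation: " fcast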

Applications of Econometrics Ch. 10 & 11. Time Series Basics Semester 2, 2023/24 24 / 75
Types of Time Series Models Models with lagged dependent variables

Inference on models with lagged dependent variables

Statistically, models with lagged dependent variables are more difficult to study. For one, the OLS estimators are no longer unbiased under any
assumptions, so large-sample analysis is very important.
(Large-sample analysis is critical for static and FDL models, too, but at least a
finite-sample analysis makes sense sometimes.)
Large-sample analysis is much trickier with time series data because of
correlation across time. With cross-sectional data, we relied on random
sampling.

Applications of Econometrics Ch. 10 & 11. Time Series Basics Semester 2, 2023/24 25 / 75
Finite-Sample Properties

Finite-Sample Analysis of
OLS for Time Series Data

Applications of Econometrics Ch. 10 & 11. Time Series Basics Semester 2, 2023/24 26 / 75
Finite-Sample Properties

TS.1 Linear in parameters

Assumption TS.1 (Linear in Parameters)


For a time series process {(yt , xt1 , . . . , xtk ) : t = 1, . . . , n},

yt = β0 + β1 xt1 + β2 xt2 + . . . + βk xtk + ut , t = 1, . . . , n.

Can we allow non-linear combinations of xtj ? Yes.


Can we allow non-linear combinations of βj ? No.

Applications of Econometrics Ch. 10 & 11. Time Series Basics Semester 2, 2023/24 27 / 75
Finite-Sample Properties

TS.2 No perfect collinearity

Assumption TS.2 (No Perfect Collinearity)


Each xtj varies somewhat over time, and no explanatory variable is an exact linear
function of the others.

TS.2 rules out perfect correlation. The consequences of high (but imperfect) correlation among the xtj are the same as in the cross-sectional case: it is not a violation of any
assumption, but it can make it difficult to estimate the parameters precisely.

Applications of Econometrics Ch. 10 & 11. Time Series Basics Semester 2, 2023/24 28 / 75
Finite-Sample Properties

TS.3 Zero conditional mean

Assumption TS.3 (Zero Conditional Mean)


For each t,
E(ut |x1 , x2 , . . . , xt , . . . , xn ) = 0
where xt = (xt1 , . . . , xtk ) is the collection of explanatory variables at time t.

In practice, we ask whether ut is uncorrelated with each xsj for all t and s,
including t = s and all variables j = 1, . . . , k .
Assumption TS.3 is often called strict exogeneity of {xt : t = 1, . . . , n}.

Applications of Econometrics Ch. 10 & 11. Time Series Basics Semester 2, 2023/24 29 / 75
Finite-Sample Properties

TS.1 – TS.3 Unbiasedness

Theorem (Unbiasedness of OLS for Time Series)


Under Assumptions TS.1, TS.2, and TS.3, the OLS estimators are unbiased
(conditional on the realisation of the explanatory variables):

E(β̂j ) = βj , j = 0, . . . , k .

Notice that we get unbiasedness without restricting the correlation across time
in the explanatory variables. In other words, the {xtj } are allowed to be
correlated across time.
Further, the errors, {ut } are allowed to be correlated across time.
What we are ruling out with TS.3 is correlation between the errors in any time
periods and the explanatory variables in any time period.

Applications of Econometrics Ch. 10 & 11. Time Series Basics Semester 2, 2023/24 30 / 75
Finite-Sample Properties

TS.4 Homoskedasticity

Of course, unbiasedness says nothing about how precise the OLS estimators
are, and it does not give us a way to test hypotheses or construct confidence
intervals.

Assumption TS.4 (Homoskedasticity)


For all t,
Var (ut |X) = σ 2 .

The variance of ut cannot depend on xt or xs , nor can it change over time.
Violation of TS.4 is called “heteroskedasticity,” as in the cross-sectional case.

Applications of Econometrics Ch. 10 & 11. Time Series Basics Semester 2, 2023/24 31 / 75
Finite-Sample Properties

TS.5 No serial correlation

Assumption TS.5 (No Serial Correlation)


For all t ̸= s,
Corr (ut , us |X) = 0

This is sometimes called the no serial correlation (no autocorrelation) assumption.
In practice, do not worry about the conditioning on X. Just consider
Corr (ut , us ).

Applications of Econometrics Ch. 10 & 11. Time Series Basics Semester 2, 2023/24 32 / 75
Finite-Sample Properties

Three types of correlations

Stack the regressors and errors over time as the vectors (x1 , x2 , . . . , xt , . . . , xn ) and (u1 , u2 , . . . , ut , . . . , un ). Three types of correlation are at play:

1. Correlation among the xt across time: TS.2 only rules out perfect collinearity.
2. Correlation between xs and ut : TS.3 (zero conditional mean, i.e. strict exogeneity) rules this out for all s and t; the large-sample version TS.3′ only restricts the contemporaneous case s = t.
3. Correlation among the ut across time: ruled out by TS.5 (no serial correlation).
Applications of Econometrics Ch. 10 & 11. Time Series Basics Semester 2, 2023/24 33 / 75
Finite-Sample Properties

TS.1 – TS.5 Gauss-Markov Theorem

Theorem (Gauss-Markov Theorem for TS):


Under TS.1 through TS.5, the OLS estimators are BLUE: the best (i.e., smallest
variance), linear, unbiased estimators.

Assumptions TS.1 through TS.5 are the Gauss-Markov assumptions for
time series data.

Applications of Econometrics Ch. 10 & 11. Time Series Basics Semester 2, 2023/24 34 / 75
Finite-Sample Properties

TS.6 Normality

To perform exact inference, we add a normality assumption.

Assumption TS.6 (Normality)


{ut } is independent of X and is independent and identically distributed (i.i.d.) as
Normal(0, σ 2 ):
ut ∼ Normal(0, σ 2 ), t = 1, 2, . . . , n

Assumptions TS.1 to TS.6 are the classical linear model (CLM) assumptions for time series.

Applications of Econometrics Ch. 10 & 11. Time Series Basics Semester 2, 2023/24 35 / 75
Finite-Sample Properties

TS.1 – TS.6 Exact inference

Theorem (Statistical Inference for TS)


Under TS.1 to TS.6, the t statistic (β̂j − βj )/se(β̂j ) has an exact tn−k−1 distribution
(under the null), the usual confidence intervals have the pre-specified confidence
levels, and the F statistic [(SSRr − SSRur )/q] / [SSRur /(n − k − 1)]
has an exact Fq,n−k−1 distribution.

However, the full set of CLM assumptions is often unrealistic for TS applications.
Strict exogeneity rules out some interesting cases such as autoregressive
models.
Serial correlation is often a problem, especially in static and FDL models.

Applications of Econometrics Ch. 10 & 11. Time Series Basics Semester 2, 2023/24 36 / 75
Finite-Sample Properties

Example: Inflation and interest rate

Consider an example where strict exogeneity does not hold.


Suppose we have an FDL relationship between inflation and an interest rate,
say, the federal funds rate:

inft = α0 + δ0 ffratet + δ1 ffratet−1 + δ2 ffratet−2 + ut

If we assume two lags of the FF rate suffice, we need not worry about
correlation between ut and further lags of ffrate.
But perhaps a positive shock to inflation at time t – that is, ut > 0 – leads the
Fed to increase ffrate the next period. Then ffratet+1 and ut are correlated,
violating strict exogeneity.
Fortunately, we can allow these situations when we examine large-sample
properties. But there are some additional complications there.

Applications of Econometrics Ch. 10 & 11. Time Series Basics Semester 2, 2023/24 37 / 75
Finite-Sample Properties

Something for the break

What is the large-sample equivalent of each assumption?

Applications of Econometrics Ch. 10 & 11. Time Series Basics Semester 2, 2023/24 38 / 75
Large-Sample Analysis for TS Data

Large-Sample Analysis of OLS for TS Data

Applications of Econometrics Ch. 10 & 11. Time Series Basics Semester 2, 2023/24 39 / 75
Large-Sample Analysis for TS Data

Large sample assumptions

The assumption of stationarity – that all joint distributions of the time series
process are constant across time – simplifies statements of the assumptions
but is not crucial.
The crucial assumption is weak dependence (topic 4).
For a series yt that follows
yt = ρyt−1 + ut ,
weak dependence means that |ρ| < 1.
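
For this AR(1) (with i.i.d. errors and the process in its stationary distribution), the autocorrelations die out geometrically, which is the sense in which the series is weakly dependent:

Corr (yt , yt+h ) = ρ^h → 0 as h → ∞, provided |ρ| < 1.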

Applications of Econometrics Ch. 10 & 11. Time Series Basics Semester 2, 2023/24 40 / 75
Large-Sample Analysis for TS Data

TS.1′ Linearity and weak dependence

Assumption TS.1′ (Linearity and Weak Dependence)


The model is
yt = β0 + β1 xt1 + . . . + βk xtk + ut
where {(xt1 , . . . , xtk , yt )} is a stationary and weakly dependent process. In
particular, we can apply the law of large numbers and central limit theorem.

This is the same linear-in-parameters model as usual, but we also restrict the
time series dependence in the data.

Applications of Econometrics Ch. 10 & 11. Time Series Basics Semester 2, 2023/24 41 / 75
Large-Sample Analysis for TS Data

TS.2′ No perfect collinearity

Assumption TS.2′ (No Perfect Collinearity)


Each xtj varies somewhat over time, and no explanatory variable is an exact linear
function of the others. (Same as TS.2.)

Applications of Econometrics Ch. 10 & 11. Time Series Basics Semester 2, 2023/24 42 / 75
Large-Sample Analysis for TS Data

TS.3′ Zero conditional mean

Assumption TS.3′ (Zero Conditional Mean)


The explanatory variables are contemporaneously exogenous, that is

E(ut |xt1 , . . . , xtk ) = E(ut ) = 0.

This is implied by strict exogeneity, but it also holds in many cases where strict exogeneity fails (recall the three types of correlation from the finite-sample section).

Applications of Econometrics Ch. 10 & 11. Time Series Basics Semester 2, 2023/24 43 / 75
Large-Sample Analysis for TS Data

TS.1′ – TS.3′ Consistency

Theorem (Consistency of OLS)


Under Assumptions TS.1′ to TS.3′ , the OLS estimators are consistent. That is, the
probability limit of β̂j is βj as the sample size grows.

This has a similar flavour to the unbiasedness result, but there are two key points:
Unbiasedness required strict exogeneity but consistency does not.
Consistency assumes weak dependence whereas unbiasedness does not
(provided we have strict exogeneity).
The consistency result justifies models with lagged dependent variables and
other non-strictly exogenous variables. But we often have small time series
samples, especially with annual data.

Applications of Econometrics Ch. 10 & 11. Time Series Basics Semester 2, 2023/24 44 / 75
Large-Sample Analysis for TS Data

TS.4′ Homoskedasticity

Assumption TS.4′ (Homoskedasticity)


The errors are contemporaneously homoskedastic, that is,

Var (ut |xt ) = Var (ut ) = σ 2

This assumption is more natural than TS.4, where we asked ut to have constant variance conditional on X from all periods.
Here, the variance of the error term cannot depend on whatever explanatory
variables are in the equation at time t.
For example, in
yt = α0 + δ0 zt + δ1 zt−1 + δ2 zt−2 + ut ,
we require that Var (ut |zt , zt−1 , zt−2 ) = σ 2 .

Applications of Econometrics Ch. 10 & 11. Time Series Basics Semester 2, 2023/24 45 / 75
Large-Sample Analysis for TS Data

TS.5′ No serial correlation

Assumption TS.5′ (No Serial Correlation)


For all t ̸= s,

E(ut us |xt , xs ) = 0

This assumption is stated in the form needed to get useful results. But
when we go to evaluate it, we focus on the covariance without the
conditioning:

Cov (ut , us ) = 0, all t ̸= s.


The assumption is much more likely to hold—in fact, the goal is often to make
it hold—when we include lagged dependent variables.

Applications of Econometrics Ch. 10 & 11. Time Series Basics Semester 2, 2023/24 46 / 75
Large-Sample Analysis for TS Data

Example: Inflation and interest rate

In the interest rate example, suppose ut is serially correlated in

inft = α0 + δ0 ffratet + δ1 ffratet−1 + δ2 ffratet−2 + ut .

We can add lags of inf or further lags of ffrate to eliminate the serial
correlation.
We may end up with a model such as

inft = α0 + ρ1 inft−1 + δ0 ffratet + δ1 ffratet−1 + δ2 ffratet−2 + δ3 ffratet−3 + ut .

The key point is this: we can include lagged y and possibly other variables; if
we include enough lags, the error should no longer be serially correlated.

Applications of Econometrics Ch. 10 & 11. Time Series Basics Semester 2, 2023/24 47 / 75
Large-Sample Analysis for TS Data

TS.1′ – TS.5′ Asymptotic normality

Theorem (Asymptotic Normality of OLS)


Under Assumptions TS.1′ to TS.5′ , the OLS estimators are approximately normally
distributed as n → ∞. Moreover, the usual t statistics are asymptotically standard
normal and the F statistics are valid in large samples, as are the usual OLS
confidence intervals.

The upshot is that we can use large-sample inference for time series regressions
just as we do with cross-sectional data, with random sampling replaced by
weak dependence (which allows some, but not too much, correlation over time).
In topic 2, we discuss what to do when there is serial correlation – Assumption
TS.5′ is violated – as that is an issue we did not need to confront with CS data.

Applications of Econometrics Ch. 10 & 11. Time Series Basics Semester 2, 2023/24 48 / 75
Large-Sample Analysis for TS Data

Summary

Compared to finite-sample analysis, we have


additionally imposed weak dependence (TS.1’)
left no perfect collinearity unchanged (same TS.2’)
replaced strict exogeneity with contemporaneous exogeneity (TS.3’)
achieved consistency instead of unbiasedness (TS.1’–TS.3’)
replaced rather strict homoskedasticity with contemporaneous
homoskedasticity (TS.4’)
stated no serial correlation conditional only on the explanatory variables dated
in the same periods as the errors (TS.5’)
attained asymptotic normality rather than imposing normality (no TS.6’).

Applications of Econometrics Ch. 10 & 11. Time Series Basics Semester 2, 2023/24 49 / 75
Trends and Seasonality

Trends and Seasonality

Applications of Econometrics Ch. 10 & 11. Time Series Basics Semester 2, 2023/24 50 / 75
Trends and Seasonality

Trending data

Many series tend to increase over time, at least on average. They typically
have up and down periods, but the overall trend is up. (An example is gross
domestic product.)
Other variables tend to decline over time (such as the rate of traffic fatalities).
Whether a series grows or shrinks over time, care is needed
because we can find spurious (not genuine) relations among trending
variables that have nothing to do with each other.

Applications of Econometrics Ch. 10 & 11. Time Series Basics Semester 2, 2023/24 51 / 75
Trends and Seasonality

Trend illustrated

Example: Index of U.S. labour productivity, 1947-1987

There is an upward trend that is more or less linear.

Applications of Econometrics Ch. 10 & 11. Time Series Basics Semester 2, 2023/24 52 / 75
Trends and Seasonality

Linear time trends

It is very common to use linear time trends. A simple representation is

yt = α0 + α1 t + et
E(et ) = 0 for all t

So the average value of yt is a linear function of time:

E(yt ) = α0 + α1 t

and et captures the (small) deviations about the trend.

Applications of Econometrics Ch. 10 & 11. Time Series Basics Semester 2, 2023/24 53 / 75
Trends and Seasonality

Detrending the data

Let’s define the change in yt from period t − 1 to t as

∆yt = yt − yt−1

Then under the linear trend representation,

∆yt = yt − yt−1
= (α0 + α1 t + et ) − (α0 + α1 (t − 1) + et−1 ) (1)
= α1 + ∆et

Applications of Econometrics Ch. 10 & 11. Time Series Basics Semester 2, 2023/24 54 / 75
Trends and Seasonality

Interpreting α1

Then how do we interpret α1 ?


Because E(∆et ) = E(et − et−1 ) = 0,

α1 = E(∆yt ) − E(∆et )
= E(∆yt ) for all t

In other words, α1 is the average change in yt over each period.

Applications of Econometrics Ch. 10 & 11. Time Series Basics Semester 2, 2023/24 55 / 75
Trends and Seasonality

Exponential trends
Other series are better approximated by exponential trends (population, imports).

Example: U.S. imports, billions of dollars

Imports grow exponentially with time.


Applications of Econometrics Ch. 10 & 11. Time Series Basics Semester 2, 2023/24 56 / 75
Trends and Seasonality

Specifying exponential trends

For strictly positive variables – by far the leading case – we can capture an
exponential trend as
yt = exp(β0 + β1 t + et ),
where E(et ) = 0.
Taking logs gives
log(yt ) = β0 + β1 t + et
In other words, the log of the variable follows a linear trend.
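
A small Stata sketch of fitting an exponential trend via the log; the variable name imports is an assumption.

* Sketch: the slope on t in the log regression estimates the average growth rate per period.
gen limports = log(imports)
gen t = _n
reg limports t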

Applications of Econometrics Ch. 10 & 11. Time Series Basics Semester 2, 2023/24 57 / 75
Trends and Seasonality

Interpreting β1

How can we interpret β1 ? Define the change in the log as

∆ log(yt ) = log(yt ) − log(yt−1 )

Since E(et ) = E(et−1 ) = 0, following the derivation in Eq. (1) we get

E[∆ log(yt )] = β1 for all t

Remember that the change in the log approximates the growth rate
(as a decimal). Therefore,

β1 ≈ E[(yt − yt−1 )/yt−1 ]

Applications of Econometrics Ch. 10 & 11. Time Series Basics Semester 2, 2023/24 58 / 75
Trends and Seasonality

Trending data in regression

In order to be sure we are capturing a true relationship between yt and the explanatory variables when at least one is linearly trending, we can add a linear time trend to the regression.
An example is

yt = β0 + β1 xt1 + β2 xt2 + β3 t + ut , t = 1, 2, . . . , n

where t is a variable that runs from 1 to n.


This equation allows us to control for a linear trend that affects yt and may
also be related to trends in xt1 and xt2 .

Applications of Econometrics Ch. 10 & 11. Time Series Basics Semester 2, 2023/24 59 / 75
Trends and Seasonality

Inference with and without the trend

Is the trend term really necessary?


If the equation with the time trend satisfies TS.1 (linear in parameters), TS.2
(no perfect collinearity), and TS.3 (zero conditional mean), then leaving out t
causes bias – perhaps severe bias – in estimating β1 and β2 .
Under the full set of CLM assumptions (adding homoskedasticity, no serial
correlation, and normality), we can use the usual t and F statistics and confidence
intervals in the usual way, including for testing the coefficient on the
trend, β3 .

Applications of Econometrics Ch. 10 & 11. Time Series Basics Semester 2, 2023/24 60 / 75
Trends and Seasonality

Example: Effects of seat belt and speed laws on traffic accidents

Let’s explore trends with an example. The data (TRAFFIC.DTA) are monthly
for 9 years for California, from 1981 to 1989 (108 observations).
. des totacc spdlaw beltlaw unem t
storage display value
variable name type format label variable label
------------------------------------------------------------------------
totacc float %9.0g statewide total accidents
spdlaw byte %9.0g =1 after 65 mph in effect
beltlaw byte %9.0g =1 after seatbelt law
unem float %9.0g state unemployment rate
t int %9.0g time trend
. sum totacc spdlaw beltlaw unem t
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
totacc | 108 42831.26 4608.328 32699 52971
spdlaw | 108 .2962963 .4587521 0 1
beltlaw | 108 .4444444 .4992206 0 1
unem | 108 7.200926 1.790134 4.3 11.9
t | 108 54.5 31.32092 1 108

totacc is the total number of accidents in each month. The key policy variables
are spdlaw and beltlaw, binary indicators. spdlaw is one in the months after the speed limit
was raised from 55 mph to 65 mph. beltlaw is one after a mandatory seat belt
law went into effect.

Applications of Econometrics Ch. 10 & 11. Time Series Basics Semester 2, 2023/24 61 / 75
Trends and Seasonality

Example: Effects of seat belt and speed laws on traffic accidents

If we estimate

log(totacct ) = β0 + β1 spdlawt + β2 beltlawt + ut ,


(what model is this?*), we get
. reg ltotacc spdlaw beltlaw
Source | SS df MS Number of obs = 108
---------+------------------------------ F( 2, 105) = 100.78
Model | .82707904 2 .41353952 Prob > F = 0.0000
Residual | .430858438 105 .004103414 R-squared = 0.6575
---------+------------------------------ Adj R-squared = 0.6510
Total | 1.25793748 107 .011756425 Root MSE = .06406
------------------------------------------------------------------------------
ltotacc | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
spdlaw | .0127553 .0196136 0.65 0.517 -.0261349 .0516456
beltlaw | .1674237 .0180237 9.29 0.000 .131686 .2031613
_cons | 10.58104 .0082698 1279.47 0.000 10.56465 10.59744
------------------------------------------------------------------------------

Because the dependent variable is a logarithm, the estimates imply that raising the speed limit increased accidents by about 1.3% on average, but the
effect is not statistically different from zero.
Imposing a seat belt law increased accidents by a large 16.7%, and this is
statistically significant.

Applications of Econometrics Ch. 10 & 11. Time Series Basics Semester 2, 2023/24 62 / 75
Trends and Seasonality

Example: Effects of seat belt and speed laws on traffic accidents

What do we make of the coefficients?


There is evidence that people drive less safely when they feel more secure.
Could it be that forcing people to wear seat belts actually increases accidents
by such a large amount?
Accidents (in particular, the log of accidents) trend upward over the period (as population
increases and more miles are driven). The laws went into effect in the latter
half of the sample, so spdlaw and beltlaw are positively correlated with a time
trend.

Applications of Econometrics Ch. 10 & 11. Time Series Basics Semester 2, 2023/24 63 / 75
Trends and Seasonality

Example: Effects of seat belt and speed laws on traffic accidents

Total traffic accidents fluctuate but increase over time on average.

Applications of Econometrics Ch. 10 & 11. Time Series Basics Semester 2, 2023/24 64 / 75
Trends and Seasonality

Example: Effects of seat belt and speed laws on traffic accidents

But including the time trend gives


. reg ltotacc spdlaw beltlaw t
Source | SS df MS Number of obs = 108
---------+------------------------------ F( 3, 104) = 91.13
Model | .91128369 3 .30376123 Prob > F = 0.0000
Residual | .346653788 104 .003333209 R-squared = 0.7244
---------+------------------------------ Adj R-squared = 0.7165
Total | 1.25793748 107 .011756425 Root MSE = .05773
------------------------------------------------------------------------------
ltotacc | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
spdlaw | -.0352312 .0200908 -1.75 0.082 -.075072 .0046096
beltlaw | .091445 .0221899 4.12 0.000 .0474416 .1354484
t | .0019994 .0003978 5.03 0.000 .0012106 .0027883
_cons | 10.52006 .0142396 738.79 0.000 10.49182 10.5483
------------------------------------------------------------------------------

Now increasing the speed limit from 55 to 65 appears to decrease total
accidents. The effect of the seat belt law is now smaller but still substantial
and statistically significant.

Applications of Econometrics Ch. 10 & 11. Time Series Basics Semester 2, 2023/24 65 / 75
Trends and Seasonality

Example: Effects of seat belt and speed laws on traffic accidents

Could there be omitted variables?


It is possible that raising the maximum speed limit (on rural interstates) from
55 mph to 65 reduced accidents, and that imposing a seat belt law increased
accidents. But it could also be that the linear time trend only crudely accounts
for other factors affecting accidents.
Adding a measure of economic activity – the state level unemployment rate –
makes the speed limit law have even more of a negative effect and it becomes
very statistically significant (next page). The seatbelt law now has about a
6.9% effect and is still statistically significant.

Applications of Econometrics Ch. 10 & 11. Time Series Basics Semester 2, 2023/24 66 / 75
Trends and Seasonality

Example: Effects of seat belt and speed laws on traffic accidents

If the unemployment rate increases by one percentage point, total accidents fall by an estimated 2.7%, and tunem = −5.75.
. reg ltotacc spdlaw beltlaw unem t
Source | SS df MS Number of obs = 108
---------+------------------------------ F( 4, 103) = 97.65
Model | .995434322 4 .24885858 Prob > F = 0.0000
Residual | .262503156 103 .002548574 R-squared = 0.7913
---------+------------------------------ Adj R-squared = 0.7832
Total | 1.25793748 107 .011756425 Root MSE = .05048
------------------------------------------------------------------------------
ltotacc | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
spdlaw | -.0557147 .0179257 -3.11 0.002 -.0912661 -.0201633
beltlaw | .0686293 .0198053 3.47 0.001 .0293503 .1079084
unem | -.0265936 .004628 -5.75 0.000 -.0357722 -.017415
t | .0013536 .0003656 3.70 0.000 .0006286 .0020786
_cons | 10.76297 .0440684 244.23 0.000 10.67557 10.85037
------------------------------------------------------------------------------

The time trend shows that, controlling for the unemployment rate and policy
changes, total accidents increase by about 0.135% per month, or 1.62% at an
annual rate. (How did we get 1.62%?*)
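
One back-of-the-envelope way to see the annual figure from the monthly estimate above: 12 × 0.0013536 ≈ 0.0162, i.e. about 1.62% per year.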

Applications of Econometrics Ch. 10 & 11. Time Series Basics Semester 2, 2023/24 67 / 75
Trends and Seasonality

A detrending interpretation of including time trends

It turns out that adding a time trend – linear or more complicated – to a multiple regression analysis has a nice interpretation in terms of detrending yt
and each explanatory variable.
If we include a time trend, the OLS coefficient on xtj is the same as if we first
remove a trend from yt and each xt1 , . . . , xtk – whether or not we need to.
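
A hedged Stata sketch of this detrending interpretation, using the generic names y, x1, x2 and t from the earlier trend slide (all assumed to be in the dataset). The slope coefficients from the final regression match those on xt1 and xt2 in the regression that includes t directly.

* Sketch: partial the linear trend out of y and each x, then regress the
* detrended series on one another (Frisch-Waugh logic).
reg y t
predict y_dt, resid
reg x1 t
predict x1_dt, resid
reg x2 t
predict x2_dt, resid
reg y_dt x1_dt x2_dt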

Applications of Econometrics Ch. 10 & 11. Time Series Basics Semester 2, 2023/24 68 / 75
Trends and Seasonality

Goodness of fit with a time trend specified

One needs to be cautious in interpreting goodness of fit when yt is trending.


The usual and adjusted R-squareds do not properly remove the trend from yt .
In the last regression using log(totacct ), R 2 = .791, which suggests a very
good fit. But most of that is due to the time trend. In other words, the trend
inflates R 2 .
Including a time trend essentially represents our ignorance about omitted
factors causing yt to trend up or down. We should not get credit for the fit due
to yt trending up or down for reasons we have not explained.

Applications of Econometrics Ch. 10 & 11. Time Series Basics Semester 2, 2023/24 69 / 75
Trends and Seasonality

Introduction to seasonality

The next important concept in TS analysis is seasonality.


When the data are quarterly or monthly (or, less often, weekly or daily), we
may need to account for different seasonal patterns in yt or the xtj .
Often, quarterly or monthly data are seasonally adjusted by the government,
in which case we can ignore seasonality. But in other cases the data have not
been adjusted.

Applications of Econometrics Ch. 10 & 11. Time Series Basics Semester 2, 2023/24 70 / 75
Trends and Seasonality

How to account for seasonality?

An easy way to account for seasonality is to include seasonal dummy variables in the regression equation. We choose a quarter (usually the first)
or a month (usually January) and then include dummies for the remaining
quarters or months. (Why do we leave one out?*)
For example, if we have monthly data, we define dummies febt , mart , ..., novt ,
dect , equal to one when t corresponds to the appropriate month, and zero
otherwise.

yt = β0 + δ1 febt + δ2 mart + . . . + δ11 dect + β1 xt1 + . . . + βk xtk + ut
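
A sketch of one way to build the monthly dummies in Stata, assuming a %tm monthly date variable called tmonth; factor-variable notation drops the base month automatically, which is exactly why one dummy is left out.

* Sketch: create a month-of-year variable and let factor notation supply the
* eleven dummies (January, the base category, is omitted).
gen m = month(dofm(tmonth))
reg y i.m x1 x2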

Applications of Econometrics Ch. 10 & 11. Time Series Basics Semester 2, 2023/24 71 / 75
Trends and Seasonality

Interpreting models with seasonal dummies

Several noteworthy points:


What happens when there exist trends and seasonality? We can include
trends along with seasonal dummies.
If the CLM assumptions hold, we can use a joint F test for whether the
seasonal dummies are jointly significant.
One can give OLS regression with seasonal dummies an interpretation of
deseasonalising the data.
R-squareds can be computed after yt has been deseasonalised (and possibly
detrended, too).

Applications of Econometrics Ch. 10 & 11. Time Series Basics Semester 2, 2023/24 72 / 75
Trends and Seasonality

Example: Traffic data revisited I

In the TRAFFIC.DTA example, adding seasonal dummies changes the estimates somewhat – most notably, the beltlaw coefficient increases to .095.
Many of the seasonal dummies are very statistically significant and large. For
example, on average, there are about 8.1% more accidents in October than in
January.
The high R-squared is to be discounted. If ltotacc is first detrended and
deseasonalized (by regressing on a time trend and the seasonal dummies
and keeping the residuals), the adjusted R-squared falls to .481 from .895.

Applications of Econometrics Ch. 10 & 11. Time Series Basics Semester 2, 2023/24 73 / 75
Trends and Seasonality

Example: Traffic data revisited II

. reg ltotacc spdlaw beltlaw unem t feb-dec


Source | SS df MS Number of obs = 108
-------------+------------------------------ F( 15, 92) = 61.55
Model | 1.14394116 15 .076262744 Prob > F = 0.0000
Residual | .113996321 92 .00123909 R-squared = 0.9094
-------------+------------------------------ Adj R-squared = 0.8946
Total | 1.25793748 107 .011756425 Root MSE = .0352
------------------------------------------------------------------------------
ltotacc | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
spdlaw | -.0533559 .0125802 -4.24 0.000 -.0783413 -.0283706
beltlaw | .0950403 .0142103 6.69 0.000 .0668174 .1232632
unem | -.0212768 .0033927 -6.27 0.000 -.0280149 -.0145386
t | .0010995 .0002576 4.27 0.000 .000588 .001611
feb | -.039384 .0165991 -2.37 0.020 -.0723512 -.0064169
mar | .0750819 .0166398 4.51 0.000 .0420339 .1081299
apr | .0078247 .0167641 0.47 0.642 -.0254702 .0411196
may | .0228674 .0168919 1.35 0.179 -.0106813 .0564161
jun | .0185189 .0167546 1.11 0.272 -.0147572 .051795
jul | .0491413 .0166597 2.95 0.004 .0160537 .0822289
aug | .0551325 .0167715 3.29 0.001 .0218229 .0884421
sep | .0390115 .0169108 2.31 0.023 .0054251 .0725978
oct | .0808843 .0169088 4.78 0.000 .0473019 .1144667
nov | .0738247 .0168761 4.37 0.000 .0403072 .1073421
dec | .097514 .0169569 5.75 0.000 .063836 .1311919
_cons | 10.68606 .0351915 303.65 0.000 10.61616 10.75595
------------------------------------------------------------------------------
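
The joint F test of the seasonal dummies mentioned two slides back can be run directly after this regression; testparm accepts the variable range feb-dec (sketch).

testparm feb-dec    // H0: all eleven seasonal dummy coefficients are zero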

Applications of Econometrics Ch. 10 & 11. Time Series Basics Semester 2, 2023/24 74 / 75
Trends and Seasonality

Example: Traffic data revisited III

. quietly reg ltotacc t feb-dec


. predict ltotacc_dtds, resid
. reg ltotacc_dtds spdlaw beltlaw unem t feb-dec
Source | SS df MS Number of obs = 108
-------------+------------------------------ F( 15, 92) = 7.61
Model | .141500444 15 .009433363 Prob > F = 0.0000
Residual | .113996321 92 .00123909 R-squared = 0.5538
-------------+------------------------------ Adj R-squared = 0.4811
Total | .255496765 107 .00238782 Root MSE = .0352
------------------------------------------------------------------------------
ltotacc_dtds | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
spdlaw | -.0533559 .0125802 -4.24 0.000 -.0783413 -.0283706
beltlaw | .0950403 .0142103 6.69 0.000 .0668174 .1232632
unem | -.0212768 .0033927 -6.27 0.000 -.0280149 -.0145386
t | -.0016476 .0002576 -6.40 0.000 -.0021591 -.0011361
feb | .0033025 .0165991 0.20 0.843 -.0296647 .0362696
mar | -.0047427 .0166398 -0.29 0.776 -.0377907 .0283053
apr | -.0106602 .0167641 -0.64 0.526 -.0439551 .0226347
may | -.0092307 .0168919 -0.55 0.586 -.0427794 .0243179
jun | -.0016729 .0167546 -0.10 0.921 -.034949 .0316032
jul | .0115587 .0166597 0.69 0.490 -.0215289 .0446463
aug | .0011495 .0167715 0.07 0.946 -.0321602 .0344591
sep | -.0033496 .0169108 -0.20 0.843 -.0369359 .0302368
oct | -.0012291 .0169088 -0.07 0.942 -.0348115 .0323532
nov | .0025461 .0168761 0.15 0.880 -.0309713 .0360636
dec | .0013568 .0169569 0.08 0.936 -.0323211 .0350348
_cons | .2174899 .0351915 6.18 0.000 .1475965 .2873832
------------------------------------------------------------------------------

Applications of Econometrics Ch. 10 & 11. Time Series Basics Semester 2, 2023/24 75 / 75
