
Time Series

Chapter 4 - Estimation

Estimation (Time Series) Chapter 4 1 / 53


...
Agenda
1 Introduction
2 Estimation of Parameters in Time Series Models
Moment Estimators (All models)
Yule Walker Estimators (AR model only)
Least Squares Estimators (AR model only)
Conditional Least Squares Estimators (MA/ARMA models)
Maximum Likelihood Estimator (All models)
3 Inference of Parameters in Time Series Models
General M-estimation Theory: MM/YW/CLS/MLE
LSE
4 Order Selection
Graphical Methods using ACF/PACF
Information Criteria
5 Residual Analysis
6 Model Building
...
Introduction

ARIMA(p, d, q) Model

φ(B)(1 − B)^d (Yt − µ) = θ(B)Zt ,  where Zt ∼ WN(0, σ²).

Unknown order: (p, d, q)

Unknown parameters: µ, φ1, φ2, . . . , φp, θ1, θ2, . . . , θq, σ²

Questions:
Given the order (p, d, q), how do we estimate the unknown parameters?
How do we estimate the order (p, d, q)?


...
Moment Estimators

Sample moments (e.g. (1/n) Σ_{t=1}^n Yt^k) can be computed from the data.
Theoretical moments (E(Y^k)) depend on the unknown parameters.

Examples:

Theoretical Moment    Sample Moment
µ                     Ȳ = (1/n) Σ_{t=1}^n Yt
γ(0)                  C0 = (1/n) Σ_{t=1}^n (Yt − Ȳ)²
γ(k)                  Ck = (1/n) Σ_{t=1}^{n−k} (Yt − Ȳ)(Y_{t+k} − Ȳ)
ρ(0)                  1
ρ(k)                  rk = Ck/C0

Basic idea of moment estimators:
Match the sample and theoretical moments to solve for the unknown parameters:
Express γ(k) or ρ(k) in terms of the unknown parameters
Solve Ck = γ(k) or rk = ρ(k), k = 1, . . . , q, for the unknown parameters


...
Moment Estimator: MA models

Example 1
Find the method of moments estimators of θ and σ² in the MA(1) model

Yt = Zt − θZ_{t−1} ,  Zt ∼ WN(0, σ²).

Theoretical moments:
ρ(1) = −θ/(1 + θ²)
γ(0) = (1 + θ²)σ²

Sample moments:
r1 = Σ_{t=1}^{n−1} (Yt − Ȳ)(Y_{t+1} − Ȳ) / Σ_{t=1}^n (Yt − Ȳ)²
C0 = (1/n) Σ_{t=1}^n (Yt − Ȳ)²

The estimates satisfy

r1 = −θ̂/(1 + θ̂²)  ⇒  r1 θ̂² + θ̂ + r1 = 0  ⇒  θ̂ = (−1 ± √(1 − 4r1²)) / (2r1)
C0 = (1 + θ̂²) σ̂²  ⇒  σ̂² = C0/(1 + θ̂²)
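This estimator can be cross-checked numerically. The deck's examples use R; the following is a minimal Python/NumPy sketch of the same algebra (the simulated data and variable names are ours): compute r1 and C0, solve r1 θ̂² + θ̂ + r1 = 0, and keep the invertible root.

```python
import numpy as np

# Method-of-moments for MA(1): Y_t = Z_t - theta*Z_{t-1} (as in Example 1).
# The data are simulated here purely for illustration.
rng = np.random.default_rng(1234)
n, theta_true = 2000, 0.6
z = rng.normal(size=n + 1)
y = z[1:] - theta_true * z[:-1]

ybar = y.mean()
c0 = np.sum((y - ybar) ** 2) / n                    # C_0 (divide by n, as on the slide)
c1 = np.sum((y[:-1] - ybar) * (y[1:] - ybar)) / n   # C_1
r1 = c1 / c0                                        # sample ACF at lag 1

# r1*theta^2 + theta + r1 = 0; the root (-1 + sqrt(1 - 4 r1^2))/(2 r1)
# is the invertible solution (|theta| < 1), which exists when |r1| < 1/2
theta_hat = (-1 + np.sqrt(1 - 4 * r1 ** 2)) / (2 * r1)
sigma2_hat = c0 / (1 + theta_hat ** 2)
print(theta_hat, sigma2_hat)
```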


...
Parameter Estimators in AR Model: Yule Walker

The Yule-Walker equations are a system of equations connecting the AR parameters φi and the ACVF/ACF.
For k = 1, 2, . . . , p, take the covariance of Y_{t−k} with each side of Yt = φ1 Y_{t−1} + · · · + φp Y_{t−p} + Zt:

⇒ γ(k) = φ1 γ(k − 1) + · · · + φp γ(k − p)

⇒ ρ(k) = φ1 ρ(k − 1) + · · · + φp ρ(k − p)   (ρ(k) is easily estimated by rk)

Given the parameters φi, we have used the Yule-Walker equations to find the ACVF/ACF (Chapter 3).

Given the sample ACFs, we can use the Yule-Walker equations to estimate the φi.


...
Parameter Estimators in AR Model: Yule Walker

ρ(1) = φ1 ρ(0) + · · · + φp ρ(p − 1)
ρ(2) = φ1 ρ(1) + · · · + φp ρ(p − 2)
  ⋮
ρ(p) = φ1 ρ(p − 1) + · · · + φp ρ(0)

Matrix form:

[ρ(1)]   [ρ(0)      ρ(1)      · · ·  ρ(p − 1)] [φ1]
[ ⋮  ] = [ρ(1)      ρ(0)      · · ·  ρ(p − 2)] [ ⋮]
[ρ(p)]   [  ⋮         ⋮         ⋱       ⋮   ] [φp]
         [ρ(p − 1)  ρ(p − 2)  · · ·  ρ(0)   ]

Yule-Walker estimator (estimate ρ(k) by rk):

[φ̂1]   [1        r1       · · ·  r_{p−1}]⁻¹ [r1]
[ ⋮ ] = [r1       1        · · ·  r_{p−2}]   [ ⋮]
[φ̂p]   [ ⋮        ⋮         ⋱      ⋮   ]   [rp]
        [r_{p−1}  r_{p−2}  · · ·  1      ]
...
Parameter Estimators in AR Model: Yule Walker

Yule-Walker estimator:

[φ̂1]   [1        r1       · · ·  r_{p−1}]⁻¹ [r1]
[ ⋮ ] = [r1       1        · · ·  r_{p−2}]   [ ⋮]
[φ̂p]   [ ⋮        ⋮         ⋱      ⋮   ]   [rp]
        [r_{p−1}  r_{p−2}  · · ·  1      ]

Example: Find the Yule-Walker estimate for fitting an AR(2) model to the
time series (−1.4, 0.39, 0.97, 1.5, 0.59, −2.4, −2.2, −1.5, −0.42, 0.10).
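As a numerical sketch of this example (Python/NumPy rather than the deck's R; the helper name is ours): form r1, r2 from the series and solve the 2×2 Yule-Walker system.

```python
import numpy as np

# Yule-Walker estimation for an AR(2) model, applied to the slide's series.
y = np.array([-1.4, 0.39, 0.97, 1.5, 0.59, -2.4, -2.2, -1.5, -0.42, 0.10])
n = len(y)
ybar = y.mean()

def acvf(k):
    # C_k = (1/n) * sum_{t=1}^{n-k} (Y_t - Ybar)(Y_{t+k} - Ybar)
    return np.sum((y[: n - k] - ybar) * (y[k:] - ybar)) / n

r = np.array([acvf(1), acvf(2)]) / acvf(0)   # sample ACF r1, r2
R = np.array([[1.0, r[0]],                   # Toeplitz matrix of ACFs
              [r[0], 1.0]])
phi_hat = np.linalg.solve(R, r)              # Yule-Walker estimate (phi1, phi2)
print(phi_hat)
```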


...
Parameter Estimation in AR Model: LSE

AR(p): Yt = φ1 Y_{t−1} + · · · + φp Y_{t−p} + Zt

Express as a regression problem:

Yt = (Y_{t−1} · · · Y_{t−p}) (φ1, . . . , φp)′ + Zt = Y′_{t−1} φ + Zt ,

where Y_{t−1} = (Y_{t−1} · · · Y_{t−p})′ and φ = (φ1 · · · φp)′.

Recall that in the linear regression model with p predictors, Y = Xβ + ε:
The least squares estimate of β is β̂ = (X′X)⁻¹X′Y.
The least squares estimate of σ² is σ̂² = (Y − Xβ̂)′(Y − Xβ̂)/(n − p).
...
Least Squares Estimate (LSE) for AR Model

AR(p): Yt = Y′_{t−1} φ + Zt  ⇒

Y = [Y_{p+1}]   [Y′_p    ]       [Z_{p+1}]
    [Y_{p+2}] = [Y′_{p+1}] φ  +  [Z_{p+2}]
    [  ⋮    ]   [  ⋮     ]       [  ⋮    ]
    [Y_n    ]   [Y′_{n−1}]       [Z_n    ]

Regression: Y = Xβ + ε,  β̂ = (X′X)⁻¹X′Y,  σ̂² = (1/(n − p)) Σ_{t=p+1}^n (Yt − Ŷt)²

Least squares estimate of φ:

φ̂ = (X′X)⁻¹X′Y = ( Σ_{t=p+1}^n Y_{t−1} Y′_{t−1} )⁻¹ ( Σ_{t=p+1}^n Y_{t−1} Yt )

Example - AR(1): Yt = φ1 Y_{t−1} + Zt (here Y_{t−1} = Y_{t−1}):

φ̂1 = Σ_{t=2}^n Y_{t−1} Yt / Σ_{t=2}^n Y²_{t−1}   and   σ̂² = (1/(n − 1)) Σ_{t=2}^n (Yt − φ̂1 Y_{t−1})²
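The AR(1) formulas can be checked directly. A minimal Python/NumPy sketch (the deck's code is in R; the simulated series and names here are illustrative):

```python
import numpy as np

# Least squares fit of an AR(1) model, per the slide's closed-form formulas.
# The data are simulated here purely for illustration.
rng = np.random.default_rng(0)
n, phi_true = 1000, 0.7
y = np.zeros(n)
for t in range(1, n):
    y[t] = phi_true * y[t - 1] + rng.normal()

# phi1-hat = sum Y_{t-1} Y_t / sum Y_{t-1}^2
phi_hat = np.sum(y[:-1] * y[1:]) / np.sum(y[:-1] ** 2)
resid = y[1:] - phi_hat * y[:-1]
sigma2_hat = np.sum(resid ** 2) / (n - 1)
print(phi_hat, sigma2_hat)
```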


...
Conditional Least Squares Method

Conditional Least Squares (CLS) Method
(also called the Conditional Sum of Squares (CSS) method)

Procedure:
1 From the ARMA model φ(B)Yt = θ(B)Zt, estimate the noise sequence {Zt} from the observations {Yt}, conditional on Ys = Zs = 0 for s ≤ 0 (so we require the MA part to be invertible!)

2 Minimize the sum of squares

S∗(φ, θ) = Σ_{t=1}^n Z̃t² ,

where φ = (φ1, . . . , φp) and θ = (θ1, . . . , θq) are the ARMA parameters, and Z̃t is the estimate of Zt.


...
Conditional Least Squares Method

Example 2
MA(1): Yt = Zt + θZ_{t−1}
Conditional on Z0 = 0, we have
Z̃1 = Y1
Z̃2 = Y2 − θZ̃1
...
Z̃n = Yn − θZ̃_{n−1}
Minimize S∗(θ) = Σ_{t=1}^n Z̃t² by numerical optimization.
Implementation:
set.seed(1234)
y=arima.sim(2000,model=list(ma=c(0.3)))
css=function(theta){ ## conditional sum of squares S*(theta)
  z=rep(0,length(y))
  z[1]=y[1]
  for (k in 2:length(y)){ z[k]=y[k]-theta*z[k-1] }
  return(sum(z^2))
}
optim(0.1,css)
...
Conditional Least Squares: ARMA Model

Example 3
ARMA(1,1): Yt − φYt−1 = Zt − θZt−1
1 Conditional on Z0 = 0, Y0 = 0, we have
Z̃1 = Y1
Z̃2 = Y2 − φY1 + θZ̃1
Z̃3 = Y3 − φY2 + θZ̃2
...
Z̃n = Yn − φYn−1 + θZ̃n−1

2 Minimize S∗(φ, θ) = Σ_{t=1}^n Z̃t² by numerical optimization


...
Maximum Likelihood Estimator (MLE)

MLE = the most probable parameter value of the statistical model, inferred from the observed data

Let X1, X2, . . . , Xn be i.i.d. random variables with p.d.f. f(x, θ)

f(Xi, θ) is the probability (density) of observing Xi

Denote X = (X1, . . . , Xn); then the likelihood function

L(X, θ) = ∏_{i=1}^n f(Xi, θ)

is the probability of observing the whole data set.

The MLE θ̂ maximizes L(X, θ):
the most probable parameter value given that the current data set was obtained.
...
Maximum Likelihood Estimator: Two standard approaches

Approach 1: Iterative Conditioning

f(Y1, . . . , Yn) = [f(Y1, . . . , Yn)/f(Y1, . . . , Y_{n−1})] · [f(Y1, . . . , Y_{n−1})/f(Y1, . . . , Y_{n−2})] · · · [f(Y1, Y2)/f(Y1)] · f(Y1)
                 = { ∏_{t=2}^n f(Yt | Y_{t−1}, . . . , Y1) } f(Y1)

f(Yt | Y_{t−1}, . . . , Y1) may be easy to write down:
e.g. AR(p): Yt = φ1 Y_{t−1} + · · · + φp Y_{t−p} + Zt, Zt ∼ N(0, σ²), so
Yt | Y_{t−1}, . . . , Y1 ∼ N(φ1 Y_{t−1} + · · · + φp Y_{t−p}, σ²)

Example 4 (AR(1): Yt = φY_{t−1} + Zt)

Y1 ∼ N(0, γ(0)) = N(0, σ²/(1 − φ²)), so
f(Y1) = ((1 − φ²)/(2πσ²))^{1/2} exp( −Y1²(1 − φ²)/(2σ²) )

For t ≥ 2, Yt | Y_{t−1}, . . . , Y1 ∼ N(φY_{t−1}, σ²), so
f(Yt | Y_{t−1}, . . . , Y1) = (1/(2πσ²))^{1/2} exp( −(Yt − φY_{t−1})²/(2σ²) )

Likelihood:

L(Y1, Y2, . . . , Yn) = { ∏_{t=2}^n f(Yt | Y_{t−1}, . . . , Y1) } f(Y1)
 = (1/(2πσ²))^{n/2} √(1 − φ²) exp( −(1/(2σ²)) [ Σ_{t=2}^n (Yt − φY_{t−1})² + Y1²(1 − φ²) ] )
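This AR(1) likelihood can be maximized numerically. A Python/NumPy sketch that evaluates the exact log-likelihood over a grid of φ, with σ² held fixed at its true value for simplicity (an assumption of this illustration):

```python
import numpy as np

# Exact Gaussian log-likelihood of an AR(1) model (Example 4) and a grid
# maximization over phi; the data are simulated for illustration.
rng = np.random.default_rng(7)
n, phi0, sigma2 = 500, 0.6, 1.0
y = np.zeros(n)
y[0] = rng.normal(scale=np.sqrt(sigma2 / (1 - phi0 ** 2)))  # Y1 ~ N(0, gamma(0))
for t in range(1, n):
    y[t] = phi0 * y[t - 1] + rng.normal(scale=np.sqrt(sigma2))

def loglik(phi):
    # log of the slide's likelihood: (2*pi*s2)^{-n/2} * sqrt(1 - phi^2)
    #   * exp(-[sum (Y_t - phi*Y_{t-1})^2 + Y_1^2 (1 - phi^2)] / (2 s2))
    ss = np.sum((y[1:] - phi * y[:-1]) ** 2) + y[0] ** 2 * (1 - phi ** 2)
    return (-0.5 * n * np.log(2 * np.pi * sigma2)
            + 0.5 * np.log(1 - phi ** 2)
            - ss / (2 * sigma2))

grid = np.linspace(-0.99, 0.99, 199)
phi_hat = grid[np.argmax([loglik(p) for p in grid])]
print(phi_hat)
```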
...
Maximum Likelihood Estimator: Two standard approaches

Approach 2: Multivariate Normal Distribution

In causal ARMA models, if {Zt} are normally distributed, then {Yt} are also normal,
since Yt = Σ_{j=0}^∞ ψj Z_{t−j} is a sum of normal noises.

(Y1, . . . , Yn) follows a multivariate normal distribution, which is characterized by the ACVFs:

f_{Y1,...,Yn}(y1, . . . , yn) = (2π)^{−n/2} |Σ|^{−1/2} exp( −½ y′Σ⁻¹y ),

where y = (y1, . . . , yn)′ and

Σ = (γ(|i − j|)) = [γ(0)      γ(1)      . . .  γ(n − 1)]
                   [γ(1)      γ(0)      . . .  γ(n − 2)]
                   [  ⋮         ⋮         ⋱       ⋮   ]
                   [γ(n − 1)  γ(n − 2)  . . .  γ(0)   ]

This approach is used when an MA part exists.
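A small sketch of this approach for an MA(1) model Yt = Zt + θZ_{t−1}: build Σ = (γ(|i − j|)) from the ACVF and evaluate the Gaussian log-density (Python/NumPy; the sample values below are made up for illustration):

```python
import numpy as np

# Covariance matrix Sigma = (gamma(|i-j|)) for MA(1): Y_t = Z_t + theta*Z_{t-1},
# and the multivariate-normal log-density of a small sample.
theta, sigma2, n = 0.5, 1.0, 5
gamma = np.zeros(n)
gamma[0] = (1 + theta ** 2) * sigma2   # gamma(0)
gamma[1] = theta * sigma2              # gamma(1); gamma(h) = 0 for h >= 2
idx = np.arange(n)
Sigma = gamma[np.abs(idx[:, None] - idx[None, :])]

y = np.array([0.3, -1.1, 0.4, 0.0, 0.8])   # illustrative sample
sign, logdet = np.linalg.slogdet(Sigma)
logf = -0.5 * (n * np.log(2 * np.pi) + logdet + y @ np.linalg.solve(Sigma, y))
print(logf)
```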
...
Maximum Likelihood Estimator using R

ARMA(p, q) model:
Yt = µ + φ1 Yt−1 + · · · + φp Yt−p + Zt + θ1 Zt−1 + · · · + θq Zt−q ,
Zt ∼ W N (0, σ 2 )
Except for some simple models such as AR(1), we have to employ
numerical optimization for the estimation.
arima(x, order = c(0L, 0L, 0L),
seasonal = list(order = c(0L, 0L, 0L), period = NA),
xreg = NULL, include.mean = TRUE,
fixed = NULL, init = NULL,
method = c("CSS-ML", "ML", "CSS"))
Examples:
set.seed(1234)
x=arima.sim(n=1000,
model=list(ar=c(0.7,-0.12),ma=c(0.7)))
arima(x,order=c(2,0,1))
arima(x,order=c(2,0,1),include.mean=F)
...
Statistical Inference for General Time Series

In many situations, parameters are estimated by solving a system of equations

M(θ) = 0 ,

where θ is the parameter vector and M(θ) is a (multivariate) function depending on both θ and the data.

Examples:

Method of Moments:  M(θ) = ( γ(0) − C0 ,  γ(1) − C1 )′

Yule-Walker:  M(θ) = ( r1 − φ1 − · · · − φp r_{p−1} ,
                       r2 − φ1 r1 − · · · − φp r_{p−2} ,
                       · · · ,
                       rp − φ1 r_{p−1} − · · · − φp )′

CLS:  M(θ) = ( ∂S(θ)/∂θ1 , . . . , ∂S(θ)/∂θk )′,  where S(θ) = Σ_{t=1}^n Z̃t²

MLE:  M(θ) = ( ∂L(θ)/∂θ1 , . . . , ∂L(θ)/∂θk )′,  where L(θ) is the likelihood


...
Statistical Inference for Time Series via M-estimation

The estimator θ̂ that satisfies M(θ̂) = 0 is called an M-estimator.

Asymptotic distribution of θ̂:
Denote by θ0 the true parameter. By Taylor expansion,

0 = M(θ̂) = M(θ0) + M′(θ0)(θ̂ − θ0) + smaller order terms
⇒ √n (θ̂ − θ0) ≈ −√n [M′(θ0)]⁻¹ M(θ0)

Sometimes we can show that −√n [M′(θ0)]⁻¹ M(θ0) ∼ N(0, Σ), so that

√n (θ̂ − θ0) ∼ N(0, Σ)

can be employed for statistical inference.

Example: CLS for MA(1): S(θ) = Σ_{t=1}^n Z̃t² = Σ_{t=1}^n (Yt − θZ̃_{t−1})²
M(θ) = ∂S(θ)/∂θ = −2 Σ_{t=1}^n (Yt − θZ̃_{t−1}) Z̃_{t−1}
M′(θ) = 2 Σ_{t=1}^n Z̃²_{t−1}
When θ = θ0, Z̃t = Zt is the true noise, so
M(θ0) = −2 Σ_{t=1}^n Zt Z_{t−1} ∼ N(0, 4nσ⁴)
M′(θ0) = 2 Σ_{t=1}^n Z²_{t−1} → 2n E(Z²_{t−1}) = 2nσ²

Result: √n (θ̂ − θ0) ≈ −√n [M′(θ0)]⁻¹ M(θ0) = √n · N(0, 4nσ⁴) / (2nσ²) ∼ N(0, 1)

Details are out of scope, but the results are reported by R.
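The distributional claim M(θ0) = −2 Σ Zt Z_{t−1} ∼ N(0, 4nσ⁴) can be checked by simulation. A Python/NumPy sketch comparing the Monte Carlo variance of M(θ0) with 4nσ⁴:

```python
import numpy as np

# Monte Carlo check that Var(M(theta0)) = 4*n*sigma^4 for
# M(theta0) = -2 * sum Z_t Z_{t-1}, with Z_t white noise.
rng = np.random.default_rng(0)
n, sigma2, reps = 500, 1.0, 2000
vals = np.empty(reps)
for r in range(reps):
    z = rng.normal(scale=np.sqrt(sigma2), size=n + 1)
    vals[r] = -2 * np.sum(z[1:] * z[:-1])
print(vals.var(), 4 * n * sigma2 ** 2)   # the two should be close
```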
...
Statistical Inference in AR Model: LSE

Least squares estimate of φ (with Y_{t−1} = (Y_{t−1} · · · Y_{t−p})′):

φ̂ = (X′X)⁻¹X′Y = ( Σ_{t=p+1}^n Y_{t−1} Y′_{t−1} )⁻¹ ( Σ_{t=p+1}^n Y_{t−1} Yt )

Asymptotic distribution of the estimator:

√n (φ̂ − φ) → N(0, σ²Γp⁻¹) ,

where

Γp = [γ(0)      γ(1)      · · ·  γ(p − 1)]
     [γ(1)      γ(0)      · · ·  γ(p − 2)]
     [  ⋮         ⋮         ⋱       ⋮   ]
     [γ(p − 1)  γ(p − 2)  · · ·  γ(0)   ]

Proof:
Regression result: if β̂ = (X′X)⁻¹X′Y, then (β̂ − β) ∼ N(0, σ²(X′X)⁻¹).
Now X′X = Σ_{t=p+1}^n Y_{t−1} Y′_{t−1} ≈ n (γ(|i − j|))_{i,j=0,...,p−1} = nΓp.


...
Statistical Inference in AR Model: LSE

Asymptotic distribution of the estimator: √n (φ̂ − φ) → N(0, σ²Γp⁻¹), with Γp = (γ(|i − j|))_{i,j=0,...,p−1}.

Use confidence intervals or hypothesis tests to make inference on φ.

Example: Estimated AR(2) model with n = 200: Yt = 0.3Y_{t−1} + 0.04Y_{t−2} + Zt;
γ̂(0) = 1.11, γ̂(1) = 0.347, σ̂² = 1.

σ̂² Γ̂p⁻¹ = [1.11   0.347]⁻¹ = [ 0.998  −0.312]
           [0.347  1.11 ]     [−0.312   0.998]

C.I. for φ2: [0.04 ± 2√(0.998/n)] = [−0.101, 0.181]. (Not significantly different from 0.)

Testing φ1 = 0: z = φ̂1 / √(Var̂(φ̂1)) = 0.3 / √(0.998/200) = 4.24 > 2. (Significantly different from 0.)
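The example's numbers can be reproduced mechanically (Python/NumPy sketch of the same arithmetic):

```python
import numpy as np

# AR(2) inference example: CI for phi2 and z-test for phi1 from the
# asymptotic distribution sqrt(n)*(phi_hat - phi) -> N(0, sigma^2 Gamma_p^{-1}).
n = 200
phi_hat = np.array([0.3, 0.04])
Gamma = np.array([[1.11, 0.347],
                  [0.347, 1.11]])
sigma2 = 1.0
V = sigma2 * np.linalg.inv(Gamma)      # asymptotic covariance of sqrt(n)*(phi_hat - phi)
se = np.sqrt(np.diag(V) / n)           # standard errors of phi_hat
ci_phi2 = (phi_hat[1] - 2 * se[1], phi_hat[1] + 2 * se[1])
z_phi1 = phi_hat[0] / se[0]
print(ci_phi2, z_phi1)
```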
...
Graphical Methods for order selection using ACF/PACF

Some facts about the ACF of ARMA models:

1 For an MA(q) model: Yt = Zt + θ1 Z_{t−1} + · · · + θq Z_{t−q}, Zt ∼ N(0, σ²) (with θ0 = 1),

γ(h) = σ² Σ_{j=0}^{q−|h|} θj θ_{j+|h|}  if |h| ≤ q ,  and γ(h) = 0 if |h| > q ,

so ρ(h) = γ(h)/γ(0) = 0 for |h| > q: the ACF has a cut-off at lag q.

2 For an AR(1) model: Yt = φY_{t−1} + Zt, Zt ∼ N(0, σ²),

ρ(h) = φ^{|h|} :

the ACF decays exponentially.


...
Graphical Methods for order selection using ACF/PACF

Some facts about the PACF of ARMA models:

1 For an AR(p) model: Yt = φ1 Y_{t−1} + · · · + φp Y_{t−p} + Zt, Zt ∼ N(0, σ²),

φkk = 0 for all k > p

Proof:
Y_{k+1} − φ1 Yk − · · · − φp Y_{k−p+1} = Z_{k+1} is uncorrelated with all {Yj}_{j≤k}
⇒ (φ1, . . . , φp, 0, . . . , 0) minimizes E(Y_{k+1} − β1 Yk − · · · − βk Y1)² for k ≥ p.
Interpretation: given Yk, . . . , Y_{k−p+1}, the additional lags Y_{k−p}, Y_{k−p−1}, · · · , Y1 provide no further information about Y_{k+1}.
On the other hand, φ11, φ22, . . . , φpp are non-zero:
the PACF has a cut-off at lag p.

2 Is the PACF of an MA(1) model exponentially decaying?
...
Graphical Methods for order selection using ACF/PACF

The kth-lag PACF φkk is obtained from

(φk1, . . . , φkk)′ = [ρ(0)      · · ·  ρ(k − 1)]⁻¹ [ρ(1)]
                     [  ⋮         ⋱       ⋮    ]   [ ⋮  ]
                     [ρ(k − 1)  · · ·  ρ(0)    ]   [ρ(k)]

Example: MA(1): Yt = Zt + θZ_{t−1}. Recall ρ(1) = θ/(1 + θ²) and ρ(k) = 0 for k ≥ 2.

1st-lag PACF:
φ11 = ρ(0)⁻¹ ρ(1) = ρ(1) = θ/(1 + θ²)

2nd-lag PACF:
(φ21, φ22)′ = [1     ρ(1)]⁻¹ [ρ(1)]
              [ρ(1)  1   ]   [0   ]
 = (1/(1 − ρ²(1))) [ 1     −ρ(1)] [ρ(1)] = (1/(1 − ρ²(1))) [ ρ(1)  ]
                   [−ρ(1)   1   ] [0   ]                   [−ρ²(1)]

The 2nd-lag PACF is φ22 = −ρ²(1)/(1 − ρ²(1)) = −θ²/(1 + θ² + θ⁴).
...
Graphical Methods for order selection using ACF/PACF

Example (continued): MA(1): Yt = Zt + θZ_{t−1}, with ρ(1) = θ/(1 + θ²) and ρ(k) = 0 for k ≥ 2.

For the kth-lag PACF we solve the banded system (blank entries are 0):

[1     ρ(1)                    ] [φk1]   [ρ(1)]
[ρ(1)  1     ρ(1)              ] [φk2]   [0   ]
[      ρ(1)  1     ρ(1)        ] [ ⋮ ] = [ ⋮  ]
[             ⋱     ⋱     ⋱   ] [   ]   [    ]
[                   ρ(1)  1    ] [φkk]   [0   ]

Advanced techniques in difference equations and tedious algebra show that

φkk = −(−θ)^k (1 − θ²) / (1 − θ^{2(k+1)})

(roughly exponentially decaying).
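The closed form for φkk can be verified by solving the banded system directly for several k (Python/NumPy sketch; the helper name is ours):

```python
import numpy as np

# Numerical check of the MA(1) PACF formula
#   phi_kk = -(-theta)^k (1 - theta^2) / (1 - theta^(2(k+1)))
# by solving the banded system with rho(1) = theta/(1 + theta^2).
theta = 0.6
rho1 = theta / (1 + theta ** 2)

def pacf_k(k):
    # Toeplitz matrix: 1 on the diagonal, rho(1) on the first off-diagonals
    off = np.abs(np.subtract.outer(np.arange(k), np.arange(k))) == 1
    R = np.eye(k) + rho1 * off
    rhs = np.zeros(k)
    rhs[0] = rho1            # (rho(1), rho(2), ..., rho(k)) = (rho1, 0, ..., 0)
    return np.linalg.solve(R, rhs)[-1]   # phi_kk is the last component

for k in range(1, 6):
    closed = -(-theta) ** k * (1 - theta ** 2) / (1 - theta ** (2 * (k + 1)))
    print(k, pacf_k(k), closed)
```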
...
Graphical Methods for order selection using ACF/PACF

Summary:

AR(p) model -
ACF plot: no clear pattern, except that AR(1) shows exponential decay
PACF plot: cut-off at lag p

MA(q) model -
ACF plot: cut-off at lag q
PACF plot: no clear pattern, except that MA(1) shows (roughly) exponential decay

Remark: No clear pattern in ACF/PACF plots for ARMA(p, q) models.


...
Examples on ACF and PACF

[Figure: sample ACF (left) and PACF (right) plots for simulated AR(1) and AR(3) series]
...
Examples on ACF and PACF

[Figure: sample ACF (left) and PACF (right) plots for simulated MA(1) and MA(4) series]


...
Examples on PACF

Try:
x=arima.sim(1000,model=list(ar=c(0.2,0.4,0.3)))
pacf(x)
Which model would you decide on?


...
Order Selection

ACF and PACF are graphical methods to determine the order of AR and MA models.
It is more desirable to have a systematic order selection criterion for a general ARMA model.
Some common model selection criteria:
FPE (Final Prediction Error)
AIC (Akaike's Information Criterion)
BIC (Bayesian Information Criterion)


...
Order Selection - Final Prediction Error

Final Prediction Error (FPE):

FPE = σ̂² (n + p)/(n − p) ,

where σ̂² is the MLE of σ².

Idea of FPE:
For an AR(p) model, the mean squared error of the parameter estimate is approximately

MSE = E(φ̂ − φ)′(φ̂ − φ) ≈ σ² (n + p)/n    (1)

The best unbiased estimator for σ² is σ̂² · n/(n − p).
Substituting σ² by σ̂² · n/(n − p) in (1) gives the best estimate of the MSE (defined as FPE).
We find the model (AR(1), AR(3), etc.) that minimizes FPE.
Note the trade-off between goodness of fit and model complexity.


...
Order Selection - Akaike’s Information Criterion

Akaike’s Information Criterion (AIC):

AIC = −2 log L(β̂, Sx(β̂)/n) + 2(p + q + 1)

AICC (corrected AIC):

AICC = −2 log L(β̂, Sx(β̂)/n) + 2(p + q + 1)n/(n − p − q − 2)

where

Sx(β̂) = Σ_{t=1}^n (Xt − φ̂1 X_{t−1} − · · · − φ̂p X_{t−p} − θ̂1 Z_{t−1} − · · · − θ̂q Z_{t−q})² ,

β̂ = (φ̂1, . . . , φ̂p, θ̂1, . . . , θ̂q) is the MLE, and

L(β̂, σ̂²) = (1/(2πσ̂²))^{n/2} exp( −Sx(β̂)/(2σ̂²) )


...
Order Selection - Akaike’s Information Criterion

Idea -
AIC/AICC estimates the expected likelihood E[L_Y(β̂, σ̂²)], where
β̂, σ̂² are the MLEs from the observations X = {X1, . . . , Xn}, and
Y is a 'new' observation independent of X.

X and Y are different to avoid the problem of overfitting:
β̂, σ̂² were chosen to maximize L_X(β, σ²).

We find the model (AR(1), ARMA(1,1), etc.) that minimizes AIC.

Note the trade-off between model complexity and goodness of fit.


...
Order Selection - Bayesian Information Criterion

Bayesian Information Criterion (BIC):

BIC = (n − p − q) log[ nσ̂²/(n − p − 1) ] + n(1 + log √(2π))
      + (p + q) log[ (Σ_{i=1}^n Xi² − nσ̂²)/(p + q) ]

Motivated by a Bayesian argument:
BIC ≈ the posterior probability of a particular model given the data.
We find the model (AR(1), ARMA(1,1), etc.) that minimizes BIC.
BIC is a consistent order selection procedure:
as n → ∞, the order selected by BIC equals the true order with probability tending to 1.
AIC/FPE are not consistent (i.e. they may select a wrong model), but they achieve minimum prediction error.
...
Example

Recap:
FPE = σ̂² (n + p)/(n − p)
AIC = −2 log L(β̂, Sx(β̂)/n) + 2(p + q + 1)
BIC = (n − p − q) log[nσ̂²/(n − p − 1)] + n(1 + log √(2π)) + (p + q) log[(Σ_{i=1}^n Xi² − nσ̂²)/(p + q)]

IC=function(x,order.input=c(1,0,1)){
  fit=arima(x,order=order.input)
  n=length(x); p=order.input[1]; q=order.input[3]; sig=fit$sigma2
  FPE=sig*(n+p)/(n-p); AIC=fit$aic
  BIC=(n-p-q)*log(n*sig/(n-p-1))+n*(1+log(sqrt(2*pi)))+
    (p+q)*log((sum(x^2)-n*sig)/(p+q))
  return(c(FPE,AIC,BIC))
}

Example:
Data = (−0.63, −1.8, −0.98, −0.67, −1.14, −1.67, −2.35, −1.70).
Is ARMA(1,1) or AR(2) better?

x=c(-0.63,-1.8,-0.98,-0.67,-1.14,-1.67,-2.35,-1.70)
IC(x,order=c(1,0,1))
IC(x,order=c(2,0,0))
...
Residual Analysis

Fitted values and residuals:
Once we have obtained the estimates, say φ̂ and θ̂ in the ARMA(1,1) model Yt = φY_{t−1} + Zt + θZ_{t−1}, we can compute the fitted values {Ŷt} and residuals {Ẑt} by the following recursion:
1 Ŷ1 = 0;  Ẑ1 = Y1 − Ŷ1
2 Ŷ2 = φ̂Y1 + θ̂Ẑ1;  Ẑ2 = Y2 − Ŷ2
3 Ŷ3 = φ̂Y2 + θ̂Ẑ2;  Ẑ3 = Y3 − Ŷ3
4 · · ·
5 Ŷn = φ̂Y_{n−1} + θ̂Ẑ_{n−1};  Ẑn = Yn − Ŷn

If the model fits the data well, then the fitted values {Ŷt} should be close to the observed time series {Yt}, i.e., the residuals

Ẑt = Yt − Ŷt

are small. More importantly, {Ẑt} should behave like the white noise sequence {Zt}.
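The recursion above can be sketched as follows (Python/NumPy; data simulated with the true parameters standing in for the estimates φ̂, θ̂):

```python
import numpy as np

# Fitted-value/residual recursion for ARMA(1,1): Y_t = phi*Y_{t-1} + Z_t + theta*Z_{t-1}.
rng = np.random.default_rng(3)
n, phi, theta = 300, 0.4, 0.5
z = rng.normal(size=n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = phi * y[t - 1] + z[t] + theta * z[t - 1]

# Run the slide's recursion with (phi, theta) playing the role of (phi-hat, theta-hat)
yhat = np.zeros(n)
zhat = np.zeros(n)
zhat[0] = y[0] - yhat[0]                 # Yhat_1 = 0
for t in range(1, n):
    yhat[t] = phi * y[t - 1] + theta * zhat[t - 1]
    zhat[t] = y[t] - yhat[t]
print(np.std(zhat))   # residuals should behave like the N(0,1) noise
```

Note the initialization error decays geometrically (like θ^t), so after a burn-in the residuals essentially recover the true noise sequence.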


...
Residual Analysis

Check the goodness of fit of the model by studying the residuals Ẑt = Yt − Ŷt.

STEPS -
1 Time series plot of Ẑt (should look like white noise)
2 ACF plot of Ẑt, r̂Z(j):
good fit if most r̂Z(j), j ≠ 0, are within the C.I. (−2/√n, 2/√n).
3 Portmanteau statistic:

Q(h) = n(n + 2) Σ_{j=1}^h r̂Z²(j)/(n − j) → χ²(h − p − q)

A common choice of h is between 10 and 30.
If Q(h) is bigger than the 95th percentile of the χ²(h − p − q) distribution, we reject H0: Ẑt ∼ WN.
If H0 is not rejected, then the model is a good fit to the data.


...
Residual Analysis

STEPS -
1 Time series plot of Ẑt (should look like white noise)
2 ACF plot of Ẑt, r̂Z(j):
good fit if most r̂Z(j), j ≠ 0, are within the C.I. (−2/√n, 2/√n).
3 Portmanteau statistic Q(h) = n(n + 2) Σ_{j=1}^h r̂Z²(j)/(n − j) ∼ χ²(h − p − q).

Example ARMA(1,1): Yt = φY_{t−1} + Zt + θZ_{t−1}.
set.seed(6104)
x=arima.sim(1000,model=list(ar=c(0.2), ma=c(0.6)))
n=length(x)
fit=arima(x,order=c(1,0,1))
par(mfrow=c(2,1))
ts.plot(fit$res)
r.z=as.numeric(acf(fit$res,12)$acf) ## h=12
portmanteau.stat=n*(n+2)*sum((r.z[-1]^2)/(n-(1:12)))
portmanteau.stat>qchisq(0.95,12-1-1) ## FALSE: do not reject H0
Alternatively, use tsdiag(fit).
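The same Q(h) computation, written from scratch on white-noise "residuals" (Python/NumPy sketch; with a fitted model one would plug in the model residuals and use h − p − q degrees of freedom):

```python
import numpy as np

# Portmanteau statistic Q(h) = n(n+2) * sum_{j=1}^h r_Z(j)^2 / (n - j);
# under H0 it is approximately chi^2 with h - p - q degrees of freedom.
rng = np.random.default_rng(6104)
n, h = 1000, 12
zhat = rng.normal(size=n)          # stand-in for model residuals

zbar = zhat.mean()
denom = np.sum((zhat - zbar) ** 2)
r = np.array([np.sum((zhat[:-j] - zbar) * (zhat[j:] - zbar)) / denom
              for j in range(1, h + 1)])   # residual ACF r_Z(1), ..., r_Z(h)
Q = n * (n + 2) * np.sum(r ** 2 / (n - np.arange(1, h + 1)))
print(Q)   # compare with the chi^2(h - p - q) 95% quantile
```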


...
Model Building

Three stages -
1. Model Specification
Trend, seasonal effect, choosing the ARIMA order by ACF/PACF and FPE/AIC/BIC
2. Model Fitting (estimating coefficients)
MM, YW, LSE, CLS, MLE
3. Model Checking (diagnostics)
Residual analysis (ts.plot/acf of {Ẑt})
Portmanteau test
