
3 ARIMA Models - 3.5 Estimation
Aaron Smith
2023-01-14

This code is modified from Time Series Analysis and Its Applications, by Robert H. Shumway,
David S. Stoffer https://github.com/nickpoison/tsa4

The most recent version of the package can be found at https://github.com/nickpoison/astsa/

You can find demonstrations of astsa capabilities at


https://github.com/nickpoison/astsa/blob/master/fun_with_astsa/fun_with_astsa.md

In addition, the News and ChangeLog files are at


https://github.com/nickpoison/astsa/blob/master/NEWS.md.

The webpages for the texts and some help on using R for time series analysis can be found at
https://nickpoison.github.io/.

UCF students can download the textbook for free through the library.

Method of moments estimation


We assume that we have observations, \(x_1,x_2,...,x_n\), from an ARMA(p,q) process that is
causal and invertible, and the orders p and q are known.

Our goal is to estimate the coefficients and the variance of our ARMA(p,q) process.

\[ \phi_1,\phi_2,...,\phi_p \\ \theta_1,\theta_2,...,\theta_q \\ \sigma_w^2 \]

Our game plan is to identify statistics whose expected value equals the desired parameter, then
use the statistic in place of the parameter.

Method of moments can lead to sub-optimal estimators.

Definition 3.10 The Yule–Walker equations


Consider the AR(p) process

\[ x_t = \phi_1 x_{t-1} + \phi_2 x_{t-2} + ... + \phi_p x_{t-p} + w_t \]

\[ x_t = \phi_1 x_{t-1} + \phi_2 x_{t-2} + ... + \phi_p x_{t-p} + w_t \\ x_t x_{t-h} = \phi_1 x_{t-1}
x_{t-h} + \phi_2 x_{t-2} x_{t-h} + ... + \phi_p x_{t-p} x_{t-h} + w_t x_{t-h} \\ E(x_t x_{t-h}) = \phi_1
E(x_{t-1} x_{t-h}) + \phi_2 E(x_{t-2} x_{t-h}) + ... + \phi_p E(x_{t-p} x_{t-h}) + E(w_t x_{t-h}) \\
\gamma(h) = \phi_1 \gamma(h-1) + \phi_2 \gamma(h-2) + ... + \phi_p \gamma(h - p) + E(w_t
x_{t-h}) \\ \]

Plugging in values for \(h\), we get:

\[ \gamma(h) = \phi_1 \gamma(h-1) + \phi_2 \gamma(h-2) + ... + \phi_p \gamma(h - p), \ h = 1,2,...,p \\ \sigma_w^2 = \gamma(0) - \phi_1 \gamma(1) - \phi_2 \gamma(2) - ... - \phi_p \gamma(p), \ h = 0 \]

Using matrix notation:

\[ \Gamma_p \phi = \gamma_p \\ \sigma_w^2 = \gamma(0) - \phi ' \gamma_p \\ \Gamma_p = (\gamma(k-j))_{j,k=1}^{p} \\ \phi = \begin{pmatrix} \phi_1 \\ \phi_2 \\ \vdots \\ \phi_p \end{pmatrix} \\ \gamma_p = \begin{pmatrix} \gamma(1) \\ \gamma(2) \\ \vdots \\ \gamma(p) \end{pmatrix} \]

Now insert the sample autocovariance function for the population autocovariance.

\[ \widehat{\gamma}(h) = \dfrac{1}{n}\sum_{t = 1}^{n - h}(x_{t+h} - \bar{x})(x_{t} - \bar{x}) \\ \widehat{\phi} = \widehat{\Gamma}_p^{-1} \widehat{\gamma}_p \\ \widehat{\sigma}_w^2 = \widehat{\gamma}(0) - \widehat{\phi} ' \widehat{\gamma}_p = \widehat{\gamma}(0) - \widehat{\gamma}_p'\widehat{\Gamma}_p^{-1} \widehat{\gamma}_p \]

Yule–Walker estimators
Switching from autocovariance to autocorrelation we get the Yule–Walker estimators.

\[ \widehat{\phi} = \widehat{\Gamma}_p^{-1} \widehat{\gamma}_p = \widehat{R}_p^{-1} \widehat{\rho}_p \\ \widehat{\sigma}_w^2 = \widehat{\gamma}(0) - \widehat{\gamma}'_p\widehat{\Gamma}_p^{-1}\widehat{\gamma}_p = \widehat{\gamma}(0)\left[1 - \widehat{\rho}_p'\widehat{R}_p^{-1} \widehat{\rho}_p\right] \\ \widehat{R}_p = (\widehat{\rho}(k-j))_{j,k = 1}^{p} \\ \widehat{\rho}_p = \begin{pmatrix} \widehat{\rho}(1) \\ \widehat{\rho}(2) \\ \vdots \\ \widehat{\rho}(p) \end{pmatrix} \]
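As a quick illustration (not from the text), we can solve the Yule–Walker equations directly from the sample ACF of an arbitrary simulated AR(2) series and compare with ar.yw(); the series, seed, and length below are assumptions made only for the sketch.

# Minimal sketch: solve R_p %*% phi = rho_p by hand for a simulated AR(2)
set.seed(1)
x <- arima.sim(n = 500, model = list(ar = c(1.5, -0.75)))
rho <- drop(acf(x, lag.max = 2, plot = FALSE)$acf)  # rho(0), rho(1), rho(2)
R_p <- toeplitz(rho[1:2])                 # matrix of rho(k - j) for p = 2
rho_p <- rho[2:3]                         # (rho(1), rho(2))'
phi_hat <- solve(R_p, rho_p)
sigma2_hat <- var(x)*(1 - sum(rho_p*phi_hat))  # gamma(0)[1 - rho_p' R_p^{-1} rho_p] (var() uses n-1, so only approximate)
phi_hat
sigma2_hat
ar.yw(x = x, order.max = 2, aic = FALSE)$ar    # should be very close to phi_hat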

When the sample size is large, the Yule–Walker estimators are approximately normally
distributed and the variance estimate is close to the population variance.

Property 3.8 Large Sample Results for Yule–Walker Estimators


The asymptotic behavior of the Yule-Walker estimators of an AR(p) process as \(n \rightarrow \infty\) is

\[ \sqrt{n}(\widehat{\phi} - \phi) \xrightarrow{d} N(0,\sigma_w^2 \Gamma_p^{-1}) \\ \widehat{\sigma}_w^2 \xrightarrow{p} \sigma_w^2 \]

Load data

rm(

list = ls()
)
options(
digits = 3,
scipen = 999
)
data(
list = "rec",
package = "astsa"
)

Example 3.28

ar.yw_rec = ar.yw(

x = rec,
order = 2
)
ar.yw_rec$x.mean # = 62.26278 (mean estimate)

## [1] 62.3

ar.yw_rec$ar # = 1.3315874, -.4445447 (parameter estimates)

## [1] 1.332 -0.445

sqrt(

x = diag(
x = ar.yw_rec$asy.var.coef
)
) # = .04222637, .04222637 (standard errors)

## [1] 0.0422 0.0422

ar.yw_rec$var.pred # = 94.79912 (error variance estimate)

## [1] 94.8

predict_rec = predict(

object = ar.yw_rec,
n.ahead = 24
)
U = predict_rec$pred + predict_rec$se
L = predict_rec$pred - predict_rec$se
astsa::tsplot(
x = cbind(
rec, predict_rec$pred
),
spag = TRUE,
xlim = c(1980,1990),
ylab = "Recruitment"
)
lines(
x = predict_rec$pred,
col = 2,
type = "o"
)
lines(
x = U,
col = 4,
lty = 2
)
lines(
x = L,
col = 4,
lty = 2
)

Example 3.29

set.seed(
seed = 20230114
)
sarima.sim_ma1 = astsa::sarima.sim(
ma = 0.9,
n = 50
)
astsa::acf1(
series = sarima.sim_ma1,
max.lag = 1,
plot = FALSE
) # [1] .536 (lag 1 sample ACF)

## [1] 0.458

astsa::acf1(

series = sarima.sim_ma1,
max.lag = length(
x = sarima.sim_ma1
) - 1,
plot = TRUE
)
## [1] 0.46 -0.08 -0.25 -0.23 0.04 0.05 -0.07 -0.11 -0.04 0.08 0.09 0.08

## [13] 0.00 0.10 0.10 -0.19 -0.29 -0.18 -0.01 0.08 -0.01 -0.09 -0.04 0.07
## [25] 0.14 0.07 -0.09 -0.20 -0.06 0.04 0.04 0.03 -0.05 -0.01 0.00 -0.03
## [37] 0.02 -0.03 -0.06 -0.04 -0.02 -0.01 0.01 0.06 0.04 0.03 0.04 0.01
## [49] 0.01

Example 3.31

ar.mle_rec = ar.mle(

x = rec,
order = 2
)
ar.mle_rec$x.mean

## [1] 62.3

ar.mle_rec$ar

## [1] 1.351 -0.461

sqrt(

x = diag(
x = ar.mle_rec$asy.var.coef
)
) # standard errors

## [1] 0.041 0.041

ar.mle_rec$var.pred

## [1] 89.3

Example 3.33

data(

list = "varve",
package = "astsa"
)
diff_log_varve = diff(
x = log(
x = varve
)
) # data
r <- astsa::acf1(
series = diff_log_varve,
max.lag = 1,
plot = FALSE
) # acf(1)
astsa::acf1(
series = diff_log_varve,
max.lag = length(
x = diff_log_varve
) - 1,
plot = TRUE
)

## [1] -0.40 -0.04 -0.06 0.01 0.00 0.04 -0.04 0.04 0.01 -0.05 0.06 -0.06

## [13] -0.04 0.08 -0.02 0.01 0.00 0.03 -0.05 -0.06 0.07 0.04 -0.06 0.05
## ... (sample ACF values for lags 25 through 632 omitted; all are within ±0.10)

initialize all variables

#c(0) -> w -> z -> Sc -> Sz -> Szw -> para

w <- 0
z <- 0
Sc <- 0
Sz <- 0
Szw <- 0
para <- 0
length_varve = length(
x = diff_log_varve
) # 633

Gauss-Newton Estimation

para[1] <- (1-sqrt(1-4*(r^2)))/(2*r) # MME to start (not very good)

niter <- 20
for (j in 1:niter){
for(t in 2:length_varve){
w[t] <- diff_log_varve[t] - para[j]*w[t-1]
z[t] <- w[t-1] - para[j]*z[t-1]
}
Sc[j] <- sum(
x = w^2
)
Sz[j] <- sum(z^2)
Szw[j] <- sum(z*w)
para[j+1] <- para[j] + Szw[j]/Sz[j]
}

Results

cbind(

iteration = 1:niter-1,
thetahat = para[1:niter],
Sc,
Sz
)

## iteration thetahat Sc Sz

## [1,] 0 -0.495 159 171
## [2,] 1 -0.668 151 235
## [3,] 2 -0.733 149 301
## [4,] 3 -0.756 149 337
## [5,] 4 -0.766 149 354
## [6,] 5 -0.769 149 362
## [7,] 6 -0.771 149 366
## [8,] 7 -0.772 149 367
## [9,] 8 -0.772 149 368
## [10,] 9 -0.772 149 369
## [11,] 10 -0.773 149 369
## [12,] 11 -0.773 149 369
## [13,] 12 -0.773 149 369
## [14,] 13 -0.773 149 369
## [15,] 14 -0.773 149 369
## [16,] 15 -0.773 149 369
## [17,] 16 -0.773 149 369
## [18,] 17 -0.773 149 369
## [19,] 18 -0.773 149 369
## [20,] 19 -0.773 149 369

Plot conditional SS and results

#c(0) -> w -> cSS

w <- 0
cSS <- 0
th = seq(
from = -0.3,
to = -0.94,
by = -0.01
)
for(p in 1:length(th)){
for(t in 2:length_varve){
w[t] <- diff_log_varve[t] - th[p]*w[t-1]
}
cSS[p] <- sum(
x = w^2
)
}
astsa::tsplot(
x = th,
y = cSS,
ylab = expression(S[c](theta)),
xlab = expression(theta)
)
abline(
v = para[1:length(Sc)],
lty = 2,
col = 4
) # add previous results to plot
points(
x = para[1:length(Sc)],
y = Sc,
pch = 16,
col = 4
)

Example 3.36

generate data
set.seed(

seed = 20230115
)
# VGAM::rlaplace would have been better
rexp_0.5 = rexp(
n = 150,
rate = 0.5
)
runif_sign = runif( # sample with -1,1 would have been better
n = 150,
min = -1,
max = 1
)
rlaplace_0.5 = rexp_0.5*sign(
x = runif_sign
)
sarima.sim_laplace = 50 + astsa::sarima.sim(
n = 100,
ar = 0.95,
innov = rlaplace_0.5,
burnin = 50
)
astsa::tsplot(
x = sarima.sim_laplace,
ylab = expression(X[~t])
)
Bootstrap
set.seed(

seed = 20230115
) # not that 666
ar.yw_laplace = ar.yw(
x = sarima.sim_laplace,
order = 1
) # assumes the data were retained
mean_laplace = ar.yw_laplace$x.mean # estimate of mean
phi = ar.yw_laplace$ar # estimate of phi
nboot = 250 # number of bootstrap replicates
resid_laplace = ar.yw_laplace$resid[-1] # the 99 innovations
x.star = sarima.sim_laplace # initialize x*
phi.star.yw = c() # initialize phi*

for (i in 1:nboot) {
resid.star = sample(
x = resid_laplace,
replace = TRUE
)
x.star = astsa::sarima.sim(
n = 99,
ar = phi,
innov = resid.star,
burnin = 0
) + mean_laplace
phi.star.yw[i] <- ar.yw(
x = x.star,
order = 1
)$ar
}

small sample distn


set.seed(

seed = 20230115
)
phi.yw = rep(
x = NA,
times = 1000
)
for (i in 1:1000){
rexp_0.5 <- rexp(
n = 150,
rate = 0.5
);
runif_sign <- runif(
n = 150,
min = -1,
max = 1
);
rlaplace_0.5 <- rexp_0.5*sign(
x = runif_sign
)
arima.sim_laplace <- 50 + arima.sim(
n = 100,
list(
ar = 0.95
),
innov = rlaplace_0.5,
n.start = 50
)
phi.yw[i] <- ar.yw(
x = arima.sim_laplace,
order = 1
)$ar
}

Picture
hist(

x = phi.star.yw,
breaks = 15,
main = "",
prob = TRUE,
xlim = c(
0.65,1.05
),
ylim = c(
0,14
),
col = astsa::astsa.col(
col = 4,
alpha = 0.3
),
xlab = expression(hat(phi))
)
lines(
x = density(
x = phi.yw,
bw = 0.02
),
lwd = 2
)
curve(
expr = dnorm(
x = x,
mean = 0.96,
sd = 0.03
),
from = 0.75,
to = 1.1,
lty = 2,
lwd = 2,
add = TRUE
)
legend(
x = 0.65,
y = 14,
bty = 'n',
lty = c(
1,0,2
),
lwd = c(
2,0,2
),
col = 1,
pch = c(
NA,22,NA
),
pt.bg = c(
NA,astsa::astsa.col(
col = 4,
alpha = 0.3
),NA
),
pt.cex = 2.5,
legend = c(
'true distribution', 'bootstrap distribution', 'normal approximation'
)
)
3 ARIMA Models - 3.5.1 Estimation
- Method of Moments
Aaron Smith
2023-01-14

This code is modified from Time Series Analysis and Its Applications, by Robert H. Shumway,
David S. Stoffer https://github.com/nickpoison/tsa4

The most recent version of the package can be found at https://github.com/nickpoison/astsa/

You can find demonstrations of astsa capabilities at


https://github.com/nickpoison/astsa/blob/master/fun_with_astsa/fun_with_astsa.md

In addition, the News and ChangeLog files are at


https://github.com/nickpoison/astsa/blob/master/NEWS.md.

The webpages for the texts and some help on using R for time series analysis can be found at
https://nickpoison.github.io/.

UCF students can download the textbook for free through the library.

Method of moments estimation


We assume that we have observations, \(x_1,x_2,...,x_n\), from an ARMA(p,q) process that is
causal and invertible, and the orders p and q are known.

Our goal is to estimate the coefficients and the variance of our ARMA(p,q) process.

\[ \phi_1,\phi_2,...,\phi_p \\ \theta_1,\theta_2,...,\theta_q \\ \sigma_w^2 \]

Our game plan is to identify statistics whose expected values equal the desired parameters, derive theoretical equations that relate these moments to the parameters, and then solve those equations to estimate the parameters.

Method of moments can lead to sub-optimal estimators.

Definition 3.10 The Yule–Walker equations


Consider the AR(p) process

\[ x_t = \phi_1 x_{t-1} + \phi_2 x_{t-2} + ... + \phi_p x_{t-p} + w_t \]


\[ x_t = \phi_1 x_{t-1} + \phi_2 x_{t-2} + ... + \phi_p x_{t-p} + w_t \\ x_t x_{t-h} = \phi_1 x_{t-1}
x_{t-h} + \phi_2 x_{t-2} x_{t-h} + ... + \phi_p x_{t-p} x_{t-h} + w_t x_{t-h} \\ E(x_t x_{t-h}) = \phi_1
E(x_{t-1} x_{t-h}) + \phi_2 E(x_{t-2} x_{t-h}) + ... + \phi_p E(x_{t-p} x_{t-h}) + E(w_t x_{t-h}) \\
\gamma(h) = \phi_1 \gamma(h-1) + \phi_2 \gamma(h-2) + ... + \phi_p \gamma(h - p) + E(w_t
x_{t-h}) \\ \]

Plugging in values for \(h\), we get:

\[ \gamma(h) = \phi_1 \gamma(h-1) + \phi_2 \gamma(h-2) + ... + \phi_p \gamma(h - p), \ h = 1,2,...,p \\ \sigma_w^2 = \gamma(0) - \phi_1 \gamma(1) - \phi_2 \gamma(2) - ... - \phi_p \gamma(p), \ h = 0 \]

Using matrix notation:

\[ \Gamma_p \phi = \gamma_p \\ \sigma_w^2 = \gamma(0) - \phi ' \gamma_p \\ \Gamma_p = (\gamma(k-j))_{j,k=1}^{p} \\ \phi = \begin{pmatrix} \phi_1 \\ \phi_2 \\ \vdots \\ \phi_p \end{pmatrix} \\ \gamma_p = \begin{pmatrix} \gamma(1) \\ \gamma(2) \\ \vdots \\ \gamma(p) \end{pmatrix} \]

Now insert the sample autocovariance function for the population autocovariance.

\[ \widehat{\gamma}(h) = \dfrac{1}{n}\sum_{t = 1}^{n - h}(x_{t+h} - \bar{x})(x_{t} - \bar{x}) \\ \widehat{\phi} = \widehat{\Gamma}_p^{-1} \widehat{\gamma}_p \\ \widehat{\sigma}_w^2 = \widehat{\gamma}(0) - \widehat{\phi} ' \widehat{\gamma}_p = \widehat{\gamma}(0) - \widehat{\gamma}_p'\widehat{\Gamma}_p^{-1} \widehat{\gamma}_p \]

Yule–Walker estimators
Switching from autocovariance to autocorrelation we get the Yule–Walker estimators.

\[ \widehat{\phi} = \widehat{\Gamma}_p^{-1} \widehat{\gamma}_p = \widehat{R}_p^{-1} \widehat{\rho}_p \\ \widehat{\sigma}_w^2 = \widehat{\gamma}(0) - \widehat{\gamma}'_p\widehat{\Gamma}_p^{-1}\widehat{\gamma}_p = \widehat{\gamma}(0)\left[1 - \widehat{\rho}_p'\widehat{R}_p^{-1} \widehat{\rho}_p\right] \\ \widehat{R}_p = (\widehat{\rho}(k-j))_{j,k = 1}^{p} \\ \widehat{\rho}_p = \begin{pmatrix} \widehat{\rho}(1) \\ \widehat{\rho}(2) \\ \vdots \\ \widehat{\rho}(p) \end{pmatrix} \]

When the sample size is large, the Yule–Walker estimators are approximately normally
distributed and the variance estimate is close to the population variance.

Property 3.8 Large Sample Results for Yule–Walker Estimators


The asymptotic behavior of the Yule-Walker estimators of an AR(p) process as \(n \rightarrow \infty\) is

\[ \sqrt{n}(\widehat{\phi} - \phi) \xrightarrow{d} N(0,\sigma_w^2 \Gamma_p^{-1}) \\ \widehat{\sigma}_w^2 \xrightarrow{p} \sigma_w^2 \]

Proof:
The proof will be presented separately.

The Yule-Walker estimators are expressed in terms of the inverse of a covariance matrix. We can use the Durbin-Levinson algorithm to calculate \(\widehat{\phi}\) without inverting \(\widehat{\Gamma}_p\) or \(\widehat{R}_p\), by replacing \(\gamma(h)\) with \(\widehat{\gamma}(h)\).

The Durbin-Levinson algorithm iteratively calculates

\[ \widehat{\phi}_h = \begin{pmatrix} \phi_{h1} \\ \phi_{h2} \\ \vdots \\ \phi_{hh} \\ \end{pmatrix} \]
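A minimal sketch of the recursion follows (an illustration, not astsa code; the helper name durbin_levinson is made up here). It takes the sample autocorrelations \(\widehat{\rho}(1),...,\widehat{\rho}(p)\) and returns the last row \(\widehat{\phi}_{p1},...,\widehat{\phi}_{pp}\), so no matrix is inverted.

durbin_levinson <- function(rho, p) {
  phi <- matrix(0, p, p)
  phi[1, 1] <- rho[1]
  v <- 1 - rho[1]^2                         # scaled one-step prediction error
  if (p > 1) for (h in 2:p) {
    phi[h, h] <- (rho[h] - sum(phi[h - 1, 1:(h - 1)]*rho[(h - 1):1]))/v
    phi[h, 1:(h - 1)] <- phi[h - 1, 1:(h - 1)] - phi[h, h]*phi[h - 1, (h - 1):1]
    v <- v*(1 - phi[h, h]^2)
  }
  phi[p, ]                                  # phi_{p1}, ..., phi_{pp}
}
data(list = "rec", package = "astsa")
rho_rec <- drop(acf(rec, lag.max = 2, plot = FALSE)$acf)[-1]
durbin_levinson(rho = rho_rec, p = 2)       # compare with ar.yw(rec, order = 2)$ar

The diagonal elements \(\widehat{\phi}_{hh}\) produced along the way are the sample PACF values.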

Applying the large sample results for Yule–Walker estimators to the Durbin-Levinson algorithm,
we get the large sample distribution of the PACF \(\widehat{\phi}_{hh}\).

Property 3.9 Large Sample Distribution of the PACF


For a causal AR(p) process with \(h > p\)

\[ \sqrt{n}\widehat{\phi}_{hh} \xrightarrow{d} N(0,1) \text{ as } n \rightarrow \infty \]

Example 3.27 Yule–Walker Estimation for an AR(2) Process


Consider an AR(2) process with

\[ x_t = 1.5 x_{t-1} - 0.75x_{t-2} +w_t \\ w_t \sim iid \ N(0,1) \]

n <- 1e6

v_x <- rep(


x = 0,
times = n
)
set.seed(
seed = 20230124
)
for(j in 3:n) v_x[j] <- 1.5*v_x[j-1] - 0.75*v_x[j-2] + rnorm(1)
var(
x = v_x
)

## [1] 8.638448

astsa::acf1(

series = v_x,
max.lag = 2
)
## [1] 0.86 0.54

astsa::acf2(

series = v_x,
max.lag = 3
)
## [,1] [,2] [,3]

## ACF 0.86 0.54 0.16
## PACF 0.86 -0.75 0.00

Manually estimate the model parameters. Note: the time series was generated using a random number generator, so the estimated values may change with different runs; rounding will also change the output.

solve(matrix(

data = c(1,0.86,0.86,1),
nrow = 2
)) %*% matrix(
data = c(0.86,0.54),
ncol = 1
)

## [,1]

## [1,] 1.5192012
## [2,] -0.7665131
8.64 * (1 - sum(c(0.86,0.54)*c(1.52,-0.77)))

## [1] 0.938304

(1/n)*(0.94/8.67) * solve(matrix(

data = c(1,0.86,0.86,1),
nrow = 2
))

## [,1] [,2]

## [1,] 4.163588e-07 -3.580686e-07
## [2,] -3.580686e-07 4.163588e-07

Let ar.ols() perform the calculations.

ar.ols(

x = v_x,
intercept = FALSE,
order.max = 2,
demean = FALSE
)

##

## Call:
## ar.ols(x = v_x, order.max = 2, demean = FALSE, intercept = FALSE)
##
## Coefficients:
## 1 2
## 1.5003 -0.7504
##
## Order selected 2 sigma^2 estimated as 1.001

Load data

rm(

list = ls()
)
options(
digits = 2,
scipen = 999
)
data(
list = "rec",
package = "astsa"
)

Example 3.28 Yule–Walker Estimation of the Recruitment Series


The Yule–Walker estimator is an alternative to ordinary least squares for estimating the model.

ar.ols(

x = rec,
order = 2
)

##

## Call:
## ar.ols(x = rec, order.max = 2)
##
## Coefficients:
## 1 2
## 1.35 -0.46
##
## Intercept: -0.0564 (0.446)
##
## Order selected 2 sigma^2 estimated as 89.7

ar.yw_rec = ar.yw(

x = rec,
order = 2
)
ar.yw_rec$x.mean # = 62.26278 (mean estimate)

## [1] 62

ar.yw_rec$ar # = 1.3315874, -.4445447 (parameter estimates)

## [1] 1.33 -0.44

sqrt(

x = diag(
x = ar.yw_rec$asy.var.coef
)
) # = .04222637, .04222637 (standard errors)

## [1] 0.042 0.042

ar.yw_rec$var.pred # = 94.79912 (error variance estimate)

## [1] 95
Obtain the 24 month ahead predictions and their standard errors, and then plot the results.

predict_rec = predict(

object = ar.yw_rec,
n.ahead = 24
)
U = predict_rec$pred + predict_rec$se
L = predict_rec$pred - predict_rec$se
astsa::tsplot(
x = cbind(
rec, predict_rec$pred
),
spag = TRUE,
xlim = c(1980,1990),
ylab = "Recruitment"
)
lines(
x = predict_rec$pred,
col = 2,
type = "o"
)
lines(
x = U,
col = 4,
lty = 2
)
lines(
x = L,
col = 4,
lty = 2
)
For AR(p) models, the Yule–Walker estimators are optimal in the sense that the asymptotic
distribution is the best asymptotic normal distribution.

This is because, given initial conditions, AR(p) models are linear models, and the Yule–Walker
estimators are essentially least squares estimators.

If we use method of moments for MA or ARMA models, we will not get optimal estimators
because such processes are nonlinear in the parameters.

Example 3.29 Method of Moments Estimation for an MA(1)


Consider the MA(1) time series

\[ x_t = w_t + \theta w_{t-1} \\ |\theta| < 1 \]

Let’s use the variance, autocovariance and autocorrelation to estimate the parameter.

\[ \gamma(0) = \sigma_w^2(1+\theta^2) \\ \gamma(1) = \sigma_w^2 \theta \\ \rho(1) = \dfrac{\gamma(1)}{\gamma(0)} = \dfrac{\theta}{1+\theta^2} \]

Now let’s use method of moments. We will use the quadratic equation to estimate the
parameter.
\[ \widehat{\rho}(1) = \dfrac{\widehat{\theta}}{1+\widehat{\theta}^2} \\ \widehat{\theta} = \dfrac{1
\pm \sqrt{1 - 4\widehat{\rho}(1)^2}}{2\widehat{\rho}(1)} \]

To eliminate complex numbers, we restrict \(|\widehat{\rho}(1)| \leq 1/2\). There are two
estimates from the quadratic equation. The one using addition has a vertical asymptote at
\(\widehat{\rho}(1) = 0\), while the minus solution has a removable (hole) discontinuity at \(\widehat{\rho}(1) = 0\).
We go with the minus solution.

\[ \widehat{\theta} = \dfrac{1 - \sqrt{1 - 4\widehat{\rho}(1)^2}}{2\widehat{\rho}(1)} \]

rho_1 <- seq(

from = -0.5,
to = 0.5,
by = 0.0001
)
theta_p <- (1 + sqrt(1 - 4*(rho_1^2)))/(2*rho_1)
theta_m <- (1 - sqrt(1 - 4*(rho_1^2)))/(2*rho_1)
plot(
x = rho_1,
y = theta_p,
pch = ".",
ylim = c(-100,100)
)
plot(

x = rho_1,
y = theta_m,
pch = "."
)
Using the delta method, we can see that

\[ \widehat{\theta} \sim AN\left(\theta, \dfrac{1+\theta^2+4\theta^4+\theta^6 + \theta^8}{n(1-\theta^2)^2}\right) \]

This example highlights disadvantages of the method of moments.

• For \(|\widehat{\rho}(1)| > 1/2\) we get complex estimates which do not make sense.
• There are two solutions, requiring the statistician to figure out which one to use.
• One of the solutions has a vertical asymptote with respect to \(\widehat{\rho}(1)\) which
is unreasonable.
• For large samples, the method of moments estimate will have greater variance than the
maximum likelihood estimate.

set.seed(

seed = 20230114
)
sarima.sim_ma1 = astsa::sarima.sim(
ma = 0.9,
n = 50
)
astsa::acf1(
series = sarima.sim_ma1,
max.lag = 1,
plot = FALSE
) # [1] .536 (lag 1 sample ACF)

## [1] 0.46

astsa::acf1(

series = sarima.sim_ma1,
max.lag = length(
x = sarima.sim_ma1
) - 1,
plot = TRUE
)

## [1] 0.46 -0.08 -0.25 -0.23 0.04 0.05 -0.07 -0.11 -0.04 0.08 0.09 0.08

## [13] 0.00 0.10 0.10 -0.19 -0.29 -0.18 -0.01 0.08 -0.01 -0.09 -0.04 0.07
## [25] 0.14 0.07 -0.09 -0.20 -0.06 0.04 0.04 0.03 -0.05 -0.01 0.00 -0.03
## [37] 0.02 -0.03 -0.06 -0.04 -0.02 -0.01 0.01 0.06 0.04 0.03 0.04 0.01
## [49] 0.01
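Plugging the lag-1 sample ACF from above into the minus solution gives the method of moments estimate; a minimal sketch (the value 0.46 depends on the seed used above):

rho1_hat <- 0.46                                  # lag 1 sample ACF from above
(1 - sqrt(1 - 4*rho1_hat^2))/(2*rho1_hat)         # MoM estimate of theta, about 0.66 here

With this seed the estimate is roughly 0.66, well below the true \(\theta = 0.9\), which echoes the inefficiency of method of moments for MA models.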
3 ARIMA Models - 3.5.2 Estimation
- Maximum Likelihood Estimation
Aaron Smith
2023-01-27

This code is modified from Time Series Analysis and Its Applications, by Robert H. Shumway,
David S. Stoffer https://github.com/nickpoison/tsa4

The most recent version of the package can be found at https://github.com/nickpoison/astsa/

You can find demonstrations of astsa capabilities at


https://github.com/nickpoison/astsa/blob/master/fun_with_astsa/fun_with_astsa.md

In addition, the News and ChangeLog files are at


https://github.com/nickpoison/astsa/blob/master/NEWS.md.

The webpages for the texts and some help on using R for time series analysis can be found at
https://nickpoison.github.io/.

UCF students can download the textbook for free through the library.

Maximum likelihood estimation of an AR(1) process


To make the concepts more solid, we will focus on an AR(1) process.

\[ x_t = \mu + \phi(x_{t-1}-\mu) + w_t \\ |\phi| < 1 \\ w_t \sim iid N(0,\sigma_w^2) \]

Given the observations \(x_1,x_2,...,x_n\), we want to maximize the likelihood function.


Likelihood functions and densities are equal to each other, but treat the data and the
parameters differently.

• density functions: the observations are variables, the parameter values are constant
(data has not been collected yet)
• likelihood functions: the parameters are the variables, the observations are constant
(after data collection)

When we do not plug in values for the data or the parameters, the likelihood function and the density are equal to each other.

\[ L(\mu,\phi,\sigma_w^2|x_1,x_2,...,x_n) = f(x_1,x_2,...,x_n|\mu,\phi,\sigma_w^2) \]

Since our process is AR(1), each observation depends on the prior data point.

\[ L(\mu,\phi,\sigma_w^2|x_1,x_2,...,x_n) = f(x_1|\mu,\phi,\sigma_w^2)f(x_2|x_1,\mu,\phi,\sigma_w^2)f(x_3|x_2,\mu,\phi,\sigma_w^2)...f(x_n|x_{n-1},\mu,\phi,\sigma_w^2) \]

If we isolate \(w_t\) in our process equation and consider that it is normally distributed, \(w_t
\sim N(0,\sigma_w^2)\), we get

\[ w_t = (x_t - \mu) - \phi(x_{t-1}-\mu) \\ x_t|x_{t-1} \sim N(\mu + \phi(x_{t-1}-\mu),\sigma_w^2) \\ f(x_t|x_{t-1},\mu,\phi,\sigma_w^2) = f_w[(x_t - \mu) - \phi(x_{t-1}-\mu)] \]

Entering this into our likelihood function, we get

\[\begin{align} L(\mu,\phi,\sigma_w^2|x_1,x_2,...,x_n) =& f(x_1|\mu,\phi,\sigma_w^2)\prod_{t=2}^{n}f_w[(x_t - \mu) - \phi(x_{t-1}-\mu)] \\ =& f(x_1|\mu,\phi,\sigma_w^2)\prod_{t=2}^{n}\dfrac{1}{\sqrt{2\pi\sigma_w^2}}e^{-\dfrac{[(x_t - \mu) - \phi(x_{t-1}-\mu)]^2}{2\sigma_w^2}} \\ =& f(x_1|\mu,\phi,\sigma_w^2)\dfrac{1}{(\sqrt{2\pi\sigma_w^2})^{n-1}}e^{-\sum_{t=2}^{n}\dfrac{[(x_t - \mu) - \phi(x_{t-1}-\mu)]^2}{2\sigma_w^2}} \\ =& f(x_1|\mu,\phi,\sigma_w^2) (2\pi\sigma_w^2)^{-(n-1)/2} exp\left(-\frac{1}{2\sigma_w^2}\sum_{t=2}^{n}[(x_t - \mu) - \phi(x_{t-1}-\mu)]^2\right) \\ \end{align}\]

To get a better grasp of \(x_1\), we take its causal representation. We see that \(x_1\) is normal
with expected value \(\mu\) and variance \(\dfrac{\sigma_w^2}{1 - \phi^2}\).

\[\begin{align} x_1 =& \mu + \sum_{j = 0}^{\infty}\phi^j w_{1-j} \\ E(x_1) =& E\left(\mu + \sum_{j =
0}^{\infty}\phi^j w_{1-j}\right) = \mu + \sum_{j = 0}^{\infty}\phi^j E(w_{1-j})= \mu \\ V(x_1) =&
V\left(\mu + \sum_{j = 0}^{\infty}\phi^j w_{1-j}\right)\\ =& E\left(\sum_{j = 0}^{\infty}\phi^j w_{1-
j}\sum_{k = 0}^{\infty}\phi^k w_{1-k}\right) \\ =& E\left(\sum_{j = 0}^{\infty}\sum_{k =
0}^{\infty}\phi^{j+k} w_{1-j} w_{1-k}\right)\\ =& E\left(\sum_{j = 0}^{\infty}\phi^{2j} w_{1-j}^2\right)
+ E\left(\sum_{j\neq k}\phi^{j+k} w_{1-j} w_{1-k}\right)\\ =& \sum_{j = 0}^{\infty}\phi^{2j}
E\left(w_{1-j}^2\right) + \sum_{j\neq k}\phi^{j+k} E\left(w_{1-j} w_{1-k}\right)\\ =&
\sigma_w^2\sum_{j = 0}^{\infty}\phi^{2j} \\ =& \dfrac{\sigma_w^2}{1 - \phi^2} \end{align}\]

\[\begin{align} L(\mu,\phi,\sigma_w^2|x_1,x_2,...,x_n) =& f(x_1|\mu,\phi,\sigma_w^2) (2\pi\sigma_w^2)^{-(n-1)/2} exp\left(-\frac{1}{2\sigma_w^2}\sum_{t=2}^{n}[(x_t - \mu) - \phi(x_{t-1}-\mu)]^2\right) \\ =& \dfrac{1}{\sqrt{2\pi \dfrac{\sigma_w^2}{1 - \phi^2}}} exp\left(-\dfrac{(x_1-\mu)^2}{2 \dfrac{\sigma_w^2}{1 - \phi^2}}\right) (2\pi\sigma_w^2)^{-(n-1)/2} exp\left(-\frac{1}{2\sigma_w^2}\sum_{t=2}^{n}[(x_t - \mu) - \phi(x_{t-1}-\mu)]^2\right) \\ =& \sqrt{1 - \phi^2}(2\pi\sigma_w^2)^{-n/2} exp\left(-\frac{1}{2\sigma_w^2}\left((1 - \phi^2)(x_1-\mu)^2 + \sum_{t=2}^{n}[(x_t - \mu) - \phi(x_{t-1}-\mu)]^2\right)\right) \\ =& \sqrt{1 - \phi^2}(2\pi\sigma_w^2)^{-n/2} exp\left(-\frac{S(\mu,\phi)}{2\sigma_w^2}\right) \end{align}\]

Unconditional sum of squares

\[ S(\mu,\phi) = (1 - \phi^2)(x_1-\mu)^2 + \sum_{t=2}^{n}[(x_t - \mu) - \phi(x_{t-1}-\mu)]^2 \]

We use a derivative to maximize \(L(\mu,\phi,\sigma_w^2|x_1,x_2,...,x_n)\) with respect to \(\sigma_w^2\).

\[ \dfrac{\partial}{\partial \sigma_w^2} L(\mu,\phi,\sigma_w^2|x_1,x_2,...,x_n) = (2\pi)^{-n/2}(1-\phi^2)^{1/2}(\sigma_w^2)^{-n/2}exp\left(-\dfrac{S(\mu,\phi)}{2}(\sigma_w^2)^{-1}\right)\dfrac{1}{2}(\sigma_w^2)^{-1}\left[S(\mu,\phi)(\sigma_w^2)^{-1}-n\right] \]

Setting the partial derivative to zero and solving, we get maximum likelihood estimator of our
white noise variance.

\[ \widehat{\sigma}_w^2 = \dfrac{S(\widehat{\mu},\widehat{\phi})}{n} \]

This is a biased estimator. If we replace \(n\) in the denominator with \(n-2\), we get the
unbiased unconditional least squares estimate.

We do not need to run the second derivative test. The likelihood function is smooth and positive
over the positive real line and it will go to zero to the right, hence the only point where the
derivative is zero has to be a maximum.

If we take the logarithm of the likelihood function and drop constants, we get a criterion
function for minimization.

\[\begin{align} log\left(L(\mu,\phi,\sigma_w^2|x_1,x_2,...,x_n)\right) &= log\left(\sqrt{1 - \phi^2}(2\pi\sigma_w^2)^{-n/2} exp\left(-\frac{S(\mu,\phi)}{2\sigma_w^2}\right)\right) \\ &= log\left(\sqrt{1 - \phi^2}\right) + log\left((2\pi\sigma_w^2)^{-n/2}\right) + log\left(exp\left(-\frac{S(\mu,\phi)}{2\sigma_w^2}\right)\right) \\ &= \dfrac{1}{2}log\left(1 - \phi^2\right) -\dfrac{n}{2} log\left(2\pi\sigma_w^2\right)-\frac{S(\mu,\phi)}{2\sigma_w^2} \\ &= \dfrac{1}{2}log\left(1 - \phi^2\right) -\dfrac{n}{2} log\left(2\pi\right) -\dfrac{n}{2} log\left(\sigma_w^2\right)-\frac{S(\mu,\phi)}{2\sigma_w^2} \\ &\approx \dfrac{1}{2}log\left(1 - \widehat{\phi}^2\right) -\dfrac{n}{2} log\left(2\pi\right) -\dfrac{n}{2} log\left(\dfrac{S(\widehat{\mu},\widehat{\phi})}{n}\right)-\frac{S(\widehat{\mu},\widehat{\phi})}{2\dfrac{S(\widehat{\mu},\widehat{\phi})}{n}} \\ &= -\dfrac{n}{2} \left[ log\left(\dfrac{S(\widehat{\mu},\widehat{\phi})}{n}\right) - \dfrac{1}{n}log\left(1 - \widehat{\phi}^2\right) \right] -\dfrac{n}{2} log\left(2\pi\right)-\frac{n}{2} \\ l(\widehat{\mu},\widehat{\phi}) &= log\left(\dfrac{S(\widehat{\mu},\widehat{\phi})}{n}\right) - \dfrac{1}{n}log\left(1 - \widehat{\phi}^2\right) \end{align}\]
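As a sanity check, this criterion can be minimized numerically; the sketch below is an illustration only (the simulated AR(1) series, seed, and starting values are assumptions), using optim() on the unconditional sum of squares defined above.

set.seed(1)
x <- arima.sim(n = 200, model = list(ar = 0.7)) + 50
S_u <- function(mu, phi, x) {               # unconditional sum of squares S(mu, phi)
  n <- length(x)
  (1 - phi^2)*(x[1] - mu)^2 + sum(((x[-1] - mu) - phi*(x[-n] - mu))^2)
}
crit <- function(par, x) {                  # l(mu, phi) = log(S/n) - (1/n) log(1 - phi^2)
  log(S_u(par[1], par[2], x)/length(x)) - log(1 - par[2]^2)/length(x)
}
fit <- optim(par = c(mean(x), 0.5), fn = crit, x = x,
             method = "L-BFGS-B", lower = c(-Inf, -0.99), upper = c(Inf, 0.99))
fit$par                                     # (mu hat, phi hat)
S_u(fit$par[1], fit$par[2], x)/length(x)    # sigma_w^2 hat = S(mu hat, phi hat)/n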

This criterion function is rather complicated. We could reasonably simplify the situation by
ignoring the first observation.

\[ S_c(\mu,\phi) = \sum_{t=2}^{n}[(x_t - \mu) - \phi(x_{t-1}-\mu)]^2 = \sum_{t=2}^{n}[x_t - (\alpha + \phi x_{t-1})]^2 \\ \alpha = \mu(1-\phi) \\ \widehat{\sigma}_w^2 = \dfrac{S_c(\widehat{\mu},\widehat{\phi})}{n-1} \]

We use these equations to solve the regression problem.

\[ \widehat{\alpha} = \bar{x}_{(2)} - \widehat{\phi}\bar{x}_{(1)} \\ \bar{x}_{(1)} = \dfrac{1}{n-1}\sum_{t = 1}^{n-1}x_t \\ \bar{x}_{(2)} = \dfrac{1}{n-1}\sum_{t = 2}^{n}x_t \\ \widehat{\mu} = \dfrac{\bar{x}_{(2)} - \widehat{\phi}\bar{x}_{(1)}}{1 - \widehat{\phi}} \\ \widehat{\phi} = \dfrac{\sum_{t = 2}^{n}(x_t-\bar{x}_{(2)})(x_{t-1} - \bar{x}_{(1)})}{\sum_{t = 2}^{n}(x_{t-1} - \bar{x}_{(1)})^2} \]
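Conditional least squares for an AR(1) is just an ordinary regression of \(x_t\) on \(x_{t-1}\); a minimal sketch (the simulated series is an arbitrary assumption for illustration):

set.seed(2)
x <- arima.sim(n = 200, model = list(ar = 0.7)) + 50
n <- length(x)
fit_cls <- lm(x[2:n] ~ x[1:(n - 1)])          # regress x_t on x_{t-1}
alpha_hat <- unname(coef(fit_cls)[1])
phi_hat <- unname(coef(fit_cls)[2])
mu_hat <- alpha_hat/(1 - phi_hat)             # mu = alpha/(1 - phi)
sigma2_hat <- sum(resid(fit_cls)^2)/(n - 1)   # S_c(mu hat, phi hat)/(n - 1)
c(mu = mu_hat, phi = phi_hat, sigma2 = sigma2_hat)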
Notice that for AR(1) processes with large sample size, the Yule-Walker estimators and the
conditional least squares estimators are approximately the same. The difference is in how the
end points of our data are handled, \(x_1,x_n\).

\[ \widehat{\mu} \approx \bar{x} \\ \widehat{\phi} \approx \widehat{\rho}(1) \]

Notice that we can adjust the simpler white noise variance estimator to be unbiased.

\[ \widehat{\sigma}_w^2 = \dfrac{S_c(\widehat{\mu},\widehat{\phi})}{n-3} \]

For general AR(p) models, maximum likelihood estimation, unconditional least squares, and
conditional least squares are analogous to the AR(1) example.

For general ARMA models, it is difficult to explicitly write the likelihood function. Instead we
write the likelihood in terms of the innovations, or one-step-ahead prediction errors.

\[ x_t - x_{t}^{t-1} \]

Dr. Smith skipped material on estimating ARMA(p,q) models with maximum likelihood. The material in the textbook is very general and is not specific to our usage.
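For reference, the innovations-form Gaussian likelihood for a general ARMA(p,q) is what stats::arima() maximizes; a minimal sketch on the Recruitment series, shown only for comparison with the ar.mle() fit below (not the route the notes take):

data(list = "rec", package = "astsa")
arima_rec <- arima(x = rec, order = c(2, 0, 0))  # ML fit of an AR(2); the "intercept" is the mean
arima_rec$coef                                   # phi_1, phi_2, and the mean
sqrt(diag(arima_rec$var.coef))                   # standard errors
arima_rec$sigma2                                 # innovation variance estimate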

Example 3.31
Let’s fit an AR(2) model to the Recruitment time series using maximum likelihood.

Load data

rm(

list = ls()
)
options(
digits = 2,
scipen = 999
)
data(
list = "rec",
package = "astsa"
)

ar.mle_rec = ar.mle(

x = rec,
order = 2
)
ar.mle_rec$x.mean

## [1] 62
ar.mle_rec$ar

## [1] 1.35 -0.46

sqrt(

x = diag(
x = ar.mle_rec$asy.var.coef
)
) # standard errors

## [1] 0.041 0.041

ar.mle_rec$var.pred

## [1] 89

Using Gauss-Newton to compute estimates


Example 3.32 Gauss–Newton for an MA(1)
Consider an invertible MA(1) process

\[ x_t = w_t + \theta w_{t-1} \]

Let \(w_t(\beta)\) be the error between the observed value and modeled value immediately
before \(x_t\) is observed.

\[ w_t(\beta) = x_t - \sum_{j =1}^{p} \phi_j x_{t-j} - \sum_{k = 1}^{q}\theta_k w_{t-k}(\beta) \]

Notice that the errors depend on the parameters.

For our MA(1) process we set \(w_0(\theta) = 0\) and get

\[ w_t(\theta) = x_t - \theta w_{t-1}(\theta), t = 1,2,...,n \]

Let’s take the derivative with respect to the parameter.

\[ \dfrac{\partial}{\partial \theta}w_t(\theta) = - w_{t-1}(\theta) - \theta\dfrac{\partial}{\partial \theta} w_{t-1}(\theta) \]

Set

\[ \dfrac{\partial}{\partial \theta}w_0(\theta) = 0 \\ z_t(\theta) = -\dfrac{\partial}{\partial \theta}w_t(\theta) \\ z_0(\theta) = 0 \]

Our recursion becomes

\[ z_t(\theta) = w_{t-1}(\theta) - \theta z_{t-1}(\theta) \]

Let \(\theta_{(0)}\) be an initial guess of \(\theta\). The Gauss–Newton procedure for conditional
least squares is given by
\[ \theta_{(j + 1)} = \theta_{(j)} + \dfrac{\sum_{t = 1}^{n}z_t(\theta_{(j)})w_t(\theta_{(j)})}{\sum_{t =
1}^{n}z^2_t(\theta_{(j)})} \]

We use this recursive equation to estimate \(\theta\).

Example 3.33 Fitting the Glacial Varve Series


Consider the series of glacial varve thicknesses from Massachusetts for n = 634 years. The
textbook makes the case that a first-order moving average model might fit the logarithmically
transformed and differenced varve series.

\[ \nabla log(x_t) = log(x_t) - log(x_{t-1}) = log\left(\dfrac{x_t}{x_{t-1}}\right) \]

\(\nabla log(x_t)\) is approximately the percentage change.

The ACF and PACF plots show that an MA(1) model is appropriate.

data(

list = "varve",
package = "astsa"
)
diff_log_varve = diff(
x = log(
x = varve
)
) # data
r <- astsa::acf1(
series = diff_log_varve,
max.lag = 1,
plot = FALSE
) # acf(1)
astsa::acf1(
series = diff_log_varve,
max.lag = length(
x = diff_log_varve
) - 1,
plot = TRUE
)
## [1] -0.40 -0.04 -0.06 0.01 0.00 0.04 -0.04 0.04 0.01 -0.05 0.06 -0.06

## [13] -0.04 0.08 -0.02 0.01 0.00 0.03 -0.05 -0.06 0.07 0.04 -0.06 0.05
## ... (sample ACF values for lags 25 through 632 omitted; all are within ±0.10)

initialize all variables

#c(0) -> w -> z -> Sc -> Sz -> Szw -> para

w <- 0
z <- 0
Sc <- 0
Sz <- 0
Szw <- 0
para <- 0
length_varve = length(
x = diff_log_varve
) # 633

Gauss-Newton Estimation

para[1] <- (1-sqrt(1-4*(r^2)))/(2*r) # MME to start (not very good)

niter <- 20
for (j in 1:niter){
for(t in 2:length_varve){
w[t] <- diff_log_varve[t] - para[j]*w[t-1]
z[t] <- w[t-1] - para[j]*z[t-1]
}
Sc[j] <- sum(
x = w^2
)
Sz[j] <- sum(z^2)
Szw[j] <- sum(z*w)
para[j+1] <- para[j] + Szw[j]/Sz[j]
}

Results

cbind(

iteration = 1:niter-1,
thetahat = para[1:niter],
Sc,
Sz
)

## iteration thetahat Sc Sz

## [1,] 0 -0.49 159 171
## [2,] 1 -0.67 151 235
## [3,] 2 -0.73 149 301
## [4,] 3 -0.76 149 337
## [5,] 4 -0.77 149 354
## [6,] 5 -0.77 149 362
## [7,] 6 -0.77 149 366
## [8,] 7 -0.77 149 367
## [9,] 8 -0.77 149 368
## [10,] 9 -0.77 149 369
## [11,] 10 -0.77 149 369
## [12,] 11 -0.77 149 369
## [13,] 12 -0.77 149 369
## [14,] 13 -0.77 149 369
## [15,] 14 -0.77 149 369
## [16,] 15 -0.77 149 369
## [17,] 16 -0.77 149 369
## [18,] 17 -0.77 149 369
## [19,] 18 -0.77 149 369
## [20,] 19 -0.77 149 369

Plot conditional SS and results

#c(0) -> w -> cSS

w <- 0
cSS <- 0
th = seq(
from = -0.3,
to = -0.94,
by = -0.01
)
for(p in 1:length(th)){
for(t in 2:length_varve){
w[t] <- diff_log_varve[t] - th[p]*w[t-1]
}
cSS[p] <- sum(
x = w^2
)
}
astsa::tsplot(
x = th,
y = cSS,
ylab = expression(S[c](theta)),
xlab = expression(theta)
)
abline(
v = para[1:length(Sc)],
lty = 2,
col = 4
) # add previous results to plot
points(
x = para[1:length(Sc)],
y = Sc,
pch = 16,
col = 4
)
In the general case of causal and invertible ARMA(p, q) models, maximum likelihood estimation
and conditional and unconditional least squares estimation (and Yule–Walker estimation in the
case of AR models) all lead to optimal estimators.

Going forward, we will denote the ARMA(p,q) coefficients as \(\beta\).

\[ \beta = \begin{pmatrix} \phi_1 \\ \vdots \\ \phi_p \\ \theta_1 \\ \vdots \\ \theta_q \end{pmatrix} \]

Property 3.10 Large Sample Distribution of the Estimators


Under appropriate conditions, for causal and invertible ARMA processes, the maximum
likelihood, the unconditional least squares, and the conditional least squares estimators, each
initialized by the method of moments estimator, all provide optimal estimators of \(\sigma_w^2\)
and \(\beta\), in the sense that \(\widehat{\sigma}_w^2\) is consistent, and the asymptotic distribution of
\(\widehat{\beta}\) is the best asymptotic normal distribution. In particular, as \(n \rightarrow \infty\),

\[ \sqrt{n}(\widehat{\beta}-\beta) \xrightarrow{d} N\left(0,\sigma_w^2 \Gamma^{-1}_{p,q}\right) \]

The asymptotic variance–covariance matrix of the estimator \(\widehat{\beta}\) is the inverse of the information matrix.

\[ \Gamma_{p,q} = \begin{pmatrix} \Gamma_{\phi\phi} & \Gamma_{\phi \theta} \\ \Gamma_{\theta \phi} & \Gamma_{\theta \theta} \end{pmatrix} \]

• \(\Gamma_{\phi\phi} = \{\gamma_x(i-j)\}_{i,j = 1}^{p}\) is a \(p\times p\) matrix of autocovariances from an AR(p) process, \(\phi(B)x_t = w_t\).
• \(\Gamma_{\theta \theta} = \{\gamma_y(i-j)\}_{i,j = 1}^{q}\) is a \(q\times q\) matrix of autocovariances from an AR(q) process, \(\theta(B)y_t = w_t\). Notice that in the calculations, \(\theta(B)y_t = w_t\) means that we treat the moving average coefficients as autoregressive coefficients but with sign changes.
• \(\Gamma_{\phi \theta} = \{\gamma_{xy}(i-j)\}_{i=1:p,j = 1:q}\) is a \(p\times q\) matrix of cross-covariances for the two processes \(\phi(B)x_t = w_t\) and \(\theta(B)y_t = w_t\).
• \(\Gamma_{\theta \phi} = \Gamma_{\phi \theta}'\)

Example 3.34 Some Specific Asymptotic Distributions


AR(1):

\[ \gamma_x(0) = \dfrac{\sigma_w^2}{1 - \phi^2} \rightarrow \sigma_w^2 \Gamma_{1,0}^{-1} = 1 - \phi^2 \\ \widehat{\phi} \sim AN\left(\phi,\dfrac{1-\phi^2}{n}\right) \]
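A small simulation (an illustration, not from the text; the values of \(\phi\), \(n\), and the number of replicates are arbitrary) can check this result: the sample variance of \(\widehat{\phi}\) should be close to \((1-\phi^2)/n\).

set.seed(1)
phi <- 0.6
n <- 500
phi_hat <- replicate(2000, {
  x <- arima.sim(n = n, model = list(ar = phi))
  ar.yw(x = x, order.max = 1, aic = FALSE)$ar
})
var(phi_hat)        # empirical variance of phi hat
(1 - phi^2)/n       # asymptotic variance from the AR(1) case above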

AR(2):

Using the autocovariance equations we get

\[ \gamma_x(2) = \phi_1\gamma_x(1) + \phi_2\gamma_x(0) \\ \gamma_x(1) = \phi_1\gamma_x(0) + \phi_2\gamma_x(1) \\ \gamma_x(0) = \phi_1\gamma_x(1) + \phi_2\gamma_x(2) + \sigma_w^2 \]

Solving the system of equations we get

\[ \gamma_x(1) = \dfrac{\phi_1}{1-\phi_2}\gamma_x(0) \\ \gamma_x(2) = \left(\dfrac{\phi_1^2}{1-\phi_2} + \phi_2\right)\gamma_x(0) \\ \gamma_x(0) = \dfrac{1 - \phi_2}{1 + \phi_2}\dfrac{\sigma_w^2}{(1 - \phi_2)^2 - \phi_1^2} \\ \Gamma_{2,0} = \begin{pmatrix} \gamma_x(0) & \dfrac{\phi_1}{1-\phi_2}\gamma_x(0) \\ \dfrac{\phi_1}{1-\phi_2}\gamma_x(0) & \gamma_x(0) \end{pmatrix} \]

Plugging the values into the theorem gives us.

\[ \begin{pmatrix}\widehat{\phi}_1 \\ \widehat{\phi}_2 \\ \end{pmatrix} \sim AN\left[ \begin{pmatrix}\phi_1 \\ \phi_2 \\ \end{pmatrix}, \dfrac{1}{n}\begin{pmatrix} (1-\phi_2)(1+\phi_2) & -\phi_1(1+\phi_2) \\ -\phi_1(1+\phi_2) & (1-\phi_2)(1+\phi_2) \end{pmatrix} \right] \]

MA(1):

We handle this the same way as the AR(1) case. \(\theta(B)y_t = w_t\) or \(y_t + \theta y_{t-1} =
w_t\).
\[ \gamma_y(0) = \dfrac{\sigma_w^2}{1-\theta^2} \rightarrow \sigma_w^2 \Gamma_{0,1}^{-1} =
1-\theta^2 \]

Plugging into the theorem we get

\[ \widehat{\theta} \xrightarrow{d} AN\left[\theta,\dfrac{1-\theta^2}{n}\right] \]

MA(2):

Again we treat the moving average coefficients like autoregressive coefficients. Notice the sign changes compared to the AR(2) case.

\[ \begin{pmatrix}\widehat{\theta}_1 \\ \widehat{\theta}_2 \\ \end{pmatrix} \sim AN\left[ \begin{pmatrix}\theta_1 \\ \theta_2 \\ \end{pmatrix}, \dfrac{1}{n}\begin{pmatrix} (1-\theta_2)(1+\theta_2) & \theta_1(1+\theta_2) \\ \theta_1(1+\theta_2) & (1-\theta_2)(1+\theta_2) \end{pmatrix} \right] \]

ARMA(1,1):

To get the covariance matrix, \(\dfrac{\sigma_w^2}{n}\Gamma_{1,1}^{-1}\), we find \(\Gamma_{1,1}\) by entering the AR(1) and MA(1) variances on the diagonal, then putting the cross-covariance term in the off-diagonal.

\[ \Gamma_{1,1} = \begin{pmatrix} \dfrac{\sigma_w^2}{1 - \phi^2} & \gamma_{xy}(0) \\ \gamma_{xy}(0) & \dfrac{\sigma_w^2}{1 - \theta^2} \end{pmatrix} \]

To find \(\gamma_{xy}(0)\), we use the backshift formulas in the theorem.

\[ \gamma_{xy}(0) = cov(x_t,y_t) = cov(\phi x_{t-1} + w_t,-\theta y_{t-1} + w_t) = -\phi \theta cov(x_{t-1},y_{t-1}) + \sigma_w^2 = -\phi \theta \gamma_{xy}(0) + \sigma_w^2 \\ \gamma_{xy}(0) = \dfrac{\sigma_w^2}{1 + \phi\theta} \\ \Gamma_{1,1} = \begin{pmatrix} \dfrac{\sigma_w^2}{1 - \phi^2} & \dfrac{\sigma_w^2}{1 + \phi\theta} \\ \dfrac{\sigma_w^2}{1 + \phi\theta} & \dfrac{\sigma_w^2}{1 - \theta^2} \end{pmatrix} \\ \begin{pmatrix} \widehat{\phi} \\ \widehat{\theta} \end{pmatrix} \sim AN\left[\begin{pmatrix} \phi \\ \theta \end{pmatrix},\dfrac{\phi\theta+1}{n(\phi+\theta)^2}\begin{pmatrix}-(\phi - 1) (\phi + 1) (\phi \theta + 1) & -(\phi - 1)(\phi + 1)(\theta - 1)(\theta + 1) \\ -(\phi - 1)(\phi + 1)(\theta - 1)(\theta + 1) & -(\theta - 1)(\theta + 1)(\phi \theta + 1) \end{pmatrix}\right] \]
3 ARIMA Models - 3.5.3 Estimation
- Overfitting and Bootstrap
Aaron Smith
2023-01-27

This code is modified from Time Series Analysis and Its Applications, by Robert H. Shumway,
David S. Stoffer https://github.com/nickpoison/tsa4

The most recent version of the package can be found at https://github.com/nickpoison/astsa/

You can find demonstrations of astsa capabilities at


https://github.com/nickpoison/astsa/blob/master/fun_with_astsa/fun_with_astsa.md

In addition, the News and ChangeLog files are at


https://github.com/nickpoison/astsa/blob/master/NEWS.md.

The webpages for the texts and some help on using R for time series analysis can be found at
https://nickpoison.github.io/.

UCF students can download the textbook for free through the library.

Example 3.35 Overfitting Caveat


If we overfit, we obtain less efficient, or less precise parameter estimates.

Suppose a time series follows an AR(1) process, but we decide to fit an AR(2) model.

The correct model would have this asymptotic distribution for the parameter.

\[ \widehat{\phi}_1 \sim AN\left(\phi_1,\dfrac{1-\phi_1^2}{n}\right) \]

But because of our model selection error, the parameters have this asymptotic distribution.
(Take the AR(2) distribution from the previous section and plug in zero for \(\phi_2\).)

\[ \begin{pmatrix}\widehat{\phi}_1 \\ \widehat{\phi}_2 \\ \end{pmatrix} \sim AN\left[ \begin{pmatrix}\phi_1 \\ 0 \\ \end{pmatrix}, \dfrac{1}{n}\begin{pmatrix} 1 & -\phi_1 \\ -\phi_1 & 1 \end{pmatrix} \right] \]

The variance of \(\widehat{\phi}_1\) increases, the covariance with \(\widehat{\phi}_2\) complicates inference, and \(\widehat{\phi}_2\) does not improve the model.

Typically, when we overfit we will see terms that are not statistically significant. We can use this to grow our model past what we believe is the correct model, then confirm that the hypothesized extra terms can be dropped.
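A minimal simulation sketch of the precision loss (illustration only; the values of \(\phi_1\), \(n\), and the number of replicates are arbitrary): fit both the correct AR(1) and an overfit AR(2) to the same simulated AR(1) series and compare the variance of \(\widehat{\phi}_1\).

set.seed(1)
n <- 200
phi1 <- 0.7
est <- t(replicate(2000, {
  x <- arima.sim(n = n, model = list(ar = phi1))
  c(ar1 = ar.yw(x = x, order.max = 1, aic = FALSE)$ar,
    ar2 = ar.yw(x = x, order.max = 2, aic = FALSE)$ar[1])
}))
apply(est, 2, var)            # variance of phi_1 hat under each fit
c((1 - phi1^2)/n, 1/n)        # asymptotic variances: correct AR(1) fit vs overfit AR(2)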

Why are the asymptotic distributions of AR(1) and MA(1) so similar?


Regression:

Consider a normal regression model with no intercept.

\[ x_t = \beta z_t + w_t \\ \sqrt{n}(\widehat{\beta} - \beta) \text{ is asymptotically normal with mean zero} \\ V\left(\sqrt{n}(\widehat{\beta} - \beta)\right) = n\sigma_w^2\left(\sum_{t = 1}^{n}z_t^2\right)^{-1} = \sigma_w^2\left(\dfrac{1}{n}\sum_{t = 1}^{n}z_t^2\right)^{-1} \]

AR(1):

For a causal AR(1) process and large sample size

\[ x_t = \phi x_{t-1} + w_t \\ \sqrt{n}\left(\widehat{\phi} - \phi\right) \sim AN\left(0,\sigma_w^2\left(\dfrac{1}{n}\sum_{t = 2}^{n}x_{t-1}^2\right)^{-1}\right) \]

When \(E(x_t) = 0\), \(\dfrac{1}{n}\sum_{t = 2}^{n}x_{t-1}^2\) is the sample variance using population mean instead of sample mean.

\[ \dfrac{1}{n}\sum_{t = 2}^{n}x_{t-1}^2 \xrightarrow{p} \dfrac{\sigma_w^2}{1 - \phi^2} \]

Thus the asymptotic variance of \(\sqrt{n}\left(\widehat{\phi} - \phi\right)\) is

\[ \sigma_w^2\left(\dfrac{1}{n}\sum_{t = 2}^{n}x_{t-1}^2\right)^{-1} \xrightarrow{p} \sigma_w^2\left(\dfrac{\sigma_w^2}{1 - \phi^2}\right)^{-1} = 1 - \phi^2 \]

MA(1):

Let’s repeat this approach for an MA(1) process. Take the Gauss-Newton setup from the
previous section.

\[ z_t(\widehat{\theta}) = -\theta z_{t-1}(\widehat{\theta}) + w_{t-1} \]

The independent variable is \(z_{t-1}(\widehat{\theta})\).

\[ \sqrt{n}\left(\widehat{\theta} - \theta\right) \sim AN\left(0,\sigma_w^2\left(\dfrac{1}{n}\sum_{t=2}^{n}z_{t-1}^2(\widehat{\theta})\right)^{-1}\right) \]

Once again when the expected value of our time series is zero, \(\dfrac{1}{n}\sum_{t=2}^{n}z_{t-
1}^2(\widehat{\theta})\) is the sample variance using the population mean instead of the sample
mean. Thus it will converge to the population variance.

With this setup, \(z_t(\theta)\) is an AR(1) process.

\[ \sigma_w^2\left(\dfrac{1}{n}\sum_{t = 2}^{n}z_{t-1}^2(\widehat{\theta})\right)^{-1}
\xrightarrow{p} \sigma_w^2\left(\dfrac{\sigma_w^2}{1 - (-\theta)^2}\right)^{-1} = 1 - \theta^2 \]
If \(n\) is small, or if the parameters are close to the causal boundaries, the asymptotic
approximations can be quite poor. The bootstrap can be helpful in this case.

Once again, we will use the AR(1) case to provide insight into other time series models.

We consider an AR(1) model with a regression coefficient near the boundary of causality and
an error process that is symmetric but not normal.

\[ x_t - \mu = \phi (x_{t-1} - \mu) + w_t \\ \mu = 50 \\ \phi = 0.95 \\ f(w) = \dfrac{1}{2\beta} e^{-
|w|/\beta}, \ w \in \mathbb{R} \\ \beta = 2 \\ E(w_t) = 0 \\ V(w_t) = 2\beta^2 = 8 \]

generate data
The time series plot looks non-stationary in the mean, but we know that it is stationary.

set.seed(

seed = 20230115
)
# VGAM::rlaplace would have been better
rexp_0.5 = rexp(
n = 150,
rate = 0.5
)
runif_sign = runif( # sample with -1,1 would have been better
n = 150,
min = -1,
max = 1
)
rlaplace_0.5 = rexp_0.5*sign(
x = runif_sign
)
sarima.sim_laplace = 50 + astsa::sarima.sim(
n = 100,
ar = 0.95,
innov = rlaplace_0.5,
burnin = 50
)
astsa::tsplot(
x = sarima.sim_laplace,
ylab = expression(X[~t])
)
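A minimal sketch of the alternative mentioned in the comments above: drawing the sign with sample() instead of runif() gives the same Laplace innovations (\(\beta = 2\)); the seed and sample size mirror the code above.

set.seed(20230115)
mag <- rexp(n = 150, rate = 0.5)                        # exponential magnitudes, rate = 1/beta
sgn <- sample(x = c(-1, 1), size = 150, replace = TRUE) # random signs
w_laplace <- mag*sgn
c(mean = mean(w_laplace), var = var(w_laplace))         # should be near 0 and 2*beta^2 = 8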
Use Yule-Walker estimation on the data to estimate the parameters

set.seed(

seed = 20230115
) # not that 666
ar.yw_laplace <- ar.yw(
x = sarima.sim_laplace,
order = 1
) # assumes the data were retained
mean_laplace <- ar.yw_laplace$x.mean # estimate of mean
phi <- ar.yw_laplace$ar # estimate of phi
ar.yw_laplace[c("x.mean","ar","var.pred")]

## $x.mean

## [1] 41.7783
##
## $ar
## [1] 0.8555428
##
## $var.pred
## [1] 9.996455

Let’s run simulations with the known parameters to assess the distribution. This will give us optimistic results compared to what we will get when we have to rely on estimated parameter values.

set.seed(

seed = 20230209
)
phi.yw <- rep(NA,1000)
for(i in 1:1000){
e <- rexp(
n = 150,
rate = 0.5
)
u <- runif(
n = 150,
min = -1,
max = 1
)
de <- e*sign(u)
x <- 50 + arima.sim(
n = 100,
model = list(
ar = 0.95
),
innov = de,
n.start = 50
)
phi.yw[i] <- ar.yw(
x = x,
order = 1
)$ar
}
mean(
x = phi.yw
)

## [1] 0.8903678

hist(

x = phi.yw
)
phi.yw_0 <- phi.yw

Our one-step ahead predictions are

\[ x_t^{t-1} = \mu + \phi (x_{t-1} - \mu), \ t > 1 \]

Our prediction errors are

\[ \epsilon_t = x_t - x_t^{t-1} = (x_t - \mu) - \phi (x_{t-1} - \mu) \]

The mean squared prediction error is

\[ E(\epsilon_t^2) = E\left(\left[(x_t - \mu) - \phi (x_{t-1} - \mu)\right]^2\right) = E(w_t^2) = \sigma_w^2 \]

\[ x_t = x_t^{t-1} + \epsilon_t = \mu + \phi (x_{t-1} - \mu) + \epsilon_t \]

Bootstrap
Notice that the previous calculations and equations used the true values of the parameters, not
the estimated values of the parameters.

Now let’s proceed as if we did not know the parameters of the model. This will simulate using
bootstrap on real data.
nboot = 250 # number of bootstrap replicates

resid_laplace = ar.yw_laplace$resid[-1] # the 99 innovations


x.star = sarima.sim_laplace # initialize x*
phi.star.yw = rep(NA,nboot) # initialize phi*
for (i in 1:nboot) {
resid.star <- sample(
x = resid_laplace,
replace = TRUE
)
x.star <- astsa::sarima.sim(
n = 99,
ar = phi,
innov = resid.star,
burnin = 0
) + mean_laplace
phi.star.yw[i] <- ar.yw(
x = x.star,
order = 1
)$ar
}

small sample distn


set.seed(

seed = 20230115
)
phi.yw <- rep(
x = NA,
times = 1000
)
for (i in 1:1000){
rexp_0.5 <- rexp(
n = 150,
rate = 0.5
);
runif_sign <- runif(
n = 150,
min = -1,
max = 1
);
rlaplace_0.5 <- rexp_0.5*sign(
x = runif_sign
)
arima.sim_laplace <- 50 + arima.sim(
n = 100,
list(
ar = 0.95
),
innov = rlaplace_0.5,
n.start = 50
)
phi.yw[i] <- ar.yw(
x = arima.sim_laplace,
order = 1
)$ar
}

Picture
hist(

x = phi.star.yw,
breaks = 15,
main = "",
prob = TRUE,
xlim = c(
0.65,1.05
),
ylim = c(
0,14
),
col = astsa::astsa.col(
col = 4,
alpha = 0.3
),
xlab = expression(hat(phi))
)
lines(
x = density(
x = phi.yw,
bw = 0.02
),
lwd = 2
)
curve(
expr = dnorm(
x = x,
mean = 0.95,
sd = 0.03
),
from = 0.75,
to = 1.1,
lty = 2,
lwd = 2,
add = TRUE
)
legend(
x = 0.65,
y = 14,
bty = 'n',
lty = c(
1,0,2
),
lwd = c(
2,0,2
),
col = 1,
pch = c(
NA,22,NA
),
pt.bg = c(
NA,astsa::astsa.col(
col = 4,
alpha = 0.3
),NA
),
pt.cex = 2.5,
legend = c(
'true distribution', 'bootstrap distribution', 'normal approximation'
)
)
3 ARIMA Models - 3.5.3 Estimation
- Bootstrapping Autoregressive
Coefficients
Aaron Smith
2023-02-10

Load our data


options(

digits = 2,
scipen = 999
)
data(
list = "rec",
package = "astsa"
)

Fit an AR(2) model


ar.yw_rec <- ar.yw(

x = rec,
aic = TRUE,
order.max = 2
)

Bootstrap the fitted AR(2) model


Sample with replacement the residuals between the model and the observations.

Then use the resampled residuals to generate a simulated AR(2) process with the fitted coefficients as the coefficients.

Then fit an AR(2) model to the simulated time series.

ar.yw_rec_resid <- ar.yw_rec$resid

n_boot <- 1e5


list_ar.yw <- list()
for(j in 1:n_boot) list_ar.yw[[j]] <- ar.yw(
x = ar.yw_rec$x.mean + astsa::sarima.sim(
n = length(
x = rec
) - 2,
ar = ar.yw_rec$ar,
innov = sample(
x = ar.yw_rec_resid[-c(1:2)],
size = length(
x = ar.yw_rec_resid[-c(1:2)]
),
replace = TRUE
),
burnin = 0
),
aic = FALSE,
order.max = 2
)

Extract and organize the bootstrap coefficients
M_bootstrap <- t(sapply(

X = list_ar.yw,
FUN = function(x) x$ar
))
M_bootstrap <- as.data.frame(
x = M_bootstrap
)
colnames(M_bootstrap) <- c("phi_1","phi_2")
summary(
object = M_bootstrap
)

## phi_1 phi_2

## Min. :1.09 Min. :-0.61
## 1st Qu.:1.29 1st Qu.:-0.47
## Median :1.32 Median :-0.44
## Mean :1.32 Mean :-0.44
## 3rd Qu.:1.35 3rd Qu.:-0.41
## Max. :1.50 Max. :-0.22
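The bootstrap sample can also be summarized with percentile intervals; a minimal sketch using the M_bootstrap data frame created above:

# 95% percentile bootstrap intervals for phi_1 and phi_2
apply(X = M_bootstrap, MARGIN = 2, FUN = quantile, probs = c(0.025, 0.975))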
Visualize the empirical bootstrap
density
library(ggplot2)

ggplot(M_bootstrap) +
aes(x = phi_1,y = phi_2) +
geom_point(alpha = 0.1) +
geom_density_2d(linewidth = 1) +
geom_point(data = data.frame(phi_1 = mean(M_bootstrap$phi_1),phi_2 =
mean(M_bootstrap$phi_2)),size = 3,color = "red") +
theme_bw() +
labs(
title = "Bootstrap coefficients of the AR(2) model\nwith bootstrap estimated density"
)
Use the bootstrap observations to fit a
bivariate distribution
M_grid <- expand.grid(

phi_1 = seq(
from = min(M_bootstrap$phi_1),
to = max(M_bootstrap$phi_1),
length = 200
),
phi_2 = seq(
from = min(M_bootstrap$phi_2),
to = max(M_bootstrap$phi_2),
length = 200
)
)
M_bootstrapnormal <- cbind(
M_grid,
density = mvtnorm::dmvnorm(
x = M_grid,
mean = colMeans(M_bootstrap),
sigma = cov(M_bootstrap)
)
)
ggplot(M_bootstrapnormal) +
aes(x = phi_1,y = phi_2) +
geom_point(data = M_bootstrap,alpha = 0.1) +
geom_contour(mapping = aes(z = density),lwd = 1) +
geom_point(data = data.frame(phi_1 = mean(M_bootstrap$phi_1),phi_2 =
mean(M_bootstrap$phi_2)),size = 3,color = "red") +
theme_bw() +
labs(
title = "Bootstrap coefficients of the AR(2) model\nwith bootstrap estimated parameters of
bivariate normal"
)

## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.

## Please use `linewidth` instead.


Visualize the asymptotic distribution of
the initial model
M_bivariatenormal <- cbind(

M_grid,
density = mvtnorm::dmvnorm(
x = M_grid,
mean = ar.yw_rec$ar,
sigma = ar.yw_rec$asy.var.coef
)
)
ggplot(M_bivariatenormal) +
aes(x = phi_1,y = phi_2) +
geom_point(data = M_bootstrap,alpha = 0.1) +
geom_contour(mapping = aes(z = density),lwd = 1) +
geom_point(data = data.frame(phi_1 = ar.yw_rec$ar[1],phi_2 = ar.yw_rec$ar[2]),size =
3,color = "red") +
theme_bw() +
labs(
title = "Bootstrap coefficients of the AR(2) model\nwith asymptotic normal distribution"
)
