Chapter 11 Part 1 - ARIMA (Box-Jenkins) - 2023

Department of Industrial Systems Engineering - HCMUT
Chapter 11. Introduction to ARIMA Models
Nguyen VP Nguyen, Ph.D.

Department of Industrial & Systems Engineering, HCMUT
Email: nguyennvp@hcmut.edu.vn
General Overview
• An ARIMA model is a mathematical model for time series data that is fitted with an iterative approach.
• George Box and Gwilym Jenkins developed a systematic approach for fitting these models to data, so they are often called Box-Jenkins models.
• Iterative approach:
identify model → estimate parameters → check model → …
✓ We always use statistical or forecasting programs to fit these models.
✓ The programs fit the models and produce forecasts for us.
ARIMA Stands for AutoRegressive Integrated Moving Average
ARIMA Models
• ARIMA
▪ Special cases: AR models, MA models, ARMA models, IMA models.
▪ These models generalize regression, but the “independent” variables are past values of the series itself and unobservable random disturbances.
• Estimation is typically based on maximum likelihood rather than ordinary least squares.
• We distinguish between seasonal and non-seasonal models.
MLE and LSE

• Maximum Likelihood Estimation (MLE):
▪ MLE finds the parameter values that maximize the likelihood function, which measures how well the model with those parameters explains the observed data.
▪ It is widely used in logistic regression, generalized linear models, and machine learning algorithms.
▪ It requires an assumption about the distribution of the data (e.g., a normal distribution). The accuracy of MLE depends on how well this distribution fits the actual data.
• Least Squares Estimation (LSE):
▪ LSE finds the parameters that minimize the sum of the squared differences (residuals) between the observed values and the values predicted by the model.
▪ It is commonly used in linear regression models, where the aim is to quantify the relationship between variables and make predictions.
▪ It primarily assumes a linear relationship between variables and homoscedasticity (constant variance of errors).
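The two estimation principles coincide in one important special case. As a short derivation, assuming independent normal errors with constant variance $\sigma^2$ and writing $\hat{Y}_t(\theta)$ for the model's fitted value under parameters $\theta$ (notation introduced here only for illustration):

```latex
\begin{aligned}
\ln L(\theta, \sigma^2)
  &= -\frac{n}{2}\ln(2\pi\sigma^2)
     - \frac{1}{2\sigma^2}\sum_{t=1}^{n}\bigl(Y_t - \hat{Y}_t(\theta)\bigr)^2 \\
\hat{\theta}_{\mathrm{MLE}}
  &= \arg\max_{\theta}\,\ln L(\theta, \sigma^2)
   = \arg\min_{\theta}\sum_{t=1}^{n}\bigl(Y_t - \hat{Y}_t(\theta)\bigr)^2
   = \hat{\theta}_{\mathrm{LSE}}
\end{aligned}
```

For any fixed $\sigma^2$ the first term does not depend on $\theta$, so maximizing the likelihood is the same as minimizing the sum of squared residuals. With non-normal errors the two estimates generally differ.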
When To Use ARIMA forecasting

• Benchmark for other forecasting models
• Presence of large residual errors in regression
• Limited explanatory variables: no need for external variables or complex relationships
▪ ARIMA models are univariate and do not incorporate external variables or complex relationships between different variables.
• Operational forecasting
▪ ARIMA is used in operations and production planning where demand or supply data fluctuate over time but do not have strong seasonal patterns.
• Short-term forecasting
▪ ARIMA is typically more accurate for short-term than for long-term forecasting because it assumes that the patterns in the data will not change dramatically in the future.
• Data containing autocorrelations: ARIMA is suitable for data in which autocorrelations are present.
• Economic and financial time series
▪ ARIMA is commonly used in economics and finance for forecasting stock prices, GDP, demand, sales, and other metrics that fluctuate over time.
HOW to use ARIMA
Consider a time series screened by a "black box":
Input → Black box → Output
(observed time series → fitted model → white noise, i.e. residuals)
Objective: find the black box (model) that most closely fits the data.
What are the inputs to the black box? In ARIMA analysis the input is the observed series and the output is "white noise".
Notation
• $Y_1, Y_2, \dots, Y_t$ denotes a series of values for a time series.
▪ These are observable.
• $\varepsilon_1, \varepsilon_2, \dots, \varepsilon_t$ denotes a series of random disturbances, or error terms, at times $1, 2, \dots, t$.
▪ These are not observable.
▪ Usually they are assumed to be generated from a Normal distribution with mean 0 and standard deviation $\sigma$, and to be uncorrelated with each other.
▪ They are often called “white noise”.
• For the simple model $Y_t = \mu + \varepsilon_t = c + \varepsilon_t$: $E[Y_t] = \mu$, $\mathrm{var}(Y_t) = \sigma^2$ and $\mathrm{cov}(Y_i, Y_j) = 0$ for $i \neq j$.
• Since these values are constants, this type of time series is stationary.
White Noise
White noise (residuals): purely random data with no relation between consecutively observed values, i.e. zero "serial correlation". Previous values do not help in predicting future values. Example: the toss of a fair coin.
Characteristics:
• The pattern through time is completely random, with a mean of zero.
• There is no correlation between its values at different times, i.e. the autocorrelation is zero at all nonzero lags.
A simple random model, often called a white noise model: the observation $Y_t$ is composed of two parts: $c$, the overall level, and $\varepsilon_t$, the random error component, which is assumed to be uncorrelated from period to period.
$Y_t = c + \varepsilon_t$
Example with white noise

    from random import gauss, seed
    from pandas import Series
    from pandas.plotting import autocorrelation_plot
    from matplotlib import pyplot

    seed(1)  # fix the seed so the run is reproducible
    # white noise series: 1000 independent draws from N(0, 1)
    series = Series([gauss(0.0, 1.0) for _ in range(1000)])
    print(series.describe())  # mean should be near 0, std near 1
    autocorrelation_plot(series)  # correlations should be near 0 at all nonzero lags
    pyplot.show()

The Random Walk Model is a statistical model used to represent systems or processes that evolve over time in a seemingly random manner. It is particularly popular in financial market analysis for modeling stock prices.
A random walk: the current observation equals the value at the previous time plus a random, unpredictable change.
$Y_t = Y_{t-1} + \varepsilon_t$
where $\varepsilon_t$ is a discrete white noise series.
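The link between a random walk and white noise can be checked numerically. The following is a minimal sketch, assuming standard normal shocks and an arbitrary seed (both are illustrative choices): it builds a random walk as the running sum of white noise and confirms that differencing recovers the shocks.

```python
import numpy as np

rng = np.random.default_rng(seed=1)  # seed chosen arbitrarily for reproducibility

# White noise: independent N(0, 1) shocks (the distribution is an assumption).
eps = rng.normal(loc=0.0, scale=1.0, size=1000)

# Random walk: Y_t = Y_{t-1} + eps_t, starting from Y_1 = eps_1,
# which is the same as the cumulative sum of the shocks.
walk = np.cumsum(eps)

# First differences Y_t - Y_{t-1} give back the shocks eps_2, eps_3, ...
diffs = np.diff(walk)

print("std of walk  :", walk.std())
print("std of diffs :", diffs.std())
```

The spread of the walk is typically much larger than the spread of its differences, which is one informal sign that the walk itself is not stationary while its changes are.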
How To Determine The Black Box

There are three types of models considered in ARIMA analysis:
1. Autoregressive models
2. Moving average models
3. Autoregressive-moving average models
Autoregressive Model, AR(p)

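AR models of $Y_t$ are based on a linear combination of a finite number of the previous values of $Y_t$, rather than on the white noise series or residuals.
The pth-order autoregressive model, AR(p):
$Y_t = \phi_0 + \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + \dots + \phi_p Y_{t-p} + \varepsilon_t$
where:
$Y_t$ = the time series generated;
$\phi_0$ = constant term, related to the level of the series;
$\phi_1, \phi_2, \dots, \phi_p$ = coefficients;
$Y_{t-1}, Y_{t-2}, \dots, Y_{t-p}$ = lagged values of the time series;
$\varepsilon_t$ = white noise.
• AR(1), AR(2): the AR models of order 1 and order 2.
• The order of any AR(p) model is reflected in the behavior of its autocorrelation function (ACF) and partial autocorrelation function (PACF).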
Autoregressive Model, AR(p)
• Autoregressive models are appropriate for stationary time series, and the coefficient $\phi_0$ is related to the constant level of the series.
• If the data vary about zero or are expressed as deviations from the mean, $Y_t - \bar{Y}$, the coefficient $\phi_0$ is not required.
(Figure: an AR model of order 2.)
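To make the autoregressive idea concrete, here is a minimal sketch that simulates an AR(2) series and recovers its coefficients by regressing each value on its two predecessors. The coefficient values, the normal shocks, and the helper names `simulate_ar2` and `fit_ar_least_squares` are all illustrative assumptions, and plain least squares stands in for whatever estimator a forecasting package would actually use.

```python
import numpy as np

def simulate_ar2(phi0, phi1, phi2, n, rng, burn_in=200):
    """Simulate Y_t = phi0 + phi1*Y_{t-1} + phi2*Y_{t-2} + eps_t with N(0, 1) shocks."""
    y = np.zeros(n + burn_in)
    eps = rng.normal(size=n + burn_in)
    for t in range(2, n + burn_in):
        y[t] = phi0 + phi1 * y[t - 1] + phi2 * y[t - 2] + eps[t]
    return y[burn_in:]  # drop the start-up values

def fit_ar_least_squares(y, p):
    """Regress Y_t on a constant and its first p lags; returns [phi0, phi1, ..., phip]."""
    n = len(y)
    columns = [np.ones(n - p)] + [y[p - k : n - k] for k in range(1, p + 1)]
    X = np.column_stack(columns)
    coef, *_ = np.linalg.lstsq(X, y[p:], rcond=None)
    return coef

rng = np.random.default_rng(seed=2)
y = simulate_ar2(phi0=1.0, phi1=0.5, phi2=0.3, n=5000, rng=rng)
phi_hat = fit_ar_least_squares(y, p=2)
print("estimated [phi0, phi1, phi2]:", np.round(phi_hat, 3))
```

With a long simulated series the estimates should land close to the values used in the simulation; with the short series common in practice they will be noticeably noisier.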
ACF
• The autocorrelation function (ACF) measures the correlation between observations of a time series that are separated by k time units ($Y_t$ and $Y_{t-k}$).
▪ The autocorrelation (or serial correlation) is the correlation of the time series observations ($Y_t$) with observations of the same series at previous time steps ($Y_{t-k}$); the number of steps k is called the lag.
▪ In an ACF plot, confidence intervals are drawn as a cone. By default, this is set to a 95% confidence interval: correlation values outside of this cone are very likely real correlations and not a statistical fluke.
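The sample ACF is simple enough to compute by hand. The sketch below is one minimal way to do it, assuming the usual estimator (lagged cross-products of deviations divided by the total sum of squared deviations) and the flat $\pm 1.96/\sqrt{n}$ limits that apply to white noise; the function name `sample_acf` is illustrative.

```python
import numpy as np

def sample_acf(y, max_lag):
    """Sample autocorrelations r_0, r_1, ..., r_max_lag of a 1-D series."""
    y = np.asarray(y, dtype=float)
    dev = y - y.mean()
    denom = np.sum(dev ** 2)
    return np.array([np.sum(dev[k:] * dev[: len(y) - k]) / denom
                     for k in range(max_lag + 1)])

rng = np.random.default_rng(seed=3)
noise = rng.normal(size=400)          # white noise, for illustration
r = sample_acf(noise, max_lag=20)

# Approximate 95% limits for a white noise series: +/- 1.96 / sqrt(n).
# (Plotting tools may instead draw limits that widen with the lag.)
limit = 1.96 / np.sqrt(len(noise))
outside = np.sum(np.abs(r[1:]) > limit)
print("lags outside the 95% limits:", outside, "of 20")
```

For white noise, roughly one lag in twenty is expected to fall outside the limits purely by chance, so an isolated spike at a high lag is not by itself evidence of structure.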
PACF
• The partial autocorrelation function (PACF) measures the correlation between observations of a time series that are separated by k time units ($Y_t$ and $Y_{t-k}$), after adjusting for the presence of all the terms of shorter lag ($Y_{t-1}, Y_{t-2}, \dots, Y_{t-k+1}$).
▪ The partial autocorrelation at lag k is the correlation that results after removing the effect of any correlations due to the terms at shorter lags.
▪ For an AR model, the theoretical PACF “shuts off” past the order of the model.
➢ The phrase “shuts off” means that in theory the partial autocorrelations are equal to 0 beyond that point. In other words, the number of non-zero partial autocorrelations gives the order of the AR model.
➢ By the “order of the model” we mean the most extreme lag of the series that is used as a predictor.
The PACF gives the partial correlation of a stationary time series with its own lagged values, after regressing out the values of the time series at all shorter lags. It contrasts with the autocorrelation function (ACF), which does not control for other lags.
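One way to see the "shut off" behavior is to compute partial autocorrelations as regression coefficients. This is a minimal sketch, assuming the regression definition of the PACF (the lag-k value is the coefficient on $Y_{t-k}$ when $Y_t$ is regressed on its first k lags); the AR(2) coefficients and the name `sample_pacf` are illustrative, and packaged routines may use other estimators.

```python
import numpy as np

def sample_pacf(y, max_lag):
    """Partial autocorrelations at lags 1..max_lag via successive regressions.

    The lag-k value is the coefficient on Y_{t-k} when Y_t is regressed on a
    constant and Y_{t-1}, ..., Y_{t-k}.
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    pacf = []
    for k in range(1, max_lag + 1):
        X = np.column_stack([np.ones(n - k)] + [y[k - j : n - j] for j in range(1, k + 1)])
        coef, *_ = np.linalg.lstsq(X, y[k:], rcond=None)
        pacf.append(coef[-1])
    return np.array(pacf)

# Simulate an AR(2) series (coefficients chosen only for illustration).
rng = np.random.default_rng(seed=4)
n = 3000
y = np.zeros(n)
eps = rng.normal(size=n)
for t in range(2, n):
    y[t] = 0.5 * y[t - 1] + 0.3 * y[t - 2] + eps[t]

pacf = sample_pacf(y, max_lag=6)
print(np.round(pacf, 3))  # lags 1 and 2 should stand out; later lags should be near 0
```

Because the simulated series is AR(2), the partial autocorrelations at lags 3 and beyond are zero in theory and only differ from zero in the sample through sampling variation.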
For an AR(1) model, the autocorrelation coefficients trail off to zero gradually, while the partial autocorrelation coefficients drop to zero after the first time lag.
It must be remembered that the sample autocorrelation functions are going to differ from these theoretical functions because of sampling variation.
• "Cuts off after lag k" = becomes zero abruptly after lag k.
• "Tails off" = decays to zero asymptotically (normally exponentially).
For an AR(2) model:
• ACF "tails off" to zero
• PACF "cuts off" after the 2nd lag
Example
• Example: a time series of annual numbers of worldwide earthquakes. The sample PACF for this series is shown in the figure (not reproduced here).
• Question: which AR(p) model is appropriate for the time series of annual numbers of worldwide earthquakes?
Moving Average Models, MA(q)

Moving average models: the dependent variable $Y_t$ depends on previous errors rather than on previous values of the variable itself. MA(q) models are based on a linear combination of a finite number of past errors.
Moving Average Models, MA(q)

• Moving average models in ARIMA are different from the moving average procedure (Chapters 4 and 5).
▪ The weights on the past errors do not need to sum to 1 and may be negative or positive.
• The figures on the original slide show MA models of order 1, MA(1), and order 2, MA(2); for the latter, q = 2.
• The order q of an MA(q) model is the number of past error terms included in the model.
→ Example 9.1, page 386
For MA(1) and MA(2) models, the autocorrelation coefficients drop to zero after the 1st and 2nd time lag, respectively, while the partial autocorrelation coefficients trail off to zero gradually.
The sample autocorrelation functions are going to differ from these theoretical functions because of sampling variation.
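The cut-off in the ACF of a moving average model can be reproduced by simulation. The sketch below assumes a zero-mean MA(1) written with a minus sign, $Y_t = \varepsilon_t - \omega_1 \varepsilon_{t-1}$ (a notational assumption; some texts use a plus sign), with an illustrative weight of 0.6 and a hypothetical helper `lag_corr`. Under that convention the theoretical lag-1 autocorrelation is $-\omega_1 / (1 + \omega_1^2)$ and all higher lags are zero.

```python
import numpy as np

omega1 = 0.6                        # illustrative MA(1) weight
rng = np.random.default_rng(seed=5)
eps = rng.normal(size=5001)

# MA(1) with zero mean, minus-sign convention: Y_t = eps_t - omega1 * eps_{t-1}
y = eps[1:] - omega1 * eps[:-1]

def lag_corr(x, k):
    """Sample autocorrelation of x at lag k >= 1."""
    dev = x - x.mean()
    return np.sum(dev[k:] * dev[:-k]) / np.sum(dev ** 2)

rho1_theory = -omega1 / (1 + omega1 ** 2)   # theoretical lag-1 autocorrelation
print("lag 1:", round(lag_corr(y, 1), 3), "(theory", round(rho1_theory, 3), ")")
print("lag 2:", round(lag_corr(y, 2), 3), "(theory 0)")
print("lag 3:", round(lag_corr(y, 3), 3), "(theory 0)")
```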
Autoregressive Moving Average Models

AR(p), MA(q) and ARMA(p, q) models
• A model with autoregressive terms, AR(p), can be combined with a model having moving average terms, MA(q), creating a “mixed” autoregressive-moving average model.
• Forecasts generated by an ARMA(p, q) model will depend on current and past values of the response Y, as well as current and past values of the errors (residuals), the $\varepsilon$'s.
• The general form of an ARMA(p, q) model combines the AR(p) and MA(q) terms in a single equation.
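One common way to write the general ARMA(p, q) form, assuming moving average weights $\omega_j$ that enter with minus signs (a notational assumption; some texts use plus signs and other symbols), is:

```latex
Y_t = \phi_0 + \phi_1 Y_{t-1} + \dots + \phi_p Y_{t-p}
      + \varepsilon_t - \omega_1 \varepsilon_{t-1} - \dots - \omega_q \varepsilon_{t-q}
```

Setting q = 0 leaves a pure AR(p) model, and setting p = 0 leaves a pure MA(q) model whose level is given by $\phi_0$.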
For a mixed ARMA model, both the autocorrelations and the partial autocorrelations die out; neither cuts off.
Autoregressive Moving Average Models

AR(p), MA(q) and ARMA(p, q) models
• ARMA(p, q) models can describe a wide variety of behaviors for stationary time series.
• In practice, p and q are usually no larger than 2.
“Full” ARIMA Model
• The Box-Jenkins approach uses an iterative model-building strategy that consists of
▪ selecting an initial model (model identification),
▪ estimating the model coefficients (parameter estimation),
▪ analyzing the residuals (model checking).
▪ If necessary, the initial model is modified and the process is repeated until the residuals indicate that no further modification is necessary.
All data in ARIMA analysis are assumed to be "stationary" (after differencing, if needed).
ARIMA(p, d, q): AR ↔ p, I ↔ d, MA ↔ q
p is the number of AR terms,
d is the number of differences (amount of differencing), and
q is the number of MA terms.
Stationary Data
• A time series is stationary if:
▪ Its mean is the same at every time
▪ Its variance is the same at every time
▪ Its autocorrelations are the same at every time
• Examples:
▪ A series of outcomes from independent identical trials is stationary.
▪ A series with a trend is not stationary.
▪ A random walk is not stationary.
• If a time series is non-stationary, its ACF typically dies off slowly and the first partial autocorrelation is near 1.
▪ In such cases we can sometimes create a stationary series by differencing the original series. This is the source of the "I" in an ARIMA model.
▪ If $Y_t$ is a random walk, then its differences are white noise, which is stationary.
• A unit root test is a formal test for non-stationarity.
▪ One such test is the Dickey-Fuller test.
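The idea behind a unit root test can be sketched in a few lines. The following is a simplified illustration, assuming the basic (non-augmented) Dickey-Fuller regression with a constant, $\Delta Y_t = a + \gamma Y_{t-1} + e_t$, and the large-sample 5% critical value of about -2.86 for that case; the function name `dickey_fuller_stat` is hypothetical, and statistical packages implement more complete versions with lagged difference terms.

```python
import numpy as np

def dickey_fuller_stat(y):
    """t-statistic on gamma in the regression  dY_t = a + gamma * Y_{t-1} + e_t.

    This is the basic (non-augmented) Dickey-Fuller regression with a constant.
    """
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)
    X = np.column_stack([np.ones(len(dy)), y[:-1]])
    coef, *_ = np.linalg.lstsq(X, dy, rcond=None)
    resid = dy - X @ coef
    sigma2 = resid @ resid / (len(dy) - 2)
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return coef[1] / np.sqrt(cov[1, 1])

rng = np.random.default_rng(seed=6)
eps = rng.normal(size=500)
walk = np.cumsum(eps)        # random walk: non-stationary
noise = eps                  # white noise: stationary

# Large-sample 5% critical value for the constant-only case is about -2.86;
# a statistic below it rejects the unit root (non-stationarity) hypothesis.
for name, series in [("random walk", walk), ("white noise", noise)]:
    print(name, round(dickey_fuller_stat(series), 2))
```

The statistic does not follow the usual t distribution under the unit root hypothesis, which is why a special critical value is used instead of the familiar -1.96.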
Step 1: Model Identification

• First, determine whether the series is stationary.
▪ Look at a plot of the series and the sample autocorrelation function.
▪ A nonstationary time series: the series appears to grow or decline over time and the sample autocorrelations fail to die out rapidly.
• A nonstationary time series can often be converted to a stationary series by differencing.
▪ The original series is replaced by a series of differences: $\Delta Y_t = Y_t - Y_{t-1}$
• ARIMA(p, d, q):
▪ If d > 0 (nonstationary), for example d = 1:
➢ if $\Delta Y_t$ follows an AR(p) model, then $Y_t$ follows an ARIMA(p, 1, 0) model.
➢ if $\Delta Y_t$ follows an MA(q) model, then $Y_t$ follows an ARIMA(0, 1, q) model.
▪ If d = 0 (stationary): the ARIMA models reduce to ARMA models.
• In effect, when d > 0 the analyst is modeling changes rather than levels.
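Differencing is easy to try on small deterministic series. The sketch below uses two made-up trends (a straight line and a quadratic, both purely illustrative) to show why one difference is sometimes enough and sometimes is not.

```python
import numpy as np

t = np.arange(1, 11)

linear = 5.0 + 2.0 * t        # straight-line trend
quadratic = t ** 2.0          # curved trend

d1_linear = np.diff(linear)             # first differences: Y_t - Y_{t-1}
d1_quadratic = np.diff(quadratic)       # still trending
d2_quadratic = np.diff(quadratic, n=2)  # differences of the differences

print(d1_linear)      # constant 2.0: one difference removes a linear trend
print(d1_quadratic)   # 3, 5, 7, ...: one difference is not enough
print(d2_quadratic)   # constant 2.0: a second difference is needed
```

Each round of differencing shortens the series by one observation, so the number of differences d is kept as small as the data allow.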
Step 1: Model Identification

• In some cases, it may be necessary to difference the differences before stationary data are obtained.
• Differencing is done until a plot of the data indicates that the series varies about a fixed level and the sample autocorrelations die out fairly rapidly.
• If the sample autocorrelations die out exponentially to zero and the sample partial autocorrelations cut off, the model will require autoregressive terms.
• If the sample autocorrelations cut off and the sample partial autocorrelations die out, the model will require moving average terms.
• If both the sample autocorrelations and the sample partial autocorrelations die out, both the autoregressive and the moving average terms are indicated.
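The identification rules above can be written down as a small lookup. This is only a sketch of the rule of thumb, not an automatic model selector: the helper names `last_significant_lag` and `suggest_terms`, the sample values, and the error limit are all hypothetical, and deciding whether a pattern "cuts off" or "dies out" remains a judgment made from the plots.

```python
def last_significant_lag(values, limit):
    """Largest lag (counting from 1) whose coefficient exceeds the limit in size; 0 if none."""
    lags = [k for k, v in enumerate(values, start=1) if abs(v) > limit]
    return max(lags) if lags else 0

def suggest_terms(acf_cuts_off, pacf_cuts_off):
    """Map the ACF/PACF patterns to the kind of terms they indicate.

    Each argument is True if that function cuts off after a few lags and
    False if it dies out gradually.
    """
    if pacf_cuts_off and not acf_cuts_off:
        return "autoregressive terms"
    if acf_cuts_off and not pacf_cuts_off:
        return "moving average terms"
    if not acf_cuts_off and not pacf_cuts_off:
        return "both autoregressive and moving average terms"
    return "unclear: fit an AR and an MA candidate and compare"

# Hypothetical sample PACF values and error limit, for illustration only.
pacf_values = [0.71, 0.30, 0.02, -0.01, 0.03]
print(last_significant_lag(pacf_values, limit=0.10))   # 2
print(suggest_terms(acf_cuts_off=False, pacf_cuts_off=True))
```

In this made-up case the PACF cuts off after lag 2 while the ACF dies out, which points to an AR(2) model as the tentative choice.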
Step 2: Model Estimation
• Once a tentative model has been selected, the parameters for that model must be estimated.
• The parameters in ARIMA models are estimated by minimizing the sum of squares of the fitting errors.
• Generally, Minitab uses a nonlinear least squares procedure.
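Minimizing the sum of squared fitting errors can be illustrated on the simplest moving average model. The sketch below is not Minitab's procedure: it assumes a zero-mean MA(1) in the minus-sign convention, sets the unobserved start-up error to zero, and replaces an iterative nonlinear least squares routine with a plain grid search. The helper name `ma1_sum_of_squares` and the simulated data are illustrative.

```python
import numpy as np

def ma1_sum_of_squares(y, omega1):
    """Sum of squared fitting errors for Y_t = eps_t - omega1 * eps_{t-1}.

    Errors are rebuilt recursively, e_t = Y_t + omega1 * e_{t-1}, with the
    unobserved start-up error set to zero (a common simplification).
    """
    e_prev, sse = 0.0, 0.0
    for value in y:
        e_t = value + omega1 * e_prev
        sse += e_t ** 2
        e_prev = e_t
    return sse

# Simulated MA(1) data with a known weight, for illustration only.
rng = np.random.default_rng(seed=7)
eps = rng.normal(size=2001)
y = eps[1:] - 0.6 * eps[:-1]

# The sum of squares is nonlinear in omega1, so search a grid of candidates.
grid = np.arange(-0.95, 0.96, 0.01)
sse = np.array([ma1_sum_of_squares(y, w) for w in grid])
omega1_hat = grid[np.argmin(sse)]
print("estimated omega1:", round(float(omega1_hat), 2))
```

The errors depend on the weight through a recursion, which is why the problem cannot be solved with the closed-form formulas of ordinary linear regression.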
Step 2: Model Estimation
• Once the parameters in ARIMA models have been estimated, those judged significantly different from zero are retained in the fitted model.
• For example, suppose an ARIMA(1, 0, 1) model has been fit to a time series of 100 observations.
▪ If the test of the autoregressive coefficient gives a p-value of .14, the hypothesis that this coefficient is zero is not rejected, and this term could be deleted from the model.
▪ An ARIMA(0, 0, 1) model (that is, an MA(1) model) could then be fit to the data.
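The significance check for a single coefficient comes down to a ratio and a tail probability. This is a minimal sketch, assuming a normal approximation to the t distribution (reasonable for series of around 100 observations); the estimate and standard error are hypothetical numbers, chosen only so that the p-value lands near .14.

```python
import math

def two_sided_p_value(estimate, std_error):
    """Two-sided p-value for H0: coefficient = 0, using a normal approximation."""
    t_ratio = estimate / std_error
    # P(|Z| > |t|) for a standard normal Z
    return math.erfc(abs(t_ratio) / math.sqrt(2.0))

# Hypothetical estimate and standard error (not taken from any fitted model).
p_value = two_sided_p_value(estimate=0.25, std_error=0.17)
keep_term = p_value < 0.05
print(round(p_value, 3), "keep term" if keep_term else "term could be dropped")
```

After a term is dropped, the reduced model is refitted rather than reusing the old estimates, because the remaining coefficients generally change.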
Step 3: Model Checking

• The model must be checked for adequacy.
• Residual plots help: a histogram and a normal probability plot (to check for normality) and a time sequence plot (to check for outliers).
• The individual residual autocorrelations should be small and generally within $\pm 2/\sqrt{n}$ of zero.
▪ Significant residual autocorrelations at low lags or seasonal lags suggest that the model is inadequate and that a new or modified model should be selected.
• The residual autocorrelations as a group should be consistent with those produced by random errors.

• An overall check of model adequacy is provided by a chi-square ($\chi^2$) test based on the Ljung-Box Q (LBQ) statistic, which determines whether the residuals are random.
• A model is adequate if the residuals cannot be used to improve the forecasts; that is, the residuals should be random.
• The LBQ statistic tests whether all the autocorrelations up to and including a specific lag are equal to 0 (up to lags 12, 24, 36, ...).
▪ Minitab displays lags that are multiples of 12.
▪ The maximum number of lags
➢ is approximately $n/4$ for a series with fewer than 240 observations,
➢ is approximately $\sqrt{n} + 45$ for a series with more than 240 observations, where n is the number of observations.
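The Q statistic itself is a weighted sum of squared residual autocorrelations. The sketch below assumes the standard Ljung-Box form $Q = n(n+2)\sum_{k=1}^{m} r_k^2/(n-k)$ and compares it with the 5% chi-square critical value for 12 degrees of freedom (21.03). The helper names are illustrative, the two residual series are simulated, and the lag rule treats exactly 240 observations as the short-series case, which the rule as quoted leaves open.

```python
import numpy as np

def default_max_lag(n):
    """Lag rule quoted above: about n/4 up to 240 observations, else sqrt(n) + 45."""
    return int(n / 4) if n <= 240 else int(np.sqrt(n) + 45)

def ljung_box_q(residuals, m):
    """Ljung-Box statistic Q = n(n+2) * sum_{k=1..m} r_k^2 / (n - k)."""
    e = np.asarray(residuals, dtype=float)
    n = len(e)
    dev = e - e.mean()
    denom = np.sum(dev ** 2)
    q = 0.0
    for k in range(1, m + 1):
        r_k = np.sum(dev[k:] * dev[:-k]) / denom
        q += r_k ** 2 / (n - k)
    return n * (n + 2) * q

rng = np.random.default_rng(seed=8)
random_resid = rng.normal(size=500)        # what adequate residuals look like
leftover = np.zeros(500)                   # residuals with leftover AR(1) structure
for t in range(1, 500):
    leftover[t] = 0.8 * leftover[t - 1] + random_resid[t]

# 5% chi-square critical value with 12 degrees of freedom is 21.03.
# (Degrees of freedom are m minus the number of estimated ARMA parameters;
#  none are subtracted here because no model was fitted.)
for name, e in [("random", random_resid), ("autocorrelated", leftover)]:
    print(name, round(ljung_box_q(e, m=12), 1), "vs 21.03")
```

A Q value far above the critical value corresponds to a very small p-value, signalling that the residuals still contain structure the model has not captured.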
• If the p-value associated with the Q statistic is greater than the significance level, you can conclude that the residuals are independent and that the model meets the assumption.
• If the p-value associated with the Q statistic is small (say, p-value < .05), the model is considered inadequate.
▪ If the assumption is not met, the model may not fit the data, and you should use caution when you interpret the results.
Step 4: Forecasting with the Model
• Once an adequate model has been found, forecasts for one period or several periods into the future can be made.
• For a given confidence level, the longer the forecast lead time, the larger the prediction interval.
• Example: the Dow Jones Transportation Index
▪ The Cameron Consulting Corporation specializes in portfolio investment services.
▪ The company analyst developed more sophisticated techniques for forecasting Dow Jones averages, focusing on the Dow Jones Transportation Index.
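The widening of prediction intervals with lead time can be seen in the simplest case. The following sketch assumes an AR(1) model with known coefficients and normal errors, and it ignores the extra uncertainty from estimating the parameters. The function name `ar1_forecasts` and all numerical values are hypothetical and unrelated to the Dow Jones example.

```python
import numpy as np

def ar1_forecasts(last_value, phi0, phi1, sigma, horizon, z=1.96):
    """Point forecasts and approximate 95% limits for an AR(1) model.

    Forecast recursion: f_h = phi0 + phi1 * f_{h-1}, starting from the last observation.
    Forecast error variance at lead h: sigma^2 * (1 + phi1^2 + ... + phi1^(2(h-1))).
    Parameter estimation error is ignored.
    """
    rows = []
    forecast, variance = last_value, 0.0
    for h in range(1, horizon + 1):
        forecast = phi0 + phi1 * forecast
        variance += sigma ** 2 * phi1 ** (2 * (h - 1))
        half_width = z * np.sqrt(variance)
        rows.append((h, forecast, forecast - half_width, forecast + half_width))
    return rows

# Hypothetical fitted values, chosen only to show the pattern.
rows = ar1_forecasts(last_value=12.0, phi0=2.0, phi1=0.8, sigma=1.0, horizon=5)
for h, f, lo, hi in rows:
    print(f"h={h}: forecast={f:.2f}, interval=({lo:.2f}, {hi:.2f})")
```

For a stationary AR(1) model the interval width grows with the lead time but levels off, whereas for a differenced model (d > 0) it keeps growing without bound.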
Example
The plot shows an upward trend in the series.
The first several autocorrelations were persistently large and trailed off to zero rather slowly.
→ The time series was nonstationary.
Example
The data were differenced to see whether the trend could be eliminated and a stationary series created.
Comparing the autocorrelations of the differenced series with their error limits, the only significant autocorrelation was at lag 1.
Similarly, only the lag 1 partial autocorrelation was significant.
Neither pattern appears to die out in a declining manner at low lags, so both ARIMA(1, 1, 0) and ARIMA(0, 1, 1) models were fitted to the Transportation Index.
Example
• A constant term was included in each model to allow for the fact that the series of differences appears to vary about a level greater than zero.
