
DATA ANALYTICS WITH R

Data analysis with R: TIME SERIES

CONTENT
1. Introduction to time series
2. Some models of time series analysis
3. Time series analysis with R

1. Time Series
Time series: a data table that must contain a timestamp column, together with
one or more columns whose values change along that timestamp.
E.g.:
⁃ Data about customers' purchasing demand by year, quarter, month, ...
⁃ Data on stock prices, trading volume by year, month, quarter, week, day, ...
⁃ ….

Applications of time series analysis?
Components
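
▷ k-th-order autoregressive model
$Y_t = \alpha + \beta_1 Y_{t-1} + \dots + \beta_k Y_{t-k} + \varepsilon_t$
E.g. (k = 1): $Y_t = \alpha + \beta_1 Y_{t-1} + \varepsilon_t$

▷ Finite distributed lag models
$Y_t = \alpha + \delta_0 X_t + \delta_1 X_{t-1} + \dots + \delta_k X_{t-k} + \varepsilon_t$
E.g. (k = 1): $Y_t = \alpha + \delta_0 X_t + \delta_1 X_{t-1} + \varepsilon_t$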
Stationarity
A series $Y_t$ is stationary when:
$E(Y_t) = \mu$
$\mathrm{Var}(Y_t) = \sigma^2$
$\mathrm{Cov}(Y_t, Y_{t-p}) = \gamma_p$, which does not depend on $t$

Why is it necessary to check for stationarity in time series analysis?
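
A rough way to see what the stationarity conditions mean in practice is to compare the mean and variance of the first and second half of a series. The sketch below is only an informal check, not a statistical test; the two series, the seed and the helper name `half_stats` are assumptions made for illustration.

```python
import random
import statistics

def half_stats(series):
    """Return (mean, variance) of the first half and of the second half."""
    mid = len(series) // 2
    first, second = series[:mid], series[mid:]
    return (
        (statistics.fmean(first), statistics.pvariance(first)),
        (statistics.fmean(second), statistics.pvariance(second)),
    )

random.seed(0)  # arbitrary seed, only for reproducibility
noise = [random.gauss(0, 1) for _ in range(200)]        # constant mean and variance
trending = [0.5 * t + e for t, e in enumerate(noise)]   # mean grows with t

for name, series in [("noise", noise), ("trending", trending)]:
    (m1, v1), (m2, v2) = half_stats(series)
    print(f"{name}: mean {m1:.2f} -> {m2:.2f}, variance {v1:.2f} -> {v2:.2f}")
```

For the trending series the two half-means differ strongly, which violates the condition that the mean is the same constant for every t.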
Correlation
1st order:

2nd order:

1st&2nd order:
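White noise
A series $\varepsilon_t$ is white noise when:
$E(\varepsilon_t) = 0$
$\mathrm{Var}(\varepsilon_t) = \sigma^2$
$\mathrm{Cov}(\varepsilon_t, \varepsilon_{t-p}) = 0$ for $p \neq 0$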

Similar to white noise +

Why is it important to identify “white noise” in time series data analysis?
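
One informal way to check whether a series (for example, model residuals) behaves like white noise is to see whether its mean is close to zero and its sample autocorrelations stay inside the approximate 95% band ±1.96/√n. The sketch below assumes simulated Gaussian noise; the seed, the sample size and the helper `autocorr` are illustrative choices.

```python
import math
import random

def autocorr(series, lag):
    """Sample autocorrelation of a series at the given lag (lag >= 1)."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((y - mean) ** 2 for y in series)
    num = sum((series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n))
    return num / denom

random.seed(1)  # arbitrary seed, only for reproducibility
eps = [random.gauss(0, 1) for _ in range(500)]

band = 1.96 / math.sqrt(len(eps))   # approximate 95% band for white noise
lags = range(1, 21)
inside = sum(abs(autocorr(eps, k)) <= band for k in lags)

print(f"mean = {sum(eps) / len(eps):.3f}")
print(f"{inside} of {len(lags)} autocorrelations lie inside +/-{band:.3f}")
```

If many autocorrelations fall outside the band, the series still contains structure that a model could exploit.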

2. Some models of time series analysis


AR ARIMA SARIMA

AR (Auto-Regression): models the relationship between the data and its own
lags → expressed through the parameter 'p' in the ARIMA model

I (Integrated): uses differencing to remove the trend from the data →
expressed through the parameter 'd' in the ARIMA model

MA (Moving Average): models the relationship between the data and the errors
of previous lags → expressed through the parameter 'q' in the ARIMA model

→ ARIMA(p, d, q)


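▷ AR(p) – AutoRegression
AR(1): $Y_t = \alpha + \beta_1 Y_{t-1} + \varepsilon_t$
AR(2): $Y_t = \alpha + \beta_1 Y_{t-1} + \beta_2 Y_{t-2} + \varepsilon_t$
E.g.: $Exp_{Aug} = \alpha + \beta_1 Exp_{Jul} + \beta_2 Exp_{Jun} + Error$

Find p?
PACF – Partial Auto Correlation Function
Finding ‘p’ via PACF plot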
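
To make the AR equations concrete, the following minimal sketch estimates α and β1 of an AR(1) model by ordinary least squares, regressing each value on the previous one. It is an illustration only: the function name `fit_ar1` and the noise-free example series are assumptions, and in practice a library routine would normally do the fitting.

```python
def fit_ar1(series):
    """Estimate alpha and beta1 in Y_t = alpha + beta1 * Y_{t-1} + error by least squares."""
    x = series[:-1]   # Y_{t-1}
    y = series[1:]    # Y_t
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    beta1 = sxy / sxx
    alpha = mean_y - beta1 * mean_x
    return alpha, beta1

# Hypothetical noise-free series generated by Y_t = 2 + 0.5 * Y_{t-1}
series = [0.0]
for _ in range(10):
    series.append(2 + 0.5 * series[-1])

alpha, beta1 = fit_ar1(series)
next_value = alpha + beta1 * series[-1]   # one-step-ahead forecast
print(alpha, beta1, next_value)
```

Because the example series has no error term, the estimates recover the generating coefficients (2 and 0.5) up to floating-point rounding.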


▷ MA(q) – Moving Average
AR(1): $Y_t = \alpha + \beta_1 Y_{t-1} + \varepsilon_t$
MA(1): $Y_t = \alpha + \varepsilon_t + \theta_1 \varepsilon_{t-1}$
Find q?

ACF – Auto Correlation Function

Finding ‘q’ via ACF plot
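
A short derivation shows why the ACF plot reveals q. It assumes the MA(1) form $Y_t = \alpha + \varepsilon_t + \theta_1 \varepsilon_{t-1}$ with white-noise errors of variance $\sigma^2$; the symbol $\theta_1$ is a notational assumption, not taken from the slides.

```latex
\begin{aligned}
\gamma_0 &= \mathrm{Var}(Y_t) = (1 + \theta_1^2)\,\sigma^2 \\
\gamma_1 &= \mathrm{Cov}(Y_t, Y_{t-1}) = \theta_1\,\sigma^2 \\
\gamma_k &= 0 \quad \text{for } k \ge 2 \\
\rho_1 &= \frac{\gamma_1}{\gamma_0} = \frac{\theta_1}{1 + \theta_1^2},
\qquad \rho_k = 0 \ \text{for } k \ge 2
\end{aligned}
```

The autocorrelation of an MA(1) process is therefore zero beyond lag 1, and more generally the ACF of an MA(q) process cuts off after lag q.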




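▷ ARMA – Auto Regression Moving Average
AR(p) (Auto-Regression): uses past values
AR(1): $Y_t = \alpha + \beta_1 Y_{t-1} + \varepsilon_t$
MA(q) (Moving Average): uses past errors
MA(1): $Y_t = \alpha + \varepsilon_t + \theta_1 \varepsilon_{t-1}$
ARMA(p,q) model: $Y_t = \alpha + \sum_{i=1}^{p} \beta_i Y_{t-i} + \varepsilon_t + \sum_{j=1}^{q} \theta_j \varepsilon_{t-j}$
ARMA(1,1): $Y_t = \alpha + \beta_1 Y_{t-1} + \varepsilon_t + \theta_1 \varepsilon_{t-1}$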


▷ ARIMA – Auto Regression Integrated Moving Average

AR(p) + differencing to convert the series into a stationary one (d) + MA(q)

→ ARIMA(p, d, q)
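
The differencing step behind the parameter d can be sketched in a few lines. The helper name `difference` and the two example series are assumptions for illustration: one difference turns a linear trend into a constant series, and two differences do the same for a quadratic trend.

```python
def difference(series, d=1):
    """Apply first differencing (Y_t - Y_{t-1}) d times."""
    for _ in range(d):
        series = [b - a for a, b in zip(series, series[1:])]
    return series

linear = [3 + 2 * t for t in range(8)]    # linear trend
quadratic = [t * t for t in range(8)]     # quadratic trend

print(difference(linear, d=1))      # → [2, 2, 2, 2, 2, 2, 2]
print(difference(quadratic, d=2))   # → [2, 2, 2, 2, 2, 2]
```

Each round of differencing shortens the series by one observation, so d is kept as small as possible.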

3. Time series analysis with R


Data analysis process
1. Identify the problem
2. Preliminary data analysis
3. Choose an approach (technical)
4. Forecast


▷ Data analysis with the ARIMA model
1. Visualize the data (examine its overall characteristics)
2. Check the stationarity → determine (d)
3. Check the correlation with the ACF/PACF charts → determine (p, q)
4. Build the ARIMA(p, d, q) model
5. Forecast
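
The build and forecast steps can be sketched by hand for the simplest case, ARIMA(1,1,0): difference once, fit an AR(1) to the differences by least squares, forecast the differences, then add them back to the last level. This is a simplified illustration under the assumption that d = 1 and p = 1 are already known; the function name `forecast_arima_110` and the example data are hypothetical, and a library fit would normally use maximum likelihood instead.

```python
def forecast_arima_110(series, steps):
    """Sketch of ARIMA(1,1,0): difference, fit AR(1) by least squares, forecast, undo differencing."""
    diffs = [b - a for a, b in zip(series, series[1:])]   # d = 1
    x, y = diffs[:-1], diffs[1:]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    beta1 = sxy / sxx
    alpha = mean_y - beta1 * mean_x

    forecasts = []
    last_level, last_diff = series[-1], diffs[-1]
    for _ in range(steps):
        last_diff = alpha + beta1 * last_diff   # forecast the next difference
        last_level = last_level + last_diff     # add it back to the level
        forecasts.append(last_level)
    return forecasts

# Hypothetical data whose differences follow d_t = 1 + 0.5 * d_{t-1} exactly
diffs = [0.0]
for _ in range(6):
    diffs.append(1 + 0.5 * diffs[-1])
series = [10.0]
for d in diffs:
    series.append(series[-1] + d)

forecasts = forecast_arima_110(series, steps=3)
print(forecasts)
```

The forecasts are produced on the original scale because each predicted difference is accumulated onto the last observed value.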
Stationarity
[Figure: a non-stationary series]
ADF Test (Augmented Dickey-Fuller Test): the null hypothesis (H0) is that the
series is non-stationary.
KPSS Test (Kwiatkowski-Phillips-Schmidt-Shin Test): the null hypothesis (H0)
is that the series is stationary.
Why is it necessary to consider the stationarity of data?
Note: To learn more about the ADF and KPSS tests, students should read an
econometrics textbook.
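
The idea behind the ADF test can be illustrated with its simplest relative, the plain Dickey-Fuller regression ΔY_t = a + γ·Y_{t-1} + e_t, where H0 (non-stationary) corresponds to γ = 0. The sketch below is not the full ADF test, because it omits the lagged difference terms. The helper name, the simulated data and the seed are assumptions, and −2.86 is the approximate large-sample 5% critical value for the case with a constant and no trend.

```python
import math
import random

def dickey_fuller_stat(series):
    """t-statistic of gamma in  dY_t = a + gamma * Y_{t-1} + e_t  (no lagged differences)."""
    x = series[:-1]                                    # Y_{t-1}
    y = [b - a for a, b in zip(series, series[1:])]    # dY_t
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    gamma = sxy / sxx
    a = mean_y - gamma * mean_x
    ssr = sum((yi - a - gamma * xi) ** 2 for xi, yi in zip(x, y))
    std_err = math.sqrt(ssr / (n - 2) / sxx)
    return gamma / std_err

random.seed(2)  # arbitrary seed, only for reproducibility
shocks = [random.gauss(0, 1) for _ in range(300)]     # stationary white noise
random_walk = []
level = 0.0
for e in shocks:
    level += e
    random_walk.append(level)                          # cumulative sum: non-stationary

CRITICAL_5PCT = -2.86  # approximate; assumes constant, no trend, large sample
for name, series in [("white noise", shocks), ("random walk", random_walk)]:
    stat = dickey_fuller_stat(series)
    verdict = "reject H0" if stat < CRITICAL_5PCT else "cannot reject H0"
    print(f"{name}: statistic = {stat:.2f} -> {verdict}")
```

A strongly negative statistic is evidence against non-stationarity. Note that the KPSS test reverses the roles of the hypotheses, so the two tests are often read together.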
[Figure: a stationary series]
Correlation
Autocorrelation is a phenomenon in which values in a time series are
correlated with each other.

PACF – Partial Auto Correlation Function
→ Shows a direct correlation between the lags of the data.

ACF – Auto Correlation Function
→ Shows the overall correlation between the lags of the data.
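
The ACF can be computed directly from its definition: the covariance between the series and a lagged copy of itself, divided by the variance. The sketch below uses a made-up series that repeats every four observations; the data and the function name `acf` are assumptions for illustration.

```python
def acf(series, max_lag):
    """Sample autocorrelations for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((y - mean) ** 2 for y in series)
    result = []
    for lag in range(max_lag + 1):
        num = sum((series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n))
        result.append(num / denom)
    return result

# Hypothetical series with a repeating pattern of length 4
sales = [10, 20, 30, 20] * 5

for lag, value in enumerate(acf(sales, 4)):
    print(f"lag {lag}: {value:+.2f}")
```

The autocorrelation is strongly negative at lag 2 (half a cycle apart) and strongly positive at lag 4 (a full cycle apart), which is how a repeating pattern shows up in an ACF plot.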
PACF – Partial Auto Correlation Function
→ Finding ‘p’ in ARIMA(p, d, q) via the PACF plot

ACF – Auto Correlation Function
→ Finding ‘q’ in ARIMA(p, d, q) via the ACF plot
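
Partial autocorrelations can be obtained from the autocorrelations with the Durbin-Levinson recursion. The sketch below is one possible implementation (the function name is an assumption). It is checked against the theoretical ACF of an AR(1) process with coefficient 0.5, for which the autocorrelation at lag k is 0.5^k.

```python
def pacf_from_acf(acf_values):
    """Partial autocorrelations for lags 1..K from autocorrelations r_1..r_K (Durbin-Levinson)."""
    pacf = []
    prev = []   # coefficients phi_{k-1,1} .. phi_{k-1,k-1} from the previous step
    for k in range(1, len(acf_values) + 1):
        if k == 1:
            phi_kk = acf_values[0]
        else:
            num = acf_values[k - 1] - sum(prev[j] * acf_values[k - 2 - j] for j in range(k - 1))
            den = 1 - sum(prev[j] * acf_values[j] for j in range(k - 1))
            phi_kk = num / den
        # Update the lower-order coefficients, then append the new partial autocorrelation
        prev = [prev[j] - phi_kk * prev[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        pacf.append(phi_kk)
    return pacf

# Theoretical ACF of an AR(1) process with coefficient 0.5: r_k = 0.5 ** k
ar1_acf = [0.5 ** k for k in range(1, 4)]
print(pacf_from_acf(ar1_acf))
```

The partial autocorrelation is 0.5 at lag 1 and zero afterwards. This cut-off after lag p is what the PACF plot is used to read off for an AR(p) model.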
Define parameters automatically for the model:
With Python, the library supports determining the optimal set of parameters
(p, d, q) based on the AIC (Akaike Information Criterion) → reducing the risk
of "overfitting" and "underfitting" of the model.
Models with lower AIC scores are considered better.
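
The selection rule can be illustrated without any library by comparing two candidate models on the same data. The sketch below assumes Gaussian errors and least-squares fits, for which the AIC equals n·ln(SSR/n) + 2k up to an additive constant (k is the number of estimated coefficients). The simulated data, the seed and the function name are illustrative assumptions, and the sketch does not reproduce what any particular automatic-selection library does internally.

```python
import math
import random

def aic_ar0_ar1(series):
    """Least-squares AIC (up to a constant) for AR(0) and AR(1), both fitted to Y_1..Y_{n-1}."""
    x, y = series[:-1], series[1:]
    n = len(y)
    mean_x, mean_y = sum(x) / n, sum(y) / n

    ssr0 = sum((yi - mean_y) ** 2 for yi in y)     # AR(0): constant only

    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    beta1 = sxy / sxx
    alpha = mean_y - beta1 * mean_x
    ssr1 = sum((yi - alpha - beta1 * xi) ** 2 for xi, yi in zip(x, y))   # AR(1)

    def aic(ssr, k):
        return n * math.log(ssr / n) + 2 * k

    return {"AR(0)": aic(ssr0, 1), "AR(1)": aic(ssr1, 2)}

random.seed(3)  # arbitrary seed, only for reproducibility
series = [0.0]
for _ in range(300):
    series.append(0.8 * series[-1] + random.gauss(0, 1))   # hypothetical AR(1) data

scores = aic_ar0_ar1(series)
best = min(scores, key=scores.get)
print(scores, "->", best)
```

The 2k term penalizes extra parameters, so a larger model is preferred only when it reduces the residual sum of squares enough to pay for them.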
Q&A
