
Predictive Analytics

PGDM-II: Term IV
(2021-22)

Kakali Kanjilal
Professor, OM & QT
IMI New Delhi
PA with TS data

Modelling

Forecasts
E.g., Stock Market, Sales, Exchange Rate, Oil Price, Inventory Management, Weather

Policy Analysis
E.g., How do gold, oil and Sensex impact each other? How do you decide on a pair of stocks to diversify risk? Are the stock markets integrated globally?

The FUTURE is PREDICTABLY UNPREDICTABLE.....
STILL we LOVE to LIVE on PREDICTIONS
All FORECASTS are PREDICTIONS but NOT all PREDICTIONS are FORECASTS
What is a Forecast?

• A forecast is a quantitative estimate about future events based on PAST and CURRENT information.

• The PAST and CURRENT information is used in econometric models to generate forecasts.
Modelling Anatomy

Problem Statement

Data Collection and Data Cleaning

Decide the most appropriate Model

Estimate the Model, Evaluate Model Accuracy

Forecasts: Ex-Post, Ex-Ante

Evaluate Forecast Accuracy / Policy Analysis
Forecasting Models

Models

• Econometric: Linear/Non-linear Regression; Dynamic Regression Model
• Time Series Techniques: Exponential Smoothing; ARIMA, State Space, ARMAX, Kalman Filter; GARCH, MGARCH; VAR, Cointegration; Time Series Decomposition; Non-linear Models: Regime Switching
• Others: Judgmental Forecasting; Leading Indicator Method
• Data Mining: Neural Network; Wavelet; Text Analytics
Types of Forecast

• Immediate Future/Short-Term
• Long-Term
• Ex Post/In-sample
• Ex Ante/Out-of-Sample
Suppose you have data covering the period 2010.Q1-2018.Q4

Estimation Period: 2010.1 - 2017.4
Ex-Post/In-sample Forecast Period: 2018.1 - 2018.4
Ex-Ante/Out-of-Sample Forecast Period: 2019.1 onwards (The Future)
Forecast Accuracy

Minimize the “UNCERTAINTY/ERROR”

• Mean Absolute Error (MAE)

• Mean Absolute Percentage Error (MAPE)

• Root Mean Square Error (RMSE)


Time Series
A time series is a sequence of observations taken sequentially in time. The objective is to see the future before it happens.

An intrinsic feature of a time series is that adjacent observations are typically dependent.

Time Series Analysis is concerned with techniques for the analysis of this dependence.
Anatomy of Time Series Modelling

Univariate Modeling
Business Problem = Forecasting
1. Check the properties of the time series: Stationarity
2. Decompose the time series into: Trend, Seasonality, Cyclical part, Random part
3. Estimate the model; check the model accuracy (if not good, go back and re-specify)
4. If good: Forecast; check in-sample and out-of-sample forecast performance by RMSE/MSE

Multivariate Modeling
Business Problem = Policy Analysis, Forecasting
1. Collect data
2. Decide the most suitable model
3. Estimate the model; check the model accuracy (if not good, go back and re-specify)
4. If good: test for causal relationships, long-run relationships, sensitivity to external shocks; Forecast
5. Leverage the model for policy decisions
Time Series Data

[Chart: Weekly Closing Price of Infosys Tech Ltd, 2-May-09 to 2-Jan-11]

• A trend over time is observed

• Information content at time t gets carried over to t+1, t+2, ..., t+n

• Possible to forecast period t+k based on the past history of information

[Chart: WPI (2004-05) and CPI-IW (2000-01), Apr-09 to Jan-11]

[Chart: Electricity Peak Demand, monthly - a trend exists, along with a seasonal movement and a random part]

[Chart: Foreign tourist arrivals, monthly - shows a trend and a random part]
Time Series Data

• Seasonality is present; a cyclical effect may or may not be present along with the seasonality
Trend & Cyclical Patterns

[Chart: Sales ($) vs Years (0-20) - a secular trend with cyclical patterns around it]

Trend, Seasonal & Random Components

[Chart: Sales ($) by month (J-D) - a repeating seasonal pattern with peaks, a long-run trend (secular plus cyclical), and random fluctuations]
Time Series Data

[Chart: 30-day return over a period (%) - Infosys, 2-May-09 to 2-Jan-11]
[Chart: Inflation-WPI (%) (2004-05) and Inflation-CPI-IW (%) (2000-01), Apr-09 to Feb-11]

• No trend

• Seasonal/Cyclical effect could be present

• Returns are more volatile than inflation
Components
• Secular Trend: long-run pattern

• Cyclical Fluctuation: expansion and contraction of the overall economy (business cycle)

• Seasonality: when the event pattern repeats. For example, annual sales patterns tied to weather, traditions, customs.

• Irregular or random component: when the pattern cannot be explained
Components

• Seasonality: the seasonal lag for this component depends on the data frequency
– Hourly data: seasonal lag 24
– Daily data: seasonal lag 7
– Weekly data: seasonal lag 4
– Monthly data: seasonal lag 12
– Quarterly data: seasonal lag 4
Estimation of Components

• Classical Decomposition of a Time Series

Yt = mt + st + ct + rt (additive model)
Yt = mt * st * ct * rt (multiplicative model)

– mt : trend component (deterministic, changes slowly with t)
– st : seasonal component (deterministic, period d)
– ct : cyclical component = 0 (assumed)
– rt : random stationary component
Elimination of Trend

• Non-seasonal model with trend:

Xt = mt + Yt, E(Yt) = 0

• Methods:
(a) Moving Average Smoothing
(b) Exponential Smoothing
(c) Polynomial Fitting
(d) Differencing k times to eliminate trend

Differencing ‘k’ times to eliminate trend
• Backward shift operator B: B Xt = Xt-1

• We can remove trend by differencing, e.g.

(1-B) Xt = Xt - Xt-1 (First Differencing)

(1-B)^2 Xt = (1 - 2B + B^2) Xt = Xt - 2Xt-1 + Xt-2
           = (Xt - Xt-1) - (Xt-1 - Xt-2)
           = ∆Xt - ∆Xt-1 (Second Differencing)

• It can be shown that a polynomial trend of degree k will be reduced to a constant by differencing k times, that is, by applying the operator (1-B)^k to Xt
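The claim about polynomial trends is easy to check numerically; a minimal sketch with a made-up quadratic trend (so k = 2):

```python
import numpy as np

# Polynomial trend of degree 2: differencing twice reduces it to a constant
t = np.arange(20, dtype=float)
x = 3 + 2 * t + 0.5 * t ** 2

d1 = np.diff(x)        # (1-B)X_t   = X_t - X_{t-1}  -> still linear in t
d2 = np.diff(x, n=2)   # (1-B)^2 X_t                 -> constant
```

The second difference collapses to the constant 1.0 (twice the quadratic coefficient), while the first difference still trends.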
Elimination of Seasonality

• Seasonal model without trend:

Xt = st + Yt, E(Yt) = 0

• Classical Decomposition
– Regress the level variable (Xt) on seasonal dummy variables (with or without intercept)
– Calculate the residuals, add these to the mean of Xt
– The resulting series is the de-seasonalized time series

• Differencing at lag d to eliminate period d
– Since (1-B^d) st = st - st-d = 0, differencing at lag d will eliminate a seasonal component of period d.
Elimination of Trend + Seasonality

• Elimination of both trend and seasonal components in a series can be achieved by applying trend differencing as well as seasonal differencing multiplicatively

For example: (1-B)(1-B^12) Xt

• The objective is to extract and eliminate each systematic component and then model rt appropriately
Stationary Time Series
• A stochastic process is said to be stationary if its mean and variance are constant over time and the covariance between two time periods depends only on the distance/gap/lag between the two time periods

• In the time series literature, such a stochastic process is known as weakly stationary or covariance stationary

• A time series is strictly stationary if all the moments of its probability distribution, and not just the first two (mean & variance), are invariant over time
Stationary Time Series

• Thus, if a time series is stationary, its mean, variance and auto-covariance remain the same no matter at what point we measure them, i.e. they are time invariant

• Such a time series tends to return to its mean; this is called mean reversion
Non-stationary Series
• A non-stationary time series will have a time-varying mean or variance or both

• For a non-stationary time series, we can study its behavior only for the time period under consideration

• Each set of time series data will therefore be for a particular episode

• So it is not possible to generalize it to other time periods

• Therefore, for the purpose of forecasting, a non-stationary time series may be of little practical value
Problems of Non-stationary Series
• A non-stationary series yields a spurious or nonsense relationship/regression

• A regression of the GDP of India on the rainfall of some African countries may produce a high R2 and a significant relationship.

• However, R2 should be close to zero, as the above relationship does not exist in reality!

• The problem arises due to the underlying non-stationarity of the variables.
Univariate Time Series
Analysis
and
Forecasting
Time Series Forecasting Tools

• Time Series Decomposition Technique
– Decompose the components (Trend, Seasonality & Cyclicity) from an Additive or Multiplicative Model
– Yt = mt + st + ct OR Yt = mt * st * ct

• Exponential Smoothing Technique

• Box & Jenkins' univariate forecasting methodology (ARIMA modelling)

• Volatility Forecasting (ARIMA ARCH/GARCH)

• Multivariate Forecasting: VAR, Cointegration
Box & Jenkins' Univariate
ARIMA Modelling &
Forecasting
Box Jenkins' ARIMA Modelling

• Step I: Identification
Check the stationarity of the time series through graphical plots/unit root tests/Correlogram
Identify the time series (AR/MA/ARMA/ARIMA) process through the ACF and PACF

• Step II: Estimation
Estimate the time series model (AR/MA/ARMA/ARIMA)
Box Jenkins' ARIMA Modelling

• Step III: Diagnostic Check
Check the model accuracy/goodness of fit

• Step IV: Forecasting
Forecast using the model developed
Check the forecast accuracy by RMSE/MSE
Flow Chart
Time series data, Yt = mt + st + rt
ACF, PACF, Stationarity Tests (ADF tests)

Non-stationary series: de-trend and/or de-seasonalize to obtain a stationary series Yt

Stationary series Yt:
- Model for Yt: AR, MA, ARMA
- Residual series: WN
- Estimate the AR, MA, ARMA parameters
- Forecast Yt (In-sample/Out-of-sample)
Identification
Non-stationarity
(Due to Trend & Seasonality)
Box Jenkins' ARIMA Modeling

• Step I: Identification
• Check the stationarity of the time series through graphical plots, unit root tests and the Correlogram (ACF/PACF plots)

• If stationary, identify the time series (AR/MA/ARMA/ARIMA) process through the ACF/PACF plots, or Correlogram.
Box Jenkins' ARIMA Modeling

• Step I: Identification

• If non-stationary, check whether “Trend Differencing” or “Seasonal Differencing” or both are required to make the process stationary

• Identify the time series (AR/MA/ARMA/ARIMA) process through the ACF/PACF plots, or Correlogram
Autocorrelation function (ACF)
The autocorrelation function (ACF) of a random process describes the correlation between the process at different points in time.
Let Xt be the value of the process at time t (where t may be an integer for a discrete-time process or a real number for a continuous-time process).
If Xt has mean μ and variance σ^2, then the ACF at lag k is defined as

ρk = E[(Xt - μ)(Xt+k - μ)] / σ^2
Autocorrelation Function (ACF)

• The ACF at lag k is defined as
ρk = γk / γ0
= Covariance at lag k / Variance (covariance at lag 0)

• A plot of ρk against k is called the correlogram.


Partial Autocorrelation Function (PACF)

• The partial autocorrelation at lag k, ρkk, is the regression coefficient of yt-k when yt is regressed on a constant, yt-1, yt-2, ..., yt-k.
It measures the correlation between values that are k periods apart, after removing the correlation from the intervening lags.
ACF & PACF

• The partial autocorrelation at lag k is the regression coefficient on Yt-k when Yt is regressed on a constant, Yt-1, ..., Yt-k

• This is a partial correlation since it measures the correlation of values that are k periods apart after removing the correlation from the intervening lags

• Correlogram: plot of the ACF & PACF against lags

• Rule of thumb for the choice of the lag length: one-third or one-fourth of the length of the underlying time series.
Trend: ACF & PACF

• The ACF shows a definite pattern: it decreases slowly with the lags. This means there is a trend in the data.
• Since the pattern does not repeat, we can conclude that the data do not show any seasonality.
Trend & Seasonality: ACF & PACF

The ACF plots clearly show a repetition in the pattern, indicating that the data are seasonal; there is periodicity after every 12 observations, i.e. the data show both seasonality and trend.
Trend & Seasonality: ACF & PACF

The PACF plots also show seasonality and trend.


Q-STAT

• Q-Stat (Box-Pierce Statistic):

Q = n Σ ρk^2, k = 1, 2, ..., m ~ Chi Sq (m df)

under the null hypothesis that the ρk's are simultaneously equal to zero.
ACF, PACF & Q-Stat
US GDP: Quarterly data
Date: 07/21/11 Time: 12:26
Sample: 1970Q1 1991Q4
Included observations: 88

Lag       AC         PAC       Q-Stat     Prob
  1    0.968911    0.968911    85.46212   0
  2    0.935254   -0.05774    166.0159    0
  3    0.90136    -0.0196     241.717     0
  4    0.865792   -0.04502    312.3932    0
  5    0.829817   -0.02361    378.1002    0
  6    0.791221   -0.06222    438.5656    0
  7    0.751933   -0.02906    493.8494    0
  8    0.712502   -0.0239     544.1077    0
  9    0.674907    0.009481   589.7729    0
 10    0.638137   -0.01036    631.1213    0
 11    0.601443   -0.0202     668.3282    0
 12    0.565464   -0.01199    701.6495    0
 13    0.532167    0.019973   731.5555    0
 14    0.499796   -0.01239    758.2905    0
 15    0.467677   -0.02061    782.0203    0
Non-stationarity/Unit
Root Tests
Random Walk Models (RWM)
RWM without drift
yt = yt-1 + ut ; t = 1, 2, ..., n
y1 = y0 + u1 ; y2 = y1 + u2 = y0 + u1 + u2 ; ...

E(yt) = y0 and var(yt) = t σ^2

The mean value of y is its initial value, which is constant, but as t increases its variance increases indefinitely, thus violating the stationarity condition.

Examples: Stock Price, Exchange Rate, etc.


Random Walk Models (RWM)
RWM with drift
yt = α + yt-1 + ut ; t = 1, 2, ..., n
y1 = α + y0 + u1 ; y2 = α + y1 + u2 = 2α + y0 + u1 + u2 ; ...

E(yt) = y0 + tα and var(yt) = t σ^2

The mean value of y is its initial value plus the accumulated drift, and as t increases its variance increases indefinitely, thus violating the stationarity condition.

y drifts upward or downward depending upon α being positive or negative
Random Walk Models (RWM)
RWM with drift and a trend
yt = α1 + α2 t + yt-1 + ut ; t = 1, 2, ..., n

y drifts upward or downward depending upon α2 being positive or negative

SO,
RWM exhibits persistence of random shocks: the impact of a particular shock does not die away

RWM is said to have infinite memory

RWM is an example of what is known as a unit root process
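The var(yt) = t σ^2 property of the RWM is easy to confirm by simulating many paths; the path count and horizon below are arbitrary illustration values:

```python
import numpy as np

# Simulate many random-walk-without-drift paths: y0 = 0, sigma = 1
rng = np.random.default_rng(0)
n_paths, n_steps = 5000, 100
shocks = rng.normal(0.0, 1.0, size=(n_paths, n_steps))
paths = shocks.cumsum(axis=1)       # y_t = y_{t-1} + u_t

# Cross-sectional variance at each t should grow like t * sigma^2
var_t = paths.var(axis=0)
```

The simulated variance tracks t almost exactly, which is precisely the violation of the stationarity condition the slides describe.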


Dickey-Fuller Unit Root Tests
• Simple AR(1) model
xt = α xt-1 + ut ..... (1)
• The null hypothesis of a unit root:
H0: α = 1 against H1: α < 1

• Subtracting xt-1 from both sides of eq (1), we get
xt - xt-1 = α xt-1 - xt-1 + ut
Δxt = (α - 1) xt-1 + ut
Δxt = γ xt-1 + ut
• Here the null hypothesis of a unit root is
H0: γ = 0 against H1: γ < 0
Detection of Unit Root – ADF Tests
• The ADF test is conducted with the following model:

ΔXt = β1 + β2 t + γ Xt-1 + Σi αi ΔXt-i + ut

where Xt is the underlying variable at time t, and ut is the error term
• The lagged difference terms are introduced so that the errors are uncorrelated
• For the above-specified model, the hypothesis of interest is:
H0: γ = 0
Screen Shots of Eviews

Double click on the series ‘DEMAND’ to find the ‘Correlogram’ and the ‘Non-Stationarity Tests’.

The series has a definite trend. Also, the pattern repeats around lag 12, so there seems to be seasonality.

We cannot reject the null hypothesis that the series has a unit root, or ‘the series is NOT Stationary’, at the 5% level of significance.

So, to forecast monthly ‘Peak Electricity Demand’ we have to make the series stationary.
Class Exercise
- Cases from Gujarati
- Course material cases
Find the ACF & PACF,
Conduct Stationarity Tests