
A STUDY REPORT ON TIME SERIES ANALYSIS AND

TESTS FOR STATIONARITY


India, November 2021

What is time series data and what is time series analysis?

An ordered series of data points recorded over a specified period of time at equal time intervals
is called time-series data. Time-series analysis is a set of techniques for analyzing such data and
extracting meaningful statistical information and characteristics from it. One of the major
objectives of the analysis is to forecast future values. Forecasting with time series analysis
involves extrapolation, which makes it inherently difficult. A time series is generally assumed
to consist of deterministic components (whose values can be predicted with certainty) and stochastic
components (whose values cannot be predicted with certainty because the outcome depends on chance).
These uncertainties can arise from uncertainty in process knowledge, uncertainty in
measurement, and unknown causes. A forecast, together with an estimate of the
uncertainty associated with it, can therefore be extremely valuable.

What are stationary and non-stationary time series?

The term stationarity refers to the notion that the essential statistical properties of the sample
data, including their probability distribution and related parameters, are invariant with respect
to time, i.e. they are not a function of time. The order of stationarity denotes the highest
central moment (moment about the mean) that remains constant over time. For instance,
first-order stationarity indicates a time-invariant mean, i.e. the mean does not change over time.
Similarly, if both the mean and the variance (the second-order central moment) remain constant over time,
the time series is said to be second-order stationary or weakly stationary. If the mean, variance,
and all higher-order moments are constant over time, the time series is called strict-sense
stationary or simply stationary.

If the statistical properties of a time series change or vary with time, it is known as a non-
stationary time series. Among other causes, the presence of a trend, a jump, periodicity,
or a combination thereof makes a time series non-stationary. These are generally
deterministic components that should be removed to obtain the stochastic component of the
time series before applying any time series model.
Deterministic components in time series:

i) Trend:- Trend refers to a gradual but continuous change in the mean of a time series. A trend
may be increasing or decreasing, and may be linear or nonlinear.

ii) Jump:- An abrupt change in the mean of the time series at some time step is termed a jump.
Removing a jump requires identifying the time step at which it occurs.

iii) Periodicity:- Periodicity is the property of a time series in which the same or similar values
repeat after some fixed time interval. On visualization, periodic time series show wave-like
characteristics. A time series that does not exhibit periodicity is termed aperiodic.

The deterministic components explained above are illustrated graphically below.

Stochastic component:

Noise: After we extract the level, trend, and seasonality/cyclicity, what is left is noise. Noise is a
completely random fluctuation in the data.
(Figure: Components of time series data comprising deterministic and stochastic components.)
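As an illustrative sketch only (NumPy assumed, with arbitrarily chosen parameter values), a synthetic series combining these components can be generated as follows:

```python
import numpy as np

rng = np.random.default_rng(11)
t = np.arange(500)

trend = 0.05 * t                                   # gradual, continuous change in the mean
jump = np.where(t >= 250, 10.0, 0.0)               # abrupt change in the mean at t = 250
periodicity = 5.0 * np.sin(2 * np.pi * t / 50)     # wave-like component repeating every 50 steps
noise = rng.normal(scale=1.0, size=len(t))         # stochastic component: random fluctuation

series = trend + jump + periodicity + noise        # non-stationary series with all components
stochastic_part = series - (trend + jump + periodicity)   # what time series models act on
```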

Why should time series be stationary?

The stationarity or otherwise of a series can strongly influence its behaviour and properties:

• Persistence of shocks: for a non-stationary series, the effect of a shock persists indefinitely.

• Spurious regressions: if two variables are trending over time, a regression of one on
the other could have a high R² even if the two are totally unrelated.
• If the variables in the regression model are not stationary, then it can be proved that
the standard assumptions for asymptotic analysis (i.e. n → ∞) will not be valid. In other
words, the usual “t-ratios” will not follow a t-distribution, so we cannot validly
undertake hypothesis tests about the regression parameters.
• Furthermore, a stationary series is easier for statistical models to predict effectively and
precisely.

Thus, a key role in time series analysis is played by processes whose properties, or some of
them, do not vary with time. If we wish to make predictions, then clearly we must assume that
something does not vary with time. In extrapolating deterministic functions, it is common
practice to assume that either the function itself or one of its derivatives is constant. The
assumption of a constant first derivative leads to linear extrapolation as a means of prediction.
In time series analysis our goal is to predict a series that typically is not deterministic but
contains a random component. If this random component is stationary, then we can develop
powerful techniques to forecast its future values.

Before looking into the statistical tests for stationarity, we shall familiarize ourselves with a few
statistical terms and their significance in these tests.

Covariance

Covariance is a measure of the joint variability of two random variables and can be written as
the expected value of the product of their deviations from their respective mean values.

Cov(X, Y) = σX,Y = E[(X − μX)(Y − μY)], estimated from a sample as Σ(xi − x̄)(yi − ȳ) / (n − 1)

By using the linearity of expectations, the right-hand side of this equation can be transformed into a
simpler form: the expected value of the product minus the product of the expected values,

E[(X − μX)(Y − μY)] = E(XY) − E(X) E(Y)

Covariance can distinguish three types of relationships: a relationship with a positive trend
(covariance is positive), a relationship with a negative trend (covariance is negative), and no
relationship, i.e. no trend (covariance is zero). The covariance of two independent variables
is always 0. Covariance values are, however, sensitive to the scale of the data, which makes them
difficult to interpret.
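As a minimal illustration (assuming NumPy is available, with made-up data), the sample covariance can be computed directly from the definition and checked against np.cov:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 2.0 * x + rng.normal(size=100)   # y trends upward with x

# Sample covariance from the definition: sum((x - x̄)(y - ȳ)) / (n - 1)
n = len(x)
cov_manual = np.sum((x - x.mean()) * (y - y.mean())) / (n - 1)

# np.cov returns the 2x2 covariance matrix; the off-diagonal entry is Cov(X, Y)
cov_numpy = np.cov(x, y)[0, 1]

print(cov_manual, cov_numpy)   # the two values should agree
```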

Covariance is a computational stepping stone to correlation.

Correlation:-

Correlation analysis comprises statistical methods used to evaluate whether two or more
random variables are related through some functional form and the degree of association
between them. Correlation analysis is one of the most utilized techniques for assessing the
statistical dependence among variables or their covariation, and can be a useful tool for
indicating the kind and the strength of association between random quantities. Qualitative
indications on the association of two variables are readily visualized on a scatterplot. Multiple
patterns or no pattern at all may arise from these plots, and, in the former case, can provide
evidence of the most appropriate functional form, as given by measures of correlation. If on a
given scatterplot, a variable Y systematically increases or decreases as the second variable X
increases, the two variables are associated through a monotonic correlation. Otherwise, the
correlation is said to be non-monotonic. The strength of the association between two variables
is usually expressed by correlation coefficients. Such coefficients, hereafter denoted generally
as ρ, are dimensionless quantities lying in the range −1 ≤ ρ ≤ 1. If the two variables have
the same trend of variation or, in other words, if one increases as the other increases, then ρ
will be positive. On the other hand, if one of the variables decreases as the other increases then
ρ will be negative. Finally, if ρ = 0, then, either the two variables are independent, in a statistical
sense, or the functional form of the association is not correctly described by the correlation
coefficient in use. Correlation in a data set may be either linear or nonlinear. Nonlinear
associations can be represented by exponential, piecewise linear or power functional forms. If
the association between variables is linear, the most common correlation coefficient is the
Pearson’s r. It has to be noted that highly correlated variables do not necessarily have a cause–
effect relationship. In fact, correlation only measures the joint tendency of variation of two
variables.

Pearson's correlation coefficient, ρX,Y = Cov(X, Y) / (√Var(X) · √Var(Y)) = σX,Y / (σX · σY)

Cov(X, Y) is a value between −∞ and +∞. The denominator brings the value down to between −1
and 1; that is, the denominator ensures that the scale of the data does not affect the correlation
coefficient, which makes correlations much easier to interpret. It is worth stressing that one
must be cautious about the occurrence of spurious correlations. As far as hypothesis testing is
concerned, it is often useful to evaluate whether Pearson's correlation coefficient is null. In
this case, the null and alternative hypotheses are H0: ρ = 0 and H1: ρ ≠ 0, respectively.
The related test statistic is expressed as

t0 = r √(N − 2) / √(1 − r²)

for which the null distribution is a Student's t with ν = N − 2 degrees of freedom. The null
hypothesis is rejected if |t0| > t(α/2, N−2), where α corresponds to the level of significance of the

test. For a given strength of correlation, our confidence in it is higher when the data set is large.
This confidence is reflected in the p-value: for the same value of r, a larger data set yields a lower
p-value and hence higher confidence, whereas a smaller data set yields a higher p-value and lower confidence.

The sample estimate of the population correlation coefficient ρX,Y is rX,Y, computed from

rX,Y = SX,Y / (SX · SY)

where SX and SY are the sample estimates of σX and σY, respectively, and SX,Y is the sample
covariance.
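A minimal sketch of this test, assuming SciPy is available and using simulated data: scipy.stats.pearsonr returns both the sample coefficient r and the two-sided p-value for H0: ρ = 0.

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(1)
x = rng.normal(size=50)
y = 0.7 * x + rng.normal(size=50)       # linearly associated variables

r, p_value = pearsonr(x, y)             # r in [-1, 1], p-value for H0: rho = 0
print(f"r = {r:.3f}, p-value = {p_value:.4f}")

# Reject H0 (no linear correlation) at the 5% significance level if p < 0.05
print("reject H0" if p_value < 0.05 else "fail to reject H0")
```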

Auto Correlation Function (ACF) :-

Different time series modeling approaches can be used for modeling the stochastic component of
the time series. Some of the popular linear models for time series prediction/forecasting are the
following:

(i) Autoregressive model

(ii) Moving average model

(iii) Autoregressive moving average model

(iv) Autoregressive integrated moving average model

Out of these, the first three are linear stationary models used for modeling stationary time
series. However, the last model is a linear non-stationary model and is used to model a time
series for which the d-th difference series is stationary. All of these models are linear regression
models that relate the present value of the time series to its previous values, since correlation
may also exist between successive observations of the same random variable in a time series.
According to Haan (2002), this results from the fact that part of the information contained in a
given observation is actually already known from the previous observation. Being linear, these
models rely on mutual linear association between time series values. These linear associations
are expressed in terms of the autocorrelation function and the partial autocorrelation function of
the time series.

Autocorrelation is a measure of linear association between the values of the same time series
separated by some time lag (say k). For a time series X(t) and the same time series with
lag k (represented by X(t − k)), the linear association is measured by the autocovariance. The term
auto is used because the values come from the same series but with some lag. The autocovariance function
for lag k (represented by Ck) is given by:

Ck = Cov(X(t), X(t − k)) = E[(X(t) − μ)(X(t − k) − μ)]

where E represents the expectation and μ is the mean of the series. The autocorrelation function for lag k is defined as:
ρk = Ck / (σt · σt−k)

where σt and σt−k are the standard deviations of the time series X(t) and X(t − k), respectively. If the
time series is second-order or higher-order stationary (standard deviation does not change over
time), then the above equation can be expressed as:

ρk = Ck / σ²

where σ is the standard deviation of the time series X(t). A plot of the autocorrelation function against
the corresponding lag is called an autocorrelogram. For a stationary time series, the autocorrelation
becomes insignificant with increasing lag. However, for a periodic time series the
autocorrelation is also periodic and decreases slowly, with damping peaks. Under the
assumption of an independent time series, the autocorrelation at lag k is normally distributed with
zero mean and 1/(N − k) variance, N being the length of the time series.
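As an illustrative sketch (assuming statsmodels is installed, with simulated white noise), the sample ACF and the approximate significance bound can be computed as follows:

```python
import numpy as np
from statsmodels.tsa.stattools import acf

rng = np.random.default_rng(2)
N = 200
y = rng.normal(size=N)                 # white noise: ACF should be insignificant for k > 0

rho = acf(y, nlags=20)                 # sample autocorrelation for lags 0..20
for k in range(1, 21):
    # under independence, rho_k ~ N(0, 1/(N - k)); roughly |rho_k| > 2/sqrt(N - k) is significant
    bound = 2.0 / np.sqrt(N - k)
    flag = "significant" if abs(rho[k]) > bound else ""
    print(f"lag {k:2d}: rho = {rho[k]:+.3f}  (bound ±{bound:.3f}) {flag}")
```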

Auto Regressive (AR) model

The autoregressive model tries to estimate the current value of a time series using a linear combination
(a weighted sum) of previous values of the same time series. For this, the time series should
satisfy two assumptions: stationarity and autocorrelation. For a time series to be stationary, the
mean, variance and covariance should be constant over time. Autocorrelation tells us how a
variable is influenced by its own lagged values.

The number of lagged values being considered (say p) is called the order of the AR model. The
p-th-order AR model (AR(p)) is given by

X(t) = Σ_{i=1}^{p} Φi X(t − i) + ε(t)

where Φi (for i ∈ {1, 2, …, p}) are called autoregressive coefficients and ε(t) is uncorrelated,
identically distributed error with mean zero, also known as white noise. The time series X(t) is
obtained after removing the deterministic components like trend and periodicity. Using
the backshift operator B, AR(p) can also be written as

X(t) − Φ1 B X(t) − Φ2 B² X(t) − ⋯ − Φp B^p X(t) = ε(t)

As an initial guess, the order p is decided from the partial autocorrelation function: the number of lags
for which the partial autocorrelation is significant is taken as p. Hence, for an AR(p) model,
all partial autocorrelations at lags greater than p should be zero, while the autocorrelation decays
exponentially to zero. Different AR models are fitted with slight variations of this initial guess
of the AR order, and the best model out of all fitted models is chosen on the basis of parsimony, as sketched below.
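A minimal sketch of this workflow, assuming statsmodels is available (AutoReg for AR fitting and pacf for the initial order guess) and using a simulated AR(2) series:

```python
import numpy as np
from statsmodels.tsa.ar_model import AutoReg
from statsmodels.tsa.stattools import pacf

rng = np.random.default_rng(3)
# Simulate an AR(2) process: X(t) = 0.6 X(t-1) - 0.3 X(t-2) + eps(t)
N = 500
x = np.zeros(N)
eps = rng.normal(size=N)
for t in range(2, N):
    x[t] = 0.6 * x[t - 1] - 0.3 * x[t - 2] + eps[t]

# Initial guess of the order p: the largest lag with significant partial autocorrelation
partial = pacf(x, nlags=10)
bound = 2.0 / np.sqrt(N)
p_guess = max((k for k in range(1, 11) if abs(partial[k]) > bound), default=1)

# Fit candidate AR models around the guess and keep the most parsimonious one (lowest AIC)
fits = {p: AutoReg(x, lags=p, trend="c").fit() for p in range(1, p_guess + 2)}
best_p = min(fits, key=lambda p: fits[p].aic)
print("initial guess:", p_guess, "| selected order:", best_p)
print(fits[best_p].params)   # intercept and AR coefficients
```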

Properties of AR Model

Stationarity: The developed AR model is required to be a stationary model. For a stationary
AR(p) model, the autocorrelation matrix of order p should be positive definite, i.e., the
determinants of all minors of the correlation matrix are positive.

Moving average (MA) model

Here, future values are forecasted using past forecast errors in a regression-like model. In the MA
model, the current time series values are modelled through a linear association with the lagged
residual values. The MA model of order q considers q lagged residuals for developing the model.
In general, the q-th-order moving average model is expressed as:
X(t) = ε(t) − Σ_{i=1}^{q} θi ε(t − i)

where θi and ε(t − i) are the MA parameter and the residual at lag i, respectively. The time series X(t)
is obtained after removing the deterministic components like trend and periodicity. The above
expression for the MA(q) model can be expressed in terms of the backshift operator as

X(t) = ε(t) − θ1 B ε(t) − θ2 B² ε(t) − ⋯ − θq B^q ε(t)

or, X(t) = θ(B) ε(t)

where θ(B) = 1 − θ1 B − θ2 B² − ⋯ − θq B^q for the MA(q) model.

The assumptions for the AR model also hold for the MA model.
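A brief sketch, assuming statsmodels and a simulated MA(1) series: an MA(q) model can be fitted with the ARIMA class by setting the AR and differencing orders to zero (note that statsmodels uses the plus-sign convention for the MA terms).

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(4)
# Simulate an MA(1) process: X(t) = eps(t) + 0.7 * eps(t-1)
N = 500
eps = rng.normal(size=N)
x = eps[1:] + 0.7 * eps[:-1]

# order=(p, d, q): p = 0 AR terms, d = 0 differences, q = 1 lagged residual
res = ARIMA(x, order=(0, 0, 1)).fit()
print(res.params)   # estimated constant, MA(1) coefficient, and error variance
print(res.aic)      # information criterion used later for model comparison
```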

What is a unit root?

Let us consider an AR(1) model, i.e. the time series is modelled as the lagged value of the time series
plus some residual or error or white noise εt (WN ~ (0, σ²)).

Yt = Φ Yt−1 + εt ---- (1), where Φ is the coefficient of the lagged value

Yt−1 = Φ Yt−2 + εt−1 – substitute this in (1)

Yt−2 = Φ Yt−3 + εt−2 – substitute this in (1), and so on.


Thus Yt becomes,

Yt = Φ^t Y0 + Σ_{k=0}^{t−1} Φ^k εt−k

Now, Var(Yt) = σ² [Φ^0 + Φ^2 + Φ^4 + ⋯ + Φ^{2(t−1)}] ---- (2) (Φ^t Y0 is a constant and therefore
contributes nothing to the variance)

E(Yt) = Φ^t Y0 ---- (3) (the mean or expected value of the white noise or error is taken as zero)

Case 1: |Φ| < 1

If |Φ| < 1, then E(Yt) in (3) diminishes to zero as t → ∞.

In equation (2), the term [Φ^0 + Φ^2 + Φ^4 + ⋯ + Φ^{2(t−1)}] is a convergent geometric series, so
Var(Yt) → σ² / (1 − Φ²); thus Var(Yt) also becomes constant.

As both E(Yt) and Var(Yt) remain constant, the time series is stationary if |Φ| < 1.

Case 2: |Φ| > 1

When |Φ| > 1, the value of Yt explodes as t grows, so the series is again non-stationary.

Case 3: |Φ| = 1

A time series is said to have a unit root if Φ = ±1.

If Φ = 1, then E(Yt) in equation (3) equals Y0, which is a constant and might lead us to believe the
series is stationary. However, it is not, once we also consider the variance.

In equation (2), Var(Yt) = t·σ², i.e. the variance is a function of time t and therefore is not
constant.

Thus, the time series is non-stationary when there is a unit root (Φ = 1).
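A small simulation sketch (NumPy only, with hypothetical values Φ = 0.5 and Φ = 1) illustrates the contrast: the stationary AR(1) variance settles near σ² / (1 − Φ²), while the unit-root series has a variance that grows roughly linearly with t.

```python
import numpy as np

rng = np.random.default_rng(5)
n_paths, T, sigma = 5000, 200, 1.0

def simulate_ar1(phi):
    """Simulate many AR(1) paths Y_t = phi * Y_{t-1} + eps_t with Y_0 = 0."""
    y = np.zeros((n_paths, T))
    eps = rng.normal(scale=sigma, size=(n_paths, T))
    for t in range(1, T):
        y[:, t] = phi * y[:, t - 1] + eps[:, t]
    return y

for phi in (0.5, 1.0):
    y = simulate_ar1(phi)
    # Variance across paths at two time steps
    print(f"phi = {phi}: Var(Y_50) ≈ {y[:, 50].var():.2f}, Var(Y_199) ≈ {y[:, 199].var():.2f}")
# For phi = 0.5 both values stay near sigma^2 / (1 - phi^2) ≈ 1.33;
# for phi = 1.0 the variance keeps growing (≈ t * sigma^2).
```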

The above derivation applies to the AR(1) model. It can be extended to higher-order models as well,
where, because of the added complexity, the unit root is found by solving the characteristic equation.
The equation Φ(B) = 0 is called the characteristic equation of the AR model.
To a time series with a unit root we cannot directly apply AR, MA, or ARMA models. We have to
transform the time series to remove the unit root.

• Unit roots apply to all time series models, but here we focus on the AR model.

How do we determine whether a given time series is stationary?

1) Visualization:- The values can be plotted on a graph and we can visually look for varying
or non-varying statistical properties. However, this is not always reliable, and therefore we need a
statistical method to determine stationarity more rigorously.

2) Dickey Fuller test

A more systematic approach to testing for the presence of a unit root of the autoregressive
polynomial in order to decide whether or not a time series should be differenced (to remove
non-stationarity) was pioneered by Dickey and Fuller (1979).

This test assumes that the model is AR(1). Let Y1, . . . , Yn be observations from the AR(1)
model

Yt = µ + Φ Yt−1 + εt

From the concept of the unit root, we have seen that when Φ = 1 the series is considered non-
stationary. To obtain a stationary quantity on the left-hand side, the model equation is differenced once.

Yt − Yt−1 = µ + Φ Yt−1 − Yt−1 + εt

ΔYt = µ + (Φ − 1) Yt−1 + εt

Substituting δ = (Φ − 1), the equation becomes

ΔYt = µ + δ Yt−1 + εt

For Φ = 1, δ = 0. Thus, the hypotheses for testing this regression are H0: δ = 0 and Ha: δ < 0.

As Yt−1 is not stationary under the null, we cannot use the regular t distribution; instead the
statistic is formed with a standard error estimate:

t_δ̂ = δ̂ / SE(δ̂)

where SE(δ̂) = S · (Σ_{t=2}^{n} (Yt−1 − Ȳ)²)^{−1/2} and S² = Σ_{t=2}^{n} (ΔYt − µ − δ Yt−1)² / (n − 3),

and Ȳ is the sample mean. The 0.01, 0.05, and 0.10 quantiles of the limit distribution of
t_δ̂ are −3.43, −2.86, and −2.57, respectively.

Compare the t-ratio thus obtained, t_δ̂, with the Dickey–Fuller distribution. If t_δ̂ < DFcritical, then
reject the null hypothesis (H0); if t_δ̂ > DFcritical, then we fail to reject the null hypothesis.
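As a minimal sketch (assuming statsmodels and a simulated random walk), the plain Dickey–Fuller test corresponds to adfuller with no augmenting lags:

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(6)
random_walk = np.cumsum(rng.normal(size=300))     # unit-root series: Y_t = Y_{t-1} + eps_t

# maxlag=0, autolag=None gives the simple DF regression dY_t = mu + delta * Y_{t-1} + eps_t
result = adfuller(random_walk, maxlag=0, regression="c", autolag=None)
stat, p_value, crit = result[0], result[1], result[4]
print(f"t_delta = {stat:.2f}, p-value = {p_value:.3f}")
print("critical values:", crit)

# Reject H0 (unit root) only if the statistic is more negative than the critical value
print("reject H0" if stat < crit["5%"] else "fail to reject H0 (series likely has a unit root)")
```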

Augmented Dickey fuller test

The Augmented Dickey-Fuller test can be used to test for a unit root in a univariate process in
the presence of serial correlation. The procedure explained above for the Dickey–Fuller test can be
extended to the case where {Yt} follows the AR(p) model with mean μ, given by

Yt = μ + Σ_{i=1}^{p} Φi Yt−i + εt

which can be rewritten in the differenced regression form

ΔYt = μ + δ Yt−1 + Σ_{i=1}^{p−1} βi ΔYt−i + εt

Here the unit-root null again corresponds to δ = 0. Thus, the hypotheses for this regression are
H0: δ = 0 and Ha: δ < 0, with test statistic

t_δ̂ = δ̂ / SE(δ̂)

The null hypothesis of the Augmented Dickey–Fuller test is that there is a unit root, with the
alternative that there is no unit root. If the p-value is above the chosen significance level, then we
cannot reject the hypothesis that there is a unit root. If the p-value is close to the significance level,
then the critical values should be used to judge whether to reject the null.
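A short usage sketch (statsmodels again, with simulated series): adfuller with its default settings performs the augmented test and selects the number of lags automatically via an information criterion.

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(7)
y_stationary = rng.normal(size=300)                      # should reject the unit-root null
y_unit_root = np.cumsum(rng.normal(size=300))            # should fail to reject

for name, y in [("stationary", y_stationary), ("unit root", y_unit_root)]:
    result = adfuller(y, regression="c", autolag="AIC")  # lags chosen by AIC
    stat, p_value, usedlag = result[0], result[1], result[2]
    verdict = "reject H0 (no unit root)" if p_value < 0.05 else "fail to reject H0 (unit root)"
    print(f"{name}: ADF stat = {stat:.2f}, p = {p_value:.3f}, lags = {usedlag} -> {verdict}")
```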

Once we know how many lags to use, the augmented test is identical to the simple Dickey–Fuller
test. We can use the Akaike Information Criterion (AIC) or the Bayesian Information
Criterion (BIC) to determine how many lags to consider. Before that, a brief note on the
maximum likelihood function of a model.

Maximum Likelihood

The maximum-likelihood (ML) method assumes that the best estimator of a parameter of a
distribution is the one that maximizes the likelihood, i.e. the joint probability of occurrence of the
observed sample. Let x = (x1, . . . , xn) be a set of n independent and identically distributed
observations and f(x, θ) the probability distribution function with parameter θ. The likelihood
function can be written as follows:
L(θ) = ∏_{i=1}^{n} f(xi, θ)

where the symbol ∏ indicates multiplication. Sometimes it is convenient to work with the
logarithm of the likelihood function, i.e.

ln L(θ) = Σ_{i=1}^{n} ln[f(xi, θ)]

In this case, θ̂ is said to be the maximum-likelihood estimator (MLE) of θ if θ̂ maximizes the
function L or ln(L).
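As a small worked sketch (NumPy and SciPy assumed, with simulated data), the MLE of the mean and standard deviation of a normal sample can be obtained by numerically minimizing the negative log-likelihood; for the normal distribution this should agree with the sample mean and the (biased) sample standard deviation.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(8)
x = rng.normal(loc=5.0, scale=2.0, size=500)   # observed sample

def neg_log_likelihood(params):
    mu, sigma = params
    if sigma <= 0:
        return np.inf                           # keep the scale parameter positive
    return -np.sum(norm.logpdf(x, loc=mu, scale=sigma))

result = minimize(neg_log_likelihood, x0=[0.0, 1.0], method="Nelder-Mead")
mu_hat, sigma_hat = result.x
print(f"MLE: mu = {mu_hat:.3f}, sigma = {sigma_hat:.3f}")
print(f"closed form: mu = {x.mean():.3f}, sigma = {x.std():.3f}")   # x.std() uses ddof=0
```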

Akaike Information Criterion (AIC)

Increasing the number of parameters in order to achieve model flexibility and greater accuracy
also increases the estimation uncertainty. Some information measures seek to summarize into
a single metric these two opposing objectives of model selection, namely, greater accuracy and
less estimation uncertainty. The first information measure is the Akaike Information Criterion
(AIC), introduced by Akaike (1974) and given by the following expression:

AIC = 2k − 2 ln(L̂)

where L̂ is the maximized likelihood of the model and k is the number of estimated model
parameters. The model with the lowest AIC is selected. AIC balances the risk of overfitting
against the risk of underfitting.

Bayesian Information Criterion (BIC)

When fitting models, it is possible to increase the likelihood by adding parameters, but doing
so may result in overfitting. Both BIC and AIC attempt to resolve this problem by introducing
a penalty term for the number of parameters in the model; the penalty term is larger in BIC
than in AIC.

BIC = k ln(n) − 2 ln(L̂)

where n is the number of data points (i.e. the sample size), k is the number of parameters estimated
by the model (e.g. k = q + 2: the intercept, q slope parameters, and the constant error variance), and
L̂ is the maximized likelihood of the model.
But for selecting the best-suited ARMA model from a pool of feasible ARMA models (with
different orders), AIC should be preferred over BIC; a simple selection loop is sketched below.
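A minimal order-selection sketch, assuming statsmodels and a simulated AR(1) series as example data: candidate ARMA(p, q) models are fitted and the one with the lowest AIC is retained.

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(9)
# Simulate an AR(1) series as example data
y = np.zeros(400)
eps = rng.normal(size=400)
for t in range(1, 400):
    y[t] = 0.6 * y[t - 1] + eps[t]

best = None
for p in range(0, 3):
    for q in range(0, 3):
        res = ARIMA(y, order=(p, 0, q)).fit()     # ARMA(p, q) on the already stationary series
        if best is None or res.aic < best[2]:
            best = (p, q, res.aic)

print(f"selected ARMA order: p = {best[0]}, q = {best[1]}, AIC = {best[2]:.1f}")
```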

Next is another test for stationarity: the KPSS test.

Kwiatkowski-Phillips-Schmidt-Shin (KPSS) Test

The Kwiatkowski–Phillips–Schmidt–Shin (KPSS) test figures out whether a time series is stationary
around a mean or a linear trend, or is non-stationary due to a unit root. In other words, it tests for
the stationarity of a given series around a deterministic trend. The test is somewhat similar in
spirit to the ADF test. A common misconception, however, is that the two can be used
interchangeably. This can lead to misinterpretations about stationarity,
which can easily go undetected and cause more problems down the line.

A key difference from the ADF test is that the null hypothesis of the KPSS test is that the series is
stationary. Practically, therefore, the interpretation of the p-value is the opposite of the ADF test's:
if the p-value is below the significance level (say 0.05), the series is non-stationary, whereas in the
ADF test this would mean the tested series is stationary. The p-value reported by the test is the
probability score based on which we decide whether or not to reject the null hypothesis.
If the p-value is less than a predefined alpha level (typically 0.05), we reject the null hypothesis.
The KPSS statistic is the actual test statistic computed while performing the test. In order
to reject the null hypothesis, the test statistic should be greater than the provided critical value.
If it is in fact higher than the target critical value, then that should automatically be reflected in a
low p-value; that is, if the p-value is less than 0.05, the KPSS statistic will be greater than the
5% critical value.

Null Hypothesis (H0): The series is stationary: p−value>0.05

Alternate Hypothesis (Ha): The series is not stationary: p−value≤0.05

Another major difference between KPSS and ADF tests is the capability of the KPSS test to
check for stationarity in the ‘presence of a deterministic trend’. The test may not necessarily
reject the null hypothesis (that the series is stationary) even if a series is steadily increasing or
decreasing. The word ‘deterministic’ implies the slope of the trend in the series does not change
permanently. That is, even if the series goes through a shock, it tends to regain its original path.

A comparison of ADF and KPSS test results for various types of stationarity illustrates the
aforesaid differences:
Case 1: Both tests conclude that the series is not stationary - The series is not stationary

Case 2: Both tests conclude that the series is stationary - The series is stationary

Case 3: KPSS indicates stationarity and ADF indicates non-stationarity - The series is trend
stationary. Trend needs to be removed to make series strict stationary. The detrended series is
checked for stationarity.

Case 4: KPSS indicates non-stationarity and ADF indicates stationarity - The series is
difference stationary. Differencing is to be used to make the series stationary. The differenced
series is checked for stationarity.

A major disadvantage of the KPSS test is that it has a high rate of Type I errors (it tends to
reject the null hypothesis too often). If attempts are made to control these errors (by requiring
larger p-values), that negatively impacts the test's power. One way to deal with the
potential for high Type I errors is to combine the KPSS test with an ADF test: if the results from
both tests suggest that the time series is stationary, then it probably is (a combined check is sketched below).
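A hedged sketch of such a combined check, assuming statsmodels (adfuller and kpss); the four cases listed above map directly onto the two p-values. The helper name stationarity_check is hypothetical.

```python
from statsmodels.tsa.stattools import adfuller, kpss

def stationarity_check(y, alpha=0.05):
    """Combine ADF (H0: unit root) and KPSS (H0: stationary) into one verdict."""
    adf_p = adfuller(y, autolag="AIC")[1]
    kpss_p = kpss(y, regression="c", nlags="auto")[1]   # may warn if p lies outside the lookup table

    adf_says_stationary = adf_p < alpha        # ADF rejects the unit-root null
    kpss_says_stationary = kpss_p >= alpha     # KPSS fails to reject the stationarity null

    if adf_says_stationary and kpss_says_stationary:
        return "stationary"
    if not adf_says_stationary and not kpss_says_stationary:
        return "not stationary"
    if kpss_says_stationary:
        return "trend stationary (detrend before modelling)"
    return "difference stationary (difference before modelling)"
```

For example, applying this check to a random walk would typically return the "difference stationary" verdict, while applying it to the differenced random walk would return "stationary".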

The KPSS test is based on linear regression. It breaks a series up into three parts: a deterministic
trend (βt), a random walk (rt), and a stationary error (εt), with the regression equation

xt = rt + βt + εt

KPSS test statistic is given by,

KPSS_N = (1 / (N² σ̂N²)) Σ_{n=1}^{N} Sn²

where N is the number of samples, σ̂N² is a consistent estimator of the long-run variance σ² of
the residuals, and Sn is the cumulative (partial) sum of the residuals,

Sn = Σ_{j=1}^{n} εj

where εj denotes the residuals, i.e. observed value minus expected value. When the model is a
constant-only regression, the expected value is the sample mean; when the model is a constant-
with-trend regression, the expected value is estimated from the slope and intercept determined
using the OLS method.
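A simplified sketch of this computation (NumPy only, constant-only case); for clarity the long-run variance σ̂N² is replaced here by the plain sample variance of the residuals, whereas the actual KPSS test uses a heteroskedasticity- and autocorrelation-consistent (e.g. Newey–West) estimator.

```python
import numpy as np

def kpss_statistic_level(x):
    """Level-stationarity KPSS statistic with a naive variance estimator (illustrative only)."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    resid = x - x.mean()                 # residuals from the constant-only regression
    S = np.cumsum(resid)                 # partial sums S_n of the residuals
    var_hat = np.mean(resid ** 2)        # naive stand-in for the long-run variance
    return np.sum(S ** 2) / (N ** 2 * var_hat)

rng = np.random.default_rng(10)
print(kpss_statistic_level(rng.normal(size=500)))            # small for a stationary series
print(kpss_statistic_level(np.cumsum(rng.normal(size=500)))) # large for a random walk
```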

The KPSS statistic is then compared against the critical values predefined for the test.
