
12/09/2020 A beginner’s guide to Time Series Analysis | by Santhosh Kumar R | Medium

A beginner’s guide to Time Series Analysis


How to get the most out of the time-series data

Santhosh Kumar R


Nov 22, 2019 · 6 min read

Time series (TS) analysis is a sub-domain of data science. As the name suggests, a time
series is simply a sequence of observations of a variable measured over time. We analyze
the series to determine its long-term trend, to forecast future values, or to perform
other analyses. When modeling such processes, time itself becomes an important
variable, and different tools and methods have been developed to incorporate it.
In this article, I will present some common techniques used in analyzing time series.

I will explain the concepts as we go by applying them to real-world data. In this
tutorial, I have used a 64-year dataset of mean monthly streamflow measured between
1950 and 2013 at a gauging station. Let’s look at the time series.

https://medium.com/@skumarr53/a-beginners-guide-to-time-series-analysis-411d8baaccc6 1/8

The figure above shows the monthly mean streamflow, in cubic feet per second,
plotted against time from 1950 to 2013. The oscillating behavior in the plot suggests
that the data is periodic, and that it is stationary at an annual scale (12 months),
since it oscillates around a fixed mean. These presumptions will be checked more
rigorously when we test the hypotheses statistically.

Generally, the following steps are involved in building a TS model:

1. Stationarity and Periodicity check

2. Time series transformation

3. Model building

4. Validation

Stationarity and Periodicity check

A TS is said to be stationary if its statistical properties, such as mean and variance,
remain constant over time. It is crucial to make sure the data is stationary because most
TS models, including ARIMA, assume the data to be stationary.

Typically, statistical properties considered are:

1. constant mean

2. constant variance

One way to check for stationarity in the data is simply to plot the rolling mean and
standard deviation, computed with a fixed window size, along time. The plot
below shows that for our data.
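
The rolling statistics described above can be sketched with NumPy. This is only an illustration: the series `flow` is synthetic (a 12-month cycle plus noise standing in for the streamflow record), and the helper name `rolling_stats` and the 12-month window are assumptions, not details taken from the article.

```python
import numpy as np

# Synthetic stand-in for the streamflow record (assumption): 64 years of
# monthly values with a 12-month cycle plus random noise.
rng = np.random.default_rng(0)
months = np.arange(64 * 12)
flow = 100 + 40 * np.sin(2 * np.pi * months / 12) + rng.normal(0, 10, months.size)

def rolling_stats(x, window):
    """Return the rolling mean and standard deviation over a fixed window."""
    # Each row of `windows` holds `window` consecutive observations.
    windows = np.lib.stride_tricks.sliding_window_view(x, window)
    return windows.mean(axis=1), windows.std(axis=1)

# A 12-month window averages over one full seasonal cycle (assumed choice).
roll_mean, roll_std = rolling_stats(flow, window=12)
print("rolling mean range:", roll_mean.min(), roll_mean.max())
print("rolling std range: ", roll_std.min(), roll_std.max())
```

If both rolling curves stay roughly flat over the whole record, that is informal evidence of a constant mean and variance at the scale of the window.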


Augmented Dickey-Fuller Test


The Dickey-Fuller test is a popular statistical test for checking whether data is
stationary. The null hypothesis is that the series has a unit root, in other words
that the TS is not stationary. If the test provides enough evidence against the null,
we reject it and accept that the TS is stationary. The results of the DF test are
presented below.

Results of Dickey-Fuller Test:

0
Test Statistic -5.281712
p-value 0.000006
#Lags Used 20.000000
Number of Observations Used 747.000000
Critical Value (1%) -3.439134
Critical Value (5%) -2.865417
Critical Value (10%) -2.568834

Notice that the test statistic is less than the critical values at the 1%, 5% and 10%
levels, and the p-value is far below 0.01. Simply put, the null hypothesis of
non-stationarity can be rejected even at the 1% significance level. Hence, we
reject the null and accept the TS data to be stationary.
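
To make the mechanics of the test concrete, here is a minimal sketch of the regression behind the augmented Dickey-Fuller statistic, written with NumPy only. It is not the code that produced the results above; in practice a library routine such as `adfuller` from statsmodels would normally be used. The function name `adf_statistic`, the two synthetic series and the choice of one lag are assumptions made for illustration, and the 5% critical value is simply the one quoted in the results, reused here as an approximation.

```python
import numpy as np

def adf_statistic(x, lags):
    """t-statistic of g in: dx(t) = a + g*x(t-1) + sum_i b_i*dx(t-i) + e(t)."""
    x = np.asarray(x, dtype=float)
    dx = np.diff(x)
    y = dx[lags:]                          # dx(t)
    cols = [np.ones_like(y), x[lags:-1]]   # constant and lagged level x(t-1)
    for i in range(1, lags + 1):
        cols.append(dx[lags - i:-i])       # lagged difference dx(t-i)
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta[1] / np.sqrt(cov[1, 1])

rng = np.random.default_rng(0)
noise = rng.normal(size=768)             # stationary by construction
walk = np.cumsum(rng.normal(size=768))   # has a unit root by construction

CRITICAL_5PCT = -2.865417  # 5% critical value quoted in the results above
for name, series in [("noise", noise), ("walk", walk)]:
    stat = adf_statistic(series, lags=1)
    verdict = "reject unit root" if stat < CRITICAL_5PCT else "cannot reject unit root"
    print(f"{name}: statistic = {stat:.3f} -> {verdict}")
```

A strongly negative statistic, as for the white-noise series, is evidence against a unit root. Note that the statistic must be compared with Dickey-Fuller critical values rather than with ordinary t-distribution quantiles.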

Auto-Correlation Function

Autocorrelation indicates how the value at the current time step depends on the
values at previous time steps. It is the correlation of the data with itself at some lag.
For example, to find the ACF at a one-step lag, we correlate the series x(t) with
the lagged series x(t-1), where x(t) and x(t-1) are the values at the current step and
previous step respectively. The correlation tells us whether the process is cyclic or purely
random with no dependencies. Below is the Auto-Correlation function plotted against lag,
for lags up to 25% of the data length.
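
The computation just described can be sketched as follows. The data is again a synthetic seasonal series (an assumption), the function name `acf` is illustrative, and the band of plus or minus 1.96 divided by the square root of n is the usual approximate 95% bound for white noise, which may or may not be the exact band drawn in the article's plot.

```python
import numpy as np

def acf(x, max_lag):
    """Sample autocorrelation of x for lags 0..max_lag."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    denom = x @ x
    # Lag k correlates x(t) with x(t-k); lag 0 is 1 by definition.
    return np.array([1.0] + [x[k:] @ x[:-k] / denom for k in range(1, max_lag + 1)])

# Synthetic stand-in for the streamflow record (assumption).
rng = np.random.default_rng(0)
months = np.arange(64 * 12)
flow = 100 + 40 * np.sin(2 * np.pi * months / 12) + rng.normal(0, 10, months.size)

max_lag = len(flow) // 4              # lags up to 25% of the data length
r = acf(flow, max_lag)
band = 1.96 / np.sqrt(len(flow))      # approximate 95% significance band
print("ACF at lags 1, 6, 12:", r[1], r[6], r[12])
print("significant lags:", int(np.sum(np.abs(r[1:]) > band)), "of", max_lag)
```

For a series with a 12-month cycle, the ACF is strongly positive at lags 12, 24, ... and strongly negative at lags 6, 18, ..., which produces the oscillating pattern.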


From the plot, we can see the Auto-Correlation function decaying very slowly, which
indicates strong dependence between months and suggests that monthly mean streamflow is
not stationary at a monthly scale. Here the points exceeding the blue line (significance
level) are all significant. The oscillating pattern also indicates the presence of
periodicity in the data.

Partial Auto-Correlation function

Unless the process is purely random, the variable of interest at the current step depends
on the values observed at a few previous steps. The autoregressive component of the
ARIMA model uses this information by regressing the variable at the current step (Y =
x(t)) against its lags (X = x(t-1), x(t-2), …). Knowing this dependency can be crucial
in building a forecasting model. Partial Auto-Correlation (PAC) measures the dependence
of x(t) on x(t-k) when the dependence on all intermediate variables (x(t-1), x(t-2), …,
x(t-k+1)) is removed. This tells us how much explanatory power the x(t-k) term has on x(t).

For example, let’s assume that x(t) depends on x(t-1) and x(t-2) only. Then, upon
regressing x(t) against x(t-1), the residual errors are due to the absence of x(t-2). This
means that regressing the residuals against x(t-2) should show a strong relation.
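
The example above can be checked numerically. The sketch below simulates a series in which x(t) depends only on x(t-1) and x(t-2), then estimates the PAC at lag k as the last coefficient of a regression of x(t) on x(t-1), …, x(t-k), a standard equivalent of the residual-based description. The coefficients 0.5 and 0.3, the series length and the function name `pacf_by_regression` are assumptions chosen for illustration.

```python
import numpy as np

def pacf_by_regression(x, max_lag):
    """PAC at lag k = coefficient of x(t-k) when regressing x(t) on x(t-1..t-k)."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    n = len(x)
    out = [1.0]
    for k in range(1, max_lag + 1):
        y = x[k:]
        # Columns are x(t-1), x(t-2), ..., x(t-k), aligned with y = x(t).
        X = np.column_stack([x[k - j:n - j] for j in range(1, k + 1)])
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        out.append(coef[-1])
    return np.array(out)

# Simulated series where x(t) depends on x(t-1) and x(t-2) only (assumed values).
rng = np.random.default_rng(1)
n = 5000
e = rng.normal(size=n)
x = np.zeros(n)
for t in range(2, n):
    x[t] = 0.5 * x[t - 1] + 0.3 * x[t - 2] + e[t]

p = pacf_by_regression(x, max_lag=5)
print("PAC at lags 1..5:", np.round(p[1:], 3))
```

In such a series the PAC is clearly non-zero at lags 1 and 2 and close to zero beyond lag 2, which is why the PAC plot is often used to choose how many lagged terms to include.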


The plot shows that x(t) is strongly associated with x(t-1), with a PAC value close to
0.6. Notice that a few other lags affect x(t) inversely.

Spectral analysis

Periodicities in data can best be determined by analyzing the time series in the
frequency domain. Spectral density represents the time series in the frequency domain
instead of the time domain. The idea behind plotting data in the frequency domain is
that it will show a spike at the frequency at which the pattern repeats itself. If you want
to know more about frequency domain analysis, a quick web search will turn up plenty of
introductory material.


In the line spectrum, the spike at an angular frequency (W(k)) of 0.528 indicates that the
data exhibits periodicity with a period of about 12 months (2×pi/0.528 ≈ 11.9), which means
the pattern repeats every 12 months.
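
The conversion from the spike's angular frequency to a period can be sketched with a simple periodogram. The series is a synthetic one with a 12-month cycle (an assumption), and the FFT-based periodogram is one common way to obtain a line spectrum; the article does not say which estimator was actually used.

```python
import numpy as np

# Synthetic stand-in for the streamflow record (assumption).
rng = np.random.default_rng(0)
months = np.arange(64 * 12)
flow = 100 + 40 * np.sin(2 * np.pi * months / 12) + rng.normal(0, 10, months.size)

x = flow - flow.mean()                          # remove the mean (zero frequency)
n = x.size
spectrum = np.abs(np.fft.rfft(x)) ** 2 / n      # periodogram ordinates
omega = 2 * np.pi * np.fft.rfftfreq(n, d=1.0)   # angular frequency, radians per month

peak = np.argmax(spectrum[1:]) + 1              # skip the zero-frequency bin
period = 2 * np.pi / omega[peak]                # period in months
print("peak angular frequency:", omega[peak])
print("corresponding period (months):", period)
```

An angular frequency w corresponds to a period of 2×pi/w time steps, so a 12-month cycle shows up near w = 2×pi/12 ≈ 0.524.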

Standardization of streamflow

To build the forecast model using the ARIMA method, we cannot use the time series as it
is, because ARIMA assumes the data to be stationary, but our data is not stationary at a
monthly scale (the forecast interval). So we need to transform the data to make it
stationary. One way to do that is to standardize the time series: subtract the respective
monthly mean from each value and divide by the monthly standard deviation. Here the mean
and standard deviation have to be those of the calendar month of the observation.
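
A minimal sketch of this month-wise standardization is shown below. It assumes the record starts in January and covers whole years, so that it can be reshaped into a years-by-months table; the data is synthetic and the variable names are illustrative.

```python
import numpy as np

# Synthetic stand-in for the streamflow record (assumption): starts in
# January and covers exactly 64 full years.
rng = np.random.default_rng(0)
months = np.arange(64 * 12)
flow = 100 + 40 * np.sin(2 * np.pi * months / 12) + rng.normal(0, 10, months.size)

by_month = flow.reshape(-1, 12)              # rows = years, columns = calendar months
monthly_mean = by_month.mean(axis=0)         # one mean per calendar month
monthly_std = by_month.std(axis=0, ddof=1)   # one standard deviation per calendar month

# Subtract the mean of the matching month and divide by its standard deviation.
standardized = ((by_month - monthly_mean) / monthly_std).ravel()
print("first year, standardized:", np.round(standardized[:12], 2))
```

After this step every calendar month has zero mean and unit standard deviation, so the seasonal cycle in both level and spread is removed.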

Standardization aims to transform the periodic series into a sequence of approximately
uncorrelated, identically distributed random variables with zero mean and constant
variance. The plot below shows the standardized streamflow series.


Auto-Correlation function

Let’s plot the ACF computed for the standardized data to check for the existence of periodicity.

The Auto-Correlation function now decays at a faster rate, which is an indication of
stationarity, so it is evident that the periodic dependencies have been removed.
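
The before-and-after comparison can be reproduced on synthetic data as a rough check. The helper names `acf` and `standardize_by_month` are illustrative, and the synthetic noise is uncorrelated by construction, so real streamflow would typically retain more short-lag correlation after standardization than this toy series does.

```python
import numpy as np

def acf(x, max_lag):
    """Sample autocorrelation of x for lags 0..max_lag."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    denom = x @ x
    return np.array([1.0] + [x[k:] @ x[:-k] / denom for k in range(1, max_lag + 1)])

def standardize_by_month(x):
    """Subtract each calendar month's mean and divide by its standard deviation."""
    table = np.asarray(x, dtype=float).reshape(-1, 12)  # assumes whole years from January
    return ((table - table.mean(axis=0)) / table.std(axis=0, ddof=1)).ravel()

# Synthetic stand-in for the streamflow record (assumption).
rng = np.random.default_rng(0)
months = np.arange(64 * 12)
flow = 100 + 40 * np.sin(2 * np.pi * months / 12) + rng.normal(0, 10, months.size)

raw_acf = acf(flow, 24)
std_acf = acf(standardize_by_month(flow), 24)
print("lag-12 ACF, raw series:         ", raw_acf[12])
print("lag-12 ACF, standardized series:", std_acf[12])
```

The strong seasonal correlation at lag 12 in the raw series should largely disappear once the monthly mean and standard deviation are removed.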

Periodicity in the standardized data


From the plot, it is evident that the periodicity that existed in the original data has
been removed.

Now our processed data is ready to be consumed by a model. In the next article, we will
try to fit various models to the data and choose the best among them.

Thank you! I appreciate you reading my article.

Note: Readers who are interested in theory and math behind these concepts can refer to
this YouTube playlist of lecture videos by Prof. P. P. Mujumdar, IISc Bangalore.

References:

1. https://nptel.ac.in/courses/105108079/

2. https://courses.analyticsvidhya.com/courses/take/a-comprehensive-learning-path-to-become-a-data-scientist-in-2019/texts/6087012-introduction-to-time-series-forecasting
