A Beginner's Guide To Time Series Analysis - by Santhosh Kumar R - Medium
Time series (TS) analysis is a sub-domain of data science. As the name suggests, a time
series is simply a sequence of observations of a variable measured over time. We analyze
the series to determine its long-term trend, to forecast future values, or to perform some
other analysis. When modeling such processes, the time component becomes an important
variable, and different tools and methods have been developed to incorporate the time
factor into the model. In this article I will present some common techniques for
analyzing time series.
https://medium.com/@skumarr53/a-beginners-guide-to-time-series-analysis-411d8baaccc6 1/8
12/09/2020 A beginner’s guide to Time Series Analysis | by Santhosh Kumar R | Medium
The figure above shows the monthly mean streamflow, in cubic feet per second,
plotted against time from 1950 to 2013. Judging by the oscillating behavior in the plot,
periodicity seems to exist in the data, and the data looks stationary at an annual scale
(12 months) because it oscillates around a fixed mean. These presumptions will
become clearer when we test the hypotheses statistically.
3. Model building
4. Validation
1. constant mean
2. constant variance
One way to check for stationarity in the data is simply to plot the rolling mean and
standard deviation, computed with a fixed window size, against time. The plot
below shows that for our data.
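The rolling statistics described above can be sketched as follows. This is a minimal illustration using NumPy only; the synthetic `flow` series is a hypothetical stand-in for the streamflow data, and the 12-month window is an assumption, not necessarily the window used for the plot.

```python
import numpy as np

def rolling_stats(x, window=12):
    """Return the rolling mean and sample standard deviation of x."""
    x = np.asarray(x, dtype=float)
    # Stack every run of `window` consecutive observations as one row.
    windows = np.array([x[i:i + window] for i in range(x.size - window + 1)])
    return windows.mean(axis=1), windows.std(axis=1, ddof=1)

# Hypothetical monthly series: a fixed mean, an annual cycle, and noise.
rng = np.random.default_rng(0)
months = np.arange(240)
flow = 100 + 40 * np.sin(2 * np.pi * months / 12) + rng.normal(0, 5, months.size)

roll_mean, roll_std = rolling_stats(flow, window=12)
print("rolling mean range:", roll_mean.min(), roll_mean.max())
print("rolling std range: ", roll_std.min(), roll_std.max())
```

If both rolling curves stay roughly flat over time, that is informal evidence for a constant mean and constant variance at the scale of the chosen window.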
Test Statistic                   -5.281712
p-value                           0.000006
#Lags Used                       20.000000
Number of Observations Used     747.000000
Critical Value (1%)              -3.439134
Critical Value (5%)              -2.865417
Critical Value (10%)             -2.568834
Notice that the test statistic is lower (more negative) than the critical values at the
1%, 5% and 10% levels, and the p-value is far below 0.01. Simply put, the null hypothesis
that the TS is non-stationary can be rejected even at the 1% significance level. Hence, we
reject the null and treat the TS data as stationary.
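The labels in the table above look like the output of an augmented Dickey-Fuller test, which in Python is usually run with `adfuller` from `statsmodels.tsa.stattools`. To show the idea behind the decision rule without that dependency, here is a simplified sketch of the plain (non-augmented) Dickey-Fuller regression using NumPy only. It uses no lagged difference terms, so it is not the same computation as the 20-lag test reported above, and the two synthetic series are hypothetical. The 5% critical value is reused from the table as an approximation.

```python
import numpy as np

def dickey_fuller_stat(x):
    """t-statistic of gamma in: x[t] - x[t-1] = alpha + gamma * x[t-1] + error."""
    x = np.asarray(x, dtype=float)
    dx = np.diff(x)
    X = np.column_stack([np.ones(dx.size), x[:-1]])   # intercept and lagged level
    beta, *_ = np.linalg.lstsq(X, dx, rcond=None)
    resid = dx - X @ beta
    sigma2 = resid @ resid / (dx.size - X.shape[1])   # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)             # OLS coefficient covariance
    return beta[1] / np.sqrt(cov[1, 1])

rng = np.random.default_rng(1)
noise = rng.normal(size=768)
stationary = noise                # white noise: no unit root
random_walk = np.cumsum(noise)    # unit-root process

CRITICAL_5PCT = -2.865417         # taken from the table above (approximation here)
for name, series in [("white noise", stationary), ("random walk", random_walk)]:
    stat = dickey_fuller_stat(series)
    verdict = "reject" if stat < CRITICAL_5PCT else "fail to reject"
    print(f"{name}: statistic = {stat:.2f}, {verdict} the unit-root null at 5%")
```

The rule is the same one applied to the table: a statistic more negative than the critical value is evidence against the non-stationary (unit-root) null.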
Auto-Correlation Function
Autocorrelation indicates how strongly the value at the current time step depends on the
values at previous time steps. It is the correlation of the data with itself at some lag.
For example, to find the ACF at a lag of one step, we correlate the series starting from
x(t) with the same series starting from x(t-1), where x(t) and x(t-1) are the values at
the current step and previous step respectively. The correlation tells us whether the
process is cyclic or purely random with no dependencies. Below is the Auto-Correlation
function plotted against lag, with lags up to 25% of the data length.
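The lagged correlation described above can be computed directly. The sketch below is a minimal NumPy version of the sample ACF; the synthetic `flow` series is a hypothetical stand-in for the streamflow data, and the band of ±1.96/sqrt(n) is the usual approximate 95% significance limit for a purely random series.

```python
import numpy as np

def acf(x, max_lag):
    """Sample autocorrelation of x for lags 0..max_lag."""
    x = np.asarray(x, dtype=float)
    n = x.size
    centered = x - x.mean()
    denom = centered @ centered
    # Correlate the series starting at step k with the series starting at step 0.
    return np.array([(centered[k:] @ centered[:n - k]) / denom
                     for k in range(max_lag + 1)])

# Hypothetical monthly series with an annual cycle.
rng = np.random.default_rng(0)
months = np.arange(240)
flow = 100 + 40 * np.sin(2 * np.pi * months / 12) + rng.normal(0, 5, months.size)

r = acf(flow, max_lag=flow.size // 4)      # lags up to 25% of the data length
band = 1.96 / np.sqrt(flow.size)           # approximate 95% significance limit
print("lag 6: ", r[6])
print("lag 12:", r[12])
print("significance band: +/-", band)
```

For a series with a 12-month cycle, the ACF is strongly negative near lag 6 and strongly positive near lag 12, which produces the oscillating pattern.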
From the plot, it is clear that the monthly mean streamflow is non-stationary at the
monthly scale, since the Auto-Correlation function decays very slowly. Points that
exceed the blue line (significance level) are significant. The oscillating
pattern also indicates the presence of periodicity in the data.
Unless the process is purely random, the variable of interest at the current step depends
on the values observed at a few previous steps. One of the components of the
ARIMA model uses this information by regressing the variable at the current step (Y =
x(t)) against its lags (X = x(t-1), x(t-2), …). Knowing this dependency can be crucial
in building a forecasting model. Partial Auto-Correlation (PAC) measures the dependence
of x(t) on x(t-k) once the dependence on all intermediate variables (x(t-1), x(t-2), …,
x(t-k+1)) has been removed. This tells us how much explanatory power the x(t-k) term has
for x(t).
For example, let's assume that x(t) depends on x(t-1) and x(t-2) only. Then, upon
regressing x(t) against x(t-1), the residual errors are due to the absence of x(t-2). This
means that regressing the residuals against x(t-2) should show a strong relation.
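One common way to estimate the PAC at lag k is to regress x(t) on x(t-1), …, x(t-k) and take the coefficient of x(t-k). The sketch below does this with NumPy on a simulated series in which x(t) depends on x(t-1) and x(t-2) only, mirroring the example above. The coefficients 0.5 and 0.3 are hypothetical choices for illustration.

```python
import numpy as np

def pacf_by_regression(x, max_lag):
    """PAC at lags 1..max_lag: last coefficient of a regression on k lags."""
    x = np.asarray(x, dtype=float)
    n = x.size
    values = []
    for k in range(1, max_lag + 1):
        # Columns: intercept, x(t-1), ..., x(t-k); one row per t = k .. n-1.
        lags = [x[k - j:n - j] for j in range(1, k + 1)]
        X = np.column_stack([np.ones(n - k)] + lags)
        coef, *_ = np.linalg.lstsq(X, x[k:], rcond=None)
        values.append(coef[-1])            # coefficient of x(t-k)
    return np.array(values)

# Simulate x(t) = 0.5 x(t-1) + 0.3 x(t-2) + noise (hypothetical coefficients).
rng = np.random.default_rng(2)
n = 2000
e = rng.normal(size=n)
x = np.zeros(n)
for t in range(2, n):
    x[t] = 0.5 * x[t - 1] + 0.3 * x[t - 2] + e[t]

pac = pacf_by_regression(x, max_lag=5)
for lag, value in enumerate(pac, start=1):
    print(f"lag {lag}: {value:+.3f}")
```

In this setup the PAC at lag 2 should come out near 0.3 and the values at lags 3 and above should be close to zero, because x(t-3) and earlier terms add no explanatory power once x(t-1) and x(t-2) are accounted for.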
In the plot, x(t) is strongly associated with x(t-1), with a PAC value close to 0.6. Notice
that a few other lags affect x(t) inversely.
Spectral analysis
Periodicities in data can best be determined by analyzing the time series in the
frequency domain. Spectral density represents the time series in the frequency domain
instead of the time domain. The idea behind plotting data in the frequency domain is
that it will show a spike at the frequency at which the pattern repeats itself. If you want
to know more about frequency domain analysis, just Google it.
In the line spectrum, the spike at the angular frequency (W(k)) of 0.528 indicates that
the data exhibits periodicity with a period of about 12 months (2×pi/0.528 ≈ 11.9), which
means the pattern repeats every 12 months.
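The conversion from a spectral spike to a period can be illustrated with a simple periodogram. This is a minimal sketch using NumPy's FFT on a hypothetical synthetic series, not the author's streamflow data or necessarily the same spectral estimator.

```python
import numpy as np

# Hypothetical monthly series with an annual cycle.
rng = np.random.default_rng(3)
n = 240
months = np.arange(n)
flow = 100 + 40 * np.sin(2 * np.pi * months / 12) + rng.normal(0, 5, n)

centered = flow - flow.mean()
spectrum = np.abs(np.fft.rfft(centered)) ** 2 / n     # periodogram
omega = 2 * np.pi * np.fft.rfftfreq(n, d=1.0)         # angular frequency, rad/month

peak = np.argmax(spectrum[1:]) + 1                    # skip the zero frequency
period = 2 * np.pi / omega[peak]
print("peak angular frequency:", omega[peak])
print("period in months:", period)

# The same conversion applied to the value read off the line spectrum above.
print("2*pi/0.528 =", 2 * np.pi / 0.528)
```

The period is 2×pi divided by the angular frequency of the spike, so a spike near 0.52 to 0.53 radians per month corresponds to a cycle of about 12 months.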
We want to build the forecast model using the ARIMA method, but we cannot use the time
series as it is, because ARIMA's autoregressive and moving-average components assume
stationary data, and our data is not stationary at the monthly scale (the forecast
interval). So we need to transform the data to make it stationary.
One way to do that is to standardize the time series: subtract the respective monthly
mean from each observation and divide by the monthly standard deviation. Here the mean
and standard deviation must be the ones corresponding to the month of the observation.
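The monthly standardization described above can be sketched as follows. This is a minimal NumPy version; the function name `standardize_by_month`, the synthetic series, and the use of the sample standard deviation (`ddof=1`) are assumptions made for illustration.

```python
import numpy as np

def standardize_by_month(values, month_index):
    """Subtract each calendar month's mean and divide by that month's std."""
    values = np.asarray(values, dtype=float)
    month_index = np.asarray(month_index)
    out = np.empty_like(values)
    for m in np.unique(month_index):
        mask = month_index == m
        # Statistics are taken over all observations of the same month.
        out[mask] = (values[mask] - values[mask].mean()) / values[mask].std(ddof=1)
    return out

# Hypothetical 20-year monthly series with an annual cycle.
rng = np.random.default_rng(4)
months = np.arange(240)
flow = 100 + 40 * np.sin(2 * np.pi * months / 12) + rng.normal(0, 5, months.size)
month_of_year = months % 12            # 0 = first month of each year

z = standardize_by_month(flow, month_of_year)
print("January mean after standardizing:", z[month_of_year == 0].mean())
print("January std after standardizing: ", z[month_of_year == 0].std(ddof=1))
```

After this transform every calendar month has zero mean and unit standard deviation, so the annual cycle in the mean and variance is removed. A month whose values are all identical would give a zero standard deviation and would need separate handling.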
Auto-Correlation function
Let's plot the ACF computed for the standardized data to check for periodicity.
Since the Auto-Correlation function now decays at a faster rate, which is an indication
of stationarity, it is evident that the dependencies have been removed.
From the plot, it is evident that the periodicity that existed in the original data has
been removed.
Now our processed data is ready for modeling. In the next article, we will try to fit
various models to the data and choose the best among them.
Note: Readers who are interested in theory and math behind these concepts can refer to
this YouTube playlist of lecture videos by Prof. P. P. Mujumdar, IISc Bangalore.
References:
1. https://nptel.ac.in/courses/105108079/
2. https://courses.analyticsvidhya.com/courses/take/a-comprehensive-learning-path-to-become-a-data-scientist-in-2019/texts/6087012-introduction-to-time-series-forecasting