
What is time series decomposition and how does it work?
Plus a headfirst dive into a powerful time series
decomposition algorithm using Python

Sachin Date

Jun 20, 2020·9 min read

A time series can be thought of as being made up of 4 components:

A seasonal component
A trend component
A cyclical component, and
A noise component.

The Seasonal component

The seasonal component explains the periodic ups and downs one sees in many data sets such as the one shown below.
Retail Used Car Sales. Data source: US FRED (Image by Author)

In the above example, the seasonal period is approximately 12 months. Sales peak in March and bottom out in November or December before peaking in March again.

A time series can contain multiple superimposed seasonal periods. A classic example is a time series of hourly temperatures at a weather station. Since the Earth rotates on its axis, the graph of hourly temperatures at a weather station will show a seasonal period of 24 hours. The Earth also revolves around the Sun with its axis tilted, leading to seasonal temperature variations. If you follow the temperature at the weather station at, say, 11 am for 365 days, you will see a second pattern emerging that has a period of 12 months. The 24 hour long daily pattern is superimposed on the 12 month long yearly pattern.
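To make the superposition concrete, here is a minimal sketch that builds a synthetic year of hourly temperatures from a daily cycle and a yearly cycle. The amplitudes, the 15 degree baseline and the pure sine shapes are made-up assumptions for illustration, not real weather data.

```python
import numpy as np

# One non-leap year of hourly readings (synthetic; all constants are assumptions)
hours = np.arange(365 * 24)
hour_of_day = hours % 24
day_of_year = hours // 24

daily_cycle = 5.0 * np.sin(2 * np.pi * hour_of_day / 24)     # 24 hour period
yearly_cycle = 10.0 * np.sin(2 * np.pi * day_of_year / 365)  # 12 month period
temperature = 15.0 + yearly_cycle + daily_cycle

# Follow only the 11 am reading: the daily cycle contributes the same
# amount every day, so what is left to vary is the yearly pattern.
at_11am = temperature[hour_of_day == 11]
yearly_at_11am = yearly_cycle[hour_of_day == 11]
residual = at_11am - yearly_at_11am  # constant: baseline + 11 am daily offset
```

Sampling at a fixed hour freezes the 24 hour pattern, which is why the 12 month pattern becomes visible on its own.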

In the case of the hourly weather data, we know the underlying physical phenomena that cause the two seasonal patterns. But in most cases, it’s not possible to know all the factors that introduce seasonality into your data. And so, it is seldom easy to unearth all the seasonal periods that may be hiding in a time series.

That being said, the commonly occurring seasonal periods are a day, week, month, quarter (or season), and year.

Seasonality is also observed on much longer time scales such as in the solar cycle, which follows a roughly 11 year period.
Daily sunspot count. Data source: SILSO (Image by Author)

The Trend component

The Trend component refers to the pattern in the data that spans across seasonal periods.

The time series of retail eCommerce sales shown below demonstrates a possibly quadratic trend (y = x²) that spans across the 12 month long seasonal period:
Retail eCommerce sales. Data source: US FRED (Image by Author)

The Cyclical component

The cyclical component represents phenomena that happen across seasonal periods. Cyclical patterns do not have a fixed period like seasonal patterns do. An example of a cyclical pattern is the cycles of boom and bust that stock markets experience in response to world events.
Dow Jones % change in closing price from previous year (1880–2020). Data source: MeasuringWorth.com via Wikipedia (Image by Author)

The cyclical component is hard to isolate and it's often ‘left alone’ by combining it with the trend component.

The Noise component

The noise or the random component is what remains behind when you separate out seasonality and trend from the time series. Noise is the effect of factors that you do not know, or which you cannot measure. It is the effect of the known unknowns, or the unknown unknowns.

Additive and Multiplicative effects

The trend, seasonal and noise components can combine in an additive or a multiplicative way.

Additive combination
If the seasonal and noise components change the trend by
an amount that is independent of the value of trend, the
trend, seasonal and noise components are said to behave in
an additive way. One can represent this situation as
follows:

y_i = t_i + s_i + n_i

where y_i = the value of the time series at the ith time step.
t_i = the trend component at the ith time step.
s_i = the seasonal component at the ith time step.
n_i = the noise component at the ith time step.

Multiplicative combination
If the seasonal and noise components change the trend by
an amount that depends on the value of trend, the three
components are said to behave in a multiplicative way as
follows:

y_i = t_i * s_i * n_i
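
The difference between the two combinations is easy to see on synthetic data. The following is a minimal sketch in which the linear trend, the sine-shaped seasonality and the noise levels are all made-up assumptions. It measures the seasonal swing around the trend in the first and last year of each series, and also shows that taking logs turns a multiplicative combination into an additive one.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(120)                      # 10 years of monthly time steps
trend = 100 + 2.0 * t                   # t_i (assumed linear)

# Additive: y_i = t_i + s_i + n_i
seasonal_add = 10 * np.sin(2 * np.pi * t / 12)
noise_add = rng.normal(0, 0.5, t.size)
y_additive = trend + seasonal_add + noise_add

# Multiplicative: y_i = t_i * s_i * n_i
seasonal_mul = 1 + 0.1 * np.sin(2 * np.pi * t / 12)
noise_mul = rng.normal(1, 0.01, t.size)
y_multiplicative = trend * seasonal_mul * noise_mul

def yearly_swing(y):
    """Peak-to-trough size of the deviation from the trend, per year."""
    deviation = (y - trend).reshape(-1, 12)
    return deviation.max(axis=1) - deviation.min(axis=1)

swing_add = yearly_swing(y_additive)        # stays roughly constant
swing_mul = yearly_swing(y_multiplicative)  # grows with the trend

# Logs convert the product into a sum: log y = log t + log s + log n
log_sum = np.log(trend) + np.log(seasonal_mul) + np.log(noise_mul)
```

In the additive series the swing is about the same size in every year, while in the multiplicative series it scales with the level of the trend.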

A step-by-step procedure for decomposing a time series into trend, seasonal and noise components using Python

There are many decomposition methods available ranging from simple moving average based methods to powerful ones such as STL.

In Python, the statsmodels library has a seasonal_decompose() function that lets you decompose a time series into trend, seasonality and noise in one line of code.

In my articles, we like to get into the weeds. So before we use seasonal_decompose(), let’s do a deep dive into a simple, yet powerful time series decomposition technique.

Let’s understand how decomposition really works under the covers.

We’ll hand-crank out the decomposition of a time series into its trend, seasonal and noise components using a simple procedure based on moving averages using the following steps:

STEP 1: Identify the length of the seasonal period
STEP 2: Isolate the trend
STEP 3: Isolate the seasonality+noise
STEP 4: Isolate the seasonality
STEP 5: Isolate the noise

We’ll use as an example the following time series of retail sales of used car dealers in the US:
Retail Used Car Sales. Data source: US FRED (Image by Author)

Let’s load the data into a pandas DataFrame and plot the
time series:
import math
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt

# Load the data, then parse the day-month-year (two digit year) dates
# into a DatetimeIndex
df = pd.read_csv('retail_sales_used_car_dealers_us_1992_2020.csv', header=0)
df['DATE'] = pd.to_datetime(df['DATE'], format='%d-%m-%y')
df = df.set_index('DATE')

# Plot the time series
fig = plt.figure()
fig.suptitle('Retail sales of used car dealers in the US in millions of dollars')
df['Retail_Sales'].plot()
plt.show()

Now let’s begin the step by step decomposition of this time series.

STEP 1: Try to guess the duration of the seasonal component in your data. In the above example, we’ll guess it to be 12 months.
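
If you would rather check the guess than eyeball it, one common approach is to look for the lag at which the series correlates most strongly with itself. Here is a minimal sketch on a synthetic, trend-free monthly series; the series and the range of candidate lags are assumptions for illustration. A real series with a trend, such as the used car sales data, would usually need the trend removed first (for example by differencing) before this works well.

```python
import numpy as np
import pandas as pd

# Synthetic monthly series with a 12 month cycle plus noise (not the real data)
rng = np.random.default_rng(0)
t = np.arange(240)
series = pd.Series(10 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 1, t.size))

# Autocorrelation at each candidate lag; the seasonal period shows up
# as the lag with the strongest positive autocorrelation.
candidate_lags = range(2, 19)
autocorr_by_lag = {lag: series.autocorr(lag=lag) for lag in candidate_lags}
best_lag = max(autocorr_by_lag, key=autocorr_by_lag.get)
print(best_lag)
```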

STEP 2: Now run a 12 month centered moving average on the data. This moving average is spread across a total of 13 months, i.e. 6 months each on the left and right side of the center month. The 12 month centered MA is an average of two 12 month moving averages that are shifted from each other by 1 month, effectively making it a weighted moving average.
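
The claim that averaging two shifted 12 month moving averages gives a 13 term weighted moving average can be checked numerically. This is a minimal sketch on random numbers (the data and variable names are assumptions for illustration); it compares the two-step calculation with a single pass that uses the weights 1/24, 1/12, ..., 1/12, 1/24.

```python
import numpy as np

rng = np.random.default_rng(42)
y = rng.normal(100, 10, 48)   # 4 years of made-up monthly values

# Two-step version: a plain 12 month MA, then the average of each
# pair of neighbouring MA values (which are shifted by 1 month).
ma12 = np.convolve(y, np.ones(12) / 12, mode='valid')  # ma12[k] = mean(y[k:k+12])
cma_two_step = (ma12[:-1] + ma12[1:]) / 2              # centered on y[k+6]

# One-step version: 13 weights, with the two end points at half weight.
weights = np.concatenate(([1 / 24], np.full(11, 1 / 12), [1 / 24]))
cma_weighted = np.convolve(y, weights, mode='valid')

print(np.allclose(cma_two_step, cma_weighted))
```

The weights add up to 1, and because they are symmetric around the center month, the result is aligned with the middle of the 13 month window.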

Here is an illustration of how this centered MA can be calculated in Microsoft Excel:
Illustration of a 2 x 12 centered moving average (Image by Author)

This MA will smooth out seasonality and noise and bring out the trend.

Continuing with our Python example, here is how we can calculate the centered moving average in Python:
# Add an empty column to store the 2x12 centered MA values
df['2 x 12 CMA (TREND)'] = np.nan

# Fill it up with the 2x12 centered MA values.
# The rounding precision is cut off in the source listing, so the
# values are stored unrounded here.
sales = df['Retail_Sales'].to_numpy()
trend_col = df.columns.get_loc('2 x 12 CMA (TREND)')
for i in range(6, sales.size - 6):
    df.iloc[i, trend_col] = (
        sales[i - 6] * 1.0 / 24
        # sum of the 11 values from index i-5 through i+5
        + sales[i - 5:i + 6].sum() * 1.0 / 12
        + sales[i + 6] * 1.0 / 24
    )


Notice how the values at indices [i-6] and [i+6] are weighted by 1.0/24 while the rest of the values are each weighted by 1.0/12.
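
The same weighted average can also be written with pandas rolling windows. The following is a sketch, not the method used in this article's loop: it chains a 12 month rolling mean with a 2 month rolling mean and shifts the result back by 6 months to center it, then checks the result against an explicit loop on made-up data.

```python
import numpy as np
import pandas as pd

# Made-up monthly data standing in for the sales column
idx = pd.date_range('2000-01-01', periods=48, freq='MS')
sales = pd.Series(np.random.default_rng(1).normal(1000, 50, 48), index=idx)

# 12 month trailing mean, averaged with its 1 month neighbour,
# then shifted so that each value sits on the center month.
trend_rolling = sales.rolling(12).mean().rolling(2).mean().shift(-6)

# Explicit loop with the 1/24, 1/12, ..., 1/12, 1/24 weights for comparison
values = sales.to_numpy()
trend_loop = np.full(values.size, np.nan)
for i in range(6, values.size - 6):
    trend_loop[i] = (values[i - 6] / 24
                     + values[i - 5:i + 6].sum() / 12
                     + values[i + 6] / 24)

print(np.allclose(trend_rolling.to_numpy(), trend_loop, equal_nan=True))
```

Either way, the first 6 and last 6 months have no trend value because the 13 month window does not fit there.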

Let’s plot the resulting time series that is contained in column ‘2 x 12 CMA (TREND)’:
# Plot the trend component
fig = plt.figure()
fig.suptitle('TREND component of Retail sales of used car dealers in the US in millions of dollars')
df['2 x 12 CMA (TREND)'].plot()
plt.show()

As you can see, our moving average transformation has
highlighted the trend component of the retail sales time
series:

(Image by Author)

STEP 3: Now we have a decision to make. Depending on whether the composition is multiplicative or additive, we’ll need to divide or subtract the trend component from the original time series to retrieve the seasonal and noise components. If we inspect the original car sales time series, we can see that the seasonal swings are increasing in proportion to the current value of the time series. Hence we’ll assume that the seasonality is multiplicative. We’ll also take a small leap of faith to assume that the noise is multiplicative.

Thus the retail used car sales time series is assumed to have
the following multiplicative decomposition model:

Time series value = trend component * seasonal component * noise component

Therefore:

seasonal component * noise component = Time series value / trend component

We’ll add a new column into our data frame and fill it with
the product of the seasonal and noise components using the
above formula.
df['SEASONALITY AND NOISE'] = df['Retail_Sales'] / df['2 x 12 CMA (TREND)']

Let’s plot the new column. This time, we will see the
seasonality and noise showing through:
fig = plt.figure()
fig.suptitle('SEASONALITY and NOISE components')
plt.ylim(0, 1.3)
df['SEASONALITY AND NOISE'].plot()
plt.show()
(Image by Author)

STEP 4: Next, we will get the ‘pure’ seasonal component out of the mixture of seasonality and noise, by calculating the average value of the seasonal component for all January months, all February months, all March months and so on.
# First add a month column (1 = January, ..., 12 = December)
df['MONTH'] = df.index.month

# Initialize the month based dictionaries to store the running total
# of the month wise seasonal sums and counts
average_seasonal_values = {m: 0 for m in range(1, 13)}
average_seasonal_value_counts = {m: 0 for m in range(1, 13)}

# Calculate the sums and counts, skipping the months with no trend value
for i in range(0, df['SEASONALITY AND NOISE'].size):
    value = df['SEASONALITY AND NOISE'].iloc[i]
    month = df['MONTH'].iloc[i]
    if not math.isnan(value):
        average_seasonal_values[month] += value
        average_seasonal_value_counts[month] += 1

# Calculate the average seasonal component for each month
for i in range(1, 13):
    average_seasonal_values[i] = average_seasonal_values[i] / average_seasonal_value_counts[i]

# Create a new column in the data frame and fill it with the value of
# the average seasonal component for the corresponding month
df['SEASONALITY'] = np.nan
seasonality_col = df.columns.get_loc('SEASONALITY')
for i in range(0, df['SEASONALITY AND NOISE'].size):
    if not math.isnan(df['SEASONALITY AND NOISE'].iloc[i]):
        df.iloc[i, seasonality_col] = average_seasonal_values[df['MONTH'].iloc[i]]
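
The month-wise averaging can also be expressed with a pandas groupby. This is a minimal sketch on a made-up ratio series (the data and names are assumptions for illustration): it groups by calendar month, broadcasts each month's mean back onto its rows, and blanks out the rows where the ratio itself is missing, as the loop does.

```python
import numpy as np
import pandas as pd

# Made-up 'seasonality and noise' ratios with missing values at both ends,
# mimicking the gaps left by the centered moving average
idx = pd.date_range('2000-01-01', periods=36, freq='MS')
rng = np.random.default_rng(7)
month_numbers = idx.month.to_numpy()
ratio = pd.Series(
    1 + 0.1 * np.sin(2 * np.pi * month_numbers / 12) + rng.normal(0, 0.01, 36),
    index=idx,
)
ratio.iloc[:6] = np.nan
ratio.iloc[-6:] = np.nan

# Mean ratio per calendar month (NaNs are skipped), repeated on every row
monthly_mean = ratio.groupby(ratio.index.month).transform('mean')

# Keep a seasonal value only where the ratio itself exists
seasonality = monthly_mean.where(ratio.notna())
```

Every January then carries the same seasonal factor, every February the same one, and so on.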

Let's plot this pure seasonal component:


# Plot the seasonal component
fig = plt.figure()
fig.suptitle('The \'pure\' SEASONAL component')
plt.ylim(0, 1.3)
df['SEASONALITY'].plot()
plt.show()
(Image by Author)

STEP 5: Finally, we will divide the noisy seasonal value that we had isolated earlier by the averaged out seasonal value to yield just the noise component for each month.

noise component = noisy seasonal component / averaged out seasonal component
df['NOISE'] = df['SEASONALITY AND NOISE'] / df['SEASONALITY']

# Plot the noise component
fig = plt.figure()
fig.suptitle('The NOISE component')
plt.ylim(0, 1.3)
df['NOISE'].plot()
plt.show()
(Image by Author)
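
A quick sanity check on any multiplicative decomposition is that multiplying the three components back together returns the original series. Here is a minimal end-to-end sketch of the five steps on synthetic data; the series, its constants and the compact pandas expressions are assumptions for illustration rather than the code used above.

```python
import numpy as np
import pandas as pd

# Synthetic multiplicative series: rising trend * 12 month seasonality * noise
idx = pd.date_range('2000-01-01', periods=60, freq='MS')
t = np.arange(60)
rng = np.random.default_rng(3)
series = pd.Series(
    (500 + 5 * t) * (1 + 0.2 * np.sin(2 * np.pi * t / 12)) * rng.normal(1, 0.02, 60),
    index=idx,
)

# STEP 2: 2 x 12 centered moving average
trend = series.rolling(12).mean().rolling(2).mean().shift(-6)
# STEP 3: seasonality * noise
seasonality_and_noise = series / trend
# STEP 4: month-wise average of the ratio
seasonality = (seasonality_and_noise
               .groupby(series.index.month)
               .transform('mean')
               .where(trend.notna()))
# STEP 5: noise
noise = seasonality_and_noise / seasonality

# Recompose and compare wherever the trend is defined
valid = trend.notna()
recomposed = trend * seasonality * noise
print(np.allclose(recomposed[valid], series[valid]))
```

Because each month's seasonal factor is the mean of that month's ratios, the noise values for each month average out to 1.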

So there you have it! We just hand cranked out the procedure for decomposing a time series into its trend, seasonal and noise components.

Here is a collage of the time series and its constituent components:
(Image by Author)

Time series decomposition using statsmodels

Now that we know how decomposition works from the inside, we can cheat a little, and use seasonal_decompose() in statsmodels to do all of the above work in one line of code:
from statsmodels.tsa.seasonal import seasonal_decompose

components = seasonal_decompose(df['Retail_Sales'], model='multiplicative')
components.plot()
plt.show()

Here’s the plot we get:


Output of seasonal_decompose() on the Retail Used Car Sales data set (Image by Author)

Here is the complete Python source code:

And here is the link to the data set used in the Python example.
