
ML

Assignment Presentation
Competition: https://www.kaggle.com/c/m5-forecasting-accuracy

In this competition, contestants are challenged to forecast future sales at Walmart based on hierarchical sales data from stores in the states of California, Texas, and
Wisconsin. Forecasting sales, revenue, and stock prices is a classic application of machine learning in economics, and it is important because it allows
investors to make guided decisions based on forecasts made by algorithms.

In this python3 notebook, I will briefly explain the structure of the dataset. Then, I will visualize the dataset using Matplotlib and Plotly. Finally, I will
demonstrate how this problem can be approached with a variety of forecasting algorithms.

Contents
Objective

Import Libraries

Load Data

Data Exploration and Preparation

Data Viewing

Denoising - removing noise

Wavelet Denoising

Average Smoothing

Exploratory Data Analytics (EDA)

Train Test Split

Modeling

Naive Approach

Moving Average

Exponential Smoothing

Prophet - by Facebook

Loss comparisons for models

Future Work

Objective
To analyze the hierarchical sales data from Walmart, the world’s largest company by revenue, to forecast daily sales. The data covers stores in three
US States (California, Texas, and Wisconsin) and includes item level, department, product categories, and store details. In addition, it has explanatory
variables such as price, promotions, day of the week, and special events.

We aim to explore the data and use different models to predict the sales.

This is a time series forecasting problem and, as no labelled test data is given, we shall split the dataset into training and testing time frames.

Import Libraries
In [1]:

import os
import gc
import time
import math
import datetime
from math import log, floor
from sklearn.neighbors import KDTree

import numpy as np
import pandas as pd
from pathlib import Path
from sklearn.utils import shuffle
from tqdm.notebook import tqdm as tqdm

import seaborn as sns


from matplotlib import colors
import matplotlib.pyplot as plt
from matplotlib.colors import Normalize

import plotly.express as px
import plotly.graph_objects as go
import plotly.figure_factory as ff
from plotly.subplots import make_subplots

import pywt
from statsmodels.robust import mad

import scipy
import statsmodels
from scipy import signal
import statsmodels.api as sm
from fbprophet import Prophet
from scipy.signal import butter, deconvolve
from statsmodels.tsa.arima_model import ARIMA
from statsmodels.tsa.api import ExponentialSmoothing, SimpleExpSmoothing, Holt

import warnings
warnings.filterwarnings("ignore")

%matplotlib inline

Load Data
In [2]:

calendar = pd.read_csv('calendar.csv')
selling_prices = pd.read_csv('sell_prices.csv')
sample_submission = pd.read_csv('sample_submission.csv')
sales_train_val = pd.read_csv('sales_train_validation.csv')

Data Exploration and Preparation

Data Viewing
In [3]:

print(calendar.head())

date wm_yr_wk weekday wday month year d event_name_1 \


0 2011-01-29 11101 Saturday 1 1 2011 d_1 NaN
1 2011-01-30 11101 Sunday 2 1 2011 d_2 NaN
2 2011-01-31 11101 Monday 3 1 2011 d_3 NaN
3 2011-02-01 11101 Tuesday 4 2 2011 d_4 NaN
4 2011-02-02 11101 Wednesday 5 2 2011 d_5 NaN

event_type_1 event_name_2 event_type_2 snap_CA snap_TX snap_WI


0 NaN NaN NaN 0 0 0
1 NaN NaN NaN 0 0 0
2 NaN NaN NaN 0 0 0
3 NaN NaN NaN 1 1 0
4 NaN NaN NaN 1 0 1

Calendar contains information about the dates on which the products are sold
In [4]:

print(selling_prices.head())

store_id item_id wm_yr_wk sell_price


0 CA_1 HOBBIES_1_001 11325 9.58
1 CA_1 HOBBIES_1_001 11326 9.58
2 CA_1 HOBBIES_1_001 11327 8.26
3 CA_1 HOBBIES_1_001 11328 8.26
4 CA_1 HOBBIES_1_001 11329 8.26

selling_prices contains the price of each product per store and week (wm_yr_wk)
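As a quick illustrative sketch (an addition for clarity, not part of the original notebook), the wm_yr_wk column is what links selling_prices to calendar, so the weekly prices can be expanded to daily prices. The item and store ids used here are just the first ones shown above.

# Expand the weekly sell prices of one item-store pair to daily prices via the calendar
one_item = selling_prices[(selling_prices['item_id'] == 'HOBBIES_1_001') &
                          (selling_prices['store_id'] == 'CA_1')]
daily_price = calendar[['date', 'd', 'wm_yr_wk']].merge(one_item, on='wm_yr_wk', how='inner')
print(daily_price[['date', 'd', 'sell_price']].head())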

In [5]:

print(sales_train_val.head())

id item_id dept_id cat_id store_id \


0 HOBBIES_1_001_CA_1_validation HOBBIES_1_001 HOBBIES_1 HOBBIES CA_1
1 HOBBIES_1_002_CA_1_validation HOBBIES_1_002 HOBBIES_1 HOBBIES CA_1
2 HOBBIES_1_003_CA_1_validation HOBBIES_1_003 HOBBIES_1 HOBBIES CA_1
3 HOBBIES_1_004_CA_1_validation HOBBIES_1_004 HOBBIES_1 HOBBIES CA_1
4 HOBBIES_1_005_CA_1_validation HOBBIES_1_005 HOBBIES_1 HOBBIES CA_1

state_id d_1 d_2 d_3 d_4 ... d_1904 d_1905 d_1906 d_1907 d_1908 \
0 CA 0 0 0 0 ... 1 3 0 1 1
1 CA 0 0 0 0 ... 0 0 0 0 0
2 CA 0 0 0 0 ... 2 1 2 1 1
3 CA 0 0 0 0 ... 1 0 5 4 1
4 CA 0 0 0 0 ... 2 1 1 0 1

d_1909 d_1910 d_1911 d_1912 d_1913


0 1 3 0 1 1
1 1 0 0 0 0
2 1 0 1 1 1
3 0 1 3 7 2
4 1 2 2 2 4

[5 rows x 1919 columns]

sales_train_val contains the historical daily unit sales data per product and store [d_1 - d_1913]
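As a small illustrative sketch (an addition, assuming the dataframes loaded above), the wide d_1 ... d_1913 columns of a single series can be aligned with actual calendar dates as follows; the variable names here are only for illustration.

# Align the daily sales of the first product-store series with calendar dates
series = sales_train_val.iloc[0]
day_cols = [c for c in sales_train_val.columns if c.startswith('d_')]
daily = pd.DataFrame({'d': day_cols, 'sales': series[day_cols].values})
daily = daily.merge(calendar[['d', 'date']], on='d', how='left')
print(daily.head())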

To see what we need to submit, let's look at the sample submission.

In [6]:

print(sample_submission.head())

id F1 F2 F3 F4 F5 F6 F7 F8 F9 ... \
0 HOBBIES_1_001_CA_1_validation 0 0 0 0 0 0 0 0 0 ...
1 HOBBIES_1_002_CA_1_validation 0 0 0 0 0 0 0 0 0 ...
2 HOBBIES_1_003_CA_1_validation 0 0 0 0 0 0 0 0 0 ...
3 HOBBIES_1_004_CA_1_validation 0 0 0 0 0 0 0 0 0 ...
4 HOBBIES_1_005_CA_1_validation 0 0 0 0 0 0 0 0 0 ...

F19 F20 F21 F22 F23 F24 F25 F26 F27 F28
0 0 0 0 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0 0 0 0
4 0 0 0 0 0 0 0 0 0 0

[5 rows x 29 columns]

Below are sales data from six randomly selected product-store series in the dataset.
In [7]:

ids = sorted(list(set(sales_train_val['id'])))
d_cols = [c for c in sales_train_val.columns if 'd_' in c]
x_1 = sales_train_val.loc[sales_train_val['id'] == ids[0]].set_index('id')[d_cols].values[0]
x_2 = sales_train_val.loc[sales_train_val['id'] == ids[412]].set_index('id')[d_cols].values[0]
x_3 = sales_train_val.loc[sales_train_val['id'] == ids[656]].set_index('id')[d_cols].values[0]
x_4 = sales_train_val.loc[sales_train_val['id'] == ids[941]].set_index('id')[d_cols].values[0]
x_5 = sales_train_val.loc[sales_train_val['id'] == ids[1398]].set_index('id')[d_cols].values[0]
x_6 = sales_train_val.loc[sales_train_val['id'] == ids[1917]].set_index('id')[d_cols].values[0]

fig = make_subplots(rows=2, cols=3)

fig.add_trace(go.Scatter(x=np.arange(len(x_1)), y=x_1,mode='lines',
name="First sample",marker=dict(color="green")),row=1, col=1)

fig.add_trace(go.Scatter(x=np.arange(len(x_2)), y=x_2,mode='lines',
name="Second sample",marker=dict(color="violet")),row=1, col=2)

fig.add_trace(go.Scatter(x=np.arange(len(x_3)), y=x_3,mode='lines',
name="Third sample",marker=dict(color="blue")),row=1, col=3)

fig.add_trace(go.Scatter(x=np.arange(len(x_4)), y=x_4,mode='lines',
name="Fourth sample",marker=dict(color="red")),row=2, col=1)

fig.add_trace(go.Scatter(x=np.arange(len(x_5)), y=x_5,mode='lines',
name="Fifth sample",marker=dict(color="yellow")),row=2, col=2)

fig.add_trace(go.Scatter(x=np.arange(len(x_6)), y=x_6,mode='lines',
name="Sixth sample",marker=dict(color="gray")),row=2, col=3)

fig.update_layout(height=500, width=800, title_text="Sample sales")


fig.show()

[Figure: Sample sales. Six line subplots of daily unit sales for the six selected series.]

The sales data is very erratic, owing to the fact that so many factors affect the sales on a given day.

On certain days, the sales quantity is zero, which indicates that a certain product may not be available on that day

Denoising
This method may lose some information from the original time series, but it can be useful for extracting general trends. Denoising is done to remove rare occurrences so that the overall trend stands out.

We shall look at two denoising methods:

Wavelet Denoising
Average Smoothing
Wavelet Denoising
Wavelet denoising (commonly used with electrical signals) is a way to remove unnecessary noise from a time series. The method computes so-called "wavelet coefficients", which decide which pieces of information to keep (signal) and which to discard (noise). We use the MAD (mean absolute deviation) of the finest-level coefficients to estimate the randomness in the sales and, from that, a minimum threshold for the wavelet coefficients. We then filter out the coefficients below this threshold and reconstruct the sales data from the remaining ones.

In [8]:

def maddest(d, axis=None):
    # Mean absolute deviation of the wavelet coefficients
    return np.mean(np.absolute(d - np.mean(d, axis)), axis)

def denoise_signal(x, wavelet='db4', level=1):
    # Multilevel wavelet decomposition of the series
    coeff = pywt.wavedec(x, wavelet, mode="per")
    # Estimate the noise level from the finest detail coefficients
    sigma = (1/0.6745) * maddest(coeff[-level])
    # Universal threshold
    uthresh = sigma * np.sqrt(2 * np.log(len(x)))
    # Hard-threshold the detail coefficients and reconstruct the signal
    coeff[1:] = (pywt.threshold(i, value=uthresh, mode='hard') for i in coeff[1:])
    return pywt.waverec(coeff, wavelet, mode='per')

From here on, we shall experiment on the first three samples alone.

In [9]:

# Wavelet Denoising

y_w1 = denoise_signal(x_1)
y_w2 = denoise_signal(x_2)
y_w3 = denoise_signal(x_3)

fig = make_subplots(rows=3, cols=1)

fig.add_trace(
go.Scatter(x=np.arange(len(x_1)), mode='lines+markers', y=x_1,
marker=dict(color="lightgreen"), showlegend=False,
name="Original signal"),
row=1, col=1
)

fig.add_trace(
go.Scatter(x=np.arange(len(x_1)), y=y_w1, mode='lines',
marker=dict(color="darkgreen"), showlegend=False,
name="Denoised signal"),
row=1, col=1
)

fig.add_trace(
go.Scatter(x=np.arange(len(x_2)), mode='lines+markers', y=x_2,
marker=dict(color="yellow"), showlegend=False),
row=2, col=1
)

fig.add_trace(
go.Scatter(x=np.arange(len(x_2)), y=y_w2, mode='lines',
marker=dict(color="red"), showlegend=False),
row=2, col=1
)

fig.add_trace(
go.Scatter(x=np.arange(len(x_3)), mode='lines+markers', y=x_3,
marker=dict(color="lightblue"), showlegend=False),
row=3, col=1
)

fig.add_trace(
go.Scatter(x=np.arange(len(x_3)), y=y_w3, mode='lines',
marker=dict(color="darkblue"), showlegend=False),
row=3, col=1
)

fig.update_layout(height=1200, width=800, title_text="Original (pale) vs. Denoised (dark) sales")


fig.show()
[Figure: Original (pale) vs. Denoised (dark) sales. Three stacked line plots comparing each original sample with its wavelet-denoised version.]

Looking at a small part (the first 90 days) of sample 1 alone:


In [10]:

tx_1 = sales_train_val.loc[sales_train_val['id'] == ids[0]].set_index('id')[d_cols].values[0][:90]


ty_w1 = denoise_signal(tx_1)
fig = make_subplots(rows=1, cols=1)

fig.add_trace(
go.Scatter(x=np.arange(len(tx_1)), mode='lines+markers', y=tx_1,
marker=dict(color="lightgreen"), showlegend=False,
name="Original signal"),
row=1, col=1
)

fig.add_trace(
go.Scatter(x=np.arange(len(tx_1)), y=ty_w1, mode='lines',
marker=dict(color="darkgreen"), showlegend=False,
name="Denoised signal"),
row=1, col=1
)

fig.show()

[Figure: original vs. wavelet-denoised sales for the first 90 days of sample 1.]

The below diagram illustrates these graphs side by side: the light traces represent the original sales and the dark traces represent the denoised sales.
In [11]:

fig = make_subplots(rows=3, cols=2)

fig.add_trace(
go.Scatter(x=np.arange(len(x_1)), mode='lines+markers', y=x_1,
marker=dict(color="lightgreen"),name="Original signal 1"),row=1, col=1
)

fig.add_trace(
go.Scatter(x=np.arange(len(x_1)), y=y_w1, mode='lines',
marker=dict(color="darkgreen"),name="Denoised signal 1"),row=1, col=2
)

fig.add_trace(
go.Scatter(x=np.arange(len(x_2)), mode='lines+markers', y=x_2,
marker=dict(color="yellow"),name="Original signal 2"),row=2, col=1
)

fig.add_trace(
go.Scatter(x=np.arange(len(x_2)), y=y_w2, mode='lines',
marker=dict(color="red"),name="Denoised signal 2"),row=2, col=2
)

fig.add_trace(
go.Scatter(x=np.arange(len(x_3)), mode='lines+markers', y=x_3,
marker=dict(color="lightblue"),name="Original signal 3"), row=3, col=1
)

fig.add_trace(
go.Scatter(x=np.arange(len(x_3)), y=y_w3, mode='lines',
marker=dict(color="darkblue"),name="Denoised signal 3"), row=3, col=2
)

fig.update_layout(height=600, width=800, title_text="Original (light) vs. Wavelet Denoised (dark) sales")


fig.show()

[Figure: Original (light) vs. Wavelet Denoised (dark) sales. Original series in the left column, denoised versions in the right column.]

Average smoothing
Average smoothing is a relatively simple way to denoise time series data. In this method, we take a "window" with a fixed size (say 10). We first place the
window at the beginning of the time series (the first ten elements) and calculate the mean of that section. We then move the window forward across the time series
by a fixed "stride", calculate the mean of the new window, and repeat the process until we reach the end of the time series.
All the mean values we calculated are then concatenated into a new time series, which forms the denoised sales data.
In [12]:

# Functions for average smoothing

def average_smoothing(signal, kernel_size=3, stride=1):
    # Slide a window of length `kernel_size` over the signal; each step contributes the
    # window mean, repeated `stride` times so the output length roughly matches the input
    sample = []
    start = 0
    end = kernel_size
    while end <= len(signal):
        sample.extend(np.ones(stride) * np.mean(signal[start:end]))
        start += stride
        end += stride
    return np.array(sample)

In [13]:

# Perform average smoothing on the first 3 samples alone

y_w1 = average_smoothing(x_1)
y_w2 = average_smoothing(x_2)
y_w3 = average_smoothing(x_3)

fig = make_subplots(rows=3, cols=1)

fig.add_trace(
go.Scatter(x=np.arange(len(x_1)), mode='lines+markers', y=x_1,
marker=dict(color="lightgreen"),name="Original signal 1"),row=1, col=1
)

fig.add_trace(
go.Scatter(x=np.arange(len(x_1)), y=y_w1, mode='lines',
marker=dict(color="darkgreen"),name="Denoised signal 1"),row=1, col=1
)

fig.add_trace(
go.Scatter(x=np.arange(len(x_2)), mode='lines+markers', y=x_2,
marker=dict(color="yellow"),name="Original signal 2"),row=2, col=1
)

fig.add_trace(
go.Scatter(x=np.arange(len(x_2)), y=y_w2, mode='lines',
marker=dict(color="orange"),name="Denoised signal 2"),row=2, col=1
)

fig.add_trace(
go.Scatter(x=np.arange(len(x_3)), mode='lines+markers', y=x_3,
marker=dict(color="lightblue"),name="Original signal 3"), row=3, col=1
)

fig.add_trace(
go.Scatter(x=np.arange(len(x_3)), y=y_w3, mode='lines',
marker=dict(color="darkblue"),name="Denoised signal 3"), row=3, col=1
)

fig.update_layout(height=900, width=800, title_text="Original (light) vs. Average Smoothing Denoised (dark) sales")
fig.show()
[Figure: Original (light) vs. Average Smoothing Denoised (dark) sales. Three stacked line plots comparing each original sample with its smoothed version.]

Looking at a small part (the first 90 days) of sample 1 alone:


In [14]:

tx_1 = sales_train_val.loc[sales_train_val['id'] == ids[0]].set_index('id')[d_cols].values[0][:90]


ty_w1 = average_smoothing(tx_1)
fig = make_subplots(rows=1, cols=1)

fig.add_trace(
go.Scatter(x=np.arange(len(tx_1)), mode='lines+markers', y=tx_1,
marker=dict(color="lightgreen"), showlegend=False,
name="Original signal"),
row=1, col=1
)

fig.add_trace(
go.Scatter(x=np.arange(len(tx_1)), y=ty_w1, mode='lines',
marker=dict(color="darkgreen"), showlegend=False,
name="Denoised signal"),
row=1, col=1
)

fig.show()

[Figure: original vs. average-smoothed sales for the first 90 days of sample 1.]

In the above graphs, the dark line plots represent the denoised sales and the light line plots represent the original sales. We can see that average
smoothing is not as effective as wavelet denoising at finding macroscopic trends and patterns in the data: a lot of the noise in the original sales
persists even after denoising. Therefore, wavelet denoising is clearly more effective at finding trends in the sales data. Nonetheless, average
smoothing, or the "rolling mean", can also be used to calculate useful features for modeling.

EDA

Rolling Average Sales vs. Time for each store


In [15]:

past_sales = sales_train_val.set_index('id')[d_cols].T.merge(calendar.set_index('d')['date'],
                                                             left_index=True, right_index=True,
                                                             validate='1:1').set_index('date')

store_list = list(selling_prices['store_id'].unique())
means = []
fig = go.Figure()
for s in store_list:
    # Columns of past_sales belonging to this store
    store_items = [c for c in past_sales.columns if s in c]
    # 90-day rolling average of the store's total daily sales
    data = past_sales[store_items].sum(axis=1).rolling(90).mean()
    means.append(np.mean(past_sales[store_items].sum(axis=1)))
    fig.add_trace(go.Scatter(x=np.arange(len(data)), y=data, name=s))

fig.update_layout(yaxis_title="Sales", xaxis_title="Time", title="Rolling Average Sales vs. Time (per store)")

[Figure: Rolling Average Sales vs. Time (per store). 90-day rolling average of total daily sales for each of the ten stores.]

The above graph corresponds to how each retail store performs over time. This insight was inspired by the M5 starter data exploration notebook by Rob Mulla.
In [16]:

# Box plot of each store's rolling average sales

fig = go.Figure()

for i, s in enumerate(store_list):
    store_items = [c for c in past_sales.columns if s in c]
    data = past_sales[store_items].sum(axis=1).rolling(90).mean()
    fig.add_trace(go.Box(x=[s]*len(data), y=data, name=s))

fig.update_layout(yaxis_title="Sales", xaxis_title="Store name", title="Rolling Average Sales vs. Store name")

[Figure: Rolling Average Sales vs. Store name. Box plots of the 90-day rolling average sales for each store.]
In [17]:

# Avg sales by store

df = pd.DataFrame(np.transpose([means, store_list]))
df.columns = ["Mean sales", "Store name"]
px.bar(df, y="Mean sales", x="Store name", color="Store name", title="Avg sales vs. Store name")

[Figure: Avg sales vs. Store name. Bar chart of mean daily sales per store.]

Results from the EDA

CA_3 is the best performing store
CA_4 is the worst performing store
Wisconsin stores are all about the same in performance
WI_1 has exceeded usual improvement rates in the given time frame
TX_3 took a bad fall but recovered

Train Test Split


In [18]:
train_dataset = sales_train_val[d_cols[:1500]]
val_dataset = sales_train_val[d_cols[1500:]]
In [19]:

fig = make_subplots(rows=3, cols=1)

fig.add_trace(
go.Scatter(x=np.arange(1500), mode='lines', y=train_dataset.loc[0].values,
marker=dict(color="blue"), showlegend=False,
name="Original signal"),
row=1, col=1
)

fig.add_trace(
go.Scatter(x=np.arange(1500, 1919), y=val_dataset.loc[0].values, mode='lines',
marker=dict(color="orange"), showlegend=False,
name="Denoised signal"),
row=1, col=1
)

fig.add_trace(
go.Scatter(x=np.arange(1500), mode='lines', y=train_dataset.loc[412].values,
marker=dict(color="blue"), showlegend=False),
row=2, col=1
)

fig.add_trace(
go.Scatter(x=np.arange(1500, 1919), y=val_dataset.loc[412].values, mode='lines',
marker=dict(color="orange"), showlegend=False),
row=2, col=1
)

fig.add_trace(
go.Scatter(x=np.arange(1500), mode='lines', y=train_dataset.loc[656].values,
marker=dict(color="blue"), showlegend=False),
row=3, col=1
)

fig.add_trace(
go.Scatter(x=np.arange(1500, 1919), y=val_dataset.loc[656].values, mode='lines',
marker=dict(color="orange"), showlegend=False),
row=3, col=1
)

fig.update_layout(height=1200, width=800, title_text="Train (blue) vs. Validation (orange) sales")


fig.show()
[Figure: Train (blue) vs. Validation (orange) sales. Three sample series with the training and validation segments in different colors.]

The above graph shows three sample series split into train and validation regions, represented by blue and orange respectively. The first 1500 days are
used for training and the remaining 413 days for validation.

Modeling
We shall use four different methods to forecast future sales and compare them:

Naive Approach
Moving Average
Exponential Smoothing
Prophet - by Facebook

Naive Approach
The naive approach simply forecasts the next day's sales as the current day's sales. The model can be summarized as follows:

$\hat{y}_{t+1} = y_t$

In the above equation, $\hat{y}_{t+1}$ is the predicted value for the next day's sales and $y_t$ is today's sales. The model predicts tomorrow's sales as today's sales.
Now let us see how this simple model performs on our dataset.

In [20]:

predictions = []
for i in range(len(val_dataset.columns)):
    if i == 0:
        # First forecast day: use the last observed training day
        predictions.append(train_dataset[train_dataset.columns[-1]].values)
    else:
        # Subsequent days: use the previous (actual) validation day
        predictions.append(val_dataset[val_dataset.columns[i-1]].values)

predictions = np.transpose(np.array([row.tolist() for row in predictions]))
# Error is computed on the first three series only
error_naive = np.linalg.norm(predictions[:3] - val_dataset.values[:3])/len(predictions[0])

Let's take samples 0, 69, and 99 to compare across all methods.


In [21]:

pred_0 = predictions[0]
pred_69 = predictions[69]
pred_99 = predictions[99]

fig = make_subplots(rows=3, cols=1)

fig.add_trace(
go.Scatter(x=np.arange(1500), mode='lines', y=train_dataset.loc[0].values,
marker=dict(color="blue"),
name="Train"),
row=1, col=1
)

fig.add_trace(
go.Scatter(x=np.arange(1500, 1919), y=val_dataset.loc[0].values, mode='lines',
marker=dict(color="orange"),
name="Val"),
row=1, col=1
)

fig.add_trace(
go.Scatter(x=np.arange(1500, 1919), y=pred_0, mode='lines',
marker=dict(color="green"),
name="Pred"),
row=1, col=1
)

fig.add_trace(
go.Scatter(x=np.arange(1500), mode='lines', y=train_dataset.loc[69].values,
marker=dict(color="blue"), showlegend=False),
row=2, col=1
)

fig.add_trace(
go.Scatter(x=np.arange(1500, 1919), y=val_dataset.loc[69].values, mode='lines',
marker=dict(color="orange"), showlegend=False),
row=2, col=1
)

fig.add_trace(
go.Scatter(x=np.arange(1500, 1919), y=pred_69, mode='lines',
marker=dict(color="green"), showlegend=False,
name="Denoised signal"),
row=2, col=1
)

fig.add_trace(
go.Scatter(x=np.arange(1500), mode='lines', y=train_dataset.loc[99].values,
marker=dict(color="blue"), showlegend=False),
row=3, col=1
)

fig.add_trace(
go.Scatter(x=np.arange(1500, 1919), y=val_dataset.loc[99].values, mode='lines',
marker=dict(color="orange"), showlegend=False),
row=3, col=1
)

fig.add_trace(
go.Scatter(x=np.arange(1500, 1919), y=pred_99, mode='lines',
marker=dict(color="green"), showlegend=False,
name="Denoised signal"),
row=3, col=1
)

fig.update_layout(height=1200, width=800, title_text="Naive approach")


fig.show()
[Figure: Naive approach. Train, validation, and predicted sales for samples 0, 69, and 99.]

In the above graphs, the predictions and validation values are too close together to make out properly, so below we give a better view of them alone by showing only
part of the series, days 1500-1600.
In [22]:

fig = make_subplots(rows=3, cols=1)

fig.add_trace(
go.Scatter(x=np.arange(1500, 1600), y=val_dataset.loc[0].values, mode='lines',
marker=dict(color="darkorange"),
name="Val"),
row=1, col=1
)

fig.add_trace(
go.Scatter(x=np.arange(1500, 1600), y=pred_0, mode='lines',
marker=dict(color="seagreen"),
name="Pred"),
row=1, col=1
)

fig.add_trace(
go.Scatter(x=np.arange(1500, 1600), y=val_dataset.loc[69].values, mode='lines',
marker=dict(color="darkorange"), showlegend=False),
row=2, col=1
)

fig.add_trace(
go.Scatter(x=np.arange(1500, 1600), y=pred_69, mode='lines',
marker=dict(color="seagreen"), showlegend=False,
name="Denoised signal"),
row=2, col=1
)

fig.add_trace(
go.Scatter(x=np.arange(1500, 1600), y=val_dataset.loc[99].values, mode='lines',
marker=dict(color="darkorange"), showlegend=False),
row=3, col=1
)

fig.add_trace(
go.Scatter(x=np.arange(1500, 1600), y=pred_99, mode='lines',
marker=dict(color="seagreen"), showlegend=False,
name="Denoised signal"),
row=3, col=1
)

fig.update_layout(height=1200, width=800, title_text="Naive approach Close up")


fig.show()
[Figure: Naive approach Close up. Validation vs. predicted sales for days 1500-1600 for samples 0, 69, and 99.]

We create a smaller dataset because the following methods are computationally expensive. We take only the last 100 days and use the first 70 for
training and the last 30 for validation.
In [23]:

train_dataset = sales_train_val[d_cols[-100:-30]]
val_dataset = sales_train_val[d_cols[-30:]]

fig = make_subplots(rows=3, cols=1)

fig.add_trace(
go.Scatter(x=np.arange(70), mode='lines', y=train_dataset.loc[0].values,
marker=dict(color="blue"), showlegend=False,
name="Original signal"),
row=1, col=1
)

fig.add_trace(
go.Scatter(x=np.arange(70, 100), y=val_dataset.loc[0].values, mode='lines',
marker=dict(color="orange"), showlegend=False,
name="Denoised signal"),
row=1, col=1
)

fig.add_trace(
go.Scatter(x=np.arange(70), mode='lines', y=train_dataset.loc[1].values,
marker=dict(color="blue"), showlegend=False),
row=2, col=1
)

fig.add_trace(
go.Scatter(x=np.arange(70, 100), y=val_dataset.loc[1].values, mode='lines',
marker=dict(color="orange"), showlegend=False),
row=2, col=1
)

fig.add_trace(
go.Scatter(x=np.arange(70), mode='lines', y=train_dataset.loc[2].values,
marker=dict(color="blue"), showlegend=False),
row=3, col=1
)

fig.add_trace(
go.Scatter(x=np.arange(70, 100), y=val_dataset.loc[2].values, mode='lines',
marker=dict(color="orange"), showlegend=False),
row=3, col=1
)

fig.update_layout(height=1200, width=800, title_text="Train (blue) vs. Validation (orange) sales")


fig.show()
[Figure: Train (blue) vs. Validation (orange) sales. The 70/30 split for the first three series of the reduced dataset.]
Moving Average
The moving average method forecasts the next day's sales as the mean of the previous 30 (or any other number of) days. Because it takes the previous 30
time steps into consideration, it is less prone to short-term fluctuations than the naive approach. The model can be summarized as follows:

$\hat{y}_{t+1} = \frac{1}{30} \sum_{i=t-29}^{t} y_i$
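As a minimal sketch of the formula above (an illustrative addition; the notebook's actual implementation in the next cell uses a recursive blend of past means), a plain 30-day moving-average forecast could look like this. moving_average_forecast is a hypothetical helper, not part of the original notebook.

# Plain moving-average forecast: each step predicts the mean of the last `window` values,
# then feeds that prediction back in for the next step
def moving_average_forecast(history, horizon=30, window=30):
    history = list(history)
    forecasts = []
    for _ in range(horizon):
        forecast = np.mean(history[-window:])
        forecasts.append(forecast)
        history.append(forecast)
    return np.array(forecasts)

# Example usage: ma_pred_0 = moving_average_forecast(train_dataset.loc[0].values)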

In [24]:

predictions = []
for i in range(len(val_dataset.columns)):
    if i == 0:
        # First forecast day: mean of the last 30 training days
        predictions.append(np.mean(train_dataset[train_dataset.columns[-30:]].values, axis=1))
    if i < 31 and i > 0:
        # Later days: blend the mean of the remaining training days with the mean of earlier forecasts
        predictions.append(0.5 * (np.mean(train_dataset[train_dataset.columns[-30+i:]].values, axis=1) + \
                                  np.mean(predictions[:i], axis=0)))
    if i > 31:
        predictions.append(np.mean([predictions[:i]], axis=1))

predictions = np.transpose(np.array([row.tolist() for row in predictions]))
# Error is computed on the first three series only
error_avg = np.linalg.norm(predictions[:3] - val_dataset.values[:3])/len(predictions[0])
In [25]:

pred_0 = predictions[0]
pred_69 = predictions[69]
pred_99 = predictions[99]

fig = make_subplots(rows=3, cols=1)

fig.add_trace(
go.Scatter(x=np.arange(70), mode='lines', y=train_dataset.loc[0].values,
marker=dict(color="blue"),
name="Train"),
row=1, col=1
)

fig.add_trace(
go.Scatter(x=np.arange(70, 100), y=val_dataset.loc[0].values, mode='lines',
marker=dict(color="orange"),
name="Val"),
row=1, col=1
)

fig.add_trace(
go.Scatter(x=np.arange(70, 100), y=pred_0, mode='lines',
marker=dict(color="green"),
name="Pred"),
row=1, col=1
)

fig.add_trace(
go.Scatter(x=np.arange(70), mode='lines', y=train_dataset.loc[69].values,
marker=dict(color="blue"), showlegend=False),
row=2, col=1
)

fig.add_trace(
go.Scatter(x=np.arange(70, 100), y=val_dataset.loc[69].values, mode='lines',
marker=dict(color="orange"), showlegend=False),
row=2, col=1
)

fig.add_trace(
go.Scatter(x=np.arange(70, 100), y=pred_69, mode='lines',
marker=dict(color="green"), showlegend=False,
name="Denoised signal"),
row=2, col=1
)

fig.add_trace(
go.Scatter(x=np.arange(70), mode='lines', y=train_dataset.loc[99].values,
marker=dict(color="blue"), showlegend=False),
row=3, col=1
)

fig.add_trace(
go.Scatter(x=np.arange(70, 100), y=val_dataset.loc[99].values, mode='lines',
marker=dict(color="orange"), showlegend=False),
row=3, col=1
)

fig.add_trace(
go.Scatter(x=np.arange(70, 100), y=pred_99, mode='lines',
marker=dict(color="green"), showlegend=False,
name="Denoised signal"),
row=3, col=1
)

fig.update_layout(height=1200, width=800, title_text="Moving Average Approach")


fig.show()
[Figure: Moving Average Approach. Train, validation, and predicted sales for samples 0, 69, and 99.]

We can see that this model performs better than the naive approach. It is less susceptible to the volatility in day-to-day sales data and manages to pick
up trends with slightly higher accuracy. However, it is still unable to find high-level trends in the sales.
Exponential Smoothing
Exponential smoothing uses a different type of weighting than average smoothing: the previous time steps are exponentially weighted and summed to generate
the forecast, with the weights decaying as we move further back in time. The model can be summarized as follows:

$\hat{y}_{t+1} = \alpha y_t + \alpha(1-\alpha) y_{t-1} + \alpha(1-\alpha)^2 y_{t-2} + \dots$

In the above equation, $\alpha$ is the smoothing parameter. The forecast $\hat{y}_{t+1}$ is a weighted average of all the observations in the series $y_1, \dots, y_t$, and the rate at
which the weights decay is controlled by $\alpha$. This method gives different weights to different time steps, instead of giving the same weight to all time
steps (as the moving average method does). This ensures that recent sales data is given more importance than old sales data when making the forecast.
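The same forecast can be written recursively as $\hat{y}_{t+1} = \alpha y_t + (1-\alpha)\hat{y}_t$. Below is a minimal sketch of this recursion (an illustrative addition; the next cell uses the statsmodels implementation instead). simple_exp_smoothing_forecast is a hypothetical helper, not part of the original notebook.

# Simple exponential smoothing by hand: the level is nudged towards each new observation
def simple_exp_smoothing_forecast(history, alpha=0.3, horizon=30):
    level = history[0]
    for y in history[1:]:
        level = alpha * y + (1 - alpha) * level
    # Multi-step SES forecasts are flat: every future day receives the final level
    return np.full(horizon, level)

# Example usage: ses_pred_0 = simple_exp_smoothing_forecast(train_dataset.loc[0].values)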

In [26]:

predictions = []
for row in tqdm(train_dataset[train_dataset.columns[-30:]].values[:100]):
    # Fit exponential smoothing on the last 30 training days and forecast 30 days ahead
    fit = ExponentialSmoothing(row, seasonal_periods=3).fit()
    predictions.append(fit.forecast(30))
predictions = np.array(predictions).reshape((-1, 30))
# Error is computed on the first three series only
error_exponential = np.linalg.norm(predictions[:3] - val_dataset.values[:3])/len(predictions[0])
In [27]:

pred_0 = predictions[0]
pred_69 = predictions[69]
pred_99 = predictions[99]

fig = make_subplots(rows=3, cols=1)

fig.add_trace(
go.Scatter(x=np.arange(70), mode='lines', y=train_dataset.loc[0].values,
marker=dict(color="blue"),
name="Train"),
row=1, col=1
)

fig.add_trace(
go.Scatter(x=np.arange(70, 100), y=val_dataset.loc[0].values, mode='lines',
marker=dict(color="orange"),
name="Val"),
row=1, col=1
)

fig.add_trace(
go.Scatter(x=np.arange(70, 100), y=pred_0, mode='lines',
marker=dict(color="green"),
name="Pred"),
row=1, col=1
)

fig.add_trace(
go.Scatter(x=np.arange(70), mode='lines', y=train_dataset.loc[69].values,
marker=dict(color="blue"), showlegend=False),
row=2, col=1
)

fig.add_trace(
go.Scatter(x=np.arange(70, 100), y=val_dataset.loc[69].values, mode='lines',
marker=dict(color="orange"), showlegend=False),
row=2, col=1
)

fig.add_trace(
go.Scatter(x=np.arange(70, 100), y=pred_69, mode='lines',
marker=dict(color="green"), showlegend=False,
name="Denoised signal"),
row=2, col=1
)

fig.add_trace(
go.Scatter(x=np.arange(70), mode='lines', y=train_dataset.loc[99].values,
marker=dict(color="blue"), showlegend=False),
row=3, col=1
)

fig.add_trace(
go.Scatter(x=np.arange(70, 100), y=val_dataset.loc[99].values, mode='lines',
marker=dict(color="orange"), showlegend=False),
row=3, col=1
)

fig.add_trace(
go.Scatter(x=np.arange(70, 100), y=pred_99, mode='lines',
marker=dict(color="green"), showlegend=False,
name="Denoised signal"),
row=3, col=1
)

fig.update_layout(height=1200, width=800, title_text="Exponential Smoothing Approach")


fig.show()
[Figure: Exponential Smoothing Approach. Train, validation, and predicted sales for samples 0, 69, and 99.]

We can see that exponential smoothing generates a horizontal line every time. This is because simple exponential smoothing produces a flat multi-step
forecast: every future day receives the same smoothed level. However, it is able to predict the mean level of sales with good accuracy.

Facebook Prophet
Prophet is an open-source time series forecasting library by Facebook. It is based on an additive model in which non-linear trends are fit together with yearly,
weekly, and daily seasonality, plus holiday effects. It works best with time series that have strong seasonal effects and several seasons of
historical data, and it is designed to be more robust to missing data and shifts in trend than many other models.
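Following the Prophet paper, the additive model can be written as (this equation is an added clarification, not from the original notebook):

$y(t) = g(t) + s(t) + h(t) + \epsilon_t$

where $g(t)$ is the trend, $s(t)$ the periodic seasonality, $h(t)$ the holiday effects, and $\epsilon_t$ the error term.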
In [28]:

dates = ["2007-12-" + str(i) for i in range(1, 31)]


predictions = []
for row in tqdm(train_dataset[train_dataset.columns[-30:]].values[:100]):
df = pd.DataFrame(np.transpose([dates, row]))
df.columns = ["ds", "y"]
model = Prophet(daily_seasonality=True)
model.fit(df)
future = model.make_future_dataframe(periods=30)
forecast = model.predict(future)["yhat"].loc[30:].values
predictions.append(forecast)
predictions = np.array(predictions).reshape((-1, 30))
error_prophet = np.linalg.norm(predictions[:3] - val_dataset.values[:3])/len(predictions[0])

INFO:fbprophet:Disabling yearly seasonality. Run prophet with yearly_seasonality=True to override this.
INFO:fbprophet:n_changepoints greater than number of observations. Using 23.
(the same two messages repeat for each of the 100 fitted series)
In [29]:

pred_0 = predictions[0]
pred_69 = predictions[69]
pred_99 = predictions[99]

fig = make_subplots(rows=3, cols=1)

fig.add_trace(
go.Scatter(x=np.arange(70), mode='lines', y=train_dataset.loc[0].values,
marker=dict(color="blue"),
name="Train"),
row=1, col=1
)

fig.add_trace(
go.Scatter(x=np.arange(70, 100), y=val_dataset.loc[0].values, mode='lines',
marker=dict(color="orange"),
name="Val"),
row=1, col=1
)

fig.add_trace(
go.Scatter(x=np.arange(70, 100), y=pred_0, mode='lines',
marker=dict(color="green"),
name="Pred"),
row=1, col=1
)

fig.add_trace(
go.Scatter(x=np.arange(70), mode='lines', y=train_dataset.loc[69].values,
marker=dict(color="blue"), showlegend=False),
row=2, col=1
)

fig.add_trace(
go.Scatter(x=np.arange(70, 100), y=val_dataset.loc[69].values, mode='lines',
marker=dict(color="orange"), showlegend=False),
row=2, col=1
)

fig.add_trace(
go.Scatter(x=np.arange(70, 100), y=pred_69, mode='lines',
marker=dict(color="green"), showlegend=False,
name="Denoised signal"),
row=2, col=1
)

fig.add_trace(
go.Scatter(x=np.arange(70), mode='lines', y=train_dataset.loc[99].values,
marker=dict(color="blue"), showlegend=False),
row=3, col=1
)

fig.add_trace(
go.Scatter(x=np.arange(70, 100), y=val_dataset.loc[99].values, mode='lines',
marker=dict(color="orange"), showlegend=False),
row=3, col=1
)

fig.add_trace(
go.Scatter(x=np.arange(70, 100), y=pred_99, mode='lines',
marker=dict(color="green"), showlegend=False,
name="Denoised signal"),
row=3, col=1
)

fig.update_layout(height=1200, width=800, title_text="Prophet Approach")


fig.show()
[Figure: Prophet Approach. Train, validation, and predicted sales for samples 0, 69, and 99.]

Prophet is able to find low-level and high-level trends simultaneously, unlike most of the other models, which can only find one of these. It predicts a
periodic function for each sample, and these functions seem to be fairly accurate.

On closer observation, we can see that there is a macroscopic upward trend in samples 1 and 2 and a downward one in sample 3, showing the
improvement or decline over time.

This is a major use of FB Prophet, i.e., to determine where a business is going in the future.
Submission CSV
For the submission, the code below forecasts each series as the mean of its last 28 days of sales, repeated across the 28-day horizon, and compiles the results into submission.csv.

In [30]:

days = range(1, 1913 + 1)
time_series_columns = [f'd_{i}' for i in days]
time_series_data = sales_train_val[time_series_columns]

# Forecast = mean of the last 28 observed days, repeated for the 28-day horizon
forecast = pd.DataFrame(time_series_data.iloc[:, -28:].mean(axis=1))
forecast = pd.concat([forecast] * 28, axis=1)
forecast.columns = [f'F{i}' for i in range(1, forecast.shape[1] + 1)]

# The submission needs both the validation and evaluation ids
validation_ids = sales_train_val['id'].values
evaluation_ids = [i.replace('validation', 'evaluation') for i in validation_ids]
ids = np.concatenate([validation_ids, evaluation_ids])

predictions = pd.DataFrame(ids, columns=['id'])
forecast = pd.concat([forecast] * 2).reset_index(drop=True)
predictions = pd.concat([predictions, forecast], axis=1)
predictions.to_csv('submission.csv', index=False)

In [31]:

print(predictions.head())

id F1 F2 F3 F4 \
0 HOBBIES_1_001_CA_1_validation 0.964286 0.964286 0.964286 0.964286
1 HOBBIES_1_002_CA_1_validation 0.071429 0.071429 0.071429 0.071429
2 HOBBIES_1_003_CA_1_validation 0.571429 0.571429 0.571429 0.571429
3 HOBBIES_1_004_CA_1_validation 1.821429 1.821429 1.821429 1.821429
4 HOBBIES_1_005_CA_1_validation 1.357143 1.357143 1.357143 1.357143

F5 F6 F7 F8 F9 ... F19 F20 \


0 0.964286 0.964286 0.964286 0.964286 0.964286 ... 0.964286 0.964286
1 0.071429 0.071429 0.071429 0.071429 0.071429 ... 0.071429 0.071429
2 0.571429 0.571429 0.571429 0.571429 0.571429 ... 0.571429 0.571429
3 1.821429 1.821429 1.821429 1.821429 1.821429 ... 1.821429 1.821429
4 1.357143 1.357143 1.357143 1.357143 1.357143 ... 1.357143 1.357143

F21 F22 F23 F24 F25 F26 F27 \


0 0.964286 0.964286 0.964286 0.964286 0.964286 0.964286 0.964286
1 0.071429 0.071429 0.071429 0.071429 0.071429 0.071429 0.071429
2 0.571429 0.571429 0.571429 0.571429 0.571429 0.571429 0.571429
3 1.821429 1.821429 1.821429 1.821429 1.821429 1.821429 1.821429
4 1.357143 1.357143 1.357143 1.357143 1.357143 1.357143 1.357143

F28
0 0.964286
1 0.071429
2 0.571429
3 1.821429
4 1.357143

[5 rows x 29 columns]

Loss for each model


In [32]:

error = [error_naive, error_avg, error_exponential, error_prophet]


names = ["Naive approach", "Moving average", "Exponential smoothing", "Prophet"]
df = pd.DataFrame(np.transpose([error, names]))
df.columns = ["RMSE Loss", "Model"]
px.bar(df, y="RMSE Loss", x="Model", color="Model", title="RMSE Loss vs. Model")

[Figure: RMSE Loss vs. Model. Bar chart of the loss for the naive approach, moving average, exponential smoothing, and Prophet.]

From the above graph, we can see that the naive approach is the best-scoring model and Prophet is the worst-scoring one. Note that the naive error was
computed on the full 413-day validation window while the other three models were evaluated on the 30-day window, so the comparison is only indicative.
I believe that Prophet can be boosted significantly by tuning its hyperparameters.

The naive approach may not work out for other samples, as it is a very basic method.

The moving average and exponential smoothing approaches score fairly similarly.

The Prophet approach seems to score worst, but I believe that if we trained it over all 1913 days instead of just 100, it could very well perform better.
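As an aside, the "RMSE Loss" computed above is the Frobenius norm of the error over the first three series divided by the horizon length rather than a textbook RMSE. A minimal sketch of a conventional RMSE (an illustrative addition, assuming predictions and val_dataset as defined for each model) would be:

# Conventional root-mean-squared error over a forecast matrix
def rmse(pred, actual):
    pred = np.asarray(pred, dtype=float)
    actual = np.asarray(actual, dtype=float)
    return np.sqrt(np.mean((pred - actual) ** 2))

# Example usage: rmse(predictions[:3], val_dataset.values[:3])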

Conclusions
Different states have different means and variances of sales, indicating differences in how stores develop across these states.
Most sales series have a linearly trended sine-wave shape, reminiscent of the macroeconomic business cycle.
Several non-ML models can be used to forecast time series data; moving average and exponential smoothing are very good models.
Prophet's performance can be boosted with more hyperparameter tuning.
