Professional Documents
Culture Documents
Improving Neural Networks For Time-Series Forecasting Using Data Augmentation and Automl
Improving Neural Networks For Time-Series Forecasting Using Data Augmentation and Automl
Email: tmiller@kennesaw.edu
Abstract—Statistical methods such as the Box-Jenkins method competitions [2], [3] and have been competitive and, in some
for time-series forecasting have been prominent since their cases, better than the traditional statistical models.
development in 1970. Many researchers rely on such models as A major challenge with the neural network-based models
they can be efficiently estimated and also provide interpretability.
However, advances in machine learning research indicate that arises when we are looking at scenarios where less training
neural networks can be powerful data modeling techniques, data is available such that the networks are not fully able to
as they can give higher accuracy for a plethora of learning learn the information from the limited number of samples.
problems and datasets. In the past, they have been tried on Short and intermediate length time-series forecasting is one
time-series forecasting as well, but their overall results have not of such areas where neural networks struggle to compete with
been significantly better than the statistical models especially for
intermediate length times series data. Their modeling capacities other models. However, the performance of neural networks
are limited in cases where enough data may not be available to increases significantly as we get to longer length time-series
estimate the large number of parameters that these non-linear as indicated in multiple studies, e.g., [4], [5], [6].
models require. This paper presents an easy to implement data This paper focuses on intermediate length time-series data,
augmentation method to significantly improve the performance those with a length comparable to the M4 competition average
of such networks. Our method, Augmented-Neural-Network,
which involves using forecasts from statistical models, can help of 1035 data points, as neural networks often dominate for
unlock the power of neural networks on intermediate length longer time-series and tend to be inferior for shorter time-
time-series and produces competitive results. It shows that data series. Several open datasets that record daily records concern-
augmentation, when paired with Automated Machine Learning ing the COVID-19 Pandemic fall into this intermediate length
techniques such as Neural Architecture Search, can help to find category. In addition, the highly dynamic nature of pandemic
the best neural architecture for a given time-series. Using the
combination of these, demonstrates significant enhancement for data makes it challenging for time-series forecasting. Finally,
two configurations of our technique for a COVID-19 dataset, the importance of having accurate forecasts for the pandemic
improving forecasting accuracy by 19.90% and 11.43%, respec- cannot be overestimated.
tively, over the neural networks that do not use augmented data. The paper provides a comparative analysis between tra-
Index Terms—statistical models, time-series forecasting, neural ditional statistical models and neural network models. In
networks, data augmentation, AutoML, COVID-19
particular, it shows that the performance of regular neural
networks is competitive but not better than statistical models.
However, we show that data augmentation can significantly
I. I NTRODUCTION
improve the quality of out-of-sample forecasts making them
In recent years, neural networks have been used extensively better than the traditional models.
in time-series forecasting to capture the nonlinear information We also focus on building the neural network architecture
that traditional linear statistical models cannot fully capture, through Automated Machine Learning (AutoML) by using
and where nonlinear regression models may be challenging to Neural Architecture Search (NAS) to efficiently search for the
develop. Earlier models also tried combining neural networks best performing model architecture for the given learning task
with traditional models where linear components of the time- and dataset. This combined with the data augmentation gives
series data were captured using the statistical models and the neural networks large performance boosts that help them
the neural networks were then trained to capture the leftover to perform better than the statistical models.
non-linearities. While neural networks were not competitive Much of the earlier work in time-series forecasting using
in many earlier time-series forecasting competitions, an auto- neural networks specifies the use of time-series features from
mated neural network model was judged the 9th best model in the statistical model forecasts as inputs for the neural networks.
the M3 time-series forecasting competition [1] held in 2000. This includes the top-performing models in M4 competition
More recently, several neural network models have been a part such as the hybrid method of exponential smoothing and
of the M4 in 2018 and M5 in 2020 time-series forecasting recurrent neural networks [7] and the FFORMA [8]. Although
2
these methods do not aim to increase the sample size of of the disease, and they produce state-of-the-art results even
the training data made available to the neural networks, they in the presence of many hidden variables.
aid in providing more information to the neural networks The rest of this paper is organized as follows: Section 2 fo-
helping them to better model the non-linear patterns within cuses on the COVID-19 datasets. Related work is reviewed in
the data and thus, improving the generalization performance Section 3. Section 4 describes the epidemiological dynamics.
of these models. The data augmentation approach that we use Section 5 and 6 discuss the statistical and machine learning
interleaves the forecasts collected from a statistical model to models. The methodology used for data augmentation, Au-
build a time-series that is double the length of the original toML, and rolling validation is discussed in Section 7. Finally,
time-series providing more samples for the neural networks to results are provided in Section 8, and conclusions and future
train on. work are given in Section 9.
Producing accurate forecasts for the COVID-19 Pandemic
has been challenging due to the length of the time-series data II. COVID-19 DATASETS
that are available. Only now is a year’s worth of data available Out of numerous data sources available for tracking the
for the United States. This paper examines the COVID- COVID-19 pandemic, here in the US, we have two pri-
19 modeling studies that have been conducted and applies mary data sources which collect the information from the
common modeling techniques from two main categories: (1) public health authorities throughout the country. First, The
statistical models, and (2) machine learning models. The first Johns Hopkins Coronavirus Resource Center (CRC) is a
category of models represents a robust and relatively easy to continuously updated source of COVID-19 data have their
apply set of models that should be used in modeling studies data repository stored at https://github.com/CSSEGISandData/
(at least as a point of comparison). The second category, that COVID-19. Secondly, The COVID Tracking Project is a
of machine learning, supports more intense pattern matching volunteer organization collecting COVID-19 data used by
that has the potential to produce more accurate forecasts. The multiple organizations and research groups.
disadvantages include the necessity for more data, tuning of This paper utilizes the national COVID-19 dataset com-
hyper-parameters, architecture search, and interpretability. posed of data collected and stored at https://covidtracking.com/
Modeling studies are important for several reasons. Fore- data. The dataset consists of COVID-19 case data for testing,
casting is important, so people and their governments know hospitalization, and patient outcomes from all over the United
what to expect in the next few weeks. The COVID-19 Pan- States.
demic is unique in the sense that the last pandemic with this
(or more) significance was the 1918 H1N1 Influenza Pandemic A. COVID-19 Dataset - United States
[9]. The S CALAT ION COVID-19 dataset https://github.
Modeling and forecasting studies for COVID-19 are com/scalation/data/tree/master/COVID contains data about
grouped into four major approaches: a) SEIR/SIR models, b) COVID-19 cases, hospitalizations, deaths and several more
Simulation/Agent-based models, c) Curve-fitting models, and columns in the United States. This dataset is in a CSV file
d) Predictive models. with 391 rows and 18 columns.
SEIR/SIR [10] models are common epidemiological model- This study focuses on the incident deaths or daily death
ing techniques that predict the pandemic course through the increase column using it as a time-series for the neural
system of equations that divide an estimated population into networks and statistical models.
different compartments: susceptible, exposed, infected, and The time-series includes daily death counts in the United
recovered. A set of mathematical rules govern the equations States starting from January 13, 2020 to February 6, 2021. We
as to how the assumptions about the disease process, Non- eliminate the first 44 day records and feed our models time-
Pharmaceutical Interventions, social mixing, and public vac- series data starting February 26, 2020 when the first deaths
cinations would affect the population. due to COVID-19 in the United States were recorded. After
Simulation/Agent-based [11] models simulate a community eliminating the first 44 days, the new size of the time-series
where individuals represent “agents” in that community. The is 347 days. Ending the dataset on February 6, 2021 has the
pandemic course spreads among the agents based upon the modeling advantage that vaccination effects may be ignored,
virus assumptions and rules about social mixing, governmental but after this time, models should take vaccinations under
policy, and other behavioral patterns in the studied community. consideration. Unfortunately, there will initially be limited data
Curve-fitting [12] models refer to fitting a mathematical available on vaccinations.
model by considering the current state and estimate the rela-
tionship between features of the pandemic. The future of the B. Data Preprocessing
pandemic course can be approximated fully from assumptions Data is preprocessed for smoothing the outliers that lie 3.5
about the contagion, public health policies, or other limitations standard deviations away from the rolling mean. The rolling
and uncertainties of spread disease parameters and/or experi- mean is the 6 time points local average such that we consider
ences in other communities. 3 time-points before and after the current point. Observations
Predictive models play a key role in modern pandemic predic- within 3.5 standard deviations are left unchanged while the
tion. Scientists apply different statistical and machine learning observations which are considered as outliers are replaced by
algorithms for modeling the future of morbidity and mortality the Kalman Smoothing using a Local Level Model [13].
3
C. Optimum Lag Selection Exponential Smoothing (statistical model) with Long Short-
In time-series modeling, it is common to use lagged features Term Memory (LSTM) networks to give a single hybrid
as inputs to the models. A lag value is the target value from and powerful forecasting method that was the winner of
any of its previous values. For example, the k th lag value of yt the M4 competition, producing forecasts which were nearly
will be yt−k . We need to define models such that they look for 10% better (in sMAPE) than the combination benchmarks
p previous lags while estimating their parameters and would considered in the competition. The second most accurate
also require p valued vector sequence to make forecasts once method, FFORMA [8], also involved a combination of 7
the parameters are estimated from the training data. statistical methods and 1 ML method, and the weights for
The choice of p needs to be determined before taking any averaging/combining such methods also used an ML algo-
model under consideration. Depending on the characteristics rithm. This encouraged the time-series research community to
and dynamics of the model as well as the time-series, we develop techniques that use a combination of machine learning
choose a particular value for p. The choice of p largely impacts models with time-series features as indicated in [2]. In light of
the performance of the models on forecasts, as it can control these previous works on intermediate length time-series, we
the modeling capabilities of the model. We determine the explore the performance of neural networks, compare them
optimal number of lags to be in the range of 18 to 21 using with statistical benchmarks and empirically show that using
the Auto-correlation function (ACF), as shown in Figure 1. data augmentation can help boost the performance of such
neural network models.
Beyond Fully Connected Neural Networks, many Recurrent
Neural Network (RNN) based frameworks have been proposed
for intermediate length time-series forecasting as indicated
in [16], [17]. More recent studies such as [18], show the
application of RNNs on several short and intermediate length
time-series where the results are better than statistical models
for some datasets while the RNNs performing worse than
ARIMA and ETS models on diverse datasets such as the M4,
which consists of 48,000 time-series.
Convolutional Neural Networks (CNN) have also been
referred to in [19] which uses the dilated convolutions that can
provide broader access to historical values. They claimed that
CNN performs on par with the recurrent-type network, whilst
requiring significantly fewer parameters and computation to
achieve comparable performance.
With the advent of “Attention Mechanisms” in recent years,
Fig. 1: ACF Plot of time-series these techniques allow the network to directly concentrate on
important time points in the past. Application of such networks
in longer length time-series shows improvements over RNN-
III. R ELATED W ORK based competitors which has been reflected in [20], [21],
In time-series forecasting, neural networks have been com- [22]. These type of networks often employs an encoder to
peting with many statistical models for many years. The M3 summarize historical information and a decoder to integrate
competition in the year 2000, saw a single neural network ar- them with known future values [22], [20]. However, there has
chitecture being compared with the other 23 statistical models been little evidence for the performance of such networks on
[1]. The Automated ANN model presented in this competition shorter and intermediate length time-series data.
[14] first demonstrated the potential of using an automated Although the advanced neural network architectures have
procedure for the selection of neural network architecture achieved state-of-the-art in image and text domains, a limited
paving way for AutoML techniques such as NAS to be used number of studies are available around intermediate time-
in the context of time-series forecasting. However, in the series analysis because that such techniques require a plethora
case of shorter length time-series the automatic architecture of input data for training. Our work adds to this growing
selection, even though competitive, was not better than the body of machine learning research with a novel approach to
other statistical models considered. framing, evaluating, and comparing intermediate time-series
In the continuation of previous competitions the M4 time- forecasts for the COVID-19 prediction task, and in comparison
series competition, which consisted of many short and inter- with other powerful techniques, we showed that our new
mediate length time-series, saw a greater number of neural proposed technique (AUG-NN) can indeed outperform other
network-based models some of which were able to perform competitors.
better than statistical models. One of the major findings
of this competition included the “hybrid” approach which
included both statistical and machine learning features [2], A. Modeling Studies for COVID-19
[15]. The hybrid method by Slawek Smyl [7] used a Dynamic Several papers include forecasts for COVID-19 deaths, pos-
Computational Graph Neural Network system to combine an itive cases, and hospitalizations. The authors in [23] forecasted
4
the parameters of ARIMA can be dynamically calibrated for the time-series is still shorter than ideal for training neural
different horizons. Additionally, we expect to improve the networks. Therefore, in this paper, we only consider some
prediction accuracy by adding the Seasonality component in simpler network architectures, rely on AutoML to find better
ARIMA, as Figure 1 testifies such effects exist in the data models, and utilize data augmentation to compensate.
at every 7 lags. We are interested in making comparisons A simple type of Neural Network generalizes an AR(p)
between our proposed approach and current statistical methods model. We use a three-layer (one hidden) neural network,
through running multiple experiments and showing that the where the input layer has a node for each of the p lags and
AUG-NN technique can indeed produce the best results. A the current time t. The hidden layer has nh nodes while the
brief overview of the statistical methods mentioned above are output layer has no .
provided in the following:
1) The Random Walk (RW) model states that the future
value yt is the previous value yt−1 disturbed by white xt = [yt−p , . . . , yt−1 , t]
noise t ∈ N (0, Iσ 2 ) and the forecast is just yt−1 . yt = W2t f (W1t xt + b1 ) + b2 + t
changing the auto-regressive non-seasonal order (p) of such Fig. 3: Out-Of-Sample Forecasts (h = 1)
models. The (p) in Table I indicates the auto-regressive non-
seasonal order for the model with lowest sMAPE for the given Figure 3 shows the true values (in blue) vs. the horizon 1
horizon. out-of-sample forecasts (in orange) of the AUG-NN model.
For AR(p) model, we evaluate p = 1 through p = 14. The forecasts in the initial days closely follow the true death
Similarly, for ARIMA(p, 1, 0) model, we use a fixed single count and is only off in the later days where fluctuations and
difference with no MA component for all horizons, whereas p trend is higher.
goes from 1 through 14. The SARIMA(p, 0, 0)× (3, 1, 1)7 uses
no differencing and no MA component, whereas it includes TABLE III: Improvement Due to Augmentation
P = 3 and Q = 1 seasonal components with 1 seasonal Horizon SARIMA NN AUG-NN Improvement (%)
differencing component and a seasonal period of 7 days. h=1 13.50 19.39 13.06 32.65
The highlighted values in each row indicate the best per- h=2 15.10 20.74 14.59 29.65
h=3 15.50 16.86 15.00 11.03
forming model for the given horizon having the lowest sMAPE h=4 15.80 19.34 14.22 26.47
score amongst all the statistical models. The SARIMA model h=5 16.60 22.62 15.96 29.44
gives the best in-sample fit and performs the best for most h=6 16.30 20.16 15.70 22.12
horizons. The ARIMA model is able to be competitive on h=7 16.50 22.40 16.64 25.71
h=8 18.80 23.38 18.16 22.33
horizons 5, 6, and 7. h=9 20.70 21.85 18.86 13.68
h = 10 20.70 22.92 19.81 13.57
TABLE II: Multi-Horizon (h) Rolling Forecasts: Neural Network h = 11 20.80 24.71 18.92 23.43
models (sMAPE). h = 12 21.30 22.81 19.76 13.37
Horizon NN AUG-NN ConvLSTM AUG-ConvLSTM h = 13 21.40 21.24 20.34 4.24
in-sample 11.52 11.04 14.37 10.49 h = 14 21.30 23.44 20.87 10.96
h=1 19.39 13.06 18.15 14.07
h=2 20.74 14.59 18.48 15.96
h=3 16.86 15.00 19.31 16.99 Table III shows the improvement in AUG-NN over the
h=4 19.34 14.22 20.28 18.08 neural network trained on the original time-series. In terms
h=5 22.62 15.96 19.55 18.79
h=6 20.16 15.70 21.13 18.01 of improvement due to augmentation, on average we see a
h=7 22.40 16.64 21.30 20.49 19.90% improvement across all horizons with a maximum
h=8 23.38 18.16 25.28 21.72 improvement of 32.65% seen at horizon 1 forecasts. While the
h=9 21.85 18.86 26.52 22.72
h = 10 22.92 19.81 27.89 23.56
NN model does not perform as well as the SARIMA model,
h = 11 24.71 18.92 28.47 26.24 the AUG-NN model substantially outperforms the SARIMA
h = 12 22.81 19.76 27.95 26.99 model. Also, the ConvLSTM model saw an improvement of
h = 13 21.24 20.34 29.97 25.58 11.43% on average across all horizons.
h = 14 23.44 20.87 31.54 28.78
However, with the data augmentation paired with AutoML [19] A. Borovykh, S. Bohte, and C. W. Oosterlee, “Conditional time series
to search for the optimal neural network architecture, the forecasting with convolutional neural networks,” arXiv preprint, 2017.
[20] C. Fan, Y. Zhang, Y. Pan, X. Li, C. Zhang, R. Yuan, D. Wu, W. Wang,
performance of such neural networks is significantly boosted J. Pei, and H. Huang, “Multi-horizon time series forecasting with
making them better than the best performing statistical models. temporal attention learning,” in Proceedings of the 25th ACM SIGKDD
We also show that not just with the feed forward networks, International Conference on Knowledge Discovery & Data Mining,
2019.
this performance improvement is also possible on other deep [21] S. Li, X. Jin, Y. Xuan, X. Zhou, W. Chen, Y.-X. Wang, and X. Yan, “En-
learning models such as the ConvLSTM model. hancing the locality and breaking the memory bottleneck of transformer
When compared with the methods using hybrid approaches on time series forecasting,” arXiv preprint, 2019.
[22] B. Lim, S. O. Arik, N. Loeff, and T. Pfister, “Temporal fusion trans-
described in our related work, our method is much simpler to formers for interpretable multi-horizon time series forecasting,” arXiv
implement and offers relatively larger performance improve- preprint, 2019.
ments. The simplicity of our data augmentation method makes [23] N. Kumar and S. Susan, “Covid-19 pandemic prediction using time
series forecasting models,” in 2020 11th International Conference on
it an easy yet powerful technique to improve the performance Computing, Communication and Networking Technologies (ICCCNT).
of neural networks on time-series data. IEEE, 2020, pp. 1–7.
In the context of forecasting on intermediate time-series [24] S. J. Taylor and B. Letham, “Forecasting at scale,” The American
Statistician, 2018.
forecasting, our work serves as a baseline to improve the per- [25] O.-D. Ilie, R.-O. Cojocariu, A. Ciobica, S.-I. Timofte, I. Mavroudis, and
formance of neural networks and encourages further research B. Doroftei, “Forecasting the spreading of covid-19 across nine countries
work on this study to incorporate additional deep learning from europe, asia, and the american continents using the arima models,”
Microorganisms, 2020.
architectures along with the epidemiological models such as [26] S. Prasanth, U. Singh, A. Kumar, V. A. Tikkiwal, and P. H. Chong,
SEIR/SIR model for more in-depth analysis and comparison. “Forecasting spread of covid-19 using google trends: A hybrid gwo-
deep learning approach,” Chaos, Solitons & Fractals, 2020.
[27] W. O. Kermack and A. G. McKendrick, “A contribution to the mathemat-
R EFERENCES ical theory of epidemics,” Proceedings of the royal society of London.
Series A, 1927.
[1] S. Makridakis and M. Hibon, “The m3-competition: results, conclusions [28] W. Goffman and V. Newill, “Generalization of epidemic theory,” Nature.
and implications,” 2000. [29] R. M. Anderson, “Discussion: the kermack-mckendrick epidemic thresh-
[2] S. Makridakis, E. Spiliotis, and V. Assimakopoulos, “The m4 com- old theorem,” Bulletin of mathematical biology, 1991.
petition: Results, findings, conclusion and way forward,” International [30] M. Toutiaee, S. Amirian, J. A. Miller, and S. Li, “Stereotype-free
Journal of Forecasting, 2018. classification of fictitious faces,” arXiv preprint arXiv:2005.02157, 2020.
[3] ——, “The m5 accuracy competition: Results, findings and conclusions,” [31] M. Toutiaee, A. Keshavarzi, A. Farahani, and J. A. Miller, “Video
Int J Forecast, 2020. contents understanding using deep neural networks,” arXiv preprint
[4] H. Peng, S. U. Bobade, M. E. Cotterell, and J. A. Miller, “Forecasting arXiv:2004.13959, 2020.
traffic flow: Short term, long term, and when it rains,” in International [32] M. Toutiaee, “Occupancy detection in room using sensor data,” arXiv
Conference on Big Data. Springer, 2018. preprint arXiv:2101.03616, 2021.
[5] H. Peng, N. Klepp, M. Toutiaee, I. B. Arpinar, and J. A. Miller, [33] M. Toutiaee and J. Miller, “Gaussian function on response surface
“Knowledge and situation-aware vehicle traffic forecasting,” in 2019 estimation,” arXiv preprint, 2021.
IEEE International Conference on Big Data (Big Data). IEEE, 2019. [34] M. Toutiaee, “Olsh: Occurrence-based locality sensitive hashing.”
[6] Y. Qin, D. Song, H. Chen, W. Cheng, G. Jiang, and G. Cottrell, [35] H. W. Hethcote and P. Van den Driessche, “Some epidemiological
“A dual-stage attention-based recurrent neural network for time series models with nonlinear incidence,” Journal of Mathematical Biology.
prediction,” arXiv preprint, 2017. [36] J. A. Miller, “Introduction to computational data science using scala-
[7] S. Smyl, “A hybrid method of exponential smoothing and recurrent tion,” 2020.
neural networks for time series forecasting,” IJF. [37] S. B. Taieb, R. J. Hyndman et al., Recursive and direct multi-step
[8] P. Montero-Manso, G. Athanasopoulos, R. J. Hyndman, and T. S. Ta- forecasting: the best of both worlds. Citeseer, 2012.
lagala, “Fforma: Feature-based forecast model averaging,” International [38] T. Edwards, D. Tansley, R. Frank, and N. Davey, “Traffic trends analysis
Journal of Forecasting, 2020. using neural networks,” in Proceedings of the International Workshop
[9] D. M. Morens, J. K. Taubenberger, H. A. Harvey, and M. J. Memoli, on Applications of Neural Networks to Telecommunications, 1997.
“The 1918 influenza pandemic: lessons for 2009 and the future,” Critical [39] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,”
care medicine, 2010. arXiv preprint arXiv:1412.6980, 2014.
[10] L. López and X. Rodo, “A modified seir model to predict the covid-19 [40] X. Shi, Z. Chen, H. Wang, D.-Y. Yeung, W.-K. Wong, and W.-c.
outbreak in spain and italy: simulating control scenarios and multi-scale Woo, “Convolutional lstm network: A machine learning approach for
epidemics,” Available at SSRN 3576802, 2020. precipitation nowcasting,” Advances in neural information processing
[11] M. S. Shamil, F. Farheen, N. Ibtehaz, I. M. Khan, and M. S. Rahman, systems, 2015.
“An agent based modeling of covid-19: Validation, analysis, and recom- [41] I. Sutskever, O. Vinyals, and Q. V. Le, “Sequence to sequence learning
mendations,” medRxiv, 2020. with neural networks,” arXiv preprint, 2014.
[12] Z. Zhao, K. Nehil-Puleo, and Y. Zhao, “How well can we forecast the [42] S. Makridakis, E. Spiliotis, and V. Assimakopoulos, “The m4 compe-
covid-19 pandemic with curve fitting and recurrent neural networks?” tition: 100,000 time series and 61 forecasting methods,” International
[13] J. J. Commandeur and S. J. Koopman, An introduction to state space Journal of Forecasting, 2020.
time series analysis. Oxford University Press, 2007. [43] X. He, K. Zhao, and X. Chu, “Automl: A survey of the state-of-the-art,”
[14] S. D. Balkin and J. K. Ord, “Automatic neural network modeling for Knowledge-Based Systems, 2021.
univariate time series,” International Journal of Forecasting, 2000. [44] J. Wu, X.-Y. Chen, H. Zhang, L.-D. Xiong, H. Lei, and S.-H. Deng,
[15] S. Makridakis, E. Spiliotis, and V. Assimakopoulos, “The m4 competi- “Hyperparameter optimization for machine learning models based on
tion: 100,000 time series and 61 forecasting methods,” IJF, 2020. bayesian optimization,” Journal of Electronic Science and Technology.
[16] S. S. Rangapuram, M. Seeger, J. Gasthaus, L. Stella, Y. Wang, and [45] H. Jin, Q. Song, and X. Hu, “Auto-keras: An efficient neural architecture
T. Januschowski, “Deep state space models for time series forecasting,” search system,” in Proceedings of the 25th ACM SIGKDD International
in Proceedings of the 32nd international conference on neural informa- Conference on Knowledge Discovery & Data Mining. ACM.
tion processing systems, 2018. [46] G. E. Box, G. M. Jenkins, G. C. Reinsel, and G. M. Ljung, Time series
[17] Y. Wang, A. Smola, D. Maddix, J. Gasthaus, D. Foster, and analysis: forecasting and control. John Wiley & Sons, 2015.
T. Januschowski, “Deep factors for forecasting,” in International Con-
ference on Machine Learning. PMLR.
[18] H. Hewamalage, C. Bergmeir, and K. Bandara, “Recurrent neural net-
works for time series forecasting: Current status and future directions,”
International Journal of Forecasting, 2021.