
Applied Energy 262 (2020) 114486

Contents lists available at ScienceDirect

Applied Energy
journal homepage: www.elsevier.com/locate/apenergy

Day-ahead high-resolution forecasting of natural gas demand and supply in Germany with a hybrid model
Ying Chen (a,b,c), Xiuqin Xu (c,d,*), Thorsten Koch (e,f)

(a) Department of Mathematics, Block S17, Level 4, 2 Science Drive 2, National University of Singapore, Singapore 117543, Singapore
(b) Risk Management Institute, 21 Heng Mui Keng Terrace, 04-03, National University of Singapore, Singapore 119613, Singapore
(c) Institute of Data Science, 3 Research Link, 04-06, National University of Singapore, Singapore 117602, Singapore
(d) NUS Graduate School for Integrative Sciences and Engineering, National University of Singapore, Singapore 119077, Singapore
(e) Department of Mathematics, Technische Universität Berlin, Str. des 17. Juni 136, 10623 Berlin, Germany
(f) Mathematical Optimization Department at Zuse Institute Berlin, Takustr. 7, 14195 Berlin, Germany

HIGHLIGHTS

• We provide an innovative hybrid model for German natural gas forecasting.


• The model handles complex dependence with state-of-the-art statistical and deep learning.
• We achieve accurate and robust day-ahead forecasts for 92 nodes with various types.
• Compared to alternatives, the short-term forecast accuracy improves up to fourfold.

ARTICLE INFO

Keywords:
Natural gas flow forecasting
Functional autoregressive
Neural network
Hybrid model

ABSTRACT

As the natural gas market is moving towards short-term planning, accurate and robust short-term forecasts of the demand and supply of natural gas are of fundamental importance for a stable energy supply, a natural gas control schedule, and transport operation on a daily basis. We propose a hybrid forecast model, the Functional AutoRegressive and Convolutional Neural Network (FAR-CNN) model, based on state-of-the-art statistical modeling and artificial neural networks. We conduct short-term forecasting of the hourly natural gas flows of 92 distribution nodes in the German high-pressure gas pipeline network, showing that the proposed model provides good and stable accuracy for different types of nodes. It outperforms all the alternative models, with relative accuracy improved up to twofold for plant nodes and up to fourfold for municipal nodes. For the border nodes with rather flat gas flows, its accuracy is comparable to that of the best performing alternative model.

1. Introduction

Natural gas is an important and clean energy source. It supplies 24% of the energy used worldwide in 2018 [1]. Natural gas makes up nearly one-quarter of the generation of electricity and plays a crucial role as a feedstock for industry, according to the International Energy Agency. There has been strong growth in the demand and production of natural gas, driven by the clean energy and environmental policies implemented by, e.g., the EU, China, and the ASEAN countries. This increasing trend is foreseen to continue. Given the uneven distribution of natural gas, its transmission plays an important role in the stability of the energy supply. Nearly 80% of the world’s total proven natural gas reserves are located in only 10 countries, namely, Russia in Europe, Iran, Qatar, Saudi Arabia and the UAE in the Middle East, Turkmenistan in Central Asia, the USA and Venezuela in the Americas, as well as Nigeria and Algeria in Africa. Many countries are net importers of fossil fuels. For such countries, the transmission of natural gas, as a foreign source of energy, is of high social and political significance.

Germany is also a net fossil fuel importer: natural gas contributes about 23% to its entire energy supply and is, besides mineral oil, the most important source of energy. Annually, 853.4 TW·h of natural gas is transported via 35,000 km of German high-pressure gas pipelines. The natural gas is mainly used for daily heating and electricity generation, and thus the supply of the gas should be uninterruptible. Furthermore, the German network is the central hub of the natural gas transmission system for Europe, see Fig. 1. Twice the amount of gas Germany uses


Corresponding author at: Institute of Data Science, 3 Research Link, 04-06, National University of Singapore, Singapore 117602, Singapore.
E-mail address: xiuqin.xu@u.nus.edu (X. Xu).

https://doi.org/10.1016/j.apenergy.2019.114486
Received 10 October 2019; Received in revised form 20 December 2019; Accepted 29 December 2019
0306-2619/ © 2020 Elsevier Ltd. All rights reserved.
Y. Chen, et al. Applied Energy 262 (2020) 114486

Fig. 1. The European Natural Gas Network 2017. Source: ENTSOG.

itself is transported through Germany to supply its neighbors. This corresponds to roughly 10 times the electricity generation of all German nuclear power plants. Germany is responsible for both its own natural gas supply and that of other EU countries, where the stability of the supply is the factor with the highest priority.

The speed of the natural gas flow is rather slow, around 25 km/h, even in the high-pressure pipeline. Meanwhile, the natural gas market is becoming more and more competitive and is moving towards day-ahead contracts, which makes the dispatching of natural gas in the pipeline network even more challenging. Therefore, accurate and robust short-term forecasting of the natural gas flows is essential for efficient operation of the network. In particular, being able to accurately predict the day-ahead hourly demand and supply of the involved distribution nodes in a large gas transport network will enable the design and implementation of an optimal natural gas control schedule and further cost savings in operation.

The challenge is in the development of a robust model to provide accurate short-term forecasts, not only for one node or a few nodes, but for many nodes in the transmission network, to meet the different needs of gas suppliers, customers in the industrial sector, the service sector, and households. These different needs, reflected in the volume of hourly gas flows, are characterized by different features of their gas supply and consumption in terms of, e.g., dependence (linear and nonlinear) and seasonality (daily, weekly or monthly cycle). Much research on forecasting natural gas demand has been published in recent years. The literature on natural gas forecasting can be categorized using various criteria: forecasting horizon (short- and long-term), types of tools (autoregressive, regression, support vector machine, artificial neural network, and hybrid), area covered (EU countries, the US, China, and others), sampling frequency (hourly, daily, and annual), number of nodes (one node or city/country as a whole or many nodes) and others, see [2] for a comprehensive comparison of recent works. The present work belongs to the subcategory of hybrid forecast models that combine state-of-the-art statistical modeling and artificial neural networks, focusing on the short-term day-ahead forecasting of the hourly natural gas flows for both supply and demand for 92 distribution points in the German high-pressure gas pipeline network.

Autoregressive (AR) models and their variations have been used in energy forecasting. For example, Potočnik et al. [3] and Taşpinar et al. [4] predict daily natural gas consumption using an AR with exogenous variables (ARX) model and a Seasonal Autoregressive Integrated Moving Average (SARIMA) model, respectively. The classical AR models take into account the temporal dependence in the series and compute its future values based on the fitted dynamics learned from history. Most research investigates data with a relatively low time resolution, like daily or even annual, and focuses on a small number of nodes or an amount aggregated over a region/country. When either the time resolution increases to, e.g., hourly, or the number of nodes goes up, the Vector AR (VAR) model is considered attractive for being able to use the lead–lag cross dependence between several series. This comes with a number of unknown parameters that increases quadratically with the number of series. In most cases, this overparametrization causes overfitting and therefore poor out-of-sample forecast performance. Functional AR (FAR) provides a remedy by representing large scale data with a small number of parameters. Hence, the forecasting models are easy to estimate, even with a relatively small amount of data, and readily understandable with a simple structure, cf. the day-ahead hourly forecast of the price of electricity in [5,6] and the day-


ahead forecast of hourly natural gas flows in [7]. Though popular, AR models essentially assume a linear temporal dependence, which is especially inconsistent with high resolution natural gas flows.

Artificial Neural Networks (ANNs) have been used to investigate the nonlinear temporal dependence of energy data, including short-term natural gas forecasts. The Multi-Layer Perceptron (MLP), as a special form of ANNs, has been widely used and considered as a benchmark in different kinds of energy forecast analysis. Szoplik [8] proposes to use an MLP model to predict daily natural gas consumption in Szczecin (Poland). Dombaycı [9] develops an MLP model to predict hourly heating energy consumption of a model house in Denizli (Turkey). MLP is also compared with the recently developed deep neural network (DNN) algorithms. The Convolutional Neural Network (CNN), see [10], and the Long Short Term Memory (LSTM), a type of Recurrent Neural Network (RNN), see [11], have demonstrated impressive pattern detection in image analysis and natural language processing, see [12]. Recently, there have been some works using CNN and RNN for energy data forecasting. For instance, Cai et al. [13] propose to use a gated RNN and a gated CNN to predict the day-ahead multi-step load in commercial buildings. Lago et al. [14] propose to use an MLP, a CNN and two RNNs for predicting electricity prices compared with 27 alternative models, where the machine learning methods in general perform better than the statistical models. Even so, it is challenging to get a stable and satisfactory accuracy with ANNs when applied to time series with stochastic patterns, especially when there are much fewer training observations compared to those in, e.g., computer vision. It is also worth noting that ANNs are pure cyber intelligence, with a black-box solution, making interpretation difficult.

Hybrid models are often favored in real data analysis as they augment the advantages of the different types of approaches. Wei et al. [15] combine an improved singular spectrum analysis with the LSTM model to perform daily forecasting of natural gas demand for 4 cities in 3 climate zones. Bento et al. [16] use a bat optimized MLP combined with a wavelet transform for short-term price forecasting. Akpinar et al. [17] use an artificial bee colony-based optimized MLP model to predict the natural gas consumption in Turkey. Panapakidis and Dagoumas [18] combine Wavelet Transform (WT), Genetic Algorithm optimized Adaptive Neuro-Fuzzy Inference System (ANFIS) and MLP to predict the gas flows for 40 gas distribution points in Greece. Yu and Xu [19] propose an MLP model optimized with a Genetic Algorithm and combined with various data-preprocessing methods to predict the daily natural gas consumption of Shanghai (China). Panapakidis and Dagoumas [20] use an MLP combined with a clustering algorithm to group the training data to forecast day-ahead hourly electricity prices. Keles et al. [21] also combine the MLP with a clustering algorithm, but the clustering algorithm is used for appropriate selection of input data. In the literature on hybrid forecasting models, a merging of AR and ANN has been successful, see the hybrid Elman’s Recurrent Neural Network and ARIMA [22] and the hybrid ARIMA-MLP [23] for Canadian lynx forecasting, the hybrid ARMA-Exponential Smoothing-RNN [24] for Indian stock price forecasting, and the hybrid ARIMA-MLP [25] for forecasting the natural gas demand, just to mention a few. The advances in both statistical modeling and artificial neural networks motivate an update of the hybrid forecast model with cutting edge technologies and an investigation of its forecast accuracy, with a thorough comparison.

The contributions of our work can be summarized as follows:

– A hybrid natural gas forecast model, the Functional AutoRegressive and Convolutional Neural Network (FAR-CNN), is proposed. We incorporate a state-of-the-art statistical model and an ANN to estimate the complex nonlinear temporal dependence with a relatively small training sample size.
– The FAR-CNN is used to forecast day-ahead hourly natural gas flows in both demand and supply, for 92 distribution nodes of different types in the German high-pressure gas pipeline network. The FAR-CNN model outperforms many alternative models, with an improved relative accuracy up to twofold for plant nodes and up to fourfold for municipal nodes. For the border nodes with stable gas flows, it has comparable performance to the best performing alternative model.
– We contribute to the literature by comparing the hybrid FAR-CNN model with a number of alternative forecast models, including univariate, multivariate, functional AR models, and the state-of-the-art machine learning approaches (CNN, LSTM, MLP, LightGBM and another comparable hybrid model), in an experiment in high resolution natural gas flows forecasting.

This work is part of a project in cooperation with Germany’s biggest transmission system operator, where we develop advanced models to provide day-ahead 24 hourly forecasts for a variety of nodes with different characteristics and dependence structures. The forecasts will be further used to optimize the gas distribution in the high-pressure gas pipeline network on a daily basis. The proposed FAR-CNN is the forecast model under consideration for use in the future distribution system.

It is worth mentioning that there has been similar research by Panapakidis and Dagoumas [18], which proposes the hybrid wavelet-ANFIS/NN model to compute day-ahead forecasts for 40 distribution nodes in the national natural gas system of Greece. Besides the different approaches used in this study, the above mentioned paper considers the daily natural gas, only demand, from 01/2014 to 06/2016, while the present work analyzes the hourly gas flows for both demand and supply of 92 nodes in Germany.

The rest of this work is organized as follows. Section 2 presents the data on the natural gas flows. Section 3 details the hybrid FAR-CNN model. Section 4 implements the hybrid model to forecast day-ahead hourly gas flows and conducts a comparison with nine alternative models. Section 5 provides a conclusion.

2. Data

We investigate the static and dynamic patterns of natural gas flows in the German high-pressure gas pipeline network. The gas flows record both the demand and supply of 92 nodes in the network, with an hourly time resolution for 24 h, 7 days a week. These nodes are spread over the territory of Germany and can be classified into three categories according to their functions:

– Border nodes: There are eight border nodes (labeled B-1 to B-8). B-nodes serve as network transfer points with natural gas imported and exported via Germany. The hourly volume of the border nodes is on average 50 times more than that of the plant and municipal nodes.
– Power plant nodes: There are 23 plant nodes (labeled P-1 to P-23). They are nodes where natural gas is used to generate electricity and for factory production.
– Municipal energy supplier nodes: There are 61 nodes (labeled M-1 to M-61). They serve residential and small commercial customers, where natural gas is mainly used for, e.g., space heating, water heating, and cooking.

The dataset covers 637 days (15,288 observations for each node) from 19 April 2016 to 16 January 2018. The period is split into training, validation and test samples. The training sample corresponds to the period from 19 April 2016 to 17 September 2017 (515 days, labeled T1), which is used for learning the unknown parameters in the models. The validation set is from 17 September 2017 to 16 November 2017 (61 days, labeled T2), which is used to choose the optimal hyperparameters in the models. The test sample covers the period from 17 November 2017 to 16 January 2018 (again 61 days, labeled T3), which is used to perform out-of-sample forecasts of the hourly gas demand and supply.

The total hourly gas flows aggregated for each type of node, including the 8 B-nodes, 23 P-nodes and 61 M-nodes, are displayed in Fig. 2. The B-nodes contain both supply and demand flows, i.e., in and out flows. As the aggregate supply flows are greater than the demand flows, the aggregate gas flows for the B-nodes are all positive, i.e., more supply than


Fig. 2. Data and seasonality. The top row displays the aggregate hourly natural gas flows for the different types of nodes. The aggregate gas flows are scaled into the range of [−1, 1] by dividing them by the respective maximum (in absolute value) gas flow. The bottom row displays the monthly, weekly and diurnal patterns of the different types of nodes. The curves represent the average aggregate gas flows for 12 months, 7 days of the week, and 24 h, respectively.
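The scaling and the seasonal averages described in the caption of Fig. 2 can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, the input is assumed to be a gap-free hourly series whose first observation is 19 April 2016, 01:00, and groups with no observations are returned as NaN.

```python
import numpy as np

def scale_to_unit_range(flow):
    """Divide by the maximum absolute value so the series lies in [-1, 1]."""
    flow = np.asarray(flow, dtype=float)
    peak = np.max(np.abs(flow))
    if peak == 0:
        raise ValueError("flow is identically zero; scaling is undefined")
    return flow / peak

def _group_mean(values, labels, n_groups):
    """Mean of `values` per integer label; NaN where a label never occurs."""
    sums = np.bincount(labels, weights=values, minlength=n_groups)
    counts = np.bincount(labels, minlength=n_groups)
    return np.where(counts > 0, sums / np.maximum(counts, 1), np.nan)

def seasonal_profiles(flow, start="2016-04-19T01"):
    """Average an hourly series by month, day of week and hour of day.

    `start` is the assumed timestamp of the first observation.
    """
    flow = np.asarray(flow, dtype=float)
    stamps = np.datetime64(start, "h") + np.arange(flow.size) * np.timedelta64(1, "h")
    days = stamps.astype("datetime64[D]")
    month = stamps.astype("datetime64[M]").astype(int) % 12   # 0 = January
    weekday = (days.astype(int) + 3) % 7                      # 0 = Monday (1970-01-01 was a Thursday)
    hour = (stamps - days).astype(int)                        # 0..23
    return (_group_mean(flow, month, 12),
            _group_mean(flow, weekday, 7),
            _group_mean(flow, hour, 24))
```

Calling `seasonal_profiles(scale_to_unit_range(x))` on an aggregate series `x` returns three arrays of length 12, 7 and 24, which correspond to the three kinds of curves in the bottom row of the figure.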

demand. The P-nodes and M-nodes on the other hand are all demand flows, thus the aggregated gas flows for these two types are negative. To provide an interpretable comparison between the different types of node, the aggregate gas flows are scaled into the range of [−1, 1] by dividing by the respective maximum (in absolute value) gas flow for each type. Although there is in general an increase in demand during the winter, due to the increased need for heating, the seasonal patterns do vary from type to type. For B-nodes, the gas supply is usually bounded by relatively long-term contracts, and so is more dependent on production than on seasonal variation. Naturally, we observe that the B-nodes are generally stable and less sensitive to the change from winter to summer. For the P-nodes, many spikes are observed, caused by, e.g., emergent demand for electricity generation using natural gas. For the M-nodes, the winter demand dominates, obviously due to the need for space and water heating.

Fig. 2 also details the seasonal variations within the different time frames. It displays the average aggregate gas flows of each node type against the twelve months for the monthly comparison, seven days to highlight the difference in volume during working days from that in the weekend, and twenty-four hours for the change from daylight hours to night. While all the three types of node have monthly variations, the B-nodes are insensitive to the weekly and hourly variations, generating almost flat curves, see Fig. 2(e) and (f). The M-nodes are also insensitive to the weekly changes, yet show an obvious diurnal pattern with a relatively smaller volume in the late evening and relatively higher values during the daytime and early evening. On the other hand, the P-nodes exhibit a strong weekly variation, driven by the working routines, and also a strong diurnal pattern, similar to the M-nodes.

A temporal dependence is observed for all the nodes. As an illustration, we discuss three individual nodes. Each of the selected nodes, B-5, P-12, M-25, is the median in terms of the average volume within its type. Fig. 3(a) displays the temporal evolution of the hourly gas flows of the individual nodes. The representative nodes are observed to have similar patterns regarding their aggregate gas flows. For example, the level of the gas flows shifts from day to day for the B-5 node; several spikes are observed in the P-12 node; both strong inter-daily and intra-daily variations are observed for the M-25 node. Fig. 3(b) shows the daily gas flow curves to present the temporal evolution from day to day. The daily curves are obtained by smoothing the discrete hourly observations for each day over the time interval from 1:00 to 24:00. As B-5, P-12, M-25 all have demand flows, which are represented as negative values, we flip the z-axis of the 3D plots to exhibit the evolution of the gas flows. Fig. 3(c) illustrates the sample autocorrelation function (ACF) of the gas flows at 1:00, when human activity is low. The sample ACFs are significantly large up to time lags of 30, corresponding to the same hour the previous night. The sample cross-correlation functions (CCFs) between two distinct hours, 1:00 (late night) and 13:00 (a busy hour in the daytime), are also significant up to long time lags, see Fig. 3(d). This strong serial- and cross-dependence motivates the adoption of AR type models, see [3,4]. AR type models only consider a linear temporal dependence, which may not be appropriate to handle nonlinear structures observed in the real data. Fig. 3(e) displays the lag scatter plots of the residuals after the AR modeling, where the nonparametric regression line (LOESS) evidences the existence of a nonlinear pattern in the gas flows. This indicates that the residuals from the AR type models still contain a useful yet nonlinear pattern that could be used to improve the forecasting.

In summary, the features of the data of one type of node differ from those for the other types of node. For B-nodes, the gas flows are more stable, suggesting that a simple model, such as the sample average, may give good forecast accuracy. As for the P- and M-nodes, the gas flows are more stochastic, with a temporal dependence, requiring a complex forecast model.

3. Method

We propose a hybrid model to describe and forecast the serially dependent high-resolution natural gas flows. The hybrid model aims to exploit the linear pattern of the large-scale time series via the FAR modeling, and the nonlinear dynamics via the CNN, so that it can be safely used for predicting natural gas flows for different types of nodes. A schematic representation of the proposed hybrid model is displayed in Fig. 4.

Instead of modeling the 24 time series of hourly gas flows, sampled each day, we represent the multiple time series as one series of daily curves. The curves are obtained by smoothing the 24 hourly observations and are modeled by a functional autoregression, where the curves are decomposed into a small number of time-dependent coefficients and the corresponding orthogonal basis functions with certain parametric forms. The temporal dependence of the daily gas flow curves is propagated to the series of coefficients, which are independent and thus can be separately estimated. FAR overcomes the problem of overparametrization and simultaneously uses the rich information contained in the cross-dependent curves. The nonlinear dependence, if it


Fig. 3. Plots for B-5, P-12, M-25 nodes; time period is from 19 April 2016 to 16 January 2018. (a) Hourly gas flows (b) Daily gas flows (c) Gas flow sample ACFs at
1:00 (d) Gas flow sample CCFs at 1:00 vs. 13:00 (e) Lag scatter plots for residuals after AR modeling at 1:00.
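The sample ACF and CCF shown in panels (c) and (d) can be computed along the following lines. This is a minimal sketch with hypothetical function names; it assumes the hourly flows of one node are arranged as an array with one row per day and 24 columns (hours 1 to 24), and it uses the usual biased estimator that normalizes by the lag-0 sums of squares.

```python
import numpy as np

def hour_series(flows, hour):
    """Daily series observed at a given hour (1..24) from an (n_days x 24) array."""
    return np.asarray(flows, dtype=float)[:, hour - 1]

def sample_ccf(x, y, max_lag):
    """Sample cross-correlation between x_t and y_{t-lag}, lag = 0..max_lag."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    y = np.asarray(y, dtype=float) - np.mean(y)
    n = x.size
    denom = np.sqrt(np.sum(x ** 2) * np.sum(y ** 2))
    return np.array([np.sum(x[lag:] * y[:n - lag]) / denom
                     for lag in range(max_lag + 1)])

def sample_acf(x, max_lag):
    """Sample autocorrelation: the cross-correlation of a series with itself."""
    return sample_ccf(x, x, max_lag)
```

For an array `flows`, `sample_acf(hour_series(flows, 1), 30)` gives the ACF at 1:00 up to lag 30 days, and `sample_ccf(hour_series(flows, 1), hour_series(flows, 13), 30)` gives the CCF between 1:00 and 13:00. A common rough significance band for n days of data is ±1.96/√n, which holds under a white-noise assumption.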

exists, will be visible in the residuals, which are passed to the CNN framework. The parameters and hyperparameters are estimated by minimizing the squared loss of the CNN forecasts.

In the following, we will give in detail the FAR model and the CNN model.

3.1. FAR model

Suppose given the series of the daily gas curves for days $t \in [1, \ldots, n]$, denoted by $Y_t(\tau)$. These curves are continuous functions defined in the time domain $\tau$ in the Hilbert space $\mathcal{H}$, where the time domain $[0, 1]$ corresponds to the 24 h from 1 o’clock to 24 o’clock. The FAR model is defined as

$$
Y_t(\tau) - \mu(\tau) = \int_0^1 \kappa(\tau - s)\,\{Y_{t-1}(s) - \mu(s)\}\,ds + \varepsilon_t(\tau) \tag{1}
$$

where $\mu(\tau)$ is the mean function of the daily curves, which is assumed to be constant from day to day. The serial dependence of each daily gas


Fig. 4. Schematic representation of the FAR-CNN model.
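The two-stage flow of Fig. 4 (a linear FAR stage whose residuals feed the CNN stage) can be sketched as below. This is an illustrative sketch under several assumptions, not the authors' implementation: the function names and the truncation level `m` are hypothetical; the smoothing step is replaced by a least-squares projection of the 24 hourly values onto a truncated trigonometric basis; the autoregressive slopes are estimated by least squares on the Fourier coefficients, with one slope shared by each sine/cosine pair; and the CNN itself is not implemented, only the construction of its input windows.

```python
import numpy as np

def trig_basis(m, hours=24):
    """Basis matrix (hours x (2m+1)): constant, then sqrt(2)*sin / sqrt(2)*cos pairs."""
    tau = np.arange(1, hours + 1) / hours            # hour h mapped to h/24
    cols = [np.ones(hours)]
    for k in range(1, m + 1):
        cols.append(np.sqrt(2) * np.sin(2 * np.pi * k * tau))
        cols.append(np.sqrt(2) * np.cos(2 * np.pi * k * tau))
    return np.stack(cols, axis=1)

def far_residuals(Y, m=3):
    """Linear stage: fit day t from day t-1 on Fourier coefficients.

    Y has shape (n_days, 24). Returns fitted curves and residuals for days 2..n.
    """
    Y = np.asarray(Y, dtype=float)
    B = trig_basis(m, Y.shape[1])
    coef = np.linalg.lstsq(B, Y.T, rcond=None)[0].T  # (n_days, 2m+1)
    prev, curr = coef[:-1], coef[1:]
    pred = np.empty_like(curr)
    # Constant term: ordinary least-squares AR(1) with intercept.
    xc = prev[:, 0] - prev[:, 0].mean()
    slope = np.sum(xc * (curr[:, 0] - curr[:, 0].mean())) / np.sum(xc ** 2)
    pred[:, 0] = curr[:, 0].mean() + slope * xc
    for k in range(1, m + 1):
        pair = [2 * k - 1, 2 * k]                    # sine and cosine columns
        xs = prev[:, pair] - prev[:, pair].mean(axis=0)
        ys = curr[:, pair] - curr[:, pair].mean(axis=0)
        slope = np.sum(xs * ys) / np.sum(xs ** 2)    # one slope shared by the pair
        pred[:, pair] = curr[:, pair].mean(axis=0) + slope * xs
    Y_hat = pred @ B.T
    return Y_hat, Y[1:] - Y_hat

def residual_windows(Z, q):
    """Nonlinear stage input: the q previous residual days (q x 24) per target day."""
    X = np.stack([Z[t - q:t] for t in range(q, len(Z))])
    return X, Z[q:]
```

In this sketch a CNN would be trained to map each window in `X` to the matching row of the targets. Adding its residual forecast to the FAR forecast to obtain the hybrid forecast is the natural reading of the schematic, but it is stated here as an assumption.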

curve on its own lagged daily curve $Y_{t-1}(\tau)$ is controlled by a convolution kernel Hilbert–Schmidt operator $\kappa$ in $\mathcal{H}$ satisfying $\kappa \in L^2([0, 1])$ and $\|\kappa\|_2 < 1$, where $\|\cdot\|_2$ denotes the $L^2$ norm in $C[0,1]$. Without loss of generality, $\kappa(\tau)$ is assumed to be an even function. The innovation $\varepsilon_t(\tau)$ is a strong $\mathcal{H}$-white noise with zero mean and finite second moment $E\|\varepsilon_t(\tau)\|^2 < \infty$, $t = 1, 2, \ldots, n$, where the norm $\|\cdot\|$ is induced from the inner product of $\mathcal{H}$.

The continuous curves in (1) can be represented by their Fourier transforms, due to the computational tractability of the Fourier transformation and the periodicity of the gas flows. Trigonometric basis functions are used to decompose the curves in (1):

$$
\begin{aligned}
Y_t(\tau) &= a_{t,0} + \sum_{k=1}^{\infty} \left[ b_{t,k}\,\phi_{2k-1}(\tau) + a_{t,k}\,\phi_{2k}(\tau) \right] \\
\varepsilon_t(\tau) &= \varepsilon_{t,0} + \sum_{k=1}^{\infty} \left[ \eta_{t,k}\,\phi_{2k-1}(\tau) + \varepsilon_{t,k}\,\phi_{2k}(\tau) \right] \\
\kappa(\tau) &= c_0 + \sum_{k=1}^{\infty} c_k\,\phi_{2k}(\tau) \\
\mu(\tau) - \int_0^1 \kappa(\tau - s)\,\mu(s)\,ds &= p_0 + \sum_{k=1}^{\infty} \left[ q_k\,\phi_{2k-1}(\tau) + p_k\,\phi_{2k}(\tau) \right]
\end{aligned} \tag{2}
$$

where $\phi_0 = 1_{[0,1]}$, $\phi_{2k-1}(\tau) = \sqrt{2}\sin(2\pi k\tau)$, $\phi_{2k}(\tau) = \sqrt{2}\cos(2\pi k\tau)$ are the basis functions, which are orthogonal and time invariant. The dependence in the gas flow curves is propagated to the Fourier coefficients. The terms $a_{t,0}, a_{t,k}, b_{t,k}$ are the Fourier coefficients for the constant, cosine and sine basis functions for $Y_t(\tau)$; $\varepsilon_{t,0}, \varepsilon_{t,k}, \eta_{t,k}$ are the coefficients for $\varepsilon_t(\tau)$; $c_0, c_k$ are the coefficients for $\kappa(\tau)$; and $p_0, p_k, q_k$ are the coefficients for $\mu(\tau) - \int_0^1 \kappa(\tau - s)\,\mu(s)\,ds$. Reformulating (1) using the above Fourier expansions (2), we obtain relations between the Fourier coefficients:

$$
\begin{aligned}
a_{t,0} &= p_0 + c_0\,a_{t-1,0} + \varepsilon_{t,0} \\
a_{t,k} &= p_k + \tfrac{1}{\sqrt{2}}\,c_k\,a_{t-1,k} + \varepsilon_{t,k}, \qquad k = 1, \ldots, \infty \\
b_{t,k} &= q_k + \tfrac{1}{\sqrt{2}}\,c_k\,b_{t-1,k} + \eta_{t,k}
\end{aligned}
$$

where $a_{t,0}, a_{t,k}, b_{t,k}$ are known from the observed data of the gas flows. The other coefficients, $\varepsilon_{t,0}, \varepsilon_{t,k}, \eta_{t,k}$ for innovations, $c_0, c_k$ for the operator, and $p_0, p_k, q_k$ for the intercept, determine the FAR dynamics and need to be estimated. For $k = 1, \ldots, \infty$, the likelihood is not well defined in an infinite parameter space. Mourid and Bensmain [26] and Chen and Li [5] propose a sieves estimator by using a sequence of subspaces $\{\Theta_{m_n}\}$ of the infinite parameter space $\Theta$, called sieves. The estimation is conducted over the approximating subspaces. The sequence of $\Theta_{m_n}$ is compact and nondecreasing, $\Theta_{m_n} \subseteq \Theta_{m_n+1}$, and $\bigcup \Theta_{m_n}$ is dense in $\Theta$. The sieve $\Theta_{m_n}$ is defined by

$$
\Theta_{m_n} = \left\{ \kappa \in L^2 \;\middle|\; \kappa(\tau) = c_0\,1_{[0,1]}(\tau) + \sum_{k=1}^{m_n} c_k \sqrt{2}\cos(2\pi k\tau),\ \tau \in [0, 1],\ \sum_{k=1}^{m_n} k^2 c_k^2 \le m_n \right\} \tag{3}
$$

where $m_n \to +\infty$ as $n \to +\infty$, i.e., as the sample size increases, the number of parameters increases. The increasing pattern of the sieves is controlled by the constraint $\sum_{k=1}^{m_n} k^2 c_k^2 \le m_n$.

Assuming that the Fourier coefficients for innovations $\varepsilon_{t,0}, \varepsilon_{t,k}, \eta_{t,k}$ are independently and identically Gaussian distributed with mean 0 and finite variance $\sigma_k^2$, a transition density is defined:

$$
\begin{aligned}
g\big(Y_t(\tau), Y_{t-1}(\tau); \kappa(\tau)\big) = {} & (2\pi)^{-(2m_n+1)/2}\,\sigma_0^{-1} \prod_{k=1}^{m_n} \sigma_k^{-2} \\
& \times \exp\Bigg\{ -\frac{1}{2\sigma_0^2}\big(a_{t,0} - p_0 - c_0\,a_{t-1,0}\big)^2 \\
& \quad - \sum_{k=1}^{m_n} \frac{1}{2\sigma_k^2}\Big[ \big(b_{t,k} - q_k - \tfrac{1}{\sqrt{2}}c_k\,b_{t-1,k}\big)^2 + \big(a_{t,k} - p_k - \tfrac{1}{\sqrt{2}}c_k\,a_{t-1,k}\big)^2 \Big] \Bigg\}
\end{aligned}
$$


from which the log-likelihood conditional on a fixed convolution kernel operator $\kappa(\tau)$ is

$$
\begin{aligned}
L\big(Y_1(\tau), \ldots, Y_n(\tau); \kappa(\tau)\big) = {} & \sum_{t=2}^{n} \log g\big(Y_t(\tau), Y_{t-1}(\tau); \kappa(\tau)\big) \\
= {} & -\frac{(2m_n+1)(n-1)}{2}\log 2\pi - (n-1)\log\sigma_0 - (n-1)\sum_{k=1}^{m_n}\log\sigma_k^2 \\
& - \sum_{t=2}^{n}\Bigg\{ \frac{1}{2\sigma_0^2}\big(a_{t,0} - p_0 - c_0\,a_{t-1,0}\big)^2 \\
& \quad + \sum_{k=1}^{m_n}\frac{1}{2\sigma_k^2}\Big[ \big(b_{t,k} - q_k - \tfrac{1}{\sqrt{2}}c_k\,b_{t-1,k}\big)^2 + \big(a_{t,k} - p_k - \tfrac{1}{\sqrt{2}}c_k\,a_{t-1,k}\big)^2 \Big] \Bigg\}
\end{aligned}
$$

The sieve estimators for the Fourier coefficients for the convolution kernel operator can be computed analytically by

$$
\begin{aligned}
\hat{c}_0 &= \frac{\sum_{t=2}^{n} a_{t,0}\,\sum_{t=2}^{n} a_{t-1,0} - (n-1)\sum_{t=2}^{n} a_{t,0}\,a_{t-1,0}}{\big(\sum_{t=2}^{n} a_{t-1,0}\big)^2 - (n-1)\sum_{t=2}^{n} a_{t-1,0}^2} \\
\hat{c}_k &= \sqrt{2}\;\frac{\sum_t \big(a_{t,k}\,a_{t-1,k} + b_{t,k}\,b_{t-1,k}\big) - \big(\sum_t a_{t,k}\sum_t a_{t-1,k} + \sum_t b_{t,k}\sum_t b_{t-1,k}\big)/(n-1)}{\sum_t \big(a_{t-1,k}^2 + b_{t-1,k}^2\big) - \big\{\big(\sum_t a_{t-1,k}\big)^2 + \big(\sum_t b_{t-1,k}\big)^2\big\}/(n-1)}
\end{aligned}
$$

and the estimates for the innovation variance $\sigma_k^2$ and the intercept term are

$$
\begin{aligned}
\hat{\sigma}_0^2 &= \frac{\sum_{t=2}^{n}\big(a_{t,0} - \hat{p}_0 - \hat{c}_0\,a_{t-1,0}\big)^2}{n-1} \\
\hat{p}_0 &= \frac{-\hat{c}_0\sum_{t=2}^{n} a_{t-1,0} + \sum_{t=2}^{n} a_{t,0}}{n-1} \\
\hat{\sigma}_k^2 &= \frac{\sum_{t=2}^{n}\Big[\big(b_{t,k} - \hat{q}_k - \tfrac{1}{\sqrt{2}}\hat{c}_k\,b_{t-1,k}\big)^2 + \big(a_{t,k} - \hat{p}_k - \tfrac{1}{\sqrt{2}}\hat{c}_k\,a_{t-1,k}\big)^2\Big]}{2(n-1)} \\
\hat{p}_k &= \frac{\sum_{t=2}^{n} a_{t,k} - \tfrac{1}{\sqrt{2}}\hat{c}_k\sum_{t=2}^{n} a_{t-1,k}}{n-1} \\
\hat{q}_k &= \frac{\sum_{t=2}^{n} b_{t,k} - \tfrac{1}{\sqrt{2}}\hat{c}_k\sum_{t=2}^{n} b_{t-1,k}}{n-1}
\end{aligned}
$$

which leads to the predicted daily gas curves

$$
\hat{Y}_t(\tau) = \hat{p}_0 + \sum_{k=1}^{m_n}\left[\hat{q}_k\,\phi_{2k-1}(\tau) + \hat{p}_k\,\phi_{2k}(\tau)\right] + \int_0^1 \hat{\kappa}(\tau - s)\,Y_{t-1}(s)\,ds
$$

Note that $Y_t(\tau)$ is obtained by smoothing the 24 discrete hourly gas flows vector $\mathbf{Y}_t = [Y_{t,1}, \ldots, Y_{t,24}]$, where $Y_{t,h}$ denotes the gas flow at day $t$ at hour $h$. The vector of predicted hourly gas flows is constituted by the discrete points in the predicted gas curve

$$
\hat{\mathbf{Y}}_t = [\hat{Y}_t(1/24), \hat{Y}_t(2/24), \ldots, \hat{Y}_t(24/24)]^T \tag{4}
$$

The vector of the 24 hourly residuals $Z_t \in \mathbb{R}^{24 \times 1}$ is

$$
Z_t = \mathbf{Y}_t - \hat{\mathbf{Y}}_t \tag{5}
$$

The convolutional kernel Hilbert–Schmidt operator in the FAR model is eventually a linear combination of the gas flows at different time points $s$ in the previous day with weights $\kappa(\tau - s)$. This means that the FAR model only considers the linear serial dependence in the data, but is incapable of detecting any nonlinear dependence structure. In addition, the kernel operator only convolves over the gas flows across different hours on the previous day, which overlooks the seasonality, which extends over several days. This explains why in Fig. 3(e) there is still a serial dependence left over in the residuals. We consider using the nonlinearity in the residuals.

3.2. Convolutional neural network

We adopt the CNN model to help automatically extract the leftover patterns embedded in the residuals from the FAR model, if any. CNN differs from the Multi-Layer Perceptron (MLP), another class of ANN, in that MLP requires the full connectedness of the neurons and is prone to overfit, while the CNN takes a hierarchical pattern and assumes a simple structured connectedness between the neurons. There are multiple layers in the CNN model: the input layer, the convolutional layer, the nonlinear ReLU layer, the pooling layer, the flattening layer, some fully connected hidden layers, and the output layer. Depending on the design of the network, different neurons are arranged in each layer and joined together.

We model the vector of current residuals by a CNN model with the information of the residuals of the past $q$ days. In particular, at day $t$, the input of the CNN model is the set of residuals $\{Z_{t-q}, \ldots, Z_{t-1}\}$. To initialize the network, denote the input layer by $X_t^{(0)}$, $X_t^{(0)} = [Z_{t-q}^T; \ldots; Z_{t-1}^T] \in \mathbb{R}^{q \times 24}$, with each row representing one day’s residuals.

The first layer after the input layer is a convolutional layer. The convolutional layer can learn the location invariant features in the inputs $X_t^{(0)}$. In addition, via parameter sharing, the convolutional layer has many fewer parameters to be estimated compared with MLP, making it easier to train. Suppose the number of convolutional filters is $K$ and the dimensions of each filter are the same. The height of the convolutional filters is chosen to be $q$ so that the convolutional filters will convolve across $q$ days by processing the residuals at the same hour across different days simultaneously to use the seasonality and non-linearity overlooked by the FAR model. Associate a weight matrix to each convolutional filter and denote the weight matrix associated to the $k$th convolutional filter by $W_k$. It will have $q$ rows. Denote the number of its columns by $k_w$. Each filter will take a receptive field with dimensions of $q \times k_w$ on $X_t^{(0)}$ and the dot product of the receptive field and the filter is taken. The receptive field will move from left to right with a stride of 1 and the dot product is taken along the process until the receptive field arrives at the right corner of $X_t^{(0)}$. The output of the convolutional layer, denoted by $X_t^{(1)}$, is the concatenated results from the dot product operation, see the convolutional layer in Fig. 4. Usually, a non-linear ReLU layer ($\mathrm{ReLU}(\cdot) = \max(\cdot, 0)$, with no parameter) will be attached after the convolutional layer to introduce non-linearity into the CNN. The output of the convolutional layer plus the ReLU layer (we treat the two layers as one layer) can be expressed by

$$
X^{(1)}_{t,(k,i,j)} = \mathrm{ReLU}\left(\sum_{a=1}^{q}\sum_{b=1}^{k_w} X^{(0)}_{t,(i+a-1,\,j+b-1)} \cdot W_{k,(a,b)}\right), \qquad k = 1, \ldots, K \tag{6}
$$

where $X^{(1)}_{t,(k,i,j)}$ is the element at the $(k, i, j)$th position of $X_t^{(1)}$ and $W_{k,(a,b)}$ is the element at the $(a, b)$th position of $W_k$. The output of the convolutional layer $X_t^{(1)}$ is an array with three dimensions: these dimensions are denoted by $d_c^{(1)}, d_h^{(1)}, d_w^{(1)}$, which can be calculated by $d_c^{(1)} = K$, $d_h^{(1)} = 1$, and $d_w^{(1)} = 24 - k_w + 1$. The set of parameters for the convolutional layer is $\{W_k, k = 1, \ldots, K\}$. The hyperparameters for the convolutional layer include the width $k_w$ of the convolutional filters and the number $K$ of convolutional filters.

After the convolutional layer (plus ReLU), the pooling layer is used to regularize the size of the representation. For the pooling layer, a receptive field of dimension $1 \times p_w$ is taken from $X_t^{(1)}$. Max pooling takes the maximum over a receptive field to represent the field. An alternative, average pooling, takes the average value to represent the receptive field. Max pooling pays attention to extremes and therefore is able to provide robust and safe performance for decision making in, e.g., transmission scheduling. Similarly, the receptive field of dimension $1 \times p_w$ will move from left to right with a stride of 1 until arriving at the right corner of the outputs of the previous layer $X_t^{(1)}$. Mathematically, the max pooling layer can be formulated as

$$
X^{(2)}_{t,(c,i,j)} = \max_{a \in \{1, \ldots, p_w\}} \left(X^{(1)}_{t,(c,i,j+a-1)}\right), \qquad c = 1, \ldots, d_c^{(1)} \tag{7}
$$

and the average pooling layer as

$$
X^{(2)}_{t,(c,i,j)} = \frac{1}{p_h \cdot p_w}\sum_{a=1}^{p_w} X^{(1)}_{t,(c,i,j+a-1)}, \qquad c = 1, \ldots, d_c^{(1)} \tag{8}
$$

where $p_h = 1$ is the height of the $1 \times p_w$ receptive field, and the dimensions of $X_t^{(2)}$ are denoted by $d_c^{(2)}, d_h^{(2)}, d_w^{(2)}$ and can be calculated by $d_c^{(2)} = d_c^{(1)}$, $d_h^{(2)} = 1$, $d_w^{(2)} = d_w^{(1)} - p_w + 1$. In the pooling layer, there is no parameter and the hyperparameters are the width of the receptive field $p_w$ and the type of the pooling layer (Max or Average).

After the convolutional layers (plus ReLU) and the pooling layer, a flattening layer will be used to reshape the output from the last layer


$X_t^{(2)}$ with dimensions of $d_c^{(2)} \times d_h^{(2)} \times d_w^{(2)}$ to be $X_t^{(3)}$ with dimensions of $d_c^{(2)} d_h^{(2)} d_w^{(2)} \times 1$. Several fully connected layers will then be used to transfer the knowledge learned by the previous layers to produce the final prediction. A fully connected layer at the $l$th hidden layer can be formulated as

$$X_t^{(l)} = \mathrm{ReLU}\left(W^{(l)} X_t^{(l-1)} + b^{(l)}\right), \quad l = 4, \ldots, L - 1 \tag{9}$$

where $W^{(l)} \in \mathbb{R}^{d^{(l)} \times d^{(l-1)}}$ is a matrix that weights all the signals coming from the previous layer $X_t^{(l-1)}$ and $b^{(l)} \in \mathbb{R}^{d^{(l)} \times 1}$ is the bias vector. For each fully connected layer, the parameters are $\{W^{(l)}, b^{(l)}\}$ and the hyperparameters for the fully connected layers are the number of the fully connected layers and the number of hidden units $d^{(l)}$ in each layer.

The output of the CNN model is the $24 \times 1$ vector of predicted 24 hourly residuals at day $t$, denoted by $\hat{\mathbf{Z}}_t$:

$$\hat{\mathbf{Z}}_t = W^{(L)} X_t^{(L-1)} + b^{(L)} \tag{10}$$

where $W^{(L)} \in \mathbb{R}^{24 \times d^{(L-1)}}$ and $b^{(L)} \in \mathbb{R}^{24 \times 1}$ are the parameters for the fully connected output layer.

Denote the set of all parameters in the $L$-layer CNN model as $\Theta_{\mathrm{CNN}(L)}$, which contains all parameters of the convolutional layers and fully connected layers. The squared loss of the CNN model can be defined as

$$\mathrm{Loss}(\Theta_{\mathrm{CNN}(L)}) = (\mathbf{Z}_t - \hat{\mathbf{Z}}_t)^2 \tag{11}$$

The parameters of the CNN model can be estimated by minimizing the squared loss with the stochastic gradient descent (SGD) method.

3.3. FAR-CNN

After obtaining the FAR and CNN models as described above, we can obtain the out-of-sample final prediction for the gas flows from the FAR-CNN model, denoted by $\mathbf{F}_t$, $t \in T_3$, by adding the prediction for the gas flows from the FAR model in Eq. (4) and the prediction for the residuals in Eq. (10) together:

$$\mathbf{F}_t = \hat{\mathbf{Y}}_t + \hat{\mathbf{Z}}_t, \quad t \in T_3 \tag{12}$$

Algorithm 1 outlines the general steps for the FAR-CNN model.

Algorithm 1. The FAR-CNN model

Training and validation:
1: Fit the functional time series $Y_t(\tau)$ with the FAR model and use the maximum likelihood method to estimate the parameters;
2: Obtain the FAR prediction for the vector of the next 24 hourly gas flows $\hat{\mathbf{Y}}_t$ and the vector of residuals $\mathbf{Z}_t$ for $t \in T_1 \cup T_2$;
3: Model the residuals with a CNN model and train the model with SGD; select the hyperparameters of the CNN model over the validation set $T_2$;
Testing:
4: Obtain the prediction for the vector of the next 24 hourly gas flows from the FAR model $\hat{\mathbf{Y}}_t$, $t \in T_3$;
5: Obtain the prediction for the vector of the next 24 hourly residuals from the CNN model $\hat{\mathbf{Z}}_t$, $t \in T_3$;
6: Calculate the final prediction for the next 24 hourly gas flows with $\mathbf{F}_t = \hat{\mathbf{Y}}_t + \hat{\mathbf{Z}}_t$, $t \in T_3$;
return $\mathbf{F}_t$

3.4. Alternative models

We consider several alternative models, which can be categorized into four categories: the functional AR models, the univariate/multivariate AR models, ANN models, and the decision tree model.

1. The FAR model is used to investigate the linear temporal dependence between the daily flow curves.

FAR: $Y_t(\tau) - \mu(\tau) = \int_0^1 \kappa(\tau - s)\{Y_{t-1}(s) - \mu(s)\}\, ds + \varepsilon_t(\tau)$.

By comparison, it helps to show whether the hybrid FAR-CNN model is able to benefit from learning the nonlinear pattern in the residuals.

2. The univariate/multivariate AR models are commonly used in the energy forecasting literature, see a review in [27]. We consider the univariate AR model, the seasonal AR (SAR) model, and the vector AR (VAR) model. The AR and SAR models are implemented separately for each hour, resulting in a total of 24 models for each node. In the SAR modeling framework, the lag-2 and lag-7 values are also incorporated, motivated by the weekly pattern. The VAR model is implemented by jointly analyzing the 24 hourly time series.

AR: $Y_{t,h} - \mu_h = \alpha_h (Y_{t-1,h} - \mu_h) + \varepsilon_{t,h}$, $h = 1, \ldots, 24$.

SAR: $Y_{t,h} - \mu_h = \alpha_{1,h} (Y_{t-1,h} - \mu_h) + \alpha_{2,h} (Y_{t-2,h} - \mu_h) + \alpha_{3,h} (Y_{t-7,h} - \mu_h) + \varepsilon_{t,h}$, $h = 1, \ldots, 24$.

VAR: $\mathbf{Y}_t = \Gamma_0 + \Gamma_1 \mathbf{Y}_{t-1} + \boldsymbol{\varepsilon}_t$, where $\mathbf{Y}_t = [Y_{t,1}, \ldots, Y_{t,24}]^T$

where $Y_{t,h}$ represents the discrete gas flows for the specific day $t$ and hour $h$, and $\mu_h$ represents the mean value at hour $h$. In the VAR model, $\mathbf{Y}_t$ is a 24-dimensional vector consisting of 24 hourly gas flows for day $t$. The $\alpha_h, \alpha_{1,h}, \alpha_{2,h}, \alpha_{3,h}$ are the coefficients corresponding to the specific hour $h$ and the $\Gamma_0, \Gamma_1$ are the matrices of coefficients for the VAR model. Furthermore, $\varepsilon_{t,h}$ is the corresponding residual at day $t$ and hour $h$ while $\boldsymbol{\varepsilon}_t$ is the vector of residuals for day $t$. Comparing the FAR-CNN model with the AR and SAR models helps to show whether the cross-dependence is useful to improve the forecasting. Comparing the FAR-CNN model with VAR helps to show whether we can avoid overfitting by reducing the parameter space.

3. For the ANN models, we consider the MLP, the LSTM (a popular RNN model) and the CNN model. We use the neural network models to find a function directly mapping from the previous gas flows to the gas flows of the next day. For a fair comparison, we also use the past $q$ days' gas flows $\mathbf{Y}_{t-1}, \ldots, \mathbf{Y}_{t-q}$ as input to predict the next 24 hourly gas flows $\mathbf{Y}_t$.

– MLP contains several fully connected layers. Suppose the MLP model has $L$ layers and denote the input of the MLP by $X_t^{(0)} = [\mathbf{Y}_{t-1}, \ldots, \mathbf{Y}_{t-q}]$, which has the dimensions $24q \times 1$. Write $X_t^{(l)}$ for the output vector of the $l$th layer and denote its dimensions by $d^{(l)} \times 1$. A fully connected layer at the $l$th hidden layer can be formulated as

$$X_t^{(l)} = \mathrm{ReLU}\left(W^{(l)} X_t^{(l-1)} + b^{(l)}\right), \quad l = 1, \ldots, L - 1$$

where $W^{(l)} \in \mathbb{R}^{d^{(l)} \times d^{(l-1)}}$ is a matrix and $b^{(l)} \in \mathbb{R}^{d^{(l)} \times 1}$ is a bias vector. The output of the MLP model is the predicted vector of 24 hourly gas flows at day $t$, denoted by $\mathbf{F}_t$, and is given by

$$\mathbf{F}_t = W^{(L)} X_t^{(L-1)} + b^{(L)}$$

where $W^{(L)} \in \mathbb{R}^{24 \times d^{(L-1)}}$ and $b^{(L)} \in \mathbb{R}^{24 \times 1}$ are the parameters. The hyperparameters for the MLP model include the number of fully connected layers and the number of hidden units $d^{(l)}$ in each layer.

– LSTM is a type of RNN with a specially designed gate structure to mitigate the gradient vanishing problem. Instead of feeding all the past information to the input layer together, as in MLP, LSTM processes the past $q$ days' information sequentially with $q$ blocks. Suppose there are $d^{(1)}$ hidden units in the LSTM layer in the model. The input to the LSTM model is $\{\mathbf{Y}_{t-q}, \ldots, \mathbf{Y}_{t-1}\}$. Denote the Input Gate by $IG_{t,i} \in \mathbb{R}^{d^{(1)}}$, the Forget Gate by $FG_{t,i} \in \mathbb{R}^{d^{(1)}}$, and the Output Gates by $OG_{t,i} \in \mathbb{R}^{d^{(1)}}$, $i = 1, \ldots, q$, which are used to control the flow of the information with a maximum value of 1 by keeping all the corresponding information and with a minimum value of 0 by deleting all the corresponding information. The computation in the LSTM layer is as follows. For $i = 1, \ldots, q$,


$$FG_{t,i} = \mathrm{Sigmoid}(W_f \times [h_{t,i-1}, \mathbf{Y}_{t,i}] + b_f)$$
$$IG_{t,i} = \mathrm{Sigmoid}(W_i \times [h_{t,i-1}, \mathbf{Y}_{t,i}] + b_i)$$
$$OG_{t,i} = \mathrm{Sigmoid}(W_o \times [h_{t,i-1}, \mathbf{Y}_{t,i}] + b_o)$$
$$\tilde{C}_{t,i} = \mathrm{Tanh}(W_C \times [h_{t,i-1}, \mathbf{Y}_{t,i}] + b_C)$$
$$C_{t,i} = FG_{t,i} \times C_{t,i-1} + IG_{t,i} \times \tilde{C}_{t,i}$$
$$h_{t,i} = OG_{t,i} \times \mathrm{Tanh}(C_{t,i})$$

where $h_{t,i} \in \mathbb{R}^{d^{(1)}}$ and $C_{t,i} \in \mathbb{R}^{d^{(1)}}$ are the hidden state and cell state for the $i$th block. The initial hidden state $h_{t,0}$ and cell state $C_{t,0}$ are zero vectors. The weight matrices $W_f, W_i, W_o, W_C \in \mathbb{R}^{d^{(1)} \times (d^{(1)} + 24)}$ and the bias vectors $b_f, b_i, b_o, b_C \in \mathbb{R}^{d^{(1)} \times 1}$ are unknown parameters. Here, $h_{t,q}$ encodes all the information coming from the past information $\mathbf{Y}_{t-1}, \ldots, \mathbf{Y}_{t-q}$. Again, we use several fully connected layers to transform the information encoded in $h_{t,q}$ to our final prediction for the gas flows:

$$X_t^{(l)} = \mathrm{ReLU}\left(W^{(l)} X_t^{(l-1)} + b^{(l)}\right), \quad l = 2, \ldots, L - 1$$

where $X_t^{(1)} = h_{t,q}$, $W^{(l)} \in \mathbb{R}^{d^{(l)} \times d^{(l-1)}}$ is a matrix, and $b^{(l)} \in \mathbb{R}^{d^{(l)} \times 1}$ is a bias vector. The output of the LSTM model is the vector of predicted 24 hourly gas flows at day $t$, denoted by $\mathbf{F}_t$, and is given by

$$\mathbf{F}_t = W^{(L)} X_t^{(L-1)} + b^{(L)}$$

The hyperparameters in the LSTM model include the number of fully connected layers, the number of hidden units in the LSTM layers, and the number of hidden units in the fully connected layers.

– For the CNN model directly applied to the data of the gas flows, the input is the concatenated gas flow matrix $[\mathbf{Y}_{t-1}^T; \ldots; \mathbf{Y}_{t-q}^T] \in \mathbb{R}^{q \times 24}$ where each row represents the gas flows for one day. The output is the predicted 24 hourly gas flows. The hyperparameters in the CNN model are the same as in the FAR-CNN model.

The parameters of the above three ANN models are obtained by minimizing the squared loss between the predicted and original gas flows using SGD. The hyperparameters are optimized by minimizing the squared loss over the validation set $T_2$. Comparing the neural network models indicates whether combining statistical models with neural network models will help improve the forecast accuracy for relatively smaller datasets.

4. LightGBM is a gradient boosting decision tree method, which provides an alternative way to handle complex dependencies. Again, we use the previous $q$ days' hourly gas flows to predict the day-ahead 24 hourly gas flows individually. Therefore, we obtain a total of 24 tree models, one for each hour's gas flows. Trees are designed to find groups of similar observations. The trees grow by finding the best split variable and the split value sequentially. At each step, a new branch will sort the data left from the last step into bins based on one of the input variables. In the end, the decision regression tree will split the data into several inclusive regions $R_m$, $m = 1, \ldots, M$ and can be formulated as a linear combination of $M$ indicator functions:

$$f_h(\mathbf{Y}_{t-1}, \ldots, \mathbf{Y}_{t-q}) = \sum_{m=1}^{M} c_{m,h}\, \mathbb{1}\left((\mathbf{Y}_{t-1}, \ldots, \mathbf{Y}_{t-q}) \in R_m\right), \quad h = 1, \ldots, 24$$

For each partition, we will use the average of the target values $Y_{t,h}$ in that region $R_m$ to predict all the points in $R_m$, i.e.,

$$c_{m,h} = \mathrm{Average}\left(Y_{t,h} \mid (\mathbf{Y}_{t-1}, \ldots, \mathbf{Y}_{t-q}) \in R_m\right), \quad h = 1, \ldots, 24$$

The specific variable upon which a branch is based and the exact value where the branch is split are chosen by minimizing the squared loss between the real values and the forecast values. In order to improve the forecasting, an ensemble of tree models is trained with a gradient boosting algorithm, where samples with small gradients are excluded to speed up the training process, see [28]. We use LightGBM to investigate the performance of state-of-the-art machine learning algorithms other than the ANNs in predicting natural gas flows.

The features of different models are listed in Table 1. Compared to the alternative models, the proposed hybrid FAR-CNN provides a powerful modeling framework to incorporate complex dependence from cross-dependence to non-linearity, and simultaneously benefits from augmenting statistical modeling and machine learning. On the other hand, the cross-dependence cannot be captured by the univariate AR models (AR and SAR), although SAR can also utilize the seasonality in the modeling. The LightGBM and the ANNs, including the hybrid models, can handle non-linear dependence, but there often exists the overfitting problem when the sample size is relatively small.

4. Real data analysis

In this section, we apply the hybrid FAR-CNN model to perform a day-ahead forecast of the hourly gas flows in the German transmission network. We are interested in studying how much the combination of the state-of-the-art statistical modeling and neural network learning, namely the FAR and the CNN, benefits the out-of-sample forecasting of high-resolution energy data with a complex dependence and a relatively small sample size. We investigate the relative accuracy of the proposed hybrid model in comparison to several alternative models and methods that are popular in the literature on energy forecasting.

4.1. Forecasting procedure

We use data for the German hourly flows of natural gas from 19 April 2016 to 16 January 2018; for details, see Section 2. The sample period is split into a training set from 19 April 2016 to 17 September 2017 (labeled $T_1$), a validation set from 17 September 2017 to 16 November 2017 (labeled $T_2$), and a test set from 17 November 2017 to 16 January 2018 (labeled $T_3$). We make out-of-sample day-ahead forecasts over $T_3$, covering 61 days.

We train the model over $T_1$, and optimize the hyperparameters over $T_2$ by minimizing the squared loss. The fitted model is then used to obtain the out-of-sample forecast where the input is the vector of seven days' data from the previous week, i.e., $q = 7$. We do not update the parameters and hyperparameters in the testing period.

For the alternative NN models and machine learning model, we use the data for the previous seven days' gas flows as the input to perform the day-ahead forecasting. The hyperparameters are chosen similarly. For the alternative AR type models, stationarity is required to ensure stable estimates. In addition, the estimation of AR type models is much

Table 1
Features of the models considered
Features AR SAR VAR FAR MLP LSTM CNN Light GBM FAR-LSTM FAR-CNN

Cross-dependence ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓
Seasonality ✓ ✓ ✓
Non-linearity ✓ ✓ ✓ ✓ ✓ ✓
Easy overfitting ✓ ✓ ✓ ✓ ✓


Table 2
Comparison of the FAR-CNN model and alternative models. The first column shows the MAPE, nRMSE, and MARNE for FAR-CNN. The other columns show the relative MAPE, relative nRMSE, and relative MARNE for the alternative models.
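As a minimal sketch of the relative measures named in the caption of Table 2, the snippet below computes metric(model i) / metric(FAR-CNN) − 1. The function name `relative_measure` and the numbers are hypothetical and chosen only to show the sign convention; they are not values from Table 2.

```python
def relative_measure(metric_model: float, metric_benchmark: float) -> float:
    """Relative error against the benchmark: metric(model_i) / metric(FAR-CNN) - 1."""
    if metric_benchmark <= 0:
        raise ValueError("benchmark metric must be positive")
    return metric_model / metric_benchmark - 1.0


# Hypothetical MAPE values, used only to illustrate the sign convention.
benchmark_mape = 10.0      # stands in for the FAR-CNN benchmark
worse_model_mape = 12.5    # larger error than the benchmark
better_model_mape = 9.5    # smaller error than the benchmark

# A positive value means the benchmark has the smaller error.
print(f"{relative_measure(worse_model_mape, benchmark_mape):+.1%}")
# A negative value means the alternative model has the smaller error.
print(f"{relative_measure(better_model_mape, benchmark_mape):+.1%}")
```

Under this convention a relative MAPE of 268.3% corresponds to an alternative model whose MAPE is about 3.7 times that of the benchmark.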

faster than for neural network models and LightGBM. Therefore, we use a 28-day rolling window to estimate the parameters.

4.2. Measurement of the accuracy

A set of criteria is used to elaborate the accuracy of each forecast: the Mean Absolute Percentage Error (MAPE), the normalized Root Mean Squared Error (nRMSE), and the Mean Absolute Range Normalized Error (MARNE):

$$\mathrm{MAPE}_h = \frac{1}{24|T_3|} \sum_{t \in T_3} \sum_{h=1}^{24} \left|\frac{Y_{t,h} - F_{t,h}}{Y_{t,h}}\right|$$

$$\mathrm{RMSE}_h = \sqrt{\frac{1}{24|T_3|} \sum_{t \in T_3} \sum_{h=1}^{24} (Y_{t,h} - F_{t,h})^2} \times 100$$

$$\mathrm{nRMSE}_h = \frac{\mathrm{RMSE}_h}{\max_{t,h}(Y_{t,h}) - \min_{t,h}(Y_{t,h})}$$

$$\mathrm{MARNE}_h = \frac{1}{24|T_3|} \sum_{t \in T_3} \sum_{h=1}^{24} \frac{|Y_{t,h} - F_{t,h}|}{\max_{t,h}(|Y_{t,h}|)} \times 100$$

where $Y_{t,h}$ and $F_{t,h}$ denote the real and forecast values of the natural gas flows at day $t$ and hour $h$, $t = 1, \ldots, 61$ and $h = 1, \ldots, 24$, while $|T_3|$ is the length of the test period, with a value of 61 in our case. All are scale-free measurements with different foci. MAPE indicates the average level of relative accuracy, nRMSE indicates the variance standardized by the original range, and MARNE indicates the relative performance standardized by the maximum absolute value. Each measurement is computed over all the hourly forecasts in the test period. The smaller the values of the criteria, the better the accuracy.

For ease of interpretation, the FAR-CNN model is used as the benchmark and we compute the relative MAPE, relative nRMSE and relative MARNE for model $i$ with

$$\text{relative MAPE} = \frac{\mathrm{MAPE}(\text{model}_i)}{\mathrm{MAPE}(\text{FAR-CNN})} - 1$$

$$\text{relative nRMSE} = \frac{\mathrm{nRMSE}(\text{model}_i)}{\mathrm{nRMSE}(\text{FAR-CNN})} - 1$$

$$\text{relative MARNE} = \frac{\mathrm{MARNE}(\text{model}_i)}{\mathrm{MARNE}(\text{FAR-CNN})} - 1$$

A positive (negative) value of the relative measurements means that the FAR-CNN model outperforms (underperforms) the alternative model $i$, and the magnitude gives the percentage improvement (degradation).

4.3. Results and discussion

The average values of the relative MAPE, relative nRMSE, and relative MARNE of each alternative model are reported in Table 2, where the accuracy of the benchmark, i.e., FAR-CNN, is listed as a reference. When all the nodes are considered (labeled 'All' in the table), the FAR-CNN model outperforms all the alternative models. The relative MAPE ranges from 4.2% (LSTM) to 268.3% (MLP), indicating at least a 4.2% improvement, and up to about 2.7-fold. For plant and municipal type nodes, the dominating performance remains, while for border type nodes, FAR-CNN is not always the winner. In particular, for the B-nodes, the AR type models tend to perform better than the FAR-CNN model, with negative values of relative MAPE (−5.0% for VAR, −1.9% for FAR and −1.4% for SAR). This actually is in conformance with the information

Fig. 5. Boxplots of the hourly MAPE of different methods for different types of nodes. The orange line indicates the median and the green triangle shows the mean.
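The hourly MAPE summarised in Fig. 5, together with the nRMSE and MARNE criteria of Section 4.2, can be sketched in plain Python as below. This is an illustrative sketch, not the authors' code: the function names are hypothetical, the inputs are assumed to be one row of 24 hourly values per test day, and all three measures are multiplied by 100 here so that they read as percentages, which is an assumption of this sketch.

```python
from math import sqrt
from typing import List, Sequence, Tuple

Flows = Sequence[Sequence[float]]  # one row per test day, hourly values per row


def _pairs(actual: Flows, forecast: Flows) -> List[Tuple[float, float]]:
    """Pair every actual value with its forecast, over all days and hours."""
    return [(y, f) for day_y, day_f in zip(actual, forecast)
            for y, f in zip(day_y, day_f)]


def mape(actual: Flows, forecast: Flows) -> float:
    """Mean absolute percentage error; undefined if an actual flow is zero."""
    pairs = _pairs(actual, forecast)
    return 100.0 * sum(abs((y - f) / y) for y, f in pairs) / len(pairs)


def nrmse(actual: Flows, forecast: Flows) -> float:
    """Root mean squared error divided by the range of the actual values."""
    pairs = _pairs(actual, forecast)
    rmse = sqrt(sum((y - f) ** 2 for y, f in pairs) / len(pairs))
    values = [y for y, _ in pairs]
    return 100.0 * rmse / (max(values) - min(values))


def marne(actual: Flows, forecast: Flows) -> float:
    """Mean absolute error divided by the largest absolute actual value."""
    pairs = _pairs(actual, forecast)
    mean_abs_error = sum(abs(y - f) for y, f in pairs) / len(pairs)
    return 100.0 * mean_abs_error / max(abs(y) for y, _ in pairs)


# Toy data: 2 days x 2 hours stand in for the 61 days x 24 hours of the test set.
actual = [[10.0, 20.0], [30.0, 40.0]]
forecast = [[11.0, 18.0], [33.0, 40.0]]
print(mape(actual, forecast), nrmse(actual, forecast), marne(actual, forecast))
```

MAPE divides by each actual value, so nodes whose flows pass through zero need separate handling; nRMSE and MARNE avoid this by normalising with the range and the maximum absolute value, respectively.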


from the exploratory data analysis in Section 2, which shows that the B-nodes in general have stabler gas flows. Simple models such as autoregression are good enough to achieve reasonable accuracy. On the other hand, for the P-nodes and M-nodes, which have a more complex dependency structure, the reduction in the forecast error ranges from 3.6% (LSTM) to 128.5% (MLP) for the P-nodes, and from 4.5% (LSTM) to 340.5% (MLP) for the M-nodes. They do require a more comprehensive model to handle the forecasting. Similar results are observed for the relative nRMSE and relative MARNE evaluation criteria, which implies a stable performance of FAR-CNN in terms of variance and extremes.

Moreover, the FAR-CNN model is generally better than the FAR

Fig. 6. The original hourly gas flows (the gray area) and the day-ahead 24 hourly predicted gas flows from the FAR-CNN model (the red dashed line) for 15 nodes
from 17 November 2017 to 16 January 2018. Each row displays the forecasts for a B-node, a P-node, and an M-node with the smallest, first quartile, median, third
quartile, biggest hourly MAPE within the 8 B-nodes, 23 P-nodes and 61 M-nodes, respectively. For example, node B-1 has the smallest hourly MAPE within the B-
nodes, while node M-61 has the biggest hourly MAPE within the M-nodes. The data are scaled to the range of [−1,1] by dividing by the maximum (in absolute value)
gas flows for the respective nodes from the training and validation set.
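The scaling described in the caption of Fig. 6 can be sketched as follows. This is a minimal illustration under stated assumptions: the helper names `max_abs_scale_factor` and `scale` are hypothetical, and the flow values are made up. The date arithmetic only checks that the plotted period, 17 November 2017 to 16 January 2018, covers 61 days.

```python
from datetime import date
from typing import List, Sequence


def max_abs_scale_factor(train_and_validation_flows: Sequence[float]) -> float:
    """Largest absolute flow of one node over the training and validation sets."""
    factor = max(abs(v) for v in train_and_validation_flows)
    if factor == 0:
        raise ValueError("cannot scale a node whose flows are all zero")
    return factor


def scale(flows: Sequence[float], factor: float) -> List[float]:
    """Divide every flow by the node-specific factor."""
    return [v / factor for v in flows]


# Made-up flows for a single node (negative values stand for the reverse direction).
history = [-4.0, 2.0, 3.0]   # training + validation period
test_flows = [1.0, -2.0]     # test period, scaled with the SAME factor

factor = max_abs_scale_factor(history)
print(scale(history, factor))
print(scale(test_flows, factor))

# Length of the test period shown in the figure, counting both end dates.
n_test_days = (date(2018, 1, 16) - date(2017, 11, 17)).days + 1
print(n_test_days)
```

Because the factor comes from the training and validation sets only, the scaled history lies in [−1, 1] by construction, whereas a test flow larger in magnitude than anything seen before would fall slightly outside that range.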


model (an improvement of 7.7% and 11.8% for the P-nodes and M-nodes, respectively), though both avoid overfitting by making use of a reduction in the parameter space. The better performance of the FAR-CNN model indicates that there is a non-linearity in the gas flows and the addition of the CNN helps to improve the forecasting performance. Meanwhile, the FAR-CNN model also dominates the CNN model across all types. The hybrid model benefits from the strength of its two components: FAR explains the empirical data feature of serial dependence and CNN extracts further useful nonlinear information, and thus improves the forecasting.

In comparison with the other NN models, LSTM is the best, with a relative MAPE around 4% for All; MLP has the worst performance, with double the relative error. Another hybrid candidate, FAR-LSTM, does not perform well. This may be because the LSTM model is designed specially for learning patterns in sequence data, whereas by applying the FAR model, the sequence structures in the gas flows have been explained and thus left limited information for the LSTM. In addition, by processing the residuals at the same hour for different days together, the FAR-CNN is able to explicitly exploit the seasonality left in the residuals.

The AR type models are superior for the B-nodes, which have stable flows. However, their performance is quite unsatisfactory for the P-nodes and M-nodes. For example, VAR performs the best for the B-nodes, but is 78.6% worse than FAR-CNN for the M-nodes. Given the intention of proposing a robust forecast model for different types of nodes that can be used for scheduling and automating operations, the FAR-CNN model is to be favored.

The LightGBM decision tree model delivers the second worst performance, which could be attributed to overfitting problems as the training sample is rather small for training LightGBM.

Fig. 5 displays the boxplots of the hourly MAPE of the methods for the different types of nodes. It also confirms our discussion. For the B-nodes, the AR type models perform better: they have the smallest median and mean forecast errors. For the P-nodes, the FAR-CNN performs the best, followed by the LSTM model. For the M-nodes, FAR-CNN and LSTM have similar accuracy, while FAR-CNN has a smaller third quartile, indicating that it has a stabler performance. In summary, the hourly MAPE ranges in [2.4%, 28.2%] for the B-nodes, [1.8%, 41.9%] for the P-nodes and [1.3%, 47.5%] for the M-nodes. Among them, 75% (i.e., 6) of the B-nodes, 60.8% (i.e., 14) of the P-nodes, and 86.7% (i.e., 53) of the M-nodes, have an hourly MAPE below 15%. In general, the prediction accuracy for the M-nodes is better than for the B-nodes and P-nodes. The detailed hourly evaluation results of the FAR-CNN model for every node are reported in the supplementary material.

As an illustration, we show the forecasts of five nodes for each type, namely the node with the smallest (min), first quartile (25%), median (50%), third quartile (75%), and the biggest (max) hourly MAPE, see Fig. 6. In most cases, FAR-CNN forecasts trace the actual values very well even for the nodes with the third quartile (75%) MAPE. For the nodes with the biggest forecast errors, i.e., B-8, P-23, and M-61, it can be seen that the large errors are to be attributed to the abrupt changes in the gas flows, where all the methods are unable to obtain a satisfactory accuracy. Detailed hourly MAPE results for the three worst-performing nodes are reported in the supplementary material.

In summary, the proposed hybrid model outperforms the univariate models as it considers the cross-dependence in the natural gas flows. Compared to the classic functional time series model, it utilizes the non-linear dependence embedded in the residuals, which further enhances predictive power. The artificial neural networks, though powerful for big data, are challenged by overfitting for the relatively small sample size. The hybrid model, however, disentangles the mixed linear and non-linear dependence and adopts an appropriate approach to avoid information loss and overfitting. It also outperforms the FAR-LSTM model as the CNN structure is able to explicitly incorporate the seasonality in the learning.

5. Conclusion

We have proposed a hybrid of a Functional AutoRegressive model and a Convolutional Neural Network model (FAR-CNN) to capture both the linear and nonlinear patterns in time series. The proposed model has been applied to predict the day-ahead high-resolution 24 hourly flows of natural gas, both demand and supply, in the German pipeline network, considering a total of 92 nodes categorized as border nodes (B-nodes), plant nodes (P-nodes) and municipal nodes (M-nodes). We have compared this hybrid model with several alternative models popular in the literature. An analysis of real data indicates that for the B-nodes, which have stabler gas flows, the simple AutoRegressive (AR) type models (like Vector AR, FAR) are sufficient to obtain an accurate prediction. For the P-nodes and M-nodes, with their more complex structure, the FAR-CNN model outperforms all the alternative models, with an improvement of hourly mean absolute percentage error (MAPE) from 3.6% to 128.5% for P-nodes, and an improvement of hourly MAPE from 4.5% to 340.5% for M-nodes. Furthermore, the FAR-CNN model improves on the FAR model, which indicates that the CNN model successfully learns the left-over nonlinear patterns in the residuals from the FAR model. In addition, the FAR-CNN model is superior to the CNN model when directly applied to the gas flows, which shows that by disentangling the linear patterns and nonlinear patterns in the gas flows with the FAR model, the hybrid FAR-CNN model is able to make use of the strengths of its two components (FAR focusing on the linear part and CNN handling the remaining nonlinear structure) to improve the final forecasting.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

The authors gratefully acknowledge the financial support of the Singapore Ministry of Education Academic Research Fund Tier 1 at National University of Singapore and the Research Campus MODAL funded by the German Federal Ministry of Education and Research (05M14ZAM).

Appendix A. Supplementary material

Supplementary data associated with this article can be found, in the online version, at https://doi.org/10.1016/j.apenergy.2019.114486.

References

[1] British Petroleum Company. BP statistical review of world energy 2019. BP World Energy; 2019.
[2] Soldo B. Forecasting natural gas consumption. Appl Energy 2012;92:26–37.
[3] Potočnik P, Soldo B, Šimunović G, Šarić T, Jeromen A, Govekar E. Comparison of static and adaptive models for short-term residential natural gas forecasting in Croatia. Appl Energy 2014;129:94–103.
[4] Taşpınar F, Celebi N, Tutkun N. Forecasting of daily natural gas consumption on regional basis in Turkey using various computational methods. Energy Build 2013;56:23–31.
[5] Chen Y, Li B. An adaptive functional autoregressive forecast model to predict electricity price curves. J Bus Econ Stat 2017;35:371–88.
[6] Chen Y, Marron J, Zhang J, et al. Modeling seasonality and serial dependence of electricity price curves with warping functional autoregressive dynamics. Ann Appl Stat 2019;13:1590–616.
[7] Chen Y, Chua WS, Koch T. Forecasting day-ahead high-resolution natural-gas demand and supply in Germany. Appl Energy 2018;228:1091–110.
[8] Szoplik J. Forecasting of natural gas consumption with artificial neural networks. Energy 2015;85:208–20.
[9] Dombaycı ÖA. The prediction of heating energy consumption in a model house by using artificial neural networks in Denizli-Turkey. Adv Eng Softw 2010;41:141–7.
[10] LeCun Y, Boser B, Denker JS, Henderson D, Howard RE, Hubbard W, et al. Backpropagation applied to handwritten zip code recognition. Neural Comput


1989;1:541–51.
[11] Rumelhart DE, Hinton GE, Williams RJ. Learning representations by back-propagating errors. Nature 1986;323:533.
[12] Abdel-Hamid O, Mohamed A-R, Jiang H, Deng L, Penn G, Yu D. Convolutional neural networks for speech recognition. IEEE/ACM Trans Audio Speech Lang Process 2014;22:1533–45.
[13] Cai M, Pipattanasomporn M, Rahman S. Day-ahead building-level load forecasts using deep learning vs. traditional time-series techniques. Appl Energy 2019;236:1078–88.
[14] Lago J, De Ridder F, De Schutter B. Forecasting spot electricity prices: deep learning approaches and empirical comparison of traditional algorithms. Appl Energy 2018;221:386–405.
[15] Wei N, Li C, Peng X, Li Y, Zeng F. Daily natural gas consumption forecasting via the application of a novel hybrid model. Appl Energy 2019;250:358–68.
[16] Bento PM, Pombo JA, Calado MR, Mariano SJ. A bat optimized neural network and wavelet transform approach for short-term price forecasting. Appl Energy 2018;210:88–97.
[17] Akpinar M, Adak MF, Yumusak N. Day-ahead natural gas demand forecasting using optimized ABC-based neural network with sliding window technique: the case study of regional basis in Turkey. Energies 2017;10:781.
[18] Panapakidis IP, Dagoumas AS. Day-ahead natural gas demand forecasting based on the combination of wavelet transform and ANFIS/genetic algorithm/neural network model. Energy 2017;118:231–45.
[19] Yu F, Xu X. A short-term load forecasting model of natural gas based on optimized genetic algorithm and improved BP neural network. Appl Energy 2014;134:102–13.
[20] Panapakidis IP, Dagoumas AS. Day-ahead electricity price forecasting via the application of artificial neural network based models. Appl Energy 2016;172:132–51.
[21] Keles D, Scelle J, Paraschiv F, Fichtner W. Extended forecast methods for day-ahead electricity spot prices applying artificial neural networks. Appl Energy 2016;162:218–30.
[22] Aladag CH, Egrioglu E, Kadilar C. Forecasting nonlinear time series with a hybrid methodology. Appl Math Lett 2009;22:1467–70.
[23] Zhang GP. Time series forecasting using a hybrid ARIMA and neural network model. Neurocomputing 2003;50:159–75.
[24] Rather AM, Agarwal A, Sastry V. Recurrent neural network and a hybrid model for prediction of stock returns. Expert Syst Appl 2015;42:3234–41.
[25] Cardoso CV, Cruz GL. Forecasting natural gas consumption using ARIMA models and artificial neural networks. IEEE Latin Am Trans 2016;14:2233–8.
[26] Mourid T, Bensmain N. Sieves estimator of the operator of a functional autoregressive process. Stat Prob Lett 2006;76:93–108.
[27] Weron R. Electricity price forecasting: a review of the state-of-the-art with a look into the future. Int J Forecast 2014;30:1030–81.
[28] Ke G, Meng Q, Finley T, Wang T, Chen W, Ma W, et al. Lightgbm: a highly efficient gradient boosting decision tree. In: Advances in neural information processing systems; 2017. p. 3146–54.
