Professional Documents
Culture Documents
Integrated Multiple Directed Attention-Based Deep Learning For Improved Air Pollution Forecasting
Integrated Multiple Directed Attention-Based Deep Learning For Improved Air Pollution Forecasting
Abstract— In recent years, human health across the world pollution is becoming a critical problem in urban areas and
is becoming concerned by a constant threat of air pollution, industrialized countries and one of the principal factors for
which causes many chronic diseases and premature mortalities. global warming. Mitigating air pollution is a paramount issue
Poor air quality does not have only serious adverse effects on
human health and vegetation but also some major negative in developing countries, notably in larger urban areas with
political, societal, and economic impacts. Hence, it is essential a high concentration of emission sources, including vehicles
to invest more effort on accurate forecasting of ambient air and industrial activities. Many epidemiological studies showed
pollution to provide practical and relevant solutions, achieve the effect of certain chemical compounds such as sulfur
acceptable air quality, and plan for prevention. In this work, dioxide (SO2 ), nitrogen dioxide (NO2 ), ozone (O3 ), or dust
we propose a flexible and efficient deep learning-driven model
to forecast concentrations of ambient pollutants. This article particles in the air on the health of the general population and
introduces first the traditional variational autoencoder (VAE) particularly noticeable on sensitive people such as asthmatics,
and the attention mechanism to develop the forecasting modeling children, and elderly [1]. Today, air quality is a multidis-
strategy based on the innovative integrated multiple directed ciplinary problem that mobilizes epidemiological specialists,
attention (IMDA) deep learning architecture. To assess the per- specialists in transport modeling, emissions and transformation
formance of the proposed forecasting methodology, experimental
validation is then performed using air pollution data from four of pollutants, geographical systems, forecasting, and local
US states. Six statistical indicators have been used to evaluate authorities and industrialists. Thus, monitoring the ambient
the forecasting accuracy. A discussion of the results obtained air quality is essential to achieve acceptable air quality [2].
finally demonstrates the satisfying performance of integrated Over the past few decades, much effort has been made to
multiple directed attention variational autoencoder (IMDA-VAE) enhance air quality [3], [4]. For instance, air quality networks
methods to forecast different pollutants in different locations.
Furthermore, results indicate that the proposed IMDA-VAE (composed of numerous measurement stations) were installed
model can effectively improve air pollution forecasting per- across almost all countries to monitor numerous air pollutants’
formance and outperforms the deep learning models, namely concentration levels. This study attempts to design an efficient
VAE, long short-term memory (LSTM), gated recurrent units deep learning data-driven model for forecasting concentrations
(GRU), bidirectional LSTM, bidirectional GRU, and ConvLSTM. of carbon monoxide (CO), nitrogen monoxide (NO), nitrogen
We also showed that the forecasting results of the proposed model
surpass the performance of LSTM and GRU with the attention dioxide (NO2 ), sulfur dioxide (SO2 , and ozone (O3 ).
mechanism. Reliable forecasting of air pollutants gives valuable infor-
mation to help people take the necessary precautions to
Index Terms— Air pollution concentrations forecasting, deep
learning, integrated multiple directed attention variational avoid undesirable consequences. Also, it permits taking more
autoencoder (IMDA-VAE), multiple directed attention, time adequate countermeasures for preventing air pollution crisis
series. and protecting public health. The need for flexible forecasting
techniques that can accurately forecast air pollutants has
I. I NTRODUCTION considerably drawn researchers’ and engineers’ attention [5].
Various model- and data-based methods have been designed
C ONCERNS for the environment, health, and safety have
been attracting considerable attention worldwide due to
the new environmental challenges that threaten the planet. Air
to improve air pollution modeling and forecasting over the
past four decades [1], [6], [7]. Conventional time-series-based
models are among the widely utilized methods for air pollution
Manuscript received February 28, 2021; revised May 6, 2021; accepted forecasting in the literature [8]–[10]. These models comprise
May 25, 2021. Date of publication June 28, 2021; date of current version
July 9, 2021. This work was supported by the King Abdullah University autoregressive integrated moving average (ARIMA) and its
of Science and Technology (KAUST), Office of Sponsored Research (OSR) alternatives, such as seasonal-ARIMA [11] and Holt–Winters
under Award OSR-2019-CRG7-3800. The Associate Editor coordinating the models [12], [13]. Nevertheless, the forecast error in these
review process was Wilson Wang. (Corresponding author: Fouzi Harrou.)
Abdelkader Dairi is with the Computer Science Department Signal, Image methods is apparent when the concentration levels of pollu-
and Speech Laboratory (SIMPA) Laboratory, University of Science and tants exhibit irregular variations [14]. Also, the linear nature
Technology of Oran-Mohamed Boudiaf (USTO-MB), Bir El Djir 31000, of statistical models (e.g., AR and ARIMA) does not enable
Algeria.
Fouzi Harrou and Ying Sun are with the Computer, Electrical and Math- forecasting the nonlinear and nonstationary of ambient air
ematical Sciences and Engineering (CEMSE) Division, KAUST, Thuwal pollution accurately [15].
23955-6900, Saudi Arabia (e-mail: fouzi.harrou@kaust.edu.sa). To mitigate the weakness mentioned above, machine learn-
Sofiane Khadraoui is with the Department of Electrical Engineering,
University of Sharjah, Sharjah, United Arab Emirates. ing models, which are more flexible, such as neural network
Digital Object Identifier 10.1109/TIM.2021.3091511 forecasting and support vector machine, have been widely
1557-9662 © 2021 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See https://www.ieee.org/publications/rights/index.html for more information.
Authorized licensed use limited to: National Taipei Univ. of Technology. Downloaded on August 15,2021 at 14:54:06 UTC from IEEE Xplore. Restrictions apply.
3520815 IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, VOL. 70, 2021
employed in improving air pollution forecasting [16]–[18]. prevention. Deep learning has recently emerged as a promising
Machine learning models showed a suitable ability to model research line in modeling and forecasting time series data, both
the complicated relationship between process variables without in academia and industry [22]–[26]. Various deep techniques
the need for an explicit model formulation to be specified. have been applied in the literature to improve ambient air
Over the last two decades, several machine learning-based pollution forecasting. Chang et al. [27] considered an aggre-
methods have been applied for air pollution forecasting. For gated long short-term memory model (ALSTM) deep learning
instance, in [16], a wavelet-based neural network approach approach to improve air pollution forecasting. Essentially,
has been proposed for forecasting one step-ahead hourly, this forecasting approach consists of aggregating three LSTM
daily mean, and daily maximum concentrations of ozone (O3 ), models into a predictive model using information obtained
sulfur dioxide (SO2 ), carbon monoxide (CO), nitrogen oxides from nearby industrial air quality stations and external sources
(NO), nitrogen dioxide (NO2 ), and dust particles (PM2.5). of pollution. Results proved the outperformance of this aggre-
In particular, by using maximum overlap wavelet transform gated deep learning method compared to LSTM, support
(MODWT), this approach decomposes each time series of vector machine (SVR)-based regression, and gradient boosted
each air pollutant into different time-scale components. Then, tree regression (GBTR) in predicting PM2.5 concentrations.
the Elman network is applied to these time-scale components. In [28], the convolutional-based bidirectional gated recurrent
In this study, the wavelet network approach is used for one- unit (CBGRU) method is applied to forecast PM2.5 concen-
step-ahead forecasting. In [19], a hybrid method merging the tration levels. This approach combines both 1-D convolutional
advantages of the empirical wavelet transform (EWT), multi- neural networks’ desirable features and bidirectional gated
agent evolutionary genetic algorithm (MEGA), and nonlinear recurrent unit neural networks. This approach showed good
autoregressive network with exogenous inputs (NARX) is forecasting performance of PM2.5 concentration compared
introduced for enhancing multistep air pollutant concentrations to SVR, gradient boosting regressor (GBR), LSTM, GRU,
forecasting. Specifically, the air pollutant series are decom- and bidirectional GRU. In [29], LSTM optimized using a
posed via the EWT, and then, the optimized NARX neural particle swarm optimization algorithm is applied for predicting
network by the MAEGA model is applied to forecasting air ambient air pollutants concentrations (PM2.5, PM10,NO2 ,
pollutant concentrations. Results showed the outperformance CO, O3 , and SO2 ). In [10], an approach combining RNN
of this model compared to conventional models when applied models with LSTM (RNN + LSTM) is proposed to predict
for forecasting PM2.5, SO2 , NO2 , and CO concentrations PM10 particles in different places in the city Skopje. Results
levels in Beijing, China. However, these methods are not suited show that using both meteorological and air pollution mea-
for real-time forecasting because wavelet transform needs surements enhances LSTM and RNN + LSTM models’ fore-
batch data. In [20], an ANN model has been combined with casting accuracy. Also, it has been shown that the combined
numerical models to improve the prediction of daily concentra- RNN + LSTM models consistently outperform the ARIMA
tions of air pollutants, including SO2, NO2, and PM10, using approach.
meteorological variables. Similarly, Arhami et al. [17] intro- Bui et al. [30] and Kim et al. [31] proposed a method for
duced a hybrid forecasting model to forecast hourly pollutant air pollution prediction from historical time series pollutant
levels (i.e., NO2 , NO, O3 , CO, and PM10 ) based on artificial data and meteorological data using RNN and LSTM models.
neural networks (ANN) combined with uncertainty analysis Another related work [32] utilizes an LSTM model to forecast
by Monte Carlo simulations (MCS). This hybrid model uses the 8-h moving average concentrations of ozone, where results
several selected input meteorological variables for improved obtained showed forecasts with low error. Deep learning
forecasting. It has been shown that the combination of ANN models with RNN, LSTM, and GRU architectures are applied
with MCS offers a promising tool for air pollution predictions. for forecasting air quality based on the AirNet dataset that
In [18], at first, a genetic algorithm and a linear method of includes both meteorological time series and air quality data
stepwise fit are applied to select the relevant features from [33]. The analysis examined in [33] showed that the GRU
the prediction point of view. Then, the random forest (RF), model slightly outperforms the RNN and LSTM architectures
the multilayer perception (MLP), the radial basis function for PM10 concentration prediction. A deep learning approach
(RBF), and the support vector machine (SVM) are applied is proposed in [34] and [35] for the air pollution prediction
to the selected features for daily forecasting of atmospheric in South Korea, where a stacked autoencoder (SAE) model is
pollutants NO2 , O3 , SO2 , and PM10 . It has been shown that utilized for learning and training data. An LSTM model-based
the features selection step with an ensemble of predictors approach is proposed in [36] to predict air pollutant con-
enables improved forecasting quality of atmospheric pollution. centrations. This forecasting approach presented in [36] is
In [21], an approach to predict air concentration levels of capable to automatically extract useful features, such as the
air pollutants (i.e., NO2 , NOX, SO2 , O3 , and PM2.5) is pro- spatiotemporal correlations within air pollutant concentrations.
posed using based on sparse response backpropagation training The spatiotemporal deep learning (STDL) architectures have
feedforward neural networks (called FFANN-SRBP). This also been used for spatiotemporal prediction of air quality.
approach outperformed the multiple linear regression (MLR) A combination of a convolutional neural network and LSTM
and FFANN based on backpropagation (FFANN-BP) in terms model is proposed in [37] to forecast air quality up to
of prediction precision. 48 h and extracts the spatial–temporal relations. Chiou-Jye
Precise air pollution forecasting provides relevant informa- and Ping-Huan [38] presented a comparative study of the
tion about the future pollution evolution, which is crucial for performance of CNN-LSTM model with traditional machine
effective air pollution monitoring and assists in planning for learning models in terms of their ability to forecast PM2.5
Authorized licensed use limited to: National Taipei Univ. of Technology. Downloaded on August 15,2021 at 14:54:06 UTC from IEEE Xplore. Restrictions apply.
DAIRI et al.: INTEGRATED MULTIPLE DIRECTED ATTENTION-BASED DEEP LEARNING 3520815
concentration, where the obtained results showed that the ConvLSTM, as well as LSTM and GRU with the
CNN-LSTM model provides the lowest root mean square attention mechanism (termed LSTM-A and GRU-A).
error (RMSE) and mean absolute error (MAE). A CNN-GRU The following section presents the preliminary material
model proposed in [39] is applied to forecast three air pollu- needed in this study and briefly introduces the IMDA-VAE
tants (PM2.5 , PM10 , and O3 ) of monitoring stations over 48 h. forecasting methodology. In Section II, the performances of
In this article, we propose an innovative deep learning model the considered methods are illustrated using air pollution data
to improve the forecasting quality of concentrations levels of from four US states. Finally, in Section IV, we conclude this
ambient air pollutants (NO2 , O3 , SO2 , and CO) based on a study and sheds light on potential future research lines.
wind attention mechanism and variational AE (VAE) deep
model. The contribution of this article is threefold. II. M ETHODOLOGY
1) The first contribution is mainly related to the devel- In this section, we briefly describe the basic concept of
opment of an innovative integrated multiple directed vanilla AEs and VAEs. Afterward, we briefly describe how
attention deep learning architecture (called IMDA-VAE) self-attention mechanisms work. We then introduce the pro-
based on the traditional VAE and the attention mech- posed IMDA-VAE deep learning-driven method. The overall
anism. To the best of the authors’ knowledge, this schematic of the proposed forecasting strategy is presented
is the first study introducing the traditional VAE and in Fig. 1.
the attention mechanism to accurately forecast various
air pollutants concentrations and leverage air pollution A. Variational Autoencoder
complexity. Essentially, the proposed method extends
the capability of the traditional VAE to model temporal To better understand the VAE, we first present a short
dependencies by introducing the self-attention mech- description of the AE. The AE comprises three principal
anism at a multilevel of the VAE model. The pro- elements, namely the encoder, the latent space, and the decoder
posed flexible modeling framework permits exploiting (see Fig. 2). The encoder receives input data, pollutants’
the suitable performance of VAE in time-series modeling concentration time series, and projects it into the latent space
and flexible nonlinear approximation and the focus on employing neural networks. Crucially, the pertinent informa-
the relevant features via the attention-based mechanism. tion is stored in the latent space, and the aim is to optimize
Exploiting all these sophisticated statistical tools is how data can be scattered in this latent space. On the other
advantageous in the sense that it has the potential to hand, the decoder represents a mirror of the encoder because
improve short-term forecasting of ambient air pollution it attempts to reconstruct the compressed latent space’s input
time-series data. data. In short, we can summarize the AE procedure using this
2) The second contribution lies in the validation of the chain of events
proposed method through three different forecasting encoder(x;θe ) decoder(z;θd )
x ⇒ z ⇒ x̂
experimentations: univariate, multivariate with a single
output, and multivariate with multiple outputs. where θe and θd , respectively, denote parameterizations of the
3) Finally, the third contribution is the comparative study encoder and decoder networks.
of the proposed forecasting model and some powerful The AE aims to give a dimensionality reduction procedure,
recurrent neural network models performed in this work which learns to encode the data such that a distance metric
to show and evaluate their performance and capabilities loss is optimized. Essentially, the more effective the encoder is
in forecasting air ambient pollutants. Air pollution data learning data compression, the more accurate the reconstructed
from four US states were used in the experiments data will be. A traditional and straightforward loss function
and assessment of the outputs’ deep learning-driven utilized for the AE is the mean squared error, which consists
forecasting methods. The results demonstrate that the of the squared deviation between the input, X, and output
designed IMDA-VAE method offers satisfying fore- data, X̂
casting performance of different air ambient pollu-
1
n
tants and outperformed other deep learning models, ( X̂ − X)2 . (1)
including GRUs, LSTM, BiGRU, Bidirectional LSTM, n i=1
Authorized licensed use limited to: National Taipei Univ. of Technology. Downloaded on August 15,2021 at 14:54:06 UTC from IEEE Xplore. Restrictions apply.
3520815 IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, VOL. 70, 2021
Authorized licensed use limited to: National Taipei Univ. of Technology. Downloaded on August 15,2021 at 14:54:06 UTC from IEEE Xplore. Restrictions apply.
DAIRI et al.: INTEGRATED MULTIPLE DIRECTED ATTENTION-BASED DEEP LEARNING 3520815
Authorized licensed use limited to: National Taipei Univ. of Technology. Downloaded on August 15,2021 at 14:54:06 UTC from IEEE Xplore. Restrictions apply.
3520815 IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, VOL. 70, 2021
Authorized licensed use limited to: National Taipei Univ. of Technology. Downloaded on August 15,2021 at 14:54:06 UTC from IEEE Xplore. Restrictions apply.
DAIRI et al.: INTEGRATED MULTIPLE DIRECTED ATTENTION-BASED DEEP LEARNING 3520815
TABLE I
S TATISTICS S UMMARY OF THE A RIZONA A MBIENT
A IR P OLLUTION D ATASETS
A. Data Description
The dataset used in this study was collected by the United
States Environmental Protection Agency, in which the con-
centration level of several ambient pollutants is recorded
daily in different states. The datasets are publicly accessi-
ble on the website “https://www.epa.gov/outdoor-air-quality-
data/download-daily-data.” The pollutants measurements were
collected during a period of 16 years exactly (2000–2016).
This study focuses on four significant pollutants, namely nitro-
gen dioxide (NO2 ), sulfur dioxide (SO2 ), carbon monoxide
(CO), and ozone (O3 ). In our study, we picked measurements
from four locations, namely California, Arizona, Texas, and
Pennsylvania. It should be mentioned that each state contains
numerous air quality monitoring stations; however, we selected
one station per state for measurement collection.
The descriptive statistics of the Arizona ambient air pol-
lution datasets are listed in Table I. The skewness of the
normal distribution (or any perfectly symmetric distribution)
is zero, and the kurtosis of the normal distribution is 3. We can
Fig. 5. Monthly distribution of concentration levels of (a) NO2 , (b) CO,
conclude from Table I that the Arizona ambient air pollution (c) O3 , and (d) SO2 in California.
time-series datasets are non-Gaussian distributed with positive
support and exhibit different intervals of variability. California, Pennsylvania, and Texas. Fig. 6 shows the presence
Monthly distributions of concentration levels of the four of a weak negative correlation between O3 and its precursors
ambient pollutants from California are shown in Fig. 5(a)–(d). (NO2 and CO) and a moderate positive correlation between
Fig. 5(c) shows that the highest O3 concentration levels can precursors, NO2 and CO. Besides, distinct patterns regarding
be seen in the summer season due to the local photochem- SO2 and the other pollutants are obtained for each station.
ical production. Also, Fig. 5(a) and (c) shows that NO2 is Interpretation of the relation between pollutants is relatively
negatively correlated with the variation of O3 concentration difficult because of their dependence on several factors, includ-
levels. This fact is because O3 is formed through the photo- ing meteorological variables and the transportation phenom-
chemical destruction of nitrogen dioxide (NO2 ) under sunlight. ena. Specifically, sometimes pollution concentration can be
Similarly, we can see the presence of a negative correlation high due to transported pollution produced elsewhere in the
between O3 and CO because of photochemical production of region. Various studies have been reported in the literature
O3 principally from the oxidation of natural and anthropogenic to investigate the correlation between different ambient air
hydrocarbons, carbon monoxide (CO), and methane (CH4) by pollutants [57], [58].
hydroxyl (OH) radical in the availability of enough quantity of
NOx. Of course, O3 is adversely correlated with O3 precursors
(i.e., NO2 and CO) [57]. B. Experiments Settings
Fig. 6 shows the correlation coefficients between the To evaluate the performance of the proposed approach in
NO2 , CO, O3 , and SO2 recorded, respectively, in Arizona, forecasting air ambient pollutants, several experiments are
Authorized licensed use limited to: National Taipei Univ. of Technology. Downloaded on August 15,2021 at 14:54:06 UTC from IEEE Xplore. Restrictions apply.
3520815 IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, VOL. 70, 2021
Fig. 6. Pairwise correlation matrix between ambient air pollutants (NO2 , CO, O3 , and SO2 ) obtained in (a) Arizona, (b) California, (c) Pennsylvania, and
(d) Texas.
conducted to show its superiority and efficiency through a learn how to predict the next value of a given pollutant
comparative study including powerful recurrent neural net- based on a sequence of measurements of the four other
work models, namely GRU, LSTM, BiGRU, BiLSTM, and pollutants. In the end, a forecasting model dedicated to
ConvLSTM. We also compared the performance of the a pollutant with multiple inputs is obtained.
proposed IMDA-VAE approach to that GRU and LSTM with 3) Multivariates Forecasting With Multioutput: This exper-
the attention mechanism (LSTM-A and GRU-A). Importantly, imentation can be seen as a one-shot task, where all
these models with attention are designed by stacking three pollutants are forecast based on all historical measure-
components. For instance, in LSTM-A, LSTM is used to ments of all pollutants. In the end, only one forecasting
model time dependencies and extract temporal features, fol- model for all pollutants with multiple inputs to forecast
lowed by the attention module that aims to improve and all pollutants next values is derived.
highlight relevant extracted features, and finally, we added a
forecaster layer represented by a fully connected layer. In a C. Measurements of Effectiveness
similar way to LSTM-A, we designed GRU-A. Here, both
LSTM-A and GRU-A are trained through supervised learning. To assess the forecast precision and compare the models,
We used the same settings (hyperparameters) used for all some validation metrics, such as coefficient of determination
models: 300 epochs, a learning rate of 0.0001, batch size (R 2 ), RMSE, MAE, explained variance (EV), mean absolute
of 250, 32 hidden units, Rmsprop optimizer, and binary cross percentage error (MAPE), mean bias error (MBE), and
entropy as a lost function. To demonstrate the advantage of relative MBE (rMBE), are used (Table II), where yt is the
the proposed IMDA-VAE compared to the traditional deep concentration level of a pollutant, ŷt is its corresponding
learning models, we consider the following experimentations forecast values, and n is the number of data points [59], [60].
in this study. The more precise forecasting is, the lower RMSE, MAE, MBE
and rMBE values and high R 2 , EV, and MAPE values are.
1) Univariate Forecasting: In this experiment, each pollu-
tant is modeled and forecast individually, which means
that five models are trained to forecast the next values D. Results Analysis
based on only the measurement of the pollutant in The first set of experiments aims to analyze the performance
question. of the investigated models in forecasting the concentration
2) Multivariate Forecasting With One Output (Prediction): levels of the four pollutants separately. Toward this end, in this
The forecasting of each pollutant is done using all univariate forecasting, each model is trained in a supervised
other pollutants, which means that the training aims to way to learn temporal dependencies included in time-series
Authorized licensed use limited to: National Taipei Univ. of Technology. Downloaded on August 15,2021 at 14:54:06 UTC from IEEE Xplore. Restrictions apply.
DAIRI et al.: INTEGRATED MULTIPLE DIRECTED ATTENTION-BASED DEEP LEARNING 3520815
Authorized licensed use limited to: National Taipei Univ. of Technology. Downloaded on August 15,2021 at 14:54:06 UTC from IEEE Xplore. Restrictions apply.
3520815 IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, VOL. 70, 2021
TABLE IV
P ERFORMANCE C OMPARISON OF THE P ROPOSED U SING P OLLUTION
T ESTING D ATASETS F ROM T EXAS
TABLE V
P ERFORMANCE C OMPARISON OF THE P ROPOSED U SING P OLLUTION
T ESTING D ATASETS F ROM P ENNSYLVANIA
Fig. 7. Records and forecasts using the five models of (a) NO2 , (b) CO,
(c) O3 , and (d) SO2 from Arizona.
Authorized licensed use limited to: National Taipei Univ. of Technology. Downloaded on August 15,2021 at 14:54:06 UTC from IEEE Xplore. Restrictions apply.
DAIRI et al.: INTEGRATED MULTIPLE DIRECTED ATTENTION-BASED DEEP LEARNING 3520815
TABLE VII
P ERFORMANCE C OMPARISON OF THE P ROPOSED A PPROACH
U SING CO P OLLUTANT TABLE IX
P ERFORMANCE C OMPARISON OF THE P ROPOSED A PPROACH
U SING SO 2 P OLLUTANT
Authorized licensed use limited to: National Taipei Univ. of Technology. Downloaded on August 15,2021 at 14:54:06 UTC from IEEE Xplore. Restrictions apply.
3520815 IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, VOL. 70, 2021
TABLE X TABLE XI
P ERFORMANCE C OMPARISON OF THE P ROPOSED A PPROACH M ULTIVARIATE W ITH M ULTIOUTPUT P ERFORMANCE R ESULTS U SING
U SING O 3 P OLLUTANT P OLLUTION D ATASETS F ROM C ALIFORNIA
Authorized licensed use limited to: National Taipei Univ. of Technology. Downloaded on August 15,2021 at 14:54:06 UTC from IEEE Xplore. Restrictions apply.
DAIRI et al.: INTEGRATED MULTIPLE DIRECTED ATTENTION-BASED DEEP LEARNING 3520815
Authorized licensed use limited to: National Taipei Univ. of Technology. Downloaded on August 15,2021 at 14:54:06 UTC from IEEE Xplore. Restrictions apply.
3520815 IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, VOL. 70, 2021
[14] Z. Zhao, W. Chen, X. Wu, P. C. Y. Chen, and J. Liu, “LSTM network: [38] C.-J. Huang and P.-H. Kuo, “A deep CNN-LSTM model for particulate
A deep learning approach for short-term traffic forecast,” IET Intell. matter (PM2.5 ) forecasting in smart cities,” Sensors, vol. 18, no. 7,
Transp. Syst., vol. 11, no. 2, pp. 68–75, Mar. 2017. p. 2220, Jul. 2018, doi: 10.3390/s18072220.
[15] P. Jiang, C. Li, R. Li, and H. Yang, “An innovative hybrid air pollution [39] H. Wang, B. Zhuang, Y. Chen, N. Li, and D. Wei, “Deep infer-
early-warning system based on pollutants forecasting and extenics ential spatial-temporal network for forecasting air pollution con-
evaluation,” Knowl.-Based Syst., vol. 164, pp. 174–192, Jan. 2019. centrations,” Mach. Learn., to be published. [Online]. Available:
[16] A. Prakash, U. Kumar, K. Kumar, and V. K. Jain, “A wavelet-based https://arxiv.org/abs/1809.03964
neural network model to predict ambient air pollutants’ concentration,” [40] D. P. Kingma and M. Welling, “Auto-encoding variational Bayes,” Stat,
Environ. Model. Assessment, vol. 16, no. 5, pp. 503–517, Oct. 2011. vol. 1050, pp. 1–15, Dec. 2014.
[17] M. Arhami, N. Kamali, and M. M. Rajabi, “Predicting hourly air [41] D. M. Blei, A. Kucukelbir, and J. D. McAuliffe, “Variational inference:
pollutant levels using artificial neural networks coupled with uncertainty A review for statisticians,” J. Amer. Stat. Assoc., vol. 112, no. 518,
analysis by Monte Carlo simulations,” Environ. Sci. Pollut. Res., vol. 20, pp. 859–877, Apr. 2017.
no. 7, pp. 4777–4789, Jul. 2013. [42] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimiza-
[18] K. Siwek and S. Osowski, “Data mining methods for prediction of air tion,” 2014, arXiv:1412.6980. [Online]. Available: http://arxiv.org/abs/
pollution,” Int. J. Appl. Math. Comput. Sci., vol. 26, no. 2, pp. 467–478, 1412.6980
Jun. 2016. [43] J. An and S. Cho, “Variational autoencoder based anomaly detection
[19] H. Liu et al., “An intelligent hybrid model for air pollutant concentra- using reconstruction probability,” Special Lecture IE, vol. 2, pp. 1–18,
tions forecasting: Case of Beijing in China,” Sustain. Cities Soc., vol. 47, Dec. 2015.
May 2019, Art. no. 101471. [44] A. Dairi, F. Harrou, Y. Sun, and S. Khadraoui, “Short-term forecasting
[20] J. He et al., “Numerical model-based artificial neural network model of photovoltaic solar power production using variational auto-encoder
and its application for quantifying impact factors of urban air quality,” driven deep learning approach,” Appl. Sci., vol. 10, no. 23, p. 8400,
Water, Air, Soil Pollut., vol. 227, no. 7, p. 235, Jul. 2016. Nov. 2020.
[21] W. Ding, J. Zhang, and Y. Leung, “Prediction of air pollutant con- [45] G. Boquet, A. Morell, J. Serrano, and J. L. Vicario, “A variational
centration based on sparse response back-propagation training feed- autoencoder solution for road traffic forecasting systems: Missing
forward neural networks,” Environ. Sci. Pollut. Res., vol. 23, no. 19, data imputation, dimension reduction, model selection and anomaly
pp. 19481–19494, Oct. 2016. detection,” Transp. Res. C, Emerg. Technol., vol. 115, Jun. 2020,
[22] F. Harrou et al., Statistical Process Monitoring Using Advanced Data- Art. no. 102622.
Driven and Deep Learning Approaches: Theory and Practical Applica- [46] H. Gunduz, “An efficient stock market prediction model using hybrid
tions. Amsterdam, The Netherlands: Elsevier, 2020. feature reduction method based on variational autoencoders and recur-
[23] W. Mao, J. He, and M. J. Zuo, “Predicting remaining useful life of rolling sive feature elimination,” Financial Innov., vol. 7, no. 1, pp. 1–24,
bearings based on deep feature representation and transfer learning,” Dec. 2021.
IEEE Trans. Instrum. Meas., vol. 69, no. 4, pp. 1594–1608, Apr. 2020. [47] A. Zeroual, F. Harrou, A. Dairi, and Y. Sun, “Deep learning methods for
[24] X. Yuan, S. Qi, and Y. Wang, “Stacked enhanced auto-encoder for data- forecasting COVID-19 time-series data: A comparative study,” Chaos,
driven soft sensing of quality variable,” IEEE Trans. Instrum. Meas., Solitons Fractals, vol. 140, Nov. 2020, Art. no. 110121.
vol. 69, no. 10, pp. 7953–7961, Oct. 2020. [48] M. R. Ibrahim, J. Haworth, A. Lipani, N. Aslam, T. Cheng, and
[25] F. Harrou, T. Cheng, Y. Sun, T. Leiknes, and N. Ghaffour, “A data-driven N. Christie, “Variational-LSTM autoencoder to forecast the spread of
soft sensor to forecast energy consumption in wastewater treatment coronavirus across the globe,” PLoS ONE, vol. 16, no. 1, Jan. 2021,
plants: A case study,” IEEE Sensors J., vol. 21, no. 4, pp. 4908–4917, Art. no. e0246120.
Feb. 2021. [49] Y. Zerrouki, F. Harrou, N. Zerrouki, A. Dairi, and Y. Sun, “Desertifica-
[26] F. Harrou, F. Kadri, and Y. Sun, “Forecasting of photovoltaic solar power tion detection using an improved variational autoencoder-based approach
production using LSTM approach,” in Advanced Statistical Modeling, through ETM-landsat satellite data,” IEEE J. Sel. Topics Appl. Earth
Forecasting, and Fault Detection in Renewable Energy Systems. London, Observ. Remote Sens., vol. 14, pp. 202–213, 2021.
U.K.: IntechOpen, 2020, p. 3. [50] L. Li, J. Yan, H. Wang, and Y. Jin, “Anomaly detection of time
[27] Y.-S. Chang, H.-T. Chiao, S. Abimannan, Y.-P. Huang, Y.-T. Tsai, and series with smoothness-inducing sequential variational auto-encoder,”
K.-M. Lin, “An LSTM-based aggregated model for air pollution fore- IEEE Trans. Neural Netw. Learn. Syst., vol. 32, no. 3, pp. 1177–1191,
casting,” Atmos. Pollut. Res., vol. 11, no. 8, pp. 1451–1463, Aug. 2020. Mar. 2021.
[28] Q. Tao, F. Liu, Y. Li, and D. Sidorov, “Air pollution forecasting using [51] C. Zhang, S. Li, H. Zhang, and Y. Chen, “VELC: A new variational
a deep learning model based on 1D convnets and bidirectional GRU,” AutoEncoder based model for time series anomaly detection,” 2019,
IEEE Access, vol. 7, pp. 76690–76698, 2019. arXiv:1907.01702. [Online]. Available: http://arxiv.org/abs/1907.01702
[29] S. Al-Janabi, M. Mohammad, and A. Al-Sultan, “A new method [52] D. Bahdanau, K. Cho, and Y. Bengio, “Neural machine translation by
for prediction of air pollution based on intelligent computation,” Soft jointly learning to align and translate,” 2014, arXiv:1409.0473. [Online].
Comput., vol. 24, no. 1, pp. 661–680, Jan. 2020. Available: http://arxiv.org/abs/1409.0473
[30] T.-C. Bui, V.-D. Le, and S.-K. Cha, “A deep learning approach for [53] K. Xu et al., “Show, attend and tell: Neural image caption genera-
forecasting air pollution in South Korea using LSTM,” Mach. Learn., tion with visual attention,” in Proc. Int. Conf. Mach. Learn., 2015,
to be published. [Online]. Available: https://arxiv.org/abs/1804.07891v3 pp. 2048–2057.
[31] S. Kim, J. M. Lee, J. Lee, and J. Seo, “Deep-dust: Predicting concen- [54] M.-T. Luong, H. Pham, and C. D. Manning, “Effective approaches to
trations of fine dust in Seoul using LSTM,” Climate Informat., to be attention-based neural machine translation,” 2015, arXiv:1508.04025.
published. [Online]. Available: http://arxiv.org/abs/1901.10106 [Online]. Available: http://arxiv.org/abs/1508.04025
[32] B. S. Freeman, G. Taylor, B. Gharabaghi, and J. Thé, “Forecasting air [55] A. Vaswani et al., “Attention is all you need,” in Proc. Adv. Neural Inf.
quality time series using deep learning,” J. Air Waste Manage. Assoc., Process. Syst., 2017, pp. 5998–6008.
vol. 68, no. 8, pp. 866–886, Aug. 2018. [56] C. Cortes, M. Mohri, and A. Rostamizadeh, “L2 regularization for learn-
[33] A. V, G. P, V. R, and S. K P, “DeepAirNet: Applying recurrent ing kernels,” in Proc. 25th Conf. Uncertainty Artif. Intell., Arlington, VA,
networks for air quality prediction,” Procedia Comput. Sci., vol. 132, USA: AUAI Press, 2009, pp. 109–116.
pp. 1394–1403, Jan. 2018, doi: 10.1016/j.procs.2018.05.068. [57] J. Ngarambe, S. J. Joen, C.-H. Han, and G. Y. Yun, “Exploring the
[34] T. Xayasouk and H. Lee, “Air pollution prediction system using deep relationship between particulate matter, CO, SO2 , NO2 , O3 and urban
learning,” WIT Trans. Ecology Environ., vol. 230, pp. 71–79, Oct. 2018. heat island in Seoul, Korea,” J. Hazardous Mater., vol. 403, Feb. 2021,
[35] X. Li, L. Peng, Y. Hu, J. Shao, and T. Chi, “Deep learning architecture Art. no. 123615.
for air quality predictions,” Environ. Sci. Pollut. Res., vol. 23, no. 22, [58] Ş. Ç. Doǧruparmak and B. Özbay, “Investigating correlations and
pp. 22408–22417, Nov. 2016, doi: 10.1007/s11356-016-7812-9. variations of air pollutant concentrations under conditions of rapid
[36] X. Li et al., “Long short-term memory neural network for air pollu- industrialization—Kocaeli (1987–2009): Investigating correlations and
tant concentration predictions: Method development and evaluation,” variations,” CLEAN Soil, Air, Water, vol. 39, no. 7, pp. 597–604,
Environ. Pollut., vol. 231, pp. 997–1004, Dec. 2017, doi: 10.1016/ Jul. 2011.
j.envpol.2017.08.114. [59] R. A. Rajagukguk, R. A. A. Ramadhan, and H.-J. Lee, “A review
[37] P.-W. Soh, J.-W. Chang, and J.-W. Huang, “Adaptive deep learning-based on deep learning models for forecasting time series data of solar
air quality prediction model using the most relevant spatial-temporal irradiance and photovoltaic power,” Energies, vol. 13, no. 24, p. 6623,
relations,” IEEE Access, vol. 6, pp. 38186–38199, 2018. Dec. 2020.
Authorized licensed use limited to: National Taipei Univ. of Technology. Downloaded on August 15,2021 at 14:54:06 UTC from IEEE Xplore. Restrictions apply.
DAIRI et al.: INTEGRATED MULTIPLE DIRECTED ATTENTION-BASED DEEP LEARNING 3520815
[60] J. Zhang et al., “A suite of metrics for assessing the performance of Sofiane Khadraoui received the B.E. and magister’s degrees in automatic
solar power forecasting,” Sol. Energy, vol. 111, pp. 157–175, Jan. 2015. control from Abou Bekr Belkaïd University, Tlemcen, Algeria, in 2004 and
[61] M. Kuhn et al., Applied Predictive Modeling, vol. 26. New York, NY, 2007, respectively, and the Ph.D. degree in automatic control from the
USA: Springer, 2013. University of Franche-Comté, Besançon, France, in 2012.
[62] G. James, D. Witten, T. Hastie, and R. Tibshirani, An Introduction to From 2012 to 2015, he was a Post-Doctoral Research Associate with Texas
Statistical Learning, vol. 112. New York, NY, USA: Springer, 2013. A&M University, Doha, Qatar. In 2015, he joined the Electrical Engineering
[63] G. E. Hinton, “A practical guide to training restricted Boltzmann Department, University of Sharjah, Sharjah, UAE, where he is currently an
machines,” in Neural Networks: Tricks of the Trade. Berlin, Germany: Assistant Professor. His main research interests mainly include robust control
Springer, 2012, pp. 599–619. design for uncertain systems, data-based control design for unknown systems,
[64] R. Sarikaya, G. E. Hinton, and A. Deoras, “Application of deep belief robust stability analysis, process monitoring and fault detection, compensation
networks for natural language understanding,” IEEE/ACM Trans. Audio, of nonlinear phenomena, modeling of uncertainties, optimization, and interval
Speech, Language Process., vol. 22, no. 4, pp. 778–784, Apr. 2014. computation.
[65] G. E. Hinton, S. Osindero, and Y.-W. Teh, “A fast learning algorithm
for deep belief nets,” Neural Comput., vol. 18, no. 7, pp. 1527–1554,
Jul. 2006.
Abdelkader Dairi received the B.E. degree in computer science from the
University of Oran 1 Ahmed Ben Bella, Oran, Algeria, in 2003, the magister’s
degree in computer science from the National Polytechnic School of Oran,
Oran, in 2006, and the Ph.D. degree in computer science from Ben Bella
Oran1 University, Oran, in 2018.
From 2007 to 2013, he was a Senior Oracle Database Administrator (DBA)
and Enterprise Resource Planning (ERP) Manager. He has over 20 years
of programming experience in different languages and environments. He is
currently an Assistant professor with the Department of Computer Science,
University of Sciences and Technology of Oran Mohamed Boudiaf, Bir El
Djir, Algeria. His research interests include programming languages, artificial
intelligence, computer vision, machine learning, and deep learning.
Authorized licensed use limited to: National Taipei Univ. of Technology. Downloaded on August 15,2021 at 14:54:06 UTC from IEEE Xplore. Restrictions apply.