
Accepted Manuscript

Spatial estimation of urban air pollution with the use of artificial neural network models

A. Alimissis, K. Philippopoulos, C.G. Tzanis, D. Deligiorgi

PII: S1352-2310(18)30511-9
DOI: 10.1016/j.atmosenv.2018.07.058
Reference: AEA 16164

To appear in: Atmospheric Environment

Received Date: 29 January 2018


Revised Date: 20 July 2018
Accepted Date: 30 July 2018

Please cite this article as: Alimissis, A., Philippopoulos, K., Tzanis, C.G., Deligiorgi, D., Spatial
estimation of urban air pollution with the use of artificial neural network models, Atmospheric
Environment (2018), doi: 10.1016/j.atmosenv.2018.07.058.

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to
our customers we are providing this early version of the manuscript. The manuscript will undergo
copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please
note that during the production process errors may be discovered which could affect the content, and all
legal disclaimers that apply to the journal pertain.

Spatial estimation of urban air pollution with the use of artificial neural network models

Alimissis A., Philippopoulos K., Tzanis C.G.*, Deligiorgi D.

Section of Environmental Physics and Meteorology, Department of Physics, National and Kapodistrian University of Athens, 15784 Athens, Greece

* Corresponding author. C.G. Tzanis. National and Kapodistrian University of Athens, Department of Physics, Section of Environmental Physics and Meteorology, University Campus, Bldg Phys-5, 15784 Athens, Greece
E-mail: chtzanis@phys.uoa.gr

ABSTRACT

The deterioration of urban air quality is considered one of the primary environmental issues worldwide, and scientific evidence associates exposure to ambient air pollution with serious health effects. This highlights the importance of generating accurate air pollution fields for quantifying present and future health-related risks. Interpolation methods for point estimation in air pollution modelling enable the estimation of pollutant concentrations at unmonitored locations. The main objective of this study is to evaluate two interpolation methodologies, Artificial Neural Networks and Multiple Linear Regression, using data from a real urban air quality monitoring network in the greater area of metropolitan Athens, Greece. The results for five regulated air pollutants (nitrogen dioxide, nitrogen monoxide, ozone, carbon monoxide and sulphur dioxide) are compared using a set of correlation and difference statistical measures and the distribution of the residuals. Artificial neural networks are found in most cases to be significantly superior, especially where the air quality network density is limited, leading to a decreased degree of spatial correlation among the monitoring sites.

Keywords

Air quality, spatial interpolation, artificial neural networks


1. Introduction

Urban air pollution is considered a major environmental issue because it is associated with a variety of adverse effects on human health. It is considered the primary cause of mortality related to environmental conditions (Aunan and Pan, 2004; Curtis et al., 2006; Scoggins et al., 2004), among a variety of other effects (Wiedensohler et al., 2002; Tzanis et al., 2009; Ganguly and Tzanis, 2011; Varotsos et al., 2012a,b; Amanollahi et al., 2013). In order to minimize future health-related risks, it is necessary to introduce a series of countermeasures based on information provided by accurate fields of air pollutant distributions. Air pollution modelling follows two different approaches. The first is the numerical modelling of air pollutant dispersion, which simulates dispersion and transport mechanisms using emission source data and knowledge of the chemical transformations in the atmosphere. In contrast, the second approach applies advanced statistical models, such as machine learning methodologies, to data from urban air quality monitoring networks. The statistical approach takes advantage of the spatial and temporal correlations present in air pollution concentration time series and formulates models that simulate these dependencies with a high degree of accuracy. Spatial interpolation schemes can be classified into various categories, such as global or local and exact or approximate methodologies, among others (Li and Heap, 2011). Point spatial estimation of air pollution is an extremely important application of spatial interpolation, as the available data from an existing air quality monitoring network can be used to predict air pollutant concentrations at unmonitored locations. In this field, a commonly used linear interpolation scheme is Multiple Linear Regression (MLR), which can generate accurate results (Vicente-Serrano et al., 2003; Rosenlund et al., 2008; Li et al., 2010; Dominick et al., 2012). However, the spatial distribution of air pollutants is influenced by a number of complex physical processes that require a more sophisticated non-linear mathematical approach. Artificial Neural Network (ANN) models have been used as an interpolation scheme for non-linear problems (Gardner and Dorling, 1999; Bandyopadhyay and Chattopadhyay, 2007; Kalogirou, 2001; Şahin, 2012; Abdul-Wahab and Al-Alawi, 2002; Tasadduq and Rehman, 2002; Voukantsis et al., 2011; Rigol et al., 2001; Viotti et al., 2002) but without evaluating their predictive ability in comparison with other statistical methodologies (Sousa et al., 2007; Fallahi et al., 2018).

The first application of ANNs in the field of air quality was performed by Boznar et al. (1993). Gardner and Dorling (1998) concluded that ANNs are a successful methodology that provides better results than linear statistical methods, owing to the nonlinear behavior of air pollutants. A subsequent detailed evaluation and comparison of multilayer ANNs, self-organizing maps (SOMs) and a linear method yielded more accurate forecasts of hourly NO2 concentrations for the ANN scheme (Kolehmainen et al., 2001). Furthermore, Kukkonen et al. (2003) evaluated multiple ANN models for NO2 and PM10 temporal predictions in Helsinki. In recent years, ANNs have been used for forecasting PM2.5, due to its importance for human health (Feng et al., 2015; Mishra et al., 2015; Franceschi et al., 2018). Regarding spatial forecasting, Li and Heap (2014) provided a detailed review of the various spatial interpolation methodologies. However, research on the spatial forecasting ability of ANNs in the field of air quality is limited. Pfeiffer et al. (2009), using a large number of diffusive samplers, calculated the average NO2 distribution in Cyprus, while Wahid et al. (2013) constructed an ANN model to predict ground-level O3 concentrations in Sydney. Both studies found that ANNs accurately estimate the spatial distribution of the selected pollutants.

The main objective of this study is to model the spatial variability of atmospheric pollution using the ANN and MLR methodologies for the greater area of metropolitan Athens in Greece, focusing on five regulated atmospheric pollutants (nitrogen dioxide, NO2; nitrogen monoxide, NO; ozone, O3; carbon monoxide, CO; sulphur dioxide, SO2). The direct comparison between the two spatial interpolation schemes can provide useful information and determine which method performs best. Finally, this work evaluates the ability of ANNs to perform accurate spatial predictions of air pollutants in urban environments with complex topography. The following section presents the area of study and the available air quality data. The methodology and the required theoretical background of the MLR and ANN models are presented in the third section, whereas the results and the conclusions are presented in the final two sections of the paper.

2. Area of study – Air quality data

The area of study is the Attica region in Greece, located in the southeastern Mediterranean (Fig. 1). The climate of the region is characterized by a distinct, long, dry period during summer and a short, wet period in winter (Founda and Giannakopoulos, 2009). During the warm period, extreme high-temperature events with high solar radiation intensities are observed. The topography of the region influences air pollution levels by restricting atmospheric dispersion and affecting the transport mechanisms (Deligiorgi et al., 2009, 2013; Moustris et al., 2010; Zoras et al., 2006; Asimakopoulos et al., 1992). The urban area of metropolitan Athens is located within the Athens basin, which is bounded by Mounts Parnitha and Penteli to the north, Mount Hymettos to the east and Mount Aegaleo to the west, the latter separating the basin from the heavily industrialized Thriassion plain. These features, in combination with the close proximity to the sea (Saronikos Gulf to the south and Evoikos Gulf to the east), are responsible for a number of local and mesoscale flows (e.g. sea breeze), forming specific climatic conditions that greatly affect air pollution (Lalas et al., 1983; Mavrakou et al., 2012). The main anthropogenic sources of air pollution are fossil fuel combustion for transport and domestic heating, along with industrial emissions.

Fig. 1. Area of study and air quality monitoring network.

The Hellenic Ministry of Environment, Energy and Climate Change (MEECC) has operated the air quality monitoring network of the greater area of metropolitan Athens since 1984. The majority of the monitoring sites are located in urban and suburban areas of Athens, where various harmful air pollution events are reported due to the effect of emission sources, topography and the lack of an effective urban development plan. The statistical analysis is based on an air quality database of hourly air pollutant concentrations, which allows the intra-daily spatial variability to be examined. The criteria for selecting the monitoring sites and pollutants are the overall and yearly data availability, along with homogeneous spatial data coverage for each air pollutant. Following these criteria, the selected pollutants are NO2, NO, O3, CO and SO2, measured from 2001 to 2013 at 13 monitoring sites (Table 1).

Table 1. Air quality monitoring sites, coordinates and station characteristics.

Site             Abbreviation  Longitude   Latitude    Altitude (m a.m.s.l.)  Station type
Athinas          ATH           23°43’36’’  37°58’41’’  100                    Urban/Traffic
Aristotelous     ARI           23°43’39’’  37°59’16’’  95                     Urban/Traffic
Geoponiki        GEO           23°42’24’’  37°59’01’’  40                     Suburban/Industrial
Liosia           LIO           23°41’52’’  38°04’36’’  165                    Suburban/Background
Lykovrisi        LYK           23°47’19’’  38°04’04’’  234                    Suburban
Marousi          MAR           23°47’14’’  38°01’51’’  170                    Urban/Traffic
Nea Smyrni       SMY           23°42’46’’  37°55’55’’  50                     Urban/Background
Patission        PAT           23°43’58’’  37°59’58’’  105                    Urban/Traffic
Piraeus          PIR           23°38’42’’  37°56’40’’  4                      Urban/Traffic
Peristeri        PER           23°41’18’’  38°01’14’’  80                     Urban/Background
Ag. Paraskevi    AGP           23°49’09’’  37°59’42’’  290                    Suburban/Background
Elefsina         ELE           23°32’18’’  38°03’04’’  20                     Suburban/Industrial
Thrakomakedones  THR           23°45’29’’  38°08’36’’  550                    Suburban/Background

Furthermore, the number of selected monitoring sites ranges from 4 for SO2 to 13 for NO and NO2 (Table 2). The pollutant concentrations were measured with conventional analyzers (APOA, HORIBA), using chemiluminescence for NO and NO2, UV absorption for O3, IR absorption for CO and fluorescence for SO2. The levels of the five atmospheric pollutants over the examined period also deserve mention. Although the average annual pollution values vary, a downward or stabilizing trend is observed during the examined period. Figure 2 depicts the evolution of the annual average concentrations of the five selected pollutants for the 2001 to 2013 period. The stations presented are those with the highest average concentrations in each case.

Fig. 2. Yearly averaged concentrations for NO2, NO and CO at PAT and ATH sites (a, b and d respectively), for O3 at AGP and THR sites (c) and for SO2 at PAT and PIR (e). The three different time periods correspond to the FFNN scheme subsets (training, validation and test).

According to the annual reports provided by the MEECC, the selected dataset is representative of the air pollution spatial variability in the area of study (MEECC, 2013) and can be used effectively in statistical modeling via machine learning approaches. According to the correlation analysis, the pair of stations with the maximum correlation coefficient differs depending on the pollutant. For example, for NO2, NO and CO the highest correlation coefficient values are found between the PER and LIO sites (0.83), the MAR and GEO sites (0.84) and the PER and GEO sites (0.84), respectively. It should be noted that for all pollutants, and especially for NO2 and NO, the THR site time series exhibit low correlation coefficient values. This can be attributed to the location of the station (Figure 1).

Table 2. Monitoring sites for each pollutant.

      NO2  NO   O3   CO   SO2
PAT   ✓    ✓    ✓    ✓    ✓
PIR   ✓    ✓    ✓    ✓    ✓
ARI   ✓    ✓    -    -    -
ATH   ✓    ✓    ✓    ✓    ✓
ELE   ✓    ✓    ✓    -    -
SMY   ✓    ✓    ✓    ✓    -
LIO   ✓    ✓    ✓    -    -
PER   ✓    ✓    ✓    ✓    ✓
MAR   ✓    ✓    ✓    ✓    -
LYK   ✓    ✓    ✓    -    -
GEO   ✓    ✓    ✓    ✓    -
AGP   ✓    ✓    ✓    -    -
THR   ✓    ✓    ✓    -    -

3. Methods
3.1. Methodology

In this study the air pollution spatial variability is modeled using the MLR and ANN approaches. Both the linear (MLR) and non-linear (ANN) schemes are evaluated using the leave-one-out cross-validation methodology: for each pollutant, a specific monitoring site is treated as the target site and the concentrations at the remaining monitoring sites (their number depends on the pollutant) are used to estimate the air pollutant concentrations at the target site. Both approaches are data intensive and require representative datasets to achieve a high level of generalization. For this reason the experimental air pollution database is divided into three subsets: the training, validation and test sets. Specifically, depending on the air pollutant, more than 60% of the available data is used for training the models, approximately 15% to 20% is used for validation, and the final two years of the database (test set) are used for assessing the predictive performance of the schemes. Table 3 presents, for each pollutant, the exact data percentages along with the associated number of pollutant concentrations in each subset.

Table 3. Training, Validation and Test sets for each pollutant.

            NO2             NO              CO              SO2             O3
            %      points   %      points   %      points   %      points   %      points
Training    65.50  23,294   65.45  23,108   65.03  44,002   60.08  43,437   63.94  28,340
Validation  15.65  5,561    15.58  5,501    16.05  10,860   19.01  13,748   17.63  7,816
Prediction  18.85  6,700    18.97  6,696    18.92  12,804   20.91  15,116   18.43  8,168

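The chronological division into training, validation and test subsets can be illustrated with a minimal Python sketch. This is not the authors' code: the function name `chronological_split`, the toy records and the validation boundary (1 January 2010) are hypothetical; only the choice of the final two years of the 2001 to 2013 database as the test set follows the text.

```python
from datetime import datetime


def chronological_split(records, val_start, test_start):
    """Split (timestamp, value) records into training, validation and test lists.

    Records before val_start go to training, records from val_start up to
    (but excluding) test_start go to validation, and the rest go to test.
    """
    train = [r for r in records if r[0] < val_start]
    val = [r for r in records if val_start <= r[0] < test_start]
    test = [r for r in records if r[0] >= test_start]
    return train, val, test


# Hypothetical toy series: one made-up value per year, 2001-2013.
records = [(datetime(year, 1, 1), 10.0 + year % 5) for year in range(2001, 2014)]

# Assumed boundaries: validation starts in 2010 (illustrative only);
# the test set covers the final two years of the database (2012-2013).
train, val, test = chronological_split(
    records, datetime(2010, 1, 1), datetime(2012, 1, 1)
)
print(len(train), len(val), len(test))
```

Splitting by time rather than at random keeps the test period strictly after the data used for fitting, which matches the use of the last two years for assessing predictive performance.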
The accuracy of both schemes is assessed by comparing the observed and predicted concentrations and by statistically analysing the model residuals. The mean performance of each scheme is examined using a set of difference and correlation measures (i.e. Mean Absolute Error, MAE; Root Mean Square Error, RMSE; and the coefficient of determination, R2) according to the following equations:

$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| P_i - O_i \right| \tag{1}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( P_i - O_i \right)^2} \tag{2}$$

$$R^2 = \frac{\left[ \sum_{i=1}^{n} \left( O_i - \bar{O} \right) \left( P_i - \bar{P} \right) \right]^2}{\sum_{i=1}^{n} \left( O_i - \bar{O} \right)^2 \sum_{i=1}^{n} \left( P_i - \bar{P} \right)^2} \tag{3}$$

where O and P are the observed and predicted concentrations, respectively, n is the number of data points and an overbar denotes the mean value. The above statistics are calculated for the test set, where the scheme with lower MAE and RMSE values and higher R2 values is considered to model the air pollution spatial variability more effectively. Information regarding model bias and under- or over-prediction is provided by examining the residuals distributions, whereas scatter diagrams are used for the best performing model to examine the error magnitude at low, medium and high concentration levels.
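The three performance measures can be sketched in plain Python as follows. This is a minimal illustration, not the authors' implementation: the function names and the toy numbers are hypothetical, and R2 is computed here as the squared Pearson correlation between observed and predicted values, which is an assumption about the exact definition used.

```python
import math


def mae(obs, pred):
    """Mean Absolute Error between observed and predicted values."""
    return sum(abs(p - o) for o, p in zip(obs, pred)) / len(obs)


def rmse(obs, pred):
    """Root Mean Square Error between observed and predicted values."""
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(obs, pred)) / len(obs))


def r_squared(obs, pred):
    """Squared Pearson correlation of observed and predicted values (assumed form)."""
    n = len(obs)
    o_mean = sum(obs) / n
    p_mean = sum(pred) / n
    cov = sum((o - o_mean) * (p - p_mean) for o, p in zip(obs, pred))
    var_o = sum((o - o_mean) ** 2 for o in obs)
    var_p = sum((p - p_mean) ** 2 for p in pred)
    return cov ** 2 / (var_o * var_p)


# Made-up concentrations, for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [2.0, 2.0, 4.0, 4.0]
print(mae(observed, predicted), rmse(observed, predicted), r_squared(observed, predicted))
```

MAE and RMSE are in the units of the concentrations, with RMSE penalizing large errors more strongly, while R2 is dimensionless.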

3.2. Multiple linear regression

In the context of spatial interpolation, MLR is a linear methodology based on the assumption that the spatial relationship between the concentrations at a target site and those at one or more reference sites can be modeled using linear predictor functions. Linear regression analysis is used extensively in statistical applications and requires training to calculate the values of a number of coefficients, which associate the response variable (air pollutant concentrations at the target site) with the explanatory variables (air pollutant concentrations at the reference sites) according to the relationship:

$$z = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_n x_n \tag{4}$$

where z denotes the target site data, xi the reference site data and bi the regression coefficients. Higher values of the regression coefficients indicate an increased importance of the corresponding reference site to the predicted concentration levels at the target site.
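As a rough illustration of the MLR scheme, the sketch below fits the regression coefficients by ordinary least squares (normal equations solved with Gaussian elimination) and predicts the target-site concentration from the reference sites. It is written in plain Python under stated assumptions: the helper names, the choice of solver and the toy data are all hypothetical and are not taken from the study.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_mlr(reference, target):
    """Least-squares coefficients [b0, b1, ..., bn] for target ~ reference sites."""
    rows = [[1.0] + list(r) for r in reference]  # prepend intercept column
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, target)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict_mlr(coeffs, reference_row):
    """Predicted target-site value for one set of reference-site values."""
    return coeffs[0] + sum(b * x for b, x in zip(coeffs[1:], reference_row))


# Made-up data: two reference sites, target generated as 2 + 0.5*x1 + 1.5*x2.
reference = [(1.0, 2.0), (2.0, 1.0), (3.0, 5.0), (4.0, 3.0), (5.0, 8.0)]
target = [2.0 + 0.5 * x1 + 1.5 * x2 for x1, x2 in reference]
coeffs = fit_mlr(reference, target)
print(coeffs, predict_mlr(coeffs, (2.0, 2.0)))
```

In a leave-one-out setting, `fit_mlr` would be called once per target site, each time with the remaining sites as the reference columns.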

3.3. Artificial Neural Networks

ANN development is the result of formulating simplified mathematical models based on brain physiology. The human brain is a highly complex information processing system whose main feature is its ability for parallel and non-linear processing of external information (stimuli). The fundamental structural element of the biological nervous system is the biological neuron. By analogy with a biological neural network, an ANN is defined as an architectural structure (network) consisting of a large number of parallel, interconnected adaptive processing units, whose hierarchical organization aims to interact with the environment (Kohonen, 1988). According to Haykin (2009), the two main similarities between ANNs and biological networks are that a training process provides the information to the network and that the interconnections between the artificial neurons (synaptic weights w) are used to store the acquired information. The processing units are called artificial neurons (McCulloch and Pitts, 1943) and are the computational analogs of biological cells (Figure 3a). The output of such a neuron can be calculated using the equation:

$$y = f\left( \sum_{i=1}^{n} w_i x_i - \theta \right) \tag{5}$$

where y is the neuron output, xi are the inputs, wi the synaptic weights, θ the threshold and f the activation function. The synaptic weights adjust the strength of the connections, while, depending on the application, a number of different activation functions can be used (e.g. linear, sigmoid, hyperbolic tangent). In this study the Feed-Forward Neural Network (FFNN) architecture (Figure 3b) is used due to its ability to approximate any measurable input-output relationship to any desired degree of accuracy (Hornik, 1989). The main characteristic of FFNNs is that information flows in only one direction (forward), and they are characterized by a lack of memory.

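The neuron output (weighted sum of the inputs minus a threshold, passed through an activation function) and the one-directional information flow of an FFNN can be sketched as follows. This is an illustrative plain-Python example only: the function names, the weights and the inputs are made up, and the choice of a hyperbolic tangent hidden layer with a linear output neuron is one of the combinations named in the text rather than a reproduction of the trained models.

```python
import math


def neuron_output(inputs, weights, theta, activation=math.tanh):
    """Output of one artificial neuron: f(sum_i w_i * x_i - theta)."""
    net = sum(w * x for w, x in zip(weights, inputs)) - theta
    return activation(net)


def ffnn_forward(inputs, hidden_weights, hidden_thetas, out_weights, out_theta):
    """Forward pass of an M-K-1 network: tanh hidden layer, linear output neuron."""
    hidden = [
        neuron_output(inputs, w, t)
        for w, t in zip(hidden_weights, hidden_thetas)
    ]
    # Linear activation on the single output neuron.
    return neuron_output(hidden, out_weights, out_theta, activation=lambda v: v)


# Made-up example: 3 inputs (reference sites), 2 hidden neurons, 1 output.
x = [0.2, -0.1, 0.4]
hidden_w = [[0.5, -0.3, 0.8], [-0.6, 0.1, 0.2]]
hidden_theta = [0.1, -0.2]
out_w = [1.2, -0.7]
out_theta = 0.05
print(ffnn_forward(x, hidden_w, hidden_theta, out_w, out_theta))
```

Information only moves from the inputs to the hidden layer and then to the output, so the prediction for one time step does not depend on earlier inputs.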
Fig. 3. An artificial neuron model (a) and a fully interconnected multi-layer FFNN network with M input nodes, a hidden layer with K neurons and N neuron output layer (M-K-N) (b).

Regarding the current application, the number of reference sites for each pollutant defines the number of input layer nodes, whereas, in accordance with the leave-one-out cross-validation methodology, the output layer consists of a single neuron. A hyperbolic tangent activation function is used for the neurons of the input and hidden layers and a linear function for the output layer. The optimum FFNN architecture for each case (number of hidden layer neurons) is related to the complexity of the spatial variability and is determined by minimizing the MAE on the validation set, following the methodology proposed by Philippopoulos and Deligiorgi (2012). This trial-and-error procedure involves training multiple FFNN models with different numbers of hidden layer neurons (from 1 to 30) using the Levenberg-Marquardt back-propagation algorithm. Furthermore, due to the sensitivity of the algorithm to the initial synaptic weights, and to avoid local minima, training is repeated with random initial weights (10 repetitions). An additional aim of the study is to estimate the explanatory capacity of the FFNN models by calculating the Relative Importance percentage (RI) of the input variables to the FFNN output. The methodology was proposed by Garson (1991) and later applied by Goh (1995). It is based on the connection synaptic weights and reveals important information regarding the spatial associations of the air monitoring sites. The synaptic weights indicate the effect that each input site's data has on the output results. The RI is calculated by:

$$RI_i = \frac{\sum_{j=1}^{K} \dfrac{|w_{ji}|\,|w_{kj}|}{\sum_{m=1}^{M} |w_{jm}|}}{\sum_{i=1}^{M} \sum_{j=1}^{K} \dfrac{|w_{ji}|\,|w_{kj}|}{\sum_{m=1}^{M} |w_{jm}|}} \times 100 \tag{6}$$

where wji is the synaptic weight between the i-th input and the j-th hidden neuron, wkj the synaptic weight between the j-th hidden and the k-th output neuron, M the number of input nodes and K the number of hidden layer neurons.

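A minimal sketch of a Garson-type relative importance calculation for a network with a single output neuron is given below. It is an illustration under stated assumptions, not the authors' implementation: the function name and the toy weights are hypothetical, bias (threshold) terms are ignored, and the result is scaled so that the percentages sum to 100.

```python
def garson_relative_importance(w_hidden, w_out):
    """Relative importance (%) of each input from the absolute connection weights.

    w_hidden[j][i] : weight between input i and hidden neuron j
    w_out[j]       : weight between hidden neuron j and the single output neuron
    """
    n_inputs = len(w_hidden[0])
    contributions = [0.0] * n_inputs
    for w_j, w_kj in zip(w_hidden, w_out):
        # Share of each input in hidden neuron j, weighted by j's output weight.
        row_sum = sum(abs(w) for w in w_j)
        for i, w_ji in enumerate(w_j):
            contributions[i] += abs(w_ji) * abs(w_kj) / row_sum
    total = sum(contributions)
    return [100.0 * c / total for c in contributions]


# Made-up weights: 2 inputs, 2 hidden neurons, 1 output.
w_hidden = [[1.0, 1.0], [3.0, -1.0]]
w_out = [1.0, -2.0]
ri = garson_relative_importance(w_hidden, w_out)
print(ri)
```

Because only absolute values are used, the measure reflects the strength of each input's connections and not the sign of its effect on the output.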
4. Results

Table 4 presents the optimum FFNN architecture for each station and pollutant. An initial finding is that the number of hidden layer neurons is related to the complexity of the input-output relationship and therefore to the degree of air pollution spatial variability in the study area. The number of hidden layer neurons for each pollutant, averaged over all available monitoring sites, is 18, 22, 22, 20 and 17 for NO2, NO, O3, CO and SO2, respectively.

Table 4. FFNN architecture (Input Layer Neurons - Hidden Layer Neurons - Output Layer Neurons).

      NO2      NO       O3       CO      SO2
PAT   12-23-1  12-24-1  11-28-1  6-30-1  3-15-1
PIR   12-16-1  12-29-1  11-12-1  6-27-1  3-15-1
ARI   12-17-1  12-30-1  -        -       -
ATH   12-19-1  12-21-1  11-12-1  6-15-1  3-24-1
ELE   12-14-1  12-11-1  11-21-1  -       -
SMY   12-18-1  12-19-1  11-22-1  6-12-1  -
LIO   12-12-1  12-23-1  11-20-1  -       -
PER   12-23-1  12-16-1  11-27-1  6-18-1  3-14-1
MAR   12-19-1  12-16-1  11-24-1  6-15-1  -
LYK   12-12-1  12-24-1  11-30-1  -       -
GEO   12-18-1  12-27-1  11-14-1  6-21-1  -
AGP   12-17-1  12-21-1  11-28-1  -       -
THR   12-23-1  12-19-1  11-28-1  -       -

A general remark, based on the comparison of the model performance statistics (Tables 6-10), is that in the majority of cases the FFNN models outperform the MLR prediction schemes (Table 5) (Grivas and Chaloulakou, 2006). The performance of both schemes is found to depend heavily on the target station and the examined pollutant, due to the complex effects of topography, transport mechanisms, chemical reactions, and air pollution sources and emissions. In some cases the results are also affected by insufficient data for inter/extrapolation during the training phase of both schemes. An interesting finding concerns the sites where the predictive ability of both schemes is limited (e.g. PAT) (Moustris et al., 2010). In these cases, for all pollutants, the FFNN models are significantly superior to the MLR schemes, with a relative decrease in MAE of up to 63.9% for SO2. This is attributed to the ability of non-linear models to simulate complex relationships better than linear models (Yi and Prybutok, 1996). In cases where both schemes are suitable for spatial point interpolation, the FFNN models are associated with lower MAE values, although the differences are marginal. An additional conclusion relates to the air quality monitoring network density, which varies from 4 sites for SO2 to 13 sites for NO and NO2 (Table 2). The FFNN models exhibit higher predictive ability than the MLR schemes for sparse monitoring networks (i.e. where the information provided to the models is limited). In more detail, the FFNNs for SO2 and CO display superior performance for nearly all monitoring sites, whereas for NO2, NO and O3 the MLR scheme is the optimum choice for four, two and three sites, respectively (Table 5).

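The relative decrease in MAE quoted for SO2 can be reproduced from the PAT values reported in Table 10 (MLR 11.15 µg/m3, FFNN 4.03 µg/m3). The short Python sketch below shows the arithmetic; the function name `relative_decrease` is a hypothetical helper introduced only for this illustration.

```python
def relative_decrease(mlr_value, ffnn_value):
    """Relative decrease (%) of an error measure when moving from MLR to FFNN."""
    return 100.0 * (mlr_value - ffnn_value) / mlr_value


# MAE for SO2 at the PAT site (Table 10): MLR 11.15, FFNN 4.03 (µg/m3).
so2_pat = relative_decrease(11.15, 4.03)
print(round(so2_pat, 1))
```

The same helper applied to the PAT values for the other pollutants in Tables 6 to 9 gives smaller decreases, consistent with SO2 being the extreme case.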
Table 5. Predictive performance comparison of the FFNN and MLR models. In each case the best performing scheme is displayed and boldfaced values refer to cases where the predictive performance was above average.

      NO2       NO        O3        CO        SO2
PAT   FFNN      FFNN      FFNN      FFNN      FFNN
PIR   FFNN      FFNN      MLR       FFNN      MLR
ARI   MLR       FFNN      -         -         -
ATH   MLR       FFNN      FFNN/MLR  FFNN      FFNN
ELE   FFNN      FFNN      MLR       -         -
SMY   FFNN      FFNN      FFNN      FFNN      -
LIO   FFNN      FFNN      MLR       -         -
PER   MLR       FFNN/MLR  FFNN      FFNN/MLR  FFNN
MAR   FFNN/MLR  FFNN      FFNN      FFNN/MLR  -
LYK   MLR       FFNN/MLR  FFNN      -         -
GEO   FFNN      FFNN/MLR  FFNN      FFNN      -
AGP   FFNN      MLR       FFNN      -         -
THR   FFNN      MLR       FFNN      -         -

Table 6. Performance statistics results for NO2.

      MAE (µg/m3)     RMSE (µg/m3)    R2
      MLR     FFNN    MLR     FFNN    MLR    FFNN
PAT   22.72   20.61   27.28   25.60   0.28   0.38
PIR   17.82   13.96   20.56   17.54   0.43   0.41
ARI   10.42   21.7    13.46   25.94   0.73   0.64
ATH   9.42    9.67    11.61   11.85   0.69   0.69
ELE   10.96   10.39   14.75   14.55   0.55   0.55
SMY   11.01   10.35   15.6    15.32   0.56   0.59
LIO   7.24    7.01    10.21   10.13   0.71   0.71
PER   7.81    7.94    11.48   11.53   0.67   0.68
MAR   8.11    8.07    11.8    11.92   0.73   0.74
LYK   7.14    7.91    10.24   11      0.65   0.66
GEO   9.06    8.67    13.34   12.63   0.71   0.72
AGP   7.45    6.01    10.3    9.31    0.29   0.23
THR   3.92    3.7     6.15    6.11    0.31   0.31

Table 7. Performance statistics results for NO.

      MAE (µg/m3)     RMSE (µg/m3)    R2
      MLR     FFNN    MLR     FFNN    MLR    FFNN
PAT   46.98   35.89   55.15   44.58   0.54   0.59
PIR   21.37   19.32   30.39   31.82   0.53   0.55
ARI   12.76   12.18   25.33   25.84   0.79   0.8
ATH   13.92   13.31   22.69   22.68   0.81   0.83
ELE   5.43    4.84    10.5    10.3    0.56   0.56
SMY   8.48    6.4     15.71   13.66   0.6    0.67
LIO   4.62    3.77    10.5    9.4     0.72   0.75
PER   5.6     5.62    10.4    9.95    0.62   0.65
MAR   6.55    5.42    13.86   13.48   0.73   0.75
LYK   7.37    7.32    13.43   13.59   0.67   0.66
GEO   8.05    7.55    15.63   16.17   0.78   0.77
AGP   1.75    1.86    2.55    2.64    0.09   0.08
THR   0.75    0.84    1.46    1.51    0.13   0.13

Table 8. Performance statistics results for O3.

      MAE (µg/m3)     RMSE (µg/m3)    R2
      MLR     FFNN    MLR     FFNN    MLR    FFNN
PAT   12.57   11.88   16.6    16.41   0.41   0.44
PIR   15.14   15.2    19.21   19.71   0.54   0.54
ATH   9.91    9.69    12.86   13.18   0.8    0.8
ELE   13.34   13.55   17.16   17.4    0.7    0.7
SMY   11.4    11.06   17.74   15.03   0.85   0.84
LIO   13.13   14.84   16.4    18.32   0.83   0.83
PER   9.04    8.53    11.95   11.39   0.89   0.89
MAR   12.4    12.09   16.33   16.11   0.8    0.81
LYK   11.15   10.96   15.45   14.9    0.86   0.85
GEO   15.42   13.67   19.45   18.46   0.8    0.8
AGP   11.43   10.59   15.17   14.6    0.73   0.74
THR   12.5    11.89   16.20   15.48   0.69   0.68

Table 9. Performance statistics results for CO.

      MAE (mg/m3)     RMSE (mg/m3)    R2
      MLR     FFNN    MLR     FFNN    MLR    FFNN
PAT   0.9     0.62    1.06    0.81    0.5    0.51
PIR   0.31    0.22    0.39    0.34    0.71   0.7
ATH   0.37    0.32    0.48    0.47    0.73   0.76
SMY   0.25    0.17    0.38    0.33    0.72   0.76
PER   0.14    0.14    0.3     0.29    0.78   0.78
MAR   0.16    0.16    0.3     0.31    0.78   0.78
GEO   0.16    0.14    0.28    0.26    0.82   0.83

Table 10. Performance statistics results for SO2.

      MAE (µg/m3)     RMSE (µg/m3)    R2
      MLR     FFNN    MLR     FFNN    MLR    FFNN
PAT   11.15   4.03    11.97   5.74    0.31   0.28
PIR   6.14    6.17    8.65    10.15   0.23   0.2
ATH   3.07    2.75    3.86    3.6     0.31   0.29
PER   3.27    2.55    4.37    3.94    0.35   0.37

A detailed comparison between the FFNN and MLR interpolation schemes is based on the analysis of the residuals distributions (differences between the predicted and observed air pollutant concentrations). Figure 4 presents some characteristic cases for selected pollutants and monitoring sites. A general remark is that the residuals distributions are in accordance with the mean model performance statistics (Tables 6-10). In more detail, for NO2 at the ARI site the FFNN model under-predicts the observed concentrations (Figure 4a), whereas at PIR the MLR scheme systematically over-predicts them (Figure 4b). Regarding SO2 at PAT (Figure 4c), the under-estimation by the FFNN model is considerably smaller in magnitude than the over-prediction by the MLR scheme, which is consistent with the MAE results (4.03 µg/m3 and 11.15 µg/m3, respectively). The model residuals for O3 at LIO, PER and THR exhibit similar distributions for both schemes (Figures 4d-4f). Focusing on the THR monitoring site, which is associated with considerably higher O3 concentrations and therefore with environmental health risk, the FFNN model residuals distribution is centered at 0 µg/m3 and exhibits lower dispersion than that of the MLR scheme (Figure 4f). Regarding CO at the urban/traffic monitoring sites ATH and PAT (Figures 4g and 4h), the residuals distributions for the FFNN models are considerably superior, even at the PAT site, which is associated with higher prediction errors. In both cases a positive bias is observed for the MLR scheme, leading to a systematic over-prediction of the observed CO concentrations. The prediction errors of the MLR scheme are in general higher because the method is heavily influenced by the representativeness of the air monitoring network and the degree of spatial variability of the examined pollutants. For example, for NO2 the MLR scheme is relatively accurate for the ATH, ARI, PER, MAR and LYK sites. In these cases the highest regression coefficient values of the MLR models correspond to the pair of monitoring sites with the highest NO2 time series correlation coefficient values. For CO and SO2, where the air quality network density is limited, the only case where the MLR scheme provides low prediction errors is the model for CO at PER (Table 9). In this case the most critical explanatory variable (GEO site) also belongs to the pair of sites that exhibits the highest correlation coefficient value (GEO and PER sites).

Fig. 4. Error values histograms where both FFNNs (black bars) and MLR (grey bars) are presented for NO2-ARI (a), NO2-PIR (b), SO2-PAT (c), O3-LIO (d), O3-PER (e), O3-THR (f), CO-ATH (g) and CO-PAT (h).

10
ACCEPTED MANUSCRIPT
285 In Figure 5, the FFNN model predictions are compared with the observed O3 concentrations,
286 focusing on the monitoring sites that report a higher number of information threshold
287 exceedances during the test period (2012-2013), namely LYK, GEO, AGP and THR.
288 The importance of O3 pollution in the area of study is well documented, owing to the high levels of
289 solar radiation and temperature (Tzanis, 2005, 2009). The findings are in accordance
290 with the FFNN model residual distributions and in all cases low dispersion is observed along
291 the diagonal of the optimum prediction. More specifically, no systematic bias is observed at
292 any site except GEO, where the FFNN model overpredicts the observed values at low O3
293 concentrations. It should be noted that the critical high-end O3 concentrations
294 are well represented by the FFNN models in all cases. The relative importance
295 of the input air pollutant concentrations, according to the leave-one-out cross validation
296 methodology, is presented for O3 in Table 11. The results indicate that the critical inputs for
297 the FFNN models are not necessarily air pollutant concentrations from sites with the same
298 characteristics as the target station. The analysis reveals that the most important features of
299 the input variables are the proximity of the monitoring sites and the concentration levels
300 of the examined pollutant. For example, for the LYK site the O3 concentrations at the THR and
301 MAR sites are among the most important input variables, with RI values of 12.58 % and
302 11.96 %, respectively, highlighting the importance of the proximity of the monitoring sites,
303 whereas the importance of the pollution levels, regardless of the geographical location of the
304 sites, is evident in the FFNN model for O3 at the GEO site, where the most important input variables are the
305 corresponding concentrations at MAR and LYK.
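
The exact formulation of the leave-one-out relative importance is not restated in this part of the manuscript, so the following Python sketch shows only one common variant, under stated assumptions: each input is removed in turn, the model is refitted, and the resulting increase in RMSE is normalised so that the importances sum to 100 %. To keep the example short and runnable, a linear least-squares model stands in for the FFNN and the error is evaluated on the fitting data; all function names and the synthetic data are hypothetical.

```python
import numpy as np

def ols_fit_predict(X_train, y_train, X_test):
    """Stand-in model (assumption): least squares with an intercept, not an FFNN."""
    A_train = np.column_stack([np.ones(len(X_train)), X_train])
    coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
    A_test = np.column_stack([np.ones(len(X_test)), X_test])
    return A_test @ coef

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def leave_one_input_out_importance(X, y, fit_predict=ols_fit_predict):
    """Relative importance (%) of each input column.

    Assumed definition: RMSE increase when the column is left out,
    normalised over all columns. Negative increases are clipped to zero.
    """
    baseline = rmse(y, fit_predict(X, y, X))
    increases = []
    for j in range(X.shape[1]):
        X_reduced = np.delete(X, j, axis=1)  # drop input j and refit
        err = rmse(y, fit_predict(X_reduced, y, X_reduced))
        increases.append(max(err - baseline, 0.0))
    increases = np.asarray(increases)
    total = increases.sum()
    if total == 0.0:
        # No input changes the error: assign equal importance.
        return np.full(X.shape[1], 100.0 / X.shape[1])
    return 100.0 * increases / total

# Synthetic data: input 0 matters most, input 2 not at all.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(500, 3))
y_demo = 3.0 * X_demo[:, 0] + 1.0 * X_demo[:, 1] + 0.1 * rng.normal(size=500)
ri_demo = leave_one_input_out_importance(X_demo, y_demo)
```

Any model exposing the same `fit_predict(X_train, y_train, X_test)` interface, including a trained neural network wrapper, could be passed in place of the stand-in.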

307 Fig. 5. Comparison of the observed and predicted FFNN values for O3-LYK (a), O3-GEO (b),
308 O3-AGP (c) and O3-THR (d).

309 Table 11. Relative importance (%) of the input data for O3.
O3      PAT    PIR    ATH    ELE    SMY    LIO    PER    MAR    LYK    GEO    AGP    THR
PAT       -   6.89  11.18   6.87  10.01   9.07   9.41   7.94   8.63   8.59  14.37   7.04
PIR    7.64      -   8.45   4.82  15.20  11.42   9.92   8.52  10.73  11.85   4.33   7.11
ATH    7.19   8.83      -   6.61  10.85   4.30   9.66  10.61   5.44  18.97   8.50   9.04
ELE    5.32  11.58   9.44      -   6.48  12.12   8.36   9.12   8.91   9.10   9.31  10.27
SMY    5.91   8.46   9.63   6.83      -   8.67  12.04  11.43   7.68  12.11   9.13   8.10
LIO    9.97   8.14   7.90   9.13   6.60      -  11.25   7.64  11.37  11.04   9.20   7.78
PER    8.96   8.36  11.04   6.88   8.56  10.48      -  10.36  10.21   9.00   6.32   9.82
MAR    5.28   7.70   9.14   6.58  11.27  11.51   9.92      -  11.84   6.99  10.25   9.51
LYK    6.21   7.57   8.60   6.85   7.68   8.92   9.65  11.96      -  12.66   7.31  12.58
GEO    3.67   5.76  11.07   4.97  10.30   8.61   9.04  14.47  13.16      -   8.16  10.79
AGP    9.02   7.16   9.05   9.60   8.23   8.51  10.34   8.88   7.50  11.11      -  10.60
THR    6.01   8.44   7.34   8.45   8.70   7.82   9.30  13.17   8.72  12.49   9.58      -
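
As a reading aid for Table 11, the short Python sketch below ranks the inputs of a given target site by their RI value. It assumes that rows are target sites and columns are input sites, which is inferred from the GEO example in the text; the helper name `top_inputs` is hypothetical, and only the LYK and GEO rows are transcribed.

```python
# Column order of Table 11 (input sites).
SITES = ["PAT", "PIR", "ATH", "ELE", "SMY", "LIO",
         "PER", "MAR", "LYK", "GEO", "AGP", "THR"]

# RI values (%) transcribed from the LYK and GEO rows of Table 11.
# None marks the target site itself, which is not an input.
RI_O3 = {
    "LYK": [6.21, 7.57, 8.60, 6.85, 7.68, 8.92,
            9.65, 11.96, None, 12.66, 7.31, 12.58],
    "GEO": [3.67, 5.76, 11.07, 4.97, 10.30, 8.61,
            9.04, 14.47, 13.16, None, 8.16, 10.79],
}

def top_inputs(target, k=3):
    """Return the k input sites with the largest RI for a target site."""
    pairs = [(site, ri) for site, ri in zip(SITES, RI_O3[target]) if ri is not None]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)[:k]

lyk_top = top_inputs("LYK", k=3)
geo_top = top_inputs("GEO", k=2)
```

With the values as tabulated, the two largest entries for GEO are MAR and LYK, and the three largest entries for LYK are GEO, THR and MAR.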

310 5. Conclusions
311 FFNN models are used as a point interpolation methodology for air pollution spatial
312 forecasting, using available data from an existing air quality monitoring network. The selected
313 area of study is the greater metropolitan area of Athens in Greece and the method is applied to
314 five air pollutants (NO, NO2, O3, CO and SO2). The results highlight the superior performance
315 of the FFNN models compared to the linear MLR interpolation scheme, owing to their ability to
316 model the complex spatial variability of air pollution more efficiently. Significant factors that
317 affect the predictive ability of the FFNN models are the optimum selection of the network
318 architecture and the density of the air quality monitoring network. A considerable drawback
319 of the FFNN methodology is the requirement of a representative training dataset in order to
320 provide sufficient information to the networks to maximize their generalization ability
321 (Philippopoulos and Deligiorgi, 2011). It is confirmed that the FFNN models incorporate
322 significant spatial variability features such as the proximity of the input-output sites and the
323 air pollutant concentration levels. The study proposes the method as a suitable alternative for
324 air pollution spatial interpolation, and future research will focus on examining the
325 predictive ability of more advanced ANN schemes (Díaz-Robles et al., 2008; Al-Alawi et al.,
326 2008). Finally, the trained FFNN models could be operationally employed to increase the
327 effectiveness and the representativeness of the air monitoring networks by providing data at
328 currently unmonitored locations, thus eliminating the requirement for a relatively high
329 number of monitoring stations for describing the air pollution spatial variability.

330 Acknowledgements
331 The authors would like to acknowledge the Ministry of Environment, Energy and Climate
332 Change for providing the air quality data.

333 References

334 Abdul-Wahab S.A., Al-Alawi S.M., 2002. Assessment and prediction of tropospheric ozone
335 concentration levels using artificial neural networks. Environmental Modelling and
336 Software 17, 219–228. https://doi.org/10.1016/S1364-8152(01)00077-9.
337 Al-Alawi S.M., Abdul-Wahab S.A., Bakheit C.S., 2008. Combining principal component
338 regression and artificial neural networks for more accurate predictions of ground-level
339 ozone. Environmental Modelling and Software 23, 396–403.
340 https://doi.org/10.1016/j.envsoft.2006.08.007.
341 Amanollahi J., Tzanis C., Abdullah A.M., Ramli M.F., Pirasteh S., 2013. Development of the
342 models to estimate particulate matter from thermal infrared band of Landsat Enhanced
343 Thematic Mapper. International Journal of Environmental Science and Technology 10,
344 1245–1254. https://doi.org/10.1007/s13762-012-0150-7.
345 Asimakopoulos D., Deligiorgi D., Drakopoulos C., Helmis C., Kokkori K., Lalas D., Sikiotis
346 D., Varotsos C., 1992. An experimental study of nighttime air-pollutant transport over
347 complex terrain in Athens. Atmospheric Environment. Part B. Urban Atmosphere 26, 59–71.
348 https://doi.org/10.1016/0957-1272(92)90037-S.
349 Aunan K., Pan X.C., 2004. Exposure-response functions for health effects of ambient air
350 pollution applicable for China - A meta-analysis. Science of the Total Environment 329,
351 3–16. https://doi.org/10.1016/j.scitotenv.2004.03.008.
352 Bandyopadhyay G., Chattopadhyay S., 2007. Single hidden layer artificial neural network
353 models versus multiple linear regression model in forecasting the time series of total
354 ozone. International Journal of Environmental Science and Technology 4, 141–149.
355 https://doi.org/10.1007/BF03325972.
356 Boznar M., Lesjak M., Mlakar P., 1993. A neural network-based method for short-term
357 predictions of ambient SO2 concentrations in highly polluted industrial areas of complex
358 terrain. Atmospheric Environment. Part B. Urban Atmosphere 27, 221–230.
359 https://doi.org/10.1016/0957-1272(93)90007-S.
360 Curtis L., Rea W., Smith-Willis P., Fenyves E., Pan Y., 2006. Adverse health effects of
361 outdoor air pollutants. Environment International 32, 815–830.
362 https://doi.org/10.1016/j.envint.2006.03.012.
363 Deligiorgi D., Philippopoulos K., Karvounis G., Tzanakou M., 2009. Identification of
364 pollution dispersion patterns in complex terrain using AERMOD modelling system.
365 International Journal of Energy, Environment and Economics 3, 143–150.
366 Deligiorgi D., Philippopoulos K., Karvounis G., 2013. Estimation of pollution dispersion
367 patterns of a power plant plume in complex terrain. Global NEST Journal 15, 227–240.
368 Díaz-Robles L.A., Ortega J.C., Fu J.S., Reed G.D., Chow J.C., Watson J.G., Moncada-
369 Herrera J.A., 2008. A hybrid ARIMA and artificial neural networks model to forecast
370 particulate matter in urban areas: The case of Temuco, Chile. Atmospheric Environment
371 42, 8331–8340. https://doi.org/10.1016/j.atmosenv.2008.07.020.
372 Dominick D., Juahir H., Latif M.T., Zain S.M., Aris A.Z., 2012. Spatial assessment of air
373 quality patterns in Malaysia using multivariate analysis. Atmospheric Environment 60,
374 172–181. https://doi.org/10.1016/j.atmosenv.2012.06.021.
375 Fallahi S., Amanollahi J., Tzanis C.G., Ramli M.F., 2018. Estimating solar radiation using
376 NOAA/AVHRR and ground measurement data. Atmospheric Research 199, 93–102.
377 https://doi.org/10.1016/j.atmosres.2017.09.006.
378 Feng X., Li Q., Zhu Y., Hou J., Jin L., Wang J., 2015. Artificial neural networks forecasting
379 of PM2.5 pollution using air mass trajectory based geographic model and wavelet
380 transformation. Atmospheric Environment 107, 118–128.
381 https://doi.org/10.1016/j.atmosenv.2015.02.030.
382 Founda D., Giannakopoulos C., 2009. The exceptionally hot summer of 2007 in Athens,
383 Greece - A typical summer in the future climate? Global and Planetary Change 67, 227–
384 236. https://doi.org/10.1016/j.gloplacha.2009.03.013.
385 Franceschi F., Cobo M., Figueredo M., 2018. Discovering relationships and forecasting PM10
386 and PM2.5 concentrations in Bogotá, Colombia, using Artificial Neural Networks,
387 Principal Component Analysis, and k-means clustering. Atmospheric Pollution
388 Research. https://doi.org/10.1016/j.apr.2018.02.006.
389 Ganguly N.D., Tzanis C., 2011. Study of stratosphere-troposphere exchange events of ozone
390 in India and Greece using ozonesonde ascents. Meteorological Applications 18, 467–
391 474. https://doi.org/10.1002/met.241.
392 Gardner M.W., Dorling S.R., 1998. Artificial neural networks (the multilayer perceptron)—a
393 review of applications in the atmospheric sciences. Atmospheric Environment 32, 2627–
394 2636. https://doi.org/10.1016/S1352-2310(97)00447-0.
395 Gardner M.W., Dorling S.R., 1999. Neural network modelling and prediction of hourly NOx
396 and NO2 concentrations in urban air in London. Atmospheric Environment 33, 709–719.
397 https://doi.org/10.1016/S1352-2310(98)00230-1.
398 Garson G.D., 1991. Interpreting neural-network connection weights. AI Expert 6, 46–51.
399 Goh A.T.C., 1995. Back-propagation neural networks for modeling complex systems.
400 Artificial Intelligence in Engineering 9, 143–151. https://doi.org/10.1016/0954-
401 1810(94)00011-S.
402 Grivas G., Chaloulakou A., 2006. Artificial neural network models for prediction of PM10
403 hourly concentrations, in the Greater Area of Athens, Greece. Atmospheric Environment
404 40, 1216–1229. https://doi.org/10.1016/j.atmosenv.2005.10.036.
405 Haykin S., 2009. Neural Networks and Learning Machines (3rd ed.). New Jersey: Pearson
406 Education Inc.
407 Hornik K., Stinchcombe M., White H., 1989. Multilayer feedforward networks are universal
408 approximators. Neural Networks 2, 359–366. https://doi.org/10.1016/0893-
409 6080(89)90020-8.
410 Kalogirou S.A., 2001. Artificial neural networks in renewable energy systems applications: a
411 review. Renewable and Sustainable Energy Reviews 5, 373–401.
412 https://doi.org/10.1016/S1364-0321(01)00006-5.
413 Kohonen T., 1988. An introduction to neural computing. Neural Networks 1, 3–16.
414 https://doi.org/10.1016/0893-6080(88)90020-2.
415 Kolehmainen M., Martikainen H., Ruuskanen J., 2001. Neural networks and periodic
416 components used in air quality forecasting. Atmospheric Environment 35, 815–825.
417 https://doi.org/10.1016/S1352-2310(00)00385-X.
418 Kukkonen J., Partanen L., Karppinen A., Ruuskanen J., Junninen H., Kolehmainen M., Niska
419 H., Dorling S., Chatterton T., Foxall R., Cawley G., 2003. Extensive evaluation of
420 neural network models for the prediction of NO2 and PM10 concentrations, compared
421 with a deterministic modelling system and measurements in central Helsinki.
422 Atmospheric Environment 37, 4539–4550. https://doi.org/10.1016/S1352-
423 2310(03)00583-1.
424 Lalas D.P., Asimakopoulos D.N., Deligiorgi D.G., Helmis C.G., 1983. Sea-breeze circulation
425 and photochemical pollution in Athens, Greece. Atmospheric Environment 17, 1621–1632.
426 https://doi.org/10.1016/0004-6981(83)90171-3.
427 Li C., Du S., Bai Z., Shao-fei K., Yan Y., Bin H., Dao-wen H., Li Z., 2010. Application of
428 land use regression for estimating concentrations of major outdoor air pollutants in
429 Jinan, China. Journal of Zhejiang University-SCIENCE A 11, 857–867.
430 https://doi.org/10.1631/jzus.A1000092.
431 Li J., Heap A.D., 2011. A review of comparative studies of spatial interpolation methods in
432 environmental sciences: Performance and impact factors. Ecological Informatics 6, 228–
433 241. https://doi.org/10.1016/j.ecoinf.2010.12.003.
434 Li J., Heap A.D., 2014. Spatial interpolation methods applied in the environmental sciences:
435 A review. Environmental Modelling and Software 53, 173–189.
436 https://doi.org/10.1016/j.envsoft.2013.12.008.
437 Mavrakou T., Philippopoulos K., Deligiorgi D., 2012. The impact of sea breeze under
438 different synoptic patterns on air pollution within Athens basin. Science of the Total
439 Environment 433, 31–43. https://doi.org/10.1016/j.scitotenv.2012.06.011.
440 McCulloch W.S., Pitts W., 1943. A logical calculus of the ideas immanent in nervous activity.
441 The Bulletin of Mathematical Biophysics 5, 115–133.
442 https://doi.org/10.1007/BF02478259.
443 MEECC, 2013. Annual air quality report 2013. Ministry of Environment Energy and Climate
444 Change. Directorate of air pollution and noise control PERPA.
445 Mishra D., Goyal P., Upadhyay A., 2015. Artificial intelligence based approach to forecast
446 PM2.5 during haze episodes: A case study of Delhi, India. Atmospheric Environment
447 102, 239–248. https://doi.org/10.1016/j.atmosenv.2014.11.050.
448 Moustris K.P., Ziomas I.C., Paliatsos A.G., 2010. 3-day-ahead forecasting of regional
449 pollution index for the pollutants NO2, CO, SO2, and O3 using artificial neural networks
450 in Athens, Greece. Water, Air, and Soil Pollution 209, 29–43.
451 https://doi.org/10.1007/s11270-009-0179-5.
452 Pfeiffer H., Baumbach G., Sarachaga-Ruiz L., Kleanthous S., Poulida O., Beyaz E., 2009.
453 Neural modelling of the spatial distribution of air pollutants. Atmospheric Environment
454 43, 3289–3297. https://doi.org/10.1016/j.atmosenv.2008.05.073.
455 Philippopoulos K., Deligiorgi D., 2011. Spatial Interpolation Methodologies in Urban Air
456 Pollution Modeling: Application for the Greater Area of Metropolitan Athens, Greece.
457 Advanced Air Pollution. https://doi.org/10.5772/17734.
458 Philippopoulos K., Deligiorgi D., 2012. Application of artificial neural networks for the
459 spatial estimation of wind speed in a coastal region with complex topography.
460 Renewable Energy 38, 75–82. https://doi.org/10.1016/j.renene.2011.07.007.
461 Rigol J.P., Jarvis C.H., Stuart N., 2001. Artificial neural networks as a tool for spatial
462 interpolation. International Journal of Geographical Information Science 15, 323–343.
463 https://doi.org/10.1080/13658810110038951.
464 Rosenlund M., Forastiere F., Stafoggia M., Porta D., Perucci M., Ranzi A., Nussio F., Perucci
465 C.A., 2008. Comparison of regression models with land-use and emissions data to
466 predict the spatial distribution of traffic-related air pollution in Rome. Journal of
467 Exposure Science and Environmental Epidemiology 18, 192–199.
468 https://doi.org/10.1038/sj.jes.7500571.
469 Şahin M., 2012. Modelling of air temperature using remote sensing and artificial neural
470 network in Turkey. Advances in Space Research 50, 973–985.
471 https://doi.org/10.1016/j.asr.2012.06.021.
472 Scoggins A., Kjellstrom T., Fisher G., Connor J., Gimson N., 2004. Spatial analysis of annual
473 air pollution exposure and mortality. Science of the Total Environment 321, 71–85.
474 https://doi.org/10.1016/j.scitotenv.2003.09.020.
475 Sousa S.I.V., Martins F.G., Alvim-Ferraz M.C.M., Pereira M.C., 2007. Multiple linear
476 regression and artificial neural networks based on principal components to predict ozone
477 concentrations. Environmental Modelling and Software 22, 97–103.
478 https://doi.org/10.1016/j.envsoft.2005.12.002.
479 Tasadduq I., Rehman S., Bubshait K., 2002. Application of neural networks for the prediction
480 of hourly mean surface temperatures in Saudi Arabia. Renewable Energy 25, 545–554.
481 https://doi.org/10.1016/S0960-1481(01)00082-9.
482 Tzanis C., 2005. Ground-based observations of ozone at Athens, Greece during the solar
483 eclipse of 1999. International Journal of Remote Sensing 26, 3585–3596.
484 https://doi.org/10.1080/01431160500076947.
485 Tzanis C., 2009. On the relationship between total ozone and temperature in the troposphere
486 and the lower stratosphere. International Journal of Remote Sensing 30, 6075–6084.
487 https://doi.org/10.1080/01431160902798429.
488 Tzanis C., Tsivola E., Efstathiou M., Varotsos C., 2009. Forest fires pollution impact on the
489 solar UV irradiance at the ground. Fresenius Environmental Bulletin 18, 2151–2158.
490 Varotsos C., Efstathiou M., Tzanis C., Deligiorgi D., 2012a. On the limits of the air pollution
491 predictability: The case of the surface ozone at Athens, Greece. Environmental Science
492 and Pollution Research 19, 295–300. https://doi.org/10.1007/s11356-011-0555-8.
493 Varotsos C., Ondov J., Tzanis C., Öztürk F., Nelson M., Ke H., Christodoulakis J., 2012b. An
494 observational study of the atmospheric ultra-fine particle dynamics. Atmospheric
495 Environment 59, 312–319. https://doi.org/10.1016/j.atmosenv.2012.05.015.
496 Vicente-Serrano S.M., Saz-Sanchez M.A., Cuadrat J.M., 2003. Comparative analysis of
497 interpolation methods in the middle Ebro Valley (Spain): Application to annual
498 precipitation and temperature. Climate Research 24, 161–180.
499 https://doi.org/10.3354/cr024161.
500 Viotti P., Liuti G., Di Genova P., 2002. Atmospheric urban pollution: Applications of an
501 artificial neural network (ANN) to the city of Perugia. Ecological Modelling 148, 27–46.
502 https://doi.org/10.1016/S0304-3800(01)00434-3.
503 Voukantsis D., Karatzas K., Kukkonen J., Räsänen T., Karppinen A., Kolehmainen M., 2011.
504 Intercomparison of air quality data using principal component analysis, and forecasting
505 of PM10 and PM2.5 concentrations using artificial neural networks, in Thessaloniki and
506 Helsinki. Science of the Total Environment 409, 1266–1276.
507 https://doi.org/10.1016/j.scitotenv.2010.12.039.
508 Wahid H., Ha Q.P., Duc H., Azzi M., 2013. Neural network-based meta-modelling approach
509 for estimating spatial distribution of air pollutant levels. Applied Soft Computing 13,
510 4087–4096. https://doi.org/10.1016/j.asoc.2013.05.007.
511 Wiedensohler A., Wehner B., Birmili W., 2002. Aerosol number concentrations and size
512 distributions at mountain-rural, urban-influenced rural, and urban-background sites in
513 Germany. Journal of Aerosol Medicine: Deposition, Clearance, and Effects in the Lung
514 15, 237–243. https://doi.org/10.1089/089426802320282365.
515 Yi J., Prybutok V.R., 1996. A neural network model forecasting for prediction of daily
516 maximum ozone concentration in an industrialized urban area. Environmental Pollution
517 92, 349–357. https://doi.org/10.1016/0269-7491(95)00078-X.
518 Zoras S., Triantafyllou A.G., Deligiorgi D., 2006. Atmospheric stability and PM10
519 concentrations at far distance from elevated point sources in complex terrain: worst case
520 episode study. Journal of Environmental Management 80, 295–302.
521 https://doi.org/10.1016/j.jenvman.2005.09.010.
Highlights

• ANN models are superior compared to MLR for air pollution spatial forecasting
• The air quality monitoring network density affects the ANN predicting ability
• ANN models incorporate the most significant spatial variability features
• The critical high-end O3 concentrations are well represented by the ANNs
• ANNs could be used operationally for modeling air pollution spatial variability
