
ARTICLE IN PRESS

Engineering Applications of Artificial Intelligence 20 (2007) 745–755


www.elsevier.com/locate/engappai

Forecasting of the daily meteorological pollution using wavelets and support vector machine
Stanislaw Osowski a,b,*, Konrad Garanty a
a Warsaw University of Technology, Warsaw, Poland
b Military University of Technology, Warsaw, Poland
Received 14 July 2006; received in revised form 23 October 2006; accepted 27 October 2006
Available online 29 December 2006

Abstract

The paper presents the method of daily air pollution forecasting by using support vector machine (SVM) and wavelet decomposition.
Based on the observed data of NO2, CO, SO2 and dust, for the past years and actual meteorological parameters, like wind, temperature,
humidity and pressure, we propose the forecasting approach, applying the neural network of SVM type, working in the regression mode.
To obtain the acceptable accuracy of forecast we decompose the measured time series data into wavelet representation and predict the
wavelet coefficients. On the basis of these predicted values the final forecasting is prepared. The paper presents the results of numerical
experiments on the basis of the measurements made by the meteorological stations, situated in the northern region of Poland.
© 2006 Elsevier Ltd. All rights reserved.

Keywords: Pollution forecasting; Support vector machine; Wavelet decomposition; Neural network predictors; Generalization ability

1. Introduction

The information on meteorological pollution, such as CO, NO2, SO2 and dust, is increasingly important due to the harmful effects of these pollutants on human health (Comrie and Diem, 1999). It is especially true in the urban environment of every country. The automatic measurements of the concentration of these pollutants provide the instant registration, on the basis of which the averaged values are usually calculated. The important problem is the early prediction of harmful pollution, in order to inform or alarm the local inhabitants of the incoming danger.

The aim of this research work is to construct a forecasting model for the averaged next-day pollution that would be applicable for use by the authority responsible for air pollution regulation in the appropriate region of the country. Artificial neural networks (Haykin, 1999) of the multilayer perceptron (MLP) type have frequently been used as the model of pollution in recent years (Comrie and Diem, 1999; Boznar et al., 2004; Hooyberghs et al., 2005; Kukkonen et al., 2003; Bianchini et al., 1989). We propose here a forecasting system based on the support vector machine (SVM) and the wavelet decomposition of the time series, formed on the basis of the signals measured in the previous days. The prediction system makes use of such meteorological quantities as the speed and direction of the wind, temperature, humidity, air pressure, season of the year, type of the day and also the daily pollution measured in the previous days. We build the SVM networks for the prediction of each considered pollutant (CO, NO2, SO2 and dust). The forecasting system based directly on these measurements was not able to provide an acceptable accuracy of prediction. To solve the problem we decompose the measured signals into wavelets (Mallat, 1989; Daubechies, 1988). The prediction is performed for the wavelet coefficients (the detailed coefficients up to some level and the approximated coarse signal corresponding to the last level) in the original resolution. On the basis of these predicted values the real value of the forecasted pollution for all considered pollutants is reconstructed by simply summing up the predicted decomposition signals.

* Corresponding author. Warsaw University of Technology, Warsaw, Poland.
E-mail address: sto@iem.pw.edu.pl (S. Osowski).

0952-1976/$ - see front matter © 2006 Elsevier Ltd. All rights reserved.
doi:10.1016/j.engappai.2006.10.008

The obtained system, based on the application of SVM networks and wavelet decomposition, shows very good generalization ability. Trained on one type of pollutant (NO2), it is able to predict the other pollutant concentrations (CO, SO2, dust) with satisfactory accuracy. The numerical experiments of forecasting have been performed for the northern region of Poland (the meteorological stations near Gdansk). The results of the experiments have confirmed good accuracy of daily prediction for all considered pollutants. These detailed results will be presented and discussed in the paper.

2. Problem analysis

The problem of monitoring and early warning of alarming values of the pollution in a region is an important task of any agency monitoring the quality of the environment. The typical values measured by the meteorological stations include the concentrations of SO2, NO2, CO, dust, etc. The network of measuring stations is owned and maintained by the environmental agency of the state and the measurements are performed according to the regulations of this agency. Typically the instantaneous values are registered and on the basis of them the appropriate averaged values are determined. The daily averaged values are of practical use, since they represent some long-term tendencies of the pollution development. For meteorologists they are a satisfactory tool for observing trends of pollution; they make it possible to warn the local population of alarming values, as well as to undertake some preventive actions for the future.

Fig. 1 presents the daily averaged time series of NO2 concentration for the whole year 2003 measured by one of the meteorological stations of the northern region of Poland. The concentration of the pollutant varies widely from day to day. At the mean value of 15.53 μg/m³ the standard deviation of the measured values was equal to 7.99 μg/m³. The large value of the standard deviation confirms the difficulty of the forecasting problem.

Fig. 1. The real time series of the daily averaged NO2 concentration for the year 2003 for one chosen station in the northern region of Poland.

The meteorological stations also measure some additional parameters such as the temperature, direction and strength of the wind, humidity, pressure, etc. These parameters are associated with the pollution, although this association is of a rather complex nature. Fig. 2 presents the exemplary relationships between the temperature and dust (Fig. 2a), and the temperature and NO2 concentration (Fig. 2b). As can be seen, the distribution of the measured points is far from linear and represents a complex relation. Plotting the distribution of other pollutants versus the temperature, wind, humidity or pressure reveals no clear relationship either. It means that the prediction of these quantities needs the application of a highly complicated nonlinear model of these dependencies.

Fig. 2. The measured dependence between the temperature and the concentration of (a) dust and (b) NO2 for one chosen station in the northern region of Poland.

A somewhat better correlation is observed among the concentrations of different pollutants. Fig. 3 presents the measured dependencies between the concentration of SO2 and NO2 (Fig. 3a) and between NO2 and dust (Fig. 3b). These correlations are closer to linear, although even here the scattering of points is wide. The standard deviations of the distance between the real distribution of points and their linear approximation are equal to 1.92 μg/m³ for the relationship between NO2 and SO2 (Fig. 3a) and 1.98 μg/m³ for the relationship between NO2 and dust (Fig. 3b).

Fig. 3. The measured relationship between the concentrations of NO2 and SO2 (a) and NO2 and dust (b) for one chosen station in the northern region of Poland.

The important conclusion from these results is that there are some similarities among the mechanisms of spreading different pollutants. The observed increase of NO2 pollution results in the increase of the concentration of the other types of pollutants, although the observed relationship is not a linear one. The interesting challenge is to construct and train the model of one kind of pollution and to use this model to predict the data corresponding to the other pollutants.

3. The tools and methods of pollution prediction

In our work we apply the SVM with the Gaussian kernel, working in the regression mode (Vapnik, 1998; Schölkopf and Smola, 2002), as the model of pollution. This choice was made after trying other neural-type solutions, like the MLP and the neuro-fuzzy structure (Haykin, 1999; Jang et al., 1997). The main advantage of the SVM over the MLP or neuro-fuzzy network is its good generalization ability, acquired at a relatively small number of learning data and at a large number of input nodes (high dimensional problem). Due to the very specific problem formulation the learning task is simplified to the solution of a quadratic problem with a single minimum point (global minimum). To obtain better accuracy of prediction we have also applied the wavelet decomposition (Mallat, 1989; Daubechies, 1988) of the time series of the measured concentrations of pollutants corresponding to the whole year. Instead of predicting the total concentration of the particular pollutant we build a few SVM networks responsible for prediction of the wavelet coefficients on different levels, forming the partial representations of the particular pollutant concentration. Reconstruction of the final concentration of the pollutant is done on the basis of the predicted wavelet coefficients using the simple operation of summing.

3.1. Support vector machine for regression

SVM, the solution of the universal feedforward networks, is known as an excellent tool for classification and regression problems, with good generalization ability (Vapnik, 1998; Schölkopf and Smola, 2002). In distinction to the classical neural networks, the formulation of the learning problem of SVM leads to quadratic programming with linear constraints.

The SVM is a linear machine of one output $y(\mathbf{x})$, working in the high dimensional feature space formed by the nonlinear mapping of the N-dimensional input vector $\mathbf{x}$ into a K-dimensional feature space ($K > N$) through the use of the nonlinear function $\varphi(\mathbf{x})$. The number of hidden units (K) is equal to the number of so-called support vectors, that are the learning data points closest to the separating hyperplane. The learning task is transformed to the minimization of the error function, while keeping the weights of the network at minimum. The error function is defined through the so-called ε-insensitive loss function $L_\varepsilon(d, y(\mathbf{x}))$ (Vapnik, 1998):

$$L_\varepsilon(d, y(\mathbf{x})) = \begin{cases} |d - y(\mathbf{x})| - \varepsilon & \text{for } |d - y(\mathbf{x})| \ge \varepsilon, \\ 0 & \text{for } |d - y(\mathbf{x})| < \varepsilon, \end{cases} \tag{1}$$

where $\varepsilon$ is the assumed accuracy, d is the destination, $\mathbf{x}$ the input vector and $y(\mathbf{x})$ the actual output of the network under excitation of $\mathbf{x}$. The actual output signal of the SVM network is defined by

$$y(\mathbf{x}) = \sum_{j=1}^{K} w_j \varphi_j(\mathbf{x}) + b = \mathbf{w}^T \boldsymbol{\varphi}(\mathbf{x}) + b, \tag{2}$$

where $\mathbf{w} = [w_1, \ldots, w_K]^T$ is the weight vector, b the bias and $\boldsymbol{\varphi}(\mathbf{x}) = [\varphi_1(\mathbf{x}), \ldots, \varphi_K(\mathbf{x})]^T$ the basis function vector.

The so defined optimization problem is solved by the introduction of the Lagrangian function and the Lagrange multipliers $\alpha_i$, $\alpha'_i$ ($i = 1, 2, \ldots, p$) responsible for the functional constraints defined by (1). The minimization of the Lagrangian function has been transformed to the so-called dual problem (Vapnik, 1998; Platt, 1998):

$$\max \left\{ \sum_{i=1}^{p} d_i (\alpha_i - \alpha'_i) - \varepsilon \sum_{i=1}^{p} (\alpha_i + \alpha'_i) - \frac{1}{2} \sum_{i=1}^{p} \sum_{j=1}^{p} (\alpha_i - \alpha'_i)(\alpha_j - \alpha'_j) K(\mathbf{x}_i, \mathbf{x}_j) \right\} \tag{3}$$

at the constraints

$$\sum_{i=1}^{p} (\alpha_i - \alpha'_i) = 0, \qquad 0 \le \alpha_i \le C, \quad 0 \le \alpha'_i \le C, \tag{4}$$

where $K(\mathbf{x}_i, \mathbf{x}_j) = \boldsymbol{\varphi}^T(\mathbf{x}_i)\boldsymbol{\varphi}(\mathbf{x}_j)$ is an inner-product kernel defined in accordance with Mercer's theorem (Vapnik, 1998) for the learning data set $\mathbf{x}$. After solving the dual problem all weights are expressed through the $N_{sv}$ nonzero Lagrange multipliers $\alpha_i$, $\alpha'_i$ and the same number of learning vectors $\mathbf{x}_i$ associated with them. The network output signal $y(\mathbf{x})$ can then be expressed in the form:

$$y(\mathbf{x}) = \sum_{i=1}^{N_{sv}} (\alpha_i - \alpha'_i) K(\mathbf{x}, \mathbf{x}_i) + b. \tag{5}$$

The best known kernel functions used in practice are radial (Gaussian), polynomial, spline or even sigmoidal functions (Vapnik, 1998; Schölkopf and Smola, 2002).

The most important is the choice of the coefficients $\varepsilon$ and C. The constant $\varepsilon$ determines the margin within which the error is neglected. The smaller its value, the higher the accuracy of learning that is required, and the more support vectors will be found by the algorithm. The regularization constant C is the weight determining the balance between the complexity of the network, characterized by the weight vector $\mathbf{w}$, and the error of approximation, measured by the slack variables and the value of $\varepsilon$. For normalized input signals the value of $\varepsilon$ is usually adjusted in the range from $10^{-3}$ to $10^{-2}$, and C is much bigger than 1.

3.2. Wavelet decomposition of signals

The discrete wavelet transform belongs to the multiresolution analysis (Mallat, 1989; Daubechies, 1988). It is a linear transformation with a special property of time and frequency localization at the same time. It decomposes the given signal series onto a set of basis functions of different frequencies, shifted with respect to each other and called wavelets. Unlike the discrete Fourier transform, the discrete wavelet transform is not a single object. In reality, it hides a whole family of transformations. The individual members of the family are determined by the choice of the so-called mother wavelet function. The goal of the discrete wavelet transform is to decompose an arbitrary signal f(t) into a finite summation of wavelets at different scales (levels) according to the expansion

$$f(t) = \sum_{j} \sum_{k} c_{jk}\, \psi(2^j t - k), \tag{6}$$

where $c_{jk}$ is a new set of coefficients and $\psi(2^j t - k)$ is the wavelet of the jth level (scale) shifted by k samples. The set of wavelets of different scales and shifts can be generated from a single prototype wavelet, called the mother wavelet, by dilations and shifts. What makes the wavelet bases interesting is their self-similarity: every function in a wavelet basis is a dilated and shifted version of one (or possibly a few) mother functions. In practice the most often used are the orthogonal or bi-orthogonal wavelets, for which the set of wavelets forms an orthogonal or bi-orthogonal base (Mallat, 1989; Daubechies, 1988).

Let us denote the discrete form of the original signal vector by $\mathbf{f}$ and by $A_j \mathbf{f}$ the operator that computes the approximation of $\mathbf{f}$ at resolution $2^j$. Let $D_j \mathbf{f}$ denote the detailed signal, $D_j \mathbf{f} = A_{j+1}\mathbf{f} - A_j \mathbf{f}$, at the resolution $2^j$. It was shown by Mallat (1989) that both operations $A_j \mathbf{f}$ and $D_j \mathbf{f}$ can be interpreted as the convolution of the signal of the previous resolution and the finite impulse response of the quadrature mirror filters: the high pass ($\tilde{G}$) of coefficients $\tilde{g}$ and the low pass ($\tilde{H}$) of coefficients $\tilde{h}$:

$$A_j \mathbf{f} = \sum_{k=-\infty}^{\infty} \tilde{h}(2n - k)\, A_{j+1}\mathbf{f}(2n), \tag{7}$$

$$D_j \mathbf{f} = \sum_{k=-\infty}^{\infty} \tilde{g}(2n - k)\, A_{j+1}\mathbf{f}(2n). \tag{8}$$
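One step of the filtering in Eqs. (7) and (8) can be illustrated with the shortest orthogonal wavelet. The sketch below is an assumption made for brevity: it uses the two-tap Haar filters instead of the Db8 filters applied in the paper, and the function names are hypothetical. It performs a single analysis step (low-pass and high-pass filtering with downsampling by two) and the matching synthesis step that restores the input.

```python
import math

_S = 1.0 / math.sqrt(2.0)  # Haar filter tap (assumed wavelet, not Db8)


def haar_analysis_step(signal):
    """Split a signal into a coarser approximation and a detail signal.

    Each output has half the length of the input, i.e. the filtering is
    followed by downsampling by two.
    """
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    pairs = [(signal[i], signal[i + 1]) for i in range(0, len(signal), 2)]
    approx = [_S * (a + b) for a, b in pairs]   # low-pass branch
    detail = [_S * (a - b) for a, b in pairs]   # high-pass branch
    return approx, detail


def haar_synthesis_step(approx, detail):
    """Invert haar_analysis_step and return the signal at the finer resolution."""
    signal = []
    for a, d in zip(approx, detail):
        signal.append(_S * (a + d))
        signal.append(_S * (a - d))
    return signal


# Made-up sample values standing in for daily concentrations.
samples = [4.0, 6.0, 10.0, 12.0]
approx, detail = haar_analysis_step(samples)
restored = haar_synthesis_step(approx, detail)
```

Applying the analysis step again to `approx` gives the next, coarser level; this repetition is what forms the pyramid scheme. With longer filters such as Db8 the same structure holds, but boundary handling has to be added.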

These operations, performed for values of j from 1 to J, deliver the coefficients of the decomposition at different levels (scales) and different resolutions of the original vector $\mathbf{f}$ and form the analysis of the signal. The most often used discrete wavelet analysis scheme uses the Mallat pyramid algorithm (Mallat, 1989).

As a result of such a transformation we get the set of coefficients representing the detailed signals $D_j$ at different levels j (j = 1, 2, ..., J), and the residue signal $A_J \mathbf{f}$ at the level J. All of them are of different resolutions, appropriate to the level. The coefficients of $D_j \mathbf{f}$ can be interpreted as the high frequency details that distinguish the approximation of $\mathbf{f}$ at two subsequent levels of resolution. On the other hand, the signal $A_J \mathbf{f}$ represents the coarse approximation of the vector $\mathbf{f}$.

The next step is the transformation of the detailed signals $D_j \mathbf{f}$ (j = 1, 2, ..., J) and the coarse approximation signal $A_J \mathbf{f}$ into the original resolution. It is done by using special filters G and H associated with the analysis filters $\tilde{G}$ and $\tilde{H}$ by the quadrature and reflection relationships (Mallat, 1989). This is the so-called reverse Mallat pyramid algorithm, forming the reconstruction of the original signal. As a result we get the decomposed signals of each level in the original resolution. The recovery of the original signal f(n) at each time instant n is then performed by simply adding the appropriate wavelet coefficients and the coarse approximation. At J-level decomposition we have

$$f(n) = D_1(n) + D_2(n) + \cdots + D_J(n) + A_J(n). \tag{9}$$

Fig. 4 presents the results of the 5-level wavelet decomposition of the real data of NO2 concentrations (see Fig. 1) for the whole year 2003, obtained by using the Matlab (1997) platform. The Daubechies wavelets Db8 have been applied in the decomposition. All signals (the first five levels of wavelet coefficients from D1 to D5 and the coarse approximation A5 on the fifth level) are illustrated in the original resolution. We observe a substantial difference in the variability of the signals at different levels. The higher the wavelet level, the lower the variation of the coefficients and the easier their prediction.

Fig. 4. The wavelet decomposition of the measured time series corresponding to NO2 concentration of the year 2003; D1 to D5 represent the detailed coefficients and A5 the coarse approximation of the time series on the fifth level.

Our main idea is to substitute the prediction task of the original time series of high variability by the prediction of its wavelet coefficients on different levels of lower variability, and then to use Eq. (9) for the final forecasting of the pollution at any time point n. Since most of the wavelet coefficients are of lower variability, we expect an increase of the total prediction accuracy.

The important point in designing the prediction system is the decision on the optimal value of J. At higher J the variability of a larger number of predicted signals is lower, so their prediction is easier and hence the expected accuracy higher. However, at too high a number of levels the total error associated with the increased number of terms under prediction begins to dominate and as a result the total accuracy deteriorates. In our solution, we have determined the value of J on the basis of the standard deviation of the approximated signals $A_J \mathbf{f}$. We stop the decomposition on the level for which the standard deviation of the approximated signal is substantially smaller than that of the original signal. In practice, the stopping condition has been expressed in the empirical form:

$$\frac{\mathrm{std}(A_J \mathbf{f})}{\mathrm{std}(\mathbf{f})} < 0.1. \tag{10}$$

For the data distribution presented in Fig. 4 the value J = 5 was appropriate, since the ratio $\mathrm{std}(A_5 \mathbf{f})/\mathrm{std}(\mathbf{f}) = 0.067$ satisfies relation (10).

3.3. The prediction method

The forecasting of the pollution is done for the next day on the basis of the information on the past pollution data from the last few days and the measured or predicted meteorological parameters, such as the speed and direction of the wind, the temperature, humidity, pressure, and the actual information on the month (from 1 to 12) as well as the type of the day within the week (from 1 to 7).

The meteorological parameters listed above are well-known factors influencing the pollution (Comrie and Diem, 1999; Boznar et al., 2004; Hooyberghs et al., 2005; Kukkonen et al., 2003). All of them are available from the actual measurements performed by the meteorological stations. The type of the day influences the pollution in an evident way, since the industrial activity (the important source of pollution) is concentrated mainly in the working days. The season of the year is also strictly connected with the state of the environment due to the enlarged demand for power in the winter or the reduced activity of industry in the summer holiday.

The consideration of the past pollution is justified by the continuity of the process of formation of the actual pollution. The significant question is how many past days should be taken into account. To solve this problem we have applied the correlation analysis of the actual pollution with the pollution of a few past days. Table 1 presents the exemplary results of the correlation analysis for CO, NO2, SO2 and dust for the period of one year in the form of the correlation coefficient for the data measured by one chosen meteorological station situated in northern Poland. The actual day (d) presented in the table was Saturday and the previous days were Friday (d−1), Thursday (d−2), Wednesday (d−3) and Tuesday (d−4).

Table 1
The correlation coefficient values of the actual concentration of CO, NO2, SO2 and dust with the past four days for one chosen meteorological station

Previous days   d−1    d−2    d−3    d−4
CO              0.97   0.90   0.79   0.64
NO2             0.95   0.82   0.76   0.69
SO2             0.95   0.83   0.80   0.75
Dust            0.94   0.82   0.78   0.77

We have analyzed many examples of such dependencies for different days of the week and noticed significant correlations for at most two neighboring days. At the number of days exceeding two the correlations were usually either weak or changing significantly (from small to medium) for different days. So only the data of the past two days have been used in the prediction model.

All data have been normalized linearly to the range from 0 to 1 by simply dividing the real value by the maximum one of the appropriate set. For the prediction of the wavelet coefficients on different levels for the next day we have applied the SVM approach. After analysis of the variability of the coarse approximation data at different decomposition levels (see Eq. (7)) we have decided to apply five levels (J = 5). Prediction of the wavelet coefficients on each level requires one SVM network. An additional one is needed for prediction of the coarse approximation of the data. It means the application of six SVM networks altogether. Fig. 5 presents the general SVM structure used for prediction of the wavelet coefficients $D_{pi}(d+1)$ for i = 1, 2, ..., 5 of the particular pollutant p on the ith level for the next (d+1) day. An identical structure is used for the prediction of the coarse approximation $A_{p5}$ of the pollutant p (CO, NO2, SO2 and dust).

Fig. 5. The structure of the SVM network used for prediction of any detailed coefficient $D_{pi}$ and the coarse approximation $A_{pi}$ for the pth pollutant.

The input data of the network is formed by five meteorological parameters (the speed of the wind, direction of the wind, temperature, humidity, pressure), the succeeding number of the month and the week day, and also two past values of the forecasted quantities corresponding to the two last days. To provide the appropriate representation of the wind, we have applied its strength and direction combined together in the form of the x and y components (rectangular system) of its speed vector (two nodes in the representation of $\mathbf{x}$). It makes together nine inputs and the same number of normalized input signals. All data used in learning have been transformed to the original resolution on each level using reverse Mallat filtering. The learning vectors $\mathbf{x}$ were formed from the components of this data base of the years 2001 and 2002 and the corresponding meteorological parameters for these days. The destination was associated with the predicted value of the appropriate wavelet coefficient for the next day.

Two kinds of experiments have been performed. In the first one, the SVM networks were specialized for prediction of the wavelet coefficients on the appropriate decomposition levels for the next day for each pollutant separately. With five decomposition levels plus the coarse approximation (six networks per pollutant) and four pollutants, it means training 24 independent SVM networks. After the phase of learning all parameters of the SVM networks were frozen and the networks were tested on the data of the same kind of pollutant (not used in the learning phase).

In the second type of experiments, we have trained the SVM networks on the data related to one pollutant only (for example NO2) and tested the trained networks on the data of all pollutants (NO2, CO, SO2 and dust). It means training only six SVM networks. Assuming that the mechanisms of creating different types of pollution are the same, the generalization ability of the trained SVM networks should be sufficient for predicting the concentrations of the considered pollutants.

In the testing mode we supply the appropriate values forming the vector $\mathbf{x}$ to the trained SVM networks, which predict the wavelet coefficients of all five levels and the coarse approximation value (all in the normal resolution) for the next day. On the basis of these predicted coefficients the real prediction of the concentration of the particular pollutant for the next day is made by simply adding them, as is shown by Eq. (9).

4. The results of numerical experiments

The numerical experiments of predicting the next day average pollution have been performed for the network of seven meteorological stations in the northern region of Poland near Gdansk, belonging to the ARMAAG foundation. Fig. 6 presents the geographical location of these stations.

Fig. 6. The geographical location of the meteorological stations in the northern region of Poland.

Some of the stations are situated in different points of the city of Gdansk and two outside the real center. The data taking part in learning and testing have been collected within three years, from 2001 to 2003. Part of them (the years 2001 and 2002) has been used for learning and the other part (the year 2003) for testing only.

The results of learning and testing have been assessed on the basis of the mean absolute error and the average relative error of the real concentrations and their estimated values produced by our predicting system for the whole year 2003. We denote by $\mathbf{d}$ and $\mathbf{y}$ the 365-component vectors (one component per day of the year) of the average daily concentration of the particular pollutant (CO, NO2, SO2 or dust): $\mathbf{d}$ is the destination vector (the real measurements) and $\mathbf{y}$ is the vector actually generated by our SVM system. We have defined two kinds of errors.

• The mean absolute error

$$\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} |d_i - y_i|, \tag{11}$$

where N is the number of days under prediction.

• The relative (normalized) error

$$\varepsilon = \frac{\|\mathbf{d} - \mathbf{y}\|}{\|\mathbf{d}\|}. \tag{12}$$

The main experiments have been performed by using the SVM predictor. For comparison purposes some experiments have been repeated with the classical MLP.

The introductory experiments have been performed for prediction of the whole pollutant without any previous decomposition. However, the results were not encouraging. Although the learning was possible with an acceptable error, the testing error was far too large (lack of generalization ability). This was the main reason why we have proposed the additional step of decomposing the data into the wavelets, and used the SVM networks for predicting the wavelet coefficients of each decomposition level. Five-level decompositions with Daubechies wavelets Db8 have been applied. In the prediction step, instead of one SVM network to predict the particular pollutant concentration, we have to use six networks responsible for predicting five levels of wavelet coefficients and the coarse approximation signal. On the basis of these predicted values the whole pollutant concentrations for the next day have been reconstructed by applying Eq. (9). In all experiments, we have used the general scheme of the SVM network shown in Fig. 5 with the Gaussian kernel $K(\mathbf{x}, \mathbf{x}_i) = \exp(-\|\mathbf{x} - \mathbf{x}_i\|^2 / 2\sigma^2)$. For the normalized data samples we have used $\sigma = 1$ and the tolerance value $\varepsilon = 0.01$. The regularization constant C was chosen after a series of experiments using the standard validation approach (C = 100).

The most important observation from these experiments is that we have obtained very good generalization abilities of the SVM networks trained on the data related to NO2 only. The networks trained on one type of pollution data were able to predict the concentration of the other pollutants (SO2, CO, dust) with practically the same accuracy as the specialized networks, trained for each pollutant separately. It confirms the fact that the mechanism of creating different types of pollution is similar and the neural networks are able to learn such a mechanism.

All presented numerical results will be related to this generalization ability of SVM networks. We limit the presentation of the results to the testing data only, not taking part in learning (the data of the year 2003).

Fig. 7(a–d) depicts the exemplary results of prediction of CO, dust, SO2 and NO2 concentrations corresponding to one chosen station, compared to the real measured values. The upper plots present both patterns (the real and the predicted), while the lower ones show the distribution of the prediction errors. The predictions coincide very well with the real measured data. The mean of the prediction errors is close to zero (0.19 μg/m³ for dust, 1.2 μg/m³ for CO, 0.094 μg/m³ for NO2 and 0.023 μg/m³ for SO2). The mean of the absolute errors is 4.88 μg/m³ for the dust, 27.77 μg/m³ for CO, 1.59 μg/m³ for NO2 and 1.13 μg/m³ for SO2. The standard deviation of the errors is equal to 5.97 μg/m³ for the dust, 49.7 μg/m³ for CO, 2.27 μg/m³ for NO2 and 1.97 μg/m³ for SO2. If we compare these values to the real concentrations of the dust (the average level of 70 μg/m³), CO (the average level of 500 μg/m³), NO2 (the average level of 16 μg/m³) and SO2 (the average level of 18 μg/m³), it is seen that the relative errors for all pollutants are close to each other.

Fig. 7. The exemplary results of daily prediction of the time series of CO (a), dust (b), SO2 (c) and NO2 (d) for one chosen meteorological station (the upper plots: the real and predicted concentrations; the lower ones: the prediction errors).

Table 2 presents the mean absolute error of prediction of CO, NO2, SO2 and dust for all seven stations. Comparing these values with the average levels of the real concentration of the pollutants shows that their relative values are small and stay on a similar level for all pollutants.

Table 3 presents the details of the average relative prediction errors for all considered pollutants at seven stations. All of them have been obtained using SVM networks trained on the data of NO2 only.
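The two error measures of Eqs. (11) and (12), on which the tabulated results rely, can be computed as in the short sketch below. The Euclidean norm in the relative error is an assumption, since the text does not name the norm, and the sample values are made up for illustration; they are not measurements from the paper.

```python
import math


def mean_absolute_error(d, y):
    """Eq. (11): average of |d_i - y_i| over the N predicted days."""
    if len(d) != len(y) or not d:
        raise ValueError("d and y must be non-empty and of equal length")
    return sum(abs(di - yi) for di, yi in zip(d, y)) / len(d)


def relative_error(d, y):
    """Eq. (12): ||d - y|| / ||d||, here with the Euclidean norm (assumed)."""
    if len(d) != len(y) or not d:
        raise ValueError("d and y must be non-empty and of equal length")
    num = math.sqrt(sum((di - yi) ** 2 for di, yi in zip(d, y)))
    den = math.sqrt(sum(di ** 2 for di in d))
    return num / den


# Made-up daily concentrations: destination d and prediction y.
d_demo = [10.0, 20.0, 30.0]
y_demo = [12.0, 18.0, 33.0]
mae_demo = mean_absolute_error(d_demo, y_demo)
rel_demo = relative_error(d_demo, y_demo)
```

The MAE keeps the units of the concentration, so it is comparable only within one pollutant, while the relative error is dimensionless and can be compared across pollutants with very different average levels.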

Table 2
The mean absolute error MAE of daily predictions of all pollutants for seven stations in the year 2003 obtained by using SVM

Pollutant   CO (μg/m³)   NO2 (μg/m³)   SO2 (μg/m³)   Dust (μg/m³)
Station 1   31.26        2.11          0.94          5.86
Station 2   27.77        1.59          1.13          4.88
Station 3   22.94        2.06          0.99          3.17
Station 4   31.96        1.78          1.40          3.59
Station 5   36.54        3.41          0.98          5.96
Station 6   52.66        2.62          1.45          4.62
Station 7   30.56        2.16          1.41          3.83

Table 3
The average relative errors $\varepsilon_{SVM}$ of the daily predictions of all pollutants for seven stations in the year 2003 obtained by using SVM

Pollutant                  CO (%)   NO2 (%)   SO2 (%)   Dust (%)
Station 1                  13.37    15.22     18.19     21.53
Station 2                  10.10    14.10     18.14     18.66
Station 3                   7.15    15.18     18.16     13.71
Station 4                  11.37    13.00     16.55     13.70
Station 5                  10.47    13.97     16.34     13.99
Station 6                  14.03    17.41     17.79     13.11
Station 7                   9.76    14.39     15.83     14.16
Average for all stations   10.86    14.74     17.28     15.55

To assess the quality of the SVM predictor we have compared it with the solution based on the MLP. Table 4 presents the comparison of the average relative errors of the daily NO2 forecasting in the year 2003 for seven stations by applying the SVM and MLP approaches. The average error $\varepsilon_{SVM}$ corresponds to the SVM and $\varepsilon_{MLP}$ to the MLP approach. There is a significant difference between the results. The average error of the SVM approach (14.74%) is much lower than that obtained with the MLP (21.71%). Evidently the MLP has much worse generalization ability. Observe that the same data have been used for learning and testing both networks. The optimal structure of the MLP was found to be 9-17-1 (188 weights). With the available 730 training data pairs (two years used in learning) the number of data points was not sufficient; as a result the acquired generalization ability of the MLP was not good enough, and this was the main reason for its inferior testing performance. On the other hand, the relatively small number of learning pairs was not a problem for the SVM, since this network is relatively insensitive to the limited number of learning data.

Fig. 8 presents the details of the performance of both predictors of NO2 for 13 succeeding days compared to the real values (destination). It is evident that the results of the SVM are much closer to the destination (solid line) than those generated by the MLP.
The interesting is also the comparison of the proposed
approach with the other recent solutions presented in the
works devoted to the prediction of pollution. Unfortu-
Table 4 nately the data under prediction are different in each case,
The average relative errors of prediction of the daily NO2 concentration since no standard base is available in internet. The paper
for seven different stations in the year 2003 obtained by SVM and MLP (Peace et al., 2005) has presented the results of predicting
predictors
the carbon concentration in the north–west of Italy using
Stations 1 2 3 4 5 6 7 the site-optimized semi-empirical air pollution model. The
MAE error given for each season of the year was changing
eMLP (%) 21.03 22.06 22.64 18.55 19.80 27.01 20.97
eSVM (%) 15.22 14.10 15.18 13.00 13.97 17.41 14.39
from 127 [mg/m3] to 510 [mg/m3], while the average level of

The presented numerical results confirm that the


accuracy of prediction of all pollutants stay on a similar
level (the average relative error for all stations—below or
close to 15%) although the testing data were related to the
pollutants not used in learning. This is very interesting
phenomenon of the proposed system of prediction,
following from the principle of operation of the neural
networks. In the stage of learning by using training data,
the SVM predictor was learned the mechanism of creation
of the pollution and not the data points themselves!
Satisfactory results of testing the trained network on
different pollutants simply show that the mechanism of
formation of different pollutions are really similar. Good
generalization ability of SVM acquired in the learning
stage was sufficient to obtain the satisfactory results of
forecasting for all pollutants. The obtained accuracy of
prediction makes this method practical for the one day Fig. 8. The details of tracing the destination values of NO2 concentration
ahead prediction. (solid line) by SVM (dash line) and MLP (dash-dot line) predictors.
pollution under observation was around 1000 [mg/m3] (the normalized
relative error from 12.7% to 51%). In our case the average MAE of
prediction of CO concentration for all seven stations over the whole
year was 33.38 [mg/m3] at an average pollution level of approximately
500 [mg/m3] (a normalized relative error of around 7%).

The paper (Bianchini et al., 2006) has proposed and compared two methods
of predicting NO2 concentration: the AutoRegressive eXogenous (ARX)
model and a cyclostationary neural network (CNN) model applying the MLP
network. The results have been obtained for the data gathered by the
agency ARPA, Lombardia in northern Italy. The MAE computed for 12 months
was almost 5 [mg/m3] (the CNN model) and 20 [mg/m3] (the ARX model). The
peak errors of prediction were 35 [mg/m3] (CNN) and 60 [mg/m3] (ARX),
respectively. Our results of NO2 prediction were as follows:
2.24 [mg/m3] (MAE) and 8 [mg/m3] (the peak error). This comparison shows
the significant advantage of applying the wavelet decomposition and the
SVM predictors proposed in this paper.

5. Conclusions

The paper has presented a method of predicting the daily atmospheric
pollution by applying the support vector machine and wavelet
decomposition. The important point of this approach is the decomposition
of the daily data into wavelets and the individual prediction of the
wavelet coefficients at different levels. Application of SVM instead of
the classical MLP has made it possible to obtain much better accuracy of
prediction of the wavelet coefficients and, as a result, also of the
whole pollutant concentration.

The proposed approach has been tested on the data of the meteorological
stations situated in northern Poland. The obtained results of prediction
are in good agreement with the actual measurements made at these
stations, irrespective of the type of pollutant. The important
observation from these experiments is the very good generalization
ability of the applied predicting system. The SVM networks trained on
the data related to only one pollutant (NO2), measured at one chosen
station, were capable of producing good quality predictions for the
other types of pollutants (CO, SO2, dust) at all stations.

References

Bianchini, M., Di Iorio, E., Maggini, M., Mocenni, C., Pucci, A., 2006. A
cyclostationary neural network model for the prediction of the NO2
concentration. Proceedings of ESANN, Bruges, 2006.
Boznar, M., Mlakar, P., Grasic, B., 2004. Neural networks based ozone
forecasting. Ninth Conference on Harmonisation within Atmospheric
Dispersion Modelling for Regulatory Purposes, Garmisch Partenkirchen,
2004, pp. 356–360.
Comrie, A.C., Diem, J.E., 1999. Climatology and forecast modelling of
ambient carbon monoxide in Phoenix. Atmospheric Environment 33,
5023–5036.
Daubechies, I., 1988. Ten Lectures on Wavelets. SIAM Press,
Philadelphia, PA.
Haykin, S., 1999. Neural Networks. Comprehensive Foundation.
Prentice-Hall, New Jersey.
Hooyberghs, J., Mensink, C., Dumont, D., Fierens, F., Brasseur, O., 2005.
A neural network forecast for daily average PM10 concentrations in
Belgium. Atmospheric Environment 39/18, 3279–3289.
Jang, J.S., Sun, C.T., Mizutani, E., 1997. Neuro-fuzzy and Soft
Computing. Prentice-Hall, New York.
Kukkonen, J., Partanen, L., Karpinen, A., Ruuskanen, J., Junninen, H.,
Kolehmainen, M., Niska, H., Dorling, S., Chatterton, T., Foxall, R.,
Cawley, G., 2003. Extensive evaluation of neural networks models for
the prediction of NO2 and PM10 concentrations, compared with a
deterministic modeling system and measurements in central Helsinki.
Atmospheric Environment 37, 4539–4550.
Mallat, S., 1989. A theory for multiresolution signal decomposition: the
wavelet representation. IEEE Transactions of the Pattern Analysis and
Machine Intelligence 11, 674–693.
Peace, M., Dirks, K., Austin, G., 2005. The prediction of air pollution
using a site optimised model and mesoscale model wind forecast.
World Weather Research Program Symposium on Nowcasting and
Very Short Range Forecasting, Toulouse, 2005.
Platt, L., 1998. Fast training of SVM using sequential optimization. In:
Scholkopf, B., Burges, B., Smola, A. (Eds.), Advances in Kernel
Methods: Support Vector Learning. MIT Press, Cambridge,
pp. 185–208.
Schölkopf, B., Smola, A., 2002. Learning with Kernels. MIT Press,
Cambridge, MA.
Vapnik, V., 1998. Statistical Learning Theory. Wiley, New York.
Wavelet Toolbox for Matlab, 1997. User manual, MathWorks, Natick,
USA, 1997.
