Yee Leung, Yu Zhou, Ka-Yu Lam, Tung Fung, Kwan-Yau Cheung, Taehong Kim
& Hanmin Jung
To cite this article: Yee Leung, Yu Zhou, Ka-Yu Lam, Tung Fung, Kwan-Yau Cheung, Taehong
Kim & Hanmin Jung (2019) Integration of air pollution data collected by mobile sensors and ground-
based stations to derive a spatiotemporal air pollution profile of a city, International Journal of
Geographical Information Science, 33:11, 2218-2240, DOI: 10.1080/13658816.2019.1633468
RESEARCH ARTICLE
1. Introduction
Air pollution has become a life-threatening hazard that could produce severe conse-
quences, and hence undermine the sustainable development of cities. It has been reported
that exposure to air pollution can pose a significant threat to human health or even cause
death (Brauer et al. 2015, WHO 2016, Shaddick et al. 2018). To find an effective solution to
the air pollution problem, it is necessary to have a comprehensive understanding of the air
pollution process. Such an understanding essentially depends on reliable records that can
depict the variations in air pollution in time and space at various scales. For urban environ-
ments, the most important spatial scales are those within a city.
a given time and location without observations (Yanosky et al. 2008, Pollice and
Lasinio 2010, Liang and Kumar 2013). Because of the high computation cost of
spatiotemporal Kriging (Yanosky et al. 2008), functional Kriging has been developed
and applied to the estimation of air pollutant concentration (Henao 2009, Montero
et al. 2015, Montero and Fernández-Avilés 2018). In addition to the Kriging methods,
regression and its variations such as local smoothing or weighted spatial average have
been applied to the estimation of air quality at unmonitored locations. One study has
shown that their performance is similar to that of Kriging (Wong et al.
2004). However, these direct methods still suffer from a shortage of direct air
pollution observation records because of the limited number of ground stations. To
alleviate this shortage and other shortcomings of direct observation, some ancillary
sensors have been introduced as data sources that complement the ground
stations (Devarakonda et al. 2013). Ancillary sensors are usually low-cost and mobile;
therefore, by deploying a large number of such sensors, a fine-grained air quality
monitoring network can be formed based on the Internet of Things paradigm
(Devarakonda et al. 2013, Firculescu and Tudose 2015, Brynda et al. 2016, Biondi
et al. 2017, Kersting et al. 2017). After calibration, these mobile sensors can obtain
reliable information about air quality (Mead et al. 2013, Williams et al. 2013,
Moltchanov et al. 2015). Among the different types of mobile sensors, those carried
by vehicles such as buses and taxis are capable of directly and accurately monitoring
air quality over a large area (Devarakonda et al. 2013, Firculescu and Tudose 2015,
Biondi et al. 2017, Kersting et al. 2017). They can complement the station observations
so that a sufficient number of observations in space and time can be made to derive
a more accurate spatiotemporal air pollution profile of a region, such as a city. For
example, OpenSense (Aberer et al. 2010), Air Quality Egg (http://airqualityegg.com),
and MESSAGE (Mobile Environmental Sensing System Across Grid Environments,
http://research.cs.ncl.ac.uk/message/) are some projects that use mobile sensors to
improve the monitoring of air quality. Cities like Sharjah (Al-Ali et al. 2010), Prague
(Brynda et al. 2016), Catania (Biondi et al. 2017), Bucharest (Firculescu and Tudose
2015), and Daegu (Kersting et al. 2017) have already started such mobile-sensor net-
works for air pollution monitoring.
With a limited number of ground stations and a large number of mobile sensors
collecting air pollution concentration data over a city, an outstanding but important
issue is the integration of these two types of data to construct a composite and more
complete database for subsequent analysis. Nevertheless, these two types of data are of
different spatial and temporal resolutions/scales. In general, ground stations regularly
take measurements at fixed locations and release the mean value over some duration of
time, usually 1 h. In contrast, mobile sensors carried by vehicles only collect the data
along their routes. Measurements are taken only at the time when they pass through
a point in space. How to effectively integrate the station data, which are spatially sparse
but temporally dense, and the mobile-sensor data, which are spatially dense but temporally
sparse, has thus become a challenge. The aim of this study is to construct
a rigorous framework to integrate the data collected by mobile sensors and ground
stations in an urban environment. By using this framework, we can estimate the air
quality at any point in space and time and obtain the spatiotemporal profile of the air
pollution of a city. This framework can be extended to integrate other forms of data.
INTERNATIONAL JOURNAL OF GEOGRAPHICAL INFORMATION SCIENCE 2221
2.2. Methods
To facilitate our discussion, we outline our study framework in Figure 1 and introduce some
notations, especially for the mobile-sensor records. We first average the mobile-sensor
records to remove some redundancy and then unify the station and mobile-sensor records.
On the basis of the unified records, two-step local regression is applied to estimate the air
pollutant concentration at the target locations and times. The two-step local regression
performs local regression for the given locations at all times and then performs regression
on the obtained time series to get the final estimation. Thus, the spatial pattern of air
pollution for the entire city can be obtained for a specific time. With regard to the pattern for
a specific location, a time series of records can be estimated, and its components and auto-
correlation are subjected to empirical mode decomposition and detrended fluctuation
analysis in order to study the temporal behaviors and validate the estimation results.
Based on the flowchart shown in Figure 1, we can quantitatively profile the spatiotemporal
dynamics of the air pollution of a place. The general idea for the integration of the air
pollutant concentrations recorded by mobile sensors and ground stations is as follows:
firstly, we remove the data redundancy by averaging the records collected by each sensor
within a short time period to make them comparable with the ground stations; secondly, for
each given time stamp, we use these new records and the ground station records at the
same time to estimate the air pollutant concentration at the given location; thirdly, we
collect these estimates over all time stamps into a series for the given location, smooth
this series, and take the smoothed value at the given time stamp as the
estimate at the given location and time stamp.
2222 Y. LEUNG ET AL.
Figure 1. Flowchart of the proposed framework for the spatiotemporal integration and validation of
air pollution data obtained from the ground-based stations and mobile sensors.
With regard to the symbols, we use $(u, v)$ and $t$ to indicate the location and time
stamp for each record, respectively. Specifically, the station records form a dataset
$\{s_i(u_i, v_i, t_k)\}_{i=1}^{M}$, whereas the mobile-sensor records form another dataset
$\{m_j(u_{j,l^{(j)}}, v_{j,l^{(j)}}, t_{j,l^{(j)}})\}_{j=1}^{N}$. Here, the subscripts $i$, $j$, and $l^{(j)}$ denote the $i$th station, the $j$th
taxi, and the $l$th location of the $j$th taxi, respectively, with $M = 13$ and $N = 44$. Regarding
the station records $s_i(u_i, v_i, t_k)$, their locations are fixed for all $T = 744$ time stamps, $t_k$,
with the interval between two successive time stamps, $\delta t_k$, equaling 1 h. With respect to
the records of mobile sensors $m_j(u_{j,l^{(j)}}, v_{j,l^{(j)}}, t_{j,l^{(j)}})$, their locations depend on the time
stamp $t_{j,l^{(j)}}$ and the taxi ID $j$. The number of time stamps $t_{j,l^{(j)}}$ may be different for
different taxis. The interval between two successive time stamps, $\delta t_{j,l^{(j)}}$, is 10 s. We limit
the locations of our records to the area between 35.61°N and 36.03°N latitudes and
128.33°E and 128.77°E longitudes, i.e. Daegu city.
operation.
Because the temporal resolution of the ground stations is 1 h, we assume that there is
little change within an hour so that the time stamp of this newly-generated $\tilde{m}_j$ is reset at
$t_k$. The location of $\tilde{m}_j$ is set at $(u_{j,l^{(j)}}, v_{j,l^{(j)}})$, with $l^{(j)}$ corresponding to the time stamp $t_{j,l^{(j)}}$
closest to the mean value of those time stamps in $[t_{k-1} + (q-1)\delta_t,\; t_{k-1} + q\delta_t]$. In fact,
there could be some systematic differences between the mobile sensor and ground
station records because they use different sensors. We found that for each mobile
sensor, the relationship between its records and those obtained by the ground stations
within a range $\delta_d$ is highly linear and insensitive to the range $\delta_d \in [500\ \mathrm{m}, 1000\ \mathrm{m}]$.
Generally, more data can lead to more reliable statistical analysis. Therefore, we calibrated
the mobile-sensor records using linear regression, which was obtained on the
basis of the linear relationship with $\delta_d = 1000$ m.
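The calibration step can be illustrated with a minimal sketch. It assumes that each mobile-sensor record has already been paired with a ground-station record lying within $\delta_d$ of it; the function names and the synthetic numbers are hypothetical and only show how a fitted line maps raw mobile readings onto the station scale.

```python
import numpy as np

def fit_calibration(mobile, station):
    """Least-squares line station ~ a * mobile + b from paired records."""
    a, b = np.polyfit(np.asarray(mobile, dtype=float),
                      np.asarray(station, dtype=float), deg=1)
    return a, b

def calibrate(mobile, a, b):
    """Map raw mobile-sensor readings onto the station scale."""
    return a * np.asarray(mobile, dtype=float) + b

# Synthetic paired readings (hypothetical values, not data from the study).
raw = np.array([10.0, 20.0, 30.0, 40.0])
ref = 0.8 * raw + 3.0            # pretend station readings near the taxi
a, b = fit_calibration(raw, ref)
calibrated = calibrate(raw, a, b)
```

In practice one line would be fitted per mobile sensor, since the text reports the linear relationship separately for each sensor.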
To indicate the dependence of the locations of those newly-generated records $\tilde{m}_j$ on
$k$, one more subscript $k$ was employed in the form $(u_{j,l^{(j)},k}, v_{j,l^{(j)},k})$. For each $t_k$, the number
of new records $M_k$ should be no more than $M n$. To simplify the notations, we treat
each new record as a unique record of a new station $\tilde{j}$ with the corresponding location
$(u_{j,l^{(j)},k}, v_{j,l^{(j)},k})$ and the time stamp $t_k$. Re-denoting the location as $(u_{\tilde{j}}, v_{\tilde{j}})$, all newly-generated
records at $t_k$ can be re-expressed as $\{\tilde{m}_{\tilde{j}}(u_{\tilde{j}}, v_{\tilde{j}}, t_k)\}_{\tilde{j}=1}^{M_k}$.
With these notations, given a time stamp $t_k$ we have a set of records
$\{s_i(u_i, v_i, t_k)\}_{i=1}^{M} \cup \{\tilde{m}_{\tilde{j}}(u_{\tilde{j}}, v_{\tilde{j}}, t_k)\}_{\tilde{j}=1}^{M_k}$. If we treat all records as equivalent and unify the
notations as $Y_{i,k}$, then for given $t_k$ we can re-express the set of records as
$\{Y_{i,k}(u_{i,k}, v_{i,k}, t_k)\}_{i=1}^{n_k}$ with $n_k = M + M_k$. In this sense, the dataset we are going to study
is $Y = \bigcup_{k=1}^{T} \{Y_{i,k}(u_{i,k}, v_{i,k}, t_k)\}_{i=1}^{n_k}$, which is standard longitudinal data in statistics. Thus,
the construction of the dataset depends on only one parameter $\delta_t$, which can be
empirically determined as discussed below.
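As a rough sketch of how one averaged mobile-sensor record might be produced, the snippet below collapses the 10-s records of a single taxi inside one window of length $\delta_t$. It assumes the averaged value is the arithmetic mean of the readings in the window; the function name `average_window`, the dictionary layout and the sample values are hypothetical. The time stamp is reset to $t_k$ and the location is taken at the record whose time stamp is closest to the mean time of the window, as described above.

```python
import numpy as np

def average_window(times, values, u, v, start, end, t_k):
    """Collapse one sensor's records inside [start, end] into a single record."""
    times = np.asarray(times, dtype=float)
    idx = np.flatnonzero((times >= start) & (times <= end))
    if idx.size == 0:
        return None  # the taxi reported nothing in this window
    mean_time = times[idx].mean()
    # Representative record: time stamp closest to the mean time of the window.
    rep = idx[np.argmin(np.abs(times[idx] - mean_time))]
    return {
        "value": float(np.mean(np.asarray(values, dtype=float)[idx])),
        "u": float(u[rep]),
        "v": float(v[rep]),
        "t": t_k,  # time stamp reset to the hourly stamp of the stations
    }

# Hypothetical 10-s records of one taxi (time in seconds).
times = [0, 10, 20, 30, 40]
values = [10.0, 12.0, 14.0, 16.0, 18.0]
u = [128.50, 128.51, 128.52, 128.53, 128.54]
v = [35.80, 35.81, 35.82, 35.83, 35.84]
record = average_window(times, values, u, v, start=0, end=40, t_k=3600)
```

Records produced this way for all taxis and windows can then be pooled with the station records at the same $t_k$ to form the unified set $\{Y_{i,k}\}$.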
function to perform the analysis, to make our estimation more adaptive and data-driven,
we employed non-parametric regression. The local variations with respect to time and
space are important for understanding the air pollution processes. We, therefore,
employed a local regression model, namely two-step local regression, to capture the
variations in space and time.
To facilitate our discussion, we first briefly describe the two-step local regression
model in Yan and Mei (2014). The purpose of this model is to explore the variations of
the variable of interest, i.e. the PM2.5 concentration, $Y$, in our study, with respect to time
and space. Essentially, the two-step local regression model is a method that combines
geographically weighted regression (GWR) and local smoothing for handling both
spatial and temporal information (Yan and Mei 2014). Specifically, we attempt to identify
a function $f$ for the estimation of $Y_{i,k}$ as follows:
$$Y_{i,k} = f(u_{i,k}, v_{i,k}, t_k) + \varepsilon_{i,k}, \tag{1}$$
where $\varepsilon_{i,k}$ is the error term with zero mean ($i = 1, 2, \ldots, n_k$; $k = 1, 2, \ldots, T$). Given
a location of interest and a time stamp $(u_0, v_0, t_0)$, we can estimate $Y_{0,0}$ as $f(u_0, v_0, t_0)$
by a two-step local model and denote it as $\tilde{Y}_{0,0}$. To identify the function $f$, we first
estimate $Y_{0,k}$ at $(u_0, v_0, t_k)$ for each time stamp $t_k$ by the locally linear GWR on the basis
of $Y_{i,k}$ as
$$Y_{i,k} = f_1(u_{i,k}, v_{i,k}, t_k) + \varepsilon^{(1)}_{i,k}, \tag{2}$$
Then, we can get the estimate of $f$ at the given $(u_0, v_0, t_0)$ as the estimate of $f_2(u_0, v_0, t_0)$,
which is the desired $\tilde{Y}_{0,0}$.
The implementation of the first step requires the assumption of continuity of the
partial derivatives $\partial f_1/\partial u$ and $\partial f_1/\partial v$ of $f_1(u, v, t)$ with respect to $u$ and $v$. Then, $\tilde{Y}_{0,k}$ and
the estimates of its partial derivatives can be determined by minimizing
$$\sum_{i=1}^{n_k} \left[ Y_{i,k} - Y_{0,k} - \frac{\partial f_1}{\partial u}(u_0, v_0, t_k)(u_{i,k} - u_0) - \frac{\partial f_1}{\partial v}(u_0, v_0, t_k)(v_{i,k} - v_0) \right]^2 K\!\left(\frac{d_0^{(ik)}}{h_k}\right) \tag{4}$$
with respect to $Y_{0,k}$, $\partial f_1/\partial u(u_0, v_0, t_k)$, and $\partial f_1/\partial v(u_0, v_0, t_k)$. Here, $K(\cdot)$ is a kernel function,
$d_0^{(ik)}$ is the Euclidean distance between $(u_0, v_0)$ and $(u_{i,k}, v_{i,k})$, and $h_k$ is the bandwidth for
given $t_k$, which can be determined by the cross-validation procedure. With
$$X(u_0, v_0, t_k) = \begin{pmatrix} 1 & u_{1,k} - u_0 & v_{1,k} - v_0 \\ 1 & u_{2,k} - u_0 & v_{2,k} - v_0 \\ \vdots & \vdots & \vdots \\ 1 & u_{n_k,k} - u_0 & v_{n_k,k} - v_0 \end{pmatrix}, \tag{5}$$
$$Y_k = \begin{pmatrix} Y_{1,k} \\ Y_{2,k} \\ \vdots \\ Y_{n_k,k} \end{pmatrix}, \tag{6}$$
and
$$W(u_0, v_0, t_k) = \mathrm{diag}\!\left( K\!\left(\frac{d_0^{(1k)}}{h_k}\right), K\!\left(\frac{d_0^{(2k)}}{h_k}\right), \ldots, K\!\left(\frac{d_0^{(n_k k)}}{h_k}\right) \right), \tag{7}$$
we can define
$$Q(u_0, v_0, t_k) = \left[ X^{\mathrm{T}}(u_0, v_0, t_k) W(u_0, v_0, t_k) X(u_0, v_0, t_k) \right]^{-1} X^{\mathrm{T}}(u_0, v_0, t_k) W(u_0, v_0, t_k), \tag{8}$$
where the superscript $\mathrm{T}$ indicates the transpose of the corresponding matrix. Let
$e^{\mathrm{T}} = (1, 0, 0)$; we can then obtain the estimate $\tilde{Y}_{0,k}$ in the form of
$$\tilde{Y}_{0,k} = e^{\mathrm{T}} Q(u_0, v_0, t_k) Y_k. \tag{9}$$
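Equations (5) to (9) amount to a weighted least-squares fit centered at $(u_0, v_0)$. The following is a minimal sketch of that first step for a single time stamp. The Gaussian kernel and the fixed bandwidth `h` are assumptions made for illustration (the text does not name the kernel, and it selects $h_k$ by cross-validation, which is not implemented here); the function names are hypothetical.

```python
import numpy as np

def gaussian_kernel(z):
    """Assumed kernel K; the source does not specify which kernel is used."""
    return np.exp(-0.5 * z ** 2)

def local_linear_gwr(u, v, y, u0, v0, h, kernel=gaussian_kernel):
    """Locally linear estimate of Y at (u0, v0) for one time stamp."""
    u, v, y = (np.asarray(a, dtype=float) for a in (u, v, y))
    X = np.column_stack([np.ones_like(u), u - u0, v - v0])  # Eq. (5)
    d = np.hypot(u - u0, v - v0)       # Euclidean distances to (u0, v0)
    w = kernel(d / h)                  # diagonal of W, Eq. (7)
    XtW = X.T * w                      # X^T W without forming the full matrix
    beta = np.linalg.solve(XtW @ X, XtW @ y)  # Q Y_k, Eq. (8)
    return beta[0], beta[1:]           # e^T Q Y_k (Eq. (9)) and the gradient

# Records lying exactly on a plane are recovered exactly by the local fit.
u = np.array([0.0, 1.0, 0.0, 1.0, 0.5])
v = np.array([0.0, 0.0, 1.0, 1.0, 0.2])
y = 2.0 + 3.0 * u - 1.0 * v
est, grad = local_linear_gwr(u, v, y, u0=0.3, v0=0.6, h=0.5)
```

Solving the normal equations with `np.linalg.solve` avoids forming the explicit inverse in Eq. (8); the result is the same quantity.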
Based on the series $\{\tilde{Y}_{0,k}\}_{k=1}^{T}$ obtained in the first step, we can apply local linear
smoothing to get the desired estimate $\tilde{Y}_{0,0}$. Because the location has been fixed, we
simplify $\tilde{Y}_{0,k}$, $\tilde{Y}_{0,0}$, and $f_2(u_0, v_0, t_k)$ as $\tilde{Y}_k$, $\tilde{Y}_0$, and $f_2(t_k)$, respectively. Similar to the first
step, we assume that the derivative of $f_2(t)$ is continuous for $t \in [t_1, t_T]$, and denote it as
$df_2/dt$. The estimate $\tilde{Y}_0$ can be calculated by minimizing
$$\sum_{k=1}^{T} \left[ \tilde{Y}_k - f_2(t_0) - \frac{df_2}{dt}(t_0)(t_k - t_0) \right]^2 K\!\left(\frac{t_k - t_0}{r_0}\right) \tag{10}$$
with respect to $f_2(t_0)$ and $df_2/dt(t_0)$. Here, $K(\cdot)$ and $r_0$ are also the kernel function and
the corresponding bandwidth. Using
$$M(t_0) = \begin{pmatrix} 1 & t_1 - t_0 \\ 1 & t_2 - t_0 \\ \vdots & \vdots \\ 1 & t_T - t_0 \end{pmatrix}, \tag{11}$$
$$Z_0 = \begin{pmatrix} \tilde{Y}_1 \\ \tilde{Y}_2 \\ \vdots \\ \tilde{Y}_T \end{pmatrix}, \tag{12}$$
and
$$W(t_0) = \mathrm{diag}\!\left( K\!\left(\frac{t_1 - t_0}{r_0}\right), K\!\left(\frac{t_2 - t_0}{r_0}\right), \ldots, K\!\left(\frac{t_T - t_0}{r_0}\right) \right), \tag{13}$$
we have
$$P_0(t_0) = \left[ M^{\mathrm{T}}(t_0) W(t_0) M(t_0) \right]^{-1} M^{\mathrm{T}}(t_0) W(t_0). \tag{14}$$
Denoting $(1, 0)$ as $e_2^{\mathrm{T}}$, we have the estimate $\tilde{Y}_0$ as
$$\tilde{Y}_0 = e_2^{\mathrm{T}} P_0(t_0) Z_0, \tag{15}$$
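The second step, Eqs. (10) to (15), is the one-dimensional analogue of the first: a kernel-weighted straight-line fit in time, evaluated at $t_0$. A minimal sketch follows, again assuming a Gaussian kernel and a hand-picked bandwidth `r0` (both are illustrative assumptions, and the function name is hypothetical).

```python
import numpy as np

def local_linear_smooth(t, y_tilde, t0, r0):
    """Local linear smoothing of the first-step series, evaluated at t0."""
    t = np.asarray(t, dtype=float)
    z = np.asarray(y_tilde, dtype=float)               # Z_0, Eq. (12)
    M = np.column_stack([np.ones_like(t), t - t0])     # Eq. (11)
    w = np.exp(-0.5 * ((t - t0) / r0) ** 2)            # assumed Gaussian K, Eq. (13)
    MtW = M.T * w
    coef = np.linalg.solve(MtW @ M, MtW @ z)           # P_0(t_0) Z_0, Eq. (14)
    return coef[0]                                     # e_2^T P_0 Z_0, Eq. (15)

# A series that is exactly linear in time is reproduced at any t0.
hours = np.arange(24.0)
series = 5.0 + 0.1 * hours
y0 = local_linear_smooth(hours, series, t0=7.5, r0=3.0)
```

Because $t_0$ need not coincide with one of the hourly stamps, the same routine interpolates between stamps, which is what allows estimation at an arbitrary time.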
(1) Construct the upper and lower envelopes, $\mathrm{ENV}_{\max}$ and $\mathrm{ENV}_{\min}$, on the basis of the local
maxima and minima, respectively.
(2) Calculate the average of the upper and lower envelopes, $M = (\mathrm{ENV}_{\max} + \mathrm{ENV}_{\min})/2$,
and obtain the difference between $X$ and $M$, i.e. $h = X - M$.
(3) If $h$ is not an IMF, iteratively repeat the above two steps on $h$ until the
envelopes have zero mean under certain stopping criteria or $h$ becomes an IMF. Such $h$
is taken as the first component and is denoted as $\mathrm{IMF}_1$.
(4) Treat the residue, $X - \mathrm{IMF}_1$, as a new series and perform the above steps to
extract $\mathrm{IMF}_2$.
(5) Repeat the iterative procedure until no more IMFs can be extracted.
Denoting the number of extracted IMFs as $k$, the implementation of EMD decomposes
$X$ as $X = \sum_{i=1}^{k} \mathrm{IMF}_i + r$, where $r$ is the residual from which no more IMFs can be
extracted. However, EMD suffers from the mode-mixing problem (Wu and Huang 2009).
To solve this problem, the ensemble EMD (EEMD) was proposed by Wu and Huang
(2009). The key modification of EMD is to add a white noise series to $X$ before performing
EMD and then obtain the EEMD IMFs by ensemble averaging the corresponding
EMD IMFs. The MATLAB code of EEMD employed in this study was downloaded from
http://rcada.ncu.edu.tw/.
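Steps (1) and (2) of the sifting procedure can be sketched in a few lines. This is not the MATLAB EEMD code used in the study: as a simplifying assumption, the envelopes here are piecewise-linear interpolations through the extrema (EMD implementations normally use cubic splines), they are held constant beyond the first and last extremum, and no stopping criterion or noise ensemble is included. The function names are hypothetical.

```python
import numpy as np

def local_extrema(x):
    """Indices of strict interior local maxima and minima."""
    d = np.diff(x)
    maxima = np.flatnonzero((d[:-1] > 0) & (d[1:] < 0)) + 1
    minima = np.flatnonzero((d[:-1] < 0) & (d[1:] > 0)) + 1
    return maxima, minima

def sift_once(x):
    """One sifting pass: subtract the mean of the upper and lower envelopes."""
    x = np.asarray(x, dtype=float)
    n = np.arange(x.size)
    maxima, minima = local_extrema(x)
    if maxima.size < 2 or minima.size < 2:
        return None  # too few extrema to build envelopes
    env_max = np.interp(n, maxima, x[maxima])  # simplified (linear) upper envelope
    env_min = np.interp(n, minima, x[minima])  # simplified (linear) lower envelope
    return x - 0.5 * (env_max + env_min)       # h = X - M

# A 20-sample oscillation riding on a linear trend (synthetic example).
n = np.arange(200)
x = np.sin(2 * np.pi * n / 20) + 0.05 * n
h = sift_once(x)
```

Away from the two ends of the series, the pass removes the linear trend and leaves the oscillation, which is the behavior that repeated sifting exploits to isolate an IMF.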
If the two-step local regression model gives reliable estimates, then the estimated series
should not only be highly correlated to the observed series, but the EEMD components
should also be similar to those of the observed series. EMD has been applied to characterize
the temporal behavior of many real-life processes, such as sunspots (Zhou and Leung 2010a,
Zhou et al. 2013) and air pollution (Hu et al. 2013, Jiang and Bai 2018). In this study, we
evaluated the performance of the two-step local regression model by examining (1) whether the
observed and estimated records have the same number of EMD components; and (2) whether their
corresponding components are highly correlated.
(1) Divide the cumulative sum of $X$ into several segments of equal length $s$.
(2) For each of these segments, remove the local trend in a detrending step and obtain the
variance around the local trend.
(3) Calculate the fluctuation function $F(s)$ as the square root of the average of the variances
over all segments.
(4) From the scaling behavior $F(s) \sim s^{\alpha}$, obtain the scaling exponent $\alpha$.
In principle, the detrended segments can be considered as statistically identical so that
the average over all segments is equivalent to the ensemble average over different realizations
of the given process (Höll et al. 2016). The scaling exponent $\alpha$ can be connected with
the decay exponent $\gamma$ of the autocorrelation scaling law $C(\tau) \sim \tau^{-\gamma}$ for a stationary
series by $\alpha = 1 - \gamma/2$ (Höll and Kantz 2015). When $\gamma < 1$, i.e. $\alpha > 0.5$, the
sum of $C(\tau)$ diverges, which corresponds to long-range correlation.
If the performance of the two-step local regression estimation is satisfactory,
then the DFA scaling behavior of the estimated series should be very similar to that
of the observed series. This gives an additional angle from which to validate the estimation
results.
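The four DFA steps can be sketched as follows. This is an illustrative implementation under common conventions rather than the code used in the study: the series mean is subtracted before the cumulative sum, segments are non-overlapping, the local trend is a polynomial of the given order, and $F(s)$ is the root-mean-square fluctuation. The function names and the white-noise example are hypothetical.

```python
import numpy as np

def dfa_fluctuation(x, scales, order=2):
    """Fluctuation function F(s) for each segment length s in `scales`."""
    x = np.asarray(x, dtype=float)
    profile = np.cumsum(x - x.mean())          # step (1): cumulative sum
    result = []
    for s in scales:
        n_seg = profile.size // s
        segments = profile[:n_seg * s].reshape(n_seg, s)
        t = np.arange(s)
        variances = []
        for seg in segments:                   # step (2): detrend each segment
            trend = np.polyval(np.polyfit(t, seg, order), t)
            variances.append(np.mean((seg - trend) ** 2))
        result.append(np.sqrt(np.mean(variances)))  # step (3)
    return np.array(result)

def scaling_exponent(scales, fluct):
    """Step (4): slope of log10 F(s) against log10 s."""
    slope, _ = np.polyfit(np.log10(scales), np.log10(fluct), 1)
    return slope

# Uncorrelated noise should give an exponent near 0.5.
rng = np.random.default_rng(0)
noise = rng.standard_normal(4096)
scales = [16, 32, 64, 128, 256]
alpha = scaling_exponent(scales, dfa_fluctuation(noise, scales, order=2))
```

Fitting the slope separately on the ranges below and above a crossover scale is how two scaling regimes of one series can be compared.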
3.1.1. EEMD
EEMD was employed to evaluate the performance of our estimates for the Daemyeong district
station. Eight components and one residual were extracted for both the observed and
estimated PM2.5 records. As shown in Figure 4, they are very similar with respect to both the
original series and most of their EEMD components. Quantitatively, we employed the Fourier
transform to extract the dominant periods of the individual IMFs of both series and
measured the similarity between them by cross-correlation.
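A minimal sketch of these two measurements is given below. It assumes the dominant period is the reciprocal of the frequency with the largest power in the discrete Fourier transform, and that the cross-correlation is the zero-lag Pearson coefficient; neither detail is spelled out in the text, and the function names and test signal are hypothetical.

```python
import numpy as np

def dominant_period(x, dt=1.0):
    """Period (in units of dt) of the strongest non-zero Fourier component."""
    x = np.asarray(x, dtype=float)
    power = np.abs(np.fft.rfft(x - x.mean())) ** 2
    freqs = np.fft.rfftfreq(x.size, d=dt)
    k = np.argmax(power[1:]) + 1      # skip the zero-frequency bin
    return 1.0 / freqs[k]

def cross_correlation(a, b):
    """Zero-lag Pearson correlation between two equally long series."""
    return float(np.corrcoef(a, b)[0, 1])

# Hourly series over 744 h with a dominant daily cycle (synthetic).
hours = np.arange(744)
observed = np.sin(2 * np.pi * hours / 24) + 0.3 * np.sin(2 * np.pi * hours / 12)
period_h = dominant_period(observed, dt=1.0)   # in hours
r = cross_correlation(observed, 2.0 * observed + 1.0)
```

Dividing `period_h` by 24 converts the result to days, the unit used when reporting dominant periods.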
Table 1 shows that the dominant periods of the original series and IMFs are very
similar for the observed and estimated records, especially for the IMFs with long periods.
As for the cross-correlation, the corresponding IMFs of the observed and estimated
values are highly correlated except for the first two, which are relatively more irregular.
The p-values of the cross-correlations other than that between the two IMF1 components are less
than 0.05, indicating that the detected correlations are significant at the 0.05 level. For
the Horim-dong, Hyeonpung-myeon, and Igok-dong stations displayed in Figure 3, the
correlation coefficients between the estimated and observed PM2.5 values are 0.90, 0.87,
and 0.91, respectively.
Figure 2. Observed, 9-h moving average and estimated PM2.5 values at the Daemyeong district
station in July 2017. [Plot: PM2.5 (µg/m³) against time in hours.]
3.1.2. DFA
We also employed DFA to study the long-range correlation of the observed and estimated
series at Daemyeong district in July 2017. The detrending order was set to 2, which
leads to scaling behaviors very similar to those obtained with detrending order 3. Here, we
did not filter anything, such as the periodic non-stationarity shown in Figures 2 and 4, from
the series but analyzed them directly, because the purpose of employing the DFA is to
characterize long-range correlation in the series, and filters can remove some information
from the series. However, it should be noted that if the focus is on the dynamics of
high-frequency information, these cyclic trends should be removed because they are strong
enough to mask those dynamics. The DFA scaling behaviors
are shown in Figure 5. We can see that $s_0 = 10^{1.4} \approx 24$ h is a critical scale. At the scales
$s > s_0$, the DFA scaling behaviors of the two series are almost the same. In this scaling range, the
estimated scaling exponent is $\alpha \approx 1.3$. However, at smaller scales, there are some differences between
the two DFA scaling behaviors. The DFA fluctuation function $F(s)$ of the estimated series has
larger power than that of the observed series in the range $s < s_0$.
Figure 3. Observed and estimated PM2.5 values at the Horim-dong (left upper panel), Hyeonpung-
myeon (right upper panel), and Igok-dong (bottom panel) stations in July 2017. [Each panel plots
PM2.5 (µg/m³) against time in hours.]
Figure 4. The original series and its EEMD components of the observed, 9-h moving average and
estimated PM2.5 values at the Daemyeong district station in July 2017. [Panels: the series, its
IMFs and the residual r, each plotted against time in hours.]
region was divided into 40 × 40 grids with size of approximately 1000 m × 1160 m. For each
grid, we estimated the PM2.5 at its center to represent the air pollutant concentration of this
grid. The time stamp 12:00 pm on 25 July 2017 was selected as an example to show the
spatial pattern of the estimated PM2.5. To facilitate our analysis, we constructed a web-based
software platform for efficient data management, data analysis, query, and visualization.
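The grid of estimation points can be sketched as below. It assumes that the gridded region is the latitude and longitude box given in the Methods section and that cells are equal in degrees; the function name and the constants' names are hypothetical. Each returned center would then be passed as $(u_0, v_0)$ to the two-step estimator.

```python
import numpy as np

# Bounding box of the study area as stated in the Methods section (degrees).
LAT_MIN, LAT_MAX = 35.61, 36.03
LON_MIN, LON_MAX = 128.33, 128.77

def grid_centers(n_rows=40, n_cols=40):
    """Longitude and latitude of the cell centers of an n_rows x n_cols grid."""
    lat_edges = np.linspace(LAT_MIN, LAT_MAX, n_rows + 1)
    lon_edges = np.linspace(LON_MIN, LON_MAX, n_cols + 1)
    lat_c = 0.5 * (lat_edges[:-1] + lat_edges[1:])
    lon_c = 0.5 * (lon_edges[:-1] + lon_edges[1:])
    lon_grid, lat_grid = np.meshgrid(lon_c, lat_c)   # shape (n_rows, n_cols)
    return lon_grid, lat_grid

lon_grid, lat_grid = grid_centers()
```

Changing `n_rows` and `n_cols` gives coarser or finer grids over the same box without touching the estimator.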
In Figure 6, the symbols of cars and buildings denote the location of the mobile sensors
and ground stations at a given time, respectively. The colors of these symbols exhibit the
extent of the recorded air pollutant concentrations, whereas the colors of the grids on the
map display the estimated values. Specifically, yellow, green, and purple indicate the low,
moderate, and high concentrations of PM2.5, respectively. The gradation of colors from
yellow to purple indicates the gradation of air pollution from low to high. As shown in Figure
6, we can observe that the air quality with respect to PM2.5 concentration improves when
Table 1. Dominant period (in days) of the original series and its EEMD components for the observed,
9-h moving average and estimated PM2.5 records at the Daemyeong district in July 2017.

Series     Observation   9-h moving average   Estimation
Original   42.67         42.67                42.67
IMF1       0.16          0.11                 0.08
IMF2       0.37          0.23                 0.44
IMF3       0.99          0.99                 0.99
IMF4       1.33          2.03                 1.33
IMF5       3.88          3.88                 3.88
IMF6       8.23          8.23                 8.23
IMF7       42.67         42.67                42.67
IMF8       42.67         42.67                42.67
Figure 5. The DFA scaling behavior of the observed, 9-h moving average and estimated PM2.5
records at the Daemyeong district station in July 2017. [Plot: log10(F(s)) against log10(s).]
Figure 6. Estimated PM2.5 of 1600 grids covering the city area of Daegu at time 12:00 pm on 25 July 2017
using data collected by both mobile sensors and ground stations. Color changing from yellow to
green to purple indicates the change of PM2.5 concentration from low to moderate to high.
moving from the southwestern to the northeastern corner, which is consistent with the
conclusion based on the direct observation of the records collected by mobile sensors and
ground stations. We also used 30 × 30 and 50 × 50 grids to cover the whole city, and very
similar spatial patterns were obtained. The estimated spatial pattern shows that PM2.5
concentration gradually decreases from the southwestern corner to the northeastern
corner. In general, the estimated pattern is consistent with the observation records. On
the other hand, if we use only the air pollution data collected by the ground stations to
estimate the spatial pattern of air pollutant concentration at the time stamp 12:00 pm on
25 July 2017, then the results shown in Figure 7 display a very different pattern compared
with that shown in Figure 6. Such a difference indicates that if we only have ground stations,
which are limited in number, then we will fail to capture the real patterns of air pollution
distribution in the whole city. This demonstrates that mobile sensors provide additional
information necessary for air pollution profiling. To show the temporal variations of the
spatial pattern, we give one more example in Figure 8 at another time stamp (12:00 pm on
18 July 2017) for comparison.
Figure 7. Estimated PM2.5 of 1600 grids covering the city area of Daegu at time 12:00 pm on
25 July 2017 using data collected by only ground stations. Color changing from yellow to green to
purple indicates the change of PM2.5 concentration from low to moderate to high.
indicating the good performance of our estimations. As for the spatial pattern, the
stations located near the city center have coefficients higher than those at the boundary.
This is understandable because taxis usually pass through the city center more often
than the fringes of the city. We then compared the estimates and observations for all
stations. As shown in Figure 9, the value of the correlation coefficient is 0.88.
4. Discussion
As argued above, in order to profile the spatiotemporal variations of air pollutant
concentrations in a city, ground station measurements must be supplemented by
mobile-sensor measurements. The challenge then is to find a rigorous method to
integrate both to form a composite dataset for further analysis. However, to the best
of our knowledge, there are no studies on the integration of these two types of data. In
our view, the lack of such research may be attributed to the difficulty in integrating data
of different types, i.e. regular measurements made at fixed ground stations, which have
high temporal resolution but low spatial resolution, and irregular measurements made
by mobile sensors at different locations, which have low temporal resolution but high
spatial resolution. Therefore, one of the main contributions of this study is the formula-
tion of a rigorous framework for such integration. As we can observe in the ‘Empirical
Results and Interpretations’ section (see Figures 2 and 6), the integrated data capture the
temporal and spatial characteristics of air pollutant concentrations well. Furthermore,
the proposed framework enables estimation using direct information as well as the
incorporation of indirect information for a more comprehensive and accurate profiling of
air pollution concentrations via explanatory variables.
Figure 8. Estimated PM2.5 of 1600 grids covering the city area of Daegu at time 12:00 pm on
18 July 2017 using data collected by both mobile sensors and ground stations. Color changing
from yellow to green to purple indicates the change of PM2.5 concentration from low to moderate to
high.
Figure 9. Comparison between estimated and observed PM2.5 values for all 13 stations at all 744
time stamps in July 2017. [Plot: vertical axis, estimated PM2.5 values (µg/m³); regression formula
shown in the plot: y = 0.64x + 6.74.]
The similarity at large scales and difference at small scales can also be observed in
Figure 2: the estimated PM2.5 series can generally characterize the air pollution process
at large scales but is smoother than the observed series. This is understandable
because the two-step local regression model includes a kernel smoothing procedure,
which could filter out some of the high-frequency information. As depicted in Figure 4,
the first two IMFs of the estimated series have much smaller amplitudes, making
effective extraction of the dominant period more difficult. In addition, it shows that
the IMF2 of the observed series has a dominant period of 0.37 day (approximately 9 h).
Therefore, we use the 9-h moving average to smooth the observed series to examine
the low-pass filter effect of the two-step regression model. Figure 4 shows that the
amplitudes of the first two IMFs of the 9-h moving average series are much smaller than
those of the observed series. The cross correlations between the 9-h moving average
and observed IMF1 and IMF2 in Table 2 are also very low, even lower than those
between the estimated and observed IMFs. Therefore, the low-pass filter smooths
out the high-frequency information and reduces the correlation between the filtered
and observed series at small scales. In contrast, the correlation at large scales is well
maintained. The low-pass filter effect can also explain the difference between the DFA
scaling behaviors of the observed and estimated series, because the 9-h moving average
series exhibits a DFA scaling behavior very similar to that of the estimated series. In fact,
Table 2. Pairwise cross correlation of the observation, 9-h moving average, and estimation of
PM2.5 records at Daemyeong district in July 2017.

Series     Observation       9-h moving average   Observation vs.
           vs. estimation    vs. estimation       9-h moving average
Original   0.94              0.96                 0.96
IMF1       0.13 (a)          −0.01 (a)            0.05 (a)
IMF2       0.58              0.26                 0.04 (a)
IMF3       0.86              0.88                 0.93
IMF4       0.91              0.91                 0.96
IMF5       0.94              0.94                 0.98
IMF6       0.97              0.96                 0.99
IMF7       0.93              0.94                 0.94
IMF8       0.90              0.97                 0.98
Residual   1.00              0.99                 0.99

(a) The corresponding p-values are larger than 0.05.
the low-pass filter, whether the moving average or the kernel smoothing procedure of the
two-step local regression model, could introduce very strong short-range correlation
into the filtered data. Such strong auto-correlation may dominate at small scales but
could be overwhelmed by long-range correlation at large scales, with a transition scaling
behavior connecting these two regimes. Therefore, the low-pass-filtered
DFA scaling behavior exhibits a double power law connected by a transition in Figure 5.
Consequently, the difference between the observed and estimated series with respect to
correlations among IMFs and the DFA scaling behavior can be basically attributed to the
low-pass filter effect. However, this by no means implies the equivalence of the two-step
local regression model and the simple moving average, because their first two IMFs,
especially IMF1, exhibit low correlation (see Table 2). Compared to the moving average,
the two-step local regression model could retain more high-frequency information,
which is indicated by its higher cross correlations to the observed IMF1 and IMF2.
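The low-pass effect of a 9-h moving average can be seen in a small synthetic sketch. It assumes a centered window with equal weights (the text does not say whether the window is centered or trailing), and the function name and the two sinusoids are hypothetical: a 9-h oscillation is removed almost completely while a weekly oscillation passes nearly unchanged.

```python
import numpy as np

def moving_average(x, window=9):
    """Centered equal-weight moving average (edges use a truncated window sum)."""
    kernel = np.ones(window) / window
    return np.convolve(np.asarray(x, dtype=float), kernel, mode="same")

hours = np.arange(744)
slow = np.sin(2 * np.pi * hours / 168)   # weekly component
fast = np.sin(2 * np.pi * hours / 9)     # 9-h component
smoothed = moving_average(slow + fast, window=9)

# Largest deviation from the slow component away from the edges.
interior_error = np.max(np.abs(smoothed[4:-4] - slow[4:-4]))
```

The first and last four samples are affected by the truncated window and are excluded from the comparison.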
Usually, the estimation performance is evaluated simply by $R^2$ (the square of the correlation
coefficient), e.g. in the studies of Wong et al. (2004) and He and Huang (2018). In this study, to
go beyond this, we further compare the estimations and observations at multiple time
scales and with respect to their dynamics using EMD and DFA, respectively. Although it is not
new to employ EMD and DFA to study air pollutant data, previous studies have usually applied
them to analyze only observed records to uncover the components with different periods (Hu
et al. 2013, Jiang and Bai 2018) and identify their long-range correlation (Varotsos et al. 2005,
Dong et al. 2017, Plocoste et al. 2017). In fact, the comparison at multiple time scales and with
respect to their dynamics provides more information about the performance
of the estimations and a more comprehensive and revealing evaluation. This study
provides an understanding of how good the estimations are, as well as of their performance
at different time scales, e.g. estimations are better at the daily and weekly
scales. Such comprehensive evaluation is another contribution of this study.
5. Conclusions
In this study, we have proposed a rigorous framework to effectively integrate data collected
by the mobile sensors and ground stations. Using the data collected in Daegu during the
whole of July 2017, we have shown that air pollutant concentrations can be estimated by
the two-step local regression model for any given location and time stamp. Furthermore, we
have demonstrated that the estimated data can characterize the observed data at daily and
weekly scales in the temporal dimension with respect to the EEMD components and long-
range correlation, and from them, we can also extract the general spatial pattern of air
pollution. The differences at small scales can be mainly attributed to the low-pass filter effect
of the two-step local regression model. Although the framework is established and eval-
uated by using only PM2.5, it is applicable to the study of other air pollutants. Therefore, our
research advances the frontier of basic research in air pollution monitoring by integrating
station-based and mobile-sensor-based data. It can be extended to the integration of multi-
source information for urban big-data analysis.
As previously mentioned, the project to record air pollutant concentrations using
mobile sensors was initiated in Daegu at the end of June 2017. In the next phase of the
project, additional mobile sensors will be deployed to further improve our analysis.
Within the proposed study framework, future studies could assess the dependence of air
pollution on other variables and the consequences of air pollution at the location or
time stamp of interest, based on the estimated air pollutant concentrations. In further
research, it is also of interest to improve our method to better capture the local and
extreme variations.
Disclosure statement
No potential conflict of interest was reported by the authors.
Funding
This work was supported by the earmarked grant of the Hong Kong Research Grants Council
[Project ID: 2120517, Ref: 14653316]; The Chinese University of Hong Kong [VC Discretionary Fund].
References
Aberer, K., et al., 2010. Opensense: open community driven sensing of environment. In: Proceedings
of the ACM SIGSPATIAL International Workshop on GeoStreaming. San Jose, CA: ACM, 39–42.
Al-Ali, A., Zualkernan, I., and Aloul, F., 2010. A mobile GPRS-sensors array for air pollution
monitoring. IEEE Sensors Journal, 10 (10), 1666–1671. doi:10.1109/JSEN.2010.2045890
Barrett, S.R., et al., 2012. Public health, climate, and economic impacts of desulfurizing jet fuel.
Environmental Science & Technology, 46 (8), 4275–4282. doi:10.1021/es203325a
Bashan, A., et al., 2008. Comparison of detrending methods for fluctuation analysis. Physica A, 387
(21), 5080–5090. doi:10.1016/j.physa.2008.04.023
Beckerman, B.S., et al., 2013. Application of the deletion/substitution/addition algorithm to select-
ing land use regression models for interpolating air pollution measurements in California.
Atmospheric Environment, 77, 172–177. doi:10.1016/j.atmosenv.2013.04.024
Beran, J., 1994. Statistics for long-memory processes. Vol. 61. Boca Raton, FL: CRC Press.
Biondi, S.M., et al., 2017. Bus as a sensor: A mobile sensor nodes network for the air quality
monitoring. In: Wireless and mobile computing, networking and communications (WiMob). Rome,
Italy: IEEE, 272–277.
Brauer, M., et al., 2003. Estimating long-term average particulate air pollution concentrations:
application of traffic indicators and geographic information systems. Epidemiology, 14,
228–239.
Brauer, M., et al., 2015. Ambient air pollution exposure estimation for the global burden of disease
2013. Environmental Science & Technology, 50 (1), 79–88. doi:10.1021/acs.est.5b03709
Briggs, D.J., et al., 2000. A regression-based method for mapping traffic-related air pollution:
application and testing in four contrasting urban environments. Science of the Total
Environment, 253 (1–3), 151–167. doi:10.1016/S0048-9697(00)00429-0
Brynda, P., Kosová, Z., and Kopřiva, J., 2016. Mobile sensor unit for online air quality monitoring. In:
2016 Smart Cities Symposium Prague (SCSP). Prague, Czech Republic: IEEE, 1–4.
Devarakonda, S., et al., 2013. Real-time air quality monitoring through mobile sensing in metro-
politan areas. In: Proceedings of the 2nd ACM SIGKDD international workshop on urban comput-
ing. Chicago, IL: ACM, 15.
Dong, Q., Wang, Y., and Li, P., 2017. Multifractal behavior of an air pollutant time series and the
relevance to the predictability. Environmental Pollution, 222, 444–457. doi:10.1016/j.
envpol.2016.11.090
Firculescu, A.C. and Tudose, D.S., 2015. Low-cost air quality system for urban area monitoring. In:
Control Systems and Computer Science (CSCS), 2015 20th International Conference on. Bucharest,
Romania: IEEE, 240–247.
Gopikrishnan, P., et al., 1999. Scaling of the distribution of fluctuations of financial market indices.
Physical Review E, 60 (5), 5305. doi:10.1103/PhysRevE.60.5305
Grimmond, C. and Oke, T.R., 1999. Aerodynamic properties of urban areas derived from analysis of
surface form. Journal of Applied Meteorology, 38 (9), 1262–1292. doi:10.1175/1520-0450(1999)
038<1262:APOUAD>2.0.CO;2
Gupta, P., et al., 2006. Satellite remote sensing of particulate matter and air quality assessment over
global cities. Atmospheric Environment, 40 (30), 5880–5892. doi:10.1016/j.atmosenv.2006.03.016
He, Q. and Huang, B., 2018. Satellite-based high-resolution PM2.5 estimation over the
Beijing-Tianjin-Hebei region of China using an improved geographically and temporally
weighted regression model. Environmental Pollution, 236, 1027–1037. doi:10.1016/j.
envpol.2018.01.053
Henao, R.G., 2009. Geostatistical analysis of functional data. Thesis (PhD). Universitat Politècnica de
Catalunya.
Hoek, G., et al., 2001. Estimation of long-term average exposure to outdoor air pollution for
a cohort study on mortality. Journal of Exposure Science and Environmental Epidemiology, 11
(6), 459. doi:10.1038/sj.jea.7500189
Höll, M. and Kantz, H., 2015. The relationship between the detrendend fluctuation analysis and the
autocorrelation function of a signal. The European Physical Journal B, 88 (12), 327. doi:10.1140/
epjb/e2015-60721-1
Höll, M., Kantz, H., and Zhou, Y., 2016. Detrended fluctuation analysis and the difference between
external drifts and intrinsic diffusionlike nonstationarity. Physical Review E, 94 (4), 042201.
doi:10.1103/PhysRevE.94.042201
Hu, M., et al., 2013. Spatial and temporal characteristics of particulate matter in Beijing, China using
the empirical mode decomposition method. Science of the Total Environment, 458, 70–80.
doi:10.1016/j.scitotenv.2013.04.005
Huang, N.E., et al., 1998. The empirical mode decomposition and the Hilbert spectrum for non-
linear and non-stationary time series analysis. Proceedings of the Royal Society of London. Series
A: Mathematical, Physical and Engineering Sciences, 454 (1971), 903–995. doi:10.1098/
rspa.1998.0193
Jerrett, M., et al., 2007. Modeling the intraurban variability of ambient traffic pollution in Toronto,
Canada. Journal of Toxicology and Environmental Health, Part A, 70 (3–4), 200–212. doi:10.1080/
15287390600883018
Jha, D.K., et al., 2011. Evaluation of interpolation technique for air quality parameters in Port Blair,
India. Universal Journal of Environmental Research & Technology, 1, 3.
Jiang, L. and Bai, L., 2018. Spatio-temporal characteristics of urban air pollutions and their causal
relationships: evidence from Beijing and its neighboring cities. Scientific Reports, 8 (1), 1279.
doi:10.1038/s41598-017-18107-1
Kantelhardt, J.W., et al., 2001. Detecting long-range correlations with detrended fluctuation
analysis. Physica A, 295 (3–4), 441–454. doi:10.1016/S0378-4371(01)00144-3
Kersting, J., et al., 2017. Internet of things architecture for handling stream air pollution data. In:
Proceedings of the 2nd International Conference on Internet of Things, Big Data and Security
(IoTBDS 2017). Porto, Portugal, 117–124.
Koscielny-Bunde, E., et al., 1998. Indication of a universal persistence law governing atmospheric
variability. Physical Review Letters, 81 (3), 729–732. doi:10.1103/PhysRevLett.81.729
Koscielny-Bunde, E., et al., 2006. Long-term persistence and multifractality of river runoff records:
detrended fluctuation studies. Journal of Hydrology, 322 (1–4), 120–137. doi:10.1016/j.
jhydrol.2005.03.004
Lee, K.H., et al., 2009. Atmospheric aerosol monitoring from satellite observations: A history of
three decades. In: Y. Kim, U. Platt, M. B. Gu and H. Iwahashi eds. Atmospheric and biological
environmental monitoring. Springer, 13–38.
Leung, Y., et al., 2018. An integrated web-based air pollution decision support system–a prototype.
International Journal of Geographical Information Science, 32 (9), 1787–1814. doi:10.1080/
13658816.2018.1460752
Li, J., Carlson, B.E., and Lacis, A.A., 2015. How well do satellite AOD observations represent the
spatial and temporal variability of PM2.5 concentration for the United States? Atmospheric
Environment, 102, 260–273. doi:10.1016/j.atmosenv.2014.12.010
Liang, D. and Kumar, N., 2013. Time-space Kriging to address the spatiotemporal misalignment in
the large datasets. Atmospheric Environment, 72, 60–69. doi:10.1016/j.atmosenv.2013.02.034
Liao, D., et al., 2006. GIS approaches for the estimation of residential-level ambient PM
concentrations. Environmental Health Perspectives, 114 (9), 1374. doi:10.1289/ehp.9169
Lim, D., Kim, T., and Jung, H., 2018. Fine-grained particulate matter prediction using long
short-term memory on vehicle IoT platform. In: International Conference on Future Information
& Communication Engineering. vol. 10, Pattaya, Thailand, 295–296.
Ma, Z., et al., 2014. Estimating ground-level PM2.5 in China using satellite remote sensing.
Environmental Science & Technology, 48 (13), 7436–7444. doi:10.1021/es5009399
Mead, M.I., et al., 2013. The use of electrochemical sensors for monitoring urban air quality in
low-cost, high-density networks. Atmospheric Environment, 70, 186–203. doi:10.1016/j.
atmosenv.2012.11.060
Mei, C.L. and Wang, N., 2012. Modern regression analysis methods. Beijing: Science Press.
Moltchanov, S., et al., 2015. On the feasibility of measuring urban air pollution by wireless
distributed sensor networks. Science of the Total Environment, 502, 537–547. doi:10.1016/j.
scitotenv.2014.09.059
Montero, J.M. and Fernández-Avilés, G., 2018. Functional Kriging prediction of atmospheric particulate
matter concentrations in Madrid, Spain: is the new monitoring system masking potential public
health problems? Journal of Cleaner Production, 175, 283–293. doi:10.1016/j.jclepro.2017.12.041
Montero, J.M., et al., 2015. Spatial and spatio-temporal geostatistical modeling and Kriging. Vol. 998.
West Sussex, UK: John Wiley & Sons.
Peng, C.K., et al., 1994. Mosaic organization of DNA nucleotides. Physical Review E, 49 (2),
1685–1689. doi:10.1103/PhysRevE.49.1685
Plocoste, T., Calif, R., and Jacoby-Koaly, S., 2017. Temporal multiscaling characteristics of particulate
matter PM10 and ground-level ozone O3 concentrations in Caribbean region. Atmospheric
Environment, 169, 22–35. doi:10.1016/j.atmosenv.2017.08.068
Pollice, A. and Lasinio, G.J., 2010. Spatiotemporal analysis of the PM10 concentration over the
Taranto area. Environmental Monitoring and Assessment, 162 (1–4), 177–190. doi:10.1007/
s10661-009-0779-y
San Jose, R., Karatzas, K., and Perez, J., 2008. Air quality modeling. Ecological Models, 1, 111–123.
Shaddick, G., et al., 2018. Data integration model for air quality: A hierarchical approach to the
global estimation of exposures to ambient air pollution. Journal of the Royal Statistical Society:
Series C (Applied Statistics), 67 (1), 231–253. doi:10.1111/rssc.12227
Van Donkelaar, A., et al., 2010. Global estimates of ambient fine particulate matter concentrations
from satellite-based aerosol optical depth: development and application. Environmental Health
Perspectives, 118 (6), 847. doi:10.1289/ehp.0901623
Van Donkelaar, A., et al., 2016. Global estimates of fine particulate matter using a combined
geophysical-statistical method with information from satellites, models, and monitors.
Environmental Science & Technology, 50 (7), 3762–3772. doi:10.1021/acs.est.5b05833
Varotsos, C., Ondov, J., and Efstathiou, M., 2005. Scaling properties of air pollution in Athens,
Greece and Baltimore, Maryland. Atmospheric Environment, 39 (22), 4041–4047. doi:10.1016/j.
atmosenv.2005.03.024
Wang, J. and Christopher, S.A., 2003. Intercomparison between satellite-derived aerosol optical
thickness and PM2.5 mass: implications for air quality studies. Geophysical Research Letters, 30
(21), 2095. doi:10.1029/2003GL018174
Williams, D.E., et al., 2013. Validation of low-cost ozone measurement instruments suitable for use
in an air-quality monitoring network. Measurement Science and Technology, 24 (6), 065803.
doi:10.1088/0957-0233/24/6/065803
Wong, D.W., Yuan, L., and Perlin, S.A., 2004. Comparison of spatial interpolation methods for the
estimation of air quality data. Journal of Exposure Science and Environmental Epidemiology, 14
(5), 404. doi:10.1038/sj.jea.7500338
World Health Organization, 2016. Ambient air pollution: A global assessment of exposure and burden
of disease. Geneva, Switzerland: World Health Organization.
https://www.who.int/phe/publications/air-pollution-global-assessment/en/
Wu, Z., et al., 2007. On the trend, detrending, and variability of nonlinear and nonstationary time
series. Proceedings of the National Academy of Sciences, 104 (38), 14889–14894. doi:10.1073/
pnas.0701020104
Wu, Z. and Huang, N., 2009. Ensemble empirical mode decomposition: A noise-assisted data analysis
method. Advances in Adaptive Data Analysis, 1 (1), 1–41. doi:10.1142/S1793536909000047
Yan, N. and Mei, C.L., 2014. A two-step local smoothing approach for exploring spatio-temporal
patterns with application to the analysis of precipitation in the mainland of China during
1986–2005. Environmental and Ecological Statistics, 21 (2), 373–390. doi:10.1007/s10651-013-
0259-y
Yanosky, J.D., et al., 2008. Spatio-temporal modeling of chronic PM10 exposure for the nurses’
health study. Atmospheric Environment, 42 (18), 4047–4062. doi:10.1016/j.
atmosenv.2008.01.044
Zheng, C., et al., 2017. Analysis of influential factors for the relationship between PM2.5 and AOD in
Beijing. Atmospheric Chemistry and Physics, 17 (21), 13473. doi:10.5194/acp-17-13473-2017
Zheng, Y., et al., 2015. Forecasting fine-grained air quality based on big data. In: Proceedings of the 21th
ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Sydney, NSW: ACM,
2267–2276.
Zhou, Y. and Leung, Y., 2010a. Empirical mode decomposition and long-range correlation analysis
of sunspot time series. Journal of Statistical Mechanics, 2010 (12), P12006. doi:10.1088/1742-
5468/2010/12/P12006
Zhou, Y. and Leung, Y., 2010b. Multifractal temporally weighted detrended fluctuation analysis
and its application in the analysis of scaling behavior in temperature series. Journal of Statistical
Mechanics, 2010 (06), P06021. doi:10.1088/1742-5468/2010/06/P06021
Zhou, Y., Leung, Y., and Ma, J.M., 2013. Empirical study of the scaling behavior of the
amplitude–frequency distribution of the Hilbert–Huang transform and its application in sunspot time series
analysis. Physica A, 392 (6), 1336–1346. doi:10.1016/j.physa.2012.11.055
Zhu, J.Y., Sun, C., and Li, V.O., 2015. Granger-causality-based air quality estimation with
spatio-temporal (ST) heterogeneous big data. In: Computer Communications Workshops
(INFOCOM WKSHPS), 2015 IEEE Conference on. Hong Kong, China: IEEE, 612–617.
Zhu, J.Y., Sun, C., and Li, V.O., 2017. An extended spatio-temporal Granger causality model for air
quality estimation with heterogeneous urban big data. IEEE Transactions on Big Data, 3 (3),
307–319. doi:10.1109/TBDATA.2017.2651898