
International Journal of Geographical Information Science

ISSN: 1365-8816 (Print) 1362-3087 (Online) Journal homepage: https://www.tandfonline.com/loi/tgis20

Integration of air pollution data collected by mobile sensors and ground-based stations to
derive a spatiotemporal air pollution profile of a city

Yee Leung, Yu Zhou, Ka-Yu Lam, Tung Fung, Kwan-Yau Cheung, Taehong Kim
& Hanmin Jung

To cite this article: Yee Leung, Yu Zhou, Ka-Yu Lam, Tung Fung, Kwan-Yau Cheung, Taehong
Kim & Hanmin Jung (2019) Integration of air pollution data collected by mobile sensors and
ground-based stations to derive a spatiotemporal air pollution profile of a city, International
Journal of Geographical Information Science, 33:11, 2218-2240, DOI: 10.1080/13658816.2019.1633468

To link to this article: https://doi.org/10.1080/13658816.2019.1633468

Published online: 28 Jun 2019.

INTERNATIONAL JOURNAL OF GEOGRAPHICAL INFORMATION SCIENCE
2019, VOL. 33, NO. 11, 2218–2240
https://doi.org/10.1080/13658816.2019.1633468

RESEARCH ARTICLE

Integration of air pollution data collected by mobile sensors and ground-based stations to
derive a spatiotemporal air pollution profile of a city
Yee Leung (a,b), Yu Zhou (b), Ka-Yu Lam (b), Tung Fung (a,b), Kwan-Yau Cheung (b,c),
Taehong Kim (d) and Hanmin Jung (e)

(a) Department of Geography and Resource Management, The Chinese University of Hong Kong, Shatin,
Hong Kong; (b) Institute of Future Cities, The Chinese University of Hong Kong, Shatin, Hong Kong;
(c) Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin,
Hong Kong; (d) Convergence Service Center, Korea Institute of Oriental Medicine, Daejeon, Republic
of Korea; (e) Future Medicine Division, Korea Institute of Science and Technology Information,
Daejeon, Republic of Korea

ABSTRACT
Air pollution has become a serious environmental problem causing severe consequences in our
ecology, climate, health, and urban development. Effective and efficient monitoring and
mitigation of air pollution require a comprehensive understanding of the air pollution
process through a reliable database carrying important information about the spatiotemporal
variations of air pollutant concentrations at various spatial and temporal scales. Traditional
analysis suffers from the severe insufficiency of data collected by only a few stations. In this
study, we propose a rigorous framework for the integration of air pollutant concentration data
coming from the ground-based stations, which are spatially sparse but temporally dense, and
mobile sensors, which are spatially dense but temporally sparse. Based on the integrated
database which is relatively dense in space and time, we then estimate air pollutant
concentrations for given location and time by applying a two-step local regression model to
the data. This study advances the frontier of basic research in air pollution monitoring via
the integration of station and mobile sensors and sets up the stage for further research on
other spatiotemporal problems involving multi-source and multi-scale information.

ARTICLE HISTORY
Received 12 November 2018
Accepted 14 June 2019

KEYWORDS
Air pollution; ground-based station data; integration; mobile sensor data; spatiotemporal variation

1. Introduction
Air pollution has become a life-threatening hazard that could produce severe conse-
quences, and hence undermine the sustainable development of cities. It has been reported
that exposure to air pollution can pose a significant threat to human health or even cause
death (Brauer et al. 2015, WHO 2016, Shaddick et al. 2018). To find an effective solution to
the air pollution problem, it is necessary to have a comprehensive understanding of the air
pollution process. Such an understanding essentially depends on reliable records that can
depict the variations in air pollution in time and space at various scales. For urban environ-
ments, the most important spatial scales are those within a city.

CONTACT Yee Leung yeeleung@cuhk.edu.hk


© 2019 Informa UK Limited, trading as Taylor & Francis Group

Usually, air quality is monitored by ground-based stations, hereinafter called ground
stations. However, there are only very few stations in most cities, for example, 16
stations in Hong Kong, P.R. China (https://cd.epic.epd.gov.hk/EPICDI/air/station/?lang=
en) and 13 stations in Daegu, Korea (www.airkorea.or.kr/index in Korean and Lim et al.
(2018)). In fact, the areas of Hong Kong and Daegu are 2.755 × 10⁹ m²
(https://www.landsd.gov.hk/mapping/en/publications/total.htm) and 8.83 × 10⁸ m² (Lim et al. 2018),
respectively, which means that the information obtained by one single station has to be
used to represent the air quality over a very large space, approximately 1.72 × 10⁸ m²
and 6.8 × 10⁷ m² on average in Hong Kong and Daegu, respectively (the total area divided by
the number of stations). The true information
about air quality at the locations of interest could be severely compromised if they
are considerable distances away from their nearest stations. Therefore, it will not be
possible to accurately obtain a reliable spatiotemporal distribution of air pollutant
concentrations, with only a few stations.
To solve this problem, other possible data sources, which could provide comple-
mentary information about the air pollution process, need to be used. Remote sensing
is a common technique for the monitoring of air quality (Wang and Christopher 2003,
Gupta et al. 2006, Lee et al. 2009, Ma et al. 2014, Li et al. 2015). In addition, there are
some emission/chemical-transport models for the simulation of air pollution (San Jose
et al. 2008, Barrett et al. 2012). However, remote sensing provides only an indirect
indication of the air pollution situation with a relatively low spatiotemporal resolution,
e.g. the estimated PM2.5 based on aerosol optical depth (AOD) has a spatial resolution
ranging from hundreds of metres to several thousand metres at the daily scale. In addition,
remote sensing cannot provide information about the near-surface air pollution
and the relationship between AOD and PM2.5, which varies greatly with location
because of the difference in local meteorological conditions, land cover or use,
planetary boundary layer height, and the vertical structure of aerosol distribution
(Van Donkelaar et al. 2010, Zheng et al. 2017). With regard to the simulation models,
their accuracy is limited by the assumptions made and the computational power we
have to fully capture air pollutant concentrations at the scale of cities (Grimmond and
Oke 1999). It is not uncommon to see a large divergence between the model simula-
tion results and real-life observations (Leung et al. 2018). Several methods have been
developed for estimating the air pollutant concentrations at the locations of interest
from the data collected by a limited number of ground stations. Some of them employ
indirect information of other relevant factors, for example, the spatial-temporal
Granger causality models constructed on the basis of urban dynamics involving
relevant factors in meteorology, traffic, geography, and air pollution process (Zhu
et al. 2015, 2017), and the land use regression, in which land use is considered as an
explanatory variable in the regression to provide additional information for the esti-
mation of air quality (Briggs et al. 2000, Hoek et al. 2001, Brauer et al. 2003, Jerrett et al.
2007, Beckerman et al. 2013, Van Donkelaar et al. 2016). The methods based on indirect
data might be biased unless we have a complete and clear picture with all key drivers
recognized (Beckerman et al. 2013). There are also some methods that work directly on
air pollutant data, such as the spatial interpolation methods (Wong et al. 2004, Liao
et al. 2006, Jha et al. 2011). Since air pollution data involve both time and space,
spatiotemporal Kriging has been applied to estimate air pollutant concentration at
a given time and location without observations (Yanosky et al. 2008, Pollice and
Lasinio 2010, Liang and Kumar 2013). Because of the high computation cost of
spatiotemporal Kriging (Yanosky et al. 2008), functional Kriging has been developed
and applied to the estimation of air pollutant concentration (Henao 2009, Montero
et al. 2015, Montero and Fernández-Avilés 2018). In addition to the Kriging methods,
regression and its variations such as local smoothing or weighted spatial average have
been applied to the estimation of air quality at unmonitored locations. It has been
shown by a study that their performances are similar to that of Kriging (Wong et al.
2004). However, these direct methods still suffer from the insufficiency of direct air
pollution observation records because of the limited number of ground stations. To
alleviate such insufficiency and shortcomings of direct observation, some ancillary
sensors have been introduced as data sources that act complementary to the ground
stations (Devarakonda et al. 2013). Ancillary sensors are usually low-cost and mobile;
therefore, by deploying a large number of such sensors, a fine-grained air quality
monitoring network can be formed based on the Internet of Things paradigm
(Devarakonda et al. 2013, Firculescu and Tudose 2015, Brynda et al. 2016, Biondi
et al. 2017, Kersting et al. 2017). After calibration, these mobile sensors can obtain
reliable information about air quality (Mead et al. 2013, Williams et al. 2013,
Moltchanov et al. 2015). Among the different types of mobile sensors, those carried
by vehicles such as buses and taxis are capable of directly and accurately monitoring
air quality over a large area (Devarakonda et al. 2013, Firculescu and Tudose 2015,
Biondi et al. 2017, Kersting et al. 2017). They can complement the station observations
so that a sufficient number of observations in space and time can be made to derive
a more accurate spatiotemporal air pollution profile of a region, such as a city. For
example, OpenSense (Aberer et al. 2010), Air Quality Egg (http://airqualityegg.com),
and MESSAGE (Mobile Environmental Sensing System Across Grid Environments,
http://research.cs.ncl.ac.uk/message/) are some projects that use mobile sensors to
improve the monitoring of air quality. Cities like Sharjah (Al-Ali et al. 2010), Prague
(Brynda et al. 2016), Catania (Biondi et al. 2017), Bucharest (Firculescu and Tudose
2015), and Daegu (Kersting et al. 2017) have already started such mobile-sensor net-
works for air pollution monitoring.
With a limited number of ground stations and a large number of mobile sensors
collecting air pollution concentration data over a city, an outstanding but important
issue is the integration of these two types of data to construct a composite and more
complete database for subsequent analysis. Nevertheless, these two types of data are of
different spatial and temporal resolutions/scales. In general, ground stations regularly
take measurements at fixed locations and release the mean value over some duration of
time, usually 1 h. In contrast, mobile sensors carried by vehicles only collect the data
along their routes. Measurements are taken only at the time when they pass through
a point in space. How to effectively integrate the station data, which are spatially sparse
but temporally dense, and the mobile-sensor data, which are spatially dense but temporally
sparse, has thus become a challenge. The aim of this study is to construct
a rigorous framework to integrate the data collected by mobile sensors and ground
stations in an urban environment. By using this framework, we can estimate the air
quality at any point in space and time and obtain the spatiotemporal profile of the air
pollution of a city. This framework can be extended to integrate other forms of data.

2. Materials and methods


2.1. Data
Daegu is a Korean city located approximately between 35.61° N and 36.03° N latitudes
and 128.33° E and 128.77° E longitudes. In Daegu, there are 13 ground-based stations
that release the hourly average concentration of air pollutants such as PM2.5, PM10, CO,
NO2, and SO2. (The data are open to the public and can be accessed from the open APIs of
air-Korea https://www.airkorea.or.kr/, air-Daegu https://air.daegu.go.kr/, and K-weather
http://www.kweather.co.kr/.) At the end of June 2017, Daegu started deploying mobile
sensors mounted on 44 taxis, which take measurements of air pollutant concentrations in the
city on the run. The selected model of the mobile sensor is SEN0177, manufactured by
‘Samyoung S&C’ in Korea (http://www.samyoungsnc.com). These mobile sensors measure
the concentration of the same air pollutants as those measured by the ground
stations but use a different recording frequency, i.e. every 10 s along their routes. In
this study, we take PM2.5 as an illustrative example to show how to construct
a framework for the integration of measurements obtained from these mobile sensors
and ground stations, using Daegu as the study area. The analysis of other air pollutants can
be similarly performed using the same framework. The study period is July 2017. During
this period, 24 × 31 = 744 values were released by each ground station, and 2,182,578
records were collected by the mobile sensors, with the number of records varying
among individual sensors because of the difference in routes taken by the taxis.

2.2. Methods
To facilitate our discussion, we outline our study framework in Figure 1 and introduce some
notations, especially for the mobile-sensor records. We first average the mobile-sensor
records to remove some redundancy and then unify the station and mobile-sensor records.
On the basis of the unified records, two-step local regression is applied to estimate the air
pollutant concentration at the target locations and times. The two-step local regression
performs local regression for the given locations at all times and then performs regression
on the obtained time series to get the final estimation. Thus, the spatial pattern of air
pollution for the entire city can be obtained for a specific time. With regard to the pattern for
a specific location, a time series of records can be estimated, and its components and auto-
correlation are subjected to empirical mode decomposition and detrended fluctuation
analysis in order to study the temporal behaviors and validate the estimation results.
Based on the flowchart shown in Figure 1, we can quantitatively profile the spatiotemporal
dynamics of the air pollution of a place. The general idea for the integration of the air
pollutant concentrations recorded by mobile sensors and ground stations is as follows:
firstly, we remove the data redundancy by averaging the records collected by each sensor
within a short time period to make them comparable with the ground stations; secondly, for
each given time stamp, we use these new records and the ground station records at the
same time to estimate the air pollutant concentration at the given location; thirdly,
repeating this for all time stamps gives a series of estimates at the given location, which
we smooth; the smoothed value at the given time stamp is then taken as the
estimate at the given location and time stamp.

Figure 1. Flowchart of the proposed framework for the spatiotemporal integration and validation of
air pollution data obtained from the ground-based stations and mobile sensors.

With regard to the symbols, we use $(u, v)$ and $t$ to indicate the location and time
stamp for each record, respectively. Specifically, the station records form a dataset
$\{s_i(u_i, v_i, t_k)\}_{i=1}^{M}$, whereas the mobile-sensor records form another dataset
$\{m_j(u_{j,l^{(j)}}, v_{j,l^{(j)}}, t_{j,l^{(j)}})\}_{j=1}^{N}$. Here, the subscripts $i$, $j$,
and $l^{(j)}$ denote the $i$th station, the $j$th taxi, and the $l$th location of the $j$th taxi,
respectively, with $M = 13$ and $N = 44$. Regarding the station records $s_i(u_i, v_i, t_k)$,
their locations are fixed for all $T = 744$ time stamps $t_k$, with the interval between two
successive time stamps, $\delta t_k$, equaling 1 h. With respect to the records of mobile sensors
$m_j(u_{j,l^{(j)}}, v_{j,l^{(j)}}, t_{j,l^{(j)}})$, their locations depend on the time
stamp $t_{j,l^{(j)}}$ and the taxi ID $j$. The number of time stamps $t_{j,l^{(j)}}$ may be
different for different taxis. The interval between two consecutive time stamps,
$\delta t_{j,l^{(j)}}$, is 10 s. We limit the locations of our records to the area between
35.61° N and 36.03° N latitudes and 128.33° E and 128.77° E longitudes, i.e. Daegu city.

2.2.1. Integration of the ground station and mobile-sensor records


Obviously, the air quality of two locations separated by a distance of 10 s of driving time
should be almost the same, which means that the mobile-sensor records taken by a taxi
are highly redundant within a certain period of time. To deal with such redundancy,
locations with the time period $|t_{j,l_1^{(j)}} - t_{j,l_2^{(j)}}| \le \delta_t$ can be treated
the same. We then divide $[t_{k-1}, t_k]$ into $n$ segments of equal length $\delta_t$. Among
$m_j(u_{j,l^{(j)}}, v_{j,l^{(j)}}, t_{j,l^{(j)}})$, we select those belonging to the $q$th segment
($q \in \{1, 2, \ldots, n = \delta t_k/\delta_t\}$), i.e. those with
$t_{j,l^{(j)}} \in [t_{k-1} + (q-1)\delta_t,\ t_{k-1} + q\delta_t]$. Based on these selected
records, we obtain a new record
$\tilde{m}_j = \langle m_j(u_{j,l^{(j)}}, v_{j,l^{(j)}}, t_{j,l^{(j)}}) \rangle_{t_{j,l^{(j)}} \in [t_{k-1}+(q-1)\delta_t,\ t_{k-1}+q\delta_t]}$,
where $\langle \cdot \rangle$ denotes the average operation.
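
The averaging step can be sketched in Python as follows. This is a minimal illustration rather than the authors' code: the record layout `(t, u, v, value)`, the function name `average_segments`, time in seconds, and the use of half-open segments (so that a record on a boundary is counted once) are all assumptions made for the example. The representative location follows the rule stated in the paper, i.e. the record whose time stamp is closest to the mean time stamp of the segment.

```python
from collections import defaultdict


def average_segments(records, t_start, hour_len=3600.0, seg_len=600.0):
    """Average one taxi's records within each segment of one station hour.

    records : iterable of (t, u, v, value) tuples, t in seconds (hypothetical layout)
    t_start : start of the hour, i.e. t_{k-1}
    Returns {q: (u, v, mean_value)} for every non-empty segment q = 1..n.
    """
    n = int(round(hour_len / seg_len))  # number of segments per hour
    buckets = defaultdict(list)
    for t, u, v, value in records:
        offset = t - t_start
        if 0 <= offset < hour_len:
            q = min(int(offset // seg_len) + 1, n)  # 1-based segment index
            buckets[q].append((t, u, v, value))

    averaged = {}
    for q, seg in buckets.items():
        mean_value = sum(r[3] for r in seg) / len(seg)
        mean_t = sum(r[0] for r in seg) / len(seg)
        # Representative location: record closest in time to the mean time stamp.
        _, u, v, _ = min(seg, key=lambda r: abs(r[0] - mean_t))
        averaged[q] = (u, v, mean_value)
    return averaged
```

With a 10 min segment length, each taxi contributes at most six averaged records per station hour.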
Because the temporal resolution of the ground stations is 1 h, we assume that there is
little change within an hour so that the time stamp of this newly-generated $\tilde{m}_j$ is
reset at $t_k$. The location of $\tilde{m}_j$ is set at $(u_{j,l^{(j)}}, v_{j,l^{(j)}})$, with
$l^{(j)}$ corresponding to the time stamp $t_{j,l^{(j)}}$ closest to the mean value of those
time stamps in $[t_{k-1} + (q-1)\delta_t,\ t_{k-1} + q\delta_t]$. In fact,
there could be some systematic differences between the mobile sensor and ground
station records because they use different sensors. We found that for each mobile
sensor, the relationship between its records and those obtained by the ground stations
within a range $\delta_d$ is highly linear and insensitive to the range
$\delta_d \in [500\ \mathrm{m}, 1000\ \mathrm{m}]$.
Generally, more data can lead to more reliable statistical analysis. Therefore, we
calibrated the mobile-sensor records using linear regression, which was obtained on the
basis of the linear relationship with $\delta_d = 1000$ m.
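
A linear calibration of this kind might look like the sketch below. It assumes that the pairs have already been built, i.e. each mobile-sensor value has been matched with the value of a ground station within 1000 m at the same hour; the function names `fit_calibration` and `calibrate` are hypothetical, and the paper does not state which fitting routine was used.

```python
import numpy as np


def fit_calibration(sensor_values, station_values):
    """Fit station ~ a * sensor + b by ordinary least squares for one mobile sensor."""
    a, b = np.polyfit(np.asarray(sensor_values, dtype=float),
                      np.asarray(station_values, dtype=float), deg=1)
    return float(a), float(b)


def calibrate(sensor_values, a, b):
    """Map raw mobile-sensor values onto the ground-station scale."""
    return a * np.asarray(sensor_values, dtype=float) + b
```

One line per sensor is fitted here because the paper reports the linear relationship "for each mobile sensor".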
To indicate the dependence of the locations of those newly-generated records $\tilde{m}_j$ on
$k$, one more subscript $k$ was employed in the form $(u_{j,l^{(j)},k}, v_{j,l^{(j)},k})$. For each
$t_k$, the number of new records $M_k$ should be no more than $N \times n$ (at most one averaged
record per taxi and segment). To simplify the notations, we treat each new record as a unique
record of a new station $\tilde{j}$ with the corresponding location
$(u_{j,l^{(j)},k}, v_{j,l^{(j)},k})$ and the time stamp $t_k$. Re-denoting the location as
$(u_{\tilde{j}}, v_{\tilde{j}})$, all newly-generated records at $t_k$ can be re-expressed as
$\{\tilde{m}_{\tilde{j}}(u_{\tilde{j}}, v_{\tilde{j}}, t_k)\}_{\tilde{j}=1}^{M_k}$.
With these notations, given a time stamp $t_k$ we have a set of records
$\{s_i(u_i, v_i, t_k)\}_{i=1}^{M} \cup \{\tilde{m}_{\tilde{j}}(u_{\tilde{j}}, v_{\tilde{j}}, t_k)\}_{\tilde{j}=1}^{M_k}$.
If we treat all records as equivalent and unify the notations as $Y_{i,k}$, then for given $t_k$
we can re-express the set of records as $\{Y_{i,k}(u_{i,k}, v_{i,k}, t_k)\}_{i=1}^{n_k}$ with
$n_k = M + M_k$. In this sense, the dataset we are going to study is
$Y = \bigcup_{k=1}^{T} \{Y_{i,k}(u_{i,k}, v_{i,k}, t_k)\}_{i=1}^{n_k}$, which is standard
longitudinal data in statistics. Thus, the construction of the dataset depends on only one
parameter $\delta_t$, which can be empirically determined as discussed below.

2.2.2. Two-step local regression


On the basis of Y, we estimated the air pollutant concentration for a given location and
time stamp. Interpolation and regression are two commonly used methods for such
estimation. As methods that work directly on the air pollution data, both interpolation,
like Kriging, and regression can estimate air pollutant concentration at the time and
location of interest. In fact, both have been successfully applied in recent studies
mentioned above (Wong et al. 2004, Liao et al. 2006, Yanosky et al. 2008, Henao 2009,
Pollice and Lasinio 2010, Jha et al. 2011, Liang and Kumar 2013, Montero
et al. 2015, Montero and Fernández-Avilés 2018). However, in contrast to interpolation,
regression provides a more flexible framework for involving indirect variables, including
meteorological variables such as temperature, humidity, wind speed, direction, etc.
(Zheng et al. 2015, He and Huang 2018); land use (Briggs et al. 2000, Hoek et al. 2001,
Brauer et al. 2003, Jerrett et al. 2007, Beckerman et al. 2013, Van Donkelaar et al. 2016);
and social and economic data (Shaddick et al. 2018), in the estimation. It not only serves
the interpolation function but can also offer a powerful way for air pollution prediction
via indirect information. We can explore the roles played by indirect information under
the framework of regression and test whether they are significant explanatory variables
(Mei and Wang 2012). Hence, we choose to use regression for the estimation of air
pollutant concentration at given time and location, with the view to include other
relevant variables in further study. With such purpose in mind, we focus on the integra-
tion of data collected by mobile sensors and ground stations within the framework of
regression in the present study. Because parametric regression needs a predetermined
function to perform the analysis, to make our estimation more adaptive and data-driven,
we employed non-parametric regression. The local variations with respect to time and
space are important for understanding the air pollution processes. We, therefore,
employed a local regression model, namely two-step local regression, to capture the
variations in space and time.
To facilitate our discussion, we first briefly describe the two-step local regression
model in Yan and Mei (2014). The purpose of this model is to explore the variations of
the variable of interest, i.e. the PM2.5 concentration, $Y$, in our study, with respect to time
and space. Essentially, the two-step local regression model is a method that combines
geographically weighted regression (GWR) and local smoothing for handling both
spatial and temporal information (Yan and Mei 2014). Specifically, we attempt to identify
a function $f$ for the estimation of $Y_{i,k}$ as follows:

$$Y_{i,k} = f(u_{i,k}, v_{i,k}, t_k) + \epsilon_{i,k}, \tag{1}$$

where $\epsilon_{i,k}$ is the error term with zero mean ($i = 1, 2, \ldots, n_k$; $k = 1, 2, \ldots, T$).
Given a location of interest and a time stamp $(u_0, v_0, t_0)$, we can estimate $Y_{0,0}$ as
$f(u_0, v_0, t_0)$ by a two-step local model and denote it as $\tilde{Y}_{0,0}$. To identify the
function $f$, we first estimate $Y_{0,k}$ at $(u_0, v_0, t_k)$ for each time stamp $t_k$ by the
locally linear GWR on the basis of $Y_{i,k}$ as

$$Y_{i,k} = f_1(u_{i,k}, v_{i,k}, t_k) + \epsilon^{(1)}_{i,k}, \tag{2}$$

and denote the estimate by $\tilde{Y}_{0,k}$. Second, the estimate $\tilde{Y}_{0,0}$ of $Y_{0,0}$
could be further obtained by the local linear smoothing method applied to the time series
$\{\tilde{Y}_{0,k}\}_{k=1}^{T}$ as

$$\tilde{Y}_{0,k} = f_2(u_0, v_0, t_k) + \epsilon^{(2)}_{0,k}. \tag{3}$$

Then, we can get the estimated $f$ at the given $(u_0, v_0, t_0)$ as the estimate of
$f_2(u_0, v_0, t_0)$, which is the desired $\tilde{Y}_{0,0}$.
The implementation of the first step requires the assumption of continuity of the
partial derivatives $\partial f_1/\partial u$ and $\partial f_1/\partial v$ of $f_1(u, v, t)$ with
respect to $u$ and $v$. Then, $\tilde{Y}_{0,k}$ and the estimates of its partial derivatives can
be determined by minimizing

$$\sum_{i=1}^{n_k} \left[ Y_{i,k} - Y_{0,k} - \frac{\partial f_1}{\partial u}(u_0, v_0, t_k)\,(u_{i,k} - u_0) - \frac{\partial f_1}{\partial v}(u_0, v_0, t_k)\,(v_{i,k} - v_0) \right]^2 K\!\left(\frac{d_0^{(ik)}}{h_k}\right) \tag{4}$$

with respect to $Y_{0,k}$, $\partial f_1/\partial u\,(u_0, v_0, t_k)$, and
$\partial f_1/\partial v\,(u_0, v_0, t_k)$. Here, $K(\cdot)$ is a kernel function, $d_0^{(ik)}$ is
the Euclidean distance between $(u_0, v_0)$ and $(u_{i,k}, v_{i,k})$, and $h_k$ is the bandwidth
for given $t_k$, which can be determined by the cross-validation procedure. With

$$X(u_0, v_0, t_k) = \begin{pmatrix} 1 & u_{1,k} - u_0 & v_{1,k} - v_0 \\ 1 & u_{2,k} - u_0 & v_{2,k} - v_0 \\ \vdots & \vdots & \vdots \\ 1 & u_{n_k,k} - u_0 & v_{n_k,k} - v_0 \end{pmatrix}, \tag{5}$$

$$Y_k = \begin{pmatrix} Y_{1,k} \\ Y_{2,k} \\ \vdots \\ Y_{n_k,k} \end{pmatrix}, \tag{6}$$

and

$$W(u_0, v_0, t_k) = \mathrm{diag}\!\left( K\!\left(\frac{d_0^{(1k)}}{h_k}\right), K\!\left(\frac{d_0^{(2k)}}{h_k}\right), \ldots, K\!\left(\frac{d_0^{(n_k k)}}{h_k}\right) \right), \tag{7}$$

we can define

$$Q(u_0, v_0, t_k) = \left[ X^T(u_0, v_0, t_k)\, W(u_0, v_0, t_k)\, X(u_0, v_0, t_k) \right]^{-1} X^T(u_0, v_0, t_k)\, W(u_0, v_0, t_k), \tag{8}$$

where the superscript $T$ indicates the transpose of the corresponding matrix. Letting
$e^T = (1, 0, 0)$, we can then obtain the estimate $\tilde{Y}_{0,k}$ in the form of

$$\tilde{Y}_{0,k} = e^T Q(u_0, v_0, t_k)\, Y_k. \tag{9}$$
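
The first step, Eqs. (4)-(9), amounts to a weighted least-squares fit of a local plane around $(u_0, v_0)$. The sketch below is one way to write it with NumPy, under assumptions that are not in the paper: a Gaussian kernel, coordinates expressed in a common planar unit so that the Euclidean distance is meaningful, and a leave-one-out criterion as the cross-validation score. The names `gwr_local_linear` and `loo_cv_score` are hypothetical.

```python
import numpy as np


def gaussian_kernel(z):
    """Gaussian kernel; the kernel choice is an assumption of this sketch."""
    return np.exp(-0.5 * z ** 2)


def gwr_local_linear(u, v, y, u0, v0, h, kernel=gaussian_kernel):
    """Local linear GWR estimate of Y at (u0, v0) for one time stamp t_k."""
    u, v, y = (np.asarray(a, dtype=float) for a in (u, v, y))
    X = np.column_stack([np.ones_like(u), u - u0, v - v0])  # design matrix, Eq. (5)
    d = np.hypot(u - u0, v - v0)                            # distances d_0^(ik)
    w = kernel(d / h)                                       # diagonal of W, Eq. (7)
    XtW = X.T * w                                           # X^T W without building W
    beta = np.linalg.solve(XtW @ X, XtW @ y)                # Q Y_k, Eq. (8)
    return float(beta[0])                                   # e^T Q Y_k, Eq. (9)


def loo_cv_score(u, v, y, h):
    """Leave-one-out mean squared error for bandwidth h (one possible CV criterion)."""
    u, v, y = (np.asarray(a, dtype=float) for a in (u, v, y))
    keep = ~np.eye(len(y), dtype=bool)                      # row i drops observation i
    errors = [y[i] - gwr_local_linear(u[m], v[m], y[m], u[i], v[i], h)
              for i, m in enumerate(keep)]
    return float(np.mean(np.square(errors)))
```

A bandwidth $h_k$ could then be picked by evaluating `loo_cv_score` over a grid of candidate values and keeping the minimizer. Because the fit is locally linear, data that lie exactly on a plane are reproduced exactly, which is a convenient sanity check.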

Based on the series $\{\tilde{Y}_{0,k}\}_{k=1}^{T}$ obtained in the first step, we can apply local
linear smoothing to get the desired estimate $\tilde{Y}_{0,0}$. Because the location has been
fixed, we simplify $\tilde{Y}_{0,k}$, $\tilde{Y}_{0,0}$, and $f_2(u_0, v_0, t_k)$ as
$\tilde{Y}_k$, $\tilde{Y}_0$, and $f_2(t_k)$, respectively. Similar to the first
step, we assume that the derivative of $f_2(t)$ is continuous for $t \in [t_1, t_T]$, and denote
it as $\mathrm{d}f_2/\mathrm{d}t$. The estimate $\tilde{Y}_0$ can be calculated by minimizing

$$\sum_{k=1}^{T} \left[ \tilde{Y}_k - f_2(t_0) - \frac{\mathrm{d}f_2}{\mathrm{d}t}(t_0)\,(t_k - t_0) \right]^2 K\!\left(\frac{t_k - t_0}{r_0}\right) \tag{10}$$

with respect to $f_2(t_0)$ and $\mathrm{d}f_2/\mathrm{d}t\,(t_0)$. Here, $K(\cdot)$ and $r_0$ are
also the kernel function and the corresponding bandwidth. Using

$$M(t_0) = \begin{pmatrix} 1 & t_1 - t_0 \\ 1 & t_2 - t_0 \\ \vdots & \vdots \\ 1 & t_T - t_0 \end{pmatrix}, \tag{11}$$

$$Z_0 = \begin{pmatrix} \tilde{Y}_1 \\ \tilde{Y}_2 \\ \vdots \\ \tilde{Y}_T \end{pmatrix}, \tag{12}$$

and

$$W(t_0) = \mathrm{diag}\!\left( K\!\left(\frac{t_1 - t_0}{r_0}\right), K\!\left(\frac{t_2 - t_0}{r_0}\right), \ldots, K\!\left(\frac{t_T - t_0}{r_0}\right) \right), \tag{13}$$

we have

$$P_0(t_0) = \left[ M^T(t_0)\, W(t_0)\, M(t_0) \right]^{-1} M^T(t_0)\, W(t_0). \tag{14}$$

Denoting $(1, 0)$ as $e_2^T$, we have the estimate $\tilde{Y}_0$ as

$$\tilde{Y}_0 = e_2^T P_0(t_0)\, Z_0, \tag{15}$$

which is also the estimate of the desired $f(u_0, v_0, t_0)$.
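
The second step, Eqs. (10)-(15), is the one-dimensional counterpart of the first: a weighted straight-line fit in time around $t_0$. A minimal sketch follows, again assuming a Gaussian kernel (the paper does not name the kernel) and using the hypothetical function name `local_linear_smooth`; its input is the series of first-step estimates at the fixed location.

```python
import numpy as np


def local_linear_smooth(t, y, t0, r0, kernel=lambda z: np.exp(-0.5 * z ** 2)):
    """Local linear estimate of f2(t0) from the series y observed at times t."""
    t, y = np.asarray(t, dtype=float), np.asarray(y, dtype=float)
    M = np.column_stack([np.ones_like(t), t - t0])  # design matrix, Eq. (11)
    w = kernel((t - t0) / r0)                       # diagonal of W(t0), Eq. (13)
    MtW = M.T * w                                   # M^T W without building W
    coef = np.linalg.solve(MtW @ M, MtW @ y)        # P0(t0) Z0, Eq. (14)
    return float(coef[0])                           # e2^T P0(t0) Z0, Eq. (15)
```

The second coefficient, `coef[1]`, is the estimate of the derivative $\mathrm{d}f_2/\mathrm{d}t$ at $t_0$; it is discarded here because only the level is needed.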

2.2.3. Empirical mode decomposition


To evaluate the performance of the two-step local regression model, we estimated the
PM2.5 concentrations at the Daemyeong district station and compared them with the
observed records taken at the station. Usually, air pollution is affected by factors such as
human activities and climate dynamics that dominate different periods. Therefore, we
employed empirical mode decomposition (EMD), an effective and adaptive method, to
decompose the observed and estimated series for more detailed comparison. The obtained
EMD components are called the intrinsic mode functions (IMFs), which play a key role in
EMD analysis. An IMF is a function with two properties: 1) the number of extrema differs
from that of zero-crossings by no more than one; and 2) the envelope of either the local
maxima or local minima has zero mean (Huang et al. 1998, Wu et al. 2007).
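
The first IMF property is easy to check numerically. The sketch below counts local extrema (sign changes of the slope) and zero crossings (sign changes of the value) and compares the two counts; the function names are hypothetical, and the second property (zero-mean envelopes) is not tested here.

```python
import numpy as np


def count_extrema(x):
    """Number of interior local extrema, counted as sign changes of the slope."""
    d = np.sign(np.diff(np.asarray(x, dtype=float)))
    d = d[d != 0]                        # ignore flat steps
    return int(np.sum(d[:-1] != d[1:]))


def count_zero_crossings(x):
    """Number of sign changes of the series itself."""
    s = np.sign(np.asarray(x, dtype=float))
    s = s[s != 0]                        # ignore exact zeros
    return int(np.sum(s[:-1] != s[1:]))


def satisfies_imf_count_condition(x):
    """IMF property 1: extrema and zero-crossing counts differ by at most one."""
    return abs(count_extrema(x) - count_zero_crossings(x)) <= 1
```

A pure sine wave passes this check, whereas the same wave shifted upwards so that it never crosses zero does not.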
Given a time series, $X = \{x_t\}$, the EMD method can be briefly summarized as follows
(Huang et al. 1998, Wu et al. 2007):

(1) Construct the upper and lower envelopes, $\mathrm{ENV}_{\max}$ and $\mathrm{ENV}_{\min}$, on the
basis of local maxima and minima, respectively.
(2) Calculate the average of the upper and lower envelopes,
$M = (\mathrm{ENV}_{\max} + \mathrm{ENV}_{\min})/2$, and the difference between $X$ and $M$,
i.e. $h = X - M$.
(3) If $h$ is not an IMF, iteratively repeat the above two steps on $h$ until the
envelopes have zero mean under certain stopping criteria or $h$ becomes an IMF. Such $h$
is taken as the first component and is denoted as $\mathrm{IMF}_1$.
(4) Treat the residue, $X - \mathrm{IMF}_1$, as a new series and perform the above steps to
extract $\mathrm{IMF}_2$.
(5) Repeat the iterative procedure until no more IMFs can be extracted.

Denoting the number of extracted IMFs as $k$, the implementation of EMD decomposes
$X$ as $X = \sum_{i=1}^{k} \mathrm{IMF}_i + r$, where $r$ is the residual from which no more IMFs
can be extracted. However, EMD suffers from the mode-mixing problem (Wu and Huang 2009).
To solve this problem, the ensemble EMD (EEMD) was proposed by Wu and Huang
(2009). The key modification of EMD is to add a white noise series to $X$ before performing
EMD and then obtain the EEMD IMFs by ensemble averaging the corresponding
EMD IMFs. The MATLAB code of EEMD employed in this study was downloaded from
http://rcada.ncu.edu.tw/.
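
To make steps (1)-(5) concrete, here is a deliberately simplified EMD sketch in Python. It is not the MATLAB EEMD code used in the study and it omits the noise-ensemble averaging of EEMD. Two simplifications are assumptions of this example: envelopes are built by linear interpolation through the extrema (standard implementations typically use cubic splines), and sifting stops when the mean envelope carries a small fraction `tol` of the signal energy. All function names are hypothetical.

```python
import numpy as np


def _local_extrema(x):
    """Indices of interior local maxima and minima."""
    d = np.diff(x)
    maxima = np.where((d[:-1] > 0) & (d[1:] <= 0))[0] + 1
    minima = np.where((d[:-1] < 0) & (d[1:] >= 0))[0] + 1
    return maxima, minima


def _envelope(x, idx):
    """Envelope through x[idx]; linear interpolation, held constant at both ends."""
    return np.interp(np.arange(len(x)), idx, x[idx])


def sift_one_imf(x, max_iter=50, tol=1e-3):
    """Steps (1)-(3): subtract the mean envelope until it is negligible."""
    h = x.copy()
    for _ in range(max_iter):
        maxima, minima = _local_extrema(h)
        if len(maxima) < 2 or len(minima) < 2:
            break
        mean_env = 0.5 * (_envelope(h, maxima) + _envelope(h, minima))
        negligible = np.sum(mean_env ** 2) <= tol * np.sum(h ** 2)
        h = h - mean_env
        if negligible:
            break
    return h


def emd(x, max_imfs=8):
    """Steps (4)-(5): peel off IMFs until the residual has too few extrema."""
    residual = np.asarray(x, dtype=float).copy()
    imfs = []
    for _ in range(max_imfs):
        maxima, minima = _local_extrema(residual)
        if len(maxima) < 2 or len(minima) < 2:
            break                      # no more IMFs can be extracted
        imf = sift_one_imf(residual)
        imfs.append(imf)
        residual = residual - imf
    return imfs, residual
```

Whatever the envelope method, the decomposition satisfies $X = \sum_i \mathrm{IMF}_i + r$ by construction, since each IMF is subtracted from the running residual.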
If the two-step local regression model gives reliable estimates, then the estimated series
should not only be highly correlated to the observed series, but the EEMD components
should also be similar to those of the observed series. EMD has been applied to characterize
many real-life processes, such as sunspots (Zhou and Leung 2010a, Zhou et al. 2013) and air
pollution process (Hu et al. 2013, Jiang and Bai 2018), along with time. In this study, we
evaluated the performance of the two-step local regression model by examining 1) if the
observed and estimated records have the same number of EMD components; and 2) if their
corresponding components are highly correlated.

2.2.4. Detrended fluctuation analysis


Autocorrelation is an important property with respect to the underlying dynamics of
a process (Gopikrishnan et al. 1999). If the sum of the autocorrelation $C(\tau)$ over time lags
$\tau$ ranging from a finite value to infinity diverges, then the corresponding process has
long-range correlation (Beran 1994). In the natural world, long-range correlation has
been found to be ubiquitous (Koscielny-Bunde et al. 1998, 2006, Zhou and Leung 2010a,
b), including in air pollution processes (Varotsos et al. 2005, Dong et al. 2017, Plocoste et al.
2017). Detrended fluctuation analysis (DFA) is one of the most popular methods to
detect long-range correlation in a series $X = \{x_t\}$, even if the series is nonstationary,
e.g. contains trends (Bashan et al. 2008). Given a time series, $X = \{x_t\}$, the implementation
of DFA is as follows (Peng et al. 1994, Kantelhardt et al. 2001):

(1) Divide the cumulative sum of $X$ into several segments with equal length $s$.
(2) For each of these segments, a detrending step removes the local trend; then, the
variance around the local trend can be obtained.
(3) The fluctuation function $F(s)$ is further calculated as the square root of the average of
the variances over all segments.
(4) Via the scaling behavior $F(s) \sim s^{\alpha}$, a scaling exponent $\alpha$ could be obtained.

In principle, the detrended segments can be considered as statistically identical so that
the average over all segments is equivalent to the ensemble average over different
realizations of the given process (Höll et al. 2016). The scaling exponent $\alpha$ can be connected
with the decay exponent $\gamma$ of the autocorrelation scaling law $C(\tau) \sim \tau^{-\gamma}$ for
a stationary series by $\alpha = 1 - \gamma/2$ (Höll and Kantz 2015). When $\gamma < 1$, i.e.
$\alpha > 0.5$, the sum of $C(\tau)$ diverges, which corresponds to long-range correlation.
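
The four DFA steps can be sketched as below. The example follows the usual formulation in Peng et al. (1994) and Kantelhardt et al. (2001): the mean is removed before the cumulative sum, and $F(s)$ is the square root of the mean segment variance. First-order (linear) detrending and the function name `dfa` are assumptions of this sketch, since the paper does not state the detrending order.

```python
import numpy as np


def dfa(x, scales, order=1):
    """Detrended fluctuation analysis; returns (alpha, F) for integer scales."""
    x = np.asarray(x, dtype=float)
    profile = np.cumsum(x - x.mean())              # step (1): cumulative sum
    F = []
    for s in scales:
        n_seg = len(profile) // s
        segments = profile[: n_seg * s].reshape(n_seg, s)
        k = np.arange(s)
        variances = []
        for seg in segments:                       # step (2): local detrending
            trend = np.polyval(np.polyfit(k, seg, order), k)
            variances.append(np.mean((seg - trend) ** 2))
        F.append(np.sqrt(np.mean(variances)))      # step (3): fluctuation function
    F = np.asarray(F)
    # Step (4): slope of log F(s) against log s gives the scaling exponent.
    alpha = np.polyfit(np.log(scales), np.log(F), 1)[0]
    return float(alpha), F
```

For uncorrelated noise the exponent should come out near 0.5, and for its running sum (a random walk) near 1.5, which gives a quick check that the implementation behaves as expected.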
Obviously, if the performance of the two-step local regression estimation is satisfactory,
then the DFA scaling behavior of the estimated series should be very similar to that
of the observed series. This will give an additional angle to validate the estimation
results.

3. Empirical results and interpretations


To integrate the station and mobile-sensor records, one parameter $\delta_t$ needs to be
determined. In our view, $\delta_t$ with a value of 10 min should be able to limit the distance
between two successive $\tilde{m}_{\tilde{j}}$ to within 5000 m. This is because the usual vehicle
speed in an urban area is approximately 30,000–40,000 m/h and the driving route is usually not
a straight line. Therefore, we think that such $\delta_t$ could maintain a good balance between
removing the redundant information and extracting the essential information. To check
sensitivity, we also ran the experiments by setting $\delta_t$ at 5, 20, and 30 min. The results
are similar. Therefore, in what follows we only report the results obtained with the
parameter $\delta_t = 10$ min.
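
The distance bound behind this choice can be checked with a line of arithmetic. Assuming a constant speed over one 10 min segment (a simplification made here for illustration, not a statement from the paper), the path length covered at the two quoted speeds is:

```latex
\begin{align*}
30\,000\ \mathrm{m/h} \times \tfrac{10}{60}\ \mathrm{h} &= 5\,000\ \mathrm{m}, \\
40\,000\ \mathrm{m/h} \times \tfrac{10}{60}\ \mathrm{h} &\approx 6\,667\ \mathrm{m}.
\end{align*}
```

The straight-line displacement cannot exceed the path length, and urban routes are rarely straight, so a separation of roughly 5000 m or less between successive averaged records is plausible. With this value each station hour is split into $n = 60/10 = 6$ segments.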
In particular, we report the following: 1) the estimated PM2.5 values at the Daemyeong
district station (35.85° N, 128.57° E) for the whole of July 2017, compared with the
observed values with respect to their EEMD components and DFA scaling behavior; 2) the
spatial distribution of the estimated PM2.5 at 12:00 pm on 25 July 2017.

3.1. Estimates at Daemyeong district


Using two-step local regression, we estimated the PM2.5 values at Daemyeong district for
the 744 h of July 2017. As shown in Figure 2, the observed and estimated PM2.5 values are
very similar. Our method showed similar performance for all other stations. For example,
comparisons between the estimates and observations at the Horim-dong, Hyeonpung-myeon,
and Igok-dong stations are depicted in Figure 3.

3.1.1. EEMD
EEMD was employed to evaluate the performance of our estimates for the Daemyeong district
station. Eight components and one residual were extracted for both the observed and
estimated PM2.5 records. As shown in Figure 4, the two records are very similar with respect
to both the original series and most of their EEMD components. Quantitatively, we employed
the Fourier transform to extract the dominant periods of the individual IMFs of both series
and measured the similarity between them by cross-correlation.
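The two quantitative checks described here can be illustrated with a minimal sketch on synthetic data. The function `dominant_period`, the synthetic daily cycle, and the noise level are assumptions made for illustration; the EEMD step itself is not reproduced, the cross-correlation is taken as the zero-lag Pearson coefficient, and no significance test is included.

```python
import numpy as np

def dominant_period(x, dt=1.0):
    """Period of the strongest non-zero Fourier component, in units of dt."""
    x = np.asarray(x, dtype=float)
    power = np.abs(np.fft.rfft(x - x.mean())) ** 2
    freqs = np.fft.rfftfreq(len(x), d=dt)
    k = np.argmax(power[1:]) + 1          # skip the zero-frequency bin
    return 1.0 / freqs[k]

# Synthetic stand-ins for one IMF of the observed and estimated series:
# a shared daily cycle plus independent noise, sampled hourly for 744 h.
hours = np.arange(744)
rng = np.random.default_rng(1)
daily = 5.0 * np.sin(2 * np.pi * hours / 24.0)
imf_obs = daily + rng.normal(0.0, 1.0, hours.size)
imf_est = daily + rng.normal(0.0, 1.0, hours.size)

period_days = dominant_period(imf_obs, dt=1.0) / 24.0   # hours -> days
r = np.corrcoef(imf_obs, imf_est)[0, 1]                 # zero-lag correlation
print(f"dominant period = {period_days:.2f} day, correlation = {r:.2f}")
```

In practice the same two functions would be applied to each pair of corresponding IMFs obtained from an EEMD implementation.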
Table 1 shows that the dominant periods of the original series and IMFs are very
similar for the observed and estimated records, especially for the IMFs with long periods.
As for the cross-correlation (Table 2), the corresponding IMFs of the observed and estimated
values are highly correlated except for the first two, which are relatively more irregular.
The p-values of the cross-correlations, other than that between the two IMF1 components,
are less than 0.05, indicating that the detected correlations are significant at the 95%
confidence level. For the Horim-dong, Hyeonpung-myeon, and Igok-dong stations displayed in
Figure 3, the correlation coefficients between the estimated and observed PM2.5 values are
0.90, 0.87, and 0.91, respectively.

[Figure 2 plot: x-axis, hours (100 to 700); y-axis, PM2.5 (µg/m³), 0 to 60; series: observed, 9-hour moving average, estimated]
Figure 2. Observed, 9-h moving average and estimated PM2.5 values at the Daemyeong district
station in July 2017.

3.1.2. DFA
We also employed DFA to study the long-range correlation of the observed and estimated
series at Daemyeong district in July 2017. The detrending order was set to 2, which
leads to scaling behaviors very similar to those obtained with detrending order 3. Here, we
did not filter anything, such as the periodic non-stationarity shown in Figures 2 and 4, from
the series but analyzed them directly, because the purpose of employing DFA is to
characterize long-range correlation in the series, and filters can remove some information
from them. However, it should be noted that if the focus is on the dynamics of
high-frequency information, these cyclic trends should be removed because they are strong
enough to mask those dynamics. The DFA scaling behaviors are shown in Figure 5. We can see
that $s_0 = 10^{1.4} \approx 24$ h is a critical scale. At scales $s > s_0$, the DFA scaling
behaviors of the two series are almost the same, and in this range the estimated scaling
exponent is $\alpha \approx 1.3$. At smaller scales, however, there are some differences
between the two DFA scaling behaviors: the fluctuation function $F(s)$ of the estimated
series has a larger power than that of the observed series in the range $s < s_0$.
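Reading separate exponents on either side of a critical scale amounts to fitting straight lines to log10 F(s) against log10 s over restricted ranges. The sketch below assumes a hypothetical helper `scaling_exponent` and a purely synthetic fluctuation function with slopes 2.0 and 1.3 joined at $s_0 = 10^{1.4}$; it does not use the paper's data.

```python
import numpy as np

def scaling_exponent(s, F, s_min=None, s_max=None):
    """Least-squares slope of log10 F(s) versus log10 s within [s_min, s_max]."""
    s = np.asarray(s, dtype=float)
    F = np.asarray(F, dtype=float)
    mask = np.ones(s.shape, dtype=bool)
    if s_min is not None:
        mask &= s >= s_min
    if s_max is not None:
        mask &= s <= s_max
    slope, _ = np.polyfit(np.log10(s[mask]), np.log10(F[mask]), 1)
    return slope

# Synthetic double power law, continuous at the critical scale s0.
s = np.logspace(0.8, 2.6, 30)
s0 = 10 ** 1.4
F = np.where(s < s0, s0 ** 1.3 * (s / s0) ** 2.0, s ** 1.3)

alpha_small = scaling_exponent(s, F, s_max=s0)   # scales below s0
alpha_large = scaling_exponent(s, F, s_min=s0)   # scales above s0
print(f"alpha (s < s0) = {alpha_small:.2f}, alpha (s > s0) = {alpha_large:.2f}")
```

With real fluctuation functions the transition is gradual, so the fitting ranges would normally exclude a band around the critical scale.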

3.2. Estimation at a fixed time stamp


We then estimated air pollutant concentrations at a fixed time stamp for the region
35.61° N to 36.03° N and 128.33° E to 128.77° E, which covers the whole of Daegu city. The entire

[Figure 3 plots: three panels; x-axis, hours (0 to 800); y-axis, PM2.5 (µg/m³); series: observed, estimated]

Figure 3. Observed and estimated PM2.5 values at the Horim-dong (left upper panel), Hyeonpung-
myeon (right upper panel), and Igok-dong (bottom panel) stations in July 2017.

[Figure 4 plots: stacked panels for the original series, IMF1 to IMF8, and the residual r; x-axis, hours (0 to 700); series: observed, 9-hour moving average, estimated]

Figure 4. The original series and its EEMD components of the observed, 9-h moving average and
estimated PM2.5 values at the Daemyeong district station in July 2017.

region was divided into 40 × 40 grids with a size of approximately 1000 m × 1160 m. For each
grid, we estimated the PM2.5 at its center to represent the air pollutant concentration of this
grid. The time stamp 12:00 pm on 25 July 2017 was selected as an example to show the
spatial pattern of the estimated PM2.5. To facilitate our analysis, we constructed a web-based
software platform for efficient data management, data analysis, query, and visualization.
In Figure 6, the symbols of cars and buildings denote the locations of the mobile sensors
and ground stations at a given time, respectively. The colors of these symbols show the
recorded air pollutant concentrations, whereas the colors of the grids on the
map display the estimated values. Specifically, yellow, green, and purple indicate low,
moderate, and high concentrations of PM2.5, respectively, so the gradation of colors from
yellow to purple indicates the gradation of air pollution from low to high. As shown in Figure
6, we can observe that the air quality with respect to PM2.5 concentration improves when

Table 1. Dominant period of the original series and its EEMD components for the observed,
9-h moving average and estimated PM2.5 records at the Daemyeong district in July 2017.

| Series | Observation (day) | 9-hour moving average (day) | Estimation (day) |
|---|---|---|---|
| Original | 42.67 | 42.67 | 42.67 |
| IMF1 | 0.16 | 0.11 | 0.08 |
| IMF2 | 0.37 | 0.23 | 0.44 |
| IMF3 | 0.99 | 0.99 | 0.99 |
| IMF4 | 1.33 | 2.03 | 1.33 |
| IMF5 | 3.88 | 3.88 | 3.88 |
| IMF6 | 8.23 | 8.23 | 8.23 |
| IMF7 | 42.67 | 42.67 | 42.67 |
| IMF8 | 42.67 | 42.67 | 42.67 |

[Figure 5 plot: x-axis, log10(s) (0.8 to 2.6); y-axis, log10(F(s)) (−1 to 2.5); series: observed, 9-hour moving average, estimated; annotated values: 1.3 and 2.5]

Figure 5. The DFA scaling behavior of the observed, 9-h moving average and estimated PM2.5
records at the Daemyeong district station in July 2017.

Figure 6. Estimated PM2.5 of 1600 grids covering the city area of Daegu at 12:00 pm on 25 July 2017
using data collected by both mobile sensors and ground stations. Color changing from yellow to
green to purple indicates the change of PM2.5 concentration from low to moderate to high.

moving from the southwestern to the northeastern corner, which is consistent with the
conclusion based on direct observation of the records collected by mobile sensors and
ground stations. We also used 30 × 30 and 50 × 50 grids to cover the whole city, and very
similar spatial patterns were obtained. The estimated spatial pattern shows that PM2.5
concentration gradually decreases from the southwestern corner to the northeastern
corner. In general, the estimated pattern is consistent with the observation records. On
the other hand, if we use only the air pollution data collected by the ground stations to
estimate the spatial pattern of air pollutant concentration at 12:00 pm on 25 July 2017,
then the results shown in Figure 7 display a very different pattern from that shown in
Figure 6. This difference indicates that with only ground stations, which are limited in
number, we will fail to capture the real patterns of air pollution distribution in the
whole city. This demonstrates that mobile sensors provide additional
information necessary for air pollution profiling. To show the temporal variations of the
spatial pattern, we give one more example in Figure 8 at another time stamp (12:00 pm on
18 July 2017) for comparison.
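The 40 × 40 grid used in this section can be reconstructed from the stated bounding box. The sketch below is an assumption-laden illustration: the helper `cell_centers`, the spherical Earth radius, and the conversion from degrees to metres at the mid-latitude are not taken from the paper, which does not state how the cell size was computed.

```python
import math
import numpy as np

# Bounding box and grid resolution quoted in the text.
LAT_MIN, LAT_MAX = 35.61, 36.03
LON_MIN, LON_MAX = 128.33, 128.77
N = 40

def cell_centers(lo, hi, n):
    """Centers of n equal-width cells spanning [lo, hi]."""
    edges = np.linspace(lo, hi, n + 1)
    return 0.5 * (edges[:-1] + edges[1:])

lat_c = cell_centers(LAT_MIN, LAT_MAX, N)
lon_c = cell_centers(LON_MIN, LON_MAX, N)
lon_grid, lat_grid = np.meshgrid(lon_c, lat_c)   # 40 x 40 estimation points

# Approximate cell size in metres (spherical Earth, assumed radius).
EARTH_RADIUS_M = 6371000.0
mid_lat = math.radians(0.5 * (LAT_MIN + LAT_MAX))
cell_ns_m = EARTH_RADIUS_M * math.radians((LAT_MAX - LAT_MIN) / N)
cell_ew_m = EARTH_RADIUS_M * math.cos(mid_lat) * math.radians((LON_MAX - LON_MIN) / N)
print(f"{lon_grid.size} cells, about {cell_ew_m:.0f} m (E-W) x {cell_ns_m:.0f} m (N-S)")
```

Under these assumptions the cells come out at roughly 990 m east to west by 1170 m north to south, in line with the approximate 1000 m × 1160 m size given in the text. Changing `N` to 30 or 50 gives the alternative grids mentioned above.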

3.3. Estimations across the city


We used the estimates at the ground stations to show how well our estimation method
performs across the city, because there are generally no station observations at the
points the taxis pass through. For all 13 stations, we calculated the correlation
coefficients between the estimated and observed values and examined the spatial pattern of
these coefficients. All coefficients had relatively high values, ranging from 0.87 to 0.94,

Figure 7. Estimated PM2.5 of 1600 grids covering the city area of Daegu at time 12:00 pm on
25 July 2017 using data collected by only ground stations. Color changing from yellow to green to
purple indicates the change of PM2.5 concentration from low to moderate to high.

indicating the good performance of our estimations. As for the spatial pattern, the
stations located near the city center have higher coefficients than those at the boundary.
This is understandable because taxis usually pass through the city center more often
than the fringes of the city. We then compared the estimates and observations for all
stations. As shown in Figure 9, the value of the correlation coefficient is 0.88.
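A comparison of this kind reduces to a correlation coefficient and a least-squares line between paired observed and estimated values. The sketch below uses synthetic data only: the helper `evaluate`, the noise level, and the uniform range of observed values are assumptions, and the generating line simply borrows the regression formula annotated in Figure 9 so that the fit has a known target.

```python
import numpy as np

def evaluate(observed, estimated):
    """Pearson correlation and least-squares line estimated = a * observed + b."""
    observed = np.asarray(observed, dtype=float)
    estimated = np.asarray(estimated, dtype=float)
    r = np.corrcoef(observed, estimated)[0, 1]
    slope, intercept = np.polyfit(observed, estimated, 1)
    return r, slope, intercept

# Synthetic pairs: 13 stations x 744 hourly time stamps (not the paper's data).
rng = np.random.default_rng(2)
observed = rng.uniform(5.0, 60.0, size=13 * 744)
estimated = 0.64 * observed + 6.74 + rng.normal(0.0, 3.0, observed.size)

r, slope, intercept = evaluate(observed, estimated)
print(f"r = {r:.2f}, fitted line: y = {slope:.2f}x + {intercept:.2f}")
```

A slope below 1 with a positive intercept, as in the annotated formula, means that high observed values tend to be underestimated and low ones overestimated, which is what a smoothing estimator would be expected to produce.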

4. Discussion
As argued above, in order to profile the spatiotemporal variations of air pollutant
concentrations in a city, ground station measurements must be supplemented by
mobile-sensor measurements. The challenge then is to find a rigorous method to
integrate both to form a composite dataset for further analysis. However, to the best
of our knowledge, there are no studies on the integration of these two types of data. In
our view, the lack of such research may be attributed to the difficulty in integrating data
of different types, i.e. regular measurements made at fixed ground stations, which have
high temporal resolution but low spatial resolution, and irregular measurements made
by mobile sensors at different locations, which have low temporal resolution but high
spatial resolution. Therefore, one of the main contributions of this study is the formula-
tion of a rigorous framework for such integration. As we can observe in the ‘Empirical
Results and Interpretations’ section (see Figures 2 and 6), the integrated data capture the
temporal and spatial characteristics of air pollutant concentrations well. Furthermore,
the proposed framework enables estimation using direct information as well as the
incorporation of indirect information for a more comprehensive and accurate profiling of
air pollution concentrations via explanatory variables.

Figure 8. Estimated PM2.5 of 1600 grids covering the city area of Daegu at 12:00 pm on
18 July 2017 using data collected by both mobile sensors and ground stations. Color changing
from yellow to green to purple indicates the change of PM2.5 concentration from low to moderate to
high.

As presented in section 3, for any given time stamp, although it is impossible to
obtain the actual spatial pattern at such fine resolutions, the distribution of the
estimated concentration is generally consistent with the observation records collected by
the mobile sensors and ground stations (see Figures 6 and 8), except for some local
variations recorded by a few mobile sensors. This is understandable because
regression captures the general trend but not all individual variations. Although we use
a local method to estimate the air pollutant concentrations, it cannot capture some
extreme changes. The pattern of gradual variation in Daegu is therefore not surprising.
Now, we focus on the estimation at the Daemyeong district station. The identified
$s_0 \approx 1$ day in Figure 5 separates the large scales, which correspond to the third to
eighth IMFs and the residual, from the small scales, which correspond to the first two IMFs.
As shown in Tables 1 and 2, at large scales the observed and estimated series result in IMFs
with high correlation and almost the same dominant periods, i.e. the third to eighth IMFs
correspond to the daily, one-and-a-half-day, half-week, weekly, and monthly scales. The
extracted daily and weekly periods should be connected to human activity, such as daily
commuting traffic and the cycle of working days and weekends, respectively. At the
daily to weekly scales, the DFA power is about 1.3, which is indicative of non-stationarity
with moderate anti-correlation. This DFA power is similar to that reported by Varotsos
et al. (2005), with a value of about 1.2 in Greece at daily and weekly scales. With regard to
intra-day scales, the extracted IMFs are somewhat different with respect to both the
dominant periods and the cross correlation. In addition, the DFA scaling behaviors are also
very different at the hourly scales. The estimated series at hourly scales results in a very
strong power, with a value larger than 2.

[Figure 9 plot: x-axis, observed PM2.5 values (µg/m³), 0 to 70; y-axis, estimated PM2.5 values (µg/m³), 0 to 70; annotations: regression formula y = 0.64x + 6.74; correlation coefficient: 0.8]

Figure 9. Comparison between estimated and observed PM2.5 values for all 13 stations at all 744
time stamps in July 2017.

The similarity at large scales and the difference at small scales can also be observed in
Figure 2: the estimated PM2.5 series generally characterizes the air pollution process
at large scales but is smoother than the observed series. This is understandable
because the two-step local regression model includes a kernel smoothing procedure,
which could filter out some of the high-frequency information. As depicted in Figure 4,
the first two IMFs of the estimated series have much smaller amplitudes, making
effective extraction of the dominant period more difficult. In addition, Table 1 shows that
the IMF2 of the observed series has a dominant period of 0.37 day (approximately 9 h).
Therefore, we use the 9-h moving average to smooth the observed series in order to examine
the low-pass filter effect of the two-step regression model. Figure 4 shows that the
amplitudes of the first two IMFs of the 9-h moving average series are much smaller than
those of the observed series. The cross correlations between the 9-h moving average
and observed IMF1 and IMF2 in Table 2 are also very low, even lower than those
between the estimated and observed IMFs. The low-pass filter thus smooths
out the high-frequency information and reduces the correlation between the filtered
and observed series at small scales. In contrast, the correlation at large scales is well
maintained. The low-pass filter effect can also explain the difference between the DFA
scaling behaviors of the observed and estimated series, because the 9-h moving average
series exhibits a DFA scaling behavior very similar to that of the estimated series. In fact,

Table 2. Pairwise cross correlation of the observation, 9-h moving average, and estimation of
PM2.5 records at Daemyeong district in July 2017.

| Series | Observation vs. estimation | 9-hour moving average vs. estimation | Observation vs. 9-hour moving average |
|---|---|---|---|
| Original | 0.94 | 0.96 | 0.96 |
| IMF1 | 0.13^a | −0.01^a | 0.05^a |
| IMF2 | 0.58 | 0.26 | 0.04^a |
| IMF3 | 0.86 | 0.88 | 0.93 |
| IMF4 | 0.91 | 0.91 | 0.96 |
| IMF5 | 0.94 | 0.94 | 0.98 |
| IMF6 | 0.97 | 0.96 | 0.99 |
| IMF7 | 0.93 | 0.94 | 0.94 |
| IMF8 | 0.90 | 0.97 | 0.98 |
| Residual | 1.00 | 0.99 | 0.99 |

^a The corresponding p-values are larger than 0.05.

the low-pass filter, whether the moving average or the kernel smoothing procedure of the
two-step local regression model, could introduce very strong short-range correlation
into the filtered data. Such strong auto-correlation may dominate at small scales but
could be overwhelmed by long-range correlation at large scales, with a transition scaling
behavior connecting these two regimes. Therefore, the low-pass-filtered
DFA scaling behavior exhibits a double power law connected by a transition in Figure 5.
The difference between the observed and estimated series with respect to the
correlations among IMFs and the DFA scaling behavior can therefore be basically attributed
to the low-pass filter effect. However, this by no means implies the equivalence of the
two-step local regression model and the simple moving average, because their first two IMFs,
especially IMF1, exhibit low correlation (see Table 2). Compared to the moving average,
the two-step local regression model could retain more high-frequency information,
which is indicated by its higher cross correlations with the observed IMF1 and IMF2.
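The low-pass effect of a 9-h moving average can be illustrated on two synthetic sinusoids. This sketch covers only the simple moving average, not the kernel smoothing of the two-step local regression model; the helper `moving_average` and the test signals are assumptions.

```python
import numpy as np

def moving_average(x, window=9):
    """Unweighted moving average; only fully covered windows are returned."""
    kernel = np.ones(window) / window
    return np.convolve(np.asarray(x, dtype=float), kernel, mode="valid")

# Hourly synthetic components: a daily cycle and a 9-h cycle.
hours = np.arange(744)
daily = 10.0 * np.sin(2 * np.pi * hours / 24.0)
fast = 3.0 * np.sin(2 * np.pi * hours / 9.0)

daily_smoothed = moving_average(daily)
fast_smoothed = moving_average(fast)

print(f"daily amplitude after smoothing: {np.max(np.abs(daily_smoothed)):.2f}")
print(f"9-h amplitude after smoothing:   {np.max(np.abs(fast_smoothed)):.2e}")
```

The 9-h component is removed almost entirely because the window spans exactly one of its cycles, whereas the daily cycle keeps most of its amplitude. This matches the qualitative picture described above: correlation is lost at intra-day scales and retained at daily and longer scales.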
Usually, the estimation performance is evaluated simply by R² (the square of the correlation
coefficient), e.g. in the studies of Wong et al. (2004) and He and Huang (2018). In this study, to
go beyond this, we further compare the estimations and observations at multiple time
scales and with respect to their dynamics using EMD and DFA, respectively. Although it is not
new to employ EMD and DFA to study air pollutant data, previous studies have usually applied
them to analyze only observed records to uncover the components with different periods (Hu
et al. 2013, Jiang and Bai 2018) and identify their long-range correlation (Varotsos et al. 2005,
Dong et al. 2017, Plocoste et al. 2017). In fact, the comparison at multiple time scales and with
respect to their dynamics provides more information about the performance of the estimations
and a more comprehensive and revealing evaluation. This study
provides an understanding of how good the estimations are overall, as well as of their
performance at different time scales, e.g. estimations are better at the daily and weekly
scales. Such comprehensive evaluation is another contribution of this study.

5. Conclusions
In this study, we have proposed a rigorous framework to effectively integrate data collected
by the mobile sensors and ground stations. Using the data collected in Daegu during the
whole of July 2017, we have shown that air pollutant concentrations can be estimated by
the two-step local regression model for any given location and time stamp. Furthermore, we

have demonstrated that the estimated data can characterize the observed data at daily and
weekly scales in the temporal dimension with respect to the EEMD components and long-
range correlation, and from them, we can also extract the general spatial pattern of air
pollution. The differences at small scales can be mainly attributed to the low-pass filter effect
of the two-step local regression model. Although the framework is established and eval-
uated by using only PM2.5, it is applicable to the study of other air pollutants. Therefore, our
research advances the frontier of basic research in air pollution monitoring by integrating
station-based and mobile-sensor-based data. It can be extended to the integration of multi-
source information for urban big-data analysis.
As previously mentioned, the project to record air pollutant concentrations using
mobile sensors was initiated in Daegu at the end of June 2017. In the next phase of the
project, additional mobile sensors will be deployed to further improve our analysis.
Within the proposed study framework, future studies could assess the dependence of air
pollution on other variables and the consequences of air pollution at the location or
time stamp of interest, based on the estimated air pollutant concentrations. In further
research, it is also of interest to improve our method to better capture the local and
extreme variations.

Disclosure statement
No potential conflict of interest was reported by the authors.

Funding
This work was supported by the earmarked grant of the Hong Kong Research Grants Council
[Project ID: 2120517, Ref: 14653316] and The Chinese University of Hong Kong [VC Discretionary Fund].

References
Aberer, K., et al., 2010. Opensense: open community driven sensing of environment. In: Proceedings
of the ACM SIGSPATIAL International Workshop on GeoStreaming. San Jose, CA: ACM, 39–42.
Al-Ali, A., Zualkernan, I., and Aloul, F., 2010. A mobile GPRS-sensors array for air pollution
monitoring. IEEE Sensors Journal, 10 (10), 1666–1671. doi:10.1109/JSEN.2010.2045890
Barrett, S.R., et al., 2012. Public health, climate, and economic impacts of desulfurizing jet fuel.
Environmental Science & Technology, 46 (8), 4275–4282. doi:10.1021/es203325a
Bashan, A., et al., 2008. Comparison of detrending methods for fluctuation analysis. Physica A, 387
(21), 5080–5090. doi:10.1016/j.physa.2008.04.023
Beckerman, B.S., et al., 2013. Application of the deletion/substitution/addition algorithm to
selecting land use regression models for interpolating air pollution measurements in California.
Atmospheric Environment, 77, 172–177. doi:10.1016/j.atmosenv.2013.04.024
Beran, J., 1994. Statistics for long-memory processes. Vol. 61. Boca Raton, FL: CRC Press.
Biondi, S.M., et al., 2017. Bus as a sensor: A mobile sensor nodes network for the air quality
monitoring. In: Wireless and mobile computing, networking and communications (WiMob). Rome,
Italy: IEEE, 272–277.
Brauer, M., et al., 2003. Estimating long-term average particulate air pollution concentrations:
application of traffic indicators and geographic information systems. Epidemiology, 14,
228–239.
Brauer, M., et al., 2015. Ambient air pollution exposure estimation for the global burden of disease
2013. Environmental Science & Technology, 50 (1), 79–88. doi:10.1021/acs.est.5b03709

Briggs, D.J., et al., 2000. A regression-based method for mapping traffic-related air pollution:
application and testing in four contrasting urban environments. Science of the Total
Environment, 253 (1–3), 151–167. doi:10.1016/S0048-9697(00)00429-0
Brynda, P., Kosová, Z., and Kopřiva, J., 2016. Mobile sensor unit for online air quality monitoring. In:
2016 Smart Cities Symposium Prague (SCSP). Prague, Czech Republic: IEEE, 1–4.
Devarakonda, S., et al., 2013. Real-time air quality monitoring through mobile sensing in
metropolitan areas. In: Proceedings of the 2nd ACM SIGKDD international workshop on urban
computing. Chicago, IL: ACM, 15.
Dong, Q., Wang, Y., and Li, P., 2017. Multifractal behavior of an air pollutant time series and the
relevance to the predictability. Environmental Pollution, 222, 444–457.
doi:10.1016/j.envpol.2016.11.090
Firculescu, A.C. and Tudose, D.S., 2015. Low-cost air quality system for urban area monitoring. In:
Control Systems and Computer Science (CSCS), 2015 20th International Conference on. Bucharest,
Romania: IEEE, 240–247.
Gopikrishnan, P., et al., 1999. Scaling of the distribution of fluctuations of financial market indices.
Physical Review E, 60 (5), 5305. doi:10.1103/PhysRevE.60.5305
Grimmond, C. and Oke, T.R., 1999. Aerodynamic properties of urban areas derived from analysis of
surface form. Journal of Applied Meteorology, 38 (9), 1262–1292.
doi:10.1175/1520-0450(1999)038<1262:APOUAD>2.0.CO;2
Gupta, P., et al., 2006. Satellite remote sensing of particulate matter and air quality assessment over
global cities. Atmospheric Environment, 40 (30), 5880–5892. doi:10.1016/j.atmosenv.2006.03.016
He, Q. and Huang, B., 2018. Satellite-based high-resolution PM2.5 estimation over the
Beijing-Tianjin-Hebei region of China using an improved geographically and temporally
weighted regression model. Environmental Pollution, 236, 1027–1037.
doi:10.1016/j.envpol.2018.01.053
Henao, R.G., 2009. Geostatistical analysis of functional data. Thesis (PhD). Universitat Politècnica de
Catalunya.
Hoek, G., et al., 2001. Estimation of long-term average exposure to outdoor air pollution for
a cohort study on mortality. Journal of Exposure Science and Environmental Epidemiology, 11
(6), 459. doi:10.1038/sj.jea.7500189
Höll, M. and Kantz, H., 2015. The relationship between the detrendend fluctuation analysis and the
autocorrelation function of a signal. The European Physical Journal B, 88 (12), 327.
doi:10.1140/epjb/e2015-60721-1
Höll, M., Kantz, H., and Zhou, Y., 2016. Detrended fluctuation analysis and the difference between
external drifts and intrinsic diffusionlike nonstationarity. Physical Review E, 94 (4), 042201.
doi:10.1103/PhysRevE.94.042201
Hu, M., et al., 2013. Spatial and temporal characteristics of particulate matter in Beijing, China using
the empirical mode decomposition method. Science of the Total Environment, 458, 70–80.
doi:10.1016/j.scitotenv.2013.04.005
Huang, N.E., et al., 1998. The empirical mode decomposition and the Hilbert spectrum for
nonlinear and non-stationary time series analysis. Proceedings of the Royal Society of London.
Series A: Mathematical, Physical and Engineering Sciences, 454 (1971), 903–995.
doi:10.1098/rspa.1998.0193
Jerrett, M., et al., 2007. Modeling the intraurban variability of ambient traffic pollution in Toronto,
Canada. Journal of Toxicology and Environmental Health, Part A, 70 (3–4), 200–212.
doi:10.1080/15287390600883018
Jha, D.K., et al., 2011. Evaluation of interpolation technique for air quality parameters in Port Blair,
India. Universal Journal of Environmental Research & Technology, 1, 3.
Jiang, L. and Bai, L., 2018. Spatio-temporal characteristics of urban air pollutions and their causal
relationships: evidence from Beijing and its neighboring cities. Scientific Reports, 8 (1), 1279.
doi:10.1038/s41598-017-18107-1
Kantelhardt, J.W., et al., 2001. Detecting long-range correlations with detrended fluctuation
analysis. Physica A, 295 (3–4), 441–454. doi:10.1016/S0378-4371(01)00144-3

Kersting, J., et al., 2017. Internet of things architecture for handling stream air pollution data. In:
Proceedings of the 2nd International Conference on Internet of Things, Big Data and Security
(IoTBDS 2017). Porto, Portugal, 117–124.
Koscielny-Bunde, E., et al., 1998. Indication of a universal persistence law governing atmospheric
variability. Physical Review Letters, 81 (3), 729–732. doi:10.1103/PhysRevLett.81.729
Koscielny-Bunde, E., et al., 2006. Long-term persistence and multifractality of river runoff records:
detrended fluctuation studies. Journal of Hydrology, 322 (1–4), 120–137.
doi:10.1016/j.jhydrol.2005.03.004
Lee, K.H., et al., 2009. Atmospheric aerosol monitoring from satellite observations: A history of
three decades. In: Y. Kim, U. Platt, M. B. Gu and H. Iwahashi eds. Atmospheric and biological
environmental monitoring. Springer, 13–38.
Leung, Y., et al., 2018. An integrated web-based air pollution decision support system–a prototype.
International Journal of Geographical Information Science, 32 (9), 1787–1814.
doi:10.1080/13658816.2018.1460752
Li, J., Carlson, B.E., and Lacis, A.A., 2015. How well do satellite AOD observations represent the
spatial and temporal variability of PM2.5 concentration for the United States? Atmospheric
Environment, 102, 260–273. doi:10.1016/j.atmosenv.2014.12.010
Liang, D. and Kumar, N., 2013. Time-space Kriging to address the spatiotemporal misalignment in
the large datasets. Atmospheric Environment, 72, 60–69. doi:10.1016/j.atmosenv.2013.02.034
Liao, D., et al., 2006. GIS approaches for the estimation of residential-level ambient PM
concentrations. Environmental Health Perspectives, 114 (9), 1374. doi:10.1289/ehp.9169
Lim, D., Kim, T., and Jung, H., 2018. Fine-grained particulate matter prediction using long
short-term memory on vehicle IoT platform. In: International Conference on Future Information
& Communication Engineering. vol. 10, Pattaya, Thailand, 295–296.
Ma, Z., et al., 2014. Estimating ground-level PM2.5 in China using satellite remote sensing.
Environmental Science & Technology, 48 (13), 7436–7444. doi:10.1021/es5009399
Mead, M.I., et al., 2013. The use of electrochemical sensors for monitoring urban air quality in
low-cost, high-density networks. Atmospheric Environment, 70, 186–203.
doi:10.1016/j.atmosenv.2012.11.060
Mei, C.L. and Wang, N., 2012. Modern regression analysis methods. Beijing: Science Press.
Moltchanov, S., et al., 2015. On the feasibility of measuring urban air pollution by wireless
distributed sensor networks. Science of the Total Environment, 502, 537–547.
doi:10.1016/j.scitotenv.2014.09.059
Montero, J.M. and Fernández-Avilés, G., 2018. Functional Kriging prediction of atmospheric particulate
matter concentrations in Madrid, Spain: is the new monitoring system masking potential public
health problems? Journal of Cleaner Production, 175, 283–293. doi:10.1016/j.jclepro.2017.12.041
Montero, J.M., et al., 2015. Spatial and spatio-temporal geostatistical modeling and Kriging. Vol. 998.
West Sussex, UK: John Wiley & Sons.
Peng, C.K., et al., 1994. Mosaic organization of DNA nucleotides. Physical Review E, 49 (2),
1685–1689. doi:10.1103/PhysRevE.49.1685
Plocoste, T., Calif, R., and Jacoby-Koaly, S., 2017. Temporal multiscaling characteristics of particulate
matter PM10 and ground-level ozone O3 concentrations in Caribbean region. Atmospheric
Environment, 169, 22–35. doi:10.1016/j.atmosenv.2017.08.068
Pollice, A. and Lasinio, G.J., 2010. Spatiotemporal analysis of the PM10 concentration over the
Taranto area. Environmental Monitoring and Assessment, 162 (1–4), 177–190.
doi:10.1007/s10661-009-0779-y
San Jose, R., Karatzas, K., and Perez, J., 2008. Air quality modeling. Ecological Models, 1, 111–123.
Shaddick, G., et al., 2018. Data integration model for air quality: A hierarchical approach to the
global estimation of exposures to ambient air pollution. Journal of the Royal Statistical Society:
Series C (Applied Statistics), 67 (1), 231–253. doi:10.1111/rssc.12227
Van Donkelaar, A., et al., 2010. Global estimates of ambient fine particulate matter concentrations
from satellite-based aerosol optical depth: development and application. Environmental Health
Perspectives, 118 (6), 847. doi:10.1289/ehp.0901623

Van Donkelaar, A., et al., 2016. Global estimates of fine particulate matter using a combined
geophysical-statistical method with information from satellites, models, and monitors.
Environmental Science & Technology, 50 (7), 3762–3772. doi:10.1021/acs.est.5b05833
Varotsos, C., Ondov, J., and Efstathiou, M., 2005. Scaling properties of air pollution in Athens,
Greece and Baltimore, Maryland. Atmospheric Environment, 39 (22), 4041–4047.
doi:10.1016/j.atmosenv.2005.03.024
Wang, J. and Christopher, S.A., 2003. Intercomparison between satellite-derived aerosol optical
thickness and PM2.5 mass: implications for air quality studies. Geophysical Research Letters, 30
(21), 2095. doi:10.1029/2003GL018174
Williams, D.E., et al., 2013. Validation of low-cost ozone measurement instruments suitable for use
in an air-quality monitoring network. Measurement Science and Technology, 24 (6), 065803.
doi:10.1088/0957-0233/24/6/065803
Wong, D.W., Yuan, L., and Perlin, S.A., 2004. Comparison of spatial interpolation methods for the
estimation of air quality data. Journal of Exposure Science and Environmental Epidemiology, 14
(5), 404. doi:10.1038/sj.jea.7500338
World Health Organization, 2016. Ambient air pollution: A global assessment of exposure and burden
of disease. Geneva, Switzerland: World Health Organization.
https://www.who.int/phe/publications/air-pollution-global-assessment/en/
Wu, Z., et al., 2007. On the trend, detrending, and variability of nonlinear and nonstationary time
series. Proceedings of the National Academy of Sciences, 104 (38), 14889–14894.
doi:10.1073/pnas.0701020104
Wu, Z. and Huang, N., 2009. Ensemble empirical mode decomposition: A noise-assisted data analysis
method. Advances in Adaptive Data Analysis, 1 (1), 1–41. doi:10.1142/S1793536909000047
Yan, N. and Mei, C.L., 2014. A two-step local smoothing approach for exploring spatio-temporal
patterns with application to the analysis of precipitation in the mainland of China during
1986–2005. Environmental and Ecological Statistics, 21 (2), 373–390.
doi:10.1007/s10651-013-0259-y
Yanosky, J.D., et al., 2008. Spatio-temporal modeling of chronic PM10 exposure for the nurses’
health study. Atmospheric Environment, 42 (18), 4047–4062.
doi:10.1016/j.atmosenv.2008.01.044
Zheng, C., et al., 2017. Analysis of influential factors for the relationship between PM2.5 and AOD in
Beijing. Atmospheric Chemistry and Physics, 17 (21), 13473. doi:10.5194/acp-17-13473-2017
Zheng, Y., et al., 2015. Forecasting fine-grained air quality based on big data. In: Proceedings of the 21th
ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Sydney, NSW: ACM,
2267–2276.
Zhou, Y. and Leung, Y., 2010a. Empirical mode decomposition and long-range correlation analysis
of sunspot time series. Journal of Statistical Mechanics, 2010 (12), P12006.
doi:10.1088/1742-5468/2010/12/P12006
Zhou, Y. and Leung, Y., 2010b. Multifractal temporally weighted detrended fluctuation analysis
and its application in the analysis of scaling behavior in temperature series. Journal of Statistical
Mechanics, 2010 (06), P06021. doi:10.1088/1742-5468/2010/06/P06021
Zhou, Y., Leung, Y., and Ma, J.M., 2013. Empirical study of the scaling behavior of the amplitude–
frequency distribution of the Hilbert–Huang transform and its application in sunspot time series
analysis. Physica A, 392 (6), 1336–1346. doi:10.1016/j.physa.2012.11.055
Zhu, J.Y., Sun, C., and Li, V.O., 2015. Granger-causality-based air quality estimation with
spatio-temporal (ST) heterogeneous big data. In: Computer Communications Workshops
(INFOCOM WKSHPS), 2015 IEEE Conference on. Hong Kong, China: IEEE, 612–617.
Zhu, J.Y., Sun, C., and Li, V.O., 2017. An extended spatio-temporal Granger causality model for air
quality estimation with heterogeneous urban big data. IEEE Transactions on Big Data, 3 (3),
307–319. doi:10.1109/TBDATA.2017.2651898
