Atmospheric Environment 213 (2019) 258–272
Regional air pollution mixtures across the continental US

Weeberb J. Requia a,∗, Brent A. Coull b, Petros Koutrakis a
a Harvard University, Department of Environmental Health, School of Public Health, Boston, MA, United States
b Harvard University, Department of Biostatistics, School of Public Health, Boston, MA, United States
ARTICLE INFO

Keywords: Air pollution; Pollutant mixtures; Spatiotemporal analysis; Cluster analysis

ABSTRACT

A limited body of literature has estimated regional differences in air pollution mixtures. More comprehensive analyses are necessary to accurately depict differences in air pollution characteristics over space and time. Our objective is to further these efforts by investigating spatial differences of air pollution mixtures across the US. We employed a spatially constrained clustering approach (based on the k-means algorithm) to group air pollution monitoring sites that exhibit distinct pollutant profiles or mixtures in the US over 9 years (2008–2016). We accounted for 20 chemical components of PM2.5. The resulting clusters of pollution mixtures are characterized and validated based on source emissions represented by land-use information. Our analysis resulted in 27 clusters with different numbers of sites. For example, cluster 1 has 14 sites and covers part of the southeast, including the states of North Carolina, South Carolina, Georgia, and Florida. The southwest has a very prominent cluster with 8 sites (cluster 26), covering parts of Louisiana, Mississippi, Texas, and Arkansas. On the west coast, two clusters were highlighted in our analysis, cluster 3 in California and cluster 7 in Washington and part of Oregon, both with 5 sites. We estimated that Cu, Se, NO3−, Cr, and Ba were the top five species that divided the study area into clusters of sites most effectively. Observing the concentration ratios (concentration of species i/concentration of PM2.5) for some of these clusters, our results show that clusters 3 and 7 on the west coast represent sites with high Na ratios. Cluster 13 in the northeast and part of the Midwest represents sites with a high SO42− ratio. Cluster 16, with a single site in the northeast, has the highest SO42− ratio, representing almost the third quartile of the SO42− ratio. This is one of the few studies focused on spatial pattern analysis to estimate regions that exhibit distinct pollutant mixtures on a large scale. We expect that further investigations can use our findings to analyze the relationship between areas that exhibit distinct pollutant mixtures and the impact of regulations, climate change, and health effects in the US.
∗ Corresponding author.
E-mail addresses: weeberb@gmail.com, wjrequia@hsph.harvard.edu (W.J. Requia).
https://doi.org/10.1016/j.atmosenv.2019.06.006
Received 19 February 2019; Received in revised form 27 May 2019; Accepted 1 June 2019
Available online 04 June 2019
1352-2310/ © 2019 Elsevier Ltd. All rights reserved.

1. Introduction

Studies have demonstrated significant regional differences in air pollution transport, emission sources, and atmospheric reactions, which lead to considerable spatial heterogeneity of pollutant mixtures (Jeong and Park, 2013; Lee et al., 2012; J. Liu et al., 2009a, b; Ying and Kleeman, 2006). Researchers and environmental agencies highlight the importance of multi-pollutant mixtures in air pollution investigations, showing that the multi-pollutant approach enhances our understanding of the multivariate relationship between pollutants at a given site (Cooper et al., 2012; Oakes et al., 2014). For example, the United States Environmental Protection Agency (US EPA) has adopted a multi-pollutant approach to develop policies and plans in order to control and regulate air quality. Scientists from the EPA have reported that policies and plans based on a multi-pollutant approach are able to achieve greater reductions of particulates and gases, as observed by the monitoring networks, improve air quality regionally, and reduce health risks (Wesson et al., 2010).

More detailed analyses have been conducted using emission inventories and ambient data from monitoring networks for identifying multi-pollutant profiles in air pollution data (Austin et al., 2013, 2012). Cluster analysis has been one of the main methods used by these previous studies. Investigators have also used pollution mixture clusters to assess the heterogeneity in the relationship between air pollution and health (Zanobetti et al., 2014). The literature has shown that the interactions among air pollutants may cause different impacts on individuals, including additive, multiplicative, and antagonistic effects (Mauderly et al., 2010). Understanding the effect of mixtures on health, rather than the effect of the individual components, is a crucial step that must be undertaken to enhance our understanding of the health effects of air pollution.

These studies make up the limited body of literature on regional
differences in air pollution mixtures. More comprehensive analyses are necessary to accurately depict differences in air pollution characteristics over space and time. Our objective is to further these efforts by investigating spatiotemporal regional differences of air pollution mixtures across the US.

We previously developed a cluster analysis-based framework to identify distinct groups of days with similar pollutant mixtures (Austin et al., 2012), to cluster days based on weather parameters (Austin et al., 2015), and then to identify spatial patterns in PM2.5 composition (Austin et al., 2013). Now, we aim to expand this framework by including a spatial constraints parameter and extending the study period to 9 years (2008–2016), in order to group areas that exhibit distinct pollutant profiles or mixtures in the US. These areas are referred to as air pollution regions. We will then characterize and validate the spatial patterns of pollution mixtures based on emission rates and source emissions influencing air quality.

We hypothesize that our cluster analysis based on air pollutant mixtures will make it possible to minimize within-region variability and maximize between-region variability of regional mixture profiles. We posit that areas exhibiting similar pollutant mixtures are impacted by similar sources and atmospheric processes, and this can inform development of more targeted and regionally tailored air quality public health management practices.

2. Materials and methods

2.1. Study design and data

We evaluated spatiotemporal patterns for 20 components of ambient PM2.5, including As, Ba, Ca, Cr, Cu, EC, Fe, K, Mn, Na, NH4+, Ni, NO3−, OC, Pb, Se, Si, SO42−, V, and Zn. Other elements obtained as part of the speciation of the filters were considered but were excluded because the data did not meet the criteria defined in the data preparation (these criteria are detailed below).

Data for this analysis were obtained from the EPA Air Data Monitoring Program, which provides air quality data collected at outdoor monitors across the US (https://www.epa.gov/outdoor-air-quality-data). We accessed data constructed on a daily basis listed by year (data with information on the day, month, and year of monitoring). We considered the period between 2008 and 2016.

We only considered air pollution monitoring stations that have less than 25% missing observations for the elements of interest. Stations that did not meet this criterion were excluded. We also required that each season within the study period has less than 25% missing data. This is to ensure that the site means are not unduly influenced by missing data between 2008 and 2016. As a result, 108 sites with complete datasets were selected. This 25% missing-data criterion was applied across all sites, regardless of whether they were sampled every day, every 3 days, or every 6 days.

Then, for each site, we estimated the average concentration between 2008 and 2016 for those 20 components of ambient PM2.5. This final dataset was used in the further stages of this study, as described in the next sections.

2.2. Mixture profiles

To analyze mixtures, we created a new parameter representing species profiles. We considered a mixture that consists of p species (i = 1, 2, 3, …, p) measured during a time period t (t = 1, 2, 3, …, m) at site j. The profile of a species i at site j and period t, f_ijt, is equal to the ratio C_ijt/P_jt, where C_ijt and P_jt are the concentrations of species i and PM2.5 at site j during period t, respectively. The correlation between profiles of two mixtures indicates the degree of similarity. Therefore, the normalization of species concentrations enabled us to compare mixture characteristics between and within regions. To eliminate differences in the order of magnitude between profile levels, the species fractions were standardized using a z-transformation. The z-transformation is the observation value minus the mean of all values, divided by the standard deviation of all values.

We used PM2.5 as a reference pollutant as its concentration will be more reliably determined using different data sources, including ground monitor, satellite, and modeled data. Although our analysis accounts only for ground monitors, future studies can use our framework for satellite and modeled data. Also, PM2.5 modifies the sum of species included in this analysis. In addition, we chose PM2.5 because it is responsible for a large proportion of air pollution mortality. Thus, future health studies can express mixture toxicity in a risk unit per 1 μg/m3 of PM2.5. Using PM2.5 as a reference pollutant for both the mixture profiles and health effects makes it possible to directly link mixture composition to its toxicity.

The results of the mixture profiles were consolidated in a single dataset (20 profiles for each of the 108 air pollution monitoring stations) that was used in the cluster analysis, as detailed in the next section.

2.3. Spatial patterns analysis

As mentioned above, we estimated spatiotemporal clustering of pollutant mixtures based on the framework we previously developed (Austin et al., 2013). We used the k-means algorithm to partition air pollution monitoring stations into clusters. This algorithm is easily implemented, computationally efficient, and not very sensitive to outliers (Punj and Stewart, 1983). The difference from our previous framework is that in the current analysis we accounted for a spatial constraint parameter.

We employed the Spatially Constrained Multivariate Clustering (SCMC) approach, which clusters multivariate vectors of pollution constituents while accounting for spatial autocorrelation among measurements in neighboring sites, to yield air pollution regions. This method estimates clusters using local measures of spatial autocorrelation based on a spatial constraints parameter (described in the next section). The use of spatial constraints results in air pollution regions made up of contiguous sites, as opposed to regions whose sites belong to different geographic areas. We anticipate that this approach maximizes air pollution homogeneity within regions and heterogeneity across regions. To this end, we applied the SCMC method using an algorithm based on an unsupervised machine learning process. This algorithm supports the goal of determining the best solution that maximizes both intra-cluster similarity and inter-cluster differences (Assuncao et al., 2006; Duque et al., 2007).

2.3.1. Spatial aspect – modeling spatial relationships

In our study, the spatial aspect (spatial constraints parameter) refers to the conceptualization of spatial relationships (spatial neighboring) among the ambient monitors across the US. We chose the inverse distance method to describe those spatial relationships. This method is recommended by the literature to model variables where the closer two features (monitoring sites) are in space, the more likely they are to interact with each other (Wong et al., 2004). The inverse distance method considers that every monitor is potentially a neighbor of every other monitor. We forced the model to estimate the minimum distance that ensures every feature has at least one neighbor. The inverse distance method was applied by generating a spatial weight matrix. This matrix encompasses the conceptualization of the relationships among a set of points.

2.3.2. Optimal number of clusters (Pseudo-F statistic)

In order to determine each cluster, the SCMC approach first identifies seed sites (monitors) randomly, and then assigns all sites to the closest seed site. Then, it computes a mean data center for each cluster of sites, and reassigns each site to the closest center.

The number of seed features selected randomly matched the number
of clusters. To estimate the optimal number of clusters, we applied the Calinski-Harabasz pseudo F-statistic (Calinski and Harabasz, 1974), as presented by the following equations:

$$CH = \frac{T^2 / (n_c - 1)}{(1 - T^2) / (n - n_c)} \tag{1}$$

where CH is the Calinski-Harabasz pseudo F-statistic; $n_c$ is the number of clusters; $n$ is the number of sites; and $T^2$ is defined as follows:

$$T^2 = \frac{SST - SSE}{SST} \tag{2}$$

where SST is a reflection of between-clusters differences (described by Equation (3)) and SSE is a reflection of within-clusters similarity (described by Equation (4)).

$$SST = \sum_{m=1}^{n_c} \sum_{j=1}^{n_i} \sum_{k=1}^{n_v} \left( V^{k}_{mj} - \bar{V}^{k} \right)^2 \tag{3}$$

$$SSE = \sum_{m=1}^{n_c} \sum_{j=1}^{n_i} \sum_{k=1}^{n_v} \left( V^{k}_{mj} - \bar{V}^{k}_{g} \right)^2 \tag{4}$$

where $n_i$ is the number of sites in cluster m; $n_c$ is the number of clusters; $n_v$ is the number of variables used to group sites; $V^{k}_{mj}$ is the value of the kth variable of the jth site in the mth cluster; $\bar{V}^{k}$ is the mean value of the kth variable; and $\bar{V}^{k}_{g}$ is the mean value of the kth variable in cluster m.

Table 1
R2 values (the effectiveness of each species fraction to divide the sites into clusters).

Species fraction   Mean         Standard deviation   Minimum      Maximum      R2
Cu                 0.00250      1.004448             −0.13503     7.912261     1
Se                 0.002499     1.004448             −0.141095    7.387605     0.999993
NO3−               0.002683     1.004422             −0.144867    7.912232     0.999989
Cr                 0.001724     1.004538             −0.126204    10.357268    0.999930
Ba                 0.002213     1.004482             −0.15068     7.379933     0.999881
V                  0.001830     1.004527             −0.145462    8.728342     0.999877
Mn                 0.002927     1.004373             −0.195065    9.369018     0.998401
K                  0.001765     1.004485             −0.229605    7.534327     0.997474
Ni                 −0.013538    0.997743             −0.370325    7.799321     0.98489
Zn                 0.0108230    1.000811             −0.770086    6.825110     0.93903
Na                 −0.080534    0.795406             −0.662960    3.176717     0.89838
Si                 −0.034974    0.923669             −0.800437    3.835498     0.889383
As                 0.0086620    1.002555             −1.548009    6.133993     0.86320
Fe                 −0.010572    0.990227             −1.446661    5.692884     0.851959
Pb                 0.011470     1.000593             −1.133681    5.086068     0.845947
Ca                 −0.020774    0.992678             −0.906632    4.265528     0.803963
SO42−              0.004313     0.998129             −2.315519    2.534484     0.802544
NH4+               0.037589     0.964309             −2.032916    2.407511     0.746411
EC                 0.017339     0.993448             −1.558706    4.639332     0.718052
OC                 0.042189     0.953742             −2.668853    3.128883     0.717184

Notes: the mean, standard deviation, minimum, and maximum values are standardized.
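As a rough illustration of Equations (1)–(4), the sketch below computes the pseudo F-statistic from a matrix of standardized species fractions and a vector of cluster labels, together with a per-variable ratio of the same form, (SST_k − SSE_k)/SST_k. It is a minimal sketch and not the software used in this study: the function names and the toy data are hypothetical, and SST and SSE are computed exactly as the sums are written, that is, as squared deviations from the overall mean and from the mean of each site's own cluster.

```python
from collections import defaultdict


def _column_means(rows):
    """Mean of each variable (column) over a list of equal-length rows."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def sums_of_squares(X, labels):
    """Per-variable (SST_k, SSE_k): squared deviations from the overall mean
    and from the mean of the cluster each site belongs to."""
    overall_mean = _column_means(X)
    by_cluster = defaultdict(list)
    for row, lab in zip(X, labels):
        by_cluster[lab].append(row)
    cluster_mean = {lab: _column_means(rows) for lab, rows in by_cluster.items()}

    n_vars = len(overall_mean)
    sst = [0.0] * n_vars
    sse = [0.0] * n_vars
    for row, lab in zip(X, labels):
        for k, value in enumerate(row):
            sst[k] += (value - overall_mean[k]) ** 2
            sse[k] += (value - cluster_mean[lab][k]) ** 2
    return sst, sse


def pseudo_f(X, labels):
    """Calinski-Harabasz pseudo F-statistic following Equations (1)-(2)."""
    sst, sse = sums_of_squares(X, labels)
    n, n_c = len(X), len(set(labels))
    t2 = (sum(sst) - sum(sse)) / sum(sst)
    return (t2 / (n_c - 1)) / ((1 - t2) / (n - n_c))


def r2_per_variable(X, labels):
    """(SST_k - SSE_k) / SST_k for each variable k."""
    sst, sse = sums_of_squares(X, labels)
    return [(t - e) / t for t, e in zip(sst, sse)]


# Toy data (made up): 6 sites, 2 variables, 2 clusters.
X = [[0.0, 1.0], [1.0, 1.0], [0.0, 3.0],
     [10.0, 1.0], [11.0, 3.0], [10.0, 3.0]]
labels = [0, 0, 0, 1, 1, 1]

print(pseudo_f(X, labels))
print(r2_per_variable(X, labels))
```

In this toy example the first variable separates the two clusters almost completely, so its ratio is close to 1, while the second variable barely differs between clusters and its ratio is about 0.11.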
The pseudo F-statistic was performed for 30 simulations (simulating 2 clusters, 3 clusters, 4 clusters, up to 30 clusters). The highest pseudo F-statistic value among these simulations determines the optimal number of clusters.

2.3.3. Analysis of variables to distinguish cluster

The SCMC analysis calculates an R2 value for each variable (species fraction). This value indicates the variable that divides the study area into clusters of sites more effectively (Assuncao et al., 2006; Duque et al., 2007). The larger the R2 value, the better the discrimination among the sites. In other words, the R2 value reflects how much of the variation in the original elemental concentration data was retained after the clustering process. The R2 value is calculated as follows:

$$R^2 = \frac{TSS - ESS}{TSS} \tag{5}$$

where TSS is the total sum of squares, obtained by squaring and then summing deviations from the global mean value for a particular variable; and ESS is the explained sum of squares, obtained the same way, except that each deviation is calculated by subtracting every value from the mean value of the cluster it belongs to before squaring and summing.

2.3.4. Membership probabilities

We evaluated cluster membership likelihood using 1000 permutations of random spanning trees and evidence accumulation (Lage et al., 2001; Maravalle et al., 1997). This process is implemented by the Skater method (Assuncao et al., 2006; Lage et al., 2001). Cluster analysis of spatial objects has an inherent problem with relational (contiguity) constraints. Lage et al. (2001), Maravalle et al. (1997), and Assuncao et al. (2006) define this as an optimization problem related to contiguity-constrained clustering (known as clustering problems on trees). The Skater method reduces the original graph representation of the spatial information of the objects by pruning edges, generating a minimal spanning tree. The permutation process performed by the Skater approach assesses how often sites of a particular cluster are clustered together under the varying spanning trees. Sites that switch clusters as a consequence of variations in the spanning tree are assigned low membership probabilities, while sites that do not switch clusters are assigned high membership probabilities. A low membership probability indicates that the site could be classified in a different cluster group, whereas a high membership probability suggests confidence that the site belongs in the cluster group in which it was included. The membership probabilities range between 0 and 1. In other words, the membership probability measures how well our methodology minimized within-region variability and maximized between-region variability of regional mixture profiles (the closer to 1, the higher the confidence that our methodology achieved its purpose).

2.4. Sensitivity analysis

We conducted a sensitivity analysis to examine how sensitive the results were to the following parameters: i) spatial constraints; ii) data completeness; iii) site inclusion; iv) characterization of pollutant mixtures; and v) number of clusters.

First, we changed the spatial constraints method, from the inverse distance approach to two other methods: the spatial weight matrix based on k nearest neighbors (minimum neighbors = 8), and the trimmed Delaunay triangulation. Then, to test the sensitivity to data completeness, for each site, 20% of the days were randomly excluded and the season means were recalculated. To test how the clusters depend on which sites are included in the analysis, we randomly removed 10% of the sites and repeated the clustering. To test the sensitivity to the approach used to characterize the mixture profiles, we repeated the analyses using the dataset without mixture profiles. Here we considered the measured pollutant concentrations reported by the US EPA and only standardized the values using the z-transformation. Finally, we tested the sensitivity to the number of clusters by looking at the tradeoffs between the number of clusters and the pseudo F-statistic.

3. Results

3.1. Summary statistics

Table S.1 in supplemental information shows the descriptive statistics (over the 108 sites) of the 20 components of ambient PM2.5 included in the study. OC, SO42−, and NO3− were the components with the highest means, 1.99, 1.62, and 1.27 μg/m3, respectively. The lowest means were observed for As, Se, V, and Ni, with mean concentrations equal to 0.0006, 0.0006, 0.0007, and 0.0010, respectively.
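The concentrations summarized above are not clustered directly: Section 2.2 first converts them to fractions of PM2.5 and then standardizes each fraction across sites. The sketch below illustrates those two steps for site-average data. It is only an illustration under stated assumptions: the function names, the species keys, and the numbers are made up, the z-score follows the usual convention (observation minus mean, divided by the standard deviation), and the population standard deviation is used because the text does not say which estimator was applied.

```python
from statistics import mean, pstdev


def species_fractions(conc, pm25):
    """Fraction of PM2.5 for every species at every site: f_ij = C_ij / P_j.

    conc maps a species name to its site-average concentrations (one per site);
    pm25 holds the site-average PM2.5 concentrations in the same site order.
    """
    return {species: [c / p for c, p in zip(values, pm25)]
            for species, values in conc.items()}


def z_transform(values):
    """Standardize one species fraction across sites.

    Assumption: population standard deviation; sites must not all be equal.
    """
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


# Hypothetical site averages in ug/m3 for three sites (not data from the study).
pm25 = [10.0, 8.0, 12.0]
conc = {"SO4": [2.0, 2.4, 1.2], "OC": [2.0, 1.6, 3.6]}

fractions = species_fractions(conc, pm25)
profiles = {species: z_transform(values) for species, values in fractions.items()}

for species, values in profiles.items():
    print(species, [round(v, 3) for v in values])
```

Standardizing after taking the ratio keeps abundant species such as OC from dominating the distance calculations simply because their fractions are numerically larger than those of trace elements.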
Fig. 1. Optimal number of clusters (Pseudo F values).
Fig. 2. Distribution of membership probability.

Note: This histogram was based on the membership probability value estimated for each air pollution site (total of 108 sites).
In Section 3.3, Table 1, we present the descriptive statistics for the fractions considered in the cluster analysis (concentration of species i/PM2.5 concentration in site j).

3.2. Number of clusters

Our analysis resulted in 27 clusters. This is the optimal number of clusters that maximizes both intra-cluster similarity and inter-cluster dissimilarity. As described in the methods section, the optimal number of clusters was estimated with the pseudo F-statistic, where the highest pseudo F-statistic value represents the optimal number of clusters. For 27 clusters, the estimated pseudo F-statistic value was 28.94. Fig. 1 shows the pseudo F-statistic values for the different simulations (number of clusters, from 2 to 30).

3.3. Effectiveness of each species fraction to divide sites into clusters

Table 1 shows descriptive statistics (considering the whole sample, 108 sites) and the R2 value for the 20 fractions (concentration of species i/PM2.5 concentration in site j) considered in our analysis. Based on the R2 value, the results suggest that Cu, Se, NO3−, Cr, and Ba were the top five species that divided the study area into clusters of sites most effectively. The larger the R2 value, the better the discrimination among the sites. EC and OC presented the lowest R2 values.

3.4. Cluster membership likelihood

The histogram (plus mean and standard deviation values) of membership probability is presented in Fig. 2. These results can provide a sense of the likelihood of significantly different conclusions (e.g., a particular site could be classified in a different cluster). This is a measure of pollution homogeneity within clusters, and heterogeneity across regions. A low membership probability (lowest value = 0) suggests low confidence in the results, while a high membership probability (highest value = 1) indicates high confidence in the results. In general, our results show relatively high confidence. The average membership probability was 0.85, with a standard deviation of 0.18.

3.5. Clusters characteristics

3.5.1. Spatiotemporal distribution and geographic parameters

Fig. 3 presents a map of the locations of the clusters and a chart illustrating the number of sites in each cluster. Five clusters (clusters 1, 3, 7, 13, and 26) had more than 4 sites. Our analyses also show that 11 clusters (clusters 2, 8, 9, 10, 15, 16, 18, 21, 23, 24, and 27) contained a single site.

Among the clusters with more than 4 sites, cluster 13 was the one with the highest number of sites, 33 air pollution monitoring stations. Cluster 13 is located in the northeast and part of the Midwest (Ohio, Indiana, Illinois, and Wisconsin). Cluster 1 has 14 sites and covers part of the southeast, including the states of North Carolina, South Carolina, Georgia, and Florida. The southwest has a very prominent cluster with 8 sites (cluster 26), covering parts of Louisiana, Mississippi, Texas, and Arkansas. On the west coast, two clusters were highlighted in our analysis, cluster 3 in California and cluster 7 in Washington and part of Oregon, both with 5 sites.

The input dataset in our analysis includes the land use classification for each site. This classification was assigned by the EPA and is divided into three groups: rural, suburban, and urban/center city areas. We present in Fig. 4 the average mixture concentration rates (species/PM2.5) of all species fractions by cluster and land use type. Our results showed that sites do not necessarily have the same land use classification within a cluster. This can be observed in clusters 1, 4, 13, and 26, which contain all three land use classes. This may be related to the relationship between regional and local pollution.
Fig. 3. Spatial distribution and number of sites by cluster. Note: the map and the chart have the same color key according to the cluster. (For interpretation of the
references to color in this figure legend, the reader is referred to the Web version of this article.)
Fig. 4. Average mixture concentration rates (species/PM2.5) by cluster and land use.
Fig. 5. Heatmap of the standardized concentration ratio by cluster of the mixture profiles. Note: Axis-x represents the mixture profiles; axis-y represents the clusters;
and the color key distribution across the heatmap was standardized by columns (mixture profiles). (For interpretation of the references to color in this figure legend,
the reader is referred to the Web version of this article.)
3.5.2. Concentration ratios

Fig. 5 shows the distribution of the concentration ratio across all 27 clusters. These diagnostic ratios permit simple comparisons between cluster types and allow us to interpret the multi-pollutant mixtures according to certain types of pollution regimes (tracer elements of source types). For example, our results show that clusters 2 and 8 represent sites with very high Mn and Cr ratios, respectively.

3.6. Sensitivity of the results

Our results demonstrated some sensitivity to the five parameters tested: spatial constraints, data completeness, site inclusion, approach used to characterize the pollutant mixture, and number of clusters.

For the spatial constraints, the number of clusters changed when we applied the weight matrix approach with k nearest neighbors or when we applied the trimmed Delaunay triangulation approach to represent the spatial relationship among sites. Our results showed 2 clusters when we used the weight matrix or the trimmed Delaunay triangulation. For both spatial constraints approaches, the first cluster is composed of 3 sites located on the west coast. The other 105 sites represent the second cluster. K nearest neighbors is based on a prior minimum number of neighbors (we defined 8). This is appropriate for analysis when fixing the spatial scale is less important than fixing the number of neighbors. Spatial scale is an important aspect in air pollution studies. The triangulation method defines neighbors based on Voronoi triangles, in which each site is a triangle node. Nodes connected by a triangle edge are considered neighbors (Cai et al., 2018; Deng et al., 2011). The limitation of this method is that some grouped triangles are not contiguous over space. Therefore, we recommend the use of the inverse distance approach (used in the primary analysis). The inverse distance method guarantees at least one neighbor for each site, and the spatial distribution of the data itself determines how many neighbors each site gets. This method makes it possible to create air pollution regions made up of contiguous sites, as opposed to regions whose sites belong to different geographic areas. This approach maximizes air pollution homogeneity within regions and heterogeneity across regions. Therefore, a more substantiated grouping of regions can inform management of regional air sheds.
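The inverse distance conceptualization recommended above can be sketched as a spatial weight matrix whose distance cutoff is the smallest value that still leaves every site with at least one neighbor, which is the largest nearest-neighbor distance. This is one plausible reading of the description in Section 2.3.1, not the implementation used in the study: the function names are hypothetical, the coordinates are made-up planar values (real sites would first need projecting to a planar coordinate system), and site locations are assumed to be distinct so that no distance is zero.

```python
import math


def pairwise_distances(points):
    """Euclidean distance between every pair of planar points."""
    n = len(points)
    return [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]


def min_threshold(dist):
    """Smallest cutoff giving every site at least one neighbor:
    the largest of the nearest-neighbor distances."""
    n = len(dist)
    return max(min(dist[i][j] for j in range(n) if j != i) for i in range(n))


def inverse_distance_weights(points):
    """Weight 1/d for pairs within the cutoff, 0 otherwise (and on the diagonal)."""
    dist = pairwise_distances(points)
    cutoff = min_threshold(dist)
    n = len(points)
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and dist[i][j] <= cutoff:
                weights[i][j] = 1.0 / dist[i][j]
    return weights, cutoff


# Hypothetical projected coordinates (arbitrary units) for four sites.
points = [(0.0, 0.0), (3.0, 0.0), (3.0, 4.0), (10.0, 4.0)]
weights, cutoff = inverse_distance_weights(points)

print("cutoff:", cutoff)
for row in weights:
    print([round(w, 3) for w in row])
```

With this construction an isolated monitor sets the cutoff for the whole network, so densely monitored areas end up with many neighbors per site while remote sites keep only one or two, which is the behavior the text describes as letting the spatial distribution of the data decide the number of neighbors.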
Fig. 6. Spatial distribution of clusters (each color represents one cluster) for the sensitivity based on the approach used to characterize the mixture profiles (without
accounting for the ratio by PM2.5 concentration). This sensitivity analysis resulted in 30 clusters, illustrated in this map.
The clusters distribution also changed when we excluded 20% of the 4. Discussion
days (data completeness parameter) and 10% of the sites (site inclusion
parameter). By incorporating this change, our results showed a sig- This is one of the limited number of studies focusing on spatial
nificant difference in the optimal number of clusters. While in the patterns analysis to estimate regions that exhibit distinct pollutant
primary analysis we estimated 27 clusters, in this sensitivity analysis we mixtures on a large scale (all the US). Our analysis was based on a
estimated 17 clusters. The sensitivity to data completeness suggests that framework previously developed for a single city (Austin et al., 2013).
the site means are influenced by missing data sets between 2008 and We adapted this framework for a multi-city study in the US. The chal-
2016. This supports the data treatment that we performed before lenge here was related to the aspects of cluster analysis that are in-
cluster analysis. In this treatment, we defined the completeness of the herently subjective in selecting the best clustering solution for each
original data to be greater than 75% for the sites included in the ana- location. However, our findings suggest strong confidence (based on the
lysis. This sensitivity was observed when we applied our framework statistical criteria) according to the results of the membership like-
previously (Austin et al., 2013). lihood, which the average membership probability was 0.85, with a
The clustering is also subject to the approach used to characterize standard deviation equal to 0.18. The higher cluster membership is
the mixture profiles. As we described above, we repeated the analysis essential in cluster analysis, without it, the clusters are of little use for
without accounting for the mixture profiles (ratio by PM2.5 concentra- air pollution studies (Keller et al., 2017).
tion). We repeated this analysis considering only the concentration of Our findings show that the spatial variation in air pollution mixtures
individual species. As illustrated by Fig. 6, the number and spatial in the US affect substantially in defining cluster profiles. This is con-
distribution of clusters changed. For example, in the primary analysis sistent with previous studies that have demonstrated strong spatio-
we estimated 27 clusters, while in this sensitivity analysis the number temporal variation in air pollution (Austin et al., 2012; Bell et al., 2007;
of clusters estimated was 30. The cluster profile also changed with this Li et al., 2017; Querol et al., 2008; Zhang et al., 2015). For example,
sensitivity analysis. For example, in the primary analysis we estimated Austin et al. (2012) observed that seasonal patterns within cluster of
about 26 sites in the Northeast reflecting high concentrations of SO4−2 and OC, whereas in this sensitivity analysis we observed about 38 sites. The results of this sensitivity analysis show different aspects of the degree of similarity between species and PM2.5. As we mentioned before, the correlation between the profiles of two mixtures indicates this degree of similarity. Therefore, normalizing species concentrations (using PM2.5 as a reference pollutant) supports the comparison of profile characteristics between and within regions or between time periods.

As shown in Fig. 1, the Pseudo F statistic changes only slightly between approximately 15 and 30 clusters. The biggest inflection point appears around 11 clusters. From the perspective of national air quality management strategies, it would be better to have a small number of clusters. Therefore, we conducted a sensitivity analysis to test how sensitive the results are to three different numbers of clusters: 10, 15, and 20. These numbers were chosen based on the tradeoff between the number of clusters and the Pseudo F statistic. Fig. 7 shows the spatial distribution of clusters for this sensitivity analysis. Our primary analysis (27 clusters) is similar to the analyses with 15 and 20 clusters. The results show that only a few sites are assigned to different clusters when the number of clusters varies from 15 to 27.

pollutant mixtures in Boston. The authors suggest that the conditions that lead to the formation of the mixture captured by some clusters occur most often in specific regions, including the northeast. Bell et al. (2007) found distinct regional and seasonal patterns of the PM2.5 components in the US. The authors report that the degree of spatiotemporal variation differs by PM2.5 component.

Besides spatiotemporal variation in pollution mixtures, air pollution sources, chemical properties, and geographic parameters were also identified as significant factors in distinguishing one region from another. This agrees with the literature showing that these variables can modify air pollution exposure levels (Austin et al., 2013; Bari and Kindzierski, 2016; Keller et al., 2017; Requia et al., 2017; van Donkelaar et al., 2014). In particular, we observed a similar influence from these factors when we compared our results with those obtained in the original framework (Austin et al., 2013), which serves as the reference for the present study. Other work demonstrates that geographic covariate information increases the precision of exposure assignment when using clusters of air pollution at cohort locations (Keller et al., 2017).

Our findings also agree with previous investigations when we interpret our results alongside source apportionment studies that consider regional profiles. This allows us to characterize the clusters into regions with certain types of pollution regimes based on emissions
Fig. 7. Spatial distribution of clusters (each color represents one cluster) for the sensitivity analysis examining how sensitive the results are to different numbers of clusters: 10, 15, and 20.
Note: the clusters are represented by the different colors in the maps.
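The Pseudo F statistic used to compare candidate numbers of clusters can be illustrated with a minimal sketch. It assumes the Calinski and Harabasz (1974) form, the between-cluster sum of squares over (k − 1) divided by the within-cluster sum of squares over (n − k). The function name `pseudo_f` and the toy profile values are hypothetical and are not taken from the study; the spatially constrained clustering itself is not reproduced here.

```python
import numpy as np

def pseudo_f(X, labels):
    """Pseudo F (Calinski-Harabasz form): (B / (k - 1)) / (W / (n - k)).

    X      : (n_sites, n_species) array of mixture profiles
    labels : cluster label for each site
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    n = X.shape[0]
    clusters = np.unique(labels)
    k = clusters.size
    if k < 2 or k >= n:
        raise ValueError("need 2 <= k < n clusters")
    grand_mean = X.mean(axis=0)
    between = 0.0  # B: size-weighted squared distance of centroids to grand mean
    within = 0.0   # W: squared distance of sites to their own centroid
    for c in clusters:
        members = X[labels == c]
        centroid = members.mean(axis=0)
        between += members.shape[0] * np.sum((centroid - grand_mean) ** 2)
        within += np.sum((members - centroid) ** 2)
    return (between / (k - 1)) / (within / (n - k))

# Hypothetical toy profiles: 9 sites, 2 species, three well-separated groups.
profiles = np.array([
    [0.10, 0.90], [0.12, 0.88], [0.11, 0.91],
    [0.50, 0.50], [0.52, 0.48], [0.49, 0.51],
    [0.90, 0.10], [0.88, 0.12], [0.91, 0.09],
])
labels_k3 = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])  # matches the three groups
labels_k2 = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1])  # merges two of the groups

f_k3 = pseudo_f(profiles, labels_k3)
f_k2 = pseudo_f(profiles, labels_k2)
print(f"k=3: {f_k3:.1f}, k=2: {f_k2:.1f}")
```

In this toy case the three-cluster assignment yields the larger value because it removes most of the within-cluster variance. With real data the curve often flattens over a range of k, which is the kind of tradeoff the sensitivity analysis explores.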
Fig. 8. Spatial distribution and mixture ratios in the region 1 - Northeast and part of the Midwest.
Note: the map and the chart have the same color key according to the cluster.
sources. Therefore, we suggest a characterization into 5 regions, as described below.

4.1. Region 1 - northeast and part of the midwest

The first region is the northeast and part of the Midwest. This region is mostly defined by cluster 13 (the cluster with the highest number of sites, 33), with a mix of rural, suburban, and urban areas, plus seven other clusters with few sites. Fig. 8 illustrates this region and the distribution of the mixture ratio of each cluster.

Overall, this region reflects air pollution sites with the higher SO4−2 concentration ratio (except for two clusters: cluster 27 with a single site, and cluster 22 with three sites). Most sulfate aerosol in the atmosphere comes from the photochemical conversion of SO2 (Roberts and Friedlander, 1976). Source apportionment studies have shown that power plants are the main sources of SO2 (Fu et al., 2013; Huang et al., 2012). According to the US Energy Information Administration (EIA), in 2015, about 40% of the SO2 emissions from power plants in the US occurred in the New England, Middle Atlantic, and East North Central regions (EIA, 2015).

Cluster 16, with a single site located in a suburban area in west Pennsylvania (known as the Industrial Midwest), exhibited very high (nearly the third quartile) ratios of SO4−2, As, EC, NH4+, OC, and Pb. This suggests contributions from coal combustion and industrial processes. The high ratios of SO4−2 and NH4+ in cluster 16 may be due to the chemistry of these species, which are interdependent: sulfuric acid and ammonia react to form ammonium sulfate. Indeed, sulfate is mostly present in the atmosphere in the form of ammonium sulfate. Studies show that a reduction in sulfate will increase the available free ammonium (Ciuraru et al., 2012). Also, ammonia from sources such as fertilizer contributes to the formation of sulfates and nitrates that exist in the atmosphere as ammonium sulfate and ammonium nitrate (Shen et al., 2011). Other clusters located in the Industrial Midwest area, cluster 12 (3 sites), cluster 11 (4 sites), and cluster 21 (1 site), presented similar mixture ratios. Cluster 21 (south Illinois), in particular, is highlighted by very high ratios of Fe and Zn, reflecting emissions from road dust and motor vehicles (Almeida-Silva et al., 2011; Bari and Kindzierski, 2016; Liu et al., 2014).

Specifically in the coastal area (East coast), cluster 17, with 4 sites in urban areas, reflects high ratios of Ni and Na. The high Ni ratio suggests that this location is impacted by emissions from ports; studies have shown contributions from ship emissions to Ni concentrations (Agrawal et al., 2008; Moldanová et al., 2009). The high Na ratio indicates the presence of sea salt, the main source of sodium (Bersenkowitsch et al., 2018; Laskin et al., 2003).

Finally, we highlight the high ratio of NH4+ in this region (except clusters 22 and 27, which are closer to the coastal area). This reflects contributions from agricultural areas. As mentioned above, ammonia from sources such as fertilizer contributes to the formation of sulfates and nitrates that exist in the atmosphere as ammonium sulfate and ammonium nitrate (Shen et al., 2011).
Fig. 9. Spatial distribution and mixture ratios in the region 2 - Southeast.

Note: the map and the chart have the same color key according to the cluster.
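The regional descriptions repeatedly place a cluster's species ratio relative to the quartiles of that ratio across clusters ("above the third quartile", "within the interquartile range", "in the first quartile"). The sketch below illustrates that kind of classification. It assumes the mixture ratio is the species concentration divided by PM2.5 mass, following the normalization with PM2.5 as reference pollutant. The function names, labels, and all numeric values are hypothetical and not from the study.

```python
import numpy as np

def mixture_ratios(species_conc, pm25):
    """Normalize species concentrations by PM2.5 mass (assumed definition of the ratio)."""
    return np.asarray(species_conc, dtype=float) / np.asarray(pm25, dtype=float)

def quartile_class(ratios):
    """Label each cluster's ratio relative to the first and third quartiles across clusters."""
    q1, q3 = np.percentile(ratios, [25, 75])
    classes = []
    for r in ratios:
        if r > q3:
            classes.append("above Q3")
        elif r < q1:
            classes.append("below Q1")
        else:
            classes.append("within IQR")
    return classes

# Hypothetical cluster-mean sulfate and PM2.5 concentrations (ug/m3) for six clusters.
so4 = [3.0, 0.5, 2.0, 0.5, 2.5, 8.0]
pm25 = [10.0, 5.0, 10.0, 10.0, 10.0, 20.0]

ratios = mixture_ratios(so4, pm25)
classes = quartile_class(ratios)
for cluster_id, (r, c) in enumerate(zip(ratios, classes), start=1):
    print(f"cluster {cluster_id}: ratio={r:.2f} ({c})")
```

Normalizing first means a cluster with high PM2.5 mass is not automatically labeled as sulfate-rich; only its sulfate share of PM2.5 is compared across clusters.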
4.2. Region 2 - southeast

This region is mostly characterized by cluster 1, composed of 14 sites located in a mix of rural, suburban, and urban areas. In addition, there are three more clusters in this area with single sites: clusters 15, 18, and 23. Fig. 9 illustrates the Southeast region and the distribution of the mixture ratio of each cluster.

The 14 sites grouped as cluster 1 reflect average ratios for most of the pollutants. The exception is OC, whose ratio is almost at the third quartile. Organic aerosols are a complex mixture of chemical compounds formed primarily by incomplete combustion or the oxidation of gas-phase precursors. They can be produced from fossil fuel and biofuel burning and from natural biogenic emissions (Kanakidou et al., 2005). Huang et al. (2015) estimated a global emission inventory of OC and found that the Southeast of the US is a region with substantial OC concentration. Huang et al. (2015) also show that in the US more than 90% of the anthropogenic OC comes from oil, gas, and coal emissions.

The single site represented by cluster 18 in south Florida reflects very high ratios (above the third quartile) of Ba, Ca, Cu, Fe, K, Na, Ni, NO3−, Se, Si, and V, suggesting significant contributions from a mixture of sources, including road dust (Ba, Ca, Fe, Na, and Si), motor vehicles (Cu), wood burning (K), coal combustion (Se), and common sources in coastal areas, namely ship engine exhaust (Ni) and sea salt (Na), as mentioned above. Regarding NO3− in particular, note that this single site in cluster 18 presented extreme values for NH4+, SO4−2, and NO3−. This may reflect the complex chemistry of these species, in which more ammonia becomes available to react with nitric acid to form ammonium nitrate. When nitric acid and sulfuric acid are both present, the reaction between ammonia and sulfuric acid is thermodynamically favored. In addition, sulfuric acid and ammonia react to form ammonium sulfate. Indeed, sulfate is mostly present in the atmosphere in the form of ammonium sulfate. Studies show that a reduction in sulfate will increase the available free ammonium (Ciuraru et al., 2012).

The other two single-site clusters (clusters 15 and 23) have similar ratios. These clusters are located in a suburban/urban area in Alabama and show very high ratios (above the third quartile) of As, Ca, Fe, Mn, Pb, and Zn. This represents a mixture of pollution regimes based on emissions sources. The high ratio of As represents a significant contribution from coal combustion; Ca, Fe, and Mn from road dust; and Pb and Zn from motor vehicles.

4.3. Region 3 - south

This region encompasses one cluster with 8 stations in a mix of rural and urban areas (cluster 26), and three clusters with a single station each (clusters 2 and 8 in an urban area, and cluster 9 in a rural area). The clusters of the south region represent the states of Texas, Louisiana, Mississippi, Arkansas, Oklahoma, and Missouri, which are highlighted in Fig. 10.

Clusters 2 and 8 have very similar ratios. Looking at the whole ratio distribution across all 27 clusters estimated in our analysis (Fig. 5), we can see that clusters 2 and 8 reflect the highest ratios of Ba, Cr, Cu, EC, K, Mn, Ni, NO3−, Se, Si, and V. This represents a mixture of local and regional sources, including motor vehicles, road dust, oil/coal combustion, and wood burning.

Cluster 9 is the only one in the region located in a rural area, which
Fig. 10. Spatial distribution and mixture ratios in the region 3 - South.
Note 1: the map and the chart have the same color key according to the cluster.
Note 2: the states representing the region 3 (south) are highlighted with black polygons.
is expected to have substantial contributions from agricultural activities. Similar to the pollution regime observed in region 1 (Northeast and part of the Midwest), ammonia from fertilizer contributes to the formation of sulfates and nitrates that exist in the atmosphere as ammonium sulfate and ammonium nitrate (Shen et al., 2011). Cluster 9 presented high ratios of NH4+, NO3−, and SO4−2, all above the third quartile. Cluster 9 also presented very low As and EC ratios. These ratios were the lowest (very close to the minimum value) among the clusters in the south region (Fig. 10) and even among all clusters (Fig. 5). This reflects very little contribution from coal combustion (the main As source) and motor vehicles (an important source of EC as well as Zn, Pb, Cu, and Br, which were also low for cluster 9).

The cluster with the highest number of stations in the south region (cluster 26) has ratios within the interquartile range, except for Na, whose ratio was above the third quartile. This suggests some regional contribution from sea salt, since most of the sites in this region are not in the coastal area. A large body of literature has demonstrated that local air quality can be impacted by pollution from sources at local, regional, and even inter-continental scales due to atmospheric transport (Jeffe et al., 1999; Lin et al., 2014; J. Liu et al., 2009a, b; Ngo et al., 2018; Zhang et al., 2017). For example, Lin et al. (2014) estimate that air pollution sources in China contributed 3–10% of annual mean surface sulfate concentrations over the western United States in 2006. Similar to cluster 9, cluster 26 also reflects very little contribution from motor vehicles: the ratios for EC, Zn, Pb, and Cu were generally in the first quartile.

4.4. Region 4 – southwest and mountain

This region includes five clusters with similar pollution regimes: clusters 4 (three sites), 5 (three sites), 10 (a single site), 14 (two sites), and 24 (a single site) (Fig. 11).

All these clusters reflect little contribution from power plants (low ratios of SO4−2), including coal combustion (low ratios of As and Se). In contrast, these clusters suggest a high contribution from road dust (high ratios of Ca and Si). Overall, the clusters in the southwest/mountain region are differentiated by the ratios of EC, Fe, Na, NH4+, OC, Pb, and Zn. This is related to the geographical distribution of some tracer elements of source types. For example, the ratios of carbon particles (EC and OC) tend to be higher on the west coast. This reflects the significant EC and OC emissions from wildfires, most strongly in California, Nevada, and Arizona (Bendix and Commons, 2017; Doerr and Santín, 2016; Marlon et al., 2012). Some studies have defined the chemical characteristics of PM2.5 as the outcome in wildfire-related air pollution models (Gunsch et al., 2018; Jaffe et al., 2008; Spracklen et al., 2009). These studies have shown that the relationship between particle components and wildfire varies significantly over space and time depending on the chemical characteristics of PM2.5 and geographical characteristics, including weather parameters (McClure and Jaffe, 2018; Spracklen et al., 2007). Among the numerous chemical components of PM2.5, particulate carbon (including EC and OC) has most often been indicated as a tracer of wildfires (McClure and Jaffe, 2018). In our analysis, cluster 14, with high ratios of EC and OC, is located in the state of Nevada (the station in this state is very close to California)
Fig. 11. Spatial distribution and mixture ratios in the region 4 – Southwest and mountain.
Note 2: the states representing the region 4 (southwest and mountain) are highlighted with black polygons.
and Arizona.

4.5. Region 5 – west coast

Finally, the last region that we suggest according to our cluster analysis covers the west coast of the US, which can be considered a coastal region. This region encompasses 5 clusters: cluster 3 with 5 sites, cluster 7 with 4 sites, cluster 19 with 4 sites, cluster 20 with 2 sites, and cluster 25 with 2 sites (Fig. 12).

These clusters have similar pollution regimes (as we also identified in region 4) for most of the element fractions. For example, all 5 clusters presented high values of Na, OC, and EC (for EC, clusters 3, 19, and 25 had values within the interquartile range). As discussed above, Na and OC are indicators of marine aerosols and wildfire emissions, respectively. Both sources are significant on the west coast (Bendix and Commons, 2017; Doerr and Santín, 2016; Hand et al., 2012).

Most clusters presented low ratios of SO4−2, suggesting little contribution from power plants (similar to what we found for region 4). Coal combustion, the main source of As, contributes only to clusters 7 and 25. On the other hand, clusters 7, 19, and 25 reflect low ratios of NH4+, while clusters 3 and 20 reflect average ratios.

5. Conclusions

We propose an innovative approach to classify regions in the US based on clusters of air pollutant mixtures. We observed a strong influence of the relationship between regional and local pollution on the clustering of air pollution mixtures. First, we found that sites that are spatially close can be assigned to different clusters. Then, when we categorized the results based on land use, we found that sites within a cluster do not necessarily have the same land use class. These results may be related to whether the cluster profile was influenced by regional or local pollution. We suggest that the concentration measured at a specific air pollution monitoring station will have differing amounts of measurement error, depending on the spatial heterogeneity of a given pollutant across the study region.

We suggest that this study can help researchers, policy makers, and local communities create future strategies related to air pollution and environmental health. An important and immediate policy implication is that our approach supports more targeted, regionally specific air quality management practices by minimizing within-region variability and maximizing between-region variability of regional mixture profiles. This is based on the concept that regions with similar pollutant mixtures are impacted by similar air pollution sources and atmospheric processes. We expect that further investigations can use our findings to analyze the relationship between areas that exhibit distinct pollutant mixtures and the impact of regulations, climate change, and health effects in the US.

Finally, given that differences in PM2.5 constituents explain the varying effect size of the association between PM2.5 and health (Achilleos et al., 2017; Dai et al., 2014; Zanobetti et al., 2009), we suggest that our study can support further investigations to assess the health effects of PM2.5 components. For example, in the U.S., previous
Fig. 12. Spatial distribution and mixture ratios in the region 5 – West coast.
Note 2: the states representing the region 5 (west coast) are highlighted with black polygons.
studies have shown that health impacts for PM2.5 mass are higher when the PM2.5 content of Br, Cr, Ni, or Na was higher (Franklin et al., 2008; Zanobetti et al., 2009). In our study, we estimated that clusters 2 and 8 (located in region 3 – South) reflect high ratios of these PM2.5 components. Bell et al. (2009) estimated that regions in the U.S. with high concentrations of EC, V, or Ni had a higher risk of hospitalizations associated with short-term exposure to PM2.5. In our analysis, we observed that region 1 (the northeast and part of the Midwest) includes cluster 17, with 4 sites that reflect high ratios of EC, V, and Ni. In Boston, Zanobetti et al. (2014) found that a cluster characterized by high concentrations of the elements related to primary traffic pollution and oil combustion emissions had a significant association of PM2.5 with daily deaths: a 3.7% increase (95% CI: 0.4, 7.1) in total mortality per 10 μg/m3 increase in the same-day average of PM2.5. In our study, clusters suggesting high contributions from traffic and oil combustion are significant in regions 1 (especially in the coastal area on the East coast) and 5 (west coast). Therefore, we suggest that, taking our findings together, further investigations can assess the health effects of PM2.5 components by accounting for effect modification and mediation of the effects of spatial patterns via air pollution mixtures on health.

Acknowledgement

This work was supported by the US Environmental Protection Agency (grants RD-834798 and RD-835872). The contents of this report are solely the responsibility of the grantee and do not necessarily represent the official views of the US Environmental Protection Agency. Further, the agency does not endorse the purchase of any commercial products or services mentioned in the publication.

Appendix A. Supplementary data

Supplementary data to this article can be found online at https://doi.org/10.1016/j.atmosenv.2019.06.006.

References

Achilleos, S., Kioumourtzoglou, M.A., Wu, C. Da, Schwartz, J.D., Koutrakis, P., Papatheodorou, S.I., 2017. Acute effects of fine particulate matter constituents on mortality: a systematic review and meta-regression analysis. Environ. Int. 109, 89–100. https://doi.org/10.1016/j.envint.2017.09.010.
Agrawal, H., Malloy, Q.G.J., Welch, W.A., Wayne Miller, J., Cocker, D.R., 2008. In-use gaseous and particulate matter emissions from a modern ocean going container vessel. Atmos. Environ. 42, 5504–5510. https://doi.org/10.1016/j.atmosenv.2008.02.053.
Almeida-Silva, M., Canha, N., Freitas, M.C., Dung, H.M., Dionísio, I., 2011. Air pollution at an urban traffic tunnel in Lisbon, Portugal: an INAA study. Appl. Radiat. Isot. 69, 1586–1591. https://doi.org/10.1016/j.apradiso.2011.01.014.
Assuncao, R.M., Neves, M.C., Camera, G., Freitas, C., 2006. Efficient regionalization techniques for socio-economic geographical units using minimum spanning trees. Int. J. Geogr. Information Sci. 20, 797–811. https://doi.org/10.1080/13658810600665111.
Austin, E., Coull, B., Thomas, D., Koutrakis, P., 2012. A framework for identifying distinct multipollutant profiles in air pollution data. Environ. Int. 45, 112–121. https://doi.org/10.1016/j.envint.2012.04.003.
Austin, E., Coull, B.A., Zanobetti, A., Koutrakis, P., 2013. A framework to spatially cluster air pollution monitoring sites in US based on the PM2.5 composition. Environ. Int. 59, 244–254.
Austin, E., Zanobetti, A., Coull, B., Schwartz, J., Gold, D.R., Koutrakis, P., 2015. Ozone trends and their relationship to characteristic weather patterns. J. Expo. Sci. Environ. Epidemiol. 25, 535–542. https://doi.org/10.1038/jes.2014.45.
Bari, M.A., Kindzierski, W.B., 2016. Fine particulate matter (PM2.5) in Edmonton, Canada: source apportionment and potential risk for human health. Environ. Pollut. 1–11. https://doi.org/10.1016/j.envpol.2016.06.014.
Bell, M.L., Dominici, F., Ebisu, K., Zeger, S.L., Samet, J.M., 2007. Spatial and temporal variation in PM2.5 chemical composition in the United States for health effects studies. Environ. Health Perspect. 115, 989–995. https://doi.org/10.1289/ehp.9621.
Bell, M.L., Ebisu, K., Peng, R.D., Samet, J.M., Dominici, F., 2009. Hospital admissions and chemical composition of fine particle air pollution. Am. J. Respir. Crit. Care Med. 179, 1115–1120. https://doi.org/10.1164/rccm.200808-1240OC.
Bendix, J., Commons, M.G., 2017. Distribution and frequency of wildfire in California riparian ecosystems. Environ. Res. Lett. 12. https://doi.org/10.1088/1748-9326/aa7087.
Bersenkowitsch, N.K., Ončák, M., Van Der Linde, C., Herburger, A., Beyer, M.K., 2018. Photochemistry of glyoxylate embedded in sodium chloride clusters, a laboratory model for tropospheric sea-salt aerosols. Phys. Chem. Chem. Phys. 20, 8143–8151. https://doi.org/10.1039/c8cp00399h.
Cai, J., Liu, Q., Deng, M., Tang, J., He, Z., 2018. Adaptive detection of statistically significant regional spatial co-location patterns. Comput. Environ. Urban Syst. 68, 53–63.
Calinski, T., Harabasz, J., 1974. A dendrite method for cluster analysis. Commun. Stat. 3, 1–27.
Ciuraru, R., Gosselin, S., Visez, N., Petitprez, D., 2012. Heterogeneous reactivity of chlorine atoms with ammonium sulfate and ammonium nitrate particles. Phys. Chem. Chem. Phys. 14, 4527–4537. https://doi.org/10.1039/c2cp23455f.
Cooper, M.J., Martin, R.V., Van Donkelaar, A., Lamsal, L., Brauer, M., Brook, J.R., 2012. A satellite-based multi-pollutant index of global air quality. Environ. Sci. Technol. 46, 8523–8524. https://doi.org/10.1021/es302672p.
Dai, L., Zanobetti, A., Koutrakis, P., Schwartz, J.D., 2014. Associations of fine particulate matter species with mortality in the United States: a multicity time-series analysis. Environ. Health Perspect. 122, 837–842. https://doi.org/10.1289/ehp.1307568.
Deng, M., Liu, Q., Cheng, T., Shi, Y., 2011. An adaptive spatial clustering algorithm based on Delaunay triangulation. Comput. Environ. Urban Syst. 35, 320–332.
Doerr, S.H., Santín, C., 2016. Global trends in wildfire and its impacts: perceptions versus realities in a changing world. Philos. Trans. R. Soc. B Biol. Sci. 371. https://doi.org/10.1098/rstb.2015.0345.
Duque, J.C., Ramos, R., Surinach, J., 2007. Supervised regionalization methods: a survey. Int. Reg. Sci. Rev. 30, 195–220. https://doi.org/10.1177/0160017607301605.
EIA, 2015. Emissions by states [WWW document]. https://www.eia.gov/electricity/annual/html/epa_09_05.html (accessed 7.22.18).
Franklin, M., Koutrakis, P., Schwartz, P., 2008. The role of particle composition on the association between PM2.5 and mortality. Epidemiology 19, 680–689. https://doi.org/10.1097/EDE.0b013e3181812bb7.
Fu, X., Wang, S., Zhao, B., Xing, J., Cheng, Z., Liu, H., Hao, J., 2013. Emission inventory of primary pollutants and chemical speciation in 2010 for the Yangtze River Delta region, China. Atmos. Environ. Times 70, 39–50. https://doi.org/10.1016/j.atmosenv.2012.12.034.
Gunsch, M.J., May, N.W., Wen, M., Bottenus, C., Gardner, D.J., Vanreken, T.M., Bertman, S.B., Hopke, P.K., Ault, A.P., Pratt, K.A., 2018. Ubiquitous influence of wildfire emissions and secondary organic aerosol on summertime atmospheric aerosol in the forested Great Lakes region. Atmos. Chem. Phys. 18, 3701–3715. https://doi.org/10.5194/acp-18-3701-2018.
Hand, J.L., Schichtel, B.A., Pitchford, M., Malm, W.C., Frank, N.H., 2012. Seasonal composition of remote and urban fine particulate matter in the United States. J. Geophys. Res. Atmos. 117, 1–22. https://doi.org/10.1029/2011JD017122.
Huang, Q., Cheng, S., Perozzi, R.E., Perozzi, E.F., 2012. Use of a MM5–CAMx–PSAT modeling system to study SO2 source apportionment in the Beijing metropolitan region. Environ. Model. Assess. 17, 527–538. https://doi.org/10.1007/s10666-012-9312-8.
Huang, Y., Shen, H., Chen, Y., Zhong, Q., Chen, H., Wang, R., Shen, G., Liu, J., Li, B., Tao, S., 2015. Global organic carbon emissions from primary sources from 1960 to 2009. Atmos. Environ. 122, 505–512. https://doi.org/10.1016/j.atmosenv.2015.10.017.
Jaffe, D., Hafner, W., Chand, D., Westerling, A., Spracklen, D., 2008. Interannual variations in PM2.5 due to wildfires in the Western United States. Environ. Sci. Technol. 42, 2812–2818. https://doi.org/10.1021/es702755v.
Jeffe, D.A., Anderson, T.L., Covert, D.S., Kotchenruther, R., Trost, B., Danielson, J., Simpson, W., Berntsen, T.K., Karlsdottir, S., Blake, D.R., Harries, J., Carmichael, G., Uno, I., 1999. Transport of Asian air pollutant to North America. Geophys. Res. Lett. 26, 26711–26714.
Jeong, J.I., Park, R.J., 2013. Effects of the meteorological variability on regional air quality in East Asia. Atmos. Environ. 69, 46–55. https://doi.org/10.1016/j.atmosenv.2012.11.061.
Kanakidou, M., Seinfeld, J.H., Pandis, S.N., Barnes, I., Dentener, F.J., Facchini, M.C., Dingenen, R. Van, 2005. Organic aerosol and global climate modelling: a review. Atmos. Chem. Phys. 1053–1123. https://doi.org/10.5194/acp-5-1053-2005.
Keller, J.P., Drton, M., Larson, T., Kaufman, J.D., Sandler, D.P., Szpiro, A.A., 2017. Covariate-adaptive clustering of exposures for air pollution epidemiology cohorts. Ann. Appl. Stat. 11, 93–113. https://doi.org/10.1214/16-AOAS992.
Lage, J.P., Assuncao, R.M., Reis, E.A., 2001. A minimal spanning tree algorithm applied to spatial cluster analysis. Electron. Notes Discrete Math. 162–165.
Laskin, A., Caspar, D.J., Wang, W., Hunt, S.W., Cowin, J.P., Colson, S.D., Finlayson-Pitts, B.J., 2003. Reactions at interfaces as a source of sulfate formation in sea-salt particles. Adv. Technol. Aerosp. Database 301, 340–345.
Lee, H.J., Coull, B.A., Bell, M.L., Koutrakis, P., 2012. Use of satellite-based aerosol optical depth and spatial clustering to predict ambient PM2.5 concentrations. Environ. Res. 118, 8–15. https://doi.org/10.1016/j.envres.2012.06.011.
Li, R., Cui, L., Li, J., Zhao, A., Fu, H., Wu, Y., Zhang, L., Kong, L., Chen, J., 2017. Spatial and temporal variation of particulate matter and gaseous pollutants in China during 2014–2016. Atmos. Environ. 161, 235–246. https://doi.org/10.1016/j.atmosenv.2017.05.008.
Lin, J., Pan, D., Davis, S.J., Zhang, Q., He, K., Wang, C., Streets, D.G., Wuebbles, D.J., Guan, D., 2014. China's international trade and air pollution in the United States. Proc. Natl. Acad. Sci. U.S.A. 111, 1736–1741. https://doi.org/10.1073/pnas.1312860111.
Liu, E., Yan, T., Birch, G., Zhu, Y., 2014. Pollution and health risk of potentially toxic metals in urban road dust in Nanjing, a mega-city of China. Sci. Total Environ. 476–477, 522–531. https://doi.org/10.1016/j.scitotenv.2014.01.055.
Liu, J., Mauzerall, D.L., Horowitz, L.W., 2009a. Evaluating inter-continental transport of fine aerosols: (2) Global health impact. Atmos. Environ. 43, 4339–4347. https://doi.org/10.1016/J.ATMOSENV.2009.05.032.
Liu, Y., Paciorek, C.J., Koutrakis, P., 2009b. Estimating regional spatial and temporal variability of PM2.5 concentrations using satellite data, meteorology, and land use information. Environ. Health Perspect. 117, 886–892. https://doi.org/10.1289/ehp.0800123.
Maravalle, M., Simeone, B., Naldini, R., 1997. Clustering on trees. Comput. Stat. Data Anal. 24, 217–234.
Marlon, J.R., Bartlein, P.J., Gavin, D.G., Long, C.J., Anderson, R.S., Briles, C.E., Brown, K.J., Colombaroli, D., Hallett, D.J., Power, M.J., Scharf, E.A., Walsh, M.K., 2012. Long-term perspective on wildfires in the western USA. Proc. Natl. Acad. Sci. Unit. States Am. 109, E535–E543. https://doi.org/10.1073/pnas.1112839109.
Mauderly, J.L., Burnett, R.T., Castillejos, M., Özkaynak, H., Samet, J.M., Stieb, D.M., Vedal, S., Wyzga, R.E., 2010. Is the air pollution health research community prepared to support a multipollutant air quality management framework. Inhal. Toxicol. 22, 1–19. https://doi.org/10.3109/08958371003793846.
McClure, C.D., Jaffe, D.A., 2018. US particulate matter air quality improves except in wildfire-prone areas. Proc. Natl. Acad. Sci. Unit. States Am. 115, 201804353. https://doi.org/10.1073/pnas.1804353115.
Moldanová, J., Fridell, E., Popovicheva, O., Demirdjian, B., Tishkova, V., Faccinetto, A., Focsa, C., 2009. Characterisation of particulate matter and gaseous emissions from a large ship diesel engine. Atmos. Environ. 43, 2632–2641. https://doi.org/10.1016/j.atmosenv.2009.02.008.
Ngo, N.S., Bao, X., Zhong, N., 2018. Local pollutants go global: the impacts of intercontinental air pollution from China on air quality and morbidity in California. Environ. Res. 165, 473–483. https://doi.org/10.1016/J.ENVRES.2018.04.027.
Oakes, M., Baxter, L., Long, T.C., 2014. Evaluating the application of multipollutant exposure metrics in air pollution health studies. Environ. Int. 69, 90–99. https://doi.org/10.1016/j.envint.2014.03.030.
Punj, G., Stewart, D.W., 1983. Cluster Analysis in marketing research: review and suggestions for application. J. Mark. Res. 20, 134–148. https://doi.org/10.2307/3151680.
Querol, X., Alastuey, A., Moreno, T., Viana, M.M., Castillo, S., Pey, J., Rodríguez, S., Artiñano, B., Salvador, P., Sánchez, M., Garcia Dos Santos, S., Herce Garraleta, M.D., Fernandez-Patier, R., Moreno-Grau, S., Negral, L., Minguillón, M.C., Monfort, E., Sanz, M.J., Palomo-Marín, R., Pinilla-Gil, E., Cuevas, E., de la Rosa, J., Sánchez de la Campa, A., 2008. Spatial and temporal variations in airborne particulate matter (PM10 and PM2.5) across Spain 1999-2005. Atmos. Environ. 42, 3964–3979. https://doi.org/10.1016/j.atmosenv.2006.10.071.
Requia, W.J., Adams, M.D., Koutrakis, P., 2017. Association of PM2.5 with diabetes, asthma, and high blood pressure incidence in Canada: a spatiotemporal analysis of the impacts of the energy generation and fuel sales. Sci. Total Environ. 108, 584–585. https://doi.org/10.1016/j.scitotenv.2017.01.166.
Roberts, P.T., Friedlander, S.K., 1976. Photochemical aerosol formation. Sulfur dioxide, 1-heptene, and NOx in ambient air. Environ. Sci. Technol. 10, 573–580. https://doi.org/10.1021/es60117a004.
Shen, J., Liu, X., Zhang, Y., Fangmeier, A., Goulding, K., Zhang, F., 2011. Atmospheric ammonia and particulate ammonium from agricultural sources in the North China Plain. Atmos. Environ. 45, 5033–5041. https://doi.org/10.1016/j.atmosenv.2011.02.031.
Spracklen, D.V., Logan, J.A., Mickley, L.J., Park, R.J., Yevich, R., Westerling, A.L., Jaffe, D.A., 2007. Wildfires drive interannual variability of organic carbon aerosol in the western U.S. in summer. Geophys. Res. Lett. 34, 2–5. https://doi.org/10.1029/2007GL030037.
Spracklen, D.V., Mickley, L.J., Logan, J.A., Hudman, R.C., Yevich, R., Flannigan, M.D., Westerling, A.L., 2009. Impacts of climate change from 2000 to 2050 on wildfire activity and carbonaceous aerosol concentrations in the western United States. J. Geophys. Res. Atmos. 114, 1–17. https://doi.org/10.1111/lest.12019.
van Donkelaar, A., Martin, R.V., Brauer, M., Boys, B.L., 2014. Use of satellite observations for long-term exposure assessment of global concentrations of fine particulate matter. Environ. Health Perspect. 110, 135–143. https://doi.org/10.1289/ehp.1408646.
Wesson, K., Fann, N., Morris, M., Fox, T., Hubbell, B., 2010. A multi–pollutant, risk–based approach to air quality management: case study for Detroit. Atmos. Pollut. Res. 1, 296–304. https://doi.org/10.5094/APR.2010.037.
Wong, D.W., Yuan, L., Perlin, S.A., 2004. Comparison of spatial interpolation methods for the estimation of air quality data. J. Expo. Anal. Environ. Epidemiol. 14, 404–415. https://doi.org/10.1038/sj.jea.7500338.
Ying, Q., Kleeman, M.J., 2006. Source contributions to the regional distribution of secondary particulate matter in California. Atmos. Environ. 40, 736–752. https://doi.org/10.1016/j.atmosenv.2005.10.007.
Zanobetti, A., Austin, E., Coull, B.A., Schwartz, J., Koutrakis, P., 2014. Health effects of multi-pollutant profiles. Environ. Int. 71, 13–19. https://doi.org/10.1016/j.envint.2014.05.023.
Zanobetti, A., Franklin, M., Koutrakis, P., Schwartz, J., 2009. Fine particulate air pollution and its components in association with cause-specific emergency admissions. Environ. Health (Nagpur) 8, 58. https://doi.org/10.1186/1476-069X-8-58.
Zhang, P., Hong, B., He, L., Cheng, F., Zhao, P., Wei, C., Liu, Y., 2015. Temporal and spatial simulation of atmospheric pollutant PM2.5 changes and risk assessment of population exposure to pollution using optimization algorithms of the back propagation-artificial neural network model and GIS. Int. J. Environ. Res. Public Health 12, 12171–12195. https://doi.org/10.3390/ijerph121012171.
Zhang, Q., Jiang, X., Tong, D., Davis, S.J., Zhao, H., Geng, G., Feng, T., Zheng, B., Lu, Z., Streets, D.G., Ni, R., Brauer, M., Van Donkelaar, A., Martin, R.V., Huo, H., Liu, Z., Pan, D., Kan, H., Yan, Y., Lin, J., He, K., Guan, D., 2017. Transboundary health impacts of transported global air pollution and international trade. Nature 543, 705–709. https://doi.org/10.1038/nature21712.