
Science of the Total Environment 897 (2023) 165511

Contents lists available at ScienceDirect

Science of the Total Environment


journal homepage: www.elsevier.com/locate/scitotenv

Hydrogeochemical and sediment parameters improve predication accuracy of arsenic-prone groundwater in random forest machine-learning models

Wenjing Guo a,b, Zhipeng Gao a,b,⁎, Huaming Guo a,b,⁎, Wengeng Cao c

a State Key Laboratory of Biogeology and Environmental Geology, China University of Geosciences, Beijing 100083, PR China
b MOE Key Laboratory of Groundwater Circulation and Environmental Evolution, School of Water Resources and Environment, China University of Geosciences (Beijing), Beijing 100083, PR China
c Institute of Hydrogeology and Environmental Geology, Chinese Academy of Geological Sciences, Shijiazhuang 050061, PR China

HIGHLIGHTS

• The full random forest model predicted groundwater arsenic (As) distributions well.
• Better prediction was achieved when considering groundwater and sediment variables.
• Groundwater Fe(II) dominated all predictive variables in As prediction.
• The model supports that As mobility was primarily caused by Fe(III) oxide reduction.

ARTICLE INFO

Editor: Filip M.G. Tack

Keywords:
Groundwater
Arsenic
Hetao Basin
Random forest model
Prediction

ABSTRACT

The relative importance of groundwater geochemicals and sediment characteristics in predicting groundwater arsenic distributions has rarely been documented. To address this gap, we established a random forest machine-learning model to predict groundwater arsenic distributions in the Hetao Basin, China, using 22 variables covering climate, topographic features, soil properties, sediment characteristics, groundwater geochemicals, and hydraulic gradients for 492 groundwater samples. The established model captured the patchy distributions of groundwater arsenic concentrations in the basin with an AUC value of 0.84. Results suggest that Fe(II) was the most prominent variable in predicting groundwater arsenic concentrations, which supports the view that arsenic enrichment in groundwater was caused by the reductive dissolution of Fe(III) oxides. The high relative importance of SO₄²⁻ indicated that sulfate reduction was also conducive to groundwater arsenic enrichment in inland basins. Climate variables, sediment characteristics, and soil properties played secondary roles in predicting groundwater arsenic concentrations. The other two models, which excluded groundwater geochemicals and/or sediment characteristics, gave much worse predictions than the model considering all variables. This highlights the importance of groundwater geochemicals and sediment characteristics in improving the precision and accuracy of the predictions. Future studies should explore methods for constructing high-precision random forest prediction models from limited numbers of groundwater and sediment samples.

⁎ Corresponding authors at: School of Water Resources and Environment, China University of Geosciences (Beijing), Beijing 100083, PR China.
E-mail addresses: zpgao@cugb.edu.cn (Z. Gao), hmguo@cugb.edu.cn (H. Guo).

http://dx.doi.org/10.1016/j.scitotenv.2023.165511
Received 24 March 2023; Received in revised form 1 July 2023; Accepted 11 July 2023
Available online 12 July 2023
0048-9697/© 2023 Elsevier B.V. All rights reserved.
W. Guo et al. Science of the Total Environment 897 (2023) 165511

1. Introduction

Arsenic (As) is a highly toxic and carcinogenic metalloid commonly occurring in groundwater. The World Health Organization (WHO) has recommended an As guideline concentration of 10 μg/L in drinking water. Long-term consumption of high-As groundwater (>10 μg/L) can cause severe adverse health consequences, such as skin diseases, cancer, and cardiovascular diseases (Kapaj et al., 2006; Karagas et al., 2015). It has been estimated that up to 220 million people in >70 countries and regions are threatened by high-As groundwater around the world (Podgorski and Berg, 2020), mostly in East and Southeast Asia, e.g., Bangladesh (Huhmann et al., 2022), Cambodia (Richards et al., 2019), China (Feng et al., 2022), India (Khan and Rai, 2022), Nepal (Gyawali et al., 2022), Pakistan (Ur Rehman et al., 2022), and Vietnam (Glodowska et al., 2021).

It is generally accepted that the occurrence of high-As groundwater is the result of water-rock interactions, being primarily related to the reductive dissolution of As-bearing Fe(III) oxides under reducing conditions (Guo et al., 2013). The process is closely influenced by geomorphology, stratigraphy, and hydrometeorological factors (Chakraborty et al., 2022). For instance, thick clays capping the top of the aquifer may facilitate the genesis of high-As groundwater (Choudhury et al., 2018; Verma et al., 2016), likely because a thick clay layer inhibits flushing of the aquifer and thus favors accumulation of As in groundwater. This has also been observed in the Hetao Basin, where heterogeneity of clay layers caused by diversion and swing of the ancient Yellow River course since the Late Pleistocene has resulted in considerable variations in groundwater As concentrations (Cao et al., 2018). Furthermore, the combination of high evapotranspiration, high aridity, high temperature, and low precipitation increases the evaporation of groundwater, thereby aggravating the concentration of As, especially in inland or enclosed basins with arid or semi-arid climates (Alarcón-Herrera et al., 2013; Smedley and Kinniburgh, 2002). Arsenic desorption under weakly alkaline environments is also conducive to groundwater As enrichment (Richards et al., 2019), and was considered the major cause of high-As groundwater in the arid Indus Plain with high soil pH and high levels of soil organic carbon (Podgorski et al., 2017).

As stated above, the enrichment of groundwater As is generally the result of the synergy of many environmental factors. Due to the complex enrichment mechanism of groundwater As, limited groundwater samples cannot fully constrain the area hosting high-As groundwater. Prediction models can provide information on locations of safe groundwater to governments or managers by simultaneously considering distinct variables (Erickson et al., 2021a, b). Therefore, predictions of the distribution of high-As groundwater via machine learning models have been increasingly conducted at global (Cao et al., 2021; Podgorski and Berg, 2020), country (Ayotte et al., 2017; Connolly et al., 2022; Lombard et al., 2021; Mukherjee et al., 2021; Rodríguez-Lado et al., 2013; Tan et al., 2020), basin (Fu et al., 2022), and regional scales (Kumar and Pati, 2022; Singh et al., 2022). These models have included variables of climate (e.g., precipitation, evapotranspiration, and temperature), soil, geology, hydrology, land use/land cover, and the topographic wetness index as predictor variables (Cao et al., 2021; Nath et al., 2022; Wu et al., 2021; Podgorski and Berg, 2020; Zhang et al., 2012). They consistently suggest that climate and soil variables played a crucial role in predicting high-As groundwater distributions (Cao et al., 2021; Podgorski and Berg, 2020). Although climate and soil variables were commonly used in predictive models of groundwater As distributions, few studies have used groundwater geochemicals and sedimentary characteristics as variables to predict distributions of high-As groundwater. Neglecting sedimentary characteristics and groundwater geochemicals in the model would limit the accuracy and precision of the predictions. Therefore, it is necessary to comprehensively consider the variables potentially contributing to groundwater As enrichment to create a model with high accuracy, which not only provides reliable predictions of groundwater As distribution but also helps to unravel the dominant factors controlling groundwater As mobility.

There are many kinds of machine learning models for predicting the distribution of groundwater chemicals, including neural network models (Cao et al., 2021), support vector machine learning models (Park et al., 2016), boosted regression tree models (Erickson et al., 2021a, b), logistic regression models (Podgorski et al., 2017), and random forest machine learning models (Podgorski and Berg, 2020). The random forest algorithm is an ensemble learning algorithm that has developed rapidly in recent years. It is a classification model that generates an ensemble of decision trees, which can be used to predict a binary class on the basis of the associated independent variables. It handles samples with mixed data types, missing values, and outliers well without increasing the computation cost, thus greatly reducing the noise impact (Li et al., 2022). Since the classification or regression is performed by repeatedly bisecting data, the amount of computation is much lower than that of other machine learning methods (such as neural networks and support vector machines) (Ebrahimy et al., 2021). Because of its high tolerance to outliers in data series and its significantly higher prediction accuracy than other commonly used algorithms, it has been extensively applied in natural science research, especially in the field of hydrogeology (Ebrahimy et al., 2021).

The Hetao Basin of Inner Mongolia is one of the areas affected by high-As groundwater in China. Field investigation showed that groundwater As concentrations were as high as 1480 μg/L, threatening around 300 thousand residents in the basin (Deng et al., 2009). In the area, high-As groundwater was thought to be related to soil properties (Tong et al., 2014), sediment characteristics (Shen et al., 2018), climate (Jia et al., 2017), and groundwater geochemicals (Guo et al., 2008). Geochemically, groundwater As enrichment in the basin was primarily the result of the reductive dissolution of As-bearing Fe(III) oxides, and was secondarily caused by As desorption under weakly alkaline environments (Gao et al., 2020). Recently, Fu et al. (2022) used nine surface environmental variables covering climatic and topographical features to predict the probability of high-As groundwater across the Hetao Basin based on a random forest model. They found that the distribution of high-As groundwater increased from winter to summer with increases in temperature, rainfall, and evapotranspiration. Nevertheless, the combined effects of the above-mentioned parameters on distributions of high-As groundwater are still unknown and deserve further investigation.

Therefore, the study aims to (1) develop a random forest prediction model based on datasets of climate, topographic features, soil properties, sediment characteristics, groundwater geochemicals, and hydraulic gradients to show the probability distribution of high-As groundwater, (2) rank model variables in determining the distribution of groundwater As concentrations, and (3) evaluate the importance of groundwater geochemicals and sediment characteristics in improving the prediction accuracy of the established model.

2. Materials and methods

2.1. Study area

The Hetao Basin (40°10′–41°20′N, 106°15′–109°30′E) is located in the western part of Inner Mongolia and was formed by alluvium of the Yellow River and its tributaries (Fig. 1b). It is around 200 km long and 50 km wide, with a total area of around 13,000 km². The elevation of the basin is around 1020–1050 m above sea level. From south to north, it can be divided into the Yellow River alluvial lacustrine plain and the piedmont alluvial-proluvial inclined plain. The Hetao Basin has a typical arid and semi-arid continental monsoon climate with little rainfall, which is unevenly distributed across seasons. The average annual rainfall is from 130 to 220 mm, concentrated in June–September, while the annual average evaporation is much higher at around 1900–2500 mm (Guo et al., 2008).

Quaternary sediments are important stratigraphic components in the Hetao Basin, and their thickness varies in different areas within the basin. The Late Pleistocene (Q3) sediments have a thickness of 15–400 m, being mainly composed of conglomeratic coarse sand, fine sand, and silty sand.


Fig. 1. Locations of (a) the Hetao Basin and (b) the sampling sites, and (c) the stratigraphic profiles of the study area obtained from the constructed 3D geological model.
Locations of the stratigraphic profiles are provided in panel b.

It is the main aquifer where high-As groundwater occurs. The Holocene (Q4) is a set of sediments transitioning from the lacustrine fluvial facies to the Yellow River alluvial facies. The main lithology is interlayered yellow clay soil and silty sand, clay sand, sandy clay, and clay, with a thickness of 10–50 m. The Quaternary groundwater systems can be divided into two groups: the shallow and deep aquifers. The shallow aquifers consist of upper Pleistocene–Holocene alluvial–pluvial and alluvial–lacustrine sands within depths up to around 40 m below the land surface (bls). The deep aquifers are composed of middle Pleistocene lacustrine sands occurring at depths >40 m bls, being separated by a thick blanket of clay layers from the upper shallow aquifers (Guo et al., 2008).

2.2. Construction of the 3D geological model

A three-dimensional (3D) geological model of the Hetao Basin was developed to digitize the distribution of lithological settings. A total of 204 boreholes evenly distributed across the basin (Fig. S1), collected from Cao et al. (2018), were used to construct the 3D geological model in ESRI ArcGIS Desktop software. Briefly, we first classified the lithologies into four sediment groups, i.e., medium-coarse sand, loamy sand, silt-fine sand, and clay. Secondly, stratigraphic division was made according to lithology and geological age. Subsequently, spatial interpolation of the stratum lithology based on the classified sediment groups in different boreholes was carried out. Cross-validation and accuracy comparison were also performed to obtain a high-quality 3D geological model. The stratigraphic profiles of the 3D geological model are shown in Fig. 1c. The 3D geological model was eventually used to obtain the sediment characteristic variables used in the prediction models.

2.3. Dataset of groundwater As concentrations

The dataset of groundwater As concentrations (n = 492) was compiled from Cao et al. (2018) and Gao et al. (2022). All groundwater samples were collected from aquifers at depths from 2 to 120 m bls (Fig. S2). The samples were uniformly distributed in the basin at a density of about 0.041 samples/km² (Fig. 1b). The As concentrations in groundwater samples varied widely from below the detection limit (<0.01 μg/L) to 917 μg/L, with average and median concentrations of 57.5 and 8.57 μg/L, respectively. Groundwater As data were classified into two groups: high-As groundwater (>10 μg/L) and low-As groundwater (≤10 μg/L) (Fig. 1b). High-As groundwater accounted for 48.6 % of all groundwater samples.

2.4. Independent model variables

Parameters that may potentially influence groundwater As enrichment based on previous studies (Bindal and Singh, 2019; Cao et al., 2021; Podgorski and Berg, 2020) were selected as predictor variables. A total of 32 variables covering climate, topographic features, soil properties, sediment characteristics, groundwater geochemicals, and hydraulic gradients were initially included as prediction variables of the model (Table S1). Among these variables, datasets of climate, soil properties, and topographic features are all continuous data in raster formats, which were acquired from publicly available datasets, while groundwater geochemicals, sediment characteristics, and hydraulic gradients are discrete data in vector formats. Values of the continuous variables at the groundwater sampling locations were extracted using the point extraction method in ESRI ArcGIS Desktop software.

2.4.1. Climate data
Climate datasets include actual evapotranspiration (AET), precipitation (PRE), temperature (TEM), and normalized difference vegetation index (NDVI). The raw datasets of AET (Fig. S3a), PRE (Fig. S3b), TEM (Fig. S3c), and NDVI (Fig. S3d) were compiled from the AppEEARS website (https://appeears.earthdatacloud.nasa.gov/), with resolutions of 500 m, 2500 m, 2500 m, and 500 m, respectively. The summer data (June–August)


in 2014 were collected, consistent with the groundwater sampling periods. The summer evapotranspiration in the study area was retrieved by inversion using the surface energy balance model (SEBS) combined with MODIS data, which made the evapotranspiration data more accurate and reliable (Ma et al., 2018). Grid cells without values in the PRE, TEM, and NDVI datasets were filled using nearby values.

2.4.2. Soil properties
Datasets of soil properties mainly comprise coarse sand (volumetric percentage, 0 cm depth) (Fig. S4a), sand (weight percentage, 0 cm depth) (Fig. S4b), clay (weight percentage, 200 cm depth) (Fig. S4c), silt (weight percentage, 200 cm depth) (Fig. S4d), and fluvisols (Fig. S4e), which were compiled from the SoilGrids website (https://soilgrids.org/). These datasets all have resolutions of 250 m.

2.4.3. Topographic features
Topographic features mainly include DEM (Fig. S5a), aspect (Fig. S5b), and slope (Fig. S5c), which were obtained from the Spatial Geographical Data Cloud website (http://www.gscloud.cn/) with resolutions of 90 m, 90 m, and 30 m, respectively.

2.4.4. Sediment characteristics
Aquifer sediment characteristics include lithology (loamy sand, fine sand, and medium-coarse sand), geological age, the respective numbers of overlying clay and sand layers, the respective thicknesses of overlying clay and sand layers, and well depths. Well depths for groundwater sampling were compiled from Cao et al. (2018) and Gao et al. (2022) and were used to extract the respective numbers and thicknesses of the overlying clay and sand layers. All sedimentary datasets, except for well depth, were obtained from the established 3D geological model (Fig. 1c) by projecting the groundwater samples with varied well depths onto it.

2.4.5. Groundwater geochemicals
Datasets of groundwater geochemicals, including pH, total dissolved solids (TDS), and Mn, Fe(II), K⁺, Na⁺, Mg²⁺, Ca²⁺, F⁻, Cl⁻, HCO₃⁻, and SO₄²⁻ concentrations, were compiled from Cao et al. (2018) and Gao et al. (2022), the same sources as the groundwater As data.

2.4.6. Hydraulic gradient
The hydraulic gradient at each groundwater sampling site was obtained by projecting the sampling sites onto the groundwater contour map in Zhang et al. (2013).

2.5. Development of the random forest machine learning model

A random forest model was constructed to map the probability of high-As groundwater, and was implemented in the Python programming language (version 3.8.5). During the modeling, the dataset was randomly divided into a training set (75 %, n = 369) and a test set (25 %, n = 123) to train the model and test its performance, respectively. To achieve maximum model accuracy and reliability, a recursive feature elimination (RFE) method was used to perform feature selection within 10-fold cross-validation (CV). The number of decision trees (hereafter n_estimators), the maximum depth of each decision tree (hereafter max_depth), and the maximum number of features that the random forest allows a single decision tree to use (hereafter max_features) are important parameters determining the accuracy and reliability of random forest models. Generally, higher numbers of decision trees (n_estimators) are associated with better model results. Nevertheless, too many decision trees can also lead to overfitting, thereby reducing the reliability of the model. The value of max_features was set to the square root of the number of predictors. The out-of-bag (OOB) samples, i.e., the approximately one third of the data not drawn during the random (bootstrap) sampling for each tree, were used to estimate model error (Li et al., 2022).

A total of 22 predictor variables were finally selected (Table S1), which were classified into five types: (1) climate variables (including AET, PRE, TEM, and NDVI); (2) soil properties (including coarse sand (volumetric percentage, 0 cm depth), clay (weight percentage, 200 cm depth), silt (weight percentage, 200 cm depth), and sand (weight percentage, 0 cm depth)); (3) sediment characteristics (including overlying clay thickness, overlying sand thickness, and well depth); (4) groundwater geochemicals (including pH and concentrations of Fe(II), Mn, Na⁺, Ca²⁺, Mg²⁺, Cl⁻, SO₄²⁻, and HCO₃⁻); and (5) others (including hydraulic gradient and DEM). These predictor variables were ranked in order of importance based on their average predictive power within the model. This model was regarded as Model I. In Model I, the dataset of groundwater As concentrations and the 22 predictor variables described above (Table S1) were used to create a statistical prediction model of high-As groundwater exceeding the WHO guideline of 10 μg/L.

In order to explore the roles of groundwater geochemicals and sediment characteristics in improving the accuracy of the prediction model, we also created another two models (Models II and III) for comparison with Model I. Model II was developed by excluding the groundwater geochemical variables from Model I, and Model III was constructed by excluding both the groundwater geochemical and the sediment characteristic variables from Model I. Variables in Model II were identical to those in previous studies (Tan et al., 2020; Podgorski et al., 2020), in which climate, soil properties, and sediment characteristics were mainly considered. Variables in Model III, in turn, were analogous to those in Nath et al. (2022) and Cao et al. (2021), in which climate and soil properties were mainly included.

The predictor variables verified in the three models (Models I, II, and III) were applied to predict the probability distribution of high-As groundwater in the Hetao Basin using ESRI ArcGIS Desktop software. Prior to producing the prediction maps, discrete predictor variables (e.g., groundwater geochemicals and sediment characteristics) were first interpolated using the inverse distance weighted (IDW) or radial basis function (RBF) method at a resolution of 500 m × 500 m. For each variable, the method showing the lower mean error and root mean square error (RMSE) was chosen for the interpolation. The interpolation errors of each predictor variable are shown in Table S2, where the interpolation method used in this study is highlighted in bold.

2.6. Accuracy assessment and sensitivity analysis

Accuracy (ACC) and positive predictive value (PPV) were calculated to evaluate the accuracy of the prediction model. The ACC was calculated as the proportion of correctly predicted results among all samples based on Eq. (1). The PPV is the proportion of true positive samples among all predicted positive samples according to Eq. (2). Both ACC and PPV are important and indispensable metrics in model performance evaluation (Ebrahimy et al., 2021). The accuracy of the prediction model can also be evaluated by the model sensitivity (true positive rate) and specificity (true negative rate). The true positive rate (TPR) reflects the ability of the model to correctly classify high-As groundwater samples (Eq. (3)), and the true negative rate (TNR) measures its ability to correctly classify low-As groundwater samples (Eq. (4)). A plot of the sensitivity (i.e., TPR) and specificity (i.e., TNR) over all model cutoff values between 0 and 1 produces a receiver operating characteristic (ROC) curve. The area under the curve (AUC) measures the predictive capability of the model and ranges from 0.5 (no predictive capability) to 1 (best predictive capability) (Tesoriero et al., 2017).

$$\text{Accuracy (ACC)} = \frac{TP + TN}{TP + TN + FP + FN} \tag{1}$$

$$\text{Positive predictive value (PPV)} = \frac{TP}{TP + FP} \tag{2}$$

$$\text{True positive rate (TPR)} = \frac{TP}{TP + FN} \tag{3}$$

$$\text{True negative rate (TNR)} = \frac{TN}{TN + FP} \tag{4}$$

where the true positive (TP) value shows the number of high-As groundwater samples when the model accurately predicts the high-As groundwater class (i.e., observed As concentration > 10 μg/L and predicted


As concentration > 10 μg/L); the true negative (TN) value is the outcome when the model accurately predicts the low-As groundwater class (i.e., observed As concentration ≤ 10 μg/L and predicted As concentration ≤ 10 μg/L); the false positive (FP) value is a result when the model inaccurately predicts the high-As groundwater class (i.e., observed As concentration ≤ 10 μg/L and predicted As concentration > 10 μg/L); and the false negative (FN) value is a result when the model inaccurately predicts the low-As groundwater class (i.e., observed As concentration > 10 μg/L and predicted As concentration ≤ 10 μg/L) (Schapire, 2001).

3. Results and discussion

3.1. Model performance metrics and prediction map of high-As groundwater

The best random forest model (i.e., Model I) was achieved when the hyperparameters n_estimators and max_depth were 447 and 9, respectively. The values of ACC and PPV calculated from the confusion matrix shown in Table 1 were 0.79 and 0.81 (Table 2), respectively, indicating good prediction by Model I. It should be noted that the ACC was not distorted by class imbalance, since the numbers of high- and low-As groundwater samples were approximately equal. The OOB score of the overall model, with a value of 0.84, suggests good model performance. In addition, the TPR and TNR of the overall model reached values of 0.81 and 0.75, respectively. This indicates that the model had a high ability to predict both high-As and low-As groundwater distributions (Bindal and Singh, 2019). The AUC value of Model I was 0.84 (Fig. 2), which is higher than the values of 0.68–0.80 reported in other studies (Bindal and Singh, 2019; Erickson et al., 2018; Fu et al., 2022; Podgorski et al., 2017; Winkel et al., 2008; Zhang et al., 2012).

The obtained Model I was used to produce the probability distribution of high-As groundwater in the study area (Fig. 3). The predicted probability of high-As groundwater in the study area ranged from 0.13 to 0.88 (Fig. 3a). This prediction map covered not only the areas with known high-As groundwater, but also the areas that were not monitored by field investigations. Areas with a high probability (>0.8) of high-As groundwater were mainly distributed in the eastern part of the basin (Fig. 3a), covering an area of 2334 km². In comparison, areas with a probability of 0.6–0.8 covered 8073 km². Furthermore, high-risk areas with groundwater As concentrations exceeding 10 μg/L in the study area were also identified by binary classification of the random forest model based on the probability cut-off of 0.59 (Fig. 3b). The probability cut-off was determined by the best trade-off between sensitivity and specificity (Fig. 2). The prediction results show that the high-As areas were concentrated in the depositional center of the Hetao Basin, especially in the east of the basin (Fig. 3b). This may be because aquifers within the center of the Hetao Basin had sluggish groundwater flow and consisted of lacustrine deposits with abundant organic matter (Guo et al., 2008), which were conducive to groundwater As enrichment (Fendorf et al., 2010). Predicted high-As groundwater with probability >0.59 covered an area of around 10,597 km², accounting for 70.3 % of the total area of the study area (15,082 km²).

3.2. Ranking of model variables in Model I

The importance of distinct variables in predicting groundwater As concentrations was assessed by the random forest model (Fig. 4). Rankings of the relative importance of the predictor variables in Model I suggest that high-As groundwater in the Hetao Basin was the synergistic result of groundwater geochemicals, sediment characteristics, climate, and soil properties.

3.2.1. Groundwater geochemicals
Dissolved Fe(II) concentration was the most prominent variable in predicting groundwater As concentration in the basin (Fig. 4). This was evidenced by the significantly higher dissolved Fe(II) concentrations in high-As groundwater than in low-As groundwater (p < 0.001) (Fig. S6a). The reductive dissolution of Fe(III) oxides in reducing environments was thus considered to be the main cause of groundwater As mobility (Guo et al., 2014). Normally, Fe(III) oxides have a high affinity for As due to their high specific surface area and positively charged surface sites at neutral to weakly alkaline pH (Gao et al., 2020). During the reductive dissolution of Fe(III) oxides in aquifers under reducing conditions, high-valence Fe(III) is reduced to low-valence Fe(II) with better mobility. Simultaneously, the released As(V) is reduced to As(III) with higher mobility, leading to more As release into groundwater (Ayotte et al., 2017; Postma et al., 2007). This is consistent with a large number of studies showing that groundwater As and Fe(II) concentrations were commonly positively correlated (Perez et al., 2019; Zhou et al., 2022).

Groundwater SO₄²⁻ concentration was ranked as the third most important predictor in Model I (Fig. 4). Statistical results showed that the dissolved SO₄²⁻ concentration of high-As groundwater samples (average 306 mg/L, median 235 mg/L) was significantly lower than that of low-As groundwater samples (average 401 mg/L, median 306 mg/L) (p < 0.001) (Fig. S6c), indicating that lower SO₄²⁻ concentrations were conducive to groundwater As mobility. The SO₄²⁻/Cl⁻ molar ratios, which may reflect the extent of SO₄²⁻ reduction, were also significantly lower in high-As groundwater than those in shallow groundwater (p < 0.001) (Fig. S6d). This suggests that SO₄²⁻ reduction facilitated groundwater As enrichment (Liu et al., 2022), possibly because H₂S and S⁰, as products of SO₄²⁻ reduction, can be coupled to Fe(III) oxide reduction to cause As release from aquifer sediments into groundwater (Guo et al., 2016), or can form highly mobile thioarsenic species (Nghiem et al., 2023).

Fig. 2. AUC value of performance evaluation parameters and the probability cut-off point in Model I.

Table 1
Confusion matrix for the Model I.

                      Actual positive   Actual negative
Predicted positive    TP = 57           FP = 13
Predicted negative    FN = 13           TN = 40

Table 2
The performance evaluation parameters of the Models I, II, and III.

             Model I   Model II   Model III
OOB score    0.84      0.70       0.68
ACC          0.79      0.67       0.63
PPV          0.81      0.60       0.57
TPR          0.81      0.76       0.71
TNR          0.75      0.59       0.55
AUC          0.84      0.75       0.69
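
As a quick consistency check, the Model I values in Table 2 can be recomputed from the confusion matrix in Table 1 using Eqs. (1)–(4). The short sketch below does only that arithmetic; the function name `classification_metrics` is a hypothetical helper introduced for illustration and is not taken from the study's code.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return ACC, PPV, TPR and TNR as defined in Eqs. (1)-(4)."""
    total = tp + fp + fn + tn
    return {
        "ACC": (tp + tn) / total,   # Eq. (1): share of correct predictions
        "PPV": tp / (tp + fp),      # Eq. (2): precision of the high-As class
        "TPR": tp / (tp + fn),      # Eq. (3): sensitivity
        "TNR": tn / (tn + fp),      # Eq. (4): specificity
    }


# Counts reported in Table 1 for Model I (test set, n = 123).
metrics = classification_metrics(tp=57, fp=13, fn=13, tn=40)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Rounded to two decimals, the four values match the Model I column of Table 2 (ACC 0.79, PPV 0.81, TPR 0.81, TNR 0.75).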


Fig. 3. (a) Probability distribution map of high As groundwater with As concentrations >10 μg/L and (b) high As-risk areas based on the probability cut-off point of 0.59 in the
Hetao Basin.
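
The text states only that the probability cut-off reflects the best trade-off between sensitivity and specificity. One common way to formalize such a trade-off is Youden's J statistic (TPR + TNR − 1); the sketch below assumes that criterion, which may differ from the one actually used in the study. The labels and probabilities are made-up illustrative values, not the study's data, and `best_cutoff` is a hypothetical helper name.

```python
def best_cutoff(y_true, y_prob):
    """Return (cutoff, J) maximizing Youden's J = TPR + TNR - 1.

    A sample is classified as high-As when its probability >= cutoff.
    Ties are resolved in favor of the lowest cutoff.
    """
    positives = sum(y_true)
    negatives = len(y_true) - positives
    best = (None, -1.0)
    for cutoff in sorted(set(y_prob)):
        tp = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p >= cutoff)
        tn = sum(1 for y, p in zip(y_true, y_prob) if y == 0 and p < cutoff)
        j = tp / positives + tn / negatives - 1.0
        if j > best[1]:
            best = (cutoff, j)
    return best


# Illustrative labels (1 = high-As, 0 = low-As) and predicted probabilities.
y_true = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]
y_prob = [0.10, 0.20, 0.35, 0.40, 0.45, 0.55, 0.60, 0.70, 0.80, 0.90]

cutoff, j = best_cutoff(y_true, y_prob)
print(f"cutoff = {cutoff:.2f}, Youden's J = {j:.2f}")
```

Applying the selected cut-off to a gridded probability map then yields a binary high-risk/low-risk map of the kind shown in panel b.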

Fig. 4. The relative importance of 22 predictor variables from the Model I.
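
A ranking of this kind can be obtained from a fitted random forest. The following is a minimal sketch, assuming scikit-learn and using the hyperparameters reported in the text (n_estimators = 447, max_depth = 9, max_features = square root of the number of predictors). The data are synthetic stand-ins, the feature names are hypothetical, the stratified split is an assumption, and the recursive feature elimination step is omitted for brevity. scikit-learn's impurity-based `feature_importances_` is used here; the importance measure used in the study may differ.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for 492 samples x 22 predictors (NOT the study data).
X, y = make_classification(
    n_samples=492, n_features=22, n_informative=6, random_state=0
)
feature_names = [f"var_{i:02d}" for i in range(X.shape[1])]  # hypothetical names

# 75 % / 25 % split; stratification is an assumption made for this sketch.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

model = RandomForestClassifier(
    n_estimators=447,      # number of trees reported for Model I
    max_depth=9,           # maximum tree depth reported for Model I
    max_features="sqrt",   # square root of the number of predictors
    oob_score=True,        # estimate accuracy from out-of-bag samples
    random_state=0,
)
model.fit(X_train, y_train)

# Rank predictors by impurity-based importance, highest first.
ranking = sorted(
    zip(feature_names, model.feature_importances_),
    key=lambda item: item[1],
    reverse=True,
)
print(f"OOB score: {model.oob_score_:.2f}")
for name, importance in ranking[:5]:
    print(f"{name}: {importance:.3f}")
```

Because the inputs are synthetic, the printed scores and ranking carry no meaning for the Hetao Basin; only the workflow is illustrated.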


Dissolved HCO₃⁻ concentration and the weakly alkaline groundwater pH also played non-negligible roles in predicting groundwater As concentrations (Fig. 4). Groundwater HCO₃⁻ concentration ranged from 153 to 1739 mg/L in the study area and was significantly higher in high-As groundwater (average 525 mg/L, median 483 mg/L) than in low-As groundwater (average 475 mg/L, median 433 mg/L) (p < 0.01) (Fig. S6e). Similarly, the pH values of the high-As groundwater samples (average 7.95, median 7.85) were also significantly higher than those of the low-As groundwater samples (average 7.88, median 7.83) (p < 0.05) (Fig. S6f). At high concentrations of HCO₃⁻ and high groundwater pH, HCO₃⁻ and OH⁻ may compete with As for solid surface sites and induce As desorption into groundwater (Stachowicz et al., 2008; Stolze et al., 2019). The competition of HCO₃⁻ and OH⁻ with As for sediment surface sites was evaluated using As adsorption experiments and a surface complexation model in our previous study (Gao et al., 2020). It showed that higher HCO₃⁻ concentrations and pH values were associated with weaker As adsorption, which supported the competition effect of HCO₃⁻ and pH on As adsorption.

Groundwater Mn concentration was identified as a predictor in Model I, although its relative importance was much lower than that of groundwater Fe(II). Basically, groundwater Mn concentration can be a robust indicator of Mn oxide reduction, which may also release As into groundwater (Akintomide et al., 2021). Another possible explanation is that groundwater As and Mn usually showed distinct depth distributions, with shallower groundwater typically characterized by higher Mn and lower As concentrations than deeper groundwater (Ying et al., 2017). Overall, the remarkably higher relative importance of dissolved Fe(II) than that of dissolved Mn suggests that the enrichment of As in groundwater was more controlled by the reductive dissolution of As-bearing Fe(III) oxides.

3.2.2. Variables of climate

Climate variables were also identified as crucial predictors of groundwater As concentrations in Model I (Fig. 4), consistent with previous studies (Cao et al., 2021; Fu et al., 2022). In the study area, AET ranged from 46.2 mm/month to 330 mm/month and was significantly higher for high-As groundwater (average 268 mm/month, median 273 mm/month) than for low-As groundwater (average 255 mm/month, median 266 mm/month) (p < 0.001) (Fig. S7a). This indicates that high evapotranspiration was associated with a high probability of high-As groundwater (Alarcón-Herrera et al., 2013; Smedley and Kinniburgh, 2002). According to Model I, AET ranked as the second most important variable among all variables (Fig. 4), which was consistent with previous studies suggesting an important contribution of AET to predicting high-As groundwater distributions (Ayotte et al., 2017; Cao et al., 2021; Fu et al., 2022; Nath et al., 2022; Podgorski and Berg, 2020). Furthermore, AET was usually positively correlated with the NDVI (Wang et al., 2021), since higher NDVI may result in stronger evaporation and transpiration of vegetation leaves, and therefore higher actual evapotranspiration (Wang et al., 2021). Therefore, the NDVI was also identified as an important predictor of groundwater As enrichment (Fig. 4).

Similarly, the TEM was also higher for high-As groundwater (average 23.6 °C, median 23.7 °C) than for low-As groundwater (average 23.5 °C, median 23.5 °C) (p < 0.001) (Fig. S7b). Higher temperature promotes stronger evapotranspiration and exacerbates drought. Therefore, the combination of high evapotranspiration, high temperature, and low precipitation may be responsible for high-As groundwater (Rodríguez-Lado et al., 2013).

3.2.3. Sediment characteristics

The well depth was an important predictor of groundwater As concentrations. High-As groundwater was mainly distributed in shallow aquifers with depths of <40 m bls, consistent with high-As groundwater in Southeast Asia (Fendorf et al., 2010). The reasons may be that shallow aquifer sediments of younger geological age contained more reactive As-bearing Fe(III) oxides and organic matter (Stuckey et al., 2016), and readily received labile dissolved organic matter from surface water (Erban et al., 2014).

Other sedimentary characteristics, including overlying sand thickness and overlying clay thickness, showed relatively lower contributions to groundwater As concentrations (Fig. 4). Statistical results show that the overlying sand thickness (average 17.9 m) and the proportion of overlying sand thickness to total thickness of sediment (average 74 %) of the aquifers containing high-As groundwater were slightly higher than those of the low-As ones (average 15.1 m and 72 %, respectively) (Fig. 5). However, the proportion of the thickness of overlying clay layers to total thickness of the aquifer (average 26 %) was slightly lower in the aquifers containing high-As groundwater than in the low-As ones (Fig. 5b). This suggests that depositional environments with thicker overlying sand and thinner clay layers were favorable for the shallow groundwater to receive recharge of surface-derived organic matter, and thus facilitated the reductive dissolution of Fe(III)/Mn oxides (Gao et al., 2022; Li et al., 2014). Therefore, the higher proportion of sand layers in the overlying aquifer sediments was conducive to the formation of high-As groundwater.

3.2.4. Hydraulic gradients

The low hydraulic gradient was conducive to groundwater As enrichment, although its importance was relatively low (Fig. 4). The flat topography of the study area (Fig. S5) and the low hydraulic gradient across the region imply an extremely low groundwater flow rate, which limited the flushing of groundwater As out of the system (van Geen et al., 2008). Furthermore, aquifers characterized by flat topography and low groundwater flow rate usually had abundant finer-grained sediments enriched in As-bearing Fe(III) oxides and organic matter (Guo et al., 2019), which thus exacerbated As accumulation in the flat, low-lying areas during Fe(III) oxide reduction (Gao et al., 2022).

3.3. Importance of groundwater geochemicals and sediment characteristics in As predictions

Another two models (i.e., Models II and III) were constructed to explore the importance of groundwater geochemicals and sediment characteristics in predicting groundwater As distributions. The best Model II was achieved when n_estimators = 320 and max_depth = 8. The performance evaluation parameters of OOB score, ACC, PPV, and AUC were 0.70, 0.67, 0.60, and 0.75, respectively (Table 2). The probability cut-off point of Model II was 0.54 (Fig. 6c). According to this probability cut-off point and the predicted variables, the risk distribution map of high-As groundwater of the study area was produced (Fig. 6a). Model III achieved the best prediction when n_estimators = 141 and max_depth = 6. The obtained OOB score, ACC, PPV, and AUC values were 0.68, 0.63, 0.57, and 0.69, respectively (Table 2), with a probability cut-off point of 0.60 (Fig. 6d). The risk distribution map of groundwater As of Model III is shown in Fig. 6b. The importance of the variables in predicting groundwater As concentrations in Models II and III is shown in Fig. S8.

According to the performance evaluation parameters, the precision and accuracy of Model I were the highest among the three models, followed by Model II and Model III. Model III, which excluded the variables of groundwater geochemicals and sediment characteristics, showed the lowest predictive precision (e.g., the lowest AUC value of 0.69). In Model III, climate parameters (e.g., AET and NDVI) were identified as the most important variables (Fig. S8b), quite similar to other studies (Ayotte et al., 2017; Fu et al., 2022; Nath et al., 2022; Podgorski and Berg, 2020). Model II, which added sedimentary characteristics to the variables of Model III, showed a higher AUC value than Model III (Table 2). This suggests that the performance of Model II was significantly improved relative to that of Model III. In Model II, sedimentary characteristics (e.g., well depth, overlying sand thickness, and overlying clay thickness) were also identified as important predictors (Fig. S8a). However, when groundwater geochemicals were considered in Model I, a much higher AUC value (0.84) was achieved than in Model II (0.75; Table 2). This suggests that even better prediction results were achieved when both groundwater geochemicals and sedimentary characteristics were included in the model (Fig. 3). Groundwater


Fig. 5. Boxplots showing (a) the thickness of overlying clay and sand layers and (b) the proportion of overlying clay and sand layers to total thickness of sediments
corresponding to aquifers with low- and high-As groundwater. Red points show average values.
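The quantities a boxplot such as Fig. 5 displays (quartiles, plus the mean marked by the red points) can be computed per group as sketched below. The thickness values are hypothetical placeholders, not the study's measurements, and the "inclusive" quartile method is an assumption, since plotting tools differ in how they interpolate quartiles.

```python
import statistics


def box_summary(values):
    """Return the quartiles and the mean that a boxplot would display."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "mean": statistics.fmean(values),
    }


# Hypothetical overlying sand thicknesses (m) for two groups of wells.
sand_thickness = {
    "low-As": [10.0, 15.0, 20.0, 25.0, 30.0],
    "high-As": [12.0, 16.0, 22.0, 26.0, 34.0],
}

summaries = {group: box_summary(vals) for group, vals in sand_thickness.items()}
for group, stats in summaries.items():
    print(group, stats)
```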

geochemicals, especially groundwater Fe(II) concentrations, were much more important in predicting groundwater As concentrations than climate parameters (Figs. 4 and S8). This suggests that sedimentary characteristics and groundwater geochemicals should be considered in better predicting groundwater As distributions, which can significantly improve the accuracy of the model.

3.4. Model limitation and implications

By utilizing random forest predicting models, we evaluated the importance of sediment characteristics and groundwater geochemicals in predicting groundwater As distributions. In comparison with the existing As prediction models with an AUC value of 0.784 (e.g., Fu et al., 2022),

Fig. 6. Probability distribution map of high As groundwater in (a) Model II, and (b) Model III, and the AUC values in (c) Model II, and (d) Model III.
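The AUC values compared across Models I to III have a simple rank-based interpretation: the AUC is the probability that a randomly chosen high-As sample receives a higher predicted probability than a randomly chosen low-As sample, with ties counted as one half. The sketch below computes it by direct pairwise comparison; the function name and the scores are hypothetical, and this is not presented as the authors' implementation.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of scores.

    scores: predicted probabilities of the positive (high-As) class.
    labels: 1 for high-As samples, 0 for low-As samples.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one sample of each class")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


# Hypothetical predictions: one high-As sample (0.4) ranks below a low-As one (0.6).
example_scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]
example_labels = [1, 1, 1, 0, 0, 0]
auc = roc_auc(example_scores, example_labels)
print(f"AUC = {auc:.3f}")
```

The pairwise loop is quadratic in the number of samples, which is acceptable for a dataset of a few hundred wells; a sort-based version would be preferable for much larger datasets.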


the prediction results showed significant improvement when considering both groundwater geochemicals and sediment characteristics, with an AUC value of 0.84 (Fig. 2). We found that groundwater geochemicals, especially groundwater Fe(II) concentrations representing the extent of Fe(III) oxide reduction, showed higher relative importance in predicting groundwater As concentrations than climate parameters. The latter have been recognized as the most important predictors in previous modeling studies (Ayotte et al., 2017; Fu et al., 2022; Nath et al., 2022; Podgorski and Berg, 2020). The As prediction maps can be used by authorities to identify areas potentially threatened by high-As groundwater, and provide guidance on locating safe drinking groundwater for local residents.

The study highlights the importance of groundwater geochemicals and sediment characteristics in improving the model prediction. However, a limitation of the model is that it requires large datasets, which may be unavailable in rural/developing areas. Future studies should find a way to construct precise prediction models with limited datasets of both groundwater and sediments.

4. Conclusion

The random forest model was applied to predict the spatial distribution of geogenic As concentration in the Hetao Basin, using 22 variables covering climate, topographic features, soil properties, sediment characteristics, groundwater geochemicals, and hydraulic gradients. The established Model I precisely captured the patchy distributions of groundwater As concentrations with OOB score, ACC, PPV, and AUC values of 0.84, 0.79, 0.81, and 0.84, respectively. The prediction results show that the high-As areas were concentrated in the depositional center of the Hetao Basin, especially in the east of the basin. Among all the considered predictors, the model identified Fe(II) as the most prominent variable predicting high-As groundwater, indicating that the enrichment of As in groundwater was related to the reductive dissolution of Fe(III) oxides. The high relative importance of SO₄²⁻ indicated that sulfate reduction was also conducive to groundwater As enrichment in inland basins. Climate data, which were previously ranked as the predominant predictor, were less important than groundwater geochemicals (including Fe(II) and SO₄²⁻). By comparing the three different prediction models (i.e., Models I, II, and III), we suggest that the introduction of groundwater geochemicals and sediment characteristics as variables significantly improved the precision and accuracy of the prediction model. Future studies should find a way to construct a precise prediction model with limited datasets of groundwater samples and sediment samples.

CRediT authorship contribution statement

Wenjing Guo: Writing – original draft, Data curation, Formal analysis, Visualization, Methodology, Software. Zhipeng Gao: Writing – review & editing, Project administration, Resources, Supervision, Validation, Funding acquisition. Huaming Guo: Writing – review & editing, Project administration, Resources, Supervision, Validation, Funding acquisition. Wengeng Cao: Resources, Data curation.

Data availability

Data will be made available on request.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

The authors declare no conflict of interest. The study was financially supported by the National Key Research and Development Program of China (grant No. 2021YFA0715902), and National Natural Science Foundation of China (grant No. 42102285).

Appendix A. Supplementary data

Supplementary data to this article can be found online at https://doi.org/10.1016/j.scitotenv.2023.165511.

References

Akintomide, O.A., Amer, R.M., Hanor, J.S., Datta, S., Johannesson, K.H., 2021. Pleistocene sands of the Mississippi River alluvial aquifer produce the highest groundwater arsenic concentrations in southern Louisiana, USA. J. Hydrol. 595, 125995.
Alarcón-Herrera, M.T., Bundschuh, J., Nath, B., Nicolli, H.B., Gutierrez, M., Reyes-Gomez, V.M., Nuñez, D., Martín-Dominguez, I.R., Sracek, O., 2013. Co-occurrence of arsenic and fluoride in groundwater of semi-arid regions in Latin America: genesis, mobility and remediation. J. Hazard. Mater. 262, 960–969.
Ayotte, J.D., Medalie, L., Qi, S.L., Backer, L.C., Nolan, B.T., 2017. Estimating the high-arsenic domestic-well population in the conterminous United States. Environ. Sci. Technol. 51, 12443–12454.
Bindal, S., Singh, C.K., 2019. Predicting groundwater arsenic contamination: regions at risk in highest populated state of India. Water Res. 159, 65–76.
Cao, W., Guo, H., Zhang, Y., Ma, R., Li, Y., Dong, Q., Li, Y., Zhao, R., 2018. Controls of paleochannels on groundwater arsenic distribution in shallow aquifers of alluvial plain in the Hetao Basin, China. Sci. Total Environ. 613-614, 958–968.
Cao, H., Xie, X., Wang, Y., Deng, Y., 2021. The interactive natural drivers of global geogenic arsenic contamination of groundwater. J. Hydrol. 597, 126214.
Chakraborty, M., Mukherjee, A., Ahmed, K.M., Fryar, A.E., Bhattacharya, A., Zahid, A., Das, R., Chattopadhyay, S., 2022. Influence of hydrostratigraphy on the distribution of groundwater arsenic in the transboundary Ganges River delta aquifer system, India and Bangladesh. GSA Bull. 134, 2680–2692.
Choudhury, R., Nath, B., Khan, M.R., Mahanta, C., Ellis, T., Geen, A., 2018. The impact of aquifer flushing on groundwater arsenic across a 35-km transect perpendicular to the upper Brahmaputra River in Assam, India. Water Resour. Res. 54, 8160–8173.
Connolly, C.T., Stahl, M.O., DeYoung, B.A., Bostick, B.C., 2022. Surface flooding as a key driver of groundwater arsenic contamination in Southeast Asia. Environ. Sci. Technol. 56, 928–937.
Deng, Y., Wang, Y., Ma, T., Gan, Y., 2009. Speciation and enrichment of arsenic in strongly reducing shallow aquifers at western Hetao Plain, northern China. Environ. geology (Berlin) 56, 1467–1477.
Ebrahimy, H., Mirbagheri, B., Matkan, A.A., Azadbakht, M., 2021. Per-pixel land cover accuracy prediction: a random forest-based method with limited reference sample data. ISPRS J. Photogramm. Remote Sens. 172, 17–27.
Erban, L.E., Gorelick, S.M., Fendorf, S., 2014. Arsenic in the multi-aquifer system of the Mekong Delta, Vietnam: analysis of large-scale spatial trends and controlling factors. ACS Publ. 48 (11), 6081–6088.
Erickson, M.L., Elliott, S.M., Christenson, C.A., Krall, A.L., 2018. Predicting geogenic arsenic in drinking water wells in glacial aquifers, north-Central USA: accounting for depth-dependent features. Water Resour. Res. 54 (12), 10172–10187.
Erickson, M.L., Elliott, S.M., Brown, C.J., Stackelberg, P.E., Ransom, K.M., Reddy, J.E., 2021a. Machine learning predicted redox conditions in the glacial aquifer system, northern continental United States. Water Resour. Res. 57 (4), 028027.
Erickson, M.L., Elliott, S.M., Brown, C.J., Stackelberg, P.E., Ransom, K.M., Reddy, J.E., Cravotta, C.A., 2021b. Machine-learning predictions of high arsenic and high manganese at drinking water depths of the glacial aquifer system, northern continental United States. Environ. Sci. Technol. 55, 5791–5805.
Fendorf, S., Michael, H.A., van Geen, A., 2010. Spatial and temporal variations of groundwater arsenic in south and Southeast Asia. Science 328, 1123–1127.
Feng, S., Guo, H., Sun, X., Han, S., Li, Y., 2022. Relative importance of hydrogeochemical and hydrogeological processes on arsenic enrichment in groundwater of the Yinchuan Basin, China. Appl. Geochem. 137, 105180.
Fu, Y., Cao, W., Pan, D., Ren, Y., 2022. Changes of groundwater arsenic risk in different seasons in Hetao Basin based on machine learning model. Sci. Total Environ. 817, 153058.
Gao, Z.P., Jia, Y.F., Guo, H.M., Zhang, D., Zhao, B., 2020. Quantifying geochemical processes of arsenic mobility in groundwater from an Inland Basin using a reactive transport model. Water Resour. Res. 56, e2019WR025492.
Gao, Z., Guo, H., Li, S., Wang, J., Ye, H., Han, S., Cao, W., 2022. Remote sensing of wetland evolution in predicting shallow groundwater arsenic distribution in two typical inland basins. Sci. Total Environ. 806, 150496.
Glodowska, M., Stopelli, E., Straub, D., Vu, T.D., Trang, P., Viet, P.H., AdvectAs, T.M., Berg, M., Kappler, A., Kleindienst, S., 2021. Arsenic behavior in groundwater in Hanoi (Vietnam) influenced by a complex biogeochemical network of iron, methane, and sulfur cycling. J. Hazard. Mater. 407, 124398.
Guo, H., Yang, S., Tang, X., Li, Y., Shen, Z., 2008. Groundwater geochemistry and its implications for arsenic mobilization in shallow aquifers of the Hetao Basin, Inner Mongolia. Sci. Total Environ. 393, 131–144.
Guo, H.M., Li, X.M., Xiu, W., He, W., Cao, Y.S., Zhang, D., Wang, A., 2019. Controls of organic matter bioreactivity on arsenic mobility in shallow aquifers of the Hetao Basin, P.R. China. J. Hydrol. 571, 448–459.
Guo, H., Liu, C., Lu, H., Wanty, R.B., Wang, J., Zhou, Y., 2013. Pathways of coupled arsenic and iron cycling in high arsenic groundwater of the Hetao basin, Inner Mongolia, China: an iron isotope approach. Geochim. Cosmochim. Acta 112, 130–145.


Guo, H.M., Wen, D.G., Liu, Z.Y., Jia, Y.F., Guo, Q., 2014. A review of high arsenic groundwater in Mainland and Taiwan, China: distribution, characteristics and geochemical processes. Appl. Geochem. 41, 196–217.
Guo, H., Zhou, Y., Jia, Y., Tang, X., Li, X., Shen, M., Lu, H., Han, S., Wei, C., Norra, S., Zhang, F., 2016. Sulfur cycling-related biogeochemical processes of arsenic mobilization in the Western Hetao Basin, China: evidence from multiple isotope approaches. Environ. Sci. Technol. 50, 12650–12659.
Gyawali, T., Pant, S., Nakamura, K., Komai, T., Paudel, S.R., 2022. Spatial and temporal distribution of arsenic contamination in groundwater of Nawalparasi-West, Nepal: an investigation with suggested countermeasures for South Asian Region. Environ. Monit. Assess. 194, 582.
Huhmann, L.B., Harvey, C.F., Navas-Acien, A., Graziano, J., Slavkovich, V., Chen, Y., Argos, M., Ahsan, H., van Geen, A., 2022. A mass-balance model to assess arsenic exposure from multiple wells in Bangladesh. J. Exposure Sci. Environ. Epidemiol. 32, 442–450.
Jia, Y., Guo, H., Xi, B., Jiang, Y., Zhang, Z., Yuan, R., Yi, W., Xue, X., 2017. Sources of groundwater salinity and potential impact on arsenic mobility in the western Hetao Basin, Inner Mongolia. Sci. Total Environ. 601-602, 691–702.
Kapaj, S., Peterson, H., Liber, K., Bhattacharya, P., 2006. Human health effects from chronic arsenic poisoning – a review. J. Environ. Sci. Health A 41, 2399–2428.
Karagas, M.R., Gossai, A., Pierce, B., Ahsan, H., 2015. Drinking water arsenic contamination, skin lesions, and malignancies: a systematic review of the global evidence. Curr. Environ. Health Rep. 2, 52–68.
Khan, M.U., Rai, N., 2022. Arsenic and selected heavy metal enrichment and its health risk assessment in groundwater of the Haridwar district, Uttarakhand, India. Environ. Earth Sci. 81 (12), 337.
Kumar, S., Pati, J., 2022. Assessment of groundwater arsenic contamination level in Jharkhand, India using machine learning. J. Comput. Sci. 63, 101779.
Li, Y., Guo, H., Hao, C., 2014. Arsenic release from shallow aquifers of the Hetao basin, Inner Mongolia: evidence from bacterial community in aquifer sediments and groundwater. Ecotoxicology 23, 1900–1914.
Li, Y., Du, Y., Deng, Y., Fan, R., Tao, Y., Ma, T., Wang, Y., 2022. Predicting the spatial distribution of phosphorus concentration in Quaternary sedimentary aquifers using simple field parameters. Appl. Geochem. 142, 105349.
Liu, E., Yang, Y., Xie, Z., Wang, J., Chen, M., 2022. Influence of sulfate reduction on arsenic migration and transformation in groundwater environment. Water 14 (6), 942.
Lombard, M.A., Bryan, M.S., Jones, D.K., Bulka, C., Bradley, P.M., Backer, L.C., Focazio, M.J., Silverman, D.T., Toccalino, P., Argos, M., Gribble, M.O., Ayotte, J.D., 2021. Machine learning models of arsenic in private wells throughout the conterminous United States as a tool for exposure assessment in human health studies. Environ. Sci. Technol. 55, 5012–5023.
Ma, Y., Shaomin, L., Lisheng, S., Ziwei, X., Yaling, L., Tongren, X., Zhongli, Z., 2018. Estimation of daily evapotranspiration and irrigation water efficiency at a Landsat-like scale for an arid irrigation area using multi-source remote sensing data. Remote Sens. Environ. 216, 715–734.
Mukherjee, A., Sarkar, S., Chakraborty, M., Duttagupta, S., Bhattacharya, A., Saha, D., Bhattacharya, P., Mitra, A., Gupta, S., 2021. Occurrence, predictors and hazards of elevated groundwater arsenic across India through field observations and regional-scale AI-based modeling. Sci. Total Environ. 759, 143511.
Nath, B., Chowdhury, R., Ni Meister, W., Mahanta, C., 2022. Predicting the distribution of arsenic in groundwater by a geospatial machine learning technique in the two most affected districts of Assam, India: the public health implications. GeoHealth 6, e2021GH000585.
Nghiem, A.A., Prommer, H., Mozumder, M.R.H., Siade, A., Jamieson, J., Ahmed, K.M., van Geen, A., Bostick, B.C., 2023. Sulfate reduction accelerates groundwater arsenic contamination even in aquifers with abundant iron oxides. Nat. Water 1, 151–165.
Park, Y., Ligaray, M., Kim, Y.M., Kim, J.H., Cho, K.H., Sthiannopkao, S., 2016. Development of enhanced groundwater arsenic prediction model using machine learning approaches in Southeast Asian countries. Desalin. Water Treat. 57, 12227–12236.
Perez, J.P.H., Tobler, D.J., Thomas, A.N., Freeman, H.M., Dideriksen, K., Radnik, J., Benning, L.G., 2019. Adsorption and reduction of arsenate during the Fe²⁺-induced transformation of Ferrihydrite. ACS Earth Space Chem. 3, 884–894.
Podgorski, J., Berg, M., 2020. Global threat of arsenic in groundwater. Science (New York, N.Y.) 368.
Podgorski, J.E., Eqani, S., Khanam, T., Ullah, R., Shen, H., Berg, M., 2017. Extensive arsenic contamination in high-pH unconfined aquifers in the Indus Valley. Sci. Adv. 3, e1700935.
Podgorski, J., Wu, R., Chakravorty, B., Polya, D.A., 2020. Groundwater arsenic distribution in India by machine learning geospatial modeling. Int. J. Environ. Res. Public Health 17 (19), 7119.
Postma, D., Larsen, F., Minh Hue, N.T., Duc, M.T., Viet, P.H., Nhan, P.Q., Jessen, S., 2007. Arsenic in groundwater of the Red River floodplain, Vietnam: controlling geochemical processes and reactive transport modeling. Geochim. Cosmochim. Acta 71, 5054–5071.
Richards, Laura A., Casanueva-Marenco, Maria J., Magnone, Daniel, Sovann, Chansopheaktra, van Dongen, Bart E., Polya, David A., 2019. Contrasting sorption behaviours affecting groundwater arsenic concentration in Kandal Province, Cambodia. Geosci. Front. 10 (5), 1701–1713.
Rodríguez-Lado, L., Sun, G., Berg, M., Zhang, Q., Xue, H., Zheng, Q., Johnson, C.A., 2013. Groundwater arsenic contamination throughout China. Science 341, 866–868.
Schapire, R.E., 2001. Random forests. Mach. Learn. 45, 5–32.
Shen, M., Guo, H., Jia, Y., Cao, Y., Zhang, D., 2018. Partitioning and reactivity of iron oxide minerals in aquifer sediments hosting high arsenic groundwater from the Hetao basin, P. R. China. Appl. Geochem. 89, 190–201.
Singh, S.K., Taylor, R.W., Pradhan, B., Shirzadi, A., Pham, B.T., 2022. Predicting sustainable arsenic mitigation using machine learning techniques. Ecotoxicol. Environ. Saf. 232, 113271.
Smedley, P.L., Kinniburgh, D.G., 2002. A review of the source, behaviour and distribution of arsenic in natural waters. Appl. Geochem. 17, 517–568.
Stachowicz, M., Hiemstra, T., van Riemsdijk, W.H., 2008. Multi-competitive interaction of As(III) and As(V) oxyanions with Ca²⁺, Mg²⁺, PO₄³⁻, and CO₃²⁻ ions on goethite. J. Colloid Interface Sci. 320, 400–414.
Stolze, L., Zhang, D., Guo, H., Rolle, M., 2019. Surface complexation modeling of arsenic mobilization from goethite: interpretation of an in-situ experiment. Geochim. Cosmochim. Acta 248, 274–288.
Stuckey, J.W., Sparks, D.L., Fendorf, S., 2016. Delineating the convergence of biogeochemical factors responsible for arsenic release to groundwater in south and Southeast Asia. Adv. Agron. 140, 43–74.
Tan, Z., Yang, Q., Zheng, Y., 2020. Machine learning models of groundwater arsenic spatial distribution in Bangladesh: influence of Holocene sediment depositional history. Environ. Sci. Technol. 54, 9454–9463.
Tesoriero, A.J., Gronberg, J.A., Juckem, P.F., Miller, M.P., Austin, B.P., 2017. Predicting redox-sensitive contaminant concentrations in groundwater using random forest classification. Water Resour. Res. 53, 7316–7331.
Tong, J., Guo, H., Wei, C., 2014. Arsenic contamination of the soil–wheat system irrigated with high arsenic groundwater in the Hetao Basin, Inner Mongolia, China. Sci. Total Environ. 496, 479–487.
Ur Rehman, H., Ahmed, S., Ur Rahman, M., Mehmood, M.S., 2022. Arsenic contamination, induced symptoms, and health risk assessment in groundwater of Lahore, Pakistan. Environ. Sci. Pollut. Res. 29, 49796–49807.
van Geen, A., Zheng, Y., Goodbred, S., Horneman, A., Aziz, Z., Cheng, Z., Stute, M., Mailloux, B., Weinman, B., Hoque, M.A., Seddique, A.A., Hossain, M.S., Chowdhury, S.H., Ahmed, K.M., 2008. Flushing history as a hydrogeological control on the regional distribution of arsenic in shallow groundwater of the Bengal Basin. Environ. Sci. Technol. 42, 2283–2288.
Verma, S., Mukherjee, A., Mahanta, C., Choudhury, R., Mitra, K., 2016. Influence of geology on groundwater–sediment interactions in arsenic enriched tectono-morphic aquifers of the Himalayan Brahmaputra river basin. J. Hydrol. 540, 176–195.
Wang, H., Li, Z., Cao, L., Feng, R., Pan, Y., 2021. Response of NDVI of natural vegetation to climate changes and drought in China. Land 10, 966.
Winkel, L., Berg, M., Amini, M., Hug, S.J., Johnson, C.A., 2008. Predicting groundwater arsenic contamination in Southeast Asia from surface parameters. Nat. Geosci. 1, 536–542.
Wu, R., Podgorski, J., Berg, M., Polya, D.A., 2021. Geostatistical model of the spatial distribution of arsenic in groundwaters in Gujarat state, India. Environ. Geochem. Health 43, 2649–2664.
Ying, S.C., Schaefer, M.V., Cock-Esteb, A., Li, J., Fendorf, S., 2017. Depth stratification leads to distinct zones of manganese and arsenic contaminated groundwater. Environ. Sci. Technol. 51, 8926–8932.
Zhang, Q., Rodríguez-Lado, L., Johnson, C.A., Xue, H., Shi, J., Zheng, Q., Sun, G., 2012. Predicting the risk of arsenic contaminated groundwater in Shanxi Province, Northern China. Environ. Pollut. 165, 118–123.
Zhang, Y., Cao, W., Wang, W., Dong, Q., 2013. Distribution of groundwater arsenic and hydraulic gradient along the shallow groundwater flow-path in Hetao Plain, Northern China. J. Geochem. Explor. 135, 31–39.
Zhou, J., Liu, Y., Bu, H., Liu, P., Sun, J., Wu, F., Hua, J., Liu, C., 2022. Effects of Fe(II)-induced transformation of scorodite on arsenic solubility. J. Hazard. Mater. 429, 128274.
