Journal of Geochemical Exploration 148 (2015) 196–205

Determination of natural backgrounds and thresholds of nitrate in South Korean groundwater using model-based statistical approaches
Kyoung-Ho Kim a, Seong-Taek Yun a,b,⁎, Hyun-Koo Kim c, Ji-Wook Kim d
a Department of Earth and Environmental Sciences, Korea University, Seoul 136-701, South Korea
b KU-KIST Green School, Korea University, Seoul 136-701, South Korea
c National Institute of Environmental Research (NIER), Incheon 404-170, South Korea
d Korea Water Resources Corporation, Daejeon 306-711, South Korea

Article info

Article history:
Received 18 January 2014
Accepted 2 October 2014
Available online 12 October 2014

Keywords:
Background level and threshold of nitrate
Anthropogenic polluted level
Finite normal (Gaussian) mixture model
National Groundwater Monitoring Network of South Korea

Abstract

Increased nitrate loading of groundwater has emerged as a major environmental problem in many countries, including South Korea. This study aims to evaluate the nitrate levels of South Korean groundwater on a regional (national) scale and, specifically, to demonstrate a procedure for better estimating the natural background level (NBL) and threshold of nitrate as the basis of groundwater management. For this work, nitrate data of groundwater (n = 8510) in two major hydrogeologic units (alluvium and bedrock) were collected from the National Groundwater Monitoring Network (NGMN) of South Korea. Four supplementary datasets (n = 1074) were also used to test the rationality of the estimated thresholds by comparing them with the NGMN datasets. Compared with the data reported in many countries, the nitrate concentrations in NGMN groundwater in 2009 are high, with median values of 12.2 and 8.7 mg/L for alluvial groundwater and bedrock groundwater, respectively. The nitrate levels of South Korean groundwater seem to have remained steady at these high levels between 1997 and 2009, suggesting widespread diffuse contamination since the 1980s. The NBLs and anthropogenic polluted levels (APLs) of nitrate on a regional (national) scale are statistically established by a model-based approach using a finite normal (Gaussian) two-component mixture model, because (1) the sample size (frequency) of the natural background group is much smaller than that of the polluted group, as a result of widespread nitrate contamination, and (2) nitrate concentrations are affected to varying degrees by natural attenuation processes. Accordingly, thresholds of nitrate (as the concentration level indicating groundwater pollution) are selected as the lower limits (i.e., 10th percentiles) of the polluted group, which are 3.0 and 5.5 mg/L NO3− for bedrock groundwater and alluvial groundwater, respectively. This study provides a practical guideline for national groundwater management, based on a heuristic procedure to statistically determine the NBLs and thresholds of groundwater systems with pervasive contamination. Compared with other classical methods for estimating NBLs, the model-based approach using a finite normal-mixture model can separate the polluted samples from a regional (or national) dataset more effectively.
© 2014 Elsevier B.V. All rights reserved.

⁎ Corresponding author. Tel.: +82 2 32903176; fax: +82 2 32903189.
E-mail address: (S.-T. Yun).

1. Introduction

Groundwater accounts for approximately 20% of global water use, and this share is increasing rapidly (WMO, 1997; Arnell, 1999). However, groundwater quality in many aquifers worldwide has been deteriorating over the past few decades because of increasing human impact. In particular, groundwater pollution is closely associated with diffuse (nonpoint) sources related to agricultural activities in which many types of inorganic and organic fertilizers are used (Burkart and Stoner, 2008; Burow et al., 2010; Scanlon et al., 2007; Sebilo et al., 2013). Thus, nitrate has become the most ubiquitous pollutant in groundwater, frequently threatening public water supply sources and human health (Spalding and Exner, 1993). As a common constituent of groundwater, nitrate is the most important indicator of groundwater quality status; therefore, a practical need exists to assess nitrate levels in aquifers precisely for better management and regulation of groundwater quality.

In this context, it is essential to determine both the natural background level (NBL) and the threshold value (in general, the upper limit or maximum of the NBL) of nitrate to differentiate between natural controls (i.e., geogenic, biological, and atmospheric processes) and anthropogenic impacts on groundwater quality (Corniello and Ducci, 2014; Limbrick, 2003; Panno et al., 2006; Preziosi et al., 2014). Because of the limitations of the existing drinking water standard (DWS), the threshold is commonly used as a practical reference value for determining the "good" status of groundwater quality and also serves as an early-warning level for groundwater pollution (Edmunds et al., 2003; Langmuir, 1997; Reimann and Garrett, 2005). In recent years,
EU member states have established threshold values (as regulatory levels) corresponding to the NBLs of a number of constituents in many aquifers (Edmunds and Shand, 2008; Hinsby et al., 2008; Müller et al., 2006; Preziosi et al., 2010), based on scientific principles established through research projects such as BASELINE and BRIDGE. These studies also used an NBL of nitrate of <10 mg/L NO3− for a simplified separation approach called preselection.

Many previous studies have published NBLs of groundwater nitrate based on a number of approaches, which were reviewed by Panno et al. (2006). The reported values are low (<5 mg/L NO3−) because natural or geologic sources of nitrate are neither abundant nor common (Böhlke, 2002; Halberg and Keedny, 1993). Even so, the NBL of nitrate is still difficult to define accurately from datasets because of the ubiquitous occurrence of nitrate resulting from widespread and/or long-term contamination from diffuse sources (Panno et al., 2006; Reimann and Garrett, 2005). Furthermore, nitrate concentrations in groundwater can also depend on redox conditions, because anthropogenic nitrate can be naturally attenuated by biogeochemical transformations such as denitrification (Appelo and Postma, 1999; Langmuir, 1997; Postma et al., 1991). Therefore, a good understanding of the biogeochemical processes in the investigated groundwater system is needed to assess the NBL of nitrate precisely. For these reasons, statistical methods, including model-based approaches, have recently been used to determine the NBL of nitrate.

Groundwater contamination in South Korea has been an important environmental concern since the 1980s because of rapid industrialization and urbanization. In particular, severe nitrate pollution of groundwater has been reported in many localities with active agricultural practices (e.g., Chae et al., 2004, 2013; Choi et al., 2007, 2014; Joo et al., 2009; Koh et al., 2010). Anthropogenic nitrogen inputs from N-fertilizers and manure are much higher in South Korea than in other countries. The use of synthetic N–P–K fertilizers in South Korea is the fifth highest among the OECD countries (OECD, 2004). The average application rate of N-fertilizers amounts to 224 kg/ha per year (MOAF, 2001), compared with 128 kg/ha per year in England (Petry et al., 2002) and 27.5 kg/ha per year in the United States (Nolan, 2001). Moreover, the nitrogen efficiency (i.e., balance between inputs and outputs) in the agricultural soils of South Korea is the lowest among the OECD countries (OECD, 2008). These conditions indicate that South Korean groundwater can be highly vulnerable to nitrate pollution in the absence of efficient and sustainable nutrient-management measures in agricultural practice. Accordingly, the Korean government recently began to assess the status of nitrate pollution in order to establish measures for effective groundwater quality management.

As an initial step toward groundwater management in South Korea, the current study aims to evaluate the regional status of groundwater quality and to establish the NBL and threshold of groundwater nitrate as guidelines for management. For these purposes, large datasets collected from the National Groundwater Monitoring Network (NGMN) of South Korea are evaluated. In addition, nitrate datasets obtained from four different hydrochemical surveys are used for comparison with the NGMN data. To set the NBL and threshold, the statistical distribution of nitrate concentrations is divided into two groups (natural and anthropogenic) by a model-based clustering approach using a finite normal (Gaussian) mixture model. This study shows the application of a heuristic statistical procedure to determine the NBL and threshold of ubiquitous and widespread aquatic contaminants such as nitrate. The proposed method can be applied to groundwater systems in which anthropogenic contamination prevails over, or overwhelms, natural processes.

2. Materials and methods

2.1. Usage and hydrogeology of groundwater in South Korea

In South Korea, groundwater is an important resource that accounts for approximately 11% (about 3.7 billion tons per year) of total water use. Groundwater is used largely for domestic (49%) and agricultural (45%) purposes and comes from two types of aquifers: shallow alluvial aquifers and relatively deep bedrock aquifers (MOCT, 2007). The alluvial aquifers, consisting mainly of sand with subordinate silt or gravel, are 10–50 m thick, have a total areal extent of approximately 27,000 km², and commonly occur along rivers and streams. The transmissivity (T) and storage coefficient (S) of the alluvial aquifers are 50–2000 m²/day and 0.01–0.1, respectively. In rural areas, alluvial aquifers are extensively used for domestic and agricultural water supplies and have potential yields ranging from 30 to 800 m³/day per well.

Bedrock aquifers represent the typical form of groundwater occurrence in South Korea and are developed in or along weathered zones, faults, fractures, joints, and lithologic boundaries of bedrock beneath shallow alluvial aquifers. Groundwater yields from bedrock aquifers vary significantly, from 10 to 5000 m³/day, according to rock type, weathered-zone thickness, and topography. For example, groundwater wells in metamorphic rocks in highlands generally have low yields, whereas wells in flat lowlands with sedimentary and volcanic rocks tend to have high yields. Bedrock aquifers in South Korea can be categorized into five major geologic groups (Chae et al., 2007): granitoids, metamorphic rocks, complex (intermixed) rocks, volcanic rocks, and sedimentary rocks. Granitic and metamorphic rocks make up about 70% of the surface outcrops in South Korea (see Fig. 1B), forming the most important and prevalent lithologic units of fractured bedrock aquifers. The hydraulic conductivity of bedrock aquifers is generally low (average 0.076 m/day) and varies by more than 4 orders of magnitude (Jeon et al., 2005). Even so, bedrock aquifers in South Korea are used for various purposes (i.e., agricultural, industrial, and domestic). More importantly, bedrock aquifers provide the public drinking water supply in rural areas and are also pumped for the production of commercial bottled water.

2.2. Nitrate datasets used for this study

The major dataset used in this study consists of nitrate concentration data collected from the NGMN (Fig. 1A). The NGMN has been operated since 1995 by the Korea Water Resources Corporation (KOWACO, currently called K-Water) to monitor the overall status of both the quantity (through water-level fluctuations) and the quality of groundwater (Choi et al., 2014; Lee et al., 2007). A total of 320 monitoring stations currently monitor bedrock groundwater at an average depth of 74 m. Of these stations, 158 also have parallel shallow monitoring wells (average depth = 12 m) for monitoring alluvial aquifers. Water quality at the NGMN stations has been monitored twice per year for 15 parameters, including nitrate, chloride, heavy metals, and organic contaminants. In this study, the yearly trend of nitrate concentrations in groundwater is evaluated for the period between 1997 and 2009 (i.e., a total of 24 sampling campaigns) (Table 1). However, the NGMN water quality data may not be sufficient to give precise information on hydrogeochemical processes and contamination sources, because information on sampling and analysis is generally absent. Thus, we first investigated the hydrochemistry of NGMN groundwater in the first half of 2009 (Choi et al., 2014). A total of 19 parameters, including field measurements (temperature, pH, Eh, DO, EC), alkalinity, total dissolved solids (TDS), and major cations/anions, were obtained following standard procedures of sampling and analysis (APHA, 1985). The calculated charge balance errors (C.B.E.) for the analyses were mostly within ±5%, indicating the good quality of the dataset (Hounslow, 1995). The nitrate concentration data obtained from this hydrochemical survey in 2009 (Choi et al., 2014) are also used as a main dataset in this study (Table 1).

Additional datasets, originally collected in four independent groundwater surveys conducted by the current authors, are also used in this study. The nitrate concentration data from these surveys serve as supplementary datasets for comparing and statistically evaluating nitrate levels according to the locality and scale of groundwater sampling (Table 1). Fig. 1B and C show the sampling localities for the supplementary datasets, together with geologic and

Fig. 1. Topographic maps showing (A) the locations of the NGMN stations (n = 320) of South Korea, (B) the sampling localities for supplementary datasets (Cases 1, 2, 3, and 4), and
(C) regional geology and major river basins.

topographic information. These datasets are (1) the unpublished Case 1 dataset (n = 33; filled triangles in Fig. 1B and C), collected from deep wells (average depth >500 m) used for the production of commercial bottled water in remote, unpolluted areas across South Korea; (2) the Case 2 dataset (n = 65; filled squares in Fig. 1C), collected by a detailed groundwater survey on the outskirts of Guri City, east of Seoul (KECO, 2008), which represents significant pollution of an alluvial aquifer that is highly vulnerable to nitrate pollution because of the high permeability of its soils (weathered coarse-grained granite) and overfertilization of agricultural fields (mainly orchards); and (3) the Case 3 and Case 4 datasets, collected during regional groundwater surveys in the Han River and Geum River basins, respectively (see Fig. 1C for localities) (KOWACO, 2002, 2006), which contain a few hundred samples of alluvial and bedrock groundwater (n = 482 for Case 3; n = 494 for Case 4).

2.3. Methods of statistical evaluation

To determine the NBL of a contaminant in groundwater, a variety of methodologies have been applied in previous studies, generally using large datasets (hundreds of samples or more). In particular, the cumulative probability plot (i.e., a cumulative frequency diagram on a normal probability scale) has been considered the most useful tool for identifying the inflection points that separate different subpopulations in large datasets (Matschullat et al., 2000; Preziosi et al., 2014; Sinclair, 1974, 1991). From the inflection points, the threshold value

Table 1
General descriptions of nitrate datasets used for this study.

Dataset | Aquifer type | Sampling time | Sample size/scale | Data adequacy (a) | Data source
National Groundwater Monitoring Network (NGMN)
NGMN 2009 | Bedrock/alluvium | 2009 | Large/regional (national) | Adequate | Choi et al. (2014) and this study
NGMN 1997–2009 (24 sampling campaigns) | Bedrock/alluvium | 1997–2009 | Large/regional (national) | Cannot be estimated | KOWACO (b)
Supplementary datasets
Case 1 (pristine wells) | Bedrock | 2009 | Small/regional | Adequate | Unpublished
Case 2 (high-risk area) | Alluvium (sandy) | 2008 | Small/local | Adequate | KECO (2008)
Case 3 (Han River basin) | Bedrock/alluvium | 2005 | Large/regional | Adequate | KOWACO (2006)
Case 4 (Geum River basin) | Bedrock/alluvium | 2001 | Large/regional | Adequate | KOWACO (2002)

(a) Charge balance errors (C.B.E.) for the analyses are mostly within ±5%.
(b) Datasets are available from the National Groundwater Information Management and Service Center of KOWACO (K-water) of Korea at
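
The two classical approaches reviewed in Section 2.3, the cumulative probability plot and the iterative 2-sigma technique, can be sketched in Python as follows. This is an illustrative sketch, not the procedure used in this study: it assumes log10-transformed concentrations, Hazen plotting positions (i − 0.5)/n for the probability plot, and a common form of the 2-sigma technique in which values outside mean ± 2 SD are removed until none remain, with the background range taken as mean ± 2 SD of the retained values. Both function names are hypothetical.

```python
import math
import statistics
from statistics import NormalDist


def probability_plot_points(concentrations):
    """Coordinates for a cumulative probability plot: (normal score, log10 value).

    Non-positive values are dropped because they cannot be log-transformed.
    Plotting positions use the Hazen formula (i - 0.5) / n, an assumption.
    """
    values = sorted(c for c in concentrations if c > 0)
    n = len(values)
    std_normal = NormalDist()  # mean 0, SD 1
    return [(std_normal.inv_cdf((i - 0.5) / n), math.log10(c))
            for i, c in enumerate(values, start=1)]


def iterative_two_sigma(concentrations, max_iter=100):
    """Hypothetical helper: repeatedly drop log10 values outside mean +/- 2 SD.

    Returns the back-transformed background range (mg/L) and the number of
    values left in the background population.
    """
    logs = [math.log10(c) for c in concentrations if c > 0]
    for _ in range(max_iter):
        mean, sd = statistics.mean(logs), statistics.stdev(logs)
        kept = [x for x in logs if mean - 2 * sd <= x <= mean + 2 * sd]
        if len(kept) == len(logs):
            break  # converged: all remaining values lie within mean +/- 2 SD
        if len(kept) < 2:
            raise ValueError("too few values left to estimate a background")
        logs = kept
    mean, sd = statistics.mean(logs), statistics.stdev(logs)
    return 10 ** (mean - 2 * sd), 10 ** (mean + 2 * sd), len(logs)


if __name__ == "__main__":
    # Hypothetical concentrations (mg/L): a tight background group plus one outlier.
    data = [2, 3, 2, 3, 2, 3, 2, 3, 2, 3, 5000]
    low, high, n_background = iterative_two_sigma(data)
    print(f"background range {low:.2f}-{high:.2f} mg/L from {n_background} values")
    for z, log_c in probability_plot_points(data):
        print(f"{z:6.2f}  {log_c:5.2f}")
```

For log-normally distributed data the probability-plot points fall close to a straight line, and changes in slope mark the inflection points used in the graphical approach. The 2-sigma loop behaves well only when few values are removed, which is the limitation raised in Section 2.3 for pervasively polluted datasets.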

Table 2
Descriptive statistics of nitrate levels (mg/L) in alluvial and bedrock groundwater of the National Groundwater Monitoring Network (NGMN) of South Korea.

Dataset | Aquifer type | N | Mean | SD | Min. | Median | Max. | CV (%) | P-value (a)
NGMN 1997–2009 | Alluvium | 2788 | 19.2 | 32.1 | n.d. | 11.1 | 797.4 | 167.0 | <0.01
NGMN 1997–2009 | Bedrock | 5244 | 16.6 | 27.0 | n.d. | 8.9 | 494.4 | 162.6 | <0.01
NGMN 1997–2009 | Total | 8032 | 17.5 | 28.9 | n.d. | 9.7 | 797.4 | 165.0 | <0.01
NGMN 2009 | Alluvium | 158 | 17.1 | 21.7 | 0.035 | 12.2 | 132 | 127.2 | <0.01
NGMN 2009 | Bedrock | 320 | 15.9 | 24.3 | 0.035 | 8.7 | 188.9 | 153.2 | <0.01
NGMN 2009 | Total | 478 | 16.3 | 23.5 | 0.035 | 9.7 | 188.9 | 144.3 | <0.01

n.d. = not detected (assumed as 0.035 mg/L, corresponding to the detection limit of the 2009 dataset).
(a) P-value from the nonparametric Kolmogorov–Smirnov test for normality.
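
The summary statistics of Tables 2 and 3 (N, mean, SD, minimum, median, maximum, CV) and a one-sample K–S statistic can be computed as in the following sketch. It is illustrative only and rests on assumptions: the CV uses the sample (n − 1) standard deviation, the K–S reference normal takes the sample mean and SD, and the P-value uses the asymptotic Kolmogorov distribution with Stephens' small-sample factor. Because the normal parameters are estimated from the same data, that P-value is conservative; a Lilliefors-type correction would be needed for a formal normality test. Function names are hypothetical.

```python
import math
import statistics
from statistics import NormalDist


def describe(values):
    """Summary columns of Tables 2 and 3; CV uses the sample (n - 1) SD."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return {"N": len(values), "Mean": mean, "SD": sd, "Min": min(values),
            "Median": statistics.median(values), "Max": max(values),
            "CV (%)": 100.0 * sd / mean}


def ks_statistic_normal(values):
    """K-S statistic D against a normal with the sample mean and SD."""
    x = sorted(values)
    n = len(x)
    ref = NormalDist(statistics.mean(x), statistics.stdev(x))
    d = 0.0
    for i, xi in enumerate(x, start=1):
        f = ref.cdf(xi)
        # compare the model CDF with the empirical CDF just after and just before xi
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def ks_asymptotic_pvalue(d, n):
    """Kolmogorov tail probability Q(lambda) with Stephens' small-sample factor.

    Strictly valid only for a fully specified reference distribution; with
    parameters estimated from the data it is conservative (too large).
    """
    root_n = math.sqrt(n)
    lam = (root_n + 0.12 + 0.11 / root_n) * d
    if lam < 0.2:
        return 1.0  # Q(lambda) differs from 1 by less than 1e-12 here
    total = sum(2 * (-1) ** (j - 1) * math.exp(-2 * j * j * lam * lam)
                for j in range(1, 101))
    return min(1.0, max(0.0, total))


if __name__ == "__main__":
    # Hypothetical nitrate values (mg/L), not study data.
    sample = [0.5, 1.2, 3.4, 8.7, 9.9, 12.2, 15.0, 22.5, 48.0, 132.0]
    summary = describe(sample)
    d = ks_statistic_normal(sample)
    print(summary["CV (%)"], d, ks_asymptotic_pvalue(d, len(sample)))
```

Applying the same functions to log-transformed values instead of raw concentrations tests log-normality rather than normality.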

indicating anthropogenic pollution can be selected as the upper limit (or maximum) of the "background (natural)" subpopulation, whose concentrations define the NBL. However, graphical inspection of the cumulative probability plot may lead to an unreasonable threshold value because the result depends on "the eyes of the investigator" (Edmunds et al., 2003; Reimann et al., 2005). As an alternative to the graphical method, a classical model-based (objective) method (i.e., the iterative 2-sigma technique) has been adopted to cluster the background population based on a theoretical density function (Matschullat et al., 2000; Nakić et al., 2007). With this method, natural background samples can be identified by fitting the frequency distribution of a dataset to a normal (or log-normal) distribution after iteratively removing outliers (anomalous samples) (i.e., a normality test). However, the effectiveness of this classical model-based method also depends largely on the sample size of the dataset and is achieved only when the number of removed outliers is small relative to the total sample size (Lindley, 1957; Qian and Lyons, 2006). Especially for nitrate datasets from regions such as South Korea, this classical statistical method can be inadequate for satisfactorily estimating the NBL (and threshold) of nitrate because (1) the number of outliers can be very large relative to the sample size because of ubiquitous and pervasive nitrate pollution, and (2) the datasets may follow a bimodal (or polymodal) distribution because two or more processes controlling water quality coexist.

Therefore, the nitrate datasets from the NGMN in this study are examined by a model-based approach using a finite mixture model, which is well suited to directly estimating the parameters of the component distributions in a dataset without hypothetical normality tests (Biernacki et al., 2006; Casella et al., 2002). Because nitrate in groundwater on a regional scale originates from both natural and anthropogenic processes, the distribution of nitrate data can theoretically be bimodal and can thus be divided into two components based on a finite normal (Gaussian) mixture model (Molinari et al., 2012; Rodríguez et al., 2006; Wendland et al., 2008). This approach is consistent with the basic contamination model, which has been widely used to describe a mixture as a combination of statistical distributions (Aitkin and Wilson, 1980; Joo et al., 2007, 2009; Titterington, 1993). If the random variable x follows a finite mixture distribution with two normal densities for the natural (background) and anthropogenic components, the probability density function p(x) of the model can be defined as

$$p(x) = (1 - \pi)\, f_1(x) + \pi\, f_2(x),$$

where $f_1(x)$ and $f_2(x)$ are normal densities, $f_i(x) = \phi(x \mid \mu_i, \sigma_i^2)$ with mean $\mu_i$ and variance $\sigma_i^2$, and the parameter $\pi$ denotes the mixing ratio of the anthropogenic component. Among the several approaches to estimating the parameters of a finite mixture distribution (McLachlan and Peel, 2000), we use the expectation–maximization (EM) algorithm, which converges to a maximum likelihood estimate of the mixture parameters. The estimates can be readily obtained with the Mixture Modeling (MIXMOD) program (Biernacki et al., 2006). Additionally, nonparametric tests, such as the Mann–Whitney (M–W) and Kolmogorov–Smirnov (K–S) tests, are used to compare the different independent datasets used in this study and to test the goodness-of-fit of the empirical data to the assumed probability models (i.e., normal and log-normal distributions).

3. Results and discussion

3.1. Nitrate levels of South Korean groundwater

Two types of NGMN datasets (i.e., the 1997–2009 and 2009 datasets) are examined to assess nitrate levels in both alluvial and bedrock groundwater on a regional (national) scale. Table 2 summarizes the descriptive statistics for the two datasets. The mean and median nitrate concentrations of the 1997–2009 dataset (n = 8032) are 17.5 mg/L and 9.7 mg/L, respectively. By aquifer type, the mean and median values are 19.2 mg/L and 11.1 mg/L in alluvial groundwater (n = 2788) and 16.6 mg/L and 8.9 mg/L in bedrock groundwater (n = 5244). The maximum nitrate concentration is very high (797.4 mg/L). Our NGMN data (n = 478) collected in the 2009 hydrochemical survey have mean and median values of 17.1 mg/L and 12.2 mg/L in alluvial groundwater and 15.9 mg/L and 8.7 mg/L in bedrock groundwater (Table 2), in agreement with the results from the NGMN 1997–2009 dataset. Compared with a number of large datasets from other countries, the nitrate levels of South Korean groundwater seem to be generally high. The median value of nitrate analyses (n = 6000) in 22 counties of the State of Iowa in the US was 5.8 mg/L (Halberg and Keedny, 1993). A large dataset from the National Water Quality Assessment (NAWQA) Program of the United States Geological Survey (USGS) showed median values of 0.9 mg/L in the low-risk group (n = 779) and 20.6 mg/L in the high-risk group (n = 1032) (Nolan et al., 1997).

The statistical distributions of the two NGMN datasets are examined using the coefficient of variation (CV) and a nonparametric normality test (i.e., the Kolmogorov–Smirnov test). The CVs range from 127.2% to 167.0%, and the P-values of the K–S test of the null hypothesis that the distribution is normal are <0.01 (Table 2), indicating that the NGMN nitrate data have a substantially skewed distribution with a heavy right tail (i.e., non-normality). The majority of environmental datasets, including groundwater constituents, do not follow a normal distribution (Reimann and Filzmoser, 2000). Panno et al. (2006) also suggested that the non-normality of large groundwater nitrate datasets results from regional influences on aquifers by widespread nonpoint sources. Because the statistical distributions of the NGMN nitrate datasets are closer to log-normal, log-transformed nitrate data are used in the following analyses.

The box plots in Fig. 2 show the historical (time-series) variation of nitrate levels in NGMN groundwater over the 12 years from 1997 to 2009. No distinct temporal change in nitrate concentrations appears in either alluvial or bedrock groundwater, suggesting that regional nitrate contamination has prevailed steadily over South Korea at least since 1997. Although no nitrate data before 1997 are available, nitrate pollution of groundwater in South Korea has likely prevailed for approximately four decades, since substantial production and consumption of chemical nitrogen fertilizers (>150 kg/ha per year) began in the 1970s in South Korea (MOAF,

Fig. 2. Historical changes in nitrate levels in the NGMN datasets collected in 24 sampling campaigns over 12 years (from 1997 to 2009). The lines above the box plots indicate the percentage of samples exceeding the Korean drinking water standard (DWS) for nitrate (44.3 mg/L).

2001). Fig. 2 also shows an increase in the number of samples exceeding the Korean drinking water standard (44.3 mg/L), which is especially pronounced for bedrock groundwater. This trend may result from the increased number of monitoring stations in recent years. Nevertheless, it should be noted that the NGMN datasets do not show any decreasing trend in nitrate concentrations in either bedrock or alluvial groundwater.

The nitrate levels of the four supplementary datasets are listed in Table 3 and shown in Fig. 3. In decreasing order, the median values are: alluvial groundwater of Case 2 (high-risk area) (76.2 mg/L), alluvial groundwater of Case 3 (Han River basin) (21.2 mg/L), bedrock groundwater of Case 4 (Geum River basin) (18.2 mg/L), alluvial groundwater of Case 4 (15.3 mg/L), bedrock groundwater of Case 3 (13.3 mg/L), and bedrock groundwater of Case 1 (bottled water from pristine areas) (3.0 mg/L). As expected, the Case 1 dataset shows the lowest nitrate concentrations, whereas the Case 2 dataset shows the highest. The other two supplementary datasets (Case 3 and Case 4), obtained from two large river basins, show nitrate levels similar to those of the NGMN 2009 hydrochemical dataset.

Statistical differences between neighboring datasets are tested with the Mann–Whitney U test, and the results are shown as P-values above the box plots (Fig. 3). The Case 1 and Case 2 datasets are significantly different (P < 0.01) from the other datasets, whereas the Case 3 and Case 4 datasets (i.e., data from regional surveys) are not statistically different from each other (P > 0.05), except for the bedrock groundwater of the two cases (Fig. 3B). The nitrate levels of the NGMN dataset tend to be higher for alluvial groundwater than for bedrock groundwater (Fig. 3A), which suggests that shallow aquifers are more vulnerable to nitrate pollution than deep aquifers (Böhlke, 2002; Spalding and Exner, 1993). In Fig. 3, the interquartile ranges (IQRs) of Case 3 and Case 4 fall between those of the other two datasets (Case 1 and Case 2). In addition, the nitrate data of Case 3 and Case 4 are non-normal, as are the NGMN datasets: their CVs are high (88.6–143.8%) and the P-values of the K–S test are <0.01 (Table 3). In contrast, the null hypothesis of normality cannot be rejected for the Case 1 and Case 2 datasets (i.e., the P-values of the K–S test are >0.01), consistent with normal distributions. These contrasting results for the two types of datasets (i.e., Cases 1 and 2 versus Cases 3 and 4) suggest that groundwater nitrate data tend to be normally distributed when the surveyed groundwater system is dominantly influenced by a single source process (i.e., the natural process in Case 1 or anthropogenic pollution in Case 2), whereas regional datasets, such as the NGMN, Case 3, and Case 4 datasets, are non-normal because of intermixed source processes. Therefore, NBLs cannot be estimated simply from the nitrate distributions of the NGMN datasets, and in-depth statistical and graphical approaches are required.

3.2. Natural background levels of nitrate

To estimate the NBLs (and thresholds) of nitrate in South Korean groundwater, several statistical approaches using different datasets are tested. As described above, log-transformation of nitrate data and

Table 3
Descriptive statistics of nitrate levels (mg/L) in four cases of supplementary datasets of groundwater in South Korea.

Dataset | N | Mean | SD | Min. | Median | Max. | CV (%) | P-value (a)
Case 1 (pristine wells) | 33 | 4.2 | 2.8 | 0.1 | 3.0 | 10 | 66.5 | >0.01
Case 2 (high-risk area) | 65 | 85.5 | 62.2 | 1.6 | 76.2 | 252 | 72.7 | >0.01
Case 3 (bedrock GW; Han River basin) | 191 | 18.8 | 22.6 | 0.01 | 13.3 | 112.7 | 120.3 | <0.01
Case 3 (alluvial GW; Han River basin) | 291 | 45.3 | 60.6 | 0.04 | 21.1 | 513.2 | 133.6 | <0.01
Case 4 (bedrock GW; Geum River basin) | 304 | 26.8 | 23.7 | 0.1 | 18.2 | 122.8 | 88.6 | <0.01
Case 4 (alluvial GW; Geum River basin) | 190 | 42.7 | 61.5 | 0.1 | 15.3 | 295.9 | 143.8 | <0.01

(a) P-value from the nonparametric Kolmogorov–Smirnov test for normality.
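
The two-component mixture model of Section 2.3, p(x) = (1 − π) f1(x) + π f2(x), can be fitted with the EM algorithm. The study used the MIXMOD program; the plain-Python sketch below only illustrates the same EM idea, under assumptions that are ours: concentrations are log10-transformed, component 1 (natural background) is seeded on the lower half of the sorted data and component 2 (anthropogenic) on the upper half, and a small variance floor prevents degenerate components. The helper `percentile_interval` back-transforms the 10th and 90th percentiles of a fitted component, matching the definition of the NBL and APL intervals in Section 3.2, so NBL90 and APL10 are the candidate thresholds. Function names and the synthetic data are hypothetical.

```python
import math
import random
import statistics
from statistics import NormalDist


def fit_two_normal_mixture(x, tol=1e-8, max_iter=2000, var_floor=1e-6):
    """EM for p(x) = (1 - pi) f1(x) + pi f2(x) with normal f1 and f2.

    Returns (pi, (mu1, sd1), (mu2, sd2)); pi is the weight of component 2.
    """
    xs = sorted(x)
    n = len(xs)
    half = n // 2
    # start: component 1 on the lower half, component 2 on the upper half
    mu1, mu2 = statistics.mean(xs[:half]), statistics.mean(xs[half:])
    var1 = var2 = statistics.pvariance(xs)
    pi = 0.5
    prev_ll = -math.inf
    for _ in range(max_iter):
        f1 = NormalDist(mu1, math.sqrt(var1))
        f2 = NormalDist(mu2, math.sqrt(var2))
        # E-step: responsibility of component 2 for each observation
        resp, ll = [], 0.0
        for xi in xs:
            d1 = (1 - pi) * f1.pdf(xi)
            d2 = pi * f2.pdf(xi)
            total = max(d1 + d2, 1e-300)  # guard against underflow
            resp.append(d2 / total)
            ll += math.log(total)
        # M-step: weighted updates of the mixing ratio, means, and variances
        w2 = sum(resp)
        w1 = n - w2
        pi = w2 / n
        mu1 = sum((1 - r) * xi for r, xi in zip(resp, xs)) / w1
        mu2 = sum(r * xi for r, xi in zip(resp, xs)) / w2
        var1 = max(var_floor, sum((1 - r) * (xi - mu1) ** 2 for r, xi in zip(resp, xs)) / w1)
        var2 = max(var_floor, sum(r * (xi - mu2) ** 2 for r, xi in zip(resp, xs)) / w2)
        if ll - prev_ll < tol:
            break  # log-likelihood has stopped increasing
        prev_ll = ll
    return pi, (mu1, math.sqrt(var1)), (mu2, math.sqrt(var2))


def percentile_interval(mu, sd, lower=0.10, upper=0.90):
    """10th-90th percentiles of a log10-scale component, back-transformed to mg/L."""
    comp = NormalDist(mu, sd)
    return 10 ** comp.inv_cdf(lower), 10 ** comp.inv_cdf(upper)


if __name__ == "__main__":
    # Synthetic log10 nitrate values: 30% background, 70% polluted (not study data).
    rng = random.Random(1)
    logs = [rng.gauss(0.0, 0.3) for _ in range(300)] + \
           [rng.gauss(1.3, 0.3) for _ in range(700)]
    pi, (m1, s1), (m2, s2) = fit_two_normal_mixture(logs)
    nbl = percentile_interval(m1, s1)  # NBL10-NBL90
    apl = percentile_interval(m2, s2)  # APL10-APL90
    print(f"pi={pi:.2f}  NBL {nbl[0]:.2f}-{nbl[1]:.2f}  APL {apl[0]:.1f}-{apl[1]:.1f} mg/L")
```

With real data, the fitted components should be checked against the histograms and cumulative probability plots, because EM converges to a local maximum that can depend on the starting values.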

Fig. 3. Box plots comparing the nitrate levels of the different datasets; outliers are defined as samples outside the upper and lower whiskers, and P-values (<0.05) indicate a significant difference between neighboring datasets in the Mann–Whitney U test.
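
The outlier rule and the pairwise comparison behind Fig. 3 can be sketched as follows. Both functions are illustrative and rest on stated assumptions: outliers are values beyond Tukey's fences, Q1 − 1.5 IQR and Q3 + 1.5 IQR, on the log10 scale, with quartiles from Python's `statistics.quantiles(..., method="inclusive")`, whose interpolation may differ slightly from a given plotting package; and the Mann–Whitney U test uses the large-sample normal approximation with a tie correction and no continuity correction, which is generally reasonable for groups of a few dozen samples or more but not for very small groups. Function names are hypothetical.

```python
import math
from statistics import NormalDist, quantiles


def exclude_boxplot_outliers(concentrations, k=1.5):
    """Split log10 concentrations into values inside and outside Tukey's fences."""
    logs = [math.log10(c) for c in concentrations if c > 0]
    q1, _, q3 = quantiles(logs, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    kept = [v for v in logs if low <= v <= high]
    outliers = [v for v in logs if not low <= v <= high]
    return kept, outliers


def mann_whitney_u(a, b):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected).

    Returns (U for sample a, two-sided P-value).
    """
    n1, n2 = len(a), len(b)
    pooled = sorted([(v, 0) for v in a] + [(v, 1) for v in b])
    n = n1 + n2
    ranks = [0.0] * n
    tie_sum = 0  # sum of (t^3 - t) over groups of tied values
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for m in range(i, j + 1):
            ranks[m] = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        t = j - i + 1
        tie_sum += t ** 3 - t
        i = j + 1
    r1 = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_sum / (n * (n - 1)))
    if var_u == 0:
        return u1, 1.0  # every value tied: no evidence of a difference
    z = (u1 - n1 * n2 / 2) / math.sqrt(var_u)
    return u1, 2 * (1 - NormalDist().cdf(abs(z)))
```

In the workflow of Section 3.2, outliers would be removed with the first function before the log-transformed values are passed to a distribution fit, while the second function compares neighboring datasets as in Fig. 3.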

the exclusion of outliers (i.e., samples falling outside the whiskers of a box plot, as in Fig. 3; Reimann et al., 2005) are applied to datasets with a negatively skewed distribution (e.g., the NGMN datasets). The distribution of log-transformed nitrate concentrations, excluding outliers, is then separated into two subpopulations (sample groups) corresponding to the NBL and the anthropogenic polluted level (APL) of nitrate. Each level is defined as an interval from the 10th to the 90th percentile of its distribution. From the limits of the estimated levels, the threshold directly indicating groundwater pollution can be defined as a fixed value (e.g., the upper or outer limit of the NBL).

3.2.1. An empirical approach using the hypothetical mixture model between datasets of Cases 1 and 2

The Case 1 and Case 2 datasets are consistent with log-normality, as indicated by P-values >0.01 (Table 3), because each is influenced by a single source process (i.e., natural for Case 1 and anthropogenic for Case 2). The histograms of the log-transformed datasets, excluding outliers, show that the nitrate distributions follow normal probability density functions (PDFs) with distinct means and variances (Fig. 4A and B). These two independent populations may empirically represent the NBL and APL groups of nitrate in South Korea. Therefore, each level (from the 10th to the 90th percentile) can be set as 1.6 to 8.0 mg/L for the empirical NBL and 43.3 to 177.6 mg/L for the empirical APL. The upper limit of the empirical NBL (i.e., NBL90 = 8.0 mg/L) can be regarded as a threshold indicating anthropogenic pollution (Fig. 4A and B).

Furthermore, a hypothetical mixture model with two components (natural and anthropogenic) can be established by combining the two datasets (Case 1 and Case 2). Fig. 4C shows the mixture distribution as a cumulative probability plot on which the two normal components (blue and red straight lines) are superimposed. The mixture distribution is curved between NBL90 (8.0 mg/L) and APL10 (43.3 mg/L), suggesting bimodality of the mixed distribution on a log scale. Moreover, the plot shows two inflection points at the upper and lower limits of the empirical normal components (i.e., NBL90 and APL10); here, the first point (NBL90 = 8.0 mg/L) corresponds to the threshold estimated by the classical graphical approach (i.e., detecting the first inflection point on the cumulative probability plot).

3.2.2. The use of a finite normal (Gaussian) mixture model on the NGMN datasets

The hypothetical mixture model of the two supplementary datasets (Case 1 and Case 2) also provides prior knowledge about the statistical distribution of nitrate concentration data on a regional (national) scale: the NGMN datasets are expected to have a bimodal distribution with two normal components (natural and anthropogenic) on a log scale. Thus, a finite normal (Gaussian) mixture model with two components was fitted to the observed distributions of nitrate concentrations in the NGMN data. This allows the datasets to be separated into two subpopulations, reflecting the NBL and APL, by directly estimating model parameters such as the mixing ratio and the mean, variance, and interval of each component.

The histograms in Fig. 5 present the empirical distribution of log-transformed nitrate data, excluding outliers, in alluvial and bedrock groundwater of the NGMN 2009 dataset. As expected, the distribution is strongly negatively skewed, with a long tail toward low nitrate concentrations (i.e., polluted samples predominate in the datasets), and is neither normal nor log-normal (i.e., it is non-normal on a log scale). The nitrate data show a bimodal distribution (i.e., a mixture of natural and anthropogenic components). Therefore, the empirical bimodal distribution was fitted by the finite normal mixture model with two components (Fig. 5). Table 4 lists the estimated parameters for the two normal subpopulations and the confidence intervals indicating the NBL and APL in each groundwater unit. The estimated NBLs of nitrate range from 0.1 to 1.0 mg/L for bedrock groundwater and from 0.7 to 1.1 mg/L for alluvial groundwater. In addition, the upper limits of the NBL (i.e., NBL90 = 1.0 and 1.1 mg/L) may represent the thresholds indicating anthropogenic influences on the respective groundwater units. However, these estimated values (NBL90) are much lower than the threshold (8 mg/L) estimated from the empirical NBL (Case 1). Therefore, the reactivity of nitrate should be examined, because some geochemical processes (e.g., denitrification) can attenuate nitrate
sponds to the empirically selected threshold. Therefore, we now suggest concentrations, as have been reported for many aquifers in South Korea
that the relevant NBL (and threshold) in a mixed dataset can be (e.g., Choi et al., 2011, 2012; Kim et al., 2009). If the NGMN dataset is
affected by denitrification, the estimated NBLs and thresholds (NBL90) may be too low to directly indicate anthropogenic pollution.

Fig. 4. Statistical distributions (histograms) of nitrate levels for two supplementary datasets with normality on a log-scale: (A) Case 1 (pristine wells) and (B) Case 2 (high-risk area), and (C) the cumulative probability plot of the two combined datasets, showing a hypothetical mixture distribution affected by natural and anthropogenic impacts simultaneously.

Fig. 5. Statistical distributions (histograms) of nitrate levels for (A) bedrock and (B) alluvial groundwater units in the NGMN datasets, each fitted by a normal mixture model with two components and thus separated into two subpopulations reflecting the estimated NBL and APL, respectively.

3.3. Thresholds (as reference values) of nitrate

By inspecting the relation between nitrate and chloride in the NGMN 2009 dataset, we tried to indirectly assess the effect of denitrification on nitrate levels. Chloride generally accompanies nitrate in groundwater affected by diffuse pollution but, unlike nitrate, behaves conservatively. Fig. 6 shows the relations between nitrate concentrations and NO3/Cl mass ratios in Korean NGMN groundwater. The NO3/Cl ratios are very low in groundwater with nitrate concentrations below the proposed NBL90 (1.0 and 1.1 mg/L), whereas the ratios become highly variable with increasing nitrate levels. The NO3/Cl ratios of the low-nitrate groundwater are much lower than the average ratio (1.0) of rainwater in South Korea (Kim, 2007). They are also lower than the average ratio (1.3) of the pristine groundwater (Case 1), which has very low nitrate levels (Fig. 6). Therefore, the subpopulation with very low nitrate concentrations (below the estimated NBLs in Fig. 5) may represent groundwater influenced by natural attenuation (i.e., denitrification).

Table 4
The results of the model-based approach using a finite normal two-component mixture model.

NGMN 2009     Component   Parameter                     Credible interval (mg/L)
                          Mixing ratio   Mean    SD     P10    P50 (center)   P90
Bedrock GW    NBL         0.11           0.5     2.8    0.1    0.3            1.0
              APL         0.89           11.9    2.7    3.0    11.6           43.1
Alluvial GW   NBL         0.09           1.1     1.5    0.7    1.0            2.0
              APL         0.91           15.6    2.2    5.5    16.6           37.1

Shown are the estimated parameters of the two subpopulations: groups reflecting the natural background level (NBL) and the anthropogenic polluted level (APL) in the NGMN 2009 datasets. The APL P10 values (3.0 and 5.5 mg/L) are selected as thresholds.

On the other hand, the polluted groundwater observed in high-risk areas (Case 2) has high NO3/Cl ratios (average 2.6), and the NGMN groundwater samples with high nitrate concentrations show widely varying ratios (Fig. 6). This wide variation is mainly caused by diverse diffuse sources with different compositions, although varying degrees of denitrification can also occur in some polluted groundwater. Nevertheless, natural (or less polluted) groundwater is likely to have NO3/Cl ratios below or near 1.0 (i.e., the ratios of rainwater and Case 1 groundwater), regardless of the natural attenuation of anthropogenic nitrate. In Fig. 6, NGMN groundwater with NO3/Cl ratios below or near 1.0 has nitrate levels below APL10 (3.0 and 5.5 mg/L). This observation suggests that the estimated subpopulation reflecting the APLs contains no or very few samples originating from natural sources; i.e., the upper limits of the true NBLs approach the lower limits of the APLs in the NGMN datasets. Thus, we consider that acceptable thresholds indicating nitrate pollution fall between the upper limit of the NBL (i.e., > NBL90) and the lower limit of the APL (i.e., < APL10) in Fig. 5. Moreover, the APL subpopulation dominates the nitrate distribution of the NGMN datasets (i.e., the number of polluted samples is much
higher than that of the natural samples). Thus, the APL can be regarded as the normal (dominant) population, whereas the NBL distribution can be regarded as anomalous. This relation causes the lower tail of the APL to cover the upper limit of the NBL (see Fig. 5), and therefore the NBLs estimated from the mixture model may not be statistically significant. This pattern is also shown in the cumulative probability plot, as follows.

As in the hypothetical mixture (Fig. 4C), inflection points are observed on the cumulative probability plots of the NGMN 2009 datasets (Fig. 7). Blue and red straight lines in the figure indicate the fitted normal components reflecting the estimated NBL and APL, respectively. Unlike in the hypothetical mixture, the inflection points are difficult to determine precisely, because the shape of the distribution on a cumulative probability plot can be affected by geometric features of the subpopulations composing the regional dataset, such as their sizes (or mixing ratios), means, and variances (Qian and Lyons, 2006). The inflection points are clearly shown when the means and sizes of the two subpopulations are far apart, so that the mixture resembles the hypothetical mixture (see Fig. 5). In the NGMN 2009 datasets, a single inflection point, revealed by the intersection of the normal lines superimposed on each plot, corresponds to nitrate concentrations of 2.6 mg/L for bedrock groundwater and 5.3 mg/L for alluvial groundwater (Fig. 7). These inflection points are approximately the same as the lower limits of the APL (i.e., APL10), because the upper limits of the NBL (i.e., NBL90) overlap significantly with the distribution of the APL. Thus, our model-based approach, including graphical methods, indicates that the assumed probability density function robustly fits the polluted subpopulation.

Fig. 6. The NO3/Cl mass ratios versus the nitrate levels in the NGMN datasets; samples with significantly low ratios (< 1.0, the average ratio of Korean rainwater) are mainly observed at nitrate levels below NBL90, which indicates that natural attenuation processes may control the estimated NBL components in the datasets.

Fig. 7. Cumulative probability plots of the NGMN datasets on which the two normal subpopulations (the estimated NBL and APL) from each mixture model are superimposed. In each plot, a single inflection point is observed at the intersection of the partitioned components (lines on a log-scale).

This study suggests that the lower limits of the polluted subpopulation (APL10 = 3.0 mg/L for bedrock groundwater and 5.5 mg/L for alluvial groundwater; Figs. 5 and 7) in the NGMN 2009 datasets are the most appropriate nitrate thresholds, for the following reasons: (1) the estimated NBLs are largely affected by natural attenuation (i.e., denitrification), and (2) the sample size of the APL is much larger than that of the natural (background) subpopulation, because of ubiquitous and pervasive nitrate pollution.

Furthermore, our statistical investigation can provide a heuristic procedure to statistically determine the NBL (and threshold) of any widespread pollutant, including nitrate, from a large dataset. The method consists of four steps:

(1) Log-transformation followed by exclusion of outlying samples to achieve a symmetrical distribution.
(2) If the distribution is bimodal, the empirical dataset can be fitted using a finite normal mixture model with two components; this approach can help estimate the two subpopulations (NBL and APL) with their basic statistical parameters (i.e., mixing ratios, means, and variances).
(3) Between the upper and lower limits of the two subpopulations, we can select a threshold (as a reference value indicating pollution), considering geochemical processes.
(4) Assuming that the NBL subpopulation can be considered anomalous in the regional dataset, we can graphically inspect the inflection points on a cumulative probability plot by assuming a normal distribution of the anthropogenic polluted subpopulation on a log-scale. In this case, the estimated inflection point can indicate the lower limit of the APL subpopulation.

Using the thresholds (3.0 mg/L for bedrock groundwater and 5.5 mg/L for alluvial groundwater), we estimated the pollution status of South Korean groundwater in two major river basins (i.e., regional datasets of Cases 3 and 4) (Fig. 8). In Fig. 8, a P-value > 0.05 from the K–S test indicates that the polluted subpopulation is normal, and thus that the estimated threshold can be valid. The percentages of groundwater samples with nitrate concentrations exceeding the thresholds are very high: 73.3% and 84.2%, respectively, of
bedrock groundwater and alluvial groundwater in the Han River Basin, and 91.8% of bedrock groundwater in the Geum River Basin. These results indicate a very widespread influence of diffuse sources on South Korean groundwater, even on bedrock groundwater. Therefore, effective groundwater quality management plans are urgently needed in South Korea.

Fig. 8. Nitrate distribution of the supplementary datasets from a regional hydrochemical survey, in which groundwater pollution is assessed based on the thresholds (3.0 and 5.5 mg/L in bedrock and alluvial groundwater, respectively) selected from the NGMN datasets. In each plot, P-values (> 0.05) from a K–S test indicate that the estimated polluted population of each dataset is normal on a log-scale.

4. Summary and conclusions

In this study, we assessed the nitrate levels in groundwater from the National Groundwater Monitoring Network of South Korea. The median nitrate concentrations in 2009 are 12.2 and 8.7 mg/L in alluvial groundwater and bedrock groundwater, respectively, which are higher than those reported from a number of regional surveys in other countries. The available historical data from the NGMN show that high mean nitrate levels persisted from 1997 to 2009, suggesting widespread nitrate pollution of the regional aquifer systems of Korea at least since 1997.

The statistical distribution of the regional nitrate datasets is substantially skewed, with a heavy tail to the right (i.e., non-normal); moreover, it is not normal on a log-scale either. Rather, the distribution is bimodal, with two normal subpopulations. Thus, the NGMN 2009 datasets could be fitted with a finite normal (Gaussian) mixture model with two components. The model fitting resulted in a good separation of the datasets into two subpopulations, reflecting the NBL and APL, respectively.

However, the estimated NBLs in the NGMN datasets are lower than the expected level because natural attenuation processes, such as deni-
population. Therefore, the estimated APLs should be more robust than the NBLs obtained from model-based approaches, including the cumulative probability plot. This study suggests that the lower limits of the polluted subpopulation (i.e., APL10) are more appropriate as thresholds in the NGMN datasets for two reasons: (1) the estimated NBLs are largely affected by denitrification, and (2) APL samples are far more abundant than background samples because of widespread nitrate pollution. Our results provide regional (national) nitrate thresholds of 3.0 and 5.5 mg/L for bedrock groundwater and alluvial groundwater, respectively.

This study provides useful nitrate thresholds as a practical guide for differentiating anthropogenic pollution on a regional (national) scale, which can be used for groundwater quality management. Furthermore, this study shows the usefulness of a heuristic procedure for determining the NBL and threshold of widespread pollutants, such as nitrate in groundwater systems. The statistical approach in this study can be successfully applied to groundwater datasets in which background samples make up a smaller portion than polluted samples.

Acknowledgments

This work was supported by the 2013 project (Survey of Groundwater Contamination and Backgrounds in Livestock Farming Areas, Korea) funded by the National Institute of Environmental Research and by the National Research Foundation of Korea Grant (2013, University-Institute Cooperation Program) funded by the Ministry of Science, ICT & Future Planning.
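The heuristic procedure summarized in Section 3.3 can be sketched in Python for readers who want to try it on their own data. This is a minimal illustration under stated assumptions, not the authors' implementation. The paper does not state its outlier rule or fitting software, so the following are hypothetical choices: the k × IQR (Tukey) fence with k = 3 on log10 values, the expectation-maximization (EM) fit started near the 10th and 90th percentiles, and the helper names `log10_without_outliers`, `fit_two_normals`, `credible_interval`, and `exceedance_percent`. The sketch covers steps (1) to (3), taking APL10 as the threshold as selected in this study, plus the exceedance percentages reported for Fig. 8. It does not reproduce the graphical inflection-point check of step (4) or the K–S test, and the demo uses synthetic numbers, not NGMN data.

```python
import math
import random
from statistics import NormalDist, quantiles


def log10_without_outliers(conc_mg_l, k=3.0):
    """Step 1 (sketch): log10-transform concentrations in mg/L and drop outliers.

    Assumption (hypothetical): outliers lie beyond k * IQR from the quartiles
    (Tukey fences), and non-positive values are discarded.
    """
    logs = [math.log10(c) for c in conc_mg_l if c > 0]
    q1, _, q3 = quantiles(logs, n=4)
    lo, hi = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [v for v in logs if lo <= v <= hi]


def fit_two_normals(x, n_iter=1000, tol=1e-9):
    """Step 2 (sketch): fit a two-component normal mixture by expectation-maximization.

    Returns (weights, means, sds) on the log10 scale, lower-mean component first.
    """
    xs = sorted(x)
    n = len(xs)
    grand_mean = sum(xs) / n
    sd_all = math.sqrt(sum((v - grand_mean) ** 2 for v in xs) / n)
    mu = [xs[n // 10], xs[(9 * n) // 10]]  # start near the 10th and 90th percentiles
    sd = [sd_all, sd_all]
    w = [0.5, 0.5]
    prev_ll = -math.inf
    for _ in range(n_iter):
        dists = [NormalDist(mu[k], sd[k]) for k in range(2)]
        # E-step: probability that each sample belongs to each component.
        resp, ll = [], 0.0
        for v in xs:
            p = [w[k] * dists[k].pdf(v) for k in range(2)]
            total = sum(p)
            ll += math.log(total)
            resp.append([pk / total for pk in p])
        # M-step: update mixing ratios, means and standard deviations.
        for k in range(2):
            rk = [r[k] for r in resp]
            nk = sum(rk)
            w[k] = nk / n
            mu[k] = sum(r * v for r, v in zip(rk, xs)) / nk
            var = sum(r * (v - mu[k]) ** 2 for r, v in zip(rk, xs)) / nk
            sd[k] = max(math.sqrt(var), 1e-6)
        if ll - prev_ll < tol:  # stop when the log-likelihood stops improving
            break
        prev_ll = ll
    order = sorted(range(2), key=lambda k: mu[k])
    return tuple([vals[k] for k in order] for vals in (w, mu, sd))


def credible_interval(mu, sd, probs=(0.10, 0.50, 0.90)):
    """P10, P50 and P90 of one log10-normal component, back-transformed to mg/L."""
    return [10 ** NormalDist(mu, sd).inv_cdf(p) for p in probs]


def exceedance_percent(conc_mg_l, threshold):
    """Percentage of samples whose concentration exceeds the threshold."""
    return 100.0 * sum(c > threshold for c in conc_mg_l) / len(conc_mg_l)


if __name__ == "__main__":
    random.seed(1)
    # Synthetic data only (not NGMN data): about 10% background, 90% polluted samples.
    conc = [10 ** random.gauss(-0.3, 0.3) for _ in range(100)]
    conc += [10 ** random.gauss(1.1, 0.3) for _ in range(900)]
    w, mu, sd = fit_two_normals(log10_without_outliers(conc))
    nbl = credible_interval(mu[0], sd[0])
    apl = credible_interval(mu[1], sd[1])
    threshold = apl[0]  # Step 3: APL10, the choice made in this study
    print(f"mixing ratios NBL/APL: {w[0]:.2f}/{w[1]:.2f}")
    print(f"NBL P10-P90: {nbl[0]:.1f}-{nbl[2]:.1f} mg/L; APL P10-P90: {apl[0]:.1f}-{apl[2]:.1f} mg/L")
    print(f"threshold (APL10): {threshold:.1f} mg/L; exceedance: {exceedance_percent(conc, threshold):.1f}%")
```

A library fit, for example scikit-learn's `GaussianMixture` with `n_components=2`, could replace the hand-written EM loop. The explicit loop is kept here so that the mixing-ratio, mean, and variance updates named in step (2) stay visible.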
