Groundwater - 2023 - Sun
Dongwei Sun
Department of Earth and Environmental Sciences, University of Waterloo, Waterloo, ON,
N2L3G1, Canada.
d34sun@uwaterloo.ca
Ning Luo
Department of Earth and Environmental Sciences, University of Waterloo, Waterloo, ON,
N2L3G1, Canada.
n2luo@uwaterloo.ca
Aaron Vandenhoff
Department of Earth and Environmental Sciences, University of Waterloo, Waterloo, ON,
N2L3G1, Canada.
aaron.vandenhoff@uwaterloo.ca
Wesley McCall
Geoprobe Systems Inc., 1835 Wall St., Salina, KS 67401, USA.
McCallw@geoprobe.com
Zhanfeng Zhao
Key Laboratory of Water Cycle and Related Land Surface Processes, Institute of Geographic
Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing, China.
zhaozhanfeng@igsnrr.ac.cn
Chenxi Wang
Department of Earth and Environmental Sciences, University of Waterloo, Waterloo, ON,
N2L3G1, Canada.
c592wang@uwaterloo.ca
David L. Rudolph
This article has been accepted for publication and undergone full peer review but has not been
through the copyediting, typesetting, pagination and proofreading process which may lead to
differences between this version and the Version of Record. Please cite this article as doi:
10.1111/gwat.13348
This article is protected by copyright. All rights reserved.
17456584, ja, Downloaded from https://ngwa.onlinelibrary.wiley.com/doi/10.1111/gwat.13348 by Cochrane Oman, Wiley Online Library on [31/08/2023]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
Department of Earth and Environmental Sciences, University of Waterloo, Waterloo, ON,
N2L3G1, Canada.
drudolph@uwaterloo.ca
Walter A. Illman
Corresponding author: Department of Earth and Environmental Sciences, University of Waterloo,
Waterloo, ON, N2L 3G1, Canada. 519-888-4567
willman@uwaterloo.ca
Conflict of Interest: None
Key Words: Hydraulic conductivity, specific storage, connectivity, grain size analysis,
permeameter test, slug test, direct push, hydraulic profiling tool, inverse modeling, hydraulic
Article impact statement: Evaluating K from various approaches showed that inverse modeling
and data fusion are necessary steps in building robust groundwater models.
Abstract
conductivity (K) and specific storage (Ss) to better understand groundwater flow and contaminant
transport processes. Conventional methods including grain size analyses (GSA), permeameter,
slug and pumping tests have been utilized extensively, while Direct Push-based Hydraulic
Profiling Tool (HPT) surveys have been developed to obtain high-resolution K estimates.
parameterized Hydraulic Tomography (HT) have also been advanced to map spatial variations of
K and Ss between and beyond boreholes. While different methods are available, it is unclear
which one yields K estimates that are most useful for high resolution predictions of groundwater
flow. Therefore, the main objective of this study is to evaluate various K estimates at a highly
heterogeneous field site obtained with three categories of characterization techniques including:
(1) conventional methods (GSA, permeameter and slug tests); (2) HPT surveys; and (3) inverse
geology. Then, steady-state and transient groundwater flow models are employed to
quantitatively assess various K estimates by simulating pumping tests not used for parameter
estimation. Results reveal that inverse modeling approaches yield the best drawdown predictions
under both steady and transient conditions. In contrast, conventional methods and HPT surveys
yield biased predictions. Based on our research, it appears that inverse modeling and data fusion
Significant research has been conducted over the last several decades to better understand
groundwater flow and contaminant transport processes. Groundwater flow patterns, contaminant
transport and their subsurface distributions have been found to be primarily governed by the spatial
distribution of hydraulic conductivity (K) and specific storage (Ss), while the accurate delineation
of such parameters is very difficult in complex groundwater flow systems due to high degrees of
geological heterogeneity. Inaccurate hydraulic parameter estimates will lead to poor groundwater
flow and solute transport predictions. In addition, as clearly demonstrated by Rehfeldt et al.
(1992) and Yeh et al. (1995), to accurately forecast the migration of a tracer plume, the number of
K estimates required to adequately capture the heterogeneity significantly increases for a site with
high degrees of geological variability. The large number of K measurements required to capture
techniques.
laboratory permeameter analyses of core samples, slug and pumping tests have been used in water-
supply investigations for several decades. However, most of them are not capable of providing
reliable and sufficient information on local heterogeneity efficiently (Butler, 2005; Alexander et
al., 2011). For example, laboratory analyses of core samples, such as GSA and permeameter tests,
can provide small-scale estimates of K at sampling locations. However, they are usually
time-consuming, are hampered by low sample recovery rates for coarse-grained materials, and
are prone to errors because laboratory experiments on repacked samples may deviate
significantly from in situ conditions (Klute and Dirksen, 1986;
White, 1988). Moreover, information on K variability between boreholes cannot be delineated
Single well response tests or slug tests are usually conducted to provide point-scale K and Ss
estimates of materials representing a small volume surrounding the screened interval. While these
estimates are useful, they may not be representative of large-scale groundwater flow and solute
transport behavior. Also, considerable care must be taken as conditions at and near the well will
have significant impacts on K and Ss estimates (Beckie and Harvey, 2002; Butler, 2019). In
addition, using a solution (e.g., Hvorslev, 1951) that ignores inertial mechanisms can lead to a
significant overestimation of K (Butler et al. 2003). Therefore, appropriate slug test models should
and solute transport behavior at a site, pumping or injection tests with observation wells are
conducted. Various analytical solutions are available that can be used to obtain large-scale
estimates of K and Ss (e.g., Theis, 1935; Cooper and Jacob, 1946). In addition, some solutions yield
important insights on flow geometry and anisotropy in K (e.g., Neuman et al., 1984; Hsieh and
pumping/injection tests are averaged parameters over large volumes that are frequently impacted
by the scale effect (Clauser, 1992; Rovey and Cherkauer, 1995; Butler and Healey, 1998;
Vesselinov et al. 2001; Illman, 2006). Moreover, Wu et al. (2005) demonstrated that the estimated
K and Ss values from type curve and straight-line analyses of pumping tests in heterogeneous
aquifers are highly dependent on pumping and monitoring locations. Therefore, while the
estimates are useful for various applications, it is unclear what these parameters mean and how
useful they are in groundwater models. This is one important reason why novel approaches are
To better capture subsurface heterogeneity, significant efforts have been expended to map
the spatial distribution of K. One such example is the invention of various direct push (DP)
methods over the last three decades. These include DP slug test (DPST), DP permeameter (DPP),
DP injection logging (DPIL), and hydraulic profiling tool (HPT) that have been developed as
profiles of K variability in shallow, unconsolidated aquifers (Hinsby et al., 1992; Butler et al., 2007;
Dietrich et al., 2008; McCall and Christy, 2010; Geoprobe, 2015). Specifically, the HPT can
rapidly obtain high-resolution (~1.5 cm) K profiles based on the ratio of water injection rate and
corrected down-hole water pressure measured in situ (McCall and Christy, 2020).
Most approaches described above can only provide K variations in the immediate vicinity of
a well or DP location, while reliable information away from or between boreholes is difficult to
obtain. As a result, inverse modeling methods (e.g., Poeter and Hill, 1997; Carrera et al., 2005)
hydraulic head fields. Calibration of groundwater models through trial-and-error or with assistance
from nonlinear regression tools such as UCODE (Poeter and Hill, 1998) and PEST (Doherty, 2015)
can produce representative values of K and Ss if the zonation is accurate (Zhao et al., 2016; Tong
et al., 2021) and if many observed heads are available to derive statistically representative values
for each zone (Yeh et al., 2015). However, when the geological models are inaccurate, structural
noise is introduced (Doherty and Welter, 2010) and parameter estimates from inverse models can
be unrealistic with wide confidence intervals (e.g., Zhao et al., 2016; Luo et al., 2017).
More recently, hydraulic tomography (HT) has been developed as a new site characterization
approach to yield high resolution K and Ss estimates from deterministic or geostatistical inverse
modeling of multiple pumping tests. Specifically, HT uses the same equipment as traditional
observation wells from tests conducted at different wells. The drawdown/buildup-time dataset
through a single test and the corresponding interpretation with an appropriate inverse model yields
a snapshot of K and Ss heterogeneity. Repeating these tests at different locations and their
interpretation yields many snapshots of K and Ss heterogeneity through multiple tests. However,
quantitative synthesis of these images into accurate K and Ss values requires advanced inverse
modeling techniques. Over the last two decades, HT has been tested through a number of synthetic
(e.g., Yeh and Liu, 2000; Bohling et al., 2002; Xiang et al., 2009; Zhu and Yeh, 2005; Hu et al.,
2011), laboratory (e.g., Liu et al., 2007; Berg and Illman, 2011a; Zhao et al., 2015, 2022; Luo et
al., 2017; Jiang et al., 2021), and field studies (Bohling et al., 2007; Straface et al., 2007; Illman et
al., 2009; Berg and Illman, 2011b; Huang et al., 2011; Castagna et al., 2011; Brauchler et al., 2013;
Cardiff et al., 2013; Zhao and Illman, 2018, 2022a; Fischer et al. 2018; Zha et al., 2016, 2019;
Tiedeman and Barrash, 2020; Luo et al., 2022; Ning et al., 2023; Zhao et al., 2023). HT data from
pumping or injection tests can be inverted sequentially or simultaneously, while treating the
(Illman et al., 2015). Steady-state hydraulic tomography (SSHT) can provide K estimates, while
transient hydraulic tomography (THT) can provide both K and Ss estimates. When pumping and
monitoring locations are sparse, HT yields smooth K and Ss distributions (Illman et al., 2009; Berg
and Illman, 2011b; Cardiff et al., 2013) that could also benefit from regularization of the inverse
problem (Doherty, 2015). For example, the integration of accurate geological information into HT
has shown that salient inter- and intra-layer heterogeneities of K can be imaged effectively (Zhao
et al., 2016; Luo et al., 2017) for both aquifer and aquitard units (Zhao and Illman, 2018).
Based on diverse data collection and interpretation approaches that have been developed, we
can classify the aforementioned approaches into three categories of site characterization
methodologies. The first category consists of conventional methods including GSA, permeameter,
slug and pumping tests. The second category includes DP approaches with various tools such as
DPST, DPP, DPIL, and HPT. The third category consists of inverse modeling methods with
to obtain K estimates at a given site for groundwater modeling? A significant amount of research
has been conducted to examine the effectiveness of different approaches (Butler, 2005; Chapuis et
al., 2005; Butler et al., 2007; Alexander et al., 2011; Vienken and Dietrich, 2011; Liu et al., 2012;
Brauchler et al., 2013; Rosas et al., 2014; Zhao and Illman, 2018). For example, Vienken and
Dietrich (2011) utilized various empirical formulae to analyze grain size sieve results revealing
that mean K values varied by several orders of magnitude among the formulae. Alexander et al.
(2011) compared several conventional methods including GSA, permeameter, slug and pumping
tests showing that K estimates varied significantly from one method to another. Liu et al. (2012)
assessed multiple DP approaches including DPST, DPP, and DPIL. Zhao and Illman (2018)
evaluated inverse models built with different conceptualizations including effective parameters,
Thus far, only a few studies have compared approaches of different categories. Butler et al.
(2007) assessed the first two categories including GSA and DP methods based on DPST and DPP.
Brauchler et al. (2013) compared the last two categories including DPIL and HT. However, there
is no consensus on which approach yields K estimates that are most representative of a field site
The main objective of this study is to evaluate various K estimates from three categories of
site characterization methods at the well-studied North Campus Research Site (NCRS). The NCRS
is located on the University of Waterloo campus in Waterloo, Ontario, Canada (Alexander et al.,
glaciofluvial deposits. We choose what we believe are the most widely utilized site
methods [Case 1a: GSA; Case 1b: Permeameter Tests; Case 1c: Slug Tests]; (2) HPT surveys with
three different formulae [Case 2a: McCall and Christy (2010); Case 2b: Borden et al. (2021); Case
2c: Zhao and Illman (2022b)]; and (3) various inverse modeling approaches [Case 3a: PEST
Calibrated Geological Model; Case 3b: Averaged THT Geological Model; Case 3c: Highly
Parameterized THT Model]. It is crucial to understand that each approach differs in terms of the
scale and resolution at which heterogeneity is captured as well as the types and quantity of data
that they rely on. Therefore, the performance of each approach is first qualitatively analyzed by
comparing K estimates to site geology. Then, we quantitatively evaluate the K estimates through
the independent prediction of pumping tests or other drawdown-inducing events that have not been
used during model calibrations as advocated by Illman et al. (2007) and Liu et al. (2007).
HydroGeoSphere (HGS) (Aquanty, 2019) for forward simulations of steady-state drawdown data
from seven independent pumping tests that are not used for K estimation by any method evaluated
in this study. Then, transient forward runs are performed for simulations of transient drawdown
data from the same pumping tests. Methods yielding K estimates that result in the smallest
discrepancies between simulated and observed drawdowns are considered the most reliable for the
NCRS.
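The comparison of simulated and observed drawdowns described above can be sketched as follows. This is a minimal illustration only: the function name `drawdown_misfit` and the choice of mean absolute error and mean squared error as discrepancy measures are assumptions made here, not metrics confirmed by this section of the text, and the drawdown values are hypothetical.

```python
def drawdown_misfit(simulated, observed):
    """Summarize the discrepancy between simulated and observed drawdowns.

    Both inputs are sequences of drawdowns (same units, same order).
    Returns the mean absolute error and the mean squared error.
    """
    if len(simulated) != len(observed) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    residuals = [s - o for s, o in zip(simulated, observed)]
    n = len(residuals)
    mae = sum(abs(r) for r in residuals) / n   # average magnitude of the error
    mse = sum(r * r for r in residuals) / n    # penalizes large errors more strongly
    return {"mae": mae, "mse": mse}


# Hypothetical drawdowns (m) at three observation ports for one pumping test.
observed = [0.10, 0.25, 0.40]
simulated = [0.12, 0.20, 0.43]
print(drawdown_misfit(simulated, observed))
```

Computing the same measures for every K field and every validation test allows the methods to be ranked on a common basis; smaller values indicate better predictions.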
The shallow subsurface beneath the NCRS is comprised of the Waterloo Moraine, which is
a highly heterogeneous mixture of glaciofluvial deposits. Deposits around and below the surface
are mostly an outcome of advances and retreats of the Laurentide ice sheet lobes during glaciations.
Tills covering and concealing the bedrock are laid down directly by the ice, mixing all sizes of
Karrow (1979) drilled a 50-meter-long borehole to obtain a continuous core sampling of the
materials down to the bedrock. According to the drilling report, below the top organic soil is a thin
silt layer, followed by the Tavistock till which is composed of sandy-to-clayey silt, but only exists
as erosional remnants. This till is underlain by a three-meter-thick sand sequence, followed by the
silty clay Maryhill till and the dense Catfish Creek till, which consist of silty sand and stony silt.
The Catfish Creek till extends approximately 20 meters below the ground surface and has been
treated as the lower hydraulic barrier of the NCRS (Alexander et al., 2011). Subsequent work
by Sebol (2000) and Alexander et al. (2011) revealed that the primary characteristic of the site is
the alternating and interfingering multi-aquifer-aquitard system consisting of two high-K units
separated by a discontinuous low-K layer. The lower aquifer consists of sandy gravel, while the
upper aquifer is comprised of sand to sandy silt. Hydraulic connections are known to be provided
by the low K layer in between, and the aquifer is semi-confined. Aquitards are also found above
and below the two aquifers. Local stratigraphy is discontinuous with the presence of stratigraphic
The schematic configuration of wells at the NCRS in plan view is shown in Figure 1a. The
blue dashed box represents a nine-well pumping and observation network. Initially, Alexander et
al. (2011) installed four continuous multichannel tubing wells (CMT1 – CMT4), each with seven
observation ports, and a pumping well (PW1) screened at eight different elevations (i.e., PW1-1 ~
PW1-8) (Figure 1b). Continuous sediment core samples were collected with recovery rates ranging
from 69% to 83% during well installations. Sample recovery was good, but they reported the
presence of periodic gaps in profiles that corresponded with less consolidated aquifer units
To provide a comprehensive K profile along each borehole, 270 GSA and 471 falling head
permeameter tests were initially carried out using core samples from CMT1 – CMT4 and PW1 by
Alexander et al. (2011). Twenty-eight slug tests were also performed at each monitoring port of
the CMT systems. Later, two multi-screened wells (PW3, PW5) and two well clusters (PW2, PW4)
were installed and described by Berg and Illman (2011b). Fifteen additional slug tests were
performed at various intervals of PW1, PW3 and PW5 by Xie (2015) and interpreted using various
analytical models (Hvorslev, 1951; Bouwer and Rice, 1976; Hyder et al., 1994). Nine pumping
tests at PW1-3, PW1-4, PW1-5, PW3-3, PW3-4, PW4-3, PW5-3, PW5-4, and PW5-5 were
conducted mainly within aquifer layers during a HT survey by Berg and Illman (2011b). Zhao and
Illman (2018) then conducted six additional pumping/injection tests at PW1-1, PW1-6, PW1-7,
PW2-3, PW3-1, and PW5-1 with longer durations to stress both aquifer and aquitard units.
An additional 171 permeameter tests have also been performed by Zhao and Illman (2017) with core
To date, a total of 270 GSA, 642 permeameter analyses of core samples, 43 slug tests, and
15 pumping and injection tests have been performed within the CMT and PW system. Moreover,
geophysical surveys were also performed at the NCRS, with Geoprobe DP surveys first conducted
in April of 2015 at eight locations to obtain electrical conductivity (EC) profiles (Williamson,
2016). During the summer of 2019, Sun et al. (2022) carried out HPT surveys at 11 DP locations
(HPT1 – HPT10 and HPT6-2). Figure 1b is the 3-D perspective view of wells and DP locations
(HPT1 ~ HPT6-2) within and around the 15 m × 15 m well clustering area, along with illustrations
of pumping and observation locations, bentonite sealings and high-resolution HPT survey intervals.
Figure 2 is the cross-sectional view (orientations of cross sections are indicated on Figure
1a) of the 3-D geological zonation model created by Zhao and Illman (2017) for the NCRS,
containing 19 different layers representing seven different material types. The model was
depths at the site. Further details on the construction of the geological model are provided in Zhao
commercial software Leapfrog Geo (ARANZ Geo. Limited, 2015), that interpolates various data
types to quickly construct geological models. Locations of the CMT and PW wells and screened
intervals are shown in the C-C’ and D-D’ cross sections in Figure 2, and A-A’ and B-B’ cross
sections in Figure S1 of the Supporting Information (SI) section. The interpolated geology between
boreholes based on known lithology is a reasonable representation of the site. The complex and
truncated layering of different sediment types indicate the highly heterogeneous nature of the
Case 1a: Empirical Formulae Applied to Grain Size Analyses (GSA) Results
The first method considered in this study is the application of various empirical formulae to
results from GSA. Specifically, many empirical formulae have been developed to establish
relationships between K and particle size statistics (Vienken and Dietrich, 2011; Rosas et al. 2014;
Devlin, 2015). This method is cost-efficient compared to other conventional approaches when it
comes to obtaining rapid estimates of K, as it avoids the need to conduct permeameter tests
on core samples or to install wells for slug or pumping tests. However, the
highly heterogeneous condition at the NCRS leads to significant challenges to the analysis as most
equations described in the literature are developed for relatively permeable materials such as sand
(e.g., Krumbein and Monk, 1943; Kozeny, 1953). Thus, it is hard to determine if one dedicated
empirical relationship is suitable for various unconsolidated materials. In this study, three different
models were applied to derive K estimates from core samples of different materials. Specifically,
the Hazen (1911) model was used for coarse-grained sediments, the Puckett et al. (1985)
relationship for fine-grained sediments, and the Barr (2001) formula for intermediate-grained
sediments.
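As an illustration of how an empirical grain-size formula turns a particle-size statistic into K, the sketch below uses the commonly quoted textbook form of the Hazen relationship, K = C·d10², with K in cm/s and d10 in cm. The coefficient C = 100 is a typical textbook value and is an assumption here; the exact forms and coefficients applied in this study are given in the SI section, and the Puckett et al. (1985) and Barr (2001) relationships are not reproduced. The function name is hypothetical.

```python
def hazen_k_m_per_s(d10_mm, c=100.0):
    """Estimate K (m/s) from the effective grain size d10 (mm).

    Uses the textbook Hazen form K [cm/s] = c * d10[cm]^2.
    The default c = 100 is an assumed, typical value for clean sands.
    """
    if d10_mm <= 0:
        raise ValueError("d10 must be positive")
    d10_cm = d10_mm / 10.0           # mm -> cm
    k_cm_per_s = c * d10_cm ** 2     # Hazen relationship
    return k_cm_per_s / 100.0        # cm/s -> m/s


# Hypothetical medium sand with d10 = 0.2 mm.
print(hazen_k_m_per_s(0.2))
```

Because K scales with the square of d10, a modest error in the grain-size curve translates into a larger error in K, which is one reason estimates from different formulae can differ widely.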
analyses of repacked samples retrieved during well drilling and borehole logging. During previous
work by Alexander et al. (2011) and Zhao and Illman (2017), a total of 642 temperature-corrected
falling head permeameter analyses were performed on repacked samples to estimate K based on a
formula provided in Freeze and Cherry (1979). Details are provided in the SI section.
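For readers unfamiliar with the falling head calculation, a minimal sketch follows. It assumes the standard textbook falling head expression, K = (aL / At) ln(H0/H1), where a is the standpipe cross-sectional area, A the sample cross-sectional area, L the sample length, and H0 and H1 the heads at the start and end of the elapsed time t. The temperature correction mentioned above is not included, and the function name and input values are hypothetical.

```python
import math


def falling_head_k(tube_area, sample_area, sample_length, elapsed_time, h0, h1):
    """Return K from a falling head permeameter test.

    All inputs must use consistent units (e.g., m, m^2, s), giving K in m/s.
    h0 and h1 are the hydraulic heads at the start and end of elapsed_time.
    """
    if not (h0 > h1 > 0):
        raise ValueError("heads must satisfy h0 > h1 > 0")
    if min(tube_area, sample_area, sample_length, elapsed_time) <= 0:
        raise ValueError("geometry and time must be positive")
    return (tube_area * sample_length) / (sample_area * elapsed_time) * math.log(h0 / h1)


# Hypothetical test: head falls from 1.0 m to 0.5 m in 10 minutes.
k = falling_head_k(tube_area=1e-4, sample_area=2e-3,
                   sample_length=0.1, elapsed_time=600.0, h0=1.0, h1=0.5)
print(k)
```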
Klute and Dirksen (1986), K values of repacked samples estimated in the laboratory can be
artificially lower than those from intact samples. In addition, the extraction and repacking
processes may induce fractures and destroy the internal structures that are well-preserved in intact
samples. Sudicky (1988) demonstrated that the potential error caused by using repacked samples
recover substantial intact core samples from highly permeable zones (Butler, 2005; Alexander et
al., 2011). Therefore, underprediction of K is possible for permeameter tests conducted with
At the NCRS, 28 slug tests were conducted by Alexander et al. (2011) in all seven monitoring
intervals of the four CMT wells (i.e., CMT1 – CMT4) and 15 tests by Xie (2015) at open intervals
of PW1, PW3, and PW5 wells resulting in a total of 43 tests. Data collected during those tests
yielded head response data that are amenable to standard slug test analysis solutions. For this study,
all tests were interpreted with the Hvorslev (1951) model with details provided in the SI section.
As the slug tests at the site were conducted along existing observation intervals and not with DP
equipment to obtain high resolution K estimates, results are grouped as part of conventional
methods.
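A minimal sketch of a Hvorslev-type slug test calculation is given below. It assumes the widely used form K = r² ln(Le/R) / (2 Le T0), valid for Le/R > 8, where r is the casing radius, R the screen (or borehole) radius, Le the screen length, and T0 the basic time lag (the time for the normalized head to fall to about 0.37). Whether this exact variant was applied in the study is not stated here (details are in the SI section), and the function name and inputs are hypothetical.

```python
import math


def hvorslev_k(casing_radius, screen_radius, screen_length, basic_time_lag):
    """Return K from a slug test using a common Hvorslev-type expression.

    Lengths in m and time lag in s give K in m/s.
    The expression assumes screen_length / screen_radius > 8.
    """
    if min(casing_radius, screen_radius, screen_length, basic_time_lag) <= 0:
        raise ValueError("all inputs must be positive")
    aspect = screen_length / screen_radius
    if aspect <= 8:
        raise ValueError("this form requires screen_length / screen_radius > 8")
    return casing_radius ** 2 * math.log(aspect) / (2.0 * screen_length * basic_time_lag)


# Hypothetical interval: 1 m screen, 5 cm screen radius, 2.5 cm casing radius,
# and a basic time lag of 100 s read from the head recovery curve.
print(hvorslev_k(0.025, 0.05, 1.0, 100.0))
```

Because K is inversely proportional to T0, fast-recovering (high-K) intervals depend on a short, hard-to-resolve time lag, which is where oscillatory or inertial responses can bias the estimate.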
Slug tests are best suited to materials with moderate to low values of K, although high-K
materials can also be tested and analyzed. Moreover, the sampled volume of the slug test is
usually considered to be much smaller compared to a pumping test, and the estimated hydraulic
parameters are only representative of materials around the test interval, and usually not between
Eleven HPT surveys were conducted by Sun et al. (2022) (Figure 1a) to characterize the
K6050; Geoprobe). Water was continuously injected during the advancement of the HPT probe
through a screen (1-cm in diameter) on the side of the probe and the corresponding water pressure,
injection flow rate (Q), as well as Electrical Conductivity (EC) were recorded electronically at a
1.5-cm vertical interval over time. Due to the highly heterogeneous nature of NCRS sediments,
the HPT probe was advanced at an average rate ranging from 1.4 to 2.2 cm/s for all 11 surveys
depending on varying sediment types. In this study, three different formulae [Case 2a: McCall and
Christy (2010); Case 2b: Borden et al. (2021); Case 2c: Zhao and Illman (2022b)] were utilized to
convert the collected data to K measurements with details provided in the SI section.
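The general idea of converting HPT logs to K can be sketched as below. The sketch assumes a log-linear empirical relation of the form K = a ln(Q/P) + b, where Q is the injection rate and P is the down-hole pressure corrected for a baseline (hydrostatic plus atmospheric) pressure. The functional form is an assumption for illustration, and the coefficients used in the example are placeholders, not the published values of any of the three formulae; those are provided in the SI section and the cited references.

```python
import math


def hpt_ratio(flow_rate, measured_pressure, baseline_pressure):
    """Return Q/P using the baseline-corrected pressure.

    baseline_pressure is the hydrostatic plus atmospheric contribution
    at the probe depth, in the same units as measured_pressure.
    """
    corrected = measured_pressure - baseline_pressure
    if corrected <= 0:
        raise ValueError("corrected pressure must be positive")
    return flow_rate / corrected


def k_from_ratio(ratio, slope, intercept):
    """Apply an assumed log-linear empirical relation K = slope*ln(ratio) + intercept."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return slope * math.log(ratio) + intercept


# Hypothetical reading: Q = 300 mL/min, 25 psi measured, 15 psi baseline.
ratio = hpt_ratio(300.0, 25.0, 15.0)
# slope and intercept below are placeholders, NOT published coefficients.
print(ratio, k_from_ratio(ratio, slope=20.0, intercept=-40.0))
```

Applying such a relation to each 1.5-cm reading yields the high-resolution vertical K profile; a relation of this form can return non-physical values at very low Q/P, so empirical formulae are normally used only within their calibrated range.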
An effective way for capturing the spatial variation of hydraulic parameters is to develop
stratigraphic or zonation models, in which hydraulic parameters in each zone are treated as
homogeneous and their values are estimated based on pumping tests or ambient hydraulic head
data through trial-and-error or automated calibration methods (Doherty, 2015). At the NCRS, Zhao
and Illman (2018) built a zonation model based on the 19-layer geological model and jointly
calibrated with 522 transient data points from 176 drawdown/buildup curves obtained through eight
pumping tests (PW1-1, PW1-4, PW1-6, PW1-7, PW2-3, PW3-3, PW4-3, and PW5-3) for K and
Ss estimates. The calibration was performed by coupling HGS (Aquanty, 2019) with the parameter
estimation code PEST (Doherty, 2005), while treating elements in each layer as homogeneous
and isotropic to simplify the analysis, which resulted in 19 pairs of K and Ss estimates. The model
was discretized into 31,713 rectangular finite elements of varying sizes with 34,816 nodes for
inverse modeling. From the central well cluster area to the model boundary, the element size
gradually increased, with blocks expanding from 0.5 m × 0.5 m × 0.5 m to 5 m × 5 m × 0.5 m.
In the work of Zhao and Illman (2018), the unsaturated zone at the NCRS was not considered,
and the water table was designated as the upper boundary. The water table was modeled as a flat
surface since the change in water level was less than the height of the elements at the top. The
Catfish Creek till was treated as a hydraulic barrier (Alexander et al., 2011) and served as the lower
boundary. The top and bottom model boundaries were treated as impermeable boundaries, while
the remaining four boundaries were treated as constant head boundaries as in our previous inverse
models built for the site (Berg and Illman, 2011b; Zhao and Illman, 2018, 2022a).
Case 3b: Averaged THT Geological Model and Case 3c: Highly Parameterized
THT Model
The same datasets utilized to calibrate the geology-based groundwater flow model were also
utilized for THT analysis by Zhao and Illman (2018) to map the K and Ss heterogeneity at the
NCRS using VSAFT3 (Variably Saturated Flow and Transport 3-D Model) (Yeh et al., 1993),
which utilizes the Simultaneous Successive Linear Estimator (SimSLE) (Xiang et al., 2009) for
geostatistical inverse modeling. Settings of the numerical model (model discretization, initial and
boundary conditions) were the same as the geology-based zonation model described in the
previous section. Furthermore, results from the calibrated geology-based zonation model were
utilized as initial K and Ss guesses for the inversion of the THT model.
The estimated K and Ss values at 31,713 finite elements from the THT analysis were then
averaged for each layer by taking the geometric mean based on the geological model to compare
with estimates from the calibrated geology-based zonation model using PEST (Doherty, 2005) and
other estimates from this study. This resulted in 19 estimates of K and Ss for each layer which we
refer to as Case 3b: Averaged THT Geological Model, while Case 3c: Highly Parameterized THT
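The layer-wise averaging used for Case 3b can be sketched as follows: element-scale estimates are grouped by the geological layer each element belongs to, and the geometric mean is taken within each group. The function name, data layout, and values are hypothetical; the sketch only illustrates the averaging step, not the THT inversion itself.

```python
import math
from collections import defaultdict


def layer_geometric_means(values, layer_ids):
    """Return {layer_id: geometric mean of values in that layer}.

    values    : element-scale estimates (e.g., K in m/s), all positive
    layer_ids : geological layer index of each element, same length
    """
    if len(values) != len(layer_ids):
        raise ValueError("values and layer_ids must have equal length")
    logs = defaultdict(list)
    for value, layer in zip(values, layer_ids):
        if value <= 0:
            raise ValueError("geometric mean requires positive values")
        logs[layer].append(math.log(value))
    # The geometric mean is the exponential of the arithmetic mean of the logs.
    return {layer: math.exp(sum(v) / len(v)) for layer, v in logs.items()}


# Hypothetical K estimates (m/s) for four elements in two layers.
k_elements = [1e-4, 1e-6, 1e-5, 1e-5]
layers = [1, 1, 2, 2]
print(layer_geometric_means(k_elements, layers))
```

The same function applies unchanged to Ss. Averaging in log space keeps a few very high or very low element values from dominating the layer estimate.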
Figure 1 shows that CMT1 is spatially close to HPT3, while CMT3 is close to both HPT6
and HPT6-2. Therefore, K estimates at CMT1 and CMT3 from Case 1a: GSA, Case 1b:
Permeameter Tests, Case 1c: Slug Tests, and various inverse modeling approaches (Cases 3a – 3c)
could be qualitatively and quantitatively compared with HPT results obtained at adjacent DP
locations.
Figure 3 summarizes the vertical profiles of log10K estimates along CMT3 and HPT6 from
GSA, permeameter tests, slug tests, and various inverse modeling approaches along with site
stratigraphy at these locations. Similar figures for vertical profiles along CMT1 and HPT3 (Figure
S3a), as well as CMT3 and HPT6-2 (Figure S3b) are provided in the SI section.
Results show that K measurements are highly variable, ranging over approximately seven orders
of magnitude across the two CMT wells, which indicates the highly heterogeneous nature of K at the
site. Figure 3 reveals that the K variability from one layer to another (i.e., interlayer heterogeneity),
reflecting the alternating aquifer-aquitard system, can be captured by most of the methods, while
only small-scale measurements from Case 1a: GSA, Case 1b: Permeameter Tests, Cases 2a – 2c:
HPT surveys, and Case 3c: Highly Parameterized THT Model reveal heterogeneity within
In terms of conventional methods, point-scale measurements of K from Case 1a: GSA and
Case 1b: Permeameter Tests generally follow a similar trend. Case 1c: Slug Test results also follow
the trend, but the measured K values are generally larger than Case 1a: GSA and Case 1b:
Permeameter Test estimates, especially at highly permeable zones potentially exhibiting a scale
effect (Clauser, 1992; Rovey and Cherkauer, 1995; Butler and Healey, 1998; Vesselinov et al.
2001; Illman, 2006). HPT results at three DP locations generally follow the trend of K from
permeameter tests of samples from the collocated CMT wells, but the K estimates are around 1
to 2 orders of magnitude larger than those estimated by permeameter tests, especially from 4 m to
8 m, where local geology from collocated CMT wells is primarily low-permeability materials such
as clay and silt (Figure 3). Using various site-dependent formulae yields similar results at this
upper depth range. Significant differences are observed in the middle and lower portions of the
site.
Examination of Figure 3 at around 8 m – 10 m, together with the core log, reveals that local
geology consists primarily of highly permeable materials such as sand. Accordingly, K estimates from
permeameter tests, Case 1a: GSA, and Case 1c: Slug Tests are all higher than those for the silt
materials located above and below, while HPT estimates using Case 2a: McCall and
Christy (2010)’s model yield only a fixed K estimate at the lower bound of 3.5 × 10⁻⁷ m/s. The
Case 2b: Borden et al. (2021) and Case 2c: Zhao and Illman (2022b) models both provide K
estimates that are higher than those generated by McCall and Christy (2010)’s relationship.
Similarly, at depths from 9 m to 11 m in Figure 3, a transition zone from
sand to silt is observed in the core logs, and K measurements from Case 1a: GSA and Case 1b:
Permeameter Tests both reflect this variation. However, none of the K estimates from the three
HPT formulae detect this variation, while Case 2a: McCall and Christy (2010)’s model only yields
a fixed lower bound. The Catfish Creek till located at depths below 12 m for CMT1 and below 14
m for CMT3 is detected by a significant drop in K estimates from Case 1a: GSA and Case 1b:
Permeameter Tests. Case 2a: McCall and Christy (2010)’s model yields a fixed lower bound, while
Case 2b: Borden et al. (2021)’s model generates even higher estimates of K. On the other hand,
Case 2c: Zhao and Illman (2022b)’s model at HPT3 and HPT6-2 yields K estimates that are close
to Case 1a: GSA and Case 1b: Permeameter Tests, which is encouraging as this formula was
derived through fitting K estimates mostly in the range of 3.5 × 10⁻⁷ m/s to 6.9 × 10⁻⁴ m/s. Case 1c:
Slug Tests yield K estimates that are in general smaller than those estimated by HPT, but the
The K estimates from various inverse modeling approaches (Cases 3a – 3c) are also plotted
for comparison (Figure 3). For Case 3a: PEST Calibrated Geological and Case 3b: Averaged THT
Geological Models, uniform K values are assigned along each of the 19-layers, while Case 3c:
Highly Parameterized THT model yields spatially variable K estimates along the depth of the
borehole.
From 0 m to 3 m, K estimates from Case 3c: Highly Parameterized THT Model are quite
smooth because there are no monitoring data available for inversion, thus the estimated K values
are nearly identical to the initial K estimate input to THT analysis. Beneath 3 m, it is evident that
THT results follow the general pattern of K variability reflecting the site stratigraphy including
small-scale interlayer heterogeneity. For example, the transition zones at 3 m and 8 m from CMT3
are not captured by the other methods. However, THT results near the bottom of CMT3 from 14
Table 1 summarizes the descriptive statistics of K from various approaches at the NCRS. It
is worth mentioning that only 10 HPT surveys (HPT2 to HPT10 and HPT6-2) were utilized for this
study because no dissipation test was conducted at HPT1, so the corresponding K values may be
less reliable. The reported statistics include minimum, maximum, geometric mean of K (KG), range
of log10K, and variance of log10K (σ2log10K). Another version of the table based on the natural
Examination of Table 1 shows that HPT surveys (Cases 2a – 2c) yield a significantly larger
number of K estimates due to the 1.5 cm profiling intervals along each DP location. Case 3c:
Highly Parameterized THT Model has the largest number of estimated K due to its highly
The geometric means of K (KG) from HPT methods (Cases 2a – 2c) are higher than those
generated through traditional methods due to the technical limitation of HPT for low K materials.
In addition, KG increases from Case 1a: GSA to Case 1b: Permeameter Tests and to Case 1c: Slug
Tests due to a potential scale effect. Moreover, Case 2a: McCall and Christy (2010)’s model
exhibits the smallest range of log10K due to the use of fixed upper and lower K limits. Although
both Case 2b: Borden et al. (2021)’s and Case 2c: Zhao and Illman (2022b)’s models do not
fix the lower K limit, HPT measurements with Q less than 10 ml/min in low K sediments were
considered to be inaccurate and excluded as suggested by Liu et al. (2012). Therefore, Case 2b:
Borden et al. (2021)’s model extends the range especially at the lower end. In contrast, Case 2c:
Zhao and Illman (2022b)’s model extends the range for both the higher and lower ends and yields
the largest range of log10K among the three formulae to interpret HPT data. Case 1c: Slug Tests
yield a relatively small range of K estimates, while Case 1a: GSA with three models, Case 1b:
Permeameter Tests, and the Case 3c: Highly Parameterized THT Model all yield larger ranges of
estimates.
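The statistics reported in Table 1 can be reproduced from any list of K values with a few lines of code. The following is a minimal sketch rather than the procedure used in this study: the function name `describe_log10_k` and the sample values are hypothetical, and the use of the sample variance (`ddof=1`) is an assumption, since the text does not state which variance estimator was applied.

```python
import numpy as np

def describe_log10_k(k_values):
    """Summarize K values (m/s) with the statistics listed for Table 1."""
    k = np.asarray(k_values, dtype=float)
    log_k = np.log10(k)
    return {
        "n": int(k.size),
        "min": float(k.min()),
        "max": float(k.max()),
        # Geometric mean KG: back-transform of the mean of log10 K.
        "KG": float(10.0 ** log_k.mean()),
        "range_log10K": float(log_k.max() - log_k.min()),
        # Assumption: sample variance (ddof=1); the estimator is not stated in the text.
        "var_log10K": float(log_k.var(ddof=1)),
    }

# Synthetic values for illustration only (not site data).
stats = describe_log10_k([1e-8, 1e-6, 1e-4])
print(stats)
```

Because both the range and the variance are computed on log10K, a method with fixed upper and lower K limits can show a narrow range while still having a moderate variance if many values sit at the two limits.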
In terms of the variance of log10K (σ2log10K), Case 1a: GSA yields the highest σ2log10K of 2.63,
perhaps because of the use of three empirical models to target various soil classes. Case 1b:
Permeameter Tests yield the second highest estimate of σ2log10K at 1.55, which is comparable to
the value for Case 1c: Slug Tests (1.47) despite the smallest number of available measurements (n
= 43). It is also noteworthy that the Case 3a: PEST Calibrated Geological Model, Case 3b:
Averaged THT Geological Model, and Case 3c: Highly Parameterized THT Model yield
comparable σ2log10K of 1.61, 1.46, and 1.47, respectively. In contrast, K estimates from the HPT
tend to result in smaller σ2log10K estimates with Case 2b: Borden et al. (2021) and Case 2c: Zhao
and Illman (2022b) models, yielding σ2log10K estimates of 0.38 and 0.86, respectively, while Case
2a: McCall and Christy (2010)’s model results in a σ2log10K of 1.28 that is somewhat lower but
The point scale K measurements from various approaches in Cases 1a – 1c and 2a – 2c were
then used to populate the 19-layer geological model by taking the KG of all data points located in
each layer. Measurements from similar sediment material were attributed to layers that have no
sample data available. For example, as only 43 K estimates in 11 out of 19 layers of the geological
model were available from Case 1c: Slug Tests, it was most difficult to populate the model. As a
result, KG from layers 4, 8, 16, and 18 (clay) was assigned to layers 1 and 12 (clay); KG from layers
2, 7, and 14 (silt) was assigned to layer 10 (silt), KG from layer 13 (sandy silt) was assigned to
layers 6 and 9 (sandy silt), KG from layer 11 (sand) was assigned to layer 3 (sand), KG from layer
17 (clay & silt) was assigned to layer 19 (clay & silt), and KG from layers 3 and 11 (sand) and 2
and 10 (silt) were assigned to layer 5 (sand & silt). Similar steps were also performed for the other
methods if there were layers that did not contain any K estimates, and are described beneath Tables S2
– S7 in the SI section.
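The layer-population step described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the workflow used in this study: the function `layer_geometric_means` is hypothetical, the K values are synthetic, and pooling all data points of a given material before taking the geometric mean is an assumption (the text does not say whether data points or layer KG values were pooled). Mixed units such as layer 5 (sand & silt), which draw on more than one material, would need an explicit mapping and are not handled here.

```python
import numpy as np

def layer_geometric_means(samples, layer_material):
    """Return {layer: KG} for every layer in layer_material.

    samples: iterable of (layer_id, K in m/s)
    layer_material: {layer_id: material label}
    """
    logs = {}
    for layer, k in samples:
        logs.setdefault(layer, []).append(np.log10(k))

    # KG of each sampled layer.
    kg = {layer: 10.0 ** np.mean(v) for layer, v in logs.items()}

    # Assumption: pool log10 K of all sampled layers sharing a material.
    by_material = {}
    for layer, v in logs.items():
        by_material.setdefault(layer_material[layer], []).extend(v)

    # Fill unsampled layers from sampled layers of the same material.
    for layer, material in layer_material.items():
        if layer not in kg and material in by_material:
            kg[layer] = 10.0 ** np.mean(by_material[material])
    return kg

# Layer numbers and materials follow the text; K values are synthetic.
materials = {1: "clay", 4: "clay", 8: "clay", 3: "sand", 11: "sand"}
samples = [(4, 1e-9), (8, 1e-7), (11, 1e-4), (11, 1e-6)]
kg = layer_geometric_means(samples, materials)
print(kg)
```

In this example, layer 1 (clay, no samples) inherits the pooled clay KG and layer 3 (sand, no samples) inherits the KG of layer 11.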
Additionally, the maximum, upper quartile, median, KG, lower quartile, and minimum of
log10K values estimated from all investigated approaches (except for Case 3a: PEST Calibrated
Geological Model and Case 3b: Averaged THT Geological Model) were computed for each
geological layer and plotted as box-and-whisker plots in Figure 4, while their numerical values
were summarized in Tables S2 – S9 of the SI section. The lower and upper range of K estimates
from Case 2a: McCall and Christy (2010)’s model and the higher range of the Case 2c: Zhao and
Illman (2022b)’s model were indicated within the box plots in Figure 4.
Examination of Figure 4 reveals that most of the box plots are either positively skewed or
negatively skewed depending on the approach examined. In addition, the interquartile range (IQR)
of K estimates from Case 1a: GSA and Case 1b: Permeameter Tests over 19 layers is generally
larger than those from other methods, suggesting larger variability of K estimates for each layer.
The IQR from Case 2a: McCall and Christy (2010)’s model is larger than the Case 2b: Borden et
al. (2021) and Case 2c: Zhao and Illman (2022b) models. In addition, the KG from Case 2b: Borden
et al. (2021)’s model is less variable than Case 2a: McCall and Christy (2010)’s and Case 2c: Zhao
and Illman (2022b)’s models. The IQR for Case 3c: Highly Parameterized THT Model is
consistently smaller, which indicates less dispersion of data sets along each of the 19 layers despite
the large degree of variability in K estimates for each layer compared to other approaches.
Next, the profiles of log10KG of 19 geological layers estimated from all investigated
approaches were plotted in Figure 5. This figure reveals that it is very hard to accurately
characterize a heterogeneous site, such as the NCRS, as estimated log10KG values could range
about four orders of magnitude within a single geological unit when using various site
characterization approaches. Overall, Case 1c: Slug Tests yielded higher K estimates compared to
Case 1a: GSA and Case 1b: Permeameter Tests. The K values from HPT surveys with three
different models (Cases 2a – 2c) yielded similar K estimates, while the estimates were generally
higher than those obtained from conventional methods (i.e., Case 1a: GSA and Case 1b:
Permeameter Tests). The K values estimated from Case 3a: PEST Calibrated Geological Model
were close to those generated from Case 3b: Averaged THT Geological Model.
Because the spatial distribution of the true K field across the NCRS is not available, K
estimates from various approaches were assessed through the prediction of drawdowns from
pumping/injection tests that have not been used for K estimation. For this, we constructed an HGS
model for forward simulations of independent pumping/injection tests with K and Ss fields derived
from different methods. Other than hydraulic parameter fields, HGS settings were the same as the
numerical model utilized in the work of Zhao and Illman (2018) as described previously.
Seven pumping/injection tests (PW1-3, PW1-5, PW3-1, PW3-4, PW5-1, PW5-4, and PW5-
5) not used in inverse modeling were simulated with HGS and results were compared to field data
via scatterplots to evaluate the performances of models built with K estimates from different
approaches. Since most of the conventional and HPT methods were not capable of providing Ss
estimates, the forward model’s ability to predict drawdowns under steady-state condition was the
estimates from Case 3a: PEST-calibrated 19-layer geological model were utilized in Cases 1a –
1c, Cases 2a - 2c, and Case 3a. For Case 3b, 31,713 Ss values estimated via THT analysis were
averaged for each of the 19 layers, while for Case 3c, 31,713 Ss values from the THT analysis were
Comparison of K Distributions
The estimated K distributions from conventional (Cases 1a – 1c), HPT (Cases 2a – 2c), and
inverse modeling (Cases 3a – 3c) approaches utilized in the groundwater flow models are
presented as fence diagrams in Figure 6. In this figure, locations of CMT and PW wells as well as
HPT survey locations are indicated. As mentioned previously, the primary characteristic of the site
is an alternating aquifer-aquitard system, in which three discontinuous low-K units of clay to silt
unable to capture the two aquifers correctly. In contrast, the aquitard clay layer 12 between the two
aquifers and the aquitard layer 1 at the top of the model domain have higher K values than
the sand and gravel aquifer units, which is inconsistent with geological data. Based on Table S2, both
aquitard layers are primarily composed of clay. Thus, K values from these two units (layers 1 and
12) are estimated using Puckett et al. (1985)’s model, while K values of the two aquifer layers are
calculated with Hazen (1911)’s model. Similar findings were reported by Alexander et al. (2011),
where both Puckett et al. (1985) and Hazen (1911) models were utilized to calculate K from 270 grain size
distributions, and results showed that the mean K generated by the Puckett et al. (1985) model was
about two orders of magnitude larger than that estimated from the Hazen (1911) model. These findings
indicate that the Puckett et al. (1985) model may not be suitable to calculate K for clay materials
at a highly heterogeneous glaciofluvial deposited site even though equation (2) is only dependent
on clay content.
Results from Case 1b: Permeameter Tests reveal the existence of a double-aquifer system.
However, based on Table S3, relatively low K values are estimated for the sand and gravel layer
(i.e., layer 15). Lower K values are obtained for the aquitard layers above and below as well as in
Results from Case 1c: Slug Tests capture the aquitard units below and above the aquifer
system. However, based on Tables S2 to S4, K values are relatively larger than those estimated by
Case 1a: GSA and Case 1b: Permeameter Tests. In addition, the double-aquifer system is not
reflected correctly. Based on Table S4, all the sandy-silt layers (6, 9, and 13) between and above
the aquifers have greater estimates of K than the two aquifer units (layers 11 and 15). The reason
is that the 43 slug test measurements only cover one sandy-silt layer. Specifically, there are only
four measurements for layer 13, while the estimated K values are relatively large and do not agree
The K distributions in Figure 6 from HPT using three different models (Cases 2a – 2c) are
generally biased towards higher K values. Specifically, results from Case 2a: McCall and Christy
(2010)’s model capture the lower aquitard. However, K estimates tend to be larger than those
generated from Case 1a: GSA and Case 1b: Permeameter Tests. In addition, only the lower aquifer
is revealed, while the clay layer 4 and silt layers 6, 7, and 10 located in the upper aquitard (based
on Table S5) have higher K estimates than the most permeable aquifer unit layer 15, which does
not conform to known geology. Results from Case 2b: Borden et al. (2021)’s model capture only
the lower aquitard, while K estimates are generally larger, so the layers above the lower
aquitard are hard to distinguish. The K distributions from Case 2c: Zhao and Illman (2022b)’s
model capture only the lower aquifer and the lowest aquitard, while the K values for units above
the lower aquifer are generally less variable and the aquitard layers have generally larger K
estimates. The less variable values from high-resolution HPT methods are mainly due to the
limited range of K estimates obtained from each of the three models (McCall and Christy, 2010;
Examination of three inverse modeling results (Cases 3a – 3c) reveals K variations more
accurately. The K values from Case 3a: PEST Calibrated Geological Model capture the expected
variation from one layer to the next. Specifically, the K estimate of the unit in between two aquifers
matches that expected for an aquitard. The K values for aquitards above and below the double
aquifer system are also estimated to be low. However, the K estimate for the upper aquitard is
relatively larger than the value estimated from permeameter tests, due to the sparse monitoring
data at the uppermost model domain. Case 3b: Averaged THT Geological Model has a similar K
distribution compared to Case 3a: PEST Calibrated Geological Model (as shown in Figure 6), and
the K estimate for the lowest aquifer agrees more with those obtained by conventional methods
(i.e., Case 1a: GSA and Case 1b: Permeameter Tests). Case 3c: Highly Parameterized THT Model
yields a K field that exhibits both inter- and intra-layer heterogeneity. It is noteworthy that the
estimated K from Case 3c for the lower aquifer layer 15 is higher compared with Cases 3a and 3b.
The performance of each K distribution in Figure 6 obtained by various methods was then
evaluated by predicting independent pumping tests that were not used for model calibration using
HGS (Aquanty, 2019) with the computational mesh described earlier. As previously noted, a total
of 15 pumping tests (PW1-1, PW1-3, PW1-4, PW1-5, PW1-6, PW1-7, PW2-3, PW3-1, PW3-3,
PW3-4, PW4-3, PW5-1, PW5-3, PW5-4, and PW5-5) were conducted at the NCRS, while eight
tests were utilized by Zhao and Illman (2018) for Case 3a, 3b, and 3c model calibrations (PW1-1,
PW1-4, PW1-6, PW1-7, PW2-3, PW3-3, PW4-3, and PW5-3). Therefore, for this study, seven
tests not used in model calibration by Zhao and Illman (2018) were chosen to evaluate the K
Since most of the conventional methods (i.e., Case 1a: GSA and Case 1b: Permeameter Tests) and
the HPT surveys (Cases 2a – 2c) in their current form cannot provide Ss estimates, steady-state
simulation was the first metric to evaluate the K estimates from various approaches. Only late-time
pressure heads from ports that reach steady or quasi-steady state were chosen, which resulted in
153 head data. To better evaluate the correspondence between the simulated and observed
drawdown values, quantitative analyses were first performed by comparing the coefficient of
determination (R2), mean absolute error (L1) and mean square error (L2), which are provided as:
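\[
R^2 = \left[ \frac{\frac{1}{n}\sum_{i=1}^{n}\left(X_i-\bar{X}\right)\left(\hat{X}_i-\bar{\hat{X}}\right)}{\sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(X_i-\bar{X}\right)^2 \times \frac{1}{n}\sum_{i=1}^{n}\left(\hat{X}_i-\bar{\hat{X}}\right)^2}} \right]^2 \tag{1}
\]
\[
L_1 = \frac{1}{n}\sum_{i=1}^{n}\left|X_i-\hat{X}_i\right| \tag{2}
\]
\[
L_2 = \frac{1}{n}\sum_{i=1}^{n}\left(X_i-\hat{X}_i\right)^2 \tag{3}
\]
where $n$ is the total number of data, $i$ indicates the data number, $X_i$ is the simulated drawdown, $\hat{X}_i$
is the observed drawdown, $\bar{X}$ is the mean of simulated drawdowns, and $\bar{\hat{X}}$ is the mean of observed
drawdowns.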
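For readers who wish to compute these metrics themselves, equations (1) to (3) can be sketched in code as below. This is a minimal illustration, not the script used in this study: the function name `fit_metrics` and the drawdown values are hypothetical, and R2 is taken here as the squared correlation coefficient between simulated and observed drawdowns.

```python
import numpy as np

def fit_metrics(simulated, observed):
    """Return (R2, L1, L2) for paired simulated and observed drawdowns."""
    x = np.asarray(simulated, dtype=float)      # X_i
    x_hat = np.asarray(observed, dtype=float)   # X-hat_i

    # Equation (1): squared correlation between simulated and observed values.
    cov = np.mean((x - x.mean()) * (x_hat - x_hat.mean()))
    var_prod = np.mean((x - x.mean()) ** 2) * np.mean((x_hat - x_hat.mean()) ** 2)
    r2 = cov ** 2 / var_prod

    # Equation (2): mean absolute error.
    l1 = np.mean(np.abs(x - x_hat))
    # Equation (3): mean square error.
    l2 = np.mean((x - x_hat) ** 2)
    return float(r2), float(l1), float(l2)

# Synthetic drawdowns (m): every simulated value is 0.1 m below the observed one.
r2, l1, l2 = fit_metrics([0.1, 0.2, 0.4], [0.2, 0.3, 0.5])
print(r2, l1, l2)
```

In this synthetic case R2 equals 1 even though every simulated drawdown is offset by 0.1 m, which is why R2 is read together with the L1 and L2 norms rather than on its own.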
Statistics calculated from each method through seven simulations are summarized in Tables
S10 to S12 (in the SI section). Cells in Tables S10 to S12 are colour-coded to enhance the
comparison. Examination of Tables S10 to S12 reveals that Case 3c: Highly Parameterized THT
Model performs the best yielding smallest discrepancies between simulated and measured
drawdowns (i.e., smallest L1 and L2 norms) as well as highest R2 values for most of the pumping
tests, followed by Case 3b: Averaged THT Geological Model and Case 3a: PEST Calibrated
Geological Model.
In terms of K estimates from HPT surveys (Cases 2a – 2c), discrepancies between simulated
and observed drawdown values are the smallest for Case 2c: Zhao and Illman (2022b)’s model
among the three HPT formulae. Three conventional methods, especially Case 1a: GSA and Case
Simulation results are also assessed by plotting scatterplots, as shown in Figure 7. In each
plot, a linear model fit to all data and corresponding slope and intercept of these fits, as well as R2
values are included. Meanwhile, a 1:1 line is also included in each subplot to indicate a perfect
match. The slope and intercept values obtained from the linear model fit for individual tests from
all methods are summarized in Table S13 for the interested reader.
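The slope and intercept of such a linear model fit can be obtained with an ordinary least-squares fit, sketched below. The axis assignment (observed drawdown on the horizontal axis, simulated drawdown on the vertical axis) is an assumption made for this illustration, and the function name `scatter_fit` and the data are hypothetical.

```python
import numpy as np

def scatter_fit(observed, simulated):
    """Least-squares line simulated = slope * observed + intercept."""
    slope, intercept = np.polyfit(observed, simulated, 1)
    return float(slope), float(intercept)

# Synthetic drawdowns (m): the model reproduces half of each observed drawdown
# plus a constant 0.02 m offset.
observed = np.array([0.1, 0.2, 0.3, 0.4])
simulated = 0.5 * observed + 0.02
slope, intercept = scatter_fit(observed, simulated)
print(slope, intercept)
```

With this axis assignment, a slope near 1 and an intercept near 0 place the fit on the 1:1 line, while a slope below 1 indicates that simulated drawdowns fall increasingly short of observed drawdowns as drawdown grows.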
To enhance our comparison and evaluation, transient simulations were also performed. Since
most of the selected conventional and HPT methods cannot yield Ss estimates, estimates for the 19
geological layers obtained from Case 3a: PEST Calibrated Geological Model were assigned to
Cases 1a to 1c and Cases 2a to 2c. Three points were selected from the early, intermediate, and
late times of each drawdown curve, which resulted in a total of 388 drawdown data. It is worth
noting that fewer drawdown data points were selected from injection tests performed at PW3-1 and
PW5-1 as the data from the tests were noisy and impacted by the Noordbergum effect (Verruijt,
1969; Rodrigues, 1983; Berg et al., 2011). Therefore, only late time data were selected from those
drawdown curves. Similar to steady-state results, various model performance metrics such as the
R2, L1, L2, slope and intercept of the linear model are summarized in Tables S14 to S17, while
simulated drawdown curves for the pumping/injection tests at ports PW1-3, PW1-5, PW3-1, PW3-
4, PW5-1, PW5-4, and PW5-5 with various K estimates are compared against observed drawdowns
Discussion
Predictions?
Examination of Figures 7 and 8 reveals that performance in steady-state and transient
simulations is quite comparable. Specifically, groundwater models built with conventional
(Cases 1a - 1c) and HPT (Cases 2a - 2c) K estimates yield biased drawdown predictions for both
steady-state (Figure 7) and transient (Figure 8) simulation results. In contrast, inverse modeling
approaches based on Case 3a: PEST Calibrated Geological Model and Case 3b: Averaged THT
Geological Model both yield good predictions of drawdowns, while Case 3c: Highly
Parameterized THT Model produces excellent matches for steady state simulation results (Figure
7). For transient simulation results (Figure 8), the performance of Cases 3a - 3c
is more comparable, although Case 3c still yields the best prediction performance.
In terms of conventional methods, K estimates from Case 1a: GSA and Case 1b:
Permeameter Tests overpredict drawdowns, while those estimated via Case 1c: Slug Tests
underpredict drawdowns under steady-state and transient conditions (Figures 7 and 8). According
to Table 1, as well as Figure 5, Case 1a: GSA and Case 1b: Permeameter Tests tend to provide
smaller KG estimates than Case 1c: Slug Tests. Specifically, Table 1 shows that Case 1a: GSA and
Case 1b: Permeameter Tests yield KG values of 1.19 × 10⁻⁷ m/s and 3.03 × 10⁻⁷ m/s, respectively.
The lower KG values in relation to other approaches (Table 1) may be due to sample loss from
highly permeable zones as observed by Alexander et al. (2011). Core samples were obtained
with a split spoon sampler that was driven in front of the drill head. Alexander et al. (2011) noted
that the sample recovery was on the order of 80% for all wells except for CMT3, which had a
lower recovery rate of 69%. Sample recovery was found to be good, but periodic gaps were noted
for depths corresponding with aquifers. Therefore, the coarse-grained portion of samples may have
been lost and not subjected to sieve analyses and permeameter tests. An additional factor relevant
to permeameter tests is the repacking of samples. Klute and Dirksen (1986) discussed that K of
repacked samples estimated in the laboratory can be artificially lower than those from intact
samples.
It is surprising to note that Case 1c: Slug Tests consistently underpredict drawdowns (Figures
7 and 8) as this method is widely used for various field investigations (Butler, 1997, 2005; Cardiff
et al., 2011). Table 1 shows that 43 slug tests yield a KG value of 2.65 × 10⁻⁶ m/s, which is
approximately one order of magnitude higher than those from Case 1a: GSA and Case 1b:
Permeameter Tests. This may be due to three potential commingling factors: 1) the relatively
sparse data points (n = 43) available at the site that could have led to preferential sampling from
higher K intervals; 2) the Hvorslev (1951) approach yielding slightly higher estimates of K (Xie,
2015) compared to the Bouwer and Rice (1976) and Kansas Geological Survey (KGS) models
(Hyder et al., 1994); and 3) the scale effect, in which slug tests sample larger volumes that may be
impacted by highly permeable zones not considered by other methods that sample smaller volumes
such as Case 1a: GSA and Case 1b: Permeameter Tests. Another potential factor may be the slug
test K estimate being representative of the filter pack. However, for this study, care was taken to
avoid fitting the Hvorslev (1951) model to the early portion of the head response curve.
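The order-of-magnitude comparison between the slug test KG and the two point-scale KG values quoted from Table 1 can be checked with a short calculation. The sketch below uses only the three KG values given in the text; the dictionary keys and the helper `log10_ratio` are hypothetical names introduced for illustration.

```python
import math

# KG values (m/s) as quoted in the text from Table 1.
KG = {
    "1a GSA": 1.19e-7,
    "1b Permeameter": 3.03e-7,
    "1c Slug": 2.65e-6,
}

def log10_ratio(a, b):
    """Difference between two K values expressed in orders of magnitude."""
    return math.log10(a / b)

diff_gsa = log10_ratio(KG["1c Slug"], KG["1a GSA"])
diff_perm = log10_ratio(KG["1c Slug"], KG["1b Permeameter"])
print(round(diff_gsa, 2), round(diff_perm, 2))
```

The two differences are about 1.35 and 0.94 log10 units, consistent with the statement that the slug test KG is approximately one order of magnitude higher.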
Figures 7 and 8 also reveal that K estimates from various HPT formulae (Cases 2a – 2c)
persistently yield biased low predictions of drawdowns under both steady-state and transient
conditions. Examination of Table 1 reveals that KG values from the three methods are 2.85 × 10⁻⁶ m/s,
5.78 × 10⁻⁶ m/s, and 3.84 × 10⁻⁶ m/s for Case 2a: McCall and Christy (2010), Case 2b: Borden et
al. (2021), and Case 2c: Zhao and Illman (2022b) models, respectively. The KG estimates from the
three HPT formulae (Cases 2a – 2c) are approximately one order of magnitude larger than those
from conventional methods (Cases 1a – 1c). In terms of predictions of drawdowns, the use of three
different models yields results that are slightly different. Specifically, based on Figures 7 and 8,
R2 values increase, while both L1 and L2 norms decrease from Cases 2a to 2c. As previously
mentioned, Case 2c: Zhao and Illman (2022b)’s model is a site-specific relationship developed for
the NCRS. Therefore, building a site-specific model to interpret HPT data is helpful in terms of
site characterization. However, the improvement to drawdown predictions is not very significant
based on Figures 7 and 8 (see also Figures S4 to S10 in the SI section), as Cases 2a to 2c all
underpredict observed drawdowns. An obvious reason is the limited range of estimated K for the
three models used to interpret HPT data at the NCRS. It is interesting to note that while the KG
values estimated through the three approaches are quite similar, the range of log10K is quite
different for each approach (Table 1). Moreover, although Case 2c: Zhao and Illman (2022b)’s
model extends the lower and upper K ranges compared to the other two models (Cases 2a and
2b), the resulting K estimates and corresponding forward simulations yield biased predictions. This
is likely due to the highly heterogeneous nature of the glaciofluvial deposits at the NCRS and the
connectivity of these units is an important consideration for building more accurate groundwater
In contrast to the conventional and HPT K estimates that yield biased drawdown predictions,
we find that the inverse modeling approaches (Cases 3a – 3c), with their various parameterizations, all yield
more accurate drawdown predictions at the NCRS (Figures 7 and 8). Case 3a: PEST Calibrated
Geological Model and Case 3b: Averaged THT Geological Model both yield a good drawdown
match with the measured data, while Case 3c: Highly Parameterized THT Model with prior
geological information yields excellent forward simulation results under both steady and transient
conditions. The calibration of an HGS groundwater flow model based on geological zonation with
PEST (Case 3a) is also a form of THT analysis (Illman et al., 2015), but it differs from the
high-resolution approach (Case 3c) based on SimSLE in VSAFT3. The HGS/PEST calibration (Case
3a) fits all drawdown/buildup data in a least-squares sense and restricts its effective parameter
estimates to 19 geologic zones, while SimSLE in VSAFT3 does not have this constraint. For this
reason, VSAFT3 can adjust more parameters such that the calibrated drawdown-time curves honor
the observed ones during each test. Therefore, VSAFT3’s estimates (Case 3c) yield better
Based on Tables S10 to S12 and Tables S14 to S16 (SI section), Case 3c: Highly
Parameterized THT Model consistently yields the best R2, L1 and L2 norms under both steady-state
and transient conditions, followed by Case 3a: PEST Calibrated Geological Model and Case 3b:
Averaged THT Geological Model. These results indicate that, even though the K fields (refer to
Figure 6, Cases 3a to 3c) reveal similar overall characteristics, local scale differences in K could
result, Case 3c: Highly Parameterized THT Model that can accurately map both interlayer and
For transient groundwater flow simulations, estimates of Ss are necessary. However, most
of the conventional and HPT methods do not yield these estimates. In addition, Ss is typically
considered to be much less variable than K, and thus has received less attention. As a result, the
flow simulation is analyzed by: (1) using an effective Ss from Zhao and Illman (2018), who treated
the multi-aquifer-aquitard system as homogeneous and isotropic; and (2) using estimated Ss
values from Case 3a: PEST Calibrated Geological Model of Zhao and Illman (2018).
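The comparison between the two Ss treatments, like the drawdown comparisons for the K cases, rests on misfit measures between observed and simulated drawdowns. This section does not restate the formulas, so the following minimal sketch assumes common definitions: L1 as the mean absolute error, L2 as the mean squared error, and R2 as the squared linear correlation between simulated and observed values. The drawdown values are hypothetical and serve only to illustrate the arithmetic; the function names are illustrative as well.

```python
import math


def l1_norm(observed, simulated):
    """Mean absolute error between observed and simulated drawdowns (assumed L1 definition)."""
    return sum(abs(o - s) for o, s in zip(observed, simulated)) / len(observed)


def l2_norm(observed, simulated):
    """Mean squared error between observed and simulated drawdowns (assumed L2 definition)."""
    return sum((o - s) ** 2 for o, s in zip(observed, simulated)) / len(observed)


def r_squared(observed, simulated):
    """Squared Pearson correlation between observed and simulated drawdowns (assumed R2 definition)."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_s = sum(simulated) / n
    cov = sum((o - mean_o) * (s - mean_s) for o, s in zip(observed, simulated))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_s = sum((s - mean_s) ** 2 for s in simulated)
    return cov ** 2 / (var_o * var_s)


# Hypothetical drawdowns in metres, not data from the NCRS.
observed = [0.10, 0.20, 0.30, 0.40]
sim_homogeneous = [0.05, 0.10, 0.15, 0.20]    # underpredicts every value by half
sim_heterogeneous = [0.09, 0.19, 0.31, 0.41]  # small scatter around the observations

for label, sim in [("homogeneous Ss", sim_homogeneous),
                   ("heterogeneous Ss", sim_heterogeneous)]:
    print(f"{label}: R2 = {r_squared(observed, sim):.3f}, "
          f"L1 = {l1_norm(observed, sim):.4f}, L2 = {l2_norm(observed, sim):.5f}")
```

In this made-up example, the simulation that underpredicts every drawdown by half still has R2 equal to 1, because R2 as defined here only measures linear association; its bias appears only in L1 and L2. This is one reason to read the three measures together.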
To answer the question of whether groundwater models should consider variability in Ss or
not, additional transient simulations are performed with seven pumping tests (PW1-3, PW1-5,
PW3-1, PW3-4, PW5-1, PW5-4, and PW5-5) for Cases 1a to 1c and Cases 2a to 2c. The
corresponding L1 and L2 norms are summarized in Tables S18 and S19 (SI section), where the blue
color represents the results from homogeneous Ss, while the yellow color represents results from
heterogeneous Ss. The bold values of L1 and L2 norms in Tables S18 and S19 indicate smaller
values for either the homogeneous or heterogeneous Ss case, identifying the case exhibiting less
Examination of Tables S18 and S19 reveals that for virtually all cases, providing
heterogeneous Ss estimates to each of the 19 layers in the model yields better transient simulation
results than utilizing a homogeneous Ss value encompassing all 19 layers. As a result, to achieve
more accurate transient groundwater flow simulation results, it may be advisable to devote more
effort to accurately capturing Ss heterogeneity at sites where the lithology changes significantly
building robust groundwater models for improved predictions of groundwater flow and solute
transport. There are several conventional approaches to estimate K including the use of empirical
and analytical formulae to interpret data from GSA, permeameter, slug and pumping tests. Over
the last two decades, several DP-based field tools such as DPIL and HPT have been developed to
The newer DP tools and interpretation methods have positioned DP surveys to become one of the
most efficient approaches for site characterization compared to conventional methods, although
models and more recent development and testing of HT have shown its effectiveness in yielding
Previously, various studies have been published that compared different methods of
estimating K, but there is a lack of consensus on which method yields K estimates that are most useful
for groundwater flow models. In this study, we utilize a groundwater flow model, constructed with
conventional techniques (Case 1a: GSA; Case 1b: Permeameter Tests; and Case 1c: Slug Tests);
(2) HPT survey data interpreted with three different models [Case 2a: McCall and Christy (2010);
Case 2b: Borden et al. (2021); and Case 2c: Zhao and Illman (2022b)]; and (3) three inverse
modeling approaches (Case 3a: PEST Calibrated Geological Model; Case 3b: Averaged THT
Geological Model; and Case 3c: Highly Parameterized THT Model) in terms of their ability to
predict drawdowns under both steady-state and transient conditions. This study leads to the
1. Despite the time and effort to conduct 270 GSA, 642 permeameter tests, and 43 slug tests,
conventional methods at the NCRS yielded biased K estimates that led to poor predictions
of drawdowns from pumping tests. Most empirical formulae applied with data from GSA
were developed for relatively permeable materials, which presents a challenge for their
2. The development of DP techniques and the HPT has significantly advanced our capabilities
high-resolution, the estimation of K from HPT survey data may require more attention than
previously thought. In this study, three separate approaches [i.e., Case 2a: McCall and
Christy (2010); Case 2b: Borden et al. (2021); Case 2c: Zhao and Illman (2022b)] were
utilized to estimate K. The K estimates obtained through the three different formulae were
each constrained through varying upper and lower bounds, which presented challenges in
characterizing low permeability materials such as silt and clay. Groundwater flow
simulations with K estimates derived from three formulae yielded biased predictions of
3. Inverse modeling of pumping test data with geology-based and highly parameterized
geostatistics-based HT models at the NCRS has shown that they yield robust estimates of
K and Ss that are useful for steady-state and transient groundwater flow simulations.
estimates that consistently led to accurate predictions of pumping tests not used in the
a highly parameterized groundwater flow model with parameter estimates from THT that
captured the most salient features of interlayer and intralayer K heterogeneity. Additional
characterization at sites where large changes to lithologies are found. While the accurate
prediction of drawdowns from pumping tests is promising, further studies are needed to
see whether these K distributions are useful for contaminant transport predictions.
4. Our research suggests that inverse modeling is a necessary step in building more robust
groundwater flow models echoing suggestions by Poeter and Hill (1997) and Carrera et al.
(2005). HT additionally fuses information from multiple pumping tests and can integrate
other data such as from geological investigations (e.g., Zhao and Illman, 2018),
geophysical surveys (e.g., Soueid Ahmed et al., 2015), flowmeter surveys (Li et al., 2008;
Aliouache et al., 2021; Luo et al., 2023), tracer tests (e.g., Yeh and Zhu, 2007; Illman et
al., 2010; Doro et al., 2014) and high-resolution pressure (Zhao and Illman, 2022a) as well
as K estimates from the HPT surveys (Zhao et al., 2023) that further improves parameter
estimates are highly dependent on model conceptualization, accuracy of data fed into
models including forcing functions (i.e., initial and boundary conditions, source/sink terms)
applied to models. Data fusion as part of inverse modeling is encouraged for building more
robust groundwater models and obtaining better parameter estimates but should be done
Acknowledgements
The HPT surveys conducted by Geoprobe Systems, GroundTech Solutions Ltd. and the University
of Waterloo (UW) at the NCRS were a result of discussions at the NovCare meeting held at the
University of Waterloo during the summer of 2019. We are very grateful to Wes McCall from
Geoprobe Systems and Jeff Bibbings from GroundTech Solutions Ltd. for visiting UW and
training our staff and students to conduct the HPT surveys at the NCRS. Walter A. Illman
acknowledges the partial support from the Discovery Grant awarded by the Natural Sciences and
Engineering Research Council of Canada (NSERC). Dongwei Sun acknowledges the support from
the Qinhuangdao Architecture Design Institute and Brayden McNeill from Aquanty Inc. who
provided guidance on building the initial HGS model for this study. Finally, we thank the
Executive Editor (Charles Andrews), Mike Fienen, and the two anonymous reviewers for
Supporting Information
Supporting Information is generally not peer reviewed. Supporting Information can be found in an
online document that contains additional details on the methods used to estimate K and Tables S1 to
References
Alexander, M., S. J. Berg, and W. A. Illman. 2011. Field study of hydrogeologic characterization
Aliouache, M., X. Wang, P. Fischer, G. Massonnat, and H. Jourde. 2021. An inverse approach
integrated subsurface and surface flow and solute transport. Waterloo, Ontario, Canada.
ARANZ Geo. Limited., 2015. Leapfrog Hydro 2.2.3. 3D Geological Modeling Software.
Beckie, R., and C. F. Harvey. 2002. What does a slug test measure: an investigation of
instrument response and the effects of heterogeneity. Water Resources Research 38, no.
12: 1290.
Berg, S. J., P. A. Hsieh, and W. A. Illman. 2011. Estimating hydraulic parameters when
Berg, S. J., and W. A. Illman. 2011a. Capturing aquifer heterogeneity: comparison of approaches
through controlled sandbox experiments. Water Resources Research 47, no. 9: W09514.
Bohling, G. C., X. Zhan, J. J. Butler, Jr., and L. Zheng. 2002. Steady shape analysis of
Bohling, G. C., J. J. Butler, Jr., X. Zhan, and M. D. Knoll. 2007. A field assessment of the value
Borden, R. C., K. Y. Cha, and G. Liu. 2021. A physically based approach for estimating
hydraulic conductivity from HPT pressure and flowrate. Ground Water 59, no. 2: 266–
272.
Bouwer, H. and R. C. Rice. 1976. A slug test method for determining hydraulic conductivity of
Brauchler, R., R. Hu, L. Hu, S. Jiménez, P. Bayer, P. Dietrich, and T. Ptak. 2013. Rapid field
Butler, J. J., Jr. 2019. The Design, Performance, and Analysis of Slug Tests. 2nd ed. CRC Press,
Butler, J. J., Jr. and J. M. Healey. 1998. Relationship between pumping test and slug-test
Butler, J. J., Jr. 2005. Hydrogeological methods for estimation of spatial variations in hydraulic
Butler, J. J., Jr., E. J. Garnett, and J. M. Healey. 2003. Analysis of slug tests in formations of high
Butler, J. J., P. Dietrich, V. Wittig, and T. Christy. 2007. Characterizing hydraulic conductivity
Cardiff, M., W. Barrash, M. Thoma, and B. Malama. 2011. Information content of slug tests for
Cardiff, M., W. Barrash, and P. K. Kitanidis. 2013. Hydraulic conductivity imaging from 3-D
Castagna, M., M. W. Becker, and A. Bellin. 2011. Joint estimation of transmissivity and
Evaluating the hydraulic conductivity at three different scales within an unconfined sand
Clauser, C., 1992. Permeability of crystalline rocks. Eos Transactions American Geophysical
Cooper, H. H., and C. E. Jacob. 1946. A generalized graphical method for evaluating formation
Dietrich, P., J. J. Butler, Jr. and K. Faiß. 2008. A rapid method for hydraulic profiling in
Doherty, J., and D. Welter. 2010. A short exploration of structural noise, Water Resources
PEST: complete theory and what it means for modelling the real world, Watermark
Doro, K. O., O. A. Cirpka, and C. Leven. 2014. Tracer tomography: Design concepts and field
experiments using heat as a tracer. Groundwater 53, no. S1: 139 – 148.
Fischer, P., A. Jardani, and N. Lecoq. 2018. Hydraulic tomography of discrete networks of
Geoprobe. 2015. Geoprobe ® Hydraulic Profiling Tool (HPT) System Standard Operating
Procedure.
Hazen, A. 1911. Discussion: Dams on sand foundations. Transactions, American Society of Civil
Hinsby, K., P. L. Bjerg, L. J. Andersen, B. Skov, and E. V. Clausen. 1992. A mini slug test
conductivity tensor of anisotropic media. 1. Theory. Water Resources Research 21, no.
11: 1655-1665.
Hu, R., R. Brauchler, M. Herold, and P. Bayer. 2011. Hydraulic tomography analog outcrop
study: Combining travel time and steady shape inversion. Journal of Hydrology 409, no.
1–2: 350–362.
Huang, S.-Y., J.-C. Wen, T.-C. J. Yeh, W. Lu, H.-L. Juan, C.-M. Tseng, J.-H. Lee, K.-C.
Hvorslev, M. J. 1951. Time Lag and Soil Permeability in Ground-Water Observations, Bull. No.
Army, 1–50.
Hyder, Z., J. J. Butler, Jr., C. D. McElwee, and W. Liu. 1994. Slug tests in partially penetrating
Illman, W. A. 2006. Strong field evidence of directional permeability scale effect in fractured
Illman, W. A., X. Liu, and A. J. Craig. 2007. Steady-state hydraulic tomography in a laboratory
Illman, W. A., X. Liu, S. Takeuchi, T.-C. J. Yeh, K. Ando, and H. Saegusa. 2009. Hydraulic
Illman, W. A., S. J. Berg, X. Liu, and A. Massi. 2010. Hydraulic/partitioning tracer tomography
Illman, W. A., S. J. Berg, and Z. Zhao. 2015. Should hydraulic tomography be interpreted using
127108.
Klute, A., and C. Dirksen. 1986. Hydraulic conductivity and diffusivity: laboratory methods.
Krumbein, W. C., and G. D. Monk. 1943. Permeability as a function of the size parameters of
inversion of flowmeter and pumping test data. Groundwater 46, no. 2: 193-201.
Liu, G., J. J. Butler, Jr., E. Reboulet, and S. Knobbe. 2012. Hydraulic conductivity profiling with
Liu, X., W. A. Illman, A. J. Craig, J. Zhu, and T.-C. J. Yeh. 2007. Laboratory sandbox validation
Luo, N., Z. Zhao, W. A. Illman, and S. J. Berg. 2017. Comparative study of transient hydraulic
Luo, N., Z. Zhao, W. A. Illman, Y. Zha, C.-M. W. Mok, and T.-C. J. Yeh (2023), Three-
e2022WR034034.
McCall, W., and T. M. Christy. 2010. Development of a Hydraulic Conductivity-Estimate for the
Hydraulic Profiling Tool (HPT) Abstract and Presentation, The 2010 North American
McCall, W., and T. M. Christy. 2020. The hydraulic profiling tool for hydrogeologic
no. 3: 89–103.
Determination of horizontal aquifer anisotropy with three wells, Ground Water 22, no. 1:
66-72.
Ning, Z., N. Luo, K. Inaba, T. Nakashima, T. Shimizu, and W. A. Illman. 2023. Three-
reproducibility, data density, and geological prior models. Journal of Hydrology 616:
128785.
Poeter, E. P. and M. C. Hill. 1997. Inverse model: A necessary next step in ground-water
Virginia, USGS.
Puckett, W. E., J. H. Dane, and B. F. Hajek. 1985. Physical and mineralogical data to determine
soil hydraulic properties. Soil Science Society of America Journal 49, no. 4: 831–836.
Rodrigues, J. D. 1983. The Noordbergum effect and characterization of aquitards at the Rio
distribution for different depositional environments. Ground Water 52, no. 3: 399–413.
Rovey II., C.W., and D. S. Cherkauer. 1995. Scale dependency of hydraulic conductivity
Sebol, L. A. 2000. Determination of groundwater age using CFCs in three shallow aquifers in
Soueid Ahmed, A., A. Jardani, A. Revil, J. P. Dupont. 2014. Hydraulic conductivity field
characterization from the joint inversion of hydraulic heads and self-potential data. Water
field, Montalto Uffugo Scalo, Italy. Water Resources Research 43, no. 7: W07432.
Sun, D., N. Luo, A. Vandenhoff, C. Wang, Z. Zhao, D. L. Rudolph, and W. A. Illman. 2022.
Evaluation of the hydraulic profiling tool (HPT) at a highly heterogeneous field site
Systems, 74 pp.
Theis, C. V. 1935. The relation between the lowering of piezometric surface and the rate of the
fracture network, and connectivity in mudstone. Ground Water 58, no. 2: 238–257.
Tong, X., W. A. Illman, S. J. Berg, and N. Luo. 2021. Hydraulic tomography analysis of
Verruijt, A. 1969. Elastic storage of aquifers. Flow through Porous Media, 1: 331–376.
Vienken, T., and P. Dietrich. 2011. Field evaluation of methods for determining hydraulic
conductivity from grain size data. Journal of Hydrology 400, no. 1–2: 58–71.
Vukovic, M., and A. Soro. 1992. Determination of Hydraulic Conductivity of Porous Media
Colorado.
aquifer: Spatial variability of hydraulic conductivity and its role in the dispersion
Wu, C.-M., T.-C. J. Yeh, J. Zhu, T. H. Lee, N.-S. Hsu, C.-H. Chen, and Sancho, A. F. 2005.
Xiang, J., T.-C. J. Yeh, C.-H. Lee, K.-C. Hsu, and J.-C. Wen. 2009. A simultaneous successive
linear estimator and a guide for hydraulic tomography analysis. Water Resources
Xie, Q. 2015. Slug tests analysis with different analytical models at a highly heterogeneous field
Yeh, T.-C. J., R. Srivastava, A. Guzman, and T. Harter. 1993. A numerical model for water flow
and chemical transport in variably saturated porous media. Ground Water 31, no. 4: 634–
644.
Yeh, T.-C. J., J. Mas-Pla, T. M. Williams, and J. F. McCarthy. 1995. Observation and three-
Yeh, T.-C. J., and S. Liu. 2000. Hydraulic tomography: development of a new aquifer test
Yeh, T.-C. J., and J. Zhu. 2007. Hydraulic/partitioning tracer tomography for characterization of
dense nonaqueous phase liquid source zones. Water Resources Research 43: W06435.
Yeh, T.-C. J., D. Mao, Y. Zha, J.-C. Wen, L. Wan, K.-C. Hsu, and C.-H. Lee. 2015. Uniqueness,
scale, and resolution issues in groundwater model parameter identification. Water Science
Zha, Y., T.-C. J. Yeh, W. A. Illman, T. Tanaka, P. Bruines, H. Onoe, H. Saegusa, D. Mao, S.
Zha, Y., T.-C. J. Yeh, W. A. Illman, C. M. W. Mok, C.-H. M. Tso, Y.-L. Wang. 2019.
Zhao, Z., W. A. Illman, T.-C. J. Yeh, S. J. Berg, and D. Mao. 2015. Validation of hydraulic
Zhao, Z., W. A. Illman, and S. J. Berg. 2016. On the importance of geological data for hydraulic
Zhao, Z., and W. A. Illman. 2018. Three-dimensional imaging of aquifer and aquitard
Zhao, Z., and W. A. Illman. 2022a. Integrating hydraulic profiling tool pressure logs and
Zhao, Z., S. J. Berg, W. A. Illman, and Y. Qi. 2022. Improving predictions of solute transport in
Zhao, Z., N. Luo, and W. A. Illman. 2023. Geostatistical analysis of high-resolution hydraulic
conductivity estimates from the hydraulic profiling tool and integration with hydraulic
Zhu, J., and T.-C. J. Yeh. 2005. Characterization of aquifer heterogeneity using transient
Case 1a: GSA (Three Models) 270 3.07×10−11 2.50×10−3 1.19×10−7 7.91 2.63
Case 1b: Permeameter Tests 642 1.15×10−10 4.63×10−3 3.03×10−7 7.60 1.55
Case 2a: HPT (McCall and Christy, 2010) 7,660 3.53×10−7 2.65×10−4 2.85×10−6 2.88 1.28
Case 2b: HPT (Borden et al. 2021) 7,660 1.13×10−8 2.69×10−4 5.78×10−6 4.38 0.38
Case 2c: HPT (Zhao and Illman, 2022b) 7,660 3.78×10−10 6.90×10−4 3.84×10−6 6.26 0.85
Case 3a: PEST Calibrated Geological Model 19 2.53×10−9 1.07×10−4 1.25×10−6 4.63 1.61
Case 3b: Averaged THT Geological Model 19 5.44×10−9 1.29×10−4 1.14×10−6 4.37 1.46
Case 3c: Highly Parameterized THT Model 31,713 4.20×10−11 2.90×10−3 5.79×10−7 7.84 1.47
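As a quick consistency check on the tabulated values, the sketch below recomputes the log10K range of several cases from their minimum and maximum K, assuming the range is defined as log10(Kmax/Kmin), and expresses the gap between HPT and conventional KG values in orders of magnitude. The numbers are copied from the rows listed here (minimum K, maximum K, and KG, all in m/s); the dictionary layout and function names are illustrative choices, and the geometric_mean helper assumes that KG denotes the geometric mean of the individual K estimates.

```python
import math

# (K_min, K_max, K_G) in m/s, copied from the tabulated rows.
table1 = {
    "Case 1a: GSA": (3.07e-11, 2.50e-3, 1.19e-7),
    "Case 1b: Permeameter Tests": (1.15e-10, 4.63e-3, 3.03e-7),
    "Case 2a: HPT (McCall and Christy, 2010)": (3.53e-7, 2.65e-4, 2.85e-6),
    "Case 2b: HPT (Borden et al. 2021)": (1.13e-8, 2.69e-4, 5.78e-6),
    "Case 2c: HPT (Zhao and Illman, 2022b)": (3.78e-10, 6.90e-4, 3.84e-6),
}


def log10_range(k_min, k_max):
    """Span of K in orders of magnitude, assuming range = log10(K_max / K_min)."""
    return math.log10(k_max / k_min)


def geometric_mean(values):
    """Geometric mean of positive K values, i.e. exp of the mean of ln K."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def orders_apart(k_a, k_b):
    """Number of orders of magnitude by which k_a exceeds k_b."""
    return math.log10(k_a / k_b)


for case, (k_min, k_max, k_g) in table1.items():
    print(f"{case}: log10K range = {log10_range(k_min, k_max):.2f}, KG = {k_g:.2e} m/s")

kg_hpt = table1["Case 2a: HPT (McCall and Christy, 2010)"][2]
kg_gsa = table1["Case 1a: GSA"][2]
kg_perm = table1["Case 1b: Permeameter Tests"][2]
print(f"Case 2a KG vs Case 1a KG: {orders_apart(kg_hpt, kg_gsa):.2f} orders of magnitude")
print(f"Case 2a KG vs Case 1b KG: {orders_apart(kg_hpt, kg_perm):.2f} orders of magnitude")
```

Under these assumptions, the recomputed ranges for the three HPT cases come out at about 2.88, 4.38 and 6.26, matching the tabulated values, and the Case 2a KG sits roughly 1.4 orders of magnitude above the GSA KG and about one order above the permeameter KG.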
Figure Captions
Figure 1. a) Schematic diagram in plan view showing the well configuration including the CMT
and PW well network and nine NC wells where geological data are obtained, as well as 11 HPT
profile locations. Gray dashed lines represent four geological cross sections A-A’, B-B’, C-C’ and
D-D’ as presented in Figure 2; b) 3D perspective view of wells and DP locations within the 15 m
× 15 m well cluster area shown as a blue dashed area in Figure 1a along with numbered well
Figure 2. Cross-sectional view of the 19-layer geological zonation model with CMT and PW
screened intervals shown in cross sections C-C’ and D-D’. Cross sections along A-A’ and B-B’
are available in Figure S1 of the Supporting Information (SI) section. The 19 layers represent 7
different material types as indicated in the stratigraphic index. The 7 material types were obtained
through examination of cores from 18 boreholes at the site. Specifically, the 19 layers indicated
on cross sections C-C’ and D-D’ are clay (1, 4, 8, 12, 16, 18), silt and clay (17, 19), silt (2, 7, 10,
14), sandy silt (6, 9, 13), silt and sand (5), sand (3, 11) and sand and gravel (15). On cross sections
C-C’ and D-D’, layer numbers are italicized and numbers along PW and CMT wells indicate port
Figure 3. Vertical profiles of log10K estimates from various approaches with K in units of m/s
Figure 4. Box-and-whisker plots of log10K estimates with K in units of m/s from various site
Figure 5. Log10KG estimates with KG in units of m/s from various site characterization
approaches for 19 layers of the geological model. Log10K values from Case 3c are also plotted,
but as the values are not provided in terms of layers, those values are plotted against Depth (m)
on the right axis based on the vertical profile of PW1 at the center of the simulation domain.
Figure 6. K distributions at the NCRS from various site characterization approaches. CMT and
PW well locations (red lines) along with their screened intervals (black colour) as well as HPT