Groundwater - 2023 - Sun

Research Paper
Evaluation of Hydraulic Conductivity Estimates from Various
Approaches with Groundwater Flow Models
Dongwei Sun
Department of Earth and Environmental Sciences, University of Waterloo, Waterloo, ON,
N2L3G1, Canada.
d34sun@uwaterloo.ca
Ning Luo
N2L3G1, Canada.
n2luo@uwaterloo.ca
Aaron Vandenhoff
N2L3G1, Canada.
aaron.vandenhoff@uwaterloo.ca
Wesley McCall
Geoprobe Systems Inc., 1835 Wall St., Salina, KS 67401, USA.
McCallw@geoprobe.com
Zhanfeng Zhao
Key Laboratory of Water Cycle and Related Land Surface Processes, Institute of Geographic
Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing, China.
zhaozhanfeng@igsnrr.ac.cn
Chenxi Wang
N2L3G1, Canada.
c592wang@uwaterloo.ca
David L. Rudolph
N2L3G1, Canada.
drudolph@uwaterloo.ca
Walter A. Illman
Corresponding author: Department of Earth and Environmental Sciences, University of Waterloo,
Waterloo, ON, N2L 3G1, Canada. 519-888-4567
willman@uwaterloo.ca
Conflict of Interest: None
This article has been accepted for publication and undergone full peer review but has not been
through the copyediting, typesetting, pagination and proofreading process which may lead to
differences between this version and the Version of Record. Please cite this article as doi:
10.1111/gwat.13348
This article is protected by copyright. All rights reserved.
Key Words: Hydraulic conductivity, specific storage, connectivity, grain size analysis,
permeameter test, slug test, direct push, hydraulic profiling tool, inverse modeling, hydraulic
tomography, geological model, groundwater model.
Article impact statement: Evaluating K from various approaches showed that inverse modeling
and data fusion are necessary steps in building robust groundwater models.
Abstract
Significant efforts have been expended toward improved characterization of hydraulic
conductivity (K) and specific storage (Ss) to better understand groundwater flow and contaminant
transport processes. Conventional methods including grain size analyses (GSA), permeameter,
slug and pumping tests have been utilized extensively, while Direct Push-based Hydraulic
Profiling Tool (HPT) surveys have been developed to obtain high-resolution K estimates.
Moreover, inverse modeling approaches based on geology-based zonations, and highly
parameterized Hydraulic Tomography (HT) have also been advanced to map spatial variations of
K and Ss between and beyond boreholes. While different methods are available, it is unclear
which one yields K estimates that are most useful for high resolution predictions of groundwater
flow. Therefore, the main objective of this study is to evaluate various K estimates at a highly
heterogeneous field site obtained with three categories of characterization techniques including:
(1) conventional methods (GSA, permeameter and slug tests); (2) HPT surveys; and (3) inverse
modeling based on geology-based zonations and highly parameterized approaches. The
performance of each approach is first qualitatively analyzed by comparing K estimates to site
geology. Then, steady-state and transient groundwater flow models are employed to
quantitatively assess various K estimates by simulating pumping tests not used for parameter
estimation. Results reveal that inverse modeling approaches yield the best drawdown predictions
under both steady and transient conditions. In contrast, conventional methods and HPT surveys
yield biased predictions. Based on our research, it appears that inverse modeling and data fusion
are necessary steps in predicting accurate groundwater flow behavior.

Introduction
Significant research has been conducted over the last several decades to better understand
groundwater flow and contaminant transport processes. Groundwater flow patterns and the subsurface
distribution of contaminants are primarily governed by the spatial distribution of hydraulic
conductivity (K) and specific storage (Ss). However, accurately delineating these parameters is very
difficult in complex groundwater flow systems because of high degrees of geological heterogeneity.
Inaccurate hydraulic parameter estimates lead to poor groundwater flow and solute transport
predictions. In addition, as demonstrated by Rehfeldt et al. (1992) and Yeh et al. (1995), the number
of K estimates required to capture heterogeneity well enough to forecast the migration of a tracer
plume increases significantly at sites with high degrees of geological variability. The large number
of K measurements required to capture heterogeneity presents significant challenges to implementing
conventional site characterization techniques.
Conventional methods such as empirical-relation-based grain size analyses (GSA),
laboratory permeameter analyses of core samples, and slug and pumping tests have been used in
water-supply investigations for several decades. However, most of these methods cannot efficiently
provide reliable and sufficient information on local heterogeneity (Butler, 2005; Alexander et
al., 2011). For example, laboratory analyses of core samples, such as GSA and permeameter tests,
can provide small-scale estimates of K at sampling locations. However, they are usually
time-consuming, sample recovery rates are often low for coarse-grained materials, and laboratory
experiments on repacked samples may introduce errors because repacked samples can deviate
significantly from in situ conditions (Klute and Dirksen, 1986; White, 1988). Moreover, K
variability between boreholes cannot be delineated without interpolating point-scale measurements.
Single well response tests or slug tests are usually conducted to provide point-scale K and Ss
estimates of materials representing a small volume surrounding the screened interval. While these
estimates are useful, they may not be representative of large-scale groundwater flow and solute
transport behavior. Also, considerable care must be taken because conditions at and near the well
can have significant impacts on K and Ss estimates (Beckie and Harvey, 2002; Butler, 2019). In
addition, using a solution (e.g., Hvorslev, 1951) that ignores inertial mechanisms can lead to a
significant overestimation of K (Butler et al., 2003). Therefore, appropriate slug test models should
be selected for data analysis to minimize interpretation errors.
To obtain larger-scale estimates of K and Ss representative of large-scale groundwater flow
and solute transport behavior at a site, pumping or injection tests with observation wells are
conducted. Various analytical solutions are available that can be used to obtain large-scale
estimates of K and Ss (e.g., Theis, 1935; Cooper and Jacob, 1946). In addition, some solutions yield
important insights into flow geometry and anisotropy in K (e.g., Neuman et al., 1984; Hsieh and
Neuman, 1985). However, estimates of K and Ss from the traditional interpretation of
pumping/injection tests are parameters averaged over large volumes that are frequently affected
by the scale effect (Clauser, 1992; Rovey and Cherkauer, 1995; Butler and Healey, 1998;
Vesselinov et al., 2001; Illman, 2006). Moreover, Wu et al. (2005) demonstrated that the estimated
K and Ss values from type curve and straight-line analyses of pumping tests in heterogeneous
aquifers are highly dependent on pumping and monitoring locations. Therefore, while the
estimates are useful for various applications, it is unclear what these parameters mean and how
useful they are in groundwater models. This is one important reason why novel approaches are
necessary for higher resolution subsurface characterization of K and Ss heterogeneity.
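As an illustration of the straight-line interpretation mentioned above, the sketch below applies the Cooper and Jacob (1946) approximation to synthetic late-time drawdown at a single observation well. It assumes a confined, homogeneous aquifer of known thickness b, so the resulting K and Ss are exactly the kind of volume-averaged, location-dependent effective values discussed above. The function name, the thickness, and all synthetic values are hypothetical.

```python
import numpy as np


def cooper_jacob_fit(t_s, drawdown_m, q_m3s, r_m):
    """Estimate transmissivity T and storativity S with the Cooper-Jacob straight line.

    Uses s = Q / (4*pi*T) * ln(2.25*T*t / (r**2 * S)), valid when
    u = r**2 * S / (4*T*t) is small (commonly u < 0.01).
    """
    log_t = np.log10(np.asarray(t_s, dtype=float))
    slope, intercept = np.polyfit(log_t, np.asarray(drawdown_m, dtype=float), 1)
    # slope is the drawdown change per log10 cycle of time
    transmissivity = q_m3s * np.log(10.0) / (4.0 * np.pi * slope)
    # t0 is the time at which the fitted straight line crosses s = 0
    t0 = 10.0 ** (-intercept / slope)
    storativity = 2.25 * transmissivity * t0 / r_m**2
    return transmissivity, storativity


# Hypothetical synthetic test: Q = 1e-3 m^3/s, r = 5 m, aquifer thickness b = 10 m
T_true, S_true, Q, r, b = 1e-3, 1e-4, 1e-3, 5.0, 10.0
t = np.logspace(3, 4, 20)  # late times (s), where u < 0.01
s = Q / (4.0 * np.pi * T_true) * np.log(2.25 * T_true * t / (r**2 * S_true))
T_est, S_est = cooper_jacob_fit(t, s, Q, r)
K_est, Ss_est = T_est / b, S_est / b  # effective K (m/s) and Ss (1/m)
```

Applied to field data from a heterogeneous aquifer, repeating this fit for different pumping and observation pairs generally returns different T and S, which is the location dependence reported by Wu et al. (2005).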
To better capture subsurface heterogeneity, significant efforts have been expended to map
the spatial distribution of K. One such example is the development of various direct push (DP)
methods over the last three decades. These include the DP slug test (DPST), DP permeameter (DPP),
DP injection logging (DPIL), and hydraulic profiling tool (HPT), which have been developed as
efficient alternatives to conventional well-based approaches for providing high-resolution vertical
profiles of K variability in shallow, unconsolidated aquifers (Hinsby et al., 1992; Butler et al., 2007;
Dietrich et al., 2008; McCall and Christy, 2010; Geoprobe, 2015). Specifically, the HPT can
rapidly obtain high-resolution (~1.5 cm) K profiles based on the ratio of the water injection rate to
the corrected down-hole water pressure measured in situ (McCall and Christy, 2020).
Most approaches described above can only provide K variations in the immediate vicinity of
a well or DP location, while reliable information away from or between boreholes is difficult to
obtain. As a result, inverse modeling methods (e.g., Poeter and Hill, 1997; Carrera et al., 2005)
have been developed as an alternative approach to estimate K and Ss by calibrating a groundwater
flow model consisting of geological zonations with ambient or anthropogenically modified
hydraulic head fields. Calibration of groundwater models through trial-and-error or with assistance
from nonlinear regression tools such as UCODE (Poeter and Hill, 1998) and PEST (Doherty, 2015)
can produce representative values of K and Ss if the zonation is accurate (Zhao et al., 2016; Tong
et al., 2021) and if many observed heads are available to derive statistically representative values
for each zone (Yeh et al., 2015). However, when the geological models are inaccurate, structural
noise is introduced (Doherty and Welter, 2010) and parameter estimates from inverse models can
be unrealistic with wide confidence intervals (e.g., Zhao et al., 2016; Luo et al., 2017).
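To make the zonation idea concrete, the following toy sketch calibrates the K values of two zones from observed heads. It is hypothetical throughout: a 1-D steady-state domain with two zones in series, a known specific discharge entering at x = 0, a fixed head at x = L, and noise-free synthetic heads. Because heads in this simple setting are linear in the zone resistances 1/K, ordinary linear least squares suffices; real zonation models calibrated with tools such as UCODE or PEST require iterative nonlinear regression.

```python
import numpy as np


def forward_heads(x_m, k1, k2, q, x_b, length, h_right):
    """Steady 1-D heads for two zones in series (hypothetical toy forward model).

    Zone 1 spans [0, x_b] with conductivity k1 and zone 2 spans [x_b, length]
    with k2 (m/s). A specific discharge q (m/s, positive toward +x) enters at
    x = 0 and the head is fixed at h_right at x = length. Integrating Darcy's
    law q = -K dh/dx from the fixed-head boundary gives piecewise-linear heads.
    """
    x = np.asarray(x_m, dtype=float)
    h_zone2 = h_right + q * (length - x) / k2
    h_zone1 = h_right + q * (length - x_b) / k2 + q * (x_b - x) / k1
    return np.where(x >= x_b, h_zone2, h_zone1)


def calibrate_two_zones(x_obs_m, h_obs_m, q, x_b, length, h_right):
    """Least-squares estimates of (k1, k2) from observed heads.

    Heads are linear in the zone resistances r1 = 1/k1 and r2 = 1/k2:
    h - h_right = q*max(x_b - x, 0)*r1 + q*(length - max(x, x_b))*r2.
    """
    x = np.asarray(x_obs_m, dtype=float)
    design = np.column_stack([
        q * np.maximum(x_b - x, 0.0),
        q * (length - np.maximum(x, x_b)),
    ])
    rhs = np.asarray(h_obs_m, dtype=float) - h_right
    (r1, r2), *_ = np.linalg.lstsq(design, rhs, rcond=None)
    return 1.0 / r1, 1.0 / r2


# Hypothetical synthetic "observations" generated from known zone values
x_obs = np.linspace(0.5, 9.5, 10)
h_obs = forward_heads(x_obs, k1=1e-4, k2=1e-6, q=1e-7, x_b=4.0, length=10.0, h_right=100.0)
k1_est, k2_est = calibrate_two_zones(x_obs, h_obs, q=1e-7, x_b=4.0, length=10.0, h_right=100.0)
```

Calling calibrate_two_zones with a misplaced zone boundary (for example x_b = 3.0 for data generated with x_b = 4.0) generally returns biased K values, a small-scale analogue of the structural noise introduced by an inaccurate geological model.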
More recently, hydraulic tomography (HT) has been developed as a new site characterization
approach to yield high resolution K and Ss estimates from deterministic or geostatistical inverse
modeling of multiple pumping tests. Specifically, HT uses the same equipment as traditional
pumping or injection tests and collects drawdown/buildup-time datasets at several surrounding
observation wells from tests conducted at different wells. The drawdown/buildup-time dataset
from a single test, interpreted with an appropriate inverse model, yields a snapshot of K and Ss
heterogeneity. Repeating these tests at different locations and interpreting them yields many
snapshots of K and Ss heterogeneity. However, quantitatively synthesizing these snapshots into
accurate K and Ss distributions requires advanced inverse
modeling techniques. Over the last two decades, HT has been tested through a number of synthetic
(e.g., Yeh and Liu, 2000; Bohling et al., 2002; Xiang et al., 2009; Zhu and Yeh, 2005; Hu et al.,
2011), laboratory (e.g., Liu et al., 2007; Berg and Illman, 2011a; Zhao et al., 2015, 2022; Luo et
al., 2017; Jiang et al., 2021), and field studies (Bohling et al., 2007; Straface et al., 2007; Illman et
al., 2009; Berg and Illman, 2011b; Huang et al., 2011; Castagna et al., 2011; Brauchler et al., 2013;
Cardiff et al., 2013; Zhao and Illman, 2018, 2022a; Fischer et al., 2018; Zha et al., 2016, 2019;
Tiedeman and Barrash, 2020; Luo et al., 2022; Ning et al., 2023; Zhao et al., 2023). HT data from
pumping or injection tests can be inverted sequentially or simultaneously, treating the medium
as homogeneous, as consisting of geology-based zonations, or as highly parameterized
(Illman et al., 2015). Steady-state hydraulic tomography (SSHT) can provide K estimates, while
transient hydraulic tomography (THT) can provide both K and Ss estimates. When pumping and
monitoring locations are sparse, HT yields smooth K and Ss distributions (Illman et al., 2009; Berg
and Illman, 2011b; Cardiff et al., 2013) that could also benefit from regularization of the inverse
problem (Doherty, 2015). For example, the integration of accurate geological information into HT
has shown that salient inter- and intra-layer heterogeneities of K can be imaged effectively (Zhao
et al., 2016; Luo et al., 2017) for both aquifer and aquitard units (Zhao and Illman, 2018).
Based on the diverse data collection and interpretation approaches that have been developed, the
methods described above can be classified into three categories of site characterization
methodologies. The first category consists of conventional methods including GSA, permeameter,
slug and pumping tests. The second category includes DP approaches with various tools such as
DPST, DPP, DPIL, and HPT. The third category consists of inverse modeling methods with
various degrees of model parameterization ranging from geological zonations to a highly
parameterized geostatistics-based HT approach.
A question frequently faced by hydrogeologists is which approach should be adopted to obtain K
estimates at a given site for groundwater modeling. A significant amount of research
has been conducted to examine the effectiveness of different approaches (Butler, 2005; Chapuis et
al., 2005; Butler et al., 2007; Alexander et al., 2011; Vienken and Dietrich, 2011; Liu et al., 2012;
Brauchler et al., 2013; Rosas et al., 2014; Zhao and Illman, 2018). For example, Vienken and
Dietrich (2011) utilized various empirical formulae to analyze grain size sieve results revealing
that mean K values varied by several orders of magnitude among the formulae. Alexander et al.
(2011) compared several conventional methods including GSA, permeameter, slug and pumping
tests showing that K estimates varied significantly from one method to another. Liu et al. (2012)
assessed multiple DP approaches including DPST, DPP, and DPIL. Zhao and Illman (2018)
evaluated inverse models built with different conceptualizations including effective parameters,
geology-based zonations, and a highly parameterized geostatistics-based HT approach.
Thus far, only a few studies have compared approaches from different categories. Butler et al.
(2007) assessed the first two categories including GSA and DP methods based on DPST and DPP.
Brauchler et al. (2013) compared the last two categories including DPIL and HT. However, there
is no consensus on which approach yields K estimates that are most representative of a field site
and useful for groundwater flow modeling.
The main objective of this study is to evaluate various K estimates from three categories of
site characterization methods at the well-studied North Campus Research Site (NCRS). The NCRS
is located on the University of Waterloo campus in Waterloo, Ontario, Canada (Alexander et al.,
2011), which is underlain by a multiple aquifer-aquitard system consisting of highly heterogeneous
glaciofluvial deposits. We chose what we believe are the most widely utilized site
characterization methods for K heterogeneity. Approaches evaluated include: (1) conventional
methods [Case 1a: GSA; Case 1b: Permeameter Tests; Case 1c: Slug Tests]; (2) HPT surveys with
three different formulae [Case 2a: McCall and Christy (2010); Case 2b: Borden et al. (2021); Case
2c: Zhao and Illman (2022b)]; and (3) various inverse modeling approaches [Case 3a: PEST
Calibrated Geological Model; Case 3b: Averaged THT Geological Model; Case 3c: Highly
Parameterized THT Model]. It is crucial to understand that each approach differs in terms of the
scale and resolution at which heterogeneity is captured as well as the types and quantity of data
that they rely on. Therefore, the performance of each approach is first qualitatively analyzed by
comparing K estimates to site geology. Then, we quantitatively evaluate the K estimates through
the independent prediction of pumping tests or other drawdown-inducing events that have not been
used during model calibrations as advocated by Illman et al. (2007) and Liu et al. (2007).
Specifically, a three-dimensional (3-D) forward groundwater model is developed using
HydroGeoSphere (HGS) (Aquanty, 2019) for forward simulations of steady-state drawdown data
from seven independent pumping tests that are not used for K estimation by any method evaluated
in this study. Then, transient forward runs are performed for simulations of transient drawdown
data from the same pumping tests. Methods yielding K estimates that result in the smallest
discrepancies between simulated and observed drawdowns are considered the most reliable for the
NCRS.
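The ranking criterion described above can be sketched as follows. This section does not specify the discrepancy metrics, so the example assumes two common ones, the mean absolute error (MAE) and the L2 norm of simulated-minus-observed drawdowns. The function names, method labels, and drawdown values are hypothetical and are not results from this study.

```python
import numpy as np


def drawdown_errors(observed_m, simulated_m):
    """Return the MAE and L2 norm of simulated-minus-observed drawdown residuals."""
    residuals = np.asarray(simulated_m, dtype=float) - np.asarray(observed_m, dtype=float)
    return {
        "mae": float(np.mean(np.abs(residuals))),
        "l2": float(np.sqrt(np.sum(residuals**2))),
    }


def rank_methods(observed_m, simulated_by_method, metric="mae"):
    """Order K-estimation methods from smallest to largest discrepancy."""
    scores = {
        name: drawdown_errors(observed_m, simulated)[metric]
        for name, simulated in simulated_by_method.items()
    }
    return sorted(scores, key=scores.get)


# Hypothetical drawdowns (m) at three observation ports for one validation test
observed = [0.10, 0.25, 0.40]
simulated = {
    "method A": [0.02, 0.10, 0.15],
    "method B": [0.11, 0.24, 0.42],
}
ranking = rank_methods(observed, simulated)
```

In practice the residuals would be pooled over all observation ports of the seven validation tests, for steady-state and transient simulations separately.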
Description of Field Site and Data Used for Analyses
Site Description and Hydrogeology
The shallow subsurface beneath the NCRS consists of the Waterloo Moraine, which is
a highly heterogeneous mixture of glaciofluvial deposits. Deposits at and below the surface
are mostly the outcome of advances and retreats of the Laurentide ice sheet lobes during glaciations.
Tills covering and concealing the bedrock were laid down directly by the ice, mixing materials of all
sizes from clay to boulders (Karrow, 1993).
Karrow (1979) drilled a 50-meter-long borehole to obtain continuous core samples of the
materials down to the bedrock. According to the drilling report, a thin silt layer lies below the top
organic soil, followed by the Tavistock till, which is composed of sandy-to-clayey silt but exists
only as erosional remnants. This till is underlain by a three-meter-thick sand sequence, followed by
the silty clay Maryhill till and the dense Catfish Creek till, which consists of silty sand and stony silt.
The Catfish Creek till extends approximately 20 meters below the ground surface and has been
treated as the lower hydraulic barrier of the NCRS (Alexander et al., 2011). Subsequent work
by Sebol (2000) and Alexander et al. (2011) revealed that the primary characteristic of the site is
the alternating and interfingering multi-aquifer-aquitard system consisting of two high-K units
separated by a discontinuous low-K layer. The lower aquifer consists of sandy gravel, while the
upper aquifer consists of sand to sandy silt. Hydraulic connections between the two aquifers are
known to be provided by the intervening low-K layer, and the aquifer is semi-confined. Aquitards
are also found above and below the two aquifers. Local stratigraphy is discontinuous, and the
presence of stratigraphic windows renders the site highly heterogeneous.
Available Field Data and the 19-layer Geological Model
The schematic configuration of wells at the NCRS in plan view is shown in Figure 1a. The
blue dashed box represents a nine-well pumping and observation network. Initially, Alexander et
al. (2011) installed four continuous multichannel tubing wells (CMT1 – CMT4), each with seven
observation ports, and a pumping well (PW1) screened at eight different elevations (i.e., PW1-1 to
PW1-8) (Figure 1b). Continuous sediment core samples were collected with recovery rates ranging
from 69% to 83% during well installations. Sample recovery was good, but they reported the
presence of periodic gaps in profiles that corresponded with less consolidated aquifer units
(Alexander et al., 2011).
To provide a comprehensive K profile along each borehole, 270 GSA and 471 falling head
permeameter tests were initially carried out using core samples from CMT1 – CMT4 and PW1 by
Alexander et al. (2011). Twenty-eight slug tests were also performed, one at each monitoring port of
the CMT systems. Later, two multi-screened wells (PW3, PW5) and two well clusters (PW2, PW4)
were installed and described by Berg and Illman (2011b). Fifteen additional slug tests were
performed at various intervals of PW1, PW3 and PW5 by Xie (2015) and interpreted using various
analytical models (Hvorslev, 1951; Bouwer and Rice, 1976; Hyder et al., 1994). Nine pumping
tests at PW1-3, PW1-4, PW1-5, PW3-3, PW3-4, PW4-3, PW5-3, PW5-4, and PW5-5 were
conducted mainly within aquifer layers during an HT survey by Berg and Illman (2011b). Zhao and
Illman (2018) then conducted six additional pumping/injection tests at PW1-1, PW1-6, PW1-7,
PW2-3, PW3-1, and PW5-1 with longer durations to stress both aquifer and aquitard units.
An additional 171 permeameter tests were also performed by Zhao and Illman (2017) on core
samples collected from the PW2, PW3, PW4, and PW5 wells.
To date, a total of 270 GSA, 642 permeameter analyses of core samples, 43 slug tests, and
15 pumping and injection tests were performed within the CMT and PW system. Moreover,
geophysical surveys were also performed at the NCRS, with Geoprobe DP surveys first conducted
in April of 2015 at eight locations to obtain electrical conductivity (EC) profiles (Williamson,
2016). During the summer of 2019, Sun et al. (2022) carried out HPT surveys at 11 DP locations
(HPT1 – HPT10 and HPT6-2). Figure 1b shows a 3-D perspective view of wells and DP locations
(HPT1 ~ HPT6-2) within and around the 15 m × 15 m well clustering area, along with
pumping and observation locations, bentonite seals, and high-resolution HPT survey intervals.
Figure 2 shows cross-sectional views (orientations of cross sections are indicated in Figure
1a) of the 3-D geological zonation model created by Zhao and Illman (2017) for the NCRS,
containing 19 different layers representing seven different material types. The model was
constructed by examining lithology information obtained from 18 boreholes completed to different
depths at the site. Further details on the construction of the geological model are provided in Zhao
and Illman (2017).
The geological model is 70 m × 70 m × 17 m in extent and was constructed with the
commercial software Leapfrog Geo (ARANZ Geo. Limited, 2015), which interpolates various data
types to quickly construct geological models. Locations of the CMT and PW wells and screened
intervals are shown in the C-C’ and D-D’ cross sections in Figure 2, and A-A’ and B-B’ cross
sections in Figure S1 of the Supporting Information (SI) section. The interpolated geology between
boreholes based on known lithology is a reasonable representation of the site. The complex and
truncated layering of different sediment types indicates the highly heterogeneous nature of the
glaciofluvial deposits at the NCRS.
Insert Figures 1 and 2 here
Description of Various K Estimation Methods
Case 1a: Empirical Formulae Applied to Grain Size Analyses (GSA) Results
The first method considered in this study is the application of various empirical formulae to
results from GSA. Specifically, many empirical formulae have been developed to establish
relationships between K and particle size statistics (Vienken and Dietrich, 2011; Rosas et al. 2014;
Devlin, 2015). This method is cost-efficient compared to other conventional approaches for
obtaining rapid estimates of K, avoiding the need to conduct permeameter tests on core samples or
to install wells for slug or pumping tests. However, the highly heterogeneous conditions at the
NCRS pose significant challenges to the analysis because most equations described in the
literature were developed for relatively permeable materials such as sand (e.g., Krumbein and
Monk, 1943; Kozeny, 1953). Thus, it is difficult to determine whether a single empirical
relationship is suitable for the range of unconsolidated materials at the site. In this study, three different
models were applied to derive K estimates from core samples of different materials. Specifically,
the Hazen (1911) model was used for coarse-grained sediments, the Puckett et al. (1985)
relationship for fine-grained sediments, and the Barr (2001) formula for intermediate-grained
sediments, with details provided in the Supporting Information (SI) section.
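As a hedged illustration of how a formula of this kind is applied, the sketch below interpolates the effective grain size d10 from a sieve curve on a logarithmic size scale and then applies the Hazen form K = C·d10², with K in cm/s and d10 in cm. The coefficient C = 100 is an assumed default (commonly quoted values range from about 40 to 150). The exact forms and coefficients used in this study, including the Puckett et al. (1985) and Barr (2001) relationships, are given in the SI and are not reproduced here; function names are hypothetical.

```python
import numpy as np


def d10_from_sieve(sizes_mm, percent_passing):
    """Interpolate d10 (mm) on a log10 grain-size scale from cumulative percent passing."""
    sizes = np.asarray(sizes_mm, dtype=float)
    passing = np.asarray(percent_passing, dtype=float)
    order = np.argsort(sizes)  # np.interp needs increasing x values (percent passing)
    sizes, passing = sizes[order], passing[order]
    if not (passing.min() <= 10.0 <= passing.max()):
        raise ValueError("10 % passing lies outside the measured sieve range")
    log_d10 = np.interp(10.0, passing, np.log10(sizes))
    return float(10.0 ** log_d10)


def hazen_k_m_per_s(d10_mm, c=100.0):
    """Hazen-type estimate K = C * d10**2 (K in cm/s, d10 in cm), returned in m/s.

    C = 100 is an assumed default; the relation is usually considered applicable
    to sands with d10 of roughly 0.1 to 3 mm, not to silts or clays.
    """
    d10_cm = d10_mm / 10.0
    k_cm_s = c * d10_cm**2
    return k_cm_s / 100.0  # cm/s -> m/s


# Hypothetical sieve curve: sizes (mm) and cumulative percent passing
d10 = d10_from_sieve([0.063, 0.125, 0.25, 0.5, 1.0, 2.0], [2, 8, 30, 70, 95, 100])
k_hazen = hazen_k_m_per_s(d10)
```

The applicability range noted in the docstring is one reason a single formula is unsuitable across the fine-, intermediate-, and coarse-grained materials at the NCRS.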
Case 1b: Permeameter Tests
Another traditional method for obtaining K estimates is to conduct laboratory permeameter
analyses of repacked samples retrieved during well drilling and borehole logging. During previous
work by Alexander et al. (2011) and Zhao and Illman (2017), a total of 642 temperature-corrected
falling head permeameter analyses were performed on repacked samples to estimate K based on a
formula provided in Freeze and Cherry (1979). Details are provided in the SI section.
A permeameter test preferentially determines vertical K. As reported by
Klute and Dirksen (1986), K values of repacked samples estimated in the laboratory can be
artificially lower than those of intact samples. In addition, the extraction and repacking
processes may induce fractures and destroy the internal structures that are well preserved in intact
samples, although Sudicky (1988) demonstrated that the potential error caused by using repacked
samples in permeameter tests is small compared to the K heterogeneity. Moreover, it is very difficult to
recover substantial intact core samples from highly permeable zones (Butler, 2005; Alexander et
al., 2011). Therefore, underprediction of K is possible for permeameter tests conducted with
materials from highly permeable intervals.
Case 1c: Slug Tests
At the NCRS, 28 slug tests were conducted by Alexander et al. (2011) in all seven monitoring
intervals of the four CMT wells (i.e., CMT1 – CMT4) and 15 tests by Xie (2015) at open intervals
of the PW1, PW3, and PW5 wells, resulting in a total of 43 tests. The head responses recorded during
these tests are amenable to standard slug test analytical solutions. For this study,
all tests were interpreted with the Hvorslev (1951) model, with details provided in the SI section.
Because the slug tests at the site were conducted in existing observation intervals rather than with
DP equipment to obtain high-resolution K estimates, the results are grouped with the conventional
methods.
Slug tests are suitable for materials with moderate to low K, although high-K
materials can also be tested and analyzed. Moreover, the volume sampled by a slug test is
usually considered to be much smaller than that of a pumping test, and the estimated hydraulic
parameters are representative only of materials around the test interval; given site heterogeneity,
they are usually not representative of materials between boreholes (Butler, 1997).
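A minimal sketch of the Hvorslev (1951) interpretation is given below, assuming the commonly used form for a screen whose length-to-radius ratio Le/R exceeds 8: K = rc² ln(Le/R) / (2 Le T0), where rc is the casing radius, R the screen (or filter-pack) radius, Le the screen length, and T0 the basic time lag at which H/H0 falls to 0.37. Here T0 is estimated from a straight-line fit of ln(H/H0) against time. Function and parameter names are hypothetical, and, as noted in the Introduction, this model neglects inertial effects.

```python
import warnings

import numpy as np


def hvorslev_k(t_s, head_ratio, r_casing_m, r_screen_m, screen_length_m):
    """Hvorslev K (m/s) from normalized head H/H0 versus time.

    Ideal response: ln(H/H0) = -t / T0, so T0 = -1 / slope of ln(H/H0) vs t.
    """
    if screen_length_m / r_screen_m <= 8.0:
        warnings.warn("Le/R <= 8: this form of the Hvorslev equation may not apply")
    t = np.asarray(t_s, dtype=float)
    log_ratio = np.log(np.asarray(head_ratio, dtype=float))
    slope, _ = np.polyfit(t, log_ratio, 1)
    t0 = -1.0 / slope  # basic time lag (s)
    return (r_casing_m**2 * np.log(screen_length_m / r_screen_m)
            / (2.0 * screen_length_m * t0))


# Hypothetical recovery with T0 = 30 s, rc = 2.5 cm, R = 5 cm, Le = 1 m
t = np.linspace(0.0, 120.0, 25)
k_slug = hvorslev_k(t, np.exp(-t / 30.0), 0.025, 0.05, 1.0)
```

A log-linear fit is used rather than reading T0 at a single point so that all recovery data contribute; an oscillatory (underdamped) response would not be log-linear and would require a model that accounts for inertia.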
Cases 2a – 2c: HPT Surveys
Eleven HPT surveys were conducted by Sun et al. (2022) (Figure 1a) to characterize the
high-resolution variability of K to an approximate depth of 17 m with the HPT probe (Model
K6050; Geoprobe). During advancement of the HPT probe, water was continuously injected
through a screen (1 cm in diameter) on the side of the probe, and the corresponding water pressure,
injection flow rate (Q), and electrical conductivity (EC) were recorded electronically at
1.5-cm vertical intervals. Because of the highly heterogeneous nature of NCRS sediments, the
average rate at which the HPT probe was advanced ranged from 1.4 to 2.2 cm/s across the 11
surveys, depending on sediment type. In this study, three different formulae [Case 2a: McCall and
Christy (2010); Case 2b: Borden et al. (2021); Case 2c: Zhao and Illman (2022b)] were utilized to
convert the collected data to K estimates, with details provided in the SI section.
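The HPT interpretation described in the Introduction is based on the ratio of the water injection rate Q to the corrected down-hole pressure. The sketch below illustrates that idea under stated assumptions: the corrected pressure is taken as the measured injection pressure minus the hydrostatic pressure below an assumed water table, and K is obtained from a generic log-linear relation log10 K = a + b·log10(Q/Pc) with placeholder coefficients a and b, clamped at a lower bound (defaulting to the 3.5 × 10⁻⁷ m/s lower bound reported below for Case 2a). These coefficients are hypothetical and do not reproduce the McCall and Christy (2010), Borden et al. (2021), or Zhao and Illman (2022b) relationships given in the SI.

```python
import numpy as np

RHO_WATER = 1000.0  # water density (kg/m^3), assumed constant
GRAVITY = 9.81      # gravitational acceleration (m/s^2)


def corrected_pressure_kpa(p_injection_kpa, depth_m, water_table_depth_m):
    """Subtract hydrostatic pressure below an assumed water table (hypothetical correction)."""
    submerged_m = np.maximum(np.asarray(depth_m, dtype=float) - water_table_depth_m, 0.0)
    p_hydrostatic_kpa = RHO_WATER * GRAVITY * submerged_m / 1000.0
    return np.asarray(p_injection_kpa, dtype=float) - p_hydrostatic_kpa


def hpt_k_estimate(q_ml_min, p_corr_kpa, a, b, k_floor_m_s=3.5e-7):
    """Generic log-linear K (m/s) from Q/Pc with placeholder coefficients a and b."""
    p_corr = np.asarray(p_corr_kpa, dtype=float)
    if np.any(p_corr <= 0.0):
        raise ValueError("corrected pressure must be positive")
    ratio = np.asarray(q_ml_min, dtype=float) / p_corr
    k = 10.0 ** (a + b * np.log10(ratio))
    # readings below the resolvable range are reported at the fixed lower bound
    return np.maximum(k, k_floor_m_s)


# Hypothetical readings at 12 m depth with the water table assumed at 2 m
p_corr = corrected_pressure_kpa([200.0, 150.0], [12.0, 12.0], 2.0)
k_hpt = hpt_k_estimate([1e-6, 100.0], [100.0, 100.0], a=-5.0, b=1.0)
```

The clamp illustrates why a formula with a fixed lower bound can return a constant K over an interval even where other methods show variation.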
Case 3a: PEST Calibrated Geological Model
An effective way of capturing the spatial variation of hydraulic parameters is to develop
stratigraphic or zonation models, in which hydraulic parameters in each zone are treated as
homogeneous and their values are estimated from pumping tests or ambient hydraulic head
data through trial-and-error or automated calibration methods (Doherty, 2015). At the NCRS, Zhao
and Illman (2018) built a zonation model based on the 19-layer geological model and jointly
calibrated it with 522 transient data points from 176 drawdown/buildup curves obtained during eight
pumping tests (PW1-1, PW1-4, PW1-6, PW1-7, PW2-3, PW3-3, PW4-3, and PW5-3) to estimate K
and Ss. The calibration was performed by coupling HGS (Aquanty, 2019) with the parameter
estimation code PEST (Doherty, 2005), treating elements in each layer as homogeneous
and isotropic to simplify the analysis, which resulted in 19 pairs of K and Ss estimates. The model
was discretized into 31,713 rectangular finite elements of varying sizes with 34,816 nodes for
inverse modeling. From the central well cluster area to the model boundary, the element size
gradually increased, with blocks expanding from 0.5 m × 0.5 m × 0.5 m to 5 m × 5 m × 0.5 m.
The computational mesh is provided as Figure S2 in the SI section.
In the work of Zhao and Illman (2018), the unsaturated zone at the NCRS was not considered,
and the water table was designated as the upper boundary. The water table was modelled as a flat
surface since the change in water level was less than the height of the elements at the top. The
Catfish Creek till was treated as a hydraulic barrier (Alexander et al., 2011) and served as the lower
boundary. The top and bottom model boundaries were treated as impermeable, while
the remaining four boundaries were treated as constant head boundaries, as in our previous inverse
models built for the site (Berg and Illman, 2011b; Zhao and Illman, 2018, 2022a).
Case 3b: Averaged THT Geological Model and Case 3c: Highly Parameterized
THT Model
The same datasets utilized to calibrate the geology-based groundwater flow model were also
utilized for THT analysis by Zhao and Illman (2018) to map the K and Ss heterogeneity at the
NCRS using VSAFT3 (Variably Saturated Flow and Transport 3-D Model) (Yeh et al., 1993),
which utilizes the Simultaneous Successive Linear Estimator (SimSLE) (Xiang et al., 2009) for
geostatistical inverse modeling. Settings of the numerical model (model discretization, initial and
boundary conditions) were the same as those of the geology-based zonation model described in the
previous section. Furthermore, results from the calibrated geology-based zonation model were
utilized as initial K and Ss guesses for the inversion of the THT model.
The estimated K and Ss values at 31,713 finite elements from the THT analysis were then
averaged for each layer by taking the geometric mean based on the geological model to compare
with estimates from the calibrated geology-based zonation model using PEST (Doherty, 2005) and
other estimates from this study. This resulted in 19 pairs of K and Ss estimates, one per layer, which we
refer to as Case 3b: Averaged THT Geological Model, while Case 3c: Highly Parameterized THT
Model utilizes all 31,713 K and Ss estimates.
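The layer-averaging step for Case 3b can be sketched as follows: the log10 K values of all elements assigned to a layer are averaged and transformed back, which is equivalent to the geometric mean. The sketch assumes each element carries a 0-based integer layer index and weights all elements equally; whether element volumes were used as weights is not stated in this section, and a volume-weighted variant would multiply each log10 K by its element volume before summing. The same function applies to Ss, and its name is hypothetical.

```python
import numpy as np


def layer_geometric_means(k_elements, layer_ids, n_layers):
    """Geometric mean of element values per layer (equal element weights assumed).

    layer_ids holds a 0-based layer index for each element; layers without any
    element are returned as NaN.
    """
    log_k = np.log10(np.asarray(k_elements, dtype=float))
    ids = np.asarray(layer_ids, dtype=int)
    sums = np.bincount(ids, weights=log_k, minlength=n_layers)
    counts = np.bincount(ids, minlength=n_layers)
    means = np.full(n_layers, np.nan)
    occupied = counts > 0
    means[occupied] = 10.0 ** (sums[occupied] / counts[occupied])
    return means


# Hypothetical example: three elements spread over two of three layers
k_layer = layer_geometric_means([1e-4, 1e-6, 1e-5], [0, 0, 1], n_layers=3)
```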
Qualitative Comparison of K estimates
Figure 1 shows that CMT1 is spatially close to HPT3, while CMT3 is close to both HPT6
and HPT6-2. Therefore, K estimates at CMT1 and CMT3 from Case 1a: GSA, Case 1b:
Permeameter Tests, Case 1c: Slug Tests, and various inverse modeling approaches (Cases 3a – 3c)
could be qualitatively and quantitatively compared with HPT results obtained at adjacent DP
locations.
Insert Figure 3 here
Figure 3 summarizes the vertical profiles of log10K estimates along CMT3 and HPT6 from
GSA, permeameter tests, slug tests, and various inverse modeling approaches along with site
stratigraphy at these locations. Similar figures for vertical profiles along CMT1 and HPT3 (Figure
S3a), as well as CMT3 and HPT6-2 (Figure S3b) are provided in the SI section.
Results show that K measurements are highly variable, spanning approximately seven orders
of magnitude across the two CMT wells and indicating the highly heterogeneous nature of K at the
site. Figure 3 reveals that the K variability from one layer to another (i.e., interlayer heterogeneity),
which reflects the alternating aquifer-aquitard system, can be captured by most of the methods, while
only small-scale measurements from Case 1a: GSA, Case 1b: Permeameter Tests, Cases 2a – 2c:
HPT surveys, and Case 3c: Highly Parameterized THT Model reveal heterogeneity within
individual layers (i.e., intralayer heterogeneity).
In terms of conventional methods, point-scale measurements of K from Case 1a: GSA and
Case 1b: Permeameter Tests generally follow a similar trend. Case 1c: Slug Test results also follow
the trend, but the measured K values are generally larger than the Case 1a: GSA and Case 1b:
Permeameter Test estimates, especially in highly permeable zones, potentially reflecting a scale
effect (Clauser, 1992; Rovey and Cherkauer, 1995; Butler and Healey, 1998; Vesselinov et al.,
2001; Illman, 2006). HPT results at the three DP locations generally follow the trend of K from
permeameter tests of samples from the collocated CMT wells, although the HPT K estimates are
around 1 to 2 orders of magnitude larger than those from the permeameter tests, especially from
4 m to 8 m, where the local geology at the collocated CMT wells consists primarily of
low-permeability materials such as clay and silt (Figure 3). Using the various site-dependent
formulae yields similar results in this upper depth range. Significant differences are observed in
the middle and lower portions of the site.
Examination of Figure 3 at around 8 m – 10 m based on the core log reveals that local
geology consists primarily of highly permeable materials such as sand, thus K estimates from
permeameter tests, Case 1a: GSA and Case 1c: Slug Tests all yield relatively higher estimates of
K than for the silt materials located above and below, whereas HPT estimates using Case 2a:
McCall and Christy (2010)’s model yield only a fixed K estimate at the lower bound of
3.5 × 10⁻⁷ m/s. The
Case 2b: Borden et al. (2021) and Case 2c: Zhao and Illman (2022b) models both provide K
estimates that are higher than those generated by McCall and Christy (2010)’s relationship.
Similarly, at depths from 9 m to 11 m, the core logs show a transition zone from sand to silt, and
K measurements from Case 1a: GSA and Case 1b: Permeameter Tests both reflect this variation.
However, none of the K estimates from the three HPT formulae detect this variation, and Case 2a:
McCall and Christy (2010)’s model yields only a fixed lower bound. The Catfish Creek till, located
at depths below 12 m at CMT1 and below 14 m at CMT3, is detected by a significant drop in K
estimates from Case 1a: GSA and Case 1b: Permeameter Tests. Case 2a: McCall and Christy
(2010)’s model yields a fixed lower bound, while Case 2b: Borden et al. (2021)’s model generates
even higher estimates of K. On the other hand, Case 2c: Zhao and Illman (2022b)’s model at HPT3
and HPT6-2 yields K estimates that are close to those of Case 1a: GSA and Case 1b: Permeameter
Tests, which is encouraging as this formula was derived by fitting K estimates mostly in the range
of 3.5 × 10⁻⁷ m/s to 6.9 × 10⁻⁴ m/s. Case 1c: Slug Tests yield K estimates that are in general
smaller than those estimated by HPT, but few slug test data are available.
The K estimates from various inverse modeling approaches (Cases 3a – 3c) are also plotted
for comparison (Figure 3). For Case 3a: PEST Calibrated Geological and Case 3b: Averaged THT
Geological Models, uniform K values are assigned to each of the 19 layers, while Case 3c:
Highly Parameterized THT model yields spatially variable K estimates along the depth of the
borehole.
From 0 m to 3 m, K estimates from Case 3c: Highly Parameterized THT Model are quite
smooth because there are no monitoring data available for inversion, thus the estimated K values
are nearly identical to the initial K estimate input to THT analysis. Beneath 3 m, it is evident that
THT results follow the general pattern of K variability reflecting the site stratigraphy including
small-scale interlayer heterogeneity. For example, the transition zones at 3 m and 8 m at CMT3
are not captured by the other methods. However, THT results near the bottom of CMT3, from 14 m
to 16 m, indicate an increase in K, which does not conform to the stratigraphy at this location.
Statistical Analysis of K from Various Site Characterization Approaches
Descriptive Statistics of K from Various Approaches
Table 1 summarizes the descriptive statistics of K from various approaches at the NCRS. It
is worth mentioning that only 10 HPT surveys (HPT2 to HPT10 and HPT6-2) were utilized for this
study, because no dissipation test was conducted at HPT1 and the corresponding K values may
therefore be less reliable. The reported statistics include the minimum, maximum, geometric mean
of K (KG), range of log10K, and variance of log10K (σ²log10K). Another version of the table based
on the natural logarithm of K (ln K) is provided as Table S1 in the SI section.
Insert Table 1 here
Examination of Table 1 shows that HPT surveys (Cases 2a – 2c) yield a significantly larger
number of K estimates owing to the 1.5 cm profiling intervals at each DP location. Case 3c:
Highly Parameterized THT Model has the largest number of estimated K values owing to the highly
parameterized nature of the geostatistical inversion approach.
The geometric means of K (KG) from the HPT methods (Cases 2a – 2c) are higher than those
generated through traditional methods because of the technical limitation of HPT in low-K
materials. In addition, KG increases from Case 1a: GSA to Case 1b: Permeameter Tests and to
Case 1c: Slug Tests, potentially due to a scale effect. Moreover, Case 2a: McCall and Christy
(2010)’s model exhibits the smallest range of log10K because of its fixed upper and lower K limits.
Although neither Case 2b: Borden et al. (2021)’s nor Case 2c: Zhao and Illman (2022b)’s model
fixes a lower K limit, HPT measurements with Q less than 10 ml/min in low-K sediments were
considered to be inaccurate and excluded, as suggested by Liu et al. (2012). Therefore, Case 2b:
Borden et al. (2021)’s model extends the range mainly at the lower end. In contrast, Case 2c:
Zhao and Illman (2022b)’s model extends the range at both the upper and lower ends and yields
the largest range of log10K among the three formulae used to interpret HPT data. Case 1c: Slug
Tests yield a relatively small range of K estimates, while Case 1a: GSA with three models, Case 1b:
Permeameter Tests, and Case 3c: Highly Parameterized THT Model all yield larger ranges of
estimates.
In terms of the variance of log10K (σ²log10K), Case 1a: GSA yields the highest σ²log10K of 2.63,
perhaps because three empirical models were used to target different soil classes. Case 1b:
Permeameter Tests yield the second highest σ²log10K among the conventional methods at 1.55,
which is comparable to the value for Case 1c: Slug Tests (1.47) despite the latter having the
smallest number of available measurements (n = 43). It is also noteworthy that the Case 3a: PEST
Calibrated Geological Model, Case 3b: Averaged THT Geological Model, and Case 3c: Highly
Parameterized THT Model yield comparable σ²log10K values of 1.61, 1.46, and 1.47, respectively.
In contrast, K estimates from the HPT tend to result in smaller σ²log10K values: the Case 2b:
Borden et al. (2021) and Case 2c: Zhao and Illman (2022b) models yield σ²log10K estimates of 0.38
and 0.86, respectively, while Case 2a: McCall and Christy (2010)’s model results in a σ²log10K of
1.28, which is somewhat lower than, but closer to, the values from the other approaches.
Assignment of K for the 19 Geological Model Layers
The point scale K measurements from various approaches in Cases 1a – 1c and 2a – 2c were
then used to populate the 19-layer geological model by taking the KG of all data points located in
each layer. For layers with no sample data, KG values from layers of similar sediment material
were assigned. For example, because only 43 K estimates in 11 of the 19 layers of the geological
model were available from Case 1c: Slug Tests, populating the model was most difficult for this
case. As a result, KG from layers 4, 8, 16, and 18 (clay) was assigned to layers 1 and 12 (clay); KG
from layers 2, 7, and 14 (silt) was assigned to layer 10 (silt); KG from layer 13 (sandy silt) was
assigned to layers 6 and 9 (sandy silt); KG from layer 11 (sand) was assigned to layer 3 (sand); KG
from layer 17 (clay & silt) was assigned to layer 19 (clay & silt); and KG from layers 3 and 11 (sand)
and 2 and 10 (silt) was assigned to layer 5 (sand & silt). Similar steps were performed for the other
methods whenever layers did not contain any K estimates; these are described beneath Tables S2
– S7 in the SI section.
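The gap-filling procedure for Case 1c can be sketched as below. The donor-layer map is transcribed from the text. Pooling the raw K values of the donor layers before taking KG is an assumption, since the text does not say whether raw values or per-layer KG values were combined. The measurements in the usage example are invented for illustration.

```python
import math

# Donor layers used to fill Case 1c (slug test) layers without data, as listed in the text.
SLUG_TEST_DONORS = {
    1: [4, 8, 16, 18], 12: [4, 8, 16, 18],  # clay
    10: [2, 7, 14],                          # silt
    6: [13], 9: [13],                        # sandy silt
    3: [11],                                 # sand
    19: [17],                                # clay & silt
    5: [3, 11, 2, 10],                       # sand & silt
}


def geometric_mean(values):
    return 10 ** (sum(math.log10(v) for v in values) / len(values))


def populate_layers(measurements, donors, n_layers=19):
    """Return {layer: KG} for layers 1..n_layers.

    measurements: {layer: [K, ...]} for layers that contain K estimates.
    Layers without data pool the raw K values of their donor layers (assumption).
    """
    model = {}
    for layer in range(1, n_layers + 1):
        data = measurements.get(layer, [])
        if not data:
            data = [k for d in donors.get(layer, []) for k in measurements.get(d, [])]
        if not data:
            raise ValueError(f"layer {layer} has no data and no donor layer with data")
        model[layer] = geometric_mean(data)
    return model


# Invented values (m/s), one per measured layer, purely to exercise the function.
example = {2: [1e-6], 4: [1e-9], 7: [1e-6], 8: [1e-9], 11: [1e-4], 13: [1e-7],
           14: [1e-6], 15: [1e-3], 16: [1e-9], 17: [1e-8], 18: [1e-9]}
kg_by_layer = populate_layers(example, SLUG_TEST_DONORS)
print(kg_by_layer[5])
```

Under raw pooling, donor layers 3 and 10 (which themselves lack slug test data) contribute nothing to layer 5, so its value comes from layers 11 and 2 only; a per-layer KG scheme could give a different result.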
Additionally, the maximum, upper quartile, median, KG, lower quartile, and minimum of
log10K values estimated from all investigated approaches (except for Case 3a: PEST Calibrated
Geological Model and Case 3b: Averaged THT Geological Model) were computed for each
geological layer and plotted as box-and-whisker plots in Figure 4, while their numerical values
were summarized in Tables S2 – S9 of the SI section. The lower and upper limits of K estimates
from Case 2a: McCall and Christy (2010)’s model and the upper limit of Case 2c: Zhao and Illman
(2022b)’s model are indicated within the box plots in Figure 4.
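Per-layer box-and-whisker statistics of log10K can be computed as in the sketch below. The quartile convention (`method="inclusive"` in Python's `statistics.quantiles`) is an assumption; plotting packages differ in how they interpolate quartiles, so small differences from Tables S2 – S9 are possible. Older Python versions require at least two values per layer.

```python
import math
import statistics


def log10k_box_stats(k_values):
    """Box-and-whisker statistics of log10 K for one geological layer (sketch)."""
    log_k = sorted(math.log10(k) for k in k_values)
    # "inclusive" treats the data as the whole population (assumed quartile convention).
    q1, median, q3 = statistics.quantiles(log_k, n=4, method="inclusive")
    return {
        "min": log_k[0],
        "q1": q1,
        "median": median,
        "log10_KG": statistics.fmean(log_k),  # log10 of the geometric mean
        "q3": q3,
        "max": log_k[-1],
        "IQR": q3 - q1,
    }


example_layer = [1e-8, 1e-7, 1e-6, 1e-5, 1e-4]  # hypothetical K values (m/s)
s = log10k_box_stats(example_layer)
print(s)
```

Comparing `log10_KG` with `median` gives a quick indication of the skewness direction seen in the box plots.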
Examination of Figure 4 reveals that most of the box plots are either positively skewed or
negatively skewed depending on the approach examined. In addition, the interquartile range (IQR)
of K estimates from Case 1a: GSA and Case 1b: Permeameter Tests over 19 layers is generally
larger than those from other methods, suggesting larger variability of K estimates for each layer.
The IQR from Case 2a: McCall and Christy (2010)’s model is larger than those from the Case 2b:
Borden et al. (2021) and Case 2c: Zhao and Illman (2022b) models. In addition, KG from Case 2b:
Borden et al. (2021)’s model is less variable across layers than that from Case 2a: McCall and
Christy (2010)’s and Case 2c: Zhao and Illman (2022b)’s models. The IQR for Case 3c: Highly
Parameterized THT Model is consistently smaller, indicating less dispersion of the estimates within
each of the 19 layers, even though its K estimates span a wide range within each layer compared to
the other approaches.
Next, the profiles of log10KG of 19 geological layers estimated from all investigated
approaches were plotted in Figure 5. This figure reveals that it is very difficult to accurately
characterize a heterogeneous site such as the NCRS, as the estimated log10KG values for a single
geological unit can vary by about four orders of magnitude depending on the site characterization
approach used. Overall, Case 1c: Slug Tests yielded higher K estimates compared to
Case 1a: GSA and Case 1b: Permeameter Tests. The K values from HPT surveys with three
different models (Cases 2a – 2c) yielded similar K estimates, while the estimates were generally
higher than those obtained from conventional methods (i.e., Case 1a: GSA and Case 1b:
Permeameter Tests). The K values estimated from Case 3a: PEST Calibrated Geological Model
were close to those generated from Case 3b: Averaged THT Geological Model.
Evaluation of K from Various Subsurface Characterization Methods by
Predicting Independent Groundwater Flow Events
Description of Groundwater Model and Experimental Design
Because the spatial distribution of the true K field across the NCRS is not available, K
estimates from various approaches were assessed through the prediction of drawdowns from
pumping/injection tests that have not been used for K estimation. For this, we constructed an HGS
model for forward simulations of independent pumping/injection tests with K and Ss fields derived
from different methods. Other than hydraulic parameter fields, HGS settings were the same as the
numerical model utilized in the work of Zhao and Illman (2018) as described previously.
Seven pumping/injection tests (PW1-3, PW1-5, PW3-1, PW3-4, PW5-1, PW5-4, and PW5-
5) not used in inverse modeling were simulated with HGS and results were compared to field data
via scatterplots to evaluate the performance of models built with K estimates from different
approaches. Since most of the conventional and HPT methods cannot provide Ss estimates, the
forward model’s ability to predict drawdowns under steady-state conditions was the first metric
employed to evaluate the K estimates from various approaches.
Then, transient forward simulations were performed. For transient simulations, 19 Ss
estimates from Case 3a: PEST-calibrated 19-layer geological model were utilized in Cases 1a –
1c, Cases 2a - 2c, and Case 3a. For Case 3b, 31,713 Ss values estimated via THT analysis were
averaged for each of the 19 layers, while for Case 3c, 31,713 Ss values from the THT analysis were
utilized for simulating transient drawdown responses.
Comparison of K Distributions
The estimated K distributions from conventional (Cases 1a – 1c), HPT (Cases 2a – 2c), and
inverse modeling (Cases 3a – 3c) approaches utilized in the groundwater flow models are
presented as fence diagrams in Figure 6. In this figure, locations of CMT and PW wells as well as
HPT survey locations are indicated. As mentioned previously, the primary characteristic of the site
is an alternating aquifer-aquitard system, in which three discontinuous low-K units of clay to silt
are separated by two high-K units of sand to gravel.

Examination of Figure 6 reveals that the K distribution generated from Case 1a: GSA is unable to
capture the two aquifers correctly. In fact, the aquitard clay layer 12 between the two aquifers and
the aquitard layer 1 at the top of the model domain have higher K values than the sand and gravel
aquifer units, which is inconsistent with the geological data. Based on Table S2, both aquitard
layers are primarily composed of clay. Thus, K values for these two units (layers 1 and 12) are
estimated using Puckett et al. (1985)’s model, while K values of the two aquifer layers are
calculated with Hazen (1911)’s model. Similar findings were reported by Alexander et al. (2011),
who applied both the Puckett et al. (1985) and Hazen (1911) models to 270 grain size distributions
and found that the mean K generated by the Puckett et al. (1985) model was about two orders of
magnitude larger than that estimated from the Hazen (1911) model. These findings indicate that
the Puckett et al. (1985) model may not be suitable for calculating K of clay materials at a highly
heterogeneous site with glaciofluvial deposits, even though equation (2) depends only on clay
content.
Results from Case 1b: Permeameter Tests reveal the existence of a double-aquifer system.
However, based on Table S3, relatively low K values are estimated for the sand and gravel layer
(i.e., layer 15). Lower K values are obtained for the aquitard layers above and below as well as in
between the aquifer system.
Results from Case 1c: Slug Tests capture the aquitard units below and above the aquifer
system. However, based on Tables S2 to S4, K values are relatively larger than those estimated by
Case 1a: GSA and Case 1b: Permeameter Tests. In addition, the double-aquifer system is not
reflected correctly. Based on Table S4, all the sandy-silt layers (6, 9, and 13) between and above
the aquifers have greater estimates of K than the two aquifer units (layers 11 and 15). This is
because the 43 slug test measurements cover only one sandy-silt layer (layer 13), which has only
four measurements, and the resulting K values are relatively large and do not agree with the site
geology.
The K distributions in Figure 6 from HPT using three different models (Cases 2a – 2c) are
generally biased towards higher K values. Specifically, results from Case 2a: McCall and Christy
(2010)’s model capture the lower aquitard. However, K estimates tend to be larger than those
generated from Case 1a: GSA and Case 1b: Permeameter Tests. In addition, only the lower aquifer
is revealed, while the clay layer 4 and silt layers 6, 7, and 10 located in the upper aquitard (based
on Table S5) have higher K estimates than the most permeable aquifer unit layer 15, which does
not conform to known geology. Results from Case 2b: Borden et al. (2021)’s model capture only the
lower aquitard, and because the K estimates are generally larger, the layers above the lower
aquitard are difficult to distinguish. The K distribution from Case 2c: Zhao and Illman (2022b)’s
model captures only the lower aquifer and the lowest aquitard, while the K values for units above
the lower aquifer are generally less variable and the aquitard layers generally have larger K
estimates. The low variability of K from the high-resolution HPT methods is mainly due to the
limited range of K estimates obtained from each of the three models (McCall and Christy, 2010;
Borden et al. 2021; and Zhao and Illman, 2022b).
Examination of three inverse modeling results (Cases 3a – 3c) reveals K variations more
accurately. The K values from Case 3a: PEST Calibrated Geological Model capture the expected
variation from one layer to the next. Specifically, the K estimate of the unit in between two aquifers
matches that expected for an aquitard. The K values for aquitards above and below the double
aquifer system are also estimated to be low. However, the K estimate for the upper aquitard is
larger than the value estimated from permeameter tests, due to the sparse monitoring data in the
uppermost part of the model domain. Case 3b: Averaged THT Geological Model has a K distribution
similar to that of Case 3a: PEST Calibrated Geological Model (Figure 6), and its K estimate for the
lower aquifer agrees more closely with those obtained by conventional methods
(i.e., Case 1a: GSA and Case 1b: Permeameter Tests). Case 3c: Highly Parameterized THT Model
yields a K field that exhibits both inter- and intra-layer heterogeneity. It is noteworthy that the
estimated K from Case 3c for the lower aquifer layer 15 is higher compared with Cases 3a and 3b.
Results from Forward Simulations of Pumping/Injection tests
The performance of each K distribution in Figure 6 obtained by various methods was then
evaluated by predicting independent pumping tests that are not used for model calibration using
HGS (Aquanty, 2019) with the computational mesh described earlier. As previously noted, a total
of 15 pumping tests (PW1-1, PW1-3, PW1-4, PW1-5, PW1-6, PW1-7, PW2-3, PW3-1, PW3-3,
PW3-4, PW4-3, PW5-1, PW5-3, PW5-4, and PW5-5) were conducted at the NCRS, while eight
tests were utilized by Zhao and Illman (2018) for the Case 3a, 3b, and 3c model calibrations (PW1-1,
PW1-4, PW1-6, PW1-7, PW2-3, PW3-3, PW4-3, and PW5-3). Therefore, for this study, seven
tests not used in model calibration by Zhao and Illman (2018) were chosen to evaluate the K
estimates from various approaches.
Since most of the conventional methods (i.e., Case 1a: GSA and Case 1b: Permeameter Tests) and
the HPT surveys (Cases 2a – 2c) in their current form cannot provide Ss estimates, steady-state
simulation was the first metric used to evaluate the K estimates from various approaches. Only
late-time pressure heads from ports that reached steady or quasi-steady state were chosen, which
resulted in 153 head data points. To better evaluate the correspondence between the simulated and observed
drawdown values, quantitative analyses were first performed by comparing the coefficient of
determination (R2), mean absolute error (L1) and mean square error (L2), which are provided as:
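$$R^2 = \frac{\left[\frac{1}{n}\sum_{i=1}^{n}\left(X_i-\bar{X}\right)\left(\hat{X}_i-\bar{\hat{X}}\right)\right]^2}{\frac{1}{n}\sum_{i=1}^{n}\left(X_i-\bar{X}\right)^2 \times \frac{1}{n}\sum_{i=1}^{n}\left(\hat{X}_i-\bar{\hat{X}}\right)^2} \qquad (1)$$

$$L_1 = \frac{1}{n}\sum_{i=1}^{n}\left|X_i-\hat{X}_i\right| \qquad (2)$$

$$L_2 = \frac{1}{n}\sum_{i=1}^{n}\left(X_i-\hat{X}_i\right)^2 \qquad (3)$$

where $n$ is the total number of data points, $i$ is the data index, $X_i$ is the simulated drawdown,
$\hat{X}_i$ is the observed drawdown, $\bar{X}$ is the mean of the simulated drawdowns, and
$\bar{\hat{X}}$ is the mean of the observed drawdowns.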
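Equations (1) to (3) translate directly into code. The sketch below is a plain-Python illustration, not the authors' implementation; note that L2 as defined here is the mean squared error, not its square root. The synthetic usage example shows why R2 alone is insufficient: simulated drawdowns that are exactly half the observed ones are perfectly correlated (R2 = 1), and only L1 and L2 expose the bias.

```python
def drawdown_metrics(simulated, observed):
    """R2, L1 and L2 following equations (1) - (3); simulated = X_i, observed = X-hat_i."""
    n = len(simulated)
    if n != len(observed) or n < 2:
        raise ValueError("need two equal-length series with at least two values")
    mean_sim = sum(simulated) / n
    mean_obs = sum(observed) / n
    cov = sum((x - mean_sim) * (o - mean_obs) for x, o in zip(simulated, observed)) / n
    var_sim = sum((x - mean_sim) ** 2 for x in simulated) / n
    var_obs = sum((o - mean_obs) ** 2 for o in observed) / n
    if var_sim == 0 or var_obs == 0:
        raise ValueError("R2 is undefined when either series is constant")
    r2 = cov ** 2 / (var_sim * var_obs)                                  # equation (1)
    l1 = sum(abs(x - o) for x, o in zip(simulated, observed)) / n       # equation (2), mean absolute error
    l2 = sum((x - o) ** 2 for x, o in zip(simulated, observed)) / n     # equation (3), mean square error
    return r2, l1, l2


# Synthetic drawdowns (m): the "model" underpredicts every value by half.
r2, l1, l2 = drawdown_metrics([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
print(r2, l1, l2)
```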
Statistics calculated from each method through seven simulations are summarized in Tables
S10 to S12 (in the SI section). Cells in Tables S10 to S12 are colour-coded to enhance the
comparison. Examination of Tables S10 to S12 reveals that Case 3c: Highly Parameterized THT
Model performs best, yielding the smallest discrepancies between simulated and measured
drawdowns (i.e., the smallest L1 and L2 norms) as well as the highest R2 values for most of the
pumping tests, followed by Case 3b: Averaged THT Geological Model and Case 3a: PEST Calibrated
Geological Model.
In terms of K estimates from HPT surveys (Cases 2a – 2c), discrepancies between simulated
and observed drawdown values are the smallest for Case 2c: Zhao and Illman (2022b)’s model
among the three HPT formulae. The three conventional methods, especially Case 1a: GSA and Case
1b: Permeameter Tests, rank at the lower end.
Simulation results are also assessed with scatterplots, as shown in Figure 7. Each plot includes a
linear model fitted to all data, with the corresponding slope, intercept, and R2 value, as well as a
1:1 line indicating a perfect match. The slope and intercept values obtained from the linear model
fit for individual tests from all methods are summarized in Table S13 for the interested reader.
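The slope and intercept of the linear model can be obtained by ordinary least squares, as sketched below. This assumes simulated drawdown is regressed on observed drawdown (observed on the horizontal axis), which the text does not state explicitly. Under that convention, with a near-zero intercept, a slope below 1 indicates underprediction that grows with drawdown magnitude. The data in the example are synthetic.

```python
def linear_fit(observed, simulated):
    """Ordinary least-squares line: simulated = slope * observed + intercept."""
    n = len(observed)
    if n != len(simulated) or n < 2:
        raise ValueError("need two equal-length series with at least two values")
    mean_obs = sum(observed) / n
    mean_sim = sum(simulated) / n
    sxx = sum((o - mean_obs) ** 2 for o in observed)
    if sxx == 0:
        raise ValueError("observed drawdowns must not all be equal")
    sxy = sum((o - mean_obs) * (s - mean_sim) for o, s in zip(observed, simulated))
    slope = sxy / sxx
    intercept = mean_sim - slope * mean_obs
    return slope, intercept


# Synthetic example: simulated = 0.5 * observed + 0.1
slope, intercept = linear_fit([1.0, 2.0, 3.0], [0.6, 1.1, 1.6])
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}")
```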
To enhance our comparison and evaluation, transient simulations were also performed. Since
most of the selected conventional and HPT methods cannot yield Ss estimates, estimates for the 19
geological layers obtained from Case 3a: PEST Calibrated Geological Model were assigned to
Cases 1a to 1c and Cases 2a to 2c. Three points were selected from the early, intermediate, and
late times of each drawdown curve, which resulted in a total of 388 drawdown data points. It is
worth noting that fewer drawdown data points were selected from the injection tests performed at
PW3-1 and PW5-1, as the data from these tests were noisy and affected by the Noordbergum effect
(Verruijt, 1969; Rodrigues, 1983; Berg et al., 2011). Therefore, only late-time data were selected from those
drawdown curves. Similar to steady-state results, various model performance metrics such as the
R2, L1, L2, slope and intercept of the linear model are summarized in Tables S14 to S17, while
scatterplots of observed and simulated drawdowns are presented in Figure 8. Meanwhile,
simulated drawdown curves for the pumping/injection tests at ports PW1-3, PW1-5, PW3-1, PW3-
4, PW5-1, PW5-4, and PW5-5 with various K estimates are compared against observed drawdowns
on Figures S4 to S10 in the SI section, respectively.
Discussion
Which K Estimates Yield the Best Predictions of Steady-State and Transient Drawdowns?
Examination of Figures 7 and 8 reveals that model performance is broadly similar for the
steady-state and transient simulations. Specifically, groundwater models built with conventional
(Cases 1a – 1c) and HPT (Cases 2a – 2c) K estimates yield biased drawdown predictions for both
steady-state (Figure 7) and transient (Figure 8) simulations. In contrast, inverse modeling
approaches based on Case 3a: PEST Calibrated Geological Model and Case 3b: Averaged THT
Geological Model both yield good predictions of drawdowns, while Case 3c: Highly
Parameterized THT Model produces excellent matches for the steady-state simulations (Figure 7).
For the transient simulations (Figure 8), the performance of Cases 3a – 3c is more similar,
although Case 3c still yields the best predictions.
In terms of conventional methods, K estimates from Case 1a: GSA and Case 1b:
Permeameter Tests overpredict drawdowns, while those estimated via Case 1c: Slug Tests
underpredict drawdowns under steady-state and transient conditions (Figures 7 and 8). According
to Table 1, as well as Figure 5, Case 1a: GSA and Case 1b: Permeameter Tests tend to provide
smaller KG estimates than Case 1c: Slug Tests. Specifically, Table 1 shows that Case 1a: GSA and
Case 1b: Permeameter Tests yield KG values of 1.19 × 10⁻⁷ m/s and 3.03 × 10⁻⁷ m/s, respectively.
The lower KG values relative to other approaches (Table 1) may be due to sample loss from highly
permeable zones, as observed by Alexander et al. (2011). Core samples were obtained with a
split-spoon sampler that was driven ahead of the drill head. Alexander et al. (2011) noted
that the sample recovery was on the order of 80% for all wells except for CMT3, which had a
lower recovery rate of 69%. Sample recovery was found to be good, but periodic gaps were noted
for depths corresponding with aquifers. Therefore, the coarse-grained portion of samples may have
been lost and not subjected to sieve analyses and permeameter tests. An additional factor relevant
to permeameter tests is the repacking of samples. Klute and Dirksen (1986) discussed that K of
repacked samples estimated in the laboratory can be artificially lower than those from intact
samples.
It is surprising to note that Case 1c: Slug Tests consistently underpredict drawdowns (Figures
7 and 8) as this method is widely used for various field investigations (Butler, 1997, 2005; Cardiff
et al., 2011). Table 1 shows that the 43 slug tests yield a KG value of 2.65 × 10⁻⁶ m/s, which is
approximately one order of magnitude higher than those from Case 1a: GSA and Case 1b:
Permeameter Tests. This may be due to three potentially interacting factors: 1) the relatively
sparse data points (n = 43) available at the site that could have led to preferential sampling from
higher K intervals; 2) the Hvorslev (1951) approach yielding slightly higher estimates of K (Xie,
2015) compared to the Bouwer and Rice (1976) and Kansas Geological Survey (KGS) models
(Hyder et al., 1994); and 3) the scale effect, in which slug tests sample larger volumes that may be
impacted by highly permeable zones not considered by other methods that sample smaller volumes
such as Case 1a: GSA and Case 1b: Permeameter Tests. Another potential factor is that the slug
test K estimate may be representative of the filter pack. However, in this study, care was taken to
avoid fitting the Hvorslev (1951) model to the early portion of the head response curve.
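For readers unfamiliar with the Hvorslev (1951) method, the sketch below uses its commonly cited form for a screen with L_e/R > 8, K = r_c² ln(L_e/R) / (2 L_e T_0). Here r_c is the casing radius, R is the screen radius, L_e is the screen length, and T_0 is the basic time lag obtained from a straight-line fit of ln(h/h0) against time. The well dimensions, the synthetic recovery data, and the `t_min` cutoff (a stand-in for excluding the early portion of the response) are hypothetical. The well geometry used at the NCRS is not given in this section.

```python
import math


def hvorslev_k(times, heads, h0, r_c, R, L_e, t_min=0.0):
    """Slug test K (m/s) via Hvorslev (1951), valid for L_e / R > 8 (sketch).

    times, heads: elapsed time (s) and absolute head displacement (m); h0: initial displacement (m)
    r_c: casing radius (m); R: screen radius (m); L_e: screen length (m)
    t_min: data before this time are skipped, e.g., to avoid early-time effects
    """
    if L_e / R <= 8:
        raise ValueError("this form of the Hvorslev equation assumes L_e / R > 8")
    pts = [(t, math.log(h / h0)) for t, h in zip(times, heads) if t >= t_min and h > 0]
    if len(pts) < 2:
        raise ValueError("need at least two usable points")
    n = len(pts)
    t_mean = sum(t for t, _ in pts) / n
    y_mean = sum(y for _, y in pts) / n
    sxx = sum((t - t_mean) ** 2 for t, _ in pts)
    slope = sum((t - t_mean) * (y - y_mean) for t, y in pts) / sxx
    if slope >= 0:
        raise ValueError("head displacement is not decaying")
    t0 = -1.0 / slope  # basic time lag: time for h/h0 to fall to about 0.37
    return r_c ** 2 * math.log(L_e / R) / (2.0 * L_e * t0)


# Synthetic recovery with T0 = 50 s and a hypothetical well geometry.
times = [5.0 * i for i in range(1, 41)]
heads = [0.5 * math.exp(-t / 50.0) for t in times]
k = hvorslev_k(times, heads, h0=0.5, r_c=0.025, R=0.05, L_e=1.5, t_min=20.0)
print(f"K = {k:.3e} m/s")
```

With these inputs the fit recovers T0 = 50 s and hence K ≈ 1.42 × 10⁻⁵ m/s; the choice of `t_min` determines which part of the recovery curve controls T0.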
Figures 7 and 8 also reveal that K estimates from the various HPT formulae (Cases 2a – 2c)
consistently yield biased low predictions of drawdowns under both steady-state and transient
conditions. Examination of Table 1 reveals that the KG values from the three methods are
2.85 × 10⁻⁶ m/s, 5.78 × 10⁻⁶ m/s, and 3.84 × 10⁻⁶ m/s for the Case 2a: McCall and Christy (2010),
Case 2b: Borden et al. (2021), and Case 2c: Zhao and Illman (2022b) models, respectively. The KG
estimates from the three HPT formulae (Cases 2a – 2c) are approximately one order of magnitude
larger than those from Case 1a: GSA and Case 1b: Permeameter Tests, but comparable to that from
Case 1c: Slug Tests (2.65 × 10⁻⁶ m/s). In terms of predictions of drawdowns, the three different
models yield slightly different results. Specifically, based on Figures 7 and 8,
R2 values increase, while both L1 and L2 norms decrease from Cases 2a to 2c. As previously
mentioned, Case 2c: Zhao and Illman (2022b)’s model is a site-specific relationship developed for
the NCRS. Therefore, building a site-specific model to interpret HPT data is helpful for site
characterization. However, the improvement in drawdown predictions is not very significant based
on Figures 7 and 8 (see also Figures S4 to S10 in the SI section), as Cases 2a to 2c all underpredict
observed drawdowns. An obvious reason is the limited range of K estimated by the three models
used to interpret HPT data at the NCRS. It is interesting to note that while the KG values estimated
through the three approaches are quite similar, the range of log10K is quite different for each
approach (Table 1). Moreover, although Case 2c: Zhao and Illman (2022b)’s model extends the
lower and upper K ranges compared to the other two models (Cases 2a and 2b), the resulting K
estimates and corresponding forward simulations still yield biased predictions. This is likely due
to the highly heterogeneous nature of the glaciofluvial deposits at the NCRS; the connectivity of
these units is an important consideration for building groundwater models that predict drawdowns
more accurately.
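The order-of-magnitude comparisons in this paragraph can be checked directly from the Table 1 KG values quoted in the text. The short script below computes log10 ratios using only those reported numbers; the dictionary labels are shorthand introduced here.

```python
import math

# Geometric means of K (m/s) as reported in Table 1 and quoted in the text.
KG = {
    "1a GSA": 1.19e-7,
    "1b Permeameter": 3.03e-7,
    "1c Slug": 2.65e-6,
    "2a McCall & Christy": 2.85e-6,
    "2b Borden": 5.78e-6,
    "2c Zhao & Illman": 3.84e-6,
}
HPT = ("2a McCall & Christy", "2b Borden", "2c Zhao & Illman")
CONVENTIONAL = ("1a GSA", "1b Permeameter", "1c Slug")


def orders_of_magnitude(k_a, k_b):
    """log10(k_a / k_b): how many orders of magnitude k_a exceeds k_b."""
    return math.log10(k_a / k_b)


for hpt in HPT:
    for ref in CONVENTIONAL:
        print(f"{hpt} vs {ref}: {orders_of_magnitude(KG[hpt], KG[ref]):+.2f}")
```

With these values, the HPT KG values exceed those of Case 1a: GSA and Case 1b: Permeameter Tests by roughly 1.0 to 1.7 orders of magnitude, but lie within about 0.35 orders of magnitude (a factor of roughly 2.2) of the Case 1c: Slug Tests KG.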
In contrast to the conventional and HPT K estimates that yield biased drawdown predictions,
we find that inverse modeling approaches (Cases 3a – 3c) with various parameterizations all yield
more accurate drawdown predictions at the NCRS (Figures 7 and 8). Case 3a: PEST Calibrated
Geological Model and Case 3b: Averaged THT Geological Model both yield a good drawdown
match with the measured data, while Case 3c: Highly Parameterized THT Model with prior
geological information yields excellent forward simulation results under both steady and transient
conditions. The calibration of an HGS groundwater flow model based on geological zonation with
PEST (Case 3a) is also a form of THT analysis (Illman et al., 2015), but it differs from the
high-resolution approach (Case 3c) based on SimSLE in VSAFT3. The HGS/PEST calibration (Case
3a) fits all drawdown/buildup data in a least-squares sense and restricts its effective parameter
estimates to 19 geologic zones, whereas SimSLE in VSAFT3 does not have this constraint. For this
reason, VSAFT3 can adjust more parameters so that the calibrated drawdown-time curves honor
the observed ones during each test. Therefore, VSAFT3’s estimates (Case 3c) yield better
predictions of independent pumping tests.
Based on Tables S10 to S12 and Tables S14 to S16 (SI section), Case 3c: Highly Parameterized
THT Model consistently yields the best R2 values and L1 and L2 norms under both steady-state and
transient conditions, followed by Case 3a: PEST Calibrated Geological Model and Case 3b:
Averaged THT Geological Model. These results indicate that, even though the K fields (Figure 6,
Cases 3a to 3c) reveal similar overall characteristics, local-scale differences in K can lead to
noticeable differences in simulated drawdowns at various observation points. As a result, Case 3c:
Highly Parameterized THT Model, which can accurately map both interlayer and intralayer
heterogeneities, may be most suitable for high-resolution characterization at highly heterogeneous
sites such as the NCRS.
Should Groundwater Models Consider Variability in Ss?
For transient groundwater flow simulations, estimates of Ss are necessary. However, most
of the conventional and HPT methods do not yield these estimates. In addition, Ss is typically
considered to be much less variable than K, so it has received less attention. As a result, the
importance of including homogeneous or heterogeneous Ss estimates for transient groundwater
flow simulation is analyzed by: (1) using an effective Ss from Zhao and Illman (2018) who treated
the multi-aquifer-aquitard system to be homogeneous and isotropic; and (2) using estimated Ss
values from Case 3a: PEST Calibrated Geological Model of Zhao and Illman (2018).
To answer the question of whether groundwater models should consider variability in Ss or
not, additional transient simulations are performed with seven pumping tests (PW1-3, PW1-5,
PW3-1, PW3-4, PW5-1, PW5-4, and PW5-5) for Cases 1a to 1c and Cases 2a to 2c. The
corresponding L1 and L2 norms are summarized in Tables S18 and S19 (SI section), where blue
indicates results from homogeneous Ss and yellow indicates results from heterogeneous Ss. Bold
L1 and L2 values in Tables S18 and S19 mark the smaller value of the homogeneous and
heterogeneous Ss cases, identifying the case with less discrepancy between simulated and observed
drawdowns.
Examination of Tables S18 and S19 reveals that for virtually all cases, providing
heterogeneous Ss estimates to each of the 19-layers in the model yields better transient simulation
results than utilizing a homogeneous Ss value encompassing all 19-layers. As a result, to achieve
more accurate transient groundwater flow simulation results, it may be advisable to devote more
effort to accurately capturing Ss heterogeneity at sites where the lithology changes significantly
throughout the simulation domain.
Summary and Conclusions
The accurate characterization of subsurface heterogeneity in K and Ss is important for
building robust groundwater models that improve predictions of groundwater flow and solute
transport. Several conventional approaches exist to estimate K, including the use of empirical
and analytical formulae to interpret data from GSA, permeameter, slug and pumping tests. Over
the last two decades, several DP-based field tools such as DPIL and HPT have been developed to
characterize high-resolution spatial variations of K in heterogeneous unconsolidated formations.
The newer DP tools and interpretation methods have positioned DP surveys to become one of the
most efficient approaches for site characterization compared to conventional methods, although
information on spatial K variability and connectivity requires interpolation of the K values
obtained at DP locations. Inverse modeling methods, such as automatic calibration of geology-based
groundwater models and the more recent development and testing of HT, have shown their
effectiveness in yielding robust estimates of K and Ss heterogeneity between boreholes.
Previous studies have compared different methods of estimating K, but there is a lack of
consensus on which method yields the K estimates most useful for groundwater flow models. In
this study, we utilize a groundwater flow model, constructed with 19 geological layers
representative of a multi-aquifer-aquitard system at the NCRS, to evaluate the performance of
three generations of site characterization approaches for K: (1) conventional techniques (Case
1a: GSA; Case 1b: Permeameter Tests; and Case 1c: Slug Tests); (2) HPT survey data interpreted
with three different models [Case 2a: McCall and Christy (2010); Case 2b: Borden et al. (2021);
and Case 2c: Zhao and Illman (2022b)]; and (3) three inverse modeling approaches (Case 3a: PEST
Calibrated Geological Model; Case 3b: Averaged THT Geological Model; and Case 3c: Highly
Parameterized THT Model). Each approach is assessed in terms of its ability to predict drawdowns
under both steady-state and transient conditions. This study leads to the following major findings
and conclusions:
1. Despite the time and effort required to conduct 270 GSA, 642 permeameter tests, and 43
slug tests, conventional methods at the NCRS yielded biased K estimates that led to poor
predictions of drawdowns from pumping tests. Most empirical formulae applied to GSA
data were developed for relatively permeable materials, which presents a challenge for their
application to highly heterogeneous settings containing low-K media. More importantly,
low core sample recovery from highly permeable zones can bias K estimates low, which in
turn can affect groundwater flow modeling results.
2. The development of DP techniques and the HPT has significantly advanced our capabilities
in high-resolution characterization of K along vertical profiles at DP locations in
unconsolidated media. While the approach yields rapid estimates of K at unprecedentedly
high resolution, the estimation of K from HPT survey data may require more attention than
previously thought. In this study, three separate approaches [i.e., Case 2a: McCall and
Christy (2010); Case 2b: Borden et al. (2021); Case 2c: Zhao and Illman (2022b)] were
utilized to estimate K. The K estimates obtained through the three formulae were each
constrained by different upper and lower bounds, which presented challenges in
characterizing low-permeability materials such as silt and clay. Groundwater flow
simulations with K estimates derived from the three formulae yielded biased predictions of
drawdowns at the NCRS. Given the HPT's significant advantages in the hydrogeologic
characterization of unconsolidated deposits, it is necessary to advance the logging
apparatus and corresponding interpretation methods to extend the range of K estimates to
both higher and lower K geological media.
3. Inverse modeling of pumping test data with geology-based and highly parameterized
geostatistics-based HT models at the NCRS has shown that these approaches yield robust
estimates of K and Ss that are useful for steady-state and transient groundwater flow
simulations. Specifically, the automatic calibration of a groundwater flow model yielded
parameter estimates that consistently led to accurate predictions of pumping tests not used
in the calibration effort. Drawdown predictions improved dramatically with a highly
parameterized groundwater flow model using parameter estimates from THT that captured
the most salient features of interlayer and intralayer K heterogeneity. Additional transient
simulations that considered heterogeneous Ss values yielded clearly improved drawdown
predictions, suggesting the benefits of characterizing Ss heterogeneity at sites with large
lithological changes. While the accurate prediction of drawdowns from pumping tests is
promising, further studies are needed to determine whether these K distributions are useful
for contaminant transport predictions.
4. Our research suggests that inverse modeling is a necessary step in building more robust
groundwater flow models, echoing suggestions by Poeter and Hill (1997) and Carrera et al.
(2005). HT additionally fuses information from multiple pumping tests and can integrate
other data, such as those from geological investigations (e.g., Zhao and Illman, 2018),
geophysical surveys (e.g., Soueid Ahmed et al., 2014), flowmeter surveys (e.g., Li et al.,
2008; Aliouache et al., 2021; Luo et al., 2023), tracer tests (e.g., Yeh and Zhu, 2007; Illman
et al., 2010; Doro et al., 2014), and high-resolution HPT pressure logs (Zhao and Illman,
2022a), as well as K estimates from HPT surveys (Zhao et al., 2023), to further improve
parameter estimates. However, HT should not be considered a panacea, as parameter
estimates depend strongly on model conceptualization and on the accuracy of the data and
forcing functions (i.e., initial and boundary conditions and source/sink terms) applied to the
models. Data fusion as part of inverse modeling is encouraged for building more robust
groundwater models and obtaining better parameter estimates, but it should be done with
caution, always considering the information content of the data.
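Finding 2 notes that each HPT-based formula constrains K between upper and lower bounds. The following sketch uses hypothetical values to illustrate how truncating point estimates at a lower bound inflates K in low-permeability intervals, raises the geometric mean, and narrows the spread of log10 K. The bounds are illustrative placeholders and are not the limits of the McCall and Christy (2010), Borden et al. (2021), or Zhao and Illman (2022b) formulae. The helper names are hypothetical, and sample variance is an assumed choice.

```python
import math
import statistics
from typing import List, Sequence, Tuple


def clip_k(k_values: Sequence[float], k_min: float, k_max: float) -> List[float]:
    """Truncate each K estimate (m/s) to the interval [k_min, k_max]."""
    return [min(max(k, k_min), k_max) for k in k_values]


def log10_stats(k_values: Sequence[float]) -> Tuple[float, float]:
    """Return (log10 of the geometric mean of K, sample variance of log10 K)."""
    logs = [math.log10(k) for k in k_values]
    return statistics.mean(logs), statistics.variance(logs)


# Hypothetical profile from clay to gravel, and hypothetical tool bounds.
profile = [1e-10, 5e-10, 1e-8, 1e-5, 1e-3]
clipped = clip_k(profile, k_min=1e-8, k_max=3e-4)

for label, values in (("unclipped", profile), ("clipped", clipped)):
    mean_log, var_log = log10_stats(values)
    print(f"{label}: log10 KG = {mean_log:.2f}, var(log10 K) = {var_log:.2f}")
```

In this toy profile, clipping raises log10 KG by roughly half an order of magnitude and about halves the variance of log10 K. A model built from such values would tend to overestimate K in aquitard layers.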
Acknowledgements
The HPT surveys conducted by Geoprobe Systems, GroundTech Solutions Ltd. and the University
of Waterloo (UW) at the NCRS were a result of discussions at the NovCare meeting held at the
University of Waterloo during the summer of 2019. We are very grateful to Wes McCall from
Geoprobe Systems and Jeff Bibbings from GroundTech Solutions Ltd. for visiting UW and
training our staff and students to conduct the HPT surveys at the NCRS. Walter A. Illman
acknowledges the partial support from the Discovery Grant awarded by the Natural Sciences and
Engineering Research Council of Canada (NSERC). Dongwei Sun acknowledges support from the
Qinhuangdao Architecture Design Institute and from Brayden McNeill of Aquanty Inc., who
provided guidance on building the initial HGS model for this study. Finally, we thank the
Executive Editor (Charles Andrews), Mike Fienen, and the two anonymous reviewers for
providing constructive comments that led to an improved manuscript.
Supporting Information
Supporting Information is generally not peer reviewed. Supporting Information can be found in an
online document that contains additional details on the methods used to estimate K, Tables S1 to
S19, and Figures S1 to S10, as referenced in the text above.
References
Alexander, M., S. J. Berg, and W. A. Illman. 2011. Field study of hydrogeologic characterization
methods in a heterogeneous aquifer. Ground Water 49, no. 3: 365–382.
Aliouache, M., X. Wang, P. Fischer, G. Massonnat, and H. Jourde. 2021. An inverse approach
integrating flowmeter and pumping test data for three-dimensional aquifer
characterization. Journal of Hydrology 603: 126939.
Aquanty, Inc. 2019. HydroGeoSphere: A three-dimensional numerical model describing fully
integrated subsurface and surface flow and solute transport. Waterloo, Ontario, Canada.
ARANZ Geo Limited. 2015. Leapfrog Hydro 2.2.3: 3D geological modeling software.
Barr, D. W. 2001. Coefficient of permeability determined by measurable parameters. Ground
Water 39, no. 3: 356–361.
Beckie, R., and C. F. Harvey. 2002. What does a slug test measure: an investigation of
instrument response and the effects of heterogeneity. Water Resources Research 38, no.
12: 1290.
Berg, S. J., P. A. Hsieh, and W. A. Illman. 2011. Estimating hydraulic parameters when
poroelastic effects are significant. Ground Water 49, no. 6: 815–829.
Berg, S. J., and W. A. Illman. 2011a. Capturing aquifer heterogeneity: comparison of approaches
through controlled sandbox experiments. Water Resources Research 47, no. 9: W09514.
Berg, S. J., and W. A. Illman. 2011b. Three-dimensional transient hydraulic tomography in a
highly heterogeneous glaciofluvial aquifer-aquitard system. Water Resources Research
47, no. 10: W10507.
Bohling, G. C., X. Zhan, J. J. Butler, Jr., and L. Zheng. 2002. Steady shape analysis of
tomographic pumping tests for characterization of aquifer heterogeneities. Water
Resources Research 38, no. 12: 1324.
Bohling, G. C., J. J. Butler, Jr., X. Zhan, and M. D. Knoll. 2007. A field assessment of the value
of steady shape hydraulic tomography for characterization of aquifer heterogeneities.
Water Resources Research 43: W05430.
Borden, R. C., K. Y. Cha, and G. Liu. 2021. A physically based approach for estimating
hydraulic conductivity from HPT pressure and flowrate. Ground Water 59, no. 2: 266–
272.
Bouwer, H., and R. C. Rice. 1976. A slug test method for determining hydraulic conductivity of
unconfined aquifers with completely or partially penetrating wells. Water Resources
Research 12, no. 3: 423–428.
Brauchler, R., R. Hu, L. Hu, S. Jiménez, P. Bayer, P. Dietrich, and T. Ptak. 2013. Rapid field
application of hydraulic tomography for resolving aquifer heterogeneity in
unconsolidated sediments. Water Resources Research 49, no. 4: 2013–2024.
Butler, J. J., Jr. 2019. The Design, Performance, and Analysis of Slug Tests. 2nd ed. CRC Press,
Boca Raton, FL, 280 pp.
Butler, J. J., Jr., and J. M. Healey. 1998. Relationship between pumping test and slug-test
parameters: scale effect or artifact? Ground Water 36: 305–313.
Butler, J. J., Jr. 2005. Hydrogeological methods for estimation of spatial variations in hydraulic
conductivity. Hydrogeophysics, 23–58. Springer Netherlands, 527 pp.
Butler, J. J., Jr., E. J. Garnett, and J. M. Healey. 2003. Analysis of slug tests in formations of high
hydraulic conductivity. Ground Water 41, no. 5: 620–630.
Butler, J. J., Jr., P. Dietrich, V. Wittig, and T. Christy. 2007. Characterizing hydraulic conductivity
with the direct-push permeameter. Ground Water 45, no. 4: 409–419.
Cardiff, M., W. Barrash, M. Thoma, and B. Malama. 2011. Information content of slug tests for
estimating hydraulic properties in realistic, high-conductivity aquifer scenarios. Journal
of Hydrology 403, no. 1–2: 66–82.
Cardiff, M., W. Barrash, and P. K. Kitanidis. 2013. Hydraulic conductivity imaging from 3-D
transient hydraulic tomography at several pumping/observation densities. Water
Resources Research 49, no. 11: 7311–7326.
Carrera, J., A. Alcolea, A. Medina, J. Hidalgo, and L. J. Slooten. 2005. Inverse problem in
hydrogeology. Hydrogeology Journal 13: 206–222.
Castagna, M., M. W. Becker, and A. Bellin. 2011. Joint estimation of transmissivity and
storativity in a bedrock fracture. Water Resources Research 47, no. 9: W09504.
Chapuis, R. P., V. Dallaire, D. Marcotte, M. Chouteau, N. Acevedo, and F. Gagnon. 2005.
Evaluating the hydraulic conductivity at three different scales within an unconfined sand
aquifer at Lachenaie, Quebec. Canadian Geotechnical Journal 42, no. 4: 1212–1220.
Clauser, C. 1992. Permeability of crystalline rocks. Eos, Transactions American Geophysical
Union 73, no. 21: 233–238.
Cooper, H. H., and C. E. Jacob. 1946. A generalized graphical method for evaluating formation
constants and summarizing well-field history. Eos, Transactions American Geophysical
Union 27, no. 4: 526–534.
Devlin, J. F. 2015. HydrogeoSieveXL: an Excel-based tool to estimate hydraulic conductivity
from grain-size analysis. Hydrogeology Journal 23, no. 4: 837–844.
Dietrich, P., J. J. Butler, Jr., and K. Faiß. 2008. A rapid method for hydraulic profiling in
unconsolidated formations. Ground Water 46, no. 2: 323–328.
Doherty, J. 2005. PEST: Model-Independent Parameter Estimation User Manual. Watermark
Numerical Computing, Brisbane, Australia.
Doherty, J., and D. Welter. 2010. A short exploration of structural noise. Water Resources
Research 46: W05525.
Doherty, J. 2015. Calibration and Uncertainty Analysis for Complex Environmental Models.
PEST: Complete Theory and What It Means for Modelling the Real World. Watermark
Numerical Computing, 237 pp.
Doro, K. O., O. A. Cirpka, and C. Leven. 2014. Tracer tomography: Design concepts and field
experiments using heat as a tracer. Groundwater 53, no. S1: 139–148.
Fischer, P., A. Jardani, and N. Lecoq. 2018. Hydraulic tomography of discrete networks of
conduits and fractures in a karstic aquifer by using a deterministic inversion algorithm.
Advances in Water Resources 112: 83–94.
Freeze, R. A., and J. A. Cherry. 1977. Groundwater. Prentice-Hall.
Geoprobe. 2015. Geoprobe® Hydraulic Profiling Tool (HPT) System Standard Operating
Procedure.
Hazen, A. 1911. Discussion: Dams on sand foundations. Transactions, American Society of Civil
Engineers 73, no. 11: 199.
Hinsby, K., P. L. Bjerg, L. J. Andersen, B. Skov, and E. V. Clausen. 1992. A mini slug test
method for determination of a local hydraulic conductivity of an unconfined sandy
aquifer. Journal of Hydrology 136, no. 1–4: 87–106.
Hsieh, P. A., and S. P. Neuman. 1985. Field determination of the three-dimensional hydraulic
conductivity tensor of anisotropic media. 1. Theory. Water Resources Research 21, no.
11: 1655–1665.
Hu, R., R. Brauchler, M. Herold, and P. Bayer. 2011. Hydraulic tomography analog outcrop
study: Combining travel time and steady shape inversion. Journal of Hydrology 409, no.
1–2: 350–362.
Huang, S.-Y., J.-C. Wen, T.-C. J. Yeh, W. Lu, H.-L. Juan, C.-M. Tseng, J.-H. Lee, and K.-C.
Chang. 2011. Robustness of joint interpretation of sequential pumping tests: Numerical
and field experiments. Water Resources Research 47, no. 10: W10530.
Hvorslev, M. J. 1951. Time Lag and Soil Permeability in Ground-Water Observations, Bull. No.
36. Vicksburg, Mississippi: Waterways Experiment Station, Corps of Engineers, U.S.
Army, 1–50.
Hyder, Z., J. J. Butler, Jr., C. D. McElwee, and W. Liu. 1994. Slug tests in partially penetrating
wells. Water Resources Research 30, no. 11: 2945–2957.
Illman, W. A. 2006. Strong field evidence of directional permeability scale effect in fractured
rock. Journal of Hydrology 319, no. 1–4: 227–236.
Illman, W. A., X. Liu, and A. J. Craig. 2007. Steady-state hydraulic tomography in a laboratory
aquifer with deterministic heterogeneity: Multi-method and multiscale validation of
hydraulic conductivity tomograms. Journal of Hydrology 341, no. 3–4: 222–234.
Illman, W. A., X. Liu, S. Takeuchi, T.-C. J. Yeh, K. Ando, and H. Saegusa. 2009. Hydraulic
tomography in fractured granite: Mizunami Underground Research site, Japan. Water
Resources Research 45: W01406.
Illman, W. A., S. J. Berg, X. Liu, and A. Massi. 2010. Hydraulic/partitioning tracer tomography
for DNAPL source zone characterization: Small-scale sandbox experiments.
Environmental Science & Technology 44, no. 22: 8609–8614.
Illman, W. A., S. J. Berg, and Z. Zhao. 2015. Should hydraulic tomography be interpreted using
geostatistical inverse modeling? A laboratory sandbox investigation. Water Resources
Research 51: 3219–3237.
Jiang, L., R. Sun, T.-C. J. Yeh, and X. Liang. 2021. Inverse modeling of different stimuli and
hydraulic tomography: A laboratory sandbox investigation. Journal of Hydrology 603:
127108.
Karrow, P. F. 1979. Quaternary geology of the University of Waterloo campus. Department of
Earth Sciences, University of Waterloo, Waterloo, ON.
Karrow, P. F. 1993. Quaternary geology, Stratford-Conestogo area. Ontario Ministry of
Northern Development and Mines, 283.
Klute, A., and C. Dirksen. 1986. Hydraulic conductivity and diffusivity: laboratory methods.
Methods of Soil Analysis, 687–734. John Wiley & Sons, Ltd.
Kozeny, J. 1953. Das Wasser im Boden. Grundwasserbewegung. Hydraulik, 380–445. Springer.
Krumbein, W. C., and G. D. Monk. 1943. Permeability as a function of the size parameters of
unconsolidated sand. Transactions of the AIME 151, no. 01: 153–163.
Li, W., A. Englert, O. A. Cirpka, and H. Vereecken. 2008. Three-dimensional geostatistical
inversion of flowmeter and pumping test data. Groundwater 46, no. 2: 193–201.
Liu, G., J. J. Butler, Jr., E. Reboulet, and S. Knobbe. 2012. Hydraulic conductivity profiling with
direct push methods. Grundwasser 17, no. 1: 19–29.
Liu, X., W. A. Illman, A. J. Craig, J. Zhu, and T.-C. J. Yeh. 2007. Laboratory sandbox validation
of transient hydraulic tomography. Water Resources Research 43, no. 5: W05404.
Luo, N., Z. Zhao, W. A. Illman, and S. J. Berg. 2017. Comparative study of transient hydraulic
tomography with varying parameterizations and zonations: Laboratory sandbox
investigation. Journal of Hydrology 554: 758–779.
Luo, N., W. A. Illman, and Y. Zha. 2022. Large-scale three-dimensional hydraulic tomography
analyses of long-term municipal wellfield operations. Journal of Hydrology 610: 127911.
Luo, N., Z. Zhao, W. A. Illman, Y. Zha, C.-M. W. Mok, and T.-C. J. Yeh. 2023.
Three-dimensional steady-state hydraulic tomography analysis with integration of cross-hole
flowmeter data at a highly heterogeneous site. Water Resources Research 59:
e2022WR034034.
McCall, W., and T. M. Christy. 2010. Development of a Hydraulic Conductivity-Estimate for the
Hydraulic Profiling Tool (HPT). Abstract and Presentation, The 2010 North American
Environmental Field Conference & Exposition. The Nielsen Environmental Field School,
Las Cruces, NM. January.
McCall, W., and T. M. Christy. 2020. The hydraulic profiling tool for hydrogeologic
investigation of unconsolidated formations. Groundwater Monitoring & Remediation 40,
no. 3: 89–103.
Neuman, S. P., G. R. Walter, H. W. Bentley, J. J. Ward, and D. D. Gonzales. 1984.
Determination of horizontal aquifer anisotropy with three wells. Ground Water 22, no. 1:
66–72.
Ning, Z., N. Luo, K. Inaba, T. Nakashima, T. Shimizu, and W. A. Illman. 2023. Three-
dimensional hydraulic tomography analyses to investigate commingling issues of
reproducibility, data density, and geological prior models. Journal of Hydrology 616:
128785.
Poeter, E. P., and M. C. Hill. 1997. Inverse models: A necessary next step in ground-water
modeling. Ground Water 35, no. 2: 250–260.
Poeter, E. P., and M. C. Hill. 1998. Documentation of UCODE, a computer code for universal
inverse modeling. USGS Water-Resources Investigations Report 98-4080. Reston,
Virginia, USGS.
Puckett, W. E., J. H. Dane, and B. F. Hajek. 1985. Physical and mineralogical data to determine
soil hydraulic properties. Soil Science Society of America Journal 49, no. 4: 831–836.
Rehfeldt, K. R., J. M. Boggs, and L. W. Gelhar. 1992. Field study of dispersion in a
heterogeneous aquifer: geostatistical analysis of hydraulic conductivity. Water Resources
Research 28, no. 12: 3309–3324.
Rodrigues, J. D. 1983. The Noordbergum effect and characterization of aquitards at the Rio
Maior mining project. Ground Water 21, no. 2: 200–207.
Rosas, J., O. Lopez, T. M. Missimer, K. M. Coulibaly, A. H. A. Dehwah, K. Sesler, L. R. Lujan,
and D. Mantilla. 2014. Determination of hydraulic conductivity from grain-size
distribution for different depositional environments. Ground Water 52, no. 3: 399–413.
Rovey, C. W., II, and D. S. Cherkauer. 1995. Scale dependency of hydraulic conductivity
measurements. Ground Water 33, no. 5: 769–780.
Sebol, L. A. 2000. Determination of groundwater age using CFCs in three shallow aquifers in
Southern Ontario. Ph.D. dissertation, Department of Earth and Environmental Sciences,
University of Waterloo, Waterloo, Ontario, Canada.
Soueid Ahmed, A., A. Jardani, A. Revil, and J. P. Dupont. 2014. Hydraulic conductivity field
characterization from the joint inversion of hydraulic heads and self-potential data. Water
Resources Research 50, no. 4: 3502–3522.
Straface, S., T.-C. J. Yeh, J. Zhu, S. Troisi, and C. H. Lee. 2007. Sequential aquifer tests at a well
field, Montalto Uffugo Scalo, Italy. Water Resources Research 43, no. 7: W07432.
Sudicky, E. A. 1988. Reply. Water Resources Research 24, no. 6: 895–896.
Sun, D., N. Luo, A. Vandenhoff, C. Wang, Z. Zhao, D. L. Rudolph, and W. A. Illman. 2022.
Evaluation of the hydraulic profiling tool (HPT) at a highly heterogeneous field site
underlain by glaciofluvial deposits. Draft Technical Report submitted to Geoprobe
Systems, 74 pp.
Theis, C. V. 1935. The relation between the lowering of the piezometric surface and the rate and
duration of discharge of a well using ground-water storage. Eos, Transactions American
Geophysical Union 16: 519–524.
Tiedeman, C. R., and W. Barrash. 2020. Hydraulic tomography: 3D hydraulic conductivity,
fracture network, and connectivity in mudstone. Ground Water 58, no. 2: 238–257.
Tong, X., W. A. Illman, S. J. Berg, and N. Luo. 2021. Hydraulic tomography analysis of
municipal-well operation data with geology-based groundwater models. Hydrogeology
Journal 29, no. 5: 1979–1997.
Verruijt, A. 1969. Elastic storage of aquifers. Flow through Porous Media, 1: 331–376.
Vesselinov, V. V., S. P. Neuman, and W. A. Illman. 2001. Three-dimensional numerical
inversion of pneumatic cross-hole tests in unsaturated fractured tuff 2. Equivalent
parameters, high-resolution stochastic imaging and scale effects. Water Resources
Research 37, no. 12: 3019–3041.
Vienken, T., and P. Dietrich. 2011. Field evaluation of methods for determining hydraulic
conductivity from grain size data. Journal of Hydrology 400, no. 1–2: 58–71.
Vukovic, M., and A. Soro. 1992. Determination of Hydraulic Conductivity of Porous Media
from Grain-Size Composition. Water Resources Publications, LLC Highlands Ranch,
Colorado.
White, I. 1988. Comment on “A natural gradient experiment on solute transport in a sand
aquifer: Spatial variability of hydraulic conductivity and its role in the dispersion
process” by E. A. Sudicky. Water Resources Research 24, no. 6: 892–894.
Williamson, P. 2016. Examination of the electrical-hydraulic conductivity relationship at a
highly heterogeneous site. M.Sc. report, University of Waterloo, 78 pp.
Wu, C.-M., T.-C. J. Yeh, J. Zhu, T. H. Lee, N.-S. Hsu, C.-H. Chen, and A. F. Sancho. 2005.
Traditional analysis of aquifer tests: Comparing apples to oranges? Water Resources
Research 41: W09402.
Xiang, J., T.-C. J. Yeh, C.-H. Lee, K.-C. Hsu, and J.-C. Wen. 2009. A simultaneous successive
linear estimator and a guide for hydraulic tomography analysis. Water Resources
Research 45, no. 2: W02432.
Xie, Q. 2015. Slug tests analysis with different analytical models at a highly heterogeneous field
site. B.Sc. thesis, Department of Earth and Environmental Sciences, University of
Waterloo, Waterloo, Ontario, Canada.
Yeh, T.-C. J., R. Srivastava, A. Guzman, and T. Harter. 1993. A numerical model for water flow
and chemical transport in variably saturated porous media. Ground Water 31, no. 4: 634–
644.
Yeh, T.-C. J., J. Mas-Pla, T. M. Williams, and J. F. McCarthy. 1995. Observation and
three-dimensional simulation of chloride plumes in a sandy aquifer under forced-gradient
conditions. Water Resources Research 31, no. 9: 2141–2157.
Yeh, T.-C. J., and S. Liu. 2000. Hydraulic tomography: development of a new aquifer test
method. Water Resources Research 36, no. 8: 2095–2105.
Yeh, T.-C. J., and J. Zhu. 2007. Hydraulic/partitioning tracer tomography for characterization of
dense nonaqueous phase liquid source zones. Water Resources Research 43: W06435.
Yeh, T.-C. J., D. Mao, Y. Zha, J.-C. Wen, L. Wan, K.-C. Hsu, and C.-H. Lee. 2015. Uniqueness,
scale, and resolution issues in groundwater model parameter identification. Water Science
and Engineering 8, no. 3: 175–194.
Zha, Y., T.-C. J. Yeh, W. A. Illman, T. Tanaka, P. Bruines, H. Onoe, H. Saegusa, D. Mao, S.
Takeuchi, and J.-C. Wen. 2016. An application of hydraulic tomography to a large-scale
fractured granite site, Mizunami, Japan. Groundwater 54, no. 6: 793–804.
Zha, Y., T.-C. J. Yeh, W. A. Illman, C. M. W. Mok, C.-H. M. Tso, and Y.-L. Wang. 2019.
Exploitation of pump-and-treat systems for characterization of hydraulic heterogeneity.
Journal of Hydrology 573: 324–340.
Zhao, Z., W. A. Illman, T.-C. J. Yeh, S. J. Berg, and D. Mao. 2015. Validation of hydraulic
tomography in an unconfined aquifer: A controlled sandbox study. Water Resources
Research 51: 4137–4155.
Zhao, Z., W. A. Illman, and S. J. Berg. 2016. On the importance of geological data for hydraulic
tomography analysis: Laboratory sandbox study. Journal of Hydrology 542: 156–171.
Zhao, Z., and W. A. Illman. 2017. On the importance of geological data for three dimensional
steady-state hydraulic tomography analysis at a highly heterogeneous aquifer-aquitard
system. Journal of Hydrology 544: 640–657.
Zhao, Z., and W. A. Illman. 2018. Three-dimensional imaging of aquifer and aquitard
heterogeneity via transient hydraulic tomography at a highly heterogeneous field site.
Journal of Hydrology 559: 392–410.
Zhao, Z., and W. A. Illman. 2022a. Integrating hydraulic profiling tool pressure logs and
hydraulic tomography for improved high-resolution characterization of subsurface
heterogeneity. Journal of Hydrology 610: 127971.
Zhao, Z., and W. A. Illman. 2022b. Improved high-resolution characterization of hydraulic
conductivity through inverse modeling of HPT profiles and steady-state hydraulic
tomography: Field and synthetic studies. Journal of Hydrology 612: 128124.
Zhao, Z., S. J. Berg, W. A. Illman, and Y. Qi. 2022. Improving predictions of solute transport in
a laboratory sandbox aquifer through high-resolution characterization with hydraulic
tomography. Journal of Hydrology 615: 128673.
Zhao, Z., N. Luo, and W. A. Illman. 2023. Geostatistical analysis of high-resolution hydraulic
conductivity estimates from the hydraulic profiling tool and integration with hydraulic
tomography at a highly heterogeneous field site. Journal of Hydrology 617: 129060.
Zhu, J., and T.-C. J. Yeh. 2005. Characterization of aquifer heterogeneity using transient
hydraulic tomography. Water Resources Research 41, no. 7: 1–10.

Table 1. Descriptive statistics of K and log10 K from various methods at the NCRS.
Method | n | Min. K (m/s) | Max. K (m/s) | Geometric mean KG (m/s) | Range of log10 K | Variance of log10 K (σ²)
Case 1a: GSA (Three Models) | 270 | 3.07×10⁻¹¹ | 2.50×10⁻³ | 1.19×10⁻⁷ | 7.91 | 2.63
Case 1b: Permeameter Tests | 642 | 1.15×10⁻¹⁰ | 4.63×10⁻³ | 3.03×10⁻⁷ | 7.60 | 1.55
Case 1c: Slug Tests | 43 | 1.21×10⁻⁸ | 1.68×10⁻⁴ | 2.65×10⁻⁶ | 4.14 | 1.47
Case 2a: HPT (McCall and Christy, 2010) | 7,660 | 3.53×10⁻⁷ | 2.65×10⁻⁴ | 2.85×10⁻⁶ | 2.88 | 1.28
Case 2b: HPT (Borden et al., 2021) | 7,660 | 1.13×10⁻⁸ | 2.69×10⁻⁴ | 5.78×10⁻⁶ | 4.38 | 0.38
Case 2c: HPT (Zhao and Illman, 2022b) | 7,660 | 3.78×10⁻¹⁰ | 6.90×10⁻⁴ | 3.84×10⁻⁶ | 6.26 | 0.85
Case 3a: PEST Calibrated Geological Model | 19 | 2.53×10⁻⁹ | 1.07×10⁻⁴ | 1.25×10⁻⁶ | 4.63 | 1.61
Case 3b: Averaged THT Geological Model | 19 | 5.44×10⁻⁹ | 1.29×10⁻⁴ | 1.14×10⁻⁶ | 4.37 | 1.46
Case 3c: Highly Parameterized THT Model | 31,713 | 4.20×10⁻¹¹ | 2.90×10⁻³ | 5.79×10⁻⁷ | 7.84 | 1.47
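The columns of Table 1 can be computed from a set of point K estimates as follows. This minimal sketch assumes that KG is the geometric mean of K and that the variance column is the sample variance of log10 K. Whether the study divided by n - 1 or n is not stated here. The function name is hypothetical.

```python
import math
import statistics
from typing import Dict, Sequence


def describe_k(k_values: Sequence[float]) -> Dict[str, float]:
    """Table-1-style summary of positive K estimates in m/s."""
    if len(k_values) < 2:
        raise ValueError("need at least two K values")
    if any(k <= 0 for k in k_values):
        raise ValueError("K must be positive")
    logs = [math.log10(k) for k in k_values]
    return {
        "n": len(k_values),
        "min_k": min(k_values),
        "max_k": max(k_values),
        "k_g": 10 ** statistics.mean(logs),       # geometric mean of K
        "range_log10_k": max(logs) - min(logs),   # orders of magnitude spanned
        "var_log10_k": statistics.variance(logs), # sample variance (n - 1), assumed
    }


# Only the Case 1a minimum and maximum from Table 1 are used here, as a consistency check.
print(round(describe_k([3.07e-11, 2.50e-3])["range_log10_k"], 2))
```

Using only the minimum and maximum reproduces the range column (7.91 for Case 1a). The KG and variance columns require the full set of 270, 642, or 7,660 estimates, which are not listed here.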
Figure Captions
Figure 1. a) Schematic diagram in plan view showing the well configuration including the CMT
and PW well network and nine NC wells where geological data are obtained, as well as 11 HPT
profile locations. Gray dashed lines represent four geological cross sections A-A’, B-B’, C-C’ and
D-D’ as presented in Figure 2; b) 3D perspective view of wells and DP locations within the 15 m
× 15 m well cluster area shown as a blue dashed area in Figure 1a along with numbered well
screens and pumped ports, as well as high-resolution HPT profile locations.
Figure 2. Cross-sectional view of the 19-layer geological zonation model with CMT and PW
screened intervals shown in cross sections C-C’ and D-D’. Cross sections along A-A’ and B-B’
are available in Figure S1 of the Supporting Information (SI) section. The 19 layers represent 7
different material types as indicated in the stratigraphic index. The 7 material types were obtained
through examination of cores from 18 boreholes at the site. Specifically, the 19 layers indicated
on cross sections C-C’ and D-D’ are clay (1, 4, 8, 12, 16, 18), silt and clay (17, 19), silt (2, 7, 10,
14), sandy silt (6, 9, 13), silt and sand (5), sand (3, 11) and sand and gravel (15). On cross sections
C-C’ and D-D’, layer numbers are italicized and numbers along PW and CMT wells indicate port
numbers (e.g., PW1-1, PW1-2, and so on).
Figure 3. Vertical profiles of log10K estimates from various approaches with K in units of m/s
along CMT3 and HPT6 plotted against site stratigraphy.
Figure 4. Box-and-whisker plots of log10K estimates with K in units of m/s from various site
characterization methods for 19 layers of the geological model.
Figure 5. Log10 KG estimates, with KG in units of m/s, from various site characterization
approaches for the 19 layers of the geological model. Log10 K values from Case 3c are also
plotted; because these values are not assigned by layer, they are plotted against depth (m) on
the right axis based on the vertical profile of PW1 at the center of the simulation domain.
Figure 6. K distributions at the NCRS from various site characterization approaches. CMT and
PW well locations (red lines) along with their screened intervals (black), as well as HPT
survey locations (dashed pink lines), are shown on each subfigure.
Figure 7. Scatterplots of observed versus simulated drawdowns from various K characterization
approaches for model validation under steady-state conditions.
Figure 8. Scatterplots of observed versus simulated drawdowns from various K characterization
approaches for model validation under transient conditions.
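Figures 4 and 5 summarize K estimates by geological layer. The following minimal sketch shows that grouping for point-estimate methods. It assumes that each estimate has already been assigned a layer number (1 to 19) from the geological model. The function name and the (layer, K) pairs are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def layer_log10_kg(samples: Iterable[Tuple[int, float]]) -> Dict[int, float]:
    """Map each layer id to log10 of the geometric mean of its K values (m/s)."""
    logs_by_layer: Dict[int, List[float]] = defaultdict(list)
    for layer, k in samples:
        if k <= 0:
            raise ValueError(f"non-positive K in layer {layer}")
        logs_by_layer[layer].append(math.log10(k))
    # log10 of a geometric mean equals the arithmetic mean of the log10 values.
    return {layer: sum(v) / len(v) for layer, v in sorted(logs_by_layer.items())}


# Hypothetical (layer, K) pairs, e.g., point estimates intersected with the layer model.
samples = [(1, 1e-9), (1, 1e-7), (3, 1e-4), (3, 1e-5), (3, 1e-6)]
print(layer_log10_kg(samples))
```

Averaging in log space keeps a few high-K sand or gravel values from dominating a layer's summary, which is consistent with plotting log10 KG rather than an arithmetic mean.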



tomography, geological model, groundwater model.

Significant efforts have been expended for improved characterization of hydraulic

Moreover, inverse modeling approaches based on geology-based zonations, and highly

modeling based on geology-based zonations and highly parameterized approaches. The

performance of each approach is first qualitatively analyzed by comparing K estimates to site

are necessary steps in predicting accurate groundwater flow behavior.

heterogeneity presents significant challenges to implementing conventional site characterization

Conventional methods such as empirical-relation-based grain size analyses (GSA),

without interpolating point-scale measurements.

be selected and used for data analysis to minimize interpretation errors.

To obtain larger-scale estimates of K and Ss representative of large-scale groundwater flow

Neuman 1985). However, estimates of K and Ss from the traditional interpretation of

necessary for higher resolution subsurface characterization of K and Ss heterogeneity.

efficient alternatives to conventional well-based approaches for providing high-resolution vertical

have been developed as an alternative approach to estimate K and Ss by calibrating a groundwater

flow model consisting of geological zonations with ambient or anthropogenically modified

pumping or injection tests and collects drawdown/buildup-time datasets at several surrounding

medium to be homogeneous, consisting of geology-based zonations, or highly parameterized

various degrees of model parameterization ranging from geological zonations to a highly

parameterized geostatistics-based HT approach.

A question frequently encountered by hydrogeologists is what approach should be adopted

geology-based zonations, and a highly parameterized geostatistics-based HT approach.

and useful for groundwater flow modeling.

2011), which is underlain by a multiple aquifer-aquitard system consisting of highly heterogeneous

characterization methods for K heterogeneity. Approaches evaluated include (1) conventional

Specifically, a three-dimensional (3-D) forward groundwater model is developed using

Description of Field Site and Data Used for Analyses

Site Description and Hydrogeology

materials from clay to boulders (Karrow, 1993).

windows, rendering the site highly heterogeneous.

Available Field Data and the 19-Layer Geological Model

(Alexander et al., 2011).

samples collected from PW2, PW3, PW4, and PW5 wells.

constructed by examining lithology information obtained from 18 boreholes completed to different

and Illman (2017).

The geological model is 70 m × 70 m × 17 m in extent and is constructed with the

glaciofluvial deposit at the NCRS.

Insert Figures 1 and 2 here

Description of Various K Estimation Methods

sediments, with details provided in the Supporting Information (SI) section.
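Grain size analyses convert a particle size distribution into K through an empirical relation. The specific relations applied here are described in the SI; as one illustrative example only, the sketch below interpolates d10 (the grain diameter for which 10% of the sample by mass is finer) from sieve data on a logarithmic size axis and applies Hazen's formula, K = C d10², with C ≈ 100 when K is in cm/s and d10 in cm. Hazen's relation is usually restricted to fairly uniform sands, and the function names and sieve values are hypothetical.

```python
import math


def d10_from_sieve(sizes_mm, percent_passing):
    """Interpolate d10 [mm] linearly on a log10(grain size) axis (hypothetical helper)."""
    pts = sorted(zip(sizes_mm, percent_passing))  # ascending grain size
    for (d_lo, p_lo), (d_hi, p_hi) in zip(pts, pts[1:]):
        if p_lo <= 10.0 <= p_hi:
            if p_hi == p_lo:
                return d_lo
            frac = (10.0 - p_lo) / (p_hi - p_lo)
            log_d = math.log10(d_lo) + frac * (math.log10(d_hi) - math.log10(d_lo))
            return 10.0 ** log_d
    raise ValueError("10% passing lies outside the measured sieve range")


def hazen_k(d10_mm, C=100.0):
    """Hazen estimate of K in m/s; C applies to K in cm/s and d10 in cm."""
    k_cm_s = C * (d10_mm / 10.0) ** 2  # convert d10 from mm to cm
    return k_cm_s / 100.0  # cm/s -> m/s
```

Each sample yields a single point-scale K, which is why GSA estimates must be interpolated or zoned before use in a flow model.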

Case 1b: Permeameter Tests

Another traditional method for obtaining K estimates is to conduct laboratory permeameter

A permeameter test preferentially determines vertical K. As reported by

in permeameter tests is small compared to the K heterogeneity. Moreover, it is very difficult to

materials from highly permeable intervals.
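Permeameter analysis applies Darcy's law to a core sample, so the result is a vertical K at the scale of the core. This section does not state whether constant-head or falling-head tests were run, so the sketch below shows both standard reductions: K = QL/(AΔh) for constant head and K = (aL/(At)) ln(h1/h2) for falling head. Function names and example numbers are assumptions.

```python
import math


def k_constant_head(Q, L, A, dh):
    """Constant-head permeameter, K = Q L / (A dh); SI inputs give K in m/s.

    Q  : steady flow rate through the sample [m^3/s]
    L  : sample length along the flow direction [m]
    A  : sample cross-sectional area [m^2]
    dh : constant head difference across the sample [m]
    """
    return Q * L / (A * dh)


def k_falling_head(a, A, L, t, h1, h2):
    """Falling-head permeameter, K = (a L / (A t)) ln(h1 / h2).

    a  : standpipe cross-sectional area [m^2]
    A  : sample cross-sectional area [m^2]
    L  : sample length along the flow direction [m]
    t  : time for the standpipe head to fall from h1 to h2 [s]
    """
    return a * L / (A * t) * math.log(h1 / h2)
```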

Case 1c: Slug Tests

boreholes based on site heterogeneity (Butler, 1997).
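Slug tests yield K for the formation around the screened interval. The interpretation method used in this study is not given in this section; one widely used option is the Hvorslev method, sketched below under the assumption that the screen length Le exceeds 8 times the screen radius R, so that K = r_c² ln(Le/R) / (2 Le T0), where T0 is the basic time lag at which H/H0 falls to about 0.37. The recovery is fitted as ln(H/H0) = -t/T0 by least squares through the origin; the function name and inputs are hypothetical.

```python
import math


def hvorslev_k(times, heads, H0, r_c, R, Le):
    """K [m/s] from slug-test recovery with the Hvorslev method (hypothetical helper).

    times : elapsed times after slug insertion or removal [s]
    heads : head displacement H(t) from static level [m]
    H0    : initial displacement [m]
    r_c   : casing radius where the water level changes [m]
    R     : screen (or borehole) radius [m]
    Le    : screen length [m]; this form assumes Le / R > 8
    """
    # Normalized recovery decays as ln(H/H0) = -t / T0
    y = [math.log(h / H0) for h in heads]
    # Least-squares slope of a line forced through the origin
    slope = sum(t * yi for t, yi in zip(times, y)) / sum(t * t for t in times)
    T0 = -1.0 / slope  # basic time lag, where H/H0 = e^-1 (about 0.37)
    return r_c**2 * math.log(Le / R) / (2.0 * Le * T0)
```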

Cases 2a – 2c: HPT Surveys

high-resolution variability of K to an approximate depth of 17 m with the HPT probe (Model

Case 3a: PEST-Calibrated Geological Model

The computational mesh is shown in Figure S2 of the SI section.
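PEST adjusts model parameters to minimize a weighted least-squares objective function, Φ = Σ (w_i r_i)², where r_i is the difference between observed and simulated values, using a Gauss-Marquardt-Levenberg scheme. The sketch below illustrates that idea on a deliberately tiny problem: a single log10(K) parameter, with the Thiem steady-state drawdown equation standing in for the 3-D groundwater model. It is a damped Gauss-Newton loop with a one-parameter Marquardt factor, not PEST itself, which also handles many parameters, lambda selection, and bounds; all names and values are hypothetical.

```python
import math


def phi(observed, simulated, weights):
    """Weighted least-squares objective: sum of (w * residual)^2."""
    return sum((w * (o - s)) ** 2 for o, s, w in zip(observed, simulated, weights))


def calibrate_log10_k(forward, observed, weights, p0, lam=1e-3, iters=20, dp=1e-6):
    """Fit one parameter p = log10(K) by damped Gauss-Newton iterations.

    forward(p) returns simulated values aligned with `observed`.
    Dividing by (1 + lam) is the one-parameter form of Marquardt damping.
    """
    p = p0
    for _ in range(iters):
        sim = forward(p)
        res = [o - s for o, s in zip(observed, sim)]
        # Forward-difference sensitivity of each simulated value to p
        sim_dp = forward(p + dp)
        jac = [(sd - s) / dp for sd, s in zip(sim_dp, sim)]
        jtwj = sum((w * j) ** 2 for w, j in zip(weights, jac))
        jtwr = sum(w * w * j * r for w, j, r in zip(weights, jac, res))
        p += jtwr / (jtwj * (1.0 + lam))
    return p


def thiem_drawdown(p, radii, Q=1e-3, b=5.0, R=100.0):
    """Toy forward model: steady confined drawdown s(r) = Q ln(R/r) / (2 pi K b)."""
    K = 10.0**p
    return [Q * math.log(R / r) / (2.0 * math.pi * K * b) for r in radii]


if __name__ == "__main__":
    radii = [2.0, 5.0, 10.0, 20.0]
    obs = thiem_drawdown(-4.0, radii)  # synthetic "observations" for K = 1e-4 m/s
    weights = [1.0] * len(radii)
    p_hat = calibrate_log10_k(lambda p: thiem_drawdown(p, radii), obs, weights, p0=-5.0)
    print(f"estimated K = {10.0 ** p_hat:.3e} m/s")
```

Calibrating log10(K) rather than K keeps the parameter well scaled when K spans orders of magnitude.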

Model utilizes all 31,713 K and Ss estimates.

Qualitative Comparison of K Estimates

Insert Figure 3 here

individual layers (i.e., intralayer heterogeneity).
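Near-continuous profiles, such as those from HPT surveys, resolve intralayer heterogeneity that a layer-based zonation cannot represent. When such a profile is compared with, or assigned to, a zoned model, one option is to average the readings that fall within each layer. For perfectly stratified media, the arithmetic mean is the effective K for flow parallel to layering and the harmonic mean is the effective K for flow across it, with the geometric mean in between. This sketch assumes evenly spaced readings so each carries equal weight; whether and how such averaging was done in this study is not stated in this section, and the function name is hypothetical.

```python
from statistics import fmean, geometric_mean, harmonic_mean


def layer_averages(depths, k_values, layer_bounds):
    """Average a vertical K profile within each layer (hypothetical helper).

    depths       : depths of the K readings [m], assumed evenly spaced
    k_values     : K estimates at those depths [m/s], all > 0
    layer_bounds : list of (top, bottom) depths for each layer [m]
    Returns one dict per layer, or None if no reading falls in that layer.
    """
    results = []
    for top, bottom in layer_bounds:
        ks = [k for z, k in zip(depths, k_values) if top <= z < bottom]
        if not ks:
            results.append(None)
            continue
        results.append(
            {
                "arithmetic": fmean(ks),  # effective K parallel to layering
                "geometric": geometric_mean(ks),
                "harmonic": harmonic_mean(ks),  # effective K across layering
            }
        )
    return results
```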

number of available data points is low.

m to 16 m indicate an increase in K, which does not conform to the stratigraphy at this location.

Statistical Analysis of K from Various Site Characterization Approaches

Descriptive Statistics of K from Various Approaches

logarithm of K (ln K) is provided in Table S1 of the SI section.
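Statistics are computed on ln K rather than K because K typically spans orders of magnitude and is commonly treated as log-normally distributed. A minimal sketch with Python's standard `statistics` module follows; it assumes the sample (n − 1) variance, which may differ from the convention used for Table 1 and Table S1, and the function name is hypothetical.

```python
import math
import statistics


def ln_k_summary(k_values):
    """Descriptive statistics of ln K for one characterization approach.

    k_values : K estimates [m/s], all > 0
    Uses the sample (n - 1) variance.
    """
    ln_k = [math.log(k) for k in k_values]
    mean = statistics.fmean(ln_k)
    return {
        "n": len(ln_k),
        "mean_lnK": mean,
        "var_lnK": statistics.variance(ln_k),
        "min_lnK": min(ln_k),
        "max_lnK": max(ln_k),
        # Back-transforming the mean of ln K gives the geometric mean of K
        "geometric_mean_K": math.exp(mean),
    }
```

Applying the same function to each approach's estimates keeps the comparison consistent across methods with very different sample counts.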

Insert Table 1 here