Developing richer engineering geological models through a data-centric approach to site investigation

Ian Shipway
EDG Consulting, Australia

Tim Swavley
Macquarie Geotechnical, Australia

ABSTRACT: Geotechnical workflows require the development of engineering geological models (EGMs), which are used to inform
analysis and design objectives. EGMs are “developed” through consideration of data from multiple sources, with some of the most
detailed data sources including manually “logged” boreholes and test pits.
Collation of the data obtained from boreholes and test pits against a depth scale on printable “log reports” is a practice which has
remained relatively unchanged since the early days of modern geotechnical practice. We postulate that reliance on such page bound
reports as the primary “truth reference” for the results of investigation activities cripples the subsequent geotechnical interpretation,
analysis and design workflows, reducing the opportunity for them to be digitally transformed.
Reversing this industry orientation such that the underlying data is the main truth reference enables a broad range of digital
opportunities. We demonstrate how good site investigation data sets can enhance interpretation, allow for greater interrogation, and
enable more efficient practice; ultimately allowing the development of more robust EGMs, reduced geotechnical risk, and other
benefits to projects.
KEYWORDS: models, data, geotechnical investigation

1 INTRODUCTION

The recently published IAEG Guidelines for the development of Engineering Geological Models (Baynes et al, 2022) present the “state of the art” for model development for engineering projects of all scales. As noted in the Guidelines, geological/geotechnical models of some sort have been used to improve understanding of ground conditions since the 1950s. Current good geotechnical practice requires that a model is developed for all projects, and AS1726-2017 requires that a “geotechnical model” is developed for every geotechnical site investigation. Similarly, the ANCOLD Guidelines for Geotechnical Investigation of Dams, their foundations and Appurtenant Structures (ANCOLD, 2020) require the development of an engineering geological/geotechnical model as part of the geotechnical investigation process.
Regardless of the degree of complexity of the site conditions and therefore the Engineering Geological Model (EGM), the mechanics of model development requires integration and synthesis of geotechnical data from multiple sources, often ranging from surface observations and mapping data through to subsurface investigation data including that from boreholes, insitu and laboratory tests, and geophysics. Figure 1 illustrates typical data sets which need to be integrated for an infrastructure project where ground conditions are dominated by soils/soft soils. Similar guides are available for projects where rock dominates the sub-surface, such as that provided in Eggers et al (2021). As is evident from Figure 1, the broad range of information to be considered makes integration of a large volume of data necessary when working on large infrastructure projects. The integrity and completeness of that data is essential to the development of a robust EGM.
Although the level of sophistication and quantity of the data obtained in sub-surface investigation has increased significantly over the last 30 years, the base standard for presentation of that data is still centered around the “paginated” borehole log. For example, tenderers for design and construct contracts on large infrastructure projects are often provided only with a PDF version of the A4 log, low resolution printed imagery and other data in a form that is not immediately transferable across software applications. In these circumstances, EGM development is initially compromised, with the efforts of each tender team focused on laboriously trying to digitise the minimal subset of data practical to allow production of a rudimentary long section or cross section. This often requires some data to be omitted, because there is no time available to reproduce it to a standard where it can be usefully incorporated into the EGM.
In practice, more hours can be spent on this activity than on interpreting the ground conditions and understanding the impact of the project on those conditions, and data entry/transcription errors are likely under such circumstances.

Figure 1. The components that should be synthesized and cross correlated to develop an engineering geological model for a soils/soft ground infrastructure project (Ameratunga et al 2021).
2 THE BOREHOLE LOG IN GEOTECHNICAL PRACTICE

2.1 Definitions
The term “borehole log” used throughout this document refers to any set of information which is collated primarily from visual-tactile assessment of material removed from the ground, is supplemented with data from field and laboratory testing, and which makes a contiguous representation of ground conditions encountered from hole commencement to termination. This term is intentionally broad and is agnostic to hole construction methodology (for example auger vs cored hole construction). This definition would also include similar information recorded from excavations or existing exposures.
Distinction is also made between the term “borehole log”, referring to the set of information, and “borehole log report”, which refers to a specific presentation of the information, rendered onto a template and arranged for presentation on a physical page (pagination).

Figure 2. Sketch section of a borehole drilled in 1884 in the Parish of Rodborough, Victoria, reported by Geol. Survey of Vic (1909).

2.2 History & Purpose
The example section in Figure 2 indicates that boreholes were being drilled in Australia to investigate geological conditions prior to the beginning of the 20th century, and probably well before that. The depth-scaled, paginated borehole log report has been in common usage in Australia since at least the 1960s (as exemplified in Figure 3).
The broad presentation format of such log reports has not significantly changed over time, with the main refinement being to group information in three broad categories relating to hole construction and field activities, materials/mass description, and interpretations of geological origin.
Involvement within a wide range of infrastructure projects by the authors over the last 40 years suggests that the information in the borehole log report is usually treated as a major source, and in many cases the primary source, of “truth” on ground conditions. To some extent this is understandable because the descriptions and classifications made by geotechnical professionals on site and presented on the log report are important primary information, much of which can’t be collected through other means, despite the availability of sophisticated testing methods. The removal of material from a hole by any means triggers breakdown/corruption of its observable characteristics. This breakdown especially accelerates as material is handled and transported away from site. Even where decomposition of materials does not take place, it is often impractical for anyone other than the field logger to assess and deduce the characteristics of the material. Until new site investigation methods are invented, the primary observations made when logging will remain one of the most essential sources of information for ground engineering projects.
Despite the similarities between information presented on log reports from 60 years ago and now, in modern geotechnical workflows the borehole log serves as a reference set of information into which data from rig-side logging, the logging bench, downhole instrumentation, and laboratory sources is consolidated. The life of a modern log report often commences with rig-side information added first, with other information updated onto a log as it becomes known. In this way the “final” version of a log can be markedly different from its initial draft, although it remains a “roundtable” which project stakeholders continually reference to inform actions contributing to project goals.
Therefore, although the information collected in the field by geologists and engineers that is presented on logs is critical to development of a robust EGM, the log report itself may provide only a filtered subset of data rather than all of the available information.

Figure 3. Part of a typed borehole log from a Qld infrastructure project of 1966.

2.3 Methods of log report production
From the beginning, the size of the page and log production method limited the information that could be included within the page margins. Logs of the 1960s, such as the extract shown in Figure 3, were limited in the information that they could practically display by the characteristics of the typewriter.
In modern practice log reports are most commonly produced by a ground data management system (GDMS). Such applications contain paginated report “templates” which leverage the base principles of computerised report generation, comprising report and page level header and footer elements, between which are “sandwiched” depth-scale graduated textual and graphical column bodies. Beneath the templates are data tables containing rows of information, with each row explicitly having a nominated relevant depth (or depth range). In this way log reports of infinite length can be generated by simple reproduction of the information from the data table(s) in the
appropriate column at the nominated depth within the report body. In rare instances (relative to industry scale) logs are produced via spreadsheet templates or even manually drafted by hand.
Workflows for log report production via GDMS have several advantages over hand/type written forms, in that they produce log reports that:
• Are easily edited and reproduced.
• Are generally cleanly spaced with clear presentation.
• Incorporate algorithmic rule-based compilation and presentation of information.
• Allow more precise placement of information at the requisite depth.
The way the printed logs can be used, however, remains generally unchanged in modern practice.

2.4 Methods of borehole log transmission
Historical practice has required delivery of rendered copies of reports to users. Initially this required delivery of a physical copy of the document; however, as technology has evolved, transmission by fax, and subsequently by email, digital storage media, and shared cloud storage has become possible.
In modern practice delivery of borehole log information is almost wholly via digital means, including in rasterized manner (scans producing grids of pixels or pixelization of vector objects), vectorized (with text and symbol objects dynamically rendered at the desired position from data), and in a datafile manner free of typographic and rendering information.
Scanned and vectorized methods commonly use PDF format, are generally paginated, and do not possess the ability for the presentation style to be changed. The most common datafile format is AGS datafiles (comprising multiple text tables of comma separated values).

3 MANAGEMENT OF LOG DATA & PRODUCTION OF

3.1 Report-centric practice
3.1.1 Definition
We use the term “report-centric” to refer to the approach where geotechnical workflows (both inter and intra-organisational; and within and outside of the project context) are orientated such that the borehole log report is the only valued product of borehole log information.
Report-centric practice has a mutual relationship with GDMSs configured to operate as a “reportbase”, that is a “database structure [that] is built as a one-to-one image of the desired report” (Caronna, 2005). In simple terms this means the underlying data tables and report template only accept pre-completed “sentences” of information and will present it “as written” at the designated depth.
In the authors’ experience it is likely that reportbase GDMS have been popular in Australia because:
• They more closely imitate the familiar approach to log production (typewriter).
• Fewer digital skills are required to establish and use such configurations.
However, reportbases can only support report-centric practice, essentially compromising outputs by the limitations of the printed page.
In some situations, projects require log information to be converted retrospectively to a similar form as data-centric outcomes. This is not an impossible task to perform; however, as report-centric information is largely unstructured it is virtually impossible to transfer the whole of the information between the two formats without possible mistranslation of terms and loss/translation of meaning. The ultimate outcome of this is that the digital data set and paginated log reports convey differing (sometimes conflicting) representations.
3.1.2 Drivers and pitfalls of report-centric practice
Other than technical limitations associated with GDMS configuration, report-centric practice is commonly driven by the manner in which stakeholders engage with the information.
Report-centric data generators expect that all stakeholders requiring their product will refer primarily to the paginated log reports when information about a test location is sought. This is convenient for users wanting to mentally interpret the overall conditions encountered (a task for which one is best to consider all available information); however, it creates a manual re-entry requirement for all subsequent use (including collation into a digital EGM or analytical model).
Although in some cases report-centric data generators can transfer the underlying data in a datafile format, the information is not machine interpretable (today’s computing tools can readily recognize a word, but not understand the meaning of that word in a sentence), and without the benefit of well thought out data governance, data is unable to be algorithmically processed without significant reprocessing. An example of the latter is contiguity “gaps” in rock strength tables for unstated reasons (“no core”, soil material, or forgotten to log).
At the field data collection level, paginated reporting practice cripples the data informing the EGM in several ways:
• Dense data (relative to intended scale) is culled to prevent overcrowding on the page (or important information is lost amongst relatively unimportant data).
• There is still often a requirement to cull or group data to ensure clarity of presentation because of the limited page space.
• Page space is used inefficiently as information cannot “cross lanes” between columns.
• Information not included on the main log is readily forgotten.
• PDFs are often compressed for emailing, leading to images becoming pixelated and useless.
Beyond the initial collection of field information, report-centric thinking can continue to compromise the data set. In some cases, digital transmissions include contractual terms stating that the paginated representation is superior to the digital data set and that digital results are an afterthought provided for convenience only. Such terms diminish the value of the digital data set or transfer liability for the generator’s mistakes and poor data management onto the receiving party.
Report-centric project owners can also cripple data generators by requiring paginated reports as the primary deliverable. This can trigger the same outcomes as above. Over emphasis on the importance of the visual aesthetics of the paginated product can reduce opportunity to benefit from data-centric practice, and force compromise of the quality of the underlying dataset.

3.2 Data-centric practice
3.2.1 Definition
We use the term “data-centric” to refer to the approach where geotechnical workflows are orientated around a single set of data maintained in a systematically truthful and machine-readable state. That is, the results of site investigation are collected and recorded in a manner which is consistent with an established set of rules governing how a computer (rather than its subsystems) is to read and interpret the data, to ensure that the computer-interpreted meaning of the information matches the original generator’s intention.
Data-centric practice stores data in a database which possesses sets of information sufficiently divided into smaller components of detail to allow for algorithmic interpretation (“data granularity”). The data is stored with respect to the rules of interpretation, rather than specifically for presentation on one
template in one application.
This approach can be considered the application of the base principles of the greater concept of “digital engineering”, which is broadly defined in many sources as “an integrated digital approach that uses authoritative sources of systems’ data and models as a continuum across disciplines” (attribution unknown).
The AGS data format (first developed in the United Kingdom) provides a data structure designed by industry which is suitable for most machine-reading applications. The Australian Geomechanics Society has published a localized version adopted for Australian practice (AGS AU).

3.2.2 Opportunities enabled by data-centric practice
One primary value proposition of data-centric practice is the construction of sets of machine readable data. When data can be read by a computer, programs can algorithmically perform “work” which would otherwise be performed by human effort.
Some examples of data-centric replacement of human effort include:
• Algorithmic compilation of material descriptions.
• Algorithmic/rule based presentation of information (for example automatically detecting and flagging incomplete [...]).
• Introduction of rules to validate and identify erroneous, conflicting, or incomplete data, for example algorithmic validation of field observations against lab results.
• Direct reading of material description and condition data into an analysis package, allowing “single click” analysis to be performed.
• Scripted derivation of rock-mass properties (eg RQD, RMR, Q etc).
• Cross correlation of data from different sources, for example defect descriptions against televiewer data.
Machine readable data can also be easier to consume as it can be presented in many different formats and styles, reduced to relevant subsets, or presented along with other foreign data etc. These characteristics provide opportunity for enhanced correlation/analysis of borehole log data with other sources of information, enabling richer and more easily developed EGMs.
The above examples consider primarily single-task exercises, where data is extracted and pushed through an analysis as a manually driven but automation assisted process. There is, however, much greater opportunity on offer for our industry when the databases supporting data-centric practice are designed with digital engineering principles in mind (sharing data between applications and across disciplines). When the data is stored in a common data environment it becomes directly accessible to all stakeholders. When stakeholders use applications which can access live data sets and detect changes in state, analysis and design on a project can be performed in near real-time, as site investigation information becomes available. Contractual and data reliability reform is needed to support such working arrangements in the Australian context.

4 CONCLUSIONS

The information collected by field geologists and engineers is important primary data on ground conditions. Many important characteristics of soil and rock which are critical to geological interpretation can be observed only as the data is collected at the drilling rig. The report-centric approach, where the borehole log report provides the main source of truth for this information, often leads to compromises in data integrity, both at the point of data collection and later in the project lifecycle.
Conversely, a data-centric approach allows all data to be collected and maintained through the reporting process without consideration of space on a “standard page”.
As defined in Baynes et al (2022) an EGM “is a comprehensive knowledge framework that supports the interpretation and assessment of the engineering geological conditions and allows the interaction of these conditions with the proposed project to be evaluated, so that appropriate engineering decisions can be made throughout the life cycle of the project from inception to decommissioning”. Implicit in that definition is the requirement that an EGM for a large infrastructure project requires a broad range of data to be cross correlated and integrated into the overall model.
A data-centric approach provides the opportunity for important primary data from sub-surface investigation to be recorded in its entirety and retained with integrity throughout the project. It also allows the observations of the logger on site to be readily correlated against other digital data sources such as televiewer data and the results of other down-hole testing. Importantly, it also ensures that information such as the logger’s assessment of geological origin, which sits at the “interpretive” side of fieldwork, is recorded and retained as part of the digital dataset.

5 ACKNOWLEDGEMENTS

The authors wish to thank Timothy Thompson for introducing us and suggesting that we collaborate on this paper.

6 REFERENCES

Ameratunga, J., Sivakugan, N. and Das, B.M. 2021. “Engineering geology of soft clay,” in Soft Clay Engineering and Ground Improvement. Abingdon, Oxon: CRC Press.
Australian National Committee on Large Dams 2020. “Guidelines for geotechnical investigations of dams, their foundations and appurtenant structures.”
AS1726 – 2017 Geotechnical Site Investigations, Standards Australia.
Caronna, S. 2005. “Data Granularity in the storage and reporting of Soil Exploration Information”, The Second Annual Geotechnical, Geophysical, and Geoenvironmental Technology Transfer Conference and Expo, Charlotte, North Carolina, 14-15 April 2005.
Baynes, F. J. and Parry, S. 2022. Guidelines for the development and application of engineering geological models on projects. International Association for Engineering Geology and the Environment (IAEG) Commission 25 Publication No. 1, 129 pp.
Eggers, M. J. & Bertuzzi, R. 2020. Chapter 1 The Engineering Geological Model. In: Tunnel Design Handbook 4th Edition; PSM publication.
Hunter, S. 1909. “The Deep Leads of Victoria”, Memoirs of the Geological Survey of Victoria.
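
Section 3.2.2 lists “scripted derivation of rock-mass properties (eg RQD, RMR, Q etc)” among the tasks that machine readable data makes possible. As a minimal illustration only, the Python sketch below computes RQD for a single core run using the conventional definition (total length of core pieces of at least 100 mm, divided by the run length, as a percentage). The class name `CoreRun`, its field names, and the sample values are assumptions made for this sketch; they are not AGS headings and do not represent the authors' tooling.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class CoreRun:
    """One cored interval. Field names are illustrative, not AGS headings."""
    top_m: float
    base_m: float
    piece_lengths_mm: tuple[float, ...]  # intact core pieces between natural defects


def rqd_percent(run: CoreRun, threshold_mm: float = 100.0) -> float:
    """RQD = 100 * (sum of piece lengths >= threshold) / run length."""
    run_length_mm = (run.base_m - run.top_m) * 1000.0
    if run_length_mm <= 0:
        raise ValueError("core run base must be deeper than its top")
    sound_mm = sum(p for p in run.piece_lengths_mm if p >= threshold_mm)
    return 100.0 * sound_mm / run_length_mm


# Hypothetical 1.5 m run with 280 mm of core loss.
run = CoreRun(top_m=10.0, base_m=11.5,
              piece_lengths_mm=(250, 90, 400, 60, 300, 120))
print(round(rqd_percent(run), 1))  # 71.3
```

A calculation like this is only possible when piece lengths are stored as individual values against the run, rather than as a pre-written sentence on the log report.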

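
Section 3.1.2 gives contiguity “gaps” in rock strength tables as an example of data that cannot be processed without rework, and Section 3.2.2 lists rules that identify erroneous, conflicting, or incomplete data. The following Python sketch shows one way such a rule could look, under the assumption that each table row carries a top and base depth in metres. The function name `find_depth_issues`, the tolerance, and the sample rows are hypothetical.

```python
def find_depth_issues(intervals, hole_top, hole_base, tol=1e-6):
    """Return (kind, from_depth, to_depth) tuples for gaps and overlaps.

    intervals: iterable of (top_m, base_m) rows from one depth-keyed table.
    """
    issues = []
    cursor = hole_top  # deepest depth covered so far
    for top, base in sorted(intervals):
        if top > cursor + tol:
            issues.append(("gap", cursor, top))
        elif top < cursor - tol:
            issues.append(("overlap", top, min(cursor, base)))
        cursor = max(cursor, base)
    if cursor < hole_base - tol:
        issues.append(("gap", cursor, hole_base))
    return issues


# Hypothetical rock strength rows for a hole terminated at 9.0 m.
strength_rows = [(0.0, 1.2), (1.2, 3.0), (3.5, 6.0), (5.8, 8.0)]
issues = find_depth_issues(strength_rows, hole_top=0.0, hole_base=9.0)
for kind, start, end in issues:
    print(f"{kind}: {start} m to {end} m")
```

Each flagged gap could then be resolved by the logger with an explicit reason (for example “no core”), so that the stored table is contiguous and its meaning is unambiguous to downstream software.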

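
The distinction drawn in Sections 3.1.1 and 3.2.1 between pre-completed “sentences” and granular data can be sketched in Python as follows. The example assumes a layer record split into separate fields, from which a description is compiled algorithmically and incomplete records are flagged. The class `SoilLayer`, the field names, the word order of the compiled description, and the list of required fields are all illustrative assumptions and do not reproduce AS1726 or any GDMS schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SoilLayer:
    """A granular layer record. Field names are hypothetical."""
    top_m: float
    base_m: float
    primary: str
    plasticity: Optional[str] = None
    colour: Optional[str] = None
    moisture: Optional[str] = None
    consistency: Optional[str] = None
    origin: Optional[str] = None


def compile_description(layer: SoilLayer) -> str:
    """Assemble a description string from the stored components."""
    details = [v for v in (layer.plasticity, layer.colour,
                           layer.moisture, layer.consistency) if v]
    text = ", ".join([layer.primary.upper()] + details)
    if layer.origin:
        text += f" ({layer.origin})"
    return text


def missing_fields(layer: SoilLayer,
                   required=("colour", "moisture", "consistency")) -> list:
    """List the required fields that have not been recorded."""
    return [name for name in required if not getattr(layer, name)]


complete = SoilLayer(0.5, 2.0, "clay", plasticity="high plasticity",
                     colour="grey", moisture="moist",
                     consistency="stiff", origin="alluvium")
partial = SoilLayer(2.0, 3.1, "sand", colour="brown")

print(compile_description(complete))  # CLAY, high plasticity, grey, moist, stiff (alluvium)
print(missing_fields(partial))        # ['moisture', 'consistency']
```

Because the components are stored separately, the same record can feed a log report template, a validation rule, or an analysis package without manual re-entry.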