Bayesian Inverse Problems
Fundamentals and Engineering Applications

Editors
Juan Chiachío-Ruano
University of Granada
Spain
Manuel Chiachío-Ruano
University of Granada
Spain
Shankar Sankararaman
Intuit Inc.
USA

A SCIENCE PUBLISHERS BOOK
Cover credit: Cover image by Dr Elmar Zander (chapter author). It is original and has not been taken from any copyrighted source.

First edition published 2022


by CRC Press
6000 Broken Sound Parkway NW, Suite 300, Boca Raton, FL 33487-2742

and by CRC Press


2 Park Square, Milton Park, Abingdon, Oxon, OX14 4RN

© 2022 Taylor & Francis Group, LLC

CRC Press is an imprint of Taylor & Francis Group, LLC

Reasonable efforts have been made to publish reliable data and information, but the author and publisher cannot assume
responsibility for the validity of all materials or the consequences of their use. The authors and publishers have attempted to
trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if permission to
publish in this form has not been obtained. If any copyright material has not been acknowledged please write and let us know
so we may rectify in any future reprint.
Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized
in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying,
microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers.
For permission to photocopy or use material electronically from this work, access www.copyright.com or contact the
Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. For works that are not
available on CCC please contact mpkbookspermissions@tandf.co.uk

Trademark notice: Product or corporate names may be trademarks or registered trademarks and are used only for identification
and explanation without intent to infringe.

Library of Congress Cataloging‑in‑Publication Data


Names: Chiachío-Ruano, Juan, 1983- editor.
Title: Bayesian inverse problems : fundamentals and engineering
applications / editors, Juan Chiachío-Ruano, University of Granada,
Spain, Manuel Chiachío-Ruano, University of Granada, Spain, Shankar
Sankararaman, Intuit Inc., USA.
Description: First edition. | Boca Raton : CRC Press, 2021. | Includes
bibliographical references and index.
Identifiers: LCCN 2021016121 | ISBN 9781138035850 (hardcover)
Subjects: LCSH: Inverse problems (Differential equations) | Bayesian
statistical decision theory.
Classification: LCC QA371 .B3655 2021 | DDC 515/.357--dc23
LC record available at https://lccn.loc.gov/2021016121

ISBN: 978-1-138-03585-0 (hbk)


ISBN: 978-1-032-11217-6 (pbk)
ISBN: 978-1-315-23297-3 (ebk)
DOI: 10.1201/b22018

Typeset in Times New Roman


by Radiant Productions
To the students of any scientific or engineering discipline, who may
find in this book an opportunity to learn how mathematics possesses
not only truth and supreme beauty, but also an extraordinary power to
address many of society’s most challenging problems.
Preface

We live in the digital era. As developed societies, we are facing the onset of a new industrial revolution
due to the rapid development of technologies including artificial intelligence, internet of things, and soft
robotics. As a result, the amount of information and data coming from remotely monitored infrastructures,
buildings, vehicles, industrial plants, etc. will increase exponentially over the next few decades. At the same
time, our fundamental knowledge about Nature and engineered systems has grown enormously over the past century, and the computational power available today to process such information has seen a
revolutionary transformation. This intersection between empirical (data-driven) and fundamental (physics-
based) knowledge has led to the rise of new research topics for knowledge discovery, out of which the
Bayesian methods and stochastic simulation are prominent. As an example, the increased availability
of information from digital twin models, together with the growing ability of intelligent algorithms to fuse
such information with real-time data, is leading to intelligent cyber-physical systems such as autonomous
cars, smart buildings, etc. This engineering “revolution”, enabled by digital technologies and increasing
fundamental knowledge is changing the way the 21st century’s engineered assets are designed, built, and
operated.
This book is devoted to the so-called “Bayesian methods” and how this class of methods can be useful to
rigorously address a range of engineering problems where empirical data and fundamental knowledge come
into play. These methods comprise not only the Bayesian formulation of inverse and forward engineering
problems, but also the associated stochastic simulation algorithms needed to solve them. All the authors
contributing to this book are renowned experts in this field and share the same view of the importance
and relevance of this topic for the upcoming challenges and opportunities brought by the digital revolution in
modern engineering.
The Editors
Contents

Preface v
List of Figures xii
List of Tables xiv
Contributors xv

Part I Fundamentals 1

1. Introduction to Bayesian Inverse Problems 3


Juan Chiachío-Ruano, Manuel Chiachío-Ruano and Shankar Sankararaman
1.1 Introduction 3
1.2 Sources of uncertainty 4
1.3 Formal definition of probability 5
1.4 Interpretations of probability 6
1.4.1 Physical probability 7
1.4.2 Subjective probability 7
1.5 Probability fundamentals 8
1.5.1 Bayes’ Theorem 8
1.5.2 Total probability theorem 9
1.6 The Bayesian approach to inverse problems 10
1.6.1 The forward problem 11
1.6.2 The inverse problem 13
1.7 Bayesian inference of model parameters 14
1.7.1 Markov Chain Monte Carlo methods 18
1.7.1.1 Metropolis-Hasting algorithm 18
1.8 Bayesian model class selection 19
1.8.1 Computation of the evidence of a model class 21
1.8.2 Information-theory approach to model-class selection 23
1.9 Concluding remarks 24

2. Solving Inverse Problems by Approximate Bayesian Computation 25


Manuel Chiachío-Ruano, Juan Chiachío-Ruano and María L. Jalón
2.1 Introduction to the ABC method 25
2.2 Basis of ABC using Subset Simulation 30
2.2.1 Introduction to Subset Simulation 30
2.2.2 Subset Simulation for ABC 33

2.3 The ABC-SubSim algorithm 34


2.4 Summary 36

3. Fundamentals of Sequential System Monitoring and Prognostics Methods 39


David E. Acuña-Ureta, Juan Chiachío-Ruano, Manuel Chiachío-Ruano and
Marcos E. Orchard
3.1 Fundamentals 39
3.1.1 Prognostics and SHM 40
3.1.2 Damage response modelling 40
3.1.3 Interpreting uncertainty for prognostics 41
3.1.4 Prognostic performance metrics 41
3.2 Bayesian tracking methods 43
3.2.1 Linear Bayesian Processor: The Kalman Filter 44
3.2.2 Unscented Transformation and Sigma Points: The Unscented Kalman Filter 46
3.2.3 Sequential Monte Carlo methods: Particle Filters 49
3.2.3.1 Sequential importance sampling 50
3.2.3.2 Resampling 51
3.3 Calculation of EOL and RUL 52
3.3.1 The failure prognosis problem 53
3.3.2 Future state prediction 55
3.4 Summary 60

4. Parameter Identification Based on Conditional Expectation 61


Elmar Zander, Noémi Friedman and Hermann G. Matthies
4.1 Introduction 61
4.1.1 Preliminaries—basics of probability and information 63
4.1.1.1 Random variables 63
4.1.2 Bayes’ theorem 64
4.1.3 Conditional expectation 65
4.2 The Mean Square Error Estimator 66
4.2.1 Numerical approximation of the MMSE 67
4.2.2 Numerical examples 68
4.3 Parameter identification using the MMSE 70
4.3.1 The MMSE filter 70
4.3.2 The Kalman filter 72
4.3.3 Numerical examples 73
4.4 Conclusion 76

Part II Engineering Applications 77

5. Sparse Bayesian Learning and its Application in Bayesian System Identification 79


Yong Huang and James L. Beck
5.1 Introduction 79

5.2 Sparse Bayesian learning 81


5.2.1 General formulation of sparse Bayesian learning with the ARD prior 81
5.2.2 Bayesian Ockham’s razor implementation in sparse Bayesian learning 83
5.3 Applying sparse Bayesian learning to system identification 84
5.3.1 Hierarchical Bayesian model class for system identification 84
5.3.2 Fast sparse Bayesian learning algorithm 88
5.3.2.1 Formulation 88
5.3.2.2 Proposed fast SBL algorithm for stiffness inversion 93
5.3.2.3 Damage assessment 94
5.4 Case studies 95
5.5 Concluding remarks 105
Appendices 107
Appendix A: Derivation of MAP estimation equations for α and β 109

6. Ultrasonic Guided-waves Based Bayesian Damage Localisation and Optimal Sensor Configuration 113
Sergio Cantero-Chinchilla, Juan Chiachío, Manuel Chiachío and Dimitrios Chronopoulos
6.1 Introduction 113
6.2 Damage localisation 114
6.2.1 Time-frequency model selection 115
6.2.1.1 Stochastic embedding of TF models 115
6.2.1.2 Model parameters estimation 116
6.2.1.3 Model class assessment 117
6.2.2 Bayesian damage localisation 122
6.2.2.1 Probabilistic description of ToF model 122
6.2.2.2 Model parameter estimation 123
6.3 Optimal sensor configuration 125
6.3.1 Value of information for optimal design 126
6.3.2 Expected value of information 127
6.3.2.1 Algorithmic implementation 127
6.4 Summary 131

7. Fast Bayesian Approach for Stochastic Model Updating using Modal Information from Multiple Setups 133
Wang-Ji Yan, Lambros Katafygiotis and Costas Papadimitriou
7.1 Introduction 133
7.2 Probabilistic consideration of frequency-domain responses 134
7.2.1 PDF of multivariate FFT coefficients 134
7.2.2 PDF of PSD matrix 135
7.2.3 PDF of the trace of the PSD matrix 135
7.3 A two-stage fast Bayesian operational modal analysis 136
7.3.1 Prediction error model connecting modal responses and measurements 136
7.3.2 Spectrum variables identification using FBSTA 137

7.3.3 Mode shape identification using FBSDA 138


7.3.4 Statistical modal information for model updating 139
7.4 Bayesian model updating with modal data from multiple setups 139
7.4.1 Structural model class 139
7.4.2 Formulation of Bayesian model updating 140
7.4.2.1 The introduction of instrumental variables system mode shapes 140
7.4.2.2 Probability model connecting ‘system mode shapes’ and measured local mode shape 140
7.4.2.3 Probability model for the eigenvalue equation errors 141
7.4.2.4 Negative log-likelihood function for model updating 142
7.4.3 Solution strategy 142
7.5 Numerical example 144
7.5.1 Robustness test of the probabilistic model of trace of PSD matrix 144
7.5.2 Bayesian operational modal analysis 145
7.5.3 Bayesian model updating 146
7.6 Experimental study 147
7.6.1 Bayesian operational modal analysis 149
7.6.2 Bayesian model updating 150
7.7 Concluding remarks 152

8. A Worked-out Example of Surrogate-based Bayesian Parameter and Field Identification Methods 155
Noémi Friedman, Claudia Zoccarato, Elmar Zander and Hermann G. Matthies
8.1 Introduction 155
8.2 Numerical modelling of seabed displacement 157
8.2.1 The deterministic computation of seabed displacements 157
8.2.2 Modified probabilistic formulation 159
8.3 Surrogate modelling 164
8.3.1 Computation of the surrogate by orthogonal projection 165
8.3.2 Computation of statistics 169
8.3.3 Validating surrogate models 170
8.4 Efficient representation of random fields 171
8.4.1 Karhunen-Loève Expansion (KLE) 171
8.4.2 Proper Orthogonal Decomposition (POD) 173
8.5 Identification of the compressibility field 179
8.5.1 Bayes’ Theorem 179
8.5.2 Sampling-based procedures—the MCMC method 180
8.5.3 The Kalman filter and its modified versions 185
8.5.3.1 The Kalman filter 185
8.5.3.2 The ensemble Kalman filter 186
8.5.3.3 The PCE-based Kalman filter 188
8.5.4 Non-linear filters 193
8.6 Summary, conclusion, and outlook 202

Appendices 205
Appendix A: FEM computation of seabed displacements 207
Appendix B: Hermite polynomials 209
B.1 Generation of Hermite Polynomials 209
B.2 Calculation of the norms 211
B.3 Quadrature points and weights 211
Appendix C: Galerkin solution of the Karhunen Loève eigenfunction problem 212
Appendix D: Computation of the PCE Coefficients by Orthogonal projection 216

Bibliography 217
Index 231
List of Figures

1.1 Illustration of the stochastic embedding process represented in Eq. (1.17). 12


1.2 Illustrative example of different model classes consistent with the data. 14
1.3 Illustration of the prior and posterior information of model parameters. 15
1.4 Illustration of stochastic simulation for Bayesian inference. 17
1.5 Example of relative prior and posterior probabilities for model classes. 21
2.1 Scheme of structure used for Example 4. 27
2.2 Output of the ABC rejection algorithm in application to Example 4. 28
2.3 Illustration of Subset Simulation method. 32
2.4 Output of the ABC-SubSim algorithm in application to Example 5. 36
3.1 Illustrations of PH and α − λ prognostics metrics. 42
3.2 Conceptual scheme for RUL and EOL calculation. 53
3.3 Conceptual illustration of Monte Carlo approximation. 56
3.4 Failure time calculation using a random walk (Monte Carlo simulations) approach. 59
4.1 MMSE estimation with increasing polynomial degrees. 69
4.2 MMSE estimation with different configuration parameters. 69
4.3 Approximation of the conditional expectation for different polynomial degrees. 70
4.4 MMSE filter with different polynomial orders. 74
4.5 Comparison of the MMSE filter with a Bayes posterior and an MCMC simulation. 75
4.6 Continuation of Fig. 4.5. 76
5.1 Information flow in the hierarchical sparse Bayesian model. 89
5.2 A 12-storey shear building structure. 96
5.3 Iteration histories for the MAP values of the twelve stiffness scaling parameters. 97
5.4 Scheme of the investigated structure. 101
5.5 Probability of substructure damage exceeding f for different damage patterns. 104
6.1 Likelihood functions derived from each time-of-flight $\hat{d}_j^{(k)}$. The standard deviation 117
of the proposed model ranking is expected to have different values in each model class.
The time-of-flight data are then substituted in the likelihood function $p(\hat{d}_j^{(k)}|\sigma_c, M^{(k)})$.
6.2 Flowchart describing the ultrasonic guided-waves-based model class assessment problem 118
for one arbitrary scattered signal.
6.3 Flat aluminium plate along with the sensor layout. 119
6.4 Example of the outputs for the different TF models considered in this example. The 120
time represented in each caption corresponds to the first energy peak (time of flight),
which is used later for damage localisation purposes.
6.5 Posterior probability of each TF model for every sensor. 121
6.6 Flowchart describing the ultrasonic guided-waves based Bayesian damage localisation 124
problem.
6.7 Panel (a): Posterior PDF of the damage localisation variable and comparison with 125
the ground truth about the enclosed damaged area. Panel (b): Comparison of prior
and posterior PDFs of the velocity parameter.

6.8 Optimal sensor layouts for different prior PDFs. 130


6.9 Damage location reconstruction using the optimal sensor configurations. 131
7.1 Multiple-level hierarchical architecture of modal analysis. 139
7.2 CDFs of $S_k^{\mathrm{sum}}$ at $f_k = 0.806$ Hz (left) and $f_k = 3.988$ Hz (right) with different $n_s$. 145
7.3 Conditional PDFs of $f_s$ (left) and $\zeta_s$ (right) for the shear building. 147
7.4 Iteration histories of θi for the shear building. 148
7.5 Shear building used for laboratory testing. 148
7.6 Acceleration of the top story and the trace of PSD matrix. 150
7.7 Iteration histories of model updating for four scenarios. 151
7.8 Identified optimal ‘system mode shapes’ for different scenarios. 152
8.1 FEM grid used for the geomechanical simulations. 159
8.2 Schematic flowchart of the deterministic solver, the computation of subsidence, and 160
the measurable expression.
8.3 Lognormal prior probability distribution of fcm. 161
8.4 Replacing the deterministic solver by a PCE surrogate model. 166
8.5 KLE mesh details. 176
8.6 Relative variance ρL for different eigenfunctions. 176
8.7 Realisations of the random field fcM,j. 178
8.8 Posterior samples from the MCMC random walk chain. 183
8.9 MCMC update results: Prior and posterior for FcM. 183
8.10 Random chain of the first four elements of q (right). 184
8.11 MCMC update results: Scatter plot. 184
8.12 2D and 3D view of the ‘true’ fcM field. 185
8.13 PCE-based Kalman filter update results. 191
8.14 2D and 3D view of the scaling factor. 193
8.15 Testing low-rank approximation on MCMC update. 199
8.16 Prior and posterior of FcM. 201
8.17 MMSE field update using linear estimator and a low rank approximation of the 201
measurement model.
List of Tables

5.1 Identification results with four modes and three data segments. 97
5.2 Identification results for different modes and equal number of data segments. 98
5.3 Identification results using various data segments for the full-sensor scenario. 99
5.4 Identification results using different data segments for the partial-sensor scenario. 99
5.5 System modal frequencies for various damage patterns (Example 14). 101
5.6 Identification results for the full-sensor scenario (Example 14). 102
5.7 Identification results for the partial-sensor scenario (Example 14). 103
6.1 Times of flight corresponding to the most probable model for sensors 1 through 4. 121
7.1 Setup information of the measured DOFs. 146
7.2 Identified modal properties for the 2D shear building. 146
7.3 Identified stiffness scaling factors for the 2D shear building. 147
7.4 Identified spectrum variables for the laboratory shear building model. 151
7.5 Identified stiffness parameters for four different scenarios. 152
8.1 Validation of different degree surrogate models. 171
8.2 Performance of the different PCE-based update methods. 203
Contributors

David E. Acuña-Ureta, Pontificia Universidad Católica de Chile, Chile
James L. Beck, California Institute of Technology, USA
Sergio Cantero-Chinchilla, University of Nottingham, UK
Juan Chiachío-Ruano, University of Granada, Spain
Manuel Chiachío-Ruano, University of Granada, Spain
Dimitrios Chronopoulos, University of Nottingham, UK
Noémi Friedman, SZTAKI (Institute for Computer Science and Control, Budapest), Hungary
Yong Huang, Harbin Institute of Technology, China
María L. Jalón, University of Granada, Spain
Lambros Katafygiotis, Hong Kong University of Science and Technology, China
Hermann G. Matthies, Technische Universität Braunschweig, Germany
Marcos E. Orchard, University of Chile, Chile
Costas Papadimitriou, University of Thessaly, Greece
Shankar Sankararaman, Intuit Inc., USA
Wang-Ji Yan, University of Macau, China
Elmar Zander, Technische Universität Braunschweig, Germany
Claudia Zoccarato, University of Padova, Italy
Part I

Fundamentals of Bayesian Methods
1
Introduction to Bayesian Inverse Problems

Juan Chiachío-Ruano¹,*, Manuel Chiachío-Ruano¹ and Shankar Sankararaman²

¹ University of Granada, Spain.
² Intuit Inc., USA.
* Corresponding author: jchiachio@ugr.es

This chapter formally introduces the concept of uncertainty and explains the impact of uncertainty
quantification on an important class of engineering problems: inverse problems. The
treatment of uncertainty can be facilitated through various mathematical methods, though probability
has been predominantly used in engineering. A simple introduction to probability theory is
presented as the foundation of the Bayesian approach to inverse problems. The interpretation of
this Bayesian approach to inverse problems and its practical implications, illustrated through relevant
engineering examples, constitute the backbone of this textbook.

1.1 Introduction
Research in the area of uncertainty quantification and the application of stochastic methods to
the study of engineering systems has gained considerable attention during the past thirty years.
This can be attributed to the necessity and desire to design engineering systems with increasingly
complex architectures and new materials. These systems can be multi-level, multi-scale, and multi-
disciplinary in nature, and may need sophisticated and complex computational models to facilitate
their analysis and design. The development and implementation of computational models are not
only increasingly sophisticated and expensive, but also based on physics that is often not well
understood. Furthermore, this complexity is exacerbated by the limited availability
of full-scale response data and by measurement errors.
This leads to one of the most commonly encountered problems in science and engineering, that is,
the identification of a mathematical model or missing parts of a model based on noisy observations
(e.g. measurements). This is referred to as the inverse problem or system identification problem
in the literature. The goal of the inverse problem is to use the observed response of a system to
“improve” a single or a set of models that idealise that system, so that they make more accurate
predictions of the system response to a prescribed, or uncertain, excitation [154].
Different model parameterisations or even model hypotheses representing different physics can
be formulated to idealise a single system, yielding a set of different model classes [22]. Following
the probabilistic formulation of the inverse problem [181], the solution is not a single-valued set


of model parameters nor a single model class. On the contrary, a range of plausible values for
model parameters and a set of candidate model classes constitute a more complete, rigorous and
principled solution to the system identification problem. The plausibility of the various possibilities
is expressed through probabilities which measure the relative degree of belief of the candidate
solutions conditional to the available information (e.g. data). This interpretation of probability is
not well known in the engineering community where there is a widespread belief that probability
only applies to aleatory uncertainty (e.g. inherent randomness) and not to epistemic uncertainty
(missing information). E.T. Jaynes [100], who wrote extensively about Bayesian techniques and
probability theory, noted that the assumption of inherent randomness is an example of what he
called the Mind-Projection Fallacy:
“Our uncertainty is ascribed to an inherent property of nature, or, more generally, our
models of reality are confused with reality.”
The goal of this chapter is to provide an introductory overview of the fundamentals of the Bayesian
inverse problem and its associated stochastic simulation and uncertainty quantification problems.

1.2 Sources of uncertainty


As introduced before, two types of uncertainty are typically considered in science and engineering.
The first type, aleatory uncertainty, is regarded as inherent randomness in nature. If the outcome
of an experiment differs each time the experiment is run, then this is an example of aleatory uncer-
tainty. This type of uncertainty is irreducible. The second type, epistemic uncertainty, is regarded
as a lack of knowledge in relation to a particular quantity and/or physical phenomenon. This type
of uncertainty can be reduced (and sometimes eliminated) when new knowledge is available, for
example, after some research. Epistemic uncertainty can be present either in the data collected from
engineering systems or in the models used to represent the behaviour of these engineering systems.
Therefore, the various sources of uncertainty can be enumerated as follows:
1. Physical Variability: As mentioned earlier, this type of uncertainty is referred to as aleatory
uncertainty. The inputs to the engineering system may be random (e.g. variability in traffic
loads for a bridge deck), or the parameters governing a physical system may present spatial
variability (e.g. the elastic modulus of a large bridge structure), and this leads to an uncertain
output. It is common to represent such random variables using probability distributions.
2. Data Uncertainty: Data uncertainty is a very broad term and can be of different types. The
most commonly considered type of data uncertainty is measurement errors (both at the input
and output levels). Note that it may be argued that measurement errors occur due to natural
variability and hence, must be classified as a type of aleatory uncertainty. The second type of
data uncertainty occurs during the characterisation of variability; a large amount of data may
be necessary to accurately characterise such a variability and that amount of data may not be
available in practice. Sometimes, the available data may be sparse and sometimes, even in the
form of intervals. This leads to epistemic uncertainty (i.e. uncertainty reducible in the light of
new information). Another type of data uncertainty is related to system-level knowledge; for
example, the operational conditions of the system may be partially unknown and therefore, the
inputs to the system model will automatically become uncertain.

3. Model Uncertainty: The engineering system under study is represented using an idealised
mathematical model, and the corresponding mathematical equations are numerically solved
using computer codes. This modelling process is an instance of epistemic uncertainty and com-
prises three different types of errors/uncertainty. First, the intended mathematical equation is
solved using a computer code which leads to rounding off errors, solution approximation errors,
and coding errors. Second, some model parameters may not be readily known, and field data
may be needed in order to update them. Third, the mathematical model itself is an idealisation
of the physical reality, which leads to prediction errors. The combined effect of solution approxi-
mation errors, model prediction errors, and model parameter uncertainty is referred to as model
uncertainty.
There are several mathematical frameworks that provide varied measures of uncertainty for
the purpose of uncertainty representation and quantification. These methods differ not only in the
level of granularity and detail, but also in how uncertainty is interpreted. Such methods are based
on probability theory [85, 157], possibility theory [61], fuzzy set theory [203], Dempster-Shafer
evidence theory [18, 148], interval analysis [119], etc. Amongst these theories, probability theory
has received significant attention in the engineering community. As a result, this book will focus
only on probabilistic methods and not delve into the aforementioned non-probabilistic approaches.

1.3 Formal definition of probability


The fundamental theory of probability is well-established in the literature, including many textbooks
and journal articles. The roots of probability lie in the analysis of games of chance by Gerolamo
Cardano in the sixteenth century, and by Pierre de Fermat and Blaise Pascal in the seventeenth
century. In earlier days, researchers were interested in Boolean and discrete events only (e.g. win
or not win), and therefore on discrete probabilities. With the advent of mathematical analysis, the
importance of continuous probabilities steadily increased. This led to a significant change in the
understanding and formal definition of probability. The classical definition of probability was based
on counting the number of favorable outcomes, and it was understood that this definition cannot be
applied to continuous probabilities (refer to Bertrand’s Paradox [101]). Hence, the modern definition
of probability, which is based on Set Theory [69] and functional mapping, is more commonly used
in recent times.
Discrete probability deals with events where the sample space is discrete and countable. Consider
the sample space (Ω), which is equal to the set of all possible outcomes (e.g. Ω = {1, 2, . . . , 6} for
the dice roll experiment). The modern definition of probability maps every element x ∈ Ω to a
“probability value” p(x) such that:
1. $p(x) \in [0, 1]$ for all $x \in \Omega$
2. $\sum_{x \in \Omega} p(x) = 1$

Any event $E$ (e.g. getting a value $x \leq 3$ in the dice roll experiment) can be expressed as a subset
of the sample space $\Omega$ ($E \subseteq \Omega$), and the probability of the event $E$ is defined as:

$$P(E) = \sum_{x \in E} p(x) \qquad (1.1)$$

Hence, the function p(x) is a mapping from a point x in the sample space to a probability value,
and is referred to as the probability mass function (PMF).
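
For a concrete (illustrative) instance of Equation (1.1), the fair-die example can be written out directly in a few lines of Python:

```python
# Minimal sketch of Equation (1.1) for a fair six-sided die.
omega = [1, 2, 3, 4, 5, 6]                 # sample space
p = {x: 1.0 / len(omega) for x in omega}   # PMF: p(x) = 1/6 for every outcome

assert abs(sum(p.values()) - 1.0) < 1e-12  # the PMF sums to one

E = [x for x in omega if x <= 3]           # event E: "getting a value x <= 3"
P_E = sum(p[x] for x in E)                 # P(E) = sum of p(x) over x in E
print(P_E)                                 # 0.5
```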
Continuous probability theory deals with cases where the sample space is continuous and hence
uncountable; for example consider the case where the set of outcomes of a random experiment is
equal to the set of real numbers (R). In this case, the modern definition of probability introduces
the concept of cumulative distribution function (CDF), defined as FX (x) = P (X ≤ x), that is, the
CDF of the random variable X (e.g. the human lifespan) evaluated at x is equal to the probability
that the random variable X can take a value less than or equal to x. This CDF necessarily satisfies
the following properties:
1. $F_X(x)$ is monotonically non-decreasing, and right continuous.
2. $\lim_{x \to -\infty} F_X(x) = 0$
3. $\lim_{x \to +\infty} F_X(x) = 1$

If the function $F_X(x)$ is absolutely continuous and differentiable, then the derivative of the CDF
is denoted as the probability density function (PDF) $p_X(x)$. Therefore,

$$p_X(x) = \frac{dF_X(x)}{dx} \qquad (1.2)$$
For any set $E \subseteq \mathbb{R}$ (e.g. lifespan longer than eighty years), the probability of the random variable
$X$ belonging to $E$ can be written as

$$P(X \in E) = \int_{x \in E} dF_X(x) \qquad (1.3)$$

If the PDF exists, then

$$P(X \in E) = \int_{x \in E} p_X(x)\,dx \qquad (1.4)$$
Note that the PDF exists only for continuous random variables, whereas the CDF exists for all
random variables (including discrete variables) whose realisations belong to $\mathbb{R}$. A PDF or CDF is
said to be valid if and only if it satisfies all of the above properties. The above discussion can
easily be extended to multiple dimensions by considering the space $\mathbb{R}^n$.
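
As an illustrative check of Equations (1.2)-(1.4) (not an example from the text), the probability that a standard normal variable falls in an interval can be computed either from its CDF or by integrating its PDF:

```python
import numpy as np
from scipy import stats, integrate

X = stats.norm(loc=0.0, scale=1.0)         # a continuous random variable

# P(X in E) for E = [1, 2], via the CDF: F_X(2) - F_X(1)
p_cdf = X.cdf(2.0) - X.cdf(1.0)

# The same probability via Equation (1.4): integral of the PDF over E
p_pdf, _ = integrate.quad(X.pdf, 1.0, 2.0)

print(p_cdf, p_pdf)                        # both ~0.1359
```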

The basic principles of probability theory presented here are not only fundamental to this
chapter, but will also be used repeatedly throughout the rest of this book.

1.4 Interpretations of probability


The previous section formally defined probability using the concepts of the cumulative distribution
function and the probability density function. What is the meaning of this probability? Although
the different concepts of probability are well-established in the literature, there is considerable
disagreement among researchers on the interpretation of probability. There are two major interpre-
tations based on physical and subjective probabilities, respectively. It is essential to understand the
difference between these two interpretations before delving deeper into this book.

1.4.1 Physical probability


Physical probabilities [180], also referred to as objective or frequentist probabilities, are related to
random physical systems such as rolling a dice, tossing coins, roulette wheels, etc. Each trial of the
experiment leads to an event (which is a subset of the sample space), and in the long run of repeated
trials, each event tends to occur at a persistent rate, and this rate is referred to as the “relative
frequency”. These relative frequencies are expressed and explained in terms of physical probabilities.
Thus, physical probabilities are defined only in the context of random experiments. The theory of
classical statistics is based on physical probabilities. Within the realm of physical probabilities,
there are two types of interpretations: von Mises’ frequentist [190] and Popper’s propensity [146];
the former is more easily understood and widely used.
In the context of physical probabilities, the mean of a random variable, sometimes referred
to as the population mean, is deterministic (i.e. a single value). It is meaningless to talk about
the uncertainty of this mean. In fact, for any type of parameter estimation, the underlying pa-
rameter is assumed to be deterministic. The uncertainty in the parameter estimate is addressed
through confidence intervals. The interpretation of confidence intervals is sometimes confusing and
misleading, and the uncertainty in the parameter estimate cannot be used for further uncertainty
quantification. For example, if the uncertainty in the elastic modulus of a material was estimated
using several axial loading tests over several beam specimens, this uncertainty cannot be used for
quantifying the response of a plate made of the same material. This is a serious limitation, since
it is not possible to propagate uncertainty after parameter estimation, which is often necessary in
engineering modelling problems. Another disadvantage of this approach is that, when a quantity
is not random, but unknown, then the well-known tools of frequentist statistics cannot be used to
represent this type of uncertainty (epistemic). The second interpretation of probability, that is, the
subjective interpretation, overcomes these limitations.

1.4.2 Subjective probability


Subjective probabilities [54] can be assigned to any “statement”. It is not necessary that this state-
ment is related to an event which is a possible outcome of a random experiment. In fact, subjective
probabilities can be assigned even in the absence of random experiments. Subjective probabilities
are interpreted as degrees of belief of the statement, and quantify the extent to which such a state-
ment is supported by existing knowledge and available evidence. The Bayesian methodology to
which this book is devoted is based on subjective probabilities. Calvetti and Somersalo [30] explain
that “randomness” in the context of physical probabilities is equivalent to a “lack of information”
in the context of subjective probabilities.
Using this interpretation of probability, even deterministic quantities can be represented using
probability distributions which reflect the subjective degree of the analyst’s belief regarding such
quantities. As a result, probability distributions can be assigned to parameters that need to be
estimated, and therefore, this interpretation facilitates uncertainty propagation after parameter
estimation; this is helpful for uncertainty integration across multiple models and scales.
For example, consider the case where a variable is assumed to be normally distributed and the
estimation of the mean and the standard deviation based on available data is desired. If sufficient
data were available, then it is possible to uniquely estimate these distribution parameters (mean and
standard deviation). However, in some cases, data may be sparse and therefore, it may be necessary
to quantify the uncertainty in these distribution parameters. Note that this uncertainty reflects

our epistemic uncertainty; the quantities may be estimated deterministically with enough data.
The former philosophy based on physical probabilities inherently assumes that these distribution
parameters are deterministic and expresses the uncertainty through confidence intervals. It is not
possible to propagate this description of uncertainty through a mathematical model. Instead, the
Bayesian methodology allows obtaining probability distributions for the model parameters, which
can be easily used in uncertainty propagation. Therefore, the Bayesian methodology provides a
framework in which epistemic uncertainty can be also addressed using probability theory, in contrast
with the frequentist approach.
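
A minimal sketch of this idea (with invented numbers): for the simplest version of the problem above, with the noise standard deviation assumed known and a Gaussian prior on the unknown mean, the standard conjugate normal update yields a full posterior distribution for the parameter rather than a confidence interval:

```python
import numpy as np

# Hypothetical sparse data; measurement noise std assumed known for simplicity
data = np.array([10.2, 9.8, 10.5])
sigma = 0.5                                # known noise std

# Conjugate normal prior on the unknown mean: mu ~ N(m0, s0^2)
m0, s0 = 9.0, 2.0

# Standard conjugate update: the posterior on mu is also normal
n = len(data)
s_post = 1.0 / np.sqrt(1.0 / s0**2 + n / sigma**2)
m_post = s_post**2 * (m0 / s0**2 + data.sum() / sigma**2)

print(m_post, s_post)   # posterior mean and std of mu: a full distribution,
                        # ready to be propagated, unlike a confidence interval
```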

1.5 Probability fundamentals


The vast majority of the probability contents of this book are based on the concepts of conditional
probability, total probability, and Bayes’ theorem. These concepts are briefly explained in this
section.

1.5.1 Bayes’ Theorem


Though named after the 18th century mathematician and theologian Thomas Bayes [21], it was
the French mathematician Pierre-Simon Laplace who pioneered and popularised what is now called
Bayesian probability [176, 175]. For a brief history of Bayesian methods, refer to [66]. The law of
conditional probability is fundamental to the development of Bayesian philosophy:

P (AB) = P (A|B)P (B) = P (B|A)P (A) (1.5)

Consider a list of mutually exclusive and exhaustive events $A_i$ ($i = 1$ to $N$) that together form the
sample space. Let B denote any other event from the sample space such that P (B) > 0. Based
on Equation (1.5), it follows that:

$$P(A_i|B) = \frac{P(B|A_i)P(A_i)}{\sum_{j=1}^{N} P(B|A_j)P(A_j)} \qquad (1.6)$$

What does Equation (1.6) mean? Suppose that the probabilities of all events Ai (i = 1 to N ) are
known before conducting any experiments. These probabilities are referred to as prior probabilities
in the Bayesian context. Then the experiment is conducted and event B, which is conditionally de-
pendent on Ai has been observed; therefore, it can be probabilistically expressed as P (B|Ai ). In the
light of this information, the reciprocal event P (Ai |B) (i = 1 to N ), known as the posterior probabil-
ity in the Bayesian approach, can be calculated using Bayes’ theorem given by Equation (1.6). The
quantity P (B|Ai ) is the probability of observing the event B conditioned on Ai . It can be argued
that event B has “actually been observed” and there is no uncertainty regarding its occurrence,
which renders the probability P (B|Ai ) meaningless. Hence, researchers “invented” new terminology
in order to denote this quantity. In earlier days, this quantity was referred to as “inverse probabil-
ity”, and since the advent of Fisher [103, 5] and Edwards [62], this terminology has become obsolete
and has been replaced by the term “likelihood”. In fact, it is also common to write P (B|Ai ) as L(Ai ).

In general terms, the conditional probability P (B|Ai ) is interpreted as the degree of belief of
proposition B given that proposition Ai holds. Observe that for the definition of conditional prob-
ability, it is not required for the conditional proposition to be true or to happen; for example, it is
not essential that the event Ai (e.g. an earthquake) has occurred in order to define the probability
P (B|Ai ) (e.g. building collapse given an earthquake); instead, this probability is simply condition-
ally asserted: “if Ai occurs, then there is a corresponding probability for B, and this probability is
denoted as P (B|Ai ).” In addition, this notion of conditional probability does not necessarily imply
a cause-consequence relationship between the two propositions. For example, the occurrence of Ai
does not lead to the occurrence of B. It is obviously meaningless from a causal dependence (physi-
cal) point of view. Instead, P (B|Ai ) refers to the degree of plausibility of proposition B given the
information in proposition Ai , whose truth we need not know. In the extreme situation, that is, if
Ai implies B, then proposition Ai gives complete information about B, and thus P (B|Ai ) = 1;
otherwise, when Ai implies not B, then P (B|Ai ) = 0. This information dependence instead of
causal dependence between conditional propositions brings us to the Cox-Jaynes interpretation of
subjective probability as a multi-valued logic for plausible inference [52, 99], which is adopted in
this book.

1.5.2 Total probability theorem


Let us assume that one of the conditional propositions, for example, A, can be partitioned into N
mutually exclusive propositions, A1 , A2 , · · · , AN . Then, the total probability theorem allows us to
obtain the total probability of proposition B, as follows:
$$P(B) = \sum_{i=1}^{N} P(B|A_i)P(A_i) \qquad (1.7)$$

Example 1 Suppose that we can classify proposition A “it rains” into three different intensity
levels; for example, A1 : “low rainfall intensity”, A2 : “medium rainfall intensity”, A3 : “extreme
rainfall intensity”, whose plausibilities P (Ai ), i = 1, 2, 3, are known. Then, the plausibility of a new
proposition B: “traffic jam” can be obtained as

P (B) = P (B|A1 )P (A1 ) + P (B|A2 )P (A2 ) + P (B|A3 )P (A3 ) (1.8)

where P (B|Ai ) is the plausibility of a traffic jam given (conditional to) a particular rainfall inten-
sity Ai . If the conditional probabilities P (B|Ai ) are known, then the total probability P (B) can be
obtained using Equation (1.8).
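
The numbers below are invented, but they put Example 1 in executable form, evaluating Equation (1.8) and, anticipating the combined result formalised in Equation (1.11) below, the updated plausibility of each rainfall level given that a traffic jam is observed:

```python
import numpy as np

# Invented plausibilities for the three rainfall intensities A_1, A_2, A_3
P_A = np.array([0.6, 0.3, 0.1])
# Invented conditional plausibilities of a traffic jam B given each A_i
P_B_given_A = np.array([0.2, 0.5, 0.9])

# Total probability theorem, Equation (1.8)
P_B = np.sum(P_B_given_A * P_A)

# Bayes' theorem: updated plausibility of each A_i given B
P_A_given_B = P_B_given_A * P_A / P_B

print(P_B)           # ~0.36
print(P_A_given_B)   # heavy rainfall becomes more plausible after a jam
```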
In the cases where the conditional proposition is represented by a continuous real-valued variable
$x \in \mathcal{X}$ (e.g. the rainfall intensity), the total probability theorem is rewritten as follows:

$$P(B) = \int_{\mathcal{X}} P(B|x)p(x)\,dx \qquad (1.9)$$

where $p(x)$ is the previously presented probability density function of the continuous variable $x$. In
what follows, $P(\cdot)$ is used to denote a probability, whereas a PDF is expressed as $p(\cdot)$.

When both propositions can be expressed by continuous-valued variables x ∈ X and y ∈ Y,


(e.g. traffic intensity and rainfall intensity, respectively), then
$$p(y) = \int_{\mathcal{X}} p(y|x)p(x)\,dx \qquad (1.10)$$

where p(y|x) is the conditional probability density function between the two propositions.

In the context of mutually exclusive propositions, Bayes’ theorem and total probability theorem
can be combined to obtain the conditional probability of a particular proposition Ai , as follows
$$P(A_i|B) = \frac{P(B|A_i)P(A_i)}{\underbrace{\sum_{j=1}^{N} P(B|A_j)P(A_j)}_{P(B)}} \qquad (1.11)$$

The same applies for conditional propositions described by discrete and continuous-valued variables,
as
$$p(x|B) = \frac{P(B|x)p(x)}{\int_{\mathcal{X}} P(B|x)p(x)\,dx} \qquad (1.12)$$
or reciprocally
$$P(B_i|x) = \frac{p(x|B_i)P(B_i)}{\sum_{j=1}^{N} p(x|B_j)P(B_j)} \qquad (1.13)$$

For continuous-valued propositions x ∈ X , y ∈ Y, Bayes’ theorem can be written as


$$p(x|y) = \frac{p(y|x)p(x)}{\int_{\mathcal{X}} p(y|x)p(x)\,dx} \qquad (1.14)$$
or reciprocally
$$p(y|x) = \frac{p(x|y)p(y)}{\int_{\mathcal{Y}} p(x|y)p(y)\,dy} \qquad (1.15)$$
The integrals in Equations (1.12), (1.14) and (1.15) are nontrivial except for some particular
cases [22]. Section 1.7.1 below provides sampling methods to numerically approximate these equa-
tions.
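
As a preview of those numerical treatments, a simple grid (quadrature) approximation of Equation (1.14) for a scalar x is sketched below; the Gaussian prior and likelihood are assumed purely for illustration:

```python
import numpy as np

x = np.linspace(-5, 5, 2001)                 # discretised space X
dx = x[1] - x[0]

prior = np.exp(-0.5 * x**2) / np.sqrt(2 * np.pi)   # p(x): standard normal
y_obs = 1.2                                         # a single observation
lik = np.exp(-0.5 * (y_obs - x)**2 / 0.5**2)        # p(y|x), up to a constant

evidence = np.sum(lik * prior) * dx          # denominator of Eq. (1.14)
posterior = lik * prior / evidence           # p(x|y) on the grid

print(np.sum(posterior) * dx)                # ~1.0: a valid PDF
```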

1.6 The Bayesian approach to inverse problems


The fundamentals of Bayesian philosophy are well-established in several textbooks [117, 118, 30,
172], and the Bayesian approach is being increasingly applied to engineering problems in recent
times, especially to solve inverse problems. This leads to two important questions: What is an
inverse problem? Why is it important? Prior to understanding the concepts related to an “inverse
problem” and their practical implications, it is first necessary to understand the idea of a “forward
problem”.

1.6.1 The forward problem


In general, a forward problem can be defined as predicting how an engineering system will behave
based on some knowledge about the system. This includes knowledge regarding the system’s charac-
teristics/properties, inputs experienced or to be experienced by the system, a model that represents
an input-output relationship, where the input includes the system’s loading conditions, operating
conditions, usage conditions, etc., and the output corresponding to the system behaviour, model
parameters (if any), etc. Sometimes, the input-output relationship is explicitly available in the form
of a mathematical model, as follows:
x = g(u, θ) (1.16)
In the expression (1.16), g represents the mathematical expression of the model, the vector u
represents all inputs (including loading and usage conditions), the vector θ represents the model
parameters (such parameters are specific to the model being developed; sometimes they have an
explicit physical meaning whereas some other times, they are simply “fitting” or “calibration”
parameters without physical meaning), and x represents the system output, also referred to as the
system response.
The goal of the forward problem is to predict x based on existing knowledge about g, u, and θ.
Note that Equation (1.16) is deliberately chosen to be very simplistic for the purpose of explanation;
however, it may be much more complicated in practice. For example, contrary to an explicit model
such as Equation (1.16), several practical engineering systems are studied using implicit models,
that is, x is not an explicit function of u, and θ. Sometimes they are described using a complicated
partial differential equation. This renders the evaluation of a forward model challenging. A simple
but practically challenging example of such a forward problem is the prediction of the deflection of
an aircraft wing in flight. In this problem, g is essentially composed of a coupled, multi-disciplinary
partial differential equation system where some equations are based on fluid mechanics, while others
refer to structural and solid mechanics.
Apart from these complexities, the forward problem is affected by several sources of uncertainty,
as explained earlier in Section 1.2. Rarely do we know all about g, u, or θ precisely. In fact, the
famous statistician George Box says: “All models are wrong; some are useful.” Therefore, g may be
prone to prediction errors. Moreover, there may even be not just one single model g but multiple
competing g’s, and it may be necessary to employ a hybrid combination of these in order to make
a rigorous and robust prediction. The inputs u and parameters θ could be imprecisely known, and
such imprecision could be represented using probability distributions. This is how the forward problem
becomes stochastic in nature, and probabilistic methods need to be employed in order to deal with
such sources of uncertainty. Therefore, the goal of a forward problem is to determine the probability
distribution of x given the best knowledge about g, u, and θ. Typically, forward problems are solved
with quantified uncertainty using Monte Carlo sampling [85] and related-advanced methods [14].
Model-based reliability methods like the first-order reliability method (FORM) and the second-order
reliability method (SORM) can also be useful for solving the forward problem [86, 57, 56, 185].
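
In its simplest form, the Monte Carlo solution of the forward problem is just repeated evaluation of the model under sampled inputs and parameters; the toy model g below is invented purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def g(u, theta):
    """Toy deterministic forward model x = g(u, theta)."""
    return theta[0] * u + theta[1] * u**2

N = 100_000
u = rng.normal(1.0, 0.1, size=N)                            # uncertain input
theta = rng.normal([2.0, 0.5], [0.2, 0.05], size=(N, 2))    # uncertain parameters

x = g(u, theta.T)                  # propagated output samples
print(x.mean(), x.std())           # summary of the distribution of x
```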
Notwithstanding, a straightforward yet rigorous way to obtain a probabilistic forward model
from any deterministic I/O model is by stochastic embedding [22]. This can be achieved by intro-
ducing an uncertain model error term e as a discrepancy between the actual system output y and
the modelled output x = g(u, θ), as follows:

$$\underbrace{y}_{\text{system output}} = \underbrace{g(u, \theta)}_{\text{model output}} + \underbrace{e}_{\text{error}} \qquad (1.17)$$

Since e is uncertain, it can be modelled by a probability distribution (e.g. a Gaussian dis-


tribution), and this in turn determines the probability model for the system output y. For ex-
ample, if the error term is assumed to be modelled as a zero-mean Gaussian distribution, that
is, e ∼ N (0, Σe ), then the system output y will be distributed as a Gaussian distribution, as
follows:
e = (y − g(u, θ)) ∼ N (0, Σe ) =⇒ y ∼ N (g(u, θ), Σe ) (1.18)
where Σe is the covariance matrix of the error term. See Figure 1.1 for a graphical description of
the stochastic embedding process.

[Figure: deterministic model output g(u; θ) surrounded by uncertainty bands N(0, Σe).]

Figure 1.1: Illustration of the stochastic embedding process represented in Eq. (1.17).

A rational way to establish a probability model for the error term is given by the Principle
of Maximum Information Entropy (PMIE) [102], which states that a probability model should
be conservatively selected to produce the most prediction uncertainty (largest Shannon entropy),
subject to parameterised constraints. Thus, if the error variable e is constrained to a particular
mean value µe (e.g. µe = 0) and a variance or covariance matrix Σe , then it can be shown by
PMIE that the maximum-entropy probability model for e is the Gaussian PDF with mean µe and
covariance matrix Σe , i.e. e ∼ N (µe , Σe ). In this context, it follows from expression (1.18) that a
probabilistic forward model can be obtained from the deterministic model g(u, θ) as
$$p(y|u, \theta) = \left((2\pi)^{N_o}|\Sigma_e|\right)^{-\frac{1}{2}} \exp\left(-\frac{1}{2}(y - x)^T \Sigma_e^{-1} (y - x)\right) \qquad (1.19)$$

where No is the size of the y vector and x = g(u, θ) is the output of the deterministic forward model.
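
In code, Equation (1.19) is just a multivariate Gaussian density evaluated at the data; a sketch follows, where the two-output model g and the covariance are invented for illustration:

```python
import numpy as np
from scipy.stats import multivariate_normal

def log_likelihood(y, u, theta, g, Sigma_e):
    """log p(y|u, theta) of Eq. (1.19): Gaussian error around x = g(u, theta)."""
    x = g(u, theta)                          # deterministic model output
    return multivariate_normal(mean=x, cov=Sigma_e).logpdf(y)

# Usage with a hypothetical two-output model
g = lambda u, theta: np.array([theta * u, theta * u**2])
Sigma_e = 0.1**2 * np.eye(2)                 # assumed error covariance
print(log_likelihood(np.array([2.1, 4.2]), 2.0, 1.0, g, Sigma_e))
```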

It should be noted that Equation (1.17) implicitly assumes that both the model error and the
measurement error are subsumed into e. Such an assumption can be adopted when the measure-
ment error is negligible as compared to the model error, or when an independent study about the
measurement error is not available. Otherwise, e in Equation (1.17) can be expressed as the sum of
the two independent errors e = em +ed , with em being the model error, and ed the measurement er-
ror. Under the maximum-entropy assumption of zero-mean Gaussian errors, the consideration of the

two independent errors, namely, measurement and modelling errors, would lead to Equation (1.19)
with Σe = Σem + Σed as covariance matrix.

1.6.2 The inverse problem


In contrast to the forward problem, an inverse problem can be defined as the process of inferring
unknown parts of the system (which may include the inputs, the parameters, or even the model-form
itself) based on a limited and possibly noisy set of observations/measurements of the system output.
The processes of “model parameter estimation”, “statistical model calibration”, “model selection”,
“system identification” (commonly encountered in system/structural health monitoring), etc. are all
examples of inverse problems. In the context of Equation (1.16), an inverse problem is the process
of inferring g and/or u and/or θ based on observed measurements of y.
A natural question is: Why is solving the inverse problem important? It turns out that, in order
to accurately solve the forward problem and predict the system response, it is first essential to solve
the inverse problem. For instance, in the context of Equation (1.16), how would an analyst know
the probability distributions of u and θ, to be used to simulate the forward problem? Sometimes,
the inputs u can be known based on system-knowledge; on the contrary, the parameters θ are
almost always estimated through an inverse problem. Sometimes, even inputs are estimated using
inverse problems; for example, based on the damage experienced by a building, some characteristics
of the earthquake can be inferred. Inverse problems are not only useful for estimating u and θ,
but also for estimating the goodness of the underlying model form “g”; when there are competing
models, inverse problems provide a systematic approach to probabilistic model selection. Therefore,
in practical engineering applications, it is perhaps necessary to iteratively solve the forward problem
as well as the inverse problem in order to achieve desirable results with acceptable confidence.
Another related issue is that, even when there is no stochasticity/uncertainty involved, it is
preferable to solve an inverse problem using a probabilistic methodology. This is because, while the
forward problem may commonly have a unique solution, the inverse problem may have many pos-
sible solutions [181], that is, there are several models, model parameters, and even different model
classes that may be consistent with the same set of observed measurements/data. Figure 1.2 depicts
this concept. In this context, providing deterministic single-valued solutions for inverse problems
has a limited meaning if one considers that the model itself is just an idealisation of reality, and
furthermore, the existence of measurement errors. Instead, probability-based solutions should be
provided, which carry information about the degree of plausibility of the models and hypotheses
which are more consistent with the observations.
Several methods based on classical statistics [29] and Bayesian statistics [27, 22] have been
proposed to deal with probabilistic inverse problems. It has been widely recognised that the
Bayesian approach to inverse problems provides the most adequate and rigorous framework for
“well-developed” inverse problems [102]. The Bayesian approach to inverse problems aims at es-
timating the posterior probability of a particular hypothesis or model (e.g. unknown model form
and/or parameters), conditional on all the evidence at hand (typically, experimental data). This
is accomplished by Bayes’ theorem, which converts the available prior information about the hy-
potheses or missing parts of the model, into posterior information, based on the observed system
response. This model inference process is explained in Section 1.7 for model parameter estimation,
and in Section 1.8 for model class selection.

[Figure: two panels showing g(u; θ) versus u for model classes Mi and Mj — (a) Complex model, (b) Simpler model.]

Figure 1.2: Illustrative example of different model classes consistent with the data.

1.7 Bayesian inference of model parameters


A typical goal in physical sciences and engineering is to estimate the values of the uncertain param-
eters θ, such that a model g(u, θ), parameterised by θ and belonging to a particular model class M, is
more likely to reproduce the observed system response data D = {ŷ1 , ŷ2 , . . . , ŷn }. An engineering
example for this problem would be the estimation of the plausible values for the effective mechanical
properties of a degraded structure based on ambient vibration data. Assuming continuous-valued
parameters θ, this plausibility is represented by the conditional PDF p(θ|D, M), which reads:
if model class M is adopted and information D from the system response is available (conditional
proposition), then the model specified by θ will predict the system output with probability p(θ|D, M).
From the above description, we can highlight three important pieces of information in the
Bayesian inverse problem, which are described here:
D : Data set containing the system output or input-output data, depending on the experimental
setup.
M : Candidate model class among a set of possible model classes hypothesised to represent the
system (e.g. analytical model, semi-empirical model, finite-element model, etc.)
θ : Set of uncertain model parameters belonging to a specific model class M, that calibrate the
idealised relationships between the input and output of the system.
Our interest here is to obtain the posterior PDF p(θ|D, M). This is given by Bayes’ theorem,
as follows

$$p(\theta|D, M) = \frac{p(D|\theta, M)\,p(\theta|M)}{p(D|M)} \qquad (1.20)$$

Note that Bayes' theorem takes the initial quantification of the plausibility of each model specified
by θ in M, which is expressed by the prior PDF p(θ|M), and updates this plausibility to obtain
the posterior PDF p(θ|D, M) by using information from the system response expressed through
the PDF p(D|θ, M), known as the likelihood function. The likelihood function provides a measure
of how likely the data D are to be reproduced by the model class M specified by θ.¹ It is obtained by
evaluating the data D as the outcome of the stochastic forward model given by Equation (1.19).
Figure 1.3 illustrates the concepts of prior and posterior information of model parameters by means
of their associated probability density functions.

¹The concept of likelihood is used both in the context of physical probabilities (frequentist) and subjective probabilities, especially in the context of parameter estimation. From a frequentist point of view (the underlying parameters are deterministic), the likelihood function can be maximised in order to obtain the maximum likelihood estimate of the parameters. According to Fisher [67], the popular least squares approach is an indirect approach to parameter estimation, and one can "solve the real problem directly" by maximising the "probability of observing the given data" conditioned on the parameter θ [67, 4]. On the other hand, the likelihood function can also be interpreted using subjective probabilities. Singpurwalla [167, 168] explains that the likelihood function can be viewed as a collection of weights or masses and, therefore, it is meaningful only up to a proportionality constant [62]. In other words, if p(D|θ(1)) = 10 and p(D|θ(2)) = 100, then D is ten times more likely to be reproduced by θ(2) than by θ(1).

Figure 1.3: Illustration of the prior and posterior information of model parameters: (a) prior PDF p(θ|M); (b) posterior PDF p(θ|D, M). Observe that after assimilating the data D, the posterior probabilistic information about θ is concentrated over a narrower space.

Example 2 Suppose that we are asked to estimate the gravitational acceleration g of a comet based
on a sequence of measurements D = {ŷ1, ŷ2, . . . , ŷT} of the angular displacement of a pendulum
mounted on a spacecraft which has landed on the comet. These measurements were obtained using an
on-board sensor whose precision is known to be given by a zero-mean Gaussian noise et ∼ N (0, σ),
with σ being independent of time.
The (deterministic) physical model M describing the angular displacement of the pendulum as
a function of time is assumed to be given by

xt = sin(θt) (1.21)

where θ = √(g/L) is the uncertain model parameter, with g being the actual unknown variable, and L
is the length of the pendulum (known). Based on some theoretical considerations, the gravitational
acceleration g is known to be bounded within the interval [g1 , g2 ]; thus, we can use this information
to define a uniform prior PDF for this parameter, as
p(θ|M) = 1/(θ2 − θ1)    (1.22)

where θj = √(gj/L), j = 1, 2. Given the existence of measurement errors, the observed system response
would actually be represented by the equation

yt = sin(θt) + et (1.23)

where et ∼ N (0, σ). Therefore, the stochastic forward model of the system response will be given by

p(yt|θ, M) = (1/√(2πσ²)) exp{−(1/2) (yt − sin(θt))²/σ²}    (1.24)

As explained above, the likelihood function p(D|θ, M) can be obtained by evaluating the data D as
the outcome of the stochastic forward model, therefore

p(D|θ, M) = p(y1 = ŷ1, y2 = ŷ2, . . . , yT = ŷT |θ, M)    (1.25a)
          = ∏_{t=1}^{T} (1/√(2πσ²)) exp{−(1/2) (ŷt − sin(θt))²/σ²}    (1.25b)

Then, based on Bayes’ theorem, the updated information about the gravitational acceleration of the
comet can be obtained as

p(θ|D, M) ∝ p(D|θ, M) p(θ|M)    (1.26)

where the likelihood is given by Equation (1.25b) and the prior by Equation (1.22). The posterior
can be readily simulated using the stochastic simulation methods explained below.

Note from Equation (1.25b) that we implicitly assume there is no dependence between the observations, that is, p(ŷ1, . . . , ŷT |θ, M) = ∏_{t=1}^{T} p(ŷt|θ, M). It is emphasised that this stochastic independence refers to information independence and should not be confused with causal independence. It is equivalent to asserting that if the modelling or measurement errors at certain discrete times are given, this does not influence the error values at other times.
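
To make the above concrete, the quantities of Example 2 can be evaluated numerically. The following Python sketch codes the uniform prior of Equation (1.22) and the likelihood of Equation (1.25b) in log-space; the numerical values of L, σ and the prior bounds g1, g2 are hypothetical placeholders, since the example does not fix them:

import numpy as np

# Hypothetical values (not fixed by the example): pendulum length,
# sensor noise std, and prior bounds for the gravitational acceleration
L, sigma = 1.0, 0.1
g1, g2 = 1e-4, 1e-3
theta1, theta2 = np.sqrt(g1 / L), np.sqrt(g2 / L)

def log_prior(theta):
    # Uniform prior over [theta1, theta2], Eq. (1.22)
    return -np.log(theta2 - theta1) if theta1 <= theta <= theta2 else -np.inf

def log_likelihood(theta, t, y_meas):
    # Gaussian likelihood of Eq. (1.25b), evaluated in log-space
    resid = y_meas - np.sin(theta * t)
    return (-0.5 * np.sum(resid ** 2) / sigma ** 2
            - 0.5 * len(t) * np.log(2 * np.pi * sigma ** 2))

def log_posterior(theta, t, y_meas):
    # Unnormalised log-posterior of Eq. (1.26)
    lp = log_prior(theta)
    return lp + log_likelihood(theta, t, y_meas) if np.isfinite(lp) else -np.inf

Working in log-space avoids the numerical underflow that the product in Equation (1.25b) would otherwise cause for long data records.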
Apart from the likelihood function, another important factor in Equation (1.20) is p(D|M),
which is known as the evidence (also called the marginal likelihood) for the model class M. This
factor expresses how likely the observed data will be reproduced if model class M is adopted. The
evidence can be obtained by total probability theorem as
p(D|M) = ∫_Θ p(D|θ, M) p(θ|M) dθ    (1.27)

In most practical situations, the evaluation of the multi-dimensional integral in Equation (1.27) is
analytically intractable. Stochastic simulation methods such as the family of Markov Chain Monte
Carlo (MCMC) methods [134, 76] can be used to draw samples from the posterior PDF in Equation
(1.20) while circumventing the evaluation of p(D|M). By means of this, the posterior PDF can be
straightforwardly approximated as a probability mass function, mathematically described as follows
p(θ|D, M) ≈ (1/K) ∑_{k=1}^{K} δ(θ − θ̃(k))    (1.28)

where δ(θ − θ̃(k)) is the Dirac function, which equals 1 when θ = θ̃(k) and 0 otherwise, and θ̃(k), k = 1, . . . , K, are samples drawn from p(θ|D, M) using an appropriate stochastic simulation method. See Figure 1.4 for a graphical illustration of this method. Further insight about MCMC simulation methods is given in Section 1.7.1 below.
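
In practical terms, the Dirac-mixture approximation in Equation (1.28) means that any posterior expectation E[h(θ)|D] reduces to an average of h over the K posterior samples, as in the following minimal Python sketch:

import numpy as np

def posterior_expectation(h, samples):
    # Eq. (1.28): the Dirac mixture turns the integral of h(theta)
    # against the posterior into a plain sample average
    return np.mean([h(theta) for theta in samples])

# e.g. the posterior mean: posterior_expectation(lambda th: th, samples)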

Figure 1.4: Illustration of stochastic simulation for Bayesian inference, showing the prior p(θ|M), the analytically estimated posterior p(θ|D, M) (Eq. (1.20)), and the stochastically simulated posterior.

Finally, it should be noted that, although the posterior PDF of the parameters provides full information
about the plausibility of the model parameters over the full range of possibilities, most of the time
engineering decisions are made based on single-valued engineering parameters. This fact does not
restrict the practical applicability of the Bayesian inverse problem explained here. On the contrary,
not one but several single-valued "representative values" can be extracted from the full posterior
PDF of the parameters (e.g. mean, median, percentiles, etc.), which enriches the decision-making
process with further information. Among them, a value of particular interest is the maximum a
posteriori (MAP) estimate, which can be obtained as the value θ_MAP ∈ Θ that maximises the posterior
PDF, that is, θ_MAP = arg max_θ p(θ|D, M). Note from Equation (1.20) that the MAP estimate is equivalent
to the widely known maximum likelihood estimate (MLE), namely the θ that maximises
p(D|θ, M), when the prior PDF is a uniform distribution.
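
Given posterior samples, these representative values are straightforward to extract. In the sketch below, the MAP is approximated by the sample with the highest unnormalised log-posterior density, which assumes such a function (e.g. the log_posterior of Example 2) is available:

import numpy as np

def representative_values(samples, log_post=None):
    # Single-valued summaries extracted from posterior samples
    summary = {"mean": np.mean(samples),
               "median": np.median(samples),
               "p5-p95": np.percentile(samples, [5, 95])}
    if log_post is not None:
        # Sample-based MAP: the sample maximising the log-posterior
        summary["MAP"] = max(samples, key=log_post)
    return summary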

1.7.1 Markov Chain Monte Carlo methods


While the power of the Bayesian inverse problem for model updating is well recognised, there
are several computational issues during its implementation that need dedicated solutions. As ex-
plained above, the main difficulty when applying Bayes’ theorem is that the factor p(D|M) in
Equation (1.20) cannot be evaluated analytically nor is it readily calculated by numerical integra-
tion methods. To tackle this problem, Markov Chain Monte Carlo (MCMC) methods [134, 76] have
been widely used for their ability to estimate the posterior PDF while avoiding the evaluation of
p(D|M). In general, the goal of these stochastic simulation methods is to generate samples that are
distributed according to an arbitrary probability distribution, commonly known as the target PDF,
which is known up to a scaling constant [132, 121]. In the context of the Bayesian inverse problem,
such target corresponds to the posterior PDF given by Equation (1.20), and the scaling constant is
the evidence p(D|M).
There are two required properties for MCMC algorithms to obtain correct statistical estimations
about the target: (1) ergodicity, which is concerned with whether the generated samples
can populate the regions of significant probability of the target PDF; and (2) stationarity, which
ensures that subsequent samples of the Markov chain remain distributed according to the target
PDF, provided that the previous samples are already so distributed. Under the assumption of
ergodicity, the samples generated from MCMC will converge to the target distribution (provided
that a large number of samples is used) even if the initial set of samples is simulated from a PDF
different from the target. The theoretical demonstration of ergodicity and stationarity for MCMC
is out of the scope of this book, but the interested reader is referred to [178] for a comprehensive
theoretical treatment of MCMC methods.

1.7.1.1 Metropolis-Hasting algorithm


Several MCMC algorithms have been proposed in the literature such as the Metropolis-Hastings
(M-H), Gibbs Sampler or Slice Sampling, among others [121]. In this chapter, we pay special at-
tention to the Metropolis-Hastings algorithm [131, 90] for its versatility and implementation sim-
plicity. Notwithstanding, the selection of the “best” MCMC algorithm is case specific, and several
other MCMC algorithms can provide an excellent performance in the context of some particular
Bayesian inverse problems.
The M-H algorithm generates samples from a specially constructed Markov chain whose stationary
distribution is the posterior PDF p(θ|D, M). In M-H, a candidate model parameter θ′ is
sampled from a proposal distribution q(θ′|θ(k−1)), given the state of the Markov chain at step
k − 1. At the next state of the chain k, the candidate parameter θ′ is accepted (i.e. θ(k) = θ′) with
probability min{1, r}, and rejected (θ(k) = θ(k−1)) with probability 1 − min{1, r}, where r is
calculated as:

r = [p(D|θ′, M) p(θ′|M) q(θ(k−1)|θ′)] / [p(D|θ(k−1), M) p(θ(k−1)|M) q(θ′|θ(k−1))]    (1.29)

The process is repeated until Ns samples have been generated. An algorithmic description of this
method is provided in Algorithm 1 below.

Algorithm 1: M-H algorithm

1. Initialize θ(0) by sampling from the prior: θ(0) ∼ p(θ|M)
for k = 1 to Ns do
    2. Sample from the proposal: θ′ ∼ q(θ′|θ(k−1))
    3. Compute r from Equation (1.29)
    4. Generate a uniform random number: α ∼ U[0, 1]
    if r > α then
        5. Set θ(k) = θ′
    else
        6. Set θ(k) = θ(k−1)
    end if
end for
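
A compact Python transcription of Algorithm 1 is sketched below. A symmetric Gaussian random-walk proposal is assumed (so that the proposal terms cancel, cf. Equation (1.30) below), and the acceptance test is carried out in log-space to avoid numerical underflow; neither choice alters the algorithm:

import numpy as np

def metropolis_hastings(log_like, log_prior, sample_prior, n_samples, sigma_q=0.1):
    rng = np.random.default_rng()
    theta = sample_prior()                        # 1. initialise from the prior
    log_p = log_like(theta) + log_prior(theta)    # unnormalised log-posterior
    chain = []
    for _ in range(n_samples):
        cand = theta + sigma_q * rng.standard_normal()   # 2. symmetric proposal
        log_p_cand = log_like(cand) + log_prior(cand)
        log_r = log_p_cand - log_p                # 3. log of r (Eq. (1.30))
        if np.log(rng.uniform()) < log_r:         # 4-5. accept with prob. min{1, r}
            theta, log_p = cand, log_p_cand
        chain.append(theta)                       # 6. else keep the current state
    return np.array(chain)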

An important consideration for the proper implementation of the M-H algorithm is the specification of the variance σq² of the proposal distribution q(θ′|θ(k−1)), typically Gaussian, which has a
significant impact on the speed of convergence of the algorithm [90]. Small values tend to produce
candidate samples that are accepted with high probability, but may result in highly dependent
chains that explore the space very slowly. In contrast, large values of the variance tend to produce
large steps and therefore a fast space exploration, but result in small acceptance rates and thus,
a larger time of convergence. Therefore, it is often worthwhile to select appropriate proposal vari-
ances by controlling the acceptance rate (e.g. number of accepted samples over total amount of
samples) in a certain range, depending on the dimension d of the proposal PDF, via some pilot runs
[73, 152]. The interval [20%–40%] is suggested for the acceptance rate in low-dimensional spaces,
say d ≤ 10 [152]. Note also that if a Gaussian distribution is chosen as the proposal, then q has the
symmetry property, that is, q(θ′|θ(k−1)) = q(θ(k−1)|θ′), and Equation (1.29) simplifies to

r = [p(D|θ′, M) p(θ′|M)] / [p(D|θ(k−1), M) p(θ(k−1)|M)]    (1.30)
Furthermore, if a uniform probability model is adopted for the prior PDF of the model
parameters, the M-H test in Equation (1.29) reduces to

r = p(D|θ′, M) / p(D|θ(k−1), M)    (1.31)
which corresponds to the ratio between likelihoods.
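
The acceptance rate used for this tuning can be monitored with a simple helper such as the following sketch (valid for scalar chains, where repeated consecutive states correspond to rejected candidates):

def acceptance_rate(chain):
    # Fraction of steps in which the candidate was accepted
    moves = sum(1 for a, b in zip(chain[:-1], chain[1:]) if a != b)
    return moves / (len(chain) - 1)

A pilot run can then adjust σq iteratively: increase it if the rate exceeds roughly 40%, decrease it if the rate falls below roughly 20%.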

1.8 Bayesian model class selection


As explained in Section 1.6.2, the Bayesian inverse problem provides a rigorous and systematic
approach to the problem of model class selection. This approach is motivated by the fact that
the model itself may not necessarily reproduce the observed system, but is just an approximation
of it [102]. Therefore, there may exist not only different values of the model parameters but also physically
different model classes that may be consistent with the observations of such a system. As with the

inverse problem of model parameter estimation, the goal here is to use the available information
from the system response D to compare and rank the relative performance of a set of candidate
model classes M = {M1 , . . . , Mj , . . . , MNM } in reproducing the data. This performance can be
compared and ranked using the conditional posterior probability P (Mj |D, M), which provides in-
formation about the relative extent of support of model class Mj for representing the data D among
the set of candidates M = {M1 , . . . , Mj , . . . , MNM }.

The sought posterior probability of the overall model can be obtained by extending Bayes'
theorem to the model class level [22], as
P(Mj|D, M) = p(D|Mj) P(Mj|M) / ∑_{i=1}^{NM} p(D|Mi) P(Mi|M)    (1.32)

where P(Mj|M) is the prior probability of the jth model class in the set M, satisfying
∑_{j=1}^{NM} P(Mj|M) = 1. This prior probability expresses the initial modeller's judgement on the
relative degree of belief in Mj within the set M. An important point to remark here is that both
the prior and posterior probabilities, P(Mj|M) and P(Mj|D, M), respectively, refer to the
relative probability in relation to the set of models M, thus satisfying ∑_{j=1}^{NM} P(Mj|M) = 1 and
∑_{j=1}^{NM} P(Mj|D, M) = 1, respectively. In this context, computing the posterior plausibility of a
model class automatically requires computing it for all model classes in M. Notwithstanding,
if the interest is to compare the performance of two competing model classes Mi and Mj, this
can be straightforwardly done using the concept of Bayes' factor, as follows
P(Mi|D, M) / P(Mj|D, M) = [p(D|Mi, M) / p(D|Mj, M)] · [P(Mi) / P(Mj)]    (1.33)
which does not require computing the posterior over all possible model classes. When the prior
plausibilities of the two candidate model classes are identical, that is, P (Mi ) = P (Mj ), then
Bayes’ factor reduces to the ratio of evidences of the model classes.

Example 3 Figure 1.5 shows an illustration of a typical problem of model class assessment using
four generic model classes, that is, M = {M1, M2, M3, M4}. Observe that ∑_{j=1}^{4} P(Mj|M) =
∑_{j=1}^{4} P(Mj|D, M) = 1. Initially, the prior plausibility of the four model classes is identical, that is,
P(Mj|M) = 0.25, j = 1, . . . , 4. After the updating process, model class M2 turns out to be the most
plausible. Should another model class, say M5, be added to the set of candidates, then the values
of both the prior and posterior plausibilities would change so as to satisfy ∑_{j=1}^{5} P(Mj|M) =
∑_{j=1}^{5} P(Mj|D, M) = 1.

An important element in any Bayesian model class selection problem is the evidence p(D|Mj ),
previously explained in Section 1.7. This factor expresses how likely the observed system response
(D) is reproduced if the overall model class Mj is adopted instead of an alternative model class. The
evidence is obtained by total probability theorem given by Equation (1.27). It can be observed that
the evidence is equal to the normalising constant in Equation (1.20) for model parameter estimation.
Once the evidences for each model class are computed, their values allow us to rank the model
classes according to the posterior probabilities given in Equation (1.32). However, as explained
before, the evaluation of the multi-dimensional integral in Equation (1.27) is analytically intractable
in most cases, except in those where Laplace's method of asymptotic approximation can be used [27].

Figure 1.5: Example of relative prior and posterior probabilities for model classes M1–M4. The prior plausibilities are all 0.25, whereas the posterior concentrates on M2 (0.6), with the remaining classes sharing the rest (0.2, 0.1 and 0.1).

Details about stochastic simulation methods to compute the evidence are given in Section 1.8.1 below.

1.8.1 Computation of the evidence of a model class


The calculation of the evidence given in Equation (1.27) is the most difficult task when assessing
the relative plausibility of a model class. In the case of globally identifiable model classes based
on the data [26, 110], the posterior PDF in Equation (1.20) may be accurately approximated by
a Gaussian distribution, and the evidence term can be obtained by Laplace’s approximation. The
reader is referred to [27, 26, 140] for details about this technique. However, in the more general
case, stochastic simulation methods are required.
One straightforward way to approximate the evidence is by considering the probability integral
in Equation (1.27) as a mathematical expectation of the likelihood p(D|θ, Mj ) with respect to the
prior PDF p(θ|Mj ). This approach leads to the Monte Carlo method of numerical integration, as
follows
N1
1 X
p(D|Mj ) ≈ p(D|θ (k) , Mj ) (1.34)
N1
k=1

where the θ (k) are N1 samples drawn from the prior PDF. However, although this calculation can
be easily implemented and can provide satisfactory results, it may result in a computationally in-
efficient method, since the region of high probability content of p(θ|Mj ) is usually very different
from the region where the likelihood p(D|θ, Mj ) has its largest values. To overcome this problem,
some techniques for calculating the evidence based on samples from the posterior p(θ|D, Mj ) are
available [135, 72, 40]. Among them, we reproduce in this chapter the method proposed by Cheung

and Beck [40], based on an analytical approximation of the posterior, which is presented here
with uniform notation in the context of the Metropolis-Hastings algorithm.
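
Before detailing that method, the direct prior-sampling estimator of Equation (1.34) can be coded in a few lines; the log-sum-exp trick is used so that very small likelihood values do not underflow:

import numpy as np

def log_evidence_prior_mc(log_like, sample_prior, n1=10000):
    # Eq. (1.34): p(D|Mj) ~= (1/N1) sum_k p(D|theta_k), theta_k ~ prior
    log_l = np.array([log_like(sample_prior()) for _ in range(n1)])
    m = log_l.max()                   # log-sum-exp for numerical stability
    return m + np.log(np.mean(np.exp(log_l - m)))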

Let K(θ|θ∗) be the transition PDF of any MCMC algorithm with stationary PDF π(θ) =
p(θ|D, Mj). The stationarity condition for the MCMC algorithm satisfies the following relation

π(θ) = ∫ K(θ|θ∗) π(θ∗) dθ∗    (1.35)

A general choice of K(θ|θ∗) that applies to many MCMC algorithms can be defined as

K(θ|θ∗) = T(θ|θ∗) + (1 − a(θ∗)) δ(θ − θ∗)    (1.36)

where T(θ|θ∗) is a smooth function that does not contain delta functions, and a(θ∗) is the acceptance
probability, which must satisfy a(θ∗) = ∫ T(θ|θ∗) dθ ≤ 1. By substituting Equation (1.36) into
(1.35), an analytical approximation of the target posterior results as follows


π(θ) = p(θ|D, Mj) = ∫ T(θ|θ∗) π(θ∗) dθ∗ / a(θ) ≈ (1/(a(θ) N1)) ∑_{k=1}^{N1} T(θ|θ(k))    (1.37)

where the θ(k) are N1 samples distributed according to the posterior. For the special case of the
Metropolis-Hastings algorithm, the function T(θ|θ∗) can be defined as T(θ|θ∗) = r(θ|θ∗) q(θ|θ∗),
where q(θ|θ∗) is the proposal PDF, and r(θ|θ∗) is given by

r(θ|θ∗) = min{1, [p(D|θ, Mj) p(θ|Mj) q(θ∗|θ)] / [p(D|θ∗, Mj) p(θ∗|Mj) q(θ|θ∗)]}    (1.38)

Additionally, for this algorithm, the denominator a(θ) in Equation (1.37) can be approximated by
an estimator that uses samples from the proposal distribution as follows
a(θ) = ∫ r(θ̃|θ) q(θ̃|θ) dθ̃ ≈ (1/N2) ∑_{k=1}^{N2} r(θ̃(k)|θ)    (1.39)

where the θ̃ (k) are N2 samples from q(θ̃|θ), when θ is fixed. Once the analytical approximation to
the posterior in Equation (1.37) is set, then Equation (1.20) can be used to evaluate the evidence,
as follows
log p(D|Mj) ≈ log p(D|θ, Mj) + log p(θ|Mj) − log p(θ|D, Mj)    (1.40)

where the last term is evaluated through the analytical approximation in Equation (1.37).

The last expression is obtained by taking logarithms of Bayes' theorem, given earlier in Equation (1.20).
Observe that, except for the posterior PDF p(θ|D, Mj), whose information is based on
samples, the rest of the terms can be evaluated analytically for any θ ∈ Θ. Bayes' theorem ensures
that the last equation is valid for all θ ∈ Θ, so it is possible to use only one value for this parameter.
However, a more accurate estimate of the log-evidence can be obtained by averaging the results
from Equation (1.40) using different values of θ [40, 43]. The method is briefly summarised by the
pseudo-code given in Algorithm 2, which specifically focuses on the proposed implementation for
the inverse problem based on the M-H algorithm.

Algorithm 2: Evidence computation by [40]

1.- Take N1 samples {θ(k)}_{k=1}^{N1} from p(θ|D, Mj)
2.- Choose a model parameter vector θ ∈ Θ
for k = 1 to N1 do
    3.- Evaluate q(θ|θ(k))
    4.- Evaluate r(θ|θ(k)) (Eq. (1.38))
end for
5.- Take N2 samples {θ(ℓ)}_{ℓ=1}^{N2} from q(·|θ)
for ℓ = 1 to N2 do
    6.- Evaluate r(θ(ℓ)|θ) (Eq. (1.38))
end for
7.- Obtain p(θ|D, Mj) ≈ [(1/N1) ∑_{k=1}^{N1} q(θ|θ(k)) r(θ|θ(k))] / [(1/N2) ∑_{ℓ=1}^{N2} r(θ(ℓ)|θ)]
8.- Evaluate log p(D|Mj) (Eq. (1.40))
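
A direct Python transcription of Algorithm 2 is sketched below, assuming a Gaussian random-walk proposal with standard deviation sigma_q (the same assumption as in the M-H sketch above); r is the acceptance probability of Equation (1.38):

import numpy as np

def log_evidence_cheung_beck(theta, post_samples, log_like, log_prior,
                             sigma_q=0.1, n2=5000):
    rng = np.random.default_rng()

    def log_q(a, b):
        # Gaussian random-walk proposal density q(a|b)
        return (-0.5 * ((a - b) / sigma_q) ** 2
                - np.log(sigma_q * np.sqrt(2.0 * np.pi)))

    def r(a, b):
        # Acceptance probability r(a|b) of Eq. (1.38)
        log_ratio = (log_like(a) + log_prior(a) + log_q(b, a)
                     - log_like(b) - log_prior(b) - log_q(a, b))
        return min(1.0, np.exp(log_ratio))

    # Numerator of step 7: (1/N1) sum_k q(theta|theta_k) r(theta|theta_k)
    num = np.mean([np.exp(log_q(theta, tk)) * r(theta, tk) for tk in post_samples])
    # Denominator of step 7 (Eq. (1.39)): (1/N2) sum_l r(theta_l|theta)
    den = np.mean([r(rng.normal(theta, sigma_q), theta) for _ in range(n2)])
    log_post_at_theta = np.log(num) - np.log(den)
    # Step 8 (Eq. (1.40)): log-evidence from Bayes' theorem at theta
    return log_like(theta) + log_prior(theta) - log_post_at_theta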

1.8.2 Information-theory approach to model-class selection


From the perspective of forward modelling problems, more complex models tend to be preferred
over simpler models because they are considered more realistic. For inverse problems, however, this
may not be the case. A "complex" model may lead to over-fitting, where the model is
unnecessarily complex in relation to the data. This means that the model does not generalise well
when making predictions, since it depends too much on the details of the data. Figure 1.2 illustrates
this model complexity concept.
A common principle in science and engineering is that, if data can be explained by several models,
then the “simpler” one should be preferred over more complex models that lead to only slightly
better agreement with the data. This is often referred to as the Principle of Model Parsimony
or Ockham’s razor. The Bayesian approach to model class selection explained here shows that
the evidence of a model class automatically enforces a quantitative expression of Ockham’s razor
[82, 102]. This was formally shown by Muto and Beck [132], which led to an expression for the
evidence that allows an information-theoretic interpretation of Ockham’s razor, as follows

p(θ|D, Mj )
Z Z  
log p(D|Mj ) = [log p(D|θ, Mj )] p(θ|D, Mj )dθ − log p(θ|D, Mj )dθ (1.41)
Θ Θ p(θ|Mj )

The first term on the right side of Equation (1.41) is the log-likelihood function averaged by
the posterior PDF, which can be interpreted as a measure of the average goodness of fit (AGF) of
the model class Mj. The second term is the relative entropy between the posterior and the prior PDFs,
which measures the "difference" between those PDFs. This difference will be larger for models that
extract more information from the data to update their prior information, and determines the
expected information gain (EIG) about the model class Mj from the data. This term is always
non-negative, and, since it appears subtracting the data-fit (AGF) term, it provides a penalty
against more complex model classes, which extract more information from the data to update their
prior information. Therefore, the log-evidence of a model class is comprised of a data-fit term (AGF)
and a term (EIG) that provides a penalty against more complex model classes. This interpretation
of the evidence allows us to find a correct trade-off between fitting accuracy and model complexity

for a particular model class, and gives an intuitive understanding of why the computation of the
evidence automatically enforces a quantitative expression of the Principle of Model Parsimony or
Ockham’s razor [102].

1.9 Concluding remarks


This chapter presented an overview of uncertainty quantification and its application to engineering
systems, with a special emphasis on Bayesian methods. It was explained that probability theory
provides a systematic foundation to deal with uncertainty in engineering systems, and the funda-
mentals of probability theory were explained in detail. Further, the interpretation of probability was
discussed, drawing the distinction between frequentist probabilities, commonly encountered in the
context of aleatory uncertainty, and subjective probabilities, commonly encountered in Bayesian
analysis. It was shown that subjective probabilities are versatile enough to deal with both aleatory
uncertainty and epistemic uncertainty, by means of expressing the degree of belief regarding certain
variables or hypotheses. The topics of forward problem and inverse problem were discussed in detail,
and it was also explained that the Bayesian approach provides a systematic, rigorous framework to
solve inverse problems including the problem of model class selection.
The following chapters in this book will delve into the topic of Bayesian inverse problems in
greater detail, and will illustrate these methods using practical engineering applications.
2
Solving Inverse Problems by Approximate Bayesian Computation

Manuel Chiachío-Ruano,* Juan Chiachío-Ruano and María L. Jalón

University of Granada, Spain.
* Corresponding author: mchiachio@ugr.es

This chapter aims at supplying information about the theoretical basis of Approximate Bayesian
Computation (ABC), which is an efficient computational tool to solve inverse problems without
the need to formulate or evaluate the likelihood function. By ABC, the posterior PDF can be
computed in those cases where the likelihood function is intractable, impossible to formulate, or
computationally demanding. Several ABC pseudo-codes are included in this chapter and an example
of application is provided. Finally, the ABC-SubSim algorithm, which was initially proposed by
Chiachío et al. [SIAM Journal on Scientific Computing, Vol. 36, No. 3, pp. A1339–A1358], is
explained within the context of an example of application.

2.1 Introduction to the ABC method


As explained in Chapter 1, the Bayesian inverse problem aims at updating the a priori information
about a set of parameters θ ∈ Θ ⊂ R^d for a parameterised model class Mj, based on the available
information from the system response contained in the data D ∈ D ⊂ R^ℓ. This is achieved
by Bayes' theorem, which yields the posterior PDF p(θ|D, Mj) of the model specified by θ in the
model class Mj. However, in engineering practice, there are situations where the Bayesian inverse
problem involves a likelihood function that is not completely known or is computationally
unaffordable, perhaps because it requires the evaluation of an intractable multi-dimensional integral
[127]. Approximate Bayesian Computation (ABC) was conceived with the aim of evaluating the
posterior PDF in those cases where the likelihood function is intractable [182]. In the Bayesian
literature, such a method is also referred to as a likelihood-free computation algorithm, since it circumvents
the explicit evaluation of the likelihood function by using a stochastic simulation approach. In this
section, the ABC method is briefly described.

Let x ∈ R^ℓ denote a simulated outcome from p(x|θ, Mj), the stochastic forward model for
model class Mj parameterised by θ, formerly explained in Chapter 1, Equation (1.19). ABC aims
at evaluating the posterior p(θ|D, Mj) ∝ p(D|θ, Mj) p(θ|Mj) by applying Bayes' theorem to the


pair (θ, x) ∈ Θ × D ⊂ R^{d+ℓ}:

p(θ, x|D) ∝ p(D|x, θ) p(x|θ) p(θ)    (2.1)
In the last equation, the conditioning on the model class Mj has been omitted for simplicity, given that
the ABC theory is valid irrespective of the chosen model class. The basic form of the ABC algorithm
to generate samples from the posterior given by Equation (2.1) is a rejection algorithm that consists
in jointly generating θ′ ∼ p(θ) and x′ ∼ p(x|θ′), and accepting them conditional on the equality
x′ = D being fulfilled. This is due to the fact that the PDF p(D|x, θ) in Equation (2.1) gives higher
density values for the posterior in those regions where x is close to D. Of course, obtaining the
sample x′ = D is unlikely in most cases, and it is only feasible if D consists of a finite set of
values rather than a continuous region in R^ℓ. To address the above difficulty, two main approximations
have been conceived in ABC theory [128]: (a) replace the equality x = D by the approximation
x ≈ D and introduce a tolerance parameter ε that accounts for the quality of such an approximation
through a suitable metric ρ; and (b) introduce a low-dimensional vector of summary statistics η(·)
that allows a weak comparison of the closeness between x and D. Through this approach, the posterior
p(θ, x|D) in Equation (2.1) is approximated by pε(θ, x|D), which assigns higher probability
density to those values of (θ, x) ∈ Θ × D that satisfy the condition ρ(η(x), η(D)) ≤ ε.

The standard version of the ABC algorithm defines an approximate likelihood function given
by Pε(D|θ, x) ≜ P(x ∈ Bε(D)|θ, x) [46], where Bε(D) is a region of the data space D defined as

Bε(D) = {x ∈ D : ρ(η(x), η(D)) ≤ ε}    (2.2)

In the expression of the approximate likelihood function, and also in what follows, P(·) is adopted
to denote probability whereas p(·) denotes a PDF. Thus, from Bayes' theorem, the approximate
posterior pε(θ, x|D) can be obtained as

pε(θ, x|D) ∝ P(x ∈ Bε(D)|x, θ) p(x|θ) p(θ)    (2.3)

The approximate likelihood can be formulated as P(x ∈ Bε(D)|x, θ) = I_{Bε(D)}(x), with I_{Bε(D)}(x)
being an indicator function for the set Bε(D) that assigns unity when ρ(η(x), η(D)) ≤ ε,
and 0 otherwise. It follows that the approximate posterior pε(θ, x|D) can be readily computed as

pε(θ, x|D) ∝ p(x|θ) p(θ) I_{Bε(D)}(x)    (2.4)

Since the ultimate interest of the Bayesian inverse problem is typically the posterior of the model
parameters pε(θ|D), it can be obtained by marginalising the approximate posterior PDF in Equation (2.4)

pε(θ|D) ∝ p(θ) ∫_D p(x|θ) I_{Bε(D)}(x) dx = P(x ∈ Bε(D)|θ) p(θ)    (2.5)

Note that this integration need not be done explicitly, since samples from this marginal PDF are
obtained by taking the θ-component of the samples from the joint PDF in Equation (2.4) [151]. A
pseudo-code implementation of the ABC algorithm is given below as Algorithm 3.

Algorithm 3: Standard ABC

Inputs: ε {tolerance value}, η(·) {summary statistic}, ρ(·) {metric}, K {number of simulations}
Begin
for k = 1 to K do
    repeat
        1.- Simulate θ′ from the prior p(θ|Mj)
        2.- Generate x′ from the stochastic forward model p(x|θ′, Mj)
    until ρ(η(x′), η(D)) ≤ ε
    Accept (θ′, x′) as (θ(k), x(k))
end for
Output: {(θ(k), x(k))}_{k=1}^{K} ∼ pε(θ, x|D)
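
A Python transcription of Algorithm 3 is sketched below. The functions sample_prior and simulate stand for draws from p(θ|Mj) and from the stochastic forward model p(x|θ, Mj), respectively; they are placeholders to be supplied by the user:

def abc_rejection(sample_prior, simulate, data, eta, rho, eps, n_samples):
    # Standard ABC rejection sampler (Algorithm 3)
    eta_data = eta(data)                 # summary statistic of the observations
    accepted = []
    while len(accepted) < n_samples:
        theta = sample_prior()           # 1. theta' ~ p(theta|Mj)
        x = simulate(theta)              # 2. x' ~ p(x|theta', Mj)
        if rho(eta(x), eta_data) <= eps: # accept if within tolerance
            accepted.append((theta, x))
    return accepted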

Example 4 Let us consider a column of 2 [m] length with a 0.4 [m] square cross section, which is loaded
with F = 1 [kN] at the top, as illustrated in Figure 2.1. Let us also consider that, for some reason,
the column is made of a degrading material, so that its Young's modulus decreases at an unknown
constant rate ξ from an initial value E0 = 40 [MPa], such that

En = e^{−ξ} En−1 + vn    (2.6)

where En is the Young’s modulus at time or instant n ∈ N expressed in weeks, and vn is an un-
known model error term, which is assumed to be distributed as a zero-mean Gaussian with uncertain
standard deviation, that is, vn ∼ N (0, σ). Next, a sensor is assumed to be placed at the top of the
column to register deflections, and the following measurement equation can be considered:

δn = f (En ) + wn (2.7)

where f : R≥0 → R≥0 is a mathematical function that provides the deflection of the column as
a function of En. Assuming linear elasticity theory, this function can be expressed as
f = FL³/(3EnI), where I is the moment of inertia of the cross section. In Equation (2.7), the term wn is the
measurement error, which is assumed to be negligible compared to the model error term, so that it
is subsumed into the error term vn.

Figure 2.1: Scheme of the structure used for Example 4 (the column of length L with load F at the top).

In this example, the degradation rate and the standard deviation
of the error term are selected as the uncertain model parameters, so that θ = {θ1, θ2} = {ξ, σ},
whose prior information can be represented by the uniform PDFs p(θ1) = U[0.0001, 0.02]
and p(θ2) = U[0.01, 2], respectively. The data in this example are given as a recorded time-history
of deflections over a period of time T = 200 weeks, that is, D = {δn,meas}_{n=0}^{200}. These data are
synthetically generated from Equations (2.6) and (2.7) considering θtrue = (0.005, 0.1), as shown in
Figure 2.2, panels (a) and (b). The ABC rejection algorithm is adopted with K = 20,000 samples
to obtain the approximate posterior of θ based on the referred data. The results are shown in
Figure 2.2c.

Figure 2.2: Output of the ABC rejection algorithm in application to Example 4. In panels (a) and
(b), the history plots of measured deflections and stiffness values based on θtrue are shown. Circles
represent values in the θ-space.
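
For completeness, the forward model of Example 4 and a possible ABC set-up are sketched below. The identity is taken as summary statistic and the Euclidean distance between deflection histories as the metric ρ; the tolerance eps is illustrative, and the model error vn is assumed to be expressed in MPa (an assumption of this sketch, consistent with E0 being given in MPa):

import numpy as np

rng = np.random.default_rng()
F, Lc, T = 1e3, 2.0, 200                   # load [N], column length [m], weeks
E0 = 40e6                                  # initial Young's modulus [Pa]
I = 0.4 ** 4 / 12                          # second moment of the square section [m^4]

def simulate(theta):
    # Stochastic forward model, Eqs. (2.6)-(2.7)
    xi, sigma = theta
    E, defl = E0, []
    for _ in range(T):
        E = np.exp(-xi) * E + rng.normal(0.0, sigma) * 1e6  # sigma assumed in MPa
        defl.append(F * Lc ** 3 / (3.0 * E * I))            # f = F*L^3/(3*E*I), Eq. (2.7)
    return np.array(defl)

def sample_prior():
    # Uniform priors on the degradation rate and the error std
    return (rng.uniform(1e-4, 0.02), rng.uniform(0.01, 2.0))

# Using the abc_rejection sketch above (observed_deflections holds the data D):
# samples = abc_rejection(sample_prior, simulate, observed_deflections,
#                         eta=lambda x: x,
#                         rho=lambda a, b: np.linalg.norm(a - b),
#                         eps=0.05, n_samples=1000)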
Another random document with
no related content on Scribd:
vegetable substances consist, therefore, are not merely mixed
together—they are united in some closer and more intimate manner.
To this more intimate state of union, the term chemical combination
is applied—the elements are said to be chemically combined.
Thus, when charcoal is burned in the air, it slowly disappears,
and forms, as already stated, a kind of air known by the name of
carbonic acid gas, which rises into the atmosphere and disappears.
Now, this carbonic acid is formed by the union of the carbon
(charcoal), while burning, with the oxygen of the atmosphere, and in
this new air the two elements, carbon and oxygen, are chemically
combined.
Again, if a piece of wood or a bit of straw, in which the elements
are already chemically combined, be burned in the air, these
elements are separated and made to assume new states of
combination, in which new states they escape into the air and
become invisible. When a substance is thus changed by the action
of heat, it is said to be decomposed, or if it gradually decay and
perish by exposure to the air and moisture, it undergoes slow
decomposition.
When, therefore, two or more substances unite together, so as to
form a third possessing properties different from both, they enter into
chemical union—they form a chemical combination or chemical
compound. When, on the other hand, one compound body is so
changed as to be converted into two or more substances different
from itself, it is decomposed. Carbon, hydrogen, &c., are chemically
combined in the interior of the plant during the formation of wood:
wood, again, is decomposed when by the vinegar-maker it is
converted among other substances into charcoal and wood-vinegar,
and the flour of grain when the brewer or distiller converts it into
ardent spirits.
CHAPTER II.
Form in which these different substances enter into Plants.
Properties of the Carbonic, Humic, and Ulmic Acids—of
Water, of Ammonia, and of Nitric Acid. Constitution of
the Atmosphere.

SECTION I.—FORM IN WHICH THE CARBON,


ETC.
ENTER INTO PLANTS.
It is from their food that plants derive the carbon, hydrogen,
oxygen, and nitrogen, of which their organic part consists. This food
enters partly by the minute pores of their roots, and partly by those
which exist in the green part of the leaf and of the young twig. The
roots bring up food from the soil, the leaves take it in directly from
the air.
Now, as the pores in the roots and leaves are very minute,
carbon (charcoal) cannot enter into either in a solid state; and as it
does not dissolve in water, it cannot, in the state of simple carbon, be
any part of the food of plants. Again, hydrogen gas neither exists in
the air nor usually in the soil—so that, although hydrogen is always
found in the substance of plants, it does not enter them in the state
of the gas above described. Oxygen exists in the air, and is directly
absorbed both by the leaves and by the roots of plants; while
nitrogen, though it forms a large part of the atmosphere, is not
supposed to enter directly into plants in any considerable quantity.
The whole of the carbon and hydrogen, and the greater part of
the oxygen and nitrogen also, enter into plants in a state of chemical
combination with other substances; the carbon chiefly in the state of
carbonic acid, and of certain other soluble compounds which exist in
the soil; the hydrogen and oxygen in the form of water: and the
nitrogen in those of ammonia or nitric acid. It will be necessary
therefore briefly to describe these several compounds.

SECTION II.—OF THE CARBONIC, HUMIC,


AND ULMIC ACIDS.

1. Carbonic Acid.—If a few pieces of chalk or limestone be


put into the bottom of a tumbler, and a little spirit of salt (muriatic
acid) be poured upon them, a boiling up or effervescence will take
place, and a gas will be given off, which will gradually collect and fill
the tumbler; and when produced very rapidly, may even be seen to
run over its edges. This gas is carbonic acid. It cannot be
distinguished from common air by the eye; but if a taper be plunged
into it, the flame will immediately be extinguished, while the gas
remains unchanged. This kind of air is so heavy, that it may be
poured from one vessel into another, and its presence recognised by
the taper. It has also a peculiar odour, and is exceedingly
suffocating, so that if a living animal be introduced into it, life
immediately ceases. It is absorbed by water, a pint of water
absorbing or dissolving a pint of the gas.
Carbonic acid exists in the atmosphere; it is given off from the
lungs of all living animals while they breathe; it is also produced
largely during the burning of wood, coal, and all other combustible
bodies, so that an unceasing supply of this gas is poured into the air.
Decaying animal and vegetable substances also give off this gas,
and hence it is always present in greater or less abundance in the
soil, and especially in such soils as are rich in vegetable matter.
During the fermentation of malt liquors, or of the expressed juices of
different fruits,—the apple, the pear, the grape, the gooseberry—it is
produced, and the briskness of such fermented liquors is due to the
escape of this gas. From the dung and compost heap it is also given
off; and when put into the ground in a fermenting state, farm-yard
manure affords a rich supply of carbonic acid to the young plant.
Carbonic acid consists of carbon and oxygen only, combined
together in the proportion of 28 of the former to 72 of the latter, or
100 lbs. of carbonic acid contain 28 lbs. of carbon and 72 lbs. of
oxygen.
2. Humic and Ulmic Acids.—The soil always contains a
portion of vegetable matter (called humus by some writers), and
such matter is always added to it when it is manured from the farm-
yard or the compost heap. During the decay of this vegetable matter,
carbonic acid, as above stated, is given off in large quantity, but
other substances are also formed at the same time. Among these
are the two to which the names of humic and ulmic acids are
respectively given. They both contain much carbon, are both capable
of entering the roots of plants, and both, no doubt, in favourable
circumstances, help to feed the plant.
If the common soda of the shops be dissolved in water, and a
portion of a rich vegetable soil, or a bit of peat, be put into this
solution, and the whole boiled, a brown liquid is obtained. If to this
brown liquid, spirit of salt (muriatic acid) be added till it is sour to the
taste, a brown flocky powder falls to the bottom. This brown
substance is humic acid. But if in this process we use spirit of
hartshorn (liquid ammonia), instead of the soda, ulmic acid is
obtained.
These acids exist along with other substances in the rich brown
liquor of the farm-yard, which is so often allowed to run to waste;
they are also produced in greater or less quantity during the decay of
the manure after it is mixed with the soil, and no doubt yield to the
plant a portion of that supply of food which it must necessarily
receive from the soil.

SECTION III.—OF WATER, AMMONIA,


AND NITRIC ACID.

1. Water.—If hydrogen be prepared in a bottle, in the way


already described, and a gas-burner be fixed into its mouth, the
hydrogen may be lighted, and will burn as it escapes into the air.
Held over this flame a cold tumbler will become covered with dew, or
with little drops of water. This water is produced during the burning of
the hydrogen; and as it takes place in pure oxygen gas as well as in
the open air, this water must contain the hydrogen and oxygen which
disappear, or must consist of hydrogen and oxygen only.
This is a very interesting fact; and were it not that chemists are
now familiar with many such, it could not fail to appear truly
wonderful that the two gases, oxygen and hydrogen, by their union,
should form so very different a substance as water is from either. It
consists of 1 of hydrogen to 8 of oxygen, or every 9 lbs. of water
contain 8 lbs. of oxygen and 1 lb. of hydrogen.
Water is so familiar a substance, that it is unnecessary to dwell
upon its properties. When pure, it has neither colour, taste, nor smell.
At 32° of Fahrenheit’s[2] scale (the freezing point), it solidifies into
ice, and at 212° it boils, and is converted into steam. There are two
others of its properties which are especially interesting in connection
with the growth of plants.
1st, If sugar or salt be put into water, they disappear or are
dissolved. Water has the power of thus dissolving numerous other
substances in greater or less quantity. Hence, when the rain falls and
sinks into the soil, it dissolves some of the soluble substances it
meets in its way, and rarely reaches the roots of plants in a pure
state. So waters that rise up in springs are rarely pure. They always
contain earthy and saline substances in solution, and these they
carry with them, when they are sucked in by the roots of plants.
It has been above stated, that water absorbs (dissolves) its own
bulk of carbonic acid; it dissolves also smaller quantities of the
oxygen and nitrogen of the atmosphere; and hence, when it meets
any of these gases in the soil, it becomes impregnated with them,
and conveys them into the plant, there to serve as a portion of its
food.
2d, Water is composed of oxygen and hydrogen; by certain
chemical processes it can readily be resolved or decomposed
artificially into these two gases. The same thing takes place naturally
in the interior of the living plant. The roots absorb the water, but if in
any part of the plant hydrogen be required, to make up the
substance which it is the function of that part to produce, a portion of
the water is decomposed and worked up, while the oxygen is set
free, or converted to some other use. So, also, in any case where
oxygen is required water is decomposed, the oxygen made use of,
and the hydrogen liberated. Water, therefore, which abounds in the
vessels of all growing plants, if not directly converted into the
substance of the plant, is yet a ready and ample source from which a
supply of either of the elements of which it consists may at any time
be obtained.
It is a beautiful adaptation of the properties of this all-pervading
compound (water), that its elements should be so fixedly bound
together as rarely to separate in external nature, and yet to be at the
command and easy disposal of the vital powers of the humblest
order of living plants.
2. Ammonia.—If the sal ammoniac of the shops be mixed with
quicklime, a powerful odour is immediately perceived, and an
invisible gas is given off which strongly affects the eyes. This gas is
ammonia. Water dissolves or absorbs it in very large quantity, and
this solution forms the common hartshorn of the shops. The white
solid smelling-salts of the shops are a compound of ammonia with
carbonic acid,—a solid formed by the union of two gases.
The gaseous ammonia consists of nitrogen and hydrogen only, in
the proportion of 14 of the former to 3 of the latter, or 17 lbs. of
ammonia contain 3 lbs. of hydrogen.
The chief natural source of this compound is, in the decay of
animal substances. During the putrefaction of dead animal bodies
ammonia is invariably given off. From the animal substances of the
farm-yard it is evolved, and from all solid and liquid manures of
animal origin. It is also formed in lesser quantity during the decay of
vegetable substances in the soil; and in volcanic countries, it
escapes from many of the hot lavas, and from the crevices in the
heated rocks.
It is produced artificially by the distillation of animal substances
(hoofs, horns, &c.), or of coal. Thousands of tons of the ammonia
present in the ammoniacal liquors of the gas-works, which might be
beneficially applied as a manure, are annually carried down by the
rivers, and lost in the sea.
The ammonia which is given off during the putrefaction of animal
substances rises partially into the air, and floats in the atmosphere,
till it is either decomposed by natural causes, or is washed down by
the rains. In our climate, cultivated plants derive a considerable
portion of their nitrogen from ammonia. It is supposed to be one of
the most valuable fertilizing substances contained in farm-yard
manure; and as it is present in greater proportion by far in the liquid
than in the solid contents of the farm-yard, there can be no doubt
that much real wealth is lost, and the means of raising increased
crops thrown away in the quantities of liquid manure which are
almost everywhere permitted to run to waste.
3. Nitric Acid—is a powerfully corrosive liquid known in the
shops by the familiar name of aquafortis. It is prepared by pouring oil
of vitriol (sulphuric acid) upon saltpetre, and distilling the mixture.
The aquafortis of the shops is a mixture of the pure acid with water.
Pure nitric acid consists of nitrogen and oxygen only; the union of
these two gases, so harmless in the air, producing the burning and
corrosive compound which this is known to be.
It never reaches the roots of plants in this free and corrosive
state. It exists in many soils, and is naturally formed in compost
heaps, and in most situations where vegetable matter is undergoing
decay in contact with the air; but it is always in a state of chemical
combination in these cases. With potash, it forms nitrate of potash
(saltpetre); with soda, nitrate of soda; and with lime, nitrate of lime;
and it is generally in one or other of these states of combination that
it reaches the roots of plants.
Nitric acid is also naturally formed, and in some countries
probably in large quantities, by the passage of electricity through the
atmosphere. The air, as has been already stated, contains much
oxygen and nitrogen mixed together, but when an electric spark is
passed through a quantity of air, a certain quantity of the two unite
together chemically, so that every spark that passes forms a small
portion of nitric acid. A flash of lightning is only a large electric spark;
and hence every flash that crosses the air produces along its path a
quantity of this acid. Where thunder-storms are frequent, much nitric
acid must be produced in this way in the air. It is washed down by
the rains, in which it has frequently been detected, and thus reaches
the soil, where it produces one or other of the nitrates above
mentioned.
It has been long observed that those parts of India are the most
fertile in which saltpetre exists in the soil in the greatest abundance.
Nitrate of soda, also, in this country, has been found wonderfully to
promote vegetation in many localities; and it is a matter of frequent
remark, that vegetation seems to be refreshed and invigorated by
the fall of a thunder-shower. There is, therefore, no reason to doubt
that nitric acid is really beneficial to the general vegetation of the
globe. And since vegetation is most luxuriant in those parts of the
globe where thunder or lightning are most abundant, it would appear
as if the natural production of this compound body in the air, to be
afterwards brought to the earth by the rains, were a wise and
beneficent contrivance by which the health and vigour of universal
vegetation is intended to be promoted.
It is from this nitric acid, thus universally produced and existing,
that plants appear to derive a large—probably, taking vegetation in
general, the largest—portion of their nitrogen. In all climates they
also derive a portion of this element from ammonia; but less from
this source in tropical than in temperate climates.[3]

SECTION IV.—OF THE CONSTITUTION OF


THE ATMOSPHERE.
The air we breathe, and from which plants also derive a portion of
their nourishment, consists of a mixture of oxygen and nitrogen
gases, with a minute quantity of carbonic acid, and a variable
proportion of watery vapour. Every hundred gallons of dry air contain
about 21 gallons of oxygen and 79 of nitrogen. The carbonic acid
amounts only to one gallon in 2500, while the watery vapour in the
atmosphere varies from 1 to 2½ gallons (of steam) in 100 gallons of
common air.
The oxygen in the air is necessary to the respiration of animals,
and to the support of combustion (burning of bodies). The nitrogen
serves principally to dilute the strength, so to speak, of the pure
oxygen, in which gas, if unmixed, animals would live and
combustibles burn with too great rapidity. The small quantity of
carbonic acid affords an important part of their food to plants, and
the watery vapour in the air aids in keeping the surfaces of animals
and plants in a moist and pliant state; while, in due season, it
descends also in refreshing showers, or studs the evening leaf with
sparkling dew.
There is a beautiful adjustment in the constitution of the
atmosphere to the nature and necessities of living beings. The
energy of the pure oxygen is tempered, yet not too much weakened,
by the admixture of nitrogen. The carbonic acid, which alone is
noxious to life, is mixed in so minute a proportion as to be harmless
to animals, while it is still beneficial to plants; and when the air is
overloaded with watery vapour, it is provided that it shall descend in
rain. These rains at the same time serve another purpose. From the
surface of the earth there are continually ascending vapours and
exhalations of a more or less noxious kind; these the rains wash out
from the air, and bring back to the soil, at once purifying the
atmosphere through which they descend, and refreshing and
fertilizing the land on which they fall.
CHAPTER III.
Structure of plants—Mode in which their nourishment is
obtained—Growth and substance of plants—
Production of their substance from the food they imbibe
—Mutual transformations of starch, sugar, and woody
fibre.

From the compound substances, described in the preceding


chapter, plants derive the greater portion of the carbon, hydrogen,
oxygen, and nitrogen, of which their organic part consists. The living
plant possesses the power of absorbing these compound bodies, of
decomposing them in the interior of its several vessels, and of
recompounding their elements in a different way, so as to produce
new substances,—the ordinary products of vegetable life. Let us
briefly consider the general structure of plants, and their mode of
growth.

SECTION I.—OF THE STRUCTURE OF PLANTS,


AND THE MODE IN WHICH THEIR
NOURISHMENT IS OBTAINED.
A perfect plant consists of three several parts,—a root which
throws out arms and fibres in every direction into the soil,—a trunk
which branches into the air on every side,—and leaves which, from
the ends of the branches and twigs, spread out a more or less
extended surface into the surrounding air. Each of these parts has a
peculiar structure and a special function assigned to it.
The stem of any of our common trees consists of three parts,—
the pith in the centre, the wood surrounding the pith, and the bark
which covers the whole. The pith consists of bundles of minute
hollow tubes, laid horizontally one over the other; the wood and inner
bark, of long tubes bound together in a vertical position, so as to be
capable of carrying liquids up and down between the roots and the
leaves. When a piece of wood is sawn across, the ends of these
tubes may be distinctly seen. The branch is only a prolongation of
the stem, and has a similar structure.
The root, immediately on leaving the trunk or stem, has also a
similar structure; but as the root tapers away, the pith gradually
disappears, the bark also thins out, the wood softens, till the white
tendrils, of which its extremities are composed, consist only of a
colourless spongy mass, full of pores, but in which no distinction of
parts can be perceived. In this spongy mass the vessels or tubes
which descend through the stem and root lose themselves, and by
them these spongy extremities are connected with the leaves.
The leaf is an expansion of the twig. The fibres which are seen to
branch out from the base over the inner surface of the leaf are
prolongations of the vessels of the wood. The green exterior portion
of the leaf is, in like manner, a continuation of the bark in a very thin
and porous form. The green of the leaf, though full of pores,
especially on the under part, yet also consists of, or contains, a
collection of tubes or vessels, which stretch along its surface, and
communicate with those of the bark.
Most of these vessels in the living plant are full of sap, and this
sap is in almost continual motion. In spring and autumn the motion is
more rapid, and in winter it is sometimes scarcely perceptible; yet
the sap is supposed to be rarely quite stationary in every part of the
tree.
From the spongy part of the root the sap ascends through the
vessels of the wood, till it is diffused over the inner surface of the leaf
by the fibres which the wood contains. Hence, by the vessels in the
green of the leaf, it is returned to the bark, and through the vessels
of the inner bark it descends to the root.
Every one understands why the roots send out fibres in every
direction through the soil,—it is in search of water and of liquid food,
which the spongy fibres suck in and send forward with the sap to the
upper parts of the tree. It is to aid these roots in procuring food that,
in the art of culture, such substances are mixed with the soil where
these roots are, as are supposed to be necessary, or at least
favourable, to the growth of the plant.
It is not so obvious that the leaves spread out their broad
surfaces into the air for the same purpose precisely as that for which
the roots diffuse their fibres through the soil. The only difference is,
that while the roots suck in chiefly liquid, the leaves inhale almost
solely gaseous food. In the sunshine, the leaves are continually
absorbing carbonic acid from the air and giving off oxygen gas. That
is to say, they are continually appropriating carbon from the air.[4]
When night comes, this process ceases, and they begin to absorb
oxygen and to give off carbonic acid. But this latter process does not
go on so rapidly as the former, so that, on the whole, plants when
growing gain a large portion of carbon from the air. The actual
quantity, however, varies with the season, with the climate, and with
the kind of tree. The proportion of the whole carbon contained by a
plant, which has been derived from the air, is greatly modified also
by the quality of the soil in which it grows, and by the comparative
abundance of liquid food which happens to be within reach of its
roots. It has been ascertained, however, that in our climate, on an
average, not less than from one-third to three-fourths of the entire
quantity of carbon contained in the crops we reap from land of
average fertility, is really obtained from the air.
We see then why, in arctic climates, where the sun once risen
never sets again during the entire summer, vegetation should almost
rush up from the frozen soil—the green leaf is ever gaining from the
air and never losing, ever taking in and never giving off carbonic
acid, since no darkness ever interrupts or suspends its labours.
How beautiful, too, does the contrivance of the expanded leaf
appear! The air contains only one gallon of carbonic acid in 2500,
and this proportion has been adjusted to the health and comfort of
animals to whom this gas is hurtful. But to catch this minute quantity,
the tree hangs out thousands of square feet of leaf in perpetual
motion, through an ever-moving air; and thus, by the conjoined
labours of millions of pores, the substance of whole forests of solid
wood is slowly extracted from the fleeting winds. The green stem of
the young shoot, and the green stalks of the grasses, also absorb
carbonic acid as the green of the leaf does, and thus a larger supply
is afforded when the growth is most rapid, or when the short life of
the annual plant demands much nourishment within a limited time.
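
Taking the proportion just named, the scantiness of this aerial supply
may be put in figures:

    \[ \frac{1}{2500} = 0.0004 = 0.04 \text{ per cent}; \]

so that the leaves must, as it were, sift 2500 measures of air to
obtain a single measure of their gaseous food.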

SECTION II.—OF THE GROWTH AND SUBSTANCE OF PLANTS.
In this way the perfect plant derives its food from the soil and
from the air; but perfect plants arise from seeds; and the study of the
entire life—the career, so to speak—of a plant, presents many
interesting and instructive subjects of consideration.
When a portion of flour is made into dough, and this dough is
kneaded with the hand under a stream of water upon a fine sieve, as
long as the water passes through milky, there will remain on the
sieve a glutinous sticky substance resembling birdlime, while the
milky water will gradually deposit a pure white powder. This powder
is starch, that which remains on the sieve is gluten. Both of these
substances exist, therefore, in the flour; they both also exist in the
grain. The starch consists of carbon, hydrogen, and oxygen only; the
gluten, in addition to these, contains also nitrogen.
When ground into flour, these substances serve for food to man;
in the unbruised grain they are intended to feed the future plant in its
earliest infancy.
When a seed is committed to the earth, if the warmth and
moisture are favourable, it begins to sprout. It pushes a shoot
upwards, it thrusts a root downwards, but, until the leaf expand, and
the root has fairly entered the soil, the young plant derives no
nourishment other than water, either from the earth or from the air. It
lives on the starch and gluten contained in the seed. But these
substances, though capable of being separated from each other by
means of water, as above stated, yet are neither of them soluble in
water. Hence, they cannot, without undergoing a previous change,
be taken up by the sap, and conveyed along the pores of the young
shoot they are destined to feed. But it is so arranged that, when the
seed first shoots, there is produced at the base of the germ, from a
portion of the gluten, a small quantity of a substance (diastase)
which has so powerful an effect upon the starch as immediately to
render it soluble in the sap, which is thus enabled to take it up and
convey it by degrees, just as it is wanted, to the shoot or to the root.[5]
As the sap ascends, it becomes sweet,—the starch thus dissolved
changes into sugar. When the shoot first becomes tipped with green,
the sugar is again changed into the woody fibre, of which the stem of
perfect plants chiefly consists. By the time that the food contained in
the seed is exhausted,—often, as in the potato, long before,—the
plant is able to live by its own exertions, at the expense of the air and
the soil.
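
If we suppose, in the notation of modern chemistry, that the sugar
formed is grape sugar (glucose), the change wrought upon the starch
may be sketched as the taking up of water:

    \[ (C_6H_{10}O_5)_n + n\,H_2O \longrightarrow n\,C_6H_{12}O_6; \]

the insoluble starch, in combining with water, becomes a soluble sugar
which the sap can carry to the shoot or to the root.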
This change of the sugar of the sap into woody fibre is
observable more or less in all plants. When they are shooting fastest
the sugar is most abundant; not, however, in those parts which are
growing, but in those which convey the sap to the growing parts.
Thus the sugar of the ascending sap of the maple and the alder
disappears in the leaf and in the extremities of the twig; thus the
sugar-cane sweetens only a certain distance above the ground, up
to where the new growth is proceeding; and thus also the young beet
and turnip abound most in sugar, while in all these plants the sweet
principle diminishes as the year’s growth draws nearer to a close.
In the ripening of the ear also, the sweet taste, at first so
perceptible, gradually diminishes and finally disappears; the sugar of
the sap is here changed into the starch of the grain, which, as above
described, is afterwards destined, when the grain begins to sprout, to
be reconverted into sugar for the nourishment of the rising germ.
In the ripening of fruits a different series of changes presents
itself. The fruit is first tasteless, then becomes sour, and at last
sweet. In this case the acid of the unripe is changed into the sugar of
the ripened fruit.
The substance of plants,—their solid parts, that is,—consists chiefly
of woody fibre, the name given to the fibrous substance of which
wood evidently consists. It is interesting to inquire how this
substance can be formed from the compounds, carbonic acid and
water, of which the food of plants in great measure consists. Nor is it
difficult to find an answer.
It will be recollected that the leaf drinks in carbonic acid from the
air, and delivers back its oxygen, retaining only its carbon. It is also
known that water abounds in the sap. Hence carbon and water are
thus abundantly present in the pores or vessels of the green leaf.
Now, woody fibre consists only of carbon and water chemically
combined together,—100 lbs. of dry woody fibre consisting of 50 lbs.
of carbon and 50 lbs. of water. It is easy, therefore, to see how, when
the carbon and water meet in the leaf, woody fibre may be produced
by their mutual combination.
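
Stated as a simple balance of weights, on the figures of the text,

    \[ 50 \text{ lbs. carbon} + 50 \text{ lbs. water} = 100 \text{ lbs. woody fibre}; \]

so that exactly one-half of the weight of dry woody fibre is carbon, a
large portion of which, as we have seen, is gained from the air.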
If, again, we inquire how this important principle of plants may be
formed from the other substances, which enter by their roots, from
the ulmic acid, for example, the answer is equally ready. This acid
also consists of carbon and water only, 50 lbs. of carbon with 37½ of
water forming ulmic acid, so that when it is introduced into the sap of
the plant, all the materials are present from which the woody fibre
may be produced.
Nor is it more difficult to see how starch may be converted into
sugar, and this again into woody fibre; or how, again, sugar may be
converted into starch in the ear of corn, or woody fibre into sugar
during the ripening of the winter pear after its removal from the tree.
Any one of these substances may be represented by carbon and
water only.
Thus,—

    50 lbs. of carbon with 50 of water, make 100 of woody fibre.
    50 lbs. of carbon with 37½ of water, make 87½ of ulmic acid.
    50 lbs. of carbon with 72½ of water, make 122½ of cane sugar, of starch, or of gum.
    50 lbs. of carbon with 56 of water, make 106 of vinegar.

In the interior of the plant, therefore, it is obvious that, whichever
of these substances be present in the sap, the elements are at hand
out of which any of the others may be produced. In what way they
really are produced, the one from the other, and by what
circumstances these transformations are favoured, it would lead into
too great detail to attempt here to explain.[6]
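
On the figures of the little table above, each substance may also be
characterised by the fraction of its weight which is carbon:

    \[ \text{woody fibre, } \tfrac{50}{100} = 50 \text{ per cent}; \quad
       \text{ulmic acid, } \tfrac{50}{87.5} \approx 57; \quad
       \text{sugar, starch, or gum, } \tfrac{50}{122.5} \approx 41; \quad
       \text{vinegar, } \tfrac{50}{106} \approx 47; \]

so that these substances differ from one another only in the
proportion of water united with a given weight of carbon.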
We cannot help admiring to what varied purposes in nature the
same elements are applied, and from how few and simple materials,
substances, the most varied in their properties, are in the living
vegetable daily produced.
CHAPTER IV.
Of the Inorganic Constitution of Plants—Their immediate Source—Their Nature—Quantity of each in certain common Crops.

SECTION I.—SOURCE OF THE EARTHY MATTER OF PLANTS—SUBSTANCES OF WHICH IT CONSISTS.
When plants are burned, they always leave more or less of ash behind. This ash varies
in quantity in different plants, in different parts of the same plant, and sometimes in different
specimens of the same kind of plant, especially if grown upon different soils; yet it is never
wholly absent. It seems as necessary to their existence in a state of perfect health as any of
the elements which constitute the organic or combustible part of their substance. They must
obtain it therefore along with the food on which they live: it is in fact a part of their natural
food, since without it they become unhealthy. We shall speak of it therefore as the inorganic
food of plants.
We have seen that all the elements which are necessary to the production of the woody
fibre, and of the other organic parts of the plant, may be derived either from the air, from the
carbonic acid and watery vapour taken in by the leaves, or from the soil, through the
medium of the roots. In the air, however, only rare particles of inorganic or earthy matter are
known to float, and these in a solid form, so as to be unable to enter by the leaves; the
earthy matter which constitutes the ash, therefore, must be all derived from the soil.
The earthy part of the soil, therefore, serves a double use. It is not merely, as some
have supposed, a substratum in which the plant may so fix and root itself, as to be able to
maintain its upright position against the force of winds and tempests; but it is a storehouse
of food also, from which the roots of the plant may select such earthy substances as are
necessary to, or are fitted to promote, its growth.
The ash of plants consists of a mixture of several, sometimes of as many as eleven,
different earthy substances. These substances are the following:—
1. Potash.—The common pearl-ash of the shops is a compound of potash with carbonic
acid; it is a carbonate of potash. By dissolving the pearl-ash in water, and boiling it with
quicklime, the carbonic acid is separated, and potash alone, or caustic potash, as it is often
called, is obtained.
2. Soda.—The common soda of the shops is a carbonate of soda, and by boiling it with
quicklime, the carbonic acid is separated, as in the case of pearl-ash.
3. Lime.—This is familiar to every one as the lime-shells, or unslaked lime of the
limekilns. The unburned limestone is a carbonate of lime; the carbonic acid in this case
being separated by the roasting in the kiln.
4. Magnesia.—This is the calcined magnesia of the shops. The uncalcined is a
carbonate of magnesia, from which heat drives off the carbonic acid.
5. Silica.—This is the name given by chemists to the substance of flint, quartz, and of
siliceous sands and sandstones.
6. Alumina is the pure earth of alum, obtained by dissolving alum in water, and adding
liquid ammonia (hartshorn) to the solution. It forms about two-fifths of the weight of
porcelain and pipe-clays, and of some other very stiff kinds of clay.
7. Oxide of Iron.—The most familiar form of this substance is the rust that forms on
metallic iron in damp places. It is a compound of iron with oxygen, hence the name oxide.
8. Oxide of Manganese is a brown powder, which consists of oxygen in combination with
a metal resembling iron, to which the name of manganese is given. It exists in plants, and in
soils only in very small quantity.
9. Sulphur.—This substance is well known. It generally exists in the ash in the state of
sulphuric acid (oil of vitriol), which is a compound of sulphur with oxygen. It does not always
exist in living plants, however, in this state.
Sulphuric acid forms with potash a sulphate of potash,—with soda, sulphate of soda (or
Glauber’s salts),—with lime, sulphate of lime (gypsum),—with magnesia, sulphate of
magnesia (Epsom salts),—with alumina, sulphate of alumina,—and with oxide of iron,
sulphate of iron or green vitriol. When the sulphate of potash is combined with sulphate of
alumina, it forms common alum.
10. Phosphorus is a soft pale yellow substance which readily takes fire in the air, and
gives off, while burning, a dense white smoke. The white fumes which form this smoke are
a compound of phosphorus with oxygen obtained from the air, and are called phosphoric
acid. In the ash of plants the phosphorus is found in the state of phosphoric acid, though it
probably does not all exist in the living plant in that state.
Phosphoric acid forms phosphates with potash, soda, lime, and magnesia. When bones
are burned, a large quantity of a white earth remains (bone-earth), which is a phosphate of
lime, consisting of lime and phosphoric acid. Phosphate of lime is generally present in the
ash of plants; phosphate of magnesia is contained most abundantly in the ash of wheat and
other varieties of grain.
11. Chlorine.—This is a very suffocating gas, which gives its peculiar smell to chloride of
lime, and is used for bleaching and disinfecting. It is readily obtained by pouring muriatic
acid (spirit of salt) on the black oxide of manganese of the shops. In combination with the
metallic bases of potash, soda, lime, and magnesia, it forms the chlorides of potassium,
sodium (common salt), calcium and magnesium,[7] and in one or other of these states it
generally enters into the roots of plants, and exists in their ash.
Such are the inorganic substances usually found mixed or combined together in the ash
of plants. It has already been observed, that the quantity of ash left by a given weight of
vegetable matter varies with a great many conditions. This fact deserves a more attentive
consideration.

SECTION II.—OF THE DIFFERENCE IN THE QUANTITY OF ASH.
1. The quantity of ash yielded by different plants is unlike. Thus 1000 lbs. of

    Wheat leave ......... 12 lbs.
    Oats ................ 26 lbs.
    Turnips .............. 8 lbs.
    Red Clover .......... 16 lbs.
    Rye Grass ........... 17 lbs.
    Barley .............. 25 lbs.
    Potatoes ............. 8 lbs.
    Carrots .............. 7 lbs.
    White Clover ........ 17 lbs.
So that the quantity of inorganic food required by different vegetables is greater or less
according to their nature; and if a soil be of such a kind that it can yield only a small quantity
of this inorganic food, then only those plants will grow well upon it which require the least.
Hence, trees may often grow where arable crops fail to thrive, because many of them
require and contain very little inorganic matter. Thus while 1000 lbs. of elm wood leave 19
lbs. and of poplar 20 lbs. of ash, the same weight of the willow leaves only 4½ lbs., of the
beech 4 lbs., of the birch 3½ lbs., of different pines less than 3 lbs., and of the oak only 2
lbs. of ash when burned.
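
Since all these quantities are reckoned upon 1000 lbs., they pass at
once into percentages:

    \[ \text{oats, } \tfrac{26}{1000} = 2.6 \text{ per cent}; \quad
       \text{wheat, } \tfrac{12}{1000} = 1.2; \quad
       \text{oak wood, } \tfrac{2}{1000} = 0.2; \]

even the most ash-loving of these crops, therefore, is for more than
ninety-seven parts in the hundred made up of combustible organic matter.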
2. The quantity of inorganic matter varies in different parts of the same plant. Thus while
1000 lbs. of the turnip root sliced and dried in the air leave 70 lbs. of ash, the dried leaves
give 130 lbs.; and while the grain of wheat yields only 12 lbs., wheat straw will yield 60 lbs.
of earthy matter. So, though the willow and other woods leave little ash, as above stated,
yet the willow leaf leaves 82 lbs., the beech leaf 42 lbs., the birch 50 lbs., the different pine
leaves 20 lbs. to 30 lbs., and the leaves of the elm as much as 120 lbs. of incombustible
matter when burned in the air.
Most of the inorganic matter, therefore, which is withdrawn from the soil in a crop of corn
is returned to it again, by the skilful husbandman, in the fermented straw,—in the same way
as nature, in causing the trees periodically to shed their leaves, returns with them to the soil
a very large portion of the soluble inorganic substances which had been drawn from it by
the roots during the season of growth.
Thus an annual top-dressing is given to the land where forests grow; and that which the
roots from spring to autumn are continually sucking up, and carefully collecting from
considerable depths, winter strews again on the surface, so as, in the lapse of time, to form
a soil which cannot fail to prove fertile,—because it is made up of those very materials of
which the inorganic substance of former races of vegetables has been entirely composed.
3. The quantity of inorganic matter often differs in different specimens of the same plant.
Thus, 1000 lbs. of wheat straw, grown at different places, gave to four different
experimenters 43, 44, 35, and 155 lbs. of ash respectively. Wheat straw, therefore, does not
always leave the same quantity of ash.
To what is this difference owing? Is it to the nature of the soil, or does it depend upon
the variety of wheat experimented upon? It seems to depend partly upon both. Thus, on the
same field, in Ravensworth dale, Yorkshire, on a rich clay soil abounding in lime, the
Golden Kent and Flanders Red wheats were sown in the spring of 1841. The former gave
an excellent crop, while the latter was a total failure, the ear containing 20 or 30 grains only
of poor wheat. The straw of the former left 165 lbs. of ash from 1000 lbs., that of the latter
only 120 lbs. Something, therefore, depends upon the variety. But as from the straw of a
good wheat crop grown near Durham this last summer on a clay loam I obtained only 66
lbs. of ash, I am persuaded that the very wide variations in the quantity of ash left, by
different wheat straws, must be dependent in some considerable degree upon the soil.
The truth, so far as it can as yet be made out, seems to be this—that every plant must
have a certain quantity of inorganic matter to make it grow in the most healthy manner;—
that it is capable of living, growing, and even ripening seed with very much less than this
quantity;—but that those soils will produce the most perfect plants which can best supply all
their wants,—and that the best seed will be raised in those districts where the soil, without
being too rich or rank, yet can yield both organic and inorganic food in such proportions as
to maintain the corn plants in their most healthy condition.

SECTION III.—OF THE QUALITY OF THE ASH OF PLANTS.
But much also depends upon the quality as well as upon the quantity of the ash. Plants
may leave the same weight of ash when burned, and yet the nature of the two specimens of
ash, the kind of matter of which they respectively consist, may be very different. The ash of
one may contain much lime, of another much potash, of a third much soda, while in a fourth
much silica may be present. Thus 100 lbs. of the ash of bean straw contain 53½ lbs. of
potash, while that of barley straw contains only 3½ lbs. in the hundred; and, on the other
hand, the same weight of the ash of the latter contains 73½ lbs. of silica, while in that of the
former there are only 7½ lbs.
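
The contrast stands out when these numbers are put as ratios:

    \[ \frac{53\tfrac{1}{2}}{3\tfrac{1}{2}} \approx 15, \qquad
       \frac{73\tfrac{1}{2}}{7\tfrac{1}{2}} \approx 10; \]

weight for weight of ash, bean straw thus carries some fifteen times as
much potash as barley straw, and barley straw nearly ten times as much
silica as bean straw.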
The quality of the ash seems to vary with the same conditions by which its quantity is
affected. Thus—
1. It varies with the kind of plant. 100 lbs. of the ash of wheat, barley, and oats, for
example, contain, respectively,
                      Wheat.   Barley.   Oats.
    Potash,             19       12        6
    Soda,               20½      12        5
    Lime,                8        4½       3
    Magnesia,            8        8        2½
    Alumina,             2        1        ½
    Oxide of Iron,       0      trace      1½
    Silica,             34       50       76½
    Sulphuric acid,      4        2½       1½
    Phosphoric acid,     3½       9        3
    Chlorine,            1        1        ½
                       ----     ----     ----
                       100      100      100
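
That each analysis is complete may be proved by adding its column; for
the wheat ash, for example,

    \[ 19 + 20\tfrac{1}{2} + 8 + 8 + 2 + 0 + 34 + 4 + 3\tfrac{1}{2} + 1 = 100, \]

and the barley and oat columns close at 100 in the same manner.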
