FROM STATISTICAL PHYSICS TO DATA-DRIVEN
MODELLING
From Statistical Physics to Data-Driven
Modelling
with Applications to Quantitative Biology

Simona Cocco
Rémi Monasson
Francesco Zamponi
Great Clarendon Street, Oxford, OX2 6DP,
United Kingdom
Oxford University Press is a department of the University of Oxford.
It furthers the University’s objective of excellence in research, scholarship,
and education by publishing worldwide. Oxford is a registered trade mark of
Oxford University Press in the UK and in certain other countries
© Simona Cocco, Rémi Monasson, and Francesco Zamponi 2022
The moral rights of the authors have been asserted
Impression: 1
All rights reserved. No part of this publication may be reproduced, stored in
a retrieval system, or transmitted, in any form or by any means, without the
prior permission in writing of Oxford University Press, or as expressly permitted
by law, by licence or under terms agreed with the appropriate reprographics
rights organization. Enquiries concerning reproduction outside the scope of the
above should be sent to the Rights Department, Oxford University Press, at the
address above
You must not circulate this work in any other form
and you must impose this same condition on any acquirer
Published in the United States of America by Oxford University Press
198 Madison Avenue, New York, NY 10016, United States of America
British Library Cataloguing in Publication Data
Data available
Library of Congress Control Number: 2022937922
ISBN 978–0–19–886474–5
DOI: 10.1093/oso/9780198864745.001.0001
Printed and bound by
CPI Group (UK) Ltd, Croydon, CR0 4YY
Links to third party websites are provided by Oxford in good faith and
for information only. Oxford disclaims any responsibility for the materials
contained in any third party website referenced in this work.
Contents

1 Introduction to Bayesian inference 1
1.1 Why Bayesian inference? 1
1.2 Notations and definitions 2
1.3 The German tank problem 4
1.4 Laplace's birth rate problem 7
1.5 Tutorial 1: diffusion coefficient from single-particle tracking 11
2 Asymptotic inference and information 17
2.1 Asymptotic inference 17
2.2 Notions of information 23
2.3 Inference and information: the maximum entropy principle 29
2.4 Tutorial 2: entropy and information in neural spike trains 32
3 High-dimensional inference: searching for principal components 39
3.1 Dimensional reduction and principal component analysis 39
3.2 The retarded learning phase transition 43
3.3 Tutorial 3: replay of neural activity during sleep following task learning 52
4 Priors, regularisation, sparsity 59
4.1 Lp-norm based priors 59
4.2 Conjugate priors 64
4.3 Invariant priors 67
4.4 Tutorial 4: sparse estimation techniques for RNA alternative splicing 71
5 Graphical models: from network reconstruction to Boltzmann machines 81
5.1 Network reconstruction for multivariate Gaussian variables 81
5.2 Boltzmann machines 86
5.3 Pseudo-likelihood methods 92
5.4 Tutorial 5: inference of protein structure from sequence data 97
6 Unsupervised learning: from representations to generative models 107
6.1 Autoencoders 107
6.2 Restricted Boltzmann machines and representations 112
6.3 Generative models 120
6.4 Learning from streaming data: principal component analysis revisited 125
6.5 Tutorial 6: online sparse principal component analysis of neural assemblies 132
7 Supervised learning: classification with neural networks 137
7.1 The perceptron, a linear classifier 137
7.2 Case of few data: overfitting 143
7.3 Case of many data: generalisation 146
7.4 A glimpse at multi-layered networks 152
7.5 Tutorial 7: prediction of binding between PDZ proteins and peptides 156
8 Time series: from Markov models to hidden Markov models 161
8.1 Markov processes and inference 161
8.2 Hidden Markov models 164
8.3 Tutorial 8: CG content variations in viral genomes 171
References 175
Index 181
Preface

Today's science is characterised by an ever-increasing amount of data, due to instrumental and experimental progress in monitoring and manipulating complex systems made of many microscopic constituents. While this tendency is true in all fields of
science, it is perhaps best illustrated in biology. The activity of neural populations,
composed of hundreds to thousands of neurons, can now be recorded in real time
and specifically perturbed, offering a unique access to the underlying circuitry and
its relationship with functional behaviour and properties. Massive sequencing has per-
mitted us to build databases of coding DNA or protein sequences from a huge variety
of organisms, and exploiting these data to extract information about the structure,
function, and evolutionary history of proteins is a major challenge. Other examples
abound in immunology, ecology, development, etc.
How can we make sense of such data, and use them to enhance our understanding
of biological, physical, chemical, and other systems? Mathematicians, statisticians,
theoretical physicists, computer scientists, computational biologists, and others have
developed sophisticated approaches over recent decades to address this question. The
primary objective of this textbook is to introduce these ideas at the crossroads between
probability theory, statistics, optimisation, statistical physics, inference, and machine
learning. The mathematical details necessary to deeply understand the methods, as
well as their conceptual implications, are provided. The second objective of this book is
to provide practical applications for these methods, which will allow students to really
assimilate the underlying ideas and techniques. The principle is that students are given
a data set, asked to write their own code based on the material seen during the theory
lectures, and to analyse the data. This should correspond to a two- to three-hour tutorial.
Most of the applications we propose here are related to biology, as they were part of
a course for Master of Science students specialising in biophysics at the Ecole Normale
Supérieure. The book’s companion website1 contains all the data sets necessary for
the tutorials presented in the book. It should be clear to the reader that the tutorials
proposed here are arbitrary and merely reflect the research interests of the authors.
Many more illustrations are possible! Indeed, our website presents further applications
to “pure” physical problems, e.g. coming from atomic physics or cosmology, based on
the same theoretical methods.
Little prior knowledge of statistical inference is needed to benefit from this book. We
expect the material presented here to be accessible to MSc students not only in physics,
but also in applied maths and computational biology. Readers will need basic knowl-
edge in programming (Python or some equivalent language) for the applications, and
in mathematics (functional and linear analysis, algebra, probability). One of our major
goals is that students will be able to understand the mathematics behind the methods, and not act as mere consumers of statistical packages. We pursue this objective without emphasis on mathematical rigour, but with a constant effort to develop intuition and show the deep connections with standard statistical physics. While the content of the book can be thought of as a minimal background for scientists in the contemporary data era, it is by no means exhaustive. Our objective will be truly accomplished if readers then actively seek to deepen their experience and knowledge by reading advanced machine learning or statistical inference textbooks.

1 https://github.com/StatPhys2DataDrivenModel/DDM_Book_Tutorials
As mentioned above, a large part of what follows is based on the course we gave
at ENS from 2017 to 2021. We are grateful to A. Di Gioacchino, F. Aguirre-Lopez,
and all the course students for carefully reading the manuscript and pointing out typos and errors. We are also deeply indebted to Jean-François Allemand and Maxime
Dahan, who first thought that such a course, covering subjects not always part of the
standard curriculum in physics, would be useful, and who strongly supported us. We
dedicate the present book to the memory of Maxime, who tragically passed away four
years ago.

Paris, January 2022.

Simona Cocco (1), Rémi Monasson (1,2) and Francesco Zamponi (1)

(1) Ecole Normale Supérieure, Université PSL & CNRS
(2) Department of Physics, Ecole Polytechnique
1 Introduction to Bayesian inference

This first chapter presents basic notions of Bayesian inference, starting with the def-
initions of elementary objects in probability, and Bayes’ rule. We then discuss two
historically motivated examples of Bayesian inference, in which a single parameter has
to be inferred from data.

1.1 Why Bayesian inference?


Most systems in nature are made of small components, interacting in a complex way.
Think of sand grains in a dune, of molecules in a chemical reactor, or of neurons in a
brain area. Techniques to observe and characterise quantitatively these systems, or at
least part of them, are routinely developed by scientists and engineers, and allow one
to ask fundamental questions, see figure 1.1:
• What can we say about the future evolution of these systems? About how they will
respond to some perturbation, e.g. to a change in the environmental conditions?
Or about the behaviour of the subparts not accessible to measurements?
• What are the underlying mechanisms explaining the collective properties of these
systems? How do the small components interact together? What is the role played
by stochasticity in the observed behaviours?
The goal of Bayesian inference is to answer those questions based on observations,
which we will refer to as data in the following. In the Bayesian framework, both the

Fig. 1.1 A. A large complex system includes many components (black dots) that interact
together (arrows). B. An observer generally has access to a limited part of the system and
can measure the behaviour of the components therein, e.g. their characteristic activities over
time.

Fig. 1.2 Probabilistic description of models and data in the Bayesian framework. Each point
in the space of model parameters (triangle in the left panel) defines a distribution of possible
observations over the data space (shown by the ellipses). In turn, a specific data set (diamond
in right panel) corresponding to experimental measurements is compatible with a portion of
the model space: it defines a distribution over the models.

data and the system under investigation are considered to be stochastic. To be more
precise, we assume that there is a joint distribution of the data configurations and
of the defining parameters of the system (such as the sets of microscopic interactions
between the components or of external actions due to the environment), see figure 1.2.
The observations collected through an experiment can be thought of as a realisation
of the distribution of the configurations conditioned by the (unknown) parameters of
the system. The latter can thus be inferred through the study of the probability of the
parameters conditioned by the data. Bayesian inference offers a framework connecting
these two probability distributions, and allowing us in fine to characterise the system
from the data. We now introduce the definitions and notations needed to properly set
this framework.

1.2 Notations and definitions


1.2.1 Probabilities
We will denote by y a random variable taking values in a finite set of q elements:

y ∈ {a1 , a2 , · · · , aq } = A , (1.1)

and by
pi = Prob(y = ai ) = p(y = ai ) = p(y) (1.2)
the probability that y takes a given value y = ai , which will be equivalently written
in one of the above forms depending on the context.
For a two dimensional variable

y = (y1 , y2 ) ∈ A × B (1.3)

the corresponding probability will be denoted by



pij = Prob[y = (ai , bj )] = p(y1 = ai , y2 = bj ) = p(y1 , y2 ) = p(y) . (1.4)

In this case, we can also define the “marginal probabilities”


p(y_1) = \sum_{y_2 \in B} p(y_1, y_2) , \qquad p(y_2) = \sum_{y_1 \in A} p(y_1, y_2) . (1.5)

By definition, y1 and y2 are independent variables if their joint probability factorises into the product of their marginals:

p(y_1, y_2) = p(y_1) \times p(y_2) . (1.6)

1.2.2 Conditional probabilities and Bayes’ rule


The “conditional probability” of y1 = ai , conditioned to y2 = bj , is

p(y_1|y_2) = \frac{p(y_1, y_2)}{p(y_2)} . (1.7)

Note that this definition makes sense only if p(y2 ) > 0, but of course if p(y2 ) = 0, then
also p(y1 , y2 ) = 0. The event y2 never happens, so the conditional probability does not
make sense. Furthermore, according to Eq. (1.7), p(y1 |y2 ) is correctly normalised:
\sum_{y_1 \in A} p(y_1|y_2) = \frac{\sum_{y_1 \in A} p(y_1, y_2)}{p(y_2)} = 1 . (1.8)

Eq. (1.7) allows us to derive Bayes’ rule:

p(y_1|y_2) = \frac{p(y_1, y_2)}{p(y_2)} = \frac{p(y_2|y_1)\, p(y_1)}{p(y_2)} . (1.9)

This simple identity has a deep meaning for inference, which we will now discuss.
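These identities are easy to check numerically. The following sketch (a Python/NumPy illustration of ours; the 3 × 4 alphabet sizes and the random joint table are arbitrary choices) builds a joint distribution and verifies Eqs. (1.5), (1.7), (1.8) and Bayes' rule, Eq. (1.9):

```python
import numpy as np

rng = np.random.default_rng(0)

# Random joint distribution p(y1, y2) over a 3 x 4 alphabet, Eq. (1.4)
p_joint = rng.random((3, 4))
p_joint /= p_joint.sum()

# Marginals, Eq. (1.5)
p1 = p_joint.sum(axis=1)   # p(y1)
p2 = p_joint.sum(axis=0)   # p(y2)

# Conditional probabilities, Eq. (1.7)
p1_given_2 = p_joint / p2[None, :]   # p(y1|y2)
p2_given_1 = p_joint / p1[:, None]   # p(y2|y1)

# Each column of p(y1|y2) is normalised, Eq. (1.8)
assert np.allclose(p1_given_2.sum(axis=0), 1.0)

# Bayes' rule, Eq. (1.9): p(y1|y2) = p(y2|y1) p(y1) / p(y2)
bayes = p2_given_1 * p1[:, None] / p2[None, :]
assert np.allclose(p1_given_2, bayes)
```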
Suppose that we have an ensemble of M “data points” yi ∈ RL , which we denote
by Y = {yi }i=1,··· ,M , generated from a model with D (unknown to the observer)
“parameters”, which we denote by θ ∈ RD . We can rewrite Bayes’ rule as

p(\theta|Y) = \frac{p(Y|\theta)\, p(\theta)}{p(Y)} . (1.10)

The objective of Bayesian inference is to obtain information on the parameters θ.


We will call p(θ|Y ) the “posterior distribution”, which is the object we are interested
in: it is the probability of θ conditioned to the data Y we observe. Bayes’ rule expresses
the posterior in terms of a “prior distribution” p(θ), which represents our information
on θ prior to any measurement, and of the “likelihood” p(Y |θ) of the model, i.e. the
probability that the data Y are generated by the model having defining parameters
θ. It is important to stress that the likelihood expresses, in practice, a “model of the
experiment”; such a model can be known exactly in some cases, but in most situations
it is unknown and has to be guessed, as part of the inference procedure. Last of all, p(Y) is called the "evidence", and is expressed in terms of the prior and of the likelihood through

p(Y) = \int d\theta \, p(Y|\theta)\, p(\theta) . (1.11)

Its primary role is to guarantee the normalisation of the posterior p(θ|Y ).


We will now illustrate how Bayesian inference can be applied in practice with two
examples.

1.3 The German tank problem


The German tank problem was a very practical issue that arose during World War II.
The Allies wanted to estimate the number of tanks available to the Germans, in order
to determine the number of units needed for the Normandy landings. Some information
was available to the Allies. In fact, during previous battles, some German tanks were
destroyed or captured and the Allies then knew their serial numbers, i.e. a progressive
number assigned to the tanks as they were produced in the factories. The problem can
be formalised as follows [1].

1.3.1 Bayes’ rule


The available data are a sequence of integer numbers

1 ≤ y1 < y2 < y3 < · · · < yM ≤ N , (1.12)

where N is the (unknown) total number of tanks available to the Germans, and
Y = {y1 , · · · , yM } ∈ NM are the (known) factory numbers of the M destroyed tanks.
Note that, obviously, M ≤ yM ≤ N . Our goal is to infer N given Y .
In order to use Bayes’ rule, we first have to make an assumption about the prior
knowledge p(N ). For simplicity, we will assume p(N ) to be uniform in the interval
[1, Nmax ], p(N ) = 1/Nmax . A value of Nmax is easily estimated in practical applications,
but for convenience we can also take the limit Nmax → ∞ assuming p(N ) to be
a constant for all N . Note that in this limit p(N ) is not normalisable (it is called
“an improper prior”), but as we will see this may not be a serious problem: if M is
sufficiently large, the normalisation of the posterior probability is guaranteed by the
likelihood, and the limit Nmax → ∞ is well defined.
The second step consists in proposing a model of the observations and in computing
the associated likelihood. We will make the simplest assumption that the destroyed
tanks are randomly and uniformly sampled from the total number of available tanks.
Note that this assumption could be incorrect in practical applications, as for example
the Germans could have decided to send to the front the oldest tanks first, which
would bias the yi towards smaller values. The choice of the likelihood expresses our
modelling of the data generation process. If the true model is unknown, we have to
make a guess, which has an impact on our inference result.
There are \binom{N}{M} ways to choose M ordered numbers y_1 < y_2 < \cdots < y_M in [1, N].
Assuming that all these choices have equal probability, the likelihood of a set Y given
N is
p(Y|N) = \binom{N}{M}^{-1} 1(1 \le y_1 < y_2 < \cdots < y_M \le N) , (1.13)

because all the ordered M-tuples are equally probable. Here, 1(c) denotes the "indicator
function” of a condition c, which is one if c is satisfied, and zero otherwise.
Given the prior and the likelihood, using Bayes’ rule, we have

p(N|Y) = \frac{p(Y|N)\, p(N)}{\sum_{N'} p(Y|N')\, p(N')} , (1.14)

which for an improper uniform prior reduces to

p(N|Y) = \frac{p(Y|N)}{\sum_{N'} p(Y|N')} = \frac{\binom{N}{M}^{-1} 1(N \ge y_M)}{\sum_{N' \ge y_M} \binom{N'}{M}^{-1}} = p(N|y_M; M) . (1.15)

Note that, because Y is now given and N is the variable, the condition N ≥ yM >
yM −1 > · · · > y1 ≥ 1 is equivalent to N ≥ yM , and as a result the posterior probability
of N depends on the number of data, M (here considered as a fixed constant), and on
the largest observed value, yM , only. It can be shown [1] that the denominator of the
posterior is
\sum_{N' \ge y_M} \binom{N'}{M}^{-1} = \frac{y_M}{M-1} \, \binom{y_M}{M}^{-1} , (1.16)

leading to the final expression for the posterior,

p(N|y_M; M) = \frac{\binom{N}{M}^{-1} \, 1(N \ge y_M)}{\frac{y_M}{M-1} \binom{y_M}{M}^{-1}} . (1.17)

Note that for large N , we have p(Y |N ) ∝ N^{−M} . Therefore, if M = 1, the posterior in


Eq. (1.15) is not normalisable if the prior is uniform, and correspondingly Eq. (1.17) is
not well defined, see the M −1 term in the denominator. Hence, if only one observation
is available, some proper prior distribution p(N ) is needed for the Bayesian posterior
to make sense. Conversely, if M ≥ 2, the posterior is normalisable thanks to the fast-
decaying likelihood, and Eq. (1.17) is well defined, so the use of an improper prior is
acceptable when at least two observations are available.
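Both the closed form of Eq. (1.16) and the normalisation of the posterior of Eq. (1.17) can be verified numerically. A minimal Python sketch (the cutoff at N = 20000 is an arbitrary numerical choice; since the summand decays as N^{-M}, the truncation error is negligible):

```python
from math import comb

M, yM = 10, 100   # example values, also used in section 1.3.2

# Closed form of Eq. (1.16): sum over N' >= yM of 1/binom(N', M)
closed = yM / (M - 1) / comb(yM, M)

# Direct summation, truncated at an arbitrary large cutoff
direct = sum(1 / comb(N, M) for N in range(yM, 20000))
assert abs(direct - closed) / closed < 1e-10

# Posterior of Eq. (1.17); it sums to one by construction
def posterior(N):
    return (1 / comb(N, M)) / closed if N >= yM else 0.0

assert abs(sum(posterior(N) for N in range(yM, 20000)) - 1.0) < 1e-9
```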

1.3.2 Analysis of the posterior


Having computed the posterior probability, we can deduce information on N . First of
all, we note that p(N |M, yM ) obviously vanishes for N < yM . For large N , we have
p(N |M, yM ) ∝ N^{−M} , and recall that we need M ≥ 2 for the posterior to be well
defined. We can then compute:
1. The typical value of N , i.e. the value N ∗ that maximises the posterior. As the
posterior distribution is a monotonically decreasing function of N ≥ yM we have
N ∗ = yM .

[Figure 1.3: plot of the posterior distribution p(N |M, yM ) versus N , with annotations marking yM , ⟨N⟩, and the ∼ N^{−M} tail.]

Fig. 1.3 Illustration of the posterior probability for the German tank problem, for M = 10 and yM = 100. The dashed vertical lines locate the typical (= yM ) and average values of N .

2. The average value of N ,

\langle N \rangle = \sum_N N \, p(N|y_M; M) = \frac{M-1}{M-2} (y_M - 1) . (1.18)

Note that, for large M , we get

\langle N \rangle \simeq y_M + \frac{y_M - M}{M} + \cdots , (1.19)

i.e. \langle N \rangle is only slightly larger than y_M . This formula has a simple interpretation. For large M , we can assume that the y_i are equally spaced in the interval [1, y_M ]. The average spacing is then \Delta y = (y_M - M)/M , so the estimate \langle N \rangle = y_M + \Delta y sits one average spacing above the largest observation, where the next serial number would be expected.
3. The variance of N ,

\sigma_N^2 = \langle N^2 \rangle - \langle N \rangle^2 = \frac{M-1}{(M-2)^2 (M-3)} (y_M - 1)(y_M - M + 1) . (1.20)

The variance gives us an estimate of the quality of the prediction for N . Note that for large M , assuming y_M \propto M , we get \sigma_N / \langle N \rangle \propto 1/M , hence the relative standard deviation decreases with the number of observations. Notice that Eq. (1.18) for the mean and Eq. (1.20) for the variance make sense only if, respectively, M > 2 and M > 3. Again, we see a consequence of the use of an improper prior: the moment of order K is defined only if the number M of observations is larger than K + 1.
For example, one can take M = 10 observations, and y10 = 100. A plot of the posterior is given in figure 1.3. Then, ⟨N⟩ = 111.4, and the standard deviation is σN = 13.5. One can also ask what the probability is that N is larger than some threshold, which is arguably the most important piece of information if you are planning the Normandy landings. With the above choices, one finds

p(N > 150 \,|\, y_{10} = 100) = 2.2 \cdot 10^{-2} ,
p(N > 200 \,|\, y_{10} = 100) = 1.5 \cdot 10^{-3} , (1.21)
p(N > 250 \,|\, y_{10} = 100) = 2 \cdot 10^{-4} .
More generally,
p(N > N_{\text{lower bound}} \,|\, y_M) \propto \left( \frac{N_{\text{lower bound}}}{\langle N \rangle} \right)^{-(M-1)} . (1.22)
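These numbers can be reproduced in a few lines of Python. The sketch below evaluates the closed forms of Eqs. (1.18) and (1.20) and obtains the tail probabilities by summing the posterior of Eq. (1.17) from the threshold upwards (the cutoff at N = 20000 is an arbitrary numerical choice):

```python
from math import comb, sqrt

M, yM = 10, 100

# Closed forms for the posterior mean and variance, Eqs. (1.18) and (1.20)
mean = (M - 1) / (M - 2) * (yM - 1)
var = (M - 1) / ((M - 2) ** 2 * (M - 3)) * (yM - 1) * (yM - M + 1)
print(round(mean, 1), round(sqrt(var), 1))   # 111.4 13.5

# Posterior of Eq. (1.17); summing it from the threshold upwards
# reproduces the tail probabilities quoted in Eq. (1.21)
def posterior(N):
    return (M - 1) / yM * comb(yM, M) / comb(N, M)

def tail(n):
    return sum(posterior(N) for N in range(n, 20000))

print(tail(150), tail(200), tail(250))   # about 2.2e-2, 1.5e-3, 2e-4
```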

1.4 Laplace’s birth rate problem


We now describe a much older problem, which was studied by Laplace at the end of the
eighteenth century. It is today well known that the number of boys and girls is slightly
different at birth, although the biological origin of this fact is not fully understood. In
his time, Laplace had access to the historical record of the number of boys and girls
born in Paris from 1745 to 1770: out of a total number M = 493472 of observations,
the number of girls was y = 241945, while the number of boys was M − y = 251527.
This observation suggests a slightly lower probability of giving birth to a girl, but could
this be explained by random fluctuations due to the limited number of observations?
1.4.1 Posterior distribution for the birth rate
Laplace wanted to determine the probability that a newborn baby is a girl, which is
a single parameter, θ ∈ [0, 1]. To do so, we first need to introduce a prior probability1
over θ; let us assume p(θ) = 1 to be constant in the interval θ ∈ [0, 1] (and zero
otherwise), i.e. no prior information. To obtain the likelihood, we can assume that
each birth is an independent event, in which case the distribution of y conditioned to
θ is a binomial,

p(y|\theta; M) = \binom{M}{y} \, \theta^y (1-\theta)^{M-y} , (1.23)
where M is a fixed, known integer. Note that this is a very simple model, because in
principle there could be correlations between births (e.g. for brothers, twins, etc.). We
will discuss later how to assess the quality of a given model. For now we use Bayes’
rule to obtain the posterior probability density over the birth rate,
p(\theta|y; M) = \frac{p(y|\theta; M)\, p(\theta)}{p(y; M)} = \frac{\theta^y (1-\theta)^{M-y}}{\int_0^1 d\theta' \, (\theta')^y (1-\theta')^{M-y}} . (1.24)

Eq. (1.24) is a particular case of the "beta distribution",

Beta(\theta; \alpha, \beta) = \frac{\theta^{\alpha-1} (1-\theta)^{\beta-1}}{B(\alpha, \beta)} , \qquad B(\alpha, \beta) = \int_0^1 d\theta \, \theta^{\alpha-1} (1-\theta)^{\beta-1} = \frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha+\beta)} , (1.25)
1 Note that this is now a probability density, because θ is a continuous variable.

with \alpha = y + 1 and \beta = M - y + 1, and \Gamma(x) = \int_0^\infty d\theta \, \theta^{x-1} e^{-\theta} stands for the Gamma function. The following properties of the beta distribution are known:
• The typical value of θ, i.e. the maximiser of the Beta distribution, is \theta^* = \frac{\alpha - 1}{\alpha + \beta - 2} .
• The average value is \langle \theta \rangle = \frac{\alpha}{\alpha + \beta} .
• The variance is \sigma_\theta^2 = \frac{\alpha \beta}{(\alpha + \beta)^2 (\alpha + \beta + 1)} .
A first simple question is: what would be the distribution of θ if there were no girls
observed? In that case, we would have y = 0 and

p(\theta|0; M) = (M + 1)(1 - \theta)^M . (1.26)

The most likely value would then be θ∗ = 0, but the average value would be ⟨θ⟩ = 1/(M + 2) ≈ 1/M for large M . This result is interesting: observing no girls among M births should not necessarily be interpreted as meaning that their birth rate θ is really equal to zero, but rather that θ is likely to be smaller than 1/M , as the expected number of events one must observe to see one girl is 1/θ. From this point of view, it is more reasonable to estimate that θ ∼ 1/M than that θ = 0.
In the case of Laplace’s data, from the observed numbers, we obtain

θ∗ = 0.490291 , ⟨θ⟩ = 0.490291 , σθ = 0.000712 . (1.27)

The possibility that θ = 0.5 then seems excluded, because θ∗ ≃ ⟨θ⟩ differs from 0.5 by much more than the standard deviation,

|\theta^* - 0.5| \gg \sigma_\theta , \qquad |\langle \theta \rangle - 0.5| \gg \sigma_\theta , (1.28)

but we would like to quantify more precisely the probability that the "true" value of θ is nevertheless equal to, or larger than, 0.5.
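These values follow directly from the Beta-posterior formulas listed above; a short sketch in plain Python reproduces them:

```python
import math

# Laplace's counts: y girls out of M births
y, M = 241945, 493472
alpha, beta = y + 1, M - y + 1    # posterior Beta(theta; alpha, beta)

theta_star = (alpha - 1) / (alpha + beta - 2)   # typical value, = y/M
theta_mean = alpha / (alpha + beta)             # average value
sigma = math.sqrt(alpha * beta /
                  ((alpha + beta) ** 2 * (alpha + beta + 1)))

# Reproduces Eq. (1.27): 0.490291, 0.490291, 0.000712
print(round(theta_star, 6), round(theta_mean, 6), round(sigma, 6))
```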

1.4.2 Extreme events and Laplace’s method


The analysis above suggests that the birth rate of girls is smaller than 0.5. To be
more quantitative let us estimate the tiny probability that the observations (number
of births) are yet compatible with θ > 0.5. The posterior probability that θ > 0.5 is
given by

p(\theta > 0.5|y; M) = \int_{0.5}^{1} d\theta \, p(\theta|y; M) . (1.29)
Unfortunately this integral cannot be computed analytically, but this is precisely why
Laplace invented his method for the asymptotic estimation of integrals! Expressing
the posterior in terms of θ∗ = y/M instead of y, he observed that
p(\theta|\theta^*; M) \propto \theta^{M\theta^*} (1-\theta)^{M(1-\theta^*)} = e^{-M f_{\theta^*}(\theta)} , (1.30)

where
fθ∗ (θ) = −θ∗ log θ − (1 − θ∗ ) log(1 − θ) . (1.31)

A plot of fθ∗ (θ) for the value of θ∗ that corresponds to Laplace's observations is given in figure 1.4. fθ∗ (θ) has a minimum when its argument θ reaches the typical value θ∗ .

[Figure 1.4: plot of fθ∗ (θ) versus θ, with a minimum at θ = θ∗ .]

Fig. 1.4 Illustration of the posterior minus log-probability fθ∗ (θ) = − log p(θ|θ∗ ; M )/M for Laplace's birth rate problem.

Because of the factor M in the exponent, for large M , a minimum of fθ∗ (θ) induces a
very sharp maximum in p(θ|θ∗ ; M ).
We can use this property to compute the normalisation factor at the denominator
in Eq. (1.24). Expanding fθ∗ (θ) in the vicinity of θ = θ∗ , we obtain

f_{\theta^*}(\theta) = f_{\theta^*}(\theta^*) + \frac{1}{2} (\theta - \theta^*)^2 f''_{\theta^*}(\theta^*) + O((\theta - \theta^*)^3) , (1.32)

with

f''_{\theta^*}(\theta^*) = \frac{1}{\theta^*(1-\theta^*)} . (1.33)
In other words, next to its peak value the posterior distribution is roughly Gaussian,
and this statement is true for all θ away from θ∗ by deviations of the order of M^{−1/2} .
This property helps us compute the normalisation integral,
\int_0^1 d\theta \, e^{-M f_{\theta^*}(\theta)} \sim \int_0^1 d\theta \, e^{-M [f_{\theta^*}(\theta^*) + \frac{(\theta-\theta^*)^2}{2\theta^*(1-\theta^*)}]} \simeq e^{-M f_{\theta^*}(\theta^*)} \times \sqrt{2\pi\theta^*(1-\theta^*)/M} , (1.34)

because we can extend the Gaussian integration interval to the whole real line without
affecting the dominant order in M . We deduce the expression for the normalised
posterior density of birth rates,

p(\theta|\theta^*; M) \sim \frac{e^{-M[f_{\theta^*}(\theta) - f_{\theta^*}(\theta^*)]}}{\sqrt{2\pi\theta^*(1-\theta^*)/M}} . (1.35)
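The quality of this approximation can be probed by comparing it with the exact Beta posterior of Eq. (1.24). The sketch below uses hypothetical counts (M = 1000, y = 490, an illustration of ours, not Laplace's data) for which the O(1/M) corrections are small but non-zero:

```python
import math

# Hypothetical counts (an illustration, not Laplace's data)
M, y = 1000, 490
ts = y / M                       # theta*

def f(theta):                    # Eq. (1.31)
    return -ts * math.log(theta) - (1 - ts) * math.log(1 - theta)

def log_exact(theta):
    # Exact Beta posterior of Eq. (1.24), via log-Gamma functions
    a, b = y + 1, M - y + 1
    logB = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return (a - 1) * math.log(theta) + (b - 1) * math.log(1 - theta) - logB

def log_approx(theta):
    # Normalised form of Eq. (1.35)
    return -M * (f(theta) - f(ts)) - 0.5 * math.log(2 * math.pi * ts * (1 - ts) / M)

# The two log-densities differ by an O(1/M) constant only
for theta in (ts - 0.02, ts, ts + 0.02):
    assert abs(log_exact(theta) - log_approx(theta)) < 0.01
```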

In order to compute the integral in Eq. (1.29), we need to study the regime where
θ − θ∗ is of order 1, i.e. a “large deviation” of θ. For this, we use Eq. (1.35) and expand
it around θ = 0.5, i.e.

p(\theta > 0.5|\theta^*; M) = \int_{0.5}^{1} d\theta \, \frac{e^{-M[f_{\theta^*}(\theta) - f_{\theta^*}(\theta^*)]}}{\sqrt{2\pi\theta^*(1-\theta^*)/M}}
= \frac{e^{M f_{\theta^*}(\theta^*)}}{\sqrt{2\pi\theta^*(1-\theta^*)/M}} \int_{0.5}^{1} d\theta \, e^{-M[f_{\theta^*}(0.5) + f'_{\theta^*}(0.5)(\theta - 0.5) + \cdots]} (1.36)
\sim \frac{e^{-M[f_{\theta^*}(0.5) - f_{\theta^*}(\theta^*)]}}{f'_{\theta^*}(0.5) \sqrt{2\pi\theta^*(1-\theta^*) M}} .

With Laplace’s data, this expression for the posterior probability that θ ≥ 0.5 could
be evaluated to give
p(\theta > 0.5|\theta^*; M) \sim 1.15 \cdot 10^{-42} , (1.37)
which provides a convincing statistical proof that, indeed, θ < 0.5.
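This number can be recovered, up to rounding, by evaluating Eq. (1.36) in log-space (a sketch; working with logarithms avoids underflow at p ∼ 10^{-42}):

```python
import math

y, M = 241945, 493472     # Laplace's counts
ts = y / M                # theta*

def f(theta):             # Eq. (1.31)
    return -ts * math.log(theta) - (1 - ts) * math.log(1 - theta)

fprime_half = -ts / 0.5 + (1 - ts) / 0.5    # f'(0.5) = 2 (1 - 2 theta*)

# Natural log of the right-hand side of Eq. (1.36)
log_p = (-M * (f(0.5) - f(ts))
         - math.log(fprime_half)
         - 0.5 * math.log(2 * math.pi * ts * (1 - ts) * M))

print(log_p / math.log(10))   # about -41.9, i.e. p of order 1e-42
```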
We conclude this discussion with three remarks:
1. The above calculation shows that the posterior probability that θ deviates from its
typical value θ∗ decays exponentially with the number of available observations,

p(\theta - \theta^* > a \,|\, \theta^*; M) \sim e^{-M[f_{\theta^*}(\theta^* + a) - f_{\theta^*}(\theta^*)]} . (1.38)

A more general discussion of the large M limit will be given in Chapter 2.


2. The minimum value of the function f , namely fθ∗ (θ∗ ), is equal to the entropy of a binary variable y = 0, 1 with corresponding probabilities 1 − θ∗ , θ∗ . We will introduce the notion of entropy and discuss the general connection with inference in the so-called asymptotic regime (namely, for extremely large M ) in the next chapter.
3. Note that Eq. (1.36) is strongly reminiscent of a Boltzmann distribution. Assume that a thermal particle moves on a one-dimensional line and feels a potential U (x) depending on its position x. Let us call x∗ the absolute minimum of U . At low temperature T , the probability density for the particle to be at x reads (the Boltzmann constant kB is set to unity here):

\rho(x; T) \sim \frac{e^{-(U(x) - U(x^*))/T}}{\sqrt{2\pi T / U''(x^*)}} , (1.39)

where the denominator comes from the integration over the harmonic fluctuations of the particle around the bottom of the potential. We see that the above expression is identical to Eq. (1.36) upon the substitutions θ∗ → x∗ , 0.5 → x, fθ∗ → U , M → 1/T . Not surprisingly, having more observations corresponds to a lower effective temperature, and hence to a reduced uncertainty about θ.

1.5 Tutorial 1: diffusion coefficient from single-particle tracking


Characterising the motion of biomolecules (proteins, RNA, etc.) or complexes (vesicles, etc.) inside the cell is fundamental to the understanding of many biological processes [2]. Optical imaging techniques now allow for the tracking of single particles in real time [3]. The goal of this tutorial is to understand how the diffusion coefficient can be reconstructed from the recordings of the trajectory of a particle, and how the accuracy of the inference is affected by the number of data points (recording length) [4]. The diffusion coefficient depends on the diffusive properties of the environment and on the size of the object. Supposing that the data are obtained in water, the characteristic size of the diffusing object will then be extracted from the reconstructed diffusion coefficient, and a connection with characteristic biological sizes will be made.

1.5.1 Problem
We consider a particle undergoing diffusive motion in the plane, with position r(t) =
(x(t), y(t)) at time t. The diffusion coefficient (supposed to be isotropic) is denoted by
D, and we assume that the average velocity vanishes. Measurements give access to the
positions (xi, yi) of the particle at times ti, where i is a positive integer running from
1 to M .

Data:
Several trajectories of the particle can be downloaded from the book webpage2 , see
tutorial 1 repository. Each file contains a three-column array (ti , xi , yi ), where ti is the
time, xi and yi are the measured coordinates of the particle, and i is the measurement
index, running from 1 to M . The unit of time is seconds and displacements are in µm.

Questions:

1. Write a script to read the data. Start with the file dataN1000d2.5.dat, and plot the
trajectories in the (x, y) plane. What are their characteristics? How do they fill
the space? Plot the displacement ri = √(xi² + yi²) as a function of time. Write the
random-walk relation between displacement and time in two dimensions, defining
the diffusion coefficient D. Give a rough estimate of the diffusion coefficient from
the data.
2. Write down the probability density p({xi , yi }|D; {ti }) of the time series {xi , yi }i=1,...,M
given D, and considering the measurement times as known, fixed parameters. De-
duce, using Bayes’ rule, the posterior probability density for the diffusion coeffi-
cient, p(D|{xi , yi }; {ti }).
3. Calculate analytically the most likely value of the diffusion coefficient, its average
value, and its variance, assuming a uniform prior on D.
4. Plot the posterior distribution of D obtained from the data. Compute, for the
given datasets, the values of the mean and of the variance of D, and its most
likely value. Compare the results obtained with different numbers M of measurements.

2 https://github.com/StatPhys2DataDrivenModel/DDM_Book_Tutorials

5. Imagine that the data correspond to a spherical object diffusing in water (of
viscosity η = 10⁻³ Pa s). Use the Einstein-Stokes relation,

D = kB T / (6πηℓ) ,    (1.40)

(here ℓ is the radius of the spherical object and η is the viscosity of the medium) to
deduce the size of the object. Biological objects ranging from molecules to bacteria
display diffusive motions, and have characteristic sizes ranging from nm to µm.
For proteins ℓ ≈ 1−10 nm, while for viruses ℓ ≈ 20−300 nm and for bacteria,
ℓ ≈ 2−5 µm. Among the molecules or organisms described in table 1.1, which
ones could have a diffusive motion similar to that displayed by the data?

object                                    ℓ (nm)
small protein (lysozyme, 100 residues)    1
large protein (1000 residues)             10
influenza virus                           100
small bacterium (E. coli)                 2000

Table 1.1 Characteristic lengths for several biological objects.

6. In many cases the motion of particles is not confined to a plane. Assuming that
(xi , yi ) are the projections of the three-dimensional position of the particle in the
plane perpendicular to the imaging device (microscope), how should the procedure
above be modified to infer D?

1.5.2 Solution
Data Analysis. The trajectory in the (x, y) plane given in the data file for M = 1000
is plotted in figure 1.5A. It has the characteristics of a random walk: the space is not
regularly filled, but the trajectory densely explores one region before “jumping” to
another region.
The displacement r = √(x² + y²) as a function of time is plotted in figure 1.5B. On
average, it grows as the square root of time, but on a single trajectory we observe
large fluctuations. The random walk in two dimensions is described by the relation:

⟨r²(t)⟩ = 4Dt ,    (1.41)

where D is the diffusion coefficient, whose physical dimensions are [D] = length² time⁻¹. Here
lengths are measured in µm and times in seconds. A first estimate of D from the data
can be obtained by just considering the largest time and estimating

D0 = r²(tmax) / (4 tmax) ,    (1.42)

which gives D0 = 1.20 µm² s⁻¹ for the data set with M = 1000. Another estimate
of D can be obtained as the average of the square displacement from one data point


Fig. 1.5 Data file with M = 1000. A. Trajectory of the particle. B. Displacement from the
origin as a function of time.

to the next one divided by the time interval. We define the differences between two
successive positions and between two successive recording times

δxi = xi+1 − xi , δyi = yi+1 − yi , δti = ti+1 − ti . (1.43)

Note that i = 1, . . . , M − 1. The square displacement in a time step is δri² = δxi² + δyi²,
and the estimate of D is

D1 = [1/(4(M − 1))] Σ_{i=1}^{M−1} δri²/δti ,    (1.44)

giving D1 = 2.47 µm² s⁻¹ for the same data set. These estimates are compared with
the trajectory in figure 1.5B.
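The two estimates D0 (Eq. (1.42)) and D1 (Eq. (1.44)) are easy to reproduce numerically. The sketch below, in Python with NumPy, simulates a synthetic 2D Brownian trajectory as a stand-in for the downloadable data files; the true diffusion coefficient, time step, and random seed are illustrative choices, not the tutorial's.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the data files: a 2D Brownian trajectory
# (D_true in µm²/s and dt in s are illustrative values).
D_true, dt, M = 2.5, 0.5, 1000
t = dt * np.arange(M)
# Each coordinate increment is Gaussian with variance 2*D*dt, Eq. (1.45)
steps = rng.normal(scale=np.sqrt(2 * D_true * dt), size=(M - 1, 2))
xy = np.vstack([np.zeros((1, 2)), np.cumsum(steps, axis=0)])

# Crude estimate from the endpoint only, Eq. (1.42)
D0 = np.sum(xy[-1] ** 2) / (4 * t[-1])

# Estimate averaging the square increments over all steps, Eq. (1.44)
dr2 = np.sum(np.diff(xy, axis=0) ** 2, axis=1)
D1 = np.mean(dr2 / np.diff(t)) / 4
```

As in the text, the single-step average D1 is far more precise than the endpoint estimate D0, whose relative fluctuations remain of order one.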

Posterior distribution. Due to diffusion, δxi and δyi are Gaussian random variables
with variances 2D δti. We have

p(δxi|D; δti) = [1/√(4πD δti)] e^{−δxi²/(4D δti)} ,    (1.45)

and p(δyi|D; δti) has the same form. The probability of a time series of increments
{δxi, δyi}i=1,...,M−1, given D, is therefore:

p({δxi, δyi}|D; {δti}) = ∏_{i=1}^{M−1} [1/(4πD δti)] e^{−(δxi² + δyi²)/(4D δti)}
                       = C e^{−B/D} D^{−(M−1)} ,    (1.46)

where C = ∏_{i=1}^{M−1} 1/(4π δti) and B = Σ_{i=1}^{M−1} δri²/(4 δti). Note that to infer D we do not need the
absolute values of (xi, yi), but only their increments on each time interval.

According to Bayes’ rule,

p(D|{δxi, δyi}; {δti}) = p({δxi, δyi}|D; {δti}) p(D) / ∫_0^∞ dD p({δxi, δyi}|D; {δti}) p(D) .    (1.47)

We consider an improper uniform prior, p(D) = const. This can be thought of as a uniform
prior on [Dmin, Dmax], in the limit Dmin → 0 and Dmax → ∞. Thanks to the likelihood,
the posterior remains normalisable in this limit.
Note that, introducing

D∗ = B/(M − 1) = [1/(4(M − 1))] Σ_{i=1}^{M−1} δri²/δti = D1 ,    (1.48)

we can write the posterior as

p(D|D∗; M) = e^{−(M−1)D∗/D} D^{−(M−1)} / ∫_0^∞ dD e^{−(M−1)D∗/D} D^{−(M−1)}
           = e^{−(M−1)D∗/D} D^{−(M−1)} [(M − 1)D∗]^{M−2} / (M − 3)! ,    (1.49)

where the denominator is easily computed by changing the integration variable to
u = D∗/D. Note that, as in Laplace’s problem,

p(D|D∗; M) ∝ e^{−(M−1) fD∗(D)} ,    fD∗(D) = D∗/D + log D .    (1.50)

The most likely value of D is precisely D∗, which is the minimum of fD∗(D) and the
maximum of the posterior, and coincides with the previous estimate D1.
The average value of D can also be computed by the same change of variables,

⟨D⟩ = [(M − 1)/(M − 3)] D∗ ,    (1.51)

and converges to D∗ for M → ∞. The variance of D is

σD² = [(M − 1)²/((M − 3)²(M − 4))] (D∗)² ,    (1.52)

and it decreases proportionally to 1/M for large M.
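The closed-form mode, mean, and variance above can be checked by brute-force numerical integration of the unnormalised posterior; a minimal sketch with NumPy (the grid bounds and resolution are arbitrary choices):

```python
import numpy as np

# Numerical check of Eqs. (1.48)-(1.52): integrate the unnormalised
# posterior e^{-(M-1) D*/D} D^{-(M-1)} on a grid and compare its mode,
# mean, and variance with the closed-form expressions.
M, D_star = 10, 1.43          # values of the first data set (table 1.2)

D = np.linspace(1e-3, 50.0, 400001)
dD = D[1] - D[0]
log_post = -(M - 1) * D_star / D - (M - 1) * np.log(D)
w = np.exp(log_post - log_post.max())        # unnormalised posterior

Z = w.sum() * dD                              # Riemann-sum normalisation
mean = (D * w).sum() * dD / Z
var = (D ** 2 * w).sum() * dD / Z - mean ** 2

mode = D[np.argmax(w)]                                        # should equal D*
mean_pred = (M - 1) / (M - 3) * D_star                        # Eq. (1.51)
var_pred = (M - 1) ** 2 / ((M - 3) ** 2 * (M - 4)) * D_star ** 2  # Eq. (1.52)
```

Working with the log-posterior and subtracting its maximum before exponentiating avoids underflow when M is large.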

Numerical analysis of the data. From the trajectories given in the data files we obtain
the results given in table 1.2. An example of the posterior distribution is given in
figure 1.6.
Note that for large values of M it is not possible to calculate directly the (M − 3)!
in the posterior distribution, Eq. (1.49). It is better to use Stirling’s formula,

(M − 3)! ≈ √(2π) (M − 3)^{M−3+1/2} e^{−(M−3)} ,    (1.53)

Name-file            M      D     D∗     ⟨D⟩    σD
dataN10d2.5.dat      10     2.5   1.43   1.84   0.75
dataN100d2.5.dat     100    2.5   2.41   2.46   0.25
dataN1000d2.5.dat    1000   2.5   2.47   2.48   0.08

Table 1.2 Results for the inferred diffusion constant and its standard deviation (all
in µm² s⁻¹), for trajectories of different length M.


Fig. 1.6 Posterior distribution for the diffusion coefficient D given the data, for several values
of M .

from which one obtains

p(D|B; M) ≈ [e^{−B/D} D^{−(M−1)} √(M − 3) / (√(2π) e)] [Be/(M − 3)]^{M−2} .    (1.54)
We see that this is already a very good approximation for M = 10, and we can safely use it for
M = 100 and M = 1000.
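In practice one works directly in log space, where the factorial is replaced by the log-gamma function ((M − 3)! = Γ(M − 2)); the sketch below compares the exact log-posterior of Eq. (1.49) with the Stirling form of Eq. (1.54):

```python
import math

# Exact log of the normalised posterior, Eq. (1.49), evaluated with
# lgamma to avoid the overflow of (M-3)! at large M.
def log_posterior(D, D_star, M):
    a = (M - 1) * D_star
    return (-a / D - (M - 1) * math.log(D)
            + (M - 2) * math.log(a) - math.lgamma(M - 2))

# Log of the Stirling approximation, Eq. (1.54), with B = (M-1) D*.
def log_posterior_stirling(D, D_star, M):
    B = (M - 1) * D_star
    n = M - 3
    return (-B / D - (M - 1) * math.log(D)
            + 0.5 * math.log(n) - 0.5 * math.log(2 * math.pi) - 1
            + (M - 2) * (math.log(B) + 1 - math.log(n)))

# Evaluated at the posterior mode D = D* for the largest data set:
p_exact = log_posterior(2.47, 2.47, 1000)
p_approx = log_posterior_stirling(2.47, 2.47, 1000)
```

The discrepancy in the log-posterior is of order 1/(12(M − 3)), already about 1% for M = 10 and entirely negligible for M = 1000.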

object                                    ℓ (nm)    D (µm² s⁻¹)
small protein (lysozyme, 100 residues)    1         200
large protein (1000 residues)             10        20
influenza virus                           100       2
small bacterium (E. coli)                 2000      0.1

Table 1.3 Diffusion constants in water at ambient temperature for several biological objects
as in table 1.1, obtained by the Einstein-Stokes relation.

Diffusion constant and characteristic size of the diffusing object. The order of mag-
nitude of the diffusion constant can be obtained by the Einstein-Stokes relation:
D = kB T/(6πηℓ), where ℓ is the radius of the object (here considered as spherical), and
η is the viscosity of the medium. Considering the viscosity of water, η = 10⁻³ Pa s,

and kB T = 4 × 10⁻²¹ J, one obtains the orders of magnitude given in table 1.3.
Therefore, the data could correspond to an influenza virus diffusing in water.
Ref. [5] reports the following values: for a small protein (lysozyme), D = 10⁻⁶ cm² s⁻¹,
and for a tobacco virus, D = 4 × 10⁻⁸ cm² s⁻¹, in agreement with the above orders of
magnitude. In Ref. [4], the diffusion coefficients of protein complexes inside bacteria,
with widths approximately equal to 300−400 nm, are estimated to be
D = 10⁻² µm² s⁻¹. The difference with the orders of magnitude given above is due to
the fact that the diffusion is confined and the medium is the interior of the cell, which has a
larger viscosity than water.
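Inverting Eq. (1.40) for the radius is a one-liner; a small sketch with the unit conversions made explicit (constants as quoted in the text):

```python
import math

# Inverting the Einstein-Stokes relation, Eq. (1.40):
# l = kB*T / (6*pi*eta*D).
kB_T = 4e-21   # J, ambient temperature
eta = 1e-3     # Pa s, viscosity of water

def radius_from_D(D_um2_s):
    """Radius (in m) of a sphere with diffusion coefficient D (in µm²/s)."""
    D_SI = D_um2_s * 1e-12        # convert µm²/s to m²/s
    return kB_T / (6 * math.pi * eta * D_SI)

# D ≈ 2.5 µm²/s, as inferred from the tutorial data:
ell_nm = radius_from_D(2.5) * 1e9   # ~85 nm, the scale of a virus
```

The inferred radius of roughly 85 nm lands in the virus row of table 1.3, consistent with the conclusion above.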
2 Asymptotic inference and information

In this chapter we will consider the case of asymptotic inference, in which a large
number of data points is available and a comparatively small number of parameters have
to be inferred. In this regime, there exists a deep connection between inference and
information theory, whose description will require us to introduce the notions of en-
tropy, Fisher information, and Shannon information. Last of all, we will see how the
maximum entropy principle is, in practice, related to Bayesian inference.

2.1 Asymptotic inference


Consider M data points Y = {yi }i=1...M , independently drawn from a given likelihood
distribution p(y|θ̂). We consider in this chapter the most favourable situation, in which
M is very large compared to the dimension D of the parameter vector, θ̂ ∈ R^D, and
to the dimension L of the data variables, y ∈ R^L. The meaning of “very large” will be made more precise
below.
In this section, we begin the discussion by considering the easiest situation, under
the following two assumptions:
1. The likelihood p(y|θ) is exactly known, and the goal of the inference is to obtain
from the data Y an estimate of the true parameter of the model, θ̂. We will call
this estimate θ.
2. The prior p(θ) over θ is uniform.
Our aim is to provide some theoretical understanding of the prediction error, i.e. the
difference between θ and θ̂, and of how, in the limit of a large number of mea-
surements M → ∞, this error vanishes and is controlled by a simple
theoretical bound. We will then discuss non-uniform priors in section 2.1.5, and how
to handle cases in which the likelihood is not known in section 2.1.6.

2.1.1 Entropy of a distribution


A quantity that plays an extremely important role in asymptotic inference and infor-
mation theory is the “entropy” of the probability distribution p,
S(p) = − Σ_i pi log pi = − Σ_{y∈A} p(y) log p(y) .    (2.1)

This entropy is defined up to a multiplicative constant, which corresponds to the choice
of the base of the logarithm. Common choices are natural or base-2 logarithms.

For a continuous random variable y taking values in a finite interval A, we will
denote by p(y = a)da, or most often simply p(y)dy, the probability that the random
variable takes values in [a, a + da] or [y, y + dy], respectively. The quantity p(y) is then
called a “probability density”. The previous discussion generalises straightforwardly,
with sums over discrete values becoming integrals over continuous intervals. For example,
the entropy is now defined as

S(p) = − ∫_A dy p(y) log p(y) .    (2.2)
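A minimal implementation of the discrete entropy of Eq. (2.1), in nats, with the usual convention 0 log 0 = 0:

```python
import math

# Shannon entropy of a discrete distribution, Eq. (2.1), in nats.
# Terms with p_i = 0 are skipped, implementing the 0 log 0 = 0 convention.
def entropy(p):
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

# A fair coin has entropy log 2, the maximum for a binary variable;
# a deterministic variable has zero entropy.
S_fair = entropy([0.5, 0.5])
S_det = entropy([1.0, 0.0])
```

Switching the base of the logarithm (e.g. math.log2) only rescales the result, in line with the multiplicative-constant freedom noted above.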

2.1.2 Cross entropy and Kullback-Leibler divergence


We further need to introduce two important quantities: the cross entropy and the
Kullback-Leibler (KL) divergence. Consider two distributions p(y) and q(y) of a ran-
dom variable y. The cross entropy is defined as
Sc(p, q) = − Σ_y p(y) log q(y) = − ⟨log q⟩_p ,    (2.3)

where the average is over p. The name “cross entropy” derives from the fact that this
is not properly an entropy, because p and q are different. It coincides with the entropy
for p = q,

Sc(p, p) ≡ S(p) = − Σ_y p(y) log p(y) .    (2.4)

The KL divergence of p with respect to q is defined as

DKL(p||q) = Σ_y p(y) log[p(y)/q(y)] = Sc(p, q) − S(p) .    (2.5)

An important property of the KL divergence is that it is always positive,

DKL(p||q) ≥ 0 ,    (2.6)

where the equality is reached when p = q only. To establish the positivity of DKL,
consider z(y) = q(y)/p(y), and note that values of y for which p(y) = 0 do not contribute
to the sum, because p(y) log p(y) → 0 when p(y) → 0; hence, we can consider that
p(y) ≠ 0 and z(y) is well defined. As y is a random variable, so is z, with a distribution
induced from the probability p(y) over y. According to Eq. (2.5), the KL divergence
between p and q is DKL(p||q) = −⟨log z(y)⟩_p. While the average of log z is hard to
compute, the average value of z is easy to derive:

⟨z(y)⟩_p = Σ_y p(y) z(y) = Σ_y p(y) [q(y)/p(y)] = Σ_y q(y) = 1 ,    (2.7)

so that

−DKL(p||q) = ⟨log z(y)⟩_p ≤ ⟨z(y)⟩_p − 1 = 0 ,    (2.8)

as a consequence of the concavity inequality log x ≤ x − 1. Note that the equality is
reached if z = 1 for all y such that p(y) > 0. As a consequence, DKL(p||q) is positive
and vanishes for p = q only.

Hence, the KL divergence gives a measure of the dissimilarity of the distributions
q and p. Note that this quantity is not symmetric: for generic p, q, DKL(p||q) ≠
DKL(q||p). We will provide a more intuitive interpretation of DKL in section 2.1.4.
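The definitions of Eqs. (2.3)-(2.5), together with the positivity and asymmetry just discussed, can be checked on a small discrete example:

```python
import math

# Cross entropy Eq. (2.3) and KL divergence Eq. (2.5) for discrete
# distributions given as lists of probabilities over the same alphabet.
def cross_entropy(p, q):
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)

def entropy(p):
    return cross_entropy(p, p)       # Eq. (2.4): Sc(p, p) = S(p)

def kl(p, q):
    return cross_entropy(p, q) - entropy(p)

p = [0.5, 0.3, 0.2]
q = [0.2, 0.5, 0.3]
d_pq = kl(p, q)   # positive, per Eq. (2.6)
d_qp = kl(q, p)   # positive too, but a different number
```

The two divergences differ, illustrating the asymmetry of DKL, while kl(p, p) vanishes identically.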

2.1.3 Posterior distribution for many data


According to Bayes’ rule in Eq. (1.10), the posterior distribution p(θ|Y ) is proportional
to the likelihood of the model given the data p(Y |θ) (remember the prior over θ is
uniform). If the data are drawn independently, we have
p(Y|θ) = ∏_{i=1}^{M} p(yi|θ) = exp( M × (1/M) Σ_{i=1}^{M} log p(yi|θ) ) .    (2.9)

We have rewritten in Eq. (2.9) the product of the likelihoods of the data points as
the exponential of the sum of their logarithms. This is useful because, while prod-
ucts of random variables converge badly, sums of many random variables enjoy nice
convergence properties. To be more precise, let us fix θ; the log-likelihood of a data
point, say, yi , is a random variable with value log p(yi |θ), because the yi are randomly
extracted from p(yi |θ̂). The law of large numbers ensures that, with probability one1 ,
(1/M) Σ_{i=1}^{M} log p(yi|θ) −→ ∫ dy p(y|θ̂) log p(y|θ)   as M → ∞ .    (2.10)

We can then rewrite the posterior distribution for M → ∞ as

p(θ|Y) ∝ p(Y|θ) ≈ e^{−M Sc(θ̂,θ)} ,    (2.11)

where

Sc(θ̂, θ) = − ∫ dy p(y|θ̂) log p(y|θ)    (2.12)

is precisely the cross entropy Sc (p, q) of the true distribution p(y) = p(y|θ̂) and of the
inferred distribution q(y) = p(y|θ).
As shown in section 2.1.2, the cross entropy can be expressed as the sum of the
entropy and of the KL divergence, see Eq. (2.5)

Sc (θ̂, θ) = S(θ̂) + DKL (θ̂||θ) , (2.13)

where we use the shorthand notation DKL (θ̂||θ) = DKL (p(y|θ̂)||p(y|θ)). Due to the
positivity of the KL divergence, the cross entropy Sc (θ̂, θ) enjoys two important prop-
erties:
1 If we furthermore assume that the variance of this random variable exists,

∫ dy p(y|θ̂) [log p(y|θ)]² < ∞ ,

then the distribution of the average of M such random variables becomes Gaussian, with mean given
by the right-hand side of Eq. (2.10) and variance scaling as 1/M.


Fig. 2.1 Illustration of the evolution of the posterior probability with the number M of
data, here for Laplace’s birth rate problem, see Eq. (1.35). Another example was given in
figure 1.6.

• it is bounded from below by the entropy of the ground-truth distribution: Sc(θ̂, θ) ≥ S(θ̂), and
• it has a minimum at θ = θ̂.
Therefore, as expected, the posterior distribution in Eq. (2.11) gives a larger weight
to the values of θ that are close to the value θ̂ used to generate the data, for which
Sc (θ̂, θ) reaches its minimum.

2.1.4 Convergence of inferred parameters towards their ground-truth values
To obtain the complete expression of the posterior distribution, we introduce the
denominator in Eq. (2.11),

p(θ|Y) = e^{−M Sc(θ̂,θ)} / ∫ dθ e^{−M Sc(θ̂,θ)} .    (2.14)

In the large-M limit the integral in the denominator is dominated by the minimal
value of Sc(θ̂, θ) at θ = θ̂, equal to the entropy S(θ̂), so we obtain, to exponential
order in M,

p(θ|Y) ∼ e^{−M [Sc(θ̂,θ)−S(θ̂)]} = e^{−M DKL(θ̂||θ)} .    (2.15)
In figure 2.1 we show a sketch of the posterior distribution for Laplace’s problem
discussed in section 1.4. The concentration of the posterior with increasing values of
M is easily observed.
The KL divergence DKL(θ̂||θhyp) controls how the posterior probability of the
hypothesis θ = θhyp varies with the number M of accumulated data. More precisely,
for θhyp ≠ θ̂, the posterior probability that θ ≈ θhyp is exponentially small in M. For
any small ε,

Prob(|θ − θhyp| < ε) = e^{−M DKL(θ̂||θhyp) + O(εM)} ,    (2.16)

and the rate of decay is given by the KL divergence DKL(θ̂||θhyp) > 0. Hence, the
probability that |θhyp − θ| < ε becomes extremely small for M ≫ 1/DKL(θ̂||θhyp).
The inverse of DKL can therefore be interpreted as the number of data needed to
recognise that the hypothesis θhyp is wrong.
We have already seen an illustration of this property in the study of Laplace’s birth
rate problem in section 1.4 (and we will see another one in section 2.1.6). The real value
θ̂ of the birth rate of girls was unknown but could be approximated by maximising
the posterior, with the result θ̂ ≈ θ∗ = 0.490291. We then asked for the probability that
girls had (at least) the same birth rate as boys, i.e. that θhyp ≥ 0.5. The cross entropy
of the binary random variable yi = 0 (girl) or 1 (boy) is Sc(θ̂, θ) = fθ̂(θ), defined in
Eq. (1.31) and shown in figure 1.4. The rate of decay of this hypothesis is then

DKL(θ̂||θhyp) = fθ̂(θhyp) − fθ̂(θ̂) ≈ 1.88 · 10⁻⁴ ,    (2.17)

meaning that about 5000 observations are needed to rule out that the probability that
a newborn will be a girl is larger than 0.5. This is smaller than the actual number
M of available data by two orders of magnitude, which explains the extremely small
value of the probability found in Eq. (1.37).
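This back-of-the-envelope count can be reproduced directly from the Bernoulli form of the KL divergence (the numerical value of θ∗ is the one quoted above):

```python
import math

# KL divergence between two Bernoulli distributions, as in Eq. (2.17):
# D_KL(p||q) = p log(p/q) + (1-p) log((1-p)/(1-q)).
def kl_bernoulli(p, q):
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

theta_hat = 0.490291   # most likely girl birth rate from Laplace's data
theta_hyp = 0.5        # hypothesis of equal birth rates

d_kl = kl_bernoulli(theta_hat, theta_hyp)   # ≈ 1.88e-4
n_obs = 1 / d_kl                            # ≈ 5300 observations needed
```

The inverse divergence gives the few-thousand-observation scale quoted in the text, far below Laplace's actual sample size.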
2.1.5 Irrelevance of the prior in asymptotic inference
Let us briefly discuss the role of the prior in asymptotic inference. In the presence of a
non-uniform prior over θ, p(θ) ∼ exp(−π(θ)), Eq. (2.11) is modified into

p(θ|Y) ∝ p(Y|θ) p(θ) ≈ exp[ −M Sc(θ̂, θ) − π(θ) ] .    (2.18)

All the analysis of section 2.1.4 can be repeated, with the replacement

Sc(θ̂, θ) → Sc(θ̂, θ) + (1/M) π(θ) .    (2.19)

Provided π(θ) is a finite and smooth function of θ, the inclusion of the prior becomes
irrelevant for M → ∞, as it modifies the cross entropy by a term of the order of
1/M. In other words, because the likelihood is exponential in M and the prior does
not depend on M, the prior is irrelevant in the large-M limit. Of course, this general
statement holds only if p(θ̂) > 0, i.e. if the correct value of θ is not excluded a priori.
This observation highlights the importance of avoiding an overly restrictive prior.
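The 1/M suppression of the prior can be illustrated with a toy Bernoulli likelihood and a hypothetical smooth prior π(θ); both the prior strength and the grid search below are illustrative choices, not from the text:

```python
import math

# Sketch of section 2.1.5: with a smooth prior p(theta) ∝ exp(-pi(theta)),
# the posterior maximum differs from the maximum-likelihood estimate by a
# term of order 1/M. Hypothetical prior: pi(theta) = (theta - 0.5)**2 / 0.02,
# pulling the estimate toward 0.5.
def map_estimate(k, M):
    """Grid-maximise k*log(t) + (M-k)*log(1-t) - pi(t) over t in (0, 1)."""
    best_t, best_val = None, -float("inf")
    for i in range(1, 10000):
        t = i / 10000
        val = k * math.log(t) + (M - k) * math.log(1 - t) - (t - 0.5) ** 2 / 0.02
        if val > best_val:
            best_val, best_t = val, t
    return best_t

# Same empirical frequency k/M = 0.3, increasing amounts of data:
gap_small = abs(map_estimate(30, 100) - 0.3)      # prior still matters
gap_large = abs(map_estimate(3000, 10000) - 0.3)  # prior washed out
```

The bias induced by the prior shrinks roughly as 1/M, as Eq. (2.19) predicts.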
2.1.6 Variational approximation and bounds on free energy
In most situations, the distribution from which data are drawn may be unknown, or
intractable. A celebrated example borrowed from statistical mechanics is given by the
Ising model over L binary spins (variables) y = (y1 = ±1, · · · , yL = ±1), whose Gibbs
distribution reads (in this section we work at temperature T = 1 for simplicity)

pG(y) = e^{−H(y)} / Z ,   with H(y) = −J Σ_{⟨ij⟩} yi yj .    (2.20)

If the sum runs only over pairs ⟨ij⟩ of variables that are nearest neighbours on a
d-dimensional cubic lattice, the model is intractable: the calculation of Z requires
their temper and behaviour to those around them, parents cannot be
too assiduous in the cultivation of affability, the possession of which
virtue is the grand secret that confers “a fine turn for business.”
SETTING UP.

The taking of a shop, whether to set up a new business or translate
an old one, is always a matter of deep and anxious concernment. On
such an occasion, one generally gets into a state of fidgetiness and
perplexity, which is felt to be far from disagreeable. A sentiment of
unwonted enterprise rises in his mind. He is going, he thinks, to do a
great thing—at least something beyond the usual range of
commercial existence. In the first place, he pays a few sly and
solitary visits to the place—not that he goes in to look about him—
no, no; he is not for some time up to that point. He tries first how the
premises look when simply walked past as if by an unconcerned
passenger. As he passes, he casts an affectedly careless glance at
the door and windows, taking care, however, to receive as deep an
impression as possible of the whole bearing and deportment of the
place. After walking to a sufficient distance, he turns and walks back,
and sees how it looks when approached from a different point of the
compass. Then he takes a turn along the other side of the street, or
perhaps, if afraid to excite observation (and if the place be in
Edinburgh), goes up a common stair, and takes a deliberate and
secure observation from a window. His feeling is almost exactly the
same as that of a lover making observations of a mistress, whose
figure he wishes to ascertain before getting too deeply in love with
her to put correct judgment out of the question. As, in the one case,
stature is perhaps considered, complexion and outline of face duly
weighed, and possibly some very modest inquiries instituted as to
what Master Slender calls “possibilities,” so, in the other, does the
shop-inspector consider all the particulars of the aspect and
likelihood of the contemplated premises. Shops, it must be
understood, have characters, exactly like human beings. Some have
an open, generous, promising countenance, while others have a
contracted, sinister, louring expression of phiz, according to the
quantity of mason-work there may be in front. Some are of a far
more accessible character than others, with a kind of facilis
descensus in the entry that is in the highest degree favourable to
custom. People can hardly avoid falling into such shops as they pass
along the streets, for positively they gape like so many Scyllas for
your reception, and goodwives, who, like Roderigo, have put money
in their purses, are caught like so many rats without thinking of it.
There are others, I grieve to say, with such a difficulty of entrance,
either from a narrow door, a shut door, an elevation of the pavement,
or a certain distance from the thoroughfare, that it requires an
absolute determination to purchase such and such articles in such
and such shops—a full animus emendi, as lawyers would say—to
overcome the obstacle. Perhaps it does not matter for some
businesses, which are not much overrun with competition, that they
should be carried on in shops of this kind. If there be only one music-
seller in the town, he might have his boutique in a twelfth story, and
yet he would be sure to get all the natural custom of the place; but in
the case of one out of some five hundred haberdashers, or some two
thousand grocers, it is absolutely imperative that he should be
established in some place with a fatal facility of access. In all cases,
there is a combination of qualities in shops, as well as in men and
women. There is something indescribable about it; but an
experienced eye, pretty well acquainted with the characters of the
streets [this is another subject] and parts of streets, could almost in a
moment decide upon the probabilities of any given shop in a large
city. He would combine in an instant in his own mind the various
qualities, and, counting them into each other after the manner of
Lieutenant Drummond, but by a figureless kind of arithmetic, assign
at once the exact value of the shop to any class of traders. And
shops have characters, too, in another sense of the word—that is,
they have reputations. Let a shop have all the apparent advantages
in the world, yet, if it be a shop in which several persons have
committed faux pas in business, it is naught. We often see an
excellent shop thus lose caste, as it were, and become of hardly any
value to its proprietor. Suppose some one has failed in it between
terms, and deserted it: then do all the bill-stickers come in the first
place, and paste it over with huge placards from top to bottom,
exactly as a man drowned in the sea, however fine a fellow he may
have been, gets encased in a few days in barnacles and shell-fish,
the conchological part of the world taking that opportunity to show
their contempt for the human. Though the character of the shop is
not yet, perhaps, at its worst, yet, as it happens to remain unleased
over the next term, the despairing landlord, some time in September
or October, begins to let it “by the month or week” to all kinds of
nameless people, who die and make no sign, such as men that show
orreries, or auctioneers selling off bankrupt stocks, till at last it is as
hopeless to think of getting a good tenant into it, as for a man with a
bad character to expect a good place in the Excise. The shop is
marked for ever; and unless, like the man in the Vicar of Wakefield, it
can get a thoroughly new face and form, it has no chance
whatsoever of resuming its place in the first rank of shops.
After having completely made up his mind to take a particular
shop, he goes and asks advice. His confessor[7] readily consents to
take a walk with him in that direction, and give his candid opinion
upon the subject. The two walk arm in arm past the premises, the
confessor alone looking, lest, in endeavouring to observe, they
should themselves be observed. “I’ll tell you what,” says the
confessor, “I like that shop very much. If the rent be at all suitable, I
think you might do very well in it.” It is then proposed that the
confessor should step in to inquire the rent; for though there be
equal reasons why each should not expose himself as being on the
outlook for a shop, the person not actually concerned has always
least reluctance to submit to that disadvantage, being borne up, it
would appear, by a conscious absence of design, while the guilt of
the other would be betrayed by his first question. If the confessor
reports favourably, then the individual who wants the premises
ventures in himself, inspects the accommodations, and makes
farther inquiries. The two afterwards retire together, and have a deep
and serious consultation upon the subject.
In the deliberations of a person about to enter life in this way, there
is always much that is extravagant, and much that is vague. I never
yet knew it fail, that, if there was success at all, it arose from different
sources from those which had been most securely calculated upon
as likely to produce it. Suppose it is a business for the supply of
some ordinary necessary of life: the novice reckons up almost all the
people he knows in different parts of the town, as sure to become his
customers; he expects, indeed, hardly any other kind of support. It is
found, however, when he commences, that one friend is engaged in
one way, and another in another, so that, with the exception,
perhaps, of some benevolent old lady, who sends occasionally to
buy a few trifles from him upon principle, all is waste and barren
where he expected to reap a plentiful harvest. He finds, however, on
the other hand, that he gets customers where he did not expect
them. People seem to rise out of the earth, like the men of Cadmus,
to buy from him. The truth is, he is resorted to by those who are
disengaged at the time, and to whom his shop is convenient; and all
the good-will of all the friends in the world will not get over, for his
sake, the difficulty of some engagement elsewhere, or the
inconvenience of distance.
It is also a very remarkable thing of people about to enter upon
such an enterprise as we are now describing, that they often
overlook the most important considerations of all and pay a very
minute attention to trifles and things by the by. They perhaps fail to
observe that there is not nearly enough of population around them to
justify their setting up a particular business; but they fully appreciate
and lay great stress upon the circumstance of having a water-pipe in
the back room, by which they may be enabled to wash their hands at
any time of the day. They may neither have capital nor range of
intellect for the business; but they are top-sure that the woman who
sells small wares in the area will supply them with a light for the fire
every morning. The shop may be unsuitable in many important
respects; but nothing could be better in its way than the place for a
sign above the door. Even where every matter of real consequence
is well weighed and found answerable, there is generally a fussy and
festering anxiety about details, accompanied, in the sensations of
the principal party, by peculiar dryness of mouth and excoriation of
thought-chewed lip. Matters may be such that a confessor, with all
the evil-foreboding qualities of a stormy-peterel, could not see a
single flaw in the prospect; yet it is amusing in such cases to hear
the intending trader laying as much stress upon the peculiar situation
of a fire-place in the back room, or the willingness of the landlord to
supply a padlock to the door, as if in these things, and in nothing
else, lay all his hopes of profit and eventual respectability in life.
Suppose, however, that, after all kinds of fond and dreamy
calculations, the shop has been taken and opened. I think there can
hardly, for some time, be a more interesting sight to a benevolent on-
looker, than the young and anxious trader. The shop almost throws
itself out at the windows to attract the observation of the passer by.
The youth himself stands prompt and alert behind his counter, never
idling for a moment, nor permitting his shopboy to idle, but both busy,
cutting, and brushing, and bustling about, whether there be any thing
to do or not. If but an old lady be seen looking up at the window, or
glancing in through the avenue of cheap prints that forms the
doorway, what an angler-like eagerness in the mind of the trader that
she would but walk in!—nothing more required—were she once
within the shop, no fear but she is well done for. And when any body
does go in to buy any thing, what a readiness to fly upon the article
wanted—with what serviceable rapidity of finger is the parcel
unbound—how polite and impressé the manner in which the object is
presented and laid out for inspection—what intense gratitude for the
money wherewith it is paid! With what a solicitous air is a card finally
put into your hand as a memorandum of the place!—a proceeding
only the more eloquent when not accompanied by an actual request
for your farther custom.
In a large city, advertising is necessarily resorted to as one of the
modes, if not almost the only mode, of forming a business. Here it is
obvious that a mere modest statement of the case will not do.
Something must be said, to make the setting up of the new shop
appear in the character of an event. The public attention must be
arrested to the circumstance, as if it were a matter of public
concernment. It must appear as if the interest of the community, and
the interest of the shopman, were identified. No good bargains, no
certainty of good articles, no safety of any kind, any where else.
Such is the strain of his advertisements, which, though they make
the judicious grieve, make a vast number of other people, and even
some of the judicious, buy. The secret is this: A warm and highly
coloured style is necessary with a new shopkeeper, to meet and
counteract the indifference of the public towards his concerns. If he
put forth a cool schedule of his goods and chattels, it does nothing
for him, because it does not single him out from the great herd. But if
he uses a striking and emphatic phraseology, and even mixes a little
extravagance in the composition, it is apt to fix attention to him and
his shop; and the people, being so warmly solicited, go to try. Again
(and here, perhaps, lies the better part of the thing), the frequency
and fervour of his advertisements at least convey the impression that
he is anxious for business, and ready and willing to execute it; and
as people like to deal with such persons, he is apt to be resorted to
on that account, if upon no other. Frequent advertising is, upon the
whole, a mark rather of a want of business, than of that kind of
respectability which consists in the enjoyment of a concern already
in full operation and productiveness; but with beginners, it is quite
indispensable.
The difficulty of establishing a new business is fortunately got over
in a small degree by a certain benevolent principle in human nature
—a disposition to encourage the efforts of the young. Some people
act so much under this sentiment, or have such an appetite for the
sincere thanks of the needy, that they go to hardly any shops but
those of new beginners. The sight of a haberdasher’s shop, in its
first and many-coloured dawn, with prints, and ribands, and shop-
bills, flying in all directions—or of a provision shop, where hams
project their noses into the very teeth, almost, of the passer by, and
cheeses lie gaping with a quarter cut out, as if ready to eat rather
than to be eaten—or of a bookseller’s shop, where every fresh and
trig volume upon the counter seems as if it would take the slightest
hint of your will, and, starting up, pack itself off, without any human
intervention whatsoever, to your lodgings—is irresistible to these
people. They must go in, whether they want any thing or not, and,
after buying some trifle as an earnest of future custom, get
themselves delighted with a full recital of all the young trader’s
feelings, and prospects, and capabilities, which he is ready to
disclose to any one that will lay out sixpence, and appear to take an
interest in his undertaking. If the customer be an old lady, she is
interested in his youth, and inquires whether he be married or not. If
not, then she wants him to get on well, so that he may soon be able
to have a wife; if he be, and have children, then she sympathises but
the more keenly; she thinks how much human happiness depends
upon the success or failure of his undertaking—how one fond soul
will watch with intense anxiety the daily progress of the business,
taking an interest in almost every penny that comes in, and how
many little mouths unconsciously depend upon what is done here,
for the fare which childhood so much requires and so truly enjoys.
She goes away, resolved to speak of the shop to all she knows, and
perhaps in two or three days she is able to bring in a flock of young
ladies who want various articles, and who, recommending the new
beginner to others, aid materially in making up the steady business,
which, with economy, perseverance, and suitable personal qualities,
he at length acquires.

FOOTNOTES:
[7] Bosom friend.
CONSULS.

The population of a large town is perpetually receiving accessions
from the country—not for the purpose of increasing the aggregate of
inhabitants, but to supply the waste of existence which takes place in
such a scene, and to furnish a better selection for the peculiar offices
and business of a city than what could be obtained from the
successive generations of the ordinary inhabitants. Nothing can be
more clear than that the youths born and bred in a large city have a
less chance to establish themselves in its first-rate lines of business,
than the lads who come in from the country as adventurers; the fact
being, that the latter are a selection of stirring clever spirits from a
large mass, while only the same proportion of the former are likely to
possess the proper merit or aptitude. Besides, the town-bred lad is
apt to have some points of silly pride about his status in society; he
cannot do this and he cannot do that, for fear of the sneers of the
numerous contemporaries under whose eyes he is always walking.
But the gilly, hot from Banff or Inverness, who comes into the town,
“with bright and flowing hair,” rugging and riving for a place in some
writer’s office, or elsewhere—why, the fellow would push into the
most sacred parts of a man’s house, like Roderick Random, and at
the most unconscionable hours, in search of some prospective
situation; and when he has got it, what cares he about what he does
(within honesty) in order to advance himself, seeing that all whoever
knew him before are on the other side of the Grampians. Thus, the
sons of the respectable people of large cities are constantly retiring
from the field—some to the East Indies, some to the West—some
evanish nobody knows how—while their places are taken by settlers
from all parts of the country, whose children, in their turn, give way to
fresh importations. Then, there is a constant tide towards the capital,
of all kinds of rural people, who, having failed to improve their
fortunes in the country, are obliged to try what may be done in the
town. A broken-down country merchant sets up a grocery shop in
some suburb—a farmer who has been obliged to relinquish his
dulcia arva, sets up an hostelry for carriers, and so forth. Every
recurrence of Whitsunday and Martinmas sends in large droves of
people on the tops of heavy carts, to pitch their camps in Edinburgh;
many of them with but very uncertain prospects of making a
livelihood when they get there, but yet the most of them astonished a
year or two after to find that they are still living, with the children all at
the school as formerly, although, to be sure, the “reeky toun” can
never be like the green meadows and dear blue hills which they
have left behind in Menteith, or Ayrshire, or Tweeddale. What
change, to be sure, to these good people, is the close alley of the
Old Town of Edinburgh, the changeless prospect of house tops and
chimneys, and the black wall opposite to their windows, ever casting
its dark shade into their little apartments, for the pleasant open fields
in which they have sown and reaped for half a lifetime, and where
every little rustic locality is endeared to them by a thousand delightful
recollections! But yet it is amazing how habit and necessity will
reconcile the mind to the most alien novelties. And, even here, there
are some blessings. The place of worship (always an important
matter to decent country people in Scotland) is perhaps nearer than
it used to be. Mr Simpson’s chapel in the Potterrow is amazingly
convenient. Education for the children, though dearer, is better and
more varied. There is also a better chance of employment for the
youngsters when they grow up. Then Sandy Fletcher, the ——
carrier, goes past the door every Wednesday, with a cart-load of
home reminiscences, and occasionally a letter or a parcel from some
friend left at the place which they have deserted. By means of this
excellent specimen of corduroyed honesty and worth, they still get all
their butter and cheese from the sweet pastures of their own country
side, so that every meal almost brings forward some agreeable
association of what, from feeling as well as habit, they cannot help
still calling home. Then it is always made a point with them to plant
themselves in an outskirt of the town, corresponding to the part of
the country from which they come, and where they think they will
have at least a specimen of the fresh air. A Clydesdale family, for
instance, hardly ever thinks of taking a house (at least for the first
year or two) any where but in the Grassmarket, or about Lauriston,
or the Canal Basin. People from East Lothian harbour about the
Canongate. Bristo Street and the Causewayside are appropriated
indefeasibly to settlers from Selkirkshire and Peeblesshire. Poll the
people thereabouts, and you will find a third of them natives of those
two counties. In fact, the New Town, or any thing beyond the
Cowgate, is a kind of terra borealis incognita to folk from the south of
Scotland. They positively don’t know any thing about those places,
except, perhaps, by report. Well, it must certainly be agreeable, if
one is banished from the country into a town, at least to dwell in one
of the outlets towards that part of the country; so that the exile may
now and then cast his thoughts and his feelings straight along the
highway towards the place endeared to him; and if he does not see
the hills which overlook the home of his heart, at least, perhaps, hills
from which he knows he can see other hills, from which the spot is
visible—the long stages of fancy in straining back to the place

“——He ne’er forgets, though there he is forgot.”

There is one other grand source of comfort—in fact, an
indispensable convenience—to people from the country living in a
large city, namely, Consuls. Every person in the circumstances
described must be familiar with the character and uses of a Consul,
though perhaps they never heard the name before. The truth is, as
from every district of broad Scotland there are less or more settlers
of all kinds of ranks and orders, so among these there is always one
family or person who serves to the occasional visitors from that part
of the country, as well as to the regular settlers, all the purposes
which a commercial Consul serves in a foreign port. The house of
this person is a howff, or place of especial resort, to all and sundry
connected with that particular locality. It is, in fact, the Consul-house
of the district. Sometimes, when there is a considerable influx from a
particular place, there is a Consul for almost every order of persons
connected with that place, from the highest to the lowest. The
Consul is a person—generally an old lady—of great kindliness of
disposition, and who never can be put about by a visit at any time
upon the most vaguely general invitation. Generally, a kind of open
table—a tea-table it is—is kept every Sunday night, which is resorted
to by all and sundry, like an “at home” in high life; and though the
Consul herself and some of her family sit on certain defined and
particular chairs during the whole evening, the rest are tenanted by
relays of fresh visitors almost every hour, who pay their respects,
take a cup, and, after a little conversation, depart. In general, the
individuals resorting to these houses are as familiar with every
particular of the system of the tea-table—yea, with every cracked
cup, and all the initials upon the silver spoons—as the honest Consul
herself. Community of nativity is the sole bond of this association,
but hardly any could be stronger. A person from the country takes
little interest in the gossip of the city, important as it may sometimes
be. He likes to hear of all that is going on in the little village or parish
from which he has been transplanted. All this, and more, he hears at
the house of the Consul for that village, or parish, the same as you
will be sure to find a London newspaper in the house of the British
resident at Lisbon. Any death that may have happened there since
his last visit—any birth—any marriage—any anything—he gets all in
right trim at the Consul-house, with all the proper remarks, the whole
having been imported on Thursday in the most regular manner by
the carrier, or else on some other day by a visitant, who, though only
a few hours in town, was sure to call there. At the Consul-house you
will hear how the minister is now liked—who is likely to get most
votes in the coming election—from whom Mrs —— bought her china
when she was about to be married—and the promise of the crops,
almost to a sheaf or a potato. But the topics are of endless variety.
One thing is remarkable. The most determined scandal is bandied
about respecting their ancient neighbours; and yet they all conspire
to think that there is no sort of people to be compared with them in
the mass. They will let nobody talk ill of them but themselves. There
is sometimes a considerable difference in the characters and ranks
of the individuals who frequent a Consul-house. Perhaps you find,
among persons of higher degree and more dignified age, apprentice
lads, who, being the children of old acquaintances of the Consul, are
recommended by their mothers to spend their Sunday evenings
here, as under a vicarial eye of supervision, and being sure to be out
of harm’s way in the house of so respectable a person. These stolid
youths, with their raw untamed faces, form a curious contrast,
occasionally, to the more polished individuals who have been longer
about town, such as writers’ clerks or licentiates of the church.
Possibly they will sit you out five mortal hours in a Consul-house,
without ever speaking a word, or even shifting their position on their
chairs, staring with unvaried eyes, and hands compressed between
knees, right into the centre of the room, and hearing all that is going
on as if they heard not. At length the young cub rises to go away,
and the only remark is, “Well, Willie, are you going home? Good-
night.” After which, the Consul only remarks to the adults around her,
“That’s ane o’ John Anderson’s laddies—a fine quiet callant.” But this
holds good only respecting Consuls in a certain walk of life. There
are houses where people of very high style, from a particular district,
are wont to call and converse; and there are dens in the inferior parts
of the town, to which only serving girls or boys (there is no rank
among boys) resort. Every place, every rank, has its Consul. And not
only is the Consul valuable as an individual who keeps a Sunday
evening conversazione. She actually does a great deal of business
for the particular district which she represents. If a townswoman
wants a gown dyed, or to obtain swatches of some new prints, or to
purchase any peculiar article which requires some address in the
purchasing, then is the Consul resorted to. A little square
inexplicable epistle, with not nearly enough of fold to admit a wafer,
and the phrase “for goods” on the corner, supposed to be a kind of
shibboleth that exempts letters from the laws of the Post-office,
comes in with the carrier, requesting that Mrs —— will be so good as
go to this or that shop, and do this and that and t’other thing, and
send the whole out by return of Pate Fairgrieve, and the payment will
be rendered at next visit to town. Thus the Consul is a vast
commission-agent, with only this difference, that she makes nothing
by it to compensate her immense outlay of capital. The duties,
however, of the Consulate are their own reward; and we doubt if
Brutus, who first assumed the office, bore it with a prouder head or
more satisfied heart, than many individuals whom we could point out.
Henceforth, we do not doubt, people will refer to the days when such
and such a person was Consul for their native village, in a style
similar to the ancient chronology of Rome; and “Consule Tullo” itself
will not be more familiar or more memorable language, than “in the
Consulate [shall we so suppose?] of Mrs Bathgate!”
COUNTRY AND TOWN ACQUAINTANCES.

The exact balance of favours in ordinary acquaintanceship is a
matter very difficult to be adjusted. Sometimes people think they are
giving more entertainments than they get, and on other occasions
you would suppose that they are mortally offended at their friends for
not coming oftener to eat of their meat and drink of their cup. It is
hard to say whether a desire of reserving or of squandering victuals
predominates; for though one would argue that it is more natural to
keep what one has than to give it away for nothing, yet, to judge by
the common talk of the world, you are far more likely to give offence
by coming too seldom than by coming too often to the tables of your
friends. From this cause, I have often been amused to hear people,
about whose company I was not very solicitous, making the most
abject apologies for having visited me so seldom of late, but
promising to behave a great deal better for the future—that is to say,
to give me henceforward much more of what I never desired before,
even in the smallest portions.
But this kindness of language is not confined to the party
threatening a visit: the party threatened is also given to use equally
sweet terms of discourse. “Really, you have been a great stranger
lately. We thought we never were to see you again. What is there to
hinder you of an evening to come over and chat a little, or take a
hand with the Doctor and Eliza at whist? We are always so happy to
see you. I assure you we are resolved to take it very ill; and if you
don’t repay our last visit, we will never see you again.” With an
equally amiable sincerity, the shocking person with whom you have
been long quite tired, having ceased to gain any amusement or any
eclat from the acquaintance, replies, “I must confess I have been
very remiss. Indeed, I was so ashamed of not having called upon
you for such a length of time, that I could not venture to do it. But,
now that the ice is broken, I really will come some night soon. You
may depend upon it.” And so the two part off their several ways, the
one surprised at having been betrayed into so many expressions of
