
Phys Chem Minerals (1994) 21:36-49

PHYSICS AND CHEMISTRY OF MINERALS
© Springer-Verlag 1994

Bayes Estimation: A Novel Approach to Derivation of Internally Consistent Thermodynamic Data for Minerals, their Uncertainties, and Correlations.
Part I: Theory

Walter Olbricht 1 *, Niranjan D. Chatterjee 2, Klaus Miller 1 **
1 Institute of Mathematics, Ruhr University, D-44780 Bochum, Germany
2 Institute of Mineralogy, Ruhr University, D-44780 Bochum, Germany, Fax: 0234 7094 179

Received May 18, 1993 / Revised, accepted January 22, 1994

Abstract. Computation of phase diagrams in mineral systems and quantitative geothermobarometry thrive on the availability and accuracy of internally consistent thermodynamic datasets for minerals. The prevailing two methodologies applied to derive them, mathematical programming (MAP) and least squares regression (REG), have their very specific advantages and deficiencies which are to some extent complementary. Bayes estimation (BE), the novel technique proposed here for obtaining internally consistent thermodynamic databases, can combine the advantages of both MAP and REG but avoid their drawbacks. It optimally uses the information on thermochemical, thermophysical, and volumetric properties of phases and experimental reaction reversals to refine the thermodynamic data and returns their uncertainties and correlations. Therefore, BE emerges as the method of choice. The theoretical background of BE, and its relation to MAP and REG, is explained. Although BE is conceptually simple, it can be computationally demanding. Fortunately, modern computer technology and new stochastic methods such as Gibbs sampling help surmount those difficulties. The basic ideas behind these methods are explored and recommendations for their use are made using the Al2SiO5 unary as an example. The potential of BE and its future perspective for application to multicomponent-multiphase systems appear very promising. For the convenience of readers not interested in the mathematical details of BE, an illustrative example is given in the Appendix to promote an intuitive understanding of what BE is all about.

Present addresses:
* Faculty for Mathematics and Physics, University of Bayreuth, D-95440 Bayreuth, Germany
** Helene-Weber-Allee 18, D-80637 Munich, Germany
Correspondence to: W. Olbricht

¹ Readers in search of simple worked examples of REG and MAP may refer to Powell and Holland (1985) and Chatterjee (1990), respectively

Introduction

The availability of internally consistent thermodynamic databases (Berman 1988; Holland and Powell 1990) has had an enormous impact both on the computation of geological phase diagrams and on quantitative geothermobarometry (Powell 1985; Berman 1991). Regardless of the methodologies employed to derive them, they are invariably based on simultaneous treatment of
- phase property (PP) data like thermochemical, thermophysical, and volumetric properties of phases and
- reaction property (RP) data based on the relevant reaction reversal (bracketing) experiments.

The data on phase properties show normal probability distributions; they can be expressed in terms of equalities, with their associated standard deviations. The reaction reversal experiments, by contrast, indicate inequalities in energy difference, whose sign depends upon the relative stability of the reactants or the products at the T and P of the experiments. Such experiments help delimit the feasible solution set (Z) for the reaction equilibria. The phase properties and the reaction properties are mutually linked through the stoichiometry of the reaction.

Simultaneous treatment of PP and RP data has been achieved thus far by two alternative methods. While Holland and Powell (1990) used least squares regression (REG) for that purpose, Berman (1988) applied the technique of mathematical programming (MAP). Both REG and MAP¹ evolved over the years, the papers cited representing a recent stage of development. The advantages and limitations of both techniques have been acrimoniously debated and defended by their proponents (Berman et al. 1986; Holland and Powell 1990; Engi 1992; Powell and Holland 1993). The fundamental merit of REG is that it provides a unique set of refined thermodynamic data with their uncertainties and correlations, making propagation of errors into the computed phase diagrams straightforward. This is a very attractive feature indeed, especially when it comes down to geothermobarometric applications. Unfortunately, REG can

not handle sets of inequalities and equalities with rigor. To avoid that problem, it operates with arbitrarily paired sets of reversals, making assumptions as to the nature of the probability distribution between them. MAP, on the other hand, does handle sets of inequalities and equalities with rigor, but returns a range of solutions, out of which an optimal solution must be derived by minimizing a suitable, if somewhat arbitrarily chosen, objective function. Two cases arise during the process:

(a) A directly measured PP turns out to be an element of Z, the feasible solution set. In this case the value of the objective function is zero and the optimal solution given by MAP is the same as the measured phase property. Whenever this happens, MAP makes no further use of the information inherent to Z.

(b) In the majority of cases, however, a measured PP is unlikely to be an element of the feasible solution set Z. Consequently, the optimal solution obtained by minimizing the objective function is an element on the "boundary" of the feasible solution set (cf. Chatterjee 1990, Fig. 7.9).

Either way, it seems that the optimal solution given by MAP is not necessarily the most desirable one. Apart from that, MAP has not so far led to error propagation into computed phase diagrams in any straightforward manner, which is indispensable for meaningful geothermobarometry.

The dichotomy of approach indicated above need not persist, however. The objective of this paper is to draw attention to the technique of Bayes estimation (BE), which is capable of combining the advantages of both REG and MAP. BE is not only mathematically sound, it also returns unique solutions for the refined thermodynamic parameters, including their uncertainties and correlations. Furthermore, the solution obtained from BE is more appropriate than that from MAP, primarily because it is not an element on the boundary of the feasible solution set Z. The virtue of such a solution becomes quite clear when phase equilibria calculations are done beyond the P-T ranges of the reversal brackets, a frequent necessity in geothermobarometry. In an earlier stage of development of the Bayes method, satisfactory solutions were obtained in chemical systems comprising just a few phases.² Development of faster processors and more recent stochastic techniques made its application to systems with a larger number of phases feasible. In the first part of this paper (Part I), we deal with the theoretical background of BE. In the second part (Part II, Chatterjee et al. 1994), a worked example involving 22 phases of the system CaO-Al2O3-SiO2-H2O will be given. Further developments on theoretical and computational fronts are underway to make this technique more versatile than it is today.

² At that time the technique of restricted sampling, described later on in the section "Computation of the Estimates and Uncertainty Regions", was tested in an attempt to obtain internally consistent thermodynamic datasets for the system Al2SiO5. The PP and RP data were identical to those documented by Chatterjee (1990, Chapter 7) in his worked example of MAP. Although the feasibility of BE was established, the time required for restricted sampling (on a mainframe then accessible to us) was too prohibitive for its routine application to systems with a larger number of phases

The Thermodynamic Model

Derivation of internally consistent thermodynamic datasets requires that the underlying thermodynamic model be specified. To achieve that, we start with the condition of chemical equilibrium for a reaction written in terms of phase components (solid or fluid),

$$\sum_i \nu_i \mu_i = \Delta_r \mu = 0. \tag{1}$$

In this equation $\nu_i$ and $\mu_i$ denote, respectively, the stoichiometric coefficient and the chemical potential of the i-th component. Furthermore, the products count positive and the reactants negative in the summation process, specifying the meaning of $\Delta_r \mu$, the difference of chemical potentials due to a reaction. For the present purpose, the chemical potential of the i-th component will be defined as

$$\mu_i = \Delta_a G^* + G_{tr,i} + G_{ds,i} + RT \ln a_i. \tag{2}$$

The first term on the RHS of (2), $\Delta_a G^*$, is the apparent molar Gibbs energy of formation of i at a given P and T (as defined by Berman 1988)³. The next two terms, $G_{tr,i}$ and $G_{ds,i}$, refer to the Gibbs energy differences due to phase transition and disordering, respectively, that i may undergo between the standard pressure $P^\circ$ (1 bar), standard temperature $T^\circ$ (298.15 K) and P, T. And finally, the $RT \ln a_i$ term includes the activity of i, $a_i$, referred to the pure substance i having unit activity at P and T. The $\Delta_a G^*$ term of Eq. (2) is obtained from the relation,

$$\Delta_a G^* = \Delta_f H_i^\circ + \int_{T^\circ}^{T} C_{p,i}(T)\,dT - T\left(S_i^\circ + \int_{T^\circ}^{T} \frac{C_{p,i}(T)}{T}\,dT\right) + \int_{P^\circ}^{P} V_{T,i}(P)\,dP, \tag{3}$$

$\Delta_f H_i^\circ$ being the molar enthalpy of formation of i from the constituent stable elements at $P^\circ$, $T^\circ$, $S_i^\circ$ the standard molar entropy, $C_{p,i}(T)$ the molar heat capacity power function, and $V_i(T,P)$ the T-P-dependent molar volume of i.

³ The use of molar Gibbs energy of i at P and T, $G^*$, is more deeply entrenched in thermodynamic literature. To grasp the difference between $G^*$ and the apparent molar Gibbs energy of formation of i at P and T, $\Delta_a G^*$, used here, readers may refer to Chatterjee (1990, pp. 11-13)

The $C_{p,i}(T)$-function will be expressed as

$$C_{p,i}(T) = a_i + b_i T + c_i T^{-2} + d_i T^{-1/2} + e_i T^{2} + f_i T^{-3} + g_i T^{-1}. \tag{4}$$

Such a representation allows alternative use of the $C_{p,i}(T)$ polynomials utilized by various authors including Robie et al. (1979) and Berman and Brown (1985).

Moreover, following Berman (1988), the $V_i(T,P)$-terms for the condensed phases are expressed as

$$V_i(T,P) = V_i^\circ\left[1 + v_{1,i}(T - T^\circ) + v_{2,i}(T - T^\circ)^2 + v_{3,i}(P - P^\circ) + v_{4,i}(P - P^\circ)^2\right], \tag{5}$$

$V_i^\circ$ being the standard molar volume of i. In the event of one or more phase transitions intervening between $P^\circ$, $T^\circ$ and P, T the $G_{tr}$-terms associated with them will be evaluated in a manner described by Berman (1988, pp. 448-451). We shall also stick with his strategy (Berman 1988, pp. 451-452) to calculate $G_{ds}$. Alternatively, a Landau formalism will be used to compute those quantities. Whenever that is done, the quantity of interest will refer to the low-temperature low-symmetry phase at $T^\circ$, $P^\circ$. Consequently, the chemical potential of a condensed component s will follow from the relation

$$\mu_s = \Delta_f H_s^\circ + \int_{T^\circ}^{T} C_{p,s}(T)\,dT - T\left(S_s^\circ + \int_{T^\circ}^{T} \frac{C_{p,s}(T)}{T}\,dT\right) + \int_{P^\circ}^{P} V_{T,s}(P)\,dP + G_{tr,s} + G_{ds,s} + RT \ln a_s. \tag{6}$$

Aside from condensed phases, fluids also participate in many equilibria to be handled by the Bayes method. Choosing the hypothetical ideal gas having unit fugacity at $P^\circ$ and T as standard state, the chemical potential of a fluid component fl can be written as

$$\mu_{fl} = \Delta_f H_{fl}^\circ + \int_{T^\circ}^{T} C_{p,fl}(T)\,dT - T\left(S_{fl}^\circ + \int_{T^\circ}^{T} \frac{C_{p,fl}(T)}{T}\,dT\right) + RT \ln \frac{f_{fl}^{P,T}}{f_{fl}^{P^\circ,T}} + RT \ln a_{fl}. \tag{7}$$

Having outlined the requisite identities, let us return to the model utilized to derive an internally consistent thermodynamic dataset. This demands that we specify which of the terms contributing to the chemical potential of solids [Eq. (6)] and fluids [Eq. (7)] must be refined first and foremost; those are the "parameters" of our model. The remaining terms will be handled as if they were known with sufficient accuracy and need not be refined anymore; they are the P-T-$X_i$-dependent "constants" for our model. Given the notorious difficulty of obtaining sufficiently accurate calorimetric $\Delta_f H^\circ$ data for minerals, it is evidently the prime candidate for refinement. Indeed, Holland and Powell (1990) settled for refining only $\Delta_f H_i^\circ$; thus their model had just one parameter per phase. Our thermodynamic model will have two parameters, $\Delta_f H^\circ$ and $S^\circ$. Though calorimetric measurements of $S^\circ$ are, in general, quite reliable, constraining $S^\circ$ nevertheless appears desirable because
(a) in numerous cases an empirical "impurity correction" is applied to calorimetric data on chemically complex naturally occurring minerals to derive the $S^\circ$ of the relevant phase component (end member),
(b) the configurational contributions to $S^\circ$ of some end members remain unknown, or
(c) an estimated value for $S^\circ$ has to be used because no direct calorimetric measurement has been executed.

The flip side of treating $S^\circ$ as parameter is of course an increased dimensionality of the correlation matrix (see Appendix of Part II for an example), making data tabulation an unwieldy affair for systems with a large number of phases.

Two more statements must be made regarding our model:

(1) For each oxide component of a system of interest, we shall use its reference value of $\Delta_f H^\circ$ recommended by the CODATA task group; it will not be subject to any further refinement (see also Berman, 1988, p. 454). To the extent that an oxide does not participate in any of the reversed reactions or in any thermochemical cycle used to derive the $\Delta_f H^\circ$ of the silicates, the $\Delta_f H^\circ$ of an appropriate silicate will proxy as anchor. Note that $S^\circ$ of the reference phases will still be subject to refinement by BE.

(2) At the present stage of application of BE to our problem, $C_p(T)$ is not treated as a "parameter" of the model. In future that will be necessary at least for those phases for which the superambient heat capacity measurements extend to barely a few hundred Kelvin and whose $C_p(T)$ must be extensively extrapolated in order to handle the RP restrictions. Cases in point are such common minerals as anthophyllite, aragonite, brucite, calcite, diaspore, dolomite, kaolinite, lawsonite, pyrophyllite, talc, zoisite etc. (cf. Berman and Brown 1985, Table 3). However, attempts to handle $C_p(T)$ as a "parameter" will not be meaningful unless reliable experimental data on $\Delta_f H^\circ$, $S^\circ$ and $V_i(T,P)$ are available for the relevant phases. Only when those prerequisites are met are the modified $C_p(T)$ functions likely to be useful for predicting phase relations beyond the P-T range of the bracketing experiments.

Mathematical Background

Introductory Remarks

This section is intended to give a concise and yet fairly precise description of the statistical techniques used in the BE approach. Inevitably, this requires the use of some mathematical terms and notation, which may not be familiar to all readers. Although these methods as such are not intrinsically complicated, their extensive use here may make this section somewhat difficult to digest at first sight even for some readers with a rather good working knowledge of inference statistics. Those in need of a refresher are referred to Box and Tiao (1973) for an excellent introduction. Readers not interested in delving into the subtleties of BE will find an elementary example in the Appendix, which seeks to provide an intuitive understanding of the essential issues. While pursuing this section, readers may find it worthwhile to refer to the Appendix and vice versa.

The Bayes Method

Using Eqs. (1), (6), and (7), the condition of chemical equilibrium may be generalized as

$$\Delta_r \mu = 0 = \Delta_r H^\circ - T\,\Delta_r S^\circ + RS(P, T, X_i). \tag{8}$$

Here $RS(P, T, X_i)$ is the sum of all P, T, $X_i$-dependent "constants" of our thermodynamic model. By contrast, the terms $\Delta_r H^\circ$ and $\Delta_r S^\circ$ arise from summation of the parameters of our model, $\Delta_f H_i^\circ$ and $S_i^\circ$, in keeping with the sign convention implied in Eq. (1). Our objective is to refine the latter two quantities by BE.

As indicated in Section 1, the experimental data fall into two categories - phase property (PP) and reaction property (RP) measurements. They yield two different types of information:

(PP): For k phases, all information on $\Delta_f H^\circ$ and $S^\circ$ is conveniently described by a (2k)-dimensional normal distribution with known expectation vector and covariance matrix on the basic parameter space

$$\Theta = \{(\Delta_f H_1^\circ, S_1^\circ, \Delta_f H_2^\circ, S_2^\circ, \ldots, \Delta_f H_k^\circ, S_k^\circ)\} = \mathbb{R}^{2k}. \tag{9}$$

The covariance matrix is diagonal, since the calorimetric data for $\Delta_f H^\circ$ and $S^\circ$ are independent of each other.

(RP): Information on the sign of $\Delta_r \mu$ of the reaction reversal experiments. By virtue of Eq. (8), each reversal corresponds to a linear inequality dividing $\Theta$ into a feasible half space and its complement. The intersection of the feasible half spaces constitutes the feasible solution set Z. Note that inconsistent reversals lead to an empty feasible region Z. However, if Z is not empty, it is a (not necessarily bounded) "polygon", because it is defined by linear constraints. Note moreover that the reversal experiments do not usually give more specific information about the "true" parameter values within Z (cf. Demarest and Haselton 1981). Therefore, following adjustment of the reversals to account for the uncertainties of measurement of T and P, it seems safe to assume that all values of Z are equally likely. Thus, if we calculate the probability of the observational data (the signs of $\Delta_r \mu$) as a function of the underlying parameter vector $\theta$, we end up with a function which equals 1 on Z and vanishes elsewhere. In statistical terminology, the so-called likelihood function turns out to be the indicator function of Z. If Z is bounded, the likelihood function can be standardized, so that it integrates to one and can then be regarded as a probability distribution on Z. Thus, the standardized likelihood function corresponds here to a (possibly improper) uniform distribution on Z.

The mathematical problem is to combine both types of information, PP and RP, in a meaningful and exact way. A simple example may serve to illustrate the situation. Suppose a regular die has been thrown once and we are asked to guess the outcome. Since the die is regular, we know that all outcomes have equal probability; thus, our initial state of knowledge is that the probability for each of the possible outcomes, $\theta = 1, \ldots, 6$, is 1/6, or more formally, $p(\theta) = 1/6$ for $\theta = 1, \ldots, 6$. If new information, y, is now added, asserting that the outcome is larger than 2 but smaller than 6, then we shall of course update our initial knowledge in the light of the new data y. Ruling out 1, 2, and 6, and renormalizing, we end up with the following conditional probabilities (for given y): $p(\theta \mid y) = 0$ for $\theta = 1, 2, 6$ and $p(\theta \mid y) = 1/3$ for $\theta = 3, 4, 5$.

BE formalizes precisely this procedure. Our initial knowledge about an unknown quantity $\theta$ is described by a probability distribution, the so-called prior probability distribution, $p(\theta)$. Moreover, we assume that for each value of $\theta$ we can calculate the probability that a certain data vector y is observed in a specified experiment, i.e., we have $p(y \mid \theta)$. Then - if y is the observed data vector - Bayes' theorem provides us the so-called posterior distribution of $\theta$ for given y,

$$p(\theta \mid y) = \frac{p(y \mid \theta)\,p(\theta)}{p(y)}, \tag{10}$$

in other words, the posterior distribution is proportional to likelihood times prior distribution. Note that the denominator, $p(y)$, is a normalizing constant (for further explanation, see Appendix) which ensures that $p(\theta \mid y)$ is again a probability distribution. The posterior distribution is an update of the prior in the light of the observed data and summarizes all that is known about $\theta$; it combines prior knowledge and new data. For further details of the Bayes method, the reader may refer to the statistical literature, in particular, to the comprehensive treatise by Box and Tiao (1973).

In the specific situation under consideration, the prior distribution is the normal distribution from PP and the likelihood function is the indicator function on the feasible set Z from RP. Hence, by Bayes' theorem, Eq. (10), the posterior distribution is the normal distribution from PP truncated to the set Z from RP. In what follows, we denote this conditional or truncated distribution by $\Gamma$ and its density (strictly speaking, with respect to the Lebesgue measure) by $\gamma(\theta)$. We shall first describe $\Gamma$ by a few parameters in a manner reminiscent of the description of the normal distribution from PP by its expectation vector and its covariance matrix. Due to the unimodality of $\Gamma$ and the convexity of its support, the mean of $\Gamma$ in our situation has much to commend itself as the best point estimator of $\theta$ (cf. also remark (d) below). It has the following desirable properties:
- it lies invariably in the interior of Z,
- it is the center of gravity of $\Gamma$, and
- it minimizes the posterior expected loss if a quadratic loss function is employed (Box and Tiao 1973, p. 308).

Hence, we shall use

$$\theta_{BE} = E(\Gamma) = \int \theta\,d\Gamma(\theta) = \int \theta\,\gamma(\theta)\,d\theta \tag{11}$$

as point estimator of $\theta$, E denoting the expectation vector.

The definition of a measure of uncertainty is somewhat less straightforward, though; depending upon the purpose, several approaches are possible:

(U1) The feasible region Z. The use of Z as a measure of uncertainty of a computed equilibrium P-T curve is conceptually simple; for example, at a specified T, we

may derive the range of P compatible with Z. Then the librium curves in the same way as for U 1. Since the
resulting uncertainty bounds for the equilibrium curves use of error propagation is the conventional technique,
are the outer envelopes of the collection of all curves we shall use it here and in the follow-up paper (Part II).
stemming from points from Z. Clearly, this approach Note, however, that the uncertainty envelope derived
is overly cautious and leads to a rather broad uncertainty from U3 by whatever method will also be called U 3
band. for the sake of simplicity. Furthermore, we wish to em-
(U2) An HPD-region (highest posterior density region) phasize that all the above quantities are well-defined and
of content 95%. Such a region M is defined by the prop- unique for the type of problem considered by us.
erties Prior to discussing their computation in practice, we
shall add a few comments concerning our approach:
F(M)=0.95 and V01eM V02~M:y(01)>_y(02). (a) Two points, often made against the Bayesian ap-
proach, are of no concern in our context. First, the prior
Uncertainties for a computed P - T - c u r v e may then be
distribution is objectively given and not subjectively -
obtained as for U 1. Though most attractive theoretical-
let alone arbitrarily - chosen. Even if calorimetric data
ly, such regions are difficult to determine in practice in
on Af H ~ and S o are not available, a suitably chosen
our specific situation and will not be used in the sequel.
uniform distribution is certainly defensible as prior. Sec-
Note, however, that the step from Z to an HPD-region
ondly, a "degree-of-belief" interpretation of probability
of content 95% corresponds exactly to the calorimetric
and confidence intervals is generally accepted in this field
convention of quoting 2 a-uncertainty regions.
(cf. Demarest and Haselton 1981, p. 218). Thus, the Baye-
(U 3) An adjusted covariance matrix. Variances and co-
sian methodology seems to be quite appropriate here
variances are the classical measures of uncertainty, but
(see also Kolassa 1991, p. 3547 for a discussion of some
their direct use in this case is hampered by the fact that
difficulties associated with the frequentist approach).
F is not a normal distribution and would invalidate the
(b) The methodology continues to remain valid if only
intended probability statements such as 2 a ~ 9 5 % . As
PP or RP type information is available. It also applies
a practical expedient, we.therefore suggest that the co-
to the special case where Af H ~ of certain phase (like
variance matrix of F, coy(F), be modified by an adjust-
reference oxides or silicates) are anchored. The latter case
ment factor (a > 1) in such a way that the respective ellip-
leads to a reduction of the dimension of |
soidal 50% and 95% regions (formed in the same way
(c) If the reaction reversal experiments yield more infor-
as for the normal distribution) would contain 50% and
mation about the position of the equilibrium within the
95% of the probability mass of F. Formally, let ~ (b,
reversal brackets (see K o h n and Spear 1991, p. 129), the
c) denote the ellipsoid
Bayes method is still applicable. K6nigsberger and
e(b,c):={O~| (12) Gamsfiiger (1990) and K6nigsberger (1991) have used
Bayesian estimation for a special case with a normal
and set prior distribution and normally distributed data. They
%=1 obtained the Bayes estimate as solution of a general
al ,=inf{b IF(~(b, ~2k(0.5))) > 0.5} minimization problem. However, for a full-fledged Baye-
a2 :=inf{b IF(~(b, ~2k(0.05))) > 0.95}, sian analysis of the general situation the concept of the
with Z2(cQ denoting the c~-fractile (or 1 0 0 ( l - e ) th per- posterior distribution, as in Eq. (10), is indispensable.
centile) of the z2-distribution with n degrees of freedom. (d) The mode (i.e. the location of the maximum) of the
Then the adjustment factor a is defined as posterior distribution F, rather than its mean - as given
by Eq. (11)- may also appear worth considering in this
a : = m a x {ao, a 1, a2} (13) context. It is obvious that the mode coincides with the
and the adjusted covariance matrix is solution obtained by MAP using the qudratic objective
function proposed by Berman et al. (1986, p. 1342). In
I:Br:= a. coy (r). (14) other words, MAP is a special case of BE. The mean
is a more logical choice here because the mode, the qua-
This adjusted covariance matrix can then be used in dratic optimality point, is either identical to the calori-
an identical fashion as a classical covariance matrix to metric input data or lies on the boundary of the feasible
compute ap or at-bands (or 2ap, 2at) on phase dia- set Z. Nevertheless, our point of view has something
grams by error propagation. Its use implies that all cor- to offer to the MAP methodology: the uncertainty re-
relations are taken into account. The same procedure gions U2 and U3 based on the posterior distribution
can also be applied separately to the two-dimensional can be used in this context as well and could complement
marginal distributions of Af H ~ and S o for each phase the use of U 1 suggested, but not used by Berman et al.
i(i--1, ..., k). This version may be attractive because it (1986). (Note, incidentally, that U2 is not "symmetrical"
permits an easier tabulation of the results. However, it with respect to 0~E and that 0~E is not necessarily an
invariably leads to an unrealistically wide uncertainty element of an HPD-region of content ( l - e ) - 1 0 0 % for
envelope because it ignores all correlations between larger values of c~. With e = 0.05, this seems unlikely to
phases; we do not recommend its use. An alternative create any difficulty in practice, if it occurs at all.)
way of using the adjusted covariance matrix is to first (e) Finally, it may be instructive to illustrate the metho-
delimit a 95%-ellipsoid in the basic parameter space | dological differences between BE, MAP, and REG, using
and then derive the respective uncertainties for the equi- a one-dimensional case as example. Suppose that calori-
41

metric data yield a normal distribution with mean 4, m, of OnE, that is


and standard deviation, a, for Af H ~ of a certain phase
N
and that reaction reversal experiments constrain this
value to the interval [c, d] (c < d). Then the Bayes solu- vj'lz(vj)
tion is the mean of the truncated normal distribution 0hE= 1 ~ w j - j=IN ' (19)
on [c, d], that is J =' 2 lz" (vj)
j=l
(t) - qo (u)
0BE = m-[ ( I ) ( u ) - (I)(l) "if' (15) where 1z is the indicator function of the set Z defined
by
c -m d- m
where l.'= , u:= and ~0 and q~ are the density
O" o- lz (v) = {~', if w Z
and the distribution function of the standard normal if vCZ (20)
distribution, respectively. The MAP solution with the
quadratic objective function leads to If F n denotes the empirical distribution of {w~, ..., w,}
1
d, if d<m (i.e. the probability distribution which places mass - on
n
OMA P = m, if m E [c, d] each of the vectors wj(j = 1..... n)), Eq. (19) can be written
c, if m<c (16) in analogy to Eq. (11) as

OnE= E (F~) = ~ w d F n (w). (21)


And finally, the R E G solution is to a first order approxi-
mation
Likewise, we can obtain an estimate for U 3 by replacing
c+d F with its empirical counterpart F n. The computation
0ReG-- 2 (17) of U 2 and U1 does not need to be based on a random
sample from F and will be briefly touched upon below.
Thus, OMApalways ignores part of the reaction reversal The restricted sampling method has a serious draw-
information whereas 0REG ignores the calorimetric infor- back though. If Z is small or far apart from the PP
mation. The meaning of the uncertainty regions U 1, U2, mean, which is true for most cases we need to explore,
and U 3 is obvious in this context (see Appendix). then the probability that a vector vj fulfills all RP restric-
tions is low and most vectors vj will be rejected. In other
words, N may be so much larger than n that the proce-
Computation of the Estimates and Uncertainty Regions dure becomes useless in practice. The real advantage of
this technique is that it produces n independent realiz-
The computation of OnE involves high-dimensional inte- tions from the posterior distribution F, which greatly
gration and is, in general, a challenging problem. Except facilitates subsequent statistical treatment like estimation
for trivial cases, explicit expressions are not available; and accuracy assessment. The method, however, is very
even attempts to resort to conventional numerical inte- useful for simple problems involving few dimensions and
gration formulae falter. The only feasible approach appears to be the Monte Carlo technique, which determines $\theta_{BE}$ by a statistical experiment on the basis of (pseudo-)random numbers. However, its general formulation leaves some latitude for a careful design of such an experiment. We consider here two very different, if typical, ideas, that is restricted sampling and the Gibbs sampler.

Restricted sampling is an almost foolproof, but potentially extremely inefficient way of estimating $\theta_{BE}$. In this approach, a random sample of N vectors $v_1, \ldots, v_N$ from the normal distribution pertaining to PP is generated, followed by elimination of all $v_j$ ($j = 1, \ldots, N$) which violate at least one of the RP restrictions. The remaining sample $\{w_1, \ldots, w_n\}$ comprises the vectors $v_j$ which lie within Z,

$\{w_1, \ldots, w_n\} := \{v_1, \ldots, v_N\} \cap Z$, (18)

and is a random sample from the posterior distribution $\Gamma$. The mean of those vectors is a Monte Carlo estimate

4 The conventional statistical symbol for mean is $\mu$, which has been replaced here by m to avoid any possible confusion with our symbol for chemical potential

restrictions, for pilot studies (see Section 4), and for tests and "calibrations" of more sophisticated techniques.

The Gibbs sampler is a Markov chain Monte Carlo (MCMC) method for Bayesian computation. Although it was applied in statistical physics for equations of state calculations (Metropolis et al. 1953) many years ago, the method is currently in rapid development and an impressive flow of papers has appeared in the last two years. For a comprehensive survey of the state of the art and a wealth of ideas on diverse aspects of MCMC methods like convergence diagnostics and monitoring, selection of starting values, and implementation details, reference may be made to the papers presented at the Royal Statistical Society meeting on "The Gibbs sampler and other Markov chain Monte Carlo methods" (e.g. Smith and Roberts 1993; Besag and Green 1993; Gilks et al. 1993 and the references therein). Smith (1991, p. 383) also discusses the relation between analytic approximation, numerical integration, and Monte Carlo simulation for the calculation of Bayesian integrals and gives some recommendations as to the respective areas of application. Gelfand et al. (1990) present some illustrative examples for Gibbs sampling.
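As a minimal sketch of restricted sampling, Eq. (18), the following Python fragment uses the one-dimensional setting of the Appendix (prior mean -3000 kJ/mol, standard deviation 1 kJ/mol, feasible interval Z = [-2999.7, -2998]). The function names, the seed, and the number of draws are illustrative assumptions and are not taken from the original programs, which were written in Turbo-Pascal; the closed-form mean of the truncated normal distribution is included only as a check that happens to be available in one dimension.

```python
import math
import random
import statistics


def restricted_sample(mu, sigma, lower, upper, n_draws, rng):
    """Draw n_draws values from N(mu, sigma^2) and keep those inside [lower, upper]."""
    draws = (rng.gauss(mu, sigma) for _ in range(n_draws))
    return [v for v in draws if lower <= v <= upper]


def truncated_normal_mean(mu, sigma, lower, upper):
    """Closed-form mean of N(mu, sigma^2) restricted to [lower, upper]."""
    a = (lower - mu) / sigma
    b = (upper - mu) / sigma

    def pdf(x):
        return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

    def cdf(x):
        return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

    return mu + sigma * (pdf(a) - pdf(b)) / (cdf(b) - cdf(a))


# Assumed settings: prior and feasible interval of the Appendix example.
MU, SIGMA = -3000.0, 1.0        # kJ/mol
LOWER, UPPER = -2999.7, -2998.0  # feasible interval Z
N_DRAWS = 100_000                # illustrative choice

rng = random.Random(12345)       # fixed seed, chosen arbitrarily
kept = restricted_sample(MU, SIGMA, LOWER, UPPER, N_DRAWS, rng)

estimate = statistics.fmean(kept)  # Monte Carlo estimate of the posterior mean
hit_rate = len(kept) / N_DRAWS     # fraction of draws that fall inside Z
exact = truncated_normal_mean(MU, SIGMA, LOWER, UPPER)

print(f"hits: {len(kept)} of {N_DRAWS}")
print(f"Monte Carlo estimate: {estimate:.3f} kJ/mol")
print(f"closed-form value:    {exact:.3f} kJ/mol")
```

With these settings roughly 36% of the draws fall inside Z. In higher dimensions this fraction can become very small, which is the inefficiency of restricted sampling referred to above.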

The basic idea behind the MCMC methodology is to use an iterative scheme that generates a discrete time Markov chain whose state space is Z and whose equilibrium distribution is $\Gamma$. Simulated values from this chain can then be used to calculate the desired quantities of $\Gamma$. This is an appealing idea since we only generate vectors in Z. On the other hand, successive realizations from the Markov chain are clearly correlated such that the generated random vectors do not represent a random sample from $\Gamma$. The Gibbs sampler (Geman and Geman 1984) proceeds as follows. Given an arbitrary starting vector $t^0 = (t_1^0, \ldots, t_{2k}^0)$ from Z, draw

$t_1^1$ from $\Gamma(t_1 \mid t_2^0, \ldots, t_{2k}^0)$,
$t_2^1$ from $\Gamma(t_2 \mid t_1^1, t_3^0, \ldots, t_{2k}^0)$,
$\vdots$
$t_{2k}^1$ from $\Gamma(t_{2k} \mid t_1^1, \ldots, t_{2k-1}^1)$, (22)

$\Gamma(\cdot \mid \cdot)$ denoting the respective full conditional distribution. This yields the first Gibbs vector $t^1 = (t_1^1, \ldots, t_{2k}^1)$ and by iterating the same cycle we obtain the sequence $t^0, t^1, \ldots$ which (under mild conditions) is a realization of a Markov chain with the above properties.

The sequence $t^0, t^1, \ldots$, derived above, may be used in two different ways. First, we could calculate the time average as an estimate for $\theta_{BE}$. However, our experiences with that type of ergodic averaging, recommended by Ritov (1989), have been disappointing. Presumably, that is due to the fact that for our type of problems, Z is usually very "long and thin". Consequently, the chain advances in minute steps and convergence to equilibrium is very slow (cf. also Smith and Roberts 1993, p. 5). The possible remedy - use of the correlation structure of $\Gamma$ - would unfortunately destroy the main asset of Gibbs sampling: its ease of implementation.

The second way of using the Gibbs sampling output is to mimic a random sample from $\Gamma$. Such a "pseudo-independent" sample can then be used like a "genuine" random sample from restricted sampling to compute $\theta_{BE}$ and $U_3$; Eq. (21) remains valid for this case and no new statistical considerations are required. The pseudo-independent random sample can be obtained either by replicate independent runs and taking the final states from each or by subsampling from a single long run of the chain with appropriately large spacings between the outcomes that are to form the sample. The latter procedure can be regarded as a version of the former if the final state from the preceding subchain is chosen as the initial value for the next one. We have opted for a compromise idea in which the initial value is randomly chosen from all the states assumed by the previous subchain. This will hopefully further reduce dependence on the starting value and also allow occasional larger steps lest the chain stalls in a remote region of Z for too long. We shall, however, always repeat the whole experiment several times, starting each run near the quadratic optimality point to ensure an element of genuine replication. The quadratic optimality point itself is not suitable as a starting point because it usually lies at the boundary of Z; further details will be given in a subsequent publication on the programming aspects of this study (Miller et al. 1993, in preparation). A yet more sophisticated procedure might use different vertices of Z, but the use of the quadratic optimality point can be justified on the ground that it is the mode of $\Gamma$ and hence, the probability of its neighborhood is relatively high. In summary, we thus have three "tuning parameters" characterizing our simulation runs:
- the number of times the experiment is repeated, starting near the quadratic optimality point,
- the number of Gibbs points that a single experiment should yield or, in other words, the number of subchains that are linked together in the manner described above, and
- the length of each subchain.
The next section will give some details regarding our choice of these tuning parameters. We emphasize that no claim is being made concerning optimality or efficiency of the above suggestions for a particular situation. Our choices were chiefly guided by pragmatism, conceptual simplicity, and ease of implementation.

Let us turn once more to the computation of the uncertainty regions. Calculation of $U_3$ based on the empirical distribution $\Gamma_n$ of a sample (or "pseudo-sample") from $\Gamma$ has been described above. Although we do not intend to pursue this question in this paper any further, we do wish to mention that $\Gamma_n$ could be a makeshift solution if direct handling of $U_2$ happens to be intractable. For this purpose, one could trim off the 5% most "distant" (in a sense to be specified) vectors from the sample and use the convex hull of the rest - following some adjustment - as an estimate of an HPD-region of content 95%. An obvious choice for a "distance measure" is the so-called Mahalanobis distance based on the PP-information, which is identical to the quadratic objective function used in MAP. And finally, uncertainty envelopes due to $U_1$ are usually best computed directly from RP. For a given temperature T, the extremal values of RS(P, T, Xi) in Eq. (8) over Z can be calculated by linear programming. If RS(P, T, Xi) is monotone in the variable P, the pertaining lower and upper boundaries for P can be readily obtained.

Implementation Issues

Some Computational Details

Aside from the methodological questions addressed in Section 3, the sheer size of the input databases (PP and RP) to be handled by the Bayes method poses special problems that deserve attention. Efficient and skillful use of the available resources becomes a necessity no matter how powerful the available computing equipment is. For example, elimination of the redundant RP-restrictions may easily help speed subsequent Monte Carlo computations by a factor of 2 to 5 and should always be the first step of the analysis. Restriction redundancy can be checked by solving a linear programming problem, for which efficient algorithms are available (e.g. Gill et al. 1983). Another crucial topic is the generation of the required random numbers. Since our programs are

written in Turbo-Pascal, version 6.0, we have utilized its standard generator RANDOM. Theoretical and empirical tests conducted by Dohmann et al. (1991) have not detected any deficiencies of this generator. It is a linear congruential generator with a period length $2^{32}$, but uses only the integers $0, \ldots, 2^{31}$ and assumes each of the values between 1 and $2^{31}-1$ twice before the sequence restarts. The uniformly distributed random numbers generated by RANDOM were mostly converted to normally distributed ones by the Box-Muller method; only if the truncation interval for the one-dimensional conditional normal distributions in Eq. (22) was too far from the mean, an acceptance-rejection method was utilized. Two further points are worth mentioning. First, if we carry out replicate runs of an experiment (with unchanged tuning parameters), we would like to be able to combine the results afterwards. However, if a single experiment already consumes a substantial portion of the random number sequence, there is a non-negligible danger of "overlap". As a precaution, we have always linked repetitions of the same experiment and employed the last random number from the preceding one to initialize RANDOM for the next run. Secondly, the whole sequence may occasionally be used up. For instance, the ten replicates of the first experiment described in the following subsection required a total of 2517246835 random numbers, roughly half of the period ($2^{32}$), and about as many as the total number of distinct values that RANDOM can produce. Even though this would be inappropriate for the simulation of stochastic phenomena, it seems still possible for Monte Carlo integration by simple averaging, as in Eq. (19). The reason is that the lattice structure which is produced by linear congruential generators then makes the procedure border on to a "numerical quadrature". For the experiment in question, which serves as a control experiment, this was a desired effect, but in general, one should be wary about exhausting the random number sequence. Further details about the generation and testing of random numbers and their use in Monte Carlo studies can be found in Ripley (1987) or in Rubinstein (1981) and the references listed there.

A Pilot Study and Some Considerations Regarding the Choice of the Tuning Parameters

In this subsection we shall demonstrate the accuracy and reliability of the Monte Carlo method and give some guidelines for the choice of the tuning parameters and the design of the simulation experiments. To facilitate a broad range of comparison, we chose the Al2SiO5 unary - of perennial interest to petrologists - as an example. The PP and RP input data used for this purpose are documented in Part II of this paper (Chatterjee et al. 1994).

The circumstances required that the simulation experiments proceed in two steps. In a preliminary step, we processed only the restrictions for the equilibria kyanite = andalusite and kyanite = sillimanite, but not those for andalusite = sillimanite. Early on it was observed that inclusion of the latter drastically increased the run times needed for a sufficiently large restricted sampling, which is a prerequisite for a broad range of comparison with Gibbs samples with different subchain lengths. This artificial limitation will, however, be lifted in a subsequent step. Consequently, the results obtained in the two steps will be slightly different. It must be clearly understood that exclusion of the restrictions due to the andalusite = sillimanite equilibrium has no consequence whatsoever for the methodological aspects of MCMC emphasized in this subsection. Note moreover that for the sake of brevity, we discuss the results of the pilot study in terms of $\Delta_f H^0$ of andalusite only; however, the final results given in Tables 3 and 5 list the values of all six variables of the unary.

The simulation for our pilot study consists of five parts, each of which comprises 10 replicates of the following experiments:
E1: generation of a random sample of size 5000 by restricted sampling,
E2: generation of a random sample of size 500 by restricted sampling,
E3: generation of a Gibbs sample of size 500 with subchain length 500,
E4: generation of a Gibbs sample of size 500 with subchain length 1000, and
E5: generation of a Gibbs sample of size 500 with subchain length 1500.
For each part, the random number generator was initialized with a seed value from a table of random numbers; the ten replicates within each part were sequentially "linked" in the way described earlier to prevent overlaps. Note, however, that each of the ten replicates in E3, E4 and E5 starts near the quadratic optimality point.

Table 1. Results of the experiment E1. Entries are means and standard deviations of $\Delta_f H^0$ (J/mol) of andalusite based on 5000 sample points for ten runs. The last line indicates the overall values on the basis of all 50000 sample points

Run       Mean $\Delta_f H^0$ (J/mol) of andalusite    Standard deviation
1         -2589948.54                                   529.36
2         -2589952.20                                   530.25
3         -2589946.21                                   533.78
4         -2589960.94                                   527.65
5         -2589959.08                                   522.64
6         -2589951.14                                   536.05
7         -2589962.15                                   517.76
8         -2589946.44                                   510.52
9         -2589938.85                                   528.37
10        -2589949.73                                   523.68
overall   -2589951.53                                   526.06

Table 1 reproduces the ten mean values and standard deviations of $\Delta_f H^0$ for andalusite obtained from ten replications of E1. The overall mean and standard deviation,

$\Delta_f H^0$ (andalusite) = -2589951.53 J/mol,
$\sigma$ (andalusite) = 526.06 J/mol, (23)

are based on all 50000 sample points and should be good approximations to the "true values". E1 was included in this study to provide "almost true values" to be used as a yardstick for later comparisons and, as indicated above, uses a brute force approach which comes close to numerical quadrature. Table 1 shows a remarkable stability of the results and illustrates that for sufficiently large samples, the Monte Carlo method yields highly reliable and reproducible results, despite its inherent randomness.

Table 2. Summary statistics for the 10 replicates of the experiments E1-E5. The four successive entries for each experiment list mean, standard deviation, minimum, and maximum for the ten determinations of the mean (left column) and the standard deviation (right column) of $\Delta_f H^0$ (J/mol) for andalusite. Note that the results for E1 reproduce the values already given in Table 1 in this condensed form. The results for E2-E5 are analogous summaries for those experiments, whose individual values were not presented for brevity

Experiment   Entry                $\Delta_f H^0$ (J/mol) for andalusite   Standard deviation
E1           mean                 -2589951.53                             526.01
             standard deviation   7.35                                    7.63
             minimum              -2589962.16                             510.52
             maximum              -2589938.85                             536.05
E2           mean                 -2589945.36                             530.78
             standard deviation   16.91                                   15.65
             minimum              -2589967.29                             506.69
             maximum              -2589919.53                             550.73
E3           mean                 -2589958.53                             526.97
             standard deviation   17.23                                   12.75
             minimum              -2589980.41                             506.03
             maximum              -2589923.62                             545.46
E4           mean                 -2589955.54                             524.13
             standard deviation   28.39                                   21.51
             minimum              -2589998.85                             492.96
             maximum              -2589904.11                             561.26
E5           mean                 -2589940.26                             533.76
             standard deviation   11.01                                   19.42
             minimum              -2589954.96                             508.62
             maximum              -2589920.31                             574.50

Table 2 lists similar information for all five parts of the study in a condensed form. Instead of the ten values, we only reproduce the mean, the standard deviation as well as the minimum and the maximum over the ten values. Note that the overall standard deviation in Eq. (23) or in Table 1 is not strictly identical to the mean of the standard deviations of the ten runs from E1. If everything were "ideal", Eq. (23) and the central limit theorem would imply that the ten results for $\Delta_f H^0$ (andalusite) from E2 to E5 would in each case behave like random samples of size 10 from a normal distribution with mean value -2589951.53 and standard deviation $526.06/500^{1/2} = 23.53$. Likewise, the results from E1 should also behave like a random sample of size 10 from a normal distribution with mean value -2589951.53 and standard deviation $526.06/5000^{1/2} = 7.44$, and the reader may verify that they indeed do so. Evidently, E2 through E4 pass the test with flying colors, indicating that a random sample of 500 points leads to satisfactory results and that a subchain length of 500, even more so 1000, ensures already that the Gibbs sample is sufficiently close to a random sample. In E5, the mean, -2589940.26, may at first sight appear to be somewhat different, but is in fact still within the range of what is expected. To grasp this, consider that this mean is based on 5000 points like any other value in Table 1, and that the others vary to this extent too. Also the slightly low standard deviation in E5 is no reason for concern: considering the multitude of comparisons that we carry out, some minor fluctuations are bound to occur. Incidentally, we have not followed the suggestion to omit the initial part of each simulation run. While such a strategy might appear promising and indeed reduce dependencies due to the identical starting point, it could on the other hand render detection of insufficient subchain lengths more difficult and thereby avoid one specific symptom rather than cure the disease. In summary, we conclude that for this example
- The Monte Carlo method works well and a sample size of 500 is adequate for satisfactory results.
- A subchain length of 500 or better 1000 seems to ensure that a Gibbs sample mimics a random sample.
We have outlined these arguments in some, but by no means in exhaustive, details to demonstrate that even elementary considerations suffice to make a headway towards a judicious choice of the tuning parameters. Note, in particular, that we are not at all helpless if only Gibbs sampling works in a specific situation: our conclusions would still remain valid if only E3-E5 were at our disposal. The following suggestions may serve as minimum requirements for simulation experiments of this type:

1. Conduct pilot studies and replicates, if possible, using different methods and different values of the tuning parameters.
2. Carefully analyze the results applying cross-checks. If unexpected phenomena are observed and they appear to be attributable to a particular set up - such as the method, random number generator, tuning parameter - make sure this does not distort the final results.
3. More specifically, for Gibbs sampling the subchain length ought to be sufficiently large so that no influence of this parameter on the final results is perceptible. A reasonably stable estimate of the standard deviation $\sigma$ of $\Gamma$ can then be used to determine the sample size (number of Gibbs points) required to achieve a desired standard error using the relation

standard error = $\sigma / (\text{sample size})^{1/2}$. (24)

Because of the central limit theorem, standard errors will be meaningful quantities for any reasonable sample size even though $\Gamma$ may be far from normal. This standard error describes the accuracy with which Eq. (21) estimates $\theta_{BE}$ and is, of course, conceptually different from the uncertainty as measured by "uncertainty regions". If our suggestions regarding $U_1$ to $U_3$ are adopted, there is no need to incorporate it explicitly in them.
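The relation in Eq. (24) can be illustrated with a few lines of Python. This is only a sketch: the function names and the target standard error of 10 J/mol are hypothetical, while the standard deviation is the overall value from Eq. (23) and the sample sizes are those of E1 and of E2 to E5.

```python
import math


def standard_error(sigma, sample_size):
    """Standard error of a Monte Carlo mean, Eq. (24): sigma / sqrt(sample size)."""
    return sigma / math.sqrt(sample_size)


def required_sample_size(sigma, target_standard_error):
    """Smallest sample size whose standard error does not exceed the target."""
    return math.ceil((sigma / target_standard_error) ** 2)


sigma = 526.06  # J/mol, overall standard deviation from Eq. (23)

se_500 = standard_error(sigma, 500)    # sample size of E2 to E5
se_5000 = standard_error(sigma, 5000)  # sample size of E1
n_for_10 = required_sample_size(sigma, 10.0)  # hypothetical target: 10 J/mol

print(f"{se_500:.2f} {se_5000:.2f} {n_for_10}")  # prints "23.53 7.44 2768"
```

The first two values reproduce the standard deviations of 23.53 and 7.44 J/mol quoted in the text for the replicate means; the third shows how Eq. (24) may be inverted to choose the number of Gibbs points once a stable estimate of the standard deviation is available.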

We wish to emphasize that more sophisticated convergence diagnostics and intrinsic convergence monitoring are possible and strongly encourage use of such techniques; for further details, the reader may consult references on Gibbs sampling cited earlier. A word of caution is, however, in order: by their very nature these ideas cannot prove that the method works, but only establish a lack of evidence for its failure.

Table 3. Final results for the Al2SiO5 unary based on 50000 points of the experiment E1. The six parameters indicated are $\Delta_f H^0$ (J/mol) and $S^0$ (J/K·mol) for each of the phases andalusite (And), kyanite (Ky), and sillimanite (Si). The entries refer to (a) the mean values and the adjusted $2\sigma$ values and (b) the correlation matrix (up to two decimal places). Note that $\Delta_f H^0$ and $S^0$ of the phases have been abbreviated as And H, And S etc.

(a) The mean values (with adjusted $2\sigma$ in parentheses; adjustment factor = 1.0170804)

And H    -2589952   (1061)
And S    91.396     (0.140)
Ky H     -2594198   (1059)
Ky S     82.309     (0.131)
Si H     -2585686   (1081)
Si S     95.757     (0.371)

(b) The correlation matrix

         And H    And S    Ky H     Ky S     Si H     Si S
And H     1.00
And S     0.07     1.00
Ky H      0.98    -0.03     1.00
Ky S     -0.02     0.00     0.07     1.00
Si H      0.91    -0.03     0.92    -0.05     1.00
Si S     -0.13    -0.00    -0.13     0.02     0.23     1.00

Table 3 displays the results for all six parameters of the Al2SiO5 unary based on the 50000 points from E1. The high correlation between $\Delta_f H^0$ of the three phases can be regarded in hindsight as a justification for displaying the results for andalusite only; the corresponding tables for kyanite and sillimanite would look very similar. As in the previous tables, we have retained more digits than necessary to facilitate re-calculations. On the whole, numerical accuracy is not a matter of concern in this approach; the formulae are far too simple to create problems. However, time may be the impediment. Table 4 lists the maximum run times required over the 10 replicates for the simulations E1-E5 on a PC equipped with an 80486 processor. With 64 bit processors in the offing, and considering that the technique lends itself to all sorts of parallelization, its prospect for application to substantially larger number of parameters appears to be bright.

Table 4. Maximum computing times over the ten replicates of the simulation experiments E1-E5

Experiment   Maximum run time (minutes)
E1           253
E2           27
E3           14
E4           28
E5           41

In the final step of our pilot study, we processed the complete set of PP and RP restrictions for the Al2SiO5 unary; that is, the RP restrictions due to the andalusite = sillimanite equilibrium were no longer excluded from the input database. This time, five replicates for each of the following experiments were conducted:
E6: generation of a Gibbs sample of size 500 with subchain length 1000 and
E7: generation of a Gibbs sample of size 500 with subchain length 1500,
linking each run to the previous one to avoid overlap in the random number sequence and also overlap between E6 and E7. A careful perusal of the results detected no inadequacy of the study, nor relevant differences between E6 and E7, so that the samples could be pooled. Table 5 lists the results, including the correlation matrix, based on all 5000 Gibbs points.

Table 5. Refined thermodynamic data for the Al2SiO5 polymorphs (andalusite, And; kyanite, Ky; sillimanite, Si) based on 5000 Gibbs points obtained in the experiments E6 and E7 described in the text. The entries refer to (a) $\Delta_f H^0$ (J/mol) and $S^0$ (J/K·mol) for each phase and (b) the correlation matrix (up to two decimal places)

(a) The mean values (with adjusted $2\sigma$ in parentheses; adjustment factor = 1.0845265)

And H    -2589782   (1063)
And S    91.398     (0.143)
Ky H     -2594077   (1057)
Ky S     82.338     (0.132)
Si H     -2585991   (1062)
Si S     95.493     (0.255)

(b) The correlation matrix

         And H    And S    Ky H     Ky S     Si H     Si S
And H     1.00
And S     0.11     1.00
Ky H      0.99    -0.01     1.00
Ky S     -0.07    -0.01     0.05     1.00
Si H      0.97     0.01     0.97    -0.01     1.00
Si S     -0.08     0.06    -0.08     0.29     0.13     1.00

A cursory look at the refined thermodynamic data for the Al2SiO5 polymorphs, reproduced in Tables 3 and 5, may not reveal much difference. However, the subtle difference between them translates to a remarkable difference in the computed phase diagrams. Figures 1a and 1b depict, respectively, the Al2SiO5 phase diagram generated from the data in Tables 3 and 5. Note that the phase diagrams differ both with regard to the location of the andalusite = sillimanite curve and its $U_3$ uncertainty. This is due to our ignoring the RP restrictions for that equilibrium in deriving the datasets of Table
3. Clearly, the database of Table 5 is more refined, because it was subjected to all relevant restrictions. The Al2SiO5 invariant point, based on the comprehensive dataset of Table 5, is located at 509 °C and 3808 bar, in near-agreement with that given by Holdaway and Mukhopadhyay (1993). This is not fortuitous; it is because our PP and RP input data are very similar to theirs. In Part II of this paper (Chatterjee et al. 1994), an example of refining the thermodynamic data for 22 phases of the CaO-Al2O3-SiO2-H2O system will be given. That system also comprises the three Al2SiO5 polymorphs. The refined thermodynamic data for those three phases will be even further constrained because we shall have to consider additional RP restrictions involving kyanite and andalusite within the framework of the CaO-Al2O3-SiO2-H2O system. Owing to that we may anticipate a marginally different set of refined values for the Al2SiO5 polymorphs (see Chatterjee et al. 1994, Table 4).

[Figure 1: two panels, (a) "Without And = Si Restrictions" and (b) "All Restrictions Included"; horizontal axis T [°C], 400 to 1000; labelled fields include Kyanite and Andalusite.]
Fig. 1a-b. Computed phase diagram for the Al2SiO5 unary based on the refined thermodynamic datasets given in Table 3 (a) and Table 5 (b). The uncertainty envelope corresponds to $U_3$ in both cases. For further details, see text

Conclusions and Future Perspectives

A close analysis of the two types of information, PP and RP, to be combined for deriving internally consistent thermodynamic data, almost compellingly suggests that BE is the method of choice. It is conceptually simple, mathematically sound, and makes full and adequate use of all available information. It retains the advantages of both MAP and REG while avoiding their respective limitations and thus reconciles two objectives - strict consistency and calculation of meaningful uncertainties - long regarded as conflicting or even incompatible. In multicomponent-multiphase systems, MCMC methods like Gibbs sampling provide a feasible tool to calculate the required quantities with adequate accuracy.

An increased computational efficiency can be expected in future by pursuing the following strategies:
- Use of reparametrization and/or correlation structure. That could considerably accelerate and facilitate Gibbs sampling. Likewise, random direction sampling schemes or random choice of orthogonal directions may lead to potential gains (see Smith 1984).
- Sophisticated intrinsic convergence monitoring and convergence diagnostics could alleviate the necessity of time consuming pilot studies or repeat experiments.
- Other techniques of multiple integration may also prove useful. The reader may find the articles in Flournoy and Tsutakawa (1989) useful in this regard.

Appendix: An Illustrative Example of Bayes Estimation

In Eq. (9), we are dealing with a 2k-dimensional space, which refers to two unknowns ($\Delta_f H^0$ and $S^0$) for each of k phases. In general, we shall have calorimetrically measured values for them, which we seek to refine by BE. To make the situation intuitively understood, we simplify matters by considering one dimension only; in our case, this will refer to the standard molar enthalpy of formation of phase 1, $\Delta_f H_1^0$. Consequently, this is now our parameter $\theta$, and the corresponding parameter space $\Theta$ is the real line (real numbers) $\mathbb{R}$. Suppose, moreover, that the calorimetrically obtained value for $\Delta_f H_1^0$ is tabulated as -3000 kJ/mol with a $2\sigma$-uncertainty of ± 2 kJ/mol. That is, $\Delta_f H_1^0$ lies with a probability of 0.95 in the interval [-3002, -2998]. In other words, our prior knowledge about $\theta$ (identical to $\Delta_f H_1^0$) is expressed

by a normal distribution with expectation value (i.e. mean value) -3000 kJ/mol and standard deviation ($\sigma$) 1 kJ/mol. The probability density, $p(\theta)$, of this so called prior distribution is depicted in Fig. A(a).

[Figure A: six panels (a) to (f); horizontal axis $\theta = \Delta_f H_1^0$, from -3002 to -2998 kJ/mol.]
Fig. A. The prior $p(\theta)$ (a), the likelihood $p(y \mid \theta)$ (b), the combination of prior and likelihood (c), and the resulting posterior distribution $p(\theta \mid y)$ for the simple (one-dimensional) example in the Appendix. Figure (d) shows the mean of the posterior distribution, $\theta_{BE}$, superimposed on the posterior distribution. In each case, the densities of the respective distributions have been depicted. Finally, (e) and (f) show the uncertainty regions $U_1$, $U_2$, and $U_3$. For further discussion, see text

Next we consider the second type of information at our disposal, which has been derived from reaction reversal experiments. It puts constraints on the range of $\Delta_f H_1^0$ values that are compatible with the reaction reversals and defines in that way the feasible solution set Z. Suppose that this range is [-2999.7, -2998]; it is based on two half-brackets. For any value outside of this interval, the observed signs of $\Delta_r \mu$ (denoted as y in Eq. (10)) would have had probability zero of occurring. For any value within that interval, this probability is one; this is true if we assume that the reversal experiments do not give more specific information about the location of $\theta$ (or $\Delta_f H_1^0$) within the feasible solution set Z. In summary, the likelihood function $p(y \mid \theta)$, which gives the probability for the actually observed data y as a function of the parameter $\theta$, is equal to one on the feasible solution set Z = [-2999.7, -2998] and vanishes elsewhere as indicated in Fig. A(b). In mathematical language, such a function, which basically describes the set Z, is often called "indicator function" of Z.

The essential ingredient of Bayes estimation is the combination of the above two types of information. It is mathematically achieved via Eq. (10). We arrive at the intuitively expected result: first, values outside Z are ignored, and second, the area under the curve $p(\theta)$ over Z is renormalized such that it becomes one again (see

Fig. A(c)). The ensuing new distribution $p(\theta \mid y)$ is the posterior distribution of $\theta$ in the light of the new data y (cf. Fig. A(d)). Note that the denominator in Eq. (10), $p(y)$, is nothing but

$p(y) = \int p(y \mid \theta)\, p(\theta)\, d\theta$, (A1)

i.e., in our example the area under $p(\theta)$ over the interval Z.

For mathematical convenience, we denote the posterior distribution by $\Gamma$ and its density by $\gamma(\theta)$. Equation (11) then gives three equivalent expressions for the mean of the posterior distribution. This mean (or expectation value), indicated in Fig. A(d), is the Bayes estimator $\theta_{BE}$. Applying Eq. (15) to our example, $\theta_{BE}$ = -2999.089 kJ/mol. By contrast, the MAP solution with quadratic objective function is that value from Z which is "closest" to the expectation value of the prior distribution (-3000 kJ/mol in our example), thus $\theta_{MAP}$ is -2999.7 kJ/mol. And finally the REG solution, $\theta_{REG}$, is to a first order approximation -2998.85 kJ/mol, the midpoint of the interval Z = [-2999.7, -2998]. In other words, $\theta_{REG}$ simply ignores the calorimetric information. On the other hand, $\theta_{MAP}$ makes no use of the right endpoint of the interval Z. If the expectation value of the prior distribution is within the interval Z, $\theta_{MAP}$ would even go so far as to ignore both endpoints of the interval Z. It should be clearly understood that making partial use of available information has nothing to do with our specific example; it is intrinsic to the REG and MAP methodology, as is more apparent from our comment (e) in the section "The Bayes Method".

We now focus on the derivation of the uncertainty regions in our example. $U_1$ is simply the feasible interval Z = [-2999.7, -2998] itself (Fig. A(e)). $U_2$ is obtained by gradually "lowering" a threshold line until the area of the part of the density that "sticks out" carries 95% of the total probability mass (the stippled area in Fig. A(e)); numerically, we get $U_2$ = [-2999.7, -2998.26]. Note that $U_2$ is not necessarily symmetrically disposed around $\theta_{BE}$. The adjusted variance from $U_3$, however, yields symmetrical intervals around $\theta_{BE}$ of the type

$\theta_{BE} \pm 2 \sqrt{a}\, \sigma$. (A2)

Here a is an adjustment factor and $\sigma$ the standard deviation of the posterior distribution; in our example, $\sigma$ = 0.43. The adjustment factor $a \geq 1$ ensures that probability statements associated with expressions like (A2) are still meaningful and not overly optimistic even though the posterior distribution can be far from normal. Here it is determined in such a way that the interval $\theta_{BE} \pm 0.67 \sqrt{a}\, \sigma$ carries at least 50% and that the interval $\theta_{BE} \pm 1.96 \sqrt{a}\, \sigma$ carries at least 95% of the probability mass of the posterior distribution $\Gamma$. Note that we invariably set $a \geq 1$ (cf. Eq. (13)) and that uncertainty regions of the type (A2) may extend beyond the boundaries of the feasible solution set Z (Fig. A(f)). In our example, a = 1.37, and $U_3$ (for $2\sigma$) is [-3000.10, -2998.07].

[Figure B: schematic of a Gibbs sampling path in two dimensions.]
Fig. B. A schematic illustration of Gibbs sampling. One full cycle leading from the initial point $t^0$ to the first point $t^1$ is indicated

Unfortunately, explicit formulae (like our Eq. (15)) for the calculation of $\theta_{BE}$ are generally not available. In such cases, an estimate $\hat{\theta}_{BE}$ of $\theta_{BE}$ can be obtained by Monte Carlo methods. For restricted sampling, we would first generate a sequence of random numbers from our prior distribution (Fig. A(a)), e.g.:

-2999.50, -2999.72, -2999.87, -3000.42,
-3001.60, -3000.53, -2998.89, -3000.19,
-2998.63, -2998.08.

Following the process illustrated in Figs. A(a) to (d), we would then omit all such numbers which are outside of the interval [-2999.7, -2998]. The remaining numbers constitute a sample from the posterior distribution and their mean is an estimate of $\theta_{BE}$. Thus,

$\hat{\theta}_{BE} = (-2999.50 - 2998.89 - 2998.63 - 2998.08)/4 = -2998.775$ kJ/mol.

Although this tiny sample based on merely 4 numbers yields admittedly a very crude estimate compared to the value of $\theta_{BE}$ = -2999.089 kJ/mol indicated in Fig. A(d), it serves well to illustrate the procedure. Note that in this case, we have 4 "hits" and six "misses" out of ten random numbers. If we so desire, we may improve our estimate of $\theta_{BE}$ by using a sample of 1000 random numbers. Results, not reproduced here, show that we then obtain 361 "hits" and $\hat{\theta}_{BE}$ becomes -2999.093 kJ/mol, in near-agreement with $\theta_{BE}$ given earlier. Unfortunately, the ratio of "hits" to "misses" plummets with increasing number of dimensions and the situation is further exacerbated by concomitant "narrowing" of the set Z, making restricted sampling of little use for the majority of our applications. In such cases, we recommend resorting to Gibbs sampling. The basic idea of the latter is schematically shown in Fig. B for a two-dimensional setup. We start with an arbitrary point $t^0 = (t_1^0, t_2^0)$ inside of Z. First, we keep $t_2^0$ fixed and choose a new $t_1^1$ in accordance with Eq. (22); this yields $(t_1^1, t_2^0)$. Next, we keep $t_1^1$ fixed and alter $t_2^0$ such that we end up with the next Gibbs point $t^1 = (t_1^1, t_2^1)$. It is clear that $t^1$ will strongly depend on the initial point $t^0$ but after
49

a sufficiently large n u m b e r , say k of r e p e t i t i o n s of t h a t Geman S, Geman D (1984) Stochastic relaxation, Gibbs distribu-
process, the last p o i n t of the chain, t k, will be " a l m o s t " tions and the Bayesian restoration of images. IEEE Trans Pattn
i n d e p e n d e n t of t o. F u r t h e r m o r e , the m e c h a n i s m of Anal Mach Intell 6:721-741
Gilks WR, Clayton D J, Spiegelhalter D J, Best NG, McNeil A J,
Eq. (22) implies t h a t t k also lies within Z a n d t h a t it
Sharpies LD, Kirby AJ (1993) Modelling complexity: Applica-
b e h a v e s a s y m p t o t i c a l l y like a r a n d o m v e c t o r f r o m the tions of Gibbs sampling in medicine. J Roy Statist Soc B 55:39
p o s t e r i o r d i s t r i b u t i o n . Therefore, b y i t e r a t i n g the w h o l e 52
p r o c e d u r e , we e v e n t u a l l y o b t a i n a p s e u d o - s a m p l e f r o m Gill PE, Murray W, Saunders MA, Wright MH (1983) User's guide
F, f r o m which a n e s t i m a t e 0BE m a y be c o m p u t e d . for SOL/QPSOL: a FORTRAN package for quadratic pro-
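The restricted-sampling arithmetic of the one-dimensional example can be reproduced in a few lines of Python. This is a minimal sketch for illustration only: the ten draws and the interval Z are taken from the text above, whereas the function name `restricted_sample_mean` is a hypothetical choice of ours and not part of any published program.

```python
# Ten random numbers drawn from the prior distribution, as listed in the text.
PRIOR_DRAWS = [
    -2999.50, -2999.72, -2999.87, -3000.42, -3001.60,
    -3000.53, -2998.89, -3000.19, -2998.63, -2998.08,
]

# Feasible interval Z = [-2999.7, -2998] (kJ/mol).
Z_LOW, Z_HIGH = -2999.7, -2998.0


def restricted_sample_mean(draws, low, high):
    """Keep only the draws inside [low, high] and return (hits, mean of hits)."""
    hits = [x for x in draws if low <= x <= high]
    if not hits:
        raise ValueError("no draw fell inside the feasible interval")
    return hits, sum(hits) / len(hits)


hits, theta_be = restricted_sample_mean(PRIOR_DRAWS, Z_LOW, Z_HIGH)
misses = len(PRIOR_DRAWS) - len(hits)
print(f"{len(hits)} hits, {misses} misses, estimate = {theta_be:.3f} kJ/mol")
```

With these ten numbers the sketch retains four draws and discards six, and the mean of the retained draws is -2998.775 kJ/mol, as in the text.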
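The two-dimensional Gibbs scheme described above can be sketched as follows. Since Eq. (22) is not reproduced in this Appendix, the conditional draw used here is an assumption: we take independent normal priors for the two coordinates, so that each coordinate, given the other, follows a normal distribution truncated to the feasible segment of Z. The set Z (a box cut by one linear inequality), all numerical values, and all function names are hypothetical and serve illustration only.

```python
import random
from statistics import NormalDist

# Hypothetical setup: independent priors N(M1, S1^2) and N(M2, S2^2);
# Z = {(t1, t2): LO <= t1, t2 <= HI and t1 + t2 <= C}.
M1, S1, M2, S2 = 1.5, 1.0, 1.5, 1.0
LO, HI, C = 0.0, 2.0, 3.0


def in_z(point, tol=1e-9):
    """Check membership in the feasible set Z (with a small rounding tolerance)."""
    t1, t2 = point
    return LO - tol <= t1 <= HI + tol and LO - tol <= t2 <= HI + tol and t1 + t2 <= C + tol


def truncated_normal(mean, sd, low, high, rng):
    """Draw from N(mean, sd^2) restricted to [low, high] by inverting the CDF."""
    dist = NormalDist(mean, sd)
    u = rng.uniform(dist.cdf(low), dist.cdf(high))
    u = min(max(u, 1e-12), 1.0 - 1e-12)  # inv_cdf requires 0 < u < 1
    return min(max(dist.inv_cdf(u), low), high)  # guard against rounding


def gibbs_last_point(start, k, rng):
    """Run k Gibbs sweeps from `start` and return the last point t^k of the chain."""
    t1, t2 = start
    for _ in range(k):
        # Keep t2 fixed; the feasible t1 values form [LO, min(HI, C - t2)].
        t1 = truncated_normal(M1, S1, LO, min(HI, C - t2), rng)
        # Keep the new t1 fixed; the feasible t2 values form [LO, min(HI, C - t1)].
        t2 = truncated_normal(M2, S2, LO, min(HI, C - t1), rng)
    return (t1, t2)


def pseudo_sample(n_points, k, seed, start=(0.5, 0.5)):
    """Iterate the whole procedure n_points times to collect a pseudo-sample."""
    rng = random.Random(seed)
    return [gibbs_last_point(start, k, rng) for _ in range(n_points)]


def estimate(sample):
    """Componentwise mean of the pseudo-sample (a Monte Carlo posterior mean)."""
    n = len(sample)
    return (sum(p[0] for p in sample) / n, sum(p[1] for p in sample) / n)


sample = pseudo_sample(n_points=200, k=20, seed=1)
print("estimate:", estimate(sample))
```

Restarting the chain for every pseudo-sample point follows the description above literally; it is simple but wasteful, and the values of `k` and `n_points` chosen here are arbitrary.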
Acknowledgments. We are highly indebted to the Deutsche Forschungsgemeinschaft, Bonn, for financially supporting this work for 18 months under the grant Ch 46/20-1. For occasional help with programming, we wish to thank C. Sykes and R. Krüger. We are grateful to H.W. Day, Davis, CA, and an anonymous reviewer for their perceptive reviews which helped improve the presentation. In particular, they have been instrumental in our adding the Appendix despite an increased length of this paper, a move kindly accepted by our executive editor K. Langer, Berlin.
K6nigsberger E (1991) Improvement of excess parameters from
thermodynamic and phase diagram data by a sequential Bayes
algorithm. CALPHAD 15:69 78
References K6nigsberger E, GamsjS.ger H (1990) Analysis of phase diagrams
employing Bayesian excess parameter estimation. Monatsh
Berman RG (1988) Internally-consistent thermodynamic data for Chemie 121:119 127
minerals in the system N a 2 0 - K 2 0 - C a O - M g O - F e O Kohn MJ, Spear FS (1991) Error propagation for barometers: I.
- F e 2 0 3 - A1203- SiO2- T i O 2 - H2 O - CO2. J Petrol 29:445- Accuracy and precision of experimentally located end-member
522 reactions. Am Mineral 76:128 137
Berman RG (1991) Thermobarometry using multi-equilibrium cal- Kolassa JE (1991) Confidence intervals for thermodynamic con-
culations: a new technique, with petrological applications. Can- stants. Geochim Cosmochim Acta 55:3543 3552
ad Mineral 29:833-855 Metropolis N, Rosenbluth AW, Rosenbluth MN, Teller AH, Teller
Berman RG, Brown TH (1985) Heat capacity of minerals in the E (1953) Equations of state calculations by fast computing ma-
system N a z O - K 2 0 - - C a O - M g O - FeO-- F e 2 0 3 - - A120 a chines. J Chem Phys 21:1087 1091
- SiO 2 - TiO 2 - H 2 0 - CO 2 : representation, estimation, and Miller K, Olbricht W, Chatterjee ND (1994) Bayes estimation of
high temperature extrapolation. Contrib Mineral Petrol internally consistent thermodynamic data for minerals: some
89:168-183 computational and programming aspects (in preparation)
Berman RG, Engi M, Greenwood HJ, Brown TH (1986) Derivation Powell R (1985) Geothermometry and geobarometry: a discussion
of internally consistent thermodynamic data by the technique J Geol Soc London 142:29 38
of mathematicl programming: a review with application to the Powell R, Holland T (1993) The applicability of least squares in
system M g O - S i O 2 - H 2 0 . J Petrol 27:1331-1364 the extraction of thermodynamic data from experimentally
Besag J, Green PJ (1993) Spatial statistics and Bayesian computa- bracketed mineral equilibria. Am Mineral 78:107 112
tion. J Roy Statist Soc B55:25-37 Powell R, Holland TJB (1985) An internally consistent thermody-
Box GEP, Tiao GC (1973) Bayesian Inference in Statistical Analy- namic dataset with uncertainties and correlations: I. Methods
sis. Addison-Wesley, Reading, Mass and a worked example. J MetamOrphic Geol 3 : 327-342
Chatterjee ND (1990) Applied Mineralogical Thermodynamics: Se- Ripley BD (1987) Stochastic Simulation. Wiley, New York
lected Topics. Springer, Berlin Heidelberg New York Ritov Y (1989) Monte Carlo computation of the mean of a function
Chatterjee ND, Miller K, Olbricht W (1994) Bayes estimation: a with convex support. Comp Statist Data Analysis 7:269-277
novel approach to derivation of internally consistent thermody- Robie RA, Hemingway BS, Fisher JR (1979) Thermodynamic
namic data for minerals, their uncertainties, and correlations. properties of minerals and related substances at 298.15 K and
Part II. Application. Phys Chem Minerals 21:50-62 1 bar (10 s Pascals) pressure and at higher temperatures. US
Demarest HH, Haselton HT (1981) Error analysis for bracketed Geol Surv Bull 1452
phase equilibrium data. Geochim Cosmochim Acta 45:217-224 Rubinstein RY (1981) Simulation and the Monte Carlo Method.
Dohmann B, Falk M, Lessenich K (1991) The random number Wiley, New York
generators of the Turbo Pascal family. Comp Statist Data Anal- Smith AFM (1991) Bayesian computational methods. Phil Trans
ysis (Statistical Software Newsletter) 12:129-132 Roy Soc London A337:369-386
Engi M (1992) Thermodynamic data for minerals: a critical assess- Smith AFM, Roberts GO (1993) Bayesian computation via the
ment. In: Price GD, Ross NL (eds) The Stability of Minerals. Gibbs sampler and related Markov chain Monte Carlo meth-
Chapman and Hall, London, pp 267-328 ods. J Roy Statist Soc B55:3 23
Flournoy N, Tsutakwa RK (eds) (1991) Statistical Multiple Integra- Smith RL (1984) Efficient Monte Carlo procedures for generating
tion. Contemporary Mathematics 115. Am Math Soc, Provi- points uniformly distributed over bounded regions. Operations
dence, RI Res 32:1296 1308
Gelfand AE, Hills SE, Racine-Poon A, Smith AFM (1990) Illustra-
tion of Bayesian inference in normal data models using Gibbs
sampling. J Am Statist Assoc 85:972-985
